Z.AI: GLM 4.5 vs Pony Alpha

42 challenges

Why Z.AI: GLM 4.5?

No standout advantages

Why Pony Alpha?

1.6x more context (200K)
7 months newer (Feb 2026)
3 / 42

Estimate Complexity

reasoning·hard

Tests an AI's ability to make educated estimates based on technical knowledge

Z.AI: GLM 4.5

To estimate the total number of FLOPs (floating-point operations) required to train GPT-3 without searching online, we rely on known architectural details and standard computational models for transformer training. Here's the step-by-step reasoning:

Step 1: Key Parameters of GPT-3

  • Number of parameters (P): GPT-3 has 175 billion parameters (175 × 10⁹).
  • Training dataset size (D): The model was trained on approximately 300 billion tokens (300 × 10⁹ tokens). This is a standard figure associated with GPT-3.

Step 2: FLOPs per Token Calculation

Training a transformer model involves two main operations per token:

  1. Forward pass: Computes the model's output.
  2. Backward pass: Computes gradients for backpropagation.

For transformer architectures, the combined FLOPs for forward and backward passes per token is empirically estimated as 6 × P. This is derived from:

  • Forward pass: Requires ~2 × P FLOPs per token (matrix multiplications dominate, with each parameter used in multiply-add operations).
  • Backward pass: Typically requires ~2× the FLOPs of the forward pass due to gradient calculations (~4 × P FLOPs per token).
  • Total per token: 2P (forward) + 4P (backward) = 6P FLOPs per token.

Step 3: Total FLOPs for the Entire Dataset

The total FLOPs scales linearly with the dataset size:

  • Total FLOPs = FLOPs per token × Number of tokens.
  • Substituting the values:
    Total FLOPs = (6 × P) × D = 6 × (175 × 10⁹) × (300 × 10⁹).

Step 4: Calculation

  • Compute the product:
    6 × 175 × 300 × 10⁹ × 10⁹ = 6 × 52,500 × 10¹⁸.
  • Simplify:
    6 × 52,500 = 315,000.
    Total FLOPs = 315,000 × 10¹⁸ = 3.15 × 10²³ FLOPs.
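
The arithmetic in Steps 3 and 4 can be reproduced with a short sketch. The function name `training_flops` and its parameter names are illustrative assumptions, not part of the original answer; the factor of 6 is the per-parameter, per-token heuristic stated in Step 2.

```python
def training_flops(params: float, tokens: float, flops_per_param_token: float = 6.0) -> float:
    """Approximate training compute as (FLOPs per parameter per token) x P x D."""
    return flops_per_param_token * params * tokens


P = 175e9  # parameters, as given in Step 1
D = 300e9  # training tokens, as given in Step 1

total = training_flops(P, D)
print(f"Estimated training compute: {total:.2e} FLOPs")
```

Changing `flops_per_param_token` to 2.0 gives the forward-pass-only cost under the same heuristic.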

Step 5: Validation with Known Compute

  • GPT-3 training reportedly used ~3,640 petaflop-days of compute.
  • Convert petaflop-days to FLOPs:
    1 petaflop-day = 10¹⁵ FLOP/s × 86,400 seconds/day = 8.64 × 10¹⁹ FLOPs.
    3,640 petaflop-days = 3,640 × 8.64 × 10¹⁹ ≈ 3.14 × 10²³ FLOPs.
  • This closely matches our estimate of 3.15 × 10²³ FLOPs, confirming the calculation.
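
The unit conversion in this step can be checked with a minimal sketch. The helper name `petaflop_days_to_flops` and the constant names are assumptions made for illustration; the 3,640 petaflop-day figure is taken from the answer above.

```python
SECONDS_PER_DAY = 86_400
FLOPS_PER_SECOND_AT_ONE_PETAFLOP = 1e15  # 1 petaFLOP/s sustained


def petaflop_days_to_flops(pf_days: float) -> float:
    """Convert petaflop-days (1e15 FLOP/s sustained for one day) to total FLOPs."""
    return pf_days * FLOPS_PER_SECOND_AT_ONE_PETAFLOP * SECONDS_PER_DAY


reported = petaflop_days_to_flops(3640)  # figure quoted in Step 5
estimate = 6 * 175e9 * 300e9             # 6 x P x D from Steps 3 and 4
relative_gap = abs(estimate - reported) / reported

print(f"Reported: {reported:.3e} FLOPs")
print(f"Estimate: {estimate:.3e} FLOPs")
print(f"Relative gap: {relative_gap:.2%}")
```

The gap comes out well under one percent, which is the sense in which the two figures "closely match".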

Step 6: Batch Size and Steps Consideration

  • Batch size: GPT-3 used a batch size of 3.2 million tokens per step.
  • Number of training steps:
    Steps = Total tokens / Batch size = 300 × 10⁹ / 3.2 × 10⁶ = 93,750 steps.
  • FLOPs per step:
    Per step FLOPs = 6 × P × Batch size = 6 × 175 × 10⁹ × 3.2 × 10⁶ = 3.36 × 10¹⁸ FLOPs/step.
  • Total FLOPs:
    93,750 steps × 3.36 × 10¹⁸ FLOPs/step ≈ 3.15 × 10²³ FLOPs.
    This consistency reinforces the estimate.
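
As a rough check of the per-step bookkeeping, the sketch below recomputes the step count and per-step cost from the figures quoted in this step. The variable names are illustrative assumptions. Because the batch size cancels out algebraically, this confirms the arithmetic rather than providing independent evidence for the estimate.

```python
params = 175e9        # P, from Step 1
total_tokens = 300e9  # D, from Step 1
batch_tokens = 3.2e6  # tokens per optimizer step, as quoted in Step 6

steps = total_tokens / batch_tokens
flops_per_step = 6 * params * batch_tokens
total = steps * flops_per_step

print(f"Steps: {steps:,.0f}")
print(f"FLOPs per step: {flops_per_step:.2e}")
print(f"Total: {total:.2e} FLOPs")
```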

Step 7: Negligible Factors

  • Parameter updates: Optimizer steps (e.g., Adam) add O(P) FLOPs per step. For 93,750 steps:
    Update FLOPs ≈ 93,750 × 175 × 10⁹ ≈ 1.64 × 10¹⁶ FLOPs (negligible vs. 3.15 × 10²³).
  • Embedding/softmax layers: Included in the 6P per-token estimate.
  • Attention mechanisms: Captured by the 6P heuristic (validated for transformers).
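
To see how small the optimizer term is relative to the main estimate, the following sketch compares the two. It assumes, as the bullet above does, about one FLOP per parameter per update; `flops_per_param_update` is a hypothetical knob added here, and a real Adam update costs several operations per parameter.

```python
params = 175e9
steps = 93_750                 # from Step 6
main_estimate = 3.15e23        # 6 x P x D

flops_per_param_update = 1.0   # assumption: O(P) with constant 1, as in the text
update_flops = steps * params * flops_per_param_update
fraction = update_flops / main_estimate

print(f"Optimizer update FLOPs: {update_flops:.2e}")
print(f"Fraction of total: {fraction:.1e}")
```

Even if the per-parameter constant were ten or twenty times larger, the fraction would remain far below one part in a thousand.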

Final Estimate

The total FLOPs required to train GPT-3 is 3.15 × 10²³ FLOPs. This is derived from:

  • Standard transformer FLOPs per token (6P).
  • GPT-3 parameters (175B) and dataset size (300B tokens).
  • Cross-validation with reported compute usage (3,640 petaflop-days).
Pony Alpha

The total number of FLOPs required to train GPT‑3 is approximately 3.15 × 10²³ (or 315 zettaFLOPs). The reasoning follows these steps:

  1. Model size – GPT‑3 has 175 billion (175 × 10⁹) parameters. This is a well‑known figure from the original paper.

  2. Training data size – The model was trained on 300 billion (300 × 10⁹) tokens. The paper states that all models were trained for a total of 300 billion tokens.

  3. FLOPs per parameter per token – For a transformer model, the compute can be approximated by:

    • Forward pass: each parameter is used in a multiply‑add operation, contributing about 2 FLOPs per token.
    • Backward pass (gradient computation): roughly 4 FLOPs per parameter per token (two extra passes for gradients with respect to weights and inputs).
    • Hence, training (forward + backward) requires about 6 FLOPs per parameter per token.
  4. Total FLOPs – Multiply the three numbers: FLOPs = 6 × (175 × 10⁹) × (300 × 10⁹) = 6 × 175 × 300 × 10¹⁸ = 315,000 × 10¹⁸ = 3.15 × 10²³.

  5. Consistency check – This estimate aligns with the reported compute of 3.14 × 10²³ FLOPs mentioned in OpenAI’s technical analysis, confirming its plausibility.

The calculation captures the dominant linear‑algebra operations; small additional costs (e.g., attention quadratic terms, layer norms) are negligible in comparison.

2 / 42

AI Board Game Logic

reasoning·medium

Tests an AI's ability to understand game rules and strategy

Z.AI: GLM 4.5

Pony Alpha

3 / 42

Logic Puzzle

reasoning·medium

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

Z.AI: GLM 4.5

Sally has 3 brothers, meaning there are 3 boys in the family. Each brother has 2 sisters, indicating that there are exactly 2 girls in the family. Since Sally is one of the girls, there must be one additional sister. Therefore, Sally has one sister.

This can be verified by considering the family composition:

  • Sisters: Sally and one other girl.
  • Brothers: 3 boys.

From Sally's perspective:

  • She has 3 brothers.
  • She has 1 sister (the other girl).

From each brother's perspective:

  • They have 2 sisters (Sally and the other girl).

This satisfies both conditions: Sally has 3 brothers, and each brother has 2 sisters.

If there were only one sister (Sally), then each brother would have only one sister, which contradicts the information that each brother has 2 sisters. If there were three sisters, then each brother would have three sisters, which also contradicts the information that each brother has 2 sisters. Thus, there must be exactly two sisters, meaning Sally has one sister.

Answer: 1
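
The case analysis above can also be checked by brute force. The sketch below assumes the puzzle as restated in the answer (Sally has 3 brothers, and each brother has 2 sisters) and that all children are full siblings; the function name `sally_sister_counts` and the search bound `max_children` are illustrative assumptions.

```python
def sally_sister_counts(sally_brothers: int = 3,
                        sisters_per_brother: int = 2,
                        max_children: int = 10) -> set:
    """Enumerate family compositions and collect Sally's possible sister counts."""
    answers = set()
    for boys in range(max_children + 1):
        for girls in range(1, max_children + 1):  # at least one girl: Sally
            brothers_of_sally = boys              # every boy is Sally's brother
            sisters_of_a_brother = girls          # every girl is each boy's sister
            if (brothers_of_sally == sally_brothers
                    and sisters_of_a_brother == sisters_per_brother):
                answers.add(girls - 1)            # Sally does not count herself
    return answers


print(sally_sister_counts())
```

Only one composition (3 boys, 2 girls) satisfies both conditions, so the set contains a single value.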

Pony Alpha

Sally has 1 sister.

Here is the breakdown:

  1. If each brother has 2 sisters, that means there are 2 girls in total in the family.
  2. Sally is one of those girls.
  3. Therefore, the other girl is Sally's only sister.