Z.ai: GLM 5 vs Pony Alpha

47 challenges

Estimate Complexity

reasoning·hard

Tests an AI's ability to make educated estimates based on technical knowledge

Z.ai: GLM 5

The total number of FLOPs required to train GPT‑3 is approximately 3.15 × 10²³.

Step‑by‑step reasoning:

  1. Model size: GPT‑3 has 175 billion (1.75 × 10¹¹) trainable parameters.

  2. Training data: According to the GPT‑3 paper, the model was trained on 300 billion (3 × 10¹¹) tokens.

  3. FLOPs per token for training a transformer:

    • Forward pass: each parameter is used in one multiply‑add operation → ~2 FLOPs per parameter → 2 N FLOPs per token.
    • Backward pass (gradient computation): roughly twice the forward cost → ~4 N FLOPs per token.
    • Total: ~6 N FLOPs per token.
      This approximation is standard for large transformer models and neglects small overheads like attention quadratic terms (which add only ~1–2% for GPT‑3’s sequence length).
  4. Calculation:
    Total FLOPs ≈ 6 × (1.75 × 10¹¹) × (3 × 10¹¹)
    = 6 × 5.25 × 10²²
    = 3.15 × 10²³.

  5. Consistency check: The GPT‑3 paper reports a training compute of 3,640 PF‑days. Converting:
    1 PF‑day = 10¹⁵ FLOP/s × 86400 s = 8.64 × 10¹⁹ FLOPs.
    3,640 PF‑days = 3,640 × 8.64 × 10¹⁹ ≈ 3.14 × 10²³ FLOPs, matching the estimate.

Thus, the total FLOPs required to train GPT‑3 is about 3.15 × 10²³.
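Both the 6 · N · D rule and the PF‑day cross‑check above are easy to reproduce in code. Below is a minimal Python sketch using only the figures quoted in the steps above (175 B parameters, 300 B tokens, 3,640 PF‑days); the variable names are illustrative only.

```python
# Rough training-compute estimate for GPT-3 using the standard 6*N*D rule.
# Figures (175B parameters, 300B tokens, 3,640 PF-days) come from the text above.

N_PARAMS = 175e9           # trainable parameters
N_TOKENS = 300e9           # training tokens
FLOPS_PER_PARAM_TOKEN = 6  # ~2 forward + ~4 backward FLOPs per parameter per token

total_flops = FLOPS_PER_PARAM_TOKEN * N_PARAMS * N_TOKENS
print(f"Estimated training compute: {total_flops:.2e} FLOPs")   # ~3.15e+23

# Cross-check against the 3,640 PF-days reported in the GPT-3 paper.
PF_DAY_IN_FLOPS = 1e15 * 86_400           # 8.64e19 FLOPs per petaflop/s-day
reported_flops = 3_640 * PF_DAY_IN_FLOPS
print(f"Reported compute:           {reported_flops:.2e} FLOPs")  # ~3.14e+23
```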

Pony Alpha

The total number of FLOPs required to train GPT‑3 is approximately 3.15 × 10²³ (or 315 zettaFLOPs). The reasoning follows these steps:

  1. Model size – GPT‑3 has 175 billion (175 × 10⁹) parameters. This is a well‑known figure from the original paper.

  2. Training data size – The model was trained on 300 billion (300 × 10⁹) tokens. The paper states that all models were trained for a total of 300 billion tokens.

  3. FLOPs per parameter per token – For a transformer model, the compute can be approximated by:

    • Forward pass: each parameter is used in a multiply‑add operation, contributing about 2 FLOPs per parameter per token.
    • Backward pass (gradient computation): roughly 4 FLOPs per parameter per token (two extra passes for gradients with respect to weights and inputs).
    • Hence, training (forward + backward) requires about 6 FLOPs per parameter per token.
  4. Total FLOPs – Multiply the three numbers:
    FLOPs = 6 × (175 × 10⁹) × (300 × 10⁹) = 6 × 175 × 300 × 10¹⁸ = 315,000 × 10¹⁸ = 3.15 × 10²³.

  5. Consistency check – This estimate aligns with the reported compute of 3.14 × 10²³ FLOPs mentioned in OpenAI’s technical analysis, confirming its plausibility.

The calculation captures the dominant linear‑algebra operations; small additional costs (e.g., attention quadratic terms, layer norms) are negligible in comparison.
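For the units quoted in this answer (315 zettaFLOPs, ≈3.14 × 10²³ FLOPs), the conversions can be written out directly. A short sketch, again assuming the 175 B‑parameter / 300 B‑token figures from the text:

```python
# Unit conversions for the estimate above (figures assumed from the text).
total_flops = 6 * 175e9 * 300e9        # 3.15e23 FLOPs

ZETTA = 1e21
PF_DAY = 1e15 * 86_400                 # FLOPs in one petaflop/s-day

print(f"{total_flops / ZETTA:.0f} zettaFLOPs")   # 315 zettaFLOPs
print(f"{total_flops / PF_DAY:.0f} PF-days")     # ~3646 PF-days, close to the reported 3,640
```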

AI Board Game Logic

reasoning·medium

Tests an AI's ability to understand game rules and strategy

Z.ai: GLM 5

Pony Alpha

Logic Puzzle

reasoning·medium

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

Z.ai: GLM 5

Sally has 1 sister.

Here is the breakdown:

  1. Since the brothers are all siblings, they share the same sisters.
  2. If each brother has 2 sisters, that means there are 2 girls in total in the family.
  3. Sally is one of those girls.
  4. Therefore, the other girl is Sally's only sister.

Pony Alpha

Sally has 1 sister.

Here is the breakdown:

  1. If each brother has 2 sisters, that means there are 2 girls in total in the family.
  2. Sally is one of those girls.
  3. Therefore, the other girl is Sally's only sister.
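Both answers rest on the same counting argument, which can be checked mechanically. A toy Python sketch, assuming (as both answers do) that all brothers share the same set of sisters:

```python
# Toy check of the sibling-counting argument above.
# Assumption: every brother's sisters are exactly the girls in the family.

sisters_per_brother = 2                 # given in the puzzle
girls_in_family = sisters_per_brother   # brothers' sisters = all girls in the family
sallys_sisters = girls_in_family - 1    # exclude Sally herself

print(sallys_sisters)  # 1
```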