Z.ai: GLM 5 vs GPT-5

46 challenges

Why Z.ai: GLM 5?

3.5x cheaper overall: $0.80/M in · $2.56/M out
Leads 1 of 1 benchmarks
Stronger on SWE-bench Verified: 77.8% vs 74.9%
6 months newer: Feb 2026

Why GPT-5?

2x more context: 400K

Estimate Complexity

reasoning·hard

Tests an AI's ability to make educated estimates based on technical knowledge

Z.ai: GLM 5

The total number of FLOPs required to train GPT‑3 is approximately 3.15 × 10²³.

Step‑by‑step reasoning:

  1. Model size: GPT‑3 has 175 billion (1.75 × 10¹¹) trainable parameters.

  2. Training data: According to the paper abstract, the model was trained on 300 billion (3 × 10¹¹) tokens.

  3. FLOPs per token for training a transformer:

    • Forward pass: each parameter is used in one multiply‑add operation → ~2 FLOPs per parameter → 2 N FLOPs per token.
    • Backward pass (gradient computation): roughly twice the forward cost → ~4 N FLOPs per token.
    • Total: ~6 N FLOPs per token.
      This approximation is standard for large transformer models and neglects small overheads like attention quadratic terms (which add only ~1–2% for GPT‑3’s sequence length).
  4. Calculation:
    Total FLOPs ≈ 6 × (1.75 × 10¹¹) × (3 × 10¹¹)
    = 6 × 5.25 × 10²²
    = 3.15 × 10²³.

  5. Consistency check: The GPT‑3 paper reports a training compute of 3,640 PF‑days. Converting:
    1 PF‑day = 10¹⁵ FLOP/s × 86400 s = 8.64 × 10¹⁹ FLOPs.
    3,640 PF‑days = 3,640 × 8.64 × 10¹⁹ ≈ 3.14 × 10²³ FLOPs, matching the estimate.

Thus, the total FLOPs required to train GPT‑3 is about 3.15 × 10²³.
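The arithmetic in the response above can be reproduced with a short Python sketch. This is only a check of the numbers quoted in the response (175 billion parameters, 300 billion tokens, 3,640 PF-days); the function and constant names are illustrative and not taken from any source.

```python
# Rule-of-thumb training compute: about 6 FLOPs per parameter per token.
N_PARAMS = 175e9           # GPT-3 parameter count quoted in the response
N_TOKENS = 300e9           # training tokens quoted in the response
FLOPS_PER_PARAM_TOKEN = 6  # ~2 for the forward pass + ~4 for the backward pass


def training_flops(n_params, n_tokens, k=FLOPS_PER_PARAM_TOKEN):
    """Approximate total training FLOPs as k * N * T."""
    return k * n_params * n_tokens


def pf_days_to_flops(pf_days):
    """Convert petaFLOP/s-days to FLOPs (1e15 FLOP/s sustained for 86,400 s)."""
    return pf_days * 1e15 * 86_400


estimate = training_flops(N_PARAMS, N_TOKENS)
reported = pf_days_to_flops(3640)

print(f"6*N*T estimate: {estimate:.3e} FLOPs")
print(f"3,640 PF-days : {reported:.3e} FLOPs")
print(f"ratio         : {estimate / reported:.4f}")
```

The two figures agree to within about 1%, which is the level of agreement the consistency check in the response relies on.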

GPT-5

Here’s a back‑of‑the‑envelope estimate using standard LM training-compute heuristics.

  1. Assumptions
  • Model size (N): GPT-3 has about 175 billion parameters (N ≈ 1.75 × 10^11).
  • Training tokens (T): GPT-3 was trained on roughly 300 billion tokens (T ≈ 3.0 × 10^11).
  • FLOPs per token per parameter: A common rule of thumb for transformer LM training is FLOPs ≈ 6 × N × T. The factor ~6 comes from counting forward and backward passes for the parameter-heavy matmuls (roughly 2N for forward + 4N for backward). It ignores small terms from layernorms, softmax, embeddings, and optimizer updates (which we’ll account for with a small overhead factor later).
  2. Core calculation
  • 6 × N × T = 6 × (1.75 × 10^11) × (3.0 × 10^11)
  • First multiply N × T: 1.75 × 3.0 = 5.25; powers: 10^11 × 10^11 = 10^22. So N × T = 5.25 × 10^22
  • Now multiply by 6: 6 × 5.25 × 10^22 = 31.5 × 10^22 = 3.15 × 10^23 FLOPs
  3. Overheads and uncertainty
  • Non-matmul ops (attention softmax, masking, residuals, layernorms), embeddings, and optimizer update steps add some overhead. A 10–20% uplift is a reasonable allowance.
  • Applying +10–20%: 3.15 × 10^23 × (1.1 to 1.2) ≈ (3.5 ± 0.3) × 10^23 FLOPs.

Answer

  • Point estimate: ~3.15 × 10^23 FLOPs
  • With a modest overhead allowance: roughly (3.3–3.8) × 10^23 FLOPs

This is a coarse estimate; the exact number depends on details like sequence length, activation checkpointing, implementation efficiency, and the precise token count.
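The overhead allowance in the response above can be worked through with a minimal Python sketch. It assumes the 3.15 × 10^23 base figure and the 10–20% uplift stated in the response; the variable names are illustrative. The printed values are unrounded, so they may differ slightly from the rounded range quoted in the answer.

```python
# Base estimate from the 6 * N * T rule, as stated in the response.
BASE_FLOPS = 3.15e23

# Overhead allowance for non-matmul ops, embeddings, and optimizer updates.
OVERHEAD_RANGE = (0.10, 0.20)

# Apply the lower and upper uplift to the base estimate.
low, high = (BASE_FLOPS * (1 + frac) for frac in OVERHEAD_RANGE)

print(f"with +10% overhead: {low:.3e} FLOPs")
print(f"with +20% overhead: {high:.3e} FLOPs")
```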

AI Board Game Logic

reasoning·medium

Tests an AI's ability to understand game rules and strategy

Z.ai: GLM 5

GPT-5

Logic Puzzle

reasoning·medium

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

Z.ai: GLM 5

Sally has 1 sister.

Here is the breakdown:

  1. Since the brothers are all siblings, they share the same sisters.
  2. If each brother has 2 sisters, that means there are 2 girls in total in the family.
  3. Sally is one of those girls.
  4. Therefore, the other girl is Sally's only sister.
GPT-5

1

Reason: If each of the 3 brothers has 2 sisters, there are exactly 2 girls in the family. One is Sally, so she has 1 sister.
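The reasoning both responses use can be sketched in a few lines of Python. This assumes the puzzle as the responses describe it (Sally is a girl, and each of her brothers has 2 sisters); the function name is illustrative.

```python
def sallys_sisters(sisters_per_brother):
    """Number of sisters Sally has, given how many sisters each brother has."""
    # Every brother has the same sisters: all the girls in the family.
    girls_in_family = sisters_per_brother
    # Sally is one of those girls and is not her own sister.
    return girls_in_family - 1


print(sallys_sisters(2))  # prints 1
```

The number of brothers does not enter the calculation, which is what makes the puzzle potentially confusing.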
