
Z.ai: GLM 5 vs GPT-4o (Omni)

46 challenges

Why Z.ai: GLM 5?

3.8x cheaper overall: $0.80/M in · $2.56/M out
1.6x more context: 203K
21 months newer: Feb 2026

Why GPT-4o (Omni)?

No standout advantages

Estimate Complexity

reasoning·hard

Tests an AI's ability to make educated estimates based on technical knowledge

Z.ai: GLM 5

The total number of FLOPs required to train GPT‑3 is approximately 3.15 × 10²³.

Step‑by‑step reasoning:

  1. Model size: GPT‑3 has 175 billion (1.75 × 10¹¹) trainable parameters.

  2. Training data: According to the paper abstract, the model was trained on 300 billion (3 × 10¹¹) tokens.

  3. FLOPs per token for training a transformer:

    • Forward pass: each parameter is used in one multiply‑add operation → ~2 FLOPs per parameter → 2 N FLOPs per token.
    • Backward pass (gradient computation): roughly twice the forward cost → ~4 N FLOPs per token.
    • Total: ~6 N FLOPs per token.
      This approximation is standard for large transformer models and neglects small overheads like attention quadratic terms (which add only ~1–2% for GPT‑3’s sequence length).
  4. Calculation:
    Total FLOPs ≈ 6 × (1.75 × 10¹¹) × (3 × 10¹¹)
    = 6 × 5.25 × 10²²
    = 3.15 × 10²³.

  5. Consistency check: The GPT‑3 paper reports a training compute of 3,640 PF‑days. Converting:
    1 PF‑day = 10¹⁵ FLOP/s × 86400 s = 8.64 × 10¹⁹ FLOPs.
    3,640 PF‑days = 3,640 × 8.64 × 10¹⁹ ≈ 3.14 × 10²³ FLOPs, matching the estimate.

Thus, the total FLOPs required to train GPT‑3 is about 3.15 × 10²³.
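
A minimal Python sketch of this arithmetic (the 6·N·D rule plus the PF-day cross-check, using the figures cited above; purely illustrative):

```python
# Illustrative sketch: total training FLOPs ≈ 6 * N * D,
# with N = trainable parameters and D = training tokens (GPT-3 paper values).
N = 175e9                                               # trainable parameters
D = 300e9                                               # training tokens
print(f"6*N*D estimate:  {6 * N * D:.3e} FLOPs")        # ≈ 3.150e+23

# Cross-check against the reported 3,640 petaflop/s-days of compute.
PF_DAY = 1e15 * 86_400                                  # FLOPs in one petaflop/s-day
print(f"Reported compute: {3_640 * PF_DAY:.3e} FLOPs")  # ≈ 3.145e+23
```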

GPT-4o (Omni)

To estimate the total number of FLOPs required to train GPT-3, let's break it down into key components.

1. Model Size and Parameters

GPT-3 has 175 billion parameters.

2. FLOPs per Forward and Backward Pass

In deep learning, the FLOPs required for a forward pass and backward pass are estimated as:

  • Forward pass: ~2 × the number of parameters (FLOPs per token)
  • Backward pass: ~2 × the FLOPs of the forward pass

Thus, total FLOPs per training step:

Total FLOPs per step = 6 × number of parameters × tokens per batch

For GPT-3, let's assume 175B parameters and a batch size of 2048 tokens (common for large-scale training).

FLOPs per training step = 6 × 175 × 10⁹ × 2048

≈ 2.15 × 10¹⁵ FLOPs per step

3. Total Training Steps

GPT-3 was trained with 300 billion tokens (as per OpenAI's paper).

If the batch size is 2048 tokens, the total number of steps is:

Total steps = (300 × 10⁹) / 2048 ≈ 1.46 × 10⁸

4. Total FLOPs Required

Multiplying the FLOPs per step by the total number of steps:

Total FLOPs = (2.15 × 10¹⁵) × (1.46 × 10⁸)

≈ 3.14 × 10²³ FLOPs

Final Answer:

The estimated total number of FLOPs required to train GPT-3 is about 3.14 × 10²³ floating point operations.
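
The same tally as a short illustrative sketch; the 2048-token batch size is the assumption stated above, and it cancels out of the final product:

```python
# Illustrative sketch of the step-based tally: FLOPs per step × number of steps.
params = 175e9
total_tokens = 300e9
batch_tokens = 2048                                   # assumed batch size

flops_per_step = 6 * params * batch_tokens            # ≈ 2.15e15
steps = total_tokens / batch_tokens                   # ≈ 1.46e8
print(f"Total FLOPs: {flops_per_step * steps:.2e}")   # ≈ 3.15e+23
```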


AI Board Game Logic

reasoning·medium

Tests an AI's ability to understand game rules and strategy

Z.ai: GLM 5

GPT-4o (Omni)


Logic Puzzle

reasoning·medium

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

Z.ai: GLM 5

Sally has 1 sister.

Here is the breakdown:

  1. Since the brothers are all siblings, they share the same sisters.
  2. If each brother has 2 sisters, that means there are 2 girls in total in the family.
  3. Sally is one of those girls.
  4. Therefore, the other girl is Sally's only sister.
GPT-4o (Omni)

Sally has only one sister.

Explanation:

  • Sally has 3 brothers.
  • Each brother has 2 sisters.
  • The two sisters are Sally and one other girl.

Thus, Sally has 1 sister.
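
The counting argument as a tiny illustrative check:

```python
# Each brother sees 2 sisters, so the family has 2 girls;
# Sally's sisters are every girl except Sally herself.
girls_in_family = 2
sallys_sisters = girls_in_family - 1
print(sallys_sisters)   # -> 1
```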
