
Z.AI: GLM 4.5 vs GPT-4.1

35 challenges

Why Z.AI: GLM 4.5?

3.6x cheaper overall: $0.60/M in · $2.20/M out
Leads 1 of 1 benchmarks
Stronger on SWE-bench Verified: 64.2% vs 54.6%
3 months newer: Jul 2025

Why GPT-4.1?

8.2x more context: 1.0M

Estimate Complexity

reasoning·hard

Tests an AI's ability to make educated estimates based on technical knowledge

Z.AI: GLM 4.5

To estimate the total number of FLOPs (floating-point operations) required to train GPT-3 without searching online, we rely on known architectural details and standard computational models for transformer training. Here's the step-by-step reasoning:

Step 1: Key Parameters of GPT-3

  • Number of parameters (P): GPT-3 has 175 billion parameters (175 × 10⁹).
  • Training dataset size (D): The model was trained on approximately 300 billion tokens (300 × 10⁹ tokens). This is a standard figure associated with GPT-3.

Step 2: FLOPs per Token Calculation

Training a transformer model involves two main operations per token:

  1. Forward pass: Computes the model's output.
  2. Backward pass: Computes gradients for backpropagation.

For transformer architectures, the combined FLOPs for forward and backward passes per token is empirically estimated as 6 × P. This is derived from:

  • Forward pass: Requires ~2 × P FLOPs per token (matrix multiplications dominate, with each parameter used in multiply-add operations).
  • Backward pass: Typically requires ~2× the FLOPs of the forward pass due to gradient calculations (~4 × P FLOPs per token).
  • Total per token: 2P (forward) + 4P (backward) = 6P FLOPs per token.

Step 3: Total FLOPs for the Entire Dataset

The total FLOPs scales linearly with the dataset size:

  • Total FLOPs = FLOPs per token × Number of tokens.
  • Substituting the values:
    Total FLOPs = (6 × P) × D = 6 × (175 × 10⁹) × (300 × 10⁹).

Step 4: Calculation

  • Compute the product:
    6 × 175 × 300 × 10⁹ × 10⁹ = 6 × 52,500 × 10¹⁸.
  • Simplify:
    6 × 52,500 = 315,000.
    Total FLOPs = 315,000 × 10¹⁸ = 3.15 × 10²³ FLOPs.
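
The arithmetic above can be reproduced in a couple of lines (a minimal sketch of the 6 × P × D rule; values as assumed in Steps 1–2):

```python
# Total training FLOPs via the 6 * P * D rule of thumb.
P = 175e9  # parameters
D = 300e9  # training tokens
total_flops = 6 * P * D
print(f"{total_flops:.2e}")  # 3.15e+23
```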

Step 5: Validation with Known Compute

  • GPT-3 training reportedly used ~3,640 petaflop-days of compute.
  • Convert petaflop-days to FLOPs:
    1 petaflop-day = 10¹⁵ FLOPs × 86,400 seconds/day = 8.64 × 10¹⁹ FLOPs.
    3,640 petaflop-days = 3,640 × 8.64 × 10¹⁹ ≈ 3.14 × 10²³ FLOPs.
  • This closely matches our estimate of 3.15 × 10²³ FLOPs, confirming the calculation.
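
The unit conversion in the validation step can be checked the same way (assuming, as above, 1 petaflop-day = 10¹⁵ FLOP/s × 86,400 s):

```python
# Convert the reported compute (~3,640 petaflop-days) to raw FLOPs.
pf_day = 1e15 * 86_400        # FLOPs in one petaflop-day: 8.64e19
reported_flops = 3_640 * pf_day
print(f"{reported_flops:.2e}")  # 3.14e+23
```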

Step 6: Batch Size and Steps Consideration

  • Batch size: GPT-3 used a batch size of roughly 3.2 million tokens per step (≈1,600 sequences × 2,048 tokens each).
  • Number of training steps:
    Steps = Total tokens / Batch size = (300 × 10⁹) / (3.2 × 10⁶) = 93,750 steps.
  • FLOPs per step:
    Per step FLOPs = 6 × P × Batch size = 6 × 175 × 10⁹ × 3.2 × 10⁶ = 3.36 × 10¹⁸ FLOPs/step.
  • Total FLOPs:
    93,750 steps × 3.36 × 10¹⁸ FLOPs/step ≈ 3.15 × 10²³ FLOPs.
    This consistency reinforces the estimate.
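
The step-count cross-check is equally mechanical (a sketch; batch size and token count as assumed above):

```python
P = 175e9               # parameters
batch = 3.2e6           # tokens per step
tokens = 300e9          # total training tokens
steps = tokens / batch  # number of optimizer steps
flops_per_step = 6 * P * batch
total = steps * flops_per_step
print(f"{steps:.0f} steps, {total:.2e} FLOPs")  # 93750 steps, 3.15e+23 FLOPs
```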

Step 7: Negligible Factors

  • Parameter updates: Optimizer steps (e.g., Adam) add O(P) FLOPs per step. For 93,750 steps:
    Update FLOPs ≈ 93,750 × 175 × 10⁹ ≈ 1.64 × 10¹⁶ FLOPs (negligible vs. 3.15 × 10²³).
  • Embedding/softmax layers: Included in the 6P per-token estimate.
  • Attention mechanisms: Captured by the 6P heuristic (validated for transformers).

Final Estimate

The total FLOPs required to train GPT-3 is 3.15 × 10²³ FLOPs. This is derived from:

  • Standard transformer FLOPs per token (6P).
  • GPT-3 parameters (175B) and dataset size (300B tokens).
  • Cross-validation with reported compute usage (3,640 petaflop-days).

GPT-4.1

Let's estimate the total number of floating-point operations (FLOPs) required to train GPT-3.

Step 1: Recall GPT-3 scale

From published sources (I won't look this up now, as instructed):

  • GPT-3 (175B) has about 175 billion parameters.
  • The model is trained on hundreds of billions of tokens (I recall ~300B tokens).

Step 2: FLOPs per token per parameter

When training a transformer, each token is processed with one forward pass and one backward pass, and the combined cost works out to about 6 FLOPs per parameter per token (a figure I remember from the scaling-law literature).

To be more precise about the convention: the forward pass costs about 2 FLOPs per parameter per token (one multiply and one add per weight), and the backward pass is roughly twice as expensive (~4 FLOPs per parameter per token). Estimates of 2, 4, or 6 appear in the literature depending on what is counted (forward only, optimizer steps, auxiliary computations), but 6 is the standard figure for full training.

Let's use 6 FLOPs/parameter/token (a commonly cited "rule of thumb").

Step 3: Compute FLOPs

Formula

Total FLOPs = #params × #tokens × FLOPs per param per token

Plug in numbers:

  • #params ≈ 175 × 10⁹
  • #tokens ≈ 300 × 10⁹
  • FLOPs per param per token ≈ 6

Total FLOPs = 175 × 10⁹ × 300 × 10⁹ × 6

Let's multiply:

  • 175 × 300 = 52,500
  • 52,500 × 6 = 315,000
  • 10⁹ × 10⁹ = 10¹⁸

So,

Total FLOPs = 315,000 × 10¹⁸ = 3.15 × 10²³ FLOPs
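
The same product can be done in exact integer arithmetic as a sanity check on the rounding:

```python
params = 175 * 10**9
tokens = 300 * 10**9
total = params * tokens * 6    # exact integer product
print(total)                   # 315000000000000000000000
print(f"{total:.2e}")          # 3.15e+23
```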

Step 4: Order of Magnitude & Re-check

People sometimes write 3 × 10²³ for GPT-3, so our estimate matches the ballpark.

Step 5: Summary Sentence

Estimate:
It would take approximately 3 × 10²³ FLOPs to train GPT-3 (175B parameters, 300B tokens, 6 FLOPs/param/token).

Assumptions:

  • 175B parameters
  • 300B tokens
  • 6 FLOPs per parameter per token

In summary:

Estimated total training compute for GPT-3: ~3 × 10²³ FLOPs
(based on 175B parameters, 300B tokens, 6 FLOPs/parameter/token)


AI Board Game Logic

reasoning·medium

Tests an AI's ability to understand game rules and strategy

Z.AI: GLM 4.5

GPT-4.1


Logic Puzzle

reasoning·medium

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

Z.AI: GLM 4.5

Sally has 3 brothers, meaning there are 3 boys in the family. Each brother has 2 sisters, indicating that there are exactly 2 girls in the family. Since Sally is one of the girls, there must be one additional sister. Therefore, Sally has one sister.

This can be verified by considering the family composition:

  • Sisters: Sally and one other girl.
  • Brothers: 3 boys.

From Sally's perspective:

  • She has 3 brothers.
  • She has 1 sister (the other girl).

From each brother's perspective:

  • They have 2 sisters (Sally and the other girl).

This satisfies both conditions: Sally has 3 brothers, and each brother has 2 sisters.

If there were only one sister (Sally), then each brother would have only one sister, which contradicts the information that each brother has 2 sisters. If there were three sisters, then each brother would have three sisters, which also contradicts the information that each brother has 2 sisters. Thus, there must be exactly two sisters, meaning Sally has one sister.

Answer: 1

GPT-4.1

Let's break down the problem:

  • Sally is a girl.
  • She has 3 brothers.
  • Each brother has 2 sisters.

Let's name the siblings:

  • Sally (girl)
  • Brother 1
  • Brother 2
  • Brother 3

Now, the key: Each brother has 2 sisters.

Since Sally is one of the sisters, let's see if there are more:

Suppose there is only one girl (Sally) in the family.

  • Then each brother would have only 1 sister: Sally.
  • But the problem says they have 2 sisters each.

Thus, Sally must have one sister (since the brothers must have 2 sisters). Let's call this other sister "Anna".

So the siblings are:

  • Sally (girl)
  • Anna (girl)
  • Brother 1
  • Brother 2
  • Brother 3

Now, each brother has 2 sisters: Sally and Anna.

So, how many sisters does Sally have?

  • Sally's sisters = 1 (Anna).

Final Answer:

Sally has 1 sister.
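
Both answers reduce to a single constraint, which can be brute-forced in a few lines (a quick sketch; the variable names are illustrative, not from either response):

```python
boys = 3  # Sally's brothers
for girls in range(1, 6):           # candidate number of girls, Sally included
    sisters_per_brother = girls     # a brother's sisters are all the girls
    if sisters_per_brother == 2:    # the puzzle's constraint
        print(f"{girls} girls total -> Sally has {girls - 1} sister")
# -> 2 girls total -> Sally has 1 sister
```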
