Claude 3.7 Thinking Sonnet
Estimating GPT-3 Training FLOPs
I'll estimate the total number of floating-point operations (FLOPs) required to train GPT-3 by breaking the calculation down step by step.
Model Architecture Facts
- GPT-3 has approximately 175 billion parameters
- Uses a decoder-only transformer architecture with 96 layers
- Context length of 2048 tokens
- Trained on approximately 300 billion tokens
Calculating FLOPs per Token
For transformer models, we need to account for operations in:
- Self-attention mechanisms
- Feed-forward networks
- Layer normalization and other operations
For GPT-3, let's use these parameters:
- Embedding dimension (d): 12,288
- Number of layers (L): 96
- Context length (s): 2,048
Forward Pass Calculation
For each token in each layer, counting each multiply-add as 2 FLOPs:
- Self-attention: ~8 × d² FLOPs for the Q, K, V, and output projections, plus ~4 × d × s FLOPs for the attention scores and the weighted sum over values
- Feed-forward network: ~16 × d² FLOPs (two matrix multiplies through a 4d hidden layer)
- Layer normalization and other operations: ~d FLOPs (negligible)

Total FLOPs per token (forward pass):
≈ L × (24 × d² + 4 × d × s)
≈ 96 × (24 × 12,288² + 4 × 12,288 × 2,048)
≈ 96 × (3.6 × 10⁹ + 1.0 × 10⁸)
≈ 3.6 × 10¹¹ FLOPs

This matches the standard rule of thumb that a forward pass costs ≈ 2 FLOPs per parameter per token: 2 × 175 × 10⁹ ≈ 3.5 × 10¹¹.
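As a quick check, here is a minimal Python sketch of the per-token arithmetic above. The function name and the multiply-add accounting are my own framing for illustration, not anything from the GPT-3 paper:

```python
# Illustrative sketch: per-token forward-pass FLOPs for a GPT-3-scale
# decoder-only transformer. Counts each multiply-add as 2 FLOPs; ignores
# layer norms, softmax, and embeddings, which are lower-order terms.
def forward_flops_per_token(d: int, n_layers: int, seq_len: int) -> int:
    attn_proj = 8 * d * d        # Q, K, V, and output projections
    attn_mix = 4 * d * seq_len   # QK^T scores + attention-weighted values
    ffn = 16 * d * d             # two matmuls through a 4d hidden layer
    return n_layers * (attn_proj + attn_mix + ffn)

print(f"{forward_flops_per_token(12_288, 96, 2_048):.2e}")  # ≈ 3.58e+11
```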
Backward Pass
The backward pass typically requires about 2× the computation of the forward pass:
≈ 2 × 3.6 × 10¹¹ ≈ 7.2 × 10¹¹ FLOPs

Total FLOPs per token (forward + backward):
≈ 3.6 × 10¹¹ + 7.2 × 10¹¹ ≈ 1.1 × 10¹² FLOPs
Total Training FLOPs
Multiply by the total number of tokens processed:
≈ 1.1 × 10¹² × 300 × 10⁹ ≈ 3.2 × 10²³ FLOPs
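A short sketch of the end-to-end multiplication, under the same assumptions (the forward-pass figure and the 2× backward multiplier come from the steps above):

```python
# Total training FLOPs from the per-token forward cost derived above and the
# common approximation that the backward pass costs ~2x the forward pass.
FORWARD_FLOPS_PER_TOKEN = 3.6e11  # per-token forward cost from above
BACKWARD_MULTIPLIER = 2           # backward pass ~2x forward
TRAINING_TOKENS = 300e9           # ~300B training tokens

total = FORWARD_FLOPS_PER_TOKEN * (1 + BACKWARD_MULTIPLIER) * TRAINING_TOKENS
print(f"{total:.2e}")  # ≈ 3.24e+23 FLOPs
```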
Alternative Calculation
As a sanity check, using a simpler heuristic:
- Each parameter contributes ~6 FLOPs per token: 2 in the forward pass and 4 in the backward pass (the optimizer's weight update adds only ~O(N) FLOPs per batch, negligible by comparison)
- 175B parameters × 6 FLOPs × 300B tokens ≈ 3.15 × 10²³ FLOPs
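And the heuristic as a one-liner, assuming the standard C ≈ 6 × N × D approximation:

```python
# Sanity check via C ≈ 6 * N * D (N = parameters, D = training tokens).
N_PARAMS = 175e9
TOKENS = 300e9
print(f"{6 * N_PARAMS * TOKENS:.2e}")  # ≈ 3.15e+23 FLOPs
```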
Conclusion
The total training computation for GPT-3 was approximately 3.1-3.2 × 10²³ FLOPs (~310 zettaFLOPs), closely matching the figure of 3.14 × 10²³ FLOPs (about 3,640 petaflop/s-days) reported in the GPT-3 paper.
This massive computational requirement explains why training such models costs millions of dollars and requires specialized supercomputing infrastructure.