Claude Sonnet 4.6's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
GPT-3 has approximately 175 billion parameters.
For a transformer, the dominant operations are matrix multiplications. There's a useful rule of thumb:
FLOPs per token ≈ 2 × N (for the forward pass)
where N = number of parameters.
Why 2×N? In the matrix multiplications that dominate a transformer, each parameter participates in one multiply and one add (a multiply-accumulate) per token, so the forward pass costs roughly 2 FLOPs per parameter.
So forward pass FLOPs per token: $$2 \times 175 \times 10^9 = 3.5 \times 10^{11} \text{ FLOPs/token}$$
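To see where the factor of 2 comes from, here is a tiny, illustrative NumPy sketch (hypothetical layer dimensions, not actual GPT-3 code): a matrix-vector product with an m×n weight matrix costs about m·n multiplies plus m·n adds, i.e. ~2 FLOPs per parameter, and summing over every weight matrix in the model gives ~2×N per token.

```python
import numpy as np

# Illustrative check of the 2*N rule of thumb (hypothetical layer size,
# not GPT-3's actual dimensions).
m, n = 1024, 1024              # one weight matrix with m*n parameters
W = np.random.randn(m, n)
x = np.random.randn(n)         # one token's activation vector

y = W @ x                      # m*n multiplies + ~m*n adds

params = m * n
flops = 2 * m * n              # one multiply-accumulate = 2 FLOPs per parameter
print(flops / params)          # -> 2.0 FLOPs per parameter per token
```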
Backpropagation costs approximately 2× the forward pass, since gradients must be computed with respect to both the activations and the weights (each roughly a matmul of the same size as the forward one).
So total per token: $$3 \times 3.5 \times 10^{11} = 1.05 \times 10^{12} \text{ FLOPs/token}$$
GPT-3 was trained on approximately 300 billion tokens (the paper describes ~300B tokens drawn from filtered Common Crawl, WebText, books, and Wikipedia).
$$\text{Total FLOPs} = 1.05 \times 10^{12} \text{ FLOPs/token} \times 3 \times 10^{11} \text{ tokens}$$
$$= 3.15 \times 10^{23} \text{ FLOPs}$$
$$\approx \boxed{3 \times 10^{23} \text{ FLOPs}}$$
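A minimal Python sketch of the whole estimate, using the rounded figures above (175B parameters, 300B tokens, 3× training multiplier):

```python
# Back-of-the-envelope estimate using the rounded figures from this answer.
N = 175e9                                # parameters
D = 300e9                                # training tokens

forward_per_token = 2 * N                # ~3.5e11 FLOPs
train_per_token = 3 * forward_per_token  # forward + ~2x backward ~ 1.05e12
total_flops = train_per_token * D        # ~3.15e23

print(f"{total_flops:.2e} FLOPs")        # -> 3.15e+23 FLOPs
```

The table below summarizes the estimate.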
| Component | Value |
|---|---|
| Parameters | 175B |
| FLOPs/token (forward) | ~3.5 × 10¹¹ |
| Training multiplier (fwd+bwd) | ×3 |
| Training tokens | ~300B |
| Total FLOPs | ~3 × 10²³ |
The key formula to remember: C ≈ 6ND where N = parameters, D = training tokens.
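Expressed as a small helper (a sketch; the function name is hypothetical), this reproduces the same number in one line:

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: C ~= 6 * N * D."""
    return 6 * n_params * n_tokens

# GPT-3-scale check with the approximate values used above:
print(f"{training_flops(175e9, 300e9):.2e}")  # -> 3.15e+23, matching the step-by-step estimate
```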