Estimate Complexity
Claude Sonnet 4.5's response to the 'Estimate Complexity' challenge.
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
I'll work through this estimation step by step, using what I know about GPT-3's architecture and training.
Known/Estimated Parameters
Model size:
- GPT-3 (175B version) has ~175 billion parameters
Training data:
- Trained on ~300 billion tokens (as stated in the GPT-3 paper)
Architecture:
- Transformer decoder-only model
FLOPs Calculation
For transformer models, there's an approximate rule for FLOPs per token:
FLOPs per token ≈ 2P (forward pass) + 4P (backward pass) = 6P total
Where P = number of parameters
This factor of 6 comes from:
- 2 FLOPs per parameter in the forward pass (one multiply and one add for each weight as it acts on an activation)
- 4 FLOPs per parameter in the backward pass (gradients are needed with respect to both activations and weights, roughly doubling the forward cost; the full rule is written out below)
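For reference, here is the same approximation written out as a single formula. This is the standard C ≈ 6PT training-compute heuristic, which ignores attention-score FLOPs and other smaller terms:

```latex
C_{\text{train}} \;\approx\; 6\,P\,T
             \;=\; \underbrace{2\,P\,T}_{\text{forward}}
             \;+\; \underbrace{4\,P\,T}_{\text{backward}},
\qquad P = \text{parameter count},\quad T = \text{training tokens}.
```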
Calculation:
- Parameters (P): 175 × 10⁹
- Tokens (T): 300 × 10⁹
- FLOPs per token: 6P = 6 × 175 × 10⁹ = 1.05 × 10¹²
Total training FLOPs:
- Total = FLOPs per token × number of tokens
- Total = 1.05 × 10¹² × 300 × 10⁹
- Total ≈ 3.15 × 10²³ FLOPs (a quick numeric check of this arithmetic follows below)
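As a sanity check, here is a minimal Python sketch of the same arithmetic. The parameter and token counts are the estimates used above, not official figures:

```python
# Back-of-the-envelope GPT-3 training compute using C ≈ 6 * P * T.
params = 175e9   # P: estimated parameter count
tokens = 300e9   # T: estimated training tokens

flops_per_token = 6 * params              # 2P forward + 4P backward
total_flops = flops_per_token * tokens

print(f"FLOPs per token: {flops_per_token:.2e}")   # ~1.05e+12
print(f"Total training FLOPs: {total_flops:.2e}")  # ~3.15e+23
```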
Final Estimate
~3 × 10²³ FLOPs (roughly 300 zettaFLOPs)
This is roughly equivalent to running a single GPU that sustains 100 TFLOPS continuously for about 100 years, or more practically, running a few thousand such GPUs for a few weeks.
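To show where that hardware equivalence comes from, here is a short follow-up calculation. The sustained 100 TFLOPS per GPU and the 2,000-GPU cluster size are illustrative assumptions, not figures from the GPT-3 paper:

```python
# Convert total training FLOPs into wall-clock time at an assumed sustained rate.
total_flops = 3.15e23        # from the estimate above
gpu_flops_per_sec = 100e12   # assumed sustained 100 TFLOPS per GPU

seconds_single_gpu = total_flops / gpu_flops_per_sec
years_single_gpu = seconds_single_gpu / (365 * 24 * 3600)
print(f"Single GPU: ~{years_single_gpu:.0f} years")     # ~100 years

n_gpus = 2000                # illustrative cluster size
days_cluster = seconds_single_gpu / n_gpus / (24 * 3600)
print(f"{n_gpus} GPUs: ~{days_cluster:.0f} days")       # ~18 days
```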
About Claude Sonnet 4.5
Specifications
- Provider: Anthropic
- Released: 2025-09-29
- Size: LARGE
- Context: 200,000 tokens