Estimate Complexity
Claude Sonnet 4.5's response to the 'Estimate Complexity' challenge.
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
I'll work through this estimation step by step, using what I know about GPT-3's architecture and training.
Known/Estimated Parameters
Model size:
- GPT-3 (175B version) has ~175 billion parameters
Training data:
- Trained on ~300 billion tokens (as stated in the GPT-3 paper)
Architecture:
- Transformer decoder-only model
FLOPs Calculation
For transformer models, there's an approximate rule for FLOPs per token:
FLOPs per token ≈ 2P (forward pass) + 4P (backward pass) = 6P total
Where P = number of parameters
This factor of 6 comes from:
- 2 FLOPs per parameter in the forward pass (one multiply and one add for each weight as it acts on an activation)
- 4 FLOPs per parameter in the backward pass (gradients are needed with respect to both activations and weights, roughly doubling the forward cost; the full rule is written out below)
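For reference, here is the same approximation written out as a single formula. This is the standard C ≈ 6PT training-compute heuristic, which ignores attention-score FLOPs and other smaller terms:

```latex
C_{\text{train}} \;\approx\; 6\,P\,T
             \;=\; \underbrace{2\,P\,T}_{\text{forward}}
             \;+\; \underbrace{4\,P\,T}_{\text{backward}},
\qquad P = \text{parameter count},\quad T = \text{training tokens}.
```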
Calculation:
- Parameters (P): 175 × 10⁹
- Tokens (T): 300 × 10⁹
- FLOPs per token: 6P = 6 × 175 × 10⁹ = 1.05 × 10¹²
Total training FLOPs:
- Total = FLOPs per token × number of tokens
- Total = 1.05 × 10¹² × 300 × 10⁹
- Total ≈ 3.15 × 10²³ FLOPs (a quick numeric check of this arithmetic follows below)
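As a sanity check, here is a minimal Python sketch of the same arithmetic. The parameter and token counts are the estimates used above, not official figures:

```python
# Back-of-the-envelope GPT-3 training compute using C ≈ 6 * P * T.
params = 175e9   # P: estimated parameter count
tokens = 300e9   # T: estimated training tokens

flops_per_token = 6 * params              # 2P forward + 4P backward
total_flops = flops_per_token * tokens

print(f"FLOPs per token: {flops_per_token:.2e}")   # ~1.05e+12
print(f"Total training FLOPs: {total_flops:.2e}")  # ~3.15e+23
```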
Final Estimate
~3 × 10²³ FLOPs (roughly 300 zettaFLOPs)
This is roughly equivalent to running a single GPU that sustains 100 TFLOPS continuously for about 100 years, or more practically, running a few thousand such GPUs for a few weeks.
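To show where that hardware equivalence comes from, here is a short follow-up calculation. The sustained 100 TFLOPS per GPU and the 2,000-GPU cluster size are illustrative assumptions, not figures from the GPT-3 paper:

```python
# Convert total training FLOPs into wall-clock time at an assumed sustained rate.
total_flops = 3.15e23        # from the estimate above
gpu_flops_per_sec = 100e12   # assumed sustained 100 TFLOPS per GPU

seconds_single_gpu = total_flops / gpu_flops_per_sec
years_single_gpu = seconds_single_gpu / (365 * 24 * 3600)
print(f"Single GPU: ~{years_single_gpu:.0f} years")     # ~100 years

n_gpus = 2000                # illustrative cluster size
days_cluster = seconds_single_gpu / n_gpus / (24 * 3600)
print(f"{n_gpus} GPUs: ~{days_cluster:.0f} days")       # ~18 days
```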
About Claude Sonnet 4.5
Specifications
- Provider: Anthropic
- Released: 2025-09-29
- Size: LARGE
- Context: 200,000 tokens