Estimate Complexity
Claude Opus 4.5's response to the 'Estimate Complexity' challenge.
Claude Opus 4.5
November 24, 2025
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
I'll estimate the FLOPs required to train GPT-3 step by step, using what I know about the model architecture and training process.
Key Parameters for GPT-3
From memory:
- Parameters (P): 175 billion (175 × 10⁹)
- Training tokens (D): ~300 billion (300 × 10⁹)
- Context length: 2048 tokens
- Layers: 96
- Hidden dimension: 12,288
- Attention heads: 96
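As a quick consistency check on the 175-billion figure, the parameter count of a decoder-only transformer can be estimated from the architecture above: each layer contributes roughly 12 × d_model² weights (attention projections plus the MLP), and the token embedding adds vocab_size × d_model. The sketch below assumes a vocabulary of ~50,257 tokens (the GPT-2 BPE vocabulary), which is not listed above, and ignores biases and LayerNorm parameters as negligible.

```python
# Rough parameter-count check for GPT-3 from the architecture above.
# Assumption: vocab_size ~= 50,257 (GPT-2 BPE vocabulary).

n_layers = 96
d_model = 12_288
vocab_size = 50_257  # assumed, not listed in the spec above

attn_params_per_layer = 4 * d_model * d_model        # Q, K, V, and output projections
mlp_params_per_layer = 2 * d_model * (4 * d_model)   # up- and down-projections (4x expansion)
per_layer = attn_params_per_layer + mlp_params_per_layer  # = 12 * d_model^2

embedding = vocab_size * d_model

total = n_layers * per_layer + embedding
print(f"~{total / 1e9:.0f}B parameters")  # ~175B
```

The per-layer total works out to about 1.8 billion weights, so 96 layers land within a percent or two of the quoted 175B.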
The Standard Approximation Formula
There's a well-known approximation for transformer training FLOPs:
FLOPs ≈ 6 × P × D
Where:
- P = number of parameters
- D = number of training tokens
- The factor of 6 comes from:
  - 2 FLOPs per parameter per token for the forward pass (one multiply and one add per weight)
  - 4 FLOPs per parameter per token for the backward pass (roughly 2× the forward pass, since gradients are computed with respect to both activations and weights)
Calculation
FLOPs = 6 × P × D
FLOPs = 6 × (175 × 10⁹) × (300 × 10⁹)
FLOPs = 6 × 175 × 300 × 10¹⁸
FLOPs = 6 × 52,500 × 10¹⁸
FLOPs = 315,000 × 10¹⁸
FLOPs = 3.15 × 10²³
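The same arithmetic as a minimal Python sketch, using the estimated parameter and token counts above:

```python
# FLOPs ~= 6 * P * D, with the estimates used above.
P = 175e9   # parameters
D = 300e9   # training tokens

flops = 6 * P * D
print(f"{flops:.2e} FLOPs")  # ~3.15e+23
```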
Result
Approximately 3 × 10²³ FLOPs (or ~300 zettaFLOPs)
Sanity Check
This aligns with the compute OpenAI reported for GPT-3 (≈3.14 × 10²³ FLOPs, often quoted as ~3,640 petaflop/s-days). For context:
- Training on 1,000 V100 GPUs (~100 TFLOP/s each at FP16, assuming near-peak utilization) ≈ 10¹⁷ FLOP/s total
- Time needed: 3×10²³ / 10¹⁷ = 3×10⁶ seconds ≈ 35 days
This is plausible for a large-scale training run, which gives me confidence the estimate is in the right ballpark.
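A sketch of that sanity check, under the same assumptions (1,000 V100s at ~100 TFLOP/s each, near-peak utilization; real runs achieve lower utilization, so wall-clock time would be longer):

```python
# Back-of-the-envelope training time for ~3e23 FLOPs on a V100 cluster.
# Assumptions: 1,000 GPUs, ~100 TFLOP/s each at FP16, near-peak utilization.
total_flops = 3e23
n_gpus = 1_000
flops_per_gpu = 100e12  # ~100 TFLOP/s at FP16

cluster_flops = n_gpus * flops_per_gpu   # 1e17 FLOP/s
seconds = total_flops / cluster_flops    # ~3e6 s
print(f"{seconds / 86_400:.0f} days")    # ~35 days
```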
About Claude Opus 4.5
Capabilities
Conversation, Reasoning, Code Generation, Analysis, Tool Use
Categories
Text, Code, Multimodal
Specifications
- Provider: Anthropic
- Released: 2025-11-24
- Size: XLARGE