Estimate Complexity
GPT-5 Pro's response to the 'Estimate Complexity' challenge.
GPT-5 Pro
October 6, 2025
Text
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
Here’s a simple, standard way to estimate training FLOPs for a large dense Transformer like GPT-3:
- Use the “6 N T” rule of thumb
  - For dense Transformers, the total training compute (FLOPs) is well approximated by:
    FLOPs ≈ 6 × N × T
    where:
    - N = number of trainable parameters
    - T = number of tokens seen during training
  - Intuition for the 6: a forward pass costs roughly 2 FLOPs per parameter per token (one multiply and one add), and the backward pass costs roughly twice the forward pass, adding another ~4 FLOPs per parameter per token, for ~6 in total; per-token optimizer cost is negligible. Non-matmul overheads (layernorm, softmax, etc.) are smaller-order and largely folded into this constant.
- Plug in GPT-3’s parameter count
  - N ≈ 175 billion = 1.75 × 10^11
- Choose a reasonable token count
  - GPT-3 175B is commonly associated with roughly T ≈ 300 billion tokens = 3.0 × 10^11
  - This is consistent with the model being undertrained by modern “compute-optimal” standards (which would use many more tokens per parameter), but it matches the historical setup.
- Compute
  - FLOPs ≈ 6 × (1.75 × 10^11) × (3.0 × 10^11)
  - = 6 × 5.25 × 10^22
  - = 3.15 × 10^23 FLOPs
  - (A short Python sketch of this arithmetic appears after this list.)
- Caveats and range
  - Counting conventions: if one counts a fused multiply–add as 1 FLOP instead of 2, you’d get roughly half this value.
  - Real-world training often uses activation recomputation and has some overheads, which can add ~10–30% (or more) to the compute.
  - If the true token count were, say, 200–500B rather than exactly 300B, the estimate would scale linearly, to about 2.1 × 10^23 to 5.25 × 10^23 FLOPs. (A sensitivity sketch follows the final estimate below.)
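To make the arithmetic concrete, here is a minimal Python sketch of the 6 × N × T estimate using the figures above. The function name `training_flops` and its default constant are illustrative choices for this post, not part of any library.

```python
# Rule-of-thumb training compute for a dense Transformer: FLOPs ≈ 6 * N * T.
# N and T below are the GPT-3 figures used in the estimate above.

def training_flops(params: float, tokens: float, flops_per_param_token: float = 6.0) -> float:
    """Estimate total training FLOPs for a dense Transformer.

    flops_per_param_token ≈ 6 under the convention that a multiply-add
    counts as 2 FLOPs (forward ≈ 2, backward ≈ 4 per parameter per token).
    """
    return flops_per_param_token * params * tokens

N = 175e9   # trainable parameters (GPT-3 175B)
T = 300e9   # training tokens (~300B)

print(f"{training_flops(N, T):.2e} FLOPs")  # ~3.15e+23
```

Run as is, this prints roughly 3.15e+23, matching the estimate above.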
Final estimate
- About 3 × 10^23 FLOPs to train GPT-3 (175B) end to end, with a plausible range of roughly 2–5 × 10^23 depending on exact tokens and overheads.
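As a rough sanity check on the quoted range, the sketch below varies the token count and applies an assumed ~20% overhead factor plus the alternative FMA counting convention. The specific token counts and the 20% figure are assumptions chosen for illustration, not measured values.

```python
# Sensitivity check: how the 6 * N * T estimate moves with token count and overheads.
# Token counts and the overhead factor are illustrative assumptions.

N = 175e9  # GPT-3 175B parameters

for tokens in (200e9, 300e9, 500e9):
    base = 6.0 * N * tokens                 # rule-of-thumb estimate
    with_overhead = base * 1.2              # assume ~20% recomputation/overhead
    fma_convention = base / 2.0             # if a fused multiply-add counts as 1 FLOP
    print(f"T={tokens/1e9:.0f}B tokens: "
          f"base {base:.2e}, +20% overhead {with_overhead:.2e}, FMA=1 {fma_convention:.2e}")
```

The "base" column reproduces the ~2.1 × 10^23 to 5.25 × 10^23 range cited in the caveats.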
About GPT-5 Pro
Capabilities
Conversation, Reasoning, Code Generation, Analysis
Categories
Text, Code
Specifications
- Provider: OpenAI
- Released: 2025-10-06
- Size: XLARGE
- Context: 400,000 tokens