Estimate Complexity
GPT-5's response to the 'Estimate Complexity' challenge.
GPT-5
August 7, 2025
Text
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
Here’s a back‑of‑the‑envelope estimate using standard LM training-compute heuristics.
- Assumptions
- Model size (N): GPT-3 has about 175 billion parameters (N ≈ 1.75 × 10^11).
- Training tokens (T): GPT-3 was trained on roughly 300 billion tokens (T ≈ 3.0 × 10^11).
- FLOPs per token per parameter: A common rule of thumb for transformer LM training is FLOPs ≈ 6 × N × T. The factor of ~6 comes from counting the forward and backward passes through the parameter-heavy matmuls (roughly 2N FLOPs per token for the forward pass + 4N for the backward pass). It ignores small terms from layernorms, softmax, embeddings, and optimizer updates, which we'll account for with a small overhead factor later; a short code sketch of this rule follows the list.
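A minimal Python sketch of this heuristic (the function and argument names are illustrative, not taken from any library):

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute for a dense transformer LM.

    Roughly 2*N FLOPs per token for the forward matmuls plus ~4*N for the
    backward pass gives ~6*N FLOPs per token, i.e. ~6*N*T in total.
    """
    return 6.0 * n_params * n_tokens
```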
- Core calculation
- 6 × N × T = 6 × (1.75 × 10^11) × (3.0 × 10^11)
- First multiply N × T: 1.75 × 3.0 = 5.25; powers: 10^11 × 10^11 = 10^22, so N × T = 5.25 × 10^22.
- Now multiply by 6: 6 × 5.25 × 10^22 = 31.5 × 10^22 = 3.15 × 10^23 FLOPs.
- Overheads and uncertainty
- Non-matmul ops (attention softmax, masking, residuals, layernorms), embeddings, and optimizer update steps add some overhead. A 10–20% uplift is a reasonable allowance.
- Applying +10–20%: 3.15 × 10^23 × (1.1 to 1.2) ≈ (3.5–3.8) × 10^23 FLOPs; a quick numeric check follows below.
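To make the arithmetic reproducible, here is a small self-contained Python check (the values are the assumptions used above, not official figures):

```python
N = 1.75e11                          # assumed GPT-3 parameter count
T = 3.0e11                           # assumed training tokens

core = 6.0 * N * T                   # 6*N*T heuristic
low, high = 1.1 * core, 1.2 * core   # +10-20% overhead allowance

print(f"core estimate : {core:.2e} FLOPs")          # ~3.15e+23
print(f"with overhead : {low:.2e} to {high:.2e}")   # ~3.47e+23 to 3.78e+23
```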
Answer
- Point estimate: ~3.15 × 10^23 FLOPs
- With a modest overhead allowance: roughly (3.5–3.8) × 10^23 FLOPs
This is a coarse estimate; the exact number depends on details like sequence length, activation checkpointing, implementation efficiency, and the precise token count.
About GPT-5
Capabilities
Conversation, Reasoning, Code Generation, Analysis, Planning, Memory, Agentic Tool Use, Function Calling
Categories
Text, Code, Multimodal, Vision, Audio
Specifications
- Provider
- OpenAI
- Released
- 2025-08-07
- Size
- XLARGE
- Parameters
- 1.8T
- Context
- 400,000 tokens