Estimate Complexity
o4-mini-high's response to a prompt that tests an AI's ability to make educated estimates based on technical knowledge.
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
Here is one way to get to the oft-quoted figure of "a few × 10^23 FLOPs" for training GPT-3. I will make rough but plausible assumptions and carry through the arithmetic step by step:

1. Model size – GPT-3 "175 B" has about N ≈ 1.75×10^11 trainable parameters.

2. Amount of training data – The paper reports training on roughly 3×10^11 tokens (≈ 300 billion tokens).

3. FLOPs per token per parameter
   – In a forward pass each parameter participates in one multiply + one add → ≈ 2 FLOPs per parameter per token.
   – Back-propagation through that parameter costs roughly twice the forward pass (another ≈ 4 FLOPs per parameter per token).
   – Folding in the small extra cost of optimizer updates, we can round this to ≈ 6 FLOPs per parameter per token.

4. Total FLOPs
   – FLOPs ≈ (FLOPs per parameter per token) × (number of parameters) × (number of tokens)
   – FLOPs ≈ 6 × (1.75×10^11) × (3×10^11)
   – ≈ 6 × 5.25×10^22
   – ≈ 3.15×10^23 floating-point operations

So you end up with a training cost on the order of 10^23–10^24 FLOPs (often quoted as ≈ 3×10^23 FLOPs). That is roughly 300 zettaFLOPs (300×10^21 FLOPs) in total.
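The arithmetic above can be checked with a short sketch. The parameter count, token count, and the 6-FLOPs-per-parameter-per-token rule of thumb are the assumptions stated in the response:

```python
# Rough FLOP estimate for training GPT-3, following the steps above.
# Assumption: ~6 FLOPs per parameter per token (forward ~2, backward ~4,
# optimizer cost folded in), per the reasoning in the response.
n_params = 1.75e11              # ~175 B trainable parameters
n_tokens = 3e11                 # ~300 B training tokens
flops_per_param_per_token = 6

total_flops = flops_per_param_per_token * n_params * n_tokens
print(f"{total_flops:.2e}")     # → 3.15e+23
```

Varying the per-token constant between 4 and 8 FLOPs moves the answer only within the same order of magnitude, which is why the estimate is quoted as "a few × 10^23".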
About OpenAI o4 Mini High
Specifications
- Provider: OpenAI
- Released: April 16, 2025
- Size: Medium
- Parameters: Not disclosed
- Context: 200,000 tokens