Estimate Complexity
GPT-4.1's response to the Estimate Complexity challenge.
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
Let's estimate the total number of floating-point operations (FLOPs) required to train GPT-3.
Step 1: Recall GPT-3 scale
From published sources (I won't look this up now, as instructed):
- GPT-3 (175B) has about 175 billion parameters.
- The model is trained on hundreds of billions of tokens (I recall ~300B tokens).
Step 2: FLOPs per token per parameter
When training a transformer, each token requires roughly 6 FLOPs per parameter: about 2 for the forward pass and about 4 for the backward pass (the backward pass costs roughly twice the forward pass). Estimates of 2, 3, or more appear in the literature depending on what is counted (optimizer updates, auxiliary computation, and so on), but 6 is the commonly cited figure for training.
Let's use 6 FLOPs/parameter/token (the standard rule of thumb).
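As a quick illustration, here is a minimal Python sketch of this rule of thumb (the 2/4 split between forward and backward passes is a heuristic approximation, not an exact accounting):

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: C ≈ 6 * N * D.

    Heuristic split: ~2 FLOPs/param/token for the forward pass,
    ~4 FLOPs/param/token for the backward pass.
    """
    forward = 2 * n_params * n_tokens
    backward = 4 * n_params * n_tokens
    return forward + backward
```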
Step 3: Compute FLOPs
Formula
\[ \text{Total FLOPs} = \#\text{params} \times \#\text{tokens} \times \text{FLOPs per param per token} \]
Plug in numbers:
- #params ≈ 175 × 10⁹
- #tokens ≈ 300 × 10⁹
- FLOPs per param per token ≈ 6
\[ \text{Total FLOPs} = 175 \times 10^9 \times 300 \times 10^9 \times 6 \]
Let's multiply:
- \(175 \times 300 = 52{,}500\)
- \(52{,}500 \times 6 = 315{,}000\)
- \(10^9 \times 10^9 = 10^{18}\)
So,
\[ \text{Total FLOPs} = 315{,}000 \times 10^{18} = 3.15 \times 10^{23} \text{ FLOPs} \]
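This multiplication is easy to double-check with a few lines of Python (purely a sanity check of the arithmetic above):

```python
n_params = 175e9   # 175 billion parameters
n_tokens = 300e9   # ~300 billion training tokens
flops_per_param_per_token = 6

total_flops = n_params * n_tokens * flops_per_param_per_token
print(f"{total_flops:.2e}")  # prints 3.15e+23
```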
Step 4: Order of Magnitude & Re-check
GPT-3's training compute is often quoted as roughly \(3 \times 10^{23}\) FLOPs, so our estimate lands in the expected ballpark.
Step 5: Summary Sentence
Estimate:
It would take approximately \(3 \times 10^{23}\) FLOPs to train GPT-3 (175B parameters, 300B tokens, 6 FLOPs/param/token).
Assumptions:
- 175B parameters
- 300B tokens
- 6 FLOPs per parameter per token
In summary:
Estimated total training compute for GPT-3: \(\boxed{\sim 3 \times 10^{23}}\) FLOPs
(based on 175B parameters, 300B tokens, 6 FLOPs/parameter/token)
About GPT-4.1
Specifications
- Provider: OpenAI
- Released: 2025-04-14
- Size: XLARGE
- Parameters: Not disclosed
- Context: 1,047,576 tokens