GPT-4.1's response to the Estimate Complexity challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Let's estimate the total number of floating-point operations (FLOPs) required to train GPT-3.
From published sources (I won't look this up now, as instructed), I recall that GPT-3 has about 175 billion parameters and was trained on roughly 300 billion tokens.
When training a transformer, each token is processed with one forward pass and one backward pass, and the backward pass costs roughly twice as much as the forward pass. The rule of thumb I remember from the general literature is about 6 FLOPs per parameter per token in total.
To be more precise about the convention: the forward pass takes about 2 FLOPs per parameter per token (one multiply and one add per weight), and the backward pass about 4, giving roughly 6 combined. Published estimates vary a little depending on what is counted (optimizer steps, attention, other auxiliary computations), but 6 is the standard figure.
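Written out per token, this back-of-the-envelope accounting (which ignores attention and optimizer overheads) is:
\[ \text{FLOPs per token} \approx \underbrace{2N}_{\text{forward}} + \underbrace{4N}_{\text{backward}} = 6N, \qquad N = \text{number of parameters}. \]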
Let's use 6 FLOPs/parameter/token (a commonly cited "rule of thumb").
\[ \text{Total FLOPs} = \text{\#params} \times \text{\#tokens} \times \text{FLOPs per param per token} \]
Plug in numbers:
\[ \text{Total FLOPs} = 175 \times 10^9 \times 300 \times 10^9 \times 6 \]
Let's multiply:
\[ 175 \times 300 = 52{,}500, \qquad 52{,}500 \times 6 = 315{,}000, \qquad 10^9 \times 10^9 = 10^{18} \]
So,
\[ \text{Total FLOPs} = 315{,}000 \times 10^{18} = 3.15 \times 10^{23} \text{ FLOPs} \]
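For a quick sanity check, here is a minimal Python sketch of the same arithmetic (the variable names are just illustrative):

```python
# Back-of-the-envelope check of the arithmetic above.
# All inputs are the assumptions stated in this estimate, not measured values.
n_params = 175e9                    # ~175 billion parameters
n_tokens = 300e9                    # ~300 billion training tokens
flops_per_param_per_token = 6       # ~2 (forward) + ~4 (backward), rule of thumb

total_flops = n_params * n_tokens * flops_per_param_per_token
print(f"{total_flops:.2e} FLOPs")   # -> 3.15e+23 FLOPs
```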
People sometimes quote \(3 \times 10^{23}\) FLOPs for GPT-3, so our estimate is in the right ballpark.
Estimate:
It would take approximately \(3 \times 10^{23}\) FLOPs to train GPT-3 (175B parameters, 300B tokens, 6 FLOPs/param/token).
Assumptions:
- 175 billion parameters
- 300 billion training tokens
- ~6 FLOPs per parameter per token (forward + backward)
In summary:
Estimated total training compute for GPT-3: \(\boxed{\sim 3 \times 10^{23}}\) FLOPs
(based on 175B parameters, 300B tokens, 6 FLOPs/parameter/token)