Gpt 5.4's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
A standard back-of-the-envelope estimate for training FLOPs is:
\[ \text{Training FLOPs} \approx 6 \times N \times D \]
where:
- \(N\) = number of model parameters
- \(D\) = number of training tokens
This is a common transformer scaling approximation.
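The standard accounting behind this approximation (a sketch, not specific to this response): each parameter participates in roughly 2 FLOPs per token in the forward pass (one multiply and one add), and the backward pass costs about twice the forward pass. Per token:

\[
\text{FLOPs per token} \approx \underbrace{2N}_{\text{forward}} + \underbrace{4N}_{\text{backward}} = 6N
\]

Multiplying by \(D\) tokens processed during training gives the \(6ND\) rule.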
GPT-3 is known to have about:
\[ N = 175 \text{ billion} = 1.75 \times 10^{11} \]
A typical quoted training token count is about:
\[ D = 300 \text{ billion} = 3 \times 10^{11} \]
First, compute the parameter-token product:
\[ N \times D = (1.75 \times 10^{11})(3 \times 10^{11}) = 5.25 \times 10^{22} \]
Now multiply by 6:
\[ 6 \times 5.25 \times 10^{22} = 31.5 \times 10^{22} = 3.15 \times 10^{23} \]
\[ \boxed{\text{Training GPT-3 required on the order of } 3 \times 10^{23} \text{ FLOPs}} \]
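The arithmetic above can be reproduced in a few lines; the constants are the commonly quoted GPT-3 figures used in the steps above:

```python
# Back-of-the-envelope training-FLOPs estimate: FLOPs ≈ 6 * N * D
N = 1.75e11  # GPT-3 parameter count (~175 billion)
D = 3e11     # training tokens (~300 billion)

flops = 6 * N * D
print(f"Estimated training FLOPs: {flops:.2e}")
```

This prints an estimate on the order of \(3.15 \times 10^{23}\), matching the hand calculation.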
This is usually quoted as a few times \(10^{23}\) FLOPs, which matches the well-known rough estimate.
If you want, I can also derive where the \(6ND\) rule comes from in more detail.