Deepseek V3.2 Exp's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Alright — let’s estimate the total FLOPs required to train GPT‑3 without looking anything up.
A common rule of thumb for training Transformer models is:

\[ \text{Training FLOPs} \approx 6 \times N \times D \]
Where:

- \( D \) is the number of model parameters,
- \( N \) is the number of training tokens.

This “6” factor comes from:

- ~2 FLOPs per parameter per token for the forward pass (one multiply and one add per weight), and
- ~4 FLOPs per parameter per token for the backward pass (gradients with respect to both activations and weights cost roughly twice the forward pass).
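As a minimal sketch, the rule of thumb can be written directly in code (the function name and the explicit 2/4 split are illustrative, not taken from any particular paper):

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate dense-Transformer training FLOPs via the 6 * N * D rule of thumb."""
    forward_flops = 2 * n_params * n_tokens   # ~one multiply-add per weight per token
    backward_flops = 4 * n_params * n_tokens  # backward pass is roughly 2x the forward
    return forward_flops + backward_flops
```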
GPT‑3 has multiple sizes; the largest is 175 billion parameters (175e9, or \( 1.75 \times 10^{11} \)).

So \( D = 1.75 \times 10^{11} \).
GPT‑3 was trained on ~300 billion tokens (from their paper).
So \( N = 3 \times 10^{11} \) tokens.
\[ \text{FLOPs} \approx 6 \times (1.75 \times 10^{11}) \times (3 \times 10^{11}) \]

First multiply: \( 1.75 \times 3 = 5.25 \).

Exponents: \( 10^{11} \times 10^{11} = 10^{22} \).

So:

\[ 6 \times 5.25 \times 10^{22} = 31.5 \times 10^{22} = 3.15 \times 10^{23} \ \text{FLOPs} \]
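As a quick sanity check on the arithmetic, a tiny sketch (variable names are just illustrative) reproduces the same number:

```python
# Plug the GPT-3 figures into the 6 * N * D rule of thumb.
params = 175e9   # D: model parameters
tokens = 300e9   # N: training tokens
print(f"{6 * params * tokens:.2e} FLOPs")  # 3.15e+23 FLOPs
```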
This formula is approximate. In practice, there might be extra operations from attention computation, optimizers, etc., but for large dense models, \( 6ND \) is standard.
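To gauge how much those extra terms matter, here is a hedged sketch that adds the per-layer attention score and weighted-sum FLOPs on top of the parameter term, using the published GPT‑3 175B architecture (96 layers, d_model = 12288, 2048-token context); the accounting itself is an approximation, not an exact operation count:

```python
# Compare the plain 6*N*D estimate with one that includes the attention term.
n_layer, d_model, n_ctx = 96, 12288, 2048   # GPT-3 175B architecture settings
params, tokens = 175e9, 300e9

# Forward pass per token: ~2 FLOPs per parameter, plus ~2 * n_layer * n_ctx * d_model
# for computing attention scores and the attention-weighted sum.
forward_per_token = 2 * params + 2 * n_layer * n_ctx * d_model
train_flops = 3 * forward_per_token * tokens   # backward pass ~ 2x the forward

print(f"with attention term : {train_flops:.2e}")           # ~3.19e+23
print(f"plain 6*N*D estimate: {6 * params * tokens:.2e}")    # 3.15e+23
```

Under these assumptions the attention term shifts the total by only about one percent, which is why \( 6ND \) is usually quoted on its own.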
The actual number from the GPT‑3 paper is often cited as ~\( 3.14 \times 10^{23} \) FLOPs, which matches our estimate almost exactly.
\[ \boxed{3.15 \times 10^{23}} \] FLOPs for training GPT‑3 (175B parameters on 300B tokens).