This response is part of Rival's open dataset of 5,600+ AI model responses.
Qwen3.5 35B A3B's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
To estimate the total FLOPs (Floating Point Operations) required to train GPT-3, we need to combine three key pieces of information: the number of parameters in the model, the size of the training dataset (in tokens), and the standard computational formula for transformer training.
Here is the step-by-step reasoning and calculation.
The most famous specification of GPT-3 is its parameter count. From the 2020 paper "Language Models are Few-Shot Learners" by Brown et al., we know that the largest model has about 175 billion parameters: $$ P \approx 1.75 \times 10^{11} $$
What matters for compute is not the raw size of the dataset but the total number of "tokens" (pieces of text) the model processes over the entire training run, counting any data that is seen more than once. GPT-3 was trained on roughly 300 billion tokens: $$ T \approx 3.0 \times 10^{11} $$
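To calculate the total compute required for training a Transformer model, we use a standard rule of thumb derived from the cost of the forward and backward passes. For each token, the forward pass costs about 2 FLOPs per parameter (one multiply and one add), and the backward pass costs roughly twice as much, about 4 FLOPs per parameter. That gives about 6 FLOPs per parameter per token.
Therefore, with $P$ parameters and $T$ training tokens, the total training FLOPs formula is: $$ \text{Total FLOPs} = 6 \times P \times T $$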
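The rule of thumb can be written as a small helper. This is a minimal sketch, assuming the usual split of roughly 2 FLOPs per parameter per token for the forward pass and 4 for the backward pass; the function and constant names are hypothetical and chosen only for illustration.

```python
# Rule-of-thumb cost per parameter per token.
# Assumed breakdown: about 2 FLOPs forward, about 4 FLOPs backward.
FORWARD_FLOPS_PER_PARAM_TOKEN = 2
BACKWARD_FLOPS_PER_PARAM_TOKEN = 4


def estimate_training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate total training compute as 6 * P * T.

    This ignores attention-specific terms and any hardware inefficiency,
    so it is an order-of-magnitude estimate, not a measurement.
    """
    flops_per_param_token = (
        FORWARD_FLOPS_PER_PARAM_TOKEN + BACKWARD_FLOPS_PER_PARAM_TOKEN
    )
    return flops_per_param_token * num_params * num_tokens
```

Keeping the forward and backward contributions as separate constants makes it easy to see where the factor of 6 comes from, or to estimate inference-only cost by using the forward term alone.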
Now we substitute the values from Steps 1 and 2 into the formula from Step 3.
$$ \text{Total FLOPs} = 6 \times (1.75 \times 10^{11}) \times (3.0 \times 10^{11}) $$
First, multiply the scalar numbers: $$ 6 \times 1.75 = 10.5 $$ $$ 10.5 \times 3.0 = 31.5 $$
Next, multiply the powers of 10: $$ 10^{11} \times 10^{11} = 10^{22} $$
Combine them: $$ 31.5 \times 10^{22} $$
Convert to standard scientific notation: $$ 3.15 \times 10^{23} $$
The result is approximately $3.15 \times 10^{23}$ FLOPs.
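The arithmetic above can be double-checked in a few lines. This sketch uses exact Python integers for the 175 billion parameters and 300 billion tokens quoted in this response; the variable names are illustrative.

```python
# Values used in this estimate (approximate, taken from the text above).
params = 175 * 10**9   # about 175 billion parameters
tokens = 300 * 10**9   # about 300 billion training tokens

# Total training compute under the 6 * P * T rule of thumb.
total_flops = 6 * params * tokens

# Format in scientific notation with two decimal places.
print(f"{total_flops:.2e}")  # prints 3.15e+23
```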
This is often rounded to $3 \times 10^{23}$, or equivalently written as 315 sextillion FLOPs ($315 \times 10^{21}$). The estimate agrees with the widely cited figure for GPT-3 training of roughly $3.1 \times 10^{23}$ FLOPs.
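To make the unit conversions concrete, the short sketch below expresses the estimate in sextillions and in petaflop/s-days. The petaflop/s-day conversion is an addition not made in the response itself; it assumes the usual definition of $10^{15}$ FLOPs per second sustained for one 86,400-second day.

```python
SEXTILLION = 10**21                 # short-scale sextillion
PETAFLOP_S_DAY = 10**15 * 86_400    # FLOPs in one day at 1 petaFLOP/s (assumed definition)

total_flops = 315 * 10**21          # the 6 * P * T estimate from above

in_sextillions = total_flops / SEXTILLION
in_petaflop_s_days = total_flops / PETAFLOP_S_DAY

print(f"{in_sextillions:.0f} sextillion FLOPs")
print(f"about {in_petaflop_s_days:.0f} petaflop/s-days")
```

Under these assumptions the estimate comes out to a few thousand petaflop/s-days, which is a convenient scale for comparing against hardware throughput.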
Based on the parameter count of 175 billion, a training set of 300 billion tokens, and the standard $6 \times \text{Params} \times \text{Tokens}$ formula for Transformer training:
The estimated total number of FLOPs required to train GPT-3 is approximately $3.15 \times 10^{23}$.