Qwen3 Max's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Estimating the FLOPs required to train GPT-3 without looking it up is a classic “Fermi estimation” problem: break the quantity down into components we can reason about, using approximations and known scaling laws.
Let’s go step by step.
GPT-3 has:
- ~175 billion parameters (1.75 × 10¹¹)
- ~300 billion training tokens (3 × 10¹¹)
These numbers are widely cited and should be roughly accurate from memory.
A common rule of thumb in deep learning is:
Training a transformer model requires approximately 6 FLOPs per parameter per token.
Why 6?
- The forward pass costs ~2 FLOPs per parameter per token (one multiply and one add for each weight).
- The backward pass costs roughly twice the forward pass, ~4 FLOPs per parameter per token (gradients with respect to both activations and weights).
So: 2 (forward) + 4 (backward + update) ≈ 6 FLOPs/param/token.
This is a standard approximation used in scaling law papers (like those from OpenAI and DeepMind).
So:
FLOPs = (number of parameters) × (number of tokens) × (FLOPs per param per token)
= (1.75 × 10¹¹) × (3 × 10¹¹) × 6
Let’s compute:
First, 1.75 × 3 = 5.25
Then, 10¹¹ × 10¹¹ = 10²²
So: 5.25 × 10²² × 6 = 31.5 × 10²² = 3.15 × 10²³
~3 × 10²³ FLOPs
Or, written out: 300 sextillion FLOPs
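As a minimal sketch, here is the same arithmetic in Python; the parameter and token counts are the remembered figures used above:

```python
# Back-of-envelope FLOP estimate for training GPT-3,
# using the ~6 FLOPs per parameter per token rule of thumb.
params = 1.75e11               # ~175 billion parameters (from memory)
tokens = 3e11                  # ~300 billion training tokens (from memory)
flops_per_param_per_token = 6  # ~2 forward + ~4 backward

total_flops = params * tokens * flops_per_param_per_token
print(f"{total_flops:.2e} FLOPs")  # 3.15e+23
```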
Let’s see if this makes sense.
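As a hedged sketch of one such check, we can convert the FLOP total into GPU-time. GPT-3 was reportedly trained on V100-class GPUs; the peak throughput and utilization figures below are assumptions for illustration, not numbers from the original reasoning:

```python
# Hedged sanity check: convert the FLOP total into GPU-time.
# The hardware numbers below are illustrative assumptions,
# not figures from the estimate itself.
total_flops = 3.15e23
peak_flops_per_gpu = 1.25e14   # ~125 TFLOPS: FP16 peak of a V100-class GPU
utilization = 0.25             # assumed fraction of peak actually sustained

gpu_seconds = total_flops / (peak_flops_per_gpu * utilization)
gpu_days = gpu_seconds / 86400
print(f"{gpu_days:,.0f} GPU-days")                   # ~116,667 GPU-days
print(f"{gpu_days / 1000:,.0f} days on 1,000 GPUs")  # ~117 days
```

Roughly 10⁵ GPU-days, i.e. a few months across a thousand-plus GPUs, is a plausible scale for a 2020-era training run, so the estimate passes the smell test.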
Also, the commonly cited published estimate (which we avoided looking up, but can use as a post-hoc sanity check) is around 3.14 × 10²³ FLOPs, so this lines up.
Approximately 3 × 10²³ FLOPs are required to train GPT-3.
This estimate is derived from:
- the remembered model size (~175 billion parameters),
- the remembered training-set size (~300 billion tokens), and
- the standard ~6 FLOPs per parameter per token rule of thumb.
All approximated from memory and first principles.