Claude Opus 4.6's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Let me recall the key specs of GPT-3:

- Parameters: $N \approx 1.75 \times 10^{11}$ (175 billion)
- Training tokens: $D \approx 3 \times 10^{11}$ (300 billion)
There's a well-known approximation for Transformer training FLOPs:
$$C \approx 6 \times N \times D$$
where:

- $N$ = number of model parameters
- $D$ = number of training tokens

Why the factor of 6?
A Transformer's compute is dominated by matrix multiplications. For a matrix multiplication of shape (m × k) times (k × n), the FLOPs are ~2mkn (k multiplies and k−1 adds per output element ≈ 2k ops × mn outputs). Summing across all the weight matrices in the model, the forward pass works out to approximately 2 × N FLOPs per token, where N is the total parameter count. The backward pass costs roughly twice the forward pass (gradients flow to both the activations and the weights), bringing the total to about 6 × N FLOPs per token.
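To make the matmul accounting concrete, here is a minimal Python sketch (not part of the original response) with illustrative layer sizes; it just confirms that a single dense layer's forward pass costs about twice its weight count in FLOPs per token.

```python
# Sanity check of the "~2 FLOPs per weight per token" rule for a dense layer:
# a (tokens x k) @ (k x n) matmul costs about 2*tokens*k*n FLOPs
# (k multiplies and ~k adds per output element, tokens*n output elements).
# The layer sizes below are illustrative, not taken from GPT-3.

def dense_forward_flops(tokens: int, k: int, n: int) -> int:
    """FLOPs to multiply a (tokens x k) activation by a (k x n) weight matrix."""
    return 2 * tokens * k * n

k, n = 4096, 16384          # hypothetical hidden size and MLP width
num_weights = k * n
per_token = dense_forward_flops(tokens=1, k=k, n=n)

print(per_token)            # 134217728
print(2 * num_weights)      # same number: forward pass ≈ 2N FLOPs per token
```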
Plugging in the numbers:

$$C \approx 6 \times N \times D$$
$$C \approx 6 \times (1.75 \times 10^{11}) \times (3 \times 10^{11})$$
$$C \approx 6 \times 5.25 \times 10^{22}$$
$$C \approx 3.15 \times 10^{23} \text{ FLOPs}$$
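For readers who want to reproduce the arithmetic, the same calculation in a few lines of Python, using the parameter and token counts from above:

```python
# Reproduce the estimate C ≈ 6 * N * D with the numbers used above.
N = 1.75e11   # parameters: 175 billion
D = 3.0e11    # training tokens: 300 billion

C = 6 * N * D
print(f"{C:.2e} FLOPs")   # 3.15e+23 FLOPs
```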
The OpenAI paper reports 3.14 × 10²³ FLOPs, so this estimate is essentially exact, which makes sense: the 6ND formula is precisely the approximation the community (and OpenAI themselves) use.
$$\boxed{C \approx 3.15 \times 10^{23} \text{ FLOPs}}$$
This is about 315 zettaFLOPs, roughly equivalent to running thousands of V100 GPUs for weeks to months, which aligns with what we know about the training infrastructure used.
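As a rough cross-check of that hardware claim, the sketch below converts the total compute into GPU time; the per-GPU sustained throughput and the cluster size are assumptions chosen for illustration, not figures from the response.

```python
# Rough conversion of the total compute into GPU time.
# Both numbers below are illustrative assumptions, not from the response:
#   - 28e12 FLOP/s sustained mixed-precision throughput per V100
#   - a hypothetical cluster of 3,000 GPUs
C = 3.15e23
v100_sustained_flops = 28e12
seconds_per_year = 365 * 24 * 3600

gpu_seconds = C / v100_sustained_flops
print(f"{gpu_seconds / seconds_per_year:.0f} V100-years")        # ~357
print(f"{gpu_seconds / 3000 / 86400:.0f} days on 3,000 V100s")   # ~43
```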