GPT-5.1 Codex Max's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
What you need for a back-of-the-envelope is:
- the parameter count: \(N \approx 1.75\times10^{11}\) (175B),
- the training set size: \(T \approx 3\times10^{11}\) tokens (~300B),
- the rule of thumb that training costs \(\approx 6\) FLOPs per parameter per token (2 for the forward pass, 4 for the backward pass).
You can justify that 6 in a more explicit way by summing over a single layer. For GPT-3's configuration (\(d_{\text{model}} = 12{,}288\), \(d_{\text{ff}} = 49{,}152\), 96 layers, context length \(L = 2{,}048\)), the dominant multiply-adds per token are:
- attention Q/K/V and output projections: \(4\,d_{\text{model}}^2 \approx 6.0\times10^{8}\),
- attention scores and weighted sum at full context: \(2\,L\,d_{\text{model}} \approx 5.0\times10^{7}\),
- the two feed-forward matmuls: \(2\,d_{\text{model}}\,d_{\text{ff}} \approx 1.2\times10^{9}\).
Summing and converting multiply-adds to FLOPs gives \(\sim 3.7\times10^{9}\) FLOPs per layer per token. Across 96 layers that's \(\sim 3.6\times10^{11}\) FLOPs for a forward pass on one token; including the backward pass multiplies that by \(\approx 3\), giving \(\sim 1.1\times10^{12}\) FLOPs per token. Multiply by \(3\times10^{11}\) tokens and you get \(\sim 3\times10^{23}\) FLOPs.
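As a sanity check, here is a short Python sketch of the same arithmetic; the variable names are mine, and the figures are the ones used above:

```python
# Per-layer multiply-adds per token for GPT-3's configuration,
# reproducing the sum above.
d_model = 12_288   # hidden size
d_ff    = 49_152   # feed-forward inner size (4 * d_model)
n_layer = 96       # transformer layers
L       = 2_048    # context length
tokens  = 3e11     # ~300B training tokens

attn_proj = 4 * d_model**2        # Q, K, V, and output projections
attn_mix  = 2 * L * d_model       # scores + weighted sum at full context
mlp       = 2 * d_model * d_ff    # two feed-forward matmuls

macs_per_layer  = attn_proj + attn_mix + mlp
flops_per_layer = 2 * macs_per_layer         # 1 multiply-add = 2 FLOPs
fwd_per_token   = n_layer * flops_per_layer  # forward pass, all layers
train_per_token = 3 * fwd_per_token          # backward pass ~ 2x forward

print(f"{flops_per_layer:.2e} FLOPs/layer/token")        # ~3.7e9
print(f"{fwd_per_token:.2e} forward FLOPs/token")        # ~3.6e11
print(f"{train_per_token * tokens:.2e} training FLOPs")  # ~3e23
```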
Using the simpler \(6NT\) rule of thumb: \(6 \times 1.75\times10^{11} \times 3\times10^{11} \approx 3.15\times10^{23}\) floating-point operations.
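The rule-of-thumb version is a one-liner, using the same \(N\) and \(T\) as above:

```python
# Total training FLOPs via the 6*N*T rule of thumb.
N = 1.75e11   # parameters
T = 3e11      # training tokens
print(f"{6 * N * T:.2e} FLOPs")  # ~3.15e23
```

Both routes land within a factor of ~1.1 of each other, which is as close as a back-of-the-envelope estimate can be expected to get.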