Claude Sonnet 4.6's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
GPT-3 has approximately 175 billion parameters.
For a transformer, the dominant operations are matrix multiplications. There's a useful rule of thumb:
FLOPs per token ≈ 2 × N (for the forward pass)
where N = number of parameters.
Why 2×N? In the matrix multiplications that dominate a transformer, each parameter participates in one multiply and one add (a multiply-accumulate) per token, so the forward pass costs roughly 2 FLOPs per parameter.
So forward pass FLOPs per token: $$2 \times 175 \times 10^9 = 3.5 \times 10^{11} \text{ FLOPs/token}$$
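To see where the factor of 2 comes from, here is a tiny, illustrative NumPy sketch (hypothetical layer dimensions, not actual GPT-3 code): a matrix-vector product with an m×n weight matrix costs about m·n multiplies plus m·n adds, i.e. ~2 FLOPs per parameter, and summing over every weight matrix in the model gives ~2×N per token.

```python
import numpy as np

# Illustrative check of the 2*N rule of thumb (hypothetical layer size,
# not GPT-3's actual dimensions).
m, n = 1024, 1024              # one weight matrix with m*n parameters
W = np.random.randn(m, n)
x = np.random.randn(n)         # one token's activation vector

y = W @ x                      # m*n multiplies + ~m*n adds

params = m * n
flops = 2 * m * n              # one multiply-accumulate = 2 FLOPs per parameter
print(flops / params)          # -> 2.0 FLOPs per parameter per token
```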
Backpropagation costs approximately 2× the forward pass, since gradients must be computed with respect to both the activations and the weights (each roughly a matmul of the same size as the forward one).
So total per token: $$3 \times 3.5 \times 10^{11} = 1.05 \times 10^{12} \text{ FLOPs/token}$$
GPT-3 was trained on approximately 300 billion tokens (the paper describes ~300B tokens drawn from filtered Common Crawl, WebText, books, and Wikipedia).
$$\text{Total FLOPs} = 1.05 \times 10^{12} \text{ FLOPs/token} \times 3 \times 10^{11} \text{ tokens}$$
$$= 3.15 \times 10^{23} \text{ FLOPs}$$
$$\approx \boxed{3 \times 10^{23} \text{ FLOPs}}$$
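A minimal Python sketch of the whole estimate, using the rounded figures above (175B parameters, 300B tokens, 3× training multiplier):

```python
# Back-of-the-envelope estimate using the rounded figures from this answer.
N = 175e9                                # parameters
D = 300e9                                # training tokens

forward_per_token = 2 * N                # ~3.5e11 FLOPs
train_per_token = 3 * forward_per_token  # forward + ~2x backward ~ 1.05e12
total_flops = train_per_token * D        # ~3.15e23

print(f"{total_flops:.2e} FLOPs")        # -> 3.15e+23 FLOPs
```

The table below summarizes the estimate.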
| Component | Value |
|---|---|
| Parameters | 175B |
| FLOPs/token (forward) | ~3.5 × 10¹¹ |
| Training multiplier (fwd+bwd) | ×3 |
| Training tokens | ~300B |
| Total FLOPs | ~3 × 10²³ |
The key formula to remember: C ≈ 6ND where N = parameters, D = training tokens.
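Expressed as a small helper (a sketch; the function name is hypothetical), this reproduces the same number in one line:

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: C ~= 6 * N * D."""
    return 6 * n_params * n_tokens

# GPT-3-scale check with the approximate values used above:
print(f"{training_flops(175e9, 300e9):.2e}")  # -> 3.15e+23, matching the step-by-step estimate
```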