Sherlock Dash Alpha's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
The standard empirical formula for total FLOPs in training autoregressive Transformers (from papers like Kaplan et al., Hoffmann et al., and used for GPT-3 estimates) is:
\[ \text{Total FLOPs} \approx 6 \times N \times D \]
where \(N\) is the number of model parameters and \(D\) is the number of training tokens.
This estimate ignores training-time optimizations (e.g., activation checkpointing, which saves roughly 30% of memory but does not reduce the FLOP count).
\[ \text{Total FLOPs} = 6 \times (1.75 \times 10^{11}) \times (3 \times 10^{11}) = 3.15 \times 10^{23} \]
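A minimal Python sketch of the same arithmetic, using the 175B-parameter and 300B-token figures assumed above:

```python
# Plug the assumed GPT-3 specs into the 6*N*D rule of thumb.
N = 175e9          # parameters (175 billion)
D = 300e9          # training tokens (300 billion)
flops = 6 * N * D  # ~2*N*D for the forward pass, ~4*N*D for the backward pass

print(f"Estimated training FLOPs: {flops:.2e}")  # 3.15e+23
```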
| Variable | Base Value | If 20% Lower | If 20% Higher | Total FLOPs (Lower / Higher) |
|---|---|---|---|---|
| Parameters (N) | 175B | 140B | 210B | 2.52×10²³ / 3.78×10²³ |
| Tokens (D) | 300B | 240B | 360B | 2.52×10²³ / 3.78×10²³ |
| Multiplier | 6 | 5 (forward-heavy) | 7 (extra overhead) | 2.63×10²³ / 3.67×10²³ |
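A short Python sketch that recomputes this sensitivity table, varying one input at a time while the others stay at their base values (the low/high bounds mirror the rows above):

```python
# Recompute the sensitivity table: vary one input at a time between its
# low and high value while the other two stay at their base values.

base = {"N": 175e9, "D": 300e9, "k": 6}
ranges = {
    "N": (140e9, 210e9),  # parameters, +/-20%
    "D": (240e9, 360e9),  # tokens, +/-20%
    "k": (5, 7),          # multiplier: forward-heavy vs. extra overhead
}

def total_flops(N, D, k):
    return k * N * D

for var, (low, high) in ranges.items():
    lo = total_flops(**{**base, var: low})
    hi = total_flops(**{**base, var: high})
    print(f"{var}: {lo:.2e} / {hi:.2e} FLOPs")
```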
Final estimate: ~3.1 × 10²³ FLOPs (roughly 310 sextillion floating-point operations).
This matches the canonical figure without any external lookup—purely from memorized specs and standard methodology.