Horizon Alpha's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Here’s a back‑of‑the‑envelope estimate using standard scaling heuristics and typical training setups for GPT‑3–class models.
Assumptions
- GPT‑3 has N ≈ 175 billion parameters (1.75×10^11).
- It was trained on roughly T ≈ 300 billion tokens (3×10^11).
- Training compute is dominated by dense matrix multiplies, so FLOPs per token scale as a small constant c times N.
Step‑by‑step
FLOPs per token: FLOPs_per_token ≈ c × N. Take c ≈ 6 as a practical constant covering the forward and backward passes.
Total FLOPs: Total_FLOPs ≈ T × FLOPs_per_token ≈ T × (c × N)
Plug in the numbers: N = 1.75×10^11, T = 3×10^11, c ≈ 6.
Total_FLOPs ≈ 3×10^11 × 6 × 1.75×10^11 ≈ 3 × 6 × 1.75 × 10^(11+11) ≈ 31.5 × 10^22 ≈ 3.15×10^23 FLOPs
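As a quick arithmetic check, here is a minimal Python sketch of the same calculation; the values of N, T, and c are the assumed figures from the steps above, not measured quantities.

```python
# Back-of-the-envelope check of the central estimate above.
N = 1.75e11  # assumed GPT-3 parameter count (~175B)
T = 3e11     # assumed training tokens (~300B)
c = 6        # assumed FLOPs per parameter per token (forward + backward)

total_flops = c * N * T
print(f"Total training FLOPs ≈ {total_flops:.2e}")  # prints ≈ 3.15e+23
```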
Sanity checks and range
If we vary c between about 3 and 10 (to capture implementation differences and attention/MLP constants), we get:
- Low end: 3×10^11 × 3 × 1.75×10^11 ≈ 1.58×10^23 FLOPs
- High end: 3×10^11 × 10 × 1.75×10^11 ≈ 5.25×10^23 FLOPs
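The same sketch can sweep the constant c across the range above; the endpoints 3 and 10 are the assumed bounds discussed in the text, not measured values.

```python
# Sensitivity of the estimate to the per-token FLOP constant c.
N, T = 1.75e11, 3e11  # assumed parameter count and token count, as above
for c in (3, 6, 10):
    print(f"c = {c:>2}: {c * N * T:.2e} FLOPs")
# c =  3: 1.58e+23 (low end), c =  6: 3.15e+23, c = 10: 5.25e+23 (high end)
```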
Many public ballparks for GPT‑3 scale land in the 1e23–1e24 FLOPs range. Our central estimate of ~3×10^23 FLOPs sits comfortably in that band.
Conclusion
A reasonable estimate of the total compute required to train GPT‑3 (175B) on ~300B tokens is on the order of a few × 10^23 FLOPs, with a central estimate around 3×10^23 FLOPs.