O1's response to the "Estimate Complexity" challenge, which tests an AI's ability to make educated estimates based on technical knowledge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Below is one way to arrive at a back-of-the-envelope estimate for the total floating-point operations (FLOPs) needed to train GPT-3 (the 175B-parameter model). Note that different sources use slightly different assumptions, so you may see figures that differ by factors of 2–3; however, the overall scale (\(10^{23}\) FLOPs) remains consistent.
A widely used rule of thumb is that training a dense transformer costs about 6 FLOPs per parameter per training token.

Why 6?

- The forward pass takes roughly 2 FLOPs per parameter per token (one multiply and one add for each weight).
- The backward pass costs roughly twice the forward pass, or about 4 FLOPs per parameter per token.
- Adding these gives \(2 + 4 = 6\) FLOPs per parameter per token.

(Exact counts differ depending on the details of the architecture, hardware efficiency, and whether you count a multiply-add as one or two FLOPs, but 6 is a convenient ballpark multiplier.)
Using the 6 FLOPs rule of thumb:
\[ \text{Total FLOPs} \;\approx\; 6 \;\times\; N_p \;\times\; N_t. \]
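This rule of thumb is easy to encode as a quick sanity check. Here is a minimal Python sketch, assuming the 6 FLOPs/parameter/token heuristic above (the function name and default multiplier are illustrative choices, not from the original response):

```python
def training_flops(n_params: float, n_tokens: float,
                   flops_per_param_token: float = 6.0) -> float:
    """Back-of-the-envelope training cost for a dense transformer.

    Uses the ~6 FLOPs per parameter per token heuristic:
    ~2 FLOPs for the forward pass, ~4 for the backward pass.
    """
    return flops_per_param_token * n_params * n_tokens
```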
Plug in the numbers:

- \(N_p \approx 1.75 \times 10^{11}\) (GPT-3 has 175 billion parameters),
- \(N_t \approx 3.0 \times 10^{11}\) (GPT-3 was trained on roughly 300 billion tokens).

So,
\[ \text{Total FLOPs} \;=\; 6 \;\times\; (1.75 \times 10^{11}) \;\times\; (3.0 \times 10^{11}). \]
First, multiply the “nice” numbers: \(6 \times 1.75 = 10.5\), and \(10.5 \times 3.0 = 31.5\). So the coefficient is 31.5. Now include the powers of ten: \(10^{11} \times 10^{11} = 10^{22}\), giving \(31.5 \times 10^{22}\). Hence,
\[ \text{Total FLOPs} \;\approx\; 3.15 \times 10^{23}. \]
Therefore, a rough estimate for the total training cost of GPT-3 (175B) is on the order of
\[
\mathbf{3 \times 10^{23}} \text{ FLOPs.}
\]
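Plugging the GPT-3 numbers into the sketch above reproduces the hand calculation:

```python
gpt3_estimate = training_flops(n_params=1.75e11, n_tokens=3.0e11)
print(f"{gpt3_estimate:.2e} FLOPs")  # 3.15e+23 FLOPs
```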
Depending on exact architectural details, optimizer choices, hardware efficiency, and definitions of FLOPs vs. MACs (multiply-accumulate operations), public estimates generally range from around \(2 \times 10^{23}\) to \(4 \times 10^{23}\). But the important takeaway is that it is a few times \(10^{23}\) FLOPs.
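One way to see how such a spread arises is to vary the per-parameter multiplier in the sketch above; the bracketing values 4 and 8 below are illustrative, not taken from any particular source:

```python
for multiplier in (4.0, 6.0, 8.0):
    flops = training_flops(1.75e11, 3.0e11, flops_per_param_token=multiplier)
    print(f"multiplier {multiplier:.0f}: {flops:.2e} FLOPs")
# multiplier 4: 2.10e+23 FLOPs
# multiplier 6: 3.15e+23 FLOPs
# multiplier 8: 4.20e+23 FLOPs
```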
This simple calculation gives a ballpark figure without delving into the fine details of the GPT-3 architecture or specifics of training hardware.