Optimus Alpha's response estimating the FLOPs required to train GPT-3.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Let's walk through an estimation of how many FLOPs were required to train GPT-3, using only reasoning and not searching for the exact numbers.
GPT-3 (the original) had 175 billion parameters, and it was trained on roughly 300 billion tokens.
A common rule of thumb is that each parameter contributes about 2 FLOPs per token (one multiply and one add), so:
FLOPs per token = 2 × number of parameters
Total FLOPs = FLOPs per token × number of tokens processed
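Spelling out the per-token step with these numbers (nothing new here, just the rule above applied to 175 B parameters):

\[
\text{FLOPs per token} = 2 \times 175\,\text{B} = 350\,\text{B} = 3.5 \times 10^{11}
\]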
Now, multiply:
\[
\text{Total FLOPs} = 350\,\text{B} \times 300\,\text{B} = (3.5 \times 10^{11}) \times (3 \times 10^{11}) = 1.05 \times 10^{23}
\]
So,
\[
\boxed{ \approx 1 \times 10^{23} \text{ FLOPs} }
\]
This estimate is rough, but it lands in the right ballpark for models of this size. (Other nuances, such as multiple epochs or data repetition, may factor in, but using the total number of tokens processed, as above, is the standard approach.)
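As a quick sanity check on the arithmetic, here is a minimal Python sketch that reproduces the estimate under the same assumptions (175 B parameters, 300 B tokens, about 2 FLOPs per parameter per token):

```python
# Rough training-FLOP estimate for GPT-3, using the assumptions above.
params = 175e9                        # model parameters
tokens = 300e9                        # tokens processed during training
flops_per_token = 2 * params          # ~2 FLOPs per parameter per token
total_flops = flops_per_token * tokens

print(f"FLOPs per token: {flops_per_token:.2e}")  # ~3.50e+11
print(f"Total FLOPs:     {total_flops:.2e}")      # ~1.05e+23
```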
It takes about \( 1 \times 10^{23} \) FLOPs to train GPT-3.

This comes from: multiplying roughly 350 B FLOPs per token by roughly 300 B training tokens.

Assumptions: 175 billion parameters, about 300 billion training tokens, and about 2 FLOPs per parameter per token.

Notes: this is an order-of-magnitude figure; nuances such as data repetition or multiple training epochs could shift it somewhat.