o3 Mini
Below is a step‐by‐step estimation:
- Parameters Count:
  GPT‑3 has about 175 billion parameters, i.e. roughly 1.75×10¹¹ parameters.
- Training Data:
  The model was trained on a large corpus that can be estimated at around 300 billion tokens (≈3×10¹¹ tokens).
- FLOPs per Token:
  In training deep networks like GPT‑3, the forward pass involves roughly one multiply–accumulate (about 2 FLOPs) per parameter per token, and the backward pass (for computing gradients) costs about twice as much as the forward pass. A common rule of thumb is therefore that training requires roughly 6 FLOPs per parameter per token.
- Total FLOPs Calculation:
  The total number of FLOPs is therefore approximately:
  \[ \text{FLOPs} \approx 6 \times (\text{number of parameters}) \times (\text{number of tokens}) \]
  Plugging in the numbers:
  \[ \text{FLOPs} \approx 6 \times 1.75\times10^{11} \times 3\times10^{11} = 6 \times 5.25\times10^{22} \approx 3.15\times10^{23} \]
- Final Estimate:
  So, the training of GPT‑3 likely required on the order of 3×10²³ FLOPs.
This is an approximate calculation that combines rough estimates for the number of training tokens and for the factor accounting for the forward and backward passes. Variations in these estimates or in the exact training setup could change the final number by a factor of a few, but 3×10²³ FLOPs is a commonly cited ballpark figure.
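As a quick sanity check on the arithmetic above, here is a minimal Python sketch of the same estimate; the variable names are illustrative, and the inputs are simply the rough figures used in the bullets:

```python
# Back-of-envelope estimate of GPT-3 training compute: ~6 FLOPs per parameter per token.
n_params = 1.75e11           # ~175 billion parameters (rough figure)
n_tokens = 3e11              # ~300 billion training tokens (rough figure)
flops_per_param_token = 6    # ~2 FLOPs forward + ~4 FLOPs backward (rule of thumb)

total_flops = flops_per_param_token * n_params * n_tokens
print(f"Estimated training compute: {total_flops:.2e} FLOPs")
# Prints: Estimated training compute: 3.15e+23 FLOPs
```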
