Estimate Complexity
Devstral Medium's response to the 'Estimate Complexity' challenge.
Mistral Devstral Medium
July 11, 2025
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
To estimate the total number of FLOPs (floating point operations) required to train GPT-3, we'll use the following steps:
- Model size: GPT-3 has 175 billion parameters.
- Training tokens: GPT-3 was trained on 300 billion tokens.
- FLOPs per parameter per token: We'll use the common rule of thumb that training a transformer requires about 6 FLOPs per parameter per token (roughly 2 for the forward pass and 4 for the backward pass).
- Total FLOPs: Multiply the number of parameters by the number of training tokens and by the FLOPs per parameter per token.
Now let's calculate:
- Model size: 175 billion parameters = 175 * 10^9
- Training tokens: 300 billion tokens = 300 * 10^9
- FLOPs per parameter per token: 6
- Total FLOPs: (175 * 10^9) * (300 * 10^9) * 6 = (175 * 300 * 6) * 10^18 = 315,000 * 10^18 = 3.15 * 10^23 FLOPs
So, the estimated total number of FLOPs required to train GPT-3 is approximately 3.15 * 10^23 FLOPs.
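As a sanity check, the arithmetic is easy to reproduce in a few lines of Python (a minimal sketch; the parameter count, token count, and 6-FLOPs rule of thumb are the estimates used above, not measured values):

```python
# Back-of-the-envelope estimate of GPT-3 training compute.
params = 175e9                 # assumed parameter count (175 billion)
tokens = 300e9                 # assumed training tokens (300 billion)
flops_per_param_per_token = 6  # rule of thumb: ~6 FLOPs per parameter per token

total_flops = params * tokens * flops_per_param_per_token
print(f"Estimated training FLOPs: {total_flops:.2e}")  # -> 3.15e+23
```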
About Mistral Devstral Medium
Capabilities
Conversation, Reasoning, Code Generation, Analysis
Categories
Text, Code
Specifications
- Provider: Mistral
- Released: 2025-07-11
- Size: Medium