Estimate Complexity
Devstral Medium's response to the 'Estimate Complexity' challenge.
Mistral Devstral Medium
July 11, 2025
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
To estimate the total number of FLOPs (floating point operations) required to train GPT-3, we'll use the following steps:
- Model size: GPT-3 has 175 billion parameters.
- Training tokens: GPT-3 was trained on 300 billion tokens.
- FLOPs per parameter per token: We'll use the common rule of thumb that training a transformer requires about 6 FLOPs per parameter per token (roughly 2 for the forward pass and 4 for the backward pass).
- Total FLOPs: Multiply the number of parameters by the number of training tokens and by the FLOPs per parameter per token.
Now let's calculate:
- Model size: 175 billion parameters = 175 * 10^9
- Training tokens: 300 billion tokens = 300 * 10^9
- FLOPs per parameter per token: 6
- Total FLOPs: (175 * 10^9) * (300 * 10^9) * 6 = (175 * 300 * 6) * 10^18 = 315,000 * 10^18 = 3.15 * 10^23 FLOPs
So, the estimated total number of FLOPs required to train GPT-3 is approximately 3.15 * 10^23 FLOPs.
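As a sanity check, the arithmetic is easy to reproduce in a few lines of Python (a minimal sketch; the parameter count, token count, and 6-FLOPs rule of thumb are the estimates used above, not measured values):

```python
# Back-of-the-envelope estimate of GPT-3 training compute.
params = 175e9                 # assumed parameter count (175 billion)
tokens = 300e9                 # assumed training tokens (300 billion)
flops_per_param_per_token = 6  # rule of thumb: ~6 FLOPs per parameter per token

total_flops = params * tokens * flops_per_param_per_token
print(f"Estimated training FLOPs: {total_flops:.2e}")  # -> 3.15e+23
```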
About Mistral Devstral Medium
Capabilities
Conversation, Reasoning, Code Generation, Analysis
Categories
Text, Code
Specifications
- Provider: Mistral
- Released: 2025-07-11
- Size: Medium