Estimate Complexity
DeepSeek R1's response to a prompt that tests an AI's ability to make educated estimates based on technical knowledge.
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
To estimate the total number of FLOPs required to train GPT-3, we use the following reasoning:
- Model Parameters (P):
  GPT-3 has 175 billion parameters (\(P = 175 \times 10^9\)).
- Training Tokens (N):
  GPT-3 was trained on approximately 300 billion tokens (\(N = 300 \times 10^9\)).
- FLOPs per Token:
  - A forward pass requires ~\(2P\) FLOPs per token (each parameter involves a multiply-accumulate operation, which counts as 2 FLOPs).
  - A backward pass (for gradients) is roughly twice as expensive as the forward pass, requiring ~\(4P\) FLOPs per token.
  - Total FLOPs per token: \(2P + 4P = 6P\).
- Total Training FLOPs:
  Multiply FLOPs per token by the total number of tokens:
  \[ \text{FLOPs} = 6 \times P \times N = 6 \times 175 \times 10^9 \times 300 \times 10^9 = 3.15 \times 10^{23}\,\text{FLOPs}. \]
Final Answer:
The total number of FLOPs required to train GPT-3 is approximately \(3.15 \times 10^{23}\) FLOPs.
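As a quick sanity check on the arithmetic, here is a minimal Python sketch of the same estimate; the 175B-parameter and 300B-token figures are the assumptions from the reasoning above, not measured values from a training run:

```python
# Back-of-the-envelope estimate of GPT-3 training compute,
# using the standard FLOPs ~= 6 * P * N approximation.

P = 175e9  # model parameters (175 billion, assumed above)
N = 300e9  # training tokens (300 billion, assumed above)

flops_forward = 2 * P * N   # forward pass: ~2 FLOPs per parameter per token
flops_backward = 4 * P * N  # backward pass: ~2x the forward cost
total_flops = flops_forward + flops_backward  # equals 6 * P * N

print(f"Total training FLOPs: {total_flops:.2e}")  # ~3.15e+23
```

The 6PN rule of thumb counts only the dense matrix-multiply work and ignores overheads such as attention over long contexts, activation recomputation, and hardware utilization, so it is an order-of-magnitude estimate rather than an exact figure.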
About DeepSeek R1
Specifications
- Provider: DeepSeek
- Released: February 2025
- Size: Large
- Parameters: 671B total (37B active MoE)
- Context: 128,000 tokens