Estimate Complexity
DeepSeek R1's response to a prompt that tests an AI's ability to make educated estimates based on technical knowledge.
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
To estimate the total number of FLOPs required to train GPT-3, we use the following reasoning:
- Model Parameters (P):
  GPT-3 has 175 billion parameters (\(P = 175 \times 10^9\)).
- Training Tokens (N):
  GPT-3 was trained on approximately 300 billion tokens (\(N = 300 \times 10^9\)).
- FLOPs per Token:
  - A forward pass requires ~\(2P\) FLOPs per token (each parameter involves a multiply-accumulate operation, which counts as 2 FLOPs).
  - A backward pass (for gradients) is roughly twice as expensive as the forward pass, requiring ~\(4P\) FLOPs per token.
  - Total FLOPs per token: \(2P + 4P = 6P\).
- Total Training FLOPs:
  Multiply FLOPs per token by the total number of tokens:
  \[ \text{FLOPs} = 6 \times P \times N = 6 \times 175 \times 10^9 \times 300 \times 10^9 = 3.15 \times 10^{23}\,\text{FLOPs}. \]
Final Answer:
The total number of FLOPs required to train GPT-3 is approximately \(3.15 \times 10^{23}\) FLOPs.
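As a quick sanity check on the arithmetic, here is a minimal Python sketch of the same estimate; the 175B-parameter and 300B-token figures are the assumptions from the reasoning above, not measured values from a training run:

```python
# Back-of-the-envelope estimate of GPT-3 training compute,
# using the standard FLOPs ~= 6 * P * N approximation.

P = 175e9  # model parameters (175 billion, assumed above)
N = 300e9  # training tokens (300 billion, assumed above)

flops_forward = 2 * P * N   # forward pass: ~2 FLOPs per parameter per token
flops_backward = 4 * P * N  # backward pass: ~2x the forward cost
total_flops = flops_forward + flops_backward  # equals 6 * P * N

print(f"Total training FLOPs: {total_flops:.2e}")  # ~3.15e+23
```

The 6PN rule of thumb counts only the dense matrix-multiply work and ignores overheads such as attention over long contexts, activation recomputation, and hardware utilization, so it is an order-of-magnitude estimate rather than an exact figure.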
About DeepSeek R1
Specifications
- Provider: DeepSeek
- Released: February 2025
- Size: Large
- Parameters: 671B total (37B active MoE)
- Context: 128,000 tokens