Estimate Complexity
Kimi Linear 48B A3B Instruct's response to the 'Estimate Complexity' challenge.
?Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
Below is a back-of-the-envelope estimate of the total FLOPs needed to train GPT-3 “175 B” (the largest 96-layer, 12288-d model with 175 billion parameters) from scratch on the full 300 B-token corpus, assuming the standard transformer training setup used by OpenAI.
Step 1 – FLOPs per forward+backward pass
A single transformer block has
- 2 matrix-multiply operations per attention layer (QKᵀ and OV)
- 2 matrix-multiply operations per MLP (Wi and Wo)
- 1 extra attention output projection (optional, but small)
- 1 residual add per block
For 96 layers and 12288 hidden units, each matrix multiply is roughly
(12288²) ≈ 1.5 × 10⁸ FLOPs.
So per block: 2 attention + 2 MLP ≈ 8 × 1.5 × 10⁸ ≈ 1.2 × 10⁹ FLOPs.
Add residual adds and layer norms (~10 % overhead) ⇒ ≈ 1.3 × 10⁹ FLOPs per block.
With 96 blocks, forward+backward per token ≈ 1.25 × 10¹¹ FLOPs ≈ 125 GFLOPs.
Step 2 – Tokens processed per forward+backward step
The batch size is 3,200 sequences × 2,048 tokens = 6.55 million tokens.
Thus each SGD step consumes ≈ 6.6 × 10⁶ tokens.
Step 3 – Total SGD steps
300 billion tokens / 6.6 million tokens per step ≈ 45,500 steps.
Step 4 – Total FLOPs
45,500 steps × 125 GFLOPs/step ≈ 5.7 × 10¹² FLOPs ≈ 5.7 petaflop-days.
Step 5 – Overhead for data loading, optimizer states, and communication
- Adam: 8 extra copies of the parameter matrix (2 × 4 bytes per float32) → +50 %
- Gradient communication and padding → +20 %
- Data loading and tokenization → +10 %
Overall multiply by 1.8 ⇒ ≈ 1.0 × 10¹³ FLOPs ≈ 10 petaflop-days.
Step 6 – Round to one significant figure
≈ 10¹³ FLOPs, i.e. ten petaflop-days.
So a rough estimate is on the order of 10¹³ total floating-point operations to train GPT-3 175 B from scratch on 300 B tokens with the standard transformer architecture and training setup.
About Kimi Linear 48B A3B Instruct
Capabilities
Categories
Specifications
- Provider
- Moonshotai
- Released
- 2025-11-10
- Size
- XLARGE
- Context
- 1,048,576 tokens