Estimate Complexity
o4-mini's response to the challenge: Estimate GPT-3 training FLOPs.
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
Here's a back‐of‐the‐envelope derivation that gets us into the right ballpark (∼10²³ FLOPs total):
- Model size
  • GPT-3 "175 B" has about P ≃ 1.75×10¹¹ parameters.
- Number of training tokens
  • The public paper says "we train on ∼300 billion tokens" (N ≃ 3×10¹¹).
- FLOPs per token
  • Every forward+backward pass through the network touches all parameters.
  • Rough rule of thumb: about 6 FLOPs per parameter per token (roughly 2 multiply-adds per parameter in the forward pass, about twice that in the backward pass, plus a small extra for optimizer updates).
  • So 6 × P FLOPs/token ≃ 6 × 1.75×10¹¹ ≃ 1.05×10¹² FLOPs/token.
- Total FLOPs
  • Multiply FLOPs/token by the number of tokens (see the sketch after this list):
    Total ≃ (1.05×10¹² FLOPs/token) × (3×10¹¹ tokens) ≃ 3.15×10²³ FLOPs.
- Sanity check
  • 3.15×10²³ FLOPs ≃ 3.6×10³ PFLOP/s-days (since 1 PFLOP/s × 86 400 s ≈ 8.6×10¹⁹ FLOPs per day), in line with the ∼3.6×10³ petaflop/s-day figure reported for GPT-3 175 B.
  • At a sustained throughput of a few tens of TFLOP/s per V100, that works out to on the order of 10⁵ GPU-days, i.e. a few hundred GPU-years, consistent with the scale of the published training run (reproduced in the second sketch below).
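The arithmetic in the list above is easy to reproduce. A minimal sketch in Python, using the same round numbers as the response (illustrative figures, not exact GPT-3 training statistics):

```python
# Back-of-the-envelope estimate from the response above.
# P and N are the round figures quoted there (1.75e11 parameters,
# ~300 B training tokens), not exact GPT-3 training statistics.

P = 1.75e11                 # parameters in GPT-3 "175 B"
N = 3e11                    # training tokens (~300 billion)
FLOPS_PER_PARAM_TOKEN = 6   # ~2 forward + ~4 backward FLOPs per parameter per token

flops_per_token = FLOPS_PER_PARAM_TOKEN * P   # ≈ 1.05e12 FLOPs per token
total_flops = flops_per_token * N             # ≈ 3.15e23 FLOPs overall

print(f"FLOPs per token  : {flops_per_token:.2e}")
print(f"Total train FLOPs: {total_flops:.2e}")
```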
Bottom-line estimate: on the order of a few ×10²³ floating-point operations (≈3×10²³) to train GPT-3 175 B from scratch.
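The sanity-check unit conversion can be reproduced the same way. A minimal sketch, where the 30 TFLOP/s sustained throughput per V100 is an illustrative assumption rather than a figure from the response:

```python
# Convert the total-FLOP estimate into PFLOP/s-days and V100 GPU-days.
# The 30 TFLOP/s sustained throughput per GPU is an assumed, illustrative value.

SECONDS_PER_DAY = 86_400

total_flops = 3.15e23                       # from the estimate above
flops_per_pf_day = 1e15 * SECONDS_PER_DAY   # ≈ 8.6e19 FLOPs per PFLOP/s-day

pf_days = total_flops / flops_per_pf_day    # ≈ 3.6e3 PFLOP/s-days

v100_sustained_flops = 30e12                # assumed sustained FLOP/s per V100
gpu_days = total_flops / (v100_sustained_flops * SECONDS_PER_DAY)  # ≈ 1.2e5 GPU-days

print(f"{pf_days:.1e} PFLOP/s-days")
print(f"{gpu_days:.1e} V100 GPU-days (~{gpu_days / 365:.0f} GPU-years)")
```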
About OpenAI o4-mini
Specifications
- Provider: OpenAI
- Released: 2025-04-16
- Size: MEDIUM
- Parameters: Not disclosed