o4 Mini's response to the challenge: Estimate GPT-3 training FLOPs.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Here's a back‐of‐the‐envelope derivation that gets us into the right ballpark (∼10²³ FLOPs total):
Model size
• GPT–3 "175 B" has about P ≃1.75×10¹¹ parameters.
Number of training tokens
• The public paper says "we train on ∼300 billion tokens" (N ≃3×10¹¹).
FLOPs per token
• Every forward+backward through the network touches all parameters.
• Rough rule of thumb: about 6 FLOPs per parameter per token
(roughly 2 in the forward pass, one multiply and one add per parameter, and roughly 4 in the backward pass; optimizer updates add only a small extra on top).
• So 6 × P FLOPs/token ≃ 6 × 1.75×10¹¹ ≃ 1.05×10¹² FLOPs/token.
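As a quick numeric check, here is a minimal Python sketch of the per-token arithmetic, assuming the common ≈2 forward + ≈4 backward split of the 6 FLOPs/parameter figure (an approximation, not an exact operation count):

```python
# Per-token training cost under the common 6-FLOPs-per-parameter rule of thumb.
P = 1.75e11                   # GPT-3 parameter count (~175 B)
FORWARD_FLOPS_PER_PARAM = 2   # roughly one multiply + one add per parameter per token
BACKWARD_FLOPS_PER_PARAM = 4  # backward pass costs roughly twice the forward pass

flops_per_token = (FORWARD_FLOPS_PER_PARAM + BACKWARD_FLOPS_PER_PARAM) * P
print(f"{flops_per_token:.2e} FLOPs per token")  # ~1.05e+12
```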
Total FLOPs
• Multiply FLOPs/token by number of tokens:
Total ≃ (1.05×10¹² FLOPs/token) × (3×10¹¹ tokens)
≃ 3.15×10²³ FLOPs.
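The whole estimate reduces to the familiar C ≈ 6·P·N approximation. A small sketch under that assumption, with training_flops as a hypothetical helper name (not from the original response):

```python
def training_flops(params: float, tokens: float, flops_per_param: float = 6.0) -> float:
    """Approximate total training compute via the C ≈ 6 * P * N rule of thumb."""
    return flops_per_param * params * tokens

# GPT-3: ~1.75e11 parameters trained on ~3e11 tokens
print(f"{training_flops(1.75e11, 3e11):.2e} FLOPs")  # ~3.15e+23
```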
Sanity check
• 3×10²³ FLOPs is about 3.6×10³ PFLOP/s·days (since 1 PFLOP/s × 86,400 s ≈ 8.6×10¹⁹ FLOPs per day).
• At a sustained rate of a few tens of TFLOP/s per V100, that works out to tens of thousands of V100-GPU-days (a few hundred GPU-years), consistent with the ≈3.6×10³ petaflop/s-days of training compute reported for GPT-3.
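A sketch of the same conversion; the GPU-day figure depends strongly on the assumed sustained throughput, and the ~125 TFLOP/s tensor-core peak at ~30% utilization used below is an illustrative assumption, not a measured number:

```python
TOTAL_FLOPS = 3.15e23
SECONDS_PER_DAY = 86_400

# Petaflop/s-days: how many days one sustained PFLOP/s would need.
pflop_s_days = TOTAL_FLOPS / (1e15 * SECONDS_PER_DAY)
print(f"{pflop_s_days:.1e} PFLOP/s-days")  # ~3.6e+03

# V100-GPU-days, assuming ~125 TFLOP/s peak at ~30% utilization (illustrative).
v100_sustained_flops = 125e12 * 0.30
v100_gpu_days = TOTAL_FLOPS / (v100_sustained_flops * SECONDS_PER_DAY)
print(f"{v100_gpu_days:.1e} V100-GPU-days")  # ~9.7e+04, i.e. a few hundred GPU-years
```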
Bottom‐line estimate: on the order of 10²³–10²⁴ floating‐point operations to train GPT‑3 175 B from scratch.