Estimate Complexity
o4-mini-high's response to a prompt that tests an AI's ability to make educated estimates based on technical knowledge.
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
Here is one way to get to the oft-quoted figure of "a few × 10^23 FLOPs" for training GPT-3. I will make rough but plausible assumptions and carry through the arithmetic step by step:

1. Model size – GPT-3 "175 B" has about N ≈ 1.75×10^11 trainable parameters.

2. Amount of training data – The paper reports training on roughly 3×10^11 tokens (≈ 300 billion tokens).

3. FLOPs per token per parameter
   – In a forward pass each parameter participates in one multiply + one add → ≈ 2 FLOPs per parameter per token.
   – Back-propagation through that parameter costs roughly twice the forward pass (another ≈ 4 FLOPs per parameter per token).
   – Folding in the small extra cost of optimizer updates, we can round this to ≈ 6 FLOPs per parameter per token.

4. Total FLOPs
   – FLOPs ≈ (FLOPs per parameter per token) × (number of parameters) × (number of tokens)
   – FLOPs ≈ 6 × (1.75×10^11) × (3×10^11)
   – ≈ 6 × 5.25×10^22
   – ≈ 3.15×10^23 floating-point operations

So you end up with a training cost on the order of 10^23–10^24 FLOPs (often quoted as ≈ 3×10^23 FLOPs). That is roughly 300 zettaFLOPs (300×10^21 FLOPs) in total.
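The arithmetic above can be checked with a short sketch. The parameter count, token count, and the 6-FLOPs-per-parameter-per-token rule of thumb are the assumptions stated in the response:

```python
# Rough FLOP estimate for training GPT-3, following the steps above.
# Assumption: ~6 FLOPs per parameter per token (forward ~2, backward ~4,
# optimizer cost folded in), per the reasoning in the response.
n_params = 1.75e11              # ~175 B trainable parameters
n_tokens = 3e11                 # ~300 B training tokens
flops_per_param_per_token = 6

total_flops = flops_per_param_per_token * n_params * n_tokens
print(f"{total_flops:.2e}")     # → 3.15e+23
```

Varying the per-token constant between 4 and 8 FLOPs moves the answer only within the same order of magnitude, which is why the estimate is quoted as "a few × 10^23".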
About OpenAI o4 Mini High
Specifications
- Provider: OpenAI
- Released: April 16, 2025
- Size: Medium
- Parameters: Not disclosed
- Context: 200,000 tokens