GPT-4.1's response to the Estimate Complexity challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Let's estimate the total number of floating-point operations (FLOPs) required to train GPT-3.
From published sources (I won't look this up now, as instructed), I recall that GPT-3 has about 175 billion parameters and was trained on roughly 300 billion tokens.
When training a transformer, each token is processed with one forward pass and one backward pass, and the backward pass costs roughly twice as much as the forward pass. The rule of thumb I remember from the general literature is about 6 FLOPs per parameter per token in total.
To be more precise about the convention: the forward pass takes about 2 FLOPs per parameter per token (one multiply and one add per weight), and the backward pass about 4, giving roughly 6 combined. Published estimates vary a little depending on what is counted (optimizer steps, attention, other auxiliary computations), but 6 is the standard figure.
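Written out per token, this back-of-the-envelope accounting (which ignores attention and optimizer overheads) is:
\[ \text{FLOPs per token} \approx \underbrace{2N}_{\text{forward}} + \underbrace{4N}_{\text{backward}} = 6N, \qquad N = \text{number of parameters}. \]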
Let's use 6 FLOPs/parameter/token (a commonly cited "rule of thumb").
\[ \text{Total FLOPs} = \text{\#params} \times \text{\#tokens} \times \text{FLOPs per param per token} \]
Plug in numbers:
\[ \text{Total FLOPs} = 175 \times 10^9 \times 300 \times 10^9 \times 6 \]
Let's multiply:
\[ 175 \times 300 = 52{,}500, \qquad 52{,}500 \times 6 = 315{,}000, \qquad 10^9 \times 10^9 = 10^{18} \]
So,
\[ \text{Total FLOPs} = 315{,}000 \times 10^{18} = 3.15 \times 10^{23} \text{ FLOPs} \]
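For a quick sanity check, here is a minimal Python sketch of the same arithmetic (the variable names are just illustrative):

```python
# Back-of-the-envelope check of the arithmetic above.
# All inputs are the assumptions stated in this estimate, not measured values.
n_params = 175e9                    # ~175 billion parameters
n_tokens = 300e9                    # ~300 billion training tokens
flops_per_param_per_token = 6       # ~2 (forward) + ~4 (backward), rule of thumb

total_flops = n_params * n_tokens * flops_per_param_per_token
print(f"{total_flops:.2e} FLOPs")   # -> 3.15e+23 FLOPs
```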
People sometimes quote \(3 \times 10^{23}\) FLOPs for GPT-3, so our estimate is in the right ballpark.
Estimate:
It would take approximately \(3 \times 10^{23}\) FLOPs to train GPT-3 (175B parameters, 300B tokens, 6 FLOPs/param/token).
Assumptions:
- 175 billion parameters
- 300 billion training tokens
- ~6 FLOPs per parameter per token (forward + backward)
In summary:
Estimated total training compute for GPT-3: \(\boxed{\sim 3 \times 10^{23}}\) FLOPs
(based on 175B parameters, 300B tokens, 6 FLOPs/parameter/token)