Estimate Complexity
DeepSeek V3.2 Exp's response to the 'Estimate Complexity' challenge.
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
Alright — let’s estimate the total FLOPs required to train GPT‑3 without looking anything up.
1. Known basic formula
A common rule of thumb for the training cost of Transformer models is:
\[ \text{Training FLOPs} \approx 6 \times N \times D \]
Where:
- \( N \) = total number of tokens seen during training
- \( D \) = total number of parameters in the model
This factor of 6 comes from:
- Forward pass: ~\( 2 \times D \) FLOPs per token (2 FLOPs per parameter: one multiply and one add in the matrix multiplications)
- Backward pass: ~\( 4 \times D \) FLOPs per token (roughly twice the cost of the forward pass)
So \( 2 + 4 = 6 \) FLOPs per parameter per token during training.
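The rule of thumb above is easy to express as a one-line helper; this is a minimal sketch, and the function name `training_flops` is just illustrative.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training cost: ~6 FLOPs per parameter per token.

    The 6 is ~2 FLOPs/param/token for the forward pass plus ~4 for backward.
    """
    return 6 * n_params * n_tokens
```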
2. GPT‑3 parameters
GPT‑3 comes in multiple sizes; the largest has 175 billion parameters (\( 175 \times 10^{9} \), i.e. \( 1.75 \times 10^{11} \)).
So \( D = 1.75 \times 10^{11} \).
3. Training tokens
GPT‑3 was trained on ~300 billion tokens (per the original paper).
So \( N = 3 \times 10^{11} \) tokens.
4. Apply formula
\[ \text{FLOPs} \approx 6 \times (1.75 \times 10^{11}) \times (3 \times 10^{11}) \]
First multiply the mantissas: \( 1.75 \times 3 = 5.25 \)
Then combine the exponents: \( 10^{11} \times 10^{11} = 10^{22} \)
So:
\[ 6 \times 5.25 \times 10^{22} = 31.5 \times 10^{22} = 3.15 \times 10^{23} \ \text{FLOPs} \]
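As a quick arithmetic check, here is the same calculation as a self-contained Python snippet (variable names are just for illustration):

```python
n_params = 1.75e11   # D: 175B parameters
n_tokens = 3e11      # N: ~300B training tokens

flops = 6 * n_params * n_tokens
print(f"{flops:.3e}")  # 3.150e+23
```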
5. Adjust for reality
This formula is approximate. In practice there are extra operations from the attention computation, the optimizer update, and so on, but for large dense models \( 6ND \) is the standard estimate.
The figure usually cited from the GPT‑3 paper is ~\( 3.14 \times 10^{23} \) FLOPs, which matches our estimate almost exactly.
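To see why the attention term is only a small correction, here is a rough back-of-the-envelope sketch. The architecture numbers (96 layers, a 12,288-dimensional hidden state, 2,048-token context) are assumptions based on GPT-3 175B's published configuration, and the per-layer attention cost used here is only an order-of-magnitude approximation.

```python
# Rough estimate of the attention-score FLOPs that the 6*N*D rule leaves out.
# Architecture numbers are assumed from GPT-3 175B's published configuration.
n_layers = 96        # transformer layers (assumed)
d_model  = 12288     # hidden size (assumed)
n_ctx    = 2048      # training context length (assumed)
n_params = 1.75e11   # D

# Per token, per layer: ~2*n_ctx*d_model FLOPs for the Q·K^T scores and the
# same again for the attention-weighted values (forward pass only); multiply
# by 3 to cover forward + backward, mirroring the 6ND rule's 1:2 split.
attn_per_token  = 3 * (4 * n_layers * n_ctx * d_model)
dense_per_token = 6 * n_params

print(f"attention overhead: {attn_per_token / dense_per_token:.1%}")  # ~2.8%
```

At roughly the percent level, this overhead is well within the slack of the estimate.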
6. Final answer
\[ \boxed{3.15 \times 10^{23}} \] FLOPs for training GPT‑3 (175B parameters on 300B tokens).
About DeepSeek V3.2 Exp
Specifications
- Provider: DeepSeek
- Released: 2025-09-29
- Size: LARGE
- Context: 163,840 tokens