Gpt 5.4's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
A standard back-of-the-envelope estimate for training FLOPs is:
\[ \text{Training FLOPs} \approx 6 \times N \times D \]
where:
- \(N\) = number of model parameters
- \(D\) = number of training tokens
This is a common transformer scaling approximation.
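The standard accounting behind this approximation (a sketch, not specific to this response): each parameter participates in roughly 2 FLOPs per token in the forward pass (one multiply and one add), and the backward pass costs about twice the forward pass. Per token:

\[
\text{FLOPs per token} \approx \underbrace{2N}_{\text{forward}} + \underbrace{4N}_{\text{backward}} = 6N
\]

Multiplying by \(D\) tokens processed during training gives the \(6ND\) rule.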
GPT-3 is known to have about:
\[ N = 175 \text{ billion} = 1.75 \times 10^{11} \]
A typical quoted training token count is about:
\[ D = 300 \text{ billion} = 3 \times 10^{11} \]
First, compute the parameter-token product:
\[ N \times D = (1.75 \times 10^{11})(3 \times 10^{11}) = 5.25 \times 10^{22} \]
Now multiply by 6:
\[ 6 \times 5.25 \times 10^{22} = 31.5 \times 10^{22} = 3.15 \times 10^{23} \]
\[ \boxed{\text{Training GPT-3 required on the order of } 3 \times 10^{23} \text{ FLOPs}} \]
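The arithmetic above can be reproduced in a few lines; the constants are the commonly quoted GPT-3 figures used in the steps above:

```python
# Back-of-the-envelope training-FLOPs estimate: FLOPs ≈ 6 * N * D
N = 1.75e11  # GPT-3 parameter count (~175 billion)
D = 3e11     # training tokens (~300 billion)

flops = 6 * N * D
print(f"Estimated training FLOPs: {flops:.2e}")
```

This prints an estimate on the order of \(3.15 \times 10^{23}\), matching the hand calculation.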
This is usually quoted as a few times \(10^{23}\) FLOPs, which matches the well-known rough estimate.
If you want, I can also derive where the \(6ND\) rule comes from in more detail.