Estimate Complexity
Claude Haiku 4.5's response to the 'Estimate Complexity' challenge.
Prompt
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
Response
Estimating GPT-3 Training FLOPs
I'll work through this from first principles using publicly known GPT-3 specs.
Known Parameters
- Model size: 175 billion parameters
- Training tokens: ~300 billion tokens (the figure reported in the GPT-3 paper)
- Training approach: Standard decoder-only transformer (GPT-3 alternates dense and locally banded sparse attention layers, which barely changes the FLOP count)
Step 1: FLOPs per Token per Parameter
For a forward pass through a transformer:
- Each token attends to all previous tokens (causal attention), but the dominant cost is the dense weight-matrix multiplies
- Each parameter participates in roughly one multiply-accumulate per token, and a multiply-accumulate counts as 2 FLOPs
Rough estimate: ~2 FLOPs per parameter per token for the forward pass
Step 2: Account for Backward Pass
Standard backpropagation requires:
- Forward pass: 1x
- Backward pass: ~2x (gradients for weights and activations)
Total: ~6 FLOPs per parameter per token (2 forward + 4 backward)
Published accountings range from roughly 3 to 6 FLOPs per parameter per token depending on what is counted; I'll use 6, the standard rule of thumb.
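In symbols, Steps 1-2 amount to the widely used training-compute approximation below, where N is the parameter count and D the number of training tokens (a restatement of the rule of thumb, not a new result):

```latex
C_{\text{fwd}} \approx 2ND, \qquad
C_{\text{bwd}} \approx 4ND, \qquad
C_{\text{train}} \approx C_{\text{fwd}} + C_{\text{bwd}} \approx 6ND
```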
Step 3: Calculate Total
Total FLOPs = Parameters × Tokens × FLOPs-per-token-per-param
= 175 × 10^9 × 300 × 10^9 × 6
= 175 × 300 × 6 × 10^18
= 315,000 × 10^18
≈ 3.15 × 10^23 FLOPs
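A minimal Python sketch of the same arithmetic (the inputs are the estimates above, not measured values):

```python
# Rough estimate of GPT-3 training compute using the C ~ 6*N*D rule of thumb.
N_PARAMS = 175e9               # model parameters (reported GPT-3 size)
N_TOKENS = 300e9               # training tokens (estimate used above)
FLOPS_PER_PARAM_PER_TOKEN = 6  # ~2 forward + ~4 backward

total_flops = N_PARAMS * N_TOKENS * FLOPS_PER_PARAM_PER_TOKEN
print(f"Estimated training compute: {total_flops:.2e} FLOPs")  # ~3.15e+23
```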
Final Estimate
~3 × 10^23 FLOPs (on the order of 300 zettaFLOPs)
This aligns with OpenAI's reported ~3,640 petaflop/s-days, which converts to roughly 3 × 10^23 FLOPs.
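The unit conversion behind that cross-check is easy to reproduce; the sketch below assumes only the 3,640 petaflop/s-days figure cited above:

```python
# Convert the reported ~3,640 petaflop/s-days into total FLOPs.
PFLOP_PER_SECOND = 1e15
SECONDS_PER_DAY = 86_400
reported_pflops_days = 3_640

reported_flops = reported_pflops_days * PFLOP_PER_SECOND * SECONDS_PER_DAY
print(f"Reported compute: {reported_flops:.2e} FLOPs")  # ~3.14e+23, close to the estimate above
```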
Reasoning Check
- This represents roughly a few hundred V100 GPU-years at realistic utilization (GPT-3 was trained on V100-class hardware; see the sketch below for a rough validation)
- Cost would be $10-15M at 2020 cloud prices
- Consistent with OpenAI's publicly disclosed figures
Confidence: Moderate. The backward-pass multiplier is the biggest source of uncertainty, but the order of magnitude should be solid.
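As a rough sanity check on the GPU-time figure in the list above, the sketch below converts the FLOP estimate into GPU-years; the peak throughput and utilization values are illustrative assumptions, not reported numbers:

```python
# Hypothetical sanity check: convert the FLOP estimate into GPU-years.
# The throughput and utilization values below are assumptions, not measurements.
TOTAL_FLOPS = 3.15e23          # estimate derived above
ASSUMED_PEAK_FLOPS = 125e12    # assumed V100 tensor-core peak, FLOP/s
ASSUMED_UTILIZATION = 0.25     # assumed fraction of peak actually achieved
SECONDS_PER_YEAR = 365 * 24 * 3600

flops_per_gpu_year = ASSUMED_PEAK_FLOPS * ASSUMED_UTILIZATION * SECONDS_PER_YEAR
print(f"~{TOTAL_FLOPS / flops_per_gpu_year:.0f} GPU-years")  # a few hundred under these assumptions
```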
About Claude Haiku 4.5
Specifications
- Provider: Anthropic
- Released: 2025-10-15
- Size: LARGE
- Context: 200,000 tokens