
See how GPT-5.1 compares on this challenge:

  • GPT-5.1 vs Claude Sonnet 4.6
  • GPT-5.1 vs Gemini 3.1 Pro Preview
  • GPT-5.1 vs Z.ai: GLM 5

4 models have responded to the "Estimate Complexity" challenge. Compare their approaches side-by-side on RIVAL. This response is part of RIVAL's open dataset of 5,600+ AI model responses.


Estimate Complexity

GPT-5.1's response to the 'Estimate Complexity' challenge.

GPT-5.1
November 13, 2025
Text

Prompt

Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.

GPT-5.1
Response


The total training compute for GPT‑3 (175B) is on the order of \(10^{23}\) floating‑point operations. A reasonable point estimate is \(\sim 3\times 10^{23}\) FLOPs.

Here’s the step‑by‑step reasoning from basic principles.


1. Key known quantities

From the GPT‑3 paper (architecture details are well known):

  • Number of parameters \(N \approx 175 \text{ billion} = 1.75\times 10^{11}\)
  • Number of training tokens \(T \approx 300 \text{ billion} = 3\times 10^{11}\)

We’ll assume:

  • Standard dense Transformer decoder-only model
  • Standard training: forward pass + backward pass + optimizer step
  • We want the total training FLOPs over all tokens and all steps.

2. FLOPs per parameter per token

For a dense neural network, a single weight typically participates in one multiply-add per forward pass per “use”:

  • 1 multiplication + 1 addition ≈ 2 FLOPs per weight usage.

For large Transformers, most compute is in matrix multiplications in the attention and MLP blocks; these are roughly proportional to the total number of parameters, so a common back-of-the-envelope is:

  • Forward pass ≈ \(2N\) FLOPs per token
    (each parameter used once in a multiply-add on average)

Backward pass is more expensive because you’re computing gradients:

  • Backward pass is roughly 2× the forward pass cost in terms of FLOPs
    → ≈ \(4N\) FLOPs per token

On top of that you have:

  • Some extra overhead for:
    • Nonlinearities, layer norms, softmax
    • Optimizer updates (Adam, etc.)

These are smaller but non-negligible; a common heuristic is to lump them into a constant factor, giving:

\[ \text{FLOPs per token} \approx 6N \]

This “6” is a heuristic constant:

  • 2N (forward) + 4N (backward & optimizer & overhead) ≈ 6N

Some authors use factors between ~3 and ~6; using 6 is a conservative, widely used estimate in scaling law work.
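
To make the per-token cost concrete, here is a minimal Python sketch of the same accounting, using the parameter count from step 1. The 2×/4× split is the heuristic described above, not a measured profile.

```python
# Approximate training FLOPs per token for GPT-3 under the ~6N heuristic.
N = 1.75e11  # parameters (175B)

flops_forward = 2 * N               # one multiply-add (~2 FLOPs) per parameter per token
flops_backward = 2 * flops_forward  # backward pass taken as ~2x the forward pass
flops_per_token = flops_forward + flops_backward

print(f"forward : {flops_forward:.2e} FLOPs/token")   # ~3.5e11
print(f"backward: {flops_backward:.2e} FLOPs/token")  # ~7.0e11
print(f"total   : {flops_per_token:.2e} FLOPs/token") # ~1.05e12, i.e. 6N
```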


3. Total FLOPs formula

Total training FLOPs:

\[ \text{FLOPs}_{\text{total}} \approx (\text{FLOPs per token}) \times (\text{number of tokens}) \]

Using the heuristic:

\[ \text{FLOPs}_{\text{total}} \approx 6NT \]

Plug in GPT‑3’s values:

  • \(N = 1.75\times 10^{11}\)
  • \(T = 3\times 10^{11}\)

So:

\[
\begin{aligned}
\text{FLOPs}_{\text{total}} &\approx 6 \times (1.75\times 10^{11}) \times (3\times 10^{11}) \\
&= 6 \times 1.75 \times 3 \times 10^{22} \\
&= 6 \times 5.25 \times 10^{22} \\
&= 31.5 \times 10^{22} \\
&= 3.15 \times 10^{23} \text{ FLOPs}
\end{aligned}
\]

So a good estimate is:

\[ \boxed{\text{GPT‑3 (175B) training compute} \;\approx\; 3\times 10^{23}\ \text{FLOPs}} \]
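
The same arithmetic as a short Python sketch, with the values of N and T taken directly from the estimate above (nothing here is measured):

```python
# Total GPT-3 training compute via the back-of-the-envelope FLOPs_total ~ 6 * N * T.
N = 1.75e11  # parameters
T = 3e11     # training tokens

total_flops = 6 * N * T
print(f"total ~ {total_flops:.2e} FLOPs")  # ~3.15e+23
```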


4. Sense check (order of magnitude)

Even if our constant factor is off (say it’s 3 instead of 6, or 8 instead of 6), we’re still in the same ballpark:

  • With factor 3: \(\sim 1.6\times 10^{23}\) FLOPs
  • With factor 8: \(\sim 4.2\times 10^{23}\) FLOPs

So the order of magnitude, \(10^{23}\) FLOPs, is robust, and \(\sim 3\times 10^{23}\) FLOPs is a reasonable central estimate.
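
The sensitivity argument is easy to check numerically; this small sketch sweeps the constant factor over the values considered above (3, 6, 8):

```python
# How the total estimate moves with the per-token constant factor c in c * N * T.
N, T = 1.75e11, 3e11  # GPT-3 parameters and training tokens

for c in (3, 6, 8):
    print(f"factor {c}: {c * N * T:.2e} FLOPs")
# prints roughly 1.6e23, 3.2e23, and 4.2e23 FLOPs respectively
```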


About GPT-5.1

Capabilities: Conversation, Reasoning, Code Generation, Analysis

Categories: Text, Code, Multimodal

Specifications

  • Provider: OpenAI
  • Released: 2025-11-13
  • Size: XLARGE
  • Context: 400,000 tokens

