Rival
Models
Compare · Best For · Arena
Lab
Sign Up

We compare AI models for a living. On purpose. We chose this.

@rival_tips

Explore

  • Compare Models
  • All Models
  • Prompt Lab
  • Image Generation
  • Audio Comparison
  • Leaderboard
  • Challenges

Discover

  • Insights
  • AI Creators
  • AI Tools
  • The Graveyard

Developers

  • Developer Hub
  • MCP Server
  • .llmignore
  • Badges
  • Rival Datasets

Connect

  • Methodology
  • Sponsor
  • Partnerships
  • Privacy Policy
  • Terms
  • RSS Feed
© 2026 Rival · Built at hours no one should be awake, on hardware we don’t own

Qwen: Qwen3 Max — Estimate Complexity


See how Qwen: Qwen3 Max compares on this challenge:

  • Qwen: Qwen3 Max vs Claude Sonnet 4.6
  • Qwen: Qwen3 Max vs Google: Gemini 3.1 Flash Lite Preview
  • Qwen: Qwen3 Max vs Gemini 3.1 Pro Preview

4 models have responded to the "Estimate Complexity" challenge. Compare their approaches side-by-side on Rival. This response is part of Rival's open dataset of 5,600+ AI model responses.


Estimate Complexity

Qwen3 Max's response to the 'Estimate Complexity' challenge.

Qwen: Qwen3 Max
September 5, 2025
Text

Prompt

Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.

Qwen: Qwen3 Max
Response


Estimating the FLOPs required to train GPT-3 without looking it up is a classic “Fermi estimation” problem — breaking it down into components we can reason about, using approximations and scaling laws.

Let’s go step by step.


STEP 1: Recall GPT-3’s key specs (from memory)

GPT-3 has:

  • ~175 billion parameters (1.75 × 10¹¹)
  • Trained on ~300 billion tokens (3 × 10¹¹)

These numbers are widely cited and should be roughly accurate from memory.


STEP 2: FLOPs per parameter per token

A common rule of thumb in deep learning is:

Training a transformer model requires approximately 6 FLOPs per parameter per token.

Why 6?

  • The forward pass costs roughly 2 FLOPs per parameter per token (one multiply and one add for each weight).
  • The backward pass costs roughly twice the forward pass (gradients are needed with respect to both activations and weights), adding about 4 FLOPs per parameter per token.
  • The optimizer step scales with the parameter count alone, not parameters × tokens, so its cost is negligible by comparison.

So: 2 (forward) + 4 (backward) ≈ 6 FLOPs/param/token.

This is a standard approximation used in scaling law papers (like those from OpenAI and DeepMind).


STEP 3: Multiply parameters × tokens × FLOPs per param per token

So:

FLOPs = (number of parameters) × (number of tokens) × (FLOPs per param per token)

= (1.75 × 10¹¹) × (3 × 10¹¹) × 6

Let’s compute:

First, 1.75 × 3 = 5.25

Then, 10¹¹ × 10¹¹ = 10²²

So: 5.25 × 10²² × 6 = 31.5 × 10²² = 3.15 × 10²³
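
As a quick check, the arithmetic above can be reproduced in a few lines of Python (a minimal sketch of the 6 × N × D rule, using the same rounded figures from memory):

```python
# Fermi estimate of GPT-3 training compute: FLOPs ≈ 6 × params × tokens
n_params = 1.75e11          # ~175B parameters
n_tokens = 3e11             # ~300B training tokens
flops_per_param_token = 6   # ~2 forward + ~4 backward

total_flops = flops_per_param_token * n_params * n_tokens
print(f"{total_flops:.2e}")  # → 3.15e+23
```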


STEP 4: Final Estimate

~3 × 10²³ FLOPs

Or, written out: 300 sextillion FLOPs


STEP 5: Sanity Check

Let’s see if this makes sense.

  • GPT-2 (1.5B params, ~40B tokens) → 1.5e9 × 4e10 × 6 ≈ 3.6e20 FLOPs
  • GPT-3 has ~117× more parameters and ~7.5× more tokens → ~875× more FLOPs → 3.6e20 × 875 ≈ 3.15e23 → matching our estimate.
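
The same scaling check can be written out numerically (a sketch assuming the parameter and token counts recalled above):

```python
# Scale a GPT-2-sized estimate up to GPT-3 using the same 6 × N × D rule
gpt2_flops = 6 * 1.5e9 * 4e10      # ≈ 3.6e20 FLOPs
param_ratio = 1.75e11 / 1.5e9      # GPT-3 has ~117× more parameters
token_ratio = 3e11 / 4e10          # and 7.5× more training tokens

gpt3_flops = gpt2_flops * param_ratio * token_ratio
print(f"{gpt3_flops:.2e}")         # → 3.15e+23, matching the direct estimate
```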

Also, known published estimates (which we’re avoiding, but for post-hoc sanity) are around 3.14 × 10²³ — so this lines up.


✅ Final Answer:

Approximately 3 × 10²³ FLOPs are required to train GPT-3.

This estimate is derived from:

  • 175B parameters
  • 300B training tokens
  • 6 FLOPs per parameter per token (standard transformer training cost)

All approximated from memory and first principles.

This output lives in our open dataset — 5K+ responses across 200 models. Take it. It’s free.

About Qwen: Qwen3 Max

Capabilities

Conversation · Reasoning · Code Generation · Analysis · Translation · Tool Use

Categories

Text · Code

Specifications

Provider
Qwen
Released
2025-09-05
Size
XLARGE
Context
256,000 tokens

Keep exploring

SAME PROMPT

Claude Sonnet 4.6's version

Same prompt, different result

COMPARE

Qwen: Qwen3 Max vs Google: Gemini 3.1 Flash Lite Preview

Both outputs, side by side
