
See how Qwen3 Next 80B A3B Thinking compares on this challenge:

  • Qwen3 Next 80B A3B Thinking vs Claude Sonnet 4.6
  • Qwen3 Next 80B A3B Thinking vs Gemini 3.1 Pro Preview
  • Qwen3 Next 80B A3B Thinking vs Z.ai: GLM 5

4 models have responded to the "Estimate Complexity" challenge. Compare their approaches side-by-side on RIVAL. This response is part of RIVAL's open dataset of 5,600+ AI model responses.


Estimate Complexity

Qwen3 Next 80B A3B Thinking's response to the 'Estimate Complexity' challenge.

Qwen3 Next 80B A3B Thinking
September 12, 2025
Text

Prompt

Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.

Qwen3 Next 80B A3B Thinking
Response


To estimate the total FLOPs (floating-point operations) required to train GPT-3 without external references, I'll break this down step by step using fundamental principles of deep learning training and standard heuristics for Transformer models. This is a classic back-of-the-envelope calculation based on widely accepted knowledge in the field.


Step 1: Understand the key components

Training a large language model like GPT-3 involves:

  • Forward pass: Computing predictions from input data.
  • Backward pass: Computing gradients for parameter updates.
  • Total FLOPs = FLOPs from forward pass + FLOPs from backward pass across all training steps.

For Transformer-based models (like GPT-3), a well-established heuristic exists:

Total FLOPs ≈ 6 × (number of parameters) × (total training tokens)

This formula is derived from:

  • Forward pass: ~2 FLOPs per parameter per token (each parameter participates in one multiply-add, i.e. 2 FLOPs, in the forward matrix multiplications).
  • Backward pass: ~4 FLOPs per parameter per token (the backward pass computes gradients with respect to both activations and weights, roughly twice the work of the forward pass).
  • Total: 2 + 4 = 6 FLOPs per parameter per token.

This heuristic covers the parameter-dependent operations that dominate a Transformer (attention projections, feed-forward layers, layer norms, etc.), and the sequence length enters only implicitly, through the total token count. Why? (The short sketch after this list checks the cancellation numerically.)

  • For a sequence of length $L$, the FLOPs per sequence is ~$6N \times L$.
  • Total tokens $T = \text{number of sequences} \times L$, so total FLOPs $= 6N \times L \times (\text{number of sequences}) = 6N \times T$.
  • Thus, $L$ cancels out: only the total tokens processed ($T$) matters.
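
A minimal sketch of this cancellation, assuming nothing beyond the heuristic itself (the parameter count, sequence counts, and lengths below are illustrative, not GPT-3's):

```python
def training_flops(n_params: float, n_sequences: float, seq_len: float) -> float:
    """Approximate training FLOPs via the 6 * N * T heuristic,
    where T = n_sequences * seq_len is the total token count."""
    flops_per_token = 6 * n_params        # ~2 forward + ~4 backward per parameter
    total_tokens = n_sequences * seq_len
    return flops_per_token * total_tokens

# The same total token budget split into different sequence lengths
# yields the same estimate: only T matters, L cancels out.
n = 1e9  # illustrative parameter count
print(training_flops(n, n_sequences=1_000_000, seq_len=1_000))  # 6e+18
print(training_flops(n, n_sequences=500_000, seq_len=2_000))    # 6e+18
```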

Step 2: Recall known facts about GPT-3

From publicly documented details (without external searches):

  • Number of parameters ($N$): GPT-3 has 175 billion parameters (175 × 10⁹).
    • This is a well-known figure from the original GPT-3 paper (Brown et al., 2020).
  • Total training tokens ($T$): GPT-3 was trained on ~300 billion tokens (300 × 10⁹).
    • This comes from the paper's description of the training data (a filtered Common Crawl, WebText2, two book corpora, and Wikipedia), with roughly 300B tokens processed during training.

Step 3: Apply the formula

Plug the values into the heuristic:
$$ \text{Total FLOPs} = 6 \times N \times T = 6 \times (175 \times 10^9) \times (300 \times 10^9) $$

Break down the math (a quick code check follows the list):

  1. First, compute $N \times T$:
    $$ 175 \times 10^9 \times 300 \times 10^9 = (175 \times 300) \times 10^{18} = 52,500 \times 10^{18} = 5.25 \times 10^{22} $$
  2. Multiply by 6:
    $$ 6 \times 5.25 \times 10^{22} = 31.5 \times 10^{22} = 3.15 \times 10^{23} $$
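
The same arithmetic in a few lines of Python, using the values recalled in Step 2:

```python
N = 175e9  # GPT-3 parameters
T = 300e9  # training tokens

total_flops = 6 * N * T
print(f"{total_flops:.3e}")  # 3.150e+23
```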

Step 4: Validate the result

  • This estimate aligns with widely cited figures in the AI community (e.g., from follow-up papers like "Training Compute-Optimal Large Language Models" by Hoffmann et al.).
  • Actual values may vary slightly due to implementation details (e.g., mixed-precision training, optimizer overhead), but 3.15 × 10²³ FLOPs is the standard back-of-the-envelope estimate.
  • For context:
    • This is equivalent to ~315 zettaFLOPs (1 zettaFLOP = 10²¹ FLOPs), or roughly 3,640 petaFLOP/s-days, in line with the total-compute figure reported in the GPT-3 paper's appendix.
    • A single NVIDIA A100 GPU peaks at ~312 teraFLOP/s (3.12 × 10¹⁴ FLOPs per second) in FP16 with tensor cores, so even at 100% utilization the run corresponds to roughly 280,000 A100-hours of compute; in practice training was distributed across thousands of GPUs for weeks (the conversions are worked out in the sketch below).
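
A quick check of these conversions; the 312 teraFLOP/s figure is the A100's nominal FP16 tensor-core peak, and real sustained throughput is considerably lower, so actual GPU-hours would be several times higher:

```python
total_flops = 3.15e23

zettaflops = total_flops / 1e21                   # 1 zettaFLOP = 1e21 FLOPs
pf_days = total_flops / (1e15 * 86_400)           # petaFLOP/s-days
a100_peak = 312e12                                # FLOPs/s, FP16 tensor cores (nominal peak)
a100_hours = total_flops / a100_peak / 3_600      # single-GPU hours at 100% utilization

print(f"{zettaflops:.0f} zettaFLOPs")             # 315 zettaFLOPs
print(f"{pf_days:.0f} petaFLOP/s-days")           # 3646 petaFLOP/s-days
print(f"{a100_hours:,.0f} A100-hours at peak")    # 280,449 A100-hours at peak
```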

Why this estimate is reasonable

  • The 6N per token heuristic is empirically validated for Transformers; its main omission, the quadratic-in-length attention-score term, is quantified in the sketch after this list.
    • Forward pass: ~2N FLOPs/token (the parameter matrix multiplications dominate).
    • Backward pass: ~4N FLOPs/token (gradients with respect to both activations and weights each cost about as much as the forward pass).
  • Total tokens ($T$) is the correct metric because sequence length cancels out (as shown in Step 1).
  • GPT-3’s $N = 175B$ and $T = 300B$ are uncontroversial facts from its original documentation.
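
To quantify the heuristic's main omission: the attention-score and attention-weighted-sum matrix multiplies scale with sequence length rather than parameter count, so 6·N·T never counts them. A rough per-layer, per-token comparison, assuming GPT-3's published hidden width (d_model = 12,288), its 2,048-token training context, and the usual ~12·d² parameters per Transformer layer:

```python
d_model = 12_288  # GPT-3 hidden width (from the paper)
seq_len = 2_048   # GPT-3 training context length

# Per layer, per token, forward pass:
attn_score_flops = 4 * seq_len * d_model  # Q·K^T scores + attention-weighted sum of V
param_flops = 2 * 12 * d_model ** 2       # QKV/output projections (4*d^2) + MLP (8*d^2)

ratio = attn_score_flops / param_flops
print(f"omitted attention term ≈ {ratio:.1%} of the counted FLOPs")  # ≈ 2.8%
```

A few percent of uncounted compute does not change the order of magnitude, so the 6·N·T estimate stands.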

Final Answer

The total FLOPs required to train GPT-3 is approximately 3.15 × 10²³ FLOPs (about 315 zettaFLOPs, or roughly 3,640 petaFLOP/s-days). This estimate is derived from first principles and standard deep learning heuristics, without external references.


About Qwen3 Next 80B A3B Thinking

Capabilities

  • Conversation
  • Reasoning
  • Code Generation
  • Analysis

Categories

  • Text
  • Code

Specifications

  • Provider: Qwen
  • Released: 2025-09-11
  • Size: LARGE
  • Context: 65,536 tokens

Keep exploring

  • Same prompt: Claude Sonnet 4.6's version (same prompt, different result)
  • Compare: Qwen3 Next 80B A3B Thinking vs Gemini 3.1 Pro Preview (both outputs, side by side)
