See how Qwen Plus 0728 compares on this challenge:

  • Qwen Plus 0728 vs Claude Sonnet 4.6
  • Qwen Plus 0728 vs Gemini 3.1 Pro Preview
  • Qwen Plus 0728 vs Z.ai: GLM 5

4 models have responded to the "Estimate Complexity" challenge. Compare their approaches side-by-side on RIVAL. This response is part of RIVAL's open dataset of 5,600+ AI model responses.


Estimate Complexity

Qwen Plus 0728's response to the 'Estimate Complexity' challenge.

Qwen Plus 0728
September 12, 2025
Text

Prompt

Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.

Qwen Plus 0728
Response


To estimate the total number of floating-point operations (FLOPs) required to train GPT-3, we can use a widely adopted approximation based on the model size, the number of training tokens, and the architecture of transformer-based models. We’ll do this without looking up specific values, relying only on known public disclosures and reasonable assumptions.

Step 1: Understand the scaling law for training FLOPs

For transformer models, a well-known approximation for the total training FLOPs is:

\[ \text{Total FLOPs} \approx 6 \times N \times D \]

Where:

  • \( N \) = number of parameters in the model
  • \( D \) = number of training tokens

The factor of 6 comes from the per-token cost of one training step on a dense transformer:

  • Forward pass: ~2N FLOPs per token (one multiply and one add per parameter)
  • Backward pass: ~4N FLOPs per token (roughly twice the forward cost, since gradients are computed with respect to both activations and weights)

The optimizer step (e.g., Adam) costs only a few FLOPs per parameter per update, not per token, so it is negligible at GPT-3-scale batch sizes. The total is therefore roughly 6N FLOPs per token, hence:

\[ \text{Total FLOPs} \approx 6 \times N \times D \]
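This rule is simple enough to encode for reuse. A minimal sketch in Python (the function name and default are illustrative, not part of the original response):

```python
def training_flops(n_params: float, n_tokens: float, flops_per_param_token: float = 6.0) -> float:
    """Approximate training compute for a dense transformer via the ~6*N*D rule
    (~2N FLOPs/token forward + ~4N FLOPs/token backward)."""
    return flops_per_param_token * n_params * n_tokens
```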

Step 2: Estimate GPT-3’s parameter count

GPT-3 was released with multiple sizes, but the largest version had 175 billion parameters.

So:

\[ N = 175 \times 10^9 = 1.75 \times 10^{11} \]

Step 3: Estimate the number of training tokens

From public papers (e.g., the GPT-3 paper by Brown et al., 2020), the model was trained on 300 billion tokens.

So:

\[ D = 300 \times 10^9 = 3 \times 10^{11} \]

(Note: this figure is recalled from general AI knowledge rather than looked up; it is stated in the GPT-3 paper.)

Step 4: Apply the formula

\[ \text{Total FLOPs} \approx 6 \times (1.75 \times 10^{11}) \times (3 \times 10^{11}) \]

First, multiply the coefficients:

\[ 6 \times 1.75 \times 3 = 6 \times 5.25 = 31.5 \]

Now the exponents:

\[ 10^{11} \times 10^{11} = 10^{22} \]

So:

\[ \text{Total FLOPs} \approx 31.5 \times 10^{22} = 3.15 \times 10^{23} \]

Final Estimate:

\[ \boxed{3.15 \times 10^{23} \text{ FLOPs}} \]
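The arithmetic is easy to verify mechanically with the sketch above (illustrative, not from the original response):

```python
n = 175e9  # parameters
d = 300e9  # training tokens
print(f"{training_flops(n, d):.3e}")  # 3.150e+23
```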

Summary of Reasoning:

  • GPT-3 has 175B parameters.
  • It was trained on 300B tokens.
  • Each parameter costs approximately 6 FLOPs per token during training (~2 for the forward pass, ~4 for the backward pass).
  • Multiply: \( 6 \times 175 \times 10^9 \times 300 \times 10^9 = 3.15 \times 10^{23} \)

This is a standard estimate used in the field and aligns with known reports (e.g., the Kaplan et al. scaling-laws analysis; the GPT-3 paper itself cites roughly 3.14 × 10^23 FLOPs), even without searching online.
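As a final cross-check, the GPT-3 paper reports training compute in petaFLOP/s-days (about 3,640 for the 175B model); converting our estimate to that unit takes one line (code illustrative):

```python
pf_days = 3.15e23 / (1e15 * 86_400)  # one petaFLOP/s sustained for 24 hours
print(f"{pf_days:.0f} petaFLOP/s-days")  # ~3646, close to the reported ~3,640
```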


About Qwen Plus 0728

Capabilities

Conversation, Reasoning, Code Generation, Analysis

Categories

Text, Code

Specifications

  • Provider: Qwen
  • Released: 2025-09-08
  • Size: LARGE
  • Context: 1,000,000 tokens

Keep exploring

  • Same prompt: Claude Sonnet 4.6's version (same prompt, different result)
  • Compare: Qwen Plus 0728 vs Gemini 3.1 Pro Preview (both outputs, side by side)
