Skip to content
Rival
Models
Compare · Best For · Arena
Lab
Sign Up

We spend our weekends yelling at API endpoints so you don’t have to.

@rival_tips

Explore

  • Compare Models
  • All Models
  • Prompt Lab
  • Image Generation
  • Audio Comparison
  • Leaderboard
  • Challenges

Discover

  • Insights
  • AI Creators
  • AI Tools
  • The Graveyard

Developers

  • Developer Hub
  • MCP Server
  • .llmignore
  • Badges
  • Rival Datasets

Connect

  • Methodology
  • Sponsor
  • Partnerships
  • Privacy Policy
  • Terms
  • RSS Feed
© 2026 Rival · Built with mass sleep deprivation, energy drinks, and someone else’s GPU cluster

Qwen: Qwen3.5 35B A3B — Estimate Complexity

  1. Home
  2. Models
  3. Qwen: Qwen3.5 35B A3B
  4. Estimate Complexity

See how Qwen: Qwen3.5 35B A3B compares on this challenge:

  • Qwen: Qwen3.5 35B A3B vs Claude Sonnet 4.6
  • Qwen: Qwen3.5 35B A3B vs Gemini 3.1 Pro Preview
  • Qwen: Qwen3.5 35B A3B vs Z.ai: GLM 5

4 models have responded to the "Estimate Complexity" challenge. Compare their approaches side-by-side on Rival. This response is part of Rival's open dataset of 5,600+ AI model responses.

Back to Qwen: Qwen3.5 35B A3B

Estimate Complexity

Qwen3.5 35B A3B's response to the 'Estimate Complexity' challenge.

Qwen: Qwen3.5 35B A3B
February 26, 2026
Text

Prompt

Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.

Qwen: Qwen3.5 35B A3B
Response


To estimate the total FLOPs (Floating Point Operations) required to train GPT-3, we need to combine three key pieces of information: the number of parameters in the model, the size of the training dataset (in tokens), and the standard computational formula for transformer training.

Here is the step-by-step reasoning and calculation.

Step 1: Identify the Number of Parameters

The most famous specification of GPT-3 is its parameter count. From general knowledge of the model architecture (published in the 2020 paper "Language Models are Few-Shot Learners" by Brown et al.), we know:

  • Parameters ($P$): 175 Billion ($1.75 \times 10^{11}$).

Step 2: Identify the Training Dataset Size (Tokens)

The standard metric for total training compute is the number of "tokens" (sub-word pieces of text) the model processes over the entire training run. Note that this need not equal the raw dataset size: in GPT-3's training mix, some sources were sampled more than once and others less than once.

  • Training Tokens ($T$): The GPT-3 paper cites a dataset size of approximately 300 Billion tokens ($3.0 \times 10^{11}$).
    • Note: This is a quality-filtered mix of Common Crawl, WebText2, two book corpora, and Wikipedia.

Step 3: Establish the FLOPs Formula

To calculate the total compute required for training a Transformer model, we use a standard rule of thumb derived from the computational complexity of the forward and backward passes.

  1. Forward Pass: To process one token, the model performs matrix multiplications proportional to the number of parameters. This costs approximately $2 \times P$ FLOPs per token.
  2. Backward Pass: To update the weights via backpropagation, the model must calculate gradients. This is roughly twice as expensive as the forward pass, costing approximately $4 \times P$ FLOPs per token.
  3. Total per Token: $2P + 4P = \mathbf{6 \times P}$ FLOPs per token.

Therefore, the total training FLOPs formula is: $$ \text{Total FLOPs} = 6 \times P \times T $$

Step 4: Perform the Calculation

Now we substitute the values from Steps 1 and 2 into the formula from Step 3.

$$ \text{Total FLOPs} = 6 \times (1.75 \times 10^{11}) \times (3.0 \times 10^{11}) $$

First, multiply the scalar numbers: $$ 6 \times 1.75 = 10.5 $$ $$ 10.5 \times 3.0 = 31.5 $$

Next, multiply the powers of 10: $$ 10^{11} \times 10^{11} = 10^{22} $$

Combine them: $$ 31.5 \times 10^{22} $$

Convert to standard scientific notation: $$ 3.15 \times 10^{23} $$
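The arithmetic in Steps 3 and 4 can be reproduced in a few lines of Python (a quick numeric check of the $6 \times P \times T$ rule of thumb, not code from the original response):

```python
# Sanity check of the GPT-3 training-compute estimate.
P = 1.75e11  # parameters: 175 billion
T = 3.0e11   # training tokens: 300 billion

# Per token: ~2P FLOPs forward + ~4P FLOPs backward = 6P FLOPs total.
total_flops = 6 * P * T

print(f"{total_flops:.2e}")  # -> 3.15e+23
```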

Step 5: Final Verification and Context

The result is approximately $3.15 \times 10^{23}$ FLOPs.

In the machine learning community, this is often rounded to $3 \times 10^{23}$ or simply cited as 315 sextillion FLOPs. This estimate aligns with the widely accepted figures reported in technical summaries of the GPT-3 paper, which state the training required roughly $3.1 \times 10^{23}$ FLOPs.

Conclusion

Based on the parameter count of 175 billion, a training set of 300 billion tokens, and the standard $6 \times \text{Params} \times \text{Tokens}$ formula for Transformer training:

The estimated total number of FLOPs required to train GPT-3 is approximately $3.15 \times 10^{23}$.


About Qwen: Qwen3.5 35B A3B

Capabilities

  • Conversation
  • Reasoning
  • Code Generation
  • Analysis
  • Tool Use
  • Agentic Tool Use
  • Translation

Categories

  • Text
  • Code
  • Vision
  • Multimodal

Specifications

  • Provider: Qwen
  • Released: 2026-02-25
  • Size: MEDIUM
  • Parameters: 35B (3B active)
  • Context: 262,144 tokens

Keep exploring

  • Same prompt, different result: Claude Sonnet 4.6's version
  • Compare both outputs side by side: Qwen: Qwen3.5 35B A3B vs Gemini 3.1 Pro Preview
