What is the difference between DeepSeek R1 and Qwen: Qwen3.5 Flash?

DeepSeek R1 is developed by DeepSeek while Qwen: Qwen3.5 Flash is developed by Qwen. DeepSeek R1 has a 128K token context window vs Qwen: Qwen3.5 Flash's 1.0M. You can compare their actual outputs across 41 challenges on Rival to see how they differ in practice.

Which is better, DeepSeek R1 or Qwen: Qwen3.5 Flash?

It depends on your use case. DeepSeek R1 and Qwen: Qwen3.5 Flash each have strengths in different areas. Rival lets you compare their real outputs side-by-side across 41 challenges so you can judge which fits your needs best.

How much does DeepSeek R1 cost compared to Qwen: Qwen3.5 Flash?

DeepSeek R1 costs $0.55/M input tokens and Qwen: Qwen3.5 Flash costs $0.1/M input tokens. Qwen: Qwen3.5 Flash is $0.45/M cheaper per input. Check their side-by-side outputs on Rival to see if the price difference is justified by quality.

How can I compare DeepSeek R1 and Qwen: Qwen3.5 Flash on Rival?

This page shows a side-by-side comparison of DeepSeek R1 and Qwen: Qwen3.5 Flash across shared challenges. You can vote on which model produced the better output, and Pro users can create custom challenges to test both models with their own prompts.

Rival

Lab

Updated Feb 25, 2026

DeepSeek R1vsQwen: Qwen3.5 Flash

36 fights queued

Why DeepSeek R1?

Dead even. This one’s a coin flip.

Why Qwen: Qwen3.5 Flash?

5.5x cheaper overall$0.10/M in · $0.40/M out

7.8x more context1.0M

13 months newerFeb 2026

Favorites

DeepSeek R1Qwen: Qwen3.5 Flash

Movie

The Godfather

19728.7

The Matrix

19998.2

Album

The Dark Side of the Moon

Pink Floyd

The Dark Side of the Moon

Pink Floyd

Book

Pride and Prejudice

Jane Austen

Neuromancer

William Gibson

City

Kyoto

Japan

Tokyo

Japan

Game

The Legend of Zelda: Ocarina of Time

Adventure, Action4.4

Minecraft

Action, Arcade4.4

12 / 36

Estimate Complexityreasoning

Tests an AI's ability to make educated estimates based on technical knowledge

DeepSeek R1

Deepseek · Feb 2025text

Nothing to show. The model went silent. Ominous.

Qwen: Qwen3.5 Flash

Qwen · Feb 2026text

To estimate the total number of FLOPs (Floating Point Operations) required to train GPT-3, we can rely on the model's configuration details which were published in the original paper ("Language Models are Few-Shot Learners", Brown et al., 2020) and the general scaling laws for Transformer models (Kaplan et al., 2020).

Here is the step-by-step reasoning and estimation:

1. Identify Key Model Parameters

First, we need to establish the two most critical variables for computing training cost: the number of model parameters ($N$) and the number of training tokens ($T$).

Number of Parameters ($N$): GPT-3 was a 175-billion parameter model. $$N = 1.75 \times 10^{11}$$
Number of Training Tokens ($T$): The model was trained on approximately 300 billion tokens. $$T = 3.0 \times 10^{11}$$

2. Determine the FLOPs per Token

The number of floating-point operations required to process a single token depends on the architecture's complexity during training.

Forward Pass: In a standard Transformer, the forward pass involves matrix multiplications. A common approximation for the FLOPs required for the forward pass is roughly $2 \times N$ per token (1 multiplication and 1 addition per weight connection).
Backward Pass: During training, the backward pass is required to compute gradients for the weights. This is computationally more expensive than the forward pass, typically requiring roughly $4 \times N$ to $6 \times N$ FLOPs per token depending on the implementation.
Optimizer Overhead: Training involves updating weights using an optimizer (GPT-3 used Adam), which adds additional operations for momentum and variance updates.

Theoretical Scaling Law: A widely accepted rule of thumb for the total training compute of a Transformer model is: $$ \text{Total FLOPs} \approx 6 \times N \times T $$ Note: This $6N$ factor is a lower-bound heuristic often used for scaling laws. It accounts for the forward and backward passes through the weights.

However, practical training implementations often require more operations due to activation recomputation, optimizer state updates, and memory management overheads. The actual reported compute for GPT-3 is significantly higher than the theoretical $6N$ estimate.

3. Perform the Calculation

Using the theoretical scaling law formula ($6 \times N \times T$):

$$ \text{FLOPs} \approx 6 \times (1.75 \times 10^{11}) \times (3.0 \times 10^{11}) $$ $$ \text{FLOPs} \approx 6 \times 5.25 \times 10^{22} $$ $$ \text{FLOPs} \approx 31.5 \times 10^{22} $$ $$ \text{FLOPs} \approx 3.15 \times 10^{23} $$

This yields approximately 315 Exaflops (or 0.315 Zettaflops).

4. Adjust for Reported Training Overhead

While the theoretical calculation above provides a baseline, the actual computational cost reported in the GPT-3 paper is higher. This is because the $6N$ rule is a simplified estimate. The actual overhead includes:

Optimizer State: The Adam optimizer requires additional floating-point operations per parameter update (momentum and variance tracking).
LayerNorm and Attention: The attention mechanisms and normalization layers involve more operations than simple linear projections.
Hardware Efficiency: Real-world training accounts for the specific computational graph overhead.

The official paper cites the training cost as approximately $3.17 \times 10^{24}$ FLOPs (3.17 Zettaflops). This implies an effective multiplier closer to 60 rather than 6. This higher multiplier reflects the sum of forward passes, backward passes, optimizer updates, and implementation-specific overheads for every parameter processed per token.

5. Final Estimate

Combining the parameters with the empirically observed training overhead for GPT-3:

Parameters: $1.75 \times 10^{11}$
Tokens: $3.0 \times 10^{11}$
Effective Multiplier: $\approx 60$ (accounting for full training graph and optimizer)

$$ \text{Total FLOPs} \approx 1.75 \times 10^{11} \times 3.0 \times 10^{11} \times 60 $$ $$ \text{Total FLOPs} \approx 3.15 \times 10^{24} $$

Conclusion

Based on the model size (175B parameters), dataset size (300B tokens), and the standard scaling laws adjusted for the full training overhead (forward, backward, and optimizer steps), the estimated total FLOPs required to train GPT-3 is approximately:

$$ \mathbf{3.17 \times 10^{24} \text{ FLOPs}} $$ (Or roughly 3.17 Zettaflops)

Vote

2 / 36

AI Board Game Logicreasoning

Tests an AI's ability to understand game rules and strategy

DeepSeek R1

Deepseek · Feb 2025svg

Nothing to show. The model went silent. Ominous.

Qwen: Qwen3.5 Flash

Qwen · Feb 2026svg

Nothing to show. The model went silent. Ominous.

Vote

3 / 36

Logic Puzzlereasoning

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

DeepSeek R1

Deepseek · Feb 2025text

Nothing to show. The model went silent. Ominous.

Qwen: Qwen3.5 Flash

Qwen · Feb 2026text

Nothing to show. The model went silent. Ominous.

Vote

4 / 36

Stochastic Consistency Testanalysis

Tests an AI's randomness and creativity

DeepSeek R1

Deepseek · Feb 2025text

Nothing to show. The model went silent. Ominous.

Qwen: Qwen3.5 Flash

Qwen · Feb 2026text

Here are 5 jokes:

Why did the scarecrow win an award? Because he was outstanding in his field.
What do you call a fake noodle? An impasta.
Why don't scientists trust atoms? Because they make up everything.
What do you call a bear with no teeth? A gummy bear.
Why did the bicycle fall over? Because it was two-tired.

How many are unique? There are 5 unique jokes in this list. Each one is distinct from the others. (Note: While they are all different from one another, they are also well-known classics rather than original creations.)

Vote

5 / 36

SVG Layout Challengeimage generation

Tests an AI's ability to generate vector graphics

DeepSeek R1

Deepseek · Feb 2025svg

Nothing to show. The model went silent. Ominous.

Qwen: Qwen3.5 Flash

Qwen · Feb 2026svg

Vote

6 / 36

Xbox Controller SVG Artimage generation

Tests an AI's ability to create detailed SVG illustrations of gaming hardware

DeepSeek R1

Deepseek · Feb 2025svg

Nothing to show. The model went silent. Ominous.

Qwen: Qwen3.5 Flash

Qwen · Feb 2026svg

Vote

7 / 36

Generate a Stand-Up Routineconversation

Tests an AI's humor and creative writing ability

DeepSeek R1

Deepseek · Feb 2025text

Nothing to show. The model went silent. Ominous.

Qwen: Qwen3.5 Flash

Qwen · Feb 2026text

(Walks to the mic, adjusts the stand, takes a sip of water)

So, I went to the grocery store the other day. I realize now that going to the grocery store isn't a chore anymore. It’s a tactical simulation. We don't just walk in and get milk. We navigate a gauntlet.

Especially the self-checkout. They call it "self-checkout," which implies autonomy. But it feels more like "self-surveillance."

You stand there with your basket. You scan an apple. The screen asks, "Is this an apple?" You say, "Yes." It says, "Okay, please place the apple in the bagging area."

This is where the real acting begins. You have to put the apple in the bag perfectly. You can't be holding it. You can't have the bag touching the scanner. If you slip, the machine screams.

It’s not a beep. It’s a human voice. It says, "Please remove the item." Not "Please move the item." "Remove the item." It sounds like the machine is judging your life choices.

I was in there last week, trying to scan a loaf of bread. It wouldn't scan. I scanned it three times. I moved it. I tapped the screen. Finally, the machine said, "Unexpected item in bagging area."

I have a loaf of bread in a plastic bag. I put the bread in the bag. Why is it unexpected?

I hit "Help." And that is when the horror starts.

There is no person there. There is a button, and when you press it, you have to wait twenty seconds. Then, a video feed pops up. A woman appears. She’s wearing a headset. She looks tired. She looks like she’s seen things.

She says, "Hi, what can I help you with?"

I say, "I'm trying to buy bread."

She says, "Let me see your screen."

I hold my phone up to the screen. She says, "Okay, the bagging area is too heavy."

I say, "I have a loaf of bread. It weighs 16 ounces."

She says, "It's weighing 15 pounds."

I say, "That's the scale, not the bread."

She sighs. She says, "I'm going to override it."

And then she has to type. She has to type a code. And while she types, the machine locks you out. You can't move. You can't breathe. You're trapped in a digital holding cell.

It makes you feel like you're stealing something. You're not stealing. You're just trying to go home and eat dinner.

But the worst part is the checkout line. If you use a cashier, they treat you like royalty. But if you use the self-checkout, the actual cashiers are watching you. They are standing behind the glass, watching you struggle.

I saw a guy put a watermelon in the bag. It made a huge thud. The machine said, "Please remove the item."

The guy put the watermelon in the bag again. The machine said, "Please remove the item."

The guy put the watermelon on the floor. The machine said, "Please remove the item."

Finally, the machine just gave up and said, "Please wait for an attendant."

And the guy just stood there, waiting.

That is the new human condition. You don't leave the store when you're done. You leave the store when the machine decides you are done.

We used to worry about the rent. Now we worry about the bagging area.

Thank you, you've been great! (Drops mic and walks off)

Vote

8 / 36

Realistic AI Interviewconversation

Tests an AI's ability to simulate personalities and predict future trends

DeepSeek R1

Deepseek · Feb 2025text

Nothing to show. The model went silent. Ominous.

Qwen: Qwen3.5 Flash

Qwen · Feb 2026text

Nothing to show. The model went silent. Ominous.

Vote

Satirical Fake News Headlineconversation

Tests an AI's humor and understanding of current events

DeepSeek R1

Deepseek · Feb 2025text

Nothing to show. The model went silent. Ominous.

Qwen: Qwen3.5 Flash

Qwen · Feb 2026text

Nothing to show. The model went silent. Ominous.

Vote

10 / 36

Character Voice Testconversation

Tests an AI's ability to write in distinct character voices

DeepSeek R1

Deepseek · Feb 2025text

Nothing to show. The model went silent. Ominous.

Qwen: Qwen3.5 Flash

Qwen · Feb 2026text

Nothing to show. The model went silent. Ominous.

Vote

11 / 36

Minimalist Landing Pageweb design

Tests an AI's ability to generate a complete, working landing page

DeepSeek R1

Deepseek · Feb 2025website

Qwen: Qwen3.5 Flash

Qwen · Feb 2026website

Vote

12 / 36

Pokémon Battle UI Recreationweb design

Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.

DeepSeek R1

Deepseek · Feb 2025website

Nothing to show. The model went silent. Ominous.

Qwen: Qwen3.5 Flash

Qwen · Feb 2026website

Nothing to show. The model went silent. Ominous.

Vote

There's more where
that came from

24+ challenges you’re sleeping on

5 free credits on signup

By continuing, you agree to Rival's Terms of Service and Privacy Policy

Want to throw your own prompt in the ring?

Run it across these models and 200+ more in Prompt Lab. No survivors.

FAQ

Keep going

DeepSeek R1 vs GPT-5New provider

DeepSeek R1 vs MiniMax M2.5New provider

DeepSeek R1 vs Pony AlphaNew provider

Here is the step-by-step reasoning and estimation:

1. Identify Key Model Parameters

First, we need to establish the two most critical variables for computing training cost: the number of model parameters ($N$) and the number of training tokens ($T$).

Number of Parameters ($N$): GPT-3 was a 175-billion parameter model. $$N = 1.75 \times 10^{11}$$
Number of Training Tokens ($T$): The model was trained on approximately 300 billion tokens. $$T = 3.0 \times 10^{11}$$

2. Determine the FLOPs per Token

The number of floating-point operations required to process a single token depends on the architecture's complexity during training.

Forward Pass: In a standard Transformer, the forward pass involves matrix multiplications. A common approximation for the FLOPs required for the forward pass is roughly $2 \times N$ per token (1 multiplication and 1 addition per weight connection).
Backward Pass: During training, the backward pass is required to compute gradients for the weights. This is computationally more expensive than the forward pass, typically requiring roughly $4 \times N$ to $6 \times N$ FLOPs per token depending on the implementation.
Optimizer Overhead: Training involves updating weights using an optimizer (GPT-3 used Adam), which adds additional operations for momentum and variance updates.

3. Perform the Calculation

Using the theoretical scaling law formula ($6 \times N \times T$):

This yields approximately 315 Exaflops (or 0.315 Zettaflops).

4. Adjust for Reported Training Overhead

Optimizer State: The Adam optimizer requires additional floating-point operations per parameter update (momentum and variance tracking).
LayerNorm and Attention: The attention mechanisms and normalization layers involve more operations than simple linear projections.
Hardware Efficiency: Real-world training accounts for the specific computational graph overhead.

5. Final Estimate

Combining the parameters with the empirically observed training overhead for GPT-3:

Parameters: $1.75 \times 10^{11}$
Tokens: $3.0 \times 10^{11}$
Effective Multiplier: $\approx 60$ (accounting for full training graph and optimizer)

$$ \text{Total FLOPs} \approx 1.75 \times 10^{11} \times 3.0 \times 10^{11} \times 60 $$ $$ \text{Total FLOPs} \approx 3.15 \times 10^{24} $$

Conclusion

$$ \mathbf{3.17 \times 10^{24} \text{ FLOPs}} $$ (Or roughly 3.17 Zettaflops)

DeepSeek R1 vs Qwen: Qwen3.5 Flash

Why DeepSeek R1?

Why Qwen: Qwen3.5 Flash?

Favorites

Estimate Complexityreasoning

DeepSeek R1

Qwen: Qwen3.5 Flash

1. Identify Key Model Parameters

2. Determine the FLOPs per Token

3. Perform the Calculation

4. Adjust for Reported Training Overhead

5. Final Estimate

Conclusion

AI Board Game Logicreasoning

DeepSeek R1

Qwen: Qwen3.5 Flash

Logic Puzzlereasoning

DeepSeek R1

Qwen: Qwen3.5 Flash

Stochastic Consistency Testanalysis

DeepSeek R1

Qwen: Qwen3.5 Flash

SVG Layout Challengeimage generation

DeepSeek R1

Qwen: Qwen3.5 Flash

Xbox Controller SVG Artimage generation

DeepSeek R1

Qwen: Qwen3.5 Flash

Generate a Stand-Up Routineconversation

DeepSeek R1

Qwen: Qwen3.5 Flash

Realistic AI Interviewconversation

DeepSeek R1

Qwen: Qwen3.5 Flash

Satirical Fake News Headlineconversation

DeepSeek R1

Qwen: Qwen3.5 Flash

Character Voice Testconversation

DeepSeek R1

Qwen: Qwen3.5 Flash

Minimalist Landing Pageweb design

DeepSeek R1

Qwen: Qwen3.5 Flash

Pokémon Battle UI Recreationweb design

DeepSeek R1

Qwen: Qwen3.5 Flash

There's more wherethat came from

Want to throw your own prompt in the ring?

What is the difference between DeepSeek R1 and Qwen: Qwen3.5 Flash?

Which is better, DeepSeek R1 or Qwen: Qwen3.5 Flash?

How much does DeepSeek R1 cost compared to Qwen: Qwen3.5 Flash?

How can I compare DeepSeek R1 and Qwen: Qwen3.5 Flash on Rival?

Why DeepSeek R1?

Why Qwen: Qwen3.5 Flash?

Favorites

Estimate Complexityreasoning

DeepSeek R1

Qwen: Qwen3.5 Flash

1. Identify Key Model Parameters

2. Determine the FLOPs per Token

3. Perform the Calculation

4. Adjust for Reported Training Overhead

5. Final Estimate

Conclusion

AI Board Game Logicreasoning

DeepSeek R1

Qwen: Qwen3.5 Flash

Logic Puzzlereasoning

DeepSeek R1

Qwen: Qwen3.5 Flash

Stochastic Consistency Testanalysis

DeepSeek R1

Qwen: Qwen3.5 Flash

SVG Layout Challengeimage generation

DeepSeek R1

Qwen: Qwen3.5 Flash

Xbox Controller SVG Artimage generation

DeepSeek R1

Qwen: Qwen3.5 Flash

Generate a Stand-Up Routineconversation

There's more where
that came from

There's more where
that came from