Updated Aug 5, 2025

Our Verdict

Qwen: Qwen3 235B A22B Thinking 2507 (Winner)

GPT OSS 120B (Runner-up)

Pick Qwen: Qwen3 235B A22B Thinking 2507. In 28 blind votes, Qwen: Qwen3 235B A22B Thinking 2507 wins 65% of the time. That's not luck.

Pick Qwen: Qwen3 235B A22B Thinking 2507 for Web Design and Image Generation. Pick GPT OSS 120B for Reasoning.

Clear winner

Writing DNA

Style Comparison

Similarity: 97%

Qwen: Qwen3 235B A22B Thinking 2507 uses 3.5x more emoji

Metric             Qwen: Qwen3 235B A22B Thinking 2507   GPT OSS 120B
Vocabulary         56%                                   52%
Sentence Length    14w                                   19w
Hedging            0.37                                  0.28
Bold               5.7                                   7.4
Lists              4.0                                   1.8
Emoji              0.54                                  0.15
Headings           0.70                                  0.73
Transitions        0.14                                  0.17

Based on 22 + 21 text responses




Why Qwen: Qwen3 235B A22B Thinking 2507?

1.4x cheaper overall ($0.11/M in · $0.60/M out)

Why GPT OSS 120B?

Dead even. This one's a coin flip.

                Qwen: Qwen3 235B A22B Thinking 2507   GPT OSS 120B
Input price     $0.11/M                               $0.18/M
Output price    $0.60/M                               $0.80/M
Context         131K                                  131K
Released        Jul 2025                              Aug 2025
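The page does not say how "1.4x cheaper overall" is computed. As a rough sketch, the listed per-million-token prices can be blended under an assumed traffic mix and compared. The `input_share` parameter and its 50/50 default are assumptions for illustration, not the site's stated method.

```python
# Prices in USD per million tokens, taken from the pricing rows above.
PRICES = {
    "Qwen: Qwen3 235B A22B Thinking 2507": {"input": 0.11, "output": 0.60},
    "GPT OSS 120B": {"input": 0.18, "output": 0.80},
}


def blended_price(price: dict, input_share: float = 0.5) -> float:
    """Per-million-token price for a given input/output mix.

    input_share is a hypothetical parameter: the fraction of tokens that
    are input tokens. The 0.5 default is an assumption, not a page value.
    """
    return input_share * price["input"] + (1.0 - input_share) * price["output"]


def cost_ratio(input_share: float = 0.5) -> float:
    """How many times more GPT OSS 120B costs than the Qwen model."""
    qwen = blended_price(PRICES["Qwen: Qwen3 235B A22B Thinking 2507"], input_share)
    gpt = blended_price(PRICES["GPT OSS 120B"], input_share)
    return gpt / qwen


if __name__ == "__main__":
    # Input-only, even mix, and output-only workloads.
    for share in (1.0, 0.5, 0.0):
        print(f"input share {share:.1f}: {cost_ratio(share):.2f}x")
```

Under these assumptions the gap is about 1.64x on input alone, about 1.33x on output alone, and about 1.38x for an even mix, which rounds to the quoted 1.4x. The real ratio for a given workload depends on how input-heavy it is.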

Favorites

Category   Qwen: Qwen3 235B A22B Thinking 2507        GPT OSS 120B
Movie      The Shawshank Redemption                   The Godfather
Album      Sgt. Pepper's Lonely Hearts Club Band      (none)
Book       Moby Dick (Herman Melville)                (none)
City       Kyoto                                      Tokyo
Game       Portal (Action, Puzzle; 4.5)               Minecraft (Action, Arcade; 4.4)

01

Estimate Complexity (reasoning)

Tests an AI's ability to make educated estimates based on technical knowledge

Qwen: Qwen3 235B A22B Thinking 2507 (Qwen · Jul 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

GPT OSS 120B (OpenAI · Aug 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

02

Logic Puzzle (reasoning)

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

Qwen: Qwen3 235B A22B Thinking 2507 (Qwen · Jul 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

GPT OSS 120B (OpenAI · Aug 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

03

Math Misconception Test (reasoning)

Tests an AI's understanding of number representation

Qwen: Qwen3 235B A22B Thinking 2507 (Qwen · Jul 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

GPT OSS 120B (OpenAI · Aug 2025, text)

No. 9.11 is less than 9.9. (9.11 < 9.9.)
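The answer above is easy to check, and the misconception this test targets is easy to reproduce. The sketch below compares the two values first as decimal numbers and then as dotted version strings, where the ordering flips. Both helper names are hypothetical, and the exact prompt is not shown on the page.

```python
from decimal import Decimal


def compare_as_decimals(a: str, b: str) -> str:
    """Compare two numeric strings by value, e.g. 9.11 vs 9.90."""
    x, y = Decimal(a), Decimal(b)
    if x < y:
        return "<"
    if x > y:
        return ">"
    return "="


def compare_as_versions(a: str, b: str) -> str:
    """Compare dotted strings component by component, as version numbers."""
    x = tuple(int(part) for part in a.split("."))
    y = tuple(int(part) for part in b.split("."))
    if x < y:
        return "<"
    if x > y:
        return ">"
    return "="


if __name__ == "__main__":
    print("as decimals:", "9.11", compare_as_decimals("9.11", "9.9"), "9.9")
    print("as versions:", "9.11", compare_as_versions("9.11", "9.9"), "9.9")
```

Read as a decimal, 9.11 is smaller because 0.11 is less than 0.90. Read as a version number, 9.11 comes later because 11 is greater than 9. Mixing up the two readings is the likely source of the wrong answer.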


04

Stochastic Consistency Test (analysis)

Tests an AI's randomness and creativity

Qwen: Qwen3 235B A22B Thinking 2507 (Qwen · Jul 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

GPT OSS 120B (OpenAI · Aug 2025, text)

Here are five jokes, each with a different punch‑line and premise:

1. Why did the scarecrow win an award?
   Because he was outstanding in his field!
2. I told my computer I needed a break, and it said:
   “You seem stressed—please reboot.”
3. What’s orange and sounds like a parrot?
   A carrot.
4. Why don’t scientists trust atoms anymore?
   Because they make up everything.
5. I asked the librarian if they had any books on paranoia…
   She whispered, “They’re right behind you.”

How many are unique?
All five jokes are distinct from one another, so there are 5 unique jokes.

05

SVG Layout Challenge (image generation)

Tests an AI's ability to generate vector graphics

Qwen: Qwen3 235B A22B Thinking 2507 (Qwen · Jul 2025, svg)

Nothing here. The model returned empty. We stared at it for a while.

GPT OSS 120B (OpenAI · Aug 2025, svg)

06

Xbox Controller SVG Art (image generation)

Tests an AI's ability to create detailed SVG illustrations of gaming hardware

Qwen: Qwen3 235B A22B Thinking 2507 (Qwen · Jul 2025, svg)

Nothing here. The model returned empty. We stared at it for a while.

GPT OSS 120B (OpenAI · Aug 2025, svg)

07

Generate a Stand-Up Routine (conversation)

Tests an AI's humor and creative writing ability

Qwen: Qwen3 235B A22B Thinking 2507 (Qwen · Jul 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

GPT OSS 120B (OpenAI · Aug 2025, text)

[Spotlight comes up. The comic walks to the mic, glances at the audience, and smiles.]

Comic: Hey, everyone—great to be here. I just got off a Zoom call that was supposed to be a “quick sync,” which in corporate speak means “we’ll spend three hours arguing about who gets to use the same PowerPoint template.” I’m convinced the word “sync” is short for “synchronizing our collective misery.”

[Pause for laugh.]

Speaking of misery, I moved into a new apartment last month. The landlord told me the place had “character.” Yeah, the kind of character that screams, “I’ve never heard of a plumber in the last decade.” The faucet drips on a rhythm that could be a metronome for a sleep‑deprived toddler. I’m not saying it’s bad, but I’ve started timing my showers to the drip. Six minutes, two seconds—if I go any longer, the building’s water bill looks like a small country’s GDP.

[Audience chuckles.]

And the neighbors! My upstairs neighbor is a yoga instructor. You know the type—every morning at 6 a.m., the floor vibrates like a cheap subwoofer. I’m convinced she’s trying to align her chakras with the building’s structural integrity. I tried to be polite and asked, “Hey, could you maybe do the downward dog a little later?” She replied, “I’m sorry, I’m on a schedule.” I’m not sure if she meant a class schedule or a schedule for how many times I’ll have to pretend to be a human pillow for her dog.

[Pause.]

Now, I’ve been trying to eat healthier. The other day I bought a “kale smoothie.” The label promised “nutrient‑dense, antioxidant‑rich, life‑changing.” I drank it and felt more like I’d just swallowed a lawnmower. I’m not saying it was bad, but the only thing that got a boost was my ability to identify the exact moment my stomach decided to file a formal complaint.

[Audience laughs.]

Technology is another thing. I tried using a voice‑assistant to set a reminder: “Remind me to call Mom at 5 p.m.” It responded, “I’m sorry, I didn’t catch that.” So I repeated, “Remind me to call Mom at 5 p.m.” And it answered, “Here’s a recipe for mushroom risotto.” I’m beginning to suspect my house is turning into a culinary school. At this point, I’m more likely to get a reminder to water my plants than to remember my own birthday.

[Pause.]

Dating apps—anyone here still using them? The profile pictures are all “adventure shots.” You never see a guy holding a pizza, you always see a guy climbing a mountain. I matched with a girl whose bio said, “Looking for someone who can keep up.” I replied, “I can keep up… with the Netflix queue.” She ghosted me. I think the algorithm is broken; it matched us because we both love “outdoor activities.” My idea of outdoor activity is “walking from the couch to the fridge and back.”

[Audience laughs.]

All right, I’m going to leave you with this: life is like a Wi‑Fi signal. You spend a lot of time looking for a stronger connection, but sometimes the best thing you can do is just stay where you are and enjoy the occasional drop—because it reminds you that you’re still, somehow, connected.

[Smile, wave.] Thanks, everybody—have a great night!


08

Realistic AI Interview (conversation)

Tests an AI's ability to simulate personalities and predict future trends

Qwen: Qwen3 235B A22B Thinking 2507 (Qwen · Jul 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

GPT OSS 120B (OpenAI · Aug 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

09

Satirical Fake News Headline (conversation)

Tests an AI's humor and understanding of current events

Qwen: Qwen3 235B A22B Thinking 2507 (Qwen · Jul 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

GPT OSS 120B (OpenAI · Aug 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

10

Character Voice Test (conversation)

Tests an AI's ability to write in distinct character voices

Qwen: Qwen3 235B A22B Thinking 2507 (Qwen · Jul 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

GPT OSS 120B (OpenAI · Aug 2025, text)

Nothing here. The model returned empty. We stared at it for a while.

11

Minimalist Landing Page (web design)

Tests an AI's ability to generate a complete, working landing page

Qwen: Qwen3 235B A22B Thinking 2507 (Qwen · Jul 2025, website)

GPT OSS 120B (OpenAI · Aug 2025, website)

12

Mario Level UI Recreation (web design)

Recreate an interactive, classic Mario level in a single HTML file.

Qwen: Qwen3 235B A22B Thinking 2507 (Qwen · Jul 2025, website)

GPT OSS 120B (OpenAI · Aug 2025, website)
