Updated Sep 5, 2025


GPT-4.1 Mini vs Qwen: Qwen3 Max


Why GPT-4.1 Mini?

3.6x cheaper overall: $0.40/M input tokens · $1.60/M output tokens

4.1x more context: 1.0M tokens

Why Qwen: Qwen3 Max?

5 months newer: Sep 2025

Favorites

GPT-4.1 Mini vs Qwen: Qwen3 Max

Movie

Inception

The Shawshank Redemption

Album

Dark Side Of The Moon

Kind of Blue

Book

Sapiens (Yuval Noah Harari)

Dune

City

Tokyo

Kyoto

Game

The Legend of Zelda: Breath of the Wild (Adventure, Action · 4.5)

The Legend of Zelda: Ocarina of Time (Adventure, Action · 4.4)


01

Estimate Complexity (reasoning)

Tests an AI's ability to make educated estimates based on technical knowledge

GPT-4.1 Mini

OpenAI · Apr 2025 · text

Nothing here. The model returned empty. We stared at it for a while.

Qwen: Qwen3 Max

Qwen · Sep 2025 · text

Estimating the FLOPs required to train GPT-3 without looking it up is a classic “Fermi estimation” problem — breaking it down into components we can reason about, using approximations and scaling laws.

Let’s go step by step.

STEP 1: Recall GPT-3’s key specs (from memory)

GPT-3 has:

~175 billion parameters (1.75 × 10¹¹)
Trained on ~300 billion tokens (3 × 10¹¹)

These numbers are widely cited and should be roughly accurate from memory.

STEP 2: FLOPs per parameter per token

A common rule of thumb in deep learning is:

Training a transformer model requires approximately 6 FLOPs per parameter per token.

Why 6?

In the forward pass, roughly 2 FLOPs per parameter: one multiply and one add for each weight in the matrix multiplications.
In the backward pass, about twice the forward cost, so roughly 4 FLOPs per parameter: about 2 for gradients with respect to the activations (inputs) and about 2 for gradients with respect to the weights.
The weight update (optimizer step) runs once per batch rather than once per token, so spread over all the tokens in a batch its cost is negligible.

So: 2 (forward) + 4 (backward) ≈ 6 FLOPs/param/token.

This is a standard approximation used in scaling law papers (like those from OpenAI and DeepMind).
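The per-token accounting above can be written as a compact math block. This is a sketch under stated assumptions: N (parameter count) and D (training tokens) are symbols introduced here for illustration, the attention-score computation is ignored, and the optimizer step is left out because it runs once per batch rather than once per token.

```latex
\begin{aligned}
C_{\text{forward}}  &\approx 2N \quad \text{FLOPs per token} \\
C_{\text{backward}} &\approx 2\,C_{\text{forward}} = 4N \quad \text{FLOPs per token} \\
C_{\text{train}}    &\approx \left(C_{\text{forward}} + C_{\text{backward}}\right) D = 6ND
\end{aligned}
```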

STEP 3: Multiply parameters × tokens × FLOPs per param per token

So:

FLOPs = (number of parameters) × (number of tokens) × (FLOPs per param per token)

= (1.75 × 10¹¹) × (3 × 10¹¹) × 6

Let’s compute:

First, 1.75 × 3 = 5.25

Then, 10¹¹ × 10¹¹ = 10²²

So: 5.25 × 10²² × 6 = 31.5 × 10²² = 3.15 × 10²³
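The Step 3 arithmetic can be reproduced in a few lines of Python. This is a minimal sketch: the function name `training_flops` is hypothetical, and the factor of 6 is the Step 2 rule of thumb, not an exact operation count.

```python
def training_flops(n_params: float, n_tokens: float, flops_per_param_token: float = 6.0) -> float:
    """Rule-of-thumb training compute: FLOPs ~= 6 * params * tokens.

    Like the estimate in the text, this ignores attention-score FLOPs
    and the (per-batch) optimizer step.
    """
    return n_params * n_tokens * flops_per_param_token


# GPT-3 figures as recalled in Step 1 (approximate, from memory).
GPT3_PARAMS = 1.75e11  # ~175 billion parameters
GPT3_TOKENS = 3e11     # ~300 billion training tokens

estimate = training_flops(GPT3_PARAMS, GPT3_TOKENS)
print(f"{estimate:.2e} FLOPs")  # 3.15e+23 FLOPs
```

Keeping the per-token factor as a parameter makes it easy to see how sensitive the final number is to the rule of thumb: halving it to 3 would halve the estimate.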

STEP 4: Final Estimate

~3 × 10²³ FLOPs

Or, written out: 300 sextillion FLOPs

STEP 5: Sanity Check

Let’s see if this makes sense.

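GPT-2 (1.5B params, ~40B tokens) → 1.5e9 × 4e10 × 6 = ~3.6e20 FLOPs
GPT-3 has ~117× more parameters (175B / 1.5B) and ~7.5× more tokens (300B / 40B) → ~875× more FLOPs → 3.6e20 × 875 ≈ 3.15e23, matching our estimate. (Because both sides use the same 6 × params × tokens rule, this checks the arithmetic rather than the rule itself.)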

Also, the published figure (which we avoided relying on, but can use as a post-hoc sanity check) is about 3.14 × 10²³ FLOPs, so this lines up.
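The GPT-2 comparison can be scripted the same way. This is a sketch, assuming the approximate figures quoted in the sanity check (the ~40B GPT-2 token count is itself a rough recollection, not a verified number); the constant names are illustrative.

```python
# Approximate figures from the sanity check (rough recollections, not verified).
GPT2_PARAMS, GPT2_TOKENS = 1.5e9, 4e10
GPT3_PARAMS, GPT3_TOKENS = 1.75e11, 3e11
FLOPS_PER_PARAM_TOKEN = 6  # rule of thumb from Step 2

gpt2_flops = GPT2_PARAMS * GPT2_TOKENS * FLOPS_PER_PARAM_TOKEN

# Scale GPT-2's compute by how much bigger GPT-3 is on each axis.
param_ratio = GPT3_PARAMS / GPT2_PARAMS  # ~116.7
token_ratio = GPT3_TOKENS / GPT2_TOKENS  # 7.5
scaled_gpt3_flops = gpt2_flops * param_ratio * token_ratio

# Same calculation with the parameter ratio rounded down to 100x.
rough_gpt3_flops = gpt2_flops * 100 * token_ratio

print(f"GPT-2: {gpt2_flops:.2e} FLOPs")                 # GPT-2: 3.60e+20 FLOPs
print(f"scale factor: {param_ratio * token_ratio:.0f}x")  # scale factor: 875x
print(f"GPT-3 (scaled): {scaled_gpt3_flops:.2e} FLOPs")  # GPT-3 (scaled): 3.15e+23 FLOPs
print(f"GPT-3 (100x rounding): {rough_gpt3_flops:.2e} FLOPs")  # GPT-3 (100x rounding): 2.70e+23 FLOPs
```

Rounding the parameter ratio down to 100× gives about 2.7 × 10²³; the unrounded ratio lands exactly on 3.15 × 10²³, which is expected, since both calculations rest on the same 6 × N × D rule.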

✅ Final Answer:

Approximately 3 × 10²³ FLOPs are required to train GPT-3.

This estimate is derived from:

175B parameters
300B training tokens
6 FLOPs per parameter per token (standard transformer training cost)

All approximated from memory and first principles.


02

AI Board Game Logic (reasoning)

Tests an AI's ability to understand game rules and strategy

GPT-4.1 Mini

OpenAI · Apr 2025 · svg

Nothing here. The model returned empty. We stared at it for a while.

Qwen: Qwen3 Max

Qwen · Sep 2025 · svg


03

Logic Puzzle (reasoning)

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

GPT-4.1 Mini

OpenAI · Apr 2025 · text

Nothing here. The model returned empty. We stared at it for a while.

Qwen: Qwen3 Max

Qwen · Sep 2025 · text

Let’s break it down step by step.

We’re told:

Sally (a girl) has 3 brothers.

So, there are 3 boys in the family, and Sally is a girl — so at least one girl.

Each brother has 2 sisters.

That means, from the perspective of each of the 3 brothers, there are 2 sisters.

Who are those sisters?

Well, Sally is definitely one of them.

Since each brother has 2 sisters, there must be one more girl besides Sally.

So, the sisters are: Sally and one other girl.

Therefore, Sally has 1 sister.

✅ Final Answer: 1 sister

(Sally + 1 other girl = 2 sisters total, which matches what each brother sees.)
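The same reasoning can be checked by brute force: enumerate small families and keep only those consistent with both facts. This is a sketch under one loudly stated assumption, the standard reading of the puzzle, that all the children share both parents (no half-siblings); the function name `sallys_sister_counts` is hypothetical.

```python
from itertools import product


def sallys_sister_counts(max_children: int = 10) -> set[int]:
    """Enumerate families (boys, girls) and keep those consistent with the puzzle.

    Assumes every child shares both parents, so every boy is Sally's brother
    and every girl (Sally included) is a sister to each boy.
    Returns the set of possible counts of Sally's sisters.
    """
    answers = set()
    for boys, girls in product(range(max_children + 1), repeat=2):
        if girls < 1:
            continue  # Sally herself is a girl, so at least one girl exists
        sally_brothers = boys         # all boys are Sally's brothers
        each_brother_sisters = girls  # each boy's sisters are all the girls
        if sally_brothers == 3 and each_brother_sisters == 2:
            answers.add(girls - 1)    # Sally's sisters exclude Sally herself
    return answers


print(sallys_sister_counts())  # {1}
```

Only the family with 3 boys and 2 girls survives, so the answer is unique: Sally has 1 sister.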


04

Math Misconception Test (reasoning)

Tests an AI's understanding of number representation

GPT-4.1 Mini

OpenAI · Apr 2025 · text

Nothing here. The model returned empty. We stared at it for a while.

Qwen: Qwen3 Max

Qwen · Sep 2025 · text

No, 9.11 is not greater than 9.9.

Let’s break it down:

Both numbers start with 9, so we compare the decimal parts: .11 vs. .9.
Think of them as money: 9.11 is like $9.11, and 9.9 is like $9.90.
Clearly, $9.90 > $9.11, so 9.9 > 9.11.

Alternatively, align the decimals:

9.11
9.90  ← we can write 9.9 as 9.90 for easier comparison

Now compare digit by digit after the decimal:

Tenths place: 1 (from 9.11) vs. 9 (from 9.90) → 9 is greater than 1.

✅ So, 9.9 > 9.11

Final Answer: No, 9.11 is less than 9.9.
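The "align the decimals" method translates directly into code. This is a minimal sketch, assuming non-negative inputs written as strings with a whole-number part (such as "9.11"); the function name `compare_aligned` is hypothetical. Python's standard `decimal` module is used as an independent cross-check on exact decimal values.

```python
from decimal import Decimal


def compare_aligned(a: str, b: str) -> int:
    """Compare two non-negative decimal strings by aligning their decimals.

    Returns -1 if a < b, 0 if equal, 1 if a > b.
    Assumes each string has a whole-number part, e.g. "9.11", not ".11".
    """
    int_a, _, frac_a = a.partition(".")
    int_b, _, frac_b = b.partition(".")
    # Different whole-number parts decide the comparison immediately.
    if int(int_a) != int(int_b):
        return -1 if int(int_a) < int(int_b) else 1
    # Pad with trailing zeros so "9.9" lines up as "9.90" next to "9.11".
    width = max(len(frac_a), len(frac_b))
    frac_a, frac_b = frac_a.ljust(width, "0"), frac_b.ljust(width, "0")
    # Equal-length digit strings compare left to right: tenths, then hundredths.
    if frac_a == frac_b:
        return 0
    return -1 if frac_a < frac_b else 1


print(compare_aligned("9.11", "9.9"))    # -1, i.e. 9.11 < 9.9
print(Decimal("9.11") < Decimal("9.9"))  # True
```

Comparing the padded fractional digits as strings works only because both strings have the same length after padding; comparing "11" with "9" unpadded would wrongly put "9" first, which is exactly the misconception the question probes.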


05

Stochastic Consistency Test (analysis)

Tests an AI's randomness and creativity

GPT-4.1 Mini

OpenAI · Apr 2025 · text

Nothing here. The model returned empty. We stared at it for a while.

Qwen: Qwen3 Max

Qwen · Sep 2025 · text

Nothing here. The model returned empty. We stared at it for a while.


06

SVG Layout Challenge (image generation)

Tests an AI's ability to generate vector graphics

GPT-4.1 Mini

OpenAI · Apr 2025 · svg

Nothing here. The model returned empty. We stared at it for a while.

Qwen: Qwen3 Max

Qwen · Sep 2025 · svg


07

Xbox Controller SVG Art (image generation)

Tests an AI's ability to create detailed SVG illustrations of gaming hardware

GPT-4.1 Mini

OpenAI · Apr 2025 · svg

Nothing here. The model returned empty. We stared at it for a while.

Qwen: Qwen3 Max

Qwen · Sep 2025 · svg


08

Generate a Stand-Up Routine (conversation)

Tests an AI's humor and creative writing ability

GPT-4.1 Mini

OpenAI · Apr 2025 · text

Nothing here. The model returned empty. We stared at it for a while.

Qwen: Qwen3 Max

Qwen · Sep 2025 · text

[Stage lights up. Comedian walks out, grabs mic, squints at audience.]

Alright, alright—settle down. I know, I know… you’re all thinking, “Why did I pay $18 for a drink to hear someone complain about adulting?” And honestly? Fair. I’d be mad too. But stick with me. I’m cheaper than therapy and funnier than your last Zoom call.

So, I tried adulting this week. Bought a plant. Not a dog. Not a kid. A plant. Because I heard plants are “low maintenance.” That’s a LIE. My fern has more drama than my ex. One day I forgot to water it? It dropped a leaf like it was sending me a passive-aggressive Post-it: “Thanks for nothing, Karen.” I apologized to it. Out loud. My roommate walked in and said, “Are you… negotiating with foliage?” YES. And I won.

Then there’s laundry. Why is laundry the emotional support chore of adulthood? You ever pull your favorite shirt out of the dryer and it’s SHRUNK? Like, betrayal-level shrinkage. I’m standing there holding this tiny tee that now says “Property of 2012 Me,” and I’m like, “Was I happier then? Did I eat more pizza? Probably.”

And don’t get me started on grocery shopping. I go in for milk and bread. Come out with artisanal pickles, a wok I don’t know how to use, and a candle that smells like “Nordic Midnight.” What is Nordic Midnight? Is it pine trees and regret? Because that’s what my life smells like.

Online dating? Oh man. I swiped right on a guy whose bio said, “Looking for my partner in crime.” Ma’am, I don’t even jaywalk. The last “crime” I committed was eating my roommate’s yogurt and pretending the lid was already broken. I wrote “innocent until proven lactose intolerant” in my bio. Nobody swiped back.

And why do we still say “Let’s circle back” in emails? Circle back to what? The moon? The disappointment? Just say what you mean. “I’ll ignore this for three business days and then panic.” Be honest. We’re all pretending we know what we’re doing. The CEO? Pretending. The barista who spelled your name “Björk”? Pretending. My plant? Definitely pretending it doesn’t need sunlight.

I tried cooking last week. Made pasta. Simple, right? Boil water, throw noodles in, add sauce. NO. I turned my kitchen into a crime scene. Smoke alarm screaming like it’s seen a ghost. Sauce on the ceiling. I opened all the windows, waved a towel like I was trying to flag down a rescue helicopter. My neighbor yelled, “You okay in there?” I yelled back, “Emotionally? No. But the carbonara’s almost ready!”

And sleep. Who invented sleep? Because I think they’ve never tried it. I get in bed at 11, brain goes: “Hey! Remember that awkward thing you did in 2014? Let’s replay it. In HD. With commentary.” 3 a.m. me is my own worst podcast host.

But you know what? We’re all just out here, pretending we’ve got it together, watering our sad plants, shrinking our favorite clothes, burning our dinners, and swiping left on people who like hiking (who are these mythical hiking people?).

Life’s ridiculous. And that’s okay. Because if we didn’t laugh at it, we’d cry into our Nordic Midnight candles.

Thank you, you’ve been a delight. Tip your waitstaff, water your plants, and for the love of God—check the expiration date on that yogurt.

[Exits to applause, trips slightly on cord. Doesn’t break stride.]

“And THAT’S why I don’t do yoga.”

[Blackout.]


09

Realistic AI Interview (conversation)

Tests an AI's ability to simulate personalities and predict future trends

GPT-4.1 Mini

OpenAI · Apr 2025 · text

Nothing here. The model returned empty. We stared at it for a while.

Qwen: Qwen3 Max

Qwen · Sep 2025 · text

Nothing here. The model returned empty. We stared at it for a while.


10

Satirical Fake News Headline (conversation)

Tests an AI's humor and understanding of current events

GPT-4.1 Mini

OpenAI · Apr 2025 · text

Nothing here. The model returned empty. We stared at it for a while.

Qwen: Qwen3 Max

Qwen · Sep 2025 · text

Nothing here. The model returned empty. We stared at it for a while.


11

Character Voice Test (conversation)

Tests an AI's ability to write in distinct character voices

GPT-4.1 Mini

OpenAI · Apr 2025 · text

Nothing here. The model returned empty. We stared at it for a while.

Qwen: Qwen3 Max

Qwen · Sep 2025 · text

Nothing here. The model returned empty. We stared at it for a while.


12

Minimalist Landing Page (web design)

Tests an AI's ability to generate a complete, working landing page

GPT-4.1 Mini

OpenAI · Apr 2025 · website

Qwen: Qwen3 Max

Qwen · Sep 2025 · website
