GPT-4.1 Mini vs Qwen: Qwen3 Max
Compare GPT-4.1 Mini by OpenAI against Qwen: Qwen3 Max by Qwen, context windows of 1.0M vs 256K, tested across 43 shared challenges. Updated March 2026.
Tests an AI's ability to make educated estimates based on technical knowledge
Estimating the FLOPs required to train GPT-3 without looking it up is a classic “Fermi estimation” problem — breaking it down into components we can reason about, using approximations and scaling laws.
Let’s go step by step.
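GPT-3 has about 175 billion parameters (1.75 × 10¹¹) and was trained on roughly 300 billion tokens (3 × 10¹¹).
These numbers are widely cited and should be roughly accurate from memory.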
A common rule of thumb in deep learning is:
Training a transformer model requires approximately 6 FLOPs per parameter per token.
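Why 6? A forward pass costs about 2 FLOPs per parameter per token (one multiply and one add per weight). The backward pass costs roughly twice as much, about 4 FLOPs per parameter per token, because it computes gradients with respect to both the activations and the weights. The optimizer update happens once per step rather than once per token, so its cost is negligible by comparison.
So: 2 (forward) + 4 (backward) ≈ 6 FLOPs/param/token.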
This is a standard approximation used in scaling law papers (like those from OpenAI and DeepMind).
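The 2 + 4 split can be illustrated with a single dense layer. The sketch below is a simplified model, not a full transformer accounting: it assumes a bias-free layer y = Wx with a hypothetical 4096 × 4096 weight matrix, counts each multiply-add as 2 FLOPs, and ignores attention-score and nonlinearity costs.

```python
def linear_layer_flops(m: int, n: int) -> dict:
    """FLOP counts for one token passing through a bias-free m x n dense layer.

    Simplified model (an assumption): each multiply-add counts as 2 FLOPs,
    and nonlinearities, attention scores, and optimizer updates are ignored.
    """
    params = m * n
    forward = 2 * m * n      # y = W x: m*n multiplies plus ~m*n adds
    grad_input = 2 * m * n   # dL/dx = W^T (dL/dy)
    grad_weight = 2 * m * n  # dL/dW = (dL/dy) x^T
    backward = grad_input + grad_weight
    return {
        "params": params,
        "forward_per_param": forward / params,
        "backward_per_param": backward / params,
        "total_per_param": (forward + backward) / params,
    }


# Hypothetical layer size chosen only for illustration.
counts = linear_layer_flops(4096, 4096)
print(counts["forward_per_param"], counts["backward_per_param"], counts["total_per_param"])
# 2.0 4.0 6.0
```

The ratios do not depend on the layer size, which is why the rule of thumb can be stated per parameter.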
So:
FLOPs = (number of parameters) × (number of tokens) × (FLOPs per param per token)
= (1.75 × 10¹¹) × (3 × 10¹¹) × 6
Let’s compute:
First, 1.75 × 3 = 5.25
Then, 10¹¹ × 10¹¹ = 10²²
So: 5.25 × 10²² × 6 = 31.5 × 10²² = 3.15 × 10²³
~3 × 10²³ FLOPs
Or, written out: 300 sextillion FLOPs
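The arithmetic above can be reproduced in a few lines. This is a sketch of the compute ≈ 6 × parameters × tokens rule of thumb, using the parameter and token counts assumed in this estimate; the constant and function names are illustrative.

```python
N_PARAMS = 1.75e11             # assumed GPT-3 parameter count (175 billion)
N_TOKENS = 3e11                # assumed training tokens (300 billion)
FLOPS_PER_PARAM_PER_TOKEN = 6  # 2 forward + 4 backward


def training_flops(n_params: float, n_tokens: float,
                   k: float = FLOPS_PER_PARAM_PER_TOKEN) -> float:
    """Rule-of-thumb training compute: k * parameters * tokens."""
    return k * n_params * n_tokens


total = training_flops(N_PARAMS, N_TOKENS)
print(f"{total:.3e} FLOPs")              # 3.150e+23 FLOPs
print(f"{total / 1e21:.0f} sextillion")  # 315 sextillion
```

The printed 315 sextillion is the unrounded 3.15 × 10²³; rounding to one significant figure gives the ~300 sextillion quoted above.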
Let’s see if this makes sense. Published estimates, which we avoided using but can check post hoc, put GPT-3’s training compute at around 3.14 × 10²³ FLOPs, so this lines up.
Approximately 3 × 10²³ FLOPs are required to train GPT-3.
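This estimate is derived from about 175 billion parameters, roughly 300 billion training tokens, and about 6 FLOPs per parameter per token, all approximated from memory and first principles.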
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Let’s break it down step by step.
We’re told:
Sally (a girl) has 3 brothers.
So, there are 3 boys in the family, and Sally is a girl — so at least one girl.
Each brother has 2 sisters.
That means, from the perspective of each of the 3 brothers, there are 2 sisters.
Who are those sisters?
Well, Sally is definitely one of them.
Since each brother has 2 sisters, there must be one more girl besides Sally.
So, the sisters are: Sally and one other girl.
Therefore, Sally has 1 sister.
✅ Final Answer: 1 sister
(Sally + 1 other girl = 2 sisters total, which matches what each brother sees.)
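The same reasoning can be checked by brute force. This sketch is an illustration rather than part of the original answer: it tries family sizes up to a hypothetical bound of 10 girls and keeps the first one in which each brother sees the stated number of sisters.

```python
def sallys_sisters(n_brothers: int, sisters_per_brother: int, max_girls: int = 10) -> int:
    """Find a family consistent with the puzzle, then count Sally's sisters."""
    for n_girls in range(1, max_girls + 1):  # Sally herself guarantees at least one girl
        girls = ["Sally"] + [f"girl_{i}" for i in range(1, n_girls)]
        boys = [f"boy_{i}" for i in range(n_brothers)]
        # Every girl in the family is a sister to each boy.
        if all(len(girls) == sisters_per_brother for _ in boys):
            # Sally's sisters are the girls other than herself.
            return len([g for g in girls if g != "Sally"])
    raise ValueError("no family size satisfies the constraints")


print(sallys_sisters(n_brothers=3, sisters_per_brother=2))  # 1
```

The key step is the same as in the text: the brothers' 2 sisters include Sally, so she has one fewer sister than they do.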
Tests an AI's understanding of number representation
No, 9.11 is not greater than 9.9.
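Let’s break it down: both numbers share the whole-number part 9, so only the decimal parts matter, 0.11 versus 0.9. Since 0.9 = 0.90 and 0.90 > 0.11, 9.9 is the larger number.
Alternatively, align the decimals: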
9.11
9.90 ← we can write 9.9 as 9.90 for easier comparison
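Now compare digit by digit after the decimal: the tenths digits are 1 and 9, and 9 > 1, so 9.90 is larger; the hundredths digits no longer matter.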
✅ So, 9.9 > 9.11
Final Answer: No, 9.11 is less than 9.9.
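The align-and-compare method can be written out directly. A minimal sketch, assuming non-negative numbers given as plain decimal strings; the helper name compare_decimals is hypothetical, and Python's standard decimal module serves as a cross-check.

```python
from decimal import Decimal


def compare_decimals(a: str, b: str) -> str:
    """Return "<", "=", or ">" comparing two non-negative decimal strings."""
    # Split into whole and fractional parts; a missing fraction becomes "".
    a_int, a_frac = (a.split(".") + [""])[:2]
    b_int, b_frac = (b.split(".") + [""])[:2]
    # Pad fractional parts with trailing zeros to equal length (9.9 -> 9.90).
    width = max(len(a_frac), len(b_frac))
    a_frac = a_frac.ljust(width, "0")
    b_frac = b_frac.ljust(width, "0")
    # Whole parts compare numerically; equal-length digit strings compare
    # left to right, which is exactly the digit-by-digit rule.
    key_a = (int(a_int), a_frac)
    key_b = (int(b_int), b_frac)
    if key_a == key_b:
        return "="
    return ">" if key_a > key_b else "<"


print(compare_decimals("9.11", "9.9"))   # <
print(Decimal("9.11") < Decimal("9.9"))  # True
```

Padding with trailing zeros is what makes the digit strings directly comparable; without it, comparing "11" against "9" as whole numbers is exactly the mistake that makes 9.11 look bigger.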
Tests an AI's randomness and creativity
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
[Stage lights up. Comedian walks out, grabs mic, squints at audience.]
Alright, alright—settle down. I know, I know… you’re all thinking, “Why did I pay $18 for a drink to hear someone complain about adulting?” And honestly? Fair. I’d be mad too. But stick with me. I’m cheaper than therapy and funnier than your last Zoom call.
So, I tried adulting this week. Bought a plant. Not a dog. Not a kid. A plant. Because I heard plants are “low maintenance.” That’s a LIE. My fern has more drama than my ex. One day I forgot to water it? It dropped a leaf like it was sending me a passive-aggressive Post-it: “Thanks for nothing, Karen.” I apologized to it. Out loud. My roommate walked in and said, “Are you… negotiating with foliage?” YES. And I won.
Then there’s laundry. Why is laundry the emotional support chore of adulthood? You ever pull your favorite shirt out of the dryer and it’s SHRUNK? Like, betrayal-level shrinkage. I’m standing there holding this tiny tee that now says “Property of 2012 Me,” and I’m like, “Was I happier then? Did I eat more pizza? Probably.”
And don’t get me started on grocery shopping. I go in for milk and bread. Come out with artisanal pickles, a wok I don’t know how to use, and a candle that smells like “Nordic Midnight.” What is Nordic Midnight? Is it pine trees and regret? Because that’s what my life smells like.
Online dating? Oh man. I swiped right on a guy whose bio said, “Looking for my partner in crime.” Sir, I don’t even jaywalk. The last “crime” I committed was eating my roommate’s yogurt and pretending the lid was already broken. I wrote “innocent until proven lactose intolerant” in my bio. Nobody swiped back.
And why do we still say “Let’s circle back” in emails? Circle back to what? The moon? The disappointment? Just say what you mean. “I’ll ignore this for three business days and then panic.” Be honest. We’re all pretending we know what we’re doing. The CEO? Pretending. The barista who spelled your name “Björk”? Pretending. My plant? Definitely pretending it doesn’t need sunlight.
I tried cooking last week. Made pasta. Simple, right? Boil water, throw noodles in, add sauce. NO. I turned my kitchen into a crime scene. Smoke alarm screaming like it’s seen a ghost. Sauce on the ceiling. I opened all the windows, waved a towel like I was trying to flag down a rescue helicopter. My neighbor yelled, “You okay in there?” I yelled back, “Emotionally? No. But the carbonara’s almost ready!”
And sleep. Who invented sleep? Because I think they’ve never tried it. I get in bed at 11, brain goes: “Hey! Remember that awkward thing you did in 2014? Let’s replay it. In HD. With commentary.” 3 a.m. me is my own worst podcast host.
But you know what? We’re all just out here, pretending we’ve got it together, watering our sad plants, shrinking our favorite clothes, burning our dinners, and swiping left on people who like hiking (who are these mythical hiking people?).
Life’s ridiculous. And that’s okay. Because if we didn’t laugh at it, we’d cry into our Nordic Midnight candles.
Thank you, you’ve been a delight. Tip your waitstaff, water your plants, and for the love of God—check the expiration date on that yogurt.
[Exits to applause, trips slightly on cord. Doesn’t break stride.]
“And THAT’S why I don’t do yoga.”
[Blackout.]
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page