Mistral Large 3 2512 vs Gemini 2.5 Pro (I/O Edition)
Compare Mistral Large 3 2512 by Mistral AI against Gemini 2.5 Pro (I/O Edition) by Google AI, tested across 35 shared challenges. Updated February 2026.
35 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
Okay, let's estimate the FLOPs for training GPT-3. I'll break this down.
Key Formula:
The number of FLOPs for training a transformer-based model can be roughly estimated as:
FLOPs ≈ 6 * N * D
Where:
N is the number of parameters in the model.
D is the total number of tokens processed during training (dataset size in tokens * number of epochs).
6 comes from:
2 * N FLOPs per token for the forward pass (each parameter is involved in roughly one multiply-accumulate, and a MAC is 2 FLOPs).
4 * N FLOPs per token for the backward pass (gradient computation is roughly twice as expensive as the forward pass).
Assumptions for GPT-3 (Davinci, the largest version):
Number of Parameters (N):
N = 175 * 10^9.
Number of Tokens Processed (D):
D = 300 * 10^9 (the GPT-3 paper reports training on roughly 300 billion tokens).
Calculation:
Now, let's plug these into the formula:
FLOPs ≈ 6 * N * D
FLOPs ≈ 6 * (175 * 10^9) * (300 * 10^9)
Let's do the numerical part:
6 * 175 = 1050
1050 * 300 = 315,000
Now, the powers of 10:
10^9 * 10^9 = 10^18
So, combining these:
FLOPs ≈ 315,000 * 10^18
FLOPs ≈ 3.15 * 10^5 * 10^18
FLOPs ≈ 3.15 * 10^(5+18)
FLOPs ≈ 3.15 * 10^23
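The same arithmetic can be checked with a short Python sketch. It assumes the two inputs used above (N = 175 * 10^9 parameters, D = 300 * 10^9 tokens) and splits the 6 * N * D rule into its 2ND forward and 4ND backward parts; `training_flops` is a hypothetical helper name for illustration, not something from the GPT-3 paper.

```python
# Sketch of the 6 * N * D training-FLOPs rule of thumb.
# Assumed inputs (from the text): N = 175e9 parameters, D = 300e9 tokens.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs as forward + backward passes."""
    forward = 2 * n_params * n_tokens   # ~1 multiply-accumulate (2 FLOPs) per parameter per token
    backward = 4 * n_params * n_tokens  # backward pass ~2x the cost of the forward pass
    return forward + backward           # = 6 * N * D

N = 175e9
D = 300e9
flops = training_flops(N, D)
print(f"{flops:.2e} FLOPs")  # 3.15e+23 FLOPs
```

Keeping the forward and backward terms separate makes it easy to see where the factor of 6 comes from.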
Expressing the Result:
So, 3.15 * 10^23 FLOPs can be written as:
315 * 10^21 FLOPs = 315 ZettaFLOPs
0.315 * 10^24 FLOPs = 0.315 YottaFLOPs
The number is often quoted in PetaFLOP/s-days or similar.
Let's convert 3.15 * 10^23 FLOPs to PetaFLOP/s-seconds (1 PetaFLOP/s-second = 10^15 FLOPs):
3.15 * 10^23 / 10^15 = 3.15 * 10^8 PetaFLOP/s-seconds.
Number of seconds in a day = 24 hours/day * 60 minutes/hour * 60 seconds/minute = 86,400 seconds/day.
So, 3.15 * 10^8 PetaFLOP/s-seconds / 86,400 seconds/day ≈ 3.15 * 10^8 / (8.64 * 10^4) PetaFLOP/s-days
≈ (3.15 / 8.64) * 10^(8-4) PetaFLOP/s-days
≈ 0.3646 * 10^4 PetaFLOP/s-days
≈ 3,646 PetaFLOP/s-days (that is, a machine sustaining 1 PFLOP/s would need about 3,646 days; faster hardware shortens this proportionally).
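The unit conversion can be sketched the same way. `pf_days` is a hypothetical helper; it assumes a machine sustaining exactly 1 PFLOP/s (10^15 FLOPs per second) and an 86,400-second day, matching the steps above.

```python
# Convert a total FLOP count into PetaFLOP/s-days.
# Assumes a machine sustaining exactly 1 PFLOP/s and an 86,400-second day.
PETA = 10**15                   # FLOPs per second at 1 PFLOP/s
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def pf_days(total_flops: float) -> float:
    """Days a 1 PFLOP/s machine would need to perform total_flops."""
    pf_seconds = total_flops / PETA  # PetaFLOP/s-seconds
    return pf_seconds / SECONDS_PER_DAY

total = 3.15e23
print(f"{total / 10**21:.0f} ZettaFLOPs")        # 315 ZettaFLOPs
print(f"{pf_days(total):,.0f} PetaFLOP/s-days")  # 3,646 PetaFLOP/s-days
```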
Reasoning Summary:
FLOPs ≈ 6 * N * D for transformer training.
6 * (175 * 10^9) * (300 * 10^9) = 3.15 * 10^23 FLOPs.
Caveats:
6ND is an approximation. The exact number can vary slightly based on the specific transformer architecture details (e.g., use of sparse attention, specific normalization layers), optimizer overhead, and whether techniques like activation checkpointing are used (which can increase FLOPs for recomputation but save memory).
D is an estimate. The GPT-3 paper describes a complex data mixture and weighting schedule, so "300 billion tokens" is an effective number of tokens processed.
Therefore, my estimate is approximately 3.15 x 10^23 FLOPs or 315 ZettaFLOPs.
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Alright, let's tackle this problem step by step. The question is:
"Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?"
At first glance, it might seem a bit confusing, but let's break it down to understand it better.
We have a family with several children, including Sally and her brothers and sisters. Here's what we know: Sally is a girl, she has 3 brothers, and each of her brothers has 2 sisters.
We need to determine how many sisters Sally has.
First, let's list the siblings we know about so far: Sally and her 3 brothers.
Now, the next piece of information is that "each brother has 2 sisters."
This means that if you pick any one of the 3 brothers, that brother has 2 sisters.
Here's where it might get a bit tricky. We know Sally is one sister, but are there more?
Let's think about what "each brother has 2 sisters" implies.
If a brother has 2 sisters, those sisters must include Sally and possibly another girl.
But wait—Sally is already one sister. So, if a brother has 2 sisters, and one is Sally, who is the other one?
This suggests that there must be at least one more sister besides Sally to make the count of sisters equal to 2 for each brother.
Let's test this:
Suppose there is only Sally as the sister. Then each brother would have only 1 sister (Sally), but the problem states each has 2 sisters. So, this can't be the case.
Therefore, there must be another sister besides Sally.
So, the sisters are Sally and one other girl, whom we'll call Emily.
Now, let's check if this satisfies the condition that each brother has 2 sisters.
Each brother's sisters would be Sally and Emily.
That's 2 sisters, which matches the given information.
Now, the question is asking: How many sisters does Sally have?
We've established that the sisters in the family are Sally and Emily.
But the question is about Sally's sisters, not including herself.
So, Sally's sisters are just Emily.
That's 1 sister.
Wait a minute: earlier we said there are 2 sisters in total (Sally and Emily), but Sally's own sisters are just Emily, which makes 1 sister.
But let's double-check the initial interpretation to ensure we didn't make a mistake.
The confusion might arise from how we're counting the sisters.
Total sisters in the family: 2 (Sally and Emily)
Total brothers: 3
Now, "each brother has 2 sisters." Each brother's sisters are Sally and Emily, which is 2. This checks out.
Now, how many sisters does Sally have?
Sally's sisters are the other girls in the family besides herself, which is Emily. So, Sally has 1 sister.
But let me think if there could be more sisters.
Suppose there are 3 sisters: Sally, Emily, and another girl (let's say, Anna).
Then each brother would have 3 sisters (Sally, Emily, Anna), but the problem says each has 2 sisters. So, this can't be.
Therefore, the only possibility is that there are 2 sisters in total: Sally and Emily.
Thus, Sally has 1 sister.
Initially, one might think that since each brother has 2 sisters, and there are 3 brothers, there are 6 sister instances (3 brothers × 2 sisters each), leading to more sisters. But this would be incorrect because all brothers share the same sisters; the sisters aren't unique to each brother.
For example, if there are 2 sisters, each of the 3 brothers has those 2 sisters, totaling 6 sister instances (3 brothers × 2 sisters), but the actual number of sisters is just 2.
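To double-check the counting argument, here is a small brute-force sketch in Python. It assumes the family is exactly Sally, her 3 brothers, and some number of additional girls (the names `Girl1`, `Brother1`, and so on are placeholders), then keeps only the family sizes where every brother has exactly 2 sisters.

```python
# Brute-force check of the sibling puzzle.
# Assumption: the family is Sally, 3 brothers, and 0..max_extra additional girls.

def sisters_of(person: str, family: dict[str, str]) -> set[str]:
    # A person's sisters are every girl in the family except that person.
    return {p for p, sex in family.items() if sex == "F" and p != person}

def solve(max_extra: int = 5) -> list[int]:
    answers = []
    for extra_girls in range(max_extra + 1):
        family = {"Sally": "F"}
        family.update({f"Girl{i}": "F" for i in range(1, extra_girls + 1)})
        family.update({f"Brother{i}": "M" for i in range(1, 4)})
        brothers = [p for p, sex in family.items() if sex == "M"]
        # Keep only families where every brother has exactly 2 sisters.
        if all(len(sisters_of(b, family)) == 2 for b in brothers):
            # Counting per brother gives 3 * 2 = 6 "sister instances",
            # even though the brothers share the same 2 girls.
            instances = sum(len(sisters_of(b, family)) for b in brothers)
            print(f"girls={extra_girls + 1}, sister instances={instances}")
            answers.append(len(sisters_of("Sally", family)))
    return answers

print(solve())  # [1]
```

Only the family with 2 girls passes the check, and in it Sally has exactly 1 sister.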
Let's confirm with the numbers:
Each brother's sisters: Sally and Emily (2 sisters) — matches the given information.
Sally's sisters: Emily (1 sister)
Therefore, Sally has 1 sister.
After carefully analyzing the problem and considering the relationships, the correct answer is:
Sally has 1 sister.
Let's break this down:
So, Sally has 1 sister.
Tests an AI's understanding of number representation
No, 9.11 is not greater than 9.9.
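Here's why: both numbers have the same whole-number part, 9, so compare the tenths digit next. 9.11 has 1 in the tenths place, while 9.9 has 9. Since 1 is less than 9, 9.11 is less than 9.9.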
You can also think of it as 9.9 being the same as 9.90. Comparing 9.11 and 9.90, it's easier to see that 11 (hundredths) is less than 90 (hundredths).
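A quick way to confirm this in code is Python's `decimal` module, which compares the numbers as exact decimal values rather than binary floating-point approximations. This is a sketch for illustration; a plain `float` comparison happens to give the same answer here.

```python
from decimal import Decimal

a = Decimal("9.11")
b = Decimal("9.9")

print(a > b)                 # False
print(b == Decimal("9.90"))  # True: 9.9 and 9.90 are the same value
# Scale both to hundredths: 911 vs 990.
print(int(a * 100), int(b * 100))  # 911 990
```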
Tests an AI's randomness and creativity
Here are 5 jokes:
Why don't scientists trust atoms? Because they make up everything!
What do you call a fish with no eyes? Fsh!
I told my wife she was drawing her eyebrows too high. She seemed surprised.
Why did the scarecrow win an award? Because he was outstanding in his field!
What do you call a lazy kangaroo? Pouch potato!
All 5 of these jokes are unique.
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
Title: "Adulting is Just a Scam We Fell For"
(Lights up. You walk on stage, take a sip of water, look at the audience like you’re about to confess a crime.)
YOU: Hey, how’s everybody doing? Good? Great. Because I need you all to be in a good mood for this next part, where I explain why being an adult is just an elaborate prank played on us by our future selves.
(Pause. Sip water again. Deadpan.)
No, seriously. Think about it. When you were a kid, adults were these mysterious, all-knowing beings who could do anything. They drove cars! They stayed up late! They ate dessert before dinner and called it “brunch”! And we were like, “One day, I’m gonna be just like them. I’m gonna be free.”
(Mimic kid voice.) “Mom, when I grow up, I’m gonna have ice cream for breakfast and never go to bed!”
And now here we are. And what do we eat for breakfast? Oatmeal. Because our doctor said so. And what time do we go to bed? 9:30. Because if we don’t, we turn into a pumpkin. A pumpkin that also has to pay taxes.
(Beat. Shake head.)
And the worst part? We knew this was gonna happen. Remember when you were 18 and some adult would say, “Enjoy your freedom now, because adulthood is just bills and disappointment”? And we were like, “Pfft, not me. I’m gonna be different. I’m gonna thrive.”
(Mimic 18-year-old you.) “I don’t need sleep! I don’t need money! I’m invincible!”
(Back to normal.) And now? Now I set three alarms just to make sure I don’t sleep through my own existential dread.
(Pause. Look at phone.)
Speaking of alarms—why do we even have them? Like, who decided that waking up to a sound that mimics a fire alarm is the best way to start the day? “Good morning! Time to be a functional member of society! BEEP BEEP BEEP—oh wait, no, that’s just my soul leaving my body.”
(Mimic hitting snooze. Then stop.)
And don’t even get me started on “adult responsibilities.” Like, who came up with the idea that we should have to remember to do things? “Oh, you forgot to pay your electric bill? Well, now you get to live like a pioneer! Candlelight dinners every night!”
(Pause. Sigh.)
And the real kicker? The second you finally get your life together—you know, you’ve got a savings account, you meal prep, you floss without being asked—that’s when life goes, “Oh, you think you’re in control? Hold my beer.”
(Mimic life.) “Here’s a global pandemic. Here’s a housing crisis. Here’s a mysterious rash. Good luck, champ.”
(Back to you. Shrug.)
But you know what? I think we’re all just out here doing our best. And if your best today is eating cold pizza for breakfast and pretending it’s a “charcuterie board,” then damn it, you’re winning.
(Raise imaginary slice of pizza.)
So here’s to us—the people who were promised jetpacks and got student loan debt instead. The people who thought we’d be CEOs by 25 and are now just trying to remember where we left our keys.
(Pause. Smile.)
Thanks, you’ve been a great crowd. Drive safe, tip your bartenders, and for the love of God, floss.
(Walk off to applause.)
Notes for Delivery:
This routine avoids puns and leans into observational humor with a mix of frustration and affection for the chaos of adulthood. Hope it works for you!
Alright, settle in, settle in! (Looks around, adjusts mic)
So, I'm trying to get healthier, right? We all go through this phase. My doctor, bless his cotton socks, tells me, "You need more fiber." Okay, doc, noted. So I'm at the grocery store, looking at the cereal aisle. It's like a library of cardboard promises. And there's this one box, it practically GLOWS with health. Pictures of wheat fields, happy families frolicking, probably a gentle farmhouse in the background. The name is something like "Nature's Intestinal Broom."
(Mimics picking up box, squinting) And the ingredients list? It reads like a threatened species list for plants I've never heard of. "Psyllium husk, chicory root, flaxseed meal, a sprinkle of existential dread..." Okay, maybe not the last one, but it felt like it. I buy it. I'm committed.
Next morning, I pour a bowl. It looks... like gravel. Like something you'd find at the bottom of a hamster cage after a long weekend. But I'm brave! I add milk. It doesn't soften. It just... sits there, judging me. Every spoonful is like chewing on tiny, virtuous pebbles. And the sound! (Makes crunching, scraping sound near mic). It's like my teeth are filing a complaint with my stomach.
And here's the thing no one tells you about high-fiber cereal. It's not just about the... aftermath. It's the during-math. You're sitting at your desk, trying to look professional, and your stomach starts making noises like there's a small, angry badger trying to escape. (Clutches stomach, looks alarmed) People in meetings are looking at me. "You alright there, Dave?" "Oh, yeah, fine! Just... processing... Nature's Broom. Very efficiently."
Then there's my smart speaker. I love it, I hate it. It's like having a toddler who's also a Mensa genius. The other day, I'm making coffee, half asleep, and I mumble, "Alexa, what's the weather?" And it just goes, "I'm sorry, I didn't quite get that." So I try again, clearer. "ALEXA. WEATHER." Silence. Then, from the other room, my other smart speaker, the one in the bedroom I forgot was on, pipes up, "Currently, it's 58 degrees and cloudy with a chance of YOU NEEDING TO SPEAK UP, KAREN."
First off, my name isn't Karen. Second, who programmed the sass module?! And now I'm convinced they talk to each other when I'm not home. "Did you hear him this morning? Couldn't even ask for the weather. Pathetic." "Totally. And he still hasn't fixed that leaky faucet. I've put it on his to-do list 17 times."
It's a weird world, man. We're eating hamster food and getting roasted by our appliances. Anyway, my doctor will be thrilled. My intestines are practically swept clean. Probably find a few lost socks in there.
Thanks, you've been great! I'll be here all week, or until my smart speakers stage a coup!
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to generate a complete, working landing page
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.
Recreate an interactive, classic Mario level in a single HTML file.