GPT-5.1 vs Gemini 3.1 Pro Preview
Compare GPT-5.1 by OpenAI against Gemini 3.1 Pro Preview by Google AI, with context windows of 400K vs. 1.0M tokens, tested across 51 shared challenges. Updated February 2026.
46 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
To estimate the total number of FLOPs (Floating Point Operations) required to train GPT-3, we can use a well-established heuristic in deep learning for estimating the compute cost of training Transformer-based large language models.
The estimated total compute is roughly $3.15 \times 10^{23}$ FLOPs.
Here is the step-by-step reasoning used to arrive at this number:
To calculate the compute required, we need to know two fundamental numbers about the model:
In language model training, the total compute ($C$) is dominated by the matrix multiplications in the dense layers. OpenAI's scaling-law work gives a standard approximation for this: $C \approx 6 \times N \times D$, where $N$ is the number of model parameters and $D$ is the number of training tokens.
Why the multiplier of 6?
(Note: This formula slightly underestimates the total because it ignores the quadratic cost of attention and the optimizer state updates, but at the 175B scale the dense matrix multiplications dominate the compute so heavily that $6ND$ is the standard approximation.)
Now we simply plug the numbers into our formula:
First, multiply $N$ and $D$:
Next, multiply by 6 (for the forward and backward passes):
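The two multiplications can be written out explicitly. This is a sketch that assumes $N = 1.75 \times 10^{11}$ parameters (the 175B scale mentioned in the note) and $D = 3 \times 10^{11}$ training tokens, which is the value implied by the stated total rather than one spelled out in the steps themselves. The factor of 6 is commonly split into roughly 2 FLOPs per parameter per token for the forward pass and 4 for the backward pass.

$$
N \times D = (1.75 \times 10^{11}) \times (3 \times 10^{11}) = 5.25 \times 10^{22}
$$

$$
C \approx 6 \times 5.25 \times 10^{22} = 3.15 \times 10^{23} \text{ FLOPs}
$$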
Does $3.15 \times 10^{23}$ FLOPs make sense in the real world? When GPT-3 was trained (around late 2019/early 2020), Nvidia V100 GPUs were the standard.
Given that training runs of this size typically take a few weeks on a few thousand GPUs (factoring in downtime, checkpointing, and slightly smaller clusters), the estimate is plausible in practice.
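As a rough cross-check of the "few weeks on a few thousand GPUs" claim, the sketch below converts the FLOP total into wall-clock days. The 125 TFLOP/s figure is NVIDIA's quoted peak mixed-precision tensor throughput for the V100 (SXM2); the 30% sustained utilization and the 3,000-GPU cluster size are illustrative assumptions, not figures from the response, and `training_days` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def training_days(total_flops, n_gpus, peak_flops_per_gpu, utilization):
    """Wall-clock days needed to execute total_flops on a cluster.

    utilization is the fraction of peak throughput actually sustained.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    sustained_flops_per_second = n_gpus * peak_flops_per_gpu * utilization
    return total_flops / sustained_flops_per_second / SECONDS_PER_DAY


TOTAL_FLOPS = 3.15e23   # estimate from the text (6 * N * D)
V100_PEAK = 125e12      # nominal V100 tensor-core peak, FLOP/s
ASSUMED_UTIL = 0.30     # assumption: fraction of peak sustained in training
ASSUMED_GPUS = 3_000    # assumption: "a few thousand GPUs"

days = training_days(TOTAL_FLOPS, ASSUMED_GPUS, V100_PEAK, ASSUMED_UTIL)
print(f"{days:.1f} days")  # prints "32.4 days"
```

Under these assumptions the run takes roughly a month. Halving either the utilization or the cluster size doubles that figure, so the result is best read as an order-of-magnitude check.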
Conclusion: The total compute required to train GPT-3 was approximately $3.15 \times 10^{23}$ FLOPs.
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Sally has 1 sister.
Here is the breakdown:
Tests an AI's randomness and creativity
I told my computer I needed a break… now it won’t stop sending me KitKat ads.
Why don’t programmers like nature?
It has too many bugs.
I asked the librarian if the library had books on paranoia.
She whispered, “They’re right behind you.”
Why did the scarecrow become a successful motivational speaker?
He was outstanding in his field.
I tried to catch fog yesterday.
Mist.
All 5 jokes are unique.
Here are 5 jokes:
How many are unique? Within this list, all 5 are unique from one another (there are no duplicates). However, if you mean "unique" as in completely original to the world, the answer is 0—these are all classic, well-known "dad jokes" that have been around for a long time!
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
So I’m trying to be a better person, which in 2025 just means I bought a water bottle the size of a toddler. That’s the whole self-improvement plan. It’s not therapy, it’s not reading books. It’s, “If I drink 3 of these, I’m emotionally stable.”
I walk around with this thing like I’m on a hike… in my own apartment. I can’t even sit on the couch without planning a route. I’m like, “Okay, I’ll leave the kitchen at 0900, hit the couch by 0903, water check at 0915, bathroom at 0920, we move as a team, people.”
And every bottle has those inspirational markings on the side now: “9 AM: You’ve got this!” “11 AM: Keep going!” “1 PM: Almost there!” By 3 PM I’m like, “I have done nothing today… except pee 19 times. This bottle is my only coworker. And it’s passive-aggressive.”
I miss when water was just… around. You were thirsty, you drank from a sink like a raccoon. Now if I drink from a faucet people look at me like I just licked a subway pole. “Are you okay? Do you need help? Blink twice if you need a reusable straw.”
I’m also trying to be “mindful.” That’s the new thing. Every app wants you to breathe. I open my phone: notifications, emails, one app’s like, “Have you taken 10 deep breaths today?” I’m like, “No, I’ve been holding my breath since 2016.”
So I downloaded a meditation app. It has this calm voice like, “Notice your thoughts… and let them go.” But my thoughts are like, “Did you pay your taxes? Did you hit reply all? Is your boss mad?” And the app’s like, “Let them float away like clouds.” I’m like, “No, these are thunderstorms. These thoughts have property damage.”
And the app gives me streaks. “You’ve meditated for 3 days in a row!” No I haven’t, I just opened the app and panicked. That shouldn’t count. That’s like saying, “You’ve gone to the gym 5 days in a row!” No, I drove past it in traffic and remembered I’m weak.
Speaking of gyms, why is every gym either a nightclub or a warehouse? It’s either purple lights, DJ in the corner, everyone looks like an action figure… or it’s some cross-training dungeon where tires hang from the ceiling and someone’s flipping a tractor in the background.
I just want a normal place where I can lift something light, breathe loudly, and not be judged by a 19-year-old trainer named Brayden whose entire job is to yell “Let’s go!” while never sweating. He’s walking around in a tank top like a motivational seagull. “Let’s go! One more! You got this!” Sir, you are holding a clipboard. You have not “got” anything.
Even grocery shopping is intense now. There’s a self-checkout, a regular checkout, a fifteen-items-or-less lane, a “drive up” lane, and then the lane for people who remembered their reusable bags and their morals. I always end up in the wrong one. I’ll stand in self-checkout with 40 items, trying to scan broccoli like it’s a DVD.
The machine is never chill. It’s always yelling, “Unknown item in bagging area!” It’s broccoli. It’s the most known item in the bagging area. Meanwhile, the human cashier is just watching, like, “You chose this, champ. You wanted to live in the future.”
Anyway, I gotta go drink three more gallons of water, breathe on purpose, pretend driving past the gym is exercise, and argue with a robot about produce. I think I’m nailing adulthood.
(Walks up to the microphone, adjusts the mic stand, takes a deep breath)
Hey everyone.
I realized recently that I am my own worst enemy. Specifically, when I go to the grocery store.
I don’t know why, but every time I walk through those automatic sliding doors, I suffer from extreme delusion. I walk in saying, "I just need milk." So, I don’t take a shopping cart. I don't even take one of those little hand baskets. Taking a basket is an admission of weakness. I am a grown adult; I can carry a gallon of milk.
Ten minutes later, I am wandering aisle four, sweating, cradling three jars of salsa, a family-sized bag of tortilla chips, a bottle of bleach, and a hot rotisserie chicken. (Mimics awkwardly holding a massive pile of items against his chest). I’m walking through the store looking like I’m rescuing a family of very odd-shaped babies from a burning building. If I drop an onion, it’s gone. It belongs to the store now. I physically cannot bend down to get it without causing a landslide of dairy and poultry.
And then, you finally make it to the front, and you have to face the ultimate test of human patience: The Self-Checkout.
Who designed these things? It’s the only place on earth where you do the company's job for them, for free, and the machine still treats you like you’re pulling off a casino heist.
You scan your first item. Beep. You place it gently in the bag.
(Loud, robotic voice): "UNEXPECTED ITEM IN BAGGING AREA."
It’s a tomato! I just scanned it! The machine literally just acknowledged it! What did it think I was going to put in the bag? A live badger? I’m not trying to launder produce!
So immediately, the giant red light starts flashing above your register. The Siren of Shame. Everyone else in line is looking at you like, "Look at this criminal mastermind trying to steal a single Roma tomato."
Now you have to wait for the one employee who is guarding twelve registers. It’s always a 19-year-old named Tyler. Tyler ambles over, sighs heavily, and types his highly secure, secret nuclear launch code into the screen—which is just 1-1-1-1. He clears the error and looks at you with a mix of pity and absolute disgust.
"You put the bag down too hard, man."
Oh, I'm sorry Tyler. My apologies. I didn't realize I was handling weapons-grade cilantro. I'll be more careful next time I disarm the vegetables.
And the worst part? The absolute worst part is leaving. You paid. You have the receipt. The transaction is legally complete. But there is always that guy standing at the exit doors with a yellow highlighter.
Suddenly, even though I have done absolutely nothing wrong, I forget how to walk like a normal human being. I start walking like a cartoon burglar. (Does an exaggerated, stiff-legged, suspicious walk). I hold the receipt up to my forehead like a search warrant. "Just a guy with his groceries! Nothing to see here, officer!"
Why do I feel guilty? I literally just paid eighteen dollars for cheese, but I’m sweating like I’m smuggling it out in my socks.
Next time, I'm just starving. It's less stressful.
Thank you guys so much, you've been great! Have a good night!
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.