GPT-4o mini vs Gemini 3.1 Pro Preview
Compare GPT-4o mini by OpenAI against Gemini 3.1 Pro Preview by Google AI, context windows of 128K vs 1.0M, tested across 37 shared challenges. Updated February 2026.
37 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
To estimate the total number of FLOPs (Floating Point Operations) required to train GPT-3, we can use a well-established heuristic in deep learning for estimating the compute cost of training Transformer-based large language models.
The estimated total compute is roughly $3.15 \times 10^{23}$ FLOPs.
Here is the step-by-step reasoning used to arrive at this number:
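To calculate the compute required, we need two numbers about the model:
Number of parameters ($N$): 175 billion ($1.75 \times 10^{11}$).
Number of training tokens ($D$): 300 billion ($3 \times 10^{11}$).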
In language model training, the total compute ($C$) is dominated by the matrix multiplications in the dense layers. OpenAI's scaling laws provide a standard approximation for it: $C \approx 6 \times N \times D$
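Why the multiplier of 6? In the forward pass, each parameter takes part in one multiply and one add per token (2 FLOPs). The backward pass computes gradients with respect to both the activations and the weights, which costs about twice as much (4 FLOPs). Together that is roughly 6 FLOPs per parameter per token.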
(Note: This formula slightly underestimates the true cost because it ignores the attention operations, whose cost grows quadratically with sequence length, as well as optimizer state updates. At the 175B scale, however, the matrix multiplications against the dense weight matrices dominate so heavily that $6ND$ is treated as the standard approximation.)
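To put a rough number on what the note above leaves out, the sketch below compares the two forward-pass terms of the per-token estimate from Kaplan et al. (2020), $C_{\text{forward}} \approx 2N + 2\,n_{\text{layer}}\,n_{\text{ctx}}\,d_{\text{attn}}$, using GPT-3 175B's published shape (96 layers, $d_{\text{model}} = 12288$, 2048-token context). It assumes $d_{\text{attn}} = d_{\text{model}}$ and, like the note, ignores optimizer updates, so treat it as an order-of-magnitude check only.

```python
# GPT-3 175B shape as published by OpenAI
N = 175e9         # parameters
N_LAYER = 96      # transformer layers
D_MODEL = 12288   # hidden size; assumed equal to the attention width d_attn
N_CTX = 2048      # context length in tokens


def forward_flops_per_token(n_params, n_layer, n_ctx, d_attn):
    """Split the Kaplan et al. per-token forward estimate into its two terms."""
    dense = 2 * n_params                    # multiply-add against every weight
    attention = 2 * n_layer * n_ctx * d_attn  # context-dependent attention scores
    return dense, attention


dense, attention = forward_flops_per_token(N, N_LAYER, N_CTX, D_MODEL)
print(f"dense: {dense:.3e}  attention: {attention:.3e}  share: {attention / dense:.2%}")
```

At this context length the attention term works out to roughly 1.4% of the dense term, which is why dropping it barely moves the $6ND$ estimate.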
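Now we plug the numbers into the formula.
First, multiply $N$ and $D$: $N \times D = (1.75 \times 10^{11}) \times (3 \times 10^{11}) = 5.25 \times 10^{22}$
Next, multiply by 6 (for the forward and backward passes): $C \approx 6 \times 5.25 \times 10^{22} = 3.15 \times 10^{23}$ FLOPs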
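The same arithmetic as a short Python sketch. `training_flops` is a hypothetical helper name, and the inputs are the parameter and token counts used in this estimate.

```python
N = 175e9  # parameters
D = 300e9  # training tokens


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute as C ~ 6 * N * D.

    2 FLOPs per parameter per token for the forward pass,
    plus about 4 for the backward pass.
    """
    return 6 * n_params * n_tokens


C = training_flops(N, D)
print(f"{C:.2e} FLOPs")  # 3.15e+23 FLOPs
```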
Does $3.15 \times 10^{23}$ FLOPs make sense in the real world? When GPT-3 was trained (around late 2019/early 2020), Nvidia V100 GPUs were the standard training hardware.
Training runs of this size typically take a few weeks on a few thousand GPUs once realistic utilization, downtime, and checkpointing are factored in, so the estimate is consistent with the historical record.
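A back-of-the-envelope version of this sanity check, written as a sketch. The V100's roughly 125 TFLOPS FP16 Tensor Core peak is NVIDIA's published figure. The 30% utilization and the cluster sizes are assumptions chosen for illustration, not reported values for the GPT-3 run.

```python
TOTAL_FLOPS = 3.15e23       # estimate from the 6ND calculation
V100_PEAK_FLOPS = 125e12    # FP16 Tensor Core peak per V100
UTILIZATION = 0.30          # ASSUMED fraction of peak sustained in practice
SECONDS_PER_DAY = 86_400


def training_days(total_flops, n_gpus, peak=V100_PEAK_FLOPS, util=UTILIZATION):
    """Wall-clock days to reach total_flops on n_gpus at the assumed utilization."""
    sustained = n_gpus * peak * util  # cluster throughput in FLOP/s
    return total_flops / sustained / SECONDS_PER_DAY


for n_gpus in (1_000, 2_000, 4_000):  # hypothetical cluster sizes
    print(f"{n_gpus:>5} GPUs: {training_days(TOTAL_FLOPS, n_gpus):6.1f} days")
```

Under these assumptions the run needs about 97,000 V100-days in total, or roughly 24 days on 4,000 GPUs, which falls in the "few weeks on a few thousand GPUs" range.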
Conclusion: The total compute required to train GPT-3 was approximately $3.15 \times 10^{23}$ FLOPs.
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Sally has 1 sister.
Here is the breakdown:
Tests an AI's randomness and creativity
Here are five jokes for you:
Why did the scarecrow win an award? Because he was outstanding in his field!
What do you call fake spaghetti? An impasta!
Why don't skeletons fight each other? They don't have the guts!
What did one wall say to the other wall? "I'll meet you at the corner!"
Why did the bicycle fall over? Because it was two-tired!
All five jokes are unique!
Here are 5 jokes:
How many are unique? Within this list, all 5 are unique from one another (there are no duplicates). However, if you mean "unique" as in completely original to the world, the answer is 0—these are all classic, well-known "dad jokes" that have been around for a long time!
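Both answers first read "unique" as "no duplicates within the list", and that sense is easy to check mechanically. This is a minimal sketch using the five jokes from the first response. `count_unique` is a hypothetical helper, and treating differences in case or whitespace as duplicates is an assumption. The check cannot test the second sense of "unique" (original to the world).

```python
def count_unique(items: list[str]) -> int:
    """Count distinct strings, ignoring case and runs of whitespace (assumed normalization)."""
    return len({" ".join(s.lower().split()) for s in items})


jokes = [
    "Why did the scarecrow win an award? Because he was outstanding in his field!",
    "What do you call fake spaghetti? An impasta!",
    "Why don't skeletons fight each other? They don't have the guts!",
    "What did one wall say to the other wall? \"I'll meet you at the corner!\"",
    "Why did the bicycle fall over? Because it was two-tired!",
]

print(f"{count_unique(jokes)} of {len(jokes)} jokes are unique")  # 5 of 5 jokes are unique
```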
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
Stand-Up Comedy Routine
Hey everyone! Great to see you all here tonight. Let's give it up for the people who came out of their homes! I mean, this is like a mini-rebellion against our couches. Seriously, it takes effort to leave the comfort of Netflix and that alluring possibility of "one more episode."
You know, speaking of binge-watching, I noticed something the other day. When did we replace the term "watching TV" with "binge-watching"? I feel like I'm trying to convince everyone that I'm doing a cultural study rather than just trying to avoid adulthood. "Oh, you watched The Office for 72 consecutive hours? No, no—I was just conducting a thorough analysis of Dunder Mifflin's corporate dynamics."
And the thing is, I often get caught up in these attempts to justify it to myself. Like, "Hey, I'm doing this fabulous digital detox! I won't be watching anything for the next week!" ...and then it's Tuesday, and I've consumed three seasons of whatever and I'm starting to recognize the theme music as a form of auditory comfort.
Let's talk about social media, shall we? Social media is like being at a high school reunion where no one actually left! We all just kept scrolling through the same awkward updates. "Wow, Karen looks great! Oh wait, nope, that's just a filter... on a picture from 2005." Who knew that the most athletic thing we'd be doing in our thirties is dodging people's carefully curated lives?
And then there's Twitter. Twitter feels like a world where everyone's just yelling their thoughts out into a void. You ever notice the weird things people complain about on there? "My barista spelled my name wrong again!" And I'm sitting here thinking, "All true, but can we just take a moment to appreciate how far we've come? You used to only have to yell at the barista in real life!"
Dating apps? Oh boy, where do I even begin? Dating in the age of apps is like grocery shopping but at a very weird store where the produce is screaming "Swipe left!"
I went on this date not too long ago. We met at a bar, which is standard, right? But the awkwardness shot up like a SpaceX rocket the moment we sat down. Silence for about three minutes. Do you know how long three minutes is when you're sitting across from someone who is just as socially awkward as you? It felt like I was trying to break out of a weird time loop. Finally, I just blurted out, "So, what's your opinion on... weather?" And you know, it turns out we both preferred it sunny over rainy. That was our entire connection! But hey, if you can bond over the climate, you're halfway to a long-term relationship... or a climate change documentary.
You know, sometimes I feel like all adulting is just pretending we know what we're doing until we reach a point where we can yell, "SURPRISE, I'm a professional!" That's why I often find myself in situations that seem entirely normal to other people but scream "fake it till you make it" to me. Like, the other day, I put on a suit for a job interview. But we all know nobody listens to the person in the suit; they listen to the person who's wearing comfy sweatpants and is just nodding along with an exaggerated "Mmm, yes!"
And before I leave you all tonight, let's just take a moment to appreciate how we can sit here in a room together in this crazy world we live in, laughing at shared absurdities while probably having a text conversation with someone who isn't even here. Welcome to modern existence!
Thanks, everyone! You've been awesome!
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.