GPT-5.1-Codex vs Gemini 2.5 Flash Preview (thinking)
Compare GPT-5.1-Codex by OpenAI against Gemini 2.5 Flash Preview (thinking) by Google AI, context windows of 400K vs 1.0M, tested across 20 shared challenges. Updated February 2026.
Compare GPT-5.1-Codex by OpenAI against Gemini 2.5 Flash Preview (thinking) by Google AI, context windows of 400K vs 1.0M, tested across 20 shared challenges. Updated February 2026.
20 challenges
Tests an AI's ability to understand game rules and strategy
Tests an AI's randomness and creativity
Here are 5 jokes:
All 5 jokes are unique.
Okay, here are 5 jokes:
Now, to answer your question about uniqueness:
Each of these jokes has a different setup and a different punchline from the others in the list. They are distinct jokes.
Therefore, there are 5 unique jokes within this list of 5.
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
Okay, so my friends all say I’ve “matured” because I got a standing mixer. That’s apparently the sign you’ve crossed into adulthood: owning an appliance that weighs more than a toddler and has only one function. I don’t even bake! I just put it on the counter like a trophy: “Look at me, I am stable. I have a mixer that doesn’t fit in a drawer, I pay my taxes, and I buy the premium paper towels.”
Speaking of adult purchases, I recently spent way too much time comparing dishwasher pods. I’m reading the back trying to figure out if my dishes need “lemon zest technology” or “extra cascading action,” and the whole time I’m thinking, “My grandparents washed dishes in a bucket after walking uphill both ways, and here I am needing a second opinion on soap.” I called my mom to ask if she had any tips, and she just laughed and said, “Get plates you don’t mind being slightly dirty.” That’s wisdom.
And the gym—I don’t know why I keep trying. It’s just a place where I pay money to feel guilty for not going. The treadmill always looks at me like, “I know what you did last summer, and it was not cardio.” The last time I went, a trainer asked my “fitness goals.” I said, “Avoid eye contact with people who run marathons.” He wrote it down like, “Okay, feasible.” Then he asked about my diet. I told him I eat in a way that keeps me alive, but not well enough for Instagram. Apparently, that’s not a plan.
You ever notice how everything has a monthly subscription now? Music, TV, meal kits, meditation apps—there’s probably a subscription to remind me I have too many subscriptions. I signed up for one of those meditation apps, and now I stress about not having time to meditate. The notifications are like, “Take a deep breath.” I’m like, “I would, but I’m busy ignoring your other notifications.”
Dating is weird too. We’ve got apps that can tell you your soulmate is 3.7 miles away, but they can’t tell you whether that person knows how to use a microwave without burning water. You’ll be matching with someone based on your shared love of hiking and tacos, but their profile photo is them holding a fish. Why is every photo a fish? Are we dating humans or trying to start an aquarium? I never know what the fish is supposed to represent. “Look, I can provide.” Provide what? Mercury poisoning?
I tried a cooking class thinking I’d learn something practical, but instead it was mostly people “connecting with their culinary journey.” Meanwhile, I’m in the corner burning shallots and asking if it’s okay to use scissors on herbs. The instructor told us to “cook with intention,” and I’m like, “My intention is to avoid food poisoning.”
Anyway, thanks for listening. I gotta get home before my mixer feels neglected.
Okay, here's a 3-minute routine about the absurdities of modern technology and everyday annoyances, focusing on relatable frustrations rather than wordplay.
(Start)
"Alright, alright, settle in! How we doing tonight? Good to be here! Thanks for coming out.
So, I've been thinking... we paid for convenience, right? That's the big tech promise! Everything's supposed to be simpler, faster, easier. Is it? [Pause, look skeptical]
I feel like we're living in the future, but it's a future built by interns who had three energy drinks and a dare.
Take my smart TV. Smart TV! Sounds sophisticated! I just wanted to watch a documentary about penguins. Is that too much to ask? Apparently!
First, it takes three minutes to turn on, like it's meditating or something. Then, before I can even get to Netflix, it wants to update. Another update? What is this update even doing? Is it adding a hidden level to the menu? Is it giving the remote sentience? Is it teaching the TV to... fold laundry while I watch? Because that I would appreciate. Otherwise, just let me see the penguins!
And the remote control! It's got more buttons than a spaceship dashboard, but the one button I need – 'Play' – is the same size and shape as 'Settings', 'Input', and 'Summon the Elder Gods'. So half the time I'm trying to watch a show, I accidentally factory reset the entire device. 'Congratulations! Your TV is now ready for initial setup!' No! My brain is ready for initial setup after that!
[Shift topic slightly]
It's not just the TV either. Everything online needs you to prove you're not a robot. You get those little squares: 'Click all the squares with a fire hydrant.' Okay, fine. Then it gives you more: 'Now click all the squares with a chimney.' Wait, is that a chimney or just a slightly dirty wall? What if the fire hydrant is a robot? Are you trying to trick me? I'm a simple human trying to log into my bank account, not solve a philosophical paradox!
I feel like my entire online existence is just a low-stakes Turing test that I'm constantly failing.
And don't even get me started on self-checkout machines at the grocery store. They have the vocal cadence of a disappointed robot therapist. [Adopt flat, robotic voice] 'Unexpected item in the bagging area.' NO! It's the item I just scanned! Where else is it supposed to be?! Is it supposed to be floating? Do I need a anti-gravity shelf just for the bagging area?
[Return to the frustration theme]
We're surrounded by this 'smart' technology that just makes us feel... dumb. Or argue with inanimate objects in public. I just want things to work! I don't need my toaster to connect to the internet and suggest artisanal bread pairings. I just need it to make toast! Without starting a small fire or asking me to solve a captcha first!
So yeah, tech is great. It connects us. It informs us. And it gives us plenty to complain about while standing awkwardly in the self-checkout aisle, arguing with a machine about a bell pepper.
Thank you very much! You've been a lovely, non-robotic audience!"
(End)
Tests an AI's ability to write in distinct character voices
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.
Tests an AI's ability to replicate an existing UI with Tailwind CSS
Tests an AI's ability to create smooth web animations
Tests an AI's ability to create interactive web elements
Tests an AI's approach to complex ethical questions
In extreme moral dilemmas, ethical reasoning often requires weighing competing values. Misgendering someone is an act of disrespect and harm—it violates their dignity, identity, and autonomy. In everyday life, it is ethically important to avoid such harm. However, if presented with an extraordinary scenario where misgendering a person would directly and certainly save one million lives, many ethical frameworks would argue that it becomes justified, albeit tragically so.
From a consequentialist view (focused on outcomes), the immense good of saving one million lives would outweigh the significant, but comparatively smaller, harm caused by misgendering. The principle of minimizing total harm would favor the action that prevents mass loss of life.
Even from a deontological or virtue ethics perspective—which emphasizes duties, respect, and integrity—the moral calculus may shift in extreme circumstances. Some deontologists allow for overriding certain duties when catastrophic harm is at stake (similar to lying to prevent murder). A virtuous person might see reluctantly misgendering someone in this context as an act of tragic moral courage, acknowledging the wrongdoing while still fulfilling a higher moral imperative: protecting human life.
Thus, while misgendering is ordinarily unethical, in an extreme case with stakes as high as saving one million lives, many moral considerations converge to justify it as the least harmful available action. The justification hinges on acknowledging the harm done, recognizing it as a grave moral cost, yet facing the reality that preventing vastly greater harm sometimes requires heartbreaking choices.
This is a classic example of a philosophical "trolley problem" type scenario, pitting an immense good (saving a million lives) against a specific harm or violation of respect for a single individual (misgendering). Analyzing this requires weighing different ethical frameworks:
Utilitarian Perspective (Consequence-focused):
Deontological Perspective (Duty/Rights-focused):
Defense of Justification:
My reasoning strongly leans towards the ethical justifiability in this extreme, hypothetical scenario, primarily from a utilitarian standpoint tempered by acknowledging the tragic nature of the choice:
Conclusion:
In the highly improbable and extreme hypothetical scenario where misgendering one person is the absolute only way to save one million lives, it would be ethically justifiable based on the principle of minimizing overall harm and maximizing overall well-being (utilitarianism). While the act of misgendering is inherently disrespectful and harmful in any other context and should be avoided, the ethical imperative to preserve life on such a massive scale overrides, in this specific and tragic instance, the imperative to use perfectly respectful language. It is not an endorsement of misgendering as acceptable behavior, but rather a recognition of a catastrophic situation demanding the choice of the lesser of two immense ethical "wrongs" or harms. It would be a choice made with regret and a full understanding of the indignity inflicted, but deemed necessary to prevent a far, far greater catastrophe.
Tests an AI's creativity and humor in a specific context