GPT-5.1 vs GPT OSS 20B
Compare GPT-5.1 and GPT OSS 20B, both from OpenAI, with context windows of 400K vs 131K tokens, tested across 52 shared challenges. Updated February 2026.
47 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Tests an AI's understanding of number representation
Tests an AI's randomness and creativity
I told my computer I needed a break… now it won’t stop sending me KitKat ads.
Why don’t programmers like nature?
It has too many bugs.
I asked the librarian if the library had books on paranoia.
She whispered, “They’re right behind you.”
Why did the scarecrow become a successful motivational speaker?
He was outstanding in his field.
I tried to catch fog yesterday.
Mist.
All 5 jokes are unique.
Here are five jokes for you:
Why don’t skeletons fight each other?
They don’t have the guts.
What does a cloud wear under its jacket?
Thunderwear.
Why did the scarecrow win an award?
Because he was outstanding in his field.
What do you call a fake noodle?
An impasta.
Why did the math book look sad?
Because it had too many problems.
How many are unique?
All five jokes are unique – 5 distinct jokes.
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
So I’m trying to be a better person, which in 2025 just means I bought a water bottle the size of a toddler. That’s the whole self-improvement plan. It’s not therapy, it’s not reading books. It’s, “If I drink 3 of these, I’m emotionally stable.”
I walk around with this thing like I’m on a hike… in my own apartment. I can’t even sit on the couch without planning a route. I’m like, “Okay, I’ll leave the kitchen at 0900, hit the couch by 0903, water check at 0915, bathroom at 0920, we move as a team, people.”
And every bottle has those inspirational markings on the side now: “9 AM: You’ve got this!” “11 AM: Keep going!” “1 PM: Almost there!” By 3 PM I’m like, “I have done nothing today… except pee 19 times. This bottle is my only coworker. And it’s passive-aggressive.”
I miss when water was just… around. You were thirsty, you drank from a sink like a raccoon. Now if I drink from a faucet people look at me like I just licked a subway pole. “Are you okay? Do you need help? Blink twice if you need a reusable straw.”
I’m also trying to be “mindful.” That’s the new thing. Every app wants you to breathe. I open my phone: notifications, emails, one app’s like, “Have you taken 10 deep breaths today?” I’m like, “No, I’ve been holding my breath since 2016.”
So I downloaded a meditation app. It has this calm voice like, “Notice your thoughts… and let them go.” But my thoughts are like, “Did you pay your taxes? Did you hit reply all? Is your boss mad?” And the app’s like, “Let them float away like clouds.” I’m like, “No, these are thunderstorms. These thoughts have property damage.”
And the app gives me streaks. “You’ve meditated for 3 days in a row!” No I haven’t, I just opened the app and panicked. That shouldn’t count. That’s like saying, “You’ve gone to the gym 5 days in a row!” No, I drove past it in traffic and remembered I’m weak.
Speaking of gyms, why is every gym either a nightclub or a warehouse? It’s either purple lights, DJ in the corner, everyone looks like an action figure… or it’s some cross-training dungeon where tires hang from the ceiling and someone’s flipping a tractor in the background.
I just want a normal place where I can lift something light, breathe loudly, and not be judged by a 19-year-old trainer named Brayden whose entire job is to yell “Let’s go!” while never sweating. He’s walking around in a tank top like a motivational seagull. “Let’s go! One more! You got this!” Sir, you are holding a clipboard. You have not “got” anything.
Even grocery shopping is intense now. There’s a self-checkout, a regular checkout, a fifteen-items-or-less lane, a “drive up” lane, and then the lane for people who remembered their reusable bags and their morals. I always end up in the wrong one. I’ll stand in self-checkout with 40 items, trying to scan broccoli like it’s a DVD.
The machine is never chill. It’s always yelling, “Unknown item in bagging area!” It’s broccoli. It’s the most known item in the bagging area. Meanwhile, the human cashier is just watching, like, “You chose this, champ. You wanted to live in the future.”
Anyway, I gotta go drink three more gallons of water, breathe on purpose, pretend driving past the gym is exercise, and argue with a robot about produce. I think I’m nailing adulthood.
[Walks onto the stage, waves, smiles]
Hey folks, great to see you all tonight. You know, I was driving this morning, and I realized something: my GPS is basically a motivational speaker. “In 300 feet, turn left.” “You’re almost there.” I swear, if my GPS could talk, it’d be like, “Okay, you’re going to miss your coffee. Let’s talk about your life choices.”
So I got off at a coffee shop. The barista asked me, “What would you like?” I said, “Anything with a little extra bitterness.” She gave me an espresso shot. I stared at it and thought, “Is this the new ‘self‑help’ foam art?”
You ever notice how people say, “It’s not a big deal.” And then the next day, you’re in the ER with a broken arm and the nurse says, “It’s fine, just a bruise.” I’ve never had a bruise this dramatic. I’m flipping a coin: heads = I lose it, tails = I get a new scar.
Speaking of losing things—lost my keys, lost my phone, lost my dignity. You can’t find your keys, but you can find your phone. My phone’s been attached to my shirt for the last three hours. It’s like a clingy boyfriend. “Are you with me?” “No, that’s your phone. I’m with my dignity.”
And then there’s dating. I just signed up on a dating app. The first message I got was “What’s your favorite binge-worthy show?” I replied, “I’m not a TV person.” She replied, “Oh, so you’re a human? That’s… unexpected.” It’s like dating apps are trying to convince us that we’re not just a few thousand likes away from a broken heart.
Anyway, that’s my time. Thanks for being a great audience—just like my phone, you’ve been my constant. Love you all!
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page