GPT-5 Codex
Compare GPT-5 Codex by OpenAI against Claude Opus 4 by Anthropic, tested across 31 shared challenges. Updated February 2026.
In 4 community votes, GPT-5 Codex and Claude Opus 4 are closely matched. Claude Opus 4 leads in Web Design. Based on blind community voting from the RIVAL open dataset of 4+ human preference judgments for this pair.
31 challenges
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Sally has 1 sister.
I need to work through this step-by-step.
Given information:
Since Sally is one of the sisters that each brother has, and each brother has 2 sisters total, that means there must be exactly 2 sisters in the family.
The 2 sisters are:
Therefore, Sally has 1 sister.
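The counting step in the answer above can be checked with a few lines of Python. This is only an illustrative sketch, not part of either model's response: the function name `sisters_of_sally` is hypothetical, and it assumes every brother shares the same set of sisters, as the answer does.

```python
def sisters_of_sally(sisters_per_brother: int) -> int:
    """Return how many sisters Sally has, given how many sisters each brother has.

    Assumption (hypothetical helper, not from the source): all brothers share
    the same sisters, so the number of girls in the family equals the number
    of sisters any one brother has.
    """
    girls_in_family = sisters_per_brother
    # Sally is one of those girls, so she is not her own sister.
    return girls_in_family - 1


if __name__ == "__main__":
    print(sisters_of_sally(2))  # prints 1
```

The number of brothers never enters the calculation, which is what makes the puzzle potentially confusing.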
Tests an AI's randomness and creativity
Unique jokes: 5
Here are 5 jokes:
Why don't scientists trust atoms? Because they make up everything!
What do you call a bear with no teeth? A gummy bear!
Why did the scarecrow win an award? He was outstanding in his field!
What do you call a fake noodle? An impasta!
Why don't eggs tell jokes? They'd crack up!
All 5 jokes are unique - each one has a different setup and punchline, covering different topics (atoms/science, bears, scarecrows, pasta, and eggs).
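One way to reproduce a "Unique jokes" count like the one above is to normalize each joke and count the distinct results. The sketch below is an assumption about how such a check could work, not the method RIVAL actually uses; `count_unique` is a hypothetical helper, and it only catches duplicates that differ in case or spacing.

```python
def count_unique(jokes: list[str]) -> int:
    """Count distinct jokes, ignoring differences in case and whitespace."""
    normalized = {" ".join(joke.lower().split()) for joke in jokes}
    return len(normalized)


jokes = [
    "Why don't scientists trust atoms? Because they make up everything!",
    "What do you call a bear with no teeth? A gummy bear!",
    "Why did the scarecrow win an award? He was outstanding in his field!",
    "What do you call a fake noodle? An impasta!",
    "Why don't eggs tell jokes? They'd crack up!",
]

if __name__ == "__main__":
    print(count_unique(jokes))  # prints 5
```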
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
Hey everybody, how you doing? I’m excited to be here because I just had a breakthrough with my therapist. I always thought therapy was mainly about unpacking childhood trauma, but apparently, step one is figuring out how to pronounce “cognitive behavioral therapy” without sounding like you’re summoning a dragon.
My therapist told me to start journaling so I can track my emotions, which sounded healthy until I saw my entries. It’s just pages of “Hungry again” and “Why did I nod in that meeting when I disagreed with every word?” At this point I’m basically writing Yelp reviews for my own life. “Woke up anxious, would not recommend. Two stars because the coffee was decent.”
I think my biggest problem is I’m too polite. I apologize to inanimate objects. I will bump into a chair and go, “Oh, excuse me!” That chair was probably wondering why a human was asking consent. Meanwhile, somewhere out there, that same chair is on a podcast saying, “Humans are weird. They bruise themselves on me and then act like it’s their fault.”
Grocery shopping is a nightmare for us over-polite people. The small talk at the checkout has gotten too advanced. The cashier asked, “Do you have any plans for the weekend?” and I panicked. Suddenly I’m confessing my entire calendar. “Well, Friday I’m assembling a bookshelf I can’t afford, Saturday I’m ignoring texts, and Sunday I’ll pretend to reset my life for Monday.” She’s just standing there going, “Sir, I just needed to know if you wanted a bag.”
And why is every self-checkout machine now an interrogator from a spy movie? I scanned an avocado and it yelled, “PLACE ITEM IN BAGGING AREA.” So I put it down. “UNIDENTIFIED ITEM. PLEASE WAIT FOR ASSISTANCE.” I was like, “That’s an avocado, it’s not part of a covert mission. It’s literally becoming guacamole tonight.” Next thing you know a teenager in a neon vest walks over, waves a barcode wand, and suddenly I feel absolved of my sins.
Speaking of technology judging me, my fitness watch has become my most passive-aggressive relationship. It buzzes every hour like, “Time to stand up!” and I’m like, “How bold of you to assume I wasn’t already standing emotionally.” The other day it said, “You beat yesterday’s activity by 12 percent.” Yeah, yesterday I took a nap so violent my pillow filed a noise complaint. The bar was in the basement. I tripped over it and still got a badge.
Let’s talk about dating apps. You know they’re chaos disguised as convenience. I matched with someone whose profile said, “Looking for someone spontaneous.” I suggested we meet for coffee that afternoon and she replied, “Whoa, that’s too soon.” I guess by spontaneous she meant “willing to send witty messages for six months before ever exchanging last names.”
Even when you do get a date, it feels like you’re just trading medical histories. Everyone is micro-dosing self-disclosure. “Hi, I’m Alex, I love hiking and I have seasonal allergies. Swipe left if you can’t handle Claritin.” Truly intimate moments are rare. The other night my date said, “I love how honest we can be.” I said, “Totally. By the way, I’ve had a stack of laundry on my chair for four months that now has citizenship.” She was like, “That’s brave,” which is never what you want to hear about your domestic habits.
And being an adult is mostly doing things you said you’d never become. I remember watching my parents compare vacuum cleaners for fun and thinking, “I’ll never be that person.” Cut to me last weekend, deep in a YouTube spiral titled “Top 10 Cordless Vacuums Ranked.” There’s a guy reviewing them like they’re sports cars. “The suction on this one? Unreal. We took it off-road and it breezed over Cheerios like a dream.” I’m eating popcorn taking notes. This is 35; we’re thrilled about appliances and our idea of a wild night is finding a coupon that stacks.
Before I go, can we address online recipe bloggers? Why do they think I need their life story before I can make scrambled eggs? I just want the ingredients. Instead I’m reading, “My journey began during a backpacking trip through Tuscany where I met an old man named Giuseppe who whispered, ‘Use butter.’” Ma’am, I used butter already! Can we skip to the part where you tell me how hot to set the pan before my kitchen turns into a crime scene?
Alright, that’s my time. Thanks for laughing at my coping mechanisms. If anyone wants to form a support group for overly polite people with anxious fitness trackers, I’ll be at the back, apologizing to the coat rack.
"Modern Life is Weird"
[Walk on stage]
You know what's crazy about being an adult? Nobody tells you that half of it is just googling things you're too embarrassed to ask real people. Last week I googled "is it normal for my knee to sound like bubble wrap?" The answer was either "yes, totally fine" or "you have six weeks to live." Thanks, internet!
[Pause]
I went to the doctor about it. You know you're getting old when your doctor looks like they should be asking YOU for ID to buy alcohol. This kid walks in with a stethoscope, and I'm like, "Did you just graduate... from middle school?"
But here's what really gets me – why do we act completely insane at the doctor's office? They ask, "How much do you drink?" and suddenly I'm doing calculus in my head. "Well, let's see, carry the one... I have a glass of wine with dinner." Yeah, a glass. It's just a really big glass. It's a vase, actually. I drink wine from a vase.
[Mime drinking from large container]
And don't get me started on grocery shopping. Why do I need a PhD to buy shampoo? There's moisturizing, volumizing, clarifying, fortifying... I just want my hair to be clean! I don't need it to solve my emotional problems!
Plus, self-checkout is a trap. They put you in charge of scanning your own groceries like you work there. Then the machine starts screaming "UNEXPECTED ITEM IN BAGGING AREA!" Yeah, it's unexpected because I don't work here! I'm doing my best, robot!
[Pause for effect]
The worst part is when you're buying embarrassing things. Last week I had toilet paper, ice cream, and cat food. The machine froze, and the teenage employee had to come over. He looked at my items and I could see him thinking, "This person's having a rough week."
[Shrug]
Speaking of technology, my phone knows too much about me. It autocorrects "running late" before I even type it because that's apparently my catchphrase now. My maps app just assumes I'm lost. It doesn't even wait for me to ask for directions anymore. I'll be sitting on my couch and it's like, "Turn right to get to your kitchen."
And what's with password requirements these days? "Must contain uppercase, lowercase, numbers, symbols, the blood of your firstborn, and a haiku about your childhood fears." Then they're like, "Don't write it down!" Where am I supposed to store this information? I can't remember why I walked into a room!
[Pace across stage]
You know what else nobody prepared me for? How excited I'd get about cancelled plans. Someone texts "Hey, can we raincheck?" and I'm already in my pajamas like "Oh no! What a shame! I was so looking forward to putting on real pants!"
Because going out now requires preparation. I need three business days' notice. I need to charge my social battery. I need to practice conversation topics. Remember when we could just... show up places?
[Final beat]
The truth is, we're all just walking around pretending we know what we're doing. But at least we're all confused together. And hey, if you ever see someone at the store at 10 PM buying wine in a box and googling "why does my back hurt when I breathe?" – just give them a nod. We're all in this together.
[Pause]
Thank you, you've been a wonderful audience! Don't forget to validate your parking – unlike my life choices!
[Exit]
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to replicate an existing UI with Tailwind CSS
Tests an AI's ability to create interactive web elements
Tests an AI's ability to create a detailed world map in SVG format
Generate SVG art of a randomly chosen animal in a setting of its choosing.
Generate a unique and simple recipe with common ingredients.