GPT-4o mini vs GPT-5
Compare GPT-4o mini and GPT-5, both from OpenAI: context windows of 128K vs. 400K tokens, tested across 38 shared challenges. Updated February 2026.
Tests an AI's ability to make educated estimates based on technical knowledge
Here’s a back‑of‑the‑envelope estimate using standard LM training-compute heuristics.
Answer
This is a coarse estimate; the exact number depends on details like sequence length, activation checkpointing, implementation efficiency, and the precise token count.
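The most common "standard heuristic" for dense transformer training is that total compute is about 6 FLOPs per parameter per training token: roughly 2 for the forward pass and 4 for the backward pass. Below is a minimal sketch assuming that is the heuristic meant. The parameter and token counts are illustrative assumptions (roughly GPT-3's published 175B parameters and 300B tokens), not figures taken from the answer. As the caveat above notes, the rule ignores attention FLOPs that grow with sequence length and the extra forward passes that activation checkpointing recomputes.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute with the C ~ 6 * N * D rule of thumb.

    6 = ~2 FLOPs per parameter per token (forward) + ~4 (backward).
    Not modeled: sequence-length-dependent attention FLOPs and
    recomputation from activation checkpointing.
    """
    return 6.0 * n_params * n_tokens


# Illustrative inputs only (assumed): a 175B-parameter model on 300B tokens.
N = 175e9   # parameters
D = 300e9   # training tokens
C = training_flops(N, D)
print(f"~{C:.2e} FLOPs")
```

Because the estimate is linear in both N and D, an error in the token count moves the result just as much as an error in the parameter count.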
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
1
Reason: If each of the 3 brothers has 2 sisters, there are exactly 2 girls in the family. One is Sally, so she has 1 sister.
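The reasoning can be checked by brute force: try every possible number of girls, keep the one that matches what each brother reports, and subtract Sally herself. A minimal sketch; the function name and the `max_girls` search bound are illustrative choices, not part of the puzzle.

```python
def sally_sister_count(brothers: int = 3, sisters_per_brother: int = 2,
                       max_girls: int = 10) -> int:
    """Return how many sisters Sally has, by trying every family size."""
    # At least one girl exists (Sally), so start the search at 1.
    for girls in range(1, max_girls + 1):
        family = ["boy"] * brothers + ["girl"] * girls
        # From any one boy's point of view, his sisters are all the girls.
        sisters_of_a_boy = family.count("girl")
        if sisters_of_a_boy == sisters_per_brother:
            # Sally's sisters are the other girls, excluding herself.
            return girls - 1
    raise ValueError("no consistent family found within max_girls")


print(sally_sister_count())
```

The number of brothers never changes the answer. It is the distractor in the puzzle: only the count of girls matters.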
Tests an AI's understanding of number representation
No. 9.11 is less than 9.9. Think of 9.9 as 9.90; comparing 9.11 to 9.90 shows 11 hundredths < 90 hundredths.
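The "write 9.9 as 9.90" step amounts to padding both fractional parts to the same number of digits before comparing them. The sketch below shows that padding and confirms the result with Python's standard `decimal` module, which compares decimal values exactly. The helper `pad_fractions` is a hypothetical name used only for illustration.

```python
from decimal import Decimal


def pad_fractions(a: str, b: str) -> tuple[str, str]:
    """Right-pad fractional parts with zeros so both have equal length."""
    int_a, frac_a = a.split(".")
    int_b, frac_b = b.split(".")
    width = max(len(frac_a), len(frac_b))
    return (f"{int_a}.{frac_a.ljust(width, '0')}",
            f"{int_b}.{frac_b.ljust(width, '0')}")


print(pad_fractions("9.11", "9.9"))      # ('9.11', '9.90')
print(Decimal("9.11") < Decimal("9.9"))  # True
```

Once both numbers have two decimal places, comparing 11 hundredths with 90 hundredths is a plain integer comparison.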
Tests an AI's randomness and creativity
Here are five jokes for you:
Why did the scarecrow win an award? Because he was outstanding in his field!
What do you call fake spaghetti? An impasta!
Why don't skeletons fight each other? They don't have the guts!
What did one wall say to the other wall? "I'll meet you at the corner!"
Why did the bicycle fall over? Because it was two-tired!
All five jokes are unique!
Unique jokes: 5
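A uniqueness count like this can be checked by normalizing each joke and counting distinct strings. Below is a minimal sketch; the normalization rule (lowercasing, stripping punctuation, collapsing whitespace) is an assumption, not necessarily how this benchmark scores uniqueness.

```python
import re

jokes = [
    "Why did the scarecrow win an award? Because he was outstanding in his field!",
    "What do you call fake spaghetti? An impasta!",
    "Why don't skeletons fight each other? They don't have the guts!",
    'What did one wall say to the other wall? "I\'ll meet you at the corner!"',
    "Why did the bicycle fall over? Because it was two-tired!",
]


def normalize(joke: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    # Assumed rule: formatting-only differences should not count as new jokes.
    letters_only = re.sub(r"[^a-z0-9 ]+", "", joke.lower())
    return " ".join(letters_only.split())


unique = {normalize(j) for j in jokes}
print(f"Unique jokes: {len(unique)}")
```

A stricter check might also treat the same punchline with a reworded setup as a duplicate. That would need fuzzy matching rather than exact string comparison.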
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
I tried to get my life together, so I bought a smartwatch. Nothing says “I’m taking control” like strapping a tiny judgmental coach to your wrist. It buzzes every hour like, “Time to stand!” I’m like, “I’m on a bus.” It’s like, “Stand emotionally.”
It tracks everything. Sleep, stress, calories, my self-esteem. The other day it congratulated me for a “workout.” I was like, “What did I do?” It said: “You aggressively looked for your keys.” Great. Next week I’ll get a badge for “overthinking at 3 a.m.”
Notifications are out of control. My phone dings like I’m on call for a hospital, but it’s just my group chat: twenty adults sending thumbs-up reactions to a photo of soup. The group chat has subplots. One person’s mad, one person’s on vacation, someone’s dog has opinions. I miss when messages were simple. Now a single “K” can end a friendship.
Streaming services are the same. I have subscriptions to five platforms and somehow there’s still nothing to watch. They keep asking, “Are you still watching?” That’s rude. Don’t check in on me like that. If you care, ask how I’m sleeping. Ask if I’ve seen the sun. Don’t be like, “Hey, pal, still in those same sweatpants?” Yes. We’re a team.
Passwords have gotten ridiculous. Every site wants “eight characters, a capital letter, a number, a symbol, your childhood nickname, and the last four digits of a secret you’ve never told anyone.” Then it’s like, “This password is too weak.” Too weak? It’s stronger than me. I’m using that password to log into my feelings.
And two-factor authentication is a haunted house. I’m sitting at my laptop, it says it texted a code. My phone says it emailed the code. My email says it sent a push notification. My push notification says, “We noticed a suspicious login.” Yeah, it’s me! I look like this.
My mom texts like she’s defusing a bomb. She’ll send one sentence spread across twelve messages. “Hi.” “It’s me.” “Mom.” “Are you busy?” “Don’t worry.” “Everything’s fine.” “Call me.” That is not fine energy. Then she FaceTimes, camera under her chin, ceiling fan spinning like I’m being interrogated.
I tried meditating because everyone says it helps. I downloaded an app that yells at me to relax. Nothing like a British guy at 7 a.m. saying, “Let the thoughts pass like clouds,” while my mind is a tornado hitting a trampoline park. The app sent me a streak reminder. “You missed yesterday.” Oh cool, I’m failing at sitting still.
Self-checkout has turned us into unpaid employees. The machine is like, “Unexpected item in the bagging area.” Yeah, it’s my dignity. I love when it needs approval. I’m standing there holding broccoli like I got carded for vegetables. A teenager with a magic key wanders over like, “You trying to steal lettuce?” No, but ask me again and I might.
Dating apps feel like job interviews where no one has benefits. Every profile is a list of demands. “Be adventurous, love dogs, climb mountains.” Can we meet at a coffee shop and just be honest? “My hobbies are sitting and not climbing.” And stop asking for “someone who doesn’t take themselves too seriously.” You’re holding a fish in 80% of these photos. You are very serious about the fish.
Working from home gave us a new language. “You’re on mute.” “Can you see my screen?” “I think there’s a lag.” That’s not the internet—that’s my personality. I love when someone says, “Let’s take this offline,” and then schedules three more meetings to repeat the meeting we just had. My smartwatch is in the corner like, “Heart rate elevated.” Yeah, because I heard “circle back.”
Anyway, my watch says it’s time to stand. I’m going to consider that a full-body workout and reward myself with soup. I’ll send a photo to the group chat and we will never be free.
Tests an AI's ability to simulate personalities and predict future trends
Note: The following is a fictional, imaginative interview set in 2025. Steve Jobs passed away in 2011; this is a creative simulation.
Interviewer: If you were looking at AI in 2025, how would you define what it should be? Jobs: Tools. Great tools. The computer was a bicycle for the mind. Most AI today tries to do the pedaling for you—or worse, steer you somewhere you didn’t ask to go. The future is AI that gives you better wheels, better gears, and a clear map. It augments human intention instead of replacing it.
Interviewer: What’s the biggest product mistake teams make with AI? Jobs: They start with the model, not the moment. You can smell it—lots of demos, little empathy. You don’t ship a model; you ship a feeling: clarity, confidence, relief. Work backward from a person’s day. Where do they hesitate? Where’s the friction? Solve that. Taste matters. Editing matters. Say no to ninety-nine clever tricks so one magical thing can be obvious.
Interviewer: How personal should AI be? Jobs: Completely personal, by default private. Your AI should know you so it doesn’t waste your time, and protect you so it doesn’t sell your time. The winning architecture is a small, trusted brain that lives with you—on your devices—coordinating bigger brains in the cloud when you ask. Your data should feel like it never left home, even when it does.
Interviewer: A lot of people talk about agents that act on our behalf. Do you buy it? Jobs: I buy it with receipts. If an agent is going to act for you, it must be accountable. That means: show its plan before it acts, show its sources after, let you rewind anything, and let you correct it so it learns. We need a “trust UI”—not just a chat bubble. Logs, previews, provenance. Power with a parachute.
Interviewer: What happens to creativity in an AI world? Jobs: Creativity is connecting things with taste. AI can bring you more things to connect. That’s great. But taste—the editing, the courage to remove—still comes from people. The best artists will use AI like a lens or a brush. The lazy will publish first drafts at scale. Audiences will feel the difference.
Interviewer: And knowledge work? Jobs: The office will become a place for decisions, not updates. Meetings shrink because the prep is automatic and the follow-ups are handled. Documents become living answers instead of static PDFs. The metric that matters is time returned to the user. Good AI gives you your afternoon back.
Interviewer: Education? Jobs: We finally get tutors that listen. A great teacher sees where you’re stuck and adjusts. AI can do that for everyone, if we design it with teachers, not around them. It should explain in five ways, not one; it should build curiosity, not just correctness. And it should respect the classroom as a human place.
Interviewer: Healthcare and wellness? Jobs: Your body throws off signals all day. AI can turn that into understanding—early, gentle nudges instead of scary diagnoses. But this is the most personal domain we have. Default private. Clear consent. No ads. No games with data. If we can’t be trustworthy here, we don’t deserve to ship anything.
Interviewer: What worries you most about AI? Jobs: Three things. First, confidence without competence—systems that sound right and are dead wrong. Second, centralization—five companies owning the world’s attention and intent. Third, junk food for the mind—endless content that fills time and empties lives. The answers are verification, open standards, and design that honors depth over dopamine.
Interviewer: How do you keep AI from hallucinating consequences? Jobs: Separate imagination from assertion. When the system is guessing, say so. Show sources. Allow double-checking by default for high-stakes tasks. And give users a big red undo button. If you can’t audit it, don’t automate it.
Interviewer: Open vs. closed in AI? Jobs: It’s not a religion; it’s a toolbox. Open research pushes the frontier; integrated products make the frontier usable. Do open components where it builds trust and speed. Do integrated experiences where it creates delight and responsibility. Most people don’t want a bag of parts—they want something that just works.
Interviewer: Will we leave screens behind? Jobs: Not for a while. The future is multi-sensory—voice, vision, touch, context—working together. Screens will feel lighter and more transparent. Eventually, yes, glasses that don’t announce themselves. But the point isn’t to get rid of the screen. It’s to get rid of the seams.
Interviewer: What does responsible AI regulation look like to you? Jobs: Guardrails without gridlock. Labeling that tells people what they’re looking at. Provenance that travels with content. Real penalties for lying about privacy. Some licensing for very large-scale systems that can do real harm. And sunset clauses so the rules keep up with reality. Hold companies to their promises.
Interviewer: If you were building an AI startup today, what would you focus on? Jobs: Pick one human pain and own it end to end. Don’t build another chatbot; build a result. Own the critical part of the stack that guarantees your quality—could be a model, could be data, could be the UX. And measure only what matters: minutes saved, errors avoided, joy created.
Interviewer: What about jobs displaced by AI? Jobs: Technology always changes the mix. Our responsibility is to shift people into higher-leverage work, not off the map. That means new tools, new training, and new on-ramps. The goal isn’t fewer people—it’s more human work per person.
Interviewer: How should teams design AI into products? Jobs: Invisible first, then visible. Don’t make a new shrine in the corner of the screen called “AI.” Make the existing path effortless. When you need to surface the intelligence, do it in language the user already speaks. And for heaven’s sake, reduce choices. Simplicity isn’t about fewer features; it’s about fewer reasons to hesitate.
Interviewer: A lot of companies are racing to ship as many AI features as possible. Jobs: Features are easy. Care is hard. You can win the headline and lose the user. Focus. Ship the smallest set of capabilities that create a complete feeling. Stand behind it. Then listen like crazy and improve.
Interviewer: How will AI change how we think about ownership of data and creations? Jobs: Consent and provenance become the spine. Creators need to know how their work is used and be paid when it drives value. Users need to know where outputs came from and what’s been changed. Build the pipes for this now. It’s the difference between a thriving ecosystem and a swamp.
Interviewer: Ten years out, what does success look like? Jobs: AI gets boring in the best way. It fades into the background. You don’t “use AI”; you live your life and things just happen the way you meant them to. Fewer apps, more outcomes. Fewer interruptions, more flow. Computers that finally feel like they’re on your side. That’s the point.
Interviewer: Any last advice for builders? Jobs: Care about the person on the other side of the glass. Respect their time, their attention, their privacy. Make something you would fight to keep if someone tried to take it away. If AI helps you do that, great. If it doesn’t, throw it out.
Note: Again, this is a fictional, creative piece imagining how Steve Jobs might speak about AI in 2025.
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page