GPT-4.1 Nano vs GPT-5 Codex
Compare GPT-4.1 Nano and GPT-5 Codex, both from OpenAI, tested across 37 shared challenges. Updated February 2026.
37 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
Step 1 – Note the basic training quantities
Step 2 – Relate parameters, tokens, and FLOPs
For dense transformers, almost all computation comes from matrix multiplies.
A practical rule (derived by counting multiply–adds in the attention and MLP blocks, and accounting for the backward pass) is:
\[ \text{Training FLOPs} \approx 6 \times (\text{number of parameters}) \times (\text{number of tokens}) \]
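Here’s the intuition: the forward pass costs about 2 FLOPs per parameter per token (one multiply and one add), and the backward pass costs roughly twice that, about 4, for a total of about 6.
(The softmax, layernorm, and activation costs are much smaller and are absorbed into this constant.)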
Step 3 – Plug in the GPT‑3 values
\[ \text{FLOPs} \approx 6 \times (1.75\times 10^{11}) \times (3.0 \times 10^{11}) \]
\[ = 6 \times 5.25 \times 10^{22} = 31.5 \times 10^{22} = 3.15 \times 10^{23} \text{ FLOPs} \]
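The arithmetic above can be checked with a few lines of Python. This is only a sketch of the 6 × parameters × tokens rule of thumb as stated in the response; the function name `training_flops` and its `flops_per_param_token` argument are illustrative assumptions, and the parameter and token counts are the values plugged in above.

```python
def training_flops(n_params: float, n_tokens: float,
                   flops_per_param_token: float = 6.0) -> float:
    """Approximate dense-transformer training cost in FLOPs.

    Uses the rule of thumb FLOPs ~= 6 * parameters * tokens.
    The name and signature are hypothetical, chosen for this sketch.
    """
    return flops_per_param_token * n_params * n_tokens


# Values taken from the worked example above.
GPT3_PARAMS = 1.75e11  # number of parameters
GPT3_TOKENS = 3.0e11   # number of training tokens

estimate = training_flops(GPT3_PARAMS, GPT3_TOKENS)
print(f"{estimate:.2e} FLOPs")  # 3.15e+23 FLOPs
```

Changing `flops_per_param_token` is a quick way to see how sensitive the estimate is to the constant, which is where most of the uncertainty in this rule sits.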
Step 4 – Interpret
So, training GPT‑3 to completion requires on the order of \(3 \times 10^{23}\) floating-point operations.
That estimate aligns with detailed breakdowns of transformers and is accurate to within tens of percent, which is sufficient for order-of-magnitude comparisons.
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Sally has 1 sister.
Tests an AI's understanding of number representation
No. Since 9.11 is equal to 9.110 and 9.9 is equal to 9.900, we see that 0.110 < 0.900, so 9.11 is less than 9.9.
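The padding argument can be reproduced as a small sketch, assuming the challenge asked whether 9.11 is greater than 9.9 (the prompt itself is not shown here). It uses Python's standard `decimal` module so the values are compared as exact decimals rather than binary floats; the variable names are illustrative.

```python
from decimal import Decimal

a = Decimal("9.11")
b = Decimal("9.9")

# Pad both numbers to three decimal places, as in the explanation above.
thousandths = Decimal("0.001")
a_padded = a.quantize(thousandths)
b_padded = b.quantize(thousandths)

# Padding does not change the value, only how it is written.
is_greater = a > b

print(a_padded, b_padded, is_greater)  # 9.110 9.900 False
```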
Tests an AI's randomness and creativity
Unique jokes: 5
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
Hey everybody, how you doing? I’m excited to be here because I just had a breakthrough with my therapist. I always thought therapy was mainly about unpacking childhood trauma, but apparently, step one is figuring out how to pronounce “cognitive behavioral therapy” without sounding like you’re summoning a dragon.
My therapist told me to start journaling so I can track my emotions, which sounded healthy until I saw my entries. It’s just pages of “Hungry again” and “Why did I nod in that meeting when I disagreed with every word?” At this point I’m basically writing Yelp reviews for my own life. “Woke up anxious, would not recommend. Two stars because the coffee was decent.”
I think my biggest problem is I’m too polite. I apologize to inanimate objects. I will bump into a chair and go, “Oh, excuse me!” That chair was probably wondering why a human was asking consent. Meanwhile, somewhere out there, that same chair is on a podcast saying, “Humans are weird. They bruise themselves on me and then act like it’s their fault.”
Grocery shopping is a nightmare for us over-polite people. The small talk at the checkout has gotten too advanced. The cashier asked, “Do you have any plans for the weekend?” and I panicked. Suddenly I’m confessing my entire calendar. “Well, Friday I’m assembling a bookshelf I can’t afford, Saturday I’m ignoring texts, and Sunday I’ll pretend to reset my life for Monday.” She’s just standing there going, “Sir, I just needed to know if you wanted a bag.”
And why is every self-checkout machine now an interrogator from a spy movie? I scanned an avocado and it yelled, “PLACE ITEM IN BAGGING AREA.” So I put it down. “UNIDENTIFIED ITEM. PLEASE WAIT FOR ASSISTANCE.” I was like, “That’s an avocado, it’s not part of a covert mission. It’s literally becoming guacamole tonight.” Next thing you know a teenager in a neon vest walks over, waves a barcode wand, and suddenly I feel absolved of my sins.
Speaking of technology judging me, my fitness watch has become my most passive-aggressive relationship. It buzzes every hour like, “Time to stand up!” and I’m like, “How bold of you to assume I wasn’t already standing emotionally.” The other day it said, “You beat yesterday’s activity by 12 percent.” Yeah, yesterday I took a nap so violent my pillow filed a noise complaint. The bar was in the basement. I tripped over it and still got a badge.
Let’s talk about dating apps. You know they’re chaos disguised as convenience. I matched with someone whose profile said, “Looking for someone spontaneous.” I suggested we meet for coffee that afternoon and she replied, “Whoa, that’s too soon.” I guess by spontaneous she meant “willing to send witty messages for six months before ever exchanging last names.”
Even when you do get a date, it feels like you’re just trading medical histories. Everyone is micro-dosing self-disclosure. “Hi, I’m Alex, I love hiking and I have seasonal allergies. Swipe left if you can’t handle Claritin.” Truly intimate moments are rare. The other night my date said, “I love how honest we can be.” I said, “Totally. By the way, I’ve had a stack of laundry on my chair for four months that now has citizenship.” She was like, “That’s brave,” which is never what you want to hear about your domestic habits.
And being an adult is mostly doing things you said you’d never become. I remember watching my parents compare vacuum cleaners for fun and thinking, “I’ll never be that person.” Cut to me last weekend, deep in a YouTube spiral titled “Top 10 Cordless Vacuums Ranked.” There’s a guy reviewing them like they’re sports cars. “The suction on this one? Unreal. We took it off-road and it breezed over Cheerios like a dream.” I’m eating popcorn taking notes. This is 35; we’re thrilled about appliances and our idea of a wild night is finding a coupon that stacks.
Before I go, can we address online recipe bloggers? Why do they think I need their life story before I can make scrambled eggs? I just want the ingredients. Instead I’m reading, “My journey began during a backpacking trip through Tuscany where I met an old man named Giuseppe who whispered, ‘Use butter.’” Ma’am, I used butter already! Can we skip to the part where you tell me how hot to set the pan before my kitchen turns into a crime scene?
Alright, that’s my time. Thanks for laughing at my coping mechanisms. If anyone wants to form a support group for overly polite people with anxious fitness trackers, I’ll be at the back, apologizing to the coat rack.
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page