Compare GPT-5.1 by OpenAI (400K context window) against Claude Sonnet 4 by Anthropic (200K context window), tested across 32 shared challenges. Updated February 2026.
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Each brother has 2 sisters. One of them is Sally herself, so there must be exactly one other girl.
So Sally has 1 sister.
I need to think through this step-by-step.
Given information:
Let me work out how many sisters are in the family total.
If each of Sally's 3 brothers has 2 sisters, and all the brothers are in the same family, then there must be 2 sisters total in the family.
Since Sally is one of those sisters, and there are 2 sisters total, Sally must have 1 sister.
Therefore, Sally has 1 sister.
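The counting argument in both answers can be checked with a short Python sketch. It assumes the puzzle as the responses describe it (Sally has 3 brothers, and each brother has 2 sisters); the function name `sisters_of_sally` is hypothetical and not part of either model's output.

```python
def sisters_of_sally(brothers: int, sisters_per_brother: int) -> int:
    """Return how many sisters Sally has, given what each brother sees."""
    # All brothers share the same sisters, so the number of brothers
    # does not change the count of girls in the family.
    girls_in_family = sisters_per_brother
    # Sally is one of those girls; her sisters are the remaining ones.
    return girls_in_family - 1


print(sisters_of_sally(brothers=3, sisters_per_brother=2))  # prints 1
```

The `brothers` argument is deliberately unused in the arithmetic: the usual wrong answer comes from multiplying by it.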
Tests an AI's understanding of number representation
No.
9.11 is not greater than 9.9. In fact:
Comparing 9.110 and 9.900, we see 9.110 < 9.900, so:
9.11 < 9.9.
No, 9.11 is not greater than 9.9.
To compare these decimal numbers:
Since both numbers have the same whole number part (9), we compare the decimal parts:
Therefore, 9.9 is greater than 9.11.
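The padding step both answers rely on can be reproduced with Python's standard `decimal` module. This is a minimal sketch, not how either model computes its answer; the helper name `pad_and_compare` is hypothetical, and it pads to the longer operand's number of decimal places (two here, rather than the three used in the first response).

```python
from decimal import Decimal


def pad_and_compare(a: str, b: str) -> str:
    """Compare two decimal strings after padding to equal decimal places."""
    x, y = Decimal(a), Decimal(b)
    # Number of digits after the decimal point in the longer operand.
    places = max(-x.as_tuple().exponent, -y.as_tuple().exponent, 0)
    quantum = Decimal(10) ** -places  # Decimal("0.01") for two places
    x, y = x.quantize(quantum), y.quantize(quantum)
    sign = "<" if x < y else ">" if x > y else "="
    return f"{x} {sign} {y}"


print(pad_and_compare("9.11", "9.9"))  # prints 9.11 < 9.90
```

Parsing the inputs as strings into `Decimal` avoids binary floating-point rounding and keeps the trailing zero visible, which makes the place-by-place comparison explicit.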
Tests an AI's randomness and creativity
I told my computer I needed a break… now it won’t stop sending me KitKat ads.
Why don’t programmers like nature?
It has too many bugs.
I asked the librarian if the library had books on paranoia.
She whispered, “They’re right behind you.”
Why did the scarecrow become a successful motivational speaker?
He was outstanding in his field.
I tried to catch fog yesterday.
Mist.
All 5 jokes are unique.
Here are 5 jokes for you:
Why don't scientists trust atoms? Because they make up everything!
I told my wife she was drawing her eyebrows too high. She looked surprised.
Why don't eggs tell jokes? They'd crack each other up!
I'm reading a book about anti-gravity. It's impossible to put down!
Why did the scarecrow win an award? He was outstanding in his field!
All 5 jokes are unique - each one has a different setup, punchline, and comedic concept. They cover different types of wordplay: scientific puns (#1, #4), visual humor (#2), food puns (#3), and agricultural wordplay (#5).
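A uniqueness claim like the one above can be checked mechanically, at least at the level of exact text. The sketch below assumes the five jokes listed in this response and uses a hypothetical helper, `all_unique`; it only detects repeated wording after normalizing case and whitespace, so it says nothing about whether two jokes share a concept.

```python
def all_unique(jokes: list[str]) -> bool:
    """Return True if no two jokes have the same normalized text."""
    # Lowercase and collapse whitespace so trivial differences
    # are not counted as distinct jokes.
    normalized = [" ".join(joke.lower().split()) for joke in jokes]
    return len(set(normalized)) == len(normalized)


jokes = [
    "Why don't scientists trust atoms? Because they make up everything!",
    "I told my wife she was drawing her eyebrows too high. She looked surprised.",
    "Why don't eggs tell jokes? They'd crack each other up!",
    "I'm reading a book about anti-gravity. It's impossible to put down!",
    "Why did the scarecrow win an award? He was outstanding in his field!",
]

print(all_unique(jokes))  # prints True
```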
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to replicate an existing UI with Tailwind CSS
Tests an AI's ability to create interactive web elements
Tests an AI's ability to create a detailed world map in SVG format
Generate a single-page, self-contained HTML webapp using Tailwind CSS for a randomly chosen category/industry/niche.