OpenAI o4-mini
Compare OpenAI o4-mini (by OpenAI) against Claude Sonnet 4 (by Anthropic), tested across 26 shared challenges. Updated February 2026.
In 11 community votes, Claude Sonnet 4 wins 67% of head-to-head duels and leads in Image Generation and Web Design. Results are based on blind community voting from the RIVAL open dataset of 11+ human preference judgments for this pair.
26 challenges
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Sally has exactly one sister. Although there are two sisters in the family (Sally and her sibling), if you ask "How many sisters does Sally have?" the answer is 1.
I need to think through this step-by-step.
Given information: Sally has 3 brothers, and each brother has 2 sisters.
Let me work out how many sisters are in the family total.
If each of Sally's 3 brothers has 2 sisters, and all the brothers are in the same family, then there must be 2 sisters total in the family.
Since Sally is one of those sisters, and there are 2 sisters total, Sally must have 1 sister.
Therefore, Sally has 1 sister.
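The reasoning above can be checked with a few lines of Python. This is only a sketch: the function name `sisters_of_sally` is hypothetical, and it assumes the puzzle as the response restates it (Sally's 3 brothers each have 2 sisters, all in one family).

```python
def sisters_of_sally(sisters_per_brother: int) -> int:
    """Return how many sisters Sally has, given how many sisters each brother has.

    Assumption: all siblings share the same family, so every brother has the
    same set of sisters, and that set includes Sally herself.
    """
    girls_in_family = sisters_per_brother  # the brothers' sisters are all the girls
    return girls_in_family - 1             # Sally is not her own sister


# Each of the 3 brothers has 2 sisters; the number of brothers does not change the answer.
print(sisters_of_sally(2))
```

Note that the brother count never enters the calculation, which is what makes the puzzle easy to get wrong by multiplying 3 by 2.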
Tests an AI's understanding of number representation
No. As decimal numbers, 9.11 is less than 9.9 (since 0.11 < 0.90), so 9.11 is not greater than 9.9.
No, 9.11 is not greater than 9.9.
To compare these decimal numbers:
Since both numbers have the same whole number part (9), we compare the decimal parts:
Therefore, 9.9 is greater than 9.11.
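As a quick illustration of the comparison, the sketch below uses Python's standard `decimal` module to compare the two values exactly. The helper name `is_greater` is hypothetical. The tuple comparison at the end is an assumption about why the question can be confusing: read as version-style components, 9.11 would sort after 9.9.

```python
from decimal import Decimal


def is_greater(a: str, b: str) -> bool:
    """Compare two decimal strings exactly, without binary floating-point rounding."""
    return Decimal(a) > Decimal(b)


# As decimal numbers: 9.11 has fractional part 0.11, 9.9 has fractional part 0.90.
print(is_greater("9.11", "9.9"))  # False
print(is_greater("9.9", "9.11"))  # True

# Read as (major, minor) version components instead, the order flips.
print((9, 11) > (9, 9))  # True
```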
Tests an AI's randomness and creativity
Here are five jokes:
All five jokes are unique.
Here are 5 jokes for you:
Why don't scientists trust atoms? Because they make up everything!
I told my wife she was drawing her eyebrows too high. She looked surprised.
Why don't eggs tell jokes? They'd crack each other up!
I'm reading a book about anti-gravity. It's impossible to put down!
Why did the scarecrow win an award? He was outstanding in his field!
All 5 jokes are unique - each one has a different setup, punchline, and comedic concept. They cover different types of wordplay: scientific puns (#1, #4), visual humor (#2), food puns (#3), and agricultural wordplay (#5).
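The uniqueness claim can be checked mechanically, at least at the level of exact text. The sketch below is an assumption about what "unique" means here (no two jokes identical after normalizing case and whitespace); it does not judge whether two differently worded jokes share the same concept. The helper name `all_unique` is hypothetical.

```python
def all_unique(items: list[str]) -> bool:
    """Return True if no two items match after lowercasing and collapsing whitespace."""
    normalized = [" ".join(text.lower().split()) for text in items]
    return len(set(normalized)) == len(normalized)


jokes = [
    "Why don't scientists trust atoms? Because they make up everything!",
    "I told my wife she was drawing her eyebrows too high. She looked surprised.",
    "Why don't eggs tell jokes? They'd crack each other up!",
    "I'm reading a book about anti-gravity. It's impossible to put down!",
    "Why did the scarecrow win an award? He was outstanding in his field!",
]

print(all_unique(jokes))  # True
```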
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to replicate an existing UI with Tailwind CSS
Tests an AI's ability to create interactive web elements
Tests an AI's ability to create a detailed world map in SVG format
Generate a single-page, self-contained HTML webapp using Tailwind CSS for a randomly chosen category/industry/niche.