GPT-5.1-Codex-Mini
Compare GPT-5.1-Codex-Mini by OpenAI against Claude Sonnet 4 by Anthropic. In 1 community vote, GPT-5.1-Codex-Mini wins 100% of head-to-head duels. Context windows are 400K vs 200K, and the models are tested across 32 shared challenges. Updated February 2026.
In 1 community vote, GPT-5.1-Codex-Mini wins 100% of head-to-head duels. This is based on blind community voting from the RIVAL open dataset of 1+ human preference judgments for this pair.
32 challenges
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Sally has 1 sister.
I need to think through this step-by-step.
Given information: Sally has 3 brothers, and each brother has 2 sisters.
Let me work out how many sisters are in the family total.
If each of Sally's 3 brothers has 2 sisters, and all the brothers are in the same family, then there must be 2 sisters total in the family.
Since Sally is one of those sisters, and there are 2 sisters total, Sally must have 1 sister.
Therefore, Sally has 1 sister.
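The reasoning above can be checked with a short sketch. It assumes, as the answer does, that all siblings share one family, so each brother's sisters are exactly the girls in that family. The function name and parameters are hypothetical and chosen only for illustration.

```python
def count_sallys_sisters(num_brothers: int, sisters_per_brother: int) -> int:
    """Return how many sisters Sally has, assuming a single shared family."""
    # Every brother has the same sisters: all the girls in the family.
    girls_in_family = sisters_per_brother
    # The number of brothers does not change the count of girls;
    # it only has to be non-negative for the puzzle to make sense.
    if num_brothers < 0 or girls_in_family < 1:
        raise ValueError("need at least one girl (Sally) and no negative counts")
    # Sally is one of the girls and is not her own sister.
    return girls_in_family - 1


print(count_sallys_sisters(num_brothers=3, sisters_per_brother=2))  # 1
```

The brother count is deliberately unused in the arithmetic, which is the step that makes the puzzle potentially confusing.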
Tests an AI's understanding of number representation
Tests an AI's randomness and creativity
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to replicate an existing UI with Tailwind CSS
Tests an AI's ability to create interactive web elements
Tests an AI's ability to create a detailed world map in SVG format
Generate a single-page, self-contained HTML webapp using Tailwind CSS for a randomly chosen category/industry/niche.