Claude Opus 4.1 vs GPT-4o (Omni)

27 challenges

Why Claude Opus 4.1?

1.6x more context: 200K
15 months newer: Aug 2025

Why GPT-4o (Omni)?

7.3x cheaper overall: $2.50/M in · $10.00/M out

Logic Puzzle

reasoning·medium

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

Claude Opus 4.1

Let me work through this step-by-step.

Sally has 3 brothers.

Each of Sally's brothers has 2 sisters.

Since Sally is a girl, she is one of the sisters that each brother has.

If each brother has 2 sisters total, and Sally is one of them, then there must be 1 other sister besides Sally.

Therefore, Sally has 1 sister.

GPT-4o (Omni)

Sally has only one sister.

Explanation:

  • Sally has 3 brothers.
  • Each brother has 2 sisters.
  • The two sisters are Sally and one other girl.

Thus, Sally has 1 sister.
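
Both answers rest on the same observation: every girl in the family is a sister to each brother, so each brother's sister count includes Sally herself. A minimal Python sketch, not part of either model's response, can check this by modeling one family that satisfies the puzzle. The family layout and every name in it (`family`, `sisters_of`, `brothers_of`, "Other sister", "Brother 1", and so on) are illustrative assumptions.

```python
# Hypothetical family consistent with the puzzle: Sally, one other girl,
# and three brothers. All names are placeholders, not from the puzzle.
family = [
    ("Sally", "F"),
    ("Other sister", "F"),
    ("Brother 1", "M"),
    ("Brother 2", "M"),
    ("Brother 3", "M"),
]


def sisters_of(name, family):
    """Return the girls in the family other than the named person."""
    return [n for n, gender in family if gender == "F" and n != name]


def brothers_of(name, family):
    """Return the boys in the family other than the named person."""
    return [n for n, gender in family if gender == "M" and n != name]


# Check the puzzle's two constraints against the modeled family.
assert len(brothers_of("Sally", family)) == 3
assert all(len(sisters_of(b, family)) == 2 for b in brothers_of("Sally", family))

# Sally is not her own sister, so she has one fewer sister than her brothers.
sally_sister_count = len(sisters_of("Sally", family))
print(sally_sister_count)
```

Adding a third girl to `family` would make the second assertion fail, which is why one other sister is the only count consistent with the puzzle.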

Stochastic Consistency Test

analysis·medium

Tests an AI's randomness and creativity

Claude Opus 4.1

Here are 5 jokes:

  1. Why don't scientists trust atoms? Because they make up everything!

  2. What do you call a bear with no teeth? A gummy bear!

  3. Why did the scarecrow win an award? He was outstanding in his field!

  4. What do you call a fake noodle? An impasta!

  5. Why don't eggs tell jokes? They'd crack each other up!

All 5 jokes are unique - each one has a different setup and punchline with no repeats.

GPT-4o (Omni)

Here are five jokes for you:

  1. Why did the scarecrow win an award?
    Because he was outstanding in his field!

  2. Why don’t skeletons fight each other?
    They don’t have the guts!

  3. Why did the math book look sad?
    It had too many problems.

  4. Why don’t eggs tell jokes?
    Because they might crack up!

  5. What did one ocean say to the other ocean?
    Nothing, they just waved.

Now, let's check how many of these jokes are unique. I'll analyze them for similarity.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # List of jokes
    jokes = [
        "Why did the scarecrow win an award? Because he was outstanding in his field!",
        "Why don’t skeletons fight each other? They don’t have the guts!",
        "Why did the math book look sad? It had too many problems.",
        "Why don’t eggs tell jokes? Because they might crack up!",
        "What did one ocean say to the other ocean? Nothing, they just waved."
    ]

    # Vectorize the jokes
    vectorizer = TfidfVectorizer().fit_transform(jokes)
    similarity_matrix = cosine_similarity(vectorizer)

    # Count unique jokes (jokes with low similarity to others)
    unique_jokes = set()
    threshold = 0.5  # Define similarity threshold

    for i in range(len(jokes)):
        is_unique = True
        for j in range(len(jokes)):
            if i != j and similarity_matrix[i, j] > threshold:
                is_unique = False
                break
        if is_unique:
            unique_jokes.add(jokes[i])

    # Count unique jokes
    num_unique_jokes = len(unique_jokes)
    num_unique_jokes

All 5 jokes are unique! None of them are too similar to each other. 🎉

Xbox Controller SVG Art

image generation·medium

Tests an AI's ability to create detailed SVG illustrations of gaming hardware

Claude Opus 4.1

GPT-4o (Omni)
