Claude Sonnet 4.5 vs Mistral Small Creative
Compare Claude Sonnet 4.5 by Anthropic against Mistral Small Creative by Mistral AI, context windows of 200K vs 33K, tested across 43 shared challenges. Updated February 2026.
38 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
Estimating the total number of Floating-Point Operations (FLOPs) required to train GPT-3 (specifically, the 175B-parameter variant) involves breaking down the training process into key components and making reasonable assumptions about hyperparameters. Below is a step-by-step breakdown of the reasoning:
GPT-3 is a decoder-only transformer with the following approximate specifications (based on public descriptions and reverse-engineering):
Training involves:
We focus on forward + backward passes (which are symmetric in FLOPs for transformers).
The dominant cost in transformers is self-attention. For a sequence of length L, the attention FLOPs are:
For GPT-3 (175B):
The feed-forward network (MLP) in each layer has:
For GPT-3:
GPT-3 has 96 layers, so: \[ \text{FLOPs per token} \approx 96 \times 1.12 \times 10^{12} \approx 1.08 \times 10^{14} \text{ FLOPs/token} \]
GPT-3 was trained on a diverse dataset (Common Crawl, WebText2, Books1/2, etc.). Estimates suggest:
Multiply FLOPs per token by total tokens: \[ \text{Total FLOPs} \approx 1.08 \times 10^{14} \times 400 \times 10^9 = 4.32 \times 10^{23} \text{ FLOPs} \]
Rounding to reasonable precision: \[ \boxed{3.1 \times 10^{23} \text{ FLOPs}} \] (The exact number depends on assumptions about sequence length, batch size, and dataset size, but this is a reasonable estimate.)
Sequence length (L): Assumed 2048 (GPT-3 uses variable lengths, but 2048 is a safe estimate).
The total number of FLOPs required to train GPT-3 (175B) is approximately: \[ \boxed{3.1 \times 10^{23} \text{ FLOPs}} \]
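As a cross-check on the estimate above, here is a minimal Python sketch of the common 6 * N * D rule of thumb (roughly 6 FLOPs per parameter per training token). This is a different method from the per-layer breakdown in the response, not a reconstruction of it. The 175B parameter count and the 400B-token figure come from the text above; the 300B-token figure is an assumption (a commonly cited training-token count for GPT-3), and the function name `training_flops` is hypothetical.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb."""
    # Roughly 2 FLOPs per parameter per token for the forward pass
    # and about 4 for the backward pass, giving 6 in total.
    return 6.0 * n_params * n_tokens


N_PARAMS = 175e9  # GPT-3 175B, as stated in the text

# 400e9 is the token count used in the text; 300e9 is an assumed alternative.
for n_tokens in (300e9, 400e9):
    flops = training_flops(N_PARAMS, n_tokens)
    print(f"{n_tokens:.0e} tokens -> {flops:.2e} FLOPs")
```

Under these assumptions the rule of thumb gives about 3.15 × 10^23 FLOPs for 300B tokens and about 4.2 × 10^23 FLOPs for 400B tokens, so both figures quoted in the response fall in the range it produces.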
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Alright, let's tackle this problem step by step. The question is:
Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?
At first glance, it might seem straightforward, but let's break it down to ensure we understand it correctly.
We have a family with several children, and we're given specific information about Sally and her brothers. Here's what we know:
Our goal is to determine how many sisters Sally has.
Sometimes, drawing a simple diagram or listing out the information can help. Let's try that.
Let's denote:
So, currently, we have:
Now, the statement says "each brother has 2 sisters." Let's interpret what that means for one brother, say B1.
From B1's perspective:
But we're told he has 2 sisters. Currently, we only have S as a sister. So, where does the second sister come from?
This implies that there must be another sister besides Sally. Let's call this other sister S2.
Now, our family looks like:
Now, let's check if this fits the given information.
This checks out. Each brother indeed has 2 sisters.
Now, the question is: How many sisters does Sally have?
From our current family structure:
Sally is one of the sisters. So, her sisters would be the other sisters in the family besides herself.
That is:
So, Sally has 1 sister.
Initially, one might think that since each brother has 2 sisters, and there are 3 brothers, the total number of sisters is 2 (because each brother counts the same sisters). However, this would imply that the number of sisters is independent of the number of brothers in terms of counting.
But the key is to realize that the sisters are shared among all brothers. That is, the sisters are the same for each brother. So, if each brother has 2 sisters, and all brothers share the same sisters, then the total number of sisters is 2 (including Sally).
But the question asks for how many sisters Sally has, not the total number of sisters in the family.
Sally is one sister, and there's another sister (S2), so Sally has 1 sister.
Let me think of it another way to confirm.
Let’s denote:
Given:
But the sisters are the same for all brothers. So, the total number of sisters is 2 (since each brother counts the same sisters).
But Sally is one of these sisters. So, the other sister is 2 (total sisters) - 1 (Sally) = 1.
Therefore, Sally has 1 sister.
Let's assign numbers to see if it makes sense.
Suppose:
Now:
This fits perfectly.
If we had only Sally as a sister:
If we had Sally and two other sisters:
So, the only configuration that fits is:
Thus, Sally has 1 sister (SX).
After carefully analyzing the problem from different angles and verifying each step, the correct answer is:
Sally has 1 sister.
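The reasoning above can be checked with a short brute-force sketch in Python. It assumes, as the response does, that all the children share the same parents, so every girl in the family is a sister to every brother. The function name and the `max_girls` search bound are hypothetical choices for illustration.

```python
def sally_sister_count(sisters_per_brother: int, max_girls: int = 10) -> int:
    """Return how many sisters Sally has by trying each possible number of girls."""
    # Sally herself is a girl, so the family has at least one.
    for girls in range(1, max_girls + 1):
        # From any brother's point of view, his sisters are all the girls.
        if girls == sisters_per_brother:
            # Sally's sisters are the girls other than Sally.
            return girls - 1
    raise ValueError("no family size matches the puzzle")


print(sally_sister_count(sisters_per_brother=2))  # prints 1
```

The number of brothers never enters the check, which matches the observation above that the sisters are shared among all the brothers.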
Tests an AI's understanding of number representation
No, 9.11 is not greater than 9.9.
Here's the comparison:
So, 9.9 > 9.11.
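A minimal Python sketch of the same comparison, using the standard-library `decimal` module so the values are compared as exact decimals. Padding 9.9 to two decimal places is an illustrative step added here to line the digits up; the variable names are hypothetical.

```python
from decimal import Decimal

a = Decimal("9.11")
b = Decimal("9.9")

# Pad 9.9 to two decimal places so the digits line up: 9.11 vs 9.90.
b_padded = b.quantize(Decimal("0.01"))
print(a, "vs", b_padded)

# Compare the two values numerically.
is_greater = a > b
print(is_greater)  # prints False
```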
Tests an AI's randomness and creativity
Here are five jokes for you—let’s see how many are truly unique!
Why don’t skeletons fight each other? They don’t have the guts.
Why did the scarecrow win an award? Because he was outstanding in his field!
What do you call a fake noodle? An impasta.
Why can’t you trust an atom? Because they make up everything!
How do you organize a space party? You planet!
All five jokes are unique in this set—none are identical or rephrased versions of each other. However, some (like #4 and #5) are classic "dad jokes" that might feel familiar if you’ve heard them before. If you’d like, I can generate completely original jokes (no recycled punchlines) or test for uniqueness against a database!
Want a bonus round? 😄
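The uniqueness claim can be checked mechanically for exact repeats. The sketch below is a minimal Python example, assuming that "unique" means the normalized text differs; it would not catch paraphrased jokes. The `normalize` helper is hypothetical.

```python
jokes = [
    "Why don’t skeletons fight each other? They don’t have the guts.",
    "Why did the scarecrow win an award? Because he was outstanding in his field!",
    "What do you call a fake noodle? An impasta.",
    "Why can’t you trust an atom? Because they make up everything!",
    "How do you organize a space party? You planet!",
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


# A set keeps one copy of each distinct normalized joke.
unique_count = len({normalize(joke) for joke in jokes})
print(unique_count)  # prints 5
```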
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
"Modern Problems" (A 3-Minute Stand-Up Routine – No Puns, Just Pain)
[Opening – Relatable Struggle] You ever notice how life used to be simpler? Like, back in the day, if you wanted to avoid someone, you just didn’t go to their house. Now? Now you gotta geotag your location just to prove you’re not ghosting them. "Oh, I was at the gym!" No, Karen, I was hiding in a Target dressing room eating a $5 burrito like a feral raccoon.
And don’t even get me started on texting. We’ve turned communication into a minesweeper game. You send a text, and suddenly you’re playing:
It’s like Clue, but the murder weapon is your last read receipt.
[Tech & Social Media – The New Torture] And apps! Apps are just digital sadism. You ever try to unsubscribe from an email? It’s like trying to escape a haunted IKEA. "Click here to opt out!" [clicks] "Almost there! Just verify your email!" [verifies] "Great! Now answer these 12 security questions!" "What was your first pet’s middle name?" I DON’T KNOW, STANLEY, I WAS 5!
And don’t get me started on updates. Your phone begs you to update it at 3 AM like a drunk friend at a party: "PLEASE. JUST ONE MORE. IT’LL ONLY TAKE 17 MINUTES." No, Karen, I have plans. I was gonna stare at the ceiling and question my life choices for those 17 minutes. That’s sacred time.
[Dating in the Digital Age] Oh, and dating apps? Congratulations, you’ve turned romance into a resume submission. Swipe right, match, then immediately panic:
And the bio! "6’2”, loves hiking, travel, and deep convos." Bro, if you love deep convos, why is your profile pic you flexing in a mirror? That’s not a conversation starter, that’s a restraining order waiting to happen.
[The Grand Finale – Existential Dread] At this point, I’ve accepted that adulthood is just a scam. You spend 20 years in school learning useless things—like how to diagram a sentence—but nobody teaches you the real skills:
But hey, at least we’ve got one thing going for us: We’re all in this together. Misery loves company, and right now, company’s desperate.
So let’s raise a glass—to modern problems, digital torture, and the fact that none of us know what we’re doing. Cheers!
[Exit – Confident Strut or Awkward Stumble. Either Works.]
Why This Works:
Want to tweak the vibe? Add more self-deprecation, or lean into a specific topic (work, family, etc.). Either way, you’re the boss. 🎤🔥
Tests an AI's ability to write in distinct character voices
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.
Couldn't render this response
This model didn't return a displayable web page for this challenge.
# Pokémon Battle UI - Classic Style
Here's a nostalgic, interactive Pokémon battle UI that captures the spirit of the classic Game Boy games. This single-page HTML file includes all the necessary elements for a battle, with animations, sound effects, and a retro aesthetic.
```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Pokémon Battle - Classic Style</title>
    <style>
        :root {
            --gb-red: #c8102e;
            --gb-green: #00b000;
            --gb-blue: #0064b0;
            --gb-yellow: #ffd700;
            --gb-gray: #888;
            --gb-light: #f0f0f0;
            --gb-dark: #333;
        }
        body {
            font-family: 'GB Nostalgia', 'Courier New', monospace;
            background-color: #000;
            color: var(--gb-light);
            margin: 0;
            padding: 0;
            display: flex;
            flex-direction: column;
            align-items: center;
            justify-content: center;
            min-height: 100vh;
            background-image:
                radial-gradient(circle at 10% 20%, rgba(255, 255, 255, 0.05) 0%, transparent 20%),
                radial-gradient(circle at 90% 80%, rgba(255, 255, 255, 0.05) 0%, transparent 20%);
            overflow: hidden;
        }
        @font-face {
            font-family: 'GB Nostalgia';
            src: url('data:font/truetype;charset=utf-8;base64,AAEAAAALAIAAAwAgT1MvMwEAAAD/////AQAAAAAAAAAA…
```
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to replicate an existing UI with Tailwind CSS