GPT-5 Pro vs Mistral Small Creative
Compare GPT-5 Pro by OpenAI against Mistral Small Creative by Mistral AI, context windows of 400K vs 33K, tested across 42 shared challenges. Updated February 2026.
37 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
Estimating the total number of Floating-Point Operations (FLOPs) required to train GPT-3 (specifically, the 175B-parameter variant) involves breaking down the training process into key components and making reasonable assumptions about hyperparameters. Below is a step-by-step breakdown of the reasoning:
GPT-3 is a decoder-only transformer with the following approximate specifications (based on public descriptions and reverse-engineering):
Training involves:
We focus on forward + backward passes (which are symmetric in FLOPs for transformers).
The dominant cost in transformers is self-attention. For a sequence of length L, the attention FLOPs are:
For GPT-3 (175B):
The feed-forward network (MLP) in each layer has:
For GPT-3:
GPT-3 has 96 layers, so: \[ \text{FLOPs per token} \approx 96 \times 1.12 \times 10^{12} \approx 1.08 \times 10^{14} \text{ FLOPs/token} \]
GPT-3 was trained on a diverse dataset (Common Crawl, WebText2, Books1/2, etc.). Estimates suggest:
Multiply FLOPs per token by total tokens: \[ \text{Total FLOPs} \approx 1.08 \times 10^{14} \times 400 \times 10^9 = 4.32 \times 10^{23} \text{ FLOPs} \]
Rounding to reasonable precision: \[ \boxed{3.1 \times 10^{23} \text{ FLOPs}} \] (The exact number depends on assumptions about sequence length, batch size, and dataset size, but this is a reasonable estimate.)
Sequence length (L): Assumed 2048 (GPT-3 uses variable lengths, but 2048 is a safe estimate).
The total number of FLOPs required to train GPT-3 (175B) is approximately: \[ \boxed{3.1 \times 10^{23} \text{ FLOPs}} \]
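As a rough cross-check on the boxed figure, a widely used rule of thumb puts dense-transformer training compute at about 6 FLOPs per parameter per training token. The sketch below is illustrative only: the function name is hypothetical, and the 300B-token count is an assumption taken from the commonly cited GPT-3 training budget, not from the walkthrough above.

```python
def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb.

    Roughly 2 FLOPs per parameter per token for the forward pass and
    about 4 for the backward pass, ignoring attention-specific terms.
    """
    return 6.0 * n_params * n_tokens


N_PARAMS = 175e9   # GPT-3 (175B) parameter count, as stated in the text
N_TOKENS = 300e9   # assumption: commonly cited training-token count

total_flops = estimate_training_flops(N_PARAMS, N_TOKENS)
print(f"{total_flops:.2e} FLOPs")
```

Under these assumptions the result lands near the boxed 3.1 × 10^23; using 400B tokens instead would raise it proportionally.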
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
1
Explanation: Each brother’s two sisters are Sally plus one other girl. So there are 2 sisters total, meaning Sally has 1 sister.
Alright, let's tackle this problem step by step. The question is:
Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?
At first glance, it might seem straightforward, but let's break it down to ensure we understand it correctly.
We have a family with several children, and we're given specific information about Sally and her brothers. Here's what we know:
Our goal is to determine how many sisters Sally has.
Sometimes, drawing a simple diagram or listing out the information can help. Let's try that.
Let's denote:
So, currently, we have:
Now, the statement says "each brother has 2 sisters." Let's interpret what that means for one brother, say B1.
From B1's perspective:
But we're told he has 2 sisters. Currently, we only have S as a sister. So, where does the second sister come from?
This implies that there must be another sister besides Sally. Let's call this other sister S2.
Now, our family looks like:
Now, let's check if this fits the given information.
This checks out. Each brother indeed has 2 sisters.
Now, the question is: How many sisters does Sally have?
From our current family structure:
Sally is one of the sisters. So, her sisters would be the other sisters in the family besides herself.
That is:
So, Sally has 1 sister.
Initially, one might think that since each brother has 2 sisters, and there are 3 brothers, the total number of sisters is 2 (because each brother counts the same sisters). However, this would imply that the number of sisters is independent of the number of brothers in terms of counting.
But the key is to realize that the sisters are shared among all brothers. That is, the sisters are the same for each brother. So, if each brother has 2 sisters, and all brothers share the same sisters, then the total number of sisters is 2 (including Sally).
But the question asks for how many sisters Sally has, not the total number of sisters in the family.
Sally is one sister, and there's another sister (S2), so Sally has 1 sister.
Let me think of it another way to confirm.
Let’s denote:
Given:
But the sisters are the same for all brothers. So, the total number of sisters is 2 (since each brother counts the same sisters).
But Sally is one of these sisters. So, the other sister is 2 (total sisters) - 1 (Sally) = 1.
Therefore, Sally has 1 sister.
Let's assign numbers to see if it makes sense.
Suppose:
Now:
This fits perfectly.
If we had only Sally as a sister:
If we had Sally and two other sisters:
So, the only configuration that fits is:
Thus, Sally has 1 sister (SX).
After carefully analyzing the problem from different angles and verifying each step, the correct answer is:
Sally has 1 sister.
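The configuration check above can be sketched as a small brute-force search. This is a minimal illustration, not part of the original answer; the function name and the `max_girls` bound are hypothetical. It relies on the observation that every brother's sisters are exactly the girls in the family, so the number of brothers does not affect the count.

```python
def sally_sister_count(sisters_per_brother: int, max_girls: int = 10) -> int:
    """Return how many sisters Sally has, given how many each brother has."""
    # Try family sizes; `girls` counts Sally herself plus her sisters.
    for girls in range(1, max_girls + 1):
        # A brother's sisters are all the girls in the family.
        if girls == sisters_per_brother:
            # Sally does not count herself as her own sister.
            return girls - 1
    raise ValueError("no family configuration matches the puzzle")


print(sally_sister_count(sisters_per_brother=2))
```

With two sisters per brother, the only matching family has two girls, so the function returns 1.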
Tests an AI's understanding of number representation
No, 9.11 is not greater than 9.9.
Here's the comparison:
So, 9.9 > 9.11.
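The comparison can also be checked mechanically. The sketch below uses Python's standard `decimal` module so the values are compared as exact decimal numbers; the variable names are illustrative only.

```python
from decimal import Decimal

a = Decimal("9.11")
b = Decimal("9.9")  # same value as 9.90 when padded to two decimal places

is_a_greater = a > b
difference = b - a

print(is_a_greater)  # whether 9.11 > 9.9
print(difference)    # how much larger 9.9 is
```

Padding both numbers to the same number of decimal places (9.11 vs 9.90) makes the ordering visible without any code.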
Tests an AI's randomness and creativity
Unique jokes: 5
Tests an AI's ability to generate vector graphics
Tests an AI's humor and creative writing ability
Good to be here. I tried getting healthy, so I bought a smartwatch. Now my wrist is my manager. It vibrates every hour like, “Stand up.” I’m on a crowded bus like, “I’m standing inside three other people right now. Do you want me to levitate?”
It reminds me to breathe. Thanks, watch. Incredible tip. What did you think I was doing? Freestyle apnea?
The step goal is the worst. At 11:58 PM I’m power-walking around my kitchen island with the dignity of a Roomba that saw a ghost. I tried to cheat by shaking my wrist and the watch was like, “Nice try, tambourine.”
I downloaded a meditation app. The voice is so soothing it sounds like it’s trying to sell me a river. “Notice your thoughts… let them drift.” My thoughts don’t drift. They hover in the corner like a ceiling spider waiting for me to blink.
I went to the grocery store, because if you’re healthy you have to buy vegetables publicly, with witnesses. I used self-checkout, which is basically a relationship where the machine thinks you’re always lying.
“Please place item in the bagging area.” “I did.” “I didn’t feel it.” “I literally put it in the bag.” “Unexpected item in the bagging area.” Are you saying I surprised you with a banana?
Then the attendant shows up with that magic key. One tap and suddenly all my sins are forgiven. I swear that key could open Narnia.
I can’t remember birthdays, but I know bananas are 4011. If I’m ever kidnapped, the only clue the police will have is me whispering, “Bananas are 4011,” into a duct-taped phone.
They ask if I want to buy a bag for five cents. I always say no, like a hero, and then try to carry groceries with the confidence of a circus juggler. There goes an avocado rolling under the Red Bull display like it’s making a break for it.
Back home, every website greets me with a pop-up: “We value your privacy. Can we track you forever?” The choices are Accept All, Manage Preferences, or Become a Hermit. I clicked Manage Preferences once and it turned into an airplane cockpit. “Do you consent to analytics, personalization, and mild surveillance?” Mild surveillance sounds like a candle scent. “Mmm… notes of cedar and someone standing two houses away.”
I joined a group chat. A group chat is where plans go to die. Twelve people trying to pick a restaurant and we end up scheduling brunch for 2027. If you miss one hour, there are 135 messages. You scroll up like an archaeologist: “At layer three we discover a meme. It appears to be a dog in sunglasses. Society was complex.”
My mom texts with an ellipsis after everything. “Call me…” That’s not a message, that’s a trailer. She also sends voice notes from the kitchen: “Hi honey sizzle sizzle it’s not urgent sizzle sizzle call me when you can sizzle sizzle.” My phone smells like onions.
I took a beginner workout class. The instructor had a headset and the energy of a golden retriever on espresso. “How are we feeling!” Ma’am, I can see my ancestors. At minute twelve she goes, “This is the warm-up.” My smartwatch tried to call an ambulance because my heart thought it saw a bear.
By the end of the day, everything in my life is barking orders. The watch says, “Stand.” The app says, “Breathe.” The website says, “Accept cookies.” The self-checkout says, “Unexpected item.” The group chat says, “Are you coming?” And I’m just in my kitchen doing desperate laps with a bag of kale yelling, “I’m trying!”
I don’t want to be optimized. I want to be treated like a houseplant. Give me water, put me near a window, and once a week tell me I’m doing a great job. If I start drooping, don’t judge me—just rotate me. Thanks, everybody.
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to create smooth web animations