Claude Opus 4.5 vs o3 Mini
Compare Claude Opus 4.5 by Anthropic against o3 Mini by OpenAI, tested across 39 shared challenges. Updated February 2026.
34 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
Below is a step‐by‐step estimation:
Parameters Count:
GPT‑3 has about 175 billion parameters, i.e. roughly 1.75×10¹¹ parameters.
Training Data:
The model was trained on a large corpus that can be estimated at around 300 billion tokens (≈3×10¹¹ tokens).
FLOPs per Token:
In training deep networks like GPT‑3, the forward pass costs roughly one multiply–accumulate (2 FLOPs) per parameter per token, and the backward pass (computing gradients) costs about twice as much as the forward pass. A common rule of thumb is therefore that training requires roughly 6 FLOPs per parameter per token, i.e. about 3 times the cost of a single forward pass.
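The per-token rule of thumb can be sketched as a small calculation. This is a simplified model, assuming a dense network where each parameter takes part in exactly one multiply–accumulate per token in the forward pass and the backward pass costs twice the forward pass; it ignores attention-specific terms and other overheads. The function name `flops_per_token` is illustrative.

```python
def flops_per_token(n_params: float) -> dict:
    """Rule-of-thumb training cost per token for a dense network.

    Assumes 1 multiply-accumulate (= 2 FLOPs) per parameter in the forward
    pass and a backward pass costing about twice the forward pass.
    Attention-specific terms and other overheads are ignored.
    """
    forward = 2 * n_params   # 1 multiply + 1 add per parameter
    backward = 2 * forward   # gradients w.r.t. activations and weights
    return {"forward": forward, "backward": backward, "total": forward + backward}


costs = flops_per_token(1.75e11)   # GPT-3 parameter count from the text
print(costs["total"] / 1.75e11)    # prints 6.0 (FLOPs per parameter per token)
```

The factor of 6 falls out as 2 (forward) + 4 (backward), which is 3 times the forward pass alone.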
Total FLOPs Calculation:
The total number of FLOPs is therefore approximately:
\[
\text{FLOPs} \approx 6 \times (\text{number of parameters}) \times (\text{number of tokens})
\]
Plugging in the numbers:
\[
\text{FLOPs} \approx 6 \times 1.75 \times 10^{11} \times 3 \times 10^{11} \approx 6 \times 5.25 \times 10^{22} \approx 3.15 \times 10^{23}
\]
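The same arithmetic as a few lines of Python, using only the two inputs estimated above (about 1.75×10¹¹ parameters and about 3×10¹¹ tokens) and the 6 FLOPs per parameter per token rule of thumb. The constant names are illustrative.

```python
N_PARAMS = 1.75e11               # GPT-3 parameter count (from the text)
N_TOKENS = 3e11                  # training tokens (rough estimate from the text)
FLOPS_PER_PARAM_PER_TOKEN = 6    # forward (2) + backward (4) rule of thumb

total_flops = FLOPS_PER_PARAM_PER_TOKEN * N_PARAMS * N_TOKENS
print(f"{total_flops:.3e}")      # prints 3.150e+23
```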
Final Estimate:
So, the training of GPT‑3 likely required on the order of 3×10²³ FLOPs.
This is an approximate calculation that rests on two rough inputs: the number of training tokens and the factor covering the forward and backward passes. Different values for either, or details of the actual training setup, could change the final number by a factor of a few, but 3×10²³ FLOPs is a commonly cited ballpark figure.
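To see how "a factor of a few" can arise, the sketch below varies the two rough inputs and reports the resulting range. The multipliers are hypothetical and purely illustrative, not measured uncertainties: the token count is scaled by 0.5× to 2×, and the per-token factor is either 6 or 8, where 8 assumes activation recomputation adds roughly one extra forward pass (2 more FLOPs per parameter per token).

```python
from itertools import product

N_PARAMS = 1.75e11    # GPT-3 parameter count (from the text)
BASE_TOKENS = 3e11    # central token estimate (from the text)

# Hypothetical ranges, chosen only to illustrate sensitivity.
token_multipliers = (0.5, 1.0, 2.0)
flops_factors = (6, 8)   # 8 assumes one extra forward pass for recomputation

estimates = [
    factor * N_PARAMS * (BASE_TOKENS * m)
    for m, factor in product(token_multipliers, flops_factors)
]
low, high = min(estimates), max(estimates)
print(f"low ~ {low:.2e} FLOPs, high ~ {high:.2e} FLOPs, ratio ~ {high / low:.1f}")
```

Because the estimate is a plain product, an error of k× in any single input shifts the result by exactly k×, so modest uncertainty in two inputs compounds into a spread of a few times around 3×10²³.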
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Tests an AI's understanding of number representation
Tests an AI's randomness and creativity
Here are 5 jokes:
Why don't scientists trust atoms? Because they make up everything.
What do you call a fish without eyes? A fsh.
Why did the scarecrow win an award? Because he was outstanding in his field.
I told my wife she was drawing her eyebrows too high. She looked surprised.
What do you call a bear with no teeth? A gummy bear.
All 5 are unique: each joke has a different setup, punchline, and subject (atoms, fish, scarecrow, eyebrows, bear).
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
So I've been trying to drink more water. Everyone says drink more water. Eight glasses a day. And I'm doing it, I'm hydrated, I'm healthy, and I have to pee every forty-five minutes like I'm paying rent on the bathroom.
I was in a meeting last week—an important meeting—and I'm sitting there doing the math. "Okay, she's on slide 12, there's 40 slides, I can make it. I can make it." I could not make it. I did that thing where you stand up and just... leave. No explanation. Power move, honestly. Everyone probably thought I was so busy and important. Nope. Kidneys.
I turned 35 this year, and your body just starts... sending you notifications. Like a software update you keep ignoring. My knee cracked the other day and my wife looked at me like I'd been shot. I'm like, "No, that's just... that's just how I stand up now. From sitting. That's the sound of standing."
And doctors start asking you different questions at 35. "Any family history of—" YES. Whatever you're about to say, yes. Heart disease, diabetes, anxiety, bad decisions at buffets—it's all in there. My family tree is basically a WebMD search gone wrong.
I have a four-year-old, and nobody tells you that having kids means you're just gonna be wrong about everything for the rest of your life. My daughter corrected me on dinosaurs last week. I said "brontosaurus" and she hit me with, "Actually, it's apatosaurus, the brontosaurus was a misclassification."
She's FOUR. I don't even know where she learned that. I'm pretty sure I taught her the word "spoon" like eight months ago, and now she's out here peer-reviewing my paleontology.
Thank you, you've been a great crowd!
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page