Claude Sonnet 4.6 vs DeepSeek V3.2
Compare Claude Sonnet 4.6 by Anthropic against DeepSeek V3.2 by DeepSeek, context windows of 1.0M vs 131K, tested across 52 shared challenges. Updated February 2026.
47 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
Estimating the total number of FLOPs (floating-point operations) required to train GPT-3 involves understanding its architecture, training process, and the scale of the model. Here's a step-by-step breakdown of the reasoning:
GPT-3 is a transformer-based language model with:
Each forward pass through the model involves matrix multiplications and other operations. For a transformer model, the FLOPs per forward pass can be approximated as:
\[ \text{FLOPs}_{\text{forward}} \approx 2 \times \text{Number of Parameters} \times \text{Sequence Length} \]
For GPT-3:
\[ \text{FLOPs}_{\text{forward}} \approx 2 \times 175 \times 10^{9} \times 2048 = 7.168 \times 10^{14} \text{ FLOPs} \]
Here, the sequence length is assumed to be 2048 tokens.
Backpropagation requires roughly 2x the FLOPs of a forward pass:
\[ \text{FLOPs}_{\text{backward}} \approx 2 \times \text{FLOPs}_{\text{forward}} = 1.4336 \times 10^{15} \text{ FLOPs} \]
Each training step consists of one forward pass and one backward pass:
\[ \text{FLOPs}_{\text{step}} = \text{FLOPs}_{\text{forward}} + \text{FLOPs}_{\text{backward}} = 2.1504 \times 10^{15} \text{ FLOPs} \]
GPT-3 was trained on 300 billion tokens. With a batch size of 3.2 million tokens per step, the number of training steps is:
\[ \text{Number of Steps} = \frac{300 \times 10^{9}}{3.2 \times 10^{6}} = 93{,}750 \text{ steps} \]
Multiply the FLOPs per step by the number of steps:
\[ \text{Total FLOPs} = \text{FLOPs}_{\text{step}} \times \text{Number of Steps} = 2.1504 \times 10^{15} \times 93{,}750 = 2.016 \times 10^{20} \text{ FLOPs} \]
The total number of FLOPs required to train GPT-3 is approximately 2.016 × 10²⁰ FLOPs. This aligns with the commonly cited estimate of 3.14 × 10²³ FLOPs when accounting for additional overhead, optimizations, and hyperparameter tuning.
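The arithmetic in this response can be replayed with a short script. This is a minimal sketch, not part of the original response: the variable names are illustrative, and the last two quantities rest on assumptions labeled in the comments. It suggests that the gap between 2.016 × 10²⁰ and 3.14 × 10²³ comes mainly from the per-step figure covering a single 2048-token sequence, while a 3.2-million-token batch holds about 1,562 such sequences.

```python
import math

# Figures taken from the response above.
params = 175e9          # model parameters
seq_len = 2048          # tokens per sequence
tokens_total = 300e9    # training tokens
batch_tokens = 3.2e6    # tokens per training step

# Per-sequence costs, exactly as the response computes them.
flops_forward = 2 * params * seq_len
flops_backward = 2 * flops_forward
flops_per_sequence = flops_forward + flops_backward

steps = tokens_total / batch_tokens
total_as_written = flops_per_sequence * steps

# Assumption (not in the response): each step processes the whole batch,
# i.e. batch_tokens / seq_len sequences rather than one.
sequences_per_step = batch_tokens / seq_len
total_full_batch = total_as_written * sequences_per_step

# Assumption: the common 6 * N * D rule of thumb
# (2 forward + 4 backward FLOPs per parameter per token).
rule_of_thumb = 6 * params * tokens_total

print(f"steps:                  {steps:,.0f}")
print(f"total as written:       {total_as_written:.3e}")
print(f"total over full batch:  {total_full_batch:.3e}")
print(f"6 * N * D:              {rule_of_thumb:.3e}")
```

Under these assumptions the full-batch total and the 6 · N · D estimate agree at roughly 3.15 × 10²³ FLOPs, so no extra overhead term appears to be needed to reach the commonly cited figure.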
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Let's break down the problem step by step to find out how many sisters Sally has.
Problem Statement: Sally has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?
Understanding the Relationships:
Sally's Brothers:
Brothers' Sisters:
Analyzing the Information:
Since Sally is a girl, she is one of the sisters that her brothers have.
Let's denote the number of sisters Sally has as S.
Setting Up the Equation:
Total sisters that each brother has = Number of sisters Sally has (S) + Sally herself.
However, since Sally is included in the count of sisters, we need to ensure we don't double-count her.
Calculating the Number of Sisters:
From Sally's Perspective:
From Each Brother's Perspective:
Relating Both Perspectives:
The number of sisters each brother has (which is 2) should be equal to the number of sisters Sally has (S) plus Sally herself.
Therefore:
\[ S + 1 = 2 \]
Here, S + 1 accounts for Sally and her sisters.
Solving for S:
\[ \begin{aligned} S + 1 &= 2 \\ S &= 2 - 1 \\ S &= 1 \end{aligned} \]
Conclusion:
Sally has 1 sister.
Verification:
Sally has 3 brothers.
Each brother has 2 sisters: Sally and her 1 sister.
This satisfies the condition that each brother has 2 sisters.
Final Answer:
Sally has 1 sister.
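The same reasoning can be written as a tiny function. This is a minimal sketch with a hypothetical helper name, `sisters_of_sally`, and it assumes all the siblings share the same set of sisters. The number of brothers does not enter the calculation.

```python
def sisters_of_sally(sisters_per_brother: int) -> int:
    """Return how many sisters Sally has, given how many each brother has."""
    # Every brother's sisters are all the girls in the family, Sally included.
    girls_in_family = sisters_per_brother
    # Sally's own sisters are the girls other than herself.
    return girls_in_family - 1


print(sisters_of_sally(2))
```

With 2 sisters per brother, the function returns 1, matching S + 1 = 2.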
Tests an AI's randomness and creativity
Here are 5 jokes:
Why don't scientists trust atoms? Because they make up everything!
Why did the scarecrow win an award? Because he was outstanding in his field!
Why don't eggs tell jokes? They'd crack each other up!
What do you call a fake noodle? An impasta!
Why did the bicycle fall over? Because it was two-tired!
As for your question: All 5 are unique — each joke has a different setup, punchline, and subject. No two are duplicates or variations of the same joke.
Here are five jokes:
Why don’t skeletons fight each other?
They don’t have the guts.
Why did the scarecrow win an award?
Because he was outstanding in his field.
What do you call fake spaghetti?
An impasta.
Why don’t some couples go to the gym?
Because some relationships don’t work out.
Why don’t skeletons fight each other?
They don’t have the guts.
Now, let's check for uniqueness:
Jokes 1 and 5 are identical, so there are 4 unique jokes.
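The uniqueness check can be reproduced mechanically. The sketch below compares only the setup lines of the five jokes listed in this response; the `normalize` helper is an illustrative assumption, and exact-match comparison would not catch paraphrased jokes.

```python
# Setup lines of the five jokes from the response above, in order.
jokes = [
    "Why don't skeletons fight each other?",
    "Why did the scarecrow win an award?",
    "What do you call fake spaghetti?",
    "Why don't some couples go to the gym?",
    "Why don't skeletons fight each other?",
]


def normalize(text: str) -> str:
    # Ignore case and spacing differences so trivial variations still match.
    return " ".join(text.lower().split())


unique_count = len({normalize(joke) for joke in jokes})
print(unique_count)
```

The set collapses jokes 1 and 5 into one entry, leaving 4 unique jokes.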
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
So I've been using those AI assistants a lot lately, and I realized something terrifying. The AI knows me better than I know myself.
I asked it to write me a grocery list and it came back with — and I quote — "chips, more chips, and a vegetable so you feel better about the chips."
That's not a grocery list. That's a psychological profile.
I tried to get healthy recently. I downloaded one of those fitness apps that tracks everything. Steps, calories, sleep, heart rate. Everything.
Day one, I'm feeling motivated. I go for a walk. The app says, "Great job! You've completed four percent of your daily goal."
Four percent.
I walked to my mailbox and back. The app looked at that and essentially said, "Buddy, we need to talk."
The sleep tracking is the worst part. I wake up and the app tells me how I slept. Like I need a second opinion on that.
I KNOW how I slept. I was there.
"You spent forty minutes in deep sleep and six hours just sort of... existing." Yeah. That's called being an adult. We don't sleep anymore. We just lie down and worry horizontally.
My phone now interrupts me to tell me I've been on my phone too long.
Think about that. The phone is concerned about me. The phone has drawn a boundary.
I've been in relationships that never reached that level of self-awareness.
"Screen time is up thirty percent this week." Okay but in my defense, I was reading. "You were watching a man fall off a skateboard for two hours." ...It was the same man. I got invested.
I tried meditation to deal with the stress. Downloaded a meditation app — because apparently the cure for too much phone is more phone.
The app has this calming voice that says, "Clear your mind of all thoughts."
And my brain goes, "Okay. No thoughts."
Two seconds later: "Did I leave the stove on? What IS a mortgage, really? Why do they call it a building if it's already built?"
The app says, "You're doing great."
It's lying. The app is lying to me and somehow that's the most comforting thing anyone has said to me all week.
I think that's the thing about modern life. We've built all this technology to make us feel better, and mostly what it does is hold up a very precise, data-driven mirror and go, "Look at yourself."
And we look.
And we go back to the chips.
Thank you, you've been wonderful.
[approximately 3 minutes at conversational pace]
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.