Claude Sonnet 3.6 (2022-10-22) vs Qwen3 Coder
Compare Claude Sonnet 3.6 (2022-10-22) by Anthropic against Qwen3 Coder by Qwen, tested across 32 shared challenges. Updated February 2026.
32 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
I'll estimate the FLOPs required to train GPT-3 by breaking down the key components and making reasonable approximations.
For a transformer model, the forward pass FLOPs are dominated by matrix multiplications in the attention and MLP layers.
Attention layers:
MLP layers:
Total forward pass FLOPs per token: ~1.7 × 10¹¹
Backward pass typically requires ~2× the forward pass FLOPs due to gradient computations.
Forward + Backward = ~5.1 × 10¹¹ FLOPs per token
However, we need to account for multiple epochs and the fact that training typically processes sequences, not individual tokens.
Sequence-based calculation:
Total training FLOPs: ~3 × 10²³ to 5 × 10²³ FLOPs
This is equivalent to approximately 300,000 to 500,000 exaFLOPs (300 to 500 zettaFLOPs), since one exaFLOP is 10¹⁸ FLOPs.
This estimate aligns reasonably with published figures and captures the massive computational requirements of training such a large language model, representing one of the most computationally intensive AI training runs to date.
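As a rough cross-check on the total above, here is a minimal sketch using the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token. This is a different shortcut from the per-layer breakdown in the answer. The parameter count (175 billion) and token count (300 billion) are assumptions taken from the published GPT-3 description, not from the text above, and the function and constant names are illustrative only.

```python
def training_flops(n_params: float, n_tokens: float,
                   flops_per_param_token: float = 6.0) -> float:
    """Approximate total training FLOPs for a dense transformer.

    The default factor of 6 is the usual rule of thumb covering the
    forward and backward passes together (an assumption, not a measurement).
    """
    return flops_per_param_token * n_params * n_tokens


# Assumed GPT-3 figures (not stated in the text above).
N_PARAMS = 175e9   # model parameters
N_TOKENS = 300e9   # training tokens

total_flops = training_flops(N_PARAMS, N_TOKENS)
print(f"Estimated training compute: {total_flops:.2e} FLOPs")
```

Under these assumptions the result falls at the low end of the 3 × 10²³ to 5 × 10²³ range quoted above.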
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
I need to work through this step-by-step.
Given information:
Let me think about this carefully:
Since Sally is a girl, she is one of the sisters that each brother has.
If each brother has 2 sisters, and Sally is one of those sisters, then there must be 1 additional sister besides Sally.
Therefore, Sally has 1 sister.
To verify:
Sally has 1 sister.
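The reasoning above can be checked with a small sketch that builds the family explicitly. The function name is hypothetical, and the number of brothers used in the example call (3) is an assumption for illustration, since the text above does not state it; the answer does not depend on it.

```python
def count_sallys_sisters(num_brothers: int, sisters_per_brother: int) -> int:
    """Count Sally's sisters, given how many sisters each brother has."""
    # Every brother shares the same set of sisters: all girls in the family.
    girls = ["Sally"] + [f"sister_{i}" for i in range(1, sisters_per_brother)]
    boys = [f"brother_{i}" for i in range(1, num_brothers + 1)]

    # Verify the puzzle's condition holds for every brother.
    for _ in boys:
        assert len(girls) == sisters_per_brother

    # Sally's sisters are all the girls except Sally herself.
    return len([g for g in girls if g != "Sally"])


# Assumed family size: 3 brothers, each with 2 sisters.
print(count_sallys_sisters(num_brothers=3, sisters_per_brother=2))
```

The key step is that Sally is counted among each brother's sisters, so her own sister count is one less than theirs.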
Tests an AI's understanding of number representation
Tests an AI's randomness and creativity
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.