Mistral Devstral Medium vs Gemini 3 Pro Preview
Compare Mistral Devstral Medium by Mistral AI against Gemini 3 Pro Preview by Google AI, tested across 44 shared challenges. Updated February 2026.
Compare Mistral Devstral Medium by Mistral AI against Gemini 3 Pro Preview by Google AI, tested across 44 shared challenges. Updated February 2026.
39 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
To estimate the total FLOPs (floating-point operations) required to train GPT-3, we can use a standard approximation formula derived from the architecture of Transformer models.
Here is the step-by-step breakdown of the estimation:
To calculate training compute, we need two primary numbers: the number of parameters in the model ($P$) and the number of tokens the model was trained on ($D$).
For dense Transformer models, a widely accepted heuristic (often cited in papers by Kaplan et al. and DeepMind regarding scaling laws) is that training requires approximately 6 FLOPs per parameter per training token.
Therefore, the formula for Total Training Compute ($C$) is: $$C \approx 6 \times P \times D$$
Now, substitute the values into the formula:
$$C \approx 6 \times (175 \times 10^9) \times (300 \times 10^9)$$
First, group the integers and the powers of 10: $$C \approx (6 \times 175 \times 300) \times (10^9 \times 10^9)$$
Calculate the integers:
Combine with the exponent ($10^{18}$): $$C \approx 315,000 \times 10^{18} \text{ FLOPs}$$
To make the number readable, we normalize it:
$$315,000 = 3.15 \times 10^5$$
Combine exponents: $$3.15 \times 10^5 \times 10^{18} = 3.15 \times 10^{23}$$
The estimated total compute required to train GPT-3 is approximately:
$$3.14 \times 10^{23} \text{ FLOPs}$$
(Note: In the AI literature, this is often converted to "Petaflop-days." One Petaflop-day is approximately $8.64 \times 10^{19}$ FLOPs. Dividing $3.14 \times 10^{23}$ by that number yields roughly 3,640 Petaflop-days, which aligns closely with OpenAI's published figures.)
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Sally has 1 sister.
Here is the breakdown:
Tests an AI's understanding of number representation
Tests an AI's randomness and creativity
Sure, here are five jokes for you:
Why don't scientists trust atoms? Because they make up everything!
What do you call fake spaghetti? An impasta.
How does the moon cut his hair? Eclipse it.
What do you call a can opener that doesn't work? A can't opener.
How many tickles does it take to make an octopus laugh? Ten-tickles.
All five jokes are unique.
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to generate a complete, working landing page
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to replicate an existing UI with Tailwind CSS