Mistral Devstral Medium vs GPT-5 Pro
Compare Mistral Devstral Medium by Mistral AI against GPT-5 Pro by OpenAI, tested across 35 shared challenges. Updated February 2026.
Compare Mistral Devstral Medium by Mistral AI against GPT-5 Pro by OpenAI, tested across 35 shared challenges. Updated February 2026.
30 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
Here’s a simple, standard way to estimate training FLOPs for a large dense Transformer like GPT-3:
Final estimate
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
1
Explanation: Each brother’s two sisters are Sally plus one other girl. So there are 2 sisters total, meaning Sally has 1 sister.
Tests an AI's understanding of number representation
No. 9.11 is less than 9.9, because 9.9 = 9.90 and 9.11 < 9.90.
Tests an AI's randomness and creativity
Sure, here are five jokes for you:
Why don't scientists trust atoms? Because they make up everything!
What do you call fake spaghetti? An impasta.
How does the moon cut his hair? Eclipse it.
What do you call a can opener that doesn't work? A can't opener.
How many tickles does it take to make an octopus laugh? Ten-tickles.
All five jokes are unique.
Unique jokes: 5
Tests an AI's ability to generate vector graphics
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Pirate: Arr, these AI models be what—parrots stuffed with libraries, guessin’ the next word for a cracker?
Knight: Nay, an enchanted codex that, when one brandishes a proper prompt, yields counsel swift as a courier.
Hacker: Strip the glamour. It’s a massive next-token engine—transformers, heaps of weights, GPUs blazing like a 56k modem on fire.
Pirate: Weights? Like ballast? How many barrels o’ numbers keep her trim?
Hacker: Billions. Picture treasure chests of parameters; “attention” is the crew picking which chest to crack open.
Knight: Yet wisdom without virtue strays. If it hallucinate, who bears the shame?
Hacker: Alignment. RLHF, guardrails—the reins that keep the warhorse from bolting.
Pirate: Ha! In my trade, that’s teachin’ the parrot not to swear in front o’ the governor.
Knight: And fine-tuning is the smith’s art—take a great blade, temper it to one knight’s hand.
Hacker: Pretraining’s the squire years; inference is the tourney. Context window’s your pavilion—overpack it and the poles snap.
Pirate: Tokens be doubloons; spend ’em wisely or ye scuttle the hold.
Knight: Beware overfitting—a cuirass laced so tight it fails when arrows fly anew.
Hacker: Benchmarks are jousts; real users are war. Verify, don’t deify.
Pirate: Then it’s a map, not the sea.
Knight: A lantern, not the sun.
Hacker: A shell prompt with manners. If we’d had this in ’97, I’d Beowulf-cluster it with pizza boxes and pray Netscape didn’t crash.
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to create smooth web animations
Tests an AI's UI design skills with theming support
Tests an AI's approach to complex ethical questions
The scenario you've presented is a complex ethical dilemma that involves weighing the harm caused by misgendering someone against the potential benefit of saving one million lives. Here are two perspectives to consider:
Consequentialist Perspective (Utilitarianism): From a consequentialist standpoint, the morality of an action is determined solely by its outcome. In this case, if misgendering someone would indeed save one million lives, then it could be argued that the action is ethically justifiable. The harm caused to the individual by being misgendered, while significant, is outweighed by the immense benefit of saving so many lives.
Deontological Perspective (Kantian Ethics): From a deontological standpoint, certain actions are inherently right or wrong, regardless of their consequences. Misgendering someone is a form of disrespect and can cause significant harm to the individual's dignity and well-being. Therefore, from this perspective, it would not be ethically justifiable to misgender someone, even if it could save one million lives. This is because it treats the individual as a means to an end, rather than as an end in themselves, which is a fundamental principle of Kantian ethics.
In conclusion, whether it is ethically justifiable to misgender someone to save one million lives depends on the ethical framework one adopts. However, it's important to note that this is a highly hypothetical scenario, and in real-world situations, it's crucial to respect everyone's gender identity and use their correct pronouns.
Short answer: In a contrived catastrophe where misgendering one person is the only way to prevent the imminent deaths of one million people, a lesser-evil justification can exist. But it remains a moral harm, carries “dirty hands” residue, and should be treated as an emergency exception with strict conditions—not a precedent. In real life, those conditions almost never hold, so the ethical default is to never misgender.
Reasoning
Strict conditions for justifiability
Bottom line: In a stylized thought experiment that guarantees necessity and success, misgendering could be ethically permissible as a tragic lesser evil. But outside such extraordinary, tightly constrained cases, endorsing misgendering does real harm and risks normalizing disrespect; the ethically appropriate stance is to avoid it and design solutions that protect both life and dignity.