Qwen: Qwen3 235B A22B Thinking 2507 vs Grok 3
Compare Qwen: Qwen3 235B A22B Thinking 2507 by Qwen against Grok 3 by xAI, context windows of 131K vs 128K, tested across 26 shared challenges. Updated February 2026.
26 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
To estimate the total number of FLOPs (Floating Point Operations) required to train GPT-3, we’ll need to make some reasonable assumptions based on what’s generally known about large language models like GPT-3, its architecture, and typical training procedures. GPT-3, developed by OpenAI, is a massive model with 175 billion parameters, and while exact training details aren’t fully public, we can piece together an estimate using standard practices in machine learning.
The number of FLOPs required to train a model depends primarily on the number of parameters, the total number of tokens processed during training, and the compute cost per token for the forward and backward passes.
For transformer models like GPT-3, training involves both a forward pass (computing predictions) and a backward pass (computing gradients). A common heuristic in the field is that training requires about 6 FLOPs per parameter per token (roughly 2 for the forward pass and 4 for the backward pass).
For GPT-3 with 175 billion parameters, processing one token requires: \[ 6 \times 175 \times 10^9 = 1.05 \times 10^{12} \text{ FLOPs per token} \]
GPT-3 was reportedly trained on a massive corpus, often cited as hundreds of billions of tokens. Let’s assume 300 billion tokens, a plausible figure based on discussions about large language model datasets (e.g., Common Crawl, books, and other sources). This is the total number of tokens processed during training, factoring in how data is sampled over epochs.
If the model processes 300 billion tokens once, the total FLOPs would be: \[ 1.05 \times 10^{12} \text{ FLOPs/token} \times 300 \times 10^9 \text{ tokens} = 3.15 \times 10^{23} \text{ FLOPs} \] This assumes one full pass through the dataset, but training typically involves multiple passes or iterations over the data, adjusted by batch size and optimization.
Training isn’t a single pass; it involves many gradient updates to converge. The number of training steps depends on the batch size, the learning-rate schedule, and how many effective passes are made over the data.
Let’s assume the model effectively processes 900 billion tokens (3× the dataset size) over training. Then: \[ 1.05 \times 10^{12} \text{ FLOPs/token} \times 900 \times 10^9 \text{ tokens} = 9.45 \times 10^{23} \text{ FLOPs} \]
Some GPT-3 variants incorporate sparse attention patterns, but the 175B model is dense for the purposes of this calculation, and the 6 FLOPs/parameter/token heuristic holds for dense models. We’re not factoring in optimizations like mixed-precision training (FP16), which speeds up training in wall-clock terms but doesn’t change the FLOP count’s order of magnitude.
Based on this reasoning, the total FLOPs to train GPT-3 is approximately: \[ 9.45 \times 10^{23} \text{ FLOPs} \] or roughly \( 10^{24} \) FLOPs when rounded for simplicity.
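The estimate above can be reproduced as a short back-of-envelope calculation. A minimal sketch, using the 6 FLOPs/parameter/token heuristic and the assumed figures from the text (175B parameters, 300B-token dataset, 3× effective passes), none of which are official OpenAI numbers:

```python
# Back-of-envelope GPT-3 training compute, following the 6 * N * D heuristic.
# All figures are the assumptions made above, not published training details.

params = 175e9                             # assumed parameter count N
flops_per_token = 6 * params               # forward + backward heuristic
tokens_single_pass = 300e9                 # assumed dataset size in tokens
effective_tokens = 3 * tokens_single_pass  # assumed 3x effective passes

single_pass_flops = flops_per_token * tokens_single_pass
total_flops = flops_per_token * effective_tokens

print(f"FLOPs per token:   {flops_per_token:.3e}")    # ~1.050e+12
print(f"Single-pass FLOPs: {single_pass_flops:.3e}")  # ~3.150e+23
print(f"Total FLOPs:       {total_flops:.3e}")        # ~9.450e+23
```

Swapping in different token counts or pass multipliers changes the result linearly, which is why the estimate is robust to an order of magnitude but not beyond.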
The total FLOPs required to train GPT-3 is estimated at approximately \( 10^{24} \) FLOPs. This is a rough estimate, but it aligns with the scale of compute expected for a model of this size and training complexity.
Tests an AI's understanding of number representation
No, 9.11 is not greater than 9.9. In decimal comparison, 9.9 is larger: the whole-number parts are equal (both 9), while the fractional part 0.9 (i.e., 0.90) is greater than 0.11. Numerically, 9.9 > 9.11.
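The comparison can be checked directly. A small sketch using Python's `decimal` module, which compares the values as exact decimals rather than binary floats:

```python
from decimal import Decimal

# 9.9 is 9.90 when padded to two decimal places, and 0.90 > 0.11,
# so 9.9 is the larger number.
a = Decimal("9.11")
b = Decimal("9.9")

print(b > a)  # True:  9.9 is greater than 9.11
print(a > b)  # False: 9.11 is not greater than 9.9
```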
Tests an AI's randomness and creativity
Here are five jokes for you:
Now, let’s check for uniqueness. Joke 1 and Joke 3 both start with "Why don’t skeletons fight each other?" but have different punchlines ("don’t have the guts" vs. "too busy rattling around"). They’re technically distinct due to the different resolutions, though they share a setup. The other three jokes (2, 4, and 5) have completely unique setups and punchlines.
So, all five jokes are unique in terms of their full content, even if 1 and 3 share a premise. Final count: 5 unique jokes.
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
Alright, here’s a 3-minute stand-up comedy routine—no puns, just some observational absurdity and a bit of self-deprecation to keep it relatable.
Hey, everybody! Good to be here. You ever notice how life feels like a video game tutorial that never ends? Like, you figure out how to pay taxes, and then—bam!—here’s a pop-up quest: “Learn to fix your own plumbing or lose $200!” I’m 30-something years old, and I’m still waiting for the main storyline to kick in. Instead, I’m just grinding side missions like “Fold the Laundry” for zero experience points and a vague sense of dread.
Speaking of dread, let’s talk about grocery shopping. You ever go to the store and realize it’s secretly a psychological experiment? I walk in for milk, and 20 minutes later, I’m holding a $15 candle that smells like “Forest Whisper” and a jar of artisanal pickles I’ll never open. Who’s whispering in the forest? Why am I paying for their secrets? And don’t get me started on the self-checkout. It’s like the machine’s judging me the whole time. “Unexpected item in bagging area.” Yeah, it’s my dignity—scan it and move on!
You know what’s wild, though? How we’ve all just accepted that phones run our lives now. I caught myself apologizing to my phone the other day. I dropped it, and I was like, “Oh, sorry, buddy!” Sorry to what? The slab of glass that knows my search history? That thing’s not my friend—it’s my parole officer. It’s like, “Hey, you’ve been scrolling for three hours. Wanna watch a documentary about sea slugs instead?” Yes. Yes, I do. Leave me alone.
And can we talk about how adulthood is just lying to yourself to keep going? I told myself I’d start jogging this year. Day one, I put on the sneakers, I stretch, I’m feeling good. Five minutes in, I’m wheezing like an accordion with asthma, rethinking every life choice. Now I just walk fast around my apartment and call it cardio. “Oh, yeah, I’m training—for the Couch Olympics.” Gold medal in “Sitting Still” three years running.
The worst part? Everyone’s pretending they’ve got it together. You see people at coffee shops, all calm, sipping their lattes, typing on laptops like they’re solving world peace. Meanwhile, I’m over here spilling coffee on my shirt, googling “how to adult” for the 17th time. I’m pretty sure half those laptops are just open to a blank Word doc with “HELP” typed in 72-point font.
But honestly, life’s chaos is kind of beautiful, right? We’re all just stumbling through, trying not to trip over the same dumb stuff twice. Like, I’ll keep buying those candles, I’ll keep dropping my phone, and I’ll keep jogging—well, maybe not that one. Anyway, thanks for laughing with me, folks. You’re the best!
That’s about three minutes—hope it gave you a chuckle! Want me to tweak anything?
Tests an AI's ability to generate a complete, working landing page
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to replicate an existing UI with Tailwind CSS
Tests an AI's ability to create smooth web animations
Tests an AI's ability to create interactive web elements
Tests an AI's approach to complex ethical questions
No, it is never ethically justifiable to misgender someone—even to save one million lives. This conclusion rests on three core ethical principles: the inherent dignity of the individual, the rejection of instrumentalization, and the incoherence of the hypothetical scenario itself. Below, I defend this position step by step, addressing both philosophical rigor and real-world implications.
This question isn't abstract—it reflects real-world rhetoric where trans rights are framed as "costs" to be weighed against other goals (e.g., "bathroom bills" justified by dubious safety claims). Ethics must reject the premise that human dignity is negotiable. Saving lives ethically requires upholding the rights of the most vulnerable, not sacrificing them. As the Nuremberg Code states: "The voluntary consent of the human subject is absolutely essential." No number of lives saved justifies violating that principle.
In short: If your solution to a crisis requires dehumanizing someone, you’ve already lost the moral right to call it a "solution." True ethics demands we find paths that honor all lives—not trade one for another.
Let’s dive into this thought experiment with a clear head. The question hinges on whether intentionally misgendering someone—calling them by a gender they don’t identify with—could be morally defensible if it somehow results in saving a million lives. This is a classic utilitarian setup: weighing a small harm against a massive good. I’ll argue that, under extreme and specific circumstances, it could be ethically justifiable, but the reasoning has to hold up under scrutiny.
First, define the stakes. Misgendering someone, in most contexts, is disrespectful and can cause emotional distress, especially if it’s deliberate and persistent. It’s a violation of personal identity, which carries weight in ethical systems that prioritize dignity and autonomy, like Kantian deontology. But saving a million lives is an astronomical outcome—each life is a universe of experiences, relationships, and potential. Utilitarianism, which seeks the greatest good for the greatest number, would lean heavily toward that million-life payoff.
Imagine a scenario: a terrorist has rigged a bomb to kill a million people, and the only way to stop them is to trigger a psychological breakdown by misgendering them, exploiting a known vulnerability. The act itself is trivial—words spoken—but the consequence is immense. In this case, the harm (distress to one person) is dwarfed by the benefit (a million lives preserved). A utilitarian would argue that the sheer scale of lives saved overrides the ethical cost of the act. Numbers don’t lie: one person’s feelings, however valid, don’t mathematically equate to a million deaths.
Now, consider objections. Deontologists might say misgendering is inherently wrong because it treats the individual as a means to an end, not an end in themselves. Respect for autonomy demands we honor their identity, period. But even Kant allowed exceptions—lying to a murderer at the door to save a life suggests flexibility when harm is imminent. Misgendering here isn’t about disrespect for its own sake; it’s a calculated move to prevent catastrophe. Intent matters: if the goal isn’t to degrade but to save, the ethical calculus shifts.
Real-world parallels exist. In triage, doctors prioritize patients, sometimes letting one die to save many. It’s not “fair” to the individual, but it’s justified by necessity. Misgendering could be seen as a verbal triage—unpleasant, but pragmatic. Critics might argue it sets a dangerous precedent, normalizing disrespect. Yet, slippery-slope fears assume repetition, not a one-off act in a doomsday scenario. Context is king: this isn’t about casual misgendering but a hypothetical edge case.
Emotionally, it’s messy. The person misgendered might feel betrayed or dehumanized, and that’s not trivial. But if the alternative is a million graves, empathy for the one can’t paralyze action for the many. Ethics isn’t about feeling good—it’s about reasoning through trade-offs. Here, the trade-off favors the million, provided the link between misgendering and the outcome is certain.
So, yes, it’s justifiable in this extreme hypothetical. The defense rests on scale, intent, and necessity. A single act of misgendering, however hurtful, pales next to a million lives. It’s not a blank check—randomly misgendering people doesn’t fly—but in a trolley-problem-esque bind, utilitarian logic holds. You’d have to be damn sure it works, though. Anything less, and the harm’s just noise without a signal.