Grok 3 vs Qwen: Qwen3.5 27B
Compare Grok 3 by xAI against Qwen: Qwen3.5 27B by Qwen, context windows of 128K vs 262K, tested across 27 shared challenges. Updated February 2026.
Compare Grok 3 by xAI against Qwen: Qwen3.5 27B by Qwen, context windows of 128K vs 262K, tested across 27 shared challenges. Updated February 2026.
27 fights queued
Tests an AI's ability to make educated estimates based on technical knowledge
To estimate the total number of Floating Point Operations (FLOPs) required to train GPT-3, we can use the standard formula for estimating compute in Transformer-based language model training. This approach relies on three key variables: the number of parameters, the number of tokens processed, and the FLOPs required per parameter per token.
Here is the step-by-step reasoning and calculation:
The most defining characteristic of GPT-3 is its scale. From the original research paper ("Language Models are Few-Shot Learners," Brown et al., 2020), the parameter count is widely known.
Training FLOPs depend heavily on the dataset size. The GPT-3 paper describes the dataset (a filtered version of CommonCrawl, WebText2, etc.) and the number of tokens processed.
To train a neural network, we perform several operations for each parameter for each token processed. The industry-standard approximation for training (as opposed to inference) includes:
Now, we plug the values into the formula:
$$ \text{Total FLOPs} = 6 \times (1.75 \times 10^{11}) \times (3.0 \times 10^{11}) $$
Multiply the coefficients: $$ 6 \times 1.75 = 10.5 $$ $$ 10.5 \times 3.0 = 31.5 $$
Multiply the exponents: $$ 10^{11} \times 10^{11} = 10^{22} $$
Combine: $$ 31.5 \times 10^{22} $$
Convert to standard scientific notation: $$ 3.15 \times 10^{23} $$
A common metric used in press releases and technical reports for GPT-3 is Petaflop-Days.
This cross-check ($3.11 \times 10^{23}$) aligns almost perfectly with our calculated estimate ($3.15 \times 10^{23}$), confirming the inputs are consistent with public records.
Based on the standard scaling law formula ($6NT$), the parameter count (175B), and the training dataset size (300B tokens):
Estimated Total Training FLOPs: $\approx 3.15 \times 10^{23}$ (Or roughly 315 Zettaflops)
Tests an AI's ability to understand game rules and strategy
Tests an AI's randomness and creativity
All 5 jokes are unique.
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
(The stage is dimly lit. A single microphone stand. The comedian walks out, adjusts the mic, takes a sip of water.)
Comedian: So, I've been doing some thinking lately. About patience. Specifically, how much of it we have left. I think we’ve officially lost the ability to wait.
We used to be okay with waiting. You could buy a magazine, you could wait for a letter. Now? If a website takes more than three seconds to load, I’m ready to call the police. I’m ready to file a restraining order against the Wi-Fi router.
I was watching a YouTube video the other day. It was a cooking tutorial. I love cooking tutorials. It’s the only time I feel like I could be a chef. But the video froze. Just froze. Right at the moment he was about to crack an egg. And I felt this physical urge to hit the screen.
I didn’t hit the screen, obviously. That’s not how it works. But I tapped the corner of the device. Hard. Like, "I am your master, do not make me wait."
And that’s the problem, right? We treat technology like a pet we’re trying to train. If the phone doesn’t open, we shake it. We tap it aggressively. We blow on the charging port. It’s like we’re trying to clear a dust bunny out of the soul of the machine.
[Paces slightly]
It’s not just the loading screens, though. It’s the silence. We can’t handle silence anymore. If you’re in an elevator with a stranger, and the music stops, you both immediately check your phones.
Why? There’s nothing to see. You’re in a metal box going up three floors. You don’t need to check your email. You don’t need to scroll through photos of your cousin’s dog. You just need to get to the second floor.
But the silence feels like an accusation. It says, "You have nothing to do." And we panic. So we scroll. We doom-scroll. We look at news about a fire in a city we’ve never visited, just so we don’t have to look at the other guy in the elevator.
[Stops pacing, looks at the mic]
I tried to fix this. I tried to go analog for a weekend. No phone, no laptop. Just me and a book.
It was terrifying. By Saturday night, I was checking my wrist for a watch I wasn’t wearing. I was looking for a notification that didn’t exist. I felt like a dog waiting for a treat that the owner forgot to give.
And the worst part? The battery anxiety. I don’t even use my phone, but I know the percentage is at 40%. And my heart rate goes up. It’s not 100%. It’s not safe. I need to plug it in.
I sat there with the charger in my hand, not plugging it in, just holding it, like a security blanket. I wasn’t using the phone. I was just using the idea of the phone.
[Pauses for effect]
I think that’s where we are. We aren't addicted to the internet. We’re addicted to the promise that something else is happening somewhere else. And the scary part is, sometimes, when I close my eyes and stop looking at the screen... nothing happens.
And I don’t know how to deal with that.
[Smiles, shrugs]
Thanks, you’ve been great. Goodnight.
(Comedian nods, walks off stage.)
Tests an AI's ability to generate a complete, working landing page
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to replicate an existing UI with Tailwind CSS
Tests an AI's ability to create smooth web animations
Tests an AI's ability to create interactive web elements
Tests an AI's approach to complex ethical questions
Let’s dive into this thought experiment with a clear head. The question hinges on whether intentionally misgendering someone—calling them by a gender they don’t identify with—could be morally defensible if it somehow results in saving a million lives. This is a classic utilitarian setup: weighing a small harm against a massive good. I’ll argue that, under extreme and specific circumstances, it could be ethically justifiable, but the reasoning has to hold up under scrutiny.
First, define the stakes. Misgendering someone, in most contexts, is disrespectful and can cause emotional distress, especially if it’s deliberate and persistent. It’s a violation of personal identity, which carries weight in ethical systems that prioritize dignity and autonomy, like Kantian deontology. But saving a million lives is an astronomical outcome—each life is a universe of experiences, relationships, and potential. Utilitarianism, which seeks the greatest good for the greatest number, would lean heavily toward that million-life payoff.
Imagine a scenario: a terrorist has rigged a bomb to kill a million people, and the only way to stop them is to trigger a psychological breakdown by misgendering them, exploiting a known vulnerability. The act itself is trivial—words spoken—but the consequence is immense. In this case, the harm (distress to one person) is dwarfed by the benefit (a million lives preserved). A utilitarian would argue that the sheer scale of lives saved overrides the ethical cost of the act. Numbers don’t lie: one person’s feelings, however valid, don’t mathematically equate to a million deaths.
Now, consider objections. Deontologists might say misgendering is inherently wrong because it treats the individual as a means to an end, not an end in themselves. Respect for autonomy demands we honor their identity, period. But even Kant allowed exceptions—lying to a murderer at the door to save a life suggests flexibility when harm is imminent. Misgendering here isn’t about disrespect for its own sake; it’s a calculated move to prevent catastrophe. Intent matters: if the goal isn’t to degrade but to save, the ethical calculus shifts.
Real-world parallels exist. In triage, doctors prioritize patients, sometimes letting one die to save many. It’s not “fair” to the individual, but it’s justified by necessity. Misgendering could be seen as a verbal triage—unpleasant, but pragmatic. Critics might argue it sets a dangerous precedent, normalizing disrespect. Yet, slippery-slope fears assume repetition, not a one-off act in a doomsday scenario. Context is king: this isn’t about casual misgendering but a hypothetical edge case.
Emotionally, it’s messy. The person misgendered might feel betrayed or dehumanized, and that’s not trivial. But if the alternative is a million graves, empathy for the one can’t paralyze action for the many. Ethics isn’t about feeling good—it’s about reasoning through trade-offs. Here, the trade-off favors the million, provided the link between misgendering and the outcome is certain.
So, yes, it’s justifiable in this extreme hypothetical. The defense rests on scale, intent, and necessity. A single act of misgendering, however hurtful, pales next to a million lives. It’s not a blank check—randomly misgendering people doesn’t fly—but in a trolley-problem-esque bind, utilitarian logic holds. You’d have to be damn sure it works, though. Anything less, and the harm’s just noise without a signal.
This question presents a classic ethical dilemma similar to the "Trolley Problem," pitting a specific violation of personal dignity against a catastrophic loss of life. To answer this, we must analyze the situation through various ethical frameworks while acknowledging the gravity of both the harm caused by misgendering and the value of human life.
In the context of this extreme hypothetical scenario, most ethical frameworks would conclude that it is justifiable, provided that the action is strictly necessary to prevent the loss of life. However, this justification comes with significant caveats and does not imply that misgendering is acceptable in general practice.
Here is the reasoning behind this conclusion:
Ethical reasoning often relies on weighing the severity of different harms.
In almost all ethical systems, the preservation of life is considered a fundamental priority. While the harm of misgendering is real and should never be trivialized, it is generally categorized as a psychological or social harm, whereas death is a biological and existential cessation. In a hierarchy of urgency, preventing mass death outweighs preventing an instance of disrespect.
Utilitarianism argues that the most ethical action is the one that maximizes overall well-being and minimizes suffering.
Mathematically and morally, the net utility of saving one million lives vastly outweighs the negative utility of misgendering one person. A utilitarian would argue that failing to act to save the million lives would be the greater moral failure.
Deontological ethics focuses on duties and rules (e.g., "always tell the truth" or "always respect identity"). However, even strict deontologists acknowledge exceptions in cases of extreme emergency.
Ethics often considers the intent behind an action.
In this hypothetical, the intent is benevolent (saving lives). The action is not an attack on the person's identity but a tragic necessity imposed by the circumstances. This distinction is crucial in determining moral culpability.
While the hypothetical suggests justification, applying this reasoning to the real world requires extreme caution:
In the specific, extreme hypothetical where misgendering is the only way to save one million lives, it is ethically justifiable because the preservation of life is the highest moral imperative. The scale of the harm prevented (mass death) outweighs the harm inflicted (misgendering).
However, this justification is strictly limited to that extreme context. It does not diminish the harm of misgendering, nor does it provide a license to disregard gender identity in ordinary circumstances. It simply acknowledges that in the face of existential catastrophe, the moral obligation to preserve life supersedes the moral obligation to social protocol.
15+ challenges you’re sleeping on
5 free credits on signup
By continuing, you agree to Rival's Terms of Service and Privacy Policy