DeepSeek R1 vs Qwen: Qwen3.5 Flash
Compare DeepSeek R1 by DeepSeek against Qwen3.5 Flash by Qwen (context windows: 128K vs 1.0M), tested across 41 shared challenges. Updated February 2026.
Tests an AI's ability to make educated estimates based on technical knowledge
To estimate the total number of FLOPs (Floating Point Operations) required to train GPT-3, we can rely on the model's configuration details which were published in the original paper ("Language Models are Few-Shot Learners", Brown et al., 2020) and the general scaling laws for Transformer models (Kaplan et al., 2020).
Here is the step-by-step reasoning and estimation:
First, we need to establish the two most critical variables for computing training cost: the number of model parameters, $N \approx 1.75 \times 10^{11}$ (175 billion), and the number of training tokens, $T \approx 3.0 \times 10^{11}$ (300 billion).
The number of floating-point operations required to process a single token depends on the architecture's complexity during training.
Theoretical Scaling Law: A widely accepted rule of thumb for the total training compute of a Transformer model is: $$ \text{Total FLOPs} \approx 6 \times N \times T $$ Note: This $6N$ factor is a lower-bound heuristic often used for scaling laws. It accounts for the forward and backward passes through the weights.
Practical training implementations do perform additional work (activation recomputation, optimizer state updates, memory management), but these overheads are modest in FLOP terms: the compute actually reported for GPT-3 is close to the theoretical $6NT$ estimate, not an order of magnitude above it.
Using the theoretical scaling law formula ($6 \times N \times T$):
$$ \text{FLOPs} \approx 6 \times (1.75 \times 10^{11}) \times (3.0 \times 10^{11}) $$ $$ \text{FLOPs} \approx 6 \times 5.25 \times 10^{22} $$ $$ \text{FLOPs} \approx 31.5 \times 10^{22} $$ $$ \text{FLOPs} \approx 3.15 \times 10^{23} $$
This yields approximately $3.15 \times 10^{23}$ FLOPs, i.e. about 315 zettaFLOPs of total compute.
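As a sanity check, the $6NT$ arithmetic above can be reproduced in a few lines of Python (a minimal sketch; variable names are illustrative):

```python
# Back-of-the-envelope estimate of GPT-3's training compute
# using the common "6 FLOPs per parameter per token" rule.
N = 175e9          # model parameters (175 billion)
T = 300e9          # training tokens (300 billion)

total_flops = 6 * N * T
print(f"{total_flops:.3e} FLOPs")  # ~3.150e+23 FLOPs
```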
While the $6NT$ rule is a simplified lower-bound heuristic, for GPT-3 it agrees closely with the officially reported figure. Appendix D of the paper lists the total training compute for the 175B model as approximately $3.14 \times 10^{23}$ FLOPs, or about 3,640 petaflop/s-days (one petaflop/s-day is $10^{15} \times 86{,}400 \approx 8.64 \times 10^{19}$ FLOPs). The small gap between $3.15 \times 10^{23}$ and $3.14 \times 10^{23}$ comes from rounding of the parameter and token counts.

Based on the model size (175B parameters), the dataset size (300B tokens), and the standard $6NT$ scaling rule, the estimated total FLOPs required to train GPT-3 is approximately:

$$ \mathbf{3.15 \times 10^{23} \text{ FLOPs}} $$ (roughly 315 zettaFLOPs, matching the paper's reported $3.14 \times 10^{23}$ FLOPs)
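For intuition, the $6NT$ total can also be converted into petaflop/s-days, a unit commonly used to report training compute; the sketch below assumes the 175B-parameter, 300B-token figures used throughout:

```python
# Convert the 6*N*T estimate into petaflop/s-days.
total_flops = 6 * 175e9 * 300e9   # ~3.15e23 FLOPs
pfs_day = 1e15 * 86_400           # one petaflop/s sustained for 24 hours

print(total_flops / pfs_day)      # ~3646 petaflop/s-days
```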
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Tests an AI's randomness and creativity
Here are 5 jokes:
How many are unique?

There are 5 unique jokes in this list. Each one is distinct from the others. (Note: while they are all different from one another, they are also well-known classics rather than original creations.)
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
(Walks to the mic, adjusts the stand, takes a sip of water)
So, I went to the grocery store the other day. I realize now that going to the grocery store isn't a chore anymore. It’s a tactical simulation. We don't just walk in and get milk. We navigate a gauntlet.
Especially the self-checkout. They call it "self-checkout," which implies autonomy. But it feels more like "self-surveillance."
You stand there with your basket. You scan an apple. The screen asks, "Is this an apple?" You say, "Yes." It says, "Okay, please place the apple in the bagging area."
This is where the real acting begins. You have to put the apple in the bag perfectly. You can't be holding it. You can't have the bag touching the scanner. If you slip, the machine screams.
It’s not a beep. It’s a human voice. It says, "Please remove the item." Not "Please move the item." "Remove the item." It sounds like the machine is judging your life choices.
I was in there last week, trying to scan a loaf of bread. It wouldn't scan. I scanned it three times. I moved it. I tapped the screen. Finally, the machine said, "Unexpected item in bagging area."
I have a loaf of bread in a plastic bag. I put the bread in the bag. Why is it unexpected?
I hit "Help." And that is when the horror starts.
There is no person there. There is a button, and when you press it, you have to wait twenty seconds. Then, a video feed pops up. A woman appears. She’s wearing a headset. She looks tired. She looks like she’s seen things.
She says, "Hi, what can I help you with?"
I say, "I'm trying to buy bread."
She says, "Let me see your screen."
I hold my phone up to the screen. She says, "Okay, the bagging area is too heavy."
I say, "I have a loaf of bread. It weighs 16 ounces."
She says, "It's weighing 15 pounds."
I say, "That's the scale, not the bread."
She sighs. She says, "I'm going to override it."
And then she has to type. She has to type a code. And while she types, the machine locks you out. You can't move. You can't breathe. You're trapped in a digital holding cell.
It makes you feel like you're stealing something. You're not stealing. You're just trying to go home and eat dinner.
But the worst part is the checkout line. If you use a cashier, they treat you like royalty. But if you use the self-checkout, the actual cashiers are watching you. They are standing behind the glass, watching you struggle.
I saw a guy put a watermelon in the bag. It made a huge thud. The machine said, "Please remove the item."
The guy put the watermelon in the bag again. The machine said, "Please remove the item."
The guy put the watermelon on the floor. The machine said, "Please remove the item."
Finally, the machine just gave up and said, "Please wait for an attendant."
And the guy just stood there, waiting.
That is the new human condition. You don't leave the store when you're done. You leave the store when the machine decides you are done.
We used to worry about the rent. Now we worry about the bagging area.
Thank you, you've been great! (Drops mic and walks off)
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.