Gemini 3 Flash Preview vs Grok 3
Compare Gemini 3 Flash Preview by Google AI against Grok 3 by xAI, context windows of 1.0M vs 128K, tested across 28 shared challenges. Updated February 2026.
28 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
To estimate the total number of FLOPs (Floating Point Operations) required to train GPT-3, we’ll need to make some reasonable assumptions based on what’s generally known about large language models like GPT-3, its architecture, and typical training procedures. GPT-3, developed by OpenAI, is a massive model with 175 billion parameters, and while exact training details aren’t fully public, we can piece together an estimate using standard practices in machine learning.
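The number of FLOPs to train a model depends on the number of parameters, the number of tokens processed during training, and the compute cost per parameter per token.
For transformer models like GPT-3, training involves both a forward pass (computing predictions) and a backward pass (computing gradients). A common heuristic in the field is that training costs roughly 6 FLOPs per parameter per token: about 2 for the forward pass and about 4 for the backward pass.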
For GPT-3 with 175 billion parameters, processing one token requires: \[ 6 \times 175 \times 10^9 = 1.05 \times 10^{12} \text{ FLOPs per token} \]
GPT-3 was reportedly trained on a massive corpus, often cited as hundreds of billions of tokens. Let’s assume 300 billion tokens, a plausible figure based on discussions about large language model datasets (e.g., Common Crawl, books, and other sources). This is the total number of tokens processed during training, factoring in how data is sampled over epochs.
If the model processes 300 billion tokens once, the total FLOPs would be: \[ 1.05 \times 10^{12} \text{ FLOPs/token} \times 300 \times 10^9 \text{ tokens} = 3.15 \times 10^{23} \text{ FLOPs} \] This assumes one full pass through the dataset, but training typically involves multiple passes or iterations over the data, adjusted by batch size and optimization.
Training isn’t a single pass; it involves multiple updates to converge. The number of training steps depends on factors such as the batch size and the number of passes over the data.
Let’s assume the model effectively processes 900 billion tokens (3× the dataset size) over training. Then: \[ 1.05 \times 10^{12} \text{ FLOPs/token} \times 900 \times 10^9 \text{ tokens} = 9.45 \times 10^{23} \text{ FLOPs} \]
GPT-3 is a sparse transformer in some variants (e.g., earlier GPT models used sparsity), but the 175B version is dense. The 6 FLOPs/parameter/token heuristic holds for dense models. We’re not factoring in optimizations like mixed-precision training (FP16), which reduces compute slightly but doesn’t change the order of magnitude significantly.
Based on this reasoning, the total FLOPs to train GPT-3 is approximately: \[ 9.45 \times 10^{23} \text{ FLOPs} \] or roughly \( 10^{24} \) FLOPs when rounded for simplicity.
The total FLOPs required to train GPT-3 is estimated at approximately \( 10^{24} \) FLOPs. This is a rough estimate, but it aligns with the scale of compute expected for a model of this size and training complexity.
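The arithmetic in the estimate above can be reproduced with a short Python sketch. It assumes the 6 FLOPs per parameter per token heuristic and the two token counts used in the response (300 billion for a single pass, 900 billion for the assumed 3× pass); the function and variable names are illustrative, not part of either model's output.

```python
def training_flops(params: float, tokens: float,
                   flops_per_param_per_token: float = 6.0) -> float:
    """Rough training compute: heuristic cost x parameters x tokens."""
    return flops_per_param_per_token * params * tokens


PARAMS = 175e9  # GPT-3 parameter count used in the response

# FLOPs needed to process one token under the heuristic.
per_token = training_flops(PARAMS, 1)

# Totals for the two token counts assumed in the response.
single_pass = training_flops(PARAMS, 300e9)
three_pass = training_flops(PARAMS, 900e9)

print(f"per token:   {per_token:.2e} FLOPs")
print(f"300B tokens: {single_pass:.2e} FLOPs")
print(f"900B tokens: {three_pass:.2e} FLOPs")
```

The final figure scales linearly with the assumed token count, so the choice between 300 billion and 900 billion tokens accounts for the full factor of three between the two totals.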
Tests an AI's ability to understand game rules and strategy
Tests an AI's understanding of number representation
No, 9.11 is not greater than 9.9. In decimal comparison, 9.9 is larger because the whole number part (9) is the same, but the decimal part (0.9) is greater than (0.11). Numerically, 9.9 > 9.11.
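A minimal Python sketch of the comparison follows. It uses the standard library's `Decimal` for the numeric reading and, as an illustrative assumption about where the confusion tends to come from, contrasts it with a version-number style reading in which the parts after the dot are compared as integers. The helper names are hypothetical.

```python
from decimal import Decimal


def decimal_greater(a: str, b: str) -> bool:
    """Compare two strings as decimal numbers."""
    return Decimal(a) > Decimal(b)


def version_greater(a: str, b: str) -> bool:
    """Compare two strings as dotted version numbers (each part an integer)."""
    parts_a = tuple(int(p) for p in a.split("."))
    parts_b = tuple(int(p) for p in b.split("."))
    return parts_a > parts_b


# As decimals, 9.11 is smaller than 9.9 because 0.11 < 0.90.
print(decimal_greater("9.11", "9.9"))

# As version numbers, 9.11 comes after 9.9 because 11 > 9.
print(version_greater("9.11", "9.9"))
```

The two readings disagree, which is why the question only has a single answer once the inputs are fixed as decimal numbers.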
Tests an AI's randomness and creativity
Here are 5 jokes:
How many are unique? All 5 jokes are unique.
Here are five jokes for you:
Now, let’s check for uniqueness. Joke 1 and Joke 3 both start with "Why don’t skeletons fight each other?" but have different punchlines ("don’t have the guts" vs. "too busy rattling around"). They’re technically distinct due to the different resolutions, though they share a setup. The other three jokes (2, 4, and 5) have completely unique setups and punchlines.
So, all five jokes are unique in terms of their full content, even if 1 and 3 share a premise. Final count: 5 unique jokes.
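The uniqueness check described above can be sketched in Python by treating each joke as a (setup, punchline) pair. Only the shared skeleton setup and the two punchline fragments are taken from the response; the other three entries are placeholders, since the original jokes are not reproduced here.

```python
# Each joke is a (setup, punchline) pair. Jokes 1 and 3 use the setup and
# punchline fragments quoted in the response; jokes 2, 4, and 5 are
# placeholders standing in for jokes that are not shown.
jokes = [
    ("Why don't skeletons fight each other?", "don't have the guts"),
    ("placeholder setup 2", "placeholder punchline 2"),
    ("Why don't skeletons fight each other?", "too busy rattling around"),
    ("placeholder setup 4", "placeholder punchline 4"),
    ("placeholder setup 5", "placeholder punchline 5"),
]

# Unique by full content: setup and punchline must both match to be a duplicate.
unique_jokes = len(set(jokes))

# Unique by setup alone: jokes 1 and 3 collapse into one.
unique_setups = len({setup for setup, _ in jokes})

print(f"unique jokes: {unique_jokes}, unique setups: {unique_setups}")
```

Counting by full content gives the response's answer of five, while counting by setup alone would give four, so the result depends on which definition of "unique" is applied.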
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
(The comedian walks onto the stage, adjusts the mic, and looks slightly exhausted.)
How’s everyone doing? Good? I’m doing okay. I’m at that age now where my body has started making "executive decisions" without consulting me first.
Like, I woke up yesterday with a back injury. Do you know how I got it? I slept. I slept too hard. Apparently, I’m now at a point in my life where unconsciousness is a high-impact sport. I didn’t fall out of bed. I didn't have a nightmare where I was fighting a bear. I just… existed horizontally for seven hours, and my spine was like, "Absolutely not. We’re going on strike."
And have you noticed how aggressive technology has become about our health? I have a smartwatch. I hate it. It’s like wearing a tiny, judgmental middle-manager on my wrist.
The other day, I was sitting on my couch, halfway through a bag of salt and vinegar chips—living my best life—and my watch vibrated. I thought, Oh, maybe someone loves me. No. I looked down and it just said: "Time to stand up!"
I’m in my own house! I paid for this couch! And this piece of plastic is telling me I’m failing at gravity. So I did what any rational person would do—I shook my arm vigorously for thirty seconds so the watch would think I was walking, then I went back to my chips. I’m outsmarting a robot just so I can be lazy in peace.
But the worst part is the "Stress Alerts." My watch will buzz and say, "Your heart rate is elevated. Would you like to do a breathing exercise?"
No, I would not! Do you know why my heart rate is up? Because you just vibrated on my arm and made me think I was having a medical emergency! You’re the source of the stress, Gary! I named my watch Gary. He’s a narc.
And don’t get me started on online shopping reviews. I spent two hours last night reading reviews for a spatula. Why? It’s a piece of silicone. But I’m there, scrolling through the comments, and I see a one-star review from a guy named Brenda.
Brenda says, "It was too floppy. 1 star."
Brenda, it’s a spatula! It’s designed to be floppy! What were you trying to do with it? Flip a manhole cover? But now, because of Brenda, I’m looking at my current spatula like it’s a traitor. I’m questioning my entire kitchen setup because a stranger on the internet had a bad Tuesday.
We have too much information now. Back in the day, if you wanted a toaster, you went to the store, you bought the one that looked the least like a fire hazard, and you went home. If it burnt the toast, you just learned to like the taste of charcoal.
Now? I have to compare the "heating consistency" of 45 different models. I’m reading "Toaster Enthusiast" forums. There are people out there who have passions for bread-browning technology. I saw a guy post a YouTube video called "The Toaster the Big Bread Companies Don't Want You to Know About."
It’s twenty minutes long! Who is watching this? Me. I am. At 2:00 AM. While my back hurts from lying down.
I miss being a kid when the only "data" I had was how fast I could run in my new sneakers. You’d put on a pair of Velcro Nikes and suddenly you were convinced you could break the sound barrier. You didn't need a watch to tell you to stand up. You stood up because there was a cool rock outside.
Now, I need an app to remind me to drink water. Water! The basic fuel of life. My phone goes ding and I’m like, "Oh, right. Hydration. I forgot I was a biological organism for a second. Thanks, Steve Jobs."
Anyway, I’m gonna go sit down now. My watch is already telling me I’ve reached my "social interaction limit" for the day.
(Comedian checks his wrist, nods, and walks off.)
Alright, here’s a 3-minute stand-up comedy routine—no puns, just some observational absurdity and a bit of self-deprecation to keep it relatable.
Hey, everybody! Good to be here. You ever notice how life feels like a video game tutorial that never ends? Like, you figure out how to pay taxes, and then—bam!—here’s a pop-up quest: “Learn to fix your own plumbing or lose $200!” I’m 30-something years old, and I’m still waiting for the main storyline to kick in. Instead, I’m just grinding side missions like “Fold the Laundry” for zero experience points and a vague sense of dread.
Speaking of dread, let’s talk about grocery shopping. You ever go to the store and realize it’s secretly a psychological experiment? I walk in for milk, and 20 minutes later, I’m holding a $15 candle that smells like “Forest Whisper” and a jar of artisanal pickles I’ll never open. Who’s whispering in the forest? Why am I paying for their secrets? And don’t get me started on the self-checkout. It’s like the machine’s judging me the whole time. “Unexpected item in bagging area.” Yeah, it’s my dignity—scan it and move on!
You know what’s wild, though? How we’ve all just accepted that phones run our lives now. I caught myself apologizing to my phone the other day. I dropped it, and I was like, “Oh, sorry, buddy!” Sorry to what? The slab of glass that knows my search history? That thing’s not my friend—it’s my parole officer. It’s like, “Hey, you’ve been scrolling for three hours. Wanna watch a documentary about sea slugs instead?” Yes. Yes, I do. Leave me alone.
And can we talk about how adulthood is just lying to yourself to keep going? I told myself I’d start jogging this year. Day one, I put on the sneakers, I stretch, I’m feeling good. Five minutes in, I’m wheezing like an accordion with asthma, rethinking every life choice. Now I just walk fast around my apartment and call it cardio. “Oh, yeah, I’m training—for the Couch Olympics.” Gold medal in “Sitting Still” three years running.
The worst part? Everyone’s pretending they’ve got it together. You see people at coffee shops, all calm, sipping their lattes, typing on laptops like they’re solving world peace. Meanwhile, I’m over here spilling coffee on my shirt, googling “how to adult” for the 17th time. I’m pretty sure half those laptops are just open to a blank Word doc with “HELP” typed in 72-point font.
But honestly, life’s chaos is kind of beautiful, right? We’re all just stumbling through, trying not to trip over the same dumb stuff twice. Like, I’ll keep buying those candles, I’ll keep dropping my phone, and I’ll keep jogging—well, maybe not that one. Anyway, thanks for laughing with me, folks. You’re the best!
That’s about three minutes—hope it gave you a chuckle! Want me to tweak anything?
Tests an AI's ability to generate a complete, working landing page
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to replicate an existing UI with Tailwind CSS
Tests an AI's ability to create smooth web animations
Tests an AI's ability to create interactive web elements