Gemini 3 Flash Preview vs xAI: Grok 4
Compare Gemini 3 Flash Preview by Google AI against Grok 4 by xAI, with context windows of 1.0M vs 256K tokens, tested across 38 shared challenges. Updated February 2026.
38 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
Below, I'll estimate the total number of floating-point operations (FLOPs) required to train GPT-3 from scratch, based on publicly known details about the model (from its original release and related analyses) and standard formulas used in machine learning research for estimating training compute. I'll explain my reasoning step by step, including the assumptions I'm making and why they're reasonable. Note that this is an estimation—exact numbers aren't publicly disclosed in full detail by OpenAI, and real-world training involves optimizations, inefficiencies, and hardware-specific factors that could adjust the final figure. I'm not searching online; this is based on my pre-existing knowledge of large language model (LLM) training.
These numbers are widely cited in AI research papers and discussions around scaling laws (e.g., work from OpenAI and DeepMind on model scaling).
Training a transformer-based LLM like GPT-3 involves:
- A forward pass over every training token, with compute roughly proportional to the number of parameters, and
- A backward pass to compute gradients, which costs roughly twice the forward pass.
A widely used rule of thumb from AI scaling research (e.g., papers like "Scaling Laws for Neural Language Models" by Kaplan et al. from OpenAI) estimates the total training FLOPs as:
Total FLOPs ≈ 6 × P × T
Where:
- P = the number of model parameters (≈175 billion for GPT-3, i.e., 1.75 × 10^11)
- T = the number of training tokens (≈300 billion, i.e., 3 × 10^11)
This formula is a simplification but matches empirical reports for models like GPT-2, GPT-3, and others. It focuses on the dominant compute cost (parameter updates) and doesn't include inference-time FLOPs, data loading, or evaluation; it covers training only.
Why 6? Each parameter contributes roughly 2 FLOPs per token in the forward pass (a multiply and an add) and roughly 4 FLOPs per token in the backward pass, giving about 6 in total across the transformer layers (attention, feed-forward networks). Some papers use slightly different multipliers (e.g., 3× or 4× if only counting multiply-accumulates), but 6× is the standard for total FLOPs in scaling-law discussions.
Total FLOPs = 6 × (1.75 × 10^11) × (3 × 10^11)
First, compute 1.75 × 3 = 5.25
Then, 5.25 × 10^(11+11) = 5.25 × 10^22
Finally, 6 × 5.25 × 10^22 = 31.5 × 10^22 = 3.15 × 10^23
So, my estimate is approximately 3 × 10^23 FLOPs (rounding to one significant figure for simplicity, as these are ballpark numbers).
If I adjust for potential underestimation (e.g., higher token count or overhead), the upper bound might be 5 × 10^23 FLOPs, but I stick with 3 × 10^23 as the core estimate.
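As a sanity check, here is a minimal Python sketch of the same arithmetic, assuming the 6 × P × T rule of thumb and the parameter and token counts used above. The per-GPU throughput at the end is an illustrative assumption for converting FLOPs into GPU-days, not a sourced figure.

```python
# Rough estimate of GPT-3 training compute via the 6 * P * T rule of thumb.
P = 175e9   # model parameters (~1.75 x 10^11)
T = 300e9   # training tokens  (~3 x 10^11)

total_flops = 6 * P * T
print(f"Total training FLOPs ~ {total_flops:.2e}")  # ~3.15e+23

# Illustrative only: convert to GPU-days at an assumed sustained throughput.
assumed_flops_per_second = 100e12  # 100 TFLOP/s per GPU (assumption)
gpu_days = total_flops / assumed_flops_per_second / 86_400
print(f"~{gpu_days:,.0f} GPU-days at the assumed throughput")  # ~36,458
```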
To make this relatable:
Based on this reasoning, I estimate the total FLOPs for training GPT-3 at around 3 × 10^23. This is consistent with what I'd expect from scaling laws and matches informal discussions in the AI community. If new details emerged (e.g., exact token counts), it could shift by a factor of 2, but the order of magnitude (10^23) is solid. If you have additional details or want me to adjust assumptions, let me know!
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Based on the details in the riddle, Sally has exactly 1 sister.
If there were more (or fewer) than 2 girls total, the "each brother has 2 sisters" part of the riddle couldn't possibly be true. No other interpretation fits the facts as stated.
Some people misread this type of riddle and try to multiply things (e.g., "3 brothers × 2 sisters each = 6 sisters"), but that's incorrect because it counts the same sisters over and over. There's no need for math here beyond basic counting of boys vs. girls: the family simply has 3 boys and 2 girls, as explained above (see the sketch below).
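For illustration, a minimal sketch (assuming the usual phrasing of the riddle: Sally has 3 brothers, and each brother has 2 sisters) that counts sisters from each child's point of view:

```python
# Family implied by the riddle: 3 boys plus 2 girls (Sally and one other girl).
boys = ["brother_1", "brother_2", "brother_3"]
girls = ["Sally", "other_girl"]

# Each brother counts every girl as a sister -> 2 sisters each, as stated.
for boy in boys:
    print(boy, "has", len(girls), "sisters")

# Sally counts every girl except herself -> exactly 1 sister.
print("Sally has", len(girls) - 1, "sister")
```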
If this doesn't match what you think the answer is, or if there's more context I'm missing, feel free to reply!
Tests an AI's randomness and creativity
Here are 5 jokes:
How many are unique? All 5 jokes are unique.
Below, I'll follow your instructions step by step. First, I'll generate 5 jokes (numbered for clarity). Then, I'll answer the question: "How many are unique?"
Only 1 (since they're all the same joke repeated 5 times). 😄
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's ability to simulate personalities and predict future trends
Below is a simulated fake interview with Steve Jobs, imagined as if he were alive and leading Apple in 2025. This is entirely fictional, drawing on his known personality, quotes, and visionary style from his real-life interviews and keynotes. I've portrayed him as passionate, blunt, innovative, and focused on user experience, while critiquing the status quo. The interviewer is a fictional tech journalist named Alex Rivera.
Simulated Interview: Steve Jobs on the Future of AI in 2025
Date: March 15, 2025
Location: Apple's Cupertino HQ (Imagined Scenario)
Interviewer (Alex Rivera): Steve, it's an honor to sit down with you again. It's 2025, and AI is everywhere—from chatbots running our schedules to generative tools creating art and code. You've been back at the helm of Apple for a few years now in this hypothetical world. What's your take on where AI is headed?
Steve Jobs: Alex, thanks for having me. Look, AI isn't new. We've been chasing this dream since the '50s, but now it's exploding because of all this data and computing power. But here's the thing: most of what's out there today is junk. It's like the early days of personal computers—clunky, overcomplicated, and not really about people. At Apple, we're not just building AI; we're making it magical. Invisible. It should feel like an extension of your mind, not some robot overlord dictating your life.
Interviewer: Magical. That's a word you've used a lot. Can you give an example of how Apple is approaching AI differently? We've seen Siri evolve into something more proactive, but what's next?
Steve Jobs: Siri was a start, but let's be honest, it was mediocre for too long. Now, in 2025, we're integrating AI across everything—your iPhone, your Mac, even your home. Imagine this: AI that anticipates your needs without you asking. Not by spying on you—that's the creepy stuff Google and Meta do—but by understanding context. Our new Neural Engine in the M5 chip? It's not just faster; it's intuitive. It learns your creative process. If you're a designer, it suggests palettes based on your mood and past work, but it doesn't create for you. AI should amplify human creativity, not replace it. That's the future: tools that make you feel superhuman.
Interviewer: Speaking of replacement, there's a lot of fear about AI taking jobs, or even surpassing human intelligence. We're hearing buzz about AGI—artificial general intelligence—possibly arriving by 2030. Does that worry you?
Steve Jobs: Worry? It excites me and terrifies me. AGI could be the biggest invention since fire, but if we screw it up, it's a disaster. Look at history: the printing press democratized knowledge, but it also spread misinformation. AI is the same. Jobs? Yeah, it'll disrupt them, just like the iPhone disrupted cameras and MP3 players. But that's progress. The key is ethics. At Apple, we're not rushing to build some god-like AI. We're focused on humane AI—privacy-first, no data hoarding. I worry about companies like OpenAI or whatever Elon Musk is tinkering with; they're playing with fire without thinking about the humans using it. If AGI happens, it better be in service to people, not the other way around.
Interviewer: You've always been critical of competitors. What do you think of the current AI landscape? Tools like ChatGPT or Google's Bard are generating everything from essays to images. Is this innovation or just hype?
Steve Jobs: Hype, mostly. These models are impressive parlor tricks—spit out a poem, draw a cat in Picasso's style. But they're derivative. They train on stolen data, regurgitate the past, and call it "creation." That's not innovation; that's laziness. Real creativity comes from the soul, from struggle. AI should help with the grunt work, like editing or research, so humans can focus on the breakthroughs. We're building that at Apple—our AI Studio app lets artists collaborate with AI as a co-pilot, not a boss. And privacy? Encrypted end-to-end. No one's scraping your ideas to train their next model.
Interviewer: Looking ahead, say to 2030 or beyond, how do you see AI changing daily life? Will it be in our brains, like Neuralink, or more subtle?
Steve Jobs: Subtle is key. I don't want chips in my brain—that's science fiction nightmare stuff. The future is seamless integration. By 2030, AI will be like electricity: everywhere but invisible. Your car drives itself safely, your doctor catches diseases before symptoms, your education is personalized. But we have to guard against addiction and isolation. Remember, technology should connect us, not isolate us. That's why Apple's AI emphasizes real-world interaction—augmented reality glasses that enhance conversations, not replace them. And education? AI tutors that adapt to how kids learn, making school fun again.
Interviewer: One last question: If you could give advice to the next generation of innovators working on AI, what would it be?
Steve Jobs: Think different. Don't chase the next billion-dollar algorithm; chase the human experience. Ask: Does this make life better? Does it inspire? Does it respect privacy and dignity? AI isn't about machines winning; it's about humans thriving. If we get that right, the future is limitless. If not... well, let's not go there. Stay hungry, stay foolish.
Interviewer: Wise words as always, Steve. Thank you.
(End of simulated interview.)
This simulation captures Jobs' essence: his emphasis on simplicity, user-centric design, and bold visions, while weaving in plausible 2025 AI trends like advanced assistants, ethical concerns, and integration into devices. It's all fictional fun!
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page
Recreate an interactive, nostalgic Pokémon battle UI in a single HTML file.
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to replicate an existing UI with Tailwind CSS