Grok 4.20 Multi-Agent Beta
3.15 × 10^{23} FLOPs
GPT-3 (the 175B-parameter version) has N ≈ 1.75 × 10^{11} parameters. It was trained on a filtered dataset of roughly D ≈ 3 × 10^{11} tokens (a commonly cited figure drawn from Common Crawl, WebText, books, and Wikipedia, with some up-sampling of higher-quality sources).
For dense transformer models the dominant training cost comes from the forward and backward passes through the layers. A widely used approximation is:
Total training FLOPs ≈ 6 × N × D
- ~2N FLOPs per token for the forward pass (matrix multiplies in attention and FFN layers).
- ~4N FLOPs per token for the backward pass (gradients with respect to both activations and weights, each costing roughly as much as the forward pass).
Multiplying gives:
6 × 1.75 × 10^{11} × 3 × 10^{11} = 6 × 5.25 × 10^{22} = 3.15 × 10^{23}
This figure matches the number most often quoted for GPT-3 in the literature and in scaling-law discussions. Minor variations (±10–20%) can arise from the exact token count, optimizer overhead, or mixed-precision implementation details, but the order-of-magnitude estimate remains ~3 × 10^{23} FLOPs.
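The arithmetic above can be checked with a few lines of Python, using the N and D figures quoted earlier:

```python
# Back-of-envelope check of the 6*N*D training-FLOPs estimate for GPT-3.
# The factor 6 = 2 (forward) + 4 (backward) FLOPs per parameter per token.
N = 1.75e11  # parameters (175B)
D = 3e11     # training tokens (~300B)

flops_per_token = 6 * N          # forward (2N) + backward (4N)
total_flops = flops_per_token * D

print(f"{total_flops:.2e}")      # → 3.15e+23
```

Scaling either N or D by a given factor scales the total linearly, which is why the ±10–20% uncertainty in the token count carries through directly to the final estimate.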