Estimated total FLOPs for training GPT-3: ~3.4 × 10²³ FLOPs
Step-by-Step Reasoning
1. Model size (parameters)
- GPT-3 (the 175B model) has 175 billion parameters.
- This is the dense transformer model ("davinci"), the model usually meant when people say "GPT-3".
2. Tokens seen during training
- The GPT-3 paper states the model was trained on 300 billion tokens (from a filtered version of Common Crawl + other datasets).
- This is the standard figure used in the scaling-law literature (e.g., Kaplan et al. 2020) and in follow-on analyses such as Hoffmann et al. 2022.
3. FLOPs per token for a transformer (forward + backward pass)
For a dense transformer, the dominant cost is the matrix multiplications in the attention and feed-forward layers.
A widely used approximation in the community (Kaplan et al., Hoffmann et al., NVIDIA's Megatron work, and others) is:
FLOPs per token ≈ 6 × N
where N is the number of parameters.
Why 6N?
- Forward pass ≈ 2N FLOPs per token (each parameter participates in one multiply-accumulate, which counts as 2 FLOPs).
- Backward pass ≈ 4N FLOPs per token (roughly twice the forward cost, since gradients must be computed w.r.t. both weights and activations).
- Total ≈ 2N + 4N = 6N FLOPs per token.
This approximation is accurate for large dense transformers as long as the context length is short relative to the model width, so the quadratic attention term stays small (GPT-3 uses a 2048-token context).
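The 6N breakdown above can be sketched as a short calculation (a minimal sketch; the 2N/4N split is the approximation from the bullets, not an exact count):

```python
# Sketch of the 6N-per-token approximation: 2N forward + 4N backward.
N = 175e9  # GPT-3 parameter count

forward_flops = 2 * N   # one multiply-accumulate (2 FLOPs) per parameter per token
backward_flops = 4 * N  # backward pass costs roughly 2x the forward pass
per_token = forward_flops + backward_flops

print(f"FLOPs per token ≈ {per_token:.2e}")  # ≈ 1.05e+12
```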
4. Total FLOPs calculation
Total training FLOPs ≈ 6 × N × D
where:
- N = 175 × 10⁹ parameters
- D = 300 × 10⁹ tokens
Plugging in the numbers:
6 × (175 × 10⁹) × (300 × 10⁹) = 6 × 52,500 × 10¹⁸
= 315,000 × 10¹⁸
= 3.15 × 10²³ FLOPs
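The headline arithmetic is easy to verify directly:

```python
# Verify the headline estimate: C ≈ 6 * N * D.
N = 175e9   # parameters
D = 300e9   # training tokens

total_flops = 6 * N * D
print(f"Total training FLOPs ≈ {total_flops:.3e}")  # ≈ 3.150e+23
```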
5. Adjustments / realism check
- Some papers use slightly different constants (e.g., 6.4N when including embedding and other minor terms, or 5.5–6.5N depending on implementation).
- GPT-3 also used some mixture of data sources and possibly a small amount of repeated data, but 300B tokens is the headline number.
- The figure most commonly cited in the literature (including in the Chinchilla paper, PaLM reports, etc.) is ≈ 3.14–3.5 × 10²³ FLOPs.
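A quick sensitivity check shows how the estimate moves with the per-token constant (the 5.5–6.5 range comes from the bullets above; the spread stays within the commonly cited range):

```python
# Sensitivity of the estimate to the per-token FLOP constant.
N = 175e9   # parameters
D = 300e9   # training tokens

for c in (5.5, 6.0, 6.4, 6.5):
    print(f"c = {c}: {c * N * D:.2e} FLOPs")
```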
My estimate of ~3.4 × 10²³ rounds up slightly to account for minor overheads (embeddings, attention, optimizer bookkeeping) and sits within the range cited above.
Final Answer: ~3.4 × 10²³ FLOPs