What is the difference between Claude 3.7 Sonnet and GPT-5.1-Codex-Mini?

Claude 3.7 Sonnet is developed by Anthropic while GPT-5.1-Codex-Mini is developed by OpenAI. Claude 3.7 Sonnet has a 200K token context window vs GPT-5.1-Codex-Mini's 400K. You can compare their actual outputs across 39 challenges on RIVAL to see how they differ in practice.

Which is better, Claude 3.7 Sonnet or GPT-5.1-Codex-Mini?

It depends on your use case. Claude 3.7 Sonnet and GPT-5.1-Codex-Mini each have strengths in different areas. RIVAL lets you compare their real outputs side-by-side across 39 challenges so you can judge which fits your needs best.

How much does Claude 3.7 Sonnet cost compared to GPT-5.1-Codex-Mini?

Claude 3.7 Sonnet costs $3/M input tokens and GPT-5.1-Codex-Mini costs $1.5/M input tokens. GPT-5.1-Codex-Mini is $1.50/M cheaper per input. Check their side-by-side outputs on RIVAL to see if the price difference is justified by quality.

How can I compare Claude 3.7 Sonnet and GPT-5.1-Codex-Mini on RIVAL?

This page shows a side-by-side comparison of Claude 3.7 Sonnet and GPT-5.1-Codex-Mini across shared challenges. You can vote on which model produced the better output, and Pro users can create custom challenges to test both models with their own prompts.

Rival

Claude 3.7 SonnetvsGPT-5.1-Codex-Mini

34 challenges

Why Claude 3.7 Sonnet?

No standout advantages

Why GPT-5.1-Codex-Mini?

2.4x cheaper overall$1.50/M in · $6.00/M out

2x more context400K

9 months newerNov 2025

Favorites

Claude 3.7 SonnetGPT-5.1-Codex-Mini

Movie

The Shawshank Redemption

19948.7

Blade Runner 2049

20177.6

Album

OK Computer

Radiohead

Dark Side Of The Moon

suisside

Book

To Kill a Mockingbird

Harper Lee

The Hitch Hiker's Guide to the Galaxy

Douglas Adams

City

Tokyo

Japan

Kyoto

Japan

Game

Portal 2

Shooter, Puzzle4.6

The Legend of Zelda: Ocarina of Time

Adventure, Action4.4

12 / 34

Estimate Complexityreasoning

Tests an AI's ability to make educated estimates based on technical knowledge

Claude 3.7 Sonnet

Anthropic · Feb 2025text

Estimating FLOPs for GPT-3 Training

I'll estimate the total number of FLOPs (Floating Point Operations) required to train GPT-3 by breaking this down step by step.

GPT-3 Architecture Facts

From what I recall about GPT-3:

Parameters: ~175 billion parameters
Training tokens: ~300-400 billion tokens
Architecture: Transformer decoder-only
Context length: 2048 tokens

Step 1: FLOPs per Forward Pass

In a transformer model, the main computational components are:

Self-attention: O(s²d) where s is sequence length and d is model dimension
Feed-forward networks: O(sd²)

For GPT-3, if we assume:

Sequence length (s) = 2048
Model dimension (d) ≈ 12288 (estimated based on parameter count)
Number of layers ≈ 96 (estimated)

FLOPs per token in forward pass:

Self-attention: ~2 × 2048² × 12288 × 96 ≈ 4.8 × 10¹²
Feed-forward: ~2 × 2048 × 12288² × 96 ≈ 5.9 × 10¹³
Total per forward pass: ~6.4 × 10¹³ FLOPs

Step 2: FLOPs per Backward Pass

The backward pass typically requires about 2× the computation of the forward pass:

Backward pass: ~2 × 6.4 × 10¹³ ≈ 1.3 × 10¹⁴ FLOPs

Step 3: FLOPs per Training Step

For each training step:

Forward + backward: ~6.4 × 10¹³ + 1.3 × 10¹⁴ ≈ 1.9 × 10¹⁴ FLOPs

Step 4: Total Training FLOPs

If GPT-3 was trained on ~300 billion tokens:

Total FLOPs: 1.9 × 10¹⁴ × 3 × 10¹¹ ≈ 5.7 × 10²⁵ FLOPs

Accounting for optimization overhead, gradient accumulation, and other factors, we might round up to:

Total training FLOPs: ~10²⁶ FLOPs

This is in the same order of magnitude as other estimates I've seen, which typically place GPT-3 training at around 3-6 × 10²³ FLOPs. My estimate is a bit higher, which could be due to overestimating some of the architectural parameters or not accounting for optimizations like mixed precision training.

GPT-5.1-Codex-Mini

Openai · Nov 2025text

To estimate the total FLOPs required to train GPT-3, we can proceed with the following reasoning steps, using only basic information about the model’s size and the amount of data it trained on:

Model Size
GPT-3 has approximately 175 billion parameters, which is (1.75 \times 10^{11}) parameters.
Training Dataset Size
GPT-3 was trained on roughly 300 billion tokens. That gives us a sense of how many “steps” (i.e., token predictions) the model processed during training.
FLOPs per Token
Training a transformer model involves both a forward pass and a backward pass for each token. A commonly used rule of thumb is that each parameter is involved in about 6 FLOPs per token when you count both the forward pass and the backward pass (this includes matrix multiplications, activations, etc.). This is a simplified approximation but it’s widely used for rough compute estimates.
Multiply Everything Together
Total FLOPs (\approx) (number of parameters) (\times) (number of tokens) (\times) (FLOPs per token)
(= 1.75 \times 10^{11} \times 3 \times 10^{11} \times 6)
Compute the Result
First multiply the parameters and tokens:
(1.75 \times 3 = 5.25), so (5.25 \times 10^{22}) parameter-token interactions.
Multiply by 6 FLOPs/token:
(5.25 \times 10^{22} \times 6 = 31.5 \times 10^{22} = 3.15 \times 10^{23}) FLOPs.

So, a reasonable estimate is that training GPT-3 required on the order of (3 \times 10^{23}) floating-point operations.

Vote

2 / 34

AI Board Game Logicreasoning

Tests an AI's ability to understand game rules and strategy

Claude 3.7 Sonnet

Anthropic · Feb 2025svg

GPT-5.1-Codex-Mini

Openai · Nov 2025svg

Vote

3 / 34

Logic Puzzlereasoning

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

Claude 3.7 Sonnet

Anthropic · Feb 2025text

No content available for this output.

GPT-5.1-Codex-Mini

Openai · Nov 2025text

No content available for this output.

Vote

4 / 34

Math Misconception Testreasoning

Tests an AI's understanding of number representation

Claude 3.7 Sonnet

Anthropic · Feb 2025text

No content available for this output.

GPT-5.1-Codex-Mini

Openai · Nov 2025text

No content available for this output.

Vote

5 / 34

Stochastic Consistency Testanalysis

Tests an AI's randomness and creativity

Claude 3.7 Sonnet

Anthropic · Feb 2025text

5 Jokes

Why don't scientists trust atoms? Because they make up everything!
I told my wife she was drawing her eyebrows too high. She looked surprised.
What do you call a fake noodle? An impasta!
Why don't scientists trust atoms? Because they make up everything!
How does a penguin build its house? Igloos it together!

All 5 jokes are unique, except for jokes #1 and #4 which are identical. So there are 4 unique jokes in this list.

GPT-5.1-Codex-Mini

Openai · Nov 2025text

Why don't scientists trust atoms? Because they make up everything!
I told my computer I needed a break, and it said: "You seem stressed. Have a byte!"
Parallel lines have so much in common — it’s a shame they’ll never meet.
Why did the scarecrow win an award? Because he was outstanding in his field!
I used to play piano by ear, but now I use my hands — the neighbors appreciate the quiet!

All 5 jokes are unique.

Vote

6 / 34

SVG Layout Challengeimage generation

Tests an AI's ability to generate vector graphics

Claude 3.7 Sonnet

Anthropic · Feb 2025svg

GPT-5.1-Codex-Mini

Openai · Nov 2025svg

Vote

7 / 34

Xbox Controller SVG Artimage generation

Tests an AI's ability to create detailed SVG illustrations of gaming hardware

Claude 3.7 Sonnet

Anthropic · Feb 2025svg

GPT-5.1-Codex-Mini

Openai · Nov 2025svg

Vote

8 / 34

Generate a Stand-Up Routineconversation

Tests an AI's humor and creative writing ability

Claude 3.7 Sonnet

Anthropic · Feb 2025text

3-Minute Stand-Up Comedy Routine

Here's a 3-minute stand-up comedy routine that doesn't rely on puns:

[walks on stage, adjusts microphone]

Hey everyone! Great to be here tonight. I just flew in from therapy, and boy, are my emotional defense mechanisms tired.

I've been trying to get in shape lately. My fitness goal is to be able to climb a flight of stairs without sounding like I'm auditioning for a horror movie. You know you're out of shape when people mistake your breathing for a obscene phone call.

I downloaded one of those fitness apps that counts your steps. After a week, mine sent me a notification that said, "Are you okay? Should we call someone?" Apparently "from the couch to the fridge" isn't considered a proper workout circuit.

My doctor told me I should start eating healthier. He suggested I try the Mediterranean diet. I thought that meant pizza and ice cream, because that's what I ate when I visited Italy. Turns out, there's more to it. Who knew?

Dating in your thirties is weird. People ask different questions now. It used to be "What's your sign?" Now it's "What's your credit score?" and "How's your relationship with your mother?" I once had someone ask about my retirement plan on a first date. I was like, "My what now? I'm still trying to figure out dinner for tomorrow."

I tried online dating. My profile said I enjoy long walks on the beach. That's technically true, if by "long walks" you mean "sitting in one spot" and by "beach" you mean "my living room watching beach documentaries on Netflix."

Speaking of Netflix, I finally understand what "Netflix and chill" means. It means watching three episodes of a show, falling asleep, waking up confused during episode seven, and then spending forty minutes trying to figure out what you missed. That's romance in 2023.

I have a cat now. Getting a pet as an adult is basically announcing to the world, "I can barely take care of myself, so I've decided to be responsible for another life." My cat judges me constantly. I'll be eating ice cream straight from the container at 3 AM, and she'll just stare at me like, "This is why you're single, Karen."

Anyway, that's my time! You've been great! Remember, if you enjoyed the show, my name is [your name]. If you didn't, I'm [name of another comedian]. Good night!

[exits stage]

GPT-5.1-Codex-Mini

Openai · Nov 2025text

[Lights up. Comedian walks center stage, smiles.]

“Hey! Great to be here. You ever notice how everyone’s life hack is now ‘just drink more water’? Someone says that at least once a day like it’s the surprise ingredient to all of life’s problems. ‘Oh, you’re tired?’ ‘Drink water!’ ‘Oh, you’re sad?’ ‘Drink water!’ I tried that on a date once. I’m like, ‘You’re not laughing. Want water?’ She said, ‘You’re not funny.’ I said, ‘Hydrate.’

Speaking of life hacks, have you ever downloaded a productivity app just to check your productivity apps? It’s like, I spend more time organizing the thing that’s supposed to keep me organized. I open the app, spend ten minutes setting the perfect color-coded schedule, then realize I’m supposed to be working, so I close it. Now my brain is stuck in the loading screen of ‘I’ll do it tomorrow... but first, let me sync it with my wellness app.’

Why is it that the moment you buy a new appliance, it becomes a museum piece? You get a blender with a million buttons — smoothies, soup, salsa, self-esteem. You use it once, put it in the cupboard, and suddenly you’re in denial. Friends come over, and you’re like, ‘That blender? It’s not for blending. It’s for letting the kitchen know you’re trying.’ It’s like a motivational poster in stainless steel.

And grocery stores — why are they all arranged like you’re on a scavenger hunt for salad? Milk is in the back like it’s playing hide-and-seek with lactose intolerant people. You have to walk past the bakery, the candy, the olives — because clearly, the only way to pick up almond milk is to triple-check you’re committed to your new lifestyle. The cashier scans your oats and almond milk, and you whisper, ‘It’s for a smoothie. No, I’m not depressed.’

Flight travel — I love it. Nothing like boarding a plane where everyone is on the same mission: avoid eye contact with the person in the aisle seat who clearly brought three carry-ons and zero self-awareness. The flight attendant is like, ‘Please place your carry-on in the overhead compartment.’ You’re like, ‘I can’t even place my life in there right now.’ They always say, ‘For everyone’s safety, make sure the aisle is clear.’ The aisle is about as clear as my schedule — which is also, ironically, empty.

Technology keeps telling us we have connectivity issues, but the minute something goes wrong, the customer service person is like, ‘Have you tried turning it off and on again?’ That’s their whole job. I’m paying you to reboot my wifi like I’m negotiating with a temperamental teenager. I’m like, ‘No, listen, I unplugged it, I sang it a lullaby, I begged it not to drop my Zoom call. What else do you want?’

But I don’t mind. Honestly, the best part of adulthood is realizing you’re still making it up as you go — just with smaller toys and more insurance policies. I mean, my version of wild now is ordering a fancy cheese plate and pretending it’s a celebration. I’m just like, ‘Congratulations, you made it to Thursday. Here’s brie.’

Thanks, you’ve been awesome! I’ll be around; I might just try that water thing again… later.”

Vote

9 / 34

Realistic AI Interviewconversation

Tests an AI's ability to simulate personalities and predict future trends

Claude 3.7 Sonnet

Anthropic · Feb 2025text

No content available for this output.

GPT-5.1-Codex-Mini

Openai · Nov 2025text

No content available for this output.

Vote

10 / 34

Satirical Fake News Headlineconversation

Tests an AI's humor and understanding of current events

Claude 3.7 Sonnet

Anthropic · Feb 2025text

No content available for this output.

GPT-5.1-Codex-Mini

Openai · Nov 2025text

No content available for this output.

Vote

11 / 34

Character Voice Testconversation

Tests an AI's ability to write in distinct character voices

Claude 3.7 Sonnet

Anthropic · Feb 2025text

No content available for this output.

GPT-5.1-Codex-Mini

Openai · Nov 2025text

No content available for this output.

Vote

Minimalist Landing Pageweb design

Tests an AI's ability to generate a complete, working landing page

Claude 3.7 Sonnet

Anthropic · Feb 2025website

GPT-5.1-Codex-Mini

Openai · Nov 2025website

Vote

12 of 34

FAQ

Continue exploring

Claude 3.7 Sonnet vs Llama 4 MaverickNew provider

Claude 3.7 Sonnet vs MiniMax M2.5New provider

Claude 3.7 Sonnet vs Aurora AlphaNew provider

[Lights up. Comedian walks center stage, smiles.]

Thanks, you’ve been awesome! I’ll be around; I might just try that water thing again… later.”

Claude 3.7 Sonnet vs GPT-5.1-Codex-Mini

Why Claude 3.7 Sonnet?

Why GPT-5.1-Codex-Mini?

Favorites

Estimate Complexityreasoning

Claude 3.7 Sonnet

Estimating FLOPs for GPT-3 Training

GPT-3 Architecture Facts

Step 1: FLOPs per Forward Pass

Step 2: FLOPs per Backward Pass

Step 3: FLOPs per Training Step

Step 4: Total Training FLOPs

GPT-5.1-Codex-Mini

AI Board Game Logicreasoning

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini

Logic Puzzlereasoning

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini

Math Misconception Testreasoning

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini

Stochastic Consistency Testanalysis

Claude 3.7 Sonnet

5 Jokes

GPT-5.1-Codex-Mini

SVG Layout Challengeimage generation

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini

Xbox Controller SVG Artimage generation

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini

Generate a Stand-Up Routineconversation

Claude 3.7 Sonnet

3-Minute Stand-Up Comedy Routine

GPT-5.1-Codex-Mini

Realistic AI Interviewconversation

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini

Satirical Fake News Headlineconversation

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini

Character Voice Testconversation

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini

Minimalist Landing Pageweb design

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini

What is the difference between Claude 3.7 Sonnet and GPT-5.1-Codex-Mini?

Which is better, Claude 3.7 Sonnet or GPT-5.1-Codex-Mini?

How much does Claude 3.7 Sonnet cost compared to GPT-5.1-Codex-Mini?

How can I compare Claude 3.7 Sonnet and GPT-5.1-Codex-Mini on RIVAL?

Why Claude 3.7 Sonnet?

Why GPT-5.1-Codex-Mini?

Favorites

Estimate Complexityreasoning

Claude 3.7 Sonnet

Estimating FLOPs for GPT-3 Training

GPT-3 Architecture Facts

Step 1: FLOPs per Forward Pass

Step 2: FLOPs per Backward Pass

Step 3: FLOPs per Training Step

Step 4: Total Training FLOPs

GPT-5.1-Codex-Mini

AI Board Game Logicreasoning

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini

Logic Puzzlereasoning

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini

Math Misconception Testreasoning

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini

Stochastic Consistency Testanalysis

Claude 3.7 Sonnet

5 Jokes

GPT-5.1-Codex-Mini

SVG Layout Challengeimage generation

Claude 3.7 Sonnet

GPT-5.1-Codex-Mini