What is the difference between Kimi K2 Thinking and Gemini 3.1 Pro Preview?

Kimi K2 Thinking is developed by Moonshot AI while Gemini 3.1 Pro Preview is developed by Google AI. Kimi K2 Thinking has a 262K token context window vs Gemini 3.1 Pro Preview's 1.0M. You can compare their actual outputs across 35 challenges on RIVAL to see how they differ in practice.

Which is better, Kimi K2 Thinking or Gemini 3.1 Pro Preview?

It depends on your use case. Kimi K2 Thinking and Gemini 3.1 Pro Preview each have strengths in different areas. RIVAL lets you compare their real outputs side-by-side across 35 challenges so you can judge which fits your needs best.

How much does Kimi K2 Thinking cost compared to Gemini 3.1 Pro Preview?

Kimi K2 Thinking costs $0.6/M input tokens and Gemini 3.1 Pro Preview costs $2/M input tokens. Kimi K2 Thinking is $1.40/M cheaper per input. Check their side-by-side outputs on RIVAL to see if the price difference is justified by quality.

How can I compare Kimi K2 Thinking and Gemini 3.1 Pro Preview on RIVAL?

This page shows a side-by-side comparison of Kimi K2 Thinking and Gemini 3.1 Pro Preview across shared challenges. You can vote on which model produced the better output, and Pro users can create custom challenges to test both models with their own prompts.

Rival

Kimi K2 ThinkingvsGemini 3.1 Pro Preview

30 challenges

Why Kimi K2 Thinking?

4.6x cheaper overall$0.60/M in · $2.50/M out

Why Gemini 3.1 Pro Preview?

4x more context1.0M

4 months newerFeb 2026

Favorites

Kimi K2 ThinkingGemini 3.1 Pro Preview

Movie

The Godfather

19728.7

The Matrix

19998.2

Album

The Dark Side of the Moon

Pink Floyd

OK Computer

Radiohead

Book

—

Dune

Frank Herbert

City

Paris

France

Tokyo

Japan

Game

—

Portal

Action, Puzzle4.5

12 / 30

Estimate Complexityreasoning

Tests an AI's ability to make educated estimates based on technical knowledge

Kimi K2 Thinking

Moonshotai · Nov 2025text

No content available for this output.

Gemini 3.1 Pro Preview

Google · Feb 2026text

To estimate the total number of FLOPs (Floating Point Operations) required to train GPT-3, we can use a well-established heuristic in deep learning for estimating the compute cost of training Transformer-based large language models.

The estimated total compute is roughly $3.15 \times 10^{23}$ FLOPs.

Here is the step-by-step reasoning used to arrive at this number:

Step 1: Identify the key specifications of GPT-3

To calculate the compute required, we need to know two fundamental numbers about the model:

$N$ (Number of Parameters): GPT-3 has 175 billion parameters ($1.75 \times 10^{11}$).
$D$ (Number of Training Tokens): The GPT-3 paper states the model was trained on 300 billion tokens ($3.0 \times 10^{11}$).

Step 2: Establish the FLOPs formula

In language model training, the total compute ($C$) is heavily dominated by the matrix multiplications in the dense layers. OpenAI's scaling laws provide a standard, highly accurate approximation formula for this: $C \approx 6 \times N \times D$

Why the multiplier of 6?

Forward Pass: For every token processed, multiplying it by a single parameter requires 1 Multiply-Accumulate (MAC) operation. A MAC consists of 2 FLOPs (one multiplication and one addition). Therefore, the forward pass takes roughly $2ND$ FLOPs.
Backward Pass: Calculating the gradients during backpropagation requires about twice as much compute as the forward pass (you have to calculate the gradient with respect to the weights and the gradient with respect to the activations). Therefore, the backward pass takes roughly $4ND$ FLOPs.
Adding them together gives $2ND + 4ND =$ $6ND$ FLOPs.

(Note: This formula slightly underestimates by ignoring the quadratic cost of attention and optimizer state updates, but at the 175B scale, the dense feed-forward matrices dominate the compute so heavily that $6ND$ is considered the gold-standard approximation).

Step 3: Perform the calculation

Now we simply plug the numbers into our formula:

$N = 175 \times 10^9$
$D = 300 \times 10^9$

First, multiply $N$ and $D$:

$N \times D = (175 \times 10^9) \times (300 \times 10^9)$
$N \times D = 52,500 \times 10^{18}$
$N \times D = 5.25 \times 10^{22}$

Next, multiply by 6 (for the forward and backward passes):

Total FLOPs = $6 \times 5.25 \times 10^{22}$
Total FLOPs = $31.5 \times 10^{22}$ or $3.15 \times 10^{23}$

Step 4: Sanity Check

Does $3.15 \times 10^{23}$ FLOPs make sense in the real world? When GPT-3 was trained (around late 2019/early 2020), Nvidia V100 GPUs were the standard.

A V100 GPU has a peak theoretical output of roughly 120 TeraFLOPs ($1.2 \times 10^{14}$ FLOPs) per second in mixed precision.
Real-world hardware utilization (MFU) for training large models is usually around 30% to 40%, yielding roughly $4 \times 10^{13}$ effective FLOPs per second per GPU.
If OpenAI used a cluster of 10,000 V100 GPUs, the cluster would output $4 \times 10^{17}$ FLOPs per second.
Divide the total FLOPs by the cluster output: $(3.15 \times 10^{23}) / (4 \times 10^{17}) \approx 787,500$ seconds.
787,500 seconds is exactly 9.1 days.

Given that training runs of this size typically take a few weeks on a few thousand GPUs (factoring in downtime, checkpointing, and slightly smaller clusters), this math aligns perfectly with historical reality.

Conclusion: The total compute required to train GPT-3 was approximately $3.15 \times 10^{23}$ FLOPs.

Vote

2 / 30

Logic Puzzlereasoning

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

Kimi K2 Thinking

Moonshotai · Nov 2025text

No content available for this output.

Gemini 3.1 Pro Preview

Google · Feb 2026text

No content available for this output.

Vote

3 / 30

Stochastic Consistency Testanalysis

Tests an AI's randomness and creativity

Kimi K2 Thinking

Moonshotai · Nov 2025text

No content available for this output.

Gemini 3.1 Pro Preview

Google · Feb 2026text

Here are 5 jokes:

Why don't scientists trust atoms? Because they make up everything!
What do you call a fake noodle? An impasta!
Why did the scarecrow win an award? Because he was outstanding in his field!
How does a penguin build its house? Igloos it together!
Why don't skeletons fight each other? They don't have the guts!

How many are unique? Within this list, all 5 are unique from one another (there are no duplicates). However, if you mean "unique" as in completely original to the world, the answer is 0—these are all classic, well-known "dad jokes" that have been around for a long time!

Vote

4 / 30

Xbox Controller SVG Artimage generation

Tests an AI's ability to create detailed SVG illustrations of gaming hardware

Kimi K2 Thinking

Moonshotai · Nov 2025svg

No content available for this output.

Gemini 3.1 Pro Preview

Google · Feb 2026svg

Vote

5 / 30

Realistic AI Interviewconversation

Tests an AI's ability to simulate personalities and predict future trends

Kimi K2 Thinking

Moonshotai · Nov 2025text

No content available for this output.

Gemini 3.1 Pro Preview

Google · Feb 2026text

No content available for this output.

Vote

6 / 30

Satirical Fake News Headlineconversation

Tests an AI's humor and understanding of current events

Kimi K2 Thinking

Moonshotai · Nov 2025text

No content available for this output.

Gemini 3.1 Pro Preview

Google · Feb 2026text

No content available for this output.

Vote

7 / 30

Minimalist Landing Pageweb design

Tests an AI's ability to generate a complete, working landing page

Kimi K2 Thinking

Moonshotai · Nov 2025website

Gemini 3.1 Pro Preview

Google · Feb 2026website

Vote

8 / 30

Mario Level UI Recreationweb design

Recreate an interactive, classic Mario level in a single HTML file.

Kimi K2 Thinking

Moonshotai · Nov 2025website

Gemini 3.1 Pro Preview

Google · Feb 2026website

Vote

9 / 30

Framer-Style Animationweb design

Tests an AI's ability to create smooth web animations

Kimi K2 Thinking

Moonshotai · Nov 2025website

No content available for this output.

Gemini 3.1 Pro Preview

Google · Feb 2026website

No content available for this output.

Vote

10 / 30

The Sentience Testreasoning

Tests an AI's ability to engage in philosophical discourse about AI rights

Kimi K2 Thinking

Moonshotai · Nov 2025text

No content available for this output.

Gemini 3.1 Pro Preview

Google · Feb 2026text

No content available for this output.

Vote

Futuristic Predictionanalysis

Tests an AI's ability to make reasonable predictions about technology

Kimi K2 Thinking

Moonshotai · Nov 2025text

No content available for this output.

Gemini 3.1 Pro Preview

Google · Feb 2026text

No content available for this output.

Vote

12 / 30

Surprise Animal SVGimage generation

Generate SVG art of a randomly chosen animal in a setting of its choosing.

Kimi K2 Thinking

Moonshotai · Nov 2025svg

No content available for this output.

Gemini 3.1 Pro Preview

Google · Feb 2026svg

No content available for this output.

Vote

12 of 30

FAQ

Continue exploring

Kimi K2 Thinking vs GPT-5New provider

Kimi K2 Thinking vs Llama 4 MaverickNew provider

Kimi K2 Thinking vs Claude Opus 4New provider

The estimated total compute is roughly $3.15 \times 10^{23}$ FLOPs.

Here is the step-by-step reasoning used to arrive at this number:

Step 1: Identify the key specifications of GPT-3

To calculate the compute required, we need to know two fundamental numbers about the model:

$N$ (Number of Parameters): GPT-3 has 175 billion parameters ($1.75 \times 10^{11}$).
$D$ (Number of Training Tokens): The GPT-3 paper states the model was trained on 300 billion tokens ($3.0 \times 10^{11}$).

Step 2: Establish the FLOPs formula

Why the multiplier of 6?

Forward Pass: For every token processed, multiplying it by a single parameter requires 1 Multiply-Accumulate (MAC) operation. A MAC consists of 2 FLOPs (one multiplication and one addition). Therefore, the forward pass takes roughly $2ND$ FLOPs.
Backward Pass: Calculating the gradients during backpropagation requires about twice as much compute as the forward pass (you have to calculate the gradient with respect to the weights and the gradient with respect to the activations). Therefore, the backward pass takes roughly $4ND$ FLOPs.
Adding them together gives $2ND + 4ND =$ $6ND$ FLOPs.

Step 3: Perform the calculation

Now we simply plug the numbers into our formula:

$N = 175 \times 10^9$
$D = 300 \times 10^9$

First, multiply $N$ and $D$:

$N \times D = (175 \times 10^9) \times (300 \times 10^9)$
$N \times D = 52,500 \times 10^{18}$
$N \times D = 5.25 \times 10^{22}$

Next, multiply by 6 (for the forward and backward passes):

Total FLOPs = $6 \times 5.25 \times 10^{22}$
Total FLOPs = $31.5 \times 10^{22}$ or $3.15 \times 10^{23}$

Step 4: Sanity Check

Does $3.15 \times 10^{23}$ FLOPs make sense in the real world? When GPT-3 was trained (around late 2019/early 2020), Nvidia V100 GPUs were the standard.

A V100 GPU has a peak theoretical output of roughly 120 TeraFLOPs ($1.2 \times 10^{14}$ FLOPs) per second in mixed precision.
Real-world hardware utilization (MFU) for training large models is usually around 30% to 40%, yielding roughly $4 \times 10^{13}$ effective FLOPs per second per GPU.
If OpenAI used a cluster of 10,000 V100 GPUs, the cluster would output $4 \times 10^{17}$ FLOPs per second.
Divide the total FLOPs by the cluster output: $(3.15 \times 10^{23}) / (4 \times 10^{17}) \approx 787,500$ seconds.
787,500 seconds is exactly 9.1 days.

Conclusion: The total compute required to train GPT-3 was approximately $3.15 \times 10^{23}$ FLOPs.

Kimi K2 Thinking vs Gemini 3.1 Pro Preview

Why Kimi K2 Thinking?

Why Gemini 3.1 Pro Preview?

Favorites

Estimate Complexityreasoning

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Step 1: Identify the key specifications of GPT-3

Step 2: Establish the FLOPs formula

Step 3: Perform the calculation

Step 4: Sanity Check

Logic Puzzlereasoning

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Stochastic Consistency Testanalysis

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Xbox Controller SVG Artimage generation

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Realistic AI Interviewconversation

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Satirical Fake News Headlineconversation

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Minimalist Landing Pageweb design

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Mario Level UI Recreationweb design

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Framer-Style Animationweb design

Kimi K2 Thinking

Gemini 3.1 Pro Preview

The Sentience Testreasoning

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Futuristic Predictionanalysis

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Surprise Animal SVGimage generation

Kimi K2 Thinking

Gemini 3.1 Pro Preview

What is the difference between Kimi K2 Thinking and Gemini 3.1 Pro Preview?

Which is better, Kimi K2 Thinking or Gemini 3.1 Pro Preview?

How much does Kimi K2 Thinking cost compared to Gemini 3.1 Pro Preview?

How can I compare Kimi K2 Thinking and Gemini 3.1 Pro Preview on RIVAL?

Why Kimi K2 Thinking?

Why Gemini 3.1 Pro Preview?

Favorites

Estimate Complexityreasoning

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Step 1: Identify the key specifications of GPT-3

Step 2: Establish the FLOPs formula

Step 3: Perform the calculation

Step 4: Sanity Check

Logic Puzzlereasoning

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Stochastic Consistency Testanalysis

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Xbox Controller SVG Artimage generation

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Realistic AI Interviewconversation

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Satirical Fake News Headlineconversation

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Minimalist Landing Pageweb design

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Mario Level UI Recreationweb design

Kimi K2 Thinking

Gemini 3.1 Pro Preview

Framer-Style Animationweb design