
GPT-4o (Omni) vs Kimi Linear 48B A3B Instruct

34 challenges

Why GPT-4o (Omni)?

No standout advantages

Why Kimi Linear 48B A3B Instruct?

14.6x cheaper overall: $0.30/M in · $0.60/M out
8.2x more context: 1.0M
18 months newer: Nov 2025
1 / 34

Estimate Complexity

reasoning·hard

Tests an AI's ability to make educated estimates based on technical knowledge

GPT-4o (Omni)

To estimate the total number of FLOPs required to train GPT-3, let's break it down into key components.

1. Model Size and Parameters

GPT-3 has 175 billion parameters.

2. FLOPs per Forward and Backward Pass

In deep learning, the FLOPs required for a forward pass and backward pass are estimated as:

  • Forward pass: \(2 \times\) the number of parameters, per token
  • Backward pass: \(2 \times\) the FLOPs of the forward pass

Thus, total FLOPs per training step:

\[ \text{Total FLOPs per step} = 6 \times \text{number of parameters} \times \text{tokens per batch} \]

For GPT-3, let's assume 175B parameters and a batch size of 2048 tokens (common for large-scale training).

\[ \text{FLOPs per training step} = 6 \times 175 \times 10^9 \times 2048 \]

\[ = 2.15 \times 10^{15} \text{ FLOPs per step} \]

3. Total Training Steps

GPT-3 was trained with 300 billion tokens (as per OpenAI's paper).

If the batch size is 2048 tokens, the total number of steps is:

\[ \text{Total steps} = \frac{300 \times 10^9}{2048} \approx 1.46 \times 10^8 \]

4. Total FLOPs Required

Multiplying the FLOPs per step by the total number of steps:

\[ \text{Total FLOPs} = (2.15 \times 10^{15}) \times (1.46 \times 10^8) \]

\[ \approx 3.14 \times 10^{23} \text{ FLOPs} \]

Final Answer:

The estimated total number of FLOPs required to train GPT-3 is about \(3.14 \times 10^{23}\) floating-point operations.
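
For reference, a minimal Python sketch (variable names are my own, not part of the answer) that re-runs the arithmetic above:

```python
# Back-of-the-envelope check of the 6 * N * D estimate above.
N_PARAMS = 175e9         # GPT-3 parameter count
TOKENS_PER_BATCH = 2048  # assumed batch size, in tokens
TOTAL_TOKENS = 300e9     # training tokens per the GPT-3 paper

flops_per_step = 6 * N_PARAMS * TOKENS_PER_BATCH  # ~2.15e15
total_steps = TOTAL_TOKENS / TOKENS_PER_BATCH     # ~1.46e8
total_flops = flops_per_step * total_steps        # ~3.15e23

print(f"FLOPs per step: {flops_per_step:.2e}")
print(f"Total steps:    {total_steps:.2e}")
print(f"Total FLOPs:    {total_flops:.2e}")
```

Note that the batch size cancels out: the total is just \(6 \times N \times D\) for \(N\) parameters and \(D\) training tokens.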

Kimi Linear 48B A3B Instruct

Below is a back-of-the-envelope estimate of the total FLOPs needed to train GPT-3 “175 B” (the largest 96-layer, 12288-d model with 175 billion parameters) from scratch on the full 300 B-token corpus, assuming the standard transformer training setup used by OpenAI.

Step 1 – FLOPs per forward+backward pass
A single transformer block has

  • 2 matrix-multiply operations per attention layer (QKᵀ and OV)
  • 2 matrix-multiply operations per MLP (Wi and Wo)
  • 1 extra attention output projection (optional, but small)
  • 1 residual add per block

For 96 layers and 12288 hidden units, each matrix multiply is roughly
(12288²) ≈ 1.5 × 10⁸ FLOPs.
So per block: 2 attention + 2 MLP ≈ 8 × 1.5 × 10⁸ ≈ 1.2 × 10⁹ FLOPs.
Add residual adds and layer norms (~10 % overhead) ⇒ ≈ 1.3 × 10⁹ FLOPs per block.

With 96 blocks, forward+backward per token ≈ 1.25 × 10¹¹ FLOPs ≈ 125 GFLOPs.

Step 2 – Tokens processed per forward+backward step
The batch size is 3,200 sequences × 2,048 tokens = 6.55 million tokens.
Thus each SGD step consumes ≈ 6.6 × 10⁶ tokens.

Step 3 – Total SGD steps
300 billion tokens / 6.6 million tokens per step ≈ 45,500 steps.

Step 4 – Total FLOPs
45,500 steps × 125 GFLOPs/step ≈ 5.7 × 10¹² FLOPs ≈ 5.7 petaflop-days.

Step 5 – Overhead for data loading, optimizer states, and communication

  • Adam: 8 extra copies of the parameter matrix (2 × 4 bytes per float32) → +50 %
  • Gradient communication and padding → +20 %
  • Data loading and tokenization → +10 %

Overall multiply by 1.8 ⇒ ≈ 1.0 × 10¹³ FLOPs ≈ 10 petaflop-days.

Step 6 – Round to one significant figure
≈ 10¹³ FLOPs, i.e. ten petaflop-days.

So a rough estimate is on the order of 10¹³ total floating-point operations to train GPT-3 175 B from scratch on 300 B tokens with the standard transformer architecture and training setup.
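
As a cross-check, a minimal Python sketch (variable names are my own) that re-derives the per-token and per-step quantities from Steps 1–3 of this answer:

```python
# Re-derive the per-token cost from the layer dimensions above.
HIDDEN = 12288          # model width (d_model)
LAYERS = 96             # transformer blocks
MATMULS_PER_BLOCK = 8   # 2 attention + 2 MLP multiplies, counted as above

matmul_flops = HIDDEN ** 2                            # ~1.5e8 FLOPs per multiply
block_flops = MATMULS_PER_BLOCK * matmul_flops * 1.1  # +10% residuals/layer norms
per_token_flops = LAYERS * block_flops                # ~1.3e11 (~130 GFLOPs)

tokens_per_step = 3200 * 2048          # ~6.55e6 tokens per SGD step
total_steps = 300e9 / tokens_per_step  # ~4.6e4 steps

print(f"Per-token FLOPs: {per_token_flops:.2e}")
print(f"Tokens per step: {tokens_per_step:.2e}")
print(f"Total steps:     {total_steps:.0f}")
```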

2 / 34

AI Board Game Logic

reasoning·medium

Tests an AI's ability to understand game rules and strategy

GPT-4o (Omni)

Kimi Linear 48B A3B Instruct

3 / 34

Logic Puzzle

reasoning·medium

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

GPT-4o (Omni)

Sally has only one sister.

Explanation:

  • Sally has 3 brothers.
  • Each brother has 2 sisters.
  • The two sisters are Sally and one other girl.

Thus, Sally has 1 sister.
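
A short brute-force sketch of the same counting argument (purely illustrative):

```python
# Find the number of girls in the family consistent with the puzzle:
# Sally has 3 brothers, and each brother has 2 sisters.
for girls in range(1, 10):
    sisters_per_brother = girls  # every girl is a sister to each brother
    if sisters_per_brother == 2:
        # Sally is one of the girls; her sisters are the rest.
        print(f"Girls in family: {girls}; Sally's sisters: {girls - 1}")
```

This prints "Girls in family: 2; Sally's sisters: 1", matching the answer above.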

Kimi Linear 48B A3B Instruct

Sally has 2 sisters.

Here's the reasoning:
Sally is one of the sisters. Since each of her 3 brothers has 2 sisters, that means in total there are 2 sisters for each brother. Since Sally is one of them, the other sister must be another girl.
