
Gemini Pro 1.0 vs GPT-5

27 challenges

Why Gemini Pro 1.0?

5.9x cheaper overall: $0.50/M in · $1.50/M out

Why GPT-5?

12.2x more context: 400K
20 months newer: Aug 2025
1 / 27

Estimate Complexity

reasoning·hard

Tests an AI's ability to make educated estimates based on technical knowledge

Gemini Pro 1.0

Estimating GPT-3 Training FLOPs: A Step-by-Step Approach

Without access to online resources, we can estimate the FLOPs (Floating-point Operations) needed for GPT-3 training by considering the following factors:

1. Model Size:

  • GPT-3 has 175 billion parameters.
  • Each parameter update involves two multiplications and one addition for each floating-point number.
  • Therefore, each parameter update requires 3 FLOPs.
  • Total FLOPs per update: 175 billion parameters * 3 FLOPs/parameter = 525 billion FLOPs.

2. Training Data Size:

  • GPT-3 was trained on a massive dataset of text and code.
  • Let's assume the dataset size is 500 billion words.
  • Each word is represented by an embedding vector of size 1,024.
  • Therefore, the total number of floating-point numbers in the dataset is 500 billion words * 1,024 = 512 trillion.

3. Number of Training Steps:

  • The exact number of training steps is unknown, but it likely involves millions of iterations over the entire dataset.
  • Let's assume there are 10 million training steps.

4. Optimization Algorithm:

  • GPT-3 likely uses Adam optimizer, which involves additional computations for updating parameters.
  • The exact FLOPs associated with the optimizer are difficult to estimate without knowing the specific hyperparameters used.
  • For simplicity, we can assume an additional 10 FLOPs per parameter update.

5. Hardware Efficiency:

  • Modern hardware like GPUs and TPUs can perform several FLOPs per clock cycle.
  • Let's assume an average efficiency of 5 FLOPs per clock cycle.

Putting it all together:

  • Total FLOPs per update: 525 billion (parameter updates) + 10 billion (optimizer) = 535 billion FLOPs
  • Total FLOPs for training: 535 billion FLOPs/update * 10 million updates = 5.35 * 10^18 FLOPs
  • Adjusted for hardware efficiency: 5.35 * 10^18 FLOPs / 5 FLOPs/cycle = 1.07 * 10^18 clock cycles

Therefore, the estimated total compute required to train GPT-3, without searching online, is approximately 5.35 * 10^18 FLOPs, or about 1.07 * 10^18 clock cycles at the assumed hardware efficiency.

Important Caveats:

  • This is a rough estimate and may not be accurate due to several unknown factors, including the exact training data size, number of training steps, optimizer settings, and hardware efficiency.
  • The actual FLOPs could be significantly higher or lower depending on the specific training details.

Additionally, it's important to remember that this is just an estimation of the computational cost. Training a model like GPT-3 also requires significant energy consumption and environmental impact.

GPT-5

Here’s a back‑of‑the‑envelope estimate using standard LM training-compute heuristics.

  1. Assumptions
  • Model size (N): GPT-3 has about 175 billion parameters (N ≈ 1.75 × 10^11).
  • Training tokens (T): GPT-3 was trained on roughly 300 billion tokens (T ≈ 3.0 × 10^11).
  • FLOPs per token per parameter: A common rule of thumb for transformer LM training is FLOPs ≈ 6 × N × T. The factor ~6 comes from counting forward and backward passes for the parameter-heavy matmuls (roughly 2N for forward + 4N for backward). It ignores small terms from layernorms, softmax, embeddings, and optimizer updates (which we’ll account for with a small overhead factor later).
  2. Core calculation
  • 6 × N × T = 6 × (1.75 × 10^11) × (3.0 × 10^11)
  • First multiply N × T: 1.75 × 3.0 = 5.25; powers: 10^11 × 10^11 = 10^22, so N × T = 5.25 × 10^22.
  • Now multiply by 6: 6 × 5.25 × 10^22 = 31.5 × 10^22 = 3.15 × 10^23 FLOPs
  3. Overheads and uncertainty
  • Non-matmul ops (attention softmax, masking, residuals, layernorms), embeddings, and optimizer update steps add some overhead. A 10–20% uplift is a reasonable allowance.
  • Applying +10–20%: 3.15 × 10^23 × (1.1 to 1.2) ≈ (3.5 ± 0.3) × 10^23 FLOPs.

Answer

  • Point estimate: ~3.15 × 10^23 FLOPs
  • With a modest overhead allowance: roughly (3.3–3.8) × 10^23 FLOPs

This is a coarse estimate; the exact number depends on details like sequence length, activation checkpointing, implementation efficiency, and the precise token count.
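For reference, a minimal Python sketch of the same 6 × N × T heuristic under the assumptions above (N = 175 billion parameters, T = 300 billion tokens, a 10–20% overhead uplift):

```python
# Standard 6 * N * T training-compute heuristic, using the assumptions above.
N = 175e9   # parameters
T = 300e9   # training tokens (assumed, per the response)

core_flops = 6 * N * T                           # ~3.15e23 FLOPs
low, high = core_flops * 1.1, core_flops * 1.2   # +10-20% overhead allowance

print(f"Core estimate: {core_flops:.2e} FLOPs")
print(f"With overhead: {low:.2e} to {high:.2e} FLOPs")
```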

2 / 27

AI Board Game Logic

reasoning·medium

Tests an AI's ability to understand game rules and strategy

Gemini Pro 1.0

GPT-5

3 / 27

Logic Puzzle

reasoning·medium

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

Gemini Pro 1.0

Sally has 1 sister.

Explanation:

  • Sally is a girl.
  • She has 3 brothers.
  • All the siblings share the same sisters. Since Sally is one of the sisters, and each brother has 2 sisters in total, there must be one other girl besides Sally in the family.
  • Therefore, Sally has 1 sister.

GPT-5

1

Reason: If each of the 3 brothers has 2 sisters, there are exactly 2 girls in the family. One is Sally, so she has 1 sister.
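A minimal Python sketch that brute-forces the same count, assuming every girl in the family is a sister to every brother:

```python
# Brute-force the puzzle: how many girls make "each brother has 2 sisters" true?
for girls in range(1, 10):
    sisters_per_brother = girls        # every girl is a sister to every brother
    if sisters_per_brother == 2:
        print(f"girls in family: {girls}; Sally's sisters: {girls - 1}")
# prints: girls in family: 2; Sally's sisters: 1
```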
