See how Z.AI: GLM 4.7 compares on this challenge:

  • Z.AI: GLM 4.7 vs Claude Sonnet 4.6
  • Z.AI: GLM 4.7 vs Gemini 3.1 Pro Preview
  • Z.AI: GLM 4.7 vs Z.ai: GLM 5

4 models have responded to the "Estimate Complexity" challenge. Compare their approaches side-by-side on RIVAL. This response is part of RIVAL's open dataset of 5,600+ AI model responses.


Estimate Complexity

Z.AI: GLM 4.7's response to the 'Estimate Complexity' challenge.

Z.AI: GLM 4.7
December 23, 2025
Text

Prompt

Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.

Z.AI: GLM 4.7
Response


To estimate the total number of FLOPs (Floating Point Operations) required to train GPT-3 (175 billion parameters), we can use the standard scaling laws for Transformer models combined with the specific hyperparameters reported in the GPT-3 paper ("Language Models are Few-Shot Learners").

Here is the step-by-step reasoning:

1. Identify Key Variables

From the GPT-3 paper, we have two primary numbers required for the estimation:

  • Number of Parameters ($N$): $175 \times 10^9$ (175 billion).
  • Training Data Size ($D$): The paper states the model was trained on roughly 300 billion tokens ($300 \times 10^9$).

2. Determine FLOPs per Token

Training a neural network involves a forward pass (computing the output) and a backward pass (computing gradients and updating weights).

For a standard Transformer architecture (which GPT-3 is), the computational cost is dominated by matrix multiplications in the attention and feed-forward layers.

A widely accepted rule of thumb in deep learning (used in scaling laws like those from Kaplan et al. and the Chinchilla paper) for the FLOPs required to process one token is: $$ \text{FLOPs per token} \approx 6 \times N $$

Why the factor of 6?

  • Forward Pass ($\approx 2N$): For every parameter in the model, you perform one multiply and one add operation (2 FLOPs) during the forward pass.
  • Backward Pass ($\approx 4N$): Computing gradients is roughly twice as expensive as the forward pass (due to the chain rule and storing activations), adding another $4N$ FLOPs.
  • Total: $2N + 4N = 6N$ (see the short numeric sketch below).
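
As a quick numeric sanity check of this rule of thumb, here is a minimal Python sketch (assuming only the 175B parameter count stated above); it shows the per-token training cost is on the order of a teraFLOP:

```python
# Per-token training FLOPs under the ~6N rule of thumb.
N = 175e9  # GPT-3 parameter count

forward_flops = 2 * N    # roughly one multiply + one add per parameter
backward_flops = 4 * N   # backward pass costs roughly 2x the forward pass
flops_per_token = forward_flops + backward_flops

print(f"{flops_per_token:.2e} FLOPs per token")  # ~1.05e+12
```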

3. Calculate Total FLOPs

Now we multiply the FLOPs per token by the total number of tokens trained.

$$ \text{Total FLOPs} = 6 \times N \times D $$

Substituting the values:

$$ \text{Total FLOPs} = 6 \times (175 \times 10^9) \times (300 \times 10^9) $$

Let's do the arithmetic:

  1. $175 \times 300 = 52,500$
  2. $52,500 \times 6 = 315,000$
  3. Combine the exponents: $10^9 \times 10^9 = 10^{18}$

$$ \text{Total FLOPs} = 315,000 \times 10^{18} $$

$$ \text{Total FLOPs} = 3.15 \times 10^{23} $$
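
The same arithmetic can be reproduced in a few lines of Python (a minimal sketch; variable names are illustrative, not from the paper):

```python
# Total training compute: C ≈ 6 * N * D
N = 175e9   # parameters
D = 300e9   # training tokens

total_flops = 6 * N * D
print(f"{total_flops:.2e} FLOPs")  # 3.15e+23
```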

4. Final Result and Context

The estimated total compute required to train GPT-3 is approximately $3.15 \times 10^{23}$ FLOPs.

To put this in more common hardware terms, we can convert this to PetaFLOP-days (the number of days a computer capable of $10^{15}$ operations per second would take to do the work).

  • Seconds in a day: $86,400$
  • PetaFLOP-day in FLOPs: $10^{15} \times 86,400 = 8.64 \times 10^{19}$

$$ \text{PetaFLOP-days} = \frac{3.15 \times 10^{23}}{8.64 \times 10^{19}} \approx 3,645 $$
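
A small sketch of the same conversion, taking the $3.15 \times 10^{23}$ figure above as given:

```python
# Convert total training FLOPs into PetaFLOP-days
total_flops = 3.15e23
petaflop_day = 1e15 * 86_400  # FLOPs done by a 1 PFLOP/s machine in one day

print(f"{total_flops / petaflop_day:,.1f} PetaFLOP-days")  # ~3,645.8
```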

Conclusion: Based on the model size of 175 billion parameters and a training dataset of 300 billion tokens, the total estimated compute is $3.15 \times 10^{23}$ FLOPs (or roughly 3,645 PetaFLOP-days).


About Z.AI: GLM 4.7

Capabilities

Conversation, Reasoning, Code Generation, Analysis

Categories

Text, Code

Specifications

Provider: Openrouter
Released: 2025-12-22
Size: LARGE

