Rival
Models
Compare
Best For
Arena
Sign Up
Sign Up

Compare AI vibes, not scores. Side-by-side outputs across the world's best models.

@rival_tips

Explore

  • Compare Models
  • All Models
  • Leaderboard
  • Challenges

Discover

  • AI Creators
  • AI Tools
  • The Graveyard

Developers

  • Developer Hub
  • MCP Server
  • .llmignore
  • Badges
  • RIVAL Datasets

Connect

  • Methodology
  • Sponsor
  • Partnerships
  • Privacy Policy
  • Terms
  • RSS Feed
© 2026 Rival


4 models have responded to the "Estimate Complexity" challenge. Compare their approaches side-by-side on RIVAL. This response is part of RIVAL's open dataset of 5,600+ AI model responses.


Estimate Complexity

GLM 4.5's response to the "Estimate Complexity" challenge.

Z.AI: GLM 4.5
July 28, 2025
Text

Prompt

Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.

Z.AI: GLM 4.5
Response


To estimate the total number of FLOPs (floating-point operations) required to train GPT-3 without searching online, we rely on known architectural details and standard computational models for transformer training. Here's the step-by-step reasoning:

Step 1: Key Parameters of GPT-3

  • Number of parameters (P): GPT-3 has 175 billion parameters (175 × 10⁹).
  • Training dataset size (D): The model was trained on approximately 300 billion tokens (300 × 10⁹ tokens). This is a standard figure associated with GPT-3.

Step 2: FLOPs per Token Calculation

Training a transformer model involves two main operations per token:

  1. Forward pass: Computes the model's output.
  2. Backward pass: Computes gradients for backpropagation.

For transformer architectures, the combined cost of the forward and backward passes is commonly approximated as 6 × P FLOPs per token. The breakdown is:

  • Forward pass: ~2 × P FLOPs per token. Matrix multiplications dominate, and each parameter takes part in one multiply-add (2 FLOPs).
  • Backward pass: ~2× the forward cost (~4 × P FLOPs per token), because gradients are computed with respect to both activations and weights.
  • Total per token: 2P (forward) + 4P (backward) = 6P FLOPs per token.
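
As a minimal sketch of the per-token rule above, the 2P + 4P split can be written as a small helper. The function name `training_flops_per_token` and the constant `GPT3_PARAMS` are illustrative names, not anything from the original response, and the rule itself is the usual approximation rather than an exact operation count.

```python
def training_flops_per_token(num_params: int) -> int:
    """Approximate training FLOPs per token using the 6P rule of thumb."""
    forward = 2 * num_params   # one multiply-add (2 FLOPs) per parameter
    backward = 2 * forward     # backward pass costs roughly twice the forward pass
    return forward + backward  # 2P + 4P = 6P


GPT3_PARAMS = 175 * 10**9  # 175 billion parameters, as stated in Step 1

flops_per_token = training_flops_per_token(GPT3_PARAMS)
print(f"{flops_per_token:.3e} FLOPs per token")  # 1.050e+12 FLOPs per token
```

For GPT-3 this comes to roughly 1.05 × 10¹² FLOPs for every training token.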

Step 3: Total FLOPs for the Entire Dataset

Total FLOPs scale linearly with the number of training tokens:

  • Total FLOPs = FLOPs per token × Number of tokens.
  • Substituting the values:
    Total FLOPs = (6 × P) × D = 6 × (175 × 10⁹) × (300 × 10⁹).

Step 4: Calculation

  • Compute the product:
    6 × 175 × 300 × 10⁹ × 10⁹ = 6 × 52,500 × 10¹⁸.
  • Simplify:
    6 × 52,500 = 315,000.
    Total FLOPs = 315,000 × 10¹⁸ = 3.15 × 10²³ FLOPs.
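
The arithmetic in Steps 3 and 4 can be checked with exact integer math. This is only a sanity check of the multiplication, and the variable names are illustrative.

```python
PARAMS = 175 * 10**9   # P: parameter count from Step 1
TOKENS = 300 * 10**9   # D: training tokens from Step 1

# Total training compute under the 6 * P * D approximation.
total_flops = 6 * PARAMS * TOKENS

# Python integers are arbitrary precision, so the product is exact.
print(total_flops == 315_000 * 10**18)   # True
print(f"{total_flops:.2e} FLOPs")        # 3.15e+23 FLOPs
```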

Step 5: Validation with Known Compute

  • GPT-3 training reportedly used ~3,640 petaflop-days of compute.
  • Convert petaflop-days to FLOPs:
    1 petaflop-day = 10¹⁵ FLOP/s sustained for 86,400 s = 8.64 × 10¹⁹ FLOPs.
    3,640 petaflop-days = 3,640 × 8.64 × 10¹⁹ ≈ 3.14 × 10²³ FLOPs.
  • This agrees with our estimate of 3.15 × 10²³ FLOPs to within about 0.2%, which supports the calculation.
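
The unit conversion in this step can be sketched as follows. The helper name `petaflop_days_to_flops` is hypothetical, and the 3,640 figure is taken from the text above rather than verified independently.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
PETAFLOP_PER_SECOND = 10**15     # 1 petaflop/s = 1e15 FLOP/s


def petaflop_days_to_flops(pf_days: int) -> int:
    """Total FLOPs performed by running at 1 petaflop/s for `pf_days` days."""
    return pf_days * PETAFLOP_PER_SECOND * SECONDS_PER_DAY


reported = petaflop_days_to_flops(3640)      # figure quoted in Step 5
estimate = 6 * (175 * 10**9) * (300 * 10**9) # 6 * P * D from Step 4

relative_gap = (estimate - reported) / reported
print(f"reported: {reported:.3e} FLOPs")     # reported: 3.145e+23 FLOPs
print(f"gap: {relative_gap:.2%}")            # gap: 0.16%
```

The two numbers differ by well under one percent, so rounding in either figure could account for the gap.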

Step 6: Batch Size and Steps Consideration

  • Batch size: GPT-3 used a batch size of about 3.2 million tokens per step (roughly 1,560 sequences of 2,048 tokens each).
  • Number of training steps:
    Steps = Total tokens / Batch size = (300 × 10⁹) / (3.2 × 10⁶) = 93,750 steps.
  • FLOPs per step:
    Per step FLOPs = 6 × P × Batch size = 6 × 175 × 10⁹ × 3.2 × 10⁶ = 3.36 × 10¹⁸ FLOPs/step.
  • Total FLOPs:
    93,750 steps × 3.36 × 10¹⁸ FLOPs/step ≈ 3.15 × 10²³ FLOPs.
    This consistency reinforces the estimate.
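
The step-based route to the same total can be sketched like this. It is an algebraic rearrangement of 6 × P × D, so the agreement is expected rather than an independent check, and the variable names are illustrative.

```python
PARAMS = 175 * 10**9        # P
TOKENS = 300 * 10**9        # D
BATCH_TOKENS = 3_200_000    # tokens processed per optimizer step

# The token budget divides evenly into batches for these round numbers.
assert TOKENS % BATCH_TOKENS == 0
steps = TOKENS // BATCH_TOKENS

flops_per_step = 6 * PARAMS * BATCH_TOKENS
total_flops = steps * flops_per_step

print(steps)                             # 93750
print(f"{flops_per_step:.2e} per step")  # 3.36e+18 per step
print(f"{total_flops:.2e} total")        # 3.15e+23 total
```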

Step 7: Negligible Factors

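  • Parameter updates: Optimizer steps (e.g., Adam) add O(P) FLOPs per step. Counting one FLOP per parameter per step over 93,750 steps:
    Update FLOPs ≈ 93,750 × 175 × 10⁹ ≈ 1.64 × 10¹⁶ FLOPs (negligible vs. 3.15 × 10²³, even if Adam costs around ten FLOPs per parameter).
  • Embedding/softmax layers: Included in the 6P per-token estimate.
  • Attention mechanisms: The 6P heuristic covers the attention projection weights. The sequence-length-dependent attention-score computation is not included, but it is a small correction at GPT-3's 2,048-token context.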
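
To show how small the optimizer term is, the sketch below compares update FLOPs against the main estimate. The function name and the `flops_per_param_update` parameter are assumptions for illustration: a value of 1 matches the O(P) count used in the text, and 10 is a rough, assumed allowance for Adam's extra arithmetic.

```python
PARAMS = 175 * 10**9
STEPS = 93_750
TRAINING_FLOPS = 6 * PARAMS * (300 * 10**9)  # 3.15e23 from the main estimate


def optimizer_flops(flops_per_param_update: int) -> int:
    """Total optimizer FLOPs, assuming a fixed cost per parameter per step."""
    return STEPS * PARAMS * flops_per_param_update


for cost in (1, 10):  # 1 matches the text; 10 is an assumed Adam-like cost
    share = optimizer_flops(cost) / TRAINING_FLOPS
    print(f"{cost:>2} FLOPs/param/step -> {share:.1e} of training compute")
```

Under either assumption the optimizer accounts for less than a millionth of the total, so leaving it out does not change the estimate.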

Final Estimate

The total compute required to train GPT-3 is approximately 3.15 × 10²³ FLOPs. This is derived from:

  • Standard transformer FLOPs per token (6P).
  • GPT-3 parameters (175B) and dataset size (300B tokens).
  • Cross-validation with reported compute usage (3,640 petaflop-days).

This output is part of our open dataset — 5K+ AI responses across 200 models

About Z.AI: GLM 4.5

Capabilities

  • Conversation
  • Reasoning
  • Code Generation
  • Analysis

Categories

  • Text
  • Code

Specifications

  • Provider: Zhipu
  • Released: 2025-07-25
  • Size: LARGE
  • Context: 128,000 tokens
