Z.AI: GLM 4.7 vs GPT-4.1 Mini
Compare Z.AI: GLM 4.7 by OpenRouter against GPT-4.1 Mini by OpenAI, tested across 43 shared challenges. Updated February 2026.
38 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
Certainly! Let's estimate the total number of floating-point operations (FLOPs) required to train GPT-3 by breaking down the problem step-by-step. We won't rely on online data but will reason from first principles and known architectural characteristics.
GPT-3, the largest model version, has approximately:
A transformer forward pass consists mainly of:
Two main components dominate FLOPs:
From papers and common configurations:
Using formulas from transformer FLOPs estimation:
\[ \text{FLOPs}_{\text{layer/token}} \approx 2 \times d_{model}^2 \]
(for attention and FFN combined, includes multiplications and additions)
Because the FFN (feedforward network) is the heaviest part, it dominates, so let's assume:
Substituting:
\[ 8 \times (12{,}288)^2 \approx 8 \times 1.5 \times 10^8 = 1.2 \times 10^9 \text{ FLOPs per token per layer} \]
For 96 layers:
\[ 1.2 \times 10^9 \times 96 = 1.15 \times 10^{11} \text{ FLOPs per token (forward pass)} \]
Backpropagation generally costs about 2× to 3× the forward pass FLOPs because computing gradients involves forward pass and backward pass computations.
Use factor 3 to be safe:
\[ \text{FLOPs} = 3 \times 1.15 \times 10^{11} = 3.45 \times 10^{11} \text{ FLOPs per token (forward + backward)} \]
From publicly known info, GPT-3 was trained on about 300 billion tokens (around 3 × 10¹¹ tokens) of text.
Multiply FLOPs per token by total tokens:
\[ 3.45 \times 10^{11} \times 3 \times 10^{11} = 1.035 \times 10^{23} \text{ FLOPs} \]
\[ \boxed{ \text{Total FLOPs to train GPT-3} \approx 10^{23} \quad \text{FLOPs} } \]
So, the training of GPT-3 requires on the order of \( 10^{23} \) floating point operations.
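The arithmetic in this response can be reproduced with a short Python sketch. It only replays the figures quoted in the answer (a per-layer cost of 8 × d_model², 96 layers, a factor of 3 for forward plus backward, and 3 × 10¹¹ training tokens); the function and constant names are illustrative and do not come from the response, and the sketch does not check whether the per-layer approximation itself is accurate.

```python
# Replays the estimate from the response above; all names are illustrative.
D_MODEL = 12_288        # hidden size used in the response
N_LAYERS = 96           # number of transformer layers used in the response
TRAIN_TOKENS = 3e11     # about 300 billion training tokens
TRAIN_FACTOR = 3        # forward + backward multiplier chosen in the response


def training_flops(d_model: int, n_layers: int, tokens: float, factor: int) -> float:
    """Total training FLOPs under the response's assumption of 8 * d_model^2 per layer per token."""
    per_layer_per_token = 8 * d_model ** 2
    forward_per_token = per_layer_per_token * n_layers
    return factor * forward_per_token * tokens


total = training_flops(D_MODEL, N_LAYERS, TRAIN_TOKENS, TRAIN_FACTOR)
print(f"Estimated training FLOPs: {total:.2e}")
```

Run without the intermediate rounding used in the response, this gives roughly 1.04 × 10²³, which matches the order-of-magnitude answer of 10²³.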
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to solve a simple but potentially confusing logic puzzle
Let's analyze the information:
Since Sally is a girl and one of the sisters, for each brother to have 2 sisters, it means there must be a total of 2 sisters in the family (including Sally).
Therefore:
Answer: Sally has 1 sister.
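The counting step behind this answer can be written as a one-line check. This is a minimal sketch that assumes, as the response does, that every brother has the same set of sisters and that Sally is one of them; the function name is hypothetical.

```python
def sisters_of_a_girl(sisters_per_brother: int) -> int:
    """Number of sisters a girl has, given how many sisters each of her brothers has.

    Assumes all siblings share the same family, so the girl herself is
    counted among her brothers' sisters and must be subtracted once.
    """
    if sisters_per_brother < 1:
        raise ValueError("the girl herself is a sister, so each brother has at least one")
    return sisters_per_brother - 1


print(sisters_of_a_girl(2))
```

With each brother having 2 sisters, the function returns 1, which agrees with the answer above.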
Tests an AI's understanding of number representation
No, 9.11 is not greater than 9.9.
When comparing decimal numbers, you compare digit by digit from left to right:
Since 1 is less than 9, 9.11 is less than 9.9.
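The comparison can be checked with exact decimal arithmetic. The sketch below uses Python's standard `decimal` module so that the inputs are compared as written, without binary floating-point rounding; the helper name `is_greater` is an illustrative choice.

```python
from decimal import Decimal


def is_greater(a: str, b: str) -> bool:
    """Return True if decimal string a is numerically greater than decimal string b."""
    return Decimal(a) > Decimal(b)


# 9.9 is the same value as 9.90, and 9.11 < 9.90.
print(is_greater("9.11", "9.9"))  # False
```

Padding 9.9 to 9.90 makes the digit-by-digit comparison explicit: the tenths digit 1 is less than 9, so 9.11 is the smaller number.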
Tests an AI's randomness and creativity
Here are 5 jokes:
All 5 jokes are unique.
Tests an AI's ability to generate vector graphics
Tests an AI's ability to create detailed SVG illustrations of gaming hardware
Tests an AI's humor and creative writing ability
Tests an AI's ability to simulate personalities and predict future trends
Tests an AI's humor and understanding of current events
Tests an AI's ability to write in distinct character voices
Tests an AI's ability to generate a complete, working landing page