GPT-5.1 Codex Mini's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
To estimate the total FLOPs required to train GPT-3, we can proceed with the following reasoning steps, using only basic information about the model’s size and the amount of data it was trained on:
Model Size
GPT-3 has approximately 175 billion parameters, which is \(1.75 \times 10^{11}\) parameters.
Training Dataset Size
GPT-3 was trained on roughly 300 billion tokens. That gives us a sense of how many “steps” (i.e., token predictions) the model processed during training.
FLOPs per Token
Training a transformer model involves both a forward pass and a backward pass for each token. A commonly used rule of thumb is that each parameter contributes about 6 FLOPs per token when both passes are counted (this covers the matrix multiplications, activations, etc.). It is a simplified approximation, but it is widely used for rough compute estimates.
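For intuition, the factor of 6 is usually decomposed as roughly 2 FLOPs per parameter per token for the forward pass (a multiply and an add per weight) and roughly 4 for the backward pass; this split is a standard scaling-law heuristic rather than something derived in the response itself:

\[
C_{\text{train}} \;\approx\; \underbrace{2\,N\,D}_{\text{forward}} + \underbrace{4\,N\,D}_{\text{backward}} \;=\; 6\,N\,D,
\]

where \(N\) is the parameter count and \(D\) is the number of training tokens.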
Multiply Everything Together
Total FLOPs \(\approx\) (number of parameters) \(\times\) (number of tokens) \(\times\) (FLOPs per parameter per token)
\(= 1.75 \times 10^{11} \times 3 \times 10^{11} \times 6\)
Compute the Result
First multiply the parameters and tokens:
\(1.75 \times 3 = 5.25\), so \(5.25 \times 10^{22}\) parameter-token interactions.
Multiply by 6 FLOPs per parameter per token:
\(5.25 \times 10^{22} \times 6 = 31.5 \times 10^{22} = 3.15 \times 10^{23}\) FLOPs.
So, a reasonable estimate is that training GPT-3 required on the order of \(3 \times 10^{23}\) floating-point operations.
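The same arithmetic as a minimal Python sketch; the variable names and the optional conversion to petaflop/s-days (using the definition 1 PF/s-day \(= 10^{15}\) FLOP/s \(\times\) 86,400 s) are illustrative additions, not part of the response above:

```python
# Back-of-the-envelope estimate of GPT-3 training compute
# using the C ≈ 6 * N * D rule of thumb.

n_params = 175e9           # ~175 billion parameters (N)
n_tokens = 300e9           # ~300 billion training tokens (D)
flops_per_param_token = 6  # ~2 forward + ~4 backward FLOPs per parameter per token

total_flops = flops_per_param_token * n_params * n_tokens
print(f"Estimated training compute: {total_flops:.2e} FLOPs")   # ~3.15e+23

# Optional: express the same total in petaflop/s-days.
pfs_day = 1e15 * 86_400    # FLOPs in one petaflop/s-day
print(f"Equivalent to about {total_flops / pfs_day:,.0f} petaflop/s-days")  # ~3,646
```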