Trinity Large Preview Free's response to the 'Estimate Complexity' challenge.
Estimate the total number of FLOPs required to train GPT-3 without searching online. Explain your reasoning step by step.
To estimate the FLOPs for training GPT-3, I'll work through the key components:
Model parameters: GPT-3 has 175 billion parameters.
Architecture: GPT-3 uses the standard decoder-only transformer architecture with attention layers. A forward pass requires approximately 2 FLOPs per parameter per token (one multiply and one add for each weight in the matrix multiplications).
Training process: During training, every token goes through both a forward and a backward pass, and the weights are updated once per batch. Per parameter and per token, this costs roughly:
Forward pass: ~2 FLOPs
Backward pass: ~4 FLOPs (about twice the forward pass)
Optimizer update: ~1 FLOP
Total: ~7 FLOPs per parameter per token
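As a quick check on the per-parameter accounting, here is a minimal sketch that just adds up the three contributions above; the 2/4/1 split is the assumption of this estimate, not a measured figure.

```python
# Rough per-parameter, per-token FLOP budget assumed in this estimate.
FORWARD_FLOPS = 2    # one multiply + one add per weight in the matmuls
BACKWARD_FLOPS = 4   # backward pass costs roughly twice the forward pass
OPTIMIZER_FLOPS = 1  # weight update, a small extra term

FLOPS_PER_PARAM_PER_TOKEN = FORWARD_FLOPS + BACKWARD_FLOPS + OPTIMIZER_FLOPS
print(FLOPS_PER_PARAM_PER_TOKEN)  # -> 7
```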
Training compute: The total training compute is then roughly (FLOPs per parameter per token) × (number of parameters) × (total training tokens).
Batch size and token count: GPT-3 was trained with a context length of 2048 tokens, a batch size of about 3.2 million tokens, and roughly 300 billion training tokens in total. This means the number of batches is:
B = 300 × 10⁹ / 3.2 × 10⁶ ≈ 93,750 batches
Total FLOPs: 7 × 175 × 10⁹ × 300 × 10⁹ ≈ 3.7 × 10²³ FLOPs (equivalently, 7 × 175 × 10⁹ × 3.2 × 10⁶ tokens per batch × 93,750 batches).
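A minimal sketch of the batch count and the total-FLOP arithmetic, using the numbers assumed above (175B parameters, 300B training tokens, 3.2M tokens per batch, 7 FLOPs per parameter per token):

```python
# Numbers assumed in this estimate (not measured values).
n_params = 175e9          # GPT-3 parameter count
n_tokens = 300e9          # total training tokens
tokens_per_batch = 3.2e6  # batch size in tokens
flops_per_param_per_token = 7

# Number of batches processed over training.
n_batches = n_tokens / tokens_per_batch
print(f"batches: {n_batches:,.0f}")       # ~93,750

# Total training FLOPs, computed two equivalent ways.
total_flops = flops_per_param_per_token * n_params * n_tokens
total_flops_via_batches = (flops_per_param_per_token * n_params
                           * tokens_per_batch * n_batches)
print(f"total FLOPs: {total_flops:.2e}")  # ~3.7e+23
assert abs(total_flops - total_flops_via_batches) < 1e-6 * total_flops
```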
Converting to PF-days: one petaflop/s-day is 10¹⁵ FLOP/s × 86,400 s ≈ 8.64 × 10¹⁹ FLOPs, so 3.7 × 10²³ / 8.64 × 10¹⁹ ≈ 4,250 PF-days.
This is in the same ballpark as the compute reported for GPT-3, about 3.14 × 10²³ FLOPs (roughly 3,600 PF-days). The estimate runs slightly high because it counts an extra FLOP per parameter per token for the optimizer update, whereas the common 6-FLOPs-per-parameter-per-token rule of thumb lands almost exactly on the reported number; actual wall-clock cost is higher still once overhead such as activation recomputation is included.
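Finally, a sketch of the PF-day conversion and the comparison against the commonly cited GPT-3 figure; the reported value here is quoted from memory and used only as a sanity check.

```python
# Convert the estimate to petaflop/s-days and compare with the cited figure.
total_flops = 3.675e23    # estimate from above: 7 * 175e9 * 300e9
pf_day = 1e15 * 86_400    # one petaflop/s sustained for a day, in FLOPs
reported_flops = 3.14e23  # commonly cited GPT-3 training compute

print(f"estimate: {total_flops / pf_day:,.0f} PF-days")     # ~4,250
print(f"reported: {reported_flops / pf_day:,.0f} PF-days")  # ~3,600
print(f"ratio:    {total_flops / reported_flops:.2f}")      # ~1.17
```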