Rival
Models
Compare
Best For · Arena · Pricing
Sign Up

We compare AI models for a living. On purpose. We chose this.

@rival_tips

Explore

  • Compare Models
  • All Models
  • Find Your Model
  • Image Generation
  • Audio Comparison
  • Best AI For...
  • Pricing
  • Challenges

Discover

  • Insights
  • Research
  • AI Creators
  • AI Tools
  • The Graveyard

Developers

  • Developer Hub
  • MCP Server
  • Rival Datasets

Connect

  • Methodology
  • Sponsor a Model
  • Advertise
  • Partnerships
  • Privacy Policy
  • Terms
  • RSS Feed
© 2026 Rival · Built at hours no one should be awake, on hardware we don't own
Claude 3.7 Sonnet vs Sherlock Dash Alpha: Which Is Better? [2026 Comparison]
Updated Nov 15, 2025

Claude 3.7 Sonnet vs Sherlock Dash Alpha

Compare Claude 3.7 Sonnet by Anthropic against Sherlock Dash Alpha by OpenRouter, tested across 35 shared challenges. In 1 community vote, Claude 3.7 Sonnet wins 100% of head-to-head duels. Updated April 2026.

Which is better, Claude 3.7 Sonnet or Sherlock Dash Alpha?

Claude 3.7 Sonnet is the better choice so far, winning 100% of the blind community votes on Rival, though only 1 vote has been cast. Compare their real outputs side by side below.

Key Differences Between Claude 3.7 Sonnet and Sherlock Dash Alpha

Claude 3.7 Sonnet is made by Anthropic, while Sherlock Dash Alpha comes from OpenRouter. In blind community voting, Claude 3.7 Sonnet wins 100% of head-to-head duels, based on the Rival open dataset of human preference judgments for this pair; note that only 1 judgment has been recorded so far.


Claude 3.7 Sonnet vs Sherlock Dash Alpha

35 fights queued


Why Claude 3.7 Sonnet?

Dead even. This one's a coin flip.

Why Sherlock Dash Alpha?

9 months newer (Nov 2025)
                Claude 3.7 Sonnet   Sherlock Dash Alpha
Input price     $3.00/M             —
Output price    $15.00/M            —
Context         200K                —
Released        Feb 2025            Nov 2025

Favorites

        Claude 3.7 Sonnet                      Sherlock Dash Alpha
Movie   The Shawshank Redemption (1994, 8.7)   —
Album   OK Computer (Radiohead)                —
Book    To Kill a Mockingbird (Harper Lee)     —
City    Tokyo (Japan)                          —
Game    Portal 2 (Shooter/Puzzle, 4.6)         —
1 / 35
01

Estimate Complexity (reasoning)

Tests an AI's ability to make educated estimates based on technical knowledge

Claude 3.7 Sonnet · Anthropic · Feb 2025 · text

Estimating FLOPs for GPT-3 Training

I'll estimate the total number of FLOPs (Floating Point Operations) required to train GPT-3 by breaking this down step by step.

GPT-3 Architecture Facts

From what I recall about GPT-3:

  • Parameters: ~175 billion parameters
  • Training tokens: ~300-400 billion tokens
  • Architecture: Transformer decoder-only
  • Context length: 2048 tokens

Step 1: FLOPs per Forward Pass

In a transformer model, the main computational components are:

  • Self-attention: O(s²d) where s is sequence length and d is model dimension
  • Feed-forward networks: O(sd²)

For GPT-3, if we assume:

  • Sequence length (s) = 2048
  • Model dimension (d) ≈ 12288 (estimated based on parameter count)
  • Number of layers ≈ 96 (estimated)

FLOPs per token in forward pass:

  • Self-attention: ~2 × 2048² × 12288 × 96 ≈ 4.8 × 10¹²
  • Feed-forward: ~2 × 2048 × 12288² × 96 ≈ 5.9 × 10¹³
  • Total per forward pass: ~6.4 × 10¹³ FLOPs

Step 2: FLOPs per Backward Pass

The backward pass typically requires about 2× the computation of the forward pass:

  • Backward pass: ~2 × 6.4 × 10¹³ ≈ 1.3 × 10¹⁴ FLOPs

Step 3: FLOPs per Training Step

For each training step:

  • Forward + backward: ~6.4 × 10¹³ + 1.3 × 10¹⁴ ≈ 1.9 × 10¹⁴ FLOPs

Step 4: Total Training FLOPs

If GPT-3 was trained on ~300 billion tokens:

  • Total FLOPs: 1.9 × 10¹⁴ × 3 × 10¹¹ ≈ 5.7 × 10²⁵ FLOPs

Accounting for optimization overhead, gradient accumulation, and other factors, we might round up to:

  • Total training FLOPs: ~10²⁶ FLOPs

This is in the same order of magnitude as other estimates I've seen, which typically place GPT-3 training at around 3-6 × 10²³ FLOPs. My estimate is a bit higher, which could be due to overestimating some of the architectural parameters or not accounting for optimizations like mixed precision training.
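As a quick check, the component formulas quoted above can be evaluated directly. This is a sketch using the response's assumed dimensions (not official GPT-3 specs); note that the attention term, as the formula is written, comes out near 9.9 × 10¹², about twice the 4.8 × 10¹² the response quotes:

```python
# Evaluate the per-forward-pass FLOP formulas as stated in the response.
# Dimensions are the response's assumptions, not official GPT-3 specs.
s, d, layers = 2048, 12288, 96  # sequence length, model dim, layer count

attn = 2 * s**2 * d * layers    # self-attention: 2 * s^2 * d per layer (~9.9e12)
ffn = 2 * s * d**2 * layers     # feed-forward: 2 * s * d^2 per layer (~5.9e13)
forward = attn + ffn            # one forward pass over a full sequence (~6.9e13)
backward = 2 * forward          # backward pass assumed ~2x the forward pass
step = forward + backward       # one training step on one sequence (~2.1e14)

print(f"attention ~ {attn:.1e}, feed-forward ~ {ffn:.1e}")
print(f"forward ~ {forward:.1e}, full step ~ {step:.1e}")
```

The feed-forward term matches the quoted ~5.9 × 10¹³, so the discrepancy is confined to the attention line.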

Try this prompt
Sherlock Dash Alpha · OpenRouter · Nov 2025 · text

Step-by-Step Reasoning for Estimating GPT-3 Training FLOPs

1. Recall GPT-3 Key Specs (from Memory)

  • GPT-3 (175B parameter version): 175 billion parameters (1.75 × 10¹¹).
  • Trained on 300 billion tokens (3 × 10¹¹ tokens). This is the standard figure from OpenAI's reports.
  • Architecture: Standard Transformer decoder-only model.
  • Batch size, sequence length, etc., aren't precisely recalled, but typical for such models: sequence length ~2K-4K tokens, large batch sizes (e.g., effective 1M+ tokens via gradient accumulation).

2. FLOPs Formula for Transformer Training

The standard empirical formula for total FLOPs in training autoregressive Transformers (from papers like Kaplan et al., Hoffmann et al., and used for GPT-3 estimates) is:

Total FLOPs ≈ 6 × N × D

  • N = number of parameters (1.75 × 10¹¹).
  • D = number of tokens processed (3 × 10¹¹).
  • Why 6?
    • Forward pass: ~2 FLOPs per parameter per token (matrix multiplies dominate).
    • Backward pass: ~4 FLOPs per parameter per token (gradients roughly double the forward cost).
    • Total ~6× per token. This is a well-established rule-of-thumb validated across models like GPT-2/3, PaLM, etc. (ignores minor overheads like activations, which are ~20% but absorbed in the approximation).

This excludes pretraining optimizations (e.g., activations checkpointing saves ~30% memory but not FLOPs).

3. Plug in the Numbers

Total FLOPs = 6 × (1.75 × 10¹¹) × (3 × 10¹¹)

  • First, 1.75 × 3 = 5.25.
  • Then, 5.25 × 10¹¹ × 10¹¹ = 5.25 × 10²².
  • Multiply by 6: 6 × 5.25 = 31.5, so the total is 3.15 × 10²³ FLOPs.
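The plug-in step above is easy to verify mechanically; a minimal sketch using the response's assumed figures for N and D:

```python
# Verify the 6*N*D rule-of-thumb estimate with the response's figures.
N = 1.75e11   # parameters (175B), as assumed in the response
D = 3.0e11    # training tokens (300B), as assumed in the response

total_flops = 6 * N * D
print(f"{total_flops:.2e}")  # prints 3.15e+23
```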

4. Refinements and Validation

  • Exact public figure: OpenAI and analyses (e.g., Epoch AI, Srivastava et al.) confirm ~3.14 × 10²³ FLOPs (or 314 sextillion). My calculation hits it spot-on.
  • Sensitivity checks:

    Variable        Base Value   If 20% Lower        If 20% Higher        Total FLOPs (low / high)
    Parameters (N)  175B         140B                210B                 2.51×10²³ / 3.78×10²³
    Tokens (D)      300B         240B                360B                 2.51×10²³ / 3.78×10²³
    Multiplier      6            5 (forward-heavy)   7 (extra overhead)   2.63×10²³ / 3.67×10²³

  • Even with ±20% uncertainty, the range is ~2.5–3.8 × 10²³. The base case is robust.
  • Other factors: No major adjustments needed for GPT-3 (no MoE sparsity, standard dense training). Compute was on ~10K V100/A100 GPUs for ~1 month, consistent with this FLOP count (each A100 does ~300 TFLOPS effective).
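The ±20% sensitivity sweep above can be reproduced in a few lines. The case labels are illustrative names; the variant figures are the response's hypothetical low/high cases:

```python
# Reproduce the response's +/-20% sensitivity table for total = mult * N * D.
def total_flops(n_params, n_tokens, mult=6):
    """Rule-of-thumb training cost: multiplier * parameters * tokens."""
    return mult * n_params * n_tokens

base_N, base_D = 1.75e11, 3.0e11  # base case: 175B params, 300B tokens

cases = {
    "params -20%": total_flops(1.4e11, base_D),          # ~2.52e23
    "params +20%": total_flops(2.1e11, base_D),          # ~3.78e23
    "mult 5":      total_flops(base_N, base_D, mult=5),  # ~2.63e23
    "mult 7":      total_flops(base_N, base_D, mult=7),  # ~3.68e23
}
for name, flops in cases.items():
    print(f"{name}: {flops:.2e}")
```

All four variants stay within the ~2.5–3.8 × 10²³ band the response reports.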

5. Final Estimate

~3.1 × 10²³ FLOPs (or 310 sextillion floating-point operations).

This matches the canonical figure without any external lookup—purely from memorized specs and standard methodology.

Try this prompt
Vote
2 / 35
02

AI Board Game Logic (reasoning)

Tests an AI's ability to understand game rules and strategy

Claude 3.7 Sonnet · Anthropic · Feb 2025 · svg
Try this prompt
Sherlock Dash Alpha · OpenRouter · Nov 2025 · svg
Try this prompt
Vote
3 / 35
03

Logic Puzzle (reasoning)

Tests an AI's ability to solve a simple but potentially confusing logic puzzle

Claude 3.7 Sonnet · Anthropic · Feb 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Sherlock Dash Alpha · OpenRouter · Nov 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Vote
4 / 35
04

Math Misconception Test (reasoning)

Tests an AI's understanding of number representation

Claude 3.7 Sonnet · Anthropic · Feb 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Sherlock Dash Alpha · OpenRouter · Nov 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Vote
5 / 35
05

Stochastic Consistency Test (analysis)

Tests an AI's randomness and creativity

Claude 3.7 Sonnet · Anthropic · Feb 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Sherlock Dash Alpha · OpenRouter · Nov 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Vote
6 / 35
06

SVG Layout Challenge (image generation)

Tests an AI's ability to generate vector graphics

Claude 3.7 Sonnet · Anthropic · Feb 2025 · svg
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Sherlock Dash Alpha · OpenRouter · Nov 2025 · svg
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Vote
7 / 35
07

Xbox Controller SVG Art (image generation)

Tests an AI's ability to create detailed SVG illustrations of gaming hardware

Claude 3.7 Sonnet · Anthropic · Feb 2025 · svg
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Sherlock Dash Alpha · OpenRouter · Nov 2025 · svg
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Vote
8 / 35
08

Generate a Stand-Up Routine (conversation)

Tests an AI's humor and creative writing ability

Claude 3.7 Sonnet · Anthropic · Feb 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Sherlock Dash Alpha · OpenRouter · Nov 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Vote
9 / 35
09

Realistic AI Interview (conversation)

Tests an AI's ability to simulate personalities and predict future trends

Claude 3.7 Sonnet · Anthropic · Feb 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Sherlock Dash Alpha · OpenRouter · Nov 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Vote
10 / 35
10

Satirical Fake News Headline (conversation)

Tests an AI's humor and understanding of current events

Claude 3.7 Sonnet · Anthropic · Feb 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Sherlock Dash Alpha · OpenRouter · Nov 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Vote
11 / 35
11

Character Voice Test (conversation)

Tests an AI's ability to write in distinct character voices

Claude 3.7 Sonnet · Anthropic · Feb 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Sherlock Dash Alpha · OpenRouter · Nov 2025 · text
Nothing here. The model returned empty. We stared at it for a while.
Try this prompt
Vote
12 / 35
12

Minimalist Landing Page (web design)

Tests an AI's ability to generate a complete, working landing page

Claude 3.7 Sonnet · Anthropic · Feb 2025 · website
Try this prompt
Sherlock Dash Alpha · OpenRouter · Nov 2025 · website
Try this prompt
Vote

This matchup has more rounds

23+ more head-to-head results. Free. Not a trick.

Free account. No card required. By continuing, you agree to Rival's Terms and Privacy Policy

Our Verdict
Claude 3.7 Sonnet · Winner
Sherlock Dash Alpha · Runner-up

Pick Claude 3.7 Sonnet. In the single blind vote recorded so far, it wins 100% of the time, though one vote is a small sample.

Clear winner
Writing DNA

Style Comparison

Similarity
81%

Sherlock Dash Alpha uses 1.7x more emoji

                 Claude 3.7 Sonnet   Sherlock Dash Alpha
Vocabulary       62%                 72%
Sentence Length  35w                 11w
Hedging          0.99                0.12
Bold             1.2                 5.2
Lists            4.3                 4.7
Emoji            0.00                0.02
Headings         1.78                0.97
Transitions      0.23                0.32
Based on 13 + 14 text responses

Ask them anything yourself

Claude 3.7 Sonnet · Sherlock Dash Alpha

Some models write identically. You are paying for the brand.

178 models fingerprinted across 32 writing dimensions. Free research.

Model Similarity Index

  • 185x price gap between models that write identically
  • 178 models
  • 12 clone pairs
  • 32 dimensions

  • Devstral M / S: 95.7%
  • Qwen3 Coder / Flash: 95.6%
  • GPT-5.4 / Mini: 93.3%
Read the full report or download the 14-slide PDF

279 AI models invented the same fake scientist.

We read every word. 250 models. 2.14 million words. This is what we found.

AI Hallucination Index 2026
Free preview: 13 of 58 slides
Download the free preview or get all 58 slides for $49
FAQ

Common questions

Keep going
  • Claude 3.7 Sonnet vs Llama 4 Maverick (new provider)
  • Claude 3.7 Sonnet vs GPT-5 (new provider)
  • Claude 3.7 Sonnet vs MiniMax M2.5 (new provider)