Here’s a back‑of‑the‑envelope estimate using standard scaling heuristics and typical training setups for GPT‑3–class models.
Assumptions
- Model size: GPT‑3 175B parameters (N ≈ 1.75×10^11).
- Training tokens: the commonly cited training regime for GPT‑3 is about 300 billion tokens (T ≈ 3×10^11).
- FLOPs per token for a decoder‑only Transformer: a widely used rule of thumb is that the forward pass costs about 2N FLOPs per token and the backward pass roughly twice that, giving ~6N FLOPs per token for the combined forward and backward pass (this folds in attention and MLP costs, plus constants).
Step‑by‑step
- FLOPs per token: FLOPs_per_token ≈ c × N. Take c ≈ 6 as a practical constant for forward+backward.
- Total FLOPs: Total_FLOPs ≈ T × FLOPs_per_token ≈ T × (c × N)
- Plug in numbers: N = 1.75×10^11, T = 3×10^11, c ≈ 6
  Total_FLOPs ≈ 3×10^11 × 6 × 1.75×10^11 = 3 × 6 × 1.75 × 10^(11+11) = 31.5 × 10^22 = 3.15×10^23 FLOPs (a quick code sketch of this arithmetic follows the list)
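As a concrete check on the arithmetic, here is a minimal Python sketch; the constants N, T, and c simply restate the assumptions above (175B parameters, ~300B tokens, c ≈ 6 as the forward+backward rule of thumb), not measured values.

```python
# Back-of-the-envelope training-compute estimate for a GPT-3-scale model.
# N, T, and c restate the assumptions above; c = 6 is the forward+backward rule of thumb.

N = 1.75e11  # model parameters (175B)
T = 3.0e11   # training tokens (~300B)
c = 6        # FLOPs per parameter per token, forward + backward combined

flops_per_token = c * N           # ~1.05e12 FLOPs per token
total_flops = T * flops_per_token

print(f"FLOPs per token:      {flops_per_token:.2e}")
print(f"Total training FLOPs: {total_flops:.2e}")  # ~3.15e23
```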
Sanity checks and range
- If we vary c between about 3 and 10 (to capture implementation differences and attention/MLP constants), we get:
  Low end: 3×10^11 × 3 × 1.75×10^11 ≈ 1.58×10^23 FLOPs
  High end: 3×10^11 × 10 × 1.75×10^11 = 5.25×10^23 FLOPs
  (see the sweep sketched after this list)
- Many public ballparks for GPT‑3 scale land in the 1e23–1e24 FLOPs range. Our central estimate of ~3×10^23 FLOPs sits comfortably in that band.
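The same sweep over c can be written in a few lines; the bounds 3 and 10 are just the illustrative values chosen above, not canonical constants.

```python
# Bracket the estimate by sweeping the per-token FLOPs multiplier c.
# The values 3, 6, and 10 are the illustrative constants used in this section.

N = 1.75e11  # model parameters
T = 3.0e11   # training tokens

for c in (3, 6, 10):
    total_flops = c * N * T
    print(f"c = {c:>2}: ~{total_flops:.2e} FLOPs")
# Expected: roughly 1.6e23, 3.2e23, and 5.3e23 FLOPs respectively.
```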
Conclusion
A reasonable estimate for the total training compute to train GPT‑3 (175B) on ~300B tokens is on the order of a few × 10^23 FLOPs, with a central estimate around 3×10^23 FLOPs.
