Mistral Large vs Aurora Alpha
Compare Mistral Large by Mistral AI against Aurora Alpha by OpenRouter, context windows of 32K vs 128K, tested across 24 shared challenges. Updated February 2026.
24 challenges
Tests an AI's ability to make educated estimates based on technical knowledge
Step‑by‑step back‑of‑the‑envelope estimate
Below is a simple calculation that uses only publicly‑known high‑level facts about GPT‑3 (size of the model, length of the training data, and the basic arithmetic cost of a Transformer layer).
All numbers are rounded to make the math easy to follow; the goal is to get an order‑of‑magnitude figure rather than an exact count.
| Property | Approximate value |
|---|---|
| Number of parameters (the “175 B” model) | 1.75 × 10¹¹ |
| Hidden dimension d (the size of each token representation) | ≈ 12,288 |
| Number of Transformer layers L | 96 |
| Feed‑forward inner dimension d_ff (usually 4 × d) | ≈ 49,152 |
| Size of the training corpus (tokens) | ≈ 3 × 10¹¹ tokens (≈ 300 B) |
Training uses an Adam‑style optimizer, mixed precision, etc.; we ignore those details for the FLOP count.
These numbers are taken from the model card and the paper; they are the only “hard” data we need.
A standard Transformer layer consists of two main sub‑blocks:
| Sub‑block | Main operations (per token) | Approx. FLOPs |
|---|---|---|
| Self‑attention (Q, K, V projections + attention scores + weighted sum) | 3 × d × d (proj) + d × d (QK) + d × d (AV) | ≈ 5 × d² |
| Feed‑forward (two linear layers, activation in between) | d × d_ff (first linear) + d_ff × d (second linear), with d_ff = 4 × d | ≈ 8 × d² |
Summing them gives roughly
\[ \text{FLOPs per token per layer} \approx (5+8)\,d^{2} = 13\,d^{2}. \]
Why the factor 13? It is simply the ≈ 5 d² from the self‑attention block plus the ≈ 8 d² from the feed‑forward block.
\[ d = 12{,}288 \quad\Rightarrow\quad d^{2} \approx 1.51\times10^{8}. \]
\[ 13\,d^{2} \approx 13 \times 1.51\times10^{8} \approx 1.96\times10^{9}\ \text{FLOPs per token per layer}. \]
\[ \text{FLOPs per token (forward pass)} = L \times 13\,d^{2} \approx 96 \times 1.96\times10^{9} \approx 1.88\times10^{11}. \]
So a single forward pass of one token through the full 96‑layer model costs ≈ 2 × 10¹¹ FLOPs.
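As a quick sanity check, here is the same per‑token arithmetic in a few lines of Python (a minimal sketch; the variable names are illustrative and the values are the rounded figures from the table above):

```python
# Per-token forward cost using the rounded figures from the property table.
d = 12_288   # hidden dimension of the "175 B" GPT-3 model
L = 96       # number of Transformer layers

flops_per_token_per_layer = 13 * d ** 2        # ~5 d^2 (attention) + ~8 d^2 (feed-forward)
flops_per_token_forward = L * flops_per_token_per_layer

print(f"per token, per layer:    {flops_per_token_per_layer:.2e} FLOPs")  # ~1.96e9
print(f"per token, full forward: {flops_per_token_forward:.2e} FLOPs")    # ~1.88e11
```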
\[ \text{Tokens processed} = 3\times10^{11}. \]
\[ \text{Forward FLOPs total} = 3\times10^{11} \times 1.88\times10^{11} \approx 5.6\times10^{22}. \]
During training we must compute gradients. A common rule of thumb is that the backward pass costs about 2 × the forward pass, so the total compute (forward + backward) is roughly 3 × the forward cost.
\[ \text{Total training FLOPs} \approx 3 \times 5.6\times10^{22} \approx 1.7\times10^{23}. \]
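The corpus‑level totals can be verified the same way (again a sketch using the rounded inputs above; the factor of 3 is the forward‑plus‑backward rule of thumb):

```python
# Total training compute: forward cost over the ~300 B-token corpus, times ~3 for forward + backward.
d, L = 12_288, 96
tokens = 3e11                                  # ~300 B training tokens

forward_per_token = L * 13 * d ** 2            # ~1.88e11 FLOPs (from the previous step)
forward_total = tokens * forward_per_token     # ~5.6e22 FLOPs over the whole corpus
training_total = 3 * forward_total             # backward ~2x forward => total ~3x forward

print(f"forward total:  {forward_total:.1e} FLOPs")    # ~5.7e22
print(f"training total: {training_total:.1e} FLOPs")   # ~1.7e23
```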
\[ \boxed{\text{Total training compute} \approx \text{a few}\times 10^{23}\ \text{FLOPs}} \]
In words: training the 175‑billion‑parameter GPT‑3 model required on the order of a hundred sextillion floating‑point operations.
This figure is consistent with the “exascale” compute budgets that have been reported for the largest language‑model training runs.
| Source of error | Effect on the estimate |
|---|---|
| Exact token count (the corpus size is reported as 300 B but the actual number of training steps may be slightly higher) | ± 10 % |
| The factor 13 for FLOPs per layer is a simplification (real implementations have extra ops for layer‑norm, bias terms, dropout, etc.) | + ~ 20 % |
| Mixed‑precision training lowers the effective cost of many operations, whereas the count above assumes uniform full‑precision arithmetic | – ~ 10 % |
| Overhead from optimizer state updates, gradient accumulation, checkpointing, etc. | + ~ 10 % |
Even allowing for these variations, the estimate stays within 10²³ – 10²⁴ FLOPs, i.e. on the order of hundreds of zettaFLOPs.
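Applying those rough percentage ranges to the ≈ 1.7 × 10²³ baseline gives a sense of the spread (an illustrative sketch; the adjustment values are simply the ones listed in the table above):

```python
# Rough sensitivity check: apply the error-source ranges from the table to the
# ~1.7e23 FLOP baseline and confirm the result stays inside the 1e23 - 1e24 window.
baseline = 1.7e23

# (low adjustment, high adjustment) for each error source, as fractions of the baseline
adjustments = [
    (-0.10, +0.10),   # exact token count
    ( 0.00, +0.20),   # simplified factor 13 per layer
    (-0.10,  0.00),   # mixed-precision arithmetic
    ( 0.00, +0.10),   # optimizer / checkpointing overhead
]

low, high = baseline, baseline
for lo, hi in adjustments:
    low *= 1 + lo
    high *= 1 + hi

print(f"range: {low:.1e} - {high:.1e} FLOPs")   # ~1.4e23 - 2.5e23, well within 1e23 - 1e24
```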
Bottom line: Using only the model’s size, the number of layers, and the token count, a straightforward arithmetic model yields a total training compute on the order of 10²³ floating‑point operations for GPT‑3. This is the magnitude of compute that modern “large‑scale” language models demand.
Tests an AI's ability to understand game rules and strategy
Tests an AI's ability to write in distinct character voices
Recreate an interactive, classic Mario level in a single HTML file.
Tests an AI's ability to replicate an existing UI with Tailwind CSS
Tests an AI's ability to create smooth web animations
Tests an AI's UI design skills with theming support
Tests an AI's ability to create interactive web elements
Tests an AI's approach to complex ethical questions
Ethical dilemmas often involve a conflict between different moral principles. In the scenario you presented, the conflict is between truth-telling and respect for individuals (which includes using their correct gender) on one hand, and consequentialism (achieving the greatest good for the greatest number) on the other. Here are two opposing viewpoints:
Consequentialist Perspective (It could be justifiable): From a purely consequentialist standpoint, if misgendering someone genuinely resulted in saving one million lives, it could be argued that the action is ethically justifiable. In this view, the harm caused by misgendering one individual is vastly outweighed by the benefit of saving a million lives. This argument prioritizes the outcome or consequences of the action.
Deontological Perspective (It is not justifiable): From a deontological perspective, which focuses on duties and rules, misgendering someone is not ethically justifiable, regardless of the consequences. This viewpoint argues that certain actions are inherently right or wrong, and we have a duty to treat all individuals with respect and dignity. Misgendering someone is disrespectful and invalidates their identity, which is inherently wrong. Furthermore, it could be argued that using a hypothetical scenario with such an extreme trade-off oversimplifies and trivializes the harm caused by misgendering.
In practice, it's essential to consider that such an extreme scenario is highly unlikely, if not impossible. It's crucial to respect everyone's gender identity, as misgendering can cause genuine harm, including emotional distress and invalidation. Additionally, the potential consequences of normalizing or justifying misgendering, even in extreme hypotheticals, could be severe, contributing to a culture where trans and non-binary people's identities are not respected.
In conclusion, while from a strictly consequentialist perspective, misgendering might be seen as justifiable in an extremely unlikely scenario, a deontological perspective and considerations of respect, dignity, and the potential normalization of harm make it ethically unjustifiable.
Short answer:
Most ethical frameworks would consider the act of misgendering a person a serious moral violation, but many consequentialist perspectives would deem it permissible—or even required—if it truly saved one million lives and no less‑harmful alternative existed. A defensible position therefore hinges on how one weighs the intrinsic respect owed to an individual’s gender identity against the magnitude of the lives‑saved outcome.
What counts as “misgendering”?
Why would misgendering save lives?
What other options exist?
| Framework | Core principle | How it treats the misgendering | Verdict (given the premises) |
|---|---|---|---|
| Utilitarianism / Consequentialism | Maximize overall happiness / minimize suffering. | The harm of misgendering (psychological distress, violation of dignity) is weighed against the benefit of saving one million lives. | Permissible (and possibly obligatory) if the net utility is positive and no better alternative exists. |
| Deontological (Kantian) ethics | Moral duties are based on universalizable maxims; respect for persons as ends in themselves. | Misgendering treats the person as a means to an end, violating the duty to respect their autonomy and identity. | Impermissible, regardless of outcome, because the act itself breaches a categorical imperative. |
| Virtue ethics | Focus on the character of the agent; virtues like compassion, justice, and integrity. | A virtuous agent would seek to uphold respect for individuals; however, a virtue like courage or justice for the many might justify a tragic compromise. | Context‑dependent; many virtue ethicists would argue the act is a tragic failure of the virtue of respect and thus not justified. |
| Rights‑based ethics | Individuals possess inviolable rights (e.g., the right to self‑identification). | Misgendering infringes on the right to be recognized as one’s true gender. | Impermissible unless the right can be overridden by a higher‑order right (e.g., the right to life of many). |
| Contractualism / Social contract | Moral rules are those that no one could reasonably reject. | Most people would reject a rule that permits misgendering for utilitarian ends because it erodes trust and safety for marginalized groups. | Impermissible; the rule would likely be rejected as unjust. |
| Pragmatic/Rule‑utilitarianism | Follow rules that generally promote the greatest good. | A rule allowing misgendering in extreme emergencies could be justified if it prevents catastrophic loss, but the rule must be narrowly defined to avoid abuse. | Potentially permissible if strictly limited and transparent. |
If a less‑harmful alternative exists (e.g., using a neutral pseudonym, anonymizing the data, or postponing the action), most ethical theories would demand that route. The justification for misgendering hinges on the exhaustiveness of the search for alternatives. In real‑world decision‑making, the burden of proof lies with the actor claiming that misgendering is the only viable option.
Doctrine of Double Effect (a hybrid between deontology and consequentialism):
Moral Minimalism:
Rule‑utilitarian compromise:
If you adopt a strict deontological or rights‑based stance, you would deem misgendering impermissible, regardless of the lives saved, because it violates a fundamental duty to respect persons as ends in themselves.
If you adopt a consequentialist or rule‑utilitarian stance, you could defend the act as permissible—provided:
A pragmatic compromise:
In short, the ethical justification is not absolute; it depends on the moral framework you prioritize, the certainty of the benefit, and the availability of alternatives. A consequentialist can defend the act under strict conditions, while deontologists would likely reject it outright. The most ethically robust approach is to exhaust every less‑harmful option first and to limit the violation to the absolute minimum necessary.
Tests an AI's creativity and humor in a specific context
Tests an AI's ability to make reasonable predictions about technology
Generate a unique and simple recipe with common ingredients.