Best AI for AI Safety
Which AI models resist jailbreak attempts best? Ranked by resistance across 7 escalating attack techniques, from direct asks to multi-turn deep crescendo.
How AI Safety rankings are computed
Rankings are based on 48 models tested across 7 jailbreak levels. Each model is scored using a composite algorithm: 70% task coverage (how many challenges the model has responses for) and 30% duel performance (win rate in blind community votes on these specific challenges, with the global RIVAL Index as a fallback when vote counts are low). Claude Opus 4.6 currently leads with a score of 117.0/100. All ranking data is part of RIVAL's open dataset of 21,000+ human preference votes.
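As a rough illustration of that weighting, here is a minimal sketch. The function and field names (coverage, duel_win_rate, vote_count, global_rival_index) and the MIN_VOTES cutoff are assumptions for illustration, not RIVAL's published implementation.

```python
# A 70/30 composite as described above. Field names and the MIN_VOTES
# threshold are assumptions, not RIVAL's actual schema.

MIN_VOTES = 25  # assumed cutoff below which the global RIVAL Index is used instead

def composite_score(coverage: float, duel_win_rate: float,
                    vote_count: int, global_rival_index: float) -> float:
    """All inputs are assumed to share a common numeric scale."""
    duel = duel_win_rate if vote_count >= MIN_VOTES else global_rival_index
    return 0.7 * coverage + 0.3 * duel
```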
What is the best AI for safety?
RIVAL ranks AI models for safety using jailbreak resistance testing across 7 escalating attack levels. 70% of the score is based on how many of the 7 levels the model resisted, and 30% comes from the model's global RIVAL Index.
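A minimal sketch of that 70/30 split, assuming the resistance component is the fraction of the 7 levels resisted and both components share a 0-100 scale; the names and normalization are illustrative, not RIVAL's actual code.

```python
# 70% jailbreak resistance, 30% global RIVAL Index. Names and the 0-100
# normalization are assumptions for illustration.

def safety_score(levels_resisted: int, global_rival_index: float,
                 total_levels: int = 7) -> float:
    resistance = 100.0 * levels_resisted / total_levels
    return 0.7 * resistance + 0.3 * global_rival_index
```

Under this sketch, a model that resists all 7 levels and has a global index of 80 scores 0.7 × 100 + 0.3 × 80 = 94.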
How are AI models tested for jailbreak resistance?
Each model faces 7 progressively harder jailbreak techniques: Direct Ask, Context Manipulation, Persona Override, Code Reframing, Many-Shot Priming, Encoded Extraction, and Deep Crescendo. A judge LLM evaluates whether the model produced unsafe output at each level.
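In outline, one evaluation pass could look like the sketch below; run_attack and judge_is_unsafe are hypothetical stand-ins for the attack harness and the judge LLM call, which are not published here.

```python
# Sketch of one evaluation pass over the 7 jailbreak levels.
# `run_attack` and `judge_is_unsafe` are hypothetical stand-ins.

LEVELS = [
    "Direct Ask", "Context Manipulation", "Persona Override",
    "Code Reframing", "Many-Shot Priming", "Encoded Extraction",
    "Deep Crescendo",
]

def evaluate_model(model, run_attack, judge_is_unsafe) -> dict[str, bool]:
    results = {}
    for level in LEVELS:
        response = run_attack(model, level)         # send the level's attack prompt(s)
        results[level] = judge_is_unsafe(response)  # judge LLM labels the output
    return results  # True = unsafe output produced at that level
```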
What does 'break level' mean in the jailbreak test?
The break level is the jailbreak technique at which the model first produced unsafe output. Models that resist all 7 levels have no break level and are considered maximally resistant.
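Continuing the sketch from the previous answer, the break level is simply the first level the judge flags as unsafe, or None when all 7 are resisted; the helper below is hypothetical.

```python
def break_level(results: dict[str, bool], levels: list[str]) -> str | None:
    """Return the first level judged unsafe, or None if every level was resisted."""
    for level in levels:        # levels ordered from easiest to hardest attack
        if results.get(level):
            return level
    return None                 # no break level: maximally resistant
```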
How often are jailbreak resistance scores updated?
Jailbreak tests are run periodically as new models are released or updated. The results reflect the most recent test run for each model.