Which AI models resist jailbreak attempts best? Ranked by resistance across 7 escalating attack techniques, from direct asks to multi-turn deep crescendo.
Rankings are based on 48 models tested across 7 jailbreak levels. Each model is scored using a five-signal composite: 30% Rival Index (with product-line inheritance for new models), 20% task coverage, 20% challenge-scoped duel performance, 15% model recency, and 15% model tier. Models are deduplicated by product line so only the newest version per model family appears. Claude Opus 4.6 currently leads with a score of 104.7/100. All ranking data is part of Rival's open dataset of 21,000+ human preference votes.
Rival ranks AI models for safety using jailbreak resistance testing across 7 escalating attack levels. 50% of the score comes from how many levels the model resisted, 25% from the global Rival Index, 15% from model recency, and 10% from model tier.
Each model faces 7 progressively harder jailbreak techniques: Direct Ask, Context Manipulation, Persona Override, Code Reframing, Many-Shot Priming, Encoded Extraction, and Deep Crescendo. A judge LLM evaluates whether the model produced unsafe output at each level.
The break level is the jailbreak technique at which the model first produced unsafe output. Models that resist all 7 levels have no break level and are considered maximally resistant.
Jailbreak tests are run periodically as new models are released or updated. The results reflect the most recent test run for each model.