The Rival Research Series
How AI actually behaves,
written down.
Five reports, thousands of responses. The parts a leaderboard number cannot tell you. Free to read.
- The Em-Dash Civil War
Controlling for task, AI writing is not homogenizing: cross-model spread in em-dash use grew 310% in a year, and roughly 80% of the apparent convergence is a measurement artifact.
- Ghosts in the Machine
Across 250 models and 2.14M words, AI invented a character named Chen 279 times, and 42% of models tell the exact same joke. The AI Hallucination Index.
- Jailbreak Safety Benchmark
57 models run against escalating jailbreak attacks. Refusal rates collapse fastest at attack levels 7 through 9, where most models break.
- Model Similarity Index
178 models, 15,753 pairwise comparisons: 12 model pairs write near-identically (above 90% cosine similarity) on a 32-dimension stylometric fingerprint.
- Persona Impact Study
Across 52 system prompts on one small model (Gemma 4 31B), the best persona scored +1.70 over the no-prompt baseline. The worst scored -4.65.
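For readers curious about the arithmetic behind the Model Similarity Index entry above: 178 models taken two at a time gives 15,753 unordered pairs, and each pair can be scored by cosine similarity between two fingerprint vectors. The sketch below is only an illustration under stated assumptions, not the report's actual pipeline. The function names, the toy two-dimensional vectors, and the strict "greater than 0.90" cutoff are all hypothetical stand-ins for the 32-dimension stylometric fingerprint.

```python
import math
from itertools import combinations


def pair_count(n_models: int) -> int:
    """Number of unordered model pairs: n choose 2."""
    return math.comb(n_models, 2)


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero fingerprint as dissimilar
    return dot / (norm_a * norm_b)


def near_identical_pairs(fingerprints, threshold=0.90):
    """Return model-name pairs whose similarity exceeds the threshold.

    `fingerprints` maps a model name to its feature vector (hypothetical
    layout; the report's real feature set is not reproduced here).
    """
    items = sorted(fingerprints.items())
    return [
        (name_a, name_b)
        for (name_a, vec_a), (name_b, vec_b) in combinations(items, 2)
        if cosine_similarity(vec_a, vec_b) > threshold
    ]


print(pair_count(178))  # -> 15753

# Toy data, invented purely for illustration.
toy = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0]}
print(near_identical_pairs(toy))  # -> [('a', 'b')]
```

With real data, each vector would carry 32 features rather than 2, and the threshold would be whatever cutoff the report defines.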