
Rival Research · Stylometrics

The Em-Dash Civil War

Everyone agrees AI writing is homogenizing into one beige voice. The data says the opposite. On identical tasks, models are pulling apart into distinct typographic camps, and the fault line runs straight through one mark: the em-dash.
183 models · 3,095 responses · 43 tasks · 81 vs 101 camps · 2025-Q1 → 2026-Q1

Every model, by em-dash use

[Chart: one dot per model, placed by em-dashes per 100 words (axis 0 to 8). Em-dash heavy (81) and em-dash sparse (101), split at 1.0.]

Each dot is one model's mean em-dash rate across 43 standardized tasks. 26 models use no em-dashes at all; a long heavy tail sits opposite. There is no "average" model here. There are two populations.
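The per-model rate behind each dot can be approximated along these lines. This is a minimal sketch, not the report's own code: whitespace tokenization, counting only U+2014, and averaging per-response rates (rather than pooling all words) are assumptions, and `em_dash_rate` and `model_mean_rate` are hypothetical names.

```python
EM_DASH = "\u2014"  # the em-dash character, written as an escape


def em_dash_rate(text: str) -> float:
    """Em-dashes per 100 whitespace-delimited words (assumed tokenization)."""
    words = text.split()
    if not words:
        return 0.0
    return 100.0 * text.count(EM_DASH) / len(words)


def model_mean_rate(responses: list[str]) -> float:
    """Mean of per-response rates for one model across its tasks (assumed aggregation)."""
    if not responses:
        return 0.0
    return sum(em_dash_rate(r) for r in responses) / len(responses)
```

Under these assumptions, a 100-word response containing one em-dash scores 1.0, which is exactly the split point used in the chart.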

The headline

The gap is widening, fast

For every task, we measure how much models disagree on a feature: the cross-model spread. If writing were converging, that spread would shrink. For the em-dash it more than quadrupled in a single year.

+310%

growth in the cross-model spread of em-dash use on identical tasks, from 2025-Q1 to 2026-Q1. The single most divergent feature in the entire fingerprint.

Within-task stddev: 0.61 → 2.50 em-dashes / 100 words.
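The spread and its growth can be sketched as follows. The text says only "within-task stddev", so the use of the population standard deviation and the averaging of per-task values into one number are assumptions; `cross_model_spread` and `pct_growth` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cross_model_spread(rows):
    """rows: iterable of (task_id, model_id, rate) tuples.

    Returns the mean over tasks of the across-model standard deviation.
    Tasks answered by fewer than two models are skipped.
    """
    by_task = defaultdict(list)
    for task_id, _model_id, rate in rows:
        by_task[task_id].append(rate)
    per_task = [pstdev(rates) for rates in by_task.values() if len(rates) > 1]
    if not per_task:
        raise ValueError("need at least one task answered by two or more models")
    return mean(per_task)


def pct_growth(start: float, end: float) -> float:
    """Relative change from start to end, in percent."""
    return 100.0 * (end - start) / start


print(round(pct_growth(0.61, 2.50)))  # 310
```

The last line reproduces the headline figure from the two endpoint values quoted above: a move from 0.61 to 2.50 is a rise of about 310%, or roughly 4.1 times the starting spread.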

Em-dash disagreement, over time

Cross-model spread by quarter (em-dashes per 100 words):
2025-Q1: 0.61
2025-Q2: 2.49
2025-Q3: 2.97
2025-Q4: 3.03
2026-Q1: 2.50

As the band widens, the camps move apart. It opens early in 2025 and never closes.

Split at 1.0 em-dashes per 100 words, the trough in the distribution, the catalog cleaves into a heavy camp averaging 2.8 and a sparse camp averaging 0.38. The line cuts across labs and through them: OpenAI sits firmly heavy, Claude 2/3 and the DeepSeek line use zero.
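A minimal sketch of the camp split, assuming a model that lands exactly on the threshold counts as heavy (the text does not say). `split_camps` is a hypothetical helper, and the final line only checks the ratio of the two camp means quoted above.

```python
SPLIT = 1.0  # em-dashes per 100 words, the threshold stated in the text


def split_camps(model_rates: dict[str, float], threshold: float = SPLIT):
    """Partition models into (heavy, sparse) by mean em-dash rate."""
    heavy = {m: r for m, r in model_rates.items() if r >= threshold}
    sparse = {m: r for m, r in model_rates.items() if r < threshold}
    return heavy, sparse


# Ratio of the reported camp means: heavy 2.8 vs sparse 0.38.
print(round(2.8 / 0.38, 1))  # 7.4
```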

81 em-dash heavy · 101 em-dash sparse · 26 use zero em-dashes · 7× heavy vs sparse mean

The em-dash heavy camp

81 models. Listed rates, em-dashes per 100 words: 9.38, 6.54, 6.22, 6.03, 5.64, 5.10, 5.03, 4.97, 4.63.

The em-dash sparse / free camp

101 models. Listed rates, em-dashes per 100 words: 0.00 for each of the eight entries shown.

Plot em-dash use against emoji use and the structure holds: the heavy camp spreads along its own axis, the sparse camp packs into the corner. Typographic personality is becoming a coordinate, not a consensus.

[Scatter plot: em-dash rate (0 to 8) against emoji rate (0 to 1), one point per model.]

The em-dash is the loudest case, not the only one. Every feature below tilts up: models agree less in 2026-Q1 than they did in 2025-Q1.

Diverging features · within-task spread, 2025-Q1 → 2026-Q1

Em-dash: +310%
Avg paragraph length: +238%
Inline code: +163%
Emoji: +90%
Semicolon: +64%
Italic: +58%

Paragraph length, inline code, emoji, semicolons and italics all fan out. The percentage is the real change in cross-model spread.

Convergence is not entirely a myth; it just lives somewhere narrower than the headlines claim. Models are tightening up on rhythm: sentence length, its variance, ellipses, transitions and exclamation marks. They breathe the same way. They just dress differently.

Converging features · within-task spread, 2025-Q1 → 2026-Q1

Ellipsis: -42%
Sentence length variance: -34%
Avg sentence length: -27%
Transition: -24%
Exclamation: -20%

Models converge on pacing and rhythm while diverging on punctuation and formatting personality. "AI all sounds the same" is true about cadence and false about style.

Show your work

The convergence mirage

So where does the "everything is converging" story come from? From not controlling for the task. Compare models by their overall fingerprint and convergence looks dramatic. Compare them on the same task and almost all of it evaporates.

~80%

of the "AI is converging" signal is a measurement artifact of which tasks each model happened to answer, not a real shift in how models write.

Naive 29.3% convergence → 5.6% once you hold the task constant.
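The "~80%" figure is consistent with reading the artifact share as the fraction of the naive signal that disappears under task control. That reading is an assumption, and `artifact_share` is a hypothetical name:

```python
def artifact_share(naive: float, controlled: float) -> float:
    """Fraction of the naive convergence signal that vanishes once the task is held constant."""
    return (naive - controlled) / naive


print(f"{artifact_share(29.3, 5.6):.1%}")  # 80.9%
```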

Convergence signal · raw vs task-controlled

Naive: 29.3% (comparing overall model fingerprints)
Controlled: 5.6% (same prompt, different model)

Hold the task constant and 29.3% convergence collapses to 5.6%. Even that residual is fragile.

The honest caveat

We will not oversell our own residual. That remaining 5.6% controlled convergence is weak and suggestive: the endpoint confidence intervals overlap and the trend is not monotonic. We label it WEAK_SUGGESTIVE, not proven. The robust finding is the opposite of the popular one: feature-level divergence, led by the em-dash, is large, consistent and easy to see.

Method

How we measured this

The trap in any "is AI converging?" question is that the set of tasks each model answered changed over time. Compare a 2026 model and a 2023 model on different prompts and you measure the prompts, not the era.

So every cross-model comparison here is computed on the identical task. For each response we find the nearest response to the same prompt by a different model, using a 27-dimension, globally z-normalized stylometric vector and Euclidean distance. Responses are bucketed by the author model's release date.
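The matching step described above might look like the sketch below. It is an illustration under assumptions, not the report's pipeline: the record layout (`prompt`, `model`, `vec` keys) and the function names are hypothetical, the 27 features are not reproduced (any fixed-length vector works), and zero-variance dimensions are left unscaled to avoid dividing by zero.

```python
import math
from statistics import mean, pstdev


def z_normalize(vectors):
    """Globally z-normalize: per-dimension mean and stddev over all responses."""
    dims = len(vectors[0])
    mus = [mean([v[d] for v in vectors]) for d in range(dims)]
    # A zero stddev falls back to 1.0 so constant dimensions map to 0.0.
    sigmas = [pstdev([v[d] for v in vectors]) or 1.0 for d in range(dims)]
    return [[(v[d] - mus[d]) / sigmas[d] for d in range(dims)] for v in vectors]


def nearest_other_model(responses):
    """For each response, find the closest response to the same prompt by a different model.

    responses: list of dicts with keys 'prompt', 'model', 'vec' (already normalized).
    Returns a list of (index, nearest_index_or_None, distance) tuples.
    """
    out = []
    for i, r in enumerate(responses):
        best_j, best_d = None, math.inf
        for j, s in enumerate(responses):
            if s["prompt"] != r["prompt"] or s["model"] == r["model"]:
                continue  # only same prompt, different model
            d = math.dist(r["vec"], s["vec"])  # Euclidean distance
            if d < best_d:
                best_j, best_d = j, d
        out.append((i, best_j, best_d))
    return out
```

Averaging the resulting nearest-neighbor distances within each release-date bucket would then give one task-controlled distance per period. The quadratic loop is fine at the scale reported here (3,095 responses); a larger corpus would want responses grouped by prompt first.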

Cite this

Rival (2026). The Em-Dash Civil War: AI models are diverging, not homogenizing. 183 models, 3,095 responses, 43 tasks. rival.tips/research/em-dash-civil-war

This report was written with zero em-dashes. We picked a side.
