Rival Research · Stylometrics

The Em-Dash Civil War

Everyone agrees AI writing is homogenizing into one beige voice. The data says the opposite. On identical tasks, models are pulling apart into distinct typographic camps, and the fault line runs straight through one mark: the em-dash.

183 models · 3,095 responses · 43 tasks · 81 vs 101 camps · 2025-Q1 → 2026-Q1

Every model, by em-dash use

Em-dash heavy (81) · Em-dash sparse (101)

Each dot is one model's mean em-dash rate across 43 standardized tasks. 26 models use no em-dashes at all; a long heavy tail sits opposite. There is no "average" model here. There are two populations.
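
As a rough illustration of the quantity behind each dot, here is a minimal sketch of an em-dash rate per 100 words and a per-model mean. The word-splitting rule (whitespace tokens) and the function names are assumptions made for the example; the report does not state how words were counted.

```python
EM_DASH = "\u2014"  # written as an escape so this snippet contains no literal em-dash


def em_dash_rate(text: str) -> float:
    """Em-dashes per 100 whitespace-separated words (0.0 for empty text)."""
    words = text.split()  # assumption: a "word" is any whitespace-delimited token
    if not words:
        return 0.0
    return 100.0 * text.count(EM_DASH) / len(words)


def model_mean_rate(responses: list[str]) -> float:
    """Mean of the per-response rates for one model across its task responses."""
    if not responses:
        return 0.0
    return sum(em_dash_rate(r) for r in responses) / len(responses)
```

Averaging per-response rates, rather than pooling all words first, keeps a single long response from dominating a model's score. That is a design choice of this sketch, not a documented one.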

The headline

The gap is widening, fast

For every task, we measure how much models disagree on a feature: the cross-model spread. If writing were converging, that spread would shrink. For the em-dash it more than quadrupled in a single year.

+310% growth in the cross-model spread of em-dash use on identical tasks, from 2025-Q1 to 2026-Q1. The single most divergent feature in the entire fingerprint.

Within-task stddev: 0.61 → 2.50 em-dashes / 100 words.
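
The spread figure can be sketched as follows: group rates by task, take the standard deviation across models within each task, then average over tasks. Whether the report uses population or sample standard deviation, and how it aggregates across tasks, is not stated, so `pstdev` and the plain mean here are assumptions, as are the function names.

```python
from collections import defaultdict
from statistics import mean, pstdev


def within_task_spread(rows):
    """rows: iterable of (task_id, model_id, rate).

    Returns the mean, over tasks, of the cross-model standard deviation.
    Tasks answered by fewer than two models are skipped.
    """
    by_task = defaultdict(list)
    for task_id, _model_id, rate in rows:
        by_task[task_id].append(rate)
    spreads = [pstdev(rates) for rates in by_task.values() if len(rates) >= 2]
    return mean(spreads) if spreads else 0.0


def percent_growth(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


# The two endpoints quoted in the text: 0.61 and 2.50 em-dashes per 100 words.
print(round(percent_growth(0.61, 2.50)))  # 310
```

The same endpoints give a ratio of about 4.1, which is where "more than quadrupled" comes from.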

Em-dash disagreement, over time

As the band widens, the camps move apart. It opens early in 2025 and never closes.

The two camps

One mark, two populations

Split at 1.0 em-dashes per 100 words, the trough in the distribution, the catalog cleaves into a heavy camp averaging 2.8 and a sparse camp averaging 0.38. The line cuts across labs and through them: OpenAI sits firmly heavy, Claude 2/3 and the DeepSeek line use zero.
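
A minimal sketch of the split, assuming each model is reduced to a single mean rate. Which side a model sitting at exactly 1.0 falls on is an assumption, and the four sample rates are taken from the camp tables in this report.

```python
THRESHOLD = 1.0  # em-dashes per 100 words, the trough named in the text


def split_camps(rates: dict[str, float], threshold: float = THRESHOLD):
    """Partition models into (heavy, sparse) by mean em-dash rate.

    Assumption: a rate exactly at the threshold counts as heavy.
    """
    heavy = {m: r for m, r in rates.items() if r >= threshold}
    sparse = {m: r for m, r in rates.items() if r < threshold}
    return heavy, sparse


def camp_ratio(heavy_mean: float, sparse_mean: float) -> float:
    """How many times more em-dashes the heavy camp uses on average."""
    return heavy_mean / sparse_mean


sample = {
    "Kimi K2 0905": 9.38,
    "GPT OSS 120B": 4.63,
    "Claude 2": 0.00,
    "Gemini 1.5 Pro": 0.00,
}
heavy, sparse = split_camps(sample)
print(sorted(heavy))                     # ['GPT OSS 120B', 'Kimi K2 0905']
print(round(camp_ratio(2.8, 0.38), 1))   # 7.4, reported as 7x
```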

81 em-dash heavy · 101 em-dash sparse · 26 use zero em-dashes · 7× heavy vs sparse mean

The em-dash heavy camp

Model                     Lab           Per 100w
Kimi K2 0905              Moonshot      9.38
Mistral Small Creative    Mistral       6.58
Mercury                   Inception     6.54
Mercury 2                 Inception     6.22
GPT-5.3 Chat              OpenAI        6.03
OpenAI o3                 OpenAI        5.64
Qwen3 Max Thinking        Qwen          5.18
Aurora Alpha              OpenRouter    5.10
Solar Pro 3               Upstage       5.03
Mistral Large             Mistral       4.97
Mistral Large 3 2512      Mistral       4.64
GPT OSS 120B              OpenAI        4.63

The em-dash sparse / free camp

101 models

Model                             Lab           Per 100w
Claude 2                          Anthropic     0.00
Claude 3 Haiku                    Anthropic     0.00
Claude 3 Opus                     Anthropic     0.00
Claude 3 Sonnet                   Anthropic     0.00
Claude Sonnet 3.6 (2022-10-22)    Anthropic     0.00
Cypher Alpha (free)               OpenRouter    0.00
DeepSeek Prover V2                DeepSeek      0.00
DeepSeek R1 0528                  DeepSeek      0.00
Mistral Devstral Medium           Mistral       0.00
Mistral Devstral Small 1.1        Mistral       0.00
Gemini 1.5 Pro                    Google        0.00
Gemini 2.0 Flash Thinking         Google        0.00

Plot em-dash use against emoji use and the structure holds: the heavy camp spreads along its own axis, the sparse camp packs into the corner. Typographic personality is becoming a coordinate, not a consensus.

It's not just em-dashes

The whole typographic personality is fracturing


The em-dash is the loudest case, not the only one. Every feature below tilts up: models agree less in 2026-Q1 than they did in 2025-Q1.

Diverging features · within-task spread, 2025-Q1 → 2026-Q1

Paragraph length, inline code, emoji, semicolons and italics all fan out. The percentage is the real change in cross-model spread.

The honest other half

What they still agree on: pacing

Convergence is not entirely a myth; it just lives somewhere narrower than the headlines claim. Models are tightening up on rhythm: sentence length, its variance, ellipses, transitions and exclamation marks. They breathe the same way. They just dress differently.

Converging features · within-task spread, 2025-Q1 → 2026-Q1

Models converge on pacing and rhythm while diverging on punctuation and formatting personality. "AI all sounds the same" is true about cadence and false about style.

Show your work

The convergence mirage

So where does the "everything is converging" story come from? From not controlling for the task. Compare models by their overall fingerprint and convergence looks dramatic. Compare them on the same task and almost all of it evaporates.

~80% of the "AI is converging" signal is a measurement artifact of which tasks each model happened to answer, not a real shift in how models write.

Naive 29.3% convergence → 5.6% once you hold the task constant.

Convergence signal · raw vs task-controlled

Naive: −29.3% (comparing overall model fingerprints)
Controlled: −5.6% (same prompt, different model)

Hold the task constant and 29.3% convergence collapses to 5.6%. Even that residual is fragile.
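
The "~80%" figure follows directly from the two numbers above. A small worked check, with a function name chosen only for this example:

```python
def artifact_share(naive: float, controlled: float) -> float:
    """Fraction of the naive convergence signal that disappears under task control."""
    return (naive - controlled) / naive


share = artifact_share(29.3, 5.6)
print(f"{share:.1%}")  # 80.9%
```

So roughly four fifths of the naive signal is explained by task mix, and about one fifth (5.6 of 29.3 points) is the residual discussed next.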

The honest caveat

We will not oversell our own residual. That remaining 5.6% controlled convergence is weak and suggestive: the endpoint confidence intervals overlap and the trend is not monotonic. We label it WEAK_SUGGESTIVE, not proven. The robust finding is the opposite of the popular one: feature-level divergence, led by the em-dash, is large, consistent and easy to see.

Method

How we measured this

The trap in any "is AI converging?" question is that the set of tasks each model answered changed over time. Compare a 2026 model and a 2023 model on different prompts and you measure the prompts, not the era.

So every cross-model comparison here is computed on the identical task. For each response we find the nearest response to the same prompt by a different model, using a 27-dimension, globally z-normalized stylometric vector and Euclidean distance. Responses are bucketed by the author model's release date.
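
The matching step described above can be sketched in a few lines. This is an illustration under assumptions, not the report's pipeline: the record layout (`prompt`, `model`, `vec`), the function names, and the handling of constant dimensions are all invented for the example, and the toy vectors are 2-dimensional rather than 27.

```python
import math
from statistics import mean, pstdev


def z_normalize(vectors):
    """Globally z-normalize: per-dimension mean and std computed over ALL responses."""
    dims = list(zip(*vectors))
    mus = [mean(d) for d in dims]
    # Assumption: a constant dimension gets sigma 1.0 to avoid dividing by zero.
    sigmas = [pstdev(d) or 1.0 for d in dims]
    return [[(x - mu) / s for x, mu, s in zip(v, mus, sigmas)] for v in vectors]


def nearest_other_model(responses):
    """For each response, the Euclidean distance to the nearest response
    to the same prompt written by a different model (None if there is none).

    responses: list of dicts with keys 'prompt', 'model', 'vec'.
    """
    out = []
    for r in responses:
        dists = [
            math.dist(r["vec"], o["vec"])
            for o in responses
            if o["prompt"] == r["prompt"] and o["model"] != r["model"]
        ]
        out.append(min(dists) if dists else None)
    return out
```

Bucketing those distances by the author model's release quarter and comparing the bucket means over time would then give a task-controlled convergence signal. The quadratic scan is fine at 3,095 responses; a larger corpus would want an index per prompt.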

Cite this

Rival (2026). The Em-Dash Civil War: AI models are diverging, not homogenizing. 183 models, 3,095 responses, 43 tasks. rival.tips/research/em-dash-civil-war


This report was written with zero em-dashes. We picked a side.
