
Rival Research / Volume 04 / 2026

The Persona Impact Study

One model. One task. 52 system prompts. We held every other variable constant and measured how much a persona actually moves the needle on design output.

We picked a small open model on purpose: Gemma 4 31B. A frontier model produces competent output no matter what you put in the system prompt, which drowns out the persona signal. A smaller model leaves visible headroom, and headroom is what makes the effect measurable. The question is not whether Gemma is good. It is how much of the quality gap a well-designed persona can actually close.

Isolating the system-prompt effect

Every render below is Gemma 4 31B on the same landing-page brief. The only variable is the system prompt: no prompt, the highest-scoring persona we tested, and the lowest-scoring one. Scores are a rubric-weighted composite from three blinded judge waves, on a 0–10 scale.

Best ↗ · 8.77 (+1.70 vs. baseline) · Reference-pinned persona · “Build this like vercel.com”
Baseline · 7.07 (± 0) · No system prompt (the empty string)
Worst ↘ · 2.42 (−4.65 vs. baseline) · Reverse-psychology persona · “Make it bad on purpose”

One system prompt can move a small model 1.70 points above baseline or 4.65 points below it. The rest of this page is the 156 generations and 52 personas behind that range.

Personas tested: 52
Generations scored: 156
ANOVA F: 4.14
Effect size η²: 0.164
Krippendorff α (mean): 0.803
Judge waves: 3 × Opus 4.7
Winning bucket: Meta / structural
Top persona score: 8.54
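
The ANOVA F and η² figures can be cross-checked against each other, because in a one-way ANOVA η² is recoverable from F and the two degrees of freedom. The sketch below assumes the ANOVA treated each of the 156 generations as one observation grouped into the 8 buckets named in the method section, which gives 7 and 148 degrees of freedom. That unit of analysis is an inference, not something the page states, and `eta_squared_from_f` is a hypothetical helper name.

```python
def eta_squared_from_f(f_stat: float, df_between: int, df_within: int) -> float:
    """Eta squared (SS_between / SS_total) for a one-way ANOVA, expressed via F."""
    return (f_stat * df_between) / (f_stat * df_between + df_within)


# Assumption: observations are the 156 generations, groups are the 8 buckets.
n_buckets = 8
n_generations = 156
df_between = n_buckets - 1              # 7
df_within = n_generations - n_buckets   # 148

eta_sq = eta_squared_from_f(4.14, df_between, df_within)
print(round(eta_sq, 3))  # 0.164
```

Under that assumption the reported F of 4.14 reproduces the reported η² of 0.164.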

Headline finding

Rule-dense "design-cheat" prompts scored below baseline.

We loaded eight personas with state-of-the-art design heuristics (Refactoring UI rules, Tufte, WCAG AA, the Tailwind scale, 8-pt grids, modular type ratios). The expectation was that this bucket would dominate. Instead it averaged 6.93, below the 7.00 scored by the blank control.

What won were reasoning scaffolds (draft-critique-revise, few-shot exemplars), terse role assignments (Figma designer, Apple CPO), and reference-pinned prompts ("build this like vercel.com"). Taste beats rules. Reasoning beats cramming.

Bucket leaderboard · 95% bootstrap CIs

Meta / structural · 7.70
Classic role, expansive · 7.63
Classic role, short · 7.52
Masterclass (copy-ready) · 7.38
Baseline / control · 7.00
Production system prompt · 6.96
Design-cheat persona · 6.93
Adversarial / unhinged · 6.28
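
A bootstrap confidence interval on a bucket mean can be sketched as follows. This is a minimal sketch, assuming a percentile bootstrap over per-generation scores; the page gives the resample count (10,000) and the level (95%) but not the bootstrap variant. The input is the nine rounded per-generation scores of three personas that the ranking tables place in the "Classic role, short" bucket. That is only part of the bucket, so the interval will not match the published one. `bootstrap_ci` is a hypothetical helper name.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=10_000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lo_idx = int(tail * n_resamples)
    hi_idx = int((1 - tail) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Rounded per-generation scores from the full listing (partial bucket only).
classic_role_short = [
    8.3, 8.1, 8.0,  # Figma principal designer (short)
    8.0, 7.8, 7.6,  # Apple CPO (short)
    7.3, 6.9, 5.6,  # Brutalist designer, 20 yrs (short)
]

sample_mean = statistics.fmean(classic_role_short)
ci_low, ci_high = bootstrap_ci(classic_role_short)
print(f"mean={sample_mean:.2f}  95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

The fixed seed only makes the sketch repeatable; it does not reflect how the study seeded its resampling.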

Prompt length vs composite score · each dot is one persona

[Scatter plot. Axes: persona system-prompt length (characters); composite score]

Longer is not better. The scatter is flat-to-inverted: many of the top scores come from prompts under 400 characters, and the very longest prompts cluster near the middle.

Top 10 personas (rank · persona · bucket · σ · composite)
1 · Stripe SVP of Design (expansive) · Classic role, expansive · σ 0.15 · 8.54
2 · Reference-pinned prompt · Masterclass (copy-ready) · σ 0.41 · 8.30
3 · Figma principal designer (short) · Classic role, short · σ 0.18 · 8.14
4 · Vercel-style monochrome · Design-cheat persona · σ 0.33 · 8.13
5 · The self-correcting loop · Masterclass (copy-ready) · σ 0.05 · 7.97
6 · Brutalist web designer, 20 years in (expansive) · Classic role, expansive · σ 0.26 · 7.97
7 · Draft, critique, revise · Meta / structural · σ 0.37 · 7.87
8 · Apple CPO (short) · Classic role, short · σ 0.23 · 7.83
9 · v0 by Vercel style prompt · Production system prompt · σ 0.04 · 7.83
10 · Few-shot exemplar patterns · Masterclass (copy-ready) · σ 0.35 · 7.82

Bottom 10 personas (rank · persona · bucket · σ · composite)
1 · Reverse psychology, make it bad · Adversarial / unhinged · σ 0.13 · 2.55
2 · OpenAI ChatGPT-style system prompt · Production system prompt · σ 3.15 · 4.63
3 · Peaked-in-2003 purist · Adversarial / unhinged · σ 0.55 · 5.79
4 · Apple.com landing page template · Design-cheat persona · σ 0.47 · 5.82
5 · Safety and guardrails strict prompt · Production system prompt · σ 0.74 · 6.18
6 · Accessibility-first system prompt · Production system prompt · σ 0.52 · 6.35
7 · The structured checklist · Masterclass (copy-ready) · σ 0.34 · 6.49
8 · Brutalist designer, 20 yrs (short) · Classic role, short · σ 0.90 · 6.62
9 · Tailwind scale discipline · Design-cheat persona · σ 0.28 · 6.66
10 · Spacing as design · Design-cheat persona · σ 0.32 · 6.70

Ranked by composite · pulled from the full 52-persona pool

The top 7 prompts

These are the seven highest-scoring personas out of the 52 we tested. They average 8.13 and span 5 different buckets, which is the more interesting finding: there is no single prompt shape that wins. Terse role assignments, expansive personas, reasoning scaffolds, and a reference-pinned one-liner all make it into the top seven. Each card below shows that persona's best of three samples on the same landing-page brief.
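
The 8.13 average and the five-bucket spread can be reproduced from the ranking table. The short check below uses only the persona names, buckets, and composites listed on this page; the variable names are illustrative.

```python
from statistics import fmean

# (persona, bucket, composite) for the top seven, copied from the ranking table.
top_seven = [
    ("Stripe SVP of Design (expansive)", "Classic role, expansive", 8.54),
    ("Reference-pinned prompt", "Masterclass (copy-ready)", 8.30),
    ("Figma principal designer (short)", "Classic role, short", 8.14),
    ("Vercel-style monochrome", "Design-cheat persona", 8.13),
    ("The self-correcting loop", "Masterclass (copy-ready)", 7.97),
    ("Brutalist web designer, 20 years in (expansive)", "Classic role, expansive", 7.97),
    ("Draft, critique, revise", "Meta / structural", 7.87),
]

mean_composite = round(fmean([score for _, _, score in top_seven]), 2)
distinct_buckets = {bucket for _, bucket, _ in top_seven}
print(mean_composite, len(distinct_buckets))  # 8.13 5
```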

Stripe SVP of Design (expansive)

composite 8.54 · n=3 · σ=0.15

Tests whether embedding Stripe-specific typographic heuristics and forbidden words produces measurably tighter editorial craft than the one-line Stripe role.

Best sample from this prompt: sample 2 (best of 3) · 8.70

Reference-pinned prompt

composite 8.30 · n=3 · σ=0.41

Pins the model's taste to specific, nameable sites it has seen in training, replacing vague style words with concrete reference behavior.

Best sample from this prompt: sample 2 (best of 3) · 8.77

Figma principal designer (short)

composite 8.14 · n=3 · σ=0.18

Tests whether a modern-design identity biases output toward contemporary layout and type.

Best sample from this prompt: sample 2 (best of 3) · 8.33

Vercel-style monochrome

composite 8.13 · n=3 · σ=0.33

Encodes the Vercel and Linear visual language as a mechanical ruleset so the model can produce that restrained, confident aesthetic by lookup.

Best sample from this prompt: sample 2 (best of 3) · 8.48

The self-correcting loop

composite 7.97 · n=3 · σ=0.05

Forces the model to draft, honestly score, commit to a single fix, and rewrite, turning one-shot output into a two-pass process.

Best sample from this prompt: sample 2 (best of 3) · 8.02

Brutalist web designer, 20 years in (expansive)

composite 7.97 · n=3 · σ=0.26

Tests whether a contrarian persona with a specific aesthetic lineage produces visually distinct output that resists the default SaaS template pull.

Best sample from this prompt: sample 2 (best of 3) · 8.27

Draft, critique, revise

composite 7.87 · n=3 · σ=0.37

Tests whether forcing a self-administered rubric-based QA pass before the final render improves output quality independent of any persona.

Best sample from this prompt: sample 3 (best of 3) · 8.13

Every generation · 156 rendered landing pages · sorted by composite

Reference-pinned prompt · 8.8
Stripe SVP of Design (expansive) · 8.7
Stripe SVP of Design (expansive) · 8.5
Vercel-style monochrome · 8.5
Stripe SVP of Design (expansive) · 8.4
Figma principal designer (short) · 8.3
Lovable/Bolt-style full-app shipper · 8.3
Brutalist web designer, 20 years in (expansive) · 8.3
Few-shot exemplar patterns · 8.2
Draft, critique, revise · 8.1
Few-shot with 2 exemplar patterns · 8.1
Reference-pinned prompt · 8.1
Figma principal designer (short) · 8.1
Vercel-style monochrome · 8.1
Stripe SVP Design (short) · 8.1
Webflow expert (short) · 8.1
Apple CPO (short) · 8.0
Chain-of-thought forcing · 8.0
Stripe SVP Design (short) · 8.0
Draft, critique, revise · 8.0
Reference-pinned prompt · 8.0
The self-correcting loop · 8.0
Figma principal designer (short) · 8.0
The self-correcting loop · 8.0
Type system first · 7.9
Jony Ive (expansive) · 7.9
The self-correcting loop · 7.9
Last project before retirement · 7.9
v0 by Vercel style prompt · 7.9
Apple CPO (short) · 7.8
Brutalist web designer, 20 years in (expansive) · 7.8
Jony Ive (short) · 7.8
Webflow expert (short) · 7.8
Jony Ive (expansive) · 7.8
v0 by Vercel style prompt · 7.8
Last project before retirement · 7.8
Vercel-style monochrome · 7.8
Chain-of-thought forcing · 7.8
v0 by Vercel style prompt · 7.8
Conversion-first agency prompt · 7.8
Brutalist web designer, 20 years in (expansive) · 7.8
Explain every choice · 7.8
The Design Constitution · 7.7
Few-shot exemplar patterns · 7.7
Senior Webflow expert, premium marketing sites (expansive) · 7.7
Self-consistency sampling (encoded) · 7.7
Tufte / data-ink principles for marketing · 7.7
Design-system / component-library prompt · 7.6
Conversion-first agency prompt · 7.6
Conversion-first agency prompt · 7.6
Anthropic-style constitutional prompt · 7.6
Apple CPO (short) · 7.6
Design-system / component-library prompt · 7.6
Anthropic-style constitutional prompt · 7.6
Lovable/Bolt-style full-app shipper · 7.6
Few-shot exemplar patterns · 7.6
Design-system / component-library prompt · 7.5
Jony Ive (short) · 7.5
Last project before retirement · 7.5
Refactoring UI rules · 7.5
$10M contingent payout · 7.5
Few-shot with 2 exemplar patterns · 7.5
Self-consistency sampling (encoded) · 7.5
Apple CPO (expansive) · 7.5
Self-consistency sampling (encoded) · 7.5
Draft, critique, revise · 7.4
Jony Ive (expansive) · 7.4
Figma principal designer (expansive) · 7.4
The anti-pattern ban list · 7.4
$10M contingent payout · 7.4
Cursor-style code quality prompt · 7.4
CRO specialist (short) · 7.4
Few-shot with 2 exemplar patterns · 7.4
Explain every choice · 7.4
Webflow expert (short) · 7.4
Jony Ive (short) · 7.4
Senior frontend engineer (short) · 7.4
Figma principal designer (expansive) · 7.3
Chain-of-thought forcing · 7.3
CRO specialist (short) · 7.3
Type system first · 7.3
Brutalist designer, 20 yrs (short) · 7.3
Anthropic-style constitutional prompt · 7.3
Senior frontend engineer (short) · 7.3
Brand voice and copy system prompt · 7.3
Apple CPO (expansive) · 7.3
Senior frontend engineer (short) · 7.3
Minimal task-aware assistant · 7.3
Brand voice and copy system prompt · 7.2
Minimal polite assistant · 7.2
Cursor-style code quality prompt · 7.2
The anti-pattern ban list · 7.2
The AI Steve Jobs would have fired · 7.2
Minimal task-aware assistant · 7.1
Cursor-style code quality prompt · 7.1
Empty system prompt · 7.1
Minimal polite assistant · 7.1
Explain every choice · 7.1
The design token contract · 7.1
Stripe SVP Design (short) · 7.0
Lovable/Bolt-style full-app shipper · 7.0
Type system first · 7.0
Spacing as design · 7.0
The AI Steve Jobs would have fired · 7.0
Tailwind scale discipline · 7.0
$10M contingent payout · 7.0
Refactoring UI rules · 7.0
Senior Webflow expert, premium marketing sites (expansive) · 6.9
Safety and guardrails strict prompt · 6.9
Brutalist designer, 20 yrs (short) · 6.9
Figma principal designer (expansive) · 6.9
Tufte / data-ink principles for marketing · 6.9
Empty system prompt · 6.9
The design token contract · 6.9
Color discipline · 6.9
The anti-pattern ban list · 6.9
Minimal polite assistant · 6.9
CRO specialist (short) · 6.9
Color discipline · 6.9
Apple CPO (expansive) · 6.8
Senior Webflow expert, premium marketing sites (expansive) · 6.8
Accessibility-first system prompt · 6.8
The Design Constitution · 6.8
Empty system prompt · 6.8
The structured checklist · 6.8
Spacing as design · 6.8
The Design Constitution · 6.7
Minimal task-aware assistant · 6.7
Tailwind scale discipline · 6.6
Color discipline · 6.6
The structured checklist · 6.6
The AI Steve Jobs would have fired · 6.6
OpenAI ChatGPT-style system prompt · 6.5
The design token contract · 6.5
Brand voice and copy system prompt · 6.5
Refactoring UI rules · 6.5
Accessibility-first system prompt · 6.5
Tailwind scale discipline · 6.4
Peaked-in-2003 purist · 6.4
OpenAI ChatGPT-style system prompt · 6.4
Spacing as design · 6.3
Tufte / data-ink principles for marketing · 6.2
Safety and guardrails strict prompt · 6.2
Apple.com landing page template · 6.1
The structured checklist · 6.1
Apple.com landing page template · 6.1
Accessibility-first system prompt · 5.8
Peaked-in-2003 purist · 5.7
Brutalist designer, 20 yrs (short) · 5.6
Safety and guardrails strict prompt · 5.5
Peaked-in-2003 purist · 5.3
Apple.com landing page template · 5.3
Reverse psychology, make it bad · 2.7
Reverse psychology, make it bad · 2.6
Reverse psychology, make it bad · 2.4
OpenAI ChatGPT-style system prompt · 1.0
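
The per-persona composite and σ in the ranking tables can be approximated from this listing. The sketch below takes the three generations listed for the OpenAI ChatGPT-style system prompt (6.5, 6.4, 1.0) and computes their mean and sample standard deviation. Treating σ as the n−1 sample standard deviation is an inference from the numbers, not something the page states. Because the listing shows scores rounded to one decimal, other personas may differ from their published composites by a few hundredths.

```python
from statistics import mean, stdev

# The three listed generations for "OpenAI ChatGPT-style system prompt".
chatgpt_style_samples = [6.5, 6.4, 1.0]

composite = round(mean(chatgpt_style_samples), 2)
sigma = round(stdev(chatgpt_style_samples), 2)  # sample (n - 1) standard deviation
print(composite, sigma)  # 4.63 3.15
```

Both values match the bottom-10 table entry for this persona (composite 4.63, σ 3.15). A single collapsed generation at 1.0 accounts for almost all of that spread.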


Method

How the study was run

Model
Gemma 4 31B Instruct via OpenRouter. Paid primary, free fallback. Temperature 0.7, max tokens 8192.
Task
One fixed prompt: build a single-file HTML landing page for a fictional luxury-real-estate CRM called Keystone. 8 required sections. Inline styles only. No external assets.
Independent variable
Persona / system prompt only. 52 personas across 8 buckets. 3 samples per persona = 156 total generations.
Judging
Three independent blinded waves. Each wave consisted of 33 Claude Opus 4.7 agents, each scoring ~15 responses on the 6-axis anchored Likert rubric. Judges never saw the persona label. Filenames were opaque SHA256 hashes. Scripts, meta tags, and HTML comments were stripped before the judges read each file.
Stats
One-way ANOVA across buckets. 10,000-sample bootstrap 95% CIs on every mean. Cohen's d for pairwise bucket comparisons. Krippendorff's α (interval) across the three judge waves, per axis. Mean α across axes: 0.803.
Reproducibility
All 156 raw HTML files, rendered screenshots, per-judge scores, and per-response metric breakdowns are checked into the repo. Anyone can re-run the pipeline or re-judge the dataset.
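
The blinding step described under Judging (opaque SHA256 filenames, with scripts, meta tags, and HTML comments stripped) could look roughly like the sketch below. This is not the study's pipeline: the function names are hypothetical, the choice to hash the stripped HTML content is an assumption (the page says only that filenames were SHA256 hashes), and regex stripping is a crude stand-in for a real HTML parser.

```python
import hashlib
import re

# Crude regex-based stripping; a production pipeline would likely use an HTML parser.
_SCRIPT = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_META = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)


def blind_html(html: str) -> str:
    """Remove scripts, HTML comments, and meta tags that could leak the persona."""
    for pattern in (_SCRIPT, _COMMENT, _META):
        html = pattern.sub("", html)
    return html


def opaque_filename(html: str) -> str:
    """Assumption: the filename is the SHA256 hex digest of the blinded content."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest() + ".html"


raw = (
    '<html><head><meta name="generator" content="persona-17">'
    "<script>var p = 1;</script></head>"
    "<body><!-- persona: stripe --><h1>Keystone</h1></body></html>"
)
blinded = blind_html(raw)
filename = opaque_filename(blinded)
print(blinded)  # <html><head></head><body><h1>Keystone</h1></body></html>
```

Hashing the content rather than the persona label keeps the filename stable across re-runs while carrying no readable hint of which prompt produced the page.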

Citation

Rival Research. The Persona Impact Study. 2026. rival.tips/research/persona-impact. Dataset: persona-impact-2026.jsonl.