
SubjectiveBench

Does it have taste?

Every other benchmark measures whether AI is smart. This one measures whether it has taste. One uncapped score, judged by humans, originality first. Turns out most models draw the same seagull.

SubjectiveBench v1 · 10,758 outputs · 226 models · updated June 2026

Independent. We sell nothing to the labs we rank.

[Chart: every scored model on one axis running from 0 to 160+, with the reference line at 100, the leader at 70, and a band labeled headroom.]
Every scored model, on one uncapped scale. The field clusters low. The space past 100 is the taste nobody has shown yet.
58 · points the whole field fits in
70 · top model
21 · median model
30 · points of headroom to 100
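Two of these figures can be recomputed from the leaderboard. Headroom is 100 minus the top index (100 minus 70 is 30). With 226 models, the median is the mean of the 113th and 114th scores, which are both 21. The helper below is a hypothetical sketch of that arithmetic; `field_summary` is an illustrative name, not part of SubjectiveBench.

```python
from statistics import median


def field_summary(indices: list[float], reference: float = 100.0) -> dict[str, float]:
    """Hypothetical helper: top score, median score, and headroom to the reference."""
    if not indices:
        raise ValueError("need at least one model score")
    top = max(indices)
    return {
        "top": top,
        # statistics.median averages the two middle values when the count is even
        "median": median(indices),
        "headroom": reference - top,
    }


# Illustrative scores only, not the full leaderboard.
print(field_summary([70, 49, 44, 21, 21, 0]))
# {'top': 70, 'median': 32.5, 'headroom': 30.0}
```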

The scale

100 is the reference. Nothing reaches it yet.

0 is no taste. 100 is the reference: genuinely original, tasteful work. No model reaches it yet, and the scale runs past it because real taste has further to go.
Originality first. A polished, generic answer scores low. We are not grading homework; we are looking for a point of view.
Every score is relative to one frozen reference. New models never move the 100 line; they just land above or below it.
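One way to read "relative to one frozen reference" is as a ratio against a constant. The sketch below assumes each output gets a raw judge score and that the reference's raw score is stored once and never recomputed. SubjectiveBench does not publish its formula, so `taste_index` and `REFERENCE_RAW` are hypothetical.

```python
REFERENCE_RAW = 8.0  # hypothetical: raw judge score of the frozen reference, stored once


def taste_index(raw_score: float, reference_raw: float = REFERENCE_RAW) -> float:
    """Hypothetical anchored index: the reference maps to exactly 100.

    No clamp is applied, so scores above the reference exceed 100.
    """
    if reference_raw <= 0:
        raise ValueError("reference_raw must be positive")
    if raw_score < 0:
        raise ValueError("raw_score cannot be negative")
    return 100.0 * raw_score / reference_raw


print(taste_index(4.0))   # 50.0, half the reference
print(taste_index(8.0))   # 100.0, matches the reference
print(taste_index(12.0))  # 150.0, past the reference: the scale is uncapped
```

Because the denominator is a fixed constant rather than the current best model, adding models cannot move the 100 line. Normalizing to the leader instead would pin the top model at 100 and shift every score whenever a new model joined.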
# | Model | n · craft · orig | Index
Frontier
1 | Claude Fable 5 | n=56 · craft 69 · orig 69 | 70
Capable
2 | OpenRouter Fusion · Quality (Jun 2026) | n=58 · craft 59 · orig 43 | 49
=3 | GPT-5.4 Pro | n=18 · craft 52 · orig 39 | 44
=3 | OpenRouter Fusion · Budget (Jun 2026) | n=49 · craft 54 · orig 39 | 44
=3 | Claude Opus 4.7 | n=57 · craft 51 · orig 38 | 43
=3 | NVIDIA: Nemotron 3 Ultra | n=58 · craft 52 · orig 37 | 42
=3 | Claude Opus 4.6 | n=56 · craft 52 · orig 33 | 41
=3 | Z.ai: GLM 5.2 | n=58 · craft 50 · orig 37 | 41
Generic
=9 | MiniMax M3 | n=57 · craft 50 · orig 33 | 39
=9 | DeepSeek V4 Pro | n=58 · craft 51 · orig 30 | 39
=9 | Polaris Alpha | n=35 · craft 47 · orig 33 | 38
=9 | Kimi K2.6 | n=58 · craft 47 · orig 34 | 38
=9 | GLM 5 Turbo | n=53 · craft 49 · orig 33 | 38
=9 | GPT-5.5 | n=58 · craft 50 · orig 30 | 38
=9 | Z.ai: GLM 5.1 | n=57 · craft 49 · orig 31 | 38
=9 | Claude Opus 4.8 | n=57 · craft 48 · orig 30 | 37
=9 | Qwen: Qwen3.7 Max | n=58 · craft 50 · orig 29 | 36
=9 | Gemini 3.5 Flash | n=58 · craft 49 · orig 28 | 36
=9 | Claude Sonnet 4.6 | n=52 · craft 49 · orig 26 | 36
=9 | Kimi K2.7 Code | n=58 · craft 48 · orig 28 | 36
=9 | DeepSeek V4 Flash | n=58 · craft 47 · orig 28 | 35
=9 | Gemini 3.1 Pro Preview | n=53 · craft 47 · orig 27 | 35
=9 | Hunter Alpha | n=38 · craft 45 · orig 28 | 34
=9 | Pony Alpha | n=47 · craft 43 · orig 28 | 34
=9 | GPT-5.4 | n=53 · craft 48 · orig 25 | 33
=9 | Qwen: Qwen3.6 Max Preview | n=53 · craft 47 · orig 25 | 33
=9 | Kimi K2.5 | n=58 · craft 41 · orig 28 | 33
=9 | GPT-5.3-Codex | n=53 · craft 45 · orig 25 | 33
=9 | Z.ai: GLM 5 | n=53 · craft 42 · orig 27 | 32
=9 | Gemini 3 Pro Preview | n=51 · craft 43 · orig 25 | 32
=9 | Qwen: Qwen3.5 Plus 2026-04-20 | n=56 · craft 45 · orig 24 | 31
=9 | Grok 4.20 Multi-Agent Beta | n=53 · craft 43 · orig 25 | 31
=9 | Ling 2.6 1T | n=58 · craft 41 · orig 26 | 31
=9 | Qwen: Qwen3.6 Plus Preview (free) | n=58 · craft 44 · orig 23 | 31
=9 | Qwen: Qwen3.6 27B | n=55 · craft 44 · orig 24 | 31
=9 | Kimi K2 | n=59 · craft 39 · orig 27 | 31
=9 | MiMo-V2.5-Pro | n=57 · craft 44 · orig 23 | 31
=9 | Horizon Beta | n=41 · craft 44 · orig 23 | 31
=9 | Claude Opus 4.5 | n=58 · craft 42 · orig 23 | 31
=9 | xAI: Grok 4.3 | n=58 · craft 42 · orig 24 | 30
=9 | GPT-5 | n=53 · craft 38 · orig 25 | 30
=9 | Healer Alpha | n=47 · craft 42 · orig 25 | 30
=9 | Qwen: Qwen3.7 Plus | n=58 · craft 42 · orig 23 | 30
=9 | Horizon Alpha | n=34 · craft 44 · orig 23 | 30
=9 | MiMo-V2-Pro | n=58 · craft 41 · orig 23 | 30
=9 | Claude Sonnet 4.5 | n=43 · craft 42 · orig 22 | 30
=9 | Gemini 3 Flash Preview | n=59 · craft 40 · orig 23 | 30
=9 | MoonshotAI: Kimi K2 0905 | n=59 · craft 37 · orig 25 | 29
=9 | MiMo-V2.5 | n=58 · craft 40 · orig 22 | 29
=9 | GPT-5.2 | n=53 · craft 41 · orig 21 | 28
=9 | Qwen: Qwen3.6 Flash | n=58 · craft 41 · orig 20 | 28
=9 | Qwen: Qwen3.5 397B A17B | n=53 · craft 39 · orig 21 | 28
=9 | Claude 3.7 Thinking Sonnet | n=59 · craft 39 · orig 21 | 28
=9 | GPT-5.2 Pro | n=31 · craft 37 · orig 23 | 27
=9 | Google: Gemma 4 26B A4B | n=58 · craft 38 · orig 21 | 27
=9 | Gemini 2.5 Pro Preview 06-05 | n=44 · craft 37 · orig 20 | 27
=9 | Qwen: Qwen3.6 35B A3B | n=58 · craft 41 · orig 20 | 27
=9 | Z.AI: GLM 4.7 | n=58 · craft 36 · orig 22 | 27
=9 | gemini-2-5-pro-preview-05-06 | n=1 · craft 35 · orig 20 | 27
=9 | Qwen3 Coder Next | n=53 · craft 39 · orig 20 | 27
=9 | GPT-5.4 Nano | n=53 · craft 40 · orig 21 | 27
=9 | Grok 4.20 Beta | n=53 · craft 37 · orig 21 | 27
=9 | Grok 4.1 Fast | n=59 · craft 38 · orig 20 | 27
=9 | Seed 2.0 Lite | n=52 · craft 36 · orig 21 | 26
=9 | MiMo-V2-Omni | n=53 · craft 36 · orig 20 | 26
=9 | DeepSeek V3.2 Exp | n=54 · craft 37 · orig 19 | 26
=9 | Qwen: Qwen3 Max Thinking | n=58 · craft 36 · orig 20 | 25
=9 | Z.AI: GLM 4.6 | n=59 · craft 37 · orig 18 | 25
=9 | Qwen: Qwen3.5 122B A10B | n=53 · craft 37 · orig 19 | 25
=9 | GPT-5 Mini | n=59 · craft 33 · orig 20 | 25
=9 | MiMo-V2-Flash | n=59 · craft 35 · orig 19 | 25
=9 | Gemini 2.5 Pro (I/O Edition) | n=42 · craft 35 · orig 18 | 25
=9 | Qwen: Qwen3.5 27B | n=53 · craft 35 · orig 18 | 25
=9 | OpenAI o3 | n=58 · craft 33 · orig 20 | 25
=9 | NVIDIA Nemotron 3 Super (free) | n=53 · craft 34 · orig 19 | 25
=9 | Google: Gemma 4 31B | n=49 · craft 35 · orig 18 | 25
=9 | Claude Opus 4 | n=58 · craft 36 · orig 17 | 25
=9 | GPT-5.4 Mini | n=53 · craft 39 · orig 17 | 25
=9 | Qwen: Qwen3 235B A22B Thinking 2507 | n=59 · craft 35 · orig 18 | 24
=9 | Qwen Plus 0728 (thinking) | n=37 · craft 37 · orig 18 | 24
=9 | MiniMax: MiniMax M2.1 | n=59 · craft 34 · orig 18 | 24
=9 | Claude Haiku 4.5 | n=54 · craft 35 · orig 18 | 24
=9 | Qwen: Qwen3.5 Plus 2026-02-15 | n=53 · craft 37 · orig 17 | 24
=9 | qwen3-30b-a3b-thinking | n=1 · craft 36 · orig 14 | 24
=9 | DeepSeek V3.1 | n=54 · craft 35 · orig 18 | 24
=9 | Google: Gemini 2.5 Flash Preview 09-2025 | n=51 · craft 34 · orig 18 | 24
=9 | Ring 2.6 1T | n=43 · craft 36 · orig 17 | 24
=9 | MiniMax M2 | n=35 · craft 32 · orig 18 | 24
=9 | Claude Sonnet 4 | n=58 · craft 35 · orig 17 | 24
=9 | GPT-5.1 | n=52 · craft 34 · orig 18 | 24
=9 | Claude 3.7 Sonnet | n=60 · craft 34 · orig 16 | 23
=9 | Sherlock Think Alpha | n=34 · craft 35 · orig 17 | 23
=9 | GPT-5.3 Chat | n=53 · craft 32 · orig 17 | 23
=9 | Owl Alpha | n=58 · craft 35 · orig 18 | 23
=9 | GPT-5 Codex | n=50 · craft 33 · orig 17 | 23
=9 | Optimus Alpha | n=19 · craft 34 · orig 16 | 23
=9 | Kimi K2 Thinking | n=58 · craft 30 · orig 18 | 23
=9 | Qwen: Qwen3 Max | n=59 · craft 35 · orig 16 | 23
=9 | MiniMax M2.7 | n=53 · craft 35 · orig 16 | 23
=9 | GPT-4.1 | n=59 · craft 35 · orig 16 | 23
=9 | Golden Gate Claude | n=12 · craft 17 · orig 35 | 23
=9 | Claude Opus 4.1 | n=57 · craft 34 · orig 15 | 22
=9 | Qwen3 Coder | n=59 · craft 33 · orig 16 | 22
=9 | Qwen3 Next 80B A3B Instruct | n=59 · craft 33 · orig 17 | 22
=9 | Qwen: Qwen3.5 35B A3B | n=53 · craft 34 · orig 15 | 22
=9 | Gemini 2.5 Pro Experimental | n=44 · craft 33 · orig 16 | 22
=9 | GPT-5 Pro | n=42 · craft 28 · orig 18 | 22
=9 | MiniMax M2.5 | n=53 · craft 31 · orig 16 | 22
=9 | Z.AI: GLM 4.5 | n=59 · craft 33 · orig 15 | 21
=9 | Elephant Alpha | n=58 · craft 31 · orig 15 | 21
=9 | GLM 4.7 Flash | n=58 · craft 29 · orig 17 | 21
=9 | Qwen: Qwen3.5 Flash | n=53 · craft 31 · orig 15 | 21
=9 | Qwen3 Coder Plus | n=59 · craft 31 · orig 15 | 21
=9 | Mistral Large 3 2512 | n=59 · craft 33 · orig 14 | 21
=9 | Qwen: Qwen3 235B A22B 2507 | n=36 · craft 34 · orig 14 | 21
=9 | Qwen3 Next 80B A3B Thinking | n=58 · craft 29 · orig 16 | 21
=9 | Gemini 2.0 Pro Experimental | n=22 · craft 28 · orig 15 | 21
=9 | Z.AI: GLM 4 32B | n=58 · craft 31 · orig 16 | 21
=9 | Qwen Plus 0728 | n=59 · craft 32 · orig 15 | 20
=9 | GPT-5.1 Codex Max | n=53 · craft 28 · orig 15 | 20
=9 | xAI: Grok 4 | n=57 · craft 31 · orig 15 | 20
=9 | Google: Gemini 3.1 Flash Lite Preview | n=53 · craft 28 · orig 15 | 20
=9 | Mistral Medium 3.1 | n=59 · craft 32 · orig 14 | 20
=9 | Claude Sonnet 3.6 (2022-10-22) | n=59 · craft 29 · orig 14 | 20
=9 | Google: Gemini 3.1 Flash Lite | n=58 · craft 29 · orig 14 | 20
=9 | Inception: Mercury 2 | n=53 · craft 30 · orig 13 | 19
=9 | MiniMax M1 | n=58 · craft 28 · orig 15 | 19
=9 | GPT-4.1 Mini | n=59 · craft 31 · orig 13 | 19
=9 | Mistral Large 2 | n=23 · craft 31 · orig 14 | 19
=9 | Qwen3 30B A3B Thinking 2507 | n=58 · craft 30 · orig 14 | 19
=9 | Mistral Small 4 | n=53 · craft 31 · orig 13 | 19
=9 | Gemini 2.5 Flash Preview 05-20 (thinking) | n=12 · craft 28 · orig 15 | 19
=9 | DeepSeek V3.2 | n=59 · craft 29 · orig 14 | 19
=9 | Qwen3 235B A22B | n=58 · craft 30 · orig 14 | 19
=9 | Bert-Nebulon Alpha | n=35 · craft 29 · orig 14 | 19
=9 | ChatGPT-4o (March 2025) | n=33 · craft 31 · orig 12 | 19
=9 | DeepSeek R1 | n=59 · craft 29 · orig 14 | 19
=9 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | n=59 · craft 30 · orig 13 | 19
=9 | GPT-5.1-Codex | n=53 · craft 28 · orig 14 | 19
=9 | Grok 3 Beta | n=59 · craft 30 · orig 13 | 19
=9 | Sherlock Dash Alpha | n=34 · craft 29 · orig 14 | 19
=9 | Ling 2.6 Flash | n=58 · craft 28 · orig 15 | 19
=9 | Gemini 2.5 Flash Preview | n=29 · craft 27 · orig 13 | 18
=9 | Qwen: Qwen3 30B A3B Instruct 2507 | n=58 · craft 30 · orig 13 | 18
=9 | Qwen3.5 9B | n=55 · craft 28 · orig 13 | 18
=9 | Grok 3 | n=57 · craft 28 · orig 13 | 18
=9 | ERNIE 4.5 300B A47B | n=58 · craft 30 · orig 12 | 18
=9 | DeepSeek V3 (March 2024) | n=59 · craft 30 · orig 12 | 18
=9 | GPT-5.1-Codex-Mini | n=59 · craft 26 · orig 14 | 18
=9 | OpenAI Codex Mini | n=9 · craft 25 · orig 14 | 18
=9 | Mistral Large | n=59 · craft 29 · orig 13 | 18
=9 | xAI: Grok 4 Fast (free) | n=35 · craft 27 · orig 13 | 18
=9 | OpenAI o4 Mini High | n=55 · craft 26 · orig 13 | 18
=9 | Z.AI: GLM 4.5 Air | n=58 · craft 28 · orig 13 | 18
=9 | Gemini 2.5 Flash Preview 05-20 | n=10 · craft 27 · orig 11 | 18
=9 | Gemini 2.5 Flash Preview (thinking) | n=20 · craft 26 · orig 12 | 18
=9 | Mistral Small Creative | n=59 · craft 30 · orig 12 | 18
=9 | Sonoma Dusk Alpha | n=34 · craft 29 · orig 12 | 18
=9 | GPT OSS 120B | n=54 · craft 26 · orig 13 | 17
=9 | Sonar Pro Search | n=52 · craft 26 · orig 13 | 17
=9 | Kimi Linear 48B A3B Instruct | n=35 · craft 26 · orig 14 | 17
=9 | Grok 3 Mini Beta | n=13 · craft 27 · orig 15 | 17
=9 | DeepSeek R1 0528 | n=58 · craft 27 · orig 12 | 17
=9 | Trinity Large Preview | n=40 · craft 27 · orig 13 | 17
=9 | GPT-4o (Omni) | n=54 · craft 27 · orig 12 | 17
=9 | o1 | n=59 · craft 23 · orig 13 | 17
=9 | Gemini 2.0 Flash Thinking | n=22 · craft 27 · orig 11 | 17
=9 | Grok Code Fast 1 | n=60 · craft 25 · orig 13 | 17
=9 | Gemma 3 27B | n=60 · craft 25 · orig 13 | 17
=9 | Aurora Alpha | n=53 · craft 26 · orig 11 | 17
=9 | Mistral Medium 3 | n=59 · craft 28 · orig 11 | 17
=9 | TNG R1T Chimera | n=48 · craft 26 · orig 12 | 16
=9 | Sonoma Sky Alpha | n=35 · craft 26 · orig 11 | 16
=9 | QwQ 32B | n=16 · craft 21 · orig 14 | 16
=9 | OpenAI o4-mini | n=59 · craft 23 · orig 12 | 16
=9 | GPT-5.2 Chat | n=53 · craft 24 · orig 11 | 16
=9 | Qwen3 30B A3B | n=59 · craft 25 · orig 12 | 16
=9 | Llama 4 Maverick | n=57 · craft 24 · orig 12 | 16
=9 | Mistral: Devstral 2 2512 | n=35 · craft 25 · orig 11 | 16
=9 | Inception: Mercury | n=59 · craft 24 · orig 12 | 15
=9 | Gemini 2.5 Flash Lite Preview 06-17 | n=11 · craft 22 · orig 11 | 15
=9 | Qwen3 Coder Flash | n=59 · craft 25 · orig 11 | 15
=9 | Amazon Nova 2 Lite | n=35 · craft 24 · orig 11 | 15
=9 | PaLM 2 Chat | n=22 · craft 19 · orig 14 | 15
=9 | GPT-4.5 | n=26 · craft 22 · orig 11 | 15
=9 | GPT-4.1 Nano | n=55 · craft 21 · orig 11 | 14
=9 | Claude 2 | n=5 · craft 23 · orig 11 | 14
=9 | Llama 4 Scout | n=58 · craft 21 · orig 11 | 14
=9 | Gemini 1.5 Pro | n=20 · craft 21 · orig 11 | 14
=9 | INTELLECT-3 | n=59 · craft 25 · orig 9 | 14
=9 | o3 Mini | n=59 · craft 20 · orig 10 | 14
=9 | Quasar Alpha | n=6 · craft 18 · orig 9 | 14
=9 | Claude 3 Opus | n=12 · craft 17 · orig 12 | 14
=9 | DeepSeek V3.2 Speciale | n=55 · craft 21 · orig 9 | 13
=9 | Llama 3.1 70B (Instruct) | n=56 · craft 20 · orig 11 | 13
=9 | GPT-5.1 Chat | n=53 · craft 19 · orig 11 | 13
=9 | Claude 3 Haiku | n=58 · craft 20 · orig 10 | 13
=9 | qwen3-30b-a3b-instruct | n=1 · craft 29 · orig 8 | 13
=9 | Gemini Pro 1.0 | n=28 · craft 19 · orig 11 | 13
=9 | Solar Pro 3 | n=38 · craft 22 · orig 10 | 13
=9 | GPT-4 | n=26 · craft 20 · orig 9 | 13
=9 | Gemma 3 12B | n=60 · craft 21 · orig 9 | 13
=9 | GPT OSS 20B | n=54 · craft 19 · orig 9 | 12
=9 | Nova Premier 1.0 | n=51 · craft 21 · orig 8 | 12
=9 | Mistral Devstral Medium | n=59 · craft 20 · orig 8 | 12
=9 | Grok 3 Thinking | n=14 · craft 23 · orig 7 | 12
=9 | Andromeda Alpha | n=31 · craft 19 · orig 10 | 12
=9 | GPT-4o mini | n=55 · craft 20 · orig 8 | 12
=9 | Gemma 3n 4B | n=57 · craft 19 · orig 8 | 12
=9 | NVIDIA Nemotron Nano 9B V2 | n=59 · craft 17 · orig 9 | 12
=9 | GPT-5 Nano | n=59 · craft 16 · orig 9 | 11
=9 | Claude 3 Sonnet | n=13 · craft 15 · orig 11 | 11
=9 | Mistral Devstral Small 1.1 | n=59 · craft 18 · orig 8 | 11
=9 | Llama 3 70B | n=58 · craft 19 · orig 8 | 11
=9 | Cypher Alpha (free) | n=32 · craft 15 · orig 9 | 11
=9 | Mistral Nemo | n=58 · craft 18 · orig 9 | 11
=9 | Google: Gemma 3n 2B | n=27 · craft 17 · orig 9 | 10
=9 | MiniMax M2-her | n=59 · craft 13 · orig 9 | 10
=9 | GPT-3.5 Turbo | n=58 · craft 15 · orig 7 | 9
=9 | DeepSeek Prover V2 | n=6 · craft 13 · orig 6 | 8
=221 | Llama 3.1 405B | n=12 · craft 7 · orig 4 | 5
=221 | Qwen3 0.6B | n=16 · craft 6 · orig 5 | 4
=223 | GPT-2 | n=7 · craft 0 · orig 2 | 1
=223 | GPT-1 | n=7 · craft 0 · orig 1 | 1
=223 | NVIDIA: Nemotron 3.5 Content Safety | n=58 · craft 0 · orig 0 | 0
=223 | Riverflow V2 Fast | n=2 · craft 0 · orig 0 | 0

All 226 models shown. 100 is the reference; nothing reaches it yet. The index is uncapped, and near-equal scores share a rank. SubjectiveBench v1 · calibrated June 2026.
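Near-equal scores sharing a rank needs a precise grouping rule. In the table, one shared rank (=9) runs from an index of 39 all the way down to 8, and every step between neighbors in that run is a single point. That pattern is what neighbor-to-neighbor grouping produces: each one-point step links to the next until most of the field is tied. The sketch below is one alternative, not the site's code. It uses competition ranking (a group's rank is 1 plus the number of models above it, which matches labels like =3 and =9) and compares each score with the first score in its group, so no group can span more than the tolerance. The one-point tolerance and the name `shared_ranks` are assumptions.

```python
def shared_ranks(scores: list[float], tolerance: float = 1.0) -> list[str]:
    """Hypothetical competition ranking in which near-equal scores share a rank.

    `scores` must be sorted from highest to lowest. A score joins the current
    group only if it is within `tolerance` of that group's FIRST (highest)
    score, so a run of one-point steps cannot chain into one giant tie.
    """
    if any(a < b for a, b in zip(scores, scores[1:])):
        raise ValueError("scores must be sorted from highest to lowest")

    groups: list[list[int]] = []  # each group holds positions in `scores`
    for i, score in enumerate(scores):
        if groups and scores[groups[-1][0]] - score <= tolerance:
            groups[-1].append(i)
        else:
            groups.append([i])

    labels = [""] * len(scores)
    for group in groups:
        rank = group[0] + 1  # 1 + number of models ranked above this group
        prefix = "=" if len(group) > 1 else ""
        for i in group:
            labels[i] = f"{prefix}{rank}"
    return labels


# The first eleven indices from the leaderboard above.
print(shared_ranks([70, 49, 44, 44, 43, 42, 41, 41, 39, 39, 38]))
# ['1', '2', '=3', '=3', '=3', '=6', '=6', '=6', '=9', '=9', '=9']
```

Neighbor-to-neighbor grouping, which reproduces the published labels, would tie 44 through 41 at =3; anchoring on the group's first score splits the 42 and 41 models into their own =6 group.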

01

AI seeds the score

A judge model scores every output against the reference answer for the same prompt, originality first.

02

A human decides

The machine under-rewards originality, so a person re-checks every score and adjusts it. The human is the point, not a formality.

03

One number, uncapped

You get a Taste Index per output and per model. Higher is rarer. Every model today sits below 100.
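The three steps can be sketched as a small data model. This assumes each output carries an AI seed score and an optional human adjustment, that the human value replaces the seed whenever one exists, and that a model's Taste Index is the plain mean of its outputs' final scores. The page does not say how per-output scores are aggregated, so the mean, `ScoredOutput`, and `model_index` are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class ScoredOutput:
    """Hypothetical record for one judged output."""
    model: str
    ai_seed: float                       # step 01: score proposed by the AI judge
    human_score: Optional[float] = None  # step 02: set only when a human changes the seed

    @property
    def final(self) -> float:
        # The human decision wins whenever one exists; otherwise the seed stands.
        return self.human_score if self.human_score is not None else self.ai_seed


def model_index(outputs: list[ScoredOutput]) -> dict[str, float]:
    """Step 03 (assumed aggregation): mean of each model's final per-output scores."""
    by_model: dict[str, list[float]] = {}
    for out in outputs:
        by_model.setdefault(out.model, []).append(out.final)
    return {model: mean(scores) for model, scores in by_model.items()}


outputs = [
    ScoredOutput("model-a", ai_seed=20.0, human_score=30.0),  # human raised an original answer
    ScoredOutput("model-a", ai_seed=40.0),                    # human let the seed stand
    ScoredOutput("model-b", ai_seed=10.0),
]
print(model_index(outputs))  # {'model-a': 35.0, 'model-b': 10.0}
```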

How we keep it honest

Anchored
Every score is relative to one frozen reference set to 100.
Originality first
We punish the homogeneous default. Polish does not rescue sameness.
Shown, not asserted
Every scored output is on the site. Read them and disagree.
Human-curated
An AI seeds, a human decides. The override rate is public.
No paid placement
No model pays to rank. We run Compare too. That is the only conflict, and now you know it.
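A public override rate is a simple ratio. A minimal sketch, assuming an output counts as overridden whenever the human's final score differs from the AI seed by any amount; the page does not define that threshold, and `override_rate` is a hypothetical name.

```python
def override_rate(pairs: list[tuple[float, float]]) -> float:
    """Hypothetical: fraction of outputs whose final score differs from the AI seed.

    Each pair is (ai_seed, final_score). Any change at all counts as an override.
    """
    if not pairs:
        return 0.0  # nothing scored yet, so nothing overridden
    overridden = sum(1 for seed, final in pairs if final != seed)
    return overridden / len(pairs)


print(override_rate([(20.0, 30.0), (40.0, 40.0), (10.0, 10.0), (15.0, 12.0)]))  # 0.5
```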

Taste has no objective ground truth. SubjectiveBench measures against one curated reference, on a fixed scale, originality first. The honest move is to read the outputs yourself and disagree. Every score links to the real output behind it.

SubjectiveBench v1 (June 2026). rival.tips. https://www.rival.tips/subjectivebench

Questions, answered

What is SubjectiveBench?
A taste benchmark. Every other benchmark measures whether a model is competent. This one measures whether it has taste: craft, originality, and whether it escapes the answer every other model gives. One uncapped score per output and per model, judged by humans.
Why is the scale uncapped, and why does nothing reach 100?
Because taste has headroom and competence does not. 100 is the reference: the level of genuinely original, tasteful work the scale is anchored to. No model reaches it yet. The best model sits at 70, and most cluster near the floor, making the same default choices. The scale runs past 100 to infinity because when a model finally gets there, taste keeps going. A 0 to 100 percentage would pretend there is a ceiling. There is not.
What does "originality first" mean?
A polished, generic answer scores low. The clean purple landing page everyone generates, the seagull drawn from the one angle every model picks, the joke that is technically a joke but not actually new: all of it sits near the floor, no matter how competent. We reward a point of view, not homework.
Isn't taste just your opinion?
Yes, and we say so out loud. There is no objective ground truth for taste, so we do two things. First, an AI pass seeds every score against a fixed reference for the same prompt; then a human re-checks and adjusts it, because the machine systematically under-rewards originality. Second, we put every scored output on the site. Read them and disagree. A benchmark you can audit beats a number you have to trust.
Can a model game it?
Eventually, like any benchmark. The defenses: scores are anchored to a fixed reference, the rubric punishes the homogeneous default rather than rewarding polish, the prompt set rotates between versions, and no model can submit its own best run or pay for placement. When a model games one version, the next version is built to expose it.