5,629 model outputs and 21,686 human preference votes. Same prompts, controlled conditions, structured JSONL. Built for researchers, ML engineers, and the massively sleep-deprived.
{"model_id": "gpt-4.1", "model_name": "GPT 4.1", "provider": "OpenAI", "prompt_id": "gpt-4.1-joke", "prompt_title": "Programming Joke", "prompt_text": "Tell me a programming joke.", "prompt_category": "humor", "response_type": "text", "content": "Why do programmers prefer dark mode? Because light attracts bugs.", "date": "2025-04-15"}
{"model_id": "claude-3.7-sonnet", "model_name": "Claude 3.7 Sonnet", "provider": "Anthropic", "prompt_id": "claude-3.7-sonnet-minimalist-landing-page", "prompt_title": "Minimalist Landing Page", "prompt_text": "Generate a single-page landing page for a new AI startup...", "prompt_category": "web-design", "response_type": "website", "content": "<!DOCTYPE html><html lang=\"en\"><head>...</head><body>...</body></html>", "date": "2025-03-28"}
{"model_id": "gemini-2.5-pro-exp", "model_name": "Gemini 2.5 Pro", "provider": "Google", "prompt_id": "gemini-2.5-pro-exp-world-map-svg", "prompt_title": "World Map SVG", "prompt_text": "Create an SVG world map with interactive hover effects.", "prompt_category": "svg-generation", "response_type": "svg", "content": "<svg viewBox=\"0 0 1000 500\" xmlns=\"http://www.w3.org/2000/svg\">...</svg>", "date": "2025-04-02"}
... 5,626 more lines

Most AI benchmarks test narrow tasks with synthetic grading. This one captures how models actually perform on real creative, technical, and analytical challenges. Same prompts, no cherry-picking, no vibes-only methodology.
Every model gets the exact same prompt. No cherry-picking, no prompt engineering variance. Just cold, fair, reproducible chaos.
Community votes from AI duels. Real people picking winners, not a GPT-4 judge hallucinating quality scores.
Text, websites, SVGs, images, code. 14 categories from web design to philosophy. Your eval pipeline has never eaten this well.
Streams directly into eval frameworks, reward model training, and LLM-as-judge setups. No CSV wrangling. No Parquet drama.
Each line is a complete model response with full metadata. JSONL format, one JSON object per line, stream-friendly. Your parser will be grateful.
{"model_id": "gpt-4.1", "model_name": "GPT 4.1", "provider": "OpenAI", "prompt_id": "gpt-4.1-joke", "prompt_title": "Programming Joke", "prompt_text": "Tell me a programming joke.", "prompt_category": "humor", "response_type": "text", "content": "Why do programmers prefer dark mode? Because light attracts bugs.", "date": "2025-04-15"}
{"model_id": "claude-3.7-sonnet", "model_name": "Claude 3.7 Sonnet", "provider": "Anthropic", "prompt_id": "claude-3.7-sonnet-minimalist-landing-page", "prompt_title": "Minimalist Landing Page", "prompt_text": "Generate a single-page landing page for a new AI startup...", "prompt_category": "web-design", "response_type": "website", "content": "<!DOCTYPE html><html lang=\"en\"><head>...</head><body>...</body></html>", "date": "2025-03-28"}
{"model_id": "gemini-2.5-pro-exp", "model_name": "Gemini 2.5 Pro", "provider": "Google", "prompt_id": "gemini-2.5-pro-exp-world-map-svg", "prompt_title": "World Map SVG", "prompt_text": "Create an SVG world map with interactive hover effects.", "prompt_category": "svg-generation", "response_type": "svg", "content": "<svg viewBox=\"0 0 1000 500\" xmlns=\"http://www.w3.org/2000/svg\">...</svg>", "date": "2025-04-02"}

14 categories of creative, technical, and analytical tasks. The models didn't get to pick.
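Because each line is a standalone JSON object, the file can be read one record at a time without loading it all into memory. The following is a minimal sketch of that idea, not an official loader: `iter_records` and `count_by` are hypothetical helper names, the three in-memory records are abbreviated versions of the samples above, and the file name `responses.jsonl` in the closing comment is an assumption.

```python
import io
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank JSONL line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by(records: Iterable[dict], field: str) -> Counter:
    """Tally records by the value of one metadata field (hypothetical helper)."""
    return Counter(rec.get(field, "unknown") for rec in records)


# Three abbreviated records modeled on the samples above (most fields omitted).
SAMPLE = io.StringIO(
    '{"model_id": "gpt-4.1", "prompt_category": "humor", "response_type": "text"}\n'
    '{"model_id": "claude-3.7-sonnet", "prompt_category": "web-design", "response_type": "website"}\n'
    '{"model_id": "gemini-2.5-pro-exp", "prompt_category": "svg-generation", "response_type": "svg"}\n'
)

category_counts = count_by(iter_records(SAMPLE), "prompt_category")
print(category_counts.most_common())

# For a downloaded file, pass an open handle instead (the path is an assumption):
# with open("responses.jsonl", encoding="utf-8") as fh:
#     print(count_by(iter_records(fh), "prompt_category"))
```

The same generator can feed any downstream step that accepts an iterable of dicts, so filtering by `response_type` or `provider` only needs a comprehension over `iter_records(...)`.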
February 2026 edition.
Metadata only, no response content
{"model_id": "gpt-4.1", "model_name": "GPT 4.1", "provider": "OpenAI", "prompt_id": "gpt-4.1-joke", "prompt_title": "Programming Joke", "prompt_category": "humor", "response_type": "text", "date": "2025-04-15"}

All 5,629 responses with full content
February 2026 edition.
Free samples included. No account required. We won't even ask for your email.
See what we found in this data
The AI Hallucination Index 2026. 250 models analyzed. 40+ slides.