5,629 model outputs and 21,686 human preference votes. Same prompts, controlled conditions, structured JSONL. Built for researchers, ML engineers, and eval pipelines.
{"model_id": "gpt-4.1", "model_name": "GPT 4.1", "provider": "OpenAI", "prompt_id": "gpt-4.1-joke", "prompt_title": "Programming Joke", "prompt_text": "Tell me a programming joke.", "prompt_category": "humor", "response_type": "text", "content": "Why do programmers prefer dark mode? Because light attracts bugs.", "date": "2025-04-15"}
{"model_id": "claude-3.7-sonnet", "model_name": "Claude 3.7 Sonnet", "provider": "Anthropic", "prompt_id": "claude-3.7-sonnet-minimalist-landing-page", "prompt_title": "Minimalist Landing Page", "prompt_text": "Generate a single-page landing page for a new AI startup...", "prompt_category": "web-design", "response_type": "website", "content": "<!DOCTYPE html><html lang=\"en\"><head>...</head><body>...</body></html>", "date": "2025-03-28"}
{"model_id": "gemini-2.5-pro-exp", "model_name": "Gemini 2.5 Pro", "provider": "Google", "prompt_id": "gemini-2.5-pro-exp-world-map-svg", "prompt_title": "World Map SVG", "prompt_text": "Create an SVG world map with interactive hover effects.", "prompt_category": "svg-generation", "response_type": "svg", "content": "<svg viewBox=\"0 0 1000 500\" xmlns=\"http://www.w3.org/2000/svg\">...</svg>", "date": "2025-04-02"}
... 5,626 more lines
Most AI benchmarks test narrow tasks. RIVAL captures how models actually perform on real creative, technical, and analytical challenges, under identical conditions.
Every model gets the exact same prompt. No cherry-picking, no prompt engineering variance.
Community votes from AI duels — actual human preference signals, not synthetic labels.
Text, websites, SVGs, images, code. 14 categories from web design to philosophy.
JSONL format streams directly into eval frameworks, reward model training, and LLM-as-judge setups.
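Because every model answers the same prompt, responses can be grouped into head-to-head pairs for an LLM-as-judge or reward-model setup. The Python sketch below is one possible way to do that, not RIVAL's own tooling. It assumes prompt_text is the shared key across models (the sample prompt_id values appear to embed the model name, so they may not match between models). The function name build_pairs and the placeholder records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_pairs(records):
    """Group records by prompt_text and return every cross-model response pair."""
    by_prompt = defaultdict(list)
    for rec in records:
        by_prompt[rec["prompt_text"]].append(rec)

    pairs = []
    for prompt, recs in by_prompt.items():
        for a, b in combinations(recs, 2):
            # Skip pairs from the same model; only cross-model duels are useful.
            if a["model_id"] != b["model_id"]:
                pairs.append({
                    "prompt": prompt,
                    "model_a": a["model_id"],
                    "response_a": a["content"],
                    "model_b": b["model_id"],
                    "response_b": b["content"],
                })
    return pairs


# Hypothetical placeholder records using the field names from the samples.
records = [
    {"model_id": "model-a", "prompt_text": "Tell me a programming joke.", "content": "response A"},
    {"model_id": "model-b", "prompt_text": "Tell me a programming joke.", "content": "response B"},
    {"model_id": "model-a", "prompt_text": "Another prompt.", "content": "response C"},
]
pairs = build_pairs(records)
```

Here only the first prompt has responses from two different models, so it yields a single pair; the second prompt has one response and yields none.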
Each line is a complete model response with full metadata. The file is JSONL: one JSON object per line, stream-friendly.
{"model_id": "gpt-4.1", "model_name": "GPT 4.1", "provider": "OpenAI", "prompt_id": "gpt-4.1-joke", "prompt_title": "Programming Joke", "prompt_text": "Tell me a programming joke.", "prompt_category": "humor", "response_type": "text", "content": "Why do programmers prefer dark mode? Because light attracts bugs.", "date": "2025-04-15"}
{"model_id": "claude-3.7-sonnet", "model_name": "Claude 3.7 Sonnet", "provider": "Anthropic", "prompt_id": "claude-3.7-sonnet-minimalist-landing-page", "prompt_title": "Minimalist Landing Page", "prompt_text": "Generate a single-page landing page for a new AI startup...", "prompt_category": "web-design", "response_type": "website", "content": "<!DOCTYPE html><html lang=\"en\"><head>...</head><body>...</body></html>", "date": "2025-03-28"}
{"model_id": "gemini-2.5-pro-exp", "model_name": "Gemini 2.5 Pro", "provider": "Google", "prompt_id": "gemini-2.5-pro-exp-world-map-svg", "prompt_title": "World Map SVG", "prompt_text": "Create an SVG world map with interactive hover effects.", "prompt_category": "svg-generation", "response_type": "svg", "content": "<svg viewBox=\"0 0 1000 500\" xmlns=\"http://www.w3.org/2000/svg\">...</svg>", "date": "2025-04-02"}
14 categories of creative, technical, and analytical tasks.
February 2026 edition.
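As a minimal sketch of consuming the one-object-per-line format, the Python snippet below streams records and counts responses per category. The prompt_category field name comes from the sample records above. The function names and the abbreviated sample lines are hypothetical, chosen only for illustration.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)


def count_by_category(records: Iterable[dict]) -> Counter:
    """Count responses per prompt_category (field name taken from the samples)."""
    return Counter(rec["prompt_category"] for rec in records)


# Two abbreviated records in the shape of the samples above.
sample_lines = [
    '{"model_id": "gpt-4.1", "prompt_category": "humor", "response_type": "text"}',
    '{"model_id": "gemini-2.5-pro-exp", "prompt_category": "svg-generation", "response_type": "svg"}',
]
counts = count_by_category(iter_records(sample_lines))
```

Since iter_records accepts any iterable of strings, an open file handle can be passed in place of sample_lines, so the full file never needs to be loaded into memory at once.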
Metadata only — no response content
{"model_id": "gpt-4.1", "model_name": "GPT 4.1", "provider": "OpenAI", "prompt_id": "gpt-4.1-joke", "prompt_title": "Programming Joke", "prompt_category": "humor", "response_type": "text", "date": "2025-04-15"}
All 5,629 responses with full content
February 2026 Edition
Free samples included. No account required.
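Judging by the samples on this page, a metadata-only record carries the same fields as a full record minus prompt_text and content. The Python sketch below projects a full record down to that shape. It is an illustration based on the sample records, not a documented schema; the names METADATA_FIELDS and to_metadata are hypothetical.

```python
# Fields that appear in the metadata-only sample record on this page.
METADATA_FIELDS = (
    "model_id",
    "model_name",
    "provider",
    "prompt_id",
    "prompt_title",
    "prompt_category",
    "response_type",
    "date",
)


def to_metadata(record: dict) -> dict:
    """Keep only the metadata fields, dropping prompt_text and content."""
    return {key: record[key] for key in METADATA_FIELDS if key in record}


# The first full sample record from this page.
full_record = {
    "model_id": "gpt-4.1",
    "model_name": "GPT 4.1",
    "provider": "OpenAI",
    "prompt_id": "gpt-4.1-joke",
    "prompt_title": "Programming Joke",
    "prompt_text": "Tell me a programming joke.",
    "prompt_category": "humor",
    "response_type": "text",
    "content": "Why do programmers prefer dark mode? Because light attracts bugs.",
    "date": "2025-04-15",
}
meta = to_metadata(full_record)
```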