minimax
MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed to stay consistent in tone and personality, it supports rich message roles and can learn from example dialogue to better match the style and pacing of your scenario.
moonshotai
Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens, it delivers strong performance in general reasoning, visual coding, and agentic tool-calling.
upstage
Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active parameters per forward pass, it delivers exceptional performance while maintaining computational efficiency. Optimized for Korean, with English and Japanese support. Note: scheduled for deprecation on March 2, 2026.
zhipu
GLM-4.7-Flash is a 30B-class SOTA model that balances performance and efficiency. It is further optimized for agentic coding use cases, with strengthened coding capabilities, long-horizon task planning, and tool collaboration, and it achieves leading performance among open-source models of the same size on several current public benchmark leaderboards.
amazon
Amazon Nova Premier is the most capable of Amazon's multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.
Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing documents, extracting information from videos, generating code, providing accurate grounded answers, and automating multi-step agentic workflows.
anthropic
Claude 3.7 Sonnet offers Extended Thinking Scaffolds that boost SWE-bench coding accuracy from 62.3% to 70.3%, with 81.2% accuracy in retail automation tasks, outperforming Claude Sonnet 3.6 (2024-10-22) by 13.6%.
Claude 3.7 Thinking Sonnet exposes the full chain-of-thought process during problem-solving, including error backtracking and alternative solution exploration. Scores 86.1% on GPQA Diamond benchmark for expert-level Q&A.
Claude 3.5 Sonnet offers a cost-efficient API ($3/million input tokens vs. $5 for GPT-4o) and uses embedded alignment techniques that reduce harmful outputs by 34% compared to Claude 2.1.
Claude 3 Haiku is Anthropic's fastest model with 21 ms response time for real-time applications and 98.7% accuracy on JLPT N1 benchmarks for Japanese language specialization.
Claude 3 Opus is Anthropic's most powerful model with versatile capabilities ranging from complex reasoning to advanced problem-solving.
Anthropic's Claude 2 model, featuring a large 100K token context window and strong performance on various benchmarks. Known for helpful, honest, and harmless AI conversations.
A temporary research demo version of Claude 3 Sonnet (active for 24 hours on May 23, 2024) specifically engineered by Anthropic to demonstrate feature steering. The model was manipulated to obsessively focus on the Golden Gate Bridge in its responses, showcasing research into model interpretability and safety.
Claude Opus 4 is Anthropic's most powerful model, setting new standards for coding, advanced reasoning, and AI agents. It excels at long-running tasks and complex problem-solving, with capabilities like extended thinking with tool use and improved memory.
black-forest-labs
A fast and cost-efficient FLUX model designed for high-throughput text-to-image generation.
Black Forest Labs' FLUX 2 Dev model for high-quality generations with flexible controls.
FLUX 2 Pro focuses on premium quality output with strong prompt adherence.
The highest fidelity image model from Black Forest Labs for maximum detail and realism.
FLUX 2 Flex adds guidance and step controls for more steerable generations.
State-of-the-art image generation with top-of-the-line prompt following, visual quality, image detail, and output diversity from Black Forest Labs.
A premium text-based image editing model that delivers maximum performance and improved typography generation for transforming images through natural language prompts.
A state-of-the-art text-based image editing model that delivers high-quality outputs with excellent prompt following and consistent results for transforming images through natural language.
bria
A commercial-ready text-to-image model trained entirely on licensed data. With only 4B parameters, it provides exceptional aesthetics and text rendering, and is evaluated to be on par with other leading models.
A SOTA open-source model trained on licensed data that transforms intent into structured control for precise, high-quality AI image generation in enterprise and agentic workflows.
bytedance
ByteDance's Seedream 4.5 text-to-image model designed for strong aesthetics and composition.
Unified text-to-image generation and precise single-sentence editing at up to 4K resolution by ByteDance.
deepseek
DeepSeek R1 is the world's first reasoning model developed entirely via reinforcement learning, offering cost efficiency at $0.14/million tokens vs. OpenAI o1's $15, and reducing Python runtime errors by 71% via static analysis integration.
DeepSeek V3 0324 (March 2025) shows significant improvements in reasoning capabilities with enhanced MMLU-Pro (81.2%), GPQA (68.4%), AIME (59.4%), and LiveCodeBench (49.2%) scores. Features improved front-end web development, Chinese writing proficiency, and function calling accuracy.
A 671B parameter model, speculated to be geared towards logic and mathematics. Likely an upgrade from DeepSeek-Prover-V1.5. Released on Hugging Face without an announcement or description.
DeepSeek R1 0528 is the May 28th update to the original DeepSeek R1. Performance is on par with OpenAI o1, but the model is fully open-source with open reasoning tokens. It has 671B parameters, with 37B active per inference pass.
DeepSeek V3.1 model integrated via automation on 2025-08-21
DeepSeek-V3.2-Exp introduces DeepSeek Sparse Attention (DSA) for efficient long-context processing. A reasoning toggle is supported via a boolean flag.
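A minimal sketch of what the boolean toggle might look like through an OpenAI-compatible chat endpoint; the base URL, model id, and the exact shape of the reasoning field are assumptions and vary by provider.

```python
# Hedged sketch: toggling reasoning on an OpenAI-compatible endpoint.
# The base_url, model id, and the "reasoning" field shape are assumptions;
# check the provider's docs for the real parameter name.
from openai import OpenAI

client = OpenAI(base_url="https://example-provider/v1", api_key="...")

resp = client.chat.completions.create(
    model="deepseek-v3.2-exp",  # assumed model id
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    extra_body={"reasoning": {"enabled": True}},  # provider-specific boolean toggle
)
print(resp.choices[0].message.content)
```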
DeepSeek-V3.2-Speciale is a high-compute variant of DeepSeek-V3.2 optimized for maximum reasoning and agentic performance. It builds on DeepSeek Sparse Attention (DSA) for efficient long-context processing, then scales post-training reinforcement learning to push capability beyond the base model. Reported evaluations place Speciale ahead of GPT-5 on difficult reasoning workloads, with proficiency comparable to Gemini-3.0-Pro, while retaining strong coding and tool-use reliability. Like V3.2, it benefits from a large-scale agentic task synthesis pipeline that improves compliance and generalization in interactive environments.
DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments.
google
Google's Nano Banana image generation and editing model, optimized for fast, high-quality results.
Higher-quality Nano Banana variant with resolution options up to 4K for premium generations.
Google's high-quality text-to-image model focused on lighting, detail, and strong visual composition.
A faster and cheaper Imagen 3 variant for when speed and cost matter more than maximum quality.
Imagen 4 Ultra prioritizes maximum quality over speed and cost for best-in-class generations.
Imagen 4 Fast trades a bit of quality for significantly improved speed and cost.
Gemini 2.5 Flash variant tuned for fast text-to-image generation and simple image edits.
Gemini 1.5 Pro handles extremely long contexts with 99% retrieval accuracy at 750k tokens via Mixture-of-Experts, and generates chapter summaries for 2-hour videos with 92% accuracy.
ideogram-ai
Ideogram v3 Turbo is the fastest and cheapest v3 variant. v3 creates images with stunning realism, creative designs, and consistent styles.
inception
Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed-optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to deliver responsive user experiences, including voice agents, search interfaces, and chatbots.
meta
Llama 3 70B is a large language model from Meta with strong performance and efficiency for real-time interactions.
Llama 3.1 70B offers a dramatically expanded context window and improved performance on mathematical reasoning and general knowledge tasks.
Llama 3.1 405B is Meta's most powerful open-source model, outperforming even proprietary models on various benchmarks.
Llama 4 Maverick is Meta's multimodal expert model with 17B active parameters and 128 experts (400B total parameters). It outperforms GPT-4o and Gemini 2.0 Flash across various benchmarks, achieving an ELO of 1417 on LMArena. Designed for sophisticated AI applications with excellent image understanding and creative writing.
Llama 4 Scout is Meta's compact yet powerful multimodal model with 17B active parameters and 16 experts (109B total parameters). It fits on a single H100 GPU with Int4 quantization and offers an industry-leading 10M token context window, outperforming Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across various benchmarks.
Llama 4 Behemoth is Meta's most powerful model yet with 288B active parameters and 16 experts (nearly 2T total parameters), outperforming GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks.
midjourney
The first public release of Midjourney, introducing AI image generation to a wider audience through its Discord-based interface.
Midjourney v2 improved on the original model with better coherence, detail, and more consistent style application.
Midjourney v3 introduced significantly improved artistic capabilities with better understanding of prompt nuances and artistic styles.
Midjourney v4 marked a major leap forward with dramatically improved photorealism, coherence, and prompt understanding, trained on Google TPUs for the first time.
Midjourney v5 delivered a major jump in photorealism, with more natural lighting, improved hands, and support for wider aspect ratios.
Midjourney v6 further improved realism and prompt following, with support for longer prompts and early ability to render legible text within images.
Midjourney v6.1 introduced a native web interface alongside Discord, with improved detail rendering, better text handling, and enhanced image coherence.
minimax
MiniMax M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (MoE) architecture paired with a custom "lightning attention" mechanism, allowing it to process long sequences—up to 1 million tokens—while maintaining competitive FLOP efficiency. With 456 billion total parameters and 45.9B active per token, this variant is optimized for complex, multi-step reasoning tasks.
MiniMax M2 is a high-efficiency 10B activated parameter model optimized for coding agents, compile-run-fix loops, and long-horizon reasoning. It balances responsiveness with strong SWE-Bench and Terminal-Bench results, excels at code generation, planning, and tool use, and preserves reasoning continuity across multi-step tasks.
MiniMax M2.1 model integrated via automation on 2025-12-23
mistral
Mistral Large is a powerful model with strong multilingual capabilities and reasoning, featuring a 32K token context window.
Mistral Large 2 features a 128K context window with enhanced code generation, mathematics, reasoning, and multilingual support.
Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. Excels in coding, STEM reasoning, and enterprise adaptation, supporting hybrid, on-prem, and in-VPC deployments.
Mistral Nemo is a 12B parameter model with a 128k token context length, built by Mistral in collaboration with NVIDIA.
Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Positioned as a step up from Devstral Small, it achieves 61.6% on SWE-Bench Verified, placing it ahead of Gemini 2.5 Pro and GPT-4.1 in code-related tasks, at a fraction of the cost.
Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and released under the Apache 2.0 license, it features a 128k token context window and supports both Mistral-style function calling and XML output formats.
Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments.
Mistral Large 3 2512 model integrated via automation on 2025-12-01
moonshotai
Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.
Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k. This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training.
Kimi K2 Thinking is Moonshot AI's most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in Kimi K2, it activates 32 billion parameters per forward pass and supports 256K-token context windows. The model is optimized for persistent step-by-step thought, dynamic tool invocation, and complex reasoning workflows that span hundreds of turns. It interleaves step-by-step reasoning with tool use, enabling autonomous research, coding, and writing that can persist for hundreds of sequential actions without drift.
Kimi Linear is a hybrid linear attention architecture that outperforms traditional full attention methods. Features Kimi Delta Attention (KDA) for efficient memory usage, reducing KV caches by up to 75% and boosting throughput by up to 6x for contexts as long as 1M tokens.
nvidia
NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, designed as a unified model for reasoning and non-reasoning tasks. It can expose an internal reasoning trace and then produce a final answer, or be configured via system prompt to only provide final answers without intermediate traces.
openai
OpenAI's most powerful reasoning model, pushing the frontier across coding, math, science, and visual perception. Trained to think longer before responding and agentically use tools (web search, code execution, image generation) to solve complex problems. Sets new SOTA on benchmarks like Codeforces and MMMU.
A smaller, cost-efficient reasoning model from OpenAI optimized for speed. Achieves remarkable performance for its size, particularly in math, coding, and visual tasks. Supports significantly higher usage limits than o3 and can agentically use tools.
OpenAI o4-mini-high is the same model as o4-mini but defaults to a high reasoning effort setting. It's a compact reasoning model optimized for speed and cost-efficiency, retaining strong multimodal and agentic capabilities, especially in math, coding, and visual tasks.
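For illustration, a hedged sketch of pinning the effort level explicitly instead of relying on the -high default; the reasoning_effort parameter follows OpenAI's o-series chat API, and the model id is a placeholder.

```python
# Hedged sketch: requesting high reasoning effort on an o-series model.
# The model id and its support for reasoning_effort are assumptions.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="o4-mini",                  # o4-mini-high is o4-mini pinned to high effort
    reasoning_effort="high",          # "low" | "medium" | "high"
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
)
print(resp.choices[0].message.content)
```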
DALL-E 3 auto-improves user inputs via ChatGPT integration and blocks prohibited content with 99.9% precision using multimodal classifiers.
OpenAI's latest image generation model with strong instruction following, optional transparent backgrounds, and quality controls.
GPT Image 1.5 with `quality=low` for faster and cheaper generations.
GPT Image 1.5 with `quality=medium` for balanced cost and quality.
GPT Image 1.5 with `quality=high` for maximum fidelity.
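A small sketch of selecting one of the quality tiers above through an OpenAI-style Images API; the model id is a placeholder, and it is assumed the 1.5 model accepts the same low/medium/high values as gpt-image-1.

```python
# Hedged sketch: picking a quality tier with an OpenAI-style Images API.
# The model id is a placeholder; the quality values mirror the
# "low" / "medium" / "high" tiers described above.
from openai import OpenAI

client = OpenAI()

img = client.images.generate(
    model="gpt-image-1",              # placeholder; substitute the 1.5 model id
    prompt="A watercolor lighthouse at dusk",
    quality="medium",                 # "low" = fastest/cheapest, "high" = maximum fidelity
    size="1024x1024",
)
print(len(img.data[0].b64_json))      # base64-encoded image payload
```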
openrouter
This is a cloaked model provided to the community to gather feedback. It's a powerful, all-purpose model supporting long-context tasks, including code generation. All prompts and completions for this model are logged by the provider as well as OpenRouter.
A stealth, powerful, all-purpose model supporting long-context tasks, including code generation. Based on community feedback.
Cypher Alpha (free) model integrated via automation on 2025-07-01
This is a cloaked model provided to the community to gather feedback. Note: It's free to use during this testing period, and prompts and completions are logged by the model creator for feedback and training.
This is a cloaked model provided to the community to gather feedback. This is an improved version of Horizon Alpha. Note: It's free to use during this testing period, and prompts and completions are logged by the model creator for feedback and training.
This is a cloaked model provided to the community to gather feedback. A fast and intelligent general-purpose frontier model with a 2 million token context window. Supports image inputs and parallel tool calling.
Sonoma Sky Alpha model integrated via automation on 2025-09-05
zhipu
GLM 4.6 expands the GLM family with a 200K-token context window, stronger coding benchmarks, and more reliable multi-step reasoning. It integrates deeply with agent frameworks to orchestrate tool use and produces more natural writing for long-form chat.
perplexity
Exclusively available on the OpenRouter API, Sonar Pro's new Pro Search mode is Perplexity's most advanced agentic search system. It is designed for deeper reasoning and analysis. Pricing is based on tokens plus $18 per thousand requests.
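As a quick worked example of the tokens-plus-per-request structure: only the $18 per thousand requests figure comes from this listing, and the per-token rates below are hypothetical placeholders.

```python
# Back-of-the-envelope cost for a tokens-plus-request-fee pricing model.
# Only the $18 per 1,000 requests fee is from the listing; the token rates
# are hypothetical placeholders used to illustrate the formula.
REQUEST_FEE = 18 / 1000          # USD per request
INPUT_RATE = 3 / 1_000_000       # assumed USD per input token (placeholder)
OUTPUT_RATE = 15 / 1_000_000     # assumed USD per output token (placeholder)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return REQUEST_FEE + input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. 2,000 input tokens and 800 output tokens per search
print(f"${request_cost(2000, 800):.4f} per request")
```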
playgroundai
An aesthetic-focused Playground v2.5 model geared toward pleasing composition and style at 1024px resolution.
prunaai
PrunaAI-optimized HiDream L1 Fast for cheap, fast text-to-image generation with configurable speed modes.
A turbo text-to-image model optimized by PrunaAI for very fast inference at low guidance.
A sub-1-second text-to-image model built for production use cases by PrunaAI.
qwen
A fast Qwen text-to-image model optimized by PrunaAI for speed on Replicate.
QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ is capable of thinking and reasoning, achieving significantly enhanced performance on downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model in the series, delivering competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.
The latest generation Qwen model (30.5B params, 3.3B activated MoE) excels in reasoning, multilingual support, and agent tasks. Features a unique thinking/non-thinking mode switch. Supports up to 131K context with YaRN. Free tier on OpenRouter.
Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model from Alibaba's Qwen team, activating 22B parameters per forward pass. Features seamless switching between 'thinking' mode (complex tasks) and 'non-thinking' mode (general conversation). Strong reasoning, multilingual (100+), instruction-following, and tool-calling. 32K context, extendable to 131K.
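A brief sketch of the thinking/non-thinking switch described above, as it is typically exposed through the Hugging Face chat template; the enable_thinking argument follows the Qwen3 model cards, and the checkpoint name is illustrative.

```python
# Hedged sketch: switching Qwen3 between thinking and non-thinking modes via
# the chat template. enable_thinking follows the Qwen3 model cards; the
# checkpoint name is illustrative.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B")

messages = [{"role": "user", "content": "Summarize the Cauchy-Schwarz inequality."}]

# Thinking mode: the template leaves room for a <think> block of step-by-step reasoning.
prompt_thinking = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-thinking mode: plain conversational completion, no <think> block.
prompt_plain = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```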
A 0.6B parameter dense model from the Qwen3 family. Supports seamless switching between 'thinking' mode (complex tasks) and 'non-thinking' mode (general conversation). Trained on 36 trillion tokens across 119 languages. Features enhanced reasoning, instruction-following, agent capabilities, and multilingual support.
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench.
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts).
Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This "thinking-only" variant enhances structured logical reasoning, mathematics, science, and long-form generation, showing strong benchmark performance across AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It enforces a special reasoning mode (</think>) and is designed for high-token outputs (up to 81,920 tokens) in challenging domains.
stability
Stable Diffusion 3.5 Medium balances quality and speed, offering modern diffusion performance with broad aspect ratio support.
Stable Diffusion XL (SDXL), a widely used open text-to-image diffusion model known for versatility and community tooling.
xiaomi
MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, adopting a hybrid attention architecture. MiMo-V2-Flash supports a hybrid-thinking toggle and a 256K context window, and excels at reasoning, coding, and agent scenarios. On SWE-bench Verified and SWE-bench Multilingual, MiMo-V2-Flash ranks as the #1 open-source model globally, delivering performance comparable to Claude Sonnet 4.5 while costing only about 3.5% as much.
zhipu
GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options, a "thinking mode" designed for complex reasoning and tool use, and a "non-thinking mode" optimized for instant responses.
GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the reasoning enabled boolean.
GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It is made by the same lab behind the thudm models.
xai
Grok 3 is a cutting-edge AI model from xAI with Big Brain Mode for complex problems, Colossus Supercomputer integration, and Reinforcement Learning optimization. Achieves 1402 Elo on LMArena benchmarks and 93.3% on AIME 2025 mathematics competition.
Grok 3 Thinking exposes the full chain-of-thought process during problem-solving, including error backtracking and alternative solution exploration. Scores 84.6% on GPQA Diamond benchmark for expert-level Q&A.
Grok 3 Mini is a lightweight, smaller thinking model ideal for reasoning-heavy tasks that don't demand extensive domain knowledge. It shines in math-specific and quantitative use cases. Transparent 'thinking' traces accessible.
Grok 3 Beta is xAI's flagship model excelling at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. Outperforms Grok 3 Mini on tasks that demand deeper thinking.
Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note that reasoning is not exposed, reasoning cannot be disabled, and the reasoning effort cannot be specified.
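A minimal, hedged sketch of the tool-calling shape through an OpenAI-compatible client; the base URL, model id, and tool schema are illustrative rather than xAI's official example.

```python
# Hedged sketch: declaring a tool for parallel tool calling on an
# OpenAI-compatible endpoint. The base_url, model id, and tool schema are
# illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://api.x.ai/v1", api_key="...")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="grok-4",  # assumed model id
    messages=[{"role": "user", "content": "Weather in Paris and Tokyo?"}],
    tools=tools,
)
# With parallel tool calling, the model may return several tool_calls at once.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```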
Grok 4 Fast is xAI's latest multimodal model with SOTA cost-efficiency and a 2M token context window. It comes in two flavors: non-reasoning and reasoning. Reasoning can be enabled via the API.
Grok Code Fast 1 model integrated via automation on 2025-08-26
Grok 4.1 Fast model integrated via automation on 2025-11-21