NVIDIA

NVIDIA: Nemotron 3 Ultra

conversationreasoningcode-generationanalysisagentic-tool-usetool-useplanning

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it supports text input and output with a context window of up to 1M tokens. It is suited for long-running agentic workflows, including agent orchestration, coding agents, deep research, and complex multi-step reasoning.

NVIDIA: Nemotron 3.5 Content Safety

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting text and image input and returning text output: a safe/unsafe classification for the user prompt and the response, safety category labels, and an optional reasoning trace. It covers 12 languages with a 128K context window.

NVIDIA Nemotron 3 Super (free)

conversationreasoningcode-generationanalysisplanningagentic-tool-use

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Fully open with weights, datasets, and recipes under the NVIDIA Open License.

NVIDIA Nemotron Nano 9B V2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, designed as a unified model for reasoning and non-reasoning tasks. It can expose an internal reasoning trace and then produce a final answer, or be configured via system prompt to only provide final answers without intermediate traces.

Loading...

Model Evolution

One challenge, every NVIDIA generation.

NVIDIA

Builds the Nemotron model family and AI hardware and software for inference.

Total Models

4

Text Models

4

Active Period

Sep 2025 to Jun 2026

NVIDIA: Nemotron 3 Ultra

conversationreasoningcode-generationanalysisagentic-tool-usetool-useplanning

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it supports text input and output with a context window of up to 1M tokens. It is suited for long-running agentic workflows, including agent orchestration, coding agents, deep research, and complex multi-step reasoning.

NVIDIA: Nemotron 3.5 Content Safety

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting text and image input and returning text output: a safe/unsafe classification for the user prompt and the response, safety category labels, and an optional reasoning trace. It covers 12 languages with a 128K context window.

NVIDIA Nemotron 3 Super (free)

conversationreasoningcode-generationanalysisplanningagentic-tool-use

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Fully open with weights, datasets, and recipes under the NVIDIA Open License.

NVIDIA Nemotron Nano 9B V2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, designed as a unified model for reasoning and non-reasoning tasks. It can expose an internal reasoning trace and then produce a final answer, or be configured via system prompt to only provide final answers without intermediate traces.

Loading...

Model Evolution

One challenge, every NVIDIA generation.

NVIDIA

Builds the Nemotron model family and AI hardware and software for inference.

Total Models

4

Text Models

4

Active Period

Sep 2025 to Jun 2026

NVIDIA: Nemotron 3 Ultra

conversationreasoningcode-generationanalysisagentic-tool-usetool-useplanning

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it supports text input and output with a context window of up to 1M tokens. It is suited for long-running agentic workflows, including agent orchestration, coding agents, deep research, and complex multi-step reasoning.

NVIDIA: Nemotron 3.5 Content Safety

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting text and image input and returning text output: a safe/unsafe classification for the user prompt and the response, safety category labels, and an optional reasoning trace. It covers 12 languages with a 128K context window.

NVIDIA Nemotron 3 Super (free)

conversationreasoningcode-generationanalysisplanningagentic-tool-use

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Fully open with weights, datasets, and recipes under the NVIDIA Open License.

NVIDIA Nemotron Nano 9B V2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, designed as a unified model for reasoning and non-reasoning tasks. It can expose an internal reasoning trace and then produce a final answer, or be configured via system prompt to only provide final answers without intermediate traces.

NVIDIA

Builds the Nemotron model family and AI hardware and software for inference.

Total Models

4

Text Models

4

Active Period

Sep 2025 to Jun 2026

NVIDIA: Nemotron 3 Ultra

conversationreasoningcode-generationanalysisagentic-tool-usetool-useplanning

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it supports text input and output with a context window of up to 1M tokens. It is suited for long-running agentic workflows, including agent orchestration, coding agents, deep research, and complex multi-step reasoning.

NVIDIA: Nemotron 3.5 Content Safety

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting text and image input and returning text output: a safe/unsafe classification for the user prompt and the response, safety category labels, and an optional reasoning trace. It covers 12 languages with a 128K context window.

NVIDIA Nemotron 3 Super (free)

conversationreasoningcode-generationanalysisplanningagentic-tool-use

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Fully open with weights, datasets, and recipes under the NVIDIA Open License.

NVIDIA Nemotron Nano 9B V2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, designed as a unified model for reasoning and non-reasoning tasks. It can expose an internal reasoning trace and then produce a final answer, or be configured via system prompt to only provide final answers without intermediate traces.