NVIDIA: Nemotron 3 Ultra by Nvidia — Pricing, Benchmarks &amp; Real Outputs

Alternatives to NVIDIA: Nemotron 3 Ultra

NVIDIA: Nemotron 3 Ultra is good. These would like a word anyway.

NVIDIA: Nemotron 3 Ultra has opinions about your brand too.Ask 10 models what they tell people about you. Verbatim receipts.Run the mirror

Updated Jun 6, 2026

Share

Loading...

Compare NVIDIA: Nemotron 3 Ultra

Alternatives to NVIDIA: Nemotron 3 Ultra

NVIDIA: Nemotron 3 Ultra is good. These would like a word anyway.

NVIDIA: Nemotron 3 Ultra has opinions about your brand too.Ask 10 models what they tell people about you. Verbatim receipts.Run the mirror

Updated Jun 6, 2026

Share

NVIDIA: Nemotron 3 Ultra

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it supports text input and output with a context window of up to 1M tokens. It is suited for long-running agentic workflows, including agent orchestration, coding agents, deep research, and complex multi-step reasoning.

Provider

Release Date

2026-06-04

Size

XLARGE

Pricing

In: $0.00/1M

Out: $0.00/1M

Rival’s tab

$0.44

58/58 receipts survived

Get API accessProvider and language code samples

Provider

fromimport openai  OpenAI

client = OpenAI(
"https://openrouter.ai/api/v1"    base_url=,
"$OPENROUTER_API_KEY"    api_key=,
)

response = client.chat.completions.create(
"nvidia/nemotron-3-ultra-550b-a55b"    model=,
"role""user""content""Hello!"    messages=[{: , : }],
)
print(response.choices[0].message.content)

Set OPENROUTER_API_KEY with your OpenRouter API key from openrouter.ai/keys.

Deep analysisPsychometrics, taste index, writing DNA

SubjectiveBench

Taste Index

Across 58 scored outputs

420.42xThe default

0100headroom →

Taste is judged on an uncapped scale, originality first. The space past 100 is craft today's models rarely reach.

Craft52

Originality37

Plays it safe

share of outputs that are the default answer

24%

Writing DNA

Stylometric Fingerprint

Based on 26 text responses

Tick = global average

Vocabulary Diversity59%

Unique words vs. total words. Higher = richer vocabulary.

Sentence Length15.7 words

Average words per sentence.

Hedging0.26

"Might", "perhaps", "arguably" per 100 words.

Bold Formatting7.7

**Bold** markers per 1,000 characters.

List Usage2.7

Bullet and numbered list items per 1,000 characters.

Section Structure0.75

Markdown headings per 1,000 characters.

Emoji Usage0.13

Emoji per 1,000 characters.

Transitions0.04

"However", "moreover", "furthermore" per 100 words.

Opening Habits

Consistency

79%

Across 26 responses

Favorites

Movie

The Shawshank Redemption

Album

The Dark Side of the Moon

Book

To Kill a Mockingbird

City

Kyoto

Game

The Legend of Zelda: Ocarina of Time

Sponsored

Model Responses

53 outputs · $0.44 tracked across 53 receipts

Related Models

NVIDIA: Nemotron 3.5 Content Safety

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting text and image input and returning text output: a safe/unsafe classification for the user prompt and the response, safety category labels, and an optional reasoning trace. It covers 12 languages with a 128K context window.

NVIDIA Nemotron Nano 9B V2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, designed as a unified model for reasoning and non-reasoning tasks. It can expose an internal reasoning trace and then produce a final answer, or be configured via system prompt to only provide final answers without intermediate traces.

NVIDIA Nemotron 3 Super (free)

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Fully open with weights, datasets, and recipes under the NVIDIA Open License.

Keep exploring

NVIDIA: Nemotron 3 Ultra vs MiniMax M3

Real outputs compared side by side

Best AI for Complex Reasoning

Which AI reasons best under pressure? Ranked across 11 challenges: contracts,...

Loading...

Compare NVIDIA: Nemotron 3 Ultra

Alternatives to NVIDIA: Nemotron 3 Ultra

NVIDIA: Nemotron 3 Ultra is good. These would like a word anyway.

NVIDIA: Nemotron 3 Ultra has opinions about your brand too.Ask 10 models what they tell people about you. Verbatim receipts.Run the mirror

Updated Jun 6, 2026

Share

NVIDIA: Nemotron 3 Ultra

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it supports text input and output with a context window of up to 1M tokens. It is suited for long-running agentic workflows, including agent orchestration, coding agents, deep research, and complex multi-step reasoning.

Provider

Release Date

2026-06-04

Size

XLARGE

Pricing

In: $0.00/1M

Out: $0.00/1M

Rival’s tab

$0.44

58/58 receipts survived

Get API accessProvider and language code samples

Provider

fromimport openai  OpenAI

client = OpenAI(
"https://openrouter.ai/api/v1"    base_url=,
"$OPENROUTER_API_KEY"    api_key=,
)

response = client.chat.completions.create(
"nvidia/nemotron-3-ultra-550b-a55b"    model=,
"role""user""content""Hello!"    messages=[{: , : }],
)
print(response.choices[0].message.content)

Set OPENROUTER_API_KEY with your OpenRouter API key from openrouter.ai/keys.

Deep analysisPsychometrics, taste index, writing DNA

SubjectiveBench

Taste Index

Across 58 scored outputs

420.42xThe default

0100headroom →

Taste is judged on an uncapped scale, originality first. The space past 100 is craft today's models rarely reach.

Craft52

Originality37

Plays it safe

share of outputs that are the default answer

24%

Writing DNA

Stylometric Fingerprint

Based on 26 text responses

Tick = global average

Vocabulary Diversity59%

Unique words vs. total words. Higher = richer vocabulary.

Sentence Length15.7 words

Average words per sentence.

Hedging0.26

"Might", "perhaps", "arguably" per 100 words.

Bold Formatting7.7

**Bold** markers per 1,000 characters.

List Usage2.7

Bullet and numbered list items per 1,000 characters.

Section Structure0.75

Markdown headings per 1,000 characters.

Emoji Usage0.13

Emoji per 1,000 characters.

Transitions0.04

"However", "moreover", "furthermore" per 100 words.

Opening Habits

Consistency

79%

Across 26 responses

Favorites

Movie

The Shawshank Redemption

Album

The Dark Side of the Moon

Book

To Kill a Mockingbird

City

Kyoto

Game

The Legend of Zelda: Ocarina of Time

Sponsored

Model Responses

53 outputs · $0.44 tracked across 53 receipts

Related Models

NVIDIA: Nemotron 3.5 Content Safety

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting text and image input and returning text output: a safe/unsafe classification for the user prompt and the response, safety category labels, and an optional reasoning trace. It covers 12 languages with a 128K context window.

NVIDIA Nemotron Nano 9B V2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, designed as a unified model for reasoning and non-reasoning tasks. It can expose an internal reasoning trace and then produce a final answer, or be configured via system prompt to only provide final answers without intermediate traces.

NVIDIA Nemotron 3 Super (free)

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Fully open with weights, datasets, and recipes under the NVIDIA Open License.

Keep exploring

NVIDIA: Nemotron 3 Ultra vs MiniMax M3

Real outputs compared side by side

Best AI for Complex Reasoning

Which AI reasons best under pressure? Ranked across 11 challenges: contracts,...

Loading...

Compare NVIDIA: Nemotron 3 Ultra

Alternatives to NVIDIA: Nemotron 3 Ultra

NVIDIA: Nemotron 3 Ultra is good. These would like a word anyway.

NVIDIA: Nemotron 3 Ultra has opinions about your brand too.Ask 10 models what they tell people about you. Verbatim receipts.Run the mirror

NVIDIA: Nemotron 3 Ultra

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it supports text input and output with a context window of up to 1M tokens. It is suited for long-running agentic workflows, including agent orchestration, coding agents, deep research, and complex multi-step reasoning.

Provider

Release Date

2026-06-04

Size

XLARGE

Pricing

In: $0.00/1M

Out: $0.00/1M

Rival’s tab

$0.44

58/58 receipts survived

Get API accessProvider and language code samples

Provider

fromimport openai  OpenAI

client = OpenAI(
"https://openrouter.ai/api/v1"    base_url=,
"$OPENROUTER_API_KEY"    api_key=,
)

response = client.chat.completions.create(
"nvidia/nemotron-3-ultra-550b-a55b"    model=,
"role""user""content""Hello!"    messages=[{: , : }],
)
print(response.choices[0].message.content)

Set OPENROUTER_API_KEY with your OpenRouter API key from openrouter.ai/keys.

Deep analysisPsychometrics, taste index, writing DNA

SubjectiveBench

Taste Index

Across 58 scored outputs

420.42xThe default

0100headroom →

Taste is judged on an uncapped scale, originality first. The space past 100 is craft today's models rarely reach.

Craft52

Originality37

Plays it safe

share of outputs that are the default answer

24%

Writing DNA

Stylometric Fingerprint

Based on 26 text responses

Tick = global average

Vocabulary Diversity59%

Unique words vs. total words. Higher = richer vocabulary.

Sentence Length15.7 words

Average words per sentence.

Hedging0.26

"Might", "perhaps", "arguably" per 100 words.

Bold Formatting7.7

**Bold** markers per 1,000 characters.

List Usage2.7

Bullet and numbered list items per 1,000 characters.

Section Structure0.75

Markdown headings per 1,000 characters.

Emoji Usage0.13

Emoji per 1,000 characters.

Transitions0.04

"However", "moreover", "furthermore" per 100 words.

Opening Habits

Consistency

79%

Across 26 responses

Favorites

Movie

The Shawshank Redemption

Album

The Dark Side of the Moon

Book

To Kill a Mockingbird

City

Kyoto

Game

The Legend of Zelda: Ocarina of Time

Sponsored

Model Responses

53 outputs · $0.44 tracked across 53 receipts

Related Models

NVIDIA: Nemotron 3.5 Content Safety

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting text and image input and returning text output: a safe/unsafe classification for the user prompt and the response, safety category labels, and an optional reasoning trace. It covers 12 languages with a 128K context window.

NVIDIA Nemotron Nano 9B V2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, designed as a unified model for reasoning and non-reasoning tasks. It can expose an internal reasoning trace and then produce a final answer, or be configured via system prompt to only provide final answers without intermediate traces.

NVIDIA Nemotron 3 Super (free)

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Fully open with weights, datasets, and recipes under the NVIDIA Open License.

Keep exploring

NVIDIA: Nemotron 3 Ultra vs MiniMax M3

Real outputs compared side by side

Best AI for Complex Reasoning

Which AI reasons best under pressure? Ranked across 11 challenges: contracts,...

NVIDIA: Nemotron 3 Ultra

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it supports text input and output with a context window of up to 1M tokens. It is suited for long-running agentic workflows, including agent orchestration, coding agents, deep research, and complex multi-step reasoning.

Provider

Release Date

2026-06-04

Size

XLARGE

Pricing

In: $0.00/1M

Out: $0.00/1M

Rival’s tab

$0.44

58/58 receipts survived

Get API accessProvider and language code samples

Provider

fromimport openai  OpenAI

client = OpenAI(
"https://openrouter.ai/api/v1"    base_url=,
"$OPENROUTER_API_KEY"    api_key=,
)

response = client.chat.completions.create(
"nvidia/nemotron-3-ultra-550b-a55b"    model=,
"role""user""content""Hello!"    messages=[{: , : }],
)
print(response.choices[0].message.content)

Set OPENROUTER_API_KEY with your OpenRouter API key from openrouter.ai/keys.

Deep analysisPsychometrics, taste index, writing DNA

SubjectiveBench

Taste Index

Across 58 scored outputs

420.42xThe default

0100headroom →

Taste is judged on an uncapped scale, originality first. The space past 100 is craft today's models rarely reach.

Craft52

Originality37

Plays it safe

share of outputs that are the default answer

24%

Writing DNA

Stylometric Fingerprint

Based on 26 text responses

Tick = global average

Vocabulary Diversity59%

Unique words vs. total words. Higher = richer vocabulary.

Sentence Length15.7 words

Average words per sentence.

Hedging0.26

"Might", "perhaps", "arguably" per 100 words.

Bold Formatting7.7

**Bold** markers per 1,000 characters.

List Usage2.7

Bullet and numbered list items per 1,000 characters.

Section Structure0.75

Markdown headings per 1,000 characters.

Emoji Usage0.13

Emoji per 1,000 characters.

Transitions0.04

"However", "moreover", "furthermore" per 100 words.

Opening Habits

Consistency

79%

Across 26 responses

Favorites

Movie

The Shawshank Redemption

Album

The Dark Side of the Moon

Book

To Kill a Mockingbird

City

Kyoto

Game

The Legend of Zelda: Ocarina of Time

Sponsored

Model Responses

53 outputs · $0.44 tracked across 53 receipts

Related Models

NVIDIA: Nemotron 3.5 Content Safety

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting text and image input and returning text output: a safe/unsafe classification for the user prompt and the response, safety category labels, and an optional reasoning trace. It covers 12 languages with a 128K context window.

NVIDIA Nemotron Nano 9B V2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, designed as a unified model for reasoning and non-reasoning tasks. It can expose an internal reasoning trace and then produce a final answer, or be configured via system prompt to only provide final answers without intermediate traces.

NVIDIA Nemotron 3 Super (free)

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token generation compared to leading open models. The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Fully open with weights, datasets, and recipes under the NVIDIA Open License.

Keep exploring

NVIDIA: Nemotron 3 Ultra vs MiniMax M3

Real outputs compared side by side