AI Glossary
Master the language of AI with our comprehensive glossary. Each term includes practical examples to accelerate your understanding.
Last Updated: June 2, 2025
Temperature
Controls the randomness of model outputs. Lower values make responses more focused and predictable, while higher values encourage diversity and creativity.
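For instance, here is a minimal sketch of temperature-scaled sampling over raw logits (the logit values are invented for illustration):
```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=np.random.default_rng()):
    # Divide logits by temperature before softmax: T < 1 sharpens the
    # distribution, T > 1 flattens it toward uniform randomness.
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.1]                                 # hypothetical scores for three tokens
print(sample_with_temperature(logits, temperature=0.2))  # almost always token 0
print(sample_with_temperature(logits, temperature=2.0))  # much more varied
```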
Top-k sampling
When generating text, limits next-token choices to the k most likely options, cutting off the long tail of improbable tokens.
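A minimal sketch of the filtering step, again with invented logits:
```python
import numpy as np

def top_k_filter(logits, k):
    # Keep only the k highest-scoring tokens; mask the rest with -inf
    # so they receive zero probability after softmax.
    logits = np.asarray(logits, dtype=float)
    cutoff = np.sort(logits)[-k]
    return np.where(logits < cutoff, -np.inf, logits)

# With k=2, every token scoring below the second-highest logit is masked out.
print(top_k_filter([2.0, 1.0, 0.5, -1.0, -3.0], k=2))
```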
Top-p (nucleus) sampling
Chooses from the smallest set of tokens whose cumulative probability exceeds p, balancing coherence with diversity.
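A rough sketch of nucleus filtering over an already-computed probability distribution (the probabilities are made up):
```python
import numpy as np

def top_p_filter(probs, p=0.9):
    # Sort probabilities descending, keep the smallest prefix whose
    # cumulative mass reaches p, and zero out everything else.
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]                # token indices, most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.argmax(cumulative >= p)) + 1   # tokens needed to reach mass p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()               # renormalise to sum to 1

# Keeps the two most likely tokens and renormalises them (roughly [0.625, 0.375, 0, 0, 0]).
print(top_p_filter([0.5, 0.3, 0.1, 0.05, 0.05], p=0.7))
```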
Beam search
A decoding strategy that keeps multiple hypotheses at each step and selects the sequence with the highest overall probability.
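A toy sketch using a stand-in next-token model whose probabilities are invented:
```python
import numpy as np

def beam_search(step_probs, beam_width=2, length=3):
    # step_probs(sequence) returns a probability distribution over the next token.
    # Keep the beam_width highest-scoring partial sequences at every step.
    beams = [((), 0.0)]                      # (token sequence, log probability)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for token, p in enumerate(step_probs(seq)):
                candidates.append((seq + (token,), score + np.log(p)))
        # prune back down to the best beam_width hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]                          # highest-probability complete sequence

# Toy model: token 0 is slightly preferred, independent of context.
toy_model = lambda seq: np.array([0.6, 0.3, 0.1])
print(beam_search(toy_model))                # ((0, 0, 0), log probability ~ -1.53)
```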
Chain-of-thought
A prompting technique that encourages a model to reason step by step before producing a final answer.
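For example, a prompt like the following (the word problem is invented) appends a trigger phrase that invites step-by-step reasoning:
```python
prompt = (
    "Q: A shop sells pens in packs of 12. If Maria buys 4 packs and gives away "
    "7 pens, how many pens does she have left?\n"
    "Let's think step by step."    # the chain-of-thought trigger phrase
)
```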
Self-consistency
A decoding method that samples multiple reasoning paths and chooses the answer that appears most frequently among them.
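A minimal sketch, assuming generate_answer is a stand-in for a sampled call to your model:
```python
import random
from collections import Counter

def self_consistency(generate_answer, prompt, samples=5):
    # generate_answer is assumed to sample one reasoning path at a non-zero
    # temperature and return only the final answer it arrives at.
    answers = [generate_answer(prompt) for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]   # majority-vote answer

# Stand-in model whose reasoning paths occasionally disagree.
fake_model = lambda _: random.choice(["41", "41", "41", "39"])
print(self_consistency(fake_model, "How many pens are left?"))   # usually "41"
```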
Logit bias
Adding offsets to specific tokens' logits (their pre-softmax scores) to make those tokens more or less likely to appear in the output.
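A rough illustration of applying a bias before sampling (the token ids and bias values are invented):
```python
import numpy as np

def apply_logit_bias(logits, bias):
    # bias maps token ids to additive adjustments: large negative values
    # effectively ban a token, large positive values strongly favour it.
    logits = np.asarray(logits, dtype=float).copy()
    for token_id, value in bias.items():
        logits[token_id] += value
    return logits

logits = [1.2, 0.8, 0.3]                                  # hypothetical scores for three tokens
print(apply_logit_bias(logits, {2: 5.0, 0: -100.0}))      # favour token 2, ban token 0
```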
Token
A basic unit of text processed by language models, which can represent characters, words, or sub‑words.
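As a concrete illustration, assuming the tiktoken library is installed, you can inspect how a sentence is split into token ids:
```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")     # tokenizer used by several OpenAI models
ids = enc.encode("Tokenization splits text into sub-word pieces.")
print(ids)                                     # a list of integer token ids
print([enc.decode([i]) for i in ids])          # the text fragment behind each id
```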
Prompt engineering
Crafting input prompts in a way that elicits the desired behaviour from a model.
RLHF
Reinforcement Learning from Human Feedback; fine-tuning a model with reinforcement learning against a reward model trained on human preference data.
Fine-tuning
Adapting a pretrained model on a smaller dataset so it excels at a specific task.
Context window
The maximum number of tokens a model can attend to at once, covering both the prompt and the output it generates.
Zero-shot
Providing a task with no examples and expecting the model to perform based solely on instructions.
Few-shot
Including a handful of examples in the prompt to show the desired style or format.
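The invented review snippets below contrast a zero-shot prompt with a few-shot one:
```python
zero_shot = (
    "Classify the sentiment of this review as positive or negative:\n"
    '"The battery died after two days."'
)

few_shot = (
    'Review: "Absolutely loved it, works perfectly." -> positive\n'
    'Review: "Broke within a week, very disappointed." -> negative\n'
    'Review: "The battery died after two days." ->'
)
```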
Embeddings
Numeric vector representations of text that capture semantic meaning for tasks like search or clustering.
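For example, cosine similarity between embedding vectors is a common relatedness measure; the tiny 3-dimensional vectors here are made up (real embeddings have hundreds or thousands of dimensions):
```python
import numpy as np

def cosine_similarity(a, b):
    # Embeddings that point in similar directions describe similar meanings.
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat, kitten, car = [0.9, 0.1, 0.0], [0.85, 0.2, 0.05], [0.0, 0.2, 0.95]
print(cosine_similarity(cat, kitten))   # close to 1.0
print(cosine_similarity(cat, car))      # much lower
```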
Gradient descent
Optimization algorithm that adjusts model weights to minimize training loss.
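A minimal worked example, minimising the toy loss L(w) = (w - 3)^2:
```python
# Gradient descent on a single weight: repeatedly step against the gradient.
w, learning_rate = 0.0, 0.1
for step in range(50):
    gradient = 2 * (w - 3)           # dL/dw
    w -= learning_rate * gradient    # move downhill on the loss surface
print(round(w, 4))                   # approaches 3, the minimum of the loss
```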
LoRA
Low‑Rank Adaptation; a parameter‑efficient fine‑tuning technique that inserts small trainable matrices into a frozen model.
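A minimal NumPy sketch of the idea; the dimensions, rank, and initialisation are chosen for illustration only:
```python
import numpy as np

d, r = 512, 8                          # hidden size and a small LoRA rank
W = np.random.randn(d, d)              # frozen pretrained weight matrix
A = np.random.randn(d, r) * 0.01       # trainable down-projection
B = np.zeros((r, d))                   # trainable up-projection, starts at zero

def lora_forward(x):
    # The frozen weight is untouched; only the low-rank update A @ B is trained,
    # which adds 2*d*r parameters instead of d*d.
    return x @ W + x @ A @ B

x = np.random.randn(1, d)
print(lora_forward(x).shape)           # (1, 512)
```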
Knowledge cutoff
The most recent date of information included in a model's training data.
Parameter
An individual weight value within a neural network; large language models contain billions of them.
Retrieval-Augmented Generation (RAG)
A technique that combines external information retrieval with generation, giving models fresh, task-specific context.
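A rough sketch of the retrieve-then-prompt flow, with a toy bag-of-words embedding standing in for a real embedding model:
```python
import numpy as np

def retrieve(query_embedding, documents, embed, top_n=1):
    # Rank documents by embedding similarity to the query and keep the best.
    scores = [float(np.dot(query_embedding, embed(doc))) for doc in documents]
    ranked = sorted(zip(scores, documents), reverse=True)
    return [doc for _, doc in ranked[:top_n]]

def build_rag_prompt(question, documents, embed):
    # Retrieved text is pasted into the prompt as grounding context.
    context = "\n".join(retrieve(embed(question), documents, embed))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

# Toy embedding over a tiny vocabulary; a real system would call an embedding model.
vocab = ["paris", "france", "capital", "flour", "bread"]
toy_embed = lambda text: np.array([text.lower().count(w) for w in vocab], dtype=float)

docs = ["Paris is the capital of France.", "Bread is made from flour."]
print(build_rag_prompt("What is the capital of France?", docs, toy_embed))
```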
Hallucination
When a model confidently produces information that is plausible-sounding but factually incorrect or nonexistent.
System prompt
A hidden instruction that sets the overarching behaviour, tone, or constraints for a chat model.
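Most chat APIs express this as a leading message with a system role; a generic sketch using the common role/content message shape (not any one vendor's exact API):
```python
# The system message is not shown to end users but shapes every reply.
messages = [
    {"role": "system", "content": "You are a concise assistant. Answer in at most two sentences."},
    {"role": "user", "content": "Explain what a context window is."},
]
```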
Prompt injection
An attack in which crafted user input overrides a model's hidden instructions or tricks the model into revealing them.
Alignment
The extent to which an AI system's behaviour matches human intentions, ethical standards, and societal values.
Inference
The phase where a trained model generates outputs from inputs, as opposed to being trained or fine-tuned.
Attention
A mechanism in transformer models that lets them weigh the relevance of different tokens when generating each output.
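The core computation is scaled dot-product attention; a minimal NumPy version:
```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Each query scores every key; softmax turns the scores into weights
    # that decide how much of each value contributes to the output.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((4, 8))               # 4 tokens, 8-dimensional self-attention
print(scaled_dot_product_attention(Q, K, V).shape)    # (4, 8)
```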
Transformer
A neural network architecture based on self-attention that enables parallel processing, powering most modern language models.
Reranking
Reordering a list of candidate outputs (e.g., retrieved passages) using a specialised scoring model to surface the best ones.
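A minimal sketch, with a toy word-overlap scorer standing in for a trained reranking model:
```python
def rerank(query, passages, score):
    # score is a stand-in for a cross-encoder or other relevance model that
    # reads the query and a passage together and returns a relevance score.
    return sorted(passages, key=lambda passage: score(query, passage), reverse=True)

# Toy scorer: count shared words (a real reranker would use a trained model).
toy_score = lambda q, p: len(set(q.lower().split()) & set(p.lower().split()))
passages = ["The capital of France is Paris.", "Bread is made from flour."]
print(rerank("What is the capital of France?", passages, toy_score)[0])
```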
Model Context Protocol (MCP)
An open standard that lets AI applications connect to external data sources and tools through a common client-server interface, so models can be supplied with context and invoke capabilities in a consistent, reusable way.
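As a rough illustration, one kind of message a client might send under the protocol's JSON-RPC framing; the tool name and arguments here are invented:
```python
# A client asking a connected MCP server to run one of the tools it exposes.
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_docs",
        "arguments": {"query": "quarterly revenue"},
    },
}
```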