Grounding with context: docs, examples, tool outputs

Hallucinations aren't fixed by asking the model to be more careful. They're fixed by giving the model the facts before it tries to generate them. That's grounding.

Three levels of grounding

Static grounding. Facts embedded directly in the prompt. Good for stable, small fact sets (brand guidelines, product specs, common policies).
Dynamic grounding (RAG). Facts retrieved per-query from a vector DB or search index. Good for large or frequently-changing knowledge (docs, support articles, internal wikis).
Tool-call grounding. Model calls a function to fetch facts on demand (database lookup, API call). Good for exact-answer data (account balances, inventory counts).

Most production systems combine all three.

Static grounding: how to do it well

Put facts before the task. Model sees context first, task second.
Separate clearly. Use <docs>, <context>, or markdown sections. Models follow instructions like "refer to the context" more reliably when the context is visibly delimited.
Don't pad. Irrelevant context degrades quality (the "lost in the middle" effect; attention dilution). If a fact isn't useful for this query, leave it out.
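A minimal sketch of the three rules above, assuming a plain-string prompt and the <docs> tag style mentioned in the text. The function name build_grounded_prompt and the sample refund fact are hypothetical. Deciding which facts are relevant is assumed to happen upstream; the function itself only drops blank entries.

```python
from typing import Sequence


def build_grounded_prompt(facts: Sequence[str], task: str) -> str:
    """Build a prompt with delimited facts first and the task last."""
    # Drop blank entries; choosing which facts are relevant happens upstream.
    kept = [fact.strip() for fact in facts if fact.strip()]
    # Wrap each fact in its own tag so the context is visibly delimited.
    docs = "\n".join(f"<doc>{fact}</doc>" for fact in kept)
    # Facts come first, then the instruction and the task.
    return (
        "<docs>\n"
        f"{docs}\n"
        "</docs>\n\n"
        "Using only the facts in <docs>, complete the task below.\n"
        f"Task: {task}"
    )


prompt = build_grounded_prompt(
    ["Refunds are accepted within 30 days.", "   "],  # hypothetical facts
    "Can I return an item after 45 days?",
)
print(prompt)
```

Keeping the task at the end means the instruction is the last thing the model reads, after all the facts it should use.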

Dynamic grounding (RAG) essentials

RAG is a whole discipline. Here's the 20% that matters most for an intro:

Embeddings often matter more than the LLM. Bad retrieval → bad answers, and the embedding model decides what "relevant" means.
Chunking affects everything. Chunk size, overlap, and whether you split at a fixed length or at semantic boundaries (paragraph, section) all affect quality. Default to semantic chunks at paragraph or section level.
Top-K is a calibration choice. Retrieve too few and you miss relevant context; retrieve too many and you dilute the prompt. Start at 5 and tune from there.
Re-rank. After initial retrieval, run a cross-encoder re-ranker to pick the best subset. It's often an easy relevance win, sometimes in the +10-20% range; measure it on your own queries.
Log what got retrieved. When an answer is wrong, you need to see whether retrieval failed or the model failed. Can't debug what you don't log.
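The chunking, top-K, and logging points can be sketched end to end. This is a toy, not a production pipeline: embed() is a bag-of-words stand-in for a real embedding model, an in-memory cosine-similarity sort replaces the vector DB, and all function names are hypothetical. A cross-encoder re-ranker would slot in right after retrieve(), rescoring its candidates; it's omitted because it needs a trained model.

```python
import logging
import math
import re
from collections import Counter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("retrieval")


def chunk_by_paragraph(text: str) -> list[str]:
    """Semantic chunking: split on blank lines so each chunk is one paragraph."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the top-k chunks by cosine similarity, logging each hit."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks),
                    key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    # Log what got retrieved so a wrong answer can be traced to retrieval or model.
    for score, chunk in top:
        log.info("query=%r score=%.3f chunk=%r", query, score, chunk[:60])
    return [chunk for _, chunk in top]


doc = (
    "Refunds are accepted within 30 days of purchase.\n\n"
    "Shipping is free on orders over $50.\n\n"
    "Gift cards cannot be refunded."
)
chunks = chunk_by_paragraph(doc)
print(retrieve("how many days for refunds", chunks, k=1))
# → ['Refunds are accepted within 30 days of purchase.']
```

Note that "refunded" and "refunds" don't match under this toy embedding; a real embedding model is what makes those count as related.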

Tool-call grounding

For any fact that has an authoritative answer somewhere, don't ask the model to remember it. Give it a tool:

"What's my balance?" → get_account_balance(user_id)
"Which region is this order shipping to?" → get_order_details(order_id)
"Current stock price?" → get_quote(ticker)

Tool calls have failure modes too (wrong function, wrong arguments, silent fallbacks). But they eliminate one category of hallucination (recalling facts from memory), and because every answer traces back to a logged call, audit trails become straightforward.
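A sketch of a tool-call dispatcher that guards against the failure modes listed above. The JSON shape {"name": ..., "arguments": {...}} is an assumption (provider APIs differ), and the in-memory balance table is hypothetical; get_account_balance reuses the example name from the list above. Unknown tools and bad arguments return explicit errors instead of silently falling back.

```python
import json
from typing import Any, Callable

# Hypothetical in-memory table standing in for an authoritative accounts database.
_BALANCES = {"u-123": 250.75}


def get_account_balance(user_id: str) -> float:
    """Authoritative lookup; raises instead of silently returning a default."""
    if user_id not in _BALANCES:
        raise KeyError(f"unknown user_id: {user_id}")
    return _BALANCES[user_id]


# Registry of the only functions the model is allowed to call.
TOOLS: dict[str, Callable[..., Any]] = {"get_account_balance": get_account_balance}


def dispatch(tool_call_json: str) -> dict[str, Any]:
    """Run a tool call shaped like {"name": ..., "arguments": {...}} (assumed format)."""
    call = json.loads(tool_call_json)
    name, args = call.get("name"), call.get("arguments", {})
    if name not in TOOLS:
        # Wrong function: report it rather than guessing a substitute.
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        result = TOOLS[name](**args)
    except (KeyError, TypeError) as exc:
        # Wrong or missing arguments, or an unknown record: surface the error.
        return {"ok": False, "error": str(exc)}
    # Audit trail: every successful call is recorded with its inputs and output.
    print(f"AUDIT tool={name} args={args} result={result}")
    return {"ok": True, "result": result}


print(dispatch('{"name": "get_account_balance", "arguments": {"user_id": "u-123"}}'))
print(dispatch('{"name": "get_quote", "arguments": {"ticker": "ACME"}}'))
```

The error dicts go back to the model as the tool result, so it can tell the user the lookup failed instead of inventing a number.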

The "refusal when ungrounded" pattern

Add an explicit rule: "If the answer isn't in the provided context, say you don't know. Never guess."

This doesn't work perfectly (models still occasionally make things up), but it's a strong signal that shifts the output distribution toward honest refusals.
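A sketch of the pattern, assuming call_model is a hypothetical wrapper around your LLM client that takes a system prompt and a user message and returns text. The refusal rule is the one quoted above. The empty-context short-circuit is an extra, code-level guard added here, not part of the prompt pattern itself: if retrieval returns nothing, there is nothing to ground on, so the code refuses without calling the model.

```python
from typing import Callable

REFUSAL_RULE = "If the answer isn't in the provided context, say you don't know. Never guess."
FALLBACK = "I don't know. The provided documents don't cover this."


def build_system_prompt() -> str:
    """System prompt that scopes answers to the context and adds the refusal rule."""
    return "Answer using only the text inside <context> tags.\n" + REFUSAL_RULE


def answer(question: str, context_chunks: list[str],
           call_model: Callable[[str, str], str]) -> str:
    # Code-level guard: nothing retrieved means nothing to ground on,
    # so refuse without spending a model call.
    if not context_chunks:
        return FALLBACK
    context = "\n".join(context_chunks)
    user_msg = f"<context>\n{context}\n</context>\n\nQuestion: {question}"
    return call_model(build_system_prompt(), user_msg)


def echo_model(system: str, user: str) -> str:
    """Stub standing in for a real model call; echoes what it was sent."""
    return f"[system]\n{system}\n[user]\n{user}"


print(answer("What is the refund window?", [], echo_model))
print(answer("What is the refund window?", ["Refunds are accepted within 30 days."], echo_model))
```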

What kills grounding

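Stuffing too much context. Models use facts buried in the middle of a long context less reliably than facts near the start or end (the "lost in the middle" effect), so extra context erodes grounding quality.
Conflicting context. Two docs state contradictory facts; the model picks one, and you don't know which. Deduplicate and resolve conflicts upstream, before indexing.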
Forgetting to freshen. RAG indexes go stale. Schedule re-indexing.
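A sketch of upstream cleanup for the last two points, under assumptions: each document carries a stable doc_id and an updated_at timestamp, the Doc type and sample data are hypothetical, and the 180-day staleness window is arbitrary. It keeps the newest version per doc_id, drops exact duplicates after case and whitespace normalization, and lists stale documents to re-index. It does not detect contradictions between different documents; those still need a review step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Doc:
    doc_id: str          # stable identifier for the source document
    text: str
    updated_at: datetime


def latest_versions(docs: list[Doc]) -> list[Doc]:
    """Keep only the most recently updated version of each doc_id."""
    latest: dict[str, Doc] = {}
    for d in docs:
        if d.doc_id not in latest or d.updated_at > latest[d.doc_id].updated_at:
            latest[d.doc_id] = d
    return list(latest.values())


def drop_exact_duplicates(docs: list[Doc]) -> list[Doc]:
    """Drop docs whose case- and whitespace-normalized text was already seen."""
    seen: set[str] = set()
    out: list[Doc] = []
    for d in docs:
        key = " ".join(d.text.lower().split())
        if key not in seen:
            seen.add(key)
            out.append(d)
    return out


def stale(docs: list[Doc], max_age: timedelta, now: datetime) -> list[str]:
    """Return doc_ids older than max_age: candidates for re-indexing."""
    return [d.doc_id for d in docs if now - d.updated_at > max_age]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
docs = [
    Doc("refund-policy", "Refunds within 14 days.", datetime(2023, 1, 1, tzinfo=timezone.utc)),
    Doc("refund-policy", "Refunds within 30 days.", datetime(2024, 5, 1, tzinfo=timezone.utc)),
    Doc("shipping", "Free shipping over $50.", datetime(2023, 1, 1, tzinfo=timezone.utc)),
    Doc("shipping-copy", "free  shipping over $50.", datetime(2024, 5, 20, tzinfo=timezone.utc)),
]
clean = drop_exact_duplicates(latest_versions(docs))
print([d.doc_id for d in clean])               # → ['refund-policy', 'shipping']
print(stale(clean, timedelta(days=180), now))  # → ['shipping']
```

Here the outdated 14-day refund policy never reaches the index, which removes one source of conflicting context before the model ever sees it.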

