Few-shot learning done right

Few-shot examples are the second-strongest lever in prompt engineering, after a clear task description. Used poorly, they bias the model toward the examples themselves.

What few-shot prompting actually is

You show the model 2-5 input/output pairs that demonstrate the pattern you want, then give it a new input. It "learns" the format, tone, and approach from the examples — no fine-tuning, no training. Everything happens in context.

This works because the model's next-token predictions are conditioned on everything in the context, including your examples. In effect, it infers: "the pattern is that after an input like this, an output like that follows."
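A minimal sketch of how such a prompt might be assembled, assuming a plain-text layout: an instruction, the demonstration pairs, then the new input with an empty output slot for the model to continue. The `Input:`/`Output:` labels and the `build_few_shot_prompt` helper are hypothetical conventions for illustration, not a required format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, demonstration pairs, then the new input.

    `examples` is a list of (input, output) tuples. The "Input:"/"Output:"
    labels are an illustrative convention (hypothetical), not a required format.
    """
    parts = [instruction.strip(), ""]
    for inp, out in examples:
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")  # blank line separates demonstrations
    # The new input reuses the same labels; the model continues after "Output:".
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [
        ("The battery lasts all day.", "positive"),
        ("Stopped working after a week.", "negative"),
    ],
    "Setup took five minutes and it just works.",
)
print(prompt)
```

The prompt ends on a bare `Output:` so the most natural continuation is a label in the same form as the demonstrations.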

Choosing examples that teach

Your examples should cover:

The shape of a typical input. Not too simple, not too weird.
A tricky case. An edge case where naive pattern-matching would fail. It shows the model where the nuance lives.
A near-miss. An input that looks like one class but is actually another. Forces the model to pay attention.

Three examples that cover these three cases often outperform twelve chosen at random.
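One way to keep yourself honest about coverage is to tag each example with the role it plays and check that all three roles are present. This is a sketch for a sentiment task; the role tags, the review texts, and the `missing_roles` helper are all hypothetical.

```python
# Hypothetical role tags mirroring the three cases: typical, tricky, near-miss.
REQUIRED_ROLES = {"typical", "tricky", "near_miss"}

examples = [
    {"role": "typical", "input": "Great screen, fast shipping.", "output": "positive"},
    # Tricky: sarcasm, where positive words carry a negative meaning.
    {"role": "tricky", "input": "Oh great, it broke on day one. Love that.", "output": "negative"},
    # Near-miss: mentions a problem, so it looks negative, but the verdict is positive.
    {"role": "near_miss", "input": "Box was dented, but the blender itself is excellent.", "output": "positive"},
]


def missing_roles(examples):
    """Return the required roles that no example covers."""
    covered = {ex["role"] for ex in examples}
    return REQUIRED_ROLES - covered


print(missing_roles(examples))  # set()
print(missing_roles(examples[:1]))  # the tricky and near-miss roles are missing
```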

The bias trap

If all three of your examples are labeled "positive" and you ask the model to classify a new input, it will lean positive more than it should. To fix this:

Balance the classes. If you're doing binary classification, have roughly equal positive and negative examples.
Vary length, topic, and style across examples — otherwise the model over-fits to those surface features.
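Randomize example order if you're evaluating on a large set, so that no single label consistently sits in the last position, right before the new input.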
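The first and third fixes can be automated when you draw examples from a labeled pool. A minimal sketch, assuming the pool is a list of `(input, label)` tuples; `select_balanced` is a hypothetical helper, and it does not address the second fix (varying length, topic, and style), which still needs human judgment when curating the pool.

```python
import random
from collections import defaultdict


def select_balanced(pool, per_class, seed=None):
    """Pick `per_class` examples for each label in `pool`, then shuffle them.

    Shuffling keeps any one label from always landing in the final slot.
    Pass a fixed `seed` for reproducible prompts, or vary it per query.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for inp, label in pool:
        by_label[label].append((inp, label))

    chosen = []
    for label, items in sorted(by_label.items()):
        if len(items) < per_class:
            raise ValueError(f"only {len(items)} examples for label {label!r}")
        chosen.extend(rng.sample(items, per_class))  # sample without replacement
    rng.shuffle(chosen)
    return chosen


pool = [
    ("Love it", "positive"),
    ("Works great", "positive"),
    ("Five stars", "positive"),
    ("Broke fast", "negative"),
    ("Refund please", "negative"),
]
shots = select_balanced(pool, per_class=2, seed=0)
print(shots)
```

Note that the pool is skewed three to two toward positive, but the selected shots are two of each.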

Input/output formatting

Make the format of your examples identical to what you want in the response. The model will mimic the format you show it. Watch for:

Trailing newlines (present or absent consistently).
Quotation style (" vs ' vs backticks).
Length of outputs (if your examples are 30 words, the output will tend toward roughly 30 words).
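A simple way to keep these details consistent is to render every example through a single template rather than typing each one by hand, and to check the average output length you are implicitly asking for. A sketch, assuming a triage task; the template text and both helper names are hypothetical.

```python
# One template for every example, so quoting style and trailing newlines never drift.
TEMPLATE = 'Message: "{input}"\nSummary: {output}\n'


def render_examples(examples):
    """Render each (input, output) pair through TEMPLATE, separated by a blank line."""
    return "\n".join(TEMPLATE.format(input=i, output=o) for i, o in examples)


def mean_output_words(examples):
    """Average word count of the example outputs: a rough guide to the
    response length the model is likely to drift toward."""
    return sum(len(o.split()) for _, o in examples) / len(examples)


examples = [
    ("The app crashes when I upload photos.", "Bug report: crash on photo upload."),
    ("Please add dark mode.", "Feature request: dark mode."),
]
rendered = render_examples(examples)
print(rendered)
print(mean_output_words(examples))  # 5.0
```

If the average is far from the length you actually want, change the examples rather than adding a length instruction that contradicts them.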

Few-shot vs. description

Ask yourself: can I describe this task in one paragraph well enough that any competent human could do it from the description alone?

Yes? A clear description + zero examples often works.
No — the task is "I'll know it when I see it"? Few-shot is the right tool.

Example selection — static vs. dynamic

For production systems, you have two options:

Static examples baked into the prompt. Cheaper, simpler, good when the task is narrow.
Dynamic examples selected per-query by semantic similarity to the incoming input. More effective for diverse inputs but adds retrieval infrastructure.

Start static. Move to dynamic when you have enough varied traffic to justify it.
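Dynamic selection boils down to "score every stored example against the incoming input, keep the top k." The sketch below uses bag-of-words cosine similarity purely as a self-contained stand-in; a production system would typically swap `bag_of_words` for an embedding model and the linear scan for a vector index. All names here are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts: a crude stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_dynamic(pool, query, k=2):
    """Return the k (input, output) pairs whose inputs are most similar to `query`."""
    q = bag_of_words(query)
    ranked = sorted(pool, key=lambda ex: cosine(bag_of_words(ex[0]), q), reverse=True)
    return ranked[:k]


pool = [
    ("How do I reset my password?", "account"),
    ("My invoice shows the wrong amount.", "billing"),
    ("I was charged twice this month.", "billing"),
    ("The app crashes on startup.", "bug"),
]
query = "Why was my card charged twice?"
print(select_dynamic(pool, query, k=1))
# [('I was charged twice this month.', 'billing')]
```

The selected pairs would then be passed to whatever prompt-building step the static version already uses, so moving from static to dynamic changes only where the examples come from.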
