Few-shot learning done right
How to pick examples that teach, not just pad.
Few-shot examples are the second-strongest lever in prompt engineering after a clear task description. Used poorly, though, they bias the model toward the surface features of the examples themselves.
What few-shot prompting actually is
You show the model 2-5 input/output pairs that demonstrate the pattern you want, then give it a new input. It "learns" the format, tone, and approach from the examples — no fine-tuning, no training. Everything happens in context.
This works because the model predicts each next token conditioned on everything in its context, your examples included. In effect it infers: "ah, the pattern is that after an input like this, an output like that follows."
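As a rough illustration of the idea above, here is a minimal Python sketch that assembles a few-shot prompt as plain text. The function name `build_few_shot_prompt`, the `Input:`/`Output:` labels, and the sample reviews are all assumptions made for this example, not a required format. The resulting string would be sent to whatever model API you use.

```python
def build_few_shot_prompt(task, examples, new_input):
    """Assemble a task description, demonstration pairs, and the new input.

    The "Input:"/"Output:" labels are an illustrative convention only.
    """
    parts = [task, ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")
    # End on an open "Output:" so the model's continuation is the answer.
    parts.append(f"Input: {new_input}")
    parts.append("Output:")
    return "\n".join(parts)


# Hypothetical demonstration pairs: a typical case, a clear negative,
# and a near-miss that opens politely but is a complaint.
EXAMPLES = [
    ("The battery lasts all day.", "positive"),
    ("It stopped charging after a week.", "negative"),
    ("Not bad, but I expected more for the price.", "negative"),
]

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    EXAMPLES,
    "Setup took five minutes and it just works.",
)
print(prompt)
```

Because every pair is rendered by the same loop, the demonstrations and the final open slot share one format, which is what lets the model continue the pattern.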
Choosing examples that teach
Your examples should cover:
- The shape of a typical input. Not too simple, not too weird.
- A tricky case. An edge where naive pattern-matching would fail. Shows the model where the nuance lives.
- A near-miss. An input that looks like one class but is actually another. Forces the model to pay attention.
Three examples that cover these three cases tend to outperform twelve random ones.
The bias trap
If all three of your examples are labeled "positive sentiment" and you ask the model to classify a new input, it'll lean positive more than it should. Fix:
- Balance the classes. If you're doing binary classification, have roughly equal positive and negative examples.
- Vary length, topic, and style across examples — otherwise the model over-fits to those surface features.
- Randomize the order of the examples across runs if you're evaluating on a large set, so no single ordering skews the results.
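The balancing and shuffling steps can be sketched in a few lines of Python. This is one possible approach, assuming the examples live in a labeled pool of `(text, label)` tuples. The helper name `balanced_sample` and the sample pool are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(pool, per_class, seed=None):
    """Pick the same number of examples from each class, then shuffle.

    pool is a list of (text, label) tuples. A seed makes the choice
    reproducible; vary it per evaluation run to vary selection and order.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in pool:
        by_label[label].append((text, label))

    chosen = []
    for label, items in by_label.items():
        if len(items) < per_class:
            raise ValueError(f"not enough examples for class {label!r}")
        chosen.extend(rng.sample(items, per_class))

    # Shuffle so examples of one class do not sit together in the prompt.
    rng.shuffle(chosen)
    return chosen


# Hypothetical labeled pool, for illustration only.
POOL = [
    ("Great screen.", "positive"),
    ("Love the keyboard.", "positive"),
    ("Fast delivery.", "positive"),
    ("Broke in a week.", "negative"),
    ("Too loud.", "negative"),
    ("Refund took months.", "negative"),
]

shots = balanced_sample(POOL, per_class=2, seed=0)
print(Counter(label for _, label in shots))
```

This sketch only balances labels. Varying length, topic, and style still has to be handled when you build the pool.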
Input/output formatting
Make the format of your examples identical to what you want in the response. The model will mimic the format you show it. Watch for:
- Trailing newlines (present or absent consistently).
- Quotation style (" vs ' vs backticks).
- Length of outputs (if examples are 30 words, the output will be 30 words).
Few-shot vs. description
Ask yourself: can I describe this task in one paragraph well enough that any competent human could do it from the description alone?
- Yes? A clear description + zero examples often works.
- No — the task is "I'll know it when I see it"? Few-shot is the right tool.
Example selection — static vs. dynamic
For production systems, you have two options:
- Static examples baked into the prompt. Cheaper, simpler, good when the task is narrow.
- Dynamic examples selected per-query by semantic similarity to the incoming input. More effective for diverse inputs but adds retrieval infrastructure.
Start static. Move to dynamic when you have enough varied traffic to justify it.