Prompt chaining vs. one-shot prompting
When to break a task apart and when to let the model handle it whole.
One prompt that does three things usually does none of them well. Chain-of-prompts is the antidote — but only if the chain is designed, not improvised.
When to chain
Break a single prompt into two or more calls when:
- The task has distinct phases (understand → decide → format).
- Earlier phases need more tokens than later ones (researching vs. writing).
- You want different models for different parts (a cheap model for the easy step, a frontier model for the hard one).
- You need user or system approval between steps.
When not to chain
- The task is latency-sensitive. Each extra call adds round-trip time.
- The model can do it in one call without quality loss. Chaining for its own sake adds error surface.
- Steps are tightly coupled — if step 2 needs most of step 1's reasoning context to make sense, just do it all in one prompt.
A useful shape: plan → act → verify
For moderately complex tasks where output quality matters:
- Plan. First call produces a structured plan (steps, approach). Cheap model is often fine.
- Act. Second call executes against the plan. Frontier model. Gets the plan as input.
- Verify. Third call checks the output against the plan. Judge-style. Cheap model.
Each step has a single clear task. Each can be debugged independently. Adding evals is straightforward.
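The plan → act → verify shape can be sketched as three calls wired together with plain functions. This is a minimal illustration, not a reference implementation: `call_model`, the model names `cheap-model` and `frontier-model`, and the prompt wording are all hypothetical, and `fake_model` stands in for a real API client so the sketch runs offline.

```python
from typing import Callable

# Hypothetical signature: (model_name, prompt) -> completion text.
ModelFn = Callable[[str, str], str]


def plan_act_verify(task: str, call_model: ModelFn) -> dict:
    """Run a three-call chain; each step has one job and explicit inputs."""
    # Plan: a cheap model produces the structured plan.
    plan = call_model("cheap-model", f"Write a numbered plan for this task:\n{task}")
    # Act: the stronger model executes against the plan.
    draft = call_model("frontier-model", f"Task:\n{task}\n\nFollow this plan:\n{plan}")
    # Verify: a judge-style call checks the output against the plan.
    verdict = call_model(
        "cheap-model",
        f"Plan:\n{plan}\n\nOutput:\n{draft}\n\n"
        "Does the output follow the plan? Answer PASS or FAIL.",
    )
    passed = verdict.strip().upper().startswith("PASS")
    return {"plan": plan, "draft": draft, "passed": passed}


def fake_model(model: str, prompt: str) -> str:
    """Stand-in for a real API call (assumption: replace with your client)."""
    if prompt.startswith("Write a numbered plan"):
        return "1. Outline\n2. Draft"
    if prompt.startswith("Task:"):
        return "Outline, then draft."
    return "PASS"


result = plan_act_verify("Summarize the release notes", fake_model)
```

Because the model call is injected, each step can be exercised on its own with a stub, which is what makes per-step debugging and evals straightforward.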
Token accounting
Every call in a chain re-sends the context its step needs, so total token usage grows quickly. Three tactics:
- Summarize between steps. Don't carry the full plan forward — carry a compact version.
- Pick which intermediate artifacts travel. Intermediate reasoning often shouldn't.
- Cache shared context. OpenAI, Anthropic, and Gemini all now offer prompt caching for repeated system prompts or large shared context. This matters most in chains, where the same prefix is sent on every call.
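The first two tactics amount to deciding, explicitly, what the next step receives. The sketch below is one way to express that, under stated assumptions: the artifact names (`reasoning`, `plan_summary`) are hypothetical, and `estimate_tokens` is a crude 4-characters-per-token heuristic rather than a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Crude heuristic (about 4 characters per token); use a real tokenizer in practice."""
    return max(1, len(text) // 4)


def build_handoff(artifacts: dict[str, str], travel: set[str]) -> str:
    """Keep only the artifacts named in `travel`; everything else stays behind."""
    kept = {name: text for name, text in artifacts.items() if name in travel}
    return "\n\n".join(f"## {name}\n{text}" for name, text in kept.items())


# Hypothetical outputs of an earlier step.
artifacts = {
    "reasoning": "long scratch work " * 200,
    "plan_summary": "1. Outline 2. Draft 3. Review",
}

full = build_handoff(artifacts, {"reasoning", "plan_summary"})
compact = build_handoff(artifacts, {"plan_summary"})
print(estimate_tokens(full), estimate_tokens(compact))
```

Making the travel list an explicit argument keeps the choice visible in code review instead of buried in string concatenation.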
Orchestration frameworks
LangGraph, Mastra, Inngest, Vercel AI SDK, and custom plain TypeScript are all reasonable ways to wire up chains. Pick based on:
- Need for durability? Inngest, Temporal, or similar — chain steps survive crashes.
- Need for streaming? Vercel AI SDK is among the cleanest.
- Custom graph of steps? LangGraph is purpose-built.
- Simple linear chain? Plain functions, no framework.
The most common failure mode is adopting a framework before the chain is well understood. Start as a flat script; extract to a framework once the pattern stabilizes.
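A flat script for a simple linear chain might look like the following. It is a sketch only: the step functions here are hypothetical stand-ins that transform strings locally, where a real chain would wrap one model call per step.

```python
from typing import Callable

Step = Callable[[str], str]


def run_chain(
    steps: list[tuple[str, Step]], initial: str
) -> tuple[str, list[tuple[str, str]]]:
    """Feed each step's output into the next and keep a trace for debugging."""
    value = initial
    trace: list[tuple[str, str]] = []
    for name, step in steps:
        value = step(value)
        trace.append((name, value))
    return value, trace


# Stand-in steps (assumption: each would wrap a single model call).
def extract(text: str) -> str:
    return text.strip().lower()


def decide(text: str) -> str:
    return "refund" if "refund" in text else "other"


def format_reply(label: str) -> str:
    return f"category={label}"


final, trace = run_chain(
    [("extract", extract), ("decide", decide), ("format", format_reply)],
    "  I want a REFUND  ",
)
```

The trace records every intermediate value, which is usually enough observability to learn the chain's real shape before committing to a framework.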
Where chains quietly fail
- Error handling. If step 2 returns junk, step 3 is garbage-in-garbage-out. Add validation between steps.
- Drift in prompts. Each step's prompt evolves separately over time, and the steps fall out of sync. Treat chain prompts like interdependent code modules: version and review them together.
- Hidden dependencies. Step 3's prompt expects step 1 to have produced a specific format. If step 1's prompt changes, step 3 silently breaks. Document the contract.
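One way to make the contract between steps explicit is to validate each step's output before the next step sees it. The sketch below assumes the plan step returns JSON with `steps` and `approach` fields; `PLAN_CONTRACT`, `ContractError`, and `validate_plan` are hypothetical names chosen for illustration.

```python
import json

# Hypothetical contract: the fields step 1 promises and later steps rely on.
PLAN_CONTRACT = {"steps": list, "approach": str}


class ContractError(ValueError):
    """Raised when a step's output does not match the documented contract."""


def validate_plan(raw: str) -> dict:
    """Parse a step's JSON output and check it against PLAN_CONTRACT."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractError(f"step output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ContractError("step output must be a JSON object")
    for key, expected_type in PLAN_CONTRACT.items():
        if not isinstance(data.get(key), expected_type):
            raise ContractError(f"missing or mistyped field: {key}")
    return data


good = validate_plan('{"steps": ["outline", "draft"], "approach": "top-down"}')

try:
    # "steps" is a string and "approach" is missing, so this should be rejected.
    validate_plan('{"steps": "outline then draft"}')
    rejected = False
except ContractError:
    rejected = True
```

Failing loudly at the boundary turns a silent downstream break into an error that names the step and the field that changed.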
Check your understanding
2-question self-check
Q1. A good sign that a task SHOULD be chained into multiple prompts is…
Q2. What's a common failure mode of chained prompts?