Prompt chaining vs. one-shot prompting

One prompt that does three things usually does none of them well. Chain-of-prompts is the antidote — but only if the chain is designed, not improvised.

When to chain

Break a single prompt into two or more calls when:

The task has distinct phases (understand → decide → format).
Phases have very different token budgets, for example a research phase that reads long sources followed by a much shorter writing phase.
You want different models for different parts (cheap model for easy step, frontier model for hard step).
You need user or system approval between steps.

When not to chain

The task is latency-sensitive. Each extra call adds round-trip time.
The model can do it in one call without quality loss. Chaining for its own sake adds error surface.
Steps are tightly coupled — if step 2 needs most of step 1's reasoning context to make sense, just do it all in one prompt.

A useful shape: plan → act → verify

For moderately complex tasks where output quality matters:

Plan. First call produces a structured plan (steps, approach). Cheap model is often fine.
Act. Second call executes against the plan. Frontier model. Gets the plan as input.
Verify. Third call checks the output against the plan. Judge-style. Cheap model.

Each step has a single clear task. Each can be debugged independently. Adding evals is straightforward.
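
The three calls can be sketched as one small function. This is a minimal illustration, not a reference implementation: `CallModel`, `ChainModels`, and `planActVerify` are hypothetical names, the prompts are placeholders, and the model callers are injected so that no particular vendor SDK is assumed.

```typescript
// Hypothetical signature: sends a prompt to some model and resolves to its text.
type CallModel = (prompt: string) => Promise<string>;

interface ChainModels {
  cheap: CallModel; // used for the plan and verify steps
  frontier: CallModel; // used for the act step
}

interface ChainResult {
  plan: string;
  output: string;
  verdict: "pass" | "fail";
}

async function planActVerify(
  task: string,
  models: ChainModels
): Promise<ChainResult> {
  // Plan: ask the cheap model for a short, structured plan.
  const plan = await models.cheap(
    `Write a numbered plan (max 5 steps) for this task:\n${task}`
  );

  // Act: the frontier model executes against the plan, which is passed in.
  const output = await models.frontier(
    `Task:\n${task}\n\nPlan:\n${plan}\n\nCarry out the plan and return only the result.`
  );

  // Verify: judge-style check of the output against the plan.
  const raw = await models.cheap(
    `Plan:\n${plan}\n\nOutput:\n${output}\n\nDoes the output follow the plan? Answer PASS or FAIL.`
  );
  const verdict = /^PASS/i.test(raw.trim()) ? "pass" : "fail";

  return { plan, output, verdict };
}
```

Because the callers are injected, each step can be exercised with a stub that returns canned text, which is what makes the steps debuggable and testable in isolation.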

Token accounting

Every call in a chain re-sends whatever context it carries, so token usage grows quickly. Three tactics:

Summarize between steps. Don't carry the full plan forward — carry a compact version.
Pick which intermediate artifacts travel. Intermediate reasoning often shouldn't.
Cache shared context. OpenAI, Anthropic, and Gemini all now offer prompt caching for repeated system prompts or large shared context. Critical for chains.
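
The first two tactics can be sketched as a function that decides what travels to the next step. Everything here is an assumption made for illustration: the `StepArtifacts` shape, the `carryForward` helper, and the rule of thumb of roughly four characters per token, which a real tokenizer will not match exactly. Truncation stands in for summarization, which in practice might be its own cheap model call. Prompt caching is vendor-specific and is not shown.

```typescript
// Hypothetical shape for what one step produces.
interface StepArtifacts {
  plan: string[]; // plan steps from the first call
  reasoning: string; // intermediate reasoning, usually long
  result: string; // the artifact the next step actually needs
}

// Rough heuristic only: about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Build the context for the next step: drop the reasoning entirely and
// truncate each plan step as a crude stand-in for a real summary.
function carryForward(a: StepArtifacts, maxCharsPerPlanStep = 80): string {
  const compactPlan = a.plan
    .map((step, i) => `${i + 1}. ${step.slice(0, maxCharsPerPlanStep)}`)
    .join("\n");
  return `Plan summary:\n${compactPlan}\n\nResult so far:\n${a.result}`;
}
```

Comparing `estimateTokens` on the full artifacts against the carried string gives a quick, approximate sense of how much each hop saves.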

Orchestration frameworks

LangGraph, Mastra, Inngest, Vercel AI SDK, and custom plain TypeScript are all reasonable ways to wire up chains. Pick based on:

Need for durability? Inngest, Temporal, or similar — chain steps survive crashes.
Need for streaming? Vercel AI SDK is among the cleanest.
Custom graph of steps? LangGraph is purpose-built.
Simple linear chain? Plain functions, no framework.

The most common failure mode is adopting a framework before the chain is well understood. Start as a flat script; extract to a framework once the pattern stabilizes.
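
A flat script for a linear chain needs very little. The sketch below assumes each step is an async function from string to string; `Step` and `runChain` are hypothetical names, and a real chain would likely pass richer typed values between steps.

```typescript
// Hypothetical step type: takes the previous step's output, returns its own.
type Step = (input: string) => Promise<string>;

// Run the steps in order, feeding each output into the next step.
async function runChain(steps: Step[], input: string): Promise<string> {
  let current = input;
  for (const step of steps) {
    current = await step(current);
  }
  return current;
}
```

If retries, branching, or crash recovery become necessary, that is a reasonable signal the pattern has stabilized enough to move into a framework.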

Where chains quietly fail

Error handling. If step 2 returns junk, step 3 is garbage-in-garbage-out. Add validation between steps.
Drift in prompts. Each step's prompt evolves separately, and the steps fall out of sync. Treat chain prompts like interdependent code modules: version them and review changes together.
Hidden dependencies. Step 3's prompt expects step 1 to have produced a specific format. If step 1's prompt changes, step 3 silently breaks. Document the contract.
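
One way to make the contract explicit is to validate each step's output before the next step sees it. The sketch below assumes step 1 is asked to return JSON with a `version` field and a `steps` array; that format, the `Plan` interface, and `parsePlan` are all hypothetical choices for illustration.

```typescript
// Hypothetical contract for what step 1 hands to later steps.
interface Plan {
  version: 1;
  steps: string[];
}

type Parsed = { ok: true; plan: Plan } | { ok: false; error: string };

// Validate raw model output against the contract instead of trusting it.
function parsePlan(raw: string): Parsed {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "expected a JSON object" };
  }
  const obj = data as { version?: unknown; steps?: unknown };
  // A version mismatch fails loudly instead of silently breaking later steps.
  if (obj.version !== 1) {
    return { ok: false, error: "unsupported contract version" };
  }
  const steps = obj.steps;
  if (!Array.isArray(steps) || steps.length === 0) {
    return { ok: false, error: "steps must be a non-empty array" };
  }
  const cleaned: string[] = [];
  for (const s of steps) {
    if (typeof s !== "string") {
      return { ok: false, error: "every step must be a string" };
    }
    cleaned.push(s);
  }
  return { ok: true, plan: { version: 1, steps: cleaned } };
}
```

On a failed parse the chain can retry step 1, fall back, or stop, which keeps junk from flowing downstream. Bumping `version` whenever step 1's prompt changes format turns a silent break into an explicit error.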
