Chain-of-thought: when it helps and when it hurts
The real trade-offs of asking a model to think out loud.
Chain-of-thought prompting used to be a superpower. Now it's a default in many models — and in some cases it's actively making things worse. Here's when to use it, when to skip it, and when to trust the model's internal reasoning instead.
What chain-of-thought actually does
"Think step by step before answering" prompts the model to generate reasoning tokens before the answer. That reasoning lives in the context, and the final answer is generated with those reasoning tokens in view. The model effectively uses its own output as scratch space.
For hard problems — multi-step math, complex logical inference, ambiguous classification — this consistently improves quality. For easy problems, it adds latency and can introduce errors by "overthinking."
The 2026 wrinkle: reasoning models
OpenAI's o-series, Claude's Thinking mode, Gemini's Thinking — these models do chain-of-thought internally and don't surface it in the final response (or surface it separately). When you're using a reasoning model, adding "think step by step" in your prompt is redundant at best and confusing at worst.
Rule of thumb:
- Reasoning model? Don't CoT-prompt. Let the model's built-in reasoning handle it.
- Non-reasoning model on a hard problem? Ask for step-by-step reasoning.
- Non-reasoning model on an easy problem? Skip it — the latency isn't worth it.
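The rule of thumb above can be written down as a small decision helper. This is only a sketch: the function names and the two boolean flags are hypothetical, and deciding whether a problem counts as "hard" is left to you.

```python
def should_add_cot(is_reasoning_model: bool, is_hard_problem: bool) -> bool:
    """Return True when an explicit step-by-step instruction is worth adding."""
    if is_reasoning_model:
        # Built-in reasoning already covers this; an extra instruction is redundant.
        return False
    # Non-reasoning model: only pay the latency cost on hard problems.
    return is_hard_problem


def build_prompt(question: str, is_reasoning_model: bool, is_hard_problem: bool) -> str:
    """Append the CoT instruction only when the rule of thumb calls for it."""
    if should_add_cot(is_reasoning_model, is_hard_problem):
        return f"{question}\n\nThink step by step before answering."
    return question
```

For example, `build_prompt("Is this spam?", is_reasoning_model=False, is_hard_problem=False)` returns the question unchanged.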
When CoT hurts
- Simple classification. "Is this spam?" doesn't need reasoning; it needs a 1-token answer.
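- Tight latency budgets. Reasoning tokens are generated one at a time before the answer starts, typically at tens of milliseconds each depending on model and provider. A six-paragraph CoT adds real UX cost.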
- Over-cautious outputs. CoT can talk the model into finding edge cases that don't matter, producing hedged, unhelpful answers.
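To put a rough number on the latency point, here is a back-of-the-envelope estimate. It assumes tokens are decoded sequentially at a steady rate; `ms_per_token` and the token counts are placeholders you would measure for your own model and provider, not figures from any vendor.

```python
def added_latency_seconds(reasoning_tokens: int, ms_per_token: float) -> float:
    """Extra wait before the answer starts, assuming steady sequential decoding."""
    return reasoning_tokens * ms_per_token / 1000.0


def fits_budget(reasoning_tokens: int, ms_per_token: float, budget_seconds: float) -> bool:
    """True if the reasoning alone stays within the latency budget."""
    return added_latency_seconds(reasoning_tokens, ms_per_token) <= budget_seconds


# Illustrative values only: 400 reasoning tokens at an assumed 50 ms per token.
print(added_latency_seconds(400, 50))  # 20.0
```

Under those assumed numbers, a 2-second budget is blown by the reasoning alone, before the answer has started.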
Patterns that sharpen CoT when you do use it
- Bound the reasoning length. "Think in 3-5 short steps, then give the answer." Prevents runaway reasoning.
- Separate reasoning from output. Ask for <reasoning> tags followed by <answer> tags. Makes it easy to parse and optionally hide the reasoning.
- Reverse CoT for critique. "State your answer. Then state the strongest argument against it. Revise if the counter-argument is valid." Produces more balanced results than forward CoT.
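The first two patterns combine naturally: bound the reasoning in the prompt, then split the tagged response in code. The sketch below uses only Python's standard `re` module. The exact prompt wording and the `split_reasoning` helper are illustrative assumptions, and a model may not always emit well-formed tags, so the helper returns `None` for anything it cannot find.

```python
import re
from typing import Optional, Tuple

# Hypothetical instruction to append to your prompt.
PROMPT_SUFFIX = (
    "Think in 3-5 short steps inside <reasoning> tags, "
    "then give only the final answer inside <answer> tags."
)


def split_reasoning(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer) from a tagged response; None for a missing part."""
    # re.DOTALL lets '.' match newlines, since reasoning usually spans several lines.
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else None,
        answer.group(1).strip() if answer else None,
    )
```

Showing users only the second element of the tuple hides the reasoning while keeping it available for logging or debugging.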
Self-consistency
A research-era trick: sample several independent CoT reasoning paths for the same question, then take the majority answer. Expensive in tokens, but genuinely helps on reasoning tasks. Some production systems use it for very-high-stakes decisions where latency doesn't matter.
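A minimal sketch of the majority vote follows. The `sample_answer` callable is hypothetical: it stands in for whatever code runs one CoT completion (with sampling enabled, so runs can differ) and returns just the extracted final answer. Normalizing answers by lowercasing and trimming whitespace is also an assumption; real tasks may need stricter canonicalization.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistent_answer(
    sample_answer: Callable[[], str], n_samples: int = 5
) -> Tuple[str, float]:
    """Sample n answers and return (majority answer, share of votes it received)."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Normalize lightly so "12" and " 12 " count as the same vote.
    answers = [sample_answer().strip().lower() for _ in range(n_samples)]
    # On a tie, most_common keeps the answer that was seen first.
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples
```

The vote share doubles as a rough agreement signal: a low share suggests the samples disagree and the case may deserve human review.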
In practice
Run the A/B test on your own task. Take 20 eval cases. Run each once with CoT and once without. Compare quality, latency, and cost. Many teams discover their CoT wasn't actually helping.
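A small harness for that comparison might look like the sketch below. Everything here is an assumption for illustration: `ask_plain` and `ask_cot` are hypothetical callables wrapping your own model calls, quality is scored by exact match against an expected answer, and cost is left out because it depends on the token usage your provider reports.

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Tuple

Case = Tuple[str, str]  # (question, expected answer)


def run_variant(ask: Callable[[str], str], cases: List[Case]) -> Dict[str, float]:
    """Run one prompt variant over all cases; report accuracy and mean latency."""
    correct = 0
    latencies = []
    for question, expected in cases:
        start = time.perf_counter()
        answer = ask(question)
        latencies.append(time.perf_counter() - start)
        # Exact-match scoring; swap in your own grader for open-ended tasks.
        correct += int(answer.strip() == expected)
    return {"accuracy": correct / len(cases), "mean_latency_s": mean(latencies)}


def compare(
    ask_plain: Callable[[str], str], ask_cot: Callable[[str], str], cases: List[Case]
) -> Dict[str, Dict[str, float]]:
    """Run both variants on the same cases so the results are directly comparable."""
    return {"plain": run_variant(ask_plain, cases), "cot": run_variant(ask_cot, cases)}
```

If the CoT variant does not beat the plain one on accuracy by a margin that justifies its extra latency, that is a signal to drop it for this task.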
Check your understanding
Q1. When is chain-of-thought prompting LEAST helpful?
Q2. If you're using a reasoning model (like o-series or Claude Thinking), you should…