Temperature, top-p, and the knobs nobody explains
The sampling parameters that shape creativity, determinism, and diversity.
Temperature, top-p, top-k, frequency penalty — four knobs most people set once and forget. Here's what each actually does, and the shockingly small number of settings worth using.
Temperature
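Temperature rescales the model's raw scores (logits) before they are turned into probabilities, which controls how sharply the model picks from the distribution of possible next tokens. Low values concentrate probability on the most likely token; high values spread it out.
- Temperature 0: always pick the highest-probability token (greedy decoding). Nearly deterministic, but not guaranteed: near-ties and floating-point differences from server-side batching can still change the output. Good for classification, extraction, math.
- Temperature 0.7: a common choice for chat assistants. Natural-sounding, mildly creative.
- Temperature 1.0: samples from the model's unmodified distribution. Above 1.0 the distribution flattens: more creative, more chaotic, and on most models increasingly incoherent past about 1.5.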
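A minimal sketch of the mechanism in Python: temperature divides every logit before softmax turns the scores into probabilities. The logits are made-up numbers for four hypothetical candidate tokens, and treating temperature 0 as a plain argmax is a simplification; real inference stacks handle that case in their own ways.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) into probabilities, scaled by temperature.

    temperature == 0 is treated as greedy decoding: all probability goes to the
    highest-scoring token (the first one wins if two are exactly tied).
    """
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]   # divide every logit by T
    peak = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for four candidate next tokens.
logits = [2.0, 1.0, 0.5, -1.0]
for t in (0, 0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Running it shows the top token's share shrinking as T rises: at T = 1.0 the result is the plain softmax of the logits, and above 1.0 probability leaks toward the unlikely tokens.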
Top-p (nucleus sampling)
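After ranking possible tokens by probability, top-p cuts off the tail. top_p = 0.9 means "sample only from the smallest set of most-likely tokens whose combined probability reaches 90%." Everything outside that set gets zero chance, and the remaining probabilities are rescaled to sum to 1.
Top-p and temperature compound: temperature reshapes the distribution, and top-p decides how much of its tail survives. Adjusting both aggressively is how you get "creative and incoherent." Pick one knob and hold the other at its default.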
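A sketch of the top-p cutoff as a plain function over a list of probabilities. The function name and the example probabilities are illustrative, not any library's API; real samplers work on tensors and often apply this step after temperature scaling.

```python
def top_p_filter(probs, top_p):
    """Nucleus (top-p) filtering over a list of token probabilities.

    Keeps the smallest set of most-likely tokens whose combined probability
    reaches top_p, gives every other token zero probability, and rescales
    the survivors so they sum to 1.
    """
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    # Token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # the nucleus is complete
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


# Hypothetical next-token probabilities, already sorted for readability.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.9))  # the 0.05 token is cut; the rest are rescaled
```

With top_p = 0.9, the first two tokens cover only 0.8 of the mass, so the third is pulled in as well and only the 0.05 tail is dropped.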
Top-k
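Like top-p, but keeps a fixed number of candidates (the k most likely tokens) instead of applying a probability cutoff. Less common today, because top-p adapts to the distribution better: it keeps only a few tokens when the model is confident and more when it isn't, while top-k always keeps exactly k.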
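A sketch of top-k filtering in the same list-of-probabilities style, run on two hypothetical distributions to show why a fixed k fits some shapes poorly. The function name and numbers are illustrative assumptions.

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens, zero out the rest, and renormalize."""
    if k < 1:
        raise ValueError("k must be >= 1")
    # Token indices from most to least likely; keep the first k.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


confident = [0.9, 0.05, 0.03, 0.02]  # hypothetical: the model is nearly sure
uncertain = [0.3, 0.28, 0.22, 0.2]   # hypothetical: several plausible tokens

# top-k keeps exactly k candidates regardless of the distribution's shape.
print(top_k_filter(confident, 2))
print(top_k_filter(uncertain, 2))
```

With k = 2, the confident case still keeps an unlikely second token, while the uncertain case drops two tokens (0.22 and 0.2) that were nearly as likely as the ones it kept.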
Frequency / presence penalty
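Frequency and presence penalties discourage the model from repeating itself. A frequency penalty lowers a token's score in proportion to how many times it has already appeared; a presence penalty applies a flat reduction to any token that has appeared at all, nudging the model toward new topics. Useful when the model gets stuck in loops ("and and and"). Rarely needed on modern models; when it is, a small penalty (0.1 to 0.3) is enough.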
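A sketch of the score adjustment, modeled on the formula OpenAI has documented for its API: subtract the token's count times the frequency penalty, plus the presence penalty once if the count is nonzero. Other providers may implement penalties differently, and the dict-of-scores representation and function name here are illustrative assumptions.

```python
from collections import Counter


def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the scores of tokens that have already been generated.

    logits: dict mapping token -> raw score for the next position.
    generated: tokens produced so far.
    The frequency penalty scales with how often a token appeared;
    the presence penalty is a flat hit for appearing at all.
    """
    counts = Counter(generated)  # missing tokens count as 0
    adjusted = {}
    for token, score in logits.items():
        c = counts[token]
        adjusted[token] = (
            score
            - c * frequency_penalty
            - (1.0 if c > 0 else 0.0) * presence_penalty
        )
    return adjusted


logits = {"and": 3.0, "then": 2.5, "finally": 2.0}  # hypothetical scores
history = ["cats", "and", "dogs", "and", "and"]     # "and" already used 3 times
print(apply_penalties(logits, history, frequency_penalty=0.3))
print(apply_penalties(logits, history, presence_penalty=0.3))
```

With a frequency penalty of 0.3, "and" loses 0.9 and drops below "then"; with a presence penalty of 0.3, it loses only 0.3 and stays on top. That difference is why frequency penalty is the stronger tool against "and and and" loops.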
The cheat sheet
| Task | Temperature | top_p |
|---|---|---|
| Classification / extraction | 0 | 1 (default) |
| Structured outputs (JSON) | 0 - 0.2 | 1 |
| Code generation | 0.1 - 0.3 | 1 |
| Drafting / summarization | 0.4 - 0.7 | 1 |
| Brainstorming / creative | 0.8 - 1.0 | 0.9 |
| Poetry / fiction | 0.9 - 1.1 | 0.9 |
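To see how the cheat sheet and the knobs fit together, here is a toy sampler in Python that applies temperature, then top-p, then a weighted random draw. The preset names, the midpoint chosen from each range, and the function are all illustrative assumptions; production inference servers do this on tensors and may order or combine the steps differently.

```python
import math
import random

# Hypothetical presets transcribed from the cheat sheet (midpoints where the table gives a range).
PRESETS = {
    "classification": {"temperature": 0.0, "top_p": 1.0},
    "json": {"temperature": 0.1, "top_p": 1.0},
    "code": {"temperature": 0.2, "top_p": 1.0},
    "drafting": {"temperature": 0.55, "top_p": 1.0},
    "brainstorming": {"temperature": 0.9, "top_p": 0.9},
    "fiction": {"temperature": 1.0, "top_p": 0.9},
}


def sample_next(logits, temperature, top_p, rng):
    """Pick one token index: temperature scaling, then top-p, then a weighted draw."""
    if temperature == 0:
        # Greedy: the highest-scoring token, first one on exact ties.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus cut: keep the most likely tokens until their mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # random.choices treats weights as relative, so no rescaling is needed here.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


rng = random.Random(0)           # fixed seed so the run is repeatable
logits = [2.0, 1.0, 0.5, -1.0]   # hypothetical scores for four tokens
for task in ("classification", "brainstorming"):
    picks = [sample_next(logits, **PRESETS[task], rng=rng) for _ in range(20)]
    print(task, sorted(set(picks)))
```

Under the classification preset every call returns index 0. Under the brainstorming preset the draws vary, but the least likely token can never appear: at temperature 0.9 the top three tokens already hold about 98% of the mass, so top_p = 0.9 removes the fourth before sampling.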
What to actually do
Start at temperature 0 for anything with a "correct" answer. Go to 0.7 for anything open-ended. Don't touch top-p unless you're running a high-temperature creative generation and want to trim its least likely, least coherent tokens. Skip frequency penalty unless you have an actual repetition problem.
The most common mistake is treating temperature like a generic "quality" slider. It isn't. It's a diversity slider. Higher temperature doesn't mean better — it means more varied. For most production use, the answer is lower than you think.
Check your understanding
3-question self-check
Q1. Which temperature is most appropriate for classification and structured extraction tasks?
Q2. What's the main effect of raising temperature?
Q3. Is temperature 0 guaranteed to produce bit-for-bit identical output on every call?
This lesson is part of the AI Fundamentals track.