Temperature, top-p, and the knobs nobody explains

Temperature, top-p, top-k, frequency penalty — four knobs most people set once and forget. Here's what each actually does, and the shockingly small number of settings worth using.

Temperature

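Temperature rescales the model's raw scores (logits) for the possible next tokens before they are turned into probabilities: each logit is divided by the temperature. A low temperature sharpens the distribution toward the most likely token; a high temperature flattens it.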

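Temperature 0: always pick the highest-probability token (greedy decoding). Nearly deterministic, but not guaranteed: ties and floating-point differences from batching on the serving side can still change the output. Good for classification, extraction, and math.
Temperature 0.7: a common default for chat assistants. Natural-sounding, mildly creative.
Temperature 1.0+: at 1.0 the model's own distribution is used unchanged; above 1.0 it is flattened. More creative, more chaotic. Output tends to become incoherent past about 1.5 on most models.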
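
A minimal sketch of the usual formulation, assuming the engine divides each logit by the temperature before the softmax and treats temperature 0 as a plain argmax. The logits are made-up numbers for four candidate tokens, and the function name is illustrative; real inference engines do this inside the decoder.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw next-token logits into probabilities at a given temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Temperature 0 is treated as greedy decoding: all probability on the argmax.
        # max() returns the first maximum, so ties go to the lowest index here;
        # real engines may break ties differently.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
for t in (0, 0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

As the temperature drops, the top token's probability climbs toward 1; as it rises past 1.0, probability leaks toward the unlikely tokens.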

Top-p (nucleus sampling)

After ranking possible tokens by probability, top-p cuts off the tail. top_p = 0.9 means "sample only from the smallest set of top tokens whose combined probability reaches 90%," with those tokens' probabilities renormalized to sum to 1. Every token outside that set gets zero chance.
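
A sketch of nucleus filtering on an already-computed probability list. The function name is illustrative, and implementations differ on small details such as whether the cutoff test uses ≥ or >; treat this as the idea rather than any particular library's code.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose cumulative probability
    reaches top_p, zero out the rest, and renormalize the survivors."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # the nucleus is complete; everything after this is the tail
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-token probabilities
print(top_p_filter(probs, 0.9))
```

With top_p = 0.9, the 0.05-probability token is dropped and the remaining three are rescaled to sum to 1.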

Top-p and temperature compound: a higher temperature flattens the distribution, so the same top-p cutoff admits more tokens. Using both aggressively is how you get "creative and incoherent." Pick one knob and hold the other at its default.
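
To see the compounding, the sketch below counts how many tokens survive a fixed top_p = 0.9 at three temperatures. The logits are invented and the helper names are hypothetical; the point is the trend, not the exact numbers.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax over raw logits (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_size(probs, top_p):
    """Number of top-ranked tokens needed for their cumulative probability to reach top_p."""
    cumulative = 0.0
    for n, p in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += p
        if cumulative >= top_p:
            return n
    return len(probs)

logits = [4.0, 3.0, 2.5, 2.0, 1.0, 0.5, 0.0, -1.0]  # invented scores for 8 tokens
for t in (0.5, 1.0, 1.5):
    print(f"T={t}: {nucleus_size(softmax(logits, t), 0.9)} tokens survive top_p=0.9")
```

With these logits the nucleus grows from 2 tokens at temperature 0.5 to 4 at 1.0 and 5 at 1.5, so turning up the temperature quietly widens what top-p lets through.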

Top-k

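Like top-p, but it keeps a fixed number k of the most likely tokens instead of applying a probability cutoff. It is less common today because top-p adapts to the distribution: when the model is confident, top-p keeps only a few tokens, and when it is uncertain, it keeps many, whereas top-k always keeps exactly k.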
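
A sketch comparing the two cutoffs on two invented distributions, one peaked and one flat. The helper names are hypothetical.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_p_count(probs, top_p):
    """How many top-ranked tokens a top-p cutoff would keep."""
    cumulative = 0.0
    for n, p in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += p
        if cumulative >= top_p:
            return n
    return len(probs)

peaked = [0.95, 0.02, 0.01, 0.01, 0.01]  # model is confident
flat = [0.22, 0.21, 0.20, 0.19, 0.18]    # model is unsure

for name, probs in (("peaked", peaked), ("flat", flat)):
    kept_by_k = sum(1 for p in top_k_filter(probs, 3) if p > 0)
    print(f"{name}: top_k=3 keeps {kept_by_k}, top_p=0.9 keeps {top_p_count(probs, 0.9)}")
```

top_k = 3 keeps three tokens in both cases. top_p = 0.9 keeps just one when the model is confident and all five when it is not.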

Frequency / presence penalty

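These penalties discourage the model from repeating itself. A frequency penalty grows with how often a token has already appeared; a presence penalty applies once to any token that has appeared at all, nudging the model toward new topics. Useful when the model gets stuck in loops ("and and and"). Rarely needed on modern models; when it is, a small penalty (0.1-0.3) is usually enough.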
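
A sketch of the additive form of these penalties, modeled on the formula OpenAI has described for its API: subtract the token's count times the frequency penalty, plus a flat presence penalty if the token has appeared at all. Other engines may differ, and some use a multiplicative repetition penalty instead. The token IDs, logits, and function name are illustrative.

```python
from collections import Counter

def apply_penalties(logits, generated_ids, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that already appear in the generated output.

    frequency_penalty is multiplied by how many times each token appeared;
    presence_penalty is a one-time deduction for any token that appeared at all.
    """
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        adjusted[token_id] -= count * frequency_penalty
        adjusted[token_id] -= presence_penalty  # every key in counts has count > 0
    return adjusted

logits = [1.0, 1.0, 1.0]   # three equally likely tokens (made-up)
history = [0, 0, 0, 1]     # token 0 generated three times, token 1 once
print(apply_penalties(logits, history, frequency_penalty=0.2, presence_penalty=0.2))
```

Token 0 drops to about 0.2, token 1 to about 0.6, and the unused token 2 keeps its original score, so the next sample leans toward something new.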

The cheat sheet

Task	Temperature	top_p
Classification / extraction	0	1 (default)
Structured outputs (JSON)	0 - 0.2	1
Code generation	0.1 - 0.3	1
Drafting / summarization	0.4 - 0.7	1
Brainstorming / creative	0.8 - 1.0	0.9
Poetry / fiction	0.9 - 1.1	0.9

What to actually do

Start at temperature 0 for anything with a "correct" answer. Go to 0.7 for anything open-ended. Don't touch top-p unless you're squeezing diversity out of a creative generation. Skip frequency penalty unless you have an actual repetition problem.
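
The rules of thumb above fit in a few lines. This is a hypothetical helper, not part of any SDK; the keys mirror the OpenAI-style parameter names temperature, top_p, and frequency_penalty (check your provider's names), and 0.2 is just one value from the 0.1-0.3 range suggested for penalties.

```python
def sampling_params(has_correct_answer, want_more_diversity=False, repetition_problem=False):
    """Hypothetical helper: turn the rules of thumb into request settings."""
    # Temperature 0 for tasks with a "correct" answer, 0.7 for open-ended ones.
    params = {"temperature": 0.0 if has_correct_answer else 0.7}
    if want_more_diversity and not has_correct_answer:
        # Only reach for top-p when shaping diversity in a creative generation.
        params["top_p"] = 0.9
    if repetition_problem:
        # Only add a penalty for an actual repetition problem, and keep it small.
        params["frequency_penalty"] = 0.2
    return params

print(sampling_params(has_correct_answer=True))
print(sampling_params(has_correct_answer=False, want_more_diversity=True))
```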

The most common mistake is treating temperature like a generic "quality" slider. It isn't. It's a diversity slider. Higher temperature doesn't mean better — it means more varied. For most production use, the answer is lower than you think.

Check your understanding

Q1. Which temperature is most appropriate for classification and structured extraction tasks?
Q2. What's the main effect of raising temperature?
Q3. Is temperature 0 guaranteed to produce bit-for-bit identical output on every call?
