How models are trained (and why it matters to you)

Picking a model without understanding how it was made is like buying a car based on the color. Three stages of training — pre-training, instruction tuning, alignment — each shape a different dial.

Act 1 — Pre-training

During pre-training, the model ingests a huge corpus (public internet text, books, code, often trillions of tokens) and learns next-token prediction: given the text so far, predict what comes next. The result is a base model that can continue text fluently but doesn't necessarily follow instructions. It'll happily autocomplete your input as if it were the start of a Wikipedia article.
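Production pre-training uses a large neural network over subword tokens. The training signal, though, is the same one this deliberately tiny sketch uses: predict the next token from the ones before it. Here a bigram counter stands in for the network. It is an illustrative stand-in, not how real models work internally. Its greedy continuation shows why a base model extends a question instead of answering it.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigram(tokens: list[str]) -> dict[str, Counter]:
    """Count how often each token follows each other token (toy 'pre-training')."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], token: str) -> Optional[str]:
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def continue_text(counts: dict[str, Counter], prompt: list[str], max_new: int = 5) -> list[str]:
    """Greedily extend the prompt one predicted token at a time."""
    out = list(prompt)
    for _ in range(max_new):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


corpus = "the cat sat on the mat . the cat ate the fish .".split()
model = train_bigram(corpus)
print(" ".join(continue_text(model, ["what", "is", "the"])))
```

The prompt "what is the" comes back as "what is the cat sat on the cat". The model continues the text and never attempts an answer.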

This is where most of the capability lives. It's also where nearly all the compute cost lives — on the order of $10M-$100M+ for frontier runs.
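A widely used rule of thumb makes the cost figure easier to sanity-check. Training a dense transformer takes roughly 6 × parameters × training tokens FLOPs: about 2 per parameter per token for the forward pass and 4 for the backward pass. The sketch below converts that estimate into dollars. Every input is hypothetical and not taken from any real run: the model size, token count, GPU peak throughput, utilization, and hourly price.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule of thumb for a dense transformer: ~6 FLOPs per parameter per training token."""
    return 6.0 * n_params * n_tokens


def training_cost_usd(flops: float, peak_flops_per_gpu: float,
                      utilization: float, usd_per_gpu_hour: float) -> float:
    """Convert a FLOP budget into GPU-hours, then into dollars."""
    effective = peak_flops_per_gpu * utilization  # FLOP/s actually achieved per GPU
    gpu_hours = flops / effective / 3600.0
    return gpu_hours * usd_per_gpu_hour


# Hypothetical run: all inputs below are illustrative assumptions, not vendor figures.
flops = training_flops(n_params=70e9, n_tokens=15e12)
cost = training_cost_usd(flops, peak_flops_per_gpu=1e15, utilization=0.4, usd_per_gpu_hour=2.0)
print(f"{flops:.2e} FLOPs, roughly ${cost / 1e6:.1f}M")
```

With these assumed inputs, the estimate lands in the high single-digit millions of dollars, the low end of the quoted range. The formula is linear in tokens. Running the same arithmetic on a fine-tuning set that is orders of magnitude smaller therefore gives a proportionally smaller bill.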

Act 2 — Instruction tuning (SFT)

The base model is trained on pairs of an instruction and an ideal response, written by humans or generated by other models. The model learns to respond to instructions instead of just continuing text. This is what turns "fancy autocomplete" into "helpful assistant."
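Instruction tuning still uses next-token prediction, but on formatted instruction/response pairs. Many SFT pipelines compute the loss only on the response tokens, so the model learns to produce answers rather than to reproduce prompts. The sketch below illustrates that idea. It uses a toy whitespace tokenizer and made-up `<user>`, `<assistant>`, and `<end>` markers. Real models define their own chat templates and tokenizers.

```python
def build_example(instruction: str, response: str) -> tuple[list[str], list[int]]:
    """Format one SFT pair and mark which tokens contribute to the training loss.

    Uses a toy whitespace tokenizer and hypothetical role markers for illustration.
    """
    prompt_tokens = ["<user>"] + instruction.split() + ["<assistant>"]
    response_tokens = response.split() + ["<end>"]
    tokens = prompt_tokens + response_tokens
    # 0 = context only (no loss); 1 = the model is trained to predict this token
    mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, mask


tokens, mask = build_example("Name the capital of France.", "Paris.")
for tok, m in zip(tokens, mask):
    print(f"{m} {tok}")
```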

Orders of magnitude cheaper than pre-training. Mostly about tone and format, not raw intelligence.

Act 3 — Alignment (RLHF, Constitutional AI, DPO)

Further tuning — typically reinforcement learning from human feedback — shapes the model to prefer helpful, honest, harmless responses. This is where a model's personality is sculpted: how cautious it is, how it refuses, how much it pushes back.
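Both RLHF and DPO start from preference pairs: a prompt, a preferred ("chosen") response, and a dispreferred ("rejected") one. RLHF typically trains a separate reward model on such pairs and then optimizes the policy against that reward model with reinforcement learning. DPO skips the reward model and turns the same comparison directly into a loss on the policy, measured relative to a frozen reference model. The sketch below computes both losses for a single pair from scalar inputs. The log-probabilities, rewards, and β = 0.1 are hypothetical placeholders. A real implementation computes these values from model outputs over batches. Constitutional AI mainly changes who produces the preference labels: a model guided by written principles rather than human raters. It is not modeled separately here.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss for training an RLHF reward model."""
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one pair, from sequence log-probs under the policy and the reference."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)


# Hypothetical numbers: the policy already favors the chosen answer slightly more than the reference does.
example = dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                   ref_logp_chosen=-13.0, ref_logp_rejected=-14.0)
print(f"DPO loss: {example:.3f}")
print(f"Reward-model loss: {reward_model_loss(r_chosen=1.5, r_rejected=0.5):.3f}")
```

When the policy matches the reference, the DPO margin is zero and the loss equals log 2. Pushing probability toward chosen responses and away from rejected ones drives the loss lower.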

Why you should care

Decision	Which stage drives it
Raw capability (hard reasoning, long context)	Pre-training
Tone and refusal behavior	Alignment
Custom domain behavior (fine-tuning on your data)	Acts 2 + 3 — you're not rewriting Act 1
Cost per token at inference	Pre-training decisions (size, architecture)
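The last row of the table follows from a second rule of thumb. Generating one token with a dense transformer costs roughly 2 FLOPs per parameter, plus attention work that grows with context length. The sketch below compares two hypothetical model sizes. For mixture-of-experts architectures, the relevant count is the parameters active per token, not the total.

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter per token.

    Ignores attention over the context, which grows with context length.
    """
    return 2.0 * active_params


# Hypothetical model sizes chosen for illustration only.
small_params, large_params = 8e9, 70e9
ratio = forward_flops_per_token(large_params) / forward_flops_per_token(small_params)
print(f"The larger model needs ~{ratio:.2f}x the compute per generated token")
```

Fine-tuning in Acts 2 and 3 normally leaves the parameter count unchanged. That is why fine-tuning does not move this number.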

When a new model ships, you're buying a bundle of all three. Benchmarks can move in opposite directions depending on which act changed.

What this predicts

If a new model feels smarter but refuses more requests, the alignment was probably tightened. If it's suddenly better at code, the vendor likely added more code to the pre-training data. If it's "chattier," the SFT data likely changed, for example toward longer example responses. Being able to attribute behavior changes to training decisions is half of vendor evaluation.
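You can make these heuristics routine by diffing the same evaluation suite across two model versions. Each notable change then gets tagged with the stage this section suggests as its likely cause. The metric names, threshold, and scores below are made up for illustration. The mapping encodes this section's rules of thumb, not ground truth about any vendor's training.

```python
# Hypothetical mapping from an eval metric to the training stage this section
# suggests as the likely cause of a change. Heuristics, not ground truth.
LIKELY_STAGE = {
    "refusal_rate": "alignment (Act 3)",
    "code": "pre-training data (Act 1)",
    "response_length": "SFT data (Act 2)",
    "hard_reasoning": "pre-training (Act 1)",
}


def attribute_changes(old: dict[str, float], new: dict[str, float],
                      threshold: float = 0.02) -> list[tuple[str, float, str]]:
    """Compare two versions' scores; return (metric, delta, likely stage) for notable changes."""
    report = []
    for metric in sorted(old.keys() & new.keys()):
        delta = new[metric] - old[metric]
        if abs(delta) >= threshold:
            report.append((metric, delta, LIKELY_STAGE.get(metric, "unknown")))
    return report


# Made-up scores for two hypothetical releases of the same model.
old_scores = {"code": 0.61, "refusal_rate": 0.04, "hard_reasoning": 0.52}
new_scores = {"code": 0.70, "refusal_rate": 0.09, "hard_reasoning": 0.53}
report = attribute_changes(old_scores, new_scores)
for metric, delta, stage in report:
    print(f"{metric:15s} {delta:+.2f}  likely: {stage}")
```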
