What is a large language model, really?
Strip the hype. Learn what an LLM actually does, token by token.
At the mechanical level, an LLM does one thing: it predicts the next token. Everything you find impressive about it is built on top of that single operation.
Tokens, not words
When you send "How do I fine-tune a model?" to a language model, it doesn't see seven words. It sees a sequence of tokens — roughly sub-word units. "fine-tune" might be two tokens. A rare word can be four. Your API bill, your latency, and your context limits are all measured in tokens.
A useful calibration: 1 token ≈ 0.75 English words. A page of dense prose is about 500 tokens. GPT-4-class models handle ~128k tokens of context, which at 500 tokens per page is roughly a 250-page book.
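To make the calibration concrete, here is a minimal sketch of the arithmetic. It uses only the rule-of-thumb figures above (0.75 words per token, 500 tokens per page); the function names are made up for illustration, and a real tokenizer will give different counts depending on the model and the language.

```python
# Rule-of-thumb figures from the lesson; real tokenizers vary by model and language.
WORDS_PER_TOKEN = 0.75
TOKENS_PER_PAGE = 500


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (illustrative helper)."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def pages_in_context(context_tokens: int) -> float:
    """How many pages of dense prose fit in a context window (illustrative helper)."""
    return context_tokens / TOKENS_PER_PAGE


prompt = "How do I fine-tune a model?"
print(estimate_tokens(prompt))    # 6 whitespace-separated words / 0.75 = 8
print(pages_in_context(128_000))  # 128000 / 500 = 256.0
```

Treat the result as a budget estimate only. When cost or truncation matters, count tokens with the tokenizer that belongs to the model you are calling.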
The single operation
Given a sequence of tokens, the model outputs a probability distribution over every token in its vocabulary (often on the order of 100k entries). It samples one token from that distribution, appends it to the sequence, and repeats. That's the whole mechanism.
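The loop described above can be sketched in a few lines. This is a toy, not a real model: `toy_logits` is a hypothetical stand-in that scores a six-entry vocabulary with a hard-coded rule, where a real LLM would run a neural network over a vocabulary thousands of times larger. The loop around it has the same shape, though: score, turn the scores into probabilities, pick one, append, repeat.

```python
import math
import random

# Toy vocabulary; a real model's vocabulary is vastly larger.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]


def toy_logits(tokens: list[str]) -> list[float]:
    """Hypothetical stand-in for the model: one raw score per vocabulary entry.

    It simply favours the entry that follows the last token in VOCAB, wrapping around.
    """
    last = VOCAB.index(tokens[-1])
    favoured = (last + 1) % len(VOCAB)
    return [3.0 if i == favoured else 0.0 for i in range(len(VOCAB))]


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into a probability distribution that sums to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt: list[str], steps: int, rng: random.Random) -> list[str]:
    """Run the predict, pick, append loop for a fixed number of steps."""
    tokens = list(prompt)
    for _ in range(steps):
        probs = softmax(toy_logits(tokens))  # distribution over the vocabulary
        next_token = rng.choices(VOCAB, weights=probs, k=1)[0]  # pick one
        tokens.append(next_token)  # append it, then go around again
    return tokens


print(generate(["the"], steps=5, rng=random.Random(0)))
```

Because the next token is sampled rather than looked up, two runs with different seeds can produce different continuations from the same prompt.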
The "intelligence" you see is an emergent property of doing next-token prediction very well, after training on trillions of tokens of text. There is no separate "reasoning" module. When you ask it to solve a puzzle, the reasoning happens in the tokens it generates, which is why telling it to "think step by step" often helps: the intermediate steps become part of the context for every later prediction.
Why this mental model matters
- Context is everything. The model has zero memory between calls. Anything it "knows" about your task must be in the prompt or the tool results you feed back.
- It doesn't know what it doesn't know. It predicts confident-sounding text. Calibration (knowing when it's unsure) is a weak point — one you have to engineer around.
- Output shape drives output quality. If you want JSON, ask for JSON with a schema. If you want a list, ask for a list. The prompt tells the model what tokens look "probable" next.
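The first point is the easiest one to see in code. The sketch below assumes nothing about any real API: `fake_model` is a hypothetical stand-in that can only react to the text it is handed, and `build_prompt` is an illustrative helper. The point is that the "memory" lives in the caller, which has to re-send earlier turns on every call.

```python
def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call. It can only use what is in `prompt`."""
    for line in prompt.splitlines():
        if line.startswith("user: My name is "):
            name = line.removeprefix("user: My name is ").rstrip(".")
            return f"Your name is {name}."
    return "I don't have that information."


def build_prompt(history: list[tuple[str, str]], new_message: str) -> str:
    """Flatten earlier turns plus the new message into a single prompt string."""
    turns = history + [("user", new_message)]
    return "\n".join(f"{role}: {text}" for role, text in turns)


history = [("user", "My name is Ada."), ("assistant", "Nice to meet you, Ada.")]
question = "What is my name?"

# Call 1: only the new question is sent, so the earlier turn is invisible.
print(fake_model(build_prompt([], question)))       # I don't have that information.

# Call 2: the caller re-sends the history, so the answer is recoverable.
print(fake_model(build_prompt(history, question)))  # Your name is Ada.
```

One caveat: the stand-in admits it lacks the information. As the second point says, a real model may produce confident-sounding text instead, so do not rely on it to flag missing context for you.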
In practice
Next time a model gives you a wrong answer, don't debug by re-phrasing with more superlatives. Ask: did my prompt give it enough context to produce the right next tokens? Nine times out of ten, that's the real question.
Check your understanding
3-question self-check
Optional. Your answers feed your knowledge score on the track certificate.
Q1. At the mechanical level, what does a large language model do on every step?
Q2. Why does the lesson compare an LLM to a brilliant new hire with total amnesia?
Q3. What is the practical consequence of tokens being sub-word units rather than full words?