Memory systems: short, long, and associative
The three kinds of memory an agent needs and how to build each.
A goldfish agent is useless. But "give it memory" is a one-sentence answer to a six-month problem. Here's the three-kinds-of-memory taxonomy, and why you probably need all of them.
Three kinds of memory
- Working memory. The current conversation or task — everything in the context window right now.
- Episodic memory. Specific past events/interactions — "what did this user ask last time?"
- Semantic memory. General knowledge distilled from many interactions — "this user prefers concise responses."
Many agents ship with only working memory, then fail as soon as users expect continuity across sessions.
Working memory: manage it, don't fight it
The context window is your working memory. When it fills, you have to decide what to keep:
- Recent messages — always keep.
- Task definition — always keep.
- Intermediate reasoning — usually drop after a summary.
- Tool outputs — drop large ones after extracting the relevant bits.
Summarization-at-threshold is the standard pattern. When the window crosses a fill threshold (say, 70% full), summarize the older messages into a compact "story so far" paragraph and keep only that summary plus the recent turns.
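A minimal sketch of summarization-at-threshold, under some loud assumptions: the first message holds the task definition, token counts are estimated at roughly four characters per token, and `summarize` is a hypothetical callable you supply (typically a model call). None of these names come from a specific library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str      # "system", "user", "assistant", or "tool"
    content: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token. Use a real tokenizer in practice.
    return max(1, len(text) // 4)


def compact_history(
    messages: List[Message],
    summarize: Callable[[List[Message]], str],
    window_tokens: int,
    threshold: float = 0.7,
    keep_recent: int = 4,
) -> List[Message]:
    """Summarize older messages once usage reaches threshold * window_tokens."""
    used = sum(estimate_tokens(m.content) for m in messages)
    if used < threshold * window_tokens:
        return messages  # still under budget, nothing to do

    # Always keep the task definition (assumed to be messages[0]) and recent turns.
    task, rest = messages[0], messages[1:]
    split = len(rest) - keep_recent
    if split <= 0:
        return messages  # nothing old enough to summarize
    older, recent = rest[:split], rest[split:]

    summary = Message("system", "Story so far: " + summarize(older))
    return [task, summary, *recent]
```

Calling `compact_history` before every model request keeps the policy in one place; the 0.7 threshold and `keep_recent=4` are starting points to tune, not recommendations from the lesson.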
Episodic memory: the logbook
Every meaningful interaction gets stored with metadata (user, timestamp, topic, outcome). Later, when a new query comes in, you retrieve relevant episodes and inject them into the context.
Implementation: usually vector embeddings over the interaction summaries, indexed per user. When a new query arrives, embed it, retrieve the top 5 episodes, and include them in the prompt.
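The store-and-retrieve flow can be sketched as follows. To keep it runnable without any dependencies, `toy_embed` is a bag-of-words stand-in for a real embedding model, and `EpisodeStore` is a hypothetical in-memory class rather than any particular vector database.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List


def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Episode:
    user_id: str
    timestamp: float   # Unix seconds
    topic: str
    outcome: str
    summary: str       # written at store time, not the raw transcript
    vector: Counter


class EpisodeStore:
    """In-memory episodic memory, indexed per user."""

    def __init__(self) -> None:
        self._by_user: Dict[str, List[Episode]] = defaultdict(list)

    def add(self, user_id: str, timestamp: float, topic: str,
            outcome: str, summary: str) -> None:
        episode = Episode(user_id, timestamp, topic, outcome, summary,
                          toy_embed(summary))
        self._by_user[user_id].append(episode)

    def retrieve(self, user_id: str, query: str, k: int = 5) -> List[Episode]:
        # Embed the query, then rank only this user's episodes by similarity.
        q = toy_embed(query)
        episodes = self._by_user.get(user_id, [])
        ranked = sorted(episodes, key=lambda e: cosine(q, e.vector), reverse=True)
        return ranked[:k]
```

The summaries returned by `retrieve` are what gets injected into the prompt. Scoping the search to one user's list is what "indexed per user" buys you: no cross-user leakage and a much smaller search space.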
Gotchas:
- Don't dump raw transcripts. Summarize at write-time; retrieval returns summaries, not 10k-token dumps.
- Temporal decay. A 6-month-old episode is usually less relevant than a 6-day-old one. Weight by recency.
- Privacy. Users expect "forget about this." Build the delete path from day one.
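Two of these gotchas are easy to sketch in code. The example below assumes an exponential decay with a 30-day half-life, which is one reasonable weighting among many and not something the lesson prescribes; `ScoredEpisode`, `rerank`, and `forget` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SECONDS_PER_DAY = 86_400.0


@dataclass
class ScoredEpisode:
    episode_id: str
    summary: str        # summarized at write time, never the raw transcript
    timestamp: float    # Unix seconds
    similarity: float   # from the vector search, assumed to lie in [0, 1]


def recency_weight(timestamp: float, now: float, half_life_days: float = 30.0) -> float:
    # Assumed decay curve: the weight halves every half_life_days.
    age_days = max(0.0, now - timestamp) / SECONDS_PER_DAY
    return 0.5 ** (age_days / half_life_days)


def rerank(candidates: List[ScoredEpisode], now: float,
           half_life_days: float = 30.0) -> List[ScoredEpisode]:
    # Combine similarity and recency so older episodes need a stronger match.
    return sorted(
        candidates,
        key=lambda c: c.similarity * recency_weight(c.timestamp, now, half_life_days),
        reverse=True,
    )


def forget(store: Dict[str, List[ScoredEpisode]], user_id: str,
           episode_id: Optional[str] = None) -> int:
    """Delete one episode, or all of a user's episodes. Returns the count removed."""
    episodes = store.get(user_id, [])
    if episode_id is None:
        store.pop(user_id, None)
        return len(episodes)
    kept = [e for e in episodes if e.episode_id != episode_id]
    if kept:
        store[user_id] = kept
    else:
        store.pop(user_id, None)
    return len(episodes) - len(kept)
```

In a real deployment the delete path also has to reach the vector index, any caches, and any profile data derived from the deleted episodes; this sketch covers only the primary store.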
Semantic memory: the personality model
Across many episodes, patterns emerge. "User asks for bullet points 80% of the time." "User is a TypeScript developer." These patterns get distilled into a compact user profile that loads on every request.
This is where agents start to feel personal. It's also where they start to feel creepy if you over-extract. Be conservative about what you generalize.
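One way to stay conservative is to attach evidence to every trait and only load the well-supported ones into the prompt. This is a sketch under assumed names (`Trait`, `UserProfile`, `to_prompt`) and assumed cutoffs (75% confidence, at least 5 supporting episodes); pick thresholds that suit your product.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Trait:
    value: str
    confidence: float   # fraction of observed episodes supporting the trait
    evidence: int       # number of episodes the trait was observed in


@dataclass
class UserProfile:
    user_id: str
    traits: Dict[str, Trait] = field(default_factory=dict)

    def to_prompt(self, min_confidence: float = 0.75, min_evidence: int = 5) -> str:
        """Render only well-supported traits as a compact block for the prompt."""
        lines = [
            f"- {name}: {trait.value}"
            for name, trait in sorted(self.traits.items())
            if trait.confidence >= min_confidence and trait.evidence >= min_evidence
        ]
        if not lines:
            return ""
        return "Known user preferences:\n" + "\n".join(lines)


profile = UserProfile("u1", {
    "format": Trait("bullet points", confidence=0.8, evidence=20),
    "language": Trait("TypeScript", confidence=0.9, evidence=12),
    "mood": Trait("impatient", confidence=0.6, evidence=3),  # too weak to load
})
print(profile.to_prompt())
```

Because the profile loads on every request, keeping it to a handful of short lines matters more here than anywhere else in the memory stack.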
The consolidation loop
Periodically (nightly, weekly), run a batch job that:
- Reads recent episodes.
- Extracts repeated patterns.
- Updates the semantic memory.
This is expensive per run but runs asynchronously — not in the request path.
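The batch job's three steps can be sketched as a pure function. The assumptions here: each episode already carries `(key, value)` observations tagged at write time (a hypothetical schema), and a pattern is promoted only when it appears in at least `min_support` episodes and accounts for at least `min_share` of the observations for that key.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical schema: observations tagged on each episode at write time,
# e.g. ("format", "bullet points").
Observation = Tuple[str, str]


def consolidate(
    episodes: Iterable[List[Observation]],
    profile: Dict[str, str],
    min_support: int = 3,
    min_share: float = 0.6,
) -> Dict[str, str]:
    """Read recent episodes, extract repeated patterns, return an updated profile."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for observations in episodes:
        # Count each (key, value) pair at most once per episode.
        for key, value in set(observations):
            counts[key][value] += 1

    updated = dict(profile)  # leave the caller's profile untouched
    for key, values in counts.items():
        value, support = values.most_common(1)[0]
        share = support / sum(values.values())
        if support >= min_support and share >= min_share:
            updated[key] = value
    return updated
```

Running this from a scheduler (cron, a queue worker, or similar) keeps the cost out of the request path; the request handler only ever reads the resulting profile.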
The trade-offs you have to pick
- Retrieval scope: all-time vs. last-N-days. All-time retrieval has more to draw on, but it also surfaces more stale episodes that no longer reflect what the user wants.
- Write frequency: every turn vs. every session. Writing every turn is comprehensive but expensive.
- Automation vs. user control: does the user see and edit their memory? Showing memory builds trust; hiding it avoids a UX rabbit hole.
When you don't need episodic memory
- Stateless tasks. Classifiers, extractors, one-shot summarizers.
- Per-session only. Customer support chats where each conversation is independent.
- Low traffic per user. If a user interacts once a quarter, retrieval across time has few episodes to work with.
Don't build infrastructure you won't use.