Memory systems: short, long, and associative

A goldfish agent is useless. But "give it memory" is a one-sentence answer to a six-month problem. Here's the three-kinds-of-memory taxonomy, and why you probably need all of them.

Three kinds of memory

Working memory. The current conversation or task — everything in the context window right now.
Episodic memory. Specific past events/interactions — "what did this user ask last time?"
Semantic memory. General knowledge distilled from many interactions — "this user prefers concise responses."

Most agents ship with only working memory and then fail when users expect continuity.

Working memory: manage it, don't fight it

The context window is your working memory. When it fills, you have to decide what to keep:

Recent messages — always keep.
Task definition — always keep.
Intermediate reasoning — usually drop after a summary.
Tool outputs — drop large ones after extracting the relevant bits.

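Summarization-at-threshold is the standard pattern. Once the window passes a set fill level (70% is a reasonable starting point), summarize the older messages into a compact "story so far" paragraph and keep the task definition and the most recent messages verbatim.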
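
A minimal sketch of summarization-at-threshold, under some loud assumptions: the first message is the task definition, token counts are approximated by whitespace-separated words, and `summarize` is a hypothetical callable you supply (in practice, probably a model call). None of these names come from a specific library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Crude stand-in (assumption): one token per whitespace-separated word.
    # Swap in your model's real tokenizer.
    return len(text.split())


def compact(
    messages: List[Message],
    window_tokens: int,
    summarize: Callable[[List[Message]], str],  # hypothetical summarizer
    threshold: float = 0.7,
    keep_recent: int = 4,
) -> List[Message]:
    """Fold older messages into one summary once the window passes the threshold."""
    used = sum(count_tokens(m.text) for m in messages)
    split = len(messages) - keep_recent
    # Below the threshold, or nothing between the task and the recent tail: no-op.
    if used < threshold * window_tokens or split <= 1:
        return messages
    task, older, recent = messages[0], messages[1:split], messages[split:]
    summary = Message(role="system", text="Story so far: " + summarize(older))
    # Task definition and recent messages survive verbatim.
    return [task, summary] + recent
```

Calling `compact` before every model request keeps the check cheap; the summarizer only runs when the threshold is actually crossed.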

Episodic memory: the logbook

Every meaningful interaction gets stored with metadata (user, timestamp, topic, outcome). Later, when a new query comes in, you retrieve relevant episodes and inject them into the context.

Implementation: usually vector embeddings over the interaction summaries, indexed per user. When a new query arrives, embed it, retrieve the top-k most similar episodes for that user (k = 5 is a common starting point), and include them in the prompt.
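
To make the write and retrieve paths concrete, here is a small in-memory sketch. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and `Episode` and `EpisodeStore` are hypothetical names for illustration, not any vector database's API.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    # Toy embedding (assumption): word counts. Replace with a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Episode:
    user: str
    timestamp: float
    topic: str
    outcome: str
    summary: str  # written at store time, not the raw transcript


class EpisodeStore:
    def __init__(self) -> None:
        # One list per user, so retrieval never crosses user boundaries.
        self._by_user: Dict[str, List[Tuple[Counter, Episode]]] = {}

    def write(self, episode: Episode) -> None:
        self._by_user.setdefault(episode.user, []).append(
            (embed(episode.summary), episode)
        )

    def retrieve(self, user: str, query: str, k: int = 5) -> List[Episode]:
        q = embed(query)
        scored = [(cosine(q, vec), ep) for vec, ep in self._by_user.get(user, [])]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Drop episodes with no similarity at all rather than padding the prompt.
        return [ep for score, ep in scored[:k] if score > 0]
```

The per-user dictionary is the simplest form of "indexed per user"; a production store would more likely use a user filter on a shared index.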

Gotchas:

Don't dump raw transcripts. Summarize at write-time; retrieval returns summaries, not 10k-token dumps.
Temporal decay. A 6-month-old episode is usually less relevant than a 6-day-old one. Weight by recency.
Privacy. Users expect "forget about this." Build the delete path from day one.
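
Two of these gotchas translate directly into code. The sketch below shows one way to weight by recency (exponential decay with a half-life, which is an assumed choice, not the only option) and a minimal delete path. The store layout, a dict of per-user episode lists, is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

DAY = 86400.0  # seconds


def recency_weight(age_seconds: float, half_life_days: float = 30.0) -> float:
    # Halves every `half_life_days`; the 30-day default is an assumption to tune.
    return 0.5 ** (max(age_seconds, 0.0) / (half_life_days * DAY))


def rank(
    candidates: List[Tuple[float, float, str]],  # (similarity, timestamp, summary)
    now: float,
    half_life_days: float = 30.0,
) -> List[str]:
    """Order summaries by similarity scaled by recency."""
    scored = [
        (sim * recency_weight(now - ts, half_life_days), summary)
        for sim, ts, summary in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [summary for _, summary in scored]


def forget(
    store: Dict[str, List[dict]],
    user: str,
    should_forget: Callable[[dict], bool],
) -> int:
    """Remove matching episodes for one user; returns how many were removed."""
    if user not in store:
        return 0
    kept = [ep for ep in store[user] if not should_forget(ep)]
    removed = len(store[user]) - len(kept)
    store[user] = kept
    return removed
```

In a real system the delete path would also need to remove the stored vectors and anything already distilled from those episodes; this sketch only covers the episode records themselves.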

Semantic memory: the personality model

Across many episodes, patterns emerge. "User asks for bullet points 80% of the time." "User is a TypeScript developer." These patterns get distilled into a compact user profile that loads on every request.

This is where agents start to feel personal. It's also where they start to feel creepy if you over-extract. Be conservative about what you generalize.
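
As an illustration of a profile that "loads on every request", the sketch below renders a small key-value profile into a prompt preamble. The profile shape and the `render_profile` name are assumptions; the cap on items is one simple way to stay conservative.

```python
from typing import Dict


def render_profile(profile: Dict[str, str], max_items: int = 5) -> str:
    """Turn a compact user profile into a short prompt preamble."""
    if not profile:
        return ""
    # Sorted for stable output; capped so the profile stays compact.
    items = sorted(profile.items())[:max_items]
    lines = [f"- {key}: {value}" for key, value in items]
    return "Known user preferences:\n" + "\n".join(lines)
```

An empty profile yields an empty string, so new users get no preamble at all.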

The consolidation loop

Periodically (nightly, weekly), run a batch job that:

Reads recent episodes.
Extracts repeated patterns.
Updates the semantic memory.

This is expensive per run but runs asynchronously — not in the request path.
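
The three steps above can be sketched as a single function. This version assumes each episode has already been reduced to a dict of observed traits (for example `{"format": "bullets"}`), which is a hypothetical representation; in practice the extraction step is where a model call would sit. The `min_support` and `min_share` thresholds are assumed knobs for staying conservative.

```python
from collections import Counter
from typing import Dict, Iterable


def consolidate(
    profile: Dict[str, str],
    episodes: Iterable[Dict[str, str]],
    min_support: int = 3,
    min_share: float = 0.8,
) -> Dict[str, str]:
    """Return an updated profile; only strong, repeated patterns are promoted."""
    # 1. Read recent episodes and tally observed values per trait.
    counts: Dict[str, Counter] = {}
    for episode in episodes:
        for key, value in episode.items():
            counts.setdefault(key, Counter())[value] += 1

    # 2. Extract repeated patterns. 3. Update the semantic memory.
    updated = dict(profile)  # leave the caller's profile untouched
    for key, counter in counts.items():
        value, n = counter.most_common(1)[0]
        total = sum(counter.values())
        if n >= min_support and n / total >= min_share:
            updated[key] = value
    return updated
```

With the defaults, a trait seen in four of five episodes is promoted, while one seen only twice is ignored no matter how consistent it is.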

The trade-offs you have to pick

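Retrieval scope: all-time vs last-N-days. All-time gives retrieval more data to draw on, but also surfaces more stale episodes that no longer reflect what the user wants.
Write frequency: every turn vs every session. Writing every turn captures everything but costs a summarize-and-embed step per turn; writing once per session is cheaper but loses detail.
Automation vs. user control: does the user see and edit their memory? Showing memory builds trust; hiding it avoids a UX rabbit hole.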

When you don't need episodic memory

Stateless tasks. Classifiers, extractors, one-shot summarizers.
Per-session only. Customer support chats where each conversation is independent.
Low-traffic per user. If a user interacts once a quarter, retrieval-across-time has few documents to work with.

Don't build infrastructure you won't use.
