
Agent Memory Is Not a Vector Database

Teams reach for a vector DB and stop thinking. Real agents need layered memory: a working set, durable preferences, retrieval, and state that survives a restart. A practical breakdown.

Filed 2026.05.19 · 4 min read

The working set

This is what the agent has in front of it right now: the task, the last few steps, the results of the tools it just called. It lives in the context window, and the context window has a property people forget. It only grows during a task. It never forgets on its own. Every wrong turn, every abandoned approach, every oversized tool response stays in there, crowding out the thing that matters.

So the working set is a curation problem, not a storage problem. The question is not "how do I keep more" but "how do I keep the right things and drop the rest." One of the most reliable patterns I use is to externalise state to files and reconstruct the working set fresh each iteration rather than letting one long conversation accumulate sludge. (I wrote about that as Ralph mode: progress persists in files, failures evaporate when the session resets.)
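As a rough illustration of rebuilding the working set from files, here is a minimal Python sketch. The file layout, the field names (`task`, `done`, `notes`) and the helper names are assumptions made up for this example, not the actual Ralph mode implementation. The point it shows is that the prompt is assembled from durable state on every iteration, so anything not deliberately written down simply does not come back.

```python
import json
from pathlib import Path


def load_progress(path: Path) -> dict:
    """Read durable progress from disk; start empty if the file is missing."""
    if path.exists():
        return json.loads(path.read_text())
    return {"task": "", "done": [], "notes": []}


def record_step(path: Path, progress: dict, step: str, note: str) -> None:
    """Persist what should survive; everything else dies with the session."""
    progress["done"].append(step)
    progress["notes"].append(note)
    path.write_text(json.dumps(progress, indent=2))


def build_working_set(progress: dict, max_notes: int = 3) -> str:
    """Assemble a fresh prompt from durable state only.

    Nothing from earlier attempts is carried over unless it was
    deliberately written into the progress file.
    """
    # Keep only the newest notes so the context stays bounded.
    recent = progress["notes"][-max_notes:] if max_notes > 0 else []
    lines = [f"Task: {progress['task']}", "Completed steps:"]
    lines += [f"- {step}" for step in progress["done"]]
    lines.append("Recent notes:")
    lines += [f"- {note}" for note in recent]
    return "\n".join(lines)
```

Each iteration would call `load_progress`, build the prompt, run one step, and call `record_step` with whatever is worth keeping. Failed attempts leave no trace unless a note about them is written on purpose.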

Durable preferences

Some things should outlive every task. "Always reply in British English." "This customer is on the enterprise plan." "Never auto-approve a refund over 200 francs." These are small, they are mostly text, and they belong somewhere boring and reliable: a row in a database or a config file the agent reads at the start of every run.

People try to cram this into the vector store and it goes wrong immediately, because preferences are not retrieved by semantic similarity. You do not want the agent to remember your refund limit only when the conversation happens to sound similar to the one where you set it. You want it loaded every time, deterministically. A WHERE user_id = ? is the right tool here, and it is fine that it is unfashionable.
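A minimal sketch of that deterministic load, using Python's built-in `sqlite3` module. The `preferences` table, its columns, and the function names are hypothetical, chosen only to illustrate the `WHERE user_id = ?` lookup; any relational store or config file would serve the same purpose.

```python
import sqlite3


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and make sure the (hypothetical) preferences table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS preferences (
            user_id TEXT NOT NULL,
            key     TEXT NOT NULL,
            value   TEXT NOT NULL,
            PRIMARY KEY (user_id, key)
        )
        """
    )
    return conn


def set_preference(conn: sqlite3.Connection, user_id: str, key: str, value: str) -> None:
    """Insert a preference, overwriting any previous value for the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO preferences (user_id, key, value) VALUES (?, ?, ?)",
        (user_id, key, value),
    )
    conn.commit()


def load_preferences(conn: sqlite3.Connection, user_id: str) -> dict[str, str]:
    """Load every preference for a user. No similarity search, no ranking."""
    rows = conn.execute(
        "SELECT key, value FROM preferences WHERE user_id = ? ORDER BY key",
        (user_id,),
    )
    return dict(rows.fetchall())
```

The agent would call `load_preferences` at the start of every run and place the result in its prompt or policy checks, so the refund limit is present whether or not the conversation resembles the one where it was set.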

Retrieval over knowledge

This is the layer the vector database actually belongs to: searching a body of documents, code, or past cases the agent was not trained on and cannot fit in context. Here the real lesson is that pure vector search is not enough either.

Semantic search misses exact matches. Ask for an error code or a product SKU and embeddings will happily return things that are "about" it while skipping the document that literally contains it. Keyword search has the opposite blind spot: it nails the exact string and misses anything phrased differently. The retrieval that holds up in production runs both and reranks the combined results. That is exactly why I built GNO on hybrid search (keyword plus vectors plus a reranking pass) rather than embeddings alone. When I built clinical retrieval at a Swiss health-tech company, the same shape held: pgvector for the semantic layer, a two-stage context build on top, because the first pass casts wide and the second pass earns precision.
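To make the "run both and combine" shape concrete, here is a small Python sketch. It is not how GNO is implemented. It assumes the semantic ranking comes from elsewhere (an embedding index, for instance) as an ordered list of document ids, uses a toy exact-substring ranker for the keyword side, and merges the two with reciprocal rank fusion, a simple and widely used fusion method standing in here for a heavier reranking pass. All function names are illustrative.

```python
from collections import defaultdict


def keyword_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Rank doc ids by how often the exact query string appears (toy keyword search)."""
    needle = query.lower()
    hits = {doc_id: text.lower().count(needle) for doc_id, text in docs.items()}
    matched = [doc_id for doc_id, count in hits.items() if count > 0]
    # Most occurrences first; ties broken by id so the order is stable.
    return sorted(matched, key=lambda doc_id: (-hits[doc_id], doc_id))


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists: each list adds 1 / (k + rank) to a doc's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A document that appears in both lists accumulates score from each, so the page that literally contains the error code rises above pages that are merely "about" the topic. In a production pipeline a dedicated reranker would typically rescore the top of the fused list.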

State that survives a restart

The fourth layer is the one that separates a chatbot from an agent: where are we in this multi-step process, and can we pick it up after a crash. What has the agent already done, what is still pending, what is blocked.

This is not memory in the "recall a fact" sense. It is a state machine with durability. For a support agent it is the case status. For a coding agent it is the task graph: which tasks are done, which are in progress, which got blocked after three failed attempts. In Flow-Next this lives in version-controlled files so the whole run has an audit trail and any iteration can re-read exactly where things stand. If your agent forgets what it was doing the moment the process restarts, it does not have memory. It has a goldfish with a good vocabulary.

The actual design question

Before you stand up a vector database, ask what kind of remembering the agent has to do, and you will usually find it is several kinds at once. The working set wants curation. Preferences want a key-value lookup. Knowledge wants hybrid retrieval. Process wants durable state. Each has a different right answer, and a vector store is the right answer to exactly one of them.

Get this layered and most of the "the agent is unreliable" complaints quietly disappear, because the agent stops hallucinating the things it should have remembered and stops drowning in the things it should have forgotten.

Memory is one of five layers. The rest (tools, verification, guardrails, observability) are in the field guide to harness engineering.

Mickel.tech
