Entry
Agent Memory Is Not a Vector Database
Teams reach for a vector DB and stop thinking. Real agents need layered memory: a working set, durable preferences, retrieval, and state that survives a restart. A practical breakdown.
Ask most teams how their agent remembers things and the answer is "we put it in a vector database." Ask what they store there and you get a pause, because the honest answer is "everything, and we are not sure it helps."
A vector store is one kind of memory. It is good at one question: find me things that mean something similar to this. That is genuinely useful, and it is also a small slice of what an agent actually needs to remember. When teams treat the vector DB as the whole memory system, they end up with an agent that can find a vaguely related document but cannot remember what you told it ninety seconds ago. Memory is a layer in the system around the model, and it has at least four parts.
The first layer is the working set, what the agent has in front of it right now: the task, the last few steps, the results of the tools it just called. It lives in the context window, and the context window has a property people forget: it only grows during a task. It never forgets on its own. Every wrong turn, every abandoned approach, every oversized tool response stays in there, crowding out the thing that matters.
So the working set is a curation problem, not a storage problem. The question is not "how do I keep more" but "how do I keep the right things and drop the rest." One of the most reliable patterns I use is to externalise state to files and reconstruct the working set fresh each iteration rather than letting one long conversation accumulate sludge. (I wrote about that as Ralph mode: progress persists in files, failures evaporate when the session resets.)
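A minimal sketch of that loop, assuming progress lives in a single JSON file. The helpers `load_progress`, `save_progress` and `build_working_set` and the `done`/`notes` fields are hypothetical, not Ralph mode's actual format. What it shows is that each iteration's context is rebuilt from persisted state, so nothing from an earlier transcript survives unless it was deliberately written down.

```python
# Hypothetical sketch: file layout and helper names are illustrative,
# not Ralph mode's actual format.
import json
from pathlib import Path


def load_progress(path: Path) -> dict:
    """Read persisted progress; a missing file means a fresh start."""
    if path.exists():
        return json.loads(path.read_text())
    return {"done": [], "notes": []}


def save_progress(path: Path, progress: dict) -> None:
    """Write to a temp file, then rename it over the old one (rename is atomic
    on POSIX filesystems), so a crash mid-write does not leave a torn file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(progress, indent=2))
    tmp.replace(path)


def build_working_set(task: str, progress: dict, max_notes: int = 5) -> str:
    """Rebuild the context from scratch each iteration: the task, what is done,
    and only the most recent notes. No earlier transcript is carried over."""
    lines = [f"Task: {task}"]
    lines.append("Done: " + (", ".join(progress["done"]) or "nothing yet"))
    recent = progress["notes"][-max_notes:] if max_notes > 0 else []
    lines.extend(f"Note: {note}" for note in recent)
    return "\n".join(lines)
```

In a loop, each iteration calls `load_progress`, builds the prompt with `build_working_set`, runs the model, appends only the outcomes worth keeping, and calls `save_progress`. A failed attempt that records nothing simply disappears on the next rebuild.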
The second layer is durable preferences. Some things should outlive every task: "Always reply in British English." "This customer is on the enterprise plan." "Never auto-approve a refund over 200 francs." These are small, they are mostly text, and they belong somewhere boring and reliable: a row in a database or a config file the agent reads at the start of every run.
People try to cram this into the vector store and it goes wrong immediately, because preferences are not retrieved by semantic similarity. You do not want the agent to remember your refund limit only when the conversation happens to sound similar to the one where you set it. You want it loaded every time, deterministically. A WHERE user_id = ? is the right tool here, and it is fine that it is unfashionable.
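A minimal sketch using Python's built-in `sqlite3`, assuming a hypothetical `preferences` table keyed by `(user_id, name)`; the table layout and helper names are illustrative. Every run loads the full set with the same query, so nothing depends on what the conversation happens to sound like.

```python
# Hypothetical sketch: table layout and helper names are illustrative.
import sqlite3
from typing import Dict


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS preferences ("
        " user_id TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (user_id, name))"
    )
    return conn


def set_preference(conn: sqlite3.Connection, user_id: str, name: str, value: str) -> None:
    # INSERT OR REPLACE: a newer instruction for the same name overwrites the old one.
    conn.execute(
        "INSERT OR REPLACE INTO preferences (user_id, name, value) VALUES (?, ?, ?)",
        (user_id, name, value),
    )
    conn.commit()


def load_preferences(conn: sqlite3.Connection, user_id: str) -> Dict[str, str]:
    # Deterministic: the same rows every run, whatever the conversation is about.
    rows = conn.execute(
        "SELECT name, value FROM preferences WHERE user_id = ? ORDER BY name",
        (user_id,),
    ).fetchall()
    return dict(rows)


def render_preferences(prefs: Dict[str, str]) -> str:
    """Format preferences for the top of the agent's instructions."""
    return "\n".join(f"- {k}: {v}" for k, v in prefs.items())
```

At the start of every run the agent calls `load_preferences(conn, user_id)` and puts the rendered result at the top of its instructions, whether or not the conversation mentions any of them.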
The third layer is retrieval, and it is the layer the vector database actually belongs to: searching a body of documents, code, or past cases the agent was not trained on and cannot fit in context. Even here, the real lesson is that pure vector search is not enough.
Semantic search misses exact matches. Ask for an error code or a product SKU and embeddings will happily return things that are "about" it while skipping the document that literally contains it. Keyword search has the opposite blind spot: it nails the exact string and misses anything phrased differently. The retrieval that holds up in production runs both and reranks the combined results. That is exactly why I built GNO on hybrid search (keyword plus vectors plus a reranking pass) rather than embeddings alone. When I built clinical retrieval at a Swiss health-tech company, the same shape held: pgvector for the semantic layer, a two-stage context build on top, because the first pass casts wide and the second pass earns precision.
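A minimal sketch of that shape, not GNO's implementation. It assumes embeddings were computed elsewhere and stored on each document as `vec`, uses exact-token overlap as a crude stand-in for a real keyword ranker such as BM25, and merges the two rankings with reciprocal rank fusion, one common fusion method swapped in here for illustration. `rerank` is a hypothetical callable standing in for whatever second-pass scorer you use, for example a cross-encoder.

```python
# Hypothetical sketch of hybrid retrieval; not GNO's implementation.
import math
import re
from typing import Callable, Dict, List, Set


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9_-]+", text.lower()))


def keyword_rank(query: str, docs: List[dict]) -> List[str]:
    """Exact-token overlap, a crude stand-in for BM25. Catches SKUs and error codes."""
    q = tokens(query)
    scored = [(len(q & tokens(d["text"])), d["id"]) for d in docs]
    return [i for s, i in sorted(scored, key=lambda x: -x[0]) if s > 0]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_rank(query_vec: List[float], docs: List[dict]) -> List[str]:
    """Semantic ranking over precomputed embeddings stored as d["vec"]."""
    scored = [(cosine(query_vec, d["vec"]), d["id"]) for d in docs]
    return [i for _, i in sorted(scored, key=lambda x: -x[0])]


def rrf(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) to a document's score."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: -scores[d])


def hybrid_search(query: str, query_vec: List[float], docs: List[dict],
                  rerank: Callable[[str, str], float],
                  wide: int = 50, top: int = 5) -> List[str]:
    by_id = {d["id"]: d for d in docs}
    # Pass 1 casts wide: both retrievers, fused into one shortlist.
    shortlist = rrf([keyword_rank(query, docs), vector_rank(query_vec, docs)])[:wide]
    # Pass 2 earns precision: the expensive reranker only sees the shortlist.
    shortlist.sort(key=lambda d: -rerank(query, by_id[d]["text"]))
    return shortlist[:top]
```

The design choice worth noting is the split: fusion is cheap and runs over everything, while the reranker, usually the slowest part, only ever scores the `wide` candidates that survived the first pass.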
The fourth layer is the one that separates a chatbot from an agent: where are we in this multi-step process, and can we pick it up after a crash. What has the agent already done, what is still pending, what is blocked.
This is not memory in the "recall a fact" sense. It is a state machine with durability. For a support agent it is the case status. For a coding agent it is the task graph: which tasks are done, which are in progress, which got blocked after three failed attempts. In Flow-Next this lives in version-controlled files so the whole run has an audit trail and any iteration can re-read exactly where things stand. If your agent forgets what it was doing the moment the process restarts, it does not have memory. It has a goldfish with a good vocabulary.
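A minimal sketch of durable task state, assuming a single JSON file rewritten on every transition. This is not Flow-Next's actual file format; `TaskGraph`, `Task` and the status names are hypothetical. The three-attempt limit mirrors the example in the paragraph above.

```python
# Hypothetical sketch: names and file layout are illustrative, not Flow-Next's format.
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List, Optional

MAX_ATTEMPTS = 3  # a task is blocked after three failed attempts


@dataclass
class Task:
    id: str
    status: str = "pending"  # pending | in_progress | done | blocked
    attempts: int = 0
    deps: List[str] = field(default_factory=list)


class TaskGraph:
    """Task state that lives on disk; every transition is saved before returning."""

    def __init__(self, path: Path):
        self.path = path
        self.tasks: Dict[str, Task] = {}
        if path.exists():  # resuming after a restart: the file is the source of truth
            self.tasks = {t["id"]: Task(**t) for t in json.loads(path.read_text())}

    def save(self) -> None:
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps([asdict(t) for t in self.tasks.values()], indent=2))
        tmp.replace(self.path)

    def add(self, task_id: str, deps: Optional[List[str]] = None) -> None:
        self.tasks.setdefault(task_id, Task(task_id, deps=list(deps or [])))
        self.save()

    def next_task(self) -> Optional[Task]:
        for t in self.tasks.values():
            if t.status == "in_progress":  # interrupted mid-flight: resume it first
                return t
        for t in self.tasks.values():
            if t.status == "pending" and all(self.tasks[d].status == "done" for d in t.deps):
                return t
        return None

    def _set(self, task_id: str, status: str) -> None:
        self.tasks[task_id].status = status
        self.save()

    def start(self, task_id: str) -> None:
        self._set(task_id, "in_progress")

    def complete(self, task_id: str) -> None:
        self._set(task_id, "done")

    def fail(self, task_id: str) -> None:
        t = self.tasks[task_id]
        t.attempts += 1
        self._set(task_id, "blocked" if t.attempts >= MAX_ATTEMPTS else "pending")
```

Because every transition is written to disk before the method returns, a fresh `TaskGraph(path)` after a crash sees exactly where the run stood, including a task that was `in_progress` when the process died. Keeping that file under version control is what turns it into an audit trail.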
Before you stand up a vector database, ask what kind of remembering the agent has to do, and you will usually find it is several kinds at once. The working set wants curation. Preferences want a key-value lookup. Knowledge wants hybrid retrieval. Process wants durable state. Each has a different right answer, and a vector store is the right answer to exactly one of them.
Get this layered and most of the "the agent is unreliable" complaints quietly disappear, because the agent stops hallucinating the things it should have remembered and stops drowning in the things it should have forgotten.
Memory is one of five layers. The rest (tools, verification, guardrails, observability) are in the field guide to harness engineering.