Embed-on-Write, Recall-on-Read: Vector Memory With Zero New Infra
How Matrix gives agents persistent vector memory by storing 768-d embeddings on the graph node and querying Neo4j's native HNSW index — no separate vector DB.
The default architecture diagram for "give my agent memory" has a new box in it: a dedicated vector database. Pinecone, Weaviate, Qdrant, pgvector: pick one, stand it up, give it credentials, keep it in sync with your real data, and now you have two stores that have to agree about the world.
Matrix doesn't have that box. The same Neo4j graph that holds your organizations, agents, sessions, and contacts is also the vector memory database. An agent's memory is a node in the graph with a 768-dimensional float vector hanging off it, and recall is a Cypher call against Neo4j's native HNSW index. No second store, no sync job, no extra deploy target.
This post is the how — the write path, the recall path, the one HNSW index that quietly serves two completely different features, and the deliberate decision to keep the vector off the normal property pipeline.
The premise: you already have a vector store
If you're running a recent Neo4j 5 release (vector indexes arrived in 5.11), you already have a production-grade approximate-nearest-neighbour index. Neo4j ships a native HNSW vector index: you declare a node label, a property to index, the dimensionality, and a similarity function, and you get db.index.vector.queryNodes(...) for free.
So the question isn't "which vector DB do I bolt on." It's "why would I run a second database when the one holding my domain model already does this." For a multi-tenant agent platform where memory is inseparable from the contact, the agent, and the session it belongs to, the answer is: you wouldn't. Keeping memory in the graph means a recall query and a "who is this contact" query hit the same store, in the same transaction boundary, under the same tenant filter.
Embed-on-write: every memory carries its vector
A memory in Matrix is a Memory entity — same generic EntityType / EntityNode model as everything else. When MemoryService.write(...) persists a fact, it does two things in sequence: write the row, then embed it.
EntityDto saved = entityManager.createEntity(SystemSchemaSeeder.MEMORY_TYPE, props);
// Best-effort embedding. Vectorize failures (no API key, transient network) leave the row
// without a vector — it still recalls via substring fallback and gets a vector next write.
try {
float[] vec = embeddings.vectorize(content);
if (vec != null && vec.length > 0) vectorStore.setVector(saved.id(), vec);
} catch (Exception e) {
log.warn("embed-on-write failed for memory {}: {}", saved.id(), e.getMessage());
}
The embedding backend is pluggable behind matrix.memory.embeddings. The default (gemini) is GeminiEmbeddingBackend, which calls Google's gemini-embedding-001 and pins the output to 768 dimensions via outputDimensionality. That pin is load-bearing: the Neo4j index is created at exactly 768d, so a vector of any other length (such as the model's larger native output) would not fit the index and could not be served by vector recall. There's also a vertex backend (text-embedding-005, same 768d pin) for deployments routing embeddings through Vertex AI.
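The 768-dimension pin can also be checked in application code before a vector ever reaches the node. The following is a minimal sketch, assuming a hypothetical EmbeddingBackend interface and a hypothetical DimensionGuard wrapper; neither name is confirmed as Matrix's actual API.

```java
import java.util.Objects;

/** Hypothetical backend contract; Matrix's real interface may differ. */
interface EmbeddingBackend {
    float[] vectorize(String text);
}

/** Wraps any backend and rejects vectors whose length differs from the index dimension. */
final class DimensionGuard implements EmbeddingBackend {
    static final int INDEX_DIMENSIONS = 768; // must match vector.dimensions on the index

    private final EmbeddingBackend delegate;

    DimensionGuard(EmbeddingBackend delegate) {
        this.delegate = Objects.requireNonNull(delegate);
    }

    @Override
    public float[] vectorize(String text) {
        float[] vec = delegate.vectorize(text);
        if (vec == null || vec.length == 0) {
            return new float[0]; // treated as "no vector", same as a noop backend
        }
        if (vec.length != INDEX_DIMENSIONS) {
            throw new IllegalStateException(
                "expected " + INDEX_DIMENSIONS + " dimensions, got " + vec.length);
        }
        return vec;
    }
}
```

Because the write path shown earlier catches exceptions around the embed call, a mismatch surfaced this way would be logged and the row would simply persist without a vector.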
Two design choices in that snippet matter:
- Embedding is best-effort. If the embed call throws — no API key, a transient network blip — the row still persists. It just lands without a vector. It remains recallable via the substring fallback (below) and picks up a vector on its next write. Memory writes never fail because an embedding API hiccuped.
- The vector goes straight onto the node, not through the property pipeline. vectorStore.setVector(id, vec) is raw Cypher. More on why next.
Why the vector bypasses the property pipeline
Matrix's generic entity model stores scalar fields as native Neo4j node properties, governed by PropertyDefinition schemas and a CRUD pipeline that validates, splits, and round-trips them. The embedding doesn't go through any of that. MemoryVectorStore writes it directly:
session.run(
"MATCH (n:Entity) WHERE id(n) = $id SET n.embedding = $vec",
Values.parameters("id", entityId, "vec", doubles));
The vector is internal plumbing, not a user-facing field. It isn't something an operator types into a form or an API caller sets. Treating it as a regular property would mean a PropertyDefinition for a 768-element float array, dragging it through serialization on every read, and shipping several kilobytes of floats across the bolt wire and out the API on every entity fetch.
Instead, embedding is a reserved key. EntityManager keeps a small denylist —
private static final Set<String> RESERVED_KEYS = Set.of("entityType", "orgId", "embedding");
— and the recall-side entity fetch never pulls properties(n) at all, so the raw vector stays off the wire and out of the user-facing DTO entirely. The graph node holds it; the index reads it; nothing else sees it.
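A denylist like this is usually applied in two directions: stripping reserved keys before a node's properties become a DTO, and rejecting caller input that tries to set one. The sketch below illustrates both under stated assumptions. The class ReservedKeyFilter and its two methods are hypothetical, and only the three key names come from the snippet above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class ReservedKeyFilter {
    // Same three keys as the denylist quoted in the text.
    static final Set<String> RESERVED_KEYS = Set.of("entityType", "orgId", "embedding");

    /** Returns a copy of the node properties with reserved keys removed, preserving order. */
    static Map<String, Object> userFacing(Map<String, Object> nodeProps) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : nodeProps.entrySet()) {
            if (!RESERVED_KEYS.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    /** Rejects caller-supplied properties that try to set a reserved key. */
    static void requireNoReservedKeys(Map<String, Object> callerProps) {
        for (String key : callerProps.keySet()) {
            if (RESERVED_KEYS.contains(key)) {
                throw new IllegalArgumentException("reserved property key: " + key);
            }
        }
    }
}
```

Filtering on the way out keeps the vector out of API responses even if a query does return the full property map. Rejecting on the way in stops a caller from overwriting the embedding through the generic CRUD path.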
Recall-on-read: vector search, then re-rank
Recall embeds the query and asks the HNSW index. MemoryVectorStore.searchScoped(...) is the core:
CALL db.index.vector.queryNodes($index, $k, $vec) YIELD node, score
WHERE node.orgId = $orgId AND node.entityType = $type
RETURN id(node) AS id, properties(node) AS props, score
ORDER BY score DESC
$index is entity_node_embedding, created by the V003__memory_vector_index.cypher migration:
CREATE VECTOR INDEX entity_node_embedding IF NOT EXISTS
FOR (n:Entity) ON (n.embedding)
OPTIONS { indexConfig: {
`vector.dimensions`: 768,
`vector.similarity_function`: 'cosine'
}};
Cosine fits because it compares the direction of two embeddings and ignores their magnitude. A few things the query is doing on purpose:
- It over-fetches. The index returns a candidate set wider than the requested topK (4× plus a floor), because scope filtering happens in Java afterward, by agent id, user id, and conversation id. The HNSW index can't filter on those without bloating its own definition, so we ask for extra and trim.
- It's tenant-scoped at the index boundary. node.orgId = $orgId AND-s the current TenantContext onto every recall. The index is shared across tenants; the query is not.
- It hides superseded facts. Memory rows that have been replaced (temporal validity is a separate post) carry a supersededAt and are dropped from results, so the latest fact wins without losing history.
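The over-fetch-then-trim step can be sketched in a few lines of Java. This is an illustration under stated assumptions: the Hit record shape, the floor of 32, and the ScopedTrim class are hypothetical, and conversation id is left out for brevity. Only the 4× factor and the idea of filtering by scope and supersededAt come from the text.

```java
import java.time.Instant;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class ScopedTrim {
    /** Hypothetical candidate shape; field names are illustrative. */
    record Hit(long id, String agentId, String userId, Instant supersededAt, double score) {}

    static final int OVERFETCH_FACTOR = 4;  // the "4x" from the text
    static final int CANDIDATE_FLOOR = 32;  // assumed floor; the real value is not stated

    /** How many candidates to request from the index for a given topK. */
    static int candidateCount(int topK) {
        return Math.max(topK * OVERFETCH_FACTOR, CANDIDATE_FLOOR);
    }

    /** Keeps live hits in the requested scope, best score first, cut to topK. */
    static List<Hit> trim(List<Hit> candidates, String agentId, String userId, int topK) {
        return candidates.stream()
            .filter(h -> h.supersededAt() == null)              // drop replaced facts
            .filter(h -> Objects.equals(h.agentId(), agentId))  // agent scope
            .filter(h -> Objects.equals(h.userId(), userId))    // user scope
            .sorted((a, b) -> Double.compare(b.score(), a.score()))
            .limit(topK)
            .collect(Collectors.toList());
    }
}
```

The trade-off is that a very narrow scope inside a very busy tenant can still come up short after trimming, which is one reason a fallback path is worth having.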
What comes back isn't returned raw. MemoryService.recall(...) re-ranks the candidates by a blend of similarity + recency + importance, so a stale-but-semantically-close fact doesn't outrank a fresher, more important one. Then it cuts to topK.
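One way to blend similarity, recency, and importance is a weighted sum with an exponential recency decay. The sketch below assumes exactly that. The weights, the 30-day half-life, and the Candidate record are illustrative choices, not values taken from MemoryService.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class BlendedRanker {
    /** Hypothetical candidate shape: similarity and importance are both in [0, 1]. */
    record Candidate(long id, double similarity, double importance, Instant createdAt) {}

    // Illustrative weights and half-life; not Matrix's actual values.
    static final double W_SIMILARITY = 0.6;
    static final double W_RECENCY = 0.25;
    static final double W_IMPORTANCE = 0.15;
    static final double HALF_LIFE_DAYS = 30.0;

    /** Recency in (0, 1]: 1.0 for a brand-new fact, halving every HALF_LIFE_DAYS. */
    static double recency(Instant createdAt, Instant now) {
        double ageDays = Math.max(0, Duration.between(createdAt, now).toHours()) / 24.0;
        return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
    }

    /** Weighted sum of the three signals. */
    static double blended(Candidate c, Instant now) {
        return W_SIMILARITY * c.similarity()
             + W_RECENCY * recency(c.createdAt(), now)
             + W_IMPORTANCE * c.importance();
    }

    /** Sorts by blended score, highest first, and cuts to topK. */
    static List<Candidate> rank(List<Candidate> candidates, Instant now, int topK) {
        return candidates.stream()
            .sorted(Comparator.comparingDouble((Candidate c) -> blended(c, now)).reversed())
            .limit(topK)
            .collect(Collectors.toList());
    }
}
```

With these weights, a year-old fact at similarity 0.90 and low importance scores below a day-old fact at similarity 0.85 and high importance, which is the behaviour the paragraph above describes.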
The fallback that means recall never returns empty by accident
float[] qVec = (query == null || query.isBlank()) ? new float[0] : safeVectorize(query);
if (qVec.length > 0) {
List<MemoryVectorStore.Hit> hits = vectorStore.searchScoped(kind, scope, qVec, Math.max(topK, 8));
if (!hits.isEmpty()) {
// ... rank and return
}
}
// Substring / no-query fallback.
return substringRecall(scope, kind, query, topK);
If the embedding backend is the noop (matrix.memory.embeddings=noop — useful for tests and dev boxes without a Gemini key), or the vector index simply has nothing yet (the very first writes on a fresh DB right after migration), the vector path yields nothing and recall degrades gracefully to a substring scan over the scoped rows. The write path still works in noop mode — it just leaves the index unfilled. You can develop the whole memory feature with embeddings off and the agent still remembers, just lexically instead of semantically.
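The lexical fallback can be as simple as a case-insensitive contains check over rows that are already scoped. The sketch below is an assumption about the general shape of such a scan, with a hypothetical Row record and SubstringRecall class; it is not the body of substringRecall.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class SubstringRecall {
    /** Hypothetical row shape: just an id and the stored fact. */
    record Row(long id, String content) {}

    /**
     * Case-insensitive substring scan over rows that are already scoped.
     * A null or blank query matches everything, so a no-query recall
     * returns the first topK rows in the order given.
     */
    static List<Row> recall(List<Row> scopedRows, String query, int topK) {
        String needle = (query == null) ? "" : query.trim().toLowerCase(Locale.ROOT);
        return scopedRows.stream()
            .filter(r -> needle.isEmpty()
                || r.content().toLowerCase(Locale.ROOT).contains(needle))
            .limit(topK)
            .collect(Collectors.toList());
    }
}
```

A scan like this only finds literal matches, so "tea" will not surface a memory about "oolong". That is the lexical-versus-semantic gap the paragraph above refers to.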
One index, two features
Here's the part that pays for the whole design. The exact same entity_node_embedding index that powers memory recall also powers Knowledge RAG.
When you drag a PDF into a corpus, each chunk becomes a KnowledgeChunk entity with its own 768-d embedding on the node. KnowledgeVectorStore searches it — and there was no new migration to do it. The chunks live as :Entity nodes with the same embedding property the index already covers; the query just filters by entityType = KnowledgeChunk and the requested corpus instead of Memory:
MATCH (c:Entity)-[:HAS_FIELD {field: 'knowledge'}]->(kn:Entity)
WHERE c.orgId = $orgId AND c.entityType = $type
AND id(kn) IN $knowledgeIds AND c.embedding IS NOT NULL
WITH c, kn, vector.similarity.cosine(c.embedding, $vec) AS score
WHERE score IS NOT NULL
RETURN id(c) AS id, id(kn) AS knowledgeId, properties(c) AS props, score
ORDER BY score DESC LIMIT $topK
(Knowledge search scopes to the requested corpus first, then ranks with exact cosine — the shared HNSW index returns a global top-k across the whole org, which would let a large corpus crowd out a small one. That's a RAG-correctness story covered in RAG You Set Up by Dragging a PDF Into a Browser.)
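The scope-first, rank-second order can be mirrored in plain Java to see why it protects small corpora. This is a minimal in-memory sketch, assuming hypothetical Chunk and Scored records and a CorpusFirstSearch class. The cosine here is the raw value in [-1, 1], so only the ordering should be compared with what the database returns, not the absolute scores.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class CorpusFirstSearch {
    record Chunk(long id, long knowledgeId, float[] embedding) {}
    record Scored(long id, long knowledgeId, double score) {}

    /** Raw cosine similarity; NaN when either vector has zero norm. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return Double.NaN;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Filters to the requested corpora first, then ranks the survivors exactly. */
    static List<Scored> search(List<Chunk> chunks, Set<Long> knowledgeIds,
                               float[] query, int topK) {
        return chunks.stream()
            .filter(c -> knowledgeIds.contains(c.knowledgeId()))
            .filter(c -> c.embedding() != null)
            .map(c -> new Scored(c.id(), c.knowledgeId(), cosine(c.embedding(), query)))
            .filter(s -> !Double.isNaN(s.score()))
            .sorted(Comparator.comparingDouble(Scored::score).reversed())
            .limit(topK)
            .collect(Collectors.toList());
    }
}
```

If the order were reversed (global top-k first, corpus filter second), a small corpus whose best chunk is only a moderate match could return nothing at all, because every slot in the global top-k would already be taken by chunks from a larger corpus.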
The point for this post: one HNSW index, two embedded-content types, zero extra infrastructure for the second one. Add a third embedded entity type tomorrow and it inherits the same index for the cost of a WHERE entityType = ....
What this buys you
- One store to operate, back up, and reason about. Memory and your domain model share a transaction boundary and a tenant filter. No "the vector DB and the source of truth disagree" class of bug.
- No sync pipeline. The fact and its vector are written together, on the same node. There's nothing to keep eventually-consistent.
- Graceful degradation. Best-effort embedding means an embedding outage never blocks a write; the noop backend and substring fallback mean recall always returns something sensible.
- Cheap to extend. A new kind of semantically-searchable thing is a new entityType on an index you already have.
Takeaway
You do not need a separate vector memory database to give agents memory. If your system of record is a graph, the graph can hold the vectors too: embed on write, store the 768-d array directly on the node, and let Neo4j's native HNSW index serve recall. Keep the vector off your user-facing property pipeline so it never crosses the wire, scope every query to the tenant, and let one index serve every embedded content type you add. The result is persistent, semantic, multi-tenant agent memory with zero new boxes on the diagram.
Memory across channels — phone and chat, joined to one pool per contact — is the feature people actually feel. See how that joins up in Agents That Actually Remember You — Across Chat and Phone.
Build an agent that remembers. Spin up a workspace, attach a Knowledge corpus or just start a conversation, and watch the same HNSW index serve both. The memory internals live in org.au.memory.*; the architecture is in docs/ARCHITECTURE.md layer 7. Create a workspace and give your agent a memory with zero new infra.