Every AI developer hits the same wall at some point. You build an agent that can code, browse, execute tools, hold conversations. Then you restart it. And everything is gone. No memory of what you discussed yesterday. No context from last week's debugging session. No idea what preferences you set up.
It's like waking up with amnesia every single time.
The good news: the community has figured this out. What was once a research problem is now a solvable engineering challenge. Here's what's actually working in production right now.
The core problem: sessions start from zero
When you fire up Claude Code, a GPT-4-based assistant, or any local agent, it begins with a clean slate. It doesn't remember the bug you debugged yesterday, the architecture decision you made last week, or that you prefer Tailwind over Bootstrap. Every session is a fresh start.
This works fine for one-off queries. But the moment you want an agent to handle real work, whether multi-step projects, long-term workflows, or ongoing collaborations, the amnesia becomes a bottleneck.
The community has converged on a clear consensus: persistent memory is not optional anymore. It's a fundamental feature that separates toy agents from production systems.
What's actually working: memory architectures in the wild
One of the most compelling implementations surfaced recently. A developer built Memori, a Rust + SQLite solution specifically for giving coding agents permanent memory. The architecture is worth studying because it solves the right problems:
- Hybrid search: Combines FTS5 full-text search with cosine vector similarity, fused using Reciprocal Rank Fusion. Text queries auto-vectorize without manual flags.
- Auto-dedup: When cosine similarity exceeds 0.92 between same-type memories, it updates the existing entry instead of creating a duplicate.
- Decay scoring: Logarithmic access boost + exponential time decay with a ~69-day half-life. Frequently-used memories surface first; stale ones fade naturally.
- Built-in embeddings: Ships with fastembed's all-MiniLM-L6-v2 model. No external API calls, no vendor lock-in.
- One-step setup: A single command injects a behavioral snippet that teaches the agent when to store, search, and maintain its own memory.
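The hybrid-search fusion is worth making concrete. Below is a minimal Reciprocal Rank Fusion sketch, not Memori's actual code: it merges a full-text ranking and a vector ranking into one list, using the conventional damping constant k=60 from the original RRF paper.

```python
def rrf_fuse(text_ranked, vector_ranked, k=60):
    """Fuse two ranked result lists with Reciprocal Rank Fusion.

    Each input is a list of memory IDs ordered best-first. The constant k
    damps the influence of top ranks so a hit that appears high in BOTH
    lists outranks one that tops only a single list. Illustrative sketch,
    not Memori's implementation.
    """
    scores = {}
    for ranked in (text_ranked, vector_ranked):
        for rank, mem_id in enumerate(ranked, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks second in text search but first in vector search, so it wins.
fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])
```

The appeal of RRF is that it needs no score normalization between the two search modes; it only consumes ranks, which is why it works well for fusing FTS5 and cosine-similarity results.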
The performance numbers are striking: 43µs UUID lookups, 65µs text search on 1K memories, 18ms end-to-end for insert plus auto-embed. On an Apple M4 Pro, it handles 8,100 writes per second. This is not a proof-of-concept. This is production-ready infrastructure.
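The decay scoring described above is also simple to sketch. Assuming a decay rate of 0.01 per day, the half-life comes out to ln(2)/0.01, roughly 69 days, matching the figure quoted; the exact constants Memori uses are not published here, so treat these as illustrative.

```python
import math

def memory_score(access_count, age_days):
    """Decay score sketch: logarithmic access boost times exponential
    time decay. A rate of 0.01/day yields a ~69-day half-life, so a
    memory untouched for 69 days scores about half of a fresh one,
    while frequent access lifts it logarithmically (with diminishing
    returns). Constants are illustrative, not Memori's actual values."""
    boost = math.log1p(access_count)      # log(1 + accesses): heavy use helps, but sub-linearly
    decay = math.exp(-0.01 * age_days)    # exponential fade with age
    return boost * decay
```

Sorting retrieval candidates by this score is what makes frequently-used memories surface first while stale ones fade without ever being hard-deleted.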
The memory landscape: who's building what
The broader landscape reveals multiple approaches converging on similar patterns. A LangChain discussion laid out a comparison that's useful for orientation:
- Mem0 leans toward product applications with high speed and medium setup complexity.
- MemGPT targets complex agents where maximum control matters, accepting harder setup in return.
- OpenMemory optimizes for coding agents with fast performance and medium control.
Each targets different use cases, but they all share the same underlying insight: memory must be architecturally separated from the agent runtime. You should be able to restart the agent, swap the model, or migrate to a different host and retain what matters.
The multi-agent memory pattern
Another interesting example is the Violet project, which builds a multi-agent hierarchy on Claude Code. What makes it relevant here is the memory architecture:
- Shared memory files (STATE.md, DECISIONS.md, CORRECTIONS.md, HANDOFF.md) that agents read at session start and write at session end
- SQLite + FTS5 + vector embeddings for heavier semantic search
- A "knowledge kernel" that transfers compiled knowledge from cloud to local sessions
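The shared-file layer is the easiest part to reproduce. Here is a minimal sketch of the read-at-start, append-at-end pattern; the `memory/` directory location is an assumption, and only the four file names come from the Violet description.

```python
from pathlib import Path

MEMORY_DIR = Path("memory")  # hypothetical location for the shared files
FILES = ["STATE.md", "DECISIONS.md", "CORRECTIONS.md", "HANDOFF.md"]

def load_shared_memory():
    """Read the shared files at session start; missing files are skipped."""
    return {name: (MEMORY_DIR / name).read_text()
            for name in FILES if (MEMORY_DIR / name).exists()}

def append_decision(text):
    """Append at session end instead of rewriting, so the decision log
    stays append-only and diff-friendly across agents."""
    MEMORY_DIR.mkdir(exist_ok=True)
    with open(MEMORY_DIR / "DECISIONS.md", "a") as f:
        f.write(f"- {text}\n")
```

Because the files are plain Markdown, any agent (or human) can read and write them with no infrastructure at all, which is exactly why this makes a good first layer.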
The cloud-to-local transfer is particularly interesting. Cloud sessions (Claude) compile knowledge fragments into a local memory kernel. Local sessions (Ollama) load those fragments. Over time, the local model gets functionally smarter without retraining.
This pattern matters because it solves a real pain point: expensive API calls versus offline capability. You get the best of both worlds.
Why ChromaDB keeps coming up
Across the AI development community, ChromaDB appeared repeatedly as the recommended backend for long-term agent memory. The pattern is consistent:
- Vector embeddings stored for semantic recall
- Metadata fields for filtering and organization
- Persistence options that survive restarts
- Query capabilities that support both exact and approximate matching
It's not the only option, but it's become the default recommendation for teams that want something simpler than building their own vector pipeline but more capable than raw text files.
What your memory stack actually needs
Based on what's proven in production, here's what a solid agent memory implementation includes:
Short-term working memory: What the agent is actively processing right now. Usually in-memory, fast, ephemeral. Lost on restart, and that's fine.
Long-term semantic memory: Durable, queryable, indexed. This is what survives restarts. Typically vector embeddings + metadata, backed by SQLite or similar.
Event and decision logs: Append-only traces for auditing, replay, and incident analysis. When something goes wrong, you need to know what the agent was thinking.
Compaction and decay policies: Memory that grows forever becomes unusable. You need summarization, merging, and expiration of stale details.
Export and portability: Backups are useless if you can't restore them. Export formats, migration paths, and tested recovery procedures are not optional.
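The first four layers can be sketched in one small class using only the standard library. Names and the 90-day expiry are illustrative assumptions, not a spec; a real system would add search indexing and summarization on top.

```python
import json
import sqlite3
import time

class AgentMemory:
    """Sketch of the layer split described above. Illustrative only."""

    def __init__(self, db_path="agent_memory.db"):
        self.working = {}                   # short-term: in-memory, lost on restart (fine)
        self.db = sqlite3.connect(db_path)  # long-term + logs: survive restarts
        self.db.execute("CREATE TABLE IF NOT EXISTS memories "
                        "(id INTEGER PRIMARY KEY, kind TEXT, body TEXT, ts REAL)")
        self.db.execute("CREATE TABLE IF NOT EXISTS event_log "
                        "(id INTEGER PRIMARY KEY, event TEXT, ts REAL)")

    def remember(self, kind, body):
        """Durable long-term memory: queryable, survives restarts."""
        self.db.execute("INSERT INTO memories (kind, body, ts) VALUES (?, ?, ?)",
                        (kind, body, time.time()))
        self.db.commit()

    def log_event(self, payload):
        """Append-only trace for auditing and replay; never updated in place."""
        self.db.execute("INSERT INTO event_log (event, ts) VALUES (?, ?)",
                        (json.dumps(payload), time.time()))
        self.db.commit()

    def compact(self, max_age_days=90):
        """Crude expiration policy so memory doesn't grow forever."""
        cutoff = time.time() - max_age_days * 86400
        self.db.execute("DELETE FROM memories WHERE ts < ?", (cutoff,))
        self.db.commit()
```

Export and portability then come almost for free: the whole long-term store is a single SQLite file you can copy, back up, or migrate.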
The backup question nobody asks but everyone should
Here's what gets overlooked: memory backup is not just about storing the data. It's about tested recovery.
One pattern emphasized by experienced self-hosted operators: test your decryption keys and restore paths, not just your backup jobs. Backups without verified restore capability are decorative optimism. If you've never successfully restored from a backup, you don't have a backup strategy. You have a false sense of security.
For agent memory specifically, this means:
- Regular restore drills (monthly at minimum)
- Integrity verification after any restore
- Documented recovery procedures that don't require "tribal knowledge"
- Clear RTO and RPO targets for different memory classes
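For a SQLite-backed memory store, a basic restore drill can be automated with the standard library's online backup API plus an integrity check on the copy. The file paths here are illustrative; the point is that a backup only "counts" once this verification has passed.

```python
import sqlite3

def backup_and_verify(live_path, backup_path):
    """Restore-drill sketch: copy the live memory DB with SQLite's
    online backup API, then run an integrity check on the copy.
    Returns True only if the restored copy verifies cleanly."""
    src = sqlite3.connect(live_path)
    dst = sqlite3.connect(backup_path)
    with dst:
        src.backup(dst)  # consistent copy, safe even while the agent writes
    ok = dst.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    src.close()
    dst.close()
    return ok
```

Run something like this on a schedule and alert when it returns False; that single check converts "we have backups" into "we have tested recovery."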
The practical takeaway
If you're building AI agents and haven't solved persistence yet, start with one layer and expand:
- Start simple: Markdown files for decisions and state (Violet-style). No infrastructure, immediate value.
- Add search: SQLite with FTS5 for text search across past sessions.
- Layer vectors: Add semantic search when text search isn't enough.
- Automate maintenance: Compaction, decay, cleanup scripts that run periodically.
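Step two, in particular, is smaller than it sounds. This sketch builds an FTS5 index over past-session notes with nothing but the standard library, assuming the bundled SQLite was compiled with FTS5 (which is typical); the table and file names are my own.

```python
import sqlite3

db = sqlite3.connect("sessions.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS notes USING fts5(session, body)")
db.executemany("INSERT INTO notes VALUES (?, ?)", [
    ("2026-01-08", "Fixed the auth bug by clearing the stale JWT cache."),
    ("2026-01-09", "Decided on Tailwind over Bootstrap for the dashboard."),
])
db.commit()

# Ranked full-text query across all past sessions (matching is
# case-insensitive with the default tokenizer).
hits = db.execute(
    "SELECT session, body FROM notes WHERE notes MATCH ? ORDER BY rank",
    ("tailwind",),
).fetchall()
```

When keyword search like this stops being enough, you layer vectors on top (step three) rather than replacing it; hybrid setups like Memori's keep both.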
You don't need everything on day one. But you do need the architectural intention. The moment you treat memory as an afterthought, you guarantee that your agents will remain toys.
The direction things are heading
Persistent memory is becoming assumed rather than exceptional. Agents that forget between sessions will feel broken. Agents that remember, reason over their history, and improve over time will feel like the norm.
The competitive edge is shifting from "how smart is your model?" to "how well does your agent remember and build on what it knows?"
That's the practical reality of where we are in early 2026. The tools exist. The patterns are proven. The question is whether you're building memory in from the start or patching it in later after users have already left.