The frustration that started this
If you use Claude Code for real work, you’ve hit this wall: you’re deep in a session, you’ve made architectural decisions, debugged tricky issues, established patterns — and then context compaction happens. Claude summarizes your conversation to free up tokens, and suddenly it’s forgotten that you switched from MongoDB to PostgreSQL three hours ago.
You explain it again. It forgets again. Repeat.
I got tired of re-explaining my own codebase to my AI assistant. So I built Claude Cortex — a memory system that works like a brain, not a notepad.
What it actually does
Claude Cortex is an MCP server that gives Claude Code three types of memory:
- Short-term memory (STM) — session-level, high detail, decays within hours
- Long-term memory (LTM) — cross-session, consolidated from STM, persists for weeks
- Episodic memory — specific events: “when I tried X, Y happened”
The key insight: not everything is worth remembering. The system scores every piece of information for salience — how important it actually is:
"Remember that we're using PostgreSQL" → architecture decision → 0.9 salience
"Fixed the auth bug by clearing the token cache" → error resolution → 0.8 salience
"The current file has 200 lines" → temporary context → 0.2 salience (won't persist)
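The category-to-score mapping above can be sketched roughly like this. The category names, weights, and the 0.3 persistence threshold are illustrative assumptions, not the actual Cortex internals:

```typescript
// Illustrative salience scoring: each memory category maps to a base
// score, and low-salience items are never persisted. All names and
// numbers here are assumptions based on the examples in the text.
type Category = "architecture" | "error-resolution" | "temporary";

const BASE_SALIENCE: Record<Category, number> = {
  architecture: 0.9,        // "we're using PostgreSQL"
  "error-resolution": 0.8,  // "fixed the auth bug by clearing the token cache"
  temporary: 0.2,           // "the current file has 200 lines"
};

const PERSIST_THRESHOLD = 0.3; // assumed cutoff below which memories are dropped

function scoreSalience(category: Category): number {
  return BASE_SALIENCE[category];
}

function willPersist(category: Category): boolean {
  return scoreSalience(category) >= PERSIST_THRESHOLD;
}
```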
Memories also decay over time, just like human memory:
score = base_salience × (0.995 ^ hours_since_access)
But every time a memory is accessed, it gets reinforced by 1.2×. Frequently useful memories survive. One-off details fade away. This isn’t a key-value store — it’s a system that learns what matters.
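In code, the decay-plus-reinforcement scheme looks something like this sketch. The 0.995 and 1.2 constants come straight from the formulas above; the cap at 1.0 is my assumption to keep scores bounded:

```typescript
// Sketch of decay and reinforcement from the formulas in the text.
const DECAY_RATE = 0.995;     // multiplicative decay per hour since last access
const REINFORCE_FACTOR = 1.2; // applied each time a memory is accessed

// score = base_salience × (0.995 ^ hours_since_access)
function effectiveScore(baseSalience: number, hoursSinceAccess: number): number {
  return baseSalience * Math.pow(DECAY_RATE, hoursSinceAccess);
}

function reinforce(baseSalience: number): number {
  // Cap at 1.0 (an assumption) so frequently-accessed memories
  // don't grow without bound.
  return Math.min(1.0, baseSalience * REINFORCE_FACTOR);
}
```

A memory accessed every session keeps outpacing its decay; one untouched for days slides toward irrelevance.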
The compaction problem, solved
Here’s the specific workflow that used to drive me nuts:
Before Cortex:
Session starts → Work for 2 hours → Compaction happens →
Claude: "What database are you using?" → You: *screams internally*
After Cortex:
Session starts → Work for 2 hours → Compaction happens →
PreCompact hook auto-extracts 3-5 important memories →
Claude: "Let me check my memory..." →
Recalls: PostgreSQL, JWT auth, React frontend, modular architecture →
Continues working seamlessly
The PreCompact hook is the secret weapon. It runs automatically before every compaction event, scanning the conversation for decisions, error fixes, learnings, and architecture notes. No manual intervention needed.
v1.6.0: The intelligence overhaul
The first version was essentially CRUD-with-decay. It worked, but the subsystems were isolated — search didn’t improve linking, linking didn’t improve search, salience was set once and never evolved.
v1.6.0 was a seven-task overhaul to make everything feed back into everything else:
1. Semantic linking via embeddings
Previously, memories only linked if they shared tags. Now, two memories about PostgreSQL with completely different tags will still link — the system computes embedding similarity and creates connections at ≥0.6 cosine similarity.
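The linking rule reduces to cosine similarity over embedding vectors with the ≥0.6 threshold from the text. A minimal sketch (the embedding model itself, e.g. a transformers pipeline, is assumed to exist upstream):

```typescript
// Cosine similarity between two embedding vectors, and the linking
// decision at the 0.6 threshold described in the text.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const LINK_THRESHOLD = 0.6;

function shouldLink(embA: number[], embB: number[]): boolean {
  return cosineSimilarity(embA, embB) >= LINK_THRESHOLD;
}
```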
2. Search feedback loops
Every search now does three things:
- Returns results (obviously)
- Reinforces salience of returned memories (with diminishing returns)
- Creates links between co-returned results
Your search patterns literally shape the knowledge graph.
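A sketch of that feedback loop, with the diminishing-returns curve and field names as assumptions (the real reinforcement schedule may differ):

```typescript
// Sketch of the search feedback loop: each returned memory gets a
// salience bump that shrinks with every prior hit, and every pair of
// co-returned memories gets linked. The 0.05 base bump is illustrative.
interface Memory { id: number; salience: number; hits: number; }

function reinforceOnSearch(results: Memory[], links: Set<string>): void {
  for (const m of results) {
    // Diminishing returns: each successive hit adds less salience.
    m.salience = Math.min(1.0, m.salience + 0.05 / (1 + m.hits));
    m.hits += 1;
  }
  // Link every co-returned pair.
  for (let i = 0; i < results.length; i++) {
    for (let j = i + 1; j < results.length; j++) {
      links.add(`${results[i].id}-${results[j].id}`);
    }
  }
}
```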
3. Dynamic salience evolution
Salience isn’t static anymore. During consolidation:
- Hub memories (lots of links) get a logarithmic bonus
- Contradicted memories get a small penalty
- The system learns which memories are structurally important
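The consolidation-time adjustment might look like this; the 0.05 hub weight and 0.1 penalty are made-up numbers standing in for whatever the real tuning is:

```typescript
// Sketch of dynamic salience evolution: a logarithmic bonus for
// well-linked "hub" memories, a small penalty for contradicted ones.
// The 0.05 and 0.1 weights are illustrative assumptions.
function evolveSalience(
  salience: number,
  linkCount: number,
  isContradicted: boolean
): number {
  let s = salience + 0.05 * Math.log1p(linkCount); // bonus grows slowly with links
  if (isContradicted) s -= 0.1;                    // small contradiction penalty
  return Math.max(0, Math.min(1, s));              // clamp to [0, 1]
}
```

The logarithm is the point: a memory with 50 links shouldn't score 50× one with a single link, just noticeably higher.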
4. Contradiction surfacing
If you told Claude “use PostgreSQL” in January and “use MongoDB” in March, the system detects this and flags it:
⚠️ WARNING: Contradicts "Use PostgreSQL" (Memory #42)
No more silently holding conflicting information.
5. Memory enrichment
Memories accumulate context over time. If you search for “JWT auth” and your query contains information the memory doesn’t have, that information gets appended to the memory. Memories grow richer through use.
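In the simplest form, enrichment is a containment check followed by an append. This naive substring version is a sketch, not the actual matching logic:

```typescript
// Naive sketch of memory enrichment: append a detail from the query
// only if the memory doesn't already contain it. The real system
// presumably uses smarter matching than substring containment.
function enrich(memoryContent: string, queryDetail: string): string {
  return memoryContent.includes(queryDetail)
    ? memoryContent
    : `${memoryContent} ${queryDetail}`;
}
```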
6. Real consolidation
The old system just deduplicated exact matches. Now it clusters related STM memories and merges them into coherent LTM entries:
STM: "Set up JWT tokens with RS256 signing"
STM: "JWT tokens expire after 24 hours"
STM: "Added JWT verification middleware"
→ Consolidated LTM: "JWT authentication system using RS256 signing.
Tokens expire after 24 hours with 7-day refresh tokens.
Verification middleware on all protected routes."
Three noisy short-term memories become one structured long-term memory.
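Stripped to its skeleton, consolidation is cluster-then-merge. This sketch clusters by a shared tag for simplicity; the real system clusters by embedding similarity, and the merge here is plain concatenation rather than coherent rewriting:

```typescript
// Minimal sketch of STM → LTM consolidation: group short-term memories
// by a shared key (here a tag, standing in for embedding-based
// clustering) and merge each cluster into one long-term entry.
interface Stm { tag: string; content: string; }

function consolidate(stms: Stm[]): Map<string, string> {
  const ltm = new Map<string, string>();
  for (const m of stms) {
    const existing = ltm.get(m.tag);
    ltm.set(m.tag, existing ? `${existing} ${m.content}` : m.content);
  }
  return ltm;
}
```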
7. Activation weight tuning
Recently activated memories get a meaningful boost in search results. If you just looked at something, it’s more likely to be relevant again.
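As a rough sketch, recency can be a flat boost inside a window. Both the 30-minute window and the 0.2 weight are invented for illustration:

```typescript
// Sketch of activation weighting in ranking: memories activated
// recently get a boost. Window size and boost weight are assumptions.
const ACTIVATION_WINDOW_MINUTES = 30;
const ACTIVATION_BOOST = 0.2;

function rankScore(baseScore: number, minutesSinceActivation: number): number {
  const boost =
    minutesSinceActivation < ACTIVATION_WINDOW_MINUTES ? ACTIVATION_BOOST : 0;
  return baseScore + boost;
}
```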
Getting started
Install
npm install -g claude-cortex
Configure Claude Code
Create .mcp.json in your project (or ~/.claude/.mcp.json for global):
{
"mcpServers": {
"memory": {
"type": "stdio",
"command": "npx",
"args": ["-y", "claude-cortex"]
}
}
}
Set up the PreCompact hook
Add to ~/.claude/settings.json:
{
"hooks": {
"PreCompact": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "npx -y claude-cortex-hook pre-compact"
}
]
}
]
}
}
Restart Claude Code, approve the MCP server, and you’re done. Claude will start remembering things automatically.
Use it naturally
You don’t need to learn new commands. Just talk to Claude:
"Remember that we're using PostgreSQL for the database"
"What do you know about our auth setup?"
"Get the context for this project"
The system handles categorization, salience scoring, and storage behind the scenes.
The dashboard
There’s also an optional 3D brain visualization dashboard — because honestly, watching memories form as glowing nodes in a neural network is just cool.
npx claude-cortex service install # auto-start on login
It shows your memory graph in real-time via WebSocket, with search, filters, stats, and even a SQL console for poking at the database directly. Memories are color-coded: blue for architecture, purple for patterns, green for preferences, red for errors, yellow for learnings.
How it compares
Most MCP memory tools are flat key-value stores. You manually save and manually retrieve. Claude Cortex is different in a few ways:
- Salience detection — it decides what’s worth remembering, not you
- Temporal decay — old irrelevant stuff fades naturally
- STM → LTM consolidation — short-term memories get merged into long-term ones
- Semantic linking — memories form a knowledge graph, not a list
- PreCompact hook — survives Claude Code’s context compaction automatically
It’s not perfect. Embeddings add some latency. The consolidation heuristics are tuned for my workflows and might need adjustment for yours. The dashboard is a nice-to-have, not a must-have. But for the core problem — Claude forgetting things it shouldn’t forget — it works really well.
The stack
- TypeScript, compiled to ESM
- SQLite with FTS5 for full-text search
- @huggingface/transformers for local embeddings (v1.6.1 fixed ARM64 support)
- MCP protocol for Claude Code integration
- React + Three.js for the dashboard
- 56 passing tests, MIT licensed
Try it out
npm install -g claude-cortex
If you’re using Claude Code for anything beyond quick one-offs, give it a shot. The difference between an AI that remembers your project and one that doesn’t is night and day.
Stars and feedback welcome — this is a solo project and I’m iterating fast.