@onenomad/engram-mcp 1.0.0-beta.13 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,691 +1,685 @@
1
- # Engram
2
-
3
- A memory system for AI agents that actually works. LLMs can't remember anything between conversations by default, and the existing solutions are either too simple (just dump everything in a vector DB) or too expensive (send your entire history to an API every time). Engram sits in the middle. It runs locally, doesn't need an API key for basic operation, and scores **92% recall on LoCoMo** and **99% recall on LongMemEval** — both category-leading on the two most-cited memory benchmarks. That beats every open-source memory system I've tested against.
4
-
5
- The core idea is that memory shouldn't just be "find similar text." When someone asks "where was I working last March?" the system needs to actually reason about time, not just pattern match on the word "March." So the search pipeline combines vector similarity, keyword matching with IDF weighting, temporal inference, a knowledge graph, and spreading activation over a memory graph. Each piece handles a different kind of recall that the others miss.
6
-
7
- ## Table of Contents
8
-
9
- - [Benchmark Results](#benchmark-results)
10
- - [How It Works](#how-it-works)
11
- - [Compatibility](#compatibility)
12
- - [Installation](#installation) (Claude Code, Claude Desktop, Cursor/Windsurf/Cline, Source)
13
- - [Configuration](#configuration)
14
- - [Tools](#tools)
15
- - [Slash Commands](#slash-commands)
16
- - [Architecture](#architecture)
17
- - [Benchmarks](#benchmarks)
18
- - [Security](#security)
19
- - [Use Cases](#use-cases)
20
- - [Pairs Well With: Persona MCP](#pairs-well-with-persona-mcp)
21
- - [License](#license)
22
-
23
- ## Benchmark Results
24
-
25
- Tested against the [LoCoMo benchmark](https://github.com/snap-research/locomo) (1,986 QA pairs across 10 long conversations). No LLM reranking, just the retrieval pipeline on its own.
26
-
27
- ```
28
- Per-category:
29
- adversarial          R@5=89.7%   R@10=95.1%
30
- open-domain          R@5=88.8%   R@10=94.3%
31
- single-hop           R@5=78.0%   R@10=86.9%
32
- temporal             R@5=82.6%   R@10=91.6%
33
- temporal-inference   R@5=61.5%   R@10=74.0%
34
-
35
- OVERALL              R@5=85.1%   R@10=92.0%
36
- Embedding model: Xenova/all-MiniLM-L6-v2 (23MB, runs on CPU)
37
- ```
38
-
39
- For details on recent benchmark optimization work including regression fixes, sub-session chunking, and reranking analysis, see [docs/benchmark-optimization.md](docs/benchmark-optimization.md).
40
-
41
- For reference, here's how that stacks up against other memory systems on LoCoMo:
42
-
43
- | System | Score | Metric | Requires API | Notes |
44
- |--------|-------|--------|-------------|-------|
45
- | **Engram** | **92.0%** | **R@10** | **No** | **Local embeddings, sub-session chunking, no rerank** |
46
- | MemMachine v0.2 | 91.7% | LLM-judge | Yes | GPT-4.1-mini for extraction + judge |
47
- | Backboard | 90.1% | LLM-judge | Yes | GPT-4.1 judge |
48
- | MemPalace hybrid v5 | 88.9% | R@10 | No | Most direct comparison, same metric |
49
- | Zep Graphiti | 85.2% | LLM-judge | Yes | Graph-based retrieval |
50
- | Supermemory | 83.5% | R@10 | No | |
51
- | Letta | ~83.2% | LLM-judge | Yes | |
52
- | Zep (standard) | 75.1% | LLM-judge | Yes | |
53
- | Mem0 | 64-67% | LLM-judge | Yes | Cloud API |
54
- | OpenAI memory | 52.9% | LLM-judge | Yes | Built-in ChatGPT memory |
55
-
56
- A couple of things are worth noting about this table. Most published scores use LLM-as-judge accuracy (did the final answer match the ground truth?), which is a different metric from R@10 retrieval recall (is the right memory in the top 10 candidates?). So not every row is a direct apples-to-apples comparison. The closest one is MemPalace at 88.9% R@10 using the same methodology and dataset.
57
-
58
- The other thing that stands out is the API column. Most of the systems above require calls to GPT-4 or similar models for extraction, reranking, or both. Engram hits 92.0% using a 23MB local embedding model on CPU with zero API calls during retrieval. LLM reranking was tested and found to be [actively harmful](docs/benchmark-optimization.md#llm-reranking-analysis) for this pipeline.
59
-
60
- ## How It Works
61
-
62
- ### The Search Pipeline
63
-
64
- This is where the interesting stuff happens. A query goes through nine stages before results come back.
65
-
66
- **Stage 1: Signal Extraction.** Before any search happens, the query gets parsed for dates, entities (proper nouns), quoted phrases, and temporal language. If someone writes "What was Matt working on before he switched jobs in June?" the system extracts the date (June), the entity (Matt), and flags it as a temporal inference query because of the word "before."
67
-
68
- **Stage 2: Vector Search.** Standard ANN search against LanceDB using cosine distance on 384-dim embeddings. This handles the "find semantically similar stuff" part. Candidates need at least 0.25 similarity to make the cut.
69
-
70
- **Stage 3: IDF Keyword Scoring.** Rare terms in the query get weighted higher than common words. If you search for "Matt TypeScript" both terms will dominate scoring because they appear in relatively few memories. Proper nouns get an extra 1.5x boost. Results from this stage get blended with vector scores. The blend shifts toward keywords when entities are present, since names and specific nouns are better matched by exact text than by embedding similarity.
71
-
72
- **Stage 4: Bonus Factors.** Every candidate gets adjustments for recency, recall frequency, tier, importance, and cognitive layer. Procedural memories (rules about how you want things done) get a small boost because they tend to be more immediately useful.
73
-
74
- **Stage 5: Temporal Boost.** If the query mentions dates, memories get boosted based on whether they contain matching date strings or were created near the query date. Exact date matches in content get up to +0.4, timestamp proximity up to +0.3.
75
-
76
- **Stage 6: Time-Window Retrieval.** This is the big one for temporal inference. When the system detects temporal signals, it pulls memories from the relevant time period into the candidate pool regardless of semantic similarity. "Where was I working in March 2024?" needs memories *from* March 2024, not just memories that happen to mention the word "March." Window size adapts to date precision. A specific day gets +/- 3 days, a month gets the full month plus buffer, a year gets the full year. If the query says "before March," the window extends 90 days earlier.
77
-
78
- **Stage 7: Knowledge Graph Lookup.** When entities and time are both present, the system queries the knowledge graph for facts that were valid at the query time. If there's a triple like `(Matt, works-at, Acme Corp, valid 2024-01 to 2024-06)` and the query asks about Matt in March 2024, memories mentioning "Acme Corp" get boosted.
79
-
80
- **Stage 8: Spreading Activation.** Based on [Collins & Loftus (1975)](https://en.wikipedia.org/wiki/Spreading_activation). The top 5 scoring memories activate their graph neighbors, which in turn activate their neighbors. Two hops deep, with activation decaying at each hop. Temporal edges get a 1.5x multiplier when the query involves time reasoning.
81
-
82
- **Stage 9: Token Budget.** Results get sorted by score and trimmed to fit within a configurable token budget (default 1500 tokens). This prevents context bloat when injecting memories into prompts.
83
-
84
- ### Memory Tiers
85
-
86
- Memories flow through four tiers with automatic promotion and demotion, plus a fifth scratch tier that sits outside the lifecycle:
87
-
88
- ```
89
- scratch (24h, never promoted, manual promote only)
90
-
91
- daily (2 days) --> short-term (14 days) --> long-term (90 days) --> archive
92
-   ^                                                                    |
93
-   +-------------------- reactivation (if recalled) --------------------+
94
- ```
95
-
96
- Promotion isn't just about age. A memory moves to long-term if it's been recalled multiple times, has high importance, received "helpful" feedback, or is a procedural rule. Memories that keep getting recalled stay promoted. Memories that never get touched decay and eventually archive.
97
-
98
- **Scratch tier** is for exploratory, session-only notes that you may want to discard. Pass `tier: 'scratch'` to `memory_ingest` and the chunk is excluded from every consolidation path: no promotion, no merging, no decay-to-archive, no linking. After 24 hours scratch chunks are auto-purged. Use `memory_scratch_promote` to graduate one to short-term once you've decided it's worth keeping.
99
-
100
- ### Memory Origin
101
-
102
- Every chunk carries an `origin` tag that distinguishes user-asserted memory from auto-derived memory:
103
-
104
- - **`user`** — written explicitly via `memory_ingest`. Treated as canonical user-territory: the consolidator never auto-merges, near-duplicate-deletes, or archives these. Importance still decays normally, but the content and lifecycle stay sacred.
105
- - **`extracted`** — pulled from a conversation by `memory_extract` or the Mem0 provider.
106
- - **`imported`** — bulk-loaded via `memory_import`.
107
- - **`derived`** — produced by consolidation (e.g. episodic-to-semantic summaries).
108
-
109
- The split mirrors the journal pattern in [Persona](https://github.com/OneNomad-LLC/persona-mcp): a clean ownership boundary between what the user said and what the system inferred. If you want auto-extracted memories to lose to your hand-written ones in a near-duplicate fight, this is what makes that happen.
110
-
111
- Importance decays exponentially over time, but the rates differ by cognitive layer:
112
- - **Procedural** (rules): decays slowest (0.98/week, floor 0.15). Rules tend to stay relevant.
113
- - **Semantic** (facts): medium decay (0.97/week, floor 0.10)
114
- - **Episodic** (events): decays fastest (0.95/week, floor 0.05). Specific moments matter less over time.
115
-
116
- ### Cognitive Layers
117
-
118
- Every memory gets classified into one of three layers:
119
-
120
- - **Episodic** is for events tied to a specific moment. "User debugged a schema migration and it took most of the session."
121
- - **Semantic** is for durable facts. "User prefers TypeScript over Python." "User's dog is named Ellie."
122
- - **Procedural** is for behavioral rules about how the user wants things done. "Always show code before explanation." "Never use em-dashes."
123
-
124
- The system can extract these from conversations using LLM-powered classification or, if no API key is available, a set of heuristic patterns. The heuristics catch things like "I always prefer X" (procedural), "I work at Y" (semantic fact), and "no, don't do that" (correction/procedural).
125
-
126
- ### Procedural Rules
127
-
128
- Learned from user corrections and direct instructions. Each rule has a confidence score that shifts with evidence:
129
-
130
- - Reinforcement (user repeats or confirms the rule): confidence +0.1
131
- - Contradiction (user does the opposite): confidence -0.2
132
-
133
- The asymmetry is intentional. Contradictions should weigh more because they often mean the rule was wrong. Rules that hit zero confidence get pruned.
134
-
135
- ### Knowledge Graph
136
-
137
- Entity-relationship triples with temporal validity. Each triple records when a fact became true and optionally when it stopped being true.
138
-
139
- ```
140
- ("Matt", "works-at", "Acme Corp", validFrom: 2024-01, validTo: 2024-06)
141
- ("Matt", "works-at", "NewCo", validFrom: 2024-06, validTo: null)
142
- ("finch-core", "uses", "TypeScript", validFrom: 2025-01, validTo: null)
143
- ```
144
-
145
- When a fact changes, the old triple gets invalidated (marked with an end date) and a new one gets created. The full history is preserved so the system can answer questions about the past. Adding a triple that already exists just bumps its confidence score.
146
-
147
- ### Reconsolidation
148
-
149
- Borrowed from neuroscience. When a memory gets recalled during a relevant conversation and marked as helpful, the system can update it with new context. A memory like "User prefers TypeScript" might get refined to "User prefers TypeScript for large projects but uses Python for quick scripts" if that nuance comes up in conversation.
150
-
151
- This only triggers if the memory hasn't been reconsolidated in the last 24 hours (to prevent over-updating) and requires an LLM API key.
152
-
153
- ### Recall Outcomes
154
-
155
- A feedback loop that lets the system learn which memories are actually useful:
156
-
157
- - **Helpful**: importance +0.05, triggers reconsolidation, strengthens graph edges to co-recalled memories
158
- - **Corrected**: importance -0.10 (memory was wrong)
159
- - **Irrelevant**: importance -0.05
160
-
161
- If a memory gets marked irrelevant 3+ times out of the last 5 recalls, its importance drops sharply and it may get archived.
162
-
163
- ### Knowledge Graph Auto-Population
164
-
165
- When a memory is ingested, the system heuristically extracts entity-relationship triples and adds them to the knowledge graph automatically. It detects 12 relationship types, including `works-at`, `uses`, `depends-on`, `prefers`, `chose`, and `located-in`. This means the knowledge graph grows passively as memories accumulate, without needing explicit `memory_kg_add` calls for every fact.
166
-
167
- ### Governance Middleware
168
-
169
- Advisory checks that flag potential issues without auto-deleting anything:
170
-
171
- - **Contradiction detection**: Combines vector similarity, keyword heuristics, and optional LLM analysis to find memories that conflict with each other. Flags them for review.
172
- - **Semantic drift monitoring**: Tracks how the memory store's content distribution shifts over time. Alerts if the store is drifting significantly from its historical baseline.
173
- - **Memory poisoning checks**: Detects patterns that suggest adversarial injection — unusual embedding distributions, suspiciously high importance scores, or content that doesn't fit the user's established patterns.
174
-
175
- ### Adaptive Forgetting
176
-
177
- Inspired by [FadeMem (Jan 2026)](https://arxiv.org/abs/2501.xxxxx). Standard FSRS decay is purely time-based, but real memory doesn't work that way. A fact that's semantically close to things you keep recalling should decay slower than an isolated fact you never revisit.
178
-
179
- Adaptive forgetting modulates the decay rate based on semantic proximity to recently recalled memories. If a memory's nearest neighbors are getting recalled, it decays slower. If nothing nearby is ever accessed, it fades faster. This reduces storage without losing contextually relevant information.
180
-
181
- ### Self-Organizing Memories
182
-
183
- During consolidation, the system does two passes of housekeeping beyond decay and promotion. First, any memory missing a short description gets one auto-generated from its content, which makes it easier to surface in summaries and the knowledge graph. Second, the consolidator scans for semantically related memories that aren't yet linked and generates cross-links between them, so spreading activation has more edges to traverse the next time a query comes in. The graph densifies passively as the store grows.
184
-
185
- ### Duplicate Detection and Merging
186
-
187
- New memories get checked against existing ones using Jaccard similarity on word sets (threshold 0.75). If a duplicate is found, it doesn't get stored.
188
-
189
- During consolidation, the system also scans for near-duplicates using cosine similarity on embeddings (threshold 0.9). When found, the higher-importance memory absorbs the other's recall count and the duplicate gets deleted.
190
-
191
- ### Handoff Protocol
192
-
193
- Context compaction is irreversible, and if the window fills completely before compaction runs the user has to abandon the chat. Engram treats this as a first-class failure mode and ships three tools that mechanize the fix:
194
-
195
- - `memory_handoff_write` persists a structured "where we left off" snapshot to `handoffs/YYYY-MM-DD_HH-MM-SS.{json,md}` — currentTask, completed, nextSteps, openQuestions, file references, decisions, and free-form notes. The JSON half is for programmatic resume; the markdown half is for humans.
196
- - `memory_handoff_read` loads the latest handoff (or a specific one by stamp). Agents call it at session start to pick up from exactly where the previous session stopped.
197
- - `memory_context_pressure` is a self-nudge: the agent reports its own pressure level (`ok`/`warm`/`hot`/`critical`) and gets back a deterministic action plan — when to save, when to write the handoff, when to compact early rather than riding the window to the edge. Passing `phaseBoundary=true` (task complete, pivoting focus, finishing a subsystem) overrides level and forces a proactive compact; the reasoning is that pivots thrash Anthropic's 5-minute prompt cache anyway, so eating that miss at the boundary is effectively free and avoids carrying the verbose tool output of the finished phase into the next one.
198
-
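- For illustration, a snapshot might look like this (field names from the list above; the values are invented and the exact schema may differ):
-
- ```typescript
- const handoff = {
-   currentTask: "Wire sub-session chunking into the search pipeline",
-   completed: ["Chunker implemented", "Synthetic bench passing"],
-   nextSteps: ["Re-run LoCoMo", "Update BENCHMARKS.md"],
-   openQuestions: ["Does chunk overlap help temporal-inference recall?"],
-   fileRefs: ["src/search/chunker.ts"],
-   decisions: ["Keep LLM rerank off by default"],
-   notes: "Benchmarks were run from a clean data dir.",
- };
- ```
-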
199
- The bundled `engram_precompact_hook.sh` makes the write mandatory: it **blocks** compaction until `memory_handoff_write` has been called with `reason=compact`. Save constantly, compact at natural phase boundaries, and the next session starts with a full picture regardless of what happened in the previous one.
200
-
201
- ## Compatibility
202
-
203
- Engram is an MCP (Model Context Protocol) server. It works with any client that supports the MCP standard. That includes:
204
-
205
- - **Claude Code** (Anthropic's CLI and desktop app)
206
- - **Claude.ai** (via MCP server configuration)
207
- - **Cursor** (AI code editor)
208
- - **Windsurf** (AI code editor)
209
- - **Cline** (VS Code extension)
210
- - **Continue** (VS Code / JetBrains extension)
211
- - **Any MCP-compatible client** (the protocol is open and standardized)
212
-
213
- If your tool can connect to an MCP server over stdio, Engram will work with it.
214
-
215
- ## Installation
216
-
217
- ### Claude Code
218
-
219
- ```bash
220
- claude mcp add engram -- npx @onenomad/engram-mcp
221
- ```
222
-
223
- ### Claude Desktop
224
-
225
- Add to your Claude Desktop config file. On macOS it's at `~/Library/Application Support/Claude/claude_desktop_config.json`, on Windows at `%APPDATA%\Claude\claude_desktop_config.json`:
226
-
227
- ```json
228
- {
229
- "mcpServers": {
230
- "engram": {
231
- "command": "npx",
232
- "args": ["@onenomad/engram-mcp"]
233
- }
234
- }
235
- }
236
- ```
237
-
238
- Restart Claude Desktop after saving.
239
-
240
- ### Any MCP Client (Cursor, Windsurf, Cline, etc.)
241
-
242
- Add to your client's MCP config:
243
-
244
- ```json
245
- {
246
- "mcpServers": {
247
- "engram": {
248
- "command": "npx",
249
- "args": ["@onenomad/engram-mcp"]
250
- }
251
- }
252
- }
253
- ```
254
-
255
- ### From Source
256
-
257
- ```bash
258
- git clone https://github.com/OneNomad-LLC/engram-mcp.git
259
- cd engram-mcp
260
- npm install
261
- npm run build
262
- ```
263
-
264
- Then point your MCP client at `dist/server.js`:
265
-
266
- ```json
267
- {
268
- "mcpServers": {
269
- "engram": {
270
- "command": "node",
271
- "args": ["/path/to/engram/dist/server.js"]
272
- }
273
- }
274
- }
275
- ```
276
-
277
- ## Configuration
278
-
279
- ### Environment Variables
280
-
281
- | Variable | Default | Description |
282
- |----------|---------|-------------|
283
- | `OPENROUTER_API_KEY` | (none) | Enables LLM extraction and reranking via [OpenRouter](https://openrouter.ai). Pick any model provider you want. Without it, the system uses heuristic extraction and keyword/vector search only. |
284
- | `MEM0_API_KEY` | (none) | Enables Mem0 cloud extraction as a second opinion |
285
- | `ENGRAM_DATA_DIR` | `~/.claude/engram` | Where data gets stored |
286
- | `ENGRAM_EMBEDDING_MODEL` | `Xenova/all-MiniLM-L6-v2` | HuggingFace model for embeddings |
287
- | `ENGRAM_DEVICE` | `cpu` | Embedding device: `cpu`, `dml` (DirectML), or `cuda` |
288
- | `ENGRAM_MODEL` | `anthropic/claude-haiku-4.5` | OpenRouter model ID for LLM features. Only used when `OPENROUTER_API_KEY` is set. Any model on [openrouter.ai](https://openrouter.ai) works. |
289
- | `STORAGE_BACKEND` | `file` | Storage backend: `file` (LanceDB + filesystem, default), `postgres` (self-hosted multi-tenant), or `cloud` (Pyre Cloud Pro). See below. |
290
- | `DATABASE_URL` | (none) | Postgres connection string. Required when `STORAGE_BACKEND=postgres`. |
291
- | `TENANT_ID` | (none) | Tenant identifier — every row in postgres is scoped by this. Required when `STORAGE_BACKEND=postgres`. |
292
- | `PYRE_API_URL` | (none) | pyre-web server URL for `engram-mcp login`. Alternative to the positional arg or `--server` flag — one of the three is required. |
293
- | `PYRE_API_KEY` | (none) | Pyre Cloud API key. Overrides the field from `~/.pyre/credentials.json` when set. |
294
- | `PYRE_CREDENTIALS_FILE` | `~/.pyre/credentials.json` | Override the credentials-file path (CI / headless installs). |
295
-
296
- ### Hosted (Pyre Cloud)
297
-
298
- For Pyre Cloud Pro users:
299
-
300
- ```bash
301
- npm install -g @onenomad/engram-mcp
302
- engram-mcp login https://getpyre.ai
303
- ```
304
-
305
- `login` requires the pyre-web server URL. The binary ships with no hardcoded default — you point at whichever Pyre instance you're using (prod, staging, your own deployment). Three equivalent ways to supply it:
306
-
307
- ```bash
308
- engram-mcp login https://getpyre.ai # positional argument
309
- engram-mcp login --server https://getpyre.ai # flag
310
- PYRE_API_URL=https://getpyre.ai engram-mcp login # env var
311
- ```
312
-
313
- `login` opens that URL in your browser, shows you a one-time pairing code, and waits for you to approve the device. On approval it writes `~/.pyre/credentials.json` (mode 0600) using the canonical `api_url` from the server's response — which may differ from the login URL you typed if the server normalizes or redirects. From that point on Engram automatically routes through your cloud Engram instance. Local data stays local; nothing changes for users who don't run `login`.
314
-
315
- ```
316
- $ engram-mcp login https://getpyre.ai
317
- Open this URL in your browser to authorize:
318
-
319
- https://getpyre.ai/connect
320
-
321
- Enter this code when prompted: PYRE-7K4M-9N2X
322
- (waiting for approval — Ctrl+C to cancel)
323
- Logged in. Credentials saved to ~/.pyre/credentials.json.
324
- ```
325
-
326
- To sign out:
327
-
328
- ```bash
329
- engram-mcp logout
330
- ```
331
-
332
- This deletes `~/.pyre/credentials.json` and reverts Engram to local file mode on the next run. Idempotent — running it when you're already logged out exits 0.
333
-
334
- **Where credentials live**
335
-
336
- Credentials are stored at `~/.pyre/credentials.json` with mode `0600` (readable by you only). The file is a flat JSON object with `api_url`, `api_key`, `label`, `scopes`, and `issued_at`. Override the location with `PYRE_CREDENTIALS_FILE` if you have a multi-user setup.
337
-
338
- **Headless / CI installs**
339
-
340
- There's no terminal to open a browser from in CI. Skip `login` and set the env vars directly:
341
-
342
- ```bash
343
- export STORAGE_BACKEND=cloud
344
- export PYRE_API_URL=https://getpyre.ai
345
- export PYRE_API_KEY=sk_pyre_xxx
346
- ```
347
-
348
- When `STORAGE_BACKEND` is unset, Engram probes for `~/.pyre/credentials.json` and uses cloud mode if it finds one. Explicit env vars always win.
349
-
350
- The existing `STORAGE_BACKEND=postgres` self-host path (below) is unaffected — none of this changes anything for users running their own postgres instance.
351
-
352
- ### Cloud / multi-tenant mode
353
-
354
- By default Engram stores everything locally under `ENGRAM_DATA_DIR` (LanceDB tables for chunks/daily_logs/rules/knowledge_triples, plus markdown files for diary and handoffs). For a single user on a single machine this is the right answer — fast, offline, zero dependencies.
355
-
356
- For shared/cloud deployments where many users share one Engram process, Engram also speaks postgres with [pgvector](https://github.com/pgvector/pgvector).
357
-
358
- 1. Provision a postgres database with the `vector` extension available.
359
- 2. Set environment variables:
360
-
361
- ```bash
362
- export STORAGE_BACKEND=postgres
363
- export DATABASE_URL=postgres://user:pass@host:5432/engram
364
- export TENANT_ID=<one-id-per-user>
365
- ```
366
-
367
- 3. Install the postgres driver (it's an `optionalDependency`, so file-mode users don't pull it in):
368
-
369
- ```bash
370
- npm install pg
371
- ```
372
-
373
- 4. Run the schema migrations against the database once:
374
-
375
- ```bash
376
- DATABASE_URL=postgres://... npx engram-migrate
377
- ```
378
-
379
- This creates the six tables (`chunks`, `daily_logs`, `rules`, `knowledge_triples`, `diary_entries`, `handoffs`), enables the `vector` extension, and adds the hot-path indexes (per-tenant created_at, ivfflat on chunks.embedding, etc.). The runner is idempotent — re-running is a no-op for already-applied files.
380
-
381
- 5. Boot Engram normally. Every query is scoped by `TENANT_ID`; switching tenants is just a different env var on a different process.
382
-
383
- **Notes**
384
-
385
- - pgvector required (`CREATE EXTENSION vector;`). The migration runs this for you when your DB role has the privileges; otherwise create it manually first.
386
- - Embedding dimension is 384 by default (matches the local `Xenova/all-MiniLM-L6-v2` model). If you change `ENGRAM_EMBEDDING_MODEL` to one with a different dimensionality, edit `migrations/postgres/001_init.sql` before running migrations.
387
- - Local file mode and postgres mode are **not** wire-compatible — there's no auto-import. If you're migrating an existing local install to the cloud, re-ingest is the path. Diary and handoffs in particular store different on-disk formats (markdown files vs. jsonb rows).
388
- - Single user, single machine: stay on `file`. The postgres path exists for the hosted Pyre deployment and similar shared infra.
389
-
390
- ## Tools
391
-
392
- The MCP server exposes 20 tools across six groups. Several earlier tools (`memory_format`, `memory_check_duplicate`, `memory_extract_rules`, `memory_taxonomy`, `memory_kg_stats`) were folded into their parent tools in 1.0.0-beta.6 — pass the relevant flag or mode to the parent instead. 1.0.0-beta.8 added the Handoff tools for cross-session continuity. 1.0.0 adds the memory origin field (user vs derived), the scratch tier, and `memory_scratch_promote`.
393
-
394
- ### Core Memory
395
-
396
- | Tool | What it does |
397
- |------|-------------|
398
- | `memory_search` | Hybrid ANN + keyword search with spreading activation. Supports a formatted output mode for prompt injection (replaces the old `memory_format`). |
399
- | `memory_ingest` | Write-ahead log: immediately persist a memory before responding. Runs duplicate detection inline (replaces `memory_check_duplicate`). Defaults `origin='user'` since explicit ingest is user-asserted; pass `tier: 'scratch'` for session-only notes. |
400
- | `memory_scratch_promote` | Graduate a scratch-tier memory to short-term so it survives the 24h auto-purge and enters the normal consolidation lifecycle. |
401
- | `memory_extract` | Extract memories from a conversation (LLM or heuristic). Rules-only mode replaces the old `memory_extract_rules`. |
402
- | `memory_maintain` | Run consolidation (decay, promote, link, merge, self-organize). Auto-describes unnamed memories, generates cross-links, and syncs the Persona procedural bridge when both servers are running. |
403
- | `memory_rules` | Show active procedural rules |
404
- | `memory_outcome` | Record recall feedback (helpful/corrected/irrelevant) |
405
- | `memory_session` | Manage session state (hot RAM scratchpad) |
406
- | `memory_stats` | Memory statistics by tier, layer, type. Includes KG stats, domain/topic taxonomy, and Persona bridge status (replaces `memory_kg_stats` and `memory_taxonomy`). |
407
-
408
- ### Knowledge Graph
409
-
410
- | Tool | What it does |
411
- |------|-------------|
412
- | `memory_kg_add` | Add a subject-predicate-object triple |
413
- | `memory_kg_query` | Query triples with optional filters |
414
- | `memory_kg_invalidate` | Mark a fact as no longer valid |
415
- | `memory_kg_timeline` | Get chronological history of an entity |
416
-
417
- ### Diary
418
-
419
- | Tool | What it does |
420
- |------|-------------|
421
- | `memory_diary_write` | Write a session diary entry |
422
- | `memory_diary_read` | Read diary entries by date or range |
423
-
424
- ### Handoff (cross-session continuity)
425
-
426
- | Tool | What it does |
427
- |------|-------------|
428
- | `memory_handoff_write` | Structured "where we left off" snapshot — currentTask, completed, nextSteps, openQuestions, fileRefs, decisions, notes. Written before compaction or session end so a fresh session can resume without re-explanation. |
429
- | `memory_handoff_read` | Load the latest handoff (or one by stamp; `list=true` for recent stamps). Call at session start to pick up where the prior session left off. |
430
- | `memory_context_pressure` | Self-assess context window pressure (`ok`/`warm`/`hot`/`critical`) and receive a deterministic action plan — when to save memories, when to write a handoff, when to invoke `/compact`. Pass `phaseBoundary=true` at natural task/phase boundaries to force a proactive compact regardless of level (pivots thrash the cache anyway — compacting at the boundary is a free lunch). |
431
-
432
- ### Governance
433
-
434
- | Tool | What it does |
435
- |------|-------------|
436
- | `memory_govern` | Run governance checks: contradiction detection (vector + heuristic + LLM), semantic drift monitoring, and memory poisoning detection. All advisory — flags issues without auto-deleting. |
437
-
438
- ### Import
439
-
440
- | Tool | What it does |
441
- |------|-------------|
442
- | `memory_import` | Bulk import from Claude Code JSONL, ChatGPT JSON, or plain text |
443
-
444
- ## Slash Commands
445
-
446
- These work in any MCP-compatible client (Claude Code, Cursor, etc.). The MCP server advertises them in its instructions so the agent knows how to handle them. SKILL.md files are also included for platforms that discover skills from the filesystem.
447
-
448
- | Command | What it does |
449
- |---------|-------------|
450
- | `/memory-source <engram\|off\|hybrid>` | Switch memory backend. "engram" uses Engram exclusively, "off" disables all persistent memory, "hybrid" runs Engram alongside native client memory. |
451
- | `/recall <query>` | Search memories using the full hybrid pipeline (vector + keyword + temporal + KG + spreading activation). Results presented conversationally. |
452
- | `/forget <what>` | Find and remove or correct specific memories. Shows matches and confirms before acting. |
453
- | `/memory-health [maintain]` | Show memory system stats (tiers, layers, rules, KG size). With "maintain", runs the full consolidation cycle. |
454
- | `/memory-api <key>` | Set or update the OpenRouter API key that unlocks LLM extraction, reranking, and procedural-rule learning. |
455
- | `/knowledge <subcommand>` | Knowledge graph operations. Subcommands: `timeline <entity>`, `about <entity>`, `add <s> <p> <o>`, `correct <s> <p>`, `stats`. |
456
- | `/memory <subcommand>` | Quick ops. Subcommands: `save <content>`, `diary [date]`, `diary write <entry>`, `import <source>`, `rules`, `session [show\|clear]`. |
457
-
458
- ### Installing Slash Commands for Claude Code
459
-
460
- The slash commands above are advertised in Engram's MCP server instructions and work automatically in most clients. For Claude Code specifically, you can also install them as custom commands so they show up in the `/` command menu:
461
-
462
- ```bash
463
- # From the engram directory
464
- bash install-commands.sh
465
-
466
- # To overwrite existing commands
467
- bash install-commands.sh --force
468
- ```
469
-
470
- This copies command files to `~/.claude/commands/` where Claude Code picks them up globally. After installing, type `/` in Claude Code to see them in the command list.
471
-
472
- ## Architecture
473
-
474
- ```
475
- Conversations --> Extract --> LanceDB (vectors + metadata)
476
-        |                            |
477
-   KG Auto-Populate     +----------+----------+
478
-   (12 rel types)       |          |          |
479
-                   Vector ANN  IDF Keywords  Time Windows
480
-                        |          |          |
481
-                        +----+-----+-----+----+
482
-                             |           |
483
-                        KG Temporal   Spreading
484
-                          Lookup      Activation
485
-                             |           |
486
-                             +-----+-----+
487
-                                   |
488
-                             Score + Rank
489
-                                   |
490
-                           Token Budget Cap
491
-                                   |
492
-                          Governance Checks
493
-                          (advisory, async)
494
-                                   |
495
-                          Format for Prompt
496
-
497
-                         Adaptive Forgetting
498
-                (semantic proximity modulates decay)
499
-                                   |
500
-                Persona Bridge <--> Procedural Rules
501
-                (emotion-weighted    (confidence-scored,
502
-                 importance,          learned from
503
-                 cognitive load)      corrections)
504
- ```
505
-
506
- ### Data Storage
507
-
508
- Everything lives locally:
509
-
510
- ```
511
- ~/.claude/engram/
512
- ├── SESSION-STATE.md # Hot RAM scratchpad
513
- ├── diary/ # Daily diary entries
514
- │ └── YYYY-MM-DD.md
515
- ├── handoffs/ # Cross-session "where we left off" snapshots
516
- │ ├── YYYY-MM-DD_HH-MM-SS.json
517
- │ └── YYYY-MM-DD_HH-MM-SS.md
518
- └── lance/ # LanceDB tables
519
- ├── chunks.lance/ # Memory chunks with embeddings
520
- ├── daily_logs.lance/ # Extraction logs
521
- ├── rules.lance/ # Procedural rules
522
- └── knowledge_triples.lance/
523
- ```
524
-
525
- ### Dependencies
526
-
527
- - **LanceDB** for the embedded vector database, handles ANN search natively
528
- - **@huggingface/transformers** for local embedding inference (Xenova/all-MiniLM-L6-v2, 384 dimensions, 23MB)
529
- - **openai** (optional) for LLM-powered extraction and reranking via OpenRouter
530
- - **mem0ai** (optional) for Mem0 cloud extraction
531
- - **@modelcontextprotocol/sdk** for the MCP server protocol
532
-
533
- ## Benchmarks
534
-
535
- Clone the repo, install, fetch the public datasets, run the whole suite:
536
-
537
- ```bash
538
- git clone https://github.com/OneNomad-LLC/engram-mcp.git
539
- cd engram-mcp
540
- npm install
541
- bash benchmarks/download-datasets.sh
542
- npm run bench:all
543
- ```
544
-
545
- That's it. Every benchmark writes a JSON result file into `benchmarks/results/` and a consolidated table prints at the end. Missing datasets get skipped, not failed — partial runs are valid. No API keys are required for the default configuration.
546
-
547
- For full methodology, dataset citations, and reproducibility steps, see [BENCHMARKS.md](BENCHMARKS.md).
548
-
549
- ### Our scores at HEAD
550
-
551
- | Benchmark | Metric | Score | Hardware | Notes |
552
- |-----------|-------|------|----------|-------|
553
- | LoCoMo (1,986 QA) | R@10 | **92.0%** | M-series laptop | Zero-API, sub-session chunking |
554
- | LoCoMo (1,986 QA) | R@5 | **85.1%** | M-series laptop | Zero-API |
555
- | LongMemEval (500 Q) | R@5 | **99.0%** | M-series laptop | Zero-API |
556
- | Engram synthetic | R@5 | TODO capture | — | Internal regression battery |
557
- | Ingest throughput (cold) | chunks/sec | TODO capture | — | File backend, KG extraction off |
558
- | Ingest throughput (warm) | chunks/sec | TODO capture | — | File backend, 10k chunks preloaded |
559
- | Query latency (medium, 10k corpus) | p50 / p99 | TODO capture | — | Top-K=10, single thread |
560
-
561
- Run the suite yourself — the `results.json` file captures the exact config, embedding model, commit hash, and per-category breakdown for verification.
562
-
563
- ### LoCoMo
564
-
565
- [Snap Research's LoCoMo](https://github.com/snap-research/locomo) — 1,986 multi-hop QA pairs across 10 long synthetic conversations. We score Recall@5 and Recall@10 with the full hybrid retrieval pipeline. A retrieved session counts as a hit if it contains any of the evidence dialog IDs for the question.
566
-
567
- Categories: `single-hop`, `temporal`, `temporal-inference`, `open-domain`, `adversarial`.
568
-
569
- ```bash
570
- npm run bench:locomo # full run
571
- npm run bench:locomo -- --limit 200 # quick subset
572
- npm run bench:locomo -- --rerank # with LLM rerank (needs OPENROUTER_API_KEY)
573
- npm run bench:locomo -- --verbose
574
- ```
575
-
576
- Runtime: ~3–5 min on an M-series Mac. Paper: [Maharana et al., 2024](https://arxiv.org/abs/2402.17753).
577
-
578
- ### LongMemEval
579
-
580
- [LongMemEval](https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned) — 500 questions across six types, ~53 candidate sessions per question. We score Recall@5 / @10 and NDCG@5 / @10. Binary recall — at least one answer session in the top K.
581
-
582
- ```bash
583
- npm run bench:longmemeval
584
- npm run bench:longmemeval -- --limit 50
585
- npm run bench:longmemeval -- --rerank
586
- ```
587
-
588
- Runtime: ~6–10 min for the full 500 on an M-series Mac. The dataset is ~277 MB. Paper: [Wu et al., 2024](https://arxiv.org/abs/2410.10813).
589
-
590
- ### Engram synthetic suite (`bench`)
591
-
592
- A self-contained 15-question battery covering single-fact recall, preferences, temporal reasoning, knowledge updates, and adversarial / distractor resistance. No dataset download. Exits non-zero when R@5 drops below 70% — used as the pre-merge regression gate.
593
-
594
- ```bash
595
- npm run bench
596
- npm run bench:verbose
597
- ```
598
-
599
- Runtime: ~30 sec. Self-contained — runs on a clean clone with no flags.
600
-
601
- ### Ingest throughput
602
-
603
- Pushes N synthetic chunks (default 10,000) through `wal.ingest()` and reports chunks/sec. Two modes: `cold` (fresh data dir) and `warm` (10k chunks pre-loaded). The bench waits for background side-effects to drain before stopping the clock, so the number is "fully persisted" not "queued." KG extraction is skipped to keep the bench API-key-free.
604
-
605
- ```bash
606
- npm run bench:throughput
607
- npm run bench:throughput -- --chunks 5000 --mode warm
608
- ```
609
-
610
- Runtime: ~1–3 min at default settings.
611
-
612
- ### Query latency
613
-
614
- Loads N synthetic chunks (default 10,000), runs M queries (default 1,000) sequentially, and reports p50 / p95 / p99 latency per query bucket (`short` keyword queries, `medium` single-sentence questions, `long` multi-clause questions). Wall-clock is measured around the full `search()` call — the same path `memory_search` hits at the MCP boundary.
615
-
616
- ```bash
617
- npm run bench:latency
618
- npm run bench:latency -- --chunks 5000 --queries 500
619
- npm run bench:latency -- --topk 5
620
- ```
621
-
622
- Runtime: ~2–4 min at default settings.
623
-
624
- ## Security
625
-
626
- ### Network calls
627
-
628
- In its default configuration, this plugin contacts at most two services:
629
-
630
- 1. **HuggingFace Hub** for a one-time model download on first run (~23MB), cached after that
631
- 2. **Mem0 API**, only when `extractionProvider` is `mem0` or `both`
632
-
633
- If you set `OPENROUTER_API_KEY`, it contacts the OpenRouter API for LLM features (you pick the model provider). Without any API keys, everything runs fully local.
634
-
635
- No telemetry. No analytics. No phoning home.
636
-
637
- ### Local storage
638
-
639
- All memory data stays on disk at `~/.claude/engram/`. Nothing gets sent anywhere unless you explicitly configure an external provider.
640
-
641
- ## Use Cases
642
-
643
- Here are some real situations where this makes a difference.
644
-
645
- **Personal AI assistant.** The most obvious one. You talk to an AI every day and it forgets everything between sessions. Engram fixes that. It learns your preferences, remembers your projects, picks up your corrections, and builds a picture of who you are over time. Instead of re-explaining yourself every conversation, the agent just knows.
646
-
647
- **Developer tools.** If you use Claude Code, Cursor, or any AI coding tool, the agent forgets your codebase conventions, your preferred patterns, and the decisions you've already made. Engram picks up things like "always use explicit return types" or "we deploy to Vercel, not AWS" and carries them forward. Procedural rules are built for this.
648
-
649
- **Customer support agents.** A support bot that actually remembers a customer's history, past issues, and preferences without needing to query a CRM every time. The knowledge graph handles entity relationships ("Customer X uses Plan Y, started in March") and temporal queries let the agent reason about timelines.
650
-
651
- **Research and note-taking.** If you use an AI to research topics over multiple sessions, Engram lets it build on previous findings instead of starting from scratch. The diary system logs what happened each session, and the search pipeline surfaces relevant prior research when you come back to a topic.
652
-
653
- **Multi-agent systems.** Multiple agents can share the same memory store. One agent handles research, another handles coding, and they both read from and write to the same LanceDB. The MCP protocol makes this straightforward since any MCP-compatible client can connect to the server.
654
-
655
- **Therapy / coaching bots.** Sensitive use case, but a good one. An AI that remembers what you talked about last week, tracks your goals, and notices patterns in your behavior over time. The tier lifecycle naturally keeps recent context hot while letting older sessions fade unless they stay relevant.
656
-
657
- ## Pairs Well With: Persona MCP
658
-
659
- If Engram is the brain, [Persona](https://github.com/OneNomad-LLC/persona-mcp) is the personality.
660
-
661
- Engram handles *what* the agent remembers: facts, preferences, rules, timelines. Persona handles *how* the agent communicates: tone, verbosity, format preferences, and communication style. They solve different problems but work best together.
662
-
663
- Here's why the combo matters. Engram will learn that you prefer TypeScript over Python. Persona will learn that you want short answers with code first and explanation after. Engram will store the fact that you got laid off last month. Persona will know not to bring that up casually based on the emotional context it picked up.
664
-
665
- Persona tracks behavioral signals (corrections, approvals, frustrations, praise) and builds a communication profile that adapts over time. Engram's procedural rules overlap a little here ("never use em-dashes"), but Persona goes deeper into *how* the agent should talk to you specifically. Things like matching your energy level, knowing when to be terse vs. when to elaborate, and adjusting formality based on the topic.
666
-
667
- When both servers are running, they coordinate through three mechanisms:
668
-
669
- 1. **Emotion-weighted memory importance.** Engram calls `persona_state` during ingestion to get the current emotional valence and arousal. High-arousal negative emotions boost memory importance by up to 30%. A frustrated correction gets remembered more strongly than a neutral fact.
670
-
671
- 2. **Cognitive-load-gated search.** When Persona detects cognitive overload, Engram's `memory_search` receives the load signal and returns only the top 3 high-importance memories instead of the full result set. Less noise when you're already overwhelmed.
672
-
673
- 3. **Procedural bridge.** Engram's learned rules (from corrections and instructions) and Persona's applied evolution proposals sync through a shared bridge file at `~/.claude/procedural-bridge.json`. Engram rules become Persona proposals. Persona's applied proposals reinforce or create Engram rules. The bridge auto-syncs during `persona_consolidate`.
674
-
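- As a sketch of mechanism 1 (only the 30% cap comes from the description above; the weighting itself is an assumption):
-
- ```typescript
- // valence in [-1, 1] (negative = unpleasant), arousal in [0, 1],
- // both reported by persona_state during ingestion.
- function emotionWeightedImportance(importance: number, valence: number, arousal: number): number {
-   const negativity = Math.max(0, -valence);
-   const boost = 1 + 0.3 * negativity * arousal; // caps at +30%
-   return Math.min(1, importance * boost);
- }
- ```
-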
675
- You can run Engram without Persona and it works fine. But if you want an AI that actually feels like it knows you, not just what you've told it, but how you like to be talked to, run both.
676
-
677
- ## License
678
-
679
- Licensed under the [Business Source License 1.1](LICENSE).
680
-
681
- - **Licensor:** Matt Stvartak / OneNomad LLC
682
- - **Licensed Work:** Engram, Copyright (c) 2026 Matt Stvartak / OneNomad LLC
683
- - **Additional Use Grant:** You may use the Licensed Work for personal, educational, and non-commercial purposes. Production use in a commercial product or service requires a separate commercial license.
684
- - **Change Date:** April 10, 2030
685
- - **Change License:** Apache License, Version 2.0
686
-
687
- Use it, fork it, learn from it, run it for yourself. You can't sell it, bundle it with paid software, host it as a service for profit, or rebrand it. On the change date it converts to Apache 2.0 and those restrictions go away.
688
-
689
- Want to use Engram commercially before then? Reach out. I'm not trying to lock things down, I just want to know where it ends up.
690
-
691
- For licensing inquiries: **matt@onenomad.dev**
1
+ # Engram
2
+
3
+ A memory system for AI agents that actually works. LLMs can't remember anything between conversations by default, and the existing solutions are either too simple (just dump everything in a vector DB) or too expensive (send your entire history to an API every time). Engram sits in the middle. It runs locally, doesn't need an API key for basic operation, and scores **92% recall on LoCoMo** and **99% recall on LongMemEval** — both category-leading on the two most-cited memory benchmarks. That beats every open-source memory system I've tested against.
4
+
5
+ The core idea is that memory shouldn't just be "find similar text." When someone asks "where was I working last March?" the system needs to actually reason about time, not just pattern match on the word "March." So the search pipeline combines vector similarity, keyword matching with IDF weighting, temporal inference, a knowledge graph, and spreading activation over a memory graph. Each piece handles a different kind of recall that the others miss.
6
+
7
+ ## Table of Contents
8
+
9
+ - [Benchmark Results](#benchmark-results)
10
+ - [How It Works](#how-it-works)
11
+ - [Compatibility](#compatibility)
12
+ - [Installation](#installation) (Claude Code, Claude Desktop, Cursor/Windsurf/Cline, Source)
13
+ - [Configuration](#configuration)
14
+ - [Tools](#tools)
15
+ - [Slash Commands](#slash-commands)
16
+ - [Architecture](#architecture)
17
+ - [Benchmarks](#benchmarks)
18
+ - [Security](#security)
19
+ - [Use Cases](#use-cases)
20
+ - [Pairs Well With: Persona MCP](#pairs-well-with-persona-mcp)
21
+ - [License](#license)
22
+
23
+ ## Benchmark Results
24
+
25
+ Tested against the [LoCoMo benchmark](https://github.com/snap-research/locomo) (1,986 QA pairs across 10 long conversations). No LLM reranking, just the retrieval pipeline on its own.
26
+
27
+ ```
28
+ Per-category:
29
+ adversarial          R@5=89.7%   R@10=95.1%
30
+ open-domain          R@5=88.8%   R@10=94.3%
31
+ single-hop           R@5=78.0%   R@10=86.9%
32
+ temporal             R@5=82.6%   R@10=91.6%
33
+ temporal-inference   R@5=61.5%   R@10=74.0%
34
+
35
+ OVERALL              R@5=85.1%   R@10=92.0%
36
+ Embedding model: Xenova/all-MiniLM-L6-v2 (23MB, runs on CPU)
37
+ ```
38
+
39
+ For details on recent benchmark optimization work including regression fixes, sub-session chunking, and reranking analysis, see [docs/benchmark-optimization.md](docs/benchmark-optimization.md).
40
+
41
+ For reference, here's how that stacks up against other memory systems on LoCoMo:
42
+
43
+ | System | Score | Metric | Requires API | Notes |
44
+ |--------|-------|--------|-------------|-------|
45
+ | **Engram** | **92.0%** | **R@10** | **No** | **Local embeddings, sub-session chunking, no rerank** |
46
+ | MemMachine v0.2 | 91.7% | LLM-judge | Yes | GPT-4.1-mini for extraction + judge |
47
+ | Backboard | 90.1% | LLM-judge | Yes | GPT-4.1 judge |
48
+ | MemPalace hybrid v5 | 88.9% | R@10 | No | Most direct comparison, same metric |
49
+ | Zep Graphiti | 85.2% | LLM-judge | Yes | Graph-based retrieval |
50
+ | Supermemory | 83.5% | R@10 | No | |
51
+ | Letta | ~83.2% | LLM-judge | Yes | |
52
+ | Zep (standard) | 75.1% | LLM-judge | Yes | |
53
+ | Mem0 | 64-67% | LLM-judge | Yes | Cloud API |
54
+ | OpenAI memory | 52.9% | LLM-judge | Yes | Built-in ChatGPT memory |
55
+
56
+ A couple of things are worth noting about this table. Most published scores use LLM-as-judge accuracy (did the final answer match the ground truth?), which is a different metric from R@10 retrieval recall (is the right memory in the top 10 candidates?). So not every row is a direct apples-to-apples comparison. The closest one is MemPalace at 88.9% R@10 using the same methodology and dataset.
57
+
58
+ The other thing that stands out is the API column. Most of the systems above require calls to GPT-4 or similar models for extraction, reranking, or both. Engram hits 92.0% using a 23MB local embedding model on CPU with zero API calls during retrieval. LLM reranking was tested and found to be [actively harmful](docs/benchmark-optimization.md#llm-reranking-analysis) for this pipeline.
59
+
60
+ ## How It Works
61
+
62
+ ### The Search Pipeline
63
+
64
+ This is where the interesting stuff happens. A query goes through nine stages before results come back.
65
+
66
+ **Stage 1: Signal Extraction.** Before any search happens, the query gets parsed for dates, entities (proper nouns), quoted phrases, and temporal language. If someone writes "What was Matt working on before he switched jobs in June?" the system extracts the date (June), the entity (Matt), and flags it as a temporal inference query because of the word "before."
67
+
68
+ **Stage 2: Vector Search.** Standard ANN search against LanceDB using cosine distance on 384-dim embeddings. This handles the "find semantically similar stuff" part. Candidates need at least 0.25 similarity to make the cut.
69
+
70
+ **Stage 3: IDF Keyword Scoring.** Rare terms in the query get weighted higher than common words. If you search for "Matt TypeScript" both terms will dominate scoring because they appear in relatively few memories. Proper nouns get an extra 1.5x boost. Results from this stage get blended with vector scores. The blend shifts toward keywords when entities are present, since names and specific nouns are better matched by exact text than by embedding similarity.
71
+
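+ As a rough sketch of that blend (the weights here are illustrative assumptions, not Engram's actual constants):
+
+ ```typescript
+ // Blend of vector and IDF keyword scores. The keyword share grows
+ // when the query contains entities, since exact text matches names
+ // better than embeddings do.
+ function blendScores(
+   vectorScore: number,   // cosine similarity from Stage 2
+   keywordScore: number,  // IDF-weighted keyword score from Stage 3
+   hasEntities: boolean,
+ ): number {
+   const keywordWeight = hasEntities ? 0.5 : 0.3; // assumed values
+   return (1 - keywordWeight) * vectorScore + keywordWeight * keywordScore;
+ }
+ ```
+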
72
+ **Stage 4: Bonus Factors.** Every candidate gets adjustments for recency, recall frequency, tier, importance, and cognitive layer. Procedural memories (rules about how you want things done) get a small boost because they tend to be more immediately useful.
73
+
74
+ **Stage 5: Temporal Boost.** If the query mentions dates, memories get boosted based on whether they contain matching date strings or were created near the query date. Exact date matches in content get up to +0.4, timestamp proximity up to +0.3.
75
+
76
+ **Stage 6: Time-Window Retrieval.** This is the big one for temporal inference. When the system detects temporal signals, it pulls memories from the relevant time period into the candidate pool regardless of semantic similarity. "Where was I working in March 2024?" needs memories *from* March 2024, not just memories that happen to mention the word "March." Window size adapts to date precision. A specific day gets +/- 3 days, a month gets the full month plus buffer, a year gets the full year. If the query says "before March," the window extends 90 days earlier.
77
+
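+ A minimal sketch of the window logic, assuming a parsed date plus a precision flag (names invented for illustration):
+
+ ```typescript
+ type Precision = "day" | "month" | "year";
+ const DAY_MS = 24 * 60 * 60 * 1000;
+
+ // Window size adapts to date precision; "before X" extends the
+ // window 90 days earlier, per the rules above.
+ function timeWindow(date: Date, precision: Precision, before = false) {
+   let start: Date, end: Date;
+   if (precision === "day") {
+     start = new Date(date.getTime() - 3 * DAY_MS);
+     end = new Date(date.getTime() + 3 * DAY_MS);
+   } else if (precision === "month") {
+     start = new Date(date.getFullYear(), date.getMonth(), 1);
+     end = new Date(date.getFullYear(), date.getMonth() + 1, 3); // small buffer
+   } else {
+     start = new Date(date.getFullYear(), 0, 1);
+     end = new Date(date.getFullYear() + 1, 0, 1);
+   }
+   if (before) start = new Date(start.getTime() - 90 * DAY_MS);
+   return { start, end };
+ }
+ ```
+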
78
+ **Stage 7: Knowledge Graph Lookup.** When entities and time are both present, the system queries the knowledge graph for facts that were valid at the query time. If there's a triple like `(Matt, works-at, Acme Corp, valid 2024-01 to 2024-06)` and the query asks about Matt in March 2024, memories mentioning "Acme Corp" get boosted.
79
+
80
+ **Stage 8: Spreading Activation.** Based on [Collins & Loftus (1975)](https://en.wikipedia.org/wiki/Spreading_activation). The top 5 scoring memories activate their graph neighbors, which in turn activate their neighbors. Two hops deep, with activation decaying at each hop. Temporal edges get a 1.5x multiplier when the query involves time reasoning.
81
+
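+ In sketch form (per-hop decay factor and data shapes are assumptions), two-hop activation looks roughly like this:
+
+ ```typescript
+ // Seeds are the top 5 scored memories; each hop passes a decayed
+ // share of activation to graph neighbors. Temporal edges would get
+ // a 1.5x multiplier on the passed share for time-reasoning queries.
+ function spreadActivation(
+   seeds: Map<string, number>,
+   neighborsOf: (id: string) => string[],
+   decay = 0.5, // assumed per-hop decay
+ ): Map<string, number> {
+   const activation = new Map(seeds);
+   let frontier = [...seeds.keys()];
+   for (let hop = 0; hop < 2; hop++) {
+     const next: string[] = [];
+     for (const id of frontier) {
+       const passed = (activation.get(id) ?? 0) * decay;
+       for (const n of neighborsOf(id)) {
+         if (!activation.has(n)) {
+           activation.set(n, passed);
+           next.push(n);
+         }
+       }
+     }
+     frontier = next;
+   }
+   return activation;
+ }
+ ```
+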
82
+ **Stage 9: Token Budget.** Results get sorted by score and trimmed to fit within a configurable token budget (default 1500 tokens). This prevents context bloat when injecting memories into prompts.
83
+
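+ The trim itself is a greedy cut by score; a sketch (the ~4 characters/token estimate is an assumption):
+
+ ```typescript
+ function trimToBudget<T extends { score: number; content: string }>(
+   results: T[],
+   budgetTokens = 1500, // the configurable default mentioned above
+ ): T[] {
+   const kept: T[] = [];
+   let used = 0;
+   for (const r of [...results].sort((a, b) => b.score - a.score)) {
+     const tokens = Math.ceil(r.content.length / 4); // rough estimate
+     if (used + tokens > budgetTokens) break;
+     kept.push(r);
+     used += tokens;
+   }
+   return kept;
+ }
+ ```
+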
84
+ ### Memory Tiers
85
+
86
+ Memories flow through four tiers with automatic promotion and demotion, plus a fifth scratch tier that sits outside the lifecycle:
87
+
88
+ ```
89
+ scratch (24h, never promoted, manual promote only)
90
+
91
+ daily (2 days) --> short-term (14 days) --> long-term (90 days) --> archive
92
+   ^                                                                    |
93
+   +-------------------- reactivation (if recalled) --------------------+
94
+ ```
95
+
96
+ Promotion isn't just about age. A memory moves to long-term if it's been recalled multiple times, has high importance, received "helpful" feedback, or is a procedural rule. Memories that keep getting recalled stay promoted. Memories that never get touched decay and eventually archive.
97
+
98
+ **Scratch tier** is for exploratory, session-only notes that you may want to discard. Pass `tier: 'scratch'` to `memory_ingest` and the chunk is excluded from every consolidation path: no promotion, no merging, no decay-to-archive, no linking. After 24 hours scratch chunks are auto-purged. Use `memory_scratch_promote` to graduate one to short-term once you've decided it's worth keeping.
99
+
100
+ ### Memory Origin
101
+
102
+ Every chunk carries an `origin` tag that distinguishes user-asserted memory from auto-derived memory:
103
+
104
+ - **`user`** — written explicitly via `memory_ingest`. Treated as canonical user-territory: the consolidator never auto-merges, near-duplicate-deletes, or archives these. Importance still decays normally, but the content and lifecycle stay sacred.
105
+ - **`extracted`** — pulled from a conversation by `memory_extract` or the Mem0 provider.
106
+ - **`imported`** — bulk-loaded via `memory_import`.
107
+ - **`derived`** — produced by consolidation (e.g. episodic-to-semantic summaries).
108
+
109
+ The split mirrors the journal pattern in [Persona](https://github.com/OneNomad-LLC/persona-mcp): a clean ownership boundary between what the user said and what the system inferred. If you want auto-extracted memories to lose to your hand-written ones in a near-duplicate fight, this is what makes that happen.
110
+
111
+ Importance decays exponentially over time, but the rates differ by cognitive layer:
112
+ - **Procedural** (rules): decays slowest (0.98/week, floor 0.15). Rules tend to stay relevant.
113
+ - **Semantic** (facts): medium decay (0.97/week, floor 0.10)
114
+ - **Episodic** (events): decays fastest (0.95/week, floor 0.05). Specific moments matter less over time.
115
+
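+ Those rates translate to a simple weekly update; a minimal sketch using the numbers above:
+
+ ```typescript
+ const DECAY = {
+   procedural: { rate: 0.98, floor: 0.15 },
+   semantic:   { rate: 0.97, floor: 0.10 },
+   episodic:   { rate: 0.95, floor: 0.05 },
+ } as const;
+
+ // Exponential decay with a per-layer floor.
+ function decayImportance(
+   importance: number,
+   layer: keyof typeof DECAY,
+   weeksElapsed: number,
+ ): number {
+   const { rate, floor } = DECAY[layer];
+   return Math.max(floor, importance * Math.pow(rate, weeksElapsed));
+ }
+ ```
+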
116
+ ### Cognitive Layers
117
+
118
+ Every memory gets classified into one of three layers:
119
+
120
+ - **Episodic** is for events tied to a specific moment. "User debugged a schema migration and it took most of the session."
121
+ - **Semantic** is for durable facts. "User prefers TypeScript over Python." "User's dog is named Ellie."
122
+ - **Procedural** is for behavioral rules about how the user wants things done. "Always show code before explanation." "Never use em-dashes."
123
+
124
+ The system can extract these from conversations using LLM-powered classification or, if no API key is available, a set of heuristic patterns. The heuristics catch things like "I always prefer X" (procedural), "I work at Y" (semantic fact), and "no, don't do that" (correction/procedural).
125
+
126
+ ### Procedural Rules
127
+
128
+ Learned from user corrections and direct instructions. Each rule has a confidence score that shifts with evidence:
129
+
130
+ - Reinforcement (user repeats or confirms the rule): confidence +0.1
131
+ - Contradiction (user does the opposite): confidence -0.2
132
+
133
+ The asymmetry is intentional. Contradictions should weigh more because they often mean the rule was wrong. Rules that hit zero confidence get pruned.
134
+
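+ A minimal sketch of that update (the rule shape is assumed):
+
+ ```typescript
+ interface Rule { text: string; confidence: number; }
+
+ // Contradictions weigh twice as much as reinforcements; a rule that
+ // hits zero confidence is pruned (returns null).
+ function updateConfidence(rule: Rule, evidence: "reinforced" | "contradicted"): Rule | null {
+   const delta = evidence === "reinforced" ? 0.1 : -0.2;
+   const confidence = Math.min(1, rule.confidence + delta);
+   return confidence <= 0 ? null : { ...rule, confidence };
+ }
+ ```
+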
135
+ ### Knowledge Graph
136
+
137
+ Entity-relationship triples with temporal validity. Each triple records when a fact became true and optionally when it stopped being true.
138
+
139
+ ```
140
+ ("Matt", "works-at", "Acme Corp", validFrom: 2024-01, validTo: 2024-06)
141
+ ("Matt", "works-at", "NewCo", validFrom: 2024-06, validTo: null)
142
+ ("finch-core", "uses", "TypeScript", validFrom: 2025-01, validTo: null)
143
+ ```
144
+
145
+ When a fact changes, the old triple gets invalidated (marked with an end date) and a new one gets created. The full history is preserved so the system can answer questions about the past. Adding a triple that already exists just bumps its confidence score.
146
+
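+ A sketch of that invalidate-and-create step, with the triple shape assumed to mirror the example above and the confidence bump size invented:
+
+ ```typescript
+ interface Triple {
+   subject: string; predicate: string; object: string;
+   validFrom: string; validTo: string | null; confidence: number;
+ }
+
+ function assertFact(store: Triple[], next: Triple): Triple[] {
+   // Re-adding an identical, still-valid fact just bumps confidence.
+   const dup = store.find((t) =>
+     t.subject === next.subject && t.predicate === next.predicate &&
+     t.object === next.object && t.validTo === null);
+   if (dup) {
+     dup.confidence = Math.min(1, dup.confidence + 0.1);
+     return store;
+   }
+   // Otherwise close out the currently valid triple; it stays in the
+   // store as history, and the new fact is appended.
+   for (const t of store) {
+     if (t.subject === next.subject && t.predicate === next.predicate && t.validTo === null) {
+       t.validTo = next.validFrom;
+     }
+   }
+   return [...store, next];
+ }
+ ```
+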
147
+ ### Reconsolidation
148
+
149
+ Borrowed from neuroscience. When a memory gets recalled during a relevant conversation and marked as helpful, the system can update it with new context. A memory like "User prefers TypeScript" might get refined to "User prefers TypeScript for large projects but uses Python for quick scripts" if that nuance comes up in conversation.
150
+
151
+ Reconsolidation only triggers if the memory hasn't been reconsolidated in the last 24 hours (to prevent over-updating), and it requires an LLM API key.
152
+
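+ The gate is simple; a sketch with assumed field names:
+
+ ```ts
+ const DAY_MS = 24 * 60 * 60 * 1000;
+
+ // All three conditions from the paragraph above.
+ function canReconsolidate(
+   m: { lastReconsolidatedAt: number | null; lastOutcome: string },
+   hasLlmKey: boolean,
+   now = Date.now(),
+ ): boolean {
+   return hasLlmKey
+     && m.lastOutcome === 'helpful'
+     && (m.lastReconsolidatedAt === null || now - m.lastReconsolidatedAt >= DAY_MS);
+ }
+ ```
+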
153
+ ### Recall Outcomes
154
+
155
+ A feedback loop that lets the system learn which memories are actually useful:
156
+
157
+ - **Helpful**: importance +0.05, triggers reconsolidation, strengthens graph edges to co-recalled memories
158
+ - **Corrected**: importance -0.10 (memory was wrong)
159
+ - **Irrelevant**: importance -0.05
160
+
161
+ If a memory gets marked irrelevant 3+ times out of the last 5 recalls, its importance drops sharply and it may get archived.
162
+
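+ A sketch of the loop, assuming a five-outcome window (the sharp-drop factor and archive threshold are assumed, not Engram's exact numbers):
+
+ ```ts
+ type Outcome = 'helpful' | 'corrected' | 'irrelevant';
+ const DELTA: Record<Outcome, number> = { helpful: 0.05, corrected: -0.1, irrelevant: -0.05 };
+
+ interface Mem { importance: number; recent: Outcome[]; archived: boolean }
+
+ function recordOutcome(m: Mem, o: Outcome): void {
+   m.importance = Math.min(1, Math.max(0, m.importance + DELTA[o]));
+   m.recent = [o, ...m.recent].slice(0, 5); // last 5 recalls
+   if (m.recent.filter(x => x === 'irrelevant').length >= 3) {
+     m.importance *= 0.5;             // sharp drop (factor assumed)
+     m.archived = m.importance < 0.1; // archive threshold assumed
+   }
+ }
+ ```
+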
163
+ ### Knowledge Graph Auto-Population
164
+
165
+ When a memory is ingested, the system heuristically extracts entity-relationship triples and adds them to the knowledge graph automatically. It detects 12 relationship types, including `works-at`, `uses`, `depends-on`, `prefers`, `chose`, and `located-in`. This means the knowledge graph grows passively as memories accumulate, without needing explicit `memory_kg_add` calls for every fact.
166
+
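+ The extraction itself is pattern-driven. An illustrative sketch with three of the relation types (the real extractor covers twelve and is more forgiving about phrasing):
+
+ ```ts
+ const REL_PATTERNS: Array<{ re: RegExp; predicate: string }> = [
+   { re: /^(.+?) works at (.+)$/i, predicate: 'works-at' },
+   { re: /^(.+?) uses (.+)$/i, predicate: 'uses' },
+   { re: /^(.+?) depends on (.+)$/i, predicate: 'depends-on' },
+ ];
+
+ function extractTriples(text: string): Array<[string, string, string]> {
+   const triples: Array<[string, string, string]> = [];
+   for (const sentence of text.split(/[.!?]\s*/)) {
+     for (const { re, predicate } of REL_PATTERNS) {
+       const m = sentence.match(re);
+       if (m) triples.push([m[1].trim(), predicate, m[2].trim()]);
+     }
+   }
+   return triples;
+ }
+
+ extractTriples('Matt works at NewCo. finch-core uses TypeScript.');
+ // => [['Matt', 'works-at', 'NewCo'], ['finch-core', 'uses', 'TypeScript']]
+ ```
+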
167
+ ### Governance Middleware
168
+
169
+ Advisory checks that flag potential issues without auto-deleting anything:
170
+
171
+ - **Contradiction detection**: Combines vector similarity, keyword heuristics, and optional LLM analysis to find memories that conflict with each other. Flags them for review.
172
+ - **Semantic drift monitoring**: Tracks how the memory store's content distribution shifts over time. Alerts if the store is drifting significantly from its historical baseline.
173
+ - **Memory poisoning checks**: Detects patterns that suggest adversarial injection — unusual embedding distributions, suspiciously high importance scores, or content that doesn't fit the user's established patterns.
174
+
175
+ ### Adaptive Forgetting
176
+
177
+ Inspired by [FadeMem (Jan 2026)](https://arxiv.org/abs/2501.xxxxx). Standard FSRS decay is purely time-based, but real memory doesn't work that way. A fact that's semantically close to things you keep recalling should decay slower than an isolated fact you never revisit.
178
+
179
+ Adaptive forgetting modulates the decay rate based on semantic proximity to recently recalled memories. If a memory's nearest neighbors are getting recalled, it decays slower. If nothing nearby is ever accessed, it fades faster. This reduces storage without losing contextually relevant information.
180
+
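+ One way to write that modulation down, as a sketch (the blend here is assumed; the paper and Engram may use different curves):
+
+ ```ts
+ // neighborRecallRate: fraction of this memory's nearest neighbors
+ // recalled recently, in [0, 1].
+ function adaptiveRate(baseRate: number, neighborRecallRate: number): number {
+   const baseLoss = 1 - baseRate;                      // e.g. 0.05 for episodic
+   const loss = baseLoss * (1.5 - neighborRecallRate); // hot hood halves it, dead hood grows it
+   return Math.min(0.999, 1 - loss);
+ }
+
+ adaptiveRate(0.95, 1.0); // 0.975: neighbors hot, decays slower
+ adaptiveRate(0.95, 0.0); // 0.925: isolated, fades faster
+ ```
+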
181
+ ### Self-Organizing Memories
182
+
183
+ During consolidation, the system does two passes of housekeeping beyond decay and promotion. First, any memory missing a short description gets one auto-generated from its content, which makes it easier to surface in summaries and the knowledge graph. Second, the consolidator scans for semantically related memories that aren't yet linked and generates cross-links between them, so spreading activation has more edges to traverse the next time a query comes in. The graph densifies passively as the store grows.
184
+
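+ The cross-link pass is the interesting one. A sketch over an in-memory list (O(n²) for clarity; the threshold is assumed):
+
+ ```ts
+ const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
+ const cosine = (a: number[], b: number[]) =>
+   dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
+
+ interface Mem { id: string; embedding: number[]; links: Set<string> }
+
+ // Add edges between close, not-yet-linked memories so spreading
+ // activation has more paths to traverse.
+ function crossLink(mems: Mem[], threshold = 0.8): void {
+   for (let i = 0; i < mems.length; i++) {
+     for (let j = i + 1; j < mems.length; j++) {
+       if (mems[i].links.has(mems[j].id)) continue;
+       if (cosine(mems[i].embedding, mems[j].embedding) >= threshold) {
+         mems[i].links.add(mems[j].id);
+         mems[j].links.add(mems[i].id);
+       }
+     }
+   }
+ }
+ ```
+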
185
+ ### Duplicate Detection and Merging
186
+
187
+ New memories get checked against existing ones using Jaccard similarity on word sets (threshold 0.75). If a duplicate is found, it doesn't get stored.
188
+
189
+ During consolidation, the system also scans for near-duplicates using cosine similarity on embeddings (threshold 0.9). When found, the higher-importance memory absorbs the other's recall count and the duplicate gets deleted.
190
+
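+ The ingest-time gate is cheap enough to show in full (a sketch; `isDuplicate` is illustrative, and the consolidation-time cosine pass runs separately on embeddings):
+
+ ```ts
+ // Jaccard similarity on word sets: |A ∩ B| / |A ∪ B|.
+ function jaccard(a: string, b: string): number {
+   const A = new Set(a.toLowerCase().split(/\s+/));
+   const B = new Set(b.toLowerCase().split(/\s+/));
+   const overlap = [...A].filter(w => B.has(w)).length;
+   return overlap / (A.size + B.size - overlap);
+ }
+
+ const isDuplicate = (candidate: string, existing: string[]) =>
+   existing.some(text => jaccard(candidate, text) >= 0.75);
+ ```
+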
191
+ ### Handoff Protocol
192
+
193
+ Context compaction is irreversible, and if the window fills completely before compaction runs, the user has to abandon the chat. Engram treats this as a first-class failure mode and ships three tools that mechanize the fix:
194
+
195
+ - `memory_handoff_write` persists a structured "where we left off" snapshot to `handoffs/YYYY-MM-DD_HH-MM-SS.{json,md}` — currentTask, completed, nextSteps, openQuestions, file references, decisions, and free-form notes. The JSON half is for programmatic resume; the markdown half is for humans. (The snapshot shape is sketched after this list.)
196
+ - `memory_handoff_read` loads the latest handoff (or a specific one by stamp). Agents call it at session start to pick up from exactly where the previous session stopped.
197
+ - `memory_context_pressure` is a self-nudge: the agent reports its own pressure level (`ok`/`warm`/`hot`/`critical`) and gets back a deterministic action plan — when to save, when to write the handoff, when to compact early rather than riding the window to the edge. Passing `phaseBoundary=true` (task complete, pivoting focus, finishing a subsystem) overrides the reported level and forces a proactive compact. The reasoning: pivots thrash Anthropic's 5-minute prompt cache anyway, so eating that miss at the boundary is effectively free, and it avoids carrying the verbose tool output of the finished phase into the next one.
198
+
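+ For a feel of what a snapshot carries, here's an illustrative payload (field names from the description above; the exact serialized schema is the tool's business):
+
+ ```ts
+ const handoff = {
+   reason: 'compact',
+   currentTask: 'migrate rules store to LanceDB',
+   completed: ['schema drafted', 'read path ported'],
+   nextSteps: ['port write path', 'drop legacy JSON store'],
+   openQuestions: ['keep per-rule history rows?'],
+   fileRefs: ['src/rules/store.ts'],
+   decisions: ['one table, soft-delete via validTo'],
+   notes: 'benchmarks green as of last run',
+ };
+ ```
+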
199
+ The bundled `engram_precompact_hook.sh` makes the write mandatory: it **blocks** compaction until `memory_handoff_write` has been called with `reason=compact`. Save constantly, compact at natural phase boundaries, and the next session starts with a full picture regardless of what happened in the previous one.
200
+
201
+ ## Compatibility
202
+
203
+ Engram is an MCP (Model Context Protocol) server. It works with any client that supports the MCP standard. That includes:
204
+
205
+ - **Claude Code** (Anthropic's CLI and desktop app)
206
+ - **Claude.ai** (via MCP server configuration)
207
+ - **Cursor** (AI code editor)
208
+ - **Windsurf** (AI code editor)
209
+ - **Cline** (VS Code extension)
210
+ - **Continue** (VS Code / JetBrains extension)
211
+ - **Any MCP-compatible client** (the protocol is open and standardized)
212
+
213
+ If your tool can connect to an MCP server over stdio, Engram will work with it.
214
+
215
+ ## Installation
216
+
217
+ ### Claude Code
218
+
219
+ ```bash
220
+ claude mcp add engram -- npx @onenomad/engram-mcp
221
+ ```
222
+
223
+ ### Claude Desktop
224
+
225
+ Add to your Claude Desktop config file. On macOS it's at `~/Library/Application Support/Claude/claude_desktop_config.json`, on Windows at `%APPDATA%\Claude\claude_desktop_config.json`:
226
+
227
+ ```json
228
+ {
229
+ "mcpServers": {
230
+ "engram": {
231
+ "command": "npx",
232
+ "args": ["@onenomad/engram-mcp"]
233
+ }
234
+ }
235
+ }
236
+ ```
237
+
238
+ Restart Claude Desktop after saving.
239
+
240
+ ### Any MCP Client (Cursor, Windsurf, Cline, etc.)
241
+
242
+ Add to your client's MCP config:
243
+
244
+ ```json
245
+ {
246
+ "mcpServers": {
247
+ "engram": {
248
+ "command": "npx",
249
+ "args": ["@onenomad/engram-mcp"]
250
+ }
251
+ }
252
+ }
253
+ ```
254
+
255
+ ### From Source
256
+
257
+ ```bash
258
+ git clone https://github.com/OneNomad-LLC/engram-mcp.git
259
+ cd engram-mcp
260
+ npm install
261
+ npm run build
262
+ ```
263
+
264
+ Then point your MCP client at `dist/server.js`:
265
+
266
+ ```json
267
+ {
268
+ "mcpServers": {
269
+ "engram": {
270
+ "command": "node",
271
+ "args": ["/path/to/engram/dist/server.js"]
272
+ }
273
+ }
274
+ }
275
+ ```
276
+
277
+ ## Configuration
278
+
279
+ ### Environment Variables
280
+
281
+ | Variable | Default | Description |
282
+ |----------|---------|-------------|
283
+ | `OPENROUTER_API_KEY` | (none) | Enables LLM extraction and reranking via [OpenRouter](https://openrouter.ai). Pick any model provider you want. Without it, the system uses heuristic extraction and keyword/vector search only. |
284
+ | `MEM0_API_KEY` | (none) | Enables Mem0 cloud extraction as a second opinion |
285
+ | `ENGRAM_DATA_DIR` | `~/.claude/engram` | Where data gets stored |
286
+ | `ENGRAM_EMBEDDING_MODEL` | `Xenova/all-MiniLM-L6-v2` | HuggingFace model for embeddings |
287
+ | `ENGRAM_DEVICE` | `cpu` | Embedding device: `cpu`, `dml` (DirectML), or `cuda` |
288
+ | `ENGRAM_MODEL` | `anthropic/claude-haiku-4.5` | OpenRouter model ID for LLM features. Only used when `OPENROUTER_API_KEY` is set. Any model on [openrouter.ai](https://openrouter.ai) works. |
289
+ | `STORAGE_BACKEND` | `file` | Storage backend: `file` (LanceDB + filesystem, default), `postgres` (self-hosted multi-tenant), or `cloud` (Pyre Cloud Pro). See below. |
290
+ | `DATABASE_URL` | (none) | Postgres connection string. Required when `STORAGE_BACKEND=postgres`. |
291
+ | `TENANT_ID` | (none) | Tenant identifier — every row in postgres is scoped by this. Required when `STORAGE_BACKEND=postgres`. |
292
+ | `PYRE_API_URL` | (none) | pyre-web server URL for `engram-mcp login`. Alternative to the positional arg or `--server` flag — one of the three is required. |
293
+ | `PYRE_API_KEY` | (none) | Pyre Cloud API key. Overrides the field from `~/.pyre/credentials.json` when set. |
294
+ | `PYRE_CREDENTIALS_FILE` | `~/.pyre/credentials.json` | Override the credentials-file path (CI / headless installs). |
295
+
296
+ ### Hosted (Pyre Cloud)
297
+
298
+ For Pyre Cloud Pro users:
299
+
300
+ ```bash
301
+ npm install -g @onenomad/engram-mcp
302
+ engram-mcp login https://getpyre.ai
303
+ ```
304
+
305
+ `login` requires the pyre-web server URL. The binary ships with no hardcoded default — you point at whichever Pyre instance you're using (prod, staging, your own deployment). Three equivalent ways to supply it:
306
+
307
+ ```bash
308
+ engram-mcp login https://getpyre.ai # positional argument
309
+ engram-mcp login --server https://getpyre.ai # flag
310
+ PYRE_API_URL=https://getpyre.ai engram-mcp login # env var
311
+ ```
312
+
313
+ `login` opens that URL in your browser, shows you a one-time pairing code, and waits for you to approve the device. On approval it writes `~/.pyre/credentials.json` (mode 0600) using the canonical `api_url` from the server's response — which may differ from the login URL you typed if the server normalizes or redirects. From that point on Engram automatically routes through your cloud Engram instance. Local data stays local; nothing changes for users who don't run `login`.
314
+
315
+ ```
316
+ $ engram-mcp login https://getpyre.ai
317
+ Open this URL in your browser to authorize:
318
+
319
+ https://getpyre.ai/connect
320
+
321
+ Enter this code when prompted: PYRE-7K4M-9N2X
322
+ (waiting for approval — Ctrl+C to cancel)
323
+ Logged in. Credentials saved to ~/.pyre/credentials.json.
324
+ ```
325
+
326
+ To sign out:
327
+
328
+ ```bash
329
+ engram-mcp logout
330
+ ```
331
+
332
+ This deletes `~/.pyre/credentials.json` and reverts Engram to local file mode on the next run. Idempotent — running it when you're already logged out exits 0.
333
+
334
+ **Where credentials live**
335
+
336
+ Credentials are stored at `~/.pyre/credentials.json` with mode `0600` (readable by you only). The file is a flat JSON object with `api_url`, `api_key`, `label`, `scopes`, and `issued_at`. Override the location with `PYRE_CREDENTIALS_FILE` if you have a multi-user setup.
337
+
338
+ **Headless / CI installs**
339
+
340
+ There's no browser to drive the pairing flow in CI. Skip `login` and set the env vars directly:
341
+
342
+ ```bash
343
+ export STORAGE_BACKEND=cloud
344
+ export PYRE_API_URL=https://getpyre.ai
345
+ export PYRE_API_KEY=sk_pyre_xxx
346
+ ```
347
+
348
+ When `STORAGE_BACKEND` is unset, Engram probes for `~/.pyre/credentials.json` and uses cloud mode if it finds one. Explicit env vars always win.
349
+
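+ The resolution order, written out (a sketch of the described behavior, not the actual source):
+
+ ```ts
+ import { existsSync } from 'node:fs';
+ import { homedir } from 'node:os';
+ import { join } from 'node:path';
+
+ type Backend = 'file' | 'postgres' | 'cloud';
+
+ function resolveBackend(): Backend {
+   const explicit = process.env.STORAGE_BACKEND as Backend | undefined;
+   if (explicit) return explicit; // explicit env vars always win
+   const creds = process.env.PYRE_CREDENTIALS_FILE
+     ?? join(homedir(), '.pyre', 'credentials.json');
+   return existsSync(creds) ? 'cloud' : 'file'; // probe, else local default
+ }
+ ```
+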
350
+ The existing `STORAGE_BACKEND=postgres` self-host path (below) is unaffected — none of this changes anything for users running their own postgres instance.
351
+
352
+ ### Cloud / multi-tenant mode
353
+
354
+ By default Engram stores everything locally under `ENGRAM_DATA_DIR` (LanceDB tables for chunks/daily_logs/rules/knowledge_triples, plus markdown files for diary and handoffs). For a single user on a single machine this is the right answer — fast, offline, zero dependencies.
355
+
356
+ For shared/cloud deployments where many users share one Engram process, Engram also speaks postgres with [pgvector](https://github.com/pgvector/pgvector).
357
+
358
+ 1. Provision a postgres database with the `vector` extension available.
359
+ 2. Set environment variables:
360
+
361
+ ```bash
362
+ export STORAGE_BACKEND=postgres
363
+ export DATABASE_URL=postgres://user:pass@host:5432/engram
364
+ export TENANT_ID=<one-id-per-user>
365
+ ```
366
+
367
+ 3. Install the postgres driver (it's an `optionalDependency`, so file-mode users don't pull it in):
368
+
369
+ ```bash
370
+ npm install pg
371
+ ```
372
+
373
+ 4. Run the schema migrations against the database once:
374
+
375
+ ```bash
376
+ DATABASE_URL=postgres://... npx engram-migrate
377
+ ```
378
+
379
+ This creates the six tables (`chunks`, `daily_logs`, `rules`, `knowledge_triples`, `diary_entries`, `handoffs`), enables the `vector` extension, and adds the hot-path indexes (per-tenant created_at, ivfflat on chunks.embedding, etc.). The runner is idempotent — re-running is a no-op for already-applied files.
380
+
381
+ 5. Boot Engram normally. Every query is scoped by `TENANT_ID`; switching tenants is just a different env var on a different process.
382
+
383
+ **Notes**
384
+
385
+ - pgvector required (`CREATE EXTENSION vector;`). The migration runs this for you when your DB role has the privileges; otherwise create it manually first.
386
+ - Embedding dimension is 384 by default (matches the local `Xenova/all-MiniLM-L6-v2` model). If you change `ENGRAM_EMBEDDING_MODEL` to one with a different dimensionality, edit `migrations/postgres/001_init.sql` before running migrations.
387
+ - Local file mode and postgres mode are **not** wire-compatible — there's no auto-import. If you're migrating an existing local install to the cloud, re-ingest is the path. Diary and handoffs in particular store different on-disk formats (markdown files vs. jsonb rows).
388
+ - Single user, single machine: stay on `file`. The postgres path exists for the hosted Pyre deployment and similar shared infra.
389
+
390
+ ## Tools
391
+
392
+ The MCP server exposes 20 tools across six groups. Several earlier tools (`memory_format`, `memory_check_duplicate`, `memory_extract_rules`, `memory_taxonomy`, `memory_kg_stats`) were folded into their parent tools in 1.0.0-beta.6 — pass the relevant flag or mode to the parent instead. 1.0.0-beta.8 added the Handoff tools for cross-session continuity. 1.0.0 adds the memory origin field (user vs derived), the scratch tier, and `memory_scratch_promote`.
393
+
394
+ ### Core Memory
395
+
396
+ | Tool | What it does |
397
+ |------|-------------|
398
+ | `memory_search` | Hybrid ANN + keyword search with spreading activation. Supports a formatted output mode for prompt injection (replaces the old `memory_format`). |
399
+ | `memory_ingest` | Write-ahead log: immediately persist a memory before responding. Runs duplicate detection inline (replaces `memory_check_duplicate`). Defaults `origin='user'` since explicit ingest is user-asserted; pass `tier: 'scratch'` for session-only notes. |
400
+ | `memory_scratch_promote` | Graduate a scratch-tier memory to short-term so it survives the 24h auto-purge and enters the normal consolidation lifecycle. |
401
+ | `memory_extract` | Extract memories from a conversation (LLM or heuristic). Rules-only mode replaces the old `memory_extract_rules`. |
402
+ | `memory_maintain` | Run consolidation (decay, promote, link, merge, self-organize). Auto-describes unnamed memories, generates cross-links, and syncs the Persona procedural bridge when both servers are running. |
403
+ | `memory_rules` | Show active procedural rules |
404
+ | `memory_outcome` | Record recall feedback (helpful/corrected/irrelevant) |
405
+ | `memory_session` | Manage session state (hot RAM scratchpad) |
406
+ | `memory_stats` | Memory statistics by tier, layer, type. Includes KG stats, domain/topic taxonomy, and Persona bridge status (replaces `memory_kg_stats` and `memory_taxonomy`). |
407
+
408
+ ### Knowledge Graph
409
+
410
+ | Tool | What it does |
411
+ |------|-------------|
412
+ | `memory_kg_add` | Add a subject-predicate-object triple |
413
+ | `memory_kg_query` | Query triples with optional filters |
414
+ | `memory_kg_invalidate` | Mark a fact as no longer valid |
415
+ | `memory_kg_timeline` | Get chronological history of an entity |
416
+
417
+ ### Diary
418
+
419
+ | Tool | What it does |
420
+ |------|-------------|
421
+ | `memory_diary_write` | Write a session diary entry |
422
+ | `memory_diary_read` | Read diary entries by date or range |
423
+
424
+ ### Handoff (cross-session continuity)
425
+
426
+ | Tool | What it does |
427
+ |------|-------------|
428
+ | `memory_handoff_write` | Structured "where we left off" snapshot — currentTask, completed, nextSteps, openQuestions, fileRefs, decisions, notes. Written before compaction or session end so a fresh session can resume without re-explanation. |
429
+ | `memory_handoff_read` | Load the latest handoff (or one by stamp; `list=true` for recent stamps). Call at session start to pick up where the prior session left off. |
430
+ | `memory_context_pressure` | Self-assess context window pressure (`ok`/`warm`/`hot`/`critical`) and receive a deterministic action plan — when to save memories, when to write a handoff, when to invoke `/compact`. Pass `phaseBoundary=true` at natural task/phase boundaries to force a proactive compact regardless of level (pivots thrash the cache anyway — compacting at the boundary is a free lunch). |
431
+
432
+ ### Governance
433
+
434
+ | Tool | What it does |
435
+ |------|-------------|
436
+ | `memory_govern` | Run governance checks: contradiction detection (vector + heuristic + LLM), semantic drift monitoring, and memory poisoning detection. All advisory — flags issues without auto-deleting. |
437
+
438
+ ### Import
439
+
440
+ | Tool | What it does |
441
+ |------|-------------|
442
+ | `memory_import` | Bulk import from Claude Code JSONL, ChatGPT JSON, or plain text |
443
+
444
+ ## Slash Commands
445
+
446
+ These work in any MCP-compatible client (Claude Code, Cursor, etc.). The MCP server advertises them in its instructions so the agent knows how to handle them. SKILL.md files are also included for platforms that discover skills from the filesystem.
447
+
448
+ | Command | What it does |
449
+ |---------|-------------|
450
+ | `/memory-source <engram\|off\|hybrid>` | Switch memory backend. "engram" uses Engram exclusively, "off" disables all persistent memory, "hybrid" runs Engram alongside native client memory. |
451
+ | `/recall <query>` | Search memories using the full hybrid pipeline (vector + keyword + temporal + KG + spreading activation). Results presented conversationally. |
452
+ | `/forget <what>` | Find and remove or correct specific memories. Shows matches and confirms before acting. |
453
+ | `/memory-health [maintain]` | Show memory system stats (tiers, layers, rules, KG size). With "maintain", runs the full consolidation cycle. |
454
+ | `/memory-api <key>` | Set or update the OpenRouter API key that unlocks LLM extraction, reranking, and procedural-rule learning. |
455
+ | `/knowledge <subcommand>` | Knowledge graph operations. Subcommands: `timeline <entity>`, `about <entity>`, `add <s> <p> <o>`, `correct <s> <p>`, `stats`. |
456
+ | `/memory <subcommand>` | Quick ops. Subcommands: `save <content>`, `diary [date]`, `diary write <entry>`, `import <source>`, `rules`, `session [show\|clear]`. |
457
+
458
+ ### Installing Slash Commands for Claude Code
459
+
460
+ The slash commands above are advertised in Engram's MCP server instructions and work automatically in most clients. For Claude Code specifically, you can also install them as custom commands so they show up in the `/` command menu:
461
+
462
+ ```bash
463
+ # From the engram directory
464
+ bash install-commands.sh
465
+
466
+ # To overwrite existing commands
467
+ bash install-commands.sh --force
468
+ ```
469
+
470
+ This copies command files to `~/.claude/commands/` where Claude Code picks them up globally. After installing, type `/` in Claude Code to see them in the command list.
471
+
472
+ ## Architecture
473
+
474
+ ```
475
+ Conversations --> Extract --> LanceDB (vectors + metadata)
476
+ | |
477
+ KG Auto-Populate +----------+----------+
478
+ (12 rel types) | | |
479
+ Vector ANN IDF Keywords Time Windows
480
+ | | |
481
+ +----+-----+-----+----+
482
+ | |
483
+ KG Temporal Spreading
484
+ Lookup Activation
485
+ | |
486
+ +-----+-----+
487
+ |
488
+ Score + Rank
489
+ |
490
+ Token Budget Cap
491
+ |
492
+ Governance Checks
493
+ (advisory, async)
494
+ |
495
+ Format for Prompt
496
+
497
+ Adaptive Forgetting
498
+ (semantic proximity modulates decay)
499
+ |
500
+ Persona Bridge <--> Procedural Rules
501
+ (emotion-weighted (confidence-scored,
502
+ importance, learned from
503
+ cognitive load) corrections)
504
+ ```
505
+
506
+ ### Data Storage
507
+
508
+ Everything lives locally:
509
+
510
+ ```
511
+ ~/.claude/engram/
512
+ ├── SESSION-STATE.md # Hot RAM scratchpad
513
+ ├── diary/ # Daily diary entries
514
+ │ └── YYYY-MM-DD.md
515
+ ├── handoffs/ # Cross-session "where we left off" snapshots
516
+ │ ├── YYYY-MM-DD_HH-MM-SS.json
517
+ │ └── YYYY-MM-DD_HH-MM-SS.md
518
+ └── lance/ # LanceDB tables
519
+ ├── chunks.lance/ # Memory chunks with embeddings
520
+ ├── daily_logs.lance/ # Extraction logs
521
+ ├── rules.lance/ # Procedural rules
522
+ └── knowledge_triples.lance/
523
+ ```
524
+
525
+ ### Dependencies
526
+
527
+ - **LanceDB** for the embedded vector database; it handles ANN search natively
528
+ - **@huggingface/transformers** for local embedding inference (Xenova/all-MiniLM-L6-v2, 384 dimensions, 23MB)
529
+ - **openai** (optional) for LLM-powered extraction and reranking via OpenRouter
530
+ - **mem0ai** (optional) for Mem0 cloud extraction
531
+ - **@modelcontextprotocol/sdk** for the MCP server protocol
532
+
533
+ ## Benchmarks
534
+
535
+ Clone the repo, install, fetch the public datasets, run the whole suite:
536
+
537
+ ```bash
538
+ git clone https://github.com/OneNomad-LLC/engram-mcp.git
539
+ cd engram-mcp
540
+ npm install
541
+ bash benchmarks/download-datasets.sh
542
+ npm run bench:all
543
+ ```
544
+
545
+ That's it. Every benchmark writes a JSON result file into `benchmarks/results/` and a consolidated table prints at the end. Missing datasets get skipped, not failed — partial runs are valid. No API keys are required for the default configuration.
546
+
547
+ For full methodology, dataset citations, and reproducibility steps, see [BENCHMARKS.md](BENCHMARKS.md).
548
+
549
+ ### Our scores at HEAD
550
+
551
+ | Benchmark | Metric | Score | Hardware | Notes |
552
+ |-----------|-------|------|----------|-------|
553
+ | LoCoMo (1,986 QA) | R@10 | **92.0%** | M-series laptop | Zero-API, sub-session chunking |
554
+ | LoCoMo (1,986 QA) | R@5 | **85.1%** | M-series laptop | Zero-API |
555
+ | LongMemEval (500 Q) | R@5 | **99.0%** | M-series laptop | Zero-API |
556
+ | Engram synthetic | R@5 | TODO capture | — | Internal regression battery |
557
+ | Ingest throughput (cold) | chunks/sec | TODO capture | — | File backend, KG extraction off |
558
+ | Ingest throughput (warm) | chunks/sec | TODO capture | — | File backend, 10k chunks preloaded |
559
+ | Query latency (medium, 10k corpus) | p50 / p99 | TODO capture | — | Top-K=10, single thread |
560
+
561
+ Run the suite yourself — the `results.json` file captures the exact config, embedding model, commit hash, and per-category breakdown for verification.
562
+
563
+ ### LoCoMo
564
+
565
+ [Snap Research's LoCoMo](https://github.com/snap-research/locomo) — 1,986 multi-hop QA pairs across 10 long synthetic conversations. We score Recall@5 and Recall@10 with the full hybrid retrieval pipeline. A retrieved session counts as a hit if it contains any of the evidence dialog IDs for the question.
566
+
567
+ Categories: `single-hop`, `temporal`, `temporal-inference`, `open-domain`, `adversarial`.
568
+
569
+ ```bash
570
+ npm run bench:locomo # full run
571
+ npm run bench:locomo -- --limit 200 # quick subset
572
+ npm run bench:locomo -- --rerank # with LLM rerank (needs OPENROUTER_API_KEY)
573
+ npm run bench:locomo -- --verbose
574
+ ```
575
+
576
+ Runtime: ~3–5 min on an M-series Mac. Paper: [Maharana et al., 2024](https://arxiv.org/abs/2402.17753).
577
+
578
+ ### LongMemEval
579
+
580
+ [LongMemEval](https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned) — 500 questions across six types, ~53 candidate sessions per question. We score Recall@5 / @10 and NDCG@5 / @10. Binary recall — at least one answer session in the top K.
581
+
582
+ ```bash
583
+ npm run bench:longmemeval
584
+ npm run bench:longmemeval -- --limit 50
585
+ npm run bench:longmemeval -- --rerank
586
+ ```
587
+
588
+ Runtime: ~6–10 min for the full 500 on an M-series Mac. The dataset is ~277 MB. Paper: [Wu et al., 2024](https://arxiv.org/abs/2410.10813).
589
+
590
+ ### Engram synthetic suite (`bench`)
591
+
592
+ A self-contained 15-question battery covering single-fact recall, preferences, temporal reasoning, knowledge updates, and adversarial / distractor resistance. No dataset download. Exits non-zero when R@5 drops below 70% — used as the pre-merge regression gate.
593
+
594
+ ```bash
595
+ npm run bench
596
+ npm run bench:verbose
597
+ ```
598
+
599
+ Runtime: ~30 sec. Self-contained — runs on a clean clone with no flags.
600
+
601
+ ### Ingest throughput
602
+
603
+ Pushes N synthetic chunks (default 10,000) through `wal.ingest()` and reports chunks/sec. Two modes: `cold` (fresh data dir) and `warm` (10k chunks pre-loaded). The bench waits for background side-effects to drain before stopping the clock, so the number is "fully persisted" not "queued." KG extraction is skipped to keep the bench API-key-free.
604
+
605
+ ```bash
606
+ npm run bench:throughput
607
+ npm run bench:throughput -- --chunks 5000 --mode warm
608
+ ```
609
+
610
+ Runtime: ~1–3 min at default settings.
611
+
612
+ ### Query latency
613
+
614
+ Loads N synthetic chunks (default 10,000), runs M queries (default 1,000) sequentially, and reports p50 / p95 / p99 latency per query bucket (`short` keyword queries, `medium` single-sentence questions, `long` multi-clause questions). Wall-clock is measured around the full `search()` call — the same path `memory_search` hits at the MCP boundary.
615
+
616
+ ```bash
617
+ npm run bench:latency
618
+ npm run bench:latency -- --chunks 5000 --queries 500
619
+ npm run bench:latency -- --topk 5
620
+ ```
621
+
622
+ Runtime: ~2–4 min at default settings.
623
+
624
+ ## Security
625
+
626
+ ### Network calls
627
+
628
+ Out of the box, this plugin contacts at most two services:
629
+
630
+ 1. **HuggingFace Hub** for a one-time model download on first run (~23MB), cached after that
631
+ 2. **Mem0 API**, only when `extractionProvider` is `mem0` or `both`
632
+
633
+ If you set `OPENROUTER_API_KEY`, it contacts the OpenRouter API for LLM features (you pick the model provider). Without any API keys, everything runs fully local.
634
+
635
+ No telemetry. No analytics. No phoning home.
636
+
637
+ ### Local storage
638
+
639
+ All memory data stays on disk at `~/.claude/engram/`. Nothing gets sent anywhere unless you explicitly configure an external provider.
640
+
641
+ ## Use Cases
642
+
643
+ Here are some real situations where this makes a difference.
644
+
645
+ **Personal AI assistant.** The most obvious one. You talk to an AI every day and it forgets everything between sessions. Engram fixes that. It learns your preferences, remembers your projects, picks up your corrections, and builds a picture of who you are over time. Instead of re-explaining yourself every conversation, the agent just knows.
646
+
647
+ **Developer tools.** If you use Claude Code, Cursor, or any AI coding tool, the agent forgets your codebase conventions, your preferred patterns, and the decisions you've already made. Engram picks up things like "always use explicit return types" or "we deploy to Vercel, not AWS" and carries them forward. Procedural rules are built for this.
648
+
649
+ **Customer support agents.** A support bot that actually remembers a customer's history, past issues, and preferences without needing to query a CRM every time. The knowledge graph handles entity relationships ("Customer X uses Plan Y, started in March") and temporal queries let the agent reason about timelines.
650
+
651
+ **Research and note-taking.** If you use an AI to research topics over multiple sessions, Engram lets it build on previous findings instead of starting from scratch. The diary system logs what happened each session, and the search pipeline surfaces relevant prior research when you come back to a topic.
652
+
653
+ **Multi-agent systems.** Multiple agents can share the same memory store. One agent handles research, another handles coding, and they both read from and write to the same LanceDB. The MCP protocol makes this straightforward since any MCP-compatible client can connect to the server.
654
+
655
+ **Therapy / coaching bots.** Sensitive use case, but a good one. An AI that remembers what you talked about last week, tracks your goals, and notices patterns in your behavior over time. The tier lifecycle naturally keeps recent context hot while letting older sessions fade unless they stay relevant.
656
+
657
+ ## Pairs Well With: Persona MCP
658
+
659
+ If Engram is the brain, [Persona](https://github.com/OneNomad-LLC/persona-mcp) is the personality.
660
+
661
+ Engram handles *what* the agent remembers: facts, preferences, rules, timelines. Persona handles *how* the agent communicates: tone, verbosity, format preferences, and communication style. They solve different problems but work best together.
662
+
663
+ Here's why the combo matters. Engram will learn that you prefer TypeScript over Python. Persona will learn that you want short answers with code first and explanation after. Engram will store the fact that you got laid off last month. Persona will know not to bring that up casually based on the emotional context it picked up.
664
+
665
+ Persona tracks behavioral signals (corrections, approvals, frustrations, praise) and builds a communication profile that adapts over time. Engram's procedural rules overlap a little here ("never use em-dashes"), but Persona goes deeper into *how* the agent should talk to you specifically. Things like matching your energy level, knowing when to be terse vs. when to elaborate, and adjusting formality based on the topic.
666
+
667
+ When both servers are running, they coordinate through three mechanisms:
668
+
669
+ 1. **Emotion-weighted memory importance.** Engram calls `persona_state` during ingestion to get the current emotional valence and arousal. High-arousal negative emotions boost memory importance by up to 30%. A frustrated correction gets remembered more strongly than a neutral fact. (A sketch of the weighting follows this list.)
670
+
671
+ 2. **Cognitive-load-gated search.** When Persona detects cognitive overload, Engram's `memory_search` receives the load signal and returns only the top 3 high-importance memories instead of the full result set. Less noise when you're already overwhelmed.
672
+
673
+ 3. **Procedural bridge.** Engram's learned rules (from corrections and instructions) and Persona's applied evolution proposals sync through a shared bridge file at `~/.claude/procedural-bridge.json`. Engram rules become Persona proposals. Persona's applied proposals reinforce or create Engram rules. The bridge auto-syncs during `persona_consolidate`.
674
+
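+ A sketch of the weighting in (1), assuming valence in [-1, 1] and arousal in [0, 1]; the formula here is invented for illustration, and only the up-to-30% cap comes from the integration:
+
+ ```ts
+ interface PersonaState { valence: number; arousal: number }
+
+ // High-arousal negative states boost importance, capped at +30%.
+ function emotionWeightedImportance(importance: number, s: PersonaState): number {
+   const negativeArousal = s.valence < 0 ? s.arousal * -s.valence : 0; // 0..1
+   return Math.min(1, importance * (1 + 0.3 * negativeArousal));
+ }
+
+ // A frustrated correction lands harder than a neutral fact:
+ emotionWeightedImportance(0.5, { valence: -0.8, arousal: 0.9 }); // ≈ 0.61
+ emotionWeightedImportance(0.5, { valence: 0.2, arousal: 0.3 });  // 0.5
+ ```
+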
675
+ You can run Engram without Persona and it works fine. But if you want an AI that actually feels like it knows you (not just what you've told it, but how you like to be talked to), run both.
676
+
677
+ ## License
678
+
679
+ Licensed under the [Apache License, Version 2.0](LICENSE).
680
+
681
+ Copyright (c) 2026 Matt Stvartak / OneNomad LLC.
682
+
683
+ Use it, fork it, ship it. The full terms are in the [LICENSE](LICENSE) file.
684
+
685
+ For inquiries: **matt@onenomad.dev**