agent-working-memory 0.5.5 → 0.6.0

Files changed (82)
  1. package/README.md +428 -399
  2. package/dist/api/routes.d.ts.map +1 -1
  3. package/dist/api/routes.js +60 -5
  4. package/dist/api/routes.js.map +1 -1
  5. package/dist/cli.js +468 -68
  6. package/dist/cli.js.map +1 -1
  7. package/dist/coordination/index.d.ts +11 -0
  8. package/dist/coordination/index.d.ts.map +1 -0
  9. package/dist/coordination/index.js +39 -0
  10. package/dist/coordination/index.js.map +1 -0
  11. package/dist/coordination/mcp-tools.d.ts +8 -0
  12. package/dist/coordination/mcp-tools.d.ts.map +1 -0
  13. package/dist/coordination/mcp-tools.js +221 -0
  14. package/dist/coordination/mcp-tools.js.map +1 -0
  15. package/dist/coordination/routes.d.ts +9 -0
  16. package/dist/coordination/routes.d.ts.map +1 -0
  17. package/dist/coordination/routes.js +573 -0
  18. package/dist/coordination/routes.js.map +1 -0
  19. package/dist/coordination/schema.d.ts +12 -0
  20. package/dist/coordination/schema.d.ts.map +1 -0
  21. package/dist/coordination/schema.js +125 -0
  22. package/dist/coordination/schema.js.map +1 -0
  23. package/dist/coordination/schemas.d.ts +227 -0
  24. package/dist/coordination/schemas.d.ts.map +1 -0
  25. package/dist/coordination/schemas.js +125 -0
  26. package/dist/coordination/schemas.js.map +1 -0
  27. package/dist/coordination/stale.d.ts +27 -0
  28. package/dist/coordination/stale.d.ts.map +1 -0
  29. package/dist/coordination/stale.js +58 -0
  30. package/dist/coordination/stale.js.map +1 -0
  31. package/dist/engine/activation.d.ts.map +1 -1
  32. package/dist/engine/activation.js +119 -23
  33. package/dist/engine/activation.js.map +1 -1
  34. package/dist/engine/consolidation.d.ts.map +1 -1
  35. package/dist/engine/consolidation.js +27 -6
  36. package/dist/engine/consolidation.js.map +1 -1
  37. package/dist/index.js +100 -4
  38. package/dist/index.js.map +1 -1
  39. package/dist/mcp.js +149 -80
  40. package/dist/mcp.js.map +1 -1
  41. package/dist/storage/sqlite.d.ts +21 -0
  42. package/dist/storage/sqlite.d.ts.map +1 -1
  43. package/dist/storage/sqlite.js +331 -282
  44. package/dist/storage/sqlite.js.map +1 -1
  45. package/dist/types/engram.d.ts +24 -0
  46. package/dist/types/engram.d.ts.map +1 -1
  47. package/dist/types/engram.js.map +1 -1
  48. package/package.json +57 -55
  49. package/src/api/index.ts +3 -3
  50. package/src/api/routes.ts +600 -536
  51. package/src/cli.ts +850 -397
  52. package/src/coordination/index.ts +47 -0
  53. package/src/coordination/mcp-tools.ts +318 -0
  54. package/src/coordination/routes.ts +846 -0
  55. package/src/coordination/schema.ts +120 -0
  56. package/src/coordination/schemas.ts +155 -0
  57. package/src/coordination/stale.ts +97 -0
  58. package/src/core/decay.ts +63 -63
  59. package/src/core/embeddings.ts +88 -88
  60. package/src/core/hebbian.ts +93 -93
  61. package/src/core/index.ts +5 -5
  62. package/src/core/logger.ts +36 -36
  63. package/src/core/query-expander.ts +66 -66
  64. package/src/core/reranker.ts +101 -101
  65. package/src/engine/activation.ts +758 -656
  66. package/src/engine/connections.ts +103 -103
  67. package/src/engine/consolidation-scheduler.ts +125 -125
  68. package/src/engine/consolidation.ts +29 -6
  69. package/src/engine/eval.ts +102 -102
  70. package/src/engine/eviction.ts +101 -101
  71. package/src/engine/index.ts +8 -8
  72. package/src/engine/retraction.ts +100 -100
  73. package/src/engine/staging.ts +74 -74
  74. package/src/index.ts +208 -121
  75. package/src/mcp.ts +1093 -1013
  76. package/src/storage/index.ts +3 -3
  77. package/src/storage/sqlite.ts +1017 -963
  78. package/src/types/agent.ts +67 -67
  79. package/src/types/checkpoint.ts +46 -46
  80. package/src/types/engram.ts +245 -217
  81. package/src/types/eval.ts +100 -100
  82. package/src/types/index.ts +6 -6
package/README.md CHANGED
@@ -1,399 +1,428 @@
1
- # AgentWorkingMemory (AWM)
2
-
3
- **Persistent working memory for AI agents.**
4
-
5
- AWM helps agents retain important project knowledge across conversations and sessions. Instead of storing everything and retrieving by similarity alone, it filters for salience, builds associative links between related memories, and periodically consolidates useful knowledge while letting noise fade.
6
-
7
- Use it through Claude Code via MCP or as a local HTTP service for custom agents. Everything runs locally: SQLite + ONNX models + Node.js. No cloud, no API keys.
8
-
9
- ### Without AWM
10
- - Agent forgets earlier architecture decision
11
- - Suggests Redux after project standardized on Zustand
12
- - Repeats discussion already settled three days ago
13
- - Every new conversation starts from scratch
14
-
15
- ### With AWM
16
- - Recalls prior state-management decision and rationale
17
- - Surfaces related implementation patterns from past sessions
18
- - Continues work without re-asking for context
19
- - Gets more consistent the longer you use it
20
-
21
- ---
22
-
23
- ## Quick Start
24
-
25
- **Node.js 20+** required — check with `node --version`.
26
-
27
- ```bash
28
- npm install -g agent-working-memory
29
- awm setup --global
30
- ```
31
-
32
- Restart Claude Code. That's it — 14 memory tools appear automatically.
33
-
34
- First conversation will be ~30 seconds slower while ML models download (~200MB total, cached locally). After that, everything runs on your machine.
35
-
36
- > For isolated memory per folder, see [Separate Memory Pools](#separate-memory-pools). For team onboarding, see [docs/quickstart.md](docs/quickstart.md).
37
-
38
- ---
39
-
40
- ## Who this is for
41
-
42
- - **Long-running coding agents** that need cross-session project knowledge
43
- - **Multi-agent workflows** where specialized agents share a common memory
44
- - **Local-first setups** where cloud memory is not acceptable
45
- - **Teams using Claude Code** who want persistent context without manual notes
46
-
47
- ## What this is not
48
-
49
- - Not a chatbot UI
50
- - Not a hosted SaaS
51
- - Not a generic vector database
52
- - Not a replacement for your source of truth (code, docs, tickets)
53
-
54
- ---
55
-
56
- ## Why it's different
57
-
58
- Most "memory for AI" projects are vector databases with a retrieval wrapper. AWM goes further:
59
-
60
- | | Typical RAG / Vector Store | AWM |
61
- |---|---|---|
62
- | **Storage** | Everything | Salience-filtered with low-confidence fallback (novel events go active, borderline enter staging, low-salience stored at reduced confidence) |
63
- | **Retrieval** | Cosine similarity | 10-phase pipeline: dual BM25 (keyword + expanded) + vectors + reranking + graph walk + decay + coref expansion |
64
- | **Connections** | None | Hebbian edges that strengthen when memories co-activate |
65
- | **Over time** | Grows forever, gets noisier | Consolidation: diameter-enforced clustering, cross-topic bridges, synaptic-tagged decay |
66
- | **Forgetting** | Manual cleanup | Cognitive forgetting: unused memories fade, reinforced knowledge persists (access-count modulated) |
67
- | **Feedback** | None | Useful/not-useful signals tune confidence and retrieval rank |
68
- | **Correction** | Delete and re-insert | Retraction: wrong memories invalidated, corrections linked, penalties propagate |
69
- | **Noise rejection** | None | Multi-channel agreement gate: requires 2+ retrieval channels to agree before returning results |
70
- | **Duplicates** | Stored repeatedly | Reinforce-on-duplicate: near-exact matches boost existing memory instead of creating copies |
71
-
72
- The design is based on cognitive science — ACT-R activation decay, Hebbian learning, complementary learning systems, synaptic homeostasis, and synaptic tagging — rather than ad-hoc heuristics. See [How It Works](#how-it-works) and [docs/cognitive-model.md](docs/cognitive-model.md) for details.
73
-
74
- ---
75
-
76
- ## Benchmarks (v0.5.4)
77
-
78
- | Eval | Score | What it tests |
79
- |------|-------|---------------|
80
- | Edge Cases | **100% (34/34)** | 9 failure modes: context collapse, hub toxicity, flashbulb distortion, narcissistic interference, identity collision, contradiction trapping, bridge overshoot, noise forgetting |
81
- | Stress Test | **96.2% (50/52)** | 500 memories, 100 sleep cycles, 10 topic clusters, 20 bridges/cycle, catastrophic forgetting, adversarial spam, recovery |
82
- | Workday | **93.3%** | 43 memories across 4 simulated work sessions, knowledge transfer, context switching, cross-cutting queries, noise filtering |
83
- | A/B Test | **AWM 85% vs Baseline 83%** | 100 project events, 24 recall questions, 22/22 fact recall |
84
- | Sleep Cycle | **85.7% pre-sleep** | 60 memories, 4 topic clusters, consolidation impact measurement |
85
- | Token Savings | **67.5% accuracy, 55% savings** | Memory-guided context vs full conversation history, 2.2x efficiency |
86
- | LoCoMo | **28.2%** | Industry-standard conversational memory benchmark (1,986 QA pairs, 10 conversations) |
87
- | Mini Multi-Hop | **80% (4/5)** | Entity bridging across conversation turns |
88
-
89
- ### Consolidation (v0.5.4 with BGE-small)
90
-
91
- | Metric | Value |
92
- |--------|-------|
93
- | Topic clusters formed | **10** per consolidation cycle |
94
- | Cross-topic bridges | **20** in first cycle |
95
- | Edges strengthened | **127** per cycle |
96
- | Graph size at scale | **3,000-4,500 edges** (500 memories) |
97
- | Recall after 100 cycles | **90%** stable |
98
- | Catastrophic forgetting survival | **5/5** (100%) |
99
-
100
- All evals are reproducible: `npm run test:edge`, `npm run test:stress`, `npm run test:workday`, etc. See [Testing & Evaluation](#testing--evaluation).
101
-
102
- ---
103
-
104
- ## Features
105
-
106
- ### Memory Tools (14)
107
-
108
- | Tool | Purpose |
109
- |------|---------|
110
- | `memory_write` | Store a memory (salience filter + reinforce-on-duplicate) |
111
- | `memory_recall` | Retrieve relevant memories by context (dual BM25 + coref expansion) |
112
- | `memory_feedback` | Report whether a recalled memory was useful |
113
- | `memory_retract` | Invalidate a wrong memory with optional correction |
114
- | `memory_supersede` | Replace outdated memory with current version |
115
- | `memory_stats` | View memory health metrics and activity |
116
- | `memory_checkpoint` | Save execution state (survives context compaction) |
117
- | `memory_restore` | Recover state + relevant context at session start |
118
- | `memory_task_add` | Create a prioritized task |
119
- | `memory_task_update` | Change task status/priority |
120
- | `memory_task_list` | List tasks by status |
121
- | `memory_task_next` | Get the highest-priority actionable task |
122
- | `memory_task_begin` | Start a task — auto-checkpoints and recalls context |
123
- | `memory_task_end` | End a task — writes summary and checkpoints |
124
-
125
- ### Separate Memory Pools
126
-
127
- By default, all projects share one memory pool. For isolated pools per folder, place a `.mcp.json` in each parent folder with a different `AWM_AGENT_ID`:
128
-
129
- ```
130
- C:\Users\you\work\.mcp.json -> AWM_AGENT_ID: "work"
131
- C:\Users\you\personal\.mcp.json -> AWM_AGENT_ID: "personal"
132
- ```
133
-
134
- Claude Code uses the closest `.mcp.json` ancestor. Same database, isolation by agent ID.
135
-
136
- ### Incognito Mode
137
-
138
- ```bash
139
- AWM_INCOGNITO=1 claude
140
- ```
141
-
142
- Registers zero tools — Claude doesn't see memory at all. All other tools and MCP servers work normally.
143
-
144
- ### Auto-Checkpoint Hooks
145
-
146
- Installed by `awm setup --global`:
147
-
148
- - **Stop** — reminds Claude to write/recall after each response
149
- - **PreCompact** — auto-checkpoints before context compression
150
- - **SessionEnd** — auto-checkpoints and consolidates on close
151
- - **15-min timer** — silent auto-checkpoint while session is active
152
-
153
- ### Auto-Backup
154
-
155
- The HTTP server automatically copies the database to a `backups/` directory on startup with a timestamp. Cheap insurance against data loss.
156
-
157
- ### Activity Log
158
-
159
- ```bash
160
- tail -f "$(npm root -g)/agent-working-memory/data/awm.log"
161
- ```
162
-
163
- Real-time: writes, recalls, reinforcements, checkpoints, consolidation, hook events.
164
-
165
- ### Activity Stats
166
-
167
- ```bash
168
- curl http://127.0.0.1:8401/stats
169
- ```
170
-
171
- Returns daily counts: `{"writes": 8, "recalls": 9, "hooks": 3, "total": 25}`
172
-
173
- ---
174
-
175
- ## Memory Invocation Strategy
176
-
177
- AWM combines deterministic hooks for guaranteed memory operations at lifecycle transitions with agent-directed usage during active work.
178
-
179
- ### Deterministic triggers (always happen)
180
-
181
- | Event | Action |
182
- |-------|--------|
183
- | Session start | `memory_restore` — recover state + recall context |
184
- | Pre-compaction | Auto-checkpoint via hook sidecar |
185
- | Session end | Auto-checkpoint + full consolidation |
186
- | Every 15 min | Silent auto-checkpoint (if active) |
187
- | Task start | `memory_task_begin` — checkpoint + recall |
188
- | Task end | `memory_task_end` — summary + checkpoint |
189
-
190
- ### Agent-directed triggers (when these situations occur)
191
-
192
- **Write memory when:**
193
- - A project decision is made or changed
194
- - A root cause is discovered
195
- - A reusable implementation pattern is established
196
- - A preference, constraint, or requirement is clarified
197
- - A prior assumption is found to be wrong
198
-
199
- **Recall memory when:**
200
- - Starting work on a new task or subsystem
201
- - Re-entering code you haven't touched recently
202
- - After context compaction
203
- - After a failed attempt (check if there's prior knowledge)
204
- - Before refactoring or making architectural changes
205
-
206
- **Retract when:**
207
- - A stored memory turns out to be wrong or outdated
208
-
209
- **Feedback when:**
210
- - A recalled memory was used (useful) or irrelevant (not useful)
211
-
212
- ---
213
-
214
- ## HTTP API
215
-
216
- For custom agents, scripts, or non-Claude-Code workflows:
217
-
218
- ```bash
219
- awm serve # From npm install
220
- npx tsx src/index.ts # From source
221
- ```
222
-
223
- Write a memory:
224
-
225
- ```bash
226
- curl -X POST http://localhost:8400/memory/write \
227
- -H "Content-Type: application/json" \
228
- -d '{
229
- "agentId": "my-agent",
230
- "concept": "Express error handling",
231
- "content": "Use centralized error middleware as the last app.use()",
232
- "eventType": "causal",
233
- "surprise": 0.5,
234
- "causalDepth": 0.7
235
- }'
236
- ```
237
-
238
- Recall:
239
-
240
- ```bash
241
- curl -X POST http://localhost:8400/memory/activate \
242
- -H "Content-Type: application/json" \
243
- -d '{
244
- "agentId": "my-agent",
245
- "context": "How should I handle errors in my Express API?"
246
- }'
247
- ```
248
-
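For agents written in TypeScript, the two curl calls above could be wrapped roughly as follows. This is only a sketch: the endpoint paths, the port, and the JSON field names are copied from the curl examples, while the helper names (`buildRequest`, `buildWrite`, `buildActivate`, `send`) and the `Sender` type are hypothetical and not part of the package's API.

```typescript
// Hypothetical helpers; only the paths and JSON fields come from the curl examples.
interface HttpInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

interface BuiltRequest {
  url: string;
  init: HttpInit;
}

// Any fetch-compatible function; the global fetch in Node.js 20+ fits this shape.
type Sender = (url: string, init: HttpInit) => Promise<unknown>;

const DEFAULT_BASE = "http://localhost:8400"; // port used in the curl examples

function buildRequest(path: string, payload: object, base: string = DEFAULT_BASE): BuiltRequest {
  return {
    url: `${base}${path}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Mirrors the "Write a memory" example; the three scoring fields reuse its sample values.
function buildWrite(agentId: string, concept: string, content: string): BuiltRequest {
  return buildRequest("/memory/write", {
    agentId,
    concept,
    content,
    eventType: "causal",
    surprise: 0.5,
    causalDepth: 0.7,
  });
}

// Mirrors the "Recall" example.
function buildActivate(agentId: string, context: string): BuiltRequest {
  return buildRequest("/memory/activate", { agentId, context });
}

// Sends a built request through whatever sender the caller supplies.
async function send(req: BuiltRequest, sender: Sender): Promise<unknown> {
  return sender(req.url, req.init);
}
```

With a server running, `send(buildActivate("my-agent", "How should I handle errors?"), fetch)` would issue the recall call. The response shape is not documented in this README, so the sketch leaves it as `unknown`.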
249
- ---
250
-
251
- ## How It Works
252
-
253
- ### The Memory Lifecycle
254
-
255
- 1. **Write** — Salience scoring evaluates novelty, surprise, causal depth, and effort. High-salience memories go active; borderline ones enter staging; low-salience stored at reduced confidence for recall fallback. Near-duplicates reinforce existing memories instead of creating copies.
256
-
257
- 2. **Connect** — Vector embedding (BGE-small-en-v1.5, 384d). Temporal edges link to recent memories. Hebbian edges form between co-retrieved memories. Coref expansion resolves pronouns to entity names.
258
-
259
- 3. **Retrieve** - 10-phase pipeline: coref expansion + query expansion + dual BM25 (keyword-stripped + expanded) + semantic vectors + Rocchio pseudo-relevance feedback + ACT-R temporal decay (synaptic-tagged) + Hebbian boost + entity-bridge boost + graph walk + cross-encoder reranking + multi-channel agreement gate.
260
-
261
- 4. **Consolidate** — 7-phase sleep cycle: diameter-enforced clustering (prevents chaining), edge strengthening (access-weighted), cross-topic bridge formation (direct closest-pair), confidence-modulated decay (synaptic tagging extends half-life), synaptic homeostasis, cognitive forgetting, staging sweep. Embedding backfill ensures all memories are clusterable.
262
-
263
- 5. **Feedback** - Useful/not-useful signals adjust confidence, affecting retrieval rank and forgetting resistance.
264
-
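The three write outcomes in step 1 can be pictured as a simple threshold router. The sketch below is illustrative only: the README names the four inputs and the three outcomes, but the equal weighting, the threshold values, and every identifier here are assumptions rather than the package's actual scorer.

```typescript
// Hypothetical thresholds and weighting; the README gives the inputs and outcomes, not the numbers.
type Disposition = "active" | "staging" | "low-confidence";

interface SalienceInputs {
  novelty: number;     // 0..1
  surprise: number;    // 0..1
  causalDepth: number; // 0..1
  effort: number;      // 0..1
}

const ACTIVE_THRESHOLD = 0.6;  // assumed value
const STAGING_THRESHOLD = 0.3; // assumed value

function salienceScore(x: SalienceInputs): number {
  // Plain average as a placeholder for whatever weighting the real scorer uses.
  return (x.novelty + x.surprise + x.causalDepth + x.effort) / 4;
}

function route(x: SalienceInputs): Disposition {
  const s = salienceScore(x);
  if (s >= ACTIVE_THRESHOLD) return "active";
  if (s >= STAGING_THRESHOLD) return "staging";
  return "low-confidence"; // still stored, at reduced confidence, for fallback recall
}
```

The point of the third branch is that nothing is discarded at write time: a low score lowers confidence instead of dropping the memory.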
265
- ### Cognitive Foundations
266
-
267
- - **ACT-R activation decay** (Anderson 1993) — memories decay with time, strengthen with use. Synaptic tagging: heavily-accessed memories decay slower (log-scaled).
268
- - **Hebbian learning** — co-retrieved memories form stronger associative edges
269
- - **Complementary Learning Systems** — fast capture (salience + staging) + slow consolidation (sleep cycle)
270
- - **Synaptic homeostasis** — edge weight normalization prevents hub domination
271
- - **Forgetting as feature** — noise removal improves signal-to-noise for connected memories
272
- - **Diameter-enforced clustering** — prevents semantic chaining (e.g., physics->biophysics->cooking = 1 cluster)
273
- - **Multi-channel agreement** — OOD detection requires multiple retrieval channels to agree
274
-
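To make the first bullet concrete, here is the textbook ACT-R base-level activation, B = ln(Σ (now − t_j)^(−d)), written as a small function. It is a sketch of the published equation, not AWM's code: the function name, the time unit, and the default d = 0.5 (the conventional ACT-R value) are assumptions, and the log-scaled synaptic-tagging adjustment mentioned above is not modeled.

```typescript
// Textbook ACT-R base-level activation: B = ln( sum_j (now - t_j)^(-d) ).
// d = 0.5 is the conventional ACT-R default, not a value taken from AWM.
function baseLevelActivation(accessTimes: number[], now: number, d: number = 0.5): number {
  let sum = 0;
  for (const t of accessTimes) {
    const age = now - t;
    if (age > 0) {
      sum += Math.pow(age, -d); // older accesses contribute less
    }
  }
  return Math.log(sum); // -Infinity when there are no past accesses
}
```

Each past access adds a term to the sum, so repeated use raises activation, while every term shrinks as its age grows. That is the "decay with time, strengthen with use" behavior in one expression.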
275
- ---
276
-
277
- ## Architecture
278
-
279
- ```
280
- src/
281
- core/ # Cognitive primitives
282
- embeddings.ts - Local vector embeddings (BGE-small-en-v1.5, 384d)
283
- reranker.ts - Cross-encoder passage scoring (ms-marco-MiniLM)
284
- query-expander.ts - Synonym expansion (flan-t5-small)
285
- salience.ts - Write-time importance scoring (novelty + salience + reinforce-on-duplicate)
286
- decay.ts - ACT-R temporal activation decay
287
- hebbian.ts - Association strengthening/weakening
288
- logger.ts - Append-only activity log (data/awm.log)
289
- engine/ # Processing pipelines
290
- activation.ts - 10-phase retrieval pipeline (dual BM25, coref, agreement gate)
291
- consolidation.ts - 7-phase sleep cycle (diameter clustering, direct bridging, synaptic tagging)
292
- connections.ts - Discover links between memories
293
- staging.ts - Weak signal buffer (promote or discard)
294
- retraction.ts - Negative memory / corrections
295
- eviction.ts - Capacity enforcement
296
- hooks/
297
- sidecar.ts - Hook HTTP server (auto-checkpoint, stats, timer)
298
- storage/
299
- sqlite.ts - SQLite + FTS5 persistence layer
300
- api/
301
- routes.ts - HTTP endpoints (memory + task + system)
302
- mcp.ts - MCP server (14 tools, incognito support)
303
- cli.ts - CLI (setup, serve, hook config)
304
- index.ts - HTTP server entry point (auto-backup on startup)
305
- ```
306
-
307
- For detailed architecture including pipeline phases, database schema, and system diagrams, see [docs/architecture.md](docs/architecture.md).
308
-
309
- ---
310
-
311
- ## Testing & Evaluation
312
-
313
- ### Unit Tests
314
-
315
- ```bash
316
- npx vitest run # 68 tests
317
- ```
318
-
319
- ### Eval Suites
320
-
321
- | Command | What it tests | Score |
322
- |---------|--------------|-------|
323
- | `npm run test:edge` | 9 adversarial failure modes: context collapse, hub toxicity, flashbulb distortion, narcissistic interference, identity collision, contradiction trapping, bridge overshoot, noise benefit | **100% (34/34)** |
324
- | `npm run test:stress` | 500 memories, 100 sleep cycles, 10 clusters, 20 bridges, catastrophic forgetting, adversarial spam, recovery | **96.2% (50/52)** |
325
- | `npm run test:workday` | 43 memories across 4 projects, 14 recall challenges, noise filtering | **93.3%** |
326
- | `npm run test:ab` | AWM vs keyword baseline, 100 events, 24 questions | **AWM 85% (22/22 recall)** |
327
- | `npm run test:sleep` | 60 memories, 4 topic clusters, consolidation impact | **85.7% pre-sleep** |
328
- | `npm run test:tokens` | Token savings vs full conversation history | **67.5% accuracy, 55% savings** |
329
- | `npm run test:locomo` | LoCoMo conversational memory benchmark (1,986 QA pairs) | **28.2%** |
330
- | `npm run test:self` | Pipeline component checks | **97.4%** |
331
-
332
- ---
333
-
334
- ## Environment Variables
335
-
336
- | Variable | Default | Purpose |
337
- |----------|---------|---------|
338
- | `AWM_PORT` | `8400` | HTTP server port |
339
- | `AWM_DB_PATH` | `memory.db` | SQLite database path |
340
- | `AWM_AGENT_ID` | `claude-code` | Agent ID (memory namespace) |
341
- | `AWM_EMBED_MODEL` | `Xenova/bge-small-en-v1.5` | Embedding model (retrieval-optimized) |
342
- | `AWM_EMBED_DIMS` | `384` | Embedding dimensions |
343
- | `AWM_RERANKER_MODEL` | `Xenova/ms-marco-MiniLM-L-6-v2` | Reranker model |
344
- | `AWM_HOOK_PORT` | `8401` | Hook sidecar port |
345
- | `AWM_HOOK_SECRET` | *(none)* | Bearer token for hook auth |
346
- | `AWM_API_KEY` | *(none)* | Bearer token for HTTP API auth |
347
- | `AWM_INCOGNITO` | *(unset)* | Set to `1` to disable all tools |
348
-
349
- ## Tech Stack
350
-
351
- | Component | Technology |
352
- |-----------|-----------|
353
- | Language | TypeScript (ES2022, strict) |
354
- | Database | SQLite via better-sqlite3 + FTS5 |
355
- | HTTP | Fastify 5 |
356
- | MCP | @modelcontextprotocol/sdk |
357
- | ML Runtime | @huggingface/transformers (local ONNX) |
358
- | Embeddings | BGE-small-en-v1.5 (BAAI, retrieval-optimized, 384d) |
359
- | Reranker | ms-marco-MiniLM-L-6-v2 (cross-encoder) |
360
- | Query Expansion | flan-t5-small (synonym generation) |
361
- | Tests | Vitest 4 |
362
- | Validation | Zod 4 |
363
-
364
- All three ML models run locally via ONNX. No external API calls for retrieval. The entire system is a single SQLite file + a Node.js process.
365
-
366
- ## What's New in v0.5.4
367
-
368
- - **BGE-small-en-v1.5 embedding model** — retrieval-optimized, 60% higher cosine for related short texts
369
- - **Diameter-enforced clustering** - prevents semantic chaining, forms 10 distinct topic clusters
370
- - **Direct cross-topic bridging** - 20 bridges per consolidation cycle
371
- - **Dual BM25 retrieval** — keyword-stripped + expanded queries for better precision
372
- - **Multi-channel agreement gate** - OOD detection prevents off-topic results
373
- - **Reinforce-on-duplicate** - near-duplicate writes boost existing memories
374
- - **No-discard salience** — low-salience memories stored at reduced confidence (available for fallback recall)
375
- - **Synaptic tagging** - access count modulates decay (heavily-used memories persist longer)
376
- - **Coref expansion** - pronoun queries auto-expanded with recent entity names
377
- - **Async consolidation** - embedding backfill ensures all memories are clusterable
378
- - **Auto-backup** - database copied to backups/ on server startup
379
-
380
- See [CHANGELOG-0.5.4.md](CHANGELOG-0.5.4.md) for full details.
381
-
382
- ## Project Status
383
-
384
- AWM is in active development (v0.5.4). The core memory pipeline, consolidation system, and MCP integration are stable and used daily in production coding workflows.
385
-
386
- - Core retrieval and consolidation: **stable**
387
- - MCP tools and Claude Code integration: **stable**
388
- - Task management: **stable**
389
- - Hook sidecar and auto-checkpoint: **stable**
390
- - HTTP API: **stable** (for custom agents)
391
- - Cognitive consolidation (clustering, bridging): **stable** (v0.5.4)
392
-
393
- See [CHANGELOG-0.5.4.md](CHANGELOG-0.5.4.md) for version history.
394
-
395
- ---
396
-
397
- ## License
398
-
399
- Apache 2.0 - see [LICENSE](LICENSE) and [NOTICE](NOTICE).
1
+ # AgentWorkingMemory (AWM)
2
+
3
+ **Persistent working memory for AI agents.**
4
+
5
+ AWM helps agents retain important project knowledge across conversations and sessions. Instead of storing everything and retrieving by similarity alone, it filters for salience, builds associative links between related memories, and periodically consolidates useful knowledge while letting noise fade.
6
+
7
+ Use it through Claude Code via MCP or as a local HTTP service for custom agents. Everything runs locally: SQLite + ONNX models + Node.js. No cloud, no API keys.
8
+
9
+ ### Without AWM
10
+ - Agent forgets earlier architecture decision
11
+ - Suggests Redux after project standardized on Zustand
12
+ - Repeats discussion already settled three days ago
13
+ - Every new conversation starts from scratch
14
+
15
+ ### With AWM
16
+ - Recalls prior state-management decision and rationale
17
+ - Surfaces related implementation patterns from past sessions
18
+ - Continues work without re-asking for context
19
+ - Gets more consistent the longer you use it
20
+
21
+ ---
22
+
23
+ ## Quick Start
24
+
25
+ **Node.js 20+** required — check with `node --version`.
26
+
27
+ ```bash
28
+ npm install -g agent-working-memory
29
+ awm setup --global
30
+ ```
31
+
32
+ Restart Claude Code. That's it — 14 memory tools appear automatically.
33
+
34
+ First conversation will be ~30 seconds slower while ML models download (~200MB total, cached locally). After that, everything runs on your machine.
35
+
36
+ > For isolated memory per folder, see [Separate Memory Pools](#separate-memory-pools). For team onboarding, see [docs/quickstart.md](docs/quickstart.md).
37
+
38
+ ---
39
+
40
+ ## Who this is for
41
+
42
+ - **Long-running coding agents** that need cross-session project knowledge
43
+ - **Multi-agent workflows** where specialized agents share a common memory
44
+ - **Local-first setups** where cloud memory is not acceptable
45
+ - **Teams using Claude Code** who want persistent context without manual notes
46
+
47
+ ## What this is not
48
+
49
+ - Not a chatbot UI
50
+ - Not a hosted SaaS
51
+ - Not a generic vector database
52
+ - Not a replacement for your source of truth (code, docs, tickets)
53
+
54
+ ---
55
+
56
+ ## Why it's different
57
+
58
+ Most "memory for AI" projects are vector databases with a retrieval wrapper. AWM goes further:
59
+
60
+ | | Typical RAG / Vector Store | AWM |
61
+ |---|---|---|
62
+ | **Storage** | Everything | Salience-filtered with low-confidence fallback (novel events go active, borderline enter staging, low-salience stored at reduced confidence) |
63
+ | **Retrieval** | Cosine similarity | 10-phase pipeline: dual BM25 (keyword + expanded) + vectors + reranking + graph walk + decay + coref expansion |
64
+ | **Connections** | None | Hebbian edges that strengthen when memories co-activate |
65
+ | **Over time** | Grows forever, gets noisier | Consolidation: diameter-enforced clustering, cross-topic bridges, synaptic-tagged decay |
66
+ | **Forgetting** | Manual cleanup | Cognitive forgetting: unused memories fade, reinforced knowledge persists (access-count modulated) |
67
+ | **Feedback** | None | Useful/not-useful signals tune confidence and retrieval rank |
68
+ | **Correction** | Delete and re-insert | Retraction: wrong memories invalidated, corrections linked, penalties propagate |
69
+ | **Noise rejection** | None | Multi-channel agreement gate: requires 2+ retrieval channels to agree before returning results |
70
+ | **Duplicates** | Stored repeatedly | Reinforce-on-duplicate: near-exact matches boost existing memory instead of creating copies |
71
+
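The "Noise rejection" row can be read as a per-memory vote across retrieval channels. The following sketch shows one plausible interpretation, assuming each channel reports the ids it retrieved. The channel names, the `ChannelHit` shape, and the function name are hypothetical; only the "2+ channels must agree" rule comes from the table.

```typescript
// Hypothetical channel names and hit format; only the "2+ channels" rule is from the README.
type Channel = "bm25" | "vector" | "graph";

interface ChannelHit {
  memoryId: string;
  channel: Channel;
}

function agreementGate(hits: ChannelHit[], minChannels: number = 2): string[] {
  // Collect the distinct channels that surfaced each memory.
  const channelsById = new Map<string, Set<Channel>>();
  for (const hit of hits) {
    const seen = channelsById.get(hit.memoryId) ?? new Set<Channel>();
    seen.add(hit.channel);
    channelsById.set(hit.memoryId, seen);
  }

  // Keep only memories that enough independent channels agree on.
  const passed: string[] = [];
  channelsById.forEach((channels, id) => {
    if (channels.size >= minChannels) passed.push(id);
  });
  return passed; // an empty list means nothing cleared the gate
}
```

Under this reading, an off-topic query that only matches by stray keywords, or only by loose vector similarity, returns nothing instead of a weak best guess.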
72
+ The design is based on cognitive science — ACT-R activation decay, Hebbian learning, complementary learning systems, synaptic homeostasis, and synaptic tagging — rather than ad-hoc heuristics. See [How It Works](#how-it-works) and [docs/cognitive-model.md](docs/cognitive-model.md) for details.
73
+
74
+ ---
75
+
76
+ ## Benchmarks (v0.6.0)
77
+
78
+ ### Eval Harness (new in v0.6.0)
79
+
80
+ | Suite | Score | Threshold | What it tests |
81
+ |-------|-------|-----------|---------------|
82
+ | Retrieval | **Recall@5 = 0.800** | >= 0.80 | 200 facts, 50 queries - BM25 + vector + reranker pipeline precision |
83
+ | Associative | **success@10 = 1.000** | >= 0.70 | 20 multi-hop causal chains - graph walk finds non-obvious connections |
84
+ | Redundancy | **dedup F1 = 0.966** | >= 0.80 | 50 clusters × 4 paraphrases — consolidation removes duplicates correctly |
85
+ | Temporal | **Spearman = 0.932** | >= 0.75 | 25 facts with controlled age/access - ACT-R decay ranking accuracy |
86
+
87
+ Key finding: **consolidation improves retrieval by 30%** — post-consolidation recall (0.950) exceeds pre-consolidation (0.650). Removing redundant noise helps ranking.
88
+
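For readers unfamiliar with the Recall@5 metric in the table above, here is the usual definition as a sketch. The harness's exact scoring is not shown in this diff, so the function names and the per-query averaging are assumptions. Note also that the quoted recall figures differ by 0.30 in absolute terms (0.950 vs 0.650), which is roughly a 46% relative gain.

```typescript
// Recall@k for one query: fraction of the relevant ids that appear in the top k results.
function recallAtK(ranked: string[], relevant: string[], k: number): number {
  if (relevant.length === 0) return 0;
  const topK = new Set(ranked.slice(0, k));
  const found = relevant.filter((id) => topK.has(id)).length;
  return found / relevant.length;
}

// Mean over a batch of queries, the usual way to report one suite-level number.
function meanRecallAtK(runs: { ranked: string[]; relevant: string[] }[], k: number): number {
  if (runs.length === 0) return 0;
  const total = runs.reduce((acc, r) => acc + recallAtK(r.ranked, r.relevant, k), 0);
  return total / runs.length;
}
```

Because the denominator is the number of relevant items, removing near-duplicate memories can raise this score: duplicates no longer occupy top-k slots that distinct relevant memories could fill.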
89
+ ### Full Test Suite
90
+
91
+ | Command | Score | What it tests |
92
+ |---------|-------|---------------|
93
+ | `npm run eval` | **4/4 suites pass** | Retrieval, associative, redundancy, temporal benchmarks with ablation support |
94
+ | `npm run test:run` | **77/77 tests** | Unit tests: salience, decay, hebbian, supersession, coordination |
95
+ | `npm run test:mcp` | **5/5 pass** | MCP protocol: write, recall, feedback, retract, stats |
96
+ | `npm run test:self` | **94.1% EXCELLENT** | Pipeline component checks across all cognitive subsystems |
97
+ | `npm run test:edge` | **All pass** | 9 failure modes: narcissistic interference, identity collision, contradiction trapping, bridge overshoot, noise forgetting |
98
+ | `npm run test:stress` | **96.2% (50/52)** | 500 memories, 100 sleep cycles, catastrophic forgetting, adversarial spam, recovery |
99
+ | `npm run test:workday` | **93.3% EXCELLENT** | 43 memories across 4 projects, cross-cutting queries, noise filtering |
100
+ | `npm run test:ab` | **AWM 20/22 vs Baseline 18/22** | AWM outperforms keyword baseline on architecture + testing topics |
101
+ | `npm run test:sleep` | **71.4%** | 60 memories, 4 topic clusters, consolidation impact across 3 cycles |
102
+ | `npm run test:tokens` | **56.3% savings, 2.3x efficiency** | Memory-guided context vs full history, keyword accuracy 72.5% |
103
+ | `npm run test:pilot` | **14/15 pass** | Production-like queries with noise rejection (5/5 noise rejected) |
104
+ | `npm run test:locomo` | **28.2%** | Industry-standard LoCoMo conversational memory benchmark (1,986 QA pairs) |
105
+
106
+ ### Consolidation Health (v0.6.0)
107
+
108
+ | Metric | Value |
109
+ |--------|-------|
110
+ | Topic clusters formed | **10** per consolidation cycle |
111
+ | Cross-topic bridges | **20** in first cycle |
112
+ | Edges strengthened | **135** per cycle (access-weighted) |
113
+ | Graph size at scale | **3,000-4,500 edges** (500 memories) |
114
+ | Recall after 100 cycles | **90%** stable |
115
+ | Catastrophic forgetting survival | **5/5** (100%) |
116
+ | Post-dedup retrieval | **0.950** (consolidation improves recall) |
117
+
118
+ All evals are reproducible. See [Testing & Evaluation](#testing--evaluation).
119
+
120
+ ---
121
+
122
+ ## Features
123
+
124
+ ### Memory Tools (14)
125
+
126
+ | Tool | Purpose |
127
+ |------|---------|
128
+ | `memory_write` | Store a memory (salience filter + reinforce-on-duplicate) |
129
+ | `memory_recall` | Retrieve relevant memories by context (dual BM25 + coref expansion) |
130
+ | `memory_feedback` | Report whether a recalled memory was useful |
131
+ | `memory_retract` | Invalidate a wrong memory with optional correction |
132
+ | `memory_supersede` | Replace outdated memory with current version |
133
+ | `memory_stats` | View memory health metrics and activity |
134
+ | `memory_checkpoint` | Save execution state (survives context compaction) |
135
+ | `memory_restore` | Recover state + relevant context at session start |
136
+ | `memory_task_add` | Create a prioritized task |
137
+ | `memory_task_update` | Change task status/priority |
138
+ | `memory_task_list` | List tasks by status |
139
+ | `memory_task_next` | Get the highest-priority actionable task |
140
+ | `memory_task_begin` | Start a task — auto-checkpoints and recalls context |
141
+ | `memory_task_end` | End a task — writes summary and checkpoints |
142
+
143
+ ### Separate Memory Pools
144
+
145
+ By default, all projects share one memory pool. For isolated pools per folder, place a `.mcp.json` in each parent folder with a different `AWM_AGENT_ID`:
146
+
147
+ ```
148
+ C:\Users\you\work\.mcp.json -> AWM_AGENT_ID: "work"
149
+ C:\Users\you\personal\.mcp.json -> AWM_AGENT_ID: "personal"
150
+ ```
151
+
152
+ Claude Code uses the closest `.mcp.json` ancestor. Same database, isolation by agent ID.
153
+
154
+ ### Incognito Mode
155
+
156
+ ```bash
157
+ AWM_INCOGNITO=1 claude
158
+ ```
159
+
160
+ AWM registers zero memory tools, so Claude doesn't see memory at all. Other tools and MCP servers work normally.
161
+
162
+ ### Auto-Checkpoint Hooks
163
+
164
+ Installed by `awm setup --global`:
165
+
166
+ - **Stop** — reminds Claude to write/recall after each response
167
+ - **PreCompact** — auto-checkpoints before context compression
168
+ - **SessionEnd** — auto-checkpoints and consolidates on close
169
+ - **15-min timer** — silent auto-checkpoint while session is active
170
+
171
+ ### Auto-Backup
172
+
173
+ On startup, the HTTP server automatically writes a timestamped copy of the database to a `backups/` directory. Cheap insurance against data loss.
174
+
175
+ ### Activity Log
176
+
177
+ ```bash
178
+ tail -f "$(npm root -g)/agent-working-memory/data/awm.log"
179
+ ```
180
+
181
+ Streams events in real time: writes, recalls, reinforcements, checkpoints, consolidation, hook events.
182
+
183
+ ### Activity Stats
184
+
185
+ ```bash
186
+ curl http://127.0.0.1:8401/stats
187
+ ```
188
+
189
+ Returns daily counts: `{"writes": 8, "recalls": 9, "hooks": 3, "total": 25}`
190
+
191
+ ---
192
+
193
+ ## Memory Invocation Strategy
194
+
195
+ AWM combines deterministic hooks for guaranteed memory operations at lifecycle transitions with agent-directed usage during active work.
196
+
197
+ ### Deterministic triggers (always happen)
198
+
199
+ | Event | Action |
200
+ |-------|--------|
201
+ | Session start | `memory_restore`: recover state + recall context |
202
+ | Pre-compaction | Auto-checkpoint via hook sidecar |
203
+ | Session end | Auto-checkpoint + full consolidation |
204
+ | Every 15 min | Silent auto-checkpoint (if active) |
205
+ | Task start | `memory_task_begin` — checkpoint + recall |
206
+ | Task end | `memory_task_end` — summary + checkpoint |
207
+
208
+ ### Agent-directed triggers (when these situations occur)
209
+
210
+ **Write memory when:**
211
+ - A project decision is made or changed
212
+ - A root cause is discovered
213
+ - A reusable implementation pattern is established
214
+ - A preference, constraint, or requirement is clarified
215
+ - A prior assumption is found to be wrong
216
+
217
+ **Recall memory when:**
218
+ - Starting work on a new task or subsystem
219
+ - Re-entering code you haven't touched recently
220
+ - After context compaction
221
+ - After a failed attempt (check if there's prior knowledge)
222
+ - Before refactoring or making architectural changes
223
+
224
+ **Retract when:**
225
+ - A stored memory turns out to be wrong or outdated
226
+
227
+ **Feedback when:**
228
+ - A recalled memory was used (useful) or irrelevant (not useful)
229
+
230
+ ---
231
+
232
+ ## HTTP API
233
+
234
+ For custom agents, scripts, or non-Claude-Code workflows:
235
+
236
+ ```bash
237
+ awm serve # From npm install
238
+ npx tsx src/index.ts # From source
239
+ ```
240
+
241
+ Write a memory:
242
+
243
+ ```bash
244
+ curl -X POST http://localhost:8400/memory/write \
245
+ -H "Content-Type: application/json" \
246
+ -d '{
247
+ "agentId": "my-agent",
248
+ "concept": "Express error handling",
249
+ "content": "Use centralized error middleware as the last app.use()",
250
+ "eventType": "causal",
251
+ "surprise": 0.5,
252
+ "causalDepth": 0.7
253
+ }'
254
+ ```
255
+
256
+ Recall:
257
+
258
+ ```bash
259
+ curl -X POST http://localhost:8400/memory/activate \
260
+ -H "Content-Type: application/json" \
261
+ -d '{
262
+ "agentId": "my-agent",
263
+ "context": "How should I handle errors in my Express API?"
264
+ }'
265
+ ```
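The two curl calls above can also be issued from code. The following is a minimal TypeScript sketch, not AWM's own client: the helper names (`prepare`, `buildWriteRequest`, `buildActivateRequest`, `send`) are hypothetical, the response shape is left as `unknown` because this README does not document it, and the `Authorization: Bearer` header is an assumption based on the `AWM_API_KEY` description.

```typescript
// Hypothetical helpers that mirror the curl examples; not part of AWM itself.
interface WriteBody {
  agentId: string;
  concept: string;
  content: string;
  eventType?: string;
  surprise?: number;
  causalDepth?: number;
}

interface ActivateBody {
  agentId: string;
  context: string;
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build a JSON POST request without sending it, so it can be inspected or logged.
function prepare(baseUrl: string, path: string, body: object, apiKey?: string): PreparedRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Assumption: when AWM_API_KEY is set, the server expects a standard Bearer header.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return { url: `${baseUrl}${path}`, init: { method: "POST", headers, body: JSON.stringify(body) } };
}

function buildWriteRequest(baseUrl: string, body: WriteBody, apiKey?: string): PreparedRequest {
  return prepare(baseUrl, "/memory/write", body, apiKey);
}

function buildActivateRequest(baseUrl: string, body: ActivateBody, apiKey?: string): PreparedRequest {
  return prepare(baseUrl, "/memory/activate", body, apiKey);
}

// Send a prepared request using the global fetch available in Node 18+.
async function send(req: PreparedRequest): Promise<unknown> {
  const res = await fetch(req.url, req.init);
  if (!res.ok) throw new Error(`AWM request failed with status ${res.status}`);
  return res.json();
}
```

With `awm serve` running, `await send(buildActivateRequest("http://localhost:8400", { agentId: "my-agent", context: "How should I handle errors in my Express API?" }))` would perform the recall shown above. Separating request construction from sending keeps the payload easy to test without a live server.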
266
+
267
+ ---
268
+
269
+ ## How It Works
270
+
271
+ ### The Memory Lifecycle
272
+
273
+ 1. **Write** — Salience scoring evaluates novelty, surprise, causal depth, and effort. High-salience memories go active; borderline ones enter staging; low-salience stored at reduced confidence for recall fallback. Near-duplicates reinforce existing memories instead of creating copies.
274
+
275
+ 2. **Connect** — Vector embedding (BGE-small-en-v1.5, 384d). Temporal edges link to recent memories. Hebbian edges form between co-retrieved memories. Coref expansion resolves pronouns to entity names.
276
+
277
+ 3. **Retrieve** — 10-phase pipeline: coref expansion + query expansion + dual BM25 (keyword-stripped + expanded) + semantic vectors + Rocchio pseudo-relevance feedback + ACT-R temporal decay (synaptic-tagged) + Hebbian boost + entity-bridge boost + graph walk + cross-encoder reranking + multi-channel agreement gate.
278
+
279
+ 4. **Consolidate** — 7-phase sleep cycle: diameter-enforced clustering (prevents chaining), edge strengthening (access-weighted), cross-topic bridge formation (direct closest-pair), confidence-modulated decay (synaptic tagging extends half-life), synaptic homeostasis, cognitive forgetting, staging sweep. Embedding backfill ensures all memories are clusterable.
280
+
281
+ 5. **Feedback** — Useful/not-useful signals adjust confidence, affecting retrieval rank and forgetting resistance.
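The write step's three-way routing (active, staging, reduced confidence) can be illustrated with a small sketch. The weights and thresholds below are hypothetical values chosen for illustration only; AWM's actual scoring in `salience.ts` is not specified in this README and may combine the signals differently.

```typescript
// Illustrative only: weights and thresholds are assumptions, not AWM's real values.
type Disposition = "active" | "staging" | "low-confidence";

interface SalienceSignals {
  novelty: number;     // each signal is expected in the range 0..1
  surprise: number;
  causalDepth: number;
  effort: number;
}

const WEIGHTS: SalienceSignals = { novelty: 0.4, surprise: 0.2, causalDepth: 0.2, effort: 0.2 };
const ACTIVE_THRESHOLD = 0.6;  // hypothetical
const STAGING_THRESHOLD = 0.3; // hypothetical

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Weighted sum of the four signals named in the lifecycle description.
function salienceScore(s: SalienceSignals): number {
  return (
    WEIGHTS.novelty * clamp01(s.novelty) +
    WEIGHTS.surprise * clamp01(s.surprise) +
    WEIGHTS.causalDepth * clamp01(s.causalDepth) +
    WEIGHTS.effort * clamp01(s.effort)
  );
}

// High scores go active, borderline scores are staged, the rest are kept at low confidence.
function routeMemory(s: SalienceSignals): Disposition {
  const score = salienceScore(s);
  if (score >= ACTIVE_THRESHOLD) return "active";
  if (score >= STAGING_THRESHOLD) return "staging";
  return "low-confidence";
}
```

Note that the low-scoring branch still stores the memory rather than discarding it, matching the "recall fallback" behavior described in step 1.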
282
+
283
+ ### Cognitive Foundations
284
+
285
+ - **ACT-R activation decay** (Anderson 1993): memories decay with time and strengthen with use. Synaptic tagging: heavily accessed memories decay more slowly (log-scaled).
286
+ - **Hebbian learning** — co-retrieved memories form stronger associative edges
287
+ - **Complementary Learning Systems** — fast capture (salience + staging) + slow consolidation (sleep cycle)
288
+ - **Synaptic homeostasis**: edge weight normalization prevents hub domination
289
+ - **Forgetting as feature** — noise removal improves signal-to-noise for connected memories
290
+ - **Diameter-enforced clustering**: prevents semantic chaining (e.g., physics -> biophysics -> cooking collapsing into 1 cluster)
291
+ - **Multi-channel agreement**: out-of-distribution (OOD) detection requires multiple retrieval channels to agree
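For readers unfamiliar with ACT-R, the textbook base-level activation is B = ln(Σ t_j^(-d)), where t_j is the time since the j-th access and d is a decay rate (0.5 is the conventional default). The sketch below implements only that standard equation; the function name is hypothetical, and AWM's `decay.ts` may differ, in particular it does not model the log-scaled synaptic tagging mentioned above.

```typescript
// Standard ACT-R base-level activation; a sketch, not AWM's decay.ts.
// accessAgesSeconds: time elapsed since each past access of the memory.
function baseLevelActivation(accessAgesSeconds: number[], decay: number = 0.5): number {
  // A memory that was never accessed has no activation to speak of.
  if (accessAgesSeconds.length === 0) return -Infinity;
  let sum = 0;
  for (const age of accessAgesSeconds) {
    // Clamp very recent accesses to 1 second so the power term stays finite.
    sum += Math.pow(Math.max(age, 1), -decay);
  }
  return Math.log(sum);
}
```

Two properties follow directly from the formula: more accesses raise activation (the sum grows), and older accesses contribute less (each term shrinks with age). That is the "decay with time, strengthen with use" behavior in the first bullet.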
292
+
293
+ ---
294
+
295
+ ## Architecture
296
+
297
+ ```
298
+ src/
299
+ core/ # Cognitive primitives
300
+ embeddings.ts - Local vector embeddings (BGE-small-en-v1.5, 384d)
301
+ reranker.ts - Cross-encoder passage scoring (ms-marco-MiniLM)
302
+ query-expander.ts - Synonym expansion (flan-t5-small)
303
+ salience.ts - Write-time importance scoring (novelty + salience + reinforce-on-duplicate)
304
+ decay.ts - ACT-R temporal activation decay
305
+ hebbian.ts - Association strengthening/weakening
306
+ logger.ts - Append-only activity log (data/awm.log)
307
+ engine/ # Processing pipelines
308
+ activation.ts - 10-phase retrieval pipeline (dual BM25, coref, agreement gate)
309
+ consolidation.ts - 7-phase sleep cycle (diameter clustering, direct bridging, synaptic tagging)
310
+ connections.ts - Discover links between memories
311
+ staging.ts - Weak signal buffer (promote or discard)
312
+ retraction.ts - Negative memory / corrections
313
+ eviction.ts - Capacity enforcement
314
+ hooks/
315
+ sidecar.ts - Hook HTTP server (auto-checkpoint, stats, timer)
316
+ storage/
317
+ sqlite.ts - SQLite + FTS5 persistence layer
318
+ api/
319
+ routes.ts - HTTP endpoints (memory + task + system)
320
+ mcp.ts - MCP server (14 tools, incognito support)
321
+ cli.ts - CLI (setup, serve, hook config)
322
+ index.ts - HTTP server entry point (auto-backup on startup)
323
+ ```
324
+
325
+ For detailed architecture including pipeline phases, database schema, and system diagrams, see [docs/architecture.md](docs/architecture.md).
326
+
327
+ ---
328
+
329
+ ## Testing & Evaluation
330
+
331
+ ### Unit Tests
332
+
333
+ ```bash
334
+ npx vitest run # 77 tests (salience, decay, hebbian, supersession)
335
+ ```
336
+
337
+ ### Eval Harness (v0.6.0)
338
+
339
+ ```bash
340
+ npm run eval # All 4 benchmark suites
341
+ npm run eval -- --suite=retrieval # Single suite
342
+ npm run eval -- --bm25-only # Ablation: BM25 only
343
+ npm run eval -- --no-graph-walk # Ablation: disable graph walk
344
+ ```
345
+
346
+ Suites: retrieval (Recall@5), associative (multi-hop), redundancy (dedup F1), temporal (Spearman vs ACT-R). Ablation flags isolate each pipeline component's contribution.
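As a reference for the retrieval suite's metric, here is the conventional definition of Recall@k in TypeScript. This is a sketch of the standard formula; the function name is hypothetical and the harness's exact implementation is not shown in this README.

```typescript
// Conventional Recall@k: fraction of relevant items that appear in the top k results.
function recallAtK(retrievedIds: string[], relevantIds: Set<string>, k: number = 5): number {
  if (relevantIds.size === 0) return 0; // undefined in theory; 0 is a common convention
  // Deduplicate the top-k slice so a repeated ID is not counted twice.
  const topK = Array.from(new Set(retrievedIds.slice(0, k)));
  const hits = topK.filter((id) => relevantIds.has(id)).length;
  return hits / relevantIds.size;
}
```

For example, if two memories are relevant and only one of them ranks in the top five, `recallAtK` returns 0.5.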
347
+
348
+ ### Full Test Suite
349
+
350
+ ```bash
351
+ npm run test:mcp # MCP protocol smoke test (5/5)
352
+ npm run test:self # Pipeline component checks (94.1%)
353
+ npm run test:edge # 9 adversarial failure modes
354
+ npm run test:stress # 500 memories, 100 consolidation cycles (96.2%)
355
+ npm run test:workday # 4-session production simulation (93.3%)
356
+ npm run test:ab # AWM vs baseline comparison
357
+ npm run test:sleep # Consolidation impact measurement
358
+ npm run test:tokens # Token savings analysis (56.3% savings)
359
+ npm run test:pilot # Production-like query validation (14/15)
360
+ npm run test:locomo # LoCoMo industry benchmark (28.2%)
361
+ ```
362
+
363
+ ---
364
+
365
+ ## Environment Variables
366
+
367
+ | Variable | Default | Purpose |
368
+ |----------|---------|---------|
369
+ | `AWM_PORT` | `8400` | HTTP server port |
370
+ | `AWM_DB_PATH` | `memory.db` | SQLite database path |
371
+ | `AWM_AGENT_ID` | `claude-code` | Agent ID (memory namespace) |
372
+ | `AWM_EMBED_MODEL` | `Xenova/bge-small-en-v1.5` | Embedding model (retrieval-optimized) |
373
+ | `AWM_EMBED_DIMS` | `384` | Embedding dimensions |
374
+ | `AWM_RERANKER_MODEL` | `Xenova/ms-marco-MiniLM-L-6-v2` | Reranker model |
375
+ | `AWM_HOOK_PORT` | `8401` | Hook sidecar port |
376
+ | `AWM_HOOK_SECRET` | *(none)* | Bearer token for hook auth |
377
+ | `AWM_API_KEY` | *(none)* | Bearer token for HTTP API auth |
378
+ | `AWM_INCOGNITO` | *(unset)* | Set to `1` to disable all tools |
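A script that talks to AWM may want to resolve these variables the same way. The sketch below reads an environment object and applies the defaults from the table; the `AwmConfig` type and `loadConfig` function are hypothetical names for illustration, and AWM's own parsing and validation may differ.

```typescript
// Hypothetical config loader using the defaults listed in the table above.
interface AwmConfig {
  port: number;
  dbPath: string;
  agentId: string;
  embedModel: string;
  embedDims: number;
  rerankerModel: string;
  hookPort: number;
  hookSecret?: string;
  apiKey?: string;
  incognito: boolean;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back to the default when the variable is unset or empty.
function intFromEnv(value: string | undefined, fallback: number, name: string): number {
  if (value === undefined || value === "") return fallback;
  const n = Number(value);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${value}"`);
  }
  return n;
}

function loadConfig(env: Env): AwmConfig {
  return {
    port: intFromEnv(env.AWM_PORT, 8400, "AWM_PORT"),
    dbPath: env.AWM_DB_PATH || "memory.db",
    agentId: env.AWM_AGENT_ID || "claude-code",
    embedModel: env.AWM_EMBED_MODEL || "Xenova/bge-small-en-v1.5",
    embedDims: intFromEnv(env.AWM_EMBED_DIMS, 384, "AWM_EMBED_DIMS"),
    rerankerModel: env.AWM_RERANKER_MODEL || "Xenova/ms-marco-MiniLM-L-6-v2",
    hookPort: intFromEnv(env.AWM_HOOK_PORT, 8401, "AWM_HOOK_PORT"),
    hookSecret: env.AWM_HOOK_SECRET || undefined,
    apiKey: env.AWM_API_KEY || undefined,
    // The table documents "1" as the value that disables all tools.
    incognito: env.AWM_INCOGNITO === "1",
  };
}

// Typical use in Node: const config = loadConfig(process.env);
```

Passing the environment in as a parameter, rather than reading `process.env` inside the function, keeps the loader easy to test.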
379
+
380
+ ## Tech Stack
381
+
382
+ | Component | Technology |
383
+ |-----------|-----------|
384
+ | Language | TypeScript (ES2022, strict) |
385
+ | Database | SQLite via better-sqlite3 + FTS5 |
386
+ | HTTP | Fastify 5 |
387
+ | MCP | @modelcontextprotocol/sdk |
388
+ | ML Runtime | @huggingface/transformers (local ONNX) |
389
+ | Embeddings | BGE-small-en-v1.5 (BAAI, retrieval-optimized, 384d) |
390
+ | Reranker | ms-marco-MiniLM-L-6-v2 (cross-encoder) |
391
+ | Query Expansion | flan-t5-small (synonym generation) |
392
+ | Tests | Vitest 4 |
393
+ | Validation | Zod 4 |
394
+
395
+ All three ML models run locally via ONNX. No external API calls for retrieval. The entire system is a single SQLite file + a Node.js process.
396
+
397
+ ## What's New in v0.6.0
398
+
399
+ - **Memory taxonomy**: memories are classified as `episodic`, `semantic`, `procedural`, or `unclassified`. Auto-classified on write. Filter by type on recall.
400
+ - **Query-adaptive retrieval** — pipeline adapts to query type: `targeted` (boost exact matches), `exploratory` (wider graph walk, more semantic), `balanced` (default).
401
+ - **Decision propagation** — decisions automatically broadcast to coordination layer for cross-agent discovery. Peer decisions shown on `memory_restore`.
402
+ - **Completion verification** — workers must provide proof of work (result summary, optional commit SHA) when completing assignments.
403
+ - **Task priority & dependencies** — priority field (0-10) and `blocked_by` for task ordering.
404
+ - **Eval harness** — `npm run eval` benchmarks retrieval, associative, redundancy, and temporal performance with ablation mode.
405
+ - **DB hardening** — busy_timeout, integrity check on startup, hot backups every 10 min, WAL checkpoint on shutdown.
406
+ - **Consolidation recall fix** — redundancy pruning transfers associations and tags to survivor memory.
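To make the interaction between task priority and `blocked_by` concrete, here is one plausible way a "next actionable task" could be selected. Everything in this sketch is an assumption: the `Task` shape, the status names, the camelCase `blockedBy` field, and the rule that a larger number means higher priority are illustrative and not taken from AWM's source.

```typescript
// Hypothetical task model; field names and status values are assumptions.
type TaskStatus = "pending" | "in_progress" | "done";

interface Task {
  id: string;
  priority: number;    // 0-10; assumed here that larger means more urgent
  status: TaskStatus;
  blockedBy: string[]; // IDs of tasks that must be done first
}

// Pick the highest-priority pending task whose blockers are all done.
function nextTask(tasks: Task[]): Task | undefined {
  const doneIds = new Set(tasks.filter((t) => t.status === "done").map((t) => t.id));
  const actionable = tasks.filter(
    (t) => t.status === "pending" && t.blockedBy.every((id) => doneIds.has(id))
  );
  // filter() returned a fresh array, so sorting it does not mutate the caller's list.
  actionable.sort((a, b) => b.priority - a.priority);
  return actionable[0];
}
```

Under these assumptions, a priority-9 task blocked by an unfinished priority-3 task is skipped, and the priority-3 task is returned first.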
407
+
408
+ See [CHANGELOG.md](CHANGELOG.md) for full details.
409
+
410
+ ## Project Status
411
+
412
+ AWM is in active development (v0.6.0). The core memory pipeline, consolidation system, multi-agent coordination, and MCP integration are stable and used daily in production coding workflows.
413
+
414
+ - Core retrieval and consolidation: **stable**
415
+ - MCP tools and Claude Code integration: **stable**
416
+ - Multi-agent coordination: **stable** (v0.6.0)
417
+ - Task management: **stable**
418
+ - Hook sidecar and auto-checkpoint: **stable**
419
+ - HTTP API: **stable** (for custom agents)
420
+ - Eval harness: **stable** (v0.6.0)
421
+
422
+ See [CHANGELOG.md](CHANGELOG.md) for version history.
423
+
424
+ ---
425
+
426
+ ## License
427
+
428
+ Apache 2.0 — see [LICENSE](LICENSE) and [NOTICE](NOTICE).