agent-working-memory 0.5.3 → 0.5.5

package/README.md CHANGED
@@ -1,358 +1,399 @@
1
- # AgentWorkingMemory (AWM)
2
-
3
- **Persistent working memory for AI agents.**
4
-
5
- AWM helps agents retain important project knowledge across conversations and sessions. Instead of storing everything and retrieving by similarity alone, it filters for salience, builds associative links between related memories, and periodically consolidates useful knowledge while letting noise fade.
6
-
7
- Use it through Claude Code via MCP or as a local HTTP service for custom agents. Everything runs locally: SQLite + ONNX models + Node.js. No cloud, no API keys.
8
-
9
- ### Without AWM
10
- - Agent forgets earlier architecture decision
11
- - Suggests Redux after project standardized on Zustand
12
- - Repeats discussion already settled three days ago
13
- - Every new conversation starts from scratch
14
-
15
- ### With AWM
16
- - Recalls prior state-management decision and rationale
17
- - Surfaces related implementation patterns from past sessions
18
- - Continues work without re-asking for context
19
- - Gets more consistent the longer you use it
20
-
21
- ---
22
-
23
- ## Quick Start
24
-
25
- **Node.js 20+** required — check with `node --version`.
26
-
27
- ```bash
28
- npm install -g agent-working-memory
29
- awm setup --global
30
- ```
31
-
32
- Restart Claude Code. That's it — 13 memory tools appear automatically.
33
-
34
- First conversation will be ~30 seconds slower while ML models download (~124MB, cached locally). After that, everything runs on your machine.
35
-
36
- > For isolated memory per folder, see [Separate Memory Pools](#separate-memory-pools). For team onboarding, see [docs/quickstart.md](docs/quickstart.md).
37
-
38
- ---
39
-
40
- ## Who this is for
41
-
42
- - **Long-running coding agents** that need cross-session project knowledge
43
- - **Multi-agent workflows** where specialized agents share a common memory
44
- - **Local-first setups** where cloud memory is not acceptable
45
- - **Teams using Claude Code** who want persistent context without manual notes
46
-
47
- ## What this is not
48
-
49
- - Not a chatbot UI
50
- - Not a hosted SaaS
51
- - Not a generic vector database
52
- - Not a replacement for your source of truth (code, docs, tickets)
53
-
54
- ---
55
-
56
- ## Why it's different
57
-
58
- Most "memory for AI" projects are vector databases with a retrieval wrapper. AWM goes further:
59
-
60
- | | Typical RAG / Vector Store | AWM |
61
- |---|---|---|
62
- | **Storage** | Everything | Only novel, salient events (77% filtered at write time) |
63
- | **Retrieval** | Cosine similarity | 10-phase pipeline: BM25 + vectors + reranking + graph walk + decay |
64
- | **Connections** | None | Hebbian edges that strengthen when memories co-activate |
65
- | **Over time** | Grows forever, gets noisier | Consolidation: strengthens clusters, prunes noise, builds bridges |
66
- | **Forgetting** | Manual cleanup | Cognitive forgetting: unused memories fade, confirmed knowledge persists |
67
- | **Feedback** | None | Useful/not-useful signals tune confidence and retrieval rank |
68
- | **Correction** | Delete and re-insert | Retraction: wrong memories invalidated, corrections linked, penalties propagate |
69
-
70
- The design is based on cognitive science — ACT-R activation decay, Hebbian learning, complementary learning systems, and synaptic homeostasis — rather than ad-hoc heuristics. See [How It Works](#how-it-works) and [docs/cognitive-model.md](docs/cognitive-model.md) for details.
71
-
72
- ---
73
-
74
- ## Benchmarks
75
-
76
- | Eval | Score | What it tests |
77
- |------|-------|---------------|
78
- | Edge Cases | **100% (34/34)** | 9 failure modes: hub toxicity, flashbulb distortion, narcissistic interference, identity collision, noise forgetting benefit |
79
- | Stress Test | **92.3% (48/52)** | 500 memories, 100 sleep cycles, catastrophic forgetting, adversarial spam |
80
- | A/B Test | **AWM 100% vs Baseline 83%** | 100 project events, 24 recall questions |
81
- | Self-Test | **97.4%** | 31 pipeline component checks |
82
- | Workday | **86.7%** | 43 memories across 4 simulated work sessions |
83
- | Real-World | **93.1%** | 300 code chunks from a 71K-line production monorepo |
84
- | Token Savings | **64.5% savings** | Memory-guided context vs full conversation history |
85
-
86
- All evals are reproducible: `npm run test:self`, `npm run test:edge`, `npm run test:stress`, etc. See [Testing & Evaluation](#testing--evaluation) and [docs/benchmarks.md](docs/benchmarks.md) for full details.
87
-
88
- ---
89
-
90
- ## Features
91
-
92
- ### Memory Tools (13)
93
-
94
- | Tool | Purpose |
95
- |------|---------|
96
- | `memory_write` | Store a memory (salience filter decides disposition) |
97
- | `memory_recall` | Retrieve relevant memories by context |
98
- | `memory_feedback` | Report whether a recalled memory was useful |
99
- | `memory_retract` | Invalidate a wrong memory with optional correction |
100
- | `memory_stats` | View memory health metrics and activity |
101
- | `memory_checkpoint` | Save execution state (survives context compaction) |
102
- | `memory_restore` | Recover state + relevant context at session start |
103
- | `memory_task_add` | Create a prioritized task |
104
- | `memory_task_update` | Change task status/priority |
105
- | `memory_task_list` | List tasks by status |
106
- | `memory_task_next` | Get the highest-priority actionable task |
107
- | `memory_task_begin` | Start a task — auto-checkpoints and recalls context |
108
- | `memory_task_end` | End a task — writes summary and checkpoints |
109
-
110
- ### Separate Memory Pools
111
-
112
- By default, all projects share one memory pool. For isolated pools per folder, place a `.mcp.json` in each parent folder with a different `AWM_AGENT_ID`:
113
-
114
- ```
115
- C:\Users\you\work\.mcp.json → AWM_AGENT_ID: "work"
116
- C:\Users\you\personal\.mcp.json → AWM_AGENT_ID: "personal"
117
- ```
118
-
119
- Claude Code uses the closest `.mcp.json` ancestor. Same database, isolation by agent ID.
120
-
121
- ### Incognito Mode
122
-
123
- ```bash
124
- AWM_INCOGNITO=1 claude
125
- ```
126
-
127
- Registers zero tools, so Claude doesn't see memory at all. All other tools and MCP servers work normally.
128
-
129
- ### Auto-Checkpoint Hooks
130
-
131
- Installed by `awm setup --global`:
132
-
133
- - **Stop** — reminds Claude to write/recall after each response
134
- - **PreCompact**: auto-checkpoints before context compression
135
- - **SessionEnd** — auto-checkpoints and consolidates on close
136
- - **15-min timer** — silent auto-checkpoint while session is active
137
-
138
- ### Activity Log
139
-
140
- ```bash
141
- tail -f "$(npm root -g)/agent-working-memory/data/awm.log"
142
- ```
143
-
144
- Real-time: writes, recalls, checkpoints, consolidation, hook events.
145
-
146
- ### Activity Stats
147
-
148
- ```bash
149
- curl http://127.0.0.1:8401/stats
150
- ```
151
-
152
- Returns daily counts: `{"writes": 8, "recalls": 9, "hooks": 3, "total": 25}`
153
-
154
- ---
155
-
156
- ## Memory Invocation Strategy
157
-
158
- AWM combines deterministic hooks for guaranteed memory operations at lifecycle transitions with agent-directed usage during active work.
159
-
160
- ### Deterministic triggers (always happen)
161
-
162
- | Event | Action |
163
- |-------|--------|
164
- | Session start | `memory_restore` — recover state + recall context |
165
- | Pre-compaction | Auto-checkpoint via hook sidecar |
166
- | Session end | Auto-checkpoint + full consolidation |
167
- | Every 15 min | Silent auto-checkpoint (if active) |
168
- | Task start | `memory_task_begin` — checkpoint + recall |
169
- | Task end | `memory_task_end` — summary + checkpoint |
170
-
171
- ### Agent-directed triggers (when these situations occur)
172
-
173
- **Write memory when:**
174
- - A project decision is made or changed
175
- - A root cause is discovered
176
- - A reusable implementation pattern is established
177
- - A preference, constraint, or requirement is clarified
178
- - A prior assumption is found to be wrong
179
-
180
- **Recall memory when:**
181
- - Starting work on a new task or subsystem
182
- - Re-entering code you haven't touched recently
183
- - After context compaction
184
- - After a failed attempt (check if there's prior knowledge)
185
- - Before refactoring or making architectural changes
186
-
187
- **Retract when:**
188
- - A stored memory turns out to be wrong or outdated
189
-
190
- **Feedback when:**
191
- - A recalled memory was used (useful) or irrelevant (not useful)
192
-
193
- ---
194
-
195
- ## HTTP API
196
-
197
- For custom agents, scripts, or non-Claude-Code workflows:
198
-
199
- ```bash
200
- awm serve # From npm install
201
- npx tsx src/index.ts # From source
202
- ```
203
-
204
- Write a memory:
205
-
206
- ```bash
207
- curl -X POST http://localhost:8400/memory/write \
208
- -H "Content-Type: application/json" \
209
- -d '{
210
- "agentId": "my-agent",
211
- "concept": "Express error handling",
212
- "content": "Use centralized error middleware as the last app.use()",
213
- "eventType": "causal",
214
- "surprise": 0.5,
215
- "causalDepth": 0.7
216
- }'
217
- ```
218
-
219
- Recall:
220
-
221
- ```bash
222
- curl -X POST http://localhost:8400/memory/activate \
223
- -H "Content-Type: application/json" \
224
- -d '{
225
- "agentId": "my-agent",
226
- "context": "How should I handle errors in my Express API?"
227
- }'
228
- ```
229
-
230
- ---
231
-
232
- ## How It Works
233
-
234
- ### The Memory Lifecycle
235
-
236
- 1. **Write** — Salience scoring evaluates novelty, surprise, causal depth, and effort. High-salience memories go active; borderline ones enter staging; noise is discarded.
237
-
238
- 2. **Connect** — Vector embedding (MiniLM-L6-v2, 384d). Temporal edges link to recent memories. Hebbian edges form between co-retrieved memories.
239
-
240
- 3. **Retrieve** — 10-phase pipeline: BM25 + semantic search + cross-encoder reranking + temporal decay (ACT-R) + graph walks + confidence gating.
241
-
242
- 4. **Consolidate** — 7-phase sleep cycle: replay clusters, strengthen edges, bridge cross-topic, decay unused, normalize hubs, forget noise, sweep staging.
243
-
244
- 5. **Feedback** — Useful/not-useful signals adjust confidence, affecting retrieval rank and forgetting resistance.
245
-
246
- ### Cognitive Foundations
247
-
248
- - **ACT-R activation decay** (Anderson 1993) — memories decay with time, strengthen with use
249
- - **Hebbian learning** — co-retrieved memories form stronger associative edges
250
- - **Complementary Learning Systems** — fast capture (salience + staging) + slow consolidation (sleep cycle)
251
- - **Synaptic homeostasis** — edge weight normalization prevents hub domination
252
- - **Forgetting as feature** — noise removal improves signal-to-noise for connected memories
253
-
254
- ---
255
-
256
- ## Architecture
257
-
258
- ```
259
- src/
260
- core/ # Cognitive primitives
261
- embeddings.ts - Local vector embeddings (MiniLM-L6-v2, 384d)
262
- reranker.ts - Cross-encoder passage scoring (ms-marco-MiniLM)
263
- query-expander.ts - Synonym expansion (flan-t5-small)
264
- salience.ts - Write-time importance scoring (novelty + salience)
265
- decay.ts - ACT-R temporal activation decay
266
- hebbian.ts - Association strengthening/weakening
267
- logger.ts - Append-only activity log (data/awm.log)
268
- engine/ # Processing pipelines
269
- activation.ts - 10-phase retrieval pipeline
270
- consolidation.ts - 7-phase sleep cycle consolidation
271
- connections.ts - Discover links between memories
272
- staging.ts - Weak signal buffer (promote or discard)
273
- retraction.ts - Negative memory / corrections
274
- eviction.ts - Capacity enforcement
275
- hooks/
276
- sidecar.ts - Hook HTTP server (auto-checkpoint, stats, timer)
277
- storage/
278
- sqlite.ts - SQLite + FTS5 persistence layer
279
- api/
280
- routes.ts - HTTP endpoints (memory + task + system)
281
- mcp.ts - MCP server (13 tools, incognito support)
282
- cli.ts - CLI (setup, serve, hook config)
283
- index.ts - HTTP server entry point
284
- ```
285
-
286
- For detailed architecture including pipeline phases, database schema, and system diagrams, see [docs/architecture.md](docs/architecture.md).
287
- For an implementation plan to improve memory precision and stale-context suppression, see [docs/memory-quality-hardening-rfc.md](docs/memory-quality-hardening-rfc.md).
288
-
289
- ---
290
-
291
- ## Testing & Evaluation
292
-
293
- ### Unit Tests
294
-
295
- ```bash
296
- npx vitest run # 68 tests
297
- ```
298
-
299
- ### Eval Suites
300
-
301
- | Command | What it tests | Score |
302
- |---------|--------------|-------|
303
- | `npm run test:self` | 31 pipeline checks: embeddings, BM25, reranker, decay, confidence, Hebbian, graph walks, staging | **97.4%** |
304
- | `npm run test:edge` | 9 adversarial failure modes: context collapse, hub toxicity, flashbulb distortion, narcissistic interference, identity collision, contradiction, bridge overshoot, noise benefit | **100%** |
305
- | `npm run test:stress` | 500 memories, 100 sleep cycles, catastrophic forgetting, adversarial spam, recovery | **92.3%** |
306
- | `npm run test:workday` | 43 memories across 4 projects, 14 recall challenges | **86.7%** |
307
- | `npm run test:ab` | AWM vs keyword baseline, 100 events, 24 questions | **AWM 100% vs 83%** |
308
- | `npm run test:tokens` | Token savings vs full conversation history | **64.5%** |
309
- | `npm run test:realworld` | 300 chunks from 71K-line monorepo, 16 challenges | **93.1%** |
310
-
311
- ---
312
-
313
- ## Environment Variables
314
-
315
- | Variable | Default | Purpose |
316
- |----------|---------|---------|
317
- | `AWM_PORT` | `8400` | HTTP server port |
318
- | `AWM_DB_PATH` | `memory.db` | SQLite database path |
319
- | `AWM_AGENT_ID` | `claude-code` | Agent ID (memory namespace) |
320
- | `AWM_EMBED_MODEL` | `Xenova/all-MiniLM-L6-v2` | Embedding model |
321
- | `AWM_EMBED_DIMS` | `384` | Embedding dimensions |
322
- | `AWM_RERANKER_MODEL` | `Xenova/ms-marco-MiniLM-L-6-v2` | Reranker model |
323
- | `AWM_HOOK_PORT` | `8401` | Hook sidecar port |
324
- | `AWM_HOOK_SECRET` | *(none)* | Bearer token for hook auth |
325
- | `AWM_INCOGNITO` | *(unset)* | Set to `1` to disable all tools |
326
-
327
- ## Tech Stack
328
-
329
- | Component | Technology |
330
- |-----------|-----------|
331
- | Language | TypeScript (ES2022, strict) |
332
- | Database | SQLite via better-sqlite3 + FTS5 |
333
- | HTTP | Fastify 5 |
334
- | MCP | @modelcontextprotocol/sdk |
335
- | ML Runtime | @huggingface/transformers (local ONNX) |
336
- | Tests | Vitest 4 |
337
- | Validation | Zod 4 |
338
-
339
- All three ML models run locally via ONNX. No external API calls for retrieval. The entire system is a single SQLite file + a Node.js process.
340
-
341
- ## Project Status
342
-
343
- AWM is in active development (v0.5.x). The core memory pipeline, consolidation system, and MCP integration are stable and used daily in production coding workflows.
344
-
345
- - Core retrieval and consolidation: **stable**
346
- - MCP tools and Claude Code integration: **stable**
347
- - Task management: **stable**
348
- - Hook sidecar and auto-checkpoint: **stable**
349
- - HTTP API: **stable** (for custom agents)
350
-
351
- See [CHANGELOG.md](CHANGELOG.md) for version history.
352
-
353
- ---
354
-
355
- ## License
356
-
357
- Apache 2.0; see [LICENSE](LICENSE) and [NOTICE](NOTICE).
358
-
1
+ # AgentWorkingMemory (AWM)
2
+
3
+ **Persistent working memory for AI agents.**
4
+
5
+ AWM helps agents retain important project knowledge across conversations and sessions. Instead of storing everything and retrieving by similarity alone, it filters for salience, builds associative links between related memories, and periodically consolidates useful knowledge while letting noise fade.
6
+
7
+ Use it through Claude Code via MCP or as a local HTTP service for custom agents. Everything runs locally: SQLite + ONNX models + Node.js. No cloud, no API keys.
8
+
9
+ ### Without AWM
10
+ - Agent forgets earlier architecture decision
11
+ - Suggests Redux after project standardized on Zustand
12
+ - Repeats discussion already settled three days ago
13
+ - Every new conversation starts from scratch
14
+
15
+ ### With AWM
16
+ - Recalls prior state-management decision and rationale
17
+ - Surfaces related implementation patterns from past sessions
18
+ - Continues work without re-asking for context
19
+ - Gets more consistent the longer you use it
20
+
21
+ ---
22
+
23
+ ## Quick Start
24
+
25
+ **Node.js 20+** required — check with `node --version`.
26
+
27
+ ```bash
28
+ npm install -g agent-working-memory
29
+ awm setup --global
30
+ ```
31
+
32
+ Restart Claude Code. That's it — 14 memory tools appear automatically.
33
+
34
+ First conversation will be ~30 seconds slower while ML models download (~200MB total, cached locally). After that, everything runs on your machine.
35
+
36
+ > For isolated memory per folder, see [Separate Memory Pools](#separate-memory-pools). For team onboarding, see [docs/quickstart.md](docs/quickstart.md).
37
+
38
+ ---
39
+
40
+ ## Who this is for
41
+
42
+ - **Long-running coding agents** that need cross-session project knowledge
43
+ - **Multi-agent workflows** where specialized agents share a common memory
44
+ - **Local-first setups** where cloud memory is not acceptable
45
+ - **Teams using Claude Code** who want persistent context without manual notes
46
+
47
+ ## What this is not
48
+
49
+ - Not a chatbot UI
50
+ - Not a hosted SaaS
51
+ - Not a generic vector database
52
+ - Not a replacement for your source of truth (code, docs, tickets)
53
+
54
+ ---
55
+
56
+ ## Why it's different
57
+
58
+ Most "memory for AI" projects are vector databases with a retrieval wrapper. AWM goes further:
59
+
60
+ | | Typical RAG / Vector Store | AWM |
61
+ |---|---|---|
62
+ | **Storage** | Everything | Salience-filtered with low-confidence fallback (novel events go active, borderline enter staging, low-salience stored at reduced confidence) |
63
+ | **Retrieval** | Cosine similarity | 10-phase pipeline: dual BM25 (keyword + expanded) + vectors + reranking + graph walk + decay + coref expansion |
64
+ | **Connections** | None | Hebbian edges that strengthen when memories co-activate |
65
+ | **Over time** | Grows forever, gets noisier | Consolidation: diameter-enforced clustering, cross-topic bridges, synaptic-tagged decay |
66
+ | **Forgetting** | Manual cleanup | Cognitive forgetting: unused memories fade, reinforced knowledge persists (access-count modulated) |
67
+ | **Feedback** | None | Useful/not-useful signals tune confidence and retrieval rank |
68
+ | **Correction** | Delete and re-insert | Retraction: wrong memories invalidated, corrections linked, penalties propagate |
69
+ | **Noise rejection** | None | Multi-channel agreement gate: requires 2+ retrieval channels to agree before returning results |
70
+ | **Duplicates** | Stored repeatedly | Reinforce-on-duplicate: near-exact matches boost existing memory instead of creating copies |
71
+
72
+ The design is based on cognitive science — ACT-R activation decay, Hebbian learning, complementary learning systems, synaptic homeostasis, and synaptic tagging — rather than ad-hoc heuristics. See [How It Works](#how-it-works) and [docs/cognitive-model.md](docs/cognitive-model.md) for details.
73
+
74
+ ---
75
+
76
+ ## Benchmarks (v0.5.4)
77
+
78
+ | Eval | Score | What it tests |
79
+ |------|-------|---------------|
80
+ | Edge Cases | **100% (34/34)** | 9 failure modes: context collapse, hub toxicity, flashbulb distortion, narcissistic interference, identity collision, contradiction trapping, bridge overshoot, noise forgetting |
81
+ | Stress Test | **96.2% (50/52)** | 500 memories, 100 sleep cycles, 10 topic clusters, 20 bridges/cycle, catastrophic forgetting, adversarial spam, recovery |
82
+ | Workday | **93.3%** | 43 memories across 4 simulated work sessions, knowledge transfer, context switching, cross-cutting queries, noise filtering |
83
+ | A/B Test | **AWM 85% vs Baseline 83%** | 100 project events, 24 recall questions, 22/22 fact recall |
84
+ | Sleep Cycle | **85.7% pre-sleep** | 60 memories, 4 topic clusters, consolidation impact measurement |
85
+ | Token Savings | **67.5% accuracy, 55% savings** | Memory-guided context vs full conversation history, 2.2x efficiency |
86
+ | LoCoMo | **28.2%** | Industry-standard conversational memory benchmark (1,986 QA pairs, 10 conversations) |
87
+ | Mini Multi-Hop | **80% (4/5)** | Entity bridging across conversation turns |
88
+
89
+ ### Consolidation (v0.5.4 with BGE-small)
90
+
91
+ | Metric | Value |
92
+ |--------|-------|
93
+ | Topic clusters formed | **10** per consolidation cycle |
94
+ | Cross-topic bridges | **20** in first cycle |
95
+ | Edges strengthened | **127** per cycle |
96
+ | Graph size at scale | **3,000-4,500 edges** (500 memories) |
97
+ | Recall after 100 cycles | **90%** stable |
98
+ | Catastrophic forgetting survival | **5/5** (100%) |
99
+
100
+ All evals are reproducible: `npm run test:edge`, `npm run test:stress`, `npm run test:workday`, etc. See [Testing & Evaluation](#testing--evaluation).
101
+
102
+ ---
103
+
104
+ ## Features
105
+
106
+ ### Memory Tools (14)
107
+
108
+ | Tool | Purpose |
109
+ |------|---------|
110
+ | `memory_write` | Store a memory (salience filter + reinforce-on-duplicate) |
111
+ | `memory_recall` | Retrieve relevant memories by context (dual BM25 + coref expansion) |
112
+ | `memory_feedback` | Report whether a recalled memory was useful |
113
+ | `memory_retract` | Invalidate a wrong memory with optional correction |
114
+ | `memory_supersede` | Replace outdated memory with current version |
115
+ | `memory_stats` | View memory health metrics and activity |
116
+ | `memory_checkpoint` | Save execution state (survives context compaction) |
117
+ | `memory_restore` | Recover state + relevant context at session start |
118
+ | `memory_task_add` | Create a prioritized task |
119
+ | `memory_task_update` | Change task status/priority |
120
+ | `memory_task_list` | List tasks by status |
121
+ | `memory_task_next` | Get the highest-priority actionable task |
122
+ | `memory_task_begin` | Start a task — auto-checkpoints and recalls context |
123
+ | `memory_task_end` | End a task — writes summary and checkpoints |
124
+
125
+ ### Separate Memory Pools
126
+
127
+ By default, all projects share one memory pool. For isolated pools per folder, place a `.mcp.json` in each parent folder with a different `AWM_AGENT_ID`:
128
+
129
+ ```
130
+ C:\Users\you\work\.mcp.json -> AWM_AGENT_ID: "work"
131
+ C:\Users\you\personal\.mcp.json -> AWM_AGENT_ID: "personal"
132
+ ```
133
+
134
+ Claude Code uses the closest `.mcp.json` ancestor. Same database, isolation by agent ID.
135
+
136
+ ### Incognito Mode
137
+
138
+ ```bash
139
+ AWM_INCOGNITO=1 claude
140
+ ```
141
+
142
+ Registers zero tools — Claude doesn't see memory at all. All other tools and MCP servers work normally.
143
+
144
+ ### Auto-Checkpoint Hooks
145
+
146
+ Installed by `awm setup --global`:
147
+
148
+ - **Stop** — reminds Claude to write/recall after each response
149
+ - **PreCompact** — auto-checkpoints before context compression
150
+ - **SessionEnd** — auto-checkpoints and consolidates on close
151
+ - **15-min timer** — silent auto-checkpoint while session is active
152
+
153
+ ### Auto-Backup
154
+
155
+ The HTTP server automatically copies the database to a `backups/` directory on startup with a timestamp. Cheap insurance against data loss.
156
+
157
+ ### Activity Log
158
+
159
+ ```bash
160
+ tail -f "$(npm root -g)/agent-working-memory/data/awm.log"
161
+ ```
162
+
163
+ Real-time: writes, recalls, reinforcements, checkpoints, consolidation, hook events.
164
+
165
+ ### Activity Stats
166
+
167
+ ```bash
168
+ curl http://127.0.0.1:8401/stats
169
+ ```
170
+
171
+ Returns daily counts: `{"writes": 8, "recalls": 9, "hooks": 3, "total": 25}`
172
+
173
+ ---
174
+
175
+ ## Memory Invocation Strategy
176
+
177
+ AWM combines deterministic hooks for guaranteed memory operations at lifecycle transitions with agent-directed usage during active work.
178
+
179
+ ### Deterministic triggers (always happen)
180
+
181
+ | Event | Action |
182
+ |-------|--------|
183
+ | Session start | `memory_restore` — recover state + recall context |
184
+ | Pre-compaction | Auto-checkpoint via hook sidecar |
185
+ | Session end | Auto-checkpoint + full consolidation |
186
+ | Every 15 min | Silent auto-checkpoint (if active) |
187
+ | Task start | `memory_task_begin` — checkpoint + recall |
188
+ | Task end | `memory_task_end`: summary + checkpoint |
189
+
190
+ ### Agent-directed triggers (when these situations occur)
191
+
192
+ **Write memory when:**
193
+ - A project decision is made or changed
194
+ - A root cause is discovered
195
+ - A reusable implementation pattern is established
196
+ - A preference, constraint, or requirement is clarified
197
+ - A prior assumption is found to be wrong
198
+
199
+ **Recall memory when:**
200
+ - Starting work on a new task or subsystem
201
+ - Re-entering code you haven't touched recently
202
+ - After context compaction
203
+ - After a failed attempt (check if there's prior knowledge)
204
+ - Before refactoring or making architectural changes
205
+
206
+ **Retract when:**
207
+ - A stored memory turns out to be wrong or outdated
208
+
209
+ **Feedback when:**
210
+ - A recalled memory was used (useful) or irrelevant (not useful)
211
+
212
+ ---
213
+
214
+ ## HTTP API
215
+
216
+ For custom agents, scripts, or non-Claude-Code workflows:
217
+
218
+ ```bash
219
+ awm serve # From npm install
220
+ npx tsx src/index.ts # From source
221
+ ```
222
+
223
+ Write a memory:
224
+
225
+ ```bash
226
+ curl -X POST http://localhost:8400/memory/write \
227
+ -H "Content-Type: application/json" \
228
+ -d '{
229
+ "agentId": "my-agent",
230
+ "concept": "Express error handling",
231
+ "content": "Use centralized error middleware as the last app.use()",
232
+ "eventType": "causal",
233
+ "surprise": 0.5,
234
+ "causalDepth": 0.7
235
+ }'
236
+ ```
237
+
238
+ Recall:
239
+
240
+ ```bash
241
+ curl -X POST http://localhost:8400/memory/activate \
242
+ -H "Content-Type: application/json" \
243
+ -d '{
244
+ "agentId": "my-agent",
245
+ "context": "How should I handle errors in my Express API?"
246
+ }'
247
+ ```
248
+
249
+ ---
250
+
251
+ ## How It Works
252
+
253
+ ### The Memory Lifecycle
254
+
255
+ 1. **Write** — Salience scoring evaluates novelty, surprise, causal depth, and effort. High-salience memories go active; borderline ones enter staging; low-salience stored at reduced confidence for recall fallback. Near-duplicates reinforce existing memories instead of creating copies.
256
+
257
+ 2. **Connect** — Vector embedding (BGE-small-en-v1.5, 384d). Temporal edges link to recent memories. Hebbian edges form between co-retrieved memories. Coref expansion resolves pronouns to entity names.
258
+
259
+ 3. **Retrieve** — 10-phase pipeline: coref expansion + query expansion + dual BM25 (keyword-stripped + expanded) + semantic vectors + Rocchio pseudo-relevance feedback + ACT-R temporal decay (synaptic-tagged) + Hebbian boost + entity-bridge boost + graph walk + cross-encoder reranking + multi-channel agreement gate.
260
+
261
+ 4. **Consolidate** — 7-phase sleep cycle: diameter-enforced clustering (prevents chaining), edge strengthening (access-weighted), cross-topic bridge formation (direct closest-pair), confidence-modulated decay (synaptic tagging extends half-life), synaptic homeostasis, cognitive forgetting, staging sweep. Embedding backfill ensures all memories are clusterable.
262
+
263
+ 5. **Feedback** — Useful/not-useful signals adjust confidence, affecting retrieval rank and forgetting resistance.
264
+
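The multi-channel agreement gate named in step 3 can be illustrated with a minimal sketch. It assumes each retrieval channel reports the memory IDs it matched, and it keeps only IDs seen by at least two distinct channels. The `ChannelHit` type, the channel names, and `agreementGate` are hypothetical; none of them are taken from the AWM source.

```typescript
// Hypothetical shape: one row per (memory, channel) match.
interface ChannelHit {
  memoryId: string;
  channel: string; // e.g. "bm25", "vector", "graph" (illustrative names)
}

// Keep only memories that at least `minChannels` distinct channels agree on.
function agreementGate(hits: ChannelHit[], minChannels = 2): string[] {
  const channelsById: Record<string, string[]> = {};
  for (const hit of hits) {
    const seen = channelsById[hit.memoryId] || (channelsById[hit.memoryId] = []);
    // Count each channel once per memory, even if it matched several times.
    if (seen.indexOf(hit.channel) === -1) seen.push(hit.channel);
  }
  return Object.keys(channelsById).filter(
    (id) => (channelsById[id] || []).length >= minChannels,
  );
}

const sampleHits: ChannelHit[] = [
  { memoryId: "m1", channel: "bm25" },
  { memoryId: "m1", channel: "vector" },
  { memoryId: "m2", channel: "vector" },
  { memoryId: "m2", channel: "vector" }, // same channel twice: still one vote
  { memoryId: "m3", channel: "graph" },
  { memoryId: "m3", channel: "bm25" },
];

const agreed = agreementGate(sampleHits); // keeps m1 and m3, drops m2
```

A query with no on-topic memories tends to produce matches from a single channel only, so a gate like this would return an empty list rather than a weak result.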
265
+ ### Cognitive Foundations
266
+
267
+ - **ACT-R activation decay** (Anderson 1993) — memories decay with time, strengthen with use. Synaptic tagging: heavily-accessed memories decay slower (log-scaled).
268
+ - **Hebbian learning** — co-retrieved memories form stronger associative edges
269
+ - **Complementary Learning Systems** — fast capture (salience + staging) + slow consolidation (sleep cycle)
270
+ - **Synaptic homeostasis**: edge weight normalization prevents hub domination
271
+ - **Forgetting as feature** — noise removal improves signal-to-noise for connected memories
272
+ - **Diameter-enforced clustering**: prevents semantic chaining, where pairwise-similar links (e.g., physics->biophysics->cooking) would otherwise collapse unrelated topics into 1 cluster
273
+ - **Multi-channel agreement**: out-of-distribution (OOD) detection requires multiple retrieval channels to agree
274
+
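To make the chaining problem concrete, here is a minimal sketch of a diameter constraint. It assumes a greedy pass in which a point joins a cluster only if it is within `maxDiameter` of every existing member. The function names, the Euclidean distance, and the greedy strategy are illustrative assumptions; the README does not describe how AWM implements its clustering.

```typescript
type Vec = number[];

// Plain Euclidean distance; a real system would likely compare embeddings.
function distance(a: Vec, b: Vec): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Returns clusters as lists of indexes into `points`.
function clusterWithDiameter(points: Vec[], maxDiameter: number): number[][] {
  const clusters: number[][] = [];
  points.forEach((p, idx) => {
    // A cluster accepts the point only if ALL members are close enough,
    // which bounds the distance between any two members of a cluster.
    const home = clusters.filter((members) =>
      members.every((m) => distance(points[m], p) <= maxDiameter),
    )[0];
    if (home) {
      home.push(idx);
    } else {
      clusters.push([idx]);
    }
  });
  return clusters;
}

// Three points on a line: each neighbor pair is 1 apart, the ends are 2 apart.
const chain: Vec[] = [[0], [1], [2]];
const clusters = clusterWithDiameter(chain, 1.2); // [[0, 1], [2]]
```

A nearest-neighbor merge with the same 1.2 threshold would link 0 to 1 and 1 to 2 and end up with one cluster. The all-members check is what stops the chain.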
275
+ ---
276
+
277
+ ## Architecture
278
+
279
+ ```
280
+ src/
281
+ core/ # Cognitive primitives
282
+ embeddings.ts - Local vector embeddings (BGE-small-en-v1.5, 384d)
283
+ reranker.ts - Cross-encoder passage scoring (ms-marco-MiniLM)
284
+ query-expander.ts - Synonym expansion (flan-t5-small)
285
+ salience.ts - Write-time importance scoring (novelty + salience + reinforce-on-duplicate)
286
+ decay.ts - ACT-R temporal activation decay
287
+ hebbian.ts - Association strengthening/weakening
288
+ logger.ts - Append-only activity log (data/awm.log)
289
+ engine/ # Processing pipelines
290
+ activation.ts - 10-phase retrieval pipeline (dual BM25, coref, agreement gate)
291
+ consolidation.ts - 7-phase sleep cycle (diameter clustering, direct bridging, synaptic tagging)
292
+ connections.ts - Discover links between memories
293
+ staging.ts - Weak signal buffer (promote or discard)
294
+ retraction.ts - Negative memory / corrections
295
+ eviction.ts - Capacity enforcement
296
+ hooks/
297
+ sidecar.ts - Hook HTTP server (auto-checkpoint, stats, timer)
298
+ storage/
299
+ sqlite.ts - SQLite + FTS5 persistence layer
300
+ api/
301
+ routes.ts - HTTP endpoints (memory + task + system)
302
+ mcp.ts - MCP server (14 tools, incognito support)
303
+ cli.ts - CLI (setup, serve, hook config)
304
+ index.ts - HTTP server entry point (auto-backup on startup)
305
+ ```
306
+
307
+ For detailed architecture including pipeline phases, database schema, and system diagrams, see [docs/architecture.md](docs/architecture.md).
308
+
309
+ ---
310
+
311
+ ## Testing & Evaluation
312
+
313
+ ### Unit Tests
314
+
315
+ ```bash
316
+ npx vitest run # 68 tests
317
+ ```
318
+
319
+ ### Eval Suites
320
+
321
+ | Command | What it tests | Score |
322
+ |---------|--------------|-------|
323
+ | `npm run test:edge` | 9 adversarial failure modes: context collapse, hub toxicity, flashbulb distortion, narcissistic interference, identity collision, contradiction trapping, bridge overshoot, noise benefit | **100% (34/34)** |
324
+ | `npm run test:stress` | 500 memories, 100 sleep cycles, 10 clusters, 20 bridges, catastrophic forgetting, adversarial spam, recovery | **96.2% (50/52)** |
325
+ | `npm run test:workday` | 43 memories across 4 projects, 14 recall challenges, noise filtering | **93.3%** |
326
+ | `npm run test:ab` | AWM vs keyword baseline, 100 events, 24 questions | **AWM 85% (22/22 recall)** |
327
+ | `npm run test:sleep` | 60 memories, 4 topic clusters, consolidation impact | **85.7% pre-sleep** |
328
+ | `npm run test:tokens` | Token savings vs full conversation history | **67.5% accuracy, 55% savings** |
329
+ | `npm run test:locomo` | LoCoMo conversational memory benchmark (1,986 QA pairs) | **28.2%** |
330
+ | `npm run test:self` | Pipeline component checks | **97.4%** |
331
+
332
+ ---
333
+
334
+ ## Environment Variables
335
+
336
+ | Variable | Default | Purpose |
337
+ |----------|---------|---------|
338
+ | `AWM_PORT` | `8400` | HTTP server port |
339
+ | `AWM_DB_PATH` | `memory.db` | SQLite database path |
340
+ | `AWM_AGENT_ID` | `claude-code` | Agent ID (memory namespace) |
341
+ | `AWM_EMBED_MODEL` | `Xenova/bge-small-en-v1.5` | Embedding model (retrieval-optimized) |
342
+ | `AWM_EMBED_DIMS` | `384` | Embedding dimensions |
343
+ | `AWM_RERANKER_MODEL` | `Xenova/ms-marco-MiniLM-L-6-v2` | Reranker model |
344
+ | `AWM_HOOK_PORT` | `8401` | Hook sidecar port |
345
+ | `AWM_HOOK_SECRET` | *(none)* | Bearer token for hook auth |
346
+ | `AWM_API_KEY` | *(none)* | Bearer token for HTTP API auth |
347
+ | `AWM_INCOGNITO` | *(unset)* | Set to `1` to disable all tools |
348
+
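The new `AWM_API_KEY` row describes a bearer token for the HTTP API. The sketch below shows how a custom agent might attach it when building a request for the `/memory/write` or `/memory/activate` endpoints shown earlier. It assumes the key is sent as a standard `Authorization: Bearer <key>` header, which the README does not spell out, and `buildAwmRequest` is a hypothetical helper rather than part of the package.

```typescript
// Hypothetical request description; nothing is sent over the network here.
interface AwmRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildAwmRequest(
  path: "/memory/write" | "/memory/activate",
  payload: Record<string, unknown>,
  apiKey?: string, // value of AWM_API_KEY, if the server was started with one
  baseUrl = "http://localhost:8400", // default AWM_PORT from the table above
): AwmRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Assumption: the key travels as a standard Bearer token.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    url: `${baseUrl}${path}`,
    method: "POST",
    headers,
    body: JSON.stringify(payload),
  };
}

const recallRequest = buildAwmRequest(
  "/memory/activate",
  { agentId: "my-agent", context: "How should I handle errors in my Express API?" },
  "example-secret", // placeholder, not a real key
);
// To send it from Node.js 20+: await fetch(recallRequest.url, recallRequest)
```

Leaving `apiKey` undefined produces the same unauthenticated request as the curl examples.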
349
+ ## Tech Stack
350
+
351
+ | Component | Technology |
352
+ |-----------|-----------|
353
+ | Language | TypeScript (ES2022, strict) |
354
+ | Database | SQLite via better-sqlite3 + FTS5 |
355
+ | HTTP | Fastify 5 |
356
+ | MCP | @modelcontextprotocol/sdk |
357
+ | ML Runtime | @huggingface/transformers (local ONNX) |
358
+ | Embeddings | BGE-small-en-v1.5 (BAAI, retrieval-optimized, 384d) |
359
+ | Reranker | ms-marco-MiniLM-L-6-v2 (cross-encoder) |
360
+ | Query Expansion | flan-t5-small (synonym generation) |
361
+ | Tests | Vitest 4 |
362
+ | Validation | Zod 4 |
363
+
364
+ All three ML models run locally via ONNX. No external API calls for retrieval. The entire system is a single SQLite file + a Node.js process.
365
+
366
+ ## What's New in v0.5.4
367
+
368
+ - **BGE-small-en-v1.5 embedding model** — retrieval-optimized, 60% higher cosine for related short texts
369
+ - **Diameter-enforced clustering** — prevents semantic chaining, forms 10 distinct topic clusters
370
+ - **Direct cross-topic bridging** — 20 bridges per consolidation cycle
371
+ - **Dual BM25 retrieval** — keyword-stripped + expanded queries for better precision
372
+ - **Multi-channel agreement gate** — OOD detection prevents off-topic results
373
+ - **Reinforce-on-duplicate** — near-duplicate writes boost existing memories
374
+ - **No-discard salience** — low-salience memories stored at reduced confidence (available for fallback recall)
375
+ - **Synaptic tagging** — access count modulates decay (heavily-used memories persist longer)
376
+ - **Coref expansion** — pronoun queries auto-expanded with recent entity names
377
+ - **Async consolidation** — embedding backfill ensures all memories are clusterable
378
+ - **Auto-backup** — database copied to backups/ on server startup
379
+
380
+ See [CHANGELOG-0.5.4.md](CHANGELOG-0.5.4.md) for full details.
381
+
382
+ ## Project Status
383
+
384
+ AWM is in active development (v0.5.4). The core memory pipeline, consolidation system, and MCP integration are stable and used daily in production coding workflows.
385
+
386
+ - Core retrieval and consolidation: **stable**
387
+ - MCP tools and Claude Code integration: **stable**
388
+ - Task management: **stable**
389
+ - Hook sidecar and auto-checkpoint: **stable**
390
+ - HTTP API: **stable** (for custom agents)
391
+ - Cognitive consolidation (clustering, bridging): **stable** (v0.5.4)
392
+
393
+ See [CHANGELOG-0.5.4.md](CHANGELOG-0.5.4.md) for version history.
394
+
395
+ ---
396
+
397
+ ## License
398
+
399
+ Apache 2.0 — see [LICENSE](LICENSE) and [NOTICE](NOTICE).