@cchez/memory-mcp 1.0.0

package/DESIGN.md ADDED
@@ -0,0 +1,188 @@
1
+ # Memory MCP — Design Document
2
+
3
+ ## Purpose
4
+
5
+ A persistent, semantic memory store for AI agents. Agents working on the same codebase or within the same team can read and write shared knowledge across sessions: coding rules discovered during development, architecture decisions, team discussions from Slack, and project facts. The goal is to eliminate repeated context re-explanation and give agents access to learnings that accumulate over time.
6
+
7
+ ---
8
+
9
+ ## Core Design Decisions
10
+
11
+ ### 1. Vector storage: Qdrant
12
+
13
+ Qdrant is the vector database. Chosen over alternatives because:
14
+ - Written in Rust — high throughput, low latency on commodity hardware
15
+ - Rich payload filtering (filter by `source`, `tags`, `memory_type` in a single query)
16
+ - Clean REST + gRPC API, well-maintained Docker image
17
+ - Snapshot API simplifies backup and cross-machine migration
18
+ - Less operational overhead than distributed options (Milvus, Weaviate)
19
+
20
+ pgvector was considered but rejected: SQL join overhead is unnecessary when the only query pattern is ANN search + metadata filter.
21
+
22
+ ### 2. Embedding model: Ollama bge-m3
23
+
24
+ **bge-m3** (BAAI, 1024-dim, ~1.1 GB) runs locally via Ollama.
25
+
26
+ Selected over alternatives:
27
+
28
+ | Model | Dims | Size | Chinese | Decision |
29
+ |---|---|---|---|---|
30
+ | nomic-embed-text | 768 | ~300MB | near-zero | Rejected — too weak for Chinese content |
31
+ | text-embedding-3-small | 1536 | API | excellent | Rejected — sends data to OpenAI, ongoing cost |
32
+ | bge-m3 | 1024 | ~1.1GB | SOTA | **Selected** |
33
+ | qwen3-embedding:8b | 4096 | ~16GB | #1 MTEB | Rejected — requires GPU, too heavy for dev machines |
34
+
35
+ OpenAI is still supported as a fallback via `EMBEDDING_PROVIDER=openai`.
36
+
37
+ **Critical constraint:** The embedding model is locked at collection creation time. All stored vectors must use the same model and dimension. Switching models requires a full re-embed of all data (`scripts/migrate-reembed.ts`).
38
+
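The dimension lock described above can be enforced with a small guard before any write. This is a minimal sketch, not the package's code: `assertDimension` is a hypothetical helper, and the expected value would come from `EMBEDDING_DIMENSIONS`.

```typescript
// Hypothetical guard: reject vectors whose length differs from the dimension
// the collection was created with (1024 for bge-m3 in this document).
function assertDimension(vector: number[], expected: number): number[] {
  if (vector.length !== expected) {
    throw new Error(
      `Embedding has ${vector.length} dimensions but the collection expects ${expected}; ` +
        "re-embed existing data before switching models",
    );
  }
  return vector;
}
```

Failing early at this point is cheaper than discovering a mismatch when the vector store rejects the write.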
39
+ ### 3. Two collections, auto-routed by `memory_type`
40
+
41
+ A single collection mixing Slack summaries with coding rules degrades search quality — the semantic spaces of "team communication" and "technical constraints" are different enough to pollute results when merged.
42
+
43
+ Two collections are maintained:
44
+
45
+ ```
46
+ coding ← rule, decision, preference
47
+ workspace ← fact, summary
48
+ ```
49
+
50
+ Routing is automatic: `store_memory` reads `memory_type` and writes to the correct collection. Agents never specify a collection when writing. Search defaults to both collections, with an optional `collections` filter.
51
+
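The routing rule above can be sketched as a lookup table. The function and constant names here are hypothetical; only the type-to-collection mapping comes from this document.

```typescript
type MemoryType = "rule" | "decision" | "preference" | "fact" | "summary";
type CollectionName = "coding" | "workspace";

// Mapping taken from the two-collection layout described above.
const COLLECTION_BY_TYPE: Record<MemoryType, CollectionName> = {
  rule: "coding",
  decision: "coding",
  preference: "coding",
  fact: "workspace",
  summary: "workspace",
};

// Writes: the collection is derived from memory_type, never chosen by the agent.
function collectionFor(memoryType: MemoryType): CollectionName {
  return COLLECTION_BY_TYPE[memoryType];
}

// Searches: both collections unless the caller narrows the set.
function collectionsToSearch(requested?: CollectionName[]): CollectionName[] {
  return requested && requested.length > 0 ? requested : ["coding", "workspace"];
}
```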
52
+ ### 4. Content-hash deduplication
53
+
54
+ Point IDs in Qdrant are the first 32 hex characters of SHA-256(`content`). Writing identical content twice produces one record (Qdrant upsert is idempotent). This prevents retrieval pollution from repeated ingestion of the same Slack message or rule.
55
+
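A minimal sketch of the ID derivation, assuming Node's built-in `crypto` module; `pointIdFor` is a hypothetical name. The 32 hex characters are 128 bits wide, the same width as a UUID.

```typescript
import { createHash } from "node:crypto";

// First 32 hex characters of SHA-256(content), used as a deterministic point ID.
function pointIdFor(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex").slice(0, 32);
}
```

Because the hash covers the exact text, even a whitespace change produces a different ID, so this scheme only collapses byte-identical content.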
56
+ ### 5. MCP as the agent interface
57
+
58
+ The MCP stdio server exposes six tools:
59
+
60
+ | Tool | Purpose |
61
+ |---|---|
62
+ | `store_memory` | Write a memory (auto-routed, deduplicated) |
63
+ | `search_memory` | Hybrid vector + keyword + recency search across one or both collections |
64
+ | `delete_memory` | Remove a record by ID |
65
+ | `list_memories` | Paginated browse for audit / ID lookup |
66
+ | `correct_memory` | Supersede an incorrect/stale memory with a corrected replacement |
67
+ | `capture_episode` | Opt-in episode summary and durable observation capture |
68
+
69
+ MCP was chosen over a plain REST API for the agent interface because:
70
+ - Tool descriptions are embedded in the protocol — agents discover capabilities without configuration
71
+ - Works with Claude Code, Claude Desktop, and any MCP-compliant host without custom integration
72
+ - Zod schema validation on all inputs prevents malformed queries reaching Qdrant
73
+
74
+ A separate HTTP ingest server (`server.ts`) handles bulk writes from automated pipelines (e.g. scheduled Slack summarisation jobs) where MCP overhead is unnecessary. It uses Bearer token auth and returns partial-success responses (HTTP 207) when some items in a batch fail.
75
+
76
+ ### 6. Self-contained Docker stack
77
+
78
+ Qdrant and Ollama both run as Docker containers. The MCP server runs on the host (as a stdio process spawned by the agent host). This means:
79
+ - `docker compose up -d` in `db/` is the only infrastructure setup step
80
+ - Data persists in bind mounts (`db/qdrant_data/`, `db/ollama_data/`)
81
+ - Moving to a new machine = copy the folder + `docker compose up -d`
82
+ - No cloud dependencies, no API keys required for the default configuration
83
+
84
+ ---
85
+
86
+ ## Data Model
87
+
88
+ Each memory is stored as a Qdrant point:
89
+
90
+ ```typescript
91
+ {
92
+ id: string, // SHA-256 hash of content (first 32 hex chars)
93
+ vector: number[], // 1024-dim embedding from bge-m3
94
+ payload: {
95
+ content: string, // The knowledge text
96
+ source: string, // Origin: "slack/channel", "agent/claude-code", etc.
97
+ memory_type: MemoryType, // "rule" | "decision" | "preference" | "fact" | "summary"
98
+ tags: string[], // Optional: specific tags for filtered retrieval
99
+ created_at: string, // ISO 8601 timestamp
100
+ updated_at: string, // ISO 8601 timestamp
101
+ status: "active" | "superseded" | "deprecated",
102
+ supersedes?: string, // old memory ID replaced by this memory
103
+ superseded_by?: string, // new memory ID that replaced this memory
104
+ correction_reason?: string,
105
+ confidence?: number, // 0-1 audit signal
106
+ episode_id?: string, // task/debug episode that produced this memory
107
+ related_ids?: string[],
108
+ valid_until?: string,
109
+ last_verified_at?: string,
110
+ }
111
+ }
112
+ ```
113
+
114
+ ### `memory_type` semantics
115
+
116
+ | Type | Collection | Meaning | Durability |
117
+ |---|---|---|---|
118
+ | `rule` | coding | Hard constraint (CI rule, style enforcement, hard limit) | Long-lived |
119
+ | `decision` | coding | Architecture or technology choice with rationale | Long-lived |
120
+ | `preference` | coding | Soft convention or team preference | Medium |
121
+ | `fact` | workspace | Current state: config values, team facts, integration details | Time-sensitive |
122
+ | `summary` | workspace | Distilled summary of a discussion, meeting, or Slack thread | Time-sensitive |
123
+
124
+ ### `source` conventions
125
+
126
+ ```
127
+ slack/<channel-name> — from a Slack channel
128
+ agent/claude-code — discovered by Claude Code during development
129
+ agent/<other-agent-name> — discovered by another AI agent
130
+ manual/<topic> — manually provided by a human
131
+ confluence/<page-title> — from a Confluence page
132
+ ci/<pipeline-name> — from CI/CD output
133
+ ```
134
+
135
+ ---
136
+
137
+ ## Search Behaviour
138
+
139
+ Search embeds the query with the same model, then runs lifecycle-aware hybrid retrieval:
140
+
141
+ 1. Query is embedded → 1024-dim vector
142
+ 2. Active memories are searched by default; `superseded` and `deprecated` records are excluded unless `include_inactive=true`
143
+ 3. Vector candidates come from Qdrant cosine similarity search
144
+ 4. Keyword candidates come from lexical matching over `content`, `source`, `memory_type`, `tags`, and `episode_id`
145
+ 5. Scores are blended from vector similarity, keyword score, and recency score; `rule` and `decision` do not decay, while `fact`, `summary`, and `preference` receive recency weighting
146
+ 6. MMR re-ranks merged candidates to reduce near-duplicate results
147
+ 7. Optional filters are applied: `source`, `tags`, `memory_type`, `score_threshold`
148
+
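Step 5 can be illustrated with a small scoring sketch. The weights (0.6 / 0.3 / 0.1), the exponential decay, and the 90-day half-life are assumptions chosen for illustration; the document does not state the actual formula.

```typescript
type MemoryType = "rule" | "decision" | "preference" | "fact" | "summary";

// Per the list above, rules and decisions do not decay.
const NON_DECAYING = new Set<MemoryType>(["rule", "decision"]);

// Recency in (0, 1]: 1 for non-decaying types, otherwise halves every `halfLifeDays`.
function recencyScore(type: MemoryType, updatedAt: Date, now: Date, halfLifeDays = 90): number {
  if (NON_DECAYING.has(type)) return 1;
  const ageDays = Math.max(0, (now.getTime() - updatedAt.getTime()) / 86_400_000);
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Weighted blend of the three signals (hypothetical weights).
function blendedScore(
  vector: number,
  keyword: number,
  recency: number,
  weights = { vector: 0.6, keyword: 0.3, recency: 0.1 },
): number {
  return weights.vector * vector + weights.keyword * keyword + weights.recency * recency;
}
```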
149
+ Supported modes: `hybrid` (default), `vector`, and `keyword`. Recommended `score_threshold`: **0.5** for general search, **0.65** when precision matters.
150
+
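The MMR re-ranking step can be sketched as a greedy selection that trades relevance against similarity to results already picked. This is the textbook form of Maximal Marginal Relevance under assumed inputs (a blended `score` and the stored `vector` per candidate, with a hypothetical `lambda` of 0.7), not necessarily how the package implements it.

```typescript
interface Candidate {
  id: string;
  score: number;    // blended relevance score
  vector: number[]; // embedding used to measure redundancy
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy MMR: repeatedly pick the candidate maximising
// lambda * relevance - (1 - lambda) * (max similarity to anything already selected).
function mmr(candidates: Candidate[], limit: number, lambda = 0.7): Candidate[] {
  const remaining = [...candidates];
  const selected: Candidate[] = [];
  while (selected.length < limit && remaining.length > 0) {
    let bestIndex = 0;
    let bestValue = -Infinity;
    remaining.forEach((candidate, index) => {
      const redundancy =
        selected.length === 0
          ? 0
          : Math.max(...selected.map((s) => cosine(candidate.vector, s.vector)));
      const value = lambda * candidate.score - (1 - lambda) * redundancy;
      if (value > bestValue) {
        bestValue = value;
        bestIndex = index;
      }
    });
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}
```

With two near-identical high-scoring candidates and one distinct lower-scoring one, the second pick goes to the distinct candidate, which is the de-duplication effect described above.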
151
+ ## Feedback Loop and Episodes
152
+
153
+ `correct_memory` handles the correction loop: the user or agent provides corrected content and a reason; the system then writes a new active memory, marks the old record `superseded`, and links the two records via `supersedes` / `superseded_by`.
154
+
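The supersession chain can be sketched against an in-memory map. The `Map` store, the `correctMemory` name, and the choice to keep `correction_reason` on the new record are assumptions for illustration; the real tool operates on Qdrant payloads.

```typescript
type Status = "active" | "superseded" | "deprecated";

interface MemoryRecord {
  id: string;
  content: string;
  status: Status;
  updated_at: string;
  supersedes?: string;
  superseded_by?: string;
  correction_reason?: string;
}

// Write the replacement as active, then mark the old record superseded and link both sides.
function correctMemory(
  store: Map<string, MemoryRecord>,
  oldId: string,
  newId: string,
  correctedContent: string,
  reason: string,
  now: Date = new Date(),
): MemoryRecord {
  const old = store.get(oldId);
  if (!old) throw new Error(`Memory ${oldId} not found`);
  const timestamp = now.toISOString();
  const replacement: MemoryRecord = {
    id: newId,
    content: correctedContent,
    status: "active",
    updated_at: timestamp,
    supersedes: oldId,
    correction_reason: reason,
  };
  store.set(newId, replacement);
  store.set(oldId, { ...old, status: "superseded", superseded_by: newId, updated_at: timestamp });
  return replacement;
}
```

Keeping the old record rather than deleting it preserves an audit trail that `include_inactive=true` can surface later.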
155
+ `capture_episode` handles opt-in SDLC/debug memory. It stores a compact episode summary plus durable observations supplied by the caller. It intentionally does not capture raw conversation or tool traces; noisy automatic trace ingestion is out of scope.
156
+
157
+ ---
158
+
159
+ ## Agent Skills
160
+
161
+ Two skills guide agent behaviour around memory:
162
+
163
+ **`memory-search`** — triggers automatically when starting work on a feature, discussing architecture, or when the user asks to look something up. Chooses the right collection and filters, caps at two search attempts per topic, and applies results directly to work rather than just summarising them back.
164
+
165
+ **`memory-save`** — triggers when a coding rule is discovered, a decision is ratified, or a configuration fact is established. Applies a three-question filter (durable? non-obvious? actionable?) before saving. Writes content in structured templates optimised for future semantic retrieval.
166
+
167
+ Install the skills from `skills/` into `~/.claude/skills/`. The repo also ships a third skill, `memory-correct`, which covers when and how to correct memory.
168
+
169
+ ---
170
+
171
+ ## Migration Scripts
172
+
173
+ ### `scripts/migrate.ts`
174
+
175
+ One-time migration from the original single `knowledge` collection to the dual `coding`/`workspace` structure. Infers `memory_type` from `source` and `tags`. Copies vectors directly (no re-embed needed — same model, same dimensions).
176
+
177
+ ### `scripts/migrate-reembed.ts`
178
+
179
+ Used when switching embedding models. Drops and recreates all collections at the new vector size, then re-embeds every record using the currently configured embedding provider. Supports dry-run mode (`MIGRATE_DRY_RUN=true`). Includes an Ollama health-check with retry so it works in automated environments where the model is still loading.
180
+
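The health-check-with-retry idea can be sketched as a generic polling helper. `waitUntilReady`, the attempt count, and the delay are hypothetical; the probe is injected so the sketch does not assume any particular Ollama endpoint.

```typescript
type Probe = () => Promise<boolean>;

// Poll `probe` until it reports ready, waiting `delayMs` between attempts.
async function waitUntilReady(probe: Probe, attempts = 10, delayMs = 2000): Promise<void> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      if (await probe()) return;
    } catch {
      // A connection error is treated the same as "not ready yet".
    }
    if (attempt < attempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Service not ready after ${attempts} attempts`);
}
```

A probe for this stack might request `http://localhost:11434/api/tags` and check that the configured model is listed, which mirrors the manual verification step in the README.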
181
+ ---
182
+
183
+ ## What Is Not in Scope
184
+
185
+ - **Multi-tenancy / ACLs:** All agents share the same collections. If per-user or per-team isolation is needed, implement separate collections with a naming convention (e.g. `coding_teamA`) and route via `source` filters.
186
+ - **Chunking:** Long documents are stored as single vectors. Content over ~500 tokens will have degraded embedding quality. For long-form ingestion, chunk before calling `store_memory`.
187
+ - **Raw trace capture:** The system does not automatically ingest full conversations or tool traces. Episode capture is explicit and summary-based.
188
+ - **Background reflection / promotion:** There is no background agent that promotes short-term facts to long-term decisions without an explicit `capture_episode` or `store_memory` call.
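For the chunking note above, a caller-side splitter could look like the following. It is a rough sketch: it counts whitespace-separated words as a stand-in for tokens, and the 350-word default is an assumed safety margin under ~500 tokens, not a measured value for bge-m3.

```typescript
// Split text into chunks of at most `maxWords` whitespace-separated words.
function chunkText(text: string, maxWords = 350): string[] {
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += maxWords) {
    chunks.push(words.slice(start, start + maxWords).join(" "));
  }
  return chunks;
}
```

Each chunk would then be passed to `store_memory` separately, so every chunk receives its own content-hash ID.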
package/README.md ADDED
@@ -0,0 +1,484 @@
1
+ # Memory MCP
2
+
3
+ A self-contained AI memory store for agents and developers. Stores coding rules, architecture decisions, team discussions, and project facts as semantic vectors. Any MCP-compatible agent can search and save memories across sessions.
4
+
5
+ **Stack:** Qdrant (vector DB) + Ollama bge-m3 (embedding) + MCP stdio server — all containerised, zero cloud dependencies.
6
+
7
+ ---
8
+
9
+ ## Architecture
10
+
11
+ ```
12
+ memory-mcp/
13
+ ├── db/
14
+ │ ├── docker-compose.yml # Qdrant + Ollama containers
15
+ │ ├── .env.example # Environment variable template
16
+ │ ├── qdrant_data/ # Persisted vector data (gitignored)
17
+ │ └── ollama_data/ # Persisted model cache (gitignored)
18
+ ├── src/
19
+ │ ├── index.ts # MCP stdio server
20
+ │ ├── server.ts # HTTP bulk-ingest server
21
+ │ ├── embedding.ts # Embedding provider abstraction (Ollama / OpenAI)
22
+ │ ├── qdrant.ts # Qdrant client — dual-collection, auto-routing
23
+ │ └── tools/
24
+ │ ├── store.ts # store_memory tool
25
+ │ ├── search.ts # search_memory tool
26
+ │ ├── delete.ts # delete_memory tool
27
+ │ ├── list.ts # list_memories tool
28
+ │ ├── correct.ts # correct_memory tool — supersession chain
29
+ │ └── episode.ts # capture_episode tool — opt-in task summaries
30
+ ├── scripts/
31
+ │ ├── migrate.ts # One-time migration: old collection → coding/workspace
32
+ │ └── migrate-reembed.ts # Re-embed all data when switching embedding model
33
+ └── skills/
34
+ ├── memory-search/ # Agent skill: when and how to search memory
35
+ ├── memory-save/ # Agent skill: when and how to save to memory
36
+ └── memory-correct/ # Agent skill: when and how to correct memory
37
+ ```
38
+
39
+ ### Collections
40
+
41
+ Memories are auto-routed to one of two Qdrant collections based on `memory_type`:
42
+
43
+ | Collection | memory_type values | Content |
44
+ |---|---|---|
45
+ | `coding` | `rule`, `decision`, `preference` | Coding constraints, architecture choices, tool preferences |
46
+ | `workspace` | `fact`, `summary` | Team discussions, Slack summaries, project facts, config values |
47
+
48
+ ---
49
+
50
+ ## Quick Start
51
+
52
+ ### Step 1 — Start the database and embedding model
53
+
54
+ ```bash
55
+ cd db
56
+ docker compose up -d
57
+ ```
58
+
59
+ This starts:
60
+ - **Qdrant** on `http://localhost:6333` (vector database)
61
+ - **Ollama** on `http://localhost:11434` (embedding server, auto-pulls `bge-m3` on first run)
62
+
63
+ First run takes 1–2 minutes while bge-m3 (~1.1 GB) downloads. Check progress:
64
+
65
+ ```bash
66
+ docker compose logs -f ollama
67
+ ```
68
+
69
+ Verify both are ready:
70
+
71
+ ```bash
72
+ curl http://localhost:6333/collections # Qdrant
73
+ curl http://localhost:11434/api/tags # Ollama — should list bge-m3
74
+ ```
75
+
76
+ > **Note:** If you have Ollama running locally on port 11434, stop it first to avoid port conflicts:
77
+ > ```bash
78
+ > killall ollama 2>/dev/null || true
79
+ > ```
80
+
81
+ ### Step 2 — Configure environment
82
+
83
+ ```bash
84
+ cp db/.env.example .env
85
+ ```
86
+
87
+ The defaults work out of the box with the docker-compose setup. Edit `.env` only if you need to customise:
88
+
89
+ ```env
90
+ QDRANT_URL=http://localhost:6333
91
+ CODING_COLLECTION=coding
92
+ WORKSPACE_COLLECTION=workspace
93
+
94
+ EMBEDDING_PROVIDER=ollama # or "openai"
95
+ EMBEDDING_DIMENSIONS=1024 # must match the model (bge-m3 = 1024)
96
+ OLLAMA_BASE_URL=http://localhost:11434
97
+ OLLAMA_MODEL=bge-m3
98
+
99
+ # Only needed if EMBEDDING_PROVIDER=openai
100
+ OPENAI_API_KEY=sk-...
101
+
102
+ INGEST_API_KEY=your-secret-key # for HTTP bulk-ingest auth
103
+ PORT=3000
104
+ ```
105
+
106
+ ### Step 3 — Install dependencies
107
+
108
+ Only needed for local development from this repo:
109
+
110
+ ```bash
111
+ npm install
112
+ ```
113
+
114
+ ### Step 4 — Wire up the MCP server
115
+
116
+ For normal usage, run the published MCP server with `npx`. Add this to your Claude Code / Claude Desktop MCP config (`.claude/settings.json` or `claude_desktop_config.json`):
117
+
118
+ ```json
119
+ {
120
+ "mcpServers": {
121
+ "memory-mcp": {
122
+ "type": "stdio",
123
+ "command": "npx",
124
+ "args": [
125
+ "-y",
126
+ "@cchez/memory-mcp@latest"
127
+ ],
128
+ "env": {
129
+ "QDRANT_URL": "http://localhost:6333",
130
+ "EMBEDDING_PROVIDER": "ollama",
131
+ "OLLAMA_BASE_URL": "http://localhost:11434",
132
+ "OLLAMA_MODEL": "bge-m3",
133
+ "EMBEDDING_DIMENSIONS": "1024",
134
+ "CODING_COLLECTION": "coding",
135
+ "WORKSPACE_COLLECTION": "workspace"
136
+ }
137
+ }
138
+ }
139
+ }
140
+ ```
141
+
142
+ For local development before publishing, point the MCP host at the source file instead:
143
+
144
+ ```json
145
+ {
146
+ "mcpServers": {
147
+ "memory-mcp": {
148
+ "type": "stdio",
149
+ "command": "node",
150
+ "args": [
151
+ "/absolute/path/to/memory-mcp/node_modules/tsx/dist/cli.mjs",
152
+ "/absolute/path/to/memory-mcp/src/index.ts"
153
+ ],
154
+ "env": {
155
+ "QDRANT_URL": "http://localhost:6333",
156
+ "EMBEDDING_PROVIDER": "ollama",
157
+ "OLLAMA_BASE_URL": "http://localhost:11434",
158
+ "OLLAMA_MODEL": "bge-m3",
159
+ "EMBEDDING_DIMENSIONS": "1024",
160
+ "CODING_COLLECTION": "coding",
161
+ "WORKSPACE_COLLECTION": "workspace"
162
+ }
163
+ }
164
+ }
165
+ }
166
+ ```
167
+
168
+ > **Why `node` instead of `npm run dev` for local development?** Some MCP hosts don't reliably pass the `cwd` field to the spawned process, causing npm to look for `package.json` in the wrong directory. Invoking `node` with absolute paths bypasses npm entirely and works regardless of working directory.
169
+
170
+ Replace both `/absolute/path/to/memory-mcp` occurrences with the actual path to this repo (e.g. `/Users/you/projects/memory-mcp`).
171
+
172
+ Restart Claude Code / Claude Desktop. The MCP tools will be available immediately.
173
+
174
+ ### Step 5 — Install agent skills (optional but recommended)
175
+
176
+ Copy the skills into your Claude skills directory so agents automatically know when to search and save memory:
177
+
178
+ ```bash
179
+ cp -r skills/memory-search ~/.claude/skills/
180
+ cp -r skills/memory-save ~/.claude/skills/
181
+ cp -r skills/memory-correct ~/.claude/skills/
182
+ ```
183
+
184
+ Restart Claude Code to pick up the new skills.
185
+
186
+ ---
187
+
188
+ ## Publishing to npmjs
189
+
190
+ The package is configured as `@cchez/memory-mcp` and exposes one executable bin, `memory-mcp`, backed by `dist/index.js`. The package uses a `files` allowlist so local Qdrant/Ollama data is not published.
191
+
192
+ Before publishing:
193
+
194
+ ```bash
195
+ npm run build
196
+ npm run pack:dry-run
197
+ ```
198
+
199
+ First-time publish for a scoped public package:
200
+
201
+ ```bash
202
+ npm login --registry=https://registry.npmjs.org/
203
+ npm publish --access public --registry=https://registry.npmjs.org/
204
+ ```
205
+
206
+ After publish, agent configs can use:
207
+
208
+ ```json
209
+ {
210
+ "command": "npx",
211
+ "args": ["-y", "@cchez/memory-mcp@latest"]
212
+ }
213
+ ```
214
+
215
+ The npm package only runs the MCP server. Qdrant and Ollama still need to be running separately, for example with `db/docker-compose.yml` from this repo.
216
+
217
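+ ### Step 6 - Global instruction
+
+ Optionally, add a global instruction like the following to your agent's instructions so that it searches and saves memory by default: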
218
+
219
+ ```
220
+ # Memory
221
+
222
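+ The knowledge base (memory mcp) is a core asset; access it through memory mcp.
223
+
224
+ ## Search: look before you act
225
+
226
+ Before starting any task, search the knowledge base for relevant context:
227
+ - Starting to write code or fix a bug → search for relevant coding rules or constraints
228
+ - Discussing architecture or technology choices → search for existing decisions
229
+ - Encountering the name of a project, system, or integration → search for related background and configuration facts
230
+ - The user asks "how did we handle X before?" → search the knowledge base; don't guess
231
+
232
+ When searching, set `score_threshold: 0.5` to filter out low-relevance results. If unsure which collection to search, search both (the default behaviour). Search at most twice per topic (retry once with different wording); if there are still no results, stop rather than looping.
233
+
234
+ ## Save: three-question filter
235
+
236
+ Save the following kinds of information proactively, without waiting for the user to ask:
237
+ - Coding constraints or rules ("method complexity must not exceed 15, otherwise CI fails")
238
+ - Architecture or technology decisions ("chose Qdrant because…")
239
+ - Configuration facts ("the Airwallex RFI KYC template ID is xxx")
240
+ - A resolved, non-obvious ambiguity
241
+
242
+ When the user says "remember this", save immediately.
243
+
244
+ Ask three questions before saving: ① Durable? Will it still hold in 3 months? ② Non-obvious? Would a newcomer not already know it? ③ Reusable? Will it affect future work? Save only if all three answers are yes. Do not save: task status, transient information, or common knowledge.
245
+
246
+ ## memory_type classification
247
+
248
+ | memory_type | What to save | Auto-routed to |
249
+ |---|---|---|
250
+ | `rule` | Coding constraints, CI rules, mandatory standards | coding |
251
+ | `decision` | Technology choices and architecture decisions (with rationale) | coding |
252
+ | `preference` | Team preferences, soft conventions | coding |
253
+ | `fact` | Config values, current team state, integration details | workspace |
254
+ | `summary` | Distilled summaries of Slack threads, meetings, or documents | workspace |
255
+
256
+ The collection never needs to be specified manually; it is routed automatically by `memory_type`.
257
+
258
+ ## source format
259
+
260
+ `slack/<channel-name>` / `agent/claude-code` / `confluence/<page-title>` / `manual/<topic>`
261
+
262
+ ## Deleting stale memories
263
+
264
+ When you find incorrect or outdated content in the knowledge base, use `correct_memory` to write a revised version and supersede the old memory. Use `delete_memory(id, collection)` only for sensitive information, duplicate junk, or records whose audit history genuinely should not be kept.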
265
+ ```
266
+
267
+ ---
268
+
269
+ ## MCP Tools Reference
270
+
271
+ ### `store_memory`
272
+
273
+ Save a piece of knowledge. Content with identical text is automatically deduplicated (idempotent upsert via SHA-256 hash).
274
+
275
+ | Parameter | Type | Required | Description |
276
+ |---|---|---|---|
277
+ | `content` | string | yes | The knowledge to store (1–5 sentences, written for future retrieval) |
278
+ | `source` | string | yes | Origin: `slack/channel-name`, `agent/claude-code`, `confluence/page-title`, `manual/...` |
279
+ | `memory_type` | enum | yes | `rule` / `decision` / `preference` / `fact` / `summary` |
280
+ | `tags` | string[] | no | 2–5 specific tags for filtered retrieval |
281
+ | `status` | enum | no | `active` (default) / `superseded` / `deprecated` |
282
+ | `confidence` | number | no | Confidence score 0–1 |
283
+ | `episode_id` | string | no | Task/debug episode that produced the memory |
284
+ | `related_ids` | string[] | no | Related memory IDs |
285
+ | `supersedes` | string | no | Older memory ID replaced by this memory |
286
+ | `valid_until` | string | no | ISO timestamp for time-sensitive facts |
287
+ | `last_verified_at` | string | no | ISO timestamp for last verification |
288
+
289
+ Returns: `{ id, collection }`
290
+
291
+ ### `search_memory`
292
+
293
+ Hybrid search across one or both collections. The default `hybrid` mode combines vector similarity, exact keyword matching, light recency weighting, and MMR de-duplication. Superseded and deprecated memories are hidden unless explicitly requested.
294
+
295
+ | Parameter | Type | Default | Description |
296
+ |---|---|---|---|
297
+ | `query` | string | — | Natural language query |
298
+ | `collections` | array | both | `["coding"]`, `["workspace"]`, or omit for both |
299
+ | `limit` | number | 5 | Max results (1–20) |
300
+ | `score_threshold` | number | — | Min similarity 0–1 (recommended: 0.5) |
301
+ | `memory_type` | enum | — | Filter by type |
302
+ | `tags` | string[] | — | Filter — all specified tags must match |
303
+ | `source` | string | — | Filter by exact source string |
304
+ | `include_inactive` | boolean | false | Include `superseded` / `deprecated` memories |
305
+ | `mode` | enum | `hybrid` | `vector`, `keyword`, or `hybrid` |
306
+ | `use_recency` | boolean | true | Apply recency weighting to time-sensitive memories |
307
+ | `use_mmr` | boolean | true | Reduce duplicate-looking results |
308
+
309
+ Returns: array of `{ id, content, source, memory_type, collection, tags, score, vector_score, keyword_score, recency_score, status, created_at, updated_at, ...lifecycle_fields }`
310
+
311
+ ### `correct_memory`
312
+
313
+ Create a corrected replacement for an existing memory. The old memory is marked `superseded`; the new memory is written as `active` with `supersedes` pointing at the old ID.
314
+
315
+ | Parameter | Type | Required | Description |
316
+ |---|---|---|---|
317
+ | `id` | string | yes | Old memory ID |
318
+ | `collection` | enum | yes | Old memory collection: `coding` or `workspace` |
319
+ | `corrected_content` | string | yes | Replacement memory content |
320
+ | `correction_reason` | string | yes | Why the old memory is wrong/stale |
321
+ | `source` | string | no | Defaults to old source |
322
+ | `memory_type` | enum | no | Defaults to old type |
323
+ | `tags` | string[] | no | Defaults to old tags |
324
+ | `confidence` | number | no | Defaults to `0.9` |
325
+
326
+ ### `capture_episode`
327
+
328
+ Opt-in capture for task/debug episodes. This does **not** ingest raw conversation or tool traces; callers provide a compact summary plus durable observations that passed a quality filter.
329
+
330
+ | Parameter | Type | Required | Description |
331
+ |---|---|---|---|
332
+ | `episode_id` | string | yes | Task/debug episode identifier |
333
+ | `source` | string | yes | Usually `agent/claude-code` or another agent source |
334
+ | `summary` | string | yes | What happened, key attempts, final conclusion |
335
+ | `observations` | array | yes | 1–10 structured memories with `content`, `memory_type`, optional `tags` and `confidence` |
336
+ | `related_ids` | string[] | no | Existing memory IDs related to this episode |
337
+
338
+ ### `delete_memory`
339
+
340
+ Delete a specific memory entry by ID.
341
+
342
+ | Parameter | Type | Required | Description |
343
+ |---|---|---|---|
344
+ | `id` | string | yes | ID returned by `store_memory` |
345
+ | `collection` | enum | yes | `coding` or `workspace` |
346
+
347
+ ### `list_memories`
348
+
349
+ Browse memory entries with pagination — useful for auditing or finding IDs.
350
+
351
+ | Parameter | Type | Default | Description |
352
+ |---|---|---|---|
353
+ | `collection` | enum | — | `coding` or `workspace` (required) |
354
+ | `limit` | number | 20 | Max results (1–100) |
355
+ | `offset` | number | 0 | Pagination offset |
356
+ | `source` | string | — | Filter by source |
357
+ | `memory_type` | enum | — | Filter by type |
358
+ | `include_inactive` | boolean | false | Include superseded/deprecated memories for audit |
359
+
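Walking a whole collection with `limit` and `offset` can be sketched as a paging loop. `listPage` is a hypothetical wrapper around a `list_memories` call, and the assumption that it resolves to a plain array of entries is not confirmed by this reference.

```typescript
type CollectionName = "coding" | "workspace";

type ListPageFn<T> = (args: {
  collection: CollectionName;
  limit: number;
  offset: number;
}) => Promise<T[]>;

// Fetch pages until a short page signals the end of the collection.
async function listAll<T>(
  listPage: ListPageFn<T>,
  collection: CollectionName,
  pageSize = 100, // the documented maximum for `limit`
): Promise<T[]> {
  const all: T[] = [];
  for (let offset = 0; ; offset += pageSize) {
    const page = await listPage({ collection, limit: pageSize, offset });
    all.push(...page);
    if (page.length < pageSize) return all;
  }
}
```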
360
+ ---
361
+
362
+ ## Example Prompts
363
+
364
+ ### Saving memories
365
+
366
+ ```
367
+ Remember this rule: all service methods must have cyclomatic complexity ≤ 15,
368
+ otherwise the CI pipeline fails. Source is agent/claude-code, type is rule,
369
+ tags: ci, complexity, typescript.
370
+ ```
371
+
372
+ ```
373
+ Save to memory: we decided to use SHA-256 content hash as the Qdrant point ID
374
+ for deduplication. Same content written twice produces one record.
375
+ Source: agent/claude-code, type: decision, tags: qdrant, deduplication, architecture.
376
+ ```
377
+
378
+ ```
379
+ Note down: Airwallex RFI hosted flow KYC template ID is "kyc_rfi_v2_au".
380
+ Source: manual/airwallex-config, type: fact, tags: airwallex, rfi, configuration.
381
+ ```
382
+
383
+ ### Searching memories
384
+
385
+ ```
386
+ Search memory for any rules about code complexity limits.
387
+ ```
388
+
389
+ ```
390
+ What do we know about our Airwallex integration configuration?
391
+ Search the workspace collection.
392
+ ```
393
+
394
+ ```
395
+ Find all architecture decisions we've made. Search coding collection,
396
+ memory_type decision, score threshold 0.5.
397
+ ```
398
+
399
+ ```
400
+ List everything in the coding collection so I can audit what rules are stored.
401
+ ```
402
+
403
+ ### Deleting stale memories
404
+
405
+ ```
406
+ Delete memory id "a3f2b1c9..." from the workspace collection.
407
+ ```
408
+
409
+ ---
410
+
411
+ ## HTTP Bulk Ingest
412
+
413
+ For automated pipelines (e.g. daily Slack summarisation jobs):
414
+
415
+ ```bash
416
+ npm run dev:server
417
+ ```
418
+
419
+ ```bash
420
+ curl -X POST http://localhost:3000/ingest \
421
+ -H "Authorization: Bearer <INGEST_API_KEY>" \
422
+ -H "Content-Type: application/json" \
423
+ -d '{
424
+ "memories": [
425
+ {
426
+ "content": "Pod Pay Pilots weekly update 2026-04-22: document upload UI refactor completed. UAT sign-off from Katrina Li.",
427
+ "source": "slack/pod-pay-pilots",
428
+ "memory_type": "summary",
429
+ "tags": ["weekly-update", "document-upload", "uat"]
430
+ },
431
+ {
432
+ "content": "Decision: LaunchDarkly flag AU_discoverability_beanie_events controls Beanie event emission for AU bill pay. Default off in production.",
433
+ "source": "slack/pod-pay-pilots-engineers",
434
+ "memory_type": "fact",
435
+ "tags": ["launchdarkly", "feature-flag", "au-bill-pay"]
436
+ }
437
+ ]
438
+ }'
439
+ ```
440
+
441
+ Response (all succeeded):
442
+ ```json
443
+ { "stored": 2, "failed": 0, "succeeded": [{"index": 0, "id": "..."}, {"index": 1, "id": "..."}] }
444
+ ```
445
+
446
+ Response (partial failure — HTTP 207):
447
+ ```json
448
+ { "stored": 1, "failed": 1, "succeeded": [...], "errors": [{"index": 1, "error": "..."}] }
449
+ ```
450
+
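The two response shapes above suggest per-item error handling on the server side. The following is a sketch under assumptions: `storeOne` is a hypothetical stand-in for "embed and upsert one memory", and returning 200 on full success and 207 whenever any item fails is inferred from the examples rather than taken from `server.ts`.

```typescript
interface IngestItem {
  content: string;
  source: string;
  memory_type: string;
  tags?: string[];
}

interface IngestResponse {
  status: number;
  body: {
    stored: number;
    failed: number;
    succeeded: { index: number; id: string }[];
    errors?: { index: number; error: string }[];
  };
}

// Store each item independently so one failure does not abort the batch.
async function ingestBatch(
  items: IngestItem[],
  storeOne: (item: IngestItem) => Promise<string>,
): Promise<IngestResponse> {
  const succeeded: { index: number; id: string }[] = [];
  const errors: { index: number; error: string }[] = [];

  for (let index = 0; index < items.length; index++) {
    try {
      succeeded.push({ index, id: await storeOne(items[index]) });
    } catch (err) {
      errors.push({ index, error: err instanceof Error ? err.message : String(err) });
    }
  }

  const body: IngestResponse["body"] = {
    stored: succeeded.length,
    failed: errors.length,
    succeeded,
  };
  if (errors.length > 0) body.errors = errors;
  // Assumption: 200 when everything succeeded, 207 as soon as any item failed.
  return { status: errors.length === 0 ? 200 : 207, body };
}
```

Reporting the original `index` for each outcome lets a pipeline retry only the failed items.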
451
+ Health check:
452
+ ```bash
453
+ curl http://localhost:3000/health
454
+ # {"status":"ok"}
455
+ ```
456
+
457
+ ---
458
+
459
+ ## Switching Embedding Models
460
+
461
+ The embedding model is a **hard infrastructure decision** — all stored vectors must use the same model. To switch models:
462
+
463
+ 1. Update `.env`: set `OLLAMA_MODEL` (or `EMBEDDING_PROVIDER=openai`) and `EMBEDDING_DIMENSIONS`
464
+ 2. Run the re-embed migration:
465
+
466
+ ```bash
467
+ # Dry-run first
468
+ MIGRATE_DRY_RUN=true node --env-file=.env ./node_modules/tsx/dist/cli.mjs scripts/migrate-reembed.ts
469
+
470
+ # Execute
471
+ MIGRATE_DRY_RUN=false node --env-file=.env ./node_modules/tsx/dist/cli.mjs scripts/migrate-reembed.ts
472
+ ```
473
+
474
+ This drops and recreates all collections at the new vector size, then re-embeds every stored record.
475
+
476
+ ---
477
+
478
+ ## Stop Services
479
+
480
+ ```bash
481
+ cd db
482
+ docker compose down # stop containers, keep data
483
+ docker compose down -v # stop containers and delete all data
484
+ ```
@@ -0,0 +1,16 @@
1
+ QDRANT_URL=http://localhost:6333
2
+ CODING_COLLECTION=coding
3
+ WORKSPACE_COLLECTION=workspace
4
+
5
+ EMBEDDING_PROVIDER=ollama
6
+ EMBEDDING_DIMENSIONS=1024
7
+
8
+ # OpenAI (only if EMBEDDING_PROVIDER=openai)
9
+ OPENAI_API_KEY=sk-...
10
+
11
+ # Ollama — points to the ollama container in docker-compose
12
+ OLLAMA_BASE_URL=http://localhost:11434
13
+ OLLAMA_MODEL=bge-m3
14
+
15
+ INGEST_API_KEY=your-secret-key
16
+ PORT=3000