@cchez/memory-mcp 1.0.0
- package/DESIGN.md +188 -0
- package/README.md +484 -0
- package/db/.env.example +16 -0
- package/db/docker-compose.yml +33 -0
- package/dist/embedding.js +54 -0
- package/dist/index.js +60 -0
- package/dist/qdrant.js +349 -0
- package/dist/server.js +67 -0
- package/dist/tools/correct.js +57 -0
- package/dist/tools/delete.js +12 -0
- package/dist/tools/episode.js +65 -0
- package/dist/tools/list.js +37 -0
- package/dist/tools/search.js +98 -0
- package/dist/tools/store.js +71 -0
- package/package.json +66 -0
- package/skills/memory-correct/SKILL.md +83 -0
- package/skills/memory-save/SKILL.md +209 -0
- package/skills/memory-search/SKILL.md +156 -0
package/DESIGN.md
ADDED
|
@@ -0,0 +1,188 @@
# Memory MCP: Design Document

## Purpose

A persistent, semantic memory store for AI agents. Agents working on the same codebase or within the same team can read and write shared knowledge across sessions: coding rules discovered during development, architecture decisions, team discussions from Slack, and project facts. The goal is to eliminate repeated context re-explanation and give agents access to learnings that accumulate over time.

---

## Core Design Decisions

### 1. Vector storage: Qdrant

Qdrant is the vector database. Chosen over alternatives because:
- Written in Rust: high throughput, low latency on commodity hardware
- Rich payload filtering (filter by `source`, `tags`, `memory_type` in a single query)
- Clean REST + gRPC API, well-maintained Docker image
- Snapshot API simplifies backup and cross-machine migration
- No operational overhead vs distributed options (Milvus, Weaviate)

pgvector was considered but rejected: SQL join overhead is unnecessary when the only query pattern is ANN search + metadata filter.

### 2. Embedding model: Ollama bge-m3

**bge-m3** (BAAI, 1024-dim, ~1.1 GB) runs locally via Ollama.

Selected over alternatives:

| Model | Dims | Size | Chinese | Decision |
|---|---|---|---|---|
| nomic-embed-text | 768 | ~300MB | near-zero | Rejected: too weak for Chinese content |
| text-embedding-3-small | 1536 | API | excellent | Rejected: sends data to OpenAI, ongoing cost |
| bge-m3 | 1024 | ~1.1GB | SOTA | **Selected** |
| qwen3-embedding:8b | 4096 | ~16GB | #1 MTEB | Rejected: requires GPU, too heavy for dev machines |

OpenAI is still supported as a fallback via `EMBEDDING_PROVIDER=openai`.

**Critical constraint:** The embedding model is locked at collection creation time. All stored vectors must use the same model and dimension. Switching models requires a full re-embed of all data (`scripts/migrate-reembed.ts`).

### 3. Two collections, auto-routed by `memory_type`

A single collection mixing Slack summaries with coding rules degrades search quality: the semantic spaces of "team communication" and "technical constraints" are different enough to pollute results when merged.

Two collections are maintained:

```
coding    ← rule, decision, preference
workspace ← fact, summary
```

Routing is automatic: `store_memory` reads `memory_type` and writes to the correct collection. Agents never specify a collection when writing. Search defaults to both collections, with an optional `collections` filter.
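
The routing rule above can be sketched as a lookup table. This is a minimal illustration, not the package's actual code: the names `ROUTES` and `routeCollection` are hypothetical, and only the type-to-collection mapping itself comes from the design document.

```typescript
// Memory types and collections as listed in the design document.
type MemoryType = "rule" | "decision" | "preference" | "fact" | "summary";
type Collection = "coding" | "workspace";

// Hypothetical lookup table: every memory type maps to exactly one collection.
const ROUTES: Record<MemoryType, Collection> = {
  rule: "coding",
  decision: "coding",
  preference: "coding",
  fact: "workspace",
  summary: "workspace",
};

// Hypothetical helper: callers pass only the memory type, never a collection.
function routeCollection(memoryType: MemoryType): Collection {
  return ROUTES[memoryType];
}
```

Typing the table as `Record<MemoryType, Collection>` means that adding a new memory type without a route becomes a compile-time error rather than a silent misroute.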

### 4. Content-hash deduplication

Point IDs in Qdrant are the first 32 hex characters of SHA-256(`content`). Writing identical content twice produces one record (Qdrant upsert is idempotent). This prevents retrieval pollution from repeated ingestion of the same Slack message or rule.
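
A minimal sketch of the ID derivation described above, using Node's built-in `crypto` module. The function name `contentId` is hypothetical, and whether the package normalises the text (trimming, case folding) before hashing is not stated in the source, so this sketch hashes the string exactly as given.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a deterministic point ID from the memory text.
// Takes the first 32 hex characters (128 bits) of the SHA-256 digest.
function contentId(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex").slice(0, 32);
}
```

Because the ID depends only on the text, storing the same content twice targets the same point. Under this sketch, any difference in the text (even trailing whitespace) yields a different ID, so near-duplicates are not collapsed.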

### 5. MCP as the agent interface

The MCP stdio server exposes six tools:

| Tool | Purpose |
|---|---|
| `store_memory` | Write a memory (auto-routed, deduplicated) |
| `search_memory` | Hybrid vector + keyword + recency search across one or both collections |
| `delete_memory` | Remove a record by ID |
| `list_memories` | Paginated browse for audit / ID lookup |
| `correct_memory` | Supersede an incorrect/stale memory with a corrected replacement |
| `capture_episode` | Opt-in episode summary and durable observation capture |

MCP was chosen over a plain REST API for the agent interface because:
- Tool descriptions are embedded in the protocol, so agents discover capabilities without configuration
- Works with Claude Code, Claude Desktop, and any MCP-compliant host without custom integration
- Zod schema validation on all inputs prevents malformed queries reaching Qdrant

A separate HTTP ingest server (`server.ts`) handles bulk writes from automated pipelines (e.g. scheduled Slack summarisation jobs) where MCP overhead is unnecessary. It uses Bearer token auth and returns partial-success responses (HTTP 207) when some items in a batch fail.

### 6. Self-contained Docker stack

Qdrant and Ollama both run as Docker containers. The MCP server runs on the host (as a stdio process spawned by the agent host). This means:
- `docker compose up -d` in `db/` is the only infrastructure setup step
- Data persists in bind mounts (`db/qdrant_data/`, `db/ollama_data/`)
- Moving to a new machine = copy the folder + `docker compose up -d`
- No cloud dependencies, no API keys required for the default configuration

---

## Data Model

Each memory is stored as a Qdrant point:

```typescript
{
  id: string,                // SHA-256 hash of content (first 32 hex chars)
  vector: number[],          // 1024-dim embedding from bge-m3
  payload: {
    content: string,         // The knowledge text
    source: string,          // Origin: "slack/channel", "agent/claude-code", etc.
    memory_type: MemoryType, // "rule" | "decision" | "preference" | "fact" | "summary"
    tags: string[],          // Optional: specific tags for filtered retrieval
    created_at: string,      // ISO 8601 timestamp
    updated_at: string,      // ISO 8601 timestamp
    status: "active" | "superseded" | "deprecated",
    supersedes?: string,     // old memory ID replaced by this memory
    superseded_by?: string,  // new memory ID that replaced this memory
    correction_reason?: string,
    confidence?: number,     // 0-1 audit signal
    episode_id?: string,     // task/debug episode that produced this memory
    related_ids?: string[],
    valid_until?: string,
    last_verified_at?: string,
  }
}
```

### `memory_type` semantics

| Type | Collection | Meaning | Durability |
|---|---|---|---|
| `rule` | coding | Hard constraint (CI rule, style enforcement, hard limit) | Long-lived |
| `decision` | coding | Architecture or technology choice with rationale | Long-lived |
| `preference` | coding | Soft convention or team preference | Medium |
| `fact` | workspace | Current state: config values, team facts, integration details | Time-sensitive |
| `summary` | workspace | Distilled summary of a discussion, meeting, or Slack thread | Time-sensitive |

### `source` conventions

```
slack/<channel-name>       - from a Slack channel
agent/claude-code          - discovered by Claude Code during development
agent/<other-agent-name>   - discovered by another AI agent
manual/<topic>             - manually provided by a human
confluence/<page-title>    - from a Confluence page
ci/<pipeline-name>         - from CI/CD output
```

---

## Search Behaviour

Search embeds the query with the same model, then runs lifecycle-aware hybrid retrieval:

1. Query is embedded → 1024-dim vector
2. Active memories are searched by default; `superseded` and `deprecated` records are excluded unless `include_inactive=true`
3. Vector candidates come from Qdrant cosine similarity search
4. Keyword candidates come from lexical matching over `content`, `source`, `memory_type`, `tags`, and `episode_id`
5. Scores are blended from vector similarity, keyword score, and recency score; `rule` and `decision` do not decay, while `fact`, `summary`, and `preference` receive recency weighting
6. MMR re-ranks merged candidates to reduce near-duplicate results
7. Optional filters are applied: `source`, `tags`, `memory_type`, `score_threshold`

Supported modes: `hybrid` (default), `vector`, and `keyword`. Recommended `score_threshold`: **0.5** for general search, **0.65** when precision matters.
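
The score blending in step 5 could look roughly like the sketch below. The source does not give the actual weights or decay curve, so the 0.6 / 0.3 / 0.1 weights, the 90-day half-life, and the choice to give non-decaying types a constant recency score of 1 are all illustrative assumptions. The names `Candidate`, `recencyScore` and `blendedScore` are hypothetical.

```typescript
type MemoryType = "rule" | "decision" | "preference" | "fact" | "summary";

interface Candidate {
  id: string;
  memoryType: MemoryType;
  vectorScore: number;  // cosine similarity in [0, 1]
  keywordScore: number; // lexical match score in [0, 1]
  createdAt: string;    // ISO 8601 timestamp
}

// Assumed values for illustration only; not taken from the package.
const WEIGHTS = { vector: 0.6, keyword: 0.3, recency: 0.1 };
const HALF_LIFE_DAYS = 90;
const NON_DECAYING = new Set<MemoryType>(["rule", "decision"]);

// Exponential decay by age; rules and decisions never decay.
function recencyScore(c: Candidate, now: Date): number {
  if (NON_DECAYING.has(c.memoryType)) return 1;
  const ageDays = Math.max(0, (now.getTime() - Date.parse(c.createdAt)) / 86_400_000);
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

// Weighted sum of the three signals.
function blendedScore(c: Candidate, now: Date): number {
  return (
    WEIGHTS.vector * c.vectorScore +
    WEIGHTS.keyword * c.keywordScore +
    WEIGHTS.recency * recencyScore(c, now)
  );
}
```

With this shape, two `fact` memories with identical vector and keyword scores are ordered by age, while an old `rule` is never penalised for its age.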

## Feedback Loop and Episodes

`correct_memory` handles the correction loop: user or agent provides corrected content and a reason, the system writes a new active memory, marks the old record `superseded`, and links both sides via `supersedes` / `superseded_by`.

`capture_episode` handles opt-in SDLC/debug memory. It stores a compact episode summary plus durable observations supplied by the caller. It intentionally does not capture raw conversation or tool traces; noisy automatic trace ingestion is out of scope.
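
The correction loop described above can be sketched against an in-memory `Map` standing in for Qdrant. This is an assumption-laden illustration: the `MemoryRecord` shape is a trimmed-down version of the payload, `correctMemory` is a hypothetical name, and placing `correction_reason` on the replacement record is a guess, since the source does not say which side carries it.

```typescript
import { createHash } from "node:crypto";

// Trimmed-down record; the real payload has more fields.
interface MemoryRecord {
  id: string;
  content: string;
  status: "active" | "superseded" | "deprecated";
  supersedes?: string;
  superseded_by?: string;
  correction_reason?: string;
  updated_at: string;
}

// Hypothetical helper: write the replacement, then mark and link the old record.
function correctMemory(
  store: Map<string, MemoryRecord>,
  oldId: string,
  correctedContent: string,
  reason: string,
  now: Date = new Date(),
): MemoryRecord {
  const old = store.get(oldId);
  if (!old) throw new Error(`memory ${oldId} not found`);

  // Same content-hash ID scheme as the deduplication section.
  const newId = createHash("sha256").update(correctedContent, "utf8").digest("hex").slice(0, 32);
  if (newId === oldId) throw new Error("corrected content is identical to the original");

  const ts = now.toISOString();
  const replacement: MemoryRecord = {
    id: newId,
    content: correctedContent,
    status: "active",
    supersedes: oldId,
    correction_reason: reason,
    updated_at: ts,
  };
  store.set(newId, replacement);
  store.set(oldId, { ...old, status: "superseded", superseded_by: newId, updated_at: ts });
  return replacement;
}
```

Keeping the old record (rather than deleting it) preserves an audit trail: a reader can follow `superseded_by` forward or `supersedes` backward.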

---

## Agent Skills

Three skills guide agent behaviour around memory:

**`memory-search`**: triggers automatically when starting work on a feature, discussing architecture, or when the user asks to look something up. Chooses the right collection and filters, caps at two search attempts per topic, and applies results directly to work rather than just summarising them back.

**`memory-save`**: triggers when a coding rule is discovered, a decision is ratified, or a configuration fact is established. Applies a three-question filter (durable? non-obvious? actionable?) before saving. Writes content in structured templates optimised for future semantic retrieval.

**`memory-correct`**: covers when and how to correct an incorrect or stale memory with `correct_memory`.

Install the skills from `skills/` into `~/.claude/skills/`.

---

## Migration Scripts

### `scripts/migrate.ts`

One-time migration from the original single `knowledge` collection to the dual `coding`/`workspace` structure. Infers `memory_type` from `source` and `tags`. Copies vectors directly (no re-embed needed: same model, same dimensions).

### `scripts/migrate-reembed.ts`

Used when switching embedding models. Drops and recreates all collections at the new vector size, then re-embeds every record using the currently configured embedding provider. Supports dry-run mode (`MIGRATE_DRY_RUN=true`). Includes an Ollama health-check with retry so it works in automated environments where the model is still loading.

---

## What Is Not in Scope

- **Multi-tenancy / ACLs:** All agents share the same collections. If per-user or per-team isolation is needed, implement separate collections with a naming convention (e.g. `coding_teamA`) and route via `source` filters.
- **Chunking:** Long documents are stored as single vectors. Content over ~500 tokens will have degraded embedding quality. For long-form ingestion, chunk before calling `store_memory`.
- **Raw trace capture:** The system does not automatically ingest full conversations or tool traces. Episode capture is explicit and summary-based.
- **Background reflection / promotion:** There is no background agent that promotes short-term facts to long-term decisions without an explicit `capture_episode` or `store_memory` call.
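
Since chunking is left to the caller, a pre-processing step along the following lines is one option. It is a rough sketch under stated assumptions: `chunkText` is a hypothetical helper, whitespace-separated words are used as a crude stand-in for tokens, and the default budget of 350 words is an arbitrary choice meant to stay under the ~500-token guideline, not a value from the package.

```typescript
// Hypothetical helper: split text into chunks of at most `maxWords` words,
// keeping paragraphs together where possible.
function chunkText(text: string, maxWords = 350): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current: string[] = [];
  let count = 0;

  const flush = () => {
    if (current.length > 0) chunks.push(current.join("\n\n"));
    current = [];
    count = 0;
  };

  for (const p of paragraphs) {
    const words = p.split(/\s+/);
    if (words.length > maxWords) {
      // Oversized paragraph: emit what we have, then hard-split by word count.
      flush();
      for (let i = 0; i < words.length; i += maxWords) {
        chunks.push(words.slice(i, i + maxWords).join(" "));
      }
      continue;
    }
    // Start a new chunk if adding this paragraph would exceed the budget.
    if (count + words.length > maxWords) flush();
    current.push(p);
    count += words.length;
  }
  flush();
  return chunks;
}
```

Each returned chunk would then be passed to `store_memory` as its own `content`, ideally with shared `tags` so the pieces can be retrieved together.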
package/README.md
ADDED
|
@@ -0,0 +1,484 @@
# Memory MCP

A self-contained AI memory store for agents and developers. Stores coding rules, architecture decisions, team discussions, and project facts as semantic vectors. Any MCP-compatible agent can search and save memories across sessions.

**Stack:** Qdrant (vector DB) + Ollama bge-m3 (embedding) + MCP stdio server. Qdrant and Ollama run in containers, with zero cloud dependencies.

---

## Architecture

```
memory-mcp/
├── db/
│   ├── docker-compose.yml     # Qdrant + Ollama containers
│   ├── .env.example           # Environment variable template
│   ├── qdrant_data/           # Persisted vector data (gitignored)
│   └── ollama_data/           # Persisted model cache (gitignored)
├── src/
│   ├── index.ts               # MCP stdio server
│   ├── server.ts              # HTTP bulk-ingest server
│   ├── embedding.ts           # Embedding provider abstraction (Ollama / OpenAI)
│   ├── qdrant.ts              # Qdrant client: dual-collection, auto-routing
│   └── tools/
│       ├── store.ts           # store_memory tool
│       ├── search.ts          # search_memory tool
│       ├── delete.ts          # delete_memory tool
│       ├── list.ts            # list_memories tool
│       ├── correct.ts         # correct_memory tool: supersession chain
│       └── episode.ts         # capture_episode tool: opt-in task summaries
├── scripts/
│   ├── migrate.ts             # One-time migration: old collection → coding/workspace
│   └── migrate-reembed.ts     # Re-embed all data when switching embedding model
└── skills/
    ├── memory-search/         # Agent skill: when and how to search memory
    ├── memory-save/           # Agent skill: when and how to save to memory
    └── memory-correct/        # Agent skill: when and how to correct memory
```

### Collections

Memories are auto-routed to one of two Qdrant collections based on `memory_type`:

| Collection | memory_type values | Content |
|---|---|---|
| `coding` | `rule`, `decision`, `preference` | Coding constraints, architecture choices, tool preferences |
| `workspace` | `fact`, `summary` | Team discussions, Slack summaries, project facts, config values |

---

## Quick Start

### Step 1: Start the database and embedding model

```bash
cd db
docker compose up -d
```

This starts:
- **Qdrant** on `http://localhost:6333` (vector database)
- **Ollama** on `http://localhost:11434` (embedding server, auto-pulls `bge-m3` on first run)

First run takes 1–2 minutes while bge-m3 (~1.1 GB) downloads. Check progress:

```bash
docker compose logs -f ollama
```

Verify both are ready:

```bash
curl http://localhost:6333/collections   # Qdrant
curl http://localhost:11434/api/tags     # Ollama, should list bge-m3
```

> **Note:** If you have Ollama running locally on port 11434, stop it first to avoid port conflicts:
> ```bash
> killall ollama 2>/dev/null || true
> ```

### Step 2: Configure environment

```bash
cp db/.env.example .env
```

The defaults work out of the box with the docker-compose setup. Edit `.env` only if you need to customise:

```env
QDRANT_URL=http://localhost:6333
CODING_COLLECTION=coding
WORKSPACE_COLLECTION=workspace

EMBEDDING_PROVIDER=ollama          # or "openai"
EMBEDDING_DIMENSIONS=1024          # must match the model (bge-m3 = 1024)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=bge-m3

# Only needed if EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=sk-...

INGEST_API_KEY=your-secret-key     # for HTTP bulk-ingest auth
PORT=3000
```
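
To make the relationship between these variables concrete, here is a sketch of how they might be read and checked at startup. It is not the package's loader: `MemoryConfig` and `loadConfig` are hypothetical names, and the fallback values simply mirror the defaults shown in the `.env` template above.

```typescript
interface MemoryConfig {
  qdrantUrl: string;
  codingCollection: string;
  workspaceCollection: string;
  embeddingProvider: "ollama" | "openai";
  embeddingDimensions: number;
  ollamaBaseUrl: string;
  ollamaModel: string;
  openaiApiKey?: string;
}

// Hypothetical loader: reads an env-like object and fails fast on bad values.
function loadConfig(env: Record<string, string | undefined>): MemoryConfig {
  const provider = env.EMBEDDING_PROVIDER ?? "ollama";
  if (provider !== "ollama" && provider !== "openai") {
    throw new Error(`unsupported EMBEDDING_PROVIDER: ${provider}`);
  }

  const dims = Number.parseInt(env.EMBEDDING_DIMENSIONS ?? "1024", 10);
  if (!Number.isInteger(dims) || dims <= 0) {
    throw new Error("EMBEDDING_DIMENSIONS must be a positive integer");
  }

  // The OpenAI key is only needed when that provider is selected.
  if (provider === "openai" && !env.OPENAI_API_KEY) {
    throw new Error("OPENAI_API_KEY is required when EMBEDDING_PROVIDER=openai");
  }

  return {
    qdrantUrl: env.QDRANT_URL ?? "http://localhost:6333",
    codingCollection: env.CODING_COLLECTION ?? "coding",
    workspaceCollection: env.WORKSPACE_COLLECTION ?? "workspace",
    embeddingProvider: provider,
    embeddingDimensions: dims,
    ollamaBaseUrl: env.OLLAMA_BASE_URL ?? "http://localhost:11434",
    ollamaModel: env.OLLAMA_MODEL ?? "bge-m3",
    openaiApiKey: env.OPENAI_API_KEY,
  };
}
```

In a Node process this would be called as `loadConfig(process.env)`. Validating `EMBEDDING_DIMENSIONS` early matters because a mismatch with the model only surfaces later, when vectors are written to a collection of a different size.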

### Step 3: Install dependencies

Only needed for local development from this repo:

```bash
npm install
```

### Step 4: Wire up the MCP server

For normal usage, run the MCP server via npm with `npx`. Add this to your Claude Code / Claude Desktop MCP config (`.claude/settings.json` or `claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "memory-mcp": {
      "type": "stdio",
      "command": "npx",
      "args": [
        "-y",
        "@cchez/memory-mcp@latest"
      ],
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "EMBEDDING_PROVIDER": "ollama",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "bge-m3",
        "EMBEDDING_DIMENSIONS": "1024",
        "CODING_COLLECTION": "coding",
        "WORKSPACE_COLLECTION": "workspace"
      }
    }
  }
}
```

For local development before publishing, point the MCP host at the source file instead:

```json
{
  "mcpServers": {
    "memory-mcp": {
      "type": "stdio",
      "command": "node",
      "args": [
        "/absolute/path/to/memory-mcp/node_modules/tsx/dist/cli.mjs",
        "/absolute/path/to/memory-mcp/src/index.ts"
      ],
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "EMBEDDING_PROVIDER": "ollama",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "bge-m3",
        "EMBEDDING_DIMENSIONS": "1024",
        "CODING_COLLECTION": "coding",
        "WORKSPACE_COLLECTION": "workspace"
      }
    }
  }
}
```

> **Why `node` instead of `npm run dev` for local development?** Some MCP hosts don't reliably pass the `cwd` field to the spawned process, causing npm to look for `package.json` in the wrong directory. Invoking `node` with absolute paths bypasses npm entirely and works regardless of working directory.

Replace both `/absolute/path/to/memory-mcp` occurrences with the actual path to this repo (e.g. `/Users/you/projects/memory-mcp`).

Restart Claude Code / Claude Desktop. The MCP tools will be available immediately.

### Step 5: Install agent skills (optional but recommended)

Copy the skills into your Claude skills directory so agents automatically know when to search and save memory:

```bash
cp -r skills/memory-search ~/.claude/skills/
cp -r skills/memory-save ~/.claude/skills/
cp -r skills/memory-correct ~/.claude/skills/
```

Restart Claude Code to pick up the new skills.

---

## Publishing to npmjs

The package is configured as `@cchez/memory-mcp` and exposes one executable bin, `memory-mcp`, backed by `dist/index.js`. The package uses a `files` allowlist so local Qdrant/Ollama data is not published.

Before publishing:

```bash
npm run build
npm run pack:dry-run
```

First-time publish for a scoped public package:

```bash
npm login --registry=https://registry.npmjs.org/
npm publish --access public --registry=https://registry.npmjs.org/
```

After publish, agent configs can use:

```json
{
  "command": "npx",
  "args": ["-y", "@cchez/memory-mcp@latest"]
}
```

The npm package only runs the MCP server. Qdrant and Ollama still need to be running separately, for example with `db/docker-compose.yml` from this repo.

### Step 6: Global instruction

```
# Memory

The knowledge base (memory mcp) is a core asset, accessed through memory mcp.

## Search: look before you act

Before starting any task, search the knowledge base for relevant context:
- Starting to write code or fix a bug → search for related coding rules or constraints
- Discussing architecture or technology choices → search for existing decisions
- Encountering the name of a project, system, or integration → search for related background and configuration facts
- The user asks "how did we handle X before" → search the knowledge base, don't guess

When searching, set `score_threshold: 0.5` to filter out low-relevance results. If unsure which collection to search, search both (the default behaviour). Search at most twice per topic (retry once with different wording); if there are still no results, stop and do not loop.

## Save: the three-question filter

Save proactively, without being reminded by the user, when you discover:
- A coding constraint or rule ("method complexity must not exceed 15, otherwise CI fails")
- An architecture or technology decision ("chose Qdrant because…")
- A configuration fact ("the Airwallex RFI KYC template ID is xxx")
- The resolution of a non-obvious ambiguity

When the user says "remember this", save immediately.

Before saving, ask three questions: (1) Durable? Will it still hold in 3 months? (2) Non-obvious? Would a newcomer not already know it? (3) Reusable? Will it affect future work? Save only if all three answers are yes. Do not save: task status, transient information, or common knowledge.

## memory_type classification

| memory_type | What to save | Auto-routed to |
|---|---|---|
| `rule` | Coding constraints, CI rules, mandatory standards | coding |
| `decision` | Technology choices, architecture decisions (with rationale) | coding |
| `preference` | Team preferences, soft conventions | coding |
| `fact` | Config values, current team state, integration details | workspace |
| `summary` | Distilled summaries of Slack threads, meetings, documents | workspace |

The collection does not need to be specified manually; it is routed automatically by `memory_type`.

## source format

`slack/<channel-name>` / `agent/claude-code` / `confluence/<page-title>` / `manual/<topic>`

## Deleting stale memories

When you find incorrect or outdated content in the knowledge base, use `correct_memory` to write a revised version and supersede the old memory. Use `delete_memory(id, collection)` only for sensitive information, duplicate junk, or records whose audit history genuinely should not be kept.
```

---

## MCP Tools Reference

### `store_memory`

Save a piece of knowledge. Content with identical text is automatically deduplicated (idempotent upsert via SHA-256 hash).

| Parameter | Type | Required | Description |
|---|---|---|---|
| `content` | string | yes | The knowledge to store (1–5 sentences, written for future retrieval) |
| `source` | string | yes | Origin: `slack/channel-name`, `agent/claude-code`, `confluence/page-title`, `manual/...` |
| `memory_type` | enum | yes | `rule` / `decision` / `preference` / `fact` / `summary` |
| `tags` | string[] | no | 2–5 specific tags for filtered retrieval |
| `status` | enum | no | `active` / `superseded` / `deprecated` (default: `active`) |
| `confidence` | number | no | Confidence score 0–1 |
| `episode_id` | string | no | Task/debug episode that produced the memory |
| `related_ids` | string[] | no | Related memory IDs |
| `supersedes` | string | no | Older memory ID replaced by this memory |
| `valid_until` | string | no | ISO timestamp for time-sensitive facts |
| `last_verified_at` | string | no | ISO timestamp for last verification |

Returns: `{ id, collection }`
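
As an illustration of the required fields in the table above, the sketch below checks a `store_memory` argument object before it is sent. The design document says the server itself validates inputs with Zod; this dependency-free version is only a stand-in for that idea, covers a subset of the parameters, and uses hypothetical names (`StoreMemoryArgs`, `validateStoreArgs`).

```typescript
const MEMORY_TYPES = ["rule", "decision", "preference", "fact", "summary"] as const;
type MemoryType = (typeof MEMORY_TYPES)[number];

// Subset of the documented parameters, for illustration.
interface StoreMemoryArgs {
  content: string;
  source: string;
  memory_type: MemoryType;
  tags?: string[];
  confidence?: number;
}

// Hypothetical validator: throws on the first problem it finds.
function validateStoreArgs(input: unknown): StoreMemoryArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("arguments must be an object");
  }
  const a = input as Record<string, unknown>;
  const { content, source, memory_type, tags, confidence } = a;

  if (typeof content !== "string" || content.trim() === "") {
    throw new Error("content is required");
  }
  if (typeof source !== "string" || source.trim() === "") {
    throw new Error("source is required");
  }
  if (typeof memory_type !== "string" || !(MEMORY_TYPES as readonly string[]).includes(memory_type)) {
    throw new Error("memory_type must be one of: " + MEMORY_TYPES.join(", "));
  }
  if (tags !== undefined && !(Array.isArray(tags) && tags.every((t) => typeof t === "string"))) {
    throw new Error("tags must be an array of strings");
  }
  // The range check is written so that NaN is rejected as well.
  if (confidence !== undefined && !(typeof confidence === "number" && confidence >= 0 && confidence <= 1)) {
    throw new Error("confidence must be a number between 0 and 1");
  }

  const result: StoreMemoryArgs = { content, source, memory_type: memory_type as MemoryType };
  if (tags !== undefined) result.tags = tags as string[];
  if (confidence !== undefined) result.confidence = confidence as number;
  return result;
}
```

A call such as `validateStoreArgs({ content: "...", source: "agent/claude-code", memory_type: "rule", tags: ["ci"] })` returns the typed object, while a missing or misspelled `memory_type` fails before any request is made.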

### `search_memory`

Hybrid search across one or both collections. The default `hybrid` mode combines vector similarity, exact keyword matching, light recency weighting, and MMR de-duplication. Superseded and deprecated memories are hidden unless explicitly requested.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | none | Natural language query |
| `collections` | array | both | `["coding"]`, `["workspace"]`, or omit for both |
| `limit` | number | 5 | Max results (1–20) |
| `score_threshold` | number | none | Min similarity 0–1 (recommended: 0.5) |
| `memory_type` | enum | none | Filter by type |
| `tags` | string[] | none | Filter: all specified tags must match |
| `source` | string | none | Filter by exact source string |
| `include_inactive` | boolean | false | Include `superseded` / `deprecated` memories |
| `mode` | enum | `hybrid` | `vector`, `keyword`, or `hybrid` |
| `use_recency` | boolean | true | Apply recency weighting to time-sensitive memories |
| `use_mmr` | boolean | true | Reduce duplicate-looking results |

Returns: array of `{ id, content, source, memory_type, collection, tags, score, vector_score, keyword_score, recency_score, status, created_at, updated_at, ...lifecycle_fields }`

### `correct_memory`

Create a corrected replacement for an existing memory. The old memory is marked `superseded`; the new memory is written as `active` with `supersedes` pointing at the old ID.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `id` | string | yes | Old memory ID |
| `collection` | enum | yes | Old memory collection: `coding` or `workspace` |
| `corrected_content` | string | yes | Replacement memory content |
| `correction_reason` | string | yes | Why the old memory is wrong/stale |
| `source` | string | no | Defaults to old source |
| `memory_type` | enum | no | Defaults to old type |
| `tags` | string[] | no | Defaults to old tags |
| `confidence` | number | no | Defaults to `0.9` |

### `capture_episode`

Opt-in capture for task/debug episodes. This does **not** ingest raw conversation or tool traces; callers provide a compact summary plus durable observations that passed a quality filter.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `episode_id` | string | yes | Task/debug episode identifier |
| `source` | string | yes | Usually `agent/claude-code` or another agent source |
| `summary` | string | yes | What happened, key attempts, final conclusion |
| `observations` | array | yes | 1–10 structured memories with `content`, `memory_type`, optional `tags` and `confidence` |
| `related_ids` | string[] | no | Existing memory IDs related to this episode |

### `delete_memory`

Delete a specific memory entry by ID.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `id` | string | yes | ID returned by `store_memory` |
| `collection` | enum | yes | `coding` or `workspace` |

### `list_memories`

Browse memory entries with pagination. Useful for auditing or finding IDs.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `collection` | enum | none | `coding` or `workspace` (required) |
| `limit` | number | 20 | Max results (1–100) |
| `offset` | number | 0 | Pagination offset |
| `source` | string | none | Filter by source |
| `memory_type` | enum | none | Filter by type |
| `include_inactive` | boolean | false | Include superseded/deprecated memories for audit |

---

## Example Prompts

### Saving memories

```
Remember this rule: all service methods must have cyclomatic complexity ≤ 15,
otherwise the CI pipeline fails. Source is agent/claude-code, type is rule,
tags: ci, complexity, typescript.
```

```
Save to memory: we decided to use SHA-256 content hash as the Qdrant point ID
for deduplication. Same content written twice produces one record.
Source: agent/claude-code, type: decision, tags: qdrant, deduplication, architecture.
```

```
Note down: Airwallex RFI hosted flow KYC template ID is "kyc_rfi_v2_au".
Source: manual/airwallex-config, type: fact, tags: airwallex, rfi, configuration.
```

### Searching memories

```
Search memory for any rules about code complexity limits.
```

```
What do we know about our Airwallex integration configuration?
Search the workspace collection.
```

```
Find all architecture decisions we've made. Search coding collection,
memory_type decision, score threshold 0.5.
```

```
List everything in the coding collection so I can audit what rules are stored.
```

### Deleting stale memories

```
Delete memory id "a3f2b1c9..." from the workspace collection.
```

---

## HTTP Bulk Ingest

For automated pipelines (e.g. daily Slack summarisation jobs):

```bash
npm run dev:server
```

```bash
curl -X POST http://localhost:3000/ingest \
  -H "Authorization: Bearer <INGEST_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "memories": [
      {
        "content": "Pod Pay Pilots weekly update 2026-04-22: document upload UI refactor completed. UAT sign-off from Katrina Li.",
        "source": "slack/pod-pay-pilots",
        "memory_type": "summary",
        "tags": ["weekly-update", "document-upload", "uat"]
      },
      {
        "content": "Decision: LaunchDarkly flag AU_discoverability_beanie_events controls Beanie event emission for AU bill pay. Default off in production.",
        "source": "slack/pod-pay-pilots-engineers",
        "memory_type": "fact",
        "tags": ["launchdarkly", "feature-flag", "au-bill-pay"]
      }
    ]
  }'
```

Response (all succeeded):
```json
{ "stored": 2, "failed": 0, "succeeded": [{"index": 0, "id": "..."}, {"index": 1, "id": "..."}] }
```

Response (partial failure, HTTP 207):
```json
{ "stored": 1, "failed": 1, "succeeded": [...], "errors": [{"index": 1, "error": "..."}] }
```
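
A pipeline calling this endpoint needs to tell full success apart from partial failure. The sketch below shows one way to pick out the items worth retrying from a response shaped like the examples above. It is a hedged illustration: `IngestResult` and `itemsToRetry` are hypothetical names, and treating any other 2xx status as full success is an assumption, since the source only documents the 207 case explicitly.

```typescript
// Response shape taken from the example responses above.
interface IngestResult {
  stored: number;
  failed: number;
  succeeded: { index: number; id: string }[];
  errors?: { index: number; error: string }[];
}

// Hypothetical helper: given the batch that was sent, the HTTP status, and the
// parsed response body, return the items that failed and could be retried.
function itemsToRetry<T>(batch: T[], status: number, result: IngestResult): T[] {
  if (status === 207) {
    const retry: T[] = [];
    for (const e of result.errors ?? []) {
      // `index` refers to the item's position in the submitted batch.
      if (e.index >= 0 && e.index < batch.length) retry.push(batch[e.index]);
    }
    return retry;
  }
  // Assumption: any other 2xx status means every item was stored.
  if (status >= 200 && status < 300) return [];
  throw new Error(`ingest request failed with HTTP ${status}`);
}
```

Because stored memories are deduplicated by content hash, resending a failed item (or even the whole batch) should not create duplicates of items that already succeeded.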

Health check:
```bash
curl http://localhost:3000/health
# {"status":"ok"}
```

---

## Switching Embedding Models

The embedding model is a **hard infrastructure decision**: all stored vectors must use the same model. To switch models:

1. Update `.env`: set `OLLAMA_MODEL` (or `EMBEDDING_PROVIDER=openai`) and `EMBEDDING_DIMENSIONS`
2. Run the re-embed migration:

```bash
# Dry-run first
MIGRATE_DRY_RUN=true node --env-file=.env ./node_modules/tsx/dist/cli.mjs scripts/migrate-reembed.ts

# Execute
MIGRATE_DRY_RUN=false node --env-file=.env ./node_modules/tsx/dist/cli.mjs scripts/migrate-reembed.ts
```

This drops and recreates all collections at the new vector size, then re-embeds every stored record.

---

## Stop Services

```bash
cd db
docker compose down      # stop containers, keep data
docker compose down -v   # stop containers and delete all data
```
package/db/.env.example
ADDED
|
@@ -0,0 +1,16 @@
QDRANT_URL=http://localhost:6333
CODING_COLLECTION=coding
WORKSPACE_COLLECTION=workspace

EMBEDDING_PROVIDER=ollama
EMBEDDING_DIMENSIONS=1024

# OpenAI (only if EMBEDDING_PROVIDER=openai)
OPENAI_API_KEY=sk-...

# Ollama: points to the ollama container in docker-compose
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=bge-m3

INGEST_API_KEY=your-secret-key
PORT=3000