mcp-probe-kit 3.0.18 → 3.0.21

Files changed (62)
  1. package/README.md +87 -55
  2. package/build/index.js +3 -1
  3. package/build/lib/__tests__/agents-md-template.unit.test.d.ts +1 -0
  4. package/build/lib/__tests__/agents-md-template.unit.test.js +27 -0
  5. package/build/lib/__tests__/memory-config.unit.test.js +9 -0
  6. package/build/lib/__tests__/memory-injection.unit.test.d.ts +1 -0
  7. package/build/lib/__tests__/memory-injection.unit.test.js +51 -0
  8. package/build/lib/__tests__/memory-orchestration.unit.test.d.ts +1 -0
  9. package/build/lib/__tests__/memory-orchestration.unit.test.js +84 -0
  10. package/build/lib/__tests__/memory-payload.unit.test.d.ts +1 -0
  11. package/build/lib/__tests__/memory-payload.unit.test.js +35 -0
  12. package/build/lib/__tests__/project-context-layout.unit.test.d.ts +1 -0
  13. package/build/lib/__tests__/project-context-layout.unit.test.js +80 -0
  14. package/build/lib/agents-md-template.d.ts +25 -0
  15. package/build/lib/agents-md-template.js +57 -0
  16. package/build/lib/memory-client.d.ts +8 -1
  17. package/build/lib/memory-client.js +53 -44
  18. package/build/lib/memory-config.d.ts +8 -0
  19. package/build/lib/memory-config.js +19 -0
  20. package/build/lib/memory-orchestration.d.ts +10 -3
  21. package/build/lib/memory-orchestration.js +146 -7
  22. package/build/lib/memory-payload.d.ts +21 -0
  23. package/build/lib/memory-payload.js +65 -0
  24. package/build/lib/merge-agents-md.d.ts +6 -0
  25. package/build/lib/merge-agents-md.js +51 -0
  26. package/build/lib/project-context-layout.d.ts +78 -0
  27. package/build/lib/project-context-layout.js +350 -0
  28. package/build/lib/workspace-root.js +6 -1
  29. package/build/resources/ui-ux-data/metadata.json +1 -1
  30. package/build/schemas/index.d.ts +62 -11
  31. package/build/schemas/memory-tools.d.ts +38 -9
  32. package/build/schemas/memory-tools.js +24 -9
  33. package/build/schemas/project-tools.d.ts +24 -2
  34. package/build/schemas/project-tools.js +24 -2
  35. package/build/tools/__tests__/code_insight.unit.test.js +3 -3
  36. package/build/tools/__tests__/init_project_context.unit.test.js +32 -21
  37. package/build/tools/__tests__/start_feature.unit.test.js +2 -1
  38. package/build/tools/code_insight.js +11 -9
  39. package/build/tools/index.d.ts +1 -0
  40. package/build/tools/index.js +1 -0
  41. package/build/tools/init_project_context.js +563 -506
  42. package/build/tools/memorize_asset.js +12 -0
  43. package/build/tools/scan_and_extract_patterns.js +7 -7
  44. package/build/tools/search_memory.d.ts +7 -0
  45. package/build/tools/search_memory.js +57 -0
  46. package/build/tools/start_bugfix.js +257 -251
  47. package/build/tools/start_feature.js +140 -134
  48. package/build/tools/start_ui.js +405 -405
  49. package/docs/.mcp-probe/layout.json +11 -0
  50. package/docs/data/tools.js +18 -0
  51. package/docs/i18n/all-tools/en.json +6 -1
  52. package/docs/i18n/all-tools/ja.json +2 -1
  53. package/docs/i18n/all-tools/ko.json +2 -1
  54. package/docs/i18n/all-tools/zh-CN.json +7 -2
  55. package/docs/i18n/en.json +38 -7
  56. package/docs/i18n/ja.json +9 -2
  57. package/docs/i18n/ko.json +9 -2
  58. package/docs/i18n/zh-CN.json +40 -9
  59. package/docs/memory-local-setup.md +314 -0
  60. package/docs/memory-local-setup.zh-CN.md +283 -0
  61. package/docs/pages/getting-started.html +252 -33
  62. package/package.json +2 -2
package/docs/memory-local-setup.md
@@ -0,0 +1,314 @@
1
+ # Local Memory Stack (Qdrant + Nomic Embed)
2
+
3
+ This guide documents a **lightweight local setup** for mcp-probe-kit memory tools (`search_memory`, `memorize_asset`, `read_memory_asset`, `scan_and_extract_patterns`):
4
+
5
+ - **Qdrant** — vector database (port `50008`)
6
+ - **Infinity (nomic-embed)** — embedding API (port `50012`), replaces Ollama for users who want a smaller footprint
7
+
8
+ Both services are typically deployed with Docker Compose under a shared `docker-start` layout. Ports use the `500xx` convention to avoid conflicts.
9
+
10
+ | Service | Host port | Container port | Purpose |
11
+ |---------|-----------|----------------|---------|
12
+ | Qdrant HTTP | `50008` | `6333` | REST API, dashboard |
13
+ | Qdrant gRPC | `50009` | `6334` | gRPC API |
14
+ | Nomic Embed (Infinity) | `50012` | `7997` | OpenAI-compatible embeddings |
15
+
16
+ ---
17
+
18
+ ## 1. Qdrant
19
+
20
+ ### Service info
21
+
22
+ | Item | Value |
23
+ |------|-------|
24
+ | Image | `qdrant/qdrant:latest` |
25
+ | Container name | `qdrant` |
26
+ | HTTP | `http://127.0.0.1:50008` |
27
+ | gRPC | `127.0.0.1:50009` |
28
+ | Data | `./data` → `/qdrant/storage` |
29
+ | Snapshots | `./snapshots` → `/qdrant/snapshots` |
30
+ | Auth | `QDRANT_API_KEY` in `.env` (header `api-key`) |
31
+
32
+ ### `docker-compose.yml`
33
+
34
+ ```yaml
35
+ services:
36
+   qdrant:
37
+     image: qdrant/qdrant:latest
38
+     container_name: qdrant
39
+     restart: always
40
+     env_file:
41
+       - .env
42
+     ports:
43
+       - "50008:6333"
44
+       - "50009:6334"
45
+     volumes:
46
+       - ./data:/qdrant/storage
47
+       - ./snapshots:/qdrant/snapshots
48
+     environment:
49
+       - QDRANT__SERVICE__HTTP_PORT=6333
50
+       - QDRANT__SERVICE__GRPC_PORT=6334
51
+       - QDRANT__LOG_LEVEL=INFO
52
+       - QDRANT__SERVICE__API_KEY=${QDRANT_API_KEY}
53
+     healthcheck:
54
+       test:
55
+         - "CMD"
56
+         - "bash"
57
+         - "-c"
58
+         - "exec 3<>/dev/tcp/127.0.0.1/6333 && printf 'GET /collections HTTP/1.1\r\nHost: localhost\r\napi-key: ${QDRANT_API_KEY}\r\nConnection: close\r\n\r\n' >&3 && IFS= read -r line <&3 && [[ \"$$line\" == *\"200\"* ]]"
59
+       interval: 30s
60
+       timeout: 10s
61
+       retries: 5
62
+       start_period: 30s
63
+ ```
64
+
65
+ ### `.env.example`
66
+
67
+ ```bash
68
+ # copy: cp .env.example .env
69
+ # generate key: python -c "import secrets; print(secrets.token_urlsafe(32))"
70
+
71
+ QDRANT_API_KEY=change-me-to-a-long-random-string
72
+ QDRANT_URL=http://127.0.0.1:50008
73
+ ```
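The key-generation one-liner in the comments above can also be run as a small script. This is a minimal sketch: the helper names `generate_api_key` and `render_env_line` are illustrative assumptions and are not part of mcp-probe-kit or Qdrant.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random token, as in the one-liner from .env.example."""
    return secrets.token_urlsafe(num_bytes)


def render_env_line(name: str, value: str) -> str:
    """Format one NAME=value line for a .env file (hypothetical helper)."""
    return f"{name}={value}"


if __name__ == "__main__":
    # Paste the printed line into qdrant/.env in place of the placeholder.
    print(render_env_line("QDRANT_API_KEY", generate_api_key()))
```

The same value must later be supplied to mcp-probe-kit as `MEMORY_QDRANT_API_KEY`.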
74
+
75
+ ### First deploy
76
+
77
+ ```powershell
78
+ cd qdrant
79
+ copy .env.example .env
80
+ # Edit .env — set a long random QDRANT_API_KEY
81
+ docker compose up -d
82
+ ```
83
+
84
+ ### Verify
85
+
86
+ ```bash
87
+ # Health / collections (requires api-key when API_KEY is enabled)
88
+ curl http://127.0.0.1:50008/collections \
89
+ -H "api-key: YOUR_QDRANT_API_KEY"
90
+
91
+ # Web UI (enter the same key in the dashboard)
92
+ # http://127.0.0.1:50008/dashboard
93
+ ```
94
+
95
+ ### Common commands
96
+
97
+ ```powershell
98
+ docker compose up -d # start
99
+ docker compose down # stop
100
+ docker compose logs -f qdrant # logs
101
+ docker compose restart qdrant # restart
102
+ ```
103
+
104
+ ### Notes
105
+
106
+ - After enabling `QDRANT__SERVICE__API_KEY`, **all** REST/gRPC requests must include header `api-key`.
107
+ - mcp-probe-kit sends this via `MEMORY_QDRANT_API_KEY`.
108
+ - Collection `mcp_probe_memory` is created automatically on first `memorize_asset` write (Cosine distance; vector size inferred from the first embedding).
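To illustrate what "vector size inferred from the first embedding" could look like, here is a hedged sketch that builds the JSON body for Qdrant's `PUT /collections/{name}` endpoint from an embedding. The function name `build_create_collection_body` is hypothetical; how mcp-probe-kit actually creates the collection is not shown in this guide.

```python
import json


def build_create_collection_body(first_embedding: list) -> dict:
    """Build a create-collection body whose size matches the given vector.

    Assumption: a single unnamed vector with Cosine distance, as the note
    above describes for `mcp_probe_memory`.
    """
    if not first_embedding:
        raise ValueError("embedding must not be empty")
    return {"vectors": {"size": len(first_embedding), "distance": "Cosine"}}


if __name__ == "__main__":
    # A 768-float placeholder stands in for a real nomic-embed vector.
    print(json.dumps(build_create_collection_body([0.0] * 768)))
```

Because the size is fixed at creation time, switching to a model with a different dimension later requires a new collection.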
109
+
110
+ ---
111
+
112
+ ## 2. Nomic Embed (Infinity)
113
+
114
+ Lightweight embedding server based on [Infinity](https://github.com/michaelfeil/infinity). Model: `nomic-ai/nomic-embed-text-v1.5` (768 dimensions). No Ollama required.
115
+
116
+ ### Service info
117
+
118
+ | Item | Value |
119
+ |------|-------|
120
+ | Image | `michaelf34/infinity:0.0.70` |
121
+ | Container name | `nomic-embed` |
122
+ | Host port | `50012` → container `7997` |
123
+ | Model | `nomic-ai/nomic-embed-text-v1.5` |
124
+ | Vector dim | **768** |
125
+ | Engine | `torch` (CPU if no GPU) |
126
+ | Auth | `INFINITY_API_KEY` → `Authorization: Bearer <key>` |
127
+ | Model cache | Docker volume `hf_cache` → `/app/.cache` |
128
+
129
+ ### `docker-compose.yml`
130
+
131
+ ```yaml
132
+ services:
133
+   nomic-embed:
134
+     image: michaelf34/infinity:0.0.70
135
+     container_name: nomic-embed
136
+     restart: unless-stopped
137
+     ports:
138
+       - "50012:7997"
139
+     volumes:
140
+       - hf_cache:/app/.cache
141
+     environment:
142
+       INFINITY_API_KEY: ${INFINITY_API_KEY}
143
+     command:
144
+       - v2
145
+       - --model-id
146
+       - nomic-ai/nomic-embed-text-v1.5
147
+       - --revision
148
+       - main
149
+       - --dtype
150
+       - float32
151
+       - --batch-size
152
+       - "8"
153
+       - --engine
154
+       - torch
155
+       - --port
156
+       - "7997"
157
+       - --no-bettertransformer
158
+     healthcheck:
159
+       test:
160
+         - "CMD"
161
+         - "curl"
162
+         - "-f"
163
+         - "http://127.0.0.1:7997/health"
164
+       interval: 30s
165
+       timeout: 10s
166
+       retries: 5
167
+       start_period: 120s
168
+
169
+ volumes:
170
+   hf_cache:
171
+ ```
172
+
173
+ ### `.env.example`
174
+
175
+ ```bash
176
+ INFINITY_API_KEY=change-me-to-a-long-random-string
177
+ ```
178
+
179
+ ### First deploy
180
+
181
+ ```powershell
182
+ cd nomic-embed
183
+ copy .env.example .env
184
+ # Edit .env — set INFINITY_API_KEY (long random string)
185
+ docker compose up -d
186
+ docker logs -f nomic-embed # wait for "ready to batch requests"
187
+ ```
188
+
189
+ The first start downloads the model from Hugging Face, so expect a cold start of roughly **2–5 minutes**.
190
+
191
+ ### Verify
192
+
193
+ ```bash
194
+ curl http://127.0.0.1:50012/health
195
+
196
+ curl http://127.0.0.1:50012/models \
197
+ -H "Authorization: Bearer YOUR_INFINITY_API_KEY"
198
+
199
+ # Important: path is /embeddings — NOT /v1/embeddings
200
+ curl http://127.0.0.1:50012/embeddings \
201
+ -H "Authorization: Bearer YOUR_INFINITY_API_KEY" \
202
+ -H "Content-Type: application/json" \
203
+ -d '{"model":"nomic-ai/nomic-embed-text-v1.5","input":"hello world"}'
204
+ ```
205
+
206
+ Response: `data[0].embedding` is a **768**-float array.
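The same request can be prepared from Python with only the standard library. This is a sketch under stated assumptions: the helper names are hypothetical, the URL, model, and header values are the ones used in this guide, and the response is assumed to follow the `data[0].embedding` shape described above.

```python
import json
import urllib.request


def build_embedding_request(url: str, model: str, text: str, api_key: str) -> urllib.request.Request:
    """Prepare a POST to the /embeddings endpoint with a Bearer token."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


def extract_embedding(response_body: dict, expected_dim: int = 768) -> list:
    """Return data[0].embedding and check that it has the expected length."""
    vector = response_body["data"][0]["embedding"]
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} floats, got {len(vector)}")
    return vector


def fetch_embedding(request: urllib.request.Request, timeout: float = 30.0) -> list:
    """Send the prepared request to a running service and parse the reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_embedding(json.load(response))
```

With the container running, `fetch_embedding(build_embedding_request("http://127.0.0.1:50012/embeddings", "nomic-ai/nomic-embed-text-v1.5", "hello world", "YOUR_INFINITY_API_KEY"))` would perform the call; nothing is sent until `fetch_embedding` is invoked.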
207
+
208
+ Swagger: `http://127.0.0.1:50012/docs`
209
+
210
+ ### Performance (CPU, indicative)
211
+
212
+ | Scenario | ~Latency |
213
+ |----------|----------|
214
+ | Single short text (warm) | 30–50 ms |
215
+ | First request | ~150 ms |
216
+ | Batch 8 | ~150 ms |
217
+ | Resident memory | ~1 GB |
218
+
219
+ Suitable for MCP memory and occasional writes; not for high-concurrency bulk indexing.
220
+
221
+ ---
222
+
223
+ ## 3. mcp-probe-kit MCP configuration
224
+
225
+ **Recommended:** Qdrant on `50008` + Infinity on `50012` with `openai-compatible` provider.
226
+
227
+ ```json
228
+ {
229
+ "mcpServers": {
230
+ "mcp-probe-kit": {
231
+ "command": "npx",
232
+ "args": ["-y", "mcp-probe-kit@latest"],
233
+ "env": {
234
+ "MEMORY_QDRANT_URL": "http://127.0.0.1:50008",
235
+ "MEMORY_QDRANT_API_KEY": "YOUR_QDRANT_API_KEY",
236
+ "MEMORY_QDRANT_COLLECTION": "mcp_probe_memory",
237
+ "MEMORY_EMBEDDING_PROVIDER": "openai-compatible",
238
+ "MEMORY_EMBEDDING_URL": "http://127.0.0.1:50012/embeddings",
239
+ "MEMORY_EMBEDDING_MODEL": "nomic-ai/nomic-embed-text-v1.5",
240
+ "MEMORY_EMBEDDING_API_KEY": "YOUR_INFINITY_API_KEY",
241
+ "MEMORY_SEARCH_LIMIT": "3",
242
+ "MEMORY_SUMMARY_MAX_CHARS": "280"
243
+ }
244
+ }
245
+ }
246
+ }
247
+ ```
248
+
249
+ Claude Code: put the same keys under `mcpServers.mcp-probe-kit.env` in `.mcp.json`.
250
+
251
+ After changing env, **fully restart** your MCP client (e.g. quit and reopen Cursor).
252
+
253
+ ### Environment variable reference
254
+
255
+ | Variable | Required | Description |
256
+ |----------|----------|-------------|
257
+ | `MEMORY_QDRANT_URL` | Yes (read/write) | Qdrant base URL, e.g. `http://127.0.0.1:50008` |
258
+ | `MEMORY_QDRANT_API_KEY` | If Qdrant auth enabled | Sent as `api-key` header |
259
+ | `MEMORY_QDRANT_COLLECTION` | No | Default `mcp_probe_memory` |
260
+ | `MEMORY_EMBEDDING_URL` | Yes (write/search) | e.g. `http://127.0.0.1:50012/embeddings` |
261
+ | `MEMORY_EMBEDDING_MODEL` | Yes (write/search) | `nomic-ai/nomic-embed-text-v1.5` |
262
+ | `MEMORY_EMBEDDING_PROVIDER` | No | Must be `openai-compatible` for Infinity |
263
+ | `MEMORY_EMBEDDING_API_KEY` | Yes for Infinity | Bearer token = `INFINITY_API_KEY` |
264
+ | `MEMORY_SEARCH_LIMIT` | No | Default `3` |
265
+ | `MEMORY_SUMMARY_MAX_CHARS` | No | Default `280` |
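Before restarting the MCP client it can help to check the environment against the table above. The following is a rough sketch, not mcp-probe-kit's own validation logic: the function names are hypothetical, and the required variables and defaults are taken only from this table and the troubleshooting notes.

```python
from typing import Mapping

# Variables this guide lists as needed for memory writes.
WRITE_REQUIRED = (
    "MEMORY_QDRANT_URL",
    "MEMORY_EMBEDDING_URL",
    "MEMORY_EMBEDDING_MODEL",
)

# Defaults as documented in the reference table.
DOCUMENTED_DEFAULTS = {
    "MEMORY_QDRANT_COLLECTION": "mcp_probe_memory",
    "MEMORY_SEARCH_LIMIT": "3",
    "MEMORY_SUMMARY_MAX_CHARS": "280",
}


def missing_for_write(env: Mapping[str, str]) -> list:
    """List required variables that are unset or blank."""
    return [name for name in WRITE_REQUIRED if not env.get(name, "").strip()]


def effective_settings(env: Mapping[str, str]) -> dict:
    """Overlay non-empty MEMORY_* variables on the documented defaults."""
    settings = dict(DOCUMENTED_DEFAULTS)
    for name, value in env.items():
        if name.startswith("MEMORY_") and value.strip():
            settings[name] = value
    return settings
```

Calling `missing_for_write(os.environ)` (after `import os`) in the same shell that launches the MCP client returns an empty list when all three variables are present.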
266
+
267
+ ---
268
+
269
+ ## 4. End-to-end smoke test
270
+
271
+ ```bash
272
+ # 1) Qdrant
273
+ curl -s http://127.0.0.1:50008/collections -H "api-key: YOUR_QDRANT_API_KEY"
274
+
275
+ # 2) Embedding
276
+ curl -s -X POST http://127.0.0.1:50012/embeddings \
277
+ -H "Authorization: Bearer YOUR_INFINITY_API_KEY" \
278
+ -H "Content-Type: application/json" \
279
+ -d '{"model":"nomic-ai/nomic-embed-text-v1.5","input":"mcp-probe-kit test"}' \
280
+ | jq '.data[0].embedding | length'
281
+ # Expected: 768
282
+ ```
283
+
284
+ Then, in the IDE, call `memorize_asset` once and confirm the result with `read_memory_asset` or a semantic search through the orchestration tools.
285
+
286
+ ---
287
+
288
+ ## 5. Troubleshooting
289
+
290
+ | Symptom | Fix |
291
+ |---------|-----|
292
+ | Qdrant `401` | Set `MEMORY_QDRANT_API_KEY` to match `qdrant/.env` |
293
+ | Embedding `401` | Use `Authorization: Bearer` + correct `INFINITY_API_KEY` |
294
+ | Embedding `404` | URL must be `http://127.0.0.1:50012/embeddings`, not `/v1/embeddings` |
295
+ | `nomic-embed` health stuck on `starting` | First model download; check `docker logs nomic-embed` |
296
+ | Log `No CUDA runtime` | Normal on CPU |
297
+ | Dimension mismatch in Qdrant | Collection was created with another model; delete collection or use a new `MEMORY_QDRANT_COLLECTION` name |
298
+ | Memory write disabled | Ensure all three are set: `MEMORY_QDRANT_URL`, `MEMORY_EMBEDDING_URL`, `MEMORY_EMBEDDING_MODEL` |
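For the dimension-mismatch row, one way to confirm the cause is to compare the collection's configured vector size with the length of a fresh embedding. This is a sketch under assumptions: it expects the parsed JSON of Qdrant's `GET /collections/{name}` response with the size at `result.config.params.vectors.size` (single unnamed vector), and the function names are hypothetical.

```python
from typing import Optional, Sequence


def collection_vector_size(collection_info: dict) -> Optional[int]:
    """Read the configured vector size, or None if it is not a plain integer.

    Assumed response shape: result.config.params.vectors.size.
    Collections with named vectors have no top-level size and yield None.
    """
    vectors = collection_info["result"]["config"]["params"]["vectors"]
    size = vectors.get("size")
    return size if isinstance(size, int) else None


def has_dimension_mismatch(collection_info: dict, embedding: Sequence[float]) -> bool:
    """True when the collection size is known and differs from the embedding."""
    size = collection_vector_size(collection_info)
    return size is not None and size != len(embedding)
```

If this returns `True`, the fix from the table applies: delete the collection or point `MEMORY_QDRANT_COLLECTION` at a new name.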
299
+
300
+ ---
301
+
302
+ ## 6. Alternatives
303
+
304
+ | Stack | When to use |
305
+ |-------|-------------|
306
+ | **Qdrant + Infinity (this guide)** | Default for local dev; lighter than Ollama |
307
+ | Qdrant + Ollama | If you already run Ollama for chat models |
308
+ | Qdrant + hosted OpenAI-compatible API | No local embedding container |
309
+
310
+ See also [README — Optional Memory System Setup](../README.md#optional-memory-system-setup).
311
+
312
+ ---
313
+
314
+ **Chinese version**: the same content in Chinese is available in [memory-local-setup.zh-CN.md](./memory-local-setup.zh-CN.md).
package/docs/memory-local-setup.zh-CN.md
@@ -0,0 +1,283 @@
1
+ # Local Memory Stack (Qdrant + Nomic Embed)
2
+
3
+ A **lightweight local deployment** guide for `search_memory`, `memorize_asset`, `read_memory_asset`, and `scan_and_extract_patterns`:
4
+
5
+ - **Qdrant**: vector database (port `50008`)
6
+ - **Infinity (nomic-embed)**: embedding generation (port `50012`); **replaces Ollama** and is lighter for users
7
+
8
+ Deploying both with Docker Compose is recommended; ports use the `500xx` range to avoid conflicts with other services.
9
+
10
+ | Service | Host port | Container port | Notes |
11
+ |------|------------|----------|------|
12
+ | Qdrant HTTP | `50008` | `6333` | REST, Dashboard |
13
+ | Qdrant gRPC | `50009` | `6334` | gRPC |
14
+ | Nomic Embed | `50012` | `7997` | OpenAI-compatible embeddings |
15
+
16
+ ---
17
+
18
+ ## 1. Qdrant Vector Database
19
+
20
+ ### Service info
21
+
22
+ | Item | Value |
23
+ |----|-----|
24
+ | **Image** | `qdrant/qdrant:latest` |
25
+ | **Container name** | `qdrant` |
26
+ | **HTTP** | `http://127.0.0.1:50008` |
27
+ | **gRPC** | `127.0.0.1:50009` |
28
+ | **Data** | `./data` → `/qdrant/storage` |
29
+ | **Snapshots** | `./snapshots` → `/qdrant/snapshots` |
30
+ | **Auth** | `QDRANT_API_KEY` in `.env`, request header `api-key` |
31
+
32
+ ### `docker-compose.yml`
33
+
34
+ ```yaml
35
+ services:
36
+   qdrant:
37
+     image: qdrant/qdrant:latest
38
+     container_name: qdrant
39
+     restart: always
40
+     env_file:
41
+       - .env
42
+     ports:
43
+       - "50008:6333"
44
+       - "50009:6334"
45
+     volumes:
46
+       - ./data:/qdrant/storage
47
+       - ./snapshots:/qdrant/snapshots
48
+     environment:
49
+       - QDRANT__SERVICE__HTTP_PORT=6333
50
+       - QDRANT__SERVICE__GRPC_PORT=6334
51
+       - QDRANT__LOG_LEVEL=INFO
52
+       - QDRANT__SERVICE__API_KEY=${QDRANT_API_KEY}
53
+     healthcheck:
54
+       test:
55
+         - "CMD"
56
+         - "bash"
57
+         - "-c"
58
+         - "exec 3<>/dev/tcp/127.0.0.1/6333 && printf 'GET /collections HTTP/1.1\r\nHost: localhost\r\napi-key: ${QDRANT_API_KEY}\r\nConnection: close\r\n\r\n' >&3 && IFS= read -r line <&3 && [[ \"$$line\" == *\"200\"* ]]"
59
+       interval: 30s
60
+       timeout: 10s
61
+       retries: 5
62
+       start_period: 30s
63
+ ```
64
+
65
+ ### `.env.example`
66
+
67
+ ```bash
68
+ # Copy: copy .env.example .env
69
+ # Generate key: python -c "import secrets; print(secrets.token_urlsafe(32))"
70
+
71
+ QDRANT_API_KEY=change-me-to-a-long-random-string
72
+ QDRANT_URL=http://127.0.0.1:50008
73
+ ```
74
+
75
+ ### First deploy
76
+
77
+ ```powershell
78
+ cd qdrant
79
+ copy .env.example .env
80
+ # Edit .env and set QDRANT_API_KEY
81
+ docker compose up -d
82
+ ```
83
+
84
+ ### Verify
85
+
86
+ ```bash
87
+ curl http://127.0.0.1:50008/collections \
88
+ -H "api-key: YOUR_QDRANT_API_KEY"
89
+
90
+ # Dashboard (enter the same key)
91
+ # http://127.0.0.1:50008/dashboard
92
+ ```
93
+
94
+ ### Common commands
95
+
96
+ ```powershell
97
+ docker compose up -d
98
+ docker compose down
99
+ docker compose logs -f qdrant
100
+ docker compose restart qdrant
101
+ ```
102
+
103
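+ ### Notes
104
+
105
+ - Once `QDRANT__SERVICE__API_KEY` is enabled, all REST/gRPC requests must carry the `api-key` header.
106
+ - mcp-probe-kit passes it in through the `MEMORY_QDRANT_API_KEY` environment variable.
107
+ - The first `memorize_asset` write automatically creates the collection `mcp_probe_memory` (Cosine; dimension inferred from the first embedding).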
108
+
109
+ ---
110
+
111
+ ## 2. Nomic Embed (Infinity inference service)
112
+
113
+ Based on [Infinity](https://github.com/michaelfeil/infinity); model `nomic-ai/nomic-embed-text-v1.5`, **768 dimensions**, no Ollama required.
114
+
115
+ ### Service info
116
+
117
+ | Item | Value |
118
+ |----|-----|
119
+ | **Image** | `michaelf34/infinity:0.0.70` |
120
+ | **Container name** | `nomic-embed` |
121
+ | **Port** | `50012` → `7997` |
122
+ | **Model** | `nomic-ai/nomic-embed-text-v1.5` |
123
+ | **Vector dim** | 768 |
124
+ | **Engine** | `torch` (CPU when no GPU is available) |
125
+ | **Auth** | `INFINITY_API_KEY` (`Authorization: Bearer`) |
126
+ | **Model cache** | Volume `hf_cache` → `/app/.cache` |
127
+
128
+ ### `docker-compose.yml`
129
+
130
+ ```yaml
131
+ services:
132
+   nomic-embed:
133
+     image: michaelf34/infinity:0.0.70
134
+     container_name: nomic-embed
135
+     restart: unless-stopped
136
+     ports:
137
+       - "50012:7997"
138
+     volumes:
139
+       - hf_cache:/app/.cache
140
+     environment:
141
+       INFINITY_API_KEY: ${INFINITY_API_KEY}
142
+     command:
143
+       - v2
144
+       - --model-id
145
+       - nomic-ai/nomic-embed-text-v1.5
146
+       - --revision
147
+       - main
148
+       - --dtype
149
+       - float32
150
+       - --batch-size
151
+       - "8"
152
+       - --engine
153
+       - torch
154
+       - --port
155
+       - "7997"
156
+       - --no-bettertransformer
157
+     healthcheck:
158
+       test:
159
+         - "CMD"
160
+         - "curl"
161
+         - "-f"
162
+         - "http://127.0.0.1:7997/health"
163
+       interval: 30s
164
+       timeout: 10s
165
+       retries: 5
166
+       start_period: 120s
167
+
168
+ volumes:
169
+   hf_cache:
170
+ ```
171
+
172
+ ### First deploy
173
+
174
+ ```powershell
175
+ cd nomic-embed
176
+ copy .env.example .env
177
+ docker compose up -d
178
+ docker logs -f nomic-embed
179
+ ```
180
+
181
+ A cold start has to download the model and takes about **2–5 minutes**; the service is ready once the log shows `ready to batch requests`.
182
+
183
+ ### Access and verify
184
+
185
+ ```bash
186
+ curl http://127.0.0.1:50012/health
187
+
188
+ curl http://127.0.0.1:50012/models \
189
+ -H "Authorization: Bearer YOUR_INFINITY_API_KEY"
190
+
191
+ # Note: the path is POST /embeddings, not /v1/embeddings
192
+ curl http://127.0.0.1:50012/embeddings \
193
+ -H "Authorization: Bearer YOUR_INFINITY_API_KEY" \
194
+ -H "Content-Type: application/json" \
195
+ -d '{"model":"nomic-ai/nomic-embed-text-v1.5","input":"hello world"}'
196
+ ```
197
+
198
+ Swagger: `http://127.0.0.1:50012/docs`
199
+
200
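+ ### Performance (CPU only, indicative)
201
+
202
+ | Scenario | ~Latency |
203
+ |------|--------|
204
+ | Single short text (warm) | 30–50 ms |
205
+ | First request | ~150 ms |
206
+ | Batch 8 | ~150 ms |
207
+ | Resident memory | ~1 GB |
208
+
209
+ Suitable for MCP memory and low-frequency writes; not suitable for high-concurrency bulk indexing.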
210
+
211
+ ---
212
+
213
+ ## 3. Using with mcp-probe-kit / Qdrant
214
+
215
+ **Recommended combination:** Qdrant on `50008` + Infinity on `50012`.
216
+
217
+ ```json
218
+ {
219
+ "mcpServers": {
220
+ "mcp-probe-kit": {
221
+ "command": "npx",
222
+ "args": ["-y", "mcp-probe-kit@latest"],
223
+ "env": {
224
+ "MEMORY_QDRANT_URL": "http://127.0.0.1:50008",
225
+ "MEMORY_QDRANT_API_KEY": "same as QDRANT_API_KEY in qdrant/.env",
226
+ "MEMORY_QDRANT_COLLECTION": "mcp_probe_memory",
227
+ "MEMORY_EMBEDDING_PROVIDER": "openai-compatible",
228
+ "MEMORY_EMBEDDING_URL": "http://127.0.0.1:50012/embeddings",
229
+ "MEMORY_EMBEDDING_MODEL": "nomic-ai/nomic-embed-text-v1.5",
230
+ "MEMORY_EMBEDDING_API_KEY": "same as INFINITY_API_KEY in nomic-embed/.env",
231
+ "MEMORY_SEARCH_LIMIT": "3",
232
+ "MEMORY_SUMMARY_MAX_CHARS": "280",
233
+ "MEMORY_SEARCH_MIN_SCORE": "0",
234
+ "MEMORY_SEARCH_SHOW_SOURCE": "false",
235
+ "MEMORY_REPO_ID": ""
236
+ }
237
+ }
238
+ }
239
+ }
240
+ ```
241
+
242
+ Claude Code: put these keys under `mcpServers.mcp-probe-kit.env` in `.mcp.json`. After changing them, **fully restart** Cursor.
243
+
244
+ ---
245
+
246
+ ## 4. End-to-end smoke test
247
+
248
+ ```bash
249
+ curl -s http://127.0.0.1:50008/collections -H "api-key: YOUR_QDRANT_API_KEY"
250
+
251
+ curl -s -X POST http://127.0.0.1:50012/embeddings \
252
+ -H "Authorization: Bearer YOUR_INFINITY_API_KEY" \
253
+ -H "Content-Type: application/json" \
254
+ -d '{"model":"nomic-ai/nomic-embed-text-v1.5","input":"test"}' \
255
+ | jq '.data[0].embedding | length'
256
+ # Expected output: 768
257
+ ```
258
+
259
+ ---
260
+
261
+ ## 5. Troubleshooting
262
+
263
+ | Symptom | Fix |
264
+ |------|------|
265
+ | Qdrant `401` | Set `MEMORY_QDRANT_API_KEY` to match `.env` |
266
+ | Embedding `401` | Check `Authorization: Bearer` and `INFINITY_API_KEY` |
267
+ | Embedding `404` | URL must be `/embeddings`; do not use `/v1/embeddings` |
268
+ | health stuck on `starting` | First model download; check `docker logs nomic-embed` |
269
+ | Log `No CUDA runtime` | Normal; indicates CPU inference |
270
+ | Vector dimension mismatch | After switching models, use a new collection or delete the old one |
271
+ | Memory write unavailable | Set `MEMORY_QDRANT_URL`, `MEMORY_EMBEDDING_URL`, and `MEMORY_EMBEDDING_MODEL` together |
272
+
273
+ ---
274
+
275
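+ ## 6. Alternatives
276
+
277
+ | Stack | When to use |
278
+ |------|------|
279
+ | **This guide (Qdrant + Infinity)** | Default for local development; lighter than Ollama |
280
+ | Qdrant + Ollama | When you already run chat models with Ollama |
281
+ | Qdrant + cloud OpenAI-compatible API | When you do not want to run a local embedding container |
282
+
283
+ English version: [memory-local-setup.md](./memory-local-setup.md)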