memory-lancedb-pro 1.0.6 → 1.0.8
- package/CHANGELOG.md +9 -0
- package/README.md +43 -2
- package/README_CN.md +43 -2
- package/index.ts +4 -4
- package/openclaw.plugin.json +6 -2
- package/package.json +1 -1
- package/scripts/jsonl_distill.py +20 -0
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,14 @@
 # Changelog
 
+## 1.0.8
+
+- Add: JSONL distill extractor supports optional agent allowlist via env var `OPENCLAW_JSONL_DISTILL_ALLOWED_AGENT_IDS` (default off / compatible).
+
+## 1.0.7
+
+- Fix: resolve `agentId` from hook context (`ctx?.agentId`) for `before_agent_start` and `agent_end`, restoring per-agent scope isolation when using multi-agent setups.
+
 ## 1.0.6
 
 - Fix: auto-recall injection now correctly skips cron prompts wrapped as `[cron:...] run ...` (reduces token usage for cron jobs).
package/README.md
CHANGED
@@ -161,6 +161,32 @@ Filters out low-quality content at both auto-capture and tool-store stages:
 
 ## Installation
 
+### AI-safe install notes (anti-hallucination)
+
+If you are following this README using an AI assistant, **do not assume defaults**. Always run these commands first and use the real output:
+
+```bash
+openclaw config get agents.defaults.workspace
+openclaw config get plugins.load.paths
+openclaw config get plugins.slots.memory
+openclaw config get plugins.entries.memory-lancedb-pro
+```
+
+Recommendations:
+- Prefer **absolute paths** in `plugins.load.paths` unless you have confirmed the active workspace.
+- If you use `${JINA_API_KEY}` (or any `${...}` variable) in config, ensure the **Gateway service process** has that environment variable (system services often do **not** inherit your interactive shell env).
+- After changing plugin config, run `openclaw gateway restart`.
+
+### Jina API keys (embedding + rerank)
+
+- **Embedding**: set `embedding.apiKey` to your Jina key (recommended: use an env var like `${JINA_API_KEY}`).
+- **Rerank** (when `retrieval.rerankProvider: "jina"`): you can typically use the **same** Jina key for `retrieval.rerankApiKey`.
+- If you use a different rerank provider (`siliconflow`, `pinecone`, etc.), `retrieval.rerankApiKey` should be that provider’s key.
+
+Key storage guidance:
+- Avoid committing secrets into git.
+- Using `${...}` env vars is fine, but make sure the **Gateway service process** has those env vars (system services often do not inherit your interactive shell environment).
+
 ### What is the “OpenClaw workspace”?
 
 In OpenClaw, the **agent workspace** is the agent’s working directory (default: `~/.openclaw/workspace`).
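The README guidance about `${...}` placeholders can be checked mechanically. The following is a minimal sketch and not part of the package: `find_missing_env_vars` is a hypothetical helper, and it assumes placeholders use exactly the `${NAME}` form shown in the README. The result is only meaningful when the check runs in the same environment as the Gateway service process.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} placeholders such as ${JINA_API_KEY}.
# Assumption: names follow the usual shell identifier rules.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def find_missing_env_vars(
    config_text: str, env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Return placeholder names in config_text that are unset or empty in env."""
    env = os.environ if env is None else env
    names = sorted(set(PLACEHOLDER.findall(config_text)))
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    sample = '{"embedding": {"apiKey": "${JINA_API_KEY}"}}'
    print(find_missing_env_vars(sample, env={}))  # ['JINA_API_KEY']
    print(find_missing_env_vars(sample, env={"JINA_API_KEY": "secret"}))  # []
```

Passing `env` explicitly keeps the helper testable; with the default it reads the environment of the current process.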
@@ -168,7 +194,9 @@ According to the docs, the workspace is the **default cwd**, and **relative path
 
 > Note: OpenClaw configuration typically lives under `~/.openclaw/openclaw.json` (separate from the workspace).
 
-**Common mistake:** cloning the plugin somewhere else, while keeping `plugins.load.paths: ["plugins/memory-lancedb-pro"]
+**Common mistake:** cloning the plugin somewhere else, while keeping a **relative path** like `plugins.load.paths: ["plugins/memory-lancedb-pro"]`. Relative paths can be resolved against different working directories depending on how the Gateway is started.
+
+To avoid ambiguity, use an **absolute path** (Option B) or clone into `<workspace>/plugins/` (Option A) and keep your config consistent.
 
 ### Option A (recommended): clone into `plugins/` under your workspace
 
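To see why a relative entry in `plugins.load.paths` is ambiguous, the sketch below resolves the same relative path against two working directories. It is only an illustration under assumptions: the function name `resolve_plugin_path` and the example directories are hypothetical, and OpenClaw's actual resolution logic is not shown in this diff.

```python
from pathlib import PurePosixPath


def resolve_plugin_path(configured: str, cwd: str) -> str:
    """Resolve a configured plugin path as a process started in `cwd` would:
    absolute paths are kept as-is, relative paths are joined onto `cwd`."""
    path = PurePosixPath(configured)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(cwd) / path)


if __name__ == "__main__":
    rel = "plugins/memory-lancedb-pro"
    # Same config value, two different (hypothetical) start directories.
    print(resolve_plugin_path(rel, "/home/user/.openclaw/workspace"))
    print(resolve_plugin_path(rel, "/"))
    # An absolute path is unaffected by the working directory.
    print(resolve_plugin_path("/opt/plugins/memory-lancedb-pro", "/"))
```

The first two calls return different locations for the same configuration value, which is the failure mode the README describes.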
@@ -285,7 +313,7 @@ openclaw config get plugins.slots.memory
 "bm25Weight": 0.3,
 "minScore": 0.3,
 "rerank": "cross-encoder",
-"rerankApiKey": "
+"rerankApiKey": "${JINA_API_KEY}",
 "rerankModel": "jina-reranker-v2-base-multilingual",
 "rerankEndpoint": "https://api.jina.ai/v1/rerank",
 "rerankProvider": "jina",
@@ -414,6 +442,19 @@ The script is **safe**: it never modifies session logs.
 
 By default it skips historical reset snapshots (`*.reset.*`) and excludes the distiller agent itself (`memory-distiller`) to prevent self-ingestion loops.
 
+### Optional: restrict distillation sources (allowlist)
+
+By default, the extractor scans **all agents** (except `memory-distiller`).
+
+If you want higher signal (e.g., only distill from your main assistant + coding bot), set:
+
+```bash
+export OPENCLAW_JSONL_DISTILL_ALLOWED_AGENT_IDS="main,code-agent"
+```
+
+- Unset / empty / `*` / `all` → allow all agents (default)
+- Comma-separated list → only those agents are scanned
+
 ### Recommended setup (dedicated distiller agent)
 
 #### 1) Create a dedicated agent
package/README_CN.md
CHANGED
@@ -162,6 +162,32 @@ Query → BM25 FTS ─────┘
 
 ## Installation
 
+### AI installation guide (anti-hallucination edition)
+
+If you are using an AI to follow this README, **do not assume any default values**. Run the following commands first and treat their real output as authoritative:
+
+```bash
+openclaw config get agents.defaults.workspace
+openclaw config get plugins.load.paths
+openclaw config get plugins.slots.memory
+openclaw config get plugins.entries.memory-lancedb-pro
+```
+
+Recommendations:
+- For `plugins.load.paths`, prefer **absolute paths** (unless you have confirmed the current workspace).
+- If the config uses `${JINA_API_KEY}` (or any `${...}` variable), make sure the **service process environment** that runs the Gateway actually has these variables (systemd/launchd/docker usually do not inherit the exports from your terminal).
+- After changing the plugin config, run `openclaw gateway restart` for it to take effect.
+
+### How to fill in the Jina API key (Embedding + Rerank)
+
+- **Embedding**: set `embedding.apiKey` to your Jina key (recommended: use the environment variable `${JINA_API_KEY}`).
+- **Rerank** (when `retrieval.rerankProvider: "jina"`): you can usually reuse the same Jina key directly in `retrieval.rerankApiKey`.
+- If you choose another rerank provider (such as `siliconflow` / `pinecone`), `retrieval.rerankApiKey` should be the corresponding provider's key.
+
+Key storage recommendations:
+- Do not commit keys to git.
+- Using `${...}` environment variables is fine, but make sure the **service process environment** that runs the Gateway actually has the variable (systemd/launchd/docker often do not inherit the exports from your terminal).
+
 ### What is the “OpenClaw workspace”?
 
 In OpenClaw, the **agent workspace** is the Agent's working directory (default: `~/.openclaw/workspace`).
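@@ -169,7 +195,9 @@ Query → BM25 FTS ─────┘
 
 > Note: the OpenClaw configuration file is usually at `~/.openclaw/openclaw.json`, separate from the workspace.
 
-**Most common installation mistake:** cloning the plugin
+**Most common installation mistake:** cloning the plugin into a different directory while the config still uses a **relative path** like `"paths": ["plugins/memory-lancedb-pro"]`. The base against which a relative path is resolved depends on how the Gateway is started and on its working directory, so it can easily point to the wrong location.
+
+To avoid ambiguity, use an **absolute path** (Option B), or place the plugin under `<workspace>/plugins/` (Option A) and keep the config consistent.
 
 ### Option A (recommended): clone into the `plugins/` directory of the workspace
 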
@@ -286,7 +314,7 @@ openclaw config get plugins.slots.memory
 "bm25Weight": 0.3,
 "minScore": 0.3,
 "rerank": "cross-encoder",
-"rerankApiKey": "
+"rerankApiKey": "${JINA_API_KEY}",
 "rerankModel": "jina-reranker-v2-base-multilingual",
 "candidatePoolSize": 20,
 "recencyHalfLifeDays": 14,
@@ -357,6 +385,19 @@ OpenClaw automatically persists each Agent's complete session to disk as JSONL:
 
 > The script only reads session JSONL and does not modify the original logs.
 
+### (Optional) Enable an agent source allowlist (improves signal-to-noise ratio)
+
+By default, the extractor scans **all agents** (but excludes `memory-distiller` itself, to prevent self-ingestion).
+
+If you only want to distill from certain agents (for example only `main` + `code-agent`), you can set the environment variable:
+
+```bash
+export OPENCLAW_JSONL_DISTILL_ALLOWED_AGENT_IDS="main,code-agent"
+```
+
+- Unset / empty / `*` / `all`: scan everything (default)
+- Comma-separated list: scan only the agentIds in the list
+
 ### Recommended deployment (dedicated distiller agent)
 
 #### 1) Create the distiller agent (example uses gpt-5.2)
package/index.ts
CHANGED
@@ -365,14 +365,14 @@ const memoryLanceDBProPlugin = {
 
 // Auto-recall: inject relevant memories before agent starts
 if (config.autoRecall !== false) {
-  api.on("before_agent_start", async (event) => {
+  api.on("before_agent_start", async (event, ctx) => {
     if (!event.prompt || shouldSkipRetrieval(event.prompt)) {
       return;
     }
 
     try {
       // Determine agent ID and accessible scopes
-      const agentId =
+      const agentId = ctx?.agentId || "main";
       const accessibleScopes = scopeManager.getAccessibleScopes(agentId);
 
       const results = await retriever.retrieve({
@@ -409,14 +409,14 @@ const memoryLanceDBProPlugin = {
 
 // Auto-capture: analyze and store important information after agent ends
 if (config.autoCapture !== false) {
-  api.on("agent_end", async (event) => {
+  api.on("agent_end", async (event, ctx) => {
     if (!event.success || !event.messages || event.messages.length === 0) {
       return;
     }
 
     try {
       // Determine agent ID and default scope
-      const agentId =
+      const agentId = ctx?.agentId || "main";
       const defaultScope = scopeManager.getDefaultScope(agentId);
 
       // Extract text content from messages
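The effect of the `ctx?.agentId || "main"` change can be illustrated outside TypeScript. The sketch below is a Python analogue and not the plugin's code: `resolve_agent_id` is a hypothetical name, and the hook context is modelled as an optional dictionary. It mirrors the JavaScript semantics in which a missing context, a missing `agentId`, or an empty string all fall back to `"main"`.

```python
from typing import Any, Mapping, Optional


def resolve_agent_id(ctx: Optional[Mapping[str, Any]], default: str = "main") -> str:
    """Python analogue of `ctx?.agentId || "main"`.

    Falls back to `default` when ctx is None, when the key is absent,
    or when the value is falsy (for example an empty string).
    """
    if ctx is None:
        return default
    return ctx.get("agentId") or default


if __name__ == "__main__":
    print(resolve_agent_id({"agentId": "code-agent"}))  # code-agent
    print(resolve_agent_id({}))                         # main
    print(resolve_agent_id(None))                       # main
```

Only the first call yields a per-agent ID. The other two fall back to `"main"`, which is why the changelog ties per-agent scope isolation to reading `agentId` from the hook context.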
package/openclaw.plugin.json
CHANGED
@@ -2,7 +2,7 @@
   "id": "memory-lancedb-pro",
   "name": "Memory (LanceDB Pro)",
   "description": "Enhanced LanceDB-backed long-term memory with hybrid retrieval, multi-scope isolation, and management CLI",
-  "version": "1.0.
+  "version": "1.0.8",
   "kind": "memory",
   "configSchema": {
     "type": "object",
@@ -118,7 +118,11 @@
 },
 "rerankProvider": {
   "type": "string",
-  "enum": [
+  "enum": [
+    "jina",
+    "siliconflow",
+    "pinecone"
+  ],
   "default": "jina",
   "description": "Reranker provider format. Determines request/response shape and auth header."
 },
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "memory-lancedb-pro",
-  "version": "1.0.
+  "version": "1.0.8",
   "description": "OpenClaw enhanced LanceDB memory plugin with hybrid retrieval (Vector + BM25), cross-encoder rerank, multi-scope isolation, and management CLI",
   "type": "module",
   "main": "index.ts",
package/scripts/jsonl_distill.py
CHANGED
@@ -36,6 +36,22 @@ EXCLUDED_AGENT_IDS = {
     "memory-distiller",
 }
 
+# Source allowlist (optional quality control).
+# Default (env unset): allow all agents (except EXCLUDED_AGENT_IDS).
+# If set: only distill from the listed agent IDs.
+# Example:
+# OPENCLAW_JSONL_DISTILL_ALLOWED_AGENT_IDS=main,code-agent
+ENV_ALLOWED_AGENT_IDS = "OPENCLAW_JSONL_DISTILL_ALLOWED_AGENT_IDS"
+
+
+def _get_allowed_agent_ids() -> Optional[set[str]]:
+    raw = os.environ.get(ENV_ALLOWED_AGENT_IDS, "").strip()
+    if not raw or raw in ("*", "all"):
+        return None
+    parts = [p.strip() for p in raw.split(",") if p.strip()]
+    return set(parts) if parts else None
+
+
 
 NOISE_PREFIXES = (
     "✅ New session started",
@@ -175,12 +191,16 @@ def _list_session_files(agents_dir: Path) -> List[Tuple[str, Path]]:
     if not agents_dir.exists():
         return results
 
+    allowed_agent_ids = _get_allowed_agent_ids()
+
     for agent_dir in sorted(agents_dir.iterdir()):
         if not agent_dir.is_dir():
             continue
         agent_id = agent_dir.name
         if agent_id in EXCLUDED_AGENT_IDS:
             continue
+        if allowed_agent_ids is not None and agent_id not in allowed_agent_ids:
+            continue
         sessions_dir = agent_dir / "sessions"
         if not sessions_dir.exists():
             continue
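The allowlist semantics added in this file can be exercised in isolation. The sketch below copies the parsing logic of `_get_allowed_agent_ids` but takes the raw string as a parameter instead of reading the environment. `parse_allowed_agent_ids` and `is_agent_scanned` are hypothetical names introduced for illustration, and the excluded set is reduced to the single `memory-distiller` entry visible in the diff.

```python
from typing import Optional

# Only the entry visible in the diff; the real set may contain more.
EXCLUDED_AGENT_IDS = {"memory-distiller"}


def parse_allowed_agent_ids(raw: str) -> Optional[set[str]]:
    """Return None for "allow all", otherwise the set of allowed agent IDs."""
    raw = raw.strip()
    if not raw or raw in ("*", "all"):
        return None
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    return set(parts) if parts else None


def is_agent_scanned(agent_id: str, raw_allowlist: str = "") -> bool:
    """Apply the same order of checks as the loop in the diff:
    exclusion first, then the optional allowlist."""
    if agent_id in EXCLUDED_AGENT_IDS:
        return False
    allowed = parse_allowed_agent_ids(raw_allowlist)
    return allowed is None or agent_id in allowed


if __name__ == "__main__":
    print(is_agent_scanned("main", "main,code-agent"))    # True
    print(is_agent_scanned("other", "main,code-agent"))   # False
    print(is_agent_scanned("other", ""))                  # True
    print(is_agent_scanned("memory-distiller", "memory-distiller"))  # False
```

Two edge cases follow from the logic in the diff: a value made only of separators, such as `","`, parses to `None` and therefore allows all agents, and because exclusion is checked first, listing `memory-distiller` in the allowlist does not re-enable it.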