@ontos-ai/knowhere-claw 0.1.3 → 0.1.5-beta.20260320121253
- package/README.md +12 -5
- package/dist/tools.js +3 -3
- package/openclaw.plugin.json +9 -3
- package/package.json +9 -3
- package/skills/knowhere/SKILL.md +1 -1
- package/skills/knowhere_memory/SKILL.md +167 -0
package/README.md
CHANGED
@@ -10,7 +10,7 @@ Quick mental model:
 - Restart the Gateway
 - Configure `plugins.entries."knowhere-claw".config`
 - Ask an agent to read, search, or compare a document
-- Let the bundled `knowhere`
+- Let the bundled `knowhere` and `knowhere_memory` skills steer the agent toward the right tools
 
 ## Where It Runs
 
@@ -25,8 +25,8 @@ the machine running that Gateway, then restart that Gateway.
 - Store parsed result packages inside OpenClaw-managed state
 - Preview document structure, search chunks, and inspect raw result files
 - Reuse stored documents across `session`, `agent`, or `global` scope
-- Ship
-  document-heavy tasks
+- Ship bundled `knowhere` and `knowhere_memory` skills so agents prefer this
+  toolchain for document-heavy tasks and knowledge-base lookups
 
 ## Install
 
@@ -88,9 +88,13 @@ Once the plugin is enabled, you can ask an OpenClaw agent to:
 The bundled `knowhere` skill teaches agents to use the `knowhere_*` tools
 instead of raw file reads when document parsing matters.
 
+The bundled `knowhere_memory` skill teaches agents to treat previously parsed
+Knowhere content as a knowledge base when users ask to search their materials,
+look something up, or summarize what data they already have.
+
 If you use skill filters or allowlists in OpenClaw, keep the bundled
-`knowhere`
-guidance.
+`knowhere` and `knowhere_memory` skills enabled or the tools will load without
+their intended usage guidance.
 
 If your agent runtime uses a tool allowlist, include `knowhere_*` so agents can
 actually call the plugin tools.
@@ -134,3 +138,6 @@ Within each scope, the plugin keeps:
 
 Contributor-oriented architecture, workflow, and packaging notes live in
 `DEVELOPMENT.md` at the repository root.
+
+Release-process details for maintainers live in
+[`docs/release-workflow.md`](./docs/release-workflow.md).
package/dist/tools.js
CHANGED
@@ -492,7 +492,7 @@ function createIngestTool(params) {
 return {
   name: "knowhere_ingest_document",
   label: "Knowhere Ingest",
-  description: "Parse a local file or remote URL with Knowhere and store the result in the current scope. Before calling this for a document that might already be stored in the current scope, use knowhere_list_documents and reuse the existing stored document when Source, File, or Title clearly match unless the user explicitly asks for a fresh parse or overwrite. When the user provides a URL to a document (PDF link, web page, etc.), pass it as the url parameter — Knowhere fetches it directly, no local download needed. Returns immediately with a job ID while parsing continues in the background. Use knowhere_get_job_status
+  description: "Parse a local file or remote URL with Knowhere and store the result in the current scope. Before calling this for a document that might already be stored in the current scope, use knowhere_list_documents and reuse the existing stored document when Source, File, or Title clearly match unless the user explicitly asks for a fresh parse or overwrite. When the user provides a URL to a document (PDF link, web page, etc.), pass it as the url parameter — Knowhere fetches it directly, no local download needed. Returns immediately with a job ID while parsing continues in the background. Use knowhere_get_job_status only when the current turn needs the parsed result. Use lang to control the language of any user-facing background status update (`en` by default, `ch` for Chinese). Provide either filePath or url, not both.",
   parameters: {
     type: "object",
     additionalProperties: false,
@@ -536,7 +536,7 @@ function createIngestTool(params) {
 },
 lang: {
   type: "string",
-  description: "Language for
+  description: "Language for any user-facing background status update sent after parsing completes or fails. Supports en and ch; unsupported values fall back to en."
 },
 parsing: {
   type: "object",
@@ -679,7 +679,7 @@ function createIngestTool(params) {
     `Job ID: ${createdJob.job_id}`,
     `File: ${progressLabel}`,
     `Scope: ${scope.label}`,
-    "Use knowhere_get_job_status
+    "Use knowhere_get_job_status only if this turn needs the parsed result."
   ].join("\n"));
 }
 };
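Reviewer sketch for the `lang` and source-parameter wording above. The updated descriptions say that unsupported `lang` values fall back to `en` and that exactly one of `filePath` or `url` should be given. The helper names `resolveStatusLang` and `checkIngestSource` are hypothetical and do not appear in the package; this only illustrates the documented behavior and is not the plugin's actual code.

```javascript
// Hypothetical helpers for illustration only; not part of @ontos-ai/knowhere-claw.
const SUPPORTED_LANGS = ["en", "ch"];

// Unsupported or missing values fall back to "en", as the tool description states.
function resolveStatusLang(lang) {
  return SUPPORTED_LANGS.includes(lang) ? lang : "en";
}

// Mirrors "Provide either filePath or url, not both."
// Assumption: passing neither is treated as an error as well.
function checkIngestSource(args) {
  const hasFile = typeof args.filePath === "string" && args.filePath.length > 0;
  const hasUrl = typeof args.url === "string" && args.url.length > 0;
  if (hasFile === hasUrl) {
    throw new Error("Provide exactly one of filePath or url.");
  }
  return hasFile
    ? { kind: "file", value: args.filePath }
    : { kind: "url", value: args.url };
}
```

Under these assumptions, `resolveStatusLang("fr")` yields `"en"`, and passing both `filePath` and `url` throws before any ingest call would be made.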
package/openclaw.plugin.json
CHANGED
@@ -2,8 +2,10 @@
   "id": "knowhere-claw",
   "name": "Knowhere",
   "description": "Parse documents with Knowhere and expose the stored result as tool-queryable document state for OpenClaw agents.",
-  "skills": [
-
+  "skills": [
+    "./skills"
+  ],
+  "version": "0.1.5-beta.20260320121253",
   "uiHints": {
     "apiKey": {
       "label": "Knowhere API Key",
@@ -56,7 +58,11 @@
 },
 "scopeMode": {
   "type": "string",
-  "enum": [
+  "enum": [
+    "session",
+    "agent",
+    "global"
+  ],
   "default": "session"
 },
 "pollIntervalMs": {
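Reviewer sketch for the `scopeMode` schema above. The manifest lists `session`, `agent`, and `global` as the allowed values, with `session` as the default. How OpenClaw itself validates plugin configuration is not shown in this diff, so the function below is a hypothetical stand-in that only illustrates what the enum and default imply.

```javascript
// Hypothetical illustration of the scopeMode enum and default; not OpenClaw's validator.
const SCOPE_MODES = ["session", "agent", "global"];

function resolveScopeMode(config) {
  const value = config ? config.scopeMode : undefined;
  if (value === undefined) {
    return "session"; // schema default
  }
  if (!SCOPE_MODES.includes(value)) {
    throw new Error(`scopeMode must be one of: ${SCOPE_MODES.join(", ")}`);
  }
  return value;
}
```

An empty config resolves to `"session"`, while a value outside the enum is rejected rather than silently replaced. Whether the real host rejects or ignores such values is an open assumption.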
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@ontos-ai/knowhere-claw",
-  "version": "0.1.
+  "version": "0.1.5-beta.20260320121253",
   "description": "OpenClaw plugin for Knowhere-powered document ingestion and automatic grounding.",
   "files": [
     "dist/",
@@ -19,15 +19,19 @@
   },
   "scripts": {
     "build": "rolldown -c && tsc -p tsconfig.build.json",
+    "changeset": "changeset",
+    "changeset:publish": "changeset publish",
     "fmt": "oxfmt",
     "fmt:check": "oxfmt --check",
     "lint": "oxlint --type-aware",
     "lint:fix": "oxlint --type-aware --fix",
     "smoke:tools": "vite-node --mode smoke ./smoketest/run-tool.ts",
+    "release:beta": "node ./scripts/publish-beta-release.mjs",
+    "release:publish": "node ./scripts/publish-release.mjs",
+    "release:version": "node ./scripts/release-version.mjs",
     "tsgo": "tsgo --noEmit -p tsconfig.json",
     "typecheck": "pnpm tsgo",
     "check:plugin-version": "node ./scripts/release-guard.mjs plugin-version",
-    "check:beta-version": "node ./scripts/release-guard.mjs beta-version",
     "prepack": "pnpm build",
     "test": "vitest --run",
     "clean": "rm -rf dist"
@@ -37,6 +41,8 @@
     "fflate": "^0.8.2"
   },
   "devDependencies": {
+    "@changesets/changelog-github": "^0.6.0",
+    "@changesets/cli": "^2.30.0",
     "@tsconfig/node22": "^22.0.5",
     "@types/node": "^25.3.5",
     "@typescript/native-preview": "7.0.0-dev.20260307.1",
@@ -60,4 +66,4 @@
       "./dist/index.js"
     ]
   }
-}
+}
package/skills/knowhere/SKILL.md
CHANGED
@@ -132,7 +132,7 @@ After ingesting a document, use the returned document or job identifiers for fol
 ## Recommended workflow
 
 1. If the document may already exist in the current scope, call `knowhere_list_documents` first and compare `Source`, `File`, and `Title` to find an existing match.
-2. Ingest or import the document only if it is not already in the store, or if the user explicitly wants a fresh parse. After calling `knowhere_ingest_document`, you receive a job ID immediately while parsing continues in the background.
+2. Ingest or import the document only if it is not already in the store, or if the user explicitly wants a fresh parse. After calling `knowhere_ingest_document`, you receive a job ID immediately while parsing continues in the background. If the current turn needs the parsed document, check with `knowhere_get_job_status`; otherwise stop and wait for the user to continue later.
 3. Call `knowhere_list_documents` again if you need to confirm the right `docId`.
 4. Call `knowhere_preview_document` to get a structural overview (table of contents with summaries).
 5. When you know what to search for, call `knowhere_grep` with `conditions: [{ pattern: "your query" }]` — this searches all text fields (content, summary, keywords, path) in one call. Add more conditions to narrow results (e.g. filter by `chunk.type` or `chunk.path`).
package/skills/knowhere_memory/SKILL.md
@@ -0,0 +1,167 @@
+---
+name: knowhere_memory
+description: Auto-discover and search knowledge from Knowhere parsed documents. Use when the user asks questions, needs information, or references their knowledge base.
+user-invocable: false
+---
+
+# Knowhere Knowledge Memory
+
+This agent has access to a **personal knowledge base** managed by Knowhere.
+
+## When to Use
+
+Activate this skill when:
+
+- The user asks a question that might be answered by their documents
+- The user says "look it up", "help me find", "knowledge base", "my materials", etc.
+- The user asks "what materials do I have" or wants an overview
+
+## Data Location
+
+All knowledge data lives under `~/.knowhere/{kb_id}/`:
+
+```text
+~/.knowhere/
+└── {kb_id}/                    # e.g. "chengke_kb"
+    ├── knowledge_graph.json    # START HERE — file-level overview + cross-file edges
+    ├── chunk_stats.json        # Hit counts / usage stats per chunk
+    └── {document_name}/        # One subdir per parsed document
+        ├── chunks.json         # All chunks for this document (the actual content)
+        ├── hierarchy.json      # Document structure tree
+        ├── images/             # Extracted images (JPEG/PNG)
+        └── tables/             # Extracted tables (HTML files)
+```
+
+## File Schema Reference
+
+### knowledge_graph.json — Global Navigation (read this first)
+
+```json
+{
+  "version": "2.0",
+  "stats": { "total_files": 5, "total_chunks": 327, "total_cross_file_edges": 3 },
+  "files": {
+    "report.docx": {
+      "chunks_count": 198,
+      "types": { "text": 135, "table": 21, "image": 42 },
+      "top_keywords": ["excavation", "retaining", "construction"],
+      "top_summary": "",
+      "importance": 0.85
+    }
+  },
+  "edges": [
+    {
+      "source": "file_A.docx",
+      "target": "file_B.pdf",
+      "connection_count": 20,
+      "avg_score": 0.95,
+      "top_connections": [
+        {
+          "source_chunk": "Chapter 3 Safety Measures",
+          "source_id": "abc123-...",
+          "target_chunk": "Safety Management Policy",
+          "target_id": "def456-...",
+          "relation": "related",
+          "score": 1.0
+        }
+      ]
+    }
+  ]
+}
+```
+
+### chunks.json — Document Content (read per-file, on demand)
+
+Located at `~/.knowhere/{kb_id}/{document_name}/chunks.json`.
+
+```json
+{
+  "chunks": [
+    {
+      "chunk_id": "34da946a-5938-578c-...",
+      "type": "text",
+      "path": "Default_Root/report.docx/Chapter 1/1.1",
+      "content": "actual content...",
+      "metadata": {
+        "summary": "LLM-generated summary (may be empty)",
+        "keywords": ["Extracted keywords"],
+        "tokens": ["Jieba tokenization"],
+        "length": 1234,
+        "page_nums": "Source pages (PDF/DOCX)"
+      }
+    }
+  ]
+}
+```
+
+**Content format by chunk type:**
+
+- `text`: Plain text with embedded markers like `IMAGE_uuid_IMAGE` or `TABLE_uuid_TABLE`
+- `table`: Raw HTML (`<table>...</table>`)
+- `image`: Brief description + `IMAGE_uuid_IMAGE` marker; actual image file in `images/` subdir
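Reviewer sketch for the marker format listed above. The skill describes `IMAGE_uuid_IMAGE` and `TABLE_uuid_TABLE` markers embedded in chunk content but does not give the exact identifier syntax. The function below assumes the identifier consists of hexadecimal digits and hyphens, as in the sample `chunk_id`; `findMarkers` is a hypothetical name, not something the package provides.

```javascript
// Hypothetical helper: locate embedded image/table markers in chunk content.
// Assumption: the identifier between the tags is made of hex digits and hyphens.
function findMarkers(content) {
  const pattern = /(IMAGE|TABLE)_([0-9a-fA-F-]+)_\1/g;
  return Array.from(content.matchAll(pattern), (match) => ({
    kind: match[1].toLowerCase(), // "image" or "table"
    id: match[2],
  }));
}
```

The backreference `\1` requires the closing tag to match the opening one, so a malformed marker such as `IMAGE_ab_TABLE` is ignored.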
+
+### hierarchy.json — Document Structure
+
+Three sub-trees:
+
+- `images/`: all extracted images with descriptive names
+- `tables/`: all extracted tables with header-based names
+- `Default_Root/{filename}/`: section hierarchy (chapters → subsections)
+
+## Retrieval Workflow
+
+All operations below are **read-only** — use your file reading tools (e.g. `view_file`, `read_file`) to read JSON files directly. Do NOT use shell commands like `cat` — use native file reading tools that don't require user approval.
+
+Follow this pattern — do NOT explore the filesystem blindly:
+
+### Before Step 1: Resolve `kb_id`
+
+- If the user already specified a KB, use that `kb_id`.
+- Otherwise, inspect only the top level of `~/.knowhere/` to discover available KB IDs.
+- If exactly one KB is available, use it.
+- If multiple KBs are available and the user did not specify one, ask which KB to use.
+- Do not explore beyond the top level of `~/.knowhere/` until `kb_id` is known.
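Reviewer sketch of the `kb_id` resolution rules above, written as a pure decision function. The skill is prose guidance for an agent and ships no such code; `resolveKbId` and its return shape are hypothetical. The list of available IDs is assumed to come from a top-level listing of `~/.knowhere/`, and the zero-KB case is not covered by the skill, so it is handled here as an explicit assumption.

```javascript
// Hypothetical decision helper mirroring the "Resolve kb_id" bullets.
function resolveKbId(requestedKbId, availableKbIds) {
  if (requestedKbId) {
    return { status: "resolved", kbId: requestedKbId }; // user already chose a KB
  }
  if (availableKbIds.length === 1) {
    return { status: "resolved", kbId: availableKbIds[0] }; // only one candidate
  }
  if (availableKbIds.length > 1) {
    return { status: "ask", choices: availableKbIds }; // ambiguous: ask the user
  }
  return { status: "none" }; // assumption: nothing stored yet
}
```

Keeping the decision separate from the directory listing makes it clear that nothing below the top level needs to be read before a `kb_id` is known.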
+
+### Step 1: Read knowledge_graph.json (global navigation)
+
+Read the file `~/.knowhere/{kb_id}/knowledge_graph.json` using your file reading tool.
+
+From this you get:
+
+- **File list** with `top_keywords` → match user's question against ALL files, not just one
+- **importance** → prioritize high-value files when multiple match
+- **edges** → note which matched files connect to other files (you'll need these in Step 3)
+
+**Important**: Identify ALL candidate files whose `top_keywords` are relevant to the query. Do not stop at the first match.
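Reviewer sketch of Step 1, operating on an already-parsed `knowledge_graph.json` object. The skill leaves "relevant" to the agent's judgment; the case-insensitive substring match below is a simplifying assumption, and `selectCandidateFiles` is a hypothetical name.

```javascript
// Hypothetical helper: pick every file whose top_keywords touch the query,
// ordered by importance (highest first).
function selectCandidateFiles(graph, queryTerms) {
  const terms = queryTerms.map((term) => term.toLowerCase());
  return Object.entries(graph.files || {})
    .filter(([, info]) =>
      (info.top_keywords || []).some((keyword) =>
        terms.some((term) => keyword.toLowerCase().includes(term))
      )
    )
    .sort(([, a], [, b]) => (b.importance ?? 0) - (a.importance ?? 0))
    .map(([name]) => name);
}
```

The function returns all matching files rather than the first hit, which is the point the skill stresses.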
+
+### Step 2: Search ALL candidate files' chunks.json
+
+For EACH candidate file identified in Step 1, read `~/.knowhere/{kb_id}/{document_name}/chunks.json`.
+
+Search the `chunks` array:
+
+- Match `metadata.summary` or `content` against the user's query
+- Use `metadata.keywords` for topic matching
+- Use `path` to understand where the chunk sits in the document structure
+- Use `chunk_id` to cross-reference with edge `source_id`/`target_id`
+
+Collect matching chunks from ALL files, not just the first one that hits.
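Reviewer sketch of Step 2, applied to one parsed `chunks.json` object at a time. An agent would normally match semantically; the plain substring test over `content`, `metadata.summary`, and `metadata.keywords` is an assumption made to keep the example small, and `searchChunks` is a hypothetical name.

```javascript
// Hypothetical helper: return lightweight hit records for chunks that mention a query term.
function searchChunks(chunksDoc, queryTerms) {
  const terms = queryTerms.map((term) => term.toLowerCase());
  const hits = [];
  for (const chunk of chunksDoc.chunks || []) {
    const meta = chunk.metadata || {};
    const haystack = [chunk.content, meta.summary, ...(meta.keywords || [])]
      .filter((value) => typeof value === "string")
      .join("\n")
      .toLowerCase();
    if (terms.some((term) => haystack.includes(term))) {
      // Keep chunk_id for edge cross-referencing and path for citation.
      hits.push({ chunkId: chunk.chunk_id, path: chunk.path, type: chunk.type });
    }
  }
  return hits;
}
```

Calling this once per candidate file and concatenating the results gives the "collect from ALL files" behavior the step asks for.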
+
+### Step 3: Expand via edges (required, not optional)
+
+After finding matches, ALWAYS check the `edges` array from Step 1 for connections:
+
+1. Look at edges involving your matched files
+2. Check `top_connections` — if any `source_chunk`/`target_chunk` names are related to the query topic, the connected file likely has relevant content too
+3. If the connected file wasn't already in your candidate set, read its `chunks.json` and search for related content
+4. Use `source_id`/`target_id` to jump directly to specific related chunks
+
+**Why this matters**: Documents often split related information across files. Edges reveal these connections.
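Reviewer sketch of Step 3, using the `edges` array from the parsed `knowledge_graph.json`. As with the earlier steps, relatedness is approximated by a substring match on the connection's chunk names, which is an assumption; `expandViaEdges` and its return shape are hypothetical.

```javascript
// Hypothetical helper: find connected files and chunk IDs worth reading next.
function expandViaEdges(graph, candidateFiles, queryTerms) {
  const terms = queryTerms.map((term) => term.toLowerCase());
  const known = new Set(candidateFiles);
  const extraFiles = new Set();
  const chunkIds = new Set();
  for (const edge of graph.edges || []) {
    // Only edges that touch an already matched file are of interest.
    if (!known.has(edge.source) && !known.has(edge.target)) continue;
    for (const conn of edge.top_connections || []) {
      const names = `${conn.source_chunk} ${conn.target_chunk}`.toLowerCase();
      if (!terms.some((term) => names.includes(term))) continue;
      chunkIds.add(conn.source_id);
      chunkIds.add(conn.target_id);
      for (const file of [edge.source, edge.target]) {
        if (!known.has(file)) extraFiles.add(file);
      }
    }
  }
  return { extraFiles: [...extraFiles], chunkIds: [...chunkIds] };
}
```

`extraFiles` lists documents whose `chunks.json` has not been read yet, and `chunkIds` gives the specific chunks to jump to via `source_id`/`target_id`.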
+
+## Response Guidelines
+
+- **Multi-source**: Synthesize information from ALL matched files, not just one
+- **Cite sources**: Include document name and chunk path for each piece of information
+- **Show connections**: When edges link matched chunks across files, mention the relationship
+- **Distinguish**: Be transparent about what comes from parsed documents vs general knowledge
+- **Use summaries**: When available, `metadata.summary` gives a quick overview without reading full content