@ontos-ai/knowhere-claw 0.1.3 → 0.1.5-beta.20260320121253

package/README.md CHANGED
@@ -10,7 +10,7 @@ Quick mental model:
  - Restart the Gateway
  - Configure `plugins.entries."knowhere-claw".config`
  - Ask an agent to read, search, or compare a document
- - Let the bundled `knowhere` skill steer the agent toward the right tools
+ - Let the bundled `knowhere` and `knowhere_memory` skills steer the agent toward the right tools
 
  ## Where It Runs
 
@@ -25,8 +25,8 @@ the machine running that Gateway, then restart that Gateway.
  - Store parsed result packages inside OpenClaw-managed state
  - Preview document structure, search chunks, and inspect raw result files
  - Reuse stored documents across `session`, `agent`, or `global` scope
- - Ship a bundled `knowhere` skill so agents prefer this toolchain for
- document-heavy tasks
+ - Ship bundled `knowhere` and `knowhere_memory` skills so agents prefer this
+ toolchain for document-heavy tasks and knowledge-base lookups
 
  ## Install
 
@@ -88,9 +88,13 @@ Once the plugin is enabled, you can ask an OpenClaw agent to:
  The bundled `knowhere` skill teaches agents to use the `knowhere_*` tools
  instead of raw file reads when document parsing matters.
 
+ The bundled `knowhere_memory` skill teaches agents to treat previously parsed
+ Knowhere content as a knowledge base when users ask to search their materials,
+ look something up, or summarize what data they already have.
+
  If you use skill filters or allowlists in OpenClaw, keep the bundled
- `knowhere` skill enabled or the tools will load without their intended usage
- guidance.
+ `knowhere` and `knowhere_memory` skills enabled or the tools will load without
+ their intended usage guidance.
 
  If your agent runtime uses a tool allowlist, include `knowhere_*` so agents can
  actually call the plugin tools.
@@ -134,3 +138,6 @@ Within each scope, the plugin keeps:
 
  Contributor-oriented architecture, workflow, and packaging notes live in
  `DEVELOPMENT.md` at the repository root.
+
+ Release-process details for maintainers live in
+ [`docs/release-workflow.md`](./docs/release-workflow.md).
package/dist/tools.js CHANGED
@@ -492,7 +492,7 @@ function createIngestTool(params) {
  return {
  name: "knowhere_ingest_document",
  label: "Knowhere Ingest",
- description: "Parse a local file or remote URL with Knowhere and store the result in the current scope. Before calling this for a document that might already be stored in the current scope, use knowhere_list_documents and reuse the existing stored document when Source, File, or Title clearly match unless the user explicitly asks for a fresh parse or overwrite. When the user provides a URL to a document (PDF link, web page, etc.), pass it as the url parameter — Knowhere fetches it directly, no local download needed. Returns immediately with a job ID while parsing continues in the background. Use knowhere_get_job_status to check progress at any time. Use lang to control the language of the direct tracker follow-up (`en` by default, `ch` for Chinese). Provide either filePath or url, not both.",
+ description: "Parse a local file or remote URL with Knowhere and store the result in the current scope. Before calling this for a document that might already be stored in the current scope, use knowhere_list_documents and reuse the existing stored document when Source, File, or Title clearly match unless the user explicitly asks for a fresh parse or overwrite. When the user provides a URL to a document (PDF link, web page, etc.), pass it as the url parameter — Knowhere fetches it directly, no local download needed. Returns immediately with a job ID while parsing continues in the background. Use knowhere_get_job_status only when the current turn needs the parsed result. Use lang to control the language of any user-facing background status update (`en` by default, `ch` for Chinese). Provide either filePath or url, not both.",
  parameters: {
  type: "object",
  additionalProperties: false,
@@ -536,7 +536,7 @@ function createIngestTool(params) {
  },
  lang: {
  type: "string",
- description: "Language for the direct tracker follow-up message sent after background parsing completes or fails. Supports en and ch; unsupported values fall back to en."
+ description: "Language for any user-facing background status update sent after parsing completes or fails. Supports en and ch; unsupported values fall back to en."
  },
  parsing: {
  type: "object",
@@ -679,7 +679,7 @@ function createIngestTool(params) {
  `Job ID: ${createdJob.job_id}`,
  `File: ${progressLabel}`,
  `Scope: ${scope.label}`,
- "Use knowhere_get_job_status to check progress at any time."
+ "Use knowhere_get_job_status only if this turn needs the parsed result."
  ].join("\n"));
  }
  };
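
Reviewer note: the `lang` description in the hunks above states that only `en` and `ch` are supported and that unsupported values fall back to `en`. The fallback code itself is not shown in this diff, so the following is only a minimal sketch of that rule. The names `SUPPORTED_LANGS` and `resolveStatusLang` are hypothetical and do not come from `tools.js`, and the sketch assumes exact, case-sensitive matching.

```javascript
// Hypothetical helper; not taken from the published tools.js.
const SUPPORTED_LANGS = new Set(["en", "ch"]);

function resolveStatusLang(lang) {
  // Missing, non-string, or unsupported values fall back to "en",
  // as the parameter description says.
  return typeof lang === "string" && SUPPORTED_LANGS.has(lang) ? lang : "en";
}
```

Under these assumptions, `resolveStatusLang("ch")` keeps `"ch"`, while `resolveStatusLang("fr")` and `resolveStatusLang(undefined)` both resolve to `"en"`.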
@@ -2,8 +2,10 @@
  "id": "knowhere-claw",
  "name": "Knowhere",
  "description": "Parse documents with Knowhere and expose the stored result as tool-queryable document state for OpenClaw agents.",
- "skills": ["./skills"],
- "version": "0.1.1",
+ "skills": [
+ "./skills"
+ ],
+ "version": "0.1.5-beta.20260320121253",
  "uiHints": {
  "apiKey": {
  "label": "Knowhere API Key",
@@ -56,7 +58,11 @@
  },
  "scopeMode": {
  "type": "string",
- "enum": ["session", "agent", "global"],
+ "enum": [
+ "session",
+ "agent",
+ "global"
+ ],
  "default": "session"
  },
  "pollIntervalMs": {
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@ontos-ai/knowhere-claw",
- "version": "0.1.3",
+ "version": "0.1.5-beta.20260320121253",
  "description": "OpenClaw plugin for Knowhere-powered document ingestion and automatic grounding.",
  "files": [
  "dist/",
@@ -19,15 +19,19 @@
  },
  "scripts": {
  "build": "rolldown -c && tsc -p tsconfig.build.json",
+ "changeset": "changeset",
+ "changeset:publish": "changeset publish",
  "fmt": "oxfmt",
  "fmt:check": "oxfmt --check",
  "lint": "oxlint --type-aware",
  "lint:fix": "oxlint --type-aware --fix",
  "smoke:tools": "vite-node --mode smoke ./smoketest/run-tool.ts",
+ "release:beta": "node ./scripts/publish-beta-release.mjs",
+ "release:publish": "node ./scripts/publish-release.mjs",
+ "release:version": "node ./scripts/release-version.mjs",
  "tsgo": "tsgo --noEmit -p tsconfig.json",
  "typecheck": "pnpm tsgo",
  "check:plugin-version": "node ./scripts/release-guard.mjs plugin-version",
- "check:beta-version": "node ./scripts/release-guard.mjs beta-version",
  "prepack": "pnpm build",
  "test": "vitest --run",
  "clean": "rm -rf dist"
@@ -37,6 +41,8 @@
  "fflate": "^0.8.2"
  },
  "devDependencies": {
+ "@changesets/changelog-github": "^0.6.0",
+ "@changesets/cli": "^2.30.0",
  "@tsconfig/node22": "^22.0.5",
  "@types/node": "^25.3.5",
  "@typescript/native-preview": "7.0.0-dev.20260307.1",
@@ -60,4 +66,4 @@
  "./dist/index.js"
  ]
  }
- }
+ }
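
Reviewer note: in this diff the plugin manifest version moves from `0.1.1` to `0.1.5-beta.20260320121253`, while `package.json` moves from `0.1.3` to the same value, so the two now agree. The `check:plugin-version` script is kept, but `release-guard.mjs` itself is not part of the diff. The sketch below is therefore only an assumption about what a version-consistency check could look like, and `checkPluginVersion` is a hypothetical name.

```javascript
// Hypothetical check; the real release-guard.mjs is not shown in this diff.
function checkPluginVersion(packageJson, pluginManifest) {
  // Compare the two version strings exactly, with no semver normalization.
  if (packageJson.version !== pluginManifest.version) {
    return {
      ok: false,
      message:
        `package.json is ${packageJson.version} ` +
        `but the plugin manifest is ${pluginManifest.version}`,
    };
  }
  return { ok: true, message: `versions match at ${packageJson.version}` };
}
```

With the old values (`0.1.3` against `0.1.1`) this sketch reports a mismatch. With the new values it reports a match.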
@@ -132,7 +132,7 @@ After ingesting a document, use the returned document or job identifiers for fol
  ## Recommended workflow
 
  1. If the document may already exist in the current scope, call `knowhere_list_documents` first and compare `Source`, `File`, and `Title` to find an existing match.
- 2. Ingest or import the document only if it is not already in the store, or if the user explicitly wants a fresh parse. After calling `knowhere_ingest_document`, you receive a job ID immediately while parsing continues in the background. Wait for the plugin's completion message that the document is ready before proceeding. If no completion message arrives, check with `knowhere_get_job_status`.
+ 2. Ingest or import the document only if it is not already in the store, or if the user explicitly wants a fresh parse. After calling `knowhere_ingest_document`, you receive a job ID immediately while parsing continues in the background. If the current turn needs the parsed document, check with `knowhere_get_job_status`; otherwise stop and wait for the user to continue later.
  3. Call `knowhere_list_documents` again if you need to confirm the right `docId`.
  4. Call `knowhere_preview_document` to get a structural overview (table of contents with summaries).
  5. When you know what to search for, call `knowhere_grep` with `conditions: [{ pattern: "your query" }]` — this searches all text fields (content, summary, keywords, path) in one call. Add more conditions to narrow results (e.g. filter by `chunk.type` or `chunk.path`).
@@ -0,0 +1,167 @@
+ ---
+ name: knowhere_memory
+ description: Auto-discover and search knowledge from Knowhere parsed documents. Use when the user asks questions, needs information, or references their knowledge base.
+ user-invocable: false
+ ---
+
+ # Knowhere Knowledge Memory
+
+ This agent has access to a **personal knowledge base** managed by Knowhere.
+
+ ## When to Use
+
+ Activate this skill when:
+
+ - The user asks a question that might be answered by their documents
+ - The user says "look it up", "help me find", "knowledge base", "my materials", etc.
+ - The user asks "what materials do I have" or wants an overview
+
+ ## Data Location
+
+ All knowledge data lives under `~/.knowhere/{kb_id}/`:
+
+ ```text
+ ~/.knowhere/
+ └── {kb_id}/ # e.g. "chengke_kb"
+     ├── knowledge_graph.json # START HERE — file-level overview + cross-file edges
+     ├── chunk_stats.json # Hit counts / usage stats per chunk
+     └── {document_name}/ # One subdir per parsed document
+         ├── chunks.json # All chunks for this document (the actual content)
+         ├── hierarchy.json # Document structure tree
+         ├── images/ # Extracted images (JPEG/PNG)
+         └── tables/ # Extracted tables (HTML files)
+ ```
+
+ ## File Schema Reference
+
+ ### knowledge_graph.json — Global Navigation (read this first)
+
+ ```json
+ {
+   "version": "2.0",
+   "stats": { "total_files": 5, "total_chunks": 327, "total_cross_file_edges": 3 },
+   "files": {
+     "report.docx": {
+       "chunks_count": 198,
+       "types": { "text": 135, "table": 21, "image": 42 },
+       "top_keywords": ["excavation", "retaining", "construction"],
+       "top_summary": "",
+       "importance": 0.85
+     }
+   },
+   "edges": [
+     {
+       "source": "file_A.docx",
+       "target": "file_B.pdf",
+       "connection_count": 20,
+       "avg_score": 0.95,
+       "top_connections": [
+         {
+           "source_chunk": "Chapter 3 Safety Measures",
+           "source_id": "abc123-...",
+           "target_chunk": "Safety Management Policy",
+           "target_id": "def456-...",
+           "relation": "related",
+           "score": 1.0
+         }
+       ]
+     }
+   ]
+ }
+ ```
+
+ ### chunks.json — Document Content (read per-file, on demand)
+
+ Located at `~/.knowhere/{kb_id}/{document_name}/chunks.json`.
+
+ ```json
+ {
+   "chunks": [
+     {
+       "chunk_id": "34da946a-5938-578c-...",
+       "type": "text",
+       "path": "Default_Root/report.docx/Chapter 1/1.1",
+       "content": "actual content...",
+       "metadata": {
+         "summary": "LLM-generated summary (may be empty)",
+         "keywords": ["Extracted keywords"],
+         "tokens": ["Jieba tokenization"],
+         "length": 1234,
+         "page_nums": "Source pages (PDF/DOCX)"
+       }
+     }
+   ]
+ }
+ ```
+
+ **Content format by chunk type:**
+
+ - `text`: Plain text with embedded markers like `IMAGE_uuid_IMAGE` or `TABLE_uuid_TABLE`
+ - `table`: Raw HTML (`<table>...</table>`)
+ - `image`: Brief description + `IMAGE_uuid_IMAGE` marker; actual image file in `images/` subdir
+
+ ### hierarchy.json — Document Structure
+
+ Three sub-trees:
+
+ - `images/`: all extracted images with descriptive names
+ - `tables/`: all extracted tables with header-based names
+ - `Default_Root/{filename}/`: section hierarchy (chapters → subsections)
+
+ ## Retrieval Workflow
+
+ All operations below are **read-only** — use your file reading tools (e.g. `view_file`, `read_file`) to read JSON files directly. Do NOT use shell commands like `cat` — use native file reading tools that don't require user approval.
+
+ Follow this pattern — do NOT explore the filesystem blindly:
+
+ ### Before Step 1: Resolve `kb_id`
+
+ - If the user already specified a KB, use that `kb_id`.
+ - Otherwise, inspect only the top level of `~/.knowhere/` to discover available KB IDs.
+ - If exactly one KB is available, use it.
+ - If multiple KBs are available and the user did not specify one, ask which KB to use.
+ - Do not explore beyond the top level of `~/.knowhere/` until `kb_id` is known.
+
+ ### Step 1: Read knowledge_graph.json (global navigation)
+
+ Read the file `~/.knowhere/{kb_id}/knowledge_graph.json` using your file reading tool.
+
+ From this you get:
+
+ - **File list** with `top_keywords` → match user's question against ALL files, not just one
+ - **importance** → prioritize high-value files when multiple match
+ - **edges** → note which matched files connect to other files (you'll need these in Step 3)
+
+ **Important**: Identify ALL candidate files whose `top_keywords` are relevant to the query. Do not stop at the first match.
+
+ ### Step 2: Search ALL candidate files' chunks.json
+
+ For EACH candidate file identified in Step 1, read `~/.knowhere/{kb_id}/{document_name}/chunks.json`.
+
+ Search the `chunks` array:
+
+ - Match `metadata.summary` or `content` against the user's query
+ - Use `metadata.keywords` for topic matching
+ - Use `path` to understand where the chunk sits in the document structure
+ - Use `chunk_id` to cross-reference with edge `source_id`/`target_id`
+
+ Collect matching chunks from ALL files, not just the first one that hits.
+
+ ### Step 3: Expand via edges (required, not optional)
+
+ After finding matches, ALWAYS check the `edges` array from Step 1 for connections:
+
+ 1. Look at edges involving your matched files
+ 2. Check `top_connections` — if any `source_chunk`/`target_chunk` names are related to the query topic, the connected file likely has relevant content too
+ 3. If the connected file wasn't already in your candidate set, read its `chunks.json` and search for related content
+ 4. Use `source_id`/`target_id` to jump directly to specific related chunks
+
+ **Why this matters**: Documents often split related information across files. Edges reveal these connections.
+
+ ## Response Guidelines
+
+ - **Multi-source**: Synthesize information from ALL matched files, not just one
+ - **Cite sources**: Include document name and chunk path for each piece of information
+ - **Show connections**: When edges link matched chunks across files, mention the relationship
+ - **Distinguish**: Be transparent about what comes from parsed documents vs general knowledge
+ - **Use summaries**: When available, `metadata.summary` gives a quick overview without reading full content
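
Reviewer note: the new `knowhere_memory` skill describes its retrieval workflow only in prose, as instructions for an agent. To make the intended data flow concrete, here is a rough sketch of the same steps as plain functions over already-parsed JSON. All function names are hypothetical, and case-insensitive substring matching is an assumption because the skill does not prescribe a matching method. Step 3 is also simplified: it returns every file linked to a candidate, whereas the skill asks the agent to judge `top_connections` for relevance first.

```javascript
// Sketch only: works on parsed JSON objects; nothing here ships in the package.

// Before Step 1: pick a kb_id, or return null when the user must choose.
function resolveKbId(requested, availableKbIds) {
  if (requested) return requested;
  return availableKbIds.length === 1 ? availableKbIds[0] : null;
}

// Step 1: every file whose top_keywords mention a query term,
// ordered by importance (highest first).
function findCandidateFiles(graph, terms) {
  const lowered = terms.map((t) => t.toLowerCase());
  return Object.entries(graph.files)
    .filter(([, info]) =>
      (info.top_keywords || []).some((kw) =>
        lowered.some((t) => kw.toLowerCase().includes(t))))
    .sort(([, a], [, b]) => (b.importance || 0) - (a.importance || 0))
    .map(([name]) => name);
}

// Step 2: chunks whose content, summary, or keywords mention a query term.
function searchChunks(chunksDoc, terms) {
  const lowered = terms.map((t) => t.toLowerCase());
  return chunksDoc.chunks.filter((chunk) => {
    const meta = chunk.metadata || {};
    const haystack = [chunk.content, meta.summary, ...(meta.keywords || [])]
      .filter(Boolean)
      .join("\n")
      .toLowerCase();
    return lowered.some((t) => haystack.includes(t));
  });
}

// Step 3 (simplified): files connected by an edge to a candidate
// that are not yet in the candidate set.
function expandViaEdges(graph, candidates) {
  const seen = new Set(candidates);
  const extra = new Set();
  for (const edge of graph.edges || []) {
    if (seen.has(edge.source) && !seen.has(edge.target)) extra.add(edge.target);
    if (seen.has(edge.target) && !seen.has(edge.source)) extra.add(edge.source);
  }
  return [...extra];
}
```

A caller would resolve the KB, run `findCandidateFiles` on `knowledge_graph.json`, run `searchChunks` on each candidate's `chunks.json`, and then repeat the chunk search for whatever `expandViaEdges` returns.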