npm: @ontos-ai/knowhere-claw version diff, 0.1.3 → 0.1.5-beta.20260320121253 (Mend)

This diff compares publicly available package versions as released to a supported registry. It is provided for informational purposes only and reflects the changes between those versions as they appear in the public registry.

Files changed (6)

package/README.md +12 -5
package/dist/tools.js +3 -3
package/openclaw.plugin.json +9 -3
package/package.json +9 -3
package/skills/knowhere/SKILL.md +1 -1
package/skills/knowhere_memory/SKILL.md +167 -0

package/README.md CHANGED

@@ -10,7 +10,7 @@ Quick mental model:
 - Restart the Gateway
 - Configure `plugins.entries."knowhere-claw".config`
 - Ask an agent to read, search, or compare a document
-- Let the bundled `knowhere` skill steer the agent toward the right tools
+- Let the bundled `knowhere` and `knowhere_memory` skills steer the agent toward the right tools
 ## Where It Runs
@@ -25,8 +25,8 @@ the machine running that Gateway, then restart that Gateway.
 - Store parsed result packages inside OpenClaw-managed state
 - Preview document structure, search chunks, and inspect raw result files
 - Reuse stored documents across `session`, `agent`, or `global` scope
-- Ship a bundled `knowhere` skill so agents prefer this toolchain for
-  document-heavy tasks
+- Ship bundled `knowhere` and `knowhere_memory` skills so agents prefer this
+  toolchain for document-heavy tasks and knowledge-base lookups
 ## Install
@@ -88,9 +88,13 @@ Once the plugin is enabled, you can ask an OpenClaw agent to:
 The bundled `knowhere` skill teaches agents to use the `knowhere_*` tools
 instead of raw file reads when document parsing matters.
+The bundled `knowhere_memory` skill teaches agents to treat previously parsed
+Knowhere content as a knowledge base when users ask to search their materials,
+look something up, or summarize what data they already have.
 If you use skill filters or allowlists in OpenClaw, keep the bundled
-`knowhere` skill enabled or the tools will load without their intended usage
-guidance.
+`knowhere` and `knowhere_memory` skills enabled or the tools will load without
+their intended usage guidance.
 If your agent runtime uses a tool allowlist, include `knowhere_*` so agents can
 actually call the plugin tools.
@@ -134,3 +138,6 @@ Within each scope, the plugin keeps:
 Contributor-oriented architecture, workflow, and packaging notes live in
 `DEVELOPMENT.md` at the repository root.
+Release-process details for maintainers live in
+[`docs/release-workflow.md`](./docs/release-workflow.md).

package/dist/tools.js CHANGED

@@ -492,7 +492,7 @@ function createIngestTool(params) {
 	return {
 		name: "knowhere_ingest_document",
 		label: "Knowhere Ingest",
-		description: "Parse a local file or remote URL with Knowhere and store the result in the current scope. Before calling this for a document that might already be stored in the current scope, use knowhere_list_documents and reuse the existing stored document when Source, File, or Title clearly match unless the user explicitly asks for a fresh parse or overwrite. When the user provides a URL to a document (PDF link, web page, etc.), pass it as the url parameter — Knowhere fetches it directly, no local download needed. Returns immediately with a job ID while parsing continues in the background. Use knowhere_get_job_status to check progress at any time. Use lang to control the language of the direct tracker follow-up (`en` by default, `ch` for Chinese). Provide either filePath or url, not both.",
+		description: "Parse a local file or remote URL with Knowhere and store the result in the current scope. Before calling this for a document that might already be stored in the current scope, use knowhere_list_documents and reuse the existing stored document when Source, File, or Title clearly match unless the user explicitly asks for a fresh parse or overwrite. When the user provides a URL to a document (PDF link, web page, etc.), pass it as the url parameter — Knowhere fetches it directly, no local download needed. Returns immediately with a job ID while parsing continues in the background. Use knowhere_get_job_status only when the current turn needs the parsed result. Use lang to control the language of any user-facing background status update (`en` by default, `ch` for Chinese). Provide either filePath or url, not both.",
 		parameters: {
 			type: "object",
 			additionalProperties: false,
@@ -536,7 +536,7 @@ function createIngestTool(params) {
 				},
 				lang: {
 					type: "string",
-					description: "Language for the direct tracker follow-up message sent after background parsing completes or fails. Supports en and ch; unsupported values fall back to en."
+					description: "Language for any user-facing background status update sent after parsing completes or fails. Supports en and ch; unsupported values fall back to en."
 				},
 				parsing: {
 					type: "object",
@@ -679,7 +679,7 @@ function createIngestTool(params) {
 				`Job ID: ${createdJob.job_id}`,
 				`File: ${progressLabel}`,
 				`Scope: ${scope.label}`,
-				"Use knowhere_get_job_status to check progress at any time."
+				"Use knowhere_get_job_status only if this turn needs the parsed result."
 			].join("\n"));
 		}
 	};
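
The last hunk above changes only the closing hint in the message returned right after ingestion starts. As a rough illustration, the visible lines could be assembled as follows. `buildIngestAck` and its parameter names are hypothetical, and the real array in `dist/tools.js` may begin with lines the hunk does not show.

```javascript
// Hypothetical helper: rebuilds only the lines visible in the hunk above.
// The real message in dist/tools.js may contain earlier lines not shown in the diff.
function buildIngestAck({ jobId, file, scopeLabel }) {
  return [
    `Job ID: ${jobId}`,
    `File: ${file}`,
    `Scope: ${scopeLabel}`,
    // New wording in 0.1.5-beta: polling is tied to the needs of the current turn.
    "Use knowhere_get_job_status only if this turn needs the parsed result."
  ].join("\n");
}

// Sample values are made up for illustration.
const ack = buildIngestAck({ jobId: "job_123", file: "report.pdf", scopeLabel: "session" });
console.log(ack);
```

The practical effect of the new wording is that an agent reading this message is no longer invited to poll at any time; it polls only when the current turn depends on the parsed document.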

package/openclaw.plugin.json CHANGED

@@ -2,8 +2,10 @@
   "id": "knowhere-claw",
   "name": "Knowhere",
   "description": "Parse documents with Knowhere and expose the stored result as tool-queryable document state for OpenClaw agents.",
-  "skills": ["./skills"],
-  "version": "0.1.1",
+  "skills": [
+    "./skills"
+  ],
+  "version": "0.1.5-beta.20260320121253",
   "uiHints": {
     "apiKey": {
       "label": "Knowhere API Key",
@@ -56,7 +58,11 @@
       },
       "scopeMode": {
         "type": "string",
-        "enum": ["session", "agent", "global"],
+        "enum": [
+          "session",
+          "agent",
+          "global"
+        ],
         "default": "session"
       },
       "pollIntervalMs": {
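
The `scopeMode` change above is formatting only: the enum values and the `session` default are unchanged. A minimal sketch of how a consumer might apply that enum and default is shown below. `resolveScopeMode` is a hypothetical helper, not something the plugin is known to export.

```javascript
// Allowed values and default copied from the schema in the hunk above.
const SCOPE_MODES = ["session", "agent", "global"];
const DEFAULT_SCOPE_MODE = "session";

// Hypothetical helper, not part of the plugin: returns the configured mode,
// falls back to the default when unset, and rejects values outside the enum.
function resolveScopeMode(config = {}) {
  const mode = config.scopeMode ?? DEFAULT_SCOPE_MODE;
  if (!SCOPE_MODES.includes(mode)) {
    throw new RangeError(
      `scopeMode must be one of ${SCOPE_MODES.join(", ")}; got ${JSON.stringify(mode)}`
    );
  }
  return mode;
}

console.log(resolveScopeMode({}));
console.log(resolveScopeMode({ scopeMode: "global" }));
```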

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@ontos-ai/knowhere-claw",
-  "version": "0.1.3",
+  "version": "0.1.5-beta.20260320121253",
   "description": "OpenClaw plugin for Knowhere-powered document ingestion and automatic grounding.",
   "files": [
     "dist/",
@@ -19,15 +19,19 @@
   },
   "scripts": {
     "build": "rolldown -c && tsc -p tsconfig.build.json",
+    "changeset": "changeset",
+    "changeset:publish": "changeset publish",
     "fmt": "oxfmt",
     "fmt:check": "oxfmt --check",
     "lint": "oxlint --type-aware",
     "lint:fix": "oxlint --type-aware --fix",
     "smoke:tools": "vite-node --mode smoke ./smoketest/run-tool.ts",
+    "release:beta": "node ./scripts/publish-beta-release.mjs",
+    "release:publish": "node ./scripts/publish-release.mjs",
+    "release:version": "node ./scripts/release-version.mjs",
     "tsgo": "tsgo --noEmit -p tsconfig.json",
     "typecheck": "pnpm tsgo",
     "check:plugin-version": "node ./scripts/release-guard.mjs plugin-version",
-    "check:beta-version": "node ./scripts/release-guard.mjs beta-version",
     "prepack": "pnpm build",
     "test": "vitest --run",
     "clean": "rm -rf dist"
@@ -37,6 +41,8 @@
     "fflate": "^0.8.2"
   },
   "devDependencies": {
+    "@changesets/changelog-github": "^0.6.0",
+    "@changesets/cli": "^2.30.0",
     "@tsconfig/node22": "^22.0.5",
     "@types/node": "^25.3.5",
     "@typescript/native-preview": "7.0.0-dev.20260307.1",
@@ -60,4 +66,4 @@
       "./dist/index.js"
     ]
   }
-}
+}
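
Before this release the manifest in `openclaw.plugin.json` carried `0.1.1` while `package.json` carried `0.1.3`; both now read `0.1.5-beta.20260320121253`. The retained `check:plugin-version` script suggests a guard against that kind of drift, but `scripts/release-guard.mjs` is not part of this diff, so the following is only a sketch of what such a check could look like. Both function names are hypothetical.

```javascript
// Hypothetical guard: true when the package and plugin manifest versions agree.
function versionsMatch(pkg, manifest) {
  return pkg.version === manifest.version;
}

// Splits a semver string into its base version and dot-separated prerelease
// identifiers. Build metadata after "+" is dropped.
function splitPrerelease(version) {
  const withoutBuild = version.split("+")[0];
  const dash = withoutBuild.indexOf("-");
  if (dash === -1) return { base: withoutBuild, prerelease: [] };
  return {
    base: withoutBuild.slice(0, dash),
    prerelease: withoutBuild.slice(dash + 1).split(".")
  };
}

// The old pair of versions from the two hunks above did not match.
console.log(versionsMatch({ version: "0.1.3" }, { version: "0.1.1" }));
console.log(splitPrerelease("0.1.5-beta.20260320121253"));
```

The numeric prerelease identifier looks like a timestamp, but the diff does not say how it is generated, so the sketch treats it as an opaque string.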

package/skills/knowhere/SKILL.md CHANGED

@@ -132,7 +132,7 @@ After ingesting a document, use the returned document or job identifiers for fol
 ## Recommended workflow
 1. If the document may already exist in the current scope, call `knowhere_list_documents` first and compare `Source`, `File`, and `Title` to find an existing match.
-2. Ingest or import the document only if it is not already in the store, or if the user explicitly wants a fresh parse. After calling `knowhere_ingest_document`, you receive a job ID immediately while parsing continues in the background. Wait for the plugin's completion message that the document is ready before proceeding. If no completion message arrives, check with `knowhere_get_job_status`.
+2. Ingest or import the document only if it is not already in the store, or if the user explicitly wants a fresh parse. After calling `knowhere_ingest_document`, you receive a job ID immediately while parsing continues in the background. If the current turn needs the parsed document, check with `knowhere_get_job_status`; otherwise stop and wait for the user to continue later.
 3. Call `knowhere_list_documents` again if you need to confirm the right `docId`.
 4. Call `knowhere_preview_document` to get a structural overview (table of contents with summaries).
 5. When you know what to search for, call `knowhere_grep` with `conditions: [{ pattern: "your query" }]` — this searches all text fields (content, summary, keywords, path) in one call. Add more conditions to narrow results (e.g. filter by `chunk.type` or `chunk.path`).

package/skills/knowhere_memory/SKILL.md ADDED

@@ -0,0 +1,167 @@
+---
+name: knowhere_memory
+description: Auto-discover and search knowledge from Knowhere parsed documents. Use when the user asks questions, needs information, or references their knowledge base.
+user-invocable: false
+---
+# Knowhere Knowledge Memory
+This agent has access to a **personal knowledge base** managed by Knowhere.
+## When to Use
+Activate this skill when:
+- The user asks a question that might be answered by their documents
+- The user says "look it up", "help me find", "knowledge base", "my materials", etc.
+- The user asks "what materials do I have" or wants an overview
+## Data Location
+All knowledge data lives under `~/.knowhere/{kb_id}/`:
+```text
+~/.knowhere/
+└── {kb_id}/                          # e.g. "chengke_kb"
+    ├── knowledge_graph.json          # START HERE — file-level overview + cross-file edges
+    ├── chunk_stats.json              # Hit counts / usage stats per chunk
+    └── {document_name}/              # One subdir per parsed document
+        ├── chunks.json               # All chunks for this document (the actual content)
+        ├── hierarchy.json            # Document structure tree
+        ├── images/                   # Extracted images (JPEG/PNG)
+        └── tables/                   # Extracted tables (HTML files)
+```
+## File Schema Reference
+### knowledge_graph.json — Global Navigation (read this first)
+```json
+{
+  "version": "2.0",
+  "stats": { "total_files": 5, "total_chunks": 327, "total_cross_file_edges": 3 },
+  "files": {
+    "report.docx": {
+      "chunks_count": 198,
+      "types": { "text": 135, "table": 21, "image": 42 },
+      "top_keywords": ["excavation", "retaining", "construction"],
+      "top_summary": "",
+      "importance": 0.85
+    }
+  },
+  "edges": [
+    {
+      "source": "file_A.docx",
+      "target": "file_B.pdf",
+      "connection_count": 20,
+      "avg_score": 0.95,
+      "top_connections": [
+        {
+          "source_chunk": "Chapter 3 Safety Measures",
+          "source_id": "abc123-...",
+          "target_chunk": "Safety Management Policy",
+          "target_id": "def456-...",
+          "relation": "related",
+          "score": 1.0
+        }
+      ]
+    }
+  ]
+}
+```
+### chunks.json — Document Content (read per-file, on demand)
+Located at `~/.knowhere/{kb_id}/{document_name}/chunks.json`.
+```json
+{
+  "chunks": [
+    {
+      "chunk_id": "34da946a-5938-578c-...",
+      "type": "text",
+      "path": "Default_Root/report.docx/Chapter 1/1.1",
+      "content": "actual content...",
+      "metadata": {
+        "summary": "LLM-generated summary (may be empty)",
+        "keywords": ["Extracted keywords"],
+        "tokens": ["Jieba tokenization"],
+        "length": 1234,
+        "page_nums": "Source pages (PDF/DOCX)"
+      }
+    }
+  ]
+}
+```
+**Content format by chunk type:**
+- `text`: Plain text with embedded markers like `IMAGE_uuid_IMAGE` or `TABLE_uuid_TABLE`
+- `table`: Raw HTML (`<table>...</table>`)
+- `image`: Brief description + `IMAGE_uuid_IMAGE` marker; actual image file in `images/` subdir
+### hierarchy.json — Document Structure
+Three sub-trees:
+- `images/`: all extracted images with descriptive names
+- `tables/`: all extracted tables with header-based names
+- `Default_Root/{filename}/`: section hierarchy (chapters → subsections)
+## Retrieval Workflow
+All operations below are **read-only** — use your file reading tools (e.g. `view_file`, `read_file`) to read JSON files directly. Do NOT use shell commands like `cat` — use native file reading tools that don't require user approval.
+Follow this pattern — do NOT explore the filesystem blindly:
+### Before Step 1: Resolve `kb_id`
+- If the user already specified a KB, use that `kb_id`.
+- Otherwise, inspect only the top level of `~/.knowhere/` to discover available KB IDs.
+- If exactly one KB is available, use it.
+- If multiple KBs are available and the user did not specify one, ask which KB to use.
+- Do not explore beyond the top level of `~/.knowhere/` until `kb_id` is known.
+### Step 1: Read knowledge_graph.json (global navigation)
+Read the file `~/.knowhere/{kb_id}/knowledge_graph.json` using your file reading tool.
+From this you get:
+- **File list** with `top_keywords` → match user's question against ALL files, not just one
+- **importance** → prioritize high-value files when multiple match
+- **edges** → note which matched files connect to other files (you'll need these in Step 3)
+**Important**: Identify ALL candidate files whose `top_keywords` are relevant to the query. Do not stop at the first match.
+### Step 2: Search ALL candidate files' chunks.json
+For EACH candidate file identified in Step 1, read `~/.knowhere/{kb_id}/{document_name}/chunks.json`.
+Search the `chunks` array:
+- Match `metadata.summary` or `content` against the user's query
+- Use `metadata.keywords` for topic matching
+- Use `path` to understand where the chunk sits in the document structure
+- Use `chunk_id` to cross-reference with edge `source_id`/`target_id`
+Collect matching chunks from ALL files, not just the first one that hits.
+### Step 3: Expand via edges (required, not optional)
+After finding matches, ALWAYS check the `edges` array from Step 1 for connections:
+1. Look at edges involving your matched files
+2. Check `top_connections` — if any `source_chunk`/`target_chunk` names are related to the query topic, the connected file likely has relevant content too
+3. If the connected file wasn't already in your candidate set, read its `chunks.json` and search for related content
+4. Use `source_id`/`target_id` to jump directly to specific related chunks
+**Why this matters**: Documents often split related information across files. Edges reveal these connections.
+## Response Guidelines
+- **Multi-source**: Synthesize information from ALL matched files, not just one
+- **Cite sources**: Include document name and chunk path for each piece of information
+- **Show connections**: When edges link matched chunks across files, mention the relationship
+- **Distinguish**: Be transparent about what comes from parsed documents vs general knowledge
+- **Use summaries**: When available, `metadata.summary` gives a quick overview without reading full content
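
The new `knowhere_memory` skill is written as instructions for an agent, not as code, but its three retrieval steps map onto simple operations over the JSON shapes it documents. The sketch below works on in-memory objects shaped like `knowledge_graph.json` and `chunks.json`. All function names and sample data are hypothetical, matching is plain case-insensitive substring or keyword comparison, and Step 3 is simplified to follow every edge without first checking whether `top_connections` relate to the query.

```javascript
// Step 1: every file whose top_keywords overlap the query, highest importance first.
function findCandidateFiles(graph, queryTerms) {
  const terms = queryTerms.map((t) => t.toLowerCase());
  return Object.entries(graph.files)
    .filter(([, info]) =>
      (info.top_keywords ?? []).some((k) => terms.includes(k.toLowerCase())))
    .sort(([, a], [, b]) => (b.importance ?? 0) - (a.importance ?? 0))
    .map(([name]) => name);
}

// Step 2: chunks whose content, summary, or keywords mention any query term.
function searchChunks(chunksDoc, queryTerms) {
  const terms = queryTerms.map((t) => t.toLowerCase());
  return chunksDoc.chunks.filter((chunk) => {
    const haystack = [
      chunk.content ?? "",
      chunk.metadata?.summary ?? "",
      ...(chunk.metadata?.keywords ?? [])
    ].join(" ").toLowerCase();
    return terms.some((t) => haystack.includes(t));
  });
}

// Step 3 (simplified): add every file joined to a candidate by an edge.
function expandViaEdges(graph, candidates) {
  const result = new Set(candidates);
  for (const edge of graph.edges ?? []) {
    if (candidates.includes(edge.source)) result.add(edge.target);
    if (candidates.includes(edge.target)) result.add(edge.source);
  }
  return [...result];
}

// Made-up sample data in the documented shapes.
const graph = {
  files: {
    "notes.md": { top_keywords: ["budget"], importance: 0.2 },
    "policy.pdf": { top_keywords: ["safety", "management"], importance: 0.6 },
    "report.docx": { top_keywords: ["excavation", "retaining"], importance: 0.85 }
  },
  edges: [{ source: "report.docx", target: "policy.pdf", connection_count: 20 }]
};
const reportChunks = {
  chunks: [
    { chunk_id: "c1", type: "text", content: "Excavation depth is 6 m.", metadata: { keywords: ["excavation"] } },
    { chunk_id: "c2", type: "text", content: "Budget summary.", metadata: { keywords: ["budget"] } }
  ]
};

const candidates = findCandidateFiles(graph, ["Excavation"]);
const expanded = expandViaEdges(graph, candidates);
const hits = searchChunks(reportChunks, ["Excavation"]);
console.log(candidates, expanded, hits.map((c) => c.chunk_id));
```

An agent following the skill would read the files with its own file-reading tools; loading them from disk is left out here so the matching logic stays visible.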