npm - @ontos-ai/knowhere-claw - Versions diffs - 0.1.2 → 0.1.4 - Mend

@ontos-ai/knowhere-claw 0.1.2 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/README.md +9 -5
package/openclaw.plugin.json +1 -2
package/package.json +2 -2
package/skills/knowhere_memory/SKILL.md +167 -0

package/README.md CHANGED

@@ -10,7 +10,7 @@ Quick mental model:
 - Restart the Gateway
 - Configure `plugins.entries."knowhere-claw".config`
 - Ask an agent to read, search, or compare a document
-- Let the bundled `knowhere` skill steer the agent toward the right tools
+- Let the bundled `knowhere` and `knowhere_memory` skills steer the agent toward the right tools
 ## Where It Runs
@@ -25,8 +25,8 @@ the machine running that Gateway, then restart that Gateway.
 - Store parsed result packages inside OpenClaw-managed state
 - Preview document structure, search chunks, and inspect raw result files
 - Reuse stored documents across `session`, `agent`, or `global` scope
-- Ship a bundled `knowhere` skill so agents prefer this toolchain for
-  document-heavy tasks
+- Ship bundled `knowhere` and `knowhere_memory` skills so agents prefer this
+  toolchain for document-heavy tasks and knowledge-base lookups
 ## Install
@@ -88,9 +88,13 @@ Once the plugin is enabled, you can ask an OpenClaw agent to:
 The bundled `knowhere` skill teaches agents to use the `knowhere_*` tools
 instead of raw file reads when document parsing matters.
+The bundled `knowhere_memory` skill teaches agents to treat previously parsed
+Knowhere content as a knowledge base when users ask to search their materials,
+look something up, or summarize what data they already have.
 If you use skill filters or allowlists in OpenClaw, keep the bundled
-`knowhere` skill enabled or the tools will load without their intended usage
-guidance.
+`knowhere` and `knowhere_memory` skills enabled or the tools will load without
+their intended usage guidance.
 If your agent runtime uses a tool allowlist, include `knowhere_*` so agents can
 actually call the plugin tools.

package/openclaw.plugin.json CHANGED

@@ -3,7 +3,7 @@
   "name": "Knowhere",
   "description": "Parse documents with Knowhere and expose the stored result as tool-queryable document state for OpenClaw agents.",
   "skills": ["./skills"],
-  "version": "0.1.1",
+  "version": "0.1.4",
   "uiHints": {
     "apiKey": {
       "label": "Knowhere API Key",
@@ -49,7 +49,6 @@
       },
       "baseUrl": {
         "type": "string",
-        "format": "string",
         "default": "https://api.knowhereto.ai"
       },
       "storageDir": {

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@ontos-ai/knowhere-claw",
-  "version": "0.1.2",
+  "version": "0.1.4",
   "description": "OpenClaw plugin for Knowhere-powered document ingestion and automatic grounding.",
   "files": [
     "dist/",
@@ -60,4 +60,4 @@
       "./dist/index.js"
     ]
   }
-}
+}

package/skills/knowhere_memory/SKILL.md ADDED

@@ -0,0 +1,167 @@
+---
+name: knowhere_memory
+description: Auto-discover and search knowledge from Knowhere parsed documents. Use when the user asks questions, needs information, or references their knowledge base.
+user-invocable: false
+---
+# Knowhere Knowledge Memory
+This agent has access to a **personal knowledge base** managed by Knowhere.
+## When to Use
+Activate this skill when:
+- The user asks a question that might be answered by their documents
+- The user says "look it up", "help me find", "knowledge base", "my materials", etc.
+- The user asks "what materials do I have" or wants an overview
+## Data Location
+All knowledge data lives under `~/.knowhere/{kb_id}/`:
+```text
+~/.knowhere/
+└── {kb_id}/                          # e.g. "chengke_kb"
+    ├── knowledge_graph.json          # START HERE — file-level overview + cross-file edges
+    ├── chunk_stats.json              # Hit counts / usage stats per chunk
+    └── {document_name}/              # One subdir per parsed document
+        ├── chunks.json               # All chunks for this document (the actual content)
+        ├── hierarchy.json            # Document structure tree
+        ├── images/                   # Extracted images (JPEG/PNG)
+        └── tables/                   # Extracted tables (HTML files)
+```
+## File Schema Reference
+### knowledge_graph.json — Global Navigation (read this first)
+```json
+{
+  "version": "2.0",
+  "stats": { "total_files": 5, "total_chunks": 327, "total_cross_file_edges": 3 },
+  "files": {
+    "report.docx": {
+      "chunks_count": 198,
+      "types": { "text": 135, "table": 21, "image": 42 },
+      "top_keywords": ["excavation", "retaining", "construction"],
+      "top_summary": "",
+      "importance": 0.85
+    }
+  },
+  "edges": [
+    {
+      "source": "file_A.docx",
+      "target": "file_B.pdf",
+      "connection_count": 20,
+      "avg_score": 0.95,
+      "top_connections": [
+        {
+          "source_chunk": "Chapter 3 Safety Measures",
+          "source_id": "abc123-...",
+          "target_chunk": "Safety Management Policy",
+          "target_id": "def456-...",
+          "relation": "related",
+          "score": 1.0
+        }
+      ]
+    }
+  ]
+}
+```
+### chunks.json — Document Content (read per-file, on demand)
+Located at `~/.knowhere/{kb_id}/{document_name}/chunks.json`.
+```json
+{
+  "chunks": [
+    {
+      "chunk_id": "34da946a-5938-578c-...",
+      "type": "text",
+      "path": "Default_Root/report.docx/Chapter 1/1.1",
+      "content": "actual content...",
+      "metadata": {
+        "summary": "LLM-generated summary (may be empty)",
+        "keywords": ["Extracted keywords"],
+        "tokens": ["Jieba tokenization"],
+        "length": 1234,
+        "page_nums": "Source pages (PDF/DOCX)"
+      }
+    }
+  ]
+}
+```
+**Content format by chunk type:**
+- `text`: Plain text with embedded markers like `IMAGE_uuid_IMAGE` or `TABLE_uuid_TABLE`
+- `table`: Raw HTML (`<table>...</table>`)
+- `image`: Brief description + `IMAGE_uuid_IMAGE` marker; actual image file in `images/` subdir
+### hierarchy.json — Document Structure
+Three sub-trees:
+- `images/`: all extracted images with descriptive names
+- `tables/`: all extracted tables with header-based names
+- `Default_Root/{filename}/`: section hierarchy (chapters → subsections)
+## Retrieval Workflow
+All operations below are **read-only** — use your file reading tools (e.g. `view_file`, `read_file`) to read JSON files directly. Do NOT use shell commands like `cat` — use native file reading tools that don't require user approval.
+Follow this pattern — do NOT explore the filesystem blindly:
+### Before Step 1: Resolve `kb_id`
+- If the user already specified a KB, use that `kb_id`.
+- Otherwise, inspect only the top level of `~/.knowhere/` to discover available KB IDs.
+- If exactly one KB is available, use it.
+- If multiple KBs are available and the user did not specify one, ask which KB to use.
+- Do not explore beyond the top level of `~/.knowhere/` until `kb_id` is known.
+### Step 1: Read knowledge_graph.json (global navigation)
+Read the file `~/.knowhere/{kb_id}/knowledge_graph.json` using your file reading tool.
+From this you get:
+- **File list** with `top_keywords` → match user's question against ALL files, not just one
+- **importance** → prioritize high-value files when multiple match
+- **edges** → note which matched files connect to other files (you'll need these in Step 3)
+**Important**: Identify ALL candidate files whose `top_keywords` are relevant to the query. Do not stop at the first match.
+### Step 2: Search ALL candidate files' chunks.json
+For EACH candidate file identified in Step 1, read `~/.knowhere/{kb_id}/{document_name}/chunks.json`.
+Search the `chunks` array:
+- Match `metadata.summary` or `content` against the user's query
+- Use `metadata.keywords` for topic matching
+- Use `path` to understand where the chunk sits in the document structure
+- Use `chunk_id` to cross-reference with edge `source_id`/`target_id`
+Collect matching chunks from ALL files, not just the first one that hits.
+### Step 3: Expand via edges (required, not optional)
+After finding matches, ALWAYS check the `edges` array from Step 1 for connections:
+1. Look at edges involving your matched files
+2. Check `top_connections` — if any `source_chunk`/`target_chunk` names are related to the query topic, the connected file likely has relevant content too
+3. If the connected file wasn't already in your candidate set, read its `chunks.json` and search for related content
+4. Use `source_id`/`target_id` to jump directly to specific related chunks
+**Why this matters**: Documents often split related information across files. Edges reveal these connections.
+## Response Guidelines
+- **Multi-source**: Synthesize information from ALL matched files, not just one
+- **Cite sources**: Include document name and chunk path for each piece of information
+- **Show connections**: When edges link matched chunks across files, mention the relationship
+- **Distinguish**: Be transparent about what comes from parsed documents vs general knowledge
+- **Use summaries**: When available, `metadata.summary` gives a quick overview without reading full content
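
Note on the added `SKILL.md`: the retrieval workflow it describes (resolve `kb_id`, read `knowledge_graph.json`, search each candidate's `chunks.json`, then expand via `edges`) can be sketched in Python as below. This is not code from the package. The skill is written for an agent to follow with its own file-reading tools, and the sketch only illustrates the logic. All function names are hypothetical. It assumes that each key under `files` in `knowledge_graph.json` is also the name of that document's subdirectory, and it uses plain substring matching as a stand-in for whatever relevance judgment the agent applies.

```python
import json
from pathlib import Path
from typing import Optional


def load_json(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))


def mentions(text: str, terms: list[str]) -> bool:
    """Case-insensitive substring match (a stand-in for real relevance scoring)."""
    lowered = text.lower()
    return any(t.lower() in lowered for t in terms)


def resolve_kb_id(root: Path, requested: Optional[str] = None) -> str:
    """Resolve kb_id by inspecting only the top level of the Knowhere root."""
    if requested:
        return requested
    kb_ids = sorted(p.name for p in root.iterdir() if p.is_dir())
    if len(kb_ids) != 1:
        # Several KBs: the skill says to ask the user. Zero is treated the same here.
        raise LookupError(f"kb_id is ambiguous, found: {kb_ids}")
    return kb_ids[0]


def candidate_files(graph: dict, terms: list[str]) -> list[str]:
    """Step 1: all files whose top_keywords match, highest importance first."""
    files = graph.get("files", {})
    hits = [
        name for name, info in files.items()
        if mentions(" ".join(info.get("top_keywords") or []), terms)
    ]
    return sorted(hits, key=lambda n: files[n].get("importance", 0), reverse=True)


def search_chunks(kb_dir: Path, document: str, terms: list[str]) -> list[dict]:
    """Step 2: chunks whose content, summary, or keywords mention a term."""
    # Assumption: the graph's file key doubles as the subdirectory name.
    chunks = load_json(kb_dir / document / "chunks.json").get("chunks", [])
    matches = []
    for chunk in chunks:
        meta = chunk.get("metadata") or {}
        haystack = " ".join(
            [chunk.get("content") or "", meta.get("summary") or "",
             *(meta.get("keywords") or [])]
        )
        if mentions(haystack, terms):
            matches.append({
                "document": document,
                "path": chunk.get("path"),  # kept so answers can cite the source
                "chunk_id": chunk.get("chunk_id"),
            })
    return matches


def linked_files(graph: dict, matched: set, terms: list[str]) -> list[str]:
    """Step 3: files joined to a matched file by an edge with on-topic connections."""
    linked = []
    for edge in graph.get("edges", []):
        ends = (edge.get("source"), edge.get("target"))
        if not any(e in matched for e in ends):
            continue
        labels = " ".join(
            f"{c.get('source_chunk', '')} {c.get('target_chunk', '')}"
            for c in edge.get("top_connections", [])
        )
        if mentions(labels, terms):
            linked.extend(
                e for e in ends if e and e not in matched and e not in linked
            )
    return linked


def retrieve(root: Path, terms: list[str], kb_id: Optional[str] = None) -> list[dict]:
    kb_dir = root / resolve_kb_id(root, kb_id)
    graph = load_json(kb_dir / "knowledge_graph.json")
    searched = candidate_files(graph, terms)
    results = [m for name in searched for m in search_chunks(kb_dir, name, terms)]
    matched = {m["document"] for m in results}
    for extra in linked_files(graph, matched, terms):
        if extra not in searched:
            results.extend(search_chunks(kb_dir, extra, terms))
    return results
```

A call such as `retrieve(Path.home() / ".knowhere", ["excavation"])` would return one entry per matching chunk with its document name and `path`, which is what the Response Guidelines ask to cite. The sketch expands through edges by file only; jumping straight to a chunk via `source_id`/`target_id`, as Step 3 also allows, is left out for brevity.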