@ontos-ai/knowhere-claw 0.1.2 → 0.1.4

package/README.md CHANGED
@@ -10,7 +10,7 @@ Quick mental model:
10
10
  - Restart the Gateway
11
11
  - Configure `plugins.entries."knowhere-claw".config`
12
12
  - Ask an agent to read, search, or compare a document
13
- - Let the bundled `knowhere` skill steer the agent toward the right tools
13
+ - Let the bundled `knowhere` and `knowhere_memory` skills steer the agent toward the right tools
14
14
 
15
15
  ## Where It Runs
16
16
 
@@ -25,8 +25,8 @@ the machine running that Gateway, then restart that Gateway.
25
25
  - Store parsed result packages inside OpenClaw-managed state
26
26
  - Preview document structure, search chunks, and inspect raw result files
27
27
  - Reuse stored documents across `session`, `agent`, or `global` scope
28
- - Ship a bundled `knowhere` skill so agents prefer this toolchain for
29
- document-heavy tasks
28
+ - Ship bundled `knowhere` and `knowhere_memory` skills so agents prefer this
29
+ toolchain for document-heavy tasks and knowledge-base lookups
30
30
 
31
31
  ## Install
32
32
 
@@ -88,9 +88,13 @@ Once the plugin is enabled, you can ask an OpenClaw agent to:
88
88
  The bundled `knowhere` skill teaches agents to use the `knowhere_*` tools
89
89
  instead of raw file reads when document parsing matters.
90
90
 
91
+ The bundled `knowhere_memory` skill teaches agents to treat previously parsed
92
+ Knowhere content as a knowledge base when users ask to search their materials,
93
+ look something up, or summarize what data they already have.
94
+
91
95
  If you use skill filters or allowlists in OpenClaw, keep the bundled
92
- `knowhere` skill enabled or the tools will load without their intended usage
93
- guidance.
96
+ `knowhere` and `knowhere_memory` skills enabled or the tools will load without
97
+ their intended usage guidance.
94
98
 
95
99
  If your agent runtime uses a tool allowlist, include `knowhere_*` so agents can
96
100
  actually call the plugin tools.
@@ -3,7 +3,7 @@
3
3
  "name": "Knowhere",
4
4
  "description": "Parse documents with Knowhere and expose the stored result as tool-queryable document state for OpenClaw agents.",
5
5
  "skills": ["./skills"],
6
- "version": "0.1.1",
6
+ "version": "0.1.4",
7
7
  "uiHints": {
8
8
  "apiKey": {
9
9
  "label": "Knowhere API Key",
@@ -49,7 +49,6 @@
49
49
  },
50
50
  "baseUrl": {
51
51
  "type": "string",
52
- "format": "string",
53
52
  "default": "https://api.knowhereto.ai"
54
53
  },
55
54
  "storageDir": {
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@ontos-ai/knowhere-claw",
3
- "version": "0.1.2",
3
+ "version": "0.1.4",
4
4
  "description": "OpenClaw plugin for Knowhere-powered document ingestion and automatic grounding.",
5
5
  "files": [
6
6
  "dist/",
@@ -60,4 +60,4 @@
60
60
  "./dist/index.js"
61
61
  ]
62
62
  }
63
- }
63
+ }
@@ -0,0 +1,167 @@
1
+ ---
2
+ name: knowhere_memory
3
+ description: Auto-discover and search knowledge from Knowhere parsed documents. Use when the user asks questions, needs information, or references their knowledge base.
4
+ user-invocable: false
5
+ ---
6
+
7
+ # Knowhere Knowledge Memory
8
+
9
+ This agent has access to a **personal knowledge base** managed by Knowhere.
10
+
11
+ ## When to Use
12
+
13
+ Activate this skill when:
14
+
15
+ - The user asks a question that might be answered by their documents
16
+ - The user says "look it up", "help me find", "knowledge base", "my materials", etc.
17
+ - The user asks "what materials do I have" or wants an overview
18
+
19
+ ## Data Location
20
+
21
+ All knowledge data lives under `~/.knowhere/{kb_id}/`:
22
+
23
+ ```text
24
+ ~/.knowhere/
25
+ └── {kb_id}/ # e.g. "chengke_kb"
26
+     ├── knowledge_graph.json # START HERE — file-level overview + cross-file edges
27
+     ├── chunk_stats.json # Hit counts / usage stats per chunk
28
+     └── {document_name}/ # One subdir per parsed document
29
+         ├── chunks.json # All chunks for this document (the actual content)
30
+         ├── hierarchy.json # Document structure tree
31
+         ├── images/ # Extracted images (JPEG/PNG)
32
+         └── tables/ # Extracted tables (HTML files)
33
+ ```
34
+
35
+ ## File Schema Reference
36
+
37
+ ### knowledge_graph.json — Global Navigation (read this first)
38
+
39
+ ```json
40
+ {
41
+ "version": "2.0",
42
+ "stats": { "total_files": 5, "total_chunks": 327, "total_cross_file_edges": 3 },
43
+ "files": {
44
+ "report.docx": {
45
+ "chunks_count": 198,
46
+ "types": { "text": 135, "table": 21, "image": 42 },
47
+ "top_keywords": ["excavation", "retaining", "construction"],
48
+ "top_summary": "",
49
+ "importance": 0.85
50
+ }
51
+ },
52
+ "edges": [
53
+ {
54
+ "source": "file_A.docx",
55
+ "target": "file_B.pdf",
56
+ "connection_count": 20,
57
+ "avg_score": 0.95,
58
+ "top_connections": [
59
+ {
60
+ "source_chunk": "Chapter 3 Safety Measures",
61
+ "source_id": "abc123-...",
62
+ "target_chunk": "Safety Management Policy",
63
+ "target_id": "def456-...",
64
+ "relation": "related",
65
+ "score": 1.0
66
+ }
67
+ ]
68
+ }
69
+ ]
70
+ }
71
+ ```
72
+
73
+ ### chunks.json — Document Content (read per-file, on demand)
74
+
75
+ Located at `~/.knowhere/{kb_id}/{document_name}/chunks.json`.
76
+
77
+ ```json
78
+ {
79
+ "chunks": [
80
+ {
81
+ "chunk_id": "34da946a-5938-578c-...",
82
+ "type": "text",
83
+ "path": "Default_Root/report.docx/Chapter 1/1.1",
84
+ "content": "actual content...",
85
+ "metadata": {
86
+ "summary": "LLM-generated summary (may be empty)",
87
+ "keywords": ["Extracted keywords"],
88
+ "tokens": ["Jieba tokenization"],
89
+ "length": 1234,
90
+ "page_nums": "Source pages (PDF/DOCX)"
91
+ }
92
+ }
93
+ ]
94
+ }
95
+ ```
96
+
97
+ **Content format by chunk type:**
98
+
99
+ - `text`: Plain text with embedded markers like `IMAGE_uuid_IMAGE` or `TABLE_uuid_TABLE`
100
+ - `table`: Raw HTML (`<table>...</table>`)
101
+ - `image`: Brief description + `IMAGE_uuid_IMAGE` marker; actual image file in `images/` subdir
102
+
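Reviewer sketch (not part of the package): the embedded markers described above can be pulled out of a `text` chunk with a regular expression. The skill file only shows the placeholder form `IMAGE_uuid_IMAGE`, so the assumption that the middle part consists of hex digits and hyphens is mine, and `find_markers` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumption: the "uuid" part is made of hex digits and hyphens.
# The backreference \1 requires the closing tag to match the opening one.
MARKER = re.compile(r"\b(IMAGE|TABLE)_([0-9a-fA-F-]+)_\1\b")


def find_markers(content: str) -> List[Tuple[str, str]]:
    """Return (kind, id) pairs for every IMAGE/TABLE marker found in content."""
    return [(m.group(1), m.group(2)) for m in MARKER.finditer(content)]
```

A caller could then look up each returned ID among the `image` or `table` chunks of the same document.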
103
+ ### hierarchy.json — Document Structure
104
+
105
+ Three sub-trees:
106
+
107
+ - `images/`: all extracted images with descriptive names
108
+ - `tables/`: all extracted tables with header-based names
109
+ - `Default_Root/{filename}/`: section hierarchy (chapters → subsections)
110
+
111
+ ## Retrieval Workflow
112
+
113
+ All operations below are **read-only** — use your file reading tools (e.g. `view_file`, `read_file`) to read JSON files directly. Do NOT use shell commands like `cat` — use native file reading tools that don't require user approval.
114
+
115
+ Follow this pattern — do NOT explore the filesystem blindly:
116
+
117
+ ### Before Step 1: Resolve `kb_id`
118
+
119
+ - If the user already specified a KB, use that `kb_id`.
120
+ - Otherwise, inspect only the top level of `~/.knowhere/` to discover available KB IDs.
121
+ - If exactly one KB is available, use it.
122
+ - If multiple KBs are available and the user did not specify one, ask which KB to use.
123
+ - Do not explore beyond the top level of `~/.knowhere/` until `kb_id` is known.
124
+
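Reviewer sketch (not part of the package): the `kb_id` resolution rules above, written out as code. It assumes every top-level directory under the root is a knowledge base; `list_kb_ids` and `resolve_kb_id` are hypothetical names, and a return value of `None` stands for "ask the user".

```python
from pathlib import Path
from typing import List, Optional


def list_kb_ids(root: Path) -> List[str]:
    """List top-level directory names under root, without descending further."""
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def resolve_kb_id(root: Path, requested: Optional[str] = None) -> Optional[str]:
    """Return the KB ID to use, or None when the user has to be asked."""
    if requested:
        # The user already named a KB, so use it as given.
        return requested
    kb_ids = list_kb_ids(root)
    if len(kb_ids) == 1:
        return kb_ids[0]
    # Zero or several KBs: the caller should ask the user.
    return None
```

A typical call would be `resolve_kb_id(Path.home() / ".knowhere")`.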
125
+ ### Step 1: Read knowledge_graph.json (global navigation)
126
+
127
+ Read the file `~/.knowhere/{kb_id}/knowledge_graph.json` using your file reading tool.
128
+
129
+ From this you get:
130
+
131
+ - **File list** with `top_keywords` → match the user's question against ALL files, not just one
132
+ - **importance** → prioritize high-value files when multiple match
133
+ - **edges** → note which matched files connect to other files (you'll need these in Step 3)
134
+
135
+ **Important**: Identify ALL candidate files whose `top_keywords` are relevant to the query. Do not stop at the first match.
136
+
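Reviewer sketch (not part of the package): one way to select all candidate files in Step 1. It assumes the `knowledge_graph.json` shape shown in the schema reference and uses case-insensitive exact keyword matching, which is my simplification; an agent may well match more loosely. `candidate_files` is a hypothetical name.

```python
from typing import Any, Dict, List


def candidate_files(graph: Dict[str, Any], query_terms: List[str]) -> List[str]:
    """Return every file whose top_keywords overlap the query, most important first."""
    terms = {t.lower() for t in query_terms}
    matches = []
    for name, info in graph.get("files", {}).items():
        keywords = {k.lower() for k in info.get("top_keywords", [])}
        if terms & keywords:
            matches.append((info.get("importance", 0.0), name))
    # Sort by importance descending, then by name for a stable order.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in matches]
```

Returning the full list rather than the best single hit mirrors the instruction not to stop at the first match.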
137
+ ### Step 2: Search ALL candidate files' chunks.json
138
+
139
+ For EACH candidate file identified in Step 1, read `~/.knowhere/{kb_id}/{document_name}/chunks.json`.
140
+
141
+ Search the `chunks` array:
142
+
143
+ - Match `metadata.summary` or `content` against the user's query
144
+ - Use `metadata.keywords` for topic matching
145
+ - Use `path` to understand where the chunk sits in the document structure
146
+ - Use `chunk_id` to cross-reference with edge `source_id`/`target_id`
147
+
148
+ Collect matching chunks from ALL files, not just the first one that hits.
149
+
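Reviewer sketch (not part of the package): a plain substring search over one parsed `chunks.json`, following the fields named in Step 2. The substring matching and the `search_chunks` name are assumptions for illustration; the skill itself leaves the matching strategy to the agent.

```python
from typing import Any, Dict, List


def search_chunks(doc: Dict[str, Any], query_terms: List[str]) -> List[Dict[str, Any]]:
    """Return id, path, and type of chunks that mention any query term."""
    terms = [t.lower() for t in query_terms if t]
    hits = []
    for chunk in doc.get("chunks", []):
        meta = chunk.get("metadata", {})
        # Search content, summary, and keywords together, case-insensitively.
        fields = [chunk.get("content", ""), meta.get("summary", "")]
        fields += list(meta.get("keywords", []))
        haystack = " ".join(fields).lower()
        if any(term in haystack for term in terms):
            hits.append({
                "chunk_id": chunk.get("chunk_id"),
                "path": chunk.get("path"),
                "type": chunk.get("type"),
            })
    return hits
```

Running this once per candidate file and concatenating the results gives the cross-file hit list that Step 3 then expands.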
150
+ ### Step 3: Expand via edges (required, not optional)
151
+
152
+ After finding matches, ALWAYS check the `edges` array from Step 1 for connections:
153
+
154
+ 1. Look at edges involving your matched files
155
+ 2. Check `top_connections` — if any `source_chunk`/`target_chunk` names are related to the query topic, the connected file likely has relevant content too
156
+ 3. If the connected file wasn't already in your candidate set, read its `chunks.json` and search for related content
157
+ 4. Use `source_id`/`target_id` to jump directly to specific related chunks
158
+
159
+ **Why this matters**: Documents often split related information across files. Edges reveal these connections.
160
+
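Reviewer sketch (not part of the package): the edge expansion in Step 3, assuming the `edges` shape from the schema reference. It treats edges as undirected, which the skill file does not state, and it skips the topical relevance check on `source_chunk`/`target_chunk` names that item 2 asks for. `expand_via_edges` is a hypothetical name.

```python
from typing import Any, Dict, List, Set


def expand_via_edges(graph: Dict[str, Any], matched: Set[str]) -> Dict[str, List[str]]:
    """Map each connected but unmatched file to the chunk IDs its edges point at."""
    extra: Dict[str, List[str]] = {}
    for edge in graph.get("edges", []):
        source, target = edge.get("source"), edge.get("target")
        if source in matched and target not in matched:
            other, id_key = target, "target_id"
        elif target in matched and source not in matched:
            other, id_key = source, "source_id"
        else:
            # Both ends already matched, or neither end matched.
            continue
        ids = extra.setdefault(other, [])
        for conn in edge.get("top_connections", []):
            chunk_id = conn.get(id_key)
            if chunk_id and chunk_id not in ids:
                ids.append(chunk_id)
    return extra
```

The returned chunk IDs are the ones to look up in the newly added file's `chunks.json`.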
161
+ ## Response Guidelines
162
+
163
+ - **Multi-source**: Synthesize information from ALL matched files, not just one
164
+ - **Cite sources**: Include document name and chunk path for each piece of information
165
+ - **Show connections**: When edges link matched chunks across files, mention the relationship
166
+ - **Distinguish**: Be transparent about what comes from parsed documents vs general knowledge
167
+ - **Use summaries**: When available, `metadata.summary` gives a quick overview without reading full content
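
Reviewer sketch (not part of the package): a citation string built from a chunk `path`, as the "Cite sources" guideline suggests. It assumes paths follow the `Default_Root/{filename}/...` form shown in the schema example and that document names contain no `/`; `split_chunk_path` and `cite` are hypothetical helpers.

```python
from typing import Tuple


def split_chunk_path(path: str) -> Tuple[str, str]:
    """Split 'Default_Root/<file>/<sections...>' into (file, section trail)."""
    parts = path.split("/")
    if parts and parts[0] == "Default_Root":
        parts = parts[1:]
    document = parts[0] if parts else ""
    section = " > ".join(parts[1:])
    return document, section


def cite(path: str) -> str:
    """Format a short source citation from a chunk path."""
    document, section = split_chunk_path(path)
    return f"{document} ({section})" if section else document
```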