@vainplex/openclaw-knowledge-engine 0.1.0 → 0.1.1

Files changed (3)
  1. package/LICENSE +21 -0
  2. package/README.md +251 -0
  3. package/package.json +1 -1
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Vainplex
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,251 @@
+ # @vainplex/openclaw-knowledge-engine
+
+ A real-time knowledge extraction plugin for [OpenClaw](https://github.com/openclaw/openclaw). Automatically extracts entities, facts, and relationships from conversations — building a persistent, queryable knowledge base that grows with every message.
+
+ ## What it does
+
+ Every message your OpenClaw agent processes flows through the Knowledge Engine:
+
+ 1. **Regex Extraction** (instant, zero cost) — Detects people, organizations, technologies, URLs, emails, and other entities using pattern matching
+ 2. **LLM Enhancement** (optional, batched) — Groups messages and sends them to a local LLM for deeper entity and fact extraction
+ 3. **Fact Storage** — Stores extracted knowledge as structured subject-predicate-object triples with relevance scoring
+ 4. **Relevance Decay** — Automatically decays old facts so recent knowledge surfaces first
+ 5. **Vector Sync** — Optionally syncs facts to ChromaDB for semantic search
+ 6. **Background Maintenance** — Prunes low-relevance facts, compacts storage, runs cleanup
+
+ ```
+ User: "We're meeting with Sebastian from Mondo Gate next Tuesday"
+
+ ├─ Regex → entities: [Sebastian (person), Mondo Gate (organization)]
+ └─ LLM → facts: [Sebastian — works-at — Mondo Gate]
+                 [Meeting — scheduled-with — Mondo Gate]
+ ```
+
+ ## Quick Start
+
+ ### 1. Install
+
+ ```bash
+ cd ~/.openclaw
+ npm install @vainplex/openclaw-knowledge-engine
+ ```
+
+ ### 2. Sync to extensions
+
+ OpenClaw loads plugins from the `extensions/` directory:
+
+ ```bash
+ mkdir -p extensions/openclaw-knowledge-engine
+ cp -r node_modules/@vainplex/openclaw-knowledge-engine/{dist,package.json,openclaw.plugin.json} extensions/openclaw-knowledge-engine/
+ ```
+
+ ### 3. Configure
+
+ Add to your `openclaw.json`:
+
+ ```json
+ {
+   "plugins": {
+     "entries": {
+       "openclaw-knowledge-engine": {
+         "enabled": true,
+         "config": {
+           "workspace": "/path/to/your/workspace",
+           "extraction": {
+             "regex": { "enabled": true },
+             "llm": {
+               "enabled": true,
+               "endpoint": "http://localhost:11434/api/generate",
+               "model": "mistral:7b",
+               "batchSize": 10,
+               "cooldownMs": 30000
+             }
+           }
+         }
+       }
+     }
+   }
+ }
+ ```
+
+ ### 4. Restart gateway
+
+ ```bash
+ openclaw gateway restart
+ ```
+
+ ## Configuration
+
+ | Key | Type | Default | Description |
+ |-----|------|---------|-------------|
+ | `enabled` | boolean | `true` | Enable/disable the plugin |
+ | `workspace` | string | `~/.clawd/plugins/knowledge-engine` | Storage directory for knowledge files |
+ | `extraction.regex.enabled` | boolean | `true` | High-speed regex entity extraction |
+ | `extraction.llm.enabled` | boolean | `true` | LLM-based deep extraction |
+ | `extraction.llm.model` | string | `"mistral:7b"` | Ollama/OpenAI-compatible model |
+ | `extraction.llm.endpoint` | string | `"http://localhost:11434/api/generate"` | LLM API endpoint (HTTP or HTTPS) |
+ | `extraction.llm.batchSize` | number | `10` | Messages per LLM batch |
+ | `extraction.llm.cooldownMs` | number | `30000` | Wait time before sending batch |
+ | `decay.enabled` | boolean | `true` | Periodic relevance decay |
+ | `decay.intervalHours` | number | `24` | Hours between decay cycles |
+ | `decay.rate` | number | `0.02` | Decay rate per interval (2%) |
+ | `embeddings.enabled` | boolean | `false` | Sync facts to ChromaDB |
+ | `embeddings.endpoint` | string | `"http://localhost:8000/..."` | ChromaDB API endpoint |
+ | `embeddings.collectionName` | string | `"openclaw-facts"` | Vector collection name |
+ | `embeddings.syncIntervalMinutes` | number | `15` | Minutes between vector syncs |
+ | `storage.maxEntities` | number | `5000` | Max entities before pruning |
+ | `storage.maxFacts` | number | `10000` | Max facts before pruning |
+ | `storage.writeDebounceMs` | number | `15000` | Debounce delay for disk writes |
+
+ ### Minimal config (regex only, no LLM)
+
+ ```json
+ {
+   "openclaw-knowledge-engine": {
+     "enabled": true,
+     "config": {
+       "extraction": {
+         "llm": { "enabled": false }
+       }
+     }
+   }
+ }
+ ```
+
+ This gives you zero-cost entity extraction with no external dependencies.
+
+ ### Full config (LLM + ChromaDB)
+
+ ```json
+ {
+   "openclaw-knowledge-engine": {
+     "enabled": true,
+     "config": {
+       "workspace": "~/my-agent/knowledge",
+       "extraction": {
+         "llm": {
+           "enabled": true,
+           "endpoint": "http://localhost:11434/api/generate",
+           "model": "mistral:7b"
+         }
+       },
+       "embeddings": {
+         "enabled": true,
+         "endpoint": "http://localhost:8000/api/v1/collections/facts/add"
+       },
+       "decay": {
+         "intervalHours": 12,
+         "rate": 0.03
+       }
+     }
+   }
+ }
+ ```
+
+ ## How it works
+
+ ### Extraction Pipeline
+
+ ```
+ Message received
+
+ ├──▶ Regex Engine (sync, <1ms)
+ │    └─ Extracts: proper nouns, organizations, tech terms,
+ │       URLs, emails, monetary amounts, dates
+
+ └──▶ LLM Batch Queue (async, batched)
+      └─ Every N messages or after cooldown:
+         └─ Sends batch to local LLM
+            └─ Extracts: entities + fact triples
+               └─ Stores in FactStore
+ ```
+
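The "every N messages or after cooldown" rule in the pipeline above can be sketched roughly as follows. This is an illustration only, not the plugin's `llm-enhancer.ts`: `BatchQueue` and `onFlush` are hypothetical names, `batchSize` and `cooldownMs` mirror the config keys, and the choice to start the cooldown at the first queued message (rather than resetting it on every message) is an assumption.

```typescript
// Hypothetical sketch: flush when batchSize is reached, or when the
// cooldown expires after the first message of a batch was queued.
type FlushHandler = (batch: string[]) => void;

class BatchQueue {
  private pending: string[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly batchSize: number;
  private readonly cooldownMs: number;
  private readonly onFlush: FlushHandler;

  constructor(batchSize: number, cooldownMs: number, onFlush: FlushHandler) {
    this.batchSize = batchSize;
    this.cooldownMs = cooldownMs;
    this.onFlush = onFlush;
  }

  add(message: string): void {
    this.pending.push(message);
    if (this.pending.length >= this.batchSize) {
      this.flush(); // size threshold reached: send immediately
    } else if (this.timer === null) {
      // First message of a new batch: start the cooldown timer.
      this.timer = setTimeout(() => this.flush(), this.cooldownMs);
    }
  }

  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.onFlush(batch); // in the plugin this would be the LLM request
  }
}
```

With `batchSize: 10` and `cooldownMs: 30000`, a burst of ten messages would be sent at once, while a single stray message would be sent 30 seconds after it arrived.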
+ ### Fact Lifecycle
+
+ Facts are stored as structured triples:
+
+ ```json
+ {
+   "id": "f-abc123",
+   "subject": "Sebastian",
+   "predicate": "works-at",
+   "object": "Mondo Gate",
+   "source": "extracted-llm",
+   "relevance": 0.95,
+   "createdAt": 1707123456789,
+   "lastAccessedAt": 1707123456789
+ }
+ ```
+
+ - **Relevance** starts at 1.0 and decays over time
+ - **Accessed facts** get a relevance boost (LRU-style)
+ - **Pruning** removes facts below the relevance floor when storage limits are hit
+ - **Minimum floor** (0.1) prevents complete decay — old facts never fully disappear
+
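The decay and boost rules listed above might look like this in code. The README does not give the exact formula, so this sketch assumes multiplicative decay per interval (`relevance * (1 - rate)`); the `Fact` shape is trimmed down, and `decayFacts`, `boostOnAccess`, and the boost amount of 0.1 are hypothetical.

```typescript
// Hypothetical sketch of relevance decay with a floor and an access boost.
// Assumes multiplicative decay; the plugin's real formula may differ.
interface Fact {
  id: string;
  relevance: number;
  lastAccessedAt: number;
}

const RELEVANCE_FLOOR = 0.1; // the "minimum floor" named in the README

// Apply one decay cycle: shrink each relevance by `rate`, never below the floor.
function decayFacts(facts: Fact[], rate: number): Fact[] {
  return facts.map((fact) => ({
    ...fact,
    relevance: Math.max(RELEVANCE_FLOOR, fact.relevance * (1 - rate)),
  }));
}

// Reading a fact raises its relevance again, capped at 1.0.
// The boost amount is an illustrative assumption.
function boostOnAccess(fact: Fact, now: number, boost = 0.1): Fact {
  return {
    ...fact,
    relevance: Math.min(1, fact.relevance + boost),
    lastAccessedAt: now,
  };
}
```

Under this assumption, the default `decay.rate` of 0.02 takes a fresh fact from 1.0 to about 0.98 after one cycle, and a fact already at 0.1 stays at 0.1.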
+ ### Storage
+
+ All data is persisted as JSON files in your workspace:
+
+ ```
+ workspace/
+ ├── entities.json # Extracted entities with types and counts
+ └── facts.json # Fact triples with relevance scores
+ ```
+
+ Writes use atomic file operations (write to `.tmp`, then rename) to prevent corruption.
+
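A minimal sketch of the write-to-`.tmp`-then-rename pattern, using only Node.js built-ins. The helper name `writeJsonAtomic` is hypothetical and the plugin's `storage.ts` may be structured differently (for example, it also debounces writes, which is omitted here).

```typescript
import { promises as fs } from "node:fs";

// Hypothetical helper: write the full payload to a sibling .tmp file first,
// then rename it over the target. On POSIX filesystems a rename within the
// same directory replaces the target in one step, so readers see either the
// old file or the new one, never a half-written file.
async function writeJsonAtomic(filePath: string, data: unknown): Promise<void> {
  const tmpPath = `${filePath}.tmp`;
  await fs.writeFile(tmpPath, JSON.stringify(data, null, 2), "utf8");
  await fs.rename(tmpPath, filePath);
}
```

If the process dies during `writeFile`, only the `.tmp` file is incomplete and the previous `facts.json` or `entities.json` remains intact.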
+ ## Architecture
+
+ ```
+ index.ts → Plugin entry point
+ src/
+ ├── types.ts → All TypeScript interfaces
+ ├── config.ts → Config resolution + validation
+ ├── patterns.ts → Regex factories (Proxy-based, no /g state bleed)
+ ├── entity-extractor.ts → Regex-based entity extraction
+ ├── llm-enhancer.ts → Batched LLM extraction with cooldown
+ ├── fact-store.ts → In-memory fact store with decay + pruning
+ ├── hooks.ts → OpenClaw hook registration + orchestration
+ ├── http-client.ts → Shared HTTP/HTTPS transport
+ ├── embeddings.ts → ChromaDB vector sync
+ ├── storage.ts → Atomic JSON I/O with debounce
+ └── maintenance.ts → Scheduled background tasks
+ ```
+
+ - **12 modules**, each with a single responsibility
+ - **Zero runtime dependencies** — Node.js built-ins only
+ - **TypeScript strict** — no `any` in source code
+ - **All functions ≤40 lines**
+
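The "no /g state bleed" note on `patterns.ts` refers to a standard JavaScript behavior: a `RegExp` with the `g` flag stores `lastIndex` between `test()` and `exec()` calls, so a shared instance can miss matches. The sketch below shows the pitfall and a plain factory function that avoids it. It does not reproduce the Proxy-based approach the README mentions; `makeEmailPattern` and the simplified email pattern are hypothetical.

```typescript
// A shared global regex remembers where its last match ended.
const shared = /\w+@\w+\.\w+/g;
const first = shared.test("a@b.co"); // true, lastIndex is now 6
const second = shared.test("a@b.co"); // false: the search resumes at index 6

// Hypothetical factory: every call evaluates the literal again and returns
// a fresh RegExp object whose lastIndex starts at 0.
function makeEmailPattern(): RegExp {
  return /\w+@\w+\.\w+/g;
}
const third = makeEmailPattern().test("a@b.co"); // true
const fourth = makeEmailPattern().test("a@b.co"); // true
```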
+ ## Hooks
+
+ | Hook | Priority | Description |
+ |------|----------|-------------|
+ | `session_start` | 200 | Loads fact store from disk |
+ | `message_received` | 100 | Extracts entities + queues LLM batch |
+ | `message_sent` | 100 | Same extraction on outbound messages |
+ | `gateway_stop` | 50 | Flushes writes, stops timers |
+
+ ## Testing
+
+ ```bash
+ npm test
+ # Runs 83 tests across 10 test files
+ ```
+
+ Tests cover: config validation, entity extraction, fact CRUD, decay, pruning, LLM batching, HTTP client, embeddings, storage atomicity, maintenance scheduling, hook orchestration.
+
+ ## Part of the Darkplex Plugin Suite
+
+ | # | Plugin | Status | Description |
+ |---|--------|--------|-------------|
+ | 1 | [@vainplex/nats-eventstore](https://github.com/alberthild/openclaw-nats-eventstore) | ✅ Published | NATS JetStream event persistence |
+ | 2 | [@vainplex/openclaw-cortex](https://github.com/alberthild/openclaw-cortex) | ✅ Published | Conversation intelligence (threads, decisions, boot context) |
+ | 3 | **@vainplex/openclaw-knowledge-engine** | ✅ Published | Real-time knowledge extraction (this plugin) |
+ | 4 | @vainplex/openclaw-governance | 📋 Planned | Policy enforcement + guardrails |
+ | 5 | @vainplex/openclaw-memory-engine | 📋 Planned | Unified memory layer |
+ | 6 | @vainplex/openclaw-health-monitor | 📋 Planned | System health + auto-healing |
+
+ ## License
+
+ MIT
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@vainplex/openclaw-knowledge-engine",
-   "version": "0.1.0",
+   "version": "0.1.1",
    "description": "An OpenClaw plugin for real-time and batch knowledge extraction from conversational data.",
    "main": "dist/index.js",
    "types": "dist/index.d.ts",