@karmaniverous/jeeves-watcher 0.5.0 → 0.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +22 -280
- package/config.schema.json +70 -48
- package/dist/cli/jeeves-watcher/index.js +3349 -2831
- package/dist/index.d.ts +135 -25
- package/dist/{mjs/index.js → index.js} +3525 -3007
- package/package.json +64 -92
- package/LICENSE +0 -28
- package/dist/cjs/index.js +0 -5050
- package/dist/index.iife.js +0 -5021
- package/dist/index.iife.min.js +0 -1
- package/dist/plugin/index.js +0 -267
- package/dist/plugin/openclaw.plugin.json +0 -24
- package/dist/skills/jeeves-watcher/SKILL.md +0 -428
- package/dist/skills/jeeves-watcher-admin/SKILL.md +0 -200
package/README.md CHANGED
@@ -1,300 +1,42 @@
 # @karmaniverous/jeeves-watcher
 
-Filesystem watcher that keeps a Qdrant vector store in sync with document changes.
+Filesystem watcher that keeps a [Qdrant](https://qdrant.tech/) vector store in sync with document changes. Extract text from files, chunk it, generate embeddings, and query your documents with semantic search.
 
-##
+## Features
 
-
+- **Filesystem watching** — monitors directories for file changes via [chokidar](https://github.com/paulmillr/chokidar)
+- **Multi-format extraction** — PDF, HTML, DOCX, Markdown, plain text, and more
+- **Configurable chunking** — token-based text splitting with overlap control
+- **Embedding providers** — Gemini, OpenAI, or mock (for testing)
+- **Qdrant sync** — automatic upsert/delete keeps the vector store current
+- **Rules engine** — glob-based inference rules for metadata enrichment
+- **REST API** — Fastify server for search, status, config, and management
+- **CLI** — `jeeves-watcher init`, `validate`, `start`, and more
 
-
-- **Extracts** text from various formats (Markdown, PDF, DOCX, HTML, JSON, plain text)
-- **Chunks** large documents for optimal embedding
-- **Embeds** content using configurable providers (Google Gemini, mock for testing)
-- **Syncs** to Qdrant for fast semantic search
-- **Enriches** metadata via rules and API endpoints
-
-### Architecture
-
-
-
-For detailed architecture documentation, see [guides/architecture.md](guides/architecture.md).
-
-## Quick Start
-
-### Installation
-
-```bash
-npm install -g @karmaniverous/jeeves-watcher
-```
-
-### Initialize Configuration
-
-Create a new configuration file in your project:
-
-```bash
-jeeves-watcher init
-```
-
-This generates a `jeeves-watcher.config.json` file with sensible defaults.
-
-### Configure
-
-Edit `jeeves-watcher.config.json` to specify:
-
-- **Watch paths**: Directories to monitor
-- **Embedding provider**: Google Gemini or mock (for testing)
-- **Qdrant connection**: URL and collection name
-- **Inference rules**: Automatic metadata enrichment based on file patterns
-
-Example minimal configuration:
-
-```json
-{
-  "watch": {
-    "paths": ["./docs"],
-    "ignored": ["**/node_modules/**", "**/.git/**"]
-  },
-  "embedding": {
-    "provider": "gemini",
-    "model": "gemini-embedding-001",
-    "apiKey": "${GOOGLE_API_KEY}"
-  },
-  "vectorStore": {
-    "url": "http://localhost:6333",
-    "collectionName": "my_docs"
-  }
-}
-```
-
-### Start Watching
-
-```bash
-jeeves-watcher start
-```
-
-The watcher will:
-1. Index all existing files in watched directories
-2. Monitor for changes
-3. Update Qdrant automatically
-
-## CLI Commands
-
-| Command | Description |
-|---------|-------------|
-| `jeeves-watcher start` | Start the filesystem watcher (foreground) |
-| `jeeves-watcher init` | Initialize a new configuration file |
-| `jeeves-watcher status` | Show watcher status |
-| `jeeves-watcher reindex` | Reindex all watched files |
-| `jeeves-watcher rebuild-metadata` | Rebuild metadata files from Qdrant payloads |
-| `jeeves-watcher search <query>` | Search the vector store |
-| `jeeves-watcher enrich <path>` | Enrich document metadata with key-value pairs |
-| `jeeves-watcher validate` | Validate the configuration |
-| `jeeves-watcher service` | Manage the watcher as a system service |
-| `jeeves-watcher config-reindex` | Reindex after configuration changes (rules only or full) |
-
-## Configuration
-
-### Environment Variable Substitution
-
-Config strings support `${VAR_NAME}` syntax for environment variable injection:
-
-```json
-{
-  "embedding": {
-    "apiKey": "${GOOGLE_API_KEY}"
-  }
-}
-```
-
-If `GOOGLE_API_KEY` is set in the environment, the value is substituted at config load time. **Unresolvable expressions are left untouched** — this allows `${...}` template syntax used in inference rule property schemas (e.g. `${frontmatter.title}`, `${file.path}`) to pass through for later resolution by the rules engine.
-
-### Watch Paths
-
-```json
-{
-  "watch": {
-    "paths": ["./docs", "./notes"],
-    "ignored": ["**/node_modules/**", "**/*.tmp"]
-  }
-}
-```
-
-- **`paths`**: Array of glob patterns or directories to watch
-- **`ignored`**: Array of patterns to exclude
-- **`respectGitignore`**: (default: `true`) Skip processing files ignored by `.gitignore` in git repositories. Nested `.gitignore` files are respected within their subtree.
-
-### Embedding Provider
-
-#### Google Gemini
-
-```json
-{
-  "embedding": {
-    "provider": "gemini",
-    "model": "gemini-embedding-001",
-    "apiKey": "${GOOGLE_API_KEY}"
-  }
-}
-```
-
-### Vector Store
-
-```json
-{
-  "vectorStore": {
-    "url": "http://localhost:6333",
-    "collectionName": "my_collection"
-  }
-}
-```
-
-### Inference Rules
-
-Automatically enrich metadata based on file patterns using declarative JSON Schemas:
-
-```json
-{
-  "schemas": {
-    "base": {
-      "type": "object",
-      "properties": {
-        "domain": {
-          "type": "string",
-          "description": "Content domain"
-        }
-      }
-    }
-  },
-  "inferenceRules": [
-    {
-      "name": "meeting-classifier",
-      "description": "Classify files under meetings directory",
-      "match": {
-        "properties": {
-          "file": {
-            "type": "object",
-            "properties": {
-              "path": { "type": "string", "glob": "**/meetings/**" }
-            }
-          }
-        }
-      },
-      "schema": [
-        "base",
-        {
-          "properties": {
-            "domain": { "set": "meetings" },
-            "category": { "type": "string", "set": "notes" }
-          }
-        }
-      ]
-    }
-  ]
-}
-```
-
-**New in v0.5.0:** Inference rules now use `schema` arrays that reference global named schemas. Type coercion automatically converts string interpolation results to declared types (integer, number, boolean, array, object). See [Inference Rules Guide](guides/inference-rules.md) for details.
-
-### Chunking
-
-Chunking settings are configured under `embedding`:
-
-```json
-{
-  "embedding": {
-    "chunkSize": 1000,
-    "chunkOverlap": 200
-  }
-}
-```
-
-### Metadata Storage
-
-```json
-{
-  "metadataDir": ".jeeves-watcher"
-}
-```
-
-Metadata is stored as JSON files alongside watched documents.
-
-## API Endpoints
-
-The watcher provides a REST API (default port: 3456):
-
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/status` | GET | Health check, uptime, and collection stats |
-| `/search` | POST | Semantic search (`{ query: string, limit?: number, filter?: object }`) |
-| `/metadata` | POST | Update document metadata with schema validation (`{ path: string, metadata: object }`) |
-| `/reindex` | POST | Reindex all watched files |
-| `/rebuild-metadata` | POST | Rebuild metadata files from Qdrant |
-| `/config-reindex` | POST | Reindex after config changes (`{ scope?: "rules" \| "full" }`) |
-| `/config/schema` | GET | JSON Schema of merged virtual document (v0.5.0+) |
-| `/config/query` | POST | JSONPath query over config (`{ path: string, resolve?: string[] }`) (v0.5.0+) |
-| `/config/match` | POST | Test paths against inference rules (`{ paths: string[] }`) (v0.5.0+) |
-| `/issues` | GET | Current embedding failures and processing errors (v0.5.0+) |
-
-### Example: Search
+## Install
 
 ```bash
-
-  -H "Content-Type: application/json" \
-  -d '{"query": "machine learning algorithms", "limit": 5}'
+npm install @karmaniverous/jeeves-watcher
 ```
 
-
+## Quick Start
 
 ```bash
-
-
-  -d '{
-    "query": "error handling",
-    "limit": 10,
-    "filter": {
-      "must": [{ "key": "domain", "match": { "value": "backend" } }]
-    }
-  }'
-```
+# Generate a config file
+npx jeeves-watcher init --output ./jeeves-watcher.config.json
 
-
+# Validate it
+npx jeeves-watcher validate --config ./jeeves-watcher.config.json
 
-
-
-  -H "Content-Type: application/json" \
-  -d '{
-    "path": "/path/to/document.md",
-    "metadata": {
-      "priority": "high",
-      "category": "research"
-    }
-  }'
+# Start the watcher
+npx jeeves-watcher start --config ./jeeves-watcher.config.json
 ```
 
-##
-
-This repo ships an OpenClaw plugin that exposes the jeeves-watcher API as native agent tools:
-
-- `watcher_status` (GET `/status`)
-- `watcher_search` (POST `/search`)
-- `watcher_enrich` (POST `/metadata`)
-
-Build output:
-
-- Plugin entry: `dist/plugin/index.js`
-- Plugin manifest: `dist/plugin/openclaw.plugin.json`
-- Skill: `dist/plugin/skill/SKILL.md`
-
-Plugin configuration supports `apiUrl` (defaults to `http://127.0.0.1:3458`).
+## Documentation
 
-
+Full docs, guides, and API reference:
 
--
-- **PDF** (`.pdf`) — text extraction
-- **DOCX** (`.docx`) — Microsoft Word documents
-- **HTML** (`.html`, `.htm`) — content extraction (scripts/styles removed)
-- **JSON** (`.json`) — with smart text field detection
-- **Plain Text** (`.txt`, `.text`)
+**[docs.karmanivero.us/jeeves-watcher](https://docs.karmanivero.us/jeeves-watcher)**
 
 ## License
 
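The README removed in this release documented one behavior worth keeping in mind: `${VAR_NAME}` expressions in config strings are replaced from the environment at load time, and anything that does not resolve is left untouched so that rule templates such as `${frontmatter.title}` reach the rules engine intact. Below is a minimal TypeScript sketch of that rule. It assumes substitution walks every string in the parsed config and that variable names follow the usual `[A-Za-z_][A-Za-z0-9_]*` shape. `substituteEnv` is a hypothetical helper, not part of the package's API, and this diff does not confirm whether 0.6.0 behaves identically.

```typescript
// Hypothetical helper illustrating the documented 0.5.0 substitution rule.
// Matches ${NAME} where NAME looks like a conventional environment variable.
const ENV_PATTERN = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g;

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

export function substituteEnv(
  value: Json,
  env: Record<string, string | undefined>,
): Json {
  if (typeof value === "string") {
    // Replace only variables that are set; return the original text otherwise,
    // so expressions like ${frontmatter.title} or ${UNSET} survive unchanged.
    return value.replace(ENV_PATTERN, (match: string, name: string) => env[name] ?? match);
  }
  if (Array.isArray(value)) {
    // Walk arrays element by element.
    return value.map((item) => substituteEnv(item, env));
  }
  if (value !== null && typeof value === "object") {
    // Walk objects key by key, rebuilding a new object (the input is not mutated).
    return Object.fromEntries(
      Object.entries(value).map(([key, item]) => [key, substituteEnv(item, env)]),
    );
  }
  // Numbers, booleans, and null pass through untouched.
  return value;
}
```

Because an unset variable falls through the `?? match` branch, a missing secret shows up as a literal `${GOOGLE_API_KEY}` rather than an empty string, which is easier to catch in a validation step.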
package/config.schema.json CHANGED
@@ -138,7 +138,7 @@
       ]
     },
     "inferenceRules": {
-      "description": "Rules for inferring metadata from file attributes.",
+      "description": "Rules for inferring metadata from file attributes. Each entry may be an inline rule object or a file path to a JSON rule file (resolved relative to config directory).",
       "allOf": [
         {
           "$ref": "#/definitions/__schema49"
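The new description means an `inferenceRules` array can mix inline rule objects with file paths. A minimal sketch of how a loader could normalize such an array, assuming each referenced file holds a single JSON rule object and that files are read synchronously at startup. `resolveRuleEntries`, `InferenceRule`, and the injectable `readJson` parameter are hypothetical names chosen for illustration; only the required fields (`name`, `description`, `match`) come from the schema.

```typescript
import { readFileSync } from "node:fs";
import { resolve } from "node:path";

// Hypothetical rule shape: required fields mirror config.schema.json;
// optional schema/map/template are left loosely typed.
export interface InferenceRule {
  name: string;
  description: string;
  match: Record<string, unknown>;
  [extra: string]: unknown;
}

// Each entry is either an inline rule or a path to a JSON rule file.
export type RuleEntry = InferenceRule | string;

// Checks the schema's required fields (non-empty name/description, object match).
function isRule(value: unknown): value is InferenceRule {
  if (typeof value !== "object" || value === null) return false;
  const { name, description, match } = value as Record<string, unknown>;
  return (
    typeof name === "string" && name.length > 0 &&
    typeof description === "string" && description.length > 0 &&
    typeof match === "object" && match !== null
  );
}

// Default reader parses a JSON file from disk; callers may inject their own.
const readJsonFile = (file: string): unknown => JSON.parse(readFileSync(file, "utf8"));

export function resolveRuleEntries(
  entries: RuleEntry[],
  configDir: string,
  readJson: (file: string) => unknown = readJsonFile,
): InferenceRule[] {
  return entries.map((entry) => {
    if (typeof entry !== "string") return entry; // inline rule, used as-is
    // Relative paths resolve against the config directory; absolute paths are kept.
    const file = resolve(configDir, entry);
    const rule = readJson(file);
    if (!isRule(rule)) {
      throw new Error(`Rule file ${file} lacks a non-empty name, description, or match`);
    }
    return rule;
  });
}
```

Injecting `readJson` keeps the resolver testable without touching the filesystem, and because `path.resolve` leaves absolute entries unchanged, both relative and absolute rule paths work.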
@@ -194,7 +194,7 @@
       ]
     },
     "search": {
-      "description": "Search configuration including score thresholds.",
+      "description": "Search configuration including score thresholds and hybrid search.",
       "allOf": [
         {
           "$ref": "#/definitions/__schema63"
@@ -579,57 +579,64 @@
     "__schema49": {
       "type": "array",
       "items": {
-        "
-
-          "name": {
-            "type": "string",
-            "minLength": 1,
-            "description": "Unique name identifying this inference rule."
-          },
-          "description": {
-            "type": "string",
-            "minLength": 1,
-            "description": "Human-readable description of what this rule does."
-          },
-          "match": {
+        "anyOf": [
+          {
             "type": "object",
-            "
-            "
-
-
-
-
-
-
-
-
-
-              {
-                "
-
-
-
-
-
-
-
-
+            "properties": {
+              "name": {
+                "type": "string",
+                "minLength": 1,
+                "description": "Unique name identifying this inference rule."
+              },
+              "description": {
+                "type": "string",
+                "minLength": 1,
+                "description": "Human-readable description of what this rule does."
+              },
+              "match": {
+                "type": "object",
+                "propertyNames": {
+                  "$ref": "#/definitions/__schema50"
+                },
+                "additionalProperties": {
+                  "$ref": "#/definitions/__schema51"
+                },
+                "description": "JSON Schema object to match against file attributes."
+              },
+              "schema": {
+                "description": "Array of schema references (named schema refs or inline objects) merged left-to-right.",
+                "allOf": [
+                  {
+                    "$ref": "#/definitions/__schema52"
+                  }
+                ]
+              },
+              "map": {
+                "description": "JsonMap transformation (inline definition, named map reference, or .json file path).",
+                "allOf": [
+                  {
+                    "$ref": "#/definitions/__schema53"
+                  }
+                ]
+              },
+              "template": {
+                "description": "Handlebars content template (inline string, named ref, or .hbs/.handlebars file path).",
+                "allOf": [
+                  {
+                    "$ref": "#/definitions/__schema56"
+                  }
+                ]
               }
+            },
+            "required": [
+              "name",
+              "description",
+              "match"
             ]
           },
-
-            "
-            "allOf": [
-              {
-                "$ref": "#/definitions/__schema56"
-              }
-            ]
+          {
+            "type": "string"
           }
-        },
-        "required": [
-          "name",
-          "description",
-          "match"
         ]
       }
     },
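The rule `schema` property is documented as an array of named schema refs or inline objects "merged left-to-right". Taking the 0.5.0 README example (a named `base` schema followed by an inline object that sets `domain` and adds `category`), one plausible merge looks like the sketch below. It assumes named refs resolve against the top-level `schemas` map and that later entries override earlier ones keyword-by-keyword within each property. The package's real merge semantics may differ, and `mergeSchemaRefs` is hypothetical.

```typescript
// Minimal JSON-Schema-like shape; only "properties" is merged specially here.
export interface SchemaObject {
  type?: string;
  properties?: Record<string, Record<string, unknown>>;
  [keyword: string]: unknown;
}

// A rule's schema entry: a named reference or an inline schema object.
export type SchemaRef = string | SchemaObject;

export function mergeSchemaRefs(
  refs: SchemaRef[],
  named: Record<string, SchemaObject>,
): SchemaObject {
  return refs.reduce<SchemaObject>((acc, ref) => {
    // Named refs are looked up in the global schemas map; unknown names fail fast.
    const next = typeof ref === "string" ? named[ref] : ref;
    if (next === undefined) throw new Error(`Unknown schema reference: ${String(ref)}`);

    // Later entries win for top-level keywords such as "type"...
    const merged: SchemaObject = { ...acc, ...next };

    // ...and each property definition is merged keyword-by-keyword,
    // building new objects so that the inputs are never mutated.
    const props: Record<string, Record<string, unknown>> = { ...(acc.properties ?? {}) };
    for (const [key, def] of Object.entries(next.properties ?? {})) {
      props[key] = { ...(props[key] ?? {}), ...def };
    }
    merged.properties = props;
    return merged;
  }, {});
}
```

With the README example, `domain` ends up with `type: "string"`, the `base` description, and `set: "meetings"`, while `category` comes entirely from the inline object.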
@@ -896,6 +903,21 @@
             "relevant",
             "noise"
           ]
+        },
+        "hybrid": {
+          "type": "object",
+          "properties": {
+            "enabled": {
+              "default": false,
+              "type": "boolean"
+            },
+            "textWeight": {
+              "default": 0.3,
+              "type": "number",
+              "minimum": 0,
+              "maximum": 1
+            }
+          }
         }
       }
     },
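Release 0.6.0 adds `search.hybrid` with `enabled` (default `false`) and `textWeight` (a number in [0, 1], default `0.3`). The sketch below applies those defaults and bounds, then blends a text-match score with a vector score. The linear blend is an assumption about what `textWeight` controls: the schema only fixes its type, range, and default. `resolveHybridConfig` and `blendScores` are hypothetical helpers.

```typescript
export interface HybridConfig {
  enabled: boolean;
  textWeight: number;
}

// Defaults taken from config.schema.json in 0.6.0.
const HYBRID_DEFAULTS: HybridConfig = { enabled: false, textWeight: 0.3 };

export function resolveHybridConfig(input: Partial<HybridConfig> = {}): HybridConfig {
  // Use ?? so an explicit undefined still falls back to the schema default.
  const config: HybridConfig = {
    enabled: input.enabled ?? HYBRID_DEFAULTS.enabled,
    textWeight: input.textWeight ?? HYBRID_DEFAULTS.textWeight,
  };
  // Mirror the schema bounds: minimum 0, maximum 1.
  if (!Number.isFinite(config.textWeight) || config.textWeight < 0 || config.textWeight > 1) {
    throw new RangeError(`search.hybrid.textWeight must be in [0, 1], got ${config.textWeight}`);
  }
  return config;
}

// ASSUMPTION: hybrid scoring as a convex combination of a vector-similarity
// score and a text-match score, both already normalized to comparable ranges.
export function blendScores(vectorScore: number, textScore: number, config: HybridConfig): number {
  if (!config.enabled) return vectorScore; // hybrid off: pure vector score
  return (1 - config.textWeight) * vectorScore + config.textWeight * textScore;
}
```

Under this reading, the default `textWeight` of `0.3` would keep the vector score dominant (70/30) once hybrid search is switched on.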