@karmaniverous/jeeves-watcher 0.5.0-1 → 0.5.1

package/README.md CHANGED
@@ -1,300 +1,42 @@
1
1
  # @karmaniverous/jeeves-watcher
2
2
 
3
- Filesystem watcher that keeps a Qdrant vector store in sync with document changes.
3
+ Filesystem watcher that keeps a [Qdrant](https://qdrant.tech/) vector store in sync with document changes. Extract text from files, chunk it, generate embeddings, and query your documents with semantic search.
4
4
 
5
- ## Overview
5
+ ## Features
6
6
 
7
- `jeeves-watcher` monitors a configured set of directories for file changes, extracts text content, generates embeddings, and maintains a synchronized Qdrant vector store for semantic search. It automatically:
7
+ - **Filesystem watching** monitors directories for file changes via [chokidar](https://github.com/paulmillr/chokidar)
8
+ - **Multi-format extraction** — PDF, HTML, DOCX, Markdown, plain text, and more
9
+ - **Configurable chunking** — token-based text splitting with overlap control
10
+ - **Embedding providers** — Gemini, OpenAI, or mock (for testing)
11
+ - **Qdrant sync** — automatic upsert/delete keeps the vector store current
12
+ - **Rules engine** — glob-based inference rules for metadata enrichment
13
+ - **REST API** — Fastify server for search, status, config, and management
14
+ - **CLI** — `jeeves-watcher init`, `validate`, `start`, and more
8
15
 
9
- - **Watches** directories for file additions, modifications, and deletions
10
- - **Extracts** text from various formats (Markdown, PDF, DOCX, HTML, JSON, plain text)
11
- - **Chunks** large documents for optimal embedding
12
- - **Embeds** content using configurable providers (Google Gemini, mock for testing)
13
- - **Syncs** to Qdrant for fast semantic search
14
- - **Enriches** metadata via rules and API endpoints
15
-
16
- ### Architecture
17
-
18
- ![System Architecture](assets/system-architecture.png)
19
-
20
- For detailed architecture documentation, see [guides/architecture.md](guides/architecture.md).
21
-
22
- ## Quick Start
23
-
24
- ### Installation
25
-
26
- ```bash
27
- npm install -g @karmaniverous/jeeves-watcher
28
- ```
29
-
30
- ### Initialize Configuration
31
-
32
- Create a new configuration file in your project:
33
-
34
- ```bash
35
- jeeves-watcher init
36
- ```
37
-
38
- This generates a `jeeves-watcher.config.json` file with sensible defaults.
39
-
40
- ### Configure
41
-
42
- Edit `jeeves-watcher.config.json` to specify:
43
-
44
- - **Watch paths**: Directories to monitor
45
- - **Embedding provider**: Google Gemini or mock (for testing)
46
- - **Qdrant connection**: URL and collection name
47
- - **Inference rules**: Automatic metadata enrichment based on file patterns
48
-
49
- Example minimal configuration:
50
-
51
- ```json
52
- {
53
- "watch": {
54
- "paths": ["./docs"],
55
- "ignored": ["**/node_modules/**", "**/.git/**"]
56
- },
57
- "embedding": {
58
- "provider": "gemini",
59
- "model": "gemini-embedding-001",
60
- "apiKey": "${GOOGLE_API_KEY}"
61
- },
62
- "vectorStore": {
63
- "url": "http://localhost:6333",
64
- "collectionName": "my_docs"
65
- }
66
- }
67
- ```
68
-
69
- ### Start Watching
70
-
71
- ```bash
72
- jeeves-watcher start
73
- ```
74
-
75
- The watcher will:
76
- 1. Index all existing files in watched directories
77
- 2. Monitor for changes
78
- 3. Update Qdrant automatically
79
-
80
- ## CLI Commands
81
-
82
- | Command | Description |
83
- |---------|-------------|
84
- | `jeeves-watcher start` | Start the filesystem watcher (foreground) |
85
- | `jeeves-watcher init` | Initialize a new configuration file |
86
- | `jeeves-watcher status` | Show watcher status |
87
- | `jeeves-watcher reindex` | Reindex all watched files |
88
- | `jeeves-watcher rebuild-metadata` | Rebuild metadata files from Qdrant payloads |
89
- | `jeeves-watcher search <query>` | Search the vector store |
90
- | `jeeves-watcher enrich <path>` | Enrich document metadata with key-value pairs |
91
- | `jeeves-watcher validate` | Validate the configuration |
92
- | `jeeves-watcher service` | Manage the watcher as a system service |
93
- | `jeeves-watcher config-reindex` | Reindex after configuration changes (rules only or full) |
94
-
95
- ## Configuration
96
-
97
- ### Environment Variable Substitution
98
-
99
- Config strings support `${VAR_NAME}` syntax for environment variable injection:
100
-
101
- ```json
102
- {
103
- "embedding": {
104
- "apiKey": "${GOOGLE_API_KEY}"
105
- }
106
- }
107
- ```
108
-
109
- If `GOOGLE_API_KEY` is set in the environment, the value is substituted at config load time. **Unresolvable expressions are left untouched** — this allows `${...}` template syntax used in inference rule property schemas (e.g. `${frontmatter.title}`, `${file.path}`) to pass through for later resolution by the rules engine.
110
-
111
- ### Watch Paths
112
-
113
- ```json
114
- {
115
- "watch": {
116
- "paths": ["./docs", "./notes"],
117
- "ignored": ["**/node_modules/**", "**/*.tmp"]
118
- }
119
- }
120
- ```
121
-
122
- - **`paths`**: Array of glob patterns or directories to watch
123
- - **`ignored`**: Array of patterns to exclude
124
- - **`respectGitignore`**: (default: `true`) Skip processing files ignored by `.gitignore` in git repositories. Nested `.gitignore` files are respected within their subtree.
125
-
126
- ### Embedding Provider
127
-
128
- #### Google Gemini
129
-
130
- ```json
131
- {
132
- "embedding": {
133
- "provider": "gemini",
134
- "model": "gemini-embedding-001",
135
- "apiKey": "${GOOGLE_API_KEY}"
136
- }
137
- }
138
- ```
139
-
140
- ### Vector Store
141
-
142
- ```json
143
- {
144
- "vectorStore": {
145
- "url": "http://localhost:6333",
146
- "collectionName": "my_collection"
147
- }
148
- }
149
- ```
150
-
151
- ### Inference Rules
152
-
153
- Automatically enrich metadata based on file patterns using declarative JSON Schemas:
154
-
155
- ```json
156
- {
157
- "schemas": {
158
- "base": {
159
- "type": "object",
160
- "properties": {
161
- "domain": {
162
- "type": "string",
163
- "description": "Content domain"
164
- }
165
- }
166
- }
167
- },
168
- "inferenceRules": [
169
- {
170
- "name": "meeting-classifier",
171
- "description": "Classify files under meetings directory",
172
- "match": {
173
- "properties": {
174
- "file": {
175
- "type": "object",
176
- "properties": {
177
- "path": { "type": "string", "glob": "**/meetings/**" }
178
- }
179
- }
180
- }
181
- },
182
- "schema": [
183
- "base",
184
- {
185
- "properties": {
186
- "domain": { "set": "meetings" },
187
- "category": { "type": "string", "set": "notes" }
188
- }
189
- }
190
- ]
191
- }
192
- ]
193
- }
194
- ```
195
-
196
- **New in v0.5.0:** Inference rules now use `schema` arrays that reference global named schemas. Type coercion automatically converts string interpolation results to declared types (integer, number, boolean, array, object). See [Inference Rules Guide](guides/inference-rules.md) for details.
197
-
198
- ### Chunking
199
-
200
- Chunking settings are configured under `embedding`:
201
-
202
- ```json
203
- {
204
- "embedding": {
205
- "chunkSize": 1000,
206
- "chunkOverlap": 200
207
- }
208
- }
209
- ```
210
-
211
- ### Metadata Storage
212
-
213
- ```json
214
- {
215
- "metadataDir": ".jeeves-watcher"
216
- }
217
- ```
218
-
219
- Metadata is stored as JSON files alongside watched documents.
220
-
221
- ## API Endpoints
222
-
223
- The watcher provides a REST API (default port: 3456):
224
-
225
- | Endpoint | Method | Description |
226
- |----------|--------|-------------|
227
- | `/status` | GET | Health check, uptime, and collection stats |
228
- | `/search` | POST | Semantic search (`{ query: string, limit?: number, filter?: object }`) |
229
- | `/metadata` | POST | Update document metadata with schema validation (`{ path: string, metadata: object }`) |
230
- | `/reindex` | POST | Reindex all watched files |
231
- | `/rebuild-metadata` | POST | Rebuild metadata files from Qdrant |
232
- | `/config-reindex` | POST | Reindex after config changes (`{ scope?: "rules" \| "full" }`) |
233
- | `/config/schema` | GET | JSON Schema of merged virtual document (v0.5.0+) |
234
- | `/config/query` | POST | JSONPath query over config (`{ path: string, resolve?: string[] }`) (v0.5.0+) |
235
- | `/config/match` | POST | Test paths against inference rules (`{ paths: string[] }`) (v0.5.0+) |
236
- | `/issues` | GET | Current embedding failures and processing errors (v0.5.0+) |
237
-
238
- ### Example: Search
16
+ ## Install
239
17
 
240
18
  ```bash
241
- curl -X POST http://localhost:3456/search \
242
- -H "Content-Type: application/json" \
243
- -d '{"query": "machine learning algorithms", "limit": 5}'
19
+ npm install @karmaniverous/jeeves-watcher
244
20
  ```
245
21
 
246
- ### Example: Search With Filter
22
+ ## Quick Start
247
23
 
248
24
  ```bash
249
- curl -X POST http://localhost:3456/search \
250
- -H "Content-Type: application/json" \
251
- -d '{
252
- "query": "error handling",
253
- "limit": 10,
254
- "filter": {
255
- "must": [{ "key": "domain", "match": { "value": "backend" } }]
256
- }
257
- }'
258
- ```
25
+ # Generate a config file
26
+ npx jeeves-watcher init --output ./jeeves-watcher.config.json
259
27
 
260
- ### Example: Update Metadata
28
+ # Validate it
29
+ npx jeeves-watcher validate --config ./jeeves-watcher.config.json
261
30
 
262
- ```bash
263
- curl -X POST http://localhost:3456/metadata \
264
- -H "Content-Type: application/json" \
265
- -d '{
266
- "path": "/path/to/document.md",
267
- "metadata": {
268
- "priority": "high",
269
- "category": "research"
270
- }
271
- }'
31
+ # Start the watcher
32
+ npx jeeves-watcher start --config ./jeeves-watcher.config.json
272
33
  ```
273
34
 
274
- ## OpenClaw Plugin
275
-
276
- This repo ships an OpenClaw plugin that exposes the jeeves-watcher API as native agent tools:
277
-
278
- - `watcher_status` (GET `/status`)
279
- - `watcher_search` (POST `/search`)
280
- - `watcher_enrich` (POST `/metadata`)
281
-
282
- Build output:
283
-
284
- - Plugin entry: `dist/plugin/index.js`
285
- - Plugin manifest: `dist/plugin/openclaw.plugin.json`
286
- - Skill: `dist/plugin/skill/SKILL.md`
287
-
288
- Plugin configuration supports `apiUrl` (defaults to `http://127.0.0.1:3458`).
35
+ ## Documentation
289
36
 
290
- ## Supported File Formats
37
+ Full docs, guides, and API reference:
291
38
 
292
- - **Markdown** (`.md`, `.markdown`) — with YAML frontmatter support
293
- - **PDF** (`.pdf`) — text extraction
294
- - **DOCX** (`.docx`) — Microsoft Word documents
295
- - **HTML** (`.html`, `.htm`) — content extraction (scripts/styles removed)
296
- - **JSON** (`.json`) — with smart text field detection
297
- - **Plain Text** (`.txt`, `.text`)
39
+ **[docs.karmanivero.us/jeeves-watcher](https://docs.karmanivero.us/jeeves-watcher)**
298
40
 
299
41
  ## License
300
42
 
package/package.json CHANGED
@@ -1,17 +1,54 @@
1
1
  {
2
+ "name": "@karmaniverous/jeeves-watcher",
3
+ "version": "0.5.1",
2
4
  "author": "Jason Williscroft",
5
+ "description": "Filesystem watcher that keeps a Qdrant vector store in sync with document changes",
6
+ "license": "BSD-3-Clause",
7
+ "type": "module",
8
+ "module": "dist/index.js",
9
+ "types": "dist/index.d.ts",
3
10
  "bin": {
4
11
  "jeeves-watcher": "./dist/cli/jeeves-watcher/index.js"
5
12
  },
6
- "auto-changelog": {
7
- "output": "CHANGELOG.md",
8
- "unreleased": true,
9
- "commitLimit": false,
10
- "hideCredit": true
13
+ "exports": {
14
+ ".": {
15
+ "types": "./dist/index.d.ts",
16
+ "default": "./dist/index.js"
17
+ }
18
+ },
19
+ "files": [
20
+ "dist",
21
+ "config.schema.json"
22
+ ],
23
+ "publishConfig": {
24
+ "access": "public"
25
+ },
26
+ "repository": {
27
+ "type": "git",
28
+ "url": "git+https://github.com/karmaniverous/jeeves-watcher.git",
29
+ "directory": "packages/service"
11
30
  },
12
31
  "bugs": {
13
32
  "url": "https://github.com/karmaniverous/jeeves-watcher/issues"
14
33
  },
34
+ "homepage": "https://github.com/karmaniverous/jeeves-watcher#readme",
35
+ "keywords": [
36
+ "filesystem",
37
+ "watcher",
38
+ "qdrant",
39
+ "vector-store",
40
+ "embeddings",
41
+ "semantic-search",
42
+ "document-indexing",
43
+ "rag",
44
+ "langchain",
45
+ "gemini",
46
+ "typescript",
47
+ "cli"
48
+ ],
49
+ "engines": {
50
+ "node": ">=20"
51
+ },
15
52
  "dependencies": {
16
53
  "@commander-js/extra-typings": "^14.0.0",
17
54
  "@karmaniverous/jsonmap": "^2.1.0",
@@ -23,7 +60,6 @@
23
60
  "ajv-formats": "*",
24
61
  "cheerio": "^1.2.0",
25
62
  "chokidar": "^5.0.0",
26
- "commander": "^14.0.3",
27
63
  "cosmiconfig": "*",
28
64
  "dayjs": "^1.11.19",
29
65
  "fastify": "*",
@@ -47,15 +83,12 @@
47
83
  "uuid": "*",
48
84
  "zod": "^4.3.6"
49
85
  },
50
- "description": "Filesystem watcher that keeps a Qdrant vector store in sync with document changes",
51
86
  "devDependencies": {
52
87
  "@dotenvx/dotenvx": "^1.52.0",
53
- "@eslint/js": "^9.39.2",
54
88
  "@rollup/plugin-alias": "^6.0.0",
55
89
  "@rollup/plugin-commonjs": "^29.0.0",
56
90
  "@rollup/plugin-json": "^6.1.0",
57
91
  "@rollup/plugin-node-resolve": "^16.0.3",
58
- "@rollup/plugin-terser": "^0.4.4",
59
92
  "@rollup/plugin-typescript": "^12.3.0",
60
93
  "@types/fs-extra": "^11.0.4",
61
94
  "@types/js-yaml": "*",
@@ -63,82 +96,43 @@
63
96
  "@types/picomatch": "*",
64
97
  "@types/uuid": "*",
65
98
  "@vitest/coverage-v8": "^4.0.18",
66
- "@vitest/eslint-plugin": "^1.6.9",
67
99
  "auto-changelog": "^2.5.0",
68
100
  "cross-env": "^10.1.0",
69
- "eslint": "^9.39.2",
70
- "eslint-config-prettier": "^10.1.8",
71
- "eslint-plugin-prettier": "^5.5.5",
72
- "eslint-plugin-simple-import-sort": "^12.1.1",
73
- "eslint-plugin-tsdoc": "^0.5.0",
74
101
  "fs-extra": "^11.3.3",
75
102
  "happy-dom": "^20.7.0",
76
103
  "knip": "^5.85.0",
77
- "lefthook": "^2.1.1",
78
- "prettier": "^3.8.1",
79
104
  "release-it": "^19.2.4",
80
- "rimraf": "^6.1.3",
81
105
  "rollup": "^4.59.0",
82
106
  "rollup-plugin-dts": "^6.3.0",
83
107
  "tslib": "^2.8.1",
84
- "tsx": "^4.21.0",
85
108
  "typedoc": "^0.28.17",
86
109
  "typedoc-plugin-mdn-links": "^5.1.1",
87
110
  "typedoc-plugin-replace-text": "^4.2.0",
88
- "typescript": "^5.9.3",
89
- "typescript-eslint": "^8.56.0",
90
111
  "vitest": "^4.0.18"
91
112
  },
92
- "engines": {
93
- "node": ">=20"
94
- },
95
- "exports": {
96
- ".": {
97
- "import": {
98
- "types": "./dist/index.d.ts",
99
- "default": "./dist/mjs/index.js"
100
- },
101
- "require": {
102
- "types": "./dist/index.d.ts",
103
- "default": "./dist/cjs/index.js"
104
- }
105
- }
106
- },
107
- "files": [
108
- "dist",
109
- "config.schema.json"
110
- ],
111
- "homepage": "https://github.com/karmaniverous/jeeves-watcher#readme",
112
- "keywords": [
113
- "filesystem",
114
- "watcher",
115
- "qdrant",
116
- "vector-store",
117
- "embeddings",
118
- "semantic-search",
119
- "document-indexing",
120
- "rag",
121
- "langchain",
122
- "gemini",
123
- "typescript",
124
- "cli"
125
- ],
126
- "license": "BSD-3-Clause",
127
- "main": "dist/cjs/index.js",
128
- "module": "dist/mjs/index.js",
129
- "openclaw": {
130
- "extensions": [
131
- "./dist/plugin/index.js"
132
- ]
113
+ "scripts": {
114
+ "generate:schema": "tsx src/config/generate-schema.ts",
115
+ "build": "npm run generate:schema && rimraf dist && cross-env NO_COLOR=1 rollup --config rollup.config.ts --configPlugin @rollup/plugin-typescript",
116
+ "changelog": "auto-changelog",
117
+ "diagrams": "cd diagrams && plantuml -tpng -o ../assets -r .",
118
+ "knip": "knip",
119
+ "lint": "eslint .",
120
+ "lint:fix": "eslint --fix .",
121
+ "release": "dotenvx run -f .env.local -- release-it",
122
+ "release:pre": "dotenvx run -f .env.local -- release-it --no-git.requireBranch --github.prerelease --preRelease",
123
+ "test": "vitest run",
124
+ "typecheck": "tsc"
133
125
  },
134
- "name": "@karmaniverous/jeeves-watcher",
135
- "publishConfig": {
136
- "access": "public"
126
+ "auto-changelog": {
127
+ "output": "CHANGELOG.md",
128
+ "unreleased": true,
129
+ "commitLimit": false,
130
+ "hideCredit": true
137
131
  },
138
132
  "release-it": {
139
133
  "git": {
140
134
  "changelog": "npx auto-changelog --unreleased-only --stdout --template https://raw.githubusercontent.com/release-it/release-it/main/templates/changelog-compact.hbs",
141
- "commitMessage": "chore: release v${version}",
135
+ "commitMessage": "chore: release @karmaniverous/jeeves-watcher v${version}",
142
136
  "requireBranch": "main"
143
137
  },
144
138
  "github": {
@@ -153,39 +147,16 @@
153
147
  ],
154
148
  "before:npm:release": [
155
149
  "npx auto-changelog -p",
156
- "npm run docs",
157
150
  "git add -A"
158
151
  ],
159
152
  "after:release": [
160
- "git switch -c release/${version}",
161
- "git push -u origin release/${version}",
153
+ "git switch -c release/service/${version}",
154
+ "git push -u origin release/service/${version}",
162
155
  "git switch ${branchName}"
163
156
  ]
164
157
  },
165
158
  "npm": {
166
159
  "publish": true
167
160
  }
168
- },
169
- "repository": {
170
- "type": "git",
171
- "url": "git+https://github.com/karmaniverous/jeeves-watcher.git"
172
- },
173
- "scripts": {
174
- "generate:schema": "tsx src/config/generate-schema.ts",
175
- "build:skills": "node scripts/build-skills.js",
176
- "build": "npm run generate:schema && rimraf dist && cross-env NO_COLOR=1 rollup --config rollup.config.ts --configPlugin @rollup/plugin-typescript && npm run build:skills && node -e \"const fs=require('fs-extra');fs.copySync('plugin/openclaw.plugin.json','dist/plugin/openclaw.plugin.json');\"",
177
- "changelog": "auto-changelog",
178
- "diagrams": "cd diagrams && plantuml -tpng -o ../assets -r .",
179
- "docs": "typedoc",
180
- "knip": "knip",
181
- "lint": "eslint .",
182
- "lint:fix": "eslint --fix .",
183
- "release": "dotenvx run -f .env.local -- release-it",
184
- "release:pre": "dotenvx run -f .env.local -- release-it --no-git.requireBranch --github.prerelease --preRelease",
185
- "test": "vitest run",
186
- "typecheck": "tsc"
187
- },
188
- "type": "module",
189
- "types": "dist/index.d.ts",
190
- "version": "0.5.0-1"
161
+ }
191
162
  }
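
Note on the `exports` change above: 0.5.0-1 published separate `import` and `require` targets, while 0.5.1 publishes a single `default` target alongside `"type": "module"`. The sketch below is a simplified illustration of how a conditional-exports map is walked, under the assumption that keys are tried in declaration order and that `default` always matches. The `resolveExport` helper and the condition lists are hypothetical names used for illustration; this is not Node's full resolution algorithm and it is not code from the package.

```typescript
// Simplified sketch of conditional-exports resolution (assumption: not Node's full algorithm).
type ExportTarget = string | { [condition: string]: ExportTarget };

// Hypothetical helper: returns the first target whose condition matches.
function resolveExport(
  target: ExportTarget,
  conditions: string[],
): string | undefined {
  if (typeof target === "string") return target;
  // Keys are tried in declaration order; "default" matches any consumer.
  for (const [key, value] of Object.entries(target)) {
    if (key === "default" || conditions.includes(key)) {
      const resolved = resolveExport(value, conditions);
      if (resolved !== undefined) return resolved;
    }
  }
  return undefined;
}

// "." entry from the removed lines of the diff (0.5.0-1).
const oldEntry: ExportTarget = {
  import: { types: "./dist/index.d.ts", default: "./dist/mjs/index.js" },
  require: { types: "./dist/index.d.ts", default: "./dist/cjs/index.js" },
};

// "." entry from the added lines of the diff (0.5.1).
const newEntry: ExportTarget = {
  types: "./dist/index.d.ts",
  default: "./dist/index.js",
};

// Assumed condition sets for an ESM importer and a CommonJS requirer.
const importConditions = ["node", "import"];
const requireConditions = ["node", "require"];

console.log(resolveExport(oldEntry, importConditions));
console.log(resolveExport(oldEntry, requireConditions));
console.log(resolveExport(newEntry, importConditions));
console.log(resolveExport(newEntry, requireConditions));
```

Under these assumptions, both consumer kinds resolve to `./dist/index.js` in 0.5.1, and the `dist/cjs` and `dist/mjs` paths are no longer referenced. Whether a CommonJS `require` of that ES module file then succeeds depends on the Node version's support for loading ES modules through `require`, which the sketch does not model.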
package/LICENSE DELETED
@@ -1,28 +0,0 @@
1
- BSD 3-Clause License
2
-
3
- Copyright (c) 2025, Jason Williscroft
4
-
5
- Redistribution and use in source and binary forms, with or without
6
- modification, are permitted provided that the following conditions are met:
7
-
8
- 1. Redistributions of source code must retain the above copyright notice, this
9
- list of conditions and the following disclaimer.
10
-
11
- 2. Redistributions in binary form must reproduce the above copyright notice,
12
- this list of conditions and the following disclaimer in the documentation
13
- and/or other materials provided with the distribution.
14
-
15
- 3. Neither the name of the copyright holder nor the names of its
16
- contributors may be used to endorse or promote products derived from
17
- this software without specific prior written permission.
18
-
19
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20
- AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21
- IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22
- DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23
- FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24
- DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25
- SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26
- CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27
- OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28
- OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.