@karmaniverous/jeeves-watcher 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,28 @@
+ BSD 3-Clause License
+
+ Copyright (c) 2025, Jason Williscroft
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions are met:
+
+ 1. Redistributions of source code must retain the above copyright notice, this
+    list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright notice,
+    this list of conditions and the following disclaimer in the documentation
+    and/or other materials provided with the distribution.
+
+ 3. Neither the name of the copyright holder nor the names of its
+    contributors may be used to endorse or promote products derived from
+    this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
package/README.md ADDED
@@ -0,0 +1,236 @@
+ # @karmaniverous/jeeves-watcher
+
+ Filesystem watcher that keeps a Qdrant vector store in sync with document changes.
+
+ ## Overview
+
+ `jeeves-watcher` monitors a configured set of directories for file changes, extracts text content, generates embeddings, and maintains a synchronized Qdrant vector store for semantic search. It automatically:
+
+ - **Watches** directories for file additions, modifications, and deletions
+ - **Extracts** text from various formats (Markdown, PDF, DOCX, HTML, JSON, plain text)
+ - **Chunks** large documents for optimal embedding
+ - **Embeds** content using configurable providers (Google Gemini, OpenAI, etc.)
+ - **Syncs** to Qdrant for fast semantic search
+ - **Enriches** metadata via rules and API endpoints
+
+ ## Quick Start
+
+ ### Installation
+
+ ```bash
+ npm install -g @karmaniverous/jeeves-watcher
+ ```
+
+ ### Initialize Configuration
+
+ Create a new configuration file in your project:
+
+ ```bash
+ jeeves-watcher init
+ ```
+
+ This generates a `.jeeves-watcher.json` file with sensible defaults.
+
+ ### Configure
+
+ Edit `.jeeves-watcher.json` to specify:
+
+ - **Watch paths**: Directories to monitor
+ - **Embedding provider**: Google Gemini, OpenAI, or custom
+ - **Qdrant connection**: URL and collection name
+ - **Inference rules**: Automatic metadata enrichment based on file patterns
+
+ Example minimal configuration:
+
+ ```json
+ {
+   "watch": {
+     "paths": ["./docs"],
+     "ignored": ["**/node_modules/**", "**/.git/**"]
+   },
+   "embedding": {
+     "provider": "google",
+     "model": "text-embedding-004",
+     "apiKey": "${GOOGLE_API_KEY}"
+   },
+   "vectorStore": {
+     "url": "http://localhost:6333",
+     "collectionName": "my_docs"
+   }
+ }
+ ```
+
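Editor's sketch (not part of the package diff): the `"${GOOGLE_API_KEY}"` value in the example suggests that placeholders are resolved from environment variables when the configuration is loaded. The README does not show how, so the `expandEnv` helper below is hypothetical and only illustrates one plausible behavior: replace every `${NAME}` in every string value, and fail loudly when a variable is missing.

```typescript
// Hypothetical helper, NOT the package's API: expands ${NAME} placeholders
// in every string found in a parsed configuration object.
type Env = { [name: string]: string | undefined };

function expandEnv(value: unknown, env: Env): unknown {
  if (typeof value === "string") {
    // Replace each ${NAME} with the matching environment value.
    return value.replace(
      /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
      (_match: string, name: string): string => {
        const resolved = env[name];
        if (resolved === undefined) {
          throw new Error("Missing environment variable: " + name);
        }
        return resolved;
      },
    );
  }
  if (Array.isArray(value)) {
    // Recurse into arrays such as "paths" and "ignored".
    return value.map((item) => expandEnv(item, env));
  }
  if (value !== null && typeof value === "object") {
    // Recurse into nested objects such as "embedding".
    const source = value as { [key: string]: unknown };
    const result: { [key: string]: unknown } = {};
    for (const key of Object.keys(source)) {
      result[key] = expandEnv(source[key], env);
    }
    return result;
  }
  // Numbers, booleans, and null pass through unchanged.
  return value;
}
```

In a Node.js process this would be called as `expandEnv(parsedConfig, process.env)`. Throwing on a missing variable is a design choice of this sketch; the real tool may behave differently.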
+ ### Start Watching
+
+ ```bash
+ jeeves-watcher start
+ ```
+
+ The watcher will:
+ 1. Index all existing files in watched directories
+ 2. Monitor for changes
+ 3. Update Qdrant automatically
+
+ ## CLI Commands
+
+ | Command | Description |
+ |---------|-------------|
+ | `jeeves-watcher start` | Start the filesystem watcher (foreground) |
+ | `jeeves-watcher init` | Initialize a new configuration file |
+ | `jeeves-watcher status` | Show watcher status |
+ | `jeeves-watcher reindex` | Reindex all watched files |
+ | `jeeves-watcher rebuild-metadata` | Rebuild metadata files from Qdrant payloads |
+ | `jeeves-watcher search <query>` | Search the vector store |
+ | `jeeves-watcher enrich <path>` | Enrich document metadata |
+ | `jeeves-watcher validate` | Validate the configuration |
+ | `jeeves-watcher service` | Manage the watcher as a system service |
+ | `jeeves-watcher config-reindex` | Reindex after configuration changes (rules only or full) |
+
+ ## Configuration
+
+ ### Watch Paths
+
+ ```json
+ {
+   "watch": {
+     "paths": ["./docs", "./notes"],
+     "ignored": ["**/node_modules/**", "**/*.tmp"]
+   }
+ }
+ ```
+
+ - **`paths`**: Array of glob patterns or directories to watch
+ - **`ignored`**: Array of patterns to exclude
+
+ ### Embedding Provider
+
+ #### Google Gemini
+
+ ```json
+ {
+   "embedding": {
+     "provider": "google",
+     "model": "text-embedding-004",
+     "apiKey": "${GOOGLE_API_KEY}"
+   }
+ }
+ ```
+
+ #### OpenAI
+
+ ```json
+ {
+   "embedding": {
+     "provider": "openai",
+     "model": "text-embedding-3-small",
+     "apiKey": "${OPENAI_API_KEY}"
+   }
+ }
+ ```
+
+ ### Vector Store
+
+ ```json
+ {
+   "vectorStore": {
+     "url": "http://localhost:6333",
+     "collectionName": "my_collection"
+   }
+ }
+ ```
+
+ ### Inference Rules
+
+ Automatically enrich metadata based on file patterns:
+
+ ```json
+ {
+   "inferenceRules": [
+     {
+       "match": {
+         "properties": {
+           "file": {
+             "type": "object",
+             "properties": {
+               "path": { "type": "string", "glob": "**/meetings/**" }
+             }
+           }
+         }
+       },
+       "set": {
+         "domain": "meetings",
+         "category": "notes"
+       }
+     }
+   ]
+ }
+ ```
+
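Editor's sketch (not part of the package diff): the rule above reads as "when `file.path` matches the glob, merge the `set` values into the document's metadata". The code below illustrates that idea under heavy simplification. `SimpleRule`, `globToRegExp`, and `applyRules` are hypothetical names; the real `match` block looks like a JSON Schema with a custom `glob` keyword, which this sketch reduces to a single path glob. The glob conversion handles only `**`, `*`, and `?`, and assumes forward-slash paths.

```typescript
// Hypothetical, simplified rule shape: only the file.path glob is modeled.
interface SimpleRule {
  pathGlob: string;
  set: { [key: string]: string };
}

// Minimal glob-to-RegExp conversion supporting `**`, `*`, and `?` only.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.slice(i, i + 3) === "**/") {
      pattern += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (glob.slice(i, i + 2) === "**") {
      pattern += ".*"; // anything, including slashes
      i += 2;
    } else if (glob.charAt(i) === "*") {
      pattern += "[^/]*"; // anything within one path segment
      i += 1;
    } else if (glob.charAt(i) === "?") {
      pattern += "[^/]"; // exactly one non-slash character
      i += 1;
    } else {
      // Escape regular-expression metacharacters in literal text.
      pattern += glob.charAt(i).replace(/[.+^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + pattern + "$");
}

// Merge the `set` values of every matching rule; later rules win on conflicts.
function applyRules(filePath: string, rules: SimpleRule[]): { [key: string]: string } {
  const metadata: { [key: string]: string } = {};
  for (const rule of rules) {
    if (globToRegExp(rule.pathGlob).test(filePath)) {
      for (const key of Object.keys(rule.set)) {
        const value = rule.set[key];
        if (value !== undefined) {
          metadata[key] = value;
        }
      }
    }
  }
  return metadata;
}
```

With the rule from the example, a path such as `docs/meetings/2025/standup.md` would receive `domain: "meetings"` and `category: "notes"`, while `docs/notes/todo.md` would receive nothing. Letting later rules override earlier ones is an assumption of the sketch, not documented behavior.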
+ ### Chunking
+
+ ```json
+ {
+   "chunking": {
+     "chunkSize": 1000,
+     "chunkOverlap": 200
+   }
+ }
+ ```
+
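Editor's sketch (not part of the package diff): `chunkSize` and `chunkOverlap` are the usual parameters of a sliding-window splitter, where consecutive chunks share `chunkOverlap` units of text. The README does not say whether the unit is characters or tokens, or which splitter the package uses, so the hypothetical `chunkText` below assumes plain characters and fixed-width windows.

```typescript
// Hypothetical character-based splitter; the package's real chunker may
// split on tokens or on structural boundaries instead.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be at least 0 and less than chunkSize");
  }
  const chunks: string[] = [];
  // Each window starts (chunkSize - chunkOverlap) characters after the previous one.
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) {
      break;
    }
  }
  return chunks;
}
```

Under these assumptions, the settings shown above (`1000` and `200`) would split a 2,500-character document into three chunks starting at offsets 0, 800, and 1600.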
+ ### Metadata Storage
+
+ ```json
+ {
+   "metadataDir": ".jeeves-metadata"
+ }
+ ```
+
+ Metadata is stored as JSON files alongside watched documents.
+
+ ## API Endpoints
+
+ The watcher provides a REST API (default port: 3456):
+
+ | Endpoint | Method | Description |
+ |----------|--------|-------------|
+ | `/status` | GET | Health check and uptime |
+ | `/search` | POST | Semantic search (`{ query: string, limit?: number }`) |
+ | `/metadata` | POST | Update document metadata (`{ path: string, metadata: object }`) |
+ | `/reindex` | POST | Reindex all watched files |
+ | `/rebuild-metadata` | POST | Rebuild metadata files from Qdrant |
+ | `/config-reindex` | POST | Reindex after config changes (`{ scope?: "rules" \| "full" }`) |
+
+ ### Example: Search
+
+ ```bash
+ curl -X POST http://localhost:3456/search \
+   -H "Content-Type: application/json" \
+   -d '{"query": "machine learning algorithms", "limit": 5}'
+ ```
+
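Editor's sketch (not part of the package diff): the same `/search` call can be prepared from TypeScript. The `SearchRequest` shape follows the `{ query: string, limit?: number }` body documented for the endpoint; `buildSearchRequest` is a hypothetical helper, and the response format is not documented in the README, so it is left untyped.

```typescript
// Request body as documented for POST /search.
interface SearchRequest {
  query: string;
  limit?: number;
}

// Shape of the arguments that would be handed to fetch(url, init).
interface PreparedRequest {
  url: string;
  init: {
    method: string;
    headers: { [name: string]: string };
    body: string;
  };
}

// Hypothetical helper: builds the URL and options for a /search call.
function buildSearchRequest(baseUrl: string, body: SearchRequest): PreparedRequest {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: baseUrl.replace(/\/+$/, "") + "/search",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Usage with the global fetch available in Node.js 18+ (left as a comment
// because it needs a running watcher):
//   const request = buildSearchRequest("http://localhost:3456", {
//     query: "machine learning algorithms",
//     limit: 5,
//   });
//   const response = await fetch(request.url, request.init);
//   const results: unknown = await response.json();
```

Separating request construction from the network call keeps the helper easy to test without a live service.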
+ ### Example: Update Metadata
+
+ ```bash
+ curl -X POST http://localhost:3456/metadata \
+   -H "Content-Type: application/json" \
+   -d '{
+     "path": "/path/to/document.md",
+     "metadata": {
+       "priority": "high",
+       "category": "research"
+     }
+   }'
+ ```
+
+ ## Supported File Formats
+
+ - **Markdown** (`.md`, `.markdown`) — with YAML frontmatter support
+ - **PDF** (`.pdf`) — text extraction
+ - **DOCX** (`.docx`) — Microsoft Word documents
+ - **HTML** (`.html`, `.htm`) — content extraction (scripts/styles removed)
+ - **JSON** (`.json`) — with smart text field detection
+ - **Plain Text** (`.txt`, `.text`)
+
+ ## License
+
+ BSD-3-Clause