@karmaniverous/jeeves-watcher 0.1.0
- package/LICENSE +28 -0
- package/README.md +236 -0
- package/dist/cjs/index.js +1577 -0
- package/dist/cli/jeeves-watcher/index.js +1765 -0
- package/dist/index.d.ts +615 -0
- package/dist/index.iife.js +1562 -0
- package/dist/index.iife.min.js +1 -0
- package/dist/mjs/index.js +1537 -0
- package/package.json +169 -0
package/LICENSE
ADDED
@@ -0,0 +1,28 @@
BSD 3-Clause License

Copyright (c) 2025, Jason Williscroft

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
   contributors may be used to endorse or promote products derived from
   this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

package/README.md
ADDED
@@ -0,0 +1,236 @@
# @karmaniverous/jeeves-watcher

Filesystem watcher that keeps a Qdrant vector store in sync with document changes.

## Overview

`jeeves-watcher` monitors a configured set of directories for file changes, extracts text content, generates embeddings, and maintains a synchronized Qdrant vector store for semantic search. It automatically:

- **Watches** directories for file additions, modifications, and deletions
- **Extracts** text from various formats (Markdown, PDF, DOCX, HTML, JSON, plain text)
- **Chunks** large documents for optimal embedding
- **Embeds** content using configurable providers (Google Gemini, OpenAI, etc.)
- **Syncs** to Qdrant for fast semantic search
- **Enriches** metadata via rules and API endpoints

## Quick Start

### Installation

```bash
npm install -g @karmaniverous/jeeves-watcher
```

### Initialize Configuration

Create a new configuration file in your project:

```bash
jeeves-watcher init
```

This generates a `.jeeves-watcher.json` file with sensible defaults.

### Configure

Edit `.jeeves-watcher.json` to specify:

- **Watch paths**: Directories to monitor
- **Embedding provider**: Google Gemini, OpenAI, or custom
- **Qdrant connection**: URL and collection name
- **Inference rules**: Automatic metadata enrichment based on file patterns

Example minimal configuration:

```json
{
  "watch": {
    "paths": ["./docs"],
    "ignored": ["**/node_modules/**", "**/.git/**"]
  },
  "embedding": {
    "provider": "google",
    "model": "text-embedding-004",
    "apiKey": "${GOOGLE_API_KEY}"
  },
  "vectorStore": {
    "url": "http://localhost:6333",
    "collectionName": "my_docs"
  }
}
```
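
The `apiKey` value above is written as a `${GOOGLE_API_KEY}` placeholder. The README does not say how such placeholders are resolved, so the following is only a minimal sketch of one plausible approach, assuming `${NAME}` is replaced with the matching environment variable. `resolveEnvPlaceholders` and `Env` are hypothetical names, not part of the package's API.

```typescript
// Hypothetical helper, not part of the jeeves-watcher API: walks a parsed
// config value and replaces `${NAME}` placeholders using an environment map.
type Env = Record<string, string | undefined>;

function resolveEnvPlaceholders(value: unknown, env: Env): unknown {
  if (typeof value === "string") {
    // Assumption: an unset variable leaves the placeholder untouched.
    return value.replace(
      /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
      (match: string, name: string) => env[name] ?? match,
    );
  }
  if (Array.isArray(value)) {
    return value.map((item) => resolveEnvPlaceholders(item, env));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      result[key] = resolveEnvPlaceholders(source[key], env);
    }
    return result;
  }
  // Numbers, booleans, and null pass through unchanged.
  return value;
}

const exampleConfig = {
  embedding: {
    provider: "google",
    model: "text-embedding-004",
    apiKey: "${GOOGLE_API_KEY}",
  },
};

const resolvedConfig = resolveEnvPlaceholders(exampleConfig, {
  GOOGLE_API_KEY: "example-key",
}) as typeof exampleConfig;
```

In a real process the second argument would typically be `process.env`; a plain object is used here so the sketch runs without any environment setup.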

### Start Watching

```bash
jeeves-watcher start
```

The watcher will:
1. Index all existing files in watched directories
2. Monitor for changes
3. Update Qdrant automatically

## CLI Commands

| Command | Description |
|---------|-------------|
| `jeeves-watcher start` | Start the filesystem watcher (foreground) |
| `jeeves-watcher init` | Initialize a new configuration file |
| `jeeves-watcher status` | Show watcher status |
| `jeeves-watcher reindex` | Reindex all watched files |
| `jeeves-watcher rebuild-metadata` | Rebuild metadata files from Qdrant payloads |
| `jeeves-watcher search <query>` | Search the vector store |
| `jeeves-watcher enrich <path>` | Enrich document metadata |
| `jeeves-watcher validate` | Validate the configuration |
| `jeeves-watcher service` | Manage the watcher as a system service |
| `jeeves-watcher config-reindex` | Reindex after configuration changes (rules only or full) |

## Configuration

### Watch Paths

```json
{
  "watch": {
    "paths": ["./docs", "./notes"],
    "ignored": ["**/node_modules/**", "**/*.tmp"]
  }
}
```

- **`paths`**: Array of glob patterns or directories to watch
- **`ignored`**: Array of patterns to exclude

### Embedding Provider

#### Google Gemini

```json
{
  "embedding": {
    "provider": "google",
    "model": "text-embedding-004",
    "apiKey": "${GOOGLE_API_KEY}"
  }
}
```

#### OpenAI

```json
{
  "embedding": {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "apiKey": "${OPENAI_API_KEY}"
  }
}
```

### Vector Store

```json
{
  "vectorStore": {
    "url": "http://localhost:6333",
    "collectionName": "my_collection"
  }
}
```

### Inference Rules

Automatically enrich metadata based on file patterns:

```json
{
  "inferenceRules": [
    {
      "match": {
        "properties": {
          "file": {
            "type": "object",
            "properties": {
              "path": { "type": "string", "glob": "**/meetings/**" }
            }
          }
        }
      },
      "set": {
        "domain": "meetings",
        "category": "notes"
      }
    }
  ]
}
```
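
To illustrate how a rule like the one above could turn a file path into metadata, here is a simplified sketch. It only looks at the `file.path` glob and merges the rule's `set` object when the glob matches; the package's real matcher presumably evaluates the whole schema-style `match` block, which this sketch does not attempt. `globToRegExp`, `inferMetadata`, and the interfaces are hypothetical names, and letting later rules override earlier ones is an assumption.

```typescript
// Hypothetical types mirroring only the fields used in the README example.
interface PathSchema { type?: string; glob?: string }
interface FileSchema { type?: string; properties?: { path?: PathSchema } }
interface InferenceRule {
  match: { properties?: { file?: FileSchema } };
  set: Record<string, unknown>;
}

// Minimal glob translation: `**/` = zero or more directories, `**` = anything,
// `*` = anything except `/`, `?` = one character except `/`.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      i++;
      if (glob.charAt(i + 1) === "/") {
        source += "(?:.*/)?";
        i++;
      } else {
        source += ".*";
      }
    } else if (ch === "*") {
      source += "[^/]*";
    } else if (ch === "?") {
      source += "[^/]";
    } else {
      // Escape regular-expression metacharacters in literal text.
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// Assumption: rules apply in order and later rules override earlier keys.
function inferMetadata(filePath: string, rules: InferenceRule[]): Record<string, unknown> {
  let metadata: Record<string, unknown> = {};
  for (const rule of rules) {
    const glob = rule.match.properties?.file?.properties?.path?.glob;
    if (glob !== undefined && globToRegExp(glob).test(filePath)) {
      metadata = { ...metadata, ...rule.set };
    }
  }
  return metadata;
}

const exampleRules: InferenceRule[] = [
  {
    match: {
      properties: {
        file: {
          type: "object",
          properties: { path: { type: "string", glob: "**/meetings/**" } },
        },
      },
    },
    set: { domain: "meetings", category: "notes" },
  },
];

const meetingMetadata = inferMetadata("docs/meetings/2025-01.md", exampleRules);
const otherMetadata = inferMetadata("docs/notes/todo.md", exampleRules);
```

With the example rule, a path under a `meetings` directory picks up `domain` and `category`, while any other path gets an empty metadata object.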

### Chunking

```json
{
  "chunking": {
    "chunkSize": 1000,
    "chunkOverlap": 200
  }
}
```
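
The README does not state the unit for `chunkSize` and `chunkOverlap` or the splitting strategy. As a rough illustration of how the two settings interact, the sketch below assumes both are character counts and uses a plain sliding window; `chunkText` is a hypothetical function, not the package's chunker.

```typescript
// Hypothetical sliding-window chunker. Assumes sizes are in characters.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("Expected chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  const chunks: string[] = [];
  // Each new chunk starts `chunkSize - chunkOverlap` characters after the last.
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// A 2500-character sample with the README's settings.
const sampleText = Array.from({ length: 2500 }, (_, i) =>
  String.fromCharCode(97 + (i % 26)),
).join("");
const sampleChunks = chunkText(sampleText, 1000, 200);
```

With these settings each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next.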

### Metadata Storage

```json
{
  "metadataDir": ".jeeves-metadata"
}
```

Metadata is stored as JSON files alongside watched documents.

## API Endpoints

The watcher provides a REST API (default port: 3456):

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/status` | GET | Health check and uptime |
| `/search` | POST | Semantic search (`{ query: string, limit?: number }`) |
| `/metadata` | POST | Update document metadata (`{ path: string, metadata: object }`) |
| `/reindex` | POST | Reindex all watched files |
| `/rebuild-metadata` | POST | Rebuild metadata files from Qdrant |
| `/config-reindex` | POST | Reindex after config changes (`{ scope?: "rules" \| "full" }`) |

### Example: Search

```bash
curl -X POST http://localhost:3456/search \
  -H "Content-Type: application/json" \
  -d '{"query": "machine learning algorithms", "limit": 5}'
```
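
The same request can be issued from TypeScript. The sketch below assumes a runtime with a global `fetch` (for example Node 18 or later) and uses the request body shape from the endpoint table. The response format is not documented in this README, so it is returned as `unknown`; `buildSearchRequest` and `search` are hypothetical helper names.

```typescript
// Request body shape taken from the endpoint table: { query, limit? }.
interface SearchRequest {
  query: string;
  limit?: number;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the URL and fetch options separately so they can be inspected
// without a running watcher.
function buildSearchRequest(baseUrl: string, body: SearchRequest): PreparedRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/search`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request. The response shape is an unknown here because the
// README does not describe it.
async function search(baseUrl: string, body: SearchRequest): Promise<unknown> {
  const { url, init } = buildSearchRequest(baseUrl, body);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Search failed with HTTP ${response.status}`);
  }
  return response.json();
}

const preparedSearch = buildSearchRequest("http://localhost:3456/", {
  query: "machine learning algorithms",
  limit: 5,
});
```

Calling `search("http://localhost:3456", { query: "machine learning algorithms", limit: 5 })` would send the same payload as the `curl` command, provided the watcher is running on the default port.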

### Example: Update Metadata

```bash
curl -X POST http://localhost:3456/metadata \
  -H "Content-Type: application/json" \
  -d '{
    "path": "/path/to/document.md",
    "metadata": {
      "priority": "high",
      "category": "research"
    }
  }'
```

## Supported File Formats

- **Markdown** (`.md`, `.markdown`) — with YAML frontmatter support
- **PDF** (`.pdf`) — text extraction
- **DOCX** (`.docx`) — Microsoft Word documents
- **HTML** (`.html`, `.htm`) — content extraction (scripts/styles removed)
- **JSON** (`.json`) — with smart text field detection
- **Plain Text** (`.txt`, `.text`)
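
As a small illustration of the extension list above, the following sketch maps a file path to one of the listed formats. The extensions come from the list; the `DocumentFormat` type, the `detectFormat` function, and case-insensitive matching are assumptions made for the example rather than documented behavior.

```typescript
// Hypothetical format labels for the six formats listed above.
type DocumentFormat = "markdown" | "pdf" | "docx" | "html" | "json" | "text";

const FORMAT_BY_EXTENSION: Record<string, DocumentFormat> = {
  ".md": "markdown",
  ".markdown": "markdown",
  ".pdf": "pdf",
  ".docx": "docx",
  ".html": "html",
  ".htm": "html",
  ".json": "json",
  ".txt": "text",
  ".text": "text",
};

// Returns the format for a path, or undefined for unlisted extensions.
// Assumption: extension matching is case-insensitive.
function detectFormat(filePath: string): DocumentFormat | undefined {
  const lastSeparator = Math.max(filePath.lastIndexOf("/"), filePath.lastIndexOf("\\"));
  const fileName = filePath.slice(lastSeparator + 1);
  const dot = fileName.lastIndexOf(".");
  // No dot, or a leading dot only (such as `.gitignore`), means no extension.
  if (dot <= 0) return undefined;
  return FORMAT_BY_EXTENSION[fileName.slice(dot).toLowerCase()];
}
```

For example, `detectFormat("docs/Notes.MD")` yields `"markdown"`, while `detectFormat("archive.zip")` yields `undefined`.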

## License

BSD-3-Clause