@karmaniverous/jeeves-watcher 0.5.0-1 → 0.5.1

@@ -1,413 +0,0 @@
1
- ---
2
- name: jeeves-watcher
3
- description: >
4
- Semantic search and metadata enrichment via a jeeves-watcher instance.
5
- Use when you need to search indexed documents, discover available metadata
6
- fields, filter by payload values, or enrich document metadata.
7
- ---
8
-
9
- # jeeves-watcher — Search & Discovery
10
-
11
- **Key principle:** The SKILL teaches procedure. The config provides specifics. The assistant discovers everything about a deployment at runtime; nothing about domains, field names, or organizational structure is hardcoded in the SKILL.
12
-
13
- ## Quick Start
14
-
15
- 1. **Orient yourself** (once per session) — understand the deployment's organizational strategy and available record types
16
- 2. **Search** — use semantic search with optional metadata filters to find relevant documents
17
- 3. **Read source** — retrieve full file content for complete context
18
-
19
- ## Tools
20
-
21
- ### `watcher_search`
22
- Semantic search over indexed documents.
23
- - `query` (string, required) — natural language search query
24
- - `limit` (number, optional) — max results, default 10
25
- - `offset` (number, optional) — skip N results for pagination
26
- - `filter` (object, optional) — Qdrant filter for metadata filtering
27
-
28
- ### `watcher_enrich`
29
- Set or update metadata on a document.
30
- - `path` (string, required) — file path of the document
31
- - `metadata` (object, required) — key-value metadata to merge
32
-
33
- ### `watcher_status`
34
- Service health check. Returns uptime, collection stats, reindex status.
35
-
36
- ### `watcher_query`
37
- Query the merged virtual document via JSONPath.
38
- - `path` (string, required) — JSONPath expression
39
- - `resolve` (string[], optional) — `["files"]`, `["globals"]`, or `["files","globals"]`
40
-
41
- ## Qdrant Filter Syntax
42
-
43
- Filters use Qdrant's native JSON filter format, passed as the `filter` parameter to `watcher_search`.
44
-
45
- ### Basic Patterns
46
-
47
- **Match exact value:**
48
- ```json
49
- { "must": [{ "key": "domain", "match": { "value": "email" } }] }
50
- ```
51
-
52
- **Match text (full-text search within field):**
53
- ```json
54
- { "must": [{ "key": "chunk_text", "match": { "text": "authentication" } }] }
55
- ```
56
-
57
- **Combine conditions (AND):**
58
- ```json
59
- {
60
- "must": [
61
- { "key": "domain", "match": { "value": "jira" } },
62
- { "key": "status", "match": { "value": "In Progress" } }
63
- ]
64
- }
65
- ```
66
-
67
- **Exclude (NOT):**
68
- ```json
69
- {
70
- "must_not": [{ "key": "domain", "match": { "value": "repos" } }]
71
- }
72
- ```
73
-
74
- **Any of (OR):**
75
- ```json
76
- {
77
- "should": [
78
- { "key": "domain", "match": { "value": "email" } },
79
- { "key": "domain", "match": { "value": "slack" } }
80
- ]
81
- }
82
- ```
83
-
84
- **Nested (combine AND + NOT):**
85
- ```json
86
- {
87
- "must": [{ "key": "domain", "match": { "value": "jira" } }],
88
- "must_not": [{ "key": "status", "match": { "value": "Done" } }]
89
- }
90
- ```
91
-
92
- ### Key Differences
93
- - `match.value` — exact match (case-sensitive, for keyword fields like `domain`, `status`)
94
- - `match.text` — full-text match (for text fields like `chunk_text`)
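The patterns above are plain JSON objects, so they can be assembled programmatically. The following is a minimal sketch, not part of the package: the helper names `match_value`, `match_text`, and `build_filter` are hypothetical and only reproduce the filter shapes shown above.

```python
from typing import Any

Condition = dict[str, Any]


def match_value(key: str, value: Any) -> Condition:
    """Exact match on a keyword field (the `match.value` form)."""
    return {"key": key, "match": {"value": value}}


def match_text(key: str, text: str) -> Condition:
    """Full-text match on a text field (the `match.text` form)."""
    return {"key": key, "match": {"text": text}}


def build_filter(must=(), should=(), must_not=()) -> dict[str, list[Condition]]:
    """Assemble a filter object, omitting clauses that have no conditions."""
    clauses = {"must": list(must), "should": list(should), "must_not": list(must_not)}
    return {name: conds for name, conds in clauses.items() if conds}


# Same shape as the "Nested (combine AND + NOT)" example above.
open_jira = build_filter(
    must=[match_value("domain", "jira")],
    must_not=[match_value("status", "Done")],
)
```

The resulting dictionary can be passed as the `filter` parameter of `watcher_search`.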
95
-
96
- ## Search Result Shape
97
-
98
- Each result from `watcher_search` contains:
99
-
100
- | Field | Type | Description |
101
- |-------|------|-------------|
102
- | `id` | string | Qdrant point ID |
103
- | `score` | number | Similarity score (typically 0–1; higher = more relevant) |
104
- | `payload.file_path` | string | Source file path |
105
- | `payload.chunk_text` | string | The matched text chunk |
106
- | `payload.chunk_index` | number | Chunk position within the file |
107
- | `payload.total_chunks` | number | Total chunks for this file |
108
- | `payload.content_hash` | string | Hash of the full document content |
109
- | `payload.matched_rules` | string[] | Names of inference rules that matched |
110
-
111
- Additional metadata fields depend on the deployment's inference rules (e.g., `domain`, `status`, `author`). Use `watcher_query` to discover available fields.
112
-
113
- ## JSONPath Patterns for Schema Discovery
114
-
115
- Use `watcher_query` to explore the merged virtual document. Common patterns:
116
-
117
- ### Orientation
118
- ```
119
- $.inferenceRules[*].['name','description'] — List all rules with descriptions
120
- $.search.scoreThresholds — Score interpretation thresholds
121
- $.slots — Named filter patterns (e.g., memory)
122
- ```
123
-
124
- ### Schema Discovery
125
- ```
126
- $.inferenceRules[?(@.name=='jira-issue')] — Full rule details
127
- $.inferenceRules[?(@.name=='jira-issue')].values — Distinct values for a rule
128
- $.inferenceRules[?(@.name=='jira-issue')].values.status — Values for a specific field
129
- ```
130
-
131
- ### Helper Enumeration
132
- ```
133
- $.mapHelpers — All JsonMap helper namespaces
134
- $.mapHelpers.slack.exports — Exports from the 'slack' helper
135
- $.templateHelpers — All Handlebars helper namespaces
136
- ```
137
-
138
- ### Issues
139
- ```
140
- $.issues — All runtime embedding failures
141
- ```
142
-
143
- ### Full Config Introspection
144
- ```
145
- $.schemas — Global named schemas
146
- $.maps — Named JsonMap transforms
147
- $.templates — Named Handlebars templates
148
- ```
149
-
150
- ---
151
-
152
- ## Orientation Pattern (Once Per Session)
153
-
154
- Query the deployment's organizational context and available record types. This information is stable within a session; query once and rely on results for the remainder.
155
-
156
- **Efficient pattern (two calls):**
157
-
158
- 1. **Top-level context:**
159
- ```
160
- watcher_query: path="$.['description','search']"
161
- ```
162
- Returns:
163
- - `description` — organizational strategy (e.g., how domains are structured, what partitioning means)
164
- - `search.scoreThresholds` — score interpretation boundaries (strong, relevant, noise)
165
-
166
- 2. **Available record types:**
167
- ```
168
- watcher_query: path="$.inferenceRules[*].['name','description']"
169
- ```
170
- Returns list of inference rules with their names and descriptions.
171
-
172
- **Example result:**
173
- ```json
174
- [
175
- { "name": "email-archive", "description": "Email archive messages" },
176
- { "name": "slack-message", "description": "Slack channel messages with channel and author metadata" },
177
- { "name": "jira-issue", "description": "Jira issue metadata extracted from issue JSON exports" }
178
- ]
179
- ```
180
-
181
- The top-level `description` explains this deployment's organizational strategy. Each rule's `description` explains what that specific record type represents. Both levels are useful: one orients, the other enumerates.
182
-
183
- ---
184
-
185
- ## `resolve` Usage Guidance
186
-
187
- The `resolve` parameter controls which reference layers are expanded in `watcher_query`:
188
-
189
- - **No `resolve` (default):** Raw config structure with references intact (lightweight)
190
- - **`resolve: ["files"]`:** Resolve file path references to their contents (e.g., `"schemas/base.json"` → the JSON Schema object)
191
- - **`resolve: ["globals"]`:** Resolve named schema references (e.g., `"base"` in a rule's schema array → the global schema object)
192
- - **`resolve: ["files","globals"]`:** Fully inlined, everything expanded
193
-
194
- **When to use:**
195
- - **Orientation:** No resolve (just names and descriptions, lightweight)
196
- - **Query planning:** `resolve: ["files","globals"]` (need complete merged schemas for filter construction)
197
- - **Browsing global schemas:** `resolve: ["files"]` (see schema contents while keeping named references visible, so the shared structure stays apparent)
198
-
199
- ---
200
-
201
- ## Query Planning (Per Search Task)
202
-
203
- Identify relevant rule(s) from the orientation model, then retrieve their schemas:
204
-
205
- **Retrieve complete schema for a rule:**
206
- ```
207
- watcher_query: path="$.inferenceRules[?(@.name=='jira-issue')].schema"
208
- resolve=["files","globals"]
209
- ```
210
-
211
- Returns the fully merged schema with properties, types, `set` provenance, `uiHint`, `enum`, etc.
212
-
213
- **For select/multiselect fields without `enum` in schema:**
214
- ```
215
- watcher_query: path="$.inferenceRules[?(@.name=='jira-issue')].values.status"
216
- ```
217
-
218
- Retrieves valid filter values from the runtime values index (distinct values accumulated during embedding).
219
-
220
- **When search results span multiple rules** (indicated by `matched_rules` on results): query each unique rule's schema separately and merge mentally. Most result sets share the same rule combination, so this is typically one or two queries, not one per result.
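The per-rule lookups described here can be derived mechanically from a result set. This is a rough sketch under stated assumptions: `rule_path` and `schema_queries` are hypothetical helper names, and the result objects are assumed to have the `payload.matched_rules` shape documented in this skill.

```python
def rule_path(rule_name: str, *segments: str) -> str:
    """Build the JSONPath that selects one inference rule, plus optional sub-keys."""
    if "'" in rule_name:
        raise ValueError("rule names containing a single quote are not handled here")
    path = f"$.inferenceRules[?(@.name=='{rule_name}')]"
    return path + "".join(f".{segment}" for segment in segments)


def schema_queries(results: list[dict]) -> list[dict]:
    """Return one watcher_query argument set per unique rule, in first-seen order."""
    seen: list[str] = []
    for result in results:
        for rule in result["payload"]["matched_rules"]:
            if rule not in seen:
                seen.append(rule)
    return [
        {"path": rule_path(rule, "schema"), "resolve": ["files", "globals"]}
        for rule in seen
    ]
```

For example, `rule_path("jira-issue", "values", "status")` yields the values-index path shown above.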
221
-
222
- ---
223
-
224
- ## uiHint → Qdrant Filter Mapping
225
-
226
- Use `uiHint` to determine filter construction strategy. **Apply this table as written rather than inferring a mapping:**
227
-
228
- | `uiHint` | Qdrant filter | Notes |
229
- |----------|--------------|-------|
230
- | `text` | `{ "key": "<field>", "match": { "text": "<value>" } }` | Substring/keyword match |
231
- | `select` | `{ "key": "<field>", "match": { "value": "<enum_value>" } }` | Exact match; use `enum` values from schema or runtime values index |
232
- | `multiselect` | `{ "key": "<field>", "match": { "value": "<enum_value>" } }` | Any-element match on array field; use `enum` or runtime values index |
233
- | `date` | `{ "key": "<field>", "range": { "gte": <unix_ts>, "lt": <unix_ts> } }` | Either bound optional for open-ended ranges (e.g., "after January" → `gte` only) |
234
- | `number` | `{ "key": "<field>", "range": { "gte": <n>, "lte": <n> } }` | Either bound optional for open-ended ranges |
235
- | `check` | `{ "key": "<field>", "match": { "value": true } }` | Boolean match |
236
- | *(absent)* | Do not use in filters | Internal bookkeeping field, not intended for search |
237
-
238
- **Fallback:** If a `select`/`multiselect` field has neither `enum` in schema nor values in the index, treat it as `text` (substring match instead of exact match).
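The table and fallback rule can be expressed as a single dispatch function. The sketch below is illustrative only: `condition_for`, `unix_ts`, and the `has_known_values` flag are hypothetical names, and timestamps are assumed to be UTC seconds, which the skill does not state.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def unix_ts(year: int, month: int, day: int) -> int:
    """Midnight UTC of the given date as a Unix timestamp (UTC is an assumption)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def condition_for(field: str, ui_hint: Optional[str], value: Any = None, *,
                  gte=None, lt=None, lte=None, has_known_values: bool = True) -> dict:
    """Translate a field's uiHint into a single Qdrant filter condition."""
    if ui_hint is None:
        raise ValueError(f"{field!r} has no uiHint and should not be used in filters")
    # Fallback: select/multiselect with no enum and no indexed values behaves like text.
    if ui_hint in ("select", "multiselect") and not has_known_values:
        ui_hint = "text"
    if ui_hint == "text":
        return {"key": field, "match": {"text": value}}
    if ui_hint in ("select", "multiselect"):
        return {"key": field, "match": {"value": value}}
    if ui_hint == "check":
        return {"key": field, "match": {"value": True if value is None else bool(value)}}
    if ui_hint in ("date", "number"):
        bounds = {name: bound for name, bound in (("gte", gte), ("lt", lt), ("lte", lte))
                  if bound is not None}
        if not bounds:
            raise ValueError("a range condition needs at least one bound")
        return {"key": field, "range": bounds}
    raise ValueError(f"unknown uiHint: {ui_hint!r}")


# "created after 1 January 2025": an open-ended range with `gte` only.
after_january = condition_for("created", "date", gte=unix_ts(2025, 1, 1))
```

Rejecting fields without a `uiHint` keeps bookkeeping fields out of filters, in line with the last row of the table.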
239
-
240
- ---
241
-
242
- ## Qdrant Filter Combinators
243
-
244
- Compose individual field conditions into complex queries using three combinators:
245
-
246
- | Combinator | Semantics | Use case |
247
- |-----------|-----------|----------|
248
- | `must` | AND — all conditions required | Intersecting constraints (domain + date range + assignee) |
249
- | `should` | OR — at least one must match | Alternative values, fuzzy criteria ("assigned to X or Y") |
250
- | `must_not` | Exclusion — any match triggers exclude | Filtering out noise (exclude Done, exclude codebase domain) |
251
-
252
- **Combinators nest arbitrarily for complex boolean logic:**
253
- ```json
254
- {
255
- "must": [
256
- { "key": "domain", "match": { "value": "jira" } },
257
- { "key": "created", "range": { "gte": 1735689600 } }
258
- ],
259
- "should": [
260
- { "key": "assignee", "match": { "value": "Jason Williscroft" } },
261
- { "is_null": { "key": "assignee" } }
262
- ],
263
- "must_not": [
264
- { "key": "status", "match": { "value": "Done" } }
265
- ]
266
- }
267
- ```
268
-
269
- A consuming UI will typically compose simple single-field filters. The assistant can compose deeply complex queries combining multiple fields, nested boolean logic, and open-ended ranges to precisely target what it needs.
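To make the combinator semantics concrete, here is a simplified local model of how `must`, `should`, and `must_not` interact. It is a sketch for reasoning about filters, not Qdrant's implementation: it handles only `match.value` and `range` conditions, and the function names are hypothetical.

```python
def condition_holds(cond: dict, payload: dict) -> bool:
    """Evaluate one `match.value` or `range` condition against a payload."""
    value = payload.get(cond["key"])
    if "match" in cond and "value" in cond["match"]:
        expected = cond["match"]["value"]
        # Array fields match when any element equals the expected value.
        return expected in value if isinstance(value, list) else value == expected
    if "range" in cond:
        if not isinstance(value, (int, float)):
            return False
        r = cond["range"]
        return (
            ("gte" not in r or value >= r["gte"])
            and ("gt" not in r or value > r["gt"])
            and ("lte" not in r or value <= r["lte"])
            and ("lt" not in r or value < r["lt"])
        )
    raise ValueError("unsupported condition in this simplified model")


def passes(filter_: dict, payload: dict) -> bool:
    """must = all, should = at least one (when present), must_not = none."""
    must = filter_.get("must", [])
    should = filter_.get("should", [])
    must_not = filter_.get("must_not", [])
    return (
        all(condition_holds(c, payload) for c in must)
        and (not should or any(condition_holds(c, payload) for c in should))
        and not any(condition_holds(c, payload) for c in must_not)
    )
```

A model like this is useful for checking a composed filter against a few sample payloads before sending it to `watcher_search`.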
270
-
271
- ---
272
-
273
- ## Search Execution
274
-
275
- **Plain semantic search is valid and often sufficient.** Not every query needs metadata filters. When the user's question is broad or exploratory, a natural language query with no filter object is the right starting point. Add filters to narrow, not as a default.
276
-
277
- **Result limit guidance:**
278
- - Default: 10 results
279
- - Broad discovery / exploratory: 20–30, apply score threshold cutoff from config
280
- - Targeted retrieval with tight filters: 5
281
- - Cross-domain sweep: 15–20, no domain filter, use score to separate signal from noise
282
-
283
- ---
284
-
285
- ## Search Result Shape
286
-
287
- **Qdrant output (stable across all configs):**
288
- ```json
289
- {
290
- "id": "<point_id>",
291
- "score": 0.82,
292
- "payload": {
293
- "file_path": "j:/domains/jira/VCN/issue/WEB-123.json",
294
- "chunk_index": 0,
295
- "total_chunks": 1,
296
- "chunk_text": "...",
297
- "content_hash": "...",
298
- "matched_rules": ["jira-issue", "json-subject"],
299
- ...config-defined metadata fields...
300
- }
301
- }
302
- ```
303
-
304
- **System fields present on every result** (watcher-managed, not config-defined):
305
- - `file_path` — source file path
306
- - `chunk_index` / `total_chunks` — chunk position within document
307
- - `chunk_text` — the embedded text content
308
- - `content_hash` — content fingerprint for deduplication
309
- - `matched_rules` — inference rules that produced this point's metadata
310
-
311
- **All other payload fields are config-defined** (via inference rule schemas).
312
-
313
- Refer to Qdrant documentation for the complete search response envelope.
314
-
315
- ---
316
-
317
- ## Post-Processing Guidance
318
-
319
- ### Score Interpretation
320
- Use `scoreThresholds` from config (queried during orientation). Values are deployment-specific, constrained to [-1, 1]:
321
- - `strong` — minimum score for a strong match
322
- - `relevant` — minimum score for relevance
323
- - `noise` — score below which results are treated as noise
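One way to apply the three thresholds is a small classifier. This is a sketch with assumptions: the threshold numbers are made up for illustration (real values come from `$.search.scoreThresholds`), and the `"marginal"` label for scores between `noise` and `relevant` is not defined by the skill.

```python
def classify_score(score: float, thresholds: dict[str, float]) -> str:
    """Bucket a similarity score using deployment-specific thresholds."""
    if score >= thresholds["strong"]:
        return "strong"
    if score >= thresholds["relevant"]:
        return "relevant"
    if score < thresholds["noise"]:
        return "noise"
    # Between `noise` and `relevant`: label chosen for this sketch only.
    return "marginal"


# Hypothetical thresholds; query the running config for real ones.
example_thresholds = {"strong": 0.75, "relevant": 0.5, "noise": 0.25}
```

With these example values, a score of 0.82 falls in the `strong` bucket and 0.1 is classified as `noise`.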
324
-
325
- ### Chunk Grouping
326
- Multiple results with the same `file_path` are chunks of one document. Read the full file for complete context.
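Grouping by `file_path` can be sketched as follows, assuming results have the payload shape documented in this skill. The function name `group_by_file` is hypothetical.

```python
from collections import defaultdict


def group_by_file(results: list[dict]) -> dict[str, list[dict]]:
    """Group search results by source file, ordering each group by chunk position."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for result in results:
        grouped[result["payload"]["file_path"]].append(result)
    for chunks in grouped.values():
        chunks.sort(key=lambda r: r["payload"]["chunk_index"])
    return dict(grouped)
```

Each key is then a candidate for a single full-file read, rather than one read per chunk.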
327
-
328
- ### Schema Lookup
329
- Use `matched_rules` on results to look up applicable schemas for metadata interpretation:
330
- ```
331
- watcher_query: path="$.inferenceRules[?(@.name=='jira-issue')].schema"
332
- resolve=["files","globals"]
333
- ```
334
-
335
- ### Full Context
336
- Search gives you chunks; use `read` with `file_path` for the complete document.
337
-
338
- ---
339
-
340
- ## Path Testing
341
-
342
- When uncertain whether a file is indexed, use the path test endpoint:
343
- ```
344
- watcher_query: path="$.inferenceRules[?(@.name=='<rule>')].match"
345
- ```
346
-
347
- Or check whether a specific path would match. The path test:
348
- - Returns matching rule names and watch scope status
349
- - Empty `rules` array means no inference rules match
350
- - `watched: false` means the path falls outside watch paths or is excluded by ignore patterns
351
-
352
- ---
353
-
354
- ## Diagnostics
355
-
356
- Check the issues endpoint for failed embeddings:
357
- ```
358
- watcher_query: path="$.issues"
359
- ```
360
-
361
- **Issues are self-healing:** resolved on successful re-process. The issues file always represents the current set of unresolved problems: a live todo list.
362
-
363
- **Issue types:**
364
- - `type_collision` — multiple rules declare the same property with incompatible types (includes `property`, `rules[]`, `types[]`)
365
- - `interpolation_error` — `set` template path doesn't resolve (includes `property`, `rule`)
366
-
367
- ---
368
-
369
- ## Enrichment
370
-
371
- Use `watcher_enrich` to tag documents after analysis (e.g., `reviewed: true`, project labels).
372
-
373
- **Metadata is validated against the file's matched rule schemas.** Validation errors return structured messages:
374
- ```json
375
- {
376
- "error": "Validation failed",
377
- "details": [
378
- {
379
- "property": "priority",
380
- "expected": "string",
381
- "received": "number",
382
- "rule": "jira-issue",
383
- "message": "Property 'priority' is declared as string in jira-issue schema, received number"
384
- }
385
- ]
386
- }
387
- ```
388
-
389
- ---
390
-
391
- ## Memory Recall
392
-
393
- If `$.slots.memory` is present during orientation, this instance indexes memory files. Before answering questions about prior work, decisions, dates, people, preferences, or todos:
394
-
395
- 1. Search with `watcher_search` using the memory slot filter
396
- 2. Use `read` with offset/limit for full context from matched files
397
- 3. Include `Source: <file_path>` citations in your response
398
-
399
- ---
400
-
401
- ## Error Handling
402
-
403
- If the watcher is unreachable:
404
- - Inform the user that semantic search is temporarily unavailable
405
- - Fall back to direct `read` for known file paths
406
- - Do not retry silently in a loop
407
-
408
- ---
409
-
410
- ## References
411
-
412
- - [JSONPath Plus documentation](https://www.npmjs.com/package/jsonpath-plus) for JSONPath syntax
413
- - [Qdrant filtering documentation](https://qdrant.tech/documentation/concepts/filtering/) for advanced query patterns and search response format
@@ -1,200 +0,0 @@
1
- ---
2
- name: jeeves-watcher-admin
3
- description: >
4
- Instance management for a jeeves-watcher deployment. Use when you need to
5
- author or validate config, trigger reindexing, diagnose embedding failures,
6
- or manage helper registrations.
7
- ---
8
-
9
- # jeeves-watcher — Instance Administration
10
-
11
- ## Tools
12
-
13
- ### `watcher_validate`
14
- Validate config and optionally test file paths.
15
- - `config` (object, optional) — candidate config (partial or full). Omit to validate current config.
16
- - `testPaths` (string[], optional) — file paths to test against the config
17
-
18
- Partial configs merge with the current config by rule name. If `config` is omitted, `testPaths` are tested against the running config.
19
-
20
- ### `watcher_config_apply`
21
- Apply config changes atomically.
22
- - `config` (object, required) — full or partial config to apply
23
-
24
- Validates, writes to disk, and triggers configured reindex behavior. Returns validation errors if invalid.
25
-
26
- ### `watcher_reindex`
27
- Trigger a reindex.
28
- - `scope` (string, optional) — `"rules"` (default) or `"full"`
29
-
30
- Rules scope re-applies inference rules without re-embedding (lightweight). Full scope re-processes all files.
31
-
32
- ### `watcher_issues`
33
- Get runtime embedding failures. Returns `{ filePath: IssueRecord }` showing files that failed and why.
34
-
35
- ### `watcher_query`
36
- Query config and runtime state via JSONPath (same tool as consumer skill).
37
-
38
- ### `watcher_status`
39
- Service health check including reindex progress.
40
-
41
- ## Qdrant Filter Syntax
42
-
43
- Filters use Qdrant's native JSON filter format, passed as the `filter` parameter to `watcher_search`.
44
-
45
- ### Basic Patterns
46
-
47
- **Match exact value:**
48
- ```json
49
- { "must": [{ "key": "domain", "match": { "value": "email" } }] }
50
- ```
51
-
52
- **Match text (full-text search within field):**
53
- ```json
54
- { "must": [{ "key": "chunk_text", "match": { "text": "authentication" } }] }
55
- ```
56
-
57
- **Combine conditions (AND):**
58
- ```json
59
- {
60
- "must": [
61
- { "key": "domain", "match": { "value": "jira" } },
62
- { "key": "status", "match": { "value": "In Progress" } }
63
- ]
64
- }
65
- ```
66
-
67
- **Exclude (NOT):**
68
- ```json
69
- {
70
- "must_not": [{ "key": "domain", "match": { "value": "repos" } }]
71
- }
72
- ```
73
-
74
- **Any of (OR):**
75
- ```json
76
- {
77
- "should": [
78
- { "key": "domain", "match": { "value": "email" } },
79
- { "key": "domain", "match": { "value": "slack" } }
80
- ]
81
- }
82
- ```
83
-
84
- **Nested (combine AND + NOT):**
85
- ```json
86
- {
87
- "must": [{ "key": "domain", "match": { "value": "jira" } }],
88
- "must_not": [{ "key": "status", "match": { "value": "Done" } }]
89
- }
90
- ```
91
-
92
- ### Key Differences
93
- - `match.value` — exact match (case-sensitive, for keyword fields like `domain`, `status`)
94
- - `match.text` — full-text match (for text fields like `chunk_text`)
95
-
96
- ## Search Result Shape
97
-
98
- Each result from `watcher_search` contains:
99
-
100
- | Field | Type | Description |
101
- |-------|------|-------------|
102
- | `id` | string | Qdrant point ID |
103
- | `score` | number | Similarity score (typically 0–1; higher = more relevant) |
104
- | `payload.file_path` | string | Source file path |
105
- | `payload.chunk_text` | string | The matched text chunk |
106
- | `payload.chunk_index` | number | Chunk position within the file |
107
- | `payload.total_chunks` | number | Total chunks for this file |
108
- | `payload.content_hash` | string | Hash of the full document content |
109
- | `payload.matched_rules` | string[] | Names of inference rules that matched |
110
-
111
- Additional metadata fields depend on the deployment's inference rules (e.g., `domain`, `status`, `author`). Use `watcher_query` to discover available fields.
112
-
113
- ## JSONPath Patterns for Schema Discovery
114
-
115
- Use `watcher_query` to explore the merged virtual document. Common patterns:
116
-
117
- ### Orientation
118
- ```
119
- $.inferenceRules[*].['name','description'] — List all rules with descriptions
120
- $.search.scoreThresholds — Score interpretation thresholds
121
- $.slots — Named filter patterns (e.g., memory)
122
- ```
123
-
124
- ### Schema Discovery
125
- ```
126
- $.inferenceRules[?(@.name=='jira-issue')] — Full rule details
127
- $.inferenceRules[?(@.name=='jira-issue')].values — Distinct values for a rule
128
- $.inferenceRules[?(@.name=='jira-issue')].values.status — Values for a specific field
129
- ```
130
-
131
- ### Helper Enumeration
132
- ```
133
- $.mapHelpers — All JsonMap helper namespaces
134
- $.mapHelpers.slack.exports — Exports from the 'slack' helper
135
- $.templateHelpers — All Handlebars helper namespaces
136
- ```
137
-
138
- ### Issues
139
- ```
140
- $.issues — All runtime embedding failures
141
- ```
142
-
143
- ### Full Config Introspection
144
- ```
145
- $.schemas — Global named schemas
146
- $.maps — Named JsonMap transforms
147
- $.templates — Named Handlebars templates
148
- ```
149
-
150
- ## Config Authoring
151
-
152
- ### Rule Structure
153
- Each inference rule has:
154
- - `name` (required) — unique identifier
155
- - `description` (optional) — human-readable purpose
156
- - `match` — JSON Schema with picomatch glob for path matching
157
- - `set` — metadata fields to set on match
158
- - `map` (optional) — named JsonMap transform
159
- - `template` (optional) — named Handlebars template
160
-
161
- ### Config Workflow
162
- 1. Edit config (or build partial config object)
163
- 2. Validate: `watcher_validate` with optional `testPaths` for dry-run preview
164
- 3. Apply: `watcher_config_apply` — validates, writes, triggers reindex
165
- 4. Monitor: `watcher_issues` for runtime embedding failures
166
-
167
- ### When to Reindex
168
- - **Rules scope** (`"rules"`): Changed rule matching patterns, set expressions, schema mappings. No re-embedding needed.
169
- - **Full scope** (`"full"`): Changed embedding config, added watch paths, broad schema restructuring. Re-embeds everything.
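The scope decision above can be sketched as a lookup. The change-kind labels below (`"match"`, `"embedding"`, and so on) are hypothetical names invented for this example; only the `"rules"` and `"full"` scope values come from the skill.

```python
# Hypothetical labels for the kinds of config change listed above.
RULES_SCOPE_CHANGES = {"match", "set", "schema_mapping"}
FULL_SCOPE_CHANGES = {"embedding", "watch_paths", "schema_restructure"}


def reindex_scope(changes: set[str]) -> str:
    """Pick the lightest reindex scope that covers every listed change."""
    unknown = changes - RULES_SCOPE_CHANGES - FULL_SCOPE_CHANGES
    if unknown:
        raise ValueError(f"unrecognised change kinds: {sorted(unknown)}")
    # Any change that requires re-embedding forces the full scope.
    return "full" if changes & FULL_SCOPE_CHANGES else "rules"
```

The returned string is the value to pass as `scope` to `watcher_reindex`.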
170
-
171
- ## Diagnostics
172
-
173
- ### Escalation Path
174
- 1. `watcher_status` — is the service healthy? Is a reindex running?
175
- 2. `watcher_issues` — what files are failing and why?
176
- 3. `watcher_query` with `$.issues` — same data via JSONPath
177
- 4. Check logs at the configured log path
178
-
179
- ### Error Categories
180
- - `type_collision` — metadata field type mismatch during extraction
181
- - `interpolation` — template/set expression failed to resolve
182
- - `read_failure` — file couldn't be read (permissions, encoding)
183
- - `embedding` — embedding API error
184
-
185
- ## Helper Management
186
-
187
- Helpers use namespace prefixing: the config key becomes the prefix. For example, a helper named `slack` exports `slack_extractParticipants`.
188
-
189
- Enumerate loaded helpers:
190
- ```
191
- $.mapHelpers — JsonMap helper namespaces with exports
192
- $.templateHelpers — Handlebars helper namespaces with exports
193
- ```
194
-
195
- ## CLI Fallbacks
196
-
197
- If the watcher API is down:
198
- - `jeeves-watcher status` — check if the service is running
199
- - `jeeves-watcher validate` — validate config from CLI
200
- - Restart via NSSM (Windows) or systemctl (Linux)
File without changes