@oomkapwn/enquire-mcp 2.10.0 → 2.12.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,163 @@

  All notable changes to this project will be documented here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and the project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [2.12.0] — 2026-05-09
+
+ **Sprint 12 — built-in retrieval-quality evaluation harness.** Closes the "you can't tune what you can't measure" gap. Before this, anyone trying to A/B test retrieval changes (graph_boost on/off, reranker on/off, different `min_signals` / `limit` values) had to write a custom script. Now there's a first-class `enquire-mcp eval` subcommand. **No other Obsidian-MCP currently ships a built-in retrieval evaluation harness.**
+
+ ### Added — `enquire-mcp eval --vault <path> --queries <file>`
+
+ Reads a JSONL file of queries with known-relevant doc paths, runs `obsidian_search` for each query, computes standard IR metrics, and reports per-query and aggregate scores.
+
+ **Input format** (one JSON object per line; blank lines and `//` comments are tolerated):
+
+ ```jsonl
+ {"id": "rkt", "query": "Apollo program rocket", "relevant": ["apollo.md", "saturn.md"]}
+ {"id": "food", "query": "carbonara recipe", "relevant": ["pasta.md"]}
+ ```
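A minimal TypeScript sketch of a parser for this format, assuming that `//` comments occupy whole lines and that `id`, `query`, and `relevant` are the required fields (the changelog does not spell out either rule). `parseQueriesJsonl` and `EvalQuery` are hypothetical names; the package's own `readQueriesJsonl` reads from a file and may differ in detail.

```typescript
// Hypothetical parser for the eval query file; not the package's readQueriesJsonl.
interface EvalQuery {
  id: string;
  query: string;
  relevant: string[];
}

function parseQueriesJsonl(text: string): EvalQuery[] {
  const queries: EvalQuery[] = [];
  text.split(/\r?\n/).forEach((raw, i) => {
    const lineNo = i + 1; // 1-based, for error messages
    const line = raw.trim();
    // Skip blank lines and whole-line `//` comments.
    if (line === "" || line.startsWith("//")) return;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch (err) {
      throw new Error(`line ${lineNo}: malformed JSON (${(err as Error).message})`);
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error(`line ${lineNo}: expected a JSON object`);
    }
    const rec = parsed as Record<string, unknown>;
    for (const field of ["id", "query"] as const) {
      if (typeof rec[field] !== "string") {
        throw new Error(`line ${lineNo}: missing or non-string field "${field}"`);
      }
    }
    const relevant = rec.relevant;
    if (!Array.isArray(relevant) || !relevant.every((p) => typeof p === "string")) {
      throw new Error(`line ${lineNo}: field "relevant" must be an array of strings`);
    }
    queries.push({ id: rec.id as string, query: rec.query as string, relevant: relevant as string[] });
  });
  return queries;
}
```

Reporting the 1-based line number in every error keeps a typo in a long query file cheap to find.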
+
+ **Metrics** (from Manning et al., "Introduction to Information Retrieval", Chapter 8):
+ - **NDCG@K** (Normalized Discounted Cumulative Gain) — penalizes relevant docs found low in the ranking; 1.0 is perfect, 0.0 is worst.
+ - **Recall@K** — fraction of the relevant docs found in the top K.
+ - **MRR** (Mean Reciprocal Rank) — 1/rank of the first relevant doc, averaged over all queries; a query with no relevant doc in its results scores 0.
+
+ Binary-relevance ground truth (each path in `relevant` is gain=1, others gain=0) — most users won't label graded relevance, so this is the practical default.
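Under binary relevance each metric reduces to a few lines. A minimal TypeScript sketch, assuming the standard log2(rank + 1) discount and the convention that a query with an empty relevant set scores 0; the function names echo those listed under Implementation, but this is an illustrative re-implementation, not the code in `src/eval.ts`.

```typescript
// Binary relevance: every path in `relevant` has gain 1, everything else 0.
// `ranked` is the search result list in rank order (rank 1 first).

function ndcgAtK(ranked: string[], relevant: Set<string>, k: number): number {
  // Assumed convention: no relevant docs (or K <= 0) scores 0, since IDCG would be 0.
  if (relevant.size === 0 || k <= 0) return 0;
  let dcg = 0;
  ranked.slice(0, k).forEach((doc, i) => {
    // Rank r = i + 1 is discounted by log2(r + 1) = log2(i + 2).
    if (relevant.has(doc)) dcg += 1 / Math.log2(i + 2);
  });
  // Ideal DCG: all relevant docs ranked first, truncated at K.
  let idcg = 0;
  for (let i = 0; i < Math.min(relevant.size, k); i++) idcg += 1 / Math.log2(i + 2);
  return dcg / idcg;
}

function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  // Count distinct relevant docs that appear in the top K.
  const hits = new Set(ranked.slice(0, k).filter((doc) => relevant.has(doc)));
  return hits.size / relevant.size;
}

function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  // Only the first relevant hit counts; 0 if none is retrieved.
  const idx = ranked.findIndex((doc) => relevant.has(doc));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// MRR (and the other "mean" aggregates) average the per-query scores.
function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}
```

For the `rkt` query above, the ranking `["saturn.md", "pasta.md", "apollo.md"]` scores NDCG@10 = 1.5 / (1 + 1/log2 3) ≈ 0.92, Recall@10 = 1, and reciprocal rank 1.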
+
+ **Flags:**
+ - `--k <n>` — top-K cutoff (default 10)
+ - `--matrix` — 2×2 sweep of (graph_boost ± reranker), printed as a comparison table with the best-NDCG config highlighted
+ - `--reranker` — enable cross-encoder reranking (same as `serve --enable-reranker`)
+ - `--reranker-model <alias>` / `--reranker-top-n <n>` — pass-through reranker config
+ - `--persistent-index` — open the FTS5 BM25 index for the eval (recommended; without it, the eval runs over TF-IDF only)
+ - `--per-query` — print per-query scores in addition to aggregates
+ - `--json` — emit machine-readable JSON (useful for piping into a comparison tool, dashboard, or CI gate)
+
+ **Example output:**
+
+ ```
+ enquire eval — default
+ 12 queries · k=10 · wall=2483ms
+
+ aggregate:
+   mean NDCG@10   = 0.7621
+   mean Recall@10 = 0.8333
+   mean MRR       = 0.8125
+   mean latency   = 187ms (per query)
+ ```
+
+ **Matrix mode example:**
+
+ ```
+ enquire eval matrix (4 configs)
+
+ label                    NDCG@10  Recall@10  MRR     latency
+ baseline (RRF only)      0.6420   0.7500     0.6250  142ms
+ +graph-boost             0.7150   0.8333     0.7083  148ms
+ +reranker                0.8210   0.8333     0.9583  421ms
+ +graph-boost +reranker   0.8345   0.9167     0.9583  428ms
+
+ best NDCG@10: +graph-boost +reranker (0.8345)
+ ```
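`--matrix` is a small grid search over two boolean switches. A minimal TypeScript sketch of the sweep, assuming an injected `evaluate` callback that runs one full eval per configuration; `EvalConfig`, `EvalScores`, `matrixConfigs`, and `runMatrix` are hypothetical names, not the package's API. Labels and row order mirror the table above.

```typescript
// Hypothetical config and score shapes; the package's real eval options are not shown here.
interface EvalConfig {
  label: string;
  graphBoost: boolean;
  reranker: boolean;
}

interface EvalScores {
  ndcg: number;
  recall: number;
  mrr: number;
  latencyMs: number;
}

// The 2x2 grid: graph boost off/on crossed with reranker off/on.
function matrixConfigs(): EvalConfig[] {
  const configs: EvalConfig[] = [];
  for (const reranker of [false, true]) {
    for (const graphBoost of [false, true]) {
      const parts = [graphBoost ? "+graph-boost" : "", reranker ? "+reranker" : ""].filter(Boolean);
      configs.push({ label: parts.length > 0 ? parts.join(" ") : "baseline (RRF only)", graphBoost, reranker });
    }
  }
  return configs;
}

async function runMatrix(
  evaluate: (cfg: EvalConfig) => Promise<EvalScores>
): Promise<{ rows: Array<EvalConfig & EvalScores>; best: EvalConfig & EvalScores }> {
  const rows: Array<EvalConfig & EvalScores> = [];
  // Sequential runs keep the per-config latency column comparable (a choice of this sketch).
  for (const cfg of matrixConfigs()) rows.push({ ...cfg, ...(await evaluate(cfg)) });
  // Highlight the config with the highest mean NDCG.
  const best = rows.reduce((a, b) => (b.ndcg > a.ndcg ? b : a));
  return { rows, best };
}
```

Injecting `evaluate` keeps the sweep testable without a vault.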
62
+
63
+ ### Implementation
64
+
65
+ `src/eval.ts` (~340 lines):
66
+ - Pure-function metrics (`ndcgAtK`, `recallAtK`, `reciprocalRank`) — exact log2-based formulas, fully testable without I/O.
67
+ - `readQueriesJsonl(file)` — tolerates blank lines + `//` comments, throws with line numbers on malformed input.
68
+ - `runEval(opts)` — orchestrates per-query searchHybrid calls with per-query latency tracking and per-query failure isolation (one bad query doesn't sink the eval).
69
+ - `formatEvalResult` / `formatEvalMatrix` — TTY-aware ANSI rendering, plain text on pipes.
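Per-query failure isolation comes down to a timed try/catch around each search call. A minimal sketch, assuming an injected `search` function in place of the real `searchHybrid` call, whose signature is not shown here; `runQueries` and `QueryResult` are hypothetical. The changelog does not say how a failed query is scored (as 0, or dropped from the means), so the sketch only records the error.

```typescript
import { performance } from "node:perf_hooks";

interface QueryResult {
  id: string;
  ranked: string[] | null; // null when the search call threw
  latencyMs: number;
  error?: string;
}

// `search` stands in for the real hybrid search call (hypothetical signature).
async function runQueries(
  queries: Array<{ id: string; query: string }>,
  search: (q: string) => Promise<string[]>
): Promise<QueryResult[]> {
  const results: QueryResult[] = [];
  for (const q of queries) {
    const start = performance.now();
    try {
      const ranked = await search(q.query);
      results.push({ id: q.id, ranked, latencyMs: performance.now() - start });
    } catch (err) {
      // Record the failure and move on, so one bad query does not abort the run.
      results.push({
        id: q.id,
        ranked: null,
        latencyMs: performance.now() - start,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return results;
}
```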
+
+ ### Surface delta vs v2.11.0
+
+ - **+1 CLI subcommand** (`eval`)
+ - **+1 source module** (`src/eval.ts`)
+ - **No new MCP tools, no new prompts, no schema changes, no new prod deps.**
+
+ ### Tests
+
+ 547 unit tests pass (was 522 in v2.11.0, +25 new):
+ - **Pure metrics (+11):** ndcgAtK / recallAtK / reciprocalRank — empty relevant set, no overlap, perfect ranking, partial overlap, K-cutoff truncation, first-relevant-only MRR semantics.
+ - **readQueriesJsonl (+5):** valid input, blank lines + comments tolerated, malformed JSON throws with line number, missing required fields throws with field name, type-incorrect `relevant` rejected.
+ - **runEval end-to-end (+3):** single-query scoring against real FtsIndex, multi-query aggregation, per-query failure isolation.
+ - **format helpers (+6):** non-empty output, per-query mode includes table, matrix highlights best NDCG, empty matrix handled gracefully.
+
+ ### Migration
+
+ **No-op for default users.** Eval is opt-in via the new subcommand. Existing `serve` / `serve-http` / `setup` / `doctor` behavior is unchanged.
+
+ ### Strategic position
+
+ v2.12.0 is the **measurement** sprint that pairs with v2.11.0's onboarding sprint. Together they form a "tune-while-you-build" feedback loop: `setup` indexes your vault, `eval` scores your retrieval, and you adjust flags and re-run the eval until NDCG plateaus. Karpathy-style LLM Wiki users get systematic quality tuning for free. The retrieval-quality moat (hybrid RRF, graph-boost, PDF blending, cross-encoder reranking, OCR) now ships with a quantitative ruler in the box.
+
+ ### Bonus (PR #31)
+
+ Patched 3 fresh `hono` advisories that landed in the GHSA database overnight (CSS injection in JSX SSR, JWT NumericDate validation, Cache Middleware Vary handling). They arrive transitively via `@modelcontextprotocol/sdk → @hono/node-server → hono`. Lockfile-only diff via `npm audit fix`.
+
+ ## [2.11.0] — 2026-05-08
+
+ **Sprint 11 — zero-touch onboarding (`doctor` + `setup`).** Closes the biggest UX gap in the project: setup friction. Before this, getting full hybrid retrieval required 3 separate commands (`install-model` → `build-embeddings` → `serve --persistent-index`), and there was no quick way to see "is everything ready?" without triggering each codepath.
+
+ ### Added — `enquire-mcp doctor --vault <path>`
+
+ Read-only health check. Verifies every prerequisite for full hybrid retrieval:
+ - Vault path exists and is readable, with note/PDF/canvas counts (privacy filter applied)
+ - All 5 optional deps load cleanly: `better-sqlite3` (FTS5 + embed-db), `@huggingface/transformers` (ML embeddings + reranker), `pdfjs-dist` (PDF read + indexing), `tesseract.js` + `@napi-rs/canvas` (OCR for scanned PDFs)
+ - Embedding model cache — probes up to 6 candidate paths (the transformers.js v3 default `node_modules/@huggingface/transformers/.cache/Xenova/`, the `HF_HOME` and `TRANSFORMERS_CACHE` env vars, `~/.cache/huggingface/`, and on macOS `~/Library/Caches/huggingface/`)
+ - FTS5 BM25 index existence + per-vault file/chunk counts
+ - Embed-db existence + size
+
+ Color-coded ✓ / ⚠ / ✗ output (auto-detects a TTY so piped output stays clean). Exits with code 0 if everything is ready, 1 if any critical piece is missing. `--json` flag for machine-readable output (useful for CI / scripted setup checks).
+
+ ### Added — `enquire-mcp setup --vault <path>`
+
+ Zero-touch onboarding. Runs the install + build sequence in one command:
+
+ 1. **Step 1/3:** Cold-build FTS5 BM25 index (`syncFtsIndex` + optional `syncPdfFtsIndex` if `--include-pdfs`)
+ 2. **Step 2/3:** Install embedding model (downloads ~120 MB for the default `multilingual` model, cached for reuse)
+ 3. **Step 3/3:** Build embedding index (`syncEmbedDb` + optional `syncPdfEmbedDb`)
+
+ Idempotent — re-running on a fully set-up vault is a fast no-op pass that just reports the existing state. `--skip-embeddings` for users who only want BM25. `--include-pdfs` for vaults with PDFs.
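One way to get this idempotency is to give every step a cheap completion check and skip steps whose output already exists. A minimal TypeScript sketch, assuming such a check per step; `SetupStep` and `runSetup` are hypothetical, and the changelog does not say how `setup` actually detects finished work (the underlying `syncFtsIndex` / `syncEmbedDb` calls may simply be incremental).

```typescript
// Hypothetical step shape: each step can report whether its output already exists.
interface SetupStep {
  name: string;
  isDone: () => Promise<boolean>;
  run: () => Promise<void>;
}

async function runSetup(steps: SetupStep[], log: (msg: string) => void = console.log): Promise<void> {
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    const prefix = `>> Step ${i + 1}/${steps.length}: ${step.name}`;
    if (await step.isDone()) {
      // Already built: report the existing state and rebuild nothing.
      log(`${prefix} ... already done`);
      continue;
    }
    log(`${prefix} ...`);
    await step.run();
  }
}
```

Running it twice on the same vault logs every step both times but executes each unfinished step only once.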
+
+ After a successful setup, prints the exact `serve` command to run.
+
+ ### Surface delta vs v2.10.0
+
+ - **+2 CLI subcommands** (`doctor`, `setup`)
+ - **+1 source module** (`src/doctor.ts`, ~310 lines)
+ - **No new tools, no new prompts, no schema changes, no new deps.**
+
+ ### Tests
+
+ 522 unit tests pass (was 509 in v2.10.0, +13 new):
+ - **runDoctor (+8):** result shape contract, vault check ok-vs-error, optional-dep checks (5 deps), model-cache check missing-vs-ok with synthetic Xenova dir, FTS5 + embed-db checks not-built status, ready boolean correctness against summary tally.
+ - **formatCheck + formatDoctorResult (+5):** non-empty output for each status, detail + hint inclusion, hint omission for ok status, banner shape, NOT-READY verdict on failures.
+
+ ### Migration
+
+ **No-op for default users.** Both new subcommands are opt-in. Existing `serve` / `serve-http` / `index` / `build-embeddings` behavior is unchanged.
+
+ ### Strategic position
+
+ v2.11.0 is a UX-focused sprint, not a capability sprint. The retrieval moats (hybrid RRF, graph-boost, PDF + OCR, cross-encoder reranking) all stayed put. What changed: the **time-to-first-useful-result** drops from ~5 minutes (figure out 3 commands, paste them, wait) to ~30 seconds (`enquire-mcp setup --vault <path>` and you're done).
+
+ Demo flow:
+
+ ```bash
+ $ enquire-mcp doctor --vault ~/Obsidian
+ NOT READY — 1 missing/error, 0 warnings, 7 ok
+ ✗ Embedding model cache → enquire-mcp install-model multilingual
+
+ $ enquire-mcp setup --vault ~/Obsidian
+ >> Step 1/3: Cold-build FTS5 index ...
+ >> Step 2/3: Install embedding model ...
+ >> Step 3/3: Build embedding index ...
+ ✓ Setup complete. Now run:
+ enquire-mcp serve --vault ~/Obsidian --persistent-index
+
+ $ enquire-mcp doctor --vault ~/Obsidian
+ READY — all critical checks pass (8 ok, 0 warnings)
+ ```
+
  ## [2.10.0] — 2026-05-08

  **Sprint 10 — OCR for image-only / scanned PDFs.** Closes the v2.7-v2.8-v2.9 PDF retrieval story. v2.7.0 added text-extraction tools; v2.8.0 blended PDF chunks into hybrid search; v2.9.0 added cross-encoder reranking. v2.10.0 makes the **scanned / camera-captured** PDFs in your vault searchable too — Tesseract.js OCR over each page bitmap.
package/README.md CHANGED
@@ -42,14 +42,19 @@ That's it. Your AI now has structured access to wikilinks, backlinks, frontmatte
  }
  ```

- **Want hybrid retrieval at full power?** One-time setup, ~10 min for a 100-note vault:
+ **Want hybrid retrieval at full power?** One command (v2.11.0):

  ```bash
- enquire-mcp install-model multilingual # ~120MB, 50+ languages
- enquire-mcp build-embeddings --vault <path> # ~30ms/chunk on M1
+ enquire-mcp setup --vault <path> # downloads model, builds FTS5 + embed indexes
  # then: serve --persistent-index for BM25 + --enable-reranker for cross-encoder
  ```

+ Already set up? Check status anytime:
+
+ ```bash
+ enquire-mcp doctor --vault <path> # color-coded ✓/⚠/✗ health check
+ ```
+
  ---

  ## 🎯 The only Obsidian-MCP with…
@@ -58,6 +63,7 @@ enquire-mcp build-embeddings --vault <path> # ~30ms/chunk on M1
  - ✅ **Cross-encoder reranking** on top of RRF (+5-10 NDCG@10) — `v2.9.0`
  - ✅ **PDFs blended into hybrid search** with `[page: N]` citation markers — `v2.8.0`
  - ✅ **OCR for scanned / image-only PDFs** (Tesseract.js, multilingual) — `v2.10.0`
+ - ✅ **Built-in retrieval-quality eval** (`enquire-mcp eval` — NDCG@K, Recall@K, MRR, A/B matrix) — `v2.12.0`
  - ✅ **Wikilink graph-boost** as a retrieval signal (1-step personalised PageRank seeded by RRF top-K)
  - ✅ **Remote MCP** over HTTP with bearer auth + rate-limit + CORS — `v2.6.0`
  - ✅ **Multilingual** semantic search (50+ languages, runs on CPU, free)
@@ -113,13 +119,14 @@ graph LR
  | **PDFs blended into hybrid search** | ❌ | ❌ | ✅ **only here** |
  | **OCR for scanned / image-only PDFs** | ❌ | ❌ | ✅ **only here** |
  | **Cross-encoder reranking** | ❌ | ❌ | ✅ **only here** |
+ | **Built-in retrieval-quality eval** (NDCG@K + matrix) | ❌ | ❌ | ✅ **only here** |
  | **Remote MCP (HTTP + bearer auth)** | ❌ | ❌ | ✅ **only here** |
  | Per-signal observability per hit | ❌ | ❌ | ✅ |
  | Privacy filter (exclude/allow globs) | ❌ | n/a | ✅ verified at search + write paths |
  | Standalone (no Obsidian plugin) | varies | ❌ requires Obsidian | ✅ direct vault read |
  | MCP-native (any agent) | varies | ❌ Obsidian-only | ✅ stdio + HTTP |
  | SLSA-3 release provenance | ❌ | n/a | ✅ |
- | Test suite | rare | n/a | ✅ 507 unit tests |
+ | Test suite | rare | n/a | ✅ 547 unit tests |

  > **Strategic claim:** enquire is the open-source backend for [Karpathy-style LLM Wikis](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) on top of your existing Obsidian vault. The `vault_synth` / `vault_wiki_compile` / `vault_lint_extended` prompts implement the ingest → query → lint → compile workflow natively over `.md` + `[[wikilinks]]`. Knowledge that compounds, traceable to sources.

@@ -172,7 +179,7 @@ The flags you'll actually use:
  | `--watch` | off | Live invalidation on `.md` add/change/unlink |
  | `--persistent-cache` | off | Survive cold starts |

- Subcommands: `serve` · `serve-http` · `gen-token` · `clear-cache` · `clear-index` · `clear-embeddings` · `index` · `install-model` · `build-embeddings`.
+ Subcommands: `serve` · `serve-http` · `gen-token` · `doctor` (v2.11) · `setup` (v2.11) · `eval` (v2.12) · `clear-cache` · `clear-index` · `clear-embeddings` · `index` · `install-model` · `build-embeddings`.

  **Remote MCP** for Claude.ai web / ChatGPT / Cursor HTTP / mobile:

@@ -200,7 +207,7 @@ enquire-mcp serve-http \

  | Surface | Posture |
  |---|---|
- | Tests | 507 unit tests across 25 files, 8 required CI gates per PR |
+ | Tests | 547 unit tests across 27 files, 8 required CI gates per PR |
  | Coverage | Lines ≥86%, statements ≥82%, functions ≥75%, branches ≥73% (gated) |
  | Audit | `npm audit --audit-level=moderate` for prod; high for dev |
  | CI | Ubuntu × {Node 20, 22, 24} required + macOS advisory job |
package/dist/doctor.d.ts ADDED
@@ -0,0 +1,54 @@
+ import type { EmbeddingModel } from "./embeddings.js";
+ /** Severity buckets surfaced in the diagnostic UI. */
+ export type CheckStatus = "ok" | "warn" | "missing" | "error";
+ export interface DoctorCheck {
+     /** Stable id for programmatic consumers (e.g. JSON output). */
+     id: string;
+     /** Human-readable label (rendered next to the status icon). */
+     label: string;
+     status: CheckStatus;
+     /** Optional detail line printed below the label. */
+     detail?: string;
+     /** Optional hint — usually the command that fixes it. */
+     hint?: string;
+ }
+ export interface DoctorResult {
+     vault: string;
+     /** True iff every `missing`/`error` check is absent (`warn` is OK). */
+     ready: boolean;
+     checks: DoctorCheck[];
+     /** Tally for quick consumer reporting. */
+     summary: {
+         ok: number;
+         warn: number;
+         missing: number;
+         error: number;
+     };
+ }
+ /** Render one DoctorCheck to a multi-line string. */
+ export declare function formatCheck(check: DoctorCheck): string;
+ /** Render a full DoctorResult to a banner string. */
+ export declare function formatDoctorResult(result: DoctorResult): string;
+ export interface RunDoctorOptions {
+     vault: string;
+     /** Override default cache root (mostly for tests). */
+     modelCacheRoot?: string;
+     /** Override default embed-db location. */
+     embedFile?: string;
+     /** Override default FTS5 index location. */
+     indexFile?: string;
+     /** Default model alias to check for (matches DEFAULT_MODEL_ALIAS). */
+     modelAlias?: string;
+     /**
+      * Embedding-model catalog entry — passed in to avoid pulling
+      * `@huggingface/transformers` into this module. Caller resolves it via
+      * `resolveModel(alias)` from src/embeddings.ts.
+      */
+     modelEntry?: EmbeddingModel;
+ }
+ /**
+  * Run all the diagnostic checks. Pure data — caller decides how to
+  * render (CLI banner, JSON, MCP tool response).
+  */
+ export declare function runDoctor(opts: RunDoctorOptions): Promise<DoctorResult>;
+ //# sourceMappingURL=doctor.d.ts.map
package/dist/doctor.d.ts.map ADDED
@@ -0,0 +1 @@
+ {"version":3,"file":"doctor.d.ts","sourceRoot":"","sources":["../src/doctor.ts"],"names":[],"mappings":"AA8BA,OAAO,KAAK,EAAE,cAAc,EAAE,MAAM,iBAAiB,CAAC;AAItD,sDAAsD;AACtD,MAAM,MAAM,WAAW,GAAG,IAAI,GAAG,MAAM,GAAG,SAAS,GAAG,OAAO,CAAC;AAE9D,MAAM,WAAW,WAAW;IAC1B,+DAA+D;IAC/D,EAAE,EAAE,MAAM,CAAC;IACX,+DAA+D;IAC/D,KAAK,EAAE,MAAM,CAAC;IACd,MAAM,EAAE,WAAW,CAAC;IACpB,oDAAoD;IACpD,MAAM,CAAC,EAAE,MAAM,CAAC;IAChB,yDAAyD;IACzD,IAAI,CAAC,EAAE,MAAM,CAAC;CACf;AAED,MAAM,WAAW,YAAY;IAC3B,KAAK,EAAE,MAAM,CAAC;IACd,uEAAuE;IACvE,KAAK,EAAE,OAAO,CAAC;IACf,MAAM,EAAE,WAAW,EAAE,CAAC;IACtB,0CAA0C;IAC1C,OAAO,EAAE;QAAE,EAAE,EAAE,MAAM,CAAC;QAAC,IAAI,EAAE,MAAM,CAAC;QAAC,OAAO,EAAE,MAAM,CAAC;QAAC,KAAK,EAAE,MAAM,CAAA;KAAE,CAAC;CACvE;AAYD,qDAAqD;AACrD,wBAAgB,WAAW,CAAC,KAAK,EAAE,WAAW,GAAG,MAAM,CAatD;AAED,qDAAqD;AACrD,wBAAgB,kBAAkB,CAAC,MAAM,EAAE,YAAY,GAAG,MAAM,CAY/D;AA4DD,MAAM,WAAW,gBAAgB;IAC/B,KAAK,EAAE,MAAM,CAAC;IACd,sDAAsD;IACtD,cAAc,CAAC,EAAE,MAAM,CAAC;IACxB,0CAA0C;IAC1C,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,4CAA4C;IAC5C,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,sEAAsE;IACtE,UAAU,CAAC,EAAE,MAAM,CAAC;IACpB;;;;OAIG;IACH,UAAU,CAAC,EAAE,cAAc,CAAC;CAC7B;AAED;;;GAGG;AACH,wBAAsB,SAAS,CAAC,IAAI,EAAE,gBAAgB,GAAG,OAAO,CAAC,YAAY,CAAC,CAwO7E"}
package/dist/doctor.js ADDED
@@ -0,0 +1,369 @@
+ // Diagnostic + auto-setup for enquire-mcp.
+ //
+ // v2.11.0 — closes the biggest UX gap in the project: setup friction.
+ // Before this, getting full hybrid retrieval required 3 separate commands
+ // (`install-model` → `build-embeddings` → `serve --persistent-index`),
+ // and there was no quick way to see "is everything ready?" without
+ // triggering each codepath.
+ //
+ // Two new subcommands:
+ //
+ //   enquire-mcp doctor --vault <path>
+ //     Read-only health check. Lists every prerequisite for full hybrid
+ //     retrieval (vault path, optional deps, embedding model cache, FTS5
+ //     index, embed.db). Color-coded ✓ / ⚠ / ✗. Returns 0 if everything
+ //     is ready, 1 if any critical piece is missing.
+ //
+ //   enquire-mcp setup --vault <path>
+ //     Runs the install + build sequence in order, with progress messages
+ //     at each stage. Calls install-model + cold-build FTS5 + build-
+ //     embeddings under the hood. Idempotent — re-running on a fully
+ //     set-up vault is a no-op pass.
+ //
+ // Both are pure orchestration over existing CLI/library code — no new
+ // runtime deps, no schema changes. Same privacy filter applies (the
+ // doctor walks the vault via Vault.listMarkdown so excluded paths are
+ // hidden from its counts).
+ import { existsSync, promises as fs, statSync } from "node:fs";
+ import * as os from "node:os";
+ import * as path from "node:path";
+ import { defaultIndexFile, FtsIndex } from "./fts5.js";
+ import { Vault } from "./vault.js";
+ /** Simple ANSI color helpers — autodetect TTY so piped output stays clean. */
+ const isTty = process.stdout.isTTY === true;
+ const c = {
+     green: (s) => (isTty ? `\x1b[32m${s}\x1b[0m` : s),
+     yellow: (s) => (isTty ? `\x1b[33m${s}\x1b[0m` : s),
+     red: (s) => (isTty ? `\x1b[31m${s}\x1b[0m` : s),
+     dim: (s) => (isTty ? `\x1b[2m${s}\x1b[0m` : s),
+     bold: (s) => (isTty ? `\x1b[1m${s}\x1b[0m` : s)
+ };
+ /** Render one DoctorCheck to a multi-line string. */
+ export function formatCheck(check) {
+     const icon = check.status === "ok"
+         ? c.green("✓")
+         : check.status === "warn"
+             ? c.yellow("⚠")
+             : check.status === "missing"
+                 ? c.red("✗")
+                 : c.red("✗");
+     const lines = [`${icon} ${check.label}`];
+     if (check.detail)
+         lines.push(c.dim(` ${check.detail}`));
+     if (check.hint && check.status !== "ok")
+         lines.push(c.dim(` → ${check.hint}`));
+     return lines.join("\n");
+ }
+ /** Render a full DoctorResult to a banner string. */
+ export function formatDoctorResult(result) {
+     const lines = [];
+     lines.push(c.bold(`enquire-mcp doctor — ${result.vault}`));
+     lines.push("");
+     for (const check of result.checks)
+         lines.push(formatCheck(check));
+     lines.push("");
+     const { ok, warn, missing, error } = result.summary;
+     const verdict = result.ready
+         ? c.green(`READY — all critical checks pass (${ok} ok, ${warn} warnings)`)
+         : c.red(`NOT READY — ${missing + error} missing/error, ${warn} warnings, ${ok} ok`);
+     lines.push(verdict);
+     return lines.join("\n");
+ }
+ /**
+  * Candidate locations where transformers.js may have cached embedding model
+  * weights. We probe all of them and report `ok` if any contains data.
+  *
+  * Why multiple paths:
+  * - transformers.js v3+ default: `<package>/.cache/Xenova/...` (lives
+  *   inside `node_modules/@huggingface/transformers/.cache`, the
+  *   library's own cache dir relative to its install location).
+  * - Older HuggingFace Hub convention: `~/.cache/huggingface/...`.
+  * - macOS XDG override: `~/Library/Caches/huggingface/...`.
+  * - Custom env var: HF_HOME or TRANSFORMERS_CACHE if the user set them.
+  *
+  * We don't try to load transformers.js to read `env.cacheDir` — that
+  * would defeat the doctor's "fast read-only health check" promise on
+  * users who haven't installed the optional dep at all.
+  */
+ function candidateModelCacheRoots() {
+     const candidates = [];
+     // 1. transformers.js v3+ default (lives inside the package itself).
+     //    Find the @huggingface/transformers install directory.
+     //    require.resolve doesn't exist in ESM; we walk node_modules ourselves
+     //    from cwd. If transformers.js isn't installed, this candidate just
+     //    won't exist on disk and gets filtered out.
+     candidates.push(path.join(process.cwd(), "node_modules", "@huggingface", "transformers", ".cache"));
+     // 2. HuggingFace Hub conventions.
+     if (process.env.HF_HOME)
+         candidates.push(path.join(process.env.HF_HOME, "hub"));
+     if (process.env.TRANSFORMERS_CACHE)
+         candidates.push(process.env.TRANSFORMERS_CACHE);
+     candidates.push(path.join(os.homedir(), ".cache", "huggingface", "transformers.js"));
+     candidates.push(path.join(os.homedir(), ".cache", "huggingface"));
+     // 3. macOS XDG-ish convention.
+     if (process.platform === "darwin") {
+         candidates.push(path.join(os.homedir(), "Library", "Caches", "huggingface"));
+     }
+     return candidates;
+ }
+ /**
+  * Default `.embed.db` location for a given vault root — same convention as
+  * the rest of the codebase. Mirrors `embedDbPath` in src/index.ts.
+  */
+ function defaultEmbedDbFile(vaultRoot) {
+     return defaultIndexFile(vaultRoot).replace(/\.fts5\.db$/, ".embed.db");
+ }
+ /**
+  * Probe whether an optional dep is loadable in this process. Uses a
+  * dynamic import inside a try/catch so we never crash the diagnostic
+  * on a missing or broken native binding.
+  */
+ async function probeOptionalDep(spec) {
+     try {
+         await import(spec);
+         return true;
+     }
+     catch {
+         return false;
+     }
+ }
+ /**
+  * Run all the diagnostic checks. Pure data — caller decides how to
+  * render (CLI banner, JSON, MCP tool response).
+  */
+ export async function runDoctor(opts) {
+     const checks = [];
+     const vault = new Vault(opts.vault);
+     // 1. Vault path exists + is readable.
+     let vaultExists = false;
+     try {
+         await vault.ensureExists();
+         vaultExists = true;
+         const noteCount = (await vault.listMarkdown()).length;
+         const pdfCount = (await vault.listFilesByExtension(".pdf")).length;
+         const canvasCount = (await vault.listFilesByExtension(".canvas")).length;
+         checks.push({
+             id: "vault",
+             label: `Vault accessible at ${opts.vault}`,
+             status: "ok",
+             detail: `${noteCount} markdown · ${pdfCount} pdf · ${canvasCount} canvas (privacy filter applied)`
+         });
+     }
+     catch (err) {
+         const msg = err instanceof Error ? err.message : String(err);
+         checks.push({
+             id: "vault",
+             label: `Vault path ${opts.vault}`,
+             status: "error",
+             detail: msg,
+             hint: "Check the path exists and is a directory"
+         });
+     }
+     // 2. better-sqlite3 — gates --persistent-index + ML embed-db.
+     const hasSqlite = await probeOptionalDep("better-sqlite3");
+     checks.push({
+         id: "dep:better-sqlite3",
+         label: "better-sqlite3 (FTS5 BM25 + embedding store)",
+         status: hasSqlite ? "ok" : "missing",
+         detail: hasSqlite ? "loaded; native binding works" : undefined,
+         hint: hasSqlite ? undefined : "npm install better-sqlite3 (or remove --omit=optional from your install)"
+     });
+     // 3. @huggingface/transformers — gates ML embeddings + reranker.
+     const hasTransformers = await probeOptionalDep("@huggingface/transformers");
+     checks.push({
+         id: "dep:transformers",
+         label: "@huggingface/transformers (ML embeddings + cross-encoder reranker)",
+         status: hasTransformers ? "ok" : "missing",
+         detail: hasTransformers ? "loaded; ONNX runtime available" : undefined,
+         hint: hasTransformers ? undefined : "npm install @huggingface/transformers"
+     });
+     // 4. pdfjs-dist — gates obsidian_read_pdf + PDF retrieval.
+     const hasPdfjs = await probeOptionalDep("pdfjs-dist/legacy/build/pdf.mjs");
+     checks.push({
+         id: "dep:pdfjs",
+         label: "pdfjs-dist (PDF read + indexing)",
+         status: hasPdfjs ? "ok" : "warn",
+         detail: hasPdfjs ? "loaded" : "PDFs in vault won't be indexable",
+         hint: hasPdfjs ? undefined : "npm install pdfjs-dist@^4.10.38 (skip if you have no PDFs)"
+     });
+     // 5. tesseract.js + @napi-rs/canvas — gates obsidian_ocr_pdf.
+     const [hasTesseract, hasCanvas] = await Promise.all([
+         probeOptionalDep("tesseract.js"),
+         probeOptionalDep("@napi-rs/canvas")
+     ]);
+     if (hasTesseract && hasCanvas) {
+         checks.push({
+             id: "dep:ocr",
+             label: "tesseract.js + @napi-rs/canvas (OCR for scanned PDFs)",
+             status: "ok",
+             detail: "both loaded; PDF OCR ready"
+         });
+     }
+     else {
+         checks.push({
+             id: "dep:ocr",
+             label: "tesseract.js + @napi-rs/canvas (OCR for scanned PDFs)",
+             status: "warn",
+             detail: `tesseract.js=${hasTesseract ? "ok" : "missing"} · canvas=${hasCanvas ? "ok" : "missing"}`,
+             hint: "npm install tesseract.js @napi-rs/canvas (skip if you have no scanned PDFs)"
+         });
+     }
+     // 6. Embedding model cache — does the user have weights downloaded?
+     //    Probe every candidate path; whichever has Xenova-style model dirs
+     //    wins. Fall back to "missing" only if every candidate is empty/absent.
+     const cacheRoots = opts.modelCacheRoot ? [opts.modelCacheRoot] : candidateModelCacheRoots();
+     let foundCacheRoot = null;
+     let cachedCount = 0;
+     let cacheBytes = 0;
+     for (const cacheRoot of cacheRoots) {
+         if (!existsSync(cacheRoot))
+             continue;
+         try {
+             // Look for at least one Xenova/* directory or any direct model dir
+             // (transformers.js stores models as `Xenova/<model-id>`).
+             const xenovaPath = path.join(cacheRoot, "Xenova");
+             if (existsSync(xenovaPath)) {
+                 const sub = await fs.readdir(xenovaPath, { withFileTypes: true });
+                 const models = sub.filter((e) => e.isDirectory());
+                 if (models.length > 0) {
+                     foundCacheRoot = cacheRoot;
+                     cachedCount = models.length;
+                     // Best-effort size sum — bounded per model dir.
+                     for (const m of models) {
+                         try {
+                             const files = await fs.readdir(path.join(xenovaPath, m.name));
+                             for (const f of files) {
+                                 try {
+                                     cacheBytes += statSync(path.join(xenovaPath, m.name, f)).size;
+                                 }
+                                 catch {
+                                     /* skip */
+                                 }
+                             }
+                         }
+                         catch {
+                             /* skip */
+                         }
+                     }
+                     break;
+                 }
+             }
+         }
+         catch {
+             /* try next candidate */
+         }
+     }
+     if (foundCacheRoot && cachedCount > 0) {
+         checks.push({
+             id: "model:cache",
+             label: "Embedding model cache",
+             status: "ok",
+             detail: `${cachedCount} model(s) cached under ${foundCacheRoot}/Xenova/ (~${Math.round(cacheBytes / 1024 / 1024)} MB)`
+         });
+     }
+     else {
+         checks.push({
+             id: "model:cache",
+             label: "Embedding model cache",
+             status: "missing",
+             detail: "no Xenova model weights found in any standard cache location",
+             hint: opts.modelEntry
+                 ? `enquire-mcp install-model ${opts.modelEntry.alias} (~${opts.modelEntry.approxSizeMB} MB)`
+                 : "enquire-mcp install-model multilingual"
+         });
+     }
+     // 7. FTS5 index — does the persistent index exist for this vault?
+     if (vaultExists) {
+         const indexFile = opts.indexFile ?? defaultIndexFile(vault.root);
+         if (existsSync(indexFile) && hasSqlite) {
+             // Open + close to count files/chunks. If something's off, surface it
+             // as a warn (not missing — caller can still serve without the index).
+             try {
+                 const idx = new FtsIndex({ file: indexFile, vaultRoot: vault.root });
+                 await idx.open();
+                 const totalFiles = idx.totalFiles();
+                 const totalChunks = idx.totalChunks();
+                 idx.close();
+                 checks.push({
+                     id: "index:fts5",
+                     label: "FTS5 BM25 index",
+                     status: "ok",
+                     detail: `${indexFile} — ${totalFiles} files / ${totalChunks} chunks`
+                 });
+             }
+             catch (err) {
+                 const msg = err instanceof Error ? err.message : String(err);
+                 checks.push({
+                     id: "index:fts5",
+                     label: "FTS5 BM25 index",
+                     status: "warn",
+                     detail: `${indexFile} present but failed to open: ${msg}`,
+                     hint: `enquire-mcp clear-index --vault ${opts.vault} && enquire-mcp index --vault ${opts.vault}`
+                 });
+             }
+         }
+         else {
+             checks.push({
+                 id: "index:fts5",
+                 label: "FTS5 BM25 index",
+                 status: "warn",
+                 detail: hasSqlite ? `${indexFile} not built` : "needs better-sqlite3 first",
+                 hint: hasSqlite ? `enquire-mcp index --vault ${opts.vault}` : "install better-sqlite3 first"
+             });
+         }
+     }
+     // 8. Embedding index — does the .embed.db exist for this vault?
+     if (vaultExists) {
+         const embedFile = opts.embedFile ?? defaultEmbedDbFile(vault.root);
+         if (existsSync(embedFile) && hasSqlite && hasTransformers) {
+             // Don't open the file (loading the model is expensive); just stat it
+             // and rely on the existence + size check.
+             try {
+                 const sz = statSync(embedFile).size;
+                 checks.push({
+                     id: "index:embed",
+                     label: "Embedding index (.embed.db)",
+                     status: "ok",
+                     detail: `${embedFile} — ${(sz / 1024 / 1024).toFixed(1)} MB`
+                 });
+             }
+             catch (err) {
+                 const msg = err instanceof Error ? err.message : String(err);
+                 checks.push({
+                     id: "index:embed",
+                     label: "Embedding index (.embed.db)",
+                     status: "warn",
+                     detail: msg,
+                     hint: `enquire-mcp clear-embeddings --vault ${opts.vault} && enquire-mcp build-embeddings --vault ${opts.vault}`
+                 });
+             }
+         }
+         else {
+             const blockers = [];
+             if (!hasSqlite)
+                 blockers.push("better-sqlite3");
+             if (!hasTransformers)
+                 blockers.push("@huggingface/transformers");
+             checks.push({
+                 id: "index:embed",
+                 label: "Embedding index (.embed.db)",
+                 status: "warn",
+                 detail: blockers.length > 0
+                     ? `blocked on: ${blockers.join(", ")}`
+                     : `${embedFile} not built — semantic-search-only path will use TF-IDF cosine`,
+                 hint: blockers.length > 0
+                     ? `npm install ${blockers.join(" ")}`
+                     : `enquire-mcp build-embeddings --vault ${opts.vault}`
+             });
+         }
+     }
+     // Tally the summary.
+     const summary = { ok: 0, warn: 0, missing: 0, error: 0 };
+     for (const ch of checks)
+         summary[ch.status] += 1;
+     // "ready" means: no missing or error. Warnings are advisory — you can
+     // still serve a useful subset of the surface (e.g. without ML embeddings).
+     const ready = summary.missing === 0 && summary.error === 0;
+     return { vault: opts.vault, ready, checks, summary };
+ }
+ //# sourceMappingURL=doctor.js.map
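The readiness rule at the end of `runDoctor` and the documented exit codes (0 when ready, 1 when any critical check is missing) fit together as follows. A standalone TypeScript sketch that restates the tally logic from the listing above; `summarize` and `doctorExitCode` are hypothetical names, and the CLI code that actually sets the exit code is not part of this diff.

```typescript
type CheckStatus = "ok" | "warn" | "missing" | "error";

interface Summary {
  ok: number;
  warn: number;
  missing: number;
  error: number;
}

// Same tally rule as runDoctor: warnings are advisory, missing/error block readiness.
function summarize(statuses: CheckStatus[]): { summary: Summary; ready: boolean } {
  const summary: Summary = { ok: 0, warn: 0, missing: 0, error: 0 };
  for (const s of statuses) summary[s] += 1;
  return { summary, ready: summary.missing === 0 && summary.error === 0 };
}

// Hypothetical CLI mapping matching the documented behavior: 0 if ready, 1 otherwise.
function doctorExitCode(statuses: CheckStatus[]): number {
  return summarize(statuses).ready ? 0 : 1;
}
```

Under this rule a vault without PDF support (pdfjs-dist and the OCR deps both reported as `warn`) still counts as ready and exits 0.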