@oomkapwn/enquire-mcp 3.9.0-rc.2 → 3.9.0-rc.20
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +697 -0
- package/README.md +17 -17
- package/SECURITY.md +18 -12
- package/assets/social-preview.png +0 -0
- package/dist/bases.d.ts +23 -0
- package/dist/bases.d.ts.map +1 -1
- package/dist/bases.js +29 -4
- package/dist/bases.js.map +1 -1
- package/dist/cli.d.ts.map +1 -1
- package/dist/cli.js +62 -4
- package/dist/cli.js.map +1 -1
- package/dist/communities.d.ts +7 -1
- package/dist/communities.d.ts.map +1 -1
- package/dist/communities.js +7 -3
- package/dist/communities.js.map +1 -1
- package/dist/doctor.d.ts +12 -0
- package/dist/doctor.d.ts.map +1 -1
- package/dist/doctor.js +35 -2
- package/dist/doctor.js.map +1 -1
- package/dist/dql.d.ts +10 -0
- package/dist/dql.d.ts.map +1 -1
- package/dist/dql.js +13 -1
- package/dist/dql.js.map +1 -1
- package/dist/embeddings.d.ts +1 -1
- package/dist/embeddings.js +1 -1
- package/dist/eval.d.ts +14 -0
- package/dist/eval.d.ts.map +1 -1
- package/dist/eval.js +12 -2
- package/dist/eval.js.map +1 -1
- package/dist/hnsw.d.ts.map +1 -1
- package/dist/hnsw.js +5 -1
- package/dist/hnsw.js.map +1 -1
- package/dist/http-transport.d.ts.map +1 -1
- package/dist/http-transport.js +19 -5
- package/dist/http-transport.js.map +1 -1
- package/dist/index.d.ts +1 -1
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +1 -1
- package/dist/index.js.map +1 -1
- package/dist/ocr.d.ts +97 -19
- package/dist/ocr.d.ts.map +1 -1
- package/dist/ocr.js +145 -25
- package/dist/ocr.js.map +1 -1
- package/dist/pdf.js +1 -1
- package/dist/pdf.js.map +1 -1
- package/dist/server.d.ts.map +1 -1
- package/dist/server.js +18 -2
- package/dist/server.js.map +1 -1
- package/dist/tool-registry.d.ts.map +1 -1
- package/dist/tool-registry.js +5 -3
- package/dist/tool-registry.js.map +1 -1
- package/dist/tools/meta.d.ts +35 -0
- package/dist/tools/meta.d.ts.map +1 -1
- package/dist/tools/meta.js +131 -1
- package/dist/tools/meta.js.map +1 -1
- package/dist/tools/search.d.ts +44 -0
- package/dist/tools/search.d.ts.map +1 -1
- package/dist/tools/search.js +72 -13
- package/dist/tools/search.js.map +1 -1
- package/dist/watcher.d.ts +52 -1
- package/dist/watcher.d.ts.map +1 -1
- package/dist/watcher.js +138 -20
- package/dist/watcher.js.map +1 -1
- package/docs/COMPARISON.md +4 -4
- package/docs/QUICKSTART.md +2 -2
- package/docs/api.md +17 -4
- package/docs/benchmarks.md +51 -8
- package/package.json +5 -4
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,703 @@
All notable changes to this project will be documented here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and the project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [3.9.0-rc.20] — 2026-05-29
> **TL;DR:** **CI hardening — kill the recurring `npm ci` flake that just failed a release (sprint RC 12).** The rc.19 release **failed at the assert-CI gate** because the squash-merge commit's `test (24)` leg flaked: `npm ci` → `onnxruntime-node` postinstall → CDN `ETIMEDOUT` (same transient flake as rc.9; the rc.19 PR was all-green, only the main-push re-run flaked). Re-running the job published rc.19 — but a transient network blip should never fail a release. All **10 `npm ci` steps** across the 3 workflows are now wrapped in a **dependency-free bash retry loop** (3 attempts, 15s backoff — no marketplace retry action, so nothing new to SHA-pin per rc.14's supply-chain posture). New **OIA Check 10** fails CI if any bare `- run: npm ci` reappears (detection-power verified: injected one → flags `publish-docs.yml`; clean after). **993 tests unchanged** (workflows + audit-script + docs only).
**Patch — CI/supply-chain hardening (sprint RC 12). Workflows/audit-script/docs only; no `src/` runtime change.**
### Fixed
- **Recurring `npm ci` release-failing flake.** `onnxruntime-node`'s postinstall (`node ./script/install`) downloads its native binary from a CDN that intermittently times out; a bare `- run: npm ci` then fails the whole job — and when it hits the squash-merge commit's CI, `release.yml`'s "assert required CI checks passed" gate correctly refuses to publish (it did, on rc.19). All **10** `npm ci` invocations (`ci.yml` ×8, `release.yml`, `publish-docs.yml`) now run inside:
```bash
for n in 1 2 3; do
  npm ci && break
  [ "$n" -eq 3 ] && { echo "::error::npm ci failed after 3 attempts"; exit 1; }
  echo "::warning::npm ci attempt $n failed (transient — e.g. onnxruntime postinstall CDN ETIMEDOUT); retrying in 15s"
  sleep 15
done
```
Dependency-free (a bash loop, not a marketplace retry action) so it adds **no new action to SHA-pin** — consistent with rc.14's pinned-dependencies posture.
### Changed (structural defense — close the flake class)
- **OIA Check 10 (`NPM-CI-NOT-RETRY-WRAPPED`)** — scans `.github/workflows/*.yml` and fails CI on any line that is exactly a bare `- run: npm ci`. Makes the retry-wrap self-enforcing: a future PR that adds an unwrapped `npm ci` trips the `oia` gate. **Detection power verified non-vacuously**: injecting a bare `npm ci` flags `publish-docs.yml:<line>`; the wrapped form (`npm ci && break` inside `run: |`) is silent. OIA count synced **9 → 10** (oia-walk header + AGENTS.md ×2).
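As an illustration of what Check 10 enforces, here is a minimal TypeScript sketch of a bare-`npm ci` detector. It is not the code in `scripts/oia-walk.mjs`: the names `findBareNpmCi`, `Finding`, and `BARE_NPM_CI` are hypothetical, and the sketch assumes "bare" means a step line that is exactly `- run: npm ci` apart from indentation and trailing whitespace.

```typescript
// Hypothetical sketch of a "bare npm ci" detector; not the project's oia-walk.mjs.
interface Finding {
  file: string;
  line: number; // 1-based line number inside the workflow file
}

// Assumption: a "bare" step is a line that is exactly `- run: npm ci`,
// ignoring leading indentation and trailing whitespace.
const BARE_NPM_CI = /^\s*-\s+run:\s+npm ci\s*$/;

function findBareNpmCi(file: string, yamlText: string): Finding[] {
  const findings: Finding[] = [];
  yamlText.split(/\r?\n/).forEach((text, index) => {
    if (BARE_NPM_CI.test(text)) findings.push({ file, line: index + 1 });
  });
  return findings;
}

// A workflow fragment with an unwrapped install step.
const bareWorkflow = ["steps:", "  - uses: actions/checkout@v6", "  - run: npm ci"].join("\n");

// The retry-wrapped form: `npm ci && break` inside a `run: |` block.
const wrappedWorkflow = [
  "steps:",
  "  - run: |",
  "      for n in 1 2 3; do",
  "        npm ci && break",
  "        sleep 15",
  "      done",
].join("\n");

console.log(findBareNpmCi("publish-docs.yml", bareWorkflow)); // one finding, at line 3
console.log(findBareNpmCi("publish-docs.yml", wrappedWorkflow)); // no findings
```

An exact-line pattern like this would not catch a step that puts `run: npm ci` on its own line under a `name:` key; whether the real check covers that form is not stated in the entry.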
### Method note
The rc.19 release failure is the *first time* this known flake (documented since rc.9) actually **blocked a publish** rather than just a PR check — which is exactly the signal that "re-run by hand" was no longer an acceptable response. Fixed the class (all 10 steps) + a structural guard (Check 10), not the instance.
### Files changed
- `.github/workflows/{ci,release,publish-docs}.yml` (10 `npm ci` → retry loop), `scripts/oia-walk.mjs` (Check 10 + header 9→10 / 13→14 walks / marker order), `AGENTS.md` (OIA count 9→10 ×2).
- version bump 3.9.0-rc.19 → 3.9.0-rc.20 (7 surfaces); test count unchanged (993).
---
## [3.9.0-rc.19] — 2026-05-29
> **TL;DR:** **LongMemEval retrieval harness (sprint RC 11 — the v3.10 credibility lever, engineering half).** [LongMemEval](https://github.com/xiaowu0162/LongMemEval) (Wu et al. 2024) is the long-term-memory benchmark Mem0/Zep publish against; no Obsidian-MCP has any LongMemEval-derived number. New [`scripts/bench-longmemeval.mjs`](https://github.com/oomkapwn/enquire-mcp/blob/main/scripts/bench-longmemeval.mjs) materializes each question's haystack sessions into a throwaway vault, indexes with FTS5, runs `searchHybrid`, and scores **`recall@k` / `MRR` / `NDCG@k` of the answer-bearing session(s)** (reusing `src/eval.ts`), aggregated per `question_type`. It measures **retrieval quality, NOT end-to-end QA accuracy** — enquire is a retriever, not an answerer; claiming a QA number would be an overclaim. The dataset is **not** committed (size + licensing); the **headline numbers are intentionally NOT published** — they're maintainer-gated (a full reference-hardware run + methodology review, per the project's "measured, reproducible, reviewed — never a placeholder" bar). **982 → 993 tests** (+11 pure-function tests, positive + NEGATIVE controls).
**Patch — discoverability/credibility infrastructure (sprint RC 11). Scripts/tests/docs only; no `src/` runtime change.**
### Added
- **`scripts/bench-longmemeval.mjs`** — LongMemEval **retrieval** benchmark harness. Per question: materialize haystack sessions → one note each in a temp vault → `syncFtsIndex` → `searchHybrid` → score `recall@k`/`MRR`/`NDCG@k` of the answer session(s) (the same `src/eval.ts` metrics as the rest of `docs/benchmarks.md`), aggregated overall + per `question_type`; abstention (`*_abs`) questions counted separately. Pure helpers (`sessionToMarkdown`, `sessionNotePath`, `relevantSessionPaths`, `isAbstention`, `aggregateByType`) exported for unit testing; CLI guarded by `isEntrypoint`. `--dataset <path> [--limit N] [--k 10] [--embeddings]`. Missing dataset → exit 2 with download guidance (it's not committed).
- **`npm run bench:longmemeval`** script.
- **`docs/benchmarks.md` → "LongMemEval retrieval (external benchmark)"** section: the retrieval-vs-QA framing, the run command, and an explicit **"numbers pending a full maintainer run"** status (no fabricated/placeholder figures — the LongMemEval headline is the credibility centerpiece and goes through the same measured-and-reviewed bar as every other number).
- **`.gitignore`** guard (`longmemeval*.json`, `longmemeval_*/`) so a maintainer's dataset download can't be accidentally committed.
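For readers unfamiliar with the three retrieval metrics the harness reports, the following is a minimal sketch using binary relevance (a session either bears the answer or it does not). These are textbook definitions written for illustration, not the `src/eval.ts` implementation; the function names and sample session ids are hypothetical, and ranked ids are assumed to be unique.

```typescript
// Illustrative binary-relevance retrieval metrics; NOT the project's src/eval.ts.
// `ranked` is the result list in rank order, `relevant` the answer-bearing ids.

// Fraction of relevant items that appear in the top k results.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hits = ranked.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / relevant.size;
}

// 1 / rank of the first relevant result (0 if none). MRR is the mean of
// this value across all queries.
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const idx = ranked.findIndex((id) => relevant.has(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// Discounted gain of the top k, normalised by the best achievable ordering.
function ndcgAtK(ranked: string[], relevant: Set<string>, k: number): number {
  const dcg = ranked
    .slice(0, k)
    .reduce((sum, id, i) => sum + (relevant.has(id) ? 1 / Math.log2(i + 2) : 0), 0);
  const idealHits = Math.min(k, relevant.size);
  let idcg = 0;
  for (let i = 0; i < idealHits; i++) idcg += 1 / Math.log2(i + 2);
  return idcg === 0 ? 0 : dcg / idcg;
}

// Hypothetical example: the single answer session is returned at rank 2.
const sampleRanked = ["session-3", "session-1", "session-7"];
const sampleRelevant = new Set(["session-1"]);

console.log(recallAtK(sampleRanked, sampleRelevant, 3)); // 1
console.log(reciprocalRank(sampleRanked, sampleRelevant)); // 0.5
console.log(ndcgAtK(sampleRanked, sampleRelevant, 3).toFixed(3)); // "0.631"
```

All three functions score where the answer-bearing session lands in the ranking, which is why the harness reports retrieval quality rather than QA accuracy.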
### Tests added (+11 new it() blocks, positive + NEGATIVE controls) — 982 → 993
- `tests/longmemeval-harness.test.ts` (new) — `sessionNotePath` (safe-id + **path-traversal NEGATIVE control**), `sessionToMarkdown` (role-labelled turns + **malformed/empty-session NEGATIVE control**), `relevantSessionPaths` (explicit `answer_session_ids` + `has_answer` fallback + **empty-on-abstention NEGATIVE control**), `isAbstention` (`_abs` + NEGATIVE), `aggregateByType` (per-type averages + hit-rate + **empty-input NEGATIVE control**). The full benchmark run (needs the uncommitted dataset + heavy compute) is intentionally not a CI gate; the *logic that decides what's scored and how it aggregates* is.
### Scope note — what ships vs. what's gated
The **harness + tests ship now** (verifiable engineering). The **published LongMemEval score**, forgetting-aware staleness, and "grounded in your knowledge, not extracted" messaging remain **v3.10** — the score specifically is maintainer-gated (download + full run + review) so the credibility centerpiece is never an unreviewed auto-publish.
### Files changed
- `scripts/bench-longmemeval.mjs` (new), `tests/longmemeval-harness.test.ts` (new, +11), `docs/benchmarks.md` (LongMemEval section), `package.json` (`bench:longmemeval` script), `.gitignore` (dataset guard).
- version bump 3.9.0-rc.18 → 3.9.0-rc.19 (7 surfaces); test count 982 → 993.
---
## [3.9.0-rc.18] — 2026-05-29
> **TL;DR:** **Brand-integrity: the social card stopped lying about SLSA (sprint RC 10).** State-driven read of `assets/social-preview.svg` — the GitHub social card, the single most-shared visual of the repo — caught a **`SLSA-3`** trust badge (line 137). That's a **residual instance of overclaim #15** (rc.7 downgraded SLSA-3 → SLSA Build L2 everywhere because `release.yml` only does `npm publish --provenance`); rc.7's sweep AND OIA Check 4d's original file scope both missed the SVG, so the card advertised a false security level for 11 RCs. Fixed the badge (`SLSA-3` → `SLSA L2`), re-rendered the PNG, and **extended OIA Check 4d's `claimFiles` to include `assets/social-preview.svg`** so the surface is permanently guarded. Detection power verified (injected `SLSA-3` → Check 4d flags `social-preview.svg:137`; clean after fix). **982 tests unchanged** (assets + audit-script only).
**Patch — brand-integrity + structural defense (sprint RC 10). Assets/audit-script only; no `src/` runtime change.**
### Fixed
- **`assets/social-preview.svg` claimed `SLSA-3` (overclaim #15, residual instance).** The bottom trust-signal row badge said `SLSA-3` — a level the build doesn't earn (`npm publish --provenance` = SLSA Build **L2**; L3 needs an isolated builder via `slsa-framework/slsa-github-generator`). This is the same overclaim rc.7 retracted across README/package.json/llms.txt/COMPARISON/STABILITY, but the **social card was outside both rc.7's sweep and OIA Check 4d's scope**, so it persisted on the most externally-visible surface. → `SLSA L2`. `assets/social-preview.png` re-rendered from the corrected SVG via `scripts/render-social-preview.mjs`.
- **`src/pdf.ts:13` asserted pdfjs-dist is "SLSA-3 published" (unverified third-party claim).** A repo-wide sweep for the SLSA-3 class (triggered by the SVG find, per the root-cause-sweep rule) surfaced a source comment claiming the **pdfjs-dist dependency** ships SLSA-3 provenance — something we never verified. Per the project rule ("any SLSA-level claim must point to backing evidence, else downgrade"), the unverified clause was removed (the comment's real point — pure-JS, no native deps, Apache-2.0, optional — is unchanged). All other repo `SLSA-3` hits are legitimate: CLAUDE.md/ROADMAP history + the "earn real L3" roadmap target, `oia-walk.mjs`'s own detector regex, and `docs/audits/*` point-in-time audit artifacts (excluded from OIA currency + npm — rewriting them would falsify the historical record).
### Changed (structural defense — close the recursion)
- **OIA Check 4d (`SLSA-LEVEL-OVERCLAIM`) `claimFiles` now includes `assets/social-preview.svg`.** Root-cause: the SLSA-level check guarded the doc surfaces but not the rendered-asset surface. Adding the SVG makes the social-card SLSA badge self-enforcing (CI fails if it ever drifts to L3/SLSA-3 again). **Detection power verified non-vacuously**: with `SLSA-3` injected the check flags `assets/social-preview.svg:137`; with `SLSA L2` it's silent. Mirrors the v3.8.0-rc.11 "drift findings demand a full-surface sweep + structural defense" rule.
### Method note
This is a textbook **state-driven** catch: a change-driven sweep (rc.7) fixed the class on the files it was looking at; reading *every* file as it exists on disk — including a rendered-asset source — surfaced the one instance it missed. The fix isn't just the instance (SVG badge) but the **defense-scope gap** (Check 4d file list), so the class is closed, not just the symptom.
### Files changed
- `assets/social-preview.svg` (SLSA-3 → SLSA L2), `assets/social-preview.png` (re-rendered), `scripts/oia-walk.mjs` (Check 4d `claimFiles` += social-preview.svg), `src/pdf.ts` (comment-only: dropped unverified pdfjs-dist "SLSA-3 published" — dist output byte-identical, no runtime change).
- version bump 3.9.0-rc.17 → 3.9.0-rc.18 (7 surfaces); test count unchanged (982).
### Deferred (repo-page polish, lower priority)
Social-preview stat-pill redesign (would add new numeric-claim drift surface — needs a docs-consistency invariant in the same PR), README hero `claude mcp add` one-liner, `server.json` `categories`/`websiteUrl` (verify against the 2025-12-11 schema first). Then **v3.10 LongMemEval** (the #1 credibility lever).
---
## [3.9.0-rc.17] — 2026-05-29
> **TL;DR:** **AI-search discoverability: Schema.org `@graph` structured data (sprint RC 9).** The single biggest lever for getting cited by Google AI Overviews / Perplexity / Bing Copilot is machine-readable structured data, and the highest-citation type is **FAQPage**. `scripts/inject-jsonld.mjs` (run at GH-Pages publish time) is upgraded from a lone `SoftwareApplication` node to a Schema.org **`@graph`** with three cross-linked nodes: an enriched **SoftwareApplication** (now with `featureList` + `maintainer`), a **SoftwareSourceCode** node (`codeRepository`/`runtimePlatform`/`targetProduct` → the app), and a **FAQPage** carrying the README's 6 Q&A pairs. Plus a `glama.json` (`maintainers: [oomkapwn]`) so the Glama.ai crawler can attribute + index the server instead of withholding it from search. The builder is refactored into a pure, exported `buildJsonLdGraph(pkg)` so it's unit-tested (deterministic — no dates/RNG). **975 → 982 tests** (+7, positive + NEGATIVE controls).
**Patch — discoverability (sprint RC 9). Docs/scripts/config only; no `src/` runtime change.**
### Added
- **Schema.org `@graph` JSON-LD** (`scripts/inject-jsonld.mjs`, expanded). Three nodes:
  - **SoftwareApplication** — now includes `featureList` (8 differentiators), `maintainer`, `applicationSubCategory: "Model Context Protocol (MCP) server"`, stable `@id`.
  - **SoftwareSourceCode** — `codeRepository` (cleaned of `git+`/`.git`), `runtimePlatform`, `programmingLanguage`, `targetProduct` cross-referencing the SoftwareApplication `@id`.
  - **FAQPage** — the README "## ❓ FAQ" Q&A as `Question`/`acceptedAnswer` pairs (highest AI-citation structured-data type).
- **`glama.json`** at repo root (`$schema` + `maintainers: ["oomkapwn"]`) — lets the Glama.ai MCP directory attribute the server to its maintainer and index it (claimed servers move from "withheld from search" to discoverable for Glama's user base).
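To make the three-node `@graph` shape concrete, here is a minimal sketch of a pure builder. It is not the exported `buildJsonLdGraph(pkg)`: the name `buildGraphSketch`, the `PkgLike`/`FaqEntry` types, the `#software`/`#source` `@id` suffixes, the sample `repository.url` value, and the sample FAQ entry are all assumptions made for illustration. Only the Schema.org types and property names come from the entry above.

```typescript
// Hypothetical sketch of a Schema.org @graph builder; not scripts/inject-jsonld.mjs.
interface PkgLike {
  name: string;
  version: string;
  repository: { url: string };
}
interface FaqEntry {
  q: string;
  a: string;
}
type JsonLdNode = Record<string, unknown>;

// Strip the npm-style `git+` prefix and `.git` suffix from a repository URL.
function cleanRepoUrl(url: string): string {
  return url.replace(/^git\+/, "").replace(/\.git$/, "");
}

function buildGraphSketch(
  pkg: PkgLike,
  faq: FaqEntry[],
): { "@context": string; "@graph": JsonLdNode[] } {
  const repo = cleanRepoUrl(pkg.repository.url);
  const appId = `${repo}#software`; // assumption: any stable IRI would do
  return {
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "SoftwareApplication",
        "@id": appId,
        name: pkg.name,
        softwareVersion: pkg.version,
        applicationSubCategory: "Model Context Protocol (MCP) server",
      },
      {
        "@type": "SoftwareSourceCode",
        "@id": `${repo}#source`,
        codeRepository: repo,
        targetProduct: { "@id": appId }, // cross-reference to the app node
      },
      {
        "@type": "FAQPage",
        mainEntity: faq.map((entry) => ({
          "@type": "Question",
          name: entry.q,
          acceptedAnswer: { "@type": "Answer", text: entry.a },
        })),
      },
    ],
  };
}

// Sample input; the repository.url form and the FAQ text are illustrative.
const sketchGraph = buildGraphSketch(
  {
    name: "@oomkapwn/enquire-mcp",
    version: "3.9.0-rc.17",
    repository: { url: "git+https://github.com/oomkapwn/enquire-mcp.git" },
  },
  [{ q: "Is this a placeholder question?", a: "Yes, it only shows the shape." }],
);
console.log(JSON.stringify(sketchGraph, null, 2));
```

Because the builder takes only plain data and reads no clock or random source, its output is deterministic, which is the property that makes the JSON-LD unit-testable.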
### Changed
- `scripts/inject-jsonld.mjs` refactored: `buildJsonLdGraph(pkg)` + `FAQ_ENTRIES` are now **exported pure** functions/data (CLI behavior guarded behind an `isEntrypoint` check), so the JSON-LD is unit-testable. The injected `<script type="application/ld+json">` now carries a `@graph`; the idempotency marker (`application/ld+json`) is unchanged, so `publish-docs.yml` needs no edit.
### Tests added (+7 new it() blocks, positive + NEGATIVE controls) — 975 → 982
- `tests/jsonld.test.ts` (new) — `buildJsonLdGraph`: `@graph` has exactly the 3 expected `@type`s; SoftwareApplication carries `softwareVersion === package.json` + `featureList` + `maintainer`; `SoftwareSourceCode.targetProduct["@id"]` cross-refs the app `@id` + repo URL is clean (no `git+`/`.git`); FAQPage mirrors `FAQ_ENTRIES` with **non-empty Q + A (NEGATIVE control on empty answers)**; the graph is JSON-serializable. Plus a **README-FAQ-count drift guard**: `FAQ_ENTRIES.length` must equal the README FAQ bold-question count (so a 7th README FAQ that's not mirrored into the JSON-LD fails CI), and every entry is well-formed (`q` ends with `?`, `a` non-empty).
### Files changed
- `scripts/inject-jsonld.mjs` (expanded + exported builder), `glama.json` (new), `tests/jsonld.test.ts` (new, +7).
- version bump 3.9.0-rc.16 → 3.9.0-rc.17 (7 surfaces); test count 975 → 982.
### Deferred to rc.18 (repo-page polish)
Social-preview regen (`scripts/render-social-preview.mjs` → stat-pill design: 44 tools / 982 tests / +15.5 NDCG@10), README hero `claude mcp add` one-liner + canonical-URL comments, `server.json` `categories`/`keywords`, then **v3.10 LongMemEval** harness (the #1 credibility lever).
---
## [3.9.0-rc.16] — 2026-05-29
> **TL;DR:** **Correctness batch 2 (sprint RC 8) — user-facing correctness + honesty.** Clears the rc.15-deferred backlog plus the rc.15 post-ship self-audit. (1) `doctor` now actually applies the privacy filter it claimed (`--exclude-glob`/`--read-paths` were never wired — it counted all files yet labeled the count "privacy filter applied" — **P2-12**). (2) `eval` distinguishes an *errored* query from a genuine zero-relevance one (new `query_errors` count + per-query `error` flag + a banner warning — a benchmark's means were silently deflatable by infra hiccups). (3) The stateless HTTP handler now wires its per-request cleanup **before** `connect()`, so a connect failure no longer leaks the McpServer + transport (parity with the stateful path's close discipline). (4) `--ocr-pdfs` warns instead of silently no-op'ing when `--watch` or the embed-db is absent. (5) rc.15's `converged` flag is now actually **surfaced** to MCP callers, and a stale "`+5-10 NDCG@10`" reranker undersell in CLI `--help` (missed by rc.12's docs-only sweep) is corrected to the measured **+15.5 / +24.7**. The deferred `tools/search.ts` "citation mis-attribution" item was **investigated and found to be a non-issue** (snippet/line/chunk/kind all follow one consistent `bm25 ?? embeddings ?? tfidf` precedence). **970 → 975 tests** (+5, positive + NEGATIVE controls).
**Patch — audit-driven correctness, batch 2 (sprint RC 8).**
### Fixed
- **`doctor` ignored the privacy filter while claiming to apply it (P2-12).** `runDoctor` built `new Vault(opts.vault)` with no `excludeGlobs`/`readPaths` — so it walked the *unfiltered* vault, counted every file, and labeled the count `"(privacy filter applied)"`. A privacy-conscious user verifying setup got false reassurance. Now `RunDoctorOptions` accepts `excludeGlobs`/`readPaths`, the CLI `doctor` command exposes `--exclude-glob`/`--read-paths`, the count is honest (`"(after privacy filter)"` only when one is set), and a new `privacy` check reports the active pattern counts — or surfaces a config **error** (instead of crashing) on an empty-after-trim glob.
- **`eval` conflated errored queries with zero-relevance hits.** A query that threw in `searchHybrid` was pushed to `per_query` with all-zero scores and counted in the means — indistinguishable from a genuine miss, silently deflating published NDCG/Recall/MRR. New `EvalResult.query_errors` count + per-query `error?: true` flag + a `formatEvalResult` banner warning ("re-run before publishing"). Means still include the zeros (you don't get to drop hard queries that crashed) but the deflation is now **visible**.
- **Stateless HTTP per-request cleanup leaked on connect failure (parity).** `handleStatelessRequest` registered `res.on("close", cleanup)` *after* `await server.connect(transport)`, so a connect throw skipped straight to the catch and the freshly-built McpServer + transport were never closed. Cleanup is now wired **before** connect, made idempotent (`cleanedUp` guard) + error-safe (`.catch`), and also invoked in the catch — matching the stateful path's close discipline (P2-10).
- **`--ocr-pdfs` was a silent no-op in two cases.** Passed without `--watch` (the flag only acts on the watcher path) → now warns + ignores. Passed with `--watch` but no embed-db (OCR'd text has nowhere to be indexed) → now warns + continues FTS5-only, instead of the block being skipped inside `if (existsSync(embedFile))` with zero feedback.
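The `eval` fix in the list above can be sketched as follows: an errored query still contributes a zero to the mean, but it is counted and flagged so the deflation is visible. This is an illustration only, not `src/eval.ts`. The names `evaluateSketch`, `bannerSketch`, and `scoreQuery` are hypothetical, a single NDCG value stands in for the full metric set, and the scorer is synchronous here for brevity.

```typescript
// Hypothetical sketch of "count errored queries, keep them in the mean".
interface PerQuerySketch {
  query: string;
  ndcg: number;
  error?: true; // present only when the query threw
}
interface EvalSketchResult {
  mean_ndcg: number;
  query_errors: number;
  per_query: PerQuerySketch[];
}

function evaluateSketch(
  queries: string[],
  scoreQuery: (query: string) => number, // assumption: stands in for search + scoring
): EvalSketchResult {
  const per_query: PerQuerySketch[] = [];
  let query_errors = 0;
  for (const query of queries) {
    try {
      per_query.push({ query, ndcg: scoreQuery(query) });
    } catch {
      // The zero stays in the mean, but the failure is now countable.
      query_errors += 1;
      per_query.push({ query, ndcg: 0, error: true });
    }
  }
  const total = per_query.reduce((sum, row) => sum + row.ndcg, 0);
  const mean_ndcg = per_query.length === 0 ? 0 : total / per_query.length;
  return { mean_ndcg, query_errors, per_query };
}

// Banner text is illustrative; the entry only says it warns to re-run.
function bannerSketch(result: EvalSketchResult): string | undefined {
  if (result.query_errors === 0) return undefined;
  return `WARNING: ${result.query_errors} of ${result.per_query.length} queries errored; re-run before publishing`;
}

// One query succeeds with 0.8, one throws.
const evalDemo = evaluateSketch(["good query", "bad query"], (query) => {
  if (query === "bad query") throw new Error("simulated infra hiccup");
  return 0.8;
});
console.log(evalDemo.mean_ndcg); // 0.4
console.log(evalDemo.query_errors); // 1
console.log(bannerSketch(evalDemo));
```

Keeping the zero in the mean while exposing `query_errors` lets a reader tell a deflated run from a genuinely weak one without allowing crashed queries to be silently dropped.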
### rc.15 post-ship self-audit (same-class re-sweep)
- **`converged` was computed but never surfaced.** rc.15 added `CommunityResult.converged` "so callers can surface this" — but the `obsidian_get_communities` handler dropped it. Now in the tool output; tool description corrected ("`iterations` until convergence" → "`iterations` (greedy passes run) and `converged`").
- **α-class comment drift (bases.ts).** The v3.6.2 HN-2 comment still framed the unbounded warn-Set as "fine" ("one log line each") right next to rc.15's `MAX_WARNED_PREDICATES` cap that exists *because* a distinct-predicate stream broke that reasoning. Comment corrected.
- **Reranker undersell in CLI `--help` (missed instance).** `--enable-reranker` help still said the generic "+5-10 NDCG@10 typical"; rc.12's "corrected everywhere" sweep covered `docs/` but not `src/` CLI strings. → measured **≈+15.5 NDCG@10 / +24.7 MRR (60-query ablation)**.
### Investigated — no change (empirical rejection)
- **`tools/search.ts` "citation line/kind mis-attribution across rankers"** (rc.15-deferred hypothesis): traced the final-hit assembly — `snippet`, `line_start`/`line_end`, `chunk_index`, and `kind` all derive from the same `bm25 ?? embeddings ?? tfidf` precedence (TF-IDF carries no line/kind, so a TF-IDF-only hit reports `line: undefined` + `kind: "md"`, never a *cross-ranker mix*). `kind` is a file-level property and can't conflict across signals. Current `main` is consistent; no fix warranted.
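The single-precedence argument can be illustrated with a small sketch: every citation field is read from whichever one signal wins `bm25 ?? embeddings ?? tfidf`, so fields cannot be mixed across rankers. This is not the code in `tools/search.ts`; the `SignalHit`/`Signals` shapes and the `assembleHitSketch` name are hypothetical.

```typescript
// Hypothetical sketch of single-source hit assembly; not tools/search.ts.
interface SignalHit {
  snippet: string;
  line_start?: number;
  line_end?: number;
  chunk_index?: number;
  kind?: string;
}
interface Signals {
  bm25?: SignalHit;
  embeddings?: SignalHit;
  tfidf?: SignalHit; // assumption: carries no line/kind information
}

function assembleHitSketch(signals: Signals) {
  // One precedence decision, made once, feeds every field below.
  const source = signals.bm25 ?? signals.embeddings ?? signals.tfidf;
  if (source === undefined) return undefined;
  return {
    snippet: source.snippet,
    line_start: source.line_start,
    line_end: source.line_end,
    chunk_index: source.chunk_index,
    kind: source.kind ?? "md",
  };
}

// Both BM25 and embeddings matched: every field comes from BM25.
const mixedHit = assembleHitSketch({
  bm25: { snippet: "from bm25", line_start: 10, line_end: 12, chunk_index: 0, kind: "pdf" },
  embeddings: { snippet: "from embeddings", line_start: 40, line_end: 44, chunk_index: 3, kind: "pdf" },
});
console.log(mixedHit?.snippet, mixedHit?.line_start); // "from bm25" 10

// TF-IDF only: no line information, kind falls back to "md".
const tfidfOnly = assembleHitSketch({ tfidf: { snippet: "from tfidf" } });
console.log(tfidfOnly?.line_start, tfidfOnly?.kind); // undefined "md"
```

A mis-attribution bug would require the fields to be resolved by separate precedence chains; with one `source` variable that cannot happen.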
### Tests added (+5 new it() blocks, positive + NEGATIVE controls) — 970 → 975
- `tests/eval.test.ts` — errored-query: `query_errors === 1`, per-query `error === true`, banner contains "errored", successful query `error` undefined (NEGATIVE); + an all-success NEGATIVE control (`query_errors === 0`, no banner). `makeResult()` literal updated for the new field.
- `tests/doctor.test.ts` — privacy-active (ok check + "after privacy filter" count), no-filter NEGATIVE control (no `privacy` check, no false claim), empty-glob error path (`ready === false`).
- `tests/http-transport.test.ts` — 6 sequential stateless requests each 200 (exercises per-request build→connect→cleanup repeatedly).
- `tests/e2e-handlers.test.ts` — `converged` surfaced in the `obsidian_get_communities` MCP output.
### Files changed
- `src/doctor.ts` (privacy opts + check + honest count), `src/cli.ts` (doctor `--exclude-glob`/`--read-paths`; reranker help number), `src/eval.ts` (`query_errors` + `error` + banner), `src/http-transport.ts` (stateless cleanup parity), `src/server.ts` (two `--ocr-pdfs` warnings), `src/tool-registry.ts` (`converged` surfaced + description), `src/bases.ts` (HN-2 comment).
- tests: eval (+2 + literal), doctor (+3), http-transport (+1), e2e-handlers (+1 assertion). `scripts/check-per-file-coverage.mjs` bases.ts comment 73.17% → 74.71%.
- version bump 3.9.0-rc.15 → 3.9.0-rc.16 (7 surfaces); test count 970 → 975.
---
## [3.9.0-rc.15] — 2026-05-29
> **TL;DR:** **Correctness cleanup (sprint RC 7).** Three MEDIUM/LOW findings from the audit: `bases.ts`'s warn-once dedup `Set` grew without bound on a stream of distinct malformed `.base` predicates (slow memory leak on a long-lived `serve`); `detectCommunities` gave no signal when Louvain hit the `MAX_PASSES=50` cap without converging (callers couldn't tell a sub-optimal partition); and `loadReranker`'s TSDoc claimed the default alias is `rerank-multilingual` when it's actually `rerank-bge` (α-class drift). **966 → 970 tests** (+4, positive + NEGATIVE controls).
**Patch — audit-driven correctness (sprint RC 7).**
### Fixed
- **`bases.ts` unbounded warn-Set (memory growth).** `warnedUnknownPredicates` `.add()`ed every distinct unevaluated predicate forever. A `.base`/DQL query with many unique malformed predicates (attacker- or agent-controlled) grew it without bound for the process lifetime. New exported `boundedSetAdd(set, value, max)` caps it at `MAX_WARNED_PREDICATES`=1000 (past the cap a distinct predicate may re-warn once — acceptable vs. unbounded memory).
- **`communities.ts` convergence signal.** `CommunityResult` gains **`converged: boolean`** — true when Louvain reached a stable partition (a pass made no moves), false when it exited on the `MAX_PASSES` cap with moves pending (valid but possibly sub-optimal). Derived from the loop's final `!changed`; the edgeless short-circuit reports `converged: true, iterations: 0`.
- **`embeddings.ts` reranker-default TSDoc.** `loadReranker`'s `@param` said `default: "rerank-multilingual"`; the real `DEFAULT_RERANKER_ALIAS` is `"rerank-bge"`. Corrected (published TypeDoc/IDE-hover was lying — α-class).
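The warn-Set cap from the first item above might look roughly like the sketch below. The real `boundedSetAdd(set, value, max)` signature is given in the entry, but its return value and exact behaviour at the cap are not, so the boolean "was it added" result and the drop-when-full policy here are assumptions; the function is named `boundedSetAddSketch` to keep that distinction clear.

```typescript
// Hypothetical sketch of a capped Set insert; the real boundedSetAdd may differ.
const MAX_WARNED_SKETCH = 1000; // mirrors the documented MAX_WARNED_PREDICATES value

// Returns true only when `value` was newly recorded. Once the set holds
// `max` entries, further distinct values are not stored (assumed policy).
function boundedSetAddSketch<T>(set: Set<T>, value: T, max: number): boolean {
  if (set.has(value)) return false; // duplicate: no growth
  if (set.size >= max) return false; // at the cap: refuse to grow
  set.add(value);
  return true;
}

// Warn-once helper built on top: warns when the predicate is not yet recorded.
function shouldWarnSketch(warned: Set<string>, predicate: string): boolean {
  if (warned.has(predicate)) return false;
  boundedSetAddSketch(warned, predicate, MAX_WARNED_SKETCH);
  return true;
}

// Demo with a tiny cap so the behaviour at the boundary is easy to see.
const demoSet = new Set<string>();
for (const predicate of ["a", "b", "c", "d", "e"]) {
  boundedSetAddSketch(demoSet, predicate, 3);
}
console.log(demoSet.size); // 3
console.log(shouldWarnSketch(new Set<string>(), "file.nme == 1")); // true
```

Under this policy a predicate that arrives after the cap is reached is never remembered, so it can warn again later; that is the trade the entry describes as acceptable compared with unbounded memory.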
### Tests added (+4, positive + NEGATIVE controls)
- `tests/bases.test.ts` — `boundedSetAdd`: adds under cap (POSITIVE), no-grow on duplicate, **refuses to grow past the cap (NEGATIVE control)**, `MAX_WARNED_PREDICATES` sanity.
- `tests/communities.test.ts` — `converged` asserted on the edgeless path (`true`, `iterations === 0`) + a clustered graph (`true`, `iterations < 50`).
### Deferred to rc.16 (correctness batch 2)
`tools/search.ts` citation line/kind mis-attribution across rankers, `eval.ts` `query_errors` count, `doctor` privacy-glob flags (P2-12), `http-transport.ts` stateless-handler cleanup parity, `server.ts` `--ocr-pdfs`-no-embed-db warning — each needs heavier integration-test setup; batched next.
### Files changed
- `src/bases.ts` (`boundedSetAdd` + cap), `src/communities.ts` (`converged`), `src/embeddings.ts` (TSDoc).
- `tests/bases.test.ts` (+4), `tests/communities.test.ts` (+assertions).
- test count 966 → 970 across README/COMPARISON/llms.txt/AGENTS/package.json/ROADMAP.
- version bump 3.9.0-rc.14 → 3.9.0-rc.15 (7 surfaces).
---
## [3.9.0-rc.14] — 2026-05-29
> **TL;DR:** **Supply-chain: SHA-pin every GitHub Action + a structural guard so they can't drift back (sprint RC 6).** Floating action tags (`uses: actions/checkout@v6`) can be silently retagged to malicious code — the OpenSSF "Pinned-Dependencies" check + this project's supply-chain brand (SLSA L2 + signed provenance) call for commit-SHA pins. All **28 action refs across the 4 workflows** are now pinned to their exact current 40-hex commit SHA (behavior identical) with a `# vN` comment for humans + Dependabot. New **OIA Check 9** fails CI if any third-party action ever uses a floating tag again — making the pin self-enforcing. **Workflows + audit-script + docs only; 966 tests unchanged.**
**Patch — audit-driven supply-chain (sprint RC 6).**
### Fixed
- **SHA-pin all GitHub Actions (28 refs / 4 workflows).** `actions/checkout@v6`, `actions/setup-node@v6`, `actions/upload-artifact@v7`, `actions/configure-pages@v6`, `actions/upload-pages-artifact@v5`, `actions/deploy-pages@v5` → each pinned to the exact commit SHA the tag currently resolves to (resolved via `gh api repos/actions/<x>/commits/<tag>`), with a trailing `# vN` comment. Identical behavior today; immune to tag-moving supply-chain attacks. Spans `ci.yml` (19), `publish-docs.yml` (5), `release.yml` (2), `dist-tag-cleanup.yml` (2).
### Structural defense
- **OIA Check 9 — Actions SHA-pin.** Scans every `.github/workflows/*.yml` `uses:` line; flags any third-party action NOT pinned to a 40-hex commit SHA (local `./.github/...` reusable refs exempt). **Verified non-vacuous** (all 28 current refs pass — silent for the right reason) **and with detection power** (a floating `@v6` / `@main` would flag). Makes the pin permanent: a future unpinned action fails CI. This is the 9th numbered OIA walk (header + AGENTS + CLAUDE counts synced 8 → 9).
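A check of this kind can be approximated in a few lines. The sketch below is an assumption-laden illustration rather than the `scripts/oia-walk.mjs` implementation: `findUnpinnedActions`, `USES_LINE`, and `SHA_PINNED` are hypothetical names, quoted `uses:` values are not handled, and the 40-hex string in the example is a placeholder, not a real commit.

```typescript
// Hypothetical sketch of an Actions SHA-pin check; not the project's oia-walk.mjs.
interface UnpinnedRef {
  line: number; // 1-based line number in the workflow file
  ref: string; // the action reference as written
}

// Captures the reference on a `uses:` line, stopping before any trailing comment.
const USES_LINE = /^\s*(?:-\s+)?uses:\s*([^\s#]+)/;
// A pinned reference ends in `@` followed by exactly 40 lowercase hex characters.
const SHA_PINNED = /@[0-9a-f]{40}$/;

function findUnpinnedActions(yamlText: string): UnpinnedRef[] {
  const unpinned: UnpinnedRef[] = [];
  yamlText.split(/\r?\n/).forEach((text, index) => {
    const match = USES_LINE.exec(text);
    if (match === null) return;
    const ref = match[1];
    if (ref === undefined) return;
    if (ref.startsWith("./")) return; // local reusable refs are exempt
    if (!SHA_PINNED.test(ref)) unpinned.push({ line: index + 1, ref });
  });
  return unpinned;
}

// Placeholder SHA for illustration only; it does not identify a real commit.
const PLACEHOLDER_SHA = "0123456789abcdef0123456789abcdef01234567";

const workflowSample = [
  "steps:",
  `  - uses: actions/checkout@${PLACEHOLDER_SHA} # v6`,
  "  - uses: actions/setup-node@v6",
  "  - uses: ./.github/actions/local-step",
  "  - uses: actions/upload-artifact@main",
].join("\n");

console.log(findUnpinnedActions(workflowSample));
// flags line 3 (actions/setup-node@v6) and line 5 (actions/upload-artifact@main)
```

The trailing `# vN` comment is ignored by the pattern, so the human-readable tag can sit next to the SHA without affecting the check.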
### Deferred (tracked)
OpenSSF Scorecard workflow + `dependency-review-action` on PRs — additive new workflows (each itself SHA-pinned) → a follow-up supply-chain RC. SHA-pinning is the highest-value item (the concrete hardening + the Scorecard "Pinned-Dependencies" win) and ships here first.
### Files changed
- `.github/workflows/{ci,publish-docs,release,dist-tag-cleanup}.yml` — 28 action refs SHA-pinned.
- `scripts/oia-walk.mjs` — Check 9 + header enumeration (8 → 9 numbered, 12 → 13 blocks).
- `AGENTS.md`, `CLAUDE.md` — OIA check count 8 → 9.
|
|
221
|
+
- version bump 3.9.0-rc.13 → 3.9.0-rc.14 (7 surfaces). **966 tests unchanged.**
|
|
222
|
+
|
|
223
|
+
---
|
|
224
|
+
|
|
225
|
+
## [3.9.0-rc.13] — 2026-05-29
|
|
226
|
+
|
|
227
|
+
> **TL;DR:** **State-driven docs hygiene (sprint RC 5).** Clears the deferred-from-rc.12 backlog of stale-fragment fixes the file-by-file audit found — none CI-blocking, all honesty/credibility: CITATION.cff named the wrong default models; a script comment still credited the retracted "Cursor external audit" (overclaim #11); AGENTS.md said the version gate checks "5 surfaces" (it's 7) and listed a phantom `bench` CLI subcommand; several **packaged docs** (README, docs/api.md, docs/benchmarks.md — all ship in the npm tarball) linked to repo paths that **don't** ship (`../tests/`, `../src/`, `../bench/`, `./AGENTS.md`, `./ROADMAP.md`, `./llms.txt`, `.github/…`) → 404 for npm-page readers; and the rc.7 CHANGELOG entry's forward-claim ("#16 → rc.8, H1 → rc.9") was left stale after the rc.8 pivot re-sequenced them to rc.10/rc.11. **Docs/metadata/script only; 966 tests unchanged.**
|
|
228
|
+
|
|
229
|
+
**Patch — audit-driven docs hygiene (sprint RC 5).**
|
|
230
|
+
|
|
231
|
+
### Fixed
|
|
232
|
+
|
|
233
|
+
- **CITATION.cff model names.** Said "enquire-mcp uses bge-multilingual-gemma2 and bge-reranker-base" — `bge-multilingual-gemma2` isn't even in the model catalog. Corrected to the actual defaults: `paraphrase-multilingual-MiniLM-L12-v2` (embeddings) + `bge-reranker-base` (reranker). (Consumed by Zenodo/OpenAlex/Scholar — a factually wrong metadata claim.)
|
|
234
|
+
- **Retracted-Cursor-audit comment.** `scripts/check-version-consistency.mjs` header still credited the server.json gate to a "Cursor external audit on rc.15" — that attribution was retracted as overclaim #11 (the doc was for a different project). Re-credited to the M-REG-1 external-audit finding.
|
|
235
|
+
- **AGENTS.md drift.** "version sync across 5 surfaces" → **7** (×4 incl. the hyphenated "5-surface" + the surface list, which now names server.json version + packages[0]); dropped the phantom `bench` CLI subcommand from the architecture comment (no such subcommand) and listed the real `install-ocr-lang` instead.
|
|
236
|
+
- **Broken packaged-doc links → absolute GitHub URLs.** README (`llms.txt`, `AGENTS.md`, `ROADMAP.md`, `publish-docs.yml`), docs/api.md (`../scripts/bench-search.mjs`), docs/benchmarks.md (`../tests/…`, `../src/eval.ts` ×2, `../bench/benchmarks.json`, `./api-reference/` → the GH Pages URL) — all 404'd in the npm tarball (those paths aren't in `package.json#files`). Now absolute `github.com/.../blob/main/…` links that resolve everywhere.
|
|
237
|
+
- **CHANGELOG rc.7 forward-claim.** Added an inline "re-sequenced" note: #16 actually shipped in rc.10 and H1 in rc.11 (the rc.8 integrity-batch pivot pushed both back two RCs); the original "ships in rc.8 / rc.9" lines are preserved as history.
|
|
238
|
+
|
|
239
|
+
### Deferred (tracked)
|
|
240
|
+
|
|
241
|
+
Adding `ROADMAP`/`AGENTS` to the `scope-completeness-audit.mjs` AUDIT_FILES list (this needs a coordinated docs-consistency assertion so the numbers are actually verified, not just "claimed covered") and extending OIA Check 3's CLI-subcommand scan to AGENTS.md → a later structural RC. Supply-chain (SHA-pin Actions + OpenSSF Scorecard) → rc.14. Correctness cleanup (bases Set leak, search citation, eval errors, doctor globs, stateless-HTTP cleanup) → rc.15.
|
|
242
|
+
|
|
243
|
+
### Files changed
|
|
244
|
+
|
|
245
|
+
`CITATION.cff`, `scripts/check-version-consistency.mjs`, `AGENTS.md`, `README.md`, `docs/api.md`, `docs/benchmarks.md`, `CHANGELOG.md` (rc.7 note); version bump 3.9.0-rc.12 → 3.9.0-rc.13 (7 surfaces). **966 tests unchanged.**
|
|
246
|
+
|
|
247
|
+
---
|
|
248
|
+
|
|
249
|
+
## [3.9.0-rc.12] — 2026-05-29
|
|
250
|
+
|
|
251
|
+
> **TL;DR:** **Claim-accuracy: a structural RC-level currency guard + the stale-doc instances it surfaces (sprint RC 4).** The audit's root-cause theme — "the stale-claim findings stem from a defense gap" — gets its second structural fix (the first was rc.10's OIA Check 4e for OCR). OIA Check 7 only compared **major.minor** (so `v3.9.0-rc.3` read as "current" because `3.9 == 3.9`), letting a pinned "currently v3.9.0-rc.N" drift every release. New **RC-level sub-check** compares the **full** version: a "currently / valid as of vX.Y.Z-rc.N" claim must match the exact current version. It immediately caught 3 stale instances (README, api.md, benchmarks.md, all pinned to rc.3/rc.6); all rephrased to version-agnostic. Also closes the **reranker-number undersell** that rc.7's "corrected everywhere" sweep missed (4 sites still said the generic "+5-10 NDCG@10" vs the measured **+15.5 NDCG@10 / +24.7 MRR**). **Docs + audit-script only; 966 tests unchanged.**
|
|
252
|
+
|
|
253
|
+
**Patch — audit-driven claim-accuracy (sprint RC 4).**
|
|
254
|
+
|
|
255
|
+
### Fixed
|
|
256
|
+
|
|
257
|
+
- **RC-level currency drift (structural + instances).** `scripts/oia-walk.mjs` Check 7 gains an RC sub-check: `/(?:currently|(?:still )?valid as of) vX.Y.Z-rc.N/` must equal the exact `package.json` version (a tombstone-verb-after-version skip avoids flagging "vX shipped" history; bare "As of vX, <feature> ships" is excluded as a *since* claim). **Detection-power verified**: with the instances still stale it flagged README:280, docs/api.md:5, docs/benchmarks.md:3; after rephrasing to version-agnostic ("the latest release candidate — see CHANGELOG", "still valid through the v3.9.0-rc cascade", `3.9.0-rc.N` placeholder) it's silent. The api.md RC feature-list (rc.1/rc.2/rc.3) — already incomplete (missing rc.10/rc.11) and unmaintainable — was replaced with a CHANGELOG pointer.
|
|
258
|
+
- **Reranker number undersell (brand credibility).** 4 surfaces (docs/api.md ×2, docs/QUICKSTART.md, docs/COMPARISON.md) still claimed the generic literature figure "+5-10 NDCG@10" for our BGE reranker; corrected to the **measured +15.5 NDCG@10 / +24.7 MRR (60-query ablation)** that COMPARISON's headline + benchmarks.md already report. (benchmarks.md:396's "+5-10 across BEIR" is a legitimate *literature* citation about rerankers in general, not our self-claim — left as-is.)
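
The gap behind the RC-level currency fix can be illustrated with a short sketch: a major.minor comparison treats `3.9.0-rc.3` and `3.9.0-rc.12` as equal, while a full-version comparison does not. The regex follows the one quoted in the entry above; the function names are hypothetical, and the tombstone-verb and "As of vX" exclusions are deliberately left out.

```typescript
// Hypothetical sketch; the real Check 7 also skips history and "since" claims.
function majorMinor(version: string): string {
  return version.split(".").slice(0, 2).join(".");
}

// The weaker comparison: only major.minor is compared.
function sameMajorMinor(a: string, b: string): boolean {
  return majorMinor(a) === majorMinor(b);
}

// The RC-level sub-check: a "currently / valid as of vX.Y.Z-rc.N" claim
// must name exactly the current version.
function findStaleRcClaims(docText: string, currentVersion: string): string[] {
  const claim = /(?:currently|(?:still )?valid as of) v(\d+\.\d+\.\d+-rc\.\d+)/gi;
  const stale: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = claim.exec(docText)) !== null) {
    if (m[1] !== currentVersion) stale.push(m[0]);
  }
  return stale;
}
```

Under these assumptions, a README line pinned to an older RC is reported by `findStaleRcClaims` even though `sameMajorMinor` considers it current, which is the drift the entry describes.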
|
|
259
|
+
|
|
260
|
+
### Deferred to rc.13 (state-driven backlog, batched with the correctness cleanup)
|
|
261
|
+
|
|
262
|
+
CITATION.cff model names, the retracted-Cursor-audit comment in `check-version-consistency.mjs`, AGENTS.md "5 surfaces"→7 + the phantom `bench` subcommand, broken packaged-doc relative links → absolute URLs, the rc.7↔rc.8 CHANGELOG sequencing note, `ROADMAP`/`AGENTS` into `scope-completeness-audit.mjs` AUDIT_FILES, and **SHA-pinning GitHub Actions + OpenSSF Scorecard** (a separable supply-chain batch).
|
|
263
|
+
|
|
264
|
+
### Files changed
|
|
265
|
+
|
|
266
|
+
- `scripts/oia-walk.mjs` — Check 7 RC-level currency sub-check + header note.
|
|
267
|
+
- `README.md`, `docs/api.md`, `docs/QUICKSTART.md`, `docs/benchmarks.md` — RC-currency → version-agnostic.
|
|
268
|
+
- `docs/api.md` (×2), `docs/QUICKSTART.md`, `docs/COMPARISON.md` — reranker "+5-10" → measured +15.5/+24.7.
|
|
269
|
+
- version bump 3.9.0-rc.11 → 3.9.0-rc.12 (7 surfaces).
|
|
270
|
+
|
|
271
|
+
---
|
|
272
|
+
|
|
273
|
+
## [3.9.0-rc.11] — 2026-05-28
|
|
274
|
+
|
|
275
|
+
> **TL;DR:** **Watcher / HNSW live-update correctness (sprint RC 3).** Two HIGH concurrency/integrity findings from the audit: **H1** — the watcher's file-change handler was fire-and-forget, so concurrent saves to the *same* file could interleave their embed-db upsert + HNSW `applyDiff` + the shared `rowsByLabel` mutation → silent index drift (ghost labels live in HNSW but absent from the embed-db → stale search hits). Now a **per-file promise queue** serializes same-file events (different files stay parallel), and `close()` drains in-flight handlers before the HNSW flush. **`-1` sentinel-label corruption** — the HNSW add-zip used `newIds[i] ?? -1`, which on any row/id length mismatch inserted a vector under label `-1`, corrupting the index + `rowsByLabel` + the persisted sidecar; the new `zipHnswAddPoints` throws fail-closed instead. Plus **M1** (`saveTo` persists the live `getCurrentCount()`, not the stale build-time `size`) and **L2** (correct `kind` on PDF unlink). **959 → 966 tests** (+7, positive + NEGATIVE controls). No API breaks.
|
|
276
|
+
|
|
277
|
+
**Patch — audit-driven correctness (sprint RC 3).**
|
|
278
|
+
|
|
279
|
+
### Fixed
|
|
280
|
+
|
|
281
|
+
- **H1 — watcher per-file serialization (HIGH, race).** `onChange` chained each event onto `this.handle(...).catch(...)` fire-and-forget; chokidar can dispatch overlapping events, and `handle()` has multiple `await` points between reading `oldIds` and applying the HNSW diff. Two concurrent edits to one file could interleave so a stale `applyDiff` left labels live in HNSW + `rowsByLabel` but absent from the embed-db (search then returns ghost hits, masked by `applyDiff`'s silent missing-label skip). Fix: a `fileQueues: Map<absPath, Promise>` chains same-file events sequentially (different files keep independent chains → still parallel); the map self-evicts when a file's chain drains. `close()` now `await Promise.allSettled([...fileQueues.values()])` before `flushHnswToDisk()` so a pending update completes before the flush.
|
|
282
|
+
- **`-1` sentinel-label corruption (HIGH).** `result.rows.map((r, i) => ({ id: newIds[i] ?? -1, … }))` at both the md and PDF zip sites silently inserted a vector under sentinel label `-1` if `newIds.length < rows.length` — corrupting the in-memory index, the shared `rowsByLabel`, and the flushed `.hnsw.bin`. New exported **`zipHnswAddPoints(rows, newIds)`** asserts equal length and throws (fail-closed) — caught by the watcher's per-event try/catch (logs + skips HNSW for that file; signature guard rebuilds a correct index next serve). No corrupt label is ever inserted.
|
|
283
|
+
- **M1 — HNSW `saveTo` live count.** `hnsw.ts` persisted the build-time `size` closure into `.meta.json`; after live updates that's stale. Now persists `hasLiveUpdate ? ctor.getCurrentCount() : size` (the same source the `size` getter uses).
|
|
284
|
+
- **L2 — unlink kind for PDFs.** The unlink branch hardcoded `kind: "md"` in its `syncHnswForFile` call; now passes `isPdf ? "pdf" : "md"`. Cosmetic on today's pure-delete diff (no rows are set) but correct + future-proof.
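
The per-file serialization described for H1 can be sketched as a small promise-chain queue. This is an illustration under assumptions, not the code in `src/watcher.ts`: the class name, the `onError` callback, and the `pendingFiles` getter are hypothetical, and `drain()` uses `Promise.all` because every chain here already swallows its own errors (the changelog describes `Promise.allSettled` in `close()`).

```typescript
// Hypothetical sketch of a per-file promise queue.
type Task = () => Promise<void>;

class PerFileQueue {
  private readonly queues = new Map<string, Promise<void>>();
  private readonly onError: (err: unknown) => void;

  constructor(onError: (err: unknown) => void = () => {}) {
    this.onError = onError;
  }

  // Same-path tasks run one after another; different paths stay independent.
  enqueue(absPath: string, task: Task): Promise<void> {
    const previous = this.queues.get(absPath) ?? Promise.resolve();
    const next: Promise<void> = previous
      .then(task)
      .catch((err: unknown) => this.onError(err)) // one failure must not block the chain
      .then(() => {
        // Self-evict only if no later task was chained behind this one.
        if (this.queues.get(absPath) === next) this.queues.delete(absPath);
      });
    this.queues.set(absPath, next);
    return next;
  }

  // Wait for every chain that exists at call time, e.g. before a final flush.
  drain(): Promise<void> {
    return Promise.all(Array.from(this.queues.values())).then(() => undefined);
  }

  get pendingFiles(): number {
    return this.queues.size;
  }
}
```

A watcher would call `enqueue(absPath, () => handle(event))` from its change handler and `await drain()` at shutdown before persisting the index, so an in-flight update finishes before the flush.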
|
|
285
|
+
|
|
286
|
+
### Tests added (+7, positive + NEGATIVE controls)
|
|
287
|
+
|
|
288
|
+
- `tests/zip-hnsw-points.test.ts` (NEW) — `zipHnswAddPoints`: matched zip (POSITIVE), empty case, too-few-ids + too-many-ids throw (NEGATIVE — the `-1` guard), never-emits-`-1`.
|
|
289
|
+
- `tests/hnsw.test.ts` — M1: build → `applyDiff` add 1 → `saveTo` → persisted `meta.size` equals the live count, **not** the build-time size (NEGATIVE control).
|
|
290
|
+
- `tests/watcher.test.ts` — H1: after `close()` drains an edit, the invariant holds — no `-1` sentinel in `rowsByLabel`, no ghost label (every tracked label exists in the embed-db). (chokidar's 250ms `awaitWriteFinish` coalesces writes, so this asserts the serialization+drain invariant rather than forcing the exact race.)
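
The fail-closed zip that the first test file exercises might look roughly like the sketch below. The row and point shapes are assumptions (the changelog elides the real fields), and the function is named `zipHnswAddPointsSketch` to make clear it is not the exported helper itself.

```typescript
// Hypothetical row/point shapes; the real EmbedRowLike fields are not shown.
interface RowLike {
  vector: number[];
}

interface HnswPoint {
  id: number;
  vector: number[];
}

function zipHnswAddPointsSketch(rows: RowLike[], newIds: number[]): HnswPoint[] {
  // Fail closed: a length mismatch throws instead of falling back to a
  // sentinel label such as -1.
  if (rows.length !== newIds.length) {
    throw new Error(
      `row/id length mismatch: ${rows.length} rows vs ${newIds.length} ids`,
    );
  }
  return rows.map((row, i) => ({ id: newIds[i] as number, vector: row.vector }));
}
```

Because the length check runs before the map, the cast on `newIds[i]` is safe, and no point can ever be emitted with a placeholder id.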
|
|
291
|
+
|
|
292
|
+
### Files changed
|
|
293
|
+
|
|
294
|
+
- `src/watcher.ts` — `zipHnswAddPoints` helper + `EmbedRowLike`; `fileQueues` field + serialized `onChange`; `close()` drain; both zip sites use the helper; unlink kind.
|
|
295
|
+
- `src/hnsw.ts` — `saveTo` persists the live `getCurrentCount()`.
|
|
296
|
+
- `tests/zip-hnsw-points.test.ts` (new), `tests/hnsw.test.ts`, `tests/watcher.test.ts`.
|
|
297
|
+
- test count 959 → 966 across README/COMPARISON/llms.txt/AGENTS/package.json/ROADMAP.
|
|
298
|
+
- version bump 3.9.0-rc.10 → 3.9.0-rc.11 (7 surfaces).
|
|
299
|
+
|
|
300
|
+
---
|
|
301
|
+
|
|
302
|
+
## [3.9.0-rc.10] — 2026-05-28
|
|
303
|
+
|
|
304
|
+
> **TL;DR:** **Closes overclaim #16 — OCR offline enforcement is now REAL (CRITICAL), plus the OCR canvas-OOM DoS.** The TSDoc/CLI-help/SECURITY.md all claimed `serve` "makes zero outbound network calls" / "no runtime CDN download" / "throws if a language isn't installed" and referenced an `install-ocr-lang` subcommand — but the code did none of it (`createWorker` silently CDN-fetched; the subcommand didn't exist). This RC builds the guards the docs promised: a **pre-flight cache check that throws fail-closed before the worker is created**, a real **`install-ocr-lang <code>` subcommand**, a worker pinned read-only to the local tessdata cache, an **absolute canvas-dimension clamp** (the `scale` cap was a false OOM guard for giant MediaBoxes), page-range validation, and **OIA Check 4e** which fails CI if any doc claims the offline guarantee while a code guard is absent (regression-proofs the #16 class, like Check 4d did for SLSA). **+15 tests (positive + NEGATIVE controls), all CI-runnable without the OCR optional deps. 944 → 959 tests.**
|
|
305
|
+
|
|
306
|
+
**Patch — audit-driven security (sprint RC 2): #16 + DoS.**
|
|
307
|
+
|
|
308
|
+
### Fixed
|
|
309
|
+
|
|
310
|
+
- **#16 OCR offline enforcement (CRITICAL — claimed-guarantee vs code-guard).** `src/ocr.ts`: `extractPdfWithOcr` now calls **`assertOcrLangsInstalled(langs, langPath)`** BEFORE loading any optional dep — it `existsSync`-checks every requested `<lang>.traineddata` in the local tessdata cache and throws (fail-closed), naming the exact `install-ocr-lang` command, if any is missing. The Tesseract worker is created with `langPath` + `cachePath` at the local cache and **`cacheMethod: "readOnly"`** (never writes/refetches). New **`resolveTessdataDir()`** (`$ENQUIRE_TESSDATA_DIR` → `$XDG_CACHE_HOME/enquire-mcp/tessdata` → `~/.cache/enquire-mcp/tessdata`). New CLI **`install-ocr-lang <code>`** subcommand (mirrors `install-model`) downloads `<code>.traineddata` from tessdata_fast into that dir — the ONLY OCR network call, explicit + opt-in, with strict `^[a-z0-9_]+$` code validation (no path-traversal / URL-injection). `serve` now makes **zero** outbound calls for OCR.
|
|
311
|
+
- **OCR canvas-OOM DoS (HIGH).** The `scale ∈ [0.5,4]` clamp bounds the multiplier, not the absolute pixel count — a PDF with a giant MediaBox (spec allows 14400×14400 pt) rendered to a multi-GB single-page canvas → OOM. New **`clampOcrScale(w, h, scale)`** lowers the effective scale so the larger rendered side never exceeds **`MAX_OCR_CANVAS_DIM`** (5000 px).
|
|
312
|
+
- **Inverted page range (LOW).** **`resolveOcrPageRange`** throws on an empty/inverted range (e.g. `pages:[5,2]`) instead of silently returning zero pages (which a caller could misread as "image-only scan").
|
|
313
|
+
- **Docs corrected to the enforced reality.** Rewrote SECURITY.md "OCR network posture" (was: `install-ocr-lang` "Deferred" + "the only outbound call in serve mode" — both now false) with the code-guard list + a stable `<a id="ocr-network-posture">` anchor; fixed the api.md broken anchor; updated `--ocr-pdfs`/`--ocr-langs` CLI help + api.md to cite `install-ocr-lang`.
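
Two of the guards above are pure functions and are easy to sketch. The constants (`MAX_OCR_CANVAS_DIM` = 5000, the `[0.5, 4]` scale range, the `^[a-z0-9_]+$` code pattern) come from this entry; the function bodies and the `Sketch` names are assumptions, including the assumption that one PDF point renders to one pixel at scale 1.

```typescript
// Hypothetical sketches; the real implementations live in src/ocr.ts and src/cli.ts.
const MAX_OCR_CANVAS_DIM = 5000; // px cap on the larger rendered side
const MIN_SCALE = 0.5;
const MAX_SCALE = 4;

function clampOcrScaleSketch(widthPt: number, heightPt: number, scale: number): number {
  // First bound the multiplier itself.
  const bounded = Math.min(MAX_SCALE, Math.max(MIN_SCALE, scale));
  const largestSide = Math.max(widthPt, heightPt);
  if (largestSide <= 0) return bounded;
  // Then bound the absolute pixel size. This may go below MIN_SCALE for
  // very large pages, which is the point of the clamp.
  return Math.min(bounded, MAX_OCR_CANVAS_DIM / largestSide);
}

// Strict language-code validation before the code is used in a path or URL.
function isValidOcrLangCode(code: string): boolean {
  return /^[a-z0-9_]+$/.test(code);
}
```

For an ordinary Letter-size page the requested scale passes through unchanged, while a 14400×14400 pt page is rendered at a much smaller effective scale so that neither side exceeds the cap.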
|
|
314
|
+
|
|
315
|
+
### Structural defense (closes the #16 class)
|
|
316
|
+
|
|
317
|
+
- **OIA Check 4e** (`scripts/oia-walk.mjs`) — the "claimed-guarantee vs code-guard" pattern applied to OCR (parallel to rc.8's SLSA Check 4d). If any of README/SECURITY.md/COMPARISON/api.md/llms.txt claims "zero outbound / no runtime CDN / install-ocr-lang" (non-roadmap), it asserts `src/ocr.ts` calls `assertOcrLangsInstalled` + sets `cacheMethod:"readOnly"` AND `src/cli.ts` registers `install-ocr-lang` — failing CI otherwise. **Verified non-vacuous** (all 3 guards detected present → silent for the right reason) **and with detection power** (would flag 4+ claim lines if a guard were removed). The generalized enforcement-verb grep remains a tracked ROADMAP item (this is the #16-specific guard, mirroring how 4d was #15-specific).
|
|
318
|
+
|
|
319
|
+
### Tests added (+15, positive + NEGATIVE controls)
|
|
320
|
+
|
|
321
|
+
`tests/ocr-offline.test.ts` (NEW) — `resolveTessdataDir` precedence (3), `ocrLangIsInstalled`/`assertOcrLangsInstalled` incl. multi-lang + missing-pack throw (5), `extractPdfWithOcr` pre-flight throw before any dep loads (1, the load-bearing #16 guard), `clampOcrScale` normal-unchanged + huge-MediaBox-shrinks (3), `resolveOcrPageRange` clamp + inverted-throws (3). All run without `tesseract.js`/`canvas`/`pdfjs` because the guards execute before those load.
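
To show why these guards can be tested without the OCR dependencies, here is a sketch of the directory precedence and the pre-flight check as pure functions. The environment, home directory, and file-existence predicate are passed in as parameters, which is an assumption made for illustration; the real `resolveTessdataDir` and `assertOcrLangsInstalled` signatures may differ, and paths are joined with `/` for simplicity.

```typescript
// Hypothetical sketches with injected dependencies.
type Env = Record<string, string | undefined>;

function resolveTessdataDirSketch(env: Env, homeDir: string): string {
  const explicit = env["ENQUIRE_TESSDATA_DIR"];
  if (explicit) return explicit; // 1. explicit override
  const xdg = env["XDG_CACHE_HOME"];
  if (xdg) return `${xdg}/enquire-mcp/tessdata`; // 2. XDG cache
  return `${homeDir}/.cache/enquire-mcp/tessdata`; // 3. default
}

function assertOcrLangsInstalledSketch(
  langs: string[],
  tessdataDir: string,
  exists: (path: string) => boolean,
): void {
  const missing = langs.filter((lang) => !exists(`${tessdataDir}/${lang}.traineddata`));
  if (missing.length > 0) {
    // Fail closed before any worker is created, and name the fix.
    throw new Error(
      `Missing OCR language data: ${missing.join(", ")}. ` +
        `Run install-ocr-lang <code> for each missing language first.`,
    );
  }
}
```

In production code the `exists` predicate would be something like `existsSync`; injecting it keeps the precedence and missing-pack logic checkable with plain in-memory fixtures.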
|
|
322
|
+
|
|
323
|
+
### Files changed
|
|
324
|
+
|
|
325
|
+
- `src/ocr.ts` — `resolveTessdataDir`/`ocrLangIsInstalled`/`assertOcrLangsInstalled`/`clampOcrScale`/`resolveOcrPageRange`/`MAX_OCR_CANVAS_DIM`; pre-flight + readOnly worker + canvas clamp + page-range in `extractPdfWithOcr`; TSDoc corrected to the enforced behavior.
|
|
326
|
+
- `src/cli.ts` — `install-ocr-lang` subcommand; `--ocr-pdfs`/`--ocr-langs` help cite it.
|
|
327
|
+
- `scripts/oia-walk.mjs` — Check 4e + header enumeration (11 → 12 blocks).
|
|
328
|
+
- `SECURITY.md`, `docs/api.md` — OCR posture rewrite + stable anchor + subcommand row.
|
|
329
|
+
- `tests/ocr-offline.test.ts` (new).
|
|
330
|
+
- test count 944 → 959 across README/COMPARISON/llms.txt/AGENTS/package.json/ROADMAP.
|
|
331
|
+
- version bump 3.9.0-rc.9 → 3.9.0-rc.10 (7 surfaces).
|
|
332
|
+
|
|
333
|
+
---
|
|
334
|
+
|
|
335
|
+
## [3.9.0-rc.9] — 2026-05-28
|
|
336
|
+
|
|
337
|
+
> **TL;DR:** **First RC of the post-audit sprint — input-validation security.** A second, five-agent comprehensive audit (core-retrieval code · server/transport/CLI code · docs/workflows/config · competitor landscape · repo-page/discoverability) ran against rc.8; `ROADMAP.md` is rewritten around its findings + the competitive read (we are capability-ahead of every Obsidian-MCP peer; the gap to the memory leaders is published benchmarks + discoverability, not tech). This RC ships the **P0 input-validation** findings: a real **ReDoS guard** on `obsidian_open_questions` (an always-registered tool that compiled a caller-supplied `pattern` straight into V8's backtracking engine and ran it over every line of every note — a remote DoS on `serve-http`), a defensive length cap on DQL `like`, and reconciliation of the bearer-token min-length check between the CLI and the transport. **No behavior change for legitimate callers. 927 → 944 tests** (+17, all with positive + negative controls).
|
|
338
|
+
|
|
339
|
+
**Patch — audit-driven security (sprint RC 1 of N).**
|
|
340
|
+
|
|
341
|
+
### The audit (sprint kickoff)
|
|
342
|
+
|
|
343
|
+
Five parallel agents re-read the project end-to-end on rc.8. Net: **zero CRITICAL beyond the already-tracked #16 OCR overclaim**; the codebase's path-safety, FTS5 escaping, int8 quantization, RRF/IR-metric, bearer-compare, CORS, and P2-10/11 session-lifecycle layers were all re-confirmed solid. New actionable findings were sequenced into a phased sprint (see `ROADMAP.md` Tier 1): **rc.9 input-validation (this RC) → rc.10 OCR offline enforcement + canvas-OOM → rc.11 watcher/HNSW correctness → rc.12 structural defenses + state-driven docs + supply-chain → rc.13 remaining correctness → rc.14 discoverability**. Audit checkpoint after each RC.
|
|
344
|
+
|
|
345
|
+
### Fixed (input-validation security)
|
|
346
|
+
|
|
347
|
+
- **ReDoS in `obsidian_open_questions` (HIGH).** `tools/meta.ts` compiled `args.pattern` (zod `z.string().optional()`, no constraint) into a `RegExp` and ran it per-line across the whole vault. The tool is **always registered** (not gated), so any stdio or bearer-authenticated `serve-http` client could submit a catastrophic-backtracking pattern (`(a+)+$`, `(.*)*`) and freeze the single-threaded event loop. Fix: a dependency-free **`isCatastrophicRegex`** guard that rejects "star height ≥ 2" patterns (an unbounded/amplifying quantifier applied to a group whose body also has one — honoring char-classes + escapes) **before** compile, plus a hard **`MAX_QUESTION_PATTERN_LEN` = 200** cap mirrored on the zod schema. The safe default pattern is unaffected (regression-guarded in tests).
|
|
348
|
+
- **DQL `like` length cap (defensive).** `dql.ts`'s `likeToRegex` is catastrophic-backtracking-**safe by construction** (it only ever emits `.*`, never a nested quantifier — re-confirmed by the audit), so this is **not** a ReDoS fix; it just bounds regex-compile/match CPU on an absurdly long user-supplied LIKE value via **`MAX_LIKE_PATTERN_LEN` = 512** (throws above it).
|
|
349
|
+
- **Bearer min-length reconciliation.** `cli.ts` accepted any non-empty `--bearer-token` while `startHttpServer` independently threw on `< 16` — so a short token passed the CLI gate then failed deeper with a less-friendly error. The `≥16` check now also fires in the CLI action (before any server setup), giving the user the `gen-token` hint + a clean `exit(1)`. The transport-layer check stays as defense-in-depth.
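
A simplified sketch of a "star height ≥ 2" check, in the spirit of the guard described in the first bullet. It is not the project's `isCatastrophicRegex`: the names are hypothetical, only `*`, `+`, and `{n,}` are treated as unbounded, and it is a heuristic that rejects a quantified group whose body also contains an unbounded quantifier. The 200-character cap is the `MAX_QUESTION_PATTERN_LEN` value from the entry.

```typescript
// Hypothetical sketch of a nested-quantifier heuristic.
const MAX_PATTERN_LEN = 200;

// Length of an unbounded quantifier starting at index i, or 0 if none.
function unboundedQuantifierLength(p: string, i: number): number {
  const c = p[i];
  if (c === "*" || c === "+") return 1;
  if (c === "{") {
    const m = /^\{\d+,\}/.exec(p.slice(i));
    return m ? m[0].length : 0;
  }
  return 0;
}

function looksCatastrophic(pattern: string): boolean {
  // stack[k] is true when group k's body already holds an unbounded quantifier.
  const stack: boolean[] = [false];
  let i = 0;
  while (i < pattern.length) {
    const c = pattern[i];
    if (c === "\\") { i += 2; continue; } // escaped char: skip both
    if (c === "[") { // character class: quantifier chars inside are literals
      i++;
      while (i < pattern.length && pattern[i] !== "]") i += pattern[i] === "\\" ? 2 : 1;
      i++;
      continue;
    }
    if (c === "(") {
      stack.push(false);
      i++;
      if (pattern[i] === "?") { // skip (?: (?= (?! (?<= (?<! (?<name>
        i++;
        if (pattern[i] === "<" && pattern[i + 1] !== "=" && pattern[i + 1] !== "!") {
          while (i < pattern.length && pattern[i] !== ">") i++;
          i++;
        } else if (pattern[i] === "<") i += 2;
        else i++;
      }
      continue;
    }
    if (c === ")") {
      const inner = stack.length > 1 ? (stack.pop() as boolean) : false;
      i++;
      const q = unboundedQuantifierLength(pattern, i);
      if (q > 0 && inner) return true; // quantified group with quantified body
      if (q > 0 || inner) stack[stack.length - 1] = true;
      i += q;
      continue;
    }
    const q = unboundedQuantifierLength(pattern, i);
    if (q > 0) { stack[stack.length - 1] = true; i += q; continue; }
    i++;
  }
  return false;
}

function isPatternAllowed(pattern: string): boolean {
  return pattern.length <= MAX_PATTERN_LEN && !looksCatastrophic(pattern);
}
```

The check runs before `new RegExp(...)` is ever called, so a rejected pattern never reaches the backtracking engine. Being a heuristic, it can reject some harmless patterns and does not cover every catastrophic shape (for example overlapping alternations).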
|
|
350
|
+
|
|
351
|
+
### ROADMAP refresh
|
|
352
|
+
|
|
353
|
+
`ROADMAP.md` rewritten after the second audit + competitive/discoverability survey: sharpened "#1 in our spheres" thesis, the phased rc.9→rc.14 sprint, a Tier-3 push to **publish LongMemEval scores** (the #1 credibility lever — no Obsidian MCP has any) + a "forgetting-aware" note-staleness signal (a frontier every memory competitor fails), and a "Requires the maintainer" section for the account/OAuth-gated discoverability actions (Glama claim, MCP Registry re-submit, forum post).
|
|
354
|
+
|
|
355
|
+
### Tests added (+17, all positive + negative controls)
|
|
356
|
+
|
|
357
|
+
- `tests/redos-guard.test.ts` (NEW) — 13 catastrophic patterns flagged (NEGATIVE), 11 safe patterns accepted incl. the production default (POSITIVE regression guard), `readUnboundedQuantifier` unit cases, + 4 `getOpenQuestions` integration cases (rejects catastrophic/over-long; accepts safe/default). The catastrophic *integration* fixture is built at runtime (`String.fromCharCode`) so CodeQL's `js/redos` static pass doesn't flag a regex literal that the guard rejects before compile — keeps "0 new CodeQL alerts" true (caught by the advisory CodeQL gate on the first PR push).
|
|
358
|
+
- `tests/dql.test.ts` — `likeToRegex` cap: normal pattern matches (POSITIVE), boundary at the cap passes, over-long throws (NEGATIVE).
|
|
359
|
+
- `tests/cli.test.ts` — `serve-http` short-token → `exit(1)` + "≥16 chars" hint (NEGATIVE); no-token → "required" with the length error explicitly NOT firing (contrast control).
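
The `like` cap covered by `tests/dql.test.ts` could be sketched as follows. `MAX_LIKE_PATTERN_LEN` = 512 and the "only ever emits `.*`" property come from the entry above; everything else is an assumption, including anchoring the result with `^…$`, treating `%` as the only wildcard, and the `Sketch` name.

```typescript
// Hypothetical sketch; the real likeToRegex in src/dql.ts may differ.
const MAX_LIKE_PATTERN_LEN = 512;

function likeToRegexSketch(like: string): RegExp {
  // Bound compile/match cost on absurdly long inputs.
  if (like.length > MAX_LIKE_PATTERN_LEN) {
    throw new Error(
      `LIKE pattern too long (${like.length} > ${MAX_LIKE_PATTERN_LEN})`,
    );
  }
  const source = like
    .split("%")
    // Escape every regex metacharacter in the literal parts...
    .map((literal) => literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    // ...so the only quantifier in the output is the `.*` that replaces `%`.
    .join(".*");
  return new RegExp(`^${source}$`);
}
```

Because user text is escaped and `%` maps to a flat `.*`, the generated regex never contains a nested quantifier, which is why the cap is a CPU bound rather than a ReDoS fix.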
|
|
360
|
+
|
|
361
|
+
### Files changed
|
|
362
|
+
|
|
363
|
+
- `src/tools/meta.ts` — `isCatastrophicRegex` + `readUnboundedQuantifier` + `MAX_QUESTION_PATTERN_LEN` + guarded compile in `getOpenQuestions`.
|
|
364
|
+
- `src/tool-registry.ts` — `.max(MAX_QUESTION_PATTERN_LEN)` on the `pattern` schema + import.
|
|
365
|
+
- `src/dql.ts` — `MAX_LIKE_PATTERN_LEN` + cap in `likeToRegex` (exported for tests).
|
|
366
|
+
- `src/cli.ts` — bearer `≥16` check in the `serve-http` action.
|
|
367
|
+
- `ROADMAP.md` — full rewrite (post-audit).
|
|
368
|
+
- `tests/redos-guard.test.ts` (new), `tests/dql.test.ts`, `tests/cli.test.ts`.
|
|
369
|
+
- test count 927 → 944 across README/COMPARISON/llms.txt/AGENTS/package.json; README suite-timing ~5s → ~12s (audit LOW).
|
|
370
|
+
- version bump 3.9.0-rc.8 → 3.9.0-rc.9 (7 surfaces).
|
|
371
|
+
|
|
372
|
+
---
|
|
373
|
+
|
|
374
|
+
## [3.9.0-rc.8] — 2026-05-28
|
|
375
|
+
|
|
376
|
+
> **TL;DR:** **Integrity-batch #2 from the exhaustive file-by-file audit** (every `src/` module, every doc, every workflow, every script re-read on Opus 4.8). Closes the cheap-but-real drift the audit surfaced and adds the FIRST structural defense for the "claimed-guarantee vs code-guard" class introduced in rc.7: a new **OIA Check 4d** that reads `.github/workflows/release.yml`, computes the SLSA Build Level it actually earns, and fails CI if any doc claims a higher level. Also: a bench-harness honesty fix (a 5-sample "p99" that always returned the max — relabeled `max`), determinism fix (`Date.now()` tag → stable), the privacy-test soft-skips made VISIBLE via `ctx.skip()` + a CI tripwire that fails loudly if the native deps that gate them ever go missing in CI, two stale test-title positioning claims, a benchmarks rounding drift, a biome binary/schema unification (2.4.14/2.4.15 → 2.4.16), and a stale Node placeholder in the bug template. **Docs/tests/scripts/config only — zero `src/` runtime logic changed. 926 → 927 tests (+1 CI tripwire).**
|
|
377
|
+
|
|
378
|
+
**Patch — audit-driven integrity (Tier 0, batch 2).**
|
|
379
|
+
|
|
380
|
+
### Fixed
|
|
381
|
+
|
|
382
|
+
- **S2 — OIA Check 4d: SLSA-level code-guard (structural defense for the rc.7 #15 class).** rc.7 *corrected* the SLSA-3→L2 overclaim by hand; this rc makes the regression **structurally impossible**. New `scripts/oia-walk.mjs` Check 4d Part A statically reads `release.yml`: `earnsL3 = /slsa-framework\/slsa-github-generator/`, `doesProvenance = /npm publish[^\n]*--provenance/` → `earnedLevel = earnsL3 ? 3 : doesProvenance ? 2 : 0`. It then greps the claim surfaces (README, package.json, llms.txt, COMPARISON, STABILITY) for an L3 claim (`/\bSLSA[-\s]?3\b|…L(?:evel\s*)?3\b|levels#build-l3/i`) and fails if any claim exceeds the earned level — with a roadmap-context skip so "L3 on the roadmap" stays legal. Part B (opt-out via `--skip-network`) checks the published attestation. This is the first concrete instance of the rc.7-promised "enforcement-verb code-guard" defense.
|
|
383
|
+
- **S1 — bench "p99" was always the max (honesty fix).** `scripts/bench.mjs` runs `RUNS=5` then took `quantile(samples, 0.99)`, which on 5 sorted samples is unconditionally `samples[4]` = the maximum. Reporting it as "p99" overstated tail rigor. Relabeled to `max` in the return object, the table header, and `bench/results.md` (the *values* were always the max — only the label was wrong, so no number moved).
|
|
384
|
+
- **M3 — bench determinism.** The write-path micro-bench used `#new-tag-${Date.now()}`, making every run mutate a different note and defeating run-to-run comparability. Pinned to `#new-tag-stable`.
|
|
385
|
+
- **T1 — privacy tests: visible skips + a CI tripwire (the silent-skip class).** `tests/cli-privacy-filters.test.ts` guarded 6 security-critical privacy assertions behind `if (!distExists() || !canRunFts5) return;` — a SILENT pass when the build or `better-sqlite3` was absent, exactly the failure mode that hides regressions. Converted all 6 to `(ctx) => { if (…) return ctx.skip(); … }` so a skip is *visible* in the reporter, and added one **CI GUARD** test that hard-asserts (when `process.env.CI`) that the dist build AND a live FTS5 query both work — so if the native-dep preconditions ever vanish in CI, the suite fails loudly instead of silently skipping the privacy coverage. The single guard transitively protects every other native-dep soft-skip (same CI preconditions). **This is the +1 test (926 → 927).**
|
|
386
|
+
- **W1 — stale positioning in test titles.** `tests/github-metadata-invariant.test.ts` had two `it(...)` titles still describing the pre-v3.7.8 "Memory layer for AI agents" lead and "v3.6.3 hype keywords" — while the assertions already pinned `ABOUT_LEADS_WITH = /^The most advanced Obsidian MCP/i`. Titles realigned to what the code actually checks (α-class TSDoc-drift sibling, but in test descriptions).
|
|
387
|
+
- **S4 — benchmarks rounding drift.** `docs/benchmarks.md` line 30 said "+25 MRR / +16 NDCG@10" (rounded) while every other surface uses the precise measured "+24.7 MRR / +15.5 NDCG@10". Unified to the precise figures.
|
|
388
|
+
- **C1 — biome binary/schema unification.** Installed binary was 2.4.14, `biome.json` `$schema` pinned 2.4.15, `package.json` devDep `^2.4.15`. Bumped all three to **2.4.16** (latest). Clean bump — `lint:fix` reformatted one long line I'd added to `oia-walk.mjs`; zero new rule violations.
|
|
389
|
+
- **bug_report.yml Node placeholder.** `.github/ISSUE_TEMPLATE/bug_report.yml` example was `v20.11.0`, below the `engines.node >= 22.13.0` floor — a reporter copying it would file an unsupported version. → `v22.13.0`.
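
The static half of Check 4d (S2 above) can be sketched from the expressions quoted in that bullet. The two workflow regexes and the `earnedLevel` ternary are taken from the entry; the claim regex below keeps only the alternatives that appear un-elided there, and the per-line `roadmap` skip and all function names are assumptions.

```typescript
// Hypothetical sketch of a claimed-level vs earned-level comparison.
type SlsaLevel = 0 | 2 | 3;

function earnedSlsaLevel(releaseYml: string): SlsaLevel {
  const earnsL3 = /slsa-framework\/slsa-github-generator/.test(releaseYml);
  const doesProvenance = /npm publish[^\n]*--provenance/.test(releaseYml);
  return earnsL3 ? 3 : doesProvenance ? 2 : 0;
}

// Partial claim pattern: the changelog elides one alternative, which is
// intentionally not reconstructed here.
const L3_CLAIM = /\bSLSA[-\s]?3\b|levels#build-l3/i;
const ROADMAP_CONTEXT = /roadmap/i; // assumed form of the roadmap-context skip

// `docs` maps a file name to its text; returns "file:line" for each overclaim.
function findSlsaOverclaims(docs: Record<string, string>, earned: SlsaLevel): string[] {
  if (earned >= 3) return [];
  const hits: string[] = [];
  for (const file of Object.keys(docs)) {
    const text = docs[file] ?? "";
    text.split("\n").forEach((line, i) => {
      if (L3_CLAIM.test(line) && !ROADMAP_CONTEXT.test(line)) hits.push(`${file}:${i + 1}`);
    });
  }
  return hits;
}
```

The design point is that the earned level is derived from the workflow itself, so a doc edit alone cannot make a higher claim pass; only changing how the release is built can.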
|
|
390
|
+
|
|
391
|
+
### Why these are batched
|
|
392
|
+
|
|
393
|
+
All nine are state-driven findings from re-reading the repo file-by-file (the methodology gap CLAUDE.md documents: change-driven sweeps miss files not actively edited). None touch `src/` runtime behavior — they harden the *audit apparatus* (S2), *measurement honesty* (S1/M3/S4), *test visibility* (T1/W1), and *toolchain/template hygiene* (C1/bug_report). Higher-risk items stay sequenced per plan: **#16 OCR offline enforcement → rc.9; H1 watcher per-file serialization → rc.10.**
|
|
394
|
+
|
|
395
|
+
### Files changed
|
|
396
|
+
|
|
397
|
+
- `scripts/oia-walk.mjs` — Check 4d SLSA-level guard (Part A static + Part B network) + honest header enumeration of all 8 checks / 11 blocks.
|
|
398
|
+
- `scripts/bench.mjs` — `p99`→`max` (return obj + header); `Date.now()` tag → `#new-tag-stable`.
|
|
399
|
+
- `bench/results.md` — `p50 / p99` → `p50 / max` column label.
|
|
400
|
+
- `tests/cli-privacy-filters.test.ts` — 6 soft-skips → `ctx.skip()`; +1 CI GUARD tripwire.
|
|
401
|
+
- `tests/github-metadata-invariant.test.ts` — 2 stale test titles realigned to assertions.
|
|
402
|
+
- `docs/benchmarks.md` — +25/+16 → +24.7/+15.5.
|
|
403
|
+
- `biome.json` + `package.json` — biome 2.4.15 → 2.4.16.
|
|
404
|
+
- `.github/ISSUE_TEMPLATE/bug_report.yml` — Node placeholder v20.11.0 → v22.13.0.
|
|
405
|
+
- `ROADMAP.md` — re-sequenced #16 OCR offline (rc.8 → rc.9) + Tier 1 watcher/H1 (rc.9 → rc.10) since rc.8 became the integrity-batch; noted Check 4d as partial progress on the structural drift-class item.
|
|
406
|
+
- `README.md`, `docs/COMPARISON.md`, `llms.txt`, `AGENTS.md`, `package.json` — test count 926 → 927.
|
|
407
|
+
- version bump 3.9.0-rc.7 → 3.9.0-rc.8 (7 surfaces).
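
Why the `p99` → `max` relabel in `scripts/bench.mjs` moved no numbers can be seen from a small sketch. The changelog states that the 0.99 quantile of 5 sorted samples was always `samples[4]`; the nearest-rank definition below is an assumption that is consistent with that statement, not the script's actual `quantile` function.

```typescript
// Hypothetical nearest-rank quantile over samples sorted ascending.
function nearestRankQuantile(sortedAsc: number[], q: number): number {
  if (sortedAsc.length === 0) throw new Error("no samples");
  const rank = Math.ceil(q * sortedAsc.length); // 1-based rank
  const index = Math.min(sortedAsc.length - 1, Math.max(0, rank - 1));
  return sortedAsc[index] as number;
}

// With 5 runs, ceil(0.99 * 5) = 5, so the "p99" is always the last element.
const fiveRuns = [10, 11, 12, 13, 50];
const p99OfFive = nearestRankQuantile(fiveRuns, 0.99);
const maxOfFive = fiveRuns[fiveRuns.length - 1] as number;
```

Under this definition `p99OfFive` and `maxOfFive` are the same element for any five samples, so a tail percentile only becomes distinguishable from the maximum once the sample count is large enough for the rank to land below the last position.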
|
|
408
|
+
|
|
409
|
+
### Stats
|
|
410
|
+
|
|
411
|
+
- **927 unit tests** (+1 CI tripwire) — all passing.
|
|
412
|
+
- Lint clean (biome 2.4.16, 0 warnings). `tsc` strict clean. OIA clean (8 checks incl. new 4d). scope-completeness clean.
|
|
413
|
+
|
|
414
|
+
---
|
|
415
|
+
|
|
416
|
+
## [3.9.0-rc.7] — 2026-05-25
|
|
417
|
+
|
|
418
|
+
> **TL;DR:** **Tier 0 integrity batch from a full project audit** (deep code audit of all 31 src/ modules + docs/workflows/config audit + competitive survey of the Obsidian-MCP / AI-memory / RAG-MCP landscapes). Of the two brand-critical overclaims the audit surfaced, this RC fixes **#15 SLSA-3** (badge linked to the slsa.dev **L3** spec + 8+ surfaces claimed "SLSA-3", but `release.yml` only runs `npm publish --provenance` = SLSA Build **L2**); it also corrects pervasive version/RC drift + an undersold reranker number. Adds a public **ROADMAP.md**, gitignores the stray `false/` npm-cache tree, adds a `version` field to `CITATION.cff`, and documents a new overclaim anti-pattern (the "claimed-guarantee vs code-guard" class behind #15 + #16). **Docs/config-only; 926 tests unchanged. The OCR-offline-enforcement overclaim (#16, "implement" decision) ships in rc.8; the watcher live-update race (H1) in rc.9.**
|
|
419
|
+
|
|
420
|
+
**Patch — audit-driven integrity (Tier 0).**
|
|
421
|
+
|
|
422
|
+
### The audit
|
|
423
|
+
|
|
424
|
+
Three parallel passes:
|
|
425
|
+
1. **Deep code audit** (all `src/*.ts` + `src/tools/*.ts`, whole files): **zero CRITICAL**. The codebase is well-hardened (constant-time bearer compare, ReDoS-safe glob/like walkers, fail-closed `.base` predicates, transactional SQLite). Residual: 1 HIGH (watcher race, H1), 1 HIGH (OCR offline overclaim, #16), 5 MEDIUM, 5 LOW.
|
|
426
|
+
2. **Docs/workflows/config audit**: SLSA-3 overclaim (#15), version drift, OIA self-count drift (docs say "6 checks", code has 8), reranker undersell, `false/` junk dir, no ROADMAP, missing OSS-health files.
|
|
427
|
+
3. **Competitive survey**: enquire is technically ahead of every Obsidian-MCP peer (CRUD-only or REST-plugin-dependent); near-parity with local-RAG MCPs (knowledge-rag); behind AI-memory frameworks (mem0/cognee/Letta/Zep) only on **published LoCoMo numbers**, **entity knowledge graph**, and **discoverability** (8★). Letta's "filesystem memory scores 74% LoCoMo" validates our vault-as-memory thesis.
|
|
428
|
+
|
|
429
|
+
### Fixed in this rc.7 (Tier 0)
|
|
430
|
+
|
|
431
|
+
- **#15 SLSA-3 → SLSA L2 (overclaim instance #15).** Real mechanism is `npm publish --provenance` + GitHub OIDC = a Sigstore-signed provenance attestation = **SLSA Build Level 2** (hosted builder + non-forgeable-by-author provenance). Level 3 needs an isolated builder via `slsa-framework/slsa-github-generator`. Corrected every surface: README badge (now links to the L2 spec) + hero line + comparison table + releases row, package.json description + keyword (`slsa-3` → `build-provenance`), llms.txt (×2), docs/COMPARISON.md (×2). Earning real L3 is now a tracked **ROADMAP Tier 4** item, not a claim.
|
|
432
|
+
- **Version/RC drift.** README "Pre-release: currently v3.9.0-rc.3" → rc.6; QUICKSTART version example → rc.6; benchmarks.md "still valid as of rc.3" → rc.6; AGENTS.md "OIA — 6 checks" → 8 (×2); CLAUDE.md OIA-walk description "6 cheap walks" → 8 + the rc.4 "(current)" marker corrected.
|
|
433
|
+
- **Reranker undersold → measured numbers.** README (3 sites) + llms.txt: "+5-10 NDCG@10 typical" → **+15.5 NDCG@10 / +24.7 MRR measured** (the figure already in COMPARISON.md + benchmarks.md). The repo was undercutting its own measured, reproducible result by ~50%.
|
|
434
|
+
- **`false/` npm-cache junk → `.gitignore`.** A stray `--cache false` / `npm_config_cache=false` mis-parse created an untracked `_cacache`/`_logs` tree at repo root; one `git add .` would have committed it.
|
|
435
|
+
- **CITATION.cff** gains `version` (tracks the @latest stable line and is deliberately excluded from the version-consistency gate) + `date-released`.
|
|
436
|
+
- **New `ROADMAP.md`** — public, tiered (Tier 0 integrity → Tier 1 correctness → Tier 2 LoCoMo benchmarks → Tier 3 GraphRAG-full / conversational write-back → Tier 4 discoverability + real SLSA-L3). Linked from README.
|
|
437
|
+
- **New anti-pattern documented (CLAUDE.md):** "Never claim an ENFORCED guarantee the code doesn't actually enforce" — the class behind overclaim #15 (SLSA) + #16 (OCR offline). The invariant apparatus checks numeric/doc drift but had no defense for "we promise enforcement X; does a code path enforce X?". Candidate structural defense (deferred): an OIA enforcement-verb grep.
|
|
438
|
+
|
|
439
|
+
### Deferred to the next RCs (tracked in ROADMAP.md)
|
|
440
|
+
|
|
441
|
+
- **rc.8 — #16 OCR offline enforcement (HIGH, "implement" decision).** SECURITY.md claims "zero outbound network calls in serve mode" and `ocr.ts` TSDoc claims a pre-flight "throws if language not installed" check, but `extractPdfWithOcr` only warns then `createWorker` silently CDN-fetches; `install-ocr-lang` is referenced in 4 files but never existed. Implement: pre-flight cache check + `langPath` wiring + real `install-ocr-lang` subcommand + env-gated integration test.
|
|
442
|
+
- **rc.9 — H1 watcher per-file serialization (HIGH).** Fire-and-forget `handle()` lets concurrent saves to one file interleave `applyDiff` + the shared `rowsByLabel` mutation → in-memory HNSW drift. Add a per-relPath promise queue + concurrent-event test. Plus M1 (HNSW `saveTo` live count), L2 (unlink kind).
|
|
443
|
+
- _**Re-sequenced after this entry** (rc.13 doc fix): the rc.8 integrity-batch pivot pushed both items back two RCs — **#16 OCR offline enforcement actually shipped in v3.9.0-rc.10**, **H1 watcher serialization in v3.9.0-rc.11** (see those entries). The "ships in rc.8 / rc.9" lines above are the original rc.7 plan, preserved as history._
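The per-relPath promise queue proposed for H1 can be sketched as follows. This is a minimal illustration, not the watcher's actual code: `KeyedSerialQueue`, `run`, and the `demo` harness are hypothetical names, and the real `handle()` / `applyDiff` signatures are not shown in this changelog.

```typescript
// Hypothetical helper (not enquire's actual watcher code): serializes async
// work per key, so two events for the same relPath can never interleave.
class KeyedSerialQueue {
  private tails = new Map<string, Promise<void>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    // Start this task only after the previous task for the same key settled.
    const result = prev.then(task);
    // The stored tail never rejects, so one failed task cannot block the key.
    const tail: Promise<void> = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(key, tail);
    // Drop the map entry once the queue for this key has drained.
    void tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }
}

// Demo: two "saves" to one file that would interleave without the queue.
async function demo(): Promise<string[]> {
  const queue = new KeyedSerialQueue();
  const log: string[] = [];
  const save = (name: string) => async (): Promise<void> => {
    log.push(`${name}:start`);
    await Promise.resolve(); // yield, as real work would while awaiting I/O
    log.push(`${name}:end`);
  };
  await Promise.all([
    queue.run("note.md", save("a")),
    queue.run("note.md", save("b")),
  ]);
  return log;
}
```

With the queue in place the demo log comes back in the order `a:start, a:end, b:start, b:end`. Events for different keys still run concurrently, which keeps the serialization cost limited to same-file bursts.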
|
|
444
|
+
|
|
445
|
+
### Files changed
|
|
446
|
+
|
|
447
|
+
- `README.md` — SLSA badge/hero/table/releases; reranker numbers (×3); RC currency; ROADMAP link.
|
|
448
|
+
- `package.json` — description SLSA wording + `slsa-3`→`build-provenance` keyword.
|
|
449
|
+
- `llms.txt` — SLSA (×2) + reranker number.
|
|
450
|
+
- `docs/COMPARISON.md` — SLSA row + provenance paragraph.
|
|
451
|
+
- `docs/QUICKSTART.md`, `docs/benchmarks.md` — RC currency.
|
|
452
|
+
- `AGENTS.md`, `CLAUDE.md` — OIA check count (6→8); CLAUDE status rc.7 entry + new anti-pattern.
|
|
453
|
+
- `CITATION.cff` — version + date-released.
|
|
454
|
+
- `.gitignore` — `false/`.
|
|
455
|
+
- `ROADMAP.md` — new file.
|
|
456
|
+
- version bump 3.9.0-rc.6 → 3.9.0-rc.7 (7 surfaces).
|
|
457
|
+
|
|
458
|
+
---
|
|
459
|
+
|
|
460
|
+
## [3.9.0-rc.6] — 2026-05-25
|
|
461
|
+
|
|
462
|
+
> **TL;DR:** **HNSW disk persistence on live update.** When the watcher applies HNSW live updates (`applyDiff`) during a serve session, the in-memory index diverges from the persisted `.hnsw.bin` sidecar. This rc re-persists the live-updated index at watcher **close time** so the next serve loads the up-to-date sidecar (~50ms) instead of rebuilding from embed-db (~25s on 50K chunks). Correctness was always guaranteed by the signature guard (a stale sidecar is ignored → safe rebuild); this is purely a restart-speed optimization. Chose close-time flush over a debounced during-serve timer: same restart benefit, no timer-lifecycle complexity, no mid-serve disk I/O. **+3 tests (2 POSITIVE + 1 NEGATIVE control); 926 unit tests total. No API breaks (additive).**
|
|
463
|
+
|
|
464
|
+
**Patch — restart-speed optimization.**
|
|
465
|
+
|
|
466
|
+
### Why close-time flush (not debounced during serve)
|
|
467
|
+
|
|
468
|
+
The originally-planned design was "debounced `saveTo` ~30s after the last mutation". On reflection, close-time flush is the better risk-adjusted choice:
|
|
469
|
+
|
|
470
|
+
- **Correctness is already guaranteed** by the signature guard. `loadHnswFromDisk` recomputes the embed-db signature at load time and rebuilds on mismatch. After live edits, the embed-db signature changes, so a STALE `.hnsw.bin` is simply ignored → safe (just slower) rebuild. So persisting-on-live-update is ONLY a speed optimization, never a correctness fix.
|
|
471
|
+
- **The only benefit is restart speed**, and that benefit is identical whether you persist debounced-during-serve or once-at-close: either way the NEXT serve loads a current sidecar.
|
|
472
|
+
- **Close-time is lower risk**: no `setTimeout`/`clearTimeout` lifecycle to leak on `close()`, no concurrent save-vs-mutate window mid-serve, no disk I/O churn during active use.
|
|
473
|
+
- **Tradeoff**: a `SIGKILL` or crash bypasses `close()` and therefore skips the flush, but the signature guard makes that safe (the next serve falls back to a rebuild). Crashes are rare, so a one-time ~25s rebuild is an acceptable cost compared with the complexity of a debounce timer.
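The signature-guard argument above can be illustrated with a small sketch. The types and the `loadOrRebuild` function are hypothetical; the changelog only states that `loadHnswFromDisk` recomputes the embed-db signature and rebuilds on mismatch, so the exact shape below is an assumption.

```typescript
// Hypothetical shapes: the real sidecar metadata format is not shown here.
interface SidecarMeta {
  signature: string; // embed-db signature recorded when the sidecar was saved
}

type LoadResult<I> =
  | { kind: "loaded"; index: I }
  | { kind: "rebuild"; reason: string };

// Decide whether a persisted index may be reused. A missing or stale sidecar
// is never an error: it only means the slower rebuild path is taken.
function loadOrRebuild<I>(
  meta: SidecarMeta | null,
  currentSignature: string,
  readIndex: () => I,
): LoadResult<I> {
  if (meta === null) {
    return { kind: "rebuild", reason: "no sidecar on disk" };
  }
  if (meta.signature !== currentSignature) {
    return { kind: "rebuild", reason: "sidecar signature is stale" };
  }
  return { kind: "loaded", index: readIndex() };
}
```

Because a mismatch always routes to the rebuild branch, skipping the close-time flush can only cost restart time, which is why the flush is treated as an optimization rather than a correctness fix.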
|
|
474
|
+
|
|
475
|
+
### Implementation
|
|
476
|
+
|
|
477
|
+
`src/watcher.ts`:
|
|
478
|
+
- New fields `hnswPersistFile: string | null` + `hnswDirty: boolean`.
|
|
479
|
+
- `attachHnsw(hnsw, rowsByLabel, persistFile?)` — gains an optional `persistFile` param (the `<embed-db>.hnsw` sidecar base path). Omitted (or `--no-hnsw-persist`) → no flush.
|
|
480
|
+
- `syncHnswForFile` sets `hnswDirty = true` after every successful `applyDiff`.
|
|
481
|
+
- New `flushHnswToDisk(): Promise<boolean>` — no-op unless dirty + index + rowsByLabel + persistFile + embedDb all wired. Recomputes the embed-db signature so the persisted `.meta.json` matches what the next `loadHnswFromDisk` expects, then `await hnsw.saveTo(...)`. Fail-soft (a save error is logged + swallowed; signature guard → safe rebuild). Returns whether a flush happened.
|
|
482
|
+
- `close()` awaits `flushHnswToDisk()` before closing the chokidar watcher.
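A rough sketch of the dirty-flag, fail-soft flush described in this list follows. `LiveIndexFlusher` and its members are hypothetical stand-ins for the watcher fields, the injected `save` callback stands in for `hnsw.saveTo(...)`, and returning `false` after a failed save is an assumption (the entry only says the error is logged and swallowed).

```typescript
// Hypothetical stand-in for the watcher's persistence bookkeeping.
class LiveIndexFlusher {
  private dirty = false;
  private readonly persistFile: string | null;
  private readonly save: (file: string) => Promise<void>;
  private readonly log: (msg: string) => void;

  constructor(
    persistFile: string | null,
    save: (file: string) => Promise<void>,
    log: (msg: string) => void = () => {},
  ) {
    this.persistFile = persistFile;
    this.save = save;
    this.log = log;
  }

  // Called after every successful live update of the in-memory index.
  markDirty(): void {
    this.dirty = true;
  }

  // Returns whether a flush actually happened.
  async flush(): Promise<boolean> {
    if (!this.dirty || this.persistFile === null) return false; // no-op
    try {
      await this.save(this.persistFile);
      this.dirty = false;
      return true;
    } catch (err) {
      // Fail-soft: log and swallow; a stale sidecar only costs a rebuild.
      this.log(`index flush failed: ${String(err)}`);
      return false;
    }
  }

  // Flush before any other teardown, mirroring the close() ordering above.
  async close(): Promise<void> {
    await this.flush();
  }
}
```

Keeping the flag and the persist path in one place makes the three no-op conditions (not dirty, no persist file, save failure) easy to test in isolation.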
|
|
483
|
+
|
|
484
|
+
`src/server.ts`: both `attachHnsw` call sites (built-fresh + loaded-from-disk HNSW paths) now pass `persistFile` — gated on `opts.hnswPersist !== false` so `--no-hnsw-persist` correctly skips the close-time flush too.
|
|
485
|
+
|
|
486
|
+
### Tests added (+3)
|
|
487
|
+
|
|
488
|
+
`tests/watcher.test.ts` — new describe block `VaultWatcher HNSW disk persistence (v3.9.0-rc.6)`:
|
|
489
|
+
- POSITIVE: `flushHnswToDisk is a no-op when no live update occurred (not dirty)` — no sidecar written.
|
|
490
|
+
- POSITIVE: `close() flushes the live-updated index to a loadable sidecar with matching signature` — full integration: real EmbedDb + mock embedder + real `buildHnsw` + FtsIndex → file edit → `applyDiff` → `close()` → assert `.hnsw.bin` exists AND `loadHnswFromDisk(persistFile, postEditSignature)` returns non-null. This integration test also lifted `watcher.ts` branch coverage 55.05% → 59.58%.
|
|
491
|
+
- NEGATIVE control: `flushHnswToDisk is a no-op when persistFile was omitted` — even with a live mutation, no `persistFile` → no flush.
|
|
492
|
+
|
|
493
|
+
### Files changed
|
|
494
|
+
|
|
495
|
+
- `src/watcher.ts` — `hnswPersistFile`/`hnswDirty` fields + `flushHnswToDisk()` + `attachHnsw` param + `close()` flush (+50 lines).
|
|
496
|
+
- `src/server.ts` — pass `persistFile` to both `attachHnsw` call sites.
|
|
497
|
+
- `tests/watcher.test.ts` — 3 new tests (~120 lines).
|
|
498
|
+
- `scripts/check-per-file-coverage.mjs` — watcher coverage comment refreshed (55.05% → 59.58%; floor stays 53%).
|
|
499
|
+
- `README.md`, `llms.txt`, `AGENTS.md`, `docs/COMPARISON.md`, `package.json` — test count 923 → 926.
|
|
500
|
+
- version bump 3.9.0-rc.5 → 3.9.0-rc.6 (7 surfaces).
|
|
501
|
+
|
|
502
|
+
### What's next
|
|
503
|
+
|
|
504
|
+
- **v3.9.0 stable** — promote `@rc → @latest`. All architectural v3.9.0 items now shipped (OCR'd PDF watcher embed-sync rc.1, HNSW in-memory live update rc.2, R-10 adaptive refill rc.3, HNSW disk persistence rc.6). Gated on a fresh external audit on the v3.9.0-rc.2+ commit per `docs/audits/AUDIT-REQUEST-v3.9.0-rc.2-2026-05-25.md` (the v3.6.1 ≥2-independent-external-auditors rule).
|
|
505
|
+
- **v3.9.x+ backlog** — `install-ocr-lang` subcommand (with env-gated integration test); HNSW filter-during-search (structural R-10 closure); serve-http parity residual (P1-3); the remaining P2/P3 items.
|
|
506
|
+
|
|
507
|
+
---
|
|
508
|
+
|
|
509
|
+
## [3.9.0-rc.5] — 2026-05-25
|
|
510
|
+
|
|
511
|
+
> **TL;DR:** **OCR install-instruction unification — closes the μ-class doc inconsistency the v3.9.0-rc.4 fix itself introduced (overclaim #14 residual).** rc.4's fix for overclaim #14 replaced the (non-existent) `install-ocr-lang` references in `cli.ts`/`api.md` with a "download from github tessdata_fast" instruction — but `SECURITY.md:167` documented a *different* procedure ("run OCR once online, copy `tessdata/`"). Two divergent install paths. This rc.5 unifies all three surfaces on the canonical run-once-then-copy procedure (SECURITY.md is the single source of truth), and refreshes the stale `SECURITY.md` roadmap stamp ("(v3.8.0)" → "planned, not yet shipped as of v3.9.0") with the deferral rationale (the `install-ocr-lang` subcommand needs `langPath`/`cachePath` wiring in `src/ocr.ts` that CI can't exercise — tesseract.js + canvas are optional deps absent from the matrix). **Docs-only; 923 unit tests unchanged.**
|
|
512
|
+
|
|
513
|
+
**Patch — docs consistency (audit-driven self-correction).**
|
|
514
|
+
|
|
515
|
+
### Why this exists
|
|
516
|
+
|
|
517
|
+
This is a self-audit finding on rc.4's own diff (the CLAUDE.md "post-merge re-sweep" rule since v3.7.15 — after every audit-driven release that closes a class finding, scan that patch's own diff for fresh instances of the same class). rc.4 closed overclaim #14 (the `install-ocr-lang` subcommand was referenced as if it existed) by swapping the references for a manual `tessdata_fast` download instruction. But that swap was hasty — it created a NEW inconsistency: `SECURITY.md` already documented the canonical "run OCR once online to populate the `tessdata/` cache, then copy to the offline host" procedure, and rc.4's `tessdata_fast` instruction diverged from it without specifying the exact cache dir.
|
|
518
|
+
|
|
519
|
+
This is the **μ-class** (instruction inconsistency across docs) — same class swept in v3.7.20 task #24.
|
|
520
|
+
|
|
521
|
+
### Fixes
|
|
522
|
+
|
|
523
|
+
- **`src/cli.ts`** (`--ocr-pdfs` + `--ocr-langs` help text): now point at SECURITY.md's canonical procedure instead of a standalone `tessdata_fast` instruction.
|
|
524
|
+
- **`docs/api.md`** (`--ocr-pdfs` flag row): same — references the canonical procedure.
|
|
525
|
+
- **`SECURITY.md`**: added an explicit "**Current install procedure (canonical)**" paragraph (the run-once-then-copy approach, with `tessdata_fast` as a documented alternative). Refreshed the "**Roadmap (v3.8.0)**" heading → "**Roadmap (planned, not yet shipped as of v3.9.0 — re-targeted from the original v3.8.0 plan)**" and documented WHY `install-ocr-lang` is deferred: it requires wiring a stable `langPath`/`cachePath` into `src/ocr.ts`'s `createWorker`, and the network-download path can't be exercised in CI, so it needs an env-gated integration test before shipping.
|
|
526
|
+
|
|
527
|
+
### Why NOT implement the full `install-ocr-lang` subcommand now
|
|
528
|
+
|
|
529
|
+
The honest answer is testability. The subcommand would:
|
|
530
|
+
1. Download `<lang>.traineddata` into a cache dir (network op — fine, mirrors `install-model`).
|
|
531
|
+
2. Require `src/ocr.ts`'s `createWorker` to read from that same dir via `langPath`/`cachePath`.
|
|
532
|
+
|
|
533
|
+
Step 2 is the risk: `src/ocr.ts` currently calls `createWorker(langs, undefined, { logger })` with no explicit `langPath`, so tesseract.js uses its default cache behavior. Changing that to a custom dir could break OCR in a way CI can't catch — there are no CI tests that actually run OCR (tesseract.js + `@napi-rs/canvas` are optional deps absent from the CI matrix; the only OCR test is env-gated). Shipping an untestable change to the OCR worker config violates the "audit BEFORE ship" discipline. Tracked as a v3.9.x backlog item that must land WITH an env-gated integration test (`ENQUIRE_LOAD_OCR_E2E=1`, same pattern as the reranker smoke).
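One way to make the cache-dir dependency testable without running OCR is to isolate the pre-flight check from the worker. The sketch below is an illustration under assumptions: `missingOcrLangs` and `assertOcrLangsInstalled` are hypothetical names, the `<lang>.traineddata` file layout is assumed, and the filesystem probe is injected so the logic can be exercised without tesseract.js or a real cache.

```typescript
// Hypothetical pre-flight check; names and file layout are assumptions.
// `exists` is injected so tests need no real filesystem or OCR dependency.
function missingOcrLangs(
  langs: string[],
  langDir: string,
  exists: (path: string) => boolean,
): string[] {
  return langs
    .map((lang) => lang.trim())
    .filter((lang) => lang.length > 0)
    .filter((lang) => !exists(`${langDir}/${lang}.traineddata`));
}

// Throw before any worker is created, so a missing language can never fall
// through to a silent network fetch.
function assertOcrLangsInstalled(
  langs: string[],
  langDir: string,
  exists: (path: string) => boolean,
): void {
  const missing = missingOcrLangs(langs, langDir, exists);
  if (missing.length > 0) {
    throw new Error(
      `OCR language data not installed: ${missing.join(", ")} (looked in ${langDir})`,
    );
  }
}
```

In a Node process the `exists` argument could be `fs.existsSync`; the same directory would then have to be the one handed to the OCR worker, which is the wiring this section identifies as the untested risk.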
|
|
534
|
+
|
|
535
|
+
### Files changed
|
|
536
|
+
|
|
537
|
+
- `src/cli.ts` — `--ocr-pdfs` + `--ocr-langs` help text reference SECURITY.md canonical procedure.
|
|
538
|
+
- `docs/api.md` — `--ocr-pdfs` flag row reference.
|
|
539
|
+
- `SECURITY.md` — canonical-procedure paragraph + roadmap re-target.
|
|
540
|
+
- version bump 3.9.0-rc.4 → 3.9.0-rc.5 (7 surfaces).
|
|
541
|
+
|
|
542
|
+
### What's next
|
|
543
|
+
|
|
544
|
+
- **v3.9.0-rc.6** — HNSW disk persistence on live update (debounced `saveTo` ~30s after the last watcher mutation; recompute embed-db signature so the persisted `.hnsw.bin` tracks live state).
|
|
545
|
+
- **v3.9.0 stable** — promote `@rc → @latest` after rc.6 + fresh external audit per `docs/audits/AUDIT-REQUEST-v3.9.0-rc.2-2026-05-25.md`.
|
|
546
|
+
- **v3.9.x+** — `install-ocr-lang` subcommand (with env-gated integration test); HNSW filter-during-search (structural R-10 closure).
|
|
547
|
+
|
|
548
|
+
---
|
|
549
|
+
|
|
550
|
+
## [3.9.0-rc.4] — 2026-05-25
|
|
551
|
+
|
|
552
|
+
> **TL;DR:** **Full state-driven self-audit on the v3.8.7 → v3.9.0-rc.3 cascade — closes 3 HIGH + 4 MEDIUM findings + documents overclaim instance #13 + recursion-pair shape #7 + extends META scope-completeness with 2 new defenses.** Audit caught: (1) CLAUDE.md header line said "deferred to v3.9.0+: ... OCR'd PDF watcher embed-sync, HNSW in-memory live update, R-10 adaptive refill" while the status section in the same file listed all three as SHIPPED (overclaim #13). (2) `docs/api.md:5` said "currently v3.9.0-rc.1" — we're on rc.3. (3) v3.9.0-rc.1/rc.2/rc.3 features absent from ALL user-facing docs (README, api.md, QUICKSTART, llms.txt, AGENTS.md) — the v3.8.8 META audit covered only NUMERIC drift, not FEATURE-MENTION drift. **+5 tests (3 POSITIVE + 2 NEGATIVE controls); 923 unit tests total.** All findings closed by the same PR.
|
|
553
|
+
|
|
554
|
+
**Patch — full audit + docs-only fixes + 2 new structural defenses.**
|
|
555
|
+
|
|
556
|
+
### What the audit found
|
|
557
|
+
|
|
558
|
+
Phase 0 (reality snapshot): all 9 required CI gates green, 917 tests, lint clean, OIA clean, 7-surface version-consistency, 10/10 per-file floors, 0 vulns.
|
|
559
|
+
|
|
560
|
+
Phase 1 (state-driven docs walk via parallel general-purpose agent): 3 HIGH + several MEDIUM/LOW findings.
|
|
561
|
+
|
|
562
|
+
Phase 2 (code-doc consistency via parallel general-purpose agent): PASS — every v3.8.7 → v3.9.0-rc.3 CHANGELOG claim verified in the codebase.
|
|
563
|
+
|
|
564
|
+
### HIGH findings (all closed in this rc.4)
|
|
565
|
+
|
|
566
|
+
- **H-1 — Feature-mention drift**: v3.9.0-rc.1 (`--ocr-pdfs` + 2 sibling flags), v3.9.0-rc.2 (HNSW in-memory live update), v3.9.0-rc.3 (`adaptiveHnswRefill`) shipped in 3 RCs but appeared ONLY in CHANGELOG + CLAUDE.md. Zero hits in `README.md`, `docs/api.md` (flag table), `docs/QUICKSTART.md`, `docs/http-transport.md`, `llms.txt`, `AGENTS.md`. **Fix**: added the 3 OCR flags + 9 other previously-paragraph-only stable flags (`--include-pdfs`, `--enable-reranker`, `--reranker-model`, `--reranker-top-n`, `--use-hnsw`, `--hnsw-ef`, `--late-chunk-context`, `--no-hnsw-persist`, `--quantize-embeddings`) to `docs/api.md` flag table. Added rc.1/rc.2/rc.3 mention to README highlight reel + llms.txt bullet list + AGENTS.md watcher section.
|
|
567
|
+
|
|
568
|
+
- **H-2 — Stale RC index**: `docs/api.md:5` said "currently v3.9.0-rc.1 — OCR'd PDF watcher embed-sync"; actual `@rc` is v3.9.0-rc.3. **Fix**: updated to mention all three RCs (OCR, HNSW live update, R-10 adaptive).
|
|
569
|
+
|
|
570
|
+
- **H-3 — Ambiguous CI gate rendering in README**: README line 249 listed "lint · test ×2 [Node 22/24] · smoke · audit · coverage · version-consistency · docs · oia" as the 9 required gates, but the `test ×2` rendering reads as 1 entry visually → looks like 8 gates while claiming "9 required". **Fix**: rewrote to enumerate explicitly: "(1) lint, (2) test on Node 22, (3) test on Node 24, (4) smoke, …, (9) oia".
|
|
571
|
+
|
|
572
|
+
### MEDIUM findings (closed in this rc.4)
|
|
573
|
+
|
|
574
|
+
- **M-1 — Overclaim instance #13** (CLAUDE.md self-contradiction): `CLAUDE.md:9` said "**Still deferred to v3.9.0+:** ... OCR'd PDF watcher embed-sync, HNSW in-memory live update, R-10 adaptive refill" — but the status section in the same file (lines ~143–145) listed all three as SHIPPED. **Class**: stale future-tense deferral claim (vs the present-tense "as of vX.Y.Z" pattern OIA Check 7 catches since v3.8.3). **Fix**: rewrote the header to clearly separate "v3.9.0 RCs shipped on `@rc`" from "Still deferred to v3.9.x+" (HNSW filter-during-search, embed-db migrations, distributed rate-limit, HNSW disk persistence on live update).
|
|
575
|
+
|
|
576
|
+
- **M-2 — Stale QUICKSTART version**: `docs/QUICKSTART.md:32` expected output `3.7.12` — bumped to mention both `3.9.0-rc.3` (`@rc`) and `3.8.8` (`@latest`).
|
|
577
|
+
|
|
578
|
+
- **M-3 — Stale benchmarks version footer**: `docs/benchmarks.md:3` cited v3.7.x version stamps. **Fix**: appended "still valid as of v3.9.0-rc.3 — retrieval pipeline unchanged; v3.8.x→v3.9.0 work was correctness/hardening + watcher live-update, not algorithmic" so the page is no longer misleadingly date-stale.
|
|
579
|
+
|
|
580
|
+
### META extension — 2 new scope-completeness defenses (recursion-pair shape #7)
|
|
581
|
+
|
|
582
|
+
The v3.8.8 META audit (`scripts/scope-completeness-audit.mjs`) covered 5 NUMERIC-CLAIM patterns. The H-1 finding above (3 OCR flags missing from `docs/api.md`) revealed that META's dimension coverage was incomplete. Documented as **recursion-pair shape #7**: even after v3.8.8's META audit landed, drift in a different dimension (feature mentions) went undetected for 3 RCs.
|
|
583
|
+
|
|
584
|
+
Added in rc.4:
|
|
585
|
+
|
|
586
|
+
- **`runDeferredClaimAudit()`** — scans `CLAUDE.md` for `(?:Still\s+)?deferred\s+to\s+v\d+\.\d+\.\d+\+?:\s*([^.\n]+)` patterns. For each item named in such a line, checks whether the same file contains a "shipped" status entry mentioning that item. If both present → finding. Closes overclaim #13 class structurally.
|
|
587
|
+
- **`runCliFlagCoverageAudit()`** — extracts every `.option("--name", …)` from `src/cli.ts`; verifies each appears in `docs/api.md` (substring match). Subcommand-specific flags (`--bearer-token`, `--queries`, `--lang`, etc.) live in `subcommandExempts` and are skipped. Closes the feature-mention class for CLI flags specifically.
|
|
588
|
+
- **`runAudit()`** now composes all three sub-audits (numeric + deferred-claim + cli-flag-coverage). OIA Check 8 picks up the extended results automatically.
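The deferred-claim check can be approximated as below. The regular expression is the one quoted for `runDeferredClaimAudit()`; everything else is an assumption made for illustration: the function name `findStaleDeferrals`, the `gi` flags, comma-splitting the captured items, and treating any line containing the word "shipped" as a status entry.

```typescript
// Sketch only: the matching rules beyond the quoted regex are assumptions.
function findStaleDeferrals(doc: string): string[] {
  // Built per call so the global regex carries no lastIndex state across calls.
  const deferredRe =
    /(?:Still\s+)?deferred\s+to\s+v\d+\.\d+\.\d+\+?:\s*([^.\n]+)/gi;

  // Lines that look like "shipped" status entries, lower-cased for matching.
  const shippedLines = doc
    .split("\n")
    .filter((line) => /\bshipped\b/i.test(line))
    .map((line) => line.toLowerCase());

  const stale: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = deferredRe.exec(doc)) !== null) {
    const items = (match[1] ?? "")
      .split(",")
      .map((item) => item.trim())
      .filter((item) => item.length > 0);
    for (const item of items) {
      // Finding: an item is named as deferred AND appears in a shipped entry.
      if (shippedLines.some((line) => line.indexOf(item.toLowerCase()) !== -1)) {
        stale.push(item);
      }
    }
  }
  return stale;
}
```

A plain substring match keeps the check cheap, at the price of missing reworded items; that limitation is inherent to this sketch and may not apply to the real script.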
|
|
589
|
+
|
|
590
|
+
### Tests added (+5)
|
|
591
|
+
|
|
592
|
+
`tests/scope-completeness-invariant.test.ts` extended:
|
|
593
|
+
- POSITIVE: `runDeferredClaimAudit returns zero findings on current state` (proves rc.4's CLAUDE.md fix closed overclaim #13)
|
|
594
|
+
- POSITIVE: `runCliFlagCoverageAudit returns zero findings on current state` (proves the new OCR/HNSW flags are in `docs/api.md`)
|
|
595
|
+
- POSITIVE: `runAudit returns union of all three sub-audits` (composition correctness)
|
|
596
|
+
- NEGATIVE: deferred-to regex matches the drift pattern (proves the audit would catch a regression)
|
|
597
|
+
- NEGATIVE: missing-flag-in-docs is structurally detectable (synthetic CLI + doc fixture)
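The synthetic-fixture idea behind the last NEGATIVE test can be sketched like this. The function names, the `.option(...)` extraction regex, and the fixture strings are all hypothetical; only the overall approach (extract flags from the CLI source, substring-match them against the docs, skip an exempt list) comes from the description of `runCliFlagCoverageAudit()`.

```typescript
// Sketch only: extraction regex and names are assumptions for illustration.
function extractCliFlags(cliSource: string): string[] {
  const optionRe = /\.option\(\s*["'`]([^"'`]+)["'`]/g;
  const flags: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = optionRe.exec(cliSource)) !== null) {
    // A spec may look like "-p, --port <n>"; keep only the long flag name.
    const long = /--[a-z0-9][a-z0-9-]*/i.exec(match[1] ?? "");
    if (long !== null && flags.indexOf(long[0]) === -1) flags.push(long[0]);
  }
  return flags;
}

// Flags that exist in the CLI source but are neither documented nor exempt.
function findUndocumentedFlags(
  cliSource: string,
  apiDoc: string,
  exempt: string[] = [],
): string[] {
  return extractCliFlags(cliSource).filter(
    (flag) => exempt.indexOf(flag) === -1 && apiDoc.indexOf(flag) === -1,
  );
}
```

Substring matching means a documented `--hnsw-ef` would also satisfy a hypothetical `--hnsw` flag, so a stricter variant might match on word boundaries or backtick-delimited names.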
|
|
598
|
+
|
|
599
|
+
### CLAUDE.md anti-patterns added
|
|
600
|
+
|
|
601
|
+
Two new rules captured (already-existing recurring shapes from this session):
|
|
602
|
+
|
|
603
|
+
- **Update forward-looking deferral claims in the same commit that ships the deferred item** — closes overclaim instance #13 class. The `deferred-claim` defense above makes this structural; the rule documents the human-side discipline.
|
|
604
|
+
- **META scope-completeness defenses must cover every drift DIMENSION** — closes recursion-pair shape #7. New rule: every structural defense PR must enumerate covered + uncovered dimensions; uncovered ones become deferred-defense TODOs.
|
|
605
|
+
|
|
606
|
+
### Files changed
|
|
607
|
+
|
|
608
|
+
- `CLAUDE.md` — overclaim #13 documented; recursion-pair shape #7 documented; header bullet at line 9 corrected; 2 new anti-pattern rules added.
|
|
609
|
+
- `docs/api.md` — `:5` Channels paragraph current; flag table expanded with 12 new rows (3 OCR + 9 previously-paragraph-only stable flags).
|
|
610
|
+
- `README.md` — highlight reel + features-table CI block rendering.
|
|
611
|
+
- `llms.txt` — v3.9.0 features bulleted.
|
|
612
|
+
- `AGENTS.md` — watcher section mentions `setOcrPdfs` + `attachHnsw`.
|
|
613
|
+
- `docs/QUICKSTART.md` — version example refreshed.
|
|
614
|
+
- `docs/benchmarks.md` — footer "still valid as of v3.9.0-rc.3" note.
|
|
615
|
+
- `scripts/scope-completeness-audit.mjs` — `runNumericAudit` (renamed), `runDeferredClaimAudit`, `runCliFlagCoverageAudit`, combined `runAudit` (+200 lines).
|
|
616
|
+
- `tests/scope-completeness-invariant.test.ts` — extended describe block with 5 new tests.
|
|
617
|
+
- `README.md`, `llms.txt`, `AGENTS.md`, `docs/COMPARISON.md`, `package.json` — test count 918 → 923.
|
|
618
|
+
- version bump 3.9.0-rc.3 → 3.9.0-rc.4 (7 surfaces).
|
|
619
|
+
|
|
620
|
+
### What's next
|
|
621
|
+
|
|
622
|
+
- **v3.9.0-rc.5** — HNSW disk persistence on live update (debounced `saveTo` ~30s after last mutation). Originally planned for rc.4; deferred to make space for this audit-driven docs cascade.
|
|
623
|
+
- **v3.9.0 stable** — promote `@rc → @latest` after rc.5 lands + fresh external audit on v3.9.0-rc.2+ per `docs/audits/AUDIT-REQUEST-v3.9.0-rc.2-2026-05-25.md`.
|
|
624
|
+
- **v3.9.x+** — HNSW filter-during-search (architectural; closes R-10 structurally).
|
|
625
|
+
|
|
626
|
+
---
|
|
627
|
+
|
|
628
|
+
## [3.9.0-rc.3] — 2026-05-25
|
|
629
|
+
|
|
630
|
+
> **TL;DR:** **R-10 adaptive HNSW refill + external audit attribution.** Closes the last open INFO finding from the corrected 2026-05-25 external audit (`docs/audits/v3.8.0-rc.15-external-2026-05-25.md`, 4.85/5). New `adaptiveHnswRefill()` helper in `src/tools/search.ts` doubles k up to maxAttempts=3 times when the post-filter hit count is below `limit`. Closes the ">66% excluded" under-return class that rc.9's static 6× multiplier could not fully solve. Archives the external audit doc in `docs/audits/` + lifts the "External audit blocker per v3.6.1 STILL OPEN" framing in CLAUDE.md (the corrected audit retroactively justifies v3.8.0 stable). Creates `docs/audits/AUDIT-REQUEST-v3.9.0-rc.2-2026-05-25.md` for the next fresh pass. **+7 tests (5 POSITIVE + 2 NEGATIVE controls); 918 unit tests total. No API breaks.**
|
|
631
|
+
|
|
632
|
+
**Patch — R-10 + audit attribution.**
|
|
633
|
+
|
|
634
|
+
### R-10 adaptive HNSW refill (INFO-2 from corrected external audit)
|
|
635
|
+
|
|
636
|
+
**Problem**: The embed-db can contain entries for paths that the privacy filter (`vault.isExcluded`) drops at response-build time. Pre-3.9.0-rc.3 the HNSW path fetched a STATIC multiplier of `max(limit × 6, 50)` entries; for vaults with > 66% excluded entries, filtering left fewer than `limit` results and the response under-returned.
|
|
637
|
+
|
|
638
|
+
**Fix**: `adaptiveHnswRefill()` (`src/tools/search.ts`) is a bounded loop:
|
|
639
|
+
```ts
let k = Math.min(initialK, maxLabels);
let filtered: T[] = [];
for (let attempt = 0; attempt < maxAttempts; attempt++) {
  filtered = filter(searchKnn(k));
  if (filtered.length >= limit) break; // enough survivors after filtering
  if (k >= maxLabels) break; // saturated: a re-search would yield the same set
  k = Math.min(k * 2, maxLabels); // double the fetch size, capped at the index size
}
return filtered;
```
|
|
650
|
+
|
|
651
|
+
`maxAttempts = 3` bounds the worst-case to 3 × HNSW search latency (~30ms). Typical vaults converge on attempt 1 (most have < 20% excluded). The refill engages only for the long-tail privacy-heavy configurations the static multiplier under-served.
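To see the k progression concretely, here is a self-contained version of the loop with the search and filter injected. The option-object signature and the name `adaptiveRefill` are assumptions made so the sketch can run on its own; the real `adaptiveHnswRefill()` signature is not shown in this entry.

```typescript
// Self-contained sketch of the bounded refill loop; signature is assumed.
function adaptiveRefill<T>(opts: {
  initialK: number;
  limit: number;
  maxLabels: number;
  maxAttempts?: number;
  searchKnn: (k: number) => T[];
  filter: (hits: T[]) => T[];
}): T[] {
  const maxAttempts = opts.maxAttempts ?? 3;
  let k = Math.min(opts.initialK, opts.maxLabels);
  let filtered: T[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    filtered = opts.filter(opts.searchKnn(k));
    if (filtered.length >= opts.limit) break; // enough results survived
    if (k >= opts.maxLabels) break; // saturated: nothing more to fetch
    k = Math.min(k * 2, opts.maxLabels);
  }
  return filtered;
}

// Mock index of 100 labels where 80% are excluded (only multiples of 5 pass).
function demoRefill(maxLabels: number, maxAttempts?: number) {
  const ks: number[] = [];
  const hits = adaptiveRefill<number>({
    initialK: 20,
    limit: 10,
    maxLabels,
    maxAttempts,
    searchKnn: (k) => {
      ks.push(k);
      return Array.from({ length: k }, (_, i) => i);
    },
    filter: (labels) => labels.filter((label) => label % 5 === 0),
  });
  return { ks, hits };
}
```

With `demoRefill(100)` the mock search is called with k = 20, 40, 80 and the third attempt finally yields at least `limit` survivors; with `demoRefill(30)` the loop stops after k = 20, 30 because k has reached `maxLabels`.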
|
|
652
|
+
|
|
653
|
+
**Residual**: at > 95% excluded the loop still saturates without satisfying `limit`. That's the structural limit of post-filter retrieval; the architectural fix is `HNSW filter-during-search` (pushes the privacy predicate into the graph traversal). Deferred to v3.9.x+.
|
|
654
|
+
|
|
655
|
+
### Tests added (+7)
|
|
656
|
+
|
|
657
|
+
`tests/hnsw.test.ts` — new describe block `adaptiveHnswRefill (v3.9.0-rc.3 R-10)`:
|
|
658
|
+
|
|
659
|
+
- POSITIVE: returns initialK results when no filter drops anything (0% excluded case)
|
|
660
|
+
- POSITIVE: refills when 80% are filtered out (R-10 target case)
|
|
661
|
+
- POSITIVE: doubles k up to MAX_REFILL_ATTEMPTS=3 times when refill needed (assertion on call count + k progression)
|
|
662
|
+
- POSITIVE: stops doubling when k saturates maxLabels (prevents redundant calls when filter rejects everything)
|
|
663
|
+
- POSITIVE: respects custom maxAttempts override
|
|
664
|
+
- NEGATIVE control: exits after attempt 1 when filter satisfies on first try (proves the early-exit optimization fires)
|
|
665
|
+
- NEGATIVE control: maxAttempts=0 makes zero searchKnn calls (proves the loop bound works)
|
|
666
|
+
|
|
667
|
+
### External audit attribution (closes v3.8.1 framing)
|
|
668
|
+
|
|
669
|
+
- **`docs/audits/v3.8.0-rc.15-external-2026-05-25.md`** archived in-repo. The corrected audit (returned 2026-05-25 after the auditor acknowledged delivering the wrong project's doc to a prior chat; see the v3.8.1 retraction) scored 4.85/5 with no ship-blockers, and 5 of 6 actionable findings were already closed by the rc.18 → v3.8.5 cascade. INFO-2 (R-10 residual) closes in this rc.3.
|
|
670
|
+
- **CLAUDE.md header + v3.8.0/v3.8.1 entries** updated: the "External audit blocker per v3.6.1 STILL OPEN" framing is lifted. v3.8.0 stable promotion is now retroactively justified by the corrected audit. The v3.8.1 retraction was about misdirected delivery, not about the verdict itself being wrong.
|
|
671
|
+
- **`docs/audits/AUDIT-REQUEST-v3.9.0-rc.2-2026-05-25.md`** created — fresh audit request for the next pre-stable promotion (v3.9.0 → @latest). Lists the delta since rc.15, current state snapshot, specific zones of interest (HNSW concurrency, OCR network posture, P2-10/P2-11 wire-level verification, MCP Registry sync, test count drift).
|
|
672
|
+
|
|
673
|
+
### Cross-walk: external audit findings vs current state
|
|
674
|
+
|
|
675
|
+
| ID | Finding (audit on rc.15) | Current status (a80d491, v3.9.0-rc.2) |
|---|---|---|
| **M-REG-1** | server.json version drift; gate doesn't cover registry manifest | ✅ Closed in v3.8.0-rc.18 S-AUDIT-1 (5 → 7 version-consistency surfaces) |
| **L-HYB-1** | searchHybrid lacks terminal vault.isExcluded() filter | ✅ Closed in v3.8.0-rc.18 S-AUDIT-2 (line 1019 of src/tools/search.ts) |
| **L-OIA-1** | check:oia Check 6 fails on stale coverage-summary.json | ✅ Closed in v3.8.0-rc.18 S-AUDIT-3 (test:coverage → check:oia order documented) |
| **INFO-1** | README badge "v3.7.x stable" but @rc = rc.15 | ✅ Closed — README now `v3.8.x stable`, badge `tests-918 passing` |
| **INFO-2** | R-10 residual: HNSW under-return at > 66% excluded | ✅ **Closed in this rc.3** (adaptive refill loop) |
| **INFO-3** | T-2..T-5, HTTP P2-10/P2-11, multi-subcommand backlog | ✅ T-2/T-3/T-4 in v3.8.5; HTTP P2-10/P2-11 in v3.8.7; multi-subcommand in v3.8.0-rc.17. T-5 was an over-counted placeholder (only 4 named items) |
|
|
683
|
+
|
|
684
|
+
### Files changed
|
|
685
|
+
|
|
686
|
+
- `src/tools/search.ts` — `adaptiveHnswRefill()` helper + integration in HNSW path of `embeddingsSearch` (+85 lines).
|
|
687
|
+
- `tests/hnsw.test.ts` — adaptiveHnswRefill describe block (+7 tests).
|
|
688
|
+
- `docs/audits/v3.8.0-rc.15-external-2026-05-25.md` — new file (archived audit).
|
|
689
|
+
- `docs/audits/AUDIT-REQUEST-v3.9.0-rc.2-2026-05-25.md` — new file (fresh audit request).
|
|
690
|
+
- `CLAUDE.md` — header note + v3.8.0/v3.8.1 entries + backlog section updated to reflect corrected audit.
|
|
691
|
+
- `README.md`, `llms.txt`, `AGENTS.md`, `docs/COMPARISON.md`, `package.json` — test count 911 → 918.
|
|
692
|
+
- version bump 3.9.0-rc.2 → 3.9.0-rc.3 (7 surfaces).
|
|
693
|
+
|
|
694
|
+
### What's next
|
|
695
|
+
|
|
696
|
+
- **v3.9.0-rc.4** — HNSW disk persistence on live update (debounced `saveTo` ~30s after last mutation). Currently `applyDiff` only mutates the in-memory index; next serve restart triggers a full rebuild from embed-db.
|
|
697
|
+
- **v3.9.0 stable** — promote `@rc → @latest` after rc.4 lands + the fresh external audit on the v3.9.0-rc.2+ commit completes (per `docs/audits/AUDIT-REQUEST-v3.9.0-rc.2-2026-05-25.md`).
|
|
698
|
+
- **v3.9.x+** — HNSW filter-during-search (architectural; pushes the privacy/exclude filter into the graph traversal itself rather than post-filter, structurally closing the R-10 class).
|
|
699
|
+
|
|
700
|
+
---
|
|
701
|
+
|
|
5
702
|
## [3.9.0-rc.2] — 2026-05-25
|
|
6
703
|
|
|
7
704
|
> **TL;DR:** **HNSW in-memory live update — closes the last named v3.8.0 architectural deferral.** When the watcher updates embed-db rows for an md/pdf file change, the in-memory HNSW index is now updated in lockstep via the new `HnswIndex.applyDiff(removeLabels, addPoints)` method. Pre-3.9.0 the index was rebuilt only at serve startup; long-running sessions slowly drifted as embed-db got upserts but HNSW kept the original vectors. Search results now reflect vault edits within the watcher debounce window (~250ms typical). **+13 tests (10 POSITIVE + 3 NEGATIVE controls); 911 unit tests total. No API breaks (additive — old callers ignoring the new return values + interface methods keep working).**
|