@yoch/frozenminisearch 1.2.1 → 1.2.2

package/CHANGELOG.md CHANGED
@@ -2,6 +2,17 @@
 
  ## Unreleased
 
+ ## v1.2.2 — `@yoch/frozenminisearch`
+
+ Patch release: faster frozen AND scoring on large posting lists (gated seek + posting-ratio gate) and BM25 segment hoisting. No API or MSv5 wire-format changes.
+
+ ### Improved
+
+ - **AND gate posting-ratio** — when the absolute gate cap would disable filtering, pass `allowedDocs` to later AND branches if the gate is small relative to the branch posting length (calibrated: min length 2048, max 25% of posting). Applies to string AND and nested `QueryCombination` AND. Parity with naive score-then-intersect unchanged.
+ - **Gated posting seek** — on selective AND paths, score gated segments with binary search by doc id instead of scanning full sorted posting lists (same numeric thresholds as the ratio gate; distinct decision point).
+ - **BM25 IDF hoisting** — compute document-frequency IDF once per posting segment on frozen paths when doc activity filtering is inactive; lowers work on high-frequency AND queries.
+ - **Posting layout selection** — cost-based choice between dense and sparse frozen posting layouts from field/term statistics at build time.
+
  ## v1.2.1 — `@yoch/frozenminisearch`
 
  Patch release: lower search overhead when stored fields are disabled and fewer query-normalization allocations. No API or MSv5 wire-format changes.
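
The posting-ratio gate and the gated seek described in the v1.2.2 entry above can be illustrated with a minimal sketch. This is not the package's implementation: the names `shouldGate`, `lowerBound`, and `gatedSeek` are hypothetical, and reading "min length 2048, max 25% of posting" as "posting length of at least 2048 and gate size of at most 25% of the posting length" is an assumption drawn from the changelog wording.

```typescript
// Hypothetical constants mirroring the calibrated values quoted in the changelog.
const MIN_POSTING_LENGTH = 2048;
const MAX_GATE_RATIO = 0.25;

// Ratio gate (assumed reading): filter only when the posting list is long
// and the set of allowed docs is small relative to it.
function shouldGate(gateSize: number, postingLength: number): boolean {
  return postingLength >= MIN_POSTING_LENGTH && gateSize <= postingLength * MAX_GATE_RATIO;
}

// First index at or after `from` in sorted `docIds` whose value is >= target.
function lowerBound(docIds: Uint32Array, target: number, from: number): number {
  let lo = from;
  let hi = docIds.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (docIds[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Gated seek: binary-search each allowed doc id instead of scanning the
// whole posting list. Both inputs must be sorted ascending.
function gatedSeek(docIds: Uint32Array, allowedSorted: Uint32Array): number[] {
  const hits: number[] = [];
  let from = 0;
  for (let i = 0; i < allowedSorted.length; i++) {
    from = lowerBound(docIds, allowedSorted[i], from);
    if (from === docIds.length) break; // no larger doc ids remain
    if (docIds[from] === allowedSorted[i]) hits.push(from);
  }
  return hits;
}

// Example: a posting list of 4096 even doc ids and a gate of four candidates.
const posting = new Uint32Array(4096);
for (let i = 0; i < posting.length; i++) posting[i] = i * 2;
const allowed = new Uint32Array([3, 10, 8190, 9000]);
const positions = shouldGate(allowed.length, posting.length)
  ? gatedSeek(posting, allowed)
  : [];
// positions is [5, 4095]: doc 10 sits at index 5 and doc 8190 at index 4095.
```

With a gate of `k` docs and a posting list of length `n`, the seek costs roughly `k · log n` comparisons instead of `n`, which is why a ratio threshold is a natural decision point: once the gate grows to a sizeable fraction of the posting list, a linear scan is likely cheaper.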
package/README.md CHANGED
@@ -30,33 +30,32 @@ Choose **mutable MiniSearch** when documents change at runtime (`add`, `remove`,
  | **Incremental builder** | Typed-array accumulators during build; lower peak heap than materializing `number[][]` per term |
 
  <!-- vs-reference:start — npm run bench:readme -->
- ### Measured vs MiniSearch (reference baseline)
+ ### Measured vs lucaong MiniSearch (reference baseline)
 
- Same BM25 queries on identical corpora. **Index RAM is the headline metric** frozen uses a fraction of mutable heap on every scenario below; disk and cold load follow from the compact binary format.
+ Same BM25 queries on identical corpora. **Frozen wins on what we optimize for**: RAM, disk, cold load, and search throughput on real workloads.
 
  | Scenario | Docs | Index RAM¹ | Disk (binary vs JSON)² | Cold load³ | Search p50⁴ |
  |----------|-----:|------------|------------------------:|-----------:|------------:|
- | Divina with storeFields | 14,097 | 1.1 vs 16.0 MB (~93% less) | ~73% less | ~70% faster | ~13% faster |
- | Divina index only | 14,097 | 0.3 vs 14.9 MB (~98% less) | ~77% less | ~86% faster | ~8% faster |
- | high-frequency terms (10k docs) | 10,000 | 0.2 vs 7.4 MB (~98% less) | ~94% less | ~93% faster | ~29% faster |
- | Dense numeric ids (100k, identity lookup) | 100,000 | 1.6 vs 91.2 MB (~98% less) | ~88% less | ~91% faster | ~18% faster |
- | Doc id Uint16 boundary (65535 docs) | 65,535 | 1.1 vs 58.6 MB (~98% less) | ~91% less | ~93% faster | ~43% faster |
+ | Divina with storeFields | 14,097 | 0.3 vs 16.1 MB (~98% less) | ~73% less | ~65% faster | ~21% faster |
+ | Divina index only | 14,097 | 0.2 vs 14.9 MB (~99% less) | ~77% less | ~85% faster | ~17% faster |
+ | high-frequency terms (10k docs) | 10,000 | 0.1 vs 7.4 MB (~99% less) | ~94% less | ~90% faster | ~38% faster |
+ | Dense numeric ids (100k, identity lookup) | 100,000 | 0.9 vs 91.3 MB (~99% less) | ~88% less | ~90% faster | ~27% faster |
+ | Doc id Uint16 boundary (65535 docs) | 65,535 | 0.6 vs 58.6 MB (~99% less) | ~91% less | ~93% faster | ~44% faster |
 
- **Headline:** 22/27 query benchmarks favor frozen (paired **hrtime** protocol v2). Divina `inferno` (exact, paired p50): mutable 16.2 µs → frozen 13.7 µs (**-2 µs**, ratio 0.90).
+ **Headline:** 26/27 query benchmarks favor frozen (paired **hrtime** protocol v2). Divina `inferno` (exact, paired p50): mutable 15.7 µs → frozen 13.4 µs (**-2 µs**, ratio 0.80).
 
- Decomposition (Divina exact): L0 lookup ~300 ns frozen, L1 `executeQuery` ~8.3 µs, L2 full `search` ~11.6 µs (finalize ≈ 3 µs).
+ Decomposition (Divina exact): L0 lookup ~300 ns frozen, L1 `executeQuery` ~6.6 µs, L2 full `search` ~10.1 µs (finalize ≈ 3 µs).
 
- | | MiniSearch | `@yoch/frozenminisearch` |
+ | | lucaong `minisearch` | `@yoch/frozenminisearch` |
  |---|------------------------|---------------------------|
- | **Optimizes for** | Live mutations, flexibility | **Retained RAM**, snapshot size, cold load |
- | **Sweet spot** | Documents change at runtime | Fixed corpus, many replicas, tight memory budget |
+ | **Sweet spot** | Live index mutations | Fixed corpus, deploy from binary |
  | **Production path** | `addAll` → `toJSON` | `fromDocuments` / `fromMiniSearch` → `saveBinarySync` → `loadBinarySync` |
  | **Typical trade-off** | Higher RAM, JSON snapshots | One-time freeze, then compact binary |
 
  <details>
  <summary><strong>How to read these numbers (limits &amp; protocol)</strong></summary>
 
- - **Captured:** 2026-06-07 · commit `9f32207` · Node v24.16.0 · minisearch **7.2.0** · **3** run(s)/scenario · protocol **v2** (hrtime-paired, batch target 3 ms).
+ - **Captured:** 2026-06-18 · commit `d05d8e9` · Node v24.16.0 · minisearch **7.2.0** · **3** run(s)/scenario · protocol **v2** (hrtime-paired, batch target 3 ms).
  - ¹ **Index RAM** — `measureHeap` with `--expose-gc`, one index alive. V8 overhead is extra; treat as **trend**, not accounting. Sporadic outliers happen (e.g. index-only Divina).
  - ² **Disk** — `JSON.stringify(mutable)` vs `saveBinarySync`.
  - ³ **Cold load** — median wall time to searchable index after read from disk format.