@yoch/frozenminisearch 1.0.2 → 1.2.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,33 @@
2
2
 
3
3
  ## Unreleased
4
4
 
5
+ ## v1.2.0 — `@yoch/frozenminisearch`
6
+
7
+ Minor release: configurable MSv5 snapshot compression and Node 20 support.
8
+
9
+ ### Added
10
+
11
+ - **`SaveBinaryOptions`** — `saveBinarySync()` / `saveBinaryAsync()` accept `{ compression: 'auto' | 'raw' | 'zstd' | 'zlib' }`.
12
+ - **`CODEC_ZLIB`** — portable deflate snapshots readable on Node 20+; explicit `compression: 'zlib'` always writes zlib on disk.
13
+ - **Exported types** — `BinaryCompression`, `SaveBinaryOptions`.
14
+
15
+ ### Improved
16
+
17
+ - **`compression: 'auto'`** — one compression pass: zstd when available (Node 22.15+), otherwise zlib on Node 20–22.14. Falls back to raw when compression does not strictly shrink the payload; payloads under 64 B are always stored raw.
18
+ - **Node engine** — `>=20` (was `>=22.15`); zstd remains available on Node 22.15+ and is required to read zstd snapshots.
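
The `'auto'` rules above can be sketched as a small selection function. This is a minimal illustration under stated assumptions, not the package's implementation: `chooseAutoCodec` and `MIN_COMPRESS_BYTES` are hypothetical names, and plain `deflateSync` stands in for whatever zlib framing the package actually writes.

```javascript
const zlib = require('node:zlib')

// Hypothetical constant: payloads under 64 B stay raw (per the changelog entry).
const MIN_COMPRESS_BYTES = 64

// Hypothetical helper mirroring the documented 'auto' rules.
function chooseAutoCodec(payload) {
  if (payload.length < MIN_COMPRESS_BYTES) {
    return { codec: 'raw', bytes: payload }
  }
  // Feature-detect zstd: node:zlib only exposes it on newer Node releases.
  const hasZstd = typeof zlib.zstdCompressSync === 'function'
  const codec = hasZstd ? 'zstd' : 'zlib'
  // One compression pass: exactly one codec is tried.
  const compressed = hasZstd
    ? zlib.zstdCompressSync(payload)
    : zlib.deflateSync(payload)
  // Keep the compressed form only when it strictly shrinks the payload.
  return compressed.length < payload.length
    ? { codec, bytes: compressed }
    : { codec: 'raw', bytes: payload }
}
```

A highly repetitive buffer should come back as `zstd` or `zlib` depending on the runtime, while a tiny or incompressible buffer should come back as `raw`.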
19
+
20
+ ## v1.1.0 — `@yoch/frozenminisearch`
21
+
22
+ Minor release: MiniSearch JSON wire export and clearer JSON import API. MSv5 binary format unchanged.
23
+
24
+ ### Added
25
+
26
+ - **`toJSON()`** — export MiniSearch wire snapshots (`serializationVersion: 2`); import via `fromJson` / `fromMiniSearchSnapshot`. Production persistence remains `saveBinarySync`.
27
+
28
+ ### Breaking
29
+
30
+ - **`fromMiniSearchJson` → `fromJson`** — rename for clearer semantics (JSON import vs binary load). Update call sites: `FrozenMiniSearch.fromMiniSearchJson(json)` → `FrozenMiniSearch.fromJson(json)`.
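
If a codebase has to run against both 1.0.x and 1.1+ during a rollout, a thin shim can resolve whichever loader the installed version exposes. This is only a sketch: `resolveJsonLoader` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical shim: returns a loader bound to whichever JSON import
// method exists on the class that is passed in.
function resolveJsonLoader(FrozenMiniSearch) {
  if (typeof FrozenMiniSearch.fromJson === 'function') {
    // 1.1.0 and later
    return (json, options) => FrozenMiniSearch.fromJson(json, options)
  }
  if (typeof FrozenMiniSearch.fromMiniSearchJson === 'function') {
    // 1.0.x name, removed by the rename
    return (json, options) => FrozenMiniSearch.fromMiniSearchJson(json, options)
  }
  throw new TypeError('No JSON loader found on FrozenMiniSearch')
}

// Usage (assumes the package is installed):
//   const { FrozenMiniSearch } = require('@yoch/frozenminisearch')
//   const loadJson = resolveJsonLoader(FrozenMiniSearch)
//   const frozen = loadJson(json, options)
```

Once every deployment is on 1.1+, calling `FrozenMiniSearch.fromJson` directly is simpler and the shim can be dropped.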
31
+
5
32
  ## v1.0.2 — `@yoch/frozenminisearch`
6
33
 
7
34
  Patch release: lower retained heap when `storeFields` has one field. No API or MSv5 wire-format changes.
@@ -23,7 +50,7 @@ Patch release: lower build-time peak memory and migration ergonomics. No API or
23
50
 
24
51
  ### Fixed
25
52
 
26
- - **Default tokenizer parity** — leading delimiter produces an empty token (e.g. `::a` → `["", "a"]`), matching lucaong `split` behaviour.
53
+ - **Default tokenizer parity** — leading delimiter produces an empty token (e.g. `::a` → `["", "a"]`), matching MiniSearch `split` behaviour.
27
54
  - **Named export** — `FrozenMiniSearch` is exported again alongside the default export (ESM and CJS).
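
The empty leading token in the tokenizer entry above is ordinary `String.prototype.split` behaviour when the input starts with a delimiter. The pattern below is an assumed whitespace-or-punctuation delimiter used for illustration; the tokenizer's actual default pattern may differ.

```javascript
// Assumed delimiter pattern (whitespace or punctuation), for illustration only.
const DELIMITERS = /[\n\r\p{Z}\p{P}]+/u

// The leading '::' matches at index 0, so split emits an empty first token.
const tokens = '::a'.split(DELIMITERS)
console.log(tokens) // [ '', 'a' ]
```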
28
55
 
29
56
  ## v1.0.0 — `@yoch/frozenminisearch`
@@ -32,7 +59,7 @@ First stable release on npm. Frozen-only read-only search for Node.js.
32
59
 
33
60
  ### Breaking
34
61
 
35
- - **Binary snapshots** — `loadBinarySync` / `loadBinaryAsync` read only the current frozen binary format; re-build from lucaong JSON if an older snapshot fails to load.
62
+ - **Binary snapshots** — `loadBinarySync` / `loadBinaryAsync` read only the current frozen binary format; re-build from MiniSearch JSON if an older snapshot fails to load.
36
63
  - **Removed `saveBinary()` / `loadBinary()`** — use `saveBinarySync` / `saveBinaryAsync` and `loadBinarySync` / `loadBinaryAsync`.
37
64
 
38
65
  ## v1.0.0-beta.0 — `@yoch/frozenminisearch`
@@ -42,16 +69,16 @@ New standalone package (frozen-only) for read-only serving workloads.
42
69
  ### Added
43
70
 
44
71
  - **`FrozenMiniSearch`** as the default export — `fromDocuments`, builder, `saveBinarySync` / `loadBinarySync`
45
- - **Migration loaders** — `fromMiniSearch`, `fromMiniSearchJson`, `fromMiniSearchSnapshot` (lucaong JSON wire format)
72
+ - **Migration loaders** — `fromMiniSearch`, `fromJson`, `fromMiniSearchSnapshot` (MiniSearch JSON wire format)
46
73
  - **Modular benchmarks** — `npm run bench` with profiles `vs-reference`, `regression`, `dev`
47
74
  - **Parity suite** — `dev/parity/` vs `minisearch` npm (functional invariants)
48
75
 
49
76
  ### Removed from published API
50
77
 
51
78
  - Mutable `MiniSearch` class and `freeze()` on the fork
52
- - `freezeFromMiniSearch` (use `fromMiniSearchJson`)
79
+ - `freezeFromMiniSearch` (use `fromJson`)
53
80
  - Read-only mutation stubs (`add`, `remove`, …)
54
81
 
55
82
  ### Migration
56
83
 
57
- - `new MiniSearch(opts).addAll(docs)` (lucaong) → `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` — see README
84
+ - `new MiniSearch(opts).addAll(docs)` → `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` — see README
package/README.md CHANGED
@@ -1,21 +1,38 @@
1
- # @yoch/frozenminisearch
1
+ # FrozenMiniSearch
2
2
 
3
- **Read-only full-text search for Node.js** — compact frozen indexes, fast binary snapshots, and a **drop-in** search API for frozen workloads: same `search`, `autoSuggest`, scoring, and query options as [MiniSearch](https://github.com/lucaong/minisearch) by [Luca Ongaro](https://github.com/lucaong).
3
+ [![npm version](https://img.shields.io/npm/v/@yoch/frozenminisearch.svg)](https://www.npmjs.com/package/@yoch/frozenminisearch)
4
+ [![coverage](https://codecov.io/gh/yoch/frozenminisearch/graph/badge.svg)](https://codecov.io/gh/yoch/frozenminisearch)
5
+ [![CI](https://github.com/yoch/frozenminisearch/actions/workflows/main.yml/badge.svg)](https://github.com/yoch/frozenminisearch/actions/workflows/main.yml)
6
+ [![Socket Badge](https://socket.dev/api/badge/npm/package/@yoch/frozenminisearch)](https://socket.dev/npm/package/%40yoch%2Ffrozenminisearch)
4
7
 
5
- > **Current release:** `1.0.2` on npm
8
+ [API documentation](https://yoch.github.io/frozenminisearch/)
6
9
 
7
- **Design goal:** once an index is built or loaded, migrate with the minimum code change — package name and index construction only; serving code stays the same. No mutable `MiniSearch` class is published here; build indexes with `fromDocuments`, the incremental builder, or migrate from an existing lucaong index via `fromMiniSearchJson`.
10
+ **Memory-optimized, read-only full-text search for Node.js**: the same BM25, prefix/fuzzy, and `autoSuggest` API as [MiniSearch](https://github.com/lucaong/minisearch), with **up to ~98% less index RAM** on real corpora and compact binary snapshots you ship instead of JSON.
11
+
12
+ **Why it exists:** [MiniSearch](https://github.com/lucaong/minisearch) optimizes for a mutable in-memory index. FrozenMiniSearch optimizes for **retained heap, disk footprint, and cold load** once the corpus is fixed — packed radix postings, columnar `storeFields`, typed-array layouts, and MSv5 binary wire format instead of per-document JS objects.
13
+
14
+ **Design goal:** migrate with minimal code change — package name and index construction only; serving code stays the same. Build with `fromDocuments`, the incremental builder, or `fromJson`; no mutable `MiniSearch` class is published here.
8
15
 
9
16
  ---
10
17
 
11
18
  ## Why frozen instead of MiniSearch?
12
19
 
13
- **Mutable** lucaong `minisearch` when documents change (`add`, `remove`, `discard`). **Frozen** when the corpus is fixed or shipped as a binary snapshot: same BM25, prefix/fuzzy, `autoSuggest`, wildcard, and `AND` / `OR` / `AND_NOT`. Parity with `minisearch@7` is validated in `dev/parity/` (scores `toBeCloseTo` precision 6).
20
+ Choose **mutable MiniSearch** when documents change at runtime (`add`, `remove`, `discard`). Choose **frozen** when memory and snapshot size matter: fixed corpus, deploy from binary, many replicas loading the same index. Search semantics stay the same (BM25, prefix/fuzzy, `autoSuggest`, wildcard, `AND` / `OR` / `AND_NOT`), with parity vs MiniSearch 7 validated in `dev/parity/` (scores `toBeCloseTo` precision 6).
21
+
22
+ ### Memory-first design
23
+
24
+ | Technique | What it saves |
25
+ |-----------|---------------|
26
+ | **Packed radix tree + flat postings** | Term dictionary and posting lists without per-entry JS wrappers |
27
+ | **Columnar `storeFields`** | One dense column per field instead of a `Record` per document (~75% less heap for a single stored field) |
28
+ | **MSv5 binary snapshots** | ~73–94% smaller on disk than MiniSearch JSON; faster cold load |
29
+ | **Read-only freeze** | No mutation bookkeeping — layouts sized for serve-time, not incremental edit |
30
+ | **Incremental builder** | Typed-array accumulators during build; lower peak heap than materializing `number[][]` per term |
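
To make the columnar `storeFields` row in the table concrete, here is a minimal sketch of the row-versus-column idea. It is a hypothetical illustration, not the package's internal layout: `toColumns` and `readStoredFields` are invented names, and plain arrays stand in for whatever dense storage the package uses.

```javascript
// Hypothetical illustration: one array per stored field instead of
// one object per document.
function toColumns(docs, storeFields) {
  const columns = {}
  for (const field of storeFields) {
    columns[field] = docs.map((doc) => doc[field])
  }
  return columns
}

// Rebuild a per-document record on demand from the columns.
function readStoredFields(columns, docIndex) {
  const out = {}
  for (const field of Object.keys(columns)) {
    out[field] = columns[field][docIndex]
  }
  return out
}

const docs = [
  { id: 1, title: 'Moby Dick', text: 'Call me Ishmael' },
  { id: 2, title: 'Zen', text: 'Maintenance of motorcycles' }
]
const columns = toColumns(docs, ['title'])
console.log(readStoredFields(columns, 1)) // { title: 'Zen' }
```

The retained structure is one array per field rather than one object per document, which is where the per-document overhead goes away; records are materialized only when a result is read.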
14
31
 
15
32
  <!-- vs-reference:start — npm run bench:readme -->
16
- ### Measured vs lucaong MiniSearch (reference baseline)
33
+ ### Measured vs MiniSearch (reference baseline)
17
34
 
18
- Same BM25 queries on identical corpora. **Frozen wins on what we optimize for**: RAM, disk, cold load, and search throughput on real workloads.
35
+ Same BM25 queries on identical corpora. **Index RAM is the headline metric**: frozen uses a fraction of mutable heap on every scenario below; disk and cold load follow from the compact binary format.
19
36
 
20
37
  | Scenario | Docs | Index RAM¹ | Disk (binary vs JSON)² | Cold load³ | Search p50⁴ |
21
38
  |----------|-----:|------------|------------------------:|-----------:|------------:|
@@ -29,9 +46,10 @@ Same BM25 queries on identical corpora. **Frozen wins on what we optimize for**:
29
46
 
30
47
  Decomposition (Divina exact): L0 lookup ~300 ns frozen, L1 `executeQuery` ~8.3 µs, L2 full `search` ~11.6 µs (finalize ≈ 3 µs).
31
48
 
32
- | | lucaong `minisearch` | `@yoch/frozenminisearch` |
49
+ | | MiniSearch | `@yoch/frozenminisearch` |
33
50
  |---|------------------------|---------------------------|
34
- | **Sweet spot** | Live index mutations | Fixed corpus, deploy from binary |
51
+ | **Optimizes for** | Live mutations, flexibility | **Retained RAM**, snapshot size, cold load |
52
+ | **Sweet spot** | Documents change at runtime | Fixed corpus, many replicas, tight memory budget |
35
53
  | **Production path** | `addAll` → `toJSON` | `fromDocuments` / `fromMiniSearch` → `saveBinarySync` → `loadBinarySync` |
36
54
  | **Typical trade-off** | Higher RAM, JSON snapshots | One-time freeze, then compact binary |
37
55
 
@@ -91,17 +109,17 @@ ESM and CommonJS are both supported (`main` → CJS, `module` → ESM).
91
109
 
92
110
  ## Drop-in
93
111
 
94
- For **fixed corpora** (build once, serve read-only), treat this package as a drop-in replacement for lucaong `minisearch` on the serving path.
112
+ For **fixed corpora** (build once, serve read-only), treat this package as a drop-in replacement for MiniSearch on the serving path — same queries, far less memory per replica.
95
113
 
96
114
  **Change only:**
97
115
 
98
116
  | What | Before | After |
99
117
  |------|--------|-------|
100
- | Package | lucaong `minisearch` | `@yoch/frozenminisearch` |
118
+ | Package | `minisearch` | `@yoch/frozenminisearch` |
101
119
  | Construction | `new MiniSearch(opts).addAll(docs)` | `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` |
102
- | JSON snapshot | `MiniSearch.loadJSON(json)` / `toJSON()` wire format | `FrozenMiniSearch.fromMiniSearchJson(json, opts)` or `fromMiniSearchSnapshot(obj)` — no runtime dependency on lucaong `minisearch` |
120
+ | JSON snapshot | `toJSON()` / `loadJSON()` wire format | `FrozenMiniSearch.toJSON()` / `fromJson(json, opts)` or `fromMiniSearchSnapshot(obj)` — no runtime dependency on `minisearch` |
103
121
 
104
- **Keep unchanged** after load: `search`, `autoSuggest`, `has`, `getStoredFields`, query options (`prefix`, `fuzzy`, `AND` / `OR` / `AND_NOT`, filters, boosts). Parity vs `minisearch@7` is enforced in `dev/parity/`.
122
+ **Keep unchanged** after load: `search`, `autoSuggest`, `has`, `getStoredFields`, query options (`prefix`, `fuzzy`, `AND` / `OR` / `AND_NOT`, filters, boosts). Parity vs MiniSearch 7 is enforced in `dev/parity/`.
105
123
 
106
124
  **Imports** — default and named both work (ESM and CJS):
107
125
 
@@ -121,7 +139,7 @@ const { FrozenMiniSearch } = require('@yoch/frozenminisearch')
121
139
 
122
140
  ## Migration
123
141
 
124
- ### From lucaong `minisearch` JSON
142
+ ### From MiniSearch JSON
125
143
 
126
144
  ```javascript
127
145
  import MiniSearch from 'minisearch' // build-time only
@@ -135,18 +153,18 @@ const frozen = FrozenMiniSearch.fromMiniSearch(mutable, options)
135
153
 
136
154
  // Option B — serialized index (offline / ETL)
137
155
  const json = JSON.stringify(mutable)
138
- const frozen2 = FrozenMiniSearch.fromMiniSearchJson(json, options)
156
+ const frozen2 = FrozenMiniSearch.fromJson(json, options)
139
157
  ```
140
158
 
141
159
  `options.fields` must match the indexed fields in the snapshot when provided.
142
160
 
143
- ### From lucaong `minisearch` (mutable → frozen)
161
+ ### From MiniSearch (mutable → frozen)
144
162
 
145
163
  | Before (mutable) | After (`@yoch/frozenminisearch`) |
146
164
  |------------------|----------------------------------|
147
165
  | `new MiniSearch(opts).addAll(docs)` then serve | `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` |
148
- | lucaong JSON snapshot | `FrozenMiniSearch.fromMiniSearchJson(json)` or `fromMiniSearchSnapshot(obj)` |
149
- | `import MiniSearch from 'minisearch'` | `import FrozenMiniSearch from '@yoch/frozenminisearch'` (+ lucaong `minisearch` only if you still build mutable indexes) |
166
+ | MiniSearch JSON snapshot | `FrozenMiniSearch.fromJson(json)` or `fromMiniSearchSnapshot(obj)` |
167
+ | `import MiniSearch from 'minisearch'` | `import FrozenMiniSearch from '@yoch/frozenminisearch'` (+ `minisearch` only if you still build mutable indexes) |
150
168
 
151
169
  ---
152
170
 
@@ -163,13 +181,31 @@ Indexing is **not** available on a frozen instance — use `fromDocuments`, the
163
181
 
164
182
  ## Binary snapshots
165
183
 
184
+ The primary way to **persist and ship a memory-compact index** — smaller than MiniSearch JSON and faster to load into a low-RAM serving process.
185
+
166
186
  ```javascript
167
187
  const buf = index.saveBinarySync()
168
188
  const loaded = FrozenMiniSearch.loadBinarySync(buf, {}) // field names embedded in snapshot
169
189
  ```
170
190
 
171
- - **Node ≥ 22.15.0** (zstd via `node:zlib`)
172
- - Snapshots produced by this package version are forward-compatible; re-build from lucaong JSON if an older binary fails to load
191
+ - **Node ≥ 20**
192
+ - Default snapshot compression (`compression: 'auto'`, one pass):
193
+ - payloads under 64 B stay raw
194
+ - `zstd` on Node 22.15+ when it strictly shrinks the payload
195
+ - otherwise `zlib` on Node 20–22.14 when it strictly shrinks the payload
196
+ - otherwise `raw` (uncompressed)
197
+ - Explicit snapshot compression always writes the chosen codec, even when compression would not shrink the payload (useful for portability):
198
+
199
+ ```javascript
200
+ const portable = index.saveBinarySync({ compression: 'zlib' })
201
+ const uncompressed = index.saveBinarySync({ compression: 'raw' })
202
+ const bestRatio = index.saveBinarySync({ compression: 'zstd' }) // Node 22.15+
203
+ ```
204
+
205
+ - Snapshot readability depends on the embedded codec:
206
+ - `raw` and `zlib` snapshots load on Node 20+
207
+ - `zstd` snapshots require Node 22.15+
208
+ - Snapshots produced by this package version are forward-compatible; re-build from MiniSearch JSON if an older binary fails to load
173
209
  - `tokenize` / `processTerm` are not stored — pass the same functions at load when customized
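
Because readability depends on the codec, a loading process can check up front whether it is able to decode a given codec. The sketch below uses feature detection on `node:zlib` instead of parsing version strings; `canReadCodec` is a hypothetical helper, not part of the package API.

```javascript
const zlib = require('node:zlib')

// Hypothetical helper: can the current Node runtime decode this codec?
function canReadCodec(codec) {
  switch (codec) {
    case 'raw':
    case 'zlib':
      return true // readable on every supported Node version
    case 'zstd':
      // zstd support is only present in newer node:zlib builds.
      return typeof zlib.zstdDecompressSync === 'function'
    default:
      return false // unknown codec name
  }
}
```

When snapshots are built on a newer Node than the one that serves them, writing with `{ compression: 'zlib' }` keeps them loadable everywhere, at the cost of a somewhat larger file.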
174
210
 
175
211
  ---
@@ -206,6 +242,6 @@ Design notes (freq adaptive, AND gating): [dev/docs/README.md](dev/docs/README.m
206
242
  See [CHANGELOG.md](./CHANGELOG.md).
207
243
 
208
244
  - **MiniSearch** — [Luca Ongaro](https://github.com/lucaong/minisearch) (MIT)
209
- - **@yoch/frozenminisearch** — frozen indexes, packed radix tree, compact binary snapshots
245
+ - **@yoch/frozenminisearch** — memory-optimized frozen indexes, packed radix tree, compact binary snapshots
210
246
 
211
247
  Upstream docs: [MiniSearch](https://lucaong.github.io/minisearch/)