@yoch/frozenminisearch 1.0.1 → 1.1.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,28 @@
2
2
 
3
3
  ## Unreleased
4
4
 
5
+ ## v1.1.0 — `@yoch/frozenminisearch`
6
+
7
+ Minor release: MiniSearch JSON wire export and clearer JSON import API. MSv5 binary format unchanged.
8
+
9
+ ### Added
10
+
11
+ - **`toJSON()`** — export MiniSearch wire snapshots (`serializationVersion: 2`); import via `fromJson` / `fromMiniSearchSnapshot`. Production persistence remains `saveBinarySync`.
12
+
13
+ ### Breaking
14
+
15
+ - **`fromMiniSearchJson` → `fromJson`** — rename for clearer semantics (JSON import vs binary load). Update call sites: `FrozenMiniSearch.fromMiniSearchJson(json)` → `FrozenMiniSearch.fromJson(json)`.
16
+
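For code that has to run against both 1.0.x and 1.1.0 during the upgrade, the rename can be bridged with a small wrapper. This is a minimal sketch: `importFrozenFromJson` is a hypothetical helper, not part of the package, and the `v10` / `v11` objects are stand-ins for the two API shapes rather than real `FrozenMiniSearch` classes.

```javascript
// Hypothetical helper (not part of the package): resolve the JSON import
// method under either name so call sites work on both sides of the rename.
function importFrozenFromJson(FrozenMiniSearch, json, options) {
  const load = typeof FrozenMiniSearch.fromJson === 'function'
    ? FrozenMiniSearch.fromJson           // name in v1.1.0
    : FrozenMiniSearch.fromMiniSearchJson // name before v1.1.0
  if (typeof load !== 'function') {
    throw new TypeError('no JSON import method found on FrozenMiniSearch')
  }
  return load.call(FrozenMiniSearch, json, options)
}

// Stand-ins for the two API shapes, used only to exercise the helper.
const v10 = { fromMiniSearchJson: (json) => ({ via: 'fromMiniSearchJson', json }) }
const v11 = { fromJson: (json) => ({ via: 'fromJson', json }) }
```

Once every deployment is on 1.1.0 or later, the wrapper can be dropped in favour of calling `FrozenMiniSearch.fromJson(json)` directly.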
17
+ ## v1.0.2 — `@yoch/frozenminisearch`
18
+
19
+ Patch release: lower retained heap when `storeFields` has one field. No API or MSv5 wire-format changes.
20
+
21
+ ### Improved
22
+
23
+ - **Single-field `storeFields` at rest** — values live in a dense column instead of one `Record` per document (~75% less retained heap on Divina with `storeFields: ['txt']`; ~1.0 → ~0.3 MB).
24
+ - **Binary save/load** — encode and decode skip intermediate row arrays when the in-memory layout or load `storeFields` hint allows direct wire paths (same bytes on disk).
25
+ - **Posting slice lookups** — scoring flyweight reuses a scratch buffer instead of allocating `{ offset, length }` per lookup.
26
+
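The posting slice change above follows a common allocation-avoidance pattern. The sketch below illustrates the general idea only; the array names and layout are assumptions for illustration and are not taken from the package's source.

```javascript
// Assumed flat layout: term i owns the slice
// [offsets[i], offsets[i] + lengths[i]) of a shared postings array.
const offsets = Uint32Array.of(0, 3, 5)
const lengths = Uint32Array.of(3, 2, 4)

// Allocating variant: a fresh { offset, length } object on every lookup.
function lookupAlloc(termId) {
  return { offset: offsets[termId], length: lengths[termId] }
}

// Scratch variant: one reusable two-slot buffer, overwritten on every lookup.
// Callers must read both values before the next call.
const scratch = new Uint32Array(2)
function lookupScratch(termId) {
  scratch[0] = offsets[termId] // slot 0: offset
  scratch[1] = lengths[termId] // slot 1: length
  return scratch
}
```

The trade-off is that the returned buffer is only valid until the next lookup, which suits a tight scoring loop but not code that keeps results around.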
5
27
  ## v1.0.1 — `@yoch/frozenminisearch`
6
28
 
7
29
  Patch release: lower build-time peak memory and migration ergonomics. No API or wire-format changes.
@@ -13,7 +35,7 @@ Patch release: lower build-time peak memory and migration ergonomics. No API or
13
35
 
14
36
  ### Fixed
15
37
 
16
- - **Default tokenizer parity** — leading delimiter produces an empty token (e.g. `::a` → `["", "a"]`), matching lucaong `split` behaviour.
38
+ - **Default tokenizer parity** — leading delimiter produces an empty token (e.g. `::a` → `["", "a"]`), matching MiniSearch `split` behaviour.
17
39
  - **Named export** — `FrozenMiniSearch` is exported again alongside the default export (ESM and CJS).
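
The empty leading token in the tokenizer fix above is standard `String.prototype.split` behaviour when the input starts with a delimiter. The pattern below is a simplified stand-in chosen for illustration, not the package's actual delimiter expression.

```javascript
// Simplified stand-in for a "whitespace or punctuation" delimiter pattern
// (an assumption for this sketch, not copied from the package).
const DELIMITERS = /[\s\p{P}]+/u

// A string that begins with delimiters yields an empty first token,
// because split() emits the (empty) text before the first match.
const tokens = '::a'.split(DELIMITERS)
console.log(tokens) // → ["", "a"]
```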
18
40
 
19
41
  ## v1.0.0 — `@yoch/frozenminisearch`
@@ -22,7 +44,7 @@ First stable release on npm. Frozen-only read-only search for Node.js.
22
44
 
23
45
  ### Breaking
24
46
 
25
- - **Binary snapshots** — `loadBinarySync` / `loadBinaryAsync` read only the current frozen binary format; re-build from lucaong JSON if an older snapshot fails to load.
47
+ - **Binary snapshots** — `loadBinarySync` / `loadBinaryAsync` read only the current frozen binary format; re-build from MiniSearch JSON if an older snapshot fails to load.
26
48
  - **Removed `saveBinary()` / `loadBinary()`** — use `saveBinarySync` / `saveBinaryAsync` and `loadBinarySync` / `loadBinaryAsync`.
27
49
 
28
50
  ## v1.0.0-beta.0 — `@yoch/frozenminisearch`
@@ -32,16 +54,16 @@ New standalone package (frozen-only) for read-only serving workloads.
32
54
  ### Added
33
55
 
34
56
  - **`FrozenMiniSearch`** as the default export — `fromDocuments`, builder, `saveBinarySync` / `loadBinarySync`
35
- - **Migration loaders** — `fromMiniSearch`, `fromMiniSearchJson`, `fromMiniSearchSnapshot` (lucaong JSON wire format)
57
+ - **Migration loaders** — `fromMiniSearch`, `fromJson`, `fromMiniSearchSnapshot` (MiniSearch JSON wire format)
36
58
  - **Modular benchmarks** — `npm run bench` with profiles `vs-reference`, `regression`, `dev`
37
59
  - **Parity suite** — `dev/parity/` vs `minisearch` npm (functional invariants)
38
60
 
39
61
  ### Removed from published API
40
62
 
41
63
  - Mutable `MiniSearch` class and `freeze()` on the fork
42
- - `freezeFromMiniSearch` (use `fromMiniSearchJson`)
64
+ - `freezeFromMiniSearch` (use `fromJson`)
43
65
  - Read-only mutation stubs (`add`, `remove`, …)
44
66
 
45
67
  ### Migration
46
68
 
47
- - `new MiniSearch(opts).addAll(docs)` (lucaong) → `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` — see README
69
+ - `new MiniSearch(opts).addAll(docs)` → `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` — see README
package/README.md CHANGED
@@ -1,21 +1,36 @@
1
- # @yoch/frozenminisearch
1
+ # FrozenMiniSearch
2
2
 
3
- **Read-only full-text search for Node.js** — compact frozen indexes, fast binary snapshots, and a **drop-in** search API for frozen workloads: same `search`, `autoSuggest`, scoring, and query options as [MiniSearch](https://github.com/lucaong/minisearch) by [Luca Ongaro](https://github.com/lucaong).
3
+ [![npm version](https://img.shields.io/npm/v/@yoch/frozenminisearch.svg)](https://www.npmjs.com/package/@yoch/frozenminisearch)
4
+ [![coverage](https://codecov.io/gh/yoch/frozenminisearch/graph/badge.svg)](https://codecov.io/gh/yoch/frozenminisearch)
5
+ [![CI](https://github.com/yoch/frozenminisearch/actions/workflows/main.yml/badge.svg)](https://github.com/yoch/frozenminisearch/actions/workflows/main.yml)
6
+ [![Socket Badge](https://socket.dev/api/badge/npm/package/@yoch/frozenminisearch)](https://socket.dev/npm/package/%40yoch%2Ffrozenminisearch)
4
7
 
5
- > **Current release:** `1.0.1` on npm
8
+ **Memory-optimized, read-only full-text search for Node.js**: the same BM25, prefix/fuzzy, and `autoSuggest` API as [MiniSearch](https://github.com/lucaong/minisearch), with **up to ~98% less index RAM** on real corpora and compact binary snapshots you ship instead of JSON.
6
9
 
7
- **Design goal:** once an index is built or loaded, migrate with the minimum code change: package name and index construction only; serving code stays the same. No mutable `MiniSearch` class is published here; build indexes with `fromDocuments`, the incremental builder, or migrate from an existing lucaong index via `fromMiniSearchJson`.
10
+ **Why it exists:** [MiniSearch](https://github.com/lucaong/minisearch) optimizes for a mutable in-memory index. FrozenMiniSearch optimizes for **retained heap, disk footprint, and cold load** once the corpus is fixed: packed radix postings, columnar `storeFields`, typed-array layouts, and the MSv5 binary wire format instead of per-document JS objects.
11
+
12
+ **Design goal:** migrate with minimal code change — package name and index construction only; serving code stays the same. Build with `fromDocuments`, the incremental builder, or `fromJson`; no mutable `MiniSearch` class is published here.
8
13
 
9
14
  ---
10
15
 
11
16
  ## Why frozen instead of MiniSearch?
12
17
 
13
- **Mutable** lucaong `minisearch` when documents change (`add`, `remove`, `discard`). **Frozen** when the corpus is fixed or shipped as a binary snapshot: same BM25, prefix/fuzzy, `autoSuggest`, wildcard, and `AND` / `OR` / `AND_NOT`. Parity with `minisearch@7` is validated in `dev/parity/` (scores `toBeCloseTo` precision 6).
18
+ Choose **mutable MiniSearch** when documents change at runtime (`add`, `remove`, `discard`). Choose **frozen** when memory and snapshot size matter: fixed corpus, deploy from binary, many replicas loading the same index. Search semantics stay the same (BM25, prefix/fuzzy, `autoSuggest`, wildcard, `AND` / `OR` / `AND_NOT`), with parity vs MiniSearch 7 validated in `dev/parity/` (scores `toBeCloseTo` precision 6).
19
+
20
+ ### Memory-first design
21
+
22
+ | Technique | What it saves |
23
+ |-----------|---------------|
24
+ | **Packed radix tree + flat postings** | Term dictionary and posting lists without per-entry JS wrappers |
25
+ | **Columnar `storeFields`** | One dense column per field instead of a `Record` per document (~75% less heap for a single stored field) |
26
+ | **MSv5 binary snapshots** | ~73–94% smaller on disk than MiniSearch JSON; faster cold load |
27
+ | **Read-only freeze** | No mutation bookkeeping — layouts sized for serve-time, not incremental edit |
28
+ | **Incremental builder** | Typed-array accumulators during build; lower peak heap than materializing `number[][]` per term |
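
To make the columnar `storeFields` row in the table above concrete, here is a minimal sketch of the difference between a per-document record and a dense column per field. The function names `toColumns` and `readStoredFields` are hypothetical and the layout is a simplification for illustration, not the package's internal representation.

```javascript
// Row layout: one object (a "Record") per document.
const rows = [{ txt: 'alpha' }, { txt: 'beta' }, { txt: 'gamma' }]

// Columnar layout: one dense array per stored field, indexed by document
// position. Field names are held once instead of once per document.
function toColumns(rows, storeFields) {
  const columns = {}
  for (const field of storeFields) {
    columns[field] = rows.map((row) => row[field])
  }
  return columns
}

// Rebuild one document's stored fields on demand from the columns.
function readStoredFields(columns, docIndex) {
  const out = {}
  for (const field of Object.keys(columns)) {
    out[field] = columns[field][docIndex]
  }
  return out
}

const columns = toColumns(rows, ['txt'])
```

With a single stored field, the columnar form keeps one array of values and no per-document wrapper objects, which is the kind of saving the table describes.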
14
29
 
15
30
  <!-- vs-reference:start — npm run bench:readme -->
16
- ### Measured vs lucaong MiniSearch (reference baseline)
31
+ ### Measured vs MiniSearch (reference baseline)
17
32
 
18
- Same BM25 queries on identical corpora. **Frozen wins on what we optimize for**: RAM, disk, cold load, and search throughput on real workloads.
33
+ Same BM25 queries on identical corpora. **Index RAM is the headline metric**: frozen uses a fraction of mutable heap on every scenario below; disk and cold load follow from the compact binary format.
19
34
 
20
35
  | Scenario | Docs | Index RAM¹ | Disk (binary vs JSON)² | Cold load³ | Search p50⁴ |
21
36
  |----------|-----:|------------|------------------------:|-----------:|------------:|
@@ -29,9 +44,10 @@ Same BM25 queries on identical corpora. **Frozen wins on what we optimize for**:
29
44
 
30
45
  Decomposition (Divina exact): L0 lookup ~300 ns frozen, L1 `executeQuery` ~8.3 µs, L2 full `search` ~11.6 µs (finalize ≈ 3 µs).
31
46
 
32
- | | lucaong `minisearch` | `@yoch/frozenminisearch` |
47
+ | | MiniSearch | `@yoch/frozenminisearch` |
33
48
  |---|------------------------|---------------------------|
34
- | **Sweet spot** | Live index mutations | Fixed corpus, deploy from binary |
49
+ | **Optimizes for** | Live mutations, flexibility | **Retained RAM**, snapshot size, cold load |
50
+ | **Sweet spot** | Documents change at runtime | Fixed corpus, many replicas, tight memory budget |
35
51
  | **Production path** | `addAll` → `toJSON` | `fromDocuments` / `fromMiniSearch` → `saveBinarySync` → `loadBinarySync` |
36
52
  | **Typical trade-off** | Higher RAM, JSON snapshots | One-time freeze, then compact binary |
37
53
 
@@ -91,17 +107,17 @@ ESM and CommonJS are both supported (`main` → CJS, `module` → ESM).
91
107
 
92
108
  ## Drop-in
93
109
 
94
- For **fixed corpora** (build once, serve read-only), treat this package as a drop-in replacement for lucaong `minisearch` on the serving path.
110
+ For **fixed corpora** (build once, serve read-only), treat this package as a drop-in replacement for MiniSearch on the serving path — same queries, far less memory per replica.
95
111
 
96
112
  **Change only:**
97
113
 
98
114
  | What | Before | After |
99
115
  |------|--------|-------|
100
- | Package | lucaong `minisearch` | `@yoch/frozenminisearch` |
116
+ | Package | `minisearch` | `@yoch/frozenminisearch` |
101
117
  | Construction | `new MiniSearch(opts).addAll(docs)` | `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` |
102
- | JSON snapshot | `MiniSearch.loadJSON(json)` / `toJSON()` wire format | `FrozenMiniSearch.fromMiniSearchJson(json, opts)` or `fromMiniSearchSnapshot(obj)` — no runtime dependency on lucaong `minisearch` |
118
+ | JSON snapshot | `toJSON()` / `loadJSON()` wire format | `FrozenMiniSearch.toJSON()` / `fromJson(json, opts)` or `fromMiniSearchSnapshot(obj)` — no runtime dependency on `minisearch` |
103
119
 
104
- **Keep unchanged** after load: `search`, `autoSuggest`, `has`, `getStoredFields`, query options (`prefix`, `fuzzy`, `AND` / `OR` / `AND_NOT`, filters, boosts). Parity vs `minisearch@7` is enforced in `dev/parity/`.
120
+ **Keep unchanged** after load: `search`, `autoSuggest`, `has`, `getStoredFields`, query options (`prefix`, `fuzzy`, `AND` / `OR` / `AND_NOT`, filters, boosts). Parity vs MiniSearch 7 is enforced in `dev/parity/`.
105
121
 
106
122
  **Imports** — default and named both work (ESM and CJS):
107
123
 
@@ -121,7 +137,7 @@ const { FrozenMiniSearch } = require('@yoch/frozenminisearch')
121
137
 
122
138
  ## Migration
123
139
 
124
- ### From lucaong `minisearch` JSON
140
+ ### From MiniSearch JSON
125
141
 
126
142
  ```javascript
127
143
  import MiniSearch from 'minisearch' // build-time only
@@ -135,18 +151,18 @@ const frozen = FrozenMiniSearch.fromMiniSearch(mutable, options)
135
151
 
136
152
  // Option B — serialized index (offline / ETL)
137
153
  const json = JSON.stringify(mutable)
138
- const frozen2 = FrozenMiniSearch.fromMiniSearchJson(json, options)
154
+ const frozen2 = FrozenMiniSearch.fromJson(json, options)
139
155
  ```
140
156
 
141
157
  `options.fields` must match the indexed fields in the snapshot when provided.
142
158
 
143
- ### From lucaong `minisearch` (mutable → frozen)
159
+ ### From MiniSearch (mutable → frozen)
144
160
 
145
161
  | Before (mutable) | After (`@yoch/frozenminisearch`) |
146
162
  |------------------|----------------------------------|
147
163
  | `new MiniSearch(opts).addAll(docs)` then serve | `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` |
148
- | lucaong JSON snapshot | `FrozenMiniSearch.fromMiniSearchJson(json)` or `fromMiniSearchSnapshot(obj)` |
149
- | `import MiniSearch from 'minisearch'` | `import FrozenMiniSearch from '@yoch/frozenminisearch'` (+ lucaong `minisearch` only if you still build mutable indexes) |
164
+ | MiniSearch JSON snapshot | `FrozenMiniSearch.fromJson(json)` or `fromMiniSearchSnapshot(obj)` |
165
+ | `import MiniSearch from 'minisearch'` | `import FrozenMiniSearch from '@yoch/frozenminisearch'` (+ `minisearch` only if you still build mutable indexes) |
150
166
 
151
167
  ---
152
168
 
@@ -163,13 +179,15 @@ Indexing is **not** available on a frozen instance — use `fromDocuments`, the
163
179
 
164
180
  ## Binary snapshots
165
181
 
182
+ The primary way to **persist and ship a memory-compact index** — smaller than MiniSearch JSON and faster to load into a low-RAM serving process.
183
+
166
184
  ```javascript
167
185
  const buf = index.saveBinarySync()
168
186
  const loaded = FrozenMiniSearch.loadBinarySync(buf, {}) // field names embedded in snapshot
169
187
  ```
170
188
 
171
189
  - **Node ≥ 22.15.0** (zstd via `node:zlib`)
172
- - Snapshots produced by this package version are forward-compatible; re-build from lucaong JSON if an older binary fails to load
190
+ - Snapshots produced by this package version are forward-compatible; re-build from MiniSearch JSON if an older binary fails to load
173
191
  - `tokenize` / `processTerm` are not stored — pass the same functions at load when customized
174
192
 
175
193
  ---
@@ -206,6 +224,6 @@ Design notes (freq adaptive, AND gating): [dev/docs/README.md](dev/docs/README.m
206
224
  See [CHANGELOG.md](./CHANGELOG.md).
207
225
 
208
226
  - **MiniSearch** — [Luca Ongaro](https://github.com/lucaong/minisearch) (MIT)
209
- - **@yoch/frozenminisearch** — frozen indexes, packed radix tree, compact binary snapshots
227
+ - **@yoch/frozenminisearch** — memory-optimized frozen indexes, packed radix tree, compact binary snapshots
210
228
 
211
229
  Upstream docs: [MiniSearch](https://lucaong.github.io/minisearch/)