@yoch/frozenminisearch 1.0.2 → 1.2.0
- package/CHANGELOG.md +32 -5
- package/README.md +57 -21
- package/dist/cjs/index.cjs +276 -81
- package/dist/es/index.d.ts +37 -14
- package/dist/es/index.js +276 -81
- package/package.json +16 -5
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,33 @@
 
 ## Unreleased
 
+## v1.2.0 — `@yoch/frozenminisearch`
+
+Minor release: configurable MSv5 snapshot compression and Node 20 support.
+
+### Added
+
+- **`SaveBinaryOptions`** — `saveBinarySync()` / `saveBinaryAsync()` accept `{ compression: 'auto' | 'raw' | 'zstd' | 'zlib' }`.
+- **`CODEC_ZLIB`** — portable deflate snapshots readable on Node 20+; explicit `compression: 'zlib'` always writes zlib on disk.
+- **Exported types** — `BinaryCompression`, `SaveBinaryOptions`.
+
+### Improved
+
+- **`compression: 'auto'`** — one compression pass: zstd when available (Node 22.15+), otherwise zlib on Node 20–22.14, otherwise raw when compression does not strictly shrink the payload (including payloads under 64 B).
+- **Node engine** — `>=20` (was `>=22.15`); zstd remains available on Node 22.15+ and is required to read zstd snapshots.
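
The `'auto'` policy in the v1.2.0 entries can be sketched with Node's built-in `node:zlib` module. This is an illustrative reading of the changelog, not the package's implementation: the helper name `chooseAutoCodec`, the use of `deflateSync` for the zlib codec, and the returned object shape are all assumptions.

```javascript
// Sketch only: mirrors the documented 'auto' policy, not the package's actual code.
const zlib = require('node:zlib')

const MIN_COMPRESS_BYTES = 64 // threshold quoted in the changelog

// Hypothetical helper: returns the codec name and the bytes that would be written.
function chooseAutoCodec(payload) {
  // Tiny payloads are never compressed.
  if (payload.length < MIN_COMPRESS_BYTES) return { codec: 'raw', bytes: payload }

  // zstd bindings exist in node:zlib only on newer runtimes, so feature-detect them.
  const hasZstd = typeof zlib.zstdCompressSync === 'function'
  const codec = hasZstd ? 'zstd' : 'zlib'
  const packed = hasZstd ? zlib.zstdCompressSync(payload) : zlib.deflateSync(payload)

  // One pass only: keep the result if it is strictly smaller, otherwise fall back to raw.
  return packed.length < payload.length
    ? { codec, bytes: packed }
    : { codec: 'raw', bytes: payload }
}

const tiny = chooseAutoCodec(Buffer.from('short'))
const big = chooseAutoCodec(Buffer.alloc(4096, 'a'))
console.log(tiny.codec, big.codec, big.bytes.length < 4096)
```

Feature detection, rather than parsing the Node version, keeps the sketch valid on any runtime that ships the zstd bindings.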
+
+## v1.1.0 — `@yoch/frozenminisearch`
+
+Minor release: MiniSearch JSON wire export and clearer JSON import API. MSv5 binary format unchanged.
+
+### Added
+
+- **`toJSON()`** — export MiniSearch wire snapshots (`serializationVersion: 2`); import via `fromJson` / `fromMiniSearchSnapshot`. Production persistence remains `saveBinarySync`.
+
+### Breaking
+
+- **`fromMiniSearchJson` → `fromJson`** — rename for clearer semantics (JSON import vs binary load). Update call sites: `FrozenMiniSearch.fromMiniSearchJson(json)` → `FrozenMiniSearch.fromJson(json)`.
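
For code that has to run against both 1.0.x and 1.1.0+ during the rename, a small shim is one option. The sketch below is hypothetical: `importFromJson` is not part of the package, and the two stand-in classes exist only so the example runs without the package installed.

```javascript
// Hypothetical shim: call whichever JSON import method the supplied class exposes.
function importFromJson(FrozenClass, json, options) {
  if (typeof FrozenClass.fromJson === 'function') {
    return FrozenClass.fromJson(json, options) // name used from v1.1.0 onward
  }
  if (typeof FrozenClass.fromMiniSearchJson === 'function') {
    return FrozenClass.fromMiniSearchJson(json, options) // name used in v1.0.x
  }
  throw new TypeError('No JSON import method found on the supplied class')
}

// Stand-in classes (not the real package) that only record which method was called.
class OldApi {
  static fromMiniSearchJson(json) { return { via: 'fromMiniSearchJson', json } }
}
class NewApi {
  static fromJson(json) { return { via: 'fromJson', json } }
}

const viaOld = importFromJson(OldApi, '{}').via
const viaNew = importFromJson(NewApi, '{}').via
console.log(viaOld, viaNew)
```

In real code the first argument would be the imported `FrozenMiniSearch` class; once every deployment is on 1.1.0 or later, calling `fromJson` directly is simpler.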
+
 ## v1.0.2 — `@yoch/frozenminisearch`
 
 Patch release: lower retained heap when `storeFields` has one field. No API or MSv5 wire-format changes.
@@ -23,7 +50,7 @@ Patch release: lower build-time peak memory and migration ergonomics. No API or
 
 ### Fixed
 
-- **Default tokenizer parity** — leading delimiter produces an empty token (e.g. `::a` → `["", "a"]`), matching
+- **Default tokenizer parity** — leading delimiter produces an empty token (e.g. `::a` → `["", "a"]`), matching MiniSearch `split` behaviour.
 - **Named export** — `FrozenMiniSearch` is exported again alongside the default export (ESM and CJS).
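
The empty leading token in the tokenizer entry is ordinary `String.prototype.split` behaviour. A minimal sketch, assuming a simplified delimiter pattern (whitespace and colons) in place of the library's real one:

```javascript
// Assumption: this simplified pattern stands in for the real delimiter regex.
const DELIMITERS = /[\s:]+/u

// A split-based tokenizer keeps the empty string that precedes a leading delimiter.
const tokenize = (text) => text.split(DELIMITERS)

const leading = tokenize('::a')
const plain = tokenize('a b')
console.log(JSON.stringify(leading)) // ["","a"]
console.log(JSON.stringify(plain))   // ["a","b"]
```

A tokenizer that filtered out empty strings would return `["a"]` for the first input, which is the kind of difference a parity fix like this one removes.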
 
 ## v1.0.0 — `@yoch/frozenminisearch`
@@ -32,7 +59,7 @@ First stable release on npm. Frozen-only read-only search for Node.js.
 
 ### Breaking
 
-- **Binary snapshots** — `loadBinarySync` / `loadBinaryAsync` read only the current frozen binary format; re-build from
+- **Binary snapshots** — `loadBinarySync` / `loadBinaryAsync` read only the current frozen binary format; re-build from MiniSearch JSON if an older snapshot fails to load.
 - **Removed `saveBinary()` / `loadBinary()`** — use `saveBinarySync` / `saveBinaryAsync` and `loadBinarySync` / `loadBinaryAsync`.
 
 ## v1.0.0-beta.0 — `@yoch/frozenminisearch`
@@ -42,16 +69,16 @@ New standalone package (frozen-only) for read-only serving workloads.
 ### Added
 
 - **`FrozenMiniSearch`** as the default export — `fromDocuments`, builder, `saveBinarySync` / `loadBinarySync`
-- **Migration loaders** — `fromMiniSearch`, `
+- **Migration loaders** — `fromMiniSearch`, `fromJson`, `fromMiniSearchSnapshot` (MiniSearch JSON wire format)
 - **Modular benchmarks** — `npm run bench` with profiles `vs-reference`, `regression`, `dev`
 - **Parity suite** — `dev/parity/` vs `minisearch` npm (functional invariants)
 
 ### Removed from published API
 
 - Mutable `MiniSearch` class and `freeze()` on the fork
-- `freezeFromMiniSearch` (use `
+- `freezeFromMiniSearch` (use `fromJson`)
 - Read-only mutation stubs (`add`, `remove`, …)
 
 ### Migration
 
-- `new MiniSearch(opts).addAll(docs)`
+- `new MiniSearch(opts).addAll(docs)` → `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` — see README
package/README.md
CHANGED
@@ -1,21 +1,38 @@
-#
+# FrozenMiniSearch
 
-
+[](https://www.npmjs.com/package/@yoch/frozenminisearch)
+[](https://codecov.io/gh/yoch/frozenminisearch)
+[](https://github.com/yoch/frozenminisearch/actions/workflows/main.yml)
+[](https://socket.dev/npm/package/%40yoch%2Ffrozenminisearch)
 
-
+[API documentation](https://yoch.github.io/frozenminisearch/)
 
-**
+**Memory-optimized, read-only full-text search for Node.js** — the same BM25, prefix/fuzzy, and `autoSuggest` API as [MiniSearch](https://github.com/lucaong/minisearch), with **up to ~98% less index RAM** on real corpora and compact binary snapshots you ship instead of JSON.
+
+**Why it exists:** [MiniSearch](https://github.com/lucaong/minisearch) optimizes for a mutable in-memory index. FrozenMiniSearch optimizes for **retained heap, disk footprint, and cold load** once the corpus is fixed — packed radix postings, columnar `storeFields`, typed-array layouts, and MSv5 binary wire format instead of per-document JS objects.
+
+**Design goal:** migrate with minimal code change — package name and index construction only; serving code stays the same. Build with `fromDocuments`, the incremental builder, or `fromJson`; no mutable `MiniSearch` class is published here.
 
 ---
 
 ## Why frozen instead of MiniSearch?
 
-**
+Choose **mutable MiniSearch** when documents change at runtime (`add`, `remove`, `discard`). Choose **frozen** when memory and snapshot size matter: fixed corpus, deploy from binary, many replicas loading the same index. Search semantics stay the same — BM25, prefix/fuzzy, `autoSuggest`, wildcard, `AND` / `OR` / `AND_NOT` — with parity vs MiniSearch 7 validated in `dev/parity/` (scores `toBeCloseTo` precision 6).
+
+### Memory-first design
+
+| Technique | What it saves |
+|-----------|---------------|
+| **Packed radix tree + flat postings** | Term dictionary and posting lists without per-entry JS wrappers |
+| **Columnar `storeFields`** | One dense column per field instead of a `Record` per document (~75% less heap for a single stored field) |
+| **MSv5 binary snapshots** | ~73–94% smaller on disk than MiniSearch JSON; faster cold load |
+| **Read-only freeze** | No mutation bookkeeping — layouts sized for serve-time, not incremental edit |
+| **Incremental builder** | Typed-array accumulators during build; lower peak heap than materializing `number[][]` per term |
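
The columnar `storeFields` row in the table above can be illustrated with plain arrays. This is a conceptual sketch rather than the package's internal layout; `toColumns` and `readStoredFields` are hypothetical helper names.

```javascript
// Row layout: one object per document (a Record per document).
const rows = [
  { title: 'Alpha' },
  { title: 'Beta' },
  { title: 'Gamma' },
]

// Column layout: one dense array per stored field, indexed by document position.
function toColumns(records, fields) {
  const columns = {}
  for (const field of fields) columns[field] = records.map((r) => r[field])
  return columns
}

// Rebuild a single record on demand, only when a caller asks for it.
function readStoredFields(columns, docIndex) {
  const out = {}
  for (const field of Object.keys(columns)) out[field] = columns[field][docIndex]
  return out
}

const columns = toColumns(rows, ['title'])
const second = readStoredFields(columns, 1)
console.log(second)
```

With a single stored field, the column layout holds one array of values instead of one wrapper object per document, which is where the per-document overhead goes away.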
 
 <!-- vs-reference:start — npm run bench:readme -->
-### Measured vs
+### Measured vs MiniSearch (reference baseline)
 
-Same BM25 queries on identical corpora. **
+Same BM25 queries on identical corpora. **Index RAM is the headline metric** — frozen uses a fraction of mutable heap on every scenario below; disk and cold load follow from the compact binary format.
 
 | Scenario | Docs | Index RAM¹ | Disk (binary vs JSON)² | Cold load³ | Search p50⁴ |
 |----------|-----:|------------|------------------------:|-----------:|------------:|
@@ -29,9 +46,10 @@ Same BM25 queries on identical corpora. **Frozen wins on what we optimize for**:
 
 Decomposition (Divina exact): L0 lookup ~300 ns frozen, L1 `executeQuery` ~8.3 µs, L2 full `search` ~11.6 µs (finalize ≈ 3 µs).
 
-| |
+| | MiniSearch | `@yoch/frozenminisearch` |
 |---|------------------------|---------------------------|
-| **
+| **Optimizes for** | Live mutations, flexibility | **Retained RAM**, snapshot size, cold load |
+| **Sweet spot** | Documents change at runtime | Fixed corpus, many replicas, tight memory budget |
 | **Production path** | `addAll` → `toJSON` | `fromDocuments` / `fromMiniSearch` → `saveBinarySync` → `loadBinarySync` |
 | **Typical trade-off** | Higher RAM, JSON snapshots | One-time freeze, then compact binary |
 
@@ -91,17 +109,17 @@ ESM and CommonJS are both supported (`main` → CJS, `module` → ESM).
 
 ## Drop-in
 
-For **fixed corpora** (build once, serve read-only), treat this package as a drop-in replacement for
+For **fixed corpora** (build once, serve read-only), treat this package as a drop-in replacement for MiniSearch on the serving path — same queries, far less memory per replica.
 
 **Change only:**
 
 | What | Before | After |
 |------|--------|-------|
-| Package |
+| Package | `minisearch` | `@yoch/frozenminisearch` |
 | Construction | `new MiniSearch(opts).addAll(docs)` | `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` |
-| JSON snapshot | `
+| JSON snapshot | `toJSON()` / `loadJSON()` wire format | `FrozenMiniSearch.toJSON()` / `fromJson(json, opts)` or `fromMiniSearchSnapshot(obj)` — no runtime dependency on `minisearch` |
 
-**Keep unchanged** after load: `search`, `autoSuggest`, `has`, `getStoredFields`, query options (`prefix`, `fuzzy`, `AND` / `OR` / `AND_NOT`, filters, boosts). Parity vs
+**Keep unchanged** after load: `search`, `autoSuggest`, `has`, `getStoredFields`, query options (`prefix`, `fuzzy`, `AND` / `OR` / `AND_NOT`, filters, boosts). Parity vs MiniSearch 7 is enforced in `dev/parity/`.
 
 **Imports** — default and named both work (ESM and CJS):
 
@@ -121,7 +139,7 @@ const { FrozenMiniSearch } = require('@yoch/frozenminisearch')
 
 ## Migration
 
-### From
+### From MiniSearch JSON
 
 ```javascript
 import MiniSearch from 'minisearch' // build-time only
@@ -135,18 +153,18 @@ const frozen = FrozenMiniSearch.fromMiniSearch(mutable, options)
 
 // Option B — serialized index (offline / ETL)
 const json = JSON.stringify(mutable)
-const frozen2 = FrozenMiniSearch.
+const frozen2 = FrozenMiniSearch.fromJson(json, options)
 ```
 
 `options.fields` must match the indexed fields in the snapshot when provided.
 
-### From
+### From MiniSearch (mutable → frozen)
 
 | Before (mutable) | After (`@yoch/frozenminisearch`) |
 |------------------|----------------------------------|
 | `new MiniSearch(opts).addAll(docs)` then serve | `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` |
-|
-| `import MiniSearch from 'minisearch'` | `import FrozenMiniSearch from '@yoch/frozenminisearch'` (+
+| MiniSearch JSON snapshot | `FrozenMiniSearch.fromJson(json)` or `fromMiniSearchSnapshot(obj)` |
+| `import MiniSearch from 'minisearch'` | `import FrozenMiniSearch from '@yoch/frozenminisearch'` (+ `minisearch` only if you still build mutable indexes) |
 
 ---
 
@@ -163,13 +181,31 @@ Indexing is **not** available on a frozen instance — use `fromDocuments`, the
 
 ## Binary snapshots
 
+The primary way to **persist and ship a memory-compact index** — smaller than MiniSearch JSON and faster to load into a low-RAM serving process.
+
 ```javascript
 const buf = index.saveBinarySync()
 const loaded = FrozenMiniSearch.loadBinarySync(buf, {}) // field names embedded in snapshot
 ```
 
-- **Node ≥
--
+- **Node ≥ 20**
+- Default snapshot compression (`compression: 'auto'`, one pass):
+  - payloads under 64 B stay raw
+  - `zstd` on Node 22.15+ when it strictly shrinks the payload
+  - otherwise `zlib` on Node 20–22.14 when it strictly shrinks the payload
+  - otherwise `raw` (uncompressed)
+- Explicit snapshot compression always writes the chosen codec, even when compression would not shrink the payload (useful for portability):
+
+```javascript
+const portable = index.saveBinarySync({ compression: 'zlib' })
+const uncompressed = index.saveBinarySync({ compression: 'raw' })
+const bestRatio = index.saveBinarySync({ compression: 'zstd' }) // Node 22.15+
+```
+
+- Snapshot readability depends on the embedded codec:
+  - `raw` and `zlib` snapshots load on Node 20+
+  - `zstd` snapshots require Node 22.15+
+- Snapshots produced by this package version are forward-compatible; re-build from MiniSearch JSON if an older binary fails to load
 - `tokenize` / `processTerm` are not stored — pass the same functions at load when customized
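
The codec readability rules in the list above can be turned into a quick pre-deploy check. The sketch is hypothetical: `canLoadCodec` and the `MIN_NODE` table are illustrative names, with version thresholds copied from the list rather than taken from the package.

```javascript
// Minimum Node version per codec, copied from the README list (assumed complete).
const MIN_NODE = { raw: [20, 0], zlib: [20, 0], zstd: [22, 15] }

// Hypothetical helper: can a runtime with this version string load the given codec?
function canLoadCodec(codec, nodeVersion = process.versions.node) {
  const min = MIN_NODE[codec]
  if (!min) throw new RangeError(`Unknown codec: ${codec}`)
  const [major, minor] = nodeVersion.split('.').map(Number)
  return major > min[0] || (major === min[0] && minor >= min[1])
}

const zlibOnNode20 = canLoadCodec('zlib', '20.11.1')
const zstdOnOldNode22 = canLoadCodec('zstd', '22.14.0')
const zstdOnNewNode22 = canLoadCodec('zstd', '22.15.0')
console.log(zlibOnNode20, zstdOnOldNode22, zstdOnNewNode22)
```

Version arithmetic is only an approximation. On the target runtime itself, testing `typeof require('node:zlib').zstdDecompressSync === 'function'` is a more direct signal that zstd snapshots can be read.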
 
 ---
@@ -206,6 +242,6 @@ Design notes (freq adaptive, AND gating): [dev/docs/README.md](dev/docs/README.m
 See [CHANGELOG.md](./CHANGELOG.md).
 
 - **MiniSearch** — [Luca Ongaro](https://github.com/lucaong/minisearch) (MIT)
-- **@yoch/frozenminisearch** — frozen indexes, packed radix tree, compact binary snapshots
+- **@yoch/frozenminisearch** — memory-optimized frozen indexes, packed radix tree, compact binary snapshots
 
 Upstream docs: [MiniSearch](https://lucaong.github.io/minisearch/)