@yoch/frozenminisearch 1.2.1 → 1.2.3
- package/CHANGELOG.md +22 -0
- package/README.md +43 -99
- package/dist/cjs/index.cjs +419 -263
- package/dist/es/index.js +419 -263
- package/package.json +2 -1
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,28 @@
 
 ## Unreleased
 
+## v1.2.3 — `@yoch/frozenminisearch`
+
+Patch release: broad-first exact AND / AND_NOT paths, seek-based gated doc-id collection, and README benchmark copy refresh. No API or MSv5 wire-format changes.
+
+### Improved
+
+- **Broad-first exact AND** — on exact-only combined queries where the first branch posting is large and a later branch is selective enough, collect the final doc-id gate by estimated posting length, then score branches in query order (parity with naive score-then-intersect unchanged).
+- **Broad-first AND_NOT** — when the positive branch is large and a negated branch is comparably large, collect exclusions first and score the positive branch only on survivors.
+- **Gated doc-id collection** — `DocIdGate` lazy views and seek over sorted postings when the gate is much smaller than the posting list (`scoring.ts`, `frozenPostings.ts`).
+- **AND gate posting estimate** — skip upfront posting-length estimation on prefix/fuzzy AND branches; keep absolute-gate skip on the sequential path so Divina `AND+fuzzy` stays fast.
+
+## v1.2.2 — `@yoch/frozenminisearch`
+
+Patch release: faster frozen AND scoring on large posting lists (gated seek + posting-ratio gate) and BM25 segment hoisting. No API or MSv5 wire-format changes.
+
+### Improved
+
+- **AND gate posting-ratio** — when the absolute gate cap would disable filtering, pass `allowedDocs` to later AND branches if the gate is small relative to the branch posting length (calibrated: min length 2048, max 25% of posting). Applies to string AND and nested `QueryCombination` AND. Parity with naive score-then-intersect unchanged.
+- **Gated posting seek** — on selective AND paths, score gated segments with binary search by doc id instead of scanning full sorted posting lists (same numeric thresholds as the ratio gate; distinct decision point).
+- **BM25 IDF hoisting** — compute document-frequency IDF once per posting segment on frozen paths when doc activity filtering is inactive; lowers work on high-frequency AND queries.
+- **Posting layout selection** — cost-based choice between dense and sparse frozen posting layouts from field/term statistics at build time.
+
 ## v1.2.1 — `@yoch/frozenminisearch`
 
 Patch release: lower search overhead when stored fields are disabled and fewer query-normalization allocations. No API or MSv5 wire-format changes.
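Reviewer note on the "Gated posting seek" and "AND gate posting-ratio" entries above: the idea can be sketched as follows. This is a minimal illustration, not the package's code. The function names (`shouldSeek`, `lowerBound`, `intersectGated`) are hypothetical, and reading "min length 2048, max 25% of posting" as "posting list of at least 2048 ids, gate at most a quarter of its length" is an assumption based only on the changelog wording.

```javascript
// Hypothetical thresholds, copied from the v1.2.2 changelog wording.
// How the package applies them internally is an assumption.
const MIN_POSTING_LENGTH = 2048
const MAX_GATE_RATIO = 0.25

// Seeking only pays off for a long posting list and a comparatively small gate.
function shouldSeek(gateSize, postingLength) {
  return postingLength >= MIN_POSTING_LENGTH &&
    gateSize <= postingLength * MAX_GATE_RATIO
}

// Index of the first element >= target in sorted[from..], or sorted.length.
function lowerBound(sorted, target, from) {
  let lo = from
  let hi = sorted.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (sorted[mid] < target) lo = mid + 1
    else hi = mid
  }
  return lo
}

// Intersect a sorted gate of allowed doc ids with a sorted posting list.
function intersectGated(gate, posting) {
  const out = []
  if (shouldSeek(gate.length, posting.length)) {
    // Seek path: one binary search per gate entry, resuming from the last hit.
    let pos = 0
    for (const docId of gate) {
      pos = lowerBound(posting, docId, pos)
      if (pos === posting.length) break
      if (posting[pos] === docId) out.push(docId)
    }
    return out
  }
  // Scan path: linear two-pointer merge over both lists.
  let i = 0
  let j = 0
  while (i < gate.length && j < posting.length) {
    if (gate[i] === posting[j]) { out.push(gate[i]); i++; j++ }
    else if (gate[i] < posting[j]) i++
    else j++
  }
  return out
}
```

With a gate of g ids and a posting list of n ids, the seek path costs roughly g·log n comparisons instead of g + n, which is why it is only attractive when the gate is much smaller than the posting list.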
package/README.md
CHANGED
@@ -3,68 +3,44 @@
 [](https://www.npmjs.com/package/@yoch/frozenminisearch)
 [](https://codecov.io/gh/yoch/frozenminisearch)
 [](https://github.com/yoch/frozenminisearch/actions/workflows/main.yml)
-[](https://socket.dev/npm/package/%40yoch%2Ffrozenminisearch)
 
 [API documentation](https://yoch.github.io/frozenminisearch/)
 
-**Memory-optimized, read-only full-text search for Node.js
+**Memory-optimized, read-only full-text search for Node.js.** FrozenMiniSearch keeps the serving API close to [MiniSearch](https://github.com/lucaong/minisearch) while using compact, immutable indexes for fixed corpora.
 
-
+Use it when your documents are built offline, shipped to production, and queried many times. In that shape, frozen indexes use **~98-99% less index RAM** in the main benchmark set, save to compact binary snapshots, and load faster than MiniSearch JSON.
 
-
+If you need live `add`, `remove`, or `discard`, use MiniSearch. If the corpus is fixed, this package is designed to keep the search experience familiar while making each serving replica much smaller.
 
 ---
 
-## Why
+## Why FrozenMiniSearch?
 
-
+FrozenMiniSearch is for the common production path where search data changes elsewhere, not inside the web process:
 
-
+- Build or import the index offline.
+- Save it as a compact binary snapshot.
+- Load it in many read-only Node.js processes.
+- Query with MiniSearch-style `search`, `autoSuggest`, filters, boosts, prefix/fuzzy search, wildcard, and `AND` / `OR` / `AND_NOT`.
 
-
-|-----------|---------------|
-| **Packed radix tree + flat postings** | Term dictionary and posting lists without per-entry JS wrappers |
-| **Columnar `storeFields`** | One dense column per field instead of a `Record` per document (~75% less heap for a single stored field) |
-| **MSv5 binary snapshots** | ~73–94% smaller on disk than MiniSearch JSON; faster cold load |
-| **Read-only freeze** | No mutation bookkeeping — layouts sized for serve-time, not incremental edit |
-| **Incremental builder** | Typed-array accumulators during build; lower peak heap than materializing `number[][]` per term |
+Internally it replaces mutable JavaScript object graphs with packed radix postings, typed arrays, and columnar stored fields. The result is less flexible than MiniSearch, but much cheaper to keep resident.
 
 <!-- vs-reference:start — npm run bench:readme -->
-### Measured vs MiniSearch
+### Measured vs MiniSearch
 
-Same BM25 queries
+Same corpora, same BM25-style queries, MiniSearch 7.2.0 as the reference.
 
-| Scenario | Docs | Index RAM
-
-| Divina with
-| Divina index only | 14,097 | 0.
-|
-| Dense numeric ids
-|
+| Scenario | Docs | Index RAM | Binary size | Load time | Search p50 |
+|----------|-----:|-----------|------------:|----------:|-----------:|
+| Divina, with stored text | 14,097 | 0.3 vs 16.1 MB (~98% less) | ~73% less | ~75% faster | ~14% faster |
+| Divina, index only | 14,097 | 0.2 vs 14.9 MB (~99% less) | ~77% less | ~89% faster | ~23% faster |
+| High-frequency terms | 10,000 | 4.4 vs 7.4 MB (~41% less) | ~94% less | ~91% faster | ~40% faster |
+| Dense numeric ids | 100,000 | 0.9 vs 91.3 MB (~99% less) | ~88% less | ~94% faster | ~27% faster |
+| Uint16 doc id boundary | 65,535 | 0.6 vs 58.6 MB (~99% less) | ~91% less | ~93% faster | ~43% faster |
 
-
+Across this full run, frozen is faster on **25/27** search cases. Divina `inferno` (exact, paired p50): mutable 15.0 µs → frozen 13.4 µs (**-2 µs**, ratio 0.78).
 
-
-
-| | MiniSearch | `@yoch/frozenminisearch` |
-|---|------------------------|---------------------------|
-| **Optimizes for** | Live mutations, flexibility | **Retained RAM**, snapshot size, cold load |
-| **Sweet spot** | Documents change at runtime | Fixed corpus, many replicas, tight memory budget |
-| **Production path** | `addAll` → `toJSON` | `fromDocuments` / `fromMiniSearch` → `saveBinarySync` → `loadBinarySync` |
-| **Typical trade-off** | Higher RAM, JSON snapshots | One-time freeze, then compact binary |
-
-<details>
-<summary><strong>How to read these numbers (limits & protocol)</strong></summary>
-
-- **Captured:** 2026-06-07 · commit `9f32207` · Node v24.16.0 · minisearch **7.2.0** · **3** run(s)/scenario · protocol **v2** (hrtime-paired, batch target 3 ms).
-- ¹ **Index RAM** — `measureHeap` with `--expose-gc`, one index alive. V8 overhead is extra; treat as **trend**, not accounting. Sporadic outliers happen (e.g. index-only Divina).
-- ² **Disk** — `JSON.stringify(mutable)` vs `saveBinarySync`.
-- ³ **Cold load** — median wall time to searchable index after read from disk format.
-- ⁴ **Search p50** — paired mutable/frozen samples per iteration; sub-0.1 ms baselines reported in **µs** in full reports. Fast queries use **50** iterations, others **20**.
-- **Not shown:** mutable `add`/`remove` (frozen is read-only by design). Freeze time is offline — see full suite for build metrics.
-- **Reproduce:** `npm run bench -- run --profile=vs-reference` · **Update this block:** `npm run bench:readme` after refreshing `benchmarks/baselines/reference.json`.
-
-</details>
+Numbers are from `benchmarks/baselines/reference.json`, captured 2026-06-18 on Node v24.16.0, 3 runs per scenario. Heap is measured with one index alive and should be read as a trend, not exact accounting.
 <!-- vs-reference:end -->
 
 ---
@@ -75,8 +51,6 @@ Decomposition (Divina exact): L0 lookup ~300 ns frozen, L1 `executeQuery` ~8.3
 npm install @yoch/frozenminisearch
 ```
 
-**Build from documents:**
-
 ```javascript
 import FrozenMiniSearch from '@yoch/frozenminisearch'
 
@@ -90,7 +64,7 @@ const buf = index.saveBinarySync()
 const loaded = FrozenMiniSearch.loadBinarySync(buf, options)
 ```
 
-
+For larger imports, use the incremental builder:
 
 ```javascript
 import FrozenMiniSearch, {
@@ -107,21 +81,11 @@ ESM and CommonJS are both supported (`main` → CJS, `module` → ESM).
 
 ---
 
-##
-
-For **fixed corpora** (build once, serve read-only), treat this package as a drop-in replacement for MiniSearch on the serving path — same queries, far less memory per replica.
-
-**Change only:**
-
-| What | Before | After |
-|------|--------|-------|
-| Package | `minisearch` | `@yoch/frozenminisearch` |
-| Construction | `new MiniSearch(opts).addAll(docs)` | `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` |
-| JSON snapshot | `toJSON()` / `loadJSON()` wire format | `FrozenMiniSearch.toJSON()` / `fromJson(json, opts)` or `fromMiniSearchSnapshot(obj)` — no runtime dependency on `minisearch` |
+## Migration
 
-
+For fixed corpora, most serving code can stay the same. Change how the index is built or loaded, then keep calling `search`, `autoSuggest`, `has`, and `getStoredFields`.
 
-
+Default and named imports both work:
 
 ```javascript
 // ESM
@@ -133,38 +97,28 @@ const FrozenMiniSearch = require('@yoch/frozenminisearch')
 const { FrozenMiniSearch } = require('@yoch/frozenminisearch')
 ```
 
-
+Build directly:
 
-
+```javascript
+import FrozenMiniSearch from '@yoch/frozenminisearch'
 
-
+const frozen = FrozenMiniSearch.fromDocuments(documents, options)
+```
 
-
+Or freeze an existing MiniSearch index:
 
 ```javascript
-import MiniSearch from 'minisearch'
+import MiniSearch from 'minisearch'
 import FrozenMiniSearch from '@yoch/frozenminisearch'
 
 const mutable = new MiniSearch(options)
 mutable.addAll(documents)
 
-// Option A — live instance
 const frozen = FrozenMiniSearch.fromMiniSearch(mutable, options)
-
-// Option B — serialized index (offline / ETL)
-const json = JSON.stringify(mutable)
-const frozen2 = FrozenMiniSearch.fromJson(json, options)
+const fromJson = FrozenMiniSearch.fromJson(JSON.stringify(mutable), options)
 ```
 
-
-
-### From MiniSearch (mutable → frozen)
-
-| Before (mutable) | After (`@yoch/frozenminisearch`) |
-|------------------|----------------------------------|
-| `new MiniSearch(opts).addAll(docs)` then serve | `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` |
-| MiniSearch JSON snapshot | `FrozenMiniSearch.fromJson(json)` or `fromMiniSearchSnapshot(obj)` |
-| `import MiniSearch from 'minisearch'` | `import FrozenMiniSearch from '@yoch/frozenminisearch'` (+ `minisearch` only if you still build mutable indexes) |
+MiniSearch is only needed if you still build mutable indexes. Frozen instances do not support live `add`, `remove`, or `discard`.
 
 ---
 
@@ -175,13 +129,13 @@ const frozen2 = FrozenMiniSearch.fromJson(json, options)
 - `has(id)`, `getStoredFields(id)`
 - `saveBinarySync` / `loadBinarySync` / async variants
 
-
+Custom `tokenize` and `processTerm` functions are not stored in snapshots; pass the same functions again when loading.
 
 ---
 
 ## Binary snapshots
 
-
+Binary snapshots are the preferred production format.
 
 ```javascript
 const buf = index.saveBinarySync()
@@ -189,12 +143,8 @@ const loaded = FrozenMiniSearch.loadBinarySync(buf, {}) // field names embedded
 ```
 
 - **Node ≥ 20**
--
-
-- `zstd` on Node 22.15+ when it strictly shrinks the payload
-- otherwise `zlib` on Node 20–22.14 when it strictly shrinks the payload
-- otherwise `raw` (uncompressed)
-- Explicit snapshot compression always writes the chosen codec, even when compression would not shrink the payload (useful for portability):
+- `compression: 'auto'` chooses `zstd` on Node 22.15+, otherwise `zlib`, and falls back to raw when compression does not help.
+- Use explicit compression when you need a portable artifact:
 
 ```javascript
 const portable = index.saveBinarySync({ compression: 'zlib' })
@@ -202,11 +152,7 @@ const uncompressed = index.saveBinarySync({ compression: 'raw' })
 const bestRatio = index.saveBinarySync({ compression: 'zstd' }) // Node 22.15+
 ```
 
-
-- `raw` and `zlib` snapshots load on Node 20+
-- `zstd` snapshots require Node 22.15+
-- Snapshots produced by this package version are forward-compatible; re-build from MiniSearch JSON if an older binary fails to load
-- `tokenize` / `processTerm` are not stored — pass the same functions at load when customized
+Raw and zlib snapshots load on Node 20+. zstd snapshots require Node 22.15+.
 
 ---
 
@@ -216,8 +162,8 @@ See [benchmarks/README.md](benchmarks/README.md).
 
 ```bash
 npm run bench -- run --profile=vs-reference # compare frozen vs minisearch
-npm run bench:diff
-npm run bench:readme
+npm run bench:diff # regression vs reference.json
+npm run bench:readme -- --from=benchmarks/baselines/latest.json
 ```
 
 ---
@@ -231,9 +177,7 @@ yarn build
 node scripts/verify-npm-pack.cjs
 ```
 
-Parity tests
-
-Design notes (freq adaptive, AND gating): [dev/docs/README.md](dev/docs/README.md).
+Parity tests compare against MiniSearch 7. Longer notes and performance work live under [dev/docs/README.md](dev/docs/README.md) and [benchmarks/README.md](benchmarks/README.md).
 
 ---
 
@@ -242,6 +186,6 @@ Design notes (freq adaptive, AND gating): [dev/docs/README.md](dev/docs/README.m
 See [CHANGELOG.md](./CHANGELOG.md).
 
 - **MiniSearch** — [Luca Ongaro](https://github.com/lucaong/minisearch) (MIT)
-- **@yoch/frozenminisearch** — memory-optimized frozen indexes
+- **@yoch/frozenminisearch** — memory-optimized frozen indexes and compact binary snapshots
 
 Upstream docs: [MiniSearch](https://lucaong.github.io/minisearch/)
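Reviewer note on the `compression: 'auto'` bullet in the "Binary snapshots" hunk of the README: the selection rule it describes can be modelled as a small pure function. This is a sketch of the documented behavior only. `versionAtLeast` and `chooseAutoCodec` are hypothetical names, the caller is assumed to supply the size the preferred codec would produce, and nothing here claims to match how the package implements the choice.

```javascript
// True when a dotted version such as "v22.15.0" is >= the given minimum.
function versionAtLeast(version, minimum) {
  const actual = version.replace(/^v/, '').split('.').map(Number)
  const wanted = minimum.split('.').map(Number)
  for (let i = 0; i < wanted.length; i++) {
    const part = actual[i] ?? 0
    if (part !== wanted[i]) return part > wanted[i]
  }
  return true
}

// Hypothetical model of `compression: 'auto'` as described in the README:
// prefer zstd on Node 22.15+, otherwise zlib, and fall back to raw when
// the compressed payload is not strictly smaller than the raw one.
function chooseAutoCodec(nodeVersion, rawSize, compressedSize) {
  const preferred = versionAtLeast(nodeVersion, '22.15.0') ? 'zstd' : 'zlib'
  return compressedSize < rawSize ? preferred : 'raw'
}
```

In a real process the first argument would typically be `process.version`. The rule also shows why the README recommends an explicit codec for portable artifacts: a snapshot written with `auto` on Node 22.15+ may be zstd-compressed and then fail to load on Node 20.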