@yoch/frozenminisearch 1.0.1 → 1.1.0
- package/CHANGELOG.md +27 -5
- package/README.md +38 -20
- package/dist/cjs/index.cjs +545 -275
- package/dist/es/index.d.ts +32 -8
- package/dist/es/index.js +545 -275
- package/package.json +10 -2
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,28 @@

 ## Unreleased

+## v1.1.0 — `@yoch/frozenminisearch`
+
+Minor release: MiniSearch JSON wire export and clearer JSON import API. MSv5 binary format unchanged.
+
+### Added
+
+- **`toJSON()`** — export MiniSearch wire snapshots (`serializationVersion: 2`); import via `fromJson` / `fromMiniSearchSnapshot`. Production persistence remains `saveBinarySync`.
+
+### Breaking
+
+- **`fromMiniSearchJson` → `fromJson`** — rename for clearer semantics (JSON import vs binary load). Update call sites: `FrozenMiniSearch.fromMiniSearchJson(json)` → `FrozenMiniSearch.fromJson(json)`.
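Call sites that must run against both 1.0.x and 1.1.0 during an upgrade could bridge the rename with a small lookup. The following is only a sketch: `resolveJsonImporter` is a hypothetical helper that is not part of the package, the stub classes stand in for the real `FrozenMiniSearch` export, and forwarding `options` to the 1.0.x method is an assumption.

```javascript
// Hypothetical helper (not part of @yoch/frozenminisearch): pick whichever
// JSON import method the installed version exposes.
function resolveJsonImporter(FrozenClass) {
  if (typeof FrozenClass.fromJson === 'function') {
    // Name used from 1.1.0 onward.
    return (json, options) => FrozenClass.fromJson(json, options)
  }
  if (typeof FrozenClass.fromMiniSearchJson === 'function') {
    // Name used in 1.0.x; passing options here is an assumption.
    return (json, options) => FrozenClass.fromMiniSearchJson(json, options)
  }
  throw new TypeError('No JSON import method found on the given class')
}

// Stand-ins for the two API generations, used only to exercise the helper.
class StubV10 {
  static fromMiniSearchJson(json) { return { via: 'fromMiniSearchJson', json } }
}
class StubV11 {
  static fromJson(json) { return { via: 'fromJson', json } }
}

const viaOld = resolveJsonImporter(StubV10)('{}').via
const viaNew = resolveJsonImporter(StubV11)('{}').via
console.log(viaOld, viaNew)
```

Once every deployment is on 1.1.0 or later, the helper can be dropped in favour of a direct `FrozenMiniSearch.fromJson(json)` call.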
+
+## v1.0.2 — `@yoch/frozenminisearch`
+
+Patch release: lower retained heap when `storeFields` has one field. No API or MSv5 wire-format changes.
+
+### Improved
+
+- **Single-field `storeFields` at rest** — values live in a dense column instead of one `Record` per document (~75% less retained heap on Divina with `storeFields: ['txt']`; ~1.0 → ~0.3 MB).
+- **Binary save/load** — encode and decode skip intermediate row arrays when the in-memory layout or load `storeFields` hint allows direct wire paths (same bytes on disk).
+- **Posting slice lookups** — scoring flyweight reuses a scratch buffer instead of allocating `{ offset, length }` per lookup.
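The scratch-buffer idea in the last bullet can be illustrated in isolation. This is a minimal sketch under assumed names and an assumed flat layout (`postings`, `offsets`, `lengths`); it does not reproduce the package's internals, only the difference between allocating a result object per lookup and overwriting one reusable buffer.

```javascript
// Assumed layout for illustration: slice i occupies
// [offsets[i], offsets[i] + lengths[i]) inside the flat `postings` array.
const postings = Uint32Array.from([3, 7, 9, 1, 4, 2, 5, 8])
const offsets = Uint32Array.from([0, 3, 5])
const lengths = Uint32Array.from([3, 2, 3])

// Allocating variant: one fresh object per lookup.
function lookupAlloc(i) {
  return { offset: offsets[i], length: lengths[i] }
}

// Scratch variant: a single two-slot buffer, overwritten on every call.
const scratch = new Uint32Array(2)
function lookupScratch(i) {
  scratch[0] = offsets[i]
  scratch[1] = lengths[i]
  return scratch // contents are valid only until the next lookupScratch call
}

// Consumer that reads the slice immediately, so reuse is safe.
function sumSlice(i) {
  const s = lookupScratch(i)
  const end = s[0] + s[1]
  let total = 0
  for (let k = s[0]; k < end; k++) total += postings[k]
  return total
}

const sums = [sumSlice(0), sumSlice(1), sumSlice(2)]
console.log(sums)
```

The trade-off is aliasing: a caller that holds on to the returned buffer across lookups will see it change, so this pattern fits hot loops that consume the slice right away.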
+
 ## v1.0.1 — `@yoch/frozenminisearch`

 Patch release: lower build-time peak memory and migration ergonomics. No API or wire-format changes.
@@ -13,7 +35,7 @@ Patch release: lower build-time peak memory and migration ergonomics. No API or

 ### Fixed

-- **Default tokenizer parity** — leading delimiter produces an empty token (e.g. `::a` → `["", "a"]`), matching
+- **Default tokenizer parity** — leading delimiter produces an empty token (e.g. `::a` → `["", "a"]`), matching MiniSearch `split` behaviour.
 - **Named export** — `FrozenMiniSearch` is exported again alongside the default export (ESM and CJS).
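The empty leading token in the tokenizer entry follows from how `String.prototype.split` treats a match at index 0. The sketch below uses a simplified delimiter pattern chosen for illustration; it is an assumption and may differ from the exact expression used by MiniSearch or this package.

```javascript
// Simplified delimiter: one or more whitespace or Unicode punctuation characters.
// (Assumed pattern for illustration only.)
const DELIMITER = /[\s\p{P}]+/u

function tokenize(text) {
  return text.split(DELIMITER)
}

// A delimiter run at the very start yields an empty first token.
const leading = tokenize('::a')

// No delimiter at either end: no empty tokens.
const plain = tokenize('hello, world')

console.log(leading, plain)
```

Code that post-processes tokens, such as a custom `processTerm`, may therefore receive an empty string and should be prepared to discard it.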

 ## v1.0.0 — `@yoch/frozenminisearch`
@@ -22,7 +44,7 @@ First stable release on npm. Frozen-only read-only search for Node.js.

 ### Breaking

-- **Binary snapshots** — `loadBinarySync` / `loadBinaryAsync` read only the current frozen binary format; re-build from
+- **Binary snapshots** — `loadBinarySync` / `loadBinaryAsync` read only the current frozen binary format; re-build from MiniSearch JSON if an older snapshot fails to load.
 - **Removed `saveBinary()` / `loadBinary()`** — use `saveBinarySync` / `saveBinaryAsync` and `loadBinarySync` / `loadBinaryAsync`.

 ## v1.0.0-beta.0 — `@yoch/frozenminisearch`
@@ -32,16 +54,16 @@ New standalone package (frozen-only) for read-only serving workloads.
 ### Added

 - **`FrozenMiniSearch`** as the default export — `fromDocuments`, builder, `saveBinarySync` / `loadBinarySync`
-- **Migration loaders** — `fromMiniSearch`, `
+- **Migration loaders** — `fromMiniSearch`, `fromJson`, `fromMiniSearchSnapshot` (MiniSearch JSON wire format)
 - **Modular benchmarks** — `npm run bench` with profiles `vs-reference`, `regression`, `dev`
 - **Parity suite** — `dev/parity/` vs `minisearch` npm (functional invariants)

 ### Removed from published API

 - Mutable `MiniSearch` class and `freeze()` on the fork
-- `freezeFromMiniSearch` (use `
+- `freezeFromMiniSearch` (use `fromJson`)
 - Read-only mutation stubs (`add`, `remove`, …)

 ### Migration

-- `new MiniSearch(opts).addAll(docs)`
+- `new MiniSearch(opts).addAll(docs)` → `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` — see README
package/README.md
CHANGED
@@ -1,21 +1,36 @@
-#
+# FrozenMiniSearch

-
+[](https://www.npmjs.com/package/@yoch/frozenminisearch)
+[](https://codecov.io/gh/yoch/frozenminisearch)
+[](https://github.com/yoch/frozenminisearch/actions/workflows/main.yml)
+[](https://socket.dev/npm/package/%40yoch%2Ffrozenminisearch)

-
+**Memory-optimized, read-only full-text search for Node.js** — the same BM25, prefix/fuzzy, and `autoSuggest` API as [MiniSearch](https://github.com/lucaong/minisearch), with **up to ~98% less index RAM** on real corpora and compact binary snapshots you ship instead of JSON.

-**
+**Why it exists:** [MiniSearch](https://github.com/lucaong/minisearch) optimizes for a mutable in-memory index. FrozenMiniSearch optimizes for **retained heap, disk footprint, and cold load** once the corpus is fixed — packed radix postings, columnar `storeFields`, typed-array layouts, and MSv5 binary wire format instead of per-document JS objects.
+
+**Design goal:** migrate with minimal code change — package name and index construction only; serving code stays the same. Build with `fromDocuments`, the incremental builder, or `fromJson`; no mutable `MiniSearch` class is published here.

 ---

 ## Why frozen instead of MiniSearch?

-**
+Choose **mutable MiniSearch** when documents change at runtime (`add`, `remove`, `discard`). Choose **frozen** when memory and snapshot size matter: fixed corpus, deploy from binary, many replicas loading the same index. Search semantics stay the same — BM25, prefix/fuzzy, `autoSuggest`, wildcard, `AND` / `OR` / `AND_NOT` — with parity vs MiniSearch 7 validated in `dev/parity/` (scores `toBeCloseTo` precision 6).
+
+### Memory-first design
+
+| Technique | What it saves |
+|-----------|---------------|
+| **Packed radix tree + flat postings** | Term dictionary and posting lists without per-entry JS wrappers |
+| **Columnar `storeFields`** | One dense column per field instead of a `Record` per document (~75% less heap for a single stored field) |
+| **MSv5 binary snapshots** | ~73–94% smaller on disk than MiniSearch JSON; faster cold load |
+| **Read-only freeze** | No mutation bookkeeping — layouts sized for serve-time, not incremental edit |
+| **Incremental builder** | Typed-array accumulators during build; lower peak heap than materializing `number[][]` per term |
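The columnar `storeFields` row in the table above contrasts two in-memory layouts. A minimal sketch of that contrast follows; the helper names `toColumns` and `materializeRecord` are hypothetical and the code is not the package's internal representation.

```javascript
// Sample documents, used only for illustration.
const docs = [
  { id: 1, txt: 'first stored value' },
  { id: 2, txt: 'second stored value' },
  { id: 3, txt: 'third stored value' },
]

// Row layout: one small object (a "Record") per document.
const rows = docs.map((d) => ({ txt: d.txt }))

// Columnar layout: one dense array per stored field, indexed by the
// document's internal position. Hypothetical helper.
function toColumns(documents, storeFields) {
  const columns = {}
  for (const field of storeFields) {
    columns[field] = documents.map((d) => d[field])
  }
  return columns
}

// Rebuild a per-document record only when a caller asks for it.
// Hypothetical helper.
function materializeRecord(columns, position) {
  const record = {}
  for (const field of Object.keys(columns)) {
    record[field] = columns[field][position]
  }
  return record
}

const columns = toColumns(docs, ['txt'])
const second = materializeRecord(columns, 1)
console.log(second)
```

With a single stored field the columnar form holds one array in total rather than one wrapper object per document, which is the kind of saving the table describes.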

 <!-- vs-reference:start — npm run bench:readme -->
-### Measured vs
+### Measured vs MiniSearch (reference baseline)

-Same BM25 queries on identical corpora. **
+Same BM25 queries on identical corpora. **Index RAM is the headline metric** — frozen uses a fraction of mutable heap on every scenario below; disk and cold load follow from the compact binary format.

 | Scenario | Docs | Index RAM¹ | Disk (binary vs JSON)² | Cold load³ | Search p50⁴ |
 |----------|-----:|------------|------------------------:|-----------:|------------:|
@@ -29,9 +44,10 @@ Same BM25 queries on identical corpora. **Frozen wins on what we optimize for**:

 Decomposition (Divina exact): L0 lookup ~300 ns frozen, L1 `executeQuery` ~8.3 µs, L2 full `search` ~11.6 µs (finalize ≈ 3 µs).

-| |
+| | MiniSearch | `@yoch/frozenminisearch` |
 |---|------------------------|---------------------------|
-| **
+| **Optimizes for** | Live mutations, flexibility | **Retained RAM**, snapshot size, cold load |
+| **Sweet spot** | Documents change at runtime | Fixed corpus, many replicas, tight memory budget |
 | **Production path** | `addAll` → `toJSON` | `fromDocuments` / `fromMiniSearch` → `saveBinarySync` → `loadBinarySync` |
 | **Typical trade-off** | Higher RAM, JSON snapshots | One-time freeze, then compact binary |

@@ -91,17 +107,17 @@ ESM and CommonJS are both supported (`main` → CJS, `module` → ESM).

 ## Drop-in

-For **fixed corpora** (build once, serve read-only), treat this package as a drop-in replacement for
+For **fixed corpora** (build once, serve read-only), treat this package as a drop-in replacement for MiniSearch on the serving path — same queries, far less memory per replica.

 **Change only:**

 | What | Before | After |
 |------|--------|-------|
-| Package |
+| Package | `minisearch` | `@yoch/frozenminisearch` |
 | Construction | `new MiniSearch(opts).addAll(docs)` | `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` |
-| JSON snapshot | `
+| JSON snapshot | `toJSON()` / `loadJSON()` wire format | `FrozenMiniSearch.toJSON()` / `fromJson(json, opts)` or `fromMiniSearchSnapshot(obj)` — no runtime dependency on `minisearch` |

-**Keep unchanged** after load: `search`, `autoSuggest`, `has`, `getStoredFields`, query options (`prefix`, `fuzzy`, `AND` / `OR` / `AND_NOT`, filters, boosts). Parity vs
+**Keep unchanged** after load: `search`, `autoSuggest`, `has`, `getStoredFields`, query options (`prefix`, `fuzzy`, `AND` / `OR` / `AND_NOT`, filters, boosts). Parity vs MiniSearch 7 is enforced in `dev/parity/`.

 **Imports** — default and named both work (ESM and CJS):

@@ -121,7 +137,7 @@ const { FrozenMiniSearch } = require('@yoch/frozenminisearch')

 ## Migration

-### From
+### From MiniSearch JSON

 ```javascript
 import MiniSearch from 'minisearch' // build-time only
@@ -135,18 +151,18 @@ const frozen = FrozenMiniSearch.fromMiniSearch(mutable, options)

 // Option B — serialized index (offline / ETL)
 const json = JSON.stringify(mutable)
-const frozen2 = FrozenMiniSearch.
+const frozen2 = FrozenMiniSearch.fromJson(json, options)
 ```

 `options.fields` must match the indexed fields in the snapshot when provided.

-### From
+### From MiniSearch (mutable → frozen)

 | Before (mutable) | After (`@yoch/frozenminisearch`) |
 |------------------|----------------------------------|
 | `new MiniSearch(opts).addAll(docs)` then serve | `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` |
-|
-| `import MiniSearch from 'minisearch'` | `import FrozenMiniSearch from '@yoch/frozenminisearch'` (+
+| MiniSearch JSON snapshot | `FrozenMiniSearch.fromJson(json)` or `fromMiniSearchSnapshot(obj)` |
+| `import MiniSearch from 'minisearch'` | `import FrozenMiniSearch from '@yoch/frozenminisearch'` (+ `minisearch` only if you still build mutable indexes) |

 ---

@@ -163,13 +179,15 @@ Indexing is **not** available on a frozen instance — use `fromDocuments`, the

 ## Binary snapshots

+The primary way to **persist and ship a memory-compact index** — smaller than MiniSearch JSON and faster to load into a low-RAM serving process.
+
 ```javascript
 const buf = index.saveBinarySync()
 const loaded = FrozenMiniSearch.loadBinarySync(buf, {}) // field names embedded in snapshot
 ```

 - **Node ≥ 22.15.0** (zstd via `node:zlib`)
-- Snapshots produced by this package version are forward-compatible; re-build from
+- Snapshots produced by this package version are forward-compatible; re-build from MiniSearch JSON if an older binary fails to load
 - `tokenize` / `processTerm` are not stored — pass the same functions at load when customized
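The notes above imply a deployment pattern: write the snapshot bytes to a file at build time, then load them in the serving process with the same custom functions. Below is a hedged sketch of that pattern. `writeSnapshot` and `readSnapshot` are hypothetical helpers, and `StubIndex` is a stand-in so the example runs without the package. It assumes `saveBinarySync()` returns a `Buffer` or `Uint8Array` and that `loadBinarySync(buf, options)` accepts `tokenize` in its options.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Hypothetical helper: write to a temp file, then rename, so a reader
// never observes a half-written snapshot.
function writeSnapshot(index, file) {
  const tmp = `${file}.tmp`
  fs.writeFileSync(tmp, index.saveBinarySync())
  fs.renameSync(tmp, file)
}

// Hypothetical helper: pass the same custom functions used at build time,
// since they are not stored in the snapshot.
function readSnapshot(FrozenClass, file, loadOptions = {}) {
  return FrozenClass.loadBinarySync(fs.readFileSync(file), loadOptions)
}

// Stand-in for FrozenMiniSearch, used only to exercise the helpers.
class StubIndex {
  constructor(bytes, options) {
    this.bytes = bytes
    this.options = options
  }
  saveBinarySync() { return this.bytes }
  static loadBinarySync(buf, options) { return new StubIndex(buf, options) }
}

const tokenize = (text) => text.split(/\s+/)
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'snapshot-'))
const file = path.join(dir, 'index.bin')

writeSnapshot(new StubIndex(Buffer.from([1, 2, 3]), { tokenize }), file)
const restored = readSnapshot(StubIndex, file, { tokenize })
console.log(restored.bytes.length)
```

With the real package, `StubIndex` would be replaced by the `FrozenMiniSearch` export and an index built via `fromDocuments` or one of the migration loaders.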

 ---
@@ -206,6 +224,6 @@ Design notes (freq adaptive, AND gating): [dev/docs/README.md](dev/docs/README.m
 See [CHANGELOG.md](./CHANGELOG.md).

 - **MiniSearch** — [Luca Ongaro](https://github.com/lucaong/minisearch) (MIT)
-- **@yoch/frozenminisearch** — frozen indexes, packed radix tree, compact binary snapshots
+- **@yoch/frozenminisearch** — memory-optimized frozen indexes, packed radix tree, compact binary snapshots

 Upstream docs: [MiniSearch](https://lucaong.github.io/minisearch/)