@yoch/frozenminisearch 1.2.4 → 1.3.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,32 @@
 
  ## Unreleased
 
+ ## v1.3.0 — `@yoch/frozenminisearch`
+
+ Minor release: browser entry (`@yoch/frozenminisearch/browser`), portable default compression (`auto` → zlib), async browser MSv5 binary snapshots, Node ↔ browser zlib interoperability, and indexing parity fixes for custom tokenizers.
+
+ ### Added
+
+ - **Browser entry** — `@yoch/frozenminisearch/browser` for read-only search and index build in the browser (`fromDocuments`, `fromJson`, `search`, `autoSuggest`, incremental builder).
+ - **Browser binary I/O** — `saveBinaryAsync` / `loadBinaryAsync` on `Uint8Array` (`raw`, `zlib`, `auto`). No sync binary APIs and no zstd in the browser build.
+ - **Wire portability layer** — `binaryBytes`, `binaryWireIo`, `fieldLengthMatrixWire`, and browser compression via native `CompressionStream` / `DecompressionStream`.
+ - **Indexing parity gate** — `dev/parity/indexing-parity.test.js` compares `MiniSearch.addAll` vs `FrozenMiniSearch.fromDocuments` (index fingerprint + scores) across default, camelCase, `processTerm`, `stringifyField`, and Vocs-style profiles; builder, `fromJson`, and binary round-trips included.
+
+ ### Fixed
+
+ - **Custom tokenizer indexing** — `isDefaultTokenize` now requires reference equality with the default tokenizer; split-equivalent wrappers no longer take the default fast path (fixes missing camelCase terms such as `create` from `createUser`).
+ - **Field length with `processTerm`** — `fromDocuments` counts unique raw tokens per field (MiniSearch semantics) instead of distinct indexed terms after filtering.
+
+ ### Changed
+
+ - **`compression: 'auto'`** — always tries zlib (then raw if it does not shrink). zstd remains opt-in via `compression: 'zstd'` on Node 22.15+; existing zstd snapshots still load on Node.
+
+ ### Improved
+
+ - **CI** — cross-runtime smoke tests: Node zlib save → browser load and browser zlib save → Node load.
+ - **Browser bundle size** — production `dist/browser/index.js` is ~67.6 KB raw and ~20.9 KB gzip (native compression streams, no `fflate`).
+ - **`stringifyField` fast path** — skip redundant `toString()` when the field value is already a string and the default stringifier is in use.
+
  ## v1.2.4 — `@yoch/frozenminisearch`
 
  Patch release: faster frozen search and autoSuggest finalization, simplified AND gate heuristics, and small public exports for advanced callers. No MSv5 wire-format changes.
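
Reviewer note on the "Custom tokenizer indexing" fix in the v1.3.0 entry: the sketch below illustrates why a reference-equality check is safer than a behavior-based one. Everything in it is hypothetical. `defaultTokenize`, `probeLooksDefault`, `camelCaseTokenize`, and `tokenizeField` are stand-ins written for this note; the diff does not show the library's real tokenizer or its previous check, so treat this as one plausible reading of the changelog line, not the package's code.

```javascript
// Hypothetical stand-in for the library's default tokenizer (an assumption).
const defaultTokenize = (text) => text.split(/[^\p{L}\p{N}]+/u).filter(Boolean)

// Hypothetical behavior-based probe: treats any tokenizer that splits a plain
// sample the same way as "the default". This is the kind of check that a
// wrapper can pass by accident.
const probeLooksDefault = (tokenize) =>
  tokenize('hello world').join('|') === defaultTokenize('hello world').join('|')

// Reference equality, as the changelog describes: only the exact default
// function object qualifies for the fast path.
const isDefaultTokenize = (tokenize) => tokenize === defaultTokenize

// A wrapper that matches the default on plain text but also emits camelCase parts.
const camelCaseTokenize = (text) =>
  defaultTokenize(text).flatMap((token) => {
    const parts = token.split(/(?<=[a-z0-9])(?=[A-Z])/)
    return parts.length > 1 ? [token, ...parts] : [token]
  })

// Index-time tokenization: take the fast path only when the check says so.
function tokenizeField(text, tokenize, check) {
  const tokens = check(tokenize) ? defaultTokenize(text) : tokenize(text)
  return tokens.map((t) => t.toLowerCase())
}

// With the probe, the wrapper is mistaken for the default and `create` is lost.
const viaProbe = tokenizeField('createUser', camelCaseTokenize, probeLooksDefault)
// With reference equality, the wrapper runs and `create` is indexed.
const viaReference = tokenizeField('createUser', camelCaseTokenize, isDefaultTokenize)

console.log(viaProbe)
console.log(viaReference)
```

Under these assumptions, the probe-based path indexes only the whole token, while the reference-equality path also indexes the camelCase parts, which matches the `create` from `createUser` symptom named in the changelog.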
package/README.md CHANGED
@@ -6,7 +6,7 @@
 
  [API documentation](https://yoch.github.io/frozenminisearch/)
 
- **Memory-optimized, read-only full-text search for Node.js.** FrozenMiniSearch keeps the serving API close to [MiniSearch](https://github.com/lucaong/minisearch) while using compact, immutable indexes for fixed corpora.
+ **Memory-optimized, read-only full-text search for Node.js and browsers.** FrozenMiniSearch keeps the serving API close to [MiniSearch](https://github.com/lucaong/minisearch) while using compact, immutable indexes for fixed corpora.
 
  Use it when your documents are built offline, shipped to production, and queried many times. In that shape, frozen indexes use **~98-99% less index RAM** in the main benchmark set, save to compact binary snapshots, and load faster than MiniSearch JSON.
 
@@ -32,15 +32,15 @@ Same corpora, same BM25-style queries, MiniSearch 7.2.0 as the reference.
 
  | Scenario | Docs | Index RAM | Binary size | Load time | Search p50 |
  |----------|-----:|-----------|------------:|----------:|-----------:|
- | Divina, with stored text | 14,097 | 0.3 vs 16.0 MB (~98% less) | ~73% less | ~69% faster | ~20% faster |
- | Divina, index only | 14,097 | 0.2 vs 14.9 MB (~99% less) | ~77% less | ~84% faster | ~19% faster |
- | High-frequency terms | 10,000 | 4.4 vs 7.4 MB (~40% less) | ~94% less | ~89% faster | ~43% faster |
- | Dense numeric ids | 100,000 | 0.9 vs 91.3 MB (~99% less) | ~88% less | ~90% faster | ~33% faster |
- | Uint16 doc id boundary | 65,535 | 0.6 vs 58.6 MB (~99% less) | ~91% less | ~94% faster | ~59% faster |
+ | Divina, with stored text | 14,097 | 0.3 vs 16.0 MB (~98% less) | ~71% less | ~56% faster | ~21% faster |
+ | Divina, index only | 14,097 | 0.2 vs 14.9 MB (~99% less) | ~74% less | ~80% faster | ~24% faster |
+ | High-frequency terms | 10,000 | 4.4 vs 7.4 MB (~41% less) | ~92% less | ~85% faster | ~41% faster |
+ | Dense numeric ids | 100,000 | 0.9 vs 91.3 MB (~99% less) | ~73% less | ~87% faster | ~33% faster |
+ | Uint16 doc id boundary | 65,535 | 0.6 vs 58.6 MB (~99% less) | ~77% less | ~91% faster | ~53% faster |
 
- Across this full run, frozen is faster on **26/27** search cases. Divina `inferno` (exact, paired p50): mutable 15.0 µs → frozen 11.3 µs (**-4 µs**, ratio 0.74).
+ Across this full run, frozen is faster on **25/27** search cases. Divina `inferno` (exact, paired p50): mutable 18.1 µs → frozen 11.4 µs (**-7 µs**, ratio 0.72).
 
- Numbers are from `benchmarks/baselines/reference.json`, captured 2026-06-20 on Node v24.16.0, 3 runs per scenario. Heap is measured with one index alive and should be read as a trend, not exact accounting.
+ Numbers are from `benchmarks/baselines/reference.json`, captured 2026-06-21 on Node v24.16.0, 3 runs per scenario. Heap is measured with one index alive and should be read as a trend, not exact accounting.
  <!-- vs-reference:end -->
 
  ---
@@ -77,7 +77,20 @@ for (const doc of rows) builder.add(doc)
  const index = freezeFrozenIndexBuilder(builder)
  ```
 
- ESM and CommonJS are both supported (`main` → CJS, `module` → ESM).
+ ESM and CommonJS are both supported on Node (`main` → CJS, `module` → ESM). For browsers and bundlers, use the dedicated browser entry (search, build, and **async** binary I/O):
+
+ ```javascript
+ import FrozenMiniSearch from '@yoch/frozenminisearch/browser'
+
+ const index = FrozenMiniSearch.fromDocuments(documents, options)
+ index.search('ishmael', { prefix: true })
+
+ // Load a zlib snapshot from CDN (Uint8Array)
+ const buf = new Uint8Array(await (await fetch('/index.frozen')).arrayBuffer())
+ const loaded = await FrozenMiniSearch.loadBinaryAsync(buf, options)
+ ```
+
+ See [examples/plain_js_frozen/](examples/plain_js_frozen/) for a plain-JS demo (`yarn build` first).
 
  ---
 
@@ -127,15 +140,15 @@ MiniSearch is only needed if you still build mutable indexes. Frozen instances d
  - `search(query, searchOptions?)` — string, wildcard (`FrozenMiniSearch.wildcard`), or nested `QueryCombination`
  - `autoSuggest(queryString, options?)`
  - `has(id)`, `getStoredFields(id)`
- - `saveBinarySync` / `loadBinarySync` / async variants
+ - `saveBinarySync` / `loadBinarySync` on **Node** (async variants too); browser entry supports **async** binary only (`Uint8Array`, `raw` / `zlib` / `auto`)
 
  Custom `tokenize` and `processTerm` functions are not stored in snapshots; pass the same functions again when loading.
 
  ---
 
- ## Binary snapshots
+ ## Binary snapshots (Node)
 
- Binary snapshots are the preferred production format.
+ Binary snapshots are the preferred production format on Node.js.
 
  ```javascript
  const buf = index.saveBinarySync()
@@ -143,16 +156,16 @@ const loaded = FrozenMiniSearch.loadBinarySync(buf, {}) // field names embedded
  ```
 
  - **Node ≥ 20**
- - `compression: 'auto'` chooses `zstd` on Node 22.15+, otherwise `zlib`, and falls back to raw when compression does not help.
- - Use explicit compression when you need a portable artifact:
+ - `compression: 'auto'` uses **zlib** when it shrinks the payload (portable on Node 20+ and in the browser build); falls back to raw when compression does not help.
+ - Use explicit compression when you need a specific artifact:
 
  ```javascript
- const portable = index.saveBinarySync({ compression: 'zlib' })
+ const portable = index.saveBinarySync({ compression: 'zlib' }) // CDN / browser
  const uncompressed = index.saveBinarySync({ compression: 'raw' })
- const bestRatio = index.saveBinarySync({ compression: 'zstd' }) // Node 22.15+
+ const bestRatio = index.saveBinarySync({ compression: 'zstd' }) // Node 22.15+ only
  ```
 
- Raw and zlib snapshots load on Node 20+. zstd snapshots require Node 22.15+.
+ Raw snapshots load in the browser without native compression APIs. zlib snapshots in the browser require `CompressionStream` / `DecompressionStream`. Browser binary I/O is async because it uses native browser stream APIs, but it still materializes the full compressed/decompressed payload in memory. zstd snapshots require Node 22.15+ (read/write on Node; not supported in the browser build).
 
  ---
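
Reviewer note on the new `compression: 'auto'` behavior: the README now says `auto` tries zlib and falls back to raw when compression does not shrink the payload. A minimal sketch of that decision on Node, using the built-in `node:zlib` module, might look like the following. `encodeAuto` and `decode` are hypothetical helpers written for this note, and returning the encoding as an object field is an assumption; the actual MSv5 wire format and how it records the encoding are not shown in this diff.

```javascript
const { deflateSync, inflateSync } = require('node:zlib')

// Hypothetical helper: try zlib, keep it only if it is strictly smaller.
function encodeAuto(payload) {
  const compressed = deflateSync(payload)
  return compressed.length < payload.length
    ? { encoding: 'zlib', bytes: new Uint8Array(compressed) }
    : { encoding: 'raw', bytes: payload }
}

// Hypothetical helper: reverse whichever encoding was chosen.
function decode({ encoding, bytes }) {
  return encoding === 'zlib' ? new Uint8Array(inflateSync(bytes)) : bytes
}

// A highly repetitive payload compresses well, so zlib is kept.
const repetitive = new Uint8Array(4096).fill(7)
// A three-byte payload grows under zlib framing, so raw is kept.
const tiny = Uint8Array.of(1, 2, 3)

const a = encodeAuto(repetitive)
const b = encodeAuto(tiny)
console.log(a.encoding, a.bytes.length)
console.log(b.encoding, b.bytes.length)
```

`deflateSync` emits zlib-framed data, which is also what the web platform's `DecompressionStream('deflate')` expects, so a payload produced this way is in principle readable in a browser. Whether the package uses that exact stream format is not confirmed by the diff.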