@yoch/minisearch 8.0.0-beta.2

package/CHANGELOG.md ADDED
@@ -0,0 +1,516 @@
1
+ # Changelog
2
+
3
+ `MiniSearch` follows [semantic versioning](https://semver.org/spec/v2.0.0.html).
4
+
5
+ ## v8.0.0-beta.2
6
+
7
+ Consolidated beta on npm. Supersedes `8.0.0-beta.0` and `8.0.0-beta.1` (unpublished).
8
+ Includes frozen index, binary format, `fromDocuments`, English docs, and `publishConfig.tag: "beta"`.
9
+
10
+ - Documentation: README and benchmarks README in English
11
+ - Fix `FrozenMiniSearch.fromDocuments` wiring (no side-effect import required)
12
+ - Fix `processTerm` array semantics to match mutable `MiniSearch#add`
13
+ - `publishConfig.tag: "beta"` on publish; align `latest` and `beta` dist-tags to this version
14
+
15
+ ## v8.0.0-beta.1
16
+
17
+ Second beta (`@yoch/minisearch@beta`). Adds one-shot frozen index construction.
18
+
19
+ - Add `FrozenMiniSearch.fromDocuments(documents, options)` to build a read-only
20
+ index in a single pass without a mutable `MiniSearch` step (same search results
21
+ as `addAll` + `freeze()` on the same corpus and options)
22
+ - Export `buildFrozenFromDocuments` and `assembleFrozen` for custom build pipelines
23
+ - Add `indexingCore.ts` and `frozenBuild.ts`; share tokenization / `processTerm`
24
+ logic with `MiniSearch#add`
25
+ - Benchmark suite: `heapMb.buildMutableFreeze` vs `heapMb.buildFromDocuments`
26
+
27
+ ## v8.0.0-beta.0
28
+
29
+ Node.js–focused beta release (`@yoch/minisearch` on npm). Adds a read-only frozen index and
30
+ binary serialization; packaging no longer ships a browser UMD bundle.
31
+
32
+ - Add `FrozenMiniSearch`, a read-only index with compact TypedArray postings,
33
+ built via `MiniSearch#freeze()` or `FrozenMiniSearch.loadBinary()`
34
+ - Add `saveBinary()` / `FrozenMiniSearch.loadBinary()` for smaller on-disk
35
+ snapshots and faster loads than `JSON.stringify` / `loadJSON` (`MSv2` flat
36
+ postings on write; `MSv1` still readable on load; same `fields`, `tokenize`,
37
+ and `processTerm` as at index build time are still required in `options`)
38
+ - Flat in-memory postings (`allDocIds` / `allFreqs` buffers) reduce JS object
39
+ overhead; `frozenMemoryBreakdown()` for benchmark profiling
40
+ - Frozen postings clamp per-doc term frequency to 255 (Uint8). This can
41
+ slightly affect scores for very large term frequencies; benchmark scenario
42
+ "overflow frequencies" reports score drift vs. the mutable index.
43
+ - Extract shared search scoring into `scoring.ts` (BM25, AND/OR/AND_NOT,
44
+ result finalization) used by both `MiniSearch` and `FrozenMiniSearch`
45
+ - Add `SearchableMap#radixTree` for index snapshots that preserve radix tree
46
+ key order (prefix, fuzzy, and autoSuggest parity)
47
+ - Add `yarn benchmark:compare`, `benchmark:record`, `benchmark:diff`, and
48
+ versioned `benchmarks/baselines/reference.json` for regression tracking
49
+ - Add `benchmarks/loadDivinaLines.js` and extreme synthetic scenarios
50
+ - [breaking change] Drop UMD / browser build targets from Rollup; package
51
+ ships ESM and CJS for Node only (`exports.require` points at a CJS wrapper
52
+ so `require('@yoch/minisearch')` works without `.default`)
53
+ - Centralize default search, autoSuggest, and `loadBinary` options in
54
+ `searchDefaults.ts`
55
+
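The flat postings and the Uint8 clamp described above can be illustrated with a minimal sketch. The buffer names `allDocIds` and `allFreqs` come from the changelog entry; the `offsets` table, the `packPostings` and `readPostings` helpers, and the overall layout are hypothetical and are not taken from the package's actual in-memory or binary format.

```javascript
// Hypothetical sketch: pack per-term postings into flat typed arrays.
// Input: Map<term, Map<docId, termFrequency>>.
function packPostings(postingsByTerm) {
  const terms = [...postingsByTerm.keys()]
  const total = terms.reduce((n, t) => n + postingsByTerm.get(t).size, 0)

  const offsets = new Uint32Array(terms.length + 1) // start of each term's slice
  const allDocIds = new Uint32Array(total)
  const allFreqs = new Uint8Array(total)

  let cursor = 0
  terms.forEach((term, i) => {
    offsets[i] = cursor
    for (const [docId, freq] of postingsByTerm.get(term)) {
      allDocIds[cursor] = docId
      // Clamp explicitly: a plain Uint8Array store would wrap modulo 256.
      allFreqs[cursor] = Math.min(freq, 255)
      cursor++
    }
  })
  offsets[terms.length] = cursor
  return { terms, offsets, allDocIds, allFreqs }
}

// Read back the postings slice for one term.
function readPostings(packed, term) {
  const i = packed.terms.indexOf(term)
  if (i === -1) return []
  const out = []
  for (let p = packed.offsets[i]; p < packed.offsets[i + 1]; p++) {
    out.push({ docId: packed.allDocIds[p], freq: packed.allFreqs[p] })
  }
  return out
}

const packed = packPostings(new Map([
  ['whale', new Map([[1, 3], [7, 900]])],
  ['ship', new Map([[7, 2]])]
]))
const whalePostings = readPostings(packed, 'whale')
```

In this sketch the frequency of 900 is stored as 255, which is the kind of difference the "overflow frequencies" benchmark scenario is said to measure.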
56
+ ## v7.2.0
57
+
58
+ - [fix] Relax the return type of `extractField` to allow non-string values
59
+ (when a field is stored but not indexed, it can be any type)
60
+ - Add `stringifyField` option to customize how field values are turned into strings
61
+ for indexing
62
+
63
+ ## v7.1.2
64
+
65
+ - [fix] Correctly specify that MiniSearch targets ES9 (ES2018), not ES6
66
+ (ES2015), due to the use of Unicode character class escapes in the
67
+ tokenizer RegExp. Note: the README explains how to achieve ES2015
68
+ compatibility.
69
+
70
+ ## v7.1.1
71
+
72
+ - [fix] Fix ability to pass the default `filter` search option in the
73
+ constructor alongside other search options
74
+
75
+ ## v7.1.0
76
+
77
+ - Add `boostTerm` search option to apply a custom boosting factor to specific
78
+ terms in the query
79
+
80
+ ## v7.0.2
81
+
82
+ - [fix] Fix regression on tokenizer producing blank terms when multiple
83
+ contiguous spaces or punctuation characters are present in the input,
84
+ introduced in `v7.0.0`.
85
+
86
+ ## v7.0.1
87
+
88
+ - [fix] Fix type definitions directory in `package.json` (by
89
+ [@brenoepics](https://github.com/brenoepics))
90
+ - [fix] Remove redundant versions of distribution files and simplify build
91
+
92
+ ## v7.0.0
93
+
94
+ This is a major release, but the only real breaking change is that it targets
95
+ ES6 (ES2015) and later. This means that it will not work in legacy browsers,
96
+ most notably Internet Explorer 11 and earlier (by now well below 1% global
97
+ usage according to https://caniuse.com). Among other benefits, this reduces the
98
+ package size (from 8.8KB to 5.8KB minified and gzipped).
99
+
100
+ - [breaking change] Target ES6 (ES2015) and later, dropping support for
101
+ Internet Explorer 11 and earlier.
102
+ - [breaking change] Better TypeScript type of `combineWith` search option
103
+ values, catching invalid operators at compile time. Note that this is a
104
+ breaking change only if one was using unlikely weird casing for the
105
+ `combineWith` option. For example, `AND`, `and`, `And` are all still valid,
106
+ but `aNd` won't compile anymore.
107
+ - More informative error when specifying an invalid value for `combineWith`
108
+ in JavaScript (in TypeScript this would be a compile time error)
109
+ - Use the Unicode flag to simplify the tokenizer regular expression
110
+ - Add `loadJSONAsync` method, to load a serialized index asynchronously
111
+
112
+ ## v6.3.0 - 2023-11-22
113
+
114
+ - Add `queryTerms` array to the search results. This is useful to determine
115
+ which query terms were matched by each search result.
116
+
117
+ ## v6.2.0 - 2023-10-26
118
+
119
+ - Add the possibility to search for the special value `MiniSearch.wildcard` to
120
+ match all documents, but still apply search options like filtering and
121
+ document boosting
122
+
123
+ ## v6.1.0 - 2023-05-15
124
+
125
+ - Add `getStoredFields` method to retrieve the stored fields for a document
126
+ given its ID.
127
+
128
+ - Pass stored fields to the `boostDocument` callback function, making it
129
+ easier to perform dynamic document boosting.
130
+
131
+ ## v6.0.1 - 2023-02-01
132
+
133
+ - [fix] The `boost` search option now does not interfere with the `fields`
134
+ search option: if `fields` is specified, boosting a field that is not
135
+ included in `fields` has no effect, and will not include such boosted field
136
+ in the search.
137
+ - [fix] When using `search` with a `QuerySpec`, the `combineWith` option is
138
+ now properly taking its default from the `SearchOptions` given as the second
139
+ argument.
140
+
141
+ ## v6.0.0 - 2022-12-01
142
+
143
+ This is a major release. The most notable change is the addition of `discard`,
144
+ `discardAll`, and `replace`. These methods make it more convenient and performant
145
+ to remove or replace documents.
146
+
147
+ This release is almost completely backwards compatible with `v5`, apart from one
148
+ breaking change in the behavior of `add` when the document ID already exists.
149
+
150
+ Changes:
151
+
152
+ - [breaking change] `add`, `addAll`, and `addAllAsync` now throw an error on
153
+ duplicate document IDs. When necessary, it is now possible to check for the
154
+ existence of a document with a certain ID with the new method `has`.
155
+ - Add `discard` method to remove documents by ID. This is a convenient
156
+ alternative to `remove` that takes only the ID of the documents to remove,
157
+ as opposed to the whole document. The visible effect is the same as
158
+ `remove`. The difference is that `remove` immediately mutates the index,
159
+ while `discard` marks the current document version as discarded, so it is
160
+ immediately ignored by searches, but delays modifying the index until a
161
+ certain number of documents are discarded. At that point, a vacuuming is
162
+ triggered, cleaning up the index from obsolete references and allowing
163
+ memory to be released.
164
+ - Add `discardAll` and `replace` methods, built on top of `discard`
165
+ - Add vacuuming of references to discarded documents from the index. Vacuuming
166
+ is performed automatically by default when the number of discarded documents
167
+ reaches a threshold (controlled by the new `autoVacuum` constructor option),
168
+ or can be triggered manually by calling the `vacuum` method. The new
169
+ `dirtCount` and `dirtFactor` properties give the current value of the
170
+ parameters used to decide whether to trigger an automatic vacuuming.
171
+ - Add `termCount` property, giving the number of distinct terms present in the
172
+ index
173
+ - Allow customizing the parameters of the BM25+ scoring algorithm via the
174
+ `bm25` search option.
175
+ - Improve TypeScript type of some methods by marking the given array argument
176
+ as `readonly`, signaling that it won't be mutated, and allowing passing
177
+ readonly arrays.
178
+ - Make it possible to overload the `loadJS` static method in subclasses
179
+
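The difference between `discard` and an immediate removal, and the role of `dirtCount` and `dirtFactor`, can be sketched with a toy index. This is not the library's implementation: the `TinyIndex` class, its internal short IDs, the `minDirtCount` and `minDirtFactor` thresholds, and the exact `dirtFactor` formula are assumptions made for illustration.

```javascript
// Toy index with lazy deletion. Hypothetical sketch, not the MiniSearch internals.
class TinyIndex {
  constructor({ minDirtCount = 2, minDirtFactor = 0.1 } = {}) {
    this.postings = new Map()  // term -> Set of internal short IDs
    this.shortIdOf = new Map() // live document ID -> short ID
    this.idOf = new Map()      // short ID -> live document ID
    this.nextShortId = 0
    this.dirtCount = 0         // discarded documents not yet vacuumed
    this.minDirtCount = minDirtCount
    this.minDirtFactor = minDirtFactor
  }

  // Assumed definition: share of index entries that belong to discarded documents.
  get dirtFactor() {
    return this.dirtCount / (1 + this.idOf.size + this.dirtCount)
  }

  has(id) {
    return this.shortIdOf.has(id)
  }

  add(id, text) {
    if (this.has(id)) throw new Error(`duplicate ID ${id}`)
    const shortId = this.nextShortId++
    this.shortIdOf.set(id, shortId)
    this.idOf.set(shortId, id)
    for (const term of text.toLowerCase().split(/\s+/).filter(Boolean)) {
      if (!this.postings.has(term)) this.postings.set(term, new Set())
      this.postings.get(term).add(shortId)
    }
  }

  // Hide the document right away, but leave its postings in place.
  discard(id) {
    const shortId = this.shortIdOf.get(id)
    if (shortId === undefined) throw new Error(`unknown ID ${id}`)
    this.shortIdOf.delete(id)
    this.idOf.delete(shortId)
    this.dirtCount++
    if (this.dirtCount >= this.minDirtCount && this.dirtFactor >= this.minDirtFactor) {
      this.vacuum()
    }
  }

  // Drop postings that point at discarded documents.
  vacuum() {
    for (const [term, ids] of this.postings) {
      for (const shortId of ids) {
        if (!this.idOf.has(shortId)) ids.delete(shortId)
      }
      if (ids.size === 0) this.postings.delete(term)
    }
    this.dirtCount = 0
  }

  search(term) {
    const ids = this.postings.get(term.toLowerCase()) || new Set()
    return [...ids].filter((s) => this.idOf.has(s)).map((s) => this.idOf.get(s))
  }
}

const index = new TinyIndex()
index.add('a', 'moby dick')
index.add('b', 'moby whale')
index.add('c', 'zen')
index.discard('a') // hidden from search, postings still present
```

Giving every added document a fresh internal short ID means that re-adding a discarded ID before the next vacuum cannot resurrect its stale postings.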
180
+ ## v5.1.0
181
+
182
+ - The `processTerm` option can now also expand a single term into several
183
+ terms by returning an array of strings.
184
+ - Add `logger` option to pass a custom logger function.
185
+
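A minimal sketch of the `processTerm` contract described in this changelog: a string keeps the term, an array expands it into several terms, and a falsy value rejects it. The `indexTerms` helper, the whitespace tokenizer, and the sample stop words are hypothetical and only show how a caller might interpret the return value.

```javascript
// Hypothetical helper: apply a processTerm function to every token of a text.
function indexTerms(text, processTerm) {
  const terms = []
  for (const token of text.split(/\s+/).filter(Boolean)) {
    const processed = processTerm(token)
    if (Array.isArray(processed)) {
      terms.push(...processed.filter(Boolean)) // one token expanded into several terms
    } else if (processed) {
      terms.push(processed)                    // a single, possibly normalized, term
    }
    // A falsy result (null, undefined, '', false) discards the token.
  }
  return terms
}

const stopWords = new Set(['the', 'a'])
const processTerm = (term) => {
  const lower = term.toLowerCase()
  if (stopWords.has(lower)) return null
  if (lower === 'wifi') return ['wifi', 'wi-fi']
  return lower
}

const terms = indexTerms('The WiFi router', processTerm)
```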
186
+ ## v5.0.0
187
+
188
+ This is a major release. The main change is an improved scoring algorithm based
189
+ on [BM25+](https://en.wikipedia.org/wiki/Okapi_BM25). The new algorithm will
190
+ cause the scoring and sorting of search results to be different than in previous
191
+ versions (generally better), and need less aggressive boosting.
192
+
193
+ - [breaking change] Use the [BM25+
194
+ algorithm](https://en.wikipedia.org/wiki/Okapi_BM25) to score search
195
+ results, improving their quality over the previous implementation. Note
196
+ that, if you were using field boosting, you might need to re-adjust the
197
+ boosting amounts, since their effect is now different.
198
+
199
+ - [breaking change] auto suggestions now default to `combineWith: 'AND'`
200
+ instead of `'OR'`, requiring all the query terms to match. The old defaults
201
+ can be replicated by passing a new `autoSuggestOptions` option to the
202
+ constructor, with value `{ autoSuggestOptions: { combineWith: 'OR' } }`.
203
+
204
+ - Possibility to set the default auto suggest options in the constructor.
205
+
206
+ - Remove redundant fields in the index data. This also changes the
207
+ serialization format, but serialized indexes created with `v4.x.y` are still
208
+ deserialized correctly.
209
+
210
+ - Define `exports` entry points in `package.json`, to require MiniSearch as a
211
+ CommonJS package or import it as an ES module.
212
+
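For readers unfamiliar with BM25+, the following sketch computes the textbook score of one term in one field. The `bm25Plus` function and the default values for `k`, `b`, and `d` are illustrative assumptions, not values confirmed by this changelog. The last lines show why clamping a term frequency to 255, as the v8.0.0-beta.0 entry describes for frozen postings, moves the score only slightly: the term-frequency component saturates.

```javascript
// Textbook BM25+ score for one term in one document field.
// k: term frequency saturation, b: length normalization, d: BM25+ lower bound.
// The default parameter values are illustrative assumptions.
function bm25Plus(termFreq, matchingDocs, totalDocs, fieldLength, avgFieldLength,
  { k = 1.2, b = 0.7, d = 0.5 } = {}) {
  const idf = Math.log(1 + (totalDocs - matchingDocs + 0.5) / (matchingDocs + 0.5))
  const lengthNorm = 1 - b + b * (fieldLength / avgFieldLength)
  return idf * (d + (termFreq * (k + 1)) / (termFreq + k * lengthNorm))
}

// Same document scored with its true frequency and with a frequency clamped to 255.
const exact = bm25Plus(900, 10, 1000, 100, 100)
const clamped = bm25Plus(Math.min(900, 255), 10, 1000, 100, 100)
const relativeDrift = (exact - clamped) / exact
```

With these inputs the relative drift stays under 1%, which is consistent with the changelog's remark that the clamp only matters for very large term frequencies.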
213
+ ## v4.0.3
214
+
215
+ - [fix] Fix regression causing stored fields not being saved in some
216
+ situations.
217
+
218
+ ## v4.0.2
219
+
220
+ - [fix] Fix match data on mixed prefix and fuzzy search
221
+
222
+ ## v4.0.1
223
+
224
+ - [fix] Fix an issue with scoring, causing a result matching both fuzzy and
225
+ prefix search to be scored higher than an exact match.
226
+
227
+ - [breaking change] `SearchableMap` method `fuzzyGet` now returns a `Map`
228
+ instead of an object. This is a breaking change only if you directly use
229
+ `SearchableMap`, not if you use `MiniSearch`, and is considered part of
230
+ version 4.
231
+
232
+ ## v4.0.0
233
+
234
+ - [breaking change] The serialization format was changed, to abstract away the
235
+ internal implementation details of the index data structure. This allows for
236
+ present and future optimizations without breaking backward compatibility
237
+ again. Moreover, the new format is simpler, facilitating the job of tools
238
+ that create a serialized MiniSearch index in other languages.
239
+
240
+ - [performance] Large performance improvements on indexing (at least 4 times
241
+ faster in the official benchmark) and search, due to changes to the internal
242
+ data structures and the code.
243
+
244
+ - [performance] The fuzzy search algorithm has been updated to work like
245
+ outlined in [this blog post by Steve
246
+ Hanov](http://stevehanov.ca/blog/?id=114), improving its performance by
247
+ several times, especially on large maximum edit distances.
248
+
249
+ - [fix] The `weights` search option did not have an effect due to a bug. Now
250
+ it works as documented. Note that, due to this, the relative scoring of
251
+ fuzzy vs. prefix search matches might change compared to previous versions.
252
+ This change also brings a further performance improvement of both fuzzy and
253
+ prefix search.
254
+
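The fuzzy search approach referenced above walks a trie while carrying one row of the Levenshtein matrix per node, and prunes a branch as soon as every cell in its row exceeds the maximum distance. Below is a minimal sketch of that idea on a plain, uncompressed trie. The `buildTrie` and `fuzzySearch` helpers are hypothetical; the library itself uses a radix tree and its code is not reproduced here.

```javascript
// Build a plain trie; each node is { children: Map<char, node>, word: string | null }.
function buildTrie(words) {
  const root = { children: new Map(), word: null }
  for (const word of words) {
    let node = root
    for (const ch of word) {
      if (!node.children.has(ch)) node.children.set(ch, { children: new Map(), word: null })
      node = node.children.get(ch)
    }
    node.word = word
  }
  return root
}

// Return a Map of word -> edit distance for all words within maxDistance of query.
function fuzzySearch(root, query, maxDistance) {
  const q = [...query]
  const results = new Map()
  // Row for the empty prefix: distance from '' to each prefix of the query.
  const firstRow = Array.from({ length: q.length + 1 }, (_, i) => i)

  const visit = (node, ch, prevRow) => {
    const row = [prevRow[0] + 1]
    for (let i = 1; i <= q.length; i++) {
      const insertCost = row[i - 1] + 1
      const deleteCost = prevRow[i] + 1
      const replaceCost = prevRow[i - 1] + (q[i - 1] === ch ? 0 : 1)
      row.push(Math.min(insertCost, deleteCost, replaceCost))
    }
    if (node.word !== null && row[q.length] <= maxDistance) {
      results.set(node.word, row[q.length])
    }
    // Prune: if no cell is within range, no descendant can be either.
    if (Math.min(...row) <= maxDistance) {
      for (const [nextCh, child] of node.children) visit(child, nextCh, row)
    }
  }

  for (const [ch, child] of root.children) visit(child, ch, firstRow)
  return results
}

const trie = buildTrie(['cat', 'cart', 'cot', 'dog'])
const matches = fuzzySearch(trie, 'cat', 1)
```

Because words that share a prefix share the rows computed for that prefix, the work grows with the size of the explored part of the trie rather than with the number of indexed terms.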
255
+ **Migration notes:**
256
+
257
+ If you have an index serialized with a previous version of MiniSearch, you will
258
+ need to re-create it when you upgrade to MiniSearch `v4`.
259
+
260
+ Also note that loading a pre-serialized index is _slower_ in `v4` than in
261
+ previous versions, but there are much larger performance gains on indexing and
262
+ search speed. If you serialized an index on the server-side, it is worth
263
+ checking if it is now fast enough for your use case to index on the client side:
264
+ it would save you from having to re-serialize the index every time something
265
+ changes.
266
+
267
+ **Acknowledgements:**
268
+
269
+ Many thanks to [rolftimmermans](https://github.com/rolftimmermans) for
270
+ contributing the fixes and outstanding performance improvements that are part of
271
+ this release.
272
+
273
+
274
+ ## v3.3.0
275
+
276
+ - Add `maxFuzzy` search option, to limit the maximum edit distance for fuzzy
277
+ search when using fractional fuzziness
278
+
279
+ ## v3.2.0
280
+
281
+ - Add AND_NOT combinator to subtract results of a subquery from another (for
282
+ example to find documents that match one term and not another)
283
+
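The semantics of the three combinators can be sketched on plain sets of document IDs. The library combines scored results, so this example only shows which documents survive each operator; the `combine` helper and its error message are hypothetical.

```javascript
// Set-level semantics of the query combinators. Hypothetical helper.
const combinators = {
  OR: (a, b) => new Set([...a, ...b]),
  AND: (a, b) => new Set([...a].filter((id) => b.has(id))),
  AND_NOT: (a, b) => new Set([...a].filter((id) => !b.has(id)))
}

// Fold the sets left to right; for AND_NOT every later set is subtracted from the first.
function combine(operator, sets) {
  const fn = combinators[String(operator).toUpperCase()]
  if (!fn) throw new Error(`Invalid combination operator: ${operator}`)
  if (sets.length === 0) return new Set()
  return sets.reduce((acc, current) => fn(acc, current))
}

const motorcycle = new Set([1, 2, 3])
const zen = new Set([2, 3, 4])

const either = combine('OR', [motorcycle, zen])
const both = combine('AND', [motorcycle, zen])
const motorcycleNotZen = combine('AND_NOT', [motorcycle, zen])
```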
284
+ ## v3.1.0
285
+
286
+ - Add possibility for advanced combination of subqueries as query expression
287
+ trees
288
+
289
+ ## v3.0.4
290
+
291
+ - [fix] Keep radix tree property (no node with a single child) after removal
292
+ of an entry
293
+
294
+ ## v3.0.3
295
+
296
+ - [fix] Adjust data about field lengths upon document removal
297
+
298
+ ## v3.0.2
299
+
300
+ - [fix] `addAllAsync` now allows events to be processed between chunks, to avoid
301
+ blocking the UI (by [@grimmen](https://github.com/grimmen))
302
+
303
+ ## v3.0.1
304
+
305
+ - [fix] Fix type signature of `removeAll` to allow calling it with no
306
+ arguments. Also, throw a more informative error if called with a falsey
307
+ value. Thanks to [@nilclass](https://github.com/nilclass).
308
+
309
+ ## v3.0.0
310
+
311
+ This major version ports the source code to TypeScript. That made it possible
312
+ to improve types and documentation, making sure that both are in sync with the
313
+ actual code. It is mostly backward compatible: JavaScript users should
314
+ experience no breaking change, while TypeScript users _might_ have to adapt
315
+ some types.
316
+
317
+ - Port source to [TypeScript](https://www.typescriptlang.org), adding type
318
+ safety
319
+ - Improved types and documentation (now generated with [TypeDoc](http://typedoc.org))
320
+ - [breaking change, fix] TypeScript `SearchOptions` type is not generic
321
+ anymore
322
+ - [breaking change] `SearchableMap` is not a static field of `MiniSearch`
323
+ anymore: it can instead be imported separately as `minisearch/SearchableMap`
324
+
325
+ ## v2.6.2
326
+
327
+ - [fix] Improve TypeScript types: default generic document type is `any`, not `object`
328
+
329
+ ## v2.6.1
330
+
331
+ - No change from 2.6.0
332
+
333
+ ## v2.6.0
334
+
335
+ - Better TypeScript typings using generics, letting the user (optionally)
336
+ specify the document type.
337
+
338
+ ## v2.5.1
339
+
340
+ - [fix] Fix document removal when using a custom `extractField` function
341
+ (thanks [@ahri](https://github.com/ahri) for reporting and reproducing)
342
+
343
+ ## v2.5.0
344
+
345
+ - Make `idField` extraction customizable and consistent with other fields,
346
+ using `extractField`
347
+
348
+ ## v2.4.1
349
+
350
+ - [fix] Fix issue with the term `constructor` (reported by
351
+ [@scambier](https://github.com/scambier))
352
+
353
+ - [fix] Fix issues when a field is named like a default property of JavaScript
354
+ objects
355
+
356
+ ## v2.4.0
357
+
358
+ - Convert field value to string before tokenization and indexing. This makes
359
+ a custom field extractor unnecessary for basic cases like integers or simple
360
+ arrays.
361
+
362
+ ## v2.3.1
363
+
364
+ - Version `v2.3.0` mistakenly did not contain the commit adding `removeAll`,
365
+ this patch release fixes it.
366
+
367
+ ## v2.3.0
368
+
369
+ - Add `removeAll` method, to remove many documents, or all documents, at once.
370
+
371
+ ## v2.2.2
372
+
373
+ - Avoid destructuring variables named with an underscore prefix. This plays
374
+ nicer to some common minifier and builder configurations.
375
+
376
+ - Performance improvement in `getDefault` (by
377
+ [stalniy](https://github.com/stalniy))
378
+
379
+ - Fix the linter setup, to ensure code style consistency
380
+
381
+ ## v2.2.1
382
+
383
+ - Add `"sideEffects": false` to `package.json` to allow bundlers to perform
384
+ tree shaking
385
+
386
+ ## v2.2.0
387
+
388
+ - [fix] Fix documentation of `SearchableMap.prototype.atPrefix` (by
389
+ [@graphman65](https://github.com/graphman65))
390
+ - Switch to Rollup for bundling (by [stalniy](https://github.com/stalniy)),
391
+ reducing size of build and providing ES6 and ES5 module versions too.
392
+
393
+ ## v2.1.4
394
+
395
+ - [fix] Fix document removal in presence of custom per field tokenizer, field
396
+ extractor, or term processor (thanks [@CaptainChaos](https://github.com/CaptainChaos))
397
+
398
+ ## v2.1.3
399
+
400
+ - [fix] Fix TypeScript definition for `storeFields` option (by
401
+ [@ryan-codingintrigue](https://github.com/ryan-codingintrigue))
402
+
403
+ ## v2.1.2
404
+
405
+ - [fix] Fix TypeScript definition for `fuzzy` option (by
406
+ [@alessandrobardini](https://github.com/alessandrobardini))
407
+
408
+ ## v2.1.1
409
+
410
+ - [fix] Fix TypeScript definitions adding `filter` and `storeFields` options
411
+ (by [@emilianox](https://github.com/emilianox))
412
+
413
+ ## v2.1.0
414
+
415
+ - [feature] Add support for stored fields
416
+
417
+ - [feature] Add filtering of search results and auto suggestions
418
+
419
+ ## v2.0.6
420
+
421
+ - Better TypeScript definitions (by [@samuelmeuli](https://github.com/samuelmeuli))
422
+
423
+ ## v2.0.5
424
+
425
+ - Add TypeScript definitions for ease of use in TypeScript projects
426
+
427
+ ## v2.0.4
428
+
429
+ - [fix] tokenizer behavior with newline characters (by [@samuelmeuli](https://github.com/samuelmeuli))
430
+
431
+ ## v2.0.3
432
+
433
+ - Fix small imprecision in documentation
434
+
435
+ ## v2.0.2
436
+
437
+ - Add `addAllAsync` method, adding many documents asynchronously and in chunks
438
+ to avoid blocking the main thread
439
+
440
+ ## v2.0.1
441
+
442
+ - Throw a more descriptive error when `loadJSON` is called without options
443
+
444
+ ## v2.0.0
445
+
446
+ This release introduces better defaults. It is considered a major release, as
447
+ the default options are slightly different, but the API is not changed.
448
+
449
+ - *Breaking change*: default tokenizer splits by Unicode space or punctuation
450
+ (before it was splitting by space, punctuation, or _symbol_). The difference
451
+ is that currency symbols and other non-punctuation symbols will not be
452
+ discarded: "it's 100€" is now tokenized as `["it", "s", "100€"]` instead of
453
+ `["it", "s", "100"]`.
454
+
455
+ - *Breaking change*: default term processing does not discard 1-character
456
+ words.
457
+
458
+ - *Breaking change*: auto suggestions by default perform prefix search only on
459
+ the last term in the query. So "super cond" will suggest "super
460
+ conductivity", but not "superposition condition".
461
+
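The tokenizer change in the first bullet can be reproduced with two regular expressions built on Unicode property escapes. These patterns are a sketch of the described behavior, not a copy of the library's tokenizer, and the `tokenizeWith` helper is hypothetical. Unicode property escapes require the `u` flag and an ES2018 runtime.

```javascript
// Split on Unicode separators and punctuation only (the behavior described for v2).
const spaceOrPunctuation = /[\n\r\p{Z}\p{P}]+/u
// Split on separators, punctuation and symbols (the behavior described for v1.3.1).
const spacePunctuationOrSymbol = /[\n\r\p{Z}\p{P}\p{S}]+/u

// Hypothetical helper: split and drop empty tokens.
const tokenizeWith = (pattern, text) => text.split(pattern).filter((t) => t.length > 0)

const v2Tokens = tokenizeWith(spaceOrPunctuation, "it's 100€")
const v1Tokens = tokenizeWith(spacePunctuationOrSymbol, "it's 100€")
```

Here `v2Tokens` keeps the currency symbol attached (`"100€"`), while `v1Tokens` drops it (`"100"`), matching the example in the bullet above.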
462
+ ## v1.3.1
463
+
464
+ - Better and more compact regular expression in the default tokenizer,
465
+ separating on Unicode spaces, punctuation, and symbols
466
+
467
+ ## v1.3.0
468
+
469
+ - Support for non-latin scripts
470
+
471
+ ## v1.2.1
472
+
473
+ - Improve fuzzy search performance (common cases are now ~4x faster, as shown
474
+ by the benchmark)
475
+
476
+ ## v1.2.0
477
+
478
+ - Add possibility to configure a custom field extraction function by setting
479
+ the `extractField` option (to support cases like nested fields, non-string
480
+ fields, getter methods, field pre-processing, etc.)
481
+
482
+ ## v1.1.2
483
+
484
+ - Add `getDefault` static method to get the default value of configuration options
485
+
486
+ ## v1.1.1
487
+
488
+ - Do not minify library when published as NPM package. Run `yarn
489
+ build-minified` (or `npm run build-minified`) to produce a minified build
490
+ with source maps.
491
+ - **Bugfix**: as per specification, `processTerm` is called with only one
492
+ argument upon search (see [#5](https://github.com/lucaong/minisearch/issues/5))
493
+
494
+ ## v1.1.0
495
+
496
+ - Add possibility to configure separate index-time and search-time
497
+ tokenization and term processing functions
498
+ - The `processTerm` function can now reject a term by returning a falsy value
499
+ - Upon indexing, the `tokenize` and `processTerm` functions receive the field
500
+ name as the second argument. This makes it possible to process or tokenize
501
+ each field differently.
502
+
503
+ ## v1.0.1
504
+
505
+ - Reduce bundle size by optimizing babel preset env options
506
+
507
+ ## v1.0.0
508
+
509
+ Production-ready release.
510
+
511
+ Features:
512
+
513
+ - Space-optimized index
514
+ - Exact match, prefix match, fuzzy search
515
+ - Auto suggestions
516
+ - Add/remove documents at any time
package/LICENSE.txt ADDED
@@ -0,0 +1,7 @@
1
+ Copyright 2022 Luca Ongaro
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4
+
5
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6
+
7
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,152 @@
1
+ # @yoch/minisearch
2
+
3
+ **In-memory full-text search for Node.js** — a fork of [MiniSearch](https://github.com/lucaong/minisearch) by [Luca Ongaro](https://github.com/lucaong/minisearch), extended for **production serving**: smaller indexes, faster loads, and a read-only fast path.
4
+
5
+ > **Current release:** `8.0.0-beta.2` · install with `npm install @yoch/minisearch`
6
+
7
+ ---
8
+
9
+ ## Why this fork?
10
+
11
+ [MiniSearch](https://github.com/lucaong/minisearch) is excellent for building and querying an index in JavaScript. This fork keeps that API for **mutable** indexing, and adds **`FrozenMiniSearch`** for when the index is built once and queried many times:
12
+
13
+ | | Mutable `MiniSearch` | `FrozenMiniSearch` |
14
+ |---|---------------------|-------------------|
15
+ | **Use when** | Documents change (`add`, `remove`, `discard`) | Corpus is fixed, or you reload from disk |
16
+ | **Memory** | Maps and nested objects per posting | Flat `Uint32Array` / `Uint8Array` postings |
17
+ | **On disk** | `toJSON` / `loadJSON` | **`saveBinary` / `loadBinary`** (MSv2, reads MSv1) |
18
+ | **Typical search** | Baseline | Often **~20–35% faster** p50 on the same corpus (see benchmarks) |
19
+
20
+ Same BM25 scoring, prefix/fuzzy search, `autoSuggest`, and query combinators — frozen indexes aim for **bit-for-bit parity** with `addAll` + `freeze()` on the same options.
21
+
22
+ ---
23
+
24
+ ## Quick start
25
+
26
+ ```bash
27
+ npm install @yoch/minisearch
28
+ # or pin the beta channel:
29
+ # npm install @yoch/minisearch@beta
30
+ ```
31
+
32
+ **One-shot frozen index** (no mutable step):
33
+
34
+ ```javascript
35
+ import { FrozenMiniSearch } from '@yoch/minisearch'
36
+
37
+ const options = { fields: ['title', 'text'], storeFields: ['title'] }
38
+
39
+ const index = FrozenMiniSearch.fromDocuments(documents, options)
40
+ index.search('ishmael', { prefix: true })
41
+ index.autoSuggest('zen')
42
+
43
+ // Persist and reload
44
+ const buf = index.saveBinary()
45
+ const loaded = FrozenMiniSearch.loadBinary(buf, options)
46
+ ```
47
+
48
+ **Mutable index, then freeze** (incremental build):
49
+
50
+ ```javascript
51
+ import MiniSearch, { FrozenMiniSearch } from '@yoch/minisearch'
52
+
53
+ const ms = new MiniSearch({ fields: ['title', 'text'] })
54
+ ms.addAll(documents)
55
+
56
+ const frozen = ms.freeze() // immutable snapshot
57
+ const buf = frozen.saveBinary()
58
+ ```
59
+
60
+ ```javascript
61
+ // ESM
62
+ import MiniSearch, { FrozenMiniSearch, buildFrozenFromDocuments } from '@yoch/minisearch'
63
+
64
+ // CommonJS
65
+ const MiniSearch = require('@yoch/minisearch')
66
+ const { FrozenMiniSearch } = require('@yoch/minisearch')
67
+ ```
68
+
69
+ ---
70
+
71
+ ## Pick the right API
72
+
73
+ | Goal | API |
74
+ |------|-----|
75
+ | Live index that changes over time | `MiniSearch` → `freeze()` when you need read-only serving |
76
+ | Fixed corpus, build frozen directly | **`FrozenMiniSearch.fromDocuments(documents, options)`** |
77
+ | Load a snapshot from disk | `FrozenMiniSearch.loadBinary(buffer, options)` |
78
+ | Custom assembly pipeline | `buildFrozenFromDocuments`, `assembleFrozen`, `freezeFromMiniSearch` |
79
+
80
+ `fromDocuments` matches `new MiniSearch(opts).addAll(docs).freeze()` for search ranking on the same corpus and options (`fields`, `tokenize`, `processTerm`, …). Frozen indexes do not support `add` / `remove`.
81
+
82
+ ---
83
+
84
+ ## FrozenMiniSearch in a bit more detail
85
+
86
+ - **`freeze()`** — snapshot a mutable index into compact typed postings + a radix tree keyed by term index.
87
+ - **`fromDocuments()`** — build that structure in one pass (skips nested `Map` postings and radix cloning at freeze time).
88
+ - **`saveBinary()` / `loadBinary()`** — MSv2 on write, MSv1 still readable; pass the same `fields` (and custom `tokenize` / `processTerm` if used at build time).
89
+ - **Term frequencies** — stored as `Uint8` (max 255 per doc/term); only affects scores for extreme term repetition.
90
+ - **`frozenMemoryBreakdown()`** — introspect postings, radix tree, and stored-field footprint.
91
+
92
+ Advanced exports:
93
+
94
+ ```javascript
95
+ import {
96
+ FrozenMiniSearch,
97
+ buildFrozenFromDocuments,
98
+ assembleFrozen,
99
+ freezeFromMiniSearch,
100
+ frozenMemoryBreakdown
101
+ } from '@yoch/minisearch'
102
+ ```
103
+
104
+ ---
105
+
106
+ ## MiniSearch (mutable)
107
+
108
+ Full upstream-style API: field boosts, fuzzy/prefix, nested queries, `AND` / `OR` / `AND_NOT`, filters, `autoSuggest`, vacuum after `discard`, etc.
109
+
110
+ ```javascript
111
+ import MiniSearch from '@yoch/minisearch'
112
+
113
+ const miniSearch = new MiniSearch({ fields: ['title', 'text'] })
114
+ miniSearch.addAll(documents)
115
+ miniSearch.search('zen art motorcycle')
116
+ ```
117
+
118
+ TypeScript definitions: `dist/es/index.d.ts`.
119
+
120
+ ---
121
+
122
+ ## Benchmarks
123
+
124
+ Reproducible comparisons (heap, load time, search latency) live under [`benchmarks/`](benchmarks/README.md):
125
+
126
+ ```bash
127
+ yarn benchmark:compare # terminal report
128
+ yarn benchmark:diff # vs versioned baseline
129
+ ```
130
+
131
+ ---
132
+
133
+ ## Development
134
+
135
+ ```bash
136
+ yarn install
137
+ yarn test
138
+ yarn build
139
+ ```
140
+
141
+ **Requirements:** a Node.js version with **ES2018+** support. No browser UMD/CDN build in this fork (Node-only ESM + CJS).
142
+
143
+ ---
144
+
145
+ ## Changelog & credits
146
+
147
+ See [CHANGELOG.md](./CHANGELOG.md).
148
+
149
+ - **MiniSearch** — [Luca Ongaro](https://github.com/lucaong/minisearch) (MIT)
150
+ - **This fork** — [yoch/minisearch](https://github.com/yoch/minisearch): `FrozenMiniSearch`, MSv1/MSv2 binary format, shared scoring refactor
151
+
152
+ Upstream docs: [MiniSearch site](https://lucaong.github.io/minisearch/) · [intro article](https://lucaongaro.eu/blog/2019/01/30/minisearch-client-side-fulltext-search-engine.html)