@yoch/frozenminisearch 1.2.2 → 1.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/CHANGELOG.md CHANGED

@@ -2,6 +2,17 @@
 ## Unreleased
+## v1.2.3 — `@yoch/frozenminisearch`
+Patch release: broad-first exact AND / AND_NOT paths, seek-based gated doc-id collection, and README benchmark copy refresh. No API or MSv5 wire-format changes.
+### Improved
+- **Broad-first exact AND** — on exact-only combined queries where the first branch posting is large and a later branch is selective enough, collect the final doc-id gate by estimated posting length, then score branches in query order (parity with naive score-then-intersect unchanged).
+- **Broad-first AND_NOT** — when the positive branch is large and a negated branch is comparably large, collect exclusions first and score the positive branch only on survivors.
+- **Gated doc-id collection** — `DocIdGate` lazy views and seek over sorted postings when the gate is much smaller than the posting list (`scoring.ts`, `frozenPostings.ts`).
+- **AND gate posting estimate** — skip upfront posting-length estimation on prefix/fuzzy AND branches; keep absolute-gate skip on the sequential path so Divina `AND+fuzzy` stays fast.
 ## v1.2.2 — `@yoch/frozenminisearch`
 Patch release: faster frozen AND scoring on large posting lists (gated seek + posting-ratio gate) and BM25 segment hoisting. No API or MSv5 wire-format changes.
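
The "Gated doc-id collection" entry in the v1.2.3 notes above describes seeking over sorted postings when the gate is much smaller than the posting list. The following is a minimal sketch of that idea, not the package's implementation: `findInSortedSegment`, `shouldSeek`, `collectDocIds`, and the 8x ratio are hypothetical names and values chosen for illustration.

```javascript
// Hypothetical helper: binary search for docId in sorted[offset, offset + length).
function findInSortedSegment(sorted, offset, length, docId) {
  let lo = offset
  let hi = offset + length - 1
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1
    const value = sorted[mid]
    if (value === docId) return mid
    if (value < docId) lo = mid + 1
    else hi = mid - 1
  }
  return -1
}

// Assumed heuristic (not the package's real threshold):
// seek only when the gate is at least 8x smaller than the posting segment.
function shouldSeek(gateSize, postingLength) {
  return gateSize * 8 <= postingLength
}

// Collect the doc ids of one posting segment, restricted to `gate` when given.
function collectDocIds(sorted, offset, length, gate) {
  const out = new Set()
  if (gate != null && shouldSeek(gate.size, length)) {
    // Small gate: one binary search per gate entry instead of a full scan.
    for (const docId of gate) {
      if (findInSortedSegment(sorted, offset, length, docId) >= 0) out.add(docId)
    }
    return out
  }
  // Otherwise scan the whole segment and filter by the gate.
  for (let i = 0; i < length; i++) {
    const docId = sorted[offset + i]
    if (gate == null || gate.has(docId)) out.add(docId)
  }
  return out
}

// Posting list holding the even ids 0, 2, ..., 1998.
const postings = Uint32Array.from({ length: 1000 }, (_, i) => i * 2)
const seekHits = collectDocIds(postings, 0, postings.length, new Set([4, 7, 1998]))
const scanHits = collectDocIds(postings, 0, 10, new Set([0, 2, 3, 4, 5]))
```

With a gate of 3 ids against 1,000 postings the seek path runs three binary searches; with a gate of 5 ids against a 10-entry segment the linear scan is used instead. Both paths return the same intersection, so the choice only affects cost.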

package/README.md CHANGED

@@ -3,67 +3,44 @@
 [![npm version](https://img.shields.io/npm/v/@yoch/frozenminisearch.svg)](https://www.npmjs.com/package/@yoch/frozenminisearch)
 [![coverage](https://codecov.io/gh/yoch/frozenminisearch/graph/badge.svg)](https://codecov.io/gh/yoch/frozenminisearch)
 [![CI](https://github.com/yoch/frozenminisearch/actions/workflows/main.yml/badge.svg)](https://github.com/yoch/frozenminisearch/actions/workflows/main.yml)
-[![Socket Badge](https://socket.dev/api/badge/npm/package/@yoch/frozenminisearch)](https://socket.dev/npm/package/%40yoch%2Ffrozenminisearch)
 [API documentation](https://yoch.github.io/frozenminisearch/)
-**Memory-optimized, read-only full-text search for Node.js** — the same BM25, prefix/fuzzy, and `autoSuggest` API as [MiniSearch](https://github.com/lucaong/minisearch), with **up to ~98% less index RAM** on real corpora and compact binary snapshots you ship instead of JSON.
+**Memory-optimized, read-only full-text search for Node.js.** FrozenMiniSearch keeps the serving API close to [MiniSearch](https://github.com/lucaong/minisearch) while using compact, immutable indexes for fixed corpora.
-**Why it exists:** [MiniSearch](https://github.com/lucaong/minisearch) optimizes for a mutable in-memory index. FrozenMiniSearch optimizes for **retained heap, disk footprint, and cold load** once the corpus is fixed — packed radix postings, columnar `storeFields`, typed-array layouts, and MSv5 binary wire format instead of per-document JS objects.
+Use it when your documents are built offline, shipped to production, and queried many times. In that shape, frozen indexes use **~98-99% less index RAM** in the main benchmark set, save to compact binary snapshots, and load faster than MiniSearch JSON.
-**Design goal:** migrate with minimal code change — package name and index construction only; serving code stays the same. Build with `fromDocuments`, the incremental builder, or `fromJson`; no mutable `MiniSearch` class is published here.
+If you need live `add`, `remove`, or `discard`, use MiniSearch. If the corpus is fixed, this package is designed to keep the search experience familiar while making each serving replica much smaller.
 ---
-## Why frozen instead of MiniSearch?
+## Why FrozenMiniSearch?
-Choose **mutable MiniSearch** when documents change at runtime (`add`, `remove`, `discard`). Choose **frozen** when memory and snapshot size matter: fixed corpus, deploy from binary, many replicas loading the same index. Search semantics stay the same — BM25, prefix/fuzzy, `autoSuggest`, wildcard, `AND` / `OR` / `AND_NOT` — with parity vs MiniSearch 7 validated in `dev/parity/` (scores `toBeCloseTo` precision 6).
+FrozenMiniSearch is for the common production path where search data changes elsewhere, not inside the web process:
-### Memory-first design
+- Build or import the index offline.
+- Save it as a compact binary snapshot.
+- Load it in many read-only Node.js processes.
+- Query with MiniSearch-style `search`, `autoSuggest`, filters, boosts, prefix/fuzzy search, wildcard, and `AND` / `OR` / `AND_NOT`.
-| Technique | What it saves |
-|-----------|---------------|
-| **Packed radix tree + flat postings** | Term dictionary and posting lists without per-entry JS wrappers |
-| **Columnar `storeFields`** | One dense column per field instead of a `Record` per document (~75% less heap for a single stored field) |
-| **MSv5 binary snapshots** | ~73–94% smaller on disk than MiniSearch JSON; faster cold load |
-| **Read-only freeze** | No mutation bookkeeping — layouts sized for serve-time, not incremental edit |
-| **Incremental builder** | Typed-array accumulators during build; lower peak heap than materializing `number[][]` per term |
+Internally it replaces mutable JavaScript object graphs with packed radix postings, typed arrays, and columnar stored fields. The result is less flexible than MiniSearch, but much cheaper to keep resident.
 <!-- vs-reference:start — npm run bench:readme -->
-### Measured vs lucaong MiniSearch (reference baseline)
+### Measured vs MiniSearch
-Same BM25 queries on identical corpora. **Frozen wins on what we optimize for**: RAM, disk, cold load, and search throughput on real workloads.
+Same corpora, same BM25-style queries, MiniSearch 7.2.0 as the reference.
-| Scenario | Docs | Index RAM¹ | Disk (binary vs JSON)² | Cold load³ | Search p50⁴ |
-|----------|-----:|------------|------------------------:|-----------:|------------:|
-| Divina with storeFields | 14,097 | 0.3 vs 16.1 MB (~98% less) | ~73% less | ~65% faster | ~21% faster |
-| Divina index only | 14,097 | 0.2 vs 14.9 MB (~99% less) | ~77% less | ~85% faster | ~17% faster |
-| high-frequency terms (10k docs) | 10,000 | 0.1 vs 7.4 MB (~99% less) | ~94% less | ~90% faster | ~38% faster |
-| Dense numeric ids (100k, identity lookup) | 100,000 | 0.9 vs 91.3 MB (~99% less) | ~88% less | ~90% faster | ~27% faster |
-| Doc id Uint16 boundary (65535 docs) | 65,535 | 0.6 vs 58.6 MB (~99% less) | ~91% less | ~93% faster | ~44% faster |
+| Scenario | Docs | Index RAM | Binary size | Load time | Search p50 |
+|----------|-----:|-----------|------------:|----------:|-----------:|
+| Divina, with stored text | 14,097 | 0.3 vs 16.1 MB (~98% less) | ~73% less | ~75% faster | ~14% faster |
+| Divina, index only | 14,097 | 0.2 vs 14.9 MB (~99% less) | ~77% less | ~89% faster | ~23% faster |
+| High-frequency terms | 10,000 | 4.4 vs 7.4 MB (~41% less) | ~94% less | ~91% faster | ~40% faster |
+| Dense numeric ids | 100,000 | 0.9 vs 91.3 MB (~99% less) | ~88% less | ~94% faster | ~27% faster |
+| Uint16 doc id boundary | 65,535 | 0.6 vs 58.6 MB (~99% less) | ~91% less | ~93% faster | ~43% faster |
-**Headline:** 26/27 query benchmarks favor frozen (paired **hrtime** protocol v2). Divina `inferno` (exact, paired p50): mutable 15.7 µs → frozen 13.4 µs (**-2 µs**, ratio 0.80).
+Across this full run, frozen is faster on **25/27** search cases. Divina `inferno` (exact, paired p50): mutable 15.0 µs → frozen 13.4 µs (**-2 µs**, ratio 0.78).
-Decomposition (Divina exact): L0 lookup ~300 ns frozen, L1 `executeQuery` ~6.6 µs, L2 full `search` ~10.1 µs (finalize ≈ 3 µs).
-| | lucaong `minisearch` | `@yoch/frozenminisearch` |
-|---|------------------------|---------------------------|
-| **Sweet spot** | Live index mutations | Fixed corpus, deploy from binary |
-| **Production path** | `addAll` → `toJSON` | `fromDocuments` / `fromMiniSearch` → `saveBinarySync` → `loadBinarySync` |
-| **Typical trade-off** | Higher RAM, JSON snapshots | One-time freeze, then compact binary |
-<details>
-<summary><strong>How to read these numbers (limits &amp; protocol)</strong></summary>
-- **Captured:** 2026-06-18 · commit `d05d8e9` · Node v24.16.0 · minisearch **7.2.0** · **3** run(s)/scenario · protocol **v2** (hrtime-paired, batch target 3 ms).
-- ¹ **Index RAM** — `measureHeap` with `--expose-gc`, one index alive. V8 overhead is extra; treat as **trend**, not accounting. Sporadic outliers happen (e.g. index-only Divina).
-- ² **Disk** — `JSON.stringify(mutable)` vs `saveBinarySync`.
-- ³ **Cold load** — median wall time to searchable index after read from disk format.
-- ⁴ **Search p50** — paired mutable/frozen samples per iteration; sub-0.1 ms baselines reported in **µs** in full reports. Fast queries use **50** iterations, others **20**.
-- **Not shown:** mutable `add`/`remove` (frozen is read-only by design). Freeze time is offline — see full suite for build metrics.
-- **Reproduce:** `npm run bench -- run --profile=vs-reference` · **Update this block:** `npm run bench:readme` after refreshing `benchmarks/baselines/reference.json`.
-</details>
+Numbers are from `benchmarks/baselines/reference.json`, captured 2026-06-18 on Node v24.16.0, 3 runs per scenario. Heap is measured with one index alive and should be read as a trend, not exact accounting.
 <!-- vs-reference:end -->
 ---
@@ -74,8 +51,6 @@ Decomposition (Divina exact): L0 lookup ~300 ns frozen, L1 `executeQuery` ~6.6
 npm install @yoch/frozenminisearch
 ```
-**Build from documents:**
 ```javascript
 import FrozenMiniSearch from '@yoch/frozenminisearch'
@@ -89,7 +64,7 @@ const buf = index.saveBinarySync()
 const loaded = FrozenMiniSearch.loadBinarySync(buf, options)
 ```
-**Incremental builder:**
+For larger imports, use the incremental builder:
 ```javascript
 import FrozenMiniSearch, {
@@ -106,21 +81,11 @@ ESM and CommonJS are both supported (`main` → CJS, `module` → ESM).
 ---
-## Drop-in
-For **fixed corpora** (build once, serve read-only), treat this package as a drop-in replacement for MiniSearch on the serving path — same queries, far less memory per replica.
-**Change only:**
-| What | Before | After |
-|------|--------|-------|
-| Package | `minisearch` | `@yoch/frozenminisearch` |
-| Construction | `new MiniSearch(opts).addAll(docs)` | `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` |
-| JSON snapshot | `toJSON()` / `loadJSON()` wire format | `FrozenMiniSearch.toJSON()` / `fromJson(json, opts)` or `fromMiniSearchSnapshot(obj)` — no runtime dependency on `minisearch` |
+## Migration
-**Keep unchanged** after load: `search`, `autoSuggest`, `has`, `getStoredFields`, query options (`prefix`, `fuzzy`, `AND` / `OR` / `AND_NOT`, filters, boosts). Parity vs MiniSearch 7 is enforced in `dev/parity/`.
+For fixed corpora, most serving code can stay the same. Change how the index is built or loaded, then keep calling `search`, `autoSuggest`, `has`, and `getStoredFields`.
-**Imports** — default and named both work (ESM and CJS):
+Default and named imports both work:
 ```javascript
 // ESM
@@ -132,38 +97,28 @@ const FrozenMiniSearch = require('@yoch/frozenminisearch')
 const { FrozenMiniSearch } = require('@yoch/frozenminisearch')
 ```
-**Intentionally not drop-in:** live `add` / `remove` / `discard` (frozen is read-only); browser builds; custom `tokenize` / `processTerm` are not stored in JSON or binary snapshots — pass the same functions at load when you customized them.
+Build directly:
----
+```javascript
+import FrozenMiniSearch from '@yoch/frozenminisearch'
-## Migration
+const frozen = FrozenMiniSearch.fromDocuments(documents, options)
+```
-### From MiniSearch JSON
+Or freeze an existing MiniSearch index:
 ```javascript
-import MiniSearch from 'minisearch' // build-time only
+import MiniSearch from 'minisearch'
 import FrozenMiniSearch from '@yoch/frozenminisearch'
 const mutable = new MiniSearch(options)
 mutable.addAll(documents)
-// Option A — live instance
 const frozen = FrozenMiniSearch.fromMiniSearch(mutable, options)
-// Option B — serialized index (offline / ETL)
-const json = JSON.stringify(mutable)
-const frozen2 = FrozenMiniSearch.fromJson(json, options)
+const fromJson = FrozenMiniSearch.fromJson(JSON.stringify(mutable), options)
 ```
-`options.fields` must match the indexed fields in the snapshot when provided.
-### From MiniSearch (mutable → frozen)
-| Before (mutable) | After (`@yoch/frozenminisearch`) |
-|------------------|----------------------------------|
-| `new MiniSearch(opts).addAll(docs)` then serve | `FrozenMiniSearch.fromDocuments(docs, opts)` or `fromMiniSearch(mutable, opts)` |
-| MiniSearch JSON snapshot | `FrozenMiniSearch.fromJson(json)` or `fromMiniSearchSnapshot(obj)` |
-| `import MiniSearch from 'minisearch'` | `import FrozenMiniSearch from '@yoch/frozenminisearch'` (+ `minisearch` only if you still build mutable indexes) |
+MiniSearch is only needed if you still build mutable indexes. Frozen instances do not support live `add`, `remove`, or `discard`.
 ---
@@ -174,13 +129,13 @@ const frozen2 = FrozenMiniSearch.fromJson(json, options)
 - `has(id)`, `getStoredFields(id)`
 - `saveBinarySync` / `loadBinarySync` / async variants
-Indexing is **not** available on a frozen instance — use `fromDocuments`, the builder, `fromMiniSearch*`, or `loadBinary*`.
+Custom `tokenize` and `processTerm` functions are not stored in snapshots; pass the same functions again when loading.
 ---
 ## Binary snapshots
-The primary way to **persist and ship a memory-compact index** — smaller than MiniSearch JSON and faster to load into a low-RAM serving process.
+Binary snapshots are the preferred production format.
 ```javascript
 const buf = index.saveBinarySync()
@@ -188,12 +143,8 @@ const loaded = FrozenMiniSearch.loadBinarySync(buf, {}) // field names embedded
 ```
 - **Node ≥ 20**
-- Default snapshot compression (`compression: 'auto'`, one pass):
-  - payloads under 64 B stay raw
-  - `zstd` on Node 22.15+ when it strictly shrinks the payload
-  - otherwise `zlib` on Node 20–22.14 when it strictly shrinks the payload
-  - otherwise `raw` (uncompressed)
-- Explicit snapshot compression always writes the chosen codec, even when compression would not shrink the payload (useful for portability):
+- `compression: 'auto'` chooses `zstd` on Node 22.15+, otherwise `zlib`, and falls back to raw when compression does not help.
+- Use explicit compression when you need a portable artifact:
 ```javascript
 const portable = index.saveBinarySync({ compression: 'zlib' })
@@ -201,11 +152,7 @@ const uncompressed = index.saveBinarySync({ compression: 'raw' })
 const bestRatio = index.saveBinarySync({ compression: 'zstd' }) // Node 22.15+
 ```
-- Snapshot readability depends on the embedded codec:
-  - `raw` and `zlib` snapshots load on Node 20+
-  - `zstd` snapshots require Node 22.15+
-- Snapshots produced by this package version are forward-compatible; re-build from MiniSearch JSON if an older binary fails to load
-- `tokenize` / `processTerm` are not stored — pass the same functions at load when customized
+Raw and zlib snapshots load on Node 20+. zstd snapshots require Node 22.15+.
 ---
@@ -215,8 +162,8 @@ See [benchmarks/README.md](benchmarks/README.md).
 ```bash
 npm run bench -- run --profile=vs-reference   # compare frozen vs minisearch
-npm run bench:diff                          # regression vs reference.json
-npm run bench:readme                          # refresh comparison table above
+npm run bench:diff                            # regression vs reference.json
+npm run bench:readme -- --from=benchmarks/baselines/latest.json
 ```
 ---
@@ -230,9 +177,7 @@ yarn build
 node scripts/verify-npm-pack.cjs
 ```
-Parity tests import `minisearch` as a devDependency (reference). Optional upstream clone: `git submodule update --init vendor/minisearch`.
-Design notes (freq adaptive, AND gating): [dev/docs/README.md](dev/docs/README.md).
+Parity tests compare against MiniSearch 7. Longer notes and performance work live under [dev/docs/README.md](dev/docs/README.md) and [benchmarks/README.md](benchmarks/README.md).
 ---
@@ -241,6 +186,6 @@ Design notes (freq adaptive, AND gating): [dev/docs/README.md](dev/docs/README.m
 See [CHANGELOG.md](./CHANGELOG.md).
 - **MiniSearch** — [Luca Ongaro](https://github.com/lucaong/minisearch) (MIT)
-- **@yoch/frozenminisearch** — memory-optimized frozen indexes, packed radix tree, compact binary snapshots
+- **@yoch/frozenminisearch** — memory-optimized frozen indexes and compact binary snapshots
 Upstream docs: [MiniSearch](https://lucaong.github.io/minisearch/)
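
The `compression: 'auto'` bullet in the README hunk above can be read as a small decision procedure. The sketch below is one way to express it under stated assumptions and is not taken from the package: `encodeAuto` and `MIN_COMPRESS_BYTES` are hypothetical names, the 64-byte floor comes from the removed README bullet, and zstd availability is detected by checking for `zlib.zstdCompressSync`, which Node provides from 22.15.

```javascript
const zlib = require('node:zlib')

// Assumed floor, taken from the removed README text ("payloads under 64 B stay raw").
const MIN_COMPRESS_BYTES = 64

// Hypothetical helper: pick a codec the way the README describes 'auto'.
function encodeAuto(payload) {
  if (payload.length < MIN_COMPRESS_BYTES) {
    return { codec: 'raw', bytes: payload }
  }
  // Prefer zstd when this Node build exposes it, otherwise fall back to zlib.
  const hasZstd = typeof zlib.zstdCompressSync === 'function'
  const codec = hasZstd ? 'zstd' : 'zlib'
  const packed = hasZstd ? zlib.zstdCompressSync(payload) : zlib.deflateSync(payload)
  // Keep the compressed form only when it is strictly smaller.
  return packed.length < payload.length
    ? { codec, bytes: packed }
    : { codec: 'raw', bytes: payload }
}

const big = encodeAuto(Buffer.alloc(4096, 'a')) // highly repetitive payload
const small = encodeAuto(Buffer.from('hi'))     // below the raw floor
```

Because the chosen codec depends on the writing process, a snapshot produced with `'auto'` on Node 22.15+ may need zstd to load. This is why the README suggests an explicit `compression: 'zlib'` for portable artifacts.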

package/dist/cjs/index.cjs CHANGED

@@ -37,6 +37,21 @@ function gateIsSelectiveEnough(gateSize, documentCount, limits = DEFAULT_AND_GAT
     }
     return false;
 }
+/** True when passing gate as allowedDocs can skip docs vs scanning the full branch posting. */
+function gateFilterShrinksScan(gateSize, postingListLength) {
+    return postingListLength > gateSize;
+}
+/**
+ * Whether to pass the AND gate as allowedDocs to the next branch (perf only; scores unchanged if false).
+ * Distinct from gateIsSelectiveEnough: a selective gate may still be too large to filter a short posting.
+ */
+function shouldPassGateAsAllowedDocs(selective, gateSize, postingListLength) {
+    if (!selective || gateSize === 0)
+        return false;
+    if (postingListLength == null || postingListLength <= 0)
+        return false;
+    return gateFilterShrinksScan(gateSize, postingListLength);
+}
 const MAX_FREQ = 65535;
 function readDocId(docIds, index) {
@@ -2138,6 +2153,16 @@ function createFrozenFieldTermFlyweight(layout) {
     return flyweight;
 }
 function collectDocIdsFromFrozenSegment(allDocIds, offset, length, context, docIds, allowedDocs) {
+    if (allowedDocs != null && shouldSeekAllowedDocs(allowedDocs.size, length)) {
+        for (const docId of allowedDocs) {
+            if (context.isDocActive != null && !context.isDocActive(docId))
+                continue;
+            if (findDocIndexInSortedSegment(allDocIds, offset, length, docId) >= 0) {
+                docIds.add(docId);
+            }
+        }
+        return;
+    }
     for (let i = 0; i < length; i++) {
         const docId = readDocId(allDocIds, offset + i);
         if (context.isDocActive != null && !context.isDocActive(docId))
@@ -4127,8 +4152,18 @@ function buildFrozenParamsFromDocuments(documents, options) {
 function useGatedEvaluation(run, branchCount, operator, hasWildcard) {
     return shouldUseGatedEvaluation(branchCount, operator, hasWildcard);
 }
-function docIdsFromResult(result) {
-    return new Set(result.keys());
+function gateFromResult(result) {
+    return {
+        get size() {
+            return result.size;
+        },
+        has(docId) {
+            return result.has(docId);
+        },
+        [Symbol.iterator]() {
+            return result.keys();
+        },
+    };
 }
 function isQueryCombination(query) {
     return typeof query === 'object'
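
The hunk above replaces a copied `Set` of result keys with a lazy view over the result `Map`. The following small sketch shows the behavioural difference; `gateView` is a hypothetical stand-in that mirrors the shape of `gateFromResult`, and the sample scores are made up for illustration.

```javascript
// Hypothetical stand-in with the same shape as gateFromResult in the diff:
// size, has, and iteration all delegate to the underlying Map.
function gateView(result) {
  return {
    get size() {
      return result.size
    },
    has(docId) {
      return result.has(docId)
    },
    [Symbol.iterator]() {
      return result.keys()
    },
  }
}

const scores = new Map([[1, 0.4], [2, 0.9], [3, 0.1]])
const copied = new Set(scores.keys()) // old approach: allocates a second container
const view = gateView(scores)         // new approach: no per-key copy

scores.delete(2)

const copiedSize = copied.size // the copy still holds the deleted id
const viewSize = view.size     // the view tracks the Map
const viewIds = [...view]
```

The view avoids building a new `Set` for every intermediate result. It also stays in sync with later mutations of the `Map`, so callers that need a stable snapshot would still have to copy.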
@@ -4200,6 +4235,7 @@ function normalizeStringQuery(query, searchOptions, params) {
 function lazyIndexedTerm(indexView, termIndex) {
     return { kind: 'lazy', resolve: () => indexView.resolveTermByIndex(termIndex) };
 }
+const TWO_PHASE_AND_NOT_MIN_FRACTION = 0.5;
 function forEachQuerySpecTermRef(query, normalized, params, visit) {
     const { indexView } = params;
     const { options } = normalized;
@@ -4330,6 +4366,70 @@ function subtractDocIdsFromResult(result, excludedDocIds) {
     for (const docId of excludedDocIds)
         result.delete(docId);
 }
+function twoPhasePostingLengths(branches, allowTwoPhase, estimateBranchPostingLength) {
+    if (!allowTwoPhase || estimateBranchPostingLength == null)
+        return undefined;
+    const lengths = new Array(branches.length);
+    for (let i = 0; i < branches.length; i++) {
+        lengths[i] = estimateBranchPostingLength(branches[i]);
+    }
+    return lengths;
+}
+function shouldUseTwoPhaseAnd(branchPostingLengths, allowedDocs) {
+    if (branchPostingLengths.length <= 1)
+        return false;
+    const firstLength = branchPostingLengths[0];
+    const effectiveFirstLength = allowedDocs == null
+        ? firstLength
+        : Math.min(firstLength, allowedDocs.size);
+    if (effectiveFirstLength < DEFAULT_POSTING_GATE_MIN_LENGTH)
+        return false;
+    const targetLength = effectiveFirstLength >>> DEFAULT_POSTING_GATE_RATIO_SHIFT;
+    for (let i = 1; i < branchPostingLengths.length; i++) {
+        const len = branchPostingLengths[i];
+        if (len > 0 && len <= targetLength)
+            return true;
+    }
+    return false;
+}
+function shouldUseTwoPhaseAndNot(branchPostingLengths, allowedDocs, documentCount) {
+    if (branchPostingLengths.length <= 1)
+        return false;
+    const firstLength = branchPostingLengths[0];
+    const effectiveFirstLength = allowedDocs == null
+        ? firstLength
+        : Math.min(firstLength, allowedDocs.size);
+    const largeThreshold = Math.max(DEFAULT_POSTING_GATE_MIN_LENGTH, Math.floor(documentCount * TWO_PHASE_AND_NOT_MIN_FRACTION));
+    if (effectiveFirstLength < largeThreshold)
+        return false;
+    for (let i = 1; i < branchPostingLengths.length; i++) {
+        if (branchPostingLengths[i] >= largeThreshold)
+            return true;
+    }
+    return false;
+}
+function executeAndWithFinalGate(branches, finalGate, executeBranch) {
+    if (finalGate.size === 0)
+        return new Map();
+    let result = executeBranch(branches[0], finalGate);
+    for (let i = 1; i < branches.length; i++) {
+        if (result.size === 0)
+            return result;
+        result = combineResults([result, executeBranch(branches[i], finalGate)], AND);
+    }
+    return result;
+}
+function collectAndDocIdsByEstimatedLength(branches, branchPostingLengths, collectBranch, allowedDocs) {
+    const order = branches.map((_, i) => i);
+    order.sort((a, b) => branchPostingLengths[a] - branchPostingLengths[b] || a - b);
+    const docIds = collectBranch(branches[order[0]], allowedDocs);
+    for (let i = 1; i < order.length; i++) {
+        if (docIds.size === 0)
+            return docIds;
+        intersectDocIdsInPlace(docIds, collectBranch(branches[order[i]], docIds));
+    }
+    return docIds;
+}
 function collectCombinedDocIds(branches, operator, collectBranch, allowedDocs) {
     if (branches.length === 0)
         return new Set();
@@ -4359,11 +4459,12 @@ function collectCombinedDocIds(branches, operator, collectBranch, allowedDocs) {
     throw new Error(`Invalid combination operator: ${operator}`);
 }
 /**
- * AND: score every branch (with optional docId gate on later branches), then intersect scores.
+ * AND: normally score left-to-right with optional docId gates; for broad-first selective
+ * exact queries, collect the final gate first, then score branches in original order.
  * AND_NOT: score the positive branch only; negated branches are collected as docId sets and
- * subtracted without scoring (avoids term materialization on excluded branches).
+ * subtracted without scoring. Large exact exclusions may collect survivors before positive scoring.
  */
-function executeCombinedBranches(branches, operator, params, executeBranch, collectBranch, allowedDocs, run, estimateBranchPostingLength) {
+function executeCombinedBranches(branches, operator, params, executeBranch, collectBranch, allowedDocs, run, estimateBranchPostingLength, allowTwoPhase = false) {
     var _a;
     if (branches.length === 0)
         return new Map();
@@ -4371,9 +4472,14 @@ function executeCombinedBranches(branches, operator, params, executeBranch, coll
     if (op === 'or') {
         return combineResults(branches.map(branch => executeBranch(branch, allowedDocs)), operator);
     }
-    let result = executeBranch(branches[0], allowedDocs);
-    let gate = docIdsFromResult(result);
     if (op === 'and') {
+        const branchPostingLengths = twoPhasePostingLengths(branches, allowTwoPhase, estimateBranchPostingLength);
+        if (branchPostingLengths != null && shouldUseTwoPhaseAnd(branchPostingLengths, allowedDocs)) {
+            const finalGate = collectAndDocIdsByEstimatedLength(branches, branchPostingLengths, collectBranch, allowedDocs);
+            return executeAndWithFinalGate(branches, finalGate, executeBranch);
+        }
+        let result = executeBranch(branches[0], allowedDocs);
+        let gate = gateFromResult(result);
         const limits = void 0 ;
         const documentCount = params.aggregateContext.documentCount;
         const postingGatePolicy = (_a = void 0 ) !== null && _a !== void 0 ? _a : DEFAULT_POSTING_GATE_POLICY;
@@ -4381,21 +4487,30 @@ function executeCombinedBranches(branches, operator, params, executeBranch, coll
         for (let i = 1; i < branches.length; i++) {
             if (gate.size === 0)
                 return result;
-            const ratioPath = gate.size > maxGateSize;
-            const postingListLength = ratioPath
-                ? estimateBranchPostingLength === null || estimateBranchPostingLength === void 0 ? void 0 : estimateBranchPostingLength(branches[i])
-                : undefined;
+            const absoluteSelective = gate.size <= maxGateSize;
+            const postingListLength = absoluteSelective
+                ? undefined
+                : estimateBranchPostingLength === null || estimateBranchPostingLength === void 0 ? void 0 : estimateBranchPostingLength(branches[i]);
             const selective = gateIsSelectiveEnough(gate.size, documentCount, limits, postingListLength, postingGatePolicy);
-            const branchAllowed = selective ? gate : allowedDocs;
+            const branchAllowed = absoluteSelective || shouldPassGateAsAllowedDocs(selective, gate.size, postingListLength)
+                ? gate
+                : allowedDocs;
             result = combineResults([result, executeBranch(branches[i], branchAllowed)], AND);
-            gate = docIdsFromResult(result);
+            gate = gateFromResult(result);
         }
         return result;
     }
     if (op === 'and_not') {
+        const branchPostingLengths = twoPhasePostingLengths(branches, allowTwoPhase, estimateBranchPostingLength);
+        if (branchPostingLengths != null && shouldUseTwoPhaseAndNot(branchPostingLengths, allowedDocs, params.aggregateContext.documentCount)) {
+            const finalGate = collectCombinedDocIds(branches, operator, collectBranch, allowedDocs);
+            return finalGate.size === 0 ? new Map() : executeBranch(branches[0], finalGate);
+        }
+        const result = executeBranch(branches[0], allowedDocs);
+        let gate = gateFromResult(result);
         for (let i = 1; i < branches.length; i++) {
             subtractDocIdsFromResult(result, collectBranch(branches[i], gate));
-            gate = docIdsFromResult(result);
+            gate = gateFromResult(result);
         }
         return result;
     }
@@ -4491,7 +4606,7 @@ function executeQueryInternal(query, searchOptions, params, allowedDocs, run) {
     const { specs, operator } = normalized;
     const combineWith = (operator !== null && operator !== void 0 ? operator : params.globalSearchOptions.combineWith);
     if (useGatedEvaluation(run, specs.length, combineWith, false)) {
-        return executeCombinedBranches(specs, combineWith, params, (spec, branchAllowed) => executeQuerySpecInternal(spec, normalized, params, branchAllowed), (spec, branchAllowed) => collectDocIdsForQuerySpec(spec, normalized, params, branchAllowed), allowedDocs, run, spec => estimateMaxPostingLengthForQuerySpec(spec, normalized, params));
+        return executeCombinedBranches(specs, combineWith, params, (spec, branchAllowed) => executeQuerySpecInternal(spec, normalized, params, branchAllowed), (spec, branchAllowed) => collectDocIdsForQuerySpec(spec, normalized, params, branchAllowed), allowedDocs, run, spec => estimateMaxPostingLengthForQuerySpec(spec, normalized, params), specs.every(spec => !spec.prefix && !spec.fuzzy));
     }
     const results = specs.map(spec => executeQuerySpecInternal(spec, normalized, params, allowedDocs));
     return combineResults(results, combineWith);
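
The two-phase AND path added in this file collects the final doc-id gate from the shortest estimated posting first, then scores branches in query order against that gate. The sketch below illustrates the idea on toy data. It is not the package's code: `MIN_LENGTH` and `RATIO_SHIFT` are assumed values (the real `DEFAULT_POSTING_GATE_MIN_LENGTH` and `DEFAULT_POSTING_GATE_RATIO_SHIFT` are not shown in this diff), and the constant per-branch `weight` stands in for real scoring.

```javascript
// Assumed constants; the package's real values are not visible in this diff.
const MIN_LENGTH = 1024
const RATIO_SHIFT = 3

// Use two phases when the first branch is large and a later one is far smaller.
function shouldUseTwoPhaseAnd(lengths) {
  if (lengths.length <= 1 || lengths[0] < MIN_LENGTH) return false
  const target = lengths[0] >>> RATIO_SHIFT
  return lengths.slice(1).some(len => len > 0 && len <= target)
}

// Phase 1: intersect doc-id sets, smallest posting first (ties keep query order).
function collectFinalGate(branches) {
  const order = branches.map((_, i) => i)
  order.sort((a, b) => branches[a].docIds.length - branches[b].docIds.length || a - b)
  const gate = new Set(branches[order[0]].docIds)
  for (let i = 1; i < order.length && gate.size > 0; i++) {
    const other = new Set(branches[order[i]].docIds)
    for (const id of gate) if (!other.has(id)) gate.delete(id)
  }
  return gate
}

// Toy scorer: every matching doc gets the branch weight.
function scoreBranch(branch, allowed) {
  const scores = new Map()
  for (const id of branch.docIds) {
    if (allowed == null || allowed.has(id)) scores.set(id, branch.weight)
  }
  return scores
}

// Phase 2: score branches in query order, restricted to the final gate.
function twoPhaseAnd(branches) {
  const gate = collectFinalGate(branches)
  const result = new Map()
  for (const id of gate) result.set(id, 0)
  for (const branch of branches) {
    for (const [id, s] of scoreBranch(branch, gate)) result.set(id, result.get(id) + s)
  }
  return result
}

// Reference: score every branch fully, then intersect.
function naiveAnd(branches) {
  let result = scoreBranch(branches[0], null)
  for (const branch of branches.slice(1)) {
    const next = scoreBranch(branch, null)
    const merged = new Map()
    for (const [id, s] of result) if (next.has(id)) merged.set(id, s + next.get(id))
    result = merged
  }
  return result
}

const branches = [
  { docIds: Array.from({ length: 4096 }, (_, i) => i), weight: 0.5 }, // broad first branch
  { docIds: [10, 20, 5000], weight: 2 },                              // selective later branch
]
const useTwoPhase = shouldUseTwoPhaseAnd(branches.map(b => b.docIds.length))
const fast = twoPhaseAnd(branches)
const slow = naiveAnd(branches)
```

In this toy run the broad branch is scored for 2 documents instead of 4,096, and both strategies produce the same scores for the surviving ids. That matches the changelog's claim that parity with score-then-intersect is unchanged.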

package/dist/es/index.js CHANGED

@@ -33,6 +33,21 @@ function gateIsSelectiveEnough(gateSize, documentCount, limits = DEFAULT_AND_GAT
     }
     return false;
 }
+/** True when passing gate as allowedDocs can skip docs vs scanning the full branch posting. */
+function gateFilterShrinksScan(gateSize, postingListLength) {
+    return postingListLength > gateSize;
+}
+/**
+ * Whether to pass the AND gate as allowedDocs to the next branch (perf only; scores unchanged if false).
+ * Distinct from gateIsSelectiveEnough: a selective gate may still be too large to filter a short posting.
+ */
+function shouldPassGateAsAllowedDocs(selective, gateSize, postingListLength) {
+    if (!selective || gateSize === 0)
+        return false;
+    if (postingListLength == null || postingListLength <= 0)
+        return false;
+    return gateFilterShrinksScan(gateSize, postingListLength);
+}
 const MAX_FREQ = 65535;
 function readDocId(docIds, index) {
@@ -2134,6 +2149,16 @@ function createFrozenFieldTermFlyweight(layout) {
     return flyweight;
 }
 function collectDocIdsFromFrozenSegment(allDocIds, offset, length, context, docIds, allowedDocs) {
+    if (allowedDocs != null && shouldSeekAllowedDocs(allowedDocs.size, length)) {
+        for (const docId of allowedDocs) {
+            if (context.isDocActive != null && !context.isDocActive(docId))
+                continue;
+            if (findDocIndexInSortedSegment(allDocIds, offset, length, docId) >= 0) {
+                docIds.add(docId);
+            }
+        }
+        return;
+    }
     for (let i = 0; i < length; i++) {
         const docId = readDocId(allDocIds, offset + i);
         if (context.isDocActive != null && !context.isDocActive(docId))
@@ -4123,8 +4148,18 @@ function buildFrozenParamsFromDocuments(documents, options) {
 function useGatedEvaluation(run, branchCount, operator, hasWildcard) {
     return shouldUseGatedEvaluation(branchCount, operator, hasWildcard);
 }
-function docIdsFromResult(result) {
-    return new Set(result.keys());
+function gateFromResult(result) {
+    return {
+        get size() {
+            return result.size;
+        },
+        has(docId) {
+            return result.has(docId);
+        },
+        [Symbol.iterator]() {
+            return result.keys();
+        },
+    };
 }
 function isQueryCombination(query) {
     return typeof query === 'object'
@@ -4196,6 +4231,7 @@ function normalizeStringQuery(query, searchOptions, params) {
 function lazyIndexedTerm(indexView, termIndex) {
     return { kind: 'lazy', resolve: () => indexView.resolveTermByIndex(termIndex) };
 }
+const TWO_PHASE_AND_NOT_MIN_FRACTION = 0.5;
 function forEachQuerySpecTermRef(query, normalized, params, visit) {
     const { indexView } = params;
     const { options } = normalized;
@@ -4326,6 +4362,70 @@ function subtractDocIdsFromResult(result, excludedDocIds) {
     for (const docId of excludedDocIds)
         result.delete(docId);
 }
+function twoPhasePostingLengths(branches, allowTwoPhase, estimateBranchPostingLength) {
+    if (!allowTwoPhase || estimateBranchPostingLength == null)
+        return undefined;
+    const lengths = new Array(branches.length);
+    for (let i = 0; i < branches.length; i++) {
+        lengths[i] = estimateBranchPostingLength(branches[i]);
+    }
+    return lengths;
+}
+function shouldUseTwoPhaseAnd(branchPostingLengths, allowedDocs) {
+    if (branchPostingLengths.length <= 1)
+        return false;
+    const firstLength = branchPostingLengths[0];
+    const effectiveFirstLength = allowedDocs == null
+        ? firstLength
+        : Math.min(firstLength, allowedDocs.size);
+    if (effectiveFirstLength < DEFAULT_POSTING_GATE_MIN_LENGTH)
+        return false;
+    const targetLength = effectiveFirstLength >>> DEFAULT_POSTING_GATE_RATIO_SHIFT;
+    for (let i = 1; i < branchPostingLengths.length; i++) {
+        const len = branchPostingLengths[i];
+        if (len > 0 && len <= targetLength)
+            return true;
+    }
+    return false;
+}
+function shouldUseTwoPhaseAndNot(branchPostingLengths, allowedDocs, documentCount) {
+    if (branchPostingLengths.length <= 1)
+        return false;
+    const firstLength = branchPostingLengths[0];
+    const effectiveFirstLength = allowedDocs == null
+        ? firstLength
+        : Math.min(firstLength, allowedDocs.size);
+    const largeThreshold = Math.max(DEFAULT_POSTING_GATE_MIN_LENGTH, Math.floor(documentCount * TWO_PHASE_AND_NOT_MIN_FRACTION));
+    if (effectiveFirstLength < largeThreshold)
+        return false;
+    for (let i = 1; i < branchPostingLengths.length; i++) {
+        if (branchPostingLengths[i] >= largeThreshold)
+            return true;
+    }
+    return false;
+}
+function executeAndWithFinalGate(branches, finalGate, executeBranch) {
+    if (finalGate.size === 0)
+        return new Map();
+    let result = executeBranch(branches[0], finalGate);
+    for (let i = 1; i < branches.length; i++) {
+        if (result.size === 0)
+            return result;
+        result = combineResults([result, executeBranch(branches[i], finalGate)], AND);
+    }
+    return result;
+}
+function collectAndDocIdsByEstimatedLength(branches, branchPostingLengths, collectBranch, allowedDocs) {
+    const order = branches.map((_, i) => i);
+    order.sort((a, b) => branchPostingLengths[a] - branchPostingLengths[b] || a - b);
+    const docIds = collectBranch(branches[order[0]], allowedDocs);
+    for (let i = 1; i < order.length; i++) {
+        if (docIds.size === 0)
+            return docIds;
+        intersectDocIdsInPlace(docIds, collectBranch(branches[order[i]], docIds));
+    }
+    return docIds;
+}
 function collectCombinedDocIds(branches, operator, collectBranch, allowedDocs) {
     if (branches.length === 0)
         return new Set();
@@ -4355,11 +4455,12 @@ function collectCombinedDocIds(branches, operator, collectBranch, allowedDocs) {
     throw new Error(`Invalid combination operator: ${operator}`);
 }
 /**
- * AND: score every branch (with optional docId gate on later branches), then intersect scores.
+ * AND: normally score left-to-right with optional docId gates; for broad-first selective
+ * exact queries, collect the final gate first, then score branches in original order.
  * AND_NOT: score the positive branch only; negated branches are collected as docId sets and
- * subtracted without scoring (avoids term materialization on excluded branches).
+ * subtracted without scoring. Large exact exclusions may collect survivors before positive scoring.
  */
-function executeCombinedBranches(branches, operator, params, executeBranch, collectBranch, allowedDocs, run, estimateBranchPostingLength) {
+function executeCombinedBranches(branches, operator, params, executeBranch, collectBranch, allowedDocs, run, estimateBranchPostingLength, allowTwoPhase = false) {
     var _a;
     if (branches.length === 0)
         return new Map();
@@ -4367,9 +4468,14 @@ function executeCombinedBranches(branches, operator, params, executeBranch, coll
     if (op === 'or') {
         return combineResults(branches.map(branch => executeBranch(branch, allowedDocs)), operator);
     }
-    let result = executeBranch(branches[0], allowedDocs);
-    let gate = docIdsFromResult(result);
     if (op === 'and') {
+        const branchPostingLengths = twoPhasePostingLengths(branches, allowTwoPhase, estimateBranchPostingLength);
+        if (branchPostingLengths != null && shouldUseTwoPhaseAnd(branchPostingLengths, allowedDocs)) {
+            const finalGate = collectAndDocIdsByEstimatedLength(branches, branchPostingLengths, collectBranch, allowedDocs);
+            return executeAndWithFinalGate(branches, finalGate, executeBranch);
+        }
+        let result = executeBranch(branches[0], allowedDocs);
+        let gate = gateFromResult(result);
         const limits = void 0 ;
         const documentCount = params.aggregateContext.documentCount;
         const postingGatePolicy = (_a = void 0 ) !== null && _a !== void 0 ? _a : DEFAULT_POSTING_GATE_POLICY;
@@ -4377,21 +4483,30 @@ function executeCombinedBranches(branches, operator, params, executeBranch, coll
         for (let i = 1; i < branches.length; i++) {
             if (gate.size === 0)
                 return result;
-            const ratioPath = gate.size > maxGateSize;
-            const postingListLength = ratioPath
-                ? estimateBranchPostingLength === null || estimateBranchPostingLength === void 0 ? void 0 : estimateBranchPostingLength(branches[i])
-                : undefined;
+            const absoluteSelective = gate.size <= maxGateSize;
+            const postingListLength = absoluteSelective
+                ? undefined
+                : estimateBranchPostingLength === null || estimateBranchPostingLength === void 0 ? void 0 : estimateBranchPostingLength(branches[i]);
             const selective = gateIsSelectiveEnough(gate.size, documentCount, limits, postingListLength, postingGatePolicy);
-            const branchAllowed = selective ? gate : allowedDocs;
+            const branchAllowed = absoluteSelective || shouldPassGateAsAllowedDocs(selective, gate.size, postingListLength)
+                ? gate
+                : allowedDocs;
             result = combineResults([result, executeBranch(branches[i], branchAllowed)], AND);
-            gate = docIdsFromResult(result);
+            gate = gateFromResult(result);
         }
         return result;
     }
     if (op === 'and_not') {
+        const branchPostingLengths = twoPhasePostingLengths(branches, allowTwoPhase, estimateBranchPostingLength);
+        if (branchPostingLengths != null && shouldUseTwoPhaseAndNot(branchPostingLengths, allowedDocs, params.aggregateContext.documentCount)) {
+            const finalGate = collectCombinedDocIds(branches, operator, collectBranch, allowedDocs);
+            return finalGate.size === 0 ? new Map() : executeBranch(branches[0], finalGate);
+        }
+        const result = executeBranch(branches[0], allowedDocs);
+        let gate = gateFromResult(result);
         for (let i = 1; i < branches.length; i++) {
             subtractDocIdsFromResult(result, collectBranch(branches[i], gate));
-            gate = docIdsFromResult(result);
+            gate = gateFromResult(result);
         }
         return result;
     }
@@ -4487,7 +4602,7 @@ function executeQueryInternal(query, searchOptions, params, allowedDocs, run) {
     const { specs, operator } = normalized;
     const combineWith = (operator !== null && operator !== void 0 ? operator : params.globalSearchOptions.combineWith);
     if (useGatedEvaluation(run, specs.length, combineWith, false)) {
-        return executeCombinedBranches(specs, combineWith, params, (spec, branchAllowed) => executeQuerySpecInternal(spec, normalized, params, branchAllowed), (spec, branchAllowed) => collectDocIdsForQuerySpec(spec, normalized, params, branchAllowed), allowedDocs, run, spec => estimateMaxPostingLengthForQuerySpec(spec, normalized, params));
+        return executeCombinedBranches(specs, combineWith, params, (spec, branchAllowed) => executeQuerySpecInternal(spec, normalized, params, branchAllowed), (spec, branchAllowed) => collectDocIdsForQuerySpec(spec, normalized, params, branchAllowed), allowedDocs, run, spec => estimateMaxPostingLengthForQuerySpec(spec, normalized, params), specs.every(spec => !spec.prefix && !spec.fuzzy));
     }
     const results = specs.map(spec => executeQuerySpecInternal(spec, normalized, params, allowedDocs));
     return combineResults(results, combineWith);
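
The broad-first AND decision added in this hunk can be illustrated in isolation. The sketch below mirrors the logic of `shouldUseTwoPhaseAnd` and the sort in `collectAndDocIdsByEstimatedLength` as they appear in the diff. The values of `DEFAULT_POSTING_GATE_MIN_LENGTH` and `DEFAULT_POSTING_GATE_RATIO_SHIFT` are not visible here, so `ASSUMED_MIN_LENGTH` and `ASSUMED_RATIO_SHIFT` are hypothetical stand-ins, and the function names ending in `Sketch` are illustrative only.

```javascript
// Hypothetical stand-ins: the package's real constants are defined elsewhere
// and are not shown in this diff.
const ASSUMED_MIN_LENGTH = 1024;
const ASSUMED_RATIO_SHIFT = 3; // length >>> 3 is one eighth of the first branch

// Mirrors the decision in the diff: take the two-phase path only when the
// first branch is large and some later branch is much smaller (but non-empty).
function shouldUseTwoPhaseAndSketch(branchPostingLengths, allowedDocs) {
    if (branchPostingLengths.length <= 1)
        return false;
    const firstLength = branchPostingLengths[0];
    const effectiveFirstLength = allowedDocs == null
        ? firstLength
        : Math.min(firstLength, allowedDocs.size);
    if (effectiveFirstLength < ASSUMED_MIN_LENGTH)
        return false;
    const targetLength = effectiveFirstLength >>> ASSUMED_RATIO_SHIFT;
    return branchPostingLengths
        .slice(1)
        .some(len => len > 0 && len <= targetLength);
}

// Order in which branches are collected to build the final gate:
// shortest estimated posting first, ties broken by query position.
function collectionOrderSketch(branchPostingLengths) {
    return branchPostingLengths
        .map((_, i) => i)
        .sort((a, b) => branchPostingLengths[a] - branchPostingLengths[b] || a - b);
}

console.log(shouldUseTwoPhaseAndSketch([8000, 500], null));
console.log(collectionOrderSketch([8000, 500, 500]));
```

Under these assumed constants, a first branch of 8000 postings with a later branch of 500 qualifies, while a later branch of 4000 does not. Scoring still runs in query order afterwards, which is how the diff's `executeAndWithFinalGate` is written.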

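The replacement of `docIdsFromResult` with `gateFromResult` swaps a copied `Set` for a thin view over the result `Map`. The standalone sketch below reproduces that view under a hypothetical name (`gateViewSketch`) with made-up doc ids and scores, to show the practical difference: the view allocates no second collection and reflects later changes to the underlying map.

```javascript
// Same shape as the gate object in the diff: size, has, and iteration are
// all delegated to the underlying Map of docId -> score.
function gateViewSketch(result) {
    return {
        get size() {
            return result.size;
        },
        has(docId) {
            return result.has(docId);
        },
        [Symbol.iterator]() {
            return result.keys();
        },
    };
}

// Illustrative data only.
const scores = new Map([[1, 0.5], [4, 1.2], [9, 0.3]]);

const copiedGate = new Set(scores.keys()); // old approach: eager copy
const lazyGate = gateViewSketch(scores);   // new approach: live view

scores.delete(4);

console.log(copiedGate.size, lazyGate.size);
console.log([...lazyGate]);
```

After the deletion the copied set still holds three ids while the view reports two. In the diff the gate is rebuilt after each combine step anyway, so the main effect there is likely the avoided copy rather than the live behaviour.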
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@yoch/frozenminisearch",
-  "version": "1.2.2",
+  "version": "1.2.3",
   "description": "Read-only Node.js full-text search — compact frozen indexes and binary snapshots",
   "main": "dist/cjs/index.cjs",
   "module": "dist/es/index.js",