ralph-hero-knowledge-index 0.1.32 → 0.1.33

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  {
    "name": "ralph-knowledge",
-   "version": "0.1.32",
+   "version": "0.1.33",
    "description": "Knowledge graph for ralph-hero: semantic search, relationship traversal, and document indexing across thoughts/ documents. Optional companion to ralph-hero.",
    "author": {
      "name": "Chad Dubiel",
package/.mcp.json CHANGED
@@ -2,7 +2,7 @@
    "mcpServers": {
      "ralph-knowledge": {
        "command": "npx",
-       "args": ["-y", "ralph-hero-knowledge-index@0.1.32"]
+       "args": ["-y", "ralph-hero-knowledge-index@0.1.33"]
      }
    }
  }
@@ -11,7 +11,8 @@ will not break the CI matrix on Node 18/20/22.
  ## Running
 
  Each script is a standalone TypeScript file that can be run directly with
- `tsx` (already a transitive devDependency via `vitest` no install required):
+ `tsx` (declared as a `devDependency` in `package.json`, installed by
+ `npm ci`):
 
  ```bash
  # From repo root or plugin/ralph-knowledge:
@@ -19,6 +20,9 @@ npx tsx benchmark/reranker-bench.ts
 
  # Or, equivalently, with the node loader form:
  node --import tsx benchmark/reranker-bench.ts
+
+ # Or via the npm script (used by CI for the heap bench):
+ npm run bench:heap -- --assert
  ```
 
  Scripts read the same `RALPH_KNOWLEDGE_DB` env var as the MCP server, so by
@@ -48,3 +52,88 @@ the entire run.
  The script is purely additive — it does not modify `hybrid-search.ts` or any
  production source file. Production wiring of a default reranker is a separate
  followup gated on the benchmark findings.
+
+ ### `reindex-heap-bench.ts` (GH-913)
+
+ Microbenchmark guarding the OOM fix from #907 (#911 embedder tensor disposal,
+ #916 chunker forward-progress). Generates a deterministic 50-doc / ~240-chunk
+ synthetic corpus in a tmp dir via a seeded `mulberry32` RNG, runs `reindex()`
+ against it with `RALPH_CONTEXTUAL_RETRIEVAL=0`, samples
+ `process.memoryUsage()` every 100 ms, and writes a TSV row with peak
+ `heap_used`, `rss`, `external`, wall clock, and chunk count. (The reranker
+ bench measures cold-start; the heap bench does not, because `reindex()`
+ exposes no hook to mark the moment when the embedding model finishes loading.)
+
+ ```bash
+ # Run once, write TSV row, no exit-1 behavior:
+ npx tsx benchmark/reindex-heap-bench.ts
+
+ # Same, but exit 1 if peak_heap_used > 600 MB or peak_rss > 800 MB:
+ npx tsx benchmark/reindex-heap-bench.ts --assert
+
+ # Same as above but via the npm script (used by CI in build-and-test-knowledge):
+ npm run bench:heap -- --assert
+ ```
+
+ Results are appended one row per run to `benchmark/results-YYYY-MM-DD.tsv`
+ (history-preserving — re-running the bench during a tuning session adds rows
+ under the same header rather than overwriting). The TSV header is:
+
+ ```
+ date doc_count chunk_count wall_clock_s peak_heap_used_mb peak_rss_mb peak_external_mb threshold_pass notes
+ ```
+
+ Default thresholds (sourced from
+ [2026-04-29-reindex-memory-profile.md](../../../thoughts/shared/research/2026-04-29-reindex-memory-profile.md)):
+
+ | Threshold | Value | Rationale |
+ |----------------------|-------|--------------------------------------------------------------------------------------------------------------------------------------------|
+ | `peak_heap_used_mb` | 600 | Catches catastrophic regrowth (the original OOM was 4 GB+); ~12x margin over today's typical ~30-50 MB on the 50-doc bench corpus. |
+ | `peak_rss_mb` | 800 | Catches transformer-model bloat or external-buffer growth; ~1.6-2x margin over today's typical ~400-450 MB on the 50-doc bench corpus. |
+
+ **Tuning the thresholds**: open the TSV results history, find the
+ 95th-percentile `peak_heap_used_mb` across the last ~10 runs on your CI
+ hardware, multiply by 2. That yields a regression-detection threshold without
+ flakiness from per-run jitter.
+
+ #### Manually verifying the bench fails on a regression
+
+ The intuition behind the bench is: **a regression that re-introduces
+ unbounded transient allocation will push one of the three peak metrics
+ (`heap_used`, `rss`, `external`) far above today's baseline**. The TSV
+ records all three so a tuning session can pick the right metric for the
+ regression class being guarded.
+
+ To confirm the bench's `--assert` path works end-to-end, force a synthetic
+ breach by temporarily lowering one of the thresholds in
+ `benchmark/reindex-heap-bench.ts`:
+
+ ```bash
+ # In benchmark/reindex-heap-bench.ts, temporarily set:
+ # const HEAP_THRESHOLD_MB = 30; // below today's ~40 MB baseline
+ # (or)
+ # const RSS_THRESHOLD_MB = 300; // below today's ~450 MB baseline
+
+ npx tsx benchmark/reindex-heap-bench.ts --assert
+ # expected: exit code 1, console line:
+ # reindex-heap-bench: ASSERT FAIL — THRESHOLD BREACH: heap_used 41.2 > 30
+
+ # Restore the threshold (revert benchmark/reindex-heap-bench.ts).
+ ```
+
+ Do **NOT** commit the threshold change — it's a one-time confirmation that
+ the assertion path works end-to-end. The bench script itself is purely
+ additive and never modifies `embedder.ts`/`chunker.ts`/`reindex.ts`.
+
+ **Note on the dispose() regression**: an earlier draft of this section
+ suggested reverting `output.dispose()` in `src/embedder.ts` to verify the
+ bench catches the original GH-911 OOM. Empirically, on the 50-doc / ~240-chunk
+ synthetic corpus, removing the dispose call leaves `peak_heap_used_mb`
+ unchanged (~41 MB) and only adds ~3x to `peak_external_mb` (~21 MB -> ~65 MB).
+ The original OOM manifested at the live ~14k-chunk corpus scale, not at this
+ bench's scale. The bench therefore guards against **catastrophic
+ regressions** (a 10x+ allocation increase that crosses the 600 MB / 800 MB
+ margins) rather than the specific dispose() leak — which would need a much
+ larger synthetic corpus to be detectable. The `peak_external_mb` column is
+ recorded in the TSV for future tuning if a tighter native-buffer guard
+ becomes worth the added bench runtime.
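Editor's aside: the threshold-tuning recipe in the added README text above (95th percentile of the last ~10 runs, times 2) can be sketched as a small standalone helper. This is an illustrative sketch, not code from the package; the nearest-rank p95 choice and the function name are assumptions, since the note does not prescribe an interpolation method:

```typescript
// Illustrative sketch of the tuning recipe from the README text above:
// take the 95th-percentile peak_heap_used_mb across recent bench runs and
// multiply by 2 to get a jitter-tolerant regression threshold.
// Nearest-rank percentile is used here for simplicity (an assumption).
function tunedThresholdMb(peaksMb: number[], factor = 2): number {
  const sorted = [...peaksMb].sort((a, b) => a - b);
  const idx = Math.ceil(0.95 * sorted.length) - 1; // nearest-rank p95
  return sorted[idx] * factor;
}

// Ten hypothetical peak_heap_used_mb values from a TSV results history:
const history = [40.1, 41.2, 39.8, 42.0, 44.5, 41.0, 40.4, 43.1, 41.7, 40.9];
console.log(tunedThresholdMb(history));
```

With ~10 samples the nearest-rank p95 is simply the largest observed peak, so the helper degrades gracefully to "2x the worst run seen" — consistent with the README's goal of catching only catastrophic regressions.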
@@ -0,0 +1,369 @@
+ /**
+  * GH-913 — Heap-regression microbenchmark for reindex().
+  *
+  * Generates a deterministic 50-doc synthetic corpus, runs the production
+  * reindex() against it with RALPH_CONTEXTUAL_RETRIEVAL=0 and a 100 ms heap
+  * sampler, then writes a TSV row with peak heap_used, peak RSS, peak external,
+  * wall-clock, and chunk count.
+  *
+  * Guards the OOM fix from #907 (#911 + #916). A regression that re-introduces
+  * catastrophic transient allocation (10x+ over today's baseline) will push
+  * peak_heap_used or peak_rss past the configured thresholds and fail
+  * `--assert` (exit 1).
+  *
+  * Run with:
+  *   # Always exits 0; just records the row:
+  *   npx tsx plugin/ralph-knowledge/benchmark/reindex-heap-bench.ts
+  *
+  *   # Exits 1 if peak_heap_used > 600 MB OR peak_rss > 800 MB:
+  *   npx tsx plugin/ralph-knowledge/benchmark/reindex-heap-bench.ts --assert
+  */
+ import { mkdtempSync, writeFileSync, existsSync, appendFileSync } from "node:fs";
+ import { join, dirname } from "node:path";
+ import { tmpdir } from "node:os";
+ import { fileURLToPath } from "node:url";
+ import Database from "better-sqlite3";
+ import { reindex } from "../src/reindex.js";
+
+ const DOC_COUNT = 50;
+ const TARGET_DOC_BYTES = 7 * 1024; // ~7 KB per doc -> ~3-5 chunks each
+ const SAMPLE_INTERVAL_MS = 100;
+
+ /**
+  * Default thresholds — sourced from the GH-910 reindex memory profile note.
+  *
+  * - HEAP_THRESHOLD_MB (600): catches catastrophic regrowth. Pre-#911 the
+  *   per-call retention was ~30 MB transient, climbing to 4 GB+ within ~150
+  *   chunks on the LIVE corpus. Today's typical heap_used on the 50-doc
+  *   bench corpus is ~30-50 MB, so 600 MB gives ~12x margin for the post-#911
+  *   baseline while still failing if a regression causes 10x+ allocation.
+  * - RSS_THRESHOLD_MB (800): catches transformer-model bloat or external-buffer
+  *   growth. Today's typical RSS on the 50-doc bench is ~400-450 MB (mostly
+  *   the transformer model baseline), so 800 MB gives 1.6-2x margin while
+  *   still failing if a regression doubles per-doc RSS pressure.
+  *
+  * Tuning recipe: open the TSV history, find p95 across the last ~10 runs
+  * on your CI hardware, multiply by 2. Avoids per-run jitter flakes.
+  */
+ const HEAP_THRESHOLD_MB = 600;
+ const RSS_THRESHOLD_MB = 800;
+
+ /**
+  * One bench-run row. Columns mirror the reranker-bench convention (one
+  * scalar per metric, trailing free-form `notes` for any partial-failure
+  * or threshold-breach description). `threshold_pass` and `notes` are
+  * populated by `main()` after `runBench()` returns the raw measurements —
+  * keeps the measurement path independent of the threshold-check policy.
+  */
+ interface BenchResult {
+   date: string;
+   doc_count: number;
+   chunk_count: number;
+   wall_clock_s: number;
+   peak_heap_used_mb: number;
+   peak_rss_mb: number;
+   peak_external_mb: number;
+   threshold_pass: boolean;
+   notes: string;
+ }
+
+ /**
+  * Seeded RNG (mulberry32) — deterministic across runs and machines so the
+  * synthetic corpus is reproducible. We DO NOT want jitter from `Math.random()`
+  * in the corpus generator: the bench's value comes from comparing memory
+  * deltas across CI runs of the SAME corpus shape, not from sampling random
+  * corpora.
+  */
+ function mulberry32(seed: number): () => number {
+   let a = seed >>> 0;
+   return function (): number {
+     a = (a + 0x6d2b79f5) >>> 0;
+     let t = a;
+     t = Math.imul(t ^ (t >>> 15), t | 1);
+     t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
+     return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
+   };
+ }
+
+ /**
+  * English filler pool. Small repeating set keeps generation fast and
+  * produces realistic chunker behavior (sentence boundaries trip the
+  * paragraph-aware split heuristic in chunker.ts). Sentences are 60-120
+  * characters each — typical for prose paragraphs.
+  */
+ const SENTENCES: string[] = [
+   "The reindex pipeline ingests markdown files from configured roots.",
+   "Each document is parsed for frontmatter and its body content is extracted.",
+   "Chunking splits long documents into roughly 800 token windows with overlap.",
+   "An embedding is produced for every chunk using a small transformer model.",
+   "The vector index stores normalized float embeddings in a virtual table.",
+   "Full text search is provided by a separate FTS5 index in the same database.",
+   "Hybrid search combines reciprocal rank fusion across the two retrievers.",
+   "Sync records track file modification times to skip unchanged documents.",
+   "Schema version metadata forces a full reindex when the embedding shape changes.",
+   "Stub documents are upserted for outbound relationships that lack a target.",
+   "Contextual retrieval prepends a short context prefix to each chunk before embedding.",
+   "The LLM endpoint is probed once and the run fails open when unreachable.",
+   "Memory tier classification distinguishes raw notes from synthesized reflections.",
+   "Tags are split from frontmatter arrays and indexed separately for filtering.",
+   "Relationships extend wiki style links into typed predicates between documents.",
+   "Untyped edges capture incidental mentions outside an explicit predicate context.",
+   "The chunker walks the document and emits chunks with character offsets.",
+   "Forward progress is enforced so the chunker never returns a zero length chunk.",
+   "Each embedding call disposes the underlying tensor immediately after copy.",
+   "Without disposal the native ONNX buffers retain memory across the await loop.",
+   "Mark compact garbage collection cannot reclaim native memory on its own.",
+   "The accumulator gate prevents unbounded retention during long indexing runs.",
+   "Synthetic corpora isolate heap behavior from live document content drift.",
+   "Reproducibility across machines requires seeded random number generation.",
+   "Wall clock measurements include cold start latency from model download.",
+   "Peak resident set size captures both heap and native buffer pressure.",
+   "External memory in the v8 heap snapshot tracks ArrayBuffer allocations.",
+   "Sampling at one hundred millisecond intervals catches transient spikes.",
+   "TSV results are appended one row per run for tuning and history review.",
+   "Threshold values come from the calibration profile in the prior research note.",
+ ];
+
+ /**
+  * Generate one synthetic markdown doc. Frontmatter is realistic enough to
+  * pass the parseDocument frontmatter check; body is built by sampling the
+  * filler pool until the target byte budget is reached.
+  */
+ function generateSyntheticDoc(rng: () => number, idx: number): string {
+   const fmDate = "2026-05-02";
+   const tier = "research";
+   // Deterministic title: pick three filler-pool prefixes by RNG.
+   const titleSeed = [
+     SENTENCES[Math.floor(rng() * SENTENCES.length)].split(" ").slice(0, 3).join(" "),
+     SENTENCES[Math.floor(rng() * SENTENCES.length)].split(" ").slice(0, 2).join(" "),
+   ].join(" - ");
+   let body = "";
+   while (body.length < TARGET_DOC_BYTES) {
+     body += SENTENCES[Math.floor(rng() * SENTENCES.length)] + " ";
+     // Insert a paragraph break every ~10 sentences so the chunker has
+     // realistic paragraph boundaries to split on.
+     if (body.length % 11 === 0) body += "\n\n";
+   }
+   return `---\ndate: ${fmDate}\ntype: ${tier}\nstatus: draft\n---\n\n# Doc ${idx}: ${titleSeed}\n\n${body}\n`;
+ }
+
+ function generateCorpus(dir: string): void {
+   const rng = mulberry32(0xc0ffee);
+   for (let i = 0; i < DOC_COUNT; i++) {
+     const name = `doc-${String(i).padStart(3, "0")}.md`;
+     writeFileSync(join(dir, name), generateSyntheticDoc(rng, i));
+   }
+ }
+
+ interface HeapSample {
+   heapUsed: number;
+   rss: number;
+   external: number;
+ }
+
+ /**
+  * Start a 100 ms in-process heap sampler. The sampler captures peak values
+  * across the full sampling window (vs. snapshots between docs, which
+  * underestimate transient peaks per the GH-910 profile note). The interval
+  * handle is `unref()`d so it does not pin the event loop alive on its own —
+  * `stop()` clears the interval and returns the accumulated peaks.
+  */
+ function startHeapSampler(): { stop: () => HeapSample } {
+   const peak: HeapSample = { heapUsed: 0, rss: 0, external: 0 };
+   const tick = (): void => {
+     const m = process.memoryUsage();
+     if (m.heapUsed > peak.heapUsed) peak.heapUsed = m.heapUsed;
+     if (m.rss > peak.rss) peak.rss = m.rss;
+     if (m.external > peak.external) peak.external = m.external;
+   };
+   tick();
+   const handle = setInterval(tick, SAMPLE_INTERVAL_MS);
+   handle.unref();
+   return {
+     stop: (): HeapSample => {
+       tick();
+       clearInterval(handle);
+       return peak;
+     },
+   };
+ }
+
+ /**
+  * Query the chunk count from the database directly. Reindex does not
+  * expose this on its return value, but the schema is stable: the
+  * `chunks` table is populated as a side effect of `reindex()` per the
+  * upsert loop in `src/reindex.ts`. Using better-sqlite3 directly keeps
+  * the bench independent of KnowledgeDB's surface — the count survives
+  * any future API drift as long as `chunks(document_id)` exists.
+  */
+ function countChunks(dbPath: string): number {
+   const db = new Database(dbPath, { readonly: true });
+   try {
+     const row = db.prepare("SELECT COUNT(*) AS n FROM chunks").get() as { n: number };
+     return row.n;
+   } catch {
+     // Table not present (e.g., very early reindex failure). Surface as 0
+     // rather than throwing so the TSV row still records the heap data.
+     return 0;
+   } finally {
+     db.close();
+   }
+ }
+
+ async function runBench(): Promise<BenchResult> {
+   process.env.RALPH_CONTEXTUAL_RETRIEVAL = "0";
+
+   const corpusDir = mkdtempSync(join(tmpdir(), "bench-heap-corpus-"));
+   const dbDir = mkdtempSync(join(tmpdir(), "bench-heap-db-"));
+   const dbPath = join(dbDir, "bench.db");
+
+   generateCorpus(corpusDir);
+
+   const sampler = startHeapSampler();
+   // Wall clock spans the entire reindex run: model cold-start, file scan,
+   // chunk loop, and final flush. A separate cold-start metric was removed
+   // (PR #935 review) because reindex() exposes no hook to mark the moment
+   // when model load completes — measuring it from outside the call always
+   // yielded ~0. If a future iteration needs warm vs. cold timing, add an
+   // event hook in reindex.ts and reintroduce the column then.
+   const t0 = performance.now();
+   await reindex([corpusDir], dbPath, false);
+   const elapsed = (performance.now() - t0) / 1000;
+   const peak = sampler.stop();
+
+   const chunkCount = countChunks(dbPath);
+
+   return {
+     date: isoDate(),
+     doc_count: DOC_COUNT,
+     chunk_count: chunkCount,
+     wall_clock_s: Number(elapsed.toFixed(2)),
+     peak_heap_used_mb: Number((peak.heapUsed / 1024 / 1024).toFixed(1)),
+     peak_rss_mb: Number((peak.rss / 1024 / 1024).toFixed(1)),
+     peak_external_mb: Number((peak.external / 1024 / 1024).toFixed(1)),
+     // Threshold check is the caller's responsibility (main() in --assert
+     // mode). Default to neutral values so the row is well-formed even
+     // when a downstream consumer imports runBench() directly.
+     threshold_pass: true,
+     notes: "",
+   };
+ }
+
+ const TSV_HEADERS = [
+   "date",
+   "doc_count",
+   "chunk_count",
+   "wall_clock_s",
+   "peak_heap_used_mb",
+   "peak_rss_mb",
+   "peak_external_mb",
+   "threshold_pass",
+   "notes",
+ ] as const;
+
+ function rowToTsv(r: BenchResult): string {
+   return [
+     r.date,
+     r.doc_count,
+     r.chunk_count,
+     r.wall_clock_s,
+     r.peak_heap_used_mb,
+     r.peak_rss_mb,
+     r.peak_external_mb,
+     r.threshold_pass,
+     r.notes,
+   ].join("\t");
+ }
+
+ /**
+  * Append `rows` to `outPath`. If the file does not exist, write the header
+  * line first; if it does, only append rows. Idempotent: re-invoking the
+  * bench on the same day produces additional rows under the same header,
+  * never duplicate headers. (The reranker-bench overwrites because it runs
+  * a fixed model set sequentially; the heap bench may be invoked multiple
+  * times during a tuning session and history is the point.)
+  */
+ function appendOrCreateTsv(outPath: string, rows: BenchResult[]): void {
+   const lines = rows.map(rowToTsv);
+   if (existsSync(outPath)) {
+     appendFileSync(outPath, lines.join("\n") + "\n", "utf8");
+   } else {
+     writeFileSync(outPath, TSV_HEADERS.join("\t") + "\n" + lines.join("\n") + "\n", "utf8");
+   }
+ }
+
+ function printSummary(r: BenchResult): void {
+   console.log("\n=== Reindex Heap Benchmark Result ===");
+   console.log(` date : ${r.date}`);
+   console.log(` doc_count : ${r.doc_count}`);
+   console.log(` chunk_count : ${r.chunk_count}`);
+   console.log(` wall_clock_s : ${r.wall_clock_s}`);
+   console.log(` peak_heap_used_mb : ${r.peak_heap_used_mb}`);
+   console.log(` peak_rss_mb : ${r.peak_rss_mb}`);
+   console.log(` peak_external_mb : ${r.peak_external_mb}`);
+   console.log(` threshold_pass : ${r.threshold_pass}`);
+   console.log(` notes : ${r.notes}`);
+   console.log("");
+ }
+
+ function isoDate(): string {
+   return new Date().toISOString().slice(0, 10); // YYYY-MM-DD
+ }
+
+ export async function main(): Promise<void> {
+   const args = process.argv.slice(2);
+   const assertMode = args.includes("--assert");
+
+   console.log(`reindex-heap-bench: generating ${DOC_COUNT}-doc synthetic corpus...`);
+   const result = await runBench();
+
+   // Threshold check — applied unconditionally so the TSV row always records
+   // pass/fail, but only --assert turns a breach into a non-zero exit.
+   const heapBreach = result.peak_heap_used_mb > HEAP_THRESHOLD_MB;
+   const rssBreach = result.peak_rss_mb > RSS_THRESHOLD_MB;
+   result.threshold_pass = !heapBreach && !rssBreach;
+   if (heapBreach || rssBreach) {
+     const breaches: string[] = [];
+     if (heapBreach) {
+       breaches.push(`heap_used ${result.peak_heap_used_mb} > ${HEAP_THRESHOLD_MB}`);
+     }
+     if (rssBreach) {
+       breaches.push(`rss ${result.peak_rss_mb} > ${RSS_THRESHOLD_MB}`);
+     }
+     result.notes = `THRESHOLD BREACH: ${breaches.join("; ")}`;
+   } else {
+     result.notes = "ok";
+   }
+
+   // Always write TSV — useful for tuning even when an --assert run aborts.
+   const here = dirname(fileURLToPath(import.meta.url));
+   const outPath = join(here, `results-${isoDate()}.tsv`);
+   appendOrCreateTsv(outPath, [result]);
+   console.log(`reindex-heap-bench: wrote ${outPath}`);
+
+   printSummary(result);
+
+   // Exit 1 ONLY when --assert was passed AND a threshold breached. Without
+   // --assert, a breach still appears in the TSV `notes` column so a tuning
+   // session can review history without aborting.
+   //
+   // Use `process.exitCode` (not `process.exit()`) so the event loop drains
+   // and native bindings (better-sqlite3, transformers.js ONNX runtime) tear
+   // down cleanly. A hard `process.exit(1)` here causes a libc++ abort during
+   // ONNX teardown that returns 134 (SIGABRT) instead of 1.
+   if (assertMode && !result.threshold_pass) {
+     console.error(`reindex-heap-bench: ASSERT FAIL — ${result.notes}`);
+     process.exitCode = 1;
+   }
+ }
+
+ // Top-level runner — only executes when this file is invoked directly,
+ // not when imported. We use endsWith() over the tsx source path because tsx
+ // (the runner) sets process.argv[1] to the .ts file directly.
+ const invokedDirectly = process.argv[1]?.endsWith("reindex-heap-bench.ts");
+ if (invokedDirectly) {
+   main().catch((e) => {
+     console.error("reindex-heap-bench: fatal error", e);
+     process.exit(1);
+   });
+ }
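Editor's aside: the dispose() note in the added README text above turns on the distinction between V8 heap and external (native-backed) memory, which the bench records as separate TSV columns. A minimal standalone demonstration of why a native-buffer leak moves `external` rather than `heap_used` — not package code, and the exact deltas are machine- and Node-version-dependent:

```typescript
// Large ArrayBuffer backing stores live outside the V8 heap, so they show
// up in process.memoryUsage().external (and .arrayBuffers), not .heapUsed —
// which is why an undisposed-tensor leak inflates peak_external_mb long
// before it registers in peak_heap_used_mb at small corpus scale.
const before = process.memoryUsage();
const buffers: ArrayBuffer[] = [];
for (let i = 0; i < 8; i++) {
  buffers.push(new ArrayBuffer(8 * 1024 * 1024)); // 64 MB of native backing stores
}
const after = process.memoryUsage();

const externalDeltaMb = (after.external - before.external) / 1024 / 1024;
const heapDeltaMb = (after.heapUsed - before.heapUsed) / 1024 / 1024;
console.log(`external grew ~${externalDeltaMb.toFixed(0)} MB, heapUsed grew ~${heapDeltaMb.toFixed(0)} MB`);
void buffers.length; // keep the buffers referenced through the second sample
```

The external delta is roughly the full 64 MB while the heap delta stays near zero (only the 8 object handles land on the V8 heap), mirroring the ~21 MB -> ~65 MB `peak_external_mb` shift the README note reports for the dispose() revert.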
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "ralph-hero-knowledge-index",
-   "version": "0.1.32",
+   "version": "0.1.33",
    "type": "module",
    "main": "dist/index.js",
    "bin": {
@@ -16,6 +16,7 @@
      "start": "node dist/index.js",
      "reindex": "node dist/reindex.js",
      "test": "vitest run",
+     "bench:heap": "tsx benchmark/reindex-heap-bench.ts",
      "prepublishOnly": "npm run build"
    },
    "dependencies": {
@@ -39,6 +40,7 @@
    "devDependencies": {
      "@types/better-sqlite3": "^7.6.13",
      "@types/node": "^22.0.0",
+     "tsx": "^4.21.0",
      "typescript": "^5.7.0",
      "vitest": "^4.0.0"
    }