omnivad 0.2.5 → 0.2.9

This diff shows the content of publicly released package versions as published to a supported registry. It is provided for informational purposes only and reflects the changes between versions as they appear in the public registry.
package/README.md ADDED
@@ -0,0 +1,184 @@
1
+ # omnivad
2
+
3
+ [![npm](https://img.shields.io/npm/v/omnivad)](https://www.npmjs.com/package/omnivad)
4
+ [![npm bundle size](https://img.shields.io/bundlephobia/min/omnivad)](https://bundlephobia.com/package/omnivad)
5
+ [![license](https://img.shields.io/npm/l/omnivad)](https://github.com/lifeiteng/OmniVAD-Kit/blob/main/LICENSE)
6
+
7
+ Cross-platform Voice Activity Detection and Audio Event Detection via WebAssembly.
8
+ Runs in **browsers, Web Workers, and Node.js** with a single API. Zero runtime
9
+ dependencies. Built on [FireRedVAD](https://github.com/FireRedTeam/FireRedVAD)
10
+ from Xiaohongshu (DFSMN architecture, ~2.2 MB per model).
11
+
12
+ ## What's in the box
13
+
14
+ | Class | Use case | Output |
15
+ |-------|----------|--------|
16
+ | **`OmniVAD`** | Whole-audio voice activity detection | `[start, end]` timestamps |
17
+ | **`OmniStreamVAD`** | Real-time, frame-by-frame VAD with segment-boundary events | per-frame probability + start/end events |
18
+ | **`OmniAED`** | Audio event detection (3-class) | `speech` / `singing` / `music` timestamps |
19
+ | **`mergeChunks`** | Pack VAD output into Whisper-style 30 s chunks | `{ start, end, segStartIdx, segCount }[]` |
20
+
21
+ All four share one WASM module (~2.2 MB SIMD-enabled), one C implementation,
22
+ and a single bundle (~24 KB JS, ESM + CJS + types).
23
+
24
+ ## Install
25
+
26
+ ```bash
27
+ pnpm add omnivad # or: npm install omnivad / yarn add omnivad
28
+ ```
29
+
30
+ Models are served from jsDelivr by default (zero config). For air-gapped or
31
+ custom deployments, pass `modelUrl` or pre-loaded `modelData`.
32
+
33
+ ## Quickstart — whole-audio VAD
34
+
35
+ ```ts
36
+ import { OmniVAD } from "omnivad";
37
+
38
+ const vad = await OmniVAD.create();
39
+
40
+ // Float32Array in [-1, 1] (Web Audio, decodeAudioData) or Int16Array (raw PCM)
41
+ const result = vad.detect(audioFloat32);
42
+ // { duration: 12.4, timestamps: [[0.35, 4.8], [5.1, 12.4]] }
43
+ ```
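Since `detect()` takes either a `Float32Array` in [-1, 1] or an `Int16Array` of raw PCM, the two input forms are related by the conventional int16 full-scale divisor. A minimal sketch of that normalization (the 32768 constant is the usual PCM convention; the package accepts `Int16Array` directly, so you only need this when you want the float path yourself):

```typescript
// Convert raw 16-bit PCM to a normalized Float32Array in [-1, 1].
function int16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = pcm[i] / 32768;
  return out;
}

const samples = Int16Array.from([0, 16384, -32768, 32767]);
const floats = int16ToFloat32(samples);
// floats ≈ [0, 0.5, -1, 0.99997]
```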
44
+
45
+ ## Streaming VAD — real-time, frame-by-frame
46
+
47
+ `OmniStreamVAD` processes 10 ms frames (160 int16 samples @ 16 kHz) and emits
48
+ segment-boundary events on the same call that confirms the boundary —
49
+ bit-identical to upstream FireRedVAD's `FireRedStreamVad`.
50
+
51
+ ```ts
52
+ import { OmniStreamVAD } from "omnivad";
53
+
54
+ const vad = await OmniStreamVAD.create();
55
+
56
+ for (let i = 0; i + 160 <= pcm.length; i += 160) {
57
+ const r = vad.processFrame(pcm.subarray(i, i + 160));
58
+ if (!r) continue;
59
+ if (r.isSpeechStart) console.log(`START @ ${(r.speechStartFrame * 0.01).toFixed(2)}s`);
60
+ if (r.isSpeechEnd) console.log(`END @ ${(r.speechEndFrame * 0.01).toFixed(2)}s`);
61
+ }
62
+ ```
63
+
64
+ `processFrame()` returns `{ confidence, smoothedProb, isSpeech, isSpeechStart,
65
+ isSpeechEnd, frameIdx, speechStartFrame, speechEndFrame }` — every field comes
66
+ straight from the C state machine.
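Because the boundary events carry the matching frame indices, you can reconstruct `[start, end]` segments in seconds without tracking any state yourself. A sketch over hypothetical frame results (the `FrameEvents` shape and the assumption that `speechEndFrame` arrives on the end event are illustrative; the 0.01 factor is the 10 ms frame shift from the loop above):

```typescript
interface FrameEvents {
  isSpeechStart: boolean;
  isSpeechEnd: boolean;
  speechStartFrame: number;
  speechEndFrame: number;
}

const FRAME_SECS = 0.01; // 10 ms frame shift @ 16 kHz

// Collect [start, end] segments (in seconds) from a stream of results.
function collectSegments(results: FrameEvents[]): [number, number][] {
  const segments: [number, number][] = [];
  for (const r of results) {
    if (r.isSpeechEnd) {
      segments.push([
        r.speechStartFrame * FRAME_SECS,
        r.speechEndFrame * FRAME_SECS,
      ]);
    }
  }
  return segments;
}

// Hypothetical event sequence: speech from frame 35 to frame 480.
const segs = collectSegments([
  { isSpeechStart: true, isSpeechEnd: false, speechStartFrame: 35, speechEndFrame: 0 },
  { isSpeechStart: false, isSpeechEnd: true, speechStartFrame: 35, speechEndFrame: 480 },
]);
// segs ≈ [[0.35, 4.8]]
```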
67
+
68
+ ## Audio Event Detection — speech / singing / music
69
+
70
+ ```ts
71
+ import { OmniAED } from "omnivad";
72
+
73
+ const aed = await OmniAED.create();
74
+ const events = aed.detect(audioFloat32);
75
+ // { duration: 22.0,
76
+ // events: { speech: [[...]], singing: [[...]], music: [[...]] },
77
+ // ratios: { speech: 0.41, singing: 0.0, music: 0.59 } }
78
+ ```
79
+
80
+ ## Whisper / WhisperX-style chunking
81
+
82
+ `OmniVAD` + `mergeChunks(mode: "greedy")` is the 1:1 equivalent of WhisperX's
83
+ `Binarize(max_duration=chunk_size)` + greedy packing. Use this recipe when
84
+ feeding chunks into Whisper-family ASR models that expect a fixed 30 s window:
85
+
86
+ ```ts
87
+ import { OmniVAD, mergeChunks } from "omnivad";
88
+
89
+ const vad = await OmniVAD.create(); // threshold=0.4 default — safer for Whisper
90
+ const result = vad.detect(audioFloat32);
91
+
92
+ const chunks = await mergeChunks(result.timestamps, {
93
+ maxChunkSecs: 30.0, // Whisper input window
94
+ mode: "greedy", // WhisperX behavior
95
+ padOnsetSecs: 0.04,
96
+ padOffsetSecs: 0.04,
97
+ minSilenceSecs: 0.20,
98
+ });
99
+ // Slice the audio at [chunk.start, chunk.end] and feed each slice to Whisper.
100
+ ```
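The slicing step in the final comment is plain index arithmetic at 16 kHz. A sketch, assuming mono audio and round-to-nearest at the boundaries (the chunk times come from `mergeChunks`; the rounding choice is ours, not specified by the package):

```typescript
const SAMPLE_RATE = 16000;

// Cut one [start, end] window (in seconds) out of a mono Float32Array.
function sliceChunk(audio: Float32Array, startSecs: number, endSecs: number): Float32Array {
  const from = Math.max(0, Math.round(startSecs * SAMPLE_RATE));
  const to = Math.min(audio.length, Math.round(endSecs * SAMPLE_RATE));
  return audio.subarray(from, to); // a view over the original buffer, no copy
}

const audio = new Float32Array(SAMPLE_RATE * 60); // 60 s of silence
const piece = sliceChunk(audio, 0.5, 30.5);       // one 30 s window
// piece.length === 30 * SAMPLE_RATE
```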
101
+
102
+ A second mode `"longest_gap"` exists for variable-length-input models
103
+ (forced alignment, TTS) — see the GitHub README for the comparison table.
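For intuition, greedy packing just keeps appending segments to the current chunk until the next one would push it past `maxChunkSecs`. An illustrative sketch only, assuming the `{ start, end, segStartIdx, segCount }` output shape from the table above; the real packing runs in C and additionally applies the padding, gap, and silence rules:

```typescript
type Segment = [number, number];
interface Chunk { start: number; end: number; segStartIdx: number; segCount: number }

// Greedily pack segments into chunks no longer than maxChunkSecs.
function greedyPack(segments: Segment[], maxChunkSecs: number): Chunk[] {
  const chunks: Chunk[] = [];
  let cur: Chunk | null = null;
  segments.forEach(([start, end], i) => {
    if (cur && end - cur.start <= maxChunkSecs) {
      cur.end = end;          // segment still fits: extend the current chunk
      cur.segCount++;
    } else {
      cur = { start, end, segStartIdx: i, segCount: 1 }; // open a new chunk
      chunks.push(cur);
    }
  });
  return chunks;
}

const chunks = greedyPack([[0, 10], [12, 25], [28, 40], [41, 55]], 30);
// → two chunks: [0, 25] covering segments 0-1, [28, 55] covering segments 2-3
```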
104
+
105
+ ## Multi-stream concurrency
106
+
107
+ `OmniStreamVAD` instances have mutable per-stream state and **must not** be
108
+ shared across concurrent streams. Use `clone()` to spin up a fresh instance
109
+ that shares the underlying model weights but has its own state — instant,
110
+ near-zero memory overhead per stream.
111
+
112
+ ```ts
113
+ const base = await OmniStreamVAD.create();
114
+ const streamA = base.clone();
115
+ const streamB = base.clone();
116
+ // Process two independent audio sessions in parallel.
117
+ ```
118
+
119
+ ## Models and CDN
120
+
121
+ By default, models are fetched from jsDelivr:
122
+
123
+ ```
124
+ https://cdn.jsdelivr.net/npm/omnivad@<version>/models/{vad,stream-vad,aed}.omnivad
125
+ ```
126
+
127
+ Override per call when you need to host them yourself or pre-bundle:
128
+
129
+ ```ts
130
+ const vad = await OmniVAD.create({
131
+ modelUrl: "https://your-cdn/vad.omnivad", // or
132
+ modelData: arrayBufferYouAlreadyHave,
133
+ });
134
+ ```
135
+
136
+ In Node.js, models are read from the installed package (`omnivad/models/`) — no
137
+ network access required at runtime.
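If you want to bypass even the local package lookup, you can read a model file yourself and pass it as `modelData`. A sketch of the Node side, using a throwaway file as a stand-in for a real `.omnivad` model so the snippet is self-contained:

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Read any model file into the ArrayBuffer shape that modelData expects.
function loadModelData(path: string): ArrayBuffer {
  const buf = readFileSync(path);
  // A Node Buffer can be a view into a shared pool; copy out exactly this file's bytes.
  return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
}

// Throwaway file standing in for a real .omnivad model.
const demoPath = join(tmpdir(), "demo.omnivad");
writeFileSync(demoPath, Buffer.from([1, 2, 3, 4]));
const modelData = loadModelData(demoPath);
// modelData.byteLength === 4; then: OmniVAD.create({ modelData })
```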
138
+
139
+ ## Performance
140
+
141
+ Real-Time Factor (lower = faster) on Apple M-series:
142
+
143
+ | Model | RTF | Speed |
144
+ |-------|-----|-------|
145
+ | VAD | ~0.003 | ~330× real-time |
146
+ | Streaming VAD | ~0.002 | ~500× real-time |
147
+ | AED | ~0.002 | ~500× real-time |
148
+
149
+ WASM is built with SIMD enabled and ncnn fp16 weights.
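RTF is simply processing time divided by audio duration, so the "×-real-time" speed is its reciprocal. A quick sanity check of the table's arithmetic (the 180 ms / 60 s timing pair is an invented example):

```typescript
// Real-Time Factor: seconds of compute spent per second of audio.
function rtf(processingSecs: number, audioSecs: number): number {
  return processingSecs / audioSecs;
}

const vadRtf = rtf(0.18, 60);  // e.g. 60 s of audio processed in 180 ms
const speedup = 1 / vadRtf;    // the table's "~330× real-time" figure
// vadRtf ≈ 0.003, speedup ≈ 333
```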
150
+
151
+ ## Accuracy
152
+
153
+ Verified bit-identical to upstream PyTorch reference on 5 audio files × 3
154
+ models — see the [accuracy table](https://github.com/lifeiteng/OmniVAD-Kit#testing)
155
+ in the main repo.
156
+
157
+ ## Browser, Worker, Node — same API
158
+
159
+ The package detects its runtime and loads the right glue:
160
+
161
+ - **Browsers (main thread)** — classic-script injection of the Emscripten glue
162
+ (works around `MODULARIZE=1` IIFE issues with `import()`).
163
+ - **Web Workers / ServiceWorkers** — same path via `importScripts`.
164
+ - **Node.js (≥ 18)** — `createRequire` + local CJS resolution. No bundler
165
+ config needed.
166
+
167
+ ## See also
168
+
169
+ - Full documentation, accuracy tables, C/C++ API, Python package, native build:
170
+ [GitHub repository](https://github.com/lifeiteng/OmniVAD-Kit)
171
+ - [中文 README](https://github.com/lifeiteng/OmniVAD-Kit/blob/main/README.zh.md)
172
+ - [Local development guide](https://github.com/lifeiteng/OmniVAD-Kit#local-development)
173
+
174
+ ## Credits
175
+
176
+ - [**FireRedVAD**](https://github.com/FireRedTeam/FireRedVAD) — Kaituo Xu,
177
+ Wenpeng Li, Kai Huang, Kun Liu (Xiaohongshu). Source models, DFSMN
178
+ architecture, training pipeline.
179
+ - [ncnn](https://github.com/Tencent/ncnn) — Tencent. Inference backend.
180
+ - [Emscripten](https://emscripten.org/) — WebAssembly toolchain.
181
+
182
+ ## License
183
+
184
+ Apache-2.0 — same as upstream FireRedVAD.
package/dist/index.cjs CHANGED
@@ -4,12 +4,41 @@ var _documentCurrentScript = typeof document !== 'undefined' ? document.currentS
4
4
  // src/wasm-binding.ts
5
5
  var _module = null;
6
6
  var _loading = null;
7
+ function loadScript(url) {
8
+ if (typeof globalThis.document === "undefined") {
9
+ return new Promise((resolve, reject) => {
10
+ try {
11
+ const importScripts = globalThis.importScripts;
12
+ if (typeof importScripts !== "function") {
13
+ throw new Error(
14
+ "omnivad: cannot load glue script \u2014 no document and no importScripts"
15
+ );
16
+ }
17
+ importScripts(url);
18
+ resolve();
19
+ } catch (err) {
20
+ reject(err instanceof Error ? err : new Error(String(err)));
21
+ }
22
+ });
23
+ }
24
+ return new Promise((resolve, reject) => {
25
+ const s = globalThis.document.createElement("script");
26
+ s.src = url;
27
+ s.async = true;
28
+ s.crossOrigin = "anonymous";
29
+ s.onload = () => resolve();
30
+ s.onerror = () => reject(new Error(`Failed to load omnivad glue script: ${url}`));
31
+ globalThis.document.head.appendChild(s);
32
+ });
33
+ }
7
34
  var SIZEOF_POST_CONFIG = 28;
8
35
  var SIZEOF_AED_POST_CONFIG = 3 * SIZEOF_POST_CONFIG;
9
36
  var SIZEOF_SEGMENT = 8;
10
37
  var SIZEOF_AED_SEGMENT = 16;
38
+ var SIZEOF_CHUNK_CONFIG = 28;
39
+ var SIZEOF_CHUNK = 16;
11
40
  var OMNI_ERR_NO_FRAMES = -7;
12
- var VERSION = "0.2.5";
41
+ var VERSION = "0.2.9";
13
42
  var DEFAULT_CDN_BASE = `https://cdn.jsdelivr.net/npm/omnivad@${VERSION}/models`;
14
43
  var MODEL_FILES = {
15
44
  vad: "vad.omnivad",
@@ -25,22 +54,41 @@ async function initWasm(wasmLocator) {
25
54
  if (typeof globalThis.process?.versions?.node === "string") {
26
55
  const { createRequire } = await import(
27
56
  /* webpackIgnore: true */
57
+ /* turbopackIgnore: true */
28
58
  'module'
29
59
  );
30
- const { dirname, join } = await import('path');
60
+ const { dirname, join } = await import(
61
+ /* webpackIgnore: true */
62
+ /* turbopackIgnore: true */
63
+ 'path'
64
+ );
31
65
  const req = createRequire((typeof document === 'undefined' ? require('u' + 'rl').pathToFileURL(__filename).href : (_documentCurrentScript && _documentCurrentScript.tagName.toUpperCase() === 'SCRIPT' && _documentCurrentScript.src || new URL('index.cjs', document.baseURI).href)));
32
66
  const gluePath = req.resolve("../dist/wasm/omnivad.cjs");
33
67
  const wasmDir = dirname(gluePath);
34
68
  createOmniVAD = req(gluePath);
35
69
  defaultLocateFile = (filename) => join(wasmDir, filename);
36
70
  } else {
37
- const glueUrl = new URL("../dist/wasm/omnivad.js", (typeof document === 'undefined' ? require('u' + 'rl').pathToFileURL(__filename).href : (_documentCurrentScript && _documentCurrentScript.tagName.toUpperCase() === 'SCRIPT' && _documentCurrentScript.src || new URL('index.cjs', document.baseURI).href)));
38
- const mod = await import(
39
- /* webpackIgnore: true */
40
- glueUrl.href
41
- );
42
- createOmniVAD = mod.default || mod;
43
- const wasmBaseUrl = new URL("./", glueUrl);
71
+ let glueUrlStr;
72
+ if (wasmLocator) {
73
+ glueUrlStr = wasmLocator("omnivad.js");
74
+ } else {
75
+ glueUrlStr = new URL("../dist/wasm/omnivad.js", (typeof document === 'undefined' ? require('u' + 'rl').pathToFileURL(__filename).href : (_documentCurrentScript && _documentCurrentScript.tagName.toUpperCase() === 'SCRIPT' && _documentCurrentScript.src || new URL('index.cjs', document.baseURI).href))).href;
76
+ }
77
+ const g = globalThis;
78
+ let factory = g.createOmniVAD;
79
+ if (typeof factory !== "function") {
80
+ await loadScript(glueUrlStr);
81
+ factory = g.createOmniVAD;
82
+ }
83
+ if (typeof factory !== "function") {
84
+ throw new Error(
85
+ `omnivad.js loaded from ${glueUrlStr} but globalThis.createOmniVAD is missing`
86
+ );
87
+ }
88
+ createOmniVAD = factory;
89
+ const baseHref = typeof globalThis.location !== "undefined" ? globalThis.location.href : "file:///";
90
+ const absGlue = new URL(glueUrlStr, baseHref);
91
+ const wasmBaseUrl = new URL("./", absGlue);
44
92
  defaultLocateFile = (filename) => new URL(filename, wasmBaseUrl).toString();
45
93
  }
46
94
  const opts = {};
@@ -64,10 +112,19 @@ async function loadModel(modelType, modelUrl, modelData) {
64
112
  if (typeof globalThis.process?.versions?.node === "string") {
65
113
  const { createRequire } = await import(
66
114
  /* webpackIgnore: true */
115
+ /* turbopackIgnore: true */
67
116
  'module'
68
117
  );
69
- const { dirname, join } = await import('path');
70
- const { readFile } = await import('fs/promises');
118
+ const { dirname, join } = await import(
119
+ /* webpackIgnore: true */
120
+ /* turbopackIgnore: true */
121
+ 'path'
122
+ );
123
+ const { readFile } = await import(
124
+ /* webpackIgnore: true */
125
+ /* turbopackIgnore: true */
126
+ 'fs/promises'
127
+ );
71
128
  const req = createRequire((typeof document === 'undefined' ? require('u' + 'rl').pathToFileURL(__filename).href : (_documentCurrentScript && _documentCurrentScript.tagName.toUpperCase() === 'SCRIPT' && _documentCurrentScript.src || new URL('index.cjs', document.baseURI).href)));
72
129
  const pkgDir = dirname(req.resolve("../package.json"));
73
130
  const modelPath = join(pkgDir, "models", filename);
@@ -120,10 +177,86 @@ var DEFAULT_VAD_CONFIG = {
120
177
  smoothWindowSize: 5,
121
178
  minSpeechFrames: 20,
122
179
  minSilenceFrames: 20,
123
- maxSpeechFrames: 2e3,
180
+ maxSpeechFrames: 3e3,
124
181
  mergeSilenceFrames: 0,
125
182
  extendSpeechFrames: 0
126
183
  };
184
+ var OMNI_CHUNK_GREEDY = 0;
185
+ var OMNI_CHUNK_LONGEST_GAP = 1;
186
+ var DEFAULT_CHUNK_CONFIG = {
187
+ maxChunkSecs: 30,
188
+ maxGapSecs: Infinity,
189
+ padOnsetSecs: 0.04,
190
+ padOffsetSecs: 0.04,
191
+ minSpeechSecs: 0,
192
+ minSilenceSecs: 0.2,
193
+ // matches VAD minSilenceFrames=20 @ 10ms shift
194
+ mode: "greedy"
195
+ };
196
+ function modeToInt(m) {
197
+ switch (m) {
198
+ case "greedy":
199
+ return OMNI_CHUNK_GREEDY;
200
+ case "longest_gap":
201
+ return OMNI_CHUNK_LONGEST_GAP;
202
+ default:
203
+ throw new Error(`Unknown chunking mode: ${String(m)}`);
204
+ }
205
+ }
206
+ function writeChunkConfig(M, ptr, cfg) {
207
+ M.setValue(ptr + 0, cfg.maxChunkSecs, "float");
208
+ M.setValue(ptr + 4, cfg.maxGapSecs, "float");
209
+ M.setValue(ptr + 8, cfg.padOnsetSecs, "float");
210
+ M.setValue(ptr + 12, cfg.padOffsetSecs, "float");
211
+ M.setValue(ptr + 16, cfg.minSpeechSecs, "float");
212
+ M.setValue(ptr + 20, cfg.minSilenceSecs, "float");
213
+ M.setValue(ptr + 24, modeToInt(cfg.mode), "i32");
214
+ }
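The 28-byte layout written above (six f32 fields plus one i32 mode) can be sanity-checked outside the WASM heap with a plain `DataView`; a sketch, assuming little-endian byte order as in Emscripten's linear memory:

```typescript
const SIZEOF_CHUNK_CONFIG = 28;

interface ChunkConfig {
  maxChunkSecs: number; maxGapSecs: number; padOnsetSecs: number;
  padOffsetSecs: number; minSpeechSecs: number; minSilenceSecs: number;
  mode: number; // 0 = greedy, 1 = longest_gap
}

// Pack the config at the same byte offsets writeChunkConfig uses.
function packChunkConfig(cfg: ChunkConfig): ArrayBuffer {
  const buf = new ArrayBuffer(SIZEOF_CHUNK_CONFIG);
  const view = new DataView(buf);
  view.setFloat32(0, cfg.maxChunkSecs, true);   // true = little-endian
  view.setFloat32(4, cfg.maxGapSecs, true);
  view.setFloat32(8, cfg.padOnsetSecs, true);
  view.setFloat32(12, cfg.padOffsetSecs, true);
  view.setFloat32(16, cfg.minSpeechSecs, true);
  view.setFloat32(20, cfg.minSilenceSecs, true);
  view.setInt32(24, cfg.mode, true);
  return buf;
}

const packed = packChunkConfig({
  maxChunkSecs: 30, maxGapSecs: 1e9, padOnsetSecs: 0.04,
  padOffsetSecs: 0.04, minSpeechSecs: 0, minSilenceSecs: 0.2, mode: 0,
});
// packed.byteLength === 28
```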
215
+ function chunkMerge(M, segments, config) {
216
+ const numSegments = segments.length;
217
+ const segPtr = numSegments > 0 ? M._malloc(numSegments * SIZEOF_SEGMENT) : 0;
218
+ const cfgPtr = M._malloc(SIZEOF_CHUNK_CONFIG);
219
+ const outPtrPtr = M._malloc(4);
220
+ const outCountPtr = M._malloc(4);
221
+ try {
222
+ for (let i = 0; i < numSegments; i++) {
223
+ const base = segPtr + i * SIZEOF_SEGMENT;
224
+ M.setValue(base + 0, segments[i][0], "float");
225
+ M.setValue(base + 4, segments[i][1], "float");
226
+ }
227
+ writeChunkConfig(M, cfgPtr, config);
228
+ M.setValue(outPtrPtr, 0, "i32");
229
+ M.setValue(outCountPtr, 0, "i32");
230
+ const rc = M.ccall(
231
+ "omni_merge_chunks",
232
+ "number",
233
+ ["number", "number", "number", "number", "number"],
234
+ [segPtr, numSegments, cfgPtr, outPtrPtr, outCountPtr]
235
+ );
236
+ if (rc !== 0) {
237
+ throw new Error(`omni_merge_chunks failed: ${readNativeError(M, rc)}`);
238
+ }
239
+ const count = M.getValue(outCountPtr, "i32");
240
+ const chunkPtr = M.getValue(outPtrPtr, "i32");
241
+ const chunks = [];
242
+ for (let i = 0; i < count; i++) {
243
+ const base = chunkPtr + i * SIZEOF_CHUNK;
244
+ chunks.push({
245
+ start: M.getValue(base + 0, "float"),
246
+ end: M.getValue(base + 4, "float"),
247
+ segStartIdx: M.getValue(base + 8, "i32"),
248
+ segCount: M.getValue(base + 12, "i32")
249
+ });
250
+ }
251
+ if (chunkPtr) M._free(chunkPtr);
252
+ return chunks;
253
+ } finally {
254
+ if (segPtr) M._free(segPtr);
255
+ M._free(cfgPtr);
256
+ M._free(outPtrPtr);
257
+ M._free(outCountPtr);
258
+ }
259
+ }
127
260
  function vadCreate(M, modelBuffer) {
128
261
  const bytes = new Uint8Array(modelBuffer);
129
262
  const ptr = M._malloc(bytes.length);
@@ -228,24 +361,49 @@ function aedDetect(M, handle, audioPtr, numSamples, cfg, format = "f32") {
228
361
  function aedDestroy(M, handle) {
229
362
  M.ccall("omni_aed_destroy", null, ["number"], [handle]);
230
363
  }
231
- function streamVadCreate(M, modelBuffer, threshold = 0.5) {
364
+ var DEFAULT_STREAM_VAD_CONFIG = {
365
+ threshold: 0.5,
366
+ smoothWindowSize: 5,
367
+ padStartFrame: 5,
368
+ minSpeechFrame: 8,
369
+ maxSpeechFrame: 2e3,
370
+ minSilenceFrame: 20
371
+ };
372
+ var SIZEOF_STREAM_VAD_CONFIG = 24;
373
+ function writeStreamVadConfig(M, ptr, cfg) {
374
+ M.setValue(ptr + 0, cfg.threshold, "float");
375
+ M.setValue(ptr + 4, cfg.smoothWindowSize, "i32");
376
+ M.setValue(ptr + 8, cfg.padStartFrame, "i32");
377
+ M.setValue(ptr + 12, cfg.minSpeechFrame, "i32");
378
+ M.setValue(ptr + 16, cfg.maxSpeechFrame, "i32");
379
+ M.setValue(ptr + 20, cfg.minSilenceFrame, "i32");
380
+ }
381
+ function streamVadCreate(M, modelBuffer, config = {}) {
382
+ const overrides = Object.fromEntries(
383
+ Object.entries(config).filter(([, v]) => v !== void 0)
384
+ );
385
+ const cfg = { ...DEFAULT_STREAM_VAD_CONFIG, ...overrides };
232
386
  const bytes = new Uint8Array(modelBuffer);
233
- const ptr = M._malloc(bytes.length);
234
- M.HEAPU8.set(bytes, ptr);
387
+ const dataPtr = M._malloc(bytes.length);
388
+ M.HEAPU8.set(bytes, dataPtr);
389
+ const cfgPtr = M._malloc(SIZEOF_STREAM_VAD_CONFIG);
235
390
  try {
391
+ writeStreamVadConfig(M, cfgPtr, cfg);
236
392
  return createModel(
237
393
  M,
238
394
  "omni_stream_vad_create_from_buffer",
239
395
  ["number", "number", "number"],
240
- [ptr, bytes.length, threshold],
396
+ [dataPtr, bytes.length, cfgPtr],
241
397
  "StreamVAD"
242
398
  );
243
399
  } finally {
244
- M._free(ptr);
400
+ M._free(dataPtr);
401
+ M._free(cfgPtr);
245
402
  }
246
403
  }
404
+ var SIZEOF_STREAM_VAD_RESULT = 24;
247
405
  function streamVadProcess(M, handle, pcm16Ptr, numSamples) {
248
- const resultPtr = M._malloc(12);
406
+ const resultPtr = M._malloc(SIZEOF_STREAM_VAD_RESULT);
249
407
  try {
250
408
  const ret = M.ccall(
251
409
  "omni_stream_vad_process",
@@ -256,9 +414,14 @@ function streamVadProcess(M, handle, pcm16Ptr, numSamples) {
256
414
  if (ret === OMNI_ERR_NO_FRAMES) return null;
257
415
  if (ret !== 0) throw new Error(`StreamVAD process failed: ${ret}`);
258
416
  return {
259
- confidence: M.getValue(resultPtr, "float"),
260
- isSpeech: M.getValue(resultPtr + 4, "i8") !== 0,
261
- frameOffset: M.getValue(resultPtr + 8, "i32")
417
+ confidence: M.getValue(resultPtr + 0, "float"),
418
+ smoothedProb: M.getValue(resultPtr + 4, "float"),
419
+ isSpeech: M.getValue(resultPtr + 8, "i8") !== 0,
420
+ isSpeechStart: M.getValue(resultPtr + 9, "i8") !== 0,
421
+ isSpeechEnd: M.getValue(resultPtr + 10, "i8") !== 0,
422
+ frameIdx: M.getValue(resultPtr + 12, "i32"),
423
+ speechStartFrame: M.getValue(resultPtr + 16, "i32"),
424
+ speechEndFrame: M.getValue(resultPtr + 20, "i32")
262
425
  };
263
426
  } finally {
264
427
  M._free(resultPtr);
@@ -357,8 +520,6 @@ function int16ToNormalizedFloat32(i16) {
357
520
  var SAMPLE_RATE2 = 16e3;
358
521
  var OmniStreamVAD = class _OmniStreamVAD {
359
522
  constructor(handle) {
360
- this.inSpeech = false;
361
- this.speechStartFrame = 0;
362
523
  this.handle = handle;
363
524
  }
364
525
  /**
@@ -369,8 +530,14 @@ var OmniStreamVAD = class _OmniStreamVAD {
369
530
  await initWasm();
370
531
  const M = getModule();
371
532
  const modelBuffer = await loadModel("stream-vad", options.modelUrl, options.modelData);
372
- const threshold = options.speechThreshold ?? 0.5;
373
- const handle = streamVadCreate(M, modelBuffer, threshold);
533
+ const handle = streamVadCreate(M, modelBuffer, {
534
+ threshold: options.threshold,
535
+ smoothWindowSize: options.smoothWindowSize,
536
+ padStartFrame: options.padStartFrame,
537
+ minSpeechFrame: options.minSpeechFrame,
538
+ maxSpeechFrame: options.maxSpeechFrame,
539
+ minSilenceFrame: options.minSilenceFrame
540
+ });
374
541
  return new _OmniStreamVAD(handle);
375
542
  }
376
543
  /**
@@ -388,6 +555,10 @@ var OmniStreamVAD = class _OmniStreamVAD {
388
555
  /**
389
556
  * Process one frame of audio (160 int16 samples = 10ms @ 16kHz).
390
557
  * Returns null until enough audio is accumulated.
558
+ *
559
+ * Segment-boundary events (isSpeechStart / isSpeechEnd and the matching
560
+ * speech_*_frame indices) come straight from the C-layer state machine
561
+ * (bit-identical to upstream FireRedVAD) — the wrapper is just a marshaller.
391
562
  */
392
563
  processFrame(pcm160) {
393
564
  const M = getModule();
@@ -396,28 +567,16 @@ var OmniStreamVAD = class _OmniStreamVAD {
396
567
  heap16.set(pcm160);
397
568
  try {
398
569
  const result = streamVadProcess(M, this.handle, ptr, pcm160.length);
399
- if (!result || result.frameOffset === 0) return null;
400
- const frameIndex = result.frameOffset;
401
- const isSpeechStart = result.isSpeech && !this.inSpeech;
402
- const isSpeechEnd = !result.isSpeech && this.inSpeech;
403
- if (isSpeechStart) {
404
- this.speechStartFrame = frameIndex;
405
- }
406
- const activeSpeechStartFrame = isSpeechEnd ? this.speechStartFrame : result.isSpeech ? this.speechStartFrame : 0;
407
- const speechEndFrame = isSpeechEnd ? Math.max(1, frameIndex - 1) : 0;
408
- this.inSpeech = result.isSpeech;
409
- if (isSpeechEnd) {
410
- this.speechStartFrame = 0;
411
- }
570
+ if (!result) return null;
412
571
  return {
413
572
  confidence: result.confidence,
414
- smoothedConfidence: result.confidence,
573
+ smoothedProb: result.smoothedProb,
415
574
  isSpeech: result.isSpeech,
416
- frameIndex,
417
- isSpeechStart,
418
- isSpeechEnd,
419
- speechStartFrame: activeSpeechStartFrame,
420
- speechEndFrame
575
+ frameIndex: result.frameIdx,
576
+ isSpeechStart: result.isSpeechStart,
577
+ isSpeechEnd: result.isSpeechEnd,
578
+ speechStartFrame: result.speechStartFrame,
579
+ speechEndFrame: result.speechEndFrame
421
580
  };
422
581
  } finally {
423
582
  M._free(ptr);
@@ -456,11 +615,9 @@ var OmniStreamVAD = class _OmniStreamVAD {
456
615
  M._free(framesPtr);
457
616
  }
458
617
  }
459
- /** Reset all internal state. */
618
+ /** Reset all internal state (model cache, audio buffer, postprocessor). */
460
619
  reset() {
461
620
  streamVadReset(getModule(), this.handle);
462
- this.inSpeech = false;
463
- this.speechStartFrame = 0;
464
621
  }
465
622
  /** Release native resources. */
466
623
  dispose() {
@@ -468,8 +625,6 @@ var OmniStreamVAD = class _OmniStreamVAD {
468
625
  streamVadDestroy(getModule(), this.handle);
469
626
  this.handle = 0;
470
627
  }
471
- this.inSpeech = false;
472
- this.speechStartFrame = 0;
473
628
  }
474
629
  };
475
630
  function int16ToFloat32(i16) {
@@ -583,7 +738,30 @@ function computeCoverageRatios(events, duration) {
583
738
  return ratios;
584
739
  }
585
740
 
741
+ // src/chunking.ts
742
+ async function mergeChunks(segments, options = {}) {
743
+ await initWasm();
744
+ const M = getModule();
745
+ const cfg = {
746
+ maxChunkSecs: options.maxChunkSecs ?? DEFAULT_CHUNK_CONFIG.maxChunkSecs,
747
+ maxGapSecs: options.maxGapSecs ?? DEFAULT_CHUNK_CONFIG.maxGapSecs,
748
+ padOnsetSecs: options.padOnsetSecs ?? DEFAULT_CHUNK_CONFIG.padOnsetSecs,
749
+ padOffsetSecs: options.padOffsetSecs ?? DEFAULT_CHUNK_CONFIG.padOffsetSecs,
750
+ minSpeechSecs: options.minSpeechSecs ?? DEFAULT_CHUNK_CONFIG.minSpeechSecs,
751
+ minSilenceSecs: options.minSilenceSecs ?? DEFAULT_CHUNK_CONFIG.minSilenceSecs,
752
+ mode: options.mode ?? DEFAULT_CHUNK_CONFIG.mode
753
+ };
754
+ const records = chunkMerge(M, segments, cfg);
755
+ return records.map((r) => ({
756
+ start: r.start,
757
+ end: r.end,
758
+ segStartIdx: r.segStartIdx,
759
+ segCount: r.segCount
760
+ }));
761
+ }
762
+
586
763
  exports.DEFAULT_CDN_BASE = DEFAULT_CDN_BASE;
764
+ exports.DEFAULT_CHUNK_CONFIG = DEFAULT_CHUNK_CONFIG;
587
765
  exports.FireRedAED = OmniAED;
588
766
  exports.FireRedStreamVAD = OmniStreamVAD;
589
767
  exports.FireRedVAD = OmniVAD;
@@ -594,5 +772,6 @@ exports.OmniVAD = OmniVAD;
594
772
  exports.VERSION = VERSION;
595
773
  exports.initWasm = initWasm;
596
774
  exports.loadModel = loadModel;
775
+ exports.mergeChunks = mergeChunks;
597
776
  //# sourceMappingURL=index.cjs.map
598
777
  //# sourceMappingURL=index.cjs.map