speech-to-speech 0.1.3 → 0.1.5

package/README.md CHANGED
@@ -6,6 +6,7 @@ TypeScript utilities for speech-to-text (STT) and text-to-speech (TTS) in the br
 
  - 🎤 **STT**: Browser-native speech recognition with session management
  - 🔊 **TTS**: Piper neural TTS with automatic model downloading
+ - ⚡ **WASM Caching**: Automatic browser caching eliminates repeated downloads
  - 🎵 **Shared Audio Queue**: Auto-play audio queue for seamless playback
  - ✅ **Zero Config**: No manual ONNX setup required - everything is handled automatically
  - 📦 **Small**: ~135KB package size
@@ -37,7 +38,7 @@ stt.start();
 
  // Text-to-Speech with auto-play queue
  const tts = new TTSLogic({ voiceId: "en_US-hfc_female-medium" });
- await tts.initialize();
+ await tts.initialize(); // WASM files cached automatically
 
  const result = await tts.synthesize("Hello world!");
  sharedAudioPlayer.addAudioIntoQueue(result.audio, result.sampleRate);
@@ -255,24 +256,41 @@ export default function SpeechComponent() {
  ## Exports
 
  ```typescript
- // Main bundle (STT + TTS)
+ // Main bundle (STT + TTS + Service wrapper)
  import {
+   // Service wrapper (new in 0.1.4)
+   createSpeechService,
+   // STT
    STTLogic,
+   getCompatibilityInfo,
+   // TTS
    TTSLogic,
+   prefetchTTSModel,
+   cleanTextForTTS,
    AudioPlayer,
    createAudioPlayer,
    sharedAudioPlayer,
  } from "speech-to-speech";
 
  // STT only
- import { STTLogic, ResetSTTLogic, VADController } from "speech-to-speech/stt";
+ import {
+   STTLogic,
+   ResetSTTLogic,
+   VADController,
+   getCompatibilityInfo, // new in 0.1.4
+ } from "speech-to-speech/stt";
 
  // TTS only
  import {
    TTSLogic,
+   prefetchTTSModel, // new in 0.1.4
+   cleanTextForTTS, // new in 0.1.4
    AudioPlayer,
    createAudioPlayer,
    sharedAudioPlayer,
+   ensureWasmCached,
+   isWasmCached,
+   clearWasmCache,
  } from "speech-to-speech/tts";
  ```
 
@@ -282,38 +300,123 @@ import {
 
  #### `STTLogic`
 
- Main speech recognition controller with session management.
+ Main speech recognition controller. Wraps the browser's Web Speech API with:
+
+ - **Silent session rotation.** Chromium ends Web Speech sessions on its own (typically after ~60s). `STTLogic` detects the browser's `end` event, commits the current session into an in-memory transcript, and transparently starts a fresh session — all without notifying the consumer. `onTranscript` is never fired during a rotation.
+ - **Dedup-safe transcript model.** A high-water-mark (`processedFinalCount`) ensures each `isFinal` result is ingested exactly once across rotations, eliminating the duplicate-word artifacts typical of naive `results` concatenation.
+ - **Two delivery modes.** Pick when the final transcript is emitted via the `continueOnSilence` option:
+
+ | `continueOnSilence` | Behaviour |
+ | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | `true` *(default)* | **Continuous / manual-stop.** Listening keeps running across all silent restarts until the consumer calls `stt.stop()`. `onTranscript` fires exactly once, on stop. |
+ | `false` | **Silence-triggered.** When the user has been silent for `silenceThresholdMs`, `onTranscript` fires with the final transcript and recognition auto-stops. |
+
+ In **both** modes, `onInterimTranscript` streams the live transcript (committed sessions + current-session finals + in-flight partial) continuously, including during silent rotations — so the UI never goes blank.
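
Editor's sketch (not part of the diffed README): the high-water-mark idea described above can be illustrated in a few lines. The diff does not show `STTLogic`'s internals, so everything here is an assumption: `ResultLike`, `TranscriptAccumulator`, `ingest`, `rotate`, and `live` are hypothetical names, and the sketch assumes that final results sit at the front of the results list with in-flight partials after them.

```typescript
// Hypothetical stand-in for one entry of a Web Speech results list.
interface ResultLike {
  isFinal: boolean;
  text: string;
}

// Illustrative accumulator; NOT the library's actual implementation.
class TranscriptAccumulator {
  private committed: string[] = []; // text carried over from ended sessions
  private sessionFinals: string[] = []; // finals ingested in the current session
  private processedFinalCount = 0; // high-water mark into the current results list
  private interim = "";

  // Call with the full results list on every recognition event.
  ingest(results: ResultLike[]): string {
    // Start at the mark so already-ingested finals are never read twice.
    let i = this.processedFinalCount;
    for (; i < results.length; i++) {
      const r = results[i];
      if (!r || !r.isFinal) break;
      this.sessionFinals.push(r.text.trim());
    }
    this.processedFinalCount = i;
    // Everything past the mark is still in flight.
    this.interim = results
      .slice(i)
      .map((r) => r.text.trim())
      .join(" ");
    return this.live();
  }

  // Call when the browser ends the session: commit what was heard, then
  // reset the mark because the next session's list starts again at index 0.
  rotate(): void {
    this.committed.push(...this.sessionFinals);
    if (this.interim) this.committed.push(this.interim);
    this.sessionFinals = [];
    this.interim = "";
    this.processedFinalCount = 0;
  }

  // Committed sessions + current-session finals + in-flight partial.
  live(): string {
    return [...this.committed, ...this.sessionFinals, this.interim]
      .filter((s) => s.length > 0)
      .join(" ");
  }
}
```

Feeding the same results list twice leaves the transcript unchanged, which is the property that rules out duplicated words; calling `rotate()` between sessions keeps earlier text in the live string.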
 
  ```typescript
  const stt = new STTLogic(
    // Log callback
    (message: string, level?: "info" | "warning" | "error") => void,
-   // Transcript callback
+   // Final transcript callback — fires ONCE (see modes above)
    (transcript: string) => void,
    // Options
    {
-     sessionDurationMs?: number, // Session duration (default: 30000)
-     interimSaveIntervalMs?: number, // Interim save interval (default: 5000)
-     preserveTranscriptOnStart?: boolean,
+     // --- Delivery mode (new) ---
+     continueOnSilence?: boolean, // default: true (manual stop). false => silence-triggered.
+     silenceThresholdMs?: number, // default: 1500. Only used when continueOnSilence=false.
+
+     // --- Live UI streaming ---
+     onInterimTranscript?: (text: string) => void, // fires on every result, both interim & final
+
+     // --- Misc ---
+     preserveTranscriptOnStart?: boolean, // keep the previous transcript when start() is called again
+
+     // --- Deprecated (accepted for backward compat, ignored) ---
+     sessionDurationMs?: number, // silent rotation is now browser-driven, not timer-driven
+     interimSaveIntervalMs?: number,
    }
  );
 
  // Core methods
  stt.start(); // Start listening
- stt.stop(); // Stop listening
+ stt.stop(); // Stop listening AND emit onTranscript
  stt.destroy(); // Cleanup resources
- stt.getFullTranscript(); // Get accumulated transcript
- stt.clearTranscript(); // Clear transcript
+ stt.getFullTranscript(); // Live transcript: committed + current session + in-flight interim
+ stt.clearTranscript(); // Clear all accumulated transcript
 
  // Callbacks
- stt.setWordsUpdateCallback((words: string[]) => {}); // Word-by-word updates
+ stt.setWordsUpdateCallback((words: string[]) => {}); // Word stream of the live transcript
  stt.setMicTimeUpdateCallback((ms: number) => {}); // Mic active time
  stt.setVadCallbacks(
-   () => console.log("Speech started"), // onSpeechStart
-   () => console.log("Speech ended") // onSpeechEnd
+   () => console.log("Speech started"), // onSpeechStart (heuristic)
+   () => console.log("Speech ended") // onSpeechEnd (heuristic)
+ );
+ ```
+
+ ##### Mode 1 — Continuous (manual stop)
+
+ Use this for long-form dictation, note-taking, or chat inputs where the user decides when they are done.
+
+ ```typescript
+ const stt = new STTLogic(
+   (msg, level) => console.log(`[${level}]`, msg),
+   (finalText) => {
+     // Fires ONCE, when stt.stop() is called by you.
+     saveToDB(finalText);
+   },
+   {
+     continueOnSilence: true, // (default)
+     onInterimTranscript: (liveText) => {
+       // Fires continuously — render the growing text as the user speaks.
+       liveCaption.textContent = liveText;
+     },
+   },
  );
+
+ stt.start();
+ // ... user keeps talking for 5 minutes; Web Speech silently rotates several times ...
+ stopButton.onclick = () => stt.stop(); // only here does onTranscript fire
+ ```
+
+ ##### Mode 2 — Silence-triggered auto-stop
+
+ Use this for turn-taking conversational UIs (voice assistants, STS loops), where "user stopped talking" is the signal to act.
+
+ ```typescript
+ const stt = new STTLogic(
+   (msg, level) => console.log(`[${level}]`, msg),
+   (finalText) => {
+     // Fires automatically once the user has been silent for silenceThresholdMs.
+     sendToLLM(finalText);
+   },
+   {
+     continueOnSilence: false,
+     silenceThresholdMs: 1500, // 1.5s of silence => auto-emit & auto-stop
+     onInterimTranscript: (liveText) => {
+       liveCaption.textContent = liveText;
+     },
+   },
+ );
+
+ stt.start();
+ // User speaks, pauses 1.5s, onTranscript fires and listening stops on its own.
+ // To begin the next turn, call stt.start() again.
+ ```
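
Editor's sketch (not part of the diffed README): one plausible way to turn "silent for `silenceThresholdMs`" into a single emit is a timestamp comparison against the last recognition activity. How `STTLogic` actually measures silence is not shown in this diff, so `SilenceDetector`, `noteActivity`, and `shouldEmit` are hypothetical names, and the clock is passed in explicitly so the logic can be exercised without real timers.

```typescript
// Illustrative only: a clock-injected silence detector, NOT STTLogic's internals.
class SilenceDetector {
  private thresholdMs: number;
  private lastActivityMs: number | null = null;
  private fired = false;

  constructor(thresholdMs: number = 1500) {
    this.thresholdMs = thresholdMs;
  }

  // Call whenever a recognition result (interim or final) arrives.
  noteActivity(nowMs: number): void {
    this.lastActivityMs = nowMs;
    this.fired = false; // new speech re-arms the detector
  }

  // Call periodically; returns true exactly once per silent stretch.
  shouldEmit(nowMs: number): boolean {
    if (this.lastActivityMs === null || this.fired) return false;
    if (nowMs - this.lastActivityMs >= this.thresholdMs) {
      this.fired = true;
      return true;
    }
    return false;
  }
}
```

In a page, a caller would typically invoke `noteActivity(Date.now())` from the result handler and poll `shouldEmit(Date.now())` on an interval; the once-per-stretch latch mirrors the README's statement that `onTranscript` fires once per turn.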
+
+ ##### Observing silent session rotations
+
+ When `continueOnSilence: true`, the library will silently restart the underlying recognition session whenever the browser ends it. You can observe this in the browser DevTools console — `STTLogic` prints three clearly-prefixed markers:
+
+ ```text
+ [STT] 🔴 Session ENDED by Web Speech (sessionId=1) — will silently restart
+ [STT] 🔄 Silent restart requested (newSessionId=2, restartCount=1) — committing 3 final segment(s) + interim into memory
+ [STT] 🟢 Session RESTARTED silently (sessionId=2) in 180ms — committed="hello there how are you doing today"
+ ...
+ [STT] ⏹️ Explicit STOP — emitting onTranscript once (len=284, silent restarts during session=2)
  ```
 
+ The `onTranscript` callback only fires on the final `⏹️ Explicit STOP` line (or when the silence threshold hits in mode 2). If you never see anything between the red/green pairs, the rotation is fully transparent — which is the intended behaviour.
+
  ### TTS (Text-to-Speech)
 
  #### `TTSLogic`
@@ -323,7 +426,8 @@ Piper TTS synthesizer. Voice models download automatically on first use.
  ```typescript
  const tts = new TTSLogic({
    voiceId: "en_US-hfc_female-medium", // Piper voice ID
-   warmUp: true, // Pre-warm the model (default: true)
+   warmUp: true, // Pre-warm the model (default: true)
+   enableWasmCache: true, // Cache WASM assets (default: true)
  });
  await tts.initialize();
 
@@ -341,6 +445,51 @@ await tts.synthesizeAndAddToQueue("Hello world!");
  await tts.dispose();
  ```
 
+ #### WASM Caching (New in 0.1.3)
+
+ The library automatically caches `piper_phonemize.data` (~9MB) and `piper_phonemize.wasm` in the browser Cache API. This eliminates repeated network downloads on every synthesis call.
+
+ **Zero-config (recommended):**
+ ```typescript
+ const tts = new TTSLogic({ voiceId: "en_US-hfc_female-medium" });
+ await tts.initialize();
+ // WASM files cached automatically after first download
+ ```
+
+ **Self-hosted WASM files:**
+ ```typescript
+ const tts = new TTSLogic({
+   voiceId: "en_US-hfc_female-medium",
+   wasmPaths: {
+     piperData: "/piper-wasm/piper_phonemize.data",
+     piperWasm: "/piper-wasm/piper_phonemize.wasm",
+     onnxWasm: "/ort/ort-wasm-simd.wasm", // optional
+   },
+ });
+ ```
+
+ **Disable caching:**
+ ```typescript
+ const tts = new TTSLogic({
+   voiceId: "en_US-hfc_female-medium",
+   enableWasmCache: false, // Uses CDN URLs directly
+ });
+ ```
+
+ **Utility functions:**
+ ```typescript
+ import { ensureWasmCached, isWasmCached, clearWasmCache } from "speech-to-speech/tts";
+
+ // Prefetch WASM assets before initialization
+ await ensureWasmCached(); // Returns { piperData: blob:..., piperWasm: blob:... }
+
+ // Check if cached
+ const cached = await isWasmCached(); // true/false
+
+ // Clear cache
+ await clearWasmCache();
+ ```
+
  ### Audio Playback
 
  #### `sharedAudioPlayer` (Recommended)
@@ -407,36 +556,43 @@ await player.close();
  ```typescript
  import { STTLogic } from "speech-to-speech";
 
+ const liveEl = document.getElementById("live")!;
+ const finalEl = document.getElementById("final")!;
+
  const stt = new STTLogic(
    (message, level) => console.log(`[STT ${level}] ${message}`),
-   (transcript) => {
-     document.getElementById("output")!.textContent = transcript;
+   (finalTranscript) => {
+     // Fires exactly once — when stt.stop() is called (manual mode)
+     // or when silence >= silenceThresholdMs is detected (silence mode).
+     finalEl.textContent = finalTranscript;
    },
    {
-     sessionDurationMs: 30000,
-     interimSaveIntervalMs: 5000,
+     continueOnSilence: true, // manual-stop mode — swap to false for silence auto-stop
+     // silenceThresholdMs: 1500, // only used when continueOnSilence=false
+     onInterimTranscript: (liveText) => {
+       // Streams continuously — even across silent session rotations.
+       liveEl.textContent = liveText;
+     },
    }
  );
 
- // Listen for individual words
+ // Optional: word-by-word stream of the live transcript
  stt.setWordsUpdateCallback((words) => {
-   console.log("Heard words:", words);
+   console.log("Words so far:", words);
  });
 
- // Detect speech start/end
+ // Optional: rough VAD based on Web Speech interim/final transitions
  stt.setVadCallbacks(
    () => console.log("User started speaking"),
    () => console.log("User stopped speaking")
  );
 
- // Start listening
+ // Start listening — silent restarts happen under the hood if Web Speech
+ // ends its session; you do nothing.
  stt.start();
 
- // Stop after 10 seconds
- setTimeout(() => {
-   stt.stop();
-   console.log("Final transcript:", stt.getFullTranscript());
- }, 10000);
+ // Stop whenever the user decides. Final transcript arrives via onTranscript.
+ stopButton.addEventListener("click", () => stt.stop());
 
  // Cleanup on page unload
  window.addEventListener("beforeunload", () => stt.destroy());
@@ -498,22 +654,22 @@ async function init() {
    tts = new TTSLogic({ voiceId: "en_US-hfc_female-medium" });
    await tts.initialize();
 
-   // Initialize STT
+   // Initialize STT in silence-triggered mode — the library itself decides
+   // when the user is done and fires `onTranscript` automatically.
    stt = new STTLogic(
      (msg, level) => console.log(`[STT] ${msg}`),
-     (transcript) => console.log("Transcript:", transcript),
-     { sessionDurationMs: 60000 }
-   );
-
-   // Process speech when user stops talking
-   stt.setVadCallbacks(
-     () => console.log("Listening..."),
-     async () => {
-       const transcript = stt.getFullTranscript();
-       if (transcript.trim().length > 3) {
-         await processSpeech(transcript);
-         stt.clearTranscript();
+     async (finalTranscript) => {
+       // Fires once per turn, when silence >= silenceThresholdMs is detected.
+       if (finalTranscript.trim().length > 3) {
+         await processSpeech(finalTranscript);
        }
+       stt.clearTranscript();
+       stt.start(); // start next turn
+     },
+     {
+       continueOnSilence: false,
+       silenceThresholdMs: 1500,
+       onInterimTranscript: (live) => (liveCaption.textContent = live),
      }
    );
  }
@@ -569,9 +725,140 @@ function stop() {
  }
  ```
 
+ ## Unified Speech Service
+
+ `createSpeechService()` wires STT and TTS together so you need fewer imports and no manual callback plumbing.
+
+ ```ts
+ import { createSpeechService } from "speech-to-speech";
+
+ const service = createSpeechService();
+
+ // 1. Set up STT
+ service.initializeSTT({
+   onTranscript: (text) => console.log("Final:", text),
+   onInterimTranscript: (text) => setLiveCaption(text), // real-time display
+   onWordsUpdate: (words) => console.log("Words so far:", words),
+   onStatusChange: (type, data) => {
+     if (type === "speaking") setUserSpeaking(data as boolean);
+   },
+ });
+
+ // 2. Set up TTS (awaitable)
+ await service.initializeTTS({ voiceId: "en_US-hfc_female-medium" });
+
+ // 3. Start session
+ service.startListening();
+ await service.speak("Hello, how can I help you?");
+
+ // 4. End session
+ const transcript = service.stopListening();
+ service.stopSpeaking();
+ ```
+
+ ---
+
+ ## Interim Transcript Streaming
+
+ Get real-time partial results while the user is still speaking. `onInterimTranscript` fires on **every** recognition update (both interim and final results) with the full live transcript — including the text committed from prior silent session rotations — so you can render a continuously-growing caption without any gaps when the browser rotates the underlying Web Speech session.
+
+ Pass `onInterimTranscript` directly to `initializeSTT()`:
+
+ ```ts
+ import { createSpeechService } from "speech-to-speech";
+
+ const service = createSpeechService();
+
+ service.initializeSTT({
+   onTranscript: (finalText) => console.log("Final:", finalText),
+   onInterimTranscript: (liveText) => {
+     // Full live text: committed sessions + current-session finals + in-flight partial.
+     // Never empties mid-session due to Web Speech's internal timeouts.
+     liveCaption.textContent = liveText;
+   },
+ });
+
+ await service.initializeTTS({ voiceId: "en_US-hfc_female-medium" });
+ service.startListening();
+ ```
+
+ ---
+
+ ## TTS Warmup
+
+ Call `prefetchTTSModel()` early in your app boot (e.g. after page load) so the first `speak()` call has no cold-start delay:
+
+ ```ts
+ import { prefetchTTSModel } from "speech-to-speech";
+
+ // Fire-and-forget — safe to call before the user interacts
+ prefetchTTSModel("en_US-hfc_female-medium");
+
+ // Later, when the user actually triggers speech:
+ const tts = new TTSLogic({ voiceId: "en_US-hfc_female-medium" });
+ await tts.initialize(); // instant — model already cached
+ ```
+
+ ---
+
+ ## Browser Compatibility Check
+
+ Gate your UI before attempting to start STT or TTS:
+
+ ```ts
+ import { getCompatibilityInfo } from "speech-to-speech";
+
+ const { stt, tts, browser } = getCompatibilityInfo();
+
+ if (!stt) {
+   showBanner(`Speech input is not supported in ${browser}. Please use Chrome or Edge.`);
+ }
+ if (!tts) {
+   showBanner("Text-to-speech is not supported in this browser.");
+ }
+ ```
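
Editor's sketch (not part of the diffed README): for readers wondering what a check like this might look at, here is a minimal feature-detection function. The diff does not show how `getCompatibilityInfo()` decides its flags, so the criteria below are assumptions: speech input is taken to need a `SpeechRecognition` or `webkitSpeechRecognition` constructor, and Piper-based synthesis is taken to need both an audio context and WebAssembly. `detectSpeechSupport` and `SupportInfo` are hypothetical names.

```typescript
interface SupportInfo {
  stt: boolean;
  tts: boolean;
}

// `env` is any object shaped like the browser global; in a page a caller
// would pass the window object (with a cast). Illustrative only.
function detectSpeechSupport(env: Record<string, unknown>): SupportInfo {
  // Assumption: STT needs the (possibly prefixed) Web Speech constructor.
  const stt =
    typeof env["SpeechRecognition"] === "function" ||
    typeof env["webkitSpeechRecognition"] === "function";

  // Assumption: neural TTS needs Web Audio for playback and WebAssembly for inference.
  const hasAudio =
    typeof env["AudioContext"] === "function" ||
    typeof env["webkitAudioContext"] === "function";
  const wasm = env["WebAssembly"];
  const hasWasm = typeof wasm === "object" && wasm !== null;

  return { stt, tts: hasAudio && hasWasm };
}
```

Taking the global as a parameter keeps the function pure, so it can be exercised with plain objects outside a browser.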
+
+ ---
+
+ ## Text Cleanup for TTS
+
+ Strip HTML, Markdown, and emoji from LLM responses before passing them to synthesis:
+
+ ```ts
+ import { cleanTextForTTS } from "speech-to-speech";
+
+ const raw = "**Hello** <b>world</b>! Here's a [link](https://example.com) 🎉";
+ const spoken = cleanTextForTTS(raw);
+ // → "Hello world Here's a link"
+
+ // Or opt-out of individual steps:
+ const spoken2 = cleanTextForTTS(raw, { removeEmojis: false });
+ // → "Hello world Here's a link 🎉"
+ ```
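
Editor's sketch (not part of the diffed README): the three stripping steps named above can be approximated with a few regular expressions. This is a hypothetical stand-in, not `cleanTextForTTS` itself; `stripForSpeech` and `keepEmoji` are invented names, the emoji pattern only covers common surrogate-pair emoji, and unlike the README's sample output this version leaves punctuation such as `!` in place.

```typescript
interface StripOptions {
  keepEmoji?: boolean; // hypothetical option, loosely mirroring removeEmojis: false
}

// Illustrative cleanup pipeline; the library's real rules are not shown in the diff.
function stripForSpeech(input: string, options: StripOptions = {}): string {
  let out = input;
  out = out.replace(/<[^>]+>/g, ""); // drop HTML tags, keep their inner text
  out = out.replace(/\[([^\]]+)\]\([^)]*\)/g, "$1"); // [label](url) -> label
  out = out.replace(/[*`]+/g, ""); // emphasis and inline-code markers
  if (!options.keepEmoji) {
    // Surrogate pairs in the main emoji blocks (U+1F000 to U+1FBFF).
    out = out.replace(/[\uD83C-\uD83E][\uDC00-\uDFFF]/g, "");
  }
  return out.replace(/\s+/g, " ").trim(); // collapse leftover whitespace
}

const sample = "**Hello** <b>world</b>! Here's a [link](https://example.com) 🎉";
console.log(stripForSpeech(sample)); // Hello world! Here's a link
console.log(stripForSpeech(sample, { keepEmoji: true })); // Hello world! Here's a link 🎉
```

Running the link rule before the marker rule matters: it lets the label survive while the URL, which may itself contain `*` or backticks, is discarded whole.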
+
+ ---
+
+ ## Audio Player Status Callbacks
+
+ React to playback state changes without polling:
+
+ ```ts
+ import { sharedAudioPlayer } from "speech-to-speech";
+
+ sharedAudioPlayer.setStatusCallback((status) => {
+   console.log("[TTS]", status); // e.g. "Playing audio chunk 1"
+ });
+
+ sharedAudioPlayer.setPlayingChangeCallback((isPlaying) => {
+   setTTSIndicator(isPlaying); // show/hide a speaking indicator in UI
+ });
+ ```
+
+ ---
+
  ## Available Piper Voices
 
- Voice models are downloaded automatically from CDN on first use (~20-80MB per voice).
+ Voice models are downloaded automatically from CDN on first use (~20-80MB per voice). WASM files (~9MB) are cached automatically and reused across all voices.
 
  | Voice ID | Language | Description |
  | ------------------------- | ------------ | ------------------------------ |
@@ -602,7 +889,8 @@ See [Piper Voices](https://rhasspy.github.io/piper-samples/) for the complete li
  | Issue | Solution |
  | -------------------- | --------------------------------------------------------------------------------------- |
  | "Voice not found" | Check voice ID spelling. Use `en_US-hfc_female-medium` for testing. |
- | Slow first synthesis | Normal - voice model (~20MB) downloads on first use. Subsequent calls use cached model. |
+ | Slow first synthesis | Normal - voice model (~20MB) and WASM files (~9MB) download on first use. Subsequent calls use cached assets. |
+ | Repeated WASM downloads | Ensure `enableWasmCache: true` (default). Check browser Cache API support. |
  | No audio output | Ensure browser supports Web Audio API. Check volume and audio permissions. |
  | CORS errors | Ensure Vite config has proper COOP/COEP headers (see above). |
 
@@ -612,7 +900,9 @@ See [Piper Voices](https://rhasspy.github.io/piper-samples/) for the complete li
  | ---------------------------------- | ------------------------------------------------------------------------------------------ |
  | "Speech Recognition not supported" | Use Chrome, Safari, or Edge. Firefox doesn't support Web Speech API. |
  | No transcript | Check microphone permissions. Ensure `stt.start()` was called. |
- | Transcript stops | Browser sessions timeout after ~30s. Library auto-restarts, but check `sessionDurationMs`. |
+ | Transcript stops | The library silently restarts the recognition session whenever the browser ends it — nothing to configure. Open DevTools and look for the `[STT] 🔴 … 🟢` log pair to confirm a rotation happened. |
+ | `onTranscript` never fires | In `continueOnSilence: true` (default) it only fires on `stt.stop()`. Call `stop()` to get the final transcript, or switch to `continueOnSilence: false` + `silenceThresholdMs` for automatic delivery. |
+ | Duplicated words in final | Fixed in v0.1.5. If you still see duplicates, ensure you are on ≥ 0.1.5 — the old `sessionDurationMs` / `interimSaveIntervalMs` timer path no longer runs. |
 
  ### Dev Server Issues (Vite)
 
@@ -647,6 +937,37 @@ npm run clean # Remove dist/
  - **[Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API)** - Browser speech recognition
  - **[Web Audio API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API)** - Audio processing
 
+ ## Changelog
+
+ ### v0.1.5
+
+ - **`STTLogic` — silent session rotation.** Web Speech's internal session end (the ~60s browser timeout, error retries, any spontaneous `end` event) now triggers a fully-silent restart: the library commits the current session into an in-memory transcript and starts a fresh recognition session. `onTranscript` is **not** emitted during rotations, so the consumer sees one uninterrupted listening session.
+ - **`STTLogic` — dedup-safe transcript model.** The previous `results` concatenation + `collapseRepeats` safety net is replaced by a high-water-mark (`processedFinalCount`) that ingests each `isFinal` result exactly once. This eliminates the duplicate-word/line artifacts that could previously appear in the final transcript.
+ - **`STTLogic` — new option `continueOnSilence` (default `true`).**
+   - `true` → manual-stop mode. `onTranscript` fires only when the consumer calls `stt.stop()`.
+   - `false` → silence-triggered mode. `onTranscript` fires (and listening auto-stops) when the user has been silent for `silenceThresholdMs`.
+ - **`STTLogic` — new option `silenceThresholdMs` (default `1500`).** Silence window used when `continueOnSilence: false`.
+ - **`onInterimTranscript`** now fires on every recognition update (interim AND final), and always includes the committed transcript from prior silent rotations — UI captions stay gap-free.
+ - **Deprecated options (accepted for backward compatibility, now no-ops):** `sessionDurationMs`, `interimSaveIntervalMs`. Session rotation is browser-driven, not timer-driven.
+ - **Observability.** `STTLogic` emits colored `[STT]` console markers on session end, silent restart, and explicit stop, so you can verify behaviour from DevTools without any extra wiring.
+
+ ### v0.1.4
+
+ - **`createSpeechService()`** — Unified service wrapper that wires STT + TTS together with a single ergonomic API. Supports `initializeSTT`, `initializeTTS`, `startListening`, `stopListening`, `speak`, `stopSpeaking`, and `getCompatibilityInfo`.
+ - **`onInterimTranscript`** — New option in `STTLogic` (and `createSpeechService().initializeSTT()`) to receive real-time partial transcript updates while the user is still speaking.
+ - **`prefetchTTSModel(voiceId)`** — Pre-warm a Piper voice early in app boot to eliminate cold-start latency on the first `speak()` call.
+ - **`getCompatibilityInfo()`** — Returns `{ stt, tts, browser }` for browser feature detection and UI gating.
+ - **`cleanTextForTTS(text, options?)`** — Strips HTML, Markdown, and emoji from text before synthesis. Options: `stripHtml`, `stripMarkdown`, `removeEmojis` (all default `true`).
+
+ ### v0.1.3
+
+ - Automatic WASM caching via the browser Cache API — `piper_phonemize.data` (~9MB) and `piper_phonemize.wasm` are fetched once and reused across sessions.
+ - `ensureWasmCached`, `isWasmCached`, `clearWasmCache` utility functions.
+ - `enableWasmCache` and `wasmPaths` options on `TTSLogic` for self-hosted WASM.
+ - Speech-aware audio player — queue automatically pauses while the user is speaking.
+
+ ---
+
  ## License
 
  MIT