voice-router-dev 0.8.6 → 0.8.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -5,6 +5,71 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [0.8.7] - 2026-04-18
+
+ ### Added
+
+ #### Speechmatics: Real-Time Streaming (`transcribeStream()`)
+
+ Speechmatics now supports WebSocket-based real-time transcription via `wss://{region}.rt.speechmatics.com/v2`. The adapter follows the same pattern as the Deepgram/Gladia/AssemblyAI streaming adapters.
+
+ **Protocol flow:**
+ 1. Connect with an `Authorization: Bearer` header
+ 2. Send a `StartRecognition` JSON message with `audio_format` + `transcription_config`
+ 3. Wait for the `RecognitionStarted` acknowledgment
+ 4. Stream binary audio frames via `sendAudio()`
+ 5. Receive `AddPartialTranscript` (partials) and `AddTranscript` (finals)
+ 6. `EndOfUtterance` boundaries trigger the `onUtterance()` callback
+ 7. Send `EndOfStream`, then wait for `EndOfTranscript` for a clean shutdown
+
+ **Streaming options** (`speechmaticsStreaming`): `encoding`, `sampleRate`, `language`, `domain`, `operatingPoint`, `maxDelay`, `maxDelayMode`, `enablePartials`, `enableEntities`, `diarization`, `maxSpeakers`, `additionalVocab`, `conversationConfig`, `region`.
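A minimal sketch of step 2 of the protocol flow, built from a subset of these options. The `message`, `audio_format`, and `transcription_config` keys come from the flow described above; the defaults and the exact option-to-field mapping are illustrative assumptions, not the adapter's actual code:

```typescript
// Sketch: build the StartRecognition text frame sent after the socket opens.
// Only a few of the speechmaticsStreaming options are mapped here.
interface SpeechmaticsStreamingSketch {
  encoding?: string;
  sampleRate?: number;
  language?: string;
  enablePartials?: boolean;
  maxDelay?: number;
}

function buildStartRecognition(opts: SpeechmaticsStreamingSketch): string {
  return JSON.stringify({
    message: "StartRecognition",
    audio_format: {
      type: "raw",
      encoding: opts.encoding ?? "pcm_s16le", // assumed default
      sample_rate: opts.sampleRate ?? 16000,  // assumed default
    },
    transcription_config: {
      language: opts.language ?? "en",
      enable_partials: opts.enablePartials ?? true,
      max_delay: opts.maxDelay, // dropped from the JSON when undefined
    },
  });
}
```

The adapter would send this frame first and hold audio frames until `RecognitionStarted` arrives (step 3).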
+
+ **Type changes:**
+ - `SpeechmaticsCapabilities.streaming` is now `true`, so Speechmatics is included in `StreamingProviderType`
+ - `SpeechmaticsStreamingOptions` added to the `ProviderStreamingOptions` union and the `StreamingOptionsForProvider<P>` conditional type
+ - `StreamingOptions.speechmaticsStreaming` field added
+
+ ### Fixed
+
+ #### Soniox: Fix Streaming WebSocket Initialization
+
+ Three bugs in the Soniox streaming adapter:
+
+ | Bug | Before (broken) | After (fixed) |
+ |-----|-----------------|---------------|
+ | **Init message** | Config sent as URL query params | JSON text frame sent after `ws.onopen` (Soniox requires the first frame to be JSON) |
+ | **Default model** | `stt-rt-preview` (deprecated/removed) | `stt-rt-v4` |
+ | **Close detection** | 1s threshold for early-close detection | 5s threshold (Soniox takes ~3s to close) |
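The close-detection change is just a threshold, but the heuristic is worth spelling out. A sketch, with the helper name assumed:

```typescript
// A close event arriving shortly after connect usually means the handshake
// (or the init frame) was rejected, rather than a normal shutdown. The
// threshold moves from 1s to 5s because Soniox takes ~3s to close cleanly,
// so a 1s cutoff misclassified healthy shutdowns as failures.
const EARLY_CLOSE_THRESHOLD_MS = 5_000; // was 1_000

function isEarlyClose(connectedAtMs: number, closedAtMs: number): boolean {
  return closedAtMs - connectedAtMs < EARLY_CLOSE_THRESHOLD_MS;
}
```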
+
+ The JSON init frame now includes `api_key`, `model`, `audio_format`, `sample_rate`, `num_channels`, and all optional config (diarization, language hints, context, etc.).
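A sketch of that first text frame, using the field names listed above. The defaults and the `overrides` shape are assumptions for illustration:

```typescript
// First WebSocket frame: a JSON text frame sent after ws.onopen, replacing
// the broken URL-query-param configuration.
interface SonioxInitSketch {
  api_key: string;
  model: string;
  audio_format: string;
  sample_rate: number;
  num_channels: number;
}

function buildSonioxInit(
  apiKey: string,
  overrides: Partial<Omit<SonioxInitSketch, "api_key">> = {},
): string {
  const init: SonioxInitSketch = {
    api_key: apiKey,
    model: "stt-rt-v4",        // new default; stt-rt-preview was removed upstream
    audio_format: "pcm_s16le", // assumed default
    sample_rate: 16000,        // assumed default
    num_channels: 1,           // assumed default
    ...overrides,
  };
  return JSON.stringify(init);
}
```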
+
+ #### Speechmatics: Fix Content-Type for URL-Based Batch Transcription
+
+ Speechmatics `POST /v2/jobs` always requires `multipart/form-data`, but the URL-input path was sending a JSON body with `Content-Type: application/json`, causing HTTP 400 errors.
+
+ The `config` field is now sent as a FormData field for both URL and file inputs. Also fixed the file-upload path to properly convert `Buffer` to `Blob` before appending to FormData (a pre-existing type error).
+
+ #### Soniox: Migrate to Current Async Transcription API
+
+ The batch transcription adapter was using the old `/speech/transcribe` endpoint, which no longer exists (HTTP 404). Soniox migrated to an async, job-based API.
+
+ | Operation | Before (broken) | After (fixed) |
+ |-----------|-----------------|---------------|
+ | **Create job (URL)** | `POST /speech/transcribe` (JSON) | `POST /transcriptions` (JSON with `audio_url`) |
+ | **Create job (file)** | `POST /speech/transcribe` (multipart) | `POST /files` → `POST /transcriptions` with `file_id` |
+ | **Get result** | `GET /speech/transcripts/{id}` | `GET /transcriptions/{id}` (status) + `GET /transcriptions/{id}/transcript` (result) |
+ | **Flow** | Synchronous (immediate result) | Async with `pollForCompletion()` |
+
+ `normalizeResponse` was updated to handle batch transcript tokens (there is no `is_final` field; all tokens are final) and to read `audio_duration_ms` from job metadata.
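The whole async flow for the URL path can be sketched as follows. The endpoints are the ones in the table above; the base URL, the `status` values, and the response shapes are assumptions, and a real implementation would put a timeout on the poll loop:

```typescript
const SONIOX_BASE = "https://api.soniox.com/v1"; // assumed base URL

async function transcribeUrlSketch(
  apiKey: string,
  audioUrl: string,
  fetchFn: typeof fetch = fetch,
): Promise<unknown> {
  const headers = { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" };

  // 1. Create the job: POST /transcriptions with audio_url.
  const created = await fetchFn(`${SONIOX_BASE}/transcriptions`, {
    method: "POST",
    headers,
    body: JSON.stringify({ audio_url: audioUrl }),
  });
  const { id } = (await created.json()) as { id: string };

  // 2. pollForCompletion: GET /transcriptions/{id} until a terminal status.
  for (;;) {
    const statusRes = await fetchFn(`${SONIOX_BASE}/transcriptions/${id}`, { headers });
    const job = (await statusRes.json()) as { status: string };
    if (job.status === "completed") break;
    if (job.status === "error") throw new Error("Soniox transcription failed");
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }

  // 3. Fetch the result: GET /transcriptions/{id}/transcript.
  const transcriptRes = await fetchFn(`${SONIOX_BASE}/transcriptions/${id}/transcript`, { headers });
  return transcriptRes.json();
}
```

Injecting `fetchFn` keeps the sketch testable without a live endpoint; the file path would differ only in step 1 (`POST /files`, then `POST /transcriptions` with the returned `file_id`).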
64
+
65
+ **No breaking changes for consumers.** The adapter's public API (`transcribe()`, `getTranscript()`) is unchanged.
66
+
67
+ #### Azure STT: Add Utterance Extraction to Batch Transcription
68
+
69
+ Azure batch transcription had words with speaker labels but wasn't building utterances from them. Now uses `buildUtterancesFromWords()` to group speaker-labeled words into utterances, matching all other adapters.
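The grouping that `buildUtterancesFromWords()` performs can be sketched like this. The real shared helper may also split on long pauses; the types and the split rule here are simplified assumptions:

```typescript
interface WordSketch { text: string; speaker: string; start: number; end: number }
interface UtteranceSketch { speaker: string; text: string; start: number; end: number }

// Collapse consecutive words carrying the same speaker label into one
// utterance spanning the first word's start to the last word's end.
function buildUtterancesSketch(words: WordSketch[]): UtteranceSketch[] {
  const utterances: UtteranceSketch[] = [];
  for (const word of words) {
    const last = utterances[utterances.length - 1];
    if (last && last.speaker === word.speaker) {
      last.text += ` ${word.text}`;
      last.end = word.end;
    } else {
      utterances.push({ speaker: word.speaker, text: word.text, start: word.start, end: word.end });
    }
  }
  return utterances;
}
```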
+
+ ---
+
  ## [0.8.6] - 2026-04-15
 
  ### Changed