@absolutejs/voice 0.0.21 → 0.0.22-beta.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (39)

package/README.md +499 -2
package/dist/angular/index.js +90 -0
package/dist/angular/voice-controller.service.d.ts +6 -0
package/dist/angular/voice-stream.service.d.ts +6 -0
package/dist/client/actions.d.ts +41 -0
package/dist/client/audioPlayer.d.ts +40 -0
package/dist/client/duplex.d.ts +3 -0
package/dist/client/htmxBootstrap.js +84 -0
package/dist/client/index.d.ts +2 -0
package/dist/client/index.js +507 -5
package/dist/correction.d.ts +18 -1
package/dist/fileStore.d.ts +27 -0
package/dist/index.d.ts +12 -1
package/dist/index.js +2425 -33
package/dist/ops.d.ts +100 -0
package/dist/react/index.js +86 -0
package/dist/react/useVoiceController.d.ts +6 -0
package/dist/react/useVoiceStream.d.ts +6 -0
package/dist/routing.d.ts +3 -0
package/dist/runtimeOps.d.ts +23 -0
package/dist/svelte/index.js +84 -0
package/dist/telephony/response.d.ts +7 -0
package/dist/telephony/twilio.d.ts +116 -0
package/dist/testing/benchmark.d.ts +59 -4
package/dist/testing/corrected.d.ts +41 -0
package/dist/testing/duplex.d.ts +59 -0
package/dist/testing/fixtures.d.ts +18 -2
package/dist/testing/index.d.ts +5 -0
package/dist/testing/index.js +4940 -307
package/dist/testing/review.d.ts +143 -0
package/dist/testing/sessionBenchmark.d.ts +25 -0
package/dist/testing/stt.d.ts +2 -1
package/dist/testing/telephony.d.ts +70 -0
package/dist/testing/tts.d.ts +73 -0
package/dist/types.d.ts +290 -3
package/dist/vue/index.js +90 -0
package/dist/vue/useVoiceController.d.ts +11 -0
package/dist/vue/useVoiceStream.d.ts +11 -0
package/package.json +115 -1

package/README.md CHANGED

@@ -37,6 +37,13 @@ const app = new Elysia()
 		voice({
 			path: '/voice',
 			preset: 'guided-intake',
+			lexicon: [
+				{
+					text: 'AbsoluteJS',
+					aliases: ['absoloot js'],
+					pronunciation: 'ab-so-lute jay ess'
+				}
+			],
 			phraseHints: [
 				{ text: 'AbsoluteJS', aliases: ['absolute js'] },
 				{ text: 'Joe Johnston', aliases: ['joe johnson'] }
@@ -66,11 +73,283 @@ const app = new Elysia()
 `createVoiceMemoryStore()` is dev-only. Real deployments should provide a shared store backed by Redis, Postgres, or equivalent.
+## TTS
+`@absolutejs/voice` now supports optional assistant audio streaming on the same session path. If you provide a `tts` adapter, `assistantText` responses are still sent as text, and the synthesized PCM chunks are streamed as `audio` messages alongside them.
+```ts
+import { voice, createVoiceMemoryStore } from '@absolutejs/voice';
+import { deepgram } from '@absolutejs/voice-deepgram';
+import { elevenlabs } from '@absolutejs/voice-elevenlabs';
+app.use(
+	voice({
+		path: '/voice',
+		session: createVoiceMemoryStore(),
+		stt: deepgram({
+			apiKey: process.env.DEEPGRAM_API_KEY!,
+			model: 'flux-general-en'
+		}),
+		tts: elevenlabs({
+			apiKey: process.env.ELEVENLABS_API_KEY!,
+			voiceId: process.env.ELEVENLABS_VOICE_ID!
+		}),
+		onTurn: async ({ turn }) => ({
+			assistantText: `You said: ${turn.text}`
+		}),
+		onComplete: async () => {}
+	})
+);
+```
+Client state now exposes `assistantAudio` on the stream/controller helpers, so apps can buffer or play synthesized chunks without inventing a second transport.
+If you want a minimal browser playback path, use the client audio player:
+```ts
+import {
+	createVoiceAudioPlayer,
+	createVoiceController
+} from '@absolutejs/voice/client';
+const voice = createVoiceController('/voice', {
+	preset: 'chat'
+});
+const player = createVoiceAudioPlayer(voice);
+await player.start(); // call from a user gesture
+await player.interrupt(); // flush queued assistant playback for barge-in
+```
+`createVoiceAudioPlayer()` subscribes to `assistantAudio`, decodes raw `pcm_s16le` chunks, and queues them in WebAudio. It also exposes `interrupt()`, `lastInterruptLatencyMs`, and `lastPlaybackStopLatencyMs` so apps can flush assistant playback during barge-in and inspect how long it took for queued playback to fully stop.
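Reviewer sketch (not part of the published diff): the `pcm_s16le` decoding step mentioned above can be illustrated as follows. `decodePcmS16le` is a hypothetical helper name, not an export of `@absolutejs/voice`, and the package's real decoder may differ; this only shows the general technique of turning little-endian signed 16-bit bytes into the float samples WebAudio buffers expect.

```ts
// Hypothetical helper, not the package's implementation.
// Converts raw little-endian signed 16-bit PCM bytes into Float32 samples in [-1, 1).
function decodePcmS16le(bytes: Uint8Array): Float32Array {
  // Each sample is two bytes; a trailing odd byte is ignored.
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // `true` selects little-endian byte order, matching pcm_s16le.
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

The resulting `Float32Array` is the shape you would copy into an `AudioBuffer` channel before scheduling playback.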
+For a higher-level client path, use the duplex helper:
+```ts
+import { createVoiceDuplexController } from '@absolutejs/voice/client';
+const voice = createVoiceDuplexController('/voice', {
+	bargeIn: {
+		interruptThreshold: 0.08
+	},
+	preset: 'chat'
+});
+await voice.audioPlayer.start();
+await voice.startRecording();
+```
+`createVoiceDuplexController()` composes the controller and audio player and automatically interrupts assistant playback when:
+- microphone input crosses the configured barge-in threshold
+- partial user speech starts arriving
+- manual `sendAudio(...)` is called while assistant audio is playing
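Reviewer sketch (not part of the published diff): the first trigger above, an input level crossing the barge-in threshold, might look roughly like this. The README does not say how the package measures input level, so the RMS metric and the names `rmsLevel` and `shouldInterrupt` are assumptions made purely for illustration; only the `0.08` value comes from the `interruptThreshold` example above.

```ts
// Hypothetical helpers; the package's actual level metric is not documented here.
// Root-mean-square level of a frame of float samples in [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Interrupt only while assistant audio is playing and the mic level reaches the threshold.
function shouldInterrupt(
  micFrame: Float32Array,
  assistantPlaying: boolean,
  interruptThreshold: number = 0.08
): boolean {
  return assistantPlaying && rmsLevel(micFrame) >= interruptThreshold;
}
```

A lower threshold interrupts sooner but is more likely to fire on background noise, which is presumably why the option is configurable.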
+## Duplex Benchmarks
+The first duplex benchmark lane measures package-level barge-in interruption on the client path. It records scenario pass/fail plus local interruption latency for:
+- manual `sendAudio(...)`
+- partial transcript start
+- input-level threshold crossing
+Run it with:
+```bash
+bun run bench:duplex
+```
+That writes:
+- `benchmark-results/duplex-barge-in.json`
+## Telephony
+`@absolutejs/voice` now includes a first PSTN bridge layer for Twilio Media Streams. It converts inbound `audio/x-mulaw` 8 kHz frames into the PCM format the voice session expects, and converts assistant PCM audio back into outbound Twilio media events.
+Minimal usage:
+```ts
+import {
+  createTwilioMediaStreamBridge,
+  createTwilioVoiceResponse,
+  createVoiceMemoryStore
+} from '@absolutejs/voice';
+import { deepgram } from '@absolutejs/voice-deepgram';
+import { elevenlabs } from '@absolutejs/voice-elevenlabs';
+const twiml = createTwilioVoiceResponse({
+  streamUrl: 'wss://example.com/voice/twilio',
+  parameters: {
+    sessionId: 'call-123',
+    scenarioId: 'phone-intake'
+  },
+  track: 'both_tracks'
+});
+const bridge = createTwilioMediaStreamBridge(twilioSocket, {
+  context: {},
+  onComplete: async () => {},
+  onTurn: async ({ turn }) => ({
+    assistantText: `You said: ${turn.text}`
+  }),
+  session: createVoiceMemoryStore(),
+  stt: deepgram({
+    apiKey: process.env.DEEPGRAM_API_KEY!,
+    model: 'flux-general-en'
+  }),
+  tts: elevenlabs({
+    apiKey: process.env.ELEVENLABS_API_KEY!,
+    voiceId: process.env.ELEVENLABS_VOICE_ID!
+  })
+});
+await bridge.handleMessage(startMessageFromTwilio);
+await bridge.handleMessage(mediaMessageFromTwilio);
+```
+The bridge also sends Twilio `clear` events on new inbound media after assistant audio has started streaming, so telephony barge-in can stop queued outbound playback.
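Reviewer sketch (not part of the published diff): the inbound half of the conversion described in this section, `audio/x-mulaw` to 16-bit PCM, follows the standard G.711 mu-law expansion. The function names below are hypothetical and the bridge's real code is not shown in this diff; resampling from 8 kHz to whatever rate the session expects is deliberately left out.

```ts
// Hypothetical helpers illustrating standard G.711 mu-law expansion.
// Expands one mu-law byte into a signed 16-bit linear PCM sample.
function decodeMuLawSample(byte: number): number {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // 0x84 (132) is the standard mu-law bias.
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Expands a whole frame of mu-law bytes (for example one decoded media payload).
function decodeMuLawFrame(frame: Uint8Array): Int16Array {
  const pcm = new Int16Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    pcm[i] = decodeMuLawSample(frame[i]);
  }
  return pcm;
}
```

The outbound direction is the inverse compression step applied to assistant PCM before it is wrapped in a media event.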
+You can benchmark the package-level Twilio bridge path with:
+```bash
+bun run bench:telephony:run
+```
+That writes:
+- `benchmark-results/telephony-twilio-bridge.json`
+- `benchmark-results/telephony-run-manifest.json`
+For a live vendor-backed duplex smoke benchmark on the real TTS adapters, run:
+```bash
+bun run bench:duplex:live:run
+```
+That writes fresh results to:
+- `benchmark-results/duplex-live-elevenlabs.json`
+- `benchmark-results/duplex-live-openai.json`
+- `benchmark-results/duplex-live-all.json`
+- `benchmark-results/duplex-live-run-manifest.json`
+For a live vendor-backed telephony smoke benchmark through the Twilio bridge path, run:
+```bash
+bun run bench:telephony:live:run
+```
+That writes:
+- `benchmark-results/telephony-live-deepgram-elevenlabs.json`
+- `benchmark-results/telephony-live-run-manifest.json`
+For a repeated live telephony stability read, run:
+```bash
+bun run bench:telephony:live:series
+```
+That writes:
+- `benchmark-results/telephony-live-series-summary-runs-3.json`
+For a live Deepgram telephony model shootout on the same PSTN path, run:
+```bash
+bun run bench:telephony:live:shootout
+```
+That writes:
+- `benchmark-results/telephony-live-flux-general-en.json`
+- `benchmark-results/telephony-live-nova-3-phone.json`
+- `benchmark-results/telephony-live-shootout-manifest.json`
+For a browser-run duplex benchmark that uses a real headless Chrome `AudioContext` instead of the fake Node-side playback context, run:
+```bash
+bun run bench:duplex:browser:run
+```
+That writes fresh results to:
+- `benchmark-results/duplex-browser-elevenlabs.json`
+- `benchmark-results/duplex-browser-openai.json`
+- `benchmark-results/duplex-browser-all.json`
+- `benchmark-results/duplex-browser-run-manifest.json`
+To measure browser duplex stability across repeated runs, use:
+```bash
+bun run bench:duplex:browser:series
+```
+That writes:
+- `benchmark-results/duplex-browser-series-summary-runs-3.json`
+- per-run provider artifacts like `benchmark-results/duplex-browser-elevenlabs-series-run-1.json`
+For repeated interrupt-and-resume across several consecutive assistant turns, run:
+```bash
+bun run bench:duplex:browser:overlap:run
+```
+That writes:
+- `benchmark-results/duplex-browser-overlap-elevenlabs.json`
+- `benchmark-results/duplex-browser-overlap-openai.json`
+- `benchmark-results/duplex-browser-overlap-all.json`
+- `benchmark-results/duplex-browser-overlap-run-manifest.json`
+To measure overlap stability across repeated live browser runs, use:
+```bash
+bun run bench:duplex:browser:overlap:series
+```
+That writes:
+- `benchmark-results/duplex-browser-overlap-series-summary-runs-3.json`
+- per-run provider artifacts like `benchmark-results/duplex-browser-overlap-elevenlabs-series-run-1.json`
+## TTS Benchmarks
+`@absolutejs/voice` now includes a first TTS benchmark harness for streaming output adapters. The initial metrics are:
+- `firstAudioLatencyMs`
+- `elapsedMs`
+- `audioChunkCount`
+- `totalAudioBytes`
+- estimated PCM `audioDurationMs`
+- interruption responsiveness via `interruptionLatencyMs`
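Reviewer sketch (not part of the published diff): an "estimated PCM `audioDurationMs`" can be derived from `totalAudioBytes` once the output format is known. The helper name and the default of mono 16-bit audio are assumptions for illustration; the diff does not state which sample rate the harness assumes, so the rate is a required parameter here.

```ts
// Hypothetical helper; the harness's own estimate may use different defaults.
// Duration in milliseconds of uncompressed PCM audio of the given size.
function estimatePcmDurationMs(
  totalAudioBytes: number,
  sampleRateHz: number,
  channels: number = 1,
  bytesPerSample: number = 2 // 16-bit samples, as in pcm_s16le
): number {
  const bytesPerSecond = sampleRateHz * channels * bytesPerSample;
  return (totalAudioBytes / bytesPerSecond) * 1000;
}
```

For example, 48,000 bytes of mono 16-bit audio at 24 kHz works out to one second.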
+Run the full TTS suite with one command:
+```bash
+bun run bench:tts:run
+```
+That writes fresh results to:
+- `benchmark-results/tts-all.json`
+- `benchmark-results/tts-elevenlabs.json`
+- `benchmark-results/tts-openai.json`
+- `benchmark-results/tts-run-manifest.json`
+To measure interruption/cancel responsiveness separately:
+```bash
+bun run bench:tts:interrupt:run
+```
+That writes fresh interruption results to:
+- `benchmark-results/tts-all-interrupt.json`
+- `benchmark-results/tts-elevenlabs-interrupt.json`
+- `benchmark-results/tts-openai-interrupt.json`
+- `benchmark-results/tts-interrupt-run-manifest.json`
 ## Recommended Production Path
 The current best-performing path in the bundled benchmarks is:
 - `deepgram-flux` as primary STT
+- route-level `lexicon` for pronunciation/domain entries
 - route-level `phraseHints`
 - route-level `correctTurn` using `createPhraseHintCorrectionHandler()`
@@ -80,7 +359,9 @@ Minimal production-oriented example:
 ```ts
 import {
+	createVoiceSTTRoutingCorrectionHandler,
 	createPhraseHintCorrectionHandler,
+	resolveVoiceSTTRoutingStrategy,
 	voice
 } from '@absolutejs/voice';
 import { deepgram } from '@absolutejs/voice-deepgram';
@@ -89,6 +370,13 @@ app.use(
 	voice({
 		path: '/voice/intake',
 		preset: 'reliability',
+		lexicon: [
+			{
+				text: 'AbsoluteJS',
+				aliases: ['absoloot js'],
+				pronunciation: 'ab-so-lute jay ess'
+			}
+		],
 		phraseHints: [
 			{ text: 'AbsoluteJS', aliases: ['absolute js'] },
 			{ text: 'Joe Johnston', aliases: ['joe johnson'] },
@@ -113,6 +401,45 @@ app.use(
 `phraseHints` are user-controlled route config, not hidden framework magic. They are there so the app can teach the voice route its domain vocabulary.
+## Best Vs Cheap STT
+`@absolutejs/voice` now exposes an explicit package-level routing split so apps can choose between the strongest benchmarked path and a cheaper/raw path without inventing their own policy layer.
+```ts
+import {
+	createVoiceMemoryStore,
+	createVoiceSTTRoutingCorrectionHandler,
+	resolveVoiceSTTRoutingStrategy,
+	voice
+} from '@absolutejs/voice';
+import { deepgram } from '@absolutejs/voice-deepgram';
+const strategy = resolveVoiceSTTRoutingStrategy('best');
+app.use(
+	voice({
+		path: '/voice/stt',
+		preset: strategy.preset,
+		phraseHints: [{ text: 'Joe Johnston', aliases: ['joe johnson'] }],
+		correctTurn: createVoiceSTTRoutingCorrectionHandler(strategy.correctionMode),
+		session: createVoiceMemoryStore(),
+		sttLifecycle: strategy.sttLifecycle,
+		stt: deepgram({
+			apiKey: process.env.DEEPGRAM_API_KEY!,
+			model: 'flux-general-en'
+		})
+	})
+);
+```
+- `best` maps to the current strongest in-package path: Deepgram Flux plus generic deterministic correction.
+- `low-cost` maps to a cheaper/raw package path: one primary STT pass with no correction hook.
+- session benchmarks now include per-turn cost telemetry fields like `averageRelativeCostUnits`, `averagePrimaryAudioMs`, and `averageFallbackReplayAudioMs`.
+- use `bun run bench:stt:routing:run` to benchmark both in parallel and write fresh:
+  - `benchmark-results/sessions-best-stt-runs-3.json`
+  - `benchmark-results/sessions-cheap-stt-runs-3.json`
+  - `benchmark-results/stt-routing-run-manifest.json`
 ## Presets
 Voice now ships named runtime presets so apps can start from a useful baseline instead of hand-tuning silence and capture settings every time.
@@ -161,11 +488,13 @@ Presets are still overridable. If you need to tune for a specific route, layer `
 Presets are not the same thing as phrase hints:
 - presets tune framework-owned behavior like silence windows, reconnect defaults, and audio conditioning
+- `lexicon` tunes pronunciation-aware domain entries that should reach STT/TTS adapters directly
 - `phraseHints` tune app/domain vocabulary like company names, product names, legal phrases, or subscriber-specific jargon
 In practice:
 - use a preset to choose the runtime shape (`guided-intake`, `reliability`, `noisy-room`)
+- use `lexicon` when pronunciation matters and you want adapter-consumable entries
 - use `phraseHints` to teach the route what words matter for your business
 - use `correctTurn` when you want deterministic post-STT repair before the turn is committed
@@ -199,9 +528,51 @@ The controller helpers abstract the common browser boilerplate:
 They do not hide the underlying transport. You still choose the route path and preset explicitly.
-## Phrase Hints And Correction
+## Lexicon, Phrase Hints, And Correction
+`lexicon` is a route-level input for pronunciation-aware domain entries.
+It can be:
+- a static array for known names, products, and jargon
+- a resolver function when entries depend on the tenant, subscriber, or scenario
+```ts
+voice({
+	path: '/voice/intake',
+	lexicon: async ({ context }) => {
+		return [
+			{
+				text: 'AbsoluteJS',
+				aliases: ['absoloot js'],
+				pronunciation: 'ab-so-lute jay ess'
+			},
+			{
+				text: 'Eden Treaty',
+				aliases: ['eden tree tea'],
+				pronunciation: 'ee-den tree-tee'
+			}
+		];
+	},
+	session: createVoiceMemoryStore(),
+	stt: deepgram({
+		apiKey: process.env.DEEPGRAM_API_KEY!,
+		model: 'flux-general-en'
+	}),
+	onTurn: async ({ turn }) => ({
+		assistantText: turn.text
+	}),
+	onComplete: async () => {}
+});
+```
+How the package uses it:
+- adapters receive `lexicon` at open time and translate it into vendor-native hinting surfaces when possible
+- STT adapters can use the canonical text plus aliases to bias recognition
+- future TTS adapters can use the same entries for pronunciation-aware speech output
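Reviewer sketch (not part of the published diff): one plausible way an adapter could turn `lexicon` entries into a flat hint list is shown below. The `LexiconEntry` shape mirrors the fields used in the examples above (`text`, `aliases`, `pronunciation`); `lexiconToKeywords` is a hypothetical name, and real adapters may map entries onto vendor hinting surfaces quite differently.

```ts
// Entry shape inferred from the README examples; treat it as an assumption.
interface LexiconEntry {
  text: string;
  aliases?: string[];
  pronunciation?: string;
}

// Hypothetical helper: canonical text plus aliases, de-duplicated, in input order.
function lexiconToKeywords(entries: LexiconEntry[]): string[] {
  const keywords: string[] = [];
  for (const entry of entries) {
    const terms = [entry.text].concat(entry.aliases ?? []);
    for (const term of terms) {
      const trimmed = term.trim();
      if (trimmed !== '' && keywords.indexOf(trimmed) === -1) {
        keywords.push(trimmed);
      }
    }
  }
  return keywords;
}
```

The `pronunciation` field is ignored here because it is described above as relevant to speech output rather than recognition biasing.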
-`phraseHints` are a route-level input that the application owns.
+`phraseHints` are a separate route-level input that the application owns.
 They can be:
@@ -234,6 +605,7 @@ voice({
 How the package uses them:
+- adapters receive `lexicon` and `phraseHints` at open time
 - adapters receive `phraseHints` at open time and can translate them into vendor-native hinting surfaces
 - the correction layer can use the same hints after STT to repair domain terms before commit
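Reviewer sketch (not part of the published diff): the post-STT repair idea can be illustrated with a plain alias-to-canonical replacement. This is not the implementation of `createPhraseHintCorrectionHandler()`, whose internals are not visible in this diff; `applyPhraseHints` and `escapeRegExp` are hypothetical names, and the whole-word, case-insensitive matching rule is an assumption.

```ts
// Hint shape taken from the README examples.
interface PhraseHint {
  text: string;
  aliases?: string[];
}

// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical sketch: rewrite each alias (whole words, any casing) to its canonical text.
function applyPhraseHints(transcript: string, hints: PhraseHint[]): string {
  let corrected = transcript;
  for (const hint of hints) {
    for (const alias of hint.aliases ?? []) {
      const pattern = new RegExp('\\b' + escapeRegExp(alias) + '\\b', 'gi');
      // A replacer function avoids `$` sequences in hint.text being interpreted.
      corrected = corrected.replace(pattern, () => hint.text);
    }
  }
  return corrected;
}
```

Because the replacement is deterministic, the same transcript and hints always yield the same committed turn text.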
@@ -361,6 +733,11 @@ Use profiles to focus where you want to win:
 - `bun run bench:vs all` (default)
 - `bun run bench:vs all accents`
+- `bun run bench:vs all code-switch`
+- `bun run bench:vs all jargon`
+- `bun run bench:vs all multilingual`
+- `bun run bench:vs all multi-speaker`
+- `bun run bench:vs all telephony`
 - `bun run bench:vs all clean`
 - `bun run bench:vs all noisy`
 - `bun run bench:vs deepgram accents`
@@ -387,6 +764,21 @@ DEEPGRAM_MODEL=flux-general-en bun run bench:deepgram:accents
 DEEPGRAM_MODEL=nova-3 bun run bench:deepgram:accents
 ```
+To stress the STT path with synthesized narrowband phone audio:
+```bash
+bun run bench:telephony
+bun run bench:telephony:run
+bun run bench:deepgram:telephony
+bun run bench:deepgram:corrected:telephony
+bun run bench:jargon
+bun run bench:deepgram:jargon
+bun run bench:deepgram:corrected:audit:jargon
+bun run bench:multi-speaker:run
+bun run bench:multi-speaker:analyze
+bun run bench:deepgram:multi-speaker
+```
 To compare against Vapi or other providers, provide a baseline JSON file:
 ```bash
@@ -427,20 +819,31 @@ The harness prints:
 - pass rate and recall deltas per adapter
 - weighted scorecard (`passRate`, term recall, word accuracy)
 - optional competitor deltas (Vapi)
+- a markdown report beside the JSON output, for example:
+  - `benchmark-results/vs-all-telephony.json`
+  - `benchmark-results/vs-all-telephony.md`
 For package-level multi-turn behavior, use the session benchmark harness instead of raw STT-only benchmarking:
 ```bash
 bun run bench:sessions
 bun run bench:deepgram:sessions
+bun run bench:deepgram:soak:sessions
 bun run bench:deepgram:hybrid:sessions
 bun run bench:deepgram:corrected:sessions
+bun run bench:deepgram:corrected:soak:sessions
+bun run bench:stt:routing:run
 bun run bench:assemblyai:sessions
 bun run bench:openai:sessions
+bun run bench:soak:run
 ```
 That harness runs the adapter through `VoiceSession` itself, so the output reflects reconnect handling, turn commit stability, and duplicate-turn protection rather than only raw transcript quality.
+`bench:soak:run` is the STT-5 runner. It executes the long-session soak lane for raw Deepgram Flux, corrected Deepgram, and the reconnect resilience suite in parallel, then writes fresh JSON into `benchmark-results/` without the runs deleting each other.
+`bench:stt:routing:run` is the STT-7 runner. It benchmarks the package’s current `best` vs `low-cost` session strategies in parallel, clears stale outputs first, and writes a manifest so the cost-aware summaries are guaranteed fresh.
 `bench:deepgram:corrected:sessions` exercises the current recommended package-level production path:
 - Deepgram Flux as primary STT
@@ -568,6 +971,100 @@ Fallback triggers are evaluated at commit time:
 The fallback adapter receives the same window of turn audio as the primary (default `8s`, configurable with `replayWindowMs`) and can only run `maxAttemptsPerTurn` times per turn.
+## Benchmark Fixture Sources
+Bundled fixtures cover the current in-repo English benchmark suite. For multilingual and code-switch evaluation, add external fixture directories and let the benchmark scripts merge them automatically.
+The public corpus builder currently assembles:
+- FLEURS multilingual dev clips
+- BSC Catalan-Spanish code-switch evaluation clips
+- CoSHE Hindi-English code-switch evaluation clips
+Set either:
+- `VOICE_FIXTURE_DIR=/abs/path/to/fixtures`
+- `VOICE_FIXTURE_DIRS=/abs/path/one,/abs/path/two`
+Each fixture directory must include:
+- `manifest.json`
+- `pcm/*.pcm`
+Each manifest entry can include:
+- `language`
+- `tags`
+  Use `multilingual`, `bilingual`, or `code-switch` to route fixtures into the multilingual benchmark lane.
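Reviewer sketch (not part of the published diff): reading the two environment variables above into a list of directories could look like this. The README says to set either variable; merging both when both are present, and the helper name `resolveFixtureDirs`, are assumptions made for illustration and may not match what the benchmark scripts actually do.

```ts
// Hypothetical helper; the benchmark scripts' real resolution logic is not shown in this diff.
// Collects fixture directories from VOICE_FIXTURE_DIR and the comma-separated VOICE_FIXTURE_DIRS.
function resolveFixtureDirs(env: { [key: string]: string | undefined }): string[] {
  const single = env['VOICE_FIXTURE_DIR'] ?? '';
  const many = (env['VOICE_FIXTURE_DIRS'] ?? '').split(',');
  const dirs: string[] = [];
  for (const candidate of [single].concat(many)) {
    const dir = candidate.trim();
    // Skip empty segments and duplicates while preserving order.
    if (dir !== '' && dirs.indexOf(dir) === -1) {
      dirs.push(dir);
    }
  }
  return dirs;
}
```

In a Bun or Node script this would typically be called as `resolveFixtureDirs(process.env)`.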
+Benchmark commands:
+```bash
+bun run bench:multilingual
+bun run bench:code-switch
+bun run bench:code-switch:series
+bun run bench:code-switch:ca-es
+bun run bench:code-switch:ca-es:series
+bun run bench:code-switch:ca-es:corts:series
+bun run bench:code-switch:ca-es:parlament:series
+bun run bench:code-switch:hi-en
+bun run bench:code-switch:hi-en:series
+bun run bench:deepgram:multilingual
+bun run bench:deepgram:code-switch
+bun run bench:deepgram:code-switch:series
+bun run bench:deepgram:code-switch:ca-es
+bun run bench:deepgram:code-switch:ca-es:series
+bun run bench:deepgram:code-switch:ca-es:corts:series
+bun run bench:deepgram:code-switch:ca-es:parlament:series
+bun run bench:deepgram:code-switch:ca-es:nova3-multi:series
+bun run bench:deepgram:code-switch:ca-es:nova3-ca:series
+bun run bench:deepgram:code-switch:ca-es:nova3-es:series
+bun run bench:deepgram:code-switch:ca-es:nova2-ca:series
+bun run bench:deepgram:code-switch:ca-es:nova2-es:series
+bun run bench:deepgram:code-switch:ca-es:best:corrected:series
+bun run bench:deepgram:code-switch:ca-es:parlament:debug
+bun run bench:deepgram:code-switch:corrected:ca-es
+bun run bench:deepgram:code-switch:corrected:ca-es:series
+bun run bench:deepgram:code-switch:corrected:ca-es:corts:series
+bun run bench:deepgram:code-switch:corrected:ca-es:parlament:series
+bun run bench:deepgram:code-switch:hi-en
+bun run bench:deepgram:code-switch:hi-en:series
+bun run bench:deepgram:code-switch:corrected:hi-en
+bun run bench:deepgram:code-switch:corrected:hi-en:series
+bun run bench:deepgram:code-switch:corrected
+bun run bench:deepgram:code-switch:corrected:series
+bun run bench:assemblyai:multilingual
+bun run bench:assemblyai:code-switch
+bun run bench:openai:multilingual
+bun run bench:openai:code-switch
+bun run bench:openai:code-switch:series
+bun run bench:openai:code-switch:ca-es
+bun run bench:openai:code-switch:ca-es:series
+bun run bench:openai:code-switch:corrected:ca-es
+bun run bench:openai:code-switch:corrected:ca-es:series
+bun run bench:openai:code-switch:hi-en
+bun run bench:openai:code-switch:hi-en:series
+bun run bench:openai:code-switch:corrected:hi-en
+bun run bench:openai:code-switch:corrected:hi-en:series
+bun run bench:openai:code-switch:corrected
+bun run bench:openai:code-switch:corrected:series
+```
+Current benchmark direction:
+- `openai` is the strongest adapter on the current public multilingual corpus
+- `deepgram` remains the strongest browser-English path
+- raw code-switch remains a weaker surface for every adapter and should be benchmarked separately with `bench:code-switch`
+- jargon-heavy/domain-heavy English terms now have their own profile; use `bench:jargon` for the cross-adapter read and `bench:deepgram:corrected:audit:jargon` to compare `raw` vs `generic` vs `experimental` vs `benchmarkSeeded`
+- code-switch should be treated as language-pair-specific, not one universal lane; `ca-es` and `hi-en` now have dedicated series commands
+- `ca-es` also has a dedicated Deepgram model/language shootout lane so you can compare `nova-3`/`nova-2` with `multi`, `ca`, and `es` routing without overwriting results
+- current best `ca-es` base path is `deepgram` `nova-3` with `language=ca`; the short runner script uses that path for corrected series
+- `ca-es` is also split by source now: `corts_valencianes` and `parlament_parla` can be benchmarked independently, and `parlament_parla` has a dedicated transcript dump script
+- corrected code-switch runs now have dedicated lexicon-driven series commands so raw and corrected stability can be compared directly
+- multi-speaker diarization is now its own benchmark surface; use `bench:multi-speaker:run` for the parallel cross-adapter plus Deepgram-specific read
+- when tuning diarization specifically, use `bench:multi-speaker:analyze` to split Deepgram into clean vs noisy handoff lanes, include a corrected noisy read, and emit a speaker-pattern debug dump
+- use the `:series` commands when you need stability rather than a single-pass snapshot
 ## Client Primitives
 Browser and framework helpers sit on top of the same connection core: