@absolutejs/voice 0.0.20 → 0.0.21

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (40)

package/README.md +387 -4
package/dist/angular/index.d.ts +1 -0
package/dist/angular/index.js +669 -3
package/dist/angular/voice-controller.service.d.ts +21 -0
package/dist/audioConditioning.d.ts +3 -0
package/dist/client/actions.d.ts +7 -0
package/dist/client/connection.d.ts +5 -0
package/dist/client/controller.d.ts +2 -0
package/dist/client/htmxBootstrap.js +576 -167
package/dist/client/index.d.ts +1 -0
package/dist/client/index.js +486 -3
package/dist/client/microphone.d.ts +4 -2
package/dist/correction.d.ts +16 -0
package/dist/index.d.ts +4 -0
package/dist/index.js +1314 -283
package/dist/presets.d.ts +13 -0
package/dist/react/index.d.ts +1 -0
package/dist/react/index.js +642 -3
package/dist/react/useVoiceController.d.ts +20 -0
package/dist/react/useVoiceStream.d.ts +1 -0
package/dist/store.d.ts +2 -2
package/dist/svelte/index.d.ts +1 -0
package/dist/svelte/index.js +607 -3
package/dist/testing/benchmark.d.ts +36 -0
package/dist/testing/index.js +1453 -241
package/dist/testing/sessionBenchmark.d.ts +67 -2
package/dist/testing/stt.d.ts +1 -0
package/dist/turnDetection.d.ts +5 -1
package/dist/turnProfiles.d.ts +6 -0
package/dist/types.d.ts +198 -8
package/dist/vue/index.d.ts +1 -0
package/dist/vue/index.js +660 -3
package/dist/vue/useVoiceController.d.ts +19 -0
package/fixtures/README.md +9 -0
package/fixtures/manifest.json +59 -1
package/fixtures/pcm/dialogue-three-clean.pcm +0 -0
package/fixtures/pcm/dialogue-three-mixed.pcm +0 -0
package/fixtures/pcm/dialogue-two-clean.pcm +0 -0
package/fixtures/pcm/dialogue-two-noisy.pcm +0 -0
package/package.json +21 -1

package/README.md CHANGED

@@ -25,17 +25,32 @@ Optional framework entrypoints:
 ```ts
 import { Elysia } from 'elysia';
-import { voice, createVoiceMemoryStore } from '@absolutejs/voice';
+import {
+	voice,
+	createVoiceMemoryStore,
+	createPhraseHintCorrectionHandler
+} from '@absolutejs/voice';
 import { deepgram } from '@absolutejs/voice-deepgram';
 const app = new Elysia()
 	.use(
 		voice({
 			path: '/voice',
+			preset: 'guided-intake',
+			phraseHints: [
+				{ text: 'AbsoluteJS', aliases: ['absolute js'] },
+				{ text: 'Joe Johnston', aliases: ['joe johnson'] }
+			],
+			correctTurn: createPhraseHintCorrectionHandler(),
 			onComplete: async ({ session }) => {
 				console.log(session.turns);
 			},
 			async onTurn({ turn }) {
+				console.log('turn quality:', {
+					source: turn.quality?.source,
+					fallbackUsed: turn.quality?.fallbackUsed,
+					confidence: turn.quality?.averageConfidence
+				});
 				return {
 					assistantText: `You said: ${turn.text}`
 				};
@@ -51,6 +66,237 @@ const app = new Elysia()
 `createVoiceMemoryStore()` is dev-only. Real deployments should provide a shared store backed by Redis, Postgres, or equivalent.
+## Recommended Production Path
+The current best-performing path in the bundled benchmarks is:
+- `deepgram-flux` as primary STT
+- route-level `phraseHints`
+- route-level `correctTurn` using `createPhraseHintCorrectionHandler()`
+That combination outperformed the raw vendor-only paths in the package benchmarks because it lets AbsoluteJS repair domain-specific terms after strong base transcription instead of depending on a second STT vendor to rescue hard turns.
+Minimal production-oriented example:
+```ts
+import {
+	createPhraseHintCorrectionHandler,
+	createVoiceMemoryStore,
+	voice
+} from '@absolutejs/voice';
+import { deepgram } from '@absolutejs/voice-deepgram';
+app.use(
+	voice({
+		path: '/voice/intake',
+		preset: 'reliability',
+		phraseHints: [
+			{ text: 'AbsoluteJS', aliases: ['absolute js'] },
+			{ text: 'Joe Johnston', aliases: ['joe johnson'] },
+			{
+				text: 'beneath well thatched trees that shed the rain like a roof',
+				aliases: ['beneath wealth', 'shelter beneath wealth']
+			}
+		],
+		correctTurn: createPhraseHintCorrectionHandler(),
+		session: createVoiceMemoryStore(),
+		stt: deepgram({
+			apiKey: process.env.DEEPGRAM_API_KEY!,
+			model: 'flux-general-en'
+		}),
+		onTurn: async ({ turn }) => ({
+			assistantText: `Captured: ${turn.text}`
+		}),
+		onComplete: async () => {}
+	})
+);
+```
+`phraseHints` are user-controlled route config, not hidden framework magic. They are there so the app can teach the voice route its domain vocabulary.
+## Presets
+Voice now ships named runtime presets so apps can start from a useful baseline instead of hand-tuning silence and capture settings every time.
+- `default`
+- `chat`
+- `guided-intake`
+- `dictation`
+- `noisy-room`
+- `reliability`
+On the server:
+```ts
+voice({
+	path: '/voice/intake',
+	preset: 'guided-intake',
+	session: createVoiceMemoryStore(),
+	stt: deepgram({
+		apiKey: process.env.DEEPGRAM_API_KEY!,
+		model: 'nova-3'
+	}),
+	onTurn: async ({ turn }) => ({
+		assistantText: `Captured: ${turn.text}`
+	}),
+	onComplete: async () => {}
+});
+```
+On the client:
+```ts
+import { createVoiceController } from '@absolutejs/voice/client';
+const voice = createVoiceController('/voice/intake', {
+	preset: 'guided-intake'
+});
+await voice.startRecording();
+voice.endTurn();
+voice.stopRecording();
+```
+Presets are still overridable. If you need to tune for a specific route, layer `turnDetection` or `audioConditioning` on top of the preset instead of replacing the whole setup.
+Presets are not the same thing as phrase hints:
+- presets tune framework-owned behavior like silence windows, reconnect defaults, and audio conditioning
+- `phraseHints` tune app/domain vocabulary like company names, product names, legal phrases, or subscriber-specific jargon
+In practice:
+- use a preset to choose the runtime shape (`guided-intake`, `reliability`, `noisy-room`)
+- use `phraseHints` to teach the route what words matter for your business
+- use `correctTurn` when you want deterministic post-STT repair before the turn is committed
+## Framework Helpers
+The package now exposes higher-level controller helpers as well as the lower-level stream primitives.
+- `@absolutejs/voice/client`
+  - `createVoiceController()`
+  - `createVoiceStream()`
+  - `bindVoiceHTMX()`
+- `@absolutejs/voice/react`
+  - `useVoiceController()`
+  - `useVoiceStream()`
+- `@absolutejs/voice/vue`
+  - `useVoiceController()`
+  - `useVoiceStream()`
+- `@absolutejs/voice/svelte`
+  - `createVoiceController()`
+  - `createVoiceStream()`
+- `@absolutejs/voice/angular`
+  - `VoiceControllerService`
+  - `VoiceStreamService`
+The controller helpers abstract the common browser boilerplate:
+- microphone capture
+- start / stop / toggle recording
+- stream subscription state
+- HTMX session syncing
+They do not hide the underlying transport. You still choose the route path and preset explicitly.
+## Phrase Hints And Correction
+`phraseHints` are a route-level input that the application owns.
+They can be:
+- a static array for known domain vocabulary
+- a resolver function when hints depend on the authenticated user, tenant, scenario, or subscriber record
+```ts
+voice({
+	path: '/voice/intake',
+	preset: 'reliability',
+	phraseHints: async ({ context, scenarioId, sessionId }) => {
+		return [
+			{ text: 'AbsoluteJS', aliases: ['absolute js'] },
+			{ text: 'Eden Treaty', aliases: ['eden treaty'] },
+			{ text: 'Joe Johnston', aliases: ['joe johnson'] }
+		];
+	},
+	correctTurn: createPhraseHintCorrectionHandler(),
+	session: createVoiceMemoryStore(),
+	stt: deepgram({
+		apiKey: process.env.DEEPGRAM_API_KEY!,
+		model: 'flux-general-en'
+	}),
+	onTurn: async ({ turn }) => ({
+		assistantText: turn.text
+	}),
+	onComplete: async () => {}
+});
+```
+How the package uses them:
+- adapters receive `phraseHints` at open time and can translate them into vendor-native hinting surfaces
+- the correction layer can use the same hints after STT to repair domain terms before commit
+Current built-in correction helper:
+```ts
+import { createPhraseHintCorrectionHandler } from '@absolutejs/voice';
+const correctTurn = createPhraseHintCorrectionHandler();
+```
+This helper is intentionally deterministic. It is for phrase normalization and domain repair, not for hiding an LLM behind your turn commit. If you need something more advanced, provide your own `correctTurn` handler.
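To make "deterministic phrase normalization" concrete, here is a minimal standalone sketch of alias-to-canonical replacement. It is not the package's implementation and does not use the `correctTurn` handler signature, which this diff does not show. `PhraseHint`, `escapeRegExp`, and `applyPhraseHints` are hypothetical names; only the `{ text, aliases }` shape is taken from the README examples.

```typescript
// Hypothetical type mirroring the `{ text, aliases }` entries shown in the README.
type PhraseHint = { text: string; aliases?: string[] };

// Escape regex metacharacters so aliases are matched literally.
const escapeRegExp = (value: string): string =>
	value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Replace every alias with its canonical text (case-insensitive, whole words).
// Longer aliases are applied first so a short alias cannot rewrite part of a longer one.
const applyPhraseHints = (text: string, hints: PhraseHint[]): string => {
	const pairs = hints
		.flatMap((hint) =>
			(hint.aliases ?? []).map((alias) => ({ alias, canonical: hint.text }))
		)
		.sort((a, b) => b.alias.length - a.alias.length);

	return pairs.reduce(
		(current, { alias, canonical }) =>
			current.replace(
				new RegExp(`\\b${escapeRegExp(alias)}\\b`, 'gi'),
				// Function replacement avoids `$` sequences in canonical text being interpreted.
				() => canonical
			),
		text
	);
};
```

Because the same input and hints always produce the same output, a function like this is easy to unit test. The `\b` boundaries assume aliases start and end with word characters.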
+### React
+```tsx
+import { useVoiceController } from '@absolutejs/voice/react';
+export function VoiceWidget() {
+	const voice = useVoiceController('/voice/intake', {
+		preset: 'guided-intake'
+	});
+	return (
+		<button onClick={() => void voice.toggleRecording()}>
+			{voice.isRecording ? 'Stop microphone' : 'Start microphone'}
+		</button>
+	);
+}
+```
+### Vue
+```ts
+import { useVoiceController } from '@absolutejs/voice/vue';
+const voice = useVoiceController('/voice/intake', {
+	preset: 'guided-intake'
+});
+```
+### Svelte
+```ts
+import { createVoiceController } from '@absolutejs/voice/svelte';
+const voice = createVoiceController('/voice/intake', {
+	preset: 'guided-intake'
+});
+```
+### Angular
+```ts
+import { VoiceControllerService } from '@absolutejs/voice/angular';
+constructor(private readonly voice: VoiceControllerService) {}
+controller = this.voice.connect('/voice/intake', {
+	preset: 'guided-intake'
+});
+```
 ## HTMX
 Voice now mirrors the AI plugin's HTMX pattern with plugin-owned renderers and a plugin-owned fragment route.
@@ -91,14 +337,117 @@ The plugin exposes `GET /voice/intake/htmx/session?sessionId=...` by default. Th
 On the client, bind the browser voice stream to a hidden HTMX refresh element:
 ```ts
-import { bindVoiceHTMX, createVoiceStream } from '@absolutejs/voice/client';
+import { createVoiceController } from '@absolutejs/voice/client';
-const voice = createVoiceStream('/voice/intake');
-bindVoiceHTMX(voice, { element: '#voice-htmx-sync' });
+const voice = createVoiceController('/voice/intake', {
+	preset: 'guided-intake'
+});
+voice.bindHTMX({ element: '#voice-htmx-sync' });
 ```
 That keeps HTMX pages declarative without inventing custom fragment endpoints for core voice session UI.
+## Competitive Benchmarking
+The package includes a competitive benchmark harness for STT quality and responsiveness.
+Run:
+```bash
+bun run bench:vs
+```
+Use profiles to focus where you want to win:
+- `bun run bench:vs all` (default)
+- `bun run bench:vs all accents`
+- `bun run bench:vs all clean`
+- `bun run bench:vs all noisy`
+- `bun run bench:vs deepgram accents`
+- `bun run bench:vs deepgram-flux accents` (compares the Flux candidate; the default run includes Vapi output if configured)
+- `bun run bench:vs deepgram-nova accents`
+Current benchmark guidance:
+- use `deepgram-flux` as the primary conversational STT path
+- prefer route-level `phraseHints` plus `correctTurn` over cross-vendor fallback for domain-specific accuracy
+- use fallback vendors only when your own traffic proves they beat the package-level correction path
+- do not treat `openai` as the default STT path unless your own benchmarks prove it for your traffic
+If you use a Vapi baseline file, you can run a direct model comparison:
+```bash
+bun run bench:vs:deepgram-flux
+```
+To benchmark Nova vs Flux back-to-back, set the model explicitly:
+```bash
+DEEPGRAM_MODEL=flux-general-en bun run bench:deepgram:accents
+DEEPGRAM_MODEL=nova-3 bun run bench:deepgram:accents
+```
+To compare against Vapi or other providers, provide a baseline JSON file:
+```bash
+bun run bench:vs all accents --compare /path/to/vapi-baseline.json
+```
+Expected benchmark payload:
+```json
+{
+  "source": "vapi",
+  "results": [
+    {
+      "adapterId": "vapi-baseline",
+      "summary": {
+        "passRate": 0.0,
+        "averageWordErrorRate": 1.0,
+        "averageTermRecall": 0.0,
+        "averageElapsedMs": 0,
+        "averageTimeToEndOfTurnMs": 0,
+        "averageTimeToFirstFinalMs": 0,
+        "averageTimeToFirstPartialMs": 0,
+        "wordAccuracyRate": 0.0
+      }
+    }
+  ]
+}
+```
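Before passing a baseline file to `--compare`, it can help to check its shape. The following is a hedged sketch of a type guard for the payload above. The field names are copied from the example JSON, but which fields the harness actually requires is an assumption, and `isBaselinePayload` is a hypothetical helper, not a package export.

```typescript
// Summary fields copied from the example payload; treating all of them as required is an assumption.
const SUMMARY_KEYS = [
	'passRate',
	'averageWordErrorRate',
	'averageTermRecall',
	'averageElapsedMs',
	'averageTimeToEndOfTurnMs',
	'averageTimeToFirstFinalMs',
	'averageTimeToFirstPartialMs',
	'wordAccuracyRate'
] as const;

type BaselineSummary = Record<(typeof SUMMARY_KEYS)[number], number>;
type BaselinePayload = {
	source: string;
	results: { adapterId: string; summary: BaselineSummary }[];
};

const isRecord = (value: unknown): value is Record<string, unknown> =>
	typeof value === 'object' && value !== null && !Array.isArray(value);

// Hypothetical helper: returns true only when every field in the example payload is present and well-typed.
const isBaselinePayload = (value: unknown): value is BaselinePayload => {
	if (!isRecord(value)) return false;
	const { source, results } = value;
	if (typeof source !== 'string' || !Array.isArray(results)) return false;

	return results.every((result: unknown) => {
		if (!isRecord(result) || typeof result.adapterId !== 'string') return false;
		const summary = result.summary;
		if (!isRecord(summary)) return false;

		return SUMMARY_KEYS.every((key) => {
			const metric = summary[key];
			return typeof metric === 'number' && Number.isFinite(metric);
		});
	});
};
```

A caller could run `isBaselinePayload(JSON.parse(fileContents))` and fail fast with a clear message instead of discovering a malformed baseline partway through a benchmark run.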
+For a fast parse-only validation of arguments:
+```bash
+bun run ./scripts/benchmark-vs.ts --dry-run
+```
+The harness prints:
+- pass rate and recall deltas per adapter
+- weighted scorecard (`passRate`, term recall, word accuracy)
+- optional competitor deltas (Vapi)
+For package-level multi-turn behavior, use the session benchmark harness instead of raw STT-only benchmarking:
+```bash
+bun run bench:sessions
+bun run bench:deepgram:sessions
+bun run bench:deepgram:hybrid:sessions
+bun run bench:deepgram:corrected:sessions
+bun run bench:assemblyai:sessions
+bun run bench:openai:sessions
+```
+That harness runs the adapter through `VoiceSession` itself, so the output reflects reconnect handling, turn commit stability, and duplicate-turn protection rather than only raw transcript quality.
+`bench:deepgram:corrected:sessions` exercises the current recommended package-level production path:
+- Deepgram Flux as primary STT
+- phrase hints routed through the adapter layer
+- committed-turn correction via `createPhraseHintCorrectionHandler()`
+- core turn dedupe, reconnect, and transcript selection still owned by `@absolutejs/voice`
 ## Adapter Contract
 Adapters normalize vendor behavior into a core event model so the plugin never branches on vendor names.
@@ -185,6 +534,40 @@ Default reconnect strategy is `resume-last-turn`.
 If an adapter does not emit native end-of-turn events, core falls back to silence detection with a default `700ms` threshold.
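As an illustration of the silence-detection fallback described above, here is a minimal sketch that ends a turn once silence has lasted for the threshold. It is not the package's detector. The `Frame` shape, the upstream speech/silence classification, and `findEndOfTurn` are all assumptions; only the `700ms` default comes from the README.

```typescript
// Assumption: each audio frame has already been classified as speech or silence upstream.
type Frame = { atMs: number; isSpeech: boolean };

// Returns the timestamp at which the turn would end, or null if the
// silence threshold is never reached after speech has been heard.
const findEndOfTurn = (frames: Frame[], silenceMs = 700): number | null => {
	let heardSpeech = false;
	let silenceStart: number | null = null;

	for (const frame of frames) {
		if (frame.isSpeech) {
			// Any speech resets the silence window.
			heardSpeech = true;
			silenceStart = null;
			continue;
		}
		// Leading silence before any speech never ends a turn.
		if (!heardSpeech) continue;
		if (silenceStart === null) silenceStart = frame.atMs;
		if (frame.atMs - silenceStart >= silenceMs) return frame.atMs;
	}

	return null;
};
```

The reset on speech is the important design point: a short pause inside a sentence should restart the window rather than accumulate toward the threshold.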
+## STT Fallback
+You can pair a primary vendor with an optional fallback vendor per route when you need extra reliability for accents, edge environments, or short commands.
+```ts
+voice({
+	path: '/voice/intake',
+	preset: 'default',
+	session: createVoiceMemoryStore(),
+	stt: deepgram({ apiKey: process.env.DEEPGRAM_API_KEY!, model: 'nova-3' }),
+	sttFallback: {
+		adapter: assemblyai({ apiKey: process.env.ASSEMBLYAI_API_KEY! }),
+		trigger: 'empty-or-low-confidence',
+		confidenceThreshold: 0.65,
+		minTextLength: 2,
+		replayWindowMs: 8000,
+		settleMs: 220,
+		maxAttemptsPerTurn: 1
+	},
+	onTurn: async ({ turn }) => {
+		return { assistantText: `Captured: ${turn.text}` };
+	},
+	onComplete: async () => {}
+});
+```
+Fallback triggers are evaluated at commit time:
+- `empty-turn`: the committed text is empty or shorter than `minTextLength` words
+- `low-confidence`: average transcript confidence is below `confidenceThreshold`
+- `empty-or-low-confidence`: either of the two conditions above
+The fallback adapter receives the same window of turn audio as the primary (default `8s`, configurable with `replayWindowMs`) and runs at most `maxAttemptsPerTurn` times per turn.
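The trigger rules can be sketched as a small predicate. This is an illustration only, not the package's commit-time logic. `TurnCandidate`, `FallbackOptions`, and `shouldAttemptFallback` are hypothetical names. It assumes `minTextLength` counts words, as the trigger list suggests, and that a turn with no confidence value is not treated as low confidence.

```typescript
type FallbackTrigger = 'empty-turn' | 'low-confidence' | 'empty-or-low-confidence';

// Hypothetical shapes for illustration; the package's real types are not shown in this diff.
type TurnCandidate = { text: string; averageConfidence?: number };
type FallbackOptions = {
	trigger: FallbackTrigger;
	confidenceThreshold: number;
	minTextLength: number;
};

const shouldAttemptFallback = (
	turn: TurnCandidate,
	options: FallbackOptions
): boolean => {
	// Assumption: minTextLength is measured in whitespace-separated words.
	const wordCount = turn.text.trim().split(/\s+/).filter(Boolean).length;
	const isEmpty = wordCount < options.minTextLength;

	// Assumption: a missing confidence value does not count as low confidence.
	const isLowConfidence =
		turn.averageConfidence !== undefined &&
		turn.averageConfidence < options.confidenceThreshold;

	switch (options.trigger) {
		case 'empty-turn':
			return isEmpty;
		case 'low-confidence':
			return isLowConfidence;
		case 'empty-or-low-confidence':
			return isEmpty || isLowConfidence;
	}
};
```

With the settings from the example above (`confidenceThreshold: 0.65`, `minTextLength: 2`), a one-word turn would trigger fallback under this sketch regardless of its confidence.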
 ## Client Primitives
 Browser and framework helpers sit on top of the same connection core:

package/dist/angular/index.d.ts CHANGED

@@ -1 +1,2 @@
 export { VoiceStreamService } from './voice-stream.service';
+export { VoiceControllerService } from './voice-controller.service';