npm - @mastra/voice-inworld - Versions diffs - 0.3.0-alpha.1 → 0.3.0 - Mend

@mastra/voice-inworld 0.3.0-alpha.1 → 0.3.0

Files changed (6)

package/CHANGELOG.md +69 -0
package/dist/docs/SKILL.md +1 -1
package/dist/docs/assets/SOURCE_MAP.json +1 -1
package/dist/docs/references/docs-voice-overview.md +25 -25
package/dist/docs/references/docs-voice-speech-to-speech.md +4 -4
package/package.json +4 -4

package/CHANGELOG.md CHANGED

@@ -1,5 +1,74 @@
 # @mastra/voice-inworld
+## 0.3.0
+### Minor Changes
+- `@mastra/voice-inworld` now ships `InworldRealtimeVoice` for full-duplex realtime voice — mic in, speakers out, server-side LLM routing, semantic VAD turn-taking, tool calling, barge-in, and live transcripts of both sides — alongside the existing streaming TTS and batch STT. No separate package needed; import both from the same entry point. ([#16865](https://github.com/mastra-ai/mastra/pull/16865))
+  ```typescript
+  // Batch TTS / STT (unchanged)
+  import { InworldVoice } from '@mastra/voice-inworld';
+  // New: realtime full-duplex voice, from the same package
+  import { InworldRealtimeVoice } from '@mastra/voice-inworld';
+  const voice = new InworldRealtimeVoice({
+    apiKey: process.env.INWORLD_API_KEY,
+    // Defaults: model 'inworld/models/gemma-4-26b-a4b-it', speaker 'Sarah',
+    // STT 'inworld/inworld-stt-1', semantic-VAD turn detection.
+  });
+  await voice.connect();
+  voice.on('speaker', stream => playAudio(stream)); // PCM16 @ 24kHz
+  voice.on('writing', ({ text, role }) => console.log(role, text));
+  voice.on('interrupted', ({ response_id }) => stopAudio(response_id));
+  await voice.send(getMicrophoneStream());
+  ```
+  **Typed `providerData` for Inworld realtime extensions**
+  `InworldRealtimeVoice` now accepts a typed `providerData` object for Inworld-specific extensions — STT tuning, TTS segmentation and steering, automatic memory, back-channel, and responsiveness — sent under `session.providerData`. The provider also surfaces inbound extension data: a `voiceProfile` on user `writing` events, a `memory` event for the rolling summary/facts state, and `backchannel` / `backchannel.done` / `backchannel.skipped` events for back-channel audio.
+  ```typescript
+  const voice = new InworldRealtimeVoice({
+    providerData: {
+      stt: { voice_profile: true, language_hints: ['en-US'] },
+      tts: { delivery_mode: 'CREATIVE', segmenter_strategy: 'balanced' },
+      memory: { enabled: true, turn_interval: 4 },
+      backchannel: { enabled: true, max_per_turn: 1 },
+    },
+  });
+  voice.on('memory', state => console.log(state.summary, state.facts));
+  voice.on('backchannel', stream => playAudio(stream));
+  voice.on('writing', ({ role, voiceProfile }) => console.log(role, voiceProfile?.emotion));
+  ```
+  **Realtime fixes and additions**
+  - Fixed the per-call `speak(text, { speaker })` voice override. It is now sent as the flat `response.voice` field, so the per-call speaker is no longer silently ignored by the server.
+  - Added manual turn-taking methods `commitInput()`, `clearInput()`, and `clearOutput()` for push-to-talk and manual turn control (use `clearOutput()` only to hard-stop all playback — it also stops in-flight back-channels).
+  - Added smart-turn and playback-state events: `turn-suggestion`, `turn-suggestion-revoked`, `input-committed`, `input-cleared`, `input-timeout`, and `output-audio-started` / `output-audio-stopped` / `output-audio-cleared`.
+  - Added richer typed session config: input noise reduction, telephony (8 kHz) and float32 audio formats, a server-VAD `idle_timeout_ms`, plus `tracing`, `include`, and `prompt`.
+  ```typescript
+  // Push-to-talk with no auto-VAD
+  const voice = new InworldRealtimeVoice({
+    session: { audio: { input: { turn_detection: null } } },
+  });
+  await voice.send(getMicrophoneStream());
+  voice.commitInput(); // end the user turn manually
+  voice.on('output-audio-stopped', () => console.log('playback finished'));
+  ```
+### Patch Changes
+- Moved shared voice primitives and route metadata into the new `@internal/voice` package so voice providers no longer depend on `@mastra/core` and server voice routes share the same route definitions. ([#16725](https://github.com/mastra-ai/mastra/pull/16725))
+  `@mastra/core/voice` continues to re-export the voice APIs for backwards compatibility.
 ## 0.3.0-alpha.1
 ### Minor Changes
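
The changelog entry above annotates the `speaker` stream as "PCM16 @ 24kHz". As a rough illustration of what a playback layer might do with such chunks, here is a minimal sketch that converts raw PCM16 bytes to floating-point samples and estimates chunk duration. The helper names `pcm16ToFloat32` and `durationMs` are hypothetical and not part of `@mastra/voice-inworld`; little-endian, mono audio is an assumption that the changelog does not confirm.

```typescript
// Hypothetical helpers for illustration; not part of @mastra/voice-inworld.
// Sample rate taken from the changelog comment "PCM16 @ 24kHz".
const SAMPLE_RATE_HZ = 24_000;

/**
 * Convert signed 16-bit PCM bytes to floats in the range [-1, 1).
 * Assumes little-endian, mono samples (an assumption, not confirmed by the changelog).
 */
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2); // a trailing odd byte is ignored
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // Dividing by 32768 maps the int16 range onto [-1, 1).
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

/** Estimate how long a chunk of mono PCM16 audio plays, in milliseconds. */
function durationMs(bytes: Uint8Array, sampleRateHz: number = SAMPLE_RATE_HZ): number {
  const sampleCount = Math.floor(bytes.byteLength / 2);
  return (sampleCount / sampleRateHz) * 1000;
}
```

Under these assumptions, 48,000 bytes hold 24,000 samples, which is one second of audio at 24 kHz.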

package/dist/docs/SKILL.md CHANGED

@@ -3,7 +3,7 @@ name: mastra-voice-inworld
 description: Documentation for @mastra/voice-inworld. Use when working with @mastra/voice-inworld APIs, configuration, or implementation.
 metadata:
   package: "@mastra/voice-inworld"
-  version: "0.3.0-alpha.1"
+  version: "0.3.0"
 ---
 ## When to use
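
The version bump in this diff, `0.3.0-alpha.1` to `0.3.0`, moves from a pre-release to the stable release. Under Semantic Versioning 2.0.0 a pre-release ranks below its normal version, and numeric parts compare as numbers rather than as strings (so `0.0.99` precedes `0.0.100`, as in the `@internal/lint` bump in `package.json`). The following is a simplified sketch of that precedence rule; `compareSemver` and its helpers are hypothetical names, and the sketch does not validate its input.

```typescript
// Hypothetical helper; a simplified take on SemVer 2.0.0 precedence, without input validation.
type ParsedVersion = { core: number[]; pre: string[] };

function parseVersion(version: string): ParsedVersion {
  const withoutBuild = version.split('+')[0] ?? ''; // build metadata does not affect precedence
  const dash = withoutBuild.indexOf('-');
  const coreText = dash === -1 ? withoutBuild : withoutBuild.slice(0, dash);
  const preText = dash === -1 ? '' : withoutBuild.slice(dash + 1);
  return {
    core: coreText.split('.').map(Number),
    pre: preText === '' ? [] : preText.split('.'),
  };
}

function compareIdentifiers(a: string, b: string): number {
  const aNumeric = /^\d+$/.test(a);
  const bNumeric = /^\d+$/.test(b);
  if (aNumeric && bNumeric) return Math.sign(Number(a) - Number(b));
  if (aNumeric) return -1; // numeric identifiers rank below alphanumeric ones
  if (bNumeric) return 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

/** Returns -1 if a precedes b, 1 if a follows b, and 0 if they have equal precedence. */
function compareSemver(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    const diff = (pa.core[i] ?? 0) - (pb.core[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  // With equal major.minor.patch, a version without a pre-release tag ranks higher.
  if (pa.pre.length === 0 && pb.pre.length === 0) return 0;
  if (pa.pre.length === 0) return 1;
  if (pb.pre.length === 0) return -1;
  const shared = Math.min(pa.pre.length, pb.pre.length);
  for (let i = 0; i < shared; i++) {
    const result = compareIdentifiers(pa.pre[i]!, pb.pre[i]!);
    if (result !== 0) return result;
  }
  // All shared identifiers are equal: the longer pre-release list ranks higher.
  return Math.sign(pa.pre.length - pb.pre.length);
}
```

With this sketch, `compareSemver('0.3.0-alpha.1', '0.3.0')` yields `-1`, matching the direction of the upgrade shown in this diff.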

package/dist/docs/assets/SOURCE_MAP.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "version": "0.3.0-alpha.1",
+  "version": "0.3.0",
   "package": "@mastra/voice-inworld",
   "exports": {},
   "modules": {}

package/dist/docs/references/docs-voice-overview.md CHANGED

@@ -16,7 +16,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new OpenAIVoice(),
 })
 ```
@@ -40,7 +40,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new OpenAIVoice(),
 })
@@ -68,7 +68,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new AzureVoice(),
 })
@@ -95,7 +95,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new ElevenLabsVoice(),
 })
@@ -122,7 +122,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new PlayAIVoice(),
 })
@@ -149,7 +149,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new GoogleVoice(),
 })
@@ -176,7 +176,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new CloudflareVoice(),
 })
@@ -203,7 +203,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new DeepgramVoice(),
 })
@@ -230,7 +230,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new InworldVoice(),
 })
@@ -257,7 +257,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new SpeechifyVoice(),
 })
@@ -284,7 +284,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new SarvamVoice(),
 })
@@ -311,7 +311,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new MurfVoice(),
 })
@@ -346,7 +346,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new OpenAIVoice(),
 })
@@ -375,7 +375,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new AzureVoice(),
 })
@@ -403,7 +403,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new ElevenLabsVoice(),
 })
@@ -431,7 +431,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new GoogleVoice(),
 })
@@ -459,7 +459,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new CloudflareVoice(),
 })
@@ -487,7 +487,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new DeepgramVoice(),
 })
@@ -515,7 +515,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new InworldVoice(),
 })
@@ -543,7 +543,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new SarvamVoice(),
 })
@@ -575,7 +575,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new OpenAIRealtimeVoice(),
 })
@@ -605,7 +605,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new GeminiLiveVoice({
     // Live API mode
     apiKey: process.env.GOOGLE_API_KEY,
@@ -654,7 +654,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new NovaSonicVoice({
     region: 'us-east-1',
     speaker: 'matthew',
@@ -697,7 +697,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new InworldRealtimeVoice({
     apiKey: process.env.INWORLD_API_KEY,
     model: 'inworld/models/gemma-4-26b-a4b-it',
@@ -1132,7 +1132,7 @@ const voiceAgent = new Agent({
   id: 'aisdk-voice-agent',
   name: 'AI SDK Voice Agent',
   instructions: 'You are a helpful assistant with voice capabilities.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice,
 })
 ```

package/dist/docs/references/docs-voice-speech-to-speech.md CHANGED

@@ -32,7 +32,7 @@ const agent = new Agent({
   id: 'agent',
   name: 'OpenAI Realtime Agent',
   instructions: `You are a helpful assistant with real-time voice capabilities.`,
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new OpenAIRealtimeVoice(),
 })
@@ -66,7 +66,7 @@ const agent = new Agent({
   name: 'Gemini Live Agent',
   instructions: 'You are a helpful assistant with real-time voice capabilities.',
   // Model used for text generation; voice provider handles realtime audio
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new GeminiLiveVoice({
     apiKey: process.env.GOOGLE_API_KEY,
     model: 'gemini-2.0-flash-exp',
@@ -113,7 +113,7 @@ const agent = new Agent({
   name: 'Nova Sonic Agent',
   instructions: 'You are a helpful assistant with real-time voice capabilities.',
   // Model used for text generation; voice provider handles realtime audio
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new NovaSonicVoice({
     region: 'us-east-1',
     speaker: 'matthew',
@@ -157,7 +157,7 @@ const agent = new Agent({
   name: 'Inworld Realtime Agent',
   instructions: 'You are a helpful assistant with real-time voice capabilities.',
   // Model used for text generation; voice provider handles realtime audio
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new InworldRealtimeVoice({
     apiKey: process.env.INWORLD_API_KEY,
     model: 'inworld/models/gemma-4-26b-a4b-it',

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@mastra/voice-inworld",
-  "version": "0.3.0-alpha.1",
+  "version": "0.3.0",
   "description": "Mastra Inworld AI voice integration — streaming TTS, batch STT, and realtime full-duplex voice",
   "type": "module",
   "files": [
@@ -37,9 +37,9 @@
     "typescript": "^6.0.3",
     "vitest": "4.1.5",
     "zod": "^4.4.3",
-    "@internal/lint": "0.0.99",
-    "@internal/types-builder": "0.0.74",
-    "@internal/voice": "0.0.0"
+    "@internal/lint": "0.0.100",
+    "@internal/types-builder": "0.0.75",
+    "@internal/voice": "0.0.1"
   },
   "peerDependencies": {
     "zod": "^3.25.0 || ^4.0.0"