@mastra/voice-inworld 0.3.0-alpha.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

package/CHANGELOG.md +69 -0
package/dist/_types/@internal_voice/dist/_types/@internal_ai-sdk-v5/dist/index.d.ts +11 -8
package/dist/docs/SKILL.md +1 -1
package/dist/docs/assets/SOURCE_MAP.json +1 -1
package/dist/docs/references/docs-voice-overview.md +25 -25
package/dist/docs/references/docs-voice-speech-to-speech.md +4 -4
package/dist/docs/references/reference-voice-inworld-realtime.md +5 -5
package/dist/docs/references/reference-voice-inworld.md +1 -1
package/dist/index.cjs +2 -2
package/dist/index.cjs.map +1 -1
package/dist/index.js +2 -2
package/dist/index.js.map +1 -1
package/package.json +10 -9

package/CHANGELOG.md CHANGED

@@ -1,5 +1,74 @@
 # @mastra/voice-inworld
+## 0.3.0
+### Minor Changes
+- `@mastra/voice-inworld` now ships `InworldRealtimeVoice` for full-duplex realtime voice — mic in, speakers out, server-side LLM routing, semantic VAD turn-taking, tool calling, barge-in, and live transcripts of both sides — alongside the existing streaming TTS and batch STT. No separate package needed; import both from the same entry point. ([#16865](https://github.com/mastra-ai/mastra/pull/16865))
+  ```typescript
+  // Batch TTS / STT (unchanged)
+  import { InworldVoice } from '@mastra/voice-inworld';
+  // New: realtime full-duplex voice, from the same package
+  import { InworldRealtimeVoice } from '@mastra/voice-inworld';
+  const voice = new InworldRealtimeVoice({
+    apiKey: process.env.INWORLD_API_KEY,
+    // Defaults: model 'inworld/models/gemma-4-26b-a4b-it', speaker 'Sarah',
+    // STT 'inworld/inworld-stt-1', semantic-VAD turn detection.
+  });
+  await voice.connect();
+  voice.on('speaker', stream => playAudio(stream)); // PCM16 @ 24kHz
+  voice.on('writing', ({ text, role }) => console.log(role, text));
+  voice.on('interrupted', ({ response_id }) => stopAudio(response_id));
+  await voice.send(getMicrophoneStream());
+  ```
+  **Typed `providerData` for Inworld realtime extensions**
+  `InworldRealtimeVoice` now accepts a typed `providerData` object for Inworld-specific extensions — STT tuning, TTS segmentation and steering, automatic memory, back-channel, and responsiveness — sent under `session.providerData`. The provider also surfaces inbound extension data: a `voiceProfile` on user `writing` events, a `memory` event for the rolling summary/facts state, and `backchannel` / `backchannel.done` / `backchannel.skipped` events for back-channel audio.
+  ```typescript
+  const voice = new InworldRealtimeVoice({
+    providerData: {
+      stt: { voice_profile: true, language_hints: ['en-US'] },
+      tts: { delivery_mode: 'CREATIVE', segmenter_strategy: 'balanced' },
+      memory: { enabled: true, turn_interval: 4 },
+      backchannel: { enabled: true, max_per_turn: 1 },
+    },
+  });
+  voice.on('memory', state => console.log(state.summary, state.facts));
+  voice.on('backchannel', stream => playAudio(stream));
+  voice.on('writing', ({ role, voiceProfile }) => console.log(role, voiceProfile?.emotion));
+  ```
+  **Realtime fixes and additions**
+  - Fixed the per-call `speak(text, { speaker })` voice override. It is now sent as the flat `response.voice` field, so the per-call speaker is no longer silently ignored by the server.
+  - Added manual turn-taking methods `commitInput()`, `clearInput()`, and `clearOutput()` for push-to-talk and manual turn control (use `clearOutput()` only to hard-stop all playback — it also stops in-flight back-channels).
+  - Added smart-turn and playback-state events: `turn-suggestion`, `turn-suggestion-revoked`, `input-committed`, `input-cleared`, `input-timeout`, and `output-audio-started` / `output-audio-stopped` / `output-audio-cleared`.
+  - Added richer typed session config: input noise reduction, telephony (8 kHz) and float32 audio formats, a server-VAD `idle_timeout_ms`, plus `tracing`, `include`, and `prompt`.
+  ```typescript
+  // Push-to-talk with no auto-VAD
+  const voice = new InworldRealtimeVoice({
+    session: { audio: { input: { turn_detection: null } } },
+  });
+  await voice.send(getMicrophoneStream());
+  voice.commitInput(); // end the user turn manually
+  voice.on('output-audio-stopped', () => console.log('playback finished'));
+  ```
+### Patch Changes
+- Moved shared voice primitives and route metadata into the new `@internal/voice` package so voice providers no longer depend on `@mastra/core` and server voice routes share the same route definitions. ([#16725](https://github.com/mastra-ai/mastra/pull/16725))
+  `@mastra/core/voice` continues to re-export the voice APIs for backwards compatibility.
 ## 0.3.0-alpha.1
 ### Minor Changes

package/dist/_types/@internal_voice/dist/_types/@internal_ai-sdk-v5/dist/index.d.ts CHANGED

@@ -1549,16 +1549,16 @@ declare interface EventSourceMessage {
      * implementation in that browsers will default this to `message`, whereas this parser will
      * leave this as `undefined` if not explicitly declared.
      */
-    event?: string | undefined
+    event?: string | undefined;
     /**
      * ID of the message, if any was provided by the server. Can be used by clients to keep the
      * last received message ID in sync when reconnecting.
      */
-    id?: string | undefined
+    id?: string | undefined;
     /**
      * The data received for this message
      */
-    data: string
+    data: string;
 }
 /**
@@ -1582,8 +1582,11 @@ declare interface EventSourceMessage {
  *
  * @public
  */
-declare class EventSourceParserStream extends TransformStream<string, EventSourceMessage> {
-    constructor({onError, onRetry, onComment}?: StreamOptions)
+declare class EventSourceParserStream extends TransformStream<
+string,
+EventSourceMessage
+> {
+    constructor({ onError, onRetry, onComment }?: StreamOptions);
 }
 /**
@@ -6830,19 +6833,19 @@ declare interface StreamOptions {
      *
      * @defaultValue `undefined`
      */
-    onError?: ('terminate' | ((error: Error) => void)) | undefined
+    onError?: ("terminate" | ((error: Error) => void)) | undefined;
     /**
      * Callback for when a reconnection interval is sent from the server.
      *
      * @param retry - The number of milliseconds to wait before reconnecting.
      */
-    onRetry?: ((retry: number) => void) | undefined
+    onRetry?: ((retry: number) => void) | undefined;
     /**
      * Callback for when a comment is encountered in the stream.
      *
      * @param comment - The comment encountered in the stream.
      */
-    onComment?: ((comment: string) => void) | undefined
+    onComment?: ((comment: string) => void) | undefined;
 }
 /**

package/dist/docs/SKILL.md CHANGED

@@ -3,7 +3,7 @@ name: mastra-voice-inworld
 description: Documentation for @mastra/voice-inworld. Use when working with @mastra/voice-inworld APIs, configuration, or implementation.
 metadata:
   package: "@mastra/voice-inworld"
-  version: "0.3.0-alpha.1"
+  version: "0.3.2"
 ---
 ## When to use

package/dist/docs/assets/SOURCE_MAP.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "version": "0.3.0-alpha.1",
+  "version": "0.3.2",
   "package": "@mastra/voice-inworld",
   "exports": {},
   "modules": {}

package/dist/docs/references/docs-voice-overview.md CHANGED

@@ -16,7 +16,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new OpenAIVoice(),
 })
 ```
@@ -40,7 +40,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new OpenAIVoice(),
 })
@@ -68,7 +68,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new AzureVoice(),
 })
@@ -95,7 +95,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new ElevenLabsVoice(),
 })
@@ -122,7 +122,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new PlayAIVoice(),
 })
@@ -149,7 +149,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new GoogleVoice(),
 })
@@ -176,7 +176,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new CloudflareVoice(),
 })
@@ -203,7 +203,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new DeepgramVoice(),
 })
@@ -230,7 +230,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new InworldVoice(),
 })
@@ -257,7 +257,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new SpeechifyVoice(),
 })
@@ -284,7 +284,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new SarvamVoice(),
 })
@@ -311,7 +311,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new MurfVoice(),
 })
@@ -346,7 +346,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new OpenAIVoice(),
 })
@@ -375,7 +375,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new AzureVoice(),
 })
@@ -403,7 +403,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new ElevenLabsVoice(),
 })
@@ -431,7 +431,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new GoogleVoice(),
 })
@@ -459,7 +459,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new CloudflareVoice(),
 })
@@ -487,7 +487,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new DeepgramVoice(),
 })
@@ -515,7 +515,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new InworldVoice(),
 })
@@ -543,7 +543,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new SarvamVoice(),
 })
@@ -575,7 +575,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new OpenAIRealtimeVoice(),
 })
@@ -605,7 +605,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new GeminiLiveVoice({
     // Live API mode
     apiKey: process.env.GOOGLE_API_KEY,
@@ -654,7 +654,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new NovaSonicVoice({
     region: 'us-east-1',
     speaker: 'matthew',
@@ -697,7 +697,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new InworldRealtimeVoice({
     apiKey: process.env.INWORLD_API_KEY,
     model: 'inworld/models/gemma-4-26b-a4b-it',
@@ -1132,7 +1132,7 @@ const voiceAgent = new Agent({
   id: 'aisdk-voice-agent',
   name: 'AI SDK Voice Agent',
   instructions: 'You are a helpful assistant with voice capabilities.',
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice,
 })
 ```

package/dist/docs/references/docs-voice-speech-to-speech.md CHANGED

@@ -32,7 +32,7 @@ const agent = new Agent({
   id: 'agent',
   name: 'OpenAI Realtime Agent',
   instructions: `You are a helpful assistant with real-time voice capabilities.`,
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new OpenAIRealtimeVoice(),
 })
@@ -66,7 +66,7 @@ const agent = new Agent({
   name: 'Gemini Live Agent',
   instructions: 'You are a helpful assistant with real-time voice capabilities.',
   // Model used for text generation; voice provider handles realtime audio
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new GeminiLiveVoice({
     apiKey: process.env.GOOGLE_API_KEY,
     model: 'gemini-2.0-flash-exp',
@@ -113,7 +113,7 @@ const agent = new Agent({
   name: 'Nova Sonic Agent',
   instructions: 'You are a helpful assistant with real-time voice capabilities.',
   // Model used for text generation; voice provider handles realtime audio
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new NovaSonicVoice({
     region: 'us-east-1',
     speaker: 'matthew',
@@ -157,7 +157,7 @@ const agent = new Agent({
   name: 'Inworld Realtime Agent',
   instructions: 'You are a helpful assistant with real-time voice capabilities.',
   // Model used for text generation; voice provider handles realtime audio
-  model: 'openai/gpt-5.4',
+  model: 'openai/gpt-5.5',
   voice: new InworldRealtimeVoice({
     apiKey: process.env.INWORLD_API_KEY,
     model: 'inworld/models/gemma-4-26b-a4b-it',

package/dist/docs/references/reference-voice-inworld-realtime.md CHANGED

@@ -54,7 +54,7 @@ await voice.send(microphoneStream)
 voice.close()
 ```
-> Inworld API keys ship pre-Basic-encoded. Paste them verbatim into `INWORLD_API_KEY`; the package does not re-encode them.
+> Inworld API keys ship pre-Basic-encoded. Paste them verbatim into `INWORLD_API_KEY`; the package doesn't re-encode them.
 ## Constructor parameters
@@ -116,7 +116,7 @@ Use the typed `session` field for documented Inworld realtime options. Fields co
 ### `providerData` (Inworld extensions)
-`providerData` is a typed object for Inworld-specific realtime extensions. It is sent under `session.providerData` on every `session.update`, and composes with any `session.providerData` you set via the `session` field — the constructor `providerData` wins on key collisions.
+`providerData` is a typed object for Inworld-specific realtime extensions. It's sent under `session.providerData` on every `session.update`, and composes with any `session.providerData` you set via the `session` field — the constructor `providerData` wins on key collisions.
 It has five branches plus two session-level fields:
@@ -342,12 +342,12 @@ Any voice ID from [Inworld's voice catalog](https://docs.inworld.ai/quickstart-t
 ## Notes
-- API keys can be provided via constructor options or the `INWORLD_API_KEY` environment variable. Keys are pre-Basic-encoded; do not re-encode them.
+- API keys can be provided via constructor options or the `INWORLD_API_KEY` environment variable. Keys are pre-Basic-encoded; don't re-encode them.
 - The WebSocket URL appends `?key=<sessionId>&protocol=realtime`. The model is configured via the initial `session.update`, not the URL.
-- Per-call `speak(input, { speaker })` scopes the voice override to a single response (via the flat `response.voice` field) and does NOT mutate the session.
+- Per-call `speak(input, { speaker })` scopes the voice override to a single response (via the flat `response.voice` field) and doesn't mutate the session.
 - Audio output defaults to PCM16 at 24 kHz. Telephony `audio/pcmu` and `audio/pcma` at 8 kHz, and `audio/float32`, are also supported via `session.audio.output.format`.
 - Use `connect()` before any send, speak, or listen call. Events sent before the WebSocket is open are queued and flushed once the server acknowledges `session.updated`.
 - The voice instance must be closed with `close()` or `disconnect()` to release the WebSocket.
-- `audio.input.turn_detection` defaults to semantic VAD when `session` does not supply it. Override with your own object, or pass `null` to disable turn detection entirely.
+- `audio.input.turn_detection` defaults to semantic VAD when `session` doesn't supply it. Override with your own object, or pass `null` to disable turn detection entirely.
 - `audio.input.transcription` defaults to `{ model: 'inworld/inworld-stt-1' }`, so user-side `writing` events fire out of the box. Override with your own object, or pass `null` to disable user-side transcription.
 - `on()` and `off()` are typed against `InworldVoiceEventMap` — known event names yield a typed callback payload, unknown names fall back to `unknown`.
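
The `providerData` note in this file says the constructor value composes with `session.providerData` and wins on key collisions. A minimal sketch of that rule follows, assuming a shallow, key-level merge; the package may merge more deeply, and `ProviderData`, `SessionConfig`, and `composeProviderData` are hypothetical names used only for illustration.

```typescript
// Hypothetical shapes for illustration; the real option types live in the package.
type ProviderData = Record<string, unknown>;

interface SessionConfig {
  providerData?: ProviderData;
  [key: string]: unknown;
}

// Assumption: a shallow merge keyed by top-level branch (stt, tts, memory, ...).
// Later spreads override earlier ones, so the constructor value wins on collisions.
function composeProviderData(
  session: SessionConfig | undefined,
  constructorProviderData: ProviderData | undefined,
): ProviderData {
  return {
    ...(session?.providerData ?? {}),
    ...(constructorProviderData ?? {}),
  };
}

const merged = composeProviderData(
  { providerData: { memory: { enabled: false }, stt: { voice_profile: true } } },
  { memory: { enabled: true, turn_interval: 4 } },
);

// `memory` comes from the constructor value; `stt` survives from the session value.
console.log(merged);
```

Under this assumption a colliding branch is replaced whole rather than merged field by field, so a caller who sets the same branch in both places should put every field they need in the constructor value.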

package/dist/docs/references/reference-voice-inworld.md CHANGED Viewed

@@ -130,6 +130,6 @@ const speakers = await voice.getSpeakers()
 ## Notes
 - The TTS endpoint uses progressive NDJSON streaming, so audio playback can begin before the full response is received.
-- An API key can be provided via the `speechModel` or `listeningModel` config, or the `INWORLD_API_KEY` environment variable. TTS and STT keys are resolved independently: passing distinct `speechModel.apiKey` and `listeningModel.apiKey` values lets each service use its own credential. If only one is provided, it is reused for both services as a fallback before the env var.
+- An API key can be provided via the `speechModel` or `listeningModel` config, or the `INWORLD_API_KEY` environment variable. TTS and STT keys are resolved independently: passing distinct `speechModel.apiKey` and `listeningModel.apiKey` values lets each service use its own credential. If only one is provided, it's reused for both services as a fallback before the env var.
 - `inworld-tts-2` is the default flagship model. Use `deliveryMode` (`STABLE` | `BALANCED` | `CREATIVE`) to steer delivery style on this model. The `temperature` option is ignored on `inworld-tts-2`.
 - The `inworld-tts-1.5-mini` model offers lower latency at the cost of reduced voice quality compared to `inworld-tts-1.5-max`.
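
The key-resolution note above describes a fallback order: each service prefers its own key, then the other service's key, then the environment variable. The sketch below illustrates that order under stated assumptions. `resolveApiKeys`, `ModelConfig`, and `ResolvedKeys` are hypothetical names, and the environment value is passed in as a parameter rather than read from `process.env` so the example stays self-contained.

```typescript
// Hypothetical config shape; only the apiKey field matters for this sketch.
interface ModelConfig {
  apiKey?: string;
}

interface ResolvedKeys {
  ttsKey?: string;
  sttKey?: string;
}

// Each service prefers its own key, then the other service's key, then the env value.
function resolveApiKeys(
  speechModel: ModelConfig | undefined,
  listeningModel: ModelConfig | undefined,
  envKey: string | undefined,
): ResolvedKeys {
  return {
    ttsKey: speechModel?.apiKey ?? listeningModel?.apiKey ?? envKey,
    sttKey: listeningModel?.apiKey ?? speechModel?.apiKey ?? envKey,
  };
}

// Distinct keys: each service keeps its own credential.
const distinct = resolveApiKeys({ apiKey: 'tts-key' }, { apiKey: 'stt-key' }, 'env-key');

// Only a speech key: it is reused for STT before the env value is considered.
const speechOnly = resolveApiKeys({ apiKey: 'tts-key' }, undefined, 'env-key');

// No explicit keys: both services fall back to the env value.
const envOnly = resolveApiKeys(undefined, undefined, 'env-key');

console.log(distinct, speechOnly, envOnly);
```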

package/dist/index.cjs CHANGED Viewed

@@ -7,7 +7,7 @@ var zodToJsonSchema = require('zod-to-json-schema');
 // src/index.ts
-// ../../packages/_internal-core/dist/chunk-HDURQPU2.js
+// ../../packages/_internal-core/dist/chunk-3M4SEWMI.js
 var RegisteredLogger = {
   LLM: "LLM"};
 var LogLevel = {
@@ -104,7 +104,7 @@ var ConsoleLogger = class _ConsoleLogger extends MastraLogger {
   }
   warn(message, ...args) {
     if ((this.level === LogLevel.WARN || this.level === LogLevel.INFO || this.level === LogLevel.DEBUG) && this.shouldLog(LogLevel.WARN, message, args)) {
-      console.info(`${this.prefix()}${message}`, ...args);
+      console.warn(`${this.prefix()}${message}`, ...args);
     }
   }
   error(message, ...args) {
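
The last hunk above changes `warn()` to call `console.warn` instead of `console.info`. In Node.js this matters because `console.warn` writes to stderr while `console.info` writes to stdout. The sketch below is a simplified stand-in for the bundled logger, not its actual implementation: `SketchLogger`, `severity`, and `recordChannels` are hypothetical names, and the level check is reduced to a numeric comparison.

```typescript
type Level = 'debug' | 'info' | 'warn' | 'error';

// Lower number means more verbose; a logger emits messages at or above its level.
const severity: Record<Level, number> = { debug: 0, info: 1, warn: 2, error: 3 };

class SketchLogger {
  private readonly level: Level;

  constructor(level: Level) {
    this.level = level;
  }

  private enabled(target: Level): boolean {
    return severity[this.level] <= severity[target];
  }

  info(message: string, ...args: unknown[]): void {
    if (this.enabled('info')) console.info(message, ...args);
  }

  // Mirrors the fix: warnings go to console.warn, not console.info.
  warn(message: string, ...args: unknown[]): void {
    if (this.enabled('warn')) console.warn(message, ...args);
  }
}

// Temporarily replaces console.info and console.warn to record which one was called.
function recordChannels(run: () => void): string[] {
  const seen: string[] = [];
  const original = { info: console.info, warn: console.warn };
  console.info = () => { seen.push('info'); };
  console.warn = () => { seen.push('warn'); };
  try {
    run();
  } finally {
    console.info = original.info;
    console.warn = original.warn;
  }
  return seen;
}

// At level 'info' a warning is emitted through the warn channel.
const atInfo = recordChannels(() => new SketchLogger('info').warn('disk almost full'));

// At level 'error' the warning is suppressed entirely.
const atError = recordChannels(() => new SketchLogger('error').warn('disk almost full'));

console.log(atInfo, atError);
```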