npm - @mastra/voice-google - Versions diffs - 0.12.0 → 0.12.1 - Mend

@mastra/voice-google 0.12.0 → 0.12.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (125)

package/dist/docs/references/docs-voice-overview.md ADDED

@@ -0,0 +1,1250 @@
+# Voice in Mastra
+Mastra's Voice system provides a unified interface for voice interactions, enabling text-to-speech (TTS), speech-to-text (STT), and real-time speech-to-speech (STS) capabilities in your applications.
+## Adding voice to agents
+To learn how to integrate voice capabilities into your agents, check out the [Adding Voice to Agents](https://mastra.ai/docs/agents/adding-voice) documentation. It covers how to use both single and multiple voice providers, as well as real-time interactions.
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { OpenAIVoice } from '@mastra/voice-openai'
+// Initialize OpenAI voice for TTS
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new OpenAIVoice(),
+})
+```
+You can then use the following voice capabilities:
+### Text to Speech (TTS)
+Turn your agent's responses into natural-sounding speech using Mastra's TTS capabilities. Choose from multiple providers like OpenAI, ElevenLabs, and more.
+For detailed configuration options and advanced features, check out our [Text-to-Speech guide](https://mastra.ai/docs/voice/text-to-speech).
+**OpenAI**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { OpenAIVoice } from '@mastra/voice-openai'
+import { playAudio } from '@mastra/node-audio'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new OpenAIVoice(),
+})
+const { text } = await voiceAgent.generate('What color is the sky?')
+// Convert the text to speech and get back an audio stream
+const audioStream = await voiceAgent.voice.speak(text, {
+  speaker: 'default', // Optional: specify a speaker
+  responseFormat: 'wav', // Optional: specify a response format
+})
+playAudio(audioStream)
+```
+Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.
+**Azure**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { AzureVoice } from '@mastra/voice-azure'
+import { playAudio } from '@mastra/node-audio'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new AzureVoice(),
+})
+const { text } = await voiceAgent.generate('What color is the sky?')
+// Convert the text to speech and get back an audio stream
+const audioStream = await voiceAgent.voice.speak(text, {
+  speaker: 'en-US-JennyNeural', // Optional: specify a speaker
+})
+playAudio(audioStream)
+```
+Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.
+**ElevenLabs**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'
+import { playAudio } from '@mastra/node-audio'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new ElevenLabsVoice(),
+})
+const { text } = await voiceAgent.generate('What color is the sky?')
+// Convert the text to speech and get back an audio stream
+const audioStream = await voiceAgent.voice.speak(text, {
+  speaker: 'default', // Optional: specify a speaker
+})
+playAudio(audioStream)
+```
+Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.
+**PlayAI**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { PlayAIVoice } from '@mastra/voice-playai'
+import { playAudio } from '@mastra/node-audio'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new PlayAIVoice(),
+})
+const { text } = await voiceAgent.generate('What color is the sky?')
+// Convert the text to speech and get back an audio stream
+const audioStream = await voiceAgent.voice.speak(text, {
+  speaker: 'default', // Optional: specify a speaker
+})
+playAudio(audioStream)
+```
+Visit the [PlayAI Voice Reference](https://mastra.ai/reference/voice/playai) for more information on the PlayAI voice provider.
+**Google**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { GoogleVoice } from '@mastra/voice-google'
+import { playAudio } from '@mastra/node-audio'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new GoogleVoice(),
+})
+const { text } = await voiceAgent.generate('What color is the sky?')
+// Convert the text to speech and get back an audio stream
+const audioStream = await voiceAgent.voice.speak(text, {
+  speaker: 'en-US-Studio-O', // Optional: specify a speaker
+})
+playAudio(audioStream)
+```
+Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.
+**Cloudflare**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { CloudflareVoice } from '@mastra/voice-cloudflare'
+import { playAudio } from '@mastra/node-audio'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new CloudflareVoice(),
+})
+const { text } = await voiceAgent.generate('What color is the sky?')
+// Convert the text to speech and get back an audio stream
+const audioStream = await voiceAgent.voice.speak(text, {
+  speaker: 'default', // Optional: specify a speaker
+})
+playAudio(audioStream)
+```
+Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.
+**Deepgram**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { DeepgramVoice } from '@mastra/voice-deepgram'
+import { playAudio } from '@mastra/node-audio'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new DeepgramVoice(),
+})
+const { text } = await voiceAgent.generate('What color is the sky?')
+// Convert the text to speech and get back an audio stream
+const audioStream = await voiceAgent.voice.speak(text, {
+  speaker: 'aura-english-us', // Optional: specify a speaker
+})
+playAudio(audioStream)
+```
+Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.
+**Inworld**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { InworldVoice } from '@mastra/voice-inworld'
+import { playAudio } from '@mastra/node-audio'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new InworldVoice(),
+})
+const { text } = await voiceAgent.generate('What color is the sky?')
+// Convert the text to speech and get back an audio stream
+const audioStream = await voiceAgent.voice.speak(text, {
+  speaker: 'Dennis', // Optional: specify a speaker
+})
+playAudio(audioStream)
+```
+Visit the [Inworld Voice Reference](https://mastra.ai/reference/voice/inworld) for more information on the Inworld voice provider.
+**Speechify**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { SpeechifyVoice } from '@mastra/voice-speechify'
+import { playAudio } from '@mastra/node-audio'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new SpeechifyVoice(),
+})
+const { text } = await voiceAgent.generate('What color is the sky?')
+// Convert the text to speech and get back an audio stream
+const audioStream = await voiceAgent.voice.speak(text, {
+  speaker: 'matthew', // Optional: specify a speaker
+})
+playAudio(audioStream)
+```
+Visit the [Speechify Voice Reference](https://mastra.ai/reference/voice/speechify) for more information on the Speechify voice provider.
+**Sarvam**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { SarvamVoice } from '@mastra/voice-sarvam'
+import { playAudio } from '@mastra/node-audio'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new SarvamVoice(),
+})
+const { text } = await voiceAgent.generate('What color is the sky?')
+// Convert the text to speech and get back an audio stream
+const audioStream = await voiceAgent.voice.speak(text, {
+  speaker: 'shubh', // Optional: specify a bulbul:v3 speaker
+})
+playAudio(audioStream)
+```
+Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.
+**Murf**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { MurfVoice } from '@mastra/voice-murf'
+import { playAudio } from '@mastra/node-audio'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new MurfVoice(),
+})
+const { text } = await voiceAgent.generate('What color is the sky?')
+// Convert the text to speech and get back an audio stream
+const audioStream = await voiceAgent.voice.speak(text, {
+  speaker: 'default', // Optional: specify a speaker
+})
+playAudio(audioStream)
+```
+Visit the [Murf Voice Reference](https://mastra.ai/reference/voice/murf) for more information on the Murf voice provider.
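The TTS examples above hand the stream returned by `speak()` straight to `playAudio`. To save or post-process the audio instead, the stream can be drained into a single buffer first. The following is a minimal sketch, assuming the returned value behaves like a standard Node.js `Readable`; `streamToBuffer` is a hypothetical helper (not part of Mastra), and `Readable.from` stands in for a real audio stream.

```typescript
import { Readable } from 'node:stream'

// Hypothetical helper (not a Mastra API): drain a readable stream into one Buffer.
async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = []
  for await (const chunk of stream) {
    // Streams may yield strings or Buffers depending on how they were created.
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk) : chunk)
  }
  return Buffer.concat(chunks)
}

// Stand-in for the stream returned by voiceAgent.voice.speak(...)
const fakeAudio = Readable.from([Buffer.from([1, 2]), Buffer.from([3])])
streamToBuffer(fakeAudio).then(buffer => {
  console.log(`Collected ${buffer.length} bytes`)
})
```

Once the audio is in a buffer, it can be written to disk with `fs.promises.writeFile` or passed to another service.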
+### Speech to Text (STT)
+Transcribe spoken content using providers such as OpenAI and ElevenLabs. For detailed configuration options, check out [Speech to Text](https://mastra.ai/docs/voice/speech-to-text).
+You can download a sample audio file from [here](https://github.com/mastra-ai/realtime-voice-demo/raw/refs/heads/main/how_can_i_help_you.mp3).
+**OpenAI**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { OpenAIVoice } from '@mastra/voice-openai'
+import { createReadStream } from 'fs'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new OpenAIVoice(),
+})
+// Read the sample audio file from disk
+const audioStream = createReadStream('./how_can_i_help_you.mp3')
+// Convert audio to text
+const transcript = await voiceAgent.voice.listen(audioStream)
+console.log(`User said: ${transcript}`)
+// Generate a response based on the transcript
+const { text } = await voiceAgent.generate(transcript)
+```
+Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.
+**Azure**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { AzureVoice } from '@mastra/voice-azure'
+import { createReadStream } from 'fs'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new AzureVoice(),
+})
+// Read the sample audio file from disk
+const audioStream = createReadStream('./how_can_i_help_you.mp3')
+// Convert audio to text
+const transcript = await voiceAgent.voice.listen(audioStream)
+console.log(`User said: ${transcript}`)
+// Generate a response based on the transcript
+const { text } = await voiceAgent.generate(transcript)
+```
+Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.
+**ElevenLabs**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'
+import { createReadStream } from 'fs'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new ElevenLabsVoice(),
+})
+// Read the sample audio file from disk
+const audioStream = createReadStream('./how_can_i_help_you.mp3')
+// Convert audio to text
+const transcript = await voiceAgent.voice.listen(audioStream)
+console.log(`User said: ${transcript}`)
+// Generate a response based on the transcript
+const { text } = await voiceAgent.generate(transcript)
+```
+Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.
+**Google**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { GoogleVoice } from '@mastra/voice-google'
+import { createReadStream } from 'fs'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new GoogleVoice(),
+})
+// Read the sample audio file from disk
+const audioStream = createReadStream('./how_can_i_help_you.mp3')
+// Convert audio to text
+const transcript = await voiceAgent.voice.listen(audioStream)
+console.log(`User said: ${transcript}`)
+// Generate a response based on the transcript
+const { text } = await voiceAgent.generate(transcript)
+```
+Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.
+**Cloudflare**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { CloudflareVoice } from '@mastra/voice-cloudflare'
+import { createReadStream } from 'fs'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new CloudflareVoice(),
+})
+// Read the sample audio file from disk
+const audioStream = createReadStream('./how_can_i_help_you.mp3')
+// Convert audio to text
+const transcript = await voiceAgent.voice.listen(audioStream)
+console.log(`User said: ${transcript}`)
+// Generate a response based on the transcript
+const { text } = await voiceAgent.generate(transcript)
+```
+Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.
+**Deepgram**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { DeepgramVoice } from '@mastra/voice-deepgram'
+import { createReadStream } from 'fs'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new DeepgramVoice(),
+})
+// Read the sample audio file from disk
+const audioStream = createReadStream('./how_can_i_help_you.mp3')
+// Convert audio to text
+const transcript = await voiceAgent.voice.listen(audioStream)
+console.log(`User said: ${transcript}`)
+// Generate a response based on the transcript
+const { text } = await voiceAgent.generate(transcript)
+```
+Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.
+**Inworld**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { InworldVoice } from '@mastra/voice-inworld'
+import { createReadStream } from 'fs'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new InworldVoice(),
+})
+// Read the sample audio file from disk
+const audioStream = createReadStream('./how_can_i_help_you.mp3')
+// Convert audio to text
+const transcript = await voiceAgent.voice.listen(audioStream)
+console.log(`User said: ${transcript}`)
+// Generate a response based on the transcript
+const { text } = await voiceAgent.generate(transcript)
+```
+Visit the [Inworld Voice Reference](https://mastra.ai/reference/voice/inworld) for more information on the Inworld voice provider.
+**Sarvam**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { SarvamVoice } from '@mastra/voice-sarvam'
+import { createReadStream } from 'fs'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new SarvamVoice(),
+})
+// Read the sample audio file from disk
+const audioStream = createReadStream('./how_can_i_help_you.mp3')
+// Convert audio to text
+const transcript = await voiceAgent.voice.listen(audioStream)
+console.log(`User said: ${transcript}`)
+// Generate a response based on the transcript
+const { text } = await voiceAgent.generate(transcript)
+```
+Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.
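The STT examples above open the sample file with `createReadStream`, which returns the stream synchronously and reports a missing file only later through an `'error'` event. The sketch below shows one way to fail early with a clearer message before calling `listen()`. `openAudioFile` is a hypothetical helper introduced for illustration, not part of Mastra or any voice provider.

```typescript
import { createReadStream, existsSync, ReadStream } from 'node:fs'

// Hypothetical helper (not a Mastra API): open an audio file for transcription,
// throwing immediately if the file does not exist.
function openAudioFile(path: string): ReadStream {
  if (!existsSync(path)) {
    throw new Error(`Audio file not found: ${path}`)
  }
  // createReadStream returns the stream synchronously, so no await is needed.
  return createReadStream(path)
}

try {
  const audioStream = openAudioFile('./how_can_i_help_you.mp3')
  console.log(`Opened ${String(audioStream.path)}`)
  audioStream.destroy() // Close the handle; a real caller would pass the stream to listen()
} catch (error) {
  console.error((error as Error).message)
}
```

The existence check is a convenience rather than a guarantee: the file could still disappear between the check and the read, so an `'error'` listener on the stream remains worthwhile in production code.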
+### Speech to Speech (STS)
+Create conversational experiences with speech-to-speech capabilities. The unified API enables real-time voice interactions between users and AI agents. For detailed configuration options and advanced features, check out [Speech to Speech](https://mastra.ai/docs/voice/speech-to-speech).
+**OpenAI**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
+import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new OpenAIRealtimeVoice(),
+})
+// Listen for agent audio responses
+voiceAgent.voice.on('speaker', ({ audio }) => {
+  playAudio(audio)
+})
+// Initiate the conversation
+await voiceAgent.voice.speak('How can I help you today?')
+// Send continuous audio from the microphone
+const micStream = getMicrophoneStream()
+await voiceAgent.voice.send(micStream)
+```
+Visit the [OpenAI Realtime Voice Reference](https://mastra.ai/reference/voice/openai-realtime) for more information on the OpenAI Realtime voice provider.
+**Google**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
+import { GeminiLiveVoice } from '@mastra/voice-google-gemini-live'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new GeminiLiveVoice({
+    // Live API mode
+    apiKey: process.env.GOOGLE_API_KEY,
+    model: 'gemini-2.0-flash-exp',
+    speaker: 'Puck',
+    debug: true,
+    // Vertex AI alternative:
+    // vertexAI: true,
+    // project: 'your-gcp-project',
+    // location: 'us-central1',
+    // serviceAccountKeyFile: '/path/to/service-account.json',
+  }),
+})
+// Connect before using speak/send
+await voiceAgent.voice.connect()
+// Listen for agent audio responses
+voiceAgent.voice.on('speaker', ({ audio }) => {
+  playAudio(audio)
+})
+// Listen for text responses and transcriptions
+voiceAgent.voice.on('writing', ({ text, role }) => {
+  console.log(`${role}: ${text}`)
+})
+// Initiate the conversation
+await voiceAgent.voice.speak('How can I help you today?')
+// Send continuous audio from the microphone
+const micStream = getMicrophoneStream()
+await voiceAgent.voice.send(micStream)
+```
+Visit the [Google Gemini Live Reference](https://mastra.ai/reference/voice/google-gemini-live) for more information on the Google Gemini Live voice provider.
+**AWS Nova Sonic**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
+import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new NovaSonicVoice({
+    region: 'us-east-1',
+    speaker: 'matthew',
+    // Static credentials are optional. The default AWS credential
+    // provider chain is used when none are passed.
+  }),
+})
+// Connect before using speak/send
+await voiceAgent.voice.connect()
+// Listen for assistant audio (Int16Array PCM)
+voiceAgent.voice.on('speaking', ({ audioData }) => {
+  if (audioData) playAudio(audioData)
+})
+// Listen for transcribed text
+voiceAgent.voice.on('writing', ({ text, role }) => {
+  console.log(`${role}: ${text}`)
+})
+// Initiate the conversation
+await voiceAgent.voice.speak('How can I help you today?')
+// Send continuous audio from the microphone
+const micStream = getMicrophoneStream()
+await voiceAgent.voice.send(micStream)
+```
+Visit the [AWS Nova Sonic Reference](https://mastra.ai/reference/voice/aws-nova-sonic) for more information on the AWS Nova Sonic voice provider.
+**Inworld Realtime**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
+import { InworldRealtimeVoice } from '@mastra/voice-inworld'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'openai/gpt-5.5',
+  voice: new InworldRealtimeVoice({
+    apiKey: process.env.INWORLD_API_KEY,
+    model: 'inworld/models/gemma-4-26b-a4b-it',
+    speaker: 'Sarah',
+  }),
+})
+// Connect before using speak/send
+await voiceAgent.voice.connect()
+// Listen for agent audio (PCM stream)
+voiceAgent.voice.on('speaker', stream => {
+  playAudio(stream)
+})
+// Listen for text responses and transcriptions
+voiceAgent.voice.on('writing', ({ text, role }) => {
+  console.log(`${role}: ${text}`)
+})
+// Initiate the conversation
+await voiceAgent.voice.speak('How can I help you today?')
+// Send continuous audio from the microphone
+const micStream = getMicrophoneStream()
+await voiceAgent.voice.send(micStream)
+```
+Visit the [Inworld Realtime Reference](https://mastra.ai/reference/voice/inworld-realtime) for more information on the Inworld Realtime voice provider.
+**xAI**:
+```typescript
+import { Agent } from '@mastra/core/agent'
+import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
+import { XAIRealtimeVoice } from '@mastra/voice-xai-realtime'
+const voiceAgent = new Agent({
+  id: 'voice-agent',
+  name: 'Voice Agent',
+  instructions: 'You are a voice assistant that can help users with their tasks.',
+  model: 'xai/grok-4.3',
+  voice: new XAIRealtimeVoice({
+    apiKey: process.env.XAI_API_KEY,
+    model: 'grok-voice-think-fast-1.0',
+    speaker: 'eve',
+    turnDetection: { type: 'server_vad' },
+  }),
+})
+// Connect before using speak/send
+await voiceAgent.voice.connect()
+// Listen for agent audio responses
+voiceAgent.voice.on('speaker', audioStream => {
+  playAudio(audioStream)
+})
+// Listen for text responses and transcriptions
+voiceAgent.voice.on('writing', ({ text, role }) => {
+  console.log(`${role}: ${text}`)
+})
+// Initiate the conversation
+await voiceAgent.voice.speak('How can I help you today?')
+// Send continuous audio from the microphone
+const micStream = getMicrophoneStream()
+await voiceAgent.voice.send(micStream)
+```
+Visit the [xAI Realtime Voice Reference](https://mastra.ai/reference/voice/xai-realtime) for more information on the xAI voice provider.
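Several of the STS examples above log `'writing'` events as they arrive. To keep a running transcript instead, the same listener can append to an array. This is a rough sketch that uses Node's `EventEmitter` as a stand-in for a realtime voice provider; it assumes the provider's `on()` behaves like `EventEmitter.on` and that each event carries `{ text, role }` as in the examples. `attachTranscript` and `WritingEvent` are hypothetical names.

```typescript
import { EventEmitter } from 'node:events'

// Assumed event shape, taken from the listeners in the examples above.
type WritingEvent = { text: string; role: string }

// Hypothetical helper: accumulate 'writing' events into a running transcript.
function attachTranscript(voice: EventEmitter): string[] {
  const lines: string[] = []
  voice.on('writing', ({ text, role }: WritingEvent) => {
    lines.push(`${role}: ${text}`)
  })
  return lines
}

// Stand-in emitter; a real provider would emit these events itself.
const fakeVoice = new EventEmitter()
const transcript = attachTranscript(fakeVoice)
fakeVoice.emit('writing', { text: 'How can I help you today?', role: 'assistant' })
console.log(transcript.join('\n'))
```

Providers may deliver text as partial fragments rather than whole utterances, in which case the fragments for one turn would need to be joined before display. This sketch treats each event as a complete line.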
+## Voice configuration
+Each voice provider can be configured with different models and options. Below are the detailed configuration options for all supported providers:
+**OpenAI**:
+```typescript
+// OpenAI Voice Configuration
+const voice = new OpenAIVoice({
+  speechModel: {
+    name: 'tts-1', // Example model name
+    apiKey: process.env.OPENAI_API_KEY,
+    language: 'en-US', // Language code
+    voiceType: 'neural', // Type of voice model
+  },
+  listeningModel: {
+    name: 'whisper-1', // Example model name
+    apiKey: process.env.OPENAI_API_KEY,
+    language: 'en-US', // Language code
+    format: 'wav', // Audio format
+  },
+  speaker: 'alloy', // Example speaker name
+})
+```
+Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.
+**Azure**:
+```typescript
+// Azure Voice Configuration
+const voice = new AzureVoice({
+  speechModel: {
+    name: 'en-US-JennyNeural', // Example model name
+    apiKey: process.env.AZURE_SPEECH_KEY,
+    region: process.env.AZURE_SPEECH_REGION,
+    language: 'en-US', // Language code
+    style: 'cheerful', // Voice style
+    pitch: '+0Hz', // Pitch adjustment
+    rate: '1.0', // Speech rate
+  },
+  listeningModel: {
+    name: 'en-US', // Example model name
+    apiKey: process.env.AZURE_SPEECH_KEY,
+    region: process.env.AZURE_SPEECH_REGION,
+    format: 'simple', // Output format
+  },
+})
+```
+Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.
+**ElevenLabs**:
+```typescript
+// ElevenLabs Voice Configuration
+const voice = new ElevenLabsVoice({
+  speechModel: {
+    voiceId: 'your-voice-id', // Example voice ID
+    model: 'eleven_multilingual_v2', // Example model name
+    apiKey: process.env.ELEVENLABS_API_KEY,
+    language: 'en', // Language code
+    emotion: 'neutral', // Emotion setting
+  },
+  // ElevenLabs may not have a separate listening model
+})
+```
+Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.
+**PlayAI**:
+```typescript
+// PlayAI Voice Configuration
+const voice = new PlayAIVoice({
+  speechModel: {
+    name: 'playai-voice', // Example model name
+    speaker: 'emma', // Example speaker name
+    apiKey: process.env.PLAYAI_API_KEY,
+    language: 'en-US', // Language code
+    speed: 1.0, // Speech speed
+  },
+  // PlayAI may not have a separate listening model
+})
+```
+Visit the [PlayAI Voice Reference](https://mastra.ai/reference/voice/playai) for more information on the PlayAI voice provider.
+**Google**:
+```typescript
+// Google Voice Configuration
+const voice = new GoogleVoice({
+  speechModel: {
+    name: 'en-US-Studio-O', // Example model name
+    apiKey: process.env.GOOGLE_API_KEY,
+    languageCode: 'en-US', // Language code
+    gender: 'FEMALE', // Voice gender
+    speakingRate: 1.0, // Speaking rate
+  },
+  listeningModel: {
+    name: 'en-US', // Example model name
+    sampleRateHertz: 16000, // Sample rate
+  },
+})
+```
+Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.
+**Cloudflare**:
+```typescript
+// Cloudflare Voice Configuration
+const voice = new CloudflareVoice({
+  speechModel: {
+    name: 'cloudflare-voice', // Example model name
+    accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
+    apiToken: process.env.CLOUDFLARE_API_TOKEN,
+    language: 'en-US', // Language code
+    format: 'mp3', // Audio format
+  },
+  // Cloudflare may not have a separate listening model
+})
+```
+Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.
+**Deepgram**:
+```typescript
+// Deepgram Voice Configuration
+const voice = new DeepgramVoice({
+  speechModel: {
+    name: 'nova-2', // Example model name
+    speaker: 'aura-english-us', // Example speaker name
+    apiKey: process.env.DEEPGRAM_API_KEY,
+    language: 'en-US', // Language code
+    tone: 'formal', // Tone setting
+  },
+  listeningModel: {
+    name: 'nova-2', // Example model name
+    format: 'flac', // Audio format
+  },
+})
+```
+Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.
+**Inworld**:
+```typescript
+// Inworld Voice Configuration
+const voice = new InworldVoice({
+  speechModel: {
+    name: 'inworld-tts-2',
+    apiKey: process.env.INWORLD_API_KEY,
+  },
+  listeningModel: {
+    name: 'groq/whisper-large-v3',
+    apiKey: process.env.INWORLD_API_KEY,
+  },
+  speaker: 'Dennis',
+  audioEncoding: 'MP3',
+  sampleRateHertz: 48000,
+  language: 'en-US',
+})
+// Per-call options: `deliveryMode` is honored only by `inworld-tts-2`.
+const audioStream = await voice.speak('Hello!', {
+  deliveryMode: 'BALANCED', // 'STABLE' | 'BALANCED' | 'CREATIVE'
+  language: 'en-US', // BCP-47 per-call override
+})
+```
+Visit the [Inworld Voice Reference](https://mastra.ai/reference/voice/inworld) for more information on the Inworld voice provider.
+**Speechify**:
+```typescript
+// Speechify Voice Configuration
+const voice = new SpeechifyVoice({
+  speechModel: {
+    name: 'speechify-voice', // Example model name
+    speaker: 'matthew', // Example speaker name
+    apiKey: process.env.SPEECHIFY_API_KEY,
+    language: 'en-US', // Language code
+    speed: 1.0, // Speech speed
+  },
+  // Speechify may not have a separate listening model
+})
+```
+Visit the [Speechify Voice Reference](https://mastra.ai/reference/voice/speechify) for more information on the Speechify voice provider.
+**Sarvam**:
+```typescript
+// Sarvam Voice Configuration
+const voice = new SarvamVoice({
+  speechModel: {
+    model: 'bulbul:v3', // TTS model (bulbul:v2 or bulbul:v3)
+    apiKey: process.env.SARVAM_API_KEY,
+    language: 'en-IN', // BCP-47 language code
+  },
+  listeningModel: {
+    model: 'saarika:v2.5', // STT model (saarika:v2.5 or saaras:v3)
+    apiKey: process.env.SARVAM_API_KEY,
+  },
+  speaker: 'shubh', // Default bulbul:v3 speaker
+})
+```
+Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.
+**Murf**:
+```typescript
+// Murf Voice Configuration
+const voice = new MurfVoice({
+  speechModel: {
+    name: 'murf-voice', // Example model name
+    apiKey: process.env.MURF_API_KEY,
+    language: 'en-US', // Language code
+    emotion: 'happy', // Emotion setting
+  },
+  // Murf may not have a separate listening model
+})
+```
+Visit the [Murf Voice Reference](https://mastra.ai/reference/voice/murf) for more information on the Murf voice provider.
+**OpenAI Realtime**:
+```typescript
+// OpenAI Realtime Voice Configuration
+const voice = new OpenAIRealtimeVoice({
+  speechModel: {
+    name: 'gpt-3.5-turbo', // Example model name
+    apiKey: process.env.OPENAI_API_KEY,
+    language: 'en-US', // Language code
+  },
+  listeningModel: {
+    name: 'whisper-1', // Example model name
+    apiKey: process.env.OPENAI_API_KEY,
+    format: 'ogg', // Audio format
+  },
+  speaker: 'alloy', // Example speaker name
+})
+```
+For more information on the OpenAI Realtime voice provider, refer to the [OpenAI Realtime Voice Reference](https://mastra.ai/reference/voice/openai-realtime).
+**xAI Realtime**:
+```typescript
+// xAI Realtime Voice Configuration
+const voice = new XAIRealtimeVoice({
+  apiKey: process.env.XAI_API_KEY,
+  model: 'grok-voice-think-fast-1.0',
+  speaker: 'eve',
+  instructions: 'You are a concise voice assistant.',
+  turnDetection: {
+    type: 'server_vad',
+    threshold: 0.85,
+    silence_duration_ms: 1000,
+    prefix_padding_ms: 333,
+  },
+  audio: {
+    input: { format: { type: 'audio/pcm', rate: 24000 } },
+    output: { format: { type: 'audio/pcm', rate: 24000 } },
+  },
+  serverTools: [
+    { type: 'web_search' },
+    {
+      type: 'mcp',
+      server_url: 'https://mcp.example.com/mcp',
+      server_label: 'business-tools',
+    },
+  ],
+})
+```
+Visit the [xAI Realtime Voice Reference](https://mastra.ai/reference/voice/xai-realtime) for more information on the xAI realtime voice provider.
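To get a feel for the timing values in the xAI configuration above, the following sketch converts the `silence_duration_ms` and `prefix_padding_ms` windows into sample and byte counts at the configured `rate` of 24000. It assumes 16-bit mono PCM, which the source does not state, and the helper names `samplesFor` and `pcmBytesFor` are hypothetical and not part of any Mastra or xAI API.

```typescript
// Assumptions (not confirmed by the source): 16-bit samples, mono audio.
const BYTES_PER_SAMPLE = 2
const CHANNELS = 1

// Number of samples per channel in `ms` milliseconds at `rate` Hz.
function samplesFor(rate: number, ms: number): number {
  return Math.round((rate * ms) / 1000)
}

// Raw PCM byte count for the same window under the assumptions above.
function pcmBytesFor(rate: number, ms: number): number {
  return samplesFor(rate, ms) * BYTES_PER_SAMPLE * CHANNELS
}

const rate = 24000 // matches `rate: 24000` in the audio format above

// silence_duration_ms: 1000
console.log(samplesFor(rate, 1000), pcmBytesFor(rate, 1000)) // 24000 48000

// prefix_padding_ms: 333
console.log(samplesFor(rate, 333), pcmBytesFor(rate, 333)) // 7992 15984
```

If the provider uses a different sample width or channel count, adjust the two constants accordingly.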
+**Google Gemini Live**:
+```typescript
+// Google Gemini Live Voice Configuration
+const voice = new GeminiLiveVoice({
+  speechModel: {
+    name: 'gemini-2.0-flash-exp', // Example model name
+    apiKey: process.env.GOOGLE_API_KEY,
+  },
+  speaker: 'Puck', // Example speaker name
+  // Google Gemini Live is a realtime bidirectional API without separate speech and listening models
+})
+```
+Visit the [Google Gemini Live Reference](https://mastra.ai/reference/voice/google-gemini-live) for more information on the Google Gemini Live voice provider.
+**AWS Nova Sonic**:
+```typescript
+// AWS Nova Sonic Voice Configuration
+const voice = new NovaSonicVoice({
+  region: 'us-east-1',
+  speaker: 'matthew',
+  sessionConfig: {
+    inferenceConfiguration: {
+      temperature: 0.7,
+      maxTokens: 1024,
+    },
+    turnDetectionConfiguration: {
+      endpointingSensitivity: 'MEDIUM',
+    },
+  },
+  // AWS Nova Sonic is a realtime bidirectional API without separate speech and listening models
+})
+```
+Visit the [AWS Nova Sonic Reference](https://mastra.ai/reference/voice/aws-nova-sonic) for more information on the AWS Nova Sonic voice provider.
+**Inworld Realtime**:
+```typescript
+// Inworld Realtime Voice Configuration
+const voice = new InworldRealtimeVoice({
+  apiKey: process.env.INWORLD_API_KEY,
+  model: 'inworld/models/gemma-4-26b-a4b-it',
+  speaker: 'Sarah',
+  // Typed Inworld realtime knobs (semantic VAD, playback speed, MCP tool routing, ...)
+  session: {
+    audio: {
+      output: { speed: 1.1 },
+      input: { turn_detection: { type: 'semantic_vad', eagerness: 'high' } },
+    },
+  },
+})
+```
+Visit the [Inworld Realtime Reference](https://mastra.ai/reference/voice/inworld-realtime) for more information on the Inworld Realtime voice provider.
+**AI SDK**:
+```typescript
+// AI SDK Voice Configuration
+import { Agent } from '@mastra/core/agent'
+import { CompositeVoice } from '@mastra/core/voice'
+import { openai } from '@ai-sdk/openai'
+import { elevenlabs } from '@ai-sdk/elevenlabs'
+// Use AI SDK models directly; no separate Mastra voice provider packages are required
+const voice = new CompositeVoice({
+  input: openai.transcription('whisper-1'), // AI SDK transcription
+  output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
+})
+// Works seamlessly with your agent
+const voiceAgent = new Agent({
+  id: 'aisdk-voice-agent',
+  name: 'AI SDK Voice Agent',
+  instructions: 'You are a helpful assistant with voice capabilities.',
+  model: 'openai/gpt-5.5',
+  voice,
+})
+```
+### Using Multiple Voice Providers
+This example demonstrates how to create and use two different voice providers in Mastra: OpenAI for speech-to-text (STT) and PlayAI for text-to-speech (TTS).
+Start by creating instances of the voice providers with any necessary configuration.
+```typescript
+import { OpenAIVoice } from '@mastra/voice-openai'
+import { PlayAIVoice } from '@mastra/voice-playai'
+import { CompositeVoice } from '@mastra/core/voice'
+import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
+// Initialize OpenAI voice for STT
+const input = new OpenAIVoice({
+  listeningModel: {
+    name: 'whisper-1',
+    apiKey: process.env.OPENAI_API_KEY,
+  },
+})
+// Initialize PlayAI voice for TTS
+const output = new PlayAIVoice({
+  speechModel: {
+    name: 'playai-voice',
+    apiKey: process.env.PLAYAI_API_KEY,
+  },
+})
+// Combine the providers using CompositeVoice
+const voice = new CompositeVoice({
+  input,
+  output,
+})
+// Implement voice interactions using the combined voice provider
+const audioStream = getMicrophoneStream() // Assume this function gets audio input
+const transcript = await voice.listen(audioStream)
+// Log the transcribed text
+console.log('Transcribed text:', transcript)
+// Convert text to speech
+const responseAudio = await voice.speak(`You said: ${transcript}`, {
+  speaker: 'default', // Optional: specify a speaker
+  responseFormat: 'wav', // Optional: specify a response format
+})
+// Play the audio response
+playAudio(responseAudio)
+```
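The example above reads two API keys from the environment, and an unset variable would pass `undefined` to a provider. One way to fail fast is to check the variables before constructing the providers. This is a minimal sketch: `missingEnvVars` and `assertEnvVars` are hypothetical helpers, not part of Mastra.

```typescript
// Hypothetical helpers (not part of Mastra) for validating API keys up front.
type Env = Record<string, string | undefined>

// Returns the names of required variables that are unset or blank.
function missingEnvVars(required: readonly string[], env: Env): string[] {
  return required.filter((name) => {
    const value = env[name]
    return value === undefined || value.trim() === ''
  })
}

// Throws a single error that lists every missing variable.
function assertEnvVars(required: readonly string[], env: Env): void {
  const missing = missingEnvVars(required, env)
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`)
  }
}

// Usage in Node.js, before creating the voice providers:
// assertEnvVars(['OPENAI_API_KEY', 'PLAYAI_API_KEY'], process.env)
```

Passing the environment as an argument keeps the helpers easy to test and independent of the runtime.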
+### Using AI SDK Model Providers
+You can also use AI SDK models directly with `CompositeVoice`:
+```typescript
+import { CompositeVoice } from '@mastra/core/voice'
+import { openai } from '@ai-sdk/openai'
+import { elevenlabs } from '@ai-sdk/elevenlabs'
+import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
+// Use AI SDK models directly; no Mastra voice provider instances are needed
+const voice = new CompositeVoice({
+  input: openai.transcription('whisper-1'), // AI SDK transcription
+  output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
+})
+// Works the same way as Mastra providers
+const audioStream = getMicrophoneStream()
+const transcript = await voice.listen(audioStream)
+console.log('Transcribed text:', transcript)
+// Convert text to speech
+const responseAudio = await voice.speak(`You said: ${transcript}`, {
+  speaker: 'Rachel', // ElevenLabs voice
+})
+playAudio(responseAudio)
+```
+You can also mix AI SDK models with Mastra providers:
+```typescript
+import { CompositeVoice } from '@mastra/core/voice'
+import { PlayAIVoice } from '@mastra/voice-playai'
+import { groq } from '@ai-sdk/groq'
+const voice = new CompositeVoice({
+  input: groq.transcription('whisper-large-v3'), // AI SDK for STT
+  output: new PlayAIVoice(), // Mastra provider for TTS
+})
+```
+For more information on `CompositeVoice`, refer to the [CompositeVoice Reference](https://mastra.ai/reference/voice/composite-voice).
+## More resources
+- [CompositeVoice](https://mastra.ai/reference/voice/composite-voice)
+- [MastraVoice](https://mastra.ai/reference/voice/mastra-voice)
+- [OpenAI Voice](https://mastra.ai/reference/voice/openai)
+- [OpenAI Realtime Voice](https://mastra.ai/reference/voice/openai-realtime)
+- [xAI Realtime Voice](https://mastra.ai/reference/voice/xai-realtime)
+- [Azure Voice](https://mastra.ai/reference/voice/azure)
+- [Google Voice](https://mastra.ai/reference/voice/google)
+- [Google Gemini Live Voice](https://mastra.ai/reference/voice/google-gemini-live)
+- [AWS Nova Sonic Voice](https://mastra.ai/reference/voice/aws-nova-sonic)
+- [Deepgram Voice](https://mastra.ai/reference/voice/deepgram)
+- [Inworld Voice](https://mastra.ai/reference/voice/inworld)
+- [PlayAI Voice](https://mastra.ai/reference/voice/playai)
+- [Voice Examples](https://github.com/mastra-ai/voice-examples)