@mastra/voice-murf 0.12.0 → 0.12.1

@@ -0,0 +1,1188 @@
1
+ # Voice in Mastra
2
+
3
+ Mastra's Voice system provides a unified interface for voice interactions, enabling text-to-speech (TTS), speech-to-text (STT), and real-time speech-to-speech (STS) capabilities in your applications.
4
+
5
+ ## Adding voice to agents
6
+
7
+ To learn how to integrate voice capabilities into your agents, see the [Adding Voice to Agents](https://mastra.ai/docs/agents/adding-voice) documentation. It covers single and multiple voice providers as well as real-time interactions.
8
+
9
+ ```typescript
10
+ import { Agent } from '@mastra/core/agent'
11
+ import { OpenAIVoice } from '@mastra/voice-openai'
12
+
13
+ // Initialize OpenAI voice for TTS
14
+
15
+ const voiceAgent = new Agent({
16
+ id: 'voice-agent',
17
+ name: 'Voice Agent',
18
+ instructions: 'You are a voice assistant that can help users with their tasks.',
19
+ model: 'openai/gpt-5.4',
20
+ voice: new OpenAIVoice(),
21
+ })
22
+ ```
23
+
24
+ You can then use the following voice capabilities:
25
+
26
+ ### Text to Speech (TTS)
27
+
28
+ Turn your agent's responses into natural-sounding speech using Mastra's TTS capabilities. Choose from multiple providers like OpenAI, ElevenLabs, and more.
29
+
30
+ For detailed configuration options and advanced features, check out our [Text-to-Speech guide](https://mastra.ai/docs/voice/text-to-speech).
31
+
32
+ **OpenAI**:
33
+
34
+ ```typescript
35
+ import { Agent } from '@mastra/core/agent'
36
+ import { OpenAIVoice } from '@mastra/voice-openai'
37
+ import { playAudio } from '@mastra/node-audio'
38
+
39
+ const voiceAgent = new Agent({
40
+ id: 'voice-agent',
41
+ name: 'Voice Agent',
42
+ instructions: 'You are a voice assistant that can help users with their tasks.',
43
+ model: 'openai/gpt-5.4',
44
+ voice: new OpenAIVoice(),
45
+ })
46
+
47
+ const { text } = await voiceAgent.generate('What color is the sky?')
48
+
49
+ // Convert the text to speech and receive an audio stream
50
+ const audioStream = await voiceAgent.voice.speak(text, {
51
+ speaker: 'default', // Optional: specify a speaker
52
+ responseFormat: 'wav', // Optional: specify a response format
53
+ })
54
+
55
+ playAudio(audioStream)
56
+ ```
57
+
58
+ Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.
59
+
60
+ **Azure**:
61
+
62
+ ```typescript
63
+ import { Agent } from '@mastra/core/agent'
64
+ import { AzureVoice } from '@mastra/voice-azure'
65
+ import { playAudio } from '@mastra/node-audio'
66
+
67
+ const voiceAgent = new Agent({
68
+ id: 'voice-agent',
69
+ name: 'Voice Agent',
70
+ instructions: 'You are a voice assistant that can help users with their tasks.',
71
+ model: 'openai/gpt-5.4',
72
+ voice: new AzureVoice(),
73
+ })
74
+
75
+ const { text } = await voiceAgent.generate('What color is the sky?')
76
+
77
+ // Convert the text to speech and receive an audio stream
78
+ const audioStream = await voiceAgent.voice.speak(text, {
79
+ speaker: 'en-US-JennyNeural', // Optional: specify a speaker
80
+ })
81
+
82
+ playAudio(audioStream)
83
+ ```
84
+
85
+ Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.
86
+
87
+ **ElevenLabs**:
88
+
89
+ ```typescript
90
+ import { Agent } from '@mastra/core/agent'
91
+ import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'
92
+ import { playAudio } from '@mastra/node-audio'
93
+
94
+ const voiceAgent = new Agent({
95
+ id: 'voice-agent',
96
+ name: 'Voice Agent',
97
+ instructions: 'You are a voice assistant that can help users with their tasks.',
98
+ model: 'openai/gpt-5.4',
99
+ voice: new ElevenLabsVoice(),
100
+ })
101
+
102
+ const { text } = await voiceAgent.generate('What color is the sky?')
103
+
104
+ // Convert the text to speech and receive an audio stream
105
+ const audioStream = await voiceAgent.voice.speak(text, {
106
+ speaker: 'default', // Optional: specify a speaker
107
+ })
108
+
109
+ playAudio(audioStream)
110
+ ```
111
+
112
+ Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.
113
+
114
+ **PlayAI**:
115
+
116
+ ```typescript
117
+ import { Agent } from '@mastra/core/agent'
118
+ import { PlayAIVoice } from '@mastra/voice-playai'
119
+ import { playAudio } from '@mastra/node-audio'
120
+
121
+ const voiceAgent = new Agent({
122
+ id: 'voice-agent',
123
+ name: 'Voice Agent',
124
+ instructions: 'You are a voice assistant that can help users with their tasks.',
125
+ model: 'openai/gpt-5.4',
126
+ voice: new PlayAIVoice(),
127
+ })
128
+
129
+ const { text } = await voiceAgent.generate('What color is the sky?')
130
+
131
+ // Convert the text to speech and receive an audio stream
132
+ const audioStream = await voiceAgent.voice.speak(text, {
133
+ speaker: 'default', // Optional: specify a speaker
134
+ })
135
+
136
+ playAudio(audioStream)
137
+ ```
138
+
139
+ Visit the [PlayAI Voice Reference](https://mastra.ai/reference/voice/playai) for more information on the PlayAI voice provider.
140
+
141
+ **Google**:
142
+
143
+ ```typescript
144
+ import { Agent } from '@mastra/core/agent'
145
+ import { GoogleVoice } from '@mastra/voice-google'
146
+ import { playAudio } from '@mastra/node-audio'
147
+
148
+ const voiceAgent = new Agent({
149
+ id: 'voice-agent',
150
+ name: 'Voice Agent',
151
+ instructions: 'You are a voice assistant that can help users with their tasks.',
152
+ model: 'openai/gpt-5.4',
153
+ voice: new GoogleVoice(),
154
+ })
155
+
156
+ const { text } = await voiceAgent.generate('What color is the sky?')
157
+
158
+ // Convert the text to speech and receive an audio stream
159
+ const audioStream = await voiceAgent.voice.speak(text, {
160
+ speaker: 'en-US-Studio-O', // Optional: specify a speaker
161
+ })
162
+
163
+ playAudio(audioStream)
164
+ ```
165
+
166
+ Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.
167
+
168
+ **Cloudflare**:
169
+
170
+ ```typescript
171
+ import { Agent } from '@mastra/core/agent'
172
+ import { CloudflareVoice } from '@mastra/voice-cloudflare'
173
+ import { playAudio } from '@mastra/node-audio'
174
+
175
+ const voiceAgent = new Agent({
176
+ id: 'voice-agent',
177
+ name: 'Voice Agent',
178
+ instructions: 'You are a voice assistant that can help users with their tasks.',
179
+ model: 'openai/gpt-5.4',
180
+ voice: new CloudflareVoice(),
181
+ })
182
+
183
+ const { text } = await voiceAgent.generate('What color is the sky?')
184
+
185
+ // Convert the text to speech and receive an audio stream
186
+ const audioStream = await voiceAgent.voice.speak(text, {
187
+ speaker: 'default', // Optional: specify a speaker
188
+ })
189
+
190
+ playAudio(audioStream)
191
+ ```
192
+
193
+ Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.
194
+
195
+ **Deepgram**:
196
+
197
+ ```typescript
198
+ import { Agent } from '@mastra/core/agent'
199
+ import { DeepgramVoice } from '@mastra/voice-deepgram'
200
+ import { playAudio } from '@mastra/node-audio'
201
+
202
+ const voiceAgent = new Agent({
203
+ id: 'voice-agent',
204
+ name: 'Voice Agent',
205
+ instructions: 'You are a voice assistant that can help users with their tasks.',
206
+ model: 'openai/gpt-5.4',
207
+ voice: new DeepgramVoice(),
208
+ })
209
+
210
+ const { text } = await voiceAgent.generate('What color is the sky?')
211
+
212
+ // Convert the text to speech and receive an audio stream
213
+ const audioStream = await voiceAgent.voice.speak(text, {
214
+ speaker: 'aura-english-us', // Optional: specify a speaker
215
+ })
216
+
217
+ playAudio(audioStream)
218
+ ```
219
+
220
+ Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.
221
+
222
+ **Inworld**:
223
+
224
+ ```typescript
225
+ import { Agent } from '@mastra/core/agent'
226
+ import { InworldVoice } from '@mastra/voice-inworld'
227
+ import { playAudio } from '@mastra/node-audio'
228
+
229
+ const voiceAgent = new Agent({
230
+ id: 'voice-agent',
231
+ name: 'Voice Agent',
232
+ instructions: 'You are a voice assistant that can help users with their tasks.',
233
+ model: 'openai/gpt-5.4',
234
+ voice: new InworldVoice(),
235
+ })
236
+
237
+ const { text } = await voiceAgent.generate('What color is the sky?')
238
+
239
+ // Convert the text to speech and receive an audio stream
240
+ const audioStream = await voiceAgent.voice.speak(text, {
241
+ speaker: 'Dennis', // Optional: specify a speaker
242
+ })
243
+
244
+ playAudio(audioStream)
245
+ ```
246
+
247
+ Visit the [Inworld Voice Reference](https://mastra.ai/reference/voice/inworld) for more information on the Inworld voice provider.
248
+
249
+ **Speechify**:
250
+
251
+ ```typescript
252
+ import { Agent } from '@mastra/core/agent'
253
+ import { SpeechifyVoice } from '@mastra/voice-speechify'
254
+ import { playAudio } from '@mastra/node-audio'
255
+
256
+ const voiceAgent = new Agent({
257
+ id: 'voice-agent',
258
+ name: 'Voice Agent',
259
+ instructions: 'You are a voice assistant that can help users with their tasks.',
260
+ model: 'openai/gpt-5.4',
261
+ voice: new SpeechifyVoice(),
262
+ })
263
+
264
+ const { text } = await voiceAgent.generate('What color is the sky?')
265
+
266
+ // Convert the text to speech and receive an audio stream
267
+ const audioStream = await voiceAgent.voice.speak(text, {
268
+ speaker: 'matthew', // Optional: specify a speaker
269
+ })
270
+
271
+ playAudio(audioStream)
272
+ ```
273
+
274
+ Visit the [Speechify Voice Reference](https://mastra.ai/reference/voice/speechify) for more information on the Speechify voice provider.
275
+
276
+ **Sarvam**:
277
+
278
+ ```typescript
279
+ import { Agent } from '@mastra/core/agent'
280
+ import { SarvamVoice } from '@mastra/voice-sarvam'
281
+ import { playAudio } from '@mastra/node-audio'
282
+
283
+ const voiceAgent = new Agent({
284
+ id: 'voice-agent',
285
+ name: 'Voice Agent',
286
+ instructions: 'You are a voice assistant that can help users with their tasks.',
287
+ model: 'openai/gpt-5.4',
288
+ voice: new SarvamVoice(),
289
+ })
290
+
291
+ const { text } = await voiceAgent.generate('What color is the sky?')
292
+
293
+ // Convert the text to speech and receive an audio stream
294
+ const audioStream = await voiceAgent.voice.speak(text, {
295
+ speaker: 'shubh', // Optional: specify a bulbul:v3 speaker
296
+ })
297
+
298
+ playAudio(audioStream)
299
+ ```
300
+
301
+ Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.
302
+
303
+ **Murf**:
304
+
305
+ ```typescript
306
+ import { Agent } from '@mastra/core/agent'
307
+ import { MurfVoice } from '@mastra/voice-murf'
308
+ import { playAudio } from '@mastra/node-audio'
309
+
310
+ const voiceAgent = new Agent({
311
+ id: 'voice-agent',
312
+ name: 'Voice Agent',
313
+ instructions: 'You are a voice assistant that can help users with their tasks.',
314
+ model: 'openai/gpt-5.4',
315
+ voice: new MurfVoice(),
316
+ })
317
+
318
+ const { text } = await voiceAgent.generate('What color is the sky?')
319
+
320
+ // Convert the text to speech and receive an audio stream
321
+ const audioStream = await voiceAgent.voice.speak(text, {
322
+ speaker: 'default', // Optional: specify a speaker
323
+ })
324
+
325
+ playAudio(audioStream)
326
+ ```
327
+
328
+ Visit the [Murf Voice Reference](https://mastra.ai/reference/voice/murf) for more information on the Murf voice provider.
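
Each TTS example above plays the audio directly with `playAudio`. Where playback is not available (for example on a server), the stream can be written to disk instead. The following is a minimal sketch, assuming the value returned by `speak()` is a Node.js readable stream. `saveAudio`, `fakeAudio`, and the output path are illustrative names that are not part of Mastra, and a stand-in stream replaces the real provider call so the snippet runs without API keys.

```typescript
import { createWriteStream, readFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { Readable } from 'node:stream'
import { pipeline } from 'node:stream/promises'

// Hypothetical helper: pipes an audio stream into a file and resolves with
// the file path once every chunk has been flushed to disk.
async function saveAudio(audio: NodeJS.ReadableStream, filePath: string): Promise<string> {
  await pipeline(audio, createWriteStream(filePath))
  return filePath
}

// Stand-in for `await voiceAgent.voice.speak(text)` (assumption: a readable stream).
const fakeAudio = Readable.from([Buffer.from('fake-'), Buffer.from('audio')])
const outputPath = join(tmpdir(), 'mastra-voice-sample.wav')

const saved = saveAudio(fakeAudio, outputPath).then(path => {
  console.log(`Wrote ${readFileSync(path).length} bytes to ${path}`)
  return path
})
```

With a real provider, pass the stream returned by `speak()` in place of `fakeAudio`, and pick a file extension that matches the provider's output format.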
329
+
330
+ ### Speech to Text (STT)
331
+
332
+ Transcribe spoken content using providers such as OpenAI, ElevenLabs, and more. For detailed configuration options, see [Speech to Text](https://mastra.ai/docs/voice/speech-to-text).
333
+
334
+ To try the examples below, download the [sample audio file](https://github.com/mastra-ai/realtime-voice-demo/raw/refs/heads/main/how_can_i_help_you.mp3).
335
+
338
+ **OpenAI**:
339
+
340
+ ```typescript
341
+ import { Agent } from '@mastra/core/agent'
342
+ import { OpenAIVoice } from '@mastra/voice-openai'
343
+ import { createReadStream } from 'fs'
344
+
345
+ const voiceAgent = new Agent({
346
+ id: 'voice-agent',
347
+ name: 'Voice Agent',
348
+ instructions: 'You are a voice assistant that can help users with their tasks.',
349
+ model: 'openai/gpt-5.4',
350
+ voice: new OpenAIVoice(),
351
+ })
352
+
353
+ // Read the sample audio file from the local filesystem
354
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
355
+
356
+ // Convert audio to text
357
+ const transcript = await voiceAgent.voice.listen(audioStream)
358
+ console.log(`User said: ${transcript}`)
359
+
360
+ // Generate a response based on the transcript
361
+ const { text } = await voiceAgent.generate(transcript)
362
+ ```
363
+
364
+ Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.
365
+
366
+ **Azure**:
367
+
368
+ ```typescript
370
+ import { Agent } from '@mastra/core/agent'
371
+ import { AzureVoice } from '@mastra/voice-azure'
372
+ import { createReadStream } from 'fs'
373
+
374
+ const voiceAgent = new Agent({
375
+ id: 'voice-agent',
376
+ name: 'Voice Agent',
377
+ instructions: 'You are a voice assistant that can help users with their tasks.',
378
+ model: 'openai/gpt-5.4',
379
+ voice: new AzureVoice(),
380
+ })
381
+
382
+ // Read the sample audio file from the local filesystem
383
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
384
+
385
+ // Convert audio to text
386
+ const transcript = await voiceAgent.voice.listen(audioStream)
387
+ console.log(`User said: ${transcript}`)
388
+
389
+ // Generate a response based on the transcript
390
+ const { text } = await voiceAgent.generate(transcript)
391
+ ```
392
+
393
+ Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.
394
+
395
+ **ElevenLabs**:
396
+
397
+ ```typescript
398
+ import { Agent } from '@mastra/core/agent'
399
+ import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'
400
+ import { createReadStream } from 'fs'
401
+
402
+ const voiceAgent = new Agent({
403
+ id: 'voice-agent',
404
+ name: 'Voice Agent',
405
+ instructions: 'You are a voice assistant that can help users with their tasks.',
406
+ model: 'openai/gpt-5.4',
407
+ voice: new ElevenLabsVoice(),
408
+ })
409
+
410
+ // Read the sample audio file from the local filesystem
411
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
412
+
413
+ // Convert audio to text
414
+ const transcript = await voiceAgent.voice.listen(audioStream)
415
+ console.log(`User said: ${transcript}`)
416
+
417
+ // Generate a response based on the transcript
418
+ const { text } = await voiceAgent.generate(transcript)
419
+ ```
420
+
421
+ Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.
422
+
423
+ **Google**:
424
+
425
+ ```typescript
426
+ import { Agent } from '@mastra/core/agent'
427
+ import { GoogleVoice } from '@mastra/voice-google'
428
+ import { createReadStream } from 'fs'
429
+
430
+ const voiceAgent = new Agent({
431
+ id: 'voice-agent',
432
+ name: 'Voice Agent',
433
+ instructions: 'You are a voice assistant that can help users with their tasks.',
434
+ model: 'openai/gpt-5.4',
435
+ voice: new GoogleVoice(),
436
+ })
437
+
438
+ // Read the sample audio file from the local filesystem
439
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
440
+
441
+ // Convert audio to text
442
+ const transcript = await voiceAgent.voice.listen(audioStream)
443
+ console.log(`User said: ${transcript}`)
444
+
445
+ // Generate a response based on the transcript
446
+ const { text } = await voiceAgent.generate(transcript)
447
+ ```
448
+
449
+ Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.
450
+
451
+ **Cloudflare**:
452
+
453
+ ```typescript
454
+ import { Agent } from '@mastra/core/agent'
455
+ import { CloudflareVoice } from '@mastra/voice-cloudflare'
456
+ import { createReadStream } from 'fs'
457
+
458
+ const voiceAgent = new Agent({
459
+ id: 'voice-agent',
460
+ name: 'Voice Agent',
461
+ instructions: 'You are a voice assistant that can help users with their tasks.',
462
+ model: 'openai/gpt-5.4',
463
+ voice: new CloudflareVoice(),
464
+ })
465
+
466
+ // Read the sample audio file from the local filesystem
467
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
468
+
469
+ // Convert audio to text
470
+ const transcript = await voiceAgent.voice.listen(audioStream)
471
+ console.log(`User said: ${transcript}`)
472
+
473
+ // Generate a response based on the transcript
474
+ const { text } = await voiceAgent.generate(transcript)
475
+ ```
476
+
477
+ Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.
478
+
479
+ **Deepgram**:
480
+
481
+ ```typescript
482
+ import { Agent } from '@mastra/core/agent'
483
+ import { DeepgramVoice } from '@mastra/voice-deepgram'
484
+ import { createReadStream } from 'fs'
485
+
486
+ const voiceAgent = new Agent({
487
+ id: 'voice-agent',
488
+ name: 'Voice Agent',
489
+ instructions: 'You are a voice assistant that can help users with their tasks.',
490
+ model: 'openai/gpt-5.4',
491
+ voice: new DeepgramVoice(),
492
+ })
493
+
494
+ // Read the sample audio file from the local filesystem
495
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
496
+
497
+ // Convert audio to text
498
+ const transcript = await voiceAgent.voice.listen(audioStream)
499
+ console.log(`User said: ${transcript}`)
500
+
501
+ // Generate a response based on the transcript
502
+ const { text } = await voiceAgent.generate(transcript)
503
+ ```
504
+
505
+ Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.
506
+
507
+ **Inworld**:
508
+
509
+ ```typescript
510
+ import { Agent } from '@mastra/core/agent'
511
+ import { InworldVoice } from '@mastra/voice-inworld'
512
+ import { createReadStream } from 'fs'
513
+
514
+ const voiceAgent = new Agent({
515
+ id: 'voice-agent',
516
+ name: 'Voice Agent',
517
+ instructions: 'You are a voice assistant that can help users with their tasks.',
518
+ model: 'openai/gpt-5.4',
519
+ voice: new InworldVoice(),
520
+ })
521
+
522
+ // Read the sample audio file from the local filesystem
523
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
524
+
525
+ // Convert audio to text
526
+ const transcript = await voiceAgent.voice.listen(audioStream)
527
+ console.log(`User said: ${transcript}`)
528
+
529
+ // Generate a response based on the transcript
530
+ const { text } = await voiceAgent.generate(transcript)
531
+ ```
532
+
533
+ Visit the [Inworld Voice Reference](https://mastra.ai/reference/voice/inworld) for more information on the Inworld voice provider.
534
+
535
+ **Sarvam**:
536
+
537
+ ```typescript
538
+ import { Agent } from '@mastra/core/agent'
539
+ import { SarvamVoice } from '@mastra/voice-sarvam'
540
+ import { createReadStream } from 'fs'
541
+
542
+ const voiceAgent = new Agent({
543
+ id: 'voice-agent',
544
+ name: 'Voice Agent',
545
+ instructions: 'You are a voice assistant that can help users with their tasks.',
546
+ model: 'openai/gpt-5.4',
547
+ voice: new SarvamVoice(),
548
+ })
549
+
550
+ // Read the sample audio file from the local filesystem
551
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
552
+
553
+ // Convert audio to text
554
+ const transcript = await voiceAgent.voice.listen(audioStream)
555
+ console.log(`User said: ${transcript}`)
556
+
557
+ // Generate a response based on the transcript
558
+ const { text } = await voiceAgent.generate(transcript)
559
+ ```
560
+
561
+ Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.
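
The STT examples above treat the result of `listen()` as a plain string. As a hedged sketch, assuming some providers may instead return a readable stream of text chunks (check the provider reference to confirm), a small helper can normalize both cases before the transcript is passed to `generate()`. `transcriptToString` is an illustrative name that is not part of Mastra, and the inputs below are stand-ins for real `listen()` results.

```typescript
import { Readable } from 'node:stream'

// Hypothetical helper: returns the transcript as a string whether the
// provider resolved with a string or with a stream of text chunks.
async function transcriptToString(result: string | NodeJS.ReadableStream): Promise<string> {
  if (typeof result === 'string') return result

  const chunks: Buffer[] = []
  for await (const chunk of result) {
    // Chunks may arrive as strings or Buffers depending on the stream mode.
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk, 'utf8') : chunk)
  }
  return Buffer.concat(chunks).toString('utf8')
}

// Stand-ins for `await voiceAgent.voice.listen(audioStream)`.
const fromString = transcriptToString('How can I help you?')
const fromStream = transcriptToString(Readable.from(['How can ', 'I help you?']))

Promise.all([fromString, fromStream]).then(([a, b]) => {
  console.log(`User said: ${a}`)
  console.log(`User said: ${b}`)
})
```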
562
+
563
+ ### Speech to Speech (STS)
564
+
565
+ Create conversational experiences with speech-to-speech capabilities. The unified API enables real-time voice interactions between users and AI agents. For detailed configuration options and advanced features, check out [Speech to Speech](https://mastra.ai/docs/voice/speech-to-speech).
566
+
567
+ **OpenAI**:
568
+
569
+ ```typescript
570
+ import { Agent } from '@mastra/core/agent'
571
+ import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
572
+ import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'
573
+
574
+ const voiceAgent = new Agent({
575
+ id: 'voice-agent',
576
+ name: 'Voice Agent',
577
+ instructions: 'You are a voice assistant that can help users with their tasks.',
578
+ model: 'openai/gpt-5.4',
579
+ voice: new OpenAIRealtimeVoice(),
580
+ })
581
+
582
+ // Listen for agent audio responses
583
+ voiceAgent.voice.on('speaker', ({ audio }) => {
584
+ playAudio(audio)
585
+ })
586
+
587
+ // Initiate the conversation
588
+ await voiceAgent.voice.speak('How can I help you today?')
589
+
590
+ // Send continuous audio from the microphone
591
+ const micStream = getMicrophoneStream()
592
+ await voiceAgent.voice.send(micStream)
593
+ ```
594
+
595
+ Visit the [OpenAI Realtime Voice Reference](https://mastra.ai/reference/voice/openai-realtime) for more information on the OpenAI Realtime voice provider.
596
+
597
+ **Google**:
598
+
599
+ ```typescript
600
+ import { Agent } from '@mastra/core/agent'
601
+ import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
602
+ import { GeminiLiveVoice } from '@mastra/voice-google-gemini-live'
603
+
604
+ const voiceAgent = new Agent({
605
+ id: 'voice-agent',
606
+ name: 'Voice Agent',
607
+ instructions: 'You are a voice assistant that can help users with their tasks.',
608
+ model: 'openai/gpt-5.4',
609
+ voice: new GeminiLiveVoice({
610
+ // Live API mode
611
+ apiKey: process.env.GOOGLE_API_KEY,
612
+ model: 'gemini-2.0-flash-exp',
613
+ speaker: 'Puck',
614
+ debug: true,
615
+ // Vertex AI alternative:
616
+ // vertexAI: true,
617
+ // project: 'your-gcp-project',
618
+ // location: 'us-central1',
619
+ // serviceAccountKeyFile: '/path/to/service-account.json',
620
+ }),
621
+ })
622
+
623
+ // Connect before using speak/send
624
+ await voiceAgent.voice.connect()
625
+
626
+ // Listen for agent audio responses
627
+ voiceAgent.voice.on('speaker', ({ audio }) => {
628
+ playAudio(audio)
629
+ })
630
+
631
+ // Listen for text responses and transcriptions
632
+ voiceAgent.voice.on('writing', ({ text, role }) => {
633
+ console.log(`${role}: ${text}`)
634
+ })
635
+
636
+ // Initiate the conversation
637
+ await voiceAgent.voice.speak('How can I help you today?')
638
+
639
+ // Send continuous audio from the microphone
640
+ const micStream = getMicrophoneStream()
641
+ await voiceAgent.voice.send(micStream)
642
+ ```
643
+
644
+ Visit the [Google Gemini Live Reference](https://mastra.ai/reference/voice/google-gemini-live) for more information on the Google Gemini Live voice provider.
645
+
646
+ **AWS Nova Sonic**:
647
+
648
+ ```typescript
649
+ import { Agent } from '@mastra/core/agent'
650
+ import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
651
+ import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic'
652
+
653
+ const voiceAgent = new Agent({
654
+ id: 'voice-agent',
655
+ name: 'Voice Agent',
656
+ instructions: 'You are a voice assistant that can help users with their tasks.',
657
+ model: 'openai/gpt-5.4',
658
+ voice: new NovaSonicVoice({
659
+ region: 'us-east-1',
660
+ speaker: 'matthew',
661
+ // Static credentials are optional. The default AWS credential
662
+ // provider chain is used when none are passed.
663
+ }),
664
+ })
665
+
666
+ // Connect before using speak/send
667
+ await voiceAgent.voice.connect()
668
+
669
+ // Listen for assistant audio (Int16Array PCM)
670
+ voiceAgent.voice.on('speaking', ({ audioData }) => {
671
+ if (audioData) playAudio(audioData)
672
+ })
673
+
674
+ // Listen for transcribed text
675
+ voiceAgent.voice.on('writing', ({ text, role }) => {
676
+ console.log(`${role}: ${text}`)
677
+ })
678
+
679
+ // Initiate the conversation
680
+ await voiceAgent.voice.speak('How can I help you today?')
681
+
682
+ // Send continuous audio from the microphone
683
+ const micStream = getMicrophoneStream()
684
+ await voiceAgent.voice.send(micStream)
685
+ ```
686
+
687
+ Visit the [AWS Nova Sonic Reference](https://mastra.ai/reference/voice/aws-nova-sonic) for more information on the AWS Nova Sonic voice provider.
688
+
689
+ **xAI**:
690
+
691
+ ```typescript
692
+ import { Agent } from '@mastra/core/agent'
693
+ import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
694
+ import { XAIRealtimeVoice } from '@mastra/voice-xai-realtime'
695
+
696
+ const voiceAgent = new Agent({
697
+ id: 'voice-agent',
698
+ name: 'Voice Agent',
699
+ instructions: 'You are a voice assistant that can help users with their tasks.',
700
+ model: 'xai/grok-4.3',
701
+ voice: new XAIRealtimeVoice({
702
+ apiKey: process.env.XAI_API_KEY,
703
+ model: 'grok-voice-think-fast-1.0',
704
+ speaker: 'eve',
705
+ turnDetection: { type: 'server_vad' },
706
+ }),
707
+ })
708
+
709
+ // Connect before using speak/send
710
+ await voiceAgent.voice.connect()
711
+
712
+ // Listen for agent audio responses
713
+ voiceAgent.voice.on('speaker', audioStream => {
714
+ playAudio(audioStream)
715
+ })
716
+
717
+ // Listen for text responses and transcriptions
718
+ voiceAgent.voice.on('writing', ({ text, role }) => {
719
+ console.log(`${role}: ${text}`)
720
+ })
721
+
722
+ // Initiate the conversation
723
+ await voiceAgent.voice.speak('How can I help you today?')
724
+
725
+ // Send continuous audio from the microphone
726
+ const micStream = getMicrophoneStream()
727
+ await voiceAgent.voice.send(micStream)
728
+ ```
729
+
730
+ Visit the [xAI Realtime Voice Reference](https://mastra.ai/reference/voice/xai-realtime) for more information on the xAI voice provider.
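
Several of the STS examples above log each `writing` event as it arrives. As a rough sketch, assuming a provider may emit text in fragments rather than whole sentences, the events can be merged into one entry per consecutive speaker turn. `collectTranscript` and `fakeVoice` are illustrative names, and a plain Node.js `EventEmitter` stands in for the real voice object so the snippet runs without a provider connection.

```typescript
import { EventEmitter } from 'node:events'

// Payload shape taken from the `writing` handlers in the examples above.
type WritingEvent = { text: string; role: string }

// Hypothetical helper: merges consecutive fragments from the same role
// into a single "role: text" entry. The returned array fills in over time.
function collectTranscript(voice: Pick<EventEmitter, 'on'>): string[] {
  const turns: string[] = []
  let lastRole: string | undefined

  voice.on('writing', ({ text, role }: WritingEvent) => {
    if (role === lastRole) {
      turns[turns.length - 1] += text
    } else {
      turns.push(`${role}: ${text}`)
      lastRole = role
    }
  })
  return turns
}

// Stand-in for `voiceAgent.voice` (assumption: it exposes `on(event, callback)`).
const fakeVoice = new EventEmitter()
const turns = collectTranscript(fakeVoice)

fakeVoice.emit('writing', { text: 'How can I ', role: 'assistant' })
fakeVoice.emit('writing', { text: 'help you today?', role: 'assistant' })
fakeVoice.emit('writing', { text: 'What is the weather?', role: 'user' })

console.log(turns)
```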
731
+
732
+ ## Voice configuration
733
+
734
+ Each voice provider can be configured with different models and options. Below are the detailed configuration options for all supported providers:
735
+
736
+ **OpenAI**:
737
+
738
+ ```typescript
739
+ // OpenAI Voice Configuration
740
+ const voice = new OpenAIVoice({
741
+ speechModel: {
742
+ name: 'tts-1', // Example TTS model name
743
+ apiKey: process.env.OPENAI_API_KEY,
744
+ language: 'en-US', // Language code
745
+ voiceType: 'neural', // Type of voice model
746
+ },
747
+ listeningModel: {
748
+ name: 'whisper-1', // Example model name
749
+ apiKey: process.env.OPENAI_API_KEY,
750
+ language: 'en-US', // Language code
751
+ format: 'wav', // Audio format
752
+ },
753
+ speaker: 'alloy', // Example speaker name
754
+ })
755
+ ```
756
+
757
+ Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.
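
The configuration examples in this section read credentials from `process.env`. An unset variable evaluates to `undefined`, and the resulting failure may only surface on the first request. A minimal sketch of failing fast at startup follows. `requireEnv` is an illustrative helper that is not part of Mastra, and the second parameter exists only so the behavior can be shown without touching the real environment.

```typescript
// Hypothetical helper: returns the variable's value, or throws a descriptive
// error when it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name]
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

// Demonstration with an explicit environment object instead of process.env.
const demoEnv = { OPENAI_API_KEY: 'sk-example' }
const apiKey = requireEnv('OPENAI_API_KEY', demoEnv)

let missingMessage = ''
try {
  requireEnv('MURF_API_KEY', demoEnv)
} catch (error) {
  missingMessage = (error as Error).message
}
console.log(apiKey, missingMessage)
```

In a real setup, the result of `requireEnv('OPENAI_API_KEY')` would be passed as the `apiKey` value in the provider configuration.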
758
+
759
+ **Azure**:
760
+
761
+ ```typescript
762
+ // Azure Voice Configuration
763
+ const voice = new AzureVoice({
764
+ speechModel: {
765
+ name: 'en-US-JennyNeural', // Example model name
766
+ apiKey: process.env.AZURE_SPEECH_KEY,
767
+ region: process.env.AZURE_SPEECH_REGION,
768
+ language: 'en-US', // Language code
769
+ style: 'cheerful', // Voice style
770
+ pitch: '+0Hz', // Pitch adjustment
771
+ rate: '1.0', // Speech rate
772
+ },
773
+ listeningModel: {
774
+ name: 'en-US', // Example model name
775
+ apiKey: process.env.AZURE_SPEECH_KEY,
776
+ region: process.env.AZURE_SPEECH_REGION,
777
+ format: 'simple', // Output format
778
+ },
779
+ })
780
+ ```
781
+
782
+ Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.
783
+
784
+ **ElevenLabs**:
785
+
786
+ ```typescript
787
+ // ElevenLabs Voice Configuration
788
+ const voice = new ElevenLabsVoice({
789
+ speechModel: {
790
+ voiceId: 'your-voice-id', // Example voice ID
791
+ model: 'eleven_multilingual_v2', // Example model name
792
+ apiKey: process.env.ELEVENLABS_API_KEY,
793
+ language: 'en', // Language code
794
+ emotion: 'neutral', // Emotion setting
795
+ },
796
+ // ElevenLabs may not have a separate listening model
797
+ })
798
+ ```
799
+
800
+ Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.
801
+
802
+ **PlayAI**:
803
+
804
+ ```typescript
805
+ // PlayAI Voice Configuration
806
+ const voice = new PlayAIVoice({
807
+ speechModel: {
808
+ name: 'playai-voice', // Example model name
809
+ speaker: 'emma', // Example speaker name
810
+ apiKey: process.env.PLAYAI_API_KEY,
811
+ language: 'en-US', // Language code
812
+ speed: 1.0, // Speech speed
813
+ },
814
+ // PlayAI may not have a separate listening model
815
+ })
816
+ ```
817
+
818
+ Visit the [PlayAI Voice Reference](https://mastra.ai/reference/voice/playai) for more information on the PlayAI voice provider.
819
+
820
+ **Google**:
821
+
822
+ ```typescript
823
+ // Google Voice Configuration
824
+ const voice = new GoogleVoice({
825
+ speechModel: {
826
+ name: 'en-US-Studio-O', // Example model name
827
+ apiKey: process.env.GOOGLE_API_KEY,
828
+ languageCode: 'en-US', // Language code
829
+ gender: 'FEMALE', // Voice gender
830
+ speakingRate: 1.0, // Speaking rate
831
+ },
832
+ listeningModel: {
833
+ name: 'en-US', // Example model name
834
+ sampleRateHertz: 16000, // Sample rate
835
+ },
836
+ })
837
+ ```
838
+
839
+ Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.
840
+
841
+ **Cloudflare**:
842
+
843
+ ```typescript
844
+ // Cloudflare Voice Configuration
845
+ const voice = new CloudflareVoice({
846
+ speechModel: {
847
+ name: 'cloudflare-voice', // Example model name
848
+ accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
849
+ apiToken: process.env.CLOUDFLARE_API_TOKEN,
850
+ language: 'en-US', // Language code
851
+ format: 'mp3', // Audio format
852
+ },
853
+ // Cloudflare may not have a separate listening model
854
+ })
855
+ ```
856
+
857
+ Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.
858
+
859
+ **Deepgram**:
860
+
861
+ ```typescript
862
+ // Deepgram Voice Configuration
863
+ const voice = new DeepgramVoice({
864
+ speechModel: {
865
+ name: 'aura', // Example TTS model name
866
+ speaker: 'aura-english-us', // Example speaker name
867
+ apiKey: process.env.DEEPGRAM_API_KEY,
868
+ language: 'en-US', // Language code
869
+ tone: 'formal', // Tone setting
870
+ },
871
+ listeningModel: {
872
+ name: 'nova-2', // Example model name
873
+ format: 'flac', // Audio format
874
+ },
875
+ })
876
+ ```
877
+
878
+ Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.
879
+
880
+ **Inworld**:
881
+
882
+ ```typescript
883
+ // Inworld Voice Configuration
884
+ const voice = new InworldVoice({
885
+ speechModel: {
886
+ name: 'inworld-tts-2',
887
+ apiKey: process.env.INWORLD_API_KEY,
888
+ },
889
+ listeningModel: {
890
+ name: 'groq/whisper-large-v3',
891
+ apiKey: process.env.INWORLD_API_KEY,
892
+ },
893
+ speaker: 'Dennis',
894
+ audioEncoding: 'MP3',
895
+ sampleRateHertz: 48000,
896
+ language: 'en-US',
897
+ })
898
+
899
+ // Per-call options: `deliveryMode` is honored only by `inworld-tts-2`.
900
+ const audioStream = await voice.speak('Hello!', {
901
+ deliveryMode: 'BALANCED', // 'STABLE' | 'BALANCED' | 'CREATIVE'
902
+ language: 'en-US', // BCP-47 per-call override
903
+ })
904
+ ```
905
+
906
+ Visit the [Inworld Voice Reference](https://mastra.ai/reference/voice/inworld) for more information on the Inworld voice provider.
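
Because `deliveryMode` is honored only by `inworld-tts-2`, it can help to build per-call options in one place and omit the field for other models. The sketch below is one way to do that. The `SpeakOptions` type, the `buildSpeakOptions` helper, and the `'some-other-model'` name are hypothetical and are not part of the Inworld provider's API.

```typescript
// Hypothetical types for illustration; the real provider defines its own.
type DeliveryMode = 'STABLE' | 'BALANCED' | 'CREATIVE'

interface SpeakOptions {
  deliveryMode?: DeliveryMode
  language?: string
}

// Includes deliveryMode only when the model is known to honor it.
function buildSpeakOptions(
  modelName: string,
  language: string,
  deliveryMode?: DeliveryMode,
): SpeakOptions {
  const options: SpeakOptions = { language }
  if (deliveryMode !== undefined && modelName === 'inworld-tts-2') {
    options.deliveryMode = deliveryMode
  }
  return options
}

// deliveryMode is kept for inworld-tts-2 and dropped for any other model name.
const tts2Options = buildSpeakOptions('inworld-tts-2', 'en-US', 'BALANCED')
const otherOptions = buildSpeakOptions('some-other-model', 'en-US', 'CREATIVE')
```

The resulting object could then be passed as the second argument to `voice.speak()`.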
907
+
908
+ **Speechify**:
909
+
910
+ ```typescript
911
+ // Speechify Voice Configuration
912
+ const voice = new SpeechifyVoice({
913
+ speechModel: {
914
+ name: 'speechify-voice', // Example model name
915
+ speaker: 'matthew', // Example speaker name
916
+ apiKey: process.env.SPEECHIFY_API_KEY,
917
+ language: 'en-US', // Language code
918
+ speed: 1.0, // Speech speed
919
+ },
920
+ // Speechify may not have a separate listening model
921
+ })
922
+ ```
923
+
924
+ Visit the [Speechify Voice Reference](https://mastra.ai/reference/voice/speechify) for more information on the Speechify voice provider.
925
+
926
+ **Sarvam**:
927
+
928
+ ```typescript
929
+ // Sarvam Voice Configuration
930
+ const voice = new SarvamVoice({
931
+ speechModel: {
932
+ model: 'bulbul:v3', // TTS model (bulbul:v2 or bulbul:v3)
933
+ apiKey: process.env.SARVAM_API_KEY,
934
+ language: 'en-IN', // BCP-47 language code
935
+ },
936
+ listeningModel: {
937
+ model: 'saarika:v2.5', // STT model (saarika:v2.5 or saaras:v3)
938
+ apiKey: process.env.SARVAM_API_KEY,
939
+ },
940
+ speaker: 'shubh', // Default bulbul:v3 speaker
941
+ })
942
+ ```
943
+
944
+ Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.
945
+
946
+ **Murf**:
947
+
948
+ ```typescript
949
+ // Murf Voice Configuration
950
+ const voice = new MurfVoice({
951
+ speechModel: {
952
+ name: 'murf-voice', // Example model name
953
+ apiKey: process.env.MURF_API_KEY,
954
+ language: 'en-US', // Language code
955
+ emotion: 'happy', // Emotion setting
956
+ },
957
+ // Murf may not have a separate listening model
958
+ })
959
+ ```
960
+
961
+ Visit the [Murf Voice Reference](https://mastra.ai/reference/voice/murf) for more information on the Murf voice provider.
962
+
963
+ **OpenAI Realtime**:
964
+
965
+ ```typescript
966
+ // OpenAI Realtime Voice Configuration
967
+ const voice = new OpenAIRealtimeVoice({
968
+ speechModel: {
969
+ name: 'gpt-4o-mini-realtime-preview', // Example realtime model name
970
+ apiKey: process.env.OPENAI_API_KEY,
971
+ language: 'en-US', // Language code
972
+ },
973
+ listeningModel: {
974
+ name: 'whisper-1', // Example model name
975
+ apiKey: process.env.OPENAI_API_KEY,
976
+ format: 'ogg', // Audio format
977
+ },
978
+ speaker: 'alloy', // Example speaker name
979
+ })
980
+ ```
981
+
982
+ For more information on the OpenAI Realtime voice provider, refer to the [OpenAI Realtime Voice Reference](https://mastra.ai/reference/voice/openai-realtime).
983
+
984
+ **xAI Realtime**:
985
+
986
+ ```typescript
987
+ // xAI Realtime Voice Configuration
988
+ const voice = new XAIRealtimeVoice({
989
+ apiKey: process.env.XAI_API_KEY,
990
+ model: 'grok-voice-think-fast-1.0',
991
+ speaker: 'eve',
992
+ instructions: 'You are a concise voice assistant.',
993
+ turnDetection: {
994
+ type: 'server_vad',
995
+ threshold: 0.85,
996
+ silence_duration_ms: 1000,
997
+ prefix_padding_ms: 333,
998
+ },
999
+ audio: {
1000
+ input: { format: { type: 'audio/pcm', rate: 24000 } },
1001
+ output: { format: { type: 'audio/pcm', rate: 24000 } },
1002
+ },
1003
+ serverTools: [
1004
+ { type: 'web_search' },
1005
+ {
1006
+ type: 'mcp',
1007
+ server_url: 'https://mcp.example.com/mcp',
1008
+ server_label: 'business-tools',
1009
+ },
1010
+ ],
1011
+ })
1012
+ ```
1013
+
1014
+ Visit the [xAI Realtime Voice Reference](https://mastra.ai/reference/voice/xai-realtime) for more information on the xAI realtime voice provider.
1015
+
1016
+ **Google Gemini Live**:
1017
+
1018
+ ```typescript
1019
+ // Google Gemini Live Voice Configuration
1020
+ const voice = new GeminiLiveVoice({
1021
+ speechModel: {
1022
+ name: 'gemini-2.0-flash-exp', // Example model name
1023
+ apiKey: process.env.GOOGLE_API_KEY,
1024
+ },
1025
+ speaker: 'Puck', // Example speaker name
1026
+ // Google Gemini Live is a realtime bidirectional API without separate speech and listening models
1027
+ })
1028
+ ```
1029
+
1030
+ Visit the [Google Gemini Live Reference](https://mastra.ai/reference/voice/google-gemini-live) for more information on the Google Gemini Live voice provider.
1031
+
1032
+ **AWS Nova Sonic**:
1033
+
1034
+ ```typescript
1035
+ // AWS Nova Sonic Voice Configuration
1036
+ const voice = new NovaSonicVoice({
1037
+ region: 'us-east-1',
1038
+ speaker: 'matthew',
1039
+ sessionConfig: {
1040
+ inferenceConfiguration: {
1041
+ temperature: 0.7,
1042
+ maxTokens: 1024,
1043
+ },
1044
+ turnDetectionConfiguration: {
1045
+ endpointingSensitivity: 'MEDIUM',
1046
+ },
1047
+ },
1048
+ // AWS Nova Sonic is a realtime bidirectional API without separate speech and listening models
1049
+ })
1050
+ ```
1051
+
1052
+ Visit the [AWS Nova Sonic Reference](https://mastra.ai/reference/voice/aws-nova-sonic) for more information on the AWS Nova Sonic voice provider.
1053
+
1054
+ **AI SDK**:
1055
+
1056
+ ```typescript
1057
+ // AI SDK Voice Configuration
1058
+ import { Agent } from '@mastra/core/agent'
+ import { CompositeVoice } from '@mastra/core/voice'
1059
+ import { openai } from '@ai-sdk/openai'
1060
+ import { elevenlabs } from '@ai-sdk/elevenlabs'
1061
+
1062
+ // Use AI SDK models directly - no separate Mastra voice provider packages needed
1063
+ const voice = new CompositeVoice({
1064
+ input: openai.transcription('whisper-1'), // AI SDK transcription
1065
+ output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
1066
+ })
1067
+
1068
+ // Works seamlessly with your agent
1069
+ const voiceAgent = new Agent({
1070
+ id: 'aisdk-voice-agent',
1071
+ name: 'AI SDK Voice Agent',
1072
+ instructions: 'You are a helpful assistant with voice capabilities.',
1073
+ model: 'openai/gpt-5.4',
1074
+ voice,
1075
+ })
1076
+ ```
1077
+
1078
+ ### Using Multiple Voice Providers
1079
+
1080
+ This example demonstrates how to create and use two different voice providers in Mastra: OpenAI for speech-to-text (STT) and PlayAI for text-to-speech (TTS).
1081
+
1082
+ Start by creating instances of the voice providers with any necessary configuration.
1083
+
1084
+ ```typescript
1085
+ import { OpenAIVoice } from '@mastra/voice-openai'
1086
+ import { PlayAIVoice } from '@mastra/voice-playai'
1087
+ import { CompositeVoice } from '@mastra/core/voice'
1088
+ import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
1089
+
1090
+ // Initialize OpenAI voice for STT
1091
+ const input = new OpenAIVoice({
1092
+ listeningModel: {
1093
+ name: 'whisper-1',
1094
+ apiKey: process.env.OPENAI_API_KEY,
1095
+ },
1096
+ })
1097
+
1098
+ // Initialize PlayAI voice for TTS
1099
+ const output = new PlayAIVoice({
1100
+ speechModel: {
1101
+ name: 'playai-voice',
1102
+ apiKey: process.env.PLAYAI_API_KEY,
1103
+ },
1104
+ })
1105
+
1106
+ // Combine the providers using CompositeVoice
1107
+ const voice = new CompositeVoice({
1108
+ input,
1109
+ output,
1110
+ })
1111
+
1112
+ // Implement voice interactions using the combined voice provider
1113
+ const audioStream = getMicrophoneStream() // Capture audio input from the microphone
1114
+ const transcript = await voice.listen(audioStream)
1115
+
1116
+ // Log the transcribed text
1117
+ console.log('Transcribed text:', transcript)
1118
+
1119
+ // Convert text to speech
1120
+ const responseAudio = await voice.speak(`You said: ${transcript}`, {
1121
+ speaker: 'default', // Optional: specify a speaker
1122
+ responseFormat: 'wav', // Optional: specify a response format
1123
+ })
1124
+
1125
+ // Play the audio response
1126
+ playAudio(responseAudio)
1127
+ ```
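
Conceptually, a composite voice routes `listen()` to the input provider and `speak()` to the output provider. The sketch below illustrates that delegation pattern with hypothetical `Listener` and `Speaker` interfaces and stub providers. It is an illustration under those assumptions, not Mastra's actual `CompositeVoice` implementation.

```typescript
// Hypothetical minimal interfaces, for illustration only (not Mastra's types).
interface Listener {
  listen(audio: Uint8Array): Promise<string>
}

interface Speaker {
  speak(text: string): Promise<Uint8Array>
}

// Routes each capability to the provider responsible for it.
class CompositeSketch implements Listener, Speaker {
  private readonly input: Listener
  private readonly output: Speaker

  constructor(input: Listener, output: Speaker) {
    this.input = input
    this.output = output
  }

  // Speech-to-text is delegated to the input provider.
  listen(audio: Uint8Array): Promise<string> {
    return this.input.listen(audio)
  }

  // Text-to-speech is delegated to the output provider.
  speak(text: string): Promise<Uint8Array> {
    return this.output.speak(text)
  }
}

// Stub providers standing in for a real STT and a real TTS service.
const stubStt: Listener = {
  listen: async (audio: Uint8Array) => `heard ${audio.length} bytes`,
}

const stubTts: Speaker = {
  speak: async (text: string) => new TextEncoder().encode(text),
}

const sketch = new CompositeSketch(stubStt, stubTts)
```

Keeping the two roles behind separate interfaces is what lets either side be swapped independently, which is the property the example above relies on when pairing OpenAI with PlayAI.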
1128
+
1129
+ ### Using AI SDK Model Providers
1130
+
1131
+ You can also use AI SDK models directly with `CompositeVoice`:
1132
+
1133
+ ```typescript
1134
+ import { CompositeVoice } from '@mastra/core/voice'
1135
+ import { openai } from '@ai-sdk/openai'
1136
+ import { elevenlabs } from '@ai-sdk/elevenlabs'
1137
+ import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
1138
+
1139
+ // Use AI SDK models directly - no Mastra voice provider setup needed
1140
+ const voice = new CompositeVoice({
1141
+ input: openai.transcription('whisper-1'), // AI SDK transcription
1142
+ output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
1143
+ })
1144
+
1145
+ // Works the same way as Mastra providers
1146
+ const audioStream = getMicrophoneStream()
1147
+ const transcript = await voice.listen(audioStream)
1148
+
1149
+ console.log('Transcribed text:', transcript)
1150
+
1151
+ // Convert text to speech
1152
+ const responseAudio = await voice.speak(`You said: ${transcript}`, {
1153
+ speaker: 'Rachel', // ElevenLabs voice
1154
+ })
1155
+
1156
+ playAudio(responseAudio)
1157
+ ```
1158
+
1159
+ You can also mix AI SDK models with Mastra providers:
1160
+
1161
+ ```typescript
1162
+ import { CompositeVoice } from '@mastra/core/voice'
1163
+ import { PlayAIVoice } from '@mastra/voice-playai'
1164
+ import { groq } from '@ai-sdk/groq'
1165
+
1166
+ const voice = new CompositeVoice({
1167
+ input: groq.transcription('whisper-large-v3'), // AI SDK for STT
1168
+ output: new PlayAIVoice(), // Mastra provider for TTS
1169
+ })
1170
+ ```
1171
+
1172
+ For more information on `CompositeVoice`, refer to the [CompositeVoice Reference](https://mastra.ai/reference/voice/composite-voice).
1173
+
1174
+ ## More resources
1175
+
1176
+ - [CompositeVoice](https://mastra.ai/reference/voice/composite-voice)
1177
+ - [MastraVoice](https://mastra.ai/reference/voice/mastra-voice)
1178
+ - [OpenAI Voice](https://mastra.ai/reference/voice/openai)
1179
+ - [OpenAI Realtime Voice](https://mastra.ai/reference/voice/openai-realtime)
1180
+ - [xAI Realtime Voice](https://mastra.ai/reference/voice/xai-realtime)
1181
+ - [Azure Voice](https://mastra.ai/reference/voice/azure)
1182
+ - [Google Voice](https://mastra.ai/reference/voice/google)
1183
+ - [Google Gemini Live Voice](https://mastra.ai/reference/voice/google-gemini-live)
1184
+ - [AWS Nova Sonic Voice](https://mastra.ai/reference/voice/aws-nova-sonic)
1185
+ - [Deepgram Voice](https://mastra.ai/reference/voice/deepgram)
1186
+ - [Inworld Voice](https://mastra.ai/reference/voice/inworld)
1187
+ - [PlayAI Voice](https://mastra.ai/reference/voice/playai)
1188
+ - [Voice Examples](https://github.com/mastra-ai/voice-examples)