@mastra/voice-aws-nova-sonic 0.0.0-studio-cli-20260504022012

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,1028 @@
1
+ # Voice in Mastra
2
+
3
+ Mastra's Voice system provides a unified interface for voice interactions, enabling text-to-speech (TTS), speech-to-text (STT), and real-time speech-to-speech (STS) capabilities in your applications.
4
+
5
+ ## Adding voice to agents
6
+
7
+ To learn how to integrate voice capabilities into your agents, see the [Adding Voice to Agents](https://mastra.ai/docs/agents/adding-voice) documentation. It covers how to use single and multiple voice providers, as well as real-time interactions.
8
+
9
+ ```typescript
10
+ import { Agent } from '@mastra/core/agent'
11
+ import { OpenAIVoice } from '@mastra/voice-openai'
12
+
13
+ // Create an agent that uses the OpenAI voice provider
14
+
15
+ const voiceAgent = new Agent({
16
+ id: 'voice-agent',
17
+ name: 'Voice Agent',
18
+ instructions: 'You are a voice assistant that can help users with their tasks.',
19
+ model: 'openai/gpt-5.4',
20
+ voice: new OpenAIVoice(),
21
+ })
22
+ ```
23
+
24
+ You can then use the following voice capabilities:
25
+
26
+ ### Text to Speech (TTS)
27
+
28
+ Turn your agent's responses into natural-sounding speech using Mastra's TTS capabilities. Choose from multiple providers like OpenAI, ElevenLabs, and more.
29
+
30
+ For detailed configuration options and advanced features, check out our [Text-to-Speech guide](https://mastra.ai/docs/voice/text-to-speech).
31
+
32
+ **OpenAI**:
33
+
34
+ ```typescript
35
+ import { Agent } from '@mastra/core/agent'
36
+ import { OpenAIVoice } from '@mastra/voice-openai'
37
+ import { playAudio } from '@mastra/node-audio'
38
+
39
+ const voiceAgent = new Agent({
40
+ id: 'voice-agent',
41
+ name: 'Voice Agent',
42
+ instructions: 'You are a voice assistant that can help users with their tasks.',
43
+ model: 'openai/gpt-5.4',
44
+ voice: new OpenAIVoice(),
45
+ })
46
+
47
+ const { text } = await voiceAgent.generate('What color is the sky?')
48
+
49
+ // Convert the text to speech and get back an audio stream
50
+ const audioStream = await voiceAgent.voice.speak(text, {
51
+ speaker: 'default', // Optional: specify a speaker
52
+ responseFormat: 'wav', // Optional: specify a response format
53
+ })
54
+
55
+ playAudio(audioStream)
56
+ ```
57
+
58
+ Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.
59
+
60
+ **Azure**:
61
+
62
+ ```typescript
63
+ import { Agent } from '@mastra/core/agent'
64
+ import { AzureVoice } from '@mastra/voice-azure'
65
+ import { playAudio } from '@mastra/node-audio'
66
+
67
+ const voiceAgent = new Agent({
68
+ id: 'voice-agent',
69
+ name: 'Voice Agent',
70
+ instructions: 'You are a voice assistant that can help users with their tasks.',
71
+ model: 'openai/gpt-5.4',
72
+ voice: new AzureVoice(),
73
+ })
74
+
75
+ const { text } = await voiceAgent.generate('What color is the sky?')
76
+
77
+ // Convert the text to speech and get back an audio stream
78
+ const audioStream = await voiceAgent.voice.speak(text, {
79
+ speaker: 'en-US-JennyNeural', // Optional: specify a speaker
80
+ })
81
+
82
+ playAudio(audioStream)
83
+ ```
84
+
85
+ Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.
86
+
87
+ **ElevenLabs**:
88
+
89
+ ```typescript
90
+ import { Agent } from '@mastra/core/agent'
91
+ import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'
92
+ import { playAudio } from '@mastra/node-audio'
93
+
94
+ const voiceAgent = new Agent({
95
+ id: 'voice-agent',
96
+ name: 'Voice Agent',
97
+ instructions: 'You are a voice assistant that can help users with their tasks.',
98
+ model: 'openai/gpt-5.4',
99
+ voice: new ElevenLabsVoice(),
100
+ })
101
+
102
+ const { text } = await voiceAgent.generate('What color is the sky?')
103
+
104
+ // Convert the text to speech and get back an audio stream
105
+ const audioStream = await voiceAgent.voice.speak(text, {
106
+ speaker: 'default', // Optional: specify a speaker
107
+ })
108
+
109
+ playAudio(audioStream)
110
+ ```
111
+
112
+ Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.
113
+
114
+ **PlayAI**:
115
+
116
+ ```typescript
117
+ import { Agent } from '@mastra/core/agent'
118
+ import { PlayAIVoice } from '@mastra/voice-playai'
119
+ import { playAudio } from '@mastra/node-audio'
120
+
121
+ const voiceAgent = new Agent({
122
+ id: 'voice-agent',
123
+ name: 'Voice Agent',
124
+ instructions: 'You are a voice assistant that can help users with their tasks.',
125
+ model: 'openai/gpt-5.4',
126
+ voice: new PlayAIVoice(),
127
+ })
128
+
129
+ const { text } = await voiceAgent.generate('What color is the sky?')
130
+
131
+ // Convert the text to speech and get back an audio stream
132
+ const audioStream = await voiceAgent.voice.speak(text, {
133
+ speaker: 'default', // Optional: specify a speaker
134
+ })
135
+
136
+ playAudio(audioStream)
137
+ ```
138
+
139
+ Visit the [PlayAI Voice Reference](https://mastra.ai/reference/voice/playai) for more information on the PlayAI voice provider.
140
+
141
+ **Google**:
142
+
143
+ ```typescript
144
+ import { Agent } from '@mastra/core/agent'
145
+ import { GoogleVoice } from '@mastra/voice-google'
146
+ import { playAudio } from '@mastra/node-audio'
147
+
148
+ const voiceAgent = new Agent({
149
+ id: 'voice-agent',
150
+ name: 'Voice Agent',
151
+ instructions: 'You are a voice assistant that can help users with their tasks.',
152
+ model: 'openai/gpt-5.4',
153
+ voice: new GoogleVoice(),
154
+ })
155
+
156
+ const { text } = await voiceAgent.generate('What color is the sky?')
157
+
158
+ // Convert the text to speech and get back an audio stream
159
+ const audioStream = await voiceAgent.voice.speak(text, {
160
+ speaker: 'en-US-Studio-O', // Optional: specify a speaker
161
+ })
162
+
163
+ playAudio(audioStream)
164
+ ```
165
+
166
+ Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.
167
+
168
+ **Cloudflare**:
169
+
170
+ ```typescript
171
+ import { Agent } from '@mastra/core/agent'
172
+ import { CloudflareVoice } from '@mastra/voice-cloudflare'
173
+ import { playAudio } from '@mastra/node-audio'
174
+
175
+ const voiceAgent = new Agent({
176
+ id: 'voice-agent',
177
+ name: 'Voice Agent',
178
+ instructions: 'You are a voice assistant that can help users with their tasks.',
179
+ model: 'openai/gpt-5.4',
180
+ voice: new CloudflareVoice(),
181
+ })
182
+
183
+ const { text } = await voiceAgent.generate('What color is the sky?')
184
+
185
+ // Convert the text to speech and get back an audio stream
186
+ const audioStream = await voiceAgent.voice.speak(text, {
187
+ speaker: 'default', // Optional: specify a speaker
188
+ })
189
+
190
+ playAudio(audioStream)
191
+ ```
192
+
193
+ Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.
194
+
195
+ **Deepgram**:
196
+
197
+ ```typescript
198
+ import { Agent } from '@mastra/core/agent'
199
+ import { DeepgramVoice } from '@mastra/voice-deepgram'
200
+ import { playAudio } from '@mastra/node-audio'
201
+
202
+ const voiceAgent = new Agent({
203
+ id: 'voice-agent',
204
+ name: 'Voice Agent',
205
+ instructions: 'You are a voice assistant that can help users with their tasks.',
206
+ model: 'openai/gpt-5.4',
207
+ voice: new DeepgramVoice(),
208
+ })
209
+
210
+ const { text } = await voiceAgent.generate('What color is the sky?')
211
+
212
+ // Convert the text to speech and get back an audio stream
213
+ const audioStream = await voiceAgent.voice.speak(text, {
214
+ speaker: 'aura-english-us', // Optional: specify a speaker
215
+ })
216
+
217
+ playAudio(audioStream)
218
+ ```
219
+
220
+ Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.
221
+
222
+ **Speechify**:
223
+
224
+ ```typescript
225
+ import { Agent } from '@mastra/core/agent'
226
+ import { SpeechifyVoice } from '@mastra/voice-speechify'
227
+ import { playAudio } from '@mastra/node-audio'
228
+
229
+ const voiceAgent = new Agent({
230
+ id: 'voice-agent',
231
+ name: 'Voice Agent',
232
+ instructions: 'You are a voice assistant that can help users with their tasks.',
233
+ model: 'openai/gpt-5.4',
234
+ voice: new SpeechifyVoice(),
235
+ })
236
+
237
+ const { text } = await voiceAgent.generate('What color is the sky?')
238
+
239
+ // Convert the text to speech and get back an audio stream
240
+ const audioStream = await voiceAgent.voice.speak(text, {
241
+ speaker: 'matthew', // Optional: specify a speaker
242
+ })
243
+
244
+ playAudio(audioStream)
245
+ ```
246
+
247
+ Visit the [Speechify Voice Reference](https://mastra.ai/reference/voice/speechify) for more information on the Speechify voice provider.
248
+
249
+ **Sarvam**:
250
+
251
+ ```typescript
252
+ import { Agent } from '@mastra/core/agent'
253
+ import { SarvamVoice } from '@mastra/voice-sarvam'
254
+ import { playAudio } from '@mastra/node-audio'
255
+
256
+ const voiceAgent = new Agent({
257
+ id: 'voice-agent',
258
+ name: 'Voice Agent',
259
+ instructions: 'You are a voice assistant that can help users with their tasks.',
260
+ model: 'openai/gpt-5.4',
261
+ voice: new SarvamVoice(),
262
+ })
263
+
264
+ const { text } = await voiceAgent.generate('What color is the sky?')
265
+
266
+ // Convert the text to speech and get back an audio stream
267
+ const audioStream = await voiceAgent.voice.speak(text, {
268
+ speaker: 'shubh', // Optional: specify a bulbul:v3 speaker
269
+ })
270
+
271
+ playAudio(audioStream)
272
+ ```
273
+
274
+ Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.
275
+
276
+ **Murf**:
277
+
278
+ ```typescript
279
+ import { Agent } from '@mastra/core/agent'
280
+ import { MurfVoice } from '@mastra/voice-murf'
281
+ import { playAudio } from '@mastra/node-audio'
282
+
283
+ const voiceAgent = new Agent({
284
+ id: 'voice-agent',
285
+ name: 'Voice Agent',
286
+ instructions: 'You are a voice assistant that can help users with their tasks.',
287
+ model: 'openai/gpt-5.4',
288
+ voice: new MurfVoice(),
289
+ })
290
+
291
+ const { text } = await voiceAgent.generate('What color is the sky?')
292
+
293
+ // Convert the text to speech and get back an audio stream
294
+ const audioStream = await voiceAgent.voice.speak(text, {
295
+ speaker: 'default', // Optional: specify a speaker
296
+ })
297
+
298
+ playAudio(audioStream)
299
+ ```
300
+
301
+ Visit the [Murf Voice Reference](https://mastra.ai/reference/voice/murf) for more information on the Murf voice provider.
302
+
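The TTS examples above hand the audio stream straight to `playAudio`. When the audio should be kept instead, for example to cache a response, the stream can be written to disk. The following is a minimal sketch, assuming the value returned by `speak()` is a Node.js readable stream; `saveAudio` and `fakeAudio` are hypothetical names used only for illustration and are not part of Mastra.

```typescript
import { createWriteStream } from 'node:fs'
import { Readable } from 'node:stream'
import { pipeline } from 'node:stream/promises'

// Hypothetical helper: writes a readable audio stream to `filePath`.
// Resolves once every chunk is flushed; rejects if either stream errors.
async function saveAudio(audio: NodeJS.ReadableStream, filePath: string): Promise<void> {
  await pipeline(audio, createWriteStream(filePath))
}

// Stand-in for the stream that `voiceAgent.voice.speak(...)` is assumed to return.
function fakeAudio(): Readable {
  return Readable.from([Buffer.from('RIFF'), Buffer.from('data')])
}
```

In an agent, the call would look like `await saveAudio(audioStream, './reply.wav')`, where the file extension should match whatever format the provider was asked to produce.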
303
+ ### Speech to Text (STT)
304
+
305
+ Transcribe spoken content using providers such as OpenAI and ElevenLabs. For detailed configuration options, see [Speech to Text](https://mastra.ai/docs/voice/speech-to-text).
306
+
307
+ You can download a [sample audio file](https://github.com/mastra-ai/realtime-voice-demo/raw/refs/heads/main/how_can_i_help_you.mp3) to use with the examples below.
308
+
311
+ **OpenAI**:
312
+
313
+ ```typescript
314
+ import { Agent } from '@mastra/core/agent'
315
+ import { OpenAIVoice } from '@mastra/voice-openai'
316
+ import { createReadStream } from 'fs'
317
+
318
+ const voiceAgent = new Agent({
319
+ id: 'voice-agent',
320
+ name: 'Voice Agent',
321
+ instructions: 'You are a voice assistant that can help users with their tasks.',
322
+ model: 'openai/gpt-5.4',
323
+ voice: new OpenAIVoice(),
324
+ })
325
+
326
+ // Read the sample audio file from disk
327
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
328
+
329
+ // Convert audio to text
330
+ const transcript = await voiceAgent.voice.listen(audioStream)
331
+ console.log(`User said: ${transcript}`)
332
+
333
+ // Generate a response based on the transcript
334
+ const { text } = await voiceAgent.generate(transcript)
335
+ ```
336
+
337
+ Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.
338
+
339
+ **Azure**:
340
+
341
+ ```typescript
342
+ import { createReadStream } from 'fs'
343
+ import { Agent } from '@mastra/core/agent'
344
+ import { AzureVoice } from '@mastra/voice-azure'
346
+
347
+ const voiceAgent = new Agent({
348
+ id: 'voice-agent',
349
+ name: 'Voice Agent',
350
+ instructions: 'You are a voice assistant that can help users with their tasks.',
351
+ model: 'openai/gpt-5.4',
352
+ voice: new AzureVoice(),
353
+ })
354
+
355
+ // Read the sample audio file from disk
356
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
357
+
358
+ // Convert audio to text
359
+ const transcript = await voiceAgent.voice.listen(audioStream)
360
+ console.log(`User said: ${transcript}`)
361
+
362
+ // Generate a response based on the transcript
363
+ const { text } = await voiceAgent.generate(transcript)
364
+ ```
365
+
366
+ Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.
367
+
368
+ **ElevenLabs**:
369
+
370
+ ```typescript
371
+ import { Agent } from '@mastra/core/agent'
372
+ import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'
373
+ import { createReadStream } from 'fs'
374
+
375
+ const voiceAgent = new Agent({
376
+ id: 'voice-agent',
377
+ name: 'Voice Agent',
378
+ instructions: 'You are a voice assistant that can help users with their tasks.',
379
+ model: 'openai/gpt-5.4',
380
+ voice: new ElevenLabsVoice(),
381
+ })
382
+
383
+ // Read the sample audio file from disk
384
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
385
+
386
+ // Convert audio to text
387
+ const transcript = await voiceAgent.voice.listen(audioStream)
388
+ console.log(`User said: ${transcript}`)
389
+
390
+ // Generate a response based on the transcript
391
+ const { text } = await voiceAgent.generate(transcript)
392
+ ```
393
+
394
+ Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.
395
+
396
+ **Google**:
397
+
398
+ ```typescript
399
+ import { Agent } from '@mastra/core/agent'
400
+ import { GoogleVoice } from '@mastra/voice-google'
401
+ import { createReadStream } from 'fs'
402
+
403
+ const voiceAgent = new Agent({
404
+ id: 'voice-agent',
405
+ name: 'Voice Agent',
406
+ instructions: 'You are a voice assistant that can help users with their tasks.',
407
+ model: 'openai/gpt-5.4',
408
+ voice: new GoogleVoice(),
409
+ })
410
+
411
+ // Read the sample audio file from disk
412
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
413
+
414
+ // Convert audio to text
415
+ const transcript = await voiceAgent.voice.listen(audioStream)
416
+ console.log(`User said: ${transcript}`)
417
+
418
+ // Generate a response based on the transcript
419
+ const { text } = await voiceAgent.generate(transcript)
420
+ ```
421
+
422
+ Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.
423
+
424
+ **Cloudflare**:
425
+
426
+ ```typescript
427
+ import { Agent } from '@mastra/core/agent'
428
+ import { CloudflareVoice } from '@mastra/voice-cloudflare'
429
+ import { createReadStream } from 'fs'
430
+
431
+ const voiceAgent = new Agent({
432
+ id: 'voice-agent',
433
+ name: 'Voice Agent',
434
+ instructions: 'You are a voice assistant that can help users with their tasks.',
435
+ model: 'openai/gpt-5.4',
436
+ voice: new CloudflareVoice(),
437
+ })
438
+
439
+ // Read the sample audio file from disk
440
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
441
+
442
+ // Convert audio to text
443
+ const transcript = await voiceAgent.voice.listen(audioStream)
444
+ console.log(`User said: ${transcript}`)
445
+
446
+ // Generate a response based on the transcript
447
+ const { text } = await voiceAgent.generate(transcript)
448
+ ```
449
+
450
+ Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.
451
+
452
+ **Deepgram**:
453
+
454
+ ```typescript
455
+ import { Agent } from '@mastra/core/agent'
456
+ import { DeepgramVoice } from '@mastra/voice-deepgram'
457
+ import { createReadStream } from 'fs'
458
+
459
+ const voiceAgent = new Agent({
460
+ id: 'voice-agent',
461
+ name: 'Voice Agent',
462
+ instructions: 'You are a voice assistant that can help users with their tasks.',
463
+ model: 'openai/gpt-5.4',
464
+ voice: new DeepgramVoice(),
465
+ })
466
+
467
+ // Read the sample audio file from disk
468
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
469
+
470
+ // Convert audio to text
471
+ const transcript = await voiceAgent.voice.listen(audioStream)
472
+ console.log(`User said: ${transcript}`)
473
+
474
+ // Generate a response based on the transcript
475
+ const { text } = await voiceAgent.generate(transcript)
476
+ ```
477
+
478
+ Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.
479
+
480
+ **Sarvam**:
481
+
482
+ ```typescript
483
+ import { Agent } from '@mastra/core/agent'
484
+ import { SarvamVoice } from '@mastra/voice-sarvam'
485
+ import { createReadStream } from 'fs'
486
+
487
+ const voiceAgent = new Agent({
488
+ id: 'voice-agent',
489
+ name: 'Voice Agent',
490
+ instructions: 'You are a voice assistant that can help users with their tasks.',
491
+ model: 'openai/gpt-5.4',
492
+ voice: new SarvamVoice(),
493
+ })
494
+
495
+ // Read the sample audio file from disk
496
+ const audioStream = createReadStream('./how_can_i_help_you.mp3')
497
+
498
+ // Convert audio to text
499
+ const transcript = await voiceAgent.voice.listen(audioStream)
500
+ console.log(`User said: ${transcript}`)
501
+
502
+ // Generate a response based on the transcript
503
+ const { text } = await voiceAgent.generate(transcript)
504
+ ```
505
+
506
+ Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.
507
+
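The STT examples above interpolate the result of `listen()` directly into a string. Some providers may deliver the transcript as a stream of chunks rather than a plain string; that is an assumption here, so check the provider reference. Under that assumption, a small normalizing step could look like the sketch below. `transcriptToText` is a hypothetical helper, not a Mastra API.

```typescript
// Hypothetical helper: normalizes a transcription result to a plain string.
// Accepts either a string or an async iterable of text or byte chunks
// (a Node.js readable stream is one such async iterable).
async function transcriptToText(
  result: string | AsyncIterable<string | Uint8Array>,
): Promise<string> {
  if (typeof result === 'string') return result

  const decoder = new TextDecoder()
  let text = ''
  for await (const chunk of result) {
    // `stream: true` keeps multi-byte characters intact across chunk boundaries.
    text += typeof chunk === 'string' ? chunk : decoder.decode(chunk, { stream: true })
  }
  // Flush any bytes still buffered in the decoder.
  return text + decoder.decode()
}
```

The normalized string can then be passed to `voiceAgent.generate(...)` in the same way as in the examples above.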
508
+ ### Speech to Speech (STS)
509
+
510
+ Create conversational experiences with speech-to-speech capabilities. The unified API enables real-time voice interactions between users and AI agents. For detailed configuration options and advanced features, check out [Speech to Speech](https://mastra.ai/docs/voice/speech-to-speech).
511
+
512
+ **OpenAI**:
513
+
514
+ ```typescript
515
+ import { Agent } from '@mastra/core/agent'
516
+ import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
517
+ import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'
518
+
519
+ const voiceAgent = new Agent({
520
+ id: 'voice-agent',
521
+ name: 'Voice Agent',
522
+ instructions: 'You are a voice assistant that can help users with their tasks.',
523
+ model: 'openai/gpt-5.4',
524
+ voice: new OpenAIRealtimeVoice(),
525
+ })
526
+
527
+ // Listen for agent audio responses
528
+ voiceAgent.voice.on('speaker', ({ audio }) => {
529
+ playAudio(audio)
530
+ })
531
+
532
+ // Initiate the conversation
533
+ await voiceAgent.voice.speak('How can I help you today?')
534
+
535
+ // Send continuous audio from the microphone
536
+ const micStream = getMicrophoneStream()
537
+ await voiceAgent.voice.send(micStream)
538
+ ```
539
+
540
+ Visit the [OpenAI Realtime Voice Reference](https://mastra.ai/reference/voice/openai-realtime) for more information on the OpenAI Realtime voice provider.
541
+
542
+ **Google**:
543
+
544
+ ```typescript
545
+ import { Agent } from '@mastra/core/agent'
546
+ import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
547
+ import { GeminiLiveVoice } from '@mastra/voice-google-gemini-live'
548
+
549
+ const voiceAgent = new Agent({
550
+ id: 'voice-agent',
551
+ name: 'Voice Agent',
552
+ instructions: 'You are a voice assistant that can help users with their tasks.',
553
+ model: 'openai/gpt-5.4',
554
+ voice: new GeminiLiveVoice({
555
+ // Live API mode
556
+ apiKey: process.env.GOOGLE_API_KEY,
557
+ model: 'gemini-2.0-flash-exp',
558
+ speaker: 'Puck',
559
+ debug: true,
560
+ // Vertex AI alternative:
561
+ // vertexAI: true,
562
+ // project: 'your-gcp-project',
563
+ // location: 'us-central1',
564
+ // serviceAccountKeyFile: '/path/to/service-account.json',
565
+ }),
566
+ })
567
+
568
+ // Connect before using speak/send
569
+ await voiceAgent.voice.connect()
570
+
571
+ // Listen for agent audio responses
572
+ voiceAgent.voice.on('speaker', ({ audio }) => {
573
+ playAudio(audio)
574
+ })
575
+
576
+ // Listen for text responses and transcriptions
577
+ voiceAgent.voice.on('writing', ({ text, role }) => {
578
+ console.log(`${role}: ${text}`)
579
+ })
580
+
581
+ // Initiate the conversation
582
+ await voiceAgent.voice.speak('How can I help you today?')
583
+
584
+ // Send continuous audio from the microphone
585
+ const micStream = getMicrophoneStream()
586
+ await voiceAgent.voice.send(micStream)
587
+ ```
588
+
589
+ Visit the [Google Gemini Live Reference](https://mastra.ai/reference/voice/google-gemini-live) for more information on the Google Gemini Live voice provider.
590
+
591
+ **AWS Nova Sonic**:
592
+
593
+ ```typescript
594
+ import { Agent } from '@mastra/core/agent'
595
+ import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
596
+ import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic'
597
+
598
+ const voiceAgent = new Agent({
599
+ id: 'voice-agent',
600
+ name: 'Voice Agent',
601
+ instructions: 'You are a voice assistant that can help users with their tasks.',
602
+ model: 'openai/gpt-5.4',
603
+ voice: new NovaSonicVoice({
604
+ region: 'us-east-1',
605
+ speaker: 'matthew',
606
+ // Static credentials are optional. The default AWS credential
607
+ // provider chain is used when none are passed.
608
+ }),
609
+ })
610
+
611
+ // Connect before using speak/send
612
+ await voiceAgent.voice.connect()
613
+
614
+ // Listen for assistant audio (Int16Array PCM)
615
+ voiceAgent.voice.on('speaking', ({ audioData }) => {
616
+ if (audioData) playAudio(audioData)
617
+ })
618
+
619
+ // Listen for transcribed text
620
+ voiceAgent.voice.on('writing', ({ text, role }) => {
621
+ console.log(`${role}: ${text}`)
622
+ })
623
+
624
+ // Initiate the conversation
625
+ await voiceAgent.voice.speak('How can I help you today?')
626
+
627
+ // Send continuous audio from the microphone
628
+ const micStream = getMicrophoneStream()
629
+ await voiceAgent.voice.send(micStream)
630
+ ```
631
+
632
+ Visit the [AWS Nova Sonic Reference](https://mastra.ai/reference/voice/aws-nova-sonic) for more information on the AWS Nova Sonic voice provider.
633
+
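The AWS Nova Sonic example above receives assistant audio as `Int16Array` PCM samples. To store or inspect that audio outside of `playAudio`, the raw samples need a container. The sketch below wraps mono 16-bit PCM in a minimal WAV header. `pcm16ToWav` is a hypothetical helper, and the sample rate is deliberately a required argument because the correct value depends on the provider and is not stated here.

```typescript
// Hypothetical helper: wraps mono 16-bit PCM samples in a minimal 44-byte WAV header.
// `sampleRate` is an assumption the caller must supply from the provider's documentation.
function pcm16ToWav(samples: Int16Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2
  const dataSize = samples.length * bytesPerSample
  const buffer = new ArrayBuffer(44 + dataSize)
  const view = new DataView(buffer)
  const writeAscii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i))
  }

  writeAscii(0, 'RIFF')
  view.setUint32(4, 36 + dataSize, true) // file size minus the first 8 bytes
  writeAscii(8, 'WAVE')
  writeAscii(12, 'fmt ')
  view.setUint32(16, 16, true) // size of the fmt chunk for PCM
  view.setUint16(20, 1, true) // audio format: 1 = linear PCM
  view.setUint16(22, 1, true) // channels: mono
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, sampleRate * bytesPerSample, true) // byte rate
  view.setUint16(32, bytesPerSample, true) // block align
  view.setUint16(34, 16, true) // bits per sample
  writeAscii(36, 'data')
  view.setUint32(40, dataSize, true)

  // Samples are stored little-endian, directly after the header.
  samples.forEach((sample, i) => view.setInt16(44 + i * bytesPerSample, sample, true))
  return new Uint8Array(buffer)
}
```

Inside the `speaking` handler, chunks would typically be accumulated first and converted once the turn ends, since each call produces a complete file with its own header.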
634
+ ## Voice configuration
635
+
636
+ Each voice provider can be configured with different models and options. Below are the detailed configuration options for all supported providers:
637
+
638
+ **OpenAI**:
639
+
640
+ ```typescript
641
+ // OpenAI Voice Configuration
642
+ const voice = new OpenAIVoice({
643
+ speechModel: {
644
+ name: 'tts-1', // Example TTS model name
645
+ apiKey: process.env.OPENAI_API_KEY,
646
+ language: 'en-US', // Language code
647
+ voiceType: 'neural', // Type of voice model
648
+ },
649
+ listeningModel: {
650
+ name: 'whisper-1', // Example model name
651
+ apiKey: process.env.OPENAI_API_KEY,
652
+ language: 'en-US', // Language code
653
+ format: 'wav', // Audio format
654
+ },
655
+ speaker: 'alloy', // Example speaker name
656
+ })
657
+ ```
658
+
659
+ Visit the [OpenAI Voice Reference](https://mastra.ai/reference/voice/openai) for more information on the OpenAI voice provider.
660
+
661
+ **Azure**:
662
+
663
+ ```typescript
664
+ // Azure Voice Configuration
665
+ const voice = new AzureVoice({
666
+ speechModel: {
667
+ name: 'en-US-JennyNeural', // Example model name
668
+ apiKey: process.env.AZURE_SPEECH_KEY,
669
+ region: process.env.AZURE_SPEECH_REGION,
670
+ language: 'en-US', // Language code
671
+ style: 'cheerful', // Voice style
672
+ pitch: '+0Hz', // Pitch adjustment
673
+ rate: '1.0', // Speech rate
674
+ },
675
+ listeningModel: {
676
+ name: 'en-US', // Example model name
677
+ apiKey: process.env.AZURE_SPEECH_KEY,
678
+ region: process.env.AZURE_SPEECH_REGION,
679
+ format: 'simple', // Output format
680
+ },
681
+ })
682
+ ```
683
+
684
+ Visit the [Azure Voice Reference](https://mastra.ai/reference/voice/azure) for more information on the Azure voice provider.
685
+
686
+ **ElevenLabs**:
687
+
688
+ ```typescript
689
+ // ElevenLabs Voice Configuration
690
+ const voice = new ElevenLabsVoice({
691
+ speechModel: {
692
+ voiceId: 'your-voice-id', // Example voice ID
693
+ model: 'eleven_multilingual_v2', // Example model name
694
+ apiKey: process.env.ELEVENLABS_API_KEY,
695
+ language: 'en', // Language code
696
+ emotion: 'neutral', // Emotion setting
697
+ },
698
+ // ElevenLabs may not have a separate listening model
699
+ })
700
+ ```
701
+
702
+ Visit the [ElevenLabs Voice Reference](https://mastra.ai/reference/voice/elevenlabs) for more information on the ElevenLabs voice provider.
703
+
704
+ **PlayAI**:
705
+
706
+ ```typescript
707
+ // PlayAI Voice Configuration
708
+ const voice = new PlayAIVoice({
709
+ speechModel: {
710
+ name: 'playai-voice', // Example model name
711
+ speaker: 'emma', // Example speaker name
712
+ apiKey: process.env.PLAYAI_API_KEY,
713
+ language: 'en-US', // Language code
714
+ speed: 1.0, // Speech speed
715
+ },
716
+ // PlayAI may not have a separate listening model
717
+ })
718
+ ```
719
+
720
+ Visit the [PlayAI Voice Reference](https://mastra.ai/reference/voice/playai) for more information on the PlayAI voice provider.
721
+
722
+ **Google**:
723
+
724
+ ```typescript
725
+ // Google Voice Configuration
726
+ const voice = new GoogleVoice({
727
+ speechModel: {
728
+ name: 'en-US-Studio-O', // Example model name
729
+ apiKey: process.env.GOOGLE_API_KEY,
730
+ languageCode: 'en-US', // Language code
731
+ gender: 'FEMALE', // Voice gender
732
+ speakingRate: 1.0, // Speaking rate
733
+ },
734
+ listeningModel: {
735
+ name: 'en-US', // Example model name
736
+ sampleRateHertz: 16000, // Sample rate
737
+ },
738
+ })
739
+ ```
740
+
741
+ Visit the [Google Voice Reference](https://mastra.ai/reference/voice/google) for more information on the Google voice provider.
742
+
743
+ **Cloudflare**:
744
+
745
+ ```typescript
746
+ // Cloudflare Voice Configuration
747
+ const voice = new CloudflareVoice({
748
+ speechModel: {
749
+ name: 'cloudflare-voice', // Example model name
750
+ accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
751
+ apiToken: process.env.CLOUDFLARE_API_TOKEN,
752
+ language: 'en-US', // Language code
753
+ format: 'mp3', // Audio format
754
+ },
755
+ // Cloudflare may not have a separate listening model
756
+ })
757
+ ```
758
+
759
+ Visit the [Cloudflare Voice Reference](https://mastra.ai/reference/voice/cloudflare) for more information on the Cloudflare voice provider.
760
+
761
+ **Deepgram**:
762
+
763
+ ```typescript
764
+ // Deepgram Voice Configuration
765
+ const voice = new DeepgramVoice({
766
+ speechModel: {
767
+ name: 'nova-2', // Example model name
768
+ speaker: 'aura-english-us', // Example speaker name
769
+ apiKey: process.env.DEEPGRAM_API_KEY,
770
+ language: 'en-US', // Language code
771
+ tone: 'formal', // Tone setting
772
+ },
773
+ listeningModel: {
774
+ name: 'nova-2', // Example model name
775
+ format: 'flac', // Audio format
776
+ },
777
+ })
778
+ ```
779
+
780
+ Visit the [Deepgram Voice Reference](https://mastra.ai/reference/voice/deepgram) for more information on the Deepgram voice provider.
781
+
782
+ **Speechify**:
783
+
784
+ ```typescript
785
+ // Speechify Voice Configuration
786
+ const voice = new SpeechifyVoice({
787
+ speechModel: {
788
+ name: 'speechify-voice', // Example model name
789
+ speaker: 'matthew', // Example speaker name
790
+ apiKey: process.env.SPEECHIFY_API_KEY,
791
+ language: 'en-US', // Language code
792
+ speed: 1.0, // Speech speed
793
+ },
794
+ // Speechify may not have a separate listening model
795
+ })
796
+ ```
797
+
798
+ Visit the [Speechify Voice Reference](https://mastra.ai/reference/voice/speechify) for more information on the Speechify voice provider.
799
+
800
+ **Sarvam**:
801
+
802
+ ```typescript
803
+ // Sarvam Voice Configuration
804
+ const voice = new SarvamVoice({
805
+ speechModel: {
806
+ model: 'bulbul:v3', // TTS model (bulbul:v2 or bulbul:v3)
807
+ apiKey: process.env.SARVAM_API_KEY,
808
+ language: 'en-IN', // BCP-47 language code
809
+ },
810
+ listeningModel: {
811
+ model: 'saarika:v2.5', // STT model (saarika:v2.5 or saaras:v3)
812
+ apiKey: process.env.SARVAM_API_KEY,
813
+ },
814
+ speaker: 'shubh', // Default bulbul:v3 speaker
815
+ })
816
+ ```
817
+
818
+ Visit the [Sarvam Voice Reference](https://mastra.ai/reference/voice/sarvam) for more information on the Sarvam voice provider.
819
+
820
+ **Murf**:
821
+
822
+ ```typescript
823
+ // Murf Voice Configuration
824
+ const voice = new MurfVoice({
825
+ speechModel: {
826
+ name: 'murf-voice', // Example model name
827
+ apiKey: process.env.MURF_API_KEY,
828
+ language: 'en-US', // Language code
829
+ emotion: 'happy', // Emotion setting
830
+ },
831
+ // Murf may not have a separate listening model
832
+ })
833
+ ```
834
+
835
+ Visit the [Murf Voice Reference](https://mastra.ai/reference/voice/murf) for more information on the Murf voice provider.
836
+
837
+ **OpenAI Realtime**:
838
+
839
+ ```typescript
840
+ // OpenAI Realtime Voice Configuration
841
+ const voice = new OpenAIRealtimeVoice({
842
+ speechModel: {
843
+ name: 'gpt-4o-realtime-preview', // Example realtime model name
844
+ apiKey: process.env.OPENAI_API_KEY,
845
+ language: 'en-US', // Language code
846
+ },
847
+ listeningModel: {
848
+ name: 'whisper-1', // Example model name
849
+ apiKey: process.env.OPENAI_API_KEY,
850
+ format: 'ogg', // Audio format
851
+ },
852
+ speaker: 'alloy', // Example speaker name
853
+ })
854
+ ```
855
+
856
+ For more information on the OpenAI Realtime voice provider, refer to the [OpenAI Realtime Voice Reference](https://mastra.ai/reference/voice/openai-realtime).
857
+
858
+ **Google Gemini Live**:
859
+
860
+ ```typescript
861
+ // Google Gemini Live Voice Configuration
862
+ const voice = new GeminiLiveVoice({
863
+ speechModel: {
864
+ name: 'gemini-2.0-flash-exp', // Example model name
865
+ apiKey: process.env.GOOGLE_API_KEY,
866
+ },
867
+ speaker: 'Puck', // Example speaker name
868
+ // Google Gemini Live is a realtime bidirectional API without separate speech and listening models
869
+ })
870
+ ```
871
+
872
+ Visit the [Google Gemini Live Reference](https://mastra.ai/reference/voice/google-gemini-live) for more information on the Google Gemini Live voice provider.
873
+
874
+ **AWS Nova Sonic**:
875
+
876
+ ```typescript
877
+ // AWS Nova Sonic Voice Configuration
878
+ const voice = new NovaSonicVoice({
879
+ region: 'us-east-1',
880
+ speaker: 'matthew',
881
+ sessionConfig: {
882
+ inferenceConfiguration: {
883
+ temperature: 0.7,
884
+ maxTokens: 1024,
885
+ },
886
+ turnDetectionConfiguration: {
887
+ endpointingSensitivity: 'MEDIUM',
888
+ },
889
+ },
890
+ // AWS Nova Sonic is a realtime bidirectional API without separate speech and listening models
891
+ })
892
+ ```
893
+
894
+ Visit the [AWS Nova Sonic Reference](https://mastra.ai/reference/voice/aws-nova-sonic) for more information on the AWS Nova Sonic voice provider.
895
+
896
+ **AI SDK**:
897
+
898
+ ```typescript
+ // AI SDK Voice Configuration
+ import { Agent } from '@mastra/core/agent'
+ import { CompositeVoice } from '@mastra/core/voice'
+ import { openai } from '@ai-sdk/openai'
+ import { elevenlabs } from '@ai-sdk/elevenlabs'
+
+ // Use AI SDK models directly - no separate Mastra voice provider packages needed
+ const voice = new CompositeVoice({
+   input: openai.transcription('whisper-1'), // AI SDK transcription
+   output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
+ })
+
+ // Pass the composite voice to your agent
+ const voiceAgent = new Agent({
+   id: 'aisdk-voice-agent',
+   name: 'AI SDK Voice Agent',
+   instructions: 'You are a helpful assistant with voice capabilities.',
+   model: 'openai/gpt-5.4',
+   voice,
+ })
+ ```
+
+ ### Using Multiple Voice Providers
+
+ This example demonstrates how to create and use two different voice providers in Mastra: OpenAI for speech-to-text (STT) and PlayAI for text-to-speech (TTS).
+
+ Start by creating instances of the voice providers with any necessary configuration.
+
+ ```typescript
+ import { OpenAIVoice } from '@mastra/voice-openai'
+ import { PlayAIVoice } from '@mastra/voice-playai'
+ import { CompositeVoice } from '@mastra/core/voice'
+ import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
+
+ // Initialize OpenAI voice for STT
+ const input = new OpenAIVoice({
+   listeningModel: {
+     name: 'whisper-1',
+     apiKey: process.env.OPENAI_API_KEY,
+   },
+ })
+
+ // Initialize PlayAI voice for TTS
+ const output = new PlayAIVoice({
+   speechModel: {
+     name: 'playai-voice',
+     apiKey: process.env.PLAYAI_API_KEY,
+   },
+ })
+
+ // Combine the providers using CompositeVoice
+ const voice = new CompositeVoice({
+   input,
+   output,
+ })
+
+ // Implement voice interactions using the combined voice provider
+ const audioStream = getMicrophoneStream() // Capture audio input from the microphone
+ const transcript = await voice.listen(audioStream)
+
+ // Log the transcribed text
+ console.log('Transcribed text:', transcript)
+
+ // Convert text to speech
+ const responseAudio = await voice.speak(`You said: ${transcript}`, {
+   speaker: 'default', // Optional: specify a speaker
+   responseFormat: 'wav', // Optional: specify a response format
+ })
+
+ // Play the audio response
+ playAudio(responseAudio)
+ ```
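The providers above read their API keys from `process.env`, and an unset variable may only surface later as a failed request. A minimal sketch of a fail-fast check follows, assuming a hypothetical `requireEnv` helper (not part of Mastra) and an in-memory object standing in for `process.env`:

```typescript
// Hypothetical helper, not part of Mastra: throw early when a key is missing or blank.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name]
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

// Example values standing in for process.env (assumed for illustration only).
const exampleEnv = { OPENAI_API_KEY: 'sk-example', PLAYAI_API_KEY: '' }

// Succeeds because the key is present and non-empty.
const openaiKey = requireEnv(exampleEnv, 'OPENAI_API_KEY')
```

In an application, the same helper could be called with `process.env` before constructing each provider, so a missing key is reported by name at startup.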
970
+
971
+ ### Using AI SDK Model Providers
972
+
973
+ You can also use AI SDK models directly with `CompositeVoice`:
974
+
975
+ ```typescript
+ import { CompositeVoice } from '@mastra/core/voice'
+ import { openai } from '@ai-sdk/openai'
+ import { elevenlabs } from '@ai-sdk/elevenlabs'
+ import { playAudio, getMicrophoneStream } from '@mastra/node-audio'
+
+ // Use AI SDK models directly - no provider setup needed
+ const voice = new CompositeVoice({
+   input: openai.transcription('whisper-1'), // AI SDK transcription
+   output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
+ })
+
+ // Works the same way as Mastra providers
+ const audioStream = getMicrophoneStream()
+ const transcript = await voice.listen(audioStream)
+
+ console.log('Transcribed text:', transcript)
+
+ // Convert text to speech
+ const responseAudio = await voice.speak(`You said: ${transcript}`, {
+   speaker: 'Rachel', // ElevenLabs voice
+ })
+
+ playAudio(responseAudio)
+ ```
1000
+
1001
+ You can also mix AI SDK models with Mastra providers:
1002
+
1003
+ ```typescript
+ import { CompositeVoice } from '@mastra/core/voice'
+ import { PlayAIVoice } from '@mastra/voice-playai'
+ import { groq } from '@ai-sdk/groq'
+
+ const voice = new CompositeVoice({
+   input: groq.transcription('whisper-large-v3'), // AI SDK for STT
+   output: new PlayAIVoice(), // Mastra provider for TTS
+ })
+ ```
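One way to picture what a composite does is as plain delegation: `listen` calls go to the input provider and `speak` calls go to the output provider. The sketch below illustrates that idea with hypothetical interfaces (`Listener`, `Speaker`, `SketchCompositeVoice`) and stub providers. It is not Mastra's `CompositeVoice` implementation and does not use its types.

```typescript
// Hypothetical minimal interfaces for illustration; these are NOT the Mastra types.
interface Listener {
  listen(audio: Uint8Array): Promise<string>
}

interface Speaker {
  speak(text: string): Promise<Uint8Array>
}

// Delegates each call to the provider responsible for it.
class SketchCompositeVoice implements Listener, Speaker {
  private readonly input: Listener
  private readonly output: Speaker

  constructor(options: { input: Listener; output: Speaker }) {
    this.input = options.input
    this.output = options.output
  }

  listen(audio: Uint8Array): Promise<string> {
    return this.input.listen(audio)
  }

  speak(text: string): Promise<Uint8Array> {
    return this.output.speak(text)
  }
}

// Stub providers so the sketch runs without network access or API keys.
const stubInput: Listener = {
  listen: async audio => `heard ${audio.length} bytes`,
}

const stubOutput: Speaker = {
  // Returns one zeroed byte per character as placeholder "audio".
  speak: async text => new Uint8Array(text.length),
}

const sketchVoice = new SketchCompositeVoice({ input: stubInput, output: stubOutput })
```

Because the sketch depends only on the two method shapes, either side can be swapped independently, which mirrors how the examples above pair different STT and TTS sources.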
+
+ For more information on the CompositeVoice, refer to the [CompositeVoice Reference](https://mastra.ai/reference/voice/composite-voice).
+
+ ## More resources
+
+ - [CompositeVoice](https://mastra.ai/reference/voice/composite-voice)
+ - [MastraVoice](https://mastra.ai/reference/voice/mastra-voice)
+ - [OpenAI Voice](https://mastra.ai/reference/voice/openai)
+ - [OpenAI Realtime Voice](https://mastra.ai/reference/voice/openai-realtime)
+ - [Azure Voice](https://mastra.ai/reference/voice/azure)
+ - [Google Voice](https://mastra.ai/reference/voice/google)
+ - [Google Gemini Live Voice](https://mastra.ai/reference/voice/google-gemini-live)
+ - [AWS Nova Sonic Voice](https://mastra.ai/reference/voice/aws-nova-sonic)
+ - [Deepgram Voice](https://mastra.ai/reference/voice/deepgram)
+ - [PlayAI Voice](https://mastra.ai/reference/voice/playai)
+ - [Voice Examples](https://github.com/mastra-ai/voice-examples)