@mastra/voice-murf 0.0.0-mastra-2452-core-tracing-sentry-20250507205331 → 0.0.0-mastra-auto-detect-server-20260108233416
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +1755 -0
- package/LICENSE.md +11 -42
- package/dist/docs/README.md +32 -0
- package/dist/docs/SKILL.md +33 -0
- package/dist/docs/SOURCE_MAP.json +6 -0
- package/dist/docs/agents/01-adding-voice.md +352 -0
- package/dist/docs/voice/01-overview.md +1019 -0
- package/dist/docs/voice/02-reference.md +83 -0
- package/dist/index.cjs +44 -38
- package/dist/index.cjs.map +1 -0
- package/dist/index.d.ts +54 -3
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +44 -38
- package/dist/index.js.map +1 -0
- package/dist/voices.d.ts +6 -0
- package/dist/voices.d.ts.map +1 -0
- package/package.json +35 -16
- package/dist/_tsup-dts-rollup.d.cts +0 -57
- package/dist/_tsup-dts-rollup.d.ts +0 -57
- package/dist/index.d.cts +0 -3
package/LICENSE.md
CHANGED
@@ -1,46 +1,15 @@
- #
+ # Apache License 2.0

- Copyright (c) 2025
+ Copyright (c) 2025 Kepler Software, Inc.

+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at

- The licensor grants you a non-exclusive, royalty-free, worldwide, non-sublicensable, non-transferable license to use, copy, distribute, make available, and prepare derivative works of the software, in each case subject to the limitations and conditions below
+ http://www.apache.org/licenses/LICENSE-2.0

- You may not alter, remove, or obscure any licensing, copyright, or other notices of the licensor in the software. Any use of the licensor’s trademarks is subject to applicable law.
-
- **Patents**
- The licensor grants you a license, under any patent claims the licensor can license, or becomes able to license, to make, have made, use, sell, offer for sale, import and have imported the software, in each case subject to the limitations and conditions in this license. This license does not cover any patent claims that you cause to be infringed by modifications or additions to the software. If you or your company make any written claim that the software infringes or contributes to infringement of any patent, your patent license for the software granted under these terms ends immediately. If your company makes such a claim, your patent license ends immediately for work on behalf of your company.
-
- **Notices**
- You must ensure that anyone who gets a copy of any part of the software from you also gets a copy of these terms.
-
- If you modify the software, you must include in any modified copies of the software prominent notices stating that you have modified the software.
-
- **No Other Rights**
- These terms do not imply any licenses other than those expressly granted in these terms.
-
- **Termination**
- If you use the software in violation of these terms, such use is not licensed, and your licenses will automatically terminate. If the licensor provides you with a notice of your violation, and you cease all violation of this license no later than 30 days after you receive that notice, your licenses will be reinstated retroactively. However, if you violate these terms after such reinstatement, any additional violation of these terms will cause your licenses to terminate automatically and permanently.
-
- **No Liability**
- As far as the law allows, the software comes as is, without any warranty or condition, and the licensor will not be liable to you for any damages arising out of these terms or the use or nature of the software, under any kind of legal claim.
-
- **Definitions**
- The _licensor_ is the entity offering these terms, and the _software_ is the software the licensor makes available under these terms, including any portion of it.
-
- _you_ refers to the individual or entity agreeing to these terms.
-
- _your company_ is any legal entity, sole proprietorship, or other kind of organization that you work for, plus all organizations that have control over, are under the control of, or are under common control with that organization. _control_ means ownership of substantially all the assets of an entity, or the power to direct its management and policies by vote, contract, or otherwise. Control can be direct or indirect.
-
- _your licenses_ are all the licenses granted to you for the software under these terms.
-
- _use_ means anything you do with the software requiring one of your licenses.
-
- _trademark_ means trademarks, service marks, and similar rights.
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
package/dist/docs/README.md
ADDED

@@ -0,0 +1,32 @@

# @mastra/voice-murf Documentation

> Embedded documentation for coding agents

## Quick Start

```bash
# Read the skill overview
cat docs/SKILL.md

# Get the source map
cat docs/SOURCE_MAP.json

# Read topic documentation
cat docs/<topic>/01-overview.md
```

## Structure

```
docs/
├── SKILL.md          # Entry point
├── README.md         # This file
├── SOURCE_MAP.json   # Export index
├── agents/ (1 files)
├── voice/ (2 files)
```

## Version

Package: @mastra/voice-murf
Version: 0.12.0-beta.1
package/dist/docs/SKILL.md
ADDED

@@ -0,0 +1,33 @@

---
name: mastra-voice-murf-docs
description: Documentation for @mastra/voice-murf. Includes links to type definitions and readable implementation code in dist/.
---

# @mastra/voice-murf Documentation

> **Version**: 0.12.0-beta.1
> **Package**: @mastra/voice-murf

## Quick Navigation

Use SOURCE_MAP.json to find any export:

```bash
cat docs/SOURCE_MAP.json
```

Each export maps to:

- **types**: `.d.ts` file with JSDoc and API signatures
- **implementation**: `.js` chunk file with readable source
- **docs**: Conceptual documentation in `docs/`

## Top Exports

See SOURCE_MAP.json for the complete list.

## Available Topics

- [Agents](agents/) - 1 file(s)
- [Voice](voice/) - 2 file(s)
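The SOURCE_MAP.json schema itself is not included in this diff, so the following is only a rough sketch of how the lookup described above could work. The `types`, `implementation`, and `docs` fields mirror the bullet list in SKILL.md, but the record-per-export shape and the `"MurfVoice"` key are assumptions; check the actual SOURCE_MAP.json shipped in `dist/docs/` before relying on them.

```typescript
import { readFileSync } from "fs";
import path from "path";

// Assumed entry shape, based on the fields SKILL.md describes for each export.
interface SourceMapEntry {
  types?: string;          // path to the .d.ts file with JSDoc and API signatures
  implementation?: string; // path to the readable .js chunk
  docs?: string[];         // related conceptual docs under docs/
}

// Load the export index shipped with the package (path assumes a standard install).
const mapPath = path.join(
  "node_modules",
  "@mastra/voice-murf",
  "dist",
  "docs",
  "SOURCE_MAP.json",
);
const sourceMap: Record<string, SourceMapEntry> = JSON.parse(
  readFileSync(mapPath, "utf-8"),
);

// Look up a single export (the key name is hypothetical) and print where to read next.
const entry = sourceMap["MurfVoice"];
if (entry) {
  console.log("Types:", entry.types);
  console.log("Implementation:", entry.implementation);
  console.log("Docs:", entry.docs?.join(", "));
}
```
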
package/dist/docs/agents/01-adding-voice.md
ADDED

@@ -0,0 +1,352 @@

# Voice

Mastra agents can be enhanced with voice capabilities, allowing them to speak responses and listen to user input. You can configure an agent to use either a single voice provider or combine multiple providers for different operations.

## Basic usage

The simplest way to add voice to an agent is to use a single provider for both speaking and listening:

```typescript
import { playAudio } from "@mastra/node-audio";
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";

// Initialize the voice provider with default settings
const voice = new OpenAIVoice();

// Create an agent with voice capabilities
export const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: "openai/gpt-5.1",
  voice,
});

// The agent can now use voice for interaction
const audioStream = await agent.voice.speak("Hello, I'm your AI assistant!", {
  filetype: "m4a",
});

playAudio(audioStream!);

try {
  const transcription = await agent.voice.listen(audioStream);
  console.log(transcription);
} catch (error) {
  console.error("Error transcribing audio:", error);
}
```

## Working with Audio Streams

The `speak()` and `listen()` methods work with Node.js streams. Here's how to save and load audio files:

### Saving Speech Output

The `speak` method returns a stream that you can pipe to a file or speaker.

```typescript
import { createWriteStream } from "fs";
import path from "path";

// Generate speech and save to file
const audio = await agent.voice.speak("Hello, World!");
const filePath = path.join(process.cwd(), "agent.mp3");
const writer = createWriteStream(filePath);

audio.pipe(writer);

await new Promise<void>((resolve, reject) => {
  writer.on("finish", () => resolve());
  writer.on("error", reject);
});
```

### Transcribing Audio Input

The `listen` method expects a stream of audio data from a microphone or file.

```typescript
import { createReadStream } from "fs";
import path from "path";

// Read audio file and transcribe
const audioFilePath = path.join(process.cwd(), "/agent.m4a");
const audioStream = createReadStream(audioFilePath);

try {
  console.log("Transcribing audio file...");
  const transcription = await agent.voice.listen(audioStream, {
    filetype: "m4a",
  });
  console.log("Transcription:", transcription);
} catch (error) {
  console.error("Error transcribing audio:", error);
}
```

## Speech-to-Speech Voice Interactions

For more dynamic and interactive voice experiences, you can use real-time voice providers that support speech-to-speech capabilities:

```typescript
import { Agent } from "@mastra/core/agent";
import { getMicrophoneStream } from "@mastra/node-audio";
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { search, calculate } from "../tools";

// Initialize the realtime voice provider
const voice = new OpenAIRealtimeVoice({
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-5.1-realtime",
  speaker: "alloy",
});

// Create an agent with speech-to-speech voice capabilities
export const agent = new Agent({
  id: "speech-to-speech-agent",
  name: "Speech-to-Speech Agent",
  instructions: `You are a helpful assistant with speech-to-speech capabilities.`,
  model: "openai/gpt-5.1",
  tools: {
    // Tools configured on Agent are passed to voice provider
    search,
    calculate,
  },
  voice,
});

// Establish a WebSocket connection
await agent.voice.connect();

// Start a conversation
agent.voice.speak("Hello, I'm your AI assistant!");

// Stream audio from a microphone
const microphoneStream = getMicrophoneStream();
agent.voice.send(microphoneStream);

// When done with the conversation
agent.voice.close();
```

### Event System

The realtime voice provider emits several events you can listen for:

```typescript
// Listen for speech audio data sent from voice provider
agent.voice.on("speaking", ({ audio }) => {
  // audio contains ReadableStream or Int16Array audio data
});

// Listen for transcribed text sent from both voice provider and user
agent.voice.on("writing", ({ text, role }) => {
  console.log(`${role} said: ${text}`);
});

// Listen for errors
agent.voice.on("error", (error) => {
  console.error("Voice error:", error);
});
```

## Examples

### End-to-end voice interaction

This example demonstrates a voice interaction between two agents. The hybrid voice agent, which uses multiple providers, speaks a question, which is saved as an audio file. The unified voice agent listens to that file, processes the question, generates a response, and speaks it back. Both audio outputs are saved to the `audio` directory.

The following files are created:

- **hybrid-question.mp3** – Hybrid agent's spoken question.
- **unified-response.mp3** – Unified agent's spoken response.

```typescript title="src/test-voice-agents.ts"
import "dotenv/config";

import path from "path";
import fs, { createReadStream, createWriteStream } from "fs";
import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { Mastra } from "@mastra/core";

// Saves an audio stream to a file in the audio directory, creating the directory if it doesn't exist.
export const saveAudioToFile = async (
  audio: NodeJS.ReadableStream,
  filename: string,
): Promise<void> => {
  const audioDir = path.join(process.cwd(), "audio");
  const filePath = path.join(audioDir, filename);

  await fs.promises.mkdir(audioDir, { recursive: true });

  const writer = createWriteStream(filePath);
  audio.pipe(writer);
  return new Promise((resolve, reject) => {
    writer.on("finish", resolve);
    writer.on("error", reject);
  });
};

// Converts speech-to-text output to a plain string, reading the stream to the end if needed.
export const convertToText = async (
  input: string | NodeJS.ReadableStream,
): Promise<string> => {
  if (typeof input === "string") {
    return input;
  }

  const chunks: Buffer[] = [];
  return new Promise((resolve, reject) => {
    input.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
    input.on("error", reject);
    input.on("end", () => resolve(Buffer.concat(chunks).toString("utf-8")));
  });
};

export const hybridVoiceAgent = new Agent({
  id: "hybrid-voice-agent",
  name: "Hybrid Voice Agent",
  model: "openai/gpt-5.1",
  instructions: "You can speak and listen using different providers.",
  voice: new CompositeVoice({
    input: new OpenAIVoice(),
    output: new OpenAIVoice(),
  }),
});

export const unifiedVoiceAgent = new Agent({
  id: "unified-voice-agent",
  name: "Unified Voice Agent",
  instructions: "You are an agent with both STT and TTS capabilities.",
  model: "openai/gpt-5.1",
  voice: new OpenAIVoice(),
});

export const mastra = new Mastra({
  agents: { hybridVoiceAgent, unifiedVoiceAgent },
});

// Retrieve the registered agents from the Mastra instance
const hybridAgent = mastra.getAgent("hybridVoiceAgent");
const unifiedAgent = mastra.getAgent("unifiedVoiceAgent");

const question = "What is the meaning of life in one sentence?";

const hybridSpoken = await hybridAgent.voice.speak(question);

await saveAudioToFile(hybridSpoken!, "hybrid-question.mp3");

const audioStream = createReadStream(
  path.join(process.cwd(), "audio", "hybrid-question.mp3"),
);
const unifiedHeard = await unifiedAgent.voice.listen(audioStream);

const inputText = await convertToText(unifiedHeard!);

const unifiedResponse = await unifiedAgent.generate(inputText);
const unifiedSpoken = await unifiedAgent.voice.speak(unifiedResponse.text);

await saveAudioToFile(unifiedSpoken!, "unified-response.mp3");
```

### Using Multiple Providers

For more flexibility, you can use different providers for speaking and listening using the CompositeVoice class:

```typescript
import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";

export const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: "openai/gpt-5.1",

  // Create a composite voice using OpenAI for listening and PlayAI for speaking
  voice: new CompositeVoice({
    input: new OpenAIVoice(),
    output: new PlayAIVoice(),
  }),
});
```

### Using AI SDK

Mastra supports using AI SDK's transcription and speech models directly in `CompositeVoice`, giving you access to a wide range of providers through the AI SDK ecosystem:

```typescript
import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { elevenlabs } from "@ai-sdk/elevenlabs";
import { groq } from "@ai-sdk/groq";

export const agent = new Agent({
  id: "aisdk-voice-agent",
  name: "AI SDK Voice Agent",
  instructions: `You are a helpful assistant with voice capabilities.`,
  model: "openai/gpt-5.1",

  // Pass AI SDK models directly to CompositeVoice
  voice: new CompositeVoice({
    input: openai.transcription('whisper-1'), // AI SDK transcription model
    output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech model
  }),
});

// Use voice capabilities as usual
const audioStream = await agent.voice.speak("Hello!");
const transcribedText = await agent.voice.listen(audioStream);
```

#### Mix and Match Providers

You can mix AI SDK models with Mastra voice providers:

```typescript
import { CompositeVoice } from "@mastra/core/voice";
import { PlayAIVoice } from "@mastra/voice-playai";
import { openai } from "@ai-sdk/openai";

// Use AI SDK for transcription and Mastra provider for speech
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK
  output: new PlayAIVoice(), // Mastra provider
});
```

For the complete list of supported AI SDK providers and their capabilities:

* [Transcription](https://ai-sdk.dev/docs/providers/openai/transcription)
* [Speech](https://ai-sdk.dev/docs/providers/elevenlabs/speech)

## Supported Voice Providers

Mastra supports multiple voice providers for text-to-speech (TTS) and speech-to-text (STT) capabilities:

| Provider        | Package                         | Features                  | Reference                                                             |
| --------------- | ------------------------------- | ------------------------- | --------------------------------------------------------------------- |
| OpenAI          | `@mastra/voice-openai`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/v1/voice/openai)           |
| OpenAI Realtime | `@mastra/voice-openai-realtime` | Realtime speech-to-speech | [Documentation](https://mastra.ai/reference/v1/voice/openai-realtime)  |
| ElevenLabs      | `@mastra/voice-elevenlabs`      | High-quality TTS          | [Documentation](https://mastra.ai/reference/v1/voice/elevenlabs)       |
| PlayAI          | `@mastra/voice-playai`          | TTS                       | [Documentation](https://mastra.ai/reference/v1/voice/playai)           |
| Google          | `@mastra/voice-google`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/v1/voice/google)           |
| Deepgram        | `@mastra/voice-deepgram`        | STT                       | [Documentation](https://mastra.ai/reference/v1/voice/deepgram)         |
| Murf            | `@mastra/voice-murf`            | TTS                       | [Documentation](https://mastra.ai/reference/v1/voice/murf)             |
| Speechify       | `@mastra/voice-speechify`       | TTS                       | [Documentation](https://mastra.ai/reference/v1/voice/speechify)        |
| Sarvam          | `@mastra/voice-sarvam`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/v1/voice/sarvam)           |
| Azure           | `@mastra/voice-azure`           | TTS, STT                  | [Documentation](https://mastra.ai/reference/v1/voice/mastra-voice)     |
| Cloudflare      | `@mastra/voice-cloudflare`      | TTS                       | [Documentation](https://mastra.ai/reference/v1/voice/mastra-voice)     |
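
Since this package provides the Murf row in the table above, here is a minimal sketch of wiring it into an agent. It assumes the package exports a `MurfVoice` class whose constructor takes an optional configuration object (speech model name, API key, and default speaker); the `"GEN2"` model name, `MURF_API_KEY` environment variable, and `"en-US-natalie"` voice ID are illustrative assumptions, so check `dist/index.d.ts` and `dist/voices.d.ts` in this package for the exact options and available voices.

```typescript
import { createWriteStream } from "fs";
import { Agent } from "@mastra/core/agent";
import { MurfVoice } from "@mastra/voice-murf";

// Configuration shape is an assumption; see dist/index.d.ts for the actual options.
const voice = new MurfVoice({
  speechModel: {
    name: "GEN2",                     // assumed Murf model name
    apiKey: process.env.MURF_API_KEY, // assumed environment variable
  },
  speaker: "en-US-natalie",           // assumed voice ID from dist/voices.d.ts
});

export const murfAgent = new Agent({
  id: "murf-agent",
  name: "Murf Agent",
  instructions: "You are a helpful assistant that speaks with a Murf voice.",
  model: "openai/gpt-5.1",
  voice,
});

// Murf is TTS-only, so only speak() is expected to work here.
const audio = await murfAgent.voice.speak("Hello from Murf!");
if (audio) {
  audio.pipe(createWriteStream("murf-hello.mp3"));
}
```

Because the table lists Murf as TTS-only, pair it with an STT-capable provider via `CompositeVoice` if your agent also needs `listen()`.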

## Next Steps

- [Voice API Reference](https://mastra.ai/reference/v1/voice/mastra-voice) - Detailed API documentation for voice capabilities
- [Text to Speech Examples](https://github.com/mastra-ai/voice-examples/tree/main/text-to-speech) - Interactive story generator and other TTS implementations
- [Speech to Text Examples](https://github.com/mastra-ai/voice-examples/tree/main/speech-to-text) - Voice memo app and other STT implementations
- [Speech to Speech Examples](https://github.com/mastra-ai/voice-examples/tree/main/speech-to-speech) - Real-time voice conversation with call analysis