@mastra/voice-cloudflare 0.12.0 → 0.12.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +18 -0
- package/LICENSE.md +15 -0
- package/dist/docs/SKILL.md +15 -21
- package/dist/docs/{SOURCE_MAP.json → assets/SOURCE_MAP.json} +1 -1
- package/dist/docs/{agents/01-adding-voice.md → references/docs-agents-adding-voice.md} +155 -158
- package/dist/docs/references/docs-voice-overview.md +959 -0
- package/dist/docs/references/reference-voice-cloudflare.md +83 -0
- package/package.json +13 -14
- package/dist/docs/README.md +0 -32
- package/dist/docs/voice/01-overview.md +0 -1019
- package/dist/docs/voice/02-reference.md +0 -72
package/CHANGELOG.md
CHANGED
````diff
@@ -1,5 +1,23 @@
 # @mastra/voice-cloudflare
 
+## 0.12.1
+
+### Patch Changes
+
+- dependencies updates: ([#10188](https://github.com/mastra-ai/mastra/pull/10188))
+  - Updated dependency [`cloudflare@^5.2.0` ↗︎](https://www.npmjs.com/package/cloudflare/v/5.2.0) (from `^4.5.0`, in `dependencies`)
+- Updated dependencies [[`dc514a8`](https://github.com/mastra-ai/mastra/commit/dc514a83dba5f719172dddfd2c7b858e4943d067), [`e333b77`](https://github.com/mastra-ai/mastra/commit/e333b77e2d76ba57ccec1818e08cebc1993469ff), [`dc9fc19`](https://github.com/mastra-ai/mastra/commit/dc9fc19da4437f6b508cc355f346a8856746a76b), [`60a224d`](https://github.com/mastra-ai/mastra/commit/60a224dd497240e83698cfa5bfd02e3d1d854844), [`fbf22a7`](https://github.com/mastra-ai/mastra/commit/fbf22a7ad86bcb50dcf30459f0d075e51ddeb468), [`f16d92c`](https://github.com/mastra-ai/mastra/commit/f16d92c677a119a135cebcf7e2b9f51ada7a9df4), [`949b7bf`](https://github.com/mastra-ai/mastra/commit/949b7bfd4e40f2b2cba7fef5eb3f108a02cfe938), [`404fea1`](https://github.com/mastra-ai/mastra/commit/404fea13042181f0b0c73a101392ac87c79ceae2), [`ebf5047`](https://github.com/mastra-ai/mastra/commit/ebf5047e825c38a1a356f10b214c1d4260dfcd8d), [`12c647c`](https://github.com/mastra-ai/mastra/commit/12c647cf3a26826eb72d40b42e3c8356ceae16ed), [`d084b66`](https://github.com/mastra-ai/mastra/commit/d084b6692396057e83c086b954c1857d20b58a14), [`79c699a`](https://github.com/mastra-ai/mastra/commit/79c699acf3cd8a77e11c55530431f48eb48456e9), [`62757b6`](https://github.com/mastra-ai/mastra/commit/62757b6db6e8bb86569d23ad0b514178f57053f8), [`675f15b`](https://github.com/mastra-ai/mastra/commit/675f15b7eaeea649158d228ea635be40480c584d), [`b174c63`](https://github.com/mastra-ai/mastra/commit/b174c63a093108d4e53b9bc89a078d9f66202b3f), [`819f03c`](https://github.com/mastra-ai/mastra/commit/819f03c25823373b32476413bd76be28a5d8705a), [`04160ee`](https://github.com/mastra-ai/mastra/commit/04160eedf3130003cf842ad08428c8ff69af4cc1), [`2c27503`](https://github.com/mastra-ai/mastra/commit/2c275032510d131d2cde47f99953abf0fe02c081), [`424a1df`](https://github.com/mastra-ai/mastra/commit/424a1df7bee59abb5c83717a54807fdd674a6224), [`3d70b0b`](https://github.com/mastra-ai/mastra/commit/3d70b0b3524d817173ad870768f259c06d61bd23), [`eef7cb2`](https://github.com/mastra-ai/mastra/commit/eef7cb2abe7ef15951e2fdf792a5095c6c643333), [`260fe12`](https://github.com/mastra-ai/mastra/commit/260fe1295fe7354e39d6def2775e0797a7a277f0), [`12c88a6`](https://github.com/mastra-ai/mastra/commit/12c88a6e32bf982c2fe0c6af62e65a3414519a75), [`43595bf`](https://github.com/mastra-ai/mastra/commit/43595bf7b8df1a6edce7a23b445b5124d2a0b473), [`78670e9`](https://github.com/mastra-ai/mastra/commit/78670e97e76d7422cf7025faf371b2aeafed860d), [`e8a5b0b`](https://github.com/mastra-ai/mastra/commit/e8a5b0b9bc94d12dee4150095512ca27a288d778), [`3b45a13`](https://github.com/mastra-ai/mastra/commit/3b45a138d09d040779c0aba1edbbfc1b57442d23), [`d400e7c`](https://github.com/mastra-ai/mastra/commit/d400e7c8b8d7afa6ba2c71769eace4048e3cef8e), [`f58d1a7`](https://github.com/mastra-ai/mastra/commit/f58d1a7a457588a996c3ecb53201a68f3d28c432), [`a49a929`](https://github.com/mastra-ai/mastra/commit/a49a92904968b4fc67e01effee8c7c8d0464ba85), [`8127d96`](https://github.com/mastra-ai/mastra/commit/8127d96280492e335d49b244501088dfdd59a8f1)]:
+  - @mastra/core@1.18.0
+
+## 0.12.1-alpha.0
+
+### Patch Changes
+
+- dependencies updates: ([#10188](https://github.com/mastra-ai/mastra/pull/10188))
+  - Updated dependency [`cloudflare@^5.2.0` ↗︎](https://www.npmjs.com/package/cloudflare/v/5.2.0) (from `^4.5.0`, in `dependencies`)
+- Updated dependencies [[`e333b77`](https://github.com/mastra-ai/mastra/commit/e333b77e2d76ba57ccec1818e08cebc1993469ff), [`60a224d`](https://github.com/mastra-ai/mastra/commit/60a224dd497240e83698cfa5bfd02e3d1d854844), [`949b7bf`](https://github.com/mastra-ai/mastra/commit/949b7bfd4e40f2b2cba7fef5eb3f108a02cfe938), [`d084b66`](https://github.com/mastra-ai/mastra/commit/d084b6692396057e83c086b954c1857d20b58a14), [`79c699a`](https://github.com/mastra-ai/mastra/commit/79c699acf3cd8a77e11c55530431f48eb48456e9), [`62757b6`](https://github.com/mastra-ai/mastra/commit/62757b6db6e8bb86569d23ad0b514178f57053f8), [`3d70b0b`](https://github.com/mastra-ai/mastra/commit/3d70b0b3524d817173ad870768f259c06d61bd23), [`3b45a13`](https://github.com/mastra-ai/mastra/commit/3b45a138d09d040779c0aba1edbbfc1b57442d23), [`8127d96`](https://github.com/mastra-ai/mastra/commit/8127d96280492e335d49b244501088dfdd59a8f1)]:
+  - @mastra/core@1.18.0-alpha.3
+
 ## 0.12.0
 
 ### Minor Changes
````
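The only change that touches this package directly is the `cloudflare` SDK bump from `^4.5.0` to `^5.2.0`; everything else is a lockstep update of `@mastra/core`. A minimal post-upgrade smoke test might look like the sketch below. It assumes the `CloudflareVoice` class described in `references/reference-voice-cloudflare.md` and the `speak()` provider interface shown in the bundled docs; the zero-argument constructor is an assumption, with real configuration (Cloudflare Workers AI credentials, model, speaker) covered in that reference.

```typescript
import { createWriteStream } from 'fs'
import { CloudflareVoice } from '@mastra/voice-cloudflare'

// Zero-config construction is an assumption; see
// references/reference-voice-cloudflare.md for the actual options.
const voice = new CloudflareVoice()

// speak() follows the provider pattern used throughout the bundled docs:
// it resolves to a Node.js readable stream of audio data.
const audio = await voice.speak('Testing @mastra/voice-cloudflare 0.12.1')
audio!.pipe(createWriteStream('smoke-test.mp3'))
```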
package/LICENSE.md
CHANGED
````diff
@@ -1,3 +1,18 @@
+Portions of this software are licensed as follows:
+
+- All content that resides under any directory named "ee/" within this
+  repository, including but not limited to:
+  - `packages/core/src/auth/ee/`
+  - `packages/server/src/server/auth/ee/`
+  is licensed under the license defined in `ee/LICENSE`.
+
+- All third-party components incorporated into the Mastra Software are
+  licensed under the original license provided by the owner of the
+  applicable component.
+
+- Content outside of the above-mentioned directories or restrictions is
+  available under the "Apache License 2.0" as defined below.
+
 # Apache License 2.0
 
 Copyright (c) 2025 Kepler Software, Inc.
````
package/dist/docs/SKILL.md
CHANGED
````diff
@@ -1,33 +1,27 @@
 ---
-name: mastra-voice-cloudflare
-description: Documentation for @mastra/voice-cloudflare.
+name: mastra-voice-cloudflare
+description: Documentation for @mastra/voice-cloudflare. Use when working with @mastra/voice-cloudflare APIs, configuration, or implementation.
+metadata:
+  package: "@mastra/voice-cloudflare"
+  version: "0.12.1"
 ---
 
-
+## When to use
 
-
-> **Package**: @mastra/voice-cloudflare
+Use this skill whenever you are working with @mastra/voice-cloudflare to obtain the domain-specific knowledge.
 
-##
+## How to use
 
-
+Read the individual reference documents for detailed explanations and code examples.
 
-
-cat docs/SOURCE_MAP.json
-```
+### Docs
 
-
--
-- **implementation**: `.js` chunk file with readable source
-- **docs**: Conceptual documentation in `docs/`
+- [Voice](references/docs-agents-adding-voice.md) - Learn how to add voice capabilities to your Mastra agents for text-to-speech and speech-to-text interactions.
+- [Voice in Mastra](references/docs-voice-overview.md) - Overview of voice capabilities in Mastra, including text-to-speech, speech-to-text, and real-time speech-to-speech interactions.
 
-
+### Reference
 
+- [Reference: Cloudflare](references/reference-voice-cloudflare.md) - Documentation for the CloudflareVoice class, providing text-to-speech capabilities using Cloudflare Workers AI.
 
-
-
-## Available Topics
-
-- [Agents](agents/) - 1 file(s)
-- [Voice](voice/) - 2 file(s)
+Read [assets/SOURCE_MAP.json](assets/SOURCE_MAP.json) for source code references.
````
package/dist/docs/{agents/01-adding-voice.md → references/docs-agents-adding-voice.md}
RENAMED
````diff
@@ -7,39 +7,39 @@ Mastra agents can be enhanced with voice capabilities, allowing them to speak re
 The simplest way to add voice to an agent is to use a single provider for both speaking and listening:
 
 ```typescript
-import { createReadStream } from "fs";
-import path from "path";
-import { Agent } from "@mastra/core/agent";
-import { OpenAIVoice } from "@mastra/voice-openai";
+import { createReadStream } from 'fs'
+import path from 'path'
+import { Agent } from '@mastra/core/agent'
+import { OpenAIVoice } from '@mastra/voice-openai'
 
 // Initialize the voice provider with default settings
-const voice = new OpenAIVoice();
+const voice = new OpenAIVoice()
 
 // Create an agent with voice capabilities
 export const agent = new Agent({
-  id: "voice-agent",
-  name: "Voice Agent",
+  id: 'voice-agent',
+  name: 'Voice Agent',
   instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
-  model: "openai/gpt-5.4",
+  model: 'openai/gpt-5.4',
   voice,
-});
+})
 
 // The agent can now use voice for interaction
 const audioStream = await agent.voice.speak("Hello, I'm your AI assistant!", {
-  filetype: "m4a",
-});
+  filetype: 'm4a',
+})
 
-playAudio(audioStream!);
+playAudio(audioStream!)
 
 try {
-  const transcription = await agent.voice.listen(audioStream);
-  console.log(transcription);
+  const transcription = await agent.voice.listen(audioStream)
+  console.log(transcription)
 } catch (error) {
-  console.error("Error transcribing audio:", error);
+  console.error('Error transcribing audio:', error)
 }
 ```
 
-## Working with Audio Streams
+## Working with audio streams
 
 The `speak()` and `listen()` methods work with Node.js streams. Here's how to save and load audio files:
 
@@ -48,20 +48,20 @@ The `speak()` and `listen()` methods work with Node.js streams. Here's how to sa
 The `speak` method returns a stream that you can pipe to a file or speaker.
 
 ```typescript
-import { createWriteStream } from "fs";
-import path from "path";
+import { createWriteStream } from 'fs'
+import path from 'path'
 
 // Generate speech and save to file
-const audio = await agent.voice.speak("Hello, World!");
-const filePath = path.join(process.cwd(), "agent.mp3");
-const writer = createWriteStream(filePath);
+const audio = await agent.voice.speak('Hello, World!')
+const filePath = path.join(process.cwd(), 'agent.mp3')
+const writer = createWriteStream(filePath)
 
-audio.pipe(writer);
+audio.pipe(writer)
 
 await new Promise<void>((resolve, reject) => {
-  writer.on("finish", () => resolve());
-  writer.on("error", reject);
-});
+  writer.on('finish', () => resolve())
+  writer.on('error', reject)
+})
 ```
 
 ### Transcribing Audio Input
@@ -69,67 +69,67 @@ await new Promise<void>((resolve, reject) => {
 The `listen` method expects a stream of audio data from a microphone or file.
 
 ```typescript
-import { createReadStream } from "fs";
-import path from "path";
+import { createReadStream } from 'fs'
+import path from 'path'
 
 // Read audio file and transcribe
-const audioFilePath = path.join(process.cwd(), "/agent.m4a");
-const audioStream = createReadStream(audioFilePath);
+const audioFilePath = path.join(process.cwd(), '/agent.m4a')
+const audioStream = createReadStream(audioFilePath)
 
 try {
-  console.log("Transcribing audio file...");
+  console.log('Transcribing audio file...')
   const transcription = await agent.voice.listen(audioStream, {
-    filetype: "m4a",
-  });
-  console.log("Transcription:", transcription);
+    filetype: 'm4a',
+  })
+  console.log('Transcription:', transcription)
 } catch (error) {
-  console.error("Error transcribing audio:", error);
+  console.error('Error transcribing audio:', error)
 }
 ```
 
-## Speech-to-Speech Voice Interactions
+## Speech-to-speech voice interactions
 
 For more dynamic and interactive voice experiences, you can use real-time voice providers that support speech-to-speech capabilities:
 
 ```typescript
-import { Agent } from "@mastra/core/agent";
-import { getMicrophoneStream } from "@mastra/node-audio";
-import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
-import { search, calculate } from "../tools";
+import { Agent } from '@mastra/core/agent'
+import { getMicrophoneStream } from '@mastra/node-audio'
+import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'
+import { search, calculate } from '../tools'
 
 // Initialize the realtime voice provider
 const voice = new OpenAIRealtimeVoice({
   apiKey: process.env.OPENAI_API_KEY,
-  model: "gpt-5.1-realtime",
-  speaker: "alloy",
-});
+  model: 'gpt-5.1-realtime',
+  speaker: 'alloy',
+})
 
 // Create an agent with speech-to-speech voice capabilities
 export const agent = new Agent({
-  id: "speech-to-speech-agent",
-  name: "Speech-to-Speech Agent",
+  id: 'speech-to-speech-agent',
+  name: 'Speech-to-Speech Agent',
   instructions: `You are a helpful assistant with speech-to-speech capabilities.`,
-  model: "openai/gpt-5.4",
+  model: 'openai/gpt-5.4',
   tools: {
     // Tools configured on Agent are passed to voice provider
     search,
     calculate,
   },
   voice,
-});
+})
 
 // Establish a WebSocket connection
-await agent.voice.connect();
+await agent.voice.connect()
 
 // Start a conversation
-agent.voice.speak("Hello, I'm your AI assistant!");
+agent.voice.speak("Hello, I'm your AI assistant!")
 
 // Stream audio from a microphone
-const microphoneStream = getMicrophoneStream();
-agent.voice.send(microphoneStream);
+const microphoneStream = getMicrophoneStream()
+agent.voice.send(microphoneStream)
 
 // When done with the conversation
-agent.voice.close();
+agent.voice.close()
 ```
 
 ### Event System
@@ -138,19 +138,19 @@ The realtime voice provider emits several events you can listen for:
 
 ```typescript
 // Listen for speech audio data sent from voice provider
-agent.voice.on("speaking", ({ audio }) => {
+agent.voice.on('speaking', ({ audio }) => {
   // audio contains ReadableStream or Int16Array audio data
-});
+})
 
 // Listen for transcribed text sent from both voice provider and user
-agent.voice.on("writing", ({ text, role }) => {
-  console.log(`${role} said: ${text}`);
-});
+agent.voice.on('writing', ({ text, role }) => {
+  console.log(`${role} said: ${text}`)
+})
 
 // Listen for errors
-agent.voice.on("error", (error) => {
-  console.error("Voice error:", error);
-});
+agent.voice.on('error', error => {
+  console.error('Voice error:', error)
+})
 ```
 
 ## Examples
@@ -164,93 +164,89 @@ The following files are created:
 - **hybrid-question.mp3** – Hybrid agent's spoken question.
 - **unified-response.mp3** – Unified agent's spoken response.
 
-```typescript
-import "dotenv/config";
+```typescript
+import 'dotenv/config'
 
-import path from "path";
-import { createReadStream } from "fs";
-import { Agent } from "@mastra/core/agent";
-import { CompositeVoice } from "@mastra/core/voice";
-import { OpenAIVoice } from "@mastra/voice-openai";
-import { Mastra } from "@mastra/core";
+import path from 'path'
+import { createReadStream } from 'fs'
+import { Agent } from '@mastra/core/agent'
+import { CompositeVoice } from '@mastra/core/voice'
+import { OpenAIVoice } from '@mastra/voice-openai'
+import { Mastra } from '@mastra/core'
 
 // Saves an audio stream to a file in the audio directory, creating the directory if it doesn't exist.
 export const saveAudioToFile = async (
   audio: NodeJS.ReadableStream,
   filename: string,
 ): Promise<void> => {
-  const audioDir = path.join(process.cwd(), "audio");
-  const filePath = path.join(audioDir, filename);
+  const audioDir = path.join(process.cwd(), 'audio')
+  const filePath = path.join(audioDir, filename)
 
-  await fs.promises.mkdir(audioDir, { recursive: true });
+  await fs.promises.mkdir(audioDir, { recursive: true })
 
-  const writer = createWriteStream(filePath);
-  audio.pipe(writer);
+  const writer = createWriteStream(filePath)
+  audio.pipe(writer)
   return new Promise((resolve, reject) => {
-    writer.on("finish", resolve);
-    writer.on("error", reject);
-  });
-};
+    writer.on('finish', resolve)
+    writer.on('error', reject)
+  })
+}
 
 // Saves an audio stream to a file in the audio directory, creating the directory if it doesn't exist.
-export const convertToText = async (
-  input: string | NodeJS.ReadableStream,
-): Promise<string> => {
-  if (typeof input === "string") {
-    return input;
+export const convertToText = async (input: string | NodeJS.ReadableStream): Promise<string> => {
+  if (typeof input === 'string') {
+    return input
   }
 
-  const chunks: Buffer[] = [];
+  const chunks: Buffer[] = []
   return new Promise((resolve, reject) => {
-    inputData.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
-    inputData.on("error", reject);
-    inputData.on("end", () => resolve(Buffer.concat(chunks).toString("utf-8")));
-  });
-};
+    inputData.on('data', chunk => chunks.push(Buffer.from(chunk)))
+    inputData.on('error', reject)
+    inputData.on('end', () => resolve(Buffer.concat(chunks).toString('utf-8')))
+  })
+}
 
 export const hybridVoiceAgent = new Agent({
-  id: "hybrid-voice-agent",
-  name: "Hybrid Voice Agent",
-  model: "openai/gpt-5.4",
-  instructions: "You can speak and listen using different providers.",
+  id: 'hybrid-voice-agent',
+  name: 'Hybrid Voice Agent',
+  model: 'openai/gpt-5.4',
+  instructions: 'You can speak and listen using different providers.',
   voice: new CompositeVoice({
     input: new OpenAIVoice(),
     output: new OpenAIVoice(),
   }),
-});
+})
 
 export const unifiedVoiceAgent = new Agent({
-  id: "unified-voice-agent",
-  name: "Unified Voice Agent",
-  instructions: "You are an agent with both STT and TTS capabilities.",
-  model: "openai/gpt-5.4",
+  id: 'unified-voice-agent',
+  name: 'Unified Voice Agent',
+  instructions: 'You are an agent with both STT and TTS capabilities.',
+  model: 'openai/gpt-5.4',
   voice: new OpenAIVoice(),
-});
+})
 
 export const mastra = new Mastra({
   agents: { hybridVoiceAgent, unifiedVoiceAgent },
-});
+})
 
-const hybridVoiceAgent = mastra.getAgent("hybridVoiceAgent");
-const unifiedVoiceAgent = mastra.getAgent("unifiedVoiceAgent");
+const hybridVoiceAgent = mastra.getAgent('hybridVoiceAgent')
+const unifiedVoiceAgent = mastra.getAgent('unifiedVoiceAgent')
 
-const question = "What is the meaning of life in one sentence?";
+const question = 'What is the meaning of life in one sentence?'
 
-const hybridSpoken = await hybridVoiceAgent.voice.speak(question);
+const hybridSpoken = await hybridVoiceAgent.voice.speak(question)
 
-await saveAudioToFile(hybridSpoken!, "hybrid-question.mp3");
+await saveAudioToFile(hybridSpoken!, 'hybrid-question.mp3')
 
-const audioStream = createReadStream(
-  path.join(process.cwd(), "audio", "hybrid-question.mp3"),
-);
-const unifiedHeard = await unifiedVoiceAgent.voice.listen(audioStream);
+const audioStream = createReadStream(path.join(process.cwd(), 'audio', 'hybrid-question.mp3'))
+const unifiedHeard = await unifiedVoiceAgent.voice.listen(audioStream)
 
-const inputText = await convertToText(unifiedHeard!);
+const inputText = await convertToText(unifiedHeard!)
 
-const unifiedResponse = await unifiedVoiceAgent.generate(inputText);
-const unifiedSpoken = await unifiedVoiceAgent.voice.speak(unifiedResponse.text);
+const unifiedResponse = await unifiedVoiceAgent.generate(inputText)
+const unifiedSpoken = await unifiedVoiceAgent.voice.speak(unifiedResponse.text)
 
-await saveAudioToFile(unifiedSpoken!, "unified-response.mp3");
+await saveAudioToFile(unifiedSpoken!, 'unified-response.mp3')
 ```
 
 ### Using Multiple Providers
@@ -258,23 +254,23 @@ await saveAudioToFile(unifiedSpoken!, "unified-response.mp3");
 For more flexibility, you can use different providers for speaking and listening using the CompositeVoice class:
 
 ```typescript
-import { Agent } from "@mastra/core/agent";
-import { CompositeVoice } from "@mastra/core/voice";
-import { OpenAIVoice } from "@mastra/voice-openai";
-import { PlayAIVoice } from "@mastra/voice-playai";
+import { Agent } from '@mastra/core/agent'
+import { CompositeVoice } from '@mastra/core/voice'
+import { OpenAIVoice } from '@mastra/voice-openai'
+import { PlayAIVoice } from '@mastra/voice-playai'
 
 export const agent = new Agent({
-  id: "voice-agent",
-  name: "Voice Agent",
+  id: 'voice-agent',
+  name: 'Voice Agent',
   instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
-  model: "openai/gpt-5.4",
+  model: 'openai/gpt-5.4',
 
   // Create a composite voice using OpenAI for listening and PlayAI for speaking
   voice: new CompositeVoice({
     input: new OpenAIVoice(),
     output: new PlayAIVoice(),
   }),
-});
+})
 ```
 
 ### Using AI SDK
@@ -282,28 +278,28 @@ export const agent = new Agent({
 Mastra supports using AI SDK's transcription and speech models directly in `CompositeVoice`, giving you access to a wide range of providers through the AI SDK ecosystem:
 
 ```typescript
-import { Agent } from "@mastra/core/agent";
-import { CompositeVoice } from "@mastra/core/voice";
-import { openai } from "@ai-sdk/openai";
-import { elevenlabs } from "@ai-sdk/elevenlabs";
-import { groq } from "@ai-sdk/groq";
+import { Agent } from '@mastra/core/agent'
+import { CompositeVoice } from '@mastra/core/voice'
+import { openai } from '@ai-sdk/openai'
+import { elevenlabs } from '@ai-sdk/elevenlabs'
+import { groq } from '@ai-sdk/groq'
 
 export const agent = new Agent({
-  id: "aisdk-voice-agent",
-  name: "AI SDK Voice Agent",
+  id: 'aisdk-voice-agent',
+  name: 'AI SDK Voice Agent',
   instructions: `You are a helpful assistant with voice capabilities.`,
-  model: "openai/gpt-5.4",
+  model: 'openai/gpt-5.4',
 
   // Pass AI SDK models directly to CompositeVoice
   voice: new CompositeVoice({
-    input: openai.transcription('whisper-1'),
-    output: elevenlabs.speech('eleven_turbo_v2'),
+    input: openai.transcription('whisper-1'), // AI SDK transcription model
+    output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech model
   }),
-});
+})
 
 // Use voice capabilities as usual
-const audioStream = await agent.voice.speak("Hello!");
-const transcribedText = await agent.voice.listen(audioStream);
+const audioStream = await agent.voice.speak('Hello!')
+const transcribedText = await agent.voice.listen(audioStream)
 ```
 
 #### Mix and Match Providers
@@ -311,42 +307,43 @@ const transcribedText = await agent.voice.listen(audioStream);
 You can mix AI SDK models with Mastra voice providers:
 
 ```typescript
-import { CompositeVoice } from "@mastra/core/voice";
-import { PlayAIVoice } from "@mastra/voice-playai";
-import { openai } from "@ai-sdk/openai";
+import { CompositeVoice } from '@mastra/core/voice'
+import { PlayAIVoice } from '@mastra/voice-playai'
+import { openai } from '@ai-sdk/openai'
 
 // Use AI SDK for transcription and Mastra provider for speech
 const voice = new CompositeVoice({
-  input: openai.transcription('whisper-1'),
-  output: new PlayAIVoice(),
-});
+  input: openai.transcription('whisper-1'), // AI SDK
+  output: new PlayAIVoice(), // Mastra provider
+})
 ```
 
 For the complete list of supported AI SDK providers and their capabilities:
-* [Transcription](https://ai-sdk.dev/docs/providers/openai/transcription)
-* [Speech](https://ai-sdk.dev/docs/providers/elevenlabs/speech)
 
-
+- [Transcription](https://ai-sdk.dev/docs/providers/openai/transcription)
+- [Speech](https://ai-sdk.dev/docs/providers/elevenlabs/speech)
+
+## Supported voice providers
 
 Mastra supports multiple voice providers for text-to-speech (TTS) and speech-to-text (STT) capabilities:
 
-| Provider        | Package                         | Features                  | Reference
-| --------------- | ------------------------------- | ------------------------- |
-| OpenAI          | `@mastra/voice-openai`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/
-| OpenAI Realtime | `@mastra/voice-openai-realtime` | Realtime speech-to-speech | [Documentation](https://mastra.ai/reference/
-| ElevenLabs      | `@mastra/voice-elevenlabs`      | High-quality TTS          | [Documentation](https://mastra.ai/reference/
-| PlayAI          | `@mastra/voice-playai`          | TTS                       | [Documentation](https://mastra.ai/reference/
-| Google          | `@mastra/voice-google`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/
-| Deepgram        | `@mastra/voice-deepgram`        | STT                       | [Documentation](https://mastra.ai/reference/
-| Murf            | `@mastra/voice-murf`            | TTS                       | [Documentation](https://mastra.ai/reference/
-| Speechify       | `@mastra/voice-speechify`       | TTS                       | [Documentation](https://mastra.ai/reference/
-| Sarvam          | `@mastra/voice-sarvam`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/
-| Azure           | `@mastra/voice-azure`           | TTS, STT                  | [Documentation](https://mastra.ai/reference/
-| Cloudflare      | `@mastra/voice-cloudflare`      | TTS                       | [Documentation](https://mastra.ai/reference/
-
-## Next Steps
-
-- [Voice API Reference](https://mastra.ai/reference/
+| Provider        | Package                         | Features                  | Reference                                                          |
+| --------------- | ------------------------------- | ------------------------- | ------------------------------------------------------------------ |
+| OpenAI          | `@mastra/voice-openai`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/voice/openai)          |
+| OpenAI Realtime | `@mastra/voice-openai-realtime` | Realtime speech-to-speech | [Documentation](https://mastra.ai/reference/voice/openai-realtime) |
+| ElevenLabs      | `@mastra/voice-elevenlabs`      | High-quality TTS          | [Documentation](https://mastra.ai/reference/voice/elevenlabs)      |
+| PlayAI          | `@mastra/voice-playai`          | TTS                       | [Documentation](https://mastra.ai/reference/voice/playai)          |
+| Google          | `@mastra/voice-google`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/voice/google)          |
+| Deepgram        | `@mastra/voice-deepgram`        | STT                       | [Documentation](https://mastra.ai/reference/voice/deepgram)        |
+| Murf            | `@mastra/voice-murf`            | TTS                       | [Documentation](https://mastra.ai/reference/voice/murf)            |
+| Speechify       | `@mastra/voice-speechify`       | TTS                       | [Documentation](https://mastra.ai/reference/voice/speechify)       |
+| Sarvam          | `@mastra/voice-sarvam`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/voice/sarvam)          |
+| Azure           | `@mastra/voice-azure`           | TTS, STT                  | [Documentation](https://mastra.ai/reference/voice/mastra-voice)    |
+| Cloudflare      | `@mastra/voice-cloudflare`      | TTS                       | [Documentation](https://mastra.ai/reference/voice/mastra-voice)    |
+
+## Next steps
+
+- [Voice API Reference](https://mastra.ai/reference/voice/mastra-voice) - Detailed API documentation for voice capabilities
 - [Text to Speech Examples](https://github.com/mastra-ai/voice-examples/tree/main/text-to-speech) - Interactive story generator and other TTS implementations
 - [Speech to Text Examples](https://github.com/mastra-ai/voice-examples/tree/main/speech-to-text) - Voice memo app and other STT implementations
 - [Speech to Speech Examples](https://github.com/mastra-ai/voice-examples/tree/main/speech-to-speech) - Real-time voice conversation with call analysis
````