@mastra/voice-openai 0.12.0-beta.1 → 0.12.0

@@ -0,0 +1,87 @@
> Overview of Text-to-Speech capabilities in Mastra, including configuration, usage, and integration with voice providers.

# Text-to-Speech (TTS)

Text-to-Speech (TTS) in Mastra offers a unified API for synthesizing spoken audio from text using various providers.
By incorporating TTS into your applications, you can enhance user experience with natural voice interactions, improve accessibility for users with visual impairments, and create more engaging multimodal interfaces.

TTS is a core component of any voice application. Combined with STT (Speech-to-Text), it forms the foundation of voice interaction systems. Newer models also support STS ([Speech-to-Speech](./speech-to-speech)), which enables real-time interactions but comes at a higher cost.

## Configuration

To use TTS in Mastra, you provide a `speechModel` when initializing the voice provider. This includes parameters such as:

- **`name`**: The specific TTS model to use.
- **`apiKey`**: Your API key for authentication.
- **Provider-specific options**: Additional options that may be required or supported by the specific voice provider.

The **`speaker`** option lets you select different voices for speech synthesis. Each provider offers a range of voices with distinct characteristics in voice diversity, quality, personality, and multilingual support.

**Note**: All of these parameters are optional. You can use the default settings provided by the voice provider, which depend on the specific provider you are using.

```typescript
const voice = new OpenAIVoice({
  speechModel: {
    name: "tts-1-hd",
    apiKey: process.env.OPENAI_API_KEY,
  },
  speaker: "alloy",
});

// If using default settings, the configuration can be simplified to:
const voice = new OpenAIVoice();
```

## Available Providers

Mastra supports a wide range of Text-to-Speech providers, each with its own capabilities and voice options. You can choose the provider that best suits your application's needs:

- [**OpenAI**](https://mastra.ai/reference/v1/voice/openai/) - High-quality voices with natural intonation and expression
- [**Azure**](https://mastra.ai/reference/v1/voice/azure/) - Microsoft's speech service with a wide range of voices and languages
- [**ElevenLabs**](https://mastra.ai/reference/v1/voice/elevenlabs/) - Ultra-realistic voices with emotion and fine-grained control
- [**PlayAI**](https://mastra.ai/reference/v1/voice/playai/) - Specialized in natural-sounding voices with various styles
- [**Google**](https://mastra.ai/reference/v1/voice/google/) - Google's speech synthesis with multilingual support
- [**Cloudflare**](https://mastra.ai/reference/v1/voice/cloudflare/) - Edge-optimized speech synthesis for low-latency applications
- [**Deepgram**](https://mastra.ai/reference/v1/voice/deepgram/) - AI-powered speech technology with high accuracy
- [**Speechify**](https://mastra.ai/reference/v1/voice/speechify/) - Text-to-speech optimized for readability and accessibility
- [**Sarvam**](https://mastra.ai/reference/v1/voice/sarvam/) - Specialized in Indic languages and accents
- [**Murf**](https://mastra.ai/reference/v1/voice/murf/) - Studio-quality voice overs with customizable parameters

Each provider is implemented as a separate package that you can install as needed:

```bash
pnpm add @mastra/voice-openai@beta # Example for OpenAI
```
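Because every provider exposes the same `speak()` and `listen()` surface, application code can be written once against that shape and reconfigured to use a different provider package without changes. A minimal sketch (the `VoiceProvider` interface below is illustrative for this example, not a type exported by Mastra):

```typescript
import { Readable } from "node:stream";

// Illustrative shape of the unified voice API; an assumption for this
// sketch rather than Mastra's actual exported types.
interface VoiceProvider {
  speak(text: string, options?: { speaker?: string }): Promise<Readable>;
  listen(audio: Readable, options?: { filetype?: string }): Promise<string>;
}

// Code written against the shared surface does not need to change
// when one provider package is swapped for another.
async function announce(voice: VoiceProvider, text: string): Promise<Readable> {
  return voice.speak(text, { speaker: "alloy" });
}
```
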

## Using the Speak Method

The primary method for TTS is `speak()`, which converts text to speech. It accepts options that allow you to specify the speaker and other provider-specific settings. Here's how to use it:

```typescript
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";

const voice = new OpenAIVoice();

const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice,
});

const { text } = await agent.generate("What color is the sky?");

// Convert the text to speech as an audio stream
const readableStream = await voice.speak(text, {
  speaker: "default", // Optional: specify a speaker
  properties: {
    speed: 1.0, // Optional: adjust speech speed
    pitch: "default", // Optional: specify pitch if supported
  },
});
```
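The stream returned by `speak()` is a standard Node readable, so it can be piped anywhere a stream is accepted, such as a file on disk. A minimal sketch using Node's stream utilities (the `Readable.from` stand-in replaces the stream you would get from `voice.speak()`, which requires provider credentials to run):

```typescript
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Stand-in for the stream returned by voice.speak(); any Node Readable,
// including the one speak() returns, can be piped the same way.
const audioStream = Readable.from([Buffer.from("fake-audio-bytes")]);

// Stream the audio to disk without buffering it all in memory.
await pipeline(audioStream, createWriteStream("speech.mp3"));
```
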

Check out the [Adding Voice to Agents](https://mastra.ai/docs/v1/agents/adding-voice) documentation to learn how to use TTS in an agent.
@@ -0,0 +1,83 @@
> Overview of Speech-to-Text capabilities in Mastra, including configuration, usage, and integration with voice providers.

# Speech-to-Text (STT)

Speech-to-Text (STT) in Mastra provides a standardized interface for converting audio input into text across multiple service providers.
STT helps you create voice-enabled applications that respond to human speech, enabling hands-free interaction, accessibility for users with disabilities, and more natural human-computer interfaces.

## Configuration

To use STT in Mastra, you provide a `listeningModel` when initializing the voice provider. This includes parameters such as:

- **`name`**: The specific STT model to use.
- **`apiKey`**: Your API key for authentication.
- **Provider-specific options**: Additional options that may be required or supported by the specific voice provider.

**Note**: All of these parameters are optional. You can use the default settings provided by the voice provider, which depend on the specific provider you are using.

```typescript
const voice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// If using default settings, the configuration can be simplified to:
const voice = new OpenAIVoice();
```

## Available Providers

Mastra supports several Speech-to-Text providers, each with its own capabilities and strengths:

- [**OpenAI**](https://mastra.ai/reference/v1/voice/openai/) - High-accuracy transcription with Whisper models
- [**Azure**](https://mastra.ai/reference/v1/voice/azure/) - Microsoft's speech recognition with enterprise-grade reliability
- [**ElevenLabs**](https://mastra.ai/reference/v1/voice/elevenlabs/) - Advanced speech recognition with support for multiple languages
- [**Google**](https://mastra.ai/reference/v1/voice/google/) - Google's speech recognition with extensive language support
- [**Cloudflare**](https://mastra.ai/reference/v1/voice/cloudflare/) - Edge-optimized speech recognition for low-latency applications
- [**Deepgram**](https://mastra.ai/reference/v1/voice/deepgram/) - AI-powered speech recognition with high accuracy for various accents
- [**Sarvam**](https://mastra.ai/reference/v1/voice/sarvam/) - Specialized in Indic languages and accents

Each provider is implemented as a separate package that you can install as needed:

```bash
pnpm add @mastra/voice-openai@beta # Example for OpenAI
```

## Using the Listen Method

The primary method for STT is `listen()`, which converts spoken audio into text. Here's how to use it:

```typescript
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";
import { getMicrophoneStream } from "@mastra/node-audio";

const voice = new OpenAIVoice();

const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that provides recommendations based on user input.",
  model: "openai/gpt-5.1",
  voice,
});

const audioStream = getMicrophoneStream(); // Assume this function gets audio input

const transcript = await agent.voice.listen(audioStream, {
  filetype: "m4a", // Optional: specify the audio file type
});

console.log(`User said: ${transcript}`);

const { text } = await agent.generate(
  `Based on what the user said, provide them a recommendation: ${transcript}`,
);

console.log(`Recommendation: ${text}`);
```
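The microphone is not the only possible source: `listen()` accepts any Node readable stream, so pre-recorded audio works the same way. A minimal sketch (the placeholder file is created only so the example is self-contained, and the commented `voice.listen` call is illustrative since a real call needs provider credentials):

```typescript
import { createReadStream, writeFileSync } from "node:fs";

// Placeholder recording so the sketch is self-contained; in a real app
// this would be an audio file you already have on disk.
writeFileSync("recording.m4a", Buffer.from("fake-audio"));

// Any Node readable stream can stand in for the microphone:
const audioStream = createReadStream("recording.m4a");

// Passing it to the provider looks the same as the microphone example:
// const transcript = await voice.listen(audioStream, { filetype: "m4a" });
```
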

Check out the [Adding Voice to Agents](https://mastra.ai/docs/v1/agents/adding-voice) documentation to learn how to use STT in an agent.