npm: @mastra/voice-openai 0.12.1 → 0.12.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (128)

package/dist/docs/references/reference-voice-voice.listen.md CHANGED

@@ -4,11 +4,11 @@ The `listen()` method is a core function available in all Mastra voice providers
 ## Parameters
-**audioStream:** (`NodeJS.ReadableStream`): Audio stream to transcribe. This can be a file stream or a microphone stream.
+**audioStream** (`NodeJS.ReadableStream`): Audio stream to transcribe. This can be a file stream or a microphone stream.
-**options?:** (`object`): Provider-specific options for speech recognition
+**options** (`object`): Provider-specific options for speech recognition
-## Return Value
+## Return value
 Returns one of the following:
@@ -16,81 +16,87 @@ Returns one of the following:
 - `Promise<NodeJS.ReadableStream>`: A promise that resolves to a stream of transcribed text (for streaming transcription)
 - `Promise<void>`: For real-time providers that emit 'writing' events instead of returning text directly
-## Provider-Specific Options
+## Provider-specific options
 Each voice provider may support additional options specific to their implementation. Here are some examples:
 ### OpenAI
-**options.filetype?:** (`string`): Audio file format (e.g., 'mp3', 'wav', 'm4a') (Default: `'mp3'`)
+**options** (`Options`): Configuration options.
-**options.prompt?:** (`string`): Text to guide the model's transcription
+**options.filetype** (`string`): Audio file format (e.g., 'mp3', 'wav', 'm4a')
-**options.language?:** (`string`): Language code (e.g., 'en', 'fr', 'de')
+**options.prompt** (`string`): Text to guide the model's transcription
+**options.language** (`string`): Language code (e.g., 'en', 'fr', 'de')
 ### Google
-**options.stream?:** (`boolean`): Whether to use streaming recognition (Default: `false`)
+**options** (`Options`): Configuration options.
+**options.stream** (`boolean`): Whether to use streaming recognition
-**options.config?:** (`object`): Recognition configuration from Google Cloud Speech-to-Text API (Default: `{ encoding: 'LINEAR16', languageCode: 'en-US' }`)
+**options.config** (`object`): Recognition configuration from Google Cloud Speech-to-Text API
 ### Deepgram
-**options.model?:** (`string`): Deepgram model to use for transcription (Default: `'nova-2'`)
+**options** (`Options`): Configuration options.
-**options.language?:** (`string`): Language code for transcription (Default: `'en'`)
+**options.model** (`string`): Deepgram model to use for transcription
-## Usage Example
+**options.language** (`string`): Language code for transcription
+## Usage example
 ```typescript
-import { OpenAIVoice } from "@mastra/voice-openai";
-import { getMicrophoneStream } from "@mastra/node-audio";
-import { createReadStream } from "fs";
-import path from "path";
+import { OpenAIVoice } from '@mastra/voice-openai'
+import { getMicrophoneStream } from '@mastra/node-audio'
+import { createReadStream } from 'fs'
+import path from 'path'
 // Initialize a voice provider
 const voice = new OpenAIVoice({
   listeningModel: {
-    name: "whisper-1",
+    name: 'whisper-1',
     apiKey: process.env.OPENAI_API_KEY,
   },
-});
+})
 // Basic usage with a file stream
-const audioFilePath = path.join(process.cwd(), "audio.mp3");
-const audioStream = createReadStream(audioFilePath);
+const audioFilePath = path.join(process.cwd(), 'audio.mp3')
+const audioStream = createReadStream(audioFilePath)
 const transcript = await voice.listen(audioStream, {
-  filetype: "mp3",
-});
-console.log("Transcribed text:", transcript);
+  filetype: 'mp3',
+})
+console.log('Transcribed text:', transcript)
 // Using a microphone stream
-const microphoneStream = getMicrophoneStream(); // Assume this function gets audio input
-const transcription = await voice.listen(microphoneStream);
+const microphoneStream = getMicrophoneStream() // Assume this function gets audio input
+const transcription = await voice.listen(microphoneStream)
 // With provider-specific options
 const transcriptWithOptions = await voice.listen(audioStream, {
-  language: "en",
-  prompt: "This is a conversation about artificial intelligence.",
-});
+  language: 'en',
+  prompt: 'This is a conversation about artificial intelligence.',
+})
 ```
-## Using with CompositeVoice
+## Using with `CompositeVoice`
 When using `CompositeVoice`, the `listen()` method delegates to the configured listening provider:
 ```typescript
-import { CompositeVoice } from "@mastra/core/voice";
-import { OpenAIVoice } from "@mastra/voice-openai";
-import { PlayAIVoice } from "@mastra/voice-playai";
+import { CompositeVoice } from '@mastra/core/voice'
+import { OpenAIVoice } from '@mastra/voice-openai'
+import { PlayAIVoice } from '@mastra/voice-playai'
 const voice = new CompositeVoice({
   input: new OpenAIVoice(),
   output: new PlayAIVoice(),
-});
+})
 // This will use the OpenAIVoice provider
-const transcript = await voice.listen(audioStream);
+const transcript = await voice.listen(audioStream)
 ```
 ### Using AI SDK Model Providers
@@ -98,18 +104,18 @@ const transcript = await voice.listen(audioStream);
 You can also use AI SDK transcription models directly with `CompositeVoice`:
 ```typescript
-import { CompositeVoice } from "@mastra/core/voice";
-import { openai } from "@ai-sdk/openai";
-import { groq } from "@ai-sdk/groq";
+import { CompositeVoice } from '@mastra/core/voice'
+import { openai } from '@ai-sdk/openai'
+import { groq } from '@ai-sdk/groq'
 // Use AI SDK transcription models
 const voice = new CompositeVoice({
-  input: openai.transcription('whisper-1'),  // AI SDK model
-  output: new PlayAIVoice(),                 // Mastra provider
-});
+  input: openai.transcription('whisper-1'), // AI SDK model
+  output: new PlayAIVoice(), // Mastra provider
+})
 // Works the same way
-const transcript = await voice.listen(audioStream);
+const transcript = await voice.listen(audioStream)
 // Provider-specific options can be passed through
 const transcriptWithOptions = await voice.listen(audioStream, {
@@ -117,14 +123,14 @@ const transcriptWithOptions = await voice.listen(audioStream, {
     openai: {
       language: 'en',
       prompt: 'This is about AI',
-    }
-  }
-});
+    },
+  },
+})
 ```
 See the [CompositeVoice reference](https://mastra.ai/reference/voice/composite-voice) for more details on AI SDK integration.
-## Realtime Voice Providers
+## Realtime voice providers
 When using realtime voice providers like `OpenAIRealtimeVoice`, the `listen()` method behaves differently:
@@ -132,20 +138,20 @@ When using realtime voice providers like `OpenAIRealtimeVoice`, the `listen()` m
 - You need to register an event listener to receive the transcription
 ```typescript
-import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
-import { getMicrophoneStream } from "@mastra/node-audio";
+import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'
+import { getMicrophoneStream } from '@mastra/node-audio'
-const voice = new OpenAIRealtimeVoice();
-await voice.connect();
+const voice = new OpenAIRealtimeVoice()
+await voice.connect()
 // Register event listener for transcription
-voice.on("writing", ({ text, role }) => {
-  console.log(`${role}: ${text}`);
-});
+voice.on('writing', ({ text, role }) => {
+  console.log(`${role}: ${text}`)
+})
 // This will emit 'writing' events instead of returning text
-const microphoneStream = getMicrophoneStream();
-await voice.listen(microphoneStream);
+const microphoneStream = getMicrophoneStream()
+await voice.listen(microphoneStream)
 ```
 ## Notes
@@ -157,8 +163,8 @@ await voice.listen(microphoneStream);
 - Some providers support streaming transcription, where text is returned as it's transcribed
 - For best performance, consider closing or ending the audio stream when you're done with it
-## Related Methods
+## Related methods
-- [voice.speak()](https://mastra.ai/reference/voice/voice.speak) - Converts text to speech
-- [voice.send()](https://mastra.ai/reference/voice/voice.send) - Sends audio data to the voice provider in real-time
-- [voice.on()](https://mastra.ai/reference/voice/voice.on) - Registers an event listener for voice events
+- [voice.speak()](https://mastra.ai/reference/voice/voice.speak): Converts text to speech
+- [voice.send()](https://mastra.ai/reference/voice/voice.send): Sends audio data to the voice provider in real-time
+- [voice.on()](https://mastra.ai/reference/voice/voice.on): Registers an event listener for voice events
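
Note on the `listen()` reference above (an illustration, not part of the package diff): the return-value section lists several result shapes, namely a stream of transcribed text, `void` for realtime providers, and, assuming the bullet elided between the hunks is the plain-text case, a string. A minimal sketch of a caller that normalizes these shapes to a string follows. `ListenResult` and `transcriptToString` are hypothetical names introduced here and are not exported by any Mastra package.

```typescript
import { Readable } from 'node:stream'

// Assumed result shapes of `await voice.listen(...)`, per the reference text.
// A Promise<void> resolves to `undefined`, so that is what the helper checks for.
type ListenResult = string | NodeJS.ReadableStream | undefined

// Hypothetical helper: collapse any of the documented result shapes to a string.
async function transcriptToString(result: ListenResult): Promise<string> {
  // Realtime providers deliver text through 'writing' events, so nothing is returned.
  if (result === undefined) return ''
  // Non-streaming transcription: the text is already complete.
  if (typeof result === 'string') return result
  // Streaming transcription: concatenate the chunks as they arrive.
  let text = ''
  for await (const chunk of result) {
    text += typeof chunk === 'string' ? chunk : chunk.toString('utf8')
  }
  return text
}

// Stand-in for a streaming result, built from an in-memory text stream.
transcriptToString(Readable.from(['Hello', ' world'])).then(text => console.log(text))
```

With a real provider the call would look like `transcriptToString(await voice.listen(audioStream))`. Decoding each binary chunk separately can split a multi-byte character across chunk boundaries; a `TextDecoder` in streaming mode would avoid that if the provider emits buffers.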

package/dist/docs/references/reference-voice-voice.speak.md CHANGED

@@ -4,88 +4,98 @@ The `speak()` method is a core function available in all Mastra voice providers
 ## Parameters
-**input:** (`string | NodeJS.ReadableStream`): Text to convert to speech. Can be a string or a readable stream of text.
+**input** (`string | NodeJS.ReadableStream`): Text to convert to speech. Can be a string or a readable stream of text.
-**options?:** (`object`): Options for speech synthesis
+**options** (`object`): Options for speech synthesis
-**options.speaker?:** (`string`): Voice ID to use for this specific request. Overrides the default speaker set in the constructor.
+**options.speaker** (`string`): Voice ID to use for this specific request. Overrides the default speaker set in the constructor.
-## Return Value
+## Return value
 Returns a `Promise<NodeJS.ReadableStream | void>` where:
 - `NodeJS.ReadableStream`: A stream of audio data that can be played or saved
 - `void`: When using a realtime voice provider that emits audio through events instead of returning it directly
-## Provider-Specific Options
+## Provider-specific options
 Each voice provider may support additional options specific to their implementation. Here are some examples:
 ### OpenAI
-**options.speed?:** (`number`): Speech speed multiplier. Values between 0.25 and 4.0 are supported. (Default: `1.0`)
+**options** (`Options`): Configuration options.
+**options.speed** (`number`): Speech speed multiplier. Values between 0.25 and 4.0 are supported.
 ### ElevenLabs
-**options.stability?:** (`number`): Voice stability. Higher values result in more stable, less expressive speech. (Default: `0.5`)
+**options** (`Options`): Configuration options.
+**options.stability** (`number`): Voice stability. Higher values result in more stable, less expressive speech.
-**options.similarity\_boost?:** (`number`): Voice clarity and similarity to the original voice. (Default: `0.75`)
+**options.similarity\_boost** (`number`): Voice clarity and similarity to the original voice.
 ### Google
-**options.languageCode?:** (`string`): Language code for the voice (e.g., 'en-US').
+**options** (`Options`): Configuration options.
+**options.languageCode** (`string`): Language code for the voice (e.g., 'en-US').
-**options.audioConfig?:** (`object`): Audio configuration options from Google Cloud Text-to-Speech API. (Default: `{ audioEncoding: 'LINEAR16' }`)
+**options.audioConfig** (`object`): Audio configuration options from Google Cloud Text-to-Speech API.
 ### Murf
-**options.properties.rate?:** (`number`): Speech rate multiplier.
+**options** (`Options`): Configuration options.
+**options.properties** (`object`): properties configuration.
+**options.properties.rate** (`number`): Speech rate multiplier.
-**options.properties.pitch?:** (`number`): Voice pitch adjustment.
+**options.properties.pitch** (`number`): Voice pitch adjustment.
-**options.properties.format?:** (`'MP3' | 'WAV' | 'FLAC' | 'ALAW' | 'ULAW'`): Output audio format.
+**options.properties.format** (`'MP3' | 'WAV' | 'FLAC' | 'ALAW' | 'ULAW'`): Output audio format.
-## Usage Example
+## Usage example
 ```typescript
-import { OpenAIVoice } from "@mastra/voice-openai";
+import { OpenAIVoice } from '@mastra/voice-openai'
 // Initialize a voice provider
 const voice = new OpenAIVoice({
-  speaker: "alloy", // Default voice
-});
+  speaker: 'alloy', // Default voice
+})
 // Basic usage with default settings
-const audioStream = await voice.speak("Hello, world!");
+const audioStream = await voice.speak('Hello, world!')
 // Using a different voice for this specific request
-const audioStreamWithDifferentVoice = await voice.speak("Hello again!", {
-  speaker: "nova",
-});
+const audioStreamWithDifferentVoice = await voice.speak('Hello again!', {
+  speaker: 'nova',
+})
 // Using provider-specific options
-const audioStreamWithOptions = await voice.speak("Hello with options!", {
-  speaker: "echo",
+const audioStreamWithOptions = await voice.speak('Hello with options!', {
+  speaker: 'echo',
   speed: 1.2, // OpenAI-specific option
-});
+})
 // Using a text stream as input
-import { Readable } from "stream";
-const textStream = Readable.from(["Hello", " from", " a", " stream!"]);
-const audioStreamFromTextStream = await voice.speak(textStream);
+import { Readable } from 'stream'
+const textStream = Readable.from(['Hello', ' from', ' a', ' stream!'])
+const audioStreamFromTextStream = await voice.speak(textStream)
 ```
-## Using with CompositeVoice
+## Using with `CompositeVoice`
 When using `CompositeVoice`, the `speak()` method delegates to the configured speaking provider:
 ```typescript
-import { CompositeVoice } from "@mastra/core/voice";
-import { OpenAIVoice } from "@mastra/voice-openai";
-import { PlayAIVoice } from "@mastra/voice-playai";
+import { CompositeVoice } from '@mastra/core/voice'
+import { OpenAIVoice } from '@mastra/voice-openai'
+import { PlayAIVoice } from '@mastra/voice-playai'
 const voice = new CompositeVoice({
   output: new PlayAIVoice(),
   input: new OpenAIVoice(),
-});
+})
 // This will use the PlayAIVoice provider
-const audioStream = await voice.speak("Hello, world!");
+const audioStream = await voice.speak('Hello, world!')
 ```
 ### Using AI SDK Model Providers
@@ -93,34 +103,34 @@ const audioStream = await voice.speak("Hello, world!");
 You can also use AI SDK speech models directly with `CompositeVoice`:
 ```typescript
-import { CompositeVoice } from "@mastra/core/voice";
-import { openai } from "@ai-sdk/openai";
-import { elevenlabs } from "@ai-sdk/elevenlabs";
+import { CompositeVoice } from '@mastra/core/voice'
+import { openai } from '@ai-sdk/openai'
+import { elevenlabs } from '@ai-sdk/elevenlabs'
 // Use AI SDK speech models
 const voice = new CompositeVoice({
-  output: elevenlabs.speech('eleven_turbo_v2'),  // AI SDK model
-  input: openai.transcription('whisper-1'),      // AI SDK model
-});
+  output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK model
+  input: openai.transcription('whisper-1'), // AI SDK model
+})
 // Works the same way
-const audioStream = await voice.speak("Hello from AI SDK!");
+const audioStream = await voice.speak('Hello from AI SDK!')
 // Provider-specific options can be passed through
-const audioWithOptions = await voice.speak("Hello with options!", {
-  speaker: 'Rachel',  // ElevenLabs voice
+const audioWithOptions = await voice.speak('Hello with options!', {
+  speaker: 'Rachel', // ElevenLabs voice
   providerOptions: {
     elevenlabs: {
       stability: 0.5,
       similarity_boost: 0.75,
-    }
-  }
-});
+    },
+  },
+})
 ```
 See the [CompositeVoice reference](https://mastra.ai/reference/voice/composite-voice) for more details on AI SDK integration.
-## Realtime Voice Providers
+## Realtime voice providers
 When using realtime voice providers like `OpenAIRealtimeVoice`, the `speak()` method behaves differently:
@@ -128,24 +138,24 @@ When using realtime voice providers like `OpenAIRealtimeVoice`, the `speak()` me
 - You need to register an event listener to receive the audio chunks
 ```typescript
-import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
-import Speaker from "@mastra/node-speaker";
+import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'
+import Speaker from '@mastra/node-speaker'
 const speaker = new Speaker({
   sampleRate: 24100, // Audio sample rate in Hz - standard for high-quality audio on MacBook Pro
   channels: 1, // Mono audio output (as opposed to stereo which would be 2)
   bitDepth: 16, // Bit depth for audio quality - CD quality standard (16-bit resolution)
-});
+})
-const voice = new OpenAIRealtimeVoice();
-await voice.connect();
+const voice = new OpenAIRealtimeVoice()
+await voice.connect()
 // Register event listener for audio chunks
-voice.on("speaker", (stream) => {
+voice.on('speaker', stream => {
   // Handle audio chunk (e.g., play it or save it)
-  stream.pipe(speaker);
-});
+  stream.pipe(speaker)
+})
 // This will emit 'speaking' events instead of returning a stream
-await voice.speak("Hello, this is realtime speech!");
+await voice.speak('Hello, this is realtime speech!')
 ```
 ## Notes