@mastra/voice-aws-nova-sonic 0.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md ADDED
# Changelog

All notable changes to this package will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.12.0-beta.0] - 2026-02-18

### Added

- Initial release of AWS Nova 2 Sonic voice integration
- Support for bidirectional streaming speech-to-speech
- AWS SigV4 authentication for WebSocket connections
- Polyglot voice support with multiple languages
- Tool/function calling support
- Cross-modal input (audio and text)

package/README.md ADDED
# @mastra/voice-aws-nova-sonic

Mastra integration for AWS Nova 2 Sonic, providing real-time bidirectional speech-to-speech capabilities through Amazon Bedrock's bidirectional streaming API.

## Features

- **Real-time bidirectional streaming**: Continuous audio streaming in both directions
- **Multilingual support**: English, French, Italian, German, Spanish, Portuguese, and Hindi
- **Polyglot voices**: Voices that can speak multiple languages within the same session
- **Barge-in support**: Users can interrupt the assistant mid-speech; Nova Sonic handles this server-side and the integration emits an `interrupt` event
- **Tool/function calling**: Support for agentic workflows and asynchronous tool execution
- **Cross-modal input**: Audio and text inputs in the same conversation
- **Natural turn-taking**: Server-side end-of-turn detection with configurable endpointing sensitivity (`turnDetectionConfiguration`)
- **Error handling**: Errors are emitted through the `error` event with codes from `NovaSonicErrorCode`


## Installation

```bash
npm install @mastra/voice-aws-nova-sonic
# or
pnpm add @mastra/voice-aws-nova-sonic
# or
yarn add @mastra/voice-aws-nova-sonic
```

## Prerequisites

- Node.js >= 22.13.0
- An AWS account with access to Amazon Bedrock
- AWS credentials configured (see [AWS Setup](#aws-setup))
- Access to the Nova 2 Sonic model in your AWS region

## AWS Setup

### 1. Enable Nova 2 Sonic in Amazon Bedrock

1. Go to the [Amazon Bedrock Console](https://console.aws.amazon.com/bedrock/)
2. Navigate to "Model access" in the left sidebar
3. Request access to the "Amazon Nova 2 Sonic" model
4. Wait for approval (usually instant)

### 2. Configure AWS Credentials

You can configure AWS credentials in several ways:

**Option 1: Environment Variables**
```bash
export AWS_ACCESS_KEY_ID=your-access-key-id
export AWS_SECRET_ACCESS_KEY=your-secret-access-key
export AWS_REGION=us-east-1
```

**Option 2: AWS Credentials File**
```ini
# ~/.aws/credentials
[default]
aws_access_key_id = your-access-key-id
aws_secret_access_key = your-secret-access-key
```

**Option 3: IAM Role** (for EC2/Lambda)
- Attach an IAM role with Bedrock permissions to your EC2 instance or Lambda function

**Option 4: Explicit Credentials in Code**
```typescript
import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic';

const voice = new NovaSonicVoice({
  region: 'us-east-1',
  // Placeholder values; avoid committing real keys to source control
  credentials: {
    accessKeyId: 'your-access-key-id',
    secretAccessKey: 'your-secret-access-key',
  },
});
```
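
Rather than hard-coding keys as in Option 4, you can read them at startup and fall back to the default credential chain when they are absent. The following is a minimal sketch, not part of the package: `credentialsFromEnv` is a hypothetical helper, and the optional `sessionToken` field is an assumption based on the standard AWS credential shape (it is not documented in this README).

```typescript
// Assumed credential shape; `sessionToken` support is an assumption.
interface StaticCredentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken?: string;
}

// Hypothetical helper: reads AWS keys from an environment-like object.
// Returns undefined when keys are missing, so the caller can omit `credentials`
// and let NovaSonicVoice use the default credential chain instead.
function credentialsFromEnv(
  env: Record<string, string | undefined>,
): StaticCredentials | undefined {
  const accessKeyId = env.AWS_ACCESS_KEY_ID;
  const secretAccessKey = env.AWS_SECRET_ACCESS_KEY;
  if (!accessKeyId || !secretAccessKey) {
    return undefined;
  }
  const creds: StaticCredentials = { accessKeyId, secretAccessKey };
  if (env.AWS_SESSION_TOKEN) {
    creds.sessionToken = env.AWS_SESSION_TOKEN;
  }
  return creds;
}
```

Call it with `process.env` and pass the result as `credentials` only when it is defined.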

### 3. IAM Permissions

Your AWS credentials need the following IAM permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithBidirectionalStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/amazon.nova-2-sonic-v1:0"
    }
  ]
}
```
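
The `*` in the resource ARN matches every region. To restrict access to the regions you actually use, the policy can be generated per region. A minimal sketch: `buildNovaSonicPolicy` is a hypothetical helper, and it simply substitutes each region into the ARN pattern shown above.

```typescript
// Hypothetical helper: builds the IAM policy above, scoped to specific regions.
interface PolicyStatement {
  Effect: 'Allow';
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: '2012-10-17';
  Statement: PolicyStatement[];
}

function buildNovaSonicPolicy(
  regions: string[],
  modelId = 'amazon.nova-2-sonic-v1:0',
): PolicyDocument {
  if (regions.length === 0) {
    throw new Error('At least one region is required');
  }
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: ['bedrock:InvokeModel', 'bedrock:InvokeModelWithBidirectionalStream'],
        // Foundation-model ARNs leave the account-ID field empty, hence "::".
        Resource: regions.map((r) => `arn:aws:bedrock:${r}::foundation-model/${modelId}`),
      },
    ],
  };
}

// Print the policy as JSON, ready to paste into the IAM console.
console.log(JSON.stringify(buildNovaSonicPolicy(['us-east-1', 'us-west-2']), null, 2));
```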

## Usage

### Basic Example

```typescript
import { Agent } from '@mastra/core/agent';
import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic';

const agent = new Agent({
  name: 'Nova Sonic Agent',
  instructions: 'You are a helpful assistant with real-time voice capabilities.',
  model: 'openai/gpt-4o',
  voice: new NovaSonicVoice({
    region: 'us-east-1',
    speaker: 'tiffany',
  }),
});

// Connect to the voice service
await agent.voice.connect();

// Listen for agent audio responses (a stream of audio data)
agent.voice.on('speaker', (audioStream) => {
  // `yourAudioOutput` is a placeholder for any writable stream
  // (speaker, WebSocket, file)
  audioStream.pipe(yourAudioOutput);
});

// Listen for text transcriptions
agent.voice.on('writing', ({ text, role, generationStage }) => {
  // generationStage is 'SPECULATIVE' (preview) or 'FINAL' (actual transcript)
  console.log(`[${role}] ${text}`);
});

// Send continuous audio from the microphone; `microphoneStream` is a
// placeholder for a NodeJS.ReadableStream of PCM16 audio
await agent.voice.send(microphoneStream);
```
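
`send()` expects 16-bit PCM, but many capture APIs (the Web Audio API, for example) produce 32-bit float samples in the range [-1, 1]. A minimal conversion sketch: `float32ToPcm16` is a hypothetical helper, and it assumes the input is already mono at 16 kHz.

```typescript
// Hypothetical helper: converts float samples in [-1, 1] to 16-bit PCM.
// Out-of-range values are clamped to avoid wrap-around distortion.
function float32ToPcm16(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    // Negative values scale toward -32768, positive values toward 32767.
    output[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return output;
}
```

Each converted frame can then be passed to `agent.voice.send()`.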

### Advanced Configuration

```typescript
import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic';

const voice = new NovaSonicVoice({
  region: 'us-east-1', // or 'us-west-2', 'ap-northeast-1'
  model: 'amazon.nova-2-sonic-v1:0',
  speaker: 'matthew', // or 'tiffany', 'amy', etc.
  languageCode: 'en-US',
  instructions: 'You are a helpful assistant.',
  sessionConfig: {
    tools: [
      {
        name: 'search',
        description: 'Search the web',
        inputSchema: {
          type: 'object',
          properties: {
            query: { type: 'string' },
          },
          required: ['query'],
        },
      },
    ],
    turnDetectionConfiguration: {
      // HIGH = fastest (1.5s pause), MEDIUM = balanced (1.75s), LOW = slowest (2s)
      endpointingSensitivity: 'MEDIUM',
    },
  },
  debug: true,
});

await voice.connect();
```
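
Each `tools` entry in `sessionConfig` carries a JSON Schema `inputSchema`, and a hand-written mismatch (such as a `required` key missing from `properties`) is easy to introduce. A small sketch that checks for this before connecting; the `ToolDefinition` type mirrors only the fields used in the example above and is an assumption, not the package's exported type. `validateTools` is a hypothetical helper.

```typescript
// Assumed shape, mirroring the `tools` example above (not an exported type).
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: 'object';
    properties: Record<string, { type: string }>;
    required?: string[];
  };
}

// Returns a list of problems; an empty list means the definitions look consistent.
function validateTools(tools: ToolDefinition[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const tool of tools) {
    if (seen.has(tool.name)) {
      problems.push(`duplicate tool name "${tool.name}"`);
    }
    seen.add(tool.name);
    for (const key of tool.inputSchema.required ?? []) {
      // Own-property check so inherited names like "toString" do not count.
      if (!Object.prototype.hasOwnProperty.call(tool.inputSchema.properties, key)) {
        problems.push(`tool "${tool.name}": required key "${key}" is not in properties`);
      }
    }
  }
  return problems;
}
```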

### With Tools

```typescript
import { Agent } from '@mastra/core/agent';
import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic';
import { createTool } from '@mastra/core/tools';
import { z } from 'zod';

const weatherTool = createTool({
  id: 'weather',
  description: 'Get weather information',
  inputSchema: z.object({
    location: z.string(),
  }),
  execute: async ({ context }) => {
    // Placeholder: replace with a real weather lookup
    return { temperature: 72, condition: 'sunny' };
  },
});

const agent = new Agent({
  name: 'Weather Agent',
  instructions: 'You help users get weather information.',
  model: 'openai/gpt-4o',
  tools: {
    weather: weatherTool,
  },
  voice: new NovaSonicVoice({
    region: 'us-east-1',
  }),
});

await agent.voice.connect();
// Tools are automatically available to the voice model
```

### Cross-Modal Text Input

Send text messages during an active voice session. Nova Sonic responds to the text with audio:

```typescript
// After connecting and starting audio streaming
await agent.voice.speak('What is the weather in New York?');
```

## API Reference

### Constructor

```typescript
new NovaSonicVoice(config?: NovaSonicVoiceConfig)
```

**Configuration Options:**

- `region` (string, optional): AWS region. Default: `'us-east-1'`. Supported: `'us-east-1'`, `'us-west-2'`, `'ap-northeast-1'`
- `model` (string, optional): Model ID. Default: `'amazon.nova-2-sonic-v1:0'`
- `credentials` (Credentials, optional): AWS credentials. If not provided, the default AWS credential provider chain is used
- `speaker` (string, optional): Voice name/identifier (e.g., `'matthew'`, `'tiffany'`, `'amy'`)
- `languageCode` (string, optional): Language code (e.g., `'en-US'`, `'fr-FR'`)
- `instructions` (string, optional): System instructions for the model
- `tools` (array, optional): Tool definitions
- `sessionConfig` (object, optional): Session configuration, including `turnDetectionConfiguration`, `tools`, and `inferenceConfiguration`
- `debug` (boolean, optional): Enable debug logging. Default: `false`

### Methods

#### `connect(options?)`

Establishes a connection to Amazon Bedrock. Must be called before using other methods.

```typescript
await voice.connect();
```

#### `speak(input, options?)`

Sends cross-modal text input during an active voice session. Nova Sonic processes the text and responds with audio.

```typescript
await voice.speak('Hello, world!');
```

#### `listen(audioStream, options?)`

Streams audio input for transcription. For Nova Sonic, this is equivalent to `send()`.

```typescript
await voice.listen(audioStream);
```

#### `send(audioData)`

Streams audio data in real time. Accepts a `NodeJS.ReadableStream` (PCM16 audio) or an `Int16Array`.

```typescript
// Stream from a ReadableStream
await voice.send(audioStream);

// Or send an Int16Array of PCM16 samples (here, 100 ms of silence at 16 kHz)
const audioArray = new Int16Array(1600);
await voice.send(audioArray);
```
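
When audio is already in memory (for example, a raw PCM file read with `fs.readFile`), sending it in small frames approximates a live microphone instead of delivering everything at once. A sketch under two assumptions: the input is 16 kHz mono little-endian PCM16, and 20 ms frames are an arbitrary choice rather than a documented requirement. `pcm16FromBuffer` and `chunkPcm16` are hypothetical helpers.

```typescript
// Hypothetical helper: copies raw little-endian PCM16 bytes into an Int16Array.
// Copying avoids alignment errors when the byte offset is odd.
function pcm16FromBuffer(bytes: Uint8Array): Int16Array {
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Int16Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true); // true = little-endian
  }
  return samples;
}

// Hypothetical helper: yields consecutive frames of `frameMs` milliseconds.
// The final frame may be shorter than the others.
function* chunkPcm16(
  samples: Int16Array,
  sampleRate = 16000,
  frameMs = 20,
): Generator<Int16Array> {
  const frameSize = Math.round((sampleRate * frameMs) / 1000);
  for (let start = 0; start < samples.length; start += frameSize) {
    yield samples.subarray(start, start + frameSize);
  }
}
```

Each yielded frame can be passed to `voice.send()`, optionally with a short delay between frames to pace the stream in real time.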

#### `close()`

Disconnects and cleans up resources.

```typescript
voice.close();
```

#### `on(event, callback)`

Registers an event listener.

```typescript
voice.on('speaking', ({ audio }) => {
  // audio is a base64-encoded string of PCM audio
});

voice.on('writing', ({ text, role, generationStage }) => {
  // generationStage: 'SPECULATIVE' (preview) or 'FINAL' (actual transcript)
  console.log(`${role}: ${text}`);
});

voice.on('error', ({ message, code }) => {
  console.error(`Error: ${message} (${code})`);
});
```
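
To play or analyze a `speaking` chunk you usually need raw samples and the chunk's duration. A sketch assuming 16-bit little-endian mono samples; the output sample rate is not stated in this README, so it is a parameter, and the 24 kHz default below is an assumption to verify against your session. `decodeSpeakingChunk` is a hypothetical helper. The event also carries `audioData` as a `Buffer`, which skips the base64 step.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper: turns a base64 PCM16 chunk into samples plus duration.
function decodeSpeakingChunk(
  audioBase64: string,
  sampleRate = 24000, // assumption: check the output rate your session uses
): { samples: Int16Array; durationMs: number } {
  const bytes = Buffer.from(audioBase64, 'base64');
  const samples = new Int16Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = bytes.readInt16LE(i * 2);
  }
  // Multiply before dividing to keep the result exact for whole-ms durations.
  return { samples, durationMs: (samples.length * 1000) / sampleRate };
}
```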

#### `off(event, callback)`

Removes an event listener. Pass the same function reference that was registered with `on()`.

```typescript
// onSpeaking is the handler previously registered via voice.on('speaking', onSpeaking)
voice.off('speaking', onSpeaking);
```

### Events

- **`speaker`**: Audio stream (`NodeJS.ReadableStream`) for the full response
- **`speaking`**: Audio chunk `{ audio: string, audioData: Buffer, response_id?: string }`
- **`writing`**: Text transcription `{ text: string, role: 'assistant' | 'user', generationStage?: 'SPECULATIVE' | 'FINAL' }`
- **`error`**: Error event `{ message: string, code?: string, details?: unknown }`
- **`toolCall`**: Tool invocation `{ name: string, args: Record<string, any>, id: string }`
- **`turnComplete`**: Turn completion `{ timestamp: number }`
- **`interrupt`**: Barge-in detected `{ type: string, timestamp: number }`
- **`contentStart`**: Content block started (raw Nova Sonic event)
- **`contentEnd`**: Content block ended (raw Nova Sonic event)
- **`usage`**: Token usage `{ inputTokens: number, outputTokens: number, totalTokens: number }`
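
Because `writing` fires for both `SPECULATIVE` previews and `FINAL` transcripts, logging every event duplicates text. A sketch that keeps only final transcripts, using the payload shape listed above. It treats an event without `generationStage` as final, which is an assumption since the field is optional. `TranscriptCollector` is a hypothetical helper.

```typescript
// Payload shape of the `writing` event, as listed under Events.
interface WritingEvent {
  text: string;
  role: 'assistant' | 'user';
  generationStage?: 'SPECULATIVE' | 'FINAL';
}

// Hypothetical helper: accumulates only final transcripts, in arrival order.
class TranscriptCollector {
  private readonly lines: { role: WritingEvent['role']; text: string }[] = [];

  // Arrow function so it can be passed directly to voice.on('writing', ...).
  readonly handle = ({ text, role, generationStage }: WritingEvent): void => {
    // Assumption: a missing stage is treated as final.
    if (generationStage === 'SPECULATIVE') return;
    this.lines.push({ role, text });
  };

  toString(): string {
    return this.lines.map((l) => `[${l.role}] ${l.text}`).join('\n');
  }
}
```

Register it with `voice.on('writing', collector.handle)` and read `collector.toString()` when the session ends.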

## Supported Regions

- `us-east-1` (US East - N. Virginia)
- `us-west-2` (US West - Oregon)
- `ap-northeast-1` (Asia Pacific - Tokyo)
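
An unsupported region is easier to diagnose at startup than after `connect()` fails. A sketch that validates `AWS_REGION` against the list above and falls back to the documented default, `us-east-1`. `resolveRegion` is a hypothetical helper, and the list is copied from this README, so it needs updating if availability changes.

```typescript
// Regions listed in this README; update if Nova 2 Sonic becomes available elsewhere.
const SUPPORTED_REGIONS = ['us-east-1', 'us-west-2', 'ap-northeast-1'] as const;
type SupportedRegion = (typeof SUPPORTED_REGIONS)[number];

function isSupportedRegion(value: string): value is SupportedRegion {
  return (SUPPORTED_REGIONS as readonly string[]).includes(value);
}

// Hypothetical helper: reads AWS_REGION from an env-like object, defaulting to us-east-1.
function resolveRegion(env: Record<string, string | undefined>): SupportedRegion {
  const region = env.AWS_REGION ?? 'us-east-1';
  if (!isSupportedRegion(region)) {
    throw new Error(
      `Region "${region}" is not supported by Nova 2 Sonic; use one of ${SUPPORTED_REGIONS.join(', ')}`,
    );
  }
  return region;
}
```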

## Supported Languages

- English (US, UK, India, Australia)
- French
- Italian
- German
- Spanish
- Portuguese
- Hindi

## Error Handling

Errors are emitted through the `error` event, with a `code` taken from `NovaSonicErrorCode`:

```typescript
import { NovaSonicError, NovaSonicErrorCode } from '@mastra/voice-aws-nova-sonic';

voice.on('error', ({ message, code, details }) => {
  if (code === NovaSonicErrorCode.CONNECTION_FAILED) {
    // Handle connection error
  } else if (code === NovaSonicErrorCode.CREDENTIALS_MISSING) {
    // Handle credentials error
  }
});
```
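
For transient failures such as `CONNECTION_FAILED`, retrying `connect()` with exponential backoff is a common pattern. Nothing in this README says the package retries on its own, so this sketch wraps the call externally. `withRetry` is a hypothetical helper, the attempt count and delays are arbitrary, and configuration errors such as missing credentials are better surfaced immediately (via `shouldRetry`) than retried.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
  shouldRetry: (error: unknown) => boolean = () => true,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts || !shouldRetry(error)) break;
      // With the default base, delays are 500 ms, 1000 ms, 2000 ms, ...
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Usage: `await withRetry(() => voice.connect());`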

## Troubleshooting

### Connection Issues

- Verify AWS credentials are configured correctly
- Check that Nova 2 Sonic model access is enabled in the Amazon Bedrock console
- Ensure your IAM role or user has the permissions listed under IAM Permissions
- Verify the configured region is one of the supported regions

### Audio Issues

- Ensure input audio is 16-bit PCM, 16 kHz, mono
- Resample audio captured at other rates (for example 44.1 kHz or 48 kHz) to 16 kHz before sending
- Verify audio stream is not empty
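
If the capture device records at 44.1 kHz or 48 kHz, the samples must be converted to 16 kHz before sending. This sketch uses linear interpolation, which is simple but applies no low-pass filter, so high-frequency content can alias when downsampling; a dedicated resampling library is a better choice for production audio. `resampleLinear` is a hypothetical helper and assumes mono PCM16 input.

```typescript
// Hypothetical helper: linear-interpolation resampler for mono PCM16.
// No anti-aliasing filter is applied when downsampling.
function resampleLinear(input: Int16Array, fromRate: number, toRate = 16000): Int16Array {
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Int16Array(outLength);
  const ratio = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio; // fractional index into the input
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = Math.round(input[left] * (1 - frac) + input[right] * frac);
  }
  return output;
}
```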

### Authentication Issues

- Check that your AWS credentials are valid and not expired
- Verify IAM permissions include Bedrock access (`bedrock:InvokeModelWithBidirectionalStream`)
- Ensure the configured region matches the region where model access was granted

## License

Apache-2.0

## Links

- [Mastra Documentation](https://mastra.ai)
- [AWS Nova 2 Sonic Documentation](https://docs.aws.amazon.com/nova/latest/nova2-userguide/using-conversational-speech.html)
- [Amazon Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)