@mastra/voice-aws-nova-sonic 0.0.0
- package/CHANGELOG.md +17 -0
- package/README.md +383 -0
- package/dist/index.cjs +1539 -0
- package/dist/index.cjs.map +1 -0
- package/dist/index.d.ts +269 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +1535 -0
- package/dist/index.js.map +1 -0
- package/dist/types.d.ts +354 -0
- package/dist/types.d.ts.map +1 -0
- package/dist/utils/auth.d.ts +6 -0
- package/dist/utils/auth.d.ts.map +1 -0
- package/dist/utils/errors.d.ts +17 -0
- package/dist/utils/errors.d.ts.map +1 -0
- package/package.json +70 -0
package/CHANGELOG.md
ADDED
@@ -0,0 +1,17 @@

# Changelog

All notable changes to this package will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.12.0-beta.0] - 2026-02-18

### Added

- Initial release of AWS Nova 2 Sonic voice integration
- Support for bidirectional streaming speech-to-speech
- AWS SigV4 authentication for WebSocket connections
- Polyglot voice support with multiple languages
- Tool/function calling support
- Cross-modal input (audio and text)

package/README.md
ADDED
@@ -0,0 +1,383 @@

# @mastra/voice-aws-nova-sonic

Mastra integration for AWS Nova 2 Sonic, providing real-time bidirectional speech-to-speech capabilities using Amazon Bedrock's bidirectional streaming API.

## Features

- **Real-time bidirectional streaming**: Continuous audio streaming in both directions
- **Multilingual support**: English, French, Italian, German, Spanish, Portuguese, and Hindi
- **Polyglot voices**: Voices that can speak multiple languages within the same session
- **Barge-in support**: Users can interrupt the assistant mid-speech; handled server-side by Nova Sonic
- **Tool/function calling**: Support for agentic workflows and async tool execution
- **Cross-modal input**: Audio and text inputs in the same conversation
- **Natural turn-taking**: Voice activity detection with configurable end-of-turn sensitivity
- **Error handling**: `error` events that carry specific error codes (`NovaSonicErrorCode`)

## Installation

```bash
npm install @mastra/voice-aws-nova-sonic
# or
pnpm add @mastra/voice-aws-nova-sonic
# or
yarn add @mastra/voice-aws-nova-sonic
```

## Prerequisites

- Node.js >= 22.13.0
- An AWS account with access to Amazon Bedrock
- AWS credentials configured (see [AWS Setup](#aws-setup))
- Access to the Nova 2 Sonic model in your AWS region

## AWS Setup

### 1. Enable Nova 2 Sonic in Amazon Bedrock

1. Go to the [Amazon Bedrock Console](https://console.aws.amazon.com/bedrock/)
2. Navigate to "Model access" in the left sidebar
3. Request access to the "Amazon Nova 2 Sonic" model
4. Wait for approval (usually instant)

### 2. Configure AWS Credentials

You can configure AWS credentials in several ways:

**Option 1: Environment Variables**

```bash
export AWS_ACCESS_KEY_ID=your-access-key-id
export AWS_SECRET_ACCESS_KEY=your-secret-access-key
export AWS_REGION=us-east-1
```
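
If you rely on environment variables, a pre-flight check can name a missing variable before `connect()` fails. The sketch below is a hypothetical helper (`findMissingAwsEnv` is not part of this package) that checks only the three variables from Option 1; an empty result is not required when credentials come from a credentials file or an IAM role instead.

```typescript
// Hypothetical pre-flight helper (not part of @mastra/voice-aws-nova-sonic).
// Only checks the environment-variable option; credentials files and IAM
// roles are resolved by the AWS default credential chain instead.
const REQUIRED_AWS_ENV = ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_REGION'] as const;

function findMissingAwsEnv(env: Record<string, string | undefined>): string[] {
  // A variable counts as missing if it is unset or contains only whitespace
  return REQUIRED_AWS_ENV.filter((name) => !env[name]?.trim());
}

// Usage: findMissingAwsEnv(process.env) lists the variables that are still unset
```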

**Option 2: AWS Credentials File**

```ini
# ~/.aws/credentials
[default]
aws_access_key_id = your-access-key-id
aws_secret_access_key = your-secret-access-key
```

**Option 3: IAM Role** (for EC2/Lambda)

- Attach an IAM role with Bedrock permissions to your EC2 instance or Lambda function

**Option 4: Explicit Credentials in Code**

```typescript
import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic';

const voice = new NovaSonicVoice({
  region: 'us-east-1',
  credentials: {
    accessKeyId: 'your-access-key-id',
    secretAccessKey: 'your-secret-access-key',
  },
});
```

### 3. IAM Permissions

Your AWS credentials need the following IAM permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithBidirectionalStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/amazon.nova-2-sonic-v1:0"
    }
  ]
}
```
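
The `Resource` above matches any region (`*`). To scope the policy to the single region you deploy to, you could generate it in code. The sketch below uses a hypothetical `buildNovaSonicPolicy` helper and assumes the foundation-model ARN pattern shown in the policy above (`arn:aws:bedrock:<region>::foundation-model/<model-id>`).

```typescript
// Hypothetical helper: builds the IAM policy above, scoped to one region.
// Assumes the foundation-model ARN format used in the README's policy.
interface PolicyStatement {
  Effect: 'Allow';
  Action: string[];
  Resource: string;
}

interface PolicyDocument {
  Version: '2012-10-17';
  Statement: PolicyStatement[];
}

function buildNovaSonicPolicy(
  region: string,
  modelId = 'amazon.nova-2-sonic-v1:0',
): PolicyDocument {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: ['bedrock:InvokeModel', 'bedrock:InvokeModelWithBidirectionalStream'],
        // Foundation-model ARNs have an empty account-ID field, hence '::'
        Resource: `arn:aws:bedrock:${region}::foundation-model/${modelId}`,
      },
    ],
  };
}

// Serialize for the IAM console or infrastructure-as-code tooling
console.log(JSON.stringify(buildNovaSonicPolicy('us-west-2'), null, 2));
```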

## Usage

### Basic Example

```typescript
import { Agent } from '@mastra/core/agent';
import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic';

// Placeholders for your own audio I/O (not provided by this package)
declare const microphoneStream: NodeJS.ReadableStream; // PCM16 microphone input
declare const yourAudioOutput: NodeJS.WritableStream; // e.g., a speaker or file sink

const agent = new Agent({
  name: 'Nova Sonic Agent',
  instructions: 'You are a helpful assistant with real-time voice capabilities.',
  model: 'openai/gpt-4o',
  voice: new NovaSonicVoice({
    region: 'us-east-1',
    speaker: 'tiffany',
  }),
});

// Connect to the voice service
await agent.voice.connect();

// Listen for agent audio responses (stream of audio data)
agent.voice.on('speaker', (audioStream) => {
  // Pipe to your audio output (e.g., speaker, WebSocket, file)
  audioStream.pipe(yourAudioOutput);
});

// Listen for text transcriptions
agent.voice.on('writing', ({ text, role, generationStage }) => {
  // generationStage is 'SPECULATIVE' (preview) or 'FINAL' (actual transcript)
  console.log(`[${role}] ${text}`);
});

// Send continuous audio from the microphone (NodeJS.ReadableStream of PCM16 audio)
await agent.voice.send(microphoneStream);
```
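
Because `writing` can fire with speculative previews before the final text, logging every event may print the same utterance more than once. A minimal sketch of keeping only final transcripts follows. `TranscriptLog` is a hypothetical helper, and it assumes that an event without `generationStage` should be treated as final, since that field is optional in the event payload.

```typescript
// Payload shape of the 'writing' event, as listed under Events
interface WritingEvent {
  text: string;
  role: 'assistant' | 'user';
  generationStage?: 'SPECULATIVE' | 'FINAL';
}

// Hypothetical helper: records only final transcripts.
// Assumption: a missing generationStage is treated as final.
class TranscriptLog {
  private readonly lines: string[] = [];

  // Arrow property so it can be passed directly as an event callback
  handle = (event: WritingEvent): void => {
    if (event.generationStage === 'SPECULATIVE') return; // skip previews
    this.lines.push(`[${event.role}] ${event.text}`);
  };

  toString(): string {
    return this.lines.join('\n');
  }
}

// Wiring (assumes the `agent` from the example above):
// const log = new TranscriptLog();
// agent.voice.on('writing', log.handle);
```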

### Advanced Configuration

```typescript
import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic';

const voice = new NovaSonicVoice({
  region: 'us-east-1', // or 'us-west-2', 'ap-northeast-1'
  model: 'amazon.nova-2-sonic-v1:0',
  speaker: 'matthew', // or 'tiffany', 'amy', etc.
  languageCode: 'en-US',
  instructions: 'You are a helpful assistant.',
  sessionConfig: {
    tools: [
      {
        name: 'search',
        description: 'Search the web',
        inputSchema: {
          type: 'object',
          properties: {
            query: { type: 'string' },
          },
          required: ['query'],
        },
      },
    ],
    turnDetectionConfiguration: {
      // HIGH = fastest (1.5s pause), MEDIUM = balanced (1.75s), LOW = slowest (2s)
      endpointingSensitivity: 'MEDIUM',
    },
  },
  debug: true,
});

await voice.connect();
```

### With Tools

```typescript
import { Agent } from '@mastra/core/agent';
import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic';
import { createTool } from '@mastra/core/tools';
import { z } from 'zod';

const weatherTool = createTool({
  id: 'weather',
  description: 'Get weather information',
  inputSchema: z.object({
    location: z.string(),
  }),
  execute: async ({ context }) => {
    // Fetch weather data
    return { temperature: 72, condition: 'sunny' };
  },
});

const agent = new Agent({
  name: 'Weather Agent',
  instructions: 'You help users get weather information.',
  model: 'openai/gpt-4o',
  tools: {
    weather: weatherTool,
  },
  voice: new NovaSonicVoice({
    region: 'us-east-1',
  }),
});

await agent.voice.connect();
// Tools are automatically available to the voice model
```

### Cross-Modal Text Input

Send text messages during an active voice session:

```typescript
// After connecting and starting audio streaming
await agent.voice.speak('What is the weather in New York?');
```

## API Reference

### Constructor

```typescript
new NovaSonicVoice(config?: NovaSonicVoiceConfig)
```

**Configuration Options:**

- `region` (string, optional): AWS region. Default: `'us-east-1'`. Supported: `'us-east-1'`, `'us-west-2'`, `'ap-northeast-1'`
- `model` (string, optional): Model ID. Default: `'amazon.nova-2-sonic-v1:0'`
- `credentials` (Credentials, optional): AWS credentials. If not provided, the default AWS credential provider chain is used
- `speaker` (string, optional): Voice name/identifier (e.g., `'matthew'`, `'tiffany'`, `'amy'`)
- `languageCode` (string, optional): Language code (e.g., `'en-US'`, `'fr-FR'`)
- `instructions` (string, optional): System instructions for the model
- `tools` (array, optional): Tool definitions
- `sessionConfig` (object, optional): Session configuration, including `turnDetectionConfiguration`, `tools`, and `inferenceConfiguration`
- `debug` (boolean, optional): Enable debug logging. Default: `false`

### Methods

#### `connect(options?)`

Establishes a connection to Amazon Bedrock. Must be called before any other method.

```typescript
await voice.connect();
```

#### `speak(input, options?)`

Send cross-modal text input during an active voice session. Nova Sonic processes the text and responds with audio.

```typescript
await voice.speak('Hello, world!');
```

#### `listen(audioStream, options?)`

Stream audio input for transcription. For Nova Sonic, this is equivalent to `send()`.

```typescript
await voice.listen(audioStream);
```

#### `send(audioData)`

Stream audio data in real time. Accepts a `NodeJS.ReadableStream` (PCM16 audio) or an `Int16Array`.

```typescript
// Stream from a ReadableStream
await voice.send(audioStream);

// Or send an Int16Array of PCM16 samples
const audioArray = new Int16Array(1600); // placeholder: 100 ms of silence at 16 kHz
await voice.send(audioArray);
```
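
Many capture APIs (the Web Audio API, for example) deliver samples as `Float32Array` values in [-1, 1], while `send()` accepts PCM16 as an `Int16Array`. The sketch below converts between the two and splits the result into fixed-size frames. The helper names and the 20 ms frame size are illustrative assumptions, and resampling to 16 kHz (the format listed under Troubleshooting) is not handled.

```typescript
// Convert Float32 samples in [-1, 1] to signed 16-bit PCM.
// Out-of-range input is clamped; negative values scale by 32768 and
// positive values by 32767, so both ends map exactly onto the Int16 range.
function float32ToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}

// Split PCM16 audio into frames of `frameMs` milliseconds; the last frame
// may be shorter. slice() copies, so each frame owns its own buffer, which
// is safer if a consumer reads frame.buffer directly.
function toFrames(pcm: Int16Array, sampleRate = 16000, frameMs = 20): Int16Array[] {
  const samplesPerFrame = Math.round((sampleRate * frameMs) / 1000); // 320 at 16 kHz, 20 ms
  const frames: Int16Array[] = [];
  for (let start = 0; start < pcm.length; start += samplesPerFrame) {
    frames.push(pcm.slice(start, start + samplesPerFrame));
  }
  return frames;
}

// Usage (assumes a connected `voice` and 16 kHz Float32 audio in `captured`):
// for (const frame of toFrames(float32ToPcm16(captured))) {
//   await voice.send(frame);
// }
```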

#### `close()`

Disconnect and clean up resources.

```typescript
voice.close();
```

#### `on(event, callback)`

Register an event listener.

```typescript
voice.on('speaking', ({ audio }) => {
  // audio is a base64-encoded string of PCM audio
});

voice.on('writing', ({ text, role, generationStage }) => {
  // generationStage: 'SPECULATIVE' (preview) or 'FINAL' (actual transcript)
  console.log(`${role}: ${text}`);
});

voice.on('error', ({ message, code }) => {
  console.error(`Error: ${message} (${code})`);
});
```
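
The `audio` field of `speaking` is base64-encoded PCM. The same event also carries `audioData` as a `Buffer`, which is usually the simpler choice in Node.js; if you only have the string (for example, after relaying it over your own socket), a decoder could look like the sketch below. It assumes 16-bit signed little-endian samples, which this README does not state explicitly.

```typescript
// Decode base64-encoded PCM16 audio into samples.
// Assumption: 16-bit signed, little-endian (the usual PCM16 layout).
function base64ToPcm16(b64: string): Int16Array {
  const binary = atob(b64); // one char per byte, code points 0-255
  if (binary.length % 2 !== 0) {
    throw new Error(`PCM16 data must have an even byte length, got ${binary.length}`);
  }
  const view = new DataView(new ArrayBuffer(binary.length));
  for (let i = 0; i < binary.length; i++) {
    view.setUint8(i, binary.charCodeAt(i));
  }
  const samples = new Int16Array(binary.length / 2);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true); // true = little-endian
  }
  return samples;
}

// Usage, where play() is your own (hypothetical) playback function:
// voice.on('speaking', ({ audio }) => play(base64ToPcm16(audio)));
```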

#### `off(event, callback)`

Remove an event listener. Pass the same function reference that was given to `on()`.

```typescript
const onSpeaking = ({ audio }: { audio: string }) => console.log(audio.length);
voice.on('speaking', onSpeaking);
voice.off('speaking', onSpeaking);
```

### Events

- **`speaker`**: Audio stream (`NodeJS.ReadableStream`) for the full response
- **`speaking`**: Audio chunk `{ audio: string, audioData: Buffer, response_id?: string }`
- **`writing`**: Text transcription `{ text: string, role: 'assistant' | 'user', generationStage?: 'SPECULATIVE' | 'FINAL' }`
- **`error`**: Error event `{ message: string, code?: string, details?: unknown }`
- **`toolCall`**: Tool invocation `{ name: string, args: Record<string, any>, id: string }`
- **`turnComplete`**: Turn completion `{ timestamp: number }`
- **`interrupt`**: Barge-in detected `{ type: string, timestamp: number }`
- **`contentStart`**: Content block started (raw Nova Sonic event)
- **`contentEnd`**: Content block ended (raw Nova Sonic event)
- **`usage`**: Token usage `{ inputTokens: number, outputTokens: number, totalTokens: number }`
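
Barge-in is handled server-side, but audio your application has already buffered keeps playing unless you drop it. A common client-side pattern is to queue `speaking` chunks and clear the queue on `interrupt`; the sketch below shows this with a hypothetical `PlaybackQueue` class. How chunks are actually played, and whether your player also needs an explicit stop, depends on your audio stack.

```typescript
// Hypothetical client-side playback buffer (not part of this package).
// Chunks from 'speaking' are queued; 'interrupt' discards anything not yet played.
class PlaybackQueue {
  private chunks: Uint8Array[] = [];

  enqueue(chunk: Uint8Array): void {
    this.chunks.push(chunk);
  }

  // Called by your player when it is ready for more audio (FIFO order).
  next(): Uint8Array | undefined {
    return this.chunks.shift();
  }

  // Drop pending audio so playback stops promptly after barge-in.
  // Returns the number of discarded chunks.
  clear(): number {
    const dropped = this.chunks.length;
    this.chunks = [];
    return dropped;
  }

  get pending(): number {
    return this.chunks.length;
  }
}

// Wiring (assumes a connected `voice`; Buffer is a Uint8Array subclass):
// const queue = new PlaybackQueue();
// voice.on('speaking', ({ audioData }) => queue.enqueue(audioData));
// voice.on('interrupt', () => queue.clear());
```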

## Supported Regions

- `us-east-1` (US East - N. Virginia)
- `us-west-2` (US West - Oregon)
- `ap-northeast-1` (Asia Pacific - Tokyo)
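
To fail fast on a mistyped region rather than at connection time, you could validate it before constructing `NovaSonicVoice`. A minimal sketch, assuming the three regions listed above are the complete set; availability can change, so treat the list as something to keep current. `resolveRegion` is a hypothetical helper.

```typescript
// Regions listed in this README; update if Nova 2 Sonic becomes available elsewhere.
const SUPPORTED_REGIONS = ['us-east-1', 'us-west-2', 'ap-northeast-1'] as const;
type SupportedRegion = (typeof SUPPORTED_REGIONS)[number];

// Type guard: narrows an arbitrary string to SupportedRegion.
function isSupportedRegion(region: string): region is SupportedRegion {
  return (SUPPORTED_REGIONS as readonly string[]).includes(region);
}

// Resolve a region, falling back to the package default ('us-east-1').
function resolveRegion(candidate: string | undefined): SupportedRegion {
  const region = candidate ?? 'us-east-1';
  if (!isSupportedRegion(region)) {
    throw new Error(
      `Region "${region}" is not in the supported list: ${SUPPORTED_REGIONS.join(', ')}`,
    );
  }
  return region;
}

// Usage: new NovaSonicVoice({ region: resolveRegion(process.env.AWS_REGION) })
```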

## Supported Languages

- English (US, UK, India, Australia)
- French
- Italian
- German
- Spanish
- Portuguese
- Hindi

## Error Handling

Errors are emitted as `error` events that carry a specific error code (`NovaSonicErrorCode`):

```typescript
import { NovaSonicError, NovaSonicErrorCode } from '@mastra/voice-aws-nova-sonic';

voice.on('error', ({ message, code, details }) => {
  if (code === NovaSonicErrorCode.CONNECTION_FAILED) {
    // Handle connection error
  } else if (code === NovaSonicErrorCode.CREDENTIALS_MISSING) {
    // Handle credentials error
  }
});
```

## Troubleshooting

### Connection Issues

- Verify that AWS credentials are configured correctly
- Check that Nova 2 Sonic is enabled in your Amazon Bedrock console
- Ensure your IAM role/user has the required permissions
- Verify that the region supports Nova 2 Sonic (see [Supported Regions](#supported-regions))

### Audio Issues

- Ensure the audio format is compatible (PCM, 16-bit, 16 kHz)
- Check that the sample rate matches the expected format
- Verify that the audio stream is not empty
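
For 16-bit audio at 16 kHz, one second is 16,000 samples × 2 bytes = 32,000 bytes per channel. Comparing a buffer's size with that rate is a quick way to spot an empty stream or a sample-rate mismatch: 48 kHz capture sent unconverted would appear three times longer than it really is. The hypothetical helper below assumes mono audio, which this README does not state explicitly.

```typescript
// Duration of raw PCM audio computed from its byte length.
// Defaults assume mono, 16-bit (2-byte) samples at 16 kHz.
function pcmDurationMs(
  byteLength: number,
  sampleRate = 16000,
  bytesPerSample = 2,
  channels = 1,
): number {
  const bytesPerSecond = sampleRate * bytesPerSample * channels; // 32,000 by default
  return (byteLength / bytesPerSecond) * 1000;
}

// Usage: a result of 0 means an empty chunk; a result far from the
// duration you actually recorded suggests a sample-rate mismatch.
```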

### Authentication Issues

- Check that AWS credentials are valid
- Verify that IAM permissions include Bedrock access
- Ensure the region is correct

## License

Apache-2.0

## Links

- [Mastra Documentation](https://mastra.ai)
- [AWS Nova 2 Sonic Documentation](https://docs.aws.amazon.com/nova/latest/nova2-userguide/using-conversational-speech.html)
- [Amazon Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)