@mastra/voice-google-gemini-live 0.12.0-alpha.3 → 0.12.0
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,68 @@
 # @mastra/voice-google-gemini-live
 
+## 0.12.0
+
+### Minor Changes
+
+- Surface native-audio behavioral signals on Gemini Live realtime sessions (#17021). ([#17434](https://github.com/mastra-ai/mastra/pull/17434))
+
+  The `@mastra/voice-google-gemini-live` provider now enables transcription and barge-in detection in the setup payload and exposes them through Mastra's standard realtime event contract. This makes native-audio models such as `gemini-2.5-flash-native-audio-preview-12-2025` and `gemini-3.1-flash-live-preview` behaviorally usable end-to-end. Until now, the spoken response was silently dropped on native-audio because it arrives on a different wire channel from the model's internal reasoning.
+
+  **What changed**
+  - Setup payload unconditionally includes `input_audio_transcription`, `output_audio_transcription`, and `realtime_input_config.activity_handling = 'START_OF_ACTIVITY_INTERRUPTS'`, matching how the OpenAI, xAI, Inworld, and AWS Nova Sonic providers enable transcription by default.
+  - User-side transcripts emit as `writing` with `role: 'user'`. Model-side transcripts emit as `writing` with `role: 'assistant'`. This matches the cross-provider `writing` contract.
+  - Barge-in (the server cancelling its in-flight response when the user starts speaking) emits an `interrupt` event with `{ type: 'user', timestamp }`, matching `@mastra/voice-aws-nova-sonic`.
+  - On native-audio models, `modelTurn.parts.text` is the model's internal chain-of-thought, not the spoken response. It now emits as a Gemini-specific `thinking` event so consumers can render reasoning separately. On non-native-audio models, `modelTurn.parts.text` continues to emit as `writing` (it is the spoken response there).
+
+  **Example**
+
+  ```ts
+  import { GeminiLiveVoice } from '@mastra/voice-google-gemini-live';
+
+  const voice = new GeminiLiveVoice({
+    apiKey: process.env.GOOGLE_API_KEY,
+    model: 'gemini-2.5-flash-native-audio-preview-12-2025',
+  });
+
+  voice.on('writing', ({ text, role }) => {
+    // role: 'user' → speech-to-text of the caller
+    // role: 'assistant' → speech-to-text of the model's spoken reply
+  });
+
+  voice.on('thinking', ({ text }) => {
+    // Gemini's internal reasoning on native-audio models
+  });
+
+  voice.on('interrupt', ({ type, timestamp }) => {
+    // Drop queued TTS audio — the user just barged in
+  });
+
+  await voice.connect();
+  ```
+
+### Patch Changes
+
+- Moved shared voice primitives and route metadata into the new `@internal/voice` package so voice providers no longer depend on `@mastra/core` and server voice routes share the same route definitions. ([#16725](https://github.com/mastra-ai/mastra/pull/16725))
+
+  `@mastra/core/voice` continues to re-export the voice APIs for backwards compatibility.
+
+- Fixed Gemini Live tool registration failing with `1007 Unknown name` errors for tools using discriminated unions, literals, and nullable types. The `sanitizeToolParameters` method now rewrites `oneOf` → `anyOf`, `const` → `enum`, and collapses nullable `anyOf` patterns into OpenAPI 3.0-compatible `type` + `nullable: true` form. Fixes #17020. ([#17179](https://github.com/mastra-ai/mastra/pull/17179))
+
+- **Fixed** Gemini Live sessions now connect successfully when using native-audio models. Previously the connection failed during session setup. ([#17019](https://github.com/mastra-ai/mastra/pull/17019))
+
+  **Fixed** tools are now invoked correctly. Previously tool calls were silently ignored even when tools were registered during setup.
+
+  **Fixed** tool results of any shape (arrays, primitives, objects) are now accepted. Previously, non-object tool return values caused sessions to close unexpectedly.
+
+  **Fixed** the `speaker` option is now honored when passed at the `VoiceConfig` root alongside `realtimeConfig`, not only when passed in the flat config shape.
+
+  **Changed** default model from `gemini-2.0-flash-exp` (shut down 2025-12-09) to `gemini-3.1-flash-live-preview` (Google's current Live API quickstart model). If you weren't explicitly setting `model`, your sessions will start connecting again.
+
+  Fixes #17018.
+
+- Updated dependencies [[`00eca42`](https://github.com/mastra-ai/mastra/commit/00eca4252393aa114dc8c9a5e1da68df91fa06cf), [`ff9d743`](https://github.com/mastra-ai/mastra/commit/ff9d743f71d7e072927725c0d700632aca0c1fee)]:
+  - @mastra/schema-compat@1.2.11
+
 ## 0.12.0-alpha.3
 
 ### Minor Changes
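The schema rewrite that the 0.12.0 patch notes attribute to `sanitizeToolParameters` can be pictured roughly as below. This is a minimal sketch under stated assumptions, not the package's implementation: `sanitizeSchema`, `JsonSchema`, and `isPlainObject` are hypothetical names, and the nullable pattern is assumed to be exactly two `anyOf` branches, one of which is `{ type: 'null' }`.

```typescript
// Hypothetical sketch only: names and structure are illustrative and are
// not taken from @mastra/voice-google-gemini-live.
type JsonSchema = { [key: string]: unknown };

function isPlainObject(value: unknown): value is JsonSchema {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function sanitizeSchema(node: unknown): unknown {
  if (Array.isArray(node)) return node.map(item => sanitizeSchema(item));
  if (!isPlainObject(node)) return node;

  const out: JsonSchema = {};
  for (const [key, value] of Object.entries(node)) {
    if (key === 'oneOf') {
      // oneOf -> anyOf
      out.anyOf = sanitizeSchema(value);
    } else if (key === 'const') {
      // const -> single-value enum
      out.enum = [value];
    } else if (key === 'properties' && isPlainObject(value)) {
      // Property names are data, not keywords: keep them, sanitize their schemas.
      const props: JsonSchema = {};
      for (const [name, sub] of Object.entries(value)) props[name] = sanitizeSchema(sub);
      out.properties = props;
    } else if (key === 'enum' || key === 'default') {
      // Literal values are copied through untouched.
      out[key] = value;
    } else {
      out[key] = sanitizeSchema(value);
    }
  }

  // Collapse anyOf: [X, { type: 'null' }] into X plus nullable: true.
  const branches = out.anyOf;
  if (Array.isArray(branches) && branches.length === 2) {
    const nonNull = branches.filter(b => !(isPlainObject(b) && b.type === 'null'));
    const only: unknown = nonNull[0];
    if (nonNull.length === 1 && isPlainObject(only)) {
      const rest: JsonSchema = { ...out };
      delete rest.anyOf;
      return { ...rest, ...only, nullable: true };
    }
  }
  return out;
}

// Usage: a tool parameter schema with a literal, a union, and a nullable field.
const sanitized = sanitizeSchema({
  type: 'object',
  properties: {
    kind: { const: 'circle' },
    size: { oneOf: [{ type: 'string' }, { type: 'number' }] },
    note: { anyOf: [{ type: 'string' }, { type: 'null' }] },
  },
});
console.log(JSON.stringify(sanitized, null, 2));
```

The sketch walks the schema once and leaves unknown keywords in place, so anything it does not recognize is passed through rather than dropped. The real method may handle more cases than these three.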
package/dist/docs/SKILL.md
CHANGED
@@ -3,7 +3,7 @@ name: mastra-voice-google-gemini-live
 description: Documentation for @mastra/voice-google-gemini-live. Use when working with @mastra/voice-google-gemini-live APIs, configuration, or implementation.
 metadata:
   package: "@mastra/voice-google-gemini-live"
-  version: "0.12.0
+  version: "0.12.0"
 ---
 
 ## When to use
@@ -16,7 +16,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new OpenAIVoice(),
 })
 ```
@@ -40,7 +40,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new OpenAIVoice(),
 })
 
@@ -68,7 +68,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new AzureVoice(),
 })
 
@@ -95,7 +95,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new ElevenLabsVoice(),
 })
 
@@ -122,7 +122,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new PlayAIVoice(),
 })
 
@@ -149,7 +149,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new GoogleVoice(),
 })
 
@@ -176,7 +176,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new CloudflareVoice(),
 })
 
@@ -203,7 +203,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new DeepgramVoice(),
 })
 
@@ -230,7 +230,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new InworldVoice(),
 })
 
@@ -257,7 +257,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new SpeechifyVoice(),
 })
 
@@ -284,7 +284,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new SarvamVoice(),
 })
 
@@ -311,7 +311,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new MurfVoice(),
 })
 
@@ -346,7 +346,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new OpenAIVoice(),
 })
 
@@ -375,7 +375,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new AzureVoice(),
 })
 
@@ -403,7 +403,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new ElevenLabsVoice(),
 })
 
@@ -431,7 +431,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new GoogleVoice(),
 })
 
@@ -459,7 +459,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new CloudflareVoice(),
 })
 
@@ -487,7 +487,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new DeepgramVoice(),
 })
 
@@ -515,7 +515,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new InworldVoice(),
 })
 
@@ -543,7 +543,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new SarvamVoice(),
 })
 
@@ -575,7 +575,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new OpenAIRealtimeVoice(),
 })
 
@@ -605,7 +605,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new GeminiLiveVoice({
     // Live API mode
     apiKey: process.env.GOOGLE_API_KEY,
@@ -654,7 +654,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new NovaSonicVoice({
     region: 'us-east-1',
     speaker: 'matthew',
@@ -697,7 +697,7 @@ const voiceAgent = new Agent({
   id: 'voice-agent',
   name: 'Voice Agent',
   instructions: 'You are a voice assistant that can help users with their tasks.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new InworldRealtimeVoice({
     apiKey: process.env.INWORLD_API_KEY,
     model: 'inworld/models/gemma-4-26b-a4b-it',
@@ -1132,7 +1132,7 @@ const voiceAgent = new Agent({
   id: 'aisdk-voice-agent',
   name: 'AI SDK Voice Agent',
   instructions: 'You are a helpful assistant with voice capabilities.',
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice,
 })
 ```
@@ -32,7 +32,7 @@ const agent = new Agent({
   id: 'agent',
   name: 'OpenAI Realtime Agent',
   instructions: `You are a helpful assistant with real-time voice capabilities.`,
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new OpenAIRealtimeVoice(),
 })
 
@@ -66,7 +66,7 @@ const agent = new Agent({
   name: 'Gemini Live Agent',
   instructions: 'You are a helpful assistant with real-time voice capabilities.',
   // Model used for text generation; voice provider handles realtime audio
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new GeminiLiveVoice({
     apiKey: process.env.GOOGLE_API_KEY,
     model: 'gemini-2.0-flash-exp',
@@ -113,7 +113,7 @@ const agent = new Agent({
   name: 'Nova Sonic Agent',
   instructions: 'You are a helpful assistant with real-time voice capabilities.',
   // Model used for text generation; voice provider handles realtime audio
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new NovaSonicVoice({
     region: 'us-east-1',
     speaker: 'matthew',
@@ -157,7 +157,7 @@ const agent = new Agent({
   name: 'Inworld Realtime Agent',
   instructions: 'You are a helpful assistant with real-time voice capabilities.',
   // Model used for text generation; voice provider handles realtime audio
-  model: 'openai/gpt-5.
+  model: 'openai/gpt-5.5',
   voice: new InworldRealtimeVoice({
     apiKey: process.env.INWORLD_API_KEY,
     model: 'inworld/models/gemma-4-26b-a4b-it',
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@mastra/voice-google-gemini-live",
-  "version": "0.12.0
+  "version": "0.12.0",
   "description": "Mastra Google Gemini Live API integration",
   "type": "module",
   "files": [
@@ -27,7 +27,7 @@
     "@google/genai": "^1.45.0",
     "google-auth-library": "^10.6.1",
     "ws": "^8.20.0",
-    "@mastra/schema-compat": "1.2.11
+    "@mastra/schema-compat": "1.2.11"
   },
   "devDependencies": {
     "@types/node": "22.19.15",
@@ -40,10 +40,10 @@
     "typescript": "^6.0.3",
     "vitest": "4.1.5",
     "zod": "^4.4.3",
-    "@internal/
-    "@internal/
-    "@internal/
-    "@internal/
+    "@internal/lint": "0.0.100",
+    "@internal/test-utils": "0.0.36",
+    "@internal/voice": "0.0.1",
+    "@internal/types-builder": "0.0.75"
   },
   "homepage": "https://mastra.ai",
   "repository": {