@mastra/voice-google-gemini-live 0.12.0-alpha.3 → 0.12.0

package/CHANGELOG.md CHANGED
@@ -1,5 +1,68 @@
  # @mastra/voice-google-gemini-live
 
+ ## 0.12.0
+
+ ### Minor Changes
+
+ - Surface native-audio behavioral signals on Gemini Live realtime sessions (#17021). ([#17434](https://github.com/mastra-ai/mastra/pull/17434))
+
+   The `@mastra/voice-google-gemini-live` provider now enables transcription and barge-in detection in the setup payload and exposes them through Mastra's standard realtime event contract. This makes native-audio models such as `gemini-2.5-flash-native-audio-preview-12-2025` and `gemini-3.1-flash-live-preview` behaviorally usable end-to-end. Until now, the spoken response was silently dropped on native-audio because it arrives on a different wire channel from the model's internal reasoning.
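A rough picture of what the setup change adds, as a sketch only: `buildSetupPayload` and `SetupPayloadSketch` are hypothetical names, the snake_case field names are copied from this changelog entry, and the `models/` prefix on the model id is an assumption based on the Live API's resource-name format. Other setup fields the provider sends, such as tool declarations, are omitted.

```ts
// Hypothetical helper: NOT an export of @mastra/voice-google-gemini-live.
// It mirrors only the three setup fields this release turns on unconditionally.

interface SetupPayloadSketch {
  setup: {
    model: string;
    // Empty config objects switch transcription on for each direction.
    input_audio_transcription: Record<string, never>;
    output_audio_transcription: Record<string, never>;
    realtime_input_config: {
      // The server cancels its in-flight response when the user starts speaking.
      activity_handling: 'START_OF_ACTIVITY_INTERRUPTS';
    };
  };
}

function buildSetupPayload(model: string): SetupPayloadSketch {
  return {
    setup: {
      // Assumption: the Live API expects a "models/<id>" resource name.
      model: model.startsWith('models/') ? model : `models/${model}`,
      input_audio_transcription: {},
      output_audio_transcription: {},
      realtime_input_config: { activity_handling: 'START_OF_ACTIVITY_INTERRUPTS' },
    },
  };
}

// In this sketch the payload is serialized and sent as the first WebSocket message.
const payload = buildSetupPayload('gemini-3.1-flash-live-preview');
console.log(JSON.stringify(payload));
```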
+
+   **What changed**
+   - Setup payload unconditionally includes `input_audio_transcription`, `output_audio_transcription`, and `realtime_input_config.activity_handling = 'START_OF_ACTIVITY_INTERRUPTS'`, matching how the OpenAI, xAI, Inworld, and AWS Nova Sonic providers enable transcription by default.
+   - User-side transcripts emit as `writing` with `role: 'user'`. Model-side transcripts emit as `writing` with `role: 'assistant'`. This matches the cross-provider `writing` contract.
+   - Barge-in (the server cancelling its in-flight response when the user starts speaking) emits an `interrupt` event with `{ type: 'user', timestamp }`, matching `@mastra/voice-aws-nova-sonic`.
+   - On native-audio models, `modelTurn.parts.text` is the model's internal chain-of-thought, not the spoken response. It now emits as a Gemini-specific `thinking` event so consumers can render reasoning separately. On non-native-audio models, `modelTurn.parts.text` continues to emit as `writing` (it is the spoken response there).
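The routing rules in the list above amount to a small dispatch on each server message. The sketch below is hypothetical and is not the provider's source: `ServerContentSketch` is a simplified subset of the Live API's `serverContent` message, and `isNativeAudio` is passed in because this entry does not say how the provider detects native-audio models. The order in which events are emitted within one message is also an assumption.

```ts
// Hypothetical routing sketch; none of these names are Mastra exports.
interface ServerContentSketch {
  modelTurn?: { parts?: Array<{ text?: string }> };
  inputTranscription?: { text?: string };
  outputTranscription?: { text?: string };
  interrupted?: boolean;
}

type RoutedEvent =
  | { name: 'writing'; data: { text: string; role: 'user' | 'assistant' } }
  | { name: 'thinking'; data: { text: string } }
  | { name: 'interrupt'; data: { type: 'user'; timestamp: number } };

function routeServerContent(
  content: ServerContentSketch,
  isNativeAudio: boolean,
  now: () => number = Date.now,
): RoutedEvent[] {
  const events: RoutedEvent[] = [];

  // Speech-to-text of the caller arrives on its own channel.
  const userText = content.inputTranscription?.text;
  if (userText) events.push({ name: 'writing', data: { text: userText, role: 'user' } });

  // Speech-to-text of the model's spoken reply.
  const replyText = content.outputTranscription?.text;
  if (replyText) events.push({ name: 'writing', data: { text: replyText, role: 'assistant' } });

  // modelTurn text is reasoning on native-audio models, the reply elsewhere.
  for (const part of content.modelTurn?.parts ?? []) {
    if (!part.text) continue;
    events.push(
      isNativeAudio
        ? { name: 'thinking', data: { text: part.text } }
        : { name: 'writing', data: { text: part.text, role: 'assistant' } },
    );
  }

  // Barge-in: the server cancelled its in-flight response.
  if (content.interrupted) {
    events.push({ name: 'interrupt', data: { type: 'user', timestamp: now() } });
  }
  return events;
}
```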
+
+   **Example**
+
+   ```ts
+   import { GeminiLiveVoice } from '@mastra/voice-google-gemini-live';
+
+   const voice = new GeminiLiveVoice({
+     apiKey: process.env.GOOGLE_API_KEY,
+     model: 'gemini-2.5-flash-native-audio-preview-12-2025',
+   });
+
+   voice.on('writing', ({ text, role }) => {
+     // role: 'user' → speech-to-text of the caller
+     // role: 'assistant' → speech-to-text of the model's spoken reply
+   });
+
+   voice.on('thinking', ({ text }) => {
+     // Gemini's internal reasoning on native-audio models
+   });
+
+   voice.on('interrupt', ({ type, timestamp }) => {
+     // Drop queued TTS audio — the user just barged in
+   });
+
+   await voice.connect();
+   ```
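On the consumer side, the `interrupt` handler in the example mostly has to discard audio that was received but not yet played. A minimal sketch, assuming a hypothetical `PlaybackQueue` of PCM chunks; `InterruptSource` is a structural stand-in for anything with an `on('interrupt', cb)` method, so the code does not depend on Mastra's exported types.

```ts
// Hypothetical client-side buffer for audio received but not yet played.
class PlaybackQueue {
  private chunks: Int16Array[] = [];

  enqueue(chunk: Int16Array): void {
    this.chunks.push(chunk);
  }

  // Next chunk for the audio output, or undefined when the buffer is empty.
  next(): Int16Array | undefined {
    return this.chunks.shift();
  }

  get size(): number {
    return this.chunks.length;
  }

  // Drops everything still buffered and reports how many chunks were discarded.
  clear(): number {
    const dropped = this.chunks.length;
    this.chunks = [];
    return dropped;
  }
}

// Structural stand-in (assumption) for the voice instance's event API.
interface InterruptSource {
  on(event: 'interrupt', cb: (e: { type: 'user'; timestamp: number }) => void): unknown;
}

function attachBargeIn(voice: InterruptSource, queue: PlaybackQueue): void {
  voice.on('interrupt', () => {
    // The reply the user talked over was cancelled server-side; stop playing it.
    queue.clear();
  });
}

// Usage with the `voice` instance from the example above (compatibility assumed):
// const queue = new PlaybackQueue();
// attachBargeIn(voice, queue);
```

Clearing rather than pausing is the simpler choice here: after barge-in the server has already cancelled that response, so the buffered tail is stale.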
42
+
43
+ ### Patch Changes
44
+
45
+ - Moved shared voice primitives and route metadata into the new `@internal/voice` package so voice providers no longer depend on `@mastra/core` and server voice routes share the same route definitions. ([#16725](https://github.com/mastra-ai/mastra/pull/16725))
46
+
47
+ `@mastra/core/voice` continues to re-export the voice APIs for backwards compatibility.
48
+
49
+ - Fixed Gemini Live tool registration failing with `1007 Unknown name` errors for tools using discriminated unions, literals, and nullable types. The `sanitizeToolParameters` method now rewrites `oneOf` → `anyOf`, `const` → `enum`, and collapses nullable `anyOf` patterns into OpenAPI 3.0-compatible `type` + `nullable: true` form. Fixes #17020. ([#17179](https://github.com/mastra-ai/mastra/pull/17179))
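The three rewrites named in that entry can be sketched as a recursive walk over a JSON Schema object. This is not Mastra's `sanitizeToolParameters`; `sanitizeSchema` is a hypothetical stand-in that handles only the cases listed, and it treats `properties` keys as field names rather than keywords, so a field literally called `const` is left alone.

```ts
type JsonSchema = { [key: string]: unknown };

// Keywords whose values are data, not subschemas; copied verbatim.
const DATA_KEYWORDS = new Set(['enum', 'default', 'examples']);

function isNullSchema(value: unknown): boolean {
  return typeof value === 'object' && value !== null && (value as JsonSchema).type === 'null';
}

// Hypothetical stand-in for the rewrite rules described in the changelog entry.
function sanitizeSchema(input: unknown): unknown {
  if (Array.isArray(input)) return input.map(sanitizeSchema);
  if (typeof input !== 'object' || input === null) return input;

  const out: JsonSchema = {};
  for (const [key, value] of Object.entries(input as JsonSchema)) {
    if (key === 'oneOf') {
      // oneOf → anyOf
      out.anyOf = sanitizeSchema(value);
    } else if (key === 'const') {
      // const → single-value enum
      out.enum = [value];
    } else if (key === 'properties' && typeof value === 'object' && value !== null) {
      // Property names are user data; only their subschemas are rewritten.
      out.properties = Object.fromEntries(
        Object.entries(value as JsonSchema).map(([name, sub]) => [name, sanitizeSchema(sub)]),
      );
    } else if (DATA_KEYWORDS.has(key)) {
      out[key] = value;
    } else {
      out[key] = sanitizeSchema(value);
    }
  }

  // Collapse { anyOf: [X, { type: 'null' }] } into X plus nullable: true.
  const anyOf = out.anyOf;
  if (Array.isArray(anyOf)) {
    const nonNull = anyOf.filter((v: unknown) => !isNullSchema(v));
    if (nonNull.length === 1 && nonNull.length < anyOf.length) {
      const rest: JsonSchema = { ...out };
      delete rest.anyOf;
      return { ...rest, ...(nonNull[0] as JsonSchema), nullable: true };
    }
  }
  return out;
}
```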
+
+ - **Fixed** Gemini Live sessions now connect successfully when using native-audio models. Previously the connection failed during session setup. ([#17019](https://github.com/mastra-ai/mastra/pull/17019))
+
+   **Fixed** tools are now invoked correctly. Previously tool calls were silently ignored even when tools were registered during setup.
+
+   **Fixed** tool results of any shape (arrays, primitives, objects) are now accepted. Previously, non-object tool return values caused sessions to close unexpectedly.
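Gemini's `FunctionResponse.response` field is a JSON object, which is presumably why a bare array or number could break the session. A minimal sketch of the normalization, assuming non-object values are wrapped under an `output` key (the convention Google's `FunctionResponse` reference describes; the key Mastra actually uses is not stated in this entry). `toFunctionResponsePayload` is a hypothetical name.

```ts
type JsonObject = { [key: string]: unknown };

function isPlainObject(value: unknown): value is JsonObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical: always hand the Live API an object, whatever the tool returned.
function toFunctionResponsePayload(result: unknown): JsonObject {
  if (isPlainObject(result)) return result;
  // undefined would vanish from JSON, so send an explicit null instead.
  return { output: result === undefined ? null : result };
}
```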
+
+   **Fixed** the `speaker` option is now honored when passed at the `VoiceConfig` root alongside `realtimeConfig`, not only when passed in the flat config shape.
+
+   **Changed** default model from `gemini-2.0-flash-exp` (shut down 2025-12-09) to `gemini-3.1-flash-live-preview` (Google's current Live API quickstart model). If you weren't explicitly setting `model`, your sessions will start connecting again.
+
+   Fixes #17018.
+
+ - Updated dependencies [[`00eca42`](https://github.com/mastra-ai/mastra/commit/00eca4252393aa114dc8c9a5e1da68df91fa06cf), [`ff9d743`](https://github.com/mastra-ai/mastra/commit/ff9d743f71d7e072927725c0d700632aca0c1fee)]:
+   - @mastra/schema-compat@1.2.11
+
  ## 0.12.0-alpha.3
 
  ### Minor Changes
@@ -3,7 +3,7 @@ name: mastra-voice-google-gemini-live
  description: Documentation for @mastra/voice-google-gemini-live. Use when working with @mastra/voice-google-gemini-live APIs, configuration, or implementation.
  metadata:
    package: "@mastra/voice-google-gemini-live"
-   version: "0.12.0-alpha.3"
+   version: "0.12.0"
  ---
 
  ## When to use
@@ -1,5 +1,5 @@
  {
-   "version": "0.12.0-alpha.3",
+   "version": "0.12.0",
    "package": "@mastra/voice-google-gemini-live",
    "exports": {},
    "modules": {}
@@ -16,7 +16,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new OpenAIVoice(),
  })
  ```
@@ -40,7 +40,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new OpenAIVoice(),
  })
 
@@ -68,7 +68,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new AzureVoice(),
  })
 
@@ -95,7 +95,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new ElevenLabsVoice(),
  })
 
@@ -122,7 +122,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new PlayAIVoice(),
  })
 
@@ -149,7 +149,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new GoogleVoice(),
  })
 
@@ -176,7 +176,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new CloudflareVoice(),
  })
 
@@ -203,7 +203,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new DeepgramVoice(),
  })
 
@@ -230,7 +230,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new InworldVoice(),
  })
 
@@ -257,7 +257,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new SpeechifyVoice(),
  })
 
@@ -284,7 +284,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new SarvamVoice(),
  })
 
@@ -311,7 +311,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new MurfVoice(),
  })
 
@@ -346,7 +346,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new OpenAIVoice(),
  })
 
@@ -375,7 +375,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new AzureVoice(),
  })
 
@@ -403,7 +403,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new ElevenLabsVoice(),
  })
 
@@ -431,7 +431,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new GoogleVoice(),
  })
 
@@ -459,7 +459,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new CloudflareVoice(),
  })
 
@@ -487,7 +487,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new DeepgramVoice(),
  })
 
@@ -515,7 +515,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new InworldVoice(),
  })
 
@@ -543,7 +543,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new SarvamVoice(),
  })
 
@@ -575,7 +575,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new OpenAIRealtimeVoice(),
  })
 
@@ -605,7 +605,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new GeminiLiveVoice({
      // Live API mode
      apiKey: process.env.GOOGLE_API_KEY,
@@ -654,7 +654,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new NovaSonicVoice({
      region: 'us-east-1',
      speaker: 'matthew',
@@ -697,7 +697,7 @@ const voiceAgent = new Agent({
    id: 'voice-agent',
    name: 'Voice Agent',
    instructions: 'You are a voice assistant that can help users with their tasks.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new InworldRealtimeVoice({
      apiKey: process.env.INWORLD_API_KEY,
      model: 'inworld/models/gemma-4-26b-a4b-it',
@@ -1132,7 +1132,7 @@ const voiceAgent = new Agent({
    id: 'aisdk-voice-agent',
    name: 'AI SDK Voice Agent',
    instructions: 'You are a helpful assistant with voice capabilities.',
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice,
  })
  ```
@@ -32,7 +32,7 @@ const agent = new Agent({
    id: 'agent',
    name: 'OpenAI Realtime Agent',
    instructions: `You are a helpful assistant with real-time voice capabilities.`,
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new OpenAIRealtimeVoice(),
  })
 
@@ -66,7 +66,7 @@ const agent = new Agent({
    name: 'Gemini Live Agent',
    instructions: 'You are a helpful assistant with real-time voice capabilities.',
    // Model used for text generation; voice provider handles realtime audio
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new GeminiLiveVoice({
      apiKey: process.env.GOOGLE_API_KEY,
      model: 'gemini-2.0-flash-exp',
@@ -113,7 +113,7 @@ const agent = new Agent({
    name: 'Nova Sonic Agent',
    instructions: 'You are a helpful assistant with real-time voice capabilities.',
    // Model used for text generation; voice provider handles realtime audio
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new NovaSonicVoice({
      region: 'us-east-1',
      speaker: 'matthew',
@@ -157,7 +157,7 @@ const agent = new Agent({
    name: 'Inworld Realtime Agent',
    instructions: 'You are a helpful assistant with real-time voice capabilities.',
    // Model used for text generation; voice provider handles realtime audio
-   model: 'openai/gpt-5.4',
+   model: 'openai/gpt-5.5',
    voice: new InworldRealtimeVoice({
      apiKey: process.env.INWORLD_API_KEY,
      model: 'inworld/models/gemma-4-26b-a4b-it',
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@mastra/voice-google-gemini-live",
-   "version": "0.12.0-alpha.3",
+   "version": "0.12.0",
    "description": "Mastra Google Gemini Live API integration",
    "type": "module",
    "files": [
@@ -27,7 +27,7 @@
      "@google/genai": "^1.45.0",
      "google-auth-library": "^10.6.1",
      "ws": "^8.20.0",
-     "@mastra/schema-compat": "1.2.11-alpha.0"
+     "@mastra/schema-compat": "1.2.11"
    },
    "devDependencies": {
      "@types/node": "22.19.15",
@@ -40,10 +40,10 @@
      "typescript": "^6.0.3",
      "vitest": "4.1.5",
      "zod": "^4.4.3",
-     "@internal/test-utils": "0.0.35",
-     "@internal/lint": "0.0.99",
-     "@internal/types-builder": "0.0.74",
-     "@internal/voice": "0.0.0"
+     "@internal/lint": "0.0.100",
+     "@internal/test-utils": "0.0.36",
+     "@internal/voice": "0.0.1",
+     "@internal/types-builder": "0.0.75"
    },
    "homepage": "https://mastra.ai",
    "repository": {