npm - @loonylabs/tts-middleware - Versions diffs - 0.8.0 → 0.10.0 - Mend

@loonylabs/tts-middleware 0.8.0 → 0.10.0

Files changed (41)

package/README.md CHANGED

@@ -2,7 +2,7 @@
 # TTS Middleware
-*Provider-agnostic Text-to-Speech middleware with **GDPR compliance** support. Currently supports Azure Speech Services, EdenAI, Google Cloud TTS, Fish Audio, and Inworld AI. Features EU data residency via Azure and Google Cloud, pluggable logging, character-based billing, and comprehensive error handling.*
+*Provider-agnostic Text-to-Speech middleware with **GDPR compliance** support. Currently supports Azure Speech Services, EdenAI, Google Cloud TTS, Fish Audio, Inworld AI, and Vertex AI TTS. Features EU data residency via Azure and Google Cloud, pluggable logging, character-based billing, and comprehensive error handling.*
 <!-- Horizontal Badge Navigation Bar -->
 [![npm version](https://img.shields.io/npm/v/@loonylabs/tts-middleware.svg?style=for-the-badge&logo=npm&logoColor=white)](https://www.npmjs.com/package/@loonylabs/tts-middleware)
@@ -43,6 +43,7 @@
   - **Google Cloud TTS**: Neural2, WaveNet, Studio voices with EU data residency
   - **Fish Audio**: S1 model with 13 languages & 64+ emotions (test/admin only)
   - **Inworld AI**: TTS 1.5 Max/Mini with 15 languages & voice cloning (test/admin only)
+  - **Vertex AI TTS**: Gemini Flash/Pro models with 30 voices, 90+ languages & style prompts (test/admin only)
   - **Ready for:** OpenAI, ElevenLabs, Deepgram (interfaces prepared)
 - **GDPR/DSGVO Compliance**: Built-in EU region support for Azure and Google Cloud
 - **SSML Abstraction**: Auto-generates provider-specific SSML from simple JSON options
@@ -137,6 +138,14 @@ const inworld = await ttsService.synthesize({
   voice: { id: 'Ashley' },
   providerOptions: { modelId: 'inworld-tts-1.5-max', temperature: 1.1 },
 });
+// Vertex AI TTS (test/admin only)
+const vertexAI = await ttsService.synthesize({
+  text: 'Have a wonderful day!',
+  provider: TTSProvider.VERTEX_AI,
+  voice: { id: 'Kore' },
+  providerOptions: { model: 'gemini-2.5-flash-preview-tts', stylePrompt: 'Say cheerfully:' },
+});
 ```
 </details>
@@ -240,6 +249,10 @@ FISH_AUDIO_API_KEY=your-fish-audio-api-key
 # Inworld AI (test/admin only – no EU data residency)
 INWORLD_API_KEY=your-inworld-api-key
+# Vertex AI TTS (test/admin only – no EU data residency)
+# Reuses GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_CLOUD_PROJECT from above
+VERTEX_AI_TTS_REGION=us-central1
 # Logging
 TTS_DEBUG=false
 LOG_LEVEL=info
@@ -304,6 +317,20 @@ LOG_LEVEL=info
 | **Pricing** | $10/1M chars (Max), $5/1M chars (Mini) |
 | **EU Compliance** | No data residency guarantees |
+### Vertex AI TTS (Test/Admin Only)
+| Feature | Details |
+|---------|---------|
+| **Models** | `gemini-2.5-flash-preview-tts` (budget, fast), `gemini-2.5-pro-preview-tts` (premium, natural) |
+| **Languages** | 90+ with auto-detection |
+| **Voices** | 30 multilingual: Kore, Puck, Charon, Zephyr, Fenrir, Sulafat, etc. |
+| **Style Control** | Natural language prompts: "Say cheerfully:", "Read in a spooky whisper:" |
+| **Audio** | MP3 (via ffmpeg), WAV (fallback) |
+| **Auth** | Service Account OAuth2 (reuses `GOOGLE_APPLICATION_CREDENTIALS`) |
+| **Region** | `VERTEX_AI_TTS_REGION` env var (default: `us-central1`) |
+| **Pricing** | $0.50-1.00/M input tokens + $10-20/M audio output tokens |
+| **EU Compliance** | Preview models currently `us-central1` only — no EU data residency yet |
 ## GDPR / Compliance
 ### Provider Compliance Overview
@@ -315,9 +342,12 @@ LOG_LEVEL=info
 | **EdenAI** | Yes | Depends* | Depends* | Depends on underlying provider |
 | **Fish Audio** | No | No | No | Test/admin only |
 | **Inworld AI** | No | No | No | Test/admin only |
+| **Vertex AI TTS** | Yes (Vertex DPA) | Partial | No* | Test/admin only |
 *EdenAI is an aggregator - compliance depends on the underlying provider.
+\*Vertex AI TTS: DPA available, no model training on customer data — but preview models are currently `us-central1` only (no EU data residency until GA with EU region support).
 ## API Reference
 ### TTSService
@@ -503,12 +533,14 @@ graph TD
     Registry -->|Select| Eden[EdenAIProvider]
     Registry -->|Select| Fish[FishAudioProvider]
     Registry -->|Select| Inworld[InworldProvider]
+    Registry -->|Select| VertexAI[VertexAITTSProvider]
     Azure -->|SSML/SDK| AzureAPI[Azure Speech API]
     GCloud -->|gRPC/SDK| GoogleAPI[Google Cloud TTS API]
     Eden -->|REST| EdenAPI[EdenAI API]
     Fish -->|REST| FishAPI[Fish Audio API]
     Inworld -->|REST| InworldAPI[Inworld AI API]
+    VertexAI -->|REST/OAuth2| VertexAPI[Vertex AI API]
     GoogleAPI -->|EU Endpoint| EU[eu-texttospeech.googleapis.com]
     EdenAPI -.-> OpenAI[OpenAI TTS]
@@ -518,7 +550,7 @@ graph TD
 ## Testing
 ```bash
-# Run all tests (555 tests, >90% coverage)
+# Run all tests (600+ tests, >90% coverage)
 npm test
 # Unit tests only
@@ -534,6 +566,8 @@ npm run test:coverage
 npx ts-node scripts/manual-test-edenai.ts
 npx ts-node scripts/manual-test-google-cloud-tts.ts
 npx ts-node scripts/manual-test-fish-audio.ts [en] [de]
+npx ts-node scripts/manual-test-inworld.ts [en] [de] [mini]
+npx ts-node scripts/manual-test-vertex-ai.ts [en] [de] [pro] [style]
 # List available Google Cloud voices
 npx ts-node scripts/list-google-voices.ts de-DE
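The README says Vertex AI TTS reuses `GOOGLE_APPLICATION_CREDENTIALS` and `GOOGLE_CLOUD_PROJECT` and takes its region from `VERTEX_AI_TTS_REGION`, falling back to `us-central1`. The sketch below shows that precedence (explicit config, then environment, then default) plus the two required-value checks that the `GeminiProvider` constructor performs further down. It is a minimal illustration, not the package's code: `resolveVertexConfig` and `VertexConfigSketch` are hypothetical names, the environment is passed in as a plain object instead of being read from `process.env`, and a plain `Error` stands in for `InvalidConfigError`. Note that the compiled `GeminiProvider` added in this same diff reads the region from `GEMINI_REGION`, not `VERTEX_AI_TTS_REGION`; the sketch follows the README.

```typescript
// Hypothetical helper; not part of @loonylabs/tts-middleware.
interface VertexConfigSketch {
  keyFilename?: string; // path to the Service Account JSON file
  projectId?: string;   // Google Cloud project ID
  region?: string;      // Vertex AI region
}

type Env = Record<string, string | undefined>;

const DEFAULT_REGION = "us-central1";

// Each field: explicit config first, then the environment, then (region only) a default.
// Missing credentials or project ID are rejected up front.
function resolveVertexConfig(
  config: VertexConfigSketch = {},
  env: Env = {},
): Required<VertexConfigSketch> {
  const keyFilename = config.keyFilename || env.GOOGLE_APPLICATION_CREDENTIALS;
  const projectId = config.projectId || env.GOOGLE_CLOUD_PROJECT;
  const region = config.region || env.VERTEX_AI_TTS_REGION || DEFAULT_REGION;

  if (!keyFilename) {
    throw new Error("GOOGLE_APPLICATION_CREDENTIALS is required");
  }
  if (!projectId) {
    throw new Error("GOOGLE_CLOUD_PROJECT is required");
  }
  return { keyFilename, projectId, region };
}

// Credentials and project come from the environment; region falls back to the default.
const resolved = resolveVertexConfig({}, {
  GOOGLE_APPLICATION_CREDENTIALS: "/path/to/service-account.json",
  GOOGLE_CLOUD_PROJECT: "my-project",
});
console.log(resolved.region); // us-central1
```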

package/dist/middleware/services/tts/index.d.ts CHANGED

@@ -20,9 +20,10 @@
  */
 export { TTSService, ttsService } from './tts.service';
 export { TTSProvider, TTSErrorCode, AudioFormat, } from './types';
-export type { AudioOptions, VoiceConfig, TTSSynthesizeRequest, TTSResponse, TTSResponseMetadata, TTSBillingInfo, TTSVoice, TTSVoiceMetadata, AzureProviderOptions, OpenAIProviderOptions, ElevenLabsProviderOptions, GoogleCloudProviderOptions, GoogleCloudTTSProviderOptions, DeepgramProviderOptions, EdenAIProviderOptions, FishAudioProviderOptions, InworldProviderOptions, ProviderOptions, } from './types';
-export { isAzureOptions, isOpenAIOptions, isElevenLabsOptions, isGoogleCloudOptions, isGoogleCloudTTSOptions, isDeepgramOptions, isEdenAIOptions, isFishAudioOptions, isInworldOptions, } from './types';
-export { BaseTTSProvider, AzureProvider, EdenAIProvider, FishAudioProvider, GoogleCloudTTSProvider, InworldProvider, } from './providers';
+export type { AudioOptions, VoiceConfig, TTSSynthesizeRequest, TTSResponse, TTSResponseMetadata, TTSBillingInfo, TTSVoice, TTSVoiceMetadata, AzureProviderOptions, OpenAIProviderOptions, ElevenLabsProviderOptions, GoogleCloudProviderOptions, GoogleCloudTTSProviderOptions, DeepgramProviderOptions, EdenAIProviderOptions, FishAudioProviderOptions, InworldProviderOptions, VertexAITTSProviderOptions, ProviderOptions, } from './types';
+export { isAzureOptions, isOpenAIOptions, isElevenLabsOptions, isGoogleCloudOptions, isGoogleCloudTTSOptions, isDeepgramOptions, isEdenAIOptions, isFishAudioOptions, isInworldOptions, isVertexAITTSOptions, } from './types';
+export { BaseTTSProvider, AzureProvider, EdenAIProvider, FishAudioProvider, GoogleCloudTTSProvider, InworldProvider, VertexAITTSProvider, } from './providers';
+export type { VertexAITTSConfig } from './providers';
 export type { GoogleCloudTTSRegion, GoogleCloudTTSConfig, } from './providers';
 export { TTSError, InvalidConfigError, InvalidVoiceError, QuotaExceededError, ProviderUnavailableError, SynthesisFailedError, NetworkError, } from './providers';
 export { countCharacters, countCharactersWithoutSSML, validateCharacterCount, countBillableCharacters, estimateAudioDuration, formatCharacterCount, } from './utils';
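The `countBillableCharacters` family of utilities exported here works in characters, and the provider reports a character count too for billing compatibility, but the README prices Vertex AI TTS per token: $0.50-1.00 per million input tokens plus $10-20 per million audio output tokens. A rough sketch of that arithmetic follows. It assumes Flash sits at the low end of each range ($0.50/M input, $10/M output) and Pro at the high end ($1.00/M, $20/M); the README gives only ranges, so that mapping is an inference from its "budget"/"premium" labels. Token counts would have to come from the API's usage data, and `estimateCostUsd` is a hypothetical helper.

```typescript
// Hypothetical helper; not part of @loonylabs/tts-middleware.
interface TokenRates {
  inputPerMillion: number;  // USD per 1M input (text) tokens
  outputPerMillion: number; // USD per 1M audio output tokens
}

// ASSUMPTION: model-to-rate mapping inferred from the README's ranges and labels.
const ASSUMED_RATES: Record<string, TokenRates> = {
  "gemini-2.5-flash-preview-tts": { inputPerMillion: 0.5, outputPerMillion: 10 },
  "gemini-2.5-pro-preview-tts": { inputPerMillion: 1.0, outputPerMillion: 20 },
};

// cost = inputTokens / 1e6 * inputRate + audioOutputTokens / 1e6 * outputRate
function estimateCostUsd(model: string, inputTokens: number, audioOutputTokens: number): number {
  const rates = ASSUMED_RATES[model];
  if (!rates) {
    throw new Error(`No assumed rates for model ${model}`);
  }
  return (
    (inputTokens / 1000000) * rates.inputPerMillion +
    (audioOutputTokens / 1000000) * rates.outputPerMillion
  );
}

// 2,000 input tokens and 50,000 audio tokens on Flash:
// 0.002 * 0.5 + 0.05 * 10 = 0.001 + 0.5 = 0.501
console.log(estimateCostUsd("gemini-2.5-flash-preview-tts", 2000, 50000).toFixed(3)); // 0.501
```

Because audio output is priced 20x higher than input, the generated audio length dominates the cost; a long style prompt adds comparatively little.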

package/dist/middleware/services/tts/index.d.ts.map CHANGED

@@ -1 +1 @@
- {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../../../src/middleware/services/tts/index.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;GAmBG;AAGH,OAAO,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,eAAe,CAAC;AAGvD,OAAO,EACL,WAAW,EACX,YAAY,EACZ,WAAW,GACZ,MAAM,SAAS,CAAC;AAEjB,YAAY,EACV,YAAY,EACZ,WAAW,EACX,oBAAoB,EACpB,WAAW,EACX,mBAAmB,EACnB,cAAc,EACd,QAAQ,EACR,gBAAgB,EAChB,oBAAoB,EACpB,qBAAqB,EACrB,yBAAyB,EACzB,0BAA0B,EAC1B,6BAA6B,EAC7B,uBAAuB,EACvB,qBAAqB,EACrB,wBAAwB,EACxB,sBAAsB,EACtB,eAAe,GAChB,MAAM,SAAS,CAAC;AAEjB,OAAO,EACL,cAAc,EACd,eAAe,EACf,mBAAmB,EACnB,oBAAoB,EACpB,uBAAuB,EACvB,iBAAiB,EACjB,eAAe,EACf,kBAAkB,EAClB,gBAAgB,GACjB,MAAM,SAAS,CAAC;AAGjB,OAAO,EACL,eAAe,EACf,aAAa,EACb,cAAc,EACd,iBAAiB,EACjB,sBAAsB,EACtB,eAAe,GAChB,MAAM,aAAa,CAAC;AAErB,YAAY,EACV,oBAAoB,EACpB,oBAAoB,GACrB,MAAM,aAAa,CAAC;AAGrB,OAAO,EACL,QAAQ,EACR,kBAAkB,EAClB,iBAAiB,EACjB,kBAAkB,EAClB,wBAAwB,EACxB,oBAAoB,EACpB,YAAY,GACb,MAAM,aAAa,CAAC;AAGrB,OAAO,EACL,eAAe,EACf,0BAA0B,EAC1B,sBAAsB,EACtB,uBAAuB,EACvB,qBAAqB,EACrB,oBAAoB,GACrB,MAAM,SAAS,CAAC;AAGjB,OAAO,EACL,SAAS,EACT,SAAS,EACT,WAAW,EACX,WAAW,EACX,WAAW,EACX,YAAY,GACb,MAAM,SAAS,CAAC;AAEjB,YAAY,EAAE,SAAS,EAAE,QAAQ,EAAE,MAAM,SAAS,CAAC;AAGnD,OAAO,EACL,gBAAgB,EAChB,gBAAgB,EAChB,oBAAoB,GACrB,MAAM,SAAS,CAAC;AAEjB,YAAY,EAAE,WAAW,EAAE,WAAW,EAAE,MAAM,SAAS,CAAC"}
+ {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../../../src/middleware/services/tts/index.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;GAmBG;AAGH,OAAO,EAAE,UAAU,EAAE,UAAU,EAAE,MAAM,eAAe,CAAC;AAGvD,OAAO,EACL,WAAW,EACX,YAAY,EACZ,WAAW,GACZ,MAAM,SAAS,CAAC;AAEjB,YAAY,EACV,YAAY,EACZ,WAAW,EACX,oBAAoB,EACpB,WAAW,EACX,mBAAmB,EACnB,cAAc,EACd,QAAQ,EACR,gBAAgB,EAChB,oBAAoB,EACpB,qBAAqB,EACrB,yBAAyB,EACzB,0BAA0B,EAC1B,6BAA6B,EAC7B,uBAAuB,EACvB,qBAAqB,EACrB,wBAAwB,EACxB,sBAAsB,EACtB,0BAA0B,EAC1B,eAAe,GAChB,MAAM,SAAS,CAAC;AAEjB,OAAO,EACL,cAAc,EACd,eAAe,EACf,mBAAmB,EACnB,oBAAoB,EACpB,uBAAuB,EACvB,iBAAiB,EACjB,eAAe,EACf,kBAAkB,EAClB,gBAAgB,EAChB,oBAAoB,GACrB,MAAM,SAAS,CAAC;AAGjB,OAAO,EACL,eAAe,EACf,aAAa,EACb,cAAc,EACd,iBAAiB,EACjB,sBAAsB,EACtB,eAAe,EACf,mBAAmB,GACpB,MAAM,aAAa,CAAC;AAErB,YAAY,EAAE,iBAAiB,EAAE,MAAM,aAAa,CAAC;AAErD,YAAY,EACV,oBAAoB,EACpB,oBAAoB,GACrB,MAAM,aAAa,CAAC;AAGrB,OAAO,EACL,QAAQ,EACR,kBAAkB,EAClB,iBAAiB,EACjB,kBAAkB,EAClB,wBAAwB,EACxB,oBAAoB,EACpB,YAAY,GACb,MAAM,aAAa,CAAC;AAGrB,OAAO,EACL,eAAe,EACf,0BAA0B,EAC1B,sBAAsB,EACtB,uBAAuB,EACvB,qBAAqB,EACrB,oBAAoB,GACrB,MAAM,SAAS,CAAC;AAGjB,OAAO,EACL,SAAS,EACT,SAAS,EACT,WAAW,EACX,WAAW,EACX,WAAW,EACX,YAAY,GACb,MAAM,SAAS,CAAC;AAEjB,YAAY,EAAE,SAAS,EAAE,QAAQ,EAAE,MAAM,SAAS,CAAC;AAGnD,OAAO,EACL,gBAAgB,EAChB,gBAAgB,EAChB,oBAAoB,GACrB,MAAM,SAAS,CAAC;AAEjB,YAAY,EAAE,WAAW,EAAE,WAAW,EAAE,MAAM,SAAS,CAAC"}

package/dist/middleware/services/tts/index.js CHANGED

@@ -20,7 +20,7 @@
  * @module @loonylabs/tts-middleware
  */
 Object.defineProperty(exports, "__esModule", { value: true });
-exports.DEFAULT_RETRY_CONFIG = exports.isRetryableError = exports.executeWithRetry = exports.silentLogger = exports.getLogLevel = exports.setLogLevel = exports.resetLogger = exports.getLogger = exports.setLogger = exports.formatCharacterCount = exports.estimateAudioDuration = exports.countBillableCharacters = exports.validateCharacterCount = exports.countCharactersWithoutSSML = exports.countCharacters = exports.NetworkError = exports.SynthesisFailedError = exports.ProviderUnavailableError = exports.QuotaExceededError = exports.InvalidVoiceError = exports.InvalidConfigError = exports.TTSError = exports.InworldProvider = exports.GoogleCloudTTSProvider = exports.FishAudioProvider = exports.EdenAIProvider = exports.AzureProvider = exports.BaseTTSProvider = exports.isInworldOptions = exports.isFishAudioOptions = exports.isEdenAIOptions = exports.isDeepgramOptions = exports.isGoogleCloudTTSOptions = exports.isGoogleCloudOptions = exports.isElevenLabsOptions = exports.isOpenAIOptions = exports.isAzureOptions = exports.TTSErrorCode = exports.TTSProvider = exports.ttsService = exports.TTSService = void 0;
+exports.DEFAULT_RETRY_CONFIG = exports.isRetryableError = exports.executeWithRetry = exports.silentLogger = exports.getLogLevel = exports.setLogLevel = exports.resetLogger = exports.getLogger = exports.setLogger = exports.formatCharacterCount = exports.estimateAudioDuration = exports.countBillableCharacters = exports.validateCharacterCount = exports.countCharactersWithoutSSML = exports.countCharacters = exports.NetworkError = exports.SynthesisFailedError = exports.ProviderUnavailableError = exports.QuotaExceededError = exports.InvalidVoiceError = exports.InvalidConfigError = exports.TTSError = exports.VertexAITTSProvider = exports.InworldProvider = exports.GoogleCloudTTSProvider = exports.FishAudioProvider = exports.EdenAIProvider = exports.AzureProvider = exports.BaseTTSProvider = exports.isVertexAITTSOptions = exports.isInworldOptions = exports.isFishAudioOptions = exports.isEdenAIOptions = exports.isDeepgramOptions = exports.isGoogleCloudTTSOptions = exports.isGoogleCloudOptions = exports.isElevenLabsOptions = exports.isOpenAIOptions = exports.isAzureOptions = exports.TTSErrorCode = exports.TTSProvider = exports.ttsService = exports.TTSService = void 0;
 // ===== Main Service =====
 var tts_service_1 = require("./tts.service");
 Object.defineProperty(exports, "TTSService", { enumerable: true, get: function () { return tts_service_1.TTSService; } });
@@ -39,6 +39,7 @@ Object.defineProperty(exports, "isDeepgramOptions", { enumerable: true, get: fun
 Object.defineProperty(exports, "isEdenAIOptions", { enumerable: true, get: function () { return types_2.isEdenAIOptions; } });
 Object.defineProperty(exports, "isFishAudioOptions", { enumerable: true, get: function () { return types_2.isFishAudioOptions; } });
 Object.defineProperty(exports, "isInworldOptions", { enumerable: true, get: function () { return types_2.isInworldOptions; } });
+Object.defineProperty(exports, "isVertexAITTSOptions", { enumerable: true, get: function () { return types_2.isVertexAITTSOptions; } });
 // ===== Providers =====
 var providers_1 = require("./providers");
 Object.defineProperty(exports, "BaseTTSProvider", { enumerable: true, get: function () { return providers_1.BaseTTSProvider; } });
@@ -47,6 +48,7 @@ Object.defineProperty(exports, "EdenAIProvider", { enumerable: true, get: functi
 Object.defineProperty(exports, "FishAudioProvider", { enumerable: true, get: function () { return providers_1.FishAudioProvider; } });
 Object.defineProperty(exports, "GoogleCloudTTSProvider", { enumerable: true, get: function () { return providers_1.GoogleCloudTTSProvider; } });
 Object.defineProperty(exports, "InworldProvider", { enumerable: true, get: function () { return providers_1.InworldProvider; } });
+Object.defineProperty(exports, "VertexAITTSProvider", { enumerable: true, get: function () { return providers_1.VertexAITTSProvider; } });
 // ===== Errors =====
 var providers_2 = require("./providers");
 Object.defineProperty(exports, "TTSError", { enumerable: true, get: function () { return providers_2.TTSError; } });

package/dist/middleware/services/tts/index.js.map CHANGED

@@ -1 +1 @@
- {"version":3,"file":"index.js","sourceRoot":"","sources":["../../../../src/middleware/services/tts/index.ts"],"names":[],"mappings":";AAAA;;;;;;;;;;;;;;;;;;;GAmBG;;;AAEH,2BAA2B;AAC3B,6CAAuD;AAA9C,yGAAA,UAAU,OAAA;AAAE,yGAAA,UAAU,OAAA;AAE/B,oBAAoB;AACpB,iCAIiB;AAHf,oGAAA,WAAW,OAAA;AACX,qGAAA,YAAY,OAAA;AAyBd,iCAUiB;AATf,uGAAA,cAAc,OAAA;AACd,wGAAA,eAAe,OAAA;AACf,4GAAA,mBAAmB,OAAA;AACnB,6GAAA,oBAAoB,OAAA;AACpB,gHAAA,uBAAuB,OAAA;AACvB,0GAAA,iBAAiB,OAAA;AACjB,wGAAA,eAAe,OAAA;AACf,2GAAA,kBAAkB,OAAA;AAClB,yGAAA,gBAAgB,OAAA;AAGlB,wBAAwB;AACxB,yCAOqB;AANnB,4GAAA,eAAe,OAAA;AACf,0GAAA,aAAa,OAAA;AACb,2GAAA,cAAc,OAAA;AACd,8GAAA,iBAAiB,OAAA;AACjB,mHAAA,sBAAsB,OAAA;AACtB,4GAAA,eAAe,OAAA;AAQjB,qBAAqB;AACrB,yCAQqB;AAPnB,qGAAA,QAAQ,OAAA;AACR,+GAAA,kBAAkB,OAAA;AAClB,8GAAA,iBAAiB,OAAA;AACjB,+GAAA,kBAAkB,OAAA;AAClB,qHAAA,wBAAwB,OAAA;AACxB,iHAAA,oBAAoB,OAAA;AACpB,yGAAA,YAAY,OAAA;AAGd,wBAAwB;AACxB,iCAOiB;AANf,wGAAA,eAAe,OAAA;AACf,mHAAA,0BAA0B,OAAA;AAC1B,+GAAA,sBAAsB,OAAA;AACtB,gHAAA,uBAAuB,OAAA;AACvB,8GAAA,qBAAqB,OAAA;AACrB,6GAAA,oBAAoB,OAAA;AAGtB,qBAAqB;AACrB,iCAOiB;AANf,kGAAA,SAAS,OAAA;AACT,kGAAA,SAAS,OAAA;AACT,oGAAA,WAAW,OAAA;AACX,oGAAA,WAAW,OAAA;AACX,oGAAA,WAAW,OAAA;AACX,qGAAA,YAAY,OAAA;AAKd,oBAAoB;AACpB,iCAIiB;AAHf,yGAAA,gBAAgB,OAAA;AAChB,yGAAA,gBAAgB,OAAA;AAChB,6GAAA,oBAAoB,OAAA"}
+ {"version":3,"file":"index.js","sourceRoot":"","sources":["../../../../src/middleware/services/tts/index.ts"],"names":[],"mappings":";AAAA;;;;;;;;;;;;;;;;;;;GAmBG;;;AAEH,2BAA2B;AAC3B,6CAAuD;AAA9C,yGAAA,UAAU,OAAA;AAAE,yGAAA,UAAU,OAAA;AAE/B,oBAAoB;AACpB,iCAIiB;AAHf,oGAAA,WAAW,OAAA;AACX,qGAAA,YAAY,OAAA;AA0Bd,iCAWiB;AAVf,uGAAA,cAAc,OAAA;AACd,wGAAA,eAAe,OAAA;AACf,4GAAA,mBAAmB,OAAA;AACnB,6GAAA,oBAAoB,OAAA;AACpB,gHAAA,uBAAuB,OAAA;AACvB,0GAAA,iBAAiB,OAAA;AACjB,wGAAA,eAAe,OAAA;AACf,2GAAA,kBAAkB,OAAA;AAClB,yGAAA,gBAAgB,OAAA;AAChB,6GAAA,oBAAoB,OAAA;AAGtB,wBAAwB;AACxB,yCAQqB;AAPnB,4GAAA,eAAe,OAAA;AACf,0GAAA,aAAa,OAAA;AACb,2GAAA,cAAc,OAAA;AACd,8GAAA,iBAAiB,OAAA;AACjB,mHAAA,sBAAsB,OAAA;AACtB,4GAAA,eAAe,OAAA;AACf,gHAAA,mBAAmB,OAAA;AAUrB,qBAAqB;AACrB,yCAQqB;AAPnB,qGAAA,QAAQ,OAAA;AACR,+GAAA,kBAAkB,OAAA;AAClB,8GAAA,iBAAiB,OAAA;AACjB,+GAAA,kBAAkB,OAAA;AAClB,qHAAA,wBAAwB,OAAA;AACxB,iHAAA,oBAAoB,OAAA;AACpB,yGAAA,YAAY,OAAA;AAGd,wBAAwB;AACxB,iCAOiB;AANf,wGAAA,eAAe,OAAA;AACf,mHAAA,0BAA0B,OAAA;AAC1B,+GAAA,sBAAsB,OAAA;AACtB,gHAAA,uBAAuB,OAAA;AACvB,8GAAA,qBAAqB,OAAA;AACrB,6GAAA,oBAAoB,OAAA;AAGtB,qBAAqB;AACrB,iCAOiB;AANf,kGAAA,SAAS,OAAA;AACT,kGAAA,SAAS,OAAA;AACT,oGAAA,WAAW,OAAA;AACX,oGAAA,WAAW,OAAA;AACX,oGAAA,WAAW,OAAA;AACX,qGAAA,YAAY,OAAA;AAKd,oBAAoB;AACpB,iCAIiB;AAHf,yGAAA,gBAAgB,OAAA;AAChB,yGAAA,gBAAgB,OAAA;AAChB,6GAAA,oBAAoB,OAAA"}

package/dist/middleware/services/tts/providers/gemini-provider.d.ts ADDED

@@ -0,0 +1,142 @@
+/**
+ * Gemini TTS Provider
+ *
+ * @description Provider for Google Gemini TTS via Vertex AI, using the generateContent
+ * endpoint with responseModalities: ['AUDIO']. Authenticates via Service Account
+ * (same as Google Cloud TTS — reuses GOOGLE_APPLICATION_CREDENTIALS).
+ *
+ * Supports 30 multilingual voices with auto-detect language and natural language
+ * style control. Output is raw PCM (24kHz, 16-bit, mono) which is converted to
+ * MP3 via ffmpeg or WAV as fallback.
+ *
+ * Test/Admin only -- no EU data residency guarantees.
+ *
+ * @see https://cloud.google.com/vertex-ai/generative-ai/docs/text-to-speech
+ */
+import type { TTSSynthesizeRequest, TTSResponse } from '../types';
+import { BaseTTSProvider } from './base-tts-provider';
+/**
+ * Gemini TTS configuration (Vertex AI)
+ */
+export interface GeminiConfig {
+    /**
+     * Path to Service Account JSON file
+     * @env GOOGLE_APPLICATION_CREDENTIALS
+     */
+    keyFilename?: string;
+    /**
+     * Google Cloud Project ID
+     * @env GOOGLE_CLOUD_PROJECT
+     */
+    projectId?: string;
+    /**
+     * Vertex AI region
+     * @env GEMINI_REGION
+     * @default 'us-central1'
+     */
+    region?: string;
+}
+/**
+ * Gemini TTS provider implementation
+ *
+ * @description Provides TTS synthesis using Google's Gemini generateContent API
+ * via Vertex AI. Authenticates with Service Account OAuth2 (same credentials as
+ * Google Cloud TTS). Gemini outputs raw PCM which is converted to MP3 (via ffmpeg)
+ * or WAV (pure Node.js fallback).
+ *
+ * Billing: Token-based ($0.50-1.00/M input + $10-20/M audio output tokens).
+ * For billing compatibility, reports character count like all other providers.
+ *
+ * @example
+ * ```typescript
+ * const provider = new GeminiProvider();
+ * const response = await provider.synthesize(
+ *   "Hello World",
+ *   "Kore",
+ *   {
+ *     text: "Hello World",
+ *     voice: { id: "Kore" },
+ *     audio: { format: "mp3" },
+ *     providerOptions: {
+ *       model: "gemini-2.5-flash-preview-tts",
+ *       stylePrompt: "Say cheerfully:"
+ *     }
+ *   }
+ * );
+ * ```
+ */
+export declare class GeminiProvider extends BaseTTSProvider {
+    private config;
+    private authClient;
+    /**
+     * Creates a new Gemini TTS provider
+     *
+     * @param config - Optional configuration (uses env vars if not provided)
+     * @throws {InvalidConfigError} If credentials are missing
+     */
+    constructor(config?: Partial<GeminiConfig>);
+    /**
+     * Validate Gemini configuration
+     *
+     * @private
+     * @throws {InvalidConfigError} If configuration is invalid
+     */
+    private validateGeminiConfig;
+    /**
+     * Get an authenticated access token via Service Account
+     *
+     * @private
+     * @returns OAuth2 access token
+     */
+    private getAccessToken;
+    /**
+     * Synthesize text to speech using Gemini TTS
+     *
+     * @param text - The input text to synthesize
+     * @param voiceId - The voice name (e.g. "Kore", "Puck", "Charon")
+     * @param request - The full synthesis request with options
+     * @returns Promise resolving to the synthesis response
+     */
+    synthesize(text: string, voiceId: string, request: TTSSynthesizeRequest): Promise<TTSResponse>;
+    /**
+     * Build Gemini generateContent request payload
+     *
+     * @private
+     */
+    private buildRequest;
+    /**
+     * Call Gemini generateContent API via Vertex AI
+     *
+     * @private
+     * @param requestBody - The request payload
+     * @param model - The Gemini model to use
+     * @returns Promise resolving to raw PCM audio buffer
+     */
+    private callAPI;
+    /**
+     * Convert raw PCM audio to the requested format
+     *
+     * @private
+     * @param pcmBuffer - Raw PCM buffer (24kHz, 16-bit, mono, little-endian)
+     * @param requestedFormat - The desired output format ('mp3', 'wav', etc.)
+     * @returns The converted audio buffer and actual format used
+     */
+    private convertPcmAudio;
+    /**
+     * Convert raw PCM to MP3 using ffmpeg via child_process
+     *
+     * @private
+     * @param pcmBuffer - Raw PCM buffer (24kHz, 16-bit, mono, little-endian)
+     * @returns Promise resolving to MP3 buffer
+     */
+    private pcmToMp3;
+    /**
+     * Convert raw PCM to WAV by prepending a 44-byte WAV header
+     *
+     * @private
+     * @param pcmBuffer - Raw PCM buffer (24kHz, 16-bit, mono, little-endian)
+     * @returns WAV buffer
+     */
+    private pcmToWav;
+}
+//# sourceMappingURL=gemini-provider.d.ts.map
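The declaration says `pcmToWav` turns Gemini's raw PCM (24 kHz, 16-bit, mono, little-endian) into WAV by prepending a 44-byte header. That is the canonical RIFF/WAVE layout: a `RIFF` chunk, a 16-byte `fmt ` chunk describing the sample format, and a `data` chunk holding the samples. The following is a minimal sketch assuming that canonical layout, not the provider's private method; `pcmToWavSketch` is a hypothetical name, and it uses `Uint8Array`/`DataView` instead of Node's `Buffer` so it also runs outside Node.

```typescript
// Hypothetical helper; wraps raw little-endian PCM in a canonical 44-byte RIFF/WAVE header.
function pcmToWavSketch(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second of audio
  const wav = new Uint8Array(44 + pcm.length);
  const view = new DataView(wav.buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size = total file size - 8
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for plain PCM
  view.setUint16(20, 1, true);              // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true);     // data chunk size in bytes
  wav.set(pcm, 44);                         // samples follow the header unchanged
  return wav;
}

// One second of 24 kHz 16-bit mono silence is 48,000 bytes of PCM.
const wav = pcmToWavSketch(new Uint8Array(48000));
console.log(wav.length); // 48044
```

At these defaults one second of audio is 24,000 samples × 2 bytes = 48,000 bytes, so the PCM duration in seconds is `pcm.length / byteRate`.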

package/dist/middleware/services/tts/providers/gemini-provider.d.ts.map ADDED

@@ -0,0 +1 @@

+ {"version":3,"file":"gemini-provider.d.ts","sourceRoot":"","sources":["../../../../../src/middleware/services/tts/providers/gemini-provider.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;GAcG;AAGH,OAAO,KAAK,EAAE,oBAAoB,EAAE,WAAW,EAAE,MAAM,UAAU,CAAC;AAGlE,OAAO,EACL,eAAe,EAEhB,MAAM,qBAAqB,CAAC;AAG7B;;GAEG;AACH,MAAM,WAAW,YAAY;IAC3B;;;OAGG;IACH,WAAW,CAAC,EAAE,MAAM,CAAC;IAErB;;;OAGG;IACH,SAAS,CAAC,EAAE,MAAM,CAAC;IAEnB;;;;OAIG;IACH,MAAM,CAAC,EAAE,MAAM,CAAC;CACjB;AAMD;;;;;;;;;;;;;;;;;;;;;;;;;;;;GA4BG;AACH,qBAAa,cAAe,SAAQ,eAAe;IACjD,OAAO,CAAC,MAAM,CAAe;IAC7B,OAAO,CAAC,UAAU,CAA6E;IAE/F;;;;;OAKG;gBACS,MAAM,CAAC,EAAE,OAAO,CAAC,YAAY,CAAC;IAkB1C;;;;;OAKG;IACH,OAAO,CAAC,oBAAoB;IAgB5B;;;;;OAKG;YACW,cAAc;IAqB5B;;;;;;;OAOG;IACG,UAAU,CACd,IAAI,EAAE,MAAM,EACZ,OAAO,EAAE,MAAM,EACf,OAAO,EAAE,oBAAoB,GAC5B,OAAO,CAAC,WAAW,CAAC;IAsDvB;;;;OAIG;IACH,OAAO,CAAC,YAAY;IA6BpB;;;;;;;OAOG;YACW,OAAO;IA6CrB;;;;;;;OAOG;YACW,eAAe;IA0B7B;;;;;;OAMG;IACH,OAAO,CAAC,QAAQ;IAkChB;;;;;;OAMG;IACH,OAAO,CAAC,QAAQ;CAwBjB"}
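The class comment describes the request as a Vertex AI `generateContent` call with `responseModalities: ['AUDIO']` and a prebuilt voice, and the README describes style control as a natural-language prompt such as "Say cheerfully:". A minimal sketch of the URL and request body follows. It assumes the regional Vertex AI endpoint for Google publisher models and that the style prompt is prepended to the text with a single space, which is what the compiled `buildRequest` further down does. `buildGenerateContentUrl` and `buildTtsBody` are hypothetical names; the sketch performs no authentication and sends nothing.

```typescript
// Hypothetical helpers sketching the request shape; not the package's API.
interface GenerateContentBody {
  contents: { role: "user"; parts: { text: string }[] }[];
  generationConfig: {
    responseModalities: string[];
    speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: string } } };
  };
}

// Regional Vertex AI endpoint for a Google publisher model.
function buildGenerateContentUrl(projectId: string, region: string, model: string): string {
  return (
    `https://${region}-aiplatform.googleapis.com/v1/projects/${projectId}` +
    `/locations/${region}/publishers/google/models/${model}:generateContent`
  );
}

// The style prompt is plain text placed in front of the input text.
function buildTtsBody(text: string, voiceName: string, stylePrompt?: string): GenerateContentBody {
  const synthesisText = stylePrompt ? `${stylePrompt} ${text}` : text;
  return {
    contents: [{ role: "user", parts: [{ text: synthesisText }] }],
    generationConfig: {
      responseModalities: ["AUDIO"], // ask for audio instead of text
      speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName } } },
    },
  };
}

const url = buildGenerateContentUrl("my-project", "us-central1", "gemini-2.5-flash-preview-tts");
const body = buildTtsBody("Have a wonderful day!", "Kore", "Say cheerfully:");
console.log(url);
console.log(body.contents[0].parts[0].text); // Say cheerfully: Have a wonderful day!
```

A successful response carries the audio base64-encoded at `candidates[0].content.parts[0].inlineData.data`; decoding it yields the raw PCM that the provider then converts to MP3 or WAV.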

package/dist/middleware/services/tts/providers/gemini-provider.js ADDED

@@ -0,0 +1,358 @@
+"use strict";
+/**
+ * Gemini TTS Provider
+ *
+ * @description Provider for Google Gemini TTS via Vertex AI, using the generateContent
+ * endpoint with responseModalities: ['AUDIO']. Authenticates via Service Account
+ * (same as Google Cloud TTS — reuses GOOGLE_APPLICATION_CREDENTIALS).
+ *
+ * Supports 30 multilingual voices with auto-detect language and natural language
+ * style control. Output is raw PCM (24kHz, 16-bit, mono) which is converted to
+ * MP3 via ffmpeg or WAV as fallback.
+ *
+ * Test/Admin only -- no EU data residency guarantees.
+ *
+ * @see https://cloud.google.com/vertex-ai/generative-ai/docs/text-to-speech
+ */
+var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    var desc = Object.getOwnPropertyDescriptor(m, k);
+    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+      desc = { enumerable: true, get: function() { return m[k]; } };
+    }
+    Object.defineProperty(o, k2, desc);
+}) : (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    o[k2] = m[k];
+}));
+var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+    Object.defineProperty(o, "default", { enumerable: true, value: v });
+}) : function(o, v) {
+    o["default"] = v;
+});
+var __importStar = (this && this.__importStar) || (function () {
+    var ownKeys = function(o) {
+        ownKeys = Object.getOwnPropertyNames || function (o) {
+            var ar = [];
+            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+            return ar;
+        };
+        return ownKeys(o);
+    };
+    return function (mod) {
+        if (mod && mod.__esModule) return mod;
+        var result = {};
+        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+        __setModuleDefault(result, mod);
+        return result;
+    };
+})();
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.GeminiProvider = void 0;
+const child_process_1 = require("child_process");
+const types_1 = require("../types");
+const mp3_duration_utils_1 = require("../utils/mp3-duration.utils");
+const base_tts_provider_1 = require("./base-tts-provider");
+const DEFAULT_MODEL = 'gemini-2.5-flash-preview-tts';
+const DEFAULT_SAMPLE_RATE = 24000;
+const DEFAULT_REGION = 'us-central1';
+/**
+ * Gemini TTS provider implementation
+ *
+ * @description Provides TTS synthesis using Google's Gemini generateContent API
+ * via Vertex AI. Authenticates with Service Account OAuth2 (same credentials as
+ * Google Cloud TTS). Gemini outputs raw PCM which is converted to MP3 (via ffmpeg)
+ * or WAV (pure Node.js fallback).
+ *
+ * Billing: Token-based ($0.50-1.00/M input + $10-20/M audio output tokens).
+ * For billing compatibility, reports character count like all other providers.
+ *
+ * @example
+ * ```typescript
+ * const provider = new GeminiProvider();
+ * const response = await provider.synthesize(
+ *   "Hello World",
+ *   "Kore",
+ *   {
+ *     text: "Hello World",
+ *     voice: { id: "Kore" },
+ *     audio: { format: "mp3" },
+ *     providerOptions: {
+ *       model: "gemini-2.5-flash-preview-tts",
+ *       stylePrompt: "Say cheerfully:"
+ *     }
+ *   }
+ * );
+ * ```
+ */
+class GeminiProvider extends base_tts_provider_1.BaseTTSProvider {
+    /**
+     * Creates a new Gemini TTS provider
+     *
+     * @param config - Optional configuration (uses env vars if not provided)
+     * @throws {InvalidConfigError} If credentials are missing
+     */
+    constructor(config) {
+        super(types_1.TTSProvider.GEMINI);
+        this.authClient = null;
+        this.config = {
+            keyFilename: config?.keyFilename || process.env.GOOGLE_APPLICATION_CREDENTIALS,
+            projectId: config?.projectId || process.env.GOOGLE_CLOUD_PROJECT,
+            region: config?.region || process.env.GEMINI_REGION || DEFAULT_REGION,
+        };
+        this.validateGeminiConfig();
+        this.log('info', 'Gemini TTS provider initialized', {
+            hasCredentials: !!this.config.keyFilename,
+            projectId: this.config.projectId ? '***' : undefined,
+            region: this.config.region,
+        });
+    }
+    /**
+     * Validate Gemini configuration
+     *
+     * @private
+     * @throws {InvalidConfigError} If configuration is invalid
+     */
+    validateGeminiConfig() {
+        if (!this.config.keyFilename) {
+            throw new base_tts_provider_1.InvalidConfigError(this.providerName, 'Google Cloud credentials are required for Gemini TTS (GOOGLE_APPLICATION_CREDENTIALS)');
+        }
+        if (!this.config.projectId) {
+            throw new base_tts_provider_1.InvalidConfigError(this.providerName, 'Google Cloud Project ID is required for Gemini TTS (GOOGLE_CLOUD_PROJECT)');
+        }
+    }
+    /**
+     * Get an authenticated access token via Service Account
+     *
+     * @private
+     * @returns OAuth2 access token
+     */
+    async getAccessToken() {
+        if (!this.authClient) {
+            const { GoogleAuth } = await Promise.resolve().then(() => __importStar(require('google-auth-library')));
+            const auth = new GoogleAuth({
+                keyFilename: this.config.keyFilename,
+                scopes: ['https://www.googleapis.com/auth/cloud-platform'],
+            });
+            this.authClient = await auth.getClient();
+        }
+        const tokenResponse = await this.authClient.getAccessToken();
+        if (!tokenResponse.token) {
+            throw new base_tts_provider_1.InvalidConfigError(this.providerName, 'Failed to obtain access token from Service Account');
+        }
+        return tokenResponse.token;
+    }
+    /**
+     * Synthesize text to speech using Gemini TTS
+     *
+     * @param text - The input text to synthesize
+     * @param voiceId - The voice name (e.g. "Kore", "Puck", "Charon")
+     * @param request - The full synthesis request with options
+     * @returns Promise resolving to the synthesis response
+     */
+    async synthesize(text, voiceId, request) {
+        this.validateConfig(request);
+        const startTime = Date.now();
+        const options = (request.providerOptions || {});
+        const model = options.model || DEFAULT_MODEL;
+        const requestedFormat = request.audio?.format || 'mp3';
+        const requestBody = this.buildRequest(text, voiceId, options);
+        this.log('debug', 'Synthesizing with Gemini TTS', {
+            voiceId,
+            model,
+            textLength: text.length,
+            requestedFormat,
+        });
+        try {
+            const pcmBuffer = await this.callAPI(requestBody, model);
+            const { audioBuffer, audioFormat } = await this.convertPcmAudio(pcmBuffer, requestedFormat);
+            const duration = Date.now() - startTime;
+            this.log('info', 'Synthesis successful', {
+                voiceId,
+                characters: text.length,
+                duration,
+                audioSize: audioBuffer.length,
+                audioFormat,
+            });
+            return {
+                audio: audioBuffer,
+                metadata: {
+                    provider: this.providerName,
+                    voice: voiceId,
+                    duration,
+                    audioDuration: audioFormat === 'mp3' ? (0, mp3_duration_utils_1.getMp3Duration)(audioBuffer) : undefined,
+                    audioFormat,
+                    sampleRate: DEFAULT_SAMPLE_RATE,
+                },
+                billing: {
+                    characters: this.countCharacters(text),
+                },
+            };
+        }
+        catch (error) {
+            this.log('error', 'Synthesis failed', {
+                voiceId,
+                error: error.message,
+            });
+            throw this.handleError(error, 'during Gemini TTS API call');
+        }
+    }
+    /**
+     * Build Gemini generateContent request payload
+     *
+     * @private
+     */
+    buildRequest(text, voiceId, options) {
+        const synthesisText = options.stylePrompt
+            ? `${options.stylePrompt} ${text}`
+            : text;
+        return {
+            contents: [
+                {
+                    role: 'user',
+                    parts: [{ text: synthesisText }],
+                },
+            ],
+            generationConfig: {
+                responseModalities: ['AUDIO'],
+                speechConfig: {
+                    voiceConfig: {
+                        prebuiltVoiceConfig: {
+                            voiceName: voiceId,
+                        },
+                    },
+                },
+            },
+        };
+    }
+    /**
+     * Call Gemini generateContent API via Vertex AI
+     *
+     * @private
+     * @param requestBody - The request payload
+     * @param model - The Gemini model to use
+     * @returns Promise resolving to raw PCM audio buffer
+     */
+    async callAPI(requestBody, model) {
+        const accessToken = await this.getAccessToken();
+        const region = this.config.region || DEFAULT_REGION;
+        const projectId = this.config.projectId;
+        const url = `https://${region}-aiplatform.googleapis.com/v1/projects/${projectId}/locations/${region}/publishers/google/models/${model}:generateContent`;
+        const response = await fetch(url, {
+            method: 'POST',
+            headers: {
+                'Authorization': `Bearer ${accessToken}`,
+                'Content-Type': 'application/json',
+            },
+            body: JSON.stringify(requestBody),
+        });
+        if (!response.ok) {
+            const errorText = await response.text();
+            throw new Error(`Gemini API error (${response.status}): ${errorText}`);
+        }
+        const responseJson = await response.json();
+        const inlineData = responseJson.candidates?.[0]?.content?.parts?.[0]?.inlineData;
+        if (!inlineData?.data) {
+            throw new Error('Gemini API returned no audio data');
+        }
+        return Buffer.from(inlineData.data, 'base64');
+    }
+    /**
+     * Convert raw PCM audio to the requested format
+     *
+     * @private
+     * @param pcmBuffer - Raw PCM buffer (24kHz, 16-bit, mono, little-endian)
+     * @param requestedFormat - The desired output format ('mp3', 'wav', etc.)
+     * @returns The converted audio buffer and actual format used
+     */
+    async convertPcmAudio(pcmBuffer, requestedFormat) {
+        if (requestedFormat === 'wav') {
+            return {
+                audioBuffer: this.pcmToWav(pcmBuffer),
+                audioFormat: 'wav',
+            };
+        }
+        // For mp3 (and any other format), try ffmpeg first, fall back to WAV
+        try {
+            const mp3Buffer = await this.pcmToMp3(pcmBuffer);
+            return { audioBuffer: mp3Buffer, audioFormat: 'mp3' };
+        }
+        catch (error) {
+            this.log('warn', 'ffmpeg not available, falling back to WAV output', {
+                error: error.message,
+            });
+            return {
+                audioBuffer: this.pcmToWav(pcmBuffer),
+                audioFormat: 'wav',
+            };
+        }
+    }
+    /**
+     * Convert raw PCM to MP3 using ffmpeg via child_process
+     *
+     * @private
+     * @param pcmBuffer - Raw PCM buffer (24kHz, 16-bit, mono, little-endian)
+     * @returns Promise resolving to MP3 buffer
+     */
+    pcmToMp3(pcmBuffer) {
+        return new Promise((resolve, reject) => {
+            const ffmpeg = (0, child_process_1.spawn)('ffmpeg', [
+                '-f', 's16le',
+                '-ar', String(DEFAULT_SAMPLE_RATE),
+                '-ac', '1',
+                '-i', 'pipe:0',
+                '-codec:a', 'libmp3lame',
+                '-b:a', '128k',
+                '-f', 'mp3',
+                'pipe:1',
+            ]);
+            const chunks = [];
+            ffmpeg.stdout.on('data', (chunk) => chunks.push(chunk));
+            ffmpeg.stderr.on('data', () => { });
+            ffmpeg.on('error', (err) => {
+                reject(new Error(`ffmpeg spawn failed: ${err.message}`));
+            });
+            ffmpeg.on('close', (code) => {
+                if (code === 0) {
+                    resolve(Buffer.concat(chunks));
+                }
+                else {
+                    reject(new Error(`ffmpeg exited with code ${code}`));
+                }
+            });
+            ffmpeg.stdin.write(pcmBuffer);
+            ffmpeg.stdin.end();
+        });
+    }
+    /**
+     * Convert raw PCM to WAV by prepending a 44-byte WAV header
+     *
+     * @private
+     * @param pcmBuffer - Raw PCM buffer (24kHz, 16-bit, mono, little-endian)
+     * @returns WAV buffer
+     */
+    pcmToWav(pcmBuffer) {
+        const channels = 1;
+        const bitsPerSample = 16;
+        const byteRate = DEFAULT_SAMPLE_RATE * channels * (bitsPerSample / 8);
+        const blockAlign = channels * (bitsPerSample / 8);
+        const dataLength = pcmBuffer.length;
+        const header = Buffer.alloc(44);
+        header.write('RIFF', 0);
+        header.writeUInt32LE(36 + dataLength, 4);
+        header.write('WAVE', 8);
+        header.write('fmt ', 12);
+        header.writeUInt32LE(16, 16); // PCM chunk size
+        header.writeUInt16LE(1, 20); // PCM format
+        header.writeUInt16LE(channels, 22);
+        header.writeUInt32LE(DEFAULT_SAMPLE_RATE, 24);
+        header.writeUInt32LE(byteRate, 28);
+        header.writeUInt16LE(blockAlign, 32);
+        header.writeUInt16LE(bitsPerSample, 34);
+        header.write('data', 36);
+        header.writeUInt32LE(dataLength, 40);
+        return Buffer.concat([header, pcmBuffer]);
+    }
+}
+exports.GeminiProvider = GeminiProvider;
+//# sourceMappingURL=gemini-provider.js.map
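`pcmToWav` above prepends the canonical 44-byte RIFF/WAVE header to the raw 24 kHz, 16-bit, mono PCM. `synthesize` fills in `audioDuration` only for MP3 output, which includes the WAV fallback case. Uncompressed PCM has a fixed byte rate, so a WAV result's duration follows from the same header fields. The standalone sketch below mirrors the header layout and adds a duration helper. The helper name `pcmDurationSeconds` and the 24000 Hz default are assumptions taken from the JSDoc comments, not part of the package's API.

```javascript
'use strict';

// Assumed defaults, matching the "24kHz, 16-bit, mono, little-endian" JSDoc above.
const SAMPLE_RATE = 24000;
const CHANNELS = 1;
const BITS_PER_SAMPLE = 16;

/** Prepend a 44-byte canonical WAV header to raw little-endian PCM. */
function pcmToWav(pcm, sampleRate = SAMPLE_RATE) {
  const blockAlign = CHANNELS * (BITS_PER_SAMPLE / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;            // bytes per second
  const header = Buffer.alloc(44);
  header.write('RIFF', 0);
  header.writeUInt32LE(36 + pcm.length, 4);  // RIFF chunk size = total file size - 8
  header.write('WAVE', 8);
  header.write('fmt ', 12);
  header.writeUInt32LE(16, 16);              // fmt chunk size (16 for PCM)
  header.writeUInt16LE(1, 20);               // audio format 1 = linear PCM
  header.writeUInt16LE(CHANNELS, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(BITS_PER_SAMPLE, 34);
  header.write('data', 36);
  header.writeUInt32LE(pcm.length, 40);      // data chunk size in bytes
  return Buffer.concat([header, pcm]);
}

/** Hypothetical helper: playback length of raw PCM, in seconds. */
function pcmDurationSeconds(byteLength, sampleRate = SAMPLE_RATE) {
  return byteLength / (sampleRate * CHANNELS * (BITS_PER_SAMPLE / 8));
}
```

At 24 kHz, 16-bit mono, the byte rate is 48,000 bytes per second. One second of audio is therefore 48,000 PCM bytes, and for a WAV built this way the duration is `pcmDurationSeconds(wav.length - 44)`.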
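`pcmToMp3` writes to `ffmpeg.stdin` without registering an `'error'` listener on that stream. Two cases can cause trouble: the `ffmpeg` binary is missing, or the encoder exits before it reads all its input. In either case Node may surface the failure as an error on the stdin stream (for example `EPIPE`) in addition to the child-process `'error'` event. An unhandled stream `'error'` event terminates the process instead of reaching the WAV fallback in `convertPcmAudio`.

The sketch below shows one way to harden the same spawn call. It is not the package's code. The `ffmpegPath` parameter is hypothetical and exists only so the failure path can be exercised with a binary name that does not exist.

```javascript
'use strict';
const { spawn } = require('node:child_process');

/**
 * Encode raw 16-bit mono little-endian PCM to MP3 with ffmpeg.
 * `ffmpegPath` is a hypothetical parameter; the package hard-codes 'ffmpeg'.
 */
function pcmToMp3(pcmBuffer, ffmpegPath = 'ffmpeg', sampleRate = 24000) {
  return new Promise((resolve, reject) => {
    const ffmpeg = spawn(ffmpegPath, [
      '-f', 's16le', '-ar', String(sampleRate), '-ac', '1', '-i', 'pipe:0',
      '-codec:a', 'libmp3lame', '-b:a', '128k', '-f', 'mp3', 'pipe:1',
    ]);
    const chunks = [];
    ffmpeg.stdout.on('data', (chunk) => chunks.push(chunk));
    ffmpeg.stderr.resume(); // drain stderr so a full pipe cannot stall ffmpeg
    // Spawn failures (e.g. ENOENT when the binary is missing) are emitted here.
    ffmpeg.on('error', (err) => reject(new Error(`ffmpeg spawn failed: ${err.message}`)));
    // Without this listener, a failed write to stdin would be an unhandled 'error' event.
    ffmpeg.stdin.on('error', (err) => reject(new Error(`ffmpeg stdin failed: ${err.message}`)));
    ffmpeg.on('close', (code) => {
      // A promise settles only once, so rejecting again after 'error' is a no-op.
      if (code === 0) resolve(Buffer.concat(chunks));
      else reject(new Error(`ffmpeg exited with code ${code}`));
    });
    ffmpeg.stdin.end(pcmBuffer); // write the whole buffer, then close stdin
  });
}
```

In the package, a rejection from this step is what lets `convertPcmAudio` log its warning and return WAV output instead.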
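`callAPI` reads audio only from `candidates[0].content.parts[0].inlineData`, and the conversion helpers assume a fixed 24 kHz rate. A more defensive reader could scan every part for inline audio and take the rate from the part's `mimeType` when one is present.

The sketch below rests on an assumption this package does not confirm: that the MIME type carries a `rate=` parameter, as in a value shaped like `audio/L16;codec=pcm;rate=24000`. It falls back to 24000 otherwise. `extractInlineAudio` is a hypothetical helper name.

```javascript
'use strict';

/**
 * Hypothetical helper: return the first inline audio part of a
 * generateContent response as { pcm, sampleRate }, or null if there is none.
 * Assumes the sample rate, when given, appears as "rate=<n>" in mimeType.
 */
function extractInlineAudio(responseJson, defaultRate = 24000) {
  const parts = responseJson?.candidates?.[0]?.content?.parts ?? [];
  for (const part of parts) {
    const inline = part?.inlineData;
    if (!inline?.data) continue; // skip text-only or empty parts
    const match = /rate=(\d+)/.exec(inline.mimeType ?? '');
    return {
      pcm: Buffer.from(inline.data, 'base64'),
      sampleRate: match ? Number(match[1]) : defaultRate,
    };
  }
  return null;
}
```

The helper returns `null` instead of throwing. This leaves the caller free to raise the same `Gemini API returned no audio data` error the package already uses, and to pass the detected rate on to the WAV or MP3 conversion.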