kugelaudio 0.1.1

Files changed (13)

package/CHANGELOG.md ADDED

@@ -0,0 +1,20 @@
+# Changelog
+All notable changes to the KugelAudio JavaScript/TypeScript SDK will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.1.0] - 2024-12-17
+### Added
+- Initial release of the KugelAudio JavaScript/TypeScript SDK
+- **Models API**: List available TTS models (`client.models.list()`)
+- **Voices API**: List voices (`client.voices.list()`) and get voice details (`client.voices.get()`)
+- **TTS Generation**: Generate complete audio (`client.tts.generate()`)
+- **Streaming**: Real-time audio streaming via WebSocket (`client.tts.stream()`)
+- **Audio Utilities**: `createWavBlob()`, `createWavFile()`, `decodePCM16()`, `base64ToArrayBuffer()`
+- **TypeScript**: Full type definitions for all APIs
+- **Error Handling**: Typed exceptions for auth, rate limits, validation errors
+- **Single URL Architecture**: Connect to TTS server directly for minimal latency
+- **Browser Support**: Works in modern browsers with WebSocket support

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2024 KugelAudio
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,499 @@
+# KugelAudio JavaScript/TypeScript SDK
+Official JavaScript/TypeScript SDK for the KugelAudio Text-to-Speech API.
+## Installation
+```bash
+npm install kugelaudio
+```
+Or with yarn:
+```bash
+yarn add kugelaudio
+```
+Or with pnpm:
+```bash
+pnpm add kugelaudio
+```
+## Quick Start
+```typescript
+import { KugelAudio, createWavBlob } from 'kugelaudio';
+// Initialize the client - just needs an API key!
+const client = new KugelAudio({ apiKey: 'your_api_key' });
+// Generate speech
+const audio = await client.tts.generate({
+  text: 'Hello, world!',
+  model: 'kugel-one-turbo',
+});
+// audio.audio is raw PCM16, so wrap it in a WAV container before playback (browser)
+const blob = createWavBlob(audio.audio, audio.sampleRate);
+const url = URL.createObjectURL(blob);
+const audioElement = new Audio(url);
+audioElement.play();
+```
+## Client Configuration
+```typescript
+import { KugelAudio } from 'kugelaudio';
+// Simple setup - single URL handles everything
+const client = new KugelAudio({ apiKey: 'your_api_key' });
+// Or with custom options
+const customClient = new KugelAudio({
+  apiKey: 'your_api_key',           // Required: Your API key
+  apiUrl: 'https://api.kugelaudio.com',  // Optional: API base URL (default)
+  timeout: 60000,                    // Optional: Request timeout in ms
+});
+```
+### Single URL Architecture
+The SDK uses a **single URL** for both the REST API and WebSocket streaming. The TTS server itself exposes the REST endpoints (`/v1/models`, `/v1/voices`) and the WebSocket endpoint (`/ws/tts`), so no proxy is needed and latency stays minimal.
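To make the single-URL idea concrete, here is a minimal sketch of how one base URL could map to both kinds of endpoint. The helpers `joinUrl`, `restUrl` and `webSocketUrl` are hypothetical and not part of the SDK. Only the paths `/v1/models` and `/ws/tts` come from the text above, and the SDK's actual URL handling may differ.

```typescript
// Hypothetical helpers, not part of the kugelaudio package.
// They only illustrate how a single base URL can serve REST and WebSocket traffic.

/** Join a base URL and a path without producing a double slash. */
function joinUrl(base: string, path: string): string {
  return base.replace(/\/+$/, '') + '/' + path.replace(/^\/+/, '');
}

/** REST endpoints keep the http(s) scheme of the base URL. */
function restUrl(apiUrl: string, path: string): string {
  return joinUrl(apiUrl, path);
}

/** WebSocket endpoints swap http -> ws and https -> wss. */
function webSocketUrl(apiUrl: string, path: string): string {
  return joinUrl(apiUrl.replace(/^http/i, 'ws'), path);
}

console.log(restUrl('https://api.kugelaudio.com', '/v1/models'));
console.log(webSocketUrl('https://api.kugelaudio.com', '/ws/tts'));
```

The same mapping applied to `http://localhost:8000` would give a plain `ws://` address, which is what a local TTS server without TLS would need.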
+### Local Development
+For local development, point directly to your TTS server:
+```typescript
+const client = new KugelAudio({
+  apiKey: 'your_api_key',
+  apiUrl: 'http://localhost:8000',   // TTS server handles everything
+});
+```
+Or if you have separate backend and TTS servers:
+```typescript
+const client = new KugelAudio({
+  apiKey: 'your_api_key',
+  apiUrl: 'http://localhost:8001',   // Backend for REST API
+  ttsUrl: 'http://localhost:8000',   // TTS server for WebSocket streaming
+});
+```
+## Available Models
+| Model ID | Name | Parameters | Description |
+|----------|------|------------|-------------|
+| `kugel-one-turbo` | Kugel One Turbo | 1.5B | Fast, low-latency model for real-time applications |
+| `kugel-one` | Kugel One | 7B | Premium quality model for pre-recorded content |
+### List Available Models
+```typescript
+const models = await client.models.list();
+for (const model of models) {
+  console.log(`${model.id}: ${model.name}`);
+  console.log(`  Description: ${model.description}`);
+  console.log(`  Parameters: ${model.parameters}`);
+  console.log(`  Max Input: ${model.maxInputLength} characters`);
+  console.log(`  Sample Rate: ${model.sampleRate} Hz`);
+}
+```
+## Voices
+### List Available Voices
+```typescript
+// List all available voices
+const voices = await client.voices.list();
+for (const voice of voices) {
+  console.log(`${voice.id}: ${voice.name}`);
+  console.log(`  Category: ${voice.category}`);
+  console.log(`  Languages: ${voice.supportedLanguages.join(', ')}`);
+}
+// Filter by language
+const germanVoices = await client.voices.list({ language: 'de' });
+// Include public voices in the results
+const publicVoices = await client.voices.list({ includePublic: true });
+// Limit results
+const first10 = await client.voices.list({ limit: 10 });
+```
+### Get a Specific Voice
+```typescript
+const voice = await client.voices.get(123);
+console.log(`Voice: ${voice.name}`);
+console.log(`Sample text: ${voice.sampleText}`);
+```
+## Text-to-Speech Generation
+### Basic Generation (Non-Streaming)
+Generate complete audio and receive it all at once:
+```typescript
+const audio = await client.tts.generate({
+  text: 'Hello, this is a test of the KugelAudio text-to-speech system.',
+  model: 'kugel-one-turbo',  // 'kugel-one-turbo' (fast) or 'kugel-one' (quality)
+  voiceId: 123,              // Optional: specific voice ID
+  cfgScale: 2.0,             // Guidance scale (1.0-5.0)
+  maxNewTokens: 2048,        // Maximum tokens to generate
+  sampleRate: 24000,         // Output sample rate
+  speakerPrefix: true,       // Add speaker prefix for better quality
+});
+// Audio properties
+console.log(`Duration: ${audio.durationMs}ms`);
+console.log(`Samples: ${audio.samples}`);
+console.log(`Sample rate: ${audio.sampleRate} Hz`);
+console.log(`Generation time: ${audio.generationMs}ms`);
+console.log(`RTF: ${audio.rtf}`);  // Real-time factor
+// audio.audio is an ArrayBuffer with PCM16 data
+```
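The reported properties are related by simple arithmetic. The sketch below shows that relationship under two assumptions that the README does not spell out: the audio is mono, and RTF follows the common convention of generation time divided by audio duration. The function names are illustrative, not SDK exports.

```typescript
// Illustrative only: how the reported numbers relate to each other,
// assuming mono 16-bit PCM and RTF = generation time / audio duration.

/** Number of samples in a mono PCM16 buffer (2 bytes per sample). */
function sampleCount(pcm: ArrayBuffer): number {
  return Math.floor(pcm.byteLength / 2);
}

/** Audio duration in milliseconds for a given sample count and rate. */
function audioDurationMs(samples: number, sampleRate: number): number {
  return (samples / sampleRate) * 1000;
}

/** Real-time factor; under this convention, values below 1 mean faster than real time. */
function realTimeFactor(generationMs: number, durationMs: number): number {
  return generationMs / durationMs;
}

const examplePcm = new ArrayBuffer(96000); // 96,000 bytes of PCM16
const exampleSamples = sampleCount(examplePcm);
const exampleDurationMs = audioDurationMs(exampleSamples, 24000);
console.log(exampleSamples, exampleDurationMs, realTimeFactor(500, exampleDurationMs));
```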
+### Playing Audio in Browser
+```typescript
+import { createWavBlob } from 'kugelaudio';
+const audio = await client.tts.generate({
+  text: 'Hello, world!',
+  model: 'kugel-one-turbo',
+});
+// Create WAV blob for playback
+const wavBlob = createWavBlob(audio.audio, audio.sampleRate);
+const url = URL.createObjectURL(wavBlob);
+// Play with Audio element
+const audioElement = new Audio(url);
+audioElement.play();
+// Or with Web Audio API
+const audioContext = new AudioContext();
+const arrayBuffer = await wavBlob.arrayBuffer();
+const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
+const source = audioContext.createBufferSource();
+source.buffer = audioBuffer;
+source.connect(audioContext.destination);
+source.start();
+```
+### Streaming Audio Output
+Receive audio chunks as they are generated for lower latency:
+```typescript
+await client.tts.stream(
+  {
+    text: 'Hello, this is streaming audio.',
+    model: 'kugel-one-turbo',
+  },
+  {
+    onOpen: () => {
+      console.log('WebSocket connected');
+    },
+    onChunk: (chunk) => {
+      console.log(`Chunk ${chunk.index}: ${chunk.samples} samples`);
+      // chunk.audio is base64-encoded PCM16 data
+      // Use base64ToArrayBuffer() to decode
+      // playAudioChunk is a placeholder for your own playback code (see "Processing Audio Chunks")
+      playAudioChunk(chunk);
+    },
+    onFinal: (stats) => {
+      console.log(`Total duration: ${stats.durationMs}ms`);
+      console.log(`Time to first audio: ${stats.ttfaMs}ms`);
+      console.log(`Generation time: ${stats.generationMs}ms`);
+      console.log(`RTF: ${stats.rtf}`);
+    },
+    onError: (error) => {
+      console.error('TTS error:', error);
+    },
+    onClose: () => {
+      console.log('WebSocket closed');
+    },
+  }
+);
+```
+### Processing Audio Chunks
+```typescript
+import { base64ToArrayBuffer, decodePCM16 } from 'kugelaudio';
+const audioContext = new AudioContext();
+// Pass this function as the onChunk streaming callback:
+const onChunk = (chunk: { audio: string; sampleRate: number }) => {
+  // Raw PCM16 bytes, in case you also want to keep them (e.g. to build a WAV file later)
+  const pcmBuffer = base64ToArrayBuffer(chunk.audio);
+  // Convert base64 PCM16 to Float32 for the Web Audio API
+  const float32Data = decodePCM16(chunk.audio);
+  // Play with the Web Audio API
+  const audioBuffer = audioContext.createBuffer(1, float32Data.length, chunk.sampleRate);
+  audioBuffer.copyToChannel(float32Data, 0);
+  const source = audioContext.createBufferSource();
+  source.buffer = audioBuffer;
+  source.connect(audioContext.destination);
+  source.start();
+};
+```
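The snippet above starts every chunk as soon as it arrives, so chunks that arrive faster than real time would overlap. One possible remedy is to queue each chunk behind the previous one. The `ChunkScheduler` class below is a hypothetical helper, not part of the SDK, and only does the bookkeeping. In a browser its result would be passed to `source.start(when)`.

```typescript
// Hypothetical helper, not part of the kugelaudio package.
// Tracks when the next chunk should start so chunks play back to back.
class ChunkScheduler {
  private nextStart = 0;

  /**
   * Returns the start time (in seconds) for a chunk of the given length.
   * `now` is the current clock time, e.g. audioContext.currentTime.
   */
  schedule(now: number, samples: number, sampleRate: number): number {
    // Never schedule in the past; after an underrun, restart from `now`.
    const startAt = Math.max(now, this.nextStart);
    this.nextStart = startAt + samples / sampleRate;
    return startAt;
  }
}

const scheduler = new ChunkScheduler();
console.log(scheduler.schedule(0, 12000, 24000));   // first chunk starts immediately
console.log(scheduler.schedule(0.1, 12000, 24000)); // second chunk is queued behind the first
```

In the callback this would look like `source.start(scheduler.schedule(audioContext.currentTime, float32Data.length, chunk.sampleRate))`.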
+## Error Handling
+```typescript
+import {
+  KugelAudio,
+  KugelAudioError,
+  AuthenticationError,
+  RateLimitError,
+  InsufficientCreditsError,
+  ValidationError,
+  ConnectionError,
+} from 'kugelaudio';
+const client = new KugelAudio({ apiKey: 'your_api_key' });
+try {
+  const audio = await client.tts.generate({ text: 'Hello!' });
+} catch (error) {
+  if (error instanceof AuthenticationError) {
+    console.error('Invalid API key');
+  } else if (error instanceof RateLimitError) {
+    console.error('Rate limit exceeded, please wait');
+  } else if (error instanceof InsufficientCreditsError) {
+    console.error('Not enough credits, please top up');
+  } else if (error instanceof ValidationError) {
+    console.error(`Invalid request: ${error.message}`);
+  } else if (error instanceof ConnectionError) {
+    console.error('Failed to connect to server');
+  } else if (error instanceof KugelAudioError) {
+    console.error(`API error: ${error.message}`);
+  }
+}
+```
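The handler above only logs a rate-limit error. If retrying is appropriate for your application, a generic backoff wrapper is one option. The sketch below is an assumption-laden illustration: `backoffDelayMs` and `withRetry` are hypothetical helpers, the delays are arbitrary, and the README does not say whether the SDK retries on its own.

```typescript
// Hypothetical helpers, not part of the kugelaudio package.

/** Exponential backoff: baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/** Retry `fn` while `shouldRetry(error)` is true, up to `maxAttempts` calls in total. */
async function withRetry<T>(
  fn: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  maxAttempts = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up on non-retryable errors or when the attempts are exhausted.
      if (!shouldRetry(error) || attempt + 1 >= maxAttempts) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

With the SDK this might be called as `withRetry(() => client.tts.generate({ text: 'Hello!' }), (e) => e instanceof RateLimitError)`.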
+## TypeScript Types
+### KugelAudioOptions
+```typescript
+interface KugelAudioOptions {
+  apiKey: string;      // Required
+  apiUrl?: string;     // Default: 'https://api.kugelaudio.com'
+  ttsUrl?: string;     // Default: same as apiUrl; set only when a separate TTS server handles WebSocket streaming
+  timeout?: number;    // Default: 60000 (ms)
+}
+```
+### GenerateOptions
+```typescript
+interface GenerateOptions {
+  text: string;            // Required: Text to synthesize
+  model?: string;          // Default: 'kugel-one-turbo'
+  voiceId?: number;        // Optional: Voice ID
+  cfgScale?: number;       // Default: 2.0
+  maxNewTokens?: number;   // Default: 2048
+  sampleRate?: number;     // Default: 24000
+  speakerPrefix?: boolean; // Default: true
+}
+```
+### AudioChunk
+```typescript
+interface AudioChunk {
+  audio: string;       // Base64-encoded PCM16 audio
+  encoding: string;    // 'pcm_s16le'
+  index: number;       // Chunk index (0-based)
+  sampleRate: number;  // Sample rate (24000)
+  samples: number;     // Number of samples in chunk
+}
+```
+### AudioResponse
+```typescript
+interface AudioResponse {
+  audio: ArrayBuffer;     // Complete PCM16 audio
+  sampleRate: number;     // Sample rate (24000)
+  samples: number;        // Total samples
+  durationMs: number;     // Duration in milliseconds
+  generationMs: number;   // Generation time in milliseconds
+  rtf: number;           // Real-time factor
+}
+```
+### GenerationStats
+```typescript
+interface GenerationStats {
+  final: true;
+  chunks: number;         // Number of chunks generated
+  totalSamples: number;   // Total samples generated
+  durationMs: number;     // Audio duration in ms
+  generationMs: number;   // Generation time in ms
+  ttfaMs: number;         // Time to first audio in ms
+  rtf: number;           // Real-time factor
+}
+```
+### StreamCallbacks
+```typescript
+interface StreamCallbacks {
+  onOpen?: () => void;
+  onChunk?: (chunk: AudioChunk) => void;
+  onFinal?: (stats: GenerationStats) => void;
+  onError?: (error: Error) => void;
+  onClose?: () => void;
+}
+```
+### Model
+```typescript
+interface Model {
+  id: string;             // 'kugel-one-turbo' or 'kugel-one'
+  name: string;           // Human-readable name
+  description: string;    // Model description
+  parameters: string;     // Parameter count ('1.5B', '7B')
+  maxInputLength: number; // Maximum input characters
+  sampleRate: number;     // Output sample rate
+}
+```
+### Voice
+```typescript
+interface Voice {
+  id: number;                    // Voice ID
+  name: string;                  // Voice name
+  description?: string;          // Description
+  category?: VoiceCategory;      // 'premade' | 'cloned' | 'generated'
+  sex?: VoiceSex;               // 'male' | 'female' | 'neutral'
+  age?: VoiceAge;               // 'young' | 'middle_aged' | 'old'
+  supportedLanguages: string[]; // ['en', 'de', ...]
+  sampleText?: string;          // Sample text for preview
+  avatarUrl?: string;           // Avatar image URL
+  sampleUrl?: string;           // Sample audio URL
+  isPublic: boolean;            // Whether voice is public
+  verified: boolean;            // Whether voice is verified
+}
+```
+## Utility Functions
+### base64ToArrayBuffer
+Convert base64 string to ArrayBuffer:
+```typescript
+import { base64ToArrayBuffer } from 'kugelaudio';
+const buffer = base64ToArrayBuffer(chunk.audio);
+```
+### decodePCM16
+Convert base64 PCM16 to Float32Array for Web Audio API:
+```typescript
+import { decodePCM16 } from 'kugelaudio';
+const floatData = decodePCM16(chunk.audio);
+```
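For readers curious about the conversion itself, the sketch below shows the usual PCM16-to-float step. It is not the SDK's implementation of `decodePCM16`. The function `pcm16ToFloat32` is a hypothetical stand-in that takes already-decoded bytes rather than a base64 string, and it assumes the `pcm_s16le` encoding listed under `AudioChunk`.

```typescript
// Illustrative sketch, not the SDK's implementation of decodePCM16.
// Converts little-endian signed 16-bit PCM ('pcm_s16le') into floats in [-1, 1).
function pcm16ToFloat32(pcm: ArrayBuffer): Float32Array {
  const view = new DataView(pcm);
  const out = new Float32Array(Math.floor(pcm.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    // `true` selects little-endian byte order.
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Three samples (0, 16384 and -32768) written in little-endian byte order.
const examplePcm16 = new ArrayBuffer(6);
new Uint8Array(examplePcm16).set([0x00, 0x00, 0x00, 0x40, 0x00, 0x80]);
console.log(Array.from(pcm16ToFloat32(examplePcm16))); // [0, 0.5, -1]
```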
+### createWavFile
+Create a WAV file from PCM16 data:
+```typescript
+import { createWavFile } from 'kugelaudio';
+const wavBuffer = createWavFile(pcmArrayBuffer, 24000);
+```
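A WAV file is the raw PCM data preceded by a short header. The sketch below writes the canonical 44-byte RIFF/WAVE header for 16-bit PCM. It is an illustration of the file format, not the SDK's `createWavFile`. The name `buildWavFile` and the `channels` parameter are hypothetical.

```typescript
// Illustrative sketch, not the SDK's implementation of createWavFile.
// Prepends the canonical 44-byte RIFF/WAVE header to 16-bit PCM data.
function buildWavFile(pcm: ArrayBuffer, sampleRate: number, channels = 1): ArrayBuffer {
  const bytesPerSample = 2;
  const blockAlign = channels * bytesPerSample;
  const wav = new ArrayBuffer(44 + pcm.byteLength);
  const view = new DataView(wav);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + pcm.byteLength, true);      // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                      // fmt chunk size for PCM
  view.setUint16(20, 1, true);                       // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bytesPerSample * 8, true);      // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, pcm.byteLength, true);
  new Uint8Array(wav, 44).set(new Uint8Array(pcm));  // copy the samples after the header
  return wav;
}

const exampleWav = buildWavFile(new ArrayBuffer(48000), 24000);
console.log(exampleWav.byteLength); // 44-byte header + 48000 bytes of PCM
```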
+### createWavBlob
+Create a playable Blob from PCM16 data:
+```typescript
+import { createWavBlob } from 'kugelaudio';
+const blob = createWavBlob(pcmArrayBuffer, 24000);
+const url = URL.createObjectURL(blob);
+```
+## Complete Example
+```typescript
+import { KugelAudio, base64ToArrayBuffer } from 'kugelaudio';
+async function main() {
+  // Initialize client
+  const client = new KugelAudio({ apiKey: 'your_api_key' });
+  // List available models
+  console.log('Available Models:');
+  const models = await client.models.list();
+  for (const model of models) {
+    console.log(`  - ${model.id}: ${model.name} (${model.parameters})`);
+  }
+  // List available voices
+  console.log('\nAvailable Voices:');
+  const voices = await client.voices.list({ limit: 5 });
+  for (const voice of voices) {
+    console.log(`  - ${voice.id}: ${voice.name}`);
+  }
+  // Generate audio with streaming
+  console.log('\nGenerating audio (streaming)...');
+  const chunks: ArrayBuffer[] = [];
+  let ttfa: number | undefined;
+  const startTime = Date.now();
+  await client.tts.stream(
+    {
+      text: 'Welcome to KugelAudio. This is an example of high-quality text-to-speech synthesis.',
+      model: 'kugel-one-turbo',
+    },
+    {
+      onChunk: (chunk) => {
+        if (ttfa === undefined) {
+          ttfa = Date.now() - startTime;
+          console.log(`Time to first audio: ${ttfa}ms`);
+        }
+        chunks.push(base64ToArrayBuffer(chunk.audio));
+      },
+      onFinal: (stats) => {
+        console.log(`Generated ${stats.durationMs}ms of audio`);
+        console.log(`Generation time: ${stats.generationMs}ms`);
+        console.log(`RTF: ${stats.rtf}x`);
+      },
+    }
+  );
+}
+main().catch(console.error);
+```
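The example collects the decoded chunks in an array but never joins them. A minimal sketch of that last step follows. `concatArrayBuffers` is a hypothetical helper, not an SDK export. Its result could then be handed to `createWavBlob()` together with the sample rate.

```typescript
// Hypothetical helper, not part of the kugelaudio package.
// Joins streamed PCM chunks into one contiguous buffer.
function concatArrayBuffers(buffers: ArrayBuffer[]): ArrayBuffer {
  const total = buffers.reduce((sum, buffer) => sum + buffer.byteLength, 0);
  const joined = new ArrayBuffer(total);
  const bytes = new Uint8Array(joined);
  let offset = 0;
  for (const buffer of buffers) {
    bytes.set(new Uint8Array(buffer), offset);
    offset += buffer.byteLength;
  }
  return joined;
}

const chunkA = new ArrayBuffer(2);
new Uint8Array(chunkA).set([1, 2]);
const chunkB = new ArrayBuffer(3);
new Uint8Array(chunkB).set([3, 4, 5]);
console.log(Array.from(new Uint8Array(concatArrayBuffers([chunkA, chunkB])))); // [1, 2, 3, 4, 5]
```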
+## Browser Support
+The SDK works in modern browsers with WebSocket support. For Node.js, ensure you have a WebSocket implementation available.
+## License
+MIT