@drawdream/livespeech 0.1.10 → 0.1.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -10,7 +10,7 @@ A TypeScript/JavaScript SDK for real-time speech-to-speech AI conversations.
  - đŸŽ™ī¸ **Real-time Voice Conversations** - Natural, low-latency voice interactions
  - 🌐 **Multi-language Support** - Korean, English, Japanese, Chinese, and more
  - 🔊 **Streaming Audio** - Send and receive audio in real-time
- - 📝 **Live Transcription** - Get transcriptions of both user and AI speech
+ - âšī¸ **Barge-in Support** - Interrupt AI mid-speech by talking or programmatically
  - 🔄 **Auto-reconnection** - Automatic recovery from network issues
  - 🌐 **Browser & Node.js** - Works in both environments
 
@@ -18,13 +18,9 @@ A TypeScript/JavaScript SDK for real-time speech-to-speech AI conversations.
 
  ```bash
  npm install @drawdream/livespeech
- # or
- yarn add @drawdream/livespeech
- # or
- pnpm add @drawdream/livespeech
  ```
 
- ## Quick Start
+ ## Quick Start (5 minutes)
 
  ```typescript
  import { LiveSpeechClient } from '@drawdream/livespeech';
@@ -34,31 +30,28 @@ const client = new LiveSpeechClient({
    apiKey: 'your-api-key',
  });
 
- // Set up event handlers
- client.setUserTranscriptHandler((text) => {
-   console.log('You:', text);
+ // Handle only 4 essential events!
+ client.setAudioHandler((audioData) => {
+   audioPlayer.queue(audioData); // PCM16 — use event.sampleRate (24kHz Live, 16kHz Composed)
  });
 
- client.setResponseHandler((text, isFinal) => {
-   console.log('AI:', text);
+ client.on('interrupted', () => {
+   audioPlayer.clear(); // CRITICAL: Clear buffer on interrupt!
  });
 
- client.setAudioHandler((audioData) => {
-   playAudio(audioData); // PCM16 @ 24kHz
+ client.on('turnComplete', () => {
+   console.log('AI finished');
  });
 
  client.setErrorHandler((error) => {
    console.error('Error:', error.message);
  });
 
- // Connect and start conversation
+ // Connect and start
  await client.connect();
- await client.startSession({
-   prePrompt: 'You are a helpful assistant.',
-   language: 'ko-KR',
- });
+ await client.startSession({ prePrompt: 'You are a helpful assistant.' });
 
- // Stream audio
+ // Send audio
  client.audioStart();
  client.sendAudioChunk(pcmData); // PCM16 @ 16kHz
  client.audioEnd();
@@ -68,380 +61,267 @@ await client.endSession();
  client.disconnect();
  ```
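The `audioPlayer` used above is application code, not an SDK export. A minimal browser-side sketch of one (assumptions: Web Audio API is available, and `queue`/`clear`/`stop` behave as the Quick Start implies):

```typescript
// Hypothetical `audioPlayer` for the Quick Start (not part of @drawdream/livespeech).
class PcmPlayer {
  private ctx: any = null;       // AudioContext, created lazily on first chunk
  private scheduled: any[] = []; // buffer sources that are queued or playing
  private nextStart = 0;         // next gapless start time, in seconds

  // Decode little-endian PCM16 bytes into Float32 samples in [-1, 1).
  static decode(bytes: Uint8Array): Float32Array {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const out = new Float32Array(bytes.byteLength / 2);
    for (let i = 0; i < out.length; i++) out[i] = view.getInt16(i * 2, true) / 32768;
    return out;
  }

  queue(bytes: Uint8Array, sampleRate = 24000): void {
    this.ctx ??= new (globalThis as any).AudioContext();
    const samples = PcmPlayer.decode(bytes);
    const buffer = this.ctx.createBuffer(1, samples.length, sampleRate);
    buffer.copyToChannel(samples, 0);
    const src = this.ctx.createBufferSource();
    src.buffer = buffer;
    src.connect(this.ctx.destination);
    // Schedule chunks back-to-back so playback stays gapless.
    this.nextStart = Math.max(this.nextStart, this.ctx.currentTime);
    src.start(this.nextStart);
    this.nextStart += buffer.duration;
    this.scheduled.push(src);
  }

  // Drop everything still scheduled; required when `interrupted` fires.
  clear(): void {
    for (const src of this.scheduled) src.stop();
    this.scheduled = [];
    this.nextStart = 0;
  }

  stop(): void { this.clear(); }
}
```

`decode` is the only pure part; the scheduling details (gapless `start` times, clearing on interrupt) are one reasonable design among several.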
 
- ## Audio Flow
+ ---
 
- ```
- connect() → startSession() → audioStart() → sendAudioChunk()* → audioEnd() → endSession()
-                  ↓
- sendSystemMessage() (optional, during live session)
- sendToolResponse() (when toolCall received)
- ```
+ # Core API
 
- | Step | Description |
- |------|-------------|
- | `connect()` | Establish WebSocket connection |
- | `startSession(config)` | Start conversation with optional system prompt |
- | `audioStart()` | Begin audio streaming |
- | `sendAudioChunk(data)` | Send PCM16 audio (call multiple times) |
- | `sendSystemMessage(msg)` | Inject context or trigger AI response (optional) |
- | `sendToolResponse(id, result)` | Send function result back to AI (after toolCall) |
- | `updateUserId(userId)` | Migrate guest session to user account |
- | `audioEnd()` | End streaming, triggers AI response |
- | `endSession()` | End conversation |
+ Everything you need for basic voice conversations.
+
+ ## Methods
+
+ | Method | Description |
+ |--------|-------------|
+ | `connect()` | Establish connection |
  | `disconnect()` | Close connection |
+ | `startSession(config)` | Start conversation with system prompt |
+ | `endSession()` | End conversation |
+ | `sendAudioChunk(data)` | Send PCM16 audio (16kHz) |
+
+ ## Events
+
+ | Event | Description | Action Required |
+ |-------|-------------|-----------------|
+ | `audio` | AI's audio output | Play audio (PCM16 — check `sampleRate`) |
+ | `turnComplete` | AI finished speaking | Ready for next input |
+ | `interrupted` | User barged in | **Clear audio buffer!** |
+ | `error` | Error occurred | Handle/log error |
+
+ ### âš ī¸ Critical: Handle `interrupted`
+
+ When the user speaks while the AI is responding, **you must clear your audio buffer**:
+
+ ```typescript
+ client.on('interrupted', () => {
+   audioPlayer.clear(); // Stop buffered audio immediately
+   audioPlayer.stop();
+ });
+ ```
+
+ Without this, 2-3 seconds of buffered audio continues playing after the user interrupts.
+
+ ## Audio Format
+
+ | Direction | Format | Sample Rate |
+ |-----------|--------|-------------|
+ | Input (mic) | PCM16 | 16,000 Hz |
+ | Output (AI) — Live mode | PCM16 | 24,000 Hz |
+ | Output (AI) — Composed mode | PCM16 | 16,000 Hz |
+
+ > **Important:** The `audio` event includes a `sampleRate` field. Always use it to configure your audio decoder rather than hardcoding a rate.
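For example (a sketch; `player` stands for your own playback helper, not an SDK export):

```typescript
client.on('audio', (event) => {
  // event.data: Uint8Array (PCM16); event.sampleRate: 24000 (live) or 16000 (composed)
  player.play(event.data, event.sampleRate);
});
```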
 
  ## Configuration
 
  ```typescript
  const client = new LiveSpeechClient({
-   region: 'ap-northeast-2',  // Required: Seoul region
-   apiKey: 'your-api-key',    // Required: Your API key
-   userId: 'user-123',        // Optional: Enable conversation memory
-   autoReconnect: true,       // Auto-reconnect on disconnect
-   maxReconnectAttempts: 5,   // Maximum reconnection attempts
-   debug: false,              // Enable debug logging
+   region: 'ap-northeast-2', // Required
+   apiKey: 'your-api-key',   // Required
  });
 
  await client.startSession({
    prePrompt: 'You are a helpful assistant.',
-   language: 'ko-KR',         // Language: ko-KR, en-US, ja-JP, etc.
-   pipelineMode: 'live',      // 'live' (default) or 'composed'
-   aiSpeaksFirst: false,      // AI speaks first (live mode only)
-   allowHarmCategory: false,  // Disable safety filtering (use with caution)
-   tools: [{ name: 'func', description: 'desc', parameters: {...} }], // Function calling
+   language: 'ko-KR', // Optional: ko-KR, en-US, ja-JP, etc.
  });
  ```
 
- ## Session Options
-
- | Option | Type | Default | Description |
- |--------|------|---------|-------------|
- | `prePrompt` | `string` | - | System prompt for the AI assistant |
- | `language` | `string` | `'en-US'` | Language code (e.g., `ko-KR`, `ja-JP`) |
- | `pipelineMode` | `'live' \| 'composed'` | `'live'` | Audio processing mode |
- | `aiSpeaksFirst` | `boolean` | `false` | AI initiates conversation (live mode only) |
- | `allowHarmCategory` | `boolean` | `false` | Disable content safety filtering |
- | `tools` | `Tool[]` | `undefined` | Function definitions for AI to call |
-
- ### Pipeline Modes
-
- | Mode | Latency | Description |
- |------|---------|-------------|
- | `live` | Lower (~300ms) | Direct audio-to-audio via Live API |
- | `composed` | Higher (~1-2s) | Separate STT → LLM → TTS pipeline |
+ ---
 
- ### AI Speaks First
+ # Composed Mode
 
- When `aiSpeaksFirst: true`, the AI will immediately speak a greeting based on your `prePrompt`:
+ Use composed mode for higher accuracy at the cost of slightly more latency. It runs a separate STT → LLM → TTS pipeline instead of direct audio-to-audio.
 
  ```typescript
  await client.startSession({
-   prePrompt: 'You are a customer service agent. Greet the customer warmly and ask how you can help.',
-   aiSpeaksFirst: true,
+   prePrompt: 'You are a helpful assistant.',
+   pipelineMode: 'composed',
+   language: 'ko-KR',
  });
 
- client.audioStart(); // AI greeting plays immediately
+ client.audioStart();
+ // Send/receive audio the same way as in live mode
  ```
142
 
146
- > âš ī¸ **Note**: Only works with `pipelineMode: 'live'`
143
+ ### Live vs Composed
147
144
 
148
- ### Content Safety
145
+ | | Live | Composed |
146
+ |---|---|---|
147
+ | **Latency** | ~300ms | ~1-2s |
148
+ | **Pipeline** | Direct audio-to-audio (Gemini Live) | STT → LLM → TTS |
149
+ | **Accuracy** | Good | Higher |
150
+ | **`aiSpeaksFirst`** | ✅ Supported | ❌ Not supported |
151
+ | **`tools` (function calling)** | ✅ Supported | ❌ Not supported |
152
+ | **Output sample rate** | 24,000 Hz | 16,000 Hz |
153
+ | **Barge-in** | Automatic (Gemini VAD) | Automatic |
149
154
 
150
- By default, LLM applies content safety filtering. Set `allowHarmCategory: true` to disable:
155
+ > **Note:** All other SDK methods and events work identically in both modes. The only code change is adding `pipelineMode: 'composed'` to your session config.
151
156
 
152
- ```typescript
153
- await client.startSession({
154
- allowHarmCategory: true, // âš ī¸ Disables all safety filters
155
- });
156
- ```
157
+ ---
157
158
 
158
- > âš ī¸ **Warning**: Only use in controlled environments where content moderation is handled by other means.
159
+ # Advanced API
159
160
 
160
- ## Function Calling (Tool Use)
161
+ Optional features for power users.
161
162
 
162
- Define functions that the AI can call during conversation. When the AI decides to call a function, you receive a `toolCall` event and must respond with `sendToolResponse()`.
163
+ ## Additional Methods
163
164
 
164
- ### Define Tools
165
+ | Method | Description |
166
+ |--------|-------------|
167
+ | `audioStart()` / `audioEnd()` | Manual audio stream control |
168
+ | `interrupt()` | Explicitly stop AI response (for Stop button) |
169
+ | `sendSystemMessage(msg)` | Inject context during conversation |
170
+ | `sendToolResponse(id, result)` | Reply to function calls |
171
+ | `updateUserId(userId)` | Migrate guest to authenticated user |
165
172
 
- ```typescript
- const tools = [
-   {
-     name: 'open_login',
-     description: 'Opens Google Login popup when user wants to sign in',
-     parameters: { type: 'OBJECT', properties: {}, required: [] }
-   },
-   {
-     name: 'get_price',
-     description: 'Gets product price by ID',
-     parameters: {
-       type: 'OBJECT',
-       properties: {
-         productId: { type: 'string', description: 'Product ID' }
-       },
-       required: ['productId']
-     }
-   }
- ];
+ ## Additional Events
 
- await client.startSession({
-   prePrompt: 'You are a helpful assistant. Use tools when appropriate.',
-   tools,
- });
- ```
+ | Event | Description |
+ |-------|-------------|
+ | `connected` / `disconnected` | Connection lifecycle |
+ | `sessionStarted` / `sessionEnded` | Session lifecycle |
+ | `ready` | Session ready for audio |
+ | `userTranscript` | User's speech transcribed |
+ | `response` | AI's response text |
+ | `toolCall` | AI wants to call a function |
+ | `reconnecting` | Auto-reconnection attempt |
+ | `userIdUpdated` | Guest-to-user migration complete |
+ | `sessionWarning` | Session nearing duration limit |
+ | `sessionGoodbye` | Session about to end |
 
- ### Handle Tool Calls
+ ---
 
- ```typescript
- client.on('toolCall', (event) => {
-   console.log('AI wants to call:', event.name);
-   console.log('With arguments:', event.args);
-
-   if (event.name === 'open_login') {
-     showLoginModal();
-     client.sendToolResponse(event.id, { success: true });
-   }
-
-   if (event.name === 'get_price') {
-     const price = getProductPrice(event.args.productId);
-     client.sendToolResponse(event.id, { price, currency: 'USD' });
-   }
- });
- ```
+ ## Explicit Interrupt (Stop Button)
 
- ### Tool Interface
+ For UI "Stop" buttons or programmatic control:
 
  ```typescript
- interface Tool {
-   name: string;        // Function name
-   description: string; // When AI should use this
-   parameters?: {
-     type: 'OBJECT';
-     properties: Record<string, unknown>;
-     required?: string[];
-   };
- }
+ // User clicks Stop button
+ client.interrupt();
  ```
 
- > âš ī¸ **Note**: Function calling only works with `pipelineMode: 'live'`
+ Note: Voice barge-in works automatically via Gemini's VAD. This method is for explicit control.
 
- ## System Messages
 
- During an active live session, you can inject text messages to the AI using `sendSystemMessage()`. This is useful for:
- - Game events ("User completed level 5, congratulate them!")
- - App state changes ("User opened the cart with 3 items")
- - Timer/engagement triggers ("User has been quiet, engage them")
- - External data updates ("Weather changed to rainy")
+ ## System Messages
 
- ### Usage
+ Inject text context during live sessions (game events, app state, etc.):
 
  ```typescript
- // Simple usage - AI responds immediately
- client.sendSystemMessage("User just completed level 5. Congratulate them!");
+ // AI responds immediately
+ client.sendSystemMessage("User completed level 5. Congratulate them!");
 
- // With options - context only, no immediate response
- client.sendSystemMessage({
-   text: "User is browsing the cart",
-   triggerResponse: false
- });
+ // Context only, no response
+ client.sendSystemMessage({ text: "User is browsing", triggerResponse: false });
  ```
 
- ### Parameters
+ > Requires an active live session (`audioStart()` called). Max 500 characters.
 
- | Parameter | Type | Required | Default | Description |
- |-----------|------|----------|---------|-------------|
- | `text` | `string` | Yes | - | Message text (max 500 chars) |
- | `triggerResponse` | `boolean` | No | `true` | AI responds immediately if `true` |
+ ---
 
- > âš ī¸ **Note**: Requires an active live session (`audioStart()` must have been called). Only works with `pipelineMode: 'live'`.
-
- ## Conversation Memory
+ ## Function Calling (Tool Use)
 
- When you provide a `userId`, the SDK enables persistent conversation memory:
+ Let the AI call functions in your app:
 
- - **Entity Memory**: AI remembers facts shared in previous sessions (names, preferences, relationships)
- - **Session Summaries**: Recent conversation summaries are available to the AI
- - **Cross-Session**: Memory persists across sessions for the same `userId`
+ ### 1. Define Tools
 
  ```typescript
- // With memory (authenticated user)
- const client = new LiveSpeechClient({
-   region: 'ap-northeast-2',
-   apiKey: 'your-api-key',
-   userId: 'user-123', // Enables conversation memory
- });
+ const tools = [{
+   name: 'get_price',
+   description: 'Gets product price by ID',
+   parameters: {
+     type: 'OBJECT',
+     properties: { productId: { type: 'string' } },
+     required: ['productId']
+   }
+ }];
 
- // Without memory (guest)
- const client = new LiveSpeechClient({
-   region: 'ap-northeast-2',
-   apiKey: 'your-api-key',
-   // No userId = guest mode, no persistent memory
+ await client.startSession({
+   prePrompt: 'You are helpful.',
+   tools,
  });
  ```
 
- | Mode | Memory Persistence | Use Case |
- |------|-------------------|----------|
- | With `userId` | Permanent | Authenticated users |
- | Without `userId` | Session only | Guests, anonymous users |
-
- ### Guest-to-User Migration
-
- When a guest user logs in during a session, you can migrate their conversation history to their user account:
+ ### 2. Handle `toolCall` Events
 
  ```typescript
- // User logs in after chatting as guest
- client.on('userIdUpdated', (event) => {
-   console.log(`Migrated ${event.migratedMessages} messages to user ${event.userId}`);
+ client.on('toolCall', (event) => {
+   if (event.name === 'get_price') {
+     const price = lookupPrice(event.args.productId);
+     client.sendToolResponse(event.id, { price });
+   }
  });
-
- // After authentication
- await client.updateUserId('authenticated-user-123');
  ```
 
- This enables:
- - Entity extraction on guest conversation history
- - Conversation continuity across sessions
- - Personalization based on past interactions
+ ---
 
- ## Events
+ ## Conversation Memory
 
- | Event | Description | Key Properties |
- |-------|-------------|----------------|
- | `connected` | Connection established | `connectionId` |
- | `disconnected` | Connection closed | `reason`, `code` |
- | `sessionStarted` | Session created | `sessionId` |
- | `ready` | Ready for audio input | `timestamp` |
- | `userTranscript` | Your speech transcribed | `text` |
- | `response` | AI's response text | `text`, `isFinal` |
- | `audio` | AI's audio output | `data`, `sampleRate` |
- | `turnComplete` | AI finished speaking | `timestamp` |
- | `toolCall` | AI wants to call a function | `id`, `name`, `args` |
- | `userIdUpdated` | Guest migrated to user account | `userId`, `migratedMessages` |
- | `error` | Error occurred | `code`, `message` |
-
- ### Simple Handlers
+ Enable persistent memory across sessions:
 
  ```typescript
- // Your speech transcription
- client.setUserTranscriptHandler((text) => {
-   console.log('You said:', text);
- });
-
- // AI's text response
- client.setResponseHandler((text, isFinal) => {
-   console.log('AI:', text, isFinal ? '(done)' : '...');
+ const client = new LiveSpeechClient({
+   region: 'ap-northeast-2',
+   apiKey: 'your-api-key',
+   userId: 'user-123', // Enables memory
  });
+ ```
 
- // AI's audio output
- client.setAudioHandler((data: Uint8Array) => {
-   // data: PCM16 audio
-   // Sample rate: 24000 Hz
-   playAudio(data);
- });
+ | Mode | Memory |
+ |------|--------|
+ | With `userId` | Permanent (entities, summaries) |
+ | Without `userId` | Session only (guest) |
 
- // Error handling
- client.setErrorHandler((error) => {
-   console.error(`Error [${error.code}]: ${error.message}`);
- });
+ ### Guest-to-User Migration
 
- // Tool calls (function calling)
- client.on('toolCall', (event) => {
-   // Execute function and send result
-   const result = executeFunction(event.name, event.args);
-   client.sendToolResponse(event.id, result);
- });
+ ```typescript
+ // User logs in during session
+ await client.updateUserId('authenticated-user-123');
 
- // Guest-to-user migration
+ // Listen for confirmation
  client.on('userIdUpdated', (event) => {
-   console.log(`Logged in as ${event.userId}, migrated ${event.migratedMessages} messages`);
+   console.log(`Migrated ${event.migratedMessages} messages`);
  });
  ```
 
- ### Full Event API
-
- ```typescript
- client.on('connected', (event) => {
-   console.log('Connected:', event.connectionId);
- });
-
- client.on('ready', () => {
-   console.log('Ready for audio');
- });
+ ---
 
- client.on('userTranscript', (event) => {
-   console.log('You:', event.text);
- });
-
- client.on('response', (event) => {
-   console.log('AI:', event.text, event.isFinal);
- });
+ ## AI Speaks First
 
- client.on('audio', (event) => {
-   // event.data: Uint8Array (PCM16)
-   // event.sampleRate: 24000
-   playAudio(event.data);
- });
-
- client.on('turnComplete', () => {
-   console.log('AI finished speaking');
- });
+ The AI initiates the conversation:
 
- client.on('error', (event) => {
-   console.error('Error:', event.code, event.message);
- });
-
- client.on('toolCall', (event) => {
-   // event.id: string - use with sendToolResponse
-   // event.name: string - function name
-   // event.args: object - function arguments
-   const result = handleToolCall(event.name, event.args);
-   client.sendToolResponse(event.id, result);
+ ```typescript
+ await client.startSession({
+   prePrompt: 'Greet the customer warmly.',
+   aiSpeaksFirst: true,
  });
 
- client.on('userIdUpdated', (event) => {
-   // event.userId: string - the new user ID
-   // event.migratedMessages: number - count of migrated messages
-   console.log(`Migrated ${event.migratedMessages} messages to ${event.userId}`);
- });
+ client.audioStart(); // AI speaks immediately
  ```
 
- ## Audio Format
+ ---
 
- ### Input (Your Microphone)
+ ## Session Options
 
- | Property | Value |
- |----------|-------|
- | Format | PCM16 (16-bit signed, little-endian) |
- | Sample Rate | 16,000 Hz |
- | Channels | 1 (Mono) |
- | Chunk Size | ~3200 bytes (100ms) |
+ | Option | Default | Description |
+ |--------|---------|-------------|
+ | `prePrompt` | - | System prompt |
+ | `language` | `'en-US'` | Language code |
+ | `pipelineMode` | `'live'` | `'live'` (~300ms) or `'composed'` (~1-2s) |
+ | `aiSpeaksFirst` | `false` | AI initiates (live mode only) |
+ | `allowHarmCategory` | `false` | Disable safety filters |
+ | `tools` | `[]` | Function definitions |
+ | `sessionDuration` | - | Enables session duration limits when provided |
 
- ### Output (AI Response)
+ **Notes**
+ - Duration checks are **disabled by default**. They activate only when `sessionDuration` is provided.
+ - If only `sessionDuration.maxSeconds` is provided, `enableWarning`/`enableGoodbye` default to `false` in the SDK.
+ - Server limits take precedence in production.
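As a sketch, the duration options and their events could be wired like this (assumption: `sessionDuration` accepts the `maxSeconds`/`enableWarning`/`enableGoodbye` fields implied by the notes above; check the SDK's `SessionConfig` type for the exact shape):

```typescript
await client.startSession({
  prePrompt: 'You are a helpful assistant.',
  sessionDuration: {
    maxSeconds: 600,     // hard cap on session length
    enableWarning: true, // emit `sessionWarning` near the limit
    enableGoodbye: true, // emit `sessionGoodbye` before the session ends
  },
});

client.on('sessionWarning', () => console.log('Session ending soon'));
client.on('sessionGoodbye', () => console.log('AI is wrapping up'));
```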
 
- | Property | Value |
- |----------|-------|
- | Format | PCM16 (16-bit signed, little-endian) |
- | Sample Rate | 24,000 Hz |
- | Channels | 1 (Mono) |
+ ---
 
  ## Browser Example
 
  ```typescript
  import { LiveSpeechClient, float32ToInt16, int16ToUint8 } from '@drawdream/livespeech';
 
- const client = new LiveSpeechClient({
-   region: 'ap-northeast-2',
-   apiKey: 'your-api-key',
- });
-
- // Handlers
- client.setUserTranscriptHandler((text) => console.log('You:', text));
- client.setResponseHandler((text) => console.log('AI:', text));
- client.setAudioHandler((data) => playAudioChunk(data));
-
- // Connect
- await client.connect();
- await client.startSession({ prePrompt: 'You are a helpful assistant.' });
-
  // Capture microphone
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { sampleRate: 16000, channelCount: 1 }
@@ -460,60 +340,30 @@ processor.onaudioprocess = (e) => {
 
  source.connect(processor);
  processor.connect(audioContext.destination);
-
- // Start streaming
- client.audioStart();
-
- // Stop later
- client.audioEnd();
- stream.getTracks().forEach(track => track.stop());
  ```
 
+ ---
+
  ## Audio Utilities
 
  ```typescript
- import {
-   float32ToInt16, // Web Audio Float32 → PCM16
-   int16ToFloat32, // PCM16 → Float32
-   int16ToUint8,   // Int16Array → Uint8Array
-   uint8ToInt16,   // Uint8Array → Int16Array
-   wrapPcmInWav,   // Create WAV file
-   AudioEncoder,   // Base64 encoding/decoding
- } from '@drawdream/livespeech';
-
- // Convert Web Audio to PCM16 for sending
- const float32 = audioBuffer.getChannelData(0);
- const int16 = float32ToInt16(float32);
- const pcmBytes = int16ToUint8(int16);
- client.sendAudioChunk(pcmBytes);
-
- // Convert received PCM16 to Web Audio
- const receivedInt16 = uint8ToInt16(audioEvent.data);
- const float32Data = int16ToFloat32(receivedInt16);
+ import { float32ToInt16, int16ToUint8, wrapPcmInWav } from '@drawdream/livespeech';
+
+ const int16 = float32ToInt16(float32Data);
+ const bytes = int16ToUint8(int16);
+ const wav = wrapPcmInWav(bytes, { sampleRate: 16000, channels: 1, bitDepth: 16 });
  ```
 
+ ---
+
  ## Error Handling
 
  ```typescript
  client.on('error', (event) => {
    switch (event.code) {
-     case 'authentication_failed':
-       console.error('Invalid API key');
-       break;
-     case 'connection_timeout':
-       console.error('Connection timed out');
-       break;
-     case 'rate_limit':
-       console.error('Rate limit exceeded');
-       break;
-     default:
-       console.error(`Error: ${event.message}`);
-   }
- });
-
- client.on('disconnected', (event) => {
-   if (event.reason === 'error') {
-     console.log('Will auto-reconnect...');
+     case 'authentication_failed': console.error('Invalid API key'); break;
+     case 'connection_timeout': console.error('Timed out'); break;
+     default: console.error(`Error: ${event.message}`);
    }
  });
 
@@ -522,44 +372,13 @@ client.on('reconnecting', (event) => {
  });
  ```
 
- ## Client Properties
-
- | Property | Type | Description |
- |----------|------|-------------|
- | `isConnected` | `boolean` | Connection status |
- | `hasActiveSession` | `boolean` | Session status |
- | `isAudioStreaming` | `boolean` | Streaming status |
- | `connectionId` | `string \| null` | Current connection ID |
- | `currentSessionId` | `string \| null` | Current session ID |
+ ---
 
  ## Regions
 
- | Region | Code | Location |
- |--------|------|----------|
- | Asia Pacific (Seoul) | `ap-northeast-2` | Korea |
-
- ## TypeScript Types
-
- ```typescript
- import type {
-   LiveSpeechConfig,
-   SessionConfig,
-   LiveSpeechEvent,
-   ConnectedEvent,
-   DisconnectedEvent,
-   SessionStartedEvent,
-   ReadyEvent,
-   UserTranscriptEvent,
-   ResponseEvent,
-   AudioEvent,
-   TurnCompleteEvent,
-   ToolCallEvent,
-   UserIdUpdatedEvent,
-   ErrorEvent,
-   ErrorCode,
-   Tool,
- } from '@drawdream/livespeech';
- ```
+ | Region | Code |
+ |--------|------|
+ | Seoul (Korea) | `ap-northeast-2` |
 
  ## License