@voxdiscover/voiceserver 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Voxdiscover
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,465 @@
1
+ # @voxdiscover/voiceserver
2
+
3
+ Framework-agnostic TypeScript SDK for Voice_server voice agents. Provides session token authentication, Daily.js WebRTC integration, and typed events for building voice-enabled applications.
4
+
5
+ ## Installation
6
+
7
+ ```bash
8
+ npm install @voxdiscover/voiceserver @daily-co/daily-js
9
+ # or
10
+ pnpm add @voxdiscover/voiceserver @daily-co/daily-js
11
+ # or
12
+ yarn add @voxdiscover/voiceserver @daily-co/daily-js
13
+ ```
14
+
15
+ **Note:** `@daily-co/daily-js` is a peer dependency and must be installed separately.
16
+
17
+ ## Quick Start
18
+
19
+ ### 1. Obtain Session Token
20
+
21
+ First, obtain a session token from your backend (which calls Voice_server's session API):
22
+
23
+ ```typescript
24
+ // Your backend endpoint
25
+ const response = await fetch('https://voiceserver.voxdiscover.com/api/voice-session', {
26
+ method: 'POST',
27
+ headers: { 'Content-Type': 'application/json' },
28
+ body: JSON.stringify({ userId: 'user_123' }),
29
+ });
30
+
31
+ const { token } = await response.json();
32
+ ```
33
+
34
+ ### 2. Initialize SDK
35
+
36
+ ```typescript
37
+ import { VoiceAgent } from '@voxdiscover/voiceserver';
38
+
39
+ const agent = new VoiceAgent({ token });
40
+ ```
41
+
42
+ ### 3. Subscribe to Events
43
+
44
+ ```typescript
45
+ // Connection state changes
46
+ agent.on('connection:state', (state) => {
47
+ console.log('Connection state:', state);
48
+ // states: 'connecting' | 'connected' | 'reconnecting' | 'disconnected' | 'failed'
49
+ });
50
+
51
+ // Transcripts (streaming)
52
+ agent.on('transcript:interim', ({ text, speaker }) => {
53
+ console.log(`[interim] ${speaker}: ${text}`);
54
+ });
55
+
56
+ agent.on('transcript:final', ({ text, speaker }) => {
57
+ console.log(`[final] ${speaker}: ${text}`);
58
+ });
59
+
60
+ // Errors
61
+ agent.on('connection:error', (error) => {
62
+ console.error('Connection error:', error.message);
63
+ if (error.context?.suggestion) {
64
+ console.log('Suggestion:', error.context.suggestion);
65
+ }
66
+ });
67
+ ```
68
+
69
+ ### 4. Connect to Voice Agent
70
+
71
+ ```typescript
72
+ try {
73
+ await agent.connect();
74
+ console.log('Connected! State:', agent.state);
75
+ } catch (error) {
76
+ console.error('Failed to connect:', error);
77
+ }
78
+ ```
79
+
80
+ ### 5. Control Audio
81
+
82
+ ```typescript
83
+ // Mute microphone
84
+ agent.mute();
85
+
86
+ // Unmute microphone
87
+ agent.unmute();
88
+ ```
89
+
90
+ ### 6. Disconnect
91
+
92
+ ```typescript
93
+ await agent.disconnect();
94
+ ```
95
+
96
+ ## API Reference
97
+
98
+ ### `VoiceAgent`
99
+
100
+ Main SDK class for managing voice conversations.
101
+
102
+ #### Constructor
103
+
104
+ ```typescript
105
+ new VoiceAgent(config: VoiceAgentConfig)
106
+ ```
107
+
108
+ **Config options:**
109
+
110
+ - `token` (required): Session token from backend
111
+ - `baseUrl` (optional): Backend base URL for validation (default: `https://voiceserver.voxdiscover.com`)
112
+ - `reconnection` (optional):
113
+ - `enabled` (default: `true`): Enable automatic reconnection
114
+ - `maxAttempts` (default: `5`): Max reconnection attempts
115
+
116
+ #### Properties
117
+
118
+ - `state`: Current connection state (read-only)
119
+ - `'connecting'` - Establishing connection
120
+ - `'connected'` - Successfully connected
121
+ - `'reconnecting'` - Attempting to reconnect
122
+ - `'disconnected'` - Not connected
123
+ - `'failed'` - Connection failed
124
+
125
+ #### Methods
126
+
127
+ - `connect(): Promise<void>` - Connect to voice session (validates token, joins Daily room, starts remote audio)
128
+ - `disconnect(): Promise<void>` - Disconnect and cleanup resources (leaves room, releases audio elements)
129
+ - `mute(): void` - Mute microphone
130
+ - `unmute(): void` - Unmute microphone
131
+
132
+ #### Events
133
+
134
+ Subscribe to events using `agent.on(event, callback)`:
135
+
136
+ **Connection events:**
137
+ - `connection:state` - `(state: ConnectionState) => void`
138
+ - `connection:error` - `(error: VoiceAgentError) => void`
139
+
140
+ **Transcript events:**
141
+ - `transcript:interim` - `(data: TranscriptData) => void` - Partial transcripts (not emitted by all agent types)
142
+ - `transcript:final` - `(data: TranscriptData) => void` - One event per completed turn; emitted in real-time as each user or agent turn finishes
143
+
144
+ **Audio events:**
145
+ - `audio:muted` - `() => void`
146
+ - `audio:unmuted` - `() => void`
147
+
148
+ **Session events:**
149
+ - `session:expiring` - `(expiresIn: number) => void` - Emitted 5 minutes before the session expires
150
+
151
+ ### Error Handling
152
+
153
+ The SDK provides typed error classes for different scenarios:
154
+
155
+ ```typescript
156
+ import {
157
+ VoiceAgentError,
158
+ TokenExpiredError,
159
+ TokenInvalidError,
160
+ ConnectionFailedError,
161
+ PermissionDeniedError,
162
+ } from '@voxdiscover/voiceserver';
163
+
164
+ try {
165
+ await agent.connect();
166
+ } catch (error) {
167
+ // Pattern 1: instanceof checks
168
+ if (error instanceof TokenExpiredError) {
169
+ console.log('Token expired, requesting new session...');
170
+ // Request new token from backend
171
+ } else if (error instanceof PermissionDeniedError) {
172
+ console.log('Microphone permission denied');
173
+ // Show permission request UI
174
+ }
175
+
176
+ // Pattern 2: code property checks (narrow the caught value first)
177
+ if (!(error instanceof VoiceAgentError)) throw error; // not an SDK error
178
+ if (error.code === 'CONNECTION_FAILED' && error.retryable) {
179
+ console.log('Retryable error, will auto-reconnect');
180
+ }
181
+ // Access error details
182
+ console.log('Message:', error.message);
183
+ console.log('Suggestion:', error.context?.suggestion);
184
+ console.log('Retryable:', error.retryable);
185
+ }
186
+ ```
187
+
188
+ **Error types:**
189
+
190
+ - `TokenExpiredError` - Session token expired (non-retryable)
191
+ - `TokenInvalidError` - Token malformed or invalid (non-retryable)
192
+ - `ConnectionFailedError` - WebRTC connection failed (retryable)
193
+ - `PermissionDeniedError` - Microphone permission denied (non-retryable)
194
+ - `NetworkError` - Network error during API call (retryable)
195
+
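In strict TypeScript the value caught by `catch` is typed `unknown`, so the `code` and `retryable` fields need a narrowing step before they can be read. The following is a minimal sketch of a structural check that works without importing the SDK classes. It assumes only the fields shown in the examples above (`code`, `message`, `retryable`); `SdkErrorLike`, `isSdkErrorLike`, and `shouldRetry` are hypothetical names used for illustration and are not part of the SDK.

```typescript
// Hypothetical structural shape; mirrors only the fields this README shows.
interface SdkErrorLike {
  code: string;
  message: string;
  retryable: boolean;
}

// Type guard: narrows an unknown catch value by checking field types.
function isSdkErrorLike(value: unknown): value is SdkErrorLike {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.code === 'string' &&
    typeof v.message === 'string' &&
    typeof v.retryable === 'boolean'
  );
}

// Decide whether application code should retry on its own.
// `attempt` is zero-based; `maxAttempts` is an application-level limit.
function shouldRetry(error: unknown, attempt: number, maxAttempts: number): boolean {
  return isSdkErrorLike(error) && error.retryable && attempt < maxAttempts;
}
```

A structural check like this is most useful in shared utility code that should not depend on the SDK package; inside application code, the `instanceof` checks shown earlier are usually the simpler choice.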
196
+ ## Complete Example
197
+
198
+ ```typescript
199
+ import { VoiceAgent, TokenExpiredError } from '@voxdiscover/voiceserver';
200
+
201
+ async function startVoiceCall() {
202
+ // 1. Get session token from your backend
203
+ const { token } = await fetch('/api/voice-session', {
204
+ method: 'POST', headers: { 'Content-Type': 'application/json' },
205
+ body: JSON.stringify({ agentId: 'support-agent' }),
206
+ }).then(r => r.json());
207
+
208
+ // 2. Initialize agent
209
+ const agent = new VoiceAgent({
210
+ token,
211
+ baseUrl: 'https://voiceserver.voxdiscover.com',
212
+ reconnection: { enabled: true, maxAttempts: 5 },
213
+ });
214
+
215
+ // 3. Set up event listeners
216
+ agent.on('connection:state', (state) => {
217
+ updateUI({ connectionState: state });
218
+ });
219
+
220
+ agent.on('transcript:final', ({ text, speaker }) => {
221
+ addMessageToChat({ speaker, text });
222
+ });
223
+
224
+ agent.on('connection:error', async (error) => {
225
+ if (error instanceof TokenExpiredError) {
226
+ // Token expired: release the old agent before starting over
227
+ await agent.disconnect();
228
+ // startVoiceCall() requests a fresh token and creates a new agent
229
+ await startVoiceCall();
230
+ } else {
231
+ showError(error.message, error.context?.suggestion);
232
+ }
233
+ });
234
+
235
+ // 4. Connect
236
+ try {
237
+ await agent.connect();
238
+ showUI('connected');
239
+ } catch (error) {
240
+ showUI('error', error instanceof Error ? error.message : String(error));
241
+ }
242
+
243
+ return agent;
244
+ }
245
+
246
+ // Usage in UI event handlers
247
+ document.getElementById('startCall').addEventListener('click', async () => {
248
+ const agent = await startVoiceCall();
249
+
250
+ document.getElementById('muteBtn').addEventListener('click', () => {
251
+ agent.mute();
252
+ });
253
+
254
+ document.getElementById('endCall').addEventListener('click', async () => {
255
+ await agent.disconnect();
256
+ });
257
+ });
258
+ ```
259
+
260
+ ## Analytics Hooks
261
+
262
+ The SDK provides standardized analytics hooks for integrating with observability platforms like Segment, DataDog, or PostHog. Analytics hooks emit lifecycle and error events only (not transcripts or audio events) to keep analytics data clean.
263
+
264
+ ### Registering an Analytics Callback
265
+
266
+ ```typescript
267
+ import { VoiceAgent } from '@voxdiscover/voiceserver';
268
+
269
+ const agent = new VoiceAgent({ token });
270
+
271
+ agent.onAnalyticsEvent((event) => {
272
+ console.log('Analytics event:', event.eventType, {
273
+ sessionId: event.sessionId,
274
+ agentId: event.agentId,
275
+ userId: event.userId,
276
+ timestamp: event.timestamp,
277
+ });
278
+ });
279
+ ```
280
+
281
+ ### Event Types
282
+
283
+ | Event Type | When Emitted |
284
+ |------------|-------------|
285
+ | `session_started` | WebRTC connection established (joined Daily room) |
286
+ | `session_ended` | Session disconnected (explicit disconnect) |
287
+ | `connection_failed` | Connection error (token invalid, network failure, etc.) |
288
+ | `agent_swap_completed` | Agent hot-swap completed successfully |
289
+ | `agent_swap_failed` | Agent hot-swap failed |
290
+ | `error` | Categorized SDK error requiring developer attention |
291
+
292
+ ### Event Payload Structure
293
+
294
+ ```typescript
295
+ interface AnalyticsEvent {
296
+ timestamp: number; // Unix timestamp in milliseconds
297
+ eventType: string; // One of the event types above
298
+ sessionId: string; // Session identifier from token
299
+ agentId?: string; // Agent identifier from token
300
+ userId?: string; // User identifier from session context
301
+ customContext?: Record<string, any>; // Context from session creation
302
+ error?: {
303
+ code: string; // Programmatic error code
304
+ message: string; // Human-readable error description
305
+ retryable: boolean; // Whether operation can be retried
306
+ };
307
+ }
308
+ ```
309
+
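Most analytics platforms expect a flat set of key-value properties, so the nested payload usually has to be flattened before it is forwarded. The sketch below shows one way to do that. The `AnalyticsEvent` interface is a local copy of the shape documented above (with `unknown` in place of `any`), and `toTrackProperties` is a hypothetical helper name, not an SDK export.

```typescript
// Local copy of the documented payload shape, so the sketch is self-contained.
interface AnalyticsEvent {
  timestamp: number;
  eventType: string;
  sessionId: string;
  agentId?: string;
  userId?: string;
  customContext?: Record<string, unknown>;
  error?: { code: string; message: string; retryable: boolean };
}

// Hypothetical helper: flattens an event into snake_case properties.
function toTrackProperties(event: AnalyticsEvent): Record<string, unknown> {
  const props: Record<string, unknown> = {
    // Custom context is spread first so the reserved keys below always win.
    ...event.customContext,
    session_id: event.sessionId,
    timestamp: event.timestamp,
  };
  // Optional identifiers are added only when present.
  if (event.agentId !== undefined) props.agent_id = event.agentId;
  if (event.userId !== undefined) props.user_id = event.userId;
  // Error details exist only on failure events.
  if (event.error) {
    props.error_code = event.error.code;
    props.error_message = event.error.message;
    props.error_retryable = event.error.retryable;
  }
  return props;
}
```

Spreading `customContext` first is a design choice of this sketch: a context key such as `session_id` cannot overwrite the value taken from the event itself. If custom context should take precedence instead, move the spread to the end of the object.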
310
+ ### Integration with Segment
311
+
312
+ ```typescript
313
+ import Analytics from 'analytics';
314
+ import segmentPlugin from '@analytics/segment';
315
+
316
+ // Initialize Segment
317
+ const analytics = Analytics({
318
+ app: 'my-voice-app',
319
+ plugins: [
320
+ segmentPlugin({
321
+ writeKey: 'YOUR_SEGMENT_WRITE_KEY',
322
+ }),
323
+ ],
324
+ });
325
+
326
+ // Register analytics callback
327
+ const agent = new VoiceAgent({ token });
328
+
329
+ agent.onAnalyticsEvent((event) => {
330
+ analytics.track(event.eventType, {
331
+ session_id: event.sessionId,
332
+ agent_id: event.agentId,
333
+ user_id: event.userId,
334
+ timestamp: event.timestamp,
335
+ // Error details (only present on failure events)
336
+ ...(event.error && {
337
+ error_code: event.error.code,
338
+ error_message: event.error.message,
339
+ error_retryable: event.error.retryable,
340
+ }),
341
+ // Custom context (from session creation)
342
+ ...event.customContext,
343
+ });
344
+ });
345
+
346
+ await agent.connect();
347
+ ```
348
+
349
+ ### Integration with DataDog / PostHog
350
+
351
+ Any analytics platform that accepts key-value event properties works the same way:
352
+
353
+ ```typescript
354
+ agent.onAnalyticsEvent((event) => {
355
+ // PostHog example
356
+ posthog.capture(event.eventType, {
357
+ distinct_id: event.userId,
358
+ session_id: event.sessionId,
359
+ agent_id: event.agentId,
360
+ $timestamp: new Date(event.timestamp).toISOString(),
361
+ });
362
+
363
+ // DataDog example
364
+ datadogRum.addAction(event.eventType, {
365
+ session_id: event.sessionId,
366
+ user_id: event.userId,
367
+ });
368
+ });
369
+ ```
370
+
371
+ ### Multiple Callbacks
372
+
373
+ Multiple callbacks can be registered - all receive every event:
374
+
375
+ ```typescript
376
+ // Log to console
377
+ agent.onAnalyticsEvent((event) => {
378
+ console.log('[Voice Analytics]', event.eventType, event.sessionId);
379
+ });
380
+
381
+ // Send to Segment
382
+ agent.onAnalyticsEvent((event) => {
383
+ analytics.track(event.eventType, { session_id: event.sessionId });
384
+ });
385
+
386
+ // Send to custom backend
387
+ agent.onAnalyticsEvent((event) => {
388
+ fetch('/api/analytics', {
389
+ method: 'POST',
390
+ body: JSON.stringify(event),
391
+ });
392
+ });
393
+ ```
394
+
395
+ ### IMPORTANT: Read-Only Callbacks
396
+
397
+ Analytics callbacks MUST be read-only. Do NOT call SDK methods (connect, disconnect, mute, etc.) inside a callback. Calling SDK methods from within an analytics callback creates a circular event chain that triggers the circuit breaker and disables analytics for the remainder of the session.
398
+
399
+ **Do NOT do this:**
400
+ ```typescript
401
+ // WRONG: Calling SDK methods inside analytics callback
402
+ agent.onAnalyticsEvent((event) => {
403
+ if (event.eventType === 'session_ended') {
404
+ agent.connect(); // This will trigger another analytics event -> infinite loop
405
+ }
406
+ });
407
+ ```
408
+
409
+ **Do this instead:**
410
+ ```typescript
411
+ // CORRECT: React to events outside the callback
412
+ agent.on('connection:state', (state) => {
413
+ if (state === 'disconnected') {
414
+ handleReconnect(); // Handle reconnection in event listener, not analytics callback
415
+ }
416
+ });
417
+
418
+ agent.onAnalyticsEvent((event) => {
419
+ // Read-only: only forward events to external services
420
+ myAnalytics.track(event.eventType, { session_id: event.sessionId });
421
+ });
422
+ ```
423
+
424
+ ## TypeScript Support
425
+
426
+ The SDK is written in TypeScript and includes full type definitions. All types are exported:
427
+
428
+ ```typescript
429
+ import type {
430
+ VoiceAgentConfig,
431
+ ConnectionState,
432
+ TranscriptData,
433
+ VoiceAgentEvents,
434
+ VoiceAgentErrorCode,
435
+ } from '@voxdiscover/voiceserver';
436
+ ```
437
+
438
+ ## Internal Architecture Notes
439
+
440
+ ### Headless Mode Audio
441
+ `VoiceAgent` uses `DailyIframe.createCallObject()` (headless mode), which does **not** auto-play
442
+ remote audio. The SDK manages this internally: a `track-started` handler creates an `<audio>`
443
+ element per remote participant and pipes the incoming track through it. No additional setup is
444
+ needed in your application.
445
+
446
+ ### Transcript Delivery
447
+ Transcripts are streamed in real-time over Daily's app-message channel. Each completed turn
448
+ (user or agent) triggers one `transcript:final` event. The server uses Pipecat's
449
+ `OutputTransportMessageFrame` API to broadcast each turn to all room participants.
450
+
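Because each completed turn arrives as its own `transcript:final` event, an application that wants a running conversation view has to accumulate the turns itself. The sketch below shows one way to do this. It assumes that `speaker` is a string and uses only the `text` and `speaker` fields seen in the examples above; `TranscriptTurn` and `TranscriptLog` are hypothetical names, not SDK exports.

```typescript
// Local stand-in for the two fields used in this README's examples.
interface TranscriptTurn {
  text: string;
  speaker: string; // assumption: the real type may be a narrower union
}

// Hypothetical helper: keeps completed turns plus the latest partial text per speaker.
class TranscriptLog {
  private readonly turns: TranscriptTurn[] = [];
  private readonly partial = new Map<string, string>();

  // Call from a 'transcript:interim' listener; newer partial text replaces older.
  onInterim({ text, speaker }: TranscriptTurn): void {
    this.partial.set(speaker, text);
  }

  // Call from a 'transcript:final' listener; the final turn supersedes partial text.
  onFinal({ text, speaker }: TranscriptTurn): void {
    this.partial.delete(speaker);
    this.turns.push({ text, speaker });
  }

  // Completed turns first, then in-progress text marked with a trailing "...".
  render(): string[] {
    const done = this.turns.map((t) => `${t.speaker}: ${t.text}`);
    const pending = Array.from(this.partial.entries()).map(
      ([speaker, text]) => `${speaker}: ${text}...`,
    );
    return [...done, ...pending];
  }
}
```

To wire it up, forward the payloads from the listeners shown in the Quick Start, for example `agent.on('transcript:final', (data) => log.onFinal(data))`, and re-render the view after each event.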
451
+ ## Browser Support
452
+
453
+ - Chrome 90+
454
+ - Firefox 88+
455
+ - Safari 14+
456
+ - Edge 90+
457
+
458
+ **Requirements:**
459
+ - WebRTC support
460
+ - getUserMedia API support
461
+ - ES2022 features
462
+
463
+ ## License
464
+
465
+ MIT