@keyframelabs/elements 0.2.1 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -71,7 +71,7 @@ view.disconnect();
 
 ## Supported agents and real-time LLMs
 
- Supports Cartesia, ElevenLabs, Vapi, Gemini Live (closed alpha), OpenAI Realtime (closed alpha).
+ Supports ElevenLabs and OpenAI Realtime.
 
 For `PersonaEmbed`, this is determined by the values you set in the Keyframe platform dashboard.
 
@@ -79,27 +79,35 @@ For `PersonaView`, this is determined by `voiceAgentDetails`.
 
 ## Emotion Controls
 
- The avatar can display emotional expressions (`neutral`, `angry`, `sad`, `happy`) that affect its facial expression and demeanor.
+ The avatar can display emotional expressions (`neutral`, `angry`, `sad`, `happy`) that affect its facial expression and demeanor. All supported voice agents can drive emotions automatically via tool/function calling.
 
- ### ElevenLabs: `set_emotion` Tool Call
+ ### Agent Events
+
+ The `emotion` event is emitted when any agent triggers a `set_emotion` tool call:
+
+ ```typescript
+ agent.on('emotion', (emotion) => {
+   console.log('Emotion changed:', emotion); // 'neutral' | 'angry' | 'sad' | 'happy'
+ });
+ ```
 
- When using ElevenLabs as the voice agent, emotions are driven by a **client tool call** named `set_emotion`. The ElevenLabs agent parses incoming `client_tool_call` WebSocket messages and, when the tool name is `set_emotion`, updates the avatar's expression accordingly.
+ When using `PersonaEmbed` or `PersonaView`, emotion events are automatically wired to the avatar session -- no extra code is needed.
 
- > **Important:** Transcripts from the ElevenLabs agent are **not** automatically consumed. The `transcript` event is emitted, but it is up to you to subscribe to it if you need transcript data.
+ ### ElevenLabs
 
- #### Setup
+ Emotions are driven by a **client tool call** named `set_emotion`. The agent parses incoming `client_tool_call` WebSocket messages and sends a `client_tool_result` back.
 
- You must create a `set_emotion` tool in the [ElevenLabs API](https://elevenlabs.io/docs) for your agent. The tool should accept a single parameter:
+ **Setup:** Create a `set_emotion` [client tool](https://elevenlabs.io/docs/conversational-ai/customization/tools/client-tools) in the ElevenLabs dashboard for your agent with a single `emotion` parameter (enum: `neutral`, `angry`, `sad`, `happy`). Then instruct your agent (via its system prompt) to call `set_emotion` on each turn.
 
- | Parameter | Type | Description |
- | --------- | -------- | -------------------------------------------------------- |
- | `emotion` | `enum` | One of `neutral`, `angry`, `sad`, `happy`. |
+ ### OpenAI Realtime
 
- Then instruct your agent (via its system prompt) to call `set_emotion` on each turn with the appropriate emotion. The client library handles the rest it validates the emotion, emits an `emotion` event, and sends a `client_tool_result` back to ElevenLabs.
+ The `set_emotion` function is **automatically declared** in the OpenAI Realtime session setup. The model calls it via Realtime [function calling](https://developers.openai.com/api/docs/guides/realtime-conversations#function-calling) (`response.done` with `function_call` output items), and the client responds by creating a `function_call_output` conversation item before asking the model to continue.
+
+ **Setup:** No additional dashboard configuration is needed. Instruct the model via its system prompt to call `set_emotion` on each turn to reflect the tone of its response.
 
 ### Manual Emotion Control
 
- For other agents or custom emotion logic, you can access the underlying session to set emotions manually:
+ For custom emotion logic outside of tool calling, you can access the underlying session directly:
 
 ```typescript
 import { createClient } from '@keyframelabs/sdk';
@@ -108,18 +116,6 @@ const session = createClient({ ... });
 await session.setEmotion('happy');
 ```
 
- ### Agent Events
-
- The `emotion` event is emitted when the agent triggers a `set_emotion` tool call:
-
- ```typescript
- agent.on('emotion', (emotion) => {
-   console.log('Emotion changed:', emotion); // 'neutral' | 'angry' | 'sad' | 'happy'
- });
- ```
-
- Currently, only the ElevenLabs agent emits emotion events via tool calls.
-
 ## API
 
 ### `PersonaEmbed`
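The ElevenLabs `set_emotion` round trip described above can be sketched as a pure handler. This is not the package's internal code; the message field names (`tool_name`, `tool_call_id`, `parameters`) follow the ElevenLabs Conversational AI WebSocket protocol as commonly documented, but treat them as assumptions:

```typescript
// Hypothetical sketch of the set_emotion tool-call round trip that
// ElevenLabsAgent performs internally. Message shapes are assumptions
// based on the ElevenLabs Conversational AI WebSocket protocol.
type Emotion = 'neutral' | 'angry' | 'sad' | 'happy';
const EMOTIONS: Emotion[] = ['neutral', 'angry', 'sad', 'happy'];

interface ClientToolCall {
  type: 'client_tool_call';
  client_tool_call: {
    tool_name: string;
    tool_call_id: string;
    parameters: { emotion?: string };
  };
}

// Validates the emotion and builds the client_tool_result reply that
// would be sent back over the WebSocket. Returns null for other tools.
function handleClientToolCall(msg: ClientToolCall) {
  const { tool_name, tool_call_id, parameters } = msg.client_tool_call;
  if (tool_name !== 'set_emotion') return null;
  const emotion = EMOTIONS.includes(parameters.emotion as Emotion)
    ? (parameters.emotion as Emotion)
    : null;
  return {
    emotion, // what the library would emit as the 'emotion' event
    result: {
      type: 'client_tool_result',
      tool_call_id,
      result: emotion ? `emotion set to ${emotion}` : 'invalid emotion',
      is_error: emotion === null,
    },
  };
}
```

Validating against the enum before emitting mirrors the behavior the README describes: an unknown value produces an error result instead of an avatar update.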
@@ -188,10 +184,12 @@ type SessionDetails = {
 };
 
 type VoiceAgentDetails = {
-   type: 'cartesia' | 'elevenlabs' | 'vapi' | 'gemini' | 'openai';
-   token?: string; // For gemini, cartesia
-   agent_id?: string; // For elevenlabs, cartesia
-   signed_url?: string; // For elevenlabs, vapi
+   type: 'elevenlabs' | 'openai';
+   token?: string; // For openai (ephemeral client secret)
+   agent_id?: string; // For elevenlabs
+   signed_url?: string; // For elevenlabs
+   system_prompt?: string; // For openai
+   voice?: string; // For openai
 };
 
 type Emotion = 'neutral' | 'angry' | 'sad' | 'happy';
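Given the updated `VoiceAgentDetails` shape, per-agent configurations might look like the following sketch. All IDs, URLs, and the voice name are placeholders, not values from the package:

```typescript
// Local copy of the VoiceAgentDetails shape from the diff above,
// filled with placeholder values for illustration only.
type VoiceAgentDetails = {
  type: 'elevenlabs' | 'openai';
  token?: string;         // For openai (ephemeral client secret)
  agent_id?: string;      // For elevenlabs
  signed_url?: string;    // For elevenlabs
  system_prompt?: string; // For openai
  voice?: string;         // For openai
};

const elevenLabsDetails: VoiceAgentDetails = {
  type: 'elevenlabs',
  agent_id: 'agent_123',                    // placeholder
  signed_url: 'wss://example.com/session',  // placeholder
};

const openaiDetails: VoiceAgentDetails = {
  type: 'openai',
  token: 'ek_placeholder', // ephemeral client secret, placeholder
  system_prompt: 'Call set_emotion on each turn.',
  voice: 'alloy', // assumed voice name, check the platform dashboard
};
```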
@@ -3,7 +3,7 @@
 *
 * These utilities help with PCM audio processing for voice AI integrations.
 */
- /** Sample rate for audio sent to Persona (matches Gemini output) */
+ /** Standard output sample rate for audio sent to Persona */
 export declare const SAMPLE_RATE = 24000;
 /**
 * Convert base64-encoded audio to Uint8Array.
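The declarations above only name the audio helpers; a minimal sketch of what `floatTo16BitPCM` and a linear-interpolation `resamplePcm` could look like follows. The published implementations may differ in signatures and edge-case handling:

```typescript
// Sketch of two of the audio helpers declared above; the shipped
// versions may differ. SAMPLE_RATE matches the declared constant.
const SAMPLE_RATE = 24000;

// Convert Float32 samples in [-1, 1] to 16-bit signed PCM, clamping
// out-of-range values.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Naive linear-interpolation resampler between sample rates, e.g.
// microphone input at 48 kHz down to SAMPLE_RATE (24 kHz).
function resamplePcm(input: Int16Array, fromRate: number, toRate: number): Int16Array {
  if (fromRate === toRate) return input;
  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = (i * fromRate) / toRate;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = Math.round(input[i0] * (1 - frac) + input[i1] * frac);
  }
  return out;
}
```

Linear interpolation is the simplest reasonable choice for speech; a production resampler might add low-pass filtering to avoid aliasing when downsampling.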
@@ -1,7 +1,5 @@
- import { GeminiLiveAgent, GeminiLiveConfig } from './gemini-live';
 import { ElevenLabsAgent, ElevenLabsConfig } from './elevenlabs';
- import { CartesiaAgent, CartesiaConfig } from './cartesia';
- import { VapiAgent, VapiConfig } from './vapi';
+ import { OpenAIRealtimeAgent, OpenAIRealtimeConfig, TurnDetection } from './openai-realtime';
 /**
 * Agent implementations for voice AI platforms.
 *
@@ -10,13 +8,11 @@ import { VapiAgent, VapiConfig } from './vapi';
 */
 export { BaseAgent, DEFAULT_INPUT_SAMPLE_RATE } from './base';
 export type { Agent, AgentConfig, AgentEventMap, AgentState, Emotion } from './types';
- export { GeminiLiveAgent, type GeminiLiveConfig };
 export { ElevenLabsAgent, type ElevenLabsConfig };
- export { CartesiaAgent, type CartesiaConfig };
- export { VapiAgent, type VapiConfig };
+ export { OpenAIRealtimeAgent, type OpenAIRealtimeConfig, type TurnDetection };
 export { SAMPLE_RATE, base64ToBytes, bytesToBase64, resamplePcm, createEventEmitter, floatTo16BitPCM } from './audio-utils';
 /** Supported agent types */
- export type AgentType = 'gemini' | 'elevenlabs' | 'cartesia' | 'vapi';
+ export type AgentType = 'elevenlabs' | 'openai';
 /** Agent type metadata */
 export interface AgentTypeInfo {
     id: AgentType;
@@ -27,26 +23,22 @@ export interface AgentTypeInfo {
 export declare const AGENT_REGISTRY: AgentTypeInfo[];
 /** Configuration types by agent type */
 export interface AgentConfigMap {
-     gemini: GeminiLiveConfig;
     elevenlabs: ElevenLabsConfig;
-     cartesia: CartesiaConfig;
-     vapi: VapiConfig;
+     openai: OpenAIRealtimeConfig;
 }
 /** Union type of all agent instances */
- export type AnyAgent = GeminiLiveAgent | ElevenLabsAgent | CartesiaAgent | VapiAgent;
+ export type AnyAgent = ElevenLabsAgent | OpenAIRealtimeAgent;
 /**
 * Create an agent instance by type.
 *
 * @example
 * ```ts
- * const agent = createAgent('gemini');
- * await agent.connect({ apiKey: 'YOUR_KEY' });
+ * const agent = createAgent('elevenlabs');
+ * await agent.connect({ agentId: '...', signedUrl: '...' });
 * ```
 */
- export declare function createAgent(type: 'gemini'): GeminiLiveAgent;
 export declare function createAgent(type: 'elevenlabs'): ElevenLabsAgent;
- export declare function createAgent(type: 'cartesia'): CartesiaAgent;
- export declare function createAgent(type: 'vapi'): VapiAgent;
+ export declare function createAgent(type: 'openai'): OpenAIRealtimeAgent;
 export declare function createAgent(type: AgentType): AnyAgent;
 /**
 * Get agent type metadata by ID.
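The `AgentConfigMap` plus per-type overload pattern above can also be expressed as a single generic signature keyed on the same map. A self-contained sketch using stub classes (not the package's real agents):

```typescript
// Self-contained sketch of the AgentConfigMap lookup pattern. The stub
// classes stand in for the package's ElevenLabsAgent/OpenAIRealtimeAgent.
type AgentType = 'elevenlabs' | 'openai';

class StubElevenLabsAgent { readonly kind = 'elevenlabs'; }
class StubOpenAIAgent { readonly kind = 'openai'; }

// Config and instance types are both looked up from the same key, so
// they cannot drift apart.
interface AgentMap {
  elevenlabs: StubElevenLabsAgent;
  openai: StubOpenAIAgent;
}

function createAgent<T extends AgentType>(type: T): AgentMap[T] {
  const registry: { [K in AgentType]: () => AgentMap[K] } = {
    elevenlabs: () => new StubElevenLabsAgent(),
    openai: () => new StubOpenAIAgent(),
  };
  return registry[type]();
}

const agent = createAgent('openai'); // statically typed as StubOpenAIAgent
```

The package instead ships explicit overloads, which preserve per-overload doc comments at the cost of repeating each agent type; both styles give callers the same narrowed return type.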
@@ -0,0 +1,60 @@
+ import { AgentConfig } from './types';
+ import { BaseAgent } from './base';
+ /**
+ * Turn detection configuration for OpenAI Realtime.
+ * @see https://developers.openai.com/api/docs/guides/realtime-vad
+ */
+ export type TurnDetection = {
+     type: 'server_vad';
+     /** Activation threshold 0-1. Higher = requires louder audio. */
+     threshold?: number;
+     /** Audio (ms) to include before detected speech. */
+     prefix_padding_ms?: number;
+     /** Silence duration (ms) before speech stop is detected. */
+     silence_duration_ms?: number;
+ } | {
+     type: 'semantic_vad';
+     /** How eager the model is to consider a turn finished. Default: 'auto'. */
+     eagerness?: 'low' | 'medium' | 'high' | 'auto';
+ };
+ /** OpenAI Realtime specific configuration */
+ export interface OpenAIRealtimeConfig extends AgentConfig {
+     /** Model to use (defaults to gpt-realtime) */
+     model?: string;
+     /** Turn detection / VAD settings. Defaults to semantic_vad with eagerness 'high'. */
+     turnDetection?: TurnDetection;
+ }
+ /**
+ * OpenAI Realtime agent implementation.
+ *
+ * Handles WebSocket connection to OpenAI Realtime and converts
+ * audio responses to events that Persona SDK can consume.
+ */
+ export declare class OpenAIRealtimeAgent extends BaseAgent {
+     protected readonly agentName = "OpenAIRealtime";
+     private connectResolve;
+     private connectReject;
+     private connectTimeout;
+     private initialSessionUpdate;
+     private currentResponseHasAudio;
+     private currentTranscript;
+     private readonly handledFunctionCallIds;
+     private sourceInputSampleRate;
+     private pendingFunctionCallStartedAtMs;
+     private pendingFunctionCallNames;
+     connect(config: OpenAIRealtimeConfig): Promise<void>;
+     protected handleParsedMessage(message: unknown): void;
+     sendAudio(pcmData: Uint8Array): void;
+     close(): void;
+     private buildSessionUpdate;
+     private sendInitialSessionUpdate;
+     private handleResponseDone;
+     private handleFunctionCalls;
+     private handleFunctionCall;
+     private finishAudioTurn;
+     private resetTurnState;
+     private sendEvent;
+     private resolvePendingConnect;
+     private rejectPendingConnect;
+     private clearConnectTimeout;
+ }
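The function-call handshake this class implements (reply with a `function_call_output` item, then ask the model to continue, as described in the README changes above) can be sketched as a pure event builder. Event shapes follow the OpenAI Realtime API's `conversation.item.create` / `response.create` client events, but this is not the package's actual internals:

```typescript
// Sketch of the reply an OpenAI Realtime client sends after a
// response.done event contains a set_emotion function_call item.
// Not the package's real code; treat field names as assumptions
// based on the public Realtime API event shapes.
interface FunctionCallItem {
  type: 'function_call';
  name: string;
  call_id: string;
  arguments: string; // JSON-encoded, e.g. '{"emotion":"happy"}'
}

function buildFunctionCallReply(item: FunctionCallItem) {
  const args = JSON.parse(item.arguments) as { emotion?: string };
  return {
    emotion: args.emotion ?? null, // what would feed the 'emotion' event
    events: [
      {
        // Hand the tool result back so the model can see it.
        type: 'conversation.item.create',
        item: {
          type: 'function_call_output',
          call_id: item.call_id,
          output: JSON.stringify({ ok: true }),
        },
      },
      // Ask the model to continue speaking after the tool result.
      { type: 'response.create' },
    ],
  };
}
```

Tracking already-handled call IDs (as the `handledFunctionCallIds` field above suggests) prevents replying twice if the same `function_call` item appears in more than one server event.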
@@ -39,7 +39,7 @@ export interface AgentEventMap {
         text: string;
         isFinal: boolean;
     };
-     /** Emotion change (currently only supported by ElevenLabs agent) */
+     /** Emotion change (supported by all agents: ElevenLabs, OpenAI Realtime) */
     emotion: Emotion;
     /** Agent connection closed (unexpected disconnect) */
     closed: {
@@ -50,7 +50,7 @@ export interface AgentEventMap {
 /**
 * Abstract agent interface.
 *
- * Implement this for each voice AI platform (Gemini, ElevenLabs, Cartesia, etc.)
+ * Implement this for each voice AI platform (ElevenLabs, OpenAI Realtime, etc.)
 */
 export interface Agent {
     /** Current agent state */
package/dist/index.d.ts CHANGED
@@ -3,8 +3,8 @@ export type { PersonaEmbedOptions } from './PersonaEmbed';
 export { PersonaView } from './PersonaView';
 export type { PersonaViewOptions } from './PersonaView';
 export type { EmbedStatus, VideoFit, VoiceAgentDetails, SessionDetails, BaseCallbacks, } from './types';
- export { createAgent, GeminiLiveAgent, ElevenLabsAgent, CartesiaAgent, BaseAgent, AGENT_REGISTRY, getAgentInfo, } from './agents';
- export type { AgentType, AgentConfig, AgentEventMap, Agent, AnyAgent, AgentTypeInfo, GeminiLiveConfig, ElevenLabsConfig, CartesiaConfig, } from './agents';
+ export { createAgent, ElevenLabsAgent, OpenAIRealtimeAgent, BaseAgent, AGENT_REGISTRY, getAgentInfo, } from './agents';
+ export type { AgentType, AgentConfig, AgentEventMap, Agent, AnyAgent, AgentTypeInfo, ElevenLabsConfig, OpenAIRealtimeConfig, TurnDetection, } from './agents';
 export type { AgentState } from '@keyframelabs/sdk';
 export { floatTo16BitPCM, resamplePcm, base64ToBytes, bytesToBase64, SAMPLE_RATE, createEventEmitter, } from './agents';
 export { ApiError as KeyframeApiError } from './ApiError';