@aihumanity/voice-sdk 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 AIHumanity
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,486 @@
+ # @aihumanity/voice-sdk
+
+ A small, batteries-included JavaScript SDK for embedding AIHumanity / **eimi**
+ voice AI calls on any web page.
+
+ It wraps [`ultravox-client`](https://www.npmjs.com/package/ultravox-client) and
+ adds the things you almost always end up writing yourself:
+
+ - One-call setup: the SDK fetches the `joinUrl` from your eimi backend and joins the call for you.
+ - A semantic call-state machine: `idle` → `connecting` → `connected` → `listening` / `speaking` / `thinking` → `disconnecting` → `idle`.
+ - Live transcripts, with `transcript` / `transcripts` events and a snapshot getter.
+ - Vocal-emotion extraction from `[EMOTION_CONTEXT]` data messages produced by the eimi emotion bridge (configurable regex).
+ - Mic / speaker mute helpers.
+ - A pre-built React hook (`@aihumanity/voice-sdk/react`).
+ - A pre-built floating-button widget (`@aihumanity/voice-sdk/widget`) — drop a single `<script>` tag on any site.
+
+ ## Installation
+
+ ```bash
+ npm install @aihumanity/voice-sdk ultravox-client
+ # or
+ pnpm add @aihumanity/voice-sdk ultravox-client
+ ```
+
+ `ultravox-client` is a required runtime dependency; it is installed separately
+ so that multiple SDKs and apps can dedupe it. React is an *optional* peer
+ dependency — only needed if you import the React adapter.
+
+ The SDK is ESM-only because `ultravox-client` is ESM-only. Use `import` syntax
+ or a bundler that supports ESM packages.
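+
+ If you're stuck in a CommonJS codebase, dynamic `import()` is one way to load
+ an ESM-only package. A minimal sketch (the helper name is hypothetical):
+
+ ```ts
+ // Dynamic import() works from CommonJS files too: it returns a
+ // promise that resolves to the module namespace.
+ async function loadVoiceSdk() {
+   const { VoiceCall } = await import("@aihumanity/voice-sdk");
+   return VoiceCall;
+ }
+ ```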
+
+ For zero-build `<script>`-tag use you can also load the IIFE bundle directly
+ from `dist/aihumanity-voice.iife.js` (see the demo).
+
+ ## Getting your credentials
+
+ Before writing any code you need a developer account. The whole process takes
+ about two minutes and is self-service.
+
+ ### 1 — Sign up at the developer portal
+
+ Go to **[portal.eimi.ai](https://portal.eimi.ai)** and create an account.
+ Once verified you land on your dashboard.
+
+ ### 2 — Note your Key ID
+
+ In the **API Keys** tab you'll see two values:
+
+ | Field | What it is | Where you use it |
+ | --- | --- | --- |
+ | **SDK Key ID** | Identifies your developer account | `publicKey` option *or* as the key ID in HMAC signing |
+ | **SDK Key Secret** | Signs server-to-server requests | Never put this in browser code |
+
+ The Key ID is the same value regardless of which auth mode you choose.
+
+ ### 3 — Choose your integration path
+
+ **No backend (simplest)**
+
+ Use your **Key ID** directly as `publicKey`. You also need to tell the server
+ which origins are allowed to use it — otherwise every request is rejected.
+
+ In the portal under **API Keys → Allowed Origins**, add the exact origin(s)
+ your site runs on:
+
+ ```
+ https://myapp.com
+ https://staging.myapp.com
+ http://localhost:5173 ← add this while developing locally
+ ```
+
+ An origin is `scheme + host + port` — no path, no trailing slash.
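+ For example:
+
+ ```
+ https://myapp.com           ✔ valid origin
+ http://myapp.com:8080       ✔ valid (the port is part of the origin)
+ https://myapp.com/          ✘ trailing slash
+ https://myapp.com/checkout  ✘ includes a path
+ ```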
73
+
74
+ Then in your code:
75
+
76
+ ```ts
77
+ import { VoiceCall } from "@aihumanity/voice-sdk";
78
+
79
+ const call = new VoiceCall({
80
+ apiUrl: "https://api.eimi.ai",
81
+ publicKey: "YOUR_KEY_ID", // from the portal — safe to commit
82
+ agentName: "YourAgent",
83
+ username: "visitor",
84
+ });
85
+ ```
86
+
87
+ **With a backend (more control)**
88
+
89
+ Keep your Key ID and Key Secret on your server and build a small proxy endpoint
90
+ that HMAC-signs the join request. The browser calls your endpoint via
91
+ `fetchJoinUrl` and never touches the eimi API directly:
92
+
93
+ ```ts
94
+ // In your frontend:
95
+ const call = new VoiceCall({
96
+ fetchJoinUrl: async () => {
97
+ const res = await fetch("/api/create-voice-call", { method: "POST" });
98
+ if (!res.ok) throw new Error("Could not start call");
99
+ return res.json(); // { joinUrl, callId, sessionToken }
100
+ },
101
+ agentName: "YourAgent",
102
+ });
103
+ ```
104
+
105
+ ```js
106
+ // On your server (/api/create-voice-call):
107
+ // Sign the request with your Key ID + Key Secret using HMAC-SHA256.
108
+ // See the Authentication section below for the exact signing scheme.
109
+ ```
110
+
111
+ You don't need to register any Allowed Origins when using the server-side path,
112
+ because the HMAC signature — not the browser Origin — is what authenticates
113
+ the request.
114
+
115
+ ---
116
+
117
+ ## Authentication — choosing the right method
118
+
119
+ The SDK supports three auth patterns. Pick the one that matches your deployment.
120
+
121
+ ### Option A — `fetchJoinUrl` (full control)
122
+
123
+ Supply your own async function that returns `{ joinUrl, callId?, sessionToken? }`.
124
+ Use this when your backend already has an endpoint that creates the Ultravox call
125
+ session and you want the SDK to stay out of the request entirely.
126
+
127
+ ```ts
128
+ import { VoiceCall } from "@aihumanity/voice-sdk";
129
+
130
+ const call = new VoiceCall({
131
+ fetchJoinUrl: async () => {
132
+ const res = await fetch("/api/create-call", { method: "POST" });
133
+ if (!res.ok) throw new Error("Could not start call");
134
+ return res.json(); // { joinUrl, callId, sessionToken? }
135
+ },
136
+ agentName: "DavidChiu",
137
+ });
138
+ ```
139
+
140
+ This is the **recommended approach for production web apps**. Your server holds
141
+ the credentials; the browser never sees them.
142
+
143
+ > **`sessionToken`** — When your backend returns a short-lived, call-scoped JWT
144
+ > alongside `joinUrl` / `callId`, include it in the response object. The SDK
145
+ > forwards it to `pollEmotion(callId, sessionToken)` so emotion polling can
146
+ > authenticate without a long-lived secret in the browser.
147
+
148
+ ---
149
+
150
+ ### Option B — `publicKey` (browser-direct, no backend)
151
+
152
+ Use your **Key ID** from the developer portal directly in browser code. The
153
+ server validates requests using the browser's `Origin` header against your
154
+ registered Allowed Origins list — see [Getting your credentials](#getting-your-credentials)
155
+ for the signup and origin registration steps.
156
+
157
+ ```ts
158
+ const call = new VoiceCall({
159
+ apiUrl: "https://api.eimi.ai",
160
+ publicKey: "YOUR_KEY_ID", // Key ID from developer portal — safe to commit
161
+ agentName: "YourAgent",
162
+ username: "visitor",
163
+ });
164
+ ```
165
+
166
+ The SDK sends `X-Public-Key: <publicKey>` and POSTs to
167
+ `${apiUrl}/v1/voice/joinurl`. Override the path with `joinUrlPath` if needed.
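+
+ For reference, the request it issues looks roughly like this. The endpoint and
+ header are as documented above; the body fields are an assumption, mirrored
+ from the constructor options:
+
+ ```ts
+ // Rough sketch of the SDK's built-in join request (Option B).
+ const res = await fetch("https://api.eimi.ai/v1/voice/joinurl", {
+   method: "POST",
+   headers: {
+     "Content-Type": "application/json",
+     "X-Public-Key": "YOUR_KEY_ID",
+   },
+   body: JSON.stringify({ agentName: "YourAgent", username: "visitor" }),
+ });
+ const { joinUrl, callId, sessionToken } = await res.json();
+ ```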
+
+ > Requests from origins not in your Allowed Origins list are rejected with 403.
+ > Add `http://localhost:PORT` while developing locally.
+
+ ---
+
+ ### Option C — `fetchJoinUrl` with HMAC backend proxy
+
+ Keep your Key ID and Key Secret on your server. Your backend endpoint signs the
+ join request; the browser calls your endpoint via `fetchJoinUrl`.
+
+ ```ts
+ // Frontend — no credentials in the browser at all:
+ const call = new VoiceCall({
+   fetchJoinUrl: async () => {
+     const res = await fetch("/api/create-voice-call", { method: "POST" });
+     if (!res.ok) throw new Error("Could not start call");
+     return res.json(); // { joinUrl, callId, sessionToken }
+   },
+   agentName: "YourAgent",
+ });
+ ```
+
+ Your server endpoint signs requests to `POST /v1/voice/joinurl` using
+ HMAC-SHA256:
+
+ ```js
+ // Server-side signing (Node 18+ example; fetch is global).
+ // This snippet runs inside an async route handler, e.g.
+ // app.post("/api/create-voice-call", async (req, res) => { ... }):
+ const crypto = require("crypto");
+ const timestamp = Date.now().toString();
+ const method = "POST";
+ const path = "/v1/voice/joinurl";
+ const canonical = `${timestamp}\n${method}\n${path}`;
+ const signature = crypto
+   .createHmac("sha256", YOUR_KEY_SECRET)
+   .update(canonical)
+   .digest("base64");
+
+ const response = await fetch(`https://api.eimi.ai${path}`, {
+   method: "POST",
+   headers: {
+     "Content-Type": "application/json",
+     "X-SDK-Key-Id": YOUR_KEY_ID,
+     "X-SDK-Timestamp": timestamp,
+     "X-SDK-Signature": signature,
+   },
+   body: JSON.stringify({ agentName: "YourAgent", username: req.user.id }),
+ });
+ return response.json(); // forward { joinUrl, callId, sessionToken } to the browser
+ ```
+
+ `YOUR_KEY_ID` and `YOUR_KEY_SECRET` come from the developer portal. The secret
+ never leaves your server.
+
+ > The `authToken` option (Bearer JWT) also maps to this server-side path but is
+ > intended for internal operator use. External developers should use `fetchJoinUrl`
+ > with HMAC signing as shown above.
+
+ ---
+
+ ## Quick start (vanilla TypeScript / JavaScript)
+
+ ```ts
+ import { VoiceCall, CallStatus } from "@aihumanity/voice-sdk";
+
+ // Option A — recommended for production
+ const call = new VoiceCall({
+   fetchJoinUrl: async () => {
+     const res = await fetch("/.netlify/functions/create-call", { method: "POST" });
+     if (!res.ok) throw new Error("Could not create call session.");
+     return res.json(); // { joinUrl, callId, sessionToken }
+   },
+   // Poll server-side emotion every 15 s using the call-scoped session token.
+   pollEmotion: async (callId, sessionToken) => {
+     const params = new URLSearchParams({ callId });
+     if (sessionToken) params.set("sessionToken", sessionToken);
+     const res = await fetch(`/.netlify/functions/get-emotion?${params}`);
+     if (!res.ok) return null;
+     const data = await res.json();
+     return data?.emotion ?? null;
+   },
+   emotionPollIntervalMs: 15_000,
+   agentName: "DavidChiu",
+ });
+
+ call.on("status", (s) => console.log("call status:", s));
+ call.on("transcript", (t) => console.log(t.speaker, t.text));
+ call.on("emotion", (e) => console.log("emotion:", e.label));
+ call.on("error", (err) => console.error(err));
+
+ document.querySelector("#start")!.addEventListener("click", () => call.start());
+ document.querySelector("#stop")!.addEventListener("click", () => call.end());
+ ```
+
+ ### How the join URL is fetched
+
+ The SDK resolves credentials in this order:
+
+ 1. **`fetchJoinUrl`** — calls your function; skips all built-in request logic.
+ 2. **`publicKey`** — POSTs to `${apiUrl}/v1/voice/joinurl` with `X-Public-Key`.
+ 3. **`authToken`** — POSTs to `${apiUrl}/ultravox/secure/joinurl` with `Authorization: Bearer`.
+
+ The backend response must contain at least `joinUrl`. Optional fields:
+
+ ```jsonc
+ {
+   "joinUrl": "https://...", // required
+   "callId": "uuid", // forwarded to pollEmotion
+   "sessionToken": "eyJ...", // short-lived JWT for emotion polling
+   "emotion": { "dataConnectionEnabled": true, ... }
+ }
+ ```
+
+ Override the default path for the `publicKey` and `authToken` modes (2 and 3
+ above) with `joinUrlPath`:
+
+ ```ts
+ new VoiceCall({ publicKey: "pk_...", joinUrlPath: "/v1/voice/joinurl", ... })
+ ```
+
+ ### Session tokens and emotion polling
+
+ When the backend returns a `sessionToken` alongside the join URL, the SDK stores
+ it for the duration of the call. If you provide a `pollEmotion` callback, the SDK
+ passes both `(callId, sessionToken)` so your function can authenticate the polling
+ request without embedding a service credential in browser code:
+
+ ```ts
+ pollEmotion: async (callId, sessionToken) => {
+   const headers: Record<string, string> = {};
+   if (sessionToken) headers["Authorization"] = `Bearer ${sessionToken}`;
+   const res = await fetch(`/api/calls/${callId}/emotion`, { headers });
+   if (!res.ok) return null;
+   const { emotion } = await res.json();
+   return emotion ?? null;
+ },
+ ```
+
+ ## React
+
+ ```tsx
+ import { useVoiceCall, CallStatus } from "@aihumanity/voice-sdk/react";
+
+ // Define stable callbacks outside the component so the hook doesn't re-run.
+ async function fetchJoinUrl() {
+   const res = await fetch("/api/create-call", { method: "POST" });
+   if (!res.ok) throw new Error("Could not start call");
+   return res.json(); // { joinUrl, callId, sessionToken }
+ }
+
+ async function pollEmotion(callId: string, sessionToken?: string) {
+   const params = new URLSearchParams({ callId });
+   if (sessionToken) params.set("sessionToken", sessionToken);
+   const res = await fetch(`/api/emotion?${params}`);
+   if (!res.ok) return null;
+   const data = await res.json();
+   return data?.emotion ?? null;
+ }
+
+ const VOICE_OPTS = { fetchJoinUrl, pollEmotion, emotionPollIntervalMs: 15_000 };
+
+ function TalkButton() {
+   const {
+     status, isLive, isBusy, transcripts, lastEmotion,
+     micMuted, error, start, end, toggleMicMute,
+   } = useVoiceCall(VOICE_OPTS);
+
+   return (
+     <div>
+       <button onClick={isLive || isBusy ? end : start}>
+         {isLive ? "End" : isBusy ? "Connecting…" : "Talk"}
+       </button>
+       <button onClick={toggleMicMute} disabled={!isLive}>
+         {micMuted ? "Unmute" : "Mute"}
+       </button>
+       {error && <p style={{ color: "tomato" }}>{error.message}</p>}
+       {lastEmotion && <p>Vocal emotion: {lastEmotion}</p>}
+       <ul>
+         {transcripts.map((t, i) => (
+           <li key={i}><b>{t.speaker}:</b> {t.text}</li>
+         ))}
+       </ul>
+     </div>
+   );
+ }
+ ```
+
+ Status values map directly onto `CallStatus`:
+
+ | `CallStatus` | When you'll see it |
+ | ---------------- | ------------------------------------------------------------------ |
+ | `IDLE` | Before `start()` and after the call has fully ended. |
+ | `CONNECTING` | Fetching the join URL or running the WebRTC handshake. |
+ | `CONNECTED` | Call is live and the agent is waiting (no one is talking). |
+ | `LISTENING` | Mic is open and capturing user audio. |
+ | `THINKING` | Agent is reasoning about the user's last utterance. |
+ | `SPEAKING` | Agent is generating audio. |
+ | `DISCONNECTING` | `end()` was called; teardown in progress. |
+ | `DISCONNECTED` | Terminal state from ultravox-client; SDK normalises back to `IDLE`. |
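+
+ If you want friendlier UI strings than the raw values, a small mapping works.
+ A sketch, assuming `CallStatus` is exposed as an enum with the member names
+ from the table above:
+
+ ```tsx
+ import { CallStatus } from "@aihumanity/voice-sdk/react";
+
+ // Hypothetical label mapping for a button or badge in your own UI.
+ function statusLabel(status: CallStatus): string {
+   switch (status) {
+     case CallStatus.CONNECTING: return "Connecting…";
+     case CallStatus.LISTENING: return "Listening";
+     case CallStatus.THINKING: return "Thinking…";
+     case CallStatus.SPEAKING: return "Speaking";
+     case CallStatus.CONNECTED: return "On call";
+     case CallStatus.DISCONNECTING: return "Ending…";
+     default: return "Start a call"; // IDLE / DISCONNECTED
+   }
+ }
+ ```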
+
+ ## Floating widget
+
+ Mount a self-contained mic button + call panel anywhere:
+
+ ```ts
+ import { mountFloatingWidget } from "@aihumanity/voice-sdk/widget";
+
+ // Option A — server-side proxy (recommended)
+ mountFloatingWidget({
+   fetchJoinUrl: () =>
+     fetch("/api/create-call", { method: "POST" }).then((r) => r.json()),
+   agentName: "DavidChiu",
+   persona: {
+     name: "David Chiu",
+     title: "Founder & CEO · AIHumanity",
+     initials: "DC",
+     intro: "Have a real-time voice conversation with David — ask anything.",
+   },
+ });
+
+ // Option B — browser-direct with a public key
+ mountFloatingWidget({
+   apiUrl: "https://api.eimi.ai",
+   publicKey: "pk_live_abc123", // register your origin in the developer portal first
+   agentName: "DavidChiu",
+   persona: { name: "David Chiu", initials: "DC" },
+ });
+ ```
+
+ Or via a plain `<script>` tag (IIFE build):
+
+ ```html
+ <script src="https://your.cdn/aihumanity-voice.iife.js"></script>
+ <script>
+   // Browser-direct with a public key
+   AIHVoice.mountFloatingWidget({
+     apiUrl: "https://api.eimi.ai",
+     publicKey: "pk_live_abc123",
+     agentName: "DavidChiu",
+     persona: { name: "David Chiu", initials: "DC" },
+   });
+ </script>
+ ```
+
+ The widget renders inside a Shadow DOM, so its CSS won't fight your site's.
+
+ ## Events reference
+
+ | Event | Payload | Notes |
+ | --------------- | ---------------------------------------- | ------------------------------------------------------ |
+ | `status` | `CallStatus` | Coarse semantic status. |
+ | `raw_status` | `string` | Underlying ultravox-client status string. |
+ | `transcript` | `Transcript` | Fired per added/updated entry. |
+ | `transcripts` | `Transcript[]` | Snapshot after each transcript update. |
+ | `emotion` | `{ label: string, raw: unknown }` | Emitted when the emotion regex matches a data message. |
+ | `data_message` | `unknown` | Every `experimental_message` payload. |
+ | `mic_muted` | `boolean` | |
+ | `speaker_muted` | `boolean` | |
+ | `contact_saved` | `void` | Heuristic, based on the agent's transcript. |
+ | `warning` | `string` | E.g. emotion bridge not configured. |
+ | `error` | `Error` | Fatal error during start or mid-call. |
+ | `ended` | `void` | Fires once the underlying session disconnects. |
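+
+ Every event uses the same subscribe shape, and `on` returns an unsubscribe
+ function (see the API surface below), which keeps teardown simple:
+
+ ```ts
+ // Subscribe, keep the unsubscribe handles, release them on cleanup.
+ const offStatus = call.on("status", (s) => console.log("status:", s));
+ const offWarning = call.on("warning", (msg) => console.warn("voice-sdk:", msg));
+
+ // ...later, e.g. when your view unmounts:
+ offStatus();
+ offWarning();
+ ```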
+
+ ## API surface
+
+ ```ts
+ class VoiceCall {
+   constructor(options: VoiceCallOptions);
+
+   // Read-only state
+   readonly status: CallStatus;
+   readonly callId: string | null;
+   readonly transcripts: Transcript[];
+   readonly lastEmotion: string | null;
+   readonly contactSaved: boolean;
+   readonly isMicMuted: boolean;
+   readonly isSpeakerMuted: boolean;
+   readonly emotionMeta: ServerEmotionMeta | null;
+   readonly rawSession: UltravoxSession | null;
+
+   // Events
+   on<E>(event, listener): () => void; // returns unsubscribe
+   off<E>(event, listener): void;
+   once<E>(event, listener): () => void;
+
+   // Control
+   start(): Promise<void>;
+   end(): Promise<void>;
+   muteMic(): void; unmuteMic(): void; toggleMicMute(): boolean;
+   muteSpeaker(): void; unmuteSpeaker(): void; toggleSpeakerMute(): boolean;
+   sendText(text: string, deferResponse?: boolean): void;
+   sendData(obj: unknown): void;
+   dispose(): void;
+ }
+ ```
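+
+ For example, `sendText` / `sendData` let the page feed context into a live
+ call. A sketch; how the agent interprets `sendData` payloads (and the exact
+ effect of `deferResponse`) depends on your agent configuration:
+
+ ```ts
+ // Inject a text turn; the second argument presumably asks the agent to
+ // defer its spoken response (name taken from the signature above).
+ call.sendText("The user just opened the pricing page.", true);
+
+ // Send an arbitrary data message to the session (hypothetical payload shape).
+ call.sendData({ type: "page_context", url: location.href });
+ ```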
+
+ ## Building from source
+
+ ```bash
+ npm install
+ npm run build        # ESM + CJS + .d.ts (library mode)
+ npm run build:iife   # bundled <script>-tag build
+ npm run build:all
+ npm run typecheck
+ ```
+
+ The `examples/demo.html` page loads `dist/aihumanity-voice.iife.js`, so run
+ `npm run build:all` once before opening it. The `npm run demo` script does
+ both for you.
+
+ ## Roadmap
+
+ - Streaming partial-emotion confidences (instead of just the last label).
+ - Pluggable transcript renderers (Markdown, ReactMarkdown).
+ - Server-side helper to mint short-lived per-user JWTs.
+ - Unit tests for emotion-pattern matching and status mapping.
+
+ ## License
+
+ MIT