@aihumanity/voice-sdk 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 AIHumanity
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,486 @@
+ # @aihumanity/voice-sdk
+
+ A small, batteries-included JavaScript SDK for embedding AIHumanity / **eimi**
+ voice AI calls on any web page.
+
+ It wraps [`ultravox-client`](https://www.npmjs.com/package/ultravox-client) and
+ adds the things you almost always end up writing yourself:
+
+ - One-call setup: the SDK fetches the `joinUrl` from your eimi backend and joins the call for you.
+ - A semantic call-state machine: `idle` → `connecting` → `connected` → `listening` / `speaking` / `thinking` → `disconnecting` → `idle`.
+ - Live transcripts, with `transcript` / `transcripts` events and a snapshot getter.
+ - Vocal-emotion extraction from `[EMOTION_CONTEXT]` data messages produced by the eimi emotion bridge (configurable regex).
+ - Mic / speaker mute helpers.
+ - A pre-built React hook (`@aihumanity/voice-sdk/react`).
+ - A pre-built floating-button widget (`@aihumanity/voice-sdk/widget`) — drop a single `<script>` tag on any site.
+
+ ## Installation
+
+ ```bash
+ npm install @aihumanity/voice-sdk ultravox-client
+ # or
+ pnpm add @aihumanity/voice-sdk ultravox-client
+ ```
+
+ `ultravox-client` is a required runtime dependency; it is installed separately
+ so that multiple SDKs and apps can dedupe it. React is an *optional* peer
+ dependency — only needed if you import the React adapter.
+
+ The SDK is ESM-only because `ultravox-client` is ESM-only. Use `import` syntax
+ or a bundler that supports ESM packages.
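+
+ If you're stuck in a CommonJS codebase, dynamic `import()` is one way to load
+ an ESM-only package. A minimal sketch (the helper name is hypothetical):
+
+ ```ts
+ // Dynamic import() works from CommonJS files too: it returns a
+ // promise that resolves to the module namespace.
+ async function loadVoiceSdk() {
+   const { VoiceCall } = await import("@aihumanity/voice-sdk");
+   return VoiceCall;
+ }
+ ```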
+
+ For zero-build `<script>`-tag use you can also load the IIFE bundle directly
+ from `dist/aihumanity-voice.iife.js` (see the demo).
+
+ ## Getting your credentials
+
+ Before writing any code you need a developer account. The whole process takes
+ about two minutes and is self-service.
+
+ ### 1 — Sign up at the developer portal
+
+ Go to **[portal.eimi.ai](https://portal.eimi.ai)** and create an account.
+ Once verified you land on your dashboard.
+
+ ### 2 — Note your Key ID
+
+ In the **API Keys** tab you'll see two values:
+
+ | Field | What it is | Where you use it |
+ | --- | --- | --- |
+ | **SDK Key ID** | Identifies your developer account | `publicKey` option *or* as the key ID in HMAC signing |
+ | **SDK Key Secret** | Signs server-to-server requests | Never put this in browser code |
+
+ The Key ID is the same value regardless of which auth mode you choose.
+
+ ### 3 — Choose your integration path
+
+ **No backend (simplest)**
+
+ Use your **Key ID** directly as `publicKey`. You also need to tell the server
+ which origins are allowed to use it — otherwise every request is rejected.
+
+ In the portal under **API Keys → Allowed Origins**, add the exact origin(s)
+ your site runs on:
+
+ ```
+ https://myapp.com
+ https://staging.myapp.com
+ http://localhost:5173 ← add this while developing locally
+ ```
+
+ An origin is `scheme + host + port` — no path, no trailing slash.
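+ For example:
+
+ ```
+ https://myapp.com           ✔ valid origin
+ http://myapp.com:8080       ✔ valid (the port is part of the origin)
+ https://myapp.com/          ✘ trailing slash
+ https://myapp.com/checkout  ✘ includes a path
+ ```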
73
+
74
+ Then in your code:
75
+
76
+ ```ts
77
+ import { VoiceCall } from "@aihumanity/voice-sdk";
78
+
79
+ const call = new VoiceCall({
80
+ apiUrl: "https://api.eimi.ai",
81
+ publicKey: "YOUR_KEY_ID", // from the portal — safe to commit
82
+ agentName: "YourAgent",
83
+ username: "visitor",
84
+ });
85
+ ```
86
+
87
+ **With a backend (more control)**
88
+
89
+ Keep your Key ID and Key Secret on your server and build a small proxy endpoint
90
+ that HMAC-signs the join request. The browser calls your endpoint via
91
+ `fetchJoinUrl` and never touches the eimi API directly:
92
+
93
+ ```ts
94
+ // In your frontend:
95
+ const call = new VoiceCall({
96
+ fetchJoinUrl: async () => {
97
+ const res = await fetch("/api/create-voice-call", { method: "POST" });
98
+ if (!res.ok) throw new Error("Could not start call");
99
+ return res.json(); // { joinUrl, callId, sessionToken }
100
+ },
101
+ agentName: "YourAgent",
102
+ });
103
+ ```
104
+
105
+ ```js
106
+ // On your server (/api/create-voice-call):
107
+ // Sign the request with your Key ID + Key Secret using HMAC-SHA256.
108
+ // See the Authentication section below for the exact signing scheme.
109
+ ```
110
+
111
+ You don't need to register any Allowed Origins when using the server-side path,
112
+ because the HMAC signature — not the browser Origin — is what authenticates
113
+ the request.
114
+
115
+ ---
116
+
117
+ ## Authentication — choosing the right method
118
+
119
+ The SDK supports three auth patterns. Pick the one that matches your deployment.
120
+
121
+ ### Option A — `fetchJoinUrl` (full control)
122
+
123
+ Supply your own async function that returns `{ joinUrl, callId?, sessionToken? }`.
124
+ Use this when your backend already has an endpoint that creates the Ultravox call
125
+ session and you want the SDK to stay out of the request entirely.
126
+
127
+ ```ts
128
+ import { VoiceCall } from "@aihumanity/voice-sdk";
129
+
130
+ const call = new VoiceCall({
131
+ fetchJoinUrl: async () => {
132
+ const res = await fetch("/api/create-call", { method: "POST" });
133
+ if (!res.ok) throw new Error("Could not start call");
134
+ return res.json(); // { joinUrl, callId, sessionToken? }
135
+ },
136
+ agentName: "DavidChiu",
137
+ });
138
+ ```
139
+
140
+ This is the **recommended approach for production web apps**. Your server holds
141
+ the credentials; the browser never sees them.
142
+
143
+ > **`sessionToken`** — When your backend returns a short-lived, call-scoped JWT
144
+ > alongside `joinUrl` / `callId`, include it in the response object. The SDK
145
+ > forwards it to `pollEmotion(callId, sessionToken)` so emotion polling can
146
+ > authenticate without a long-lived secret in the browser.
147
+
148
+ ---
149
+
150
+ ### Option B — `publicKey` (browser-direct, no backend)
151
+
152
+ Use your **Key ID** from the developer portal directly in browser code. The
153
+ server validates requests using the browser's `Origin` header against your
154
+ registered Allowed Origins list — see [Getting your credentials](#getting-your-credentials)
155
+ for the signup and origin registration steps.
156
+
157
+ ```ts
158
+ const call = new VoiceCall({
159
+ apiUrl: "https://api.eimi.ai",
160
+ publicKey: "YOUR_KEY_ID", // Key ID from developer portal — safe to commit
161
+ agentName: "YourAgent",
162
+ username: "visitor",
163
+ });
164
+ ```
165
+
166
+ The SDK sends `X-Public-Key: <publicKey>` and POSTs to
167
+ `${apiUrl}/v1/voice/joinurl`. Override the path with `joinUrlPath` if needed.
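+
+ For reference, the request it issues looks roughly like this. The endpoint and
+ header are as documented above; the body fields are an assumption, mirrored
+ from the constructor options:
+
+ ```ts
+ // Rough sketch of the SDK's built-in join request (Option B).
+ const res = await fetch("https://api.eimi.ai/v1/voice/joinurl", {
+   method: "POST",
+   headers: {
+     "Content-Type": "application/json",
+     "X-Public-Key": "YOUR_KEY_ID",
+   },
+   body: JSON.stringify({ agentName: "YourAgent", username: "visitor" }),
+ });
+ const { joinUrl, callId, sessionToken } = await res.json();
+ ```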
+
+ > Requests from origins not in your Allowed Origins list are rejected with 403.
+ > Add `http://localhost:PORT` while developing locally.
+
+ ---
+
+ ### Option C — `fetchJoinUrl` with HMAC backend proxy
+
+ Keep your Key ID and Key Secret on your server. Your backend endpoint signs the
+ join request; the browser calls your endpoint via `fetchJoinUrl`.
+
+ ```ts
+ // Frontend — no credentials in the browser at all:
+ const call = new VoiceCall({
+   fetchJoinUrl: async () => {
+     const res = await fetch("/api/create-voice-call", { method: "POST" });
+     if (!res.ok) throw new Error("Could not start call");
+     return res.json(); // { joinUrl, callId, sessionToken }
+   },
+   agentName: "YourAgent",
+ });
+ ```
+
+ Your server endpoint signs requests to `POST /v1/voice/joinurl` using
+ HMAC-SHA256:
+
+ ```js
+ // Server-side signing (Node 18+ example; fetch is global).
+ // This snippet runs inside an async route handler, e.g.
+ // app.post("/api/create-voice-call", async (req, res) => { ... }):
+ const crypto = require("crypto");
+ const timestamp = Date.now().toString();
+ const method = "POST";
+ const path = "/v1/voice/joinurl";
+ const canonical = `${timestamp}\n${method}\n${path}`;
+ const signature = crypto
+   .createHmac("sha256", YOUR_KEY_SECRET)
+   .update(canonical)
+   .digest("base64");
+
+ const response = await fetch(`https://api.eimi.ai${path}`, {
+   method: "POST",
+   headers: {
+     "Content-Type": "application/json",
+     "X-SDK-Key-Id": YOUR_KEY_ID,
+     "X-SDK-Timestamp": timestamp,
+     "X-SDK-Signature": signature,
+   },
+   body: JSON.stringify({ agentName: "YourAgent", username: req.user.id }),
+ });
+ return response.json(); // forward { joinUrl, callId, sessionToken } to the browser
+ ```
+
+ `YOUR_KEY_ID` and `YOUR_KEY_SECRET` come from the developer portal. The secret
+ never leaves your server.
+
+ > The `authToken` option (Bearer JWT) also maps to this server-side path but is
+ > intended for internal operator use. External developers should use `fetchJoinUrl`
+ > with HMAC signing as shown above.
+
+ ---
+
+ ## Quick start (vanilla TypeScript / JavaScript)
+
+ ```ts
+ import { VoiceCall, CallStatus } from "@aihumanity/voice-sdk";
+
+ // Option A — recommended for production
+ const call = new VoiceCall({
+   fetchJoinUrl: async () => {
+     const res = await fetch("/.netlify/functions/create-call", { method: "POST" });
+     if (!res.ok) throw new Error("Could not create call session.");
+     return res.json(); // { joinUrl, callId, sessionToken }
+   },
+   // Poll server-side emotion every 15 s using the call-scoped session token.
+   pollEmotion: async (callId, sessionToken) => {
+     const params = new URLSearchParams({ callId });
+     if (sessionToken) params.set("sessionToken", sessionToken);
+     const res = await fetch(`/.netlify/functions/get-emotion?${params}`);
+     if (!res.ok) return null;
+     const data = await res.json();
+     return data?.emotion ?? null;
+   },
+   emotionPollIntervalMs: 15_000,
+   agentName: "DavidChiu",
+ });
+
+ call.on("status", (s) => console.log("call status:", s));
+ call.on("transcript", (t) => console.log(t.speaker, t.text));
+ call.on("emotion", (e) => console.log("emotion:", e.label));
+ call.on("error", (err) => console.error(err));
+
+ document.querySelector("#start")!.addEventListener("click", () => call.start());
+ document.querySelector("#stop")!.addEventListener("click", () => call.end());
+ ```
+
+ ### How the join URL is fetched
+
+ The SDK resolves credentials in this order:
+
+ 1. **`fetchJoinUrl`** — calls your function; skips all built-in request logic.
+ 2. **`publicKey`** — POSTs to `${apiUrl}/v1/voice/joinurl` with `X-Public-Key`.
+ 3. **`authToken`** — POSTs to `${apiUrl}/ultravox/secure/joinurl` with `Authorization: Bearer`.
+
+ The backend response must contain at least `joinUrl`. Optional fields:
+
+ ```jsonc
+ {
+   "joinUrl": "https://...", // required
+   "callId": "uuid", // forwarded to pollEmotion
+   "sessionToken": "eyJ...", // short-lived JWT for emotion polling
+   "emotion": { "dataConnectionEnabled": true, ... }
+ }
+ ```
+
+ Override the default path for the `publicKey` and `authToken` modes (2 and 3
+ above) with `joinUrlPath`:
+
+ ```ts
+ new VoiceCall({ publicKey: "pk_...", joinUrlPath: "/v1/voice/joinurl", ... })
+ ```
+
+ ### Session tokens and emotion polling
+
+ When the backend returns a `sessionToken` alongside the join URL, the SDK stores
+ it for the duration of the call. If you provide a `pollEmotion` callback, the SDK
+ passes both `(callId, sessionToken)` so your function can authenticate the polling
+ request without embedding a service credential in browser code:
+
+ ```ts
+ pollEmotion: async (callId, sessionToken) => {
+   const headers: Record<string, string> = {};
+   if (sessionToken) headers["Authorization"] = `Bearer ${sessionToken}`;
+   const res = await fetch(`/api/calls/${callId}/emotion`, { headers });
+   if (!res.ok) return null;
+   const { emotion } = await res.json();
+   return emotion ?? null;
+ },
+ ```
+
+ ## React
+
+ ```tsx
+ import { useVoiceCall, CallStatus } from "@aihumanity/voice-sdk/react";
+
+ // Define stable callbacks outside the component so the hook doesn't re-run.
+ async function fetchJoinUrl() {
+   const res = await fetch("/api/create-call", { method: "POST" });
+   if (!res.ok) throw new Error("Could not start call");
+   return res.json(); // { joinUrl, callId, sessionToken }
+ }
+
+ async function pollEmotion(callId: string, sessionToken?: string) {
+   const params = new URLSearchParams({ callId });
+   if (sessionToken) params.set("sessionToken", sessionToken);
+   const res = await fetch(`/api/emotion?${params}`);
+   if (!res.ok) return null;
+   const data = await res.json();
+   return data?.emotion ?? null;
+ }
+
+ const VOICE_OPTS = { fetchJoinUrl, pollEmotion, emotionPollIntervalMs: 15_000 };
+
+ function TalkButton() {
+   const {
+     status, isLive, isBusy, transcripts, lastEmotion,
+     micMuted, error, start, end, toggleMicMute,
+   } = useVoiceCall(VOICE_OPTS);
+
+   return (
+     <div>
+       <button onClick={isLive || isBusy ? end : start}>
+         {isLive ? "End" : isBusy ? "Connecting…" : "Talk"}
+       </button>
+       <button onClick={toggleMicMute} disabled={!isLive}>
+         {micMuted ? "Unmute" : "Mute"}
+       </button>
+       {error && <p style={{ color: "tomato" }}>{error.message}</p>}
+       {lastEmotion && <p>Vocal emotion: {lastEmotion}</p>}
+       <ul>
+         {transcripts.map((t, i) => (
+           <li key={i}><b>{t.speaker}:</b> {t.text}</li>
+         ))}
+       </ul>
+     </div>
+   );
+ }
+ ```
+
+ Status values map directly onto `CallStatus`:
+
+ | `CallStatus` | When you'll see it |
+ | ---------------- | ------------------------------------------------------------------ |
+ | `IDLE` | Before `start()` and after the call has fully ended. |
+ | `CONNECTING` | Fetching the join URL or running the WebRTC handshake. |
+ | `CONNECTED` | Call is live and the agent is waiting (no one is talking). |
+ | `LISTENING` | Mic is open and capturing user audio. |
+ | `THINKING` | Agent is reasoning about the user's last utterance. |
+ | `SPEAKING` | Agent is generating audio. |
+ | `DISCONNECTING` | `end()` was called; teardown in progress. |
+ | `DISCONNECTED` | Terminal state from ultravox-client; SDK normalises back to `IDLE`. |
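+
+ If you want friendlier UI strings than the raw values, a small mapping works.
+ A sketch, assuming `CallStatus` is exposed as an enum with the member names
+ from the table above:
+
+ ```tsx
+ import { CallStatus } from "@aihumanity/voice-sdk/react";
+
+ // Hypothetical label mapping for a button or badge in your own UI.
+ function statusLabel(status: CallStatus): string {
+   switch (status) {
+     case CallStatus.CONNECTING: return "Connecting…";
+     case CallStatus.LISTENING: return "Listening";
+     case CallStatus.THINKING: return "Thinking…";
+     case CallStatus.SPEAKING: return "Speaking";
+     case CallStatus.CONNECTED: return "On call";
+     case CallStatus.DISCONNECTING: return "Ending…";
+     default: return "Start a call"; // IDLE / DISCONNECTED
+   }
+ }
+ ```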
+
+ ## Floating widget
+
+ Mount a self-contained mic button + call panel anywhere:
+
+ ```ts
+ import { mountFloatingWidget } from "@aihumanity/voice-sdk/widget";
+
+ // Option A — server-side proxy (recommended)
+ mountFloatingWidget({
+   fetchJoinUrl: () =>
+     fetch("/api/create-call", { method: "POST" }).then((r) => r.json()),
+   agentName: "DavidChiu",
+   persona: {
+     name: "David Chiu",
+     title: "Founder & CEO · AIHumanity",
+     initials: "DC",
+     intro: "Have a real-time voice conversation with David — ask anything.",
+   },
+ });
+
+ // Option B — browser-direct with a public key
+ mountFloatingWidget({
+   apiUrl: "https://api.eimi.ai",
+   publicKey: "pk_live_abc123", // register your origin in the developer portal first
+   agentName: "DavidChiu",
+   persona: { name: "David Chiu", initials: "DC" },
+ });
+ ```
+
+ Or via a plain `<script>` tag (IIFE build):
+
+ ```html
+ <script src="https://your.cdn/aihumanity-voice.iife.js"></script>
+ <script>
+   // Browser-direct with a public key
+   AIHVoice.mountFloatingWidget({
+     apiUrl: "https://api.eimi.ai",
+     publicKey: "pk_live_abc123",
+     agentName: "DavidChiu",
+     persona: { name: "David Chiu", initials: "DC" },
+   });
+ </script>
+ ```
+
+ The widget renders inside a Shadow DOM, so its CSS won't fight your site's.
+
+ ## Events reference
+
+ | Event | Payload | Notes |
+ | --------------- | ---------------------------------------- | ------------------------------------------------------ |
+ | `status` | `CallStatus` | Coarse semantic status. |
+ | `raw_status` | `string` | Underlying ultravox-client status string. |
+ | `transcript` | `Transcript` | Fired per added/updated entry. |
+ | `transcripts` | `Transcript[]` | Snapshot after each transcript update. |
+ | `emotion` | `{ label: string, raw: unknown }` | Emitted when the emotion regex matches a data message. |
+ | `data_message` | `unknown` | Every `experimental_message` payload. |
+ | `mic_muted` | `boolean` | |
+ | `speaker_muted` | `boolean` | |
+ | `contact_saved` | `void` | Heuristic, based on the agent's transcript. |
+ | `warning` | `string` | E.g. emotion bridge not configured. |
+ | `error` | `Error` | Fatal error during start or mid-call. |
+ | `ended` | `void` | Fires once the underlying session disconnects. |
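+
+ Every event uses the same subscribe shape, and `on` returns an unsubscribe
+ function (see the API surface below), which keeps teardown simple:
+
+ ```ts
+ // Subscribe, keep the unsubscribe handles, release them on cleanup.
+ const offStatus = call.on("status", (s) => console.log("status:", s));
+ const offWarning = call.on("warning", (msg) => console.warn("voice-sdk:", msg));
+
+ // ...later, e.g. when your view unmounts:
+ offStatus();
+ offWarning();
+ ```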
+
+ ## API surface
+
+ ```ts
+ class VoiceCall {
+   constructor(options: VoiceCallOptions);
+
+   // Read-only state
+   readonly status: CallStatus;
+   readonly callId: string | null;
+   readonly transcripts: Transcript[];
+   readonly lastEmotion: string | null;
+   readonly contactSaved: boolean;
+   readonly isMicMuted: boolean;
+   readonly isSpeakerMuted: boolean;
+   readonly emotionMeta: ServerEmotionMeta | null;
+   readonly rawSession: UltravoxSession | null;
+
+   // Events
+   on<E>(event, listener): () => void; // returns unsubscribe
+   off<E>(event, listener): void;
+   once<E>(event, listener): () => void;
+
+   // Control
+   start(): Promise<void>;
+   end(): Promise<void>;
+   muteMic(): void; unmuteMic(): void; toggleMicMute(): boolean;
+   muteSpeaker(): void; unmuteSpeaker(): void; toggleSpeakerMute(): boolean;
+   sendText(text: string, deferResponse?: boolean): void;
+   sendData(obj: unknown): void;
+   dispose(): void;
+ }
+ ```
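+
+ For example, `sendText` / `sendData` let the page feed context into a live
+ call. A sketch; how the agent interprets `sendData` payloads (and the exact
+ effect of `deferResponse`) depends on your agent configuration:
+
+ ```ts
+ // Inject a text turn; the second argument presumably asks the agent to
+ // defer its spoken response (name taken from the signature above).
+ call.sendText("The user just opened the pricing page.", true);
+
+ // Send an arbitrary data message to the session (hypothetical payload shape).
+ call.sendData({ type: "page_context", url: location.href });
+ ```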
+
+ ## Building from source
+
+ ```bash
+ npm install
+ npm run build        # ESM + CJS + .d.ts (library mode)
+ npm run build:iife   # bundled <script>-tag build
+ npm run build:all
+ npm run typecheck
+ ```
+
+ The `examples/demo.html` page loads `dist/aihumanity-voice.iife.js`, so run
+ `npm run build:all` once before opening it. The `npm run demo` script does
+ both for you.
+
+ ## Roadmap
+
+ - Streaming partial-emotion confidences (instead of just the last label).
+ - Pluggable transcript renderers (Markdown, ReactMarkdown).
+ - Server-side helper to mint short-lived per-user JWTs.
+ - Unit tests for emotion-pattern matching and status mapping.
+
+ ## License
+
+ MIT