
@craftedxp/voice-js 0.2.0


Files changed (17)

package/CONSUMING.md ADDED

@@ -0,0 +1,120 @@
+# Consuming `@craftedxp/voice-js` in your app
+Walks through the three install paths (local tarball → fastest, file: dep → middle, npm registry → production) plus the platform-specific glue you need on each side.
+## TL;DR
+```bash
+# in YOUR app
+npm install @craftedxp/voice-js
+# Node consumers also need:
+npm install ws
+# wire up (the lines below are TypeScript for your app code, not shell commands):
+import { configureVoiceClient } from '@craftedxp/voice-js'
+const voice = configureVoiceClient({ apiBase, fetchToken })
+const call = await voice.startCall({ agentId, ...callbacks })
+```
+That's the whole thing in the browser. No native build, no patches, no permissions hassle (browser's standard mic prompt covers it).
+## Path 1 — Local tarball (fastest, no registry)
+Best for quick verification while iterating on the SDK locally. Same workflow as the React Native SDK.
+```bash
+# in the SDK repo
+cd voice-assistant/sdk/voice-js
+npm install
+npm pack  # → craftedxp-voice-js-0.2.0.tgz
+# in YOUR app
+cd /path/to/your-app
+npm install /abs/path/to/voice-assistant/sdk/voice-js/craftedxp-voice-js-0.2.0.tgz
+```
+`npm pack` runs `prepare` automatically, which rebuilds `dist/` (browser + node bundles + embed IIFE) into the tarball.
+## Path 2 — file: dep (monorepo / side-by-side)
+For the landing dashboard in this repo:
+```jsonc
+{
+  "dependencies": {
+    "@craftedxp/voice-js": "file:../sdk/voice-js"
+  }
+}
+```
+`dist/` is checked into git for this dep type because Vercel-style deploys don't reliably run the `prepare` hook for `file:` deps (the same reason `dist/` is committed under `sdk/voice-rn` and `sdk/sdk-node`).
+After SDK changes, run `npm run build` in `sdk/voice-js/`; your consumer picks up the new `dist/` on the next refresh.
+## Path 3 — npm registry (production)
+```bash
+npm install @craftedxp/voice-js
+# Node-side consumers:
+npm install ws
+```
+`ws` is a peer dependency marked optional via `peerDependenciesMeta`, so npm doesn't force-install it for browser-only consumers. Add it explicitly when running under Node / Electron-main.
+## Backend setup — minting `ct_` tokens
+Your `fetchToken` callback hits YOUR backend. Your backend uses the `sk_` API key (held server-side only) to mint a short-lived `ct_` for the SDK to use. Pattern:
+```ts
+// YOUR backend (e.g. /api/voice/mint route)
+import { CraftedXP } from '@craftedxp/sdk-node'
+const platform = new CraftedXP({ apiKey: process.env.CRAFTEDXP_SK })
+export async function POST(req) {
+  // Your auth — verify the user is signed in, extract their userId etc.
+  const session = await getSession(req)
+  if (!session) return new Response('unauthorized', { status: 401 })
+  const { agentId, context, metadata } = await req.json()
+  // Mint with the user's identity. Server attaches contactId, applies the
+  // org's tier gate, etc.
+  const token = await platform.callTokens.mint({
+    agentId,
+    contactId: session.userId,
+    context,
+    metadata,
+    ttlSeconds: 600,
+  })
+  return Response.json({ token: token.token })
+}
+```
+The SDK calls this on initial connect and (eventually) on `token_expired` mid-call. Don't share the same `ct_` across browser tabs: each token is short-lived and scoped to one call session.
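The client half of this pattern can be sketched as a small `fetchToken` builder that fails loudly when the mint route misbehaves. This is a minimal sketch, not the SDK's own code: `makeFetchToken`, `FetchLike`, and `MintResponse` are hypothetical names, and the `{ token }` response shape is taken from the backend example above.

```ts
// Minimal structural types so the sketch does not depend on DOM or Node typings.
interface MintResponse {
  ok: boolean
  status: number
  json(): Promise<unknown>
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<MintResponse>

interface FetchTokenArgs {
  agentId: string
  context?: Record<string, unknown>
  metadata?: Record<string, string>
}

// Hypothetical helper: builds a fetchToken callback that POSTs to your mint
// route and rejects unless the response carries a string token starting "ct_".
function makeFetchToken(mintUrl: string, fetchImpl: FetchLike) {
  return async (args: FetchTokenArgs): Promise<string> => {
    const res = await fetchImpl(mintUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(args),
    })
    if (!res.ok) throw new Error(`mint failed with HTTP ${res.status}`)
    const body = (await res.json()) as { token?: unknown }
    if (typeof body.token !== 'string' || !body.token.startsWith('ct_')) {
      throw new Error('mint response did not contain a ct_ token')
    }
    return body.token
  }
}
```

In a browser you would pass something like `(url, init) => fetch(url, init)` as `fetchImpl` and hand the result to `configureVoiceClient({ apiBase, fetchToken })`. Injecting the fetch function keeps the helper testable without a network.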
+## iOS Safari / mobile browsers
+Browsers require a user gesture to start `AudioContext`. The SDK calls `audioContext.resume()` automatically, but if you call `startCall` outside a click/tap handler, the AudioContext may stay suspended and you won't hear the agent.
+**Always invoke `startCall` from inside a click handler** (or a `Promise.resolve().then()` chain that started in one). The `getUserMedia` mic prompt also won't fire on iOS without a gesture.
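One way to follow this rule is to start the call directly inside the click listener and ignore repeat clicks while a call is starting. The sketch below is illustrative only: `bindStartOnClick` and `Clickable` are hypothetical names, and `start` is assumed to wrap `voice.startCall(...)`.

```ts
// Minimal structural type so the sketch runs without DOM typings; it is
// intended to be satisfied by a DOM button element.
interface Clickable {
  addEventListener(type: 'click', listener: () => void): void
}

// Hypothetical helper: wires `start` to a button and guards against double clicks.
function bindStartOnClick<C>(
  button: Clickable,
  start: () => Promise<C>,
  onReady: (call: C) => void,
  onFail: (err: unknown) => void,
): () => void {
  let busy = false
  button.addEventListener('click', () => {
    if (busy) return // a call is already starting or active
    busy = true
    // start() runs synchronously inside the click listener, so the browser
    // still attributes AudioContext and getUserMedia to the user gesture.
    start().then(onReady, (err: unknown) => {
      busy = false // allow a retry after a failed start
      onFail(err)
    })
  })
  // Call the returned function (for example from onEnd) to re-arm the button.
  return () => {
    busy = false
  }
}
```

The returned re-arm function is a design choice of this sketch: the helper cannot know when the call ends, so the caller resets it from `onEnd`.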
+## CSP / mic permission
+For consumers running on a strict CSP, allow:
+- `connect-src wss://your-voxline-server.com`
+- `worker-src 'self' blob:` (the audio worklet is registered from a Blob URL)
+Browsers also need `https` for `getUserMedia` (or `localhost` during dev).
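The two directives above can be derived from the same `apiBase` you pass to the SDK. This is a sketch under two assumptions: the WebSocket endpoint shares `apiBase`'s host, and `voiceCspDirectives` is a hypothetical helper name, not part of the package.

```ts
// Derives the two CSP directives listed above from the SDK's apiBase.
// Assumes the WebSocket endpoint lives on the same host as apiBase.
function voiceCspDirectives(apiBase: string): string[] {
  const url = new URL(apiBase)
  if (url.protocol !== 'https:') {
    throw new Error(`apiBase must be an https URL, got ${url.protocol}`)
  }
  return [`connect-src wss://${url.host}`, `worker-src 'self' blob:`]
}
```

Merge the result into your existing policy rather than replacing it. `connect-src` also governs `fetch`, so the origin your `fetchToken` callback calls (often `'self'`) needs to be listed there as well.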
+## Debug logging
+The SDK doesn't log to the console by default. To see protocol-level events, wire all the `onStateChange` / `onTranscript` / `onError` / `onEnd` callbacks and log them yourself — that's the same surface the SDK uses internally.
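A small wrapper can add that logging without touching your real handlers. The sketch below assumes only the four callback names listed above; `withDebugLogging` is a hypothetical helper, and the default `[voice]` log prefix is arbitrary.

```ts
const LOGGED = ['onStateChange', 'onTranscript', 'onError', 'onEnd'] as const
type LoggedName = (typeof LOGGED)[number]
type LoggedCallbacks = Partial<Record<LoggedName, (payload: any) => void>>

// Hypothetical helper: returns a copy of the startCall options in which each
// of the four callbacks logs its payload first, then calls the original handler
// (if one was supplied). Other options pass through unchanged.
function withDebugLogging<T extends LoggedCallbacks>(
  options: T,
  log: (name: LoggedName, payload: unknown) => void = (name, payload) =>
    console.log(`[voice] ${name}`, payload),
): T & Required<LoggedCallbacks> {
  const wrapped = {} as Required<LoggedCallbacks>
  for (const name of LOGGED) {
    const original: ((payload: any) => void) | undefined = options[name]
    wrapped[name] = (payload: unknown) => {
      log(name, payload)
      if (original) original(payload)
    }
  }
  return { ...options, ...wrapped }
}
```

Usage would look like `voice.startCall(withDebugLogging({ agentId, onTranscript }))`. Callbacks you did not supply are still logged, which is usually what you want while debugging.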
+## Updating
+When the SDK changes:
+- **Tarball path:** re-`npm pack` then `npm install <newTgz>` in the consumer.
+- **`file:` path:** `npm run build` in `sdk/voice-js/` (refreshes `dist/`); the consumer picks it up on the next bundler refresh.
+- **Registry path:** bump the version in your `package.json` and `npm install`.
+Major-version bumps document breaking changes in the README's `Status` section, along with a migration block.

package/DEVELOPING.md ADDED

@@ -0,0 +1,78 @@
+# Developing `@craftedxp/voice-js` locally
+The JS SDK is pure JS: no native toolchain, no platform quirks. Iteration is fast: edit TS → tsup rebuilds `dist/` → your consumer app picks it up.
+## TL;DR
+```bash
+cd voice-assistant/sdk/voice-js
+npm install
+npm run dev       # tsup --watch, rebuilds browser + node + embed on save
+```
+Leave that running. In a consumer project, either:
+1. **Via Yalc (recommended):** `yalc publish` in the SDK, `yalc add @craftedxp/voice-js` in the consumer. `yalc push` after each SDK rebuild.
+2. **Via file-dep:** `npm install file:/abs/path/to/sdk/voice-js` in the consumer; re-run after each rebuild (or use `--install-links` to copy).
+Either way, the consumer's bundler (Webpack / Vite / esbuild / Next) picks up the new `dist/` on next hot reload.
+## Bundle layout (0.2.0)
+`tsup.config.ts` emits three artefacts:
+- `dist/browser.{mjs,js}` + `.d.ts` — browser entry. Uses native `WebSocket` + `AudioContext` + `getUserMedia`. The default under the package's `browser` and `default` exports conditions.
+- `dist/node.{mjs,js}` + `.d.ts` — Node entry. Uses `ws` (loaded via dynamic `import()` so it stays an OPTIONAL peer). No audio glue. Picked under the `node` condition or `@craftedxp/voice-js/node`.
+- `dist/embed.iife.js` — minified IIFE for `<script>` embed; bundles the browser entry inline.
+Source files map to:
+- `src/browser.ts` — entry, factory implementation, public re-exports.
+- `src/node.ts` — entry, dynamic `ws` loader, factory implementation.
+- `src/VoiceClient.ts` — browser `BrowserVoiceClient` implementing the `Call` interface.
+- `src/NodeVoiceClient.ts` — node `NodeVoiceClient` implementing the `NodeCall` interface (extends `Call` with `sendAudioChunk`).
+- `src/protocol.ts` — shared WS message handling + state machine + types. Both clients call into this.
+- `src/config.ts` — `VoiceClientConfig` / `StartCallOptions` / `VoiceClientFactory` types + `normalizeConfig` + `mergeStartCallContext` helpers.
+- `src/AudioCapture.ts` / `src/AudioPlayback.ts` — browser-only. Lifted unchanged from the `@voxline/web` 0.1.x code.
+- `src/ReconnectingWebSocket.ts` — transport-agnostic; takes a `wsFactory` so browser + node both reuse it.
+## Testing the embed bundle
+The embed widget lives at `src/widget/embed.ts` and compiles to `dist/embed.iife.js`, which is copied to `/web/embed.js` post-build. The main server (port 8080) serves `/embed.js` as a static file.
+To iterate on the widget:
+1. `npm run dev` in `sdk/voice-js` — watches all three entries (browser, node, embed).
+2. Open `http://localhost:8080/embed.js` in a browser tab after each save — `curl` or View-Source to confirm your changes landed.
+3. Test end-to-end against `http://localhost:3000` (the landing page's EmbedDemo mounts the widget in place) or the `/test` page.
+Note: the post-build copy (`embed:copy` script) only runs after a full `npm run build`, not after `tsup --watch`. For a fresh watch-mode embed, rerun the build manually or add an `onchange` watcher:
+```bash
+# optional: auto-copy on every tsup output change
+npx onchange 'dist/embed.iife.js' -- npm run embed:copy
+```
+## Debugging in the browser
+- **Source maps** are enabled for the main SDK bundle. In DevTools you'll see `VoiceClient.ts`, `AudioCapture.ts` etc. in Sources. The embed bundle is minified (it ships to end users) and has no source map — file an issue if you need one for local debugging.
+- **`console.log`** inside `VoiceClient` shows up in the consumer's browser console.
+- **AudioWorklet code** (`mic-downsampler.worklet.js`) runs in the audio rendering thread. `console.log` inside the worklet works but has a 1–2 frame delay. For tight debugging, post messages back to the main thread with `port.postMessage({debug: ...})`.
+## Two quick sanity checks before cutting a release
+1. **Build succeeds + tarball is clean:**
+   ```bash
+   npm run build
+   npm pack --dry-run
+   # expected: only dist/, README.md, CONSUMING.md, DEVELOPING.md, package.json
+   ```
+2. **Embed IIFE runs standalone:**
+   ```bash
+   # open an empty HTML page; paste the <script> tag from the README
+   # click the button, verify getUserMedia prompt + agent greeting
+   ```
+## Related
+- [CONSUMING.md](./CONSUMING.md) — external install paths
+- `../react-native/DEVELOPING.md` — same story for mobile

package/README.md ADDED

@@ -0,0 +1,245 @@
+# @craftedxp/voice-js
+JS SDK for embedding a voice agent call in any JS environment — browser tabs, Node.js processes, Electron apps. Zero framework deps.
+Companion to [`@craftedxp/voice-rn`](https://www.npmjs.com/package/@craftedxp/voice-rn) (React Native) and [`@craftedxp/sdk-node`](https://www.npmjs.com/package/@craftedxp/sdk-node) (server-side `sk_` SDK).
+> **Internal testing release.** API surface may evolve before a stable release. **0.2.0** is a breaking rename + redesign of the previous `@voxline/web@0.1.0` — the singleton-`VoiceClient`-with-`apiKey` pattern is gone in favour of a `configureVoiceClient({ fetchToken })` factory that mirrors `voice-rn` 0.3.x. See [Migrating from `@voxline/web`](#migrating-from-voxlineweb) below.
+## Install
+```bash
+npm install @craftedxp/voice-js
+# Node consumers also need:
+npm install ws
+```
+`ws` is declared as an OPTIONAL peer — only needed in Node / Electron-main. Browsers use the native `WebSocket` and skip it.
+## How the integration fits together
+The same three-party flow as `voice-rn`. Your backend mints `ct_` tokens with its `sk_` API key (via `@craftedxp/sdk-node` or a raw `POST /v1/call-tokens`), the SDK calls your `fetchToken` callback whenever it needs a fresh one, and your client never sees `sk_`.
+```
+┌─────────────────┐        ┌──────────────────┐        ┌─────────────────┐
+│  Your web app   │        │ Your backend     │        │ Voxline server  │
+│                 │        │                  │        │                 │
+│  fetchToken ────┼───────►│  call Voxline ──┼───────►│  mint ct_       │
+│        │        │        │  with sk_       │        │       │         │
+│        │◄───────┼────────┼──── ct_ ────────┼────────┼─── ct_          │
+│  startCall(...) ┼────────┼──── WSS /v1/agents/.../call?token=ct_ ─────►│
+└─────────────────┘        └──────────────────┘        └─────────────────┘
+```
+The `sk_` API key never lives in browser code. **The SDK has no `apiKey` option.** Pre-0.2 had one, so anyone following the docs would bake their server credential into client code; that whole class of footgun is gone.
+## Quick start (browser)
+```ts
+import { configureVoiceClient } from '@craftedxp/voice-js'
+const voice = configureVoiceClient({
+  apiBase: 'https://api.your-server.com',
+  // SDK calls this whenever it needs a fresh ct_ — initial connect
+  // and any mid-call token refresh. Your backend handles the mint.
+  fetchToken: async ({ agentId }) => {
+    const r = await fetch('/api/voice/mint', {
+      method: 'POST',
+      body: JSON.stringify({ agentId }),
+    })
+    return (await r.json()).token
+  },
+  // Optional — applied to every call. Per-call options merge on top.
+  defaultMetadata: { surface: 'web', appVersion: '1.4.0' },
+})
+// Per call (typically inside a click handler so the AudioContext gets
+// the user gesture it needs):
+const call = await voice.startCall({
+  agentId: 'agt_xxx',
+  context: { userId: 'usr_123', topic: 'billing' },
+  metadata: { sessionId: 'sess_x' },
+  bargeIn: true,
+  onStateChange: (state) => console.log('state', state),
+  onTranscript: (entries) => render(entries),
+  onVolume: ({ input, output }) => drawMeters(input, output),
+  onError: ({ code, message }) => toast(`${code}: ${message}`),
+  onEnd: ({ reason, durationMs }) => log('ended', reason, durationMs),
+})
+call.mute()    // gate mic frames (server still sees wire cadence)
+call.unmute()
+call.end()     // close WS + stop mic + fire onEnd
+```
+## Quick start (Node / Electron-main)
+```ts
+import { configureVoiceClient } from '@craftedxp/voice-js/node'
+import { spawn } from 'child_process'
+const voice = configureVoiceClient({
+  apiBase: 'https://api.your-server.com',
+  fetchToken: async () => mintFromMyBackend(),
+})
+// Bring your own audio. Example: sox subprocesses for mic + speakers.
+const mic = spawn('sox', ['-d', '-r', '16000', '-c', '1', '-b', '16', '-e', 'signed', '-t', 'raw', '-'])
+const spk = spawn('sox', ['-t', 'raw', '-r', '16000', '-c', '1', '-b', '16', '-e', 'signed', '-', '-d'])
+const call = await voice.startCall({
+  agentId: 'agt_xxx',
+  onAudioChunk: (pcm) => spk.stdin.write(Buffer.from(pcm)),
+  onEnd: () => { mic.kill(); spk.stdin.end() },
+})
+mic.stdout.on('data', (chunk) => call.sendAudioChunk(chunk))
+```
+The Node bundle has the same `configureVoiceClient` / `startCall` shape, plus an extra `sendAudioChunk(pcm)` method on the call handle and an `onAudioChunk(pcm)` per-call callback. **No built-in audio adapter** — feed PCM in/out yourself with whatever your host has handy (sox, PortAudio, RTP relay, Electron IPC bridge).
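Because `onVolume` is browser-only, a Node consumer that wants level meters has to compute them from the PCM it already handles. The sketch below is one way to do that for the 16-bit signed mono format used in the sox example. `pcm16Rms` is a hypothetical helper, and little-endian sample order is an assumption (it matches sox's raw output on little-endian hosts).

```ts
// Computes a 0-1 RMS level for one chunk of signed 16-bit mono PCM.
// Assumption: samples are little-endian.
function pcm16Rms(chunk: ArrayBuffer | ArrayBufferView): number {
  const view = ArrayBuffer.isView(chunk)
    ? new DataView(chunk.buffer, chunk.byteOffset, chunk.byteLength)
    : new DataView(chunk)
  const samples = Math.floor(view.byteLength / 2) // ignore a trailing odd byte
  if (samples === 0) return 0
  let sumSquares = 0
  for (let i = 0; i < samples; i++) {
    const s = view.getInt16(i * 2, true) / 32768 // normalise to [-1, 1)
    sumSquares += s * s
  }
  return Math.sqrt(sumSquares / samples)
}
```

A Node `Buffer` is an `ArrayBufferView`, so the same function can be called on chunks from `mic.stdout` before `sendAudioChunk` and on the `pcm` argument of `onAudioChunk`.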
+## API reference
+### `configureVoiceClient(config)`
+| Field | Type | Notes |
+|---|---|---|
+| `apiBase` | `string` | Full HTTPS URL of the Voxline server. WS scheme derived: `https`→`wss`. Trailing slash optional. |
+| `fetchToken` | `(args) => Promise<string>` | Called by the SDK whenever it needs a fresh `ct_`. Mirrors `@craftedxp/voice-rn`'s shape exactly — `{ agentId, userId?, context?, metadata? }`. |
+| `defaultMetadata` | `Record<string, string>?` | Applied to every `startCall`. Per-call merges on top. |
+| `defaultContext` | `Record<string, unknown>?` | Applied to every `startCall`. Per-call merges on top. |
+Returns a `VoiceClientFactory` with one method:
+### `factory.startCall(options)`
+| Field | Type | Notes |
+|---|---|---|
+| `agentId` | `string` | Required. |
+| `userId` | `string?` | Round-tripped to fetchToken as `userId`; server uses it for contact memory. |
+| `context` | `Record<string, unknown>?` | Per-call structured context. Merged on top of `defaultContext`. Lowered into the agent's system prompt server-side. |
+| `metadata` | `Record<string, string>?` | Per-call key/value. Merged on top of `defaultMetadata`. Round-tripped on `call.ended` webhook. NOT lowered into the prompt. |
+| `bargeIn` | `boolean?` | Default `true`. Set `false` for alarm-style flows where the user shouldn't accidentally interrupt the script. |
+| `token` | `string?` | **Test-only escape hatch** — pre-minted `ct_`, bypasses `fetchToken`. Don't use in production. |
+| `onStateChange` | `(state) => void` | Fires on every state machine transition. |
+| `onTranscript` | `(entries) => void` | Fires on every transcript update. |
+| `onVolume` | `({ input, output }) => void` | 0-1 RMS. ~10 Hz cadence. Browser bundle only. |
+| `onError` | `(err) => void` | Stable `code` from `CallErrorCode`; matches `voice-rn` codes where they overlap. |
+| `onEnd` | `({ reason, errorCode?, durationMs }) => void` | Fires once when the call ends. |
+Resolves to a `Call` handle:
+```ts
+interface Call {
+  readonly state: CallState
+  readonly transcript: TranscriptEntry[]
+  readonly isMuted: boolean
+  end: () => void
+  mute: () => void
+  unmute: () => void
+}
+```
+Node consumers get a `NodeCall` extension with one extra method:
+```ts
+interface NodeCall extends Call {
+  sendAudioChunk: (pcm: ArrayBuffer | ArrayBufferView) => boolean
+}
+```
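The "per-call merges on top" rule in the option tables above can be pictured as a key-by-key override. The sketch below assumes a shallow merge; the README does not say whether nested `context` objects are merged deeply, and `mergeOnTop` is a hypothetical name, not the SDK's helper.

```ts
// Shallow merge: per-call keys override defaults; neither input is mutated.
// Assumption: the merge is shallow (nested objects are replaced, not combined).
function mergeOnTop<V>(
  defaults: Record<string, V> | undefined,
  perCall: Record<string, V> | undefined,
): Record<string, V> {
  return { ...defaults, ...perCall }
}
```

Under this assumption, `mergeOnTop({ surface: 'web', appVersion: '1.4.0' }, { sessionId: 'sess_x' })` yields all three keys, and a per-call `surface` would replace the default one.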
+### Stable types
+```ts
+type CallState =
+  | 'idle' | 'connecting' | 'listening'
+  | 'user_speaking' | 'agent_speaking'
+  | 'ended' | 'error'
+type CallErrorCode =
+  | 'missing_credentials' | 'forbidden'
+  | 'mic_denied' | 'mic_start_failed' | 'audio_session_failed'
+  | 'token_expired' | 'token_invalid' | 'unauthorized'
+  | 'network_unreachable' | 'socket_error'
+  | 'payment_required' | 'not_found'
+  | 'silence_timeout' | 'server_error'
+type CallEndReason = 'agent_ended' | 'user_hangup' | 'timeout' | 'error'
+```
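Since the codes are a closed union, an `onError` handler can switch over them exhaustively and let the compiler flag any code added in a later release. The sketch below copies the `CallErrorCode` union from above; the message strings and the grouping of codes are illustrative guesses based on the code names, not documented semantics.

```ts
type CallErrorCode =
  | 'missing_credentials' | 'forbidden'
  | 'mic_denied' | 'mic_start_failed' | 'audio_session_failed'
  | 'token_expired' | 'token_invalid' | 'unauthorized'
  | 'network_unreachable' | 'socket_error'
  | 'payment_required' | 'not_found'
  | 'silence_timeout' | 'server_error'

// Hypothetical helper: maps each code to a user-facing message.
function describeCallError(code: CallErrorCode): string {
  switch (code) {
    case 'mic_denied':
      return 'Microphone access was denied. Allow it in the browser and retry.'
    case 'mic_start_failed':
    case 'audio_session_failed':
      return 'The audio device could not be started.'
    case 'missing_credentials':
    case 'forbidden':
    case 'token_expired':
    case 'token_invalid':
    case 'unauthorized':
      return 'The call could not be authorised. Please sign in again.'
    case 'network_unreachable':
    case 'socket_error':
      return 'Connection problem. Check your network and retry.'
    case 'payment_required':
      return 'This account cannot place calls right now.'
    case 'not_found':
      return 'The requested agent was not found.'
    case 'silence_timeout':
      return 'The call ended after a period of silence.'
    case 'server_error':
      return 'The server reported an error. Please retry later.'
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new
      // code is added to the union without a matching case.
      const unhandled: never = code
      return `Unexpected error: ${String(unhandled)}`
    }
  }
}
```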
+## Migrating from `@voxline/web`
+```diff
+- import { VoiceClient } from '@voxline/web'
++ import { configureVoiceClient } from '@craftedxp/voice-js'
+- const client = new VoiceClient({
+-   apiBase: 'https://api.example.com',
+-   agentId: 'agt_xxx',
+-   apiKey: 'sk_REDACTED',           // ⚠️ DO NOT in client code
+-   variables: { topic: 'billing' },
+- })
+- client.on('state', (s) => setState(s))
+- client.on('transcript', (t) => setTranscript(t))
+- client.on('volume', (v) => setVolume(v))
+- client.on('error', (e) => setError(e))
+- client.on('close', () => setState('ended'))
+- await client.connect()
+- // …
+- client.mute(true)
+- client.disconnect()
++ const voice = configureVoiceClient({
++   apiBase: 'https://api.example.com',
++   fetchToken: async ({ agentId }) => {
++     const r = await fetch('/api/voice/mint', {
++       method: 'POST',
++       body: JSON.stringify({ agentId }),
++     })
++     return (await r.json()).token  // your backend uses sk_ to mint ct_
++   },
++ })
++ const call = await voice.startCall({
++   agentId: 'agt_xxx',
++   context: { topic: 'billing' },                    // was `variables`
++   onStateChange: (s) => setState(s),
++   onTranscript: (t) => setTranscript(t),
++   onVolume: (v) => setVolume(v),
++   onError: (e) => setError(e),
++   onEnd: ({ reason }) => setState('ended'),
++ })
++ // …
++ call.mute()                                         // toggleable: mute/unmute, no boolean
++ call.end()
+```
+Three semantic shifts to be aware of:
+1. **`apiKey` is gone.** The SDK no longer accepts it. Move your token mint to your backend. If you have an `sk_` in JS code today, that's a credential leak — rotate it after the migration.
+2. **`variables` → `context`.** Same purpose; new name lines up with `voice-rn`.
+3. **`mute(true|false)` → `mute()` / `unmute()`.** Symmetric with `voice-rn`.
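If existing UI code still passes a boolean, a one-line shim keeps it working during the migration. `setMuted` and `MutableCall` are hypothetical names; the shim relies only on the `mute()` / `unmute()` methods documented on the `Call` handle.

```ts
// The subset of the Call handle this shim needs.
interface MutableCall {
  mute: () => void
  unmute: () => void
}

// Adapts the old mute(true|false) call style to the new mute()/unmute() pair.
function setMuted(call: MutableCall, muted: boolean): void {
  if (muted) call.mute()
  else call.unmute()
}
```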
+The embed widget (`<script src="embed.js" data-token="ct_...">`) keeps the same HTML API, but the `data-api-key` attribute is no longer accepted — mint server-side and inject `data-token` instead.
+## Embed widget
+For drop-in `<script>` consumers (landing pages, no-build embeds):
+```html
+<script
+  src="https://your-cdn/embed.js"
+  data-token="ct_REDACTED"
+  data-agent-id="agt_xxx"
+  data-api-base="https://api.your-server.com"
+  defer
+></script>
+```
+Renders a floating call button with a Shadow-DOM transcript panel. Pre-mint the `ct_` server-side and inject it into the `data-token` attribute when you render the page.
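Server-side injection of the token can be sketched as a function that renders the tag with attribute escaping. `renderEmbedTag`, `escapeAttr`, and `EmbedTagOptions` are hypothetical names; the attribute names are the ones shown in the snippet above, and minting the `ct_` itself is left to your backend.

```ts
// Escapes a value for use inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

interface EmbedTagOptions {
  src: string
  token: string // freshly minted ct_ for this page render
  agentId: string
  apiBase: string
}

// Renders the embed <script> tag with the per-request token injected.
function renderEmbedTag(o: EmbedTagOptions): string {
  return [
    '<script',
    `  src="${escapeAttr(o.src)}"`,
    `  data-token="${escapeAttr(o.token)}"`,
    `  data-agent-id="${escapeAttr(o.agentId)}"`,
    `  data-api-base="${escapeAttr(o.apiBase)}"`,
    '  defer',
    '></script>',
  ].join('\n')
}
```

Because `ct_` tokens are short-lived, render the tag per request rather than caching the page with a token baked in.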
+## Status
+- **0.2.0** (current) — first `@craftedxp/voice-js` release. Browser + Node dual bundle, `fetchToken` factory, voice-rn 0.3.x parity. Migration path from `@voxline/web@0.1.0` documented above.
+- 0.1.0 — `@voxline/web`. Singleton `VoiceClient` class, `apiKey` accepted. Retired in 0.2.0; never published to npm so no deprecation window.
+See [`CONSUMING.md`](CONSUMING.md) for the full setup walkthrough and [`DEVELOPING.md`](DEVELOPING.md) for SDK-author iteration.