@navai/voice-frontend 0.1.5 → 0.1.7

package/README.en.md CHANGED
@@ -26,11 +26,12 @@ npm install react
 
 This package is intentionally split by concern:
 
- 1. `src/backend.ts`
- HTTP client for backend routes:
- - `POST /navai/realtime/client-secret`
- - `GET /navai/functions`
- - `POST /navai/functions/execute`
+ 1. `src/backend.ts`
+ HTTP client for backend routes:
+ - `POST /navai/realtime/client-secret`
+ - `POST /navai/speech/synthesize`
+ - `GET /navai/functions`
+ - `POST /navai/functions/execute`
 
 2. `src/runtime.ts`
 Runtime resolver for:
@@ -68,11 +69,12 @@ Hook-driven runtime flow (`useWebVoiceAgent`):
 
 1. Resolve runtime config from `moduleLoaders` + `defaultRoutes` + env/options.
 2. Create backend client with `apiBaseUrl` or `NAVAI_API_URL`.
- 3. On `start()`:
- - request client secret.
- - fetch backend function list.
- - build Navai agent with local + backend functions.
- - connect `RealtimeSession`.
+ 3. On `start()`:
+ - request client secret.
+ - read `speech.provider` from backend response.
+ - fetch backend function list.
+ - build Navai agent with local + backend functions.
+ - connect `RealtimeSession`.
 4. On `stop()`:
 - close session and reset state.
 
@@ -106,9 +108,18 @@ Useful types:
 
 - `NavaiRoute`
 - `NavaiFunctionDefinition`
- - `NavaiFunctionsRegistry`
- - `NavaiBackendFunctionDefinition`
- - `UseWebVoiceAgentOptions`
+ - `NavaiFunctionsRegistry`
+ - `NavaiBackendFunctionDefinition`
+ - `NavaiBackendSpeechConfig`
+ - `UseWebVoiceAgentOptions`
+
+ ## Hybrid Speech Mode
+
+ When backend returns `speech.provider: "elevenlabs"`:
+
+ - `useWebVoiceAgent` updates the Realtime session to use `output_modalities: ["text"]`.
+ - assistant final text is sent to `backendClient.synthesizeSpeech(...)`.
+ - playback happens locally in the browser with the synthesized ElevenLabs audio.
 
 ## Tool Model and Behavior
 
@@ -234,16 +245,18 @@ For browser realtime multi-agent orchestration, `buildNavaiAgent` currently wire
 2. `env.NAVAI_API_URL`.
 3. fallback `http://localhost:3000`.
 
- Methods:
-
- - `createClientSecret(input?)`
- - `listFunctions()`
- - `executeFunction({ functionName, payload })`
+ Methods:
+
+ - `createClientSecret(input?)`
+ - `synthesizeSpeech({ text, ... })`
+ - `listFunctions()`
+ - `executeFunction({ functionName, payload })`
 
 Error handling:
 
- - network/HTTP failures throw for create/execute.
- - function listing returns warnings and empty list on failures.
+ - network/HTTP failures throw for create/execute.
+ - function listing returns warnings and empty list on failures.
+ - `createClientSecret()` returns `{ value, expires_at, speech }`, where `speech.provider` is `openai` or `elevenlabs`.
 
 ## Generated Module Loader CLI
 
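The README hunk above documents a new `POST /navai/speech/synthesize` route returning `{ provider, mimeType, audioBase64 }`. A minimal consumer-side sketch, with hypothetical helper names — only the route path and response shape come from this diff, and the byte decoding mirrors the `audioUrlFromSynthesis` helper visible in `dist/index.cjs` further down:

```typescript
// Response shape of POST /navai/speech/synthesize, as shown in this diff.
type SynthesizedSpeech = {
  provider: "elevenlabs";
  mimeType: string;
  audioBase64: string;
};

// Decode the base64 payload into raw audio bytes (same loop as the
// audioUrlFromSynthesis helper in the compiled output).
function decodeAudio(result: SynthesizedSpeech): Uint8Array {
  const binary = atob(result.audioBase64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i += 1) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Hypothetical caller; the real package wraps this in createNavaiBackendClient.
async function synthesize(apiBaseUrl: string, text: string): Promise<SynthesizedSpeech> {
  const response = await fetch(`${apiBaseUrl}/navai/speech/synthesize`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!response.ok) {
    throw new Error(`speech synthesis failed: ${response.status}`);
  }
  return (await response.json()) as SynthesizedSpeech;
}
```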
package/README.es.md CHANGED
@@ -26,11 +26,12 @@ npm install react
 
 The package is split by concern:
 
- 1. `src/backend.ts`
- HTTP client for backend routes:
- - `POST /navai/realtime/client-secret`
- - `GET /navai/functions`
- - `POST /navai/functions/execute`
+ 1. `src/backend.ts`
+ HTTP client for backend routes:
+ - `POST /navai/realtime/client-secret`
+ - `POST /navai/speech/synthesize`
+ - `GET /navai/functions`
+ - `POST /navai/functions/execute`
 
 2. `src/runtime.ts`
 Runtime resolver for:
@@ -68,11 +69,12 @@ Hook flow (`useWebVoiceAgent`):
 
 1. Resolves runtime config from `moduleLoaders` + `defaultRoutes` + env/options.
 2. Creates backend client with `apiBaseUrl` or `NAVAI_API_URL`.
- 3. On `start()`:
- - requests client secret.
- - requests the backend function list.
- - builds the Navai agent with local + backend functions.
- - connects `RealtimeSession`.
+ 3. On `start()`:
+ - requests client secret.
+ - reads `speech.provider` from the backend response.
+ - requests the backend function list.
+ - builds the Navai agent with local + backend functions.
+ - connects `RealtimeSession`.
 4. On `stop()`:
 - closes the session and resets state.
 
@@ -106,9 +108,18 @@ Useful types:
 
 - `NavaiRoute`
 - `NavaiFunctionDefinition`
- - `NavaiFunctionsRegistry`
- - `NavaiBackendFunctionDefinition`
- - `UseWebVoiceAgentOptions`
+ - `NavaiFunctionsRegistry`
+ - `NavaiBackendFunctionDefinition`
+ - `NavaiBackendSpeechConfig`
+ - `UseWebVoiceAgentOptions`
+
+ ## Hybrid Speech Mode
+
+ When the backend returns `speech.provider: "elevenlabs"`:
+
+ - `useWebVoiceAgent` updates the Realtime session with `output_modalities: ["text"]`.
+ - the assistant's final text is sent to `backendClient.synthesizeSpeech(...)`.
+ - playback happens locally in the browser with the ElevenLabs-synthesized audio.
 
 ## Tool Model and Behavior
 
@@ -234,16 +245,18 @@ Base URL priority in `createNavaiBackendClient`:
 2. `env.NAVAI_API_URL`.
 3. Fallback `http://localhost:3000`.
 
- Methods:
-
- - `createClientSecret(input?)`
- - `listFunctions()`
- - `executeFunction({ functionName, payload })`
+ Methods:
+
+ - `createClientSecret(input?)`
+ - `synthesizeSpeech({ text, ... })`
+ - `listFunctions()`
+ - `executeFunction({ functionName, payload })`
 
 Error handling:
 
- - network/HTTP failures throw on create/execute.
- - function listing returns warnings + an empty list on failures.
+ - network/HTTP failures throw on create/execute.
+ - function listing returns warnings + an empty list on failures.
+ - `createClientSecret()` returns `{ value, expires_at, speech }`, where `speech.provider` can be `openai` or `elevenlabs`.
 
 ## Module Loader Generator CLI
 
package/README.md CHANGED
@@ -26,11 +26,12 @@ npm install react
 
 This package is intentionally split by concern:
 
- 1. `src/backend.ts`
- HTTP client for backend routes:
- - `POST /navai/realtime/client-secret`
- - `GET /navai/functions`
- - `POST /navai/functions/execute`
+ 1. `src/backend.ts`
+ HTTP client for backend routes:
+ - `POST /navai/realtime/client-secret`
+ - `POST /navai/speech/synthesize`
+ - `GET /navai/functions`
+ - `POST /navai/functions/execute`
 
 2. `src/runtime.ts`
 Runtime resolver for:
@@ -68,11 +69,12 @@ Hook-driven runtime flow (`useWebVoiceAgent`):
 
 1. Resolve runtime config from `moduleLoaders` + `defaultRoutes` + env/options.
 2. Create backend client with `apiBaseUrl` or `NAVAI_API_URL`.
- 3. On `start()`:
- - request client secret.
- - fetch backend function list.
- - build Navai agent with local + backend functions.
- - connect `RealtimeSession`.
+ 3. On `start()`:
+ - request client secret.
+ - read `speech.provider` from backend response.
+ - fetch backend function list.
+ - build Navai agent with local + backend functions.
+ - connect `RealtimeSession`.
 4. On `stop()`:
 - close session and reset state.
 
@@ -106,9 +108,18 @@ Useful types:
 
 - `NavaiRoute`
 - `NavaiFunctionDefinition`
- - `NavaiFunctionsRegistry`
- - `NavaiBackendFunctionDefinition`
- - `UseWebVoiceAgentOptions`
+ - `NavaiFunctionsRegistry`
+ - `NavaiBackendFunctionDefinition`
+ - `NavaiBackendSpeechConfig`
+ - `UseWebVoiceAgentOptions`
+
+ ## Hybrid Speech Mode
+
+ When backend returns `speech.provider: "elevenlabs"`:
+
+ - `useWebVoiceAgent` updates the Realtime session to use `output_modalities: ["text"]`.
+ - assistant final text is sent to `backendClient.synthesizeSpeech(...)`.
+ - playback happens locally in the browser with the synthesized ElevenLabs audio.
 
 ## Tool Model and Behavior
 
@@ -247,16 +258,18 @@ For browser realtime multi-agent orchestration, `buildNavaiAgent` currently wire
 2. `env.NAVAI_API_URL`.
 3. fallback `http://localhost:3000`.
 
- Methods:
-
- - `createClientSecret(input?)`
- - `listFunctions()`
- - `executeFunction({ functionName, payload })`
-
- Error handling:
-
- - network/HTTP failures throw for create/execute.
- - function listing returns warnings and empty list on failures.
+ Methods:
+
+ - `createClientSecret(input?)`
+ - `synthesizeSpeech({ text, ... })`
+ - `listFunctions()`
+ - `executeFunction({ functionName, payload })`
+
+ Error handling:
+
+ - network/HTTP failures throw for create/execute.
+ - function listing returns warnings and empty list on failures.
+ - `createClientSecret()` returns `{ value, expires_at, speech }`, where `speech.provider` is `openai` or `elevenlabs`.
 
 ## Generated Module Loader CLI
 
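The Hybrid Speech Mode bullets above amount to a single branch when the Realtime session is created. A sketch of that decision — the `outputModalities` key appears in the compiled output later in this diff, while the helper name here is hypothetical:

```typescript
// Sketch: pick RealtimeSession options from the speech provider returned by
// createClientSecret(). Mirrors the conditional visible in dist/index.cjs.
type SpeechProvider = "openai" | "elevenlabs";

function sessionConfigFor(provider: SpeechProvider): { config?: { outputModalities: string[] } } {
  // ElevenLabs mode: the model emits only text; audio is synthesized
  // out-of-band via the backend and played back locally.
  if (provider === "elevenlabs") {
    return { config: { outputModalities: ["text"] } };
  }
  // OpenAI mode: default session; audio comes from the Realtime API itself.
  return {};
}
```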
package/dist/index.cjs CHANGED
@@ -1153,6 +1153,7 @@ var DEFAULT_API_BASE_URL = "http://localhost:3000";
 var DEFAULT_CLIENT_SECRET_PATH = "/navai/realtime/client-secret";
 var DEFAULT_FUNCTIONS_LIST_PATH = "/navai/functions";
 var DEFAULT_FUNCTIONS_EXECUTE_PATH = "/navai/functions/execute";
+ var DEFAULT_SPEECH_SYNTHESIZE_PATH = "/navai/speech/synthesize";
 function readOptional(value) {
 const trimmed = value?.trim();
 return trimmed ? trimmed : void 0;
@@ -1165,6 +1166,12 @@ function joinUrl(baseUrl, path) {
 function isRecord(value) {
 return Boolean(value && typeof value === "object");
 }
+ function readSpeechConfig(payload) {
+ if (isRecord(payload) && isRecord(payload.speech) && payload.speech.provider === "elevenlabs") {
+ return { provider: "elevenlabs" };
+ }
+ return { provider: "openai" };
+ }
 async function readTextSafe(response) {
 try {
 return await response.text();
@@ -1185,6 +1192,7 @@ function createNavaiBackendClient(options = {}) {
 const clientSecretUrl = joinUrl(apiBaseUrl, options.clientSecretPath ?? DEFAULT_CLIENT_SECRET_PATH);
 const functionsListUrl = joinUrl(apiBaseUrl, options.functionsListPath ?? DEFAULT_FUNCTIONS_LIST_PATH);
 const functionsExecuteUrl = joinUrl(apiBaseUrl, options.functionsExecutePath ?? DEFAULT_FUNCTIONS_EXECUTE_PATH);
+ const speechSynthesizeUrl = joinUrl(apiBaseUrl, options.speechSynthesizePath ?? DEFAULT_SPEECH_SYNTHESIZE_PATH);
 async function createClientSecret(input = {}) {
 const response = await fetchImpl(clientSecretUrl, {
 method: "POST",
@@ -1200,7 +1208,27 @@ function createNavaiBackendClient(options = {}) {
 }
 return {
 value: payload.value,
- expires_at: typeof payload.expires_at === "number" ? payload.expires_at : void 0
+ expires_at: typeof payload.expires_at === "number" ? payload.expires_at : void 0,
+ speech: readSpeechConfig(payload)
+ };
+ }
+ async function synthesizeSpeech(input) {
+ const response = await fetchImpl(speechSynthesizeUrl, {
+ method: "POST",
+ headers: { "Content-Type": "application/json" },
+ body: JSON.stringify(input)
+ });
+ if (!response.ok) {
+ throw new Error(await readTextSafe(response));
+ }
+ const payload = await readJsonSafe(response);
+ if (!isRecord(payload) || payload.provider !== "elevenlabs" || typeof payload.mimeType !== "string" || typeof payload.audioBase64 !== "string") {
+ throw new Error("Invalid speech synthesis response.");
+ }
+ return {
+ provider: "elevenlabs",
+ mimeType: payload.mimeType,
+ audioBase64: payload.audioBase64
 };
 }
 async function listFunctions() {
@@ -1260,6 +1288,7 @@ function createNavaiBackendClient(options = {}) {
 };
 return {
 createClientSecret,
+ synthesizeSpeech,
 listFunctions,
 executeFunction
 };
@@ -1621,9 +1650,77 @@ function debugLog2(message, details) {
 }
 console.log(`${DEBUG_PREFIX2} ${message}`, details);
 }
+ function isRecord2(value) {
+ return Boolean(value && typeof value === "object");
+ }
+ function readRealtimeEventType(event) {
+ if (!isRecord2(event) || typeof event.type !== "string") {
+ return "";
+ }
+ return event.type.trim().toLowerCase();
+ }
+ function readAssistantTextFromResponseOutput(items) {
+ const parts = [];
+ for (const item of items) {
+ if (!isRecord2(item) || item.type !== "message" || item.role !== "assistant") {
+ continue;
+ }
+ const content = Array.isArray(item.content) ? item.content : [];
+ for (const chunk of content) {
+ if (!isRecord2(chunk)) {
+ continue;
+ }
+ const text = chunk.type === "output_text" ? typeof chunk.text === "string" ? chunk.text : "" : chunk.type === "output_audio" ? typeof chunk.transcript === "string" ? chunk.transcript : "" : "";
+ const normalized = text.trim();
+ if (normalized) {
+ parts.push(normalized);
+ }
+ }
+ }
+ return parts.join("\n").trim();
+ }
+ function extractAssistantTextFromRealtimeEvent(event) {
+ if (!isRecord2(event)) {
+ return null;
+ }
+ const eventType = readRealtimeEventType(event);
+ if (eventType === "response.output_text.done" || eventType === "response.text.done" || eventType === "response.audio_transcript.done") {
+ const text = typeof event.text === "string" ? event.text.trim() : typeof event.transcript === "string" ? event.transcript.trim() : "";
+ if (!text) {
+ return null;
+ }
+ const key = [
+ eventType,
+ typeof event.response_id === "string" ? event.response_id : "",
+ typeof event.item_id === "string" ? event.item_id : ""
+ ].filter(Boolean).join(":");
+ return { key: key || `${eventType}:${text}`, text };
+ }
+ if (eventType === "response.done" && isRecord2(event.response) && Array.isArray(event.response.output)) {
+ const text = readAssistantTextFromResponseOutput(event.response.output);
+ if (!text) {
+ return null;
+ }
+ const responseId = typeof event.response.id === "string" ? event.response.id : "";
+ return { key: responseId ? `response.done:${responseId}` : `response.done:${text}`, text };
+ }
+ return null;
+ }
+ function audioUrlFromSynthesis(result) {
+ const binary = atob(result.audioBase64);
+ const bytes = new Uint8Array(binary.length);
+ for (let index = 0; index < binary.length; index += 1) {
+ bytes[index] = binary.charCodeAt(index);
+ }
+ return URL.createObjectURL(new Blob([bytes], { type: result.mimeType }));
+ }
 function useWebVoiceAgent(options) {
 const sessionRef = (0, import_react.useRef)(null);
 const attachedRealtimeSessionRef = (0, import_react.useRef)(null);
+ const speechProviderRef = (0, import_react.useRef)("openai");
+ const spokenAssistantKeysRef = (0, import_react.useRef)(/* @__PURE__ */ new Set());
+ const playbackGenerationRef = (0, import_react.useRef)(0);
+ const activePlaybackRef = (0, import_react.useRef)(null);
 const runtimeConfigPromise = (0, import_react.useMemo)(
 () => resolveNavaiFrontendRuntimeConfig({
 moduleLoaders: options.moduleLoaders,
@@ -1661,6 +1758,69 @@ function useWebVoiceAgent(options) {
 const setAgentVoiceStateIfChanged = (0, import_react.useCallback)((next) => {
 setAgentVoiceState((current) => current === next ? current : next);
 }, []);
+ const clearPlayback = (0, import_react.useCallback)(
+ (options2) => {
+ if (options2?.invalidate) {
+ playbackGenerationRef.current += 1;
+ }
+ const active = activePlaybackRef.current;
+ if (active) {
+ try {
+ active.audio.pause();
+ active.audio.currentTime = 0;
+ } catch {
+ }
+ URL.revokeObjectURL(active.url);
+ }
+ activePlaybackRef.current = null;
+ if (options2?.resetState !== false) {
+ setAgentVoiceStateIfChanged("idle");
+ }
+ },
+ [setAgentVoiceStateIfChanged]
+ );
+ const playAssistantSpeech = (0, import_react.useCallback)(
+ async (text) => {
+ if (speechProviderRef.current !== "elevenlabs") {
+ return;
+ }
+ const normalized = text.trim();
+ if (!normalized) {
+ return;
+ }
+ clearPlayback({ resetState: false });
+ const generation = playbackGenerationRef.current + 1;
+ playbackGenerationRef.current = generation;
+ setAgentVoiceStateIfChanged("speaking");
+ try {
+ const synthesized = await backendClient.synthesizeSpeech({ text: normalized });
+ if (speechProviderRef.current !== "elevenlabs" || playbackGenerationRef.current !== generation) {
+ return;
+ }
+ const audio = new Audio();
+ const url = audioUrlFromSynthesis(synthesized);
+ audio.src = url;
+ audio.autoplay = false;
+ activePlaybackRef.current = { audio, url };
+ const finish = () => {
+ if (activePlaybackRef.current?.audio === audio) {
+ clearPlayback({ resetState: true });
+ } else {
+ URL.revokeObjectURL(url);
+ }
+ };
+ audio.addEventListener("ended", finish, { once: true });
+ audio.addEventListener("error", finish, { once: true });
+ await audio.play();
+ } catch (playbackError) {
+ debugLog2("assistant speech playback failed", playbackError);
+ if (playbackGenerationRef.current === generation) {
+ clearPlayback({ resetState: true });
+ }
+ }
+ },
+ [backendClient, clearPlayback, setAgentVoiceStateIfChanged]
+ );
 const handleSessionAudioStart = (0, import_react.useCallback)(() => {
 setAgentVoiceStateIfChanged("speaking");
 }, [setAgentVoiceStateIfChanged]);
@@ -1668,11 +1828,35 @@ function useWebVoiceAgent(options) {
 setAgentVoiceStateIfChanged("idle");
 }, [setAgentVoiceStateIfChanged]);
 const handleSessionAudioInterrupted = (0, import_react.useCallback)(() => {
+ clearPlayback({ invalidate: true, resetState: true });
 setAgentVoiceStateIfChanged("idle");
- }, [setAgentVoiceStateIfChanged]);
+ }, [clearPlayback, setAgentVoiceStateIfChanged]);
 const handleSessionError = (0, import_react.useCallback)(() => {
+ clearPlayback({ invalidate: true, resetState: true });
 setAgentVoiceStateIfChanged("idle");
- }, [setAgentVoiceStateIfChanged]);
+ }, [clearPlayback, setAgentVoiceStateIfChanged]);
+ const handleTransportEvent = (0, import_react.useCallback)(
+ (event) => {
+ const eventType = readRealtimeEventType(event);
+ if (!eventType) {
+ return;
+ }
+ if (eventType === "input_audio_buffer.speech_started" || eventType === "conversation.item.input_audio_transcription.started") {
+ clearPlayback({ invalidate: true, resetState: true });
+ return;
+ }
+ if (speechProviderRef.current !== "elevenlabs") {
+ return;
+ }
+ const assistantText = extractAssistantTextFromRealtimeEvent(event);
+ if (!assistantText || spokenAssistantKeysRef.current.has(assistantText.key)) {
+ return;
+ }
+ spokenAssistantKeysRef.current.add(assistantText.key);
+ void playAssistantSpeech(assistantText.text);
+ },
+ [clearPlayback, playAssistantSpeech]
+ );
 const detachSessionAudioListeners = (0, import_react.useCallback)(() => {
 const attachedSession = attachedRealtimeSessionRef.current;
 if (!attachedSession) {
@@ -1681,9 +1865,16 @@ function useWebVoiceAgent(options) {
 attachedSession.off("audio_start", handleSessionAudioStart);
 attachedSession.off("audio_stopped", handleSessionAudioStopped);
 attachedSession.off("audio_interrupted", handleSessionAudioInterrupted);
+ attachedSession.off("transport_event", handleTransportEvent);
 attachedSession.off("error", handleSessionError);
 attachedRealtimeSessionRef.current = null;
- }, [handleSessionAudioInterrupted, handleSessionAudioStart, handleSessionAudioStopped, handleSessionError]);
+ }, [
+ handleSessionAudioInterrupted,
+ handleSessionAudioStart,
+ handleSessionAudioStopped,
+ handleSessionError,
+ handleTransportEvent
+ ]);
 const attachSessionAudioListeners = (0, import_react.useCallback)(
 (session) => {
 detachSessionAudioListeners();
@@ -1723,6 +1914,7 @@ function useWebVoiceAgent(options) {
 session.on("history_added", (item) => {
 debugLog2("session history_added", item);
 });
+ session.on("transport_event", handleTransportEvent);
 session.on("error", (sessionError) => {
 debugLog2("session error", sessionError);
 });
@@ -1737,19 +1929,23 @@ function useWebVoiceAgent(options) {
 handleSessionAudioInterrupted,
 handleSessionAudioStart,
 handleSessionAudioStopped,
- handleSessionError
+ handleSessionError,
+ handleTransportEvent
 ]
 );
 const stop = (0, import_react.useCallback)(() => {
 detachSessionAudioListeners();
+ clearPlayback({ invalidate: true, resetState: true });
 try {
 sessionRef.current?.close();
 } finally {
 sessionRef.current = null;
+ spokenAssistantKeysRef.current.clear();
+ speechProviderRef.current = "openai";
 setStatus("idle");
 setAgentVoiceStateIfChanged("idle");
 }
- }, [detachSessionAudioListeners, setAgentVoiceStateIfChanged]);
+ }, [clearPlayback, detachSessionAudioListeners, setAgentVoiceStateIfChanged]);
 (0, import_react.useEffect)(() => {
 return () => {
 stop();
@@ -1777,6 +1973,9 @@ function useWebVoiceAgent(options) {
 });
 const requestPayload = runtimeConfig.modelOverride ? { model: runtimeConfig.modelOverride } : {};
 const secretPayload = await backendClient.createClientSecret(requestPayload);
+ speechProviderRef.current = secretPayload.speech.provider;
+ spokenAssistantKeysRef.current.clear();
+ clearPlayback({ invalidate: true, resetState: true });
 const backendFunctionsResult = await backendClient.listFunctions();
 const { agent, warnings } = await buildNavaiAgent({
 navigate: options.navigate,
@@ -1788,7 +1987,11 @@ function useWebVoiceAgent(options) {
 executeBackendFunction: backendClient.executeFunction
 });
 emitWarnings([...runtimeConfig.warnings, ...backendFunctionsResult.warnings, ...warnings]);
- const session = new import_realtime2.RealtimeSession(agent);
+ const session = secretPayload.speech.provider === "elevenlabs" ? new import_realtime2.RealtimeSession(agent, {
+ config: {
+ outputModalities: ["text"]
+ }
+ }) : new import_realtime2.RealtimeSession(agent);
 attachSessionAudioListeners(session);
 if (runtimeConfig.modelOverride) {
 await session.connect({ apiKey: secretPayload.value, model: runtimeConfig.modelOverride });
@@ -1804,6 +2007,9 @@ function useWebVoiceAgent(options) {
 setStatus("error");
 setAgentVoiceStateIfChanged("idle");
 detachSessionAudioListeners();
+ clearPlayback({ invalidate: true, resetState: true });
+ spokenAssistantKeysRef.current.clear();
+ speechProviderRef.current = "openai";
 try {
 sessionRef.current?.close();
 } catch {
@@ -1813,6 +2019,7 @@ function useWebVoiceAgent(options) {
 }, [
 attachSessionAudioListeners,
 backendClient,
+ clearPlayback,
 detachSessionAudioListeners,
 options.navigate,
 runtimeConfigPromise,
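The `readSpeechConfig` helper added in this file is a pure function, so its fallback behavior is easy to check in isolation. A standalone TypeScript copy of the same logic shown in the hunk above:

```typescript
// Standalone copy of the readSpeechConfig helper from the diff above:
// any payload that does not explicitly declare elevenlabs falls back to openai.
function isRecord(value: unknown): value is Record<string, unknown> {
  return Boolean(value && typeof value === "object");
}

function readSpeechConfig(payload: unknown): { provider: "openai" | "elevenlabs" } {
  if (isRecord(payload) && isRecord(payload.speech) && payload.speech.provider === "elevenlabs") {
    return { provider: "elevenlabs" };
  }
  return { provider: "openai" };
}
```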
package/dist/index.d.cts CHANGED
@@ -104,9 +104,32 @@ type CreateClientSecretInput = {
 voiceTone?: string;
 apiKey?: string;
 };
+ type NavaiSpeechProvider = "openai" | "elevenlabs";
+ type NavaiBackendSpeechConfig = {
+ provider: NavaiSpeechProvider;
+ };
 type CreateClientSecretOutput = {
 value: string;
 expires_at?: number;
+ speech: NavaiBackendSpeechConfig;
+ };
+ type SynthesizeSpeechInput = {
+ text: string;
+ voiceId?: string;
+ modelId?: string;
+ outputFormat?: string;
+ optimizeStreamingLatency?: number;
+ voiceSettings?: {
+ stability?: number;
+ similarityBoost?: number;
+ style?: number;
+ useSpeakerBoost?: boolean;
+ };
+ };
+ type SynthesizeSpeechOutput = {
+ provider: "elevenlabs";
+ mimeType: string;
+ audioBase64: string;
 };
 type BackendFunctionsResult = {
 functions: NavaiBackendFunctionDefinition[];
@@ -119,9 +142,11 @@ type CreateNavaiBackendClientOptions = {
 clientSecretPath?: string;
 functionsListPath?: string;
 functionsExecutePath?: string;
+ speechSynthesizePath?: string;
 };
 type NavaiBackendClient = {
 createClientSecret: (input?: CreateClientSecretInput) => Promise<CreateClientSecretOutput>;
+ synthesizeSpeech: (input: SynthesizeSpeechInput) => Promise<SynthesizeSpeechOutput>;
 listFunctions: () => Promise<BackendFunctionsResult>;
 executeFunction: ExecuteNavaiBackendFunction;
 };
@@ -253,4 +278,4 @@ type NavaiVoiceOrbDockMicIconProps = {
 };
 declare function NavaiVoiceOrbDockMicIcon({ isActive, size }: NavaiVoiceOrbDockMicIconProps): react_jsx_runtime.JSX.Element;
 
- export { type BuildNavaiAgentOptions, type BuildNavaiAgentResult, type CreateNavaiBackendClientOptions, type ExecuteNavaiBackendFunction, type ExecuteNavaiBackendFunctionInput, type NavaiAgentModuleConfig, type NavaiBackendClient, type NavaiBackendFunctionDefinition, type NavaiFunctionContext, type NavaiFunctionDefinition, type NavaiFunctionModuleLoaders, type NavaiFunctionPayload, type NavaiFunctionsRegistry, NavaiHeroOrb, type NavaiHeroOrbProps, NavaiMiniOrbDock, type NavaiMiniOrbDockProps, type NavaiRoute, type NavaiRuntimeAgentConfig, NavaiVoiceHeroOrb, type NavaiVoiceHeroOrbProps, type NavaiVoiceOrbBaseProps, NavaiVoiceOrbDock, NavaiVoiceOrbDockMicIcon, type NavaiVoiceOrbDockProps, type NavaiVoiceOrbMessages, type NavaiVoiceOrbPlacement, type NavaiVoiceOrbRuntimeSnapshot, type NavaiVoiceOrbThemeMode, type NavaiWebVoiceAgentLike, Orb, type OrbProps, type ResolveNavaiFrontendRuntimeConfigOptions, type ResolveNavaiFrontendRuntimeConfigResult, type UseWebVoiceAgentOptions, type UseWebVoiceAgentResult, buildNavaiAgent, clampNavaiOrbDelayMs, createNavaiBackendClient, getNavaiRoutePromptLines, loadNavaiFunctions, resolveNavaiFrontendRuntimeConfig, resolveNavaiRoute, resolveNavaiVoiceOrbRuntimeSnapshot, useWebVoiceAgent };
+ export { type BuildNavaiAgentOptions, type BuildNavaiAgentResult, type CreateNavaiBackendClientOptions, type ExecuteNavaiBackendFunction, type ExecuteNavaiBackendFunctionInput, type NavaiAgentModuleConfig, type NavaiBackendClient, type NavaiBackendFunctionDefinition, type NavaiBackendSpeechConfig, type NavaiFunctionContext, type NavaiFunctionDefinition, type NavaiFunctionModuleLoaders, type NavaiFunctionPayload, type NavaiFunctionsRegistry, NavaiHeroOrb, type NavaiHeroOrbProps, NavaiMiniOrbDock, type NavaiMiniOrbDockProps, type NavaiRoute, type NavaiRuntimeAgentConfig, NavaiVoiceHeroOrb, type NavaiVoiceHeroOrbProps, type NavaiVoiceOrbBaseProps, NavaiVoiceOrbDock, NavaiVoiceOrbDockMicIcon, type NavaiVoiceOrbDockProps, type NavaiVoiceOrbMessages, type NavaiVoiceOrbPlacement, type NavaiVoiceOrbRuntimeSnapshot, type NavaiVoiceOrbThemeMode, type NavaiWebVoiceAgentLike, Orb, type OrbProps, type ResolveNavaiFrontendRuntimeConfigOptions, type ResolveNavaiFrontendRuntimeConfigResult, type UseWebVoiceAgentOptions, type UseWebVoiceAgentResult, buildNavaiAgent, clampNavaiOrbDelayMs, createNavaiBackendClient, getNavaiRoutePromptLines, loadNavaiFunctions, resolveNavaiFrontendRuntimeConfig, resolveNavaiRoute, resolveNavaiVoiceOrbRuntimeSnapshot, useWebVoiceAgent };
package/dist/index.d.ts CHANGED
@@ -104,9 +104,32 @@ type CreateClientSecretInput = {
 voiceTone?: string;
 apiKey?: string;
 };
+ type NavaiSpeechProvider = "openai" | "elevenlabs";
+ type NavaiBackendSpeechConfig = {
+ provider: NavaiSpeechProvider;
+ };
 type CreateClientSecretOutput = {
 value: string;
 expires_at?: number;
+ speech: NavaiBackendSpeechConfig;
+ };
+ type SynthesizeSpeechInput = {
+ text: string;
+ voiceId?: string;
+ modelId?: string;
+ outputFormat?: string;
+ optimizeStreamingLatency?: number;
+ voiceSettings?: {
+ stability?: number;
+ similarityBoost?: number;
+ style?: number;
+ useSpeakerBoost?: boolean;
+ };
+ };
+ type SynthesizeSpeechOutput = {
+ provider: "elevenlabs";
+ mimeType: string;
+ audioBase64: string;
 };
 type BackendFunctionsResult = {
 functions: NavaiBackendFunctionDefinition[];
@@ -119,9 +142,11 @@ type CreateNavaiBackendClientOptions = {
 clientSecretPath?: string;
 functionsListPath?: string;
 functionsExecutePath?: string;
+ speechSynthesizePath?: string;
 };
 type NavaiBackendClient = {
 createClientSecret: (input?: CreateClientSecretInput) => Promise<CreateClientSecretOutput>;
+ synthesizeSpeech: (input: SynthesizeSpeechInput) => Promise<SynthesizeSpeechOutput>;
 listFunctions: () => Promise<BackendFunctionsResult>;
 executeFunction: ExecuteNavaiBackendFunction;
 };
@@ -253,4 +278,4 @@ type NavaiVoiceOrbDockMicIconProps = {
 };
 declare function NavaiVoiceOrbDockMicIcon({ isActive, size }: NavaiVoiceOrbDockMicIconProps): react_jsx_runtime.JSX.Element;
 
- export { type BuildNavaiAgentOptions, type BuildNavaiAgentResult, type CreateNavaiBackendClientOptions, type ExecuteNavaiBackendFunction, type ExecuteNavaiBackendFunctionInput, type NavaiAgentModuleConfig, type NavaiBackendClient, type NavaiBackendFunctionDefinition, type NavaiFunctionContext, type NavaiFunctionDefinition, type NavaiFunctionModuleLoaders, type NavaiFunctionPayload, type NavaiFunctionsRegistry, NavaiHeroOrb, type NavaiHeroOrbProps, NavaiMiniOrbDock, type NavaiMiniOrbDockProps, type NavaiRoute, type NavaiRuntimeAgentConfig, NavaiVoiceHeroOrb, type NavaiVoiceHeroOrbProps, type NavaiVoiceOrbBaseProps, NavaiVoiceOrbDock, NavaiVoiceOrbDockMicIcon, type NavaiVoiceOrbDockProps, type NavaiVoiceOrbMessages, type NavaiVoiceOrbPlacement, type NavaiVoiceOrbRuntimeSnapshot, type NavaiVoiceOrbThemeMode, type NavaiWebVoiceAgentLike, Orb, type OrbProps, type ResolveNavaiFrontendRuntimeConfigOptions, type ResolveNavaiFrontendRuntimeConfigResult, type UseWebVoiceAgentOptions, type UseWebVoiceAgentResult, buildNavaiAgent, clampNavaiOrbDelayMs, createNavaiBackendClient, getNavaiRoutePromptLines, loadNavaiFunctions, resolveNavaiFrontendRuntimeConfig, resolveNavaiRoute, resolveNavaiVoiceOrbRuntimeSnapshot, useWebVoiceAgent };
+ export { type BuildNavaiAgentOptions, type BuildNavaiAgentResult, type CreateNavaiBackendClientOptions, type ExecuteNavaiBackendFunction, type ExecuteNavaiBackendFunctionInput, type NavaiAgentModuleConfig, type NavaiBackendClient, type NavaiBackendFunctionDefinition, type NavaiBackendSpeechConfig, type NavaiFunctionContext, type NavaiFunctionDefinition, type NavaiFunctionModuleLoaders, type NavaiFunctionPayload, type NavaiFunctionsRegistry, NavaiHeroOrb, type NavaiHeroOrbProps, NavaiMiniOrbDock, type NavaiMiniOrbDockProps, type NavaiRoute, type NavaiRuntimeAgentConfig, NavaiVoiceHeroOrb, type NavaiVoiceHeroOrbProps, type NavaiVoiceOrbBaseProps, NavaiVoiceOrbDock, NavaiVoiceOrbDockMicIcon, type NavaiVoiceOrbDockProps, type NavaiVoiceOrbMessages, type NavaiVoiceOrbPlacement, type NavaiVoiceOrbRuntimeSnapshot, type NavaiVoiceOrbThemeMode, type NavaiWebVoiceAgentLike, Orb, type OrbProps, type ResolveNavaiFrontendRuntimeConfigOptions, type ResolveNavaiFrontendRuntimeConfigResult, type UseWebVoiceAgentOptions, type UseWebVoiceAgentResult, buildNavaiAgent, clampNavaiOrbDelayMs, createNavaiBackendClient, getNavaiRoutePromptLines, loadNavaiFunctions, resolveNavaiFrontendRuntimeConfig, resolveNavaiRoute, resolveNavaiVoiceOrbRuntimeSnapshot, useWebVoiceAgent };
package/dist/index.js CHANGED
@@ -577,6 +577,7 @@ var DEFAULT_API_BASE_URL = "http://localhost:3000";
  var DEFAULT_CLIENT_SECRET_PATH = "/navai/realtime/client-secret";
  var DEFAULT_FUNCTIONS_LIST_PATH = "/navai/functions";
  var DEFAULT_FUNCTIONS_EXECUTE_PATH = "/navai/functions/execute";
+ var DEFAULT_SPEECH_SYNTHESIZE_PATH = "/navai/speech/synthesize";
  function readOptional(value) {
  const trimmed = value?.trim();
  return trimmed ? trimmed : void 0;
@@ -589,6 +590,12 @@ function joinUrl(baseUrl, path) {
  function isRecord(value) {
  return Boolean(value && typeof value === "object");
  }
+ function readSpeechConfig(payload) {
+ if (isRecord(payload) && isRecord(payload.speech) && payload.speech.provider === "elevenlabs") {
+ return { provider: "elevenlabs" };
+ }
+ return { provider: "openai" };
+ }
  async function readTextSafe(response) {
  try {
  return await response.text();
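The `readSpeechConfig` guard added above can be exercised in isolation. This is a minimal sketch mirroring the shipped helper: any payload that does not explicitly declare `speech.provider === "elevenlabs"` resolves to the OpenAI provider.

```javascript
// Sketch mirroring readSpeechConfig from dist/index.js in this release.
function isRecord(value) {
  return Boolean(value && typeof value === "object");
}

function readSpeechConfig(payload) {
  // Only an exact "elevenlabs" marker switches providers; anything else
  // (missing field, unknown provider, non-object payload) stays on OpenAI.
  if (isRecord(payload) && isRecord(payload.speech) && payload.speech.provider === "elevenlabs") {
    return { provider: "elevenlabs" };
  }
  return { provider: "openai" };
}

console.log(readSpeechConfig({ speech: { provider: "elevenlabs" } }).provider); // "elevenlabs"
console.log(readSpeechConfig({ speech: { provider: "custom" } }).provider);     // "openai"
console.log(readSpeechConfig(null).provider);                                   // "openai"
```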
@@ -609,6 +616,7 @@ function createNavaiBackendClient(options = {}) {
  const clientSecretUrl = joinUrl(apiBaseUrl, options.clientSecretPath ?? DEFAULT_CLIENT_SECRET_PATH);
  const functionsListUrl = joinUrl(apiBaseUrl, options.functionsListPath ?? DEFAULT_FUNCTIONS_LIST_PATH);
  const functionsExecuteUrl = joinUrl(apiBaseUrl, options.functionsExecutePath ?? DEFAULT_FUNCTIONS_EXECUTE_PATH);
+ const speechSynthesizeUrl = joinUrl(apiBaseUrl, options.speechSynthesizePath ?? DEFAULT_SPEECH_SYNTHESIZE_PATH);
  async function createClientSecret(input = {}) {
  const response = await fetchImpl(clientSecretUrl, {
  method: "POST",
@@ -624,7 +632,27 @@ function createNavaiBackendClient(options = {}) {
  }
  return {
  value: payload.value,
- expires_at: typeof payload.expires_at === "number" ? payload.expires_at : void 0
+ expires_at: typeof payload.expires_at === "number" ? payload.expires_at : void 0,
+ speech: readSpeechConfig(payload)
+ };
+ }
+ async function synthesizeSpeech(input) {
+ const response = await fetchImpl(speechSynthesizeUrl, {
+ method: "POST",
+ headers: { "Content-Type": "application/json" },
+ body: JSON.stringify(input)
+ });
+ if (!response.ok) {
+ throw new Error(await readTextSafe(response));
+ }
+ const payload = await readJsonSafe(response);
+ if (!isRecord(payload) || payload.provider !== "elevenlabs" || typeof payload.mimeType !== "string" || typeof payload.audioBase64 !== "string") {
+ throw new Error("Invalid speech synthesis response.");
+ }
+ return {
+ provider: "elevenlabs",
+ mimeType: payload.mimeType,
+ audioBase64: payload.audioBase64
  };
  }
  async function listFunctions() {
@@ -684,6 +712,7 @@ function createNavaiBackendClient(options = {}) {
  };
  return {
  createClientSecret,
+ synthesizeSpeech,
  listFunctions,
  executeFunction
  };
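The new `synthesizeSpeech` client method follows the same fetch-then-validate pattern as `createClientSecret`. Below is a self-contained sketch of that flow against a stubbed `fetch`; the route and response shape come from this diff, while the stub, sample payload, and error messages are illustrative.

```javascript
// Stubbed fetch standing in for the backend's POST /navai/speech/synthesize.
const stubFetch = async (_url, _init) => ({
  ok: true,
  json: async () => ({
    provider: "elevenlabs",
    mimeType: "audio/mpeg",
    audioBase64: "Zm9v" // "foo" in base64, standing in for real audio bytes
  })
});

async function synthesizeSpeech(fetchImpl, url, input) {
  const response = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(input)
  });
  if (!response.ok) {
    throw new Error("speech synthesis request failed");
  }
  const payload = await response.json();
  // Reject anything that is not a well-formed ElevenLabs synthesis result.
  if (payload.provider !== "elevenlabs" || typeof payload.mimeType !== "string" || typeof payload.audioBase64 !== "string") {
    throw new Error("Invalid speech synthesis response.");
  }
  return payload;
}

synthesizeSpeech(stubFetch, "http://localhost:3000/navai/speech/synthesize", { text: "hello" })
  .then((result) => console.log(result.mimeType)); // "audio/mpeg"
```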
@@ -1045,9 +1074,77 @@ function debugLog2(message, details) {
  }
  console.log(`${DEBUG_PREFIX2} ${message}`, details);
  }
+ function isRecord2(value) {
+ return Boolean(value && typeof value === "object");
+ }
+ function readRealtimeEventType(event) {
+ if (!isRecord2(event) || typeof event.type !== "string") {
+ return "";
+ }
+ return event.type.trim().toLowerCase();
+ }
+ function readAssistantTextFromResponseOutput(items) {
+ const parts = [];
+ for (const item of items) {
+ if (!isRecord2(item) || item.type !== "message" || item.role !== "assistant") {
+ continue;
+ }
+ const content = Array.isArray(item.content) ? item.content : [];
+ for (const chunk of content) {
+ if (!isRecord2(chunk)) {
+ continue;
+ }
+ const text = chunk.type === "output_text" ? typeof chunk.text === "string" ? chunk.text : "" : chunk.type === "output_audio" ? typeof chunk.transcript === "string" ? chunk.transcript : "" : "";
+ const normalized = text.trim();
+ if (normalized) {
+ parts.push(normalized);
+ }
+ }
+ }
+ return parts.join("\n").trim();
+ }
+ function extractAssistantTextFromRealtimeEvent(event) {
+ if (!isRecord2(event)) {
+ return null;
+ }
+ const eventType = readRealtimeEventType(event);
+ if (eventType === "response.output_text.done" || eventType === "response.text.done" || eventType === "response.audio_transcript.done") {
+ const text = typeof event.text === "string" ? event.text.trim() : typeof event.transcript === "string" ? event.transcript.trim() : "";
+ if (!text) {
+ return null;
+ }
+ const key = [
+ eventType,
+ typeof event.response_id === "string" ? event.response_id : "",
+ typeof event.item_id === "string" ? event.item_id : ""
+ ].filter(Boolean).join(":");
+ return { key: key || `${eventType}:${text}`, text };
+ }
+ if (eventType === "response.done" && isRecord2(event.response) && Array.isArray(event.response.output)) {
+ const text = readAssistantTextFromResponseOutput(event.response.output);
+ if (!text) {
+ return null;
+ }
+ const responseId = typeof event.response.id === "string" ? event.response.id : "";
+ return { key: responseId ? `response.done:${responseId}` : `response.done:${text}`, text };
+ }
+ return null;
+ }
+ function audioUrlFromSynthesis(result) {
+ const binary = atob(result.audioBase64);
+ const bytes = new Uint8Array(binary.length);
+ for (let index = 0; index < binary.length; index += 1) {
+ bytes[index] = binary.charCodeAt(index);
+ }
+ return URL.createObjectURL(new Blob([bytes], { type: result.mimeType }));
+ }
  function useWebVoiceAgent(options) {
  const sessionRef = useRef(null);
  const attachedRealtimeSessionRef = useRef(null);
+ const speechProviderRef = useRef("openai");
+ const spokenAssistantKeysRef = useRef(/* @__PURE__ */ new Set());
+ const playbackGenerationRef = useRef(0);
+ const activePlaybackRef = useRef(null);
  const runtimeConfigPromise = useMemo(
  () => resolveNavaiFrontendRuntimeConfig({
  moduleLoaders: options.moduleLoaders,
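The dedupe key built by `extractAssistantTextFromRealtimeEvent` is what keeps each assistant utterance from being synthesized twice. A condensed sketch of the `*.done` branch above (the `response.done` branch is omitted for brevity):

```javascript
// Condensed sketch of the "*.done" branch of extractAssistantTextFromRealtimeEvent.
function extractAssistantText(event) {
  const type = typeof event?.type === "string" ? event.type.trim().toLowerCase() : "";
  const doneTypes = ["response.output_text.done", "response.text.done", "response.audio_transcript.done"];
  if (!doneTypes.includes(type)) {
    return null;
  }
  // Prefer event.text; fall back to event.transcript for audio-transcript events.
  const text = typeof event.text === "string" ? event.text.trim()
    : typeof event.transcript === "string" ? event.transcript.trim() : "";
  if (!text) {
    return null;
  }
  // The key combines event type, response id, and item id, so the same
  // utterance reported twice maps to the same key and is spoken only once.
  const key = [
    type,
    typeof event.response_id === "string" ? event.response_id : "",
    typeof event.item_id === "string" ? event.item_id : ""
  ].filter(Boolean).join(":");
  return { key: key || `${type}:${text}`, text };
}

const hit = extractAssistantText({
  type: "response.output_text.done",
  text: " Hello there. ",
  response_id: "resp_1",
  item_id: "item_1"
});
console.log(hit.key);  // "response.output_text.done:resp_1:item_1"
console.log(hit.text); // "Hello there."
```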
@@ -1085,6 +1182,69 @@ function useWebVoiceAgent(options) {
  const setAgentVoiceStateIfChanged = useCallback((next) => {
  setAgentVoiceState((current) => current === next ? current : next);
  }, []);
+ const clearPlayback = useCallback(
+ (options2) => {
+ if (options2?.invalidate) {
+ playbackGenerationRef.current += 1;
+ }
+ const active = activePlaybackRef.current;
+ if (active) {
+ try {
+ active.audio.pause();
+ active.audio.currentTime = 0;
+ } catch {
+ }
+ URL.revokeObjectURL(active.url);
+ }
+ activePlaybackRef.current = null;
+ if (options2?.resetState !== false) {
+ setAgentVoiceStateIfChanged("idle");
+ }
+ },
+ [setAgentVoiceStateIfChanged]
+ );
+ const playAssistantSpeech = useCallback(
+ async (text) => {
+ if (speechProviderRef.current !== "elevenlabs") {
+ return;
+ }
+ const normalized = text.trim();
+ if (!normalized) {
+ return;
+ }
+ clearPlayback({ resetState: false });
+ const generation = playbackGenerationRef.current + 1;
+ playbackGenerationRef.current = generation;
+ setAgentVoiceStateIfChanged("speaking");
+ try {
+ const synthesized = await backendClient.synthesizeSpeech({ text: normalized });
+ if (speechProviderRef.current !== "elevenlabs" || playbackGenerationRef.current !== generation) {
+ return;
+ }
+ const audio = new Audio();
+ const url = audioUrlFromSynthesis(synthesized);
+ audio.src = url;
+ audio.autoplay = false;
+ activePlaybackRef.current = { audio, url };
+ const finish = () => {
+ if (activePlaybackRef.current?.audio === audio) {
+ clearPlayback({ resetState: true });
+ } else {
+ URL.revokeObjectURL(url);
+ }
+ };
+ audio.addEventListener("ended", finish, { once: true });
+ audio.addEventListener("error", finish, { once: true });
+ await audio.play();
+ } catch (playbackError) {
+ debugLog2("assistant speech playback failed", playbackError);
+ if (playbackGenerationRef.current === generation) {
+ clearPlayback({ resetState: true });
+ }
+ }
+ },
+ [backendClient, clearPlayback, setAgentVoiceStateIfChanged]
+ );
  const handleSessionAudioStart = useCallback(() => {
  setAgentVoiceStateIfChanged("speaking");
  }, [setAgentVoiceStateIfChanged]);
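`playAssistantSpeech` relies on a generation counter to discard stale synthesis results: each new request bumps the counter, and a request that observes a newer generation after awaiting drops its audio instead of playing it. A synchronous distillation of that pattern (the helper names are illustrative, not part of the package):

```javascript
// Distilled sketch of the playback-generation pattern in playAssistantSpeech.
let generation = 0;
const played = [];

function beginRequest() {
  generation += 1;
  return generation; // snapshot taken when the request starts
}

function completeRequest(myGeneration, label) {
  // A stale request sees a newer global generation and drops its result.
  if (generation !== myGeneration) {
    return false;
  }
  played.push(label);
  return true;
}

const first = beginRequest();  // generation 1
const second = beginRequest(); // generation 2 supersedes 1
completeRequest(first, "first");   // dropped: superseded before completion
completeRequest(second, "second"); // played: still the latest request
console.log(played); // [ 'second' ]
```

The same check also guards against provider changes mid-flight: the shipped code re-reads `speechProviderRef` after the `await` before touching any audio.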
@@ -1092,11 +1252,35 @@ function useWebVoiceAgent(options) {
  setAgentVoiceStateIfChanged("idle");
  }, [setAgentVoiceStateIfChanged]);
  const handleSessionAudioInterrupted = useCallback(() => {
+ clearPlayback({ invalidate: true, resetState: true });
  setAgentVoiceStateIfChanged("idle");
- }, [setAgentVoiceStateIfChanged]);
+ }, [clearPlayback, setAgentVoiceStateIfChanged]);
  const handleSessionError = useCallback(() => {
+ clearPlayback({ invalidate: true, resetState: true });
  setAgentVoiceStateIfChanged("idle");
- }, [setAgentVoiceStateIfChanged]);
+ }, [clearPlayback, setAgentVoiceStateIfChanged]);
+ const handleTransportEvent = useCallback(
+ (event) => {
+ const eventType = readRealtimeEventType(event);
+ if (!eventType) {
+ return;
+ }
+ if (eventType === "input_audio_buffer.speech_started" || eventType === "conversation.item.input_audio_transcription.started") {
+ clearPlayback({ invalidate: true, resetState: true });
+ return;
+ }
+ if (speechProviderRef.current !== "elevenlabs") {
+ return;
+ }
+ const assistantText = extractAssistantTextFromRealtimeEvent(event);
+ if (!assistantText || spokenAssistantKeysRef.current.has(assistantText.key)) {
+ return;
+ }
+ spokenAssistantKeysRef.current.add(assistantText.key);
+ void playAssistantSpeech(assistantText.text);
+ },
+ [clearPlayback, playAssistantSpeech]
+ );
  const detachSessionAudioListeners = useCallback(() => {
  const attachedSession = attachedRealtimeSessionRef.current;
  if (!attachedSession) {
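The `transport_event` handler above routes two concerns: user-speech events cancel any local ElevenLabs playback (barge-in), and assistant completion events trigger synthesis at most once per dedupe key. A simplified sketch with stand-in handlers that just record which path was taken:

```javascript
// Simplified routing sketch of handleTransportEvent; calls[] records the path.
const calls = [];
const bargeInEvents = new Set([
  "input_audio_buffer.speech_started",
  "conversation.item.input_audio_transcription.started"
]);

function handleTransportEvent(event, provider) {
  const type = typeof event?.type === "string" ? event.type.trim().toLowerCase() : "";
  if (!type) {
    return;
  }
  if (bargeInEvents.has(type)) {
    calls.push("clearPlayback"); // barge-in: stop local audio immediately
    return;
  }
  if (provider !== "elevenlabs") {
    return; // OpenAI provider plays audio natively; nothing to do here
  }
  if (type === "response.output_text.done" && typeof event.text === "string") {
    calls.push("speak"); // would hand the text to playAssistantSpeech
  }
}

handleTransportEvent({ type: "input_audio_buffer.speech_started" }, "elevenlabs");
handleTransportEvent({ type: "response.output_text.done", text: "hi" }, "elevenlabs");
handleTransportEvent({ type: "response.output_text.done", text: "hi" }, "openai");
console.log(calls); // [ 'clearPlayback', 'speak' ]
```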
@@ -1105,9 +1289,16 @@ function useWebVoiceAgent(options) {
  attachedSession.off("audio_start", handleSessionAudioStart);
  attachedSession.off("audio_stopped", handleSessionAudioStopped);
  attachedSession.off("audio_interrupted", handleSessionAudioInterrupted);
+ attachedSession.off("transport_event", handleTransportEvent);
  attachedSession.off("error", handleSessionError);
  attachedRealtimeSessionRef.current = null;
- }, [handleSessionAudioInterrupted, handleSessionAudioStart, handleSessionAudioStopped, handleSessionError]);
+ }, [
+ handleSessionAudioInterrupted,
+ handleSessionAudioStart,
+ handleSessionAudioStopped,
+ handleSessionError,
+ handleTransportEvent
+ ]);
  const attachSessionAudioListeners = useCallback(
  (session) => {
  detachSessionAudioListeners();
@@ -1147,6 +1338,7 @@ function useWebVoiceAgent(options) {
  session.on("history_added", (item) => {
  debugLog2("session history_added", item);
  });
+ session.on("transport_event", handleTransportEvent);
  session.on("error", (sessionError) => {
  debugLog2("session error", sessionError);
  });
@@ -1161,19 +1353,23 @@ function useWebVoiceAgent(options) {
  handleSessionAudioInterrupted,
  handleSessionAudioStart,
  handleSessionAudioStopped,
- handleSessionError
+ handleSessionError,
+ handleTransportEvent
  ]
  );
  const stop = useCallback(() => {
  detachSessionAudioListeners();
+ clearPlayback({ invalidate: true, resetState: true });
  try {
  sessionRef.current?.close();
  } finally {
  sessionRef.current = null;
+ spokenAssistantKeysRef.current.clear();
+ speechProviderRef.current = "openai";
  setStatus("idle");
  setAgentVoiceStateIfChanged("idle");
  }
- }, [detachSessionAudioListeners, setAgentVoiceStateIfChanged]);
+ }, [clearPlayback, detachSessionAudioListeners, setAgentVoiceStateIfChanged]);
  useEffect(() => {
  return () => {
  stop();
@@ -1201,6 +1397,9 @@ function useWebVoiceAgent(options) {
  });
  const requestPayload = runtimeConfig.modelOverride ? { model: runtimeConfig.modelOverride } : {};
  const secretPayload = await backendClient.createClientSecret(requestPayload);
+ speechProviderRef.current = secretPayload.speech.provider;
+ spokenAssistantKeysRef.current.clear();
+ clearPlayback({ invalidate: true, resetState: true });
  const backendFunctionsResult = await backendClient.listFunctions();
  const { agent, warnings } = await buildNavaiAgent({
  navigate: options.navigate,
@@ -1212,7 +1411,11 @@ function useWebVoiceAgent(options) {
  executeBackendFunction: backendClient.executeFunction
  });
  emitWarnings([...runtimeConfig.warnings, ...backendFunctionsResult.warnings, ...warnings]);
- const session = new RealtimeSession(agent);
+ const session = secretPayload.speech.provider === "elevenlabs" ? new RealtimeSession(agent, {
+ config: {
+ outputModalities: ["text"]
+ }
+ }) : new RealtimeSession(agent);
  attachSessionAudioListeners(session);
  if (runtimeConfig.modelOverride) {
  await session.connect({ apiKey: secretPayload.value, model: runtimeConfig.modelOverride });
@@ -1228,6 +1431,9 @@ function useWebVoiceAgent(options) {
  setStatus("error");
  setAgentVoiceStateIfChanged("idle");
  detachSessionAudioListeners();
+ clearPlayback({ invalidate: true, resetState: true });
+ spokenAssistantKeysRef.current.clear();
+ speechProviderRef.current = "openai";
  try {
  sessionRef.current?.close();
  } catch {
@@ -1237,6 +1443,7 @@ function useWebVoiceAgent(options) {
  }, [
  attachSessionAudioListeners,
  backendClient,
+ clearPlayback,
  detachSessionAudioListeners,
  options.navigate,
  runtimeConfigPromise,
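The session-construction branch in `start()` is the core of the provider switch: when the backend reports the ElevenLabs provider, the realtime session is created with text-only output and audio is produced via the synthesize route instead. A sketch with a stand-in `RealtimeSession` (the `["audio"]` default used by the stand-in is an assumption for illustration, not taken from the SDK):

```javascript
// Stand-in for the SDK's RealtimeSession, just exposing the modalities option.
class RealtimeSession {
  constructor(agent, options) {
    this.agent = agent;
    // Assumed default for this sketch; the real SDK manages its own defaults.
    this.outputModalities = options?.config?.outputModalities ?? ["audio"];
  }
}

function createSession(agent, provider) {
  // Mirrors the branch in start(): text-only output when ElevenLabs speaks.
  return provider === "elevenlabs"
    ? new RealtimeSession(agent, { config: { outputModalities: ["text"] } })
    : new RealtimeSession(agent);
}

console.log(createSession({}, "elevenlabs").outputModalities); // [ 'text' ]
console.log(createSession({}, "openai").outputModalities);     // [ 'audio' ]
```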
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@navai/voice-frontend",
- "version": "0.1.5",
+ "version": "0.1.7",
  "description": "Frontend helpers to build OpenAI Realtime voice agents",
  "type": "module",
  "main": "./dist/index.cjs",