npm - @cartesia/cartesia-js - Versions diffs - 2.2.4 → 2.2.7 - Mend

@cartesia/cartesia-js 2.2.4 → 2.2.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (219)

package/README.md CHANGED

@@ -1,13 +1,9 @@
-# Cartesia TypeScript SDK
+# Cartesia TypeScript Library
 [![fern shield](https://img.shields.io/badge/%F0%9F%8C%BF-Built%20with%20Fern-brightgreen)](https://buildwithfern.com?utm_source=github&utm_medium=github&utm_campaign=readme&utm_source=https%3A%2F%2Fgithub.com%2Fcartesia-ai%2Fcartesia-js)
 [![npm shield](https://img.shields.io/npm/v/@cartesia/cartesia-js)](https://www.npmjs.com/package/@cartesia/cartesia-js)
-[![Discord](https://badgen.net/badge/black/Cartesia/icon?icon=discord&label)](https://discord.gg/cartesia)
-The Cartesia TypeScript library provides convenient access to the Cartesia API from TypeScript/JavaScript, runnable in both Node.js and browsers.
-> [!TIP]
-> **[@cartesia-ai/cartesia-nextjs-demo](https://github.com/cartesia-ai/cartesia-nextjs-demo)** is our demo app that shows how to use Cartesia text-to-speech in a browser-based application.
+The Cartesia TypeScript library provides convenient access to the Cartesia APIs from TypeScript.
 ## Installation
@@ -17,222 +13,589 @@ npm i -s @cartesia/cartesia-js
 ## Reference
-A full reference for this library is available [here](./reference.md).
+A full reference for this library is available [here](https://github.com/cartesia-ai/cartesia-js/blob/HEAD/./reference.md).
 ## Usage
-### Instantiation
 Instantiate and use the client with the following:
 ```typescript
 import { CartesiaClient } from "@cartesia/cartesia-js";
-import process from "node:process"
-import fs from "node:fs"
-// Set up the client.
-const client = new CartesiaClient({ apiKey: process.env.CARTESIA_API_KEY });
-// Call the TTS API's bytes endpoint, which returns binary audio data as an ArrayBuffer.
-const response = await client.tts.bytes({
-    modelId: "sonic-2",
-    transcript: "Hello, world!",
-    voice: {
-        mode: "id",
-        id: "694f9389-aac1-45b6-b726-9d9369183238",
-    },
-    language: "en",
-    outputFormat: {
-        container: "wav",
-        sampleRate: 44100,
-        encoding: "pcm_f32le",
+const client = new CartesiaClient({ apiKey: "YOUR_API_KEY" });
+await client.auth.accessToken({
+    grants: {
+        stt: true,
     },
+    expiresIn: 60,
 });
-// Write the response to a file.
-fs.writeFileSync("sonic.wav", new Uint8Array(response));
 ```
-### TTS over WebSocket
+## Speech-to-Text (STT)
-```js
+```typescript
 import { CartesiaClient } from "@cartesia/cartesia-js";
+import fs from "node:fs";
+async function streamingSTTExample() {
+    const client = new CartesiaClient({
+        apiKey: process.env.CARTESIA_API_KEY,
+    });
+    // Create websocket connection with endpointing parameters
+    const sttWs = client.stt.websocket({
+        model: "ink-whisper",
+        language: "en", // Language of your audio
+        encoding: "pcm_s16le", // Audio encoding format (required)
+        sampleRate: 16000, // Audio sample rate (required)
+        minVolume: 0.1, // Volume threshold for voice activity detection (0.0-1.0)
+        maxSilenceDurationSecs: 2.0, // Maximum silence duration before endpointing
+    });
+    // Concurrent audio sending
+    async function sendAudio() {
+        try {
+            const audioBuffer = fs.readFileSync("audio.wav");
+            const chunkSize = 3200; // 3200 bytes = ~100ms at 16 kHz, 16-bit mono PCM
+            console.log("Starting audio stream...");
+            for (let i = 0; i < audioBuffer.length; i += chunkSize) {
+                const chunk = audioBuffer.subarray(i, i + chunkSize);
+                const arrayBuffer = chunk.buffer.slice(chunk.byteOffset, chunk.byteOffset + chunk.byteLength);
+                await sttWs.send(arrayBuffer);
+                console.log(`Sent chunk ${Math.floor(i / chunkSize) + 1}`);
+                // Simulate real-time audio capture delay
+                await new Promise((resolve) => setTimeout(resolve, 100));
+            }
+            await sttWs.finalize();
+            console.log("Audio streaming completed");
+        } catch (error) {
+            console.error("Error sending audio:", error);
+        }
+    }
-const cartesia = new CartesiaClient({
-    apiKey: process.env.CARTESIA_API_KEY,
-});
+    // Concurrent transcript receiving with word-level timestamps
+    async function receiveTranscripts(): Promise<string> {
+        return new Promise((resolve) => {
+            let fullTranscript = "";
+            sttWs.onMessage((result) => {
+                if (result.type === "transcript") {
+                    const status = result.isFinal ? "FINAL" : "INTERIM";
+                    console.log(`[${status}] "${result.text}"`);
+                    // Handle word-level timestamps if available
+                    if (result.words && result.words.length > 0) {
+                        console.log("Word-level timestamps:");
+                        result.words.forEach((word) => {
+                            console.log(`  "${word.word}": ${word.start.toFixed(2)}s - ${word.end.toFixed(2)}s`);
+                        });
+                    }
+                    if (result.isFinal) {
+                        fullTranscript += `${result.text} `;
+                    }
+                } else if (result.type === "flush_done") {
+                    console.log("Flush completed - sending done command");
+                    sttWs.done().catch(console.error);
+                } else if (result.type === "done") {
+                    console.log("Transcription completed");
+                    resolve(fullTranscript.trim());
+                } else if (result.type === "error") {
+                    console.error(`Error: ${result.message}`);
+                    resolve("");
+                }
+            });
+        });
+    }
-// Initialize the WebSocket. Make sure the output format you specify is supported.
-const websocket = cartesia.tts.websocket({
-    container: "raw",
-    encoding: "pcm_f32le",
-    sampleRate: 44100,
-});
+    try {
+        console.log("Starting STT processing...");
-// Create a stream.
-const response = await websocket.send({
-    modelId: "sonic-2",
-    voice: {
-        mode: "id",
-        id: "a0e99841-438c-4a64-b679-ae501e7d6091",
-    },
-    transcript: "Hello, world!",
-    // The WebSocket sets output_format on your behalf.
-});
+        // Run audio sending and transcript receiving concurrently
+        const [, finalTranscript] = await Promise.all([sendAudio(), receiveTranscripts()]);
-// Access the raw messages from the WebSocket.
-response.on("message", (message) => {
-    // Raw message.
-    console.log("Received message:", message);
-});
+        console.log(`\nFinal transcript: ${finalTranscript}`);
-// You can also access messages using a for-await-of loop.
-for await (const message of response.events("message")) {
-    // Raw message.
-    console.log("Received message:", message);
+        // Clean up
+        sttWs.disconnect();
+        return finalTranscript;
+    } catch (error) {
+        console.error("STT processing error:", error);
+        sttWs.disconnect();
+        throw error;
+    }
 }
+// Run the example
+streamingSTTExample().catch(console.error);
 ```
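The chunk size in the streaming example maps to an audio duration only once the sample format is fixed. The sketch below works through that arithmetic, assuming raw mono `pcm_s16le` audio (2 bytes per sample) as configured in the example. `chunkDurationMs` and `splitIntoChunks` are hypothetical helpers for illustration, not part of `@cartesia/cartesia-js`.

```typescript
// Hypothetical helpers, not part of @cartesia/cartesia-js.
// Assumption: raw mono PCM with 16-bit samples (pcm_s16le), i.e. 2 bytes per sample.
const BYTES_PER_SAMPLE = 2;

// Duration in milliseconds covered by a chunk of `chunkBytes` bytes.
function chunkDurationMs(chunkBytes: number, sampleRate: number): number {
    const samples = chunkBytes / BYTES_PER_SAMPLE;
    return (samples * 1000) / sampleRate;
}

// Split a buffer into fixed-size views; the last chunk may be shorter.
function splitIntoChunks(audio: Uint8Array, chunkBytes: number): Uint8Array[] {
    const chunks: Uint8Array[] = [];
    for (let i = 0; i < audio.length; i += chunkBytes) {
        // subarray returns a view over the same memory, without copying.
        chunks.push(audio.subarray(i, i + chunkBytes));
    }
    return chunks;
}

const exampleChunks = splitIntoChunks(new Uint8Array(8000), 3200);
console.log(chunkDurationMs(3200, 16000)); // 100
console.log(exampleChunks.map((c) => c.length)); // [ 3200, 3200, 1600 ]
```

Under these assumptions, 3200 bytes at 16 kHz is 100 ms of audio, so pairing that chunk size with a 100 ms delay between sends approximates real-time capture. A `.wav` file also begins with a header that is not audio data, which this sketch does not handle.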
-#### Input Streaming with Contexts
+## Request And Response Types
-```js
-const contextOptions = {
-    contextId: "my-context",
-    modelId: "sonic-2",
-    voice: {
-        mode: "id",
-        id: "a0e99841-438c-4a64-b679-ae501e7d6091",
-    },
+The SDK exports all request and response types as TypeScript interfaces. Simply import them with the
+following namespace:
+```typescript
+import { Cartesia } from "@cartesia/cartesia-js";
+const request: Cartesia.InfillBytesRequest = {
+    ...
 };
+```
-// Initial request on the context uses websocket.send().
-// This response object will aggregate the results of all the inputs sent on the context.
-const response = await websocket.send({
-    ...contextOptions,
-    transcript: "Hello, world!",
-});
+## Exception Handling
-// Subsequent requests on the same context use websocket.continue().
-await websocket.continue({
-    ...contextOptions,
-    transcript: " How are you today?",
-});
+When the API returns a non-success status code (4xx or 5xx response), a subclass of the following error
+will be thrown.
+```typescript
+import { CartesiaError } from "@cartesia/cartesia-js";
+try {
+    await client.auth.accessToken(...);
+} catch (err) {
+    if (err instanceof CartesiaError) {
+        console.log(err.statusCode);
+        console.log(err.message);
+        console.log(err.body);
+    }
+}
 ```
-See the [input streaming docs](https://docs.cartesia.ai/reference/web-socket/stream-speech/working-with-web-sockets#input-streaming-with-contexts) for more information.
+## Binary Response
-### Playing audio in the browser
+You can consume binary data from endpoints using the `BinaryResponse` type which lets you choose how to consume the data:
+```typescript
+const response = await client.agents.downloadCallAudio(...);
+const stream: ReadableStream<Uint8Array> = response.stream();
+// const arrayBuffer: ArrayBuffer = await response.arrayBuffer();
+// const blob: Blob = await response.blob();
+// const bytes: Uint8Array = await response.bytes();
+// You can only use the response body once, so you must choose one of the above methods.
+// If you want to check if the response body has been used, you can use the following property.
+const bodyUsed = response.bodyUsed;
+```
-(The `WebPlayer` class only supports playing audio in the browser and the raw PCM format with fp32le encoding.)
+<details>
+<summary>Save binary response to a file</summary>
-```js
-// If you're using the client in the browser, you can control audio playback using our WebPlayer:
-import { WebPlayer } from "@cartesia/cartesia-js";
+<blockquote>
+<details>
+<summary>Node.js</summary>
-console.log("Playing stream...");
+<blockquote>
+<details>
+<summary>ReadableStream (most-efficient)</summary>
-// Create a Player object.
-const player = new WebPlayer();
+```ts
+import { createWriteStream } from 'fs';
+import { Readable } from 'stream';
+import { pipeline } from 'stream/promises';
-// Play the audio. (`response` includes a custom Source object that the Player can play.)
-// The call resolves when the audio finishes playing.
-await player.play(response.source);
+const response = await client.agents.downloadCallAudio(...);
-console.log("Done playing.");
+const stream = response.stream();
+const nodeStream = Readable.fromWeb(stream);
+const writeStream = createWriteStream('path/to/file');
+await pipeline(nodeStream, writeStream);
 ```
-## Speech-to-Text (STT)
+</details>
+</blockquote>
-```typescript
-import { CartesiaClient } from "@cartesia/cartesia-js";
-import fs from "fs";
+<blockquote>
+<details>
+<summary>ArrayBuffer</summary>
-const client = new CartesiaClient({
-    apiKey: process.env.CARTESIA_API_KEY,
-});
+```ts
+import { writeFile } from 'fs/promises';
-// Create STT WebSocket connection
-const sttWs = client.stt.websocket({
-    model: "ink-whisper",
-    language: "en",
-    encoding: "pcm_s16le",
-    sampleRate: 16000,
-});
+const response = await client.agents.downloadCallAudio(...);
-// Set up message handler
-await sttWs.onMessage((result) => {
-    if (result.type === "transcript") {
-        const status = result.isFinal ? "FINAL" : "INTERIM";
-        console.log(`[${status}] ${result.text}`);
-        if (result.duration) {
-            console.log(`Duration: ${result.duration.toFixed(2)}s`);
-        }
-    } else if (result.type === "flush_done") {
-        console.log("Flush completed");
-        await sttWs.done(); // Send done command
-    } else if (result.type === "done") {
-        console.log("Session complete");
-    } else if (result.type === "error") {
-        console.error(`Error: ${result.message}`);
-    }
-});
+const arrayBuffer = await response.arrayBuffer();
+await writeFile('path/to/file', Buffer.from(arrayBuffer));
+```
-// Load and send audio data
-const audioBuffer = fs.readFileSync("audio.wav");
-const chunkSize = 1600; // ~100ms at 16kHz
-const audioChunks = [];
+</details>
+</blockquote>
-for (let i = 0; i < audioBuffer.length; i += chunkSize) {
-    const chunk = audioBuffer.slice(i, i + chunkSize);
-    audioChunks.push(chunk.buffer);
-}
+<blockquote>
+<details>
+<summary>Blob</summary>
+```ts
+import { writeFile } from 'fs/promises';
+const response = await client.agents.downloadCallAudio(...);
+const blob = await response.blob();
+const arrayBuffer = await blob.arrayBuffer();
+await writeFile('output.bin', Buffer.from(arrayBuffer));
+```
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>Bytes (Uint8Array)</summary>
+```ts
+import { writeFile } from 'fs/promises';
+const response = await client.agents.downloadCallAudio(...);
+const bytes = await response.bytes();
+await writeFile('path/to/file', bytes);
+```
+</details>
+</blockquote>
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>Bun</summary>
+<blockquote>
+<details>
+<summary>ReadableStream (most-efficient)</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const stream = response.stream();
+await Bun.write('path/to/file', stream);
+```
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>ArrayBuffer</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const arrayBuffer = await response.arrayBuffer();
+await Bun.write('path/to/file', arrayBuffer);
+```
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>Blob</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const blob = await response.blob();
+await Bun.write('path/to/file', blob);
+```
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>Bytes (Uint8Array)</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const bytes = await response.bytes();
+await Bun.write('path/to/file', bytes);
+```
+</details>
+</blockquote>
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>Deno</summary>
-// Send audio chunks
-for (const chunk of audioChunks) {
-    await sttWs.send(chunk);
+<blockquote>
+<details>
+<summary>ReadableStream (most-efficient)</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const stream = response.stream();
+const file = await Deno.open('path/to/file', { write: true, create: true });
+await stream.pipeTo(file.writable);
+```
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>ArrayBuffer</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const arrayBuffer = await response.arrayBuffer();
+await Deno.writeFile('path/to/file', new Uint8Array(arrayBuffer));
+```
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>Blob</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const blob = await response.blob();
+const arrayBuffer = await blob.arrayBuffer();
+await Deno.writeFile('path/to/file', new Uint8Array(arrayBuffer));
+```
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>Bytes (Uint8Array)</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const bytes = await response.bytes();
+await Deno.writeFile('path/to/file', bytes);
+```
+</details>
+</blockquote>
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>Browser</summary>
+<blockquote>
+<details>
+<summary>Blob (most-efficient)</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const blob = await response.blob();
+const url = URL.createObjectURL(blob);
+// trigger download
+const a = document.createElement('a');
+a.href = url;
+a.download = 'filename';
+a.click();
+URL.revokeObjectURL(url);
+```
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>ReadableStream</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const stream = response.stream();
+const reader = stream.getReader();
+const chunks = [];
+while (true) {
+  const { done, value } = await reader.read();
+  if (done) break;
+  chunks.push(value);
 }
-// Finalize transcription
-await sttWs.finalize();
+const blob = new Blob(chunks);
+const url = URL.createObjectURL(blob);
-// Disconnect when done
-sttWs.disconnect();
+// trigger download
+const a = document.createElement('a');
+a.href = url;
+a.download = 'filename';
+a.click();
+URL.revokeObjectURL(url);
 ```
-## Request And Response Types
+</details>
+</blockquote>
-The SDK exports all request and response types as TypeScript interfaces. Simply import them with the
-following namespace:
+<blockquote>
+<details>
+<summary>ArrayBuffer</summary>
-```typescript
-import { Cartesia } from "@cartesia/cartesia-js";
+```ts
+const response = await client.agents.downloadCallAudio(...);
-const request: Cartesia.VoiceChangerBytesRequest = {
-    ...
-};
+const arrayBuffer = await response.arrayBuffer();
+const blob = new Blob([arrayBuffer]);
+const url = URL.createObjectURL(blob);
+// trigger download
+const a = document.createElement('a');
+a.href = url;
+a.download = 'filename';
+a.click();
+URL.revokeObjectURL(url);
 ```
-## Exception Handling
+</details>
+</blockquote>
-When the API returns a non-success status code (4xx or 5xx response), a subclass of the following error
-will be thrown.
+<blockquote>
+<details>
+<summary>Bytes (Uint8Array)</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const bytes = await response.bytes();
+const blob = new Blob([bytes]);
+const url = URL.createObjectURL(blob);
+// trigger download
+const a = document.createElement('a');
+a.href = url;
+a.download = 'filename';
+a.click();
+URL.revokeObjectURL(url);
+```
+</details>
+</blockquote>
+</details>
+</blockquote>
+</details>
+</blockquote>
+<details>
+<summary>Convert binary response to text</summary>
+<blockquote>
+<details>
+<summary>ReadableStream</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const stream = response.stream();
+const text = await new Response(stream).text();
+```
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>ArrayBuffer</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const arrayBuffer = await response.arrayBuffer();
+const text = new TextDecoder().decode(arrayBuffer);
+```
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>Blob</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const blob = await response.blob();
+const text = await blob.text();
+```
+</details>
+</blockquote>
+<blockquote>
+<details>
+<summary>Bytes (Uint8Array)</summary>
+```ts
+const response = await client.agents.downloadCallAudio(...);
+const bytes = await response.bytes();
+const text = new TextDecoder().decode(bytes);
+```
+</details>
+</blockquote>
+</details>
+## Pagination
+List endpoints are paginated. The SDK provides an iterator so that you can simply loop over the items:
 ```typescript
-import { CartesiaError } from "@cartesia/cartesia-js";
+import { CartesiaClient } from "@cartesia/cartesia-js";
-try {
-    await client.tts.bytes(...);
-} catch (err) {
-    if (err instanceof CartesiaError) {
-        console.log(err.statusCode);
-        console.log(err.message);
-        console.log(err.body);
-    }
+const client = new CartesiaClient({ token: "YOUR_TOKEN" });
+const response = await client.agents.listCalls({
+    agentId: "agent_id",
+});
+for await (const item of response) {
+    console.log(item);
+}
+// Or you can manually iterate page-by-page
+let page = await client.agents.listCalls({
+    agentId: "agent_id",
+});
+while (page.hasNextPage()) {
+    page = await page.getNextPage();
 }
 ```
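To make the two iteration styles concrete without calling the API, here is a minimal sketch built on an in-memory stand-in. `FakePage` and `collectPages` are hypothetical names, not the SDK's classes; the sketch only assumes the `hasNextPage()` / `getNextPage()` shape shown above and that fetching the next page is asynchronous.

```typescript
// Hypothetical in-memory stand-in for a paginated response; NOT the SDK's class.
class FakePage<T> {
    private readonly allItems: T[];
    private readonly pageSize: number;
    private readonly offset: number;

    constructor(allItems: T[], pageSize: number, offset = 0) {
        this.allItems = allItems;
        this.pageSize = pageSize;
        this.offset = offset;
    }

    // Items on the current page only.
    get data(): T[] {
        return this.allItems.slice(this.offset, this.offset + this.pageSize);
    }

    hasNextPage(): boolean {
        return this.offset + this.pageSize < this.allItems.length;
    }

    // Assumed to be asynchronous, since a real client would issue a request here.
    async getNextPage(): Promise<FakePage<T>> {
        return new FakePage(this.allItems, this.pageSize, this.offset + this.pageSize);
    }

    // Lets callers write `for await (const item of page)` across all pages.
    async *[Symbol.asyncIterator](): AsyncGenerator<T> {
        let page: FakePage<T> = this;
        for (;;) {
            for (const item of page.data) yield item;
            if (!page.hasNextPage()) return;
            page = await page.getNextPage();
        }
    }
}

// Manual page-by-page traversal, mirroring the hasNextPage/getNextPage loop.
async function collectPages<T>(first: FakePage<T>): Promise<T[][]> {
    const pages: T[][] = [first.data];
    let page = first;
    while (page.hasNextPage()) {
        page = await page.getNextPage();
        pages.push(page.data);
    }
    return pages;
}
```

The iterator form hides page boundaries entirely, while the manual loop is the one to reach for when per-page handling (progress reporting, early exit after N pages) matters.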
 ## Advanced
+### Additional Headers
+If you would like to send additional headers as part of the request, use the `headers` request option.
+```typescript
+const response = await client.auth.accessToken(..., {
+    headers: {
+        'X-Custom-Header': 'custom value'
+    }
+});
+```
 ### Retries
 The SDK is instrumented with automatic retries with exponential backoff. A request will be retried as long
@@ -241,14 +604,14 @@ retry limit (default: 2).
 A request is deemed retriable when any of the following HTTP status codes is returned:
--   [408](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408) (Timeout)
--   [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429) (Too Many Requests)
--   [5XX](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500) (Internal Server Errors)
+- [408](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408) (Timeout)
+- [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429) (Too Many Requests)
+- [5XX](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500) (Internal Server Errors)
 Use the `maxRetries` request option to configure this behavior.
 ```typescript
-const response = await client.tts.bytes(..., {
+const response = await client.auth.accessToken(..., {
     maxRetries: 0 // override maxRetries at the request level
 });
 ```
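For readers unfamiliar with the pattern, the retry behavior described above can be sketched roughly as follows. This is an illustration of exponential backoff under stated assumptions, not the SDK's internal code: the helper names, the 1 second base delay, and the 60 second cap are all hypothetical.

```typescript
// Illustrative only: not the SDK's internal retry logic.
// Status codes the README lists as retriable: 408, 429, and 5XX.
function isRetriable(status: number): boolean {
    return status === 408 || status === 429 || (status >= 500 && status <= 599);
}

// Delay before retry number `attempt` (0-based). Base and cap are assumed values.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60000): number {
    return Math.min(baseMs * 2 ** attempt, capMs);
}

// Sends once, then retries up to `maxRetries` times while the status is retriable.
async function requestWithRetries(
    send: () => Promise<{ status: number }>,
    maxRetries = 2,
    sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<{ status: number }> {
    let response = await send();
    for (let attempt = 0; attempt < maxRetries && isRetriable(response.status); attempt++) {
        await sleep(backoffDelayMs(attempt));
        response = await send();
    }
    return response;
}
```

With `maxRetries` set to `0`, the loop body never runs and the first response is returned as-is, which matches the intent of the request-level override shown above.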
@@ -258,7 +621,7 @@ const response = await client.tts.bytes(..., {
 The SDK defaults to a 60 second timeout. Use the `timeoutInSeconds` option to configure this behavior.
 ```typescript
-const response = await client.tts.bytes(..., {
+const response = await client.auth.accessToken(..., {
     timeoutInSeconds: 30 // override timeout to 30s
 });
 ```
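A per-request timeout of this kind is commonly built on `AbortController`. The sketch below shows that general technique; `withTimeout` is a hypothetical helper, and nothing here is claimed to be how the SDK implements `timeoutInSeconds`.

```typescript
// Illustrative only: a generic timeout wrapper, not the SDK's implementation.
// `run` must honor the signal (for example by passing it to fetch) for the
// timeout to actually cancel the work.
async function withTimeout<T>(
    run: (signal: AbortSignal) => Promise<T>,
    timeoutInSeconds: number,
): Promise<T> {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutInSeconds * 1000);
    try {
        return await run(controller.signal);
    } finally {
        // Always clear the timer so a finished request does not keep the process alive.
        clearTimeout(timer);
    }
}
```

A caller could wrap a plain `fetch` as `withTimeout((signal) => fetch(url, { signal }), 30)`, where `url` is whatever endpoint is being called.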
@@ -269,7 +632,7 @@ The SDK allows users to abort requests at any point by passing in an abort signa
 ```typescript
 const controller = new AbortController();
-const response = await client.tts.bytes(..., {
+const response = await client.auth.accessToken(..., {
     abortSignal: controller.signal
 });
 controller.abort(); // aborts the request
@@ -280,12 +643,12 @@ controller.abort(); // aborts the request
 The SDK defaults to `node-fetch` but will use the global fetch client if present. The SDK works in the following
 runtimes:
--   Node.js 18+
--   Vercel
--   Cloudflare Workers
--   Deno v1.25+
--   Bun 1.0+
--   React Native
+- Node.js 18+
+- Vercel
+- Cloudflare Workers
+- Deno v1.25+
+- Bun 1.0+
+- React Native
 ### Customizing Fetch Client
@@ -310,3 +673,7 @@ a proof of concept, but know that we will not be able to merge it as-is. We sugg
 an issue first to discuss with us!
 On the other hand, contributions to the README are always very welcome!
+## Documentation
+API reference documentation is available [here](https://docs.cartesia.ai/).