@mux/ai 0.1.5 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,69 +1,43 @@
- # @mux/ai
+ # `@mux/ai` 📼 🤝 🤖
 
- A set of tools for connecting videos in your Mux account to multi-modal LLMs.
+ [![npm version](https://badge.fury.io/js/@mux%2Fai.svg)](https://www.npmjs.com/package/@mux/ai)
+ [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 
- ## Available pre-built workflows
-
- | Workflow | Description | Providers | Default Models | Input | Output |
- | ------------------------- | --------------------------------------------------------------- | ------------------------- | ------------------------------------------------------------------ | -------------------------------- | ---------------------------------------------- |
- | `getSummaryAndTags` | Generate titles, descriptions, and tags from a Mux video asset | OpenAI, Anthropic, Google | `gpt-5-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` | Asset ID + options | Title, description, tags, storyboard URL |
- | `getModerationScores` | Analyze video thumbnails for inappropriate content | OpenAI, Hive | `omni-moderation-latest` (OpenAI) or Hive visual moderation task | Asset ID + thresholds | Sexual/violence scores, flagged status |
- | `hasBurnedInCaptions` | Detect burned-in captions (hardcoded subtitles) in video frames | OpenAI, Anthropic, Google | `gpt-5-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` | Asset ID + options | Boolean result, confidence, language |
- | `generateChapters` | Generate AI-powered chapter markers from video captions | OpenAI, Anthropic, Google | `gpt-5-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` | Asset ID + language + options | Timestamped chapter list, ready for Mux Player |
- | `generateVideoEmbeddings` | Generate vector embeddings for video transcript chunks | OpenAI, Google | `text-embedding-3-small` (OpenAI), `gemini-embedding-001` (Google) | Asset ID + chunking strategy | Chunk embeddings + averaged embedding |
- | `translateCaptions` | Translate video captions to different languages | OpenAI, Anthropic, Google | `gpt-5-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` | Asset ID + languages + S3 config | Translated VTT + Mux track ID |
- | `translateAudio` | Create AI-dubbed audio tracks in different languages | ElevenLabs only | ElevenLabs Dubbing API | Asset ID + languages + S3 config | Dubbed audio + Mux track ID |
-
- ## Features
-
- - **Cost-Effective by Default**: Uses affordable frontier models like `gpt-5-mini`, `claude-sonnet-4-5`, and `gemini-2.5-flash` to keep analysis costs low while maintaining high quality results
- - **Multi-modal Analysis**: Combines storyboard images with video transcripts
- - **Tone Control**: Normal, sassy, or professional analysis styles
- - **Prompt Customization**: Override specific prompt sections to tune workflows to your use case
- - **Configurable Thresholds**: Custom sensitivity levels for content moderation
- - **TypeScript**: Fully typed for excellent developer experience
- - **Provider Choice**: Switch between OpenAI, Anthropic, and Google for different perspectives
- - **Composable Building Blocks**: Import primitives to fetch transcripts, thumbnails, and storyboards to build bespoke flows
- - **Universal Language Support**: Automatic language name detection using `Intl.DisplayNames` for all ISO 639-1 codes
+ > **A TypeScript SDK for building AI-driven video workflows on the server, powered by [Mux](https://www.mux.com)!**
 
- ## Package Structure
+ `@mux/ai` does this by providing:
+ - Easy-to-use, purpose-driven, cost-effective, configurable **_workflow functions_** that integrate with a variety of popular AI/LLM providers (OpenAI, Anthropic, Google).
+   - **Examples:** [`generateChapters`](#chapter-generation), [`getModerationScores`](#content-moderation), [`generateVideoEmbeddings`](#video-search-with-embeddings), [`getSummaryAndTags`](#video-summarization)
+ - Convenient, parameterized, commonly needed **_primitive functions_** backed by [Mux Video](https://www.mux.com/video-api) for building your own media-based AI workflows and integrations.
+   - **Examples:** `getStoryboardUrl`, `chunkVTTCues`, `fetchTranscriptForAsset`
 
- This package ships with layered entry points so you can pick the right level of abstraction for your workflow:
+ # Usage
 
- - `@mux/ai/workflows` – opinionated, production-ready helpers (`getSummaryAndTags`, `generateChapters`, `translateCaptions`, etc.) that orchestrate Mux API access, transcript/storyboard gathering, and the AI provider call.
- - `@mux/ai/primitives` – low-level building blocks such as `fetchTranscriptForAsset`, `getStoryboardUrl`, and `getThumbnailUrls`. Use these when you need to mix our utilities into your own prompts or custom workflows.
- - `@mux/ai` – re-exports both namespaces, plus shared `types`, so you can also write `import { workflows, primitives } from '@mux/ai';`.
+ ```ts
+ import { getSummaryAndTags } from "@mux/ai/workflows";
 
- Every helper inside `@mux/ai/workflows` is composed from the primitives. That means you can start with a high-level workflow and gradually drop down to primitives whenever you need more control.
+ const result = await getSummaryAndTags("your-asset-id", {
+   provider: "openai",
+   tone: "professional",
+   includeTranscript: true
+ });
 
- ```typescript
- import { fetchTranscriptForAsset, getStoryboardUrl } from "@mux/ai/primitives";
- import { getModerationScores, getSummaryAndTags } from "@mux/ai/workflows";
+ console.log(result.title); // "Getting Started with TypeScript"
+ console.log(result.description); // "A comprehensive guide to..."
+ console.log(result.tags); // ["typescript", "tutorial", "programming"]
+ ```
 
- // Compose high-level workflows for a custom workflow
- export async function summarizeIfSafe(assetId: string) {
-   const moderation = await getModerationScores(assetId, { provider: "openai" });
-   if (moderation.exceedsThreshold) {
-     throw new Error("Asset failed content safety review");
-   }
+ > **⚠️ Important:** Many workflows rely on video transcripts for best results. Consider enabling [auto-generated captions](https://www.mux.com/docs/guides/add-autogenerated-captions-and-use-transcripts) on your Mux assets to unlock the full potential of transcript-based workflows like summarization, chapters, and embeddings.
 
-   return getSummaryAndTags(assetId, {
-     provider: "anthropic",
-     tone: "professional",
-   });
- }
+ # Quick Start
 
- // Or drop down to primitives to build bespoke AI workflows
- export async function customTranscriptAnalysis(assetId: string, playbackId: string) {
-   const transcript = await fetchTranscriptForAsset(assetId, "en");
-   const storyboardUrl = getStoryboardUrl(playbackId);
+ ## Prerequisites
 
-   // Use these primitives in your own AI prompts or custom logic
-   return { transcript, storyboardUrl };
- }
- ```
+ - [Node.js](https://nodejs.org/en/download) (≥ 21.0.0)
+ - A Mux account and necessary [credentials](#credentials---mux) for your environment (sign up [here](https://dashboard.mux.com/signup) for free!)
+ - Accounts and [credentials](#credentials---ai-providers) for any AI providers you intend to use for your workflows
+ - (For some workflows only) AWS S3 and [other credentials](#credentials---other)
 
- Use whichever layer makes sense: call a workflow as-is, compose multiple workflows together, or drop down to primitives to build a completely custom workflow.
 
  ## Installation
 
@@ -71,205 +45,127 @@ Use whichever layer makes sense: call a workflow as-is, compose multiple workflo
 npm install @mux/ai
 ```
 
- ## Quick Start
+ ## Configuration
 
- ### Video Summarization
+ We support [dotenv](https://www.npmjs.com/package/dotenv), so you can simply add the following environment variables to your `.env` file:
 
- ```typescript
- import { getSummaryAndTags } from "@mux/ai/workflows";
+ ```bash
+ # Required
+ MUX_TOKEN_ID=your_mux_token_id
+ MUX_TOKEN_SECRET=your_mux_token_secret
 
- // Uses built-in optimized prompt
- const result = await getSummaryAndTags("your-mux-asset-id", {
-   tone: "professional"
- });
+ # Needed if your assets _only_ have signed playback IDs
+ MUX_SIGNING_KEY=your_signing_key_id
+ MUX_PRIVATE_KEY=your_base64_encoded_private_key
 
- console.log(result.title); // Short, descriptive title
- console.log(result.description); // Detailed description
- console.log(result.tags); // Array of relevant keywords
- console.log(result.storyboardUrl); // URL to Mux storyboard
-
- // Use base64 mode for improved reliability (works with OpenAI, Anthropic, and Google)
- const reliableResult = await getSummaryAndTags("your-mux-asset-id", {
-   provider: "anthropic",
-   imageSubmissionMode: "base64", // Downloads storyboard locally before submission
-   imageDownloadOptions: {
-     timeout: 15000,
-     retries: 2,
-     retryDelay: 1000
-   },
-   tone: "professional"
- });
+ # You only need to configure API keys for the AI platforms and workflows you're using
+ OPENAI_API_KEY=your_openai_api_key
+ ANTHROPIC_API_KEY=your_anthropic_api_key
+ GOOGLE_GENERATIVE_AI_API_KEY=your_google_api_key
+ ELEVENLABS_API_KEY=your_elevenlabs_api_key
 
- // Customize for specific use cases with promptOverrides
- const seoResult = await getSummaryAndTags("your-mux-asset-id", {
-   promptOverrides: {
-     task: "Generate SEO-optimized metadata for search engines.",
-     title: "Create a search-optimized title (50-60 chars) with primary keyword.",
-     keywords: "Focus on high search volume and long-tail keywords.",
-   },
- });
+ # S3-Compatible Storage (required for translation & audio dubbing)
+ S3_ENDPOINT=https://your-s3-endpoint.com
+ S3_REGION=auto
+ S3_BUCKET=your-bucket-name
+ S3_ACCESS_KEY_ID=your-access-key
+ S3_SECRET_ACCESS_KEY=your-secret-key
 ```
 
- ### Content Moderation
+ Or pass credentials directly to each function:
 
 ```typescript
- import { getModerationScores } from "@mux/ai/workflows";
-
- // Analyze Mux video asset for inappropriate content (OpenAI default)
- const result = await getModerationScores("your-mux-asset-id", {
-   thresholds: { sexual: 0.7, violence: 0.8 }
+ const result = await getSummaryAndTags(assetId, {
+   muxTokenId: "your-token-id",
+   muxTokenSecret: "your-token-secret",
+   openaiApiKey: "your-openai-key"
 });
+ ```
 
- console.log(result.maxScores); // Highest scores across all thumbnails
- console.log(result.exceedsThreshold); // true if content should be flagged
- console.log(result.thumbnailScores); // Individual thumbnail results
+ > **💡 Tip:** If you're using `.env` in a repository or version tracking system, make sure you add this file to your `.gitignore` or equivalent to avoid unintentionally committing sensitive credentials.
 
- // Run the same analysis using Hive’s visual moderation API
- const hiveResult = await getModerationScores("your-mux-asset-id", {
-   provider: "hive",
-   thresholds: { sexual: 0.9, violence: 0.9 },
- });
+ # Workflows
 
- // Use base64 submission for improved reliability with OpenAI (downloads images locally)
- const reliableResult = await getModerationScores("your-mux-asset-id", {
-   provider: "openai",
-   imageSubmissionMode: "base64",
-   imageDownloadOptions: {
-     timeout: 15000,
-     retries: 3,
-     retryDelay: 1000
-   }
- });
- ```
+ ## Available pre-built workflows
 
- ### Burned-in Caption Detection
+ | Workflow | Description | Providers | Default Models | Mux Asset Requirements | Cloud Infrastructure Requirements |
+ | ------------------------------------------------------------------------ | ----------------------------------------------------------------- | ------------------------- | ------------------------------------------------------------------ | ---------------------- | --------------------------------- |
+ | [`getSummaryAndTags`](./docs/WORKFLOWS.md#video-summarization)<br/>[API](./docs/API.md#getsummaryandtagsassetid-options) · [Source](./src/workflows/summarization.ts) | Generate titles, descriptions, and tags for an asset | OpenAI, Anthropic, Google | `gpt-5.1` (OpenAI), `claude-sonnet-4-5` (Anthropic), `gemini-2.5-flash` (Google) | Video (required), Captions (optional) | None |
+ | [`getModerationScores`](./docs/WORKFLOWS.md#content-moderation)<br/>[API](./docs/API.md#getmoderationscoresassetid-options) · [Source](./src/workflows/moderation.ts) | Detect inappropriate (sexual or violent) content in an asset | OpenAI, Hive | `omni-moderation-latest` (OpenAI) or Hive visual moderation task | Video (required) | None |
+ | [`hasBurnedInCaptions`](./docs/WORKFLOWS.md#burned-in-caption-detection)<br/>[API](./docs/API.md#hasburnedincaptionsassetid-options) · [Source](./src/workflows/burned-in-captions.ts) | Detect burned-in captions (hardcoded subtitles) in an asset | OpenAI, Anthropic, Google | `gpt-5.1` (OpenAI), `claude-sonnet-4-5` (Anthropic), `gemini-2.5-flash` (Google) | Video (required) | None |
+ | [`generateChapters`](./docs/WORKFLOWS.md#chapter-generation)<br/>[API](./docs/API.md#generatechaptersassetid-languagecode-options) · [Source](./src/workflows/chapters.ts) | Generate chapter markers for an asset using the transcript | OpenAI, Anthropic, Google | `gpt-5.1` (OpenAI), `claude-sonnet-4-5` (Anthropic), `gemini-2.5-flash` (Google) | Video (required), Captions (required) | None |
+ | [`generateVideoEmbeddings`](./docs/WORKFLOWS.md#video-embeddings)<br/>[API](./docs/API.md#generatevideoembeddingsassetid-options) · [Source](./src/workflows/embeddings.ts) | Generate vector embeddings for an asset's transcript chunks | OpenAI, Google | `text-embedding-3-small` (OpenAI), `gemini-embedding-001` (Google) | Video (required), Captions (required) | None |
+ | [`translateCaptions`](./docs/WORKFLOWS.md#caption-translation)<br/>[API](./docs/API.md#translatecaptionsassetid-fromlanguagecode-tolanguagecode-options) · [Source](./src/workflows/translate-captions.ts) | Translate an asset's captions into different languages | OpenAI, Anthropic, Google | `gpt-5.1` (OpenAI), `claude-sonnet-4-5` (Anthropic), `gemini-2.5-flash` (Google) | Video (required), Captions (required) | AWS S3 (if `uploadToMux=true`) |
+ | [`translateAudio`](./docs/WORKFLOWS.md#audio-dubbing)<br/>[API](./docs/API.md#translateaudioassetid-tolanguagecode-options) · [Source](./src/workflows/translate-audio.ts) | Create AI-dubbed audio tracks in different languages for an asset | ElevenLabs only | ElevenLabs Dubbing API | Video (required), Audio (required) | AWS S3 (if `uploadToMux=true`) |
 
- ```typescript
- import { hasBurnedInCaptions } from "@mux/ai/workflows";
+ ## Example Workflows
 
- // Detect burned-in captions (hardcoded subtitles) in video frames
- const result = await hasBurnedInCaptions("your-mux-asset-id", {
-   provider: "openai"
- });
+ ### Video Summarization
 
- console.log(result.hasBurnedInCaptions); // true/false
- console.log(result.confidence); // 0.0-1.0 confidence score
- console.log(result.detectedLanguage); // Language if captions detected
- console.log(result.storyboardUrl); // Video storyboard analyzed
+ Generate SEO-friendly titles, descriptions, and tags from your video content:
 
- // Compare providers
- const anthropicResult = await hasBurnedInCaptions("your-mux-asset-id", {
-   provider: "anthropic",
-   model: "claude-sonnet-4-5"
- });
-
- const googleResult = await hasBurnedInCaptions("your-mux-asset-id", {
-   provider: "google",
-   model: "gemini-2.5-flash"
- });
+ ```typescript
+ import { getSummaryAndTags } from "@mux/ai/workflows";
 
- // Use base64 mode for improved reliability
- const reliableResult = await hasBurnedInCaptions("your-mux-asset-id", {
+ const result = await getSummaryAndTags("your-asset-id", {
   provider: "openai",
-   imageSubmissionMode: "base64",
-   imageDownloadOptions: {
-     timeout: 15000,
-     retries: 3,
-     retryDelay: 1000
-   }
+   tone: "professional",
+   includeTranscript: true
 });
- ```
 
- #### Image Submission Modes
-
- Choose between two methods for submitting images to AI providers:
-
- **URL Mode (Default):**
-
- - Fast initial response
- - Lower bandwidth usage
- - Relies on AI provider's image downloading
- - May encounter timeouts with slow/unreliable image sources
+ console.log(result.title); // "Getting Started with TypeScript"
+ console.log(result.description); // "A comprehensive guide to..."
+ console.log(result.tags); // ["typescript", "tutorial", "programming"]
+ ```
 
- **Base64 Mode (Recommended for Production):**
+ ### Content Moderation
 
- - Downloads images locally with robust retry logic
- - Eliminates AI provider timeout issues
- - Better control over slow TTFB and network issues
- - Slightly higher bandwidth usage but more reliable results
- - For OpenAI: submits images as base64 data URIs
- - For Anthropic/Google: the AI SDK handles converting the base64 payload into the provider-specific format automatically
+ Automatically detect inappropriate content in videos:
 
 ```typescript
- // High reliability mode - recommended for production
- const result = await getModerationScores(assetId, {
-   imageSubmissionMode: "base64",
-   imageDownloadOptions: {
-     timeout: 15000, // 15s timeout per image
-     retries: 3, // Retry failed downloads 3x
-     retryDelay: 1000, // 1s base delay with exponential backoff
-     exponentialBackoff: true
-   }
- });
- ```
-
- ### Caption Translation
+ import { getModerationScores } from "@mux/ai/workflows";
 
- ```typescript
- import { translateCaptions } from "@mux/ai/workflows";
-
- // Translate existing captions to Spanish and add as new track
- const result = await translateCaptions(
-   "your-mux-asset-id",
-   "en", // from language
-   "es", // to language
-   {
-     provider: "google",
-     model: "gemini-2.5-flash"
-   }
- );
+ const result = await getModerationScores("your-asset-id", {
+   provider: "openai",
+   thresholds: { sexual: 0.7, violence: 0.8 }
+ });
 
- console.log(result.uploadedTrackId); // New Mux track ID
- console.log(result.presignedUrl); // S3 file URL
- console.log(result.translatedVtt); // Translated VTT content
+ if (result.exceedsThreshold) {
+   console.log("Content flagged for review");
+   console.log("Max scores:", result.maxScores);
+ }
 ```
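The aggregate fields used above can be sketched in isolation. This is an illustration only (the field names mirror the README's documented result shape; the implementation is not the package's source): per-thumbnail scores roll up into `maxScores`, and the asset is flagged when either category crosses its threshold.

```typescript
// Hypothetical shapes mirroring the documented moderation result.
interface ThumbnailScore {
  url: string;
  sexual: number;
  violence: number;
}

interface Thresholds {
  sexual: number;
  violence: number;
}

function aggregateScores(scores: ThumbnailScore[], thresholds: Thresholds) {
  // Take the worst (highest) score seen across all thumbnails...
  const maxScores = {
    sexual: Math.max(0, ...scores.map((s) => s.sexual)),
    violence: Math.max(0, ...scores.map((s) => s.violence)),
  };
  // ...and flag the asset if either category crosses its threshold.
  const exceedsThreshold =
    maxScores.sexual >= thresholds.sexual ||
    maxScores.violence >= thresholds.violence;
  return { maxScores, exceedsThreshold };
}

const verdict = aggregateScores(
  [
    { url: "thumb-0.png", sexual: 0.01, violence: 0.12 },
    { url: "thumb-1.png", sexual: 0.02, violence: 0.91 },
  ],
  { sexual: 0.7, violence: 0.8 }
);
console.log(verdict.exceedsThreshold); // true (violence 0.91 >= 0.8)
```

One high-scoring thumbnail is enough to flag the whole asset, which is why the workflow reports per-thumbnail scores alongside the aggregate.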
 
- ### Video Chapters
+ ### Chapter Generation
+
+ Create automatic chapter markers for better video navigation:
 
 ```typescript
 import { generateChapters } from "@mux/ai/workflows";
 
- // Generate AI-powered chapters from video captions
- const result = await generateChapters("your-mux-asset-id", "en", {
-   provider: "openai"
+ const result = await generateChapters("your-asset-id", "en", {
+   provider: "anthropic"
 });
 
- console.log(result.chapters); // Array of {startTime: number, title: string}
-
 // Use with Mux Player
- const player = document.querySelector("mux-player");
 player.addChapters(result.chapters);
-
- // Compare providers
- const anthropicResult = await generateChapters("your-mux-asset-id", "en", {
-   provider: "anthropic",
-   model: "claude-sonnet-4-5"
- });
-
- const googleResult = await generateChapters("your-mux-asset-id", "en", {
-   provider: "google",
-   model: "gemini-2.5-flash"
- });
+ // [
+ //   { startTime: 0, title: "Introduction" },
+ //   { startTime: 45, title: "Main Content" },
+ //   { startTime: 120, title: "Conclusion" }
+ // ]
 ```
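The chapter objects are plain `{ startTime, title }` records with `startTime` in seconds, so they are easy to reuse outside the player. As an illustration (this helper is not part of `@mux/ai`), here is how they could be turned into display labels:

```typescript
// Chapter shape as documented above: startTime in seconds.
interface Chapter {
  startTime: number;
  title: string;
}

// Format a chapter as "m:ss Title" for menus or show notes.
function formatChapter(chapter: Chapter): string {
  const minutes = Math.floor(chapter.startTime / 60);
  const seconds = Math.floor(chapter.startTime % 60);
  return `${minutes}:${String(seconds).padStart(2, "0")} ${chapter.title}`;
}

const chapters: Chapter[] = [
  { startTime: 0, title: "Introduction" },
  { startTime: 45, title: "Main Content" },
  { startTime: 120, title: "Conclusion" },
];
console.log(chapters.map(formatChapter));
// ["0:00 Introduction", "0:45 Main Content", "2:00 Conclusion"]
```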
 
- ### Video Embeddings
+ ### Video Search with Embeddings
+
+ Generate embeddings for semantic video search:
 
 ```typescript
 import { generateVideoEmbeddings } from "@mux/ai/workflows";
 
- // Generate embeddings for semantic video search
- const result = await generateVideoEmbeddings("your-mux-asset-id", {
+ const result = await generateVideoEmbeddings("your-asset-id", {
   provider: "openai",
+   languageCode: "en",
   chunkingStrategy: {
     type: "token",
     maxTokens: 500,
@@ -277,716 +173,225 @@ const result = await generateVideoEmbeddings("your-mux-asset-id", {
   }
 });
 
- console.log(result.chunks); // Array of chunk embeddings with timestamps
- console.log(result.averagedEmbedding); // Single embedding for entire video
-
- // Store chunks in vector database for timestamp-accurate search
+ // Store embeddings in your vector database
 for (const chunk of result.chunks) {
   await vectorDB.insert({
-     id: `${result.assetId}:${chunk.chunkId}`,
     embedding: chunk.embedding,
-     startTime: chunk.metadata.startTime,
-     endTime: chunk.metadata.endTime
+     metadata: {
+       assetId: result.assetId,
+       startTime: chunk.metadata.startTime,
+       endTime: chunk.metadata.endTime
+     }
   });
 }
-
- // Use VTT-based chunking to respect cue boundaries
- const vttResult = await generateVideoEmbeddings("your-mux-asset-id", {
-   provider: "google",
-   chunkingStrategy: {
-     type: "vtt",
-     maxTokens: 500,
-     overlapCues: 2
-   }
- });
- ```
-
- ### Audio Dubbing
-
- ```typescript
- import { translateAudio } from "@mux/ai/workflows";
-
- // Create AI-dubbed audio track and add to Mux asset
- // Uses the default audio track on your asset, language is auto-detected
- const result = await translateAudio(
-   "your-mux-asset-id",
-   "es", // target language
-   {
-     provider: "elevenlabs",
-     numSpeakers: 0 // Auto-detect speakers
-   }
- );
-
- console.log(result.dubbingId); // ElevenLabs dubbing job ID
- console.log(result.uploadedTrackId); // New Mux audio track ID
- console.log(result.presignedUrl); // S3 audio file URL
 ```
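Once chunk embeddings are stored, semantic search is a nearest-neighbor lookup over them. A minimal in-memory sketch of the idea (your vector database does this for you; the chunk shape mirrors the example above):

```typescript
// Stored record shape mirroring the metadata written in the loop above.
interface StoredChunk {
  embedding: number[];
  metadata: { assetId: string; startTime: number; endTime: number };
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks against a query embedding, best match first.
function search(query: number[], chunks: StoredChunk[], topK = 3): StoredChunk[] {
  return [...chunks]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding)
    )
    .slice(0, topK);
}
```

Because each match carries `startTime` and `endTime`, search results can deep-link straight to the relevant moment in the video.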
 
- ### Compare Summarization from Providers
-
- ```typescript
- import { getSummaryAndTags } from "@mux/ai/workflows";
-
- // Compare different AI providers analyzing the same Mux video asset
- const assetId = "your-mux-asset-id";
+ # Key Features
 
- // OpenAI analysis (default: gpt-5-mini)
- const openaiResult = await getSummaryAndTags(assetId, {
-   provider: "openai",
-   tone: "professional"
- });
-
- // Anthropic analysis (default: claude-sonnet-4-5)
- const anthropicResult = await getSummaryAndTags(assetId, {
-   provider: "anthropic",
-   tone: "professional"
- });
-
- // Google Gemini analysis (default: gemini-2.5-flash)
- const googleResult = await getSummaryAndTags(assetId, {
-   provider: "google",
-   tone: "professional"
- });
-
- // Compare results
- console.log("OpenAI:", openaiResult.title);
- console.log("Anthropic:", anthropicResult.title);
- console.log("Google:", googleResult.title);
- ```
-
- ## Configuration
-
- Set environment variables:
-
- ```bash
- MUX_TOKEN_ID=your_mux_token_id
- MUX_TOKEN_SECRET=your_mux_token_secret
- OPENAI_API_KEY=your_openai_api_key
- ANTHROPIC_API_KEY=your_anthropic_api_key
- GOOGLE_GENERATIVE_AI_API_KEY=your_google_api_key
- ELEVENLABS_API_KEY=your_elevenlabs_api_key
-
- # Signed Playback (for assets with signed playback policies)
- MUX_SIGNING_KEY=your_signing_key_id
- MUX_PRIVATE_KEY=your_base64_encoded_private_key
+ - **Cost-Effective by Default**: Uses affordable frontier models like `gpt-5.1`, `claude-sonnet-4-5`, and `gemini-2.5-flash` to keep analysis costs low while maintaining high-quality results
+ - **Multi-modal Analysis**: Combines storyboard images with video transcripts for richer understanding
+ - **Tone Control**: Choose between normal, sassy, or professional analysis styles for summarization
+ - **Prompt Customization**: Override specific prompt sections to tune workflows to your exact use case
+ - **Configurable Thresholds**: Set custom sensitivity levels for content moderation
+ - **Full TypeScript Support**: Comprehensive types for excellent developer experience and IDE autocomplete
+ - **Provider Flexibility**: Switch between OpenAI, Anthropic, Google, and other providers based on your needs
+ - **Composable Building Blocks**: Use primitives to fetch transcripts, thumbnails, and storyboards for custom workflows
+ - **Universal Language Support**: Automatic language name detection using `Intl.DisplayNames` for all ISO 639-1 codes
+ - **Production Ready**: Built-in retry logic, error handling, and edge case management
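The retry behavior referenced in the features list follows the pattern the image-download options describe elsewhere in this diff (`retryDelay`, `maxRetryDelay`, `exponentialBackoff`). As a standalone sketch of that schedule, illustrative rather than the package's internal implementation:

```typescript
// Compute the wait before each retry attempt, mirroring the documented
// options: a base delay, optional doubling per attempt, and a ceiling.
function retryDelays(
  retries: number,
  baseDelayMs: number,
  maxDelayMs: number,
  exponential: boolean
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    // Double the delay on each attempt when exponential backoff is on,
    // but never exceed the configured ceiling.
    const delay = exponential ? baseDelayMs * 2 ** attempt : baseDelayMs;
    delays.push(Math.min(delay, maxDelayMs));
  }
  return delays;
}

console.log(retryDelays(4, 1000, 10000, true)); // [1000, 2000, 4000, 8000]
```

With the documented defaults (1s base delay, 10s cap), delays grow geometrically and then plateau, which keeps transient failures cheap without hammering a struggling image source.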
372
201
 
373
- # S3-Compatible Storage (required for translation & audio dubbing)
374
- S3_ENDPOINT=https://your-s3-endpoint.com
375
- S3_REGION=auto
376
- S3_BUCKET=your-bucket-name
377
- S3_ACCESS_KEY_ID=your-access-key
378
- S3_SECRET_ACCESS_KEY=your-secret-key
379
- ```
202
+ # Core Concepts
380
203
 
381
- Or pass credentials directly:
204
+ `@mux/ai` is built around two complementary abstractions:
382
205
 
383
- ```typescript
384
- const result = await getSummaryAndTags(assetId, {
385
- muxTokenId: "your-token-id",
386
- muxTokenSecret: "your-token-secret",
387
- openaiApiKey: "your-openai-key",
388
- // For assets with signed playback policies:
389
- muxSigningKey: "your-signing-key-id",
390
- muxPrivateKey: "your-base64-private-key"
391
- });
392
- ```
206
+ ## Workflows
393
207
 
394
- ## API Reference
395
-
396
- ### `getSummaryAndTags(assetId, options?)`
397
-
398
- Analyzes a Mux video asset and returns AI-generated metadata.
399
-
400
- **Parameters:**
401
-
402
- - `assetId` (string) - Mux video asset ID
403
- - `options` (optional) - Configuration options
404
-
405
- **Options:**
406
-
407
- - `provider?: 'openai' | 'anthropic' | 'google'` - AI provider (default: 'openai')
408
- - `tone?: 'normal' | 'sassy' | 'professional'` - Analysis tone (default: 'normal')
409
- - `model?: string` - AI model to use (defaults: `gpt-5-mini`, `claude-sonnet-4-5`, or `gemini-2.5-flash`)
410
- - `includeTranscript?: boolean` - Include video transcript in analysis (default: true)
411
- - `cleanTranscript?: boolean` - Remove VTT timestamps and formatting from transcript (default: true)
412
- - `imageSubmissionMode?: 'url' | 'base64'` - How to submit storyboard to AI providers (default: 'url')
413
- - `imageDownloadOptions?: object` - Options for image download when using base64 mode
414
- - `timeout?: number` - Request timeout in milliseconds (default: 10000)
415
- - `retries?: number` - Maximum retry attempts (default: 3)
416
- - `retryDelay?: number` - Base delay between retries in milliseconds (default: 1000)
417
- - `maxRetryDelay?: number` - Maximum delay between retries in milliseconds (default: 10000)
418
- - `exponentialBackoff?: boolean` - Whether to use exponential backoff (default: true)
419
- - `promptOverrides?: object` - Override specific sections of the prompt for custom use cases
420
- - `task?: string` - Override the main task instruction
421
- - `title?: string` - Override title generation guidance
422
- - `description?: string` - Override description generation guidance
423
- - `keywords?: string` - Override keywords generation guidance
424
- - `qualityGuidelines?: string` - Override quality guidelines
425
- - `muxTokenId?: string` - Mux API token ID
426
- - `muxTokenSecret?: string` - Mux API token secret
427
- - `muxSigningKey?: string` - Signing key ID for signed playback policies
428
- - `muxPrivateKey?: string` - Base64-encoded private key for signed playback policies
429
- - `openaiApiKey?: string` - OpenAI API key
430
- - `anthropicApiKey?: string` - Anthropic API key
431
- - `googleApiKey?: string` - Google Generative AI API key
432
-
433
- **Returns:**
208
+ **Workflows** are functions that handle complete video AI tasks end-to-end. Each workflow orchestrates the entire process: fetching video data from Mux (transcripts, thumbnails, storyboards), formatting it for AI providers, and returning structured results.
434
209
 
435
210
  ```typescript
436
- interface SummaryAndTagsResult {
437
- assetId: string;
438
- title: string; // Short title (max 100 chars)
439
- description: string; // Detailed description
440
- tags: string[]; // Relevant keywords
441
- storyboardUrl: string; // Video storyboard URL
442
- }
443
- ```
444
-
445
- ### `getModerationScores(assetId, options?)`
446
-
447
- Analyzes video thumbnails for inappropriate content using OpenAI's Moderation API or Hive’s visual moderation API.
448
-
449
- **Parameters:**
450
-
451
- - `assetId` (string) - Mux video asset ID
452
- - `options` (optional) - Configuration options
453
-
454
- **Options:**
455
-
456
- - `provider?: 'openai' | 'hive'` - Moderation provider (default: 'openai')
457
- - `model?: string` - OpenAI moderation model to use (default: `omni-moderation-latest`)
458
- - `thresholds?: { sexual?: number; violence?: number }` - Custom thresholds (default: {sexual: 0.7, violence: 0.8})
459
- - `thumbnailInterval?: number` - Seconds between thumbnails for long videos (default: 10)
460
- - `thumbnailWidth?: number` - Thumbnail width in pixels (default: 640)
461
- - `maxConcurrent?: number` - Maximum concurrent API requests (default: 5)
462
- - `imageSubmissionMode?: 'url' | 'base64'` - How to submit images to AI providers (default: 'url')
463
- - `imageDownloadOptions?: object` - Options for image download when using base64 mode
464
- - `timeout?: number` - Request timeout in milliseconds (default: 10000)
465
- - `retries?: number` - Maximum retry attempts (default: 3)
466
- - `retryDelay?: number` - Base delay between retries in milliseconds (default: 1000)
467
- - `maxRetryDelay?: number` - Maximum delay between retries in milliseconds (default: 10000)
468
- - `exponentialBackoff?: boolean` - Whether to use exponential backoff (default: true)
469
- - `muxTokenId/muxTokenSecret?: string` - Mux credentials
470
- - `openaiApiKey?/hiveApiKey?` - Provider credentials
471
-
472
- **Returns:**
211
+ import { getSummaryAndTags } from "@mux/ai/workflows";
473
212
 
474
- ```typescript
475
- {
476
- assetId: string;
477
- thumbnailScores: Array<{ // Individual thumbnail results
478
- url: string;
479
- sexual: number; // 0-1 score
480
- violence: number; // 0-1 score
481
- error: boolean;
482
- }>;
483
- maxScores: { // Highest scores across all thumbnails
484
- sexual: number;
485
- violence: number;
486
- };
487
- exceedsThreshold: boolean; // true if content should be flagged
488
- thresholds: { // Threshold values used
489
- sexual: number;
490
- violence: number;
491
- };
492
- }
213
+ const result = await getSummaryAndTags("asset-id", { provider: "openai" });
493
214
  ```
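A hypothetical sketch of how `maxScores` and `exceedsThreshold` could be derived from the per-thumbnail scores (illustrative helper only, not the package's implementation; whether the comparison is inclusive is an assumption):

```typescript
// Reduce per-thumbnail moderation scores to the maxScores / exceedsThreshold
// fields described above, skipping thumbnails that errored.
interface ThumbnailScore {
  url: string;
  sexual: number;   // 0-1 score
  violence: number; // 0-1 score
  error: boolean;
}

function flagContent(
  scores: ThumbnailScore[],
  thresholds = { sexual: 0.7, violence: 0.8 },
) {
  const valid = scores.filter(s => !s.error);
  const maxScores = {
    sexual: Math.max(0, ...valid.map(s => s.sexual)),
    violence: Math.max(0, ...valid.map(s => s.violence)),
  };
  return {
    maxScores,
    exceedsThreshold:
      maxScores.sexual >= thresholds.sexual
      || maxScores.violence >= thresholds.violence,
    thresholds,
  };
}
```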
494
215
 
495
- ### `hasBurnedInCaptions(assetId, options?)`
496
-
497
- Analyzes video frames to detect burned-in captions (hardcoded subtitles) that are permanently embedded in the video image.
498
-
499
- **Parameters:**
500
-
501
- - `assetId` (string) - Mux video asset ID
502
- - `options` (optional) - Configuration options
216
+ Use workflows when you need battle-tested solutions for common tasks like summarization, content moderation, chapter generation, or translation.
503
217
 
504
- **Options:**
218
+ ## Primitives
505
219
 
506
- - `provider?: 'openai' | 'anthropic' | 'google'` - AI provider (default: 'openai')
507
- - `model?: string` - AI model to use (defaults: `gpt-5-mini`, `claude-sonnet-4-5`, or `gemini-2.5-flash`)
508
- - `imageSubmissionMode?: 'url' | 'base64'` - How to submit storyboard to AI providers (default: 'url')
509
- - `imageDownloadOptions?: object` - Options for image download when using base64 mode
510
- - `timeout?: number` - Request timeout in milliseconds (default: 10000)
511
- - `retries?: number` - Maximum retry attempts (default: 3)
512
- - `retryDelay?: number` - Base delay between retries in milliseconds (default: 1000)
513
- - `maxRetryDelay?: number` - Maximum delay between retries in milliseconds (default: 10000)
514
- - `exponentialBackoff?: boolean` - Whether to use exponential backoff (default: true)
515
- - `muxTokenId?: string` - Mux API token ID
516
- - `muxTokenSecret?: string` - Mux API token secret
517
- - `openaiApiKey?: string` - OpenAI API key
518
- - `anthropicApiKey?: string` - Anthropic API key
519
- - `googleApiKey?: string` - Google Generative AI API key
520
-
521
- **Returns:**
220
+ **Primitives** are low-level building blocks that give you direct access to Mux video data and utilities. They provide functions for fetching transcripts, storyboards, thumbnails, and processing text—perfect for building custom workflows.
522
221
 
523
222
  ```typescript
524
- {
525
- assetId: string;
526
- hasBurnedInCaptions: boolean; // Whether burned-in captions were detected
527
- confidence: number; // Confidence score (0.0-1.0)
528
- detectedLanguage: string | null; // Language of detected captions, or null
529
- storyboardUrl: string; // URL to analyzed storyboard
530
- }
531
- ```
532
-
533
- **Detection Logic:**
534
-
535
- - Analyzes video storyboard frames to identify text overlays
536
- - Distinguishes between actual captions and marketing/end-card text
537
- - Text appearing only in final 1-2 frames is classified as marketing copy
538
- - Caption text must appear across multiple frames throughout the timeline
539
- - Both providers use optimized prompts to minimize false positives
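The "final frames are marketing copy" heuristic can be sketched as a toy classifier (entirely illustrative; the package applies this logic through the model prompt, not local code):

```typescript
// Toy version of the heuristic above: text seen only in the last 1-2 frames
// is treated as marketing/end-card copy, while text that recurs earlier in
// the timeline is treated as captions.
function classifyText(framesWithText: number[], totalFrames: number): "captions" | "marketing" {
  const sustained = framesWithText.filter(i => i < totalFrames - 2);
  return sustained.length >= 2 ? "captions" : "marketing";
}
```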
540
-
541
- ### `translateCaptions(assetId, fromLanguageCode, toLanguageCode, options?)`
542
-
543
- Translates existing captions from one language to another and optionally adds them as a new track to the Mux asset.
544
-
545
- **Parameters:**
546
-
547
- - `assetId` (string) - Mux video asset ID
548
- - `fromLanguageCode` (string) - Source language code (e.g., 'en', 'es', 'fr')
549
- - `toLanguageCode` (string) - Target language code (e.g., 'es', 'fr', 'de')
550
- - `options` (optional) - Configuration options
551
-
552
- **Options:**
553
-
554
- - `provider: 'openai' | 'anthropic' | 'google'` - AI provider (required)
555
- - `model?: string` - Model to use (defaults to the provider's chat-vision model if omitted)
556
- - `uploadToMux?: boolean` - Whether to upload translated track to Mux (default: true)
557
- - `s3Endpoint?: string` - S3-compatible storage endpoint
558
- - `s3Region?: string` - S3 region (default: 'auto')
559
- - `s3Bucket?: string` - S3 bucket name
560
- - `s3AccessKeyId?: string` - S3 access key ID
561
- - `s3SecretAccessKey?: string` - S3 secret access key
562
- - `muxTokenId/muxTokenSecret?: string` - Mux credentials
563
- - `openaiApiKey?/anthropicApiKey?/googleApiKey?` - Provider credentials
564
-
565
- **Returns:**
223
+ import { fetchTranscriptForAsset, getStoryboardUrl } from "@mux/ai/primitives";
566
224
 
567
- ```typescript
568
- interface TranslateCaptionsResult {
569
- assetId: string;
570
- sourceLanguageCode: string;
571
- targetLanguageCode: string;
572
- originalVtt: string; // Original VTT content
573
- translatedVtt: string; // Translated VTT content
574
- uploadedTrackId?: string; // Mux track ID (if uploaded)
575
- presignedUrl?: string; // S3 presigned URL (expires in 1 hour)
576
- }
225
+ const transcript = await fetchTranscriptForAsset("asset-id", "en");
226
+ const storyboard = getStoryboardUrl("playback-id", { width: 640 });
577
227
  ```
578
228
 
579
- **Supported Languages:**
580
- All ISO 639-1 language codes are automatically supported using `Intl.DisplayNames`. Examples: Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Japanese (ja), Korean (ko), Chinese (zh), Russian (ru), Arabic (ar), Hindi (hi), Thai (th), Swahili (sw), and many more.
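Because language names come from `Intl.DisplayNames`, the track-naming behavior can be sketched like this (the helper name is illustrative, not part of the package's API):

```typescript
// Resolve a human-readable language name from an ISO 639-1 code, as the
// translation workflow does when naming tracks like "Spanish (auto-translated)".
const languageNames = new Intl.DisplayNames(["en"], { type: "language" });

function translatedTrackName(code: string): string {
  return `${languageNames.of(code)} (auto-translated)`;
}
```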
581
-
582
- ### `generateChapters(assetId, languageCode, options?)`
583
-
584
- Generates AI-powered chapter markers by analyzing video captions. Creates logical chapter breaks based on topic changes and content transitions.
585
-
586
- **Parameters:**
587
-
588
- - `assetId` (string) - Mux video asset ID
589
- - `languageCode` (string) - Language code for captions (e.g., 'en', 'es', 'fr')
590
- - `options` (optional) - Configuration options
591
-
592
- **Options:**
229
+ Use primitives when you need complete control over your AI prompts or want to build custom workflows not covered by the pre-built options.
593
230
 
594
- - `provider?: 'openai' | 'anthropic' | 'google'` - AI provider (default: 'openai')
595
- - `model?: string` - AI model to use (defaults: `gpt-5-mini`, `claude-sonnet-4-5`, or `gemini-2.5-flash`)
596
- - `muxTokenId?: string` - Mux API token ID
597
- - `muxTokenSecret?: string` - Mux API token secret
598
- - `openaiApiKey?: string` - OpenAI API key
599
- - `anthropicApiKey?: string` - Anthropic API key
600
- - `googleApiKey?: string` - Google Generative AI API key
601
-
602
- **Returns:**
231
+ ## Package Structure
603
232
 
604
233
  ```typescript
605
- {
606
- assetId: string;
607
- languageCode: string;
608
- chapters: Array<{
609
- startTime: number; // Chapter start time in seconds
610
- title: string; // Descriptive chapter title
611
- }>;
612
- }
613
- ```
614
-
615
- **Requirements:**
616
-
617
- - Asset must have caption track in the specified language
618
- - Caption track must be in 'ready' status
619
- - Uses existing auto-generated or uploaded captions
234
+ // Import workflows
235
+ import { generateChapters } from "@mux/ai/workflows";
620
236
 
621
- **Example Output:**
237
+ // Import primitives
238
+ import { fetchTranscriptForAsset } from "@mux/ai/primitives";
622
239
 
623
- ```javascript
624
- // Perfect format for Mux Player
625
- player.addChapters([
626
- { startTime: 0, title: "Introduction and Setup" },
627
- { startTime: 45, title: "Main Content Discussion" },
628
- { startTime: 120, title: "Conclusion" }
629
- ]);
240
+ // Or import everything
241
+ import { workflows, primitives } from "@mux/ai";
630
242
  ```
631
243
 
632
- ### `translateAudio(assetId, toLanguageCode, options?)`
244
+ # Credentials
633
245
 
634
- Creates AI-dubbed audio tracks from existing video content using ElevenLabs voice cloning and translation. Uses the default audio track on your asset; the source language is auto-detected.
246
+ You'll need to set up credentials for Mux as well as any AI provider you want to use for a particular workflow. Some workflows also need access to other cloud services (e.g. S3-compatible storage for uploads).
635
247
 
636
- **Parameters:**
248
+ ## Credentials - Mux
637
249
 
638
- - `assetId` (string) - Mux video asset ID (must have audio.m4a static rendition)
639
- - `toLanguageCode` (string) - Target language code (e.g., 'es', 'fr', 'de')
640
- - `options` (optional) - Configuration options
250
+ ### Access Token (required)
641
251
 
642
- **Options:**
252
+ All workflows require a Mux API access token to interact with your video assets. If you're already logged into the dashboard, you can [create a new access token here](https://dashboard.mux.com/settings/access-tokens).
643
253
 
644
- - `provider?: 'elevenlabs'` - AI provider (default: 'elevenlabs')
645
- - `numSpeakers?: number` - Number of speakers (default: 0 for auto-detect)
646
- - `uploadToMux?: boolean` - Whether to upload dubbed track to Mux (default: true)
647
- - `s3Endpoint?: string` - S3-compatible storage endpoint
648
- - `s3Region?: string` - S3 region (default: 'auto')
649
- - `s3Bucket?: string` - S3 bucket name
650
- - `s3AccessKeyId?: string` - S3 access key ID
651
- - `s3SecretAccessKey?: string` - S3 secret access key
652
- - `elevenLabsApiKey?: string` - ElevenLabs API key
653
- - `muxTokenId/muxTokenSecret?: string` - API credentials
254
+ **Required Permissions:**
255
+ - **Mux Video**: Read + Write access
256
+ - **Mux Data**: Read access
654
257
 
655
- **Returns:**
258
+ These permissions cover all current workflows. You can set these when creating your token in the dashboard.
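The token can then be exposed to the library via environment variables (the variable names used throughout the package's examples):

```bash
MUX_TOKEN_ID=your_token_id
MUX_TOKEN_SECRET=your_token_secret
```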
656
259
 
657
- ```typescript
658
- interface TranslateAudioResult {
659
- assetId: string;
660
- targetLanguageCode: string;
661
- dubbingId: string; // ElevenLabs dubbing job ID
662
- uploadedTrackId?: string; // Mux audio track ID (if uploaded)
663
- presignedUrl?: string; // S3 presigned URL (expires in 1 hour)
664
- }
665
- ```
666
-
667
- **Requirements:**
668
-
669
- - Asset must have an `audio.m4a` static rendition
670
- - ElevenLabs API key with Creator plan or higher
671
- - S3-compatible storage for Mux ingestion
260
+ > **💡 Tip:** For security reasons, consider creating a dedicated access token specifically for your AI workflows rather than reusing existing tokens.
672
261
 
673
- **Supported Languages:**
674
- ElevenLabs supports 32+ languages with automatic language name detection via `Intl.DisplayNames`. Supported languages include English, Spanish, French, German, Italian, Portuguese, Polish, Japanese, Korean, Chinese, Russian, Arabic, Hindi, Thai, and many more. Track names are automatically generated (e.g., "Polish (auto-dubbed)").
262
+ ### Signing Key (conditionally required)
675
263
 
676
- ### Custom Prompts with `promptOverrides`
264
+ If your Mux assets use [signed playback URLs](https://docs.mux.com/guides/secure-video-playback) for security, you'll need to provide signing credentials so `@mux/ai` can access the video data.
677
265
 
678
- Customize specific sections of the summarization prompt for different use cases like SEO, social media, or technical analysis.
266
+ **When needed:** Only if your assets have signed playback policies enabled and no public playback ID.
679
267
 
680
- **Tip:** Before adding overrides, read through the default summarization prompt template in `src/functions/summarization.ts` (the `summarizationPromptBuilder` config) so that you have clear context on what each section does and what you’re changing.
681
-
682
- ```typescript
683
- import { getSummaryAndTags } from "@mux/ai/workflows";
268
+ **How to get:**
269
+ 1. Go to [Settings > Signing Keys](https://dashboard.mux.com/settings/signing-keys) in your Mux dashboard
270
+ 2. Create a new signing key or use an existing one
271
+ 3. Save both the **Signing Key ID** and the **Base64-encoded Private Key**
684
272
 
685
- // SEO-optimized metadata
686
- const seoResult = await getSummaryAndTags(assetId, {
687
- tone: "professional",
688
- promptOverrides: {
689
- task: "Generate SEO-optimized metadata that maximizes discoverability.",
690
- title: "Create a search-optimized title (50-60 chars) with primary keyword front-loaded.",
691
- keywords: "Focus on high search volume terms and long-tail keywords.",
692
- },
693
- });
694
-
695
- // Social media optimized for engagement
696
- const socialResult = await getSummaryAndTags(assetId, {
697
- promptOverrides: {
698
- title: "Create a scroll-stopping headline using emotional triggers or curiosity gaps.",
699
- description: "Write shareable copy that creates FOMO and works without watching the video.",
700
- keywords: "Generate hashtag-ready keywords for trending and niche community tags.",
701
- },
702
- });
703
-
704
- // Technical/production analysis
705
- const technicalResult = await getSummaryAndTags(assetId, {
706
- tone: "professional",
707
- promptOverrides: {
708
- task: "Analyze cinematography, lighting, and production techniques.",
709
- title: "Describe the production style or filmmaking technique.",
710
- description: "Provide a technical breakdown of camera work, lighting, and editing.",
711
- keywords: "Use industry-standard production terminology.",
712
- },
713
- });
273
+ **Configuration:**
274
+ ```bash
275
+ MUX_SIGNING_KEY=your_signing_key_id
276
+ MUX_PRIVATE_KEY=your_base64_encoded_private_key
714
277
  ```
715
278
 
716
- **Available override sections:**
717
- | Section | Description |
718
- |---------|-------------|
719
- | `task` | Main instruction for what to analyze |
720
- | `title` | Guidance for generating the title |
721
- | `description` | Guidance for generating the description |
722
- | `keywords` | Guidance for generating keywords/tags |
723
- | `qualityGuidelines` | General quality instructions |
279
+ ## Credentials - AI Providers
724
280
 
725
- Each override can be a simple string (replaces the section content) or a full `PromptSection` object for advanced control over XML tag names and attributes.
281
+ Different workflows support various AI providers. You only need to configure API keys for the providers you plan to use.
726
282
 
727
- ## Examples
283
+ ### OpenAI
728
284
 
729
- See the `examples/` directory for complete working examples.
285
+ **Used by:** `getSummaryAndTags`, `getModerationScores`, `hasBurnedInCaptions`, `generateChapters`, `generateVideoEmbeddings`, `translateCaptions`
730
286
 
731
- **Prerequisites:**
732
- Create a `.env` file in the project root with your API credentials:
287
+ **Get your API key:** [OpenAI API Keys](https://platform.openai.com/api-keys)
733
288
 
734
289
  ```bash
735
- MUX_TOKEN_ID=your_token_id
736
- MUX_TOKEN_SECRET=your_token_secret
737
- OPENAI_API_KEY=your_openai_key
738
- ANTHROPIC_API_KEY=your_anthropic_key
739
- GOOGLE_GENERATIVE_AI_API_KEY=your_google_key
740
- HIVE_API_KEY=your_hive_key # required for Hive moderation runs
290
+ OPENAI_API_KEY=your_openai_api_key
741
291
  ```
742
292
 
743
- All examples automatically load environment variables using `dotenv`.
293
+ ### Anthropic
744
294
 
745
- ### Quick Start (Run from Root)
295
+ **Used by:** `getSummaryAndTags`, `hasBurnedInCaptions`, `generateChapters`, `translateCaptions`
746
296
 
747
- You can run examples directly from the project root without installing dependencies in each example folder:
297
+ **Get your API key:** [Anthropic Console](https://console.anthropic.com/)
748
298
 
749
299
  ```bash
750
- # Chapters
751
- npm run example:chapters <asset-id> [language-code] [provider]
752
- npm run example:chapters:compare <asset-id> [language-code]
753
-
754
- # Burned-in Caption Detection
755
- npm run example:burned-in <asset-id> [provider]
756
- npm run example:burned-in:compare <asset-id>
757
-
758
- # Summarization
759
- npm run example:summarization <asset-id> [provider]
760
- npm run example:summarization:compare <asset-id>
761
-
762
- # Moderation
763
- npm run example:moderation <asset-id> [provider]
764
- npm run example:moderation:compare <asset-id>
765
-
766
- # Caption Translation
767
- npm run example:translate-captions <asset-id> [from-lang] [to-lang] [provider]
768
-
769
- # Audio Translation (Dubbing)
770
- npm run example:translate-audio <asset-id> [to-lang]
771
-
772
- # Signed Playback (for assets with signed playback policies)
773
- npm run example:signed-playback <signed-asset-id>
774
- npm run example:signed-playback:summarize <signed-asset-id> [provider]
300
+ ANTHROPIC_API_KEY=your_anthropic_api_key
775
301
  ```
776
302
 
777
- **Examples:**
778
-
779
- ```bash
780
- # Generate chapters with OpenAI
781
- npm run example:chapters abc123 en openai
782
-
783
- # Detect burned-in captions with Anthropic
784
- npm run example:burned-in abc123 anthropic
785
-
786
- # Compare OpenAI vs Anthropic chapter generation
787
- npm run example:chapters:compare abc123 en
303
+ ### Google Generative AI
788
304
 
789
- # Run moderation analysis with Hive
790
- npm run example:moderation abc123 hive
305
+ **Used by:** `getSummaryAndTags`, `hasBurnedInCaptions`, `generateChapters`, `generateVideoEmbeddings`, `translateCaptions`
791
306
 
792
- # Translate captions from English to Spanish with Anthropic (default)
793
- npm run example:translate-captions abc123 en es anthropic
794
-
795
- # Summarize a video with Claude Sonnet 4.5 (default)
796
- npm run example:summarization abc123 anthropic
797
-
798
- # Create AI-dubbed audio in French
799
- npm run example:translate-audio abc123 fr
800
- ```
801
-
802
- ### Summarization Examples
803
-
804
- - **Basic Usage**: Default prompt with different tones
805
- - **Custom Prompts**: Override prompt sections with presets (SEO, social, technical, ecommerce)
806
- - **Tone Variations**: Compare analysis styles
307
+ **Get your API key:** [Google AI Studio](https://aistudio.google.com/app/apikey)
807
308
 
808
309
  ```bash
809
- cd examples/summarization
810
- npm install
811
- npm run basic <your-asset-id> [provider]
812
- npm run tones <your-asset-id>
813
-
814
- # Custom prompts with presets
815
- npm run custom <your-asset-id> --preset seo
816
- npm run custom <your-asset-id> --preset social
817
- npm run custom <your-asset-id> --preset technical
818
- npm run custom <your-asset-id> --preset ecommerce
819
-
820
- # Or provide individual overrides
821
- npm run custom <your-asset-id> --task "Focus on product features"
310
+ GOOGLE_GENERATIVE_AI_API_KEY=your_google_api_key
822
311
  ```
823
312
 
824
- ### Moderation Examples
825
-
826
- - **Basic Moderation**: Analyze content with default thresholds
827
- - **Custom Thresholds**: Compare strict/default/permissive settings
828
- - **Provider Comparison**: Compare OpenAI’s dedicated Moderation API with Hive’s visual moderation API
829
-
830
- ```bash
831
- cd examples/moderation
832
- npm install
833
- npm run basic <your-asset-id> [provider] # provider: openai | hive
834
- npm run thresholds <your-asset-id>
835
- npm run compare <your-asset-id>
836
- ```
313
+ ### ElevenLabs
837
314
 
838
- Supported moderation providers: `openai` (default) and `hive`. Use `HIVE_API_KEY` when selecting Hive.
315
+ **Used by:** `translateAudio` (audio dubbing)
839
316
 
840
- ### Burned-in Caption Examples
317
+ **Get your API key:** [ElevenLabs API Keys](https://elevenlabs.io/app/settings/api-keys)
841
318
 
842
- - **Basic Detection**: Detect burned-in captions with different AI providers
843
- - **Provider Comparison**: Compare OpenAI vs Anthropic vs Google detection accuracy
319
+ **Note:** Requires a Creator plan or higher for dubbing features.
844
320
 
845
321
  ```bash
846
- cd examples/burned-in-captions
847
- npm install
848
- npm run burned-in:basic <your-asset-id> [provider]
849
- npm run compare <your-asset-id>
322
+ ELEVENLABS_API_KEY=your_elevenlabs_api_key
850
323
  ```
851
324
 
852
- ### Chapter Generation Examples
853
-
854
- - **Basic Chapters**: Generate chapters with different AI providers
855
- - **Provider Comparison**: Compare OpenAI vs Anthropic vs Google chapter generation
856
-
857
- ```bash
858
- cd examples/chapters
859
- npm install
860
- npm run chapters:basic <your-asset-id> [language-code] [provider]
861
- npm run compare <your-asset-id> [language-code]
862
- ```
325
+ ### Hive
863
326
 
864
- ### Caption Translation Examples
327
+ **Used by:** `getModerationScores` (alternative to OpenAI moderation)
865
328
 
866
- - **Basic Translation**: Translate captions and upload to Mux
867
- - **Translation Only**: Translate without uploading to Mux
329
+ **Get your API key:** [Hive Console](https://thehive.ai/)
868
330
 
869
331
  ```bash
870
- cd examples/translate-captions
871
- npm install
872
- npm run basic <your-asset-id> en es [provider]
873
- npm run translation-only <your-asset-id> en fr [provider]
332
+ HIVE_API_KEY=your_hive_api_key
874
333
  ```
875
334
 
876
- **Translation Workflow:**
335
+ ## Credentials - Cloud Infrastructure
877
336
 
878
- 1. Fetches existing captions from Mux asset
879
- 2. Translates VTT content using your selected provider (default: Claude Sonnet 4.5)
880
- 3. Uploads translated VTT to S3-compatible storage
881
- 4. Generates presigned URL (1-hour expiry)
882
- 5. Adds new subtitle track to Mux asset
883
- 6. Track name: "{Language} (auto-translated)"
337
+ ### AWS S3 (or S3-compatible storage)
884
338
 
885
- ### Audio Dubbing Examples
339
+ **Required for:** `translateCaptions`, `translateAudio` (only if `uploadToMux` is true, which is the default)
886
340
 
887
- - **Basic Dubbing**: Create AI-dubbed audio and upload to Mux
888
- - **Dubbing Only**: Create dubbed audio without uploading to Mux
341
+ Translation workflows need temporary storage to upload translated files before attaching them to your Mux assets. Any S3-compatible storage service works (AWS S3, Cloudflare R2, DigitalOcean Spaces, etc.).
889
342
 
343
+ **AWS S3 Setup:**
344
+ 1. [Create an S3 bucket](https://s3.console.aws.amazon.com/s3/home)
345
+ 2. [Create an IAM user](https://console.aws.amazon.com/iam/) with programmatic access
346
+ 3. Attach a policy with `s3:PutObject`, `s3:GetObject`, and `s3:PutObjectAcl` permissions for your bucket
347
+
348
+ **Configuration:**
890
349
  ```bash
891
- cd examples/translate-audio
892
- npm install
893
- npm run basic <your-asset-id> es
894
- npm run dubbing-only <your-asset-id> fr
350
+ S3_ENDPOINT=https://s3.amazonaws.com # Or your S3-compatible endpoint
351
+ S3_REGION=us-east-1 # Your bucket region
352
+ S3_BUCKET=your-bucket-name
353
+ S3_ACCESS_KEY_ID=your-access-key
354
+ S3_SECRET_ACCESS_KEY=your-secret-key
895
355
  ```
896
356
 
897
- **Audio Dubbing Workflow:**
898
-
899
- 1. Checks asset has audio.m4a static rendition
900
- 2. Downloads default audio track from Mux
901
- 3. Creates ElevenLabs dubbing job with automatic language detection
902
- 4. Polls for completion (up to 30 minutes)
903
- 5. Downloads dubbed audio file
904
- 6. Uploads to S3-compatible storage
905
- 7. Generates presigned URL (1-hour expiry)
906
- 8. Adds new audio track to Mux asset
907
- 9. Track name: "{Language} (auto-dubbed)"
908
-
909
- ### Signed Playback Examples
910
-
911
- - **URL Generation Test**: Verify signed URLs work for storyboards, thumbnails, and transcripts
912
- - **Signed Summarization**: Full summarization workflow with a signed asset
913
-
357
+ **Cloudflare R2 Example:**
914
358
  ```bash
915
- cd examples/signed-playback
916
- npm install
917
-
918
- # Verify signed URL generation
919
- npm run basic <signed-asset-id>
920
-
921
- # Summarize a signed asset
922
- npm run summarize <signed-asset-id> [provider]
359
+ S3_ENDPOINT=https://your-account-id.r2.cloudflarestorage.com
360
+ S3_REGION=auto
361
+ S3_BUCKET=your-bucket-name
362
+ S3_ACCESS_KEY_ID=your-r2-access-key
363
+ S3_SECRET_ACCESS_KEY=your-r2-secret-key
923
364
  ```
924
365
 
925
- **Prerequisites:**
926
-
927
- 1. Create a Mux asset with `playback_policy: "signed"`
928
- 2. Create a signing key in Mux Dashboard → Settings → Signing Keys
929
- 3. Set `MUX_SIGNING_KEY` and `MUX_PRIVATE_KEY` environment variables
930
-
931
- **How Signed Playback Works:**
932
- When you provide signing credentials, the library automatically:
366
+ # Documentation
933
367
 
934
- - Detects if an asset has a signed playback policy
935
- - Generates JWT tokens with the RS256 algorithm
936
- - Uses the correct `aud` claim for each asset type (video, thumbnail, storyboard)
937
- - Appends tokens to URLs as query parameters
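The token payload can be sketched as follows (unsigned here; the library signs with RS256 and appends the result to the URL; the single-letter `aud` codes are an assumption about Mux's audience values, not taken from this package):

```typescript
// Illustrative shape of the JWT claims for a signed playback URL, with a
// per-asset-type audience claim as described above.
type AssetKind = "video" | "thumbnail" | "storyboard";

const AUD: Record<AssetKind, string> = {
  video: "v",
  thumbnail: "t",
  storyboard: "s",
};

function playbackClaims(playbackId: string, kind: AssetKind, ttlSeconds = 600) {
  return {
    sub: playbackId,                                // playback ID the token covers
    aud: AUD[kind],                                 // audience per asset type
    exp: Math.floor(Date.now() / 1000) + ttlSeconds, // expiry as a Unix timestamp
  };
}
```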
368
+ ## Full Documentation
938
369
 
939
- ## S3-Compatible Storage
370
+ - **[Workflows Guide](./docs/WORKFLOWS.md)** - Detailed guide to each pre-built workflow with examples
371
+ - **[API Reference](./docs/API.md)** - Complete API documentation for all functions, parameters, and return types
372
+ - **[Primitives Guide](./docs/PRIMITIVES.md)** - Low-level building blocks for custom workflows
373
+ - **[Examples](./docs/EXAMPLES.md)** - Running examples from the repository
940
374
 
941
- The translation feature requires S3-compatible storage to temporarily host VTT files for Mux ingestion. Supported providers include:
375
+ ## Additional Resources
942
376
 
943
- - **AWS S3** - Amazon's object storage
944
- - **DigitalOcean Spaces** - S3-compatible with CDN
945
- - **Cloudflare R2** - Zero egress fees
946
- - **MinIO** - Self-hosted S3 alternative
947
- - **Backblaze B2** - Cost-effective storage
948
- - **Wasabi** - Hot cloud storage
377
+ - **[Mux Video API Docs](https://docs.mux.com/guides/video)** - Learn about Mux Video features
378
+ - **[Auto-generated Captions](https://www.mux.com/docs/guides/add-autogenerated-captions-and-use-transcripts)** - Enable transcripts for your assets
379
+ - **[GitHub Repository](https://github.com/muxinc/ai)** - Source code, issues, and contributions
380
+ - **[npm Package](https://www.npmjs.com/package/@mux/ai)** - Package page and version history
949
381
 
950
- **Why S3 Storage?**
951
- Mux requires a publicly accessible URL to ingest subtitle tracks. The translation workflow:
382
+ # Contributing
952
383
 
953
- 1. Uploads translated VTT to your S3 storage
954
- 2. Generates a presigned URL for secure access
955
- 3. Mux fetches the file using the presigned URL
956
- 4. File remains in your storage for future use
384
+ We welcome contributions! Whether you're fixing bugs, adding features, or improving documentation, we'd love your help.
957
385
 
958
- ## Development
959
-
960
- ### Setup
961
-
962
- ```bash
963
- # Clone repo and install dependencies
964
- git clone https://github.com/muxinc/mux-ai.git
965
- cd mux-ai
966
- npm install # Automatically sets up git hooks via Husky
967
- ```
386
+ Please see our **[Contributing Guide](./CONTRIBUTING.md)** for details on:
968
387
 
969
- ### Style
388
+ - Setting up your development environment
389
+ - Running examples and tests
390
+ - Code style and conventions
391
+ - Submitting pull requests
392
+ - Reporting issues
970
393
 
971
- This project uses automated tooling to enforce consistent code style:
972
-
973
- - **ESLint** with `@antfu/eslint-config` for linting and formatting
974
- - **TypeScript** strict mode for type safety
975
- - **Pre-commit hooks** that run automatically before each commit
976
-
977
- ```bash
978
- # Check for linting issues
979
- npm run lint
980
-
981
- # Auto-fix linting issues
982
- npm run lint:fix
983
-
984
- # Run type checking
985
- npm run typecheck
986
-
987
- # Run tests
988
- npm test
989
- ```
394
+ For questions or discussions, feel free to [open an issue](https://github.com/muxinc/ai/issues).
990
395
 
991
396
  ## License
992
397