@mux/ai 0.1.4 → 0.1.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +81 -891
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -1,18 +1,24 @@
- # @mux/ai
+ # @mux/ai 📼 🤝 🤖
 
- A set of tools for connecting videos in your Mux account to multi-modal LLMs.
+ A TypeScript library for connecting videos in your Mux account to multi-modal LLMs.
+
+ `@mux/ai` contains two abstractions:
+
+ **Workflows** are production-ready functions that handle common video<->LLM tasks. Each workflow orchestrates the entire process: fetching video data from Mux (transcripts, thumbnails, storyboards), formatting it for AI providers, and returning structured results. Use workflows when you need battle-tested solutions for tasks like summarization, content moderation, chapter generation, or translation.
+
+ **Primitives** are the low-level building blocks from which workflows are composed. They provide direct access to Mux video data (transcripts, storyboards, thumbnails) and utilities for chunking and processing text. Use primitives when you need complete control over your AI prompts or want to build custom workflows not covered by the pre-built options.
 
  ## Available pre-built workflows
 
- | Workflow | Description | Providers | Default Models | Input | Output |
- | ------------------------- | --------------------------------------------------------------- | ------------------------- | ------------------------------------------------------------------ | -------------------------------- | ---------------------------------------------- |
- | `getSummaryAndTags` | Generate titles, descriptions, and tags from a Mux video asset | OpenAI, Anthropic, Google | `gpt-5-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` | Asset ID + options | Title, description, tags, storyboard URL |
- | `getModerationScores` | Analyze video thumbnails for inappropriate content | OpenAI, Hive | `omni-moderation-latest` (OpenAI) or Hive visual moderation task | Asset ID + thresholds | Sexual/violence scores, flagged status |
- | `hasBurnedInCaptions` | Detect burned-in captions (hardcoded subtitles) in video frames | OpenAI, Anthropic, Google | `gpt-5-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` | Asset ID + options | Boolean result, confidence, language |
- | `generateChapters` | Generate AI-powered chapter markers from video captions | OpenAI, Anthropic, Google | `gpt-5-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` | Asset ID + language + options | Timestamped chapter list, ready for Mux Player |
- | `generateVideoEmbeddings` | Generate vector embeddings for video transcript chunks | OpenAI, Google | `text-embedding-3-small` (OpenAI), `gemini-embedding-001` (Google) | Asset ID + chunking strategy | Chunk embeddings + averaged embedding |
- | `translateCaptions` | Translate video captions to different languages | OpenAI, Anthropic, Google | `gpt-5-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` | Asset ID + languages + S3 config | Translated VTT + Mux track ID |
- | `translateAudio` | Create AI-dubbed audio tracks in different languages | ElevenLabs only | ElevenLabs Dubbing API | Asset ID + languages + S3 config | Dubbed audio + Mux track ID |
+ | Workflow | Description | Providers | Default Models |
+ | ------------------------------------------------------------------------ | ----------------------------------------------------------------- | ------------------------- | ------------------------------------------------------------------ |
+ | [`getSummaryAndTags`](./docs/WORKFLOWS.md#video-summarization) | Generate titles, descriptions, and tags for an asset | OpenAI, Anthropic, Google | `gpt-5-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` |
+ | [`getModerationScores`](./docs/WORKFLOWS.md#content-moderation) | Detect inappropriate (sexual or violent) content in an asset | OpenAI, Hive | `omni-moderation-latest` (OpenAI) or Hive visual moderation task |
+ | [`hasBurnedInCaptions`](./docs/WORKFLOWS.md#burned-in-caption-detection) | Detect burned-in captions (hardcoded subtitles) in an asset | OpenAI, Anthropic, Google | `gpt-5-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` |
+ | [`generateChapters`](./docs/WORKFLOWS.md#chapter-generation) | Generate chapter markers for an asset using the transcript | OpenAI, Anthropic, Google | `gpt-5-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` |
+ | [`generateVideoEmbeddings`](./docs/WORKFLOWS.md#video-embeddings) | Generate vector embeddings for an asset's transcript chunks | OpenAI, Google | `text-embedding-3-small` (OpenAI), `gemini-embedding-001` (Google) |
+ | [`translateCaptions`](./docs/WORKFLOWS.md#caption-translation) | Translate an asset's captions into different languages | OpenAI, Anthropic, Google | `gpt-5-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` |
+ | [`translateAudio`](./docs/WORKFLOWS.md#audio-dubbing) | Create AI-dubbed audio tracks in different languages for an asset | ElevenLabs only | ElevenLabs Dubbing API |
 
  ## Features
 
@@ -26,349 +32,32 @@ A set of tools for connecting videos in your Mux account to multi-modal LLMs.
  - **Composable Building Blocks**: Import primitives to fetch transcripts, thumbnails, and storyboards to build bespoke flows
  - **Universal Language Support**: Automatic language name detection using `Intl.DisplayNames` for all ISO 639-1 codes
 
- ## Package Structure
-
- This package ships with layered entry points so you can pick the right level of abstraction for your workflow:
-
- - `@mux/ai/workflows` – opinionated, production-ready helpers (`getSummaryAndTags`, `generateChapters`, `translateCaptions`, etc.) that orchestrate Mux API access, transcript/storyboard gathering, and the AI provider call.
- - `@mux/ai/primitives` – low-level building blocks such as `fetchTranscriptForAsset`, `getStoryboardUrl`, and `getThumbnailUrls`. Use these when you need to mix our utilities into your own prompts or custom workflows.
- - `@mux/ai` – re-exports both namespaces, plus shared `types`, so you can also write `import { workflows, primitives } from '@mux/ai';`.
-
- Every helper inside `@mux/ai/workflows` is composed from the primitives. That means you can start with a high-level workflow and gradually drop down to primitives whenever you need more control.
-
- ```typescript
- import { fetchTranscriptForAsset, getStoryboardUrl } from "@mux/ai/primitives";
- import { getModerationScores, getSummaryAndTags } from "@mux/ai/workflows";
-
- // Compose high-level workflows into a custom flow
- export async function summarizeIfSafe(assetId: string) {
-   const moderation = await getModerationScores(assetId, { provider: "openai" });
-   if (moderation.exceedsThreshold) {
-     throw new Error("Asset failed content safety review");
-   }
-
-   return getSummaryAndTags(assetId, {
-     provider: "anthropic",
-     tone: "professional",
-   });
- }
-
- // Or drop down to primitives to build bespoke AI workflows
- export async function customTranscriptAnalysis(assetId: string, playbackId: string) {
-   const transcript = await fetchTranscriptForAsset(assetId, "en");
-   const storyboardUrl = getStoryboardUrl(playbackId);
-
-   // Use these primitives in your own AI prompts or custom logic
-   return { transcript, storyboardUrl };
- }
- ```
-
- Use whichever layer makes sense: call a workflow as-is, compose multiple workflows together, or drop down to primitives to build a completely custom workflow.
-
  ## Installation
 
  ```bash
  npm install @mux/ai
  ```
 
- ## Quick Start
-
- ### Video Summarization
-
- ```typescript
- import { getSummaryAndTags } from "@mux/ai/workflows";
-
- // Uses built-in optimized prompt
- const result = await getSummaryAndTags("your-mux-asset-id", {
-   tone: "professional"
- });
-
- console.log(result.title); // Short, descriptive title
- console.log(result.description); // Detailed description
- console.log(result.tags); // Array of relevant keywords
- console.log(result.storyboardUrl); // URL to Mux storyboard
-
- // Use base64 mode for improved reliability (works with OpenAI, Anthropic, and Google)
- const reliableResult = await getSummaryAndTags("your-mux-asset-id", {
-   provider: "anthropic",
-   imageSubmissionMode: "base64", // Downloads storyboard locally before submission
-   imageDownloadOptions: {
-     timeout: 15000,
-     retries: 2,
-     retryDelay: 1000
-   },
-   tone: "professional"
- });
-
- // Customize for specific use cases with promptOverrides
- const seoResult = await getSummaryAndTags("your-mux-asset-id", {
-   promptOverrides: {
-     task: "Generate SEO-optimized metadata for search engines.",
-     title: "Create a search-optimized title (50-60 chars) with primary keyword.",
-     keywords: "Focus on high search volume and long-tail keywords.",
-   },
- });
- ```
-
- ### Content Moderation
-
- ```typescript
- import { getModerationScores } from "@mux/ai/workflows";
-
- // Analyze a Mux video asset for inappropriate content (OpenAI default)
- const result = await getModerationScores("your-mux-asset-id", {
-   thresholds: { sexual: 0.7, violence: 0.8 }
- });
-
- console.log(result.maxScores); // Highest scores across all thumbnails
- console.log(result.exceedsThreshold); // true if content should be flagged
- console.log(result.thumbnailScores); // Individual thumbnail results
-
- // Run the same analysis using Hive's visual moderation API
- const hiveResult = await getModerationScores("your-mux-asset-id", {
-   provider: "hive",
-   thresholds: { sexual: 0.9, violence: 0.9 },
- });
-
- // Use base64 submission for improved reliability with OpenAI (downloads images locally)
- const reliableResult = await getModerationScores("your-mux-asset-id", {
-   provider: "openai",
-   imageSubmissionMode: "base64",
-   imageDownloadOptions: {
-     timeout: 15000,
-     retries: 3,
-     retryDelay: 1000
-   }
- });
- ```
-
- ### Burned-in Caption Detection
-
- ```typescript
- import { hasBurnedInCaptions } from "@mux/ai/workflows";
-
- // Detect burned-in captions (hardcoded subtitles) in video frames
- const result = await hasBurnedInCaptions("your-mux-asset-id", {
-   provider: "openai"
- });
-
- console.log(result.hasBurnedInCaptions); // true/false
- console.log(result.confidence); // 0.0-1.0 confidence score
- console.log(result.detectedLanguage); // Language if captions detected
- console.log(result.storyboardUrl); // Video storyboard analyzed
-
- // Compare providers
- const anthropicResult = await hasBurnedInCaptions("your-mux-asset-id", {
-   provider: "anthropic",
-   model: "claude-sonnet-4-5"
- });
-
- const googleResult = await hasBurnedInCaptions("your-mux-asset-id", {
-   provider: "google",
-   model: "gemini-2.5-flash"
- });
-
- // Use base64 mode for improved reliability
- const reliableResult = await hasBurnedInCaptions("your-mux-asset-id", {
-   provider: "openai",
-   imageSubmissionMode: "base64",
-   imageDownloadOptions: {
-     timeout: 15000,
-     retries: 3,
-     retryDelay: 1000
-   }
- });
- ```
-
- #### Image Submission Modes
-
- Choose between two methods for submitting images to AI providers:
-
- **URL Mode (Default):**
-
- - Fast initial response
- - Lower bandwidth usage
- - Relies on the AI provider's image downloading
- - May encounter timeouts with slow/unreliable image sources
-
- **Base64 Mode (Recommended for Production):**
-
- - Downloads images locally with robust retry logic
- - Eliminates AI provider timeout issues
- - Better control over slow TTFB and network issues
- - Slightly higher bandwidth usage but more reliable results
- - For OpenAI: submits images as base64 data URIs
- - For Anthropic/Google: the AI SDK handles converting the base64 payload into the provider-specific format automatically
-
- ```typescript
- // High reliability mode - recommended for production
- const result = await getModerationScores(assetId, {
-   imageSubmissionMode: "base64",
-   imageDownloadOptions: {
-     timeout: 15000, // 15s timeout per image
-     retries: 3, // Retry failed downloads 3x
-     retryDelay: 1000, // 1s base delay with exponential backoff
-     exponentialBackoff: true
-   }
- });
- ```
-
- ### Caption Translation
-
- ```typescript
- import { translateCaptions } from "@mux/ai/workflows";
-
- // Translate existing captions to Spanish and add them as a new track
- const result = await translateCaptions(
-   "your-mux-asset-id",
-   "en", // from language
-   "es", // to language
-   {
-     provider: "google",
-     model: "gemini-2.5-flash"
-   }
- );
-
- console.log(result.uploadedTrackId); // New Mux track ID
- console.log(result.presignedUrl); // S3 file URL
- console.log(result.translatedVtt); // Translated VTT content
- ```
-
- ### Video Chapters
-
- ```typescript
- import { generateChapters } from "@mux/ai/workflows";
-
- // Generate AI-powered chapters from video captions
- const result = await generateChapters("your-mux-asset-id", "en", {
-   provider: "openai"
- });
-
- console.log(result.chapters); // Array of {startTime: number, title: string}
-
- // Use with Mux Player
- const player = document.querySelector("mux-player");
- player.addChapters(result.chapters);
-
- // Compare providers
- const anthropicResult = await generateChapters("your-mux-asset-id", "en", {
-   provider: "anthropic",
-   model: "claude-sonnet-4-5"
- });
-
- const googleResult = await generateChapters("your-mux-asset-id", "en", {
-   provider: "google",
-   model: "gemini-2.5-flash"
- });
- ```
-
- ### Video Embeddings
-
- ```typescript
- import { generateVideoEmbeddings } from "@mux/ai/workflows";
-
- // Generate embeddings for semantic video search
- const result = await generateVideoEmbeddings("your-mux-asset-id", {
-   provider: "openai",
-   chunkingStrategy: {
-     type: "token",
-     maxTokens: 500,
-     overlap: 100
-   }
- });
-
- console.log(result.chunks); // Array of chunk embeddings with timestamps
- console.log(result.averagedEmbedding); // Single embedding for the entire video
-
- // Store chunks in a vector database for timestamp-accurate search
- for (const chunk of result.chunks) {
-   await vectorDB.insert({
-     id: `${result.assetId}:${chunk.chunkId}`,
-     embedding: chunk.embedding,
-     startTime: chunk.metadata.startTime,
-     endTime: chunk.metadata.endTime
-   });
- }
-
- // Use VTT-based chunking to respect cue boundaries
- const vttResult = await generateVideoEmbeddings("your-mux-asset-id", {
-   provider: "google",
-   chunkingStrategy: {
-     type: "vtt",
-     maxTokens: 500,
-     overlapCues: 2
-   }
- });
- ```
-
- ### Audio Dubbing
-
- ```typescript
- import { translateAudio } from "@mux/ai/workflows";
-
- // Create an AI-dubbed audio track and add it to the Mux asset.
- // Uses the default audio track on your asset; the source language is auto-detected.
- const result = await translateAudio(
-   "your-mux-asset-id",
-   "es", // target language
-   {
-     provider: "elevenlabs",
-     numSpeakers: 0 // Auto-detect speakers
-   }
- );
-
- console.log(result.dubbingId); // ElevenLabs dubbing job ID
- console.log(result.uploadedTrackId); // New Mux audio track ID
- console.log(result.presignedUrl); // S3 audio file URL
- ```
-
- ### Compare Summarization Across Providers
-
- ```typescript
- import { getSummaryAndTags } from "@mux/ai/workflows";
-
- // Compare different AI providers analyzing the same Mux video asset
- const assetId = "your-mux-asset-id";
-
- // OpenAI analysis (default: gpt-5-mini)
- const openaiResult = await getSummaryAndTags(assetId, {
-   provider: "openai",
-   tone: "professional"
- });
-
- // Anthropic analysis (default: claude-sonnet-4-5)
- const anthropicResult = await getSummaryAndTags(assetId, {
-   provider: "anthropic",
-   tone: "professional"
- });
-
- // Google Gemini analysis (default: gemini-2.5-flash)
- const googleResult = await getSummaryAndTags(assetId, {
-   provider: "google",
-   tone: "professional"
- });
-
- // Compare results
- console.log("OpenAI:", openaiResult.title);
- console.log("Anthropic:", anthropicResult.title);
- console.log("Google:", googleResult.title);
- ```
-
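The removed moderation quick start above prints `maxScores` and `exceedsThreshold` without showing how the two relate. A minimal sketch of that aggregation, for illustration only — the real logic lives inside `getModerationScores`, and the `ThumbnailScore` type here is an assumption based on the result shape this README documents:

```typescript
// Per-thumbnail result shape, as shown in this README's moderation output.
interface ThumbnailScore {
  url: string;
  sexual: number;   // 0-1 score
  violence: number; // 0-1 score
  error: boolean;
}

// Take the highest score per category across all thumbnails, then flag the
// asset if either category meets or exceeds its threshold.
function aggregateScores(
  scores: ThumbnailScore[],
  thresholds: { sexual: number; violence: number }
) {
  const valid = scores.filter((s) => !s.error);
  const maxScores = {
    sexual: Math.max(0, ...valid.map((s) => s.sexual)),
    violence: Math.max(0, ...valid.map((s) => s.violence)),
  };
  const exceedsThreshold =
    maxScores.sexual >= thresholds.sexual ||
    maxScores.violence >= thresholds.violence;
  return { maxScores, exceedsThreshold };
}

const out = aggregateScores(
  [
    { url: "thumb-0.png", sexual: 0.1, violence: 0.2, error: false },
    { url: "thumb-1.png", sexual: 0.8, violence: 0.1, error: false },
  ],
  { sexual: 0.7, violence: 0.8 }
);
// out.maxScores → { sexual: 0.8, violence: 0.2 }; out.exceedsThreshold → true
```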
  ## Configuration
 
  Set environment variables:
 
  ```bash
+ # Required
  MUX_TOKEN_ID=your_mux_token_id
  MUX_TOKEN_SECRET=your_mux_token_secret
+
+ # Needed if your assets _only_ have signed playback IDs
+ MUX_SIGNING_KEY=your_signing_key_id
+ MUX_PRIVATE_KEY=your_base64_encoded_private_key
+
+ # You only need to configure API keys for the AI platforms you're using
  OPENAI_API_KEY=your_openai_api_key
  ANTHROPIC_API_KEY=your_anthropic_api_key
  GOOGLE_GENERATIVE_AI_API_KEY=your_google_api_key
- ELEVENLABS_API_KEY=your_elevenlabs_api_key
 
- # Signed Playback (for assets with signed playback policies)
- MUX_SIGNING_KEY=your_signing_key_id
- MUX_PRIVATE_KEY=your_base64_encoded_private_key
+ # Needed for the audio dubbing workflow
+ ELEVENLABS_API_KEY=your_elevenlabs_api_key
 
  # S3-Compatible Storage (required for translation & audio dubbing)
  S3_ENDPOINT=https://your-s3-endpoint.com
@@ -378,616 +67,117 @@ S3_ACCESS_KEY_ID=your-access-key
  S3_SECRET_ACCESS_KEY=your-secret-key
  ```
 
- Or pass credentials directly:
+ Or pass credentials directly to each function:
 
  ```typescript
  const result = await getSummaryAndTags(assetId, {
    muxTokenId: "your-token-id",
    muxTokenSecret: "your-token-secret",
-   openaiApiKey: "your-openai-key",
-   // For assets with signed playback policies:
-   muxSigningKey: "your-signing-key-id",
-   muxPrivateKey: "your-base64-private-key"
+   openaiApiKey: "your-openai-key"
  });
  ```
 
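The configuration section above describes two sources for each credential: an environment variable or an option passed directly to the function. A small sketch of the usual precedence for such APIs (an explicit option wins over the environment); `resolveCredential` is a hypothetical helper for illustration, not part of `@mux/ai`:

```typescript
// Hypothetical helper: prefer an explicitly passed credential, fall back to
// the environment variable, and fail loudly if neither is set.
function resolveCredential(
  explicit: string | undefined,
  envName: string
): string {
  const value = explicit ?? process.env[envName];
  if (!value) {
    throw new Error(`Missing credential: pass it directly or set ${envName}`);
  }
  return value;
}

process.env.MUX_TOKEN_ID = "env-token";
const fromEnv = resolveCredential(undefined, "MUX_TOKEN_ID");     // "env-token"
const fromOption = resolveCredential("direct-token", "MUX_TOKEN_ID"); // "direct-token"
```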
- ## API Reference
-
- ### `getSummaryAndTags(assetId, options?)`
-
- Analyzes a Mux video asset and returns AI-generated metadata.
-
- **Parameters:**
-
- - `assetId` (string) - Mux video asset ID
- - `options` (optional) - Configuration options
-
- **Options:**
-
- - `provider?: 'openai' | 'anthropic' | 'google'` - AI provider (default: 'openai')
- - `tone?: 'normal' | 'sassy' | 'professional'` - Analysis tone (default: 'normal')
- - `model?: string` - AI model to use (defaults: `gpt-5-mini`, `claude-sonnet-4-5`, or `gemini-2.5-flash`)
- - `includeTranscript?: boolean` - Include video transcript in analysis (default: true)
- - `cleanTranscript?: boolean` - Remove VTT timestamps and formatting from transcript (default: true)
- - `imageSubmissionMode?: 'url' | 'base64'` - How to submit storyboard to AI providers (default: 'url')
- - `imageDownloadOptions?: object` - Options for image download when using base64 mode
-   - `timeout?: number` - Request timeout in milliseconds (default: 10000)
-   - `retries?: number` - Maximum retry attempts (default: 3)
-   - `retryDelay?: number` - Base delay between retries in milliseconds (default: 1000)
-   - `maxRetryDelay?: number` - Maximum delay between retries in milliseconds (default: 10000)
-   - `exponentialBackoff?: boolean` - Whether to use exponential backoff (default: true)
- - `promptOverrides?: object` - Override specific sections of the prompt for custom use cases
-   - `task?: string` - Override the main task instruction
-   - `title?: string` - Override title generation guidance
-   - `description?: string` - Override description generation guidance
-   - `keywords?: string` - Override keywords generation guidance
-   - `qualityGuidelines?: string` - Override quality guidelines
- - `muxTokenId?: string` - Mux API token ID
- - `muxTokenSecret?: string` - Mux API token secret
- - `muxSigningKey?: string` - Signing key ID for signed playback policies
- - `muxPrivateKey?: string` - Base64-encoded private key for signed playback policies
- - `openaiApiKey?: string` - OpenAI API key
- - `anthropicApiKey?: string` - Anthropic API key
- - `googleApiKey?: string` - Google Generative AI API key
-
- **Returns:**
-
- ```typescript
- interface SummaryAndTagsResult {
-   assetId: string;
-   title: string; // Short title (max 100 chars)
-   description: string; // Detailed description
-   tags: string[]; // Relevant keywords
-   storyboardUrl: string; // Video storyboard URL
- }
- ```
-
- ### `getModerationScores(assetId, options?)`
-
- Analyzes video thumbnails for inappropriate content using OpenAI's Moderation API or Hive's visual moderation API.
-
- **Parameters:**
-
- - `assetId` (string) - Mux video asset ID
- - `options` (optional) - Configuration options
-
- **Options:**
+ ## Quick Start
 
- - `provider?: 'openai' | 'hive'` - Moderation provider (default: 'openai')
- - `model?: string` - OpenAI moderation model to use (default: `omni-moderation-latest`)
- - `thresholds?: { sexual?: number; violence?: number }` - Custom thresholds (default: {sexual: 0.7, violence: 0.8})
- - `thumbnailInterval?: number` - Seconds between thumbnails for long videos (default: 10)
- - `thumbnailWidth?: number` - Thumbnail width in pixels (default: 640)
- - `maxConcurrent?: number` - Maximum concurrent API requests (default: 5)
- - `imageSubmissionMode?: 'url' | 'base64'` - How to submit images to AI providers (default: 'url')
- - `imageDownloadOptions?: object` - Options for image download when using base64 mode
-   - `timeout?: number` - Request timeout in milliseconds (default: 10000)
-   - `retries?: number` - Maximum retry attempts (default: 3)
-   - `retryDelay?: number` - Base delay between retries in milliseconds (default: 1000)
-   - `maxRetryDelay?: number` - Maximum delay between retries in milliseconds (default: 10000)
-   - `exponentialBackoff?: boolean` - Whether to use exponential backoff (default: true)
- - `muxTokenId/muxTokenSecret?: string` - Mux credentials
- - `openaiApiKey?/hiveApiKey?` - Provider credentials
+ > **‼️ Important: ‼️** Most workflows rely on video transcripts for best results. Enable [auto-generated captions](https://www.mux.com/docs/guides/add-autogenerated-captions-and-use-transcripts) on your Mux assets to unlock the full potential of transcript-based workflows like summarization, chapters, and embeddings.
 
- **Returns:**
+ ### Video Summarization
 
  ```typescript
- {
-   assetId: string;
-   thumbnailScores: Array<{ // Individual thumbnail results
-     url: string;
-     sexual: number; // 0-1 score
-     violence: number; // 0-1 score
-     error: boolean;
-   }>;
-   maxScores: { // Highest scores across all thumbnails
-     sexual: number;
-     violence: number;
-   };
-   exceedsThreshold: boolean; // true if content should be flagged
-   thresholds: { // Threshold values used
-     sexual: number;
-     violence: number;
-   };
- }
- ```
-
- ### `hasBurnedInCaptions(assetId, options?)`
-
- Analyzes video frames to detect burned-in captions (hardcoded subtitles) that are permanently embedded in the video image.
-
- **Parameters:**
-
- - `assetId` (string) - Mux video asset ID
- - `options` (optional) - Configuration options
-
- **Options:**
-
- - `provider?: 'openai' | 'anthropic' | 'google'` - AI provider (default: 'openai')
- - `model?: string` - AI model to use (defaults: `gpt-5-mini`, `claude-sonnet-4-5`, or `gemini-2.5-flash`)
- - `imageSubmissionMode?: 'url' | 'base64'` - How to submit storyboard to AI providers (default: 'url')
- - `imageDownloadOptions?: object` - Options for image download when using base64 mode
-   - `timeout?: number` - Request timeout in milliseconds (default: 10000)
-   - `retries?: number` - Maximum retry attempts (default: 3)
-   - `retryDelay?: number` - Base delay between retries in milliseconds (default: 1000)
-   - `maxRetryDelay?: number` - Maximum delay between retries in milliseconds (default: 10000)
-   - `exponentialBackoff?: boolean` - Whether to use exponential backoff (default: true)
- - `muxTokenId?: string` - Mux API token ID
- - `muxTokenSecret?: string` - Mux API token secret
- - `openaiApiKey?: string` - OpenAI API key
- - `anthropicApiKey?: string` - Anthropic API key
- - `googleApiKey?: string` - Google Generative AI API key
+ import { getSummaryAndTags } from "@mux/ai/workflows";
 
- **Returns:**
+ const result = await getSummaryAndTags("your-mux-asset-id", {
+   tone: "professional"
+ });
 
- ```typescript
- {
-   assetId: string;
-   hasBurnedInCaptions: boolean; // Whether burned-in captions were detected
-   confidence: number; // Confidence score (0.0-1.0)
-   detectedLanguage: string | null; // Language of detected captions, or null
-   storyboardUrl: string; // URL to analyzed storyboard
- }
+ console.log(result.title);
+ console.log(result.description);
+ console.log(result.tags);
  ```
 
- **Detection Logic:**
-
- - Analyzes video storyboard frames to identify text overlays
- - Distinguishes between actual captions and marketing/end-card text
- - Text appearing only in final 1-2 frames is classified as marketing copy
- - Caption text must appear across multiple frames throughout the timeline
- - Both providers use optimized prompts to minimize false positives
-
- ### `translateCaptions(assetId, fromLanguageCode, toLanguageCode, options?)`
-
- Translates existing captions from one language to another and optionally adds them as a new track to the Mux asset.
-
- **Parameters:**
-
- - `assetId` (string) - Mux video asset ID
- - `fromLanguageCode` (string) - Source language code (e.g., 'en', 'es', 'fr')
- - `toLanguageCode` (string) - Target language code (e.g., 'es', 'fr', 'de')
- - `options` (optional) - Configuration options
-
- **Options:**
-
- - `provider: 'openai' | 'anthropic' | 'google'` - AI provider (required)
- - `model?: string` - Model to use (defaults to the provider's chat-vision model if omitted)
- - `uploadToMux?: boolean` - Whether to upload the translated track to Mux (default: true)
- - `s3Endpoint?: string` - S3-compatible storage endpoint
- - `s3Region?: string` - S3 region (default: 'auto')
- - `s3Bucket?: string` - S3 bucket name
- - `s3AccessKeyId?: string` - S3 access key ID
- - `s3SecretAccessKey?: string` - S3 secret access key
- - `muxTokenId/muxTokenSecret?: string` - Mux credentials
- - `openaiApiKey?/anthropicApiKey?/googleApiKey?` - Provider credentials
-
- **Returns:**
+ ### Content Moderation
 
  ```typescript
- interface TranslateCaptionsResult {
-   assetId: string;
-   sourceLanguageCode: string;
-   targetLanguageCode: string;
-   originalVtt: string; // Original VTT content
-   translatedVtt: string; // Translated VTT content
-   uploadedTrackId?: string; // Mux track ID (if uploaded)
-   presignedUrl?: string; // S3 presigned URL (expires in 1 hour)
- }
- ```
-
- **Supported Languages:**
- All ISO 639-1 language codes are automatically supported using `Intl.DisplayNames`. Examples: Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Japanese (ja), Korean (ko), Chinese (zh), Russian (ru), Arabic (ar), Hindi (hi), Thai (th), Swahili (sw), and many more.
-
- ### `generateChapters(assetId, languageCode, options?)`
-
- Generates AI-powered chapter markers by analyzing video captions. Creates logical chapter breaks based on topic changes and content transitions.
-
- **Parameters:**
-
- - `assetId` (string) - Mux video asset ID
- - `languageCode` (string) - Language code for captions (e.g., 'en', 'es', 'fr')
- - `options` (optional) - Configuration options
-
- **Options:**
-
- - `provider?: 'openai' | 'anthropic' | 'google'` - AI provider (default: 'openai')
- - `model?: string` - AI model to use (defaults: `gpt-5-mini`, `claude-sonnet-4-5`, or `gemini-2.5-flash`)
- - `muxTokenId?: string` - Mux API token ID
- - `muxTokenSecret?: string` - Mux API token secret
- - `openaiApiKey?: string` - OpenAI API key
- - `anthropicApiKey?: string` - Anthropic API key
- - `googleApiKey?: string` - Google Generative AI API key
+ import { getModerationScores } from "@mux/ai/workflows";
 
- **Returns:**
+ const result = await getModerationScores("your-mux-asset-id", {
+   thresholds: { sexual: 0.7, violence: 0.8 }
+ });
 
- ```typescript
- {
-   assetId: string;
-   languageCode: string;
-   chapters: Array<{
-     startTime: number; // Chapter start time in seconds
-     title: string; // Descriptive chapter title
-   }>;
- }
+ console.log(result.exceedsThreshold); // true if content flagged
  ```
 
615
- **Requirements:**
110
+ ### Generate Chapters
616
111
 
617
- - Asset must have caption track in the specified language
618
- - Caption track must be in 'ready' status
619
- - Uses existing auto-generated or uploaded captions
112
+ ```typescript
113
+ import { generateChapters } from "@mux/ai/workflows";
620
114
 
621
- **Example Output:**
115
+ const result = await generateChapters("your-mux-asset-id", "en");
622
116
 
623
- ```javascript
624
- // Perfect format for Mux Player
625
- player.addChapters([
626
- { startTime: 0, title: "Introduction and Setup" },
627
- { startTime: 45, title: "Main Content Discussion" },
628
- { startTime: 120, title: "Conclusion" }
629
- ]);
117
+ // Use with Mux Player
118
+ player.addChapters(result.chapters);
630
119
  ```
631
120
 
632
- ### `translateAudio(assetId, toLanguageCode, options?)`
-
- Creates AI-dubbed audio tracks from existing video content using ElevenLabs voice cloning and translation. Uses the default audio track on your asset, language is auto-detected.
-
- **Parameters:**
-
- - `assetId` (string) - Mux video asset ID (must have audio.m4a static rendition)
- - `toLanguageCode` (string) - Target language code (e.g., 'es', 'fr', 'de')
- - `options` (optional) - Configuration options
-
- **Options:**
-
- - `provider?: 'elevenlabs'` - AI provider (default: 'elevenlabs')
- - `numSpeakers?: number` - Number of speakers (default: 0 for auto-detect)
- - `uploadToMux?: boolean` - Whether to upload dubbed track to Mux (default: true)
- - `s3Endpoint?: string` - S3-compatible storage endpoint
- - `s3Region?: string` - S3 region (default: 'auto')
- - `s3Bucket?: string` - S3 bucket name
- - `s3AccessKeyId?: string` - S3 access key ID
- - `s3SecretAccessKey?: string` - S3 secret access key
- - `elevenLabsApiKey?: string` - ElevenLabs API key
- - `muxTokenId/muxTokenSecret?: string` - API credentials
-
- **Returns:**
+ ### Translate Captions

  ```typescript
- interface TranslateAudioResult {
-   assetId: string;
-   targetLanguageCode: string;
-   dubbingId: string; // ElevenLabs dubbing job ID
-   uploadedTrackId?: string; // Mux audio track ID (if uploaded)
-   presignedUrl?: string; // S3 presigned URL (expires in 1 hour)
- }
- ```
-
- **Requirements:**
+ import { translateCaptions } from "@mux/ai/workflows";

- - Asset must have an `audio.m4a` static rendition
- - ElevenLabs API key with Creator plan or higher
- - S3-compatible storage for Mux ingestion
+ const result = await translateCaptions(
+   "your-mux-asset-id",
+   "en", // from
+   "es", // to
+   { provider: "anthropic" }
+ );

- **Supported Languages:**
- ElevenLabs supports 32+ languages with automatic language name detection via `Intl.DisplayNames`. Supported languages include English, Spanish, French, German, Italian, Portuguese, Polish, Japanese, Korean, Chinese, Russian, Arabic, Hindi, Thai, and many more. Track names are automatically generated (e.g., "Polish (auto-dubbed)").
+ console.log(result.uploadedTrackId); // New Mux track ID
+ ```

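Under the hood, caption translation has to preserve WebVTT cue timings while replacing only the payload text. The sketch below illustrates that idea with a toy parser and a stubbed translator; all names are hypothetical, and the real workflow delegates translation to the configured LLM provider.

```typescript
interface VttCue {
  timing: string; // e.g. "00:00.000 --> 00:02.000", preserved verbatim
  text: string;
}

// Toy parser: split on blank lines and keep only blocks that contain a cue timing.
function parseVtt(vtt: string): VttCue[] {
  return vtt
    .split(/\r?\n\r?\n+/)
    .filter(block => block.includes("-->"))
    .map((block) => {
      const lines = block.trim().split(/\r?\n/);
      const timingIndex = lines.findIndex(line => line.includes("-->"));
      return {
        timing: lines[timingIndex],
        text: lines.slice(timingIndex + 1).join("\n"),
      };
    });
}

// Rebuild the file with translated payloads and untouched timings.
function rebuildVtt(cues: VttCue[], translate: (text: string) => string): string {
  const body = cues.map(cue => `${cue.timing}\n${translate(cue.text)}`).join("\n\n");
  return `WEBVTT\n\n${body}\n`;
}

const source = "WEBVTT\n\n00:00.000 --> 00:02.000\nHello\n\n00:02.000 --> 00:04.000\nGoodbye\n";
const cues = parseVtt(source);
const translated = rebuildVtt(cues, text => (text === "Hello" ? "Hola" : "Adiós"));
```

A production parser would also handle cue identifiers, settings, and styling, which this toy version ignores.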
- ### Custom Prompts with `promptOverrides`
+ ## Package Structure

- Customize specific sections of the summarization prompt for different use cases like SEO, social media, or technical analysis.
+ This package ships with layered entry points:

- **Tip:** Before adding overrides, read through the default summarization prompt template in `src/functions/summarization.ts` (the `summarizationPromptBuilder` config) so that you have clear context on what each section does and what you’re changing.
+ - **`@mux/ai/workflows`** – Production-ready helpers like `getSummaryAndTags` and `generateChapters`
+ - **`@mux/ai/primitives`** – Low-level building blocks like `fetchTranscriptForAsset` and `getStoryboardUrl`
+ - **`@mux/ai`** – Main entry point that re-exports both namespaces plus shared types

  ```typescript
+ // Or import everything
+ import { primitives, workflows } from "@mux/ai";
+ // Low-level primitives for custom workflows
+ import { fetchTranscriptForAsset, getStoryboardUrl } from "@mux/ai/primitives";
+ // High-level workflows
  import { getSummaryAndTags } from "@mux/ai/workflows";
-
- // SEO-optimized metadata
- const seoResult = await getSummaryAndTags(assetId, {
-   tone: "professional",
-   promptOverrides: {
-     task: "Generate SEO-optimized metadata that maximizes discoverability.",
-     title: "Create a search-optimized title (50-60 chars) with primary keyword front-loaded.",
-     keywords: "Focus on high search volume terms and long-tail keywords.",
-   },
- });
-
- // Social media optimized for engagement
- const socialResult = await getSummaryAndTags(assetId, {
-   promptOverrides: {
-     title: "Create a scroll-stopping headline using emotional triggers or curiosity gaps.",
-     description: "Write shareable copy that creates FOMO and works without watching the video.",
-     keywords: "Generate hashtag-ready keywords for trending and niche community tags.",
-   },
- });
-
- // Technical/production analysis
- const technicalResult = await getSummaryAndTags(assetId, {
-   tone: "professional",
-   promptOverrides: {
-     task: "Analyze cinematography, lighting, and production techniques.",
-     title: "Describe the production style or filmmaking technique.",
-     description: "Provide a technical breakdown of camera work, lighting, and editing.",
-     keywords: "Use industry-standard production terminology.",
-   },
- });
- ```
-
- **Available override sections:**
- | Section | Description |
- |---------|-------------|
- | `task` | Main instruction for what to analyze |
- | `title` | Guidance for generating the title |
- | `description` | Guidance for generating the description |
- | `keywords` | Guidance for generating keywords/tags |
- | `qualityGuidelines` | General quality instructions |
-
- Each override can be a simple string (replaces the section content) or a full `PromptSection` object for advanced control over XML tag names and attributes.
-
- ## Examples
-
- See the `examples/` directory for complete working examples.
-
- **Prerequisites:**
- Create a `.env` file in the project root with your API credentials:
-
- ```bash
- MUX_TOKEN_ID=your_token_id
- MUX_TOKEN_SECRET=your_token_secret
- OPENAI_API_KEY=your_openai_key
- ANTHROPIC_API_KEY=your_anthropic_key
- GOOGLE_GENERATIVE_AI_API_KEY=your_google_key
- HIVE_API_KEY=your_hive_key # required for Hive moderation runs
  ```

- All examples automatically load environment variables using `dotenv`.
-
- ### Quick Start (Run from Root)
-
- You can run examples directly from the project root without installing dependencies in each example folder:
-
- ```bash
- # Chapters
- npm run example:chapters <asset-id> [language-code] [provider]
- npm run example:chapters:compare <asset-id> [language-code]
-
- # Burned-in Caption Detection
- npm run example:burned-in <asset-id> [provider]
- npm run example:burned-in:compare <asset-id>
-
- # Summarization
- npm run example:summarization <asset-id> [provider]
- npm run example:summarization:compare <asset-id>
-
- # Moderation
- npm run example:moderation <asset-id> [provider]
- npm run example:moderation:compare <asset-id>
+ Every workflow is composed from primitives, so you can start high-level and drop down to primitives when you need more control.

- # Caption Translation
- npm run example:translate-captions <asset-id> [from-lang] [to-lang] [provider]
+ ## Documentation

- # Audio Translation (Dubbing)
- npm run example:translate-audio <asset-id> [to-lang]
-
- # Signed Playback (for assets with signed playback policies)
- npm run example:signed-playback <signed-asset-id>
- npm run example:signed-playback:summarize <signed-asset-id> [provider]
- ```
-
- **Examples:**
-
- ```bash
- # Generate chapters with OpenAI
- npm run example:chapters abc123 en openai
-
- # Detect burned-in captions with Anthropic
- npm run example:burned-in abc123 anthropic
-
- # Compare OpenAI vs Anthropic chapter generation
- npm run example:chapters:compare abc123 en
-
- # Run moderation analysis with Hive
- npm run example:moderation abc123 hive
-
- # Translate captions from English to Spanish with Anthropic (default)
- npm run example:translate-captions abc123 en es anthropic
-
- # Summarize a video with Claude Sonnet 4.5 (default)
- npm run example:summarization abc123 anthropic
-
- # Create AI-dubbed audio in French
- npm run example:translate-audio abc123 fr
- ```
-
- ### Summarization Examples
-
- - **Basic Usage**: Default prompt with different tones
- - **Custom Prompts**: Override prompt sections with presets (SEO, social, technical, ecommerce)
- - **Tone Variations**: Compare analysis styles
-
- ```bash
- cd examples/summarization
- npm install
- npm run basic <your-asset-id> [provider]
- npm run tones <your-asset-id>
-
- # Custom prompts with presets
- npm run custom <your-asset-id> --preset seo
- npm run custom <your-asset-id> --preset social
- npm run custom <your-asset-id> --preset technical
- npm run custom <your-asset-id> --preset ecommerce
-
- # Or provide individual overrides
- npm run custom <your-asset-id> --task "Focus on product features"
- ```
-
- ### Moderation Examples
-
- - **Basic Moderation**: Analyze content with default thresholds
- - **Custom Thresholds**: Compare strict/default/permissive settings
- - **Provider Comparison**: Compare OpenAI’s dedicated Moderation API with Hive’s visual moderation API
-
- ```bash
- cd examples/moderation
- npm install
- npm run basic <your-asset-id> [provider] # provider: openai | hive
- npm run thresholds <your-asset-id>
- npm run compare <your-asset-id>
- ```
-
- Supported moderation providers: `openai` (default) and `hive`. Use `HIVE_API_KEY` when selecting Hive.
-
- ### Burned-in Caption Examples
-
- - **Basic Detection**: Detect burned-in captions with different AI providers
- - **Provider Comparison**: Compare OpenAI vs Anthropic vs Google detection accuracy
-
- ```bash
- cd examples/burned-in-captions
- npm install
- npm run burned-in:basic <your-asset-id> [provider]
- npm run compare <your-asset-id>
- ```
-
- ### Chapter Generation Examples
-
- - **Basic Chapters**: Generate chapters with different AI providers
- - **Provider Comparison**: Compare OpenAI vs Anthropic vs Google chapter generation
-
- ```bash
- cd examples/chapters
- npm install
- npm run chapters:basic <your-asset-id> [language-code] [provider]
- npm run compare <your-asset-id> [language-code]
- ```
-
- ### Caption Translation Examples
-
- - **Basic Translation**: Translate captions and upload to Mux
- - **Translation Only**: Translate without uploading to Mux
-
- ```bash
- cd examples/translate-captions
- npm install
- npm run basic <your-asset-id> en es [provider]
- npm run translation-only <your-asset-id> en fr [provider]
- ```
-
- **Translation Workflow:**
-
- 1. Fetches existing captions from Mux asset
- 2. Translates VTT content using your selected provider (default: Claude Sonnet 4.5)
- 3. Uploads translated VTT to S3-compatible storage
- 4. Generates presigned URL (1-hour expiry)
- 5. Adds new subtitle track to Mux asset
- 6. Track name: "{Language} (auto-translated)"
-
- ### Audio Dubbing Examples
-
- - **Basic Dubbing**: Create AI-dubbed audio and upload to Mux
- - **Dubbing Only**: Create dubbed audio without uploading to Mux
-
- ```bash
- cd examples/translate-audio
- npm install
- npm run basic <your-asset-id> es
- npm run dubbing-only <your-asset-id> fr
- ```
-
- **Audio Dubbing Workflow:**
-
- 1. Checks asset has audio.m4a static rendition
- 2. Downloads default audio track from Mux
- 3. Creates ElevenLabs dubbing job with automatic language detection
- 4. Polls for completion (up to 30 minutes)
- 5. Downloads dubbed audio file
- 6. Uploads to S3-compatible storage
- 7. Generates presigned URL (1-hour expiry)
- 8. Adds new audio track to Mux asset
- 9. Track name: "{Language} (auto-dubbed)"
-
- ### Signed Playback Examples
-
- - **URL Generation Test**: Verify signed URLs work for storyboards, thumbnails, and transcripts
- - **Signed Summarization**: Full summarization workflow with a signed asset
-
- ```bash
- cd examples/signed-playback
- npm install
-
- # Verify signed URL generation
- npm run basic <signed-asset-id>
-
- # Summarize a signed asset
- npm run summarize <signed-asset-id> [provider]
- ```
-
- **Prerequisites:**
-
- 1. Create a Mux asset with `playback_policy: "signed"`
- 2. Create a signing key in Mux Dashboard → Settings → Signing Keys
- 3. Set `MUX_SIGNING_KEY` and `MUX_PRIVATE_KEY` environment variables
-
- **How Signed Playback Works:**
- When you provide signing credentials, the library automatically:
-
- - Detects if an asset has a signed playback policy
- - Generates JWT tokens with RS256 algorithm
- - Uses the correct `aud` claim for each asset type (video, thumbnail, storyboard)
- - Appends tokens to URLs as query parameters
-
- ## S3-Compatible Storage
-
- The translation feature requires S3-compatible storage to temporarily host VTT files for Mux ingestion. Supported providers include:
-
- - **AWS S3** - Amazon's object storage
- - **DigitalOcean Spaces** - S3-compatible with CDN
- - **Cloudflare R2** - Zero egress fees
- - **MinIO** - Self-hosted S3 alternative
- - **Backblaze B2** - Cost-effective storage
- - **Wasabi** - Hot cloud storage
-
- **Why S3 Storage?**
- Mux requires a publicly accessible URL to ingest subtitle tracks. The translation workflow:
-
- 1. Uploads translated VTT to your S3 storage
- 2. Generates a presigned URL for secure access
- 3. Mux fetches the file using the presigned URL
- 4. File remains in your storage for future use
+ - **[Workflows](./docs/WORKFLOWS.md)** - Detailed guide to each pre-built workflow
+ - **[Primitives](./docs/PRIMITIVES.md)** - Low-level building blocks for custom workflows
+ - **[API Reference](./docs/API.md)** - Complete API documentation for all functions
+ - **[Examples](./docs/EXAMPLES.md)** - Running examples from the repository

  ## Development

- ### Setup
-
  ```bash
- # Clone repo and install dependencies
+ # Clone and install
  git clone https://github.com/muxinc/mux-ai.git
  cd mux-ai
- npm install # Automatically sets up git hooks via Husky
- ```
-
- ### Style
-
- This project uses automated tooling to enforce consistent code style:
-
- - **ESLint** with `@antfu/eslint-config` for linting and formatting
- - **TypeScript** strict mode for type safety
- - **Pre-commit hooks** that run automatically before each commit
+ npm install # Automatically sets up git hooks

- ```bash
- # Check for linting issues
+ # Linting and type checking
  npm run lint
-
- # Auto-fix linting issues
  npm run lint:fix
-
- # Run type checking
  npm run typecheck

  # Run tests
  npm test
  ```

+ This project uses ESLint with `@antfu/eslint-config`, TypeScript strict mode, and automated pre-commit hooks.
+
  ## License

  [Apache 2.0](LICENSE)
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@mux/ai",
-   "version": "0.1.4",
+   "version": "0.1.6",
    "description": "AI library for Mux",
    "author": "Mux",
    "license": "Apache-2.0",