agent-media 0.6.0 → 0.6.2

Files changed (2)
  1. package/README.md +88 -114
  2. package/package.json +7 -7
package/README.md CHANGED
@@ -6,125 +6,96 @@ Media processing CLI for AI agents.
  - **Video**: generate (text-to-video and image-to-video)
  - **Audio**: extract from video, transcribe (with speaker identification)
 
- ## Quick Start
-
- ### Local processing (no API key needed)
+ ## Installation
 
- Uses [Sharp](https://sharp.pixelplumbing.com/) for image operations and [transformers.js](https://huggingface.co/docs/transformers.js) for local AI (background removal, transcription).
+ ### Global
 
  ```bash
- bunx agent-media@latest image resize --in sunset-mountains.jpg --width 800
- bunx agent-media@latest image convert --in sunset-mountains.png --format webp
- bunx agent-media@latest image extend --in sunset-mountains.jpg --padding 50 --color "#FFFFFF"
- bunx agent-media@latest image remove-background --in portrait-headshot.png
- bunx agent-media@latest audio extract --in video.mp4
- bunx agent-media@latest audio transcribe --in audio.mp3
+ npm install -g agent-media@latest
  ```
 
- > **Note**: You may see a `mutex lock failed` error with local AI processing — ignore it, the output is correct if JSON shows `"ok": true`.
-
- **Provider auto-selection**: Without an API key, local processing is used. With an API key (`FAL_API_KEY`, `REPLICATE_API_TOKEN`, or `RUNPOD_API_KEY`), cloud providers are used. Override with `--provider <name>`.
-
- ### AI-powered features
+ ### From Source
 
- Requires an API key from one of these providers:
+ ```bash
+ git clone https://github.com/agntswrm/agent-media
+ cd agent-media
+ pnpm install && pnpm build && pnpm link --global
+ ```
 
- - [fal.ai](https://fal.ai/dashboard/keys) → `FAL_API_KEY`
- - [Replicate](https://replicate.com/account/api-tokens) → `REPLICATE_API_TOKEN`
- - [Runpod](https://www.runpod.io/console/user/settings) → `RUNPOD_API_KEY`
+ ### Via bunx / npx
 
- ### bunx
+ Run directly without installing:
 
  ```bash
- # Generate an image
- bunx agent-media@latest image generate --prompt "a robot painting a sunset"
-
- # Edit the generated image
- bunx agent-media@latest image edit --in .agent-media/generated_*.png --prompt "add a cat watching"
-
- # Remove background
- bunx agent-media@latest image remove-background --in .agent-media/edited_*.png
+ bunx agent-media@latest --help
+ npx agent-media@latest --help
+ ```
 
- # Generate a video from text
- bunx agent-media@latest video generate --prompt "ocean waves crashing on rocks"
+ ### Skills for AI Agents
 
- # Generate a video from an image (image-to-video with audio)
- bunx agent-media@latest video generate --in portrait.png --prompt "person smiles and waves hello" --audio
+ Install agent-media skills to your coding agent (Claude Code, Cursor, Codex, etc.):
 
- # Transcribe with speaker identification
- bunx agent-media@latest audio transcribe --in audio.mp3 --diarize
+ ```bash
+ npx skills add agntswrm/agent-media
  ```
 
- ### npx
+ This adds media processing skills that your AI agent can use automatically. Available skills:
+ - `agent-media` - Overview of all capabilities
+ - `image-generate` - Generate images from text
+ - `image-resize` - Resize images
+ - `image-convert` - Convert image formats
+ - `image-remove-background` - Remove backgrounds
+ - `audio-extract` - Extract audio from video
+ - `audio-transcribe` - Transcribe audio to text
+ - `video-generate` - Generate videos from text or images
+
+ ## Quick Start
 
  ```bash
  # Generate an image
- npx agent-media@latest image generate --prompt "a robot painting a sunset"
+ agent-media image generate --prompt "a robot painting a sunset"
 
  # Edit the generated image
- npx agent-media@latest image edit --in .agent-media/generated_*.png --prompt "add a cat watching"
+ agent-media image edit --in .agent-media/generated_*.png --prompt "add a cat watching"
 
  # Remove background
- npx agent-media@latest image remove-background --in .agent-media/edited_*.png
+ agent-media image remove-background --in .agent-media/edited_*.png
 
- # Generate a video from text
- npx agent-media@latest video generate --prompt "ocean waves crashing on rocks"
+ # Convert to different format
+ agent-media image convert --in .agent-media/nobg_*.png --format webp
 
- # Generate a video from an image (image-to-video with audio)
- npx agent-media@latest video generate --in portrait.png --prompt "person smiles and waves hello" --audio
+ # Generate a video from an image (with audio)
+ agent-media video generate --in woman-portrait.png --prompt "The woman speaks: 'Hello! Welcome to Agent Media.'" --audio --duration 10
 
- # Transcribe with speaker identification
- npx agent-media@latest audio transcribe --in audio.mp3 --diarize
- ```
-
- ## Installation
-
- ```bash
- # Use directly with bunx (no install)
- bunx agent-media@latest --help
-
- # Or with npx
- npx agent-media@latest --help
+ # Extract audio from video
+ agent-media audio extract --in .agent-media/generated_*.mp4
 
- # Or install globally
- npm install -g agent-media
- ```
-
- ### From Source
-
- ```bash
- git clone https://github.com/TimPietrusky/agent-media
- cd agent-media
- pnpm install && pnpm build && pnpm link --global
+ # Transcribe the audio
+ agent-media audio transcribe --in .agent-media/*_extracted_*.mp3
  ```
 
  ## Requirements
 
  - Node.js >= 18.0.0
- - API key for AI features (image generate/edit, video generate, remove-background, transcribe)
-
- ---
+ - API key from [fal.ai](https://fal.ai/dashboard/keys), [Replicate](https://replicate.com/account/api-tokens), or [Runpod](https://www.runpod.io/console/user/settings) for AI features
 
- ## image
-
- ```bash
- # Resize image
- agent-media@latest image resize --in <path> [options]
+ **Local processing** (no API key): resize, convert, extend, audio extract, remove-background, transcribe
 
- # Convert format
- agent-media@latest image convert --in <path> --format <f>
+ **Cloud processing** (API key required): image generate, image edit, video generate
 
- # Extend canvas with padding
- agent-media@latest image extend --in <path> --padding <px> --color <hex>
+ > **Note**: You may see a `mutex lock failed` error when using local remove-background or transcribe — ignore it, the output is correct if JSON shows `"ok": true`.
 
- # Generate image from text
- agent-media@latest image generate --prompt <text>
+ ---
 
- # Edit image with text prompt
- agent-media@latest image edit --in <path> --prompt <text>
+ ## image
 
- # Remove background
- agent-media@latest image remove-background --in <path>
+ ```bash
+ agent-media image resize --in <path> [options]
+ agent-media image convert --in <path> --format <f>
+ agent-media image extend --in <path> --padding <px> --color <hex>
+ agent-media image generate --prompt <text>
+ agent-media image edit --in <path> --prompt <text>
+ agent-media image remove-background --in <path>
  ```
 
  ### resize
@@ -132,9 +103,9 @@ agent-media@latest image remove-background --in <path>
  *local*
 
  ```bash
- agent-media@latest image resize --in sunset-mountains.jpg --width 800
- agent-media@latest image resize --in sunset-mountains.jpg --height 600
- agent-media@latest image resize --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.jpg --width 800
+ agent-media image resize --in sunset-mountains.jpg --width 800
+ agent-media image resize --in sunset-mountains.jpg --height 600
+ agent-media image resize --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.jpg --width 800
  ```
 
  | Option | Description |
@@ -149,9 +120,9 @@ agent-media@latest image resize --in https://ytrzap04kkm0giml.public.blob.vercel
  *local*
 
  ```bash
- agent-media@latest image convert --in sunset-mountains.png --format webp
- agent-media@latest image convert --in sunset-mountains.jpg --format png
- agent-media@latest image convert --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --format jpg --quality 90
+ agent-media image convert --in sunset-mountains.png --format webp
+ agent-media image convert --in sunset-mountains.jpg --format png
+ agent-media image convert --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --format jpg --quality 90
  ```
 
  | Option | Description |
@@ -168,8 +139,8 @@ agent-media@latest image convert --in https://ytrzap04kkm0giml.public.blob.verce
  Extend image canvas by adding padding on all sides with a solid background color.
 
  ```bash
- agent-media@latest image extend --in sunset-mountains.jpg --padding 50 --color "#E4ECF8"
- agent-media@latest image extend --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --padding 100 --color "#FFFFFF"
+ agent-media image extend --in sunset-mountains.jpg --padding 50 --color "#E4ECF8"
+ agent-media image extend --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --padding 100 --color "#FFFFFF"
  ```
 
  | Option | Description |
@@ -185,8 +156,8 @@ agent-media@latest image extend --in https://ytrzap04kkm0giml.public.blob.vercel
  *API key required*
 
  ```bash
- agent-media@latest image generate --prompt "a cat wearing a hat"
- agent-media@latest image generate --prompt "sunset over mountains" --width 1024 --height 768
+ agent-media image generate --prompt "a cat wearing a hat"
+ agent-media image generate --prompt "sunset over mountains" --width 1024 --height 768
  ```
 
  | Option | Description |
@@ -205,8 +176,8 @@ agent-media@latest image generate --prompt "sunset over mountains" --width 1024
  Edit an image using a text prompt (image-to-image).
 
  ```bash
- agent-media@latest image edit --in sunset-mountains.jpg --prompt "make the sky more vibrant"
- agent-media@latest image edit --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/portrait-headshot.png --prompt "add sunglasses"
+ agent-media image edit --in sunset-mountains.jpg --prompt "make the sky more vibrant"
+ agent-media image edit --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/man-portrait.png --prompt "add sunglasses"
  ```
 
  | Option | Description |
@@ -222,8 +193,8 @@ agent-media@latest image edit --in https://ytrzap04kkm0giml.public.blob.vercel-s
  *API key required*
 
  ```bash
- agent-media@latest image remove-background --in portrait-headshot.png
- agent-media@latest image remove-background --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/portrait-headshot.png
+ agent-media image remove-background --in man-portrait.png
+ agent-media image remove-background --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/man-portrait.png
  ```
 
  | Option | Description |
@@ -238,10 +209,10 @@ agent-media@latest image remove-background --in https://ytrzap04kkm0giml.public.
 
  ```bash
  # Generate video from text
- agent-media@latest video generate --prompt <text>
+ agent-media video generate --prompt <text>
 
  # Generate video from image (animate an image)
- agent-media@latest video generate --in <image> --prompt <text>
+ agent-media video generate --in <image> --prompt <text>
  ```
 
  ### generate
@@ -252,16 +223,16 @@ Generate video from a text prompt. Optionally provide an input image to animate
 
  ```bash
  # Text-to-video
- agent-media@latest video generate --prompt "a cat walking through a garden"
+ agent-media video generate --prompt "a cat walking through a garden"
 
  # Image-to-video (animate an image)
- agent-media@latest video generate --in portrait.png --prompt "person smiles and waves hello"
+ agent-media video generate --in woman-portrait.png --prompt "person smiles and waves hello"
 
  # With audio generation
- agent-media@latest video generate --prompt "fireworks in the night sky" --audio --duration 10
+ agent-media video generate --prompt "fireworks in the night sky" --audio --duration 10
 
  # Higher resolution
- agent-media@latest video generate --prompt "ocean waves" --resolution 1080p
+ agent-media video generate --prompt "ocean waves" --resolution 1080p
  ```
 
  | Option | Description |
@@ -283,10 +254,10 @@ agent-media@latest video generate --prompt "ocean waves" --resolution 1080p
 
  ```bash
  # Extract audio from video
- agent-media@latest audio extract --in <video>
+ agent-media audio extract --in <video>
 
  # Transcribe audio to text
- agent-media@latest audio transcribe --in <audio>
+ agent-media audio transcribe --in <audio>
  ```
 
  ### extract
@@ -296,8 +267,9 @@ agent-media@latest audio transcribe --in <audio>
  Extract audio track from a video file.
 
  ```bash
- agent-media@latest audio extract --in video.mp4
- agent-media@latest audio extract --in video.mp4 --format wav
+ agent-media audio extract --in woman-greeting.mp4
+ agent-media audio extract --in woman-greeting.mp4 --format wav
+ agent-media audio extract --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/woman-greeting.mp4
  ```
 
  | Option | Description |
@@ -313,8 +285,9 @@ agent-media@latest audio extract --in video.mp4 --format wav
  Transcribe audio to text with timestamps. Supports speaker identification.
 
  ```bash
- agent-media@latest audio transcribe --in audio.mp3
- agent-media@latest audio transcribe --in audio.mp3 --diarize --speakers 2
+ agent-media audio transcribe --in woman-greeting.mp3
+ agent-media audio transcribe --in woman-greeting.mp3 --diarize --speakers 2
+ agent-media audio transcribe --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/woman-greeting.mp3
  ```
 
  | Option | Description |
@@ -365,12 +338,14 @@ Exit code is `0` on success, `1` on error.
 
  | Provider | resize | convert | extend | image generate | image edit | remove-background | video generate | transcribe |
  |----------|--------|---------|--------|----------------|------------|-------------------|----------------|------------|
- | **local** | | | | - | - | - | - | - |
- | **transformers** | - | - | - | - | - | `Xenova/modnet` | - | `moonshine-base` |
+ | **local** | ✓* | ✓* | ✓* | - | - | `Xenova/modnet`** | - | `moonshine-base`** |
  | **fal** | - | - | - | `fal-ai/flux-2` | `fal-ai/flux-2/edit` | `fal-ai/birefnet/v2` | `fal-ai/ltx-2` | `fal-ai/wizper` |
- | **replicate** | - | - | - | `black-forest-labs/flux-2-dev` | `black-forest-labs/flux-kontext-dev` | `men1scus/birefnet` | `lightricks/ltx-video` | WhisperX |
+ | **replicate** | - | - | - | `black-forest-labs/flux-2-dev` | `black-forest-labs/flux-kontext-dev` | `men1scus/birefnet` | `lightricks/ltx-video` | `whisper-diarization` |
  | **runpod** | - | - | - | `alibaba/wan-2.6` | `google/nano-banana-pro-edit` | - | - | - |
 
+ \* Powered by [Sharp](https://sharp.pixelplumbing.com/) for fast image processing
+ \** Powered by [Transformers.js](https://huggingface.co/docs/transformers.js) for local ML inference (models downloaded on first use)
+
  Use `--model <name>` to override the default model for any command.
 
  ### Provider Selection
@@ -387,12 +362,11 @@ Use `--model <name>` to override the default model for any command.
  | `FAL_API_KEY` | fal.ai API key | [fal.ai](https://fal.ai/dashboard/keys) |
  | `REPLICATE_API_TOKEN` | Replicate API token | [replicate.com](https://replicate.com/account/api-tokens) |
  | `RUNPOD_API_KEY` | Runpod API key | [runpod.io](https://www.runpod.io/console/user/settings) |
- | `HUGGINGFACE_ACCESS_TOKEN` | For transcription with speaker ID (replicate only) | [huggingface.co](https://huggingface.co/settings/tokens) |
  | `AGENT_MEDIA_DIR` | Output directory (default: `.agent-media/`) | - |
 
  ## Roadmap
 
- - [x] Local CPU background removal via transformers.js/ONNX (zero API keys)
- - [x] Local CPU transcription via transformers.js/ONNX (zero API keys)
+ - [x] Local background removal (zero API keys)
+ - [x] Local transcription (zero API keys)
  - [x] Video generation (text-to-video and image-to-video)
  - [ ] Batch processing support
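
Reviewer note on the README change: the relocated note says a `mutex lock failed` message can be ignored when the JSON result shows `"ok": true`. A minimal shell sketch of that check is given here. The helper name `result_ok` and the sample payloads are hypothetical; only the `"ok"` field comes from the README, and the real output likely carries more fields.

```bash
# Hypothetical helper (not part of agent-media): succeeds when the JSON text
# passed as $1 contains "ok": true, allowing optional whitespace around the colon.
result_ok() {
  printf '%s\n' "$1" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Illustrative payloads only; they are not captured agent-media output.
good='{"ok": true}'
bad='{"ok": false}'

result_ok "$good" && echo "good: usable"
result_ok "$bad" || echo "bad: not usable"
```

In practice the argument would be the captured output of an `agent-media` command. Which stream the JSON is written to is not stated in this diff, so the capture step is left out.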
package/package.json CHANGED
@@ -1,11 +1,11 @@
  {
  "name": "agent-media",
- "version": "0.6.0",
+ "version": "0.6.2",
  "description": "Agent-first media toolkit CLI",
  "license": "Apache-2.0",
  "repository": {
  "type": "git",
- "url": "https://github.com/TimPietrusky/agent-media.git",
+ "url": "https://github.com/agntswrm/agent-media.git",
  "directory": "packages/agent-media"
  },
  "keywords": [
@@ -34,11 +34,11 @@
  "dependencies": {
  "commander": "^12.0.0",
  "dotenv": "^17.2.3",
- "@agent-media/audio": "0.4.1",
- "@agent-media/core": "0.5.0",
- "@agent-media/image": "0.3.1",
- "@agent-media/providers": "0.5.0",
- "@agent-media/video": "0.2.0"
+ "@agent-media/audio": "0.4.3",
+ "@agent-media/core": "0.5.1",
+ "@agent-media/image": "0.3.3",
+ "@agent-media/providers": "0.5.2",
+ "@agent-media/video": "0.2.2"
  },
  "devDependencies": {
  "@types/node": "^22.0.0",