agent-media 0.6.0 → 0.6.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +88 -114
- package/package.json +7 -7
package/README.md
CHANGED
@@ -6,125 +6,96 @@ Media processing CLI for AI agents.
 - **Video**: generate (text-to-video and image-to-video)
 - **Audio**: extract from video, transcribe (with speaker identification)
 
-##
-
-### Local processing (no API key needed)
+## Installation
 
-
+### Global
 
 ```bash
-
-bunx agent-media@latest image convert --in sunset-mountains.png --format webp
-bunx agent-media@latest image extend --in sunset-mountains.jpg --padding 50 --color "#FFFFFF"
-bunx agent-media@latest image remove-background --in portrait-headshot.png
-bunx agent-media@latest audio extract --in video.mp4
-bunx agent-media@latest audio transcribe --in audio.mp3
+npm install -g agent-media@latest
 ```
 
-
-
-**Provider auto-selection**: Without an API key, local processing is used. With an API key (`FAL_API_KEY`, `REPLICATE_API_TOKEN`, or `RUNPOD_API_KEY`), cloud providers are used. Override with `--provider <name>`.
-
-### AI-powered features
+### From Source
 
-
+```bash
+git clone https://github.com/agntswrm/agent-media
+cd agent-media
+pnpm install && pnpm build && pnpm link --global
+```
 
-
-- [Replicate](https://replicate.com/account/api-tokens) → `REPLICATE_API_TOKEN`
-- [Runpod](https://www.runpod.io/console/user/settings) → `RUNPOD_API_KEY`
+### Via bunx / npx
 
-
+Run directly without installing:
 
 ```bash
-
-
-
-# Edit the generated image
-bunx agent-media@latest image edit --in .agent-media/generated_*.png --prompt "add a cat watching"
-
-# Remove background
-bunx agent-media@latest image remove-background --in .agent-media/edited_*.png
+bunx agent-media@latest --help
+npx agent-media@latest --help
+```
 
-
-bunx agent-media@latest video generate --prompt "ocean waves crashing on rocks"
+### Skills for AI Agents
 
-
-bunx agent-media@latest video generate --in portrait.png --prompt "person smiles and waves hello" --audio
+Install agent-media skills to your coding agent (Claude Code, Cursor, Codex, etc.):
 
-
-
+```bash
+npx skills add agntswrm/agent-media
 ```
 
-
+This adds media processing skills that your AI agent can use automatically. Available skills:
+- `agent-media` - Overview of all capabilities
+- `image-generate` - Generate images from text
+- `image-resize` - Resize images
+- `image-convert` - Convert image formats
+- `image-remove-background` - Remove backgrounds
+- `audio-extract` - Extract audio from video
+- `audio-transcribe` - Transcribe audio to text
+- `video-generate` - Generate videos from text or images
+
+## Quick Start
 
 ```bash
 # Generate an image
-
+agent-media image generate --prompt "a robot painting a sunset"
 
 # Edit the generated image
-
+agent-media image edit --in .agent-media/generated_*.png --prompt "add a cat watching"
 
 # Remove background
-
+agent-media image remove-background --in .agent-media/edited_*.png
 
-#
-
+# Convert to different format
+agent-media image convert --in .agent-media/nobg_*.png --format webp
 
-# Generate a video from an image (
-
+# Generate a video from an image (with audio)
+agent-media video generate --in woman-portrait.png --prompt "The woman speaks: 'Hello! Welcome to Agent Media.'" --audio --duration 10
 
-#
-
-```
-
-## Installation
-
-```bash
-# Use directly with bunx (no install)
-bunx agent-media@latest --help
-
-# Or with npx
-npx agent-media@latest --help
+# Extract audio from video
+agent-media audio extract --in .agent-media/generated_*.mp4
 
-#
-
-```
-
-### From Source
-
-```bash
-git clone https://github.com/TimPietrusky/agent-media
-cd agent-media
-pnpm install && pnpm build && pnpm link --global
+# Transcribe the audio
+agent-media audio transcribe --in .agent-media/*_extracted_*.mp3
 ```
 
 ## Requirements
 
 - Node.js >= 18.0.0
-- API key
-
----
+- API key from [fal.ai](https://fal.ai/dashboard/keys), [Replicate](https://replicate.com/account/api-tokens), or [Runpod](https://www.runpod.io/console/user/settings) for AI features
 
-
-
-```bash
-# Resize image
-agent-media@latest image resize --in <path> [options]
+**Local processing** (no API key): resize, convert, extend, audio extract, remove-background, transcribe
 
-
-agent-media@latest image convert --in <path> --format <f>
+**Cloud processing** (API key required): image generate, image edit, video generate
 
-
-agent-media@latest image extend --in <path> --padding <px> --color <hex>
+> **Note**: You may see a `mutex lock failed` error when using local remove-background or transcribe — ignore it, the output is correct if JSON shows `"ok": true`.
 
-
-agent-media@latest image generate --prompt <text>
+---
 
-
-agent-media@latest image edit --in <path> --prompt <text>
+## image
 
-
-agent-media
+```bash
+agent-media image resize --in <path> [options]
+agent-media image convert --in <path> --format <f>
+agent-media image extend --in <path> --padding <px> --color <hex>
+agent-media image generate --prompt <text>
+agent-media image edit --in <path> --prompt <text>
+agent-media image remove-background --in <path>
 ```
 
 ### resize
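The 0.6.2 README splits commands into local ones (no API key) and cloud ones (API key required). A minimal bash sketch of how an agent script might check this before calling the CLI. `am_needs_api_key`, `am_has_api_key` and `am_can_run` are hypothetical helpers, not part of agent-media, and they ignore flags such as `--provider` or `--diarize` that may change which backend is used.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of agent-media. The lists mirror the
# "Local processing" and "Cloud processing" lines in the 0.6.2 README.

# Prints "no", "yes" or "unknown" for: am_needs_api_key <group> <command>
am_needs_api_key() {
  case "$1 $2" in
    "image resize" | "image convert" | "image extend" | "image remove-background")
      echo "no" ;;
    "audio extract" | "audio transcribe")
      echo "no" ;;
    "image generate" | "image edit" | "video generate")
      echo "yes" ;;
    *)
      echo "unknown" ;;
  esac
}

# Succeeds if any of the provider keys named in the README is set.
am_has_api_key() {
  [ -n "${FAL_API_KEY:-}" ] || [ -n "${REPLICATE_API_TOKEN:-}" ] || [ -n "${RUNPOD_API_KEY:-}" ]
}

# Fails early, with a message on stderr, when a cloud-only command has no key.
am_can_run() {
  if [ "$(am_needs_api_key "$1" "$2")" = "yes" ] && ! am_has_api_key; then
    echo "agent-media $1 $2 needs FAL_API_KEY, REPLICATE_API_TOKEN or RUNPOD_API_KEY" >&2
    return 1
  fi
}
```

For example, `am_can_run image generate && agent-media image generate --prompt "a robot painting a sunset"` stops before the call when no provider key is exported.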
@@ -132,9 +103,9 @@ agent-media@latest image remove-background --in <path>
 *local*
 
 ```bash
-agent-media
-agent-media
-agent-media
+agent-media image resize --in sunset-mountains.jpg --width 800
+agent-media image resize --in sunset-mountains.jpg --height 600
+agent-media image resize --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.jpg --width 800
 ```
 
 | Option | Description |
@@ -149,9 +120,9 @@ agent-media@latest image resize --in https://ytrzap04kkm0giml.public.blob.vercel
 *local*
 
 ```bash
-agent-media
-agent-media
-agent-media
+agent-media image convert --in sunset-mountains.png --format webp
+agent-media image convert --in sunset-mountains.jpg --format png
+agent-media image convert --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --format jpg --quality 90
 ```
 
 | Option | Description |
@@ -168,8 +139,8 @@ agent-media@latest image convert --in https://ytrzap04kkm0giml.public.blob.verce
 Extend image canvas by adding padding on all sides with a solid background color.
 
 ```bash
-agent-media
-agent-media
+agent-media image extend --in sunset-mountains.jpg --padding 50 --color "#E4ECF8"
+agent-media image extend --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --padding 100 --color "#FFFFFF"
 ```
 
 | Option | Description |
@@ -185,8 +156,8 @@ agent-media@latest image extend --in https://ytrzap04kkm0giml.public.blob.vercel
 *API key required*
 
 ```bash
-agent-media
-agent-media
+agent-media image generate --prompt "a cat wearing a hat"
+agent-media image generate --prompt "sunset over mountains" --width 1024 --height 768
 ```
 
 | Option | Description |
@@ -205,8 +176,8 @@ agent-media@latest image generate --prompt "sunset over mountains" --width 1024
 Edit an image using a text prompt (image-to-image).
 
 ```bash
-agent-media
-agent-media
+agent-media image edit --in sunset-mountains.jpg --prompt "make the sky more vibrant"
+agent-media image edit --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/man-portrait.png --prompt "add sunglasses"
 ```
 
 | Option | Description |
@@ -222,8 +193,8 @@ agent-media@latest image edit --in https://ytrzap04kkm0giml.public.blob.vercel-s
 *API key required*
 
 ```bash
-agent-media
-agent-media
+agent-media image remove-background --in man-portrait.png
+agent-media image remove-background --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/man-portrait.png
 ```
 
 | Option | Description |
@@ -238,10 +209,10 @@ agent-media@latest image remove-background --in https://ytrzap04kkm0giml.public.
 
 ```bash
 # Generate video from text
-agent-media
+agent-media video generate --prompt <text>
 
 # Generate video from image (animate an image)
-agent-media
+agent-media video generate --in <image> --prompt <text>
 ```
 
 ### generate
@@ -252,16 +223,16 @@ Generate video from a text prompt. Optionally provide an input image to animate
 
 ```bash
 # Text-to-video
-agent-media
+agent-media video generate --prompt "a cat walking through a garden"
 
 # Image-to-video (animate an image)
-agent-media
+agent-media video generate --in woman-portrait.png --prompt "person smiles and waves hello"
 
 # With audio generation
-agent-media
+agent-media video generate --prompt "fireworks in the night sky" --audio --duration 10
 
 # Higher resolution
-agent-media
+agent-media video generate --prompt "ocean waves" --resolution 1080p
 ```
 
 | Option | Description |
@@ -283,10 +254,10 @@ agent-media@latest video generate --prompt "ocean waves" --resolution 1080p
 
 ```bash
 # Extract audio from video
-agent-media
+agent-media audio extract --in <video>
 
 # Transcribe audio to text
-agent-media
+agent-media audio transcribe --in <audio>
 ```
 
 ### extract
@@ -296,8 +267,9 @@ agent-media@latest audio transcribe --in <audio>
 Extract audio track from a video file.
 
 ```bash
-agent-media
-agent-media
+agent-media audio extract --in woman-greeting.mp4
+agent-media audio extract --in woman-greeting.mp4 --format wav
+agent-media audio extract --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/woman-greeting.mp4
 ```
 
 | Option | Description |
@@ -313,8 +285,9 @@ agent-media@latest audio extract --in video.mp4 --format wav
 Transcribe audio to text with timestamps. Supports speaker identification.
 
 ```bash
-agent-media
-agent-media
+agent-media audio transcribe --in woman-greeting.mp3
+agent-media audio transcribe --in woman-greeting.mp3 --diarize --speakers 2
+agent-media audio transcribe --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/woman-greeting.mp3
 ```
 
 | Option | Description |
@@ -365,12 +338,14 @@ Exit code is `0` on success, `1` on error.
 
 | Provider | resize | convert | extend | image generate | image edit | remove-background | video generate | transcribe |
 |----------|--------|---------|--------|----------------|------------|-------------------|----------------|------------|
-| **local** |
-| **transformers** | - | - | - | - | - | `Xenova/modnet` | - | `moonshine-base` |
+| **local** | ✓* | ✓* | ✓* | - | - | `Xenova/modnet`** | - | `moonshine-base`** |
 | **fal** | - | - | - | `fal-ai/flux-2` | `fal-ai/flux-2/edit` | `fal-ai/birefnet/v2` | `fal-ai/ltx-2` | `fal-ai/wizper` |
-| **replicate** | - | - | - | `black-forest-labs/flux-2-dev` | `black-forest-labs/flux-kontext-dev` | `men1scus/birefnet` | `lightricks/ltx-video` |
+| **replicate** | - | - | - | `black-forest-labs/flux-2-dev` | `black-forest-labs/flux-kontext-dev` | `men1scus/birefnet` | `lightricks/ltx-video` | `whisper-diarization` |
 | **runpod** | - | - | - | `alibaba/wan-2.6` | `google/nano-banana-pro-edit` | - | - | - |
 
+\* Powered by [Sharp](https://sharp.pixelplumbing.com/) for fast image processing
+\** Powered by [Transformers.js](https://huggingface.co/docs/transformers.js) for local ML inference (models downloaded on first use)
+
 Use `--model <name>` to override the default model for any command.
 
 ### Provider Selection
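The 0.6.2 README warns that local remove-background and transcribe may print a `mutex lock failed` error even when they succeed, and the CLI documents exit code `0` on success and `1` on error. A minimal bash sketch of a wrapper that judges success from the exit status plus the `"ok": true` field rather than from stderr. `am_result_ok` and `am_run` are hypothetical helpers, and the sketch assumes the JSON result is written to stdout, which the README does not state.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of agent-media.

# am_result_ok <exit-status> <json-text>: succeeds only when the status is 0
# and the JSON contains "ok": true. A plain-text match keeps the sketch
# dependency-free; a JSON parser such as jq would be more robust.
am_result_ok() {
  local status="$1" json="$2"
  [ "$status" -eq 0 ] || return 1
  printf '%s\n' "$json" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# am_run <agent-media args...>: runs the CLI, leaves stderr untouched (so a
# harmless "mutex lock failed" line is still visible), and prints stdout on
# success. Assumes agent-media is on PATH and writes its JSON to stdout.
am_run() {
  local out status
  out="$(agent-media "$@")"
  status=$?
  if am_result_ok "$status" "$out"; then
    printf '%s\n' "$out"
  else
    echo "agent-media $* failed (exit $status)" >&2
    return 1
  fi
}
```

For example, `am_run image remove-background --in man-portrait.png` treats the run as successful only if both checks pass, whatever appears on stderr.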
@@ -387,12 +362,11 @@ Use `--model <name>` to override the default model for any command.
 | `FAL_API_KEY` | fal.ai API key | [fal.ai](https://fal.ai/dashboard/keys) |
 | `REPLICATE_API_TOKEN` | Replicate API token | [replicate.com](https://replicate.com/account/api-tokens) |
 | `RUNPOD_API_KEY` | Runpod API key | [runpod.io](https://www.runpod.io/console/user/settings) |
-| `HUGGINGFACE_ACCESS_TOKEN` | For transcription with speaker ID (replicate only) | [huggingface.co](https://huggingface.co/settings/tokens) |
 | `AGENT_MEDIA_DIR` | Output directory (default: `.agent-media/`) | - |
 
 ## Roadmap
 
-- [x] Local
-- [x] Local
+- [x] Local background removal (zero API keys)
+- [x] Local transcription (zero API keys)
 - [x] Video generation (text-to-video and image-to-video)
 - [ ] Batch processing support
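The Quick Start feeds globs such as `.agent-media/generated_*.png` into `--in`. After several runs, the output directory (`AGENT_MEDIA_DIR`, default `.agent-media/`) can hold more than one match, so the glob expands to several paths, and the README does not say how `--in` treats that. A minimal bash sketch, assuming you want only the newest match; `am_latest` is a hypothetical helper, not part of agent-media.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of agent-media: print the most recently
# modified file in the output directory that matches a glob pattern.
# Uses AGENT_MEDIA_DIR when set, else the documented default .agent-media.
# Usage: am_latest 'generated_*.png'
am_latest() {
  local dir="${AGENT_MEDIA_DIR:-.agent-media}"
  local newest="" f
  # $1 is left unquoted on purpose so the shell expands the glob.
  for f in "$dir"/$1; do
    [ -e "$f" ] || continue            # no match: the literal pattern remains
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  [ -n "$newest" ] || return 1         # nothing matched
  printf '%s\n' "$newest"
}
```

For example: `agent-media image edit --in "$(am_latest 'generated_*.png')" --prompt "add a cat watching"`.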
package/package.json
CHANGED
@@ -1,11 +1,11 @@
 {
   "name": "agent-media",
-  "version": "0.6.
+  "version": "0.6.2",
   "description": "Agent-first media toolkit CLI",
   "license": "Apache-2.0",
   "repository": {
     "type": "git",
-    "url": "https://github.com/
+    "url": "https://github.com/agntswrm/agent-media.git",
     "directory": "packages/agent-media"
   },
   "keywords": [
@@ -34,11 +34,11 @@
   "dependencies": {
     "commander": "^12.0.0",
     "dotenv": "^17.2.3",
-    "@agent-media/audio": "0.4.
-    "@agent-media/core": "0.5.
-    "@agent-media/image": "0.3.
-    "@agent-media/providers": "0.5.
-    "@agent-media/video": "0.2.
+    "@agent-media/audio": "0.4.3",
+    "@agent-media/core": "0.5.1",
+    "@agent-media/image": "0.3.3",
+    "@agent-media/providers": "0.5.2",
+    "@agent-media/video": "0.2.2"
   },
   "devDependencies": {
     "@types/node": "^22.0.0",