agent-media 0.3.1 → 0.3.2
- package/README.md +163 -66
- package/package.json +4 -4
package/README.md
CHANGED
@@ -1,33 +1,73 @@
 # agent-media
 
-Media processing CLI for AI agents.
+Media processing CLI for AI agents.
 
-
+- **Image**: generate, edit, remove-background, resize, convert, extend
+- **Video**: extract audio
+- **Audio**: transcribe (with speaker identification)
+
+## Quick Start
+
+Requires an API key from one of these providers:
+
+- [fal.ai](https://fal.ai/dashboard/keys) → `FAL_API_KEY`
+- [Replicate](https://replicate.com/account/api-tokens) → `REPLICATE_API_TOKEN`
+- [Runpod](https://www.runpod.io/console/user/settings) → `RUNPOD_API_KEY`
+
+```bash
+# Generate an image
+npx agent-media image generate --prompt "a robot painting a sunset"
+
+# Edit the generated image
+npx agent-media image edit --in .agent-media/generated_*.png --prompt "add a cat watching"
+
+# Remove background
+npx agent-media image remove-background --in .agent-media/edited_*.png
+
+# Convert to webp
+npx agent-media image convert --in .agent-media/nobg_*.png --format webp
+```
+
+**Video to transcript** (no API key needed for extract)
 
-
+```bash
+# Extract audio from video (local, no API key)
+npx agent-media audio extract --in video.mp4
+
+# Transcribe with speaker identification
+npx agent-media audio transcribe --in .agent-media/extracted_*.mp3 --diarize
+```
+
+**Local processing** (no API key needed)
+
+```bash
+npx agent-media image resize --in photo.jpg --width 800
+npx agent-media image convert --in photo.png --format webp
+npx agent-media image extend --in photo.jpg --padding 50 --color "#FFFFFF"
+```
+
+## Installation
 
 ```bash
+# Use directly with npx (no install)
+npx agent-media --help
+
+# Or install globally
 npm install -g agent-media
 ```
 
 ### From Source
 
 ```bash
-git clone https://github.com/
+git clone https://github.com/TimPietrusky/agent-media
 cd agent-media
-pnpm install
-pnpm build
-pnpm link --global
+pnpm install && pnpm build && pnpm link --global
 ```
 
-##
+## Requirements
 
-
-
-agent-media image convert --in photo.png --format webp
-agent-media image remove-background --in portrait.jpg
-agent-media image generate --prompt "a red robot"
-```
+- Node.js >= 18.0.0
+- API key for AI features (generate, edit, remove-background, transcribe)
 
 ## Commands
 
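The Quick Start added in the hunk above pairs each provider with one environment variable. As a minimal sketch of a pre-flight check before running the AI commands, the following lists which of those variables are set. The helper name `configured_providers` is hypothetical, and the check only tests for non-empty variables; how the CLI itself chooses a provider is not shown in this diff.

```bash
# Sketch: report which provider keys named in the README are present.
# configured_providers is a hypothetical helper, not part of agent-media.
configured_providers() {
  local found=""
  if [ -n "${FAL_API_KEY:-}" ]; then found="$found fal"; fi
  if [ -n "${REPLICATE_API_TOKEN:-}" ]; then found="$found replicate"; fi
  if [ -n "${RUNPOD_API_KEY:-}" ]; then found="$found runpod"; fi
  # Drop the leading space; prints an empty line when no key is set.
  echo "${found# }"
}
```

An empty result suggests that only the local commands (resize, convert, extend, audio extract) are usable without further setup.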
@@ -38,9 +78,20 @@ agent-media image resize --in <path> [options] # Resize image
 agent-media image convert --in <path> --format <f> # Convert format
 agent-media image remove-background --in <path> # Remove background
 agent-media image generate --prompt <text> # Generate from prompt
+agent-media image extend --in <path> --padding <px> --color <hex> # Extend canvas
+agent-media image edit --in <path> --prompt <text> # Edit with prompt
+```
+
+### Audio Commands
+
+```bash
+agent-media audio extract --in <video> # Extract audio from video
+agent-media audio transcribe --in <audio> # Transcribe audio to text
 ```
 
-
+---
+
+### resize
 
 ```bash
 agent-media image resize --in photo.jpg --width 800
@@ -56,7 +107,7 @@ agent-media image resize --in photo.jpg --width 800 --height 600
 | `--out <dir>` | Output directory |
 | `--provider <name>` | Provider (local) |
 
-###
+### convert
 
 ```bash
 agent-media image convert --in photo.png --format webp
@@ -72,7 +123,7 @@ agent-media image convert --in photo.png --format jpg --quality 90
 | `--out <dir>` | Output directory |
 | `--provider <name>` | Provider (local) |
 
-###
+### remove-background
 
 ```bash
 agent-media image remove-background --in portrait.jpg
@@ -85,7 +136,7 @@ agent-media image remove-background --in https://example.com/photo.jpg
 | `--out <dir>` | Output directory |
 | `--provider <name>` | Provider (fal, replicate) |
 
-###
+### generate
 
 ```bash
 agent-media image generate --prompt "a cat wearing a hat"
@@ -99,6 +150,76 @@ agent-media image generate --prompt "sunset over mountains" --width 1024 --heigh
 | `--height <px>` | Height (default: 1024) |
 | `--out <dir>` | Output directory |
 | `--provider <name>` | Provider (fal, replicate, runpod) |
+| `--model <name>` | Model override (e.g., `fal-ai/flux-2`, `black-forest-labs/flux-2-dev`) |
+
+### extend
+
+Extend image canvas by adding padding on all sides with a solid background color.
+
+```bash
+agent-media image extend --in photo.jpg --padding 50 --color "#E4ECF8"
+agent-media image extend --in photo.png --padding 100 --color "#FFFFFF" --dpi 300
+```
+
+| Option | Description |
+|--------|-------------|
+| `--in <path>` | Input file path or URL (required) |
+| `--padding <px>` | Padding size in pixels to add on all sides (required) |
+| `--color <hex>` | Background color for extended area (required). Also flattens transparency. |
+| `--dpi <n>` | DPI/density for output image (default: 300) |
+| `--out <dir>` | Output directory |
+| `--provider <name>` | Provider (local) |
+
+### edit
+
+Edit an image using a text prompt (image-to-image).
+
+```bash
+agent-media image edit --in photo.jpg --prompt "make the sky more vibrant"
+agent-media image edit --in portrait.jpg --prompt "add sunglasses"
+```
+
+| Option | Description |
+|--------|-------------|
+| `--in <path>` | Input file path or URL (required) |
+| `--prompt <text>` | Text description of the desired edit (required) |
+| `--out <dir>` | Output directory |
+| `--provider <name>` | Provider (fal, replicate, runpod) |
+| `--model <name>` | Model override (e.g., `fal-ai/flux-2/edit`) |
+
+### audio extract
+
+Extract audio track from a video file. Uses local ffmpeg, no API key needed.
+
+```bash
+agent-media audio extract --in video.mp4
+agent-media audio extract --in video.mp4 --format wav
+```
+
+| Option | Description |
+|--------|-------------|
+| `--in <path>` | Input video file path or URL (required) |
+| `--format <f>` | Output format: mp3, wav (default: mp3) |
+| `--out <dir>` | Output directory |
+
+### audio transcribe
+
+Transcribe audio to text with timestamps. Supports speaker identification.
+
+```bash
+agent-media audio transcribe --in audio.mp3
+agent-media audio transcribe --in audio.mp3 --diarize --speakers 2
+```
+
+| Option | Description |
+|--------|-------------|
+| `--in <path>` | Input audio file path or URL (required) |
+| `--diarize` | Enable speaker identification |
+| `--language <code>` | Language code (auto-detected if not provided) |
+| `--speakers <n>` | Number of speakers hint |
+| `--out <dir>` | Output directory |
+| `--provider <name>` | Provider (fal, replicate) |
+| `--model <name>` | Model override |
 
 ## Output Format
 
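The new `extend` section describes `--padding <px>` as padding added on all sides. Assuming that means the same amount on each of the four edges (the README does not spell out the arithmetic), each dimension would grow by twice the padding. A small sketch of that calculation, with `extended_size` as a hypothetical helper:

```bash
# Sketch: expected canvas size after `image extend`, ASSUMING --padding is
# applied once to each of the four edges (left/right and top/bottom).
# Usage: extended_size <width> <height> <padding>
extended_size() {
  echo "$(( $1 + 2 * $3 ))x$(( $2 + 2 * $3 ))"
}
```

Under that assumption, an 800x600 input with `--padding 50` would come out as 900x700.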
@@ -132,49 +253,16 @@ Exit code is `0` on success, `1` on error.
 
 ## Providers
 
-###
-
-Uses Sharp for image processing. No API key required.
-
-**Supports:** resize, convert
-
-```bash
-agent-media image resize --in photo.jpg --width 800 # Uses local automatically
-```
-
-### Fal
-
-Uses fal.ai for AI-powered image operations.
-
-**Supports:** generate, remove-background
-
-```bash
-export FAL_API_KEY=your-key
-agent-media image generate --prompt "a red robot"
-agent-media image remove-background --in photo.jpg
-```
-
-### Replicate
-
-Uses Replicate for AI-powered image operations.
-
-**Supports:** generate, remove-background
-
-```bash
-export REPLICATE_API_TOKEN=your-token
-agent-media image generate --prompt "a red robot" --provider replicate
-```
-
-### Runpod
-
-Uses Runpod for AI-powered image generation.
+### Default Models
 
-
+| Provider | resize | convert | extend | generate | edit | remove-background | transcribe |
+|----------|--------|---------|--------|----------|------|-------------------|------------|
+| **local** | ✓ | ✓ | ✓ | - | - | - | - |
+| **fal** | - | - | - | `fal-ai/flux-2` | `fal-ai/flux-2/edit` | `fal-ai/birefnet/v2` | `fal-ai/wizper` |
+| **replicate** | - | - | - | `black-forest-labs/flux-2-dev` | `black-forest-labs/flux-kontext-dev` | `men1scus/birefnet` | WhisperX |
+| **runpod** | - | - | - | `alibaba/wan-2.6` | `google/nano-banana-pro-edit` | - | - |
 
-
-export RUNPOD_API_KEY=your-key
-agent-media image generate --prompt "a red robot" --provider runpod
-```
+Use `--model <name>` to override the default model for any command.
 
 ### Provider Selection
 
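The Default Models table added in this hunk can be read as a lookup from provider and action to a model name. The sketch below transcribes the table into a shell `case` statement so a script can log which model a run would presumably use. `default_model` is a hypothetical helper; the values are copied from the README at 0.3.2 and may change in later releases.

```bash
# Sketch: the README's Default Models table as a lookup.
# Usage: default_model <provider> <action>
# Returns non-zero when the table lists no model ("-") for that pair.
default_model() {
  case "$1/$2" in
    fal/generate)                echo "fal-ai/flux-2" ;;
    fal/edit)                    echo "fal-ai/flux-2/edit" ;;
    fal/remove-background)       echo "fal-ai/birefnet/v2" ;;
    fal/transcribe)              echo "fal-ai/wizper" ;;
    replicate/generate)          echo "black-forest-labs/flux-2-dev" ;;
    replicate/edit)              echo "black-forest-labs/flux-kontext-dev" ;;
    replicate/remove-background) echo "men1scus/birefnet" ;;
    replicate/transcribe)        echo "WhisperX" ;;
    runpod/generate)             echo "alibaba/wan-2.6" ;;
    runpod/edit)                 echo "google/nano-banana-pro-edit" ;;
    *)                           return 1 ;;
  esac
}
```

For example, `default_model replicate edit` prints `black-forest-labs/flux-kontext-dev`, while `default_model runpod transcribe` fails because the table shows no transcription model for Runpod.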
@@ -185,12 +273,13 @@ agent-media image generate --prompt "a red robot" --provider runpod
 
 ## Environment Variables
 
-| Variable | Description |
-
-| `FAL_API_KEY` | fal.ai API key |
-| `REPLICATE_API_TOKEN` | Replicate API
-| `RUNPOD_API_KEY` | Runpod API key |
-| `
+| Variable | Description | Get Key |
+|----------|-------------|---------|
+| `FAL_API_KEY` | fal.ai API key | [fal.ai](https://fal.ai/dashboard/keys) |
+| `REPLICATE_API_TOKEN` | Replicate API token | [replicate.com](https://replicate.com/account/api-tokens) |
+| `RUNPOD_API_KEY` | Runpod API key | [runpod.io](https://www.runpod.io/console/user/settings) |
+| `HUGGINGFACE_ACCESS_TOKEN` | For transcription with speaker ID (replicate only) | [huggingface.co](https://huggingface.co/settings/tokens) |
+| `AGENT_MEDIA_DIR` | Output directory (default: `.agent-media/`) | - |
 
 ## Usage with AI Agents
 
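The table in this hunk documents `AGENT_MEDIA_DIR` with a default of `.agent-media/`, and the Quick Start feeds each step's result into the next with globs such as `.agent-media/generated_*.png`. A rough sketch of resolving that directory and picking the newest matching file follows. Both helper names are hypothetical, and the `<prefix>_*.<ext>` naming is inferred from the Quick Start globs rather than from a documented contract.

```bash
# Sketch: resolve the output directory the same way the README describes it.
output_dir() {
  printf '%s\n' "${AGENT_MEDIA_DIR:-.agent-media/}"
}

# Sketch: newest file named <prefix>_*.<ext> in the output directory.
# Usage: newest_output generated png
# Relies on `ls -t` (newest first), so it suits simple file names only.
newest_output() {
  local dir
  dir=$(output_dir)
  ls -t "${dir%/}/$1"_*."$2" 2>/dev/null | head -n 1
}
```

Passing a single path this way, for instance `--in "$(newest_output generated png)"`, avoids handing several files to `--in` when a glob matches more than one earlier result.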
@@ -208,13 +297,21 @@ Add to your project instructions:
 ```markdown
 ## Media Processing
 
-Use `agent-media` for image operations. Run `agent-media --help` for commands.
+Use `agent-media` for image and audio operations. Run `agent-media --help` for commands.
 
 - `agent-media image resize --in <path> --width <px>` - Resize image
 - `agent-media image convert --in <path> --format <f>` - Convert format
 - `agent-media image generate --prompt <text>` - Generate image
+- `agent-media image edit --in <path> --prompt <text>` - Edit image
 - `agent-media image remove-background --in <path>` - Remove background
+- `agent-media audio extract --in <video>` - Extract audio from video
+- `agent-media audio transcribe --in <audio>` - Transcribe audio
 
 All commands output JSON with `ok: true/false` and exit 0/1.
 ```
 
+## Roadmap
+
+- [ ] Local CPU background removal via transformers.js/ONNX (zero API keys)
+- [ ] Video processing actions
+- [ ] Batch processing support
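The agent instructions in this hunk state that every command exits 0 on success and 1 on error. A minimal sketch of chaining steps on that exit-code contract alone, without parsing the JSON (whose full shape is not shown in this diff); `run_steps` is a hypothetical helper, not part of agent-media:

```bash
# Sketch: run each command line in order and stop at the first failure.
# Each argument is one complete command line, executed via `sh -c`.
run_steps() {
  local step
  for step in "$@"; do
    if ! sh -c "$step"; then
      echo "step failed: $step" >&2
      return 1
    fi
  done
}
```

A pipeline such as the README's video-to-transcript flow could then be written as `run_steps 'agent-media audio extract --in video.mp4' 'agent-media audio transcribe --in audio.mp3'`, where the second step runs only if the first exits 0.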
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "agent-media",
-"version": "0.3.
+"version": "0.3.2",
 "description": "Agent-first media toolkit CLI",
 "license": "Apache-2.0",
 "repository": {
@@ -34,10 +34,10 @@
 "dependencies": {
 "commander": "^12.0.0",
 "dotenv": "^17.2.3",
-"@agent-media/core": "0.3.0",
 "@agent-media/audio": "0.3.0",
-"@agent-media/
-"@agent-media/image": "0.2.0"
+"@agent-media/core": "0.3.0",
+"@agent-media/image": "0.2.0",
+"@agent-media/providers": "0.2.0"
 },
 "devDependencies": {
 "@types/node": "^22.0.0",