agent-media 0.3.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +163 -66
  2. package/package.json +4 -4
package/README.md CHANGED
@@ -1,33 +1,73 @@
  # agent-media
 
- Media processing CLI for AI agents. Resize, convert, generate, and remove backgrounds from images.
+ Media processing CLI for AI agents.
 
- ## Installation
+ - **Image**: generate, edit, remove-background, resize, convert, extend
+ - **Video**: extract audio
+ - **Audio**: transcribe (with speaker identification)
+
+ ## Quick Start
+
+ Requires an API key from one of these providers:
+
+ - [fal.ai](https://fal.ai/dashboard/keys) → `FAL_API_KEY`
+ - [Replicate](https://replicate.com/account/api-tokens) → `REPLICATE_API_TOKEN`
+ - [Runpod](https://www.runpod.io/console/user/settings) → `RUNPOD_API_KEY`
+
+ ```bash
+ # Generate an image
+ npx agent-media image generate --prompt "a robot painting a sunset"
+
+ # Edit the generated image
+ npx agent-media image edit --in .agent-media/generated_*.png --prompt "add a cat watching"
+
+ # Remove background
+ npx agent-media image remove-background --in .agent-media/edited_*.png
+
+ # Convert to webp
+ npx agent-media image convert --in .agent-media/nobg_*.png --format webp
+ ```
+
+ **Video to transcript** (no API key needed for extract)
 
- ### npm (recommended)
+ ```bash
+ # Extract audio from video (local, no API key)
+ npx agent-media audio extract --in video.mp4
+
+ # Transcribe with speaker identification
+ npx agent-media audio transcribe --in .agent-media/extracted_*.mp3 --diarize
+ ```
+
+ **Local processing** (no API key needed)
+
+ ```bash
+ npx agent-media image resize --in photo.jpg --width 800
+ npx agent-media image convert --in photo.png --format webp
+ npx agent-media image extend --in photo.jpg --padding 50 --color "#FFFFFF"
+ ```
+
+ ## Installation
 
  ```bash
+ # Use directly with npx (no install)
+ npx agent-media --help
+
+ # Or install globally
  npm install -g agent-media
  ```
 
  ### From Source
 
  ```bash
- git clone https://github.com/anthropics/agent-media
+ git clone https://github.com/TimPietrusky/agent-media
  cd agent-media
- pnpm install
- pnpm build
- pnpm link --global
+ pnpm install && pnpm build && pnpm link --global
  ```
 
- ## Quick Start
+ ## Requirements
 
- ```bash
- agent-media image resize --in photo.jpg --width 800
- agent-media image convert --in photo.png --format webp
- agent-media image remove-background --in portrait.jpg
- agent-media image generate --prompt "a red robot"
- ```
+ - Node.js >= 18.0.0
+ - API key for AI features (generate, edit, remove-background, transcribe)
 
  ## Commands
 
@@ -38,9 +78,20 @@ agent-media image resize --in <path> [options] # Resize image
  agent-media image convert --in <path> --format <f> # Convert format
  agent-media image remove-background --in <path> # Remove background
  agent-media image generate --prompt <text> # Generate from prompt
+ agent-media image extend --in <path> --padding <px> --color <hex> # Extend canvas
+ agent-media image edit --in <path> --prompt <text> # Edit with prompt
+ ```
+
+ ### Audio Commands
+
+ ```bash
+ agent-media audio extract --in <video> # Extract audio from video
+ agent-media audio transcribe --in <audio> # Transcribe audio to text
  ```
 
- ### Resize
+ ---
+
+ ### resize
 
  ```bash
  agent-media image resize --in photo.jpg --width 800
@@ -56,7 +107,7 @@ agent-media image resize --in photo.jpg --width 800 --height 600
  | `--out <dir>` | Output directory |
  | `--provider <name>` | Provider (local) |
 
- ### Convert
+ ### convert
 
  ```bash
  agent-media image convert --in photo.png --format webp
@@ -72,7 +123,7 @@ agent-media image convert --in photo.png --format jpg --quality 90
  | `--out <dir>` | Output directory |
  | `--provider <name>` | Provider (local) |
 
- ### Remove Background
+ ### remove-background
 
  ```bash
  agent-media image remove-background --in portrait.jpg
@@ -85,7 +136,7 @@ agent-media image remove-background --in https://example.com/photo.jpg
  | `--out <dir>` | Output directory |
  | `--provider <name>` | Provider (fal, replicate) |
 
- ### Generate
+ ### generate
 
  ```bash
  agent-media image generate --prompt "a cat wearing a hat"
@@ -99,6 +150,76 @@ agent-media image generate --prompt "sunset over mountains" --width 1024 --heigh
  | `--height <px>` | Height (default: 1024) |
  | `--out <dir>` | Output directory |
  | `--provider <name>` | Provider (fal, replicate, runpod) |
+ | `--model <name>` | Model override (e.g., `fal-ai/flux-2`, `black-forest-labs/flux-2-dev`) |
+
+ ### extend
+
+ Extend image canvas by adding padding on all sides with a solid background color.
+
+ ```bash
+ agent-media image extend --in photo.jpg --padding 50 --color "#E4ECF8"
+ agent-media image extend --in photo.png --padding 100 --color "#FFFFFF" --dpi 300
+ ```
+
+ | Option | Description |
+ |--------|-------------|
+ | `--in <path>` | Input file path or URL (required) |
+ | `--padding <px>` | Padding size in pixels to add on all sides (required) |
+ | `--color <hex>` | Background color for extended area (required). Also flattens transparency. |
+ | `--dpi <n>` | DPI/density for output image (default: 300) |
+ | `--out <dir>` | Output directory |
+ | `--provider <name>` | Provider (local) |
+
+ ### edit
+
+ Edit an image using a text prompt (image-to-image).
+
+ ```bash
+ agent-media image edit --in photo.jpg --prompt "make the sky more vibrant"
+ agent-media image edit --in portrait.jpg --prompt "add sunglasses"
+ ```
+
+ | Option | Description |
+ |--------|-------------|
+ | `--in <path>` | Input file path or URL (required) |
+ | `--prompt <text>` | Text description of the desired edit (required) |
+ | `--out <dir>` | Output directory |
+ | `--provider <name>` | Provider (fal, replicate, runpod) |
+ | `--model <name>` | Model override (e.g., `fal-ai/flux-2/edit`) |
+
+ ### audio extract
+
+ Extract audio track from a video file. Uses local ffmpeg, no API key needed.
+
+ ```bash
+ agent-media audio extract --in video.mp4
+ agent-media audio extract --in video.mp4 --format wav
+ ```
+
+ | Option | Description |
+ |--------|-------------|
+ | `--in <path>` | Input video file path or URL (required) |
+ | `--format <f>` | Output format: mp3, wav (default: mp3) |
+ | `--out <dir>` | Output directory |
+
+ ### audio transcribe
+
+ Transcribe audio to text with timestamps. Supports speaker identification.
+
+ ```bash
+ agent-media audio transcribe --in audio.mp3
+ agent-media audio transcribe --in audio.mp3 --diarize --speakers 2
+ ```
+
+ | Option | Description |
+ |--------|-------------|
+ | `--in <path>` | Input audio file path or URL (required) |
+ | `--diarize` | Enable speaker identification |
+ | `--language <code>` | Language code (auto-detected if not provided) |
+ | `--speakers <n>` | Number of speakers hint |
+ | `--out <dir>` | Output directory |
+ | `--provider <name>` | Provider (fal, replicate) |
+ | `--model <name>` | Model override |
 
  ## Output Format
 
@@ -132,49 +253,16 @@ Exit code is `0` on success, `1` on error.
 
  ## Providers
 
- ### Local (default)
-
- Uses Sharp for image processing. No API key required.
-
- **Supports:** resize, convert
-
- ```bash
- agent-media image resize --in photo.jpg --width 800 # Uses local automatically
- ```
-
- ### Fal
-
- Uses fal.ai for AI-powered image operations.
-
- **Supports:** generate, remove-background
-
- ```bash
- export FAL_API_KEY=your-key
- agent-media image generate --prompt "a red robot"
- agent-media image remove-background --in photo.jpg
- ```
-
- ### Replicate
-
- Uses Replicate for AI-powered image operations.
-
- **Supports:** generate, remove-background
-
- ```bash
- export REPLICATE_API_TOKEN=your-token
- agent-media image generate --prompt "a red robot" --provider replicate
- ```
-
- ### Runpod
-
- Uses Runpod for AI-powered image generation.
+ ### Default Models
 
- **Supports:** generate
+ | Provider | resize | convert | extend | generate | edit | remove-background | transcribe |
+ |----------|--------|---------|--------|----------|------|-------------------|------------|
+ | **local** | ✓ | ✓ | ✓ | - | - | - | - |
+ | **fal** | - | - | - | `fal-ai/flux-2` | `fal-ai/flux-2/edit` | `fal-ai/birefnet/v2` | `fal-ai/wizper` |
+ | **replicate** | - | - | - | `black-forest-labs/flux-2-dev` | `black-forest-labs/flux-kontext-dev` | `men1scus/birefnet` | WhisperX |
+ | **runpod** | - | - | - | `alibaba/wan-2.6` | `google/nano-banana-pro-edit` | - | - |
 
- ```bash
- export RUNPOD_API_KEY=your-key
- agent-media image generate --prompt "a red robot" --provider runpod
- ```
+ Use `--model <name>` to override the default model for any command.
 
  ### Provider Selection
 
@@ -185,12 +273,13 @@ agent-media image generate --prompt "a red robot" --provider runpod
 
  ## Environment Variables
 
- | Variable | Description |
- |----------|-------------|
- | `FAL_API_KEY` | fal.ai API key |
- | `REPLICATE_API_TOKEN` | Replicate API key |
- | `RUNPOD_API_KEY` | Runpod API key |
- | `AGENT_MEDIA_DIR` | Output directory (default: `.agent-media/`) |
+ | Variable | Description | Get Key |
+ |----------|-------------|---------|
+ | `FAL_API_KEY` | fal.ai API key | [fal.ai](https://fal.ai/dashboard/keys) |
+ | `REPLICATE_API_TOKEN` | Replicate API token | [replicate.com](https://replicate.com/account/api-tokens) |
+ | `RUNPOD_API_KEY` | Runpod API key | [runpod.io](https://www.runpod.io/console/user/settings) |
+ | `HUGGINGFACE_ACCESS_TOKEN` | For transcription with speaker ID (replicate only) | [huggingface.co](https://huggingface.co/settings/tokens) |
+ | `AGENT_MEDIA_DIR` | Output directory (default: `.agent-media/`) | - |
 
  ## Usage with AI Agents
 
@@ -208,13 +297,21 @@ Add to your project instructions:
  ```markdown
  ## Media Processing
 
- Use `agent-media` for image operations. Run `agent-media --help` for commands.
+ Use `agent-media` for image and audio operations. Run `agent-media --help` for commands.
 
  - `agent-media image resize --in <path> --width <px>` - Resize image
  - `agent-media image convert --in <path> --format <f>` - Convert format
  - `agent-media image generate --prompt <text>` - Generate image
+ - `agent-media image edit --in <path> --prompt <text>` - Edit image
  - `agent-media image remove-background --in <path>` - Remove background
+ - `agent-media audio extract --in <video>` - Extract audio from video
+ - `agent-media audio transcribe --in <audio>` - Transcribe audio
 
  All commands output JSON with `ok: true/false` and exit 0/1.
  ```
 
+ ## Roadmap
+
+ - [ ] Local CPU background removal via transformers.js/ONNX (zero API keys)
+ - [ ] Video processing actions
+ - [ ] Batch processing support
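
Note on the README's output contract: the agent instructions in the hunk above state that all commands output JSON with `ok: true/false` and exit 0/1. A caller could check both signals together, roughly as in the sketch below. This is only an illustration under stated assumptions: `AgentMediaResult` and `parseAgentMediaResult` are hypothetical names that are not part of the package, and only the `ok` field is confirmed by the README text shown in this diff; any other fields are left untyped.

```typescript
// Hypothetical result shape: only `ok` is confirmed by the README text.
interface AgentMediaResult {
  ok: boolean;
  [key: string]: unknown;
}

// Hypothetical helper: parse the CLI's stdout and reconcile it with the exit code.
function parseAgentMediaResult(stdout: string, exitCode: number): AgentMediaResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    throw new Error("agent-media output was not valid JSON");
  }
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { ok?: unknown }).ok !== "boolean"
  ) {
    throw new Error("agent-media output is missing a boolean `ok` field");
  }
  const result = parsed as AgentMediaResult;
  // README: exit code 0 on success, 1 on error. Treat a mismatch as a failure.
  if (result.ok !== (exitCode === 0)) {
    throw new Error(`exit code ${exitCode} disagrees with ok=${result.ok}`);
  }
  return result;
}
```

In practice the two arguments could come from Node's `spawnSync("agent-media", [...], { encoding: "utf8" })`, passing its `stdout` and `status`. Rejecting a mismatch between `ok` and the exit code is a design choice of this sketch, not documented behavior of the CLI.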
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agent-media",
- "version": "0.3.1",
+ "version": "0.3.2",
  "description": "Agent-first media toolkit CLI",
  "license": "Apache-2.0",
  "repository": {
@@ -34,10 +34,10 @@
  "dependencies": {
  "commander": "^12.0.0",
  "dotenv": "^17.2.3",
- "@agent-media/core": "0.3.0",
  "@agent-media/audio": "0.3.0",
- "@agent-media/providers": "0.2.0",
- "@agent-media/image": "0.2.0"
+ "@agent-media/core": "0.3.0",
+ "@agent-media/image": "0.2.0",
+ "@agent-media/providers": "0.2.0"
  },
  "devDependencies": {
  "@types/node": "^22.0.0",