npm - agent-media - Versions diffs - 0.3.1 → 0.3.2 - Mend

agent-media 0.3.1 → 0.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2)

package/README.md +163 -66
package/package.json +4 -4

package/README.md CHANGED

@@ -1,33 +1,73 @@
 # agent-media
-Media processing CLI for AI agents. Resize, convert, generate, and remove backgrounds from images.
+Media processing CLI for AI agents.
-## Installation
+- **Image**: generate, edit, remove-background, resize, convert, extend
+- **Video**: extract audio
+- **Audio**: transcribe (with speaker identification)
+## Quick Start
+Requires an API key from one of these providers:
+- [fal.ai](https://fal.ai/dashboard/keys) → `FAL_API_KEY`
+- [Replicate](https://replicate.com/account/api-tokens) → `REPLICATE_API_TOKEN`
+- [Runpod](https://www.runpod.io/console/user/settings) → `RUNPOD_API_KEY`
+```bash
+# Generate an image
+npx agent-media image generate --prompt "a robot painting a sunset"
+# Edit the generated image
+npx agent-media image edit --in .agent-media/generated_*.png --prompt "add a cat watching"
+# Remove background
+npx agent-media image remove-background --in .agent-media/edited_*.png
+# Convert to webp
+npx agent-media image convert --in .agent-media/nobg_*.png --format webp
+```
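The Quick Start commands above chain steps with globs such as `.agent-media/generated_*.png`. After repeated runs a glob like this can expand to several files. The sketch below picks only the newest match. `latest_match` is a hypothetical helper, not part of agent-media, and it assumes file names without newlines.

```shell
# latest_match is a hypothetical helper, not part of agent-media.
# Prints the most recently modified file among the paths given as arguments.
# Prints nothing if no path exists (for example, when a glob matched no files).
latest_match() {
  ls -t -- "$@" 2>/dev/null | head -n 1
}

# Example (requires agent-media and a previously generated image):
#   npx agent-media image edit --in "$(latest_match .agent-media/generated_*.png)" --prompt "add a cat watching"
```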
+**Video to transcript** (no API key needed for extract)
-### npm (recommended)
+```bash
+# Extract audio from video (local, no API key)
+npx agent-media audio extract --in video.mp4
+# Transcribe with speaker identification
+npx agent-media audio transcribe --in .agent-media/extracted_*.mp3 --diarize
+```
+**Local processing** (no API key needed)
+```bash
+npx agent-media image resize --in photo.jpg --width 800
+npx agent-media image convert --in photo.png --format webp
+npx agent-media image extend --in photo.jpg --padding 50 --color "#FFFFFF"
+```
+## Installation
 ```bash
+# Use directly with npx (no install)
+npx agent-media --help
+# Or install globally
 npm install -g agent-media
 ```
 ### From Source
 ```bash
-git clone https://github.com/anthropics/agent-media
+git clone https://github.com/TimPietrusky/agent-media
 cd agent-media
-pnpm install
-pnpm build
-pnpm link --global
+pnpm install && pnpm build && pnpm link --global
 ```
-## Quick Start
+## Requirements
-```bash
-agent-media image resize --in photo.jpg --width 800
-agent-media image convert --in photo.png --format webp
-agent-media image remove-background --in portrait.jpg
-agent-media image generate --prompt "a red robot"
-```
+- Node.js >= 18.0.0
+- API key for AI features (generate, edit, remove-background, transcribe)
 ## Commands
@@ -38,9 +78,20 @@ agent-media image resize --in <path> [options]      # Resize image
 agent-media image convert --in <path> --format <f>  # Convert format
 agent-media image remove-background --in <path>     # Remove background
 agent-media image generate --prompt <text>          # Generate from prompt
+agent-media image extend --in <path> --padding <px> --color <hex>  # Extend canvas
+agent-media image edit --in <path> --prompt <text>  # Edit with prompt
+```
+### Audio Commands
+```bash
+agent-media audio extract --in <video>              # Extract audio from video
+agent-media audio transcribe --in <audio>           # Transcribe audio to text
 ```
-### Resize
+---
+### resize
 ```bash
 agent-media image resize --in photo.jpg --width 800
@@ -56,7 +107,7 @@ agent-media image resize --in photo.jpg --width 800 --height 600
 | `--out <dir>` | Output directory |
 | `--provider <name>` | Provider (local) |
-### Convert
+### convert
 ```bash
 agent-media image convert --in photo.png --format webp
@@ -72,7 +123,7 @@ agent-media image convert --in photo.png --format jpg --quality 90
 | `--out <dir>` | Output directory |
 | `--provider <name>` | Provider (local) |
-### Remove Background
+### remove-background
 ```bash
 agent-media image remove-background --in portrait.jpg
@@ -85,7 +136,7 @@ agent-media image remove-background --in https://example.com/photo.jpg
 | `--out <dir>` | Output directory |
 | `--provider <name>` | Provider (fal, replicate) |
-### Generate
+### generate
 ```bash
 agent-media image generate --prompt "a cat wearing a hat"
@@ -99,6 +150,76 @@ agent-media image generate --prompt "sunset over mountains" --width 1024 --heigh
 | `--height <px>` | Height (default: 1024) |
 | `--out <dir>` | Output directory |
 | `--provider <name>` | Provider (fal, replicate, runpod) |
+| `--model <name>` | Model override (e.g., `fal-ai/flux-2`, `black-forest-labs/flux-2-dev`) |
+### extend
+Extend image canvas by adding padding on all sides with a solid background color.
+```bash
+agent-media image extend --in photo.jpg --padding 50 --color "#E4ECF8"
+agent-media image extend --in photo.png --padding 100 --color "#FFFFFF" --dpi 300
+```
+| Option | Description |
+|--------|-------------|
+| `--in <path>` | Input file path or URL (required) |
+| `--padding <px>` | Padding size in pixels to add on all sides (required) |
+| `--color <hex>` | Background color for extended area (required). Also flattens transparency. |
+| `--dpi <n>` | DPI/density for output image (default: 300) |
+| `--out <dir>` | Output directory |
+| `--provider <name>` | Provider (local) |
+### edit
+Edit an image using a text prompt (image-to-image).
+```bash
+agent-media image edit --in photo.jpg --prompt "make the sky more vibrant"
+agent-media image edit --in portrait.jpg --prompt "add sunglasses"
+```
+| Option | Description |
+|--------|-------------|
+| `--in <path>` | Input file path or URL (required) |
+| `--prompt <text>` | Text description of the desired edit (required) |
+| `--out <dir>` | Output directory |
+| `--provider <name>` | Provider (fal, replicate, runpod) |
+| `--model <name>` | Model override (e.g., `fal-ai/flux-2/edit`) |
+### audio extract
+Extract audio track from a video file. Uses local ffmpeg, no API key needed.
+```bash
+agent-media audio extract --in video.mp4
+agent-media audio extract --in video.mp4 --format wav
+```
+| Option | Description |
+|--------|-------------|
+| `--in <path>` | Input video file path or URL (required) |
+| `--format <f>` | Output format: mp3, wav (default: mp3) |
+| `--out <dir>` | Output directory |
+### audio transcribe
+Transcribe audio to text with timestamps. Supports speaker identification.
+```bash
+agent-media audio transcribe --in audio.mp3
+agent-media audio transcribe --in audio.mp3 --diarize --speakers 2
+```
+| Option | Description |
+|--------|-------------|
+| `--in <path>` | Input audio file path or URL (required) |
+| `--diarize` | Enable speaker identification |
+| `--language <code>` | Language code (auto-detected if not provided) |
+| `--speakers <n>` | Number of speakers hint |
+| `--out <dir>` | Output directory |
+| `--provider <name>` | Provider (fal, replicate) |
+| `--model <name>` | Model override |
 ## Output Format
@@ -132,49 +253,16 @@ Exit code is `0` on success, `1` on error.
 ## Providers
-### Local (default)
-Uses Sharp for image processing. No API key required.
-**Supports:** resize, convert
-```bash
-agent-media image resize --in photo.jpg --width 800  # Uses local automatically
-```
-### Fal
-Uses fal.ai for AI-powered image operations.
-**Supports:** generate, remove-background
-```bash
-export FAL_API_KEY=your-key
-agent-media image generate --prompt "a red robot"
-agent-media image remove-background --in photo.jpg
-```
-### Replicate
-Uses Replicate for AI-powered image operations.
-**Supports:** generate, remove-background
-```bash
-export REPLICATE_API_TOKEN=your-token
-agent-media image generate --prompt "a red robot" --provider replicate
-```
-### Runpod
-Uses Runpod for AI-powered image generation.
+### Default Models
-**Supports:** generate
+| Provider | resize | convert | extend | generate | edit | remove-background | transcribe |
+|----------|--------|---------|--------|----------|------|-------------------|------------|
+| **local** | ✓ | ✓ | ✓ | - | - | - | - |
+| **fal** | - | - | - | `fal-ai/flux-2` | `fal-ai/flux-2/edit` | `fal-ai/birefnet/v2` | `fal-ai/wizper` |
+| **replicate** | - | - | - | `black-forest-labs/flux-2-dev` | `black-forest-labs/flux-kontext-dev` | `men1scus/birefnet` | WhisperX |
+| **runpod** | - | - | - | `alibaba/wan-2.6` | `google/nano-banana-pro-edit` | - | - |
-```bash
-export RUNPOD_API_KEY=your-key
-agent-media image generate --prompt "a red robot" --provider runpod
-```
+Use `--model <name>` to override the default model for any command.
 ### Provider Selection
@@ -185,12 +273,13 @@ agent-media image generate --prompt "a red robot" --provider runpod
 ## Environment Variables
-| Variable | Description |
-|----------|-------------|
-| `FAL_API_KEY` | fal.ai API key |
-| `REPLICATE_API_TOKEN` | Replicate API key |
-| `RUNPOD_API_KEY` | Runpod API key |
-| `AGENT_MEDIA_DIR` | Output directory (default: `.agent-media/`) |
+| Variable | Description | Get Key |
+|----------|-------------|---------|
+| `FAL_API_KEY` | fal.ai API key | [fal.ai](https://fal.ai/dashboard/keys) |
+| `REPLICATE_API_TOKEN` | Replicate API token | [replicate.com](https://replicate.com/account/api-tokens) |
+| `RUNPOD_API_KEY` | Runpod API key | [runpod.io](https://www.runpod.io/console/user/settings) |
+| `HUGGINGFACE_ACCESS_TOKEN` | For transcription with speaker ID (replicate only) | [huggingface.co](https://huggingface.co/settings/tokens) |
+| `AGENT_MEDIA_DIR` | Output directory (default: `.agent-media/`) | - |
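The table above ties each provider to one environment variable. The sketch below reports which provider keys are set in the current shell before an AI command is run. `available_providers` is a hypothetical helper, and it does not reproduce the CLI's own provider-selection logic, which this diff does not show.

```shell
# available_providers is a hypothetical helper, not part of agent-media.
# Prints a space-separated list of providers whose API key variables
# (as named in the Environment Variables table) are set and non-empty.
available_providers() {
  local found=""
  [ -n "${FAL_API_KEY:-}" ] && found="$found fal"
  [ -n "${REPLICATE_API_TOKEN:-}" ] && found="$found replicate"
  [ -n "${RUNPOD_API_KEY:-}" ] && found="$found runpod"
  # Strip the leading space; prints an empty line when no key is set.
  echo "${found# }"
}

# Example:
#   providers="$(available_providers)"
#   [ -n "$providers" ] || echo "No provider key set; only local commands will work." >&2
```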
 ## Usage with AI Agents
@@ -208,13 +297,21 @@ Add to your project instructions:
 ```markdown
 ## Media Processing
-Use `agent-media` for image operations. Run `agent-media --help` for commands.
+Use `agent-media` for image and audio operations. Run `agent-media --help` for commands.
 - `agent-media image resize --in <path> --width <px>` - Resize image
 - `agent-media image convert --in <path> --format <f>` - Convert format
 - `agent-media image generate --prompt <text>` - Generate image
+- `agent-media image edit --in <path> --prompt <text>` - Edit image
 - `agent-media image remove-background --in <path>` - Remove background
+- `agent-media audio extract --in <video>` - Extract audio from video
+- `agent-media audio transcribe --in <audio>` - Transcribe audio
 All commands output JSON with `ok: true/false` and exit 0/1.
 ```
+## Roadmap
+- [ ] Local CPU background removal via transformers.js/ONNX (zero API keys)
+- [ ] Video processing actions
+- [ ] Batch processing support
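The README states that all commands output JSON with `ok: true/false` and exit with 0 or 1. The sketch below shows one way a calling script might check both signals. `check_result` is a hypothetical helper, and only the `ok` field is assumed because the rest of the output schema is not visible in this diff. It uses a rough text match rather than a JSON parser.

```shell
# check_result is a hypothetical helper, not part of agent-media.
# Usage: check_result <exit_code> <json_output>
# Succeeds only when the exit code is 0 and the output contains "ok": true.
check_result() {
  local status="$1" output="$2"
  if [ "$status" -ne 0 ]; then
    return 1
  fi
  # Rough textual match on the ok field; a real script would parse the JSON.
  printf '%s' "$output" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Example (requires agent-media and an input file):
#   output="$(npx agent-media image resize --in photo.jpg --width 800)"
#   check_result "$?" "$output" && echo "resize succeeded"
```

Checking both the exit code and the `ok` field is redundant if the CLI behaves as documented, but it guards against output being captured from a wrapper that masks the exit status.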

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agent-media",
-  "version": "0.3.1",
+  "version": "0.3.2",
   "description": "Agent-first media toolkit CLI",
   "license": "Apache-2.0",
   "repository": {
@@ -34,10 +34,10 @@
   "dependencies": {
     "commander": "^12.0.0",
     "dotenv": "^17.2.3",
-    "@agent-media/core": "0.3.0",
     "@agent-media/audio": "0.3.0",
-    "@agent-media/providers": "0.2.0",
-    "@agent-media/image": "0.2.0"
+    "@agent-media/core": "0.3.0",
+    "@agent-media/image": "0.2.0",
+    "@agent-media/providers": "0.2.0"
   },
   "devDependencies": {
     "@types/node": "^22.0.0",