npm - agent-media - Versions diffs - 0.3.3 → 0.3.5 - Mend

agent-media 0.3.3 → 0.3.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2)

package/README.md +125 -78
package/package.json +2 -2

package/README.md CHANGED

@@ -8,48 +8,66 @@ Media processing CLI for AI agents.
 ## Quick Start
+### Local processing (no API key needed)
+Uses [Sharp](https://sharp.pixelplumbing.com/) for fast local image processing.
+We're working on adding [transformers.js](https://github.com/huggingface/transformers.js) for local AI features soon.
+```bash
+bunx agent-media@latest image resize --in sunset-mountains.jpg --width 800
+bunx agent-media@latest image convert --in sunset-mountains.png --format webp
+bunx agent-media@latest image extend --in sunset-mountains.jpg --padding 50 --color "#FFFFFF"
+bunx agent-media@latest audio extract --in video.mp4
+```
+### AI-powered features
 Requires an API key from one of these providers:
 - [fal.ai](https://fal.ai/dashboard/keys) → `FAL_API_KEY`
 - [Replicate](https://replicate.com/account/api-tokens) → `REPLICATE_API_TOKEN`
 - [Runpod](https://www.runpod.io/console/user/settings) → `RUNPOD_API_KEY`
+### bunx
 ```bash
 # Generate an image
-npx agent-media@latest image generate --prompt "a robot painting a sunset"
+bunx agent-media@latest image generate --prompt "a robot painting a sunset"
 # Edit the generated image
-npx agent-media@latest image edit --in .agent-media/generated_*.png --prompt "add a cat watching"
+bunx agent-media@latest image edit --in .agent-media/generated_*.png --prompt "add a cat watching"
 # Remove background
-npx agent-media@latest image remove-background --in .agent-media/edited_*.png
+bunx agent-media@latest image remove-background --in .agent-media/edited_*.png
-# Convert to webp
-npx agent-media@latest image convert --in .agent-media/nobg_*.png --format webp
+# Transcribe with speaker identification
+bunx agent-media@latest audio transcribe --in audio.mp3 --diarize
 ```
-**Video to transcript** (no API key needed for extract)
+### npx
 ```bash
-# Extract audio from video (local, no API key)
-npx agent-media@latest audio extract --in video.mp4
+# Generate an image
+npx agent-media@latest image generate --prompt "a robot painting a sunset"
-# Transcribe with speaker identification
-npx agent-media@latest audio transcribe --in .agent-media/extracted_*.mp3 --diarize
-```
+# Edit the generated image
+npx agent-media@latest image edit --in .agent-media/generated_*.png --prompt "add a cat watching"
-**Local processing** (no API key needed)
+# Remove background
+npx agent-media@latest image remove-background --in .agent-media/edited_*.png
-```bash
-npx agent-media@latest image resize --in photo.jpg --width 800
-npx agent-media@latest image convert --in photo.png --format webp
-npx agent-media@latest image extend --in photo.jpg --padding 50 --color "#FFFFFF"
+# Transcribe with speaker identification
+npx agent-media@latest audio transcribe --in audio.mp3 --diarize
 ```
 ## Installation
 ```bash
-# Use directly with npx (no install)
+# Use directly with bunx (no install)
+bunx agent-media@latest --help
+# Or with npx
 npx agent-media@latest --help
 # Or install globally
@@ -69,34 +87,38 @@ pnpm install && pnpm build && pnpm link --global
 - Node.js >= 18.0.0
 - API key for AI features (generate, edit, remove-background, transcribe)
-## Commands
+---
-### Image Commands
+## image
 ```bash
-agent-media image resize --in <path> [options]      # Resize image
-agent-media image convert --in <path> --format <f>  # Convert format
-agent-media image remove-background --in <path>     # Remove background
-agent-media image generate --prompt <text>          # Generate from prompt
-agent-media image extend --in <path> --padding <px> --color <hex>  # Extend canvas
-agent-media image edit --in <path> --prompt <text>  # Edit with prompt
-```
+# Resize image
+agent-media@latest image resize --in <path> [options]
-### Audio Commands
+# Convert format
+agent-media@latest image convert --in <path> --format <f>
-```bash
-agent-media audio extract --in <video>              # Extract audio from video
-agent-media audio transcribe --in <audio>           # Transcribe audio to text
-```
+# Extend canvas with padding
+agent-media@latest image extend --in <path> --padding <px> --color <hex>
----
+# Generate image from text
+agent-media@latest image generate --prompt <text>
+# Edit image with text prompt
+agent-media@latest image edit --in <path> --prompt <text>
+# Remove background
+agent-media@latest image remove-background --in <path>
+```
 ### resize
+*local*
 ```bash
-agent-media image resize --in photo.jpg --width 800
-agent-media image resize --in photo.jpg --height 600
-agent-media image resize --in photo.jpg --width 800 --height 600
+agent-media@latest image resize --in sunset-mountains.jpg --width 800
+agent-media@latest image resize --in sunset-mountains.jpg --height 600
+agent-media@latest image resize --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.jpg --width 800
 ```
 | Option | Description |
@@ -105,14 +127,15 @@ agent-media image resize --in photo.jpg --width 800 --height 600
 | `--width <px>` | Target width in pixels |
 | `--height <px>` | Target height in pixels |
 | `--out <dir>` | Output directory |
-| `--provider <name>` | Provider (local) |
 ### convert
+*local*
 ```bash
-agent-media image convert --in photo.png --format webp
-agent-media image convert --in photo.jpg --format png
-agent-media image convert --in photo.png --format jpg --quality 90
+agent-media@latest image convert --in sunset-mountains.png --format webp
+agent-media@latest image convert --in sunset-mountains.jpg --format png
+agent-media@latest image convert --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --format jpg --quality 90
 ```
 | Option | Description |
@@ -121,26 +144,33 @@ agent-media image convert --in photo.png --format jpg --quality 90
 | `--format <f>` | Output format: png, jpg, webp (required) |
 | `--quality <n>` | Quality 1-100 for lossy formats (default: 80) |
 | `--out <dir>` | Output directory |
-| `--provider <name>` | Provider (local) |
-### remove-background
+### extend
+*local*
+Extend image canvas by adding padding on all sides with a solid background color.
 ```bash
-agent-media image remove-background --in portrait.jpg
-agent-media image remove-background --in https://example.com/photo.jpg
+agent-media@latest image extend --in sunset-mountains.jpg --padding 50 --color "#E4ECF8"
+agent-media@latest image extend --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --padding 100 --color "#FFFFFF"
 ```
 | Option | Description |
 |--------|-------------|
 | `--in <path>` | Input file path or URL (required) |
+| `--padding <px>` | Padding size in pixels to add on all sides (required) |
+| `--color <hex>` | Background color for extended area (required). Also flattens transparency. |
+| `--dpi <n>` | DPI/density for output image (default: 300) |
 | `--out <dir>` | Output directory |
-| `--provider <name>` | Provider (fal, replicate) |
 ### generate
+*API key required*
 ```bash
-agent-media image generate --prompt "a cat wearing a hat"
-agent-media image generate --prompt "sunset over mountains" --width 1024 --height 768
+agent-media@latest image generate --prompt "a cat wearing a hat"
+agent-media@latest image generate --prompt "sunset over mountains" --width 1024 --height 768
 ```
 | Option | Description |
@@ -152,48 +182,61 @@ agent-media image generate --prompt "sunset over mountains" --width 1024 --heigh
 | `--provider <name>` | Provider (fal, replicate, runpod) |
 | `--model <name>` | Model override (e.g., `fal-ai/flux-2`, `black-forest-labs/flux-2-dev`) |
-### extend
+### edit
-Extend image canvas by adding padding on all sides with a solid background color.
+*API key required*
+Edit an image using a text prompt (image-to-image).
 ```bash
-agent-media image extend --in photo.jpg --padding 50 --color "#E4ECF8"
-agent-media image extend --in photo.png --padding 100 --color "#FFFFFF" --dpi 300
+agent-media@latest image edit --in sunset-mountains.jpg --prompt "make the sky more vibrant"
+agent-media@latest image edit --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/portrait-headshot.png --prompt "add sunglasses"
 ```
 | Option | Description |
 |--------|-------------|
 | `--in <path>` | Input file path or URL (required) |
-| `--padding <px>` | Padding size in pixels to add on all sides (required) |
-| `--color <hex>` | Background color for extended area (required). Also flattens transparency. |
-| `--dpi <n>` | DPI/density for output image (default: 300) |
+| `--prompt <text>` | Text description of the desired edit (required) |
 | `--out <dir>` | Output directory |
-| `--provider <name>` | Provider (local) |
+| `--provider <name>` | Provider (fal, replicate, runpod) |
+| `--model <name>` | Model override (e.g., `fal-ai/flux-2/edit`) |
-### edit
+### remove-background
-Edit an image using a text prompt (image-to-image).
+*API key required*
 ```bash
-agent-media image edit --in photo.jpg --prompt "make the sky more vibrant"
-agent-media image edit --in portrait.jpg --prompt "add sunglasses"
+agent-media@latest image remove-background --in portrait-headshot.png
+agent-media@latest image remove-background --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/portrait-headshot.png
 ```
 | Option | Description |
 |--------|-------------|
 | `--in <path>` | Input file path or URL (required) |
-| `--prompt <text>` | Text description of the desired edit (required) |
 | `--out <dir>` | Output directory |
-| `--provider <name>` | Provider (fal, replicate, runpod) |
-| `--model <name>` | Model override (e.g., `fal-ai/flux-2/edit`) |
+| `--provider <name>` | Provider (fal, replicate) |
+---
-### audio extract
+## audio
-Extract audio track from a video file. Uses local ffmpeg, no API key needed.
+```bash
+# Extract audio from video
+agent-media@latest audio extract --in <video>
+# Transcribe audio to text
+agent-media@latest audio transcribe --in <audio>
+```
+### extract
+*local*
+Extract audio track from a video file.
 ```bash
-agent-media audio extract --in video.mp4
-agent-media audio extract --in video.mp4 --format wav
+agent-media@latest audio extract --in video.mp4
+agent-media@latest audio extract --in video.mp4 --format wav
 ```
 | Option | Description |
@@ -202,13 +245,15 @@ agent-media audio extract --in video.mp4 --format wav
 | `--format <f>` | Output format: mp3, wav (default: mp3) |
 | `--out <dir>` | Output directory |
-### audio transcribe
+### transcribe
+*API key required*
 Transcribe audio to text with timestamps. Supports speaker identification.
 ```bash
-agent-media audio transcribe --in audio.mp3
-agent-media audio transcribe --in audio.mp3 --diarize --speakers 2
+agent-media@latest audio transcribe --in audio.mp3
+agent-media@latest audio transcribe --in audio.mp3 --diarize --speakers 2
 ```
 | Option | Description |
@@ -221,6 +266,8 @@ agent-media audio transcribe --in audio.mp3 --diarize --speakers 2
 | `--provider <name>` | Provider (fal, replicate) |
 | `--model <name>` | Model override |
+---
 ## Output Format
 All commands return JSON to stdout:
@@ -286,8 +333,8 @@ Use `--model <name>` to override the default model for any command.
 ### Just ask the agent
 ```
-Use agent-media to resize this image to 800px wide.
-Run agent-media --help to see available commands.
+Use agent-media@latest to resize this image to 800px wide.
+Run agent-media@latest --help to see available commands.
 ```
 ### AGENTS.md / CLAUDE.md
@@ -297,15 +344,15 @@ Add to your project instructions:
 ```markdown
 ## Media Processing
-Use `agent-media` for image and audio operations. Run `agent-media --help` for commands.
+Use `agent-media@latest` for image and audio operations. Run `agent-media@latest --help` for commands.
-- `agent-media image resize --in <path> --width <px>` - Resize image
-- `agent-media image convert --in <path> --format <f>` - Convert format
-- `agent-media image generate --prompt <text>` - Generate image
-- `agent-media image edit --in <path> --prompt <text>` - Edit image
-- `agent-media image remove-background --in <path>` - Remove background
-- `agent-media audio extract --in <video>` - Extract audio from video
-- `agent-media audio transcribe --in <audio>` - Transcribe audio
+- `agent-media@latest image resize --in <path> --width <px>` - Resize image
+- `agent-media@latest image convert --in <path> --format <f>` - Convert format
+- `agent-media@latest image generate --prompt <text>` - Generate image
+- `agent-media@latest image edit --in <path> --prompt <text>` - Edit image
+- `agent-media@latest image remove-background --in <path>` - Remove background
+- `agent-media@latest audio extract --in <video>` - Extract audio from video
+- `agent-media@latest audio transcribe --in <audio>` - Transcribe audio
 All commands output JSON with `ok: true/false` and exit 0/1.
 ```
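
The README text in this diff says that all commands output JSON with `ok: true/false` and exit 0/1. A minimal sketch of how a calling script might combine those two signals is shown below. The helper name `command_succeeded` and the sample payloads are hypothetical, and only the top-level `ok` field is assumed, since the diff does not show the rest of the output schema.

```python
import json


def command_succeeded(stdout: str, exit_code: int) -> bool:
    """Hypothetical helper: treat a run as successful only when the
    exit code is 0 AND stdout is a JSON object whose "ok" field is true."""
    if exit_code != 0:
        return False
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        # Output that is not valid JSON is treated as a failure.
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


if __name__ == "__main__":
    # Made-up payloads for illustration; real output contains more fields.
    print(command_succeeded('{"ok": true}', 0))
    print(command_succeeded('{"ok": false}', 1))
```

In practice the two arguments would come from something like `subprocess.run(..., capture_output=True, text=True)`, using its `stdout` and `returncode` attributes. Checking both the exit code and the `ok` field is a deliberately conservative choice, not something the package requires.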

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agent-media",
-  "version": "0.3.3",
+  "version": "0.3.5",
   "description": "Agent-first media toolkit CLI",
   "license": "Apache-2.0",
   "repository": {
@@ -34,8 +34,8 @@
   "dependencies": {
     "commander": "^12.0.0",
     "dotenv": "^17.2.3",
-    "@agent-media/audio": "0.3.0",
     "@agent-media/core": "0.3.0",
+    "@agent-media/audio": "0.3.0",
     "@agent-media/providers": "0.2.0",
     "@agent-media/image": "0.2.0"
   },