
agent-media 0.3.4 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2)

package/README.md +85 -48
package/package.json +4 -4
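
Apart from the version bump, the `package.json` changes in this release are internal dependency pins moving. As a rough aid for reading that hunk, the sketch below compares two dependency maps and reports only the pins that differ. The `changed_pins` helper is hypothetical (it is not part of agent-media); the two maps are copied from the `package.json` diff further down this page.

```python
from typing import Dict, Optional, Tuple


def changed_pins(
    old: Dict[str, str], new: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each dependency whose pin differs to an (old, new) pair.

    A dependency that exists on only one side gets None for the missing side.
    """
    names = sorted(set(old) | set(new))
    return {
        name: (old.get(name), new.get(name))
        for name in names
        if old.get(name) != new.get(name)
    }


# Dependency maps as they appear in the 0.3.4 and 0.4.0 package.json.
deps_0_3_4 = {
    "commander": "^12.0.0",
    "dotenv": "^17.2.3",
    "@agent-media/core": "0.3.0",
    "@agent-media/image": "0.2.0",
    "@agent-media/providers": "0.2.0",
    "@agent-media/audio": "0.3.0",
}
deps_0_4_0 = {
    "commander": "^12.0.0",
    "dotenv": "^17.2.3",
    "@agent-media/audio": "0.3.1",
    "@agent-media/image": "0.2.1",
    "@agent-media/core": "0.3.0",
    "@agent-media/providers": "0.3.0",
}

for name, (before, after) in changed_pins(deps_0_3_4, deps_0_4_0).items():
    print(f"{name}: {before} -> {after}")
```

Read this way, the hunk reduces to three pin changes (`audio`, `image`, `providers`); the remaining added and removed lines are reordering.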

package/README.md CHANGED

@@ -10,15 +10,19 @@ Media processing CLI for AI agents.
 ### Local processing (no API key needed)
-Uses [Sharp](https://sharp.pixelplumbing.com/) for fast local image processing. We're working on adding [transformers.js](https://huggingface.co/docs/transformers.js) for local AI features soon.
+Uses [Sharp](https://sharp.pixelplumbing.com/) for image operations and [transformers.js](https://huggingface.co/docs/transformers.js) for local AI (background removal, transcription).
 ```bash
-bunx agent-media@latest image resize --in photo.jpg --width 800
-bunx agent-media@latest image convert --in photo.png --format webp
-bunx agent-media@latest image extend --in photo.jpg --padding 50 --color "#FFFFFF"
+bunx agent-media@latest image resize --in sunset-mountains.jpg --width 800
+bunx agent-media@latest image convert --in sunset-mountains.png --format webp
+bunx agent-media@latest image extend --in sunset-mountains.jpg --padding 50 --color "#FFFFFF"
+bunx agent-media@latest image remove-background --in portrait-headshot.png --provider transformers
 bunx agent-media@latest audio extract --in video.mp4
+bunx agent-media@latest audio transcribe --in audio.mp3 --provider transformers
 ```
+> **Note**: You may see a `mutex lock failed` error with `--provider transformers` — ignore it, the output is correct if JSON shows `"ok": true`.
 ### AI-powered features
 Requires an API key from one of these providers:
@@ -85,32 +89,38 @@ pnpm install && pnpm build && pnpm link --global
 - Node.js >= 18.0.0
 - API key for AI features (generate, edit, remove-background, transcribe)
+---
 ## image
 ```bash
-agent-media@latest image resize --in <path> [options]      # Resize image
-agent-media@latest image convert --in <path> --format <f>  # Convert format
-agent-media@latest image remove-background --in <path>     # Remove background
-agent-media@latest image generate --prompt <text>          # Generate from prompt
-agent-media@latest image extend --in <path> --padding <px> --color <hex>  # Extend canvas
-agent-media@latest image edit --in <path> --prompt <text>  # Edit with prompt
-```
+# Resize image
+agent-media@latest image resize --in <path> [options]
-## audio
+# Convert format
+agent-media@latest image convert --in <path> --format <f>
-```bash
-agent-media@latest audio extract --in <video>              # Extract audio from video
-agent-media@latest audio transcribe --in <audio>           # Transcribe audio to text
-```
+# Extend canvas with padding
+agent-media@latest image extend --in <path> --padding <px> --color <hex>
----
+# Generate image from text
+agent-media@latest image generate --prompt <text>
+# Edit image with text prompt
+agent-media@latest image edit --in <path> --prompt <text>
+# Remove background
+agent-media@latest image remove-background --in <path>
+```
 ### resize
+*local*
 ```bash
-agent-media@latest image resize --in photo.jpg --width 800
-agent-media@latest image resize --in photo.jpg --height 600
-agent-media@latest image resize --in photo.jpg --width 800 --height 600
+agent-media@latest image resize --in sunset-mountains.jpg --width 800
+agent-media@latest image resize --in sunset-mountains.jpg --height 600
+agent-media@latest image resize --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.jpg --width 800
 ```
 | Option | Description |
@@ -119,14 +129,15 @@ agent-media@latest image resize --in photo.jpg --width 800 --height 600
 | `--width <px>` | Target width in pixels |
 | `--height <px>` | Target height in pixels |
 | `--out <dir>` | Output directory |
-| `--provider <name>` | Provider (local) |
 ### convert
+*local*
 ```bash
-agent-media@latest image convert --in photo.png --format webp
-agent-media@latest image convert --in photo.jpg --format png
-agent-media@latest image convert --in photo.png --format jpg --quality 90
+agent-media@latest image convert --in sunset-mountains.png --format webp
+agent-media@latest image convert --in sunset-mountains.jpg --format png
+agent-media@latest image convert --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --format jpg --quality 90
 ```
 | Option | Description |
@@ -135,23 +146,30 @@ agent-media@latest image convert --in photo.png --format jpg --quality 90
 | `--format <f>` | Output format: png, jpg, webp (required) |
 | `--quality <n>` | Quality 1-100 for lossy formats (default: 80) |
 | `--out <dir>` | Output directory |
-| `--provider <name>` | Provider (local) |
-### remove-background
+### extend
+*local*
+Extend image canvas by adding padding on all sides with a solid background color.
 ```bash
-agent-media@latest image remove-background --in portrait.jpg
-agent-media@latest image remove-background --in https://example.com/photo.jpg
+agent-media@latest image extend --in sunset-mountains.jpg --padding 50 --color "#E4ECF8"
+agent-media@latest image extend --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/sunset-mountains.png --padding 100 --color "#FFFFFF"
 ```
 | Option | Description |
 |--------|-------------|
 | `--in <path>` | Input file path or URL (required) |
+| `--padding <px>` | Padding size in pixels to add on all sides (required) |
+| `--color <hex>` | Background color for extended area (required). Also flattens transparency. |
+| `--dpi <n>` | DPI/density for output image (default: 300) |
 | `--out <dir>` | Output directory |
-| `--provider <name>` | Provider (fal, replicate) |
 ### generate
+*API key required*
 ```bash
 agent-media@latest image generate --prompt "a cat wearing a hat"
 agent-media@latest image generate --prompt "sunset over mountains" --width 1024 --height 768
@@ -166,44 +184,57 @@ agent-media@latest image generate --prompt "sunset over mountains" --width 1024
 | `--provider <name>` | Provider (fal, replicate, runpod) |
 | `--model <name>` | Model override (e.g., `fal-ai/flux-2`, `black-forest-labs/flux-2-dev`) |
-### extend
+### edit
-Extend image canvas by adding padding on all sides with a solid background color.
+*API key required*
+Edit an image using a text prompt (image-to-image).
 ```bash
-agent-media@latest image extend --in photo.jpg --padding 50 --color "#E4ECF8"
-agent-media@latest image extend --in photo.png --padding 100 --color "#FFFFFF" --dpi 300
+agent-media@latest image edit --in sunset-mountains.jpg --prompt "make the sky more vibrant"
+agent-media@latest image edit --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/portrait-headshot.png --prompt "add sunglasses"
 ```
 | Option | Description |
 |--------|-------------|
 | `--in <path>` | Input file path or URL (required) |
-| `--padding <px>` | Padding size in pixels to add on all sides (required) |
-| `--color <hex>` | Background color for extended area (required). Also flattens transparency. |
-| `--dpi <n>` | DPI/density for output image (default: 300) |
+| `--prompt <text>` | Text description of the desired edit (required) |
 | `--out <dir>` | Output directory |
-| `--provider <name>` | Provider (local) |
+| `--provider <name>` | Provider (fal, replicate, runpod) |
+| `--model <name>` | Model override (e.g., `fal-ai/flux-2/edit`) |
-### edit
+### remove-background
-Edit an image using a text prompt (image-to-image).
+*API key required*
 ```bash
-agent-media@latest image edit --in photo.jpg --prompt "make the sky more vibrant"
-agent-media@latest image edit --in portrait.jpg --prompt "add sunglasses"
+agent-media@latest image remove-background --in portrait-headshot.png
+agent-media@latest image remove-background --in https://ytrzap04kkm0giml.public.blob.vercel-storage.com/portrait-headshot.png
 ```
 | Option | Description |
 |--------|-------------|
 | `--in <path>` | Input file path or URL (required) |
-| `--prompt <text>` | Text description of the desired edit (required) |
 | `--out <dir>` | Output directory |
-| `--provider <name>` | Provider (fal, replicate, runpod) |
-| `--model <name>` | Model override (e.g., `fal-ai/flux-2/edit`) |
+| `--provider <name>` | Provider (fal, replicate) |
-### audio extract
+---
-Extract audio track from a video file. Uses local ffmpeg, no API key needed.
+## audio
+```bash
+# Extract audio from video
+agent-media@latest audio extract --in <video>
+# Transcribe audio to text
+agent-media@latest audio transcribe --in <audio>
+```
+### extract
+*local*
+Extract audio track from a video file.
 ```bash
 agent-media@latest audio extract --in video.mp4
@@ -216,7 +247,9 @@ agent-media@latest audio extract --in video.mp4 --format wav
 | `--format <f>` | Output format: mp3, wav (default: mp3) |
 | `--out <dir>` | Output directory |
-### audio transcribe
+### transcribe
+*API key required*
 Transcribe audio to text with timestamps. Supports speaker identification.
@@ -235,6 +268,8 @@ agent-media@latest audio transcribe --in audio.mp3 --diarize --speakers 2
 | `--provider <name>` | Provider (fal, replicate) |
 | `--model <name>` | Model override |
+---
 ## Output Format
 All commands return JSON to stdout:
@@ -272,6 +307,7 @@ Exit code is `0` on success, `1` on error.
 | Provider | resize | convert | extend | generate | edit | remove-background | transcribe |
 |----------|--------|---------|--------|----------|------|-------------------|------------|
 | **local** | ✓ | ✓ | ✓ | - | - | - | - |
+| **transformers** | - | - | - | - | - | `Xenova/modnet` | `moonshine-base` |
 | **fal** | - | - | - | `fal-ai/flux-2` | `fal-ai/flux-2/edit` | `fal-ai/birefnet/v2` | `fal-ai/wizper` |
 | **replicate** | - | - | - | `black-forest-labs/flux-2-dev` | `black-forest-labs/flux-kontext-dev` | `men1scus/birefnet` | WhisperX |
 | **runpod** | - | - | - | `alibaba/wan-2.6` | `google/nano-banana-pro-edit` | - | - |
@@ -326,6 +362,7 @@ All commands output JSON with `ok: true/false` and exit 0/1.
 ## Roadmap
-- [ ] Local CPU background removal via transformers.js/ONNX (zero API keys)
+- [x] Local CPU background removal via transformers.js/ONNX (zero API keys)
+- [x] Local CPU transcription via transformers.js/ONNX (zero API keys)
 - [ ] Video processing actions
 - [ ] Batch processing support
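
The updated README says that every command returns JSON on stdout with `ok: true/false`, exits 0 or 1, and that a `mutex lock failed` message seen with `--provider transformers` can be ignored when the JSON shows `"ok": true`. A caller might apply that rule roughly as follows. This is a minimal sketch, not the package's own code: `command_succeeded` and `run_agent_media` are hypothetical helpers, only the `ok` field is assumed from the README, and it is an assumption that stray messages go to stderr rather than being mixed into stdout.

```python
import json
import subprocess
from typing import List


def command_succeeded(stdout_text: str, exit_code: int) -> bool:
    """Return True only when the exit code is 0 and the stdout JSON has ok == true."""
    try:
        payload = json.loads(stdout_text)
    except json.JSONDecodeError:
        # Empty or non-JSON stdout is treated as a failure.
        return False
    return exit_code == 0 and isinstance(payload, dict) and payload.get("ok") is True


def run_agent_media(args: List[str]) -> bool:
    """Hypothetical wrapper: run the CLI through bunx and judge it by its stdout JSON."""
    proc = subprocess.run(
        ["bunx", "agent-media@latest", *args],
        capture_output=True,
        text=True,
    )
    # stderr is deliberately not inspected (assumption: messages such as
    # "mutex lock failed" land there and leave the stdout JSON intact).
    return command_succeeded(proc.stdout, proc.returncode)
```

For example, `run_agent_media(["audio", "transcribe", "--in", "audio.mp3", "--provider", "transformers"])` would return `True` only if the command exited 0 and printed a JSON object whose `ok` field is `true`.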

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agent-media",
-  "version": "0.3.4",
+  "version": "0.4.0",
   "description": "Agent-first media toolkit CLI",
   "license": "Apache-2.0",
   "repository": {
@@ -34,10 +34,10 @@
   "dependencies": {
     "commander": "^12.0.0",
     "dotenv": "^17.2.3",
+    "@agent-media/audio": "0.3.1",
+    "@agent-media/image": "0.2.1",
     "@agent-media/core": "0.3.0",
-    "@agent-media/image": "0.2.0",
-    "@agent-media/providers": "0.2.0",
-    "@agent-media/audio": "0.3.0"
+    "@agent-media/providers": "0.3.0"
   },
   "devDependencies": {
     "@types/node": "^22.0.0",