@miketromba/ploof 0.2.0 → 0.4.0

Files changed (5)

package/README.md +188 -4
package/SPEC.md +220 -22
package/dist/ploof.js +223 -216
package/package.json +5 -2
package/skills/asset-generation/SKILL.md +1 -1

package/README.md CHANGED

@@ -9,7 +9,7 @@
   <img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen" alt="node version" />
 </p>
-Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image generation/editing and OpenAI video generation/editing today, plus the legacy OpenAI image variations endpoint when the authenticated project has access. The provider registry is designed for audio and broader model marketplaces over time.
+Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image, video, and audio generation/processing, plus fal.ai's model marketplace through the official fal client. The provider registry is designed for broader model marketplaces over time.
 It is built for both developers and AI agents: predictable commands, parseable output, local auth profiles, YAML manifests, parallel execution, and a companion skill.
@@ -24,13 +24,18 @@ It is built for both developers and AI agents: predictable commands, parseable o
 | OpenAI video generation | Supported |
 | OpenAI video editing/extensions | Supported |
 | OpenAI video downloads/library/characters | Supported |
+| OpenAI audio generation / TTS | Supported |
+| OpenAI audio transcription | Supported |
+| OpenAI audio translation | Supported |
+| fal.ai auth profiles | Supported |
+| fal.ai model endpoints | Supported through `ploof model run` |
+| fal.ai image/video/audio endpoints | Supported through `--provider fal --model <endpoint-id>` |
 | Context images and masks | Supported |
-| Video references and source videos | Supported |
+| Image, video, and audio input assets | Supported |
 | YAML/JSON batch manifests | Supported |
 | Dependency-aware parallel runs | Supported |
 | Agent instructions via `ploof learn` | Supported |
 | Additional providers | Planned |
-| Audio generation | Planned |
 ## Install
@@ -58,6 +63,7 @@ npx @miketromba/ploof --help
 ```bash
 # Authenticate
 ploof login openai --api-key <your-api-key>
+ploof login fal --api-key <your-fal-key>
 # Generate an image
 ploof image generate \
@@ -80,8 +86,26 @@ ploof video generate \
   --seconds 4 \
   --out assets/clip.mp4
+# Generate and transcribe speech
+ploof audio generate \
+  --text "Ploof can generate speech and process audio." \
+  --voice alloy \
+  --out assets/speech.mp3
+ploof audio transcribe \
+  --audio assets/speech.mp3 \
+  --out assets/transcript.json
 # Run a manifest
 ploof run assets.yaml --parallel 4
+# Run any fal.ai endpoint directly
+ploof model run \
+  --provider fal \
+  --model fal-ai/flux/dev \
+  --prompt "Friendly CLI mascot icon, simple shape, transparent background" \
+  --param image_size=square_hd \
+  --out assets/icon.png
 ```
 ## Authentication
@@ -94,6 +118,8 @@ ploof login openai --api-key <your-api-key> --profile work
 ploof whoami openai
 ploof profiles openai
 ploof logout openai --profile work
+ploof login fal --api-key <your-fal-key>
+ploof whoami fal
 ```
 If `--api-key` is omitted, `ploof login openai` reads
@@ -106,6 +132,20 @@ Environment variables override stored credentials:
 export PLOOF_OPENAI_API_KEY=sk-...
 # or
 export OPENAI_API_KEY=sk-...
+export PLOOF_FAL_KEY=...
+# or
+export FAL_KEY=...
+```
+fal.ai split key environment variables are also supported:
+```bash
+export PLOOF_FAL_KEY_ID=...
+export PLOOF_FAL_KEY_SECRET=...
+# or
+export FAL_KEY_ID=...
+export FAL_KEY_SECRET=...
 ```
 OpenAI profile metadata:
@@ -118,6 +158,61 @@ ploof login openai \
   --base-url <url>
 ```
+## fal.ai Model Endpoints
+fal.ai support uses the official `@fal-ai/client`. Ploof uploads local asset inputs through fal storage, submits work through the fal queue in polling mode, waits for a complete response, and writes returned assets or text to disk.
+Use `ploof model run` for arbitrary fal endpoints:
+```bash
+ploof model run \
+  --provider fal \
+  --model fal-ai/flux/dev \
+  --prompt "Tiny app icon for a cheerful asset generation CLI" \
+  --param image_size=square_hd \
+  --out assets/fal-icon.png \
+  --output json
+```
+Named asset inputs map directly to provider input fields:
+```bash
+ploof model run \
+  --provider fal \
+  --model <fal-endpoint-id> \
+  --prompt "Animate this image into a short loop" \
+  --input image_url=assets/source.png \
+  --param duration=4 \
+  --out assets/loop.mp4
+```
+The media commands also work with fal when you provide the fal endpoint id as `--model`:
+```bash
+ploof image generate \
+  --provider fal \
+  --model fal-ai/flux/dev \
+  --prompt "Soft clay mascot icon" \
+  --param image_size=square_hd \
+  --out assets/mascot.png
+ploof video generate \
+  --provider fal \
+  --model <fal-video-endpoint-id> \
+  --prompt "Slow camera push through a miniature paper city" \
+  --input-reference assets/reference.png \
+  --param duration=4 \
+  --out assets/fal-video.mp4
+ploof audio generate \
+  --provider fal \
+  --model <fal-audio-endpoint-id> \
+  --text "A short spoken line." \
+  --out assets/fal-audio.mp3
+```
+Use `--param key=value` or `--json '{...}'` for endpoint-specific settings. Queue controls include `--start-timeout`, `--timeout`, `--poll-interval`, `--priority low|normal`, and `--storage-expires-in`.
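To illustrate how `--param key=value` and `--json '{...}'` could combine into one endpoint payload, here is a minimal TypeScript sketch. `buildInput` is a hypothetical helper, not ploof's code; the numeric coercion and the rule that `--param` overrides `--json` on a key conflict are both assumptions, since the diff does not show ploof's actual parsing.

```typescript
// Hypothetical parser; ploof's real coercion rules are not shown in this diff.
function buildInput(params: string[], json?: string): Record<string, unknown> {
  // Start from the --json object, if one was given.
  const input: Record<string, unknown> = json ? JSON.parse(json) : {}
  for (const pair of params) {
    const eq = pair.indexOf('=')
    if (eq <= 0) throw new Error(`Expected key=value, got "${pair}"`)
    const key = pair.slice(0, eq)
    const raw = pair.slice(eq + 1)
    // Assumption: plain decimal values become numbers; everything else stays a string.
    // Assumption: a --param entry overrides the same key from --json.
    input[key] = /^-?\d+(\.\d+)?$/.test(raw) ? Number(raw) : raw
  }
  return input
}
```

Under these assumptions, `--param image_size=square_hd --param duration=4` would yield `{ image_size: 'square_hd', duration: 4 }`.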
 ## Image Generation
 OpenAI image generation and editing default to `gpt-image-2` when `--model` is omitted.
@@ -247,6 +342,55 @@ ploof video character create --name Mossy --video character.mp4 --output json
 ploof video character get char_abc123 --output json
 ```
+## Audio Generation And Processing
+OpenAI audio generation defaults to `gpt-4o-mini-tts`, `alloy`, and `mp3` when model, voice, and format are omitted.
+```bash
+ploof audio generate \
+  --provider openai \
+  --text "A concise product narration for the demo reel." \
+  --model gpt-4o-mini-tts \
+  --voice alloy \
+  --format mp3 \
+  --out assets/narration.mp3 \
+  --output json
+```
+Useful generation flags:
+| Flag | Description |
+| --- | --- |
+| `--model <model>` | TTS model, for example `gpt-4o-mini-tts` |
+| `--voice <voice>` | Built-in voice such as `alloy`, `coral`, `nova`, or `shimmer` |
+| `--voice-id <id>` | Custom voice id |
+| `--instructions <text>` | Voice/style instructions for supported models |
+| `--format <format>` | `mp3`, `opus`, `aac`, `flac`, `wav`, or `pcm` |
+| `--speed <number>` | Speech speed |
+| `--param key=value` | Provider-specific pass-through parameter |
+| `--json '{...}'` | Provider-specific JSON object |
+Transcription and translation:
+```bash
+ploof audio transcribe \
+  --audio assets/narration.mp3 \
+  --model gpt-4o-mini-transcribe \
+  --out assets/transcript.json \
+  --output json
+ploof audio translate \
+  --audio assets/spanish.mp3 \
+  --model whisper-1 \
+  --format text \
+  --out assets/translation.txt \
+  --output json
+```
+Transcription supports `--language`, `--prompt`, `--format`, `--temperature`, `--include`, `--timestamp-granularity`, `--chunking-strategy`, `--known-speaker-name`, and `--known-speaker-reference`. Translation supports `--prompt`, `--format`, and `--temperature`.
+Ploof writes complete static assets to disk. Streaming transport settings such as OpenAI `stream=true` for transcription or `stream_format=sse` for speech are rejected because they do not produce a finished asset file directly.
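The rejection rule described above could look roughly like the following guard. This is a sketch only: `rejectStreaming` is a hypothetical name, the error messages are invented for illustration, and accepting the string `'true'` alongside the boolean is an assumption about how `--param` values arrive.

```typescript
// Hypothetical guard; it checks only the two settings named in the README text.
function rejectStreaming(kind: string, params: Record<string, unknown>): void {
  const stream = params['stream']
  // Assumption: a --param value may arrive as the string 'true'.
  if (kind === 'audio.transcribe' && (stream === true || stream === 'true')) {
    throw new Error('stream=true is rejected: ploof writes complete files')
  }
  if (kind === 'audio.generate' && params['stream_format'] === 'sse') {
    throw new Error('stream_format=sse is rejected: ploof writes complete files')
  }
}
```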
 ## Batch Manifests
 ```yaml
@@ -294,6 +438,36 @@ tasks:
     wait: true
     download: true
     output: assets/clip.mp4
+  - id: narration
+    kind: audio.generate
+    provider: openai
+    text: "Short narration for the generated clip."
+    params:
+      model: gpt-4o-mini-tts
+      voice: alloy
+      response_format: mp3
+    output: assets/narration.mp3
+  - id: transcript
+    kind: audio.transcribe
+    provider: openai
+    needs: [narration]
+    inputs:
+      audio:
+        task: narration
+    params:
+      model: gpt-4o-mini-transcribe
+    output: assets/transcript.json
+  - id: fal-icon
+    kind: model.run
+    provider: fal
+    model: fal-ai/flux/dev
+    prompt: "Small mascot icon for a CLI tool"
+    params:
+      image_size: square_hd
+    output: assets/fal-icon.png
 ```
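The `needs: [narration]` entry in the manifest above implies an ordering constraint. One way to picture dependency-aware execution is to group tasks into waves, where each wave only contains tasks whose dependencies finished in earlier waves. The sketch below is an assumption about the general technique, not ploof's scheduler; `Task` and `planWaves` are hypothetical names.

```typescript
type Task = { id: string; needs?: string[] }

// Groups tasks into waves; every task in a wave has all of its `needs` in earlier waves.
function planWaves(tasks: Task[]): string[][] {
  const done = new Set<string>()
  const waves: string[][] = []
  let pending = [...tasks]
  while (pending.length > 0) {
    const ready = pending.filter((t) => (t.needs ?? []).every((n) => done.has(n)))
    // No runnable task means a cycle or a reference to a task that does not exist.
    if (ready.length === 0) throw new Error('Cycle or unknown dependency in manifest')
    waves.push(ready.map((t) => t.id))
    for (const t of ready) done.add(t.id)
    pending = pending.filter((t) => !done.has(t.id))
  }
  return waves
}
```

For the three tasks added in this hunk, `narration` and `fal-icon` land in the first wave and `transcript` in the second. A `--parallel` limit would then cap how many tasks from one wave run at once.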
 Run it:
@@ -303,6 +477,8 @@ ploof run assets.yaml --parallel 4
 ploof run assets.yaml --dry-run --output json
 ```
+In manifests, media task kinds default to `provider: openai`; `model.run` defaults to `provider: fal`.
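The default rule stated above fits in a few lines. `resolveProvider` is a hypothetical helper written for illustration; it assumes an explicit `provider` on the task always takes priority over the default.

```typescript
// Hypothetical helper: an explicit provider wins, otherwise the default depends on the task kind.
function resolveProvider(kind: string, provider?: string): string {
  if (provider) return provider
  return kind === 'model.run' ? 'fal' : 'openai'
}
```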
 ## Output Formats
 Ploof defaults to table output in TTYs and compact output when piped.
@@ -372,7 +548,7 @@ bun run build
 npm pack --dry-run
 ```
-The default test suite includes mocked OpenAI end-to-end tests. Those tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, video job polling/downloads, sidecar metadata, and dependency-aware manifests without spending API credits.
+The default test suite includes mocked OpenAI end-to-end tests and fal provider unit tests. The OpenAI tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, video job polling/downloads, audio generation/processing, sidecar metadata, and dependency-aware manifests without spending API credits. The fal tests verify endpoint payload construction, local input upload mapping, polling options, and output persistence without spending API credits.
 Live OpenAI tests are opt-in only:
@@ -380,11 +556,19 @@ Live OpenAI tests are opt-in only:
 PLOOF_OPENAI_API_KEY=sk-... bun test tests/e2e
 ```
+Live fal.ai tests are also opt-in and use `fal-ai/flux/schnell` by default:
+```bash
+PLOOF_FAL_KEY=... bun test tests/e2e/fal-live.test.ts
+```
 Optional live-test overrides:
 ```bash
 PLOOF_OPENAI_LIVE_MODEL=gpt-image-2
 PLOOF_OPENAI_LIVE_SIZE=1024x1024
+PLOOF_FAL_LIVE_MODEL=fal-ai/flux/schnell
+PLOOF_FAL_LIVE_IMAGE_SIZE_PARAM=image_size=square_hd
 ```
 ## Publishing

package/SPEC.md CHANGED

@@ -2,7 +2,7 @@
 ## Summary
-Ploof is an npm-published CLI for generating and editing assets through AI generation providers. It starts with OpenAI image and video generation/editing, but the architecture must support multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
+Ploof is an npm-published CLI for generating, editing, and processing assets through AI generation providers. It supports OpenAI image, video, and audio generation/processing plus fal.ai model endpoints, while preserving an architecture for multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
 The product should feel like a small, sharp developer tool: easy to run manually, predictable in scripts, and optimized for AI agents.
@@ -80,10 +80,11 @@ Local release verification must stop at `npm pack --dry-run`; do not run local
 ## Initial Provider Scope
-Version 1 starts with OpenAI only.
+The current provider scope includes OpenAI and fal.ai.
-Initial capabilities:
+Core operation kinds:
+- `model.run`
 - `image.generate`
 - `image.edit`
 - `image.variation`
@@ -97,12 +98,19 @@ Initial capabilities:
 - `video.delete`
 - `video.character.create`
 - `video.character.get`
+- `audio.generate`
+- `audio.transcribe`
+- `audio.translate`
 Future providers should be added through the provider registry without changing the manifest model.
+Provider notes:
+- OpenAI has first-class implementations for images, videos, audio/TTS, transcription, translation, and OpenAI video library operations.
+- fal.ai uses the official `@fal-ai/client`, supports arbitrary endpoints through `model.run`, and supports image/video/audio commands when the chosen fal endpoint schema matches the command shape.
 Future high-leverage provider candidates:
-- fal.ai: strong multi-model generative media coverage.
 - Replicate: broad community model marketplace.
 - Hugging Face Inference Providers: centralized access to many hosted models/providers.
@@ -136,8 +144,12 @@ Environment overrides:
 - `PLOOF_OPENAI_API_KEY`
 - `OPENAI_API_KEY`
+- `PLOOF_FAL_KEY`
+- `FAL_KEY`
+- `PLOOF_FAL_KEY_ID` and `PLOOF_FAL_KEY_SECRET`
+- `FAL_KEY_ID` and `FAL_KEY_SECRET`
-The Ploof-specific env var wins over the provider-native env var. Stored credentials are used only when no env override is present.
+The Ploof-specific env var wins over the provider-native env var. Stored credentials are used only when no env override is present. Split fal.ai key id/secret pairs are joined into the token format expected by the fal client.
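As an illustration of the precedence rule in this hunk, here is a minimal TypeScript sketch. `resolveFalKey` and its `stored` argument are hypothetical names. The `id:secret` join format is an assumption about what the fal client expects. The spec does not say whether a single key outranks a split pair; this sketch assumes it does.

```typescript
type Env = Record<string, string | undefined>

// Hypothetical resolver; not taken from the ploof source.
function resolveFalKey(env: Env, stored?: string): string | undefined {
  // Ploof-specific variables are checked before provider-native ones.
  const single = env['PLOOF_FAL_KEY'] || env['FAL_KEY']
  if (single) return single

  // Assumption: split pairs are joined as "id:secret".
  const pairs: Array<[string, string]> = [
    ['PLOOF_FAL_KEY_ID', 'PLOOF_FAL_KEY_SECRET'],
    ['FAL_KEY_ID', 'FAL_KEY_SECRET'],
  ]
  for (const [idVar, secretVar] of pairs) {
    const id = env[idVar]
    const secret = env[secretVar]
    if (id && secret) return `${id}:${secret}`
  }

  // Stored credentials apply only when no env override is present.
  return stored
}
```

Passing the environment in as an argument, rather than reading `process.env` directly, keeps the sketch easy to exercise without touching real credentials.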
 OpenAI profile metadata may also include:
@@ -163,9 +175,10 @@ OpenAI profile metadata may also include:
 ```bash
 ploof login openai --api-key <key> [--profile default] [--organization org] [--project proj] [--base-url url]
+ploof login fal --api-key <key> [--profile default]
 ploof whoami [provider] [--profile default]
 ploof profiles [provider]
-ploof logout openai [--profile default]
+ploof logout <provider> [--profile default]
 ```
 `login`, `whoami`, `profiles`, and `logout` are the only authentication
@@ -176,6 +189,10 @@ commands. Ploof should not expose a second equivalent auth namespace.
 when run in an interactive terminal. Non-interactive login fails if no key is
 provided.
+`ploof login fal` accepts `--api-key`, reads `PLOOF_FAL_KEY` or `FAL_KEY`, and
+also supports `PLOOF_FAL_KEY_ID`/`PLOOF_FAL_KEY_SECRET` or
+`FAL_KEY_ID`/`FAL_KEY_SECRET` pairs.
 ### Config
 ```bash
@@ -239,6 +256,48 @@ authenticated project has DALL-E 2 variation access; if OpenAI returns a 404,
 use `ploof image edit` for image-to-image workflows. `ploof image variations`
 is an alias.
+### Generic Model Endpoints
+`model.run` executes arbitrary provider model endpoints. It is primarily useful
+for model marketplaces such as fal.ai, where the endpoint schema is selected by
+`--model`.
+```bash
+ploof model run \
+  --provider fal \
+  --model fal-ai/flux/dev \
+  --prompt "Small mascot icon for a CLI tool" \
+  --param image_size=square_hd \
+  --out assets/fal-icon.png \
+  --output json
+```
+Named inputs preserve exact provider field names:
+```bash
+ploof model run \
+  --provider fal \
+  --model <fal-endpoint-id> \
+  --prompt "Animate this source image" \
+  --input image_url=assets/source.png \
+  --param duration=4 \
+  --out assets/clip.mp4
+```
+Model endpoint controls:
+- `--param key=value`
+- `--json '{...}'`
+- `--input field=path-or-url`
+- `--start-timeout <seconds>`
+- `--timeout <seconds>`
+- `--poll-interval <seconds>`
+- `--priority low|normal`
+- `--storage-expires-in <value>`
+fal.ai commands should use queue polling and write complete returned assets or
+text outputs to disk.
 ### Video Generation
 OpenAI video generation uses the asynchronous Videos API. `ploof video generate`
@@ -303,12 +362,89 @@ project is eligible for that workflow. Extensions accept a source video id or
 upload, plus a prompt and `--seconds`. `video remix` is supported for the SDK's
 legacy remix endpoint, but new integrations should prefer `video edit`.
+### Audio Generation And Processing
+OpenAI audio generation uses the speech API and defaults to
+`gpt-4o-mini-tts`, `alloy`, and `mp3` when model, voice, and output format are
+omitted.
+```bash
+ploof audio generate \
+  --provider openai \
+  --text "Short narration for the generated asset." \
+  --model gpt-4o-mini-tts \
+  --voice alloy \
+  --format mp3 \
+  --out assets/narration.mp3 \
+  --output json
+```
+First-class OpenAI audio generation flags:
+- `--model <model>`
+- `--voice <voice>`
+- `--voice-id <id>`
+- `--instructions <text>`
+- `--format <format>` / `--response-format <format>`
+- `--speed <number>`
+- `--param key=value`
+- `--json '{...}'`
+Audio processing supports transcription and English translation:
+```bash
+ploof audio transcribe \
+  --audio assets/narration.mp3 \
+  --model gpt-4o-mini-transcribe \
+  --out assets/transcript.json \
+  --output json
+ploof audio translate \
+  --audio assets/spanish.mp3 \
+  --model whisper-1 \
+  --format text \
+  --out assets/translation.txt \
+  --output json
+```
+Transcription first-class flags:
+- `--model <model>`
+- `--language <code>`
+- `--prompt <prompt>`
+- `--format <format>` / `--response-format <format>`
+- `--temperature <number>`
+- `--include <value>`
+- `--timestamp-granularity word|segment`
+- `--chunking-strategy auto|{...}`
+- `--known-speaker-name <name>`
+- `--known-speaker-reference <data-url>`
+- `--param key=value`
+- `--json '{...}'`
+Translation first-class flags:
+- `--model <model>`
+- `--prompt <prompt>`
+- `--format <format>` / `--response-format <format>`
+- `--temperature <number>`
+- `--param key=value`
+- `--json '{...}'`
+Ploof is a static asset generation CLI. Audio commands request complete outputs
+and write them to disk. Streaming transport settings such as OpenAI
+`stream=true` for transcription or `stream_format=sse` for speech are rejected
+because they do not directly produce finished asset files.
 ### Batch Run
 ```bash
 ploof run assets.yaml --parallel 4
 ```
+Manifest media task kinds default to `provider: openai`; `model.run` defaults
+to `provider: fal`.
 Manifest example:
 ```yaml
@@ -356,6 +492,36 @@ tasks:
     wait: true
     download: true
     output: assets/clip.mp4
+  - id: narration
+    kind: audio.generate
+    provider: openai
+    text: "Short narration for the generated clip."
+    params:
+      model: gpt-4o-mini-tts
+      voice: alloy
+      response_format: mp3
+    output: assets/narration.mp3
+  - id: transcript
+    kind: audio.transcribe
+    provider: openai
+    needs: [narration]
+    inputs:
+      audio:
+        task: narration
+    params:
+      model: gpt-4o-mini-transcribe
+    output: assets/transcript.json
+  - id: fal-icon
+    kind: model.run
+    provider: fal
+    model: fal-ai/flux/dev
+    prompt: "Small mascot icon for a CLI tool"
+    params:
+      image_size: square_hd
+    output: assets/fal-icon.png
 ```
 ## Asset Input Model
@@ -364,13 +530,18 @@ All input/context assets are normalized before provider execution:
 ```ts
 type AssetInput = {
-  role: 'image' | 'mask' | 'reference' | 'style' | 'audio' | 'video'
+  role: 'image' | 'mask' | 'reference' | 'style' | 'audio' | 'video' | string
   source: string
   mime?: string
   name?: string
 }
 ```
+Manifest `inputs` are a role map. Built-in aliases such as `images`,
+`inputReference`, and `videos` normalize to `image`, `reference`, and `video`,
+but providers can also consume custom roles like `style`, `control`, `pose`, or
+`initImage` without changing the manifest schema.
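The alias normalization described above can be sketched as follows. The three aliases are the ones the spec names; `normalizeInputs` is a hypothetical helper, and it assumes each manifest value is a plain path or URL string (or a list of them). Task references such as `task: narration` are left out for brevity.

```typescript
type AssetInput = { role: string; source: string }

// Aliases named in the spec text; custom roles pass through unchanged.
const ROLE_ALIASES = new Map<string, string>([
  ['images', 'image'],
  ['inputReference', 'reference'],
  ['videos', 'video'],
])

// Hypothetical normalizer from a manifest role map to a flat AssetInput list.
function normalizeInputs(inputs: Record<string, string | string[]>): AssetInput[] {
  const out: AssetInput[] = []
  for (const [key, value] of Object.entries(inputs)) {
    const role = ROLE_ALIASES.get(key) ?? key
    const sources = Array.isArray(value) ? value : [value]
    for (const source of sources) out.push({ role, source })
  }
  return out
}
```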
 Supported sources:
 - Local paths.
@@ -388,6 +559,22 @@ OpenAI video generation/editing maps:
 - `role=reference` to `input_reference` for image-guided video generation.
 - `role=video` to source video uploads for eligible edit/extension workflows.
+OpenAI audio processing maps:
+- `role=audio` to the uploaded audio file for transcription or translation.
+fal.ai media commands map common roles to URL fields:
+- `role=image` and `role=reference` to `image_url`.
+- `role=mask` to `mask_url`.
+- `role=style` to `style_image_url`.
+- `role=audio` to `audio_url`.
+- `role=video` to `video_url`.
+fal.ai `model.run` preserves exact input field names, so
+`inputs.image_url` or `--input image_url=source.png` becomes `image_url` in the
+provider input payload.
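The two mapping modes in this hunk can be compared side by side in a short sketch. The role table is copied from the list above; `toFalInput` and its `exactNames` flag are hypothetical, and passing unknown roles through unchanged is an assumption. In practice the `source` would be the URL returned after ploof uploads a local file to fal storage; the upload step is omitted here.

```typescript
type AssetInput = { role: string; source: string }

// Role-to-field table copied from the spec list.
const FAL_ROLE_FIELDS = new Map<string, string>([
  ['image', 'image_url'],
  ['reference', 'image_url'],
  ['mask', 'mask_url'],
  ['style', 'style_image_url'],
  ['audio', 'audio_url'],
  ['video', 'video_url'],
])

// Hypothetical mapper. `exactNames` models model.run, which keeps field names as given.
function toFalInput(inputs: AssetInput[], exactNames: boolean): Record<string, string> {
  const payload: Record<string, string> = {}
  for (const { role, source } of inputs) {
    // Assumption: roles without a table entry are used as the field name directly.
    const field = exactNames ? role : (FAL_ROLE_FIELDS.get(role) ?? role)
    payload[field] = source
  }
  return payload
}
```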
 Future providers can map roles such as `reference`, `style`, `init-image`, `audio`, or `video` differently.
 ## Provider Architecture
@@ -397,31 +584,37 @@ Provider modules implement a common interface:
 ```ts
 type Provider = {
   id: string
+  displayName?: string
   capabilities: ProviderCapability[]
-  runImageGenerate(job, context): Promise<ProviderResult>
-  runImageEdit(job, context): Promise<ProviderResult>
-  runImageVariation(job, context): Promise<ProviderResult>
-  runVideoGenerate(job, context): Promise<ProviderResult>
-  runVideoEdit(job, context): Promise<ProviderResult>
-  runVideoExtend(job, context): Promise<ProviderResult>
-  runVideoRemix(job, context): Promise<ProviderResult>
-  runVideoStatus(job, context): Promise<ProviderResult>
-  runVideoDownload(job, context): Promise<ProviderResult>
-  runVideoList(job, context): Promise<ProviderResult>
-  runVideoDelete(job, context): Promise<ProviderResult>
-  runVideoCharacterCreate(job, context): Promise<ProviderResult>
-  runVideoCharacterGet(job, context): Promise<ProviderResult>
+  auth?: {
+    apiKeyEnvVars: string[]
+    apiKeyEnvPairs?: Array<{ idEnvVar: string; secretEnvVar: string }>
+    organizationEnvVar?: string
+    projectEnvVar?: string
+    baseURLEnvVar?: string
+  }
+  run(job, context): Promise<ProviderResult>
 }
 ```
 The provider registry owns:
 - Provider lookup.
-- Capability checks.
-- Credential resolution.
+- Auth metadata lookup.
+- Capability discovery.
+Provider modules own:
 - Provider-specific validation.
+- Provider SDK/client mapping.
+- Dispatch from generic `AssetJob` objects to internal operation handlers.
+- Output persistence details when the provider returns URLs, binary responses, or
+  structured data.
 The CLI should avoid hardcoding all provider behavior into command handlers.
+Manifest execution should build generic `AssetJob` objects and call
+`provider.run(job, context)` rather than calling modality-specific provider
+methods directly.
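To show what the single `run(job, context)` entry point might look like in practice, here is a minimal sketch of a provider that dispatches on the job kind. Only `id`, `capabilities`, and `run` mirror the interface in this hunk; `createProvider`, the `Handler` type, and the simplified `AssetJob` and `ProviderResult` shapes are assumptions made for illustration.

```typescript
type AssetJob = { kind: string; params: Record<string, unknown> }
type ProviderResult = { kind: string; note: string }
type Handler = (job: AssetJob, context?: unknown) => Promise<ProviderResult>

// Hypothetical factory: one internal handler per operation kind, one public run() method.
function createProvider(id: string, handlers: Map<string, Handler>) {
  return {
    id,
    // Capabilities are derived from the registered handlers (simplified to strings here).
    capabilities: [...handlers.keys()],
    async run(job: AssetJob, context?: unknown): Promise<ProviderResult> {
      const handler = handlers.get(job.kind)
      if (!handler) throw new Error(`${id} does not support ${job.kind}`)
      return handler(job, context)
    },
  }
}
```

With this shape, the manifest runner needs no knowledge of modality-specific methods: it builds a job and calls `run`, and an unsupported kind surfaces as a rejected promise.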
 ## Settings Strategy
@@ -461,6 +654,11 @@ Asset-producing commands should write the asset to disk and print structured met
 }
 ```
+Ploof is a static asset generation tool. Providers may use asynchronous jobs,
+polling, or queue subscriptions internally, but CLI consumers receive completed
+files or text outputs after the command finishes. Streaming transports should
+not be exposed as the primary consumption model.
 Each generated file should have an optional sidecar metadata file:
 ```text