npm - @miketromba/ploof - Versions diffs - 0.3.0 → 0.4.0 - Mend

@miketromba/ploof 0.3.0 → 0.4.0


Files changed (5)

package/README.md +104 -2
package/SPEC.md +118 -25
package/dist/ploof.js +224 -218
package/package.json +5 -2
package/skills/asset-generation/SKILL.md +1 -1

package/README.md CHANGED

@@ -9,7 +9,7 @@
   <img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen" alt="node version" />
 </p>
-Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image, video, and audio generation/processing today, plus the legacy OpenAI image variations endpoint when the authenticated project has access. The provider registry is designed for broader model marketplaces over time.
+Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image, video, and audio generation/processing, plus fal.ai's model marketplace through the official fal client. The provider registry is designed for broader model marketplaces over time.
 It is built for both developers and AI agents: predictable commands, parseable output, local auth profiles, YAML manifests, parallel execution, and a companion skill.
@@ -27,6 +27,9 @@ It is built for both developers and AI agents: predictable commands, parseable o
 | OpenAI audio generation / TTS | Supported |
 | OpenAI audio transcription | Supported |
 | OpenAI audio translation | Supported |
+| fal.ai auth profiles | Supported |
+| fal.ai model endpoints | Supported through `ploof model run` |
+| fal.ai image/video/audio endpoints | Supported through `--provider fal --model <endpoint-id>` |
 | Context images and masks | Supported |
 | Image, video, and audio input assets | Supported |
 | YAML/JSON batch manifests | Supported |
@@ -60,6 +63,7 @@ npx @miketromba/ploof --help
 ```bash
 # Authenticate
 ploof login openai --api-key <your-api-key>
+ploof login fal --api-key <your-fal-key>
 # Generate an image
 ploof image generate \
@@ -94,6 +98,14 @@ ploof audio transcribe \
 # Run a manifest
 ploof run assets.yaml --parallel 4
+# Run any fal.ai endpoint directly
+ploof model run \
+  --provider fal \
+  --model fal-ai/flux/dev \
+  --prompt "Friendly CLI mascot icon, simple shape, transparent background" \
+  --param image_size=square_hd \
+  --out assets/icon.png
 ```
 ## Authentication
@@ -106,6 +118,8 @@ ploof login openai --api-key <your-api-key> --profile work
 ploof whoami openai
 ploof profiles openai
 ploof logout openai --profile work
+ploof login fal --api-key <your-fal-key>
+ploof whoami fal
 ```
 If `--api-key` is omitted, `ploof login openai` reads
@@ -118,6 +132,20 @@ Environment variables override stored credentials:
 export PLOOF_OPENAI_API_KEY=sk-...
 # or
 export OPENAI_API_KEY=sk-...
+export PLOOF_FAL_KEY=...
+# or
+export FAL_KEY=...
+```
+fal.ai split key environment variables are also supported:
+```bash
+export PLOOF_FAL_KEY_ID=...
+export PLOOF_FAL_KEY_SECRET=...
+# or
+export FAL_KEY_ID=...
+export FAL_KEY_SECRET=...
 ```
 OpenAI profile metadata:
@@ -130,6 +158,61 @@ ploof login openai \
   --base-url <url>
 ```
+## fal.ai Model Endpoints
+fal.ai support uses the official `@fal-ai/client`. Ploof uploads local asset inputs through fal storage, submits work through the fal queue in polling mode, waits for a complete response, and writes returned assets or text to disk.
+Use `ploof model run` for arbitrary fal endpoints:
+```bash
+ploof model run \
+  --provider fal \
+  --model fal-ai/flux/dev \
+  --prompt "Tiny app icon for a cheerful asset generation CLI" \
+  --param image_size=square_hd \
+  --out assets/fal-icon.png \
+  --output json
+```
+Named asset inputs map directly to provider input fields:
+```bash
+ploof model run \
+  --provider fal \
+  --model <fal-endpoint-id> \
+  --prompt "Animate this image into a short loop" \
+  --input image_url=assets/source.png \
+  --param duration=4 \
+  --out assets/loop.mp4
+```
+The media commands also work with fal when you provide the fal endpoint id as `--model`:
+```bash
+ploof image generate \
+  --provider fal \
+  --model fal-ai/flux/dev \
+  --prompt "Soft clay mascot icon" \
+  --param image_size=square_hd \
+  --out assets/mascot.png
+ploof video generate \
+  --provider fal \
+  --model <fal-video-endpoint-id> \
+  --prompt "Slow camera push through a miniature paper city" \
+  --input-reference assets/reference.png \
+  --param duration=4 \
+  --out assets/fal-video.mp4
+ploof audio generate \
+  --provider fal \
+  --model <fal-audio-endpoint-id> \
+  --text "A short spoken line." \
+  --out assets/fal-audio.mp3
+```
+Use `--param key=value` or `--json '{...}'` for endpoint-specific settings. Queue controls include `--start-timeout`, `--timeout`, `--poll-interval`, `--priority low|normal`, and `--storage-expires-in`.
 ## Image Generation
 OpenAI image generation and editing default to `gpt-image-2` when `--model` is omitted.
@@ -376,6 +459,15 @@ tasks:
     params:
       model: gpt-4o-mini-transcribe
     output: assets/transcript.json
+  - id: fal-icon
+    kind: model.run
+    provider: fal
+    model: fal-ai/flux/dev
+    prompt: "Small mascot icon for a CLI tool"
+    params:
+      image_size: square_hd
+    output: assets/fal-icon.png
 ```
 Run it:
@@ -385,6 +477,8 @@ ploof run assets.yaml --parallel 4
 ploof run assets.yaml --dry-run --output json
 ```
+In manifests, media task kinds default to `provider: openai`; `model.run` defaults to `provider: fal`.
 ## Output Formats
 Ploof defaults to table output in TTYs and compact output when piped.
@@ -454,7 +548,7 @@ bun run build
 npm pack --dry-run
 ```
-The default test suite includes mocked OpenAI end-to-end tests. Those tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, video job polling/downloads, audio generation/processing, sidecar metadata, and dependency-aware manifests without spending API credits.
+The default test suite includes mocked OpenAI end-to-end tests and fal provider unit tests. The OpenAI tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, video job polling/downloads, audio generation/processing, sidecar metadata, and dependency-aware manifests without spending API credits. The fal tests verify endpoint payload construction, local input upload mapping, polling options, and output persistence without spending API credits.
 Live OpenAI tests are opt-in only:
@@ -462,11 +556,19 @@ Live OpenAI tests are opt-in only:
 PLOOF_OPENAI_API_KEY=sk-... bun test tests/e2e
 ```
+Live fal.ai tests are also opt-in and use `fal-ai/flux/schnell` by default:
+```bash
+PLOOF_FAL_KEY=... bun test tests/e2e/fal-live.test.ts
+```
 Optional live-test overrides:
 ```bash
 PLOOF_OPENAI_LIVE_MODEL=gpt-image-2
 PLOOF_OPENAI_LIVE_SIZE=1024x1024
+PLOOF_FAL_LIVE_MODEL=fal-ai/flux/schnell
+PLOOF_FAL_LIVE_IMAGE_SIZE_PARAM=image_size=square_hd
 ```
 ## Publishing

package/SPEC.md CHANGED

@@ -2,7 +2,7 @@
 ## Summary
-Ploof is an npm-published CLI for generating, editing, and processing assets through AI generation providers. It starts with OpenAI image, video, and audio generation/processing, but the architecture must support multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
+Ploof is an npm-published CLI for generating, editing, and processing assets through AI generation providers. It supports OpenAI image, video, and audio generation/processing plus fal.ai model endpoints, while preserving an architecture for multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
 The product should feel like a small, sharp developer tool: easy to run manually, predictable in scripts, and optimized for AI agents.
@@ -80,10 +80,11 @@ Local release verification must stop at `npm pack --dry-run`; do not run local
 ## Initial Provider Scope
-Version 1 starts with OpenAI only.
+The current provider scope includes OpenAI and fal.ai.
-Initial capabilities:
+Core operation kinds:
+- `model.run`
 - `image.generate`
 - `image.edit`
 - `image.variation`
@@ -103,9 +104,13 @@ Initial capabilities:
 Future providers should be added through the provider registry without changing the manifest model.
+Provider notes:
+- OpenAI has first-class implementations for images, videos, audio/TTS, transcription, translation, and OpenAI video library operations.
+- fal.ai uses the official `@fal-ai/client`, supports arbitrary endpoints through `model.run`, and supports image/video/audio commands when the chosen fal endpoint schema matches the command shape.
 Future high-leverage provider candidates:
-- fal.ai: strong multi-model generative media coverage.
 - Replicate: broad community model marketplace.
 - Hugging Face Inference Providers: centralized access to many hosted models/providers.
@@ -139,8 +144,12 @@ Environment overrides:
 - `PLOOF_OPENAI_API_KEY`
 - `OPENAI_API_KEY`
+- `PLOOF_FAL_KEY`
+- `FAL_KEY`
+- `PLOOF_FAL_KEY_ID` and `PLOOF_FAL_KEY_SECRET`
+- `FAL_KEY_ID` and `FAL_KEY_SECRET`
-The Ploof-specific env var wins over the provider-native env var. Stored credentials are used only when no env override is present.
+The Ploof-specific env var wins over the provider-native env var. Stored credentials are used only when no env override is present. Split fal.ai key id/secret pairs are joined into the token format expected by the fal client.
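The precedence rule above can be sketched as a small resolver. This is an illustrative sketch, not Ploof's actual source: `resolveFalKey` and `joinPair` are hypothetical names, the `id:secret` join format is an assumption about what the fal client expects, and the relative order of single keys versus split pairs within each tier is also assumed, since the spec only states that Ploof-specific variables beat provider-native ones and that stored credentials come last.

```typescript
type Env = Record<string, string | undefined>

// Join a split key pair into one token.
// ASSUMPTION: "id:secret" is the format the fal client expects.
function joinPair(env: Env, idVar: string, secretVar: string): string | undefined {
  const id = env[idVar]
  const secret = env[secretVar]
  return id && secret ? `${id}:${secret}` : undefined
}

// Hypothetical resolver: Ploof-specific variables first, then
// provider-native variables, then the stored profile credential.
// ASSUMPTION: a single key is checked before a split pair in each tier.
function resolveFalKey(env: Env, storedKey?: string): string | undefined {
  return (
    env["PLOOF_FAL_KEY"] ||
    joinPair(env, "PLOOF_FAL_KEY_ID", "PLOOF_FAL_KEY_SECRET") ||
    env["FAL_KEY"] ||
    joinPair(env, "FAL_KEY_ID", "FAL_KEY_SECRET") ||
    storedKey
  )
}
```

In a Node process the first argument would typically be `process.env`, and the second the key read from the active auth profile.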
 OpenAI profile metadata may also include:
@@ -166,9 +175,10 @@ OpenAI profile metadata may also include:
 ```bash
 ploof login openai --api-key <key> [--profile default] [--organization org] [--project proj] [--base-url url]
+ploof login fal --api-key <key> [--profile default]
 ploof whoami [provider] [--profile default]
 ploof profiles [provider]
-ploof logout openai [--profile default]
+ploof logout <provider> [--profile default]
 ```
 `login`, `whoami`, `profiles`, and `logout` are the only authentication
@@ -179,6 +189,10 @@ commands. Ploof should not expose a second equivalent auth namespace.
 when run in an interactive terminal. Non-interactive login fails if no key is
 provided.
+`ploof login fal` accepts `--api-key`, reads `PLOOF_FAL_KEY` or `FAL_KEY`, and
+also supports `PLOOF_FAL_KEY_ID`/`PLOOF_FAL_KEY_SECRET` or
+`FAL_KEY_ID`/`FAL_KEY_SECRET` pairs.
 ### Config
 ```bash
@@ -242,6 +256,48 @@ authenticated project has DALL-E 2 variation access; if OpenAI returns a 404,
 use `ploof image edit` for image-to-image workflows. `ploof image variations`
 is an alias.
+### Generic Model Endpoints
+`model.run` executes arbitrary provider model endpoints. It is primarily useful
+for model marketplaces such as fal.ai, where the endpoint schema is selected by
+`--model`.
+```bash
+ploof model run \
+  --provider fal \
+  --model fal-ai/flux/dev \
+  --prompt "Small mascot icon for a CLI tool" \
+  --param image_size=square_hd \
+  --out assets/fal-icon.png \
+  --output json
+```
+Named inputs preserve exact provider field names:
+```bash
+ploof model run \
+  --provider fal \
+  --model <fal-endpoint-id> \
+  --prompt "Animate this source image" \
+  --input image_url=assets/source.png \
+  --param duration=4 \
+  --out assets/clip.mp4
+```
+Model endpoint controls:
+- `--param key=value`
+- `--json '{...}'`
+- `--input field=path-or-url`
+- `--start-timeout <seconds>`
+- `--timeout <seconds>`
+- `--poll-interval <seconds>`
+- `--priority low|normal`
+- `--storage-expires-in <value>`
+fal.ai commands should use queue polling and write complete returned assets or
+text outputs to disk.
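One way to picture how the timeout controls could interact with queue polling is the sketch below. It is a sketch under stated assumptions, not the fal client's or Ploof's implementation: `decide`, `pollUntilDone`, and the three-state status model are hypothetical, and it assumes `--start-timeout` bounds time spent queued while `--timeout` bounds total wait. The CLI flags take seconds; the sketch uses milliseconds.

```typescript
type PollState = "queued" | "running" | "done"

interface PollOptions {
  startTimeoutMs: number // ASSUMPTION: max time a job may stay queued
  timeoutMs: number      // ASSUMPTION: max total wait for completion
  pollIntervalMs: number // delay between status checks
}

type PollDecision = "finish" | "wait" | "fail-timeout" | "fail-start-timeout"

// Pure decision step: what to do after one status check.
function decide(state: PollState, elapsedMs: number, opts: PollOptions): PollDecision {
  if (state === "done") return "finish"
  if (elapsedMs >= opts.timeoutMs) return "fail-timeout"
  if (state === "queued" && elapsedMs >= opts.startTimeoutMs) return "fail-start-timeout"
  return "wait"
}

// Polling loop built on the decision step. `check` is a hypothetical
// callback that asks the provider for the current job state.
async function pollUntilDone(check: () => Promise<PollState>, opts: PollOptions): Promise<void> {
  const startedAt = Date.now()
  for (;;) {
    const decision = decide(await check(), Date.now() - startedAt, opts)
    if (decision === "finish") return
    if (decision !== "wait") throw new Error(decision)
    await new Promise<void>(resolve => setTimeout(resolve, opts.pollIntervalMs))
  }
}
```

Keeping the decision step pure makes the timeout rules testable without timers or network calls.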
 ### Video Generation
 OpenAI video generation uses the asynchronous Videos API. `ploof video generate`
@@ -386,6 +442,9 @@ because they do not directly produce finished asset files.
 ploof run assets.yaml --parallel 4
 ```
+Manifest media task kinds default to `provider: openai`; `model.run` defaults
+to `provider: fal`.
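The defaulting rule above amounts to a small lookup. A minimal sketch, assuming a task shape with an optional `provider` field; `defaultProviderFor` and `resolveProvider` are hypothetical names rather than Ploof internals:

```typescript
// Hypothetical minimal task shape; real manifest tasks carry more fields.
type ManifestTask = { kind: string; provider?: string }

// `model.run` defaults to fal; every other (media) kind defaults to openai.
function defaultProviderFor(kind: string): "openai" | "fal" {
  return kind === "model.run" ? "fal" : "openai"
}

// An explicit `provider` on the task always wins over the default.
function resolveProvider(task: ManifestTask): string {
  return task.provider ?? defaultProviderFor(task.kind)
}
```

Under this sketch, a media task that targets fal still has to set `provider: fal` explicitly.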
 Manifest example:
 ```yaml
@@ -454,6 +513,15 @@ tasks:
     params:
       model: gpt-4o-mini-transcribe
     output: assets/transcript.json
+  - id: fal-icon
+    kind: model.run
+    provider: fal
+    model: fal-ai/flux/dev
+    prompt: "Small mascot icon for a CLI tool"
+    params:
+      image_size: square_hd
+    output: assets/fal-icon.png
 ```
 ## Asset Input Model
@@ -462,13 +530,18 @@ All input/context assets are normalized before provider execution:
 ```ts
 type AssetInput = {
-  role: 'image' | 'mask' | 'reference' | 'style' | 'audio' | 'video'
+  role: 'image' | 'mask' | 'reference' | 'style' | 'audio' | 'video' | string
   source: string
   mime?: string
   name?: string
 }
 ```
+Manifest `inputs` are a role map. Built-in aliases such as `images`,
+`inputReference`, and `videos` normalize to `image`, `reference`, and `video`,
+but providers can also consume custom roles like `style`, `control`, `pose`, or
+`initImage` without changing the manifest schema.
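The role-map normalization described above might look like the following. This is a sketch, not the shipped code: `normalizeInputs` and `ROLE_ALIASES` are hypothetical names, only the three aliases named in the spec are included, and it assumes each role maps to either one source string or a list of them.

```typescript
type AssetInput = { role: string; source: string; mime?: string; name?: string }

// Built-in aliases named in the spec; the real table may be larger.
const ROLE_ALIASES: Record<string, string> = {
  images: "image",
  inputReference: "reference",
  videos: "video",
}

// ASSUMPTION: manifest values are a single source or an array of sources.
function normalizeInputs(inputs: Record<string, string | string[]>): AssetInput[] {
  const normalized: AssetInput[] = []
  for (const [key, value] of Object.entries(inputs)) {
    // Unknown keys pass through unchanged as custom roles.
    const role = ROLE_ALIASES[key] ?? key
    const sources = Array.isArray(value) ? value : [value]
    for (const source of sources) normalized.push({ role, source })
  }
  return normalized
}
```

Because unknown keys pass through, a custom role such as `pose` reaches the provider without any schema change.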
 Supported sources:
 - Local paths.
@@ -490,6 +563,18 @@ OpenAI audio processing maps:
 - `role=audio` to the uploaded audio file for transcription or translation.
+fal.ai media commands map common roles to URL fields:
+- `role=image` and `role=reference` to `image_url`.
+- `role=mask` to `mask_url`.
+- `role=style` to `style_image_url`.
+- `role=audio` to `audio_url`.
+- `role=video` to `video_url`.
+fal.ai `model.run` preserves exact input field names, so
+`inputs.image_url` or `--input image_url=source.png` becomes `image_url` in the
+provider input payload.
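The two mapping modes above can be contrasted in one function. A hedged sketch: `buildFalInput` and `FAL_ROLE_FIELDS` are hypothetical names, the inputs are assumed to be already uploaded (so each carries a URL), and two behaviors are assumptions the spec does not state: unknown roles fall through under their own name, and a later input overwrites an earlier one that maps to the same field.

```typescript
// Role-to-field table taken from the list above.
const FAL_ROLE_FIELDS: Record<string, string> = {
  image: "image_url",
  reference: "image_url",
  mask: "mask_url",
  style: "style_image_url",
  audio: "audio_url",
  video: "video_url",
}

type UploadedInput = { role: string; url: string }

// preserveNames = true models `model.run` (exact field names kept);
// preserveNames = false models the media commands (roles mapped to URL fields).
function buildFalInput(inputs: UploadedInput[], preserveNames: boolean): Record<string, string> {
  const payload: Record<string, string> = {}
  for (const { role, url } of inputs) {
    // ASSUMPTION: unmapped roles are passed through unchanged.
    const field = preserveNames ? role : FAL_ROLE_FIELDS[role] ?? role
    // ASSUMPTION: last write wins when two inputs target the same field.
    payload[field] = url
  }
  return payload
}
```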
 Future providers can map roles such as `reference`, `style`, `init-image`, `audio`, or `video` differently.
 ## Provider Architecture
@@ -499,34 +584,37 @@ Provider modules implement a common interface:
 ```ts
 type Provider = {
   id: string
+  displayName?: string
   capabilities: ProviderCapability[]
-  runImageGenerate(job, context): Promise<ProviderResult>
-  runImageEdit(job, context): Promise<ProviderResult>
-  runImageVariation(job, context): Promise<ProviderResult>
-  runVideoGenerate(job, context): Promise<ProviderResult>
-  runVideoEdit(job, context): Promise<ProviderResult>
-  runVideoExtend(job, context): Promise<ProviderResult>
-  runVideoRemix(job, context): Promise<ProviderResult>
-  runVideoStatus(job, context): Promise<ProviderResult>
-  runVideoDownload(job, context): Promise<ProviderResult>
-  runVideoList(job, context): Promise<ProviderResult>
-  runVideoDelete(job, context): Promise<ProviderResult>
-  runVideoCharacterCreate(job, context): Promise<ProviderResult>
-  runVideoCharacterGet(job, context): Promise<ProviderResult>
-  runAudioGenerate(job, context): Promise<ProviderResult>
-  runAudioTranscribe(job, context): Promise<ProviderResult>
-  runAudioTranslate(job, context): Promise<ProviderResult>
+  auth?: {
+    apiKeyEnvVars: string[]
+    apiKeyEnvPairs?: Array<{ idEnvVar: string; secretEnvVar: string }>
+    organizationEnvVar?: string
+    projectEnvVar?: string
+    baseURLEnvVar?: string
+  }
+  run(job, context): Promise<ProviderResult>
 }
 ```
 The provider registry owns:
 - Provider lookup.
-- Capability checks.
-- Credential resolution.
+- Auth metadata lookup.
+- Capability discovery.
+Provider modules own:
 - Provider-specific validation.
+- Provider SDK/client mapping.
+- Dispatch from generic `AssetJob` objects to internal operation handlers.
+- Output persistence details when the provider returns URLs, binary responses, or
+  structured data.
 The CLI should avoid hardcoding all provider behavior into command handlers.
+Manifest execution should build generic `AssetJob` objects and call
+`provider.run(job, context)` rather than calling modality-specific provider
+methods directly.
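The single-entry-point design above can be illustrated with a handler table behind `run`. This is a sketch of the dispatch idea only: `makeProvider`, the trimmed `AssetJob` and `ProviderResult` shapes, and deriving `capabilities` from the handler keys are all assumptions, and the `context` argument from the interface is omitted for brevity.

```typescript
// Trimmed, hypothetical shapes; the real types carry more fields.
type AssetJob = { kind: string; prompt?: string }
type ProviderResult = { outputs: string[] }
type Handler = (job: AssetJob) => Promise<ProviderResult>

// Build a provider whose single `run` method dispatches on `job.kind`
// to an internal operation handler.
function makeProvider(id: string, handlers: Record<string, Handler>) {
  return {
    id,
    // ASSUMPTION: capabilities are simply the supported operation kinds.
    capabilities: Object.keys(handlers),
    run(job: AssetJob): Promise<ProviderResult> {
      const handler = handlers[job.kind]
      if (!handler) {
        return Promise.reject(new Error(`${id} does not support ${job.kind}`))
      }
      return handler(job)
    },
  }
}

// Example: a stub provider that only knows `model.run`.
const stubProvider = makeProvider("fal", {
  "model.run": async job => ({ outputs: [`ran: ${job.prompt ?? ""}`] }),
})
```

With this shape, manifest execution only needs `stubProvider.run(job)`; adding an operation means adding a handler entry rather than a new method on the interface.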
 ## Settings Strategy
@@ -566,6 +654,11 @@ Asset-producing commands should write the asset to disk and print structured met
 }
 ```
+Ploof is a static asset generation tool. Providers may use asynchronous jobs,
+polling, or queue subscriptions internally, but CLI consumers receive completed
+files or text outputs after the command finishes. Streaming transports should
+not be exposed as the primary consumption model.
 Each generated file should have an optional sidecar metadata file:
 ```text