@miketromba/ploof 0.3.0 → 0.4.0

package/README.md CHANGED
@@ -9,7 +9,7 @@
  <img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen" alt="node version" />
  </p>
 
- Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image, video, and audio generation/processing today, plus the legacy OpenAI image variations endpoint when the authenticated project has access. The provider registry is designed for broader model marketplaces over time.
+ Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image, video, and audio generation/processing, plus fal.ai's model marketplace through the official fal client. The provider registry is designed for broader model marketplaces over time.
 
  It is built for both developers and AI agents: predictable commands, parseable output, local auth profiles, YAML manifests, parallel execution, and a companion skill.
 
@@ -27,6 +27,9 @@ It is built for both developers and AI agents: predictable commands, parseable o
  | OpenAI audio generation / TTS | Supported |
  | OpenAI audio transcription | Supported |
  | OpenAI audio translation | Supported |
+ | fal.ai auth profiles | Supported |
+ | fal.ai model endpoints | Supported through `ploof model run` |
+ | fal.ai image/video/audio endpoints | Supported through `--provider fal --model <endpoint-id>` |
  | Context images and masks | Supported |
  | Image, video, and audio input assets | Supported |
  | YAML/JSON batch manifests | Supported |
@@ -60,6 +63,7 @@ npx @miketromba/ploof --help
  ```bash
  # Authenticate
  ploof login openai --api-key <your-api-key>
+ ploof login fal --api-key <your-fal-key>
 
  # Generate an image
  ploof image generate \
@@ -94,6 +98,14 @@ ploof audio transcribe \
 
  # Run a manifest
  ploof run assets.yaml --parallel 4
+
+ # Run any fal.ai endpoint directly
+ ploof model run \
+   --provider fal \
+   --model fal-ai/flux/dev \
+   --prompt "Friendly CLI mascot icon, simple shape, transparent background" \
+   --param image_size=square_hd \
+   --out assets/icon.png
  ```
 
  ## Authentication
@@ -106,6 +118,8 @@ ploof login openai --api-key <your-api-key> --profile work
  ploof whoami openai
  ploof profiles openai
  ploof logout openai --profile work
+ ploof login fal --api-key <your-fal-key>
+ ploof whoami fal
  ```
 
  If `--api-key` is omitted, `ploof login openai` reads
@@ -118,6 +132,20 @@ Environment variables override stored credentials:
  export PLOOF_OPENAI_API_KEY=sk-...
  # or
  export OPENAI_API_KEY=sk-...
+
+ export PLOOF_FAL_KEY=...
+ # or
+ export FAL_KEY=...
+ ```
+
+ fal.ai split key environment variables are also supported:
+
+ ```bash
+ export PLOOF_FAL_KEY_ID=...
+ export PLOOF_FAL_KEY_SECRET=...
+ # or
+ export FAL_KEY_ID=...
+ export FAL_KEY_SECRET=...
  ```
 
  OpenAI profile metadata:
@@ -130,6 +158,61 @@ ploof login openai \
    --base-url <url>
  ```
 
+ ## fal.ai Model Endpoints
+
+ fal.ai support uses the official `@fal-ai/client`. Ploof uploads local asset inputs through fal storage, submits work through the fal queue in polling mode, waits for a complete response, and writes returned assets or text to disk.
+
+ Use `ploof model run` for arbitrary fal endpoints:
+
+ ```bash
+ ploof model run \
+   --provider fal \
+   --model fal-ai/flux/dev \
+   --prompt "Tiny app icon for a cheerful asset generation CLI" \
+   --param image_size=square_hd \
+   --out assets/fal-icon.png \
+   --output json
+ ```
+
+ Named asset inputs map directly to provider input fields:
+
+ ```bash
+ ploof model run \
+   --provider fal \
+   --model <fal-endpoint-id> \
+   --prompt "Animate this image into a short loop" \
+   --input image_url=assets/source.png \
+   --param duration=4 \
+   --out assets/loop.mp4
+ ```
+
+ The media commands also work with fal when you provide the fal endpoint id as `--model`:
+
+ ```bash
+ ploof image generate \
+   --provider fal \
+   --model fal-ai/flux/dev \
+   --prompt "Soft clay mascot icon" \
+   --param image_size=square_hd \
+   --out assets/mascot.png
+
+ ploof video generate \
+   --provider fal \
+   --model <fal-video-endpoint-id> \
+   --prompt "Slow camera push through a miniature paper city" \
+   --input-reference assets/reference.png \
+   --param duration=4 \
+   --out assets/fal-video.mp4
+
+ ploof audio generate \
+   --provider fal \
+   --model <fal-audio-endpoint-id> \
+   --text "A short spoken line." \
+   --out assets/fal-audio.mp3
+ ```
+
+ Use `--param key=value` or `--json '{...}'` for endpoint-specific settings. Queue controls include `--start-timeout`, `--timeout`, `--poll-interval`, `--priority low|normal`, and `--storage-expires-in`.
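Reviewer note: the added README text names `--param key=value` and `--json '{...}'` but does not say how the two are combined or how values are typed. The sketch below is one plausible reading, not Ploof's implementation: `buildInput` and `coerce` are hypothetical names, the rule that `--param` overrides `--json` is an assumption, and so is the choice to parse values as JSON when possible and fall back to plain strings.

```ts
// Hypothetical sketch: merge repeated --param flags over an optional --json object.
// Precedence and value coercion are assumptions, not documented Ploof behavior.
function coerce(raw: string): unknown {
  try {
    return JSON.parse(raw) // "4" -> 4, "true" -> true
  } catch {
    return raw // "square_hd" stays a string
  }
}

function buildInput(params: string[], json?: string): Record<string, unknown> {
  const input: Record<string, unknown> = json ? JSON.parse(json) : {}
  for (const pair of params) {
    // Split on the first "=" only, so values may themselves contain "=".
    const eq = pair.indexOf('=')
    if (eq <= 0) throw new Error(`Expected key=value, got "${pair}"`)
    input[pair.slice(0, eq)] = coerce(pair.slice(eq + 1))
  }
  return input
}
```

Under these assumptions, `buildInput(['image_size=square_hd', 'duration=4'])` yields an object with a string `image_size` and a numeric `duration`.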
+
  ## Image Generation
 
  OpenAI image generation and editing default to `gpt-image-2` when `--model` is omitted.
@@ -376,6 +459,15 @@ tasks:
      params:
        model: gpt-4o-mini-transcribe
      output: assets/transcript.json
+
+   - id: fal-icon
+     kind: model.run
+     provider: fal
+     model: fal-ai/flux/dev
+     prompt: "Small mascot icon for a CLI tool"
+     params:
+       image_size: square_hd
+     output: assets/fal-icon.png
  ```
 
  Run it:
@@ -385,6 +477,8 @@ ploof run assets.yaml --parallel 4
  ploof run assets.yaml --dry-run --output json
  ```
 
+ In manifests, media task kinds default to `provider: openai`; `model.run` defaults to `provider: fal`.
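Reviewer note: the defaulting rule in the added sentence can be written as a small function. This is only a sketch of the stated rule; `ManifestTask` and `effectiveProvider` are hypothetical names and the task shape is reduced to the fields that matter here.

```ts
// Hypothetical sketch of the manifest provider default described above.
type ManifestTask = { id: string; kind: string; provider?: string }

function effectiveProvider(task: ManifestTask): string {
  // An explicit provider always wins.
  if (task.provider) return task.provider
  // model.run defaults to fal; every other (media) kind defaults to openai.
  return task.kind === 'model.run' ? 'fal' : 'openai'
}
```

One consequence is that the `provider: fal` line in the `fal-icon` manifest task is redundant but harmless, since `model.run` would resolve to fal anyway.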
+
  ## Output Formats
 
  Ploof defaults to table output in TTYs and compact output when piped.
@@ -454,7 +548,7 @@ bun run build
  npm pack --dry-run
  ```
 
- The default test suite includes mocked OpenAI end-to-end tests. Those tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, video job polling/downloads, audio generation/processing, sidecar metadata, and dependency-aware manifests without spending API credits.
+ The default test suite includes mocked OpenAI end-to-end tests and fal provider unit tests. The OpenAI tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, video job polling/downloads, audio generation/processing, sidecar metadata, and dependency-aware manifests without spending API credits. The fal tests verify endpoint payload construction, local input upload mapping, polling options, and output persistence without spending API credits.
 
  Live OpenAI tests are opt-in only:
 
@@ -462,11 +556,19 @@ Live OpenAI tests are opt-in only:
  PLOOF_OPENAI_API_KEY=sk-... bun test tests/e2e
  ```
 
+ Live fal.ai tests are also opt-in and use `fal-ai/flux/schnell` by default:
+
+ ```bash
+ PLOOF_FAL_KEY=... bun test tests/e2e/fal-live.test.ts
+ ```
+
  Optional live-test overrides:
 
  ```bash
  PLOOF_OPENAI_LIVE_MODEL=gpt-image-2
  PLOOF_OPENAI_LIVE_SIZE=1024x1024
+ PLOOF_FAL_LIVE_MODEL=fal-ai/flux/schnell
+ PLOOF_FAL_LIVE_IMAGE_SIZE_PARAM=image_size=square_hd
  ```
 
  ## Publishing
package/SPEC.md CHANGED
@@ -2,7 +2,7 @@
 
  ## Summary
 
- Ploof is an npm-published CLI for generating, editing, and processing assets through AI generation providers. It starts with OpenAI image, video, and audio generation/processing, but the architecture must support multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
+ Ploof is an npm-published CLI for generating, editing, and processing assets through AI generation providers. It supports OpenAI image, video, and audio generation/processing plus fal.ai model endpoints, while preserving an architecture for multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
 
  The product should feel like a small, sharp developer tool: easy to run manually, predictable in scripts, and optimized for AI agents.
 
@@ -80,10 +80,11 @@ Local release verification must stop at `npm pack --dry-run`; do not run local
 
  ## Initial Provider Scope
 
- Version 1 starts with OpenAI only.
+ The current provider scope includes OpenAI and fal.ai.
 
- Initial capabilities:
+ Core operation kinds:
 
+ - `model.run`
  - `image.generate`
  - `image.edit`
  - `image.variation`
@@ -103,9 +104,13 @@ Initial capabilities:
 
  Future providers should be added through the provider registry without changing the manifest model.
 
+ Provider notes:
+
+ - OpenAI has first-class implementations for images, videos, audio/TTS, transcription, translation, and OpenAI video library operations.
+ - fal.ai uses the official `@fal-ai/client`, supports arbitrary endpoints through `model.run`, and supports image/video/audio commands when the chosen fal endpoint schema matches the command shape.
+
  Future high-leverage provider candidates:
 
- - fal.ai: strong multi-model generative media coverage.
  - Replicate: broad community model marketplace.
  - Hugging Face Inference Providers: centralized access to many hosted models/providers.
 
@@ -139,8 +144,12 @@ Environment overrides:
 
  - `PLOOF_OPENAI_API_KEY`
  - `OPENAI_API_KEY`
+ - `PLOOF_FAL_KEY`
+ - `FAL_KEY`
+ - `PLOOF_FAL_KEY_ID` and `PLOOF_FAL_KEY_SECRET`
+ - `FAL_KEY_ID` and `FAL_KEY_SECRET`
 
- The Ploof-specific env var wins over the provider-native env var. Stored credentials are used only when no env override is present.
+ The Ploof-specific env var wins over the provider-native env var. Stored credentials are used only when no env override is present. Split fal.ai key id/secret pairs are joined into the token format expected by the fal client.
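Reviewer note: the precedence described in the added paragraph could look roughly like the sketch below. It is an illustration, not Ploof's code: `resolveFalKey` is a hypothetical name, checking single-key variables before split pairs is an assumption (the spec only orders Ploof-specific before provider-native), and the `id:secret` join is assumed to be the token format the fal client expects.

```ts
// Hypothetical sketch of fal.ai credential resolution.
// Only the variable names and "Ploof-specific wins, stored key last" come from the spec.
type Env = Record<string, string | undefined>

const FAL_KEY_VARS = ['PLOOF_FAL_KEY', 'FAL_KEY']
const FAL_KEY_PAIRS = [
  { idEnvVar: 'PLOOF_FAL_KEY_ID', secretEnvVar: 'PLOOF_FAL_KEY_SECRET' },
  { idEnvVar: 'FAL_KEY_ID', secretEnvVar: 'FAL_KEY_SECRET' },
]

function resolveFalKey(env: Env, storedKey?: string): string | undefined {
  // 1. Single-key variables, Ploof-specific first.
  for (const name of FAL_KEY_VARS) {
    const value = env[name]
    if (value) return value
  }
  // 2. Split id/secret pairs, joined as "id:secret" (assumed token format).
  for (const { idEnvVar, secretEnvVar } of FAL_KEY_PAIRS) {
    const id = env[idEnvVar]
    const secret = env[secretEnvVar]
    if (id && secret) return `${id}:${secret}`
  }
  // 3. Stored profile credentials only when no env override is present.
  return storedKey
}
```

In a Node or Bun process this would be called as `resolveFalKey(process.env, profileKey)`, where `profileKey` stands for whatever the stored profile holds.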
 
  OpenAI profile metadata may also include:
 
@@ -166,9 +175,10 @@ OpenAI profile metadata may also include:
 
  ```bash
  ploof login openai --api-key <key> [--profile default] [--organization org] [--project proj] [--base-url url]
+ ploof login fal --api-key <key> [--profile default]
  ploof whoami [provider] [--profile default]
  ploof profiles [provider]
- ploof logout openai [--profile default]
+ ploof logout <provider> [--profile default]
  ```
 
  `login`, `whoami`, `profiles`, and `logout` are the only authentication
@@ -179,6 +189,10 @@ commands. Ploof should not expose a second equivalent auth namespace.
  when run in an interactive terminal. Non-interactive login fails if no key is
  provided.
 
+ `ploof login fal` accepts `--api-key`, reads `PLOOF_FAL_KEY` or `FAL_KEY`, and
+ also supports `PLOOF_FAL_KEY_ID`/`PLOOF_FAL_KEY_SECRET` or
+ `FAL_KEY_ID`/`FAL_KEY_SECRET` pairs.
+
  ### Config
 
  ```bash
@@ -242,6 +256,48 @@ authenticated project has DALL-E 2 variation access; if OpenAI returns a 404,
  use `ploof image edit` for image-to-image workflows. `ploof image variations`
  is an alias.
 
+ ### Generic Model Endpoints
+
+ `model.run` executes arbitrary provider model endpoints. It is primarily useful
+ for model marketplaces such as fal.ai, where the endpoint schema is selected by
+ `--model`.
+
+ ```bash
+ ploof model run \
+   --provider fal \
+   --model fal-ai/flux/dev \
+   --prompt "Small mascot icon for a CLI tool" \
+   --param image_size=square_hd \
+   --out assets/fal-icon.png \
+   --output json
+ ```
+
+ Named inputs preserve exact provider field names:
+
+ ```bash
+ ploof model run \
+   --provider fal \
+   --model <fal-endpoint-id> \
+   --prompt "Animate this source image" \
+   --input image_url=assets/source.png \
+   --param duration=4 \
+   --out assets/clip.mp4
+ ```
+
+ Model endpoint controls:
+
+ - `--param key=value`
+ - `--json '{...}'`
+ - `--input field=path-or-url`
+ - `--start-timeout <seconds>`
+ - `--timeout <seconds>`
+ - `--poll-interval <seconds>`
+ - `--priority low|normal`
+ - `--storage-expires-in <value>`
+
+ fal.ai commands should use queue polling and write complete returned assets or
+ text outputs to disk.
+
  ### Video Generation
 
  OpenAI video generation uses the asynchronous Videos API. `ploof video generate`
@@ -386,6 +442,9 @@ because they do not directly produce finished asset files.
  ploof run assets.yaml --parallel 4
  ```
 
+ Manifest media task kinds default to `provider: openai`; `model.run` defaults
+ to `provider: fal`.
+
  Manifest example:
 
  ```yaml
@@ -454,6 +513,15 @@ tasks:
      params:
        model: gpt-4o-mini-transcribe
      output: assets/transcript.json
+
+   - id: fal-icon
+     kind: model.run
+     provider: fal
+     model: fal-ai/flux/dev
+     prompt: "Small mascot icon for a CLI tool"
+     params:
+       image_size: square_hd
+     output: assets/fal-icon.png
  ```
 
  ## Asset Input Model
@@ -462,13 +530,18 @@ All input/context assets are normalized before provider execution:
 
  ```ts
  type AssetInput = {
-   role: 'image' | 'mask' | 'reference' | 'style' | 'audio' | 'video'
+   role: 'image' | 'mask' | 'reference' | 'style' | 'audio' | 'video' | string
    source: string
    mime?: string
    name?: string
  }
  ```
 
+ Manifest `inputs` are a role map. Built-in aliases such as `images`,
+ `inputReference`, and `videos` normalize to `image`, `reference`, and `video`,
+ but providers can also consume custom roles like `style`, `control`, `pose`, or
+ `initImage` without changing the manifest schema.
+
  Supported sources:
 
  - Local paths.
@@ -490,6 +563,18 @@ OpenAI audio processing maps:
 
  - `role=audio` to the uploaded audio file for transcription or translation.
 
+ fal.ai media commands map common roles to URL fields:
+
+ - `role=image` and `role=reference` to `image_url`.
+ - `role=mask` to `mask_url`.
+ - `role=style` to `style_image_url`.
+ - `role=audio` to `audio_url`.
+ - `role=video` to `video_url`.
+
+ fal.ai `model.run` preserves exact input field names, so
+ `inputs.image_url` or `--input image_url=source.png` becomes `image_url` in the
+ provider input payload.
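Reviewer note: the two mapping rules above (role table for media commands, pass-through for `model.run`) can be sketched as follows. `buildFalInputFields` is a hypothetical name, sources are assumed to be URLs that have already been uploaded, and passing unknown roles through unchanged for media commands is an assumption. The spec also does not say what happens when both `image` and `reference` are supplied; in this sketch the later one overwrites `image_url`.

```ts
// Hypothetical sketch of fal.ai input field mapping; role table taken from the spec text.
type AssetInput = { role: string; source: string }

const FAL_ROLE_FIELDS = new Map<string, string>([
  ['image', 'image_url'],
  ['reference', 'image_url'],
  ['mask', 'mask_url'],
  ['style', 'style_image_url'],
  ['audio', 'audio_url'],
  ['video', 'video_url'],
])

function buildFalInputFields(kind: string, inputs: AssetInput[]): Record<string, string> {
  const payload: Record<string, string> = {}
  for (const input of inputs) {
    // model.run keeps the exact field name; media kinds go through the role table.
    const field =
      kind === 'model.run' ? input.role : FAL_ROLE_FIELDS.get(input.role) ?? input.role
    payload[field] = input.source
  }
  return payload
}
```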
+
  Future providers can map roles such as `reference`, `style`, `init-image`, `audio`, or `video` differently.
 
  ## Provider Architecture
@@ -499,34 +584,37 @@ Provider modules implement a common interface:
  ```ts
  type Provider = {
    id: string
+   displayName?: string
    capabilities: ProviderCapability[]
-   runImageGenerate(job, context): Promise<ProviderResult>
-   runImageEdit(job, context): Promise<ProviderResult>
-   runImageVariation(job, context): Promise<ProviderResult>
-   runVideoGenerate(job, context): Promise<ProviderResult>
-   runVideoEdit(job, context): Promise<ProviderResult>
-   runVideoExtend(job, context): Promise<ProviderResult>
-   runVideoRemix(job, context): Promise<ProviderResult>
-   runVideoStatus(job, context): Promise<ProviderResult>
-   runVideoDownload(job, context): Promise<ProviderResult>
-   runVideoList(job, context): Promise<ProviderResult>
-   runVideoDelete(job, context): Promise<ProviderResult>
-   runVideoCharacterCreate(job, context): Promise<ProviderResult>
-   runVideoCharacterGet(job, context): Promise<ProviderResult>
-   runAudioGenerate(job, context): Promise<ProviderResult>
-   runAudioTranscribe(job, context): Promise<ProviderResult>
-   runAudioTranslate(job, context): Promise<ProviderResult>
+   auth?: {
+     apiKeyEnvVars: string[]
+     apiKeyEnvPairs?: Array<{ idEnvVar: string; secretEnvVar: string }>
+     organizationEnvVar?: string
+     projectEnvVar?: string
+     baseURLEnvVar?: string
+   }
+   run(job, context): Promise<ProviderResult>
  }
  ```
 
  The provider registry owns:
 
  - Provider lookup.
- - Capability checks.
- - Credential resolution.
+ - Auth metadata lookup.
+ - Capability discovery.
+
+ Provider modules own:
+
  - Provider-specific validation.
+ - Provider SDK/client mapping.
+ - Dispatch from generic `AssetJob` objects to internal operation handlers.
+ - Output persistence details when the provider returns URLs, binary responses, or
+   structured data.
 
  The CLI should avoid hardcoding all provider behavior into command handlers.
+ Manifest execution should build generic `AssetJob` objects and call
+ `provider.run(job, context)` rather than calling modality-specific provider
+ methods directly.
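Reviewer note: the move from sixteen `run*` methods to a single `run(job, context)` implies that each provider dispatches on the job kind internally. A minimal sketch of that pattern follows; `createProvider`, the handler map, and the reduced `AssetJob`, `ProviderContext`, and `ProviderResult` shapes are all hypothetical stand-ins, not the package's real types.

```ts
// Hypothetical sketch of single-entry-point dispatch inside a provider module.
type AssetJob = { kind: string; model?: string; prompt?: string }
type ProviderContext = { profile?: string }
type ProviderResult = { files: string[] }
type Handler = (job: AssetJob, context: ProviderContext) => Promise<ProviderResult>
type Provider = { id: string; capabilities: string[]; run: Handler }

function createProvider(id: string, handlers: Map<string, Handler>): Provider {
  return {
    id,
    // Capabilities are derived from the registered handlers in this sketch.
    capabilities: Array.from(handlers.keys()),
    async run(job, context) {
      const handler = handlers.get(job.kind)
      if (!handler) throw new Error(`${id} does not support ${job.kind}`)
      return handler(job, context)
    },
  }
}
```

With this shape the manifest runner only needs `provider.run(job, context)`; adding an operation kind means registering one more handler rather than widening the `Provider` type.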
 
  ## Settings Strategy
 
@@ -566,6 +654,11 @@ Asset-producing commands should write the asset to disk and print structured met
  }
  ```
 
+ Ploof is a static asset generation tool. Providers may use asynchronous jobs,
+ polling, or queue subscriptions internally, but CLI consumers receive completed
+ files or text outputs after the command finishes. Streaming transports should
+ not be exposed as the primary consumption model.
+
  Each generated file should have an optional sidecar metadata file:
 
  ```text