@miketromba/ploof 0.2.0 → 0.4.0

package/README.md CHANGED
@@ -9,7 +9,7 @@
9
9
  <img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen" alt="node version" />
10
10
  </p>
11
11
 
12
- Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image generation/editing and OpenAI video generation/editing today, plus the legacy OpenAI image variations endpoint when the authenticated project has access. The provider registry is designed for audio and broader model marketplaces over time.
12
+ Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image, video, and audio generation/processing, plus fal.ai's model marketplace through the official fal client. The provider registry is designed for broader model marketplaces over time.
13
13
 
14
14
  It is built for both developers and AI agents: predictable commands, parseable output, local auth profiles, YAML manifests, parallel execution, and a companion skill.
15
15
 
@@ -24,13 +24,18 @@ It is built for both developers and AI agents: predictable commands, parseable o
24
24
  | OpenAI video generation | Supported |
25
25
  | OpenAI video editing/extensions | Supported |
26
26
  | OpenAI video downloads/library/characters | Supported |
27
+ | OpenAI audio generation / TTS | Supported |
28
+ | OpenAI audio transcription | Supported |
29
+ | OpenAI audio translation | Supported |
30
+ | fal.ai auth profiles | Supported |
31
+ | fal.ai model endpoints | Supported through `ploof model run` |
32
+ | fal.ai image/video/audio endpoints | Supported through `--provider fal --model <endpoint-id>` |
27
33
  | Context images and masks | Supported |
28
- | Video references and source videos | Supported |
34
+ | Image, video, and audio input assets | Supported |
29
35
  | YAML/JSON batch manifests | Supported |
30
36
  | Dependency-aware parallel runs | Supported |
31
37
  | Agent instructions via `ploof learn` | Supported |
32
38
  | Additional providers | Planned |
33
- | Audio generation | Planned |
34
39
 
35
40
  ## Install
36
41
 
@@ -58,6 +63,7 @@ npx @miketromba/ploof --help
58
63
  ```bash
59
64
  # Authenticate
60
65
  ploof login openai --api-key <your-api-key>
66
+ ploof login fal --api-key <your-fal-key>
61
67
 
62
68
  # Generate an image
63
69
  ploof image generate \
@@ -80,8 +86,26 @@ ploof video generate \
80
86
  --seconds 4 \
81
87
  --out assets/clip.mp4
82
88
 
89
+ # Generate and transcribe speech
90
+ ploof audio generate \
91
+ --text "Ploof can generate speech and process audio." \
92
+ --voice alloy \
93
+ --out assets/speech.mp3
94
+
95
+ ploof audio transcribe \
96
+ --audio assets/speech.mp3 \
97
+ --out assets/transcript.json
98
+
83
99
  # Run a manifest
84
100
  ploof run assets.yaml --parallel 4
101
+
102
+ # Run any fal.ai endpoint directly
103
+ ploof model run \
104
+ --provider fal \
105
+ --model fal-ai/flux/dev \
106
+ --prompt "Friendly CLI mascot icon, simple shape, transparent background" \
107
+ --param image_size=square_hd \
108
+ --out assets/icon.png
85
109
  ```
86
110
 
87
111
  ## Authentication
@@ -94,6 +118,8 @@ ploof login openai --api-key <your-api-key> --profile work
94
118
  ploof whoami openai
95
119
  ploof profiles openai
96
120
  ploof logout openai --profile work
121
+ ploof login fal --api-key <your-fal-key>
122
+ ploof whoami fal
97
123
  ```
98
124
 
99
125
  If `--api-key` is omitted, `ploof login openai` reads
@@ -106,6 +132,20 @@ Environment variables override stored credentials:
106
132
  export PLOOF_OPENAI_API_KEY=sk-...
107
133
  # or
108
134
  export OPENAI_API_KEY=sk-...
135
+
136
+ export PLOOF_FAL_KEY=...
137
+ # or
138
+ export FAL_KEY=...
139
+ ```
140
+
141
+ fal.ai split key environment variables are also supported:
142
+
143
+ ```bash
144
+ export PLOOF_FAL_KEY_ID=...
145
+ export PLOOF_FAL_KEY_SECRET=...
146
+ # or
147
+ export FAL_KEY_ID=...
148
+ export FAL_KEY_SECRET=...
109
149
  ```
110
150
 
111
151
  OpenAI profile metadata:
@@ -118,6 +158,61 @@ ploof login openai \
118
158
  --base-url <url>
119
159
  ```
120
160
 
161
+ ## fal.ai Model Endpoints
162
+
163
+ fal.ai support uses the official `@fal-ai/client`. Ploof uploads local asset inputs through fal storage, submits work through the fal queue in polling mode, waits for a complete response, and writes returned assets or text to disk.
164
+
165
+ Use `ploof model run` for arbitrary fal endpoints:
166
+
167
+ ```bash
168
+ ploof model run \
169
+ --provider fal \
170
+ --model fal-ai/flux/dev \
171
+ --prompt "Tiny app icon for a cheerful asset generation CLI" \
172
+ --param image_size=square_hd \
173
+ --out assets/fal-icon.png \
174
+ --output json
175
+ ```
176
+
177
+ Named asset inputs map directly to provider input fields:
178
+
179
+ ```bash
180
+ ploof model run \
181
+ --provider fal \
182
+ --model <fal-endpoint-id> \
183
+ --prompt "Animate this image into a short loop" \
184
+ --input image_url=assets/source.png \
185
+ --param duration=4 \
186
+ --out assets/loop.mp4
187
+ ```
188
+
189
+ The media commands also work with fal when you provide the fal endpoint id as `--model`:
190
+
191
+ ```bash
192
+ ploof image generate \
193
+ --provider fal \
194
+ --model fal-ai/flux/dev \
195
+ --prompt "Soft clay mascot icon" \
196
+ --param image_size=square_hd \
197
+ --out assets/mascot.png
198
+
199
+ ploof video generate \
200
+ --provider fal \
201
+ --model <fal-video-endpoint-id> \
202
+ --prompt "Slow camera push through a miniature paper city" \
203
+ --input-reference assets/reference.png \
204
+ --param duration=4 \
205
+ --out assets/fal-video.mp4
206
+
207
+ ploof audio generate \
208
+ --provider fal \
209
+ --model <fal-audio-endpoint-id> \
210
+ --text "A short spoken line." \
211
+ --out assets/fal-audio.mp3
212
+ ```
213
+
214
+ Use `--param key=value` or `--json '{...}'` for endpoint-specific settings. Queue controls include `--start-timeout`, `--timeout`, `--poll-interval`, `--priority low|normal`, and `--storage-expires-in`.
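
As a rough illustration of how `--param` and `--json` values might be combined into one endpoint input object, here is a minimal sketch. The helper names `coerceParam` and `buildEndpointInput` are hypothetical, and two behaviors are assumptions not stated in the README: numeric and boolean strings are coerced, and `--param` entries override keys supplied through `--json`.

```ts
// Hypothetical helpers for illustration; not Ploof's actual implementation.
type ParamValue = string | number | boolean;

// Assumption: "true"/"false" and plain numeric strings are coerced.
function coerceParam(raw: string): ParamValue {
  if (raw === "true") return true;
  if (raw === "false") return false;
  if (raw.trim() !== "" && !Number.isNaN(Number(raw))) return Number(raw);
  return raw;
}

// Merge a --json object with repeated --param key=value entries.
// Assumption: --param entries win when both set the same key.
function buildEndpointInput(
  params: string[],
  json?: string
): Record<string, unknown> {
  const input: Record<string, unknown> = json ? JSON.parse(json) : {};
  for (const entry of params) {
    const eq = entry.indexOf("=");
    if (eq <= 0) throw new Error(`Expected key=value, got "${entry}"`);
    // Split on the first "=" only, so values may contain "=".
    input[entry.slice(0, eq)] = coerceParam(entry.slice(eq + 1));
  }
  return input;
}
```

Under these assumptions, `--param image_size=square_hd --param duration=4` combined with `--json '{"duration":2,"seed":7}'` would yield an input with `duration` set to the number `4`, `seed` kept from the JSON object, and `image_size` passed through as a string.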
215
+
121
216
  ## Image Generation
122
217
 
123
218
  OpenAI image generation and editing default to `gpt-image-2` when `--model` is omitted.
@@ -247,6 +342,55 @@ ploof video character create --name Mossy --video character.mp4 --output json
247
342
  ploof video character get char_abc123 --output json
248
343
  ```
249
344
 
345
+ ## Audio Generation And Processing
346
+
347
+ OpenAI audio generation defaults to `gpt-4o-mini-tts`, `alloy`, and `mp3` when model, voice, and format are omitted.
348
+
349
+ ```bash
350
+ ploof audio generate \
351
+ --provider openai \
352
+ --text "A concise product narration for the demo reel." \
353
+ --model gpt-4o-mini-tts \
354
+ --voice alloy \
355
+ --format mp3 \
356
+ --out assets/narration.mp3 \
357
+ --output json
358
+ ```
359
+
360
+ Useful generation flags:
361
+
362
+ | Flag | Description |
363
+ | --- | --- |
364
+ | `--model <model>` | TTS model, for example `gpt-4o-mini-tts` |
365
+ | `--voice <voice>` | Built-in voice such as `alloy`, `coral`, `nova`, or `shimmer` |
366
+ | `--voice-id <id>` | Custom voice id |
367
+ | `--instructions <text>` | Voice/style instructions for supported models |
368
+ | `--format <format>` | `mp3`, `opus`, `aac`, `flac`, `wav`, or `pcm` |
369
+ | `--speed <number>` | Speech speed |
370
+ | `--param key=value` | Provider-specific pass-through parameter |
371
+ | `--json '{...}'` | Provider-specific JSON object |
372
+
373
+ Transcription and translation:
374
+
375
+ ```bash
376
+ ploof audio transcribe \
377
+ --audio assets/narration.mp3 \
378
+ --model gpt-4o-mini-transcribe \
379
+ --out assets/transcript.json \
380
+ --output json
381
+
382
+ ploof audio translate \
383
+ --audio assets/spanish.mp3 \
384
+ --model whisper-1 \
385
+ --format text \
386
+ --out assets/translation.txt \
387
+ --output json
388
+ ```
389
+
390
+ Transcription supports `--language`, `--prompt`, `--format`, `--temperature`, `--include`, `--timestamp-granularity`, `--chunking-strategy`, `--known-speaker-name`, and `--known-speaker-reference`. Translation supports `--prompt`, `--format`, and `--temperature`.
391
+
392
+ Ploof writes complete static assets to disk. Streaming transport settings such as OpenAI `stream=true` for transcription or `stream_format=sse` for speech are rejected because they do not produce a finished asset file directly.
393
+
250
394
  ## Batch Manifests
251
395
 
252
396
  ```yaml
@@ -294,6 +438,36 @@ tasks:
294
438
  wait: true
295
439
  download: true
296
440
  output: assets/clip.mp4
441
+
442
+ - id: narration
443
+ kind: audio.generate
444
+ provider: openai
445
+ text: "Short narration for the generated clip."
446
+ params:
447
+ model: gpt-4o-mini-tts
448
+ voice: alloy
449
+ response_format: mp3
450
+ output: assets/narration.mp3
451
+
452
+ - id: transcript
453
+ kind: audio.transcribe
454
+ provider: openai
455
+ needs: [narration]
456
+ inputs:
457
+ audio:
458
+ task: narration
459
+ params:
460
+ model: gpt-4o-mini-transcribe
461
+ output: assets/transcript.json
462
+
463
+ - id: fal-icon
464
+ kind: model.run
465
+ provider: fal
466
+ model: fal-ai/flux/dev
467
+ prompt: "Small mascot icon for a CLI tool"
468
+ params:
469
+ image_size: square_hd
470
+ output: assets/fal-icon.png
297
471
  ```
298
472
 
299
473
  Run it:
@@ -303,6 +477,8 @@ ploof run assets.yaml --parallel 4
303
477
  ploof run assets.yaml --dry-run --output json
304
478
  ```
305
479
 
480
+ In manifests, media task kinds default to `provider: openai`; `model.run` defaults to `provider: fal`.
481
+
306
482
  ## Output Formats
307
483
 
308
484
  Ploof defaults to table output in TTYs and compact output when piped.
@@ -372,7 +548,7 @@ bun run build
372
548
  npm pack --dry-run
373
549
  ```
374
550
 
375
- The default test suite includes mocked OpenAI end-to-end tests. Those tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, video job polling/downloads, sidecar metadata, and dependency-aware manifests without spending API credits.
551
+ The default test suite includes mocked OpenAI end-to-end tests and fal provider unit tests. The OpenAI tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, video job polling/downloads, audio generation/processing, sidecar metadata, and dependency-aware manifests without spending API credits. The fal tests verify endpoint payload construction, local input upload mapping, polling options, and output persistence without spending API credits.
376
552
 
377
553
  Live OpenAI tests are opt-in only:
378
554
 
@@ -380,11 +556,19 @@ Live OpenAI tests are opt-in only:
380
556
  PLOOF_OPENAI_API_KEY=sk-... bun test tests/e2e
381
557
  ```
382
558
 
559
+ Live fal.ai tests are also opt-in and use `fal-ai/flux/schnell` by default:
560
+
561
+ ```bash
562
+ PLOOF_FAL_KEY=... bun test tests/e2e/fal-live.test.ts
563
+ ```
564
+
383
565
  Optional live-test overrides:
384
566
 
385
567
  ```bash
386
568
  PLOOF_OPENAI_LIVE_MODEL=gpt-image-2
387
569
  PLOOF_OPENAI_LIVE_SIZE=1024x1024
570
+ PLOOF_FAL_LIVE_MODEL=fal-ai/flux/schnell
571
+ PLOOF_FAL_LIVE_IMAGE_SIZE_PARAM=image_size=square_hd
388
572
  ```
389
573
 
390
574
  ## Publishing
package/SPEC.md CHANGED
@@ -2,7 +2,7 @@
2
2
 
3
3
  ## Summary
4
4
 
5
- Ploof is an npm-published CLI for generating and editing assets through AI generation providers. It starts with OpenAI image and video generation/editing, but the architecture must support multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
5
+ Ploof is an npm-published CLI for generating, editing, and processing assets through AI generation providers. It supports OpenAI image, video, and audio generation/processing plus fal.ai model endpoints, while preserving an architecture for multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
6
6
 
7
7
  The product should feel like a small, sharp developer tool: easy to run manually, predictable in scripts, and optimized for AI agents.
8
8
 
@@ -80,10 +80,11 @@ Local release verification must stop at `npm pack --dry-run`; do not run local
80
80
 
81
81
  ## Initial Provider Scope
82
82
 
83
- Version 1 starts with OpenAI only.
83
+ The current provider scope includes OpenAI and fal.ai.
84
84
 
85
- Initial capabilities:
85
+ Core operation kinds:
86
86
 
87
+ - `model.run`
87
88
  - `image.generate`
88
89
  - `image.edit`
89
90
  - `image.variation`
@@ -97,12 +98,19 @@ Initial capabilities:
97
98
  - `video.delete`
98
99
  - `video.character.create`
99
100
  - `video.character.get`
101
+ - `audio.generate`
102
+ - `audio.transcribe`
103
+ - `audio.translate`
100
104
 
101
105
  Future providers should be added through the provider registry without changing the manifest model.
102
106
 
107
+ Provider notes:
108
+
109
+ - OpenAI has first-class implementations for images, videos, audio/TTS, transcription, translation, and OpenAI video library operations.
110
+ - fal.ai uses the official `@fal-ai/client`, supports arbitrary endpoints through `model.run`, and supports image/video/audio commands when the chosen fal endpoint schema matches the command shape.
111
+
103
112
  Future high-leverage provider candidates:
104
113
 
105
- - fal.ai: strong multi-model generative media coverage.
106
114
  - Replicate: broad community model marketplace.
107
115
  - Hugging Face Inference Providers: centralized access to many hosted models/providers.
108
116
 
@@ -136,8 +144,12 @@ Environment overrides:
136
144
 
137
145
  - `PLOOF_OPENAI_API_KEY`
138
146
  - `OPENAI_API_KEY`
147
+ - `PLOOF_FAL_KEY`
148
+ - `FAL_KEY`
149
+ - `PLOOF_FAL_KEY_ID` and `PLOOF_FAL_KEY_SECRET`
150
+ - `FAL_KEY_ID` and `FAL_KEY_SECRET`
139
151
 
140
- The Ploof-specific env var wins over the provider-native env var. Stored credentials are used only when no env override is present.
152
+ The Ploof-specific env var wins over the provider-native env var. Stored credentials are used only when no env override is present. Split fal.ai key id/secret pairs are joined into the token format expected by the fal client.
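
The precedence described above can be sketched as a small pure function. This is a minimal sketch, not the package's code: `resolveFalKey` is a hypothetical name, the `id:secret` join format is an assumption about the token shape the fal client expects, and the relative order of a single key versus a split pair within the same tier is also assumed.

```ts
// Hypothetical sketch of credential precedence; names are illustrative.
type Env = Record<string, string | undefined>;

function resolveFalKey(env: Env, storedKey?: string): string | undefined {
  // 1. Ploof-specific variables win over provider-native ones.
  if (env.PLOOF_FAL_KEY) return env.PLOOF_FAL_KEY;
  if (env.PLOOF_FAL_KEY_ID && env.PLOOF_FAL_KEY_SECRET) {
    // Assumption: split pairs are joined as "id:secret".
    return `${env.PLOOF_FAL_KEY_ID}:${env.PLOOF_FAL_KEY_SECRET}`;
  }
  // 2. Provider-native variables.
  if (env.FAL_KEY) return env.FAL_KEY;
  if (env.FAL_KEY_ID && env.FAL_KEY_SECRET) {
    return `${env.FAL_KEY_ID}:${env.FAL_KEY_SECRET}`;
  }
  // 3. Stored profile credentials only when no env override is present.
  return storedKey;
}
```

Passing the environment in as an argument keeps the function easy to test; a caller would typically hand it `process.env` and the key from the selected profile.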
141
153
 
142
154
  OpenAI profile metadata may also include:
143
155
 
@@ -163,9 +175,10 @@ OpenAI profile metadata may also include:
163
175
 
164
176
  ```bash
165
177
  ploof login openai --api-key <key> [--profile default] [--organization org] [--project proj] [--base-url url]
178
+ ploof login fal --api-key <key> [--profile default]
166
179
  ploof whoami [provider] [--profile default]
167
180
  ploof profiles [provider]
168
- ploof logout openai [--profile default]
181
+ ploof logout <provider> [--profile default]
169
182
  ```
170
183
 
171
184
  `login`, `whoami`, `profiles`, and `logout` are the only authentication
@@ -176,6 +189,10 @@ commands. Ploof should not expose a second equivalent auth namespace.
176
189
  when run in an interactive terminal. Non-interactive login fails if no key is
177
190
  provided.
178
191
 
192
+ `ploof login fal` accepts `--api-key`, reads `PLOOF_FAL_KEY` or `FAL_KEY`, and
193
+ also supports `PLOOF_FAL_KEY_ID`/`PLOOF_FAL_KEY_SECRET` or
194
+ `FAL_KEY_ID`/`FAL_KEY_SECRET` pairs.
195
+
179
196
  ### Config
180
197
 
181
198
  ```bash
@@ -239,6 +256,48 @@ authenticated project has DALL-E 2 variation access; if OpenAI returns a 404,
239
256
  use `ploof image edit` for image-to-image workflows. `ploof image variations`
240
257
  is an alias.
241
258
 
259
+ ### Generic Model Endpoints
260
+
261
+ `model.run` executes arbitrary provider model endpoints. It is primarily useful
262
+ for model marketplaces such as fal.ai, where the endpoint schema is selected by
263
+ `--model`.
264
+
265
+ ```bash
266
+ ploof model run \
267
+ --provider fal \
268
+ --model fal-ai/flux/dev \
269
+ --prompt "Small mascot icon for a CLI tool" \
270
+ --param image_size=square_hd \
271
+ --out assets/fal-icon.png \
272
+ --output json
273
+ ```
274
+
275
+ Named inputs preserve exact provider field names:
276
+
277
+ ```bash
278
+ ploof model run \
279
+ --provider fal \
280
+ --model <fal-endpoint-id> \
281
+ --prompt "Animate this source image" \
282
+ --input image_url=assets/source.png \
283
+ --param duration=4 \
284
+ --out assets/clip.mp4
285
+ ```
286
+
287
+ Model endpoint controls:
288
+
289
+ - `--param key=value`
290
+ - `--json '{...}'`
291
+ - `--input field=path-or-url`
292
+ - `--start-timeout <seconds>`
293
+ - `--timeout <seconds>`
294
+ - `--poll-interval <seconds>`
295
+ - `--priority low|normal`
296
+ - `--storage-expires-in <value>`
297
+
298
+ fal.ai commands should use queue polling and write complete returned assets or
299
+ text outputs to disk.
300
+
242
301
  ### Video Generation
243
302
 
244
303
  OpenAI video generation uses the asynchronous Videos API. `ploof video generate`
@@ -303,12 +362,89 @@ project is eligible for that workflow. Extensions accept a source video id or
303
362
  upload, plus a prompt and `--seconds`. `video remix` is supported for the SDK's
304
363
  legacy remix endpoint, but new integrations should prefer `video edit`.
305
364
 
365
+ ### Audio Generation And Processing
366
+
367
+ OpenAI audio generation uses the speech API and defaults to
368
+ `gpt-4o-mini-tts`, `alloy`, and `mp3` when model, voice, and output format are
369
+ omitted.
370
+
371
+ ```bash
372
+ ploof audio generate \
373
+ --provider openai \
374
+ --text "Short narration for the generated asset." \
375
+ --model gpt-4o-mini-tts \
376
+ --voice alloy \
377
+ --format mp3 \
378
+ --out assets/narration.mp3 \
379
+ --output json
380
+ ```
381
+
382
+ First-class OpenAI audio generation flags:
383
+
384
+ - `--model <model>`
385
+ - `--voice <voice>`
386
+ - `--voice-id <id>`
387
+ - `--instructions <text>`
388
+ - `--format <format>` / `--response-format <format>`
389
+ - `--speed <number>`
390
+ - `--param key=value`
391
+ - `--json '{...}'`
392
+
393
+ Audio processing supports transcription and English translation:
394
+
395
+ ```bash
396
+ ploof audio transcribe \
397
+ --audio assets/narration.mp3 \
398
+ --model gpt-4o-mini-transcribe \
399
+ --out assets/transcript.json \
400
+ --output json
401
+
402
+ ploof audio translate \
403
+ --audio assets/spanish.mp3 \
404
+ --model whisper-1 \
405
+ --format text \
406
+ --out assets/translation.txt \
407
+ --output json
408
+ ```
409
+
410
+ Transcription first-class flags:
411
+
412
+ - `--model <model>`
413
+ - `--language <code>`
414
+ - `--prompt <prompt>`
415
+ - `--format <format>` / `--response-format <format>`
416
+ - `--temperature <number>`
417
+ - `--include <value>`
418
+ - `--timestamp-granularity word|segment`
419
+ - `--chunking-strategy auto|{...}`
420
+ - `--known-speaker-name <name>`
421
+ - `--known-speaker-reference <data-url>`
422
+ - `--param key=value`
423
+ - `--json '{...}'`
424
+
425
+ Translation first-class flags:
426
+
427
+ - `--model <model>`
428
+ - `--prompt <prompt>`
429
+ - `--format <format>` / `--response-format <format>`
430
+ - `--temperature <number>`
431
+ - `--param key=value`
432
+ - `--json '{...}'`
433
+
434
+ Ploof is a static asset generation CLI. Audio commands request complete outputs
435
+ and write them to disk. Streaming transport settings such as OpenAI
436
+ `stream=true` for transcription or `stream_format=sse` for speech are rejected
437
+ because they do not directly produce finished asset files.
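
One way such a rejection could look is a validation step run before the request is sent. The sketch below is illustrative only: `streamingViolation` is a hypothetical name, the error wording is invented, and treating the string `"true"` like the boolean is an assumption about how pass-through `--param` values arrive.

```ts
// Hypothetical validator; returns an error message or undefined when OK.
function streamingViolation(
  params: Record<string, unknown>
): string | undefined {
  // Assumption: --param values may arrive as strings, so accept both forms.
  if (params.stream === true || params.stream === "true") {
    return "stream=true is not supported: Ploof writes complete files to disk";
  }
  if (params.stream_format === "sse") {
    return "stream_format=sse is not supported: Ploof writes complete files to disk";
  }
  return undefined;
}
```

A command handler could call this on the merged parameters and exit with the returned message instead of contacting the provider.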
438
+
306
439
  ### Batch Run
307
440
 
308
441
  ```bash
309
442
  ploof run assets.yaml --parallel 4
310
443
  ```
311
444
 
445
+ Manifest media task kinds default to `provider: openai`; `model.run` defaults
446
+ to `provider: fal`.
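
The defaulting rule can be sketched in a few lines. `resolveTaskProvider` and the `ManifestTask` shape are hypothetical, and rejecting unrecognized kinds is an assumption; the spec only states the two defaults.

```ts
// Hypothetical manifest task shape, reduced to the fields used here.
type ManifestTask = { id: string; kind: string; provider?: string };

function resolveTaskProvider(task: ManifestTask): string {
  // An explicit provider always wins.
  if (task.provider) return task.provider;
  if (task.kind === "model.run") return "fal";
  // Media kinds: image.*, video.*, audio.*
  if (/^(image|video|audio)\./.test(task.kind)) return "openai";
  // Assumption: unknown kinds are an error rather than silently defaulted.
  throw new Error(`Task "${task.id}" has unknown kind "${task.kind}"`);
}
```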
447
+
312
448
  Manifest example:
313
449
 
314
450
  ```yaml
@@ -356,6 +492,36 @@ tasks:
356
492
  wait: true
357
493
  download: true
358
494
  output: assets/clip.mp4
495
+
496
+ - id: narration
497
+ kind: audio.generate
498
+ provider: openai
499
+ text: "Short narration for the generated clip."
500
+ params:
501
+ model: gpt-4o-mini-tts
502
+ voice: alloy
503
+ response_format: mp3
504
+ output: assets/narration.mp3
505
+
506
+ - id: transcript
507
+ kind: audio.transcribe
508
+ provider: openai
509
+ needs: [narration]
510
+ inputs:
511
+ audio:
512
+ task: narration
513
+ params:
514
+ model: gpt-4o-mini-transcribe
515
+ output: assets/transcript.json
516
+
517
+ - id: fal-icon
518
+ kind: model.run
519
+ provider: fal
520
+ model: fal-ai/flux/dev
521
+ prompt: "Small mascot icon for a CLI tool"
522
+ params:
523
+ image_size: square_hd
524
+ output: assets/fal-icon.png
359
525
  ```
360
526
 
361
527
  ## Asset Input Model
@@ -364,13 +530,18 @@ All input/context assets are normalized before provider execution:
364
530
 
365
531
  ```ts
366
532
  type AssetInput = {
367
- role: 'image' | 'mask' | 'reference' | 'style' | 'audio' | 'video'
533
+ role: 'image' | 'mask' | 'reference' | 'style' | 'audio' | 'video' | string
368
534
  source: string
369
535
  mime?: string
370
536
  name?: string
371
537
  }
372
538
  ```
373
539
 
540
+ Manifest `inputs` are a role map. Built-in aliases such as `images`,
541
+ `inputReference`, and `videos` normalize to `image`, `reference`, and `video`,
542
+ but providers can also consume custom roles like `style`, `control`, `pose`, or
543
+ `initImage` without changing the manifest schema.
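
A minimal sketch of this normalization follows, assuming manifest `inputs` values are either a single source string or a list of them. Task references such as `task: narration` are left out for brevity, the alias table contains only the three aliases named above, and `normalizeInputs` is a hypothetical name.

```ts
type AssetInput = { role: string; source: string; mime?: string; name?: string };

// Only the aliases named in the spec; the real table may be longer.
const ROLE_ALIASES = new Map<string, string>([
  ["images", "image"],
  ["inputReference", "reference"],
  ["videos", "video"],
]);

function normalizeInputs(
  inputs: Record<string, string | string[]>
): AssetInput[] {
  const normalized: AssetInput[] = [];
  for (const [key, value] of Object.entries(inputs)) {
    // Custom roles such as "pose" or "control" pass through unchanged.
    const role = ROLE_ALIASES.get(key) ?? key;
    const sources = Array.isArray(value) ? value : [value];
    for (const source of sources) normalized.push({ role, source });
  }
  return normalized;
}
```

With this sketch, `{ images: ["a.png", "b.png"], pose: "pose.png" }` becomes two `image` inputs followed by one `pose` input.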
544
+
374
545
  Supported sources:
375
546
 
376
547
  - Local paths.
@@ -388,6 +559,22 @@ OpenAI video generation/editing maps:
388
559
  - `role=reference` to `input_reference` for image-guided video generation.
389
560
  - `role=video` to source video uploads for eligible edit/extension workflows.
390
561
 
562
+ OpenAI audio processing maps:
563
+
564
+ - `role=audio` to the uploaded audio file for transcription or translation.
565
+
566
+ fal.ai media commands map common roles to URL fields:
567
+
568
+ - `role=image` and `role=reference` to `image_url`.
569
+ - `role=mask` to `mask_url`.
570
+ - `role=style` to `style_image_url`.
571
+ - `role=audio` to `audio_url`.
572
+ - `role=video` to `video_url`.
573
+
574
+ fal.ai `model.run` preserves exact input field names, so
575
+ `inputs.image_url` or `--input image_url=source.png` becomes `image_url` in the
576
+ provider input payload.
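
The two mapping modes can be sketched side by side. The table mirrors the list above; `falInputField` is a hypothetical name, and throwing on an unmapped role in media commands is an assumption, since the spec does not say what happens in that case.

```ts
// Role-to-field table for fal media commands, taken from the list above.
const FAL_ROLE_FIELDS = new Map<string, string>([
  ["image", "image_url"],
  ["reference", "image_url"],
  ["mask", "mask_url"],
  ["style", "style_image_url"],
  ["audio", "audio_url"],
  ["video", "video_url"],
]);

function falInputField(kind: string, role: string): string {
  // model.run preserves the exact field name the user supplied.
  if (kind === "model.run") return role;
  const field = FAL_ROLE_FIELDS.get(role);
  // Assumption: media commands reject roles without a known mapping.
  if (!field) throw new Error(`No fal field mapping for role "${role}"`);
  return field;
}
```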
577
+
391
578
  Future providers can map roles such as `reference`, `style`, `init-image`, `audio`, or `video` differently.
392
579
 
393
580
  ## Provider Architecture
@@ -397,31 +584,37 @@ Provider modules implement a common interface:
397
584
  ```ts
398
585
  type Provider = {
399
586
  id: string
587
+ displayName?: string
400
588
  capabilities: ProviderCapability[]
401
- runImageGenerate(job, context): Promise<ProviderResult>
402
- runImageEdit(job, context): Promise<ProviderResult>
403
- runImageVariation(job, context): Promise<ProviderResult>
404
- runVideoGenerate(job, context): Promise<ProviderResult>
405
- runVideoEdit(job, context): Promise<ProviderResult>
406
- runVideoExtend(job, context): Promise<ProviderResult>
407
- runVideoRemix(job, context): Promise<ProviderResult>
408
- runVideoStatus(job, context): Promise<ProviderResult>
409
- runVideoDownload(job, context): Promise<ProviderResult>
410
- runVideoList(job, context): Promise<ProviderResult>
411
- runVideoDelete(job, context): Promise<ProviderResult>
412
- runVideoCharacterCreate(job, context): Promise<ProviderResult>
413
- runVideoCharacterGet(job, context): Promise<ProviderResult>
589
+ auth?: {
590
+ apiKeyEnvVars: string[]
591
+ apiKeyEnvPairs?: Array<{ idEnvVar: string; secretEnvVar: string }>
592
+ organizationEnvVar?: string
593
+ projectEnvVar?: string
594
+ baseURLEnvVar?: string
595
+ }
596
+ run(job, context): Promise<ProviderResult>
414
597
  }
415
598
  ```
416
599
 
417
600
  The provider registry owns:
418
601
 
419
602
  - Provider lookup.
420
- - Capability checks.
421
- - Credential resolution.
603
+ - Auth metadata lookup.
604
+ - Capability discovery.
605
+
606
+ Provider modules own:
607
+
422
608
  - Provider-specific validation.
609
+ - Provider SDK/client mapping.
610
+ - Dispatch from generic `AssetJob` objects to internal operation handlers.
611
+ - Output persistence details when the provider returns URLs, binary responses, or
612
+ structured data.
423
613
 
424
614
  The CLI should avoid hardcoding all provider behavior into command handlers.
615
+ Manifest execution should build generic `AssetJob` objects and call
616
+ `provider.run(job, context)` rather than calling modality-specific provider
617
+ methods directly.
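
To make the single-entry-point design concrete, here is a small self-contained sketch of a registry plus a provider whose `run` dispatches to internal handlers. Everything in it is hypothetical: the `AssetJob`, `RunContext`, and `ProviderResult` shapes are simplified stand-ins, capabilities are modeled as plain operation-kind strings, and `createToyProvider` does no real provider work.

```ts
// Simplified, hypothetical shapes; the real types carry more fields.
type AssetJob = { kind: string; provider: string; params: Record<string, unknown>; output?: string };
type RunContext = { profile: string };
type ProviderResult = { kind: string; files: string[] };

type Provider = {
  id: string;
  capabilities: string[]; // assumption: operation kinds as strings
  run(job: AssetJob, context: RunContext): Promise<ProviderResult>;
};

// A toy provider: run() dispatches from the generic job to internal handlers.
function createToyProvider(id: string, kinds: string[]): Provider {
  const handlers = new Map<string, (job: AssetJob) => ProviderResult>();
  for (const kind of kinds) {
    handlers.set(kind, (job) => ({ kind, files: job.output ? [job.output] : [] }));
  }
  return {
    id,
    capabilities: kinds,
    async run(job) {
      const handler = handlers.get(job.kind);
      if (!handler) throw new Error(`${id} does not support ${job.kind}`);
      return handler(job);
    },
  };
}

const providerRegistry = new Map<string, Provider>([
  ["openai", createToyProvider("openai", ["image.generate", "audio.generate"])],
  ["fal", createToyProvider("fal", ["model.run", "image.generate"])],
]);

// Manifest execution only needs the registry and the generic run() call.
function executeJob(job: AssetJob, context: RunContext): Promise<ProviderResult> {
  const provider = providerRegistry.get(job.provider);
  if (!provider) return Promise.reject(new Error(`Unknown provider: ${job.provider}`));
  return provider.run(job, context);
}
```

The point of the shape is that adding an operation kind touches only the provider's internal handler table, while the manifest runner and the `Provider` type stay unchanged.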
425
618
 
426
619
  ## Settings Strategy
427
620
 
@@ -461,6 +654,11 @@ Asset-producing commands should write the asset to disk and print structured met
461
654
  }
462
655
  ```
463
656
 
657
+ Ploof is a static asset generation tool. Providers may use asynchronous jobs,
658
+ polling, or queue subscriptions internally, but CLI consumers receive completed
659
+ files or text outputs after the command finishes. Streaming transports should
660
+ not be exposed as the primary consumption model.
661
+
464
662
  Each generated file should have an optional sidecar metadata file:
465
663
 
466
664
  ```text