@miketromba/ploof 0.2.0 → 0.4.0
- package/README.md +188 -4
- package/SPEC.md +220 -22
- package/dist/ploof.js +223 -216
- package/package.json +5 -2
- package/skills/asset-generation/SKILL.md +1 -1
package/README.md
CHANGED
|
@@ -9,7 +9,7 @@
|
|
|
9
9
|
<img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen" alt="node version" />
|
|
10
10
|
</p>
|
|
11
11
|
|
|
12
|
-
Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image
|
|
12
|
+
Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image, video, and audio generation/processing, plus fal.ai's model marketplace through the official fal client. The provider registry is designed for broader model marketplaces over time.
|
|
13
13
|
|
|
14
14
|
It is built for both developers and AI agents: predictable commands, parseable output, local auth profiles, YAML manifests, parallel execution, and a companion skill.
|
|
15
15
|
|
|
@@ -24,13 +24,18 @@ It is built for both developers and AI agents: predictable commands, parseable o
|
|
|
24
24
|
| OpenAI video generation | Supported |
|
|
25
25
|
| OpenAI video editing/extensions | Supported |
|
|
26
26
|
| OpenAI video downloads/library/characters | Supported |
|
|
27
|
+
| OpenAI audio generation / TTS | Supported |
|
|
28
|
+
| OpenAI audio transcription | Supported |
|
|
29
|
+
| OpenAI audio translation | Supported |
|
|
30
|
+
| fal.ai auth profiles | Supported |
|
|
31
|
+
| fal.ai model endpoints | Supported through `ploof model run` |
|
|
32
|
+
| fal.ai image/video/audio endpoints | Supported through `--provider fal --model <endpoint-id>` |
|
|
27
33
|
| Context images and masks | Supported |
|
|
28
|
-
|
|
|
34
|
+
| Image, video, and audio input assets | Supported |
|
|
29
35
|
| YAML/JSON batch manifests | Supported |
|
|
30
36
|
| Dependency-aware parallel runs | Supported |
|
|
31
37
|
| Agent instructions via `ploof learn` | Supported |
|
|
32
38
|
| Additional providers | Planned |
|
|
33
|
-
| Audio generation | Planned |
|
|
34
39
|
|
|
35
40
|
## Install
|
|
36
41
|
|
|
@@ -58,6 +63,7 @@ npx @miketromba/ploof --help
|
|
|
58
63
|
```bash
|
|
59
64
|
# Authenticate
|
|
60
65
|
ploof login openai --api-key <your-api-key>
|
|
66
|
+
ploof login fal --api-key <your-fal-key>
|
|
61
67
|
|
|
62
68
|
# Generate an image
|
|
63
69
|
ploof image generate \
|
|
@@ -80,8 +86,26 @@ ploof video generate \
|
|
|
80
86
|
--seconds 4 \
|
|
81
87
|
--out assets/clip.mp4
|
|
82
88
|
|
|
89
|
+
# Generate and transcribe speech
|
|
90
|
+
ploof audio generate \
|
|
91
|
+
--text "Ploof can generate speech and process audio." \
|
|
92
|
+
--voice alloy \
|
|
93
|
+
--out assets/speech.mp3
|
|
94
|
+
|
|
95
|
+
ploof audio transcribe \
|
|
96
|
+
--audio assets/speech.mp3 \
|
|
97
|
+
--out assets/transcript.json
|
|
98
|
+
|
|
83
99
|
# Run a manifest
|
|
84
100
|
ploof run assets.yaml --parallel 4
|
|
101
|
+
|
|
102
|
+
# Run any fal.ai endpoint directly
|
|
103
|
+
ploof model run \
|
|
104
|
+
--provider fal \
|
|
105
|
+
--model fal-ai/flux/dev \
|
|
106
|
+
--prompt "Friendly CLI mascot icon, simple shape, transparent background" \
|
|
107
|
+
--param image_size=square_hd \
|
|
108
|
+
--out assets/icon.png
|
|
85
109
|
```
|
|
86
110
|
|
|
87
111
|
## Authentication
|
|
@@ -94,6 +118,8 @@ ploof login openai --api-key <your-api-key> --profile work
|
|
|
94
118
|
ploof whoami openai
|
|
95
119
|
ploof profiles openai
|
|
96
120
|
ploof logout openai --profile work
|
|
121
|
+
ploof login fal --api-key <your-fal-key>
|
|
122
|
+
ploof whoami fal
|
|
97
123
|
```
|
|
98
124
|
|
|
99
125
|
If `--api-key` is omitted, `ploof login openai` reads
|
|
@@ -106,6 +132,20 @@ Environment variables override stored credentials:
|
|
|
106
132
|
export PLOOF_OPENAI_API_KEY=sk-...
|
|
107
133
|
# or
|
|
108
134
|
export OPENAI_API_KEY=sk-...
|
|
135
|
+
|
|
136
|
+
export PLOOF_FAL_KEY=...
|
|
137
|
+
# or
|
|
138
|
+
export FAL_KEY=...
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
fal.ai split key environment variables are also supported:
|
|
142
|
+
|
|
143
|
+
```bash
|
|
144
|
+
export PLOOF_FAL_KEY_ID=...
|
|
145
|
+
export PLOOF_FAL_KEY_SECRET=...
|
|
146
|
+
# or
|
|
147
|
+
export FAL_KEY_ID=...
|
|
148
|
+
export FAL_KEY_SECRET=...
|
|
109
149
|
```
|
|
110
150
|
|
|
111
151
|
OpenAI profile metadata:
|
|
@@ -118,6 +158,61 @@ ploof login openai \
|
|
|
118
158
|
--base-url <url>
|
|
119
159
|
```
|
|
120
160
|
|
|
161
|
+
## fal.ai Model Endpoints
|
|
162
|
+
|
|
163
|
+
fal.ai support uses the official `@fal-ai/client`. Ploof uploads local asset inputs through fal storage, submits work through the fal queue in polling mode, waits for a complete response, and writes returned assets or text to disk.
|
|
164
|
+
|
|
165
|
+
Use `ploof model run` for arbitrary fal endpoints:
|
|
166
|
+
|
|
167
|
+
```bash
|
|
168
|
+
ploof model run \
|
|
169
|
+
--provider fal \
|
|
170
|
+
--model fal-ai/flux/dev \
|
|
171
|
+
--prompt "Tiny app icon for a cheerful asset generation CLI" \
|
|
172
|
+
--param image_size=square_hd \
|
|
173
|
+
--out assets/fal-icon.png \
|
|
174
|
+
--output json
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
Named asset inputs map directly to provider input fields:
|
|
178
|
+
|
|
179
|
+
```bash
|
|
180
|
+
ploof model run \
|
|
181
|
+
--provider fal \
|
|
182
|
+
--model <fal-endpoint-id> \
|
|
183
|
+
--prompt "Animate this image into a short loop" \
|
|
184
|
+
--input image_url=assets/source.png \
|
|
185
|
+
--param duration=4 \
|
|
186
|
+
--out assets/loop.mp4
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
The media commands also work with fal when you provide the fal endpoint id as `--model`:
|
|
190
|
+
|
|
191
|
+
```bash
|
|
192
|
+
ploof image generate \
|
|
193
|
+
--provider fal \
|
|
194
|
+
--model fal-ai/flux/dev \
|
|
195
|
+
--prompt "Soft clay mascot icon" \
|
|
196
|
+
--param image_size=square_hd \
|
|
197
|
+
--out assets/mascot.png
|
|
198
|
+
|
|
199
|
+
ploof video generate \
|
|
200
|
+
--provider fal \
|
|
201
|
+
--model <fal-video-endpoint-id> \
|
|
202
|
+
--prompt "Slow camera push through a miniature paper city" \
|
|
203
|
+
--input-reference assets/reference.png \
|
|
204
|
+
--param duration=4 \
|
|
205
|
+
--out assets/fal-video.mp4
|
|
206
|
+
|
|
207
|
+
ploof audio generate \
|
|
208
|
+
--provider fal \
|
|
209
|
+
--model <fal-audio-endpoint-id> \
|
|
210
|
+
--text "A short spoken line." \
|
|
211
|
+
--out assets/fal-audio.mp3
|
|
212
|
+
```
|
|
213
|
+
|
|
214
|
+
Use `--param key=value` or `--json '{...}'` for endpoint-specific settings. Queue controls include `--start-timeout`, `--timeout`, `--poll-interval`, `--priority low|normal`, and `--storage-expires-in`.
|
|
215
|
+
|
|
121
216
|
## Image Generation
|
|
122
217
|
|
|
123
218
|
OpenAI image generation and editing default to `gpt-image-2` when `--model` is omitted.
|
|
@@ -247,6 +342,55 @@ ploof video character create --name Mossy --video character.mp4 --output json
|
|
|
247
342
|
ploof video character get char_abc123 --output json
|
|
248
343
|
```
|
|
249
344
|
|
|
345
|
+
## Audio Generation And Processing
|
|
346
|
+
|
|
347
|
+
OpenAI audio generation defaults to `gpt-4o-mini-tts`, `alloy`, and `mp3` when model, voice, and format are omitted.
|
|
348
|
+
|
|
349
|
+
```bash
|
|
350
|
+
ploof audio generate \
|
|
351
|
+
--provider openai \
|
|
352
|
+
--text "A concise product narration for the demo reel." \
|
|
353
|
+
--model gpt-4o-mini-tts \
|
|
354
|
+
--voice alloy \
|
|
355
|
+
--format mp3 \
|
|
356
|
+
--out assets/narration.mp3 \
|
|
357
|
+
--output json
|
|
358
|
+
```
|
|
359
|
+
|
|
360
|
+
Useful generation flags:
|
|
361
|
+
|
|
362
|
+
| Flag | Description |
|
|
363
|
+
| --- | --- |
|
|
364
|
+
| `--model <model>` | TTS model, for example `gpt-4o-mini-tts` |
|
|
365
|
+
| `--voice <voice>` | Built-in voice such as `alloy`, `coral`, `nova`, or `shimmer` |
|
|
366
|
+
| `--voice-id <id>` | Custom voice id |
|
|
367
|
+
| `--instructions <text>` | Voice/style instructions for supported models |
|
|
368
|
+
| `--format <format>` | `mp3`, `opus`, `aac`, `flac`, `wav`, or `pcm` |
|
|
369
|
+
| `--speed <number>` | Speech speed |
|
|
370
|
+
| `--param key=value` | Provider-specific pass-through parameter |
|
|
371
|
+
| `--json '{...}'` | Provider-specific JSON object |
|
|
372
|
+
|
|
373
|
+
Transcription and translation:
|
|
374
|
+
|
|
375
|
+
```bash
|
|
376
|
+
ploof audio transcribe \
|
|
377
|
+
--audio assets/narration.mp3 \
|
|
378
|
+
--model gpt-4o-mini-transcribe \
|
|
379
|
+
--out assets/transcript.json \
|
|
380
|
+
--output json
|
|
381
|
+
|
|
382
|
+
ploof audio translate \
|
|
383
|
+
--audio assets/spanish.mp3 \
|
|
384
|
+
--model whisper-1 \
|
|
385
|
+
--format text \
|
|
386
|
+
--out assets/translation.txt \
|
|
387
|
+
--output json
|
|
388
|
+
```
|
|
389
|
+
|
|
390
|
+
Transcription supports `--language`, `--prompt`, `--format`, `--temperature`, `--include`, `--timestamp-granularity`, `--chunking-strategy`, `--known-speaker-name`, and `--known-speaker-reference`. Translation supports `--prompt`, `--format`, and `--temperature`.
|
|
391
|
+
|
|
392
|
+
Ploof writes complete static assets to disk. Streaming transport settings such as OpenAI `stream=true` for transcription or `stream_format=sse` for speech are rejected because they do not produce a finished asset file directly.
|
|
393
|
+
|
|
250
394
|
## Batch Manifests
|
|
251
395
|
|
|
252
396
|
```yaml
|
|
@@ -294,6 +438,36 @@ tasks:
|
|
|
294
438
|
wait: true
|
|
295
439
|
download: true
|
|
296
440
|
output: assets/clip.mp4
|
|
441
|
+
|
|
442
|
+
- id: narration
|
|
443
|
+
kind: audio.generate
|
|
444
|
+
provider: openai
|
|
445
|
+
text: "Short narration for the generated clip."
|
|
446
|
+
params:
|
|
447
|
+
model: gpt-4o-mini-tts
|
|
448
|
+
voice: alloy
|
|
449
|
+
response_format: mp3
|
|
450
|
+
output: assets/narration.mp3
|
|
451
|
+
|
|
452
|
+
- id: transcript
|
|
453
|
+
kind: audio.transcribe
|
|
454
|
+
provider: openai
|
|
455
|
+
needs: [narration]
|
|
456
|
+
inputs:
|
|
457
|
+
audio:
|
|
458
|
+
task: narration
|
|
459
|
+
params:
|
|
460
|
+
model: gpt-4o-mini-transcribe
|
|
461
|
+
output: assets/transcript.json
|
|
462
|
+
|
|
463
|
+
- id: fal-icon
|
|
464
|
+
kind: model.run
|
|
465
|
+
provider: fal
|
|
466
|
+
model: fal-ai/flux/dev
|
|
467
|
+
prompt: "Small mascot icon for a CLI tool"
|
|
468
|
+
params:
|
|
469
|
+
image_size: square_hd
|
|
470
|
+
output: assets/fal-icon.png
|
|
297
471
|
```
|
|
298
472
|
|
|
299
473
|
Run it:
|
|
@@ -303,6 +477,8 @@ ploof run assets.yaml --parallel 4
|
|
|
303
477
|
ploof run assets.yaml --dry-run --output json
|
|
304
478
|
```
|
|
305
479
|
|
|
480
|
+
In manifests, media task kinds default to `provider: openai`; `model.run` defaults to `provider: fal`.
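
The default-provider rule above can be sketched as a one-line lookup. This is only an illustration: `defaultProvider` is a hypothetical name, not a function taken from the package.

```ts
// Hypothetical helper: choose a provider when a manifest task omits `provider`.
function defaultProvider(kind: string): 'openai' | 'fal' {
  // `model.run` defaults to fal; every media task kind defaults to openai.
  return kind === 'model.run' ? 'fal' : 'openai'
}
```

A task that sets `provider` explicitly would bypass this lookup.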
|
|
481
|
+
|
|
306
482
|
## Output Formats
|
|
307
483
|
|
|
308
484
|
Ploof defaults to table output in TTYs and compact output when piped.
|
|
@@ -372,7 +548,7 @@ bun run build
|
|
|
372
548
|
npm pack --dry-run
|
|
373
549
|
```
|
|
374
550
|
|
|
375
|
-
The default test suite includes mocked OpenAI end-to-end tests.
|
|
551
|
+
The default test suite includes mocked OpenAI end-to-end tests and fal provider unit tests. The OpenAI tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, video job polling/downloads, audio generation/processing, sidecar metadata, and dependency-aware manifests without spending API credits. The fal tests verify endpoint payload construction, local input upload mapping, polling options, and output persistence without spending API credits.
|
|
376
552
|
|
|
377
553
|
Live OpenAI tests are opt-in only:
|
|
378
554
|
|
|
@@ -380,11 +556,19 @@ Live OpenAI tests are opt-in only:
|
|
|
380
556
|
PLOOF_OPENAI_API_KEY=sk-... bun test tests/e2e
|
|
381
557
|
```
|
|
382
558
|
|
|
559
|
+
Live fal.ai tests are also opt-in and use `fal-ai/flux/schnell` by default:
|
|
560
|
+
|
|
561
|
+
```bash
|
|
562
|
+
PLOOF_FAL_KEY=... bun test tests/e2e/fal-live.test.ts
|
|
563
|
+
```
|
|
564
|
+
|
|
383
565
|
Optional live-test overrides:
|
|
384
566
|
|
|
385
567
|
```bash
|
|
386
568
|
PLOOF_OPENAI_LIVE_MODEL=gpt-image-2
|
|
387
569
|
PLOOF_OPENAI_LIVE_SIZE=1024x1024
|
|
570
|
+
PLOOF_FAL_LIVE_MODEL=fal-ai/flux/schnell
|
|
571
|
+
PLOOF_FAL_LIVE_IMAGE_SIZE_PARAM=image_size=square_hd
|
|
388
572
|
```
|
|
389
573
|
|
|
390
574
|
## Publishing
|
package/SPEC.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
|
|
3
3
|
## Summary
|
|
4
4
|
|
|
5
|
-
Ploof is an npm-published CLI for generating and
|
|
5
|
+
Ploof is an npm-published CLI for generating, editing, and processing assets through AI generation providers. It supports OpenAI image, video, and audio generation/processing plus fal.ai model endpoints, while preserving an architecture for multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
|
|
6
6
|
|
|
7
7
|
The product should feel like a small, sharp developer tool: easy to run manually, predictable in scripts, and optimized for AI agents.
|
|
8
8
|
|
|
@@ -80,10 +80,11 @@ Local release verification must stop at `npm pack --dry-run`; do not run local
|
|
|
80
80
|
|
|
81
81
|
## Initial Provider Scope
|
|
82
82
|
|
|
83
|
-
|
|
83
|
+
The current provider scope includes OpenAI and fal.ai.
|
|
84
84
|
|
|
85
|
-
|
|
85
|
+
Core operation kinds:
|
|
86
86
|
|
|
87
|
+
- `model.run`
|
|
87
88
|
- `image.generate`
|
|
88
89
|
- `image.edit`
|
|
89
90
|
- `image.variation`
|
|
@@ -97,12 +98,19 @@ Initial capabilities:
|
|
|
97
98
|
- `video.delete`
|
|
98
99
|
- `video.character.create`
|
|
99
100
|
- `video.character.get`
|
|
101
|
+
- `audio.generate`
|
|
102
|
+
- `audio.transcribe`
|
|
103
|
+
- `audio.translate`
|
|
100
104
|
|
|
101
105
|
Future providers should be added through the provider registry without changing the manifest model.
|
|
102
106
|
|
|
107
|
+
Provider notes:
|
|
108
|
+
|
|
109
|
+
- OpenAI has first-class implementations for images, videos, audio/TTS, transcription, translation, and OpenAI video library operations.
|
|
110
|
+
- fal.ai uses the official `@fal-ai/client`, supports arbitrary endpoints through `model.run`, and supports image/video/audio commands when the chosen fal endpoint schema matches the command shape.
|
|
111
|
+
|
|
103
112
|
Future high-leverage provider candidates:
|
|
104
113
|
|
|
105
|
-
- fal.ai: strong multi-model generative media coverage.
|
|
106
114
|
- Replicate: broad community model marketplace.
|
|
107
115
|
- Hugging Face Inference Providers: centralized access to many hosted models/providers.
|
|
108
116
|
|
|
@@ -136,8 +144,12 @@ Environment overrides:
|
|
|
136
144
|
|
|
137
145
|
- `PLOOF_OPENAI_API_KEY`
|
|
138
146
|
- `OPENAI_API_KEY`
|
|
147
|
+
- `PLOOF_FAL_KEY`
|
|
148
|
+
- `FAL_KEY`
|
|
149
|
+
- `PLOOF_FAL_KEY_ID` and `PLOOF_FAL_KEY_SECRET`
|
|
150
|
+
- `FAL_KEY_ID` and `FAL_KEY_SECRET`
|
|
139
151
|
|
|
140
|
-
The Ploof-specific env var wins over the provider-native env var. Stored credentials are used only when no env override is present.
|
|
152
|
+
The Ploof-specific env var wins over the provider-native env var. Stored credentials are used only when no env override is present. Split fal.ai key id/secret pairs are joined into the token format expected by the fal client.
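
A minimal sketch of the precedence described above, under stated assumptions: `resolveFalKey` is a hypothetical name, the split pair is assumed to be joined as `<id>:<secret>`, and the order between a single key and a split pair within the same prefix is a guess rather than documented behavior.

```ts
type Env = Record<string, string | undefined>

// Hypothetical helper: resolve a fal.ai credential from env vars, then storage.
function resolveFalKey(env: Env, stored?: string): string | undefined {
  // Ploof-specific names ('PLOOF_' prefix) are checked before provider-native names.
  for (const prefix of ['PLOOF_', '']) {
    const single = env[`${prefix}FAL_KEY`]
    if (single) return single
    const id = env[`${prefix}FAL_KEY_ID`]
    const secret = env[`${prefix}FAL_KEY_SECRET`]
    // Assumption: the fal client expects a split pair joined as "<id>:<secret>".
    if (id && secret) return `${id}:${secret}`
  }
  // Stored profile credentials are used only when no env override is present.
  return stored
}
```

Passing the environment as an argument keeps the function easy to test; a caller would typically hand it the process environment.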
|
|
141
153
|
|
|
142
154
|
OpenAI profile metadata may also include:
|
|
143
155
|
|
|
@@ -163,9 +175,10 @@ OpenAI profile metadata may also include:
|
|
|
163
175
|
|
|
164
176
|
```bash
|
|
165
177
|
ploof login openai --api-key <key> [--profile default] [--organization org] [--project proj] [--base-url url]
|
|
178
|
+
ploof login fal --api-key <key> [--profile default]
|
|
166
179
|
ploof whoami [provider] [--profile default]
|
|
167
180
|
ploof profiles [provider]
|
|
168
|
-
ploof logout
|
|
181
|
+
ploof logout <provider> [--profile default]
|
|
169
182
|
```
|
|
170
183
|
|
|
171
184
|
`login`, `whoami`, `profiles`, and `logout` are the only authentication
|
|
@@ -176,6 +189,10 @@ commands. Ploof should not expose a second equivalent auth namespace.
|
|
|
176
189
|
when run in an interactive terminal. Non-interactive login fails if no key is
|
|
177
190
|
provided.
|
|
178
191
|
|
|
192
|
+
`ploof login fal` accepts `--api-key`, reads `PLOOF_FAL_KEY` or `FAL_KEY`, and
|
|
193
|
+
also supports `PLOOF_FAL_KEY_ID`/`PLOOF_FAL_KEY_SECRET` or
|
|
194
|
+
`FAL_KEY_ID`/`FAL_KEY_SECRET` pairs.
|
|
195
|
+
|
|
179
196
|
### Config
|
|
180
197
|
|
|
181
198
|
```bash
|
|
@@ -239,6 +256,48 @@ authenticated project has DALL-E 2 variation access; if OpenAI returns a 404,
|
|
|
239
256
|
use `ploof image edit` for image-to-image workflows. `ploof image variations`
|
|
240
257
|
is an alias.
|
|
241
258
|
|
|
259
|
+
### Generic Model Endpoints
|
|
260
|
+
|
|
261
|
+
`model.run` executes arbitrary provider model endpoints. It is primarily useful
|
|
262
|
+
for model marketplaces such as fal.ai, where the endpoint schema is selected by
|
|
263
|
+
`--model`.
|
|
264
|
+
|
|
265
|
+
```bash
|
|
266
|
+
ploof model run \
|
|
267
|
+
--provider fal \
|
|
268
|
+
--model fal-ai/flux/dev \
|
|
269
|
+
--prompt "Small mascot icon for a CLI tool" \
|
|
270
|
+
--param image_size=square_hd \
|
|
271
|
+
--out assets/fal-icon.png \
|
|
272
|
+
--output json
|
|
273
|
+
```
|
|
274
|
+
|
|
275
|
+
Named inputs preserve exact provider field names:
|
|
276
|
+
|
|
277
|
+
```bash
|
|
278
|
+
ploof model run \
|
|
279
|
+
--provider fal \
|
|
280
|
+
--model <fal-endpoint-id> \
|
|
281
|
+
--prompt "Animate this source image" \
|
|
282
|
+
--input image_url=assets/source.png \
|
|
283
|
+
--param duration=4 \
|
|
284
|
+
--out assets/clip.mp4
|
|
285
|
+
```
|
|
286
|
+
|
|
287
|
+
Model endpoint controls:
|
|
288
|
+
|
|
289
|
+
- `--param key=value`
|
|
290
|
+
- `--json '{...}'`
|
|
291
|
+
- `--input field=path-or-url`
|
|
292
|
+
- `--start-timeout <seconds>`
|
|
293
|
+
- `--timeout <seconds>`
|
|
294
|
+
- `--poll-interval <seconds>`
|
|
295
|
+
- `--priority low|normal`
|
|
296
|
+
- `--storage-expires-in <value>`
|
|
297
|
+
|
|
298
|
+
fal.ai commands should use queue polling and write complete returned assets or
|
|
299
|
+
text outputs to disk.
|
|
300
|
+
|
|
242
301
|
### Video Generation
|
|
243
302
|
|
|
244
303
|
OpenAI video generation uses the asynchronous Videos API. `ploof video generate`
|
|
@@ -303,12 +362,89 @@ project is eligible for that workflow. Extensions accept a source video id or
|
|
|
303
362
|
upload, plus a prompt and `--seconds`. `video remix` is supported for the SDK's
|
|
304
363
|
legacy remix endpoint, but new integrations should prefer `video edit`.
|
|
305
364
|
|
|
365
|
+
### Audio Generation And Processing
|
|
366
|
+
|
|
367
|
+
OpenAI audio generation uses the speech API and defaults to
|
|
368
|
+
`gpt-4o-mini-tts`, `alloy`, and `mp3` when model, voice, and output format are
|
|
369
|
+
omitted.
|
|
370
|
+
|
|
371
|
+
```bash
|
|
372
|
+
ploof audio generate \
|
|
373
|
+
--provider openai \
|
|
374
|
+
--text "Short narration for the generated asset." \
|
|
375
|
+
--model gpt-4o-mini-tts \
|
|
376
|
+
--voice alloy \
|
|
377
|
+
--format mp3 \
|
|
378
|
+
--out assets/narration.mp3 \
|
|
379
|
+
--output json
|
|
380
|
+
```
|
|
381
|
+
|
|
382
|
+
First-class OpenAI audio generation flags:
|
|
383
|
+
|
|
384
|
+
- `--model <model>`
|
|
385
|
+
- `--voice <voice>`
|
|
386
|
+
- `--voice-id <id>`
|
|
387
|
+
- `--instructions <text>`
|
|
388
|
+
- `--format <format>` / `--response-format <format>`
|
|
389
|
+
- `--speed <number>`
|
|
390
|
+
- `--param key=value`
|
|
391
|
+
- `--json '{...}'`
|
|
392
|
+
|
|
393
|
+
Audio processing supports transcription and English translation:
|
|
394
|
+
|
|
395
|
+
```bash
|
|
396
|
+
ploof audio transcribe \
|
|
397
|
+
--audio assets/narration.mp3 \
|
|
398
|
+
--model gpt-4o-mini-transcribe \
|
|
399
|
+
--out assets/transcript.json \
|
|
400
|
+
--output json
|
|
401
|
+
|
|
402
|
+
ploof audio translate \
|
|
403
|
+
--audio assets/spanish.mp3 \
|
|
404
|
+
--model whisper-1 \
|
|
405
|
+
--format text \
|
|
406
|
+
--out assets/translation.txt \
|
|
407
|
+
--output json
|
|
408
|
+
```
|
|
409
|
+
|
|
410
|
+
Transcription first-class flags:
|
|
411
|
+
|
|
412
|
+
- `--model <model>`
|
|
413
|
+
- `--language <code>`
|
|
414
|
+
- `--prompt <prompt>`
|
|
415
|
+
- `--format <format>` / `--response-format <format>`
|
|
416
|
+
- `--temperature <number>`
|
|
417
|
+
- `--include <value>`
|
|
418
|
+
- `--timestamp-granularity word|segment`
|
|
419
|
+
- `--chunking-strategy auto|{...}`
|
|
420
|
+
- `--known-speaker-name <name>`
|
|
421
|
+
- `--known-speaker-reference <data-url>`
|
|
422
|
+
- `--param key=value`
|
|
423
|
+
- `--json '{...}'`
|
|
424
|
+
|
|
425
|
+
Translation first-class flags:
|
|
426
|
+
|
|
427
|
+
- `--model <model>`
|
|
428
|
+
- `--prompt <prompt>`
|
|
429
|
+
- `--format <format>` / `--response-format <format>`
|
|
430
|
+
- `--temperature <number>`
|
|
431
|
+
- `--param key=value`
|
|
432
|
+
- `--json '{...}'`
|
|
433
|
+
|
|
434
|
+
Ploof is a static asset generation CLI. Audio commands request complete outputs
|
|
435
|
+
and write them to disk. Streaming transport settings such as OpenAI
|
|
436
|
+
`stream=true` for transcription or `stream_format=sse` for speech are rejected
|
|
437
|
+
because they do not directly produce finished asset files.
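
The streaming rejection described above could look roughly like the guard below. `assertNoStreaming` is a hypothetical name, and accepting the string `'true'` is an assumption about how `--param key=value` values arrive.

```ts
// Hypothetical guard, run on merged provider parameters before a request is sent.
function assertNoStreaming(params: Record<string, unknown>): void {
  // Assumption: CLI pass-through values may be strings, so 'true' is checked too.
  if (params.stream === true || params.stream === 'true') {
    throw new Error('stream=true is rejected: commands write complete files to disk')
  }
  if (params.stream_format === 'sse') {
    throw new Error('stream_format=sse is rejected: commands write complete files to disk')
  }
}
```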
|
|
438
|
+
|
|
306
439
|
### Batch Run
|
|
307
440
|
|
|
308
441
|
```bash
|
|
309
442
|
ploof run assets.yaml --parallel 4
|
|
310
443
|
```
|
|
311
444
|
|
|
445
|
+
Manifest media task kinds default to `provider: openai`; `model.run` defaults
|
|
446
|
+
to `provider: fal`.
|
|
447
|
+
|
|
312
448
|
Manifest example:
|
|
313
449
|
|
|
314
450
|
```yaml
|
|
@@ -356,6 +492,36 @@ tasks:
|
|
|
356
492
|
wait: true
|
|
357
493
|
download: true
|
|
358
494
|
output: assets/clip.mp4
|
|
495
|
+
|
|
496
|
+
- id: narration
|
|
497
|
+
kind: audio.generate
|
|
498
|
+
provider: openai
|
|
499
|
+
text: "Short narration for the generated clip."
|
|
500
|
+
params:
|
|
501
|
+
model: gpt-4o-mini-tts
|
|
502
|
+
voice: alloy
|
|
503
|
+
response_format: mp3
|
|
504
|
+
output: assets/narration.mp3
|
|
505
|
+
|
|
506
|
+
- id: transcript
|
|
507
|
+
kind: audio.transcribe
|
|
508
|
+
provider: openai
|
|
509
|
+
needs: [narration]
|
|
510
|
+
inputs:
|
|
511
|
+
audio:
|
|
512
|
+
task: narration
|
|
513
|
+
params:
|
|
514
|
+
model: gpt-4o-mini-transcribe
|
|
515
|
+
output: assets/transcript.json
|
|
516
|
+
|
|
517
|
+
- id: fal-icon
|
|
518
|
+
kind: model.run
|
|
519
|
+
provider: fal
|
|
520
|
+
model: fal-ai/flux/dev
|
|
521
|
+
prompt: "Small mascot icon for a CLI tool"
|
|
522
|
+
params:
|
|
523
|
+
image_size: square_hd
|
|
524
|
+
output: assets/fal-icon.png
|
|
359
525
|
```
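
One way to picture how `needs` could drive dependency-aware parallel runs is to group tasks into waves, where each wave depends only on earlier waves. This is a simplified sketch: `planWaves` and the `Task` shape are hypothetical, and a real scheduler would likely start each task as soon as its own dependencies finish, within the `--parallel` limit.

```ts
type Task = { id: string; needs?: string[] }

// Group tasks into waves; every task in a wave depends only on earlier waves.
function planWaves(tasks: Task[]): string[][] {
  const done = new Set<string>()
  let pending = [...tasks]
  const waves: string[][] = []
  while (pending.length > 0) {
    // A task is ready once all of its dependencies have completed.
    const ready = pending.filter((t) => (t.needs ?? []).every((n) => done.has(n)))
    if (ready.length === 0) throw new Error('cycle or unknown dependency in manifest')
    waves.push(ready.map((t) => t.id))
    for (const t of ready) done.add(t.id)
    pending = pending.filter((t) => !done.has(t.id))
  }
  return waves
}
```

For the three added tasks in the manifest example, `narration` and `fal-icon` have no dependencies and land in the first wave, while `transcript` waits for `narration`.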
|
|
360
526
|
|
|
361
527
|
## Asset Input Model
|
|
@@ -364,13 +530,18 @@ All input/context assets are normalized before provider execution:
|
|
|
364
530
|
|
|
365
531
|
```ts
|
|
366
532
|
type AssetInput = {
|
|
367
|
-
role: 'image' | 'mask' | 'reference' | 'style' | 'audio' | 'video'
|
|
533
|
+
role: 'image' | 'mask' | 'reference' | 'style' | 'audio' | 'video' | string
|
|
368
534
|
source: string
|
|
369
535
|
mime?: string
|
|
370
536
|
name?: string
|
|
371
537
|
}
|
|
372
538
|
```
|
|
373
539
|
|
|
540
|
+
Manifest `inputs` are a role map. Built-in aliases such as `images`,
|
|
541
|
+
`inputReference`, and `videos` normalize to `image`, `reference`, and `video`,
|
|
542
|
+
but providers can also consume custom roles like `style`, `control`, `pose`, or
|
|
543
|
+
`initImage` without changing the manifest schema.
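
A minimal sketch of the alias normalization described above. The alias table holds only the three aliases named in this section; `normalizeInputs` is a hypothetical name, and the sketch handles plain path or URL strings only (task references such as `task: narration` are left out).

```ts
type AssetInput = { role: string; source: string; mime?: string; name?: string }

// Only the aliases named in the spec; the real table may contain more.
const ROLE_ALIASES: Record<string, string> = {
  images: 'image',
  inputReference: 'reference',
  videos: 'video',
}

// Flatten a manifest `inputs` role map into a list of AssetInput entries.
function normalizeInputs(inputs: Record<string, string | string[]>): AssetInput[] {
  const result: AssetInput[] = []
  for (const [key, value] of Object.entries(inputs)) {
    // Custom roles such as `pose` or `control` pass through unchanged.
    const role = ROLE_ALIASES[key] ?? key
    const sources = Array.isArray(value) ? value : [value]
    for (const source of sources) result.push({ role, source })
  }
  return result
}
```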
|
|
544
|
+
|
|
374
545
|
Supported sources:
|
|
375
546
|
|
|
376
547
|
- Local paths.
|
|
@@ -388,6 +559,22 @@ OpenAI video generation/editing maps:
|
|
|
388
559
|
- `role=reference` to `input_reference` for image-guided video generation.
|
|
389
560
|
- `role=video` to source video uploads for eligible edit/extension workflows.
|
|
390
561
|
|
|
562
|
+
OpenAI audio processing maps:
|
|
563
|
+
|
|
564
|
+
- `role=audio` to the uploaded audio file for transcription or translation.
|
|
565
|
+
|
|
566
|
+
fal.ai media commands map common roles to URL fields:
|
|
567
|
+
|
|
568
|
+
- `role=image` and `role=reference` to `image_url`.
|
|
569
|
+
- `role=mask` to `mask_url`.
|
|
570
|
+
- `role=style` to `style_image_url`.
|
|
571
|
+
- `role=audio` to `audio_url`.
|
|
572
|
+
- `role=video` to `video_url`.
|
|
573
|
+
|
|
574
|
+
fal.ai `model.run` preserves exact input field names, so
|
|
575
|
+
`inputs.image_url` or `--input image_url=source.png` becomes `image_url` in the
|
|
576
|
+
provider input payload.
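
The role mapping above can be sketched as a lookup table plus a pass-through for `model.run`. `buildFalInput` and `UploadedInput` are hypothetical names, inputs are assumed to have already been uploaded and resolved to URLs, and falling back to the role name for unmapped roles is an assumption.

```ts
// Role-to-field table taken from the list above.
const FAL_ROLE_FIELDS: Record<string, string> = {
  image: 'image_url',
  reference: 'image_url',
  mask: 'mask_url',
  style: 'style_image_url',
  audio: 'audio_url',
  video: 'video_url',
}

type UploadedInput = { role: string; url: string }

// Build the provider input payload for a fal job of the given kind.
function buildFalInput(kind: string, inputs: UploadedInput[]): Record<string, string> {
  const payload: Record<string, string> = {}
  for (const { role, url } of inputs) {
    // model.run keeps the caller's exact field name; media commands translate roles.
    const field = kind === 'model.run' ? role : (FAL_ROLE_FIELDS[role] ?? role)
    payload[field] = url
  }
  return payload
}
```

Because `image` and `reference` share `image_url`, the later input would overwrite the earlier one in this sketch; how the package resolves that case is not stated here.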
|
|
577
|
+
|
|
391
578
|
Future providers can map roles such as `reference`, `style`, `init-image`, `audio`, or `video` differently.
|
|
392
579
|
|
|
393
580
|
## Provider Architecture
|
|
@@ -397,31 +584,37 @@ Provider modules implement a common interface:
|
|
|
397
584
|
```ts
|
|
398
585
|
type Provider = {
|
|
399
586
|
id: string
|
|
587
|
+
displayName?: string
|
|
400
588
|
capabilities: ProviderCapability[]
|
|
401
|
-
|
|
402
|
-
|
|
403
|
-
|
|
404
|
-
|
|
405
|
-
|
|
406
|
-
|
|
407
|
-
|
|
408
|
-
|
|
409
|
-
runVideoDownload(job, context): Promise<ProviderResult>
|
|
410
|
-
runVideoList(job, context): Promise<ProviderResult>
|
|
411
|
-
runVideoDelete(job, context): Promise<ProviderResult>
|
|
412
|
-
runVideoCharacterCreate(job, context): Promise<ProviderResult>
|
|
413
|
-
runVideoCharacterGet(job, context): Promise<ProviderResult>
|
|
589
|
+
auth?: {
|
|
590
|
+
apiKeyEnvVars: string[]
|
|
591
|
+
apiKeyEnvPairs?: Array<{ idEnvVar: string; secretEnvVar: string }>
|
|
592
|
+
organizationEnvVar?: string
|
|
593
|
+
projectEnvVar?: string
|
|
594
|
+
baseURLEnvVar?: string
|
|
595
|
+
}
|
|
596
|
+
run(job, context): Promise<ProviderResult>
|
|
414
597
|
}
|
|
415
598
|
```
|
|
416
599
|
|
|
417
600
|
The provider registry owns:
|
|
418
601
|
|
|
419
602
|
- Provider lookup.
|
|
420
|
-
-
|
|
421
|
-
-
|
|
603
|
+
- Auth metadata lookup.
|
|
604
|
+
- Capability discovery.
|
|
605
|
+
|
|
606
|
+
Provider modules own:
|
|
607
|
+
|
|
422
608
|
- Provider-specific validation.
|
|
609
|
+
- Provider SDK/client mapping.
|
|
610
|
+
- Dispatch from generic `AssetJob` objects to internal operation handlers.
|
|
611
|
+
- Output persistence details when the provider returns URLs, binary responses, or
|
|
612
|
+
structured data.
|
|
423
613
|
|
|
424
614
|
The CLI should avoid hardcoding all provider behavior into command handlers.
|
|
615
|
+
Manifest execution should build generic `AssetJob` objects and call
|
|
616
|
+
`provider.run(job, context)` rather than calling modality-specific provider
|
|
617
|
+
methods directly.
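
A sketch of the single-entry-point dispatch described above. The `AssetJob`, `RunContext`, and `ProviderResult` shapes are hypothetical stand-ins for illustration, as is the `createProvider` helper; only the `run(job, context)` signature comes from the interface above.

```ts
type AssetJob = { kind: string; prompt?: string }
type RunContext = { outDir: string }
type ProviderResult = { ok: boolean; detail: string }
type Handler = (job: AssetJob, context: RunContext) => Promise<ProviderResult>

// Hypothetical factory: one public run() that dispatches on job.kind.
function createProvider(id: string, handlers: Record<string, Handler>) {
  return {
    id,
    capabilities: Object.keys(handlers),
    async run(job: AssetJob, context: RunContext): Promise<ProviderResult> {
      const handler = handlers[job.kind]
      if (!handler) throw new Error(`${id} does not support ${job.kind}`)
      return handler(job, context)
    },
  }
}

// Example provider with a single internal operation handler.
const demoProvider = createProvider('demo', {
  'image.generate': async (job, context) => ({
    ok: true,
    detail: `would write an image for "${job.prompt ?? ''}" under ${context.outDir}`,
  }),
})
```

With this shape, manifest execution only needs `provider.run(job, context)`; adding an operation means registering another handler rather than widening the provider interface.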
|
|
425
618
|
|
|
426
619
|
## Settings Strategy
|
|
427
620
|
|
|
@@ -461,6 +654,11 @@ Asset-producing commands should write the asset to disk and print structured met
|
|
|
461
654
|
}
|
|
462
655
|
```
|
|
463
656
|
|
|
657
|
+
Ploof is a static asset generation tool. Providers may use asynchronous jobs,
|
|
658
|
+
polling, or queue subscriptions internally, but CLI consumers receive completed
|
|
659
|
+
files or text outputs after the command finishes. Streaming transports should
|
|
660
|
+
not be exposed as the primary consumption model.
|
|
661
|
+
|
|
464
662
|
Each generated file should have an optional sidecar metadata file:
|
|
465
663
|
|
|
466
664
|
```text
|