@miketromba/ploof 0.1.5 → 0.3.0

package/README.md CHANGED
@@ -9,7 +9,7 @@
  <img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen" alt="node version" />
  </p>
 
- Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image generation and editing today, plus the legacy OpenAI image variations endpoint when the authenticated project has access. The provider registry is designed for audio, video, and broader model marketplaces over time.
+ Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image, video, and audio generation/processing today, plus the legacy OpenAI image variations endpoint when the authenticated project has access. The provider registry is designed for broader model marketplaces over time.
 
  It is built for both developers and AI agents: predictable commands, parseable output, local auth profiles, YAML manifests, parallel execution, and a companion skill.
 
@@ -21,12 +21,18 @@ It is built for both developers and AI agents: predictable commands, parseable o
  | OpenAI image generation | Supported |
  | OpenAI image editing | Supported |
  | OpenAI image variations | Legacy endpoint; supported when available to the authenticated project |
+ | OpenAI video generation | Supported |
+ | OpenAI video editing/extensions | Supported |
+ | OpenAI video downloads/library/characters | Supported |
+ | OpenAI audio generation / TTS | Supported |
+ | OpenAI audio transcription | Supported |
+ | OpenAI audio translation | Supported |
  | Context images and masks | Supported |
+ | Image, video, and audio input assets | Supported |
  | YAML/JSON batch manifests | Supported |
  | Dependency-aware parallel runs | Supported |
  | Agent instructions via `ploof learn` | Supported |
  | Additional providers | Planned |
- | Audio/video generation | Planned |
 
  ## Install
 
@@ -68,6 +74,24 @@ ploof image edit \
  --prompt "Replace the background with a clean marble countertop" \
  --out assets/edited.png
 
+ # Generate and download a video
+ ploof video generate \
+ --prompt "Wide tracking shot of a paper city at blue hour" \
+ --model sora-2 \
+ --size 1280x720 \
+ --seconds 4 \
+ --out assets/clip.mp4
+
+ # Generate and transcribe speech
+ ploof audio generate \
+ --text "Ploof can generate speech and process audio." \
+ --voice alloy \
+ --out assets/speech.mp3
+
+ ploof audio transcribe \
+ --audio assets/speech.mp3 \
+ --out assets/transcript.json
+
  # Run a manifest
  ploof run assets.yaml --parallel 4
  ```
@@ -170,6 +194,120 @@ The plural alias also works:
  ploof image variations --image input.png --out variation.png
  ```
 
+ ## Video Generation
+
+ OpenAI video generation uses the asynchronous Videos API. `ploof video generate` submits a job immediately. If you pass `--out` or `--download`, Ploof waits for completion and downloads the requested asset.
+
+ ```bash
+ ploof video generate \
+ --provider openai \
+ --prompt "Wide tracking shot of a teal coupe on a desert highway" \
+ --model sora-2 \
+ --size 1280x720 \
+ --seconds 4 \
+ --out assets/clip.mp4 \
+ --output json
+ ```
+
+ Useful generation flags:
+
+ | Flag | Description |
+ | --- | --- |
+ | `--model <model>` | Video model, for example `sora-2` or `sora-2-pro` |
+ | `--size <size>` | Output resolution, for example `1280x720` |
+ | `--seconds <seconds>` | Clip or extension duration |
+ | `--input-reference <path-or-url-or-file-id>` | Image reference for the first frame |
+ | `--input-reference-file-id <id>` | OpenAI uploaded image file id |
+ | `--input-reference-url <url>` | Image URL or data URL reference |
+ | `--character <id>` | Reusable character id; repeat for multiple characters |
+ | `--wait` | Poll until the job reaches a terminal status |
+ | `--download` | Download after waiting |
+ | `--variant <variant>` | `video`, `thumbnail`, or `spritesheet` |
+ | `--poll-interval <seconds>` | Polling interval while waiting |
+ | `--timeout <seconds>` | Maximum wait time |
+ | `--param key=value` | Provider-specific pass-through parameter |
+ | `--json '{...}'` | Provider-specific JSON object |
+
+ If you omit `--model`, Ploof defaults OpenAI video generation to `sora-2`.
+
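The waiting behaviour behind `--wait`, `--poll-interval`, and `--timeout` in the hunk above can be pictured as a simple polling loop. The following is a minimal sketch, not Ploof's implementation: the status names, the `VideoJob` shape, and the injected `fetchStatus` callback are all assumptions made for illustration.

```typescript
// Hypothetical status values; the real set comes from the provider's API.
type JobStatus = "queued" | "in_progress" | "completed" | "failed";

interface VideoJob {
  id: string;
  status: JobStatus;
}

interface PollOptions {
  pollIntervalSeconds: number; // mirrors --poll-interval
  timeoutSeconds: number; // mirrors --timeout
}

// Statuses after which polling stops (assumed, see lead-in).
const TERMINAL: ReadonlySet<JobStatus> = new Set<JobStatus>(["completed", "failed"]);

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls fetchStatus repeatedly until the job is terminal or the deadline passes.
async function pollUntilTerminal(
  fetchStatus: () => Promise<VideoJob>,
  options: PollOptions,
): Promise<VideoJob> {
  const deadline = Date.now() + options.timeoutSeconds * 1000;
  for (;;) {
    const job = await fetchStatus();
    if (TERMINAL.has(job.status)) return job;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for ${job.id} (last status: ${job.status})`);
    }
    await sleep(options.pollIntervalSeconds * 1000);
  }
}
```

Injecting `fetchStatus` keeps the loop independent of any HTTP client, which also makes it easy to exercise against a mock server like the one the test suite describes.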
+ ## Video Editing And Library
+
+ ```bash
+ ploof video edit \
+ --video-id video_abc123 \
+ --prompt "Shift the palette to teal and rust" \
+ --out assets/edit.mp4
+
+ ploof video extend \
+ --video-id video_abc123 \
+ --prompt "Continue as the camera rises over the rooftops" \
+ --seconds 4 \
+ --out assets/extended.mp4
+
+ ploof video download video_abc123 --variant thumbnail --out assets/thumb.webp
+ ploof video status video_abc123 --output json
+ ploof video list --limit 20 --output json
+ ploof video delete video_abc123
+ ```
+
+ OpenAI video edits accept either `--video-id <id>` for an existing completed OpenAI video or `--video <path>` for an uploaded source video when the authenticated project is eligible for that workflow. Extensions accept a source video id or upload, plus a prompt and `--seconds`.
+
+ Reusable character commands:
+
+ ```bash
+ ploof video character create --name Mossy --video character.mp4 --output json
+ ploof video character get char_abc123 --output json
+ ```
+
+ ## Audio Generation And Processing
+
+ OpenAI audio generation defaults to `gpt-4o-mini-tts`, `alloy`, and `mp3` when model, voice, and format are omitted.
+
+ ```bash
+ ploof audio generate \
+ --provider openai \
+ --text "A concise product narration for the demo reel." \
+ --model gpt-4o-mini-tts \
+ --voice alloy \
+ --format mp3 \
+ --out assets/narration.mp3 \
+ --output json
+ ```
+
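The defaulting rule stated above (`gpt-4o-mini-tts`, `alloy`, `mp3` when the flags are omitted) amounts to filling in missing fields before the request is sent. A minimal sketch follows; the `SpeechOptions` type and `resolveSpeechOptions` function are hypothetical names, and only the three default values and the `response_format` key come from the diff.

```typescript
// Hypothetical option shape; field names follow the manifest params in this diff.
interface SpeechOptions {
  model: string;
  voice: string;
  response_format: string;
}

// Default values as documented in the README text above.
const OPENAI_SPEECH_DEFAULTS: SpeechOptions = {
  model: "gpt-4o-mini-tts",
  voice: "alloy",
  response_format: "mp3",
};

// Any field the caller leaves undefined falls back to the documented default.
function resolveSpeechOptions(overrides: Partial<SpeechOptions> = {}): SpeechOptions {
  return {
    model: overrides.model ?? OPENAI_SPEECH_DEFAULTS.model,
    voice: overrides.voice ?? OPENAI_SPEECH_DEFAULTS.voice,
    response_format: overrides.response_format ?? OPENAI_SPEECH_DEFAULTS.response_format,
  };
}
```

With this sketch, `resolveSpeechOptions({ voice: "coral" })` keeps the default model and format and changes only the voice.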
+ Useful generation flags:
+
+ | Flag | Description |
+ | --- | --- |
+ | `--model <model>` | TTS model, for example `gpt-4o-mini-tts` |
+ | `--voice <voice>` | Built-in voice such as `alloy`, `coral`, `nova`, or `shimmer` |
+ | `--voice-id <id>` | Custom voice id |
+ | `--instructions <text>` | Voice/style instructions for supported models |
+ | `--format <format>` | `mp3`, `opus`, `aac`, `flac`, `wav`, or `pcm` |
+ | `--speed <number>` | Speech speed |
+ | `--param key=value` | Provider-specific pass-through parameter |
+ | `--json '{...}'` | Provider-specific JSON object |
+
+ Transcription and translation:
+
+ ```bash
+ ploof audio transcribe \
+ --audio assets/narration.mp3 \
+ --model gpt-4o-mini-transcribe \
+ --out assets/transcript.json \
+ --output json
+
+ ploof audio translate \
+ --audio assets/spanish.mp3 \
+ --model whisper-1 \
+ --format text \
+ --out assets/translation.txt \
+ --output json
+ ```
+
+ Transcription supports `--language`, `--prompt`, `--format`, `--temperature`, `--include`, `--timestamp-granularity`, `--chunking-strategy`, `--known-speaker-name`, and `--known-speaker-reference`. Translation supports `--prompt`, `--format`, and `--temperature`.
+
+ Ploof writes complete static assets to disk. Streaming transport settings such as OpenAI `stream=true` for transcription or `stream_format=sse` for speech are rejected because they do not produce a finished asset file directly.
+
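The rejection of streaming settings described above could be enforced with a small validation step before a request is built. This is a sketch under stated assumptions: `STREAMING_CHECKS` and `assertNoStreaming` are hypothetical names, and the check also treats the string `"true"` as streaming on the assumption that `--param key=value` delivers string values.

```typescript
type Params = Record<string, unknown>;

// One check per capability kind; each returns the offending setting or null.
// Only the two settings named in the text above are covered.
const STREAMING_CHECKS: Record<string, (p: Params) => string | null> = {
  "audio.transcribe": (p) =>
    p["stream"] === true || p["stream"] === "true" ? "stream=true" : null,
  "audio.generate": (p) =>
    p["stream_format"] === "sse" ? "stream_format=sse" : null,
};

// Throws when params request a streaming transport for the given kind.
function assertNoStreaming(kind: string, params: Params): void {
  const check = STREAMING_CHECKS[kind];
  const offending = check ? check(params) : null;
  if (offending !== null) {
    throw new Error(
      `${kind}: ${offending} is rejected because it does not produce a finished asset file`,
    );
  }
}
```

Failing before any network call means a bad pass-through parameter costs nothing and never leaves a partial file on disk.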
  ## Batch Manifests
 
  ```yaml
@@ -205,6 +343,39 @@ tasks:
  images:
  - task: base
  output: assets/variation.png
+
+ - id: clip
+ kind: video.generate
+ provider: openai
+ prompt: "Slow dolly shot through a miniature paper city"
+ params:
+ model: sora-2
+ size: 1280x720
+ seconds: "4"
+ wait: true
+ download: true
+ output: assets/clip.mp4
+
+ - id: narration
+ kind: audio.generate
+ provider: openai
+ text: "Short narration for the generated clip."
+ params:
+ model: gpt-4o-mini-tts
+ voice: alloy
+ response_format: mp3
+ output: assets/narration.mp3
+
+ - id: transcript
+ kind: audio.transcribe
+ provider: openai
+ needs: [narration]
+ inputs:
+ audio:
+ task: narration
+ params:
+ model: gpt-4o-mini-transcribe
+ output: assets/transcript.json
  ```
 
  Run it:
@@ -283,7 +454,7 @@ bun run build
  npm pack --dry-run
  ```
 
- The default test suite includes mocked OpenAI end-to-end tests. Those tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, sidecar metadata, and dependency-aware manifests without spending API credits.
+ The default test suite includes mocked OpenAI end-to-end tests. Those tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, video job polling/downloads, audio generation/processing, sidecar metadata, and dependency-aware manifests without spending API credits.
 
  Live OpenAI tests are opt-in only:
 
package/SPEC.md CHANGED
@@ -2,7 +2,7 @@
 
  ## Summary
 
- Ploof is an npm-published CLI for generating and editing assets through AI generation providers. It starts with OpenAI image generation and editing, but the architecture must support multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
+ Ploof is an npm-published CLI for generating, editing, and processing assets through AI generation providers. It starts with OpenAI image, video, and audio generation/processing, but the architecture must support multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
 
  The product should feel like a small, sharp developer tool: easy to run manually, predictable in scripts, and optimized for AI agents.
 
@@ -87,6 +87,19 @@ Initial capabilities:
  - `image.generate`
  - `image.edit`
  - `image.variation`
+ - `video.generate`
+ - `video.edit`
+ - `video.extend`
+ - `video.remix`
+ - `video.status`
+ - `video.download`
+ - `video.list`
+ - `video.delete`
+ - `video.character.create`
+ - `video.character.get`
+ - `audio.generate`
+ - `audio.transcribe`
+ - `audio.translate`
 
  Future providers should be added through the provider registry without changing the manifest model.
 
@@ -229,6 +242,144 @@ authenticated project has DALL-E 2 variation access; if OpenAI returns a 404,
  use `ploof image edit` for image-to-image workflows. `ploof image variations`
  is an alias.
 
+ ### Video Generation
+
+ OpenAI video generation uses the asynchronous Videos API. `ploof video generate`
+ submits a job; passing `--out` or `--download` makes Ploof poll until a terminal
+ status and download the requested asset.
+
+ ```bash
+ ploof video generate \
+ --provider openai \
+ --prompt "Wide tracking shot of a paper city at blue hour" \
+ --model sora-2 \
+ --size 1280x720 \
+ --seconds 4 \
+ --out assets/clip.mp4 \
+ --output json
+ ```
+
+ First-class OpenAI video flags:
+
+ - `--model <model>`
+ - `--size <size>`
+ - `--seconds <seconds>`
+ - `--input-reference <path-or-url-or-file-id>`
+ - `--input-reference-file-id <id>`
+ - `--input-reference-url <url>`
+ - `--character <id>`
+ - `--wait`
+ - `--download`
+ - `--variant video|thumbnail|spritesheet`
+ - `--poll-interval <seconds>`
+ - `--timeout <seconds>`
+ - `--param key=value`
+ - `--json '{...}'`
+
+ OpenAI video generation defaults to `sora-2` when no model is specified.
+
+ ### Video Editing And Library
+
+ ```bash
+ ploof video edit \
+ --video-id video_abc123 \
+ --prompt "Shift the palette to teal and rust" \
+ --out assets/edit.mp4
+
+ ploof video extend \
+ --video-id video_abc123 \
+ --prompt "Continue as the camera rises over the rooftops" \
+ --seconds 4 \
+ --out assets/extended.mp4
+
+ ploof video download video_abc123 --variant thumbnail --out assets/thumb.webp
+ ploof video status video_abc123 --output json
+ ploof video list --limit 20 --output json
+ ploof video delete video_abc123
+ ploof video character create --name Mossy --video character.mp4 --output json
+ ploof video character get char_abc123 --output json
+ ```
+
+ Video edits accept either `--video-id <id>` for an existing completed OpenAI
+ video or `--video <path>` for an uploaded source video when the authenticated
+ project is eligible for that workflow. Extensions accept a source video id or
+ upload, plus a prompt and `--seconds`. `video remix` is supported for the SDK's
+ legacy remix endpoint, but new integrations should prefer `video edit`.
+
+ ### Audio Generation And Processing
+
+ OpenAI audio generation uses the speech API and defaults to
+ `gpt-4o-mini-tts`, `alloy`, and `mp3` when model, voice, and output format are
+ omitted.
+
+ ```bash
+ ploof audio generate \
+ --provider openai \
+ --text "Short narration for the generated asset." \
+ --model gpt-4o-mini-tts \
+ --voice alloy \
+ --format mp3 \
+ --out assets/narration.mp3 \
+ --output json
+ ```
+
+ First-class OpenAI audio generation flags:
+
+ - `--model <model>`
+ - `--voice <voice>`
+ - `--voice-id <id>`
+ - `--instructions <text>`
+ - `--format <format>` / `--response-format <format>`
+ - `--speed <number>`
+ - `--param key=value`
+ - `--json '{...}'`
+
+ Audio processing supports transcription and English translation:
+
+ ```bash
+ ploof audio transcribe \
+ --audio assets/narration.mp3 \
+ --model gpt-4o-mini-transcribe \
+ --out assets/transcript.json \
+ --output json
+
+ ploof audio translate \
+ --audio assets/spanish.mp3 \
+ --model whisper-1 \
+ --format text \
+ --out assets/translation.txt \
+ --output json
+ ```
+
+ Transcription first-class flags:
+
+ - `--model <model>`
+ - `--language <code>`
+ - `--prompt <prompt>`
+ - `--format <format>` / `--response-format <format>`
+ - `--temperature <number>`
+ - `--include <value>`
+ - `--timestamp-granularity word|segment`
+ - `--chunking-strategy auto|{...}`
+ - `--known-speaker-name <name>`
+ - `--known-speaker-reference <data-url>`
+ - `--param key=value`
+ - `--json '{...}'`
+
+ Translation first-class flags:
+
+ - `--model <model>`
+ - `--prompt <prompt>`
+ - `--format <format>` / `--response-format <format>`
+ - `--temperature <number>`
+ - `--param key=value`
+ - `--json '{...}'`
+
+ Ploof is a static asset generation CLI. Audio commands request complete outputs
+ and write them to disk. Streaming transport settings such as OpenAI
+ `stream=true` for transcription or `stream_format=sse` for speech are rejected
+ because they do not directly produce finished asset files.
+
  ### Batch Run
 
  ```bash
@@ -270,6 +421,39 @@ tasks:
  images:
  - task: base
  output: assets/variation.png
+
+ - id: clip
+ kind: video.generate
+ provider: openai
+ prompt: "Slow dolly shot through a miniature paper city"
+ params:
+ model: sora-2
+ size: 1280x720
+ seconds: "4"
+ wait: true
+ download: true
+ output: assets/clip.mp4
+
+ - id: narration
+ kind: audio.generate
+ provider: openai
+ text: "Short narration for the generated clip."
+ params:
+ model: gpt-4o-mini-tts
+ voice: alloy
+ response_format: mp3
+ output: assets/narration.mp3
+
+ - id: transcript
+ kind: audio.transcribe
+ provider: openai
+ needs: [narration]
+ inputs:
+ audio:
+ task: narration
+ params:
+ model: gpt-4o-mini-transcribe
+ output: assets/transcript.json
  ```
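The `needs: [narration]` entry in the manifest hunk above implies that `transcript` can only start once `narration` has finished, while `clip` and `narration` are independent. One way to picture dependency-aware scheduling is to group tasks into waves. This is a sketch only: `planWaves` and the `Task` shape are hypothetical, and the diff does not say how Ploof's scheduler is implemented or how `--parallel` limits interact with it.

```typescript
// Minimal task shape: only the fields needed for ordering.
interface Task {
  id: string;
  needs?: string[];
}

// Groups tasks into waves: every task in a wave depends only on earlier waves,
// so tasks within a wave could run in parallel.
function planWaves(tasks: Task[]): string[][] {
  const known = new Set(tasks.map((t) => t.id));
  for (const t of tasks) {
    for (const dep of t.needs ?? []) {
      if (!known.has(dep)) throw new Error(`Task ${t.id} needs unknown task ${dep}`);
    }
  }
  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = tasks;
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => (t.needs ?? []).every((d) => done.has(d)));
    // No task is ready but some remain: the dependencies form a cycle.
    if (ready.length === 0) throw new Error("Dependency cycle detected");
    waves.push(ready.map((t) => t.id));
    for (const t of ready) done.add(t.id);
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return waves;
}

// The three tasks added in this diff.
const exampleWaves = planWaves([
  { id: "clip" },
  { id: "narration" },
  { id: "transcript", needs: ["narration"] },
]);
console.log(JSON.stringify(exampleWaves)); // [["clip","narration"],["transcript"]]
```

A wave-based plan is simpler than a fully dynamic scheduler but can leave workers idle while a slow task, such as a video job, holds up its wave.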
 
  ## Asset Input Model
@@ -297,6 +481,15 @@ OpenAI image editing maps:
  - `role=image` to image input files.
  - `role=mask` to mask file.
 
+ OpenAI video generation/editing maps:
+
+ - `role=reference` to `input_reference` for image-guided video generation.
+ - `role=video` to source video uploads for eligible edit/extension workflows.
+
+ OpenAI audio processing maps:
+
+ - `role=audio` to the uploaded audio file for transcription or translation.
+
  Future providers can map roles such as `reference`, `style`, `init-image`, `audio`, or `video` differently.
 
  ## Provider Architecture
@@ -310,6 +503,19 @@ type Provider = {
  runImageGenerate(job, context): Promise<ProviderResult>
  runImageEdit(job, context): Promise<ProviderResult>
  runImageVariation(job, context): Promise<ProviderResult>
+ runVideoGenerate(job, context): Promise<ProviderResult>
+ runVideoEdit(job, context): Promise<ProviderResult>
+ runVideoExtend(job, context): Promise<ProviderResult>
+ runVideoRemix(job, context): Promise<ProviderResult>
+ runVideoStatus(job, context): Promise<ProviderResult>
+ runVideoDownload(job, context): Promise<ProviderResult>
+ runVideoList(job, context): Promise<ProviderResult>
+ runVideoDelete(job, context): Promise<ProviderResult>
+ runVideoCharacterCreate(job, context): Promise<ProviderResult>
+ runVideoCharacterGet(job, context): Promise<ProviderResult>
+ runAudioGenerate(job, context): Promise<ProviderResult>
+ runAudioTranscribe(job, context): Promise<ProviderResult>
+ runAudioTranslate(job, context): Promise<ProviderResult>
  }
  ```
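The capability kinds listed earlier in SPEC.md (`video.generate`, `video.character.create`, and so on) line up one-to-one with the `Provider` method names in this hunk (`runVideoGenerate`, `runVideoCharacterCreate`). The sketch below shows how that naming correspondence could be used to dispatch a job. Whether Ploof actually dispatches this way is not stated in the diff; `methodNameForKind`, `resolveHandler`, and the loose `LooseProvider` type are assumptions for illustration.

```typescript
// Loose stand-ins for the SPEC's types, just enough to type-check the sketch.
type ProviderResult = { kind: string };
type Handler = (job: unknown, context: unknown) => Promise<ProviderResult>;
type LooseProvider = Record<string, Handler | undefined>;

// "video.character.create" -> "runVideoCharacterCreate"
function methodNameForKind(kind: string): string {
  const pascal = kind
    .split(".")
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
  return "run" + pascal;
}

// Looks up the handler for a kind and fails clearly when a provider lacks it.
function resolveHandler(provider: LooseProvider, kind: string): Handler {
  const handler = provider[methodNameForKind(kind)];
  if (!handler) throw new Error(`Provider does not support ${kind}`);
  return handler;
}
```

A lookup like this lets a provider that implements only some capabilities fail with a clear message instead of an undefined-method error.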