@miketromba/ploof 0.2.0 → 0.3.0

package/README.md CHANGED
@@ -9,7 +9,7 @@
  <img src="https://img.shields.io/badge/node-%3E%3D18-brightgreen" alt="node version" />
  </p>
 
- Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image generation/editing and OpenAI video generation/editing today, plus the legacy OpenAI image variations endpoint when the authenticated project has access. The provider registry is designed for audio and broader model marketplaces over time.
+ Ploof is a CLI for generating and editing creative assets with AI providers. It supports OpenAI image, video, and audio generation/processing today, plus the legacy OpenAI image variations endpoint when the authenticated project has access. The provider registry is designed for broader model marketplaces over time.
 
  It is built for both developers and AI agents: predictable commands, parseable output, local auth profiles, YAML manifests, parallel execution, and a companion skill.
 
@@ -24,13 +24,15 @@ It is built for both developers and AI agents: predictable commands, parseable o
  | OpenAI video generation | Supported |
  | OpenAI video editing/extensions | Supported |
  | OpenAI video downloads/library/characters | Supported |
+ | OpenAI audio generation / TTS | Supported |
+ | OpenAI audio transcription | Supported |
+ | OpenAI audio translation | Supported |
  | Context images and masks | Supported |
- | Video references and source videos | Supported |
+ | Image, video, and audio input assets | Supported |
  | YAML/JSON batch manifests | Supported |
  | Dependency-aware parallel runs | Supported |
  | Agent instructions via `ploof learn` | Supported |
  | Additional providers | Planned |
- | Audio generation | Planned |
 
  ## Install
 
@@ -80,6 +82,16 @@ ploof video generate \
    --seconds 4 \
    --out assets/clip.mp4
 
+ # Generate and transcribe speech
+ ploof audio generate \
+   --text "Ploof can generate speech and process audio." \
+   --voice alloy \
+   --out assets/speech.mp3
+
+ ploof audio transcribe \
+   --audio assets/speech.mp3 \
+   --out assets/transcript.json
+
  # Run a manifest
  ploof run assets.yaml --parallel 4
  ```
@@ -247,6 +259,55 @@ ploof video character create --name Mossy --video character.mp4 --output json
  ploof video character get char_abc123 --output json
  ```
 
+ ## Audio Generation And Processing
+
+ OpenAI audio generation defaults to `gpt-4o-mini-tts`, `alloy`, and `mp3` when model, voice, and format are omitted.
+
+ ```bash
+ ploof audio generate \
+   --provider openai \
+   --text "A concise product narration for the demo reel." \
+   --model gpt-4o-mini-tts \
+   --voice alloy \
+   --format mp3 \
+   --out assets/narration.mp3 \
+   --output json
+ ```
+
+ Useful generation flags:
+
+ | Flag | Description |
+ | --- | --- |
+ | `--model <model>` | TTS model, for example `gpt-4o-mini-tts` |
+ | `--voice <voice>` | Built-in voice such as `alloy`, `coral`, `nova`, or `shimmer` |
+ | `--voice-id <id>` | Custom voice id |
+ | `--instructions <text>` | Voice/style instructions for supported models |
+ | `--format <format>` | `mp3`, `opus`, `aac`, `flac`, `wav`, or `pcm` |
+ | `--speed <number>` | Speech speed |
+ | `--param key=value` | Provider-specific pass-through parameter |
+ | `--json '{...}'` | Provider-specific JSON object |
+
+ Transcription and translation:
+
+ ```bash
+ ploof audio transcribe \
+   --audio assets/narration.mp3 \
+   --model gpt-4o-mini-transcribe \
+   --out assets/transcript.json \
+   --output json
+
+ ploof audio translate \
+   --audio assets/spanish.mp3 \
+   --model whisper-1 \
+   --format text \
+   --out assets/translation.txt \
+   --output json
+ ```
+
+ Transcription supports `--language`, `--prompt`, `--format`, `--temperature`, `--include`, `--timestamp-granularity`, `--chunking-strategy`, `--known-speaker-name`, and `--known-speaker-reference`. Translation supports `--prompt`, `--format`, and `--temperature`.
+
+ Ploof writes complete static assets to disk. Streaming transport settings such as OpenAI `stream=true` for transcription or `stream_format=sse` for speech are rejected because they do not produce a finished asset file directly.
+
  ## Batch Manifests
 
  ```yaml
@@ -294,6 +355,27 @@ tasks:
      wait: true
      download: true
      output: assets/clip.mp4
+
+   - id: narration
+     kind: audio.generate
+     provider: openai
+     text: "Short narration for the generated clip."
+     params:
+       model: gpt-4o-mini-tts
+       voice: alloy
+       response_format: mp3
+     output: assets/narration.mp3
+
+   - id: transcript
+     kind: audio.transcribe
+     provider: openai
+     needs: [narration]
+     inputs:
+       audio:
+         task: narration
+     params:
+       model: gpt-4o-mini-transcribe
+     output: assets/transcript.json
  ```
 
  Run it:
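
The manifest hunk above chains `transcript` to `narration` through `needs`. As a rough illustration of what dependency-aware ordering could look like, the TypeScript sketch below groups tasks into waves in which every task's dependencies are already finished. The `Task` type, the `planWaves` function, and the wave model itself are assumptions made for this note and are not taken from the ploof source; only the `id` and `needs` field names come from the manifest.

```typescript
// Hypothetical task shape: only `id` and `needs` mirror the manifest fields.
type Task = { id: string; needs?: string[] };

// Group tasks into waves. Every task in a wave depends only on tasks from
// earlier waves, so the members of one wave could run in parallel.
function planWaves(tasks: Task[]): string[][] {
  const known = new Set(tasks.map((t) => t.id));
  for (const t of tasks) {
    for (const dep of t.needs ?? []) {
      if (!known.has(dep)) {
        throw new Error(`Task "${t.id}" needs unknown task "${dep}"`);
      }
    }
  }

  const done = new Set<string>();
  let pending = [...tasks];
  const waves: string[][] = [];

  while (pending.length > 0) {
    // A task is ready once all of its dependencies have completed.
    const ready = pending.filter((t) => (t.needs ?? []).every((dep) => done.has(dep)));
    if (ready.length === 0) {
      throw new Error("Dependency cycle detected");
    }
    waves.push(ready.map((t) => t.id));
    for (const t of ready) done.add(t.id);
    pending = pending.filter((t) => !done.has(t.id));
  }
  return waves;
}

const audioWaves = planWaves([
  { id: "narration" },
  { id: "transcript", needs: ["narration"] },
]);
console.log(audioWaves); // [ [ 'narration' ], [ 'transcript' ] ]
```

A real scheduler would probably start each task as soon as its own dependencies finish and cap concurrency (the README shows `--parallel 4`); waves are just the simplest model that respects `needs`.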
@@ -372,7 +454,7 @@ bun run build
  npm pack --dry-run
  ```
 
- The default test suite includes mocked OpenAI end-to-end tests. Those tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, video job polling/downloads, sidecar metadata, and dependency-aware manifests without spending API credits.
+ The default test suite includes mocked OpenAI end-to-end tests. Those tests run real `ploof` CLI commands against a local mock OpenAI server and verify generated files, edit uploads, video job polling/downloads, audio generation/processing, sidecar metadata, and dependency-aware manifests without spending API credits.
 
  Live OpenAI tests are opt-in only:
 
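
The new README text in this diff says that streaming transport settings (`stream=true` for transcription, `stream_format=sse` for speech) are rejected because they do not yield a finished file. A minimal TypeScript sketch of such a guard follows. The `Params` type and both function names are hypothetical, and treating the string `"true"` like the boolean is an assumption about how `--param key=value` pairs might be parsed; the diff does not show the actual check.

```typescript
// Hypothetical parameter bag, e.g. merged from --param key=value and --json.
type Params = Record<string, unknown>;

// Return the offending setting, or null when the request yields a complete file.
// Only the two settings named in the README text are checked here.
function findStreamingSetting(params: Params): string | null {
  const stream = params["stream"];
  // Assumption: a CLI key=value pair may arrive as the string "true".
  if (stream === true || stream === "true") return "stream=true";
  if (params["stream_format"] === "sse") return "stream_format=sse";
  return null;
}

// Throw before any provider call so no partial asset is written.
function assertStaticOutput(params: Params): void {
  const offending = findStreamingSetting(params);
  if (offending !== null) {
    throw new Error(`${offending} is not supported: complete asset files are required`);
  }
}

assertStaticOutput({ model: "gpt-4o-mini-tts", response_format: "mp3" }); // passes silently
```

Failing fast on the merged parameters keeps the check in one place regardless of whether a value came from a first-class flag, `--param`, or `--json`.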
package/SPEC.md CHANGED
@@ -2,7 +2,7 @@
 
  ## Summary
 
- Ploof is an npm-published CLI for generating and editing assets through AI generation providers. It starts with OpenAI image and video generation/editing, but the architecture must support multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
+ Ploof is an npm-published CLI for generating, editing, and processing assets through AI generation providers. It starts with OpenAI image, video, and audio generation/processing, but the architecture must support multiple authenticated providers, multiple asset modalities, provider-specific settings, and parallel execution across mixed jobs.
 
  The product should feel like a small, sharp developer tool: easy to run manually, predictable in scripts, and optimized for AI agents.
 
@@ -97,6 +97,9 @@ Initial capabilities:
  - `video.delete`
  - `video.character.create`
  - `video.character.get`
+ - `audio.generate`
+ - `audio.transcribe`
+ - `audio.translate`
 
  Future providers should be added through the provider registry without changing the manifest model.
 
@@ -303,6 +306,80 @@ project is eligible for that workflow. Extensions accept a source video id or
  upload, plus a prompt and `--seconds`. `video remix` is supported for the SDK's
  legacy remix endpoint, but new integrations should prefer `video edit`.
 
+ ### Audio Generation And Processing
+
+ OpenAI audio generation uses the speech API and defaults to
+ `gpt-4o-mini-tts`, `alloy`, and `mp3` when model, voice, and output format are
+ omitted.
+
+ ```bash
+ ploof audio generate \
+   --provider openai \
+   --text "Short narration for the generated asset." \
+   --model gpt-4o-mini-tts \
+   --voice alloy \
+   --format mp3 \
+   --out assets/narration.mp3 \
+   --output json
+ ```
+
+ First-class OpenAI audio generation flags:
+
+ - `--model <model>`
+ - `--voice <voice>`
+ - `--voice-id <id>`
+ - `--instructions <text>`
+ - `--format <format>` / `--response-format <format>`
+ - `--speed <number>`
+ - `--param key=value`
+ - `--json '{...}'`
+
+ Audio processing supports transcription and English translation:
+
+ ```bash
+ ploof audio transcribe \
+   --audio assets/narration.mp3 \
+   --model gpt-4o-mini-transcribe \
+   --out assets/transcript.json \
+   --output json
+
+ ploof audio translate \
+   --audio assets/spanish.mp3 \
+   --model whisper-1 \
+   --format text \
+   --out assets/translation.txt \
+   --output json
+ ```
+
+ Transcription first-class flags:
+
+ - `--model <model>`
+ - `--language <code>`
+ - `--prompt <prompt>`
+ - `--format <format>` / `--response-format <format>`
+ - `--temperature <number>`
+ - `--include <value>`
+ - `--timestamp-granularity word|segment`
+ - `--chunking-strategy auto|{...}`
+ - `--known-speaker-name <name>`
+ - `--known-speaker-reference <data-url>`
+ - `--param key=value`
+ - `--json '{...}'`
+
+ Translation first-class flags:
+
+ - `--model <model>`
+ - `--prompt <prompt>`
+ - `--format <format>` / `--response-format <format>`
+ - `--temperature <number>`
+ - `--param key=value`
+ - `--json '{...}'`
+
+ Ploof is a static asset generation CLI. Audio commands request complete outputs
+ and write them to disk. Streaming transport settings such as OpenAI
+ `stream=true` for transcription or `stream_format=sse` for speech are rejected
+ because they do not directly produce finished asset files.
+
  ### Batch Run
 
  ```bash
@@ -356,6 +433,27 @@ tasks:
      wait: true
      download: true
      output: assets/clip.mp4
+
+   - id: narration
+     kind: audio.generate
+     provider: openai
+     text: "Short narration for the generated clip."
+     params:
+       model: gpt-4o-mini-tts
+       voice: alloy
+       response_format: mp3
+     output: assets/narration.mp3
+
+   - id: transcript
+     kind: audio.transcribe
+     provider: openai
+     needs: [narration]
+     inputs:
+       audio:
+         task: narration
+     params:
+       model: gpt-4o-mini-transcribe
+     output: assets/transcript.json
  ```
 
  ## Asset Input Model
@@ -388,6 +486,10 @@ OpenAI video generation/editing maps:
  - `role=reference` to `input_reference` for image-guided video generation.
  - `role=video` to source video uploads for eligible edit/extension workflows.
 
+ OpenAI audio processing maps:
+
+ - `role=audio` to the uploaded audio file for transcription or translation.
+
  Future providers can map roles such as `reference`, `style`, `init-image`, `audio`, or `video` differently.
 
  ## Provider Architecture
@@ -411,6 +513,9 @@ type Provider = {
    runVideoDelete(job, context): Promise<ProviderResult>
    runVideoCharacterCreate(job, context): Promise<ProviderResult>
    runVideoCharacterGet(job, context): Promise<ProviderResult>
+   runAudioGenerate(job, context): Promise<ProviderResult>
+   runAudioTranscribe(job, context): Promise<ProviderResult>
+   runAudioTranslate(job, context): Promise<ProviderResult>
  }
  ```
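
The last SPEC.md hunk adds three audio methods to `Provider`, and an earlier hunk adds the matching `audio.*` capability names. One plausible way to connect the two is a lookup from manifest `kind` to method name, sketched below in TypeScript. The diff shows only the method names and the `Promise<ProviderResult>` return type, so the shapes of `Job`, `Context`, and `ProviderResult`, along with `methodForKind` and `dispatchAudio`, are hypothetical stand-ins.

```typescript
// Hypothetical stand-ins: the diff does not show these shapes.
type Job = { kind: string; params: Record<string, unknown> };
type Context = { outputDir: string };
type ProviderResult = { files: string[] };

// Only the three audio methods added in this release are modeled.
type AudioProvider = {
  runAudioGenerate(job: Job, context: Context): Promise<ProviderResult>;
  runAudioTranscribe(job: Job, context: Context): Promise<ProviderResult>;
  runAudioTranslate(job: Job, context: Context): Promise<ProviderResult>;
};

// Capability names from the SPEC mapped to the new provider methods.
const methodByKind = new Map<string, keyof AudioProvider>([
  ["audio.generate", "runAudioGenerate"],
  ["audio.transcribe", "runAudioTranscribe"],
  ["audio.translate", "runAudioTranslate"],
]);

// Look up the method for a kind; null means this sketch does not handle it.
function methodForKind(kind: string): keyof AudioProvider | null {
  return methodByKind.get(kind) ?? null;
}

// Route a job to the provider method that matches its kind.
function dispatchAudio(
  provider: AudioProvider,
  job: Job,
  context: Context,
): Promise<ProviderResult> {
  const method = methodForKind(job.kind);
  if (method === null) {
    return Promise.reject(new Error(`Unsupported audio kind: ${job.kind}`));
  }
  return provider[method](job, context);
}

console.log(methodForKind("audio.transcribe")); // runAudioTranscribe
```

Keeping the mapping in data rather than a `switch` means a new capability needs one table entry plus one provider method, which matches the SPEC's stated goal of adding providers without changing the manifest model.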