pixverse-cli 1.1.12 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,10 +1,10 @@
  # PixVerse CLI
 
- The official command-line interface (CLI) for [PixVerse](https://pixverse.ai) — create AI-powered videos and images directly from your terminal.
+ The official command-line interface (CLI) for [PixVerse](https://pixverse.ai) — create AI-powered videos, images, and audio directly from your terminal.
 
  ## What is PixVerse?
 
- PixVerse is an AI-powered creative platform that generates high-quality videos and images from text prompts or reference images. It supports a wide range of creative workflows including text-to-video, image-to-video, text-to-image, video transitions, lip-sync speech, sound effects, templates/effects, and more.
+ PixVerse is an AI-powered creative platform that generates high-quality videos, images, and audio from text prompts or reference images. It supports a wide range of creative workflows including text-to-video, image-to-video, text-to-image, video transitions, text-to-speech (voice synthesis), music generation, templates/effects, and more.
 
  ## What is PixVerse CLI?
 
@@ -13,12 +13,12 @@ PixVerse CLI is essentially **a UI-free version of the PixVerse website**. All f
  It is designed for:
 
  - **AI agents** — structured JSON output, deterministic exit codes, and pipeable commands make it a perfect tool for autonomous workflows (e.g. Claude Code, Cursor, Codex, LangChain, custom agents).
- - **Developers & power users** — scriptable video/image generation without leaving the terminal.
+ - **Developers & power users** — scriptable video/image/audio generation without leaving the terminal.
  - **Automation** — integrate AI content generation into CI/CD pipelines, batch processing scripts, or content production workflows.
 
  ## Subscription Required
 
- PixVerse CLI uses the same credit system as the website — generating videos and images consumes credits from your PixVerse account balance with the same pricing. To prevent abuse, **PixVerse CLI is currently available to subscribed users only**. For details on subscription plans and member benefits, see the [PixVerse Subscribe](https://app.pixverse.ai/subscribe) page.
+ PixVerse CLI uses the same credit system as the website — generating videos, images, and audio consumes credits from your PixVerse account balance with the same pricing. To prevent abuse, **PixVerse CLI is currently available to subscribed users only**. For details on subscription plans and member benefits, see the [PixVerse Subscribe](https://app.pixverse.ai/subscribe) page.
 
  ## Installation
 
@@ -66,6 +66,7 @@ This opens a browser where you confirm the authorization. You can also copy the
  | Kling O3 Standard | `kling-o3-standard` | `720p` | `3`–`15`s | `16:9` `9:16` `1:1` |
  | Kling 3.0 Pro | `kling-3.0-pro` | `720p` | `3`–`15`s | `16:9` `9:16` `1:1` |
  | Kling 3.0 Standard | `kling-3.0-standard` | `720p` | `3`–`15`s | `16:9` `9:16` `1:1` |
+ | Grok Imagine 1.5 | `grok-imagine-1.5` | `480p` `720p` | `1`–`15`s | *from image* |
  | Grok Imagine | `grok-imagine` | `480p` `720p` | `1`–`15`s | `16:9` `4:3` `1:1` `9:16` `3:4` `3:2` `2:3` |
  | Veo 3.1 Lite | `veo-3.1-lite` | `720p` `1080p` | `4` `6` `8`s | `16:9` `9:16` |
  | Veo 3.1 Standard | `veo-3.1-standard` | `720p` `1080p` `2160p` | `4` `6` `8`s | `16:9` `9:16` |
@@ -76,20 +77,23 @@ This opens a browser where you confirm the authorization. You can also copy the
  | PixVerse v5.5 | `v5.5` | `360p` `480p` `540p` `720p` `1080p` | `1`–`10`s | `16:9` `4:3` `1:1` `3:4` `9:16` `3:2` `2:3` |
  | PixVerse v5 | `v5` | `360p` `480p` `540p` `720p` `1080p` | `1`–`10`s | `16:9` `4:3` `1:1` `3:4` `9:16` `3:2` `2:3` |
 
+ > Grok Imagine 1.5 is image-to-video only — it requires `--image` and derives its aspect ratio from the input image (the `--aspect-ratio` flag is ignored).
+
  > Not all models support all creation modes. See the per-mode support matrix below.
 
  #### Per-mode Model Support
 
  | Creation mode | Supported `--model` values |
  |:---|:---|
- | `create video` (text-to-video / image-to-video) | `v6` `pixverse-c1` `seedance-2.0-standard` `seedance-2.0-fast` `happyhorse-1.0` `kling-o3-pro` `kling-o3-standard` `kling-3.0-pro` `kling-3.0-standard` `grok-imagine` `veo-3.1-lite` `veo-3.1-standard` `veo-3.1-fast` `sora-2-pro` `sora-2` `v5.6` |
+ | `create video` (text-to-video / image-to-video) | `v6` `pixverse-c1` `seedance-2.0-standard` `seedance-2.0-fast` `happyhorse-1.0` `kling-o3-pro` `kling-o3-standard` `kling-3.0-pro` `kling-3.0-standard` `grok-imagine-1.5` `grok-imagine` `veo-3.1-lite` `veo-3.1-standard` `veo-3.1-fast` `sora-2-pro` `sora-2` `v5.6` |
  | `create extend` | `v6` `grok-imagine` |
  | `create reference` (multi-subject fusion) | `v6` `pixverse-c1` `seedance-2.0-standard` `seedance-2.0-fast` `kling-o3-pro` `kling-o3-standard` `grok-imagine` `v5.6` |
  | `create transition` (2 frames) | `v6` `pixverse-c1` `seedance-2.0-standard` `seedance-2.0-fast` `kling-o3-pro` `kling-o3-standard` `kling-3.0-pro` `kling-3.0-standard` `veo-3.1-lite` `veo-3.1-standard` `veo-3.1-fast` `v5.6` |
  | `create transition` (3+ frames) | `v5` |
  | `create modify` | `v5.5` |
  | `create motion-control` | `v5.6` |
- | `create speech` (lip sync) | `v5` |
+
+ > Audio creation uses separate model families: `create voice` for text-to-speech and `create music` for prompt-to-music.
 
  ### Image Models (`--model <value>`)
 
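The per-mode matrix and the new Grok Imagine 1.5 note can be turned into a local pre-flight check for scripts or agents. This is a hypothetical sketch: `models_for_mode`, `supports_mode`, and `check_video_args` are invented names, the model lists are copied from the 1.2.0 README, and the two `create transition` rows are left out because they depend on frame count.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of pixverse-cli. Lists are a snapshot of the
# 1.2.0 per-mode matrix; the CLI's own validation remains authoritative.

# Print the space-separated --model values for a creation mode (empty if unknown).
models_for_mode() {
  case "$1" in
    "create video")
      echo "v6 pixverse-c1 seedance-2.0-standard seedance-2.0-fast happyhorse-1.0 kling-o3-pro kling-o3-standard kling-3.0-pro kling-3.0-standard grok-imagine-1.5 grok-imagine veo-3.1-lite veo-3.1-standard veo-3.1-fast sora-2-pro sora-2 v5.6" ;;
    "create extend")         echo "v6 grok-imagine" ;;
    "create reference")      echo "v6 pixverse-c1 seedance-2.0-standard seedance-2.0-fast kling-o3-pro kling-o3-standard grok-imagine v5.6" ;;
    "create modify")         echo "v5.5" ;;
    "create motion-control") echo "v5.6" ;;
    *)                       echo "" ;;
  esac
}

# Return 0 if MODEL is listed for MODE, 1 otherwise.
supports_mode() {
  local mode="$1" model="$2" m
  for m in $(models_for_mode "$mode"); do
    [ "$m" = "$model" ] && return 0
  done
  return 1
}

# Check `create video` arguments: the model must be listed, and
# grok-imagine-1.5 (image-to-video only) must be given an image.
check_video_args() {
  local model="$1" image="$2"
  if ! supports_mode "create video" "$model"; then
    echo "model $model is not listed for create video" >&2
    return 1
  fi
  if [ "$model" = "grok-imagine-1.5" ] && [ -z "$image" ]; then
    echo "grok-imagine-1.5 requires --image" >&2
    return 1
  fi
  return 0
}
```

For example, `check_video_args grok-imagine-1.5 ""` fails locally before any job is submitted. Because the lists are a snapshot, they need updating whenever the matrix changes.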
@@ -106,6 +110,28 @@ This opens a browser where you confirm the authorization. You can also copy the
  | Kling Image O3 | `kling-image-o3` | `1080p` `1440p` `2160p` | `16:9` `9:16` `1:1` + more |
  | Kling Image V3 | `kling-image-v3` | `1080p` `1440p` | `16:9` `9:16` `1:1` + more |
 
+ ### Voice / TTS Models (`create voice --model <value>`)
+
+ | Model | `--model` value | Provider | Max characters |
+ |:---|:---|:---|:---|
+ | MiniMax Speech 2.8 HD *(default)* | `speech-2.8-hd` | MiniMax | 10,000 |
+ | MiniMax Speech 2.8 Turbo | `speech-2.8-turbo` | MiniMax | 10,000 |
+ | Eleven Multilingual v2 | `eleven-multilingual-v2` | ElevenLabs | 10,000 |
+ | Eleven v3 | `eleven-v3` | ElevenLabs | 5,000 |
+ | Eleven Turbo v2.5 | `eleven-turbo-v2.5` | ElevenLabs | 40,000 |
+
+ > Browse available preset voices with `pixverse voice presets --model <id>` and the full live model catalog with `pixverse voice models`.
+
+ ### Music Models (`create music --model <value>`)
+
+ | Model | `--model` value | Provider | Duration | Notes |
+ |:---|:---|:---|:---|:---|
+ | MiniMax Music 2.6 *(default)* | `music-2.6` | MiniMax | `10`-`240`s | Lyrics, auto lyrics, instrumental |
+ | ElevenLabs Music | `music_v1` | ElevenLabs | `10`-`240`s | Lyrics, auto lyrics, instrumental |
+ | Google Lyria 3 Pro | `lyria-3-pro-preview` | Google | `10`-`240`s | Image references, no separate `--lyrics` |
+
+ > Browse the live music model catalog with `pixverse music models`.
+
  ---
 
  ## Usage
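The new voice and music tables state limits that a script can check locally before spending credits. A minimal sketch, assuming the 1.2.0 values hold: the function names and error messages are hypothetical, and accepting only whole seconds for `--duration-seconds` is an assumption of this sketch, not a documented rule.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks built from the 1.2.0 model tables.
# `pixverse voice models` / `pixverse music models` return the live catalog.

# Maximum --text length (characters) per TTS model; fails for unknown models.
tts_max_chars() {
  case "$1" in
    speech-2.8-hd|speech-2.8-turbo|eleven-multilingual-v2) echo 10000 ;;
    eleven-v3)         echo 5000 ;;
    eleven-turbo-v2.5) echo 40000 ;;
    *)                 return 1 ;;
  esac
}

# Succeed if TEXT fits the model's limit. Note: ${#text} counts characters
# only under a UTF-8 locale; in the C locale it counts bytes.
check_tts_length() {
  local model="$1" text="$2" max
  max=$(tts_max_chars "$model") || { echo "unknown TTS model: $model" >&2; return 1; }
  if [ "${#text}" -gt "$max" ]; then
    echo "--text is ${#text} chars; $model allows $max" >&2
    return 1
  fi
}

# All three music models in the table list a 10-240 s range.
# Whole seconds only (an assumption of this sketch).
check_music_duration() {
  local secs="$1"
  case "$secs" in
    ''|*[!0-9]*) echo "--duration-seconds: expected whole seconds" >&2; return 1 ;;
  esac
  if [ "$secs" -lt 10 ] || [ "$secs" -gt 240 ]; then
    echo "--duration-seconds must be between 10 and 240" >&2
    return 1
  fi
}
```

A 6,000-character script, for instance, passes for `speech-2.8-hd` but fails for `eleven-v3`.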
@@ -129,16 +155,23 @@ Local image inputs larger than `1920x1920` or `5MB` are automatically resized/co
  pixverse create video --prompt "A cat walking on Mars" --model v6 --quality 720p --aspect-ratio 16:9
  ```
 
- ### Prompts from stdin
+ ### Text inputs: literal, a file, or stdin
+
+ Text-input flags — `--prompt` (all create commands), `--text` (`create voice`), and `--lyrics` (`create music`) — accept three forms, just like `--image` / `--video`:
 
- Pass `-` to `--prompt` (or `--tts-text`) to read the value from stdin. Handy for long or multi-line prompts and for piping output from another tool without fighting shell quoting:
+ - a **literal** string: `--prompt "A neon city skyline"`
+ - a **local file path**: `--prompt ./scene.txt` (the file's contents are used)
+ - `-` to read from **stdin**: `... | pixverse create video --prompt -`
 
  ```bash
- echo "A neon city skyline at dusk, slow drone shot" | pixverse create video --prompt -
+ pixverse create video --prompt ./scene.txt
  cat scene.txt | pixverse create image --prompt - --json
- some-prompt-generator | pixverse create speech --video <id> --tts-text -
+ echo "Hello from the command line" | pixverse create voice --text -
+ pixverse create music --prompt "Bright synth-pop" --lyrics ./lyrics.txt
  ```
 
+ > A value is treated as a file only when a matching file actually exists on disk; otherwise it's used as literal text (the same rule as `--image` / `--video`).
+
  ### Image to Video
 
  ```bash
@@ -163,9 +196,22 @@ pixverse create image --prompt "Turn this into a watercolor painting" --image ./
  # Create a transition between keyframes (requires 2+ images)
  pixverse create transition --images ./frame1.png ./frame2.png ./frame3.png
 
- # Add lip-sync speech to a video (via TTS or audio file)
- pixverse create speech --video <video_id> --tts-text "Hello world"
- pixverse create speech --video <video_id> --audio ./speech.mp3
+ # Generate speech audio from text (text-to-speech)
+ pixverse create voice --text "Hello world" --voice-id <preset_voice_id> --output ./out.mp3
+ # Browse available models / preset voices:
+ pixverse voice models
+ pixverse voice presets --model speech-2.8-hd
+
+ # Generate music audio from a prompt
+ pixverse create music --prompt "A cinematic pop song with bright synths" --auto-lyrics
+ pixverse create music --prompt "Uplifting piano theme" --instrumental --duration-seconds 60
+ # Lyrics-capable models require lyrics unless --auto-lyrics or --instrumental is used:
+ # (--lyrics takes a literal string, a local file path, or - for stdin)
+ pixverse create music --prompt "Bright synth-pop, uplifting mood" --lyrics ./lyrics.txt
+ # Google Lyria supports image references and expects lyric-like instructions in --prompt:
+ pixverse create music -m lyria-3-pro-preview --prompt "Instrumental orchestral cue inspired by these images" --image ./moodboard.png
+ # Browse available music models:
+ pixverse music models
 
  # Extend video duration
  pixverse create extend --video <video_id>
@@ -189,6 +235,13 @@ pixverse create motion-control --image ./character.png --video ./dance.mp4
  pixverse create template --template-id 12345 --image ./photo.png
  ```
 
+ Voice speed uses provider-specific validation:
+
+ | Provider | Default | Valid range | Invalid range error | Provider request field |
+ |:---|:---|:---|:---|:---|
+ | ElevenLabs | `1.0` | `0.7..1.2` | `--speed must be between 0.7 and 1.2` | `voice_settings.speed` |
+ | MiniMax | `1.0` | `0.5..2.0` | `--speed must be between 0.5 and 2` | `voice_setting.speed` |
+
  ### Common Creation Flags
 
  These flags are available across most `create` subcommands:
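Two `create voice` rules from this release, the text-input resolution order (`-` reads stdin, an existing file is read, anything else is literal) and the per-provider `--speed` range, can be sketched in shell as follows. This only mirrors the documented behavior; it is not the CLI's implementation, and `resolve_text_input` and `check_voice_speed` are hypothetical names. The error text reuses the messages from the speed table.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of documented rules; not pixverse-cli source code.

# Text-input resolution for --prompt / --text / --lyrics:
# "-" reads stdin, an existing regular file is read, anything else is literal.
resolve_text_input() {
  local value="$1"
  if [ "$value" = "-" ]; then
    cat                      # read the whole value from stdin
  elif [ -f "$value" ]; then
    cat -- "$value"          # a matching file exists: use its contents
  else
    printf '%s' "$value"     # no such file: treat the value as literal text
  fi
}

# Provider-specific --speed range; both providers default to 1.0.
check_voice_speed() {
  local provider="$1" speed="${2:-1.0}" lo hi
  case "$provider" in
    elevenlabs) lo=0.7; hi=1.2 ;;
    minimax)    lo=0.5; hi=2 ;;
    *) echo "unknown provider: $provider" >&2; return 1 ;;
  esac
  # Bash has no floating-point comparison, so awk does the range check.
  if awk -v s="$speed" -v lo="$lo" -v hi="$hi" \
       'BEGIN { exit !(s + 0 >= lo + 0 && s + 0 <= hi + 0) }'; then
    return 0
  fi
  echo "--speed must be between $lo and $hi" >&2
  return 1
}
```

Under this rule, a typo such as `--text ./scirpt.txt` does not fail; it is silently sent as the literal string, so checking `[ -f ... ]` yourself is worthwhile when a file is intended.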
@@ -209,6 +262,9 @@ These flags are available across most `create` subcommands:
  # Check task status
  pixverse task status <id>
 
+ # Poll a voice/music audio task (audio is not auto-detected — pass --type audio)
+ pixverse task status <id> --type audio
+
  # Batch status query (parallel; per-ID failures captured in the response map)
  pixverse task status --ids 123,456,789 --type video --json
 
@@ -222,21 +278,32 @@ pixverse task wait <id>
  # List your generated assets (default: created videos)
  pixverse asset list
  pixverse asset list --type image
+ pixverse asset list --type audio # voice and music audio history
+ pixverse asset list --type audio --source upload
  pixverse asset list --source upload
  pixverse asset list --source create --off-peak
 
  # Upload a local file or URL to asset library
  pixverse asset upload ./photo.png
+ pixverse asset upload ./voice-over.mp3
  pixverse asset upload https://example.com/image.jpg
 
- # Get asset details
+ # Get asset details (type auto-detected: video → image → audio)
  pixverse asset info <id>
+ # Pass --type to skip auto-detection
+ pixverse asset info <id> --type audio
+ pixverse asset info <id> --type audio --source upload
 
- # Download a generated video or image
+ # Download a created video, image, or audio (uploads are not downloadable)
  pixverse asset download <id>
+ pixverse asset download <id> --type audio --dest ./out/
 
- # Delete an asset
+ # Delete a created asset — pass its id (auto-detected)
  pixverse asset delete <id>
+ pixverse asset delete <id> --type audio
+
+ # Delete an uploaded asset — pass the id from `asset list --source upload`
+ pixverse asset delete <id> --source upload --type image
  ```
 
  ### Saved Folders
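`asset info` without `--type` is documented to try video, then image, then audio. The hypothetical `detect_asset_type` below models that order with an injectable probe so it runs without the CLI; `pixverse_probe` shows one way to wire it to the real command, assuming a non-zero exit status when an id is not found under a given type (the README promises deterministic exit codes but does not list them).

```bash
#!/usr/bin/env bash
# Hypothetical model of the documented lookup order; not pixverse-cli code.
# PROBE is any command taking "<id> <type>" that succeeds when the asset
# exists as that type.
detect_asset_type() {
  local id="$1" probe="$2" t
  for t in video image audio; do
    if "$probe" "$id" "$t"; then
      echo "$t"
      return 0
    fi
  done
  echo "asset $id not found as video, image, or audio" >&2
  return 1
}

# Example probe against the real CLI. Assumption: `asset info --type <t>`
# exits non-zero when the id is not an asset of that type.
pixverse_probe() {
  pixverse asset info "$1" --type "$2" >/dev/null 2>&1
}

# Usage (requires the CLI and a valid id):
#   detect_asset_type <id> pixverse_probe
```

In this model an audio asset costs three lookups, which is why passing `--type audio` up front, as the README suggests, is the cheaper call when the type is already known.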
@@ -382,7 +449,8 @@ pixverse asset download "$VID" --dest ./output/
  | `create video` | Text-to-video or image-to-video |
  | `create image` | Text-to-image or image-to-image |
  | `create transition` | Create transitions between keyframes |
- | `create speech` | Add lip-sync speech to video |
+ | `create voice` | Generate speech audio from text (text-to-speech) |
+ | `create music` | Generate music audio from a prompt |
  | `create extend` | Extend video duration |
  | `create modify` | Modify an existing video |
  | `create upscale` | Upscale video resolution |
@@ -393,9 +461,12 @@ pixverse asset download "$VID" --dest ./output/
  | `template list` | List templates (with category filter) |
  | `template search` | Search templates by keyword |
  | `template info` | Get template details |
+ | `voice models` | List voice/TTS providers, models, and supported languages |
+ | `voice presets` | List preset voices (filterable by model / language / provider) |
+ | `music models` | List music providers, models, and capabilities |
  | `task status` | Check task status (single `<id>` or `--ids id1,id2,...` for batch) |
  | `task wait` | Wait for task completion |
- | `asset list` | List assets (`--source create\|upload`, `--type video\|image`, `--off-peak`) |
+ | `asset list` | List assets (`--source create\|upload`, `--type video\|image\|audio`, `--off-peak`) |
  | `asset upload` | Upload a local file or HTTPS URL to asset library |
  | `asset info` | Get asset details |
  | `asset download` | Download a generated asset |
@@ -437,6 +508,8 @@ pixverse asset download "$VID" --dest ./output/
 
  For AI agents (Claude Code, Cursor, Codex, etc.), we **strongly recommend** installing [PixVerse Skills](https://github.com/PixVerseAI/skills) — a comprehensive skill library that teaches agents how to use PixVerse CLI correctly with full model constraints, multi-step pipelines, and error handling.
 
+ For lightweight discovery, the public repo also includes a compact machine-readable command manifest at `capabilities.json`; the npm package includes the same file at `dist/capabilities.json`.
+
  **Install via Skills CLI:**
 
  ```bash