@scenerok/cli 1.0.12 → 1.0.14
package/package.json: CHANGED
package/skills/shared/SKILL.md: CHANGED
@@ -37,7 +37,7 @@ input logo = "/uploads/logo.png"
 
 **Time Blocks** - Define what happens during a time range:
 ```vidscript
-[-] = hero # auto-append: starts
+[-] = hero # auto-append: starts at visual cursor
 hero.Trim(start: 0s, end: 5s)
 
 [- 3s] = text "Hello", style: title, color: "#FFF" # auto-start, 3s duration
@@ -45,6 +45,8 @@ hero.Trim(start: 0s, end: 5s)
 [prev + 0.5s .. prev + 2s] = filter "glow" # expression-based
 ```
 
+`[-]` timing is channel-aware. Visual blocks (`video`, `text`, filters, visual plugin calls, bare visual inputs) advance a visual playhead. Audio blocks (`audio`, `xai.tts`, `eleven.music`) advance an audio playhead. A 15s music bed or multiple TTS blocks do not delay the next `[-] = video ...` block; that video starts from the visual playhead. Mixed blocks advance both playheads, and `prev` follows the relevant channel for the block being compiled.
+
 **Video Operations** - Modify how video plays:
 ```vidscript
 hero.Trim(start: 0s, end: 5s) # trim clip
@@ -125,6 +127,8 @@ Available plugin APIs:
 | `@elevenlabs/music` | `music`, `generateMusic`, `composeMusic` | Background music beds, theme music, instrumental loops |
 | `@scenerok/basic-animations` | `fadeIn`, `fadeOut`, `slideX`, `slideY`, `popIn`, `riseIn`, `swingIn`, `glitchIn`, `float`, `typewriter` | Motion descriptors for `animate:` |
 
+Cloudflare video calls share named parameters across providers: `model`, `aspect_ratio`, `duration`, `resolution` or `quality`, `generate_audio`, `negative_prompt`, `seed`, and `input` for model-specific overrides. Use `cf.imageToVideo(image, prompt, model: ..., aspect_ratio: "9:16", duration: ...)` when animating one source image. Common aspect values are `"9:16"` for vertical, `"16:9"` for wide, `"1:1"` for square, plus model-specific values such as `"4:3"`, `"3:4"`, and `"21:9"`. For Runway Gen-4.5, keep writing friendly `aspect_ratio` values; SceneRok translates them to Cloudflare's required `ratio` values such as `"720:1280"`.
+
 xAI TTS voice IDs: `eve` for demos/announcements/upbeat content, `ara` for warm conversational narration, `rex` for business/tutorial delivery, `sal` for balanced general narration, and `leo` for authoritative instructional narration.
 
 For ElevenLabs music, import `@elevenlabs/music` and call `eleven.music(...)`, `eleven.generateMusic(...)`, or `eleven.composeMusic(...)`. Use `let bed = eleven.music(...)` followed by `audio bed, volume: ...` when you need volume or fades. Do not call `eleven.tts(...)`; this repo currently exposes voiceover through `xai.tts(...)` / `xai.textToSpeech(...)`. Place voiceover as an `audio` block and do not leave ads silent unless the user asks for silent output.
@@ -171,8 +175,8 @@ When the user provides a product, company, landing page, app store listing, or e
 
 ## Best Practices
 
-- **Use dynamic timeblocks** — `[-]` auto-advances the cursor, reducing calculation errors
-- **Use `prev` for offsets** — `[prev + 0.5s .. prev + 2s]` for gaps between content
+- **Use dynamic timeblocks** — `[-]` auto-advances the relevant audio or visual cursor, reducing calculation errors
+- **Use `prev` for offsets** — `[prev + 0.5s .. prev + 2s]` for gaps between content in the same channel
 - **Named arguments for clarity** — `hero.Trim(start: 0s, end: 5s)` over `hero.Trim(0s, 5s)`
 - **Use website assets judiciously for URL-based ads** — browser screenshots, product images, app screenshots, and logos can ground the ad, but they are optional ingredients, not the whole recipe
 - **Avoid full-page screenshot backgrounds** — use above-fold, cropped, or focused screenshots that remain legible at video size; never rely on long website screenshots where the text becomes unreadable
@@ -14,7 +14,8 @@ VidScript is a declarative DSL for composing short-form videos. Write a script,
 8. **Use the full media plugin toolkit** — choose still generation, text-to-video, image-to-video, reference-to-video, video extension, TTS, and music deliberately. Do not default to only `xai.imagine`.
 9. **Include audio when useful** — use `xai.tts` / `xai.textToSpeech` for narration and `eleven.music` / `eleven.generateMusic` / `eleven.composeMusic` for music beds.
 10. **Generated media has no text** — prompts for AI-generated images/videos must explicitly ask for no text, no words, no letters, no captions, no logos, no watermarks, and no readable UI copy. Use VidScript `text` primitives for all final copy.
-11. **
+11. **Channel-aware `[-]` timing** — audio and visual auto blocks have separate playheads. A music bed or TTS sequence does not delay the next `[-] = video ...` block.
+12. **Strict rules** — see `vidscript-strict.md` for invalid patterns (bare `xai.imagine`, bare `cf.video(...)`, bare `eleven.music(...)` without import, trailing params after direct plugin calls, nested quotes in prompts).
 
 ### Common validation failures
 
@@ -25,12 +26,19 @@ VidScript is a declarative DSL for composing short-form videos. Write a script,
 | `Unknown function 'fadeIn'` | Use `import motion from "@scenerok/basic-animations"` and call `motion.fadeIn(...)` |
 | `Expected ... but "," found` | Do not put `, volume:` after a direct plugin call. Use `let bed = eleven.music(...)`, then `audio bed, volume: ...` |
 | `Unknown function 'eleven.generateMusic'` | Add `import eleven from "@elevenlabs/music"` before calling `eleven.generateMusic(...)` |
-
+| Prompt needs quoted words | Escape quotes inside strings, e.g. `"Package labeled \"AER-01\""` |
 
 ## Program Structure
 
 A VidScript program is a sequence of statements separated by newlines.
 
+Strings support common backslash escapes. Use `\n` inside a `text` string for an intentional rendered line break, and use `\"` or `\'` for quote characters inside the matching string delimiter.
+
+```vidscript
+[0s .. 3s] = text "RUN LIGHTER\nDAILY", line_height: 1.05
+let prompt = "Package labeled \"AER-01\", no text, no watermark"
+```
+
 ```vidscript
 # Single-line comment
 /* Multi-line
@@ -128,12 +136,26 @@ Time blocks are the core of VidScript. They define when instructions execute on
 ### Dynamic Playhead (recommended)
 
 ```vidscript
-[-] = hero # auto-append: starts
+[-] = hero # auto-append: starts at visual cursor
 [- 3s] = text "Title", size: 72 # auto-start, last 3 seconds
 [- 2.5s] = filter "glow", intensity: 0.8
 ```
 
-
+VidScript keeps separate auto-playhead cursors for visual and audio content. Visual blocks (`video`, `text`, filters, shaders, visual plugin calls, and bare visual inputs) advance the visual cursor. Audio blocks (`audio`, `xai.tts`, `eleven.music`, and other audio plugin calls) advance the audio cursor. `[- duration]` advances the relevant cursor by the explicit duration.
+
+This means a long audio bed or voiceover sequence does not push a following auto video block to the end of the audio:
+
+```vidscript
+import xai from "@scenerok/xai"
+import eleven from "@elevenlabs/music"
+
+input product = "https://cdn.example.com/product.png"
+let bed = eleven.music("Warm music bed", duration: 15, instrumental: true)
+[0s .. 15s] = audio bed, volume: 0.35
+[-] = video xai.imageToVideo(product, "Slow premium camera move", aspect_ratio: "9:16", duration: 6)
+```
+
+The generated video above starts at the current visual cursor, usually `0s`, even though the audio cursor already reaches `15s`. Mixed blocks containing both audio and visual instructions advance both cursors. The `prev` keyword resolves against the cursor for the block being compiled; for mixed timing it uses the current combined timeline position.
 
 ### Explicit Range
 
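The separate visual and audio cursors described in the Dynamic Playhead text above can be modeled in a few lines of Python. This is only an illustrative sketch, not SceneRok's compiler: the `Playheads` class and its method names are hypothetical, and two rules are assumptions made for the example, namely that an explicit range moves a cursor forward to the range end and that a mixed block starts at the later of its cursors.

```python
from dataclasses import dataclass


@dataclass
class Playheads:
    """Toy model of separate visual and audio auto-playhead cursors (seconds)."""
    visual: float = 0.0
    audio: float = 0.0

    def explicit_block(self, channels, start, end):
        """Model `[start .. end] = ...` on the given channels."""
        # Assumption: an explicit range moves each touched cursor forward to its end.
        for ch in channels:
            setattr(self, ch, max(getattr(self, ch), float(end)))
        return float(start), float(end)

    def auto_block(self, channels, duration):
        """Model `[-] = ...` or `[- duration] = ...` on the given channels."""
        # Assumption: a mixed block starts at the later of the cursors it touches.
        start = max(getattr(self, ch) for ch in channels)
        end = start + float(duration)
        for ch in channels:
            setattr(self, ch, end)
        return start, end


heads = Playheads()
heads.explicit_block({"audio"}, 0, 15)    # [0s .. 15s] = audio bed
video = heads.auto_block({"visual"}, 6)   # [-] = video ..., duration: 6
title = heads.auto_block({"visual"}, 3)   # [- 3s] = text ...
print(video, title, heads.audio)          # (0.0, 6.0) (6.0, 9.0) 15.0
```

Under these assumptions the 6s video lands at 0s to 6s and the following 3s title at 6s to 9s, while the audio cursor stays at 15s.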
@@ -157,7 +179,7 @@ frame 90 # 90 frames at 30fps = 3 seconds
 
 ### The `prev` Keyword
 
-`prev` refers to the current playhead position
+`prev` refers to the current channel-aware playhead position for the block being compiled. In a visual block it follows the visual cursor; in an audio block it follows the audio cursor.
 
 ```vidscript
 [- 2s] = text "Hello" # plays from cursor to cursor+2s
@@ -308,7 +330,7 @@ let bed = eleven.music("Warm premium launch bed", duration: 15, instrumental: tr
 | `@scenerok/xai` | `xai.extendVideo(video, prompt, ...)` | Continue an existing video clip by up to 10s. |
 | `@scenerok/xai` | `xai.tts(text, ...)`, `xai.textToSpeech(text, ...)` | Generate spoken narration. Voices: `eve`, `ara`, `rex`, `sal`, `leo`. |
 | `@scenerok/cloudflare` | `cf.image(prompt, ...)`, `cf.generateImage(prompt, ...)` | Generate stills through Cloudflare AI Gateway image models such as `black-forest-labs/flux-2-pro-preview`, `openai/gpt-image-2`, `xai/grok-imagine-image-quality`, or `recraft/recraftv4-pro`. |
-| `@scenerok/cloudflare` | `cf.video(prompt, ...)`, `cf.imagine(prompt, ...)`, `cf.genVideo(prompt, ...)`, `cf.generateVideo(prompt, ...)` | Generate text-to-video through models such as `pixverse/v6`, `vidu/q3-turbo`, `runwayml/gen-4.5`, `minimax/hailuo-2.3`, `bytedance/seedance-2.0`,
+| `@scenerok/cloudflare` | `cf.video(prompt, ...)`, `cf.imagine(prompt, ...)`, `cf.genVideo(prompt, ...)`, `cf.generateVideo(prompt, ...)` | Generate text-to-video through models such as `pixverse/v6`, `vidu/q3-turbo`, `runwayml/gen-4.5`, `minimax/hailuo-2.3`, `bytedance/seedance-2.0`, `google/veo-3.1`, `google/veo-3.1-fast`, or `google/veo-3-fast`. |
 | `@scenerok/cloudflare` | `cf.imageToVideo(image, prompt, ...)`, `cf.referenceToVideo([images...], prompt, ...)`, `cf.extendVideo(video, prompt, ...)` | Use Cloudflare-backed image-guided, reference-guided, or extension modes when the selected model supports them. |
 | `@scenerok/cloudflare` | `cf.listModels()`, `cf.models()` | Return model metadata as data; useful for exploration, not for timeline media. |
 | `@elevenlabs/music` | `eleven.music(prompt, ...)`, `eleven.generateMusic(prompt, ...)`, `eleven.composeMusic(prompt, ...)` | Generate background music. Use a `let` binding if you need `volume`, `fade_in`, or `fade_out`. |
@@ -317,7 +339,16 @@ xAI TTS voices: `eve` for upbeat demos/announcements, `ara` for warm conversatio
 
 Do not call `eleven.tts(...)`; the registered ElevenLabs package is `@elevenlabs/music` for music beds only. Use `xai.tts(...)` or `xai.textToSpeech(...)` for voiceover.
 
-Cloudflare video
+Cloudflare video calls accept these common signatures:
+
+| Function | Signature |
+|----------|-----------|
+| `cf.video` / `cf.generateVideo` / `cf.imagine` / `cf.genVideo` | `(prompt, model?, aspect_ratio?, duration?, resolution?/quality?, generate_audio?, negative_prompt?, seed?, input?)` |
+| `cf.imageToVideo` | `(image, prompt, model?, aspect_ratio?, duration?, resolution?/quality?, generate_audio?, negative_prompt?, seed?, input?)` |
+| `cf.referenceToVideo` | `([images...], prompt, model?, aspect_ratio?, duration?, resolution?/quality?, generate_audio?, input?)` |
+| `cf.extendVideo` | `(video, prompt, model?, duration?, aspect_ratio?, resolution?/quality?, input?)` |
+
+Use `aspect_ratio: "9:16"` for vertical output, `"16:9"` for wide, `"1:1"` for square, or model-supported values such as `"4:3"`, `"3:4"`, and `"21:9"`. For Runway models, keep writing friendly `aspect_ratio` values; SceneRok translates them to Cloudflare's required `ratio` values such as `"720:1280"` for vertical Gen-4.5. Prefer explicit `model:` when choosing Cloudflare so the intended provider is clear.
 
 **Invalid (will not validate):**
 
@@ -450,8 +481,8 @@ scenerok secrets set ELEVENLABS_API_KEY=your-key
 
 ## Best Practices
 
-1. **Use dynamic timeblocks** — `[-]` auto-advances the cursor, reducing calculation errors
-2. **Use `prev` for offsets** — `[prev + 0.5s .. prev + 2s]` for gaps after previous content
+1. **Use dynamic timeblocks** — `[-]` auto-advances the relevant audio or visual cursor, reducing calculation errors
+2. **Use `prev` for offsets** — `[prev + 0.5s .. prev + 2s]` for gaps after previous content in the same channel
 3. **1080×1920 for vertical** (TikTok, Reels, Shorts), **1920×1080 for horizontal** (YouTube)
 4. **Hook viewers in the first 3 seconds** — place the most compelling content early
 5. **High-contrast text** — use `stroke` and `stroke_width` on text overlays over video
@@ -101,12 +101,32 @@ Do not overuse one plugin function. Choose deliberately:
 | Voiceover | `xai.tts`, `xai.textToSpeech` |
 | Music bed | `eleven.music`, `eleven.generateMusic`, `eleven.composeMusic` |
 
-Use `imageToVideo` for one extracted screenshot/product/logo image. Use `referenceToVideo` for 1-7 reference images when extracted objects should guide the generated shot. xAI `referenceToVideo` requires a prompt and duration must be <= 10s.
+Use `imageToVideo` for one extracted screenshot/product/logo image. Use `referenceToVideo` for 1-7 reference images when extracted objects should guide the generated shot. xAI `referenceToVideo` requires a prompt and duration must be <= 10s.
+
+For Cloudflare calls, always prefer explicit `model:` and use `aspect_ratio: "9:16"` for vertical, `"16:9"` for wide, `"1:1"` for square, or model-supported values such as `"4:3"`, `"3:4"`, and `"21:9"`. `cf.imageToVideo(image, prompt, model: ..., aspect_ratio: ..., duration: ...)` accepts `aspect_ratio`; for Runway Gen-4.5, SceneRok translates `"9:16"` to Cloudflare's required `ratio: "720:1280"`. Useful models include `pixverse/v6`, `vidu/q3-turbo`, `runwayml/gen-4.5`, `minimax/hailuo-2.3`, `bytedance/seedance-2.0`, `google/veo-3.1`, `google/veo-3.1-fast`, `google/veo-3-fast`, `black-forest-labs/flux-2-pro-preview`, and `openai/gpt-image-2`. Veo models accept `duration: 4`, `duration: 4s`, or `duration: "4s"` — all forms snap to the supported 4s/6s/8s durations.
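The Veo duration handling mentioned above could look roughly like the following Python sketch. The function name `snap_veo_duration` is hypothetical, and the snapping rule (nearest supported value, with ties going to the shorter clip) is an assumption; the source only states that all three spellings snap to 4s, 6s, or 8s. The unquoted VidScript literal `4s` is represented here as the string `"4s"`.

```python
SUPPORTED_VEO_DURATIONS = (4, 6, 8)  # seconds, as listed in the text above


def snap_veo_duration(value):
    """Normalize 4, "4", or "4s" to one of the supported durations in seconds."""
    if isinstance(value, str):
        text = value.strip()
        if text.endswith("s"):
            text = text[:-1]  # drop the trailing unit suffix
        seconds = float(text)
    else:
        seconds = float(value)
    # Assumption: pick the nearest supported value; on a tie prefer the shorter one.
    return min(SUPPORTED_VEO_DURATIONS, key=lambda d: (abs(d - seconds), d))


print(snap_veo_duration(4), snap_veo_duration("4s"), snap_veo_duration(7))  # 4 4 6
```

If the real compiler rounds up instead of to the nearest value, only the `key` function in this sketch would need to change.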
 
 xAI TTS voices are only `eve`, `ara`, `rex`, `sal`, and `leo`.
 
 For ElevenLabs music, import `@elevenlabs/music` and use `eleven.music(...)`, `eleven.generateMusic(...)`, or `eleven.composeMusic(...)`. Do not call `eleven.tts(...)`; ElevenLabs TTS is not registered in VidScript today. Ads, promos, tutorials, and reels should normally include either `xai.tts` voiceover, an ElevenLabs music bed, or both unless the user asks for silent output.
 
+## Rule 4.5: `[-]` uses separate audio and visual playheads
+
+Use `[-]` freely for generated visual sequences even when audio is already scheduled. VidScript advances audio and visual auto blocks independently:
+
+```vidscript
+import xai from "@scenerok/xai"
+import eleven from "@elevenlabs/music"
+
+input family_image = "https://cdn.example.com/family.png"
+let bed = eleven.music("Warm family music", duration: 15, instrumental: true)
+[0s .. 15s] = audio bed, volume: 0.35, fade_out: 2s
+[-] = video xai.imageToVideo(family_image, "Warm family moment, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", duration: 15)
+```
+
+The video starts at the visual playhead, usually `0s`; it is not pushed to `15s` by the audio bed. Audio blocks such as `[-] = audio xai.tts(...)` advance only the audio playhead. Visual blocks such as `[-] = video ...` and `[-] = text ...` advance only the visual playhead. A block containing both audio and visual instructions advances both.
+
+Use explicit ranges when you need exact synchronization, overlaps, or cuts. Do not avoid `[-]` solely because the script contains music or voiceover.
+
 ## Rule 5: Time ranges use `..`
 
 ```vidscript
@@ -123,7 +143,7 @@ Not `[0s - 5s]`.
 
 ## Rule 7: Quote safety in prompts
 
-
+Escape quote characters inside prompt strings, e.g. `xai.imagine("Package labeled \"RokMilk\", no text, no watermark")`.
 
 ## Rule 8: Audio syntax
 