@scenerok/cli 1.0.11 → 1.0.13

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@scenerok/cli",
3
- "version": "1.0.11",
3
+ "version": "1.0.13",
4
4
  "description": "SceneRok CLI - Create videos from your terminal and agent workflows",
5
5
  "type": "module",
6
6
  "bin": {
@@ -37,7 +37,7 @@ input logo = "/uploads/logo.png"
37
37
 
38
38
  **Time Blocks** - Define what happens during a time range:
39
39
  ```vidscript
40
- [-] = hero # auto-append: starts after previous block
40
+ [-] = hero # auto-append: starts at visual cursor
41
41
  hero.Trim(start: 0s, end: 5s)
42
42
 
43
43
  [- 3s] = text "Hello", style: title, color: "#FFF" # auto-start, 3s duration
@@ -45,6 +45,8 @@ hero.Trim(start: 0s, end: 5s)
45
45
  [prev + 0.5s .. prev + 2s] = filter "glow" # expression-based
46
46
  ```
47
47
 
48
+ `[-]` timing is channel-aware. Visual blocks (`video`, `text`, filters, visual plugin calls, bare visual inputs) advance a visual playhead. Audio blocks (`audio`, `xai.tts`, `eleven.music`) advance an audio playhead. A 15s music bed or multiple TTS blocks do not delay the next `[-] = video ...` block; that video starts from the visual playhead. Mixed blocks advance both playheads, and `prev` follows the relevant channel for the block being compiled.
49
+
48
50
  **Video Operations** - Modify how video plays:
49
51
  ```vidscript
50
52
  hero.Trim(start: 0s, end: 5s) # trim clip
@@ -80,35 +82,54 @@ output to "video.mp4", resolution: "1080x1920", fps: 30
80
82
  ### Package Imports
81
83
 
82
84
  ```vidscript
83
- import xai from "@scenerok/xai" # xAI image/video/tts
84
- import eleven from "@elevenlabs/music" # ElevenLabs generative music
85
- import motion from "@scenerok/basic-animations" # text/video animation helpers
85
+ import xai from "@scenerok/xai" # xAI image/video/tts
86
+ import cf from "@scenerok/cloudflare" # Cloudflare AI Gateway image/video models
87
+ import eleven from "@elevenlabs/music" # ElevenLabs generative music
88
+ import motion from "@scenerok/basic-animations" # text/video animation helpers
86
89
  ```
87
90
 
88
91
  ### Plugin Calls
89
92
 
90
93
  ```vidscript
91
94
  import xai from "@scenerok/xai"
95
+ import cf from "@scenerok/cloudflare"
92
96
  import eleven from "@elevenlabs/music"
93
97
  import motion from "@scenerok/basic-animations"
94
98
 
95
- [-] = video xai.imagine("Cinematic product shot, premium lighting, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", duration: 5)
99
+ [- 4s] = video xai.image("Premium product still on seamless studio background, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", resolution: "2k")
100
+ [-] = video xai.imagine("Cinematic product video, premium lighting, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", duration: 5)
96
101
  [-] = video xai.imageToVideo("https://cdn.example.com/product.png", "Slow premium camera move around the product, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", duration: 6)
97
102
  [-] = video xai.referenceToVideo(["https://cdn.example.com/product.png", "https://cdn.example.com/person.png"], "Lifestyle ad shot featuring the referenced product and person, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", duration: 8)
103
+ [-] = video xai.extendVideo("https://cdn.example.com/clip.mp4", "Camera pulls back to reveal the full premium setup, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", duration: 5)
98
104
  [-] = audio xai.tts("Welcome to SceneRok", voice: "eve")
99
105
 
106
+ # Cloudflare gives access to multiple image/video providers through one package.
107
+ [-] = video cf.video("High-energy fashion product shot, handheld camera feel, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", model: "pixverse/v6", aspect_ratio: "9:16", duration: 5, generate_audio: false)
108
+ [-] = video cf.imageToVideo("https://cdn.example.com/product.png", "Animate the product with realistic parallax and soft studio lighting, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", model: "runwayml/gen-4.5", aspect_ratio: "9:16", duration: 5)
109
+ [- 4s] = video cf.image("Clean editorial product still, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", model: "black-forest-labs/flux-2-pro-preview", aspect_ratio: "9:16")
110
+
100
111
  # ElevenLabs music package functions require an import alias.
101
112
  let bed = eleven.music("Warm premium launch bed", duration: 15, instrumental: true)
102
113
  [0s .. 15s] = audio bed, volume: 0.35, fade_out: 2s
103
114
  ```
104
115
 
105
- **Always** import the package that owns the function: `xai.*` from `@scenerok/xai`, `eleven.*` from `@elevenlabs/music`, and `motion.*` from `@scenerok/basic-animations`. Calls without import fail validation with `Unknown function 'xai.imagine'`.
116
+ **Always** import the package that owns the function: `xai.*` from `@scenerok/xai`, `cf.*` from `@scenerok/cloudflare`, `eleven.*` from `@elevenlabs/music`, and `motion.*` from `@scenerok/basic-animations`. Calls without import fail validation with `Unknown function 'xai.imagine'`.
117
+
118
+ Use the full visual toolkit, not only `xai.imagine()`. Generate a still first with `xai.image(...)`, `xai.generateImage(...)`, `cf.image(...)`, or `cf.generateImage(...)` when a clean key visual is enough. Use `xai.imageToVideo(image, prompt, ...)` or `cf.imageToVideo(image, prompt, ...)` when you have one strong product/screenshot/logo image to animate. Use `xai.referenceToVideo([images...], prompt, ...)` or `cf.referenceToVideo([images...], prompt, ...)` when extracted objects, screenshots, product photos, people, packaging, or brand elements should guide the generated scene. Use `xai.extendVideo(...)` or `cf.extendVideo(...)` when an existing generated/user clip should continue.
119
+
120
+ Available plugin APIs:
106
121
 
107
- Use the full xAI visual toolkit, not only `xai.imagine()`: use `xai.imageToVideo(image, prompt, ...)` when you have one strong product/screenshot/logo image to animate, and `xai.referenceToVideo([images...], prompt, ...)` when extracted objects, screenshots, product photos, people, packaging, or brand elements should guide the generated scene. Reference-to-video requires a prompt, accepts up to 7 reference images, and should stay at 10 seconds or less.
122
+ | Package | Functions | Typical use |
123
+ |---------|-----------|-------------|
124
+ | `@scenerok/xai` | `image`, `generateImage`, `imagine`, `genVideo`, `imageToVideo`, `referenceToVideo`, `extendVideo` | Grok stills, text-to-video, image/reference-guided video, clip extension |
125
+ | `@scenerok/xai` | `tts`, `textToSpeech`, `listVoices` | Spoken narration and voice lookup |
126
+ | `@scenerok/cloudflare` | `image`, `generateImage`, `imagine`, `genVideo`, `generateVideo`, `video`, `imageToVideo`, `referenceToVideo`, `extendVideo`, `listModels`, `models` | Alternate providers/models through Cloudflare AI Gateway, including PixVerse, Vidu, Runway, MiniMax, Seedance, Veo, FLUX, GPT Image, Recraft |
127
+ | `@elevenlabs/music` | `music`, `generateMusic`, `composeMusic` | Background music beds, theme music, instrumental loops |
128
+ | `@scenerok/basic-animations` | `fadeIn`, `fadeOut`, `slideX`, `slideY`, `popIn`, `riseIn`, `swingIn`, `glitchIn`, `float`, `typewriter` | Motion descriptors for `animate:` |
108
129
 
109
130
  xAI TTS voice IDs: `eve` for demos/announcements/upbeat content, `ara` for warm conversational narration, `rex` for business/tutorial delivery, `sal` for balanced general narration, and `leo` for authoritative instructional narration.
110
131
 
111
- For ElevenLabs music, import `@elevenlabs/music` and call `eleven.music(...)`, `eleven.generateMusic(...)`, or `eleven.composeMusic(...)`. Use `let bed = eleven.music(...)` followed by `audio bed, volume: ...` when you need volume or fades.
132
+ For ElevenLabs music, import `@elevenlabs/music` and call `eleven.music(...)`, `eleven.generateMusic(...)`, or `eleven.composeMusic(...)`. Use `let bed = eleven.music(...)` followed by `audio bed, volume: ...` when you need volume or fades. Do not call `eleven.tts(...)`; this repo currently exposes voiceover through `xai.tts(...)` / `xai.textToSpeech(...)`. Place voiceover as an `audio` block and do not leave ads silent unless the user asks for silent output.
112
133
 
113
134
  Read `vidscript-strict.md` for anti-patterns and `examples/system/*.vid` for copy-paste templates (installed with `scenerok skills install`).
114
135
 
@@ -127,10 +148,11 @@ input hero = "{{hero_clip}}"
127
148
  3. **Gather assets** — Video URLs, local paths, generated clips, and selective website screenshots/images; when the user provides a URL, browser screenshots are optional source material, not a requirement to build the whole video from screenshots
128
149
  4. **Probe asset type and dimensions** — Inspect every local or downloaded asset before placing it. Use `file` to identify image vs video, `ffprobe` for video/audio, `sips -g pixelWidth -g pixelHeight file` on macOS images, or `identify file` when ImageMagick is available.
129
150
  5. **Use images directly when useful** — VidScript `input` supports both videos and images. You can place a still image input with `video image_input`; the renderer detects the asset type and treats it as a static visual clip for the block duration.
130
- 6. **Compose VidScript** — Write the full script using measured dimensions, aspect-ratio-preserving scale, centered/safe-area placement, and entry/exit animation beats for each visual asset
131
- 7. **Validate** — Run `scenerok validate script.vid`
132
- 8. **Render** — Run `scenerok render script.vid --watch`
133
- 9. **Deliver** — Share the download URL when complete
151
+ 6. **Choose plugin modality deliberately** — still image, text-to-video, image-to-video, reference-to-video, video extension, TTS, and music are separate tools; use the one that matches the assets and story beat
152
+ 7. **Compose VidScript** — Write the full script using measured dimensions, aspect-ratio-preserving scale, centered/safe-area placement, entry/exit animation beats, and audio blocks when the video needs narration or a music bed
153
+ 8. **Validate** — Run `scenerok validate script.vid`
154
+ 9. **Render** — Run `scenerok render script.vid --watch`
155
+ 10. **Deliver** — Share the download URL when complete
134
156
 
135
157
  ## Website URL Asset Workflow
136
158
 
@@ -145,14 +167,14 @@ When the user provides a product, company, landing page, app store listing, or e
145
167
  7. Use captured assets as grounded proof points alongside other visual material. A good ad can combine product/logo/app screenshots, generated video clips, motion backgrounds, text primitives, and music.
146
168
  8. Scale assets from their measured aspect ratio to fit the output frame and safe areas. For 1080x1920 ads, keep primary visuals inside roughly 80-88% of frame width and reserve enough top/bottom room for text.
147
169
  9. Animate visual assets intentionally: use entry animations such as `motion.popIn`, `motion.riseIn`, or `motion.slideY`; use short exit fade/slide segments when a clean transition is needed.
148
- 10. Generative visuals are encouraged when useful. Choose among `xai.imagine` for text-to-video, `xai.imageToVideo` to animate one extracted image/screenshot/product photo, and `xai.referenceToVideo` to guide a generated scene with up to 7 extracted visual references.
170
+ 10. Generative visuals are encouraged when useful. Choose among still generation (`xai.image`, `xai.generateImage`, `cf.image`, `cf.generateImage`), text-to-video (`xai.imagine`, `xai.genVideo`, `cf.video`, `cf.generateVideo`), image-to-video (`xai.imageToVideo`, `cf.imageToVideo`), reference-to-video (`xai.referenceToVideo`, `cf.referenceToVideo`), and clip extension (`xai.extendVideo`, `cf.extendVideo`).
149
171
  11. For every generated visual prompt, explicitly require a clean scene with no text, no words, no letters, no captions, no logos, no watermarks, and no readable UI copy. AI video generation often corrupts text; all final titles, offers, captions, CTAs, prices, and labels must be created with VidScript `text` primitives.
150
172
  12. Do not invent product claims. Extract copy, pricing, feature names, social proof, and CTA language from the site or ask the user.
151
173
 
152
174
  ## Best Practices
153
175
 
154
- - **Use dynamic timeblocks** — `[-]` auto-advances the cursor, reducing calculation errors
155
- - **Use `prev` for offsets** — `[prev + 0.5s .. prev + 2s]` for gaps between content
176
+ - **Use dynamic timeblocks** — `[-]` auto-advances the relevant audio or visual cursor, reducing calculation errors
177
+ - **Use `prev` for offsets** — `[prev + 0.5s .. prev + 2s]` for gaps between content in the same channel
156
178
  - **Named arguments for clarity** — `hero.Trim(start: 0s, end: 5s)` over `hero.Trim(0s, 5s)`
157
179
  - **Use website assets judiciously for URL-based ads** — browser screenshots, product images, app screenshots, and logos can ground the ad, but they are optional ingredients, not the whole recipe
158
180
  - **Avoid full-page screenshot backgrounds** — use above-fold, cropped, or focused screenshots that remain legible at video size; never rely on long website screenshots where the text becomes unreadable
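
The scaling guidance in step 8 above (fit from the measured aspect ratio, keep primary visuals within roughly 80-88% of frame width on 1080x1920 output) can be sketched as a small helper. This is a minimal illustration, not part of the SceneRok CLI: the function name `fit_in_safe_area`, the 0.85 width fraction, the 0.6 height cap, and the assumption that `x`/`y` are top-left pixel offsets are all hypothetical choices made for the example.

```python
def fit_in_safe_area(src_w, src_h, frame_w=1080, frame_h=1920,
                     max_width_frac=0.85, max_height_frac=0.6):
    """Scale a measured asset into a centered safe area, preserving aspect ratio.

    ASSUMPTIONS (not from the SceneRok docs): the height cap of 60% and the
    interpretation of x/y as top-left offsets are illustrative only.
    """
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")

    # Largest box the asset may occupy inside the frame.
    max_w = frame_w * max_width_frac
    max_h = frame_h * max_height_frac

    # A single scale factor for both axes keeps the aspect ratio intact.
    scale = min(max_w / src_w, max_h / src_h)
    width = round(src_w * scale)
    height = round(src_h * scale)

    # Center the scaled asset in the frame.
    x = (frame_w - width) // 2
    y = (frame_h - height) // 2
    return {"width": width, "height": height, "x": x, "y": y}


# Example: a 1600x900 screenshot measured with ffprobe, sips, or identify.
box = fit_in_safe_area(1600, 900)
print(box)
```

The resulting numbers would then be passed as explicit `width`, `height`, `x`, and `y` values when placing the asset; a wide screenshot ends up limited by the width fraction, while a tall one is limited by the height cap.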
@@ -4,22 +4,25 @@ VidScript is a declarative DSL for composing short-form videos. Write a script,
4
4
 
5
5
  ## Agent quick start (read before composing)
6
6
 
7
- 1. **Import every package function** — use `import xai from "@scenerok/xai"`, `import eleven from "@elevenlabs/music"`, and `import motion from "@scenerok/basic-animations"` before calling `xai.*`, `eleven.*`, or `motion.*`.
7
+ 1. **Import every package function** — use `import xai from "@scenerok/xai"`, `import cf from "@scenerok/cloudflare"`, `import eleven from "@elevenlabs/music"`, and `import motion from "@scenerok/basic-animations"` before calling `xai.*`, `cf.*`, `eleven.*`, or `motion.*`.
8
8
  2. **Validate early** — `scenerok validate file.vid` (errors show as `Line N:C — message`).
9
9
  3. **Music package syntax** — `import eleven from "@elevenlabs/music"`, then call `eleven.music(...)`, `eleven.generateMusic(...)`, or `eleven.composeMusic(...)`.
10
10
  4. **Copy working examples** — see `examples/system/` in this skill folder (e.g. `minimal-text-reel.vid`, `ecom-product-launch.vid`, `rokmilk-chocolate-promo.vid`).
11
11
  5. **Use URL assets selectively** — if the user gives a website or product URL and browser screenshot tools are available, you can capture above-fold screenshots, product images, app screenshots, and logos for VidScript `input` assets. Do not treat screenshots as mandatory or as the only visual source.
12
12
  6. **Probe asset type and dimensions** — before placing real assets, identify image vs video with `file`; inspect width, height, and duration with `ffprobe`, `sips`, or `identify`; then scale from the measured aspect ratio and set explicit `width`, `height`, `x`, and `y`.
13
13
  7. **Use still images directly** — `input` supports videos and images. Place still screenshots, logos, and product photos with `video image_input`; the renderer treats images as static clips for the block duration.
14
- 8. **Use the full xAI visual toolkit** — choose `xai.imagine` for text-to-video, `xai.imageToVideo` for animating one extracted image/screenshot/product photo, and `xai.referenceToVideo` for guiding a scene with up to 7 extracted references.
15
- 9. **Generated media has no text** — prompts for AI-generated images/videos must explicitly ask for no text, no words, no letters, no captions, no logos, no watermarks, and no readable UI copy. Use VidScript `text` primitives for all final copy.
16
- 10. **Strict rules** — see `vidscript-strict.md` for invalid patterns (bare `xai.imagine`, bare `eleven.music(...)` without import, trailing params after direct plugin calls, nested quotes in prompts).
14
+ 8. **Use the full media plugin toolkit** — choose still generation, text-to-video, image-to-video, reference-to-video, video extension, TTS, and music deliberately. Do not default to only `xai.imagine`.
15
+ 9. **Include audio when useful** — use `xai.tts` / `xai.textToSpeech` for narration and `eleven.music` / `eleven.generateMusic` / `eleven.composeMusic` for music beds.
16
+ 10. **Generated media has no text** — prompts for AI-generated images/videos must explicitly ask for no text, no words, no letters, no captions, no logos, no watermarks, and no readable UI copy. Use VidScript `text` primitives for all final copy.
17
+ 11. **Channel-aware `[-]` timing** — audio and visual auto blocks have separate playheads. A music bed or TTS sequence does not delay the next `[-] = video ...` block.
18
+ 12. **Strict rules** — see `vidscript-strict.md` for invalid patterns (bare `xai.imagine`, bare `cf.video(...)`, bare `eleven.music(...)` without import, trailing params after direct plugin calls, nested quotes in prompts).
17
19
 
18
20
  ### Common validation failures
19
21
 
20
22
  | Symptom | Fix |
21
23
  |---------|-----|
22
24
  | `Unknown function 'xai.imagine'` | Add `import xai from "@scenerok/xai"` at top |
25
+ | `Unknown function 'cf.video'` | Add `import cf from "@scenerok/cloudflare"` at top |
23
26
  | `Unknown function 'fadeIn'` | Use `import motion from "@scenerok/basic-animations"` and call `motion.fadeIn(...)` |
24
27
  | `Expected ... but "," found` | Do not put `, volume:` after a direct plugin call. Use `let bed = eleven.music(...)`, then `audio bed, volume: ...` |
25
28
  | `Unknown function 'eleven.generateMusic'` | Add `import eleven from "@elevenlabs/music"` before calling `eleven.generateMusic(...)` |
@@ -112,6 +115,7 @@ Only default package imports are supported for callable packages. Call functions
112
115
 
113
116
  ```vidscript
114
117
  import xai from "@scenerok/xai"
118
+ import cf from "@scenerok/cloudflare"
115
119
  import eleven from "@elevenlabs/music"
116
120
  import motion from "@scenerok/basic-animations"
117
121
  ```
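
Because every `xai.*`, `cf.*`, `eleven.*`, and `motion.*` call needs its default import, a quick pre-flight check can catch the `Unknown function '...'` class of failures before running `scenerok validate`. The sketch below is a hypothetical helper, not the validator shipped with the CLI: it assumes the four alias names used throughout these docs and relies on simple regular expressions, so a call mentioned inside a comment or a prompt string would also be flagged.

```python
import re
from typing import List

# Alias -> package, taken from the import lines shown in this document.
PACKAGES = {
    "xai": "@scenerok/xai",
    "cf": "@scenerok/cloudflare",
    "eleven": "@elevenlabs/music",
    "motion": "@scenerok/basic-animations",
}

# Matches: import <alias> from "<package>"
IMPORT_RE = re.compile(r'^\s*import\s+(\w+)\s+from\s+"([^"]+)"', re.MULTILINE)
# Matches: <alias>.<function>(   for the known aliases only
CALL_RE = re.compile(r"\b(" + "|".join(PACKAGES) + r")\.(\w+)\s*\(")


def missing_imports(script: str) -> List[str]:
    """Return one message per package call whose alias was never imported."""
    imported = {alias for alias, _pkg in IMPORT_RE.findall(script)}
    problems = []
    seen = set()
    for alias, fn in CALL_RE.findall(script):
        if alias in imported or (alias, fn) in seen:
            continue
        seen.add((alias, fn))
        hint = 'import %s from "%s"' % (alias, PACKAGES[alias])
        problems.append("Unknown function '%s.%s' (add: %s)" % (alias, fn, hint))
    return problems


script = '''import xai from "@scenerok/xai"
[-] = video xai.imagine("Prompt", duration: 5)
[-] = video cf.video("Prompt", model: "pixverse/v6")
'''
problems = missing_imports(script)
print(problems)
```

Here only the `cf.video` call is reported, since `xai` is imported and `cf` is not. The real validator remains the source of truth; this only mirrors the import rule stated above.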
@@ -125,12 +129,26 @@ Time blocks are the core of VidScript. They define when instructions execute on
125
129
  ### Dynamic Playhead (recommended)
126
130
 
127
131
  ```vidscript
128
- [-] = hero # auto-append: starts after previous block
132
+ [-] = hero # auto-append: starts at visual cursor
129
133
  [- 3s] = text "Title", size: 72 # auto-start, lasts 3 seconds
130
134
  [- 2.5s] = filter "glow", intensity: 0.8
131
135
  ```
132
136
 
133
- The playhead cursor (`prev`) tracks where the timeline is. Each `[-]` block advances the cursor by the block's content duration. `[- duration]` advances by an explicit duration.
137
+ VidScript keeps separate auto-playhead cursors for visual and audio content. Visual blocks (`video`, `text`, filters, shaders, visual plugin calls, and bare visual inputs) advance the visual cursor. Audio blocks (`audio`, `xai.tts`, `eleven.music`, and other audio plugin calls) advance the audio cursor. `[- duration]` advances the relevant cursor by the explicit duration.
138
+
139
+ This means a long audio bed or voiceover sequence does not push a following auto video block to the end of the audio:
140
+
141
+ ```vidscript
142
+ import xai from "@scenerok/xai"
143
+ import eleven from "@elevenlabs/music"
144
+
145
+ input product = "https://cdn.example.com/product.png"
146
+ let bed = eleven.music("Warm music bed", duration: 15, instrumental: true)
147
+ [0s .. 15s] = audio bed, volume: 0.35
148
+ [-] = video xai.imageToVideo(product, "Slow premium camera move", aspect_ratio: "9:16", duration: 6)
149
+ ```
150
+
151
+ The generated video above starts at the current visual cursor, usually `0s`, even though the audio cursor already reaches `15s`. Mixed blocks containing both audio and visual instructions advance both cursors. The `prev` keyword resolves against the cursor for the block being compiled; for mixed timing it uses the current combined timeline position.
134
152
 
135
153
  ### Explicit Range
136
154
 
@@ -154,7 +172,7 @@ frame 90 # 90 frames at 30fps = 3 seconds
154
172
 
155
173
  ### The `prev` Keyword
156
174
 
157
- `prev` refers to the current playhead position, that is, the end time of the previous block.
175
+ `prev` refers to the current channel-aware playhead position for the block being compiled. In a visual block it follows the visual cursor; in an audio block it follows the audio cursor.
158
176
 
159
177
  ```vidscript
160
178
  [- 2s] = text "Hello" # plays from cursor to cursor+2s
@@ -268,14 +286,22 @@ Audio sources can be MP3 inputs, audio plugin results, or the embedded audio str
268
286
 
269
287
  ```vidscript
270
288
  import xai from "@scenerok/xai"
289
+ import cf from "@scenerok/cloudflare"
271
290
  import eleven from "@elevenlabs/music"
272
291
  import motion from "@scenerok/basic-animations"
273
292
 
274
- [-] = video xai.imagine("Cinematic product shot, premium lighting, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", duration: 5)
293
+ [- 4s] = video xai.image("Premium product still on seamless studio background, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", resolution: "2k")
294
+ [-] = video xai.imagine("Cinematic product video, premium lighting, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", resolution: "720p", duration: 5)
275
295
  [-] = video xai.imageToVideo("https://cdn.example.com/product.png", "Slow premium camera move around the product, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", duration: 6)
276
296
  [-] = video xai.referenceToVideo(["https://cdn.example.com/product.png", "https://cdn.example.com/person.png"], "Lifestyle ad shot featuring the referenced product and person, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", duration: 8)
297
+ [-] = video xai.extendVideo("https://cdn.example.com/clip.mp4", "Camera pulls back to reveal the full premium setup, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", duration: 5)
277
298
  [-] = audio xai.tts("Welcome to SceneRok", voice: "eve")
278
299
 
300
+ # Cloudflare AI Gateway package.
301
+ [-] = video cf.video("High-energy fashion product shot, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", model: "pixverse/v6", aspect_ratio: "9:16", duration: 5, generate_audio: false)
302
+ [-] = video cf.imageToVideo("https://cdn.example.com/product.png", "Animate the product with realistic parallax, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", model: "runwayml/gen-4.5", aspect_ratio: "9:16", duration: 5)
303
+ [- 4s] = video cf.image("Clean editorial product still, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", model: "black-forest-labs/flux-2-pro-preview", aspect_ratio: "9:16")
304
+
279
305
  # ElevenLabs music package functions require an import alias.
280
306
  [0s .. 15s] = audio eleven.music("Warm premium launch bed", duration: 15, instrumental: true)
281
307
 
@@ -284,23 +310,37 @@ let bed = eleven.music("Warm premium launch bed", duration: 15, instrumental: tr
284
310
  [0s .. 15s] = audio bed, volume: 0.35, fade_out: 2s
285
311
  ```
286
312
 
287
- `@scenerok/xai` requires an explicit default import. ElevenLabs music validates as `eleven.music(...)`, `eleven.generateMusic(...)`, or `eleven.composeMusic(...)` after `import eleven from "@elevenlabs/music"`. Plugin calls run at compile time; direct media plugin instructions cannot have trailing audio params, so use a `let` binding when you need `volume`, `fade_in`, or `fade_out`.
313
+ `@scenerok/xai`, `@scenerok/cloudflare`, and `@elevenlabs/music` require explicit default imports. ElevenLabs music validates as `eleven.music(...)`, `eleven.generateMusic(...)`, or `eleven.composeMusic(...)` after `import eleven from "@elevenlabs/music"`. Plugin calls run at compile time; direct media plugin instructions cannot have trailing audio params, so use a `let` binding when you need `volume`, `fade_in`, or `fade_out`.
288
314
 
289
- xAI visual functions:
315
+ ### Media Plugin API Reference
290
316
 
291
- | Function | Use when |
292
- |----------|----------|
293
- | `xai.imagine(prompt, ...)` | You need text-to-video from a clean prompt. |
294
- | `xai.imageToVideo(image, prompt, ...)` | You have one source image, screenshot, logo, product photo, data URI, or file id to animate. |
295
- | `xai.referenceToVideo([images...], prompt, ...)` | You have 1-7 reference images that should influence the generated scene without becoming the first frame. Requires a prompt; keep duration <= 10s. |
317
+ | Package | Function | Use when |
318
+ |---------|----------|----------|
319
+ | `@scenerok/xai` | `xai.image(prompt, ...)`, `xai.generateImage(prompt, ...)` | Generate a still image clip. Use `resolution: "1k"` or `"2k"` and `duration` for how long the still stays on the timeline. |
320
+ | `@scenerok/xai` | `xai.imagine(prompt, ...)`, `xai.genVideo(prompt, ...)` | Generate video from text. Use video resolutions such as `"480p"` or `"720p"`. |
321
+ | `@scenerok/xai` | `xai.imageToVideo(image, prompt, ...)` | Animate one source image, screenshot, logo, product photo, data URI, file id, or `{ url }` object. |
322
+ | `@scenerok/xai` | `xai.referenceToVideo([images...], prompt, ...)` | Guide a generated scene with 1-7 reference images. Requires a prompt; keep duration <= 10s. |
323
+ | `@scenerok/xai` | `xai.extendVideo(video, prompt, ...)` | Continue an existing video clip by up to 10s. |
324
+ | `@scenerok/xai` | `xai.tts(text, ...)`, `xai.textToSpeech(text, ...)` | Generate spoken narration. Voices: `eve`, `ara`, `rex`, `sal`, `leo`. |
325
+ | `@scenerok/cloudflare` | `cf.image(prompt, ...)`, `cf.generateImage(prompt, ...)` | Generate stills through Cloudflare AI Gateway image models such as `black-forest-labs/flux-2-pro-preview`, `openai/gpt-image-2`, `xai/grok-imagine-image-quality`, or `recraft/recraftv4-pro`. |
326
+ | `@scenerok/cloudflare` | `cf.video(prompt, ...)`, `cf.imagine(prompt, ...)`, `cf.genVideo(prompt, ...)`, `cf.generateVideo(prompt, ...)` | Generate text-to-video through models such as `pixverse/v6`, `vidu/q3-turbo`, `runwayml/gen-4.5`, `minimax/hailuo-2.3`, `bytedance/seedance-2.0`, or `google/veo-3.1`. |
327
+ | `@scenerok/cloudflare` | `cf.imageToVideo(image, prompt, ...)`, `cf.referenceToVideo([images...], prompt, ...)`, `cf.extendVideo(video, prompt, ...)` | Use Cloudflare-backed image-guided, reference-guided, or extension modes when the selected model supports them. |
328
+ | `@scenerok/cloudflare` | `cf.listModels()`, `cf.models()` | Return model metadata as data; useful for exploration, not for timeline media. |
329
+ | `@elevenlabs/music` | `eleven.music(prompt, ...)`, `eleven.generateMusic(prompt, ...)`, `eleven.composeMusic(prompt, ...)` | Generate background music. Use a `let` binding if you need `volume`, `fade_in`, or `fade_out`. |
296
330
 
297
331
  xAI TTS voices: `eve` for upbeat demos/announcements, `ara` for warm conversational narration, `rex` for business/tutorial content, `sal` for balanced general delivery, and `leo` for authoritative instruction.
298
332
 
333
+ Do not call `eleven.tts(...)`; the registered ElevenLabs package is `@elevenlabs/music` for music beds only. Use `xai.tts(...)` or `xai.textToSpeech(...)` for voiceover.
334
+
335
+ Cloudflare video supports common named params such as `model`, `aspect_ratio`, `resolution`, `quality`, `duration`, `negative_prompt`, `seed`, `generate_audio`, `image`, `reference_images`, `reference_video`, and `input` for model-specific overrides. Prefer explicit `model:` when choosing Cloudflare so the intended provider is clear.
336
+
299
337
  **Invalid (will not validate):**
300
338
 
301
339
  ```vidscript
302
340
  [-] = video xai.imagine("...") # missing xAI import
341
+ [0s .. 5s] = video cf.video("...") # missing Cloudflare import
303
342
  [0s .. 15s] = audio eleven.generateMusic("...") # missing `import eleven from "@elevenlabs/music"`
343
+ [0s .. 5s] = audio eleven.tts("...") # ElevenLabs TTS is not registered; use xai.tts
304
344
  [0s .. 15s] = audio eleven.music("..."), volume: 0.5 # trailing params after direct plugin call
305
345
  ```
306
346
 
@@ -425,8 +465,8 @@ scenerok secrets set ELEVENLABS_API_KEY=your-key
425
465
 
426
466
  ## Best Practices
427
467
 
428
- 1. **Use dynamic timeblocks** — `[-]` auto-advances the cursor, reducing calculation errors
429
- 2. **Use `prev` for offsets** — `[prev + 0.5s .. prev + 2s]` for gaps after previous content
468
+ 1. **Use dynamic timeblocks** — `[-]` auto-advances the relevant audio or visual cursor, reducing calculation errors
469
+ 2. **Use `prev` for offsets** — `[prev + 0.5s .. prev + 2s]` for gaps after previous content in the same channel
430
470
  3. **1080×1920 for vertical** (TikTok, Reels, Shorts), **1920×1080 for horizontal** (YouTube)
431
471
  4. **Hook viewers in the first 3 seconds** — place the most compelling content early
432
472
  5. **High-contrast text** — use `stroke` and `stroke_width` on text overlays over video
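
The channel-aware behaviour of `[-]` and `prev` described in these best practices can be modeled with two independent cursors. The following is a toy model for reasoning about timing, not the compiler's implementation. Two points are assumptions made for the sketch: that an explicit range moves the relevant cursor to the range's end (consistent with the docs' statement that the audio cursor reaches `15s` after a `[0s .. 15s]` music bed), and that a mixed block starts at the later of the two cursors.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Playheads:
    """Toy model of separate visual and audio cursors, in seconds."""

    visual: float = 0.0
    audio: float = 0.0

    def place(self, channel: str, duration: float,
              start: Optional[float] = None) -> Tuple[float, float]:
        """Place a block and return its (start, end).

        channel is "visual", "audio", or "mixed". With start=None the block
        behaves like `[-]` / `[- duration]` and begins at the relevant cursor.
        """
        if channel not in ("visual", "audio", "mixed"):
            raise ValueError("unknown channel: %r" % channel)
        if start is None:
            if channel == "visual":
                start = self.visual
            elif channel == "audio":
                start = self.audio
            else:
                # ASSUMPTION: mixed blocks start at the later cursor.
                start = max(self.visual, self.audio)
        end = start + duration
        # Advance only the cursor(s) that belong to this block's channel.
        if channel in ("visual", "mixed"):
            self.visual = end
        if channel in ("audio", "mixed"):
            self.audio = end
        return (start, end)


t = Playheads()
bed = t.place("audio", 15.0, start=0.0)  # [0s .. 15s] = audio bed
clip = t.place("visual", 6.0)            # [-] = video ... (6s clip)
title = t.place("visual", 3.0)           # [- 3s] = text "..."
voice = t.place("audio", 2.0)            # [-] = audio xai.tts(...) (2s, hypothetical length)
print(bed, clip, title, voice)
```

The 15-second bed leaves the visual cursor at `0s`, so the clip starts at the beginning of the timeline, while the voiceover queues after the bed on the audio channel. If narration should overlap the music instead, an explicit range is the safer choice.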
@@ -6,11 +6,12 @@ Follow these rules exactly. When in doubt, copy a file from `examples/system/` (
6
6
 
7
7
  ```vidscript
8
8
  import xai from "@scenerok/xai"
9
+ import cf from "@scenerok/cloudflare"
9
10
  import eleven from "@elevenlabs/music"
10
11
  import motion from "@scenerok/basic-animations"
11
12
  ```
12
13
 
13
- Without the owning import, calls like `xai.imagine`, `xai.tts`, `eleven.music`, and `motion.fadeIn` fail validation.
14
+ Without the owning import, calls like `xai.imagine`, `cf.video`, `xai.tts`, `eleven.music`, and `motion.fadeIn` fail validation.
14
15
 
15
16
  **Never** call package functions without importing the package alias first.
16
17
 
@@ -20,7 +21,7 @@ When the user gives a product, company, ecommerce, landing page, or app URL and
20
21
 
21
22
  Do not use full-page or very long website screenshots as video backgrounds; their text is usually illegible at output size. Crop or frame screenshots to the specific visual proof point, and use them sparingly.
22
23
 
23
- Website screenshots are optional source material, not a requirement to build the whole video from screenshots. You may combine them with `xai.imagine`, `xai.imageToVideo`, `xai.referenceToVideo`, or other generative video plugins for lifestyle scenes, motion backgrounds, transitions, or product context. Do not invent claims; use website copy or ask the user.
24
+ Website screenshots are optional source material, not a requirement to build the whole video from screenshots. You may combine them with still generation, text-to-video, image-to-video, reference-to-video, or extension plugins for lifestyle scenes, motion backgrounds, transitions, or product context. Do not invent claims; use website copy or ask the user.
24
25
 
25
26
  ## Rule 2.5: Probe real asset type and dimensions before placement
26
27
 
@@ -73,20 +74,56 @@ Invalid: asking the generator for signs, labels, packaging text, app UI text, ti
73
74
 
74
75
  ```vidscript
  import xai from "@scenerok/xai"
+ import cf from "@scenerok/cloudflare"
  import eleven from "@elevenlabs/music"
 
+ [- 4s] = video xai.image("Prompt here, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", resolution: "2k")
  [-] = video xai.imagine("Prompt here, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", duration: 5)
  [-] = video xai.imageToVideo("https://cdn.example.com/product.png", "Slow camera move around the product, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", duration: 6)
  [-] = video xai.referenceToVideo(["https://cdn.example.com/product.png", "https://cdn.example.com/person.png"], "Lifestyle scene using the referenced product and person, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", duration: 8)
+ [-] = video xai.extendVideo("https://cdn.example.com/clip.mp4", "Continue the camera move, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", duration: 5)
  [-] = audio xai.tts("Voiceover line", voice: "eve")
+
+ [-] = video cf.video("Prompt here, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", model: "pixverse/v6", aspect_ratio: "9:16", duration: 5, generate_audio: false)
+ [-] = video cf.imageToVideo("https://cdn.example.com/product.png", "Animate the image, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", model: "runwayml/gen-4.5", aspect_ratio: "9:16", duration: 5)
  [0s .. 15s] = audio eleven.music("Warm premium music bed", duration: 15, instrumental: true)
  ```
 
- Use `xai.imageToVideo` for one extracted screenshot/product/logo image. Use `xai.referenceToVideo` for 1-7 reference images when extracted objects should guide the generated shot; it requires a prompt and duration must be <= 10s.
+ Do not overuse one plugin function. Choose deliberately:
+
+ | Need | Use |
+ |------|-----|
+ | Still image clip | `xai.image`, `xai.generateImage`, `cf.image`, `cf.generateImage` |
+ | Text-to-video | `xai.imagine`, `xai.genVideo`, `cf.video`, `cf.imagine`, `cf.generateVideo` |
+ | Animate one image/screenshot/product photo | `xai.imageToVideo`, `cf.imageToVideo` |
+ | Guide with 1-7 reference images | `xai.referenceToVideo`, `cf.referenceToVideo` |
+ | Continue a clip | `xai.extendVideo`, `cf.extendVideo` |
+ | Voiceover | `xai.tts`, `xai.textToSpeech` |
+ | Music bed | `eleven.music`, `eleven.generateMusic`, `eleven.composeMusic` |
+
+ Use `imageToVideo` for one extracted screenshot/product/logo image. Use `referenceToVideo` for 1-7 reference images when extracted objects should guide the generated shot. xAI `referenceToVideo` requires a prompt and duration must be <= 10s. Prefer explicit `model:` for Cloudflare calls, e.g. `pixverse/v6`, `vidu/q3-turbo`, `runwayml/gen-4.5`, `minimax/hailuo-2.3`, `bytedance/seedance-2.0`, `google/veo-3.1`, `black-forest-labs/flux-2-pro-preview`, or `openai/gpt-image-2`.
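The xAI `referenceToVideo` constraints stated above (1-7 reference images, a required prompt, duration of at most 10s) can be expressed as a simple argument check. This is a sketch under assumptions: `ReferenceToVideoArgs` and `checkXaiReferenceToVideo` are hypothetical names, and the document does not say whether the same limits apply to `cf.referenceToVideo`.

```typescript
// Hypothetical argument shape for illustration; not the plugin's real types.
interface ReferenceToVideoArgs {
  references: string[]; // reference image URLs
  prompt: string;
  duration: number; // seconds
}

// Returns a list of rule violations; an empty list means the call looks valid.
function checkXaiReferenceToVideo(args: ReferenceToVideoArgs): string[] {
  const problems: string[] = [];
  if (args.references.length < 1 || args.references.length > 7) {
    problems.push("expected 1-7 reference images");
  }
  if (args.prompt.trim() === "") {
    problems.push("a prompt is required");
  }
  if (args.duration > 10) {
    problems.push("duration must be <= 10s");
  }
  return problems;
}

console.log(
  checkXaiReferenceToVideo({
    references: ["https://cdn.example.com/product.png"],
    prompt: "Lifestyle scene using the referenced product",
    duration: 12,
  }),
);
```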
 
  xAI TTS voices are only `eve`, `ara`, `rex`, `sal`, and `leo`.
 
- For ElevenLabs music, import `@elevenlabs/music` and use `eleven.music(...)`, `eleven.generateMusic(...)`, or `eleven.composeMusic(...)`.
+ For ElevenLabs music, import `@elevenlabs/music` and use `eleven.music(...)`, `eleven.generateMusic(...)`, or `eleven.composeMusic(...)`. Do not call `eleven.tts(...)`; ElevenLabs TTS is not registered in VidScript today. Ads, promos, tutorials, and reels should normally include either `xai.tts` voiceover, an ElevenLabs music bed, or both unless the user asks for silent output.
+
+ ## Rule 4.5: `[-]` uses separate audio and visual playheads
+
+ Use `[-]` freely for generated visual sequences even when audio is already scheduled. VidScript advances audio and visual auto blocks independently:
+
+ ```vidscript
+ import xai from "@scenerok/xai"
+ import eleven from "@elevenlabs/music"
+
+ input family_image = "https://cdn.example.com/family.png"
+ let bed = eleven.music("Warm family music", duration: 15, instrumental: true)
+ [0s .. 15s] = audio bed, volume: 0.35, fade_out: 2s
+ [-] = video xai.imageToVideo(family_image, "Warm family moment, no text, no words, no letters, no captions, no logos, no watermark, no readable UI copy", aspect_ratio: "9:16", duration: 15)
+ ```
+
+ The video starts at the visual playhead, usually `0s`; it is not pushed to `15s` by the audio bed. Audio blocks such as `[-] = audio xai.tts(...)` advance only the audio playhead. Visual blocks such as `[-] = video ...` and `[-] = text ...` advance only the visual playhead. A block containing both audio and visual instructions advances both.
+
+ Use explicit ranges when you need exact synchronization, overlaps, or cuts. Do not avoid `[-]` solely because the script contains music or voiceover.
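The separate playheads described in Rule 4.5 can be modeled roughly as follows. This is an illustrative sketch, not the compiler's implementation: `Channel`, `AutoBlock`, and `placeAutoBlocks` are hypothetical names, only `[-]` blocks are modeled, and starting a mixed block at the later of the two playheads is an assumption the rule text does not confirm.

```typescript
// Hypothetical model of channel-aware `[-]` timing; not the real compiler.
type Channel = "visual" | "audio" | "mixed";

interface AutoBlock {
  channel: Channel;
  duration: number; // seconds
}

interface PlacedBlock extends AutoBlock {
  start: number;
  end: number;
}

function placeAutoBlocks(blocks: AutoBlock[]): PlacedBlock[] {
  let visual = 0; // visual playhead, in seconds
  let audio = 0; // audio playhead, in seconds
  return blocks.map((block) => {
    // ASSUMPTION: a mixed block starts once both channels are free.
    const start =
      block.channel === "visual"
        ? visual
        : block.channel === "audio"
          ? audio
          : Math.max(visual, audio);
    const end = start + block.duration;
    // Each block advances only the playhead(s) of its own channel.
    if (block.channel !== "audio") visual = end;
    if (block.channel !== "visual") audio = end;
    return { ...block, start, end };
  });
}

const placed = placeAutoBlocks([
  { channel: "audio", duration: 15 }, // music bed
  { channel: "visual", duration: 15 }, // generated video
  { channel: "visual", duration: 3 }, // text overlay
]);
console.log(placed.map((p) => `${p.channel} ${p.start}s..${p.end}s`));
```

In this model the first visual block starts at 0s even though a 15s audio block precedes it, and the second visual block follows at 15s, which matches the behavior the rule describes.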
 
  ## Rule 5: Time ranges use `..`
 
@@ -110,6 +147,7 @@ Use only straight `"` inside `xai.imagine("...")`. Do not nest quotes inside the
 
  ```vidscript
  import xai from "@scenerok/xai"
+ import cf from "@scenerok/cloudflare"
  import eleven from "@elevenlabs/music"
 
  [-] = audio xai.tts("Spoken line", voice: "eve", speed: 1.0)
@@ -126,6 +164,10 @@ let bed = eleven.music("Soft premium ad music", duration: 15, instrumental: true
 
  **Invalid:** `eleven.generateMusic(...)` without `import eleven from "@elevenlabs/music"`.
 
+ **Invalid:** `eleven.tts(...)` — not a registered function; use `xai.tts(...)`.
+
+ **Invalid:** `cf.video(...)` without `import cf from "@scenerok/cloudflare"`.
+
  ## Rule 9: Filters
 
  ```vidscript