@renoise/video-maker 0.1.0
- package/.claude-plugin/plugin.json +5 -0
- package/README.md +50 -0
- package/hooks/hooks.json +16 -0
- package/hooks/session-start.sh +17 -0
- package/lib/gemini.ts +49 -0
- package/package.json +22 -0
- package/skills/director/SKILL.md +272 -0
- package/skills/director/references/narrative-pacing.md +257 -0
- package/skills/director/references/style-library.md +179 -0
- package/skills/product-sheet-generate/SKILL.md +75 -0
- package/skills/renoise-gen/SKILL.md +362 -0
- package/skills/renoise-gen/references/api-endpoints.md +138 -0
- package/skills/renoise-gen/references/video-capabilities.md +524 -0
- package/skills/renoise-gen/renoise-cli.mjs +723 -0
- package/skills/scene-generate/SKILL.md +52 -0
- package/skills/short-film-editor/SKILL.md +479 -0
- package/skills/short-film-editor/examples/mystery-package-4shot.md +260 -0
- package/skills/short-film-editor/references/continuity-guide.md +170 -0
- package/skills/short-film-editor/scripts/analyze-beats.py +271 -0
- package/skills/short-film-editor/scripts/batch-generate.sh +150 -0
- package/skills/short-film-editor/scripts/generate-storyboard-html.ts +714 -0
- package/skills/short-film-editor/scripts/split-grid.sh +70 -0
- package/skills/tiktok-content-maker/SKILL.md +143 -0
- package/skills/tiktok-content-maker/examples/dress-demo.md +86 -0
- package/skills/tiktok-content-maker/references/ecom-prompt-guide.md +261 -0
- package/skills/tiktok-content-maker/scripts/analyze-images.ts +122 -0
- package/skills/video-download/SKILL.md +161 -0
- package/skills/video-download/scripts/download-video.sh +91 -0
@@ -0,0 +1,524 @@
# renoise-2.0 Video Model Capabilities

## Model Specs

| Parameter | Value |
|-----------|-------|
| Model name | `renoise-2.0` |
| Min duration | 5 seconds |
| Max duration | 15 seconds |
| Duration options | Any integer from 5-15s |
| Resolution | Up to 1080p |
| Aspect ratio | `1:1`, `16:9`, `9:16` |
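
The limits in the table can be checked before a request is sent. The following is a minimal sketch in Python; `validate_request` and the constant names are hypothetical helpers for illustration and are not part of `renoise-cli.mjs`.

```python
# Limits copied from the Model Specs table.
ALLOWED_RATIOS = {"1:1", "16:9", "9:16"}
MIN_DURATION_S = 5
MAX_DURATION_S = 15


def validate_request(duration, ratio):
    """Return a list of problems; an empty list means the request fits the table."""
    problems = []
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        problems.append(f"duration must be an integer, got {duration!r}")
    elif not MIN_DURATION_S <= duration <= MAX_DURATION_S:
        problems.append(
            f"duration {duration}s is outside {MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    if ratio not in ALLOWED_RATIOS:
        problems.append(f"ratio {ratio!r} is not one of {sorted(ALLOWED_RATIOS)}")
    return problems
```

Returning a list instead of raising lets a caller report every problem in a single pass.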

## Input Types

### Text-to-Video — Recommended Default Mode
- No materials needed; the video is generated from the prompt alone
- **Most common and most stable mode**
- Not subject to privacy detection, highest success rate
- Suitable for: all scenarios

### Image-to-Video
- Upload a reference image; the model generates video from image + prompt
- Material role: `ref_image`
- **⚠️ Privacy detection limitation**: Images with realistic human faces are often blocked (`PrivacyInformation` error). Product photos, landscapes, and illustrations without faces work fine
- Suitable for: product showcase (white background product photos), scene extension (no faces)

### Video-to-Video
- Upload a reference video; the model generates a new video referencing its motion/style
- Material role: `ref_video`
- **⚠️ Same privacy detection limitation**: videos with faces are often blocked
- Using `ref_video` affects pricing (more expensive)
- Suitable for: motion transfer, style transfer (face-free materials)

### Best Practices

Default to **Text-to-Video** and describe character appearance entirely in text. Only use reference materials for:
- Pure product photos (white background, no faces) → `ref_image`
- Abstract/landscape references → `ref_image`
- Precise motion replication (no faces) → `ref_video`

## Duration Strategy

### Core Principle: Prefer Single 15s Segment, Avoid Multi-Segment Stitching

The model can **naturally include multiple storyboard transitions** within a single 15s generation. A single 15s generation has major advantages over stitching shorter clips:

| | Single 15s | Stitched 3×5s |
|---|---------|----------|
| Music/SFX | Natural, coherent flow | Fragmented, inconsistent rhythm |
| Character consistency | Naturally consistent within segment | Prone to drift/face changes across segments |
| Camera fluidity | Complex continuous movements possible | Each segment independent, no continuity |
| Cost | 1 API call | 3 API calls |

**Conclusion**: Default to 15s. Only use multiple segments when target duration > 15s.

### 15s Multi-Storyboard Prompt Writing

Describe multiple storyboard stages in one prompt, using time beats to guide internal transitions:

```
[Opening 0-3s] Close-up of hands unboxing a sleek black device on a white desk.
Fast snap dolly in to reveal the logo.

[Middle 3-10s] A woman picks it up, examines it from different angles.
Medium shot, smooth orbit around the product in her hands.
Spoken dialogue (say EXACTLY, word-for-word): "I've been waiting for this."
Mouth clearly visible, lip-sync aligned.

[Closing 10-15s] She places the device on a wireless charger, LED glows blue.
Pull back to wide shot of the full minimalist workspace.
Soft ambient glow, the frame holds steady.
```

**Key techniques**:
- Use `[Opening/Middle/Closing]` + time segment annotations for storyboard beats
- 2-3 sentences per stage, high information density
- Natural camera transitions (e.g., close-up → medium → wide)
- Embed dialogue within the corresponding time segment
- End last stage with `frame holds steady` for easy continuation

### Shot Density — CRITICAL

**The model can simulate multiple camera angles within a single 15s generation.** Use dense time annotations to create the feeling of edited cuts, not a single continuous take.

**Minimum shot density per 15s segment:**

| Scene Type | Shots per 15s | Time per Shot | Example |
|------------|--------------|---------------|---------|
| Action / martial arts | 5-7 | 2-3s | `[0-2s]` `[2-4s]` `[4-7s]` `[7-10s]` `[10-12s]` `[12-15s]` |
| Drama / dialogue | 4-5 | 3-4s | `[0-3s]` `[3-6s]` `[6-9s]` `[9-12s]` `[12-15s]` |
| Product / showcase | 3-5 | 3-5s | `[0-4s]` `[4-8s]` `[8-11s]` `[11-15s]` |
| Atmospheric / art | 2-3 | 5-7s | `[0-5s]` `[5-10s]` `[10-15s]` |

**Each time-annotated shot MUST have a different camera setup:**
- Different shot size (close-up → medium → wide)
- OR different angle (low angle → eye level → overhead)
- OR different movement type (static → tracking → dolly)
- OR hard cut keyword (`Hard cut —`, `Snap to`, `Cut to`)

**BAD — one continuous take, no cuts:**
```
[0-15s] Camera slowly follows a cat walking through a bamboo forest.
The cat stops and looks around. It leaps onto a rock.
```

**GOOD — 5 shots, varied angles, edited feel:**
```
[0-3s] Extreme wide shot — mist-filled bamboo forest at dawn. A ginger cat in silk robe stands motionless on a rock.
[3-5s] Snap zoom to close-up of the cat's eyes narrowing. Ears flatten.
[5-8s] Low angle — the cat launches forward, whip pan follows the leap through bamboo stalks.
[8-12s] Hard cut — medium shot, two cats clash mid-air. Paws strike in slow motion for one beat, then speed resumes. Bamboo leaves scatter.
[12-15s] Wide shot from above — both cats land on opposite sides of a stream. Dust settles. Camera holds.
```

**Rule**: Unless the prompt explicitly requests "single continuous take" or "long take", every 15s segment MUST contain at least 3 distinct camera setups with time annotations.
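
The rule above can be approximated with a small lint. This is a sketch under stated assumptions: it only counts time annotations and checks that they run back to back from 0s to the segment length, which matches the examples but is not stated as a requirement in this guide. It cannot tell whether two shots actually use different camera setups. `check_shot_density` is a hypothetical helper name.

```python
import re

# Matches "[0-3s]" as well as labelled forms such as "[Opening 0-3s]".
TIME_TAG = re.compile(r"\[(?:[A-Za-z]+\s+)?(\d+)-(\d+)s\]")


def shot_windows(prompt):
    """Extract (start, end) pairs, in seconds, from time annotations."""
    return [(int(a), int(b)) for a, b in TIME_TAG.findall(prompt)]


def check_shot_density(prompt, duration=15, min_shots=3):
    """Return a list of problems; an empty list means the check passed."""
    lowered = prompt.lower()
    # The rule exempts prompts that explicitly ask for a long take.
    if "single continuous take" in lowered or "long take" in lowered:
        min_shots = 1
    windows = shot_windows(prompt)
    problems = []
    if len(windows) < min_shots:
        problems.append(f"{len(windows)} time-annotated shot(s), need {min_shots}")
    expected_start = 0
    for start, end in windows:
        if start != expected_start or end <= start:
            problems.append(f"[{start}-{end}s] does not continue from {expected_start}s")
        expected_start = end
    if windows and expected_start != duration:
        problems.append(f"annotations end at {expected_start}s, not {duration}s")
    return problems
```

Run against the two cat examples, the single `[0-15s]` prompt is flagged for having one shot, and the five-shot prompt passes.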

### Videos Over 15s

When target duration > 15s, split into 15s segments, minimizing the number of segments:

```
30s → 2 × 15s
45s → 3 × 15s
60s → 4 × 15s
```
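
The split can be computed instead of looked up. A minimal sketch: `plan_segments` is a hypothetical helper. The handling of targets that are not multiples of 15 is an assumption, since this guide only lists multiples: the final segment takes the remainder, and when that remainder would fall below the 5s minimum, seconds are borrowed from the segment before it.

```python
import math

MIN_S, MAX_S = 5, 15  # per-generation limits from the Model Specs table


def plan_segments(target_s):
    """Split a target duration into the fewest segments of 5-15 seconds each."""
    if target_s < MIN_S:
        raise ValueError(f"target must be at least {MIN_S}s")
    count = math.ceil(target_s / MAX_S)
    durations = [MAX_S] * (count - 1)
    last = target_s - MAX_S * (count - 1)
    if last < MIN_S and durations:
        # Borrow from the previous segment so the last one reaches the minimum.
        durations[-1] -= MIN_S - last
        last = MIN_S
    durations.append(last)
    return durations
```

For the durations listed above this returns only 15s segments. A 32s target becomes `[15, 12, 5]` under the borrowing assumption.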

#### Serial Chain Generation (ref_video chaining)

The key to visual continuity: **generate segments sequentially, passing each completed video as `ref_video` to the next segment**. The model continues from where the previous segment ended.

```
S1: text-to-video (standalone)
↓ complete → upload S1 video as material
S2: ref_video(S1) + prompt → generates from S1's ending
↓ complete → upload S2 video as material
S3: ref_video(S2) + prompt → generates from S2's ending
↓ ...
```

**S1 prompt**: Normal standalone prompt with full style/character setup.
**S2+ prompts**: Begin with `Continuing from the previous shot:` + describe only the NEW content. Do NOT repeat the ending of the previous segment — the ref_video already provides that context.

**CLI pattern:**
```bash
# S1
renoise-cli.mjs task generate --prompt "<S1>" --duration 15 --ratio 16:9

# Upload S1 result
renoise-cli.mjs material upload <S1-video-url> # → returns MATERIAL_ID

# S2
renoise-cli.mjs task generate \
  --prompt "Continuing from the previous shot: <S2>" \
  --duration 15 --ratio 16:9 \
  --materials "MATERIAL_ID:ref_video"
```
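
When the chain is scripted, the per-segment argument list can be built programmatically. The sketch below only assembles arguments from the flags shown in the CLI pattern above. `generate_argv` is a hypothetical helper. How the CLI reports the finished video or the uploaded material ID is not documented here, so running the command and parsing its output are deliberately left out.

```python
def generate_argv(prompt, previous_material_id=None, duration=15, ratio="16:9"):
    """Build the argument list for one segment of a ref_video chain."""
    if previous_material_id is not None:
        # S2+ prompts describe only new content, behind the continuation prefix.
        prompt = f"Continuing from the previous shot: {prompt}"
    argv = [
        "renoise-cli.mjs", "task", "generate",
        "--prompt", prompt,
        "--duration", str(duration),
        "--ratio", ratio,
    ]
    if previous_material_id is not None:
        argv += ["--materials", f"{previous_material_id}:ref_video"]
    return argv
```

Passing the result as a list (for example to `subprocess.run`) avoids shell quoting problems with prompts that contain quotation marks.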

**Time cost**: Each segment takes ~5-8 minutes. Because segments are generated sequentially, not in parallel, a 60s video (4 segments) takes ~20-32 minutes in total.
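
The total is the per-segment range multiplied by the segment count, because nothing runs in parallel. A small sketch of that arithmetic; `chain_time_estimate` is a hypothetical helper, and the 5-8 minute default is the figure quoted above.

```python
def chain_time_estimate(segments, per_segment_min=(5, 8)):
    """Return (low, high) total minutes for a sequential chain of segments."""
    low, high = per_segment_min
    return segments * low, segments * high
```

For four segments this gives a range of 20 to 32 minutes.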

#### Visual Consistency — Visual Anchor Method

Text-only prompts cannot reliably maintain visual consistency across segments. The model interprets style keywords differently each generation. **Use a reference image to anchor the visual style.**

**Step 1 — Generate a concept art image** before any video segments:
```bash
renoise-cli.mjs task generate --model nano-banana-2 --resolution 2k --ratio 16:9 \
  --prompt "Concept art sheet for [project description]. Key visual elements:
[color palette], [material textures], [character appearance], [environment style],
[lighting mood]. Multiple vignettes showing different scenes in unified style."
```

**Step 2 — Upload as material:**
```bash
renoise-cli.mjs material upload concept-art.jpg # → CONCEPT_ID
```

**Step 3 — Pass to every segment** via `--materials "CONCEPT_ID:ref_image"`:
```bash
renoise-cli.mjs task create \
  --prompt "[visual anchor prefix] + [segment content]" \
  --materials "CONCEPT_ID:ref_image" --duration 15 --ratio 16:9
```

This locks the model's interpretation of color palette, material textures, and overall aesthetic across all segments.

**Visual anchor prefix** — a short block (2-3 lines) at the start of EVERY segment prompt that repeats the core visual DNA:
```
[Visual Anchor] Golden desert wasteland, tarnished brass with blue-green
patina, weathered silk robes with torn edges, exposed copper wiring with
faint blue glow. Warm gold highlights, cool blue-grey shadows, film grain.
```

**For scenes with recurring characters**, also repeat the full character description at the start of each segment where they appear.

**For realistic human characters**, use `--characters "ID"` instead of ref_image. The platform has 89 preset characters with locked appearance. Use `renoise-cli.mjs character list` to browse.

**Priority order for consistency:**
1. `--characters` (strongest lock — exact face/body, but limited to preset characters)
2. `--materials "ID:ref_image"` with concept art (strong — locks style/palette/texture)
3. Visual anchor prefix text only (weakest — model may still drift)
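
Keeping the anchor text in one place and prepending it mechanically avoids accidental wording drift between segments. A minimal sketch, assuming a plain newline-joined layout: `compose_segment_prompt` is a hypothetical helper, and the layout is an illustration, not a format the model requires.

```python
def compose_segment_prompt(visual_anchor, segment_content, character_descriptions=()):
    """Prefix one segment's content with the shared anchor and character text."""
    parts = [f"[Visual Anchor] {visual_anchor.strip()}"]
    # Repeat the full description of each recurring character in this segment.
    parts.extend(description.strip() for description in character_descriptions)
    parts.append(segment_content.strip())
    return "\n".join(parts)
```

Calling it once per segment with the same `visual_anchor` string guarantees that every prompt opens with identical style text.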

#### Narrative Continuity (across segments)

1. **Energy annotation** — Each segment prompt must start with a comment declaring its narrative role and energy level:
   ```
   <!-- Segment 2/4 — DEVELOPMENT | Energy: 5→7→8 -->
   ```
2. **Energy variation** — Never write 3+ segments at the same energy level. Alternate between high-energy and breathing segments.
3. **Drop before climax** — The segment before the climax must be lower energy (at least 2 points below the climax).
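
The energy annotations are regular enough to parse, which makes the energy-variation rule checkable. The sketch below rests on one assumption: "same energy level" is interpreted as "same peak value", which this guide does not define. `energy_curve` and `flat_runs` are hypothetical helper names.

```python
import re

# Captures the "5→7→8" part of an "Energy: 5→7→8" annotation.
ENERGY = re.compile(r"Energy:\s*(\d+(?:\s*→\s*\d+)*)")


def energy_curve(header):
    """Return the list of energy values declared in a segment header."""
    match = ENERGY.search(header)
    if match is None:
        raise ValueError(f"no energy annotation in {header!r}")
    return [int(value) for value in match.group(1).split("→")]


def flat_runs(headers, run_length=3):
    """Return start indexes of runs of segments that share the same peak energy."""
    peaks = [max(energy_curve(header)) for header in headers]
    return [
        index
        for index in range(len(peaks) - run_length + 1)
        if len(set(peaks[index:index + run_length])) == 1
    ]
```

An empty result from `flat_runs` means no three consecutive segments peak at the same level.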

#### Audio Continuity

1. **With ref_video chaining**, the model may naturally extend the audio style from the previous segment, but this is not guaranteed.
2. **For dialogue-driven videos**: audio continuity is less critical — each segment has distinct lines.
3. **For music-driven videos**: consider stripping all audio in post and overlaying a unified BGM track:
   ```bash
   ffmpeg -i final.mp4 -an -c:v copy silent.mp4
   ffmpeg -i silent.mp4 -i bgm.mp3 -c:v copy -c:a aac -shortest final-with-bgm.mp4
   ```

#### Example: 30s Product Video (2 segments with narrative arc)

**Segment 1 (0-15s) — HOOK + SETUP**
```
<!-- Segment 1/2 — HOOK | Energy: 7→5→6 | Transition: Gaze Lead → S2 -->
Warm golden palette, shallow depth of field, film grain.
[0-3s] A pair of hands slowly unwrap a matte black box on a sunlit wooden table. Close-up, gentle dolly in, morning light catches the edge of the box. The anticipation builds.
[3-10s] The lid lifts to reveal a sleek brass desk lamp. The hands carefully lift it out, examining the curves. Medium shot, soft natural light from a nearby window. The pace is unhurried, deliberate.
[10-15s] The woman sets the lamp on her desk and reaches for the switch. Her eyes trace the design with quiet admiration. She looks up toward the window — the golden light outside mirrors the lamp's warm glow. Her gaze holds on the light.
No text, subtitles, watermarks, or logos.
```

**Segment 2 (15-30s) — CLIMAX + RESOLUTION**
```
<!-- Segment 2/2 — CLIMAX | Energy: 8→10→4 | Transition: n/a (final) -->
Warm golden palette, shallow depth of field, film grain.
A woman with shoulder-length dark hair in a cream linen shirt sits at a minimalist wooden desk.
[0-5s] Revealing what she was looking at: she clicks the lamp on. A pool of warm golden light floods the desk surface. Fast snap dolly in on the illuminated workspace. The light transforms the entire mood of the room.
[5-10s] Time-lapse of the room transitioning from daylight to evening. The lamp becomes the anchor of warmth in the darkening space. Quick cuts between angles: the light on a book, on her hands writing, on a coffee cup casting a long shadow. Energy peaks.
[10-15s] Night. The room is dark except for the lamp's glow. Wide shot, she's reading peacefully. Camera slowly pulls back through the window. The frame holds steady on the warm window in the dark facade. Silence except for distant crickets.
No text, subtitles, watermarks, or logos.
```

**Why this works**: S1 builds curiosity without showing the product immediately (energy 7→5→6). The gaze lead at S1's end creates a natural bridge. S2 opens with the reveal (energy 8), peaks with the time-lapse montage (10), then resolves into calm (4). The energy curve `7→5→6 | 8→10→4` has clear variation, a drop before climax, and a distinct ending.

#### Example: 60s Short Drama (4 segments with three-act structure)

**Rhythm Blueprint:**
```
S1 (0-15s) — ACT I: ORDINARY WORLD | Energy: 5→4→6 | → Action Bridge
S2 (15-30s) — ACT II-A: COMPLICATION | Energy: 7→8→9 | → Emotional Shift
S3 (30-45s) — ACT II-B: CLIMAX | Energy: 4→8→10 | → Time Jump
S4 (45-60s) — ACT III: RESOLUTION | Energy: 5→3→4 | → (end)
```

Note: S3 opens at energy 4 (the "drop before climax") despite S2 ending at 9. This emotional shift creates maximum impact when S3 builds to its peak at 10.

## Prompt Writing Principles

### Basic Rules
1. **Must be English** — The model understands English prompts best
2. **Natural narrative** — Use coherent descriptive paragraphs, not comma-separated tag lists
3. **Specific > Abstract** — `a golden retriever running through shallow ocean waves at sunset` beats `a dog on a beach`
4. **High information density** — 15s prompts should include details for multiple storyboard stages; don't waste space on repetition

### Prompt Structure

```
Subject (detailed appearance) + Action (specific body movement) + Camera (purposeful movement) + Scene/Environment + Visual Style
```

- **Subject**: What the subject is, with detailed appearance (hairstyle, skin tone, clothing, build)
- **Action**: What the subject is doing — see Action Writing below
- **Camera**: Camera movement — see Camera Writing below
- **Scene**: Environment, lighting, time of day
- **Style**: Visual style (cinematic, documentary, animation...)

### Action Writing — CRITICAL

The model generates **video**, not photos. Every shot needs visible motion. Static poses = dead footage.

**Level 1 (bad)**: State verbs — `stands`, `sits`, `holds`, `looks`
**Level 2 (ok)**: Basic action — `walks forward`, `swings sword`, `picks up cup`
**Level 3 (good)**: Action + body detail — `lunges forward, left foot planted, right arm extending the blade in a downward arc, robes trailing behind the motion`
**Level 4 (great)**: Action + micro-movement + reaction — `lunges forward, left foot planted, right arm extending the blade. The impact sends a shockwave through his arm — fingers regrip the hilt. His hair whips forward, robes billow out then snap back.`

Rules:
- **Every shot must have at least one verb of motion** (not state). `stands motionless` is only valid for 1-2 second tension holds before action.
- **Add micro-movements**: hair blowing, fingers tightening, fabric rippling, chest rising with breath, eyes narrowing. These make CG feel alive.
- **Describe the arc of motion**, not just the start or end: `raises the sword from hip to overhead` not just `holds sword up`.
- **Physical reactions**: when things collide, describe the aftermath (sparks, dust, recoil, fabric displacement, hair whip).

Bad: `A warrior stands on a cliff holding a sword.`
Good: `A warrior shifts his weight to his back foot, fingers tighten on the sword hilt. Wind catches his robes — they billow and snap. His hair whips across his face. He narrows his eyes at the valley below.`

### Camera Writing — CRITICAL

Camera movement is what makes the viewer *feel* the scene. Generic movement = flat footage.

**Level 1 (bad)**: Label only — `tracking shot`, `push-in`, `static`
**Level 2 (ok)**: Direction — `camera tracks right`, `slow dolly in`
**Level 3 (good)**: Direction + speed + purpose — `camera tracks right accelerating to match the runner's pace, keeping the subject in left-third frame`
**Level 4 (great)**: Direction + speed + reveals — `camera tracks right, initially blocked by a stone pillar — the subject emerges from behind it at full sprint, camera accelerates to keep up, the background racks out of focus`

Rules:
- **Camera and subject move together**: if the character runs left, describe camera tracking left. If they leap up, camera tilts up or cranes.
- **Describe what the movement reveals**: `camera pulls back to reveal the entire army behind him` not just `camera pulls back`.
- **Add camera texture**: handheld shake for action, locked-off steady for tension, gentle drift for atmosphere.
- **Speed changes matter**: `starts slow, accelerates as the horse breaks into gallop` is more cinematic than constant-speed tracking.

Bad: `Wide shot. Camera tracking.`
Good: `Wide shot, camera tracks alongside at ground level, accelerating as the horse breaks into full gallop. Dust kicks up into the lens. The background blurs into streaks of gold and green.`

### Camera Movement Cheat Sheet

| Category | Effect | Keywords | Use Case |
|----------|--------|----------|----------|
| **Shot Size** | Extreme wide | extreme wide shot | Establish environment |
| | Full shot | wide shot | Spatial relationships |
| | Medium | medium shot | Character interaction |
| | Close-up | close-up | Emotion/detail |
| | Extreme close-up | extreme close-up / macro | Texture/material |
| **Movement** | Push in | fast snap dolly in | Detail impact |
| | Pull back | quick pull back to reveal | Reveal full scene |
| | Whip pan | whip pan with motion blur | Rhythmic transition |
| | Slider | subtle slider drift | Elegant showcase |
| | Orbit | smooth orbit | 360° showcase |
| | Tracking | tracking shot follows subject | Dynamic following |
| | Macro push | extreme macro push | Material detail |
| | Static | locked-off static | Freeze/ending |
| **Angle** | Low angle | low angle | Authority/impact |
| | Worm's eye | worm's eye view, ultra-low angle | Monumental scale, hero entrance |
| | Dutch angle | Dutch angle, tilted horizon | Tension, unease, psychological instability |
| | Overhead | overhead / bird's eye | Overview/spatial |
| | Fisheye | fisheye lens | Fun/exaggerated |
| | POV | first-person POV | Immersive experience |
| **Pacing** | Slow motion | slow motion | Emphasize action |
| | Quick cuts | rapid cuts / hard cut | Tension/rhythm |
| | Time-lapse | time-lapse | Passage of time |
| **Focus** | Shallow DOF | shallow depth of field | Subject isolation |
| | Focus pull | rack focus | Guide viewer's eye |
| **Special** | Vertigo | dolly zoom / vertigo effect | Psychological impact |
| | Crane up | crane shot rising | Reveal, epic scale, emotional lift |
| | Wipe transition | wipe transition through obstruction | Seamless scene change |

### Dramatic Camera Angles — Avoiding Flat Footage

Default eye-level, medium-distance shots produce flat, boring footage. Deliberately choose dramatic angles to inject energy:

**Low Angle / Worm's Eye View**: Camera at ground level looking up. Makes subjects feel powerful, monumental. Use for hero entrances, authority, product reveals from below.
```
Camera at ground level looking up at the swordsman, worm's eye view. He towers against the stormy sky, cape billowing overhead.
```

**Dutch Angle (Tilted Horizon)**: 15-degree tilt creates unease. Use for tension, villain reveals, psychological instability, chase sequences.
```
Dutch angle, 15-degree tilt. The corridor stretches ahead, walls leaning ominously. Subject walks toward camera, slightly off-center.
```

**Extreme Macro**: Fill the entire frame with texture detail. Use for product material, food close-ups, mechanical detail, nature textures.
```
Extreme macro on the watch dial, filling frame with brushed titanium texture. Slow push-in reveals the engraved serial number.
```

**Vertigo / Dolly Zoom**: Camera pulls back while lens zooms in (or vice versa). Subject stays same size but background warps. Use for revelation moments, emotional shock, character realization.
```
Dolly zoom — camera pulls back while lens zooms in. The subject stays the same size but the background warps and stretches. Psychological disorientation.
```

**Whip Pan Transitions**: Fast horizontal pan with motion blur connecting two scenes. Use for energy bursts, music video beat transitions, location changes.
```
Whip pan right with heavy motion blur — hard cut to the next scene already in motion. No pause between scenes.
```

### Shot Density Guide

Higher shot density = more dynamic, engaging footage. Match density to content energy:

| Content Type | Shots per 15s | Avg Shot Length | Camera Variety |
|-------------|--------------|-----------------|----------------|
| Action / martial arts | **5-7** | 2-3s | Every shot: different size + angle |
| Music video / montage | **5-7** | 2-3s | Alternate: close-up ↔ wide, static ↔ motion |
| Drama / dialogue | 4-5 | 3-4s | Shot-reverse-shot + establishing |
| Product / showcase | 3-5 | 3-5s | Orbit + macro + wide reveal |
| Atmospheric / art | 2-3 | 5-7s | Slow movements, held frames |

**Rule of thumb**: If your 15s prompt has fewer than 3 distinct camera setups with time annotations, it is probably too flat. Add at least one dramatic angle change.

### Example: 15s Multi-Storyboard Prompt

**Good prompt**:
> A young woman with shoulder-length dark hair and a cream knit sweater sits at a sunlit café table. [0-4s] Close-up of her hands wrapping around a steaming ceramic mug, camera gently pushes in, morning light catches the steam rising. [4-10s] She takes a sip, looks up and smiles, medium shot as camera slowly drifts to a side angle revealing the quiet café interior — wooden shelves, hanging plants, soft jazz playing. Spoken dialogue (say EXACTLY, word-for-word): "This is my favorite place in the city." Mouth clearly visible, lip-sync aligned. [10-15s] She sets the mug down and opens a worn leather journal, begins writing. Camera pulls back to a wide shot through the café window, the frame holds steady. Cinematic, warm golden tones, shallow depth of field, film grain.

**Bad prompt**:
> woman, café, coffee, sunshine, beautiful, cinematic, 4k

## Advanced Prompt Techniques

### Technical Parameters — API vs Prompt

**DO NOT put these in the prompt** — they are controlled by API parameters (`--ratio`, `--duration`, model config) and writing them in the prompt wastes tokens with no effect:

- Aspect ratio (`2.35:1 widescreen`, `16:9`, `9:16`)
- Frame rate (`24fps`, `30fps`)
- Resolution (`1080p`, `4K`, `8K`)

**DO put these in the prompt** — they are visual style choices the model responds to:

- Color palette (`warm golden palette`, `desaturated blue-grey`, `neon pink and cyan`)
- Depth of field (`shallow depth of field`, `deep focus`)
- Film texture (`film grain`, `RAW`, `HDR`)
- Visual style (`cinematic`, `documentary`, `ink wash painting`)
- Lighting mood (`golden hour`, `rim light`, `volumetric haze`)

```
Good: warm golden palette, shallow depth of field, film grain.
[0-5s] Close-up of hands on piano keys...

Bad: 2.35:1 widescreen, 24fps, 1080p, warm golden palette...
(ratio/fps/resolution are wasted — use API params instead)
```
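
A prompt can be scanned for the wasted tokens before submission. The sketch below is heuristic: the regular expressions are assumptions chosen to match the examples listed above, and they will also flag look-alikes such as a clock time (`10:30`) inside dialogue. `technical_tokens` is a hypothetical helper name.

```python
import re

# Heuristic patterns for values that belong in API parameters, not the prompt.
TECHNICAL_PATTERNS = {
    "aspect ratio": re.compile(r"\b\d+(?:\.\d+)?:\d+\b"),
    "frame rate": re.compile(r"\b\d+\s?fps\b", re.IGNORECASE),
    "resolution": re.compile(r"\b(?:\d{3,4}p|[48]K)\b", re.IGNORECASE),
}


def technical_tokens(prompt):
    """Return (label, matched text) pairs for tokens that look like API settings."""
    found = []
    for label, pattern in TECHNICAL_PATTERNS.items():
        for match in pattern.finditer(prompt):
            found.append((label, match.group(0)))
    return found
```

On the "Bad" line above, this reports the ratio, the frame rate, and the resolution. The "Good" line produces no findings.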

### Negative Prompting

Exclude unwanted elements at the end of the prompt to prevent auto-generated text, watermarks, etc.:

```
... frame holds steady. No text, subtitles, watermarks, or logos. No sudden camera shake.
```

Common negatives: `No text / No subtitles / No watermarks / No logos / No camera shake / No jump cuts`

### Style Keywords Cheat Sheet

| Category | Example Keywords |
|----------|-----------------|
| Texture | cinematic, film grain, HDR, RAW |
| Color | warm tone, cold blue, high contrast, desaturated, neon, Morandi palette |
| Lighting | golden hour, rim light, Tyndall effect, volumetric light, natural light, side backlight |
| Style | documentary, vlog, commercial, music video, Hollywood blockbuster, indie film |
| Animation | 3D CG animation, cel-shaded anime, ink wash painting, pixel art |

## Scene Type Prompt Focus

| Scene Type | Prompt Focus |
|------------|-------------|
| **E-commerce/Ads** | Product visible in frame 1 + material close-up + 360° showcase + brand ending |
| **Story/Drama** | Separate visuals and dialogue + annotate character emotion + SFX on separate line |
| **Action/Fantasy** | VFX particle details + quick-cut pacing + slow-mo for key actions |
| **Lifestyle/Vlog** | Natural light + handheld tracking feel + ambient sound |
| **MV/Beat Sync** | Sound design priority + beat alignment (set aspect ratio via `--ratio`, not in the prompt) |
| **Educational** | CGI style + semi-transparent visualization + educational voiceover |

## Creative Prompt Templates

### Story Completion

Provide keyframes or storyboard description, let the model auto-fill actions and transitions:

```
A 4-panel comic strip is shown in the reference image. Animate each panel left-to-right,
top-to-bottom, maintaining character dialogue. Add dramatic sound effects at key moments.
Style: humorous and exaggerated.
```

### Video Extension

Append content to a previously generated video. Pass the previous video via `--materials "ID:ref_video"`, prompt describes **the new portion only**:

```
Continuing from the previous shot: [0-5s] The character turns and walks toward the door,
camera tracking follows. [5-10s] She opens the door to reveal a sunlit garden, camera
glides through the doorframe, frame holds steady.
```

> **Note**: `--duration` should be set to the duration of the new portion, not the total.

### Seamless Long Take

Use `single continuous take, no cuts` + scene transition words to link multiple spaces:

```
Single continuous take, no cuts. [0-5s] Camera follows a woman in a red coat through
a crowded market, tracking shot. [5-10s] She turns a corner into a quiet alley, camera
keeps following without cutting. [10-15s] She pushes open a wooden door and enters a
sunlit courtyard, camera glides in behind her, frame holds steady.
```

### Sound & Dialogue

Embed dialogue using `Spoken dialogue (say EXACTLY, word-for-word): "..."` format in the corresponding time segment, with emotion and lip-sync annotations:

```
[3-8s] Medium shot, she picks up the phone. Spoken dialogue (say EXACTLY, word-for-word):
"I told you, it's over." Tone: cold and resolute. Mouth clearly visible, lip-sync aligned.
```

SFX/BGM on a separate line at the end of the prompt:

```
Sound design: gentle rain on window, distant thunder, melancholic piano.
```
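
The dialogue and sound-design formats are fixed strings, so they are easy to generate consistently. A minimal sketch: `dialogue_line` and `assemble_prompt` are hypothetical helpers. Placing the sound-design line before the closing negatives is an assumption, since this guide asks for both at the end of the prompt without ordering them.

```python
def dialogue_line(text, tone=None):
    """Format one spoken line using the exact-wording and lip-sync annotations."""
    line = f'Spoken dialogue (say EXACTLY, word-for-word): "{text}"'
    if tone:
        line += f" Tone: {tone}."
    return line + " Mouth clearly visible, lip-sync aligned."


def assemble_prompt(body, sound_design=(),
                    negatives="No text, subtitles, watermarks, or logos."):
    """Append the sound-design line and the negative sentence to a prompt body."""
    lines = [body.strip()]
    if sound_design:
        lines.append("Sound design: " + ", ".join(sound_design) + ".")
    if negatives:
        lines.append(negatives)
    return "\n".join(lines)
```

Building the dialogue sentence in one function keeps the wording identical across every segment that contains speech.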

### Video Editing

Make targeted modifications to a reference video (character replacement, element addition/removal). Pass original video + replacement materials via `--materials`:

```
Replace the main character in the reference video with the person in the reference image.
Keep all original camera movements and timing. Add a white cat sitting on the desk
in the background.
```

### Beat Sync

Use timestamps to precisely align with beats, emphasize audio-visual synchronization (set ratio via `--ratio` API param):

```
[0-2s] Beat drop — extreme close-up of hands clapping, sharp
snap zoom. [2-5s] Wide shot, dancer spins, camera orbits in sync with bass hits.
[5-8s] Freeze frame on peak pose, 0.5s hold, then rapid montage cuts on every snare.
Sound design priority: footsteps, fabric rustle, and breath must align with beat.
```