arca-marketing-video 2.7.0 → 2.9.0
- package/package.json +1 -1
- package/skills/_arca-marketing-assets/assets/final-cta.png +0 -0
- package/skills/shorts-editor/SKILL.md +35 -8
- package/skills/shorts-editor/composition.template.html +29 -38
- package/skills/shorts-editor/sfx/ding.mp3 +0 -0
- package/skills/shorts-editor/sfx/glitch.mp3 +0 -0
- package/skills/shorts-editor/sfx/riser-high.mp3 +0 -0
- package/skills/shorts-editor/sfx/sad-violin.mp3 +0 -0
- package/skills/shorts-editor/sfx/swoosh-high.mp3 +0 -0
- package/skills/shorts-editor/sfx/swoosh-low.mp3 +0 -0
- package/skills/shorts-editor/sfx/tiktok-boom-bling.mp3 +0 -0
- package/skills/shorts-editor/sfx/wrong.mp3 +0 -0
- package/skills/storyboard-prompt/SKILL.md +25 -10
- package/skills/video-prompt/SKILL.md +43 -0
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "arca-marketing-video",
|
|
3
|
-
"version": "2.
|
|
3
|
+
"version": "2.9.0",
|
|
4
4
|
"description": "Brand-driven short-form marketing content kit: four installable Claude Code skills (carousel generator, storyboard prompt, video prompt, shorts editor) plus a shared brand profile and assets. Run `npx arca-marketing-video` to install them into .claude/skills.",
|
|
5
5
|
"keywords": [
|
|
6
6
|
"skill",
|
|
package/skills/_arca-marketing-assets/assets/final-cta.png
Binary file
|
package/skills/shorts-editor/SKILL.md
CHANGED
|
@@ -18,7 +18,7 @@ in-world mark. `silence_cut.py` and `composition.template.html` are co-located i
|
|
|
18
18
|
This skill is the final edit stage after `video-prompt` (or runs standalone on any raw footage).
|
|
19
19
|
|
|
20
20
|
## Overview
|
|
21
|
-
Turn a finished-but-flat talking clip into a punchy 9:16 short. Engagement is
|
|
21
|
+
Turn a finished-but-flat talking clip into a punchy 9:16 short. Engagement is layers stacked on clean footage: a **tight silence-cut** (raw footage only — skip for AI clips), **word-by-word pop-on captions** (Anton, gold keywords, no pill backing), **native treatments** (zoom punch-ins, speed ramps, hard cuts, SFX hits — NOT glass-pill chips), and a **SFX + brand-splash** finish. The composition is HyperFrames HTML; ffmpeg cuts and masters; faster-whisper supplies timing.
|
|
22
22
|
|
|
23
23
|
**Core principle:** retention is manufactured by deleting dead time and giving the eye a new beat every 2-4s. Cut the pauses first; every other layer just decorates the tightened result.
|
|
24
24
|
|
|
@@ -37,13 +37,18 @@ Not for: generating footage from scratch, or pure motion-graphics pieces (use th
|
|
|
37
37
|
1. **Denoise** → `audio_clean.m4a`:
|
|
38
38
|
`ffmpeg -i src.mp4 -af "highpass=85,afftdn,lowpass=12000,loudnorm=I=-14:TP=-1.5:LRA=11" -ar 48000 -ac 2 audio_clean.m4a`
|
|
39
39
|
2. **Transcribe** word timestamps with faster-whisper → `transcript.json` (`[{text,start,end}]`). `small.en` only if the audio is English.
|
|
40
|
-
3. **Silence-cut** with `./silence_cut.py` → `tight.mp4` (the non-obvious core, see below).
|
|
40
|
+
3. **Silence-cut** with `./silence_cut.py` → `tight.mp4` (the non-obvious core, see below). **SKIP this
|
|
41
|
+
step for AI-generated footage** (clips from `video-prompt` / Wyren): it is already tight and its
|
|
42
|
+
pauses are intentional comedic beats — the cutter is built for RAW talking-head dead air, not for
|
|
43
|
+
generated clips. For AI footage, go straight to assembly and keep the clips' timing. Only run the
|
|
44
|
+
silence-cut on real raw recordings (interview, vox-pop, vlog, podcast) with genuine dead air.
|
|
41
45
|
4. **Re-transcribe `tight.mp4`**, regroup into caption phrases (sentence-aware, 3-5 words). Cutting shifts every timestamp, so always re-transcribe the cut; never remap old times.
|
|
42
|
-
5. **Build the composition** from `./composition.template.html`: muted plate + separate dialogue audio + captions +
|
|
46
|
+
5. **Build the composition** from `./composition.template.html`: muted plate + separate dialogue audio + word-pop captions + zooms + logo + splash + SFX (no chips by default). Lint clean.
|
|
43
47
|
6. **Draft render** (`--quality draft`), extract frames at every chip/caption/splash beat, eyeball, fix. Then **`--quality high`** and **master**:
|
|
44
48
|
`ffmpeg -i raw.mp4 -c:v copy -af "loudnorm=I=-14:TP=-1.5,alimiter=limit=0.95" -c:a aac -b:a 192k out.mp4`
|
|
45
49
|
|
|
46
50
|
## The silence-cut (the part that is easy to get wrong)
|
|
51
|
+
**Only for RAW recordings — skip entirely for AI-generated clips** (see Pipeline step 3).
|
|
47
52
|
Neither signal alone works:
|
|
48
53
|
- **silencedetect** misses pauses filled with ambient/breath above the noise floor.
|
|
49
54
|
- **whisper word timestamps** are imprecise around pauses and will report contiguous words across a real ~1s gap (e.g. it claimed "So, cheating" was continuous when 0.8s of silence sat between them).
|
|
@@ -51,11 +56,32 @@ Neither signal alone works:
|
|
|
51
56
|
**Cut where EITHER is true:** acoustic silence (`silencedetect=noise=-33dB:d=0.35`) OR a transcript word-gap > 0.45s. Only remove the dead middle when it exceeds ~0.5s, leave ~0.1s of speech pad each side, and KEEP sub-0.5s gaps (natural rhythm — cutting every micro-gap machine-guns the edit and looks glitchy). Add a 10ms `afade` at each join to kill clicks. **Verify with silencedetect on the OUTPUT** — it should show only short breath gaps. `./silence_cut.py` implements all of this; tune `--cut-min`.
|
|
52
57
|
|
|
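A minimal sketch of the "cut where EITHER is true" rule above, written independently of the bundled `./silence_cut.py` (whose internals are not shown in this diff). It assumes the silences have already been parsed from `silencedetect` output into `(start, end)` tuples and that the transcript uses the `[{text,start,end}]` shape. The names `merge_intervals`, `plan_cuts`, `gap_min`, `cut_min`, and `pad` are hypothetical, and the sketch reads "exceeds ~0.5s" as applying to the full gap before padding.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Union overlapping or touching (start, end) intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def plan_cuts(
    silences: List[Interval],
    words: List[Dict],
    gap_min: float = 0.45,  # transcript word-gap threshold (seconds)
    cut_min: float = 0.5,   # gaps at or below this length are kept as natural rhythm
    pad: float = 0.1,       # speech pad left on each side of a cut
) -> List[Interval]:
    """Return the spans to delete: acoustic silence OR long word gaps, padded."""
    candidates = list(silences)
    for prev, nxt in zip(words, words[1:]):
        if nxt["start"] - prev["end"] > gap_min:
            candidates.append((prev["end"], nxt["start"]))

    cuts: List[Interval] = []
    for start, end in merge_intervals(candidates):
        if end - start > cut_min:
            # remove only the dead middle, keeping `pad` seconds on each side
            cuts.append((round(start + pad, 3), round(end - pad, 3)))
    return cuts


if __name__ == "__main__":
    demo_words = [
        {"text": "So,", "start": 1.5, "end": 2.0},
        {"text": "cheating", "start": 2.9, "end": 3.4},
        {"text": "works", "start": 3.7, "end": 4.0},
    ]
    # The 0.4s silence at 5.0s is kept; the 0.9s gap is trimmed to its middle.
    print(plan_cuts([(2.0, 2.9), (5.0, 5.4)], demo_words))  # [(2.1, 2.8)]
```

The 10ms `afade` at each join and the final `silencedetect` pass on the output are left to ffmpeg and are not modelled here.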
53
58
|
## Layers (all face/caption-safe)
|
|
54
|
-
- **Captions
|
|
55
|
-
|
|
59
|
+
- **Captions (word-by-word pop-on — the default, not flat phrase-fade):** the strong default is each
|
|
60
|
+
WORD popping on as it is spoken, not a whole phrase fading in. We already transcribe word-level times,
|
|
61
|
+
so use them: show ~3-5 words per group, but reveal them one word at a time on each word's `start`
|
|
62
|
+
(quick scale/opacity pop), the current word emphasized. **Anton** display font (uppercase), white with
|
|
63
|
+
**gold (`--accent2`) keywords**, **NO pill / glass backing** — just a hard text-shadow for legibility.
|
|
64
|
+
ONE group visible at a time: hard-hide each before the next appears — clamp exit to
|
|
65
|
+
`min(end+0.1, next.start-0.07)`, then `tl.set(opacity:0,visibility:hidden)`. Bottom third (~286px up).
|
|
66
|
+
The flat phrase-fade is the weak default — don't ship it.
|
|
67
|
+
- **Native treatments over graphic "chips" (chips are OFF by default):** do NOT add the glass-pill /
|
|
68
|
+
badge "chip" by default — it reads as AI / vibe-coded. Manufacture beats with NATIVE editing instead:
|
|
69
|
+
zoom punch-ins, speed ramps / hold-frames, the word-pop captions themselves (gold keyword emphasis),
|
|
70
|
+
hard cuts on the beat, and SFX hits. Only add a chip if the user explicitly asks for an on-screen
|
|
71
|
+
label, and keep it minimal. The chip CSS/JS is removed from the default template.
|
|
56
72
|
- **Zoom punch-ins:** scale the plate wrapper (base ~1.04) to ~1.10-1.14 on emphasis lines, ease back. Never scale below 1.0 (reveals letterbox edges). Cover-fit the plate.
|
|
57
|
-
- **SFX:**
|
|
58
|
-
|
|
73
|
+
- **SFX:** a curated set ships in `./sfx/` — use these first (no download needed). Keep dialogue front (SFX vol 0.25-0.35); every `<audio>` needs an `id`. Mapping:
|
|
74
|
+
| Role | File |
|
|
75
|
+
| --- | --- |
|
|
76
|
+
| Opening riser (first frame) | `./sfx/riser-high.mp3` |
|
|
77
|
+
| Hard cut / speaker change | `./sfx/swoosh-high.mp3`, `./sfx/swoosh-low.mp3` |
|
|
78
|
+
| Chip entrance / key reveal (pop) | `./sfx/ding.mp3` |
|
|
79
|
+
| Brand-splash signature hit (reserve for splash only) | `./sfx/tiktok-boom-bling.mp3` |
|
|
80
|
+
| Glitch / error beat | `./sfx/glitch.mp3` |
|
|
81
|
+
| "Wrong"/mistake beat | `./sfx/wrong.mp3` |
|
|
82
|
+
| Comedic deflation | `./sfx/sad-violin.mp3` |
|
|
83
|
+
Need something not here? Mixkit free SFX (`mixkit.co/free-sound-effects/<cat>/` → `assets.mixkit.co/.../<id>-preview.mp3`) is a reliable no-key source.
|
|
84
|
+
- **Brand splash:** 3s end card (a still image — use `../_arca-marketing-assets/assets/final-cta.png`), brought in by a quick white flash + the signature SFX (`./sfx/tiktok-boom-bling.mp3`), with a subtle ken-burns zoom. Reserve that SFX for the splash only.
|
|
59
85
|
|
|
60
86
|
## Gotchas
|
|
61
87
|
| Symptom | Fix |
|
|
@@ -71,4 +97,5 @@ Neither signal alone works:
|
|
|
71
97
|
|
|
72
98
|
## Files
|
|
73
99
|
- `./silence_cut.py` — silence ∪ word-gap cutter: `--src --audio --transcript --out [--cut-min 0.5]`.
|
|
74
|
-
- `./composition.template.html` — HyperFrames composition skeleton (plate, captions,
|
|
100
|
+
- `./composition.template.html` — HyperFrames composition skeleton (plate, word-pop captions, zooms, splash, SFX; chips removed from default) with the load-bearing GSAP logic already wired.
|
|
101
|
+
- `./sfx/` — bundled, ready-to-use SFX (riser, swooshes, ding, tiktok-boom-bling, glitch, wrong, sad-violin). See the SFX mapping above.
|
package/skills/shorts-editor/composition.template.html
CHANGED
|
@@ -2,7 +2,9 @@
|
|
|
2
2
|
<!--
|
|
3
3
|
HyperFrames composition skeleton for an engaging vertical short.
|
|
4
4
|
Fill the >>> TODO <<< spots. All times are on the TIGHT (cut) timeline.
|
|
5
|
-
Layers by z-index: plate(0) < grade(1) <
|
|
5
|
+
Layers by z-index: plate(0) < grade(1) < captions(5) < logo(6) < flash(7) < splash(8) < endfade(9).
|
|
6
|
+
Captions are word-by-word pop-on (Anton, gold keywords, no pill backing). Chips are OFF by default
|
|
7
|
+
(they read as AI/vibe-coded) — use native treatments (zooms, speed, hard cuts, SFX) instead.
|
|
6
8
|
Standalone composition: data-composition-id div lives directly in <body> (NO <template>).
|
|
7
9
|
-->
|
|
8
10
|
<html lang="en">
|
|
@@ -17,20 +19,14 @@
|
|
|
17
19
|
#plate-wrap { position:absolute; inset:0; overflow:hidden; z-index:0; transform-origin:50% 42%; will-change:transform; }
|
|
18
20
|
#plate-wrap video { position:absolute; inset:0; width:100%; height:100%; object-fit:cover; }
|
|
19
21
|
#grade { position:absolute; inset:0; z-index:1; pointer-events:none; background:radial-gradient(120% 80% at 50% 40%, rgba(0,0,0,0) 56%, rgba(0,0,0,.42) 100%); }
|
|
20
|
-
/* chips
|
|
21
|
-
#chips { position:absolute; inset:0; z-index:3; pointer-events:none; }
|
|
22
|
-
.chip { position:absolute; top:1170px; left:0; right:0; margin:0 auto; width:max-content; max-width:920px;
|
|
23
|
-
display:flex; align-items:center; gap:22px; padding:18px 40px 18px 20px; border-radius:999px;
|
|
24
|
-
background:var(--glass); border:2px solid rgba(41,171,226,.5); box-shadow:0 18px 50px rgba(0,0,0,.45); opacity:0; }
|
|
25
|
-
.chip .badge { flex:0 0 auto; width:66px; height:66px; border-radius:50%; display:flex; align-items:center; justify-content:center;
|
|
26
|
-
background:linear-gradient(150deg,var(--accent),#1565c0); }
|
|
27
|
-
.chip.alt .badge { background:linear-gradient(150deg,#ffce4d,var(--accent2)); }
|
|
28
|
-
.chip .badge svg { width:36px; height:36px; color:#fff; }
|
|
29
|
-
.chip .ctext { font-weight:800; font-size:46px; letter-spacing:-.02em; color:var(--ink); white-space:nowrap; line-height:1; }
|
|
22
|
+
/* chips are OFF by default (they read as AI/vibe-coded) — native treatments only: zooms, speed, hard cuts, SFX, gold keyword pops */
|
|
30
23
|
#caps { position:absolute; left:0; right:0; bottom:286px; z-index:5; pointer-events:none; }
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
24
|
+
/* word-by-word pop-on captions: Anton display font, uppercase, gold keywords, NO pill backing — just a hard shadow */
|
|
25
|
+
.cap { position:absolute; left:0; right:0; bottom:0; margin:0 auto; width:900px; text-align:center;
|
|
26
|
+
font-family:"Anton",system-ui,sans-serif; font-weight:400; text-transform:uppercase;
|
|
27
|
+
font-size:64px; line-height:1.1; letter-spacing:.005em; color:var(--ink); opacity:0;
|
|
28
|
+
text-shadow:0 4px 18px rgba(0,0,0,.85),0 2px 4px rgba(0,0,0,.95); }
|
|
29
|
+
.cap .w { display:inline-block; will-change:transform,opacity; }
|
|
34
30
|
.cap .kw { color:var(--accent2); }
|
|
35
31
|
#logo { position:absolute; top:54px; right:54px; width:90px; z-index:6; opacity:0; filter:drop-shadow(0 6px 18px rgba(0,0,0,.45)); }
|
|
36
32
|
#flash { position:absolute; inset:0; z-index:7; background:#fff; opacity:0; pointer-events:none; }
|
|
@@ -44,7 +40,6 @@
|
|
|
44
40
|
<div id="root" data-composition-id="root" data-width="1080" data-height="1920" data-start="0" data-duration="47.58">
|
|
45
41
|
<div id="plate-wrap"><video id="plate" src="tight.mp4" data-start="0" data-duration="44.58" data-media-start="0" data-track-index="0" muted playsinline></video></div>
|
|
46
42
|
<div id="grade"></div>
|
|
47
|
-
<div id="chips"></div>
|
|
48
43
|
<div id="caps"></div>
|
|
49
44
|
<img id="logo" src="logo.png" alt="brand" crossorigin="anonymous" />
|
|
50
45
|
<div id="flash"></div>
|
|
@@ -53,47 +48,43 @@
|
|
|
53
48
|
|
|
54
49
|
<!-- AUDIO: muted video + separate dialogue track. EVERY <audio> needs an id or it is silent. -->
|
|
55
50
|
<audio id="dlg" src="dialogue.m4a" data-start="0" data-duration="44.58" data-media-start="0" data-track-index="2" data-volume="1"></audio>
|
|
56
|
-
<audio id="sfx-riser" src="sfx/riser.mp3" data-start="0" data-duration="4.0" data-media-start="0" data-track-index="3" data-volume="0.30"></audio>
|
|
57
|
-
<!-- >>> whooshes on cuts (track 4
|
|
58
|
-
<audio id="sfx-
|
|
51
|
+
<audio id="sfx-riser" src="sfx/riser-high.mp3" data-start="0" data-duration="4.0" data-media-start="0" data-track-index="3" data-volume="0.30"></audio>
|
|
52
|
+
<!-- >>> whooshes on hard cuts (track 4: swoosh-high.mp3 / swoosh-low.mp3); reserve the boom-bling for the splash only (track 6) <<< -->
|
|
53
|
+
<audio id="sfx-splash" src="sfx/tiktok-boom-bling.mp3" data-start="44.58" data-duration="0.84" data-media-start="0" data-track-index="6" data-volume="0.5"></audio>
|
|
59
54
|
|
|
60
55
|
<script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script>
|
|
61
|
-
<script src="data.js"></script><!-- window.CAPTION_GROUPS = [{
|
|
56
|
+
<script src="data.js"></script><!-- window.CAPTION_GROUPS = [{start,end,words:[{w,t},...]},...] — w=word text, t=its spoken start time (from the re-transcribed cut) -->
|
|
62
57
|
<script>
|
|
63
58
|
(function () {
|
|
64
59
|
var VID = 44.58, DUR = 47.58; // >>> set <<<
|
|
65
60
|
var tl = gsap.timeline({ paused: true });
|
|
66
61
|
|
|
67
|
-
var
|
|
68
|
-
|
|
69
|
-
function capHTML(t){return t.split(" ").map(function(w){return KW[w.toLowerCase()]?'<span class="kw">'+w+'</span>':w;}).join(" ");}
|
|
62
|
+
var KW = { /* lowercase keyword (alphanumeric only) -> true, to highlight gold, e.g. arca:true */ };
|
|
63
|
+
function isKW(w){ return !!KW[w.toLowerCase().replace(/[^a-z0-9]/g,"")]; }
|
|
70
64
|
|
|
71
|
-
// ---- CAPTIONS:
|
|
65
|
+
// ---- CAPTIONS: word-by-word pop-on, gold keywords, ONE group on screen at a time ----
|
|
72
66
|
var caps = document.getElementById("caps"), G = window.CAPTION_GROUPS || [];
|
|
73
67
|
G.forEach(function (g, i) {
|
|
74
68
|
var el = document.createElement("div"); el.className = "cap"; el.id = "cap-" + i;
|
|
75
|
-
|
|
69
|
+
caps.appendChild(el);
|
|
76
70
|
var next = (i + 1 < G.length) ? G[i + 1].start : VID + 1;
|
|
77
71
|
var inAt = Math.max(0, g.start - 0.05);
|
|
78
72
|
var hardOut = Math.min(g.end + 0.1, next - 0.07);
|
|
79
73
|
if (hardOut <= inAt + 0.12) hardOut = inAt + 0.12;
|
|
80
|
-
tl.
|
|
81
|
-
|
|
74
|
+
tl.set(el, { opacity:1, visibility:"visible" }, inAt);
|
|
75
|
+
(g.words || []).forEach(function (wd, j) {
|
|
76
|
+
var s = document.createElement("span"); s.className = "w";
|
|
77
|
+
s.innerHTML = isKW(wd.w) ? '<span class="kw">' + wd.w + '</span>' : wd.w;
|
|
78
|
+
el.appendChild(s); el.appendChild(document.createTextNode(" "));
|
|
79
|
+
// each word pops on at its own spoken time (fallback: group start)
|
|
80
|
+
var at = Math.min(hardOut - 0.05, Math.max(inAt, wd.t != null ? wd.t : g.start));
|
|
81
|
+
tl.fromTo(s, { opacity:0, scale:0.6, y:14 }, { opacity:1, scale:1, y:0, duration:0.16, ease:"back.out(2)" }, at);
|
|
82
|
+
});
|
|
83
|
+
tl.to(el, { opacity:0, duration:0.14, ease:"power2.in" }, Math.max(inAt + 0.1, hardOut - 0.14));
|
|
82
84
|
tl.set(el, { opacity:0, visibility:"hidden" }, hardOut);
|
|
83
85
|
});
|
|
84
86
|
|
|
85
|
-
// ---- CHIPS:
|
|
86
|
-
var CHIPS = [ /* { s, e, icon:"dot", text:"LABEL", alt:false } */ ];
|
|
87
|
-
var chips = document.getElementById("chips");
|
|
88
|
-
CHIPS.forEach(function (c, i) {
|
|
89
|
-
var el = document.createElement("div"); el.className = "chip" + (c.alt ? " alt" : ""); el.id = "chip-" + i;
|
|
90
|
-
el.innerHTML = '<span class="badge">' + (ICONS[c.icon]||"") + '</span><span class="ctext">' + c.text + '</span>';
|
|
91
|
-
chips.appendChild(el);
|
|
92
|
-
tl.fromTo(el, { opacity:0, scale:0.8, y:24 }, { opacity:1, scale:1, y:0, duration:0.42, ease:"back.out(1.6)" }, c.s);
|
|
93
|
-
tl.to(el, { y:-6, duration:(c.e - c.s) - 0.42, ease:"sine.inOut" }, c.s + 0.42);
|
|
94
|
-
tl.to(el, { opacity:0, scale:0.92, duration:0.3, ease:"power2.in" }, c.e - 0.3);
|
|
95
|
-
tl.set(el, { opacity:0, visibility:"hidden" }, c.e);
|
|
96
|
-
});
|
|
87
|
+
// ---- CHIPS removed from default — use native treatments: zoom punch-ins, speed ramps, hard cuts, SFX hits ----
|
|
97
88
|
|
|
98
89
|
// ---- ZOOM PUNCH-INS on the plate (never below 1.0) ----
|
|
99
90
|
var W = "#plate-wrap", BASE = 1.04;
|
|
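The `data.js` shape the template reads (`window.CAPTION_GROUPS = [{start,end,words:[{w,t},...]},...]`) can be produced from the re-transcribed cut. The following is a rough sketch under stated assumptions, not the kit's own tooling: it assumes the transcript is the `[{text,start,end}]` list described in the shorts-editor pipeline, and the helper names `group_words` and `to_data_js` plus the simple "break on sentence punctuation or at five words" heuristic are hypothetical.

```python
import json
from typing import Dict, List


def group_words(words: List[Dict], min_words: int = 3, max_words: int = 5) -> List[Dict]:
    """Group word-level timestamps into caption groups of roughly 3-5 words.

    A group closes when it reaches `max_words`, when a word ends a sentence
    and the group already has `min_words`, or when the transcript ends
    (so the final group may be shorter than `min_words`).
    """
    groups: List[Dict] = []
    current: List[Dict] = []
    for index, word in enumerate(words):
        current.append(word)
        ends_sentence = word["text"].rstrip().endswith((".", "!", "?"))
        is_last = index == len(words) - 1
        if (
            len(current) >= max_words
            or (ends_sentence and len(current) >= min_words)
            or is_last
        ):
            groups.append(
                {
                    "start": current[0]["start"],
                    "end": current[-1]["end"],
                    # w = word text, t = its spoken start time
                    "words": [{"w": w["text"].strip(), "t": w["start"]} for w in current],
                }
            )
            current = []
    return groups


def to_data_js(groups: List[Dict]) -> str:
    """Serialize the groups as the global the template expects."""
    return "window.CAPTION_GROUPS = " + json.dumps(groups, ensure_ascii=False) + ";\n"
```

Writing the returned string to `data.js` next to the composition would give the caption loop its `g.start`, `g.end`, and per-word `wd.t` values; keyword highlighting stays in the template's `KW` map.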
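package/skills/shorts-editor/sfx/ding.mp3
Binary file
|
|
package/skills/shorts-editor/sfx/glitch.mp3
Binary file
|
|
package/skills/shorts-editor/sfx/riser-high.mp3
Binary file
|
|
package/skills/shorts-editor/sfx/sad-violin.mp3
Binary file
|
|
package/skills/shorts-editor/sfx/swoosh-high.mp3
Binary file
|
|
package/skills/shorts-editor/sfx/swoosh-low.mp3
Binary file
|
|
package/skills/shorts-editor/sfx/tiktok-boom-bling.mp3
Binary file
|
|
package/skills/shorts-editor/sfx/wrong.mp3
Binary file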
|
package/skills/storyboard-prompt/SKILL.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: storyboard-prompt
|
|
3
|
-
description: Use when pressure-testing a short-form video idea and turning it into a 3×3 (or larger) storyboard for TikTok / Reels / Shorts of any video type (UGC, cinematic, animation, etc.) — brutal virality scoring, a 10-route hook lab, a polish pass that makes the idea catchy and engaging with a strong hook and brand-aligned messaging, a first-5-seconds cold open, a clarification checkpoint before image generation, then TWO deliverables: a clean frames-only grid image (no text/notes, easy to crop) plus a text breakdown (concept,
|
|
3
|
+
description: Use when pressure-testing a short-form video idea and turning it into a 3×3 (or larger) storyboard for TikTok / Reels / Shorts of any video type (UGC, cinematic, animation, etc.) — brutal virality scoring, a 10-route hook lab, a polish pass that makes the idea catchy and engaging with a strong hook and brand-aligned messaging, a first-5-seconds cold open, a clarification checkpoint before image generation, then TWO deliverables: a clean frames-only grid image (no text/notes, easy to crop) plus a text breakdown (concept, named characters + 1-line bibles, a required 5-column Frame·Time·Visual·Dialogue·Direction flow table, editor notes, style notes). Triggers on "storyboard this idea", "is this video concept good", "plan a short". Part of the arca-marketing-video kit.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
6
|
# Storyboard Prompt
|
|
@@ -729,22 +729,37 @@ croppable. Use these sections:
|
|
|
729
729
|
|
|
730
730
|
- VIDEO CONCEPT — one tight paragraph: the polished concept + logline, the video type / style, the
|
|
731
731
|
target viewer, the core promise, the chosen cold-open strategy, and the recommended length.
|
|
732
|
-
-
|
|
733
|
-
-
|
|
734
|
-
-
|
|
735
|
-
-
|
|
736
|
-
|
|
732
|
+
- CHARACTERS — name EVERY recurring character (give each a short memorable name, e.g. "Maya the
|
|
733
|
+
Navigator") and write a ONE-LINE bible for each: face / age / build, hair, wardrobe, and 1–2
|
|
734
|
+
distinguishing features. This is what makes identity-locking trivial downstream — `video-prompt`
|
|
735
|
+
pastes these bibles verbatim into every shot. Even a one-person video needs its character named + a
|
|
736
|
+
bible line. If no character recurs (pure product / motion-graphics), say so in one line.
|
|
737
|
+
- FLOW — the beat-by-beat shot flow as an EXPLICIT 5-COLUMN TABLE, one row per frame (this table is a
|
|
738
|
+
REQUIRED handoff, never optional — your written dialogue + direction drive the acting far better than
|
|
739
|
+
lines inferred downstream). Columns:
|
|
740
|
+
| Frame | Time | Visual | Dialogue | Direction |
|
|
741
|
+
- **Frame** — number (1–9, or however many) + its retention role + shot / angle / move.
|
|
742
|
+
- **Time** — the timestamp range.
|
|
743
|
+
- **Visual** — what happens in the frame (action, who's on screen, key prop).
|
|
744
|
+
- **Dialogue** — the actual spoken line for that beat, written out. Use "—" only for true silent
|
|
745
|
+
beats; do NOT leave it blank to be filled in later. These exact lines get spoken (native audio
|
|
746
|
+
nails scripted lines), so write them tight and in-character.
|
|
747
|
+
- **Direction** — performance / delivery note that drives the acting (e.g. "deliver softly, almost
|
|
748
|
+
inspirational", "rushed, glancing off-camera", "deadpan, then a tiny smirk"). Always fill this —
|
|
749
|
+
it is the highest-value column and the reason this table is mandatory.
|
|
737
750
|
- VIDEO EDITOR NOTES — what to add in the EDIT, not in the frames: suggested on-screen caption per
|
|
738
751
|
beat, cut / transition style, SFX and music, pacing, zoom punch-ins, and where the logo / brand
|
|
739
752
|
splash lands. (These feed the `shorts-editor` skill.)
|
|
740
753
|
- STYLE NOTES — the look the final video must match: video type, lighting, camera feel, color,
|
|
741
|
-
wardrobe / prop continuity
|
|
742
|
-
(These feed the `video-prompt` skill.)
|
|
754
|
+
wardrobe / prop continuity. Reference the CHARACTERS section above for who must stay consistent
|
|
755
|
+
across frames. (These feed the `video-prompt` skill.)
|
|
743
756
|
- BRAND / LOGO NOTES — which frames the supplied logo appears in and how (as a subtle in-world prop),
|
|
744
757
|
per Phase 7.
|
|
745
758
|
|
|
746
|
-
Keep it tight and scannable.
|
|
747
|
-
|
|
759
|
+
Keep it tight and scannable. The FLOW table's Dialogue + Direction columns are REQUIRED (write the
|
|
760
|
+
real lines and the delivery notes — don't defer them); only a genuinely silent beat gets a "—" in
|
|
761
|
+
Dialogue. The clean frame grid (Phase 8) plus this text breakdown are the two deliverables — never
|
|
762
|
+
merge the text into the image.
|
|
748
763
|
|
|
749
764
|
PHASE 9: ANTI-AI AND ANTI-CINEMA QUALITY GATE
|
|
750
765
|
|
package/skills/video-prompt/SKILL.md
CHANGED
|
@@ -26,6 +26,40 @@ Before generating, ask the user for any of these that are not already provided,
|
|
|
26
26
|
10. NATIVE AUDIO — whether the video model should synthesize dialogue/sound (only some models support it). Default: on if the chosen model supports `sound`.
|
|
27
27
|
Ask first. Make smart assumptions only for whatever is still missing, and state them briefly. If the user names a budget or "cheapest/fastest", pick the model tier accordingly and say which you picked.
|
|
28
28
|
|
|
29
|
+
MODEL CAPABILITY MATRIX — decide this in the FIRST intake round (don't discover it mid-build)
|
|
30
|
+
The question that always surfaces late — "shouldn't the storyboard be a reference image for the video
|
|
31
|
+
model?" — must be answered up front, because the answer changes the whole graph. Reconfirm live with
|
|
32
|
+
`mcp__wyren__list_models` + `get_model_capabilities`, but the load-bearing facts:
|
|
33
|
+
|
|
34
|
+
| Model | Ref images | Native audio | Multishot | Start frame | Duration |
|
|
35
|
+
| --- | --- | --- | --- | --- | --- |
|
|
36
|
+
| **Kling V3** (default) | **0** (none) | yes (`sound`) | yes (≤6) | yes | 3–15s |
|
|
37
|
+
| Kling O1 | yes (≤7) | **no** | **no** | yes | — |
|
|
38
|
+
| Kling V2.6 | 0 | yes (+voice clone) | no | yes | 5/10s |
|
|
39
|
+
| Veo 3.1 Fast/Std | ≤3 | yes | no | yes | 4/6/8s |
|
|
40
|
+
| Seedance 2.0 / Fast | ≤4 | no | yes | yes | 4–15s (480/720p) |
|
|
41
|
+
|
|
42
|
+
- **Kling V3 takes ZERO reference images** (`maxReferenceImages: 0`). So the storyboard does NOT feed
|
|
43
|
+
the video model as a reference — it drives the IMAGE stage only (it becomes / designs the start
|
|
44
|
+
frame). Identity on V3 comes from the per-shot **startFrame + the verbatim character bible** in each
|
|
45
|
+
shot prompt, never from a reference image.
|
|
46
|
+
- **Among the Kling models, only O1 accepts reference images**, but it LOSES native audio AND multishot. So choosing O1
|
|
47
|
+
for face-locking means giving up scripted dialogue + in-generation angle changes. Usually not worth
|
|
48
|
+
it: prefer V3 (audio + multishot) with startFrame-driven identity.
|
|
49
|
+
- **Kling V3 Omni is disabled** — do not route to it.
|
|
50
|
+
- Net: for a dialogue UGC skit, default = **Kling V3**, identity via startFrame + bible, audio native.
|
|
51
|
+
Reach for a ref-image model (O1/Veo/Seedance) only when face-lock genuinely beats audio+multishot.
|
|
52
|
+
|
|
53
|
+
DECISION RULE — multishot vs separate clips:
|
|
54
|
+
- **Many quick dialogue beats + native audio → ONE Kling V3 multishot per part** (not many tiny
|
|
55
|
+
separate clips). Multishot keeps the face/voice continuous and gives angle changes in one generation.
|
|
56
|
+
- Watch the math: **Kling's minimum clip is 3s**, and a multishot maxes at 15s, so a single multishot
|
|
57
|
+
merge caps at **~5 beats** (5 × 3s = 15s). If a part has more than ~5 dialogue beats, split into a
|
|
58
|
+
second multishot/part rather than crushing beats below 3s.
|
|
59
|
+
- **Native audio nails the exact scripted lines** — for a dialogue skit you do NOT need a separate TTS
|
|
60
|
+
pass. Write the real lines into each shot's prompt (the storyboard FLOW table's Dialogue column) and
|
|
61
|
+
let the model speak them. Only fall back to TTS/editor-added VO when the chosen model has no `sound`.
|
|
62
|
+
|
|
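The beat math in the decision rule above can be worked through in a few lines. This is an illustrative sketch only: the 3s minimum and 15s maximum come from the rule itself, while the function names and the choice to balance beats evenly across parts (rather than filling the first part to the cap) are assumptions.

```python
import math
from typing import List

MIN_SHOT_S = 3.0       # minimum clip length stated in the decision rule
MAX_MULTISHOT_S = 15.0  # maximum multishot length stated in the decision rule


def max_beats_per_multishot(
    min_shot: float = MIN_SHOT_S, max_total: float = MAX_MULTISHOT_S
) -> int:
    """How many minimum-length beats fit in one multishot generation."""
    return int(max_total // min_shot)


def split_into_parts(beats: List[str], cap: int) -> List[List[str]]:
    """Split beats into the fewest parts of at most `cap` beats, evenly sized."""
    if not beats:
        return []
    n_parts = math.ceil(len(beats) / cap)
    base, extra = divmod(len(beats), n_parts)
    parts: List[List[str]] = []
    cursor = 0
    for part_index in range(n_parts):
        size = base + (1 if part_index < extra else 0)
        parts.append(beats[cursor:cursor + size])
        cursor += size
    return parts


if __name__ == "__main__":
    cap = max_beats_per_multishot()
    beats = [f"beat-{n}" for n in range(1, 8)]  # seven dialogue beats
    print(cap)                                            # 5
    print([len(p) for p in split_into_parts(beats, cap)])  # [4, 3]
```

Seven beats therefore become two multishot parts of four and three beats, each leaving room for every beat to run at least 3s.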
29
63
|
Use the storyboard as the PLAN, not as footage to copy. First classify it (see STORYBOARD
|
|
30
64
|
INTERPRETATION below): if the panels are photographic, clean them and use them as start frames; if
|
|
31
65
|
they are schematic / annotated mockups (panel numbers, notes boxes, drawn phone bezels, mock UI), do
|
|
@@ -196,6 +230,15 @@ Wyren execution flow (per the wyren skill's policy — load it before any `mcp__
|
|
|
196
230
|
brand splash / end card, zoom/SFX timing, and master. That is the ONLY place HyperFrames is used, and
|
|
197
231
|
only for captions + splash + timing — never to composite text/UI graphics onto the footage.
|
|
198
232
|
|
|
233
|
+
WYREN BUILD GOTCHAS (these cost failed `validate_workflow` / `run_workflow` calls — get them right):
|
|
234
|
+
- **`multiPrompt` must be a JSON STRING, not an array.** When using video-model multishot, pass the
|
|
235
|
+
per-shot prompts as a stringified JSON value (e.g. `"[{...},{...}]"`), not a raw array. A raw array
|
|
236
|
+
fails validation.
|
|
237
|
+
- **`imageAI` / `videoAI` need a CONNECTED TEXT EDGE — `customPrompt` alone fails.** Wire a text/prompt
|
|
238
|
+
node into the AI node's prompt input; a `customPrompt` field set on the node without an incoming text
|
|
239
|
+
edge does not satisfy validation. Build the graph so every imageAI/videoAI has its prompt edge
|
|
240
|
+
connected, then set the prompt content.
|
|
241
|
+
|
|
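The first gotcha above, passing `multiPrompt` as a JSON string rather than an array, amounts to one serialization step. The sketch below shows only that step. The per-shot object schema is not given in this diff, so the `prompt` and `duration` keys, the `build_multi_prompt` helper, and the `node_inputs` dict are all hypothetical placeholders, not the Wyren API.

```python
import json
from typing import Dict, List


def build_multi_prompt(shots: List[Dict]) -> str:
    """Stringify the per-shot prompt list so the value is a JSON string."""
    return json.dumps(shots, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical per-shot objects; the real field names may differ.
    shots = [
        {"prompt": "Maya looks up from her phone, deadpan.", "duration": 3},
        {"prompt": "Close-up: a tiny smirk.", "duration": 3},
    ]
    multi_prompt = build_multi_prompt(shots)

    # Hypothetical node inputs: the value is a str, not a list.
    node_inputs = {"multiPrompt": multi_prompt}
    print(type(node_inputs["multiPrompt"]).__name__)  # str
```

Round-tripping the string with `json.loads` before submitting is a cheap local check that it still decodes to the intended shot list.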
199
242
|
RECURRING CHARACTER CONSISTENCY (multishot / multi-clip)
|
|
200
243
|
|
|
201
244
|
Any time the video is more than one shot — a multi-clip split (Part 1 / Part 2), or video-model
|