arca-marketing-video 2.24.0 → 2.26.0

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "arca-marketing-video",
3
- "version": "2.24.0",
3
+ "version": "2.26.0",
4
4
  "repository": {
5
5
  "type": "git",
6
6
  "url": "git+https://github.com/briarbearrr/arca-marketing.git"
@@ -105,6 +105,10 @@ Generate each slide as a separate standalone image. Do NOT generate all slides i
105
105
  ### GENERATE ONE SLIDE AT A TIME (hard rule + workflow)
106
106
  **Never render the slides together. One render = one slide.** No grid, no contact sheet, no 2×2 / 3×3 board, no "all slides in one image", no multi-panel composite, no brand board. A combined image is an instant fail — discard it and re-render as separate slides. (Asking an image model for "the whole carousel" reliably produces a cramped grid with unreadable text; one slide per render is the only way to get full-resolution, legible, mobile-ready slides.)
107
107
 
108
+ **Image generation is NOT unavailable — a collage is a WORKFLOW bug, not a tool limit.** If you get a grid/board, never tell the user image generation is unsupported or disabled. The tool works; the prompt method was wrong. Fix the method and regenerate — never downgrade or abandon the deliverable over it.
109
+
110
+ **The wording trap — these phrases make the model emit a collage:** "generate a batch", "the carousel", "the set", "five slides", "all slides", "a slide preview". To an image model, "a batch of slides" = "put all slides in one image." So never say them. Don't *batch* — **chain**: think of it as "generate the carousel as separate images, one image-generation call per slide, chained sequentially from Slide 1 to Slide N." One prompt = exactly one slide, always.
111
+
108
112
  **Render sequentially, anchoring every slide to Slide 1:**
109
113
  1. **Render Slide 1 first** as a single standalone 4:5 image. This locks the carousel's look — palette, type hierarchy, layout rhythm, safe margins, logo placement, illustration style. Review it; re-render until it's right BEFORE touching the rest (everything inherits from it).
110
114
  2. **Render Slides 2…N one at a time**, and feed the rendered Slide 1 back into the image tool as a **style reference** for each (optionally the immediately-previous slide too). Separate per-slide renders drift in style/palette/type unless every slide is anchored to Slide 1 — passing it as a reference is what makes independent renders read as one cohesive carousel.
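The "chain, don't batch" rule above can be enforced mechanically by building each slide prompt separately and linting it for the collage-trigger phrases. This is a minimal sketch, not part of the package: `find_collage_triggers` and `build_slide_prompt` are hypothetical helper names, the image-generation call itself is left out, and the check is a naive substring match that can produce false positives (for example "the settings" contains "the set").

```python
from typing import List

# Phrases the skill text says push an image model toward a collage.
COLLAGE_TRIGGERS = (
    "generate a batch",
    "the carousel",
    "the set",
    "five slides",
    "all slides",
    "a slide preview",
)


def find_collage_triggers(prompt: str) -> List[str]:
    """Return every trigger phrase found in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [phrase for phrase in COLLAGE_TRIGGERS if phrase in lowered]


def build_slide_prompt(slide_number: int, brief: str) -> str:
    """Build a prompt for exactly one slide and refuse collage-prone wording."""
    prompt = (
        "Create one standalone Instagram carousel slide only: "
        f"this image contains ONLY Slide {slide_number}.\n"
        "Do NOT create a collage, grid, contact sheet, multi-panel board, "
        "or slide preview.\n"
        "Full-frame single 4:5 portrait image, 1080 × 1350 px.\n"
        f"{brief}"
    )
    hits = find_collage_triggers(prompt)
    if hits:
        raise ValueError(f"Slide {slide_number} prompt has collage triggers: {hits}")
    return prompt


# One prompt (and later one render call) per slide, in order from Slide 1.
briefs = ["Headline: Stop guessing your budget", "Headline: Track one number"]
prompts = [build_slide_prompt(n, brief) for n, brief in enumerate(briefs, start=1)]
```

Each entry in `prompts` would then go to its own image-generation call, with the rendered Slide 1 passed back as the style reference for the later slides.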
@@ -187,7 +191,9 @@ When writing image-gen prompts, do not list every possible brand object — the
187
191
 
188
192
  **Standard prompt structure (every slide):**
189
193
 
190
- > Create one standalone Instagram portrait carousel slide, 4:5 ratio, 1080 × 1350 px.
194
+ > Create one standalone Instagram carousel slide only: this image contains ONLY Slide [N].
195
+ > Do NOT create a collage, grid, contact sheet, multi-panel board, or slide preview.
196
+ > Full-frame single 4:5 portrait image, 1080 × 1350 px.
191
197
  >
192
198
  > Primary goal: Make the message readable first. The design should support the headline, not overpower it.
193
199
  >
@@ -328,6 +334,6 @@ Check every slide:
328
334
  12. Did every image prompt end with the RENDER-ONLY enforcement line?
329
335
  13. **Hook test:** does Slide 1 open a curiosity loop that makes the swipe feel irresistible — not a generic "X tips" listicle or a self-contained statement that needs no swipe?
330
336
  14. **Swipe-momentum:** does every slide end with a reason to keep swiping (open loop / cliffhanger / progression), all the way to the CTA?
331
- 15. **One-at-a-time render:** is each slide its own standalone image (no grid/board), rendered Slide 1 first with Slide 1 fed back as the style reference for the rest?
337
+ 15. **One-at-a-time render:** is each slide its own standalone image (no grid/board), rendered Slide 1 first with Slide 1 fed back as the style reference for the rest? Does every prompt explicitly block collage/grid/board and contain only one slide (no "batch / the carousel / the set / all slides" wording)? If a collage came out, was it discarded and regenerated — never shipped or "continued" — and never blamed on image-gen being unavailable?
332
338
 
333
339
  If any answer is no, fix before outputting — sharpen the hook, simplify the slide, or re-render. When unsure on visuals, remove the element — restraint always wins; when unsure on copy, make it more specific and open the loop wider.
@@ -14,7 +14,7 @@ Built on HyperFrames + ffmpeg + faster-whisper.
14
14
  ## Brand profile (read first)
15
15
  Read `../_arca-marketing-assets/brand.md` for the brand's colors, logo rules, and persona. The brand-splash end card
16
16
  uses `../_arca-marketing-assets/assets/final-cta.png`; the logo (`../_arca-marketing-assets/assets/logo.png`) may appear as a subtle
17
- in-world mark. `silence_cut.py` and `composition.template.html` are co-located in this skill folder.
17
+ in-world mark. **Never place the logo in the TOP-RIGHT corner** — put the watermark bottom-left or bottom-right (subtle, small), and keep it out of the face and the platform UI safe zone. `silence_cut.py` and `composition.template.html` are co-located in this skill folder.
18
18
  This skill is the final edit stage after `video-prompt` (or runs standalone on any raw footage).
19
19
 
20
20
  ## Overview
@@ -56,9 +56,10 @@ All paths below are inside the project dir: source from `clips/`, working files
56
56
 
57
57
  **Mind the tail / SFX.** AI clips often land the FINAL WORD right at the out-point, so the ~0.1s pad after the last word matters, and **never land a transition SFX on a cut where a word ends** (it steps on the last word — see the SFX layer).
58
58
  4. **Re-transcribe the ASSEMBLED cut** (`edit/tight.mp4` — for raw footage OR an assembled AI-clip video), regroup into caption phrases (sentence-aware, 3-5 words). The silence-cut shifts every timestamp, so always re-transcribe and recompute **caption, zoom, SFX, and splash** timing from the NEW boundaries; never remap old times.
59
- 5. **Build the composition** in `edit/composition/` from `./composition.template.html`: muted plate + separate dialogue audio + word-pop captions + zooms + logo + splash + SFX (no chips by default). Lint clean.
60
- 6. **Draft render** (`--quality draft`) → `edit/frames/`, extract frames at every caption/splash beat, eyeball, fix. Then **`--quality high`** and **master** into `out/`:
59
+ 5. **Scan faces, then build the composition.** FIRST sample frames across the cut (`ffmpeg -i edit/tight.mp4 -vf fps=2 edit/faces/f_%03d.png`) and READ them to note which vertical band each scene's face(s) occupy (see FACE-SAFE PLACEMENT) — faces move between shots, so map them per scene. THEN build `edit/composition/` from `./composition.template.html` (muted plate + dialogue audio + word-pop captions + zooms + logo + splash + SFX, no chips by default), placing captions and EVERY graphic in a band that clears the detected faces. Lint clean.
60
+ 6. **Draft render + FACE-SAFE CHECK** (`--quality draft`) → `edit/frames/`: extract a frame at EVERY caption / chip / logo / splash beat, READ each one, and confirm NO caption, figure, logo, or graphic overlaps a face. If any does, move that element (or nudge the plate up via the cover-fit transform) and re-render — do NOT proceed to the high render until every overlay clears every face. Then **`--quality high`** and **master** into `out/`:
61
61
  `ffmpeg -i edit/raw.mp4 -c:v copy -af "loudnorm=I=-14:TP=-1.5,alimiter=limit=0.95" -c:a aac -b:a 192k out/<slug>-final.mp4`
62
+ Master in a SINGLE video encode and `-c:v copy` on the mux — re-encoding the video across multiple passes stacks compression artifacts. To trim a span out of a FINISHED master, **ripple-cut** (cut video + audio + baked captions together so they stay in sync) — valid only if NO caption is mid-display across the cut window.
62
63
 
63
64
  ## The silence-cut (the part that is easy to get wrong)
64
65
  **Run on BOTH raw recordings and assembled AI-clip videos** (see Pipeline step 3). Kling pads every generated clip with ~1–2s of dead air, so AI assemblies need this just as much as raw footage — for AI clips tune sensitive (`--cut-min 0.35`, noise floor `-36 dB`).
@@ -68,6 +69,8 @@ Neither signal alone works:
68
69
 
69
70
  **Cut where EITHER is true:** acoustic silence (`silencedetect=noise=-33dB:d=0.35`) OR a transcript word-gap > 0.45s. Only remove the dead middle when it exceeds ~0.5s, leave ~0.1s of speech pad each side, and KEEP sub-0.5s gaps (natural rhythm — cutting every micro-gap machine-guns the edit and looks glitchy). Add a 10ms `afade` at each join to kill clicks. **Verify with silencedetect on the OUTPUT** — it should show only short breath gaps. `./silence_cut.py` implements all of this; tune `--cut-min`. **For AI-clip assemblies** (Kling et al.), the padding is quiet near-silent breath that `-33 dB` can miss — drop the noise floor to `-36 dB` and `--cut-min 0.35` to catch it.
70
71
 
72
+ **Don't word-gap-cut a foley/VO/music format** (e.g. a trailer with no continuous dialogue). Word-gap detection assumes speech, so on speechless foley beats it reads the whole clip as a "gap" and guts it. For those formats, SKIP the word-gap cut: trim each Kling clip's head pad MANUALLY and time captions analytically (see Caption standard) instead.
73
+
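For dialogue footage, the transcript half of the cut rule above can be sketched as follows. This is not the shipped `silence_cut.py`: `word_gap_cuts` is a hypothetical name, the threshold is applied to the full word gap (an assumption about how `--cut-min` is measured), and the acoustic `silencedetect` signal, the union of the two signals, and the 10ms `afade` at each join are not shown.

```python
from typing import List, Tuple


def word_gap_cuts(
    words: List[Tuple[float, float]],
    cut_min: float = 0.5,
    pad: float = 0.1,
) -> List[Tuple[float, float]]:
    """Return (cut_start, cut_end) spans to remove between consecutive words.

    words: (start, end) times in seconds, sorted by start time.
    Gaps at or below cut_min are kept as natural speech rhythm.
    """
    cuts = []
    for (_, prev_end), (next_start, _) in zip(words, words[1:]):
        gap = next_start - prev_end
        if gap > cut_min:
            # Leave `pad` seconds of breathing room on each side of the cut.
            cuts.append((round(prev_end + pad, 3), round(next_start - pad, 3)))
    return cuts


words = [(0.0, 0.4), (0.6, 1.0), (2.0, 2.5)]
cuts = word_gap_cuts(words)
```

Here the 0.2s gap survives and only the 1.0s gap is trimmed, which matches the "keep sub-0.5s gaps" guidance. On a speechless foley clip the word list would be empty or nearly so, which is why this signal should be skipped for those formats.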
71
74
  ## Layers (all face/caption-safe)
72
75
  - **Captions — word-by-word pop-on (the default):** each WORD pops in as it's spoken (Anton, UPPERCASE,
73
76
  brand-gold keywords, NO pill backing — stroke + shadow for legibility). This is the canonical standard;
@@ -78,6 +81,12 @@ Neither signal alone works:
78
81
  zoom punch-ins, speed ramps / hold-frames, the word-pop captions themselves (gold keyword emphasis),
79
82
  hard cuts on the beat, and SFX hits. Only add a chip if the user explicitly asks for an on-screen
80
83
  label, and keep it minimal. The chip CSS/JS is removed from the default template.
84
+ - **In-world / screen graphics are NOT an editor job — generate them upstream.** Anything that belongs ON a
85
+ surface in the scene (a laptop/phone screen's content, an in-scene poster, a product label, a "now loading"
86
+ promo) must be generated DIEGETICALLY inside the video clip by `video-prompt`, never composited here.
87
+ Pasted in the edit it floats in mid-air, covers faces, and looks fake — e.g. a "SYNERGIZE YOUR VIBES" card
88
+ meant for the laptop screen ends up hovering over the whole team. The editor only adds captions, the brand
89
+ splash / end card, and (rarely, if asked) a minimal face-safe chip — nothing that's supposed to live in the scene.
81
90
  - **Zoom punch-ins:** scale the plate wrapper (base ~1.04) to ~1.10-1.14 on emphasis lines, ease back. Never scale below 1.0 (reveals letterbox edges). Cover-fit the plate.
82
91
  - **SFX:** a curated set ships in `./sfx/` — use these first (no download needed). Keep dialogue front (SFX vol 0.25-0.35); every `<audio>` needs an `id`. **Never land a transition/whoosh SFX on a cut where a word ends** — it steps on the last word; put the hit on a silent beat or at the head of the next clip (this is why the silence-cut leaves ~0.1s pad after the last word, see Pipeline step 3). Mapping:
83
92
  | Role | File |
@@ -86,7 +95,6 @@ Neither signal alone works:
86
95
  | Hard cut / speaker change | `./sfx/swoosh-high.mp3`, `./sfx/swoosh-low.mp3` |
87
96
  | Chip entrance / key reveal (pop) | `./sfx/ding.mp3` |
88
97
  | Brand-splash signature hit (reserve for splash only) | `./sfx/tiktok-boom-bling.mp3` |
89
- | Glitch / error beat | `./sfx/glitch.mp3` |
90
98
  | "Wrong"/mistake beat | `./sfx/wrong.mp3` |
91
99
  | Comedic deflation | `./sfx/sad-violin.mp3` |
92
100
  Need something not here? Mixkit free SFX (`mixkit.co/free-sound-effects/<cat>/` → `assets.mixkit.co/.../<id>-preview.mp3`) is a reliable no-key source.
@@ -103,10 +111,13 @@ Captions exist to (a) make the video legible sound-off, (b) hold retention with
103
111
  (floating glass pills are the #1 "AI-looking" tell).
104
112
  - Captions are the **wording source of truth** — they carry the exact script even if native/synth audio
105
113
  drops a word.
106
- - Never cover the face — captions live in the lower third; any other graphic stays lower-mid or a top strip.
114
+ - Never cover the face — DETECT where the face actually sits per scene and place captions/graphics in a band that clears it (see FACE-SAFE PLACEMENT). The lower-third default only holds when the face is high in frame; if the face sits low, raise the captions or push the plate up.
107
115
  - Derive timing from **word-level transcription of the FINAL cut** (re-transcribe after any cut, including
108
116
  the AI-clip silence-cut). The same re-transcription drives caption, zoom, SFX, and splash timing — recompute
109
117
  them all from the new boundaries. Never hand-guess or remap old times.
118
+ - **Noisy mixed audio (music + foley + VO): don't transcribe the MIX** — source timing per layer. Narrator
119
+ captions from the `voiceAI` word-alignment (plus the clip's offset); character / native lines by transcribing
120
+ the **native-audio-only cut** (a separate export with just the dialogue track), not the final mix.
110
121
 
111
122
  **Font / size / layout:** Anton (or a heavy grotesk like Archivo Black for premium brands), UPPERCASE by
112
123
  default (sentence case only for a strictly soft/premium voice), one caption font for the whole video,
@@ -143,6 +154,18 @@ Keep sub-0.5s natural speech rhythm — don't machine-gun a separate caption ont
143
154
  cross-dissolves / cinematic fades · ❌ tiny text, thin weights, or a generic system font · ❌ everything
144
155
  (or nothing) highlighted · ❌ captions over the face or in the platform UI zone.
145
156
 
157
+ ## FACE-SAFE PLACEMENT (captions + every graphic must clear faces)
158
+ The #1 overlay failure is text or a figure landing ON someone's face. "Captions live in the lower third" only holds when the face sits HIGH in frame — but talking-head / UGC framing varies wildly (desk-level POV, low or off-center framing, two people, a face near the bottom, or a subject reframed by a zoom punch-in). A blind lower-third caption then covers the face. So DETECT, don't assume:
159
+
160
+ - **Find the face before placing anything.** Sample frames across each scene (`ffmpeg -i edit/tight.mp4 -vf fps=2 edit/faces/f_%03d.png`) and READ them to note which vertical band each scene's face(s) occupy. Faces MOVE between shots — map them per scene/segment, never once for the whole video.
161
+ - **Place overlays in a band that clears the face**, keeping a margin of ≥8–10% of frame height around the face:
162
+ - Face high / centered (common case) → captions in the lower third (default baseline ~300px from bottom).
163
+ - Face LOW (in the lower third) → RAISE the captions to an upper band / top strip, OR push the plate up via the cover-fit / zoom transform so the face leaves the caption band. Never drop text onto a low face.
164
+ - Two faces, or a face off to one side → use the genuinely clear band (top strip, or the empty side); never straddle a face.
165
+ - **Every inserted element obeys this — not just captions:** word-pop captions, any chip / label, the logo watermark, engagement figures / graphics, and the splash. Each keeps the same face margin AND stays out of the platform UI safe zone (bottom ~10%).
166
+ - **If no band is safe** at a given moment (a face fills the frame), shrink or move the element, delay it to a face-clear beat, or drop it — covering the face is never acceptable.
167
+ - **Verify on real frames, not by assumption** (Pipeline step 6): extract the frame at every overlay beat, look, and move anything touching a face before the high render. When the face moves into the caption band mid-segment, re-place the captions (or the plate) for that segment and re-derive timing.
168
+
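The band choice described in the list above can be sketched as a small decision function. Everything numeric here is an assumption for illustration: the band limits are hypothetical fractions of frame height measured from the top, and the face extent is whatever was read off the sampled frames (the source does not prescribe a detector).

```python
from typing import Optional

# Hypothetical caption bands as fractions of frame height, measured from the top.
UPPER_BAND = (0.05, 0.25)
LOWER_BAND = (0.65, 0.90)  # stops above the bottom ~10% platform UI zone


def pick_caption_band(
    face_top: float, face_bottom: float, margin: float = 0.08
) -> Optional[str]:
    """Pick a band that clears the face by `margin`, preferring the lower third."""
    keep_out_top = face_top - margin
    keep_out_bottom = face_bottom + margin
    if keep_out_bottom <= LOWER_BAND[0]:
        return "lower"
    if keep_out_top >= UPPER_BAND[1]:
        return "upper"
    return None  # no safe band: shrink, delay, or drop the element


high_face = pick_caption_band(0.15, 0.45)
low_face = pick_caption_band(0.55, 0.85)
full_frame_face = pick_caption_band(0.10, 0.90)
```

Running this per scene rather than once per video mirrors the "faces move between shots" point. The result is only a starting placement, and the extracted-frame check in Pipeline step 6 still decides.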
146
169
  ## Gotchas
147
170
  | Symptom | Fix |
148
171
  | --- | --- |
@@ -153,11 +176,14 @@ cross-dissolves / cinematic fades · ❌ tiny text, thin weights, or a generic s
153
176
  | Two captions on screen at once | clamp hard-hide to `min(end+0.12, next.start-0.06)`, floor `inAt+0.2` (see Caption standard) |
154
177
  | Font silently falls back | Anton/Archivo etc. are NOT auto-embedded; download `.woff2` to `fonts/` + `@font-face` |
155
178
  | SFX silent in the render | every `<audio>` needs an `id` |
156
- | Graphics cover the face | keep overlays in the lower-mid band (or a top strip), never the center |
179
+ | Captions or graphics land on a face | don't assume lower-third is safe — sample frames, find the face per scene, place overlays in a band that clears it (raise captions or push the plate up when the face sits low), verify on extracted frames before the high render (see FACE-SAFE PLACEMENT) |
180
+ | Logo lands in the top-right | move the watermark to a bottom corner (subtle, small); never top-right (the template may default there — override it) |
157
181
  | Peaks clip (max 0.0 dBFS) | master the final with `loudnorm + alimiter` |
182
+ | HyperFrames duration looks off | its CLI duration summary misreports — trust `ffprobe` for true duration; the render also expects `index.html` in a project dir |
183
+ | Is the dialogue audible over music? | you can't audition audio — checkpoint the mix / levels with the user before finalizing |
158
184
  | Can't preview alpha/cutout in ffmpeg | ffmpeg 4.x can't decode VP9-alpha; verify in Chrome (canvas getImageData) |
159
185
 
160
186
  ## Files
161
187
  - `./silence_cut.py` — silence ∪ word-gap cutter: `--src --audio --transcript --out [--cut-min 0.5]`.
162
188
  - `./composition.template.html` — HyperFrames composition skeleton (plate, word-pop captions, zooms, splash, SFX; chips removed from default) with the load-bearing GSAP logic already wired.
163
- - `./sfx/` — bundled, ready-to-use SFX (riser, swooshes, ding, tiktok-boom-bling, glitch, wrong, sad-violin). See the SFX mapping above.
189
+ - `./sfx/` — bundled, ready-to-use SFX (riser, swooshes, ding, tiktok-boom-bling, wrong, sad-violin). See the SFX mapping above.
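The hard-hide clamp from the "Two captions on screen at once" row of the Gotchas table can be written as a one-function sketch. `caption_hide_time` is a hypothetical name, times are in seconds, and two behaviors are assumptions: the `inAt+0.2` floor wins if it conflicts with the clamp, and the last caption (no successor) simply hides at `end+0.12`.

```python
from typing import Optional


def caption_hide_time(
    in_at: float, end: float, next_start: Optional[float] = None
) -> float:
    """Return when to hard-hide a caption so it never overlaps the next one."""
    hide = end + 0.12
    if next_start is not None:
        # Leave a 0.06s gap before the next caption pops in.
        hide = min(hide, next_start - 0.06)
    # Keep the caption on screen for at least 0.2s.
    return max(hide, in_at + 0.2)


clamped = caption_hide_time(1.0, 1.8, next_start=1.85)
floored = caption_hide_time(1.0, 1.05, next_start=1.10)
last_caption = caption_hide_time(1.0, 2.0)
```

In the floored case the caption can still touch the next one, which suggests regrouping very short phrases instead of relying on the clamp alone.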
@@ -27,7 +27,7 @@ Ask for these inputs, then wait for answers. Make smart, briefly-stated assumpti
27
27
  3. **Target market / audience** — who it's for.
28
28
  4. **Brand profile** — `../_arca-marketing-assets/brand.md` (name, positioning, tone, persona, logo rules, colors).
29
29
  5. **Target format** — TikTok / Reels / Shorts.
30
- 6. **Preferred length** — or let me recommend (default 20–30s unless the idea demands otherwise).
30
+ 6. **Preferred length** — **default 15–20s.** Go LONGER only if the user explicitly asks; never pad past 20s on your own. (Shorter is fine for a genuinely tiny idea, but 15–20s is the target.)
31
31
 
32
32
  Optional extra context to absorb if given: product/offer, audience pain point, desired emotion, must-include, must-avoid, tone, location, available props, actors/character types. Use the supplied logo if available.
33
33
 
@@ -123,8 +123,8 @@ Rewrite the concept into a stronger TikTok/Reels-ready version. Produce:
123
123
  6. Best opening visual
124
124
  7. Chosen hook
125
125
  8. Two backup hooks
126
- 9. **Recommended length** — default 20–30s only if truly strongest; recommend shorter/longer if the idea demands
127
- 10. **Retention map** — 0–1s scroll stop / 1–3s promise + clarity / 3–6s foreshadow or escalation / 6–15s proof, action, tension, or transformation / 15–25s payoff / final 1–2s loop, CTA, or punchline
126
+ 9. **Recommended length** — **default 15–20s**; go LONGER only if the user explicitly asked (never on your own). Recommend shorter only for a genuinely tiny idea.
127
+ 10. **Retention map** (scaled to the 15–20s default) — 0–1s scroll stop / 1–3s promise + clarity / 3–6s foreshadow or escalation / 6–14s proof, action, tension, or transformation / 14–18s payoff / final 1–2s loop, CTA, or punchline. Stretch proportionally only if the user asked for a longer video.
128
128
  11. Pacing recommendation
129
129
  12. Music / sound style
130
130
  13. **Dialogue + edit-caption strategy** — captions may be recommended for the final EDIT, but do NOT place overlay captions inside the storyboard image frames
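Stretching the retention map proportionally for a user-requested longer video is simple arithmetic, sketched below. The base map assumes a 20s total with the final beat occupying 18–20s; that reading of "final 1–2s" is an assumption, as is the helper name `scale_retention_map`.

```python
from typing import List, Tuple

BASE_TOTAL = 20.0  # assumed length of the default map, in seconds
BASE_MAP = [
    (0.0, 1.0, "scroll stop"),
    (1.0, 3.0, "promise + clarity"),
    (3.0, 6.0, "foreshadow or escalation"),
    (6.0, 14.0, "proof, action, tension, or transformation"),
    (14.0, 18.0, "payoff"),
    (18.0, 20.0, "loop, CTA, or punchline"),
]


def scale_retention_map(target_seconds: float) -> List[Tuple[float, float, str]]:
    """Scale every beat boundary by target_seconds / BASE_TOTAL."""
    factor = target_seconds / BASE_TOTAL
    return [
        (round(start * factor, 2), round(end * factor, 2), label)
        for start, end, label in BASE_MAP
    ]


longer_map = scale_retention_map(30)
```

For a 30s request the proof beat moves to 9–21s and the closing beat to 27–30s. The scroll-stop beat also stretches to 1.5s, so the opening may deserve a manual cap rather than a pure proportional scale.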
@@ -17,7 +17,7 @@ Before generating, ask for any of these not already provided, then wait for answ
17
17
  3. **BRAND PROFILE** — attached/pasted `brand.md`.
18
18
  4. **TARGET MARKET / AUDIENCE.**
19
19
  5. **VIDEO SEGMENT** — FULL VIDEO / PART 1 / PART 2.
20
- 6. **TARGET DURATION.**
20
+ 6. **TARGET DURATION** — default 15–20s; only exceed 20s if the user explicitly asks.
21
21
  7. **IMAGE MODEL + RESOLUTION** — which model cleans/upscales storyboard frames into start frames, at what size. Default: Nanobanana Pro at 2K.
22
22
  8. **VIDEO MODEL + RESOLUTION** — which model generates clips, at what resolution, which mode (std/pro). Default: Kling V3 at 720p.
23
23
  9. **ASPECT RATIO** — default 9:16 vertical.
@@ -89,7 +89,11 @@ The board is a REFERENCE IMAGE, but WHERE it connects depends on whether the cho
89
89
  - Either way, the board NEVER gets reproduced as on-screen graphics; it only conditions generation.
90
90
 
91
91
  ### DO NOT COMPOSITE TEXT/UI GRAPHICS (hard rule)
92
- This (the video-generation) stage composites NO overlay onto the footage. Never recreate the storyboard's mock UI / data as floating overlays — no data/deck cards, dashboards, labels, callouts, checklists, route-map text, fake screens, or "REC"-style HUD text laid over the video. It looks fake and instantly kills the UGC feel. Any text or screen the viewer sees must be DIEGETIC (filmed/generated as a real screen or prop inside the Wyren clip) or simply dropped — if a panel's mock UI can't be made diegetic, simplify or omit it. All overlays are decided later by `shorts-editor` (spoken-word captions, brand splash, engagement chips/graphics) on the editor's terms — not generated or composited here.
92
+ This (the video-generation) stage composites NO overlay onto the footage. Never recreate the storyboard's mock UI / data as floating overlays — no data/deck cards, dashboards, labels, callouts, checklists, route-map text, fake screens, or "REC"-style HUD text laid over the video. It looks fake and instantly kills the UGC feel. Any text or screen the viewer sees must be DIEGETIC (filmed/generated as a real screen or prop inside the Wyren clip) or simply dropped — if a panel's mock UI can't be made diegetic, simplify or omit it. All overlays are decided later by `shorts-editor` (spoken-word captions, brand splash, engagement chips/graphics) on the editor's terms — not generated or composited here. **And the reverse holds: the editor CANNOT place in-world / screen content convincingly** — pasted in post it floats over faces and looks fake (e.g. a promo card meant for the laptop screen ends up hovering over the team). So if a screen or prop must show specific content, it MUST be generated DIEGETICALLY here, in the clip — never deferred to the edit.
93
+
94
+ ### KLING FRAGILITY — morph guard + where dialogue lands
95
+ - **Kling warps fragile on-screen content and morphs objects.** It garbles readable UI/text (e.g. "Option A / B" labels turn to mush) and deforms objects mid-shot (a laptop literally rotating from its back, the lid becoming the screen). So don't give it readable UI or shape-critical objects to animate: keep screens **dim / out-of-focus / glare-washed**, snap the camera to faces, and **lock each object's shape + orientation explicitly** in the prompt. Add anti-morph negatives: *morphing screen, warping text, laptop flipping, lid becoming screen, deforming/melting object, unstable geometry.*
96
+ - **Native dialogue lands at the END of a generated clip**, and Kling pads the HEAD with filler / ad-libs. Plan the real line for the clip's back half, and tell `shorts-editor` to **transcribe before trimming** — a blind fixed-duration trim chops the actual line (see shorts-editor's AI-clip edge handling).
93
97
 
94
98
  ## VIDEO TYPE SCOPE
95
99
  This skill makes ANY video type. The LOOK follows the chosen VIDEO TYPE; the TikTok retention rules (FIRST 5 SECONDS cold open, sound-off clarity, a fresh beat every 2–4s, continuity locks, brand rules) apply to EVERY type.
@@ -105,7 +109,7 @@ Generate a finished vertical short-form video with:
105
109
  - Camera style: handheld smartphone footage (UGC)
106
110
  - Production level: low-cost, casual, creator-shot (UGC)
107
111
  - Audio: native dialogue, voice, room tone, foley, SFX, risers, music — only if supported by the model
108
- - Duration: [TARGET DURATION]
112
+ - Duration: [TARGET DURATION — default 15–20s unless the user asked for longer]
109
113
  - Max per generation: model-dependent (typically 15s)
110
114
 
111
115
  If the full video exceeds the model's max single-clip duration, split into multiple generated clips.
@@ -156,13 +160,14 @@ Resolutions: 480p (Seedance only), 720p, 1080p, 4K (Veo only). Default 720p for
156
160
  2. `build_graph`: `imageInput` (start-frame source, characters.png, logo.png, AND the production board / its cropped character-reference + storyboard cell if you have one) → `imageAI` (A: clean/upscale photo panel; B: design fresh start frame from the board reference) → `videoAI` (chosen model/resolution/mode/duration). **Route the board per MODEL ROUTING:** zero-ref models (Kling V3) → board into `imageAI` only; reference-capable models (Seedance/Veo/O1) → also wire the board / storyboard frames into `videoAI` as `referenceImages`. Use multishot or per-clip nodes per the split rule.
157
161
  3. `validate_workflow` — resolve warnings with the user.
158
162
  4. Estimate cost: `get_pricing` (chain mode) / `estimate_product_cost`; get the user's OK to spend.
159
- 5. `run_workflow` (`userConfirmed: true`), then poll `get_workflow_run_status` every 5s until terminal.
163
+ 5. `run_workflow` (`userConfirmed: true`), then poll for completion. **`get_workflow_run_status` lags badly**: it can still show `pending` ~10 min after a job already succeeded, so treat **`get_node_outputs` as the source of truth** for whether a node is done, not the run status. There is a single video worker, so `run_node` jobs queue and run ~sequentially; `cancel_job` is wedge-risky, so let redundant jobs finish rather than cancel.
160
164
  6. Pull clips with `get_node_outputs` and SAVE into the project dir: clips → `<project-slug>/clips/` (`clip-01.mp4`, `part1.mp4` …), and any designed/cleaned start frames → `<project-slug>/startframes/`. On-screen UI/data/screens were generated DIEGETICALLY inside the clips — no text-overlay pass here.
161
165
  7. EDIT & FINISH — hand the `<project-slug>/clips/` folder to `shorts-editor`: fast-cut assembly, spoken-word CAPTIONS, brand splash/end card, zoom/SFX timing, master into `<project-slug>/out/`. That is the ONLY place HyperFrames is used, and only for captions + splash + timing — never to composite text/UI graphics onto the footage.
162
166
 
163
167
  ### WYREN BUILD GOTCHAS (these cost failed validate/run calls)
164
168
  - **`multiPrompt` must be a JSON STRING, not an array.** For multishot, pass per-shot prompts as a stringified JSON value (e.g. `"[{...},{...}]"`), not a raw array. A raw array fails validation.
165
- - **`imageAI` / `videoAI` need a CONNECTED TEXT EDGE — `customPrompt` alone fails.** Wire a text/prompt node into the AI node's prompt input; a `customPrompt` field set without an incoming text edge does not satisfy validation. Build the graph so every imageAI/videoAI has its prompt edge connected, then set the prompt content.
169
+ - **`imageAI` / `videoAI` need a CONNECTED TEXT EDGE — `customPrompt` alone fails.** Wire a text/prompt node into the AI node's prompt input; a `customPrompt` field set without an incoming text edge does not satisfy validation. Build the graph so every imageAI/videoAI has its prompt edge connected, then set the prompt content. (`imageInput` data shape is `{imageUrls:[url]}`; `videoAI.text` needs a connected `textInput` edge.)
170
+ - **`run_workflow` re-runs ALL nodes.** To regenerate just a few nodes (a reshot clip, a fixed frame) WITHOUT clobbering already-approved outputs, call **`run_node` per node** instead of re-running the whole workflow.
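The two data-shape gotchas above (`multiPrompt` as a JSON string, `imageInput` as `{imageUrls:[url]}`) can be illustrated with plain Python dictionaries. This is a sketch only: the per-shot keys are hypothetical because the real multishot schema is not shown here, the URL is a placeholder, and no Wyren call is made.

```python
import json

# Hypothetical per-shot entries; the real multishot schema is not shown in the source.
shots = [
    {"prompt": "Shot 1: handheld close-up, creator looks at camera"},
    {"prompt": "Shot 2: over-the-shoulder, laptop screen dim and out of focus"},
]

# multiPrompt must be a JSON STRING, so serialize the list before sending it.
video_ai_data = {"multiPrompt": json.dumps(shots)}

# imageInput data shape per the gotcha above: {imageUrls: [url]} (placeholder URL).
image_input_data = {"imageUrls": ["https://example.com/startframe-01.png"]}
```

Passing `shots` directly instead of `json.dumps(shots)` is the raw-array mistake that the first gotcha says fails validation.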
166
171
 
167
172
  ## RECURRING CHARACTER CONSISTENCY (multishot / multi-clip)
168
173
  Any time the video is more than one shot — a multi-clip split (Part 1/Part 2) or video-model multishot (Kling V3 ≤6 shots) — the same person must look identical in every shot. Faces, hair, build, age, wardrobe drift badly across independent generations. Lock identity FIRST, for each recurring character (e.g. the Arca Navigator).
@@ -185,7 +190,7 @@ If two characters recur, build a separate profile + bible for each, and keep bot
185
190
  - **Push extras out of focus.** Keep background people incidental, turned away, blurred, or cropped — never ask the model to hold a face it doesn't need to. An extra the viewer can't study can't visibly drift.
186
191
 
187
192
  ## DEFAULT SPLIT RULE
188
- If target duration is 20–30s: **Part 1 = Panels 1–5; Part 2 = Panels 6–9.** Each part feels like one continuous video. Part 2 continues the same character, wardrobe, props, lighting, setting, camera quality, and emotional energy — do not restart the story, do not recap. Never show the storyboard grid, panel numbers, production notes, borders, arrows, labels, or annotations. Convert panels into real-feeling vertical footage.
193
+ **The default 15–20s video is usually ONE part** — a single Kling V3 multishot (up to 15s) plus a short final clip if needed. Only SPLIT into two parts when the user explicitly asked for a LONGER video (>~20s). When you do split: **Part 1 = Panels 1–5; Part 2 = Panels 6–9.** Each part feels like one continuous video. Part 2 continues the same character, wardrobe, props, lighting, setting, camera quality, and emotional energy — do not restart the story, do not recap. Never show the storyboard grid, panel numbers, production notes, borders, arrows, labels, or annotations. Convert panels into real-feeling vertical footage.
189
194
 
190
195
  ## INPUTS
191
196
  - Storyboard reference: [ATTACH 3×3 STORYBOARD IMAGE]
@@ -288,6 +293,8 @@ If a brand is included: use the supplied logo only if available, as a natural ph
288
293
  ## AUDIO DIRECTION
289
294
  Generate native audio if supported; it should feel like real creator-shot social video. Use: natural room tone, phone-mic ambience, casual dialogue, natural VO if the storyboard calls for it, imperfect human delivery, tiny pauses, keyboard taps, chair squeaks, paper sounds, phone buzzes, footsteps, desk sounds, bag rustles, small reaction sounds, subtle whoosh/riser only when helpful, light music only if it supports pacing. Native to TikTok/Reels, not cinematic. Avoid: trailer/orchestral music, dramatic swells, glossy commercial music, overproduced sound design, fake epic SFX, booming risers, ad-like or perfect-studio VO. Dialogue casual, slightly imperfect, human. If the model can't generate clean dialogue, prioritize realistic visual storytelling and leave dialogue/captions for editing.
290
295
 
296
+ **Audio mix (esp. narrated / trailer formats):** use **native Kling dialogue for character lines**, a **separate, consistent narrator VO** track when there's narration (don't let Kling re-voice the narrator per clip — it drifts), and **native foley everywhere**. Keep **dialogue clearly audible over music** — the editor will balance levels, but write/generate so speech sits on top. For trailer/hype energy (non-UGC), pace it punchy: fast whip-pans and snap-zooms, ~3s shots, high intensity — not slow cinematic drift.
297
+
291
298
  ## SHOT-BY-SHOT EXECUTION
292
299
  Each panel is a BEAT — realize it as several short shots from different angles, not one held clip (see FAST CUTS). Per panel: preserve the main action, emotional beat, character placement when possible, important props, and environment; make it feel like real phone footage; make the action understandable without text; keep it short and purposeful; move the story forward quickly. Panel 1 = strongest scroll-stopping moment. Panel 7 = strongest visual payoff/climax. Panel 9 = punchline, loop, CTA, or memorable final image. Don't let the ending drag. Speak the EXACT lines from the vetted script (SCRIPT & ORDER PRE-FLIGHT) — any dialogue improvement happens THERE and gets re-vetted, not silently per shot, so the same words and order ship every time. You may still refine timing, micro-actions, transitions, and audio as long as the script, story order, and visual anchor stay intact.
293
300
 
Binary file