npm - arca-marketing-video - Versions diffs - 2.24.0 → 2.26.0 - Mend

arca-marketing-video 2.24.0 → 2.26.0


Files changed (6)

package/package.json +1 -1
package/skills/carousel-generator/SKILL.md +8 -2
package/skills/shorts-editor/SKILL.md +33 -7
package/skills/storyboard-prompt/SKILL.md +3 -3
package/skills/video-prompt/SKILL.md +13 -6
package/skills/shorts-editor/sfx/glitch.mp3 +0 -0

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "arca-marketing-video",
-  "version": "2.24.0",
+  "version": "2.26.0",
   "repository": {
     "type": "git",
     "url": "git+https://github.com/briarbearrr/arca-marketing.git"

package/skills/carousel-generator/SKILL.md CHANGED

@@ -105,6 +105,10 @@ Generate each slide as a separate standalone image. Do NOT generate all slides i
 ### GENERATE ONE SLIDE AT A TIME (hard rule + workflow)
 **Never render the slides together. One render = one slide.** No grid, no contact sheet, no 2×2 / 3×3 board, no "all slides in one image", no multi-panel composite, no brand board. A combined image is an instant fail — discard it and re-render as separate slides. (Asking an image model for "the whole carousel" reliably produces a cramped grid with unreadable text; one slide per render is the only way to get full-resolution, legible, mobile-ready slides.)
+**Image generation is NOT unavailable — a collage is a WORKFLOW bug, not a tool limit.** If you get a grid/board, never tell the user image generation is unsupported or disabled. The tool works; the prompt method was wrong. Fix the method and regenerate — never downgrade or abandon the deliverable over it.
+**The wording trap — these phrases make the model emit a collage:** "generate a batch", "the carousel", "the set", "five slides", "all slides", "a slide preview". To an image model, "a batch of slides" = "put all slides in one image." So never say them. Don't *batch* — **chain**: think of it as "generate the carousel as separate images, one image-generation call per slide, chained sequentially from Slide 1 to Slide N." One prompt = exactly one slide, always.
 **Render sequentially, anchoring every slide to Slide 1:**
 1. **Render Slide 1 first** as a single standalone 4:5 image. This locks the carousel's look — palette, type hierarchy, layout rhythm, safe margins, logo placement, illustration style. Review it; re-render until it's right BEFORE touching the rest (everything inherits from it).
 2. **Render Slides 2…N one at a time**, and feed the rendered Slide 1 back into the image tool as a **style reference** for each (optionally the immediately-previous slide too). Separate per-slide renders drift in style/palette/type unless every slide is anchored to Slide 1 — passing it as a reference is what makes independent renders read as one cohesive carousel.
@@ -187,7 +191,9 @@ When writing image-gen prompts, do not list every possible brand object — the
 **Standard prompt structure (every slide):**
-> Create one standalone Instagram portrait carousel slide, 4:5 ratio, 1080 × 1350 px.
+> Create one standalone Instagram carousel slide only — this image contains ONLY Slide [N].
+> Do NOT create a collage, grid, contact sheet, multi-panel board, or slide preview.
+> Full-frame single 4:5 portrait image, 1080 × 1350 px.
 >
 > Primary goal: Make the message readable first. The design should support the headline, not overpower it.
 >
@@ -328,6 +334,6 @@ Check every slide:
 12. Did every image prompt end with the RENDER-ONLY enforcement line?
 13. **Hook test:** does Slide 1 open a curiosity loop that makes the swipe feel irresistible — not a generic "X tips" listicle or a self-contained statement that needs no swipe?
 14. **Swipe-momentum:** does every slide end with a reason to keep swiping (open loop / cliffhanger / progression), all the way to the CTA?
-15. **One-at-a-time render:** is each slide its own standalone image (no grid/board), rendered Slide 1 first with Slide 1 fed back as the style reference for the rest?
+15. **One-at-a-time render:** is each slide its own standalone image (no grid/board), rendered Slide 1 first with Slide 1 fed back as the style reference for the rest? Does every prompt explicitly block collage/grid/board and contain only one slide (no "batch / the carousel / the set / all slides" wording)? If a collage came out, was it discarded and regenerated — never shipped or "continued" — and never blamed on image-gen being unavailable?
 If any answer is no, fix before outputting — sharpen the hook, simplify the slide, or re-render. When unsure on visuals, remove the element — restraint always wins; when unsure on copy, make it more specific and open the loop wider.
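
Illustration (not part of the package): the "wording trap" and the one-prompt-per-slide rule added above can be checked mechanically before a prompt is sent. The sketch below is a minimal example under stated assumptions: `find_collage_triggers` and `build_slide_prompt` are hypothetical helper names, the trigger list is copied from the added SKILL.md text, and the header is a paraphrase of the new standard prompt structure, not the exact shipped wording.

```python
import re

# Phrases the added SKILL.md text lists as collage triggers.
COLLAGE_TRIGGERS = [
    "generate a batch", "the carousel", "the set",
    "five slides", "all slides", "a slide preview",
]

def find_collage_triggers(prompt: str) -> list[str]:
    """Return the trigger phrases that occur in the prompt as whole words."""
    lowered = prompt.lower()
    return [
        phrase for phrase in COLLAGE_TRIGGERS
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]

def build_slide_prompt(n: int, body: str) -> str:
    """Hypothetical helper: prefix one slide's brief with a single-slide header."""
    header = (
        f"Create one standalone Instagram carousel slide only. "
        f"This image contains ONLY Slide {n}.\n"
        "Do NOT create a collage, grid, contact sheet, multi-panel board, "
        "or slide preview.\n"
        "Full-frame single 4:5 portrait image, 1080 x 1350 px.\n"
    )
    return header + "\n" + body

# Chain, don't batch: one prompt per slide, each linted before use.
briefs = ["Hook headline", "Problem slide", "CTA slide"]
prompts = [build_slide_prompt(i, brief) for i, brief in enumerate(briefs, start=1)]
flagged = [find_collage_triggers(p) for p in prompts]
```

Each prompt would then go to its own image-generation call, with the rendered Slide 1 passed as the style reference for Slides 2…N; that call is left out here because the image tool's interface is not shown in the diff.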

package/skills/shorts-editor/SKILL.md CHANGED

@@ -14,7 +14,7 @@ Built on HyperFrames + ffmpeg + faster-whisper.
 ## Brand profile (read first)
 Read `../_arca-marketing-assets/brand.md` for the brand's colors, logo rules, and persona. The brand-splash end card
 uses `../_arca-marketing-assets/assets/final-cta.png`; the logo (`../_arca-marketing-assets/assets/logo.png`) may appear as a subtle
-in-world mark. `silence_cut.py` and `composition.template.html` are co-located in this skill folder.
+in-world mark. **Never place the logo in the TOP-RIGHT corner** — put the watermark bottom-left or bottom-right (subtle, small), and keep it out of the face and the platform UI safe zone. `silence_cut.py` and `composition.template.html` are co-located in this skill folder.
 This skill is the final edit stage after `video-prompt` (or runs standalone on any raw footage).
 ## Overview
@@ -56,9 +56,10 @@ All paths below are inside the project dir: source from `clips/`, working files
    **Mind the tail / SFX.** AI clips often land the FINAL WORD right at the out-point, so the ~0.1s pad after the last word matters, and **never land a transition SFX on a cut where a word ends** (it steps on the last word — see the SFX layer).
 4. **Re-transcribe the ASSEMBLED cut** (`edit/tight.mp4` — for raw footage OR an assembled AI-clip video), regroup into caption phrases (sentence-aware, 3-5 words). The silence-cut shifts every timestamp, so always re-transcribe and recompute **caption, zoom, SFX, and splash** timing from the NEW boundaries; never remap old times.
-5. **Build the composition** in `edit/composition/` from `./composition.template.html`: muted plate + separate dialogue audio + word-pop captions + zooms + logo + splash + SFX (no chips by default). Lint clean.
-6. **Draft render** (`--quality draft`) → `edit/frames/`, extract frames at every caption/splash beat, eyeball, fix. Then **`--quality high`** and **master** into `out/`:
+5. **Scan faces, then build the composition.** FIRST sample frames across the cut (`ffmpeg -i edit/tight.mp4 -vf fps=2 edit/faces/f_%03d.png`) and READ them to note which vertical band each scene's face(s) occupy (see FACE-SAFE PLACEMENT) — faces move between shots, so map them per scene. THEN build `edit/composition/` from `./composition.template.html` (muted plate + dialogue audio + word-pop captions + zooms + logo + splash + SFX, no chips by default), placing captions and EVERY graphic in a band that clears the detected faces. Lint clean.
+6. **Draft render + FACE-SAFE CHECK** (`--quality draft`) → `edit/frames/`: extract a frame at EVERY caption / chip / logo / splash beat, READ each one, and confirm NO caption, figure, logo, or graphic overlaps a face. If any does, move that element (or nudge the plate up via the cover-fit transform) and re-render — do NOT proceed to the high render until every overlay clears every face. Then **`--quality high`** and **master** into `out/`:
    `ffmpeg -i edit/raw.mp4 -c:v copy -af "loudnorm=I=-14:TP=-1.5,alimiter=limit=0.95" -c:a aac -b:a 192k out/<slug>-final.mp4`
+   Master in a SINGLE video encode and `-c:v copy` on the mux — re-encoding the video across multiple passes stacks compression artifacts. To trim a span out of a FINISHED master, **ripple-cut** (cut video + audio + baked captions together so they stay in sync) — valid only if NO caption is mid-display across the cut window.
 ## The silence-cut (the part that is easy to get wrong)
 **Run on BOTH raw recordings and assembled AI-clip videos** (see Pipeline step 3). Kling pads every generated clip with ~1–2s of dead air, so AI assemblies need this just as much as raw footage — for AI clips tune sensitive (`--cut-min 0.35`, noise floor `-36 dB`).
@@ -68,6 +69,8 @@ Neither signal alone works:
 **Cut where EITHER is true:** acoustic silence (`silencedetect=noise=-33dB:d=0.35`) OR a transcript word-gap > 0.45s. Only remove the dead middle when it exceeds ~0.5s, leave ~0.1s of speech pad each side, and KEEP sub-0.5s gaps (natural rhythm — cutting every micro-gap machine-guns the edit and looks glitchy). Add a 10ms `afade` at each join to kill clicks. **Verify with silencedetect on the OUTPUT** — it should show only short breath gaps. `./silence_cut.py` implements all of this; tune `--cut-min`. **For AI-clip assemblies** (Kling et al.), the padding is quiet near-silent breath that `-33 dB` can miss — drop the noise floor to `-36 dB` and `--cut-min 0.35` to catch it.
+**Don't word-gap-cut a foley/VO/music format** (e.g. a trailer with no continuous dialogue). Word-gap detection assumes speech, so on speechless foley beats it reads the whole clip as a "gap" and guts it. For those formats, SKIP the word-gap cut: trim each Kling clip's head pad MANUALLY and time captions analytically (see Caption standard) instead.
 ## Layers (all face/caption-safe)
 - **Captions — word-by-word pop-on (the default):** each WORD pops in as it's spoken (Anton, UPPERCASE,
   brand-gold keywords, NO pill backing — stroke + shadow for legibility). This is the canonical standard;
@@ -78,6 +81,12 @@ Neither signal alone works:
   zoom punch-ins, speed ramps / hold-frames, the word-pop captions themselves (gold keyword emphasis),
   hard cuts on the beat, and SFX hits. Only add a chip if the user explicitly asks for an on-screen
   label, and keep it minimal. The chip CSS/JS is removed from the default template.
+- **In-world / screen graphics are NOT an editor job — generate them upstream.** Anything that belongs ON a
+  surface in the scene (a laptop/phone screen's content, an in-scene poster, a product label, a "now loading"
+  promo) must be generated DIEGETICALLY inside the video clip by `video-prompt`, never composited here.
+  Pasted in the edit it floats in mid-air, covers faces, and looks fake — e.g. a "SYNERGIZE YOUR VIBES" card
+  meant for the laptop screen ends up hovering over the whole team. The editor only adds captions, the brand
+  splash / end card, and (rarely, if asked) a minimal face-safe chip — nothing that's supposed to live in the scene.
 - **Zoom punch-ins:** scale the plate wrapper (base ~1.04) to ~1.10-1.14 on emphasis lines, ease back. Never scale below 1.0 (reveals letterbox edges). Cover-fit the plate.
 - **SFX:** a curated set ships in `./sfx/` — use these first (no download needed). Keep dialogue front (SFX vol 0.25-0.35); every `<audio>` needs an `id`. **Never land a transition/whoosh SFX on a cut where a word ends** — it steps on the last word; put the hit on a silent beat or at the head of the next clip (this is why the silence-cut leaves ~0.1s pad after the last word, see Pipeline step 3). Mapping:
   | Role | File |
@@ -86,7 +95,6 @@ Neither signal alone works:
   | Hard cut / speaker change | `./sfx/swoosh-high.mp3`, `./sfx/swoosh-low.mp3` |
   | Chip entrance / key reveal (pop) | `./sfx/ding.mp3` |
   | Brand-splash signature hit (reserve for splash only) | `./sfx/tiktok-boom-bling.mp3` |
-  | Glitch / error beat | `./sfx/glitch.mp3` |
   | "Wrong"/mistake beat | `./sfx/wrong.mp3` |
   | Comedic deflation | `./sfx/sad-violin.mp3` |
   Need something not here? Mixkit free SFX (`mixkit.co/free-sound-effects/<cat>/` → `assets.mixkit.co/.../<id>-preview.mp3`) is a reliable no-key source.
@@ -103,10 +111,13 @@ Captions exist to (a) make the video legible sound-off, (b) hold retention with
   (floating glass pills are the #1 "AI-looking" tell).
 - Captions are the **wording source of truth** — they carry the exact script even if native/synth audio
   drops a word.
-- Never cover the face — captions live in the lower third; any other graphic stays lower-mid or a top strip.
+- Never cover the face — DETECT where the face actually sits per scene and place captions/graphics in a band that clears it (see FACE-SAFE PLACEMENT). The lower-third default only holds when the face is high in frame; if the face sits low, raise the captions or push the plate up.
 - Derive timing from **word-level transcription of the FINAL cut** (re-transcribe after any cut, including
   the AI-clip silence-cut). The same re-transcription drives caption, zoom, SFX, and splash timing — recompute
   them all from the new boundaries. Never hand-guess or remap old times.
+- **Noisy mixed audio (music + foley + VO): don't transcribe the MIX** — source timing per layer. Narrator
+  captions from the `voiceAI` word-alignment (plus the clip's offset); character / native lines by transcribing
+  the **native-audio-only cut** (a separate export with just the dialogue track), not the final mix.
 **Font / size / layout:** Anton (or a heavy grotesk like Archivo Black for premium brands), UPPERCASE by
 default (sentence case only for a strictly soft/premium voice), one caption font for the whole video,
@@ -143,6 +154,18 @@ Keep sub-0.5s natural speech rhythm — don't machine-gun a separate caption ont
 cross-dissolves / cinematic fades · ❌ tiny text, thin weights, or a generic system font · ❌ everything
 (or nothing) highlighted · ❌ captions over the face or in the platform UI zone.
+## FACE-SAFE PLACEMENT (captions + every graphic must clear faces)
+The #1 overlay failure is text or a figure landing ON someone's face. "Captions live in the lower third" only holds when the face sits HIGH in frame — but talking-head / UGC framing varies wildly (desk-level POV, low or off-center framing, two people, a face near the bottom, or a subject reframed by a zoom punch-in). A blind lower-third caption then covers the face. So DETECT, don't assume:
+- **Find the face before placing anything.** Sample frames across each scene (`ffmpeg -i edit/tight.mp4 -vf fps=2 edit/faces/f_%03d.png`) and READ them to note which vertical band each scene's face(s) occupy. Faces MOVE between shots — map them per scene/segment, never once for the whole video.
+- **Place overlays in a band that clears the face**, keeping a margin of ≥8–10% of frame height around the face:
+  - Face high / centered (common case) → captions in the lower third (default baseline ~300px from bottom).
+  - Face LOW (in the lower third) → RAISE the captions to an upper band / top strip, OR push the plate up via the cover-fit / zoom transform so the face leaves the caption band. Never drop text onto a low face.
+  - Two faces, or a face off to one side → use the genuinely clear band (top strip, or the empty side); never straddle a face.
+- **Every inserted element obeys this — not just captions:** word-pop captions, any chip / label, the logo watermark, engagement figures / graphics, and the splash. Each keeps the same face margin AND stays out of the platform UI safe zone (bottom ~10%).
+- **If no band is safe** at a given moment (a face fills the frame), shrink or move the element, delay it to a face-clear beat, or drop it — covering the face is never acceptable.
+- **Verify on real frames, not by assumption** (Pipeline step 6): extract the frame at every overlay beat, look, and move anything touching a face before the high render. When the face moves into the caption band mid-segment, re-place the captions (or the plate) for that segment and re-derive timing.
 ## Gotchas
 | Symptom | Fix |
 | --- | --- |
@@ -153,11 +176,14 @@ cross-dissolves / cinematic fades · ❌ tiny text, thin weights, or a generic s
 | Two captions on screen at once | clamp hard-hide to `min(end+0.12, next.start-0.06)`, floor `inAt+0.2` (see Caption standard) |
 | Font silently falls back | Anton/Archivo etc. are NOT auto-embedded; download `.woff2` to `fonts/` + `@font-face` |
 | SFX silent in the render | every `<audio>` needs an `id` |
-| Graphics cover the face | keep overlays in the lower-mid band (or a top strip), never the center |
+| Captions or graphics land on a face | don't assume lower-third is safe — sample frames, find the face per scene, place overlays in a band that clears it (raise captions or push the plate up when the face sits low), verify on extracted frames before the high render (see FACE-SAFE PLACEMENT) |
+| Logo lands in the top-right | move the watermark to a bottom corner (subtle, small); never top-right (the template may default there — override it) |
 | Peaks clip (max 0.0 dBFS) | master the final with `loudnorm + alimiter` |
+| HyperFrames duration looks off | its CLI duration summary misreports — trust `ffprobe` for true duration; the render also expects `index.html` in a project dir |
+| Is the dialogue audible over music? | you can't audition audio — checkpoint the mix / levels with the user before finalizing |
 | Can't preview alpha/cutout in ffmpeg | ffmpeg 4.x can't decode VP9-alpha; verify in Chrome (canvas getImageData) |
 ## Files
 - `./silence_cut.py` — silence ∪ word-gap cutter: `--src --audio --transcript --out [--cut-min 0.5]`.
 - `./composition.template.html` — HyperFrames composition skeleton (plate, word-pop captions, zooms, splash, SFX; chips removed from default) with the load-bearing GSAP logic already wired.
-- `./sfx/` — bundled, ready-to-use SFX (riser, swooshes, ding, tiktok-boom-bling, glitch, wrong, sad-violin). See the SFX mapping above.
+- `./sfx/` — bundled, ready-to-use SFX (riser, swooshes, ding, tiktok-boom-bling, wrong, sad-violin). See the SFX mapping above.
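
Illustration (not part of the package): the silence-cut rule quoted in the context lines above (cut where EITHER acoustic silence OR a word gap over 0.45s is true, remove only spans longer than about 0.5s, keep about 0.1s of pad each side) can be sketched as interval arithmetic. This is not the code in `silence_cut.py`, which the diff does not show. The function names are hypothetical, and it is an assumption here that the 0.5s threshold is applied to the merged span before the pad is trimmed.

```python
def merge(intervals):
    """Union overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def word_gaps(words, min_gap=0.45):
    """Gaps between consecutive (start, end) words longer than min_gap seconds."""
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(words, words[1:])
        if next_start - prev_end > min_gap
    ]

def plan_cuts(silences, words, cut_min=0.5, pad=0.1):
    """Spans to remove: silence OR word gap, longer than cut_min, minus the pad."""
    cuts = []
    for start, end in merge(list(silences) + word_gaps(words)):
        if end - start > cut_min:
            cuts.append((round(start + pad, 3), round(end - pad, 3)))
    return cuts

# Example inputs: silencedetect-style spans and word-level transcript timings.
silences = [(1.0, 2.0), (5.0, 5.3)]
words = [(0.0, 0.9), (2.1, 3.0), (3.2, 4.0), (4.9, 5.0)]
cuts = plan_cuts(silences, words)
```

In this example the 0.2s gap between the second and third words and the 0.3s silence are both kept, which matches the instruction to preserve sub-0.5s gaps as natural rhythm.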

package/skills/storyboard-prompt/SKILL.md CHANGED

@@ -27,7 +27,7 @@ Ask for these inputs, then wait for answers. Make smart, briefly-stated assumpti
 3. **Target market / audience** — who it's for.
 4. **Brand profile** — `../_arca-marketing-assets/brand.md` (name, positioning, tone, persona, logo rules, colors).
 5. **Target format** — TikTok / Reels / Shorts.
-6. **Preferred length** — or let me recommend (default 20–30s unless the idea demands otherwise).
+6. **Preferred length** — **default 15–20s.** Go LONGER only if the user explicitly asks; never pad past 20s on your own. (Shorter is fine for a genuinely tiny idea, but 15–20s is the target.)
 Optional extra context to absorb if given: product/offer, audience pain point, desired emotion, must-include, must-avoid, tone, location, available props, actors/character types. Use the supplied logo if available.
@@ -123,8 +123,8 @@ Rewrite the concept into a stronger TikTok/Reels-ready version. Produce:
 6. Best opening visual
 7. Chosen hook
 8. Two backup hooks
-9. **Recommended length** — default 20–30s only if truly strongest; recommend shorter/longer if the idea demands
-10. **Retention map** — 0–1s scroll stop / 1–3s promise + clarity / 3–6s foreshadow or escalation / 6–15s proof, action, tension, or transformation / 15–25s payoff / final 1–2s loop, CTA, or punchline
+9. **Recommended length** — **default 15–20s**; go LONGER only if the user explicitly asked (never on your own). Recommend shorter only for a genuinely tiny idea.
+10. **Retention map** (scaled to the 15–20s default) — 0–1s scroll stop / 1–3s promise + clarity / 3–6s foreshadow or escalation / 6–14s proof, action, tension, or transformation / 14–18s payoff / final 1–2s loop, CTA, or punchline. Stretch proportionally only if the user asked for a longer video.
 11. Pacing recommendation
 12. Music / sound style
 13. **Dialogue + edit-caption strategy** — captions may be recommended for the final EDIT, but do NOT place overlay captions inside the storyboard image frames

package/skills/video-prompt/SKILL.md CHANGED

@@ -17,7 +17,7 @@ Before generating, ask for any of these not already provided, then wait for answ
 3. **BRAND PROFILE** — attached/pasted `brand.md`.
 4. **TARGET MARKET / AUDIENCE.**
 5. **VIDEO SEGMENT** — FULL VIDEO / PART 1 / PART 2.
-6. **TARGET DURATION.**
+6. **TARGET DURATION** — default 15–20s; only exceed 20s if the user explicitly asks.
 7. **IMAGE MODEL + RESOLUTION** — which model cleans/upscales storyboard frames into start frames, at what size. Default: Nanobanana Pro at 2K.
 8. **VIDEO MODEL + RESOLUTION** — which model generates clips, at what resolution, which mode (std/pro). Default: Kling V3 at 720p.
 9. **ASPECT RATIO** — default 9:16 vertical.
@@ -89,7 +89,11 @@ The board is a REFERENCE IMAGE, but WHERE it connects depends on whether the cho
 - Either way, the board NEVER gets reproduced as on-screen graphics; it only conditions generation.
 ### DO NOT COMPOSITE TEXT/UI GRAPHICS (hard rule)
-This (the video-generation) stage composites NO overlay onto the footage. Never recreate the storyboard's mock UI / data as floating overlays — no data/deck cards, dashboards, labels, callouts, checklists, route-map text, fake screens, or "REC"-style HUD text laid over the video. It looks fake and instantly kills the UGC feel. Any text or screen the viewer sees must be DIEGETIC (filmed/generated as a real screen or prop inside the Wyren clip) or simply dropped — if a panel's mock UI can't be made diegetic, simplify or omit it. All overlays are decided later by `shorts-editor` (spoken-word captions, brand splash, engagement chips/graphics) on the editor's terms — not generated or composited here.
+This (the video-generation) stage composites NO overlay onto the footage. Never recreate the storyboard's mock UI / data as floating overlays — no data/deck cards, dashboards, labels, callouts, checklists, route-map text, fake screens, or "REC"-style HUD text laid over the video. It looks fake and instantly kills the UGC feel. Any text or screen the viewer sees must be DIEGETIC (filmed/generated as a real screen or prop inside the Wyren clip) or simply dropped — if a panel's mock UI can't be made diegetic, simplify or omit it. All overlays are decided later by `shorts-editor` (spoken-word captions, brand splash, engagement chips/graphics) on the editor's terms — not generated or composited here. **And the reverse holds: the editor CANNOT place in-world / screen content convincingly** — pasted in post it floats over faces and looks fake (e.g. a promo card meant for the laptop screen ends up hovering over the team). So if a screen or prop must show specific content, it MUST be generated DIEGETICALLY here, in the clip — never deferred to the edit.
+### KLING FRAGILITY — morph guard + where dialogue lands
+- **Kling warps fragile on-screen content and morphs objects.** It garbles readable UI/text (e.g. "Option A / B" labels turn to mush) and deforms objects mid-shot (a laptop literally rotating from its back, the lid becoming the screen). So don't give it readable UI or shape-critical objects to animate: keep screens **dim / out-of-focus / glare-washed**, snap the camera to faces, and **lock each object's shape + orientation explicitly** in the prompt. Add anti-morph negatives: *morphing screen, warping text, laptop flipping, lid becoming screen, deforming/melting object, unstable geometry.*
+- **Native dialogue lands at the END of a generated clip**, and Kling pads the HEAD with filler / ad-libs. Plan the real line for the clip's back half, and tell `shorts-editor` to **transcribe before trimming** — a blind fixed-duration trim chops the actual line (see shorts-editor's AI-clip edge handling).
 ## VIDEO TYPE SCOPE
 This skill makes ANY video type. The LOOK follows the chosen VIDEO TYPE; the TikTok retention rules (FIRST 5 SECONDS cold open, sound-off clarity, a fresh beat every 2–4s, continuity locks, brand rules) apply to EVERY type.
@@ -105,7 +109,7 @@ Generate a finished vertical short-form video with:
 - Camera style: handheld smartphone footage (UGC)
 - Production level: low-cost, casual, creator-shot (UGC)
 - Audio: native dialogue, voice, room tone, foley, SFX, risers, music — only if supported by the model
-- Duration: [TARGET DURATION]
+- Duration: [TARGET DURATION — default 15–20s unless the user asked for longer]
 - Max per generation: model-dependent (typically 15s)
 If the full video exceeds the model's max single-clip duration, split into multiple generated clips.
@@ -156,13 +160,14 @@ Resolutions: 480p (Seedance only), 720p, 1080p, 4K (Veo only). Default 720p for
 2. `build_graph`: `imageInput` (start-frame source, characters.png, logo.png, AND the production board / its cropped character-reference + storyboard cell if you have one) → `imageAI` (A: clean/upscale photo panel; B: design fresh start frame from the board reference) → `videoAI` (chosen model/resolution/mode/duration). **Route the board per MODEL ROUTING:** zero-ref models (Kling V3) → board into `imageAI` only; reference-capable models (Seedance/Veo/O1) → also wire the board / storyboard frames into `videoAI` as `referenceImages`. Use multishot or per-clip nodes per the split rule.
 3. `validate_workflow` — resolve warnings with the user.
 4. Estimate cost: `get_pricing` (chain mode) / `estimate_product_cost`; get the user's OK to spend.
-5. `run_workflow` (`userConfirmed: true`), then poll `get_workflow_run_status` every 5s until terminal.
+5. `run_workflow` (`userConfirmed: true`), then poll for completion. **`get_workflow_run_status` lags badly** — it can still show `pending` ~10 min after a job already succeeded, so treat **`get_node_outputs` as the source of truth** for whether a node is done, not the run status. There is a single video worker, so `run_node` jobs queue and run ~sequentially; `cancel_job` is wedge-risky, so let redundant jobs finish rather than cancel.
 6. Pull clips with `get_node_outputs` and SAVE into the project dir: clips → `<project-slug>/clips/` (`clip-01.mp4`, `part1.mp4` …), and any designed/cleaned start frames → `<project-slug>/startframes/`. On-screen UI/data/screens were generated DIEGETICALLY inside the clips — no text-overlay pass here.
 7. EDIT & FINISH — hand the `<project-slug>/clips/` folder to `shorts-editor`: fast-cut assembly, spoken-word CAPTIONS, brand splash/end card, zoom/SFX timing, master into `<project-slug>/out/`. That is the ONLY place HyperFrames is used, and only for captions + splash + timing — never to composite text/UI graphics onto the footage.
 ### WYREN BUILD GOTCHAS (these cost failed validate/run calls)
 - **`multiPrompt` must be a JSON STRING, not an array.** For multishot, pass per-shot prompts as a stringified JSON value (e.g. `"[{...},{...}]"`), not a raw array. A raw array fails validation.
-- **`imageAI` / `videoAI` need a CONNECTED TEXT EDGE — `customPrompt` alone fails.** Wire a text/prompt node into the AI node's prompt input; a `customPrompt` field set without an incoming text edge does not satisfy validation. Build the graph so every imageAI/videoAI has its prompt edge connected, then set the prompt content.
+- **`imageAI` / `videoAI` need a CONNECTED TEXT EDGE — `customPrompt` alone fails.** Wire a text/prompt node into the AI node's prompt input; a `customPrompt` field set without an incoming text edge does not satisfy validation. Build the graph so every imageAI/videoAI has its prompt edge connected, then set the prompt content. (`imageInput` data shape is `{imageUrls:[url]}`; `videoAI.text` needs a connected `textInput` edge.)
+- **`run_workflow` re-runs ALL nodes.** To regenerate just a few nodes (a reshot clip, a fixed frame) WITHOUT clobbering already-approved outputs, call **`run_node` per node** instead of re-running the whole workflow.
 ## RECURRING CHARACTER CONSISTENCY (multishot / multi-clip)
 Any time the video is more than one shot — a multi-clip split (Part 1/Part 2) or video-model multishot (Kling V3 ≤6 shots) — the same person must look identical in every shot. Faces, hair, build, age, wardrobe drift badly across independent generations. Lock identity FIRST, for each recurring character (e.g. the Arca Navigator).
@@ -185,7 +190,7 @@ If two characters recur, build a separate profile + bible for each, and keep bot
 - **Push extras out of focus.** Keep background people incidental, turned away, blurred, or cropped — never ask the model to hold a face it doesn't need to. An extra the viewer can't study can't visibly drift.
 ## DEFAULT SPLIT RULE
-If target duration is 20–30s: **Part 1 = Panels 1–5; Part 2 = Panels 6–9.** Each part feels like one continuous video. Part 2 continues the same character, wardrobe, props, lighting, setting, camera quality, and emotional energy — do not restart the story, do not recap. Never show the storyboard grid, panel numbers, production notes, borders, arrows, labels, or annotations. Convert panels into real-feeling vertical footage.
+**The default 15–20s video is usually ONE part** — a single Kling V3 multishot (up to 15s) plus a short final clip if needed. Only SPLIT into two parts when the user explicitly asked for a LONGER video (>~20s). When you do split: **Part 1 = Panels 1–5; Part 2 = Panels 6–9.** Each part feels like one continuous video. Part 2 continues the same character, wardrobe, props, lighting, setting, camera quality, and emotional energy — do not restart the story, do not recap. Never show the storyboard grid, panel numbers, production notes, borders, arrows, labels, or annotations. Convert panels into real-feeling vertical footage.
 ## INPUTS
 - Storyboard reference: [ATTACH 3×3 STORYBOARD IMAGE]
@@ -288,6 +293,8 @@ If a brand is included: use the supplied logo only if available, as a natural ph
 ## AUDIO DIRECTION
 Generate native audio if supported; it should feel like real creator-shot social video. Use: natural room tone, phone-mic ambience, casual dialogue, natural VO if the storyboard calls for it, imperfect human delivery, tiny pauses, keyboard taps, chair squeaks, paper sounds, phone buzzes, footsteps, desk sounds, bag rustles, small reaction sounds, subtle whoosh/riser only when helpful, light music only if it supports pacing. Native to TikTok/Reels, not cinematic. Avoid: trailer/orchestral music, dramatic swells, glossy commercial music, overproduced sound design, fake epic SFX, booming risers, ad-like or perfect-studio VO. Dialogue casual, slightly imperfect, human. If the model can't generate clean dialogue, prioritize realistic visual storytelling and leave dialogue/captions for editing.
+**Audio mix (esp. narrated / trailer formats):** use **native Kling dialogue for character lines**, a **separate, consistent narrator VO** track when there's narration (don't let Kling re-voice the narrator per clip — it drifts), and **native foley everywhere**. Keep **dialogue clearly audible over music** — the editor will balance levels, but write/generate so speech sits on top. For trailer/hype energy (non-UGC), pace it punchy: fast whip-pans and snap-zooms, ~3s shots, high intensity — not slow cinematic drift.
 ## SHOT-BY-SHOT EXECUTION
 Each panel is a BEAT — realize it as several short shots from different angles, not one held clip (see FAST CUTS). Per panel: preserve the main action, emotional beat, character placement when possible, important props, and environment; make it feel like real phone footage; make the action understandable without text; keep it short and purposeful; move the story forward quickly. Panel 1 = strongest scroll-stopping moment. Panel 7 = strongest visual payoff/climax. Panel 9 = punchline, loop, CTA, or memorable final image. Don't let the ending drag. Speak the EXACT lines from the vetted script (SCRIPT & ORDER PRE-FLIGHT) — any dialogue improvement happens THERE and gets re-vetted, not silently per shot, so the same words and order ship every time. You may still refine timing, micro-actions, transitions, and audio as long as the script, story order, and visual anchor stay intact.
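
Illustration (not part of the package): the build gotcha above says `multiPrompt` must be a JSON string, not an array. A minimal sketch of that serialization step follows. The per-shot field names (`prompt`, `duration`) and the `node_config` wrapper are assumptions for illustration only; the diff does not show the real shot schema or how the value is passed to the workflow tool.

```python
import json

# Hypothetical per-shot fields; the real schema is not shown in the skill text.
shots = [
    {"prompt": "Shot 1: handheld close-up, character looks up from laptop", "duration": 3},
    {"prompt": "Shot 2: over-the-shoulder, screen dim and out of focus", "duration": 4},
]

def build_multi_prompt(shot_list) -> str:
    """Serialize the per-shot prompts into a single JSON string."""
    return json.dumps(shot_list, ensure_ascii=False)

multi_prompt = build_multi_prompt(shots)

# Hypothetical node config: the value is a string, not the raw list.
node_config = {"multiPrompt": multi_prompt}
```

Passing `shots` directly would be the raw-array case that the gotcha says fails validation; `json.loads(multi_prompt)` recovers the original list if it needs to be inspected later.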

package/skills/shorts-editor/sfx/glitch.mp3 DELETED

Binary file