npm - arca-marketing-video - Versions diffs - 2.22.0 → 2.23.0 - Mend

arca-marketing-video 2.22.0 → 2.23.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/package.json +1 -1
package/skills/carousel-generator/SKILL.md +43 -8
package/skills/shorts-editor/SKILL.md +13 -12
package/skills/storyboard-prompt/SKILL.md +10 -2
package/skills/video-prompt/SKILL.md +8 -1

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "arca-marketing-video",
-  "version": "2.22.0",
+  "version": "2.23.0",
   "repository": {
     "type": "git",
     "url": "git+https://github.com/briarbearrr/arca-marketing.git"

package/skills/carousel-generator/SKILL.md CHANGED

@@ -102,11 +102,18 @@ Generate each slide as a separate standalone image. Do NOT generate all slides i
 - Safe area: keep important text/faces away from edges; ~80–120 px safe margin all sides; headline large enough for mobile; place logo/name mark consistently (usually bottom-left, bottom-right, or top-left per layout) and subtly — never competing with the headline.
 - Repurposing: square 1:1 1080×1080; stories/reels covers 9:16 1080×1920. Default to 4:5 portrait unless specified.
-### Carousel image generation rule
-Produce separate images: Slide 1, Slide 2, Slide 3, (Slide 4 if needed), (Slide 5 if needed). All must share: same visual style, color system, typography style, logo-placement logic, illustration style, graphic motif family, and level of polish. Each slide visually distinct but clearly part of the same carousel.
+### GENERATE ONE SLIDE AT A TIME (hard rule + workflow)
+**Never render the slides together. One render = one slide.** No grid, no contact sheet, no 2×2 / 3×3 board, no "all slides in one image", no multi-panel composite, no brand board. A combined image is an instant fail — discard it and re-render as separate slides. (Asking an image model for "the whole carousel" reliably produces a cramped grid with unreadable text; one slide per render is the only way to get full-resolution, legible, mobile-ready slides.)
-Continuity SHOULD come from: consistent type hierarchy, consistent palette, recurring background texture, consistent safe margins, restrained highlight emphasis, and 1 recurring object/motif repeated across slides.
-Continuity should NOT come from: repeating every object, the same busy brand-world scene everywhere, forcing the logo into every large visual moment, or packing every slide with the full brand system.
+**Render sequentially, anchoring every slide to Slide 1:**
+1. **Render Slide 1 first** as a single standalone 4:5 image. This locks the carousel's look — palette, type hierarchy, layout rhythm, safe margins, logo placement, illustration style. Review it; re-render until it's right BEFORE touching the rest (everything inherits from it).
+2. **Render Slides 2…N one at a time**, and feed the rendered Slide 1 back into the image tool as a **style reference** for each (optionally the immediately-previous slide too). Separate per-slide renders drift in style/palette/type unless every slide is anchored to Slide 1 — passing it as a reference is what makes independent renders read as one cohesive carousel.
+3. **Review each slide before rendering the next.** If a slide drifts in style, breaks the ELEMENT BUDGET, or its text renders badly, fix and re-render THAT slide before moving on — don't batch-fix at the end.
+*Tool note (if rendering via Wyren `imageAI`): one `imageAI` node/call per slide, with Slide 1's output wired in as a reference image for slides 2…N. Never one node asked to produce all slides. Validate the cheap image tier before spending — load the `wyren` skill first.*
+Continuity SHOULD come from: anchoring every slide to Slide 1's rendered look, consistent type hierarchy, consistent palette, recurring background texture, consistent safe margins, restrained highlight emphasis, and 1 recurring object/motif repeated across slides. Each slide visually distinct but clearly part of the same carousel.
+Continuity should NOT come from: rendering them together, repeating every object, the same busy brand-world scene everywhere, forcing the logo into every large visual moment, or packing every slide with the full brand system.
 ## CREATIVE STYLE OPTIONS
 Choose the best direction for the topic.
@@ -129,9 +136,31 @@ Choose 3, 4, or 5 slides based on how much the topic needs.
 Optional: Slide 2 may be "the answer to the hook" when the hook creates curiosity that needs immediate resolution — only if it improves narrative flow.
+## HOOK & SWIPE-MOMENTUM (the content engine — read before writing copy)
+Carousels grow by earning swipes: every swipe is an engagement signal that pushes the post to more people, so the writing's whole job is to make the reader open the post and keep swiping. Weak copy is the #1 reason a well-designed carousel dies. Write to these:
+**Slide 1 is the whole ballgame — treat it like a YouTube thumbnail + title.** Its only job is to make the reader feel they MUST swipe. Not to inform, not to summarize — to create an itch the next slide scratches. If Slide 1 can be fully understood and "closed" on its own, you've already lost the swipe.
+**Open the loop; don't close it.** The hook names a tension, question, or surprising claim and deliberately withholds the payoff — the answer lives on the next slide ("The reason your backlog never shrinks isn't more work →"). Resolve it too early and there's no reason to keep going.
+**Kill the boring carousel.** The dead format is a solid-color background with a generic listicle hook ("3 tips for losing weight") — instantly skippable. Ban templated "X tips for Y" headlines, generic motivational filler, and vague "you need to hear this." Be specific, surprising, contrarian, or personal instead.
+**Hook formulas — pick one, tailor to the audience + brand voice (see `brand.md` hook examples):**
+- **Curiosity gap** — tease a result or mechanism without revealing it ("Most founders fix the wrong bottleneck first.").
+- **Contrarian / pattern interrupt** — challenge a belief the audience holds ("Stop hiring more people. It's making you slower.").
+- **Breaking-news framing** — state it like a fresh headline or announcement; urgency and newness pull the swipe.
+- **Costly-mistake call-out** — name a specific mistake the reader recognizes they're making ("This one habit is quietly killing your week.").
+- **Number + payoff with an open loop** — the number promises, the slides deliver; never a flat list dumped on slide 1.
+- **Identity call-out** — name the exact audience so they self-select ("Founders who still run their own ops — this one's for you.").
+- **Relatable truth / story-open** — start mid-tension or on a painfully familiar moment so they feel seen.
+**Every slide ends with a reason to swipe.** Build momentum: end each slide on an incomplete thought, a "but here's the problem…", a cliffhanger, a numbered progression (1→2→3), or a question the next slide answers. No slide should feel like a comfortable stopping point until the CTA.
+**Write for shares and saves, not just reads.** The most-distributed carousels are also the most shareable — content that makes the reader look smart, feel seen, or want to send it to someone. A line reframed for two competing sub-audiences earns shares from both. Specific beats generic; concrete pain beats abstract benefit, every time.
 ## SLIDE STRATEGY
-**Slide 1 — Hook** (most important; must stop the scroll). Strongest visual composition in the carousel; short intriguing headline readable in under 2 seconds; names a painful truth, hidden bottleneck, or costly mistake; one bold visual metaphor from the visual world; instantly understandable before the caption. Hook angles (see profile for on-brand lines, tailor to audience): reframe a common assumption ("X is not a Y problem."), name the real bottleneck, call out a costly overlooked mistake, or challenge popular advice. *Visual: can be boldest, but only one clear anchor. Split-screen → keep each side simple/high-contrast. Character → clean background. Metaphor → no second metaphor.*
+**Slide 1 — Hook** (most important; must stop the scroll AND force the swipe — apply the HOOK & SWIPE-MOMENTUM rules above). Strongest visual composition in the carousel; short intriguing headline readable in under 2 seconds; names a painful truth, hidden bottleneck, or costly mistake; one bold visual metaphor from the visual world; instantly understandable before the caption. Hook angles (see profile for on-brand lines, tailor to audience): reframe a common assumption ("X is not a Y problem."), name the real bottleneck, call out a costly overlooked mistake, or challenge popular advice. *Visual: can be boldest, but only one clear anchor. Split-screen → keep each side simple/high-contrast. Character → clean background. Metaphor → no second metaphor.*
 **Slide 2 — Problem / Tension.** Make the pain concrete but don't visualize every symptom — usually the simplest slide after the hook; isolate the core tension with one clean metaphor. Show what's breaking (use the topic's specific pains): core friction/blocker/cost the audience feels, the symptom that keeps returning, unclear ownership or broken handoff, wasted time or effort. Good visuals: one person facing a single growing problem stack; one blocked path between two states; one messy pile beside one clean headline; one calendar page showing delay; one task stack with a single warning label; one split desk with very few props. Avoid: many props/labels/objects in one slide; more than two piles; more than three small labels; detailed paperwork with readable microcopy; a full overhead desk scene unless most objects are abstract/unlabeled. Feel: "Here is the bottleneck," not "Here are all the objects related to the bottleneck."
@@ -144,7 +173,7 @@ Optional: Slide 2 may be "the answer to the hook" when the hook creates curiosit
 Slide pacing: 1 boldest/clearest hook → 2 simplest/clearest bottleneck → 3 calm reframe → 4 structured framework → 5 spacious CTA.
 ## COPY RULES
-One main idea per slide. Big headline first; short supporting copy only; no walls of text. Use audience-specific language; prefer concrete pain over abstract benefits. Avoid generic AI buzzwords and corporate filler. Every slide earns the next swipe. Plain, direct language. Hook feels smart, not clickbait; CTA feels helpful, not desperate.
+One main idea per slide. Big headline first; short supporting copy only; no walls of text. Use audience-specific language; prefer concrete pain over abstract benefits. Avoid generic AI buzzwords and corporate filler. **Every slide earns the next swipe** (open-loop endings, per HOOK & SWIPE-MOMENTUM). Plain, direct language. Hook feels smart, not clickbait; CTA feels helpful, not desperate. **No templated "X tips for Y" listicle headlines and no generic motivational filler** — specific, surprising, or contrarian only. The headline must do most of the work; if blurring the visual leaves the message intact, the copy is strong enough.
 ## DESIGN RULES
 Each slide looks like part of one cohesive campaign. Use: the brand logo somewhere (name written per profile rules), bold contrast, strong typography, generous negative space, clean editorial composition, restrained brand-world details, one clear visual anchor, highlight color sparingly, and off-white/accent/dark backgrounds strategically.
@@ -240,7 +269,7 @@ Describe the recurring visual elements across all slides. Include:
 - visual simplicity rule for this specific carousel
 ```
-Then create each slide separately. Each slide block uses these fields:
+Then create each slide separately — and render them ONE AT A TIME (Slide 1 first, then each later slide with Slide 1 wired in as a style reference; see GENERATE ONE SLIDE AT A TIME). Never render the set as a grid/board. Each slide block uses these fields:
 **SLIDE 1 — HOOK**
 - Purpose: [what this slide does strategically]
@@ -278,6 +307,9 @@ Then create each slide separately. Each slide block uses these fields:
 **CAPTION:**
 Write a practical social caption for the target audience. Expand the idea without simply repeating the slides. Direct, useful, audience-aware. No em dash. Max 3 short paragraphs, 2 sentences each. End with a soft brand CTA. Include up to 5 SEO-helpful, not-overly-specific hashtags.
+**POSTING TIP (include once for the user):**
+Remind the user to **add music to the carousel before posting** (the audio option below the caption). It lifts engagement, and on Instagram it makes the carousel eligible to surface on the Reels tab — a real reach boost for a static post.
 ---
 ## FINAL SELF-CHECK BEFORE OUTPUT
@@ -294,5 +326,8 @@ Check every slide:
 10. Carousel still premium, human, practical, on-brand?
 11. ELEMENT BUDGET COUNT: literally count for this slide — text zones (cap 2: headline + one line), UI cards (cap 1, anchor only), speech/thought bubbles (must be 0), scattered decorations like pins/dashes/sparkles/rays (must be 0). Over on any? Cut before outputting.
 12. Did every image prompt end with the RENDER-ONLY enforcement line?
+13. **Hook test:** does Slide 1 open a curiosity loop that makes the swipe feel irresistible — not a generic "X tips" listicle or a self-contained statement that needs no swipe?
+14. **Swipe-momentum:** does every slide end with a reason to keep swiping (open loop / cliffhanger / progression), all the way to the CTA?
+15. **One-at-a-time render:** is each slide its own standalone image (no grid/board), rendered Slide 1 first with Slide 1 fed back as the style reference for the rest?
-If any answer is no, simplify before outputting. When unsure, remove the element — restraint always wins.
+If any answer is no, fix before outputting — sharpen the hook, simplify the slide, or re-render. When unsure on visuals, remove the element — restraint always wins; when unsure on copy, make it more specific and open the loop wider.
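
The sequential workflow added in this hunk (render Slide 1, then anchor every later slide to it) can be sketched roughly as below. This is a minimal illustration, not part of the package: `render_carousel`, the `render` callable, and the `approve` review hook are all hypothetical names, and the real image tool (for example a Wyren `imageAI` node) would sit behind `render`.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: (prompt, reference_images) -> rendered image path/id.
RenderFn = Callable[[str, Sequence[str]], str]


def render_carousel(
    prompts: List[str],
    render: RenderFn,
    approve: Callable[[str], bool] = lambda image: True,
    max_attempts: int = 3,
    include_previous: bool = False,
) -> List[str]:
    """Render one slide per call, passing Slide 1 as the style reference."""
    if not prompts:
        raise ValueError("need at least one slide prompt")

    slides: List[str] = []
    for index, prompt in enumerate(prompts):
        if index == 0:
            refs: List[str] = []          # Slide 1 sets the look, no reference yet
        else:
            refs = [slides[0]]            # every later slide is anchored to Slide 1
            if include_previous and index > 1:
                refs.append(slides[-1])   # optionally add the previous slide too

        # Review each slide before moving on; re-render THIS slide if it fails.
        for _ in range(max_attempts):
            image = render(prompt, refs)
            if approve(image):
                break
        else:
            raise RuntimeError(
                f"slide {index + 1} failed review after {max_attempts} attempts"
            )
        slides.append(image)
    return slides
```

Each loop iteration makes exactly one render call for exactly one slide, so a combined grid or board can never be requested by construction.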

package/skills/shorts-editor/SKILL.md CHANGED

@@ -18,7 +18,7 @@ in-world mark. `silence_cut.py` and `composition.template.html` are co-located i
 This skill is the final edit stage after `video-prompt` (or runs standalone on any raw footage).
 ## Overview
-Turn a finished-but-flat talking clip into a punchy 9:16 short. Engagement is layers stacked on clean footage: a **tight silence-cut** (raw footage only — skip for AI clips), **word-by-word pop-on captions** (Anton, gold keywords, no pill backing), **native treatments** (zoom punch-ins, speed ramps, hard cuts, SFX hits — NOT glass-pill chips), and a **SFX + brand-splash** finish. The composition is HyperFrames HTML; ffmpeg cuts and masters; faster-whisper supplies timing.
+Turn a finished-but-flat talking clip into a punchy 9:16 short. Engagement is layers stacked on clean footage: a **tight silence-cut** (raw footage AND assembled AI clips — Kling pads dead air into every generated clip), **word-by-word pop-on captions** (Anton, gold keywords, no pill backing), **native treatments** (zoom punch-ins, speed ramps, hard cuts, SFX hits — NOT glass-pill chips), and a **SFX + brand-splash** finish. The composition is HyperFrames HTML; ffmpeg cuts and masters; faster-whisper supplies timing.
 **Core principle:** retention is manufactured by deleting dead time and giving the eye a new beat every 2-4s. Cut the pauses first; every other layer just decorates the tightened result.
@@ -52,23 +52,21 @@ All paths below are inside the project dir: source from `clips/`, working files
 1. **Denoise** → `edit/audio_clean.m4a`:
    `ffmpeg -i clips/src.mp4 -af "highpass=85,afftdn,lowpass=12000,loudnorm=I=-14:TP=-1.5:LRA=11" -ar 48000 -ac 2 edit/audio_clean.m4a`
 2. **Transcribe** word timestamps with faster-whisper → `edit/transcript.json` (`[{text,start,end}]`). `small.en` only if the audio is English.
-3. **Silence-cut** with `./silence_cut.py` → `edit/tight.mp4` (the non-obvious core, see below). **SKIP this
-   step for AI-generated footage** (clips from `video-prompt` / Wyren): it is already tight and its
-   pauses are intentional comedic beats — the cutter is built for RAW talking-head dead air, not for
-   generated clips. For AI footage, go straight to assembly and keep the clips' timing. Only run the
-   silence-cut on real raw recordings (interview, vox-pop, vlog, podcast) with genuine dead air.
-4. **Re-transcribe `edit/tight.mp4`**, regroup into caption phrases (sentence-aware, 3-5 words). Cutting shifts every timestamp, so always re-transcribe the cut; never remap old times.
+3. **Silence-cut** with `./silence_cut.py` → `edit/tight.mp4` (the non-obvious core, see below). **Run this on AI-generated footage too** (clips from `video-prompt` / Wyren), not just raw recordings. Kling and similar models pad EVERY generated clip with ~1–2s of dead air — a slow lead-in plus a tail after the last word — so stacked clips drag badly. The old "AI clips are already tight, skip it" assumption is the opposite of reality. **Assemble the clips first, then silence-cut the assembled video.** For AI footage tune the cutter sensitive: `--cut-min 0.35` and a lower noise floor (`-36 dB`) to catch the quiet, breath-filled padding, leaving ~0.1s pads so only natural beats survive. Real raw recordings (interview, vox-pop, vlog, podcast) use the default settings.
+   **Mind the tail / SFX.** AI clips often land the FINAL WORD right at the out-point, so the ~0.1s pad after the last word matters, and **never land a transition SFX on a cut where a word ends** (it steps on the last word — see the SFX layer).
+4. **Re-transcribe the ASSEMBLED cut** (`edit/tight.mp4` — for raw footage OR an assembled AI-clip video), regroup into caption phrases (sentence-aware, 3-5 words). The silence-cut shifts every timestamp, so always re-transcribe and recompute **caption, zoom, SFX, and splash** timing from the NEW boundaries; never remap old times.
 5. **Build the composition** in `edit/composition/` from `./composition.template.html`: muted plate + separate dialogue audio + word-pop captions + zooms + logo + splash + SFX (no chips by default). Lint clean.
 6. **Draft render** (`--quality draft`) → `edit/frames/`, extract frames at every caption/splash beat, eyeball, fix. Then **`--quality high`** and **master** into `out/`:
    `ffmpeg -i edit/raw.mp4 -c:v copy -af "loudnorm=I=-14:TP=-1.5,alimiter=limit=0.95" -c:a aac -b:a 192k out/<slug>-final.mp4`
 ## The silence-cut (the part that is easy to get wrong)
-**Only for RAW recordings — skip entirely for AI-generated clips** (see Pipeline step 3).
+**Run on BOTH raw recordings and assembled AI-clip videos** (see Pipeline step 3). Kling pads every generated clip with ~1–2s of dead air, so AI assemblies need this just as much as raw footage — for AI clips tune sensitive (`--cut-min 0.35`, noise floor `-36 dB`).
 Neither signal alone works:
 - **silencedetect** misses pauses filled with ambient/breath above the noise floor.
 - **whisper word timestamps** are imprecise around pauses and will report contiguous words across a real ~1s gap (e.g. it claimed "So, cheating" was continuous when 0.8s of silence sat between them).
-**Cut where EITHER is true:** acoustic silence (`silencedetect=noise=-33dB:d=0.35`) OR a transcript word-gap > 0.45s. Only remove the dead middle when it exceeds ~0.5s, leave ~0.1s of speech pad each side, and KEEP sub-0.5s gaps (natural rhythm — cutting every micro-gap machine-guns the edit and looks glitchy). Add a 10ms `afade` at each join to kill clicks. **Verify with silencedetect on the OUTPUT** — it should show only short breath gaps. `./silence_cut.py` implements all of this; tune `--cut-min`.
+**Cut where EITHER is true:** acoustic silence (`silencedetect=noise=-33dB:d=0.35`) OR a transcript word-gap > 0.45s. Only remove the dead middle when it exceeds ~0.5s, leave ~0.1s of speech pad each side, and KEEP sub-0.5s gaps (natural rhythm — cutting every micro-gap machine-guns the edit and looks glitchy). Add a 10ms `afade` at each join to kill clicks. **Verify with silencedetect on the OUTPUT** — it should show only short breath gaps. `./silence_cut.py` implements all of this; tune `--cut-min`. **For AI-clip assemblies** (Kling et al.), the padding is quiet near-silent breath that `-33 dB` can miss — drop the noise floor to `-36 dB` and `--cut-min 0.35` to catch it.
 ## Layers (all face/caption-safe)
 - **Captions — word-by-word pop-on (the default):** each WORD pops in as it's spoken (Anton, UPPERCASE,
@@ -81,7 +79,7 @@ Neither signal alone works:
   hard cuts on the beat, and SFX hits. Only add a chip if the user explicitly asks for an on-screen
   label, and keep it minimal. The chip CSS/JS is removed from the default template.
 - **Zoom punch-ins:** scale the plate wrapper (base ~1.04) to ~1.10-1.14 on emphasis lines, ease back. Never scale below 1.0 (reveals letterbox edges). Cover-fit the plate.
-- **SFX:** a curated set ships in `./sfx/` — use these first (no download needed). Keep dialogue front (SFX vol 0.25-0.35); every `<audio>` needs an `id`. Mapping:
+- **SFX:** a curated set ships in `./sfx/` — use these first (no download needed). Keep dialogue front (SFX vol 0.25-0.35); every `<audio>` needs an `id`. **Never land a transition/whoosh SFX on a cut where a word ends** — it steps on the last word; put the hit on a silent beat or at the head of the next clip (this is why the silence-cut leaves ~0.1s pad after the last word, see Pipeline step 3). Mapping:
   | Role | File |
   | --- | --- |
   | Opening riser (first frame) | `./sfx/riser-high.mp3` |
@@ -106,8 +104,9 @@ Captions exist to (a) make the video legible sound-off, (b) hold retention with
 - Captions are the **wording source of truth** — they carry the exact script even if native/synth audio
   drops a word.
 - Never cover the face — captions live in the lower third; any other graphic stays lower-mid or a top strip.
-- Derive timing from **word-level transcription of the FINAL cut** (re-transcribe after any trim). Never
-  hand-guess or remap old times.
+- Derive timing from **word-level transcription of the FINAL cut** (re-transcribe after any cut, including
+  the AI-clip silence-cut). The same re-transcription drives caption, zoom, SFX, and splash timing — recompute
+  them all from the new boundaries. Never hand-guess or remap old times.
 **Font / size / layout:** Anton (or a heavy grotesk like Archivo Black for premium brands), UPPERCASE by
 default (sentence case only for a strictly soft/premium voice), one caption font for the whole video,
@@ -149,6 +148,8 @@ cross-dissolves / cinematic fades · ❌ tiny text, thin weights, or a generic s
 | --- | --- |
 | `hyperframes transcribe` fails (whisper-cpp not found) | use faster-whisper directly |
 | A pause survives the cut | use silencedetect ∪ word-gap union, not either alone |
+| AI clips drag / dead air between clips | run the silence-cut on the assembled AI video (`--cut-min 0.35`, noise `-36 dB`, ~0.1s pads) — don't skip it for AI footage |
+| Last word clipped at a cut / SFX steps on speech (AI clips) | keep the ~0.1s pad after the last word; never land a transition SFX where a word ends |
 | Two captions on screen at once | clamp hard-hide to `min(end+0.12, next.start-0.06)`, floor `inAt+0.2` (see Caption standard) |
 | Font silently falls back | Anton/Archivo etc. are NOT auto-embedded; download `.woff2` to `fonts/` + `@font-face` |
 | SFX silent in the render | every `<audio>` needs an `id` |
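
The "cut where EITHER is true" rule from the silence-cut section can be sketched as an interval union. This is only an illustration under stated assumptions and is not the shipped `silence_cut.py`: the function names are hypothetical, `silences` is assumed to be a list of `(start, end)` pairs already parsed from `silencedetect` output, `words` follows the `[{text,start,end}]` transcript shape, and the ~0.5s threshold is assumed to apply to the whole gap before padding.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Union overlapping or touching intervals into a sorted list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def word_gaps(words: List[Dict], min_gap: float = 0.45) -> List[Interval]:
    """Gaps between consecutive transcript words longer than min_gap seconds."""
    return [
        (a["end"], b["start"])
        for a, b in zip(words, words[1:])
        if b["start"] - a["end"] > min_gap
    ]


def plan_cuts(
    silences: List[Interval],
    words: List[Dict],
    cut_min: float = 0.5,   # keep gaps at or below this (natural rhythm)
    pad: float = 0.1,       # speech pad left on each side of a cut
    min_gap: float = 0.45,
) -> List[Interval]:
    """Spans to remove: acoustic silence OR transcript gap, minus the pads."""
    candidates = merge(list(silences) + word_gaps(words, min_gap))
    return [
        (start + pad, end - pad)
        for start, end in candidates
        if end - start > cut_min
    ]
```

For AI-clip assemblies the diff suggests a more sensitive setting, which in this sketch would correspond to calling `plan_cuts(..., cut_min=0.35)` on silences detected at the lower `-36 dB` noise floor.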

package/skills/storyboard-prompt/SKILL.md CHANGED

@@ -138,6 +138,7 @@ Rewrite the concept into a stronger TikTok/Reels-ready version. Produce:
 - CTA echoes the brand's CTA style and leads softly to the brand.
 - Treat the brand's hook/CTA examples as a springboard, not copy-paste — write a fresher, sharper line.
 - **State in ONE line how the chosen hook + payoff + CTA map to the brand's message.**
+- **Name the through-line.** Write the SINGLE core message every spoken line will serve — one sentence, not three. This is the through-line the dialogue-only pass enforces in Phase 8B; if you can't state it in one line here, the script isn't ready to storyboard.
 **CATCHY & ENGAGING BAR.** The polished idea must clear a stickiness bar, not just clarity:
 - A memorable, quotable line/phrase the viewer could repeat or comment (not clickbait).
@@ -174,6 +175,8 @@ Text may appear inside a frame ONLY when physically part of the scene: a real co
 **Brand/logo rules.** If a brand logo asset is attached: use the actual supplied logo only; as a subtle in-world prop; place naturally on 1–3 panels only (unless the same object stays visible); good placements = tote bag, laptop sticker, mug, notebook, office sticker, small poster, badge, desk item; visible but not dominant; incidental, not ad-like. **Do NOT** invent a fake logo, distort it, make it giant, use it as a watermark/overlay, force it into every panel, or make the scene feel like a traditional ad. If exact reproduction isn't possible, leave the prop simple and add to that frame's BRAND/LOGO NOTES: "Place supplied brand logo here in post/prop."
+**NEVER trust the image model to draw the mark.** Asked to render a logo from a text description, image models reliably FABRICATE it — a wrong wordmark, an invented icon, a mangled symbol — every time. So always **attach the actual `../_arca-marketing-assets/assets/logo.png` file to the `imageAI` node as a reference image** (Phase 8 already wires it in); never describe the logo and hope. If the model still can't reproduce it cleanly at the size shown, leave a plain unbranded prop and note "place supplied logo here in post" rather than shipping an invented mark.
 ## CLARIFICATION CHECKPOINT — ASK BEFORE GENERATING THE IMAGE
 Before generating the production board (Phase 8), STOP and check whether anything material is still unclear or assumption-based. If so, ask a few focused questions and WAIT — don't generate the board on shaky assumptions, the image is expensive to redo. Ask when any of these are unresolved or guessed:
 - chosen video type/style and overall look
@@ -215,7 +218,7 @@ Render the WHOLE board in the chosen video type's look (UGC by default — raw/i
 ### GENERATE THE BOARD (only after the user confirms in Step 8-ii)
 Once the user says go, actually RENDER the board — don't just hand over the prompt. Default path is the **Wyren MCP** (consistent with `video-prompt`); load the `wyren` skill before any `mcp__wyren__*` call.
-- **Model:** an image model that handles a multi-panel board + reference images — **Nanobanana Pro** (Gemini 3 Pro Image, up to 4K, ≤14 reference images). Pass `../_arca-marketing-assets/assets/logo.png` and `characters.png` as reference images so the persona + logo stay on-brand.
+- **Model:** an image model that handles a multi-panel board + reference images — **Nanobanana Pro** (Gemini 3 Pro Image, up to 4K, ≤14 reference images). Pass `../_arca-marketing-assets/assets/logo.png` and `characters.png` as reference images so the persona + logo stay on-brand — the logo file is REQUIRED on the `imageAI` node, never a text description (the model fabricates a wrong mark otherwise).
 - **Format:** landscape board, ~4:3 (e.g. ~1456×1088, or 2K/4K for legible per-cut labels).
 - **Flow:** `build_graph` (`imageInput` logo.png + characters.png → `imageAI` Nanobanana Pro, the board prompt as a CONNECTED text edge — `customPrompt` alone fails validation) → `validate_workflow` → `get_pricing`/`estimate_product_cost` → get the user's OK to spend → `run_workflow` (`userConfirmed: true`) → poll `get_workflow_run_status` until terminal → `get_node_outputs` → present the rendered board.
 - **Fallback:** if Wyren isn't connected, render with any available image tool; if none is available, output the finalized prompt and tell the user exactly which model to paste it into (Nanobanana Pro / Seedream / GPT-Image). Never silently stop at the prompt when the user asked to generate.
@@ -237,6 +240,11 @@ Output everything written as TEXT, separate from the board image. This carries t
   - **Visual** — a RICH 2–3 sentence description (subject + action, key environment/props visible, where focus sits, lighting/look) — the same depth that prints as the board caption (`Shot spec — Visual`). Not a bare label.
   - **Dialogue** — the actual spoken line for that beat, written out. Use "—" ONLY for true silent beats; do NOT leave blank to fill later. These exact lines get spoken (native audio nails scripted lines), so write them tight and in-character.
   - **Direction** — performance / delivery note that drives the acting (e.g. "deliver softly, almost inspirational"; "rushed, glancing off-camera"; "deadpan, then a tiny smirk"). Always fill this — **it is the highest-value column and the reason this table is mandatory.**
+  **SCRIPT COHERENCE — the dialogue-only pass (mandatory before handoff).** Punchy individual lines do NOT add up to a coherent script. Choppy ≠ coherent. Before you ship the FLOW table, gate the Dialogue column on all three:
+  - **One message, one through-line.** The whole script serves ONE core message (the through-line set in Phase 4); every spoken line must ladder to it. Kill competing morals — if two lines push different takeaways (e.g. "AI does the work" vs "no, I do it" vs "AI took my waiting"), cut or rewrite until one wins. Three half-arguments read as noise.
+  - **Line-to-line logic.** Each line must *answer or escalate* the line before it. No non-sequiturs — a question must be answered by the next line, not sidestepped. No contradictions — a claim must not be undercut by the next line (e.g. "you give it context, it does the work" → "No, I do it" is a contradiction, not a beat).
+  - **The dialogue-only pass (the actual test).** Read ONLY the Dialogue column top-to-bottom, ignoring every visual, caption, and shot note. It must read as one coherent conversation: setup→answer, claim→no-contradiction, hook→payoff. If it doesn't, **rewrite the lines now, before any image/video generation.** Reordering clips downstream cannot repair broken dialogue logic — only rewriting the lines can. Do not hand off a script that fails this read.
 - **VIDEO EDITOR NOTES** — what to add in the EDIT, not the frames: suggested on-screen caption per beat, cut/transition style, SFX + music, pacing, zoom punch-ins, where the logo / brand splash lands. (Feeds the `shorts-editor` skill.)
 - **STYLE NOTES** — the look the final video must match: video type, lighting, camera feel, color, wardrobe/prop continuity, mood keywords. Reference the CHARACTERS + SHARED CHOICES sections for what must stay consistent. (Feeds `video-prompt`.)
 - **BRAND / LOGO NOTES** — which frames the supplied logo appears in and how (subtle in-world prop), per Phase 7.
@@ -283,7 +291,7 @@ After the storyboard, write 2 practical social captions for the target audience.
 5. 9-beat shot structure
 6. Quality gate summary (type-appropriate)
 7. Clarification checkpoint — confirm DIRECTION and ask any open questions (skip with one line if already clear and approved)
-8. Board prompt + text breakdown (Step 8-i) — show the finalized image-gen prompt for the landscape board (shared choices, character reference, environment + floor plan, storyboard strip, lighting/mood) AND the text breakdown (video concept, shared choices, characters, environment, 6-column flow table, editor notes, style notes, brand/logo notes); state assumptions and refine with the user. No image yet.
+8. Board prompt + text breakdown (Step 8-i) — show the finalized image-gen prompt for the landscape board (shared choices, character reference, environment + floor plan, storyboard strip, lighting/mood) AND the text breakdown (video concept, shared choices, characters, environment, 6-column flow table, editor notes, style notes, brand/logo notes); state assumptions and refine with the user. No image yet. Before moving to generate, run the **dialogue-only coherence pass** on the FLOW table (Phase 8B) — read just the Dialogue column top-to-bottom: one through-line, every line answers/escalates the last, no contradictions. Rewrite the lines (never just reorder clips) if it fails.
 9. Confirm + GENERATE (Step 8-ii) — ask "generate the board now, or changes first?"; on an explicit go, render the board via Wyren imageAI (Nanobanana Pro, brand refs) and present it; offer one refine-and-regenerate pass if a panel drifts.
 10. video-prompt handoff (Phase 8C) — the compact copy-paste block (type, look lock, characters+bibles, how to use the board, per-cut shot list with dialogue + delivery, edit note). ALWAYS output this, even if the board render was skipped or failed — it's built from the text, not the image.
 11. Audience captions

package/skills/video-prompt/SKILL.md CHANGED

@@ -128,7 +128,7 @@ This template runs alongside the Wyren MCP. Confirm settings with the user durin
 - **Keep the face.** When upscaling/cleaning, instruct the model to PRESERVE the existing person's identity — same face, hair, age, build, wardrobe — and only improve quality (sharpen, denoise, fix artifacts). Don't let it redraw or beautify into a different face. Use an image-edit-capable model (Nanobanana / Nanobanana Pro accept image input), pass the panel + character profile (+ `characters.png`) as references, with a prompt like "enhance and clean this frame, keep the exact same face and person, do not change identity." This face-preserving upscale is what makes per-shot start frames consistent across a multishot/multi-clip video.
 **Image models (category "image"):** Nanobanana (Gemini 2.5 Flash Image, 1K, image input, default), Nanobanana Pro (Gemini 3 Pro Image, up to 4K, up to 14 reference images — best for keeping persona/logo consistent), Imagen 4 Fast/Standard/Ultra (text-only, 1K–2K, no image input). Sizes: 1K/2K/4K. Aspect ratios: 1:1, 4:3, 9:16, 16:9.
-**Default:** Nanobanana Pro at 2K (image input + multi-reference locks character + logo). Pass `characters.png` and `logo.png` as reference images.
+**Default:** Nanobanana Pro at 2K (image input + multi-reference locks character + logo). Pass `characters.png` and `logo.png` as reference images — the `logo.png` file is REQUIRED whenever the mark is visible, never a text description (the model fabricates a wrong logo otherwise; see BRAND & LOGO RULES).
 ### Step B — generate the clips (video model, `videoAI` node)
 Video models (category "video") and key knobs:
@@ -168,6 +168,11 @@ Any time the video is more than one shot — a multi-clip split (Part 1/Part 2)
 If two characters recur, build a separate profile + bible for each, and keep both references wired into every shot where they appear.
+**SECONDARY & BACKGROUND characters drift worst.** Per-shot generation only locks the START-FRAME subject; everyone else — the second person in a two-hander, recurring side characters, background extras — gets reinvented (face AND wardrobe) every shot. Don't rely on the bible text alone for them:
+- **Pin each recurring secondary character's wardrobe explicitly** in every shot prompt, not just the lead's (e.g. "the man in the grey zip hoodie and black cap"). Vague secondary descriptions are where the model improvises a new person.
+- **Chain each secondary character's best generated frame into their later shots as a reference:** once a shot renders them well, feed that frame to `imageAI` as a reference when designing their next start frame. The image input accepts multiple reference URLs, so wire BOTH characters' references into any shot where both appear.
+- **Push extras out of focus.** Keep background people incidental, turned away, blurred, or cropped — never ask the model to hold a face it doesn't need to. An extra the viewer can't study can't visibly drift.
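
The reference-chaining advice above can be sketched as a helper that assembles the reference list for one shot: each character's profile image, plus that character's best previously generated frame when one exists. This is a rough illustration under stated assumptions. The function name, the dict shapes, and the example URLs are hypothetical and do not describe the Wyren `imageAI` interface; the limit of 14 references is the Nanobanana Pro figure quoted earlier in this skill.

```python
from typing import Dict, List

MAX_REFERENCES = 14  # Nanobanana Pro limit as stated in this skill file


def references_for_shot(
    characters_in_shot: List[str],
    profile_refs: Dict[str, str],
    best_frames: Dict[str, str],
) -> List[str]:
    """Collect reference URLs for every character who appears in a shot.

    All names here are hypothetical; this only illustrates wiring BOTH
    characters' references into a shared shot.
    """
    refs: List[str] = []
    for name in characters_in_shot:
        refs.append(profile_refs[name])  # the character's base profile image
        if name in best_frames:
            refs.append(best_frames[name])  # chained frame from an earlier shot
    if len(refs) > MAX_REFERENCES:
        raise ValueError(f"{len(refs)} references exceeds the limit of {MAX_REFERENCES}")
    return refs


# Hypothetical two-hander: the friend already has one well-rendered frame.
profiles = {
    "lead": "https://example.com/lead_profile.png",
    "friend": "https://example.com/friend_profile.png",
}
best = {"friend": "https://example.com/shot2_friend.png"}

print(references_for_shot(["lead", "friend"], profiles, best))
```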
 ## DEFAULT SPLIT RULE
 If target duration is 20–30s: **Part 1 = Panels 1–5; Part 2 = Panels 6–9.** Each part feels like one continuous video. Part 2 continues the same character, wardrobe, props, lighting, setting, camera quality, and emotional energy — do not restart the story, do not recap. Never show the storyboard grid, panel numbers, production notes, borders, arrows, labels, or annotations. Convert panels into real-feeling vertical footage.
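
As a small worked example, the default split rule above can be written as a function from target duration to panel groups. The skill only defines the split for 20–30s targets, so returning `None` outside that range is this sketch's own choice, and the function name is hypothetical.

```python
from typing import List, Optional


def split_panels(duration_s: float) -> Optional[List[List[int]]]:
    """Return [Part 1 panels, Part 2 panels] for a 20-30s target, else None.

    None means "no default split stated in the skill"; that fallback is an
    assumption of this sketch, not a rule from the source.
    """
    if 20 <= duration_s <= 30:
        part_1 = list(range(1, 6))   # Panels 1-5
        part_2 = list(range(6, 10))  # Panels 6-9
        return [part_1, part_2]
    return None


print(split_panels(25))
print(split_panels(12))
```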
@@ -267,6 +272,8 @@ Do not add generated overlay captions unless explicitly requested. Avoid: subtit
 ## BRAND & LOGO RULES
 If a brand is included: use the supplied logo only if available, as a natural physical prop. Good placements: laptop sticker, tote bag, mug, notebook, badge, desk object, small office poster. Keep it subtle but recognizable, only where it naturally belongs. Do not: make it the focus, use it as an overlay/watermark, make it giant, force it into every shot, or invent a fake/distorted logo. If exact reproduction isn't possible, use a plain prop and avoid fake logo distortion. The video should feel like native content that happens to include the brand, not a brand ad.
+**NEVER trust the image model to draw the mark.** From a text description alone, image models fabricate the logo — wrong wordmark, invented icon — every time. **Attach the actual `../_arca-marketing-assets/assets/logo.png` file to every `imageAI` node that should show the mark** (the start-frame stage is where the logo gets baked in). If the model still can't reproduce it cleanly, leave a plain unbranded prop and add the real logo later in the edit (`shorts-editor`) — never let the model invent one.
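
The attach-the-file rule above lends itself to a pre-flight check before any start frame is rendered. The sketch below flags image nodes that are meant to show the mark but do not carry the logo file among their references. The node dictionaries, their keys (`id`, `shows_logo`, `references`), and the function name are hypothetical and are not the Wyren node schema; only the logo path is taken from the skill text.

```python
from typing import Dict, List

LOGO_PATH = "../_arca-marketing-assets/assets/logo.png"  # path quoted in the skill


def nodes_missing_logo(nodes: List[Dict]) -> List[str]:
    """Return ids of nodes that should show the logo but lack the logo file.

    The node shape is an assumption made for illustration only.
    """
    missing = []
    for node in nodes:
        wants_logo = node.get("shows_logo", False)
        has_file = LOGO_PATH in node.get("references", [])
        if wants_logo and not has_file:
            missing.append(node["id"])
    return missing


# Hypothetical plan: shot_2 describes the logo in text but never attaches it.
plan = [
    {"id": "shot_1", "shows_logo": True, "references": ["characters.png", LOGO_PATH]},
    {"id": "shot_2", "shows_logo": True, "references": ["characters.png"]},
    {"id": "shot_3", "shows_logo": False, "references": ["characters.png"]},
]

print(nodes_missing_logo(plan))
```

A non-empty result would mean fixing the references, or switching that shot to a plain unbranded prop, before generating anything.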
 ## AUDIO DIRECTION
 Generate native audio if supported; it should feel like real creator-shot social video. Use: natural room tone, phone-mic ambience, casual dialogue, natural VO if the storyboard calls for it, imperfect human delivery, tiny pauses, keyboard taps, chair squeaks, paper sounds, phone buzzes, footsteps, desk sounds, bag rustles, small reaction sounds, subtle whoosh/riser only when helpful, light music only if it supports pacing. Native to TikTok/Reels, not cinematic. Avoid: trailer/orchestral music, dramatic swells, glossy commercial music, overproduced sound design, fake epic SFX, booming risers, ad-like or perfect-studio VO. Dialogue casual, slightly imperfect, human. If the model can't generate clean dialogue, prioritize realistic visual storytelling and leave dialogue/captions for editing.