arca-marketing-video 2.22.0 → 2.23.0
package/package.json
CHANGED
|
@@ -102,11 +102,18 @@ Generate each slide as a separate standalone image. Do NOT generate all slides i
|
|
|
102
102
|
- Safe area: keep important text/faces away from edges; ~80–120 px safe margin all sides; headline large enough for mobile; place logo/name mark consistently (usually bottom-left, bottom-right, or top-left per layout) and subtly — never competing with the headline.
|
|
103
103
|
- Repurposing: square 1:1 1080×1080; stories/reels covers 9:16 1080×1920. Default to 4:5 portrait unless specified.
|
|
104
104
|
|
|
105
|
-
###
|
|
106
|
-
|
|
105
|
+
### GENERATE ONE SLIDE AT A TIME (hard rule + workflow)
|
|
106
|
+
**Never render the slides together. One render = one slide.** No grid, no contact sheet, no 2×2 / 3×3 board, no "all slides in one image", no multi-panel composite, no brand board. A combined image is an instant fail — discard it and re-render as separate slides. (Asking an image model for "the whole carousel" reliably produces a cramped grid with unreadable text; one slide per render is the only way to get full-resolution, legible, mobile-ready slides.)
|
|
107
107
|
|
|
108
|
-
|
|
109
|
-
|
|
108
|
+
**Render sequentially, anchoring every slide to Slide 1:**
|
|
109
|
+
1. **Render Slide 1 first** as a single standalone 4:5 image. This locks the carousel's look — palette, type hierarchy, layout rhythm, safe margins, logo placement, illustration style. Review it; re-render until it's right BEFORE touching the rest (everything inherits from it).
|
|
110
|
+
2. **Render Slides 2…N one at a time**, and feed the rendered Slide 1 back into the image tool as a **style reference** for each (optionally the immediately-previous slide too). Separate per-slide renders drift in style/palette/type unless every slide is anchored to Slide 1 — passing it as a reference is what makes independent renders read as one cohesive carousel.
|
|
111
|
+
3. **Review each slide before rendering the next.** If a slide drifts in style, breaks the ELEMENT BUDGET, or its text renders badly, fix and re-render THAT slide before moving on — don't batch-fix at the end.
|
|
112
|
+
|
|
113
|
+
*Tool note (if rendering via Wyren `imageAI`): one `imageAI` node/call per slide, with Slide 1's output wired in as a reference image for slides 2…N. Never one node asked to produce all slides. Validate the cheap image tier before spending — load the `wyren` skill first.*
|
|
114
|
+
|
|
115
|
+
Continuity SHOULD come from: anchoring every slide to Slide 1's rendered look, consistent type hierarchy, consistent palette, recurring background texture, consistent safe margins, restrained highlight emphasis, and 1 recurring object/motif repeated across slides. Each slide visually distinct but clearly part of the same carousel.
|
|
116
|
+
Continuity should NOT come from: rendering them together, repeating every object, the same busy brand-world scene everywhere, forcing the logo into every large visual moment, or packing every slide with the full brand system.
|
|
110
117
|
|
|
111
118
|
## CREATIVE STYLE OPTIONS
|
|
112
119
|
Choose the best direction for the topic.
|
|
@@ -129,9 +136,31 @@ Choose 3, 4, or 5 slides based on how much the topic needs.
|
|
|
129
136
|
|
|
130
137
|
Optional: Slide 2 may be "the answer to the hook" when the hook creates curiosity that needs immediate resolution — only if it improves narrative flow.
|
|
131
138
|
|
|
139
|
+
## HOOK & SWIPE-MOMENTUM (the content engine — read before writing copy)
|
|
140
|
+
Carousels grow by earning swipes: every swipe is an engagement signal that pushes the post to more people, so the writing's whole job is to make the reader open the post and keep swiping. Weak copy is the #1 reason a well-designed carousel dies. Write to these:
|
|
141
|
+
|
|
142
|
+
**Slide 1 is the whole ballgame — treat it like a YouTube thumbnail + title.** Its only job is to make the reader feel they MUST swipe. Not to inform, not to summarize — to create an itch the next slide scratches. If Slide 1 can be fully understood and "closed" on its own, you've already lost the swipe.
|
|
143
|
+
|
|
144
|
+
**Open the loop; don't close it.** The hook names a tension, question, or surprising claim and deliberately withholds the payoff — the answer lives on the next slide ("The reason your backlog never shrinks isn't more work →"). Resolve it too early and there's no reason to keep going.
|
|
145
|
+
|
|
146
|
+
**Kill the boring carousel.** The dead format is a solid-color background with a generic listicle hook ("3 tips for losing weight") — instantly skippable. Ban templated "X tips for Y" headlines, generic motivational filler, and vague "you need to hear this." Be specific, surprising, contrarian, or personal instead.
|
|
147
|
+
|
|
148
|
+
**Hook formulas — pick one, tailor to the audience + brand voice (see `brand.md` hook examples):**
|
|
149
|
+
- **Curiosity gap** — tease a result or mechanism without revealing it ("Most founders fix the wrong bottleneck first.").
|
|
150
|
+
- **Contrarian / pattern interrupt** — challenge a belief the audience holds ("Stop hiring more people. It's making you slower.").
|
|
151
|
+
- **Breaking-news framing** — state it like a fresh headline or announcement; urgency and newness pull the swipe.
|
|
152
|
+
- **Costly-mistake call-out** — name a specific mistake the reader recognizes they're making ("This one habit is quietly killing your week.").
|
|
153
|
+
- **Number + payoff with an open loop** — the number promises, the slides deliver; never a flat list dumped on slide 1.
|
|
154
|
+
- **Identity call-out** — name the exact audience so they self-select ("Founders who still run their own ops — this one's for you.").
|
|
155
|
+
- **Relatable truth / story-open** — start mid-tension or on a painfully familiar moment so they feel seen.
|
|
156
|
+
|
|
157
|
+
**Every slide ends with a reason to swipe.** Build momentum: end each slide on an incomplete thought, a "but here's the problem…", a cliffhanger, a numbered progression (1→2→3), or a question the next slide answers. No slide should feel like a comfortable stopping point until the CTA.
|
|
158
|
+
|
|
159
|
+
**Write for shares and saves, not just reads.** The most-distributed carousels are also the most shareable — content that makes the reader look smart, feel seen, or want to send it to someone. A line reframed for two competing sub-audiences earns shares from both. Specific beats generic; concrete pain beats abstract benefit, every time.
|
|
160
|
+
|
|
132
161
|
## SLIDE STRATEGY
|
|
133
162
|
|
|
134
|
-
**Slide 1 — Hook** (most important; must stop the scroll). Strongest visual composition in the carousel; short intriguing headline readable in under 2 seconds; names a painful truth, hidden bottleneck, or costly mistake; one bold visual metaphor from the visual world; instantly understandable before the caption. Hook angles (see profile for on-brand lines, tailor to audience): reframe a common assumption ("X is not a Y problem."), name the real bottleneck, call out a costly overlooked mistake, or challenge popular advice. *Visual: can be boldest, but only one clear anchor. Split-screen → keep each side simple/high-contrast. Character → clean background. Metaphor → no second metaphor.*
|
|
163
|
+
**Slide 1 — Hook** (most important; must stop the scroll AND force the swipe — apply the HOOK & SWIPE-MOMENTUM rules above). Strongest visual composition in the carousel; short intriguing headline readable in under 2 seconds; names a painful truth, hidden bottleneck, or costly mistake; one bold visual metaphor from the visual world; instantly understandable before the caption. Hook angles (see profile for on-brand lines, tailor to audience): reframe a common assumption ("X is not a Y problem."), name the real bottleneck, call out a costly overlooked mistake, or challenge popular advice. *Visual: can be boldest, but only one clear anchor. Split-screen → keep each side simple/high-contrast. Character → clean background. Metaphor → no second metaphor.*
|
|
135
164
|
|
|
136
165
|
**Slide 2 — Problem / Tension.** Make the pain concrete but don't visualize every symptom — usually the simplest slide after the hook; isolate the core tension with one clean metaphor. Show what's breaking (use the topic's specific pains): core friction/blocker/cost the audience feels, the symptom that keeps returning, unclear ownership or broken handoff, wasted time or effort. Good visuals: one person facing a single growing problem stack; one blocked path between two states; one messy pile beside one clean headline; one calendar page showing delay; one task stack with a single warning label; one split desk with very few props. Avoid: many props/labels/objects in one slide; more than two piles; more than three small labels; detailed paperwork with readable microcopy; a full overhead desk scene unless most objects are abstract/unlabeled. Feel: "Here is the bottleneck," not "Here are all the objects related to the bottleneck."
|
|
137
166
|
|
|
@@ -144,7 +173,7 @@ Optional: Slide 2 may be "the answer to the hook" when the hook creates curiosit
|
|
|
144
173
|
Slide pacing: 1 boldest/clearest hook → 2 simplest/clearest bottleneck → 3 calm reframe → 4 structured framework → 5 spacious CTA.
|
|
145
174
|
|
|
146
175
|
## COPY RULES
|
|
147
|
-
One main idea per slide. Big headline first; short supporting copy only; no walls of text. Use audience-specific language; prefer concrete pain over abstract benefits. Avoid generic AI buzzwords and corporate filler. Every slide earns the next swipe. Plain, direct language. Hook feels smart, not clickbait; CTA feels helpful, not desperate.
|
|
176
|
+
One main idea per slide. Big headline first; short supporting copy only; no walls of text. Use audience-specific language; prefer concrete pain over abstract benefits. Avoid generic AI buzzwords and corporate filler. **Every slide earns the next swipe** (open-loop endings, per HOOK & SWIPE-MOMENTUM). Plain, direct language. Hook feels smart, not clickbait; CTA feels helpful, not desperate. **No templated "X tips for Y" listicle headlines and no generic motivational filler** — specific, surprising, or contrarian only. The headline must do most of the work; if blurring the visual leaves the message intact, the copy is strong enough.
|
|
148
177
|
|
|
149
178
|
## DESIGN RULES
|
|
150
179
|
Each slide looks like part of one cohesive campaign. Use: the brand logo somewhere (name written per profile rules), bold contrast, strong typography, generous negative space, clean editorial composition, restrained brand-world details, one clear visual anchor, highlight color sparingly, and off-white/accent/dark backgrounds strategically.
|
|
@@ -240,7 +269,7 @@ Describe the recurring visual elements across all slides. Include:
|
|
|
240
269
|
- visual simplicity rule for this specific carousel
|
|
241
270
|
```
|
|
242
271
|
|
|
243
|
-
Then create each slide separately. Each slide block uses these fields:
|
|
272
|
+
Then create each slide separately — and render them ONE AT A TIME (Slide 1 first, then each later slide with Slide 1 wired in as a style reference; see GENERATE ONE SLIDE AT A TIME). Never render the set as a grid/board. Each slide block uses these fields:
|
|
244
273
|
|
|
245
274
|
**SLIDE 1 — HOOK**
|
|
246
275
|
- Purpose: [what this slide does strategically]
|
|
@@ -278,6 +307,9 @@ Then create each slide separately. Each slide block uses these fields:
|
|
|
278
307
|
**CAPTION:**
|
|
279
308
|
Write a practical social caption for the target audience. Expand the idea without simply repeating the slides. Direct, useful, audience-aware. No em dash. Max 3 short paragraphs, 2 sentences each. End with a soft brand CTA. Include up to 5 SEO-helpful, not-overly-specific hashtags.
|
|
280
309
|
|
|
310
|
+
**POSTING TIP (include once for the user):**
|
|
311
|
+
Remind the user to **add music to the carousel before posting** (the audio option below the caption). It lifts engagement, and on Instagram it makes the carousel eligible to surface on the Reels tab — a real reach boost for a static post.
|
|
312
|
+
|
|
281
313
|
---
|
|
282
314
|
|
|
283
315
|
## FINAL SELF-CHECK BEFORE OUTPUT
|
|
@@ -294,5 +326,8 @@ Check every slide:
|
|
|
294
326
|
10. Carousel still premium, human, practical, on-brand?
|
|
295
327
|
11. ELEMENT BUDGET COUNT: literally count for this slide — text zones (cap 2: headline + one line), UI cards (cap 1, anchor only), speech/thought bubbles (must be 0), scattered decorations like pins/dashes/sparkles/rays (must be 0). Over on any? Cut before outputting.
|
|
296
328
|
12. Did every image prompt end with the RENDER-ONLY enforcement line?
|
|
329
|
+
13. **Hook test:** does Slide 1 open a curiosity loop that makes the swipe feel irresistible — not a generic "X tips" listicle or a self-contained statement that needs no swipe?
|
|
330
|
+
14. **Swipe-momentum:** does every slide end with a reason to keep swiping (open loop / cliffhanger / progression), all the way to the CTA?
|
|
331
|
+
15. **One-at-a-time render:** is each slide its own standalone image (no grid/board), rendered Slide 1 first with Slide 1 fed back as the style reference for the rest?
|
|
297
332
|
|
|
298
|
-
If any answer is no,
|
|
333
|
+
If any answer is no, fix before outputting — sharpen the hook, simplify the slide, or re-render. When unsure on visuals, remove the element — restraint always wins; when unsure on copy, make it more specific and open the loop wider.
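The element budget in check 11 is a plain counting rule, so it could be sketched as a small helper. This is an illustrative sketch only: the key names (`text_zones`, `ui_cards`, `bubbles`, `decorations`) and the function `over_budget` are hypothetical and not part of the package; only the caps come from the checklist.

```python
# Caps taken from check 11 of the self-check list; the key names are hypothetical.
ELEMENT_BUDGET = {
    "text_zones": 2,   # headline + one supporting line
    "ui_cards": 1,     # anchor only
    "bubbles": 0,      # speech/thought bubbles must be 0
    "decorations": 0,  # pins, dashes, sparkles, rays must be 0
}


def over_budget(counts):
    """Return {element: excess} for every element counted above its cap."""
    excess = {}
    for element, cap in ELEMENT_BUDGET.items():
        found = counts.get(element, 0)
        if found > cap:
            excess[element] = found - cap
    return excess


# Example: one extra text zone and two sparkles need to be cut.
slide = {"text_zones": 3, "ui_cards": 1, "bubbles": 0, "decorations": 2}
print(over_budget(slide))
```

An empty result means the slide is within budget; any key in the result names what to cut before outputting.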
|
|
@@ -18,7 +18,7 @@ in-world mark. `silence_cut.py` and `composition.template.html` are co-located i
|
|
|
18
18
|
This skill is the final edit stage after `video-prompt` (or runs standalone on any raw footage).
|
|
19
19
|
|
|
20
20
|
## Overview
|
|
21
|
-
Turn a finished-but-flat talking clip into a punchy 9:16 short. Engagement is layers stacked on clean footage: a **tight silence-cut** (raw footage
|
|
21
|
+
Turn a finished-but-flat talking clip into a punchy 9:16 short. Engagement is layers stacked on clean footage: a **tight silence-cut** (raw footage AND assembled AI clips — Kling pads dead air into every generated clip), **word-by-word pop-on captions** (Anton, gold keywords, no pill backing), **native treatments** (zoom punch-ins, speed ramps, hard cuts, SFX hits — NOT glass-pill chips), and a **SFX + brand-splash** finish. The composition is HyperFrames HTML; ffmpeg cuts and masters; faster-whisper supplies timing.
|
|
22
22
|
|
|
23
23
|
**Core principle:** retention is manufactured by deleting dead time and giving the eye a new beat every 2-4s. Cut the pauses first; every other layer just decorates the tightened result.
|
|
24
24
|
|
|
@@ -52,23 +52,21 @@ All paths below are inside the project dir: source from `clips/`, working files
|
|
|
52
52
|
1. **Denoise** → `edit/audio_clean.m4a`:
|
|
53
53
|
`ffmpeg -i clips/src.mp4 -af "highpass=85,afftdn,lowpass=12000,loudnorm=I=-14:TP=-1.5:LRA=11" -ar 48000 -ac 2 edit/audio_clean.m4a`
|
|
54
54
|
2. **Transcribe** word timestamps with faster-whisper → `edit/transcript.json` (`[{text,start,end}]`). `small.en` only if the audio is English.
|
|
55
|
-
3. **Silence-cut** with `./silence_cut.py` → `edit/tight.mp4` (the non-obvious core, see below). **
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
silence-cut on real raw recordings (interview, vox-pop, vlog, podcast) with genuine dead air.
|
|
60
|
-
4. **Re-transcribe `edit/tight.mp4`**, regroup into caption phrases (sentence-aware, 3-5 words). Cutting shifts every timestamp, so always re-transcribe the cut; never remap old times.
|
|
55
|
+
3. **Silence-cut** with `./silence_cut.py` → `edit/tight.mp4` (the non-obvious core, see below). **Run this on AI-generated footage too** (clips from `video-prompt` / Wyren), not just raw recordings. Kling and similar models pad EVERY generated clip with ~1–2s of dead air (a slow lead-in plus a tail after the last word), so stacked clips drag badly. The old "AI clips are already tight, skip it" assumption is the opposite of reality. **Assemble the clips first, then silence-cut the assembled video.** For AI footage, tune the cutter to be more sensitive: use `--cut-min 0.35` and a lower noise floor (`-36 dB`) to catch the quiet, breath-filled padding, and leave ~0.1s pads so only natural beats survive. Real raw recordings (interview, vox-pop, vlog, podcast) use the default settings.
|
|
56
|
+
|
|
57
|
+
**Mind the tail / SFX.** AI clips often land the FINAL WORD right at the out-point, so the ~0.1s pad after the last word matters, and **never land a transition SFX on a cut where a word ends** (it steps on the last word — see the SFX layer).
|
|
58
|
+
4. **Re-transcribe the ASSEMBLED cut** (`edit/tight.mp4` — for raw footage OR an assembled AI-clip video), regroup into caption phrases (sentence-aware, 3-5 words). The silence-cut shifts every timestamp, so always re-transcribe and recompute **caption, zoom, SFX, and splash** timing from the NEW boundaries; never remap old times.
|
|
61
59
|
5. **Build the composition** in `edit/composition/` from `./composition.template.html`: muted plate + separate dialogue audio + word-pop captions + zooms + logo + splash + SFX (no chips by default). Lint clean.
|
|
62
60
|
6. **Draft render** (`--quality draft`) → `edit/frames/`, extract frames at every caption/splash beat, eyeball, fix. Then **`--quality high`** and **master** into `out/`:
|
|
63
61
|
`ffmpeg -i edit/raw.mp4 -c:v copy -af "loudnorm=I=-14:TP=-1.5,alimiter=limit=0.95" -c:a aac -b:a 192k out/<slug>-final.mp4`
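The phrase regrouping in step 4 could look roughly like the sketch below. It assumes the `[{text,start,end}]` word shape from step 2; the function names `group_phrases` and `_close` are hypothetical. It closes a phrase at sentence-ending punctuation or at 5 words, whichever comes first, so it does not enforce the 3-word lower bound: a short sentence or a sentence tail can produce a shorter phrase.

```python
SENTENCE_END = (".", "?", "!")


def _close(words):
    """Collapse a run of word dicts into one caption phrase."""
    return {
        "text": " ".join(w["text"].strip() for w in words),
        "start": words[0]["start"],  # phrase appears with its first word
        "end": words[-1]["end"],     # and ends with its last word
    }


def group_phrases(words, max_words=5):
    """Group word timestamps into caption phrases, breaking at sentence ends."""
    phrases, current = [], []
    for word in words:
        current.append(word)
        ends_sentence = word["text"].strip().endswith(SENTENCE_END)
        if ends_sentence or len(current) >= max_words:
            phrases.append(_close(current))
            current = []
    if current:  # flush any trailing words that never hit a boundary
        phrases.append(_close(current))
    return phrases
```

Run this on the transcript of the tightened cut, not the original one, since the phrase times are copied straight from the word times.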
|
|
64
62
|
|
|
65
63
|
## The silence-cut (the part that is easy to get wrong)
|
|
66
|
-
**
|
|
64
|
+
**Run on BOTH raw recordings and assembled AI-clip videos** (see Pipeline step 3). Kling pads every generated clip with ~1–2s of dead air, so AI assemblies need this just as much as raw footage. For AI clips, tune the cutter to be more sensitive (`--cut-min 0.35`, noise floor `-36 dB`).
|
|
67
65
|
Neither signal alone works:
|
|
68
66
|
- **silencedetect** misses pauses filled with ambient/breath above the noise floor.
|
|
69
67
|
- **whisper word timestamps** are imprecise around pauses and will report contiguous words across a real ~1s gap (e.g. it claimed "So, cheating" was continuous when 0.8s of silence sat between them).
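The acoustic signal from the first bullet could be collected roughly as follows. This is a minimal sketch, not the contents of `./silence_cut.py`: the function names `parse_silencedetect` and `detect_silences` are hypothetical, and it assumes ffmpeg's `silencedetect` filter logs `silence_start:` and `silence_end:` values to stderr.

```python
import re
import subprocess

_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silencedetect(log):
    """Turn silencedetect log text into a list of (start, end) intervals."""
    intervals = []
    start = None
    for line in log.splitlines():
        match = _START.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = _END.search(line)
        if match and start is not None:
            intervals.append((start, float(match.group(1))))
            start = None
    # A silence that starts but never ends in the log is dropped.
    return intervals


def detect_silences(path, noise_db=-33, min_dur=0.35):
    """Run ffmpeg's silencedetect on a file and return the silent intervals."""
    cmd = [
        "ffmpeg", "-hide_banner", "-nostats", "-i", path,
        "-af", f"silencedetect=noise={noise_db}dB:d={min_dur}",
        "-f", "null", "-",
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return parse_silencedetect(proc.stderr)
```

The same `detect_silences` call can be pointed at the output file to verify that only short breath gaps remain.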
|
|
70
68
|
|
|
71
|
-
**Cut where EITHER is true:** acoustic silence (`silencedetect=noise=-33dB:d=0.35`) OR a transcript word-gap > 0.45s. Only remove the dead middle when it exceeds ~0.5s, leave ~0.1s of speech pad each side, and KEEP sub-0.5s gaps (natural rhythm — cutting every micro-gap machine-guns the edit and looks glitchy). Add a 10ms `afade` at each join to kill clicks. **Verify with silencedetect on the OUTPUT** — it should show only short breath gaps. `./silence_cut.py` implements all of this; tune `--cut-min`.
|
|
69
|
+
**Cut where EITHER is true:** acoustic silence (`silencedetect=noise=-33dB:d=0.35`) OR a transcript word-gap > 0.45s. Only remove the dead middle when it exceeds ~0.5s, leave ~0.1s of speech pad each side, and KEEP sub-0.5s gaps (natural rhythm — cutting every micro-gap machine-guns the edit and looks glitchy). Add a 10ms `afade` at each join to kill clicks. **Verify with silencedetect on the OUTPUT** — it should show only short breath gaps. `./silence_cut.py` implements all of this; tune `--cut-min`. **For AI-clip assemblies** (Kling et al.), the padding is quiet near-silent breath that `-33 dB` can miss — drop the noise floor to `-36 dB` and `--cut-min 0.35` to catch it.
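The union rule above could be sketched like this. It is an illustration under stated assumptions, not the actual `./silence_cut.py`: the names `merge`, `word_gaps`, and `plan_cuts` are hypothetical, `cut_min` is assumed to be the minimum gap length worth cutting (mirroring the `--cut-min` flag, whose real semantics may differ), and words use the `[{text,start,end}]` shape from the transcript.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def word_gaps(words, min_gap=0.45):
    """Gaps between consecutive transcript words longer than min_gap seconds."""
    return [
        (prev["end"], nxt["start"])
        for prev, nxt in zip(words, words[1:])
        if nxt["start"] - prev["end"] > min_gap
    ]


def plan_cuts(silences, words, cut_min=0.5, pad=0.1, min_gap=0.45):
    """Return the (start, end) spans to delete from the source.

    A gap is a candidate if EITHER signal reports it; it is only cut when it is
    longer than cut_min, and pad seconds are kept on each side of the speech.
    """
    candidates = merge(list(silences) + word_gaps(words, min_gap))
    return [
        (start + pad, end - pad)
        for start, end in candidates
        if end - start > cut_min
    ]
```

Shorter gaps fall through untouched, which is what preserves the natural rhythm; the 10ms `afade` at each join is a separate ffmpeg step and is not shown here.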
|
|
72
70
|
|
|
73
71
|
## Layers (all face/caption-safe)
|
|
74
72
|
- **Captions — word-by-word pop-on (the default):** each WORD pops in as it's spoken (Anton, UPPERCASE,
|
|
@@ -81,7 +79,7 @@ Neither signal alone works:
|
|
|
81
79
|
hard cuts on the beat, and SFX hits. Only add a chip if the user explicitly asks for an on-screen
|
|
82
80
|
label, and keep it minimal. The chip CSS/JS is removed from the default template.
|
|
83
81
|
- **Zoom punch-ins:** scale the plate wrapper (base ~1.04) to ~1.10-1.14 on emphasis lines, ease back. Never scale below 1.0 (reveals letterbox edges). Cover-fit the plate.
|
|
84
|
-
- **SFX:** a curated set ships in `./sfx/` — use these first (no download needed). Keep dialogue front (SFX vol 0.25-0.35); every `<audio>` needs an `id`. Mapping:
|
|
82
|
+
- **SFX:** a curated set ships in `./sfx/` — use these first (no download needed). Keep dialogue front (SFX vol 0.25-0.35); every `<audio>` needs an `id`. **Never land a transition/whoosh SFX on a cut where a word ends** — it steps on the last word; put the hit on a silent beat or at the head of the next clip (this is why the silence-cut leaves ~0.1s pad after the last word, see Pipeline step 3). Mapping:
|
|
85
83
|
| Role | File |
|
|
86
84
|
| --- | --- |
|
|
87
85
|
| Opening riser (first frame) | `./sfx/riser-high.mp3` |
|
|
@@ -106,8 +104,9 @@ Captions exist to (a) make the video legible sound-off, (b) hold retention with
|
|
|
106
104
|
- Captions are the **wording source of truth** — they carry the exact script even if native/synth audio
|
|
107
105
|
drops a word.
|
|
108
106
|
- Never cover the face — captions live in the lower third; any other graphic stays lower-mid or a top strip.
|
|
109
|
-
- Derive timing from **word-level transcription of the FINAL cut** (re-transcribe after any
|
|
110
|
-
|
|
107
|
+
- Derive timing from **word-level transcription of the FINAL cut** (re-transcribe after any cut, including
|
|
108
|
+
the AI-clip silence-cut). The same re-transcription drives caption, zoom, SFX, and splash timing — recompute
|
|
109
|
+
them all from the new boundaries. Never hand-guess or remap old times.
|
|
111
110
|
|
|
112
111
|
**Font / size / layout:** Anton (or a heavy grotesk like Archivo Black for premium brands), UPPERCASE by
|
|
113
112
|
default (sentence case only for a strictly soft/premium voice), one caption font for the whole video,
|
|
@@ -149,6 +148,8 @@ cross-dissolves / cinematic fades · ❌ tiny text, thin weights, or a generic s
|
|
|
149
148
|
| --- | --- |
|
|
150
149
|
| `hyperframes transcribe` fails (whisper-cpp not found) | use faster-whisper directly |
|
|
151
150
|
| A pause survives the cut | use silencedetect ∪ word-gap union, not either alone |
|
|
151
|
+
| AI clips drag / dead air between clips | run the silence-cut on the assembled AI video (`--cut-min 0.35`, noise `-36 dB`, ~0.1s pads) — don't skip it for AI footage |
|
|
152
|
+
| Last word clipped at a cut / SFX steps on speech (AI clips) | keep the ~0.1s pad after the last word; never land a transition SFX where a word ends |
|
|
152
153
|
| Two captions on screen at once | clamp hard-hide to `min(end+0.12, next.start-0.06)`, floor `inAt+0.2` (see Caption standard) |
|
|
153
154
|
| Font silently falls back | Anton/Archivo etc. are NOT auto-embedded; download `.woff2` to `fonts/` + `@font-face` |
|
|
154
155
|
| SFX silent in the render | every `<audio>` needs an `id` |
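
The caption clamp from the "Two captions on screen at once" row can be written out as a small function. This is a sketch under assumptions: `hide_time` is a hypothetical name, `in_at` is taken to be the caption's pop-in time, and the floor is assumed to be applied last, so for a very short caption the floor can win over the next-caption clamp.

```python
def hide_time(in_at, end, next_start=None):
    """Hard-hide time for a caption, in seconds.

    Implements min(end + 0.12, next.start - 0.06), floored at in_at + 0.2.
    For the last caption (no next_start) only the end + 0.12 term applies.
    """
    hide = end + 0.12
    if next_start is not None:
        hide = min(hide, next_start - 0.06)
    return max(hide, in_at + 0.2)


# A caption spoken 1.0-1.5s with the next caption starting at 1.55s
# is hidden just before the next one pops in.
print(hide_time(1.0, 1.5, 1.55))
```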
|
|
@@ -138,6 +138,7 @@ Rewrite the concept into a stronger TikTok/Reels-ready version. Produce:
|
|
|
138
138
|
- CTA echoes the brand's CTA style and leads softly to the brand.
|
|
139
139
|
- Treat the brand's hook/CTA examples as a springboard, not copy-paste — write a fresher, sharper line.
|
|
140
140
|
- **State in ONE line how the chosen hook + payoff + CTA map to the brand's message.**
|
|
141
|
+
- **Name the through-line.** Write the SINGLE core message every spoken line will serve — one sentence, not three. This is the through-line the dialogue-only pass enforces in Phase 8B; if you can't state it in one line here, the script isn't ready to storyboard.
|
|
141
142
|
|
|
142
143
|
**CATCHY & ENGAGING BAR.** The polished idea must clear a stickiness bar, not just clarity:
|
|
143
144
|
- A memorable, quotable line/phrase the viewer could repeat or comment (not clickbait).
|
|
@@ -174,6 +175,8 @@ Text may appear inside a frame ONLY when physically part of the scene: a real co
|
|
|
174
175
|
|
|
175
176
|
**Brand/logo rules.** If a brand logo asset is attached: use the actual supplied logo only; as a subtle in-world prop; place naturally on 1–3 panels only (unless the same object stays visible); good placements = tote bag, laptop sticker, mug, notebook, office sticker, small poster, badge, desk item; visible but not dominant; incidental, not ad-like. **Do NOT** invent a fake logo, distort it, make it giant, use it as a watermark/overlay, force it into every panel, or make the scene feel like a traditional ad. If exact reproduction isn't possible, leave the prop simple and add to that frame's BRAND/LOGO NOTES: "Place supplied brand logo here in post/prop."
|
|
176
177
|
|
|
178
|
+
**NEVER trust the image model to draw the mark.** Asked to render a logo from a text description, image models reliably FABRICATE it — a wrong wordmark, an invented icon, a mangled symbol — every time. So always **attach the actual `../_arca-marketing-assets/assets/logo.png` file to the `imageAI` node as a reference image** (Phase 8 already wires it in); never describe the logo and hope. If the model still can't reproduce it cleanly at the size shown, leave a plain unbranded prop and note "place supplied logo here in post" rather than shipping an invented mark.
|
|
179
|
+
|
|
177
180
|
## CLARIFICATION CHECKPOINT — ASK BEFORE GENERATING THE IMAGE
|
|
178
181
|
Before generating the production board (Phase 8), STOP and check whether anything material is still unclear or assumption-based. If so, ask a few focused questions and WAIT — don't generate the board on shaky assumptions, the image is expensive to redo. Ask when any of these are unresolved or guessed:
|
|
179
182
|
- chosen video type/style and overall look
|
|
@@ -215,7 +218,7 @@ Render the WHOLE board in the chosen video type's look (UGC by default — raw/i
|
|
|
215
218
|
|
|
216
219
|
### GENERATE THE BOARD (only after the user confirms in Step 8-ii)
|
|
217
220
|
Once the user says go, actually RENDER the board — don't just hand over the prompt. Default path is the **Wyren MCP** (consistent with `video-prompt`); load the `wyren` skill before any `mcp__wyren__*` call.
|
|
218
|
-
- **Model:** an image model that handles a multi-panel board + reference images — **Nanobanana Pro** (Gemini 3 Pro Image, up to 4K, ≤14 reference images). Pass `../_arca-marketing-assets/assets/logo.png` and `characters.png` as reference images so the persona + logo stay on-brand.
|
|
221
|
+
- **Model:** an image model that handles a multi-panel board + reference images — **Nanobanana Pro** (Gemini 3 Pro Image, up to 4K, ≤14 reference images). Pass `../_arca-marketing-assets/assets/logo.png` and `characters.png` as reference images so the persona + logo stay on-brand — the logo file is REQUIRED on the `imageAI` node, never a text description (the model fabricates a wrong mark otherwise).
|
|
219
222
|
- **Format:** landscape board, ~4:3 (e.g. ~1456×1088, or 2K/4K for legible per-cut labels).
|
|
220
223
|
- **Flow:** `build_graph` (`imageInput` logo.png + characters.png → `imageAI` Nanobanana Pro, the board prompt as a CONNECTED text edge — `customPrompt` alone fails validation) → `validate_workflow` → `get_pricing`/`estimate_product_cost` → get the user's OK to spend → `run_workflow` (`userConfirmed: true`) → poll `get_workflow_run_status` until terminal → `get_node_outputs` → present the rendered board.
|
|
221
224
|
- **Fallback:** if Wyren isn't connected, render with any available image tool; if none is available, output the finalized prompt and tell the user exactly which model to paste it into (Nanobanana Pro / Seedream / GPT-Image). Never silently stop at the prompt when the user asked to generate.
|
|
@@ -237,6 +240,11 @@ Output everything written as TEXT, separate from the board image. This carries t
|
|
|
237
240
|
- **Visual** — a RICH 2–3 sentence description (subject + action, key environment/props visible, where focus sits, lighting/look) — the same depth that prints as the board caption (`Shot spec — Visual`). Not a bare label.
|
|
238
241
|
- **Dialogue** — the actual spoken line for that beat, written out. Use "—" ONLY for true silent beats; do NOT leave blank to fill later. These exact lines get spoken (native audio nails scripted lines), so write them tight and in-character.
|
|
239
242
|
- **Direction** — performance / delivery note that drives the acting (e.g. "deliver softly, almost inspirational"; "rushed, glancing off-camera"; "deadpan, then a tiny smirk"). Always fill this — **it is the highest-value column and the reason this table is mandatory.**
|
|
243
|
+
|
|
244
|
+
**SCRIPT COHERENCE — the dialogue-only pass (mandatory before handoff).** Punchy individual lines do NOT add up to a coherent script. Choppy ≠ coherent. Before you ship the FLOW table, gate the Dialogue column on all three:
|
|
245
|
+
- **One message, one through-line.** The whole script serves ONE core message (the through-line set in Phase 4); every spoken line must ladder to it. Kill competing morals — if two lines push different takeaways (e.g. "AI does the work" vs "no, I do it" vs "AI took my waiting"), cut or rewrite until one wins. Three half-arguments read as noise.
|
|
246
|
+
- **Line-to-line logic.** Each line must *answer or escalate* the line before it. No non-sequiturs — a question must be answered by the next line, not sidestepped. No contradictions — a claim must not be undercut by the next line (e.g. "you give it context, it does the work" → "No, I do it" is a contradiction, not a beat).
|
|
247
|
+
- **The dialogue-only pass (the actual test).** Read ONLY the Dialogue column top-to-bottom, ignoring every visual, caption, and shot note. It must read as one coherent conversation: setup→answer, claim→no-contradiction, hook→payoff. If it doesn't, **rewrite the lines now, before any image/video generation.** Reordering clips downstream cannot repair broken dialogue logic — only rewriting the lines can. Do not hand off a script that fails this read.
|
|
240
248
|
- **VIDEO EDITOR NOTES** — what to add in the EDIT, not the frames: suggested on-screen caption per beat, cut/transition style, SFX + music, pacing, zoom punch-ins, where the logo / brand splash lands. (Feeds the `shorts-editor` skill.)
|
|
241
249
|
- **STYLE NOTES** — the look the final video must match: video type, lighting, camera feel, color, wardrobe/prop continuity, mood keywords. Reference the CHARACTERS + SHARED CHOICES sections for what must stay consistent. (Feeds `video-prompt`.)
|
|
242
250
|
- **BRAND / LOGO NOTES** — which frames the supplied logo appears in and how (subtle in-world prop), per Phase 7.
|
|
@@ -283,7 +291,7 @@ After the storyboard, write 2 practical social captions for the target audience.
|
|
|
283
291
|
5. 9-beat shot structure
|
|
284
292
|
6. Quality gate summary (type-appropriate)
|
|
285
293
|
7. Clarification checkpoint — confirm DIRECTION and ask any open questions (skip with one line if already clear and approved)
|
|
286
|
-
8. Board prompt + text breakdown (Step 8-i) — show the finalized image-gen prompt for the landscape board (shared choices, character reference, environment + floor plan, storyboard strip, lighting/mood) AND the text breakdown (video concept, shared choices, characters, environment, 6-column flow table, editor notes, style notes, brand/logo notes); state assumptions and refine with the user. No image yet.
|
|
294
|
+
8. Board prompt + text breakdown (Step 8-i) — show the finalized image-gen prompt for the landscape board (shared choices, character reference, environment + floor plan, storyboard strip, lighting/mood) AND the text breakdown (video concept, shared choices, characters, environment, 6-column flow table, editor notes, style notes, brand/logo notes); state assumptions and refine with the user. No image yet. Before moving to generate, run the **dialogue-only coherence pass** on the FLOW table (Phase 8B) — read just the Dialogue column top-to-bottom: one through-line, every line answers/escalates the last, no contradictions. Rewrite the lines (never just reorder clips) if it fails.
|
|
287
295
|
9. Confirm + GENERATE (Step 8-ii) — ask "generate the board now, or changes first?"; on an explicit go, render the board via Wyren imageAI (Nanobanana Pro, brand refs) and present it; offer one refine-and-regenerate pass if a panel drifts.
|
|
288
296
|
10. video-prompt handoff (Phase 8C) — the compact copy-paste block (type, look lock, characters+bibles, how to use the board, per-cut shot list with dialogue + delivery, edit note). ALWAYS output this, even if the board render was skipped or failed — it's built from the text, not the image.
|
|
289
297
|
11. Audience captions
|
|
@@ -128,7 +128,7 @@ This template runs alongside the Wyren MCP. Confirm settings with the user durin
|
|
|
128
128
|
- **Keep the face.** When upscaling/cleaning, instruct the model to PRESERVE the existing person's identity — same face, hair, age, build, wardrobe — and only improve quality (sharpen, denoise, fix artifacts). Don't let it redraw or beautify into a different face. Use an image-edit-capable model (Nanobanana / Nanobanana Pro accept image input), pass the panel + character profile (+ `characters.png`) as references, with a prompt like "enhance and clean this frame, keep the exact same face and person, do not change identity." This face-preserving upscale is what makes per-shot start frames consistent across a multishot/multi-clip video.
|
|
129
129
|
|
|
130
130
|
**Image models (category "image"):** Nanobanana (Gemini 2.5 Flash Image, 1K, image input, default), Nanobanana Pro (Gemini 3 Pro Image, up to 4K, up to 14 reference images — best for keeping persona/logo consistent), Imagen 4 Fast/Standard/Ultra (text-only, 1K–2K, no image input). Sizes: 1K/2K/4K. Aspect ratios: 1:1, 4:3, 9:16, 16:9.
|
|
131
|
-
**Default:** Nanobanana Pro at 2K (image input + multi-reference locks character + logo). Pass `characters.png` and `logo.png` as reference images.
|
|
131
|
+
**Default:** Nanobanana Pro at 2K (image input + multi-reference locks character + logo). Pass `characters.png` and `logo.png` as reference images — the `logo.png` file is REQUIRED whenever the mark is visible, never a text description (the model fabricates a wrong logo otherwise; see BRAND & LOGO RULES).
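
The model capabilities listed above can be checked before a render is queued. The sketch below is illustrative only: the `MODELS` table layout and the `check_request` helper are hypothetical names, not part of the Wyren MCP, and the capability values are copied from the model list in this section (the reference limit for plain Nanobanana is not stated, so it is left unset).

```python
# Capabilities as stated in the model list; the dict layout is an assumption.
MODELS = {
    "Nanobanana":     {"image_input": True,  "max_size": "1K", "max_refs": None},  # limit not stated
    "Nanobanana Pro": {"image_input": True,  "max_size": "4K", "max_refs": 14},
    "Imagen 4":       {"image_input": False, "max_size": "2K", "max_refs": None},  # text-only
}
SIZE_ORDER = ["1K", "2K", "4K"]


def check_request(model: str, size: str, references: list[str]) -> list[str]:
    """Return a list of problems with a render request (empty list = OK)."""
    caps = MODELS[model]
    problems = []
    if SIZE_ORDER.index(size) > SIZE_ORDER.index(caps["max_size"]):
        problems.append(f"{model} does not go up to {size}")
    if references and not caps["image_input"]:
        problems.append(f"{model} is text-only and cannot take reference images")
    elif caps["max_refs"] is not None and len(references) > caps["max_refs"]:
        problems.append(f"{model} accepts at most {caps['max_refs']} references")
    return problems


# The stated default: Nanobanana Pro at 2K with both reference files.
print(check_request("Nanobanana Pro", "2K", ["characters.png", "logo.png"]))
```

A text-only model with `logo.png` attached is reported as a problem, which matches the rule that the logo must be passed as a file rather than described in text.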
|
|
132
132
|
|
|
133
133
|
### Step B — generate the clips (video model, `videoAI` node)
|
|
134
134
|
Video models (category "video") and key knobs:
|
|
@@ -168,6 +168,11 @@ Any time the video is more than one shot — a multi-clip split (Part 1/Part 2)
|
|
|
168
168
|
|
|
169
169
|
If two characters recur, build a separate profile + bible for each, and keep both references wired into every shot where they appear.
|
|
170
170
|
|
|
171
|
+
**SECONDARY & BACKGROUND characters drift worst.** Per-shot generation only locks the START-FRAME subject; everyone else — the second person in a two-hander, recurring side characters, background extras — gets reinvented (face AND wardrobe) every shot. Don't rely on the bible text alone for them:
|
|
172
|
+
- **Pin each recurring secondary character's wardrobe explicitly** in every shot prompt, not just the lead's (e.g. "the man in the grey zip hoodie and black cap"). Vague secondary descriptions are where the model improvises a new person.
|
|
173
|
+
- **Chain each secondary character's best generated frame back in as a reference** for their later shots: once a shot renders them well, feed that frame to `imageAI` as a reference when designing their next start frame. The image input accepts multiple reference URLs, so wire BOTH characters' references in for any shot where both appear.
|
|
174
|
+
- **Push extras out of focus.** Keep background people incidental, turned away, blurred, or cropped — never ask the model to hold a face it doesn't need to. An extra the viewer can't study can't visibly drift.
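
The three rules above can be sketched as a small helper that assembles the prompt and reference list for one shot. This is a minimal sketch under assumptions: `Character` and `build_shot_inputs` are hypothetical names, and the returned values would still have to be passed to the `imageAI` node by whatever means the workflow uses.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    """Hypothetical record for one recurring character (not a Wyren type)."""
    name: str
    wardrobe: str  # e.g. "grey zip hoodie and black cap"
    reference_urls: list[str] = field(default_factory=list)


def build_shot_inputs(action: str, cast: list[Character]) -> tuple[str, list[str]]:
    """Return (prompt, reference_urls) for one shot.

    Every character in the shot gets an explicit wardrobe clause, and every
    character's references are wired in, not only the lead's.
    """
    wardrobe = "; ".join(f"{c.name} wearing {c.wardrobe}" for c in cast)
    prompt = (
        f"{action}. Cast: {wardrobe}. "
        "Background people out of focus, turned away, or cropped."
    )
    references: list[str] = []
    for c in cast:
        for url in c.reference_urls:
            if url not in references:  # keep order, drop duplicates
                references.append(url)
    return prompt, references


lead = Character("the woman", "navy blazer", ["lead_profile.png"])
side = Character("the man", "grey zip hoodie and black cap", ["side_best_frame.png"])
prompt, refs = build_shot_inputs("They review a laptop together", [lead, side])
print(prompt)
print(refs)
```

Once a later shot renders the secondary character well, appending that frame to `side.reference_urls` carries it into every subsequent shot built with the helper.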
|
|
175
|
+
|
|
171
176
|
## DEFAULT SPLIT RULE
|
|
172
177
|
If target duration is 20–30s: **Part 1 = Panels 1–5; Part 2 = Panels 6–9.** Each part feels like one continuous video. Part 2 continues the same character, wardrobe, props, lighting, setting, camera quality, and emotional energy — do not restart the story, do not recap. Never show the storyboard grid, panel numbers, production notes, borders, arrows, labels, or annotations. Convert panels into real-feeling vertical footage.
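
As a minimal sketch, the split rule can be written as a function. The name `split_panels` is hypothetical, and the behavior outside the 20–30s range (a single part) is an assumption, since the rule above only covers that range.

```python
def split_panels(duration_s: float, panel_count: int = 9) -> list[list[int]]:
    """Group storyboard panel numbers into parts for generation."""
    if 20 <= duration_s <= 30:
        # Stated rule: Part 1 = Panels 1-5, Part 2 = Panels 6-9.
        return [list(range(1, 6)), list(range(6, panel_count + 1))]
    # Assumption: durations outside 20-30s are kept as one part.
    return [list(range(1, panel_count + 1))]


print(split_panels(25))
```

Part 2 is only a grouping of panels; continuity of character, wardrobe, props, and lighting still has to be carried by the prompt and references for each part.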
|
|
173
178
|
|
|
@@ -267,6 +272,8 @@ Do not add generated overlay captions unless explicitly requested. Avoid: subtit
|
|
|
267
272
|
## BRAND & LOGO RULES
|
|
268
273
|
If a brand is included: use the supplied logo only if available, as a natural physical prop. Good placements: laptop sticker, tote bag, mug, notebook, badge, desk object, small office poster. Keep it subtle but recognizable, only where it naturally belongs. Do not: make it the focus, use it as an overlay/watermark, make it giant, force it into every shot, or invent a fake/distorted logo. If exact reproduction isn't possible, use a plain prop and avoid fake logo distortion. The video should feel like native content that happens to include the brand, not a brand ad.
|
|
269
274
|
|
|
275
|
+
**NEVER trust the image model to draw the mark.** From a text description alone, image models fabricate the logo — wrong wordmark, invented icon — every time. **Attach the actual `../_arca-marketing-assets/assets/logo.png` file to every `imageAI` node that should show the mark** (the start-frame stage is where the logo gets baked in). If the model still can't reproduce it cleanly, leave a plain unbranded prop and add the real logo later in the edit (`shorts-editor`) — never let the model invent one.
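
A pre-render check along these lines can catch a node that is meant to show the mark but has no logo file attached. This is a sketch under assumptions: the node dictionaries (`id`, `type`, `shows_logo`, `references`) are a hypothetical shape used for illustration, not the Wyren node schema.

```python
LOGO_PATH = "../_arca-marketing-assets/assets/logo.png"


def nodes_missing_logo(nodes: list[dict]) -> list[str]:
    """Return ids of imageAI nodes that should show the mark but lack logo.png."""
    missing = []
    for node in nodes:
        if node.get("type") != "imageAI" or not node.get("shows_logo"):
            continue  # only image nodes that display the mark need the file
        if LOGO_PATH not in node.get("references", []):
            missing.append(node["id"])
    return missing


# Hypothetical workflow: shot-2 shows the mark but only describes it in text.
workflow = [
    {"id": "shot-1", "type": "imageAI", "shows_logo": True,
     "references": ["characters.png", LOGO_PATH]},
    {"id": "shot-2", "type": "imageAI", "shows_logo": True,
     "references": ["characters.png"]},
    {"id": "clip-1", "type": "videoAI", "shows_logo": True, "references": []},
]
print(nodes_missing_logo(workflow))
```

Any id returned here should either get the file attached or be switched to a plain unbranded prop, with the real logo added later in the edit.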
|
|
276
|
+
|
|
270
277
|
## AUDIO DIRECTION
|
|
271
278
|
Generate native audio if supported; it should feel like real creator-shot social video. Use: natural room tone, phone-mic ambience, casual dialogue, natural VO if the storyboard calls for it, imperfect human delivery, tiny pauses, keyboard taps, chair squeaks, paper sounds, phone buzzes, footsteps, desk sounds, bag rustles, small reaction sounds, subtle whoosh/riser only when helpful, light music only if it supports pacing. Native to TikTok/Reels, not cinematic. Avoid: trailer/orchestral music, dramatic swells, glossy commercial music, overproduced sound design, fake epic SFX, booming risers, ad-like or perfect-studio VO. Dialogue casual, slightly imperfect, human. If the model can't generate clean dialogue, prioritize realistic visual storytelling and leave dialogue/captions for editing.
|
|
272
279
|
|