npm - arca-marketing-video - Versions diffs - 2.5.0 → 2.7.0 - Mend

arca-marketing-video 2.5.0 → 2.7.0


Files changed (3)

package/package.json +1 -1
package/skills/storyboard-prompt/SKILL.md +5 -1
package/skills/video-prompt/SKILL.md +88 -24

package/package.json

@@ -1,6 +1,6 @@
 {
   "name": "arca-marketing-video",
-  "version": "2.5.0",
+  "version": "2.7.0",
   "description": "Brand-driven short-form marketing content kit: four installable Claude Code skills (carousel generator, storyboard prompt, video prompt, shorts editor) plus a shared brand profile and assets. Run `npx arca-marketing-video` to install them into .claude/skills.",
   "keywords": [
     "skill",

package/skills/storyboard-prompt/SKILL.md

@@ -700,7 +700,11 @@ The grid:
   edge-to-edge or with a thin uniform gutter, so every frame crops out cleanly on its own.
 - Frames in beat order (frame 1 = top-left). Each frame is a different moment (the 9 beats below).
 - Render the frames in the chosen VIDEO TYPE's look (UGC phone-shot by default — raw, imperfect,
-  human per Phases 6 and 9; cinematic / animation / product film per the declared type).
+  human per Phases 6 and 9; cinematic / animation / product film per the declared type). For UGC, the
+  frames must read like stills grabbed from a phone video, NOT stock/AI photos — apply Phases 6 & 9
+  (real skin texture, non-model casting, mixed practical light, off-center handheld framing, lived-in
+  clutter); avoid even studio light, model-perfect symmetric faces, plastic skin, and staged centering.
+  (`video-prompt` reuses these frames as start frames, so the realism has to start here.)
 The frames must contain NO:
 - panel numbers, labels, titles, or letters

package/skills/video-prompt/SKILL.md

@@ -1,6 +1,6 @@
 ---
 name: video-prompt
-description: Use when turning a storyboard into a finished vertical short-form video of any type (UGC, cinematic / movie-trailer, animation, product film, etc.). Handles both photographic storyboards (clean and use as start frames) and schematic / annotated plans (do NOT upscale 1:1 — rebuild patterned to them using Wyren clips for live footage + HyperFrames for on-screen graphics). Drives the Wyren MCP — picks image/video models and resolutions, optimizes the TikTok first 5 seconds, and keeps character faces consistent across shots. Triggers on "make the video from this storyboard", "generate the short", "render the clips". Part of the arca-marketing-video kit.
+description: Use when turning a storyboard into a finished vertical short-form video of any type (UGC, cinematic / movie-trailer, animation, product film, etc.). Handles both photographic storyboards (clean and use as start frames) and schematic / annotated plans (do NOT upscale 1:1 — rebuild patterned to them with Wyren clips, with any on-screen UI/screens generated diegetically in the footage, never composited as text/UI overlays). Drives the Wyren MCP — picks image/video models and resolutions, forces phone-camera realism on the start frames (so they don't look like stock/AI photos), generates multi-angle coverage so the edit can cut fast like real UGC, optimizes the TikTok first 5 seconds, and keeps character faces consistent across shots. Triggers on "make the video from this storyboard", "generate the short", "render the clips". Part of the arca-marketing-video kit.
 ---
 # Video Prompt
@@ -29,9 +29,10 @@ Ask first. Only make smart, briefly stated assumptions for whatever is still mis
 Use the storyboard as the PLAN, not as footage to copy. First classify it (see STORYBOARD
 INTERPRETATION below): if the panels are photographic, clean them and use them as start frames; if
 they are schematic / annotated mockups (panel numbers, notes boxes, drawn phone bezels, mock UI), do
-NOT upscale or reproduce them 1:1 — rebuild the video patterned to them, using Wyren clips for the
-live footage and HyperFrames for the on-screen graphics. You may generate clips separately and merge
-them (Wyren or HyperFrames), or use video-model multishot — decide smartly.
+NOT upscale or reproduce them 1:1 — rebuild the video patterned to them with Wyren clips, realizing any
+on-screen UI / screens DIEGETICALLY inside the footage (never composited text/UI overlays — see DO NOT
+COMPOSITE TEXT/UI GRAPHICS). You may generate clips separately and merge them, or use video-model
+multishot — decide smartly.
 VIDEO TYPE SCOPE — read before applying the rules below.
 This skill makes ANY video type. The LOOK follows the chosen VIDEO TYPE; the TikTok retention rules
@@ -69,20 +70,33 @@ B) SCHEMATIC / ANNOTATED storyboard — each panel is a rough PLAN: a drawn phon
      - Generate the LIVE FOOTAGE with Wyren (people, faces, desk, reactions, real environment, camera
        move) using the recurring character profile for face consistency and a fresh start frame
        DESIGNED for that shot via image AI — not the schematic panel.
-     - Build the ON-SCREEN GRAPHICS with HyperFrames and composite them over the Wyren footage: UI
-       cards, data / deck mockups, split-desk labels, chips, checklists, route-map lines, captions,
-       progress arcs, a REC indicator, the logo — anything that is a graphic, not live action.
-     - Match each panel's layout and intent (what graphic sits where, what the person is doing), but
-       realize it as real footage + clean motion graphics in the chosen VIDEO TYPE's look.
+     - Realize any on-screen UI / data / screens the panel shows (a deck, a dashboard, sticky notes,
+       a checklist, a route map, a "REC" indicator) as DIEGETIC parts of the scene — i.e. generate
+       them INSIDE the Wyren clip as the real laptop/phone screen, real paper, or real props, framed
+       so the text is small/partial/blurred and need not render perfectly. Do NOT composite them as
+       floating graphics on top.
+     - Match each panel's layout and intent (what's on screen, what the person is doing), realized as
+       real in-scene footage in the chosen VIDEO TYPE's look.
 WHO MAKES WHAT (schematic path):
-- Wyren videoAI → live-action clips: people, faces, reactions, hands, environment, props, camera move.
+- Wyren videoAI → everything that appears in-frame: people, faces, reactions, hands, environment,
+  props, camera move, AND any on-screen UI / data / screens (generated diegetically in the clip).
 - Wyren imageAI → designed start frames and any photographic plates feeding the clips.
-- HyperFrames → all overlay graphics, captions, chips, UI / data mockups, transitions, logo, splash,
-  composited on top of the clips (the same graphics layer the `shorts-editor` skill uses).
-Keep diegetic vs overlay clear: a screen the actor really looks at can be a graphic comped onto the
-device; floating captions / chips are HyperFrames overlays. Never bake storyboard annotations into the
-video. If you are unsure which form the storyboard is, ask the user before generating.
+- HyperFrames → reserved for the EDIT stage (`shorts-editor`): spoken-word captions, brand splash /
+  end card, and timing of zooms / SFX. Nothing else.
+DO NOT COMPOSITE TEXT/UI GRAPHICS (hard rule):
+This skill (the video-generation stage) does NOT composite ANY overlay onto the footage. In particular,
+never recreate the storyboard's mock UI / data as floating overlays — no data/deck cards, dashboards,
+labels, callouts, checklists, route-map text, fake screens, or "REC"-style HUD text laid over the
+video. It looks fake and instantly kills the UGC feel. Any text or screen the viewer sees here must be
+DIEGETIC (filmed/generated as a real screen or prop inside the Wyren clip) or simply dropped — if a
+panel's mock UI can't be made diegetic, simplify or omit it.
+All overlays are decided later by the EDIT stage (`shorts-editor`): spoken-word captions, the brand
+splash, and its own engagement chips/graphics live there, on the editor's terms — not generated or
+composited in this stage.
+If you are unsure which form the storyboard is, ask the user before generating.
 CORE OUTPUT
@@ -113,8 +127,31 @@ Step A — produce a start frame per shot (image model, `imageAI` node):
   first frame.
 - SCHEMATIC storyboard (path B): do NOT upscale the panel. Use image AI to DESIGN a new start frame
   for that shot from the character profile + the panel's brief (scene, action, framing), so the frame
-  is real-looking footage, not a redraw of the mockup. The panel's mock UI becomes a HyperFrames
-  overlay later, not part of this start frame.
+  is real-looking footage, not a redraw of the mockup. Any on-screen UI the panel shows is realized
+  diegetically inside the clip (a real screen/prop), never composited as a text/graphic overlay.
+PHONE-CAMERA REALISM FOR START FRAMES (UGC types — non-negotiable)
+The #1 failure mode is a start frame that looks like a polished STOCK PHOTO or AI render (even soft
+studio light, model-perfect symmetric faces, plastic skin, centered staged composition, sterile clean
+room). The video model inherits the start frame's look, so a stock-looking frame produces a stock-
+looking clip. Fix it at the frame.
+- Apply the SAME proven realism rules this skill already specifies for the footage — PHONE-SHOT VISUAL
+  STYLE, CHARACTER REALISM, FRAMING, LIGHTING, and DEPTH OF FIELD (all below) — to the START FRAME
+  too. The frame is a paused still from that same casual phone video, NOT a separate photo shoot. In
+  particular reuse: natural skin texture / pores / asymmetry, non-model casting, candid mid-action
+  expression (CHARACTER REALISM); mixed practical light, uneven exposure, blown-out window, mild
+  noise (LIGHTING); off-center imperfect handheld framing (FRAMING); normal phone depth of field, no
+  creamy bokeh (DEPTH OF FIELD); real lived-in clutter (ENVIRONMENT).
+- Add explicit phone-camera tokens to the image prompt: "shot on an iPhone, candid vertical phone
+  snapshot, amateur, mild phone HDR, slight grain, imperfect framing."
+- Always include a negative prompt: stock photo, AI render, 3D render, glossy, airbrushed, retouched,
+  model, supermodel, perfect skin, symmetrical, studio lighting, softbox, beauty lighting, creamy
+  bokeh, shallow depth of field, hyperdetailed, 8k, cinematic, magazine, advertisement, clean, sterile,
+  staged.
+- Applies whether you upscale a photo (path A) or design a fresh frame (path B). If the source/look is
+  already stock-ish (the common failure), explicitly degrade it toward phone realism — grain, uneven
+  light, candid expression, off-center framing — BEFORE generating the clip. (NON-UGC types: follow
+  that type's craft instead — this phone-realism bar is for UGC.)
 - **Keep the face.** When upscaling/cleaning with the image model, instruct it to PRESERVE the
   existing person's identity — same face, hair, age, build, and wardrobe — and only improve quality
   (sharpen, denoise, fix artifacts). Do not let it redraw or beautify into a different face. Use an
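
Reviewer sketch (not part of the package): the PHONE-CAMERA REALISM rules added in the hunk above amount to appending a fixed set of phone-camera tokens and a fixed negative prompt to every UGC start-frame prompt. A minimal Python illustration follows. The function name `build_start_frame_prompt` and the dictionary keys `prompt` and `negative_prompt` are assumptions made for this example and are not taken from the Wyren `imageAI` node schema; only the token and negative-prompt wording comes from the diff.

```python
# Hypothetical helper: field names are illustrative, not the Wyren imageAI schema.
PHONE_TOKENS = (
    "shot on an iPhone, candid vertical phone snapshot, amateur, "
    "mild phone HDR, slight grain, imperfect framing"
)

# Negative-prompt terms copied from the added SKILL.md text.
NEGATIVE_TERMS = [
    "stock photo", "AI render", "3D render", "glossy", "airbrushed", "retouched",
    "model", "supermodel", "perfect skin", "symmetrical", "studio lighting",
    "softbox", "beauty lighting", "creamy bokeh", "shallow depth of field",
    "hyperdetailed", "8k", "cinematic", "magazine", "advertisement", "clean",
    "sterile", "staged",
]


def build_start_frame_prompt(shot_brief: str, video_type: str = "ugc") -> dict:
    """Return prompt fields for one start frame."""
    if video_type.lower() != "ugc":
        # Non-UGC types follow their own craft, so the brief is left untouched.
        return {"prompt": shot_brief, "negative_prompt": ""}
    return {
        "prompt": f"{shot_brief}, {PHONE_TOKENS}",
        "negative_prompt": ", ".join(NEGATIVE_TERMS),
    }
```

Keeping the tokens and negative terms as module-level constants means path A (upscaled photo) and path B (designed frame) would share one definition, which matches the diff's statement that the rule applies to both.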
@@ -153,12 +190,11 @@ Wyren execution flow (per the wyren skill's policy — load it before any `mcp__
 3. `validate_workflow` — resolve warnings with the user.
 4. Estimate cost with `get_pricing` (chain mode) / `estimate_product_cost`; get the user's OK to spend.
 5. `run_workflow` (`userConfirmed: true`), then poll `get_workflow_run_status` every 5s until terminal.
-6. Pull the clips with `get_node_outputs`.
-7. GRAPHICS PASS (HyperFrames) — build the on-screen overlay graphics the storyboard calls for (UI/data
-   cards, chips, captions, checklists, route maps, logo, transitions) and composite them over the Wyren
-   clips. For a SCHEMATIC storyboard this pass is required (the mock UI in the panels lives here, not in
-   the footage). Hand off to the `shorts-editor` skill, which owns this HyperFrames graphics + master step.
-8. Merge the clips + graphics into the final cut.
+6. Pull the clips with `get_node_outputs`. Any on-screen UI / data / screens were generated DIEGETICALLY
+   inside the clips (see DO NOT COMPOSITE TEXT/UI GRAPHICS) — there is no text-overlay pass here.
+7. EDIT & FINISH — hand the clips to the `shorts-editor` skill: fast-cut assembly, spoken-word CAPTIONS,
+   brand splash / end card, zoom/SFX timing, and master. That is the ONLY place HyperFrames is used, and
+   only for captions + splash + timing — never to composite text/UI graphics onto the footage.
 RECURRING CHARACTER CONSISTENCY (multishot / multi-clip)
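
Reviewer sketch (not part of the package): step 5 of the Wyren execution flow in the hunk above polls `get_workflow_run_status` every 5s until the run is terminal. The loop below shows one way that polling could look in Python. The status source is injected as a plain callable, so nothing here calls the real Wyren MCP; the terminal status names and the `max_polls` cap are assumptions, since the diff does not show the values the tool returns.

```python
import time
from typing import Callable

# Assumed terminal status names; the real values are not shown in this diff.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_until_terminal(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status every interval_s seconds until a terminal state appears."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"run not terminal after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in normal use the default `time.sleep` applies the 5-second interval from the skill text.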
@@ -343,6 +379,33 @@ Avoid:
 Every shot should answer:
 “Why would someone keep watching right now?”
+FAST CUTS & MULTI-ANGLE COVERAGE (the UGC editing engine)
+A single continuous clip per beat reads as a slow AI ad. Real engaging TikTok/Reels UGC is CUT FAST
+and constantly SWITCHES ANGLE and shot size. This operationalizes the "new beat every 2–4s" rule and
+the CAMERA RULES + TRANSITIONS sections below for GENERATION — and it overrides the one-shot-per-panel
+default in SHOT-BY-SHOT EXECUTION: a storyboard panel is a BEAT, usually several shots, not one clip.
+You cannot cut between angles you never generated.
+Rules:
+- Cut roughly every 1–2.5s. No talking shot holds longer than ~3s without a cut, angle change, or
+  reframe. The whole video should feel like many short shots, not a few long ones.
+- Cover each beat from MULTIPLE angles / shot sizes, then cut between them. Generate 2–3 variations per
+  beat drawn from the FRAMING + CAMERA move set: wide/establishing → medium close-up → tight close-up →
+  over-the-shoulder → desk-level POV → reaction insert. Vary framing on EVERY cut (never two same-size
+  shots back to back).
+- Use the video model's multishot where available (Kling V3 multiShot up to 6 shots) to get angle
+  changes inside one generation; otherwise generate several short clips per beat with different framings
+  and assemble them.
+- Intercut B-roll / insert shots between talking shots — hands on the keyboard, the screen, the paper
+  packet, a face reaction, an object. These inserts let you cut on every sentence and hide jumps.
+- Punctuate with the proven TRANSITIONS vocabulary (jump cuts, reaction cuts, quick whip-pans, match
+  cuts, camera repositioning) and snap zooms / handheld punch-ins — never slow dissolves or cinematic moves.
+- Keep continuity locked across the angle changes (same person/face, wardrobe, set, lighting) via the
+  character profile — fast cuts must not become a different person.
+- The actual cutting/assembly is finished in the editor (`shorts-editor` skill): generate enough angle
+  coverage here so that stage can cut fast. Hand it one long clip per beat and it cannot.
 PHONE-SHOT VISUAL STYLE
 SCOPE: this section (and the CAMERA / FRAMING / LIGHTING / DEPTH OF FIELD / CHARACTER REALISM /
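
Reviewer sketch (not part of the package): the FAST CUTS & MULTI-ANGLE COVERAGE rules added in the hunk above can be read as three checks on an ordered shot list: no shot held past about 3s, no two same-size shots back to back, and at least two shots per beat. The Python below is one possible way to check a planned shot list against them. The `Shot` type, its field names, and the shot-size labels are hypothetical, and the 3s limit is applied to every shot as a simplification, although the skill states it for talking shots.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    beat: int          # storyboard panel / beat index
    size: str          # e.g. "wide", "medium-close", "tight", "ots", "pov", "insert"
    duration_s: float  # planned clip length in seconds


def coverage_problems(
    shots: list[Shot], max_hold_s: float = 3.0, min_per_beat: int = 2
) -> list[str]:
    """Return readable violations of the fast-cut rules for an ordered shot list."""
    problems = []
    for i, shot in enumerate(shots):
        if shot.duration_s > max_hold_s:
            problems.append(f"shot {i}: held {shot.duration_s}s (> {max_hold_s}s)")
        # Framing must change on every cut.
        if i > 0 and shots[i - 1].size == shot.size:
            problems.append(f"shot {i}: same size '{shot.size}' as previous shot")
    per_beat: dict[int, int] = {}
    for shot in shots:
        per_beat[shot.beat] = per_beat.get(shot.beat, 0) + 1
    for beat, count in sorted(per_beat.items()):
        if count < min_per_beat:
            problems.append(f"beat {beat}: only {count} shot(s), need {min_per_beat}+")
    return problems
```

An empty result means the plan gives the edit stage enough coverage to cut fast; any entries point at beats that would otherwise arrive as one long clip.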
@@ -811,7 +874,8 @@ Avoid:
 SHOT-BY-SHOT EXECUTION
-For each storyboard panel:
+Each storyboard panel is a BEAT — realize it as several short shots from different angles, not one
+held clip (see FAST CUTS & MULTI-ANGLE COVERAGE). For each panel:
 - preserve the main action
 - preserve the emotional beat