@milenyumai/film-kit 1.4.0 → 1.4.2

This diff shows the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.

Files changed (20)

package/README.md +1 -0
package/build/lib/templates.js +160 -70
package/content/ARCHITECTURE.md +10 -4
package/content/MASTER.md +58 -26
package/content/RULES.md +27 -14
package/content/VOICE-DESIGN.md +248 -0
package/content/agents/prompt-engineer.md +58 -33
package/content/skills/audio-design/SKILL.md +140 -13
package/content/skills/coverage-system/SKILL.md +1 -1
package/content/skills/frame-chaining/SKILL.md +1 -1
package/content/skills/prompt-structure/SKILL.md +112 -46
package/content/skills/semantic-consistency/SKILL.md +94 -0
package/content/skills/spatial-blocking/SKILL.md +1 -0
package/content/skills/visual-modes/SKILL.md +12 -8
package/content/workflows/chain.md +5 -0
package/content/workflows/finish.md +8 -1
package/content/workflows/generate.md +86 -17
package/content/workflows/recover.md +10 -3
package/content/workflows/safety-check.md +41 -3
package/package.json +1 -1

package/README.md CHANGED

@@ -18,6 +18,7 @@ Film-Kit ships as a single repository with four npm packages:
   - native `.claude/agents/*`
   - cleanup of stale mode-specific Claude artifacts
 - Shared `spatial-blocking` skill for gaze, plane depth, light cohesion, compositing realism, and anti-miniature control.
+- Voice-design aware audio contract with project-level `voiceCast`, shot-level `Audio Plan`, and backward-compatible `Audio direction` blocks.
 - Aligned quality gates across Claude Code, Cursor, Copilot, and Antigravity.
 - Stronger Kling 3.0 and Kling multi-shot guidance, including practical route rules and hard caps.

package/build/lib/templates.js CHANGED

@@ -44,16 +44,18 @@ All rules, skills, and workflows are located under \`.agent/\`.
 ### Entry Points
 - **Master Rules:** \`.agent/MASTER.md\` — Complete production ruleset (690 lines)
 - **Architecture:** \`.agent/ARCHITECTURE.md\` — System map & quick reference
+- **Voice Design:** \`.agent/VOICE-DESIGN.md\` — Project-level \`voiceCast\` + shot-level \`audioPlan\`
 - **Model Profile:** \`.agent/model-profile.md\` — Active model rules and constraints
 - **Agent:** \`.agent/agents/prompt-engineer.md\` — Senior prompt engineer agent
-### Skills (8 modules)
+### Skills (9 modules)
 | Skill | Path | Priority |
 |-------|------|----------|
 | Safety Compliance | \`.agent/skills/safety-compliance/SKILL.md\` | P0 — ALWAYS |
 | Reference Locking | \`.agent/skills/reference-locking/SKILL.md\` | P1 — When refs provided |
 | Frame Chaining | \`.agent/skills/frame-chaining/SKILL.md\` | P2 — ALWAYS |
 | Spatial Blocking | \`.agent/skills/spatial-blocking/SKILL.md\` | P2 — Relational realism / gaze / depth |
+| Semantic Consistency | \`.agent/skills/semantic-consistency/SKILL.md\` | P2 — ALWAYS, visual_world + physics gate |
 | Coverage System | \`.agent/skills/coverage-system/SKILL.md\` | P2 — ALWAYS (mandatory) |
 | Visual Modes | \`.agent/skills/visual-modes/SKILL.md\` | P4 — ALWAYS |
 | Audio Design | \`.agent/skills/audio-design/SKILL.md\` | P4 — When dialogue/SFX |
@@ -71,9 +73,10 @@ All rules, skills, and workflows are located under \`.agent/\`.
 ## Goal
 When the user asks \`/generate\`, convert the scenario into:
 - \`${config.outputDir}/project-info.md\` — Characters, settings, arc mapping
-- \`${config.outputDir}/shot-plan.json\` — Single-agent plan + policy contract
+- \`${config.outputDir}/shot-plan.json\` — Single-agent plan + policy + \`voiceCast\` contract
 - \`${config.outputDir}/shots/SHOT01.md, SHOT02.md, ...\` — Production shot files (with coverage included)
 - \`${config.outputDir}/reports/SAFETY-REPORT.md\` — Safety gate result
+- \`${config.outputDir}/reports/SEMANTIC-REPORT.md\` — Semantic consistency gate result
 - \`${config.outputDir}/reports/DELIVERY-REPORT.md\` — Delivery gate result
 - \`${config.outputDir}/_index.md\` — Shot list with chain & status tracking
@@ -88,6 +91,7 @@ Each \`SHOTNN.md\` is a **single file** containing ALL shot details:
 - 🔗 Chain status (FIRST / CHAINED / CHAIN BREAK)
 - İLK FRAME (start frame image prompt — min 60 words)
 - SON FRAME (end frame image prompt — min 60 words)
+- AUDIO PLAN (machine-readable shot audio contract)
 - VİDEO (video prompt with audio direction — min 80 words)
 - COVERAGE SHOTS (2-3 coverage shots within same file — each min 60 words)
 - 🇹🇷 Turkish summary for each section
@@ -98,7 +102,9 @@ Each \`SHOTNN.md\` is a **single file** containing ALL shot details:
 - **Name Policy:** Visual prompts must stay anonymous. Dialogue naming follows \`shot-plan.json\` policy.
 - **AUTO-SAFETY:** Proactively reframe content that may trigger safety filters
 - **Frame Chaining:** Last frame of SHOT[N] = First frame of SHOT[N+1]
+- **Semantic Consistency:** \`shot-plan.json.visual_world\` is canonical for perspective, named camera movement strategy, shadow vector, scale, reflection, physics, and seed strategy
 - **Coverage Mandatory:** Every main shot includes 2-3 coverage sub-shots in same file
+- **Voice Design:** \`shot-plan.json\` keeps top-level \`voiceCast\`; every speaking VIDEO section keeps \`Audio Plan\`
 - **Music: NONE** by default (user must explicitly request)
 - **SLOW BURN:** 8-second duration, split actions into multiple shots
 `;
@@ -111,6 +117,7 @@ function buildCursorLegacyRules(config) {
 ## CRITICAL: SKILL LOADING PROTOCOL
 Before generating ANY prompts, read the appropriate skill files from .agent/skills/:
 Read .agent/model-profile.md first to apply model-specific rules.
+Read .agent/VOICE-DESIGN.md when dialogue, narrator VO, or reusable speaker identity exists.
 | Skill | Path | When |
 |-------|------|------|
@@ -118,6 +125,7 @@ Read .agent/model-profile.md first to apply model-specific rules.
 | Reference Locking | .agent/skills/reference-locking/SKILL.md | When refs provided |
 | Frame Chaining | .agent/skills/frame-chaining/SKILL.md | Multi-shot projects |
 | Spatial Blocking | .agent/skills/spatial-blocking/SKILL.md | Multi-subject / gaze / scale-critical shots |
+| Semantic Consistency | .agent/skills/semantic-consistency/SKILL.md | ALWAYS |
 | Coverage System | .agent/skills/coverage-system/SKILL.md | ALWAYS (mandatory) |
 | Visual Modes | .agent/skills/visual-modes/SKILL.md | All visual work |
 | Audio Design | .agent/skills/audio-design/SKILL.md | Dialogue/SFX needed |
@@ -133,8 +141,11 @@ Read .agent/model-profile.md first to apply model-specific rules.
 7. EVERY prompt must have an Avoid line. No exceptions.
 8. Coverage shots mandatory (2-3 per main shot, min 60 words each, included in same file).
 9. Frame chaining: Last frame of SHOT[N] = First frame of SHOT[N+1].
-10. ILK/İLK FRAME section must contain a code block even for chained shots.
-11. ONE FILE PER SHOT: Each SHOTNN.md contains main shot + all coverage shots.
+10. Semantic consistency: \`${config.outputDir}/shot-plan.json\` must include \`visual_world\`; prompts must align camera, named movement strategy, light/shadow vector, scale, reflections, physics, anatomy risk, and contextual logic.
+11. ILK/İLK FRAME section must contain a code block even for chained shots.
+12. Chained ILK/İLK FRAME code blocks must contain only: \`Use SHOT[prev]_END as exact first frame\`; any new visual prompt is a CHAIN BREAK.
+13. ONE FILE PER SHOT: Each SHOTNN.md contains main shot + all coverage shots.
+14. Keep top-level \`voiceCast\` in ${config.outputDir}/shot-plan.json and \`Audio Plan\` in every speaking VIDEO section.
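Reviewer sketch for new rule 12 (chained ILK/İLK FRAME blocks hold only the reuse line). This is a minimal illustration, not code from the package: `firstFrameBlock`, `checkChainedFirstFrame`, and the assumption that `[prev]` resolves to a numeric shot id such as `SHOT01_END` are all hypothetical.

```javascript
// Hypothetical helpers; nothing here ships with @milenyumai/film-kit.
const FENCE = "`".repeat(3); // built at runtime so this example stays fence-safe

// Return the trimmed body of the first fenced block after the ILK/İLK FRAME heading,
// or null when the heading or the fenced block is missing.
function firstFrameBlock(shotMarkdown) {
  const heading = /^###\s+(?:İLK|ILK) FRAME.*$/m.exec(shotMarkdown);
  if (!heading) return null;
  const rest = shotMarkdown.slice(heading.index + heading[0].length);
  const fence = new RegExp(FENCE + "[^\\n]*\\n([\\s\\S]*?)" + FENCE).exec(rest);
  return fence ? fence[1].trim() : null;
}

// Assumption: the reuse line names a numeric previous shot, optional trailing period.
const REUSE_LINE = /^Use SHOT\d+_END as exact first frame\.?$/;

// A chained shot passes only when the block is exactly the reuse line.
function checkChainedFirstFrame(shotMarkdown) {
  const block = firstFrameBlock(shotMarkdown);
  if (block === null) {
    return { ok: false, reason: "missing fenced code block" };
  }
  if (!REUSE_LINE.test(block)) {
    return { ok: false, reason: "extra prompt text: mark shot as CHAIN BREAK" };
  }
  return { ok: true, reason: "exact reuse" };
}
```

Any additional line inside the block fails the check, which mirrors the rule that a new visual prompt in a chained section has to be declared as a CHAIN BREAK.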
 ## WORKFLOWS
 - /generate → Read .agent/workflows/generate.md
@@ -164,11 +175,12 @@ alwaysApply: true
 ## Entry Point
 Read \`.agent/MASTER.md\` for complete production ruleset.
 Read \`.agent/ARCHITECTURE.md\` for system overview.
+Read \`.agent/VOICE-DESIGN.md\` for voice identity and shot audio contracts.
 Read \`.agent/model-profile.md\` for active model constraints.
 ## SKILL LOADING (MANDATORY)
 Before generating ANY prompts:
-1. ALWAYS load: safety-compliance, frame-chaining, coverage-system, prompt-structure, visual-modes
+1. ALWAYS load: safety-compliance, frame-chaining, semantic-consistency, coverage-system, prompt-structure, visual-modes
 2. Load for relational realism: spatial-blocking
 3. Load if refs provided: reference-locking
 4. Load if dialogue/SFX: audio-design
@@ -187,8 +199,12 @@ All skills at: \`.agent/skills/[name]/SKILL.md\`
 ## CRITICAL RULES
 - AUTO-ANONYMOUS: Replace ALL real names with physical descriptions
 - Dialogue naming follows \`${config.outputDir}/shot-plan.json\` policy
+- \`shot-plan.json\` stores top-level \`voiceCast\`
+- \`shot-plan.json\` stores top-level \`visual_world\` for camera/lens/camera-movement/light/shadow/scale/reflection/physics/seed strategy
+- Every speaking VIDEO section includes \`Audio Plan\`
 - AUTO-SAFETY: Proactively reframe sensitive content
 - Frame chaining: Last frame SHOT[N] = First frame SHOT[N+1]
+- Chained ILK/İLK FRAME code block contains only \`Use SHOT[prev]_END as exact first frame\`; any new visual prompt requires CHAIN BREAK
 - Coverage: 2-3 sub-shots per main shot (min 60 words each, in same file)
 - Avoid line: MANDATORY on every prompt
 - Music: NONE by default
@@ -216,8 +232,9 @@ Single-agent shot package generation. Claude Code may use the native \`prompt-en
 ## Mandatory Read Order
 1. \`.agent/model-profile.md\`
 2. \`.agent/MASTER.md\`
-3. \`.agent/ARCHITECTURE.md\`
-4. \`.agent/agents/prompt-engineer.md\`
+3. \`.agent/VOICE-DESIGN.md\`
+4. \`.agent/ARCHITECTURE.md\`
+5. \`.agent/agents/prompt-engineer.md\`
 5. \`.claude/CLAUDE.md\`
 6. relevant files under \`.claude/rules/\`
@@ -232,6 +249,7 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
 - \`reference-locking/SKILL.md\` — When refs provided (P1)
 - \`frame-chaining/SKILL.md\` — ALWAYS for multi-shot (P2)
 - \`spatial-blocking/SKILL.md\` — when gaze / scale / compositing realism matters (P2)
+- \`semantic-consistency/SKILL.md\` — ALWAYS, canonical \`visual_world\` + physics gate (P2)
 - \`coverage-system/SKILL.md\` — ALWAYS, mandatory (P2)
 - \`visual-modes/SKILL.md\` — ALWAYS (P4)
 - \`audio-design/SKILL.md\` — When dialogue/SFX (P4)
@@ -242,8 +260,10 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
 - Model: \`${config.model}\` (${getModelDisplayName(config.model)})
 - Kling preset: \`${config.klingPreset}\`
 - Create \`${config.outputDir}/project-info.md\`, \`${config.outputDir}/shot-plan.json\`, and \`${config.outputDir}/_index.md\`
+- Keep top-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\`
+- Keep top-level \`visual_world\` in \`${config.outputDir}/shot-plan.json\`
 - Write \`${config.outputDir}/shots/SHOTNN.md\` per shot; coverage stays in the same file
-- Refresh \`${config.outputDir}/reports/SAFETY-REPORT.md\` and \`${config.outputDir}/reports/DELIVERY-REPORT.md\` before \`/finish\`
+- Refresh \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/SEMANTIC-REPORT.md\`, and \`${config.outputDir}/reports/DELIVERY-REPORT.md\` before \`/finish\`
 ## Non-Negotiables
 1. **AUTO-ANONYMOUS:** Replace ALL real person names in visual prompts with physical descriptions.
@@ -254,11 +274,14 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
 6. **Music:** NONE by default.
 7. **Avoid line:** MANDATORY on every prompt (image, video, coverage).
 8. **Coverage:** 2-3 sub-shots within same SHOTNN.md file, min 70 words each.
-9. **ILK/İLK FRAME:** Always include a fenced code block, even when chained.
-10. **Quality Floor:** ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words.
-11. **Specificity Floor:** lens/framing, lighting, and foreground/midground/background action are mandatory.
-12. **Spatial Realism Floor:** eyeline target, plane map, shared light source, and contact/depth cues are mandatory when relational staging matters.
-13. **ONE FILE PER SHOT:** No separate coverage files.
+9. **Voice Design:** keep project-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and per-shot \`Audio Plan\` in each VIDEO section.
+10. **ILK/İLK FRAME:** Always include a fenced code block, even when chained.
+11. **Chained ILK/İLK FRAME:** code block contains only \`Use SHOT[prev]_END as exact first frame\`; any new visual prompt requires CHAIN BREAK.
+12. **Quality Floor:** ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words.
+13. **Specificity Floor:** lens/framing, lighting, and foreground/midground/background action are mandatory.
+14. **Spatial Realism Floor:** eyeline target, plane map, shared light source, and contact/depth cues are mandatory when relational staging matters.
+15. **Semantic Consistency Floor:** \`visual_world\`, perspective/geometry, shadow vector, scale map, reflections, gravity/contact physics, anatomy risk, foreground/background coherence, contextual contradictions, and targeted semantic avoid terms are mandatory.
+16. **ONE FILE PER SHOT:** No separate coverage files.
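Reviewer sketch for the Quality Floor line above (ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words). The function names and the `sections` object shape are assumptions for illustration; the diff does not show how the toolkit counts words, and other templates in this same file still cite 60-word minimums.

```javascript
// Hypothetical checker; thresholds are copied from the Quality Floor rule in this hunk.
const WORD_FLOORS = { ILK: 80, SON: 80, VIDEO: 120, COVERAGE: 70 };

// Count whitespace-separated tokens; an empty or blank string counts as zero words.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// sections: assumed shape { ILK: "...", SON: "...", VIDEO: "...", COVERAGE: "..." }.
// Returns one entry per section that falls below its floor.
function checkQualityFloor(sections) {
  const failures = [];
  for (const [name, text] of Object.entries(sections)) {
    const floor = WORD_FLOORS[name];
    if (floor === undefined) continue; // ignore sections without a floor
    const words = countWords(text);
    if (words < floor) failures.push({ name, words, floor });
  }
  return failures;
}
```

An empty result means every supplied section meets its floor. Whether the Avoid line counts toward the total is not stated in the diff, so this sketch simply counts everything it is given.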
 ## Workflows
 | Command | Workflow |
@@ -283,8 +306,9 @@ This workspace keeps high-level policy in \`CLAUDE.md\` and operational detail i
 ## Before Any Generation or Repair
 1. Read \`.agent/model-profile.md\`
 2. Read \`.agent/MASTER.md\`
-3. Read \`.agent/agents/prompt-engineer.md\`
-4. Read \`.agent/workflows/generate.md\` or the requested workflow
+3. Read \`.agent/VOICE-DESIGN.md\`
+4. Read \`.agent/agents/prompt-engineer.md\`
+5. Read \`.agent/workflows/generate.md\` or the requested workflow
 5. Apply AUTO-ANONYMOUS and AUTO-SAFETY before drafting
 6. Prefer the active/selected markdown file as scenario source; fallback is \`${config.scenarioHint}\`
 7. Do not mark work complete while any required report is missing or fail
@@ -293,8 +317,12 @@ This workspace keeps high-level policy in \`CLAUDE.md\` and operational detail i
 - Write only inside \`${config.outputDir}\`
 - Keep one file per shot: \`${config.outputDir}/shots/SHOTNN.md\`
 - Maintain \`${config.outputDir}/shot-plan.json\` dialogue naming policy
+- Maintain \`${config.outputDir}/shot-plan.json\` top-level \`voiceCast\`
+- Maintain \`${config.outputDir}/shot-plan.json\` top-level \`visual_world\`
+- Keep \`Audio Plan\` blocks aligned to \`voiceCast\`
 - Keep \`ILK/İLK FRAME\` in a fenced code block even when chained
 - Quality floor and specificity floor are hard gates, not suggestions
+- Semantic consistency floor is a hard gate: camera/lens/camera-movement/light/shadow/scale/reflection/physics/anatomy/context must align to \`visual_world\`
 - Apply \`.agent/skills/spatial-blocking/SKILL.md\` whenever eyeline, compositing, or depth realism is critical
 ## Debugging
@@ -313,20 +341,26 @@ Use the Film-Kit core runtime.
 ## Read First
 1. \`.agent/model-profile.md\`
 2. \`.agent/MASTER.md\`
-3. \`.agent/agents/prompt-engineer.md\`
-4. \`.agent/workflows/generate.md\`
-5. \`.claude/rules/output-contract.md\`
+3. \`.agent/VOICE-DESIGN.md\`
+4. \`.agent/agents/prompt-engineer.md\`
+5. \`.agent/workflows/generate.md\`
+6. \`.claude/rules/output-contract.md\`
 ## Responsibilities
 - draft and repair shot files under \`${config.outputDir}/shots/\`
 - apply \`${config.outputDir}/shot-plan.json\` dialogue naming policy
+- maintain top-level \`voiceCast\` inside \`${config.outputDir}/shot-plan.json\`
+- maintain top-level \`visual_world\` inside \`${config.outputDir}/shot-plan.json\`
+- keep \`Audio Plan\` blocks valid against \`voiceCast\`
 - enforce AUTO-ANONYMOUS, AUTO-SAFETY, chaining, and coverage contracts
 - enforce quality floor: ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70
 - enforce specificity floor: lens/framing, lighting, and foreground/midground/background action
 - enforce spatial realism: explicit eyeline target, plane map, shared light source, and contact/depth cues when needed
+- enforce semantic consistency: \`visual_world\`, perspective/geometry, shadow vector, scale map, reflection handling, physics/anatomy risk, foreground/background coherence, contextual contradictions, and scene-specific avoid terms
 ## Boundaries
 - do not skip safety or delivery reports
+- do not pass chained ILK/İLK FRAME blocks that contain anything besides exact reuse text
 - do not split coverage into separate files
 - if asked to review only, report issues instead of regenerating shots by default
 `;
@@ -349,6 +383,8 @@ If using the native Claude subagent, read \`.claude/agents/prompt-engineer.md\`
    - Determine shot count based on action beats
    - Create \`${config.outputDir}/project-info.md\`
    - Create \`${config.outputDir}/shot-plan.json\`
+   - Add top-level \`voiceCast\` before writing speaking shots
+   - Add top-level \`visual_world\` before writing visual prompts
 2. **Batch Strategy:**
    - 1-10 shots → Generate all at once
@@ -358,10 +394,13 @@ If using the native Claude subagent, read \`.claude/agents/prompt-engineer.md\`
 3. **Per Shot (SINGLE FILE: SHOTNN.md):**
    - Analyze scene type (Dialogue / Action / Emotional / Establishing)
    - Generate main shot (İLK FRAME + SON FRAME + VİDEO)
+   - Add machine-readable \`Audio Plan\` before every VIDEO section
    - Keep İLK FRAME as fenced code block even when chained
+   - If chained, keep İLK FRAME code block to exact reuse text only; new visual prompt means CHAIN BREAK
    - Enforce hard quality floor: ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70
    - Enforce specificity floor: lens/framing + lighting + foreground/midground/background action
    - Enforce spatial realism floor: eyeline target + plane map + shared light source + contact/depth cues when applicable
+   - Enforce semantic consistency floor: perspective/geometry + shadow vector + scale map + reflections + gravity/contact physics + anatomy risk + contextual contradiction check
    - Generate 2-3 coverage shots (in same file)
    - Write to \`${config.outputDir}/shots/SHOT[NN].md\`
    - Update \`${config.outputDir}/_index.md\`
@@ -369,6 +408,7 @@ If using the native Claude subagent, read \`.claude/agents/prompt-engineer.md\`
 4. **Validation Gates:**
    - Run /safety-check
    - Write \`${config.outputDir}/reports/SAFETY-REPORT.md\`
+   - Write \`${config.outputDir}/reports/SEMANTIC-REPORT.md\`
    - Write \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
    - If any gate fails, run \`.agent/workflows/recover.md\`
@@ -386,11 +426,13 @@ function buildClaudeRuleOutputContract(config) {
 ## Required Files
 - \`${config.outputDir}/project-info.md\` — Characters, settings, emotional arc mapping, tension levels
-- \`${config.outputDir}/shot-plan.json\` — Name policy, shot plan, and validation contract
+- \`${config.outputDir}/shot-plan.json\` — Name policy, shot plan, validation contract, top-level \`voiceCast\`, and top-level \`visual_world\`
 - \`.agent/model-profile.md\` — Active model constraints and presets
+- \`.agent/VOICE-DESIGN.md\` — Voice identity and shot audio contract
 - \`${config.outputDir}/_index.md\` — Shot tracking with chain & status
 - \`${config.outputDir}/shots/SHOT01.md ... SHOTNN.md\` — Individual shot files (one file per shot)
 - \`${config.outputDir}/reports/SAFETY-REPORT.md\` — Safety gate report
+- \`${config.outputDir}/reports/SEMANTIC-REPORT.md\` — Semantic consistency gate report
 - \`${config.outputDir}/reports/DELIVERY-REPORT.md\` — Delivery gate report
 ## Prompt Flow Order (MANDATORY)
@@ -403,7 +445,7 @@ The first sentence carries the most weight. Every prompt MUST follow this order:
 3. [Action] Micro-behavior, acting cue, physical gesture
 4. [Camera + Lens] "85mm f/2.0, shallow DOF, static with handheld micro-movement"
 5. [Light + Atmosphere] "Warm oil lamp key light from screen-left, deep shadows"
-6. [Audio direction block] (VIDEO prompts only)
+6. [Audio Plan + Audio direction block] (VIDEO prompts only)
 7. [Avoid line] (EVERY prompt — MANDATORY)
 \`\`\`
@@ -413,6 +455,7 @@ The first sentence carries the most weight. Every prompt MUST follow this order:
 - Kling preset: \`${config.klingPreset}\`
 - Kling transition mode: Start+End (when model is kling-3.0)
 - Motion timeline: first → then → finally (when model is kling-3.0)
+- Kling multi-shot mode: single-transition by default; custom storyboard only for 2-3 meaningful phases (when model is kling-3.0)
 ## Shot File Format (SHOTNN.md) — SINGLE FILE, ALL INCLUSIVE
@@ -428,10 +471,11 @@ FIRST SHOT / CHAINED from SHOT[prev]_END / CHAIN BREAK - Reason
 ## Main Shot
 ### İLK FRAME (SHOTNN_START)
-[If chained: "→ Use SHOT[prev]_END as first frame"]
+[If chained: the code block below must contain only "Use SHOT[prev]_END as exact first frame"]
 > NOTE: Even when chained, this section MUST contain a fenced code block.
-> If chained, include: "Use SHOT[prev]_END as exact first frame."
+> If chained, the fenced code block must contain only: "Use SHOT[prev]_END as exact first frame."
+> Any new visual prompt in a chained ILK FRAME section requires CHAIN BREAK.
 \\\`\\\`\\\`
 [Image prompt — min 60 words, following prompt flow order]
@@ -447,6 +491,12 @@ Avoid: blurry, low-res, noise, distorted faces, bad anatomy, extra limbs/fingers
 plastic skin, waxy skin, on-screen text, watermark, logo, cartoon style, CGI look.
 \\\`\\\`\\\`
+### AUDIO PLAN
+\\\`\\\`\\\`json
+[Machine-readable audioPlan JSON block aligned with shot-plan.json -> voiceCast]
+\\\`\\\`\\\`
 ### VİDEO
 \\\`\\\`\\\`
@@ -494,6 +544,10 @@ Audio direction:
 Avoid: distorted faces, morphing, blurry, flickering, unnatural motion, on-screen text.
 \\\`\\\`\\\`
+\\\`\\\`\\\`json
+[Coverage audioPlan JSON block - voice binding only when coverage contains speech]
+\\\`\\\`\\\`
 ### SHOT[NN]B — [Type] | [Duration]s | [Icon] [Label]
 [Same format as A]
@@ -545,6 +599,8 @@ Total: 1 main + [N] coverage = [N+1] production shots
 ### Name Rule
 - Visual prompts and non-dialogue fields: no real names
 - Dialogue transcript naming follows \`shot-plan.json.dialogue_name_policy\`
+- Every speaking shot must resolve \`activeSpeakerKey\` to \`shot-plan.json.voiceCast\`
+- Keep one active speaker per shot
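Reviewer sketch for the two Name Rule additions above. The diff names `activeSpeakerKey` and `voiceCast` but does not show their schema in this file, so the shapes used here (a `voiceCast` object keyed by speaker key, an `audioPlan` object with a single string `activeSpeakerKey`) and the function name `resolveSpeaker` are assumptions.

```javascript
// Hypothetical resolver; the real schema lives in VOICE-DESIGN.md, not shown in this hunk.
function resolveSpeaker(shotPlan, audioPlan) {
  const cast = shotPlan.voiceCast || {};
  const key = audioPlan.activeSpeakerKey;

  // Assumption: a missing key marks a non-speaking shot, which needs no voice binding.
  if (key === undefined || key === null) {
    return { ok: true, voice: null };
  }
  // "Keep one active speaker per shot": reject anything that is not a single string key.
  if (typeof key !== "string") {
    return { ok: false, reason: "activeSpeakerKey must be one string key" };
  }
  if (!Object.prototype.hasOwnProperty.call(cast, key)) {
    return { ok: false, reason: "unknown speaker key: " + key };
  }
  return { ok: true, voice: cast[key] };
}
```

A caller would run this once per shot and treat any `ok: false` result as a contract failure before writing the VIDEO section.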
 ### 30° Rule
 Coverage camera angles MUST differ from main shot by at least 30°.
@@ -563,6 +619,11 @@ Character gaze directions must be spatially consistent between cuts.
 - Keep one motivated light source across subjects.
 - Add contact / weight / support cues to avoid pasted composite look.
+### Semantic Consistency
+- \`shot-plan.json.visual_world\` is the canonical scene contract.
+- Prompts must agree with its aspect ratio, camera height, lens family, horizon line, vanishing strategy, camera movement strategy, light source, shadow direction, color temperature, scale map, reflection risk, physics constraints, and seed strategy.
+- Avoid contextual contradictions unless the prompt explicitly explains the unusual physics or style.
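Reviewer sketch for the `visual_world` contract listed above. The thirteen concepts come from the added bullet, but the snake_case key names and the function `missingVisualWorldFields` are hypothetical; the diff does not show the actual JSON keys.

```javascript
// Hypothetical key names, one per concept named in the Semantic Consistency bullet.
const VISUAL_WORLD_FIELDS = [
  "aspect_ratio",
  "camera_height",
  "lens_family",
  "horizon_line",
  "vanishing_strategy",
  "camera_movement_strategy",
  "light_source",
  "shadow_direction",
  "color_temperature",
  "scale_map",
  "reflection_risk",
  "physics_constraints",
  "seed_strategy",
];

// Return the fields that are absent or empty in shotPlan.visual_world.
function missingVisualWorldFields(shotPlan) {
  const world = shotPlan.visual_world;
  if (world === null || typeof world !== "object") {
    return VISUAL_WORLD_FIELDS.slice(); // no contract at all: everything is missing
  }
  return VISUAL_WORLD_FIELDS.filter((field) => {
    const value = world[field];
    return value === undefined || value === null || value === "";
  });
}
```

This only checks presence. Checking that each prompt actually agrees with the contract is the job of the semantic gate described in the diff and is out of scope for this sketch.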
 ### Dramaturgy (for dialogue scenes)
 Analyze per character: Objective → Obstacle → Stakes → Subtext → Beat turns.
 Embed as physical behavior in prompts, NOT as metadata.
@@ -590,6 +651,7 @@ function buildCopilotInstructions(config) {
 ### System Entry Point
 Read \`.agent/MASTER.md\` for the complete production ruleset.
 Read \`.agent/ARCHITECTURE.md\` for system overview.
+Read \`.agent/VOICE-DESIGN.md\` for voice identity and shot audio contracts.
 Read \`.agent/model-profile.md\` for active model rules.
 ### Skill Loading Protocol (MANDATORY)
@@ -598,10 +660,11 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
 2. \`reference-locking/SKILL.md\` — When reference images provided
 3. \`frame-chaining/SKILL.md\` — ALWAYS for multi-shot continuity
 4. \`spatial-blocking/SKILL.md\` — When gaze / depth / scale realism is critical
-5. \`coverage-system/SKILL.md\` — ALWAYS (mandatory coverage shots)
-6. \`visual-modes/SKILL.md\` — ALWAYS (Ultra Realism default)
-7. \`audio-design/SKILL.md\` — When dialogue or SFX needed
-8. \`prompt-structure/SKILL.md\` — ALWAYS (prompt templates)
+5. \`semantic-consistency/SKILL.md\` — ALWAYS (visual_world + semantic QA)
+6. \`coverage-system/SKILL.md\` — ALWAYS (mandatory coverage shots)
+7. \`visual-modes/SKILL.md\` — ALWAYS (Ultra Realism default)
+8. \`audio-design/SKILL.md\` — When dialogue or SFX needed
+9. \`prompt-structure/SKILL.md\` — ALWAYS (prompt templates)
 ### When User Asks /generate
 1. Read \`.agent/workflows/generate.md\` for the full procedure
@@ -610,17 +673,22 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
 4. Create index: \`${config.outputDir}/_index.md\`
 5. Create project info: \`${config.outputDir}/project-info.md\`
 6. Create plan: \`${config.outputDir}/shot-plan.json\`
-7. Write reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
+7. Keep top-level \`voiceCast\` in the plan and \`Audio Plan\` in speaking VIDEO sections
+8. Keep top-level \`visual_world\` in the plan for camera/lens/camera-movement/light/shadow/scale/reflection/physics/seed rules
+9. Write reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/SEMANTIC-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
 ### Critical Rules
 - **AUTO-ANONYMOUS:** Replace ALL real names with physical descriptions
 - **Name Policy:** Dialogue naming follows \`${config.outputDir}/shot-plan.json\` policy
 - **AUTO-SAFETY:** Proactively reframe sensitive content
 - **Frame Chaining:** Last frame of SHOT[N] = First frame of SHOT[N+1]
+- **Chain Hardening:** chained ILK/İLK FRAME code block contains only \`Use SHOT[prev]_END as exact first frame\`
 - **Coverage:** 2-3 sub-shots per main shot (in same file, min 60 words each)
 - **Spatial Realism:** eyeline targets, shared light, depth scale, and anti-cutout staging must agree when subjects share frame
+- **Semantic Consistency:** \`visual_world\` controls perspective/geometry, shadow vector, scale map, reflections, physics, anatomy risk, background coherence, and contextual contradictions
 - **Avoid Line:** MANDATORY on every prompt
 - **Music:** NONE by default
+- **Voice Design:** keep \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and \`Audio Plan\` in speaking VIDEO sections
 - **Duration:** 8s default, slow burn pacing
 - **Language:** Prompts in English, dialogue preserved
 - **ILK/İLK FRAME:** keep fenced code block even when chained
@@ -650,9 +718,10 @@ When request is /generate, follow the Film-Kit Hollywood production system:
 3. Load required skills from \`.agent/skills/\`
 4. Transform scenario into production shot package at \`${config.outputDir}\`
 5. Generate: project-info.md, shot-plan.json, _index.md, shots/SHOT01.md..SHOTNN.md
-6. Each SHOTNN.md: İLK FRAME + SON FRAME + VİDEO + 2-3 Coverage (ALL IN ONE FILE)
-7. Enforce: auto-anonymous, dialogue name policy, auto-safety, frame chaining, avoid lines
-8. Write reports to \`${config.outputDir}/reports/\` before /finish
+6. Keep top-level \`voiceCast\` and \`visual_world\` in shot-plan.json
+7. Each SHOTNN.md: İLK FRAME + SON FRAME + AUDIO PLAN + VİDEO + 2-3 Coverage (ALL IN ONE FILE)
+8. Enforce: auto-anonymous, dialogue name policy, auto-safety, frame chaining, semantic consistency, avoid lines
+9. Write reports to \`${config.outputDir}/reports/\` before /finish
 `;
 }
 /* ---------- ANTIGRAVITY ---------- */
@@ -667,6 +736,7 @@ description: Hollywood-standard cinematic prompt engineering with model-aware pr
 ## System Architecture
 This skill is part of the Film-Kit prompt engineering system.
 Read \`.agent/MASTER.md\` for the complete production ruleset (690+ rules).
+Read \`.agent/VOICE-DESIGN.md\` for project-level \`voiceCast\` and shot-level \`audioPlan\`.
 Read \`.agent/model-profile.md\` first for active model constraints.
 ## Skill Loading Protocol
@@ -675,10 +745,11 @@ Before generating ANY prompts, read these skills:
 2. \`.agent/skills/reference-locking/SKILL.md\` — When refs provided
 3. \`.agent/skills/frame-chaining/SKILL.md\` — ALWAYS
 4. \`.agent/skills/spatial-blocking/SKILL.md\` — When gaze / depth / scale realism is critical
-5. \`.agent/skills/coverage-system/SKILL.md\` — ALWAYS (mandatory)
-6. \`.agent/skills/visual-modes/SKILL.md\` — ALWAYS
-7. \`.agent/skills/audio-design/SKILL.md\` — When dialogue/SFX
-8. \`.agent/skills/prompt-structure/SKILL.md\` — ALWAYS
+5. \`.agent/skills/semantic-consistency/SKILL.md\` — ALWAYS (visual_world + semantic QA)
+6. \`.agent/skills/coverage-system/SKILL.md\` — ALWAYS (mandatory)
+7. \`.agent/skills/visual-modes/SKILL.md\` — ALWAYS
+8. \`.agent/skills/audio-design/SKILL.md\` — When dialogue/SFX
+9. \`.agent/skills/prompt-structure/SKILL.md\` — ALWAYS
 ## Workflows
 | Command | Workflow |
@@ -700,7 +771,9 @@ Before generating ANY prompts, read these skills:
 - Shot index: \`${config.outputDir}/_index.md\`
 - Project info: \`${config.outputDir}/project-info.md\`
 - Plan: \`${config.outputDir}/shot-plan.json\`
-- Reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
+- Voice contract: top-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\`
+- Semantic contract: top-level \`visual_world\` in \`${config.outputDir}/shot-plan.json\`
+- Reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/SEMANTIC-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
 ## Critical Rules
 1. **AUTO-ANONYMOUS:** Replace ALL real person names with physical descriptions
@@ -709,12 +782,15 @@ Before generating ANY prompts, read these skills:
 4. **Frame Chaining:** Last frame of SHOT[N] becomes first frame of SHOT[N+1]
 5. **Coverage Mandatory:** 2-3 sub-shots per main shot (in same file, min 60 words each)
 6. **Avoid Line:** MANDATORY on every prompt (image + video + coverage)
-7. **Music: NONE** by default
-8. **Ultra Realism** default visual mode
-9. **8s duration** default, slow burn pacing
-10. **ILK/İLK FRAME:** always keep fenced code block
-11. **ONE FILE PER SHOT:** SHOTNN.md contains main shot + all coverage
-12. **Relational Realism:** preserve eyeline targets, shared light, depth scale, and anti-cutout staging when multiple subjects share frame
+7. **Voice Design:** keep \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and \`Audio Plan\` in every speaking VIDEO section
+8. **Music: NONE** by default
+9. **Ultra Realism** default visual mode
+10. **8s duration** default, slow burn pacing
+11. **ILK/İLK FRAME:** always keep fenced code block
+12. **Chained ILK/İLK FRAME:** code block contains only \`Use SHOT[prev]_END as exact first frame\`; any new visual prompt is CHAIN BREAK
+13. **ONE FILE PER SHOT:** SHOTNN.md contains main shot + all coverage
+14. **Relational Realism:** preserve eyeline targets, shared light, depth scale, and anti-cutout staging when multiple subjects share frame
+15. **Semantic Consistency:** preserve \`visual_world\` perspective, shadow vector, scale map, reflections, gravity/contact physics, anatomy risk, foreground/background coherence, and contextual logic
 ## Quality Floor (Hard Gate)
 Reject and regenerate any shot that fails:
@@ -727,6 +803,7 @@ Reject and regenerate any shot that fails:
 - missing explicit foreground/midground/background action details
 - missing explicit eyeline target or \`not camera\` instruction when gaze matters
 - missing explicit shared light source / depth / contact cues in multi-subject shots
+- missing semantic consistency anchors: perspective/geometry, shadow vector, scale map, reflection handling, gravity/contact physics, anatomy risk, foreground/background coherence, contextual contradiction check
 ## Reject Weak Prompt Style
 Do not accept generic filler language:
@@ -829,7 +906,7 @@ The more aligned these are, the cleaner the transition:
 - Hand pose and finger count should be similar in both frames
 - Avoid end frames with extreme mouth positions if speech is not intended
-**Loop shortcut:** Set Start = End (same image). Prompt: "seamless loop" + simple camera movement (e.g., roll 360, slow push-in).
+**Loop shortcut:** Set Start = End (same image). Prompt: "seamless loop" + simple camera movement (e.g., roll 360, Dolly In).
 ### Transformation Budget
@@ -843,6 +920,16 @@ Limit the amount of change per clip for realism:
 > If end frame is too different, the model must "invent miracles" → plastic/melting/warp risk rises.
+### Kling Storyboard / Multi-Shot Decision Tree
+The official Kling app surface exposes custom storyboard-like shot prompting and optional end-frame anchoring.
+Use it as a selective upgrade, not a default:
+- **single-transition**: default mode for one clean action arc, reveal, or emotional turn
+- **custom-storyboard**: only when one clip truly needs **2-3 editorially distinct phases**
+- **hard cap in this toolkit:** 3 custom storyboard shots per video generation
+- **never storyboard micro-beats:** one glance, one finger move, one tiny prop touch, one breath shift
+- if the beat needs 4+ phases, split into chained videos instead of overloading one Kling prompt
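Editorial sketch (not part of the diff): the decision tree added above can be expressed as a small function. This is a minimal illustration only. The function name `chooseKlingMode`, its parameters, and the returned strings are hypothetical and do not come from film-kit. Only the 3-shot cap and the mode names `single-transition` and `custom-storyboard` are taken from the rules themselves.

```javascript
// Hypothetical helper; names and return values are illustrative, not film-kit API.
const MAX_STORYBOARD_SHOTS = 3; // hard cap stated in the toolkit rules

function chooseKlingMode(distinctPhases, isMicroBeat = false) {
  if (!Number.isInteger(distinctPhases) || distinctPhases < 1) {
    throw new RangeError("distinctPhases must be a positive integer");
  }
  // Micro-beats (one glance, one breath shift) are never storyboarded.
  if (isMicroBeat || distinctPhases === 1) return "single-transition";
  // 2-3 editorially distinct phases may use the custom storyboard.
  if (distinctPhases <= MAX_STORYBOARD_SHOTS) return "custom-storyboard";
  // 4+ phases: split into chained videos instead of one overloaded prompt.
  return "split-into-chained-videos";
}
```

For example, `chooseKlingMode(2)` selects the storyboard route, while `chooseKlingMode(2, true)` falls back to a single transition because micro-beats are excluded.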
 ### Golden Prompt Skeleton (Start+End)
 The prompt's job is to tell the model **how to generate the in-between frames**:
@@ -874,10 +961,11 @@ These prevent the model from taking shortcuts.
 More complex camera = more warp risk.
 **Safest commands (highest success rate):**
-- slow push-in / pull-back
-- pan left/right
-- tilt up/down
-- gentle handheld micro-sway
+- Dolly In / Dolly Out
+- Pan Left / Pan Right
+- Tilt Up / Tilt Down
+- Tracking Shot or Steadicam Movement for smooth follow
+- Handheld Movement with gentle micro-sway
 - roll 360 (especially for loops)
 **Stabilization trick:** Writing "tripod-locked" reduces background jitter.
@@ -984,26 +1072,22 @@ When using a reference image as the start frame, the model extracts lighting and
 ### Multi-Shot Protocol (Multiple Shots in a Single Generation)
-Kling 3.0 can manage **2 to 6 shots** in a single 15-second generation.
+Treat Kling multi-shot as **storyboarded internal progression**, not as a license to over-cut.
 **Rules:**
-1. **Shot Prefix:** Each shot MUST begin with \`Shot X,\` (e.g., \`Shot 1,\`, \`Shot 2,\`)
-2. **Character Continuity:** Repeat physical descriptions at the start of each shot, or use **Element Binding** (see below)
-3. **Time Distribution:** 15 seconds must be logically divided (e.g., Shot 1: 3s, Shot 2: 7s, Shot 3: 5s)
-4. **Maximum:** 6 shots per generation; for more, use separate generations with chaining
-Official examples favor short, concrete camera-and-action openings. Start with the movement path or subject action, then lock the stable identity/background constraints.
-**Example Multi-Shot Prompt:**
-\`\`\`
-Shot 1, Close-up of a bearded man in his 40s, wearing a dusty military uniform. He stares ahead, jaw clenched. Tripod-locked, 85mm, shallow DOF. (3s)
-Shot 2, Medium shot, same bearded man turns toward a younger soldier standing behind him. Slow pan right to reveal the young soldier. Natural handheld micro-sway. (5s)
+1. Default to **single-transition** whenever one camera path can carry the beat.
+2. Use custom storyboard only for **2-3 distinct internal phases** with different framing/action purpose.
+3. Keep each storyboard phase short, clear, and concrete: what changed, what stayed locked, why the cut exists.
+4. Hard cap for this toolkit: **3 storyboard shots per generation** in app.kling-oriented workflows.
+5. If the sequence wants 4+ phases, split into multiple chained generations.
+6. Never assign separate storyboard shots to micro-actions that would read better inside one stronger shot.
-Shot 3, Over-the-shoulder from the younger soldier's POV, the bearded man speaks directly to camera. Gentle push-in. (4s)
+**Preferred storyboard jobs:**
+- Shot 1: establish or setup
+- Shot 2: main action / reveal / shift
+- Shot 3: reaction / settle / handoff
-Shot 4, Wide shot establishing the artillery position, both soldiers visible. Crane rise. (3s)
-\`\`\`
+Official Kling surfaces favor short, concrete shot descriptions. Start from the action path or camera intention, then lock continuity anchors (identity, wardrobe, background geometry, stable light).
 ### Element Binding (Character/Object Consistency)
@@ -1019,17 +1103,23 @@ Element Binding is Kling 3.0's built-in technology for maintaining character and
 - For **multi-shot** sequences: Prefer Element Binding when available
 - **Fallback:** If Element Binding is not available in your interface, manually repeat: character age, distinctive features, costume, and key proportions at each shot's start
-### Advanced Camera Vocabulary (Kling vCoT Triggers)
+### Advanced Camera Vocabulary (24-Move Cinematic Lexicon)
+These professional terms activate Kling's "Visual Chain-of-Thought" (vCoT) for more precise results. Use one named movement per shot unless a motivated compound move is required:
-These professional terms activate Kling's "Visual Chain-of-Thought" (vCoT) for more precise results:
+| Group | Movements |
+|-------|-----------|
+| **Physical push/pull** | Dolly In, Dolly Out |
+| **Locked head rotation** | Pan Left, Pan Right, Tilt Up, Tilt Down |
+| **Physical lateral/vertical travel** | Truck Left, Truck Right, Pedestal Up, Pedestal Down |
+| **Arc/parallax** | Arc Left, Arc Right, Tracking Shot, Leading Shot, Following Shot |
+| **Dynamic stabilization** | Whip Pan, Handheld Movement, Steadicam Movement |
+| **Angle/subjective** | Canted Angle (Dutch Angle), Point of View (POV) |
+| **Optical/composite** | Zoom In, Zoom Out, Dolly Zoom (Vertigo Effect), Crane/Jib Shot |
-| Category | Terms |
-|----------|-------|
-| **Angles** | Low-angle hero shot, Dutch angle (tilted horizon), POV (subjective), Bird's-eye view (top-down) |
-| **Movements** | Dolly push-in, Orbit (360° rotation), Lateral pan, Tracking, Spiral up |
-| **Hybrid** | Dolly Zoom (Vertigo effect — zoom in while pulling back), Move Left and Zoom In (simultaneous) |
+Aliases: Dolly push-in = Dolly In, Dolly pull-out = Dolly Out, Orbit = Arc Left/Right, Lateral slide = Truck Left/Right, Crane rise/descend = Crane/Jib Shot. Rack focus is a focus move, not camera travel.
-> **Tip:** Hybrid movements like Dolly Zoom trigger stronger vCoT processing and produce more cinematic results, but increase warp risk. Use with wider CFG (0.50-0.60).
+> **Tip:** Hybrid movements like Dolly Zoom, Truck plus Pan, or Crane/Jib plus Arc trigger stronger vCoT processing and produce more cinematic results, but increase warp risk. Use with wider CFG (0.50-0.60).
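Editorial sketch (not part of the diff): the alias list in this section maps older phrasings onto the named lexicon, which could be applied mechanically when cleaning up existing prompts. The lookup below is an assumption about how one might do that. `normalizeCameraTerm` and both maps are hypothetical names, and the `direction` argument is an invented convention for aliases such as Orbit that resolve to a Left/Right pair.

```javascript
// Hypothetical lookup built from the alias list; not part of film-kit's API.
const FIXED_ALIASES = new Map([
  ["dolly push-in", "Dolly In"],
  ["dolly pull-out", "Dolly Out"],
  ["crane rise", "Crane/Jib Shot"],
  ["crane descend", "Crane/Jib Shot"],
]);

// Aliases that need an explicit direction before they resolve.
const DIRECTIONAL_ALIASES = new Map([
  ["orbit", "Arc"],
  ["lateral slide", "Truck"],
]);

function normalizeCameraTerm(term, direction) {
  const key = term.trim().toLowerCase();
  if (FIXED_ALIASES.has(key)) return FIXED_ALIASES.get(key);
  if (DIRECTIONAL_ALIASES.has(key)) {
    if (direction !== "Left" && direction !== "Right") {
      throw new Error(`"${term}" needs a direction of "Left" or "Right"`);
    }
    return `${DIRECTIONAL_ALIASES.get(key)} ${direction}`;
  }
  // Terms that are not aliases pass through unchanged.
  return term.trim();
}
```

Rack focus is deliberately absent from both maps, since the source classifies it as a focus move rather than camera travel.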
 ### Native Audio & Dialogue (Kling-Specific)