npm - @milenyumai/film-kit - Versions diffs - 1.3.0 → 1.4.1 - Mend

@milenyumai/film-kit 1.3.0 → 1.4.1


Files changed (15)

package/README.md +2 -1
package/build/lib/templates.js +89 -43
package/content/ARCHITECTURE.md +4 -1
package/content/MASTER.md +48 -19
package/content/RULES.md +19 -8
package/content/VOICE-DESIGN.md +248 -0
package/content/agents/prompt-engineer.md +50 -27
package/content/skills/audio-design/SKILL.md +140 -13
package/content/skills/prompt-structure/SKILL.md +22 -3
package/content/workflows/chain.md +2 -0
package/content/workflows/finish.md +3 -0
package/content/workflows/generate.md +59 -11
package/content/workflows/recover.md +5 -3
package/content/workflows/safety-check.md +4 -0
package/package.json +1 -1

package/README.md CHANGED

@@ -18,6 +18,7 @@ Film-Kit ships as a single repository with four npm packages:
   - native `.claude/agents/*`
   - cleanup of stale mode-specific Claude artifacts
 - Shared `spatial-blocking` skill for gaze, plane depth, light cohesion, compositing realism, and anti-miniature control.
+- Voice-design aware audio contract with project-level `voiceCast`, shot-level `Audio Plan`, and backward-compatible `Audio direction` blocks.
 - Aligned quality gates across Claude Code, Cursor, Copilot, and Antigravity.
 - Stronger Kling 3.0 and Kling multi-shot guidance, including practical route rules and hard caps.
@@ -73,7 +74,7 @@ Best for:
 Key contracts:
 - `team-plan.json`
-- specialist validators
+- 8 specialist validators in 3-phase execution (continuity -> parallel quality checks -> delivery)
 - parallel batch generation
 - spatial contract for multi-subject staging

package/build/lib/templates.js CHANGED

@@ -44,6 +44,7 @@ All rules, skills, and workflows are located under \`.agent/\`.
 ### Entry Points
 - **Master Rules:** \`.agent/MASTER.md\` — Complete production ruleset (690 lines)
 - **Architecture:** \`.agent/ARCHITECTURE.md\` — System map & quick reference
+- **Voice Design:** \`.agent/VOICE-DESIGN.md\` — Project-level \`voiceCast\` + shot-level \`audioPlan\`
 - **Model Profile:** \`.agent/model-profile.md\` — Active model rules and constraints
 - **Agent:** \`.agent/agents/prompt-engineer.md\` — Senior prompt engineer agent
@@ -71,7 +72,7 @@ All rules, skills, and workflows are located under \`.agent/\`.
 ## Goal
 When the user asks \`/generate\`, convert the scenario into:
 - \`${config.outputDir}/project-info.md\` — Characters, settings, arc mapping
-- \`${config.outputDir}/shot-plan.json\` — Single-agent plan + policy contract
+- \`${config.outputDir}/shot-plan.json\` — Single-agent plan + policy + \`voiceCast\` contract
 - \`${config.outputDir}/shots/SHOT01.md, SHOT02.md, ...\` — Production shot files (with coverage included)
 - \`${config.outputDir}/reports/SAFETY-REPORT.md\` — Safety gate result
 - \`${config.outputDir}/reports/DELIVERY-REPORT.md\` — Delivery gate result
@@ -88,6 +89,7 @@ Each \`SHOTNN.md\` is a **single file** containing ALL shot details:
 - 🔗 Chain status (FIRST / CHAINED / CHAIN BREAK)
 - İLK FRAME (start frame image prompt — min 60 words)
 - SON FRAME (end frame image prompt — min 60 words)
+- AUDIO PLAN (machine-readable shot audio contract)
 - VİDEO (video prompt with audio direction — min 80 words)
 - COVERAGE SHOTS (2-3 coverage shots within same file — each min 60 words)
 - 🇹🇷 Turkish summary for each section
@@ -99,6 +101,7 @@ Each \`SHOTNN.md\` is a **single file** containing ALL shot details:
 - **AUTO-SAFETY:** Proactively reframe content that may trigger safety filters
 - **Frame Chaining:** Last frame of SHOT[N] = First frame of SHOT[N+1]
 - **Coverage Mandatory:** Every main shot includes 2-3 coverage sub-shots in same file
+- **Voice Design:** \`shot-plan.json\` keeps top-level \`voiceCast\`; every speaking VIDEO section keeps \`Audio Plan\`
 - **Music: NONE** by default (user must explicitly request)
 - **SLOW BURN:** 8-second duration, split actions into multiple shots
 `;
@@ -111,6 +114,7 @@ function buildCursorLegacyRules(config) {
 ## CRITICAL: SKILL LOADING PROTOCOL
 Before generating ANY prompts, read the appropriate skill files from .agent/skills/:
 Read .agent/model-profile.md first to apply model-specific rules.
+Read .agent/VOICE-DESIGN.md when dialogue, narrator VO, or reusable speaker identity exists.
 | Skill | Path | When |
 |-------|------|------|
@@ -135,6 +139,7 @@ Read .agent/model-profile.md first to apply model-specific rules.
 9. Frame chaining: Last frame of SHOT[N] = First frame of SHOT[N+1].
 10. ILK/İLK FRAME section must contain a code block even for chained shots.
 11. ONE FILE PER SHOT: Each SHOTNN.md contains main shot + all coverage shots.
+12. Keep top-level \`voiceCast\` in ${config.outputDir}/shot-plan.json and \`Audio Plan\` in every speaking VIDEO section.
 ## WORKFLOWS
 - /generate → Read .agent/workflows/generate.md
@@ -164,6 +169,7 @@ alwaysApply: true
 ## Entry Point
 Read \`.agent/MASTER.md\` for complete production ruleset.
 Read \`.agent/ARCHITECTURE.md\` for system overview.
+Read \`.agent/VOICE-DESIGN.md\` for voice identity and shot audio contracts.
 Read \`.agent/model-profile.md\` for active model constraints.
 ## SKILL LOADING (MANDATORY)
@@ -187,6 +193,8 @@ All skills at: \`.agent/skills/[name]/SKILL.md\`
 ## CRITICAL RULES
 - AUTO-ANONYMOUS: Replace ALL real names with physical descriptions
 - Dialogue naming follows \`${config.outputDir}/shot-plan.json\` policy
+- \`shot-plan.json\` stores top-level \`voiceCast\`
+- Every speaking VIDEO section includes \`Audio Plan\`
 - AUTO-SAFETY: Proactively reframe sensitive content
 - Frame chaining: Last frame SHOT[N] = First frame SHOT[N+1]
 - Coverage: 2-3 sub-shots per main shot (min 60 words each, in same file)
@@ -216,8 +224,9 @@ Single-agent shot package generation. Claude Code may use the native \`prompt-en
 ## Mandatory Read Order
 1. \`.agent/model-profile.md\`
 2. \`.agent/MASTER.md\`
-3. \`.agent/ARCHITECTURE.md\`
-4. \`.agent/agents/prompt-engineer.md\`
+3. \`.agent/VOICE-DESIGN.md\`
+4. \`.agent/ARCHITECTURE.md\`
+5. \`.agent/agents/prompt-engineer.md\`
 5. \`.claude/CLAUDE.md\`
 6. relevant files under \`.claude/rules/\`
@@ -242,6 +251,7 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
 - Model: \`${config.model}\` (${getModelDisplayName(config.model)})
 - Kling preset: \`${config.klingPreset}\`
 - Create \`${config.outputDir}/project-info.md\`, \`${config.outputDir}/shot-plan.json\`, and \`${config.outputDir}/_index.md\`
+- Keep top-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\`
 - Write \`${config.outputDir}/shots/SHOTNN.md\` per shot; coverage stays in the same file
 - Refresh \`${config.outputDir}/reports/SAFETY-REPORT.md\` and \`${config.outputDir}/reports/DELIVERY-REPORT.md\` before \`/finish\`
@@ -254,11 +264,12 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
 6. **Music:** NONE by default.
 7. **Avoid line:** MANDATORY on every prompt (image, video, coverage).
 8. **Coverage:** 2-3 sub-shots within same SHOTNN.md file, min 70 words each.
-9. **ILK/İLK FRAME:** Always include a fenced code block, even when chained.
-10. **Quality Floor:** ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words.
-11. **Specificity Floor:** lens/framing, lighting, and foreground/midground/background action are mandatory.
-12. **Spatial Realism Floor:** eyeline target, plane map, shared light source, and contact/depth cues are mandatory when relational staging matters.
-13. **ONE FILE PER SHOT:** No separate coverage files.
+9. **Voice Design:** keep project-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and per-shot \`Audio Plan\` in each VIDEO section.
+10. **ILK/İLK FRAME:** Always include a fenced code block, even when chained.
+11. **Quality Floor:** ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words.
+12. **Specificity Floor:** lens/framing, lighting, and foreground/midground/background action are mandatory.
+13. **Spatial Realism Floor:** eyeline target, plane map, shared light source, and contact/depth cues are mandatory when relational staging matters.
+14. **ONE FILE PER SHOT:** No separate coverage files.
 ## Workflows
 | Command | Workflow |
@@ -283,8 +294,9 @@ This workspace keeps high-level policy in \`CLAUDE.md\` and operational detail i
 ## Before Any Generation or Repair
 1. Read \`.agent/model-profile.md\`
 2. Read \`.agent/MASTER.md\`
-3. Read \`.agent/agents/prompt-engineer.md\`
-4. Read \`.agent/workflows/generate.md\` or the requested workflow
+3. Read \`.agent/VOICE-DESIGN.md\`
+4. Read \`.agent/agents/prompt-engineer.md\`
+5. Read \`.agent/workflows/generate.md\` or the requested workflow
 5. Apply AUTO-ANONYMOUS and AUTO-SAFETY before drafting
 6. Prefer the active/selected markdown file as scenario source; fallback is \`${config.scenarioHint}\`
 7. Do not mark work complete while any required report is missing or fail
@@ -293,6 +305,8 @@ This workspace keeps high-level policy in \`CLAUDE.md\` and operational detail i
 - Write only inside \`${config.outputDir}\`
 - Keep one file per shot: \`${config.outputDir}/shots/SHOTNN.md\`
 - Maintain \`${config.outputDir}/shot-plan.json\` dialogue naming policy
+- Maintain \`${config.outputDir}/shot-plan.json\` top-level \`voiceCast\`
+- Keep \`Audio Plan\` blocks aligned to \`voiceCast\`
 - Keep \`ILK/İLK FRAME\` in a fenced code block even when chained
 - Quality floor and specificity floor are hard gates, not suggestions
 - Apply \`.agent/skills/spatial-blocking/SKILL.md\` whenever eyeline, compositing, or depth realism is critical
@@ -313,13 +327,16 @@ Use the Film-Kit core runtime.
 ## Read First
 1. \`.agent/model-profile.md\`
 2. \`.agent/MASTER.md\`
-3. \`.agent/agents/prompt-engineer.md\`
-4. \`.agent/workflows/generate.md\`
-5. \`.claude/rules/output-contract.md\`
+3. \`.agent/VOICE-DESIGN.md\`
+4. \`.agent/agents/prompt-engineer.md\`
+5. \`.agent/workflows/generate.md\`
+6. \`.claude/rules/output-contract.md\`
 ## Responsibilities
 - draft and repair shot files under \`${config.outputDir}/shots/\`
 - apply \`${config.outputDir}/shot-plan.json\` dialogue naming policy
+- maintain top-level \`voiceCast\` inside \`${config.outputDir}/shot-plan.json\`
+- keep \`Audio Plan\` blocks valid against \`voiceCast\`
 - enforce AUTO-ANONYMOUS, AUTO-SAFETY, chaining, and coverage contracts
 - enforce quality floor: ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70
 - enforce specificity floor: lens/framing, lighting, and foreground/midground/background action
@@ -349,6 +366,7 @@ If using the native Claude subagent, read \`.claude/agents/prompt-engineer.md\`
    - Determine shot count based on action beats
    - Create \`${config.outputDir}/project-info.md\`
    - Create \`${config.outputDir}/shot-plan.json\`
+   - Add top-level \`voiceCast\` before writing speaking shots
 2. **Batch Strategy:**
    - 1-10 shots → Generate all at once
@@ -358,6 +376,7 @@ If using the native Claude subagent, read \`.claude/agents/prompt-engineer.md\`
 3. **Per Shot (SINGLE FILE: SHOTNN.md):**
    - Analyze scene type (Dialogue / Action / Emotional / Establishing)
    - Generate main shot (İLK FRAME + SON FRAME + VİDEO)
+   - Add machine-readable \`Audio Plan\` before every VIDEO section
    - Keep İLK FRAME as fenced code block even when chained
    - Enforce hard quality floor: ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70
    - Enforce specificity floor: lens/framing + lighting + foreground/midground/background action
@@ -386,8 +405,9 @@ function buildClaudeRuleOutputContract(config) {
 ## Required Files
 - \`${config.outputDir}/project-info.md\` — Characters, settings, emotional arc mapping, tension levels
-- \`${config.outputDir}/shot-plan.json\` — Name policy, shot plan, and validation contract
+- \`${config.outputDir}/shot-plan.json\` — Name policy, shot plan, validation contract, and top-level \`voiceCast\`
 - \`.agent/model-profile.md\` — Active model constraints and presets
+- \`.agent/VOICE-DESIGN.md\` — Voice identity and shot audio contract
 - \`${config.outputDir}/_index.md\` — Shot tracking with chain & status
 - \`${config.outputDir}/shots/SHOT01.md ... SHOTNN.md\` — Individual shot files (one file per shot)
 - \`${config.outputDir}/reports/SAFETY-REPORT.md\` — Safety gate report
@@ -403,7 +423,7 @@ The first sentence carries the most weight. Every prompt MUST follow this order:
 3. [Action] Micro-behavior, acting cue, physical gesture
 4. [Camera + Lens] "85mm f/2.0, shallow DOF, static with handheld micro-movement"
 5. [Light + Atmosphere] "Warm oil lamp key light from screen-left, deep shadows"
-6. [Audio direction block] (VIDEO prompts only)
+6. [Audio Plan + Audio direction block] (VIDEO prompts only)
 7. [Avoid line] (EVERY prompt — MANDATORY)
 \`\`\`
@@ -413,6 +433,7 @@ The first sentence carries the most weight. Every prompt MUST follow this order:
 - Kling preset: \`${config.klingPreset}\`
 - Kling transition mode: Start+End (when model is kling-3.0)
 - Motion timeline: first → then → finally (when model is kling-3.0)
+- Kling multi-shot mode: single-transition by default; custom storyboard only for 2-3 meaningful phases (when model is kling-3.0)
 ## Shot File Format (SHOTNN.md) — SINGLE FILE, ALL INCLUSIVE
@@ -447,6 +468,12 @@ Avoid: blurry, low-res, noise, distorted faces, bad anatomy, extra limbs/fingers
 plastic skin, waxy skin, on-screen text, watermark, logo, cartoon style, CGI look.
 \\\`\\\`\\\`
+### AUDIO PLAN
+\\\`\\\`\\\`json
+[Machine-readable audioPlan JSON block aligned with shot-plan.json -> voiceCast]
+\\\`\\\`\\\`
 ### VİDEO
 \\\`\\\`\\\`
@@ -494,6 +521,10 @@ Audio direction:
 Avoid: distorted faces, morphing, blurry, flickering, unnatural motion, on-screen text.
 \\\`\\\`\\\`
+\\\`\\\`\\\`json
+[Coverage audioPlan JSON block - voice binding only when coverage contains speech]
+\\\`\\\`\\\`
 ### SHOT[NN]B — [Type] | [Duration]s | [Icon] [Label]
 [Same format as A]
@@ -545,6 +576,8 @@ Total: 1 main + [N] coverage = [N+1] production shots
 ### Name Rule
 - Visual prompts and non-dialogue fields: no real names
 - Dialogue transcript naming follows \`shot-plan.json.dialogue_name_policy\`
+- Every speaking shot must resolve \`activeSpeakerKey\` to \`shot-plan.json.voiceCast\`
+- Keep one active speaker per shot
 ### 30° Rule
 Coverage camera angles MUST differ from main shot by at least 30°.
@@ -590,6 +623,7 @@ function buildCopilotInstructions(config) {
 ### System Entry Point
 Read \`.agent/MASTER.md\` for the complete production ruleset.
 Read \`.agent/ARCHITECTURE.md\` for system overview.
+Read \`.agent/VOICE-DESIGN.md\` for voice identity and shot audio contracts.
 Read \`.agent/model-profile.md\` for active model rules.
 ### Skill Loading Protocol (MANDATORY)
@@ -610,7 +644,8 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
 4. Create index: \`${config.outputDir}/_index.md\`
 5. Create project info: \`${config.outputDir}/project-info.md\`
 6. Create plan: \`${config.outputDir}/shot-plan.json\`
-7. Write reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
+7. Keep top-level \`voiceCast\` in the plan and \`Audio Plan\` in speaking VIDEO sections
+8. Write reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
 ### Critical Rules
 - **AUTO-ANONYMOUS:** Replace ALL real names with physical descriptions
@@ -621,6 +656,7 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
 - **Spatial Realism:** eyeline targets, shared light, depth scale, and anti-cutout staging must agree when subjects share frame
 - **Avoid Line:** MANDATORY on every prompt
 - **Music:** NONE by default
+- **Voice Design:** keep \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and \`Audio Plan\` in speaking VIDEO sections
 - **Duration:** 8s default, slow burn pacing
 - **Language:** Prompts in English, dialogue preserved
 - **ILK/İLK FRAME:** keep fenced code block even when chained
@@ -650,9 +686,10 @@ When request is /generate, follow the Film-Kit Hollywood production system:
 3. Load required skills from \`.agent/skills/\`
 4. Transform scenario into production shot package at \`${config.outputDir}\`
 5. Generate: project-info.md, shot-plan.json, _index.md, shots/SHOT01.md..SHOTNN.md
-6. Each SHOTNN.md: İLK FRAME + SON FRAME + VİDEO + 2-3 Coverage (ALL IN ONE FILE)
-7. Enforce: auto-anonymous, dialogue name policy, auto-safety, frame chaining, avoid lines
-8. Write reports to \`${config.outputDir}/reports/\` before /finish
+6. Keep top-level \`voiceCast\` in shot-plan.json
+7. Each SHOTNN.md: İLK FRAME + SON FRAME + AUDIO PLAN + VİDEO + 2-3 Coverage (ALL IN ONE FILE)
+8. Enforce: auto-anonymous, dialogue name policy, auto-safety, frame chaining, avoid lines
+9. Write reports to \`${config.outputDir}/reports/\` before /finish
 `;
 }
 /* ---------- ANTIGRAVITY ---------- */
@@ -667,6 +704,7 @@ description: Hollywood-standard cinematic prompt engineering with model-aware pr
 ## System Architecture
 This skill is part of the Film-Kit prompt engineering system.
 Read \`.agent/MASTER.md\` for the complete production ruleset (690+ rules).
+Read \`.agent/VOICE-DESIGN.md\` for project-level \`voiceCast\` and shot-level \`audioPlan\`.
 Read \`.agent/model-profile.md\` first for active model constraints.
 ## Skill Loading Protocol
@@ -700,6 +738,7 @@ Before generating ANY prompts, read these skills:
 - Shot index: \`${config.outputDir}/_index.md\`
 - Project info: \`${config.outputDir}/project-info.md\`
 - Plan: \`${config.outputDir}/shot-plan.json\`
+- Voice contract: top-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\`
 - Reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
 ## Critical Rules
@@ -709,12 +748,13 @@ Before generating ANY prompts, read these skills:
 4. **Frame Chaining:** Last frame of SHOT[N] becomes first frame of SHOT[N+1]
 5. **Coverage Mandatory:** 2-3 sub-shots per main shot (in same file, min 60 words each)
 6. **Avoid Line:** MANDATORY on every prompt (image + video + coverage)
-7. **Music: NONE** by default
-8. **Ultra Realism** default visual mode
-9. **8s duration** default, slow burn pacing
-10. **ILK/İLK FRAME:** always keep fenced code block
-11. **ONE FILE PER SHOT:** SHOTNN.md contains main shot + all coverage
-12. **Relational Realism:** preserve eyeline targets, shared light, depth scale, and anti-cutout staging when multiple subjects share frame
+7. **Voice Design:** keep \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and \`Audio Plan\` in every speaking VIDEO section
+8. **Music: NONE** by default
+9. **Ultra Realism** default visual mode
+10. **8s duration** default, slow burn pacing
+11. **ILK/İLK FRAME:** always keep fenced code block
+12. **ONE FILE PER SHOT:** SHOTNN.md contains main shot + all coverage
+13. **Relational Realism:** preserve eyeline targets, shared light, depth scale, and anti-cutout staging when multiple subjects share frame
 ## Quality Floor (Hard Gate)
 Reject and regenerate any shot that fails:
@@ -843,6 +883,16 @@ Limit the amount of change per clip for realism:
 > If end frame is too different, the model must "invent miracles" → plastic/melting/warp risk rises.
+### Kling Storyboard / Multi-Shot Decision Tree
+The official Kling app surface exposes custom storyboard-like shot prompting and optional end-frame anchoring.
+Use it as a selective upgrade, not a default:
+- **single-transition**: default mode for one clean action arc, reveal, or emotional turn
+- **custom-storyboard**: only when one clip truly needs **2-3 editorially distinct phases**
+- **hard cap in this toolkit:** 3 custom storyboard shots per video generation
+- **never storyboard micro-beats:** one glance, one finger move, one tiny prop touch, one breath shift
+- if the beat needs 4+ phases, split into chained videos instead of overloading one Kling prompt
 ### Golden Prompt Skeleton (Start+End)
 The prompt's job is to tell the model **how to generate the in-between frames**:
@@ -984,26 +1034,22 @@ When using a reference image as the start frame, the model extracts lighting and
 ### Multi-Shot Protocol (Tek Üretimde Çoklu Çekim)
-Kling 3.0 can manage **2 to 6 shots** in a single 15-second generation.
+Treat Kling multi-shot as **storyboarded internal progression**, not as a license to over-cut.
 **Rules:**
-1. **Shot Prefix:** Each shot MUST begin with \`Shot X,\` (e.g., \`Shot 1,\`, \`Shot 2,\`)
-2. **Character Continuity:** Repeat physical descriptions at the start of each shot, or use **Element Binding** (see below)
-3. **Time Distribution:** 15 seconds must be logically divided (e.g., Shot 1: 3s, Shot 2: 7s, Shot 3: 5s)
-4. **Maximum:** 6 shots per generation; for more, use separate generations with chaining
-Official examples favor short, concrete camera-and-action openings. Start with the movement path or subject action, then lock the stable identity/background constraints.
-**Example Multi-Shot Prompt:**
-\`\`\`
-Shot 1, Close-up of a bearded man in his 40s, wearing a dusty military uniform. He stares ahead, jaw clenched. Tripod-locked, 85mm, shallow DOF. (3s)
-Shot 2, Medium shot, same bearded man turns toward a younger soldier standing behind him. Slow pan right to reveal the young soldier. Natural handheld micro-sway. (5s)
-Shot 3, Over-the-shoulder from the younger soldier's POV, the bearded man speaks directly to camera. Gentle push-in. (4s)
-Shot 4, Wide shot establishing the artillery position, both soldiers visible. Crane rise. (3s)
-\`\`\`
+1. Default to **single-transition** whenever one camera path can carry the beat.
+2. Use custom storyboard only for **2-3 distinct internal phases** with different framing/action purpose.
+3. Keep each storyboard phase short, clear, and concrete: what changed, what stayed locked, why the cut exists.
+4. Hard cap for this toolkit: **3 storyboard shots per generation** in app.kling-oriented workflows.
+5. If the sequence wants 4+ phases, split into multiple chained generations.
+6. Never assign separate storyboard shots to micro-actions that would read better inside one stronger shot.
+**Preferred storyboard jobs:**
+- Shot 1: establish or setup
+- Shot 2: main action / reveal / shift
+- Shot 3: reaction / settle / handoff
+Official Kling surfaces favor short, concrete shot descriptions. Start from the action path or camera intention, then lock continuity anchors (identity, wardrobe, background geometry, stable light).
 ### Element Binding (Öğe Bağlama — Karakter/Nesne Tutarlılığı)
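
Note on the templates.js hunks above: the new Kling storyboard rules reduce to a decision on how many editorially distinct phases a beat has. The following is a minimal sketch of that decision, not code from the package. The helper name `chooseKlingMode`, its return values, and the label `split-into-chained-videos` are assumptions made for illustration; only the mode names `single-transition` and `custom-storyboard` and the cap of 3 storyboard shots come from the diff.

```javascript
// Hypothetical helper, not part of Film-Kit: maps a phase count to a Kling mode.
// The cap of 3 is the "hard cap in this toolkit" stated in the diff.
const MAX_STORYBOARD_SHOTS = 3;

function chooseKlingMode(phaseCount) {
  if (!Number.isInteger(phaseCount) || phaseCount < 1) {
    throw new RangeError("phaseCount must be a positive integer");
  }
  // One clean action arc, reveal, or emotional turn: the default mode.
  if (phaseCount === 1) {
    return "single-transition";
  }
  // 2-3 editorially distinct phases: custom storyboard is allowed.
  if (phaseCount <= MAX_STORYBOARD_SHOTS) {
    return "custom-storyboard";
  }
  // 4+ phases: split into chained videos (label is an assumption).
  return "split-into-chained-videos";
}
```

The caller is expected to count only meaningful phases. Micro-beats such as a single glance or breath shift should be folded into a neighboring phase before calling the helper, in line with the "never storyboard micro-beats" rule.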

package/content/ARCHITECTURE.md CHANGED

@@ -20,6 +20,7 @@ Modular system consisting of:
 .agent/
 ├── ARCHITECTURE.md          # This file
 ├── MASTER.md                # Main rules (entry point for AI tools)
+├── VOICE-DESIGN.md          # Voice identity + audioPlan contract
 ├── model-profile.md         # Active model rules (runtime generated)
 ├── agents/
 │   └── prompt-engineer.md   # Primary agent
@@ -60,7 +61,7 @@ Modular system consisting of:
 | `coverage-system` | **Mandatory coverage shots** (Reaction, OTS, Insert, Cutaway, ECU, Wide) + L-cut/J-cut + 30° kuralı + **180° kuralı** + eyeline match + matching action + multi-character blocking |
 | `spatial-blocking` | **Relational realism**: eyeline targeting, plane mapping, body orientation, shared lighting, depth/scale integration, anti-cutout / anti-miniature cues |
 | `visual-modes` | **Ultra Realism** default, stylization triggers, anti-AI artifact rules + **renk sürekliliği** + magic hour + flashback/rüya görsel ayrımı |
-| `audio-design` | **Sound design** rules, voice realism, SFX, ambience, audio direction block + diegetic/non-diegetic ses ayrımı |
+| `audio-design` | **Sound design** rules, voice realism, project-level `voiceCast`, shot-level `audioPlan`, audio direction block + diegetic/non-diegetic ses ayrımı |
 | `prompt-structure` | Image/video prompt templates, camera vocabulary, seed parameter, prompt rewriter, **re-take strategy**, coverage prompt yazım standartları (≥60 kelime) |
 ---
@@ -84,6 +85,8 @@ User Scenario → Agent Activated → Read model-profile → Load Required Skill
                                        ↓
                             .agent/model-profile.md (ALWAYS FIRST)
                                        ↓
+                            .agent/VOICE-DESIGN.md (when dialogue / narrator exists)
+                                       ↓
                             safety-compliance (ALWAYS)
                             reference-locking (if refs provided)
                             frame-chaining (ALWAYS)

package/content/MASTER.md CHANGED

@@ -131,9 +131,23 @@ User: "Subay askerlere bağırır ve ateş emri verir."
   2. "Ateş!" diye bağırır, damarları çıkar. (Mid-shot)
   3. Askerlerin parmağı tetiğe gider. (Detail shot)
-### 🎵 Music Policy
-**DEFAULT:** `Music: NONE` (Always).
-User must explicitly request music to include it. Otherwise, rely on SFX and Ambience.
+### 🎵 Music Policy
+**DEFAULT:** `Music: NONE` (Always).
+User must explicitly request music to include it. Otherwise, rely on SFX and Ambience.
+### Voice Design Contract
+When dialogue or narration exists, audio is mandatory at three synchronized layers:
+1. project-level `voiceCast`
+2. shot-level `audioPlan`
+3. prompt-level `Audio direction:`
+Project-level voice identity is stable.
+Shot-level performance is variable.
+Use `.agent/VOICE-DESIGN.md` whenever a speaker, narrator, or voiceover is involved.
+Do not redesign a speaker voice per shot. Reuse the same `speakerKey` and voice identity across the whole project.
 ### 📸 Reference Image Enforcement
@@ -160,7 +174,7 @@ When user provides references, EVERY prompt MUST:
 3. [Aksiyon] — Ne yapıyor, mikro-davranış, oyunculuk ipucu
 4. [Kamera + Lens] — "85mm f/2.0, shallow DOF, static with handheld micro-movement"
 5. [Işık + Atmosfer] — "Warm oil lamp key light from screen-left, deep shadows"
-6. [Audio direction block] — (sadece video prompt'larında)
+6. [Audio Plan + Audio direction block] — (sadece video prompt'larında)
 7. [Avoid line] — (HER prompt'ta zorunlu)
 ```
@@ -186,11 +200,20 @@ Avoid: blurry, low-res, noise, distorted faces, bad anatomy, extra limbs/fingers
 - `--audio_mode` or `--audio_prompt` (audio is for VIDEO only)
 - Any `--parameter` flags
-#### VIDEO PROMPTS (VİDEO, Coverage Video)
-**MUST INCLUDE FULL AUDIO DIRECTION BLOCK:**
-```
-Audio direction:
+#### VIDEO PROMPTS (VİDEO, Coverage Video)
+**VOICE DESIGN CONTRACT (MANDATORY WHEN VOICE EXISTS):**
+- Project output must include top-level `voiceCast`
+- Every VIDEO section must include a machine-readable `Audio Plan` block
+- If `Type` is `Dialogue`, `Voiceover`, or `Mixed`, `activeSpeakerKey` must bind to `voiceCast`
+- `voiceIdentityPrompt` is character-level
+- `performanceNote` is shot-level
+- Keep one active speaker per shot
+- Split reply dialogue across multiple shots when needed
+**MUST INCLUDE FULL AUDIO DIRECTION BLOCK:**
+```
+Audio direction:
 - Language: [TURKISH/ENGLISH/etc.]
 - Type: [Dialogue/SFX/Ambience/Mixed]
 - Dialogue transcript: "[Exact lines in original language]" or NONE
@@ -530,11 +553,13 @@ Avoid: blurry, low-res, noise, distorted faces, bad anatomy, extra limbs/fingers
 **VİDEO:**
 ```
-[Complete video prompt — MIN 80 words, MAX 120 words]
-[Action + camera movement + acting cues + atmosphere]
-Audio direction:
-- Language: [LANGUAGE]
+[Complete video prompt — MIN 80 words, MAX 120 words]
+[Action + camera movement + acting cues + atmosphere]
+[Audio Plan JSON block aligned with .agent/VOICE-DESIGN.md]
+Audio direction:
+- Language: [LANGUAGE]
 - Type: [Dialogue/SFX/Ambience]
 - Dialogue transcript: [Lines or NONE]
 - SFX: [Effects list]
@@ -613,11 +638,15 @@ Before outputting, validate EVERY shot. **Bu kontrol otomatiktir, kullanıcı ha
 - [ ] **HER coverage image** → Avoid satırı var mı? ❗ Yoksa EKLE
 - [ ] **HER coverage video** → Avoid satırı var mı? ❗ Yoksa EKLE
-### 4️⃣ Prompt Uzunluk Kontrolü
-- [ ] Ana shot image prompt ≥ 60 kelime mi?
-- [ ] Ana shot video prompt ≥ 80 kelime mi?
-- [ ] **Coverage prompt ≥ 60 kelime mi?** ❗ Kısa coverage YASAK
-- [ ] Audio direction block tam mı? (Language, Type, Dialogue, SFX, Ambience, Music, Mix)
+### 4️⃣ Prompt Uzunluk Kontrolü
+- [ ] Ana shot image prompt ≥ 60 kelime mi?
+- [ ] Ana shot video prompt ≥ 80 kelime mi?
+- [ ] **Coverage prompt ≥ 60 kelime mi?** ❗ Kısa coverage YASAK
+- [ ] Audio direction block tam mı? (Language, Type, Dialogue, SFX, Ambience, Music, Mix)
+- [ ] `voiceCast` mevcut mu ve her konusan speaker icin `speakerKey` tekil mi?
+- [ ] Her VIDEO bolumunde machine-readable `Audio Plan` var mi?
+- [ ] Dialogue / Voiceover shot'larinda `activeSpeakerKey` -> `voiceCast` baglantisi gecerli mi?
+- [ ] Bir shot'ta tek aktif speaker kurali korunuyor mu?
 ### 5️⃣ Türkçe Özet
 - [ ] Her shot'un 🇹🇷 Türkçe özet satırı var mı?
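
Note on the MASTER.md checklist additions above: three of the new items (unique `speakerKey`, valid `activeSpeakerKey` binding, one active speaker per shot) are mechanical and could be checked in code. The sketch below is one possible reading, not the package's validator. The function name `validateVoiceBinding` and the `shot` and `type` properties on each plan object are assumptions; `voiceCast`, `speakerKey`, `activeSpeakerKey`, and the speaking types `Dialogue`, `Voiceover`, and `Mixed` are taken from the diff.

```javascript
// Illustrative sketch only. The audioPlan object shape ({ shot, type, activeSpeakerKey })
// is an assumption; the diff names the fields but does not show a full audioPlan.
const SPEAKING_TYPES = new Set(["Dialogue", "Voiceover", "Mixed"]);

function validateVoiceBinding(voiceCast, audioPlans) {
  const errors = [];
  const knownKeys = new Set();

  // Each speaking character or narrator must have exactly one speakerKey.
  for (const entry of voiceCast) {
    if (knownKeys.has(entry.speakerKey)) {
      errors.push(`duplicate speakerKey: ${entry.speakerKey}`);
    }
    knownKeys.add(entry.speakerKey);
  }

  for (const plan of audioPlans) {
    // Non-speaking shots (for example SFX or Ambience) need no voice binding.
    if (!SPEAKING_TYPES.has(plan.type)) continue;

    // One active speaker per shot: a single string key, not a list.
    if (typeof plan.activeSpeakerKey !== "string") {
      errors.push(`${plan.shot}: activeSpeakerKey must be a single string`);
      continue;
    }
    // The key must resolve to an entry in voiceCast.
    if (!knownKeys.has(plan.activeSpeakerKey)) {
      errors.push(`${plan.shot}: activeSpeakerKey "${plan.activeSpeakerKey}" not found in voiceCast`);
    }
  }
  return errors;
}
```

An empty array means the binding checks passed. Any entry corresponds to what VOICE-DESIGN.md calls a contract failure.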

package/content/RULES.md CHANGED

@@ -114,10 +114,11 @@ When references provided:
 | Kural | Açıklama |
 |-------|----------|
-| HER image prompt | Avoid satırı ile bitmeli |
-| HER video prompt | Audio direction block + Avoid satırı ile bitmeli |
-| HER coverage prompt | ≥ 60 kelime + Avoid satırı ile bitmeli |
-| Türkçe özet | Her shot ve coverage için 🇹🇷 özet satırı |
+| HER image prompt | Avoid satırı ile bitmeli |
+| HER video prompt | Audio direction block + Avoid satırı ile bitmeli |
+| HER coverage prompt | ≥ 60 kelime + Avoid satırı ile bitmeli |
+| Türkçe özet | Her shot ve coverage için 🇹🇷 özet satırı |
+| Voice Design | `voiceCast` proje seviyesi, `audioPlan` shot seviyesi, `Audio direction` parser seviyesi |
 ### 📷 Coverage System (MANDATORY)
@@ -134,10 +135,20 @@ When references provided:
 **Sub-shot naming:** SHOT05A, SHOT05B, SHOT05C (main shot + alphabetical suffix)
-### 🎵 Music Policy (BANNED)
-**DEFAULT:** `Music: NONE` (Always).
-User must explicitly request music. Otherwise use SFX/Ambience only.
+### 🎵 Music Policy (BANNED)
+**DEFAULT:** `Music: NONE` (Always).
+User must explicitly request music. Otherwise use SFX/Ambience only.
+### Voice Design Policy
+If dialogue or narration exists:
+- keep a project-level `voiceCast`
+- keep a shot-level `audioPlan`
+- keep the prompt-level `Audio direction:` block
+- keep one active speaker per shot
+- reuse the same `speakerKey` and voice identity across all shots
 ### 🎭 Dramaturji & Oyunculuk (Diyalog Sahneleri)
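
Note on the RULES.md Voice Design Policy above: the policy asks every speaking shot to keep both the shot-level `audioPlan` and the prompt-level `Audio direction:` block. A rough presence check over a shot file's text might look like the sketch below. It is a simplification under stated assumptions: the function name `checkAudioLayers` is hypothetical, the Audio Plan is assumed to be the first `json` fenced block in the file (as in the template shown in the templates.js hunks), and only the first such block is inspected rather than one per VIDEO section.

```javascript
// Illustrative sketch, not Film-Kit code. Builds the fence marker at runtime
// so this example does not contain a literal triple backtick.
const FENCE = "`".repeat(3);

function checkAudioLayers(shotMarkdown) {
  // Assumption: the Audio Plan is the first fenced block tagged "json".
  const planPattern = new RegExp(FENCE + "json\\s*\\n([\\s\\S]*?)\\n" + FENCE);
  const match = planPattern.exec(shotMarkdown);

  let hasAudioPlan = false;
  if (match) {
    try {
      JSON.parse(match[1]); // machine-readable means it must parse as JSON
      hasAudioPlan = true;
    } catch {
      hasAudioPlan = false;
    }
  }

  // The backward-compatible prompt block starts a line with "Audio direction:".
  const hasAudioDirection = /^Audio direction:/m.test(shotMarkdown);

  return { hasAudioPlan, hasAudioDirection };
}
```

A shot with speech would be expected to report `true` for both flags. A placeholder such as the bracketed template text in the generated shot format does not parse as an Audio Plan object with fields, so real validation would also need to inspect the parsed content.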

package/content/VOICE-DESIGN.md ADDED

@@ -0,0 +1,248 @@
+# Voice Design Output Contract
+This runtime must support voice-design style audio generation where voice identity is created once per speaker and then reused across shots.
+## Three-Layer Audio Architecture
+Every scenario output must keep these layers aligned:
+1. `voiceCast`
+   Project-level or scenario-level voice identity package.
+2. `audioPlan`
+   Shot-level performance, transcript, SFX, ambience, and mix metadata.
+3. `Audio direction:`
+   Human-readable prompt block kept for backward compatibility with existing parsers.
+## Film-Kit Storage Contract
+- Single-agent projects store project-level voice identity in `$OUTPUT_DIR/shot-plan.json`.
+- Multi-agent projects store project-level voice identity in `$OUTPUT_DIR/team-plan.json`.
+- Every `SHOTNN.md` file must include a machine-readable `Audio Plan` JSON block for each VIDEO section.
+- The `Audio direction:` prompt block must mirror the same `audioPlan` and remain present in the video prompt.
+## Core Principle
+Voice identity is character-level.
+Performance is shot-level.
+Do not redesign a character voice on every shot.
+Design the voice once, bind it to a stable `speakerKey`, and reuse that identity across the entire project.
+## Project-Level `voiceCast`
+Create one `voiceCast` entry for every speaking character or narrator.
+Required fields:
+- `speaker`
+- `speakerKey`
+- `role`
+- `language`
+- `voiceIdentityPrompt`
+- `voicePerformanceBase`
+- `preferredProvider`
+- `preferredModel`
+- `saveToLibrary`
+Recommended fields:
+- `genderHint`
+- `ageRange`
+- `referenceAudioStrategy`
+- `referenceAudioNeeded`
+- `shouldGenerateVoice`
+- `voiceDesignRecommended`
+- `voiceDesignPriority`
+- `referenceAudioSuggested`
+- `referenceAudioDescription`
+- `notes`
+Example:
+```json
+{
+  "speaker": "Kemal",
+  "speakerKey": "kemal",
+  "role": "character",
+  "language": "turkish",
+  "genderHint": "male",
+  "ageRange": "35-45",
+  "voiceIdentityPrompt": "A realistic Turkish male voice in his late thirties to early forties. Low-mid register, restrained authority, subtle fatigue, natural diction, intimate close-mic realism, no theatricality, highly believable human texture.",
+  "voicePerformanceBase": "controlled, interior, understated",
+  "referenceAudioStrategy": "optional",
+  "referenceAudioNeeded": false,
+  "shouldGenerateVoice": true,
+  "preferredProvider": "elevenlabs",
+  "preferredModel": "eleven_ttv_v3",
+  "saveToLibrary": true,
+  "voiceDesignRecommended": true,
+  "voiceDesignPriority": "high",
+  "referenceAudioSuggested": false,
+  "referenceAudioDescription": null,
+  "notes": "Use the same voice identity across all shots. Performance may change per shot, identity must not."
+}
+```
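A minimal sketch of how the required-field list above could be checked, assuming a `voiceCast` entry has already been parsed into a Python `dict`. The function name `missing_voice_cast_fields` is hypothetical and not part of Film-Kit; presence is checked rather than truthiness because `saveToLibrary` may legitimately be `false`.

```python
# Required fields copied from the "Project-Level voiceCast" list above.
REQUIRED_VOICE_CAST_FIELDS = (
    "speaker",
    "speakerKey",
    "role",
    "language",
    "voiceIdentityPrompt",
    "voicePerformanceBase",
    "preferredProvider",
    "preferredModel",
    "saveToLibrary",
)


def missing_voice_cast_fields(entry: dict) -> list:
    """Return the required field names absent from one voiceCast entry.

    Hypothetical helper: only key presence is checked, so a value such as
    `False` or `None` still counts as "present".
    """
    return [name for name in REQUIRED_VOICE_CAST_FIELDS if name not in entry]
```

An empty list means the entry satisfies the required-field part of the contract; recommended fields are deliberately not enforced here.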
+## `speakerKey` Rule
+The same speaker must use the same `speakerKey` everywhere.
+If a dialogue shot references a `speakerKey` that does not exist in `voiceCast`, treat it as a contract failure.
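The contract-failure rule above can be sketched as a lookup, assuming `voiceCast` and the per-shot `audioPlan` objects are already loaded as Python structures. The helper name `unresolved_speaker_keys` is an illustration only, not an API shipped by the package.

```python
def unresolved_speaker_keys(voice_cast: list, audio_plans: list) -> list:
    """List every activeSpeakerKey that has no matching voiceCast entry.

    Hypothetical helper. Plans without an activeSpeakerKey (for example
    sfx-only or ambience-only coverage) are skipped, since they need no
    voice binding.
    """
    known = {entry["speakerKey"] for entry in voice_cast}
    missing = []
    for plan in audio_plans:
        key = plan.get("activeSpeakerKey")
        if key is not None and key not in known and key not in missing:
            missing.append(key)
    return missing
```

Any non-empty result would correspond to the contract failure described above.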
+## `voiceIdentityPrompt` Rule
+`voiceIdentityPrompt` describes the stable voice identity, not the temporary emotion of one shot.
+Include:
+- language
+- age range
+- gender feel when useful
+- register and texture
+- diction and articulation
+- energy profile
+- recording feel or mic proximity
+- realism target such as believable, non-theatrical, non-announcer
+Do not include:
+- exact dialogue lines
+- one-shot emotional spikes
+- scene-specific actions such as running, shouting, or whispering a particular line
+## `voicePerformanceBase`
+Use `voicePerformanceBase` for the speaker's normal acting baseline.
+This baseline may combine with the shot-level `performanceNote`, but it must not replace voice identity.
+## Shot-Level `audioPlan`
+Every VIDEO section must carry a machine-readable `Audio Plan` JSON block.
+If dialogue or voiceover exists, these fields are mandatory:
+- `language`
+- `type`
+- `activeSpeaker`
+- `activeSpeakerKey`
+- `dialogueLines`
+- `dialogueTranscript`
+- `performanceNote`
+Recommended support fields:
+- `delivery.pace`
+- `delivery.intensity`
+- `delivery.emotion`
+- `delivery.projection`
+- `delivery.stability`
+- `sfx`
+- `ambience`
+- `music`
+- `mixTarget`
+- `hasSubtitles`
+Example:
+```json
+{
+  "language": "turkish",
+  "type": "dialogue",
+  "activeSpeaker": "Kemal",
+  "activeSpeakerKey": "kemal",
+  "dialogueLines": [
+    {
+      "speaker": "Kemal",
+      "speakerKey": "kemal",
+      "text": "Burayi terk etmemiz lazim."
+    }
+  ],
+  "dialogueTranscript": "Kemal: Burayi terk etmemiz lazim.",
+  "performanceNote": "Low, controlled urgency. Quiet but decisive. No melodrama.",
+  "delivery": {
+    "pace": "measured",
+    "intensity": "medium-low",
+    "emotion": "suppressed urgency",
+    "projection": "intimate",
+    "stability": "steady"
+  },
+  "sfx": [
+    "Ahsap sandalyenin hafif surtunmesi"
+  ],
+  "ambience": [
+    "Kucuk odada dusuk floresan humu"
+  ],
+  "music": null,
+  "mixTarget": {
+    "dialogue": 70,
+    "sfx": 5,
+    "ambience": 25
+  },
+  "hasSubtitles": false
+}
+```
+## Single Active Speaker Rule
+For realistic lipsync and stable delivery, keep only one active speaker per shot.
+- Other characters may appear in frame, but they should listen silently.
+- Back-and-forth dialogue should be split across multiple shots.
+- Coverage shots may have `audioPlan.type` values like `sfx-only` or `ambience-only` and therefore require no voice binding.
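One way to check the single-active-speaker rule, as a sketch under the assumption that each `dialogueLines` item carries a `speakerKey` as in the example above. The function name `violates_single_speaker` is hypothetical.

```python
def violates_single_speaker(audio_plan: dict) -> bool:
    """Return True when a shot's dialogue lines involve more than the one
    declared active speaker.

    Hypothetical helper. Shots with no dialogue lines (sfx-only,
    ambience-only) never violate the rule.
    """
    keys = {line["speakerKey"] for line in audio_plan.get("dialogueLines", [])}
    if not keys:
        return False
    return keys != {audio_plan.get("activeSpeakerKey")}
```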
+## Backward-Compatible Prompt Block
+The prompt must still include the human-readable block:
+```text
+Audio direction:
+Language: Turkish
+Type: Dialogue
+Dialogue transcript:
+Kemal: Burayi terk etmemiz lazim.
+SFX:
+- Ahsap sandalyenin hafif surtunmesi
+Ambience:
+- Kucuk odada dusuk floresan humu
+Music: NONE
+Mix target: Dialogue 70%, SFX 5%, Ambience 25%
+No on-screen subtitles/captions.
+```
+This block must be derived from the same `audioPlan`, not invented separately.
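Deriving the prompt block from `audioPlan` could look like the sketch below. It is not the package's own renderer: the function name `render_audio_direction` is hypothetical, and the capitalization of `language` and `type` plus the `NONE` fallback for a null `music` value are assumptions inferred from the two examples above.

```python
def render_audio_direction(plan: dict) -> str:
    """Render the human-readable 'Audio direction:' block from an audioPlan.

    Hypothetical sketch: field layout follows the example block above.
    """
    lines = [
        "Audio direction:",
        f"Language: {plan['language'].capitalize()}",
        f"Type: {plan['type'].capitalize()}",
    ]
    if plan.get("dialogueLines"):
        lines.append("Dialogue transcript:")
        lines += [f"{d['speaker']}: {d['text']}" for d in plan["dialogueLines"]]
    lines.append("SFX:")
    lines += [f"- {item}" for item in plan.get("sfx", [])]
    lines.append("Ambience:")
    lines += [f"- {item}" for item in plan.get("ambience", [])]
    # Assumption: a null or empty music value maps to the default "NONE".
    lines.append(f"Music: {plan.get('music') or 'NONE'}")
    mix = plan["mixTarget"]
    lines.append(
        f"Mix target: Dialogue {mix['dialogue']}%, "
        f"SFX {mix['sfx']}%, Ambience {mix['ambience']}%"
    )
    if not plan.get("hasSubtitles", False):
        lines.append("No on-screen subtitles/captions.")
    return "\n".join(lines)
```

Generating the block from a single source keeps the JSON and the prompt text from drifting apart, which is the point of the rule above.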
+## Default Voice Design Policy
+- Character voice stays fixed.
+- Shot performance may change.
+- Language and diction stay fixed.
+- Realism is maximized.
+- Theatricality is minimized.
+- Promotional or announcer tone is forbidden unless explicitly requested.
+- `Music: NONE` is the default.
+- `No on-screen subtitles/captions.` is mandatory unless the user explicitly requests otherwise.
+## Reference Audio Guidance
+Recommend reference audio when the target voice depends on unusually specific texture:
+- elderly or highly specific age character
+- broken, smoky, hoarse, or damaged micro-texture
+- strong local accent or highly specific diction
+- very narrow performance tolerance between whisper and force
+If reference audio is recommended, prefer:
+- `preferredModel: "eleven_ttv_v3"`
+- `referenceAudioNeeded: true`
+## Provider Flow
+When a provider supports voice design, the implementation flow is:
+1. Use `voiceCast[].voiceIdentityPrompt` to design a preview voice.
+2. Review multiple previews.
+3. Select one preview.
+4. Create a persistent `voice_id`.
+5. Bind that `voice_id` to `speakerKey`.
+6. Reuse the same voice identity across all future shots.
+7. If needed, remix or refine the same voice without breaking identity continuity.
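Steps 4 to 6 of the flow above amount to "create once, then reuse". The sketch below illustrates that binding with a plain dictionary; `bind_voices` and the `create_voice` callback are hypothetical placeholders, and no real provider API is called or implied.

```python
from typing import Callable, Dict, List, Optional


def bind_voices(
    voice_cast: List[dict],
    create_voice: Callable[[str], str],
    bindings: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Map each speakerKey to a persistent voice_id, creating it only once.

    Hypothetical sketch: `create_voice` stands in for whatever provider
    step turns a voiceIdentityPrompt into a saved voice and returns its id.
    """
    result = dict(bindings or {})
    for entry in voice_cast:
        key = entry["speakerKey"]
        if key not in result:  # already bound: reuse, never redesign
            result[key] = create_voice(entry["voiceIdentityPrompt"])
    return result
```

Passing the previous bindings back in on later runs keeps the same `voice_id` attached to each `speakerKey` across shots.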

package/content/agents/prompt-engineer.md CHANGED

@@ -133,13 +133,24 @@ First sentence carries the most weight. Every prompt MUST follow this order:
 3. [Action] Micro-behavior, acting cue, physical gesture
 4. [Camera + Lens] "85mm f/2.0, shallow DOF, static with handheld micro-movement"
 5. [Light + Atmosphere] "Warm oil lamp key light from screen-left, deep shadows"
-6. [Audio direction block] (VIDEO prompts only)
-7. [Avoid line] (EVERY prompt — MANDATORY)
-```
-> **Kling adaptation:** When model is kling-3.0, replace step 3 (Action) with motion timeline (`first → then → finally`). Step 6 uses Kling tone markers instead of Audio Direction Block. See `.agent/model-profile.md` for golden prompt skeleton.
----
+6. [Audio direction block] (VIDEO prompts only)
+7. [Avoid line] (EVERY prompt — MANDATORY)
+```
+> **Kling adaptation:** When model is kling-3.0, replace step 3 (Action) with motion timeline (`first → then → finally`). Step 6 uses Kling tone markers instead of Audio Direction Block. See `.agent/model-profile.md` for golden prompt skeleton.
+## VOICE DESIGN CONTRACT
+When dialogue, narrator VO, or character speech exists:
+- read `.agent/VOICE-DESIGN.md`
+- create and maintain project-level `voiceCast` in `shot-plan.json`
+- reuse the same `speakerKey` across all shots
+- write a machine-readable `Audio Plan` JSON block for every VIDEO section
+- keep `voiceIdentityPrompt` character-level and `performanceNote` shot-level
+- keep one active speaker per shot
+---
 ## AUDIO DIRECTION (Mandatory for Video)
@@ -195,22 +206,29 @@ Before outputting ANY shot:
 - [ ] No active violence/gore?
 ### 2. Prompt Structure
-- [ ] Follows 1-7 Flow Order?
-- [ ] Avoid Line present on EVERY prompt (Image + Video + Coverage)?
-- [ ] Audio direction present on EVERY video prompt? (Veo block or Kling inline)
-- [ ] 2-3 Coverage shots included in the same file?
+- [ ] Follows 1-7 Flow Order?
+- [ ] Avoid Line present on EVERY prompt (Image + Video + Coverage)?
+- [ ] Audio direction present on EVERY video prompt? (Veo block or Kling inline)
+- [ ] `shot-plan.json` contains top-level `voiceCast`?
+- [ ] Every speaking shot has an `Audio Plan` block with valid `activeSpeakerKey`?
+- [ ] Each active speaker resolves to a stable `speakerKey` in `voiceCast`?
+- [ ] One active speaker per dialogue shot?
+- [ ] 2-3 Coverage shots included in the same file?
 - [ ] Quality floor passes? (ILK>=80, SON>=80, VIDEO>=120, coverage>=70)
 - [ ] Specificity floor passes? (lens + lighting + FG/MG/BG action)
 - [ ] Spatial realism passes? (eyeline target + plane map + shared light + contact/depth cues)
 - [ ] Model Control block exists? (`Model`, `Preset`, `CFG`, `Transition Mode`)
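The quality-floor item in the checklist above can be approximated with a word count. This is a sketch only: the name `below_floor` is hypothetical, and counting whitespace-separated tokens is an assumption about how the floor is measured.

```python
# Minimum word counts taken from the checklist line above.
WORD_FLOORS = {"ILK": 80, "SON": 80, "VIDEO": 120, "coverage": 70}


def below_floor(section: str, prompt: str) -> bool:
    """Return True when a prompt is shorter than its section's word floor.

    Assumption: words are counted by splitting on whitespace.
    """
    return len(prompt.split()) < WORD_FLOORS[section]
```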
-### 3. Kling-Specific Gates (when model is kling-3.0)
-- [ ] Motion timeline uses `first → then → finally` structure?
-- [ ] "What stays the same" explicitly stated? (identity, background, costume)
-- [ ] Camera movement is simple/safe? (no complex hybrid movements)
-- [ ] Negative prompt includes Kling cleanup set? (warping, rubbery, melted)
-- [ ] Duration matches transformation budget? (5s=1 change, 10s=2-3, 15s=complex)
-- [ ] Start/end frames are in same visual universe? (angle, scale, light, lens)
+### 3. Kling-Specific Gates (when model is kling-3.0)
+- [ ] Motion timeline uses `first → then → finally` structure?
+- [ ] "What stays the same" explicitly stated? (identity, background, costume)
+- [ ] Camera movement is simple/safe? (no complex hybrid movements)
+- [ ] Negative prompt includes Kling cleanup set? (warping, rubbery, melted)
+- [ ] Duration matches transformation budget? (5s=1 change, 10s=2-3, 15s=complex)
+- [ ] Start/end frames are in same visual universe? (angle, scale, light, lens)
+- [ ] Used `single-transition` by default instead of forcing storyboard complexity?
+- [ ] If custom storyboard is used, is it capped at 3 meaningful stages?
+- [ ] Did each storyboard stage earn a distinct editorial job instead of representing micro-gestures?
 ---
@@ -241,15 +259,20 @@ For EACH shot, output exactly one file (`SHOTNN.md`) containing Main Shot + Cove
 [If CHAINED: include "Use SHOT[prev]_END as exact first frame" inside code block]
 [If FIRST/BREAK: image prompt in same code block]
-### SON FRAME (SHOTNN_END)
-```
-[Image Prompt (Flow Order + Avoid Line)]
-avoid: blurry, low-res, text, watermark, bad anatomy, distorted face ...
-```
-### VİDEO
-```
-[Video Prompt (Flow Order)]
+### SON FRAME (SHOTNN_END)
+```
+[Image Prompt (Flow Order + Avoid Line)]
+avoid: blurry, low-res, text, watermark, bad anatomy, distorted face ...
+```
+### AUDIO PLAN
+```json
+[Machine-readable audioPlan JSON block aligned with shot-plan.json -> voiceCast]
+```
+### VİDEO
+```
+[Video Prompt (Flow Order)]
 Audio direction:
 - Language: ...

package/content/skills/audio-design/SKILL.md CHANGED

@@ -3,12 +3,136 @@ name: audio-design
 description: Sound design rules for model-aware profiles (Veo 3.1 / Kling 3.0). Voice realism, environmental sounds, SFX, ambience, and audio direction block formatting. Includes Kling native audio intonation and anti-synthetic audio guidelines.
 ---
-# Audio Design System
-> **Philosophy:** Sound is half the experience. Audio must be as real as visuals.
-> **Core Principle:** Every sound recorded-quality. No synthetic giveaways.
----
+# Audio Design System
+> **Philosophy:** Sound is half the experience. Audio must be as real as visuals.
+> **Core Principle:** Every sound recorded-quality. No synthetic giveaways.
+---
+## Voice Design Contract (MANDATORY)
+When a scenario includes spoken dialogue, narrator VO, or any reusable character voice, use a three-layer contract:
+1. `voiceCast`
+   Stable project-level voice identity package.
+2. `audioPlan`
+   Shot-level performance and mix metadata.
+3. `Audio direction:`
+   Human-readable prompt block for backward compatibility.
+Use `.agent/VOICE-DESIGN.md` as the authoritative schema reference.
+### Project-Level `voiceCast`
+Every speaking character or narrator must have one `voiceCast` entry.
+Required fields:
+- `speaker`
+- `speakerKey`
+- `role`
+- `language`
+- `voiceIdentityPrompt`
+- `voicePerformanceBase`
+- `preferredProvider`
+- `preferredModel`
+- `saveToLibrary`
+Rules:
+- `speakerKey` must stay stable across the whole project.
+- `voiceIdentityPrompt` describes the permanent voice identity, never a one-shot emotional spike.
+- `voicePerformanceBase` describes the speaker's normal acting baseline.
+- If a dialogue shot has no matching `voiceCast` entry, the output is invalid.
+Example:
+```json
+{
+  "speaker": "Kemal",
+  "speakerKey": "kemal",
+  "role": "character",
+  "language": "turkish",
+  "voiceIdentityPrompt": "A realistic Turkish male voice in his late thirties to early forties. Low-mid register, restrained authority, subtle fatigue, natural diction, intimate close-mic realism, no theatricality, highly believable human texture.",
+  "voicePerformanceBase": "controlled, interior, understated",
+  "preferredProvider": "elevenlabs",
+  "preferredModel": "eleven_ttv_v3",
+  "saveToLibrary": true
+}
+```
+### Shot-Level `audioPlan`
+Every VIDEO block must include a machine-readable `Audio Plan` JSON section that maps to the project-level voice identity.
+If the shot contains dialogue or voiceover, these fields are mandatory:
+- `language`
+- `type`
+- `activeSpeaker`
+- `activeSpeakerKey`
+- `dialogueLines`
+- `dialogueTranscript`
+- `performanceNote`
+Recommended support fields:
+- `delivery.pace`
+- `delivery.intensity`
+- `delivery.emotion`
+- `delivery.projection`
+- `delivery.stability`
+- `sfx`
+- `ambience`
+- `music`
+- `mixTarget`
+- `hasSubtitles`
+### Single Active Speaker Rule
+Keep only one active speaker per shot.
+- Reaction shots should usually be silent or ambience-only.
+- Back-and-forth dialogue should be split across multiple shots.
+- OTS coverage may carry one voice, but do not ask the model to lipsync two people at once.
+### Voice Identity vs Performance
+Keep this distinction strict:
+- `voiceIdentityPrompt` = who the speaker is across the whole film
+- `performanceNote` = how that speaker performs in this shot
+Correct identity cues:
+- `low-mid register`
+- `natural Turkish diction`
+- `restrained authority`
+- `breathy but realistic`
+- `non-theatrical`
+- `highly believable real-world tone`
+Incorrect identity cues:
+- exact dialogue lines
+- `angrily says this line`
+- `whispering while running`
+- temporary scene emotion only
+### Provider Flow
+Voice-design-style providers should follow this order:
+1. design from `voiceCast[].voiceIdentityPrompt`
+2. review previews
+3. choose one
+4. create persistent voice
+5. bind voice to `speakerKey`
+6. reuse across all shots
+7. remix only as a refinement, not as a new character identity
+---
 ## 🎯 Audio Realism Baseline
@@ -197,19 +321,22 @@ User: "Kalkın aslanlar! Düşman zırhlıları top menziline giriyorlar!"
 ✅ RIGHT: Dialogue transcript: "Kalkın aslanlar! Düşman zırhlıları top menziline giriyorlar!"
 ```
-### When User Provides Dialogue
-- Include VERBATIM in audio transcript
-- Preserve original language (Turkish stays Turkish, English stays English)
-- Note emotional delivery required
+### When User Provides Dialogue
+- Include VERBATIM in audio transcript
+- Preserve original language (Turkish stays Turkish, English stays English)
+- Note emotional delivery required
+- Mirror the same transcript in `audioPlan.dialogueLines`
+- Bind the speaking line to `activeSpeakerKey`
 ### 🗣️ SPEAKER ISOLATION RULE (Prevent Mixed Dialogue)
 **Problem:** If two people are in the frame and both speak, AI often mixes lipsync or timing.
 **Solution:** ONE active speaker per shot.
-- **Shot A:** Character X speaks. Camera focuses on X (or over-the-shoulder of Y).
-- **Shot B:** Character Y replies. Camera focuses on Y.
+- **Shot A:** Character X speaks. Camera focuses on X (or over-the-shoulder of Y).
+- **Shot B:** Character Y replies. Camera focuses on Y.
+- Keep the same `speakerKey` and `voiceCast` identity in both shots. Only `performanceNote` changes.
 **EXCEPTION:** If both MUST be in frame (Two-Shot):
 1. Use "Reaction Shot" for the listener (the listener nods while the speaker's voice is heard off-screen).

package/content/skills/prompt-structure/SKILL.md CHANGED

@@ -81,6 +81,18 @@ Before using any video template in this file:
 - If active model is **Kling 3.0**, do **not** default to the Veo-style `Audio direction:` bullet schema below.
 - If project is a hybrid/smart-hybrid package, the local `.agent/skills/prompt-structure/SKILL.md` override takes precedence over this base file.
+## Kling Storyboard / Multi-Shot Decision Rule
+The official Kling app surface exposes start/end control, optional end-frame anchoring, and custom storyboard-style shot prompting.
+Use that capability with restraint:
+- default to a single Start+End transition prompt when the beat is one continuous motion arc, one reveal, or one emotional turn
+- use custom storyboard only when the clip truly contains **2-3 distinct editorial phases** with different visual jobs
+- in this toolkit, cap app.kling custom storyboard mode at **3 custom shots** per video
+- never split one micro gesture, blink, glance, prop touch, or tiny head turn into separate storyboard shots
+- if a beat needs 4+ distinct phases, split into chained videos instead of bloating one generation
+- every storyboard shot must earn a clear job: establish / action / reveal / reaction / resolve
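The decision rule above reduces to a threshold on the number of distinct editorial phases. The sketch below assumes that count has already been judged by the author; `choose_kling_mode` and the `split-into-chained-videos` label are hypothetical, while `single-transition` and `custom-storyboard` are the mode names used in the generate workflow.

```python
def choose_kling_mode(distinct_phases: int) -> str:
    """Pick a Kling execution mode from the number of distinct editorial phases.

    Hypothetical helper: 1 phase keeps a single Start+End transition,
    2-3 phases allow custom storyboard, 4+ phases should be split into
    chained videos rather than packed into one generation.
    """
    if distinct_phases < 1:
        raise ValueError("a shot needs at least one editorial phase")
    if distinct_phases == 1:
        return "single-transition"
    if distinct_phases <= 3:
        return "custom-storyboard"
    return "split-into-chained-videos"
```

Micro-gestures such as a blink or a glance should not be counted as phases before calling this, per the rule above.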
 ## Video Prompt Structure (Veo-Oriented Default Template)
 > Use this section when the active video model/routed section is **Veo 3.1**.
@@ -198,9 +210,16 @@ Avoid: distorted faces, morphing, bad anatomy, extra limbs/fingers, blurry, flic
 | **Establishing** | 8s | 10–15s | Set scene, allow environment |
 | **Complex transformation** | N/A | 15s (max) | Multi-step with single-scene continuity |
-> **Kling Note:** Duration maps directly to **transformation budget**. More change between start and end frames = longer duration needed. See `.agent/model-profile.md` for details.
----
+> **Kling Note:** Duration maps directly to **transformation budget**. More change between start and end frames = longer duration needed. See `.agent/model-profile.md` for details.
+### Kling Anti-Fragmentation Rule
+- do not turn every sentence or gesture into a new Kling storyboard phase
+- if a single camera move can carry the beat cleanly, keep one shot and strengthen the motion path
+- if the beat reads as setup -> action -> settle, `first -> then -> finally` is usually enough
+- reserve multi-shot/storyboard mode for meaningful internal progression, not decorative complexity
+---
 ## Prompt Re-Take Strategy

package/content/workflows/chain.md CHANGED

@@ -34,6 +34,7 @@ All generated files must keep `SHOTNN.md` naming.
 1. Read `$OUTPUT_DIR/shots/SHOT[last].md`.
 2. Extract `SON FRAME` description.
 3. Capture character/location continuity details.
+4. Reuse `$OUTPUT_DIR/shot-plan.json -> voiceCast` and keep every existing `speakerKey` stable.
 ### 3. Continue Generation
@@ -49,6 +50,7 @@ If model is kling-3.0: keep Start+End transition mode and first/then/finally mot
 - Write each shot to `$OUTPUT_DIR/shots/SHOT[NN].md`
 - Keep one-file-per-shot contract
 - Ensure `ILK/İLK FRAME` code block exists even when chained
+- Keep `Audio Plan` blocks aligned to the existing `voiceCast`
 - Update `$OUTPUT_DIR/_index.md`
 ### 5. Refresh Reports

package/content/workflows/finish.md CHANGED

@@ -32,6 +32,8 @@ All shot files follow `SHOTNN.md` naming.
 5. Verify chain continuity.
 6. Verify Avoid line on every prompt.
 7. Verify `Model Control` block exists in every shot file.
+8. Verify `shot-plan.json` has `voiceCast` coverage for every speaker or narrator.
+9. Verify every speaking VIDEO section has an `Audio Plan` block with valid `activeSpeakerKey`.
 10. For `kling-3.0`, verify:
    - `Transition Mode: Start+End`
    - CFG value is documented
@@ -83,6 +85,7 @@ Do not declare completion unless:
 - delivery report is pass
 - final summary says pass
 - model-specific checks pass (Kling or Veo profile)
+- voice design contract passes (`voiceCast` + `Audio Plan` + single active speaker)
 ---

package/content/workflows/generate.md CHANGED

@@ -30,7 +30,7 @@ Generate production-ready shot prompts from scenario and **SAVE TO FILES**.
 ```text
 $OUTPUT_DIR/
 ├── project-info.md          # Scenario, characters, settings
-├── shot-plan.json           # Single-agent plan + policy contract
+├── shot-plan.json           # Single-agent plan + policy + voiceCast contract
 ├── shots/
 │   ├── SHOT01.md            # First shot (main + coverage)
 │   ├── SHOT02.md            # Second shot
@@ -62,7 +62,10 @@ $OUTPUT_DIR/
    - coverage strategy
    - model contract (`model`, `preset`, `transition_mode`)
    - `dialogue_name_policy` (`preserve-original-dialogue` or `anonymize-dialogue`)
-10. Ensure directories exist: `$OUTPUT_DIR/`, `$OUTPUT_DIR/shots/`, `$OUTPUT_DIR/reports/`.
+   - top-level `voiceCast`
+   - voice defaults (`single_active_speaker`, `music_default`, `subtitles_default`)
+10. Ensure every speaker or narrator has a stable `speakerKey` before shot writing.
+11. Ensure directories exist: `$OUTPUT_DIR/`, `$OUTPUT_DIR/shots/`, `$OUTPUT_DIR/reports/`.
 ### 2. Batch Strategy
@@ -83,9 +86,16 @@ For EACH shot:
 1. Analyze scene type (Dialogue / Action / Emotional / Establishing).
 2. Build dramaturgy behavior (objective, obstacle, stakes, subtext, beat turns).
 3. If model is `kling-3.0`, perform Kling Frame Prep (see below).
-4. Generate main shot prompts (`ILK/İLK FRAME`, `SON FRAME`, `VIDEO`).
+4. If model is `kling-3.0`, choose Kling execution mode:
+   - `single-transition` for one strong motion arc or reveal
+   - `custom-storyboard` only for 2-3 distinct editorial phases
+4. Reuse or create the correct `voiceCast` entry for any speaking character or narrator.
+5. Generate main shot prompts (`ILK/İLK FRAME`, `SON FRAME`, `VIDEO`).
 5. `ILK/İLK FRAME` section MUST always include a fenced code block.
 6. If chained, first-frame code block must explicitly state: `Use SHOT[prev]_END as exact first frame`.
+7. Write a machine-readable `Audio Plan` JSON block for every VIDEO section.
+8. If dialogue or voiceover exists, require `activeSpeakerKey`, `dialogueLines`, and `performanceNote`.
+9. Keep one active speaker per shot. Split reply dialogue across multiple shots.
 7. Enforce hard quality floor:
    - `ILK FRAME`: minimum 80 words
    - `SON FRAME`: minimum 80 words
@@ -98,11 +108,11 @@ For EACH shot:
    - explicit eyeline target / body orientation when a subject looks at someone or something
    - explicit shared light source / bounce logic when multiple subjects share frame
    - explicit depth/scale integration when more than one plane is visible
-9. Generate coverage prompts (2-3 per main shot, min 70 words each).
-10. Add Turkish summary for shot and each coverage section.
-11. Apply model-specific generation gates (see below).
-12. Write shot file: `$OUTPUT_DIR/shots/SHOT[NN].md`.
-13. Update `$OUTPUT_DIR/_index.md`.
+10. Generate coverage prompts (2-3 per main shot, min 70 words each).
+11. Add Turkish summary for shot and each coverage section.
+12. Apply model-specific generation gates (see below).
+13. Write shot file: `$OUTPUT_DIR/shots/SHOT[NN].md`.
+14. Update `$OUTPUT_DIR/_index.md`.
 #### Kling Frame Prep (when model is kling-3.0)
@@ -113,9 +123,11 @@ Before writing prompts, design the Start→End transition:
     - 5s = 1 major change
     - 10s = 2-3 staged changes
     - 15s = complex multi-step
-3.  **Motion timeline:** Write 2-4 steps: `first → then → finally`.
-4.  **Face/hands stability:** Match orientations between start and end — avoid >45° face rotation.
-5.  **Camera safety:** Use only safe movements (slow push-in, pan, tilt, micro-sway, tripod-locked).
+3.  **Execution mode:** Default to `single-transition`; use `custom-storyboard` only when the shot truly has 2-3 meaningful internal phases.
+4.  **Motion timeline:** Write 2-4 steps: `first → then → finally`.
+5.  **Face/hands stability:** Match orientations between start and end — avoid >45° face rotation.
+6.  **Camera safety:** Use only safe movements (slow push-in, pan, tilt, micro-sway, tripod-locked).
+7.  **Anti-fragmentation:** Do not turn one glance, gesture, or prop touch into separate micro-shots. If custom storyboard is used, cap it at 3 stages and make each stage editorially distinct.
 #### Veo Gate (when model is veo31)
@@ -134,6 +146,8 @@ Before writing prompts, design the Start→End transition:
 - [ ] Start/end frames verified for same visual universe
 - [ ] Dialogue uses `"quotation marks"` with tone markers (Kling inline format)
 - [ ] Eyelines, plane map, and shared-light logic stay consistent across start/end frames
+- [ ] Defaulted to `single-transition` unless 2-3 distinct internal phases genuinely require custom storyboard
+- [ ] If custom storyboard is used: maximum 3 stages, no decorative micro-beats, and each stage changes function/framing/action
 ### 4. Validation Pass (Mandatory Before Completion)
@@ -190,6 +204,36 @@ Avoid: blurry, low-res, text, watermark, bad anatomy, distorted face ...
 Avoid: blurry, low-res, text, watermark, bad anatomy, distorted face ...
 ```
+### AUDIO PLAN
+```json
+{
+  "language": "...",
+  "type": "...",
+  "activeSpeaker": "...",
+  "activeSpeakerKey": "...",
+  "dialogueLines": [],
+  "dialogueTranscript": "...",
+  "performanceNote": "...",
+  "delivery": {
+    "pace": "...",
+    "intensity": "...",
+    "emotion": "...",
+    "projection": "...",
+    "stability": "..."
+  },
+  "sfx": [],
+  "ambience": [],
+  "music": null,
+  "mixTarget": {
+    "dialogue": 0,
+    "sfx": 0,
+    "ambience": 0
+  },
+  "hasSubtitles": false
+}
+```
 ### VIDEO
 #### Veo Format:
@@ -240,6 +284,10 @@ Avoid: ...
 Avoid: ...
 ```
+```json
+[Coverage audioPlan JSON block - voice binding only when coverage actually contains speech]
+```
 ### SHOT[NN]B - [Type]
 [Same format]

package/content/workflows/recover.md CHANGED

@@ -14,15 +14,17 @@ $ARGUMENTS
 - delivery report is fail
 - missing required shot sections
 - continuity mismatch between neighboring shots
+- missing `voiceCast` entry or broken `activeSpeakerKey` binding
 ## Recovery Steps
 1. Identify failed files and sections.
 2. Fix only affected shots first.
 3. Regenerate neighboring shots only if continuity requires it.
-4. Re-run `/safety-check`.
-5. Regenerate `$OUTPUT_DIR/reports/DELIVERY-REPORT.md`.
-6. Update `$OUTPUT_DIR/_index.md` with recovered status.
+4. Repair `voiceCast` or `Audio Plan` bindings before rerunning reports.
+5. Re-run `/safety-check`.
+6. Regenerate `$OUTPUT_DIR/reports/DELIVERY-REPORT.md`.
+7. Update `$OUTPUT_DIR/_index.md` with recovered status.
 ## Exit Criteria

package/content/workflows/safety-check.md CHANGED

@@ -38,6 +38,10 @@ Validate all prompts before delivery to ensure platform compliance.
 - coverage has 2-3 sub-shots in same file
 - all prompts include Avoid line
 - all video prompts include full audio direction block
+- `shot-plan.json` contains `voiceCast` when any shot has dialogue or narration
+- every speaking VIDEO section includes an `Audio Plan` block
+- every `activeSpeakerKey` resolves to `shot-plan.json -> voiceCast`
+- no dialogue shot contains more than one active speaker
 3. Continuity checks
 - `SHOT[N]_END` aligns with `SHOT[N+1]_START`

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@milenyumai/film-kit",
-  "version": "1.3.0",
+  "version": "1.4.1",
   "description": "Hollywood-standard cinematic prompt engineering toolkit with model profiles (Veo 3.1 / Kling 3.0). Auto-configures AI agents (Cursor, Claude Code, VS Code Copilot, Antigravity) with production-grade shot generation system.",
   "type": "module",
   "main": "./build/index.js",