@milenyumai/film-kit 1.3.0 → 1.4.1

package/README.md CHANGED
@@ -18,6 +18,7 @@ Film-Kit ships as a single repository with four npm packages:
18
18
  - native `.claude/agents/*`
19
19
  - cleanup of stale mode-specific Claude artifacts
20
20
  - Shared `spatial-blocking` skill for gaze, plane depth, light cohesion, compositing realism, and anti-miniature control.
21
+ - Voice-design aware audio contract with project-level `voiceCast`, shot-level `Audio Plan`, and backward-compatible `Audio direction` blocks.
21
22
  - Aligned quality gates across Claude Code, Cursor, Copilot, and Antigravity.
22
23
  - Stronger Kling 3.0 and Kling multi-shot guidance, including practical route rules and hard caps.
23
24
 
@@ -73,7 +74,7 @@ Best for:
73
74
  Key contracts:
74
75
 
75
76
  - `team-plan.json`
76
- - specialist validators
77
+ - 8 specialist validators in 3-phase execution (continuity -> parallel quality checks -> delivery)
77
78
  - parallel batch generation
78
79
  - spatial contract for multi-subject staging
79
80
 
@@ -44,6 +44,7 @@ All rules, skills, and workflows are located under \`.agent/\`.
44
44
  ### Entry Points
45
45
  - **Master Rules:** \`.agent/MASTER.md\` — Complete production ruleset (690 lines)
46
46
  - **Architecture:** \`.agent/ARCHITECTURE.md\` — System map & quick reference
47
+ - **Voice Design:** \`.agent/VOICE-DESIGN.md\` — Project-level \`voiceCast\` + shot-level \`audioPlan\`
47
48
  - **Model Profile:** \`.agent/model-profile.md\` — Active model rules and constraints
48
49
  - **Agent:** \`.agent/agents/prompt-engineer.md\` — Senior prompt engineer agent
49
50
 
@@ -71,7 +72,7 @@ All rules, skills, and workflows are located under \`.agent/\`.
71
72
  ## Goal
72
73
  When the user asks \`/generate\`, convert the scenario into:
73
74
  - \`${config.outputDir}/project-info.md\` — Characters, settings, arc mapping
74
- - \`${config.outputDir}/shot-plan.json\` — Single-agent plan + policy contract
75
+ - \`${config.outputDir}/shot-plan.json\` — Single-agent plan + policy + \`voiceCast\` contract
75
76
  - \`${config.outputDir}/shots/SHOT01.md, SHOT02.md, ...\` — Production shot files (with coverage included)
76
77
  - \`${config.outputDir}/reports/SAFETY-REPORT.md\` — Safety gate result
77
78
  - \`${config.outputDir}/reports/DELIVERY-REPORT.md\` — Delivery gate result
@@ -88,6 +89,7 @@ Each \`SHOTNN.md\` is a **single file** containing ALL shot details:
88
89
  - 🔗 Chain status (FIRST / CHAINED / CHAIN BREAK)
89
90
  - İLK FRAME (start frame image prompt — min 60 words)
90
91
  - SON FRAME (end frame image prompt — min 60 words)
92
+ - AUDIO PLAN (machine-readable shot audio contract)
91
93
  - VİDEO (video prompt with audio direction — min 80 words)
92
94
  - COVERAGE SHOTS (2-3 coverage shots within same file — each min 60 words)
93
95
  - 🇹🇷 Turkish summary for each section
@@ -99,6 +101,7 @@ Each \`SHOTNN.md\` is a **single file** containing ALL shot details:
99
101
  - **AUTO-SAFETY:** Proactively reframe content that may trigger safety filters
100
102
  - **Frame Chaining:** Last frame of SHOT[N] = First frame of SHOT[N+1]
101
103
  - **Coverage Mandatory:** Every main shot includes 2-3 coverage sub-shots in same file
104
+ - **Voice Design:** \`shot-plan.json\` keeps top-level \`voiceCast\`; every speaking VIDEO section keeps \`Audio Plan\`
102
105
  - **Music: NONE** by default (user must explicitly request)
103
106
  - **SLOW BURN:** 8-second duration, split actions into multiple shots
104
107
  `;
@@ -111,6 +114,7 @@ function buildCursorLegacyRules(config) {
111
114
  ## CRITICAL: SKILL LOADING PROTOCOL
112
115
  Before generating ANY prompts, read the appropriate skill files from .agent/skills/:
113
116
  Read .agent/model-profile.md first to apply model-specific rules.
117
+ Read .agent/VOICE-DESIGN.md when dialogue, narrator VO, or reusable speaker identity exists.
114
118
 
115
119
  | Skill | Path | When |
116
120
  |-------|------|------|
@@ -135,6 +139,7 @@ Read .agent/model-profile.md first to apply model-specific rules.
135
139
  9. Frame chaining: Last frame of SHOT[N] = First frame of SHOT[N+1].
136
140
  10. ILK/İLK FRAME section must contain a code block even for chained shots.
137
141
  11. ONE FILE PER SHOT: Each SHOTNN.md contains main shot + all coverage shots.
142
+ 12. Keep top-level \`voiceCast\` in ${config.outputDir}/shot-plan.json and \`Audio Plan\` in every speaking VIDEO section.
138
143
 
139
144
  ## WORKFLOWS
140
145
  - /generate → Read .agent/workflows/generate.md
@@ -164,6 +169,7 @@ alwaysApply: true
164
169
  ## Entry Point
165
170
  Read \`.agent/MASTER.md\` for complete production ruleset.
166
171
  Read \`.agent/ARCHITECTURE.md\` for system overview.
172
+ Read \`.agent/VOICE-DESIGN.md\` for voice identity and shot audio contracts.
167
173
  Read \`.agent/model-profile.md\` for active model constraints.
168
174
 
169
175
  ## SKILL LOADING (MANDATORY)
@@ -187,6 +193,8 @@ All skills at: \`.agent/skills/[name]/SKILL.md\`
187
193
  ## CRITICAL RULES
188
194
  - AUTO-ANONYMOUS: Replace ALL real names with physical descriptions
189
195
  - Dialogue naming follows \`${config.outputDir}/shot-plan.json\` policy
196
+ - \`shot-plan.json\` stores top-level \`voiceCast\`
197
+ - Every speaking VIDEO section includes \`Audio Plan\`
190
198
  - AUTO-SAFETY: Proactively reframe sensitive content
191
199
  - Frame chaining: Last frame SHOT[N] = First frame SHOT[N+1]
192
200
  - Coverage: 2-3 sub-shots per main shot (min 60 words each, in same file)
@@ -216,8 +224,9 @@ Single-agent shot package generation. Claude Code may use the native \`prompt-en
216
224
  ## Mandatory Read Order
217
225
  1. \`.agent/model-profile.md\`
218
226
  2. \`.agent/MASTER.md\`
219
- 3. \`.agent/ARCHITECTURE.md\`
220
- 4. \`.agent/agents/prompt-engineer.md\`
227
+ 3. \`.agent/VOICE-DESIGN.md\`
228
+ 4. \`.agent/ARCHITECTURE.md\`
229
+ 5. \`.agent/agents/prompt-engineer.md\`
221
230
  5. \`.claude/CLAUDE.md\`
222
231
  6. relevant files under \`.claude/rules/\`
223
232
 
@@ -242,6 +251,7 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
242
251
  - Model: \`${config.model}\` (${getModelDisplayName(config.model)})
243
252
  - Kling preset: \`${config.klingPreset}\`
244
253
  - Create \`${config.outputDir}/project-info.md\`, \`${config.outputDir}/shot-plan.json\`, and \`${config.outputDir}/_index.md\`
254
+ - Keep top-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\`
245
255
  - Write \`${config.outputDir}/shots/SHOTNN.md\` per shot; coverage stays in the same file
246
256
  - Refresh \`${config.outputDir}/reports/SAFETY-REPORT.md\` and \`${config.outputDir}/reports/DELIVERY-REPORT.md\` before \`/finish\`
247
257
 
@@ -254,11 +264,12 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
254
264
  6. **Music:** NONE by default.
255
265
  7. **Avoid line:** MANDATORY on every prompt (image, video, coverage).
256
266
  8. **Coverage:** 2-3 sub-shots within same SHOTNN.md file, min 70 words each.
257
- 9. **ILK/İLK FRAME:** Always include a fenced code block, even when chained.
258
- 10. **Quality Floor:** ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words.
259
- 11. **Specificity Floor:** lens/framing, lighting, and foreground/midground/background action are mandatory.
260
- 12. **Spatial Realism Floor:** eyeline target, plane map, shared light source, and contact/depth cues are mandatory when relational staging matters.
261
- 13. **ONE FILE PER SHOT:** No separate coverage files.
267
+ 9. **Voice Design:** keep project-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and per-shot \`Audio Plan\` in each VIDEO section.
268
+ 10. **ILK/İLK FRAME:** Always include a fenced code block, even when chained.
269
+ 11. **Quality Floor:** ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words.
270
+ 12. **Specificity Floor:** lens/framing, lighting, and foreground/midground/background action are mandatory.
271
+ 13. **Spatial Realism Floor:** eyeline target, plane map, shared light source, and contact/depth cues are mandatory when relational staging matters.
272
+ 14. **ONE FILE PER SHOT:** No separate coverage files.
262
273
 
263
274
  ## Workflows
264
275
  | Command | Workflow |
@@ -283,8 +294,9 @@ This workspace keeps high-level policy in \`CLAUDE.md\` and operational detail i
283
294
  ## Before Any Generation or Repair
284
295
  1. Read \`.agent/model-profile.md\`
285
296
  2. Read \`.agent/MASTER.md\`
286
- 3. Read \`.agent/agents/prompt-engineer.md\`
287
- 4. Read \`.agent/workflows/generate.md\` or the requested workflow
297
+ 3. Read \`.agent/VOICE-DESIGN.md\`
298
+ 4. Read \`.agent/agents/prompt-engineer.md\`
299
+ 5. Read \`.agent/workflows/generate.md\` or the requested workflow
288
300
  5. Apply AUTO-ANONYMOUS and AUTO-SAFETY before drafting
289
301
  6. Prefer the active/selected markdown file as scenario source; fallback is \`${config.scenarioHint}\`
290
302
  7. Do not mark work complete while any required report is missing or fail
@@ -293,6 +305,8 @@ This workspace keeps high-level policy in \`CLAUDE.md\` and operational detail i
293
305
  - Write only inside \`${config.outputDir}\`
294
306
  - Keep one file per shot: \`${config.outputDir}/shots/SHOTNN.md\`
295
307
  - Maintain \`${config.outputDir}/shot-plan.json\` dialogue naming policy
308
+ - Maintain \`${config.outputDir}/shot-plan.json\` top-level \`voiceCast\`
309
+ - Keep \`Audio Plan\` blocks aligned to \`voiceCast\`
296
310
  - Keep \`ILK/İLK FRAME\` in a fenced code block even when chained
297
311
  - Quality floor and specificity floor are hard gates, not suggestions
298
312
  - Apply \`.agent/skills/spatial-blocking/SKILL.md\` whenever eyeline, compositing, or depth realism is critical
@@ -313,13 +327,16 @@ Use the Film-Kit core runtime.
313
327
  ## Read First
314
328
  1. \`.agent/model-profile.md\`
315
329
  2. \`.agent/MASTER.md\`
316
- 3. \`.agent/agents/prompt-engineer.md\`
317
- 4. \`.agent/workflows/generate.md\`
318
- 5. \`.claude/rules/output-contract.md\`
330
+ 3. \`.agent/VOICE-DESIGN.md\`
331
+ 4. \`.agent/agents/prompt-engineer.md\`
332
+ 5. \`.agent/workflows/generate.md\`
333
+ 6. \`.claude/rules/output-contract.md\`
319
334
 
320
335
  ## Responsibilities
321
336
  - draft and repair shot files under \`${config.outputDir}/shots/\`
322
337
  - apply \`${config.outputDir}/shot-plan.json\` dialogue naming policy
338
+ - maintain top-level \`voiceCast\` inside \`${config.outputDir}/shot-plan.json\`
339
+ - keep \`Audio Plan\` blocks valid against \`voiceCast\`
323
340
  - enforce AUTO-ANONYMOUS, AUTO-SAFETY, chaining, and coverage contracts
324
341
  - enforce quality floor: ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70
325
342
  - enforce specificity floor: lens/framing, lighting, and foreground/midground/background action
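
Reviewer note: the quality floor quoted in this hunk (ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words) can be checked mechanically. The sketch below is only an illustration: `QUALITY_FLOOR`, `countWords`, and `checkQualityFloor` are hypothetical names that do not appear in the package, and whitespace splitting is an assumed definition of "word".

```javascript
// Hypothetical helper: the minimums come from the rules in this hunk,
// the names and the word-splitting rule are assumptions.
const QUALITY_FLOOR = { ilk: 80, son: 80, video: 120, coverage: 70 };

// Count whitespace-separated tokens; an empty or blank string has zero words.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// prompts: array of { kind, text }, where kind is a key of the floor table.
// Returns a list of human-readable failures; an empty list means the floor holds.
function checkQualityFloor(prompts, floor = QUALITY_FLOOR) {
  const failures = [];
  for (const { kind, text } of prompts) {
    const minimum = floor[kind];
    if (minimum === undefined) {
      failures.push(`unknown prompt kind: ${kind}`);
      continue;
    }
    const words = countWords(text);
    if (words < minimum) {
      failures.push(`${kind}: ${words} words, floor is ${minimum}`);
    }
  }
  return failures;
}
```

Passing a different `floor` object lets the same check run against the looser 60/80-word minimums that `MASTER.md` still lists in its checklist.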
@@ -349,6 +366,7 @@ If using the native Claude subagent, read \`.claude/agents/prompt-engineer.md\`
349
366
  - Determine shot count based on action beats
350
367
  - Create \`${config.outputDir}/project-info.md\`
351
368
  - Create \`${config.outputDir}/shot-plan.json\`
369
+ - Add top-level \`voiceCast\` before writing speaking shots
352
370
 
353
371
  2. **Batch Strategy:**
354
372
  - 1-10 shots → Generate all at once
@@ -358,6 +376,7 @@ If using the native Claude subagent, read \`.claude/agents/prompt-engineer.md\`
358
376
  3. **Per Shot (SINGLE FILE: SHOTNN.md):**
359
377
  - Analyze scene type (Dialogue / Action / Emotional / Establishing)
360
378
  - Generate main shot (İLK FRAME + SON FRAME + VİDEO)
379
+ - Add machine-readable \`Audio Plan\` before every VIDEO section
361
380
  - Keep İLK FRAME as fenced code block even when chained
362
381
  - Enforce hard quality floor: ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70
363
382
  - Enforce specificity floor: lens/framing + lighting + foreground/midground/background action
@@ -386,8 +405,9 @@ function buildClaudeRuleOutputContract(config) {
386
405
 
387
406
  ## Required Files
388
407
  - \`${config.outputDir}/project-info.md\` — Characters, settings, emotional arc mapping, tension levels
389
- - \`${config.outputDir}/shot-plan.json\` — Name policy, shot plan, and validation contract
408
+ - \`${config.outputDir}/shot-plan.json\` — Name policy, shot plan, validation contract, and top-level \`voiceCast\`
390
409
  - \`.agent/model-profile.md\` — Active model constraints and presets
410
+ - \`.agent/VOICE-DESIGN.md\` — Voice identity and shot audio contract
391
411
  - \`${config.outputDir}/_index.md\` — Shot tracking with chain & status
392
412
  - \`${config.outputDir}/shots/SHOT01.md ... SHOTNN.md\` — Individual shot files (one file per shot)
393
413
  - \`${config.outputDir}/reports/SAFETY-REPORT.md\` — Safety gate report
@@ -403,7 +423,7 @@ The first sentence carries the most weight. Every prompt MUST follow this order:
403
423
  3. [Action] Micro-behavior, acting cue, physical gesture
404
424
  4. [Camera + Lens] "85mm f/2.0, shallow DOF, static with handheld micro-movement"
405
425
  5. [Light + Atmosphere] "Warm oil lamp key light from screen-left, deep shadows"
406
- 6. [Audio direction block] (VIDEO prompts only)
426
+ 6. [Audio Plan + Audio direction block] (VIDEO prompts only)
407
427
  7. [Avoid line] (EVERY prompt — MANDATORY)
408
428
  \`\`\`
409
429
 
@@ -413,6 +433,7 @@ The first sentence carries the most weight. Every prompt MUST follow this order:
413
433
  - Kling preset: \`${config.klingPreset}\`
414
434
  - Kling transition mode: Start+End (when model is kling-3.0)
415
435
  - Motion timeline: first → then → finally (when model is kling-3.0)
436
+ - Kling multi-shot mode: single-transition by default; custom storyboard only for 2-3 meaningful phases (when model is kling-3.0)
416
437
 
417
438
  ## Shot File Format (SHOTNN.md) — SINGLE FILE, ALL INCLUSIVE
418
439
 
@@ -447,6 +468,12 @@ Avoid: blurry, low-res, noise, distorted faces, bad anatomy, extra limbs/fingers
447
468
  plastic skin, waxy skin, on-screen text, watermark, logo, cartoon style, CGI look.
448
469
  \\\`\\\`\\\`
449
470
 
471
+ ### AUDIO PLAN
472
+
473
+ \\\`\\\`\\\`json
474
+ [Machine-readable audioPlan JSON block aligned with shot-plan.json -> voiceCast]
475
+ \\\`\\\`\\\`
476
+
450
477
  ### VİDEO
451
478
 
452
479
  \\\`\\\`\\\`
@@ -494,6 +521,10 @@ Audio direction:
494
521
  Avoid: distorted faces, morphing, blurry, flickering, unnatural motion, on-screen text.
495
522
  \\\`\\\`\\\`
496
523
 
524
+ \\\`\\\`\\\`json
525
+ [Coverage audioPlan JSON block - voice binding only when coverage contains speech]
526
+ \\\`\\\`\\\`
527
+
497
528
  ### SHOT[NN]B — [Type] | [Duration]s | [Icon] [Label]
498
529
  [Same format as A]
499
530
 
@@ -545,6 +576,8 @@ Total: 1 main + [N] coverage = [N+1] production shots
545
576
  ### Name Rule
546
577
  - Visual prompts and non-dialogue fields: no real names
547
578
  - Dialogue transcript naming follows \`shot-plan.json.dialogue_name_policy\`
579
+ - Every speaking shot must resolve \`activeSpeakerKey\` to \`shot-plan.json.voiceCast\`
580
+ - Keep one active speaker per shot
548
581
 
549
582
  ### 30° Rule
550
583
  Coverage camera angles MUST differ from main shot by at least 30°.
@@ -590,6 +623,7 @@ function buildCopilotInstructions(config) {
590
623
  ### System Entry Point
591
624
  Read \`.agent/MASTER.md\` for the complete production ruleset.
592
625
  Read \`.agent/ARCHITECTURE.md\` for system overview.
626
+ Read \`.agent/VOICE-DESIGN.md\` for voice identity and shot audio contracts.
593
627
  Read \`.agent/model-profile.md\` for active model rules.
594
628
 
595
629
  ### Skill Loading Protocol (MANDATORY)
@@ -610,7 +644,8 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
610
644
  4. Create index: \`${config.outputDir}/_index.md\`
611
645
  5. Create project info: \`${config.outputDir}/project-info.md\`
612
646
  6. Create plan: \`${config.outputDir}/shot-plan.json\`
613
- 7. Write reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
647
+ 7. Keep top-level \`voiceCast\` in the plan and \`Audio Plan\` in speaking VIDEO sections
648
+ 8. Write reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
614
649
 
615
650
  ### Critical Rules
616
651
  - **AUTO-ANONYMOUS:** Replace ALL real names with physical descriptions
@@ -621,6 +656,7 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
621
656
  - **Spatial Realism:** eyeline targets, shared light, depth scale, and anti-cutout staging must agree when subjects share frame
622
657
  - **Avoid Line:** MANDATORY on every prompt
623
658
  - **Music:** NONE by default
659
+ - **Voice Design:** keep \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and \`Audio Plan\` in speaking VIDEO sections
624
660
  - **Duration:** 8s default, slow burn pacing
625
661
  - **Language:** Prompts in English, dialogue preserved
626
662
  - **ILK/İLK FRAME:** keep fenced code block even when chained
@@ -650,9 +686,10 @@ When request is /generate, follow the Film-Kit Hollywood production system:
650
686
  3. Load required skills from \`.agent/skills/\`
651
687
  4. Transform scenario into production shot package at \`${config.outputDir}\`
652
688
  5. Generate: project-info.md, shot-plan.json, _index.md, shots/SHOT01.md..SHOTNN.md
653
- 6. Each SHOTNN.md: İLK FRAME + SON FRAME + VİDEO + 2-3 Coverage (ALL IN ONE FILE)
654
- 7. Enforce: auto-anonymous, dialogue name policy, auto-safety, frame chaining, avoid lines
655
- 8. Write reports to \`${config.outputDir}/reports/\` before /finish
689
+ 6. Keep top-level \`voiceCast\` in shot-plan.json
690
+ 7. Each SHOTNN.md: İLK FRAME + SON FRAME + AUDIO PLAN + VİDEO + 2-3 Coverage (ALL IN ONE FILE)
691
+ 8. Enforce: auto-anonymous, dialogue name policy, auto-safety, frame chaining, avoid lines
692
+ 9. Write reports to \`${config.outputDir}/reports/\` before /finish
656
693
  `;
657
694
  }
658
695
  /* ---------- ANTIGRAVITY ---------- */
@@ -667,6 +704,7 @@ description: Hollywood-standard cinematic prompt engineering with model-aware pr
667
704
  ## System Architecture
668
705
  This skill is part of the Film-Kit prompt engineering system.
669
706
  Read \`.agent/MASTER.md\` for the complete production ruleset (690+ rules).
707
+ Read \`.agent/VOICE-DESIGN.md\` for project-level \`voiceCast\` and shot-level \`audioPlan\`.
670
708
  Read \`.agent/model-profile.md\` first for active model constraints.
671
709
 
672
710
  ## Skill Loading Protocol
@@ -700,6 +738,7 @@ Before generating ANY prompts, read these skills:
700
738
  - Shot index: \`${config.outputDir}/_index.md\`
701
739
  - Project info: \`${config.outputDir}/project-info.md\`
702
740
  - Plan: \`${config.outputDir}/shot-plan.json\`
741
+ - Voice contract: top-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\`
703
742
  - Reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
704
743
 
705
744
  ## Critical Rules
@@ -709,12 +748,13 @@ Before generating ANY prompts, read these skills:
709
748
  4. **Frame Chaining:** Last frame of SHOT[N] becomes first frame of SHOT[N+1]
710
749
  5. **Coverage Mandatory:** 2-3 sub-shots per main shot (in same file, min 60 words each)
711
750
  6. **Avoid Line:** MANDATORY on every prompt (image + video + coverage)
712
- 7. **Music: NONE** by default
713
- 8. **Ultra Realism** default visual mode
714
- 9. **8s duration** default, slow burn pacing
715
- 10. **ILK/İLK FRAME:** always keep fenced code block
716
- 11. **ONE FILE PER SHOT:** SHOTNN.md contains main shot + all coverage
717
- 12. **Relational Realism:** preserve eyeline targets, shared light, depth scale, and anti-cutout staging when multiple subjects share frame
751
+ 7. **Voice Design:** keep \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and \`Audio Plan\` in every speaking VIDEO section
752
+ 8. **Music: NONE** by default
753
+ 9. **Ultra Realism** default visual mode
754
+ 10. **8s duration** default, slow burn pacing
755
+ 11. **ILK/İLK FRAME:** always keep fenced code block
756
+ 12. **ONE FILE PER SHOT:** SHOTNN.md contains main shot + all coverage
757
+ 13. **Relational Realism:** preserve eyeline targets, shared light, depth scale, and anti-cutout staging when multiple subjects share frame
718
758
 
719
759
  ## Quality Floor (Hard Gate)
720
760
  Reject and regenerate any shot that fails:
@@ -843,6 +883,16 @@ Limit the amount of change per clip for realism:
843
883
 
844
884
  > If end frame is too different, the model must "invent miracles" → plastic/melting/warp risk rises.
845
885
 
886
+ ### Kling Storyboard / Multi-Shot Decision Tree
887
+
888
+ The official Kling app surface exposes custom storyboard-like shot prompting and optional end-frame anchoring.
889
+ Use it as a selective upgrade, not a default:
890
+ - **single-transition**: default mode for one clean action arc, reveal, or emotional turn
891
+ - **custom-storyboard**: only when one clip truly needs **2-3 editorially distinct phases**
892
+ - **hard cap in this toolkit:** 3 custom storyboard shots per video generation
893
+ - **never storyboard micro-beats:** one glance, one finger move, one tiny prop touch, one breath shift
894
+ - if the beat needs 4+ phases, split into chained videos instead of overloading one Kling prompt
895
+
846
896
  ### Golden Prompt Skeleton (Start+End)
847
897
 
848
898
  The prompt's job is to tell the model **how to generate the in-between frames**:
@@ -984,26 +1034,22 @@ When using a reference image as the start frame, the model extracts lighting and
984
1034
 
985
1035
  ### Multi-Shot Protocol (Tek Üretimde Çoklu Çekim)
986
1036
 
987
- Kling 3.0 can manage **2 to 6 shots** in a single 15-second generation.
1037
+ Treat Kling multi-shot as **storyboarded internal progression**, not as a license to over-cut.
988
1038
 
989
1039
  **Rules:**
990
- 1. **Shot Prefix:** Each shot MUST begin with \`Shot X,\` (e.g., \`Shot 1,\`, \`Shot 2,\`)
991
- 2. **Character Continuity:** Repeat physical descriptions at the start of each shot, or use **Element Binding** (see below)
992
- 3. **Time Distribution:** 15 seconds must be logically divided (e.g., Shot 1: 3s, Shot 2: 7s, Shot 3: 5s)
993
- 4. **Maximum:** 6 shots per generation; for more, use separate generations with chaining
994
-
995
- Official examples favor short, concrete camera-and-action openings. Start with the movement path or subject action, then lock the stable identity/background constraints.
996
-
997
- **Example Multi-Shot Prompt:**
998
- \`\`\`
999
- Shot 1, Close-up of a bearded man in his 40s, wearing a dusty military uniform. He stares ahead, jaw clenched. Tripod-locked, 85mm, shallow DOF. (3s)
1000
-
1001
- Shot 2, Medium shot, same bearded man turns toward a younger soldier standing behind him. Slow pan right to reveal the young soldier. Natural handheld micro-sway. (5s)
1002
-
1003
- Shot 3, Over-the-shoulder from the younger soldier's POV, the bearded man speaks directly to camera. Gentle push-in. (4s)
1004
-
1005
- Shot 4, Wide shot establishing the artillery position, both soldiers visible. Crane rise. (3s)
1006
- \`\`\`
1040
+ 1. Default to **single-transition** whenever one camera path can carry the beat.
1041
+ 2. Use custom storyboard only for **2-3 distinct internal phases** with different framing/action purpose.
1042
+ 3. Keep each storyboard phase short, clear, and concrete: what changed, what stayed locked, why the cut exists.
1043
+ 4. Hard cap for this toolkit: **3 storyboard shots per generation** in app.kling-oriented workflows.
1044
+ 5. If the sequence wants 4+ phases, split into multiple chained generations.
1045
+ 6. Never assign separate storyboard shots to micro-actions that would read better inside one stronger shot.
1046
+
1047
+ **Preferred storyboard jobs:**
1048
+ - Shot 1: establish or setup
1049
+ - Shot 2: main action / reveal / shift
1050
+ - Shot 3: reaction / settle / handoff
1051
+
1052
+ Official Kling surfaces favor short, concrete shot descriptions. Start from the action path or camera intention, then lock continuity anchors (identity, wardrobe, background geometry, stable light).
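
Reviewer note: the rewritten multi-shot rules reduce to a small decision on the number of editorially distinct phases in a beat. A minimal sketch of that routing follows, assuming the caller has already merged micro-beats into their parent phase. `chooseKlingRoute` and the `"chained-generations"` label are hypothetical; only `single-transition`, `custom-storyboard`, and the cap of 3 come from the diff.

```javascript
// Hard cap stated in the new rules: 3 storyboard shots per generation.
const MAX_STORYBOARD_SHOTS = 3;

// phaseCount: number of editorially distinct phases in the beat.
// Returns the route name the rules above would pick.
function chooseKlingRoute(phaseCount) {
  if (!Number.isInteger(phaseCount) || phaseCount < 1) {
    throw new RangeError("phaseCount must be a positive integer");
  }
  if (phaseCount === 1) {
    return "single-transition"; // default: one clean action arc
  }
  if (phaseCount <= MAX_STORYBOARD_SHOTS) {
    return "custom-storyboard"; // 2-3 distinct internal phases
  }
  return "chained-generations"; // 4+ phases: split across chained videos
}
```

How the phases are distributed across the chained generations is left open here, because the diff does not specify it.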
1007
1053
 
1008
1054
  ### Element Binding (Öğe Bağlama — Karakter/Nesne Tutarlılığı)
1009
1055
 
@@ -20,6 +20,7 @@ Modular system consisting of:
20
20
  .agent/
21
21
  ├── ARCHITECTURE.md # This file
22
22
  ├── MASTER.md # Main rules (entry point for AI tools)
23
+ ├── VOICE-DESIGN.md # Voice identity + audioPlan contract
23
24
  ├── model-profile.md # Active model rules (runtime generated)
24
25
  ├── agents/
25
26
  │ └── prompt-engineer.md # Primary agent
@@ -60,7 +61,7 @@ Modular system consisting of:
60
61
  | `coverage-system` | **Mandatory coverage shots** (Reaction, OTS, Insert, Cutaway, ECU, Wide) + L-cut/J-cut + 30° kuralı + **180° kuralı** + eyeline match + matching action + multi-character blocking |
61
62
  | `spatial-blocking` | **Relational realism**: eyeline targeting, plane mapping, body orientation, shared lighting, depth/scale integration, anti-cutout / anti-miniature cues |
62
63
  | `visual-modes` | **Ultra Realism** default, stylization triggers, anti-AI artifact rules + **renk sürekliliği** + magic hour + flashback/rüya görsel ayrımı |
63
- | `audio-design` | **Sound design** rules, voice realism, SFX, ambience, audio direction block + diegetic/non-diegetic ses ayrımı |
64
+ | `audio-design` | **Sound design** rules, voice realism, project-level `voiceCast`, shot-level `audioPlan`, audio direction block + diegetic/non-diegetic ses ayrımı |
64
65
  | `prompt-structure` | Image/video prompt templates, camera vocabulary, seed parameter, prompt rewriter, **re-take strategy**, coverage prompt yazım standartları (≥60 kelime) |
65
66
 
66
67
  ---
@@ -84,6 +85,8 @@ User Scenario → Agent Activated → Read model-profile → Load Required Skill
84
85
 
85
86
  .agent/model-profile.md (ALWAYS FIRST)
86
87
 
88
+ .agent/VOICE-DESIGN.md (when dialogue / narrator exists)
89
+
87
90
  safety-compliance (ALWAYS)
88
91
  reference-locking (if refs provided)
89
92
  frame-chaining (ALWAYS)
package/content/MASTER.md CHANGED
@@ -131,9 +131,23 @@ User: "Subay askerlere bağırır ve ateş emri verir."
131
131
  2. "Ateş!" diye bağırır, damarları çıkar. (Mid-shot)
132
132
  3. Askerlerin parmağı tetiğe gider. (Detail shot)
133
133
 
134
- ### 🎵 Music Policy
135
- **DEFAULT:** `Music: NONE` (Always).
136
- User must explicitly request music to include it. Otherwise, rely on SFX and Ambience.
134
+ ### 🎵 Music Policy
135
+ **DEFAULT:** `Music: NONE` (Always).
136
+ User must explicitly request music to include it. Otherwise, rely on SFX and Ambience.
137
+
138
+ ### Voice Design Contract
139
+
140
+ When dialogue or narration exists, audio is mandatory at three synchronized layers:
141
+
142
+ 1. project-level `voiceCast`
143
+ 2. shot-level `audioPlan`
144
+ 3. prompt-level `Audio direction:`
145
+
146
+ Project-level voice identity is stable.
147
+ Shot-level performance is variable.
148
+
149
+ Use `.agent/VOICE-DESIGN.md` whenever a speaker, narrator, or voiceover is involved.
150
+ Do not redesign a speaker voice per shot. Reuse the same `speakerKey` and voice identity across the whole project.
137
151
 
138
152
  ### 📸 Reference Image Enforcement
139
153
 
@@ -160,7 +174,7 @@ When user provides references, EVERY prompt MUST:
160
174
  3. [Aksiyon] — Ne yapıyor, mikro-davranış, oyunculuk ipucu
161
175
  4. [Kamera + Lens] — "85mm f/2.0, shallow DOF, static with handheld micro-movement"
162
176
  5. [Işık + Atmosfer] — "Warm oil lamp key light from screen-left, deep shadows"
163
- 6. [Audio direction block] — (sadece video prompt'larında)
177
+ 6. [Audio Plan + Audio direction block] — (sadece video prompt'larında)
164
178
  7. [Avoid line] — (HER prompt'ta zorunlu)
165
179
  ```
166
180
 
@@ -186,11 +200,20 @@ Avoid: blurry, low-res, noise, distorted faces, bad anatomy, extra limbs/fingers
186
200
  - `--audio_mode` or `--audio_prompt` (audio is for VIDEO only)
187
201
  - Any `--parameter` flags
188
202
 
189
- #### VIDEO PROMPTS (VİDEO, Coverage Video)
190
-
191
- **MUST INCLUDE FULL AUDIO DIRECTION BLOCK:**
192
- ```
193
- Audio direction:
203
+ #### VIDEO PROMPTS (VİDEO, Coverage Video)
204
+
205
+ **VOICE DESIGN CONTRACT (MANDATORY WHEN VOICE EXISTS):**
206
+ - Project output must include top-level `voiceCast`
207
+ - Every VIDEO section must include a machine-readable `Audio Plan` block
208
+ - If `Type` is `Dialogue`, `Voiceover`, or `Mixed`, `activeSpeakerKey` must bind to `voiceCast`
209
+ - `voiceIdentityPrompt` is character-level
210
+ - `performanceNote` is shot-level
211
+ - Keep one active speaker per shot
212
+ - Split reply dialogue across multiple shots when needed
213
+
214
+ **MUST INCLUDE FULL AUDIO DIRECTION BLOCK:**
215
+ ```
216
+ Audio direction:
194
217
  - Language: [TURKISH/ENGLISH/etc.]
195
218
  - Type: [Dialogue/SFX/Ambience/Mixed]
196
219
  - Dialogue transcript: "[Exact lines in original language]" or NONE
@@ -530,11 +553,13 @@ Avoid: blurry, low-res, noise, distorted faces, bad anatomy, extra limbs/fingers
530
553
  **VİDEO:**
531
554
 
532
555
  ```
533
- [Complete video prompt — MIN 80 words, MAX 120 words]
534
- [Action + camera movement + acting cues + atmosphere]
535
-
536
- Audio direction:
537
- - Language: [LANGUAGE]
556
+ [Complete video prompt — MIN 80 words, MAX 120 words]
557
+ [Action + camera movement + acting cues + atmosphere]
558
+
559
+ [Audio Plan JSON block aligned with .agent/VOICE-DESIGN.md]
560
+
561
+ Audio direction:
562
+ - Language: [LANGUAGE]
538
563
  - Type: [Dialogue/SFX/Ambience]
539
564
  - Dialogue transcript: [Lines or NONE]
540
565
  - SFX: [Effects list]
@@ -613,11 +638,15 @@ Before outputting, validate EVERY shot. **Bu kontrol otomatiktir, kullanıcı ha
613
638
  - [ ] **HER coverage image** → Avoid satırı var mı? ❗ Yoksa EKLE
614
639
  - [ ] **HER coverage video** → Avoid satırı var mı? ❗ Yoksa EKLE
615
640
 
616
- ### 4️⃣ Prompt Uzunluk Kontrolü
617
- - [ ] Ana shot image prompt ≥ 60 kelime mi?
618
- - [ ] Ana shot video prompt ≥ 80 kelime mi?
619
- - [ ] **Coverage prompt ≥ 60 kelime mi?** ❗ Kısa coverage YASAK
620
- - [ ] Audio direction block tam mı? (Language, Type, Dialogue, SFX, Ambience, Music, Mix)
641
+ ### 4️⃣ Prompt Uzunluk Kontrolü
642
+ - [ ] Ana shot image prompt ≥ 60 kelime mi?
643
+ - [ ] Ana shot video prompt ≥ 80 kelime mi?
644
+ - [ ] **Coverage prompt ≥ 60 kelime mi?** ❗ Kısa coverage YASAK
645
+ - [ ] Audio direction block tam mı? (Language, Type, Dialogue, SFX, Ambience, Music, Mix)
646
+ - [ ] `voiceCast` mevcut mu ve her konusan speaker icin `speakerKey` tekil mi?
647
+ - [ ] Her VIDEO bolumunde machine-readable `Audio Plan` var mi?
648
+ - [ ] Dialogue / Voiceover shot'larinda `activeSpeakerKey` -> `voiceCast` baglantisi gecerli mi?
649
+ - [ ] Bir shot'ta tek aktif speaker kurali korunuyor mu?
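
Reviewer note: the four checklist items added above (unique `speakerKey`, an `Audio Plan` per VIDEO section, a valid `activeSpeakerKey` binding, one active speaker per shot) lend themselves to an automated check. The sketch below assumes a plausible shape for the data, which the visible part of this diff does not confirm: `voiceCast` as an array of `{ speakerKey }` entries and each shot as `{ id, audioPlan: { type, activeSpeakerKey } }`. `validateVoiceContract` is a hypothetical name.

```javascript
// Audio types that require a speaker binding, per the MASTER.md rules in this diff.
const VOICED_TYPES = new Set(["Dialogue", "Voiceover", "Mixed"]);

// voiceCast: [{ speakerKey, ... }]          (assumed shape)
// shots:     [{ id, audioPlan: { type, activeSpeakerKey } }]   (assumed shape)
// Returns a list of error strings; an empty list means the contract holds.
function validateVoiceContract(voiceCast, shots) {
  const errors = [];
  const knownKeys = new Set();

  // speakerKey must be unique across the project-level cast.
  for (const { speakerKey } of voiceCast) {
    if (knownKeys.has(speakerKey)) {
      errors.push(`duplicate speakerKey: ${speakerKey}`);
    }
    knownKeys.add(speakerKey);
  }

  for (const { id, audioPlan } of shots) {
    // Every VIDEO section needs a machine-readable Audio Plan.
    if (!audioPlan) {
      errors.push(`${id}: missing Audio Plan`);
      continue;
    }
    if (!VOICED_TYPES.has(audioPlan.type)) continue; // SFX / Ambience only

    const key = audioPlan.activeSpeakerKey;
    if (typeof key !== "string") {
      // One active speaker per shot: a single key, not a list and not empty.
      errors.push(`${id}: activeSpeakerKey must be a single speaker key`);
    } else if (!knownKeys.has(key)) {
      errors.push(`${id}: activeSpeakerKey "${key}" not found in voiceCast`);
    }
  }
  return errors;
}
```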
621
650
 
622
651
  ### 5️⃣ Türkçe Özet
623
652
  - [ ] Her shot'un 🇹🇷 Türkçe özet satırı var mı?
package/content/RULES.md CHANGED
@@ -114,10 +114,11 @@ When references provided:
114
114
 
115
115
  | Rule | Description |
116
116
  |-------|----------|
117
- | EVERY image prompt | Must end with an Avoid line |
118
- | EVERY video prompt | Must end with the Audio direction block + an Avoid line |
119
- | EVERY coverage prompt | ≥ 60 words + must end with an Avoid line |
120
- | Turkish summary | A 🇹🇷 summary line for every shot and coverage |
117
+ | EVERY image prompt | Must end with an Avoid line |
118
+ | EVERY video prompt | Must end with the Audio direction block + an Avoid line |
119
+ | EVERY coverage prompt | ≥ 60 words + must end with an Avoid line |
120
+ | Turkish summary | A 🇹🇷 summary line for every shot and coverage |
121
+ | Voice Design | `voiceCast` at project level, `audioPlan` at shot level, `Audio direction` at parser level |
121
122
 
122
123
  ### 📷 Coverage System (MANDATORY)
123
124
 
@@ -134,10 +135,20 @@ When references provided:
134
135
 
135
136
  **Sub-shot naming:** SHOT05A, SHOT05B, SHOT05C (main shot + alphabetical suffix)
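
The naming rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `coverage_names` is a hypothetical helper, not part of Film-Kit, and the 26-suffix limit is an assumption that simply follows from using the Latin alphabet.

```python
import string


def coverage_names(main_shot, count):
    """Name coverage sub-shots by appending A, B, C... to the main shot id.

    Hypothetical helper for illustration; Film-Kit itself only states the
    naming convention (main shot + alphabetical suffix).
    """
    if not 1 <= count <= len(string.ascii_uppercase):
        raise ValueError("count must be between 1 and 26")
    return [f"{main_shot}{suffix}" for suffix in string.ascii_uppercase[:count]]


print(coverage_names("SHOT05", 3))  # ['SHOT05A', 'SHOT05B', 'SHOT05C']
```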
136
137
 
137
- ### 🎵 Music Policy (BANNED)
138
-
139
- **DEFAULT:** `Music: NONE` (Always).
140
- User must explicitly request music. Otherwise use SFX/Ambience only.
138
+ ### 🎵 Music Policy (BANNED)
139
+
140
+ **DEFAULT:** `Music: NONE` (Always).
141
+ User must explicitly request music. Otherwise use SFX/Ambience only.
142
+
143
+ ### Voice Design Policy
144
+
145
+ If dialogue or narration exists:
146
+
147
+ - keep a project-level `voiceCast`
148
+ - keep a shot-level `audioPlan`
149
+ - keep the prompt-level `Audio direction:` block
150
+ - keep one active speaker per shot
151
+ - reuse the same `speakerKey` and voice identity across all shots
141
152
 
142
153
  ### 🎭 Dramaturgy & Acting (Dialogue Scenes)
143
154
 
@@ -0,0 +1,248 @@
1
+ # Voice Design Output Contract
2
+
3
+ This runtime must support voice-design style audio generation where voice identity is created once per speaker and then reused across shots.
4
+
5
+ ## Three-Layer Audio Architecture
6
+
7
+ Every scenario output must keep these layers aligned:
8
+
9
+ 1. `voiceCast`
10
+ Project-level or scenario-level voice identity package.
11
+ 2. `audioPlan`
12
+ Shot-level performance, transcript, SFX, ambience, and mix metadata.
13
+ 3. `Audio direction:`
14
+ Human-readable prompt block kept for backward compatibility with existing parsers.
15
+
16
+ ## Film-Kit Storage Contract
17
+
18
+ - Single-agent projects store project-level voice identity in `$OUTPUT_DIR/shot-plan.json`.
19
+ - Multi-agent projects store project-level voice identity in `$OUTPUT_DIR/team-plan.json`.
20
+ - Every `SHOTNN.md` file must include a machine-readable `Audio Plan` JSON block for each VIDEO section.
21
+ - The `Audio direction:` prompt block must mirror the same `audioPlan` and remain present in the video prompt.
22
+
23
+ ## Core Principle
24
+
25
+ Voice identity is character-level.
26
+ Performance is shot-level.
27
+
28
+ Do not redesign a character voice on every shot.
29
+ Design the voice once, bind it to a stable `speakerKey`, and reuse that identity across the entire project.
30
+
31
+ ## Project-Level `voiceCast`
32
+
33
+ Create one `voiceCast` entry for every speaking character or narrator.
34
+
35
+ Required fields:
36
+
37
+ - `speaker`
38
+ - `speakerKey`
39
+ - `role`
40
+ - `language`
41
+ - `voiceIdentityPrompt`
42
+ - `voicePerformanceBase`
43
+ - `preferredProvider`
44
+ - `preferredModel`
45
+ - `saveToLibrary`
46
+
47
+ Recommended fields:
48
+
49
+ - `genderHint`
50
+ - `ageRange`
51
+ - `referenceAudioStrategy`
52
+ - `referenceAudioNeeded`
53
+ - `shouldGenerateVoice`
54
+ - `voiceDesignRecommended`
55
+ - `voiceDesignPriority`
56
+ - `referenceAudioSuggested`
57
+ - `referenceAudioDescription`
58
+ - `notes`
59
+
60
+ Example:
61
+
62
+ ```json
63
+ {
64
+ "speaker": "Kemal",
65
+ "speakerKey": "kemal",
66
+ "role": "character",
67
+ "language": "turkish",
68
+ "genderHint": "male",
69
+ "ageRange": "35-45",
70
+ "voiceIdentityPrompt": "A realistic Turkish male voice in his late thirties to early forties. Low-mid register, restrained authority, subtle fatigue, natural diction, intimate close-mic realism, no theatricality, highly believable human texture.",
71
+ "voicePerformanceBase": "controlled, interior, understated",
72
+ "referenceAudioStrategy": "optional",
73
+ "referenceAudioNeeded": false,
74
+ "shouldGenerateVoice": true,
75
+ "preferredProvider": "elevenlabs",
76
+ "preferredModel": "eleven_ttv_v3",
77
+ "saveToLibrary": true,
78
+ "voiceDesignRecommended": true,
79
+ "voiceDesignPriority": "high",
80
+ "referenceAudioSuggested": false,
81
+ "referenceAudioDescription": null,
82
+ "notes": "Use the same voice identity across all shots. Performance may change per shot, identity must not."
83
+ }
84
+ ```
85
+
86
+ ## `speakerKey` Rule
87
+
88
+ The same speaker must use the same `speakerKey` everywhere.
89
+ If a dialogue shot references a `speakerKey` that does not exist in `voiceCast`, treat it as a contract failure.
90
+
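
A minimal sketch of how the `speakerKey` rule could be checked, assuming `voiceCast` and the per-shot audio plans have already been loaded as Python lists of dicts. The function name `unresolved_speaker_keys` and the speaker `ayse` are hypothetical and used only for illustration.

```python
def unresolved_speaker_keys(voice_cast, audio_plans):
    """Return every activeSpeakerKey that has no matching voiceCast entry."""
    known = {entry["speakerKey"] for entry in voice_cast}
    missing = []
    for plan in audio_plans:
        key = plan.get("activeSpeakerKey")
        # Plans without a speaker (e.g. sfx-only coverage) need no binding.
        if key is not None and key not in known:
            missing.append(key)
    return missing


voice_cast = [{"speaker": "Kemal", "speakerKey": "kemal"}]
audio_plans = [
    {"type": "dialogue", "activeSpeakerKey": "kemal"},
    {"type": "dialogue", "activeSpeakerKey": "ayse"},  # hypothetical, not in voiceCast
    {"type": "sfx-only"},
]
print(unresolved_speaker_keys(voice_cast, audio_plans))  # ['ayse']
```

A non-empty result would correspond to the contract failure described above.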
91
+ ## `voiceIdentityPrompt` Rule
92
+
93
+ `voiceIdentityPrompt` describes the stable voice identity, not the temporary emotion of one shot.
94
+
95
+ Include:
96
+
97
+ - language
98
+ - age range
99
+ - gender feel when useful
100
+ - register and texture
101
+ - diction and articulation
102
+ - energy profile
103
+ - recording feel or mic proximity
104
+ - realism target such as believable, non-theatrical, non-announcer
105
+
106
+ Do not include:
107
+
108
+ - exact dialogue lines
109
+ - one-shot emotional spikes
110
+ - scene-specific actions such as running, shouting, or whispering one particular line
111
+
112
+ ## `voicePerformanceBase`
113
+
114
+ Use `voicePerformanceBase` for the speaker's normal acting baseline.
115
+ This baseline may combine with the shot-level `performanceNote`, but it must not replace voice identity.
116
+
117
+ ## Shot-Level `audioPlan`
118
+
119
+ Every VIDEO section must carry a machine-readable `Audio Plan` JSON block.
120
+
121
+ If dialogue or voiceover exists, these fields are mandatory:
122
+
123
+ - `language`
124
+ - `type`
125
+ - `activeSpeaker`
126
+ - `activeSpeakerKey`
127
+ - `dialogueLines`
128
+ - `dialogueTranscript`
129
+ - `performanceNote`
130
+
131
+ Recommended support fields:
132
+
133
+ - `delivery.pace`
134
+ - `delivery.intensity`
135
+ - `delivery.emotion`
136
+ - `delivery.projection`
137
+ - `delivery.stability`
138
+ - `sfx`
139
+ - `ambience`
140
+ - `music`
141
+ - `mixTarget`
142
+ - `hasSubtitles`
143
+
144
+ Example:
145
+
146
+ ```json
147
+ {
148
+ "language": "turkish",
149
+ "type": "dialogue",
150
+ "activeSpeaker": "Kemal",
151
+ "activeSpeakerKey": "kemal",
152
+ "dialogueLines": [
153
+ {
154
+ "speaker": "Kemal",
155
+ "speakerKey": "kemal",
156
+ "text": "Burayi terk etmemiz lazim."
157
+ }
158
+ ],
159
+ "dialogueTranscript": "Kemal: Burayi terk etmemiz lazim.",
160
+ "performanceNote": "Low, controlled urgency. Quiet but decisive. No melodrama.",
161
+ "delivery": {
162
+ "pace": "measured",
163
+ "intensity": "medium-low",
164
+ "emotion": "suppressed urgency",
165
+ "projection": "intimate",
166
+ "stability": "steady"
167
+ },
168
+ "sfx": [
169
+ "Ahsap sandalyenin hafif surtunmesi"
170
+ ],
171
+ "ambience": [
172
+ "Kucuk odada dusuk floresan humu"
173
+ ],
174
+ "music": null,
175
+ "mixTarget": {
176
+ "dialogue": 70,
177
+ "sfx": 5,
178
+ "ambience": 25
179
+ },
180
+ "hasSubtitles": false
181
+ }
182
+ ```
183
+
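
The mandatory-field rule for speech shots can be sketched as a small check. This assumes the `type` values are spelled `dialogue` and `voiceover` (the contract only shows `dialogue`, `sfx-only`, and `ambience-only`), and `missing_speech_fields` is a hypothetical name, not a Film-Kit function.

```python
SPEECH_TYPES = {"dialogue", "voiceover"}  # assumed spellings of speech `type` values
MANDATORY_SPEECH_FIELDS = (
    "language",
    "type",
    "activeSpeaker",
    "activeSpeakerKey",
    "dialogueLines",
    "dialogueTranscript",
    "performanceNote",
)


def missing_speech_fields(audio_plan):
    """List mandatory fields that are absent or empty in a speech audioPlan."""
    if str(audio_plan.get("type", "")).lower() not in SPEECH_TYPES:
        return []  # non-speech plans carry no voice binding requirements
    return [name for name in MANDATORY_SPEECH_FIELDS if not audio_plan.get(name)]


incomplete = {
    "language": "turkish",
    "type": "dialogue",
    "activeSpeaker": "Kemal",
    "activeSpeakerKey": "kemal",
    "dialogueLines": [{"speaker": "Kemal", "speakerKey": "kemal", "text": "..."}],
    "dialogueTranscript": "Kemal: ...",
}
print(missing_speech_fields(incomplete))  # ['performanceNote']
```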
184
+ ## Single Active Speaker Rule
185
+
186
+ For realistic lipsync and stable delivery, keep only one active speaker per shot.
187
+
188
+ - Other characters may appear in frame, but they should listen silently.
189
+ - Back-and-forth dialogue should be split across multiple shots.
190
+ - Coverage shots may have `audioPlan.type` values like `sfx-only` or `ambience-only` and therefore require no voice binding.
191
+
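
One way to check the single active speaker rule is to compare the `speakerKey` of every dialogue line with `activeSpeakerKey`. The sketch below assumes an `audioPlan` loaded as a dict; `extra_speakers` and the speaker `ayse` are hypothetical names for illustration.

```python
def extra_speakers(audio_plan):
    """Return speakerKeys in dialogueLines that differ from activeSpeakerKey."""
    active = audio_plan.get("activeSpeakerKey")
    keys = {line["speakerKey"] for line in audio_plan.get("dialogueLines", [])}
    return sorted(keys - {active})


two_voices = {
    "type": "dialogue",
    "activeSpeakerKey": "kemal",
    "dialogueLines": [
        {"speaker": "Kemal", "speakerKey": "kemal", "text": "..."},
        {"speaker": "Ayse", "speakerKey": "ayse", "text": "..."},  # hypothetical second voice
    ],
}
print(extra_speakers(two_voices))            # ['ayse']
print(extra_speakers({"type": "sfx-only"}))  # []
```

A non-empty result suggests the exchange should be split across multiple shots.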
192
+ ## Backward-Compatible Prompt Block
193
+
194
+ The prompt must still include the human-readable block:
195
+
196
+ ```text
197
+ Audio direction:
198
+ Language: Turkish
199
+ Type: Dialogue
200
+ Dialogue transcript:
201
+ Kemal: Burayi terk etmemiz lazim.
202
+ SFX:
203
+ - Ahsap sandalyenin hafif surtunmesi
204
+ Ambience:
205
+ - Kucuk odada dusuk floresan humu
206
+ Music: NONE
207
+ Mix target: Dialogue 70%, SFX 5%, Ambience 25%
208
+ No on-screen subtitles/captions.
209
+ ```
210
+
211
+ This block must be derived from the same `audioPlan`, not invented separately.
212
+
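
Deriving the block from `audioPlan` can be sketched as a pure rendering function. This is an assumption about how a generator might do it, not Film-Kit's actual implementation; `render_audio_direction` and `MIX_LABELS` are hypothetical names, and the capitalisation of `language` and `type` is inferred from the example block.

```python
MIX_LABELS = (("dialogue", "Dialogue"), ("sfx", "SFX"), ("ambience", "Ambience"))


def render_audio_direction(plan):
    """Render the human-readable block from a single audioPlan dict."""
    lines = [
        "Audio direction:",
        f"Language: {plan['language'].capitalize()}",
        f"Type: {plan['type'].capitalize()}",
    ]
    if plan.get("dialogueLines"):
        lines.append("Dialogue transcript:")
        lines += [f"{d['speaker']}: {d['text']}" for d in plan["dialogueLines"]]
    for heading, key in (("SFX", "sfx"), ("Ambience", "ambience")):
        items = plan.get(key) or []
        if items:
            lines.append(f"{heading}:")
            lines += [f"- {item}" for item in items]
    # A null music value maps to the documented default.
    lines.append(f"Music: {plan.get('music') or 'NONE'}")
    mix = plan.get("mixTarget") or {}
    parts = [f"{label} {mix[key]}%" for key, label in MIX_LABELS if key in mix]
    if parts:
        lines.append("Mix target: " + ", ".join(parts))
    if not plan.get("hasSubtitles", False):
        lines.append("No on-screen subtitles/captions.")
    return "\n".join(lines)


example_plan = {
    "language": "turkish",
    "type": "dialogue",
    "dialogueLines": [
        {"speaker": "Kemal", "speakerKey": "kemal", "text": "Burayi terk etmemiz lazim."}
    ],
    "sfx": ["Ahsap sandalyenin hafif surtunmesi"],
    "ambience": ["Kucuk odada dusuk floresan humu"],
    "music": None,
    "mixTarget": {"dialogue": 70, "sfx": 5, "ambience": 25},
    "hasSubtitles": False,
}
print(render_audio_direction(example_plan))
```

Rendering from one source keeps the prompt block and the JSON block from drifting apart.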
213
+ ## Default Voice Design Policy
214
+
215
+ - Character voice stays fixed.
216
+ - Shot performance may change.
217
+ - Language and diction stay fixed.
218
+ - Realism is maximized.
219
+ - Theatricality is minimized.
220
+ - Promotional or announcer tone is forbidden unless explicitly requested.
221
+ - `Music: NONE` is the default.
222
+ - `No on-screen subtitles/captions.` is mandatory unless the user explicitly requests otherwise.
223
+
224
+ ## Reference Audio Guidance
225
+
226
+ Recommend reference audio when the target voice depends on unusually specific texture:
227
+
228
+ - elderly or highly specific age character
229
+ - broken, smoky, hoarse, or damaged micro-texture
230
+ - strong local accent or highly specific diction
231
+ - very narrow performance tolerance between whisper and force
232
+
233
+ If reference audio is recommended, prefer:
234
+
235
+ - `preferredModel: "eleven_ttv_v3"`
236
+ - `referenceAudioNeeded: true`
237
+
238
+ ## Provider Flow
239
+
240
+ When a provider supports voice design, the implementation flow is:
241
+
242
+ 1. Use `voiceCast[].voiceIdentityPrompt` to design a preview voice.
243
+ 2. Review multiple previews.
244
+ 3. Select one preview.
245
+ 4. Create a persistent `voice_id`.
246
+ 5. Bind that `voice_id` to `speakerKey`.
247
+ 6. Reuse the same voice identity across all future shots.
248
+ 7. If needed, remix or refine the same voice without breaking identity continuity.
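
The flow above can be sketched without committing to any provider API. Everything here is hypothetical: `VoiceRegistry`, `resolve_voice_id`, and the three callables (`design_previews`, `choose`, `create_voice`) stand in for whatever the chosen provider and review step actually offer.

```python
class VoiceRegistry:
    """Keeps exactly one persistent voice_id per speakerKey (steps 4 to 6)."""

    def __init__(self):
        self._voice_ids = {}

    def bind(self, speaker_key, voice_id):
        existing = self._voice_ids.get(speaker_key)
        if existing is not None and existing != voice_id:
            # Rebinding would silently change a character's identity.
            raise ValueError(f"{speaker_key!r} is already bound to {existing!r}")
        self._voice_ids[speaker_key] = voice_id

    def get(self, speaker_key):
        return self._voice_ids.get(speaker_key)


def resolve_voice_id(entry, registry, design_previews, choose, create_voice):
    """Design a voice once for a voiceCast entry, then reuse it afterwards."""
    key = entry["speakerKey"]
    bound = registry.get(key)
    if bound is not None:
        return bound  # step 6: reuse the existing identity
    previews = design_previews(entry["voiceIdentityPrompt"])  # steps 1 and 2
    chosen = choose(previews)                                 # step 3
    voice_id = create_voice(chosen)                           # step 4
    registry.bind(key, voice_id)                              # step 5
    return voice_id


registry = VoiceRegistry()
entry = {"speakerKey": "kemal", "voiceIdentityPrompt": "Low-mid register, natural diction"}
voice_id = resolve_voice_id(
    entry,
    registry,
    design_previews=lambda prompt: ["preview-a", "preview-b"],  # stub provider call
    choose=lambda previews: previews[0],                        # stub human review
    create_voice=lambda preview: f"voice-{preview}",            # stub persistence
)
print(voice_id)  # voice-preview-a
```

Raising on a conflicting rebind is one possible design choice; it makes an accidental identity change fail loudly instead of passing unnoticed.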
@@ -133,13 +133,24 @@ First sentence carries the most weight. Every prompt MUST follow this order:
133
133
  3. [Action] Micro-behavior, acting cue, physical gesture
134
134
  4. [Camera + Lens] "85mm f/2.0, shallow DOF, static with handheld micro-movement"
135
135
  5. [Light + Atmosphere] "Warm oil lamp key light from screen-left, deep shadows"
136
- 6. [Audio direction block] (VIDEO prompts only)
137
- 7. [Avoid line] (EVERY prompt — MANDATORY)
138
- ```
139
-
140
- > **Kling adaptation:** When model is kling-3.0, replace step 3 (Action) with motion timeline (`first → then → finally`). Step 6 uses Kling tone markers instead of Audio Direction Block. See `.agent/model-profile.md` for golden prompt skeleton.
141
-
142
- ---
136
+ 6. [Audio direction block] (VIDEO prompts only)
137
+ 7. [Avoid line] (EVERY prompt — MANDATORY)
138
+ ```
139
+
140
+ > **Kling adaptation:** When model is kling-3.0, replace step 3 (Action) with motion timeline (`first → then → finally`). Step 6 uses Kling tone markers instead of Audio Direction Block. See `.agent/model-profile.md` for golden prompt skeleton.
141
+
142
+ ## VOICE DESIGN CONTRACT
143
+
144
+ When dialogue, narrator VO, or character speech exists:
145
+
146
+ - read `.agent/VOICE-DESIGN.md`
147
+ - create and maintain project-level `voiceCast` in `shot-plan.json`
148
+ - reuse the same `speakerKey` across all shots
149
+ - write a machine-readable `Audio Plan` JSON block for every VIDEO section
150
+ - keep `voiceIdentityPrompt` character-level and `performanceNote` shot-level
151
+ - keep one active speaker per shot
152
+
153
+ ---
143
154
 
144
155
  ## AUDIO DIRECTION (Mandatory for Video)
145
156
 
@@ -195,22 +206,29 @@ Before outputting ANY shot:
195
206
  - [ ] No active violence/gore?
196
207
 
197
208
  ### 2. Prompt Structure
198
- - [ ] Follows 1-7 Flow Order?
199
- - [ ] Avoid Line present on EVERY prompt (Image + Video + Coverage)?
200
- - [ ] Audio direction present on EVERY video prompt? (Veo block or Kling inline)
201
- - [ ] 2-3 Coverage shots included in the same file?
209
+ - [ ] Follows 1-7 Flow Order?
210
+ - [ ] Avoid Line present on EVERY prompt (Image + Video + Coverage)?
211
+ - [ ] Audio direction present on EVERY video prompt? (Veo block or Kling inline)
212
+ - [ ] `shot-plan.json` contains top-level `voiceCast`?
213
+ - [ ] Every speaking shot has an `Audio Plan` block with valid `activeSpeakerKey`?
214
+ - [ ] Each active speaker resolves to a stable `speakerKey` in `voiceCast`?
215
+ - [ ] One active speaker per dialogue shot?
216
+ - [ ] 2-3 Coverage shots included in the same file?
202
217
  - [ ] Quality floor passes? (ILK>=80, SON>=80, VIDEO>=120, coverage>=70)
203
218
  - [ ] Specificity floor passes? (lens + lighting + FG/MG/BG action)
204
219
  - [ ] Spatial realism passes? (eyeline target + plane map + shared light + contact/depth cues)
205
220
  - [ ] Model Control block exists? (`Model`, `Preset`, `CFG`, `Transition Mode`)
206
221
 
207
- ### 3. Kling-Specific Gates (when model is kling-3.0)
208
- - [ ] Motion timeline uses `first → then → finally` structure?
209
- - [ ] "What stays the same" explicitly stated? (identity, background, costume)
210
- - [ ] Camera movement is simple/safe? (no complex hybrid movements)
211
- - [ ] Negative prompt includes Kling cleanup set? (warping, rubbery, melted)
212
- - [ ] Duration matches transformation budget? (5s=1 change, 10s=2-3, 15s=complex)
213
- - [ ] Start/end frames are in same visual universe? (angle, scale, light, lens)
222
+ ### 3. Kling-Specific Gates (when model is kling-3.0)
223
+ - [ ] Motion timeline uses `first → then → finally` structure?
224
+ - [ ] "What stays the same" explicitly stated? (identity, background, costume)
225
+ - [ ] Camera movement is simple/safe? (no complex hybrid movements)
226
+ - [ ] Negative prompt includes Kling cleanup set? (warping, rubbery, melted)
227
+ - [ ] Duration matches transformation budget? (5s=1 change, 10s=2-3, 15s=complex)
228
+ - [ ] Start/end frames are in same visual universe? (angle, scale, light, lens)
229
+ - [ ] Used `single-transition` by default instead of forcing storyboard complexity?
230
+ - [ ] If custom storyboard is used, is it capped at 3 meaningful stages?
231
+ - [ ] Did each storyboard stage earn a distinct editorial job instead of representing micro-gestures?
214
232
 
215
233
  ---
216
234
 
@@ -241,15 +259,20 @@ For EACH shot, output exactly one file (`SHOTNN.md`) containing Main Shot + Cove
241
259
  [If CHAINED: include "Use SHOT[prev]_END as exact first frame" inside code block]
242
260
  [If FIRST/BREAK: image prompt in same code block]
243
261
 
244
- ### SON FRAME (SHOTNN_END)
245
- ```
246
- [Image Prompt (Flow Order + Avoid Line)]
247
- avoid: blurry, low-res, text, watermark, bad anatomy, distorted face ...
248
- ```
249
-
250
- ### VİDEO
251
- ```
252
- [Video Prompt (Flow Order)]
262
+ ### SON FRAME (SHOTNN_END)
263
+ ```
264
+ [Image Prompt (Flow Order + Avoid Line)]
265
+ avoid: blurry, low-res, text, watermark, bad anatomy, distorted face ...
266
+ ```
267
+
268
+ ### AUDIO PLAN
269
+ ```json
270
+ [Machine-readable audioPlan JSON block aligned with shot-plan.json -> voiceCast]
271
+ ```
272
+
273
+ ### VİDEO
274
+ ```
275
+ [Video Prompt (Flow Order)]
253
276
 
254
277
  Audio direction:
255
278
  - Language: ...
@@ -3,12 +3,136 @@ name: audio-design
3
3
  description: Sound design rules for model-aware profiles (Veo 3.1 / Kling 3.0). Voice realism, environmental sounds, SFX, ambience, and audio direction block formatting. Includes Kling native audio tone markers and anti-synthetic audio guidelines.
4
4
  ---
5
5
 
6
- # Audio Design System
7
-
8
- > **Philosophy:** Sound is half the experience. Audio must be as real as visuals.
9
- > **Core Principle:** Every sound recorded-quality. No synthetic giveaways.
10
-
11
- ---
6
+ # Audio Design System
7
+
8
+ > **Philosophy:** Sound is half the experience. Audio must be as real as visuals.
9
+ > **Core Principle:** Every sound recorded-quality. No synthetic giveaways.
10
+
11
+ ---
12
+
13
+ ## Voice Design Contract (MANDATORY)
14
+
15
+ When a scenario includes spoken dialogue, narrator VO, or any reusable character voice, use a three-layer contract:
16
+
17
+ 1. `voiceCast`
18
+ Stable project-level voice identity package.
19
+ 2. `audioPlan`
20
+ Shot-level performance and mix metadata.
21
+ 3. `Audio direction:`
22
+ Human-readable prompt block for backward compatibility.
23
+
24
+ Use `.agent/VOICE-DESIGN.md` as the authoritative schema reference.
25
+
26
+ ### Project-Level `voiceCast`
27
+
28
+ Every speaking character or narrator must have one `voiceCast` entry.
29
+
30
+ Required fields:
31
+
32
+ - `speaker`
33
+ - `speakerKey`
34
+ - `role`
35
+ - `language`
36
+ - `voiceIdentityPrompt`
37
+ - `voicePerformanceBase`
38
+ - `preferredProvider`
39
+ - `preferredModel`
40
+ - `saveToLibrary`
41
+
42
+ Rules:
43
+
44
+ - `speakerKey` must stay stable across the whole project.
45
+ - `voiceIdentityPrompt` describes the permanent voice identity, never a one-shot emotional spike.
46
+ - `voicePerformanceBase` describes the speaker's normal acting baseline.
47
+ - If a dialogue shot has no matching `voiceCast` entry, the output is invalid.
48
+
49
+ Example:
50
+
51
+ ```json
52
+ {
53
+ "speaker": "Kemal",
54
+ "speakerKey": "kemal",
55
+ "role": "character",
56
+ "language": "turkish",
57
+ "voiceIdentityPrompt": "A realistic Turkish male voice in his late thirties to early forties. Low-mid register, restrained authority, subtle fatigue, natural diction, intimate close-mic realism, no theatricality, highly believable human texture.",
58
+ "voicePerformanceBase": "controlled, interior, understated",
59
+ "preferredProvider": "elevenlabs",
60
+ "preferredModel": "eleven_ttv_v3",
61
+ "saveToLibrary": true
62
+ }
63
+ ```
64
+
65
+ ### Shot-Level `audioPlan`
66
+
67
+ Every VIDEO block must include a machine-readable `Audio Plan` JSON section that maps to the project-level voice identity.
68
+
69
+ If the shot contains dialogue or voiceover, these fields are mandatory:
70
+
71
+ - `language`
72
+ - `type`
73
+ - `activeSpeaker`
74
+ - `activeSpeakerKey`
75
+ - `dialogueLines`
76
+ - `dialogueTranscript`
77
+ - `performanceNote`
78
+
79
+ Recommended support fields:
80
+
81
+ - `delivery.pace`
82
+ - `delivery.intensity`
83
+ - `delivery.emotion`
84
+ - `delivery.projection`
85
+ - `delivery.stability`
86
+ - `sfx`
87
+ - `ambience`
88
+ - `music`
89
+ - `mixTarget`
90
+ - `hasSubtitles`
91
+
92
+ ### Single Active Speaker Rule
93
+
94
+ Keep only one active speaker per shot.
95
+
96
+ - Reaction shots should usually be silent or ambience-only.
97
+ - Back-and-forth dialogue should be split across multiple shots.
98
+ - OTS coverage may carry one voice, but do not ask the model to lipsync two people at once.
99
+
100
+ ### Voice Identity vs Performance
101
+
102
+ Keep this distinction strict:
103
+
104
+ - `voiceIdentityPrompt` = who the speaker is across the whole film
105
+ - `performanceNote` = how that speaker performs in this shot
106
+
107
+ Correct identity cues:
108
+
109
+ - `low-mid register`
110
+ - `natural Turkish diction`
111
+ - `restrained authority`
112
+ - `breathy but realistic`
113
+ - `non-theatrical`
114
+ - `highly believable real-world tone`
115
+
116
+ Incorrect identity cues:
117
+
118
+ - exact dialogue lines
119
+ - `angrily says this line`
120
+ - `whispering while running`
121
+ - temporary scene emotion only
122
+
123
+ ### Provider Flow
124
+
125
+ Providers that support voice design should follow this order:
126
+
127
+ 1. design from `voiceCast[].voiceIdentityPrompt`
128
+ 2. review previews
129
+ 3. choose one
130
+ 4. create persistent voice
131
+ 5. bind voice to `speakerKey`
132
+ 6. reuse across all shots
133
+ 7. remix only as a refinement, not as a new character identity
134
+
135
+ ---
12
136
 
13
137
  ## 🎯 Audio Realism Baseline
14
138
 
@@ -197,19 +321,22 @@ User: "Kalkın aslanlar! Düşman zırhlıları top menziline giriyorlar!"
197
321
  ✅ RIGHT: Dialogue transcript: "Kalkın aslanlar! Düşman zırhlıları top menziline giriyorlar!"
198
322
  ```
199
323
 
200
- ### When User Provides Dialogue
201
-
202
- - Include VERBATIM in audio transcript
203
- - Preserve original language (Turkish stays Turkish, English stays English)
204
- - Note emotional delivery required
324
+ ### When User Provides Dialogue
325
+
326
+ - Include VERBATIM in audio transcript
327
+ - Preserve original language (Turkish stays Turkish, English stays English)
328
+ - Note emotional delivery required
329
+ - Mirror the same transcript in `audioPlan.dialogueLines`
330
+ - Bind the speaking line to `activeSpeakerKey`
205
331
 
206
332
  ### 🗣️ SPEAKER ISOLATION RULE (Prevent Mixed Dialogue)
207
333
 
208
334
  **Problem:** If two people are in the frame and both speak, AI often mixes lipsync or timing.
209
335
  **Solution:** ONE active speaker per shot.
210
336
 
211
- - **Shot A:** Character X speaks. Camera focuses on X (or over-the-shoulder of Y).
212
- - **Shot B:** Character Y replies. Camera focuses on Y.
337
+ - **Shot A:** Character X speaks. Camera focuses on X (or over-the-shoulder of Y).
338
+ - **Shot B:** Character Y replies. Camera focuses on Y.
339
+ - Keep the same `speakerKey` and `voiceCast` identity in both shots. Only `performanceNote` changes.
213
340
 
214
341
  **EXCEPTION:** If both MUST be in frame (Two-Shot):
215
342
  1. Use "Reaction Shot" for the listener (listener nods while speaker talks off-screen sound).
@@ -81,6 +81,18 @@ Before using any video template in this file:
81
81
  - If active model is **Kling 3.0**, do **not** default to the Veo-style `Audio direction:` bullet schema below.
82
82
  - If project is a hybrid/smart-hybrid package, the local `.agent/skills/prompt-structure/SKILL.md` override takes precedence over this base file.
83
83
 
84
+ ## Kling Storyboard / Multi-Shot Decision Rule
85
+
86
+ The official Kling app surface exposes start/end control, optional end-frame anchoring, and custom storyboard-style shot prompting.
87
+ Use that capability with restraint:
88
+
89
+ - default to a single Start+End transition prompt when the beat is one continuous motion arc, one reveal, or one emotional turn
90
+ - use custom storyboard only when the clip truly contains **2-3 distinct editorial phases** with different visual jobs
91
+ - in this toolkit, cap app.kling custom storyboard mode at **3 custom shots** per video
92
+ - never split one micro gesture, blink, glance, prop touch, or tiny head turn into separate storyboard shots
93
+ - if a beat needs 4+ distinct phases, split into chained videos instead of bloating one generation
94
+ - every storyboard shot must earn a clear job: establish / action / reveal / reaction / resolve
95
+
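
The decision rule above reduces to a small mapping from the number of distinct editorial phases to an execution mode. In this sketch `single-transition` and `custom-storyboard` are the mode names used elsewhere in the toolkit, while the function name and the `split-into-chained-videos` label are hypothetical.

```python
def kling_execution_plan(editorial_phases):
    """Pick a Kling execution mode from the count of distinct editorial phases."""
    if editorial_phases < 1:
        raise ValueError("a beat needs at least one editorial phase")
    if editorial_phases == 1:
        return "single-transition"      # one motion arc, reveal, or emotional turn
    if editorial_phases <= 3:
        return "custom-storyboard"      # capped at 3 custom shots per video
    return "split-into-chained-videos"  # hypothetical label for the 4+ case


for phases in (1, 3, 5):
    print(phases, kling_execution_plan(phases))
```

Counting phases is the judgment call: micro gestures such as a blink or a glance should not be counted as separate phases.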
84
96
  ## Video Prompt Structure (Veo-Oriented Default Template)
85
97
 
86
98
  > Use this section when the active video model/routed section is **Veo 3.1**.
@@ -198,9 +210,16 @@ Avoid: distorted faces, morphing, bad anatomy, extra limbs/fingers, blurry, flic
198
210
  | **Establishing** | 8s | 10–15s | Set scene, allow environment |
199
211
  | **Complex transformation** | N/A | 15s (max) | Multi-step with single-scene continuity |
200
212
 
201
- > **Kling Note:** Duration maps directly to **transformation budget**. More change between start and end frames = longer duration needed. See `.agent/model-profile.md` for details.
202
-
203
- ---
213
+ > **Kling Note:** Duration maps directly to **transformation budget**. More change between start and end frames = longer duration needed. See `.agent/model-profile.md` for details.
214
+
215
+ ### Kling Anti-Fragmentation Rule
216
+
217
+ - do not turn every sentence or gesture into a new Kling storyboard phase
218
+ - if a single camera move can carry the beat cleanly, keep one shot and strengthen the motion path
219
+ - if the beat reads as setup -> action -> settle, `first -> then -> finally` is usually enough
220
+ - reserve multi-shot/storyboard mode for meaningful internal progression, not decorative complexity
221
+
222
+ ---
204
223
 
205
224
  ## Prompt Re-Take Strategy
206
225
 
@@ -34,6 +34,7 @@ All generated files must keep `SHOTNN.md` naming.
34
34
  1. Read `$OUTPUT_DIR/shots/SHOT[last].md`.
35
35
  2. Extract `SON FRAME` description.
36
36
  3. Capture character/location continuity details.
37
+ 4. Reuse `$OUTPUT_DIR/shot-plan.json -> voiceCast` and keep every existing `speakerKey` stable.
37
38
 
38
39
  ### 3. Continue Generation
39
40
 
@@ -49,6 +50,7 @@ If model is kling-3.0: keep Start+End transition mode and first/then/finally mot
49
50
  - Write each shot to `$OUTPUT_DIR/shots/SHOT[NN].md`
50
51
  - Keep one-file-per-shot contract
51
52
  - Ensure `ILK/İLK FRAME` code block exists even when chained
53
+ - Keep `Audio Plan` blocks aligned to the existing `voiceCast`
52
54
  - Update `$OUTPUT_DIR/_index.md`
53
55
 
54
56
  ### 5. Refresh Reports
@@ -32,6 +32,8 @@ All shot files follow `SHOTNN.md` naming.
32
32
  5. Verify chain continuity.
33
33
  6. Verify Avoid line on every prompt.
34
34
  7. Verify `Model Control` block exists in every shot file.
35
+ 8. Verify `shot-plan.json` has `voiceCast` coverage for every speaker or narrator.
36
+ 9. Verify every speaking VIDEO section has an `Audio Plan` block with valid `activeSpeakerKey`.
35
37
  10. For `kling-3.0`, verify:
36
38
  - `Transition Mode: Start+End`
37
39
  - CFG value is documented
@@ -83,6 +85,7 @@ Do not declare completion unless:
83
85
  - delivery report is pass
84
86
  - final summary says pass
85
87
  - model-specific checks pass (Kling or Veo profile)
88
+ - voice design contract passes (`voiceCast` + `Audio Plan` + single active speaker)
86
89
 
87
90
  ---
88
91
 
@@ -30,7 +30,7 @@ Generate production-ready shot prompts from scenario and **SAVE TO FILES**.
30
30
  ```text
31
31
  $OUTPUT_DIR/
32
32
  ├── project-info.md # Scenario, characters, settings
33
- ├── shot-plan.json # Single-agent plan + policy contract
33
+ ├── shot-plan.json # Single-agent plan + policy + voiceCast contract
34
34
  ├── shots/
35
35
  │ ├── SHOT01.md # First shot (main + coverage)
36
36
  │ ├── SHOT02.md # Second shot
@@ -62,7 +62,10 @@ $OUTPUT_DIR/
62
62
  - coverage strategy
63
63
  - model contract (`model`, `preset`, `transition_mode`)
64
64
  - `dialogue_name_policy` (`preserve-original-dialogue` or `anonymize-dialogue`)
65
- 10. Ensure directories exist: `$OUTPUT_DIR/`, `$OUTPUT_DIR/shots/`, `$OUTPUT_DIR/reports/`.
65
+ - top-level `voiceCast`
66
+ - voice defaults (`single_active_speaker`, `music_default`, `subtitles_default`)
67
+ 10. Ensure every speaker or narrator has a stable `speakerKey` before shot writing.
68
+ 11. Ensure directories exist: `$OUTPUT_DIR/`, `$OUTPUT_DIR/shots/`, `$OUTPUT_DIR/reports/`.
66
69
 
67
70
  ### 2. Batch Strategy
68
71
 
@@ -83,9 +86,16 @@ For EACH shot:
83
86
  1. Analyze scene type (Dialogue / Action / Emotional / Establishing).
84
87
  2. Build dramaturgy behavior (objective, obstacle, stakes, subtext, beat turns).
85
88
  3. If model is `kling-3.0`, perform Kling Frame Prep (see below).
86
- 4. Generate main shot prompts (`ILK/İLK FRAME`, `SON FRAME`, `VIDEO`).
89
+ 4. If model is `kling-3.0`, choose Kling execution mode:
90
+ - `single-transition` for one strong motion arc or reveal
91
+ - `custom-storyboard` only for 2-3 distinct editorial phases
92
+ 4. Reuse or create the correct `voiceCast` entry for any speaking character or narrator.
93
+ 5. Generate main shot prompts (`ILK/İLK FRAME`, `SON FRAME`, `VIDEO`).
87
94
  5. `ILK/İLK FRAME` section MUST always include a fenced code block.
88
95
  6. If chained, first-frame code block must explicitly state: `Use SHOT[prev]_END as exact first frame`.
96
+ 7. Write a machine-readable `Audio Plan` JSON block for every VIDEO section.
97
+ 8. If dialogue or voiceover exists, require `activeSpeakerKey`, `dialogueLines`, and `performanceNote`.
98
+ 9. Keep one active speaker per shot. Split reply dialogue across multiple shots.
89
99
  7. Enforce hard quality floor:
90
100
  - `ILK FRAME`: minimum 80 words
91
101
  - `SON FRAME`: minimum 80 words
@@ -98,11 +108,11 @@ For EACH shot:
98
108
  - explicit eyeline target / body orientation when a subject looks at someone or something
99
109
  - explicit shared light source / bounce logic when multiple subjects share frame
100
110
  - explicit depth/scale integration when more than one plane is visible
101
- 9. Generate coverage prompts (2-3 per main shot, min 70 words each).
102
- 10. Add Turkish summary for shot and each coverage section.
103
- 11. Apply model-specific generation gates (see below).
104
- 12. Write shot file: `$OUTPUT_DIR/shots/SHOT[NN].md`.
105
- 13. Update `$OUTPUT_DIR/_index.md`.
111
+ 10. Generate coverage prompts (2-3 per main shot, min 70 words each).
112
+ 11. Add Turkish summary for shot and each coverage section.
113
+ 12. Apply model-specific generation gates (see below).
114
+ 13. Write shot file: `$OUTPUT_DIR/shots/SHOT[NN].md`.
115
+ 14. Update `$OUTPUT_DIR/_index.md`.
106
116
 
107
117
  #### Kling Frame Prep (when model is kling-3.0)
108
118
 
@@ -113,9 +123,11 @@ Before writing prompts, design the Start→End transition:
113
123
  - 5s = 1 major change
114
124
  - 10s = 2-3 staged changes
115
125
  - 15s = complex multi-step
116
- 3. **Motion timeline:** Write 2-4 steps: `first → then → finally`.
117
- 4. **Face/hands stability:** Match orientations between start and end; avoid >45° face rotation.
118
- 5. **Camera safety:** Use only safe movements (slow push-in, pan, tilt, micro-sway, tripod-locked).
126
+ 3. **Execution mode:** Default to `single-transition`; use `custom-storyboard` only when the shot truly has 2-3 meaningful internal phases.
127
+ 4. **Motion timeline:** Write 2-4 steps: `first → then → finally`.
128
+ 5. **Face/hands stability:** Match orientations between start and end; avoid >45° face rotation.
129
+ 6. **Camera safety:** Use only safe movements (slow push-in, pan, tilt, micro-sway, tripod-locked).
130
+ 7. **Anti-fragmentation:** Do not turn one glance, gesture, or prop touch into separate micro-shots. If custom storyboard is used, cap it at 3 stages and make each stage editorially distinct.
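
The transformation budget in this list can be read as a lookup from the number of major changes to a clip duration. The sketch below is one interpretation: treating "complex multi-step" as four or more changes is an assumption, and `kling_duration_seconds` is a hypothetical helper.

```python
def kling_duration_seconds(major_changes):
    """Map the number of major start-to-end changes to a Kling duration."""
    if major_changes < 1:
        raise ValueError("a Kling transition needs at least one change")
    if major_changes == 1:
        return 5    # 5s = 1 major change
    if major_changes <= 3:
        return 10   # 10s = 2-3 staged changes
    return 15       # 15s = complex multi-step (assumed to mean 4 or more)


print([kling_duration_seconds(n) for n in (1, 2, 3, 4)])  # [5, 10, 10, 15]
```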
119
131
 
120
132
  #### Veo Gate (when model is veo31)
121
133
 
@@ -134,6 +146,8 @@ Before writing prompts, design the Start→End transition:
134
146
  - [ ] Start/end frames verified for same visual universe
135
147
  - [ ] Dialogue uses `"quotation marks"` with tone markers (Kling inline format)
136
148
  - [ ] Eyelines, plane map, and shared-light logic stay consistent across start/end frames
149
+ - [ ] Defaulted to `single-transition` unless 2-3 distinct internal phases genuinely require custom storyboard
150
+ - [ ] If custom storyboard is used: maximum 3 stages, no decorative micro-beats, and each stage changes function/framing/action
137
151
 
138
152
  ### 4. Validation Pass (Mandatory Before Completion)
139
153
 
@@ -190,6 +204,36 @@ Avoid: blurry, low-res, text, watermark, bad anatomy, distorted face ...
190
204
  Avoid: blurry, low-res, text, watermark, bad anatomy, distorted face ...
191
205
  ```
192
206
 
207
+ ### AUDIO PLAN
208
+
209
+ ```json
210
+ {
211
+ "language": "...",
212
+ "type": "...",
213
+ "activeSpeaker": "...",
214
+ "activeSpeakerKey": "...",
215
+ "dialogueLines": [],
216
+ "dialogueTranscript": "...",
217
+ "performanceNote": "...",
218
+ "delivery": {
219
+ "pace": "...",
220
+ "intensity": "...",
221
+ "emotion": "...",
222
+ "projection": "...",
223
+ "stability": "..."
224
+ },
225
+ "sfx": [],
226
+ "ambience": [],
227
+ "music": null,
228
+ "mixTarget": {
229
+ "dialogue": 0,
230
+ "sfx": 0,
231
+ "ambience": 0
232
+ },
233
+ "hasSubtitles": false
234
+ }
235
+ ```
236
+
193
237
  ### VIDEO
194
238
 
195
239
  #### Veo Format:
@@ -240,6 +284,10 @@ Avoid: ...
240
284
  Avoid: ...
241
285
  ```
242
286
 
287
+ ```json
288
+ [Coverage audioPlan JSON block - voice binding only when coverage actually contains speech]
289
+ ```
290
+
243
291
  ### SHOT[NN]B - [Type]
244
292
  [Same format]
245
293
 
@@ -14,15 +14,17 @@ $ARGUMENTS
14
14
  - delivery report is fail
15
15
  - missing required shot sections
16
16
  - continuity mismatch between neighboring shots
17
+ - missing `voiceCast` entry or broken `activeSpeakerKey` binding
17
18
 
18
19
  ## Recovery Steps
19
20
 
20
21
  1. Identify failed files and sections.
21
22
  2. Fix only affected shots first.
22
23
  3. Regenerate neighboring shots only if continuity requires it.
23
- 4. Re-run `/safety-check`.
24
- 5. Regenerate `$OUTPUT_DIR/reports/DELIVERY-REPORT.md`.
25
- 6. Update `$OUTPUT_DIR/_index.md` with recovered status.
24
+ 4. Repair `voiceCast` or `Audio Plan` bindings before rerunning reports.
25
+ 5. Re-run `/safety-check`.
26
+ 6. Regenerate `$OUTPUT_DIR/reports/DELIVERY-REPORT.md`.
27
+ 7. Update `$OUTPUT_DIR/_index.md` with recovered status.
26
28
 
27
29
  ## Exit Criteria
28
30
 
@@ -38,6 +38,10 @@ Validate all prompts before delivery to ensure platform compliance.
38
38
  - coverage has 2-3 sub-shots in same file
39
39
  - all prompts include Avoid line
40
40
  - all video prompts include full audio direction block
41
+ - `shot-plan.json` contains `voiceCast` when any shot has dialogue or narration
42
+ - every speaking VIDEO section includes an `Audio Plan` block
43
+ - every `activeSpeakerKey` resolves to `shot-plan.json -> voiceCast`
44
+ - no dialogue shot contains more than one active speaker
41
45
 
42
46
  3. Continuity checks
43
47
  - `SHOT[N]_END` aligns with `SHOT[N+1]_START`
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@milenyumai/film-kit",
3
- "version": "1.3.0",
3
+ "version": "1.4.1",
4
4
  "description": "Hollywood-standard cinematic prompt engineering toolkit with model profiles (Veo 3.1 / Kling 3.0). Auto-configures AI agents (Cursor, Claude Code, VS Code Copilot, Antigravity) with production-grade shot generation system.",
5
5
  "type": "module",
6
6
  "main": "./build/index.js",