@milenyumai/film-kit 1.4.0 → 1.4.2

package/README.md CHANGED
@@ -18,6 +18,7 @@ Film-Kit ships as a single repository with four npm packages:
  - native `.claude/agents/*`
  - cleanup of stale mode-specific Claude artifacts
  - Shared `spatial-blocking` skill for gaze, plane depth, light cohesion, compositing realism, and anti-miniature control.
+ - Voice-design aware audio contract with project-level `voiceCast`, shot-level `Audio Plan`, and backward-compatible `Audio direction` blocks.
  - Aligned quality gates across Claude Code, Cursor, Copilot, and Antigravity.
  - Stronger Kling 3.0 and Kling multi-shot guidance, including practical route rules and hard caps.
 
@@ -44,16 +44,18 @@ All rules, skills, and workflows are located under \`.agent/\`.
  ### Entry Points
  - **Master Rules:** \`.agent/MASTER.md\` — Complete production ruleset (690 lines)
  - **Architecture:** \`.agent/ARCHITECTURE.md\` — System map & quick reference
+ - **Voice Design:** \`.agent/VOICE-DESIGN.md\` — Project-level \`voiceCast\` + shot-level \`audioPlan\`
  - **Model Profile:** \`.agent/model-profile.md\` — Active model rules and constraints
  - **Agent:** \`.agent/agents/prompt-engineer.md\` — Senior prompt engineer agent
 
- ### Skills (8 modules)
+ ### Skills (9 modules)
  | Skill | Path | Priority |
  |-------|------|----------|
  | Safety Compliance | \`.agent/skills/safety-compliance/SKILL.md\` | P0 — ALWAYS |
  | Reference Locking | \`.agent/skills/reference-locking/SKILL.md\` | P1 — When refs provided |
  | Frame Chaining | \`.agent/skills/frame-chaining/SKILL.md\` | P2 — ALWAYS |
  | Spatial Blocking | \`.agent/skills/spatial-blocking/SKILL.md\` | P2 — Relational realism / gaze / depth |
+ | Semantic Consistency | \`.agent/skills/semantic-consistency/SKILL.md\` | P2 — ALWAYS, visual_world + physics gate |
  | Coverage System | \`.agent/skills/coverage-system/SKILL.md\` | P2 — ALWAYS (mandatory) |
  | Visual Modes | \`.agent/skills/visual-modes/SKILL.md\` | P4 — ALWAYS |
  | Audio Design | \`.agent/skills/audio-design/SKILL.md\` | P4 — When dialogue/SFX |
@@ -71,9 +73,10 @@ All rules, skills, and workflows are located under \`.agent/\`.
  ## Goal
  When the user asks \`/generate\`, convert the scenario into:
  - \`${config.outputDir}/project-info.md\` — Characters, settings, arc mapping
- - \`${config.outputDir}/shot-plan.json\` — Single-agent plan + policy contract
+ - \`${config.outputDir}/shot-plan.json\` — Single-agent plan + policy + \`voiceCast\` contract
  - \`${config.outputDir}/shots/SHOT01.md, SHOT02.md, ...\` — Production shot files (with coverage included)
  - \`${config.outputDir}/reports/SAFETY-REPORT.md\` — Safety gate result
+ - \`${config.outputDir}/reports/SEMANTIC-REPORT.md\` — Semantic consistency gate result
  - \`${config.outputDir}/reports/DELIVERY-REPORT.md\` — Delivery gate result
  - \`${config.outputDir}/_index.md\` — Shot list with chain & status tracking
 
@@ -88,6 +91,7 @@ Each \`SHOTNN.md\` is a **single file** containing ALL shot details:
  - 🔗 Chain status (FIRST / CHAINED / CHAIN BREAK)
  - İLK FRAME (start frame image prompt — min 60 words)
  - SON FRAME (end frame image prompt — min 60 words)
+ - AUDIO PLAN (machine-readable shot audio contract)
  - VİDEO (video prompt with audio direction — min 80 words)
  - COVERAGE SHOTS (2-3 coverage shots within same file — each min 60 words)
  - 🇹🇷 Turkish summary for each section
@@ -98,7 +102,9 @@ Each \`SHOTNN.md\` is a **single file** containing ALL shot details:
  - **Name Policy:** Visual prompts must stay anonymous. Dialogue naming follows \`shot-plan.json\` policy.
  - **AUTO-SAFETY:** Proactively reframe content that may trigger safety filters
  - **Frame Chaining:** Last frame of SHOT[N] = First frame of SHOT[N+1]
+ - **Semantic Consistency:** \`shot-plan.json.visual_world\` is canonical for perspective, named camera movement strategy, shadow vector, scale, reflection, physics, and seed strategy
  - **Coverage Mandatory:** Every main shot includes 2-3 coverage sub-shots in same file
+ - **Voice Design:** \`shot-plan.json\` keeps top-level \`voiceCast\`; every speaking VIDEO section keeps \`Audio Plan\`
  - **Music: NONE** by default (user must explicitly request)
  - **SLOW BURN:** 8-second duration, split actions into multiple shots
  `;
@@ -111,6 +117,7 @@ function buildCursorLegacyRules(config) {
  ## CRITICAL: SKILL LOADING PROTOCOL
  Before generating ANY prompts, read the appropriate skill files from .agent/skills/:
  Read .agent/model-profile.md first to apply model-specific rules.
+ Read .agent/VOICE-DESIGN.md when dialogue, narrator VO, or reusable speaker identity exists.
 
  | Skill | Path | When |
  |-------|------|------|
@@ -118,6 +125,7 @@ Read .agent/model-profile.md first to apply model-specific rules.
  | Reference Locking | .agent/skills/reference-locking/SKILL.md | When refs provided |
  | Frame Chaining | .agent/skills/frame-chaining/SKILL.md | Multi-shot projects |
  | Spatial Blocking | .agent/skills/spatial-blocking/SKILL.md | Multi-subject / gaze / scale-critical shots |
+ | Semantic Consistency | .agent/skills/semantic-consistency/SKILL.md | ALWAYS |
  | Coverage System | .agent/skills/coverage-system/SKILL.md | ALWAYS (mandatory) |
  | Visual Modes | .agent/skills/visual-modes/SKILL.md | All visual work |
  | Audio Design | .agent/skills/audio-design/SKILL.md | Dialogue/SFX needed |
@@ -133,8 +141,11 @@ Read .agent/model-profile.md first to apply model-specific rules.
  7. EVERY prompt must have an Avoid line. No exceptions.
  8. Coverage shots mandatory (2-3 per main shot, min 60 words each, included in same file).
  9. Frame chaining: Last frame of SHOT[N] = First frame of SHOT[N+1].
- 10. ILK/İLK FRAME section must contain a code block even for chained shots.
- 11. ONE FILE PER SHOT: Each SHOTNN.md contains main shot + all coverage shots.
+ 10. Semantic consistency: \`${config.outputDir}/shot-plan.json\` must include \`visual_world\`; prompts must align camera, named movement strategy, light/shadow vector, scale, reflections, physics, anatomy risk, and contextual logic.
+ 11. ILK/İLK FRAME section must contain a code block even for chained shots.
+ 12. Chained ILK/İLK FRAME code blocks must contain only: \`Use SHOT[prev]_END as exact first frame\`; any new visual prompt is a CHAIN BREAK.
+ 13. ONE FILE PER SHOT: Each SHOTNN.md contains main shot + all coverage shots.
+ 14. Keep top-level \`voiceCast\` in ${config.outputDir}/shot-plan.json and \`Audio Plan\` in every speaking VIDEO section.
 
  ## WORKFLOWS
  - /generate → Read .agent/workflows/generate.md
@@ -164,11 +175,12 @@ alwaysApply: true
  ## Entry Point
  Read \`.agent/MASTER.md\` for complete production ruleset.
  Read \`.agent/ARCHITECTURE.md\` for system overview.
+ Read \`.agent/VOICE-DESIGN.md\` for voice identity and shot audio contracts.
  Read \`.agent/model-profile.md\` for active model constraints.
 
  ## SKILL LOADING (MANDATORY)
  Before generating ANY prompts:
- 1. ALWAYS load: safety-compliance, frame-chaining, coverage-system, prompt-structure, visual-modes
+ 1. ALWAYS load: safety-compliance, frame-chaining, semantic-consistency, coverage-system, prompt-structure, visual-modes
  2. Load for relational realism: spatial-blocking
  3. Load if refs provided: reference-locking
  4. Load if dialogue/SFX: audio-design
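
The loading order in this hunk can be sketched as a small selection function. This is only an illustration: `selectSkills` and the context flags (`relationalStaging`, `hasReferences`, `hasDialogueOrSfx`) are hypothetical names, not part of Film-Kit. Only the skill names and the `.agent/skills/[name]/SKILL.md` path pattern are taken from the rules above.

```javascript
// Hypothetical helper, not part of Film-Kit's API.
// Skills the rules mark as "ALWAYS load" in 1.4.2.
const ALWAYS_SKILLS = [
  "safety-compliance",
  "frame-chaining",
  "semantic-consistency",
  "coverage-system",
  "prompt-structure",
  "visual-modes",
];

// Returns the SKILL.md paths to read for a given (assumed) context object.
function selectSkills(context = {}) {
  const names = [...ALWAYS_SKILLS];
  if (context.relationalStaging) names.push("spatial-blocking");
  if (context.hasReferences) names.push("reference-locking");
  if (context.hasDialogueOrSfx) names.push("audio-design");
  return names.map((name) => `.agent/skills/${name}/SKILL.md`);
}

console.log(selectSkills({ hasDialogueOrSfx: true }));
```

Keeping the always-on list in one array makes a change like the 1.4.2 addition of `semantic-consistency` a one-line edit.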
@@ -187,8 +199,12 @@ All skills at: \`.agent/skills/[name]/SKILL.md\`
  ## CRITICAL RULES
  - AUTO-ANONYMOUS: Replace ALL real names with physical descriptions
  - Dialogue naming follows \`${config.outputDir}/shot-plan.json\` policy
+ - \`shot-plan.json\` stores top-level \`voiceCast\`
+ - \`shot-plan.json\` stores top-level \`visual_world\` for camera/lens/camera-movement/light/shadow/scale/reflection/physics/seed strategy
+ - Every speaking VIDEO section includes \`Audio Plan\`
  - AUTO-SAFETY: Proactively reframe sensitive content
  - Frame chaining: Last frame SHOT[N] = First frame SHOT[N+1]
+ - Chained ILK/İLK FRAME code block contains only \`Use SHOT[prev]_END as exact first frame\`; any new visual prompt requires CHAIN BREAK
  - Coverage: 2-3 sub-shots per main shot (min 60 words each, in same file)
  - Avoid line: MANDATORY on every prompt
  - Music: NONE by default
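
The chained ILK/İLK FRAME rule added in this hunk is strict enough to check mechanically. A minimal sketch, assuming `[prev]` is rendered as a zero-padded shot number (for example `SHOT03`) and that a trailing period is tolerated, as in the shot-file template. The function name and return shape are hypothetical.

```javascript
// Hypothetical validator, not part of Film-Kit.
// Assumes `[prev]` is written as a shot number of two or more digits.
const REUSE_LINE = /^Use SHOT(\d{2,})_END as exact first frame\.?$/;

// shotNumber: number of the shot being checked (e.g. 4 for SHOT04).
// codeBlockText: text inside the ILK/İLK FRAME fenced code block.
function checkChainedFirstFrame(shotNumber, codeBlockText) {
  const match = REUSE_LINE.exec(codeBlockText.trim());
  if (!match) {
    return { ok: false, reason: "block has content beyond the reuse text; mark CHAIN BREAK" };
  }
  if (Number(match[1]) !== shotNumber - 1) {
    return { ok: false, reason: "reuse text does not point at the previous shot" };
  }
  return { ok: true };
}

console.log(checkChainedFirstFrame(4, "Use SHOT03_END as exact first frame"));
```

The regular expression is anchored at both ends without the multiline flag, so any extra prompt text on a following line fails the match.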
@@ -216,8 +232,9 @@ Single-agent shot package generation. Claude Code may use the native \`prompt-en
  ## Mandatory Read Order
  1. \`.agent/model-profile.md\`
  2. \`.agent/MASTER.md\`
- 3. \`.agent/ARCHITECTURE.md\`
- 4. \`.agent/agents/prompt-engineer.md\`
+ 3. \`.agent/VOICE-DESIGN.md\`
+ 4. \`.agent/ARCHITECTURE.md\`
+ 5. \`.agent/agents/prompt-engineer.md\`
  5. \`.claude/CLAUDE.md\`
  6. relevant files under \`.claude/rules/\`
 
@@ -232,6 +249,7 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
  - \`reference-locking/SKILL.md\` — When refs provided (P1)
  - \`frame-chaining/SKILL.md\` — ALWAYS for multi-shot (P2)
  - \`spatial-blocking/SKILL.md\` — when gaze / scale / compositing realism matters (P2)
+ - \`semantic-consistency/SKILL.md\` — ALWAYS, canonical \`visual_world\` + physics gate (P2)
  - \`coverage-system/SKILL.md\` — ALWAYS, mandatory (P2)
  - \`visual-modes/SKILL.md\` — ALWAYS (P4)
  - \`audio-design/SKILL.md\` — When dialogue/SFX (P4)
@@ -242,8 +260,10 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
  - Model: \`${config.model}\` (${getModelDisplayName(config.model)})
  - Kling preset: \`${config.klingPreset}\`
  - Create \`${config.outputDir}/project-info.md\`, \`${config.outputDir}/shot-plan.json\`, and \`${config.outputDir}/_index.md\`
+ - Keep top-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\`
+ - Keep top-level \`visual_world\` in \`${config.outputDir}/shot-plan.json\`
  - Write \`${config.outputDir}/shots/SHOTNN.md\` per shot; coverage stays in the same file
- - Refresh \`${config.outputDir}/reports/SAFETY-REPORT.md\` and \`${config.outputDir}/reports/DELIVERY-REPORT.md\` before \`/finish\`
+ - Refresh \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/SEMANTIC-REPORT.md\`, and \`${config.outputDir}/reports/DELIVERY-REPORT.md\` before \`/finish\`
 
  ## Non-Negotiables
  1. **AUTO-ANONYMOUS:** Replace ALL real person names in visual prompts with physical descriptions.
@@ -254,11 +274,14 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
  6. **Music:** NONE by default.
  7. **Avoid line:** MANDATORY on every prompt (image, video, coverage).
  8. **Coverage:** 2-3 sub-shots within same SHOTNN.md file, min 70 words each.
- 9. **ILK/İLK FRAME:** Always include a fenced code block, even when chained.
- 10. **Quality Floor:** ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words.
- 11. **Specificity Floor:** lens/framing, lighting, and foreground/midground/background action are mandatory.
- 12. **Spatial Realism Floor:** eyeline target, plane map, shared light source, and contact/depth cues are mandatory when relational staging matters.
- 13. **ONE FILE PER SHOT:** No separate coverage files.
+ 9. **Voice Design:** keep project-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and per-shot \`Audio Plan\` in each VIDEO section.
+ 10. **ILK/İLK FRAME:** Always include a fenced code block, even when chained.
+ 11. **Chained ILK/İLK FRAME:** code block contains only \`Use SHOT[prev]_END as exact first frame\`; any new visual prompt requires CHAIN BREAK.
+ 12. **Quality Floor:** ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words.
+ 13. **Specificity Floor:** lens/framing, lighting, and foreground/midground/background action are mandatory.
+ 14. **Spatial Realism Floor:** eyeline target, plane map, shared light source, and contact/depth cues are mandatory when relational staging matters.
+ 15. **Semantic Consistency Floor:** \`visual_world\`, perspective/geometry, shadow vector, scale map, reflections, gravity/contact physics, anatomy risk, foreground/background coherence, contextual contradictions, and targeted semantic avoid terms are mandatory.
+ 16. **ONE FILE PER SHOT:** No separate coverage files.
 
  ## Workflows
  | Command | Workflow |
@@ -283,8 +306,9 @@ This workspace keeps high-level policy in \`CLAUDE.md\` and operational detail i
  ## Before Any Generation or Repair
  1. Read \`.agent/model-profile.md\`
  2. Read \`.agent/MASTER.md\`
- 3. Read \`.agent/agents/prompt-engineer.md\`
- 4. Read \`.agent/workflows/generate.md\` or the requested workflow
+ 3. Read \`.agent/VOICE-DESIGN.md\`
+ 4. Read \`.agent/agents/prompt-engineer.md\`
+ 5. Read \`.agent/workflows/generate.md\` or the requested workflow
  5. Apply AUTO-ANONYMOUS and AUTO-SAFETY before drafting
  6. Prefer the active/selected markdown file as scenario source; fallback is \`${config.scenarioHint}\`
  7. Do not mark work complete while any required report is missing or fail
@@ -293,8 +317,12 @@ This workspace keeps high-level policy in \`CLAUDE.md\` and operational detail i
  - Write only inside \`${config.outputDir}\`
  - Keep one file per shot: \`${config.outputDir}/shots/SHOTNN.md\`
  - Maintain \`${config.outputDir}/shot-plan.json\` dialogue naming policy
+ - Maintain \`${config.outputDir}/shot-plan.json\` top-level \`voiceCast\`
+ - Maintain \`${config.outputDir}/shot-plan.json\` top-level \`visual_world\`
+ - Keep \`Audio Plan\` blocks aligned to \`voiceCast\`
  - Keep \`ILK/İLK FRAME\` in a fenced code block even when chained
  - Quality floor and specificity floor are hard gates, not suggestions
+ - Semantic consistency floor is a hard gate: camera/lens/camera-movement/light/shadow/scale/reflection/physics/anatomy/context must align to \`visual_world\`
  - Apply \`.agent/skills/spatial-blocking/SKILL.md\` whenever eyeline, compositing, or depth realism is critical
 
  ## Debugging
@@ -313,20 +341,26 @@ Use the Film-Kit core runtime.
  ## Read First
  1. \`.agent/model-profile.md\`
  2. \`.agent/MASTER.md\`
- 3. \`.agent/agents/prompt-engineer.md\`
- 4. \`.agent/workflows/generate.md\`
- 5. \`.claude/rules/output-contract.md\`
+ 3. \`.agent/VOICE-DESIGN.md\`
+ 4. \`.agent/agents/prompt-engineer.md\`
+ 5. \`.agent/workflows/generate.md\`
+ 6. \`.claude/rules/output-contract.md\`
 
  ## Responsibilities
  - draft and repair shot files under \`${config.outputDir}/shots/\`
  - apply \`${config.outputDir}/shot-plan.json\` dialogue naming policy
+ - maintain top-level \`voiceCast\` inside \`${config.outputDir}/shot-plan.json\`
+ - maintain top-level \`visual_world\` inside \`${config.outputDir}/shot-plan.json\`
+ - keep \`Audio Plan\` blocks valid against \`voiceCast\`
  - enforce AUTO-ANONYMOUS, AUTO-SAFETY, chaining, and coverage contracts
  - enforce quality floor: ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70
  - enforce specificity floor: lens/framing, lighting, and foreground/midground/background action
  - enforce spatial realism: explicit eyeline target, plane map, shared light source, and contact/depth cues when needed
+ - enforce semantic consistency: \`visual_world\`, perspective/geometry, shadow vector, scale map, reflection handling, physics/anatomy risk, foreground/background coherence, contextual contradictions, and scene-specific avoid terms
 
  ## Boundaries
  - do not skip safety or delivery reports
+ - do not pass chained ILK/İLK FRAME blocks that contain anything besides exact reuse text
  - do not split coverage into separate files
  - if asked to review only, report issues instead of regenerating shots by default
  `;
@@ -349,6 +383,8 @@ If using the native Claude subagent, read \`.claude/agents/prompt-engineer.md\`
  - Determine shot count based on action beats
  - Create \`${config.outputDir}/project-info.md\`
  - Create \`${config.outputDir}/shot-plan.json\`
+ - Add top-level \`voiceCast\` before writing speaking shots
+ - Add top-level \`visual_world\` before writing visual prompts
 
  2. **Batch Strategy:**
  - 1-10 shots → Generate all at once
@@ -358,10 +394,13 @@ If using the native Claude subagent, read \`.claude/agents/prompt-engineer.md\`
  3. **Per Shot (SINGLE FILE: SHOTNN.md):**
  - Analyze scene type (Dialogue / Action / Emotional / Establishing)
  - Generate main shot (İLK FRAME + SON FRAME + VİDEO)
+ - Add machine-readable \`Audio Plan\` before every VIDEO section
  - Keep İLK FRAME as fenced code block even when chained
+ - If chained, keep İLK FRAME code block to exact reuse text only; new visual prompt means CHAIN BREAK
  - Enforce hard quality floor: ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70
  - Enforce specificity floor: lens/framing + lighting + foreground/midground/background action
  - Enforce spatial realism floor: eyeline target + plane map + shared light source + contact/depth cues when applicable
+ - Enforce semantic consistency floor: perspective/geometry + shadow vector + scale map + reflections + gravity/contact physics + anatomy risk + contextual contradiction check
  - Generate 2-3 coverage shots (in same file)
  - Write to \`${config.outputDir}/shots/SHOT[NN].md\`
  - Update \`${config.outputDir}/_index.md\`
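
The hard quality floor in the per-shot step (ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words) can be illustrated with a small word-count check. This is a sketch under assumptions: the `sections` object shape and function names are hypothetical, and counting whitespace-separated tokens is a guess at how Film-Kit counts words. The diff does not say how the ILK floor applies to chained shots, so that case is not handled here.

```javascript
// Floors quoted from the rules above; the section keys are hypothetical.
const WORD_FLOORS = { ilk: 80, son: 80, video: 120, coverage: 70 };

// Counts whitespace-separated tokens (assumed counting method).
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// sections: { ilk: string, son: string, video: string, coverage: string[] }
// Returns a list of human-readable failures; empty means the floor is met.
function checkQualityFloor(sections) {
  const failures = [];
  for (const key of ["ilk", "son", "video"]) {
    const words = countWords(sections[key] ?? "");
    if (words < WORD_FLOORS[key]) {
      failures.push(`${key}: ${words} < ${WORD_FLOORS[key]}`);
    }
  }
  (sections.coverage ?? []).forEach((text, index) => {
    const words = countWords(text);
    if (words < WORD_FLOORS.coverage) {
      failures.push(`coverage[${index}]: ${words} < ${WORD_FLOORS.coverage}`);
    }
  });
  return failures;
}
```

Returning a list of failures rather than a boolean lets a caller report every section that falls short in one pass.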
@@ -369,6 +408,7 @@ If using the native Claude subagent, read \`.claude/agents/prompt-engineer.md\`
  4. **Validation Gates:**
  - Run /safety-check
  - Write \`${config.outputDir}/reports/SAFETY-REPORT.md\`
+ - Write \`${config.outputDir}/reports/SEMANTIC-REPORT.md\`
  - Write \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
  - If any gate fails, run \`.agent/workflows/recover.md\`
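
With `SEMANTIC-REPORT.md` added, the validation step now expects three report files. A minimal sketch of a presence check, assuming the caller already has a list of existing paths. `missingReports` is a hypothetical name, and the sketch only checks that the files exist, not that each gate passed.

```javascript
// Report file names taken from the validation gates above.
const REQUIRED_REPORTS = [
  "SAFETY-REPORT.md",
  "SEMANTIC-REPORT.md",
  "DELIVERY-REPORT.md",
];

// outputDir: the configured output directory (config.outputDir in the templates).
// existingPaths: paths the caller has already found on disk.
// Returns the required report paths that are not present.
function missingReports(outputDir, existingPaths) {
  const present = new Set(existingPaths);
  return REQUIRED_REPORTS
    .map((name) => `${outputDir}/reports/${name}`)
    .filter((path) => !present.has(path));
}

console.log(missingReports("out", ["out/reports/SAFETY-REPORT.md"]));
```

A non-empty result would correspond to the "gate fails" branch, where the workflow points to `.agent/workflows/recover.md`.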
 
@@ -386,11 +426,13 @@ function buildClaudeRuleOutputContract(config) {
 
  ## Required Files
  - \`${config.outputDir}/project-info.md\` — Characters, settings, emotional arc mapping, tension levels
- - \`${config.outputDir}/shot-plan.json\` — Name policy, shot plan, and validation contract
+ - \`${config.outputDir}/shot-plan.json\` — Name policy, shot plan, validation contract, top-level \`voiceCast\`, and top-level \`visual_world\`
  - \`.agent/model-profile.md\` — Active model constraints and presets
+ - \`.agent/VOICE-DESIGN.md\` — Voice identity and shot audio contract
  - \`${config.outputDir}/_index.md\` — Shot tracking with chain & status
  - \`${config.outputDir}/shots/SHOT01.md ... SHOTNN.md\` — Individual shot files (one file per shot)
  - \`${config.outputDir}/reports/SAFETY-REPORT.md\` — Safety gate report
+ - \`${config.outputDir}/reports/SEMANTIC-REPORT.md\` — Semantic consistency gate report
  - \`${config.outputDir}/reports/DELIVERY-REPORT.md\` — Delivery gate report
 
  ## Prompt Flow Order (MANDATORY)
@@ -403,7 +445,7 @@ The first sentence carries the most weight. Every prompt MUST follow this order:
  3. [Action] Micro-behavior, acting cue, physical gesture
  4. [Camera + Lens] "85mm f/2.0, shallow DOF, static with handheld micro-movement"
  5. [Light + Atmosphere] "Warm oil lamp key light from screen-left, deep shadows"
- 6. [Audio direction block] (VIDEO prompts only)
+ 6. [Audio Plan + Audio direction block] (VIDEO prompts only)
  7. [Avoid line] (EVERY prompt — MANDATORY)
  \`\`\`
 
@@ -413,6 +455,7 @@ The first sentence carries the most weight. Every prompt MUST follow this order:
  - Kling preset: \`${config.klingPreset}\`
  - Kling transition mode: Start+End (when model is kling-3.0)
  - Motion timeline: first → then → finally (when model is kling-3.0)
+ - Kling multi-shot mode: single-transition by default; custom storyboard only for 2-3 meaningful phases (when model is kling-3.0)
 
  ## Shot File Format (SHOTNN.md) — SINGLE FILE, ALL INCLUSIVE
 
@@ -428,10 +471,11 @@ FIRST SHOT / CHAINED from SHOT[prev]_END / CHAIN BREAK - Reason
  ## Main Shot
 
  ### İLK FRAME (SHOTNN_START)
- [If chained: "Use SHOT[prev]_END as first frame"]
+ [If chained: the code block below must contain only "Use SHOT[prev]_END as exact first frame"]
 
  > NOTE: Even when chained, this section MUST contain a fenced code block.
- > If chained, include: "Use SHOT[prev]_END as exact first frame."
+ > If chained, the fenced code block must contain only: "Use SHOT[prev]_END as exact first frame."
+ > Any new visual prompt in a chained ILK FRAME section requires CHAIN BREAK.
 
  \\\`\\\`\\\`
  [Image prompt — min 60 words, following prompt flow order]
@@ -447,6 +491,12 @@ Avoid: blurry, low-res, noise, distorted faces, bad anatomy, extra limbs/fingers
  plastic skin, waxy skin, on-screen text, watermark, logo, cartoon style, CGI look.
  \\\`\\\`\\\`
 
+ ### AUDIO PLAN
+
+ \\\`\\\`\\\`json
+ [Machine-readable audioPlan JSON block aligned with shot-plan.json -> voiceCast]
+ \\\`\\\`\\\`
+
  ### VİDEO
 
  \\\`\\\`\\\`
@@ -494,6 +544,10 @@ Audio direction:
  Avoid: distorted faces, morphing, blurry, flickering, unnatural motion, on-screen text.
  \\\`\\\`\\\`
 
+ \\\`\\\`\\\`json
+ [Coverage audioPlan JSON block - voice binding only when coverage contains speech]
+ \\\`\\\`\\\`
+
  ### SHOT[NN]B — [Type] | [Duration]s | [Icon] [Label]
  [Same format as A]
 
@@ -545,6 +599,8 @@ Total: 1 main + [N] coverage = [N+1] production shots
  ### Name Rule
  - Visual prompts and non-dialogue fields: no real names
  - Dialogue transcript naming follows \`shot-plan.json.dialogue_name_policy\`
+ - Every speaking shot must resolve \`activeSpeakerKey\` to \`shot-plan.json.voiceCast\`
+ - Keep one active speaker per shot
 
  ### 30° Rule
  Coverage camera angles MUST differ from main shot by at least 30°.
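
The two Name Rule lines added in this hunk (resolve `activeSpeakerKey` against `voiceCast`, and keep one active speaker per shot) can be sketched as a lookup. The diff does not show the schema, so the shapes here are assumptions: `voiceCast` is treated as an object keyed by speaker key, and `activeSpeakerKey` as a single string on the shot's audio plan. The function name is hypothetical, and the check is meant for speaking shots only.

```javascript
// Assumed shapes (not shown in the diff):
//   shotPlan.voiceCast          -> object keyed by speaker key
//   audioPlan.activeSpeakerKey  -> one string per speaking shot
// Returns a list of problems; an empty list means the speaker resolves.
function checkActiveSpeaker(shotPlan, audioPlan) {
  const voiceCast = shotPlan.voiceCast ?? {};
  const key = audioPlan.activeSpeakerKey;

  if (Array.isArray(key)) {
    return ["keep one active speaker per shot"];
  }
  if (typeof key !== "string" || key === "") {
    return ["speaking shot has no activeSpeakerKey"];
  }
  if (!Object.prototype.hasOwnProperty.call(voiceCast, key)) {
    return [`activeSpeakerKey "${key}" is not defined in voiceCast`];
  }
  return [];
}

const plan = { voiceCast: { narrator: {} } };
console.log(checkActiveSpeaker(plan, { activeSpeakerKey: "narrator" }));
```

Using `hasOwnProperty` rather than a plain property read avoids accidentally resolving keys such as `toString` that exist on every object.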
@@ -563,6 +619,11 @@ Character gaze directions must be spatially consistent between cuts.
  - Keep one motivated light source across subjects.
  - Add contact / weight / support cues to avoid pasted composite look.
 
+ ### Semantic Consistency
+ - \`shot-plan.json.visual_world\` is the canonical scene contract.
+ - Prompts must agree with its aspect ratio, camera height, lens family, horizon line, vanishing strategy, camera movement strategy, light source, shadow direction, color temperature, scale map, reflection risk, physics constraints, and seed strategy.
+ - Avoid contextual contradictions unless the prompt explicitly explains the unusual physics or style.
+
  ### Dramaturgy (for dialogue scenes)
  Analyze per character: Objective → Obstacle → Stakes → Subtext → Beat turns.
  Embed as physical behavior in prompts, NOT as metadata.
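
The Semantic Consistency section added in this hunk lists thirteen properties that `visual_world` is expected to carry. A minimal completeness check might look like the sketch below. The key names are guesses derived from the prose list (snake_case to match `visual_world`); the real schema is not shown in this diff and may differ. `missingVisualWorldFields` is a hypothetical name.

```javascript
// Key names are assumptions derived from the prose list in the rules above.
const VISUAL_WORLD_FIELDS = [
  "aspect_ratio",
  "camera_height",
  "lens_family",
  "horizon_line",
  "vanishing_strategy",
  "camera_movement_strategy",
  "light_source",
  "shadow_direction",
  "color_temperature",
  "scale_map",
  "reflection_risk",
  "physics_constraints",
  "seed_strategy",
];

// shotPlan: parsed contents of shot-plan.json.
// Returns the expected visual_world fields that are absent.
function missingVisualWorldFields(shotPlan) {
  const world = shotPlan.visual_world;
  if (world === null || typeof world !== "object") {
    return [...VISUAL_WORLD_FIELDS];
  }
  return VISUAL_WORLD_FIELDS.filter((field) => world[field] === undefined);
}

console.log(missingVisualWorldFields({ visual_world: { aspect_ratio: "16:9" } }).length);
```

A check like this only confirms that the contract is populated. Whether each prompt actually agrees with it is the job of the semantic gate and its report.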
@@ -590,6 +651,7 @@ function buildCopilotInstructions(config) {
  ### System Entry Point
  Read \`.agent/MASTER.md\` for the complete production ruleset.
  Read \`.agent/ARCHITECTURE.md\` for system overview.
+ Read \`.agent/VOICE-DESIGN.md\` for voice identity and shot audio contracts.
  Read \`.agent/model-profile.md\` for active model rules.
 
  ### Skill Loading Protocol (MANDATORY)
@@ -598,10 +660,11 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
  2. \`reference-locking/SKILL.md\` — When reference images provided
  3. \`frame-chaining/SKILL.md\` — ALWAYS for multi-shot continuity
  4. \`spatial-blocking/SKILL.md\` — When gaze / depth / scale realism is critical
- 5. \`coverage-system/SKILL.md\` — ALWAYS (mandatory coverage shots)
- 6. \`visual-modes/SKILL.md\` — ALWAYS (Ultra Realism default)
- 7. \`audio-design/SKILL.md\` — When dialogue or SFX needed
- 8. \`prompt-structure/SKILL.md\` — ALWAYS (prompt templates)
+ 5. \`semantic-consistency/SKILL.md\` — ALWAYS (visual_world + semantic QA)
+ 6. \`coverage-system/SKILL.md\` — ALWAYS (mandatory coverage shots)
+ 7. \`visual-modes/SKILL.md\` — ALWAYS (Ultra Realism default)
+ 8. \`audio-design/SKILL.md\` — When dialogue or SFX needed
+ 9. \`prompt-structure/SKILL.md\` — ALWAYS (prompt templates)
 
  ### When User Asks /generate
  1. Read \`.agent/workflows/generate.md\` for the full procedure
@@ -610,17 +673,22 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
  4. Create index: \`${config.outputDir}/_index.md\`
  5. Create project info: \`${config.outputDir}/project-info.md\`
  6. Create plan: \`${config.outputDir}/shot-plan.json\`
- 7. Write reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
+ 7. Keep top-level \`voiceCast\` in the plan and \`Audio Plan\` in speaking VIDEO sections
+ 8. Keep top-level \`visual_world\` in the plan for camera/lens/camera-movement/light/shadow/scale/reflection/physics/seed rules
+ 9. Write reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/SEMANTIC-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
 
  ### Critical Rules
  - **AUTO-ANONYMOUS:** Replace ALL real names with physical descriptions
  - **Name Policy:** Dialogue naming follows \`${config.outputDir}/shot-plan.json\` policy
  - **AUTO-SAFETY:** Proactively reframe sensitive content
  - **Frame Chaining:** Last frame of SHOT[N] = First frame of SHOT[N+1]
+ - **Chain Hardening:** chained ILK/İLK FRAME code block contains only \`Use SHOT[prev]_END as exact first frame\`
  - **Coverage:** 2-3 sub-shots per main shot (in same file, min 60 words each)
  - **Spatial Realism:** eyeline targets, shared light, depth scale, and anti-cutout staging must agree when subjects share frame
+ - **Semantic Consistency:** \`visual_world\` controls perspective/geometry, shadow vector, scale map, reflections, physics, anatomy risk, background coherence, and contextual contradictions
  - **Avoid Line:** MANDATORY on every prompt
  - **Music:** NONE by default
+ - **Voice Design:** keep \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and \`Audio Plan\` in speaking VIDEO sections
  - **Duration:** 8s default, slow burn pacing
  - **Language:** Prompts in English, dialogue preserved
  - **ILK/İLK FRAME:** keep fenced code block even when chained
@@ -650,9 +718,10 @@ When request is /generate, follow the Film-Kit Hollywood production system:
  3. Load required skills from \`.agent/skills/\`
  4. Transform scenario into production shot package at \`${config.outputDir}\`
  5. Generate: project-info.md, shot-plan.json, _index.md, shots/SHOT01.md..SHOTNN.md
- 6. Each SHOTNN.md: İLK FRAME + SON FRAME + VİDEO + 2-3 Coverage (ALL IN ONE FILE)
- 7. Enforce: auto-anonymous, dialogue name policy, auto-safety, frame chaining, avoid lines
- 8. Write reports to \`${config.outputDir}/reports/\` before /finish
+ 6. Keep top-level \`voiceCast\` and \`visual_world\` in shot-plan.json
+ 7. Each SHOTNN.md: İLK FRAME + SON FRAME + AUDIO PLAN + VİDEO + 2-3 Coverage (ALL IN ONE FILE)
+ 8. Enforce: auto-anonymous, dialogue name policy, auto-safety, frame chaining, semantic consistency, avoid lines
+ 9. Write reports to \`${config.outputDir}/reports/\` before /finish
  `;
  }
  /* ---------- ANTIGRAVITY ---------- */
@@ -667,6 +736,7 @@ description: Hollywood-standard cinematic prompt engineering with model-aware pr
  ## System Architecture
  This skill is part of the Film-Kit prompt engineering system.
  Read \`.agent/MASTER.md\` for the complete production ruleset (690+ rules).
+ Read \`.agent/VOICE-DESIGN.md\` for project-level \`voiceCast\` and shot-level \`audioPlan\`.
  Read \`.agent/model-profile.md\` first for active model constraints.
 
  ## Skill Loading Protocol
@@ -675,10 +745,11 @@ Before generating ANY prompts, read these skills:
  2. \`.agent/skills/reference-locking/SKILL.md\` — When refs provided
  3. \`.agent/skills/frame-chaining/SKILL.md\` — ALWAYS
  4. \`.agent/skills/spatial-blocking/SKILL.md\` — When gaze / depth / scale realism is critical
- 5. \`.agent/skills/coverage-system/SKILL.md\` — ALWAYS (mandatory)
- 6. \`.agent/skills/visual-modes/SKILL.md\` — ALWAYS
- 7. \`.agent/skills/audio-design/SKILL.md\` — When dialogue/SFX
- 8. \`.agent/skills/prompt-structure/SKILL.md\` — ALWAYS
+ 5. \`.agent/skills/semantic-consistency/SKILL.md\` — ALWAYS (visual_world + semantic QA)
+ 6. \`.agent/skills/coverage-system/SKILL.md\` — ALWAYS (mandatory)
+ 7. \`.agent/skills/visual-modes/SKILL.md\` — ALWAYS
+ 8. \`.agent/skills/audio-design/SKILL.md\` — When dialogue/SFX
+ 9. \`.agent/skills/prompt-structure/SKILL.md\` — ALWAYS
 
  ## Workflows
  | Command | Workflow |
@@ -700,7 +771,9 @@ Before generating ANY prompts, read these skills:
  - Shot index: \`${config.outputDir}/_index.md\`
  - Project info: \`${config.outputDir}/project-info.md\`
  - Plan: \`${config.outputDir}/shot-plan.json\`
- - Reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
+ - Voice contract: top-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\`
+ - Semantic contract: top-level \`visual_world\` in \`${config.outputDir}/shot-plan.json\`
+ - Reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/SEMANTIC-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
 
  ## Critical Rules
  1. **AUTO-ANONYMOUS:** Replace ALL real person names with physical descriptions
@@ -709,12 +782,15 @@ Before generating ANY prompts, read these skills:
  4. **Frame Chaining:** Last frame of SHOT[N] becomes first frame of SHOT[N+1]
  5. **Coverage Mandatory:** 2-3 sub-shots per main shot (in same file, min 60 words each)
  6. **Avoid Line:** MANDATORY on every prompt (image + video + coverage)
- 7. **Music: NONE** by default
- 8. **Ultra Realism** default visual mode
- 9. **8s duration** default, slow burn pacing
- 10. **ILK/İLK FRAME:** always keep fenced code block
- 11. **ONE FILE PER SHOT:** SHOTNN.md contains main shot + all coverage
- 12. **Relational Realism:** preserve eyeline targets, shared light, depth scale, and anti-cutout staging when multiple subjects share frame
+ 7. **Voice Design:** keep \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and \`Audio Plan\` in every speaking VIDEO section
+ 8. **Music: NONE** by default
+ 9. **Ultra Realism** default visual mode
+ 10. **8s duration** default, slow burn pacing
+ 11. **ILK/İLK FRAME:** always keep fenced code block
+ 12. **Chained ILK/İLK FRAME:** code block contains only \`Use SHOT[prev]_END as exact first frame\`; any new visual prompt is CHAIN BREAK
+ 13. **ONE FILE PER SHOT:** SHOTNN.md contains main shot + all coverage
+ 14. **Relational Realism:** preserve eyeline targets, shared light, depth scale, and anti-cutout staging when multiple subjects share frame
+ 15. **Semantic Consistency:** preserve \`visual_world\` perspective, shadow vector, scale map, reflections, gravity/contact physics, anatomy risk, foreground/background coherence, and contextual logic
 
  ## Quality Floor (Hard Gate)
  Reject and regenerate any shot that fails:
@@ -727,6 +803,7 @@ Reject and regenerate any shot that fails:
  - missing explicit foreground/midground/background action details
  - missing explicit eyeline target or \`not camera\` instruction when gaze matters
  - missing explicit shared light source / depth / contact cues in multi-subject shots
+ - missing semantic consistency anchors: perspective/geometry, shadow vector, scale map, reflection handling, gravity/contact physics, anatomy risk, foreground/background coherence, contextual contradiction check
 
  ## Reject Weak Prompt Style
  Do not accept generic filler language:
@@ -829,7 +906,7 @@ The more aligned these are, the cleaner the transition:
  - Hand pose and finger count should be similar in both frames
  - Avoid end frames with extreme mouth positions if speech is not intended
 
- **Loop shortcut:** Set Start = End (same image). Prompt: "seamless loop" + simple camera movement (e.g., roll 360, slow push-in).
+ **Loop shortcut:** Set Start = End (same image). Prompt: "seamless loop" + simple camera movement (e.g., roll 360, Dolly In).
 
  ### Transformation Budget
 
@@ -843,6 +920,16 @@ Limit the amount of change per clip for realism:
 
  > If end frame is too different, the model must "invent miracles" → plastic/melting/warp risk rises.
 
+ ### Kling Storyboard / Multi-Shot Decision Tree
+
+ The official Kling app surface exposes custom storyboard-like shot prompting and optional end-frame anchoring.
+ Use it as a selective upgrade, not a default:
+ - **single-transition**: default mode for one clean action arc, reveal, or emotional turn
+ - **custom-storyboard**: only when one clip truly needs **2-3 editorially distinct phases**
+ - **hard cap in this toolkit:** 3 custom storyboard shots per video generation
+ - **never storyboard micro-beats:** one glance, one finger move, one tiny prop touch, one breath shift
+ - if the beat needs 4+ phases, split into chained videos instead of overloading one Kling prompt
+
  ### Golden Prompt Skeleton (Start+End)
 
  The prompt's job is to tell the model **how to generate the in-between frames**:
@@ -874,10 +961,11 @@ These prevent the model from taking shortcuts.
  More complex camera = more warp risk.
 
  **Safest commands (highest success rate):**
- - slow push-in / pull-back
- - pan left/right
- - tilt up/down
- - gentle handheld micro-sway
+ - Dolly In / Dolly Out
+ - Pan Left / Pan Right
+ - Tilt Up / Tilt Down
+ - Tracking Shot or Steadicam Movement for smooth follow
+ - Handheld Movement with gentle micro-sway
  - roll 360 (especially for loops)
 
  **Stabilization trick:** Writing "tripod-locked" reduces background jitter.
@@ -984,26 +1072,22 @@ When using a reference image as the start frame, the model extracts lighting and
 
  ### Multi-Shot Protocol (Tek Üretimde Çoklu Çekim)
 
- Kling 3.0 can manage **2 to 6 shots** in a single 15-second generation.
+ Treat Kling multi-shot as **storyboarded internal progression**, not as a license to over-cut.
 
  **Rules:**
- 1. **Shot Prefix:** Each shot MUST begin with \`Shot X,\` (e.g., \`Shot 1,\`, \`Shot 2,\`)
- 2. **Character Continuity:** Repeat physical descriptions at the start of each shot, or use **Element Binding** (see below)
- 3. **Time Distribution:** 15 seconds must be logically divided (e.g., Shot 1: 3s, Shot 2: 7s, Shot 3: 5s)
- 4. **Maximum:** 6 shots per generation; for more, use separate generations with chaining
-
- Official examples favor short, concrete camera-and-action openings. Start with the movement path or subject action, then lock the stable identity/background constraints.
-
- **Example Multi-Shot Prompt:**
- \`\`\`
- Shot 1, Close-up of a bearded man in his 40s, wearing a dusty military uniform. He stares ahead, jaw clenched. Tripod-locked, 85mm, shallow DOF. (3s)
-
- Shot 2, Medium shot, same bearded man turns toward a younger soldier standing behind him. Slow pan right to reveal the young soldier. Natural handheld micro-sway. (5s)
+ 1. Default to **single-transition** whenever one camera path can carry the beat.
+ 2. Use custom storyboard only for **2-3 distinct internal phases** with different framing/action purpose.
+ 3. Keep each storyboard phase short, clear, and concrete: what changed, what stayed locked, why the cut exists.
+ 4. Hard cap for this toolkit: **3 storyboard shots per generation** in app.kling-oriented workflows.
+ 5. If the sequence wants 4+ phases, split into multiple chained generations.
+ 6. Never assign separate storyboard shots to micro-actions that would read better inside one stronger shot.
 
- Shot 3, Over-the-shoulder from the younger soldier's POV, the bearded man speaks directly to camera. Gentle push-in. (4s)
+ **Preferred storyboard jobs:**
+ - Shot 1: establish or setup
+ - Shot 2: main action / reveal / shift
+ - Shot 3: reaction / settle / handoff
 
- Shot 4, Wide shot establishing the artillery position, both soldiers visible. Crane rise. (3s)
- \`\`\`
+ Official Kling surfaces favor short, concrete shot descriptions. Start from the action path or camera intention, then lock continuity anchors (identity, wardrobe, background geometry, stable light).
 
  ### Element Binding (Öğe Bağlama — Karakter/Nesne Tutarlılığı)
 
@@ -1019,17 +1103,23 @@ Element Binding is Kling 3.0's built-in technology for maintaining character and
  - For **multi-shot** sequences: Prefer Element Binding when available
  - **Fallback:** If Element Binding is not available in your interface, manually repeat: character age, distinctive features, costume, and key proportions at each shot's start
 
- ### Advanced Camera Vocabulary (Kling vCoT Triggers)
+ ### Advanced Camera Vocabulary (24-Move Cinematic Lexicon)
+
+ These professional terms activate Kling's "Visual Chain-of-Thought" (vCoT) for more precise results. Use one named movement per shot unless a motivated compound move is required:
 
- These professional terms activate Kling's "Visual Chain-of-Thought" (vCoT) for more precise results:
+ | Group | Movements |
+ |-------|-----------|
+ | **Physical push/pull** | Dolly In, Dolly Out |
+ | **Locked head rotation** | Pan Left, Pan Right, Tilt Up, Tilt Down |
+ | **Physical lateral/vertical travel** | Truck Left, Truck Right, Pedestal Up, Pedestal Down |
+ | **Arc/parallax** | Arc Left, Arc Right, Tracking Shot, Leading Shot, Following Shot |
+ | **Dynamic stabilization** | Whip Pan, Handheld Movement, Steadicam Movement |
+ | **Angle/subjective** | Canted Angle (Dutch Angle), Point of View (POV) |
+ | **Optical/composite** | Zoom In, Zoom Out, Dolly Zoom (Vertigo Effect), Crane/Jib Shot |
 
- | Category | Terms |
- |----------|-------|
- | **Angles** | Low-angle hero shot, Dutch angle (tilted horizon), POV (subjective), Bird's-eye view (top-down) |
- | **Movements** | Dolly push-in, Orbit (360° rotation), Lateral pan, Tracking, Spiral up |
- | **Hybrid** | Dolly Zoom (Vertigo effect — zoom in while pulling back), Move Left and Zoom In (simultaneous) |
+ Aliases: Dolly push-in = Dolly In, Dolly pull-out = Dolly Out, Orbit = Arc Left/Right, Lateral slide = Truck Left/Right, Crane rise/descend = Crane/Jib Shot. Rack focus is a focus move, not camera travel.
 
- > **Tip:** Hybrid movements like Dolly Zoom trigger stronger vCoT processing and produce more cinematic results, but increase warp risk. Use with wider CFG (0.50-0.60).
+ > **Tip:** Hybrid movements like Dolly Zoom, Truck plus Pan, or Crane/Jib plus Arc trigger stronger vCoT processing and produce more cinematic results, but increase warp risk. Use with wider CFG (0.50-0.60).
 
  ### Native Audio & Dialogue (Kling-Specific)