@milenyumai/film-kit 1.4.0 → 1.4.1
- package/README.md +1 -0
- package/build/lib/templates.js +89 -43
- package/content/ARCHITECTURE.md +4 -1
- package/content/MASTER.md +48 -19
- package/content/RULES.md +19 -8
- package/content/VOICE-DESIGN.md +248 -0
- package/content/agents/prompt-engineer.md +50 -27
- package/content/skills/audio-design/SKILL.md +140 -13
- package/content/skills/prompt-structure/SKILL.md +22 -3
- package/content/workflows/chain.md +2 -0
- package/content/workflows/finish.md +3 -0
- package/content/workflows/generate.md +59 -11
- package/content/workflows/recover.md +5 -3
- package/content/workflows/safety-check.md +4 -0
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -18,6 +18,7 @@ Film-Kit ships as a single repository with four npm packages:
|
|
|
18
18
|
- native `.claude/agents/*`
|
|
19
19
|
- cleanup of stale mode-specific Claude artifacts
|
|
20
20
|
- Shared `spatial-blocking` skill for gaze, plane depth, light cohesion, compositing realism, and anti-miniature control.
|
|
21
|
+
- Voice-design aware audio contract with project-level `voiceCast`, shot-level `Audio Plan`, and backward-compatible `Audio direction` blocks.
|
|
21
22
|
- Aligned quality gates across Claude Code, Cursor, Copilot, and Antigravity.
|
|
22
23
|
- Stronger Kling 3.0 and Kling multi-shot guidance, including practical route rules and hard caps.
|
|
23
24
|
|
package/build/lib/templates.js
CHANGED
|
@@ -44,6 +44,7 @@ All rules, skills, and workflows are located under \`.agent/\`.
|
|
|
44
44
|
### Entry Points
|
|
45
45
|
- **Master Rules:** \`.agent/MASTER.md\` — Complete production ruleset (690 lines)
|
|
46
46
|
- **Architecture:** \`.agent/ARCHITECTURE.md\` — System map & quick reference
|
|
47
|
+
- **Voice Design:** \`.agent/VOICE-DESIGN.md\` — Project-level \`voiceCast\` + shot-level \`audioPlan\`
|
|
47
48
|
- **Model Profile:** \`.agent/model-profile.md\` — Active model rules and constraints
|
|
48
49
|
- **Agent:** \`.agent/agents/prompt-engineer.md\` — Senior prompt engineer agent
|
|
49
50
|
|
|
@@ -71,7 +72,7 @@ All rules, skills, and workflows are located under \`.agent/\`.
|
|
|
71
72
|
## Goal
|
|
72
73
|
When the user asks \`/generate\`, convert the scenario into:
|
|
73
74
|
- \`${config.outputDir}/project-info.md\` — Characters, settings, arc mapping
|
|
74
|
-
- \`${config.outputDir}/shot-plan.json\` — Single-agent plan + policy contract
|
|
75
|
+
- \`${config.outputDir}/shot-plan.json\` — Single-agent plan + policy + \`voiceCast\` contract
|
|
75
76
|
- \`${config.outputDir}/shots/SHOT01.md, SHOT02.md, ...\` — Production shot files (with coverage included)
|
|
76
77
|
- \`${config.outputDir}/reports/SAFETY-REPORT.md\` — Safety gate result
|
|
77
78
|
- \`${config.outputDir}/reports/DELIVERY-REPORT.md\` — Delivery gate result
|
|
@@ -88,6 +89,7 @@ Each \`SHOTNN.md\` is a **single file** containing ALL shot details:
|
|
|
88
89
|
- 🔗 Chain status (FIRST / CHAINED / CHAIN BREAK)
|
|
89
90
|
- İLK FRAME (start frame image prompt — min 60 words)
|
|
90
91
|
- SON FRAME (end frame image prompt — min 60 words)
|
|
92
|
+
- AUDIO PLAN (machine-readable shot audio contract)
|
|
91
93
|
- VİDEO (video prompt with audio direction — min 80 words)
|
|
92
94
|
- COVERAGE SHOTS (2-3 coverage shots within same file — each min 60 words)
|
|
93
95
|
- 🇹🇷 Turkish summary for each section
|
|
@@ -99,6 +101,7 @@ Each \`SHOTNN.md\` is a **single file** containing ALL shot details:
|
|
|
99
101
|
- **AUTO-SAFETY:** Proactively reframe content that may trigger safety filters
|
|
100
102
|
- **Frame Chaining:** Last frame of SHOT[N] = First frame of SHOT[N+1]
|
|
101
103
|
- **Coverage Mandatory:** Every main shot includes 2-3 coverage sub-shots in same file
|
|
104
|
+
- **Voice Design:** \`shot-plan.json\` keeps top-level \`voiceCast\`; every speaking VIDEO section keeps \`Audio Plan\`
|
|
102
105
|
- **Music: NONE** by default (user must explicitly request)
|
|
103
106
|
- **SLOW BURN:** 8-second duration, split actions into multiple shots
|
|
104
107
|
`;
|
|
@@ -111,6 +114,7 @@ function buildCursorLegacyRules(config) {
|
|
|
111
114
|
## CRITICAL: SKILL LOADING PROTOCOL
|
|
112
115
|
Before generating ANY prompts, read the appropriate skill files from .agent/skills/:
|
|
113
116
|
Read .agent/model-profile.md first to apply model-specific rules.
|
|
117
|
+
Read .agent/VOICE-DESIGN.md when dialogue, narrator VO, or reusable speaker identity exists.
|
|
114
118
|
|
|
115
119
|
| Skill | Path | When |
|
|
116
120
|
|-------|------|------|
|
|
@@ -135,6 +139,7 @@ Read .agent/model-profile.md first to apply model-specific rules.
|
|
|
135
139
|
9. Frame chaining: Last frame of SHOT[N] = First frame of SHOT[N+1].
|
|
136
140
|
10. ILK/İLK FRAME section must contain a code block even for chained shots.
|
|
137
141
|
11. ONE FILE PER SHOT: Each SHOTNN.md contains main shot + all coverage shots.
|
|
142
|
+
12. Keep top-level \`voiceCast\` in ${config.outputDir}/shot-plan.json and \`Audio Plan\` in every speaking VIDEO section.
|
|
138
143
|
|
|
139
144
|
## WORKFLOWS
|
|
140
145
|
- /generate → Read .agent/workflows/generate.md
|
|
@@ -164,6 +169,7 @@ alwaysApply: true
|
|
|
164
169
|
## Entry Point
|
|
165
170
|
Read \`.agent/MASTER.md\` for complete production ruleset.
|
|
166
171
|
Read \`.agent/ARCHITECTURE.md\` for system overview.
|
|
172
|
+
Read \`.agent/VOICE-DESIGN.md\` for voice identity and shot audio contracts.
|
|
167
173
|
Read \`.agent/model-profile.md\` for active model constraints.
|
|
168
174
|
|
|
169
175
|
## SKILL LOADING (MANDATORY)
|
|
@@ -187,6 +193,8 @@ All skills at: \`.agent/skills/[name]/SKILL.md\`
|
|
|
187
193
|
## CRITICAL RULES
|
|
188
194
|
- AUTO-ANONYMOUS: Replace ALL real names with physical descriptions
|
|
189
195
|
- Dialogue naming follows \`${config.outputDir}/shot-plan.json\` policy
|
|
196
|
+
- \`shot-plan.json\` stores top-level \`voiceCast\`
|
|
197
|
+
- Every speaking VIDEO section includes \`Audio Plan\`
|
|
190
198
|
- AUTO-SAFETY: Proactively reframe sensitive content
|
|
191
199
|
- Frame chaining: Last frame SHOT[N] = First frame SHOT[N+1]
|
|
192
200
|
- Coverage: 2-3 sub-shots per main shot (min 60 words each, in same file)
|
|
@@ -216,8 +224,9 @@ Single-agent shot package generation. Claude Code may use the native \`prompt-en
|
|
|
216
224
|
## Mandatory Read Order
|
|
217
225
|
1. \`.agent/model-profile.md\`
|
|
218
226
|
2. \`.agent/MASTER.md\`
|
|
219
|
-
3. \`.agent/
|
|
220
|
-
4. \`.agent/
|
|
227
|
+
3. \`.agent/VOICE-DESIGN.md\`
|
|
228
|
+
4. \`.agent/ARCHITECTURE.md\`
|
|
229
|
+
5. \`.agent/agents/prompt-engineer.md\`
|
|
221
230
|
5. \`.claude/CLAUDE.md\`
|
|
222
231
|
6. relevant files under \`.claude/rules/\`
|
|
223
232
|
|
|
@@ -242,6 +251,7 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
|
|
|
242
251
|
- Model: \`${config.model}\` (${getModelDisplayName(config.model)})
|
|
243
252
|
- Kling preset: \`${config.klingPreset}\`
|
|
244
253
|
- Create \`${config.outputDir}/project-info.md\`, \`${config.outputDir}/shot-plan.json\`, and \`${config.outputDir}/_index.md\`
|
|
254
|
+
- Keep top-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\`
|
|
245
255
|
- Write \`${config.outputDir}/shots/SHOTNN.md\` per shot; coverage stays in the same file
|
|
246
256
|
- Refresh \`${config.outputDir}/reports/SAFETY-REPORT.md\` and \`${config.outputDir}/reports/DELIVERY-REPORT.md\` before \`/finish\`
|
|
247
257
|
|
|
@@ -254,11 +264,12 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
|
|
|
254
264
|
6. **Music:** NONE by default.
|
|
255
265
|
7. **Avoid line:** MANDATORY on every prompt (image, video, coverage).
|
|
256
266
|
8. **Coverage:** 2-3 sub-shots within same SHOTNN.md file, min 70 words each.
|
|
257
|
-
9. **
|
|
258
|
-
10. **
|
|
259
|
-
11. **
|
|
260
|
-
12. **
|
|
261
|
-
13. **
|
|
267
|
+
9. **Voice Design:** keep project-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and per-shot \`Audio Plan\` in each VIDEO section.
|
|
268
|
+
10. **ILK/İLK FRAME:** Always include a fenced code block, even when chained.
|
|
269
|
+
11. **Quality Floor:** ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words.
|
|
270
|
+
12. **Specificity Floor:** lens/framing, lighting, and foreground/midground/background action are mandatory.
|
|
271
|
+
13. **Spatial Realism Floor:** eyeline target, plane map, shared light source, and contact/depth cues are mandatory when relational staging matters.
|
|
272
|
+
14. **ONE FILE PER SHOT:** No separate coverage files.
|
|
262
273
|
|
|
263
274
|
## Workflows
|
|
264
275
|
| Command | Workflow |
|
|
@@ -283,8 +294,9 @@ This workspace keeps high-level policy in \`CLAUDE.md\` and operational detail i
|
|
|
283
294
|
## Before Any Generation or Repair
|
|
284
295
|
1. Read \`.agent/model-profile.md\`
|
|
285
296
|
2. Read \`.agent/MASTER.md\`
|
|
286
|
-
3. Read \`.agent/
|
|
287
|
-
4. Read \`.agent/
|
|
297
|
+
3. Read \`.agent/VOICE-DESIGN.md\`
|
|
298
|
+
4. Read \`.agent/agents/prompt-engineer.md\`
|
|
299
|
+
5. Read \`.agent/workflows/generate.md\` or the requested workflow
|
|
288
300
|
5. Apply AUTO-ANONYMOUS and AUTO-SAFETY before drafting
|
|
289
301
|
6. Prefer the active/selected markdown file as scenario source; fallback is \`${config.scenarioHint}\`
|
|
290
302
|
7. Do not mark work complete while any required report is missing or fail
|
|
@@ -293,6 +305,8 @@ This workspace keeps high-level policy in \`CLAUDE.md\` and operational detail i
|
|
|
293
305
|
- Write only inside \`${config.outputDir}\`
|
|
294
306
|
- Keep one file per shot: \`${config.outputDir}/shots/SHOTNN.md\`
|
|
295
307
|
- Maintain \`${config.outputDir}/shot-plan.json\` dialogue naming policy
|
|
308
|
+
- Maintain \`${config.outputDir}/shot-plan.json\` top-level \`voiceCast\`
|
|
309
|
+
- Keep \`Audio Plan\` blocks aligned to \`voiceCast\`
|
|
296
310
|
- Keep \`ILK/İLK FRAME\` in a fenced code block even when chained
|
|
297
311
|
- Quality floor and specificity floor are hard gates, not suggestions
|
|
298
312
|
- Apply \`.agent/skills/spatial-blocking/SKILL.md\` whenever eyeline, compositing, or depth realism is critical
|
|
@@ -313,13 +327,16 @@ Use the Film-Kit core runtime.
|
|
|
313
327
|
## Read First
|
|
314
328
|
1. \`.agent/model-profile.md\`
|
|
315
329
|
2. \`.agent/MASTER.md\`
|
|
316
|
-
3. \`.agent/
|
|
317
|
-
4. \`.agent/
|
|
318
|
-
5. \`.
|
|
330
|
+
3. \`.agent/VOICE-DESIGN.md\`
|
|
331
|
+
4. \`.agent/agents/prompt-engineer.md\`
|
|
332
|
+
5. \`.agent/workflows/generate.md\`
|
|
333
|
+
6. \`.claude/rules/output-contract.md\`
|
|
319
334
|
|
|
320
335
|
## Responsibilities
|
|
321
336
|
- draft and repair shot files under \`${config.outputDir}/shots/\`
|
|
322
337
|
- apply \`${config.outputDir}/shot-plan.json\` dialogue naming policy
|
|
338
|
+
- maintain top-level \`voiceCast\` inside \`${config.outputDir}/shot-plan.json\`
|
|
339
|
+
- keep \`Audio Plan\` blocks valid against \`voiceCast\`
|
|
323
340
|
- enforce AUTO-ANONYMOUS, AUTO-SAFETY, chaining, and coverage contracts
|
|
324
341
|
- enforce quality floor: ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70
|
|
325
342
|
- enforce specificity floor: lens/framing, lighting, and foreground/midground/background action
|
|
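The quality floor quoted in this hunk (ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70 words) could be checked mechanically. The sketch below is only an illustration: the floor values come from the rules above, while `QUALITY_FLOOR`, `countWords`, `findFloorViolations`, and the shape of the `shot` object are assumptions, not part of the package. Other templates in this diff quote lower minimums (60 and 80 words), so the thresholds would need to follow whichever rule set is active.

```javascript
// Hypothetical helper: thresholds are taken from the rules in this hunk;
// every name and the input shape are assumptions made for illustration.
const QUALITY_FLOOR = { ilk: 80, son: 80, video: 120, coverage: 70 };

// Count whitespace-separated words in a prompt string.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Return one entry per section that falls below its word floor.
function findFloorViolations(shot) {
  const violations = [];
  for (const section of ["ilk", "son", "video"]) {
    const words = countWords(shot[section] || "");
    if (words < QUALITY_FLOOR[section]) {
      violations.push({ section, words, required: QUALITY_FLOOR[section] });
    }
  }
  (shot.coverage || []).forEach((prompt, index) => {
    const words = countWords(prompt);
    if (words < QUALITY_FLOOR.coverage) {
      violations.push({
        section: `coverage[${index}]`,
        words,
        required: QUALITY_FLOOR.coverage,
      });
    }
  });
  return violations;
}
```

An empty result means the shot clears the floor. A non-empty result lists what to regenerate, which matches the "hard gate" wording used elsewhere in these templates.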
@@ -349,6 +366,7 @@ If using the native Claude subagent, read \`.claude/agents/prompt-engineer.md\`
|
|
|
349
366
|
- Determine shot count based on action beats
|
|
350
367
|
- Create \`${config.outputDir}/project-info.md\`
|
|
351
368
|
- Create \`${config.outputDir}/shot-plan.json\`
|
|
369
|
+
- Add top-level \`voiceCast\` before writing speaking shots
|
|
352
370
|
|
|
353
371
|
2. **Batch Strategy:**
|
|
354
372
|
- 1-10 shots → Generate all at once
|
|
@@ -358,6 +376,7 @@ If using the native Claude subagent, read \`.claude/agents/prompt-engineer.md\`
|
|
|
358
376
|
3. **Per Shot (SINGLE FILE: SHOTNN.md):**
|
|
359
377
|
- Analyze scene type (Dialogue / Action / Emotional / Establishing)
|
|
360
378
|
- Generate main shot (İLK FRAME + SON FRAME + VİDEO)
|
|
379
|
+
- Add machine-readable \`Audio Plan\` before every VIDEO section
|
|
361
380
|
- Keep İLK FRAME as fenced code block even when chained
|
|
362
381
|
- Enforce hard quality floor: ILK >= 80, SON >= 80, VIDEO >= 120, coverage >= 70
|
|
363
382
|
- Enforce specificity floor: lens/framing + lighting + foreground/midground/background action
|
|
@@ -386,8 +405,9 @@ function buildClaudeRuleOutputContract(config) {
|
|
|
386
405
|
|
|
387
406
|
## Required Files
|
|
388
407
|
- \`${config.outputDir}/project-info.md\` — Characters, settings, emotional arc mapping, tension levels
|
|
389
|
-
- \`${config.outputDir}/shot-plan.json\` — Name policy, shot plan, and
|
|
408
|
+
- \`${config.outputDir}/shot-plan.json\` — Name policy, shot plan, validation contract, and top-level \`voiceCast\`
|
|
390
409
|
- \`.agent/model-profile.md\` — Active model constraints and presets
|
|
410
|
+
- \`.agent/VOICE-DESIGN.md\` — Voice identity and shot audio contract
|
|
391
411
|
- \`${config.outputDir}/_index.md\` — Shot tracking with chain & status
|
|
392
412
|
- \`${config.outputDir}/shots/SHOT01.md ... SHOTNN.md\` — Individual shot files (one file per shot)
|
|
393
413
|
- \`${config.outputDir}/reports/SAFETY-REPORT.md\` — Safety gate report
|
|
@@ -403,7 +423,7 @@ The first sentence carries the most weight. Every prompt MUST follow this order:
|
|
|
403
423
|
3. [Action] Micro-behavior, acting cue, physical gesture
|
|
404
424
|
4. [Camera + Lens] "85mm f/2.0, shallow DOF, static with handheld micro-movement"
|
|
405
425
|
5. [Light + Atmosphere] "Warm oil lamp key light from screen-left, deep shadows"
|
|
406
|
-
6. [Audio direction block] (VIDEO prompts only)
|
|
426
|
+
6. [Audio Plan + Audio direction block] (VIDEO prompts only)
|
|
407
427
|
7. [Avoid line] (EVERY prompt — MANDATORY)
|
|
408
428
|
\`\`\`
|
|
409
429
|
|
|
@@ -413,6 +433,7 @@ The first sentence carries the most weight. Every prompt MUST follow this order:
|
|
|
413
433
|
- Kling preset: \`${config.klingPreset}\`
|
|
414
434
|
- Kling transition mode: Start+End (when model is kling-3.0)
|
|
415
435
|
- Motion timeline: first → then → finally (when model is kling-3.0)
|
|
436
|
+
- Kling multi-shot mode: single-transition by default; custom storyboard only for 2-3 meaningful phases (when model is kling-3.0)
|
|
416
437
|
|
|
417
438
|
## Shot File Format (SHOTNN.md) — SINGLE FILE, ALL INCLUSIVE
|
|
418
439
|
|
|
@@ -447,6 +468,12 @@ Avoid: blurry, low-res, noise, distorted faces, bad anatomy, extra limbs/fingers
|
|
|
447
468
|
plastic skin, waxy skin, on-screen text, watermark, logo, cartoon style, CGI look.
|
|
448
469
|
\\\`\\\`\\\`
|
|
449
470
|
|
|
471
|
+
### AUDIO PLAN
|
|
472
|
+
|
|
473
|
+
\\\`\\\`\\\`json
|
|
474
|
+
[Machine-readable audioPlan JSON block aligned with shot-plan.json -> voiceCast]
|
|
475
|
+
\\\`\\\`\\\`
|
|
476
|
+
|
|
450
477
|
### VİDEO
|
|
451
478
|
|
|
452
479
|
\\\`\\\`\\\`
|
|
@@ -494,6 +521,10 @@ Audio direction:
|
|
|
494
521
|
Avoid: distorted faces, morphing, blurry, flickering, unnatural motion, on-screen text.
|
|
495
522
|
\\\`\\\`\\\`
|
|
496
523
|
|
|
524
|
+
\\\`\\\`\\\`json
|
|
525
|
+
[Coverage audioPlan JSON block - voice binding only when coverage contains speech]
|
|
526
|
+
\\\`\\\`\\\`
|
|
527
|
+
|
|
497
528
|
### SHOT[NN]B — [Type] | [Duration]s | [Icon] [Label]
|
|
498
529
|
[Same format as A]
|
|
499
530
|
|
|
@@ -545,6 +576,8 @@ Total: 1 main + [N] coverage = [N+1] production shots
|
|
|
545
576
|
### Name Rule
|
|
546
577
|
- Visual prompts and non-dialogue fields: no real names
|
|
547
578
|
- Dialogue transcript naming follows \`shot-plan.json.dialogue_name_policy\`
|
|
579
|
+
- Every speaking shot must resolve \`activeSpeakerKey\` to \`shot-plan.json.voiceCast\`
|
|
580
|
+
- Keep one active speaker per shot
|
|
548
581
|
|
|
549
582
|
### 30° Rule
|
|
550
583
|
Coverage camera angles MUST differ from main shot by at least 30°.
|
|
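The two Name Rule additions in this hunk (resolve `activeSpeakerKey` against `shot-plan.json.voiceCast`, and keep one active speaker per shot) can be sketched as a lookup. The diff shown here does not reveal the schema, so this sketch assumes that `voiceCast` is an array of entries with a `speakerKey` field and that an `audioPlan` carries `type` and `activeSpeakerKey`. The function name and return shape are hypothetical.

```javascript
// Assumed shapes (not confirmed by the diff):
//   voiceCast: [{ speakerKey: "..." }, ...]
//   audioPlan: { type: "Dialogue" | "Voiceover" | "Mixed" | ..., activeSpeakerKey: "..." }
const VOICED_TYPES = new Set(["Dialogue", "Voiceover", "Mixed"]);

function resolveActiveSpeaker(voiceCast, audioPlan) {
  // Non-speaking shots need no voice binding.
  if (!VOICED_TYPES.has(audioPlan.type)) {
    return { ok: true, speaker: null };
  }
  const key = audioPlan.activeSpeakerKey;
  // A single string key enforces "one active speaker per shot".
  if (typeof key !== "string" || key === "") {
    return { ok: false, reason: "missing or non-string activeSpeakerKey" };
  }
  const speaker = voiceCast.find((entry) => entry.speakerKey === key);
  return speaker
    ? { ok: true, speaker }
    : { ok: false, reason: `activeSpeakerKey "${key}" not found in voiceCast` };
}
```

Rejecting anything other than a single string key is one way to express the single-speaker rule. A reply from a second speaker would then go into its own shot.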
@@ -590,6 +623,7 @@ function buildCopilotInstructions(config) {
|
|
|
590
623
|
### System Entry Point
|
|
591
624
|
Read \`.agent/MASTER.md\` for the complete production ruleset.
|
|
592
625
|
Read \`.agent/ARCHITECTURE.md\` for system overview.
|
|
626
|
+
Read \`.agent/VOICE-DESIGN.md\` for voice identity and shot audio contracts.
|
|
593
627
|
Read \`.agent/model-profile.md\` for active model rules.
|
|
594
628
|
|
|
595
629
|
### Skill Loading Protocol (MANDATORY)
|
|
@@ -610,7 +644,8 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
|
|
|
610
644
|
4. Create index: \`${config.outputDir}/_index.md\`
|
|
611
645
|
5. Create project info: \`${config.outputDir}/project-info.md\`
|
|
612
646
|
6. Create plan: \`${config.outputDir}/shot-plan.json\`
|
|
613
|
-
7.
|
|
647
|
+
7. Keep top-level \`voiceCast\` in the plan and \`Audio Plan\` in speaking VIDEO sections
|
|
648
|
+
8. Write reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
|
|
614
649
|
|
|
615
650
|
### Critical Rules
|
|
616
651
|
- **AUTO-ANONYMOUS:** Replace ALL real names with physical descriptions
|
|
@@ -621,6 +656,7 @@ Before generating ANY prompts, read skills from \`.agent/skills/\`:
|
|
|
621
656
|
- **Spatial Realism:** eyeline targets, shared light, depth scale, and anti-cutout staging must agree when subjects share frame
|
|
622
657
|
- **Avoid Line:** MANDATORY on every prompt
|
|
623
658
|
- **Music:** NONE by default
|
|
659
|
+
- **Voice Design:** keep \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and \`Audio Plan\` in speaking VIDEO sections
|
|
624
660
|
- **Duration:** 8s default, slow burn pacing
|
|
625
661
|
- **Language:** Prompts in English, dialogue preserved
|
|
626
662
|
- **ILK/İLK FRAME:** keep fenced code block even when chained
|
|
@@ -650,9 +686,10 @@ When request is /generate, follow the Film-Kit Hollywood production system:
|
|
|
650
686
|
3. Load required skills from \`.agent/skills/\`
|
|
651
687
|
4. Transform scenario into production shot package at \`${config.outputDir}\`
|
|
652
688
|
5. Generate: project-info.md, shot-plan.json, _index.md, shots/SHOT01.md..SHOTNN.md
|
|
653
|
-
6.
|
|
654
|
-
7.
|
|
655
|
-
8.
|
|
689
|
+
6. Keep top-level \`voiceCast\` in shot-plan.json
|
|
690
|
+
7. Each SHOTNN.md: İLK FRAME + SON FRAME + AUDIO PLAN + VİDEO + 2-3 Coverage (ALL IN ONE FILE)
|
|
691
|
+
8. Enforce: auto-anonymous, dialogue name policy, auto-safety, frame chaining, avoid lines
|
|
692
|
+
9. Write reports to \`${config.outputDir}/reports/\` before /finish
|
|
656
693
|
`;
|
|
657
694
|
}
|
|
658
695
|
/* ---------- ANTIGRAVITY ---------- */
|
|
@@ -667,6 +704,7 @@ description: Hollywood-standard cinematic prompt engineering with model-aware pr
|
|
|
667
704
|
## System Architecture
|
|
668
705
|
This skill is part of the Film-Kit prompt engineering system.
|
|
669
706
|
Read \`.agent/MASTER.md\` for the complete production ruleset (690+ rules).
|
|
707
|
+
Read \`.agent/VOICE-DESIGN.md\` for project-level \`voiceCast\` and shot-level \`audioPlan\`.
|
|
670
708
|
Read \`.agent/model-profile.md\` first for active model constraints.
|
|
671
709
|
|
|
672
710
|
## Skill Loading Protocol
|
|
@@ -700,6 +738,7 @@ Before generating ANY prompts, read these skills:
|
|
|
700
738
|
- Shot index: \`${config.outputDir}/_index.md\`
|
|
701
739
|
- Project info: \`${config.outputDir}/project-info.md\`
|
|
702
740
|
- Plan: \`${config.outputDir}/shot-plan.json\`
|
|
741
|
+
- Voice contract: top-level \`voiceCast\` in \`${config.outputDir}/shot-plan.json\`
|
|
703
742
|
- Reports: \`${config.outputDir}/reports/SAFETY-REPORT.md\`, \`${config.outputDir}/reports/DELIVERY-REPORT.md\`
|
|
704
743
|
|
|
705
744
|
## Critical Rules
|
|
@@ -709,12 +748,13 @@ Before generating ANY prompts, read these skills:
|
|
|
709
748
|
4. **Frame Chaining:** Last frame of SHOT[N] becomes first frame of SHOT[N+1]
|
|
710
749
|
5. **Coverage Mandatory:** 2-3 sub-shots per main shot (in same file, min 60 words each)
|
|
711
750
|
6. **Avoid Line:** MANDATORY on every prompt (image + video + coverage)
|
|
712
|
-
7. **
|
|
713
|
-
8. **
|
|
714
|
-
9. **
|
|
715
|
-
10. **
|
|
716
|
-
11. **
|
|
717
|
-
12. **
|
|
751
|
+
7. **Voice Design:** keep \`voiceCast\` in \`${config.outputDir}/shot-plan.json\` and \`Audio Plan\` in every speaking VIDEO section
|
|
752
|
+
8. **Music: NONE** by default
|
|
753
|
+
9. **Ultra Realism** default visual mode
|
|
754
|
+
10. **8s duration** default, slow burn pacing
|
|
755
|
+
11. **ILK/İLK FRAME:** always keep fenced code block
|
|
756
|
+
12. **ONE FILE PER SHOT:** SHOTNN.md contains main shot + all coverage
|
|
757
|
+
13. **Relational Realism:** preserve eyeline targets, shared light, depth scale, and anti-cutout staging when multiple subjects share frame
|
|
718
758
|
|
|
719
759
|
## Quality Floor (Hard Gate)
|
|
720
760
|
Reject and regenerate any shot that fails:
|
|
@@ -843,6 +883,16 @@ Limit the amount of change per clip for realism:
|
|
|
843
883
|
|
|
844
884
|
> If end frame is too different, the model must "invent miracles" → plastic/melting/warp risk rises.
|
|
845
885
|
|
|
886
|
+
### Kling Storyboard / Multi-Shot Decision Tree
|
|
887
|
+
|
|
888
|
+
The official Kling app surface exposes custom storyboard-like shot prompting and optional end-frame anchoring.
|
|
889
|
+
Use it as a selective upgrade, not a default:
|
|
890
|
+
- **single-transition**: default mode for one clean action arc, reveal, or emotional turn
|
|
891
|
+
- **custom-storyboard**: only when one clip truly needs **2-3 editorially distinct phases**
|
|
892
|
+
- **hard cap in this toolkit:** 3 custom storyboard shots per video generation
|
|
893
|
+
- **never storyboard micro-beats:** one glance, one finger move, one tiny prop touch, one breath shift
|
|
894
|
+
- if the beat needs 4+ phases, split into chained videos instead of overloading one Kling prompt
|
|
895
|
+
|
|
846
896
|
### Golden Prompt Skeleton (Start+End)
|
|
847
897
|
|
|
848
898
|
The prompt's job is to tell the model **how to generate the in-between frames**:
|
|
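The Kling storyboard decision tree added in this hunk reduces to a choice driven by the number of editorially distinct phases. The sketch below is a hypothetical helper, not toolkit code: the mode names `single-transition` and `custom-storyboard` and the cap of 3 come from the rules above, while the function name and the `split-into-chained-videos` label are assumptions. Deciding what counts as a real phase rather than a micro-beat remains an editorial judgment the code does not attempt.

```javascript
// Hypothetical helper. The cap of 3 storyboard shots is quoted from the rules
// above; the function name and the third return label are assumptions.
const MAX_STORYBOARD_SHOTS = 3;

function chooseKlingMode(phaseCount) {
  if (!Number.isInteger(phaseCount) || phaseCount < 1) {
    throw new RangeError("phaseCount must be a positive integer");
  }
  // One clean action arc, reveal, or emotional turn: the default mode.
  if (phaseCount === 1) return "single-transition";
  // Two or three editorially distinct phases fit one storyboarded clip.
  if (phaseCount <= MAX_STORYBOARD_SHOTS) return "custom-storyboard";
  // Four or more phases: split into chained videos instead.
  return "split-into-chained-videos";
}
```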
@@ -984,26 +1034,22 @@ When using a reference image as the start frame, the model extracts lighting and
|
|
|
984
1034
|
|
|
985
1035
|
### Multi-Shot Protocol (Tek Üretimde Çoklu Çekim)
|
|
986
1036
|
|
|
987
|
-
Kling
|
|
1037
|
+
Treat Kling multi-shot as **storyboarded internal progression**, not as a license to over-cut.
|
|
988
1038
|
|
|
989
1039
|
**Rules:**
|
|
990
|
-
1. **
|
|
991
|
-
2.
|
|
992
|
-
3.
|
|
993
|
-
4. **
|
|
994
|
-
|
|
995
|
-
|
|
996
|
-
|
|
997
|
-
**
|
|
998
|
-
|
|
999
|
-
|
|
1000
|
-
|
|
1001
|
-
|
|
1002
|
-
|
|
1003
|
-
Shot 3, Over-the-shoulder from the younger soldier's POV, the bearded man speaks directly to camera. Gentle push-in. (4s)
|
|
1004
|
-
|
|
1005
|
-
Shot 4, Wide shot establishing the artillery position, both soldiers visible. Crane rise. (3s)
|
|
1006
|
-
\`\`\`
|
|
1040
|
+
1. Default to **single-transition** whenever one camera path can carry the beat.
|
|
1041
|
+
2. Use custom storyboard only for **2-3 distinct internal phases** with different framing/action purpose.
|
|
1042
|
+
3. Keep each storyboard phase short, clear, and concrete: what changed, what stayed locked, why the cut exists.
|
|
1043
|
+
4. Hard cap for this toolkit: **3 storyboard shots per generation** in app.kling-oriented workflows.
|
|
1044
|
+
5. If the sequence wants 4+ phases, split into multiple chained generations.
|
|
1045
|
+
6. Never assign separate storyboard shots to micro-actions that would read better inside one stronger shot.
|
|
1046
|
+
|
|
1047
|
+
**Preferred storyboard jobs:**
|
|
1048
|
+
- Shot 1: establish or setup
|
|
1049
|
+
- Shot 2: main action / reveal / shift
|
|
1050
|
+
- Shot 3: reaction / settle / handoff
|
|
1051
|
+
|
|
1052
|
+
Official Kling surfaces favor short, concrete shot descriptions. Start from the action path or camera intention, then lock continuity anchors (identity, wardrobe, background geometry, stable light).
|
|
1007
1053
|
|
|
1008
1054
|
### Element Binding (Öğe Bağlama — Karakter/Nesne Tutarlılığı)
|
|
1009
1055
|
|
package/content/ARCHITECTURE.md
CHANGED
|
@@ -20,6 +20,7 @@ Modular system consisting of:
|
|
|
20
20
|
.agent/
|
|
21
21
|
├── ARCHITECTURE.md # This file
|
|
22
22
|
├── MASTER.md # Main rules (entry point for AI tools)
|
|
23
|
+
├── VOICE-DESIGN.md # Voice identity + audioPlan contract
|
|
23
24
|
├── model-profile.md # Active model rules (runtime generated)
|
|
24
25
|
├── agents/
|
|
25
26
|
│ └── prompt-engineer.md # Primary agent
|
|
@@ -60,7 +61,7 @@ Modular system consisting of:
|
|
|
60
61
|
| `coverage-system` | **Mandatory coverage shots** (Reaction, OTS, Insert, Cutaway, ECU, Wide) + L-cut/J-cut + 30° kuralı + **180° kuralı** + eyeline match + matching action + multi-character blocking |
|
|
61
62
|
| `spatial-blocking` | **Relational realism**: eyeline targeting, plane mapping, body orientation, shared lighting, depth/scale integration, anti-cutout / anti-miniature cues |
|
|
62
63
|
| `visual-modes` | **Ultra Realism** default, stylization triggers, anti-AI artifact rules + **renk sürekliliği** + magic hour + flashback/rüya görsel ayrımı |
|
|
63
|
-
| `audio-design` | **Sound design** rules, voice realism,
|
|
64
|
+
| `audio-design` | **Sound design** rules, voice realism, project-level `voiceCast`, shot-level `audioPlan`, audio direction block + diegetic/non-diegetic ses ayrımı |
|
|
64
65
|
| `prompt-structure` | Image/video prompt templates, camera vocabulary, seed parameter, prompt rewriter, **re-take strategy**, coverage prompt yazım standartları (≥60 kelime) |
|
|
65
66
|
|
|
66
67
|
---
|
|
@@ -84,6 +85,8 @@ User Scenario → Agent Activated → Read model-profile → Load Required Skill
|
|
|
84
85
|
↓
|
|
85
86
|
.agent/model-profile.md (ALWAYS FIRST)
|
|
86
87
|
↓
|
|
88
|
+
.agent/VOICE-DESIGN.md (when dialogue / narrator exists)
|
|
89
|
+
↓
|
|
87
90
|
safety-compliance (ALWAYS)
|
|
88
91
|
reference-locking (if refs provided)
|
|
89
92
|
frame-chaining (ALWAYS)
|
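The load flow shown in this hunk makes `VOICE-DESIGN.md` a conditional read placed between the model profile and the always-on skills. A minimal sketch of that ordering follows. It covers only the steps visible in this hunk, and later skills in the flow are omitted. The file paths and the `.agent/skills/[name]/SKILL.md` pattern are quoted from the diff, while `buildReadOrder` and its flag names are assumptions made for illustration.

```javascript
// Illustrative only: paths are quoted from the diff, while the function name
// and the flags (hasDialogue, hasNarrator, hasReferences) are assumptions.
function buildReadOrder({
  hasDialogue = false,
  hasNarrator = false,
  hasReferences = false,
} = {}) {
  const skill = (name) => `.agent/skills/${name}/SKILL.md`;
  const order = [".agent/model-profile.md"]; // always first
  if (hasDialogue || hasNarrator) {
    order.push(".agent/VOICE-DESIGN.md"); // only when a voice exists
  }
  order.push(skill("safety-compliance")); // always
  if (hasReferences) {
    order.push(skill("reference-locking")); // only if refs are provided
  }
  order.push(skill("frame-chaining")); // always
  return order;
}
```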
package/content/MASTER.md
CHANGED
|
@@ -131,9 +131,23 @@ User: "Subay askerlere bağırır ve ateş emri verir."
|
|
|
131
131
|
2. "Ateş!" diye bağırır, damarları çıkar. (Mid-shot)
|
|
132
132
|
3. Askerlerin parmağı tetiğe gider. (Detail shot)
|
|
133
133
|
|
|
134
|
-
### 🎵 Music Policy
|
|
135
|
-
**DEFAULT:** `Music: NONE` (Always).
|
|
136
|
-
User must explicitly request music to include it. Otherwise, rely on SFX and Ambience.
|
|
134
|
+
### 🎵 Music Policy
|
|
135
|
+
**DEFAULT:** `Music: NONE` (Always).
|
|
136
|
+
User must explicitly request music to include it. Otherwise, rely on SFX and Ambience.
|
|
137
|
+
|
|
138
|
+
### Voice Design Contract
|
|
139
|
+
|
|
140
|
+
When dialogue or narration exists, audio is mandatory at three synchronized layers:
|
|
141
|
+
|
|
142
|
+
1. project-level `voiceCast`
|
|
143
|
+
2. shot-level `audioPlan`
|
|
144
|
+
3. prompt-level `Audio direction:`
|
|
145
|
+
|
|
146
|
+
Project-level voice identity is stable.
|
|
147
|
+
Shot-level performance is variable.
|
|
148
|
+
|
|
149
|
+
Use `.agent/VOICE-DESIGN.md` whenever a speaker, narrator, or voiceover is involved.
|
|
150
|
+
Do not redesign a speaker voice per shot. Reuse the same `speakerKey` and voice identity across the whole project.
|
|
137
151
|
|
|
138
152
|
### 📸 Reference Image Enforcement
|
|
139
153
|
|
|
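The Voice Design Contract added in this hunk asks for three synchronized layers whenever a voice exists, with one stable identity per `speakerKey`. The sketch below shows how such a check might look. The layer names come from the contract text, but the `project` shape (`voiceCast` as an array, `shots` with `id`, `hasVoice`, `audioPlan`, `audioDirection`) and the function name are assumptions, since the real schema lives in `.agent/VOICE-DESIGN.md` and is not shown here.

```javascript
// Assumed input shape (hypothetical, for illustration only):
//   project = {
//     voiceCast: [{ speakerKey: "..." }, ...],
//     shots: [{ id: "SHOT01", hasVoice: true, audioPlan: {...}, audioDirection: "..." }]
//   }
function checkVoiceLayers(project) {
  const issues = [];
  const voicedShots = (project.shots || []).filter((shot) => shot.hasVoice);
  // The contract applies only when dialogue or narration exists.
  if (voicedShots.length === 0) return issues;

  // Layer 1: project-level voiceCast, one entry per speakerKey.
  const cast = project.voiceCast || [];
  if (cast.length === 0) issues.push("missing project-level voiceCast");
  const seen = new Set();
  for (const { speakerKey } of cast) {
    if (seen.has(speakerKey)) issues.push(`duplicate speakerKey: ${speakerKey}`);
    seen.add(speakerKey);
  }

  // Layers 2 and 3: shot-level audioPlan and prompt-level Audio direction.
  for (const shot of voicedShots) {
    if (!shot.audioPlan) issues.push(`${shot.id}: missing shot-level audioPlan`);
    if (!shot.audioDirection) issues.push(`${shot.id}: missing prompt-level Audio direction`);
  }
  return issues;
}
```

Flagging duplicate keys is one reading of "reuse the same `speakerKey` and voice identity across the whole project": a second entry for the same key would amount to redesigning the voice.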
@@ -160,7 +174,7 @@ When user provides references, EVERY prompt MUST:
|
|
|
160
174
|
3. [Aksiyon] — Ne yapıyor, mikro-davranış, oyunculuk ipucu
|
|
161
175
|
4. [Kamera + Lens] — "85mm f/2.0, shallow DOF, static with handheld micro-movement"
|
|
162
176
|
5. [Işık + Atmosfer] — "Warm oil lamp key light from screen-left, deep shadows"
|
|
163
|
-
6. [Audio direction block] — (sadece video prompt'larında)
|
|
177
|
+
6. [Audio Plan + Audio direction block] — (sadece video prompt'larında)
|
|
164
178
|
7. [Avoid line] — (HER prompt'ta zorunlu)
|
|
165
179
|
```
|
|
166
180
|
|
|
@@ -186,11 +200,20 @@ Avoid: blurry, low-res, noise, distorted faces, bad anatomy, extra limbs/fingers
|
|
|
186
200
|
- `--audio_mode` or `--audio_prompt` (audio is for VIDEO only)
|
|
187
201
|
- Any `--parameter` flags
|
|
188
202
|
|
|
189
|
-
#### VIDEO PROMPTS (VİDEO, Coverage Video)
|
|
190
|
-
|
|
191
|
-
**
|
|
192
|
-
|
|
193
|
-
Audio
|
|
203
|
+
#### VIDEO PROMPTS (VİDEO, Coverage Video)
|
|
204
|
+
|
|
205
|
+
**VOICE DESIGN CONTRACT (MANDATORY WHEN VOICE EXISTS):**
|
|
206
|
+
- Project output must include top-level `voiceCast`
|
|
207
|
+
- Every VIDEO section must include a machine-readable `Audio Plan` block
|
|
208
|
+
- If `Type` is `Dialogue`, `Voiceover`, or `Mixed`, `activeSpeakerKey` must bind to `voiceCast`
|
|
209
|
+
- `voiceIdentityPrompt` is character-level
|
|
210
|
+
- `performanceNote` is shot-level
|
|
211
|
+
- Keep one active speaker per shot
|
|
212
|
+
- Split reply dialogue across multiple shots when needed
|
|
213
|
+
|
|
214
|
+
**MUST INCLUDE FULL AUDIO DIRECTION BLOCK:**
|
|
215
|
+
```
|
|
216
|
+
Audio direction:
|
|
194
217
|
- Language: [TURKISH/ENGLISH/etc.]
|
|
195
218
|
- Type: [Dialogue/SFX/Ambience/Mixed]
|
|
196
219
|
- Dialogue transcript: "[Exact lines in original language]" or NONE
|
|
@@ -530,11 +553,13 @@ Avoid: blurry, low-res, noise, distorted faces, bad anatomy, extra limbs/fingers
|
|
|
530
553
|
**VİDEO:**
|
|
531
554
|
|
|
532
555
|
```
|
|
533
|
-
[Complete video prompt — MIN 80 words, MAX 120 words]
|
|
534
|
-
[Action + camera movement + acting cues + atmosphere]
|
|
535
|
-
|
|
536
|
-
Audio
|
|
537
|
-
|
|
556
|
+
[Complete video prompt — MIN 80 words, MAX 120 words]
|
|
557
|
+
[Action + camera movement + acting cues + atmosphere]
|
|
558
|
+
|
|
559
|
+
[Audio Plan JSON block aligned with .agent/VOICE-DESIGN.md]
|
|
560
|
+
|
|
561
|
+
Audio direction:
|
|
562
|
+
- Language: [LANGUAGE]
|
|
538
563
|
- Type: [Dialogue/SFX/Ambience]
|
|
539
564
|
- Dialogue transcript: [Lines or NONE]
|
|
540
565
|
- SFX: [Effects list]
|
|
@@ -613,11 +638,15 @@ Before outputting, validate EVERY shot. **Bu kontrol otomatiktir, kullanıcı ha
|
|
|
613
638
|
- [ ] **HER coverage image** → Avoid satırı var mı? ❗ Yoksa EKLE
|
|
614
639
|
- [ ] **HER coverage video** → Avoid satırı var mı? ❗ Yoksa EKLE
|
|
615
640
|
|
|
616
|
-
### 4️⃣ Prompt Uzunluk Kontrolü
|
|
617
|
-
- [ ] Ana shot image prompt ≥ 60 kelime mi?
|
|
618
|
-
- [ ] Ana shot video prompt ≥ 80 kelime mi?
|
|
619
|
-
- [ ] **Coverage prompt ≥ 60 kelime mi?** ❗ Kısa coverage YASAK
|
|
620
|
-
- [ ] Audio direction block tam mı? (Language, Type, Dialogue, SFX, Ambience, Music, Mix)
|
|
641
|
+
### 4️⃣ Prompt Uzunluk Kontrolü
|
|
642
|
+
- [ ] Ana shot image prompt ≥ 60 kelime mi?
|
|
643
|
+
- [ ] Ana shot video prompt ≥ 80 kelime mi?
|
|
644
|
+
- [ ] **Coverage prompt ≥ 60 kelime mi?** ❗ Kısa coverage YASAK
|
|
645
|
+
- [ ] Audio direction block tam mı? (Language, Type, Dialogue, SFX, Ambience, Music, Mix)
|
|
646
|
+
- [ ] `voiceCast` mevcut mu ve her konusan speaker icin `speakerKey` tekil mi?
|
|
647
|
+
- [ ] Her VIDEO bolumunde machine-readable `Audio Plan` var mi?
|
|
648
|
+
- [ ] Dialogue / Voiceover shot'larinda `activeSpeakerKey` -> `voiceCast` baglantisi gecerli mi?
|
|
649
|
+
- [ ] Bir shot'ta tek aktif speaker kurali korunuyor mu?
|
|
621
650
|
|
|
622
651
|
### 5️⃣ Türkçe Özet
|
|
623
652
|
- [ ] Her shot'un 🇹🇷 Türkçe özet satırı var mı?
|
package/content/RULES.md
CHANGED
|
@@ -114,10 +114,11 @@ When references provided:
|
|
|
114
114
|
|
|
115
115
|
| Kural | Açıklama |
|
|
116
116
|
|-------|----------|
|
|
117
|
-
| HER image prompt | Avoid satırı ile bitmeli |
|
|
118
|
-
| HER video prompt | Audio direction block + Avoid satırı ile bitmeli |
|
|
119
|
-
| EVERY coverage prompt | ≥ 60 words + must end with an Avoid line |
|
|
120
|
-
| Turkish summary | A 🇹🇷 summary line for every shot and coverage |
|
|
117
|
+
| EVERY image prompt | Must end with an Avoid line |
|
|
118
|
+
| EVERY video prompt | Must end with an Audio direction block + Avoid line |
|
|
119
|
+
| EVERY coverage prompt | ≥ 60 words + must end with an Avoid line |
|
|
120
|
+
| Turkish summary | A 🇹🇷 summary line for every shot and coverage |
|
|
121
|
+
| Voice Design | `voiceCast` at project level, `audioPlan` at shot level, `Audio direction` at parser level |
|
|
121
122
|
|
|
122
123
|
### 📷 Coverage System (MANDATORY)
|
|
123
124
|
|
|
@@ -134,10 +135,20 @@ When references provided:
|
|
|
134
135
|
|
|
135
136
|
**Sub-shot naming:** SHOT05A, SHOT05B, SHOT05C (main shot + alphabetical suffix)
|
|
136
137
|
|
|
137
|
-
### 🎵 Music Policy (BANNED)
|
|
138
|
-
|
|
139
|
-
**DEFAULT:** `Music: NONE` (Always).
|
|
140
|
-
User must explicitly request music. Otherwise use SFX/Ambience only.
|
|
138
|
+
### 🎵 Music Policy (BANNED)
|
|
139
|
+
|
|
140
|
+
**DEFAULT:** `Music: NONE` (Always).
|
|
141
|
+
User must explicitly request music. Otherwise use SFX/Ambience only.
|
|
142
|
+
|
|
143
|
+
### Voice Design Policy
|
|
144
|
+
|
|
145
|
+
If dialogue or narration exists:
|
|
146
|
+
|
|
147
|
+
- keep a project-level `voiceCast`
|
|
148
|
+
- keep a shot-level `audioPlan`
|
|
149
|
+
- keep the prompt-level `Audio direction:` block
|
|
150
|
+
- keep one active speaker per shot
|
|
151
|
+
- reuse the same `speakerKey` and voice identity across all shots
|
|
141
152
|
|
|
142
153
|
### 🎭 Dramaturgy & Acting (Dialogue Scenes)
|
|
143
154
|
|
package/content/VOICE-DESIGN.md
ADDED
|
@@ -0,0 +1,248 @@
|
|
|
1
|
+
# Voice Design Output Contract
|
|
2
|
+
|
|
3
|
+
This runtime must support voice-design-style audio generation, in which a voice identity is created once per speaker and then reused across shots.
|
|
4
|
+
|
|
5
|
+
## Three-Layer Audio Architecture
|
|
6
|
+
|
|
7
|
+
Every scenario output must keep these layers aligned:
|
|
8
|
+
|
|
9
|
+
1. `voiceCast`
|
|
10
|
+
Project-level or scenario-level voice identity package.
|
|
11
|
+
2. `audioPlan`
|
|
12
|
+
Shot-level performance, transcript, SFX, ambience, and mix metadata.
|
|
13
|
+
3. `Audio direction:`
|
|
14
|
+
Human-readable prompt block kept for backward compatibility with existing parsers.
|
|
15
|
+
|
|
16
|
+
## Film-Kit Storage Contract
|
|
17
|
+
|
|
18
|
+
- Single-agent projects store project-level voice identity in `$OUTPUT_DIR/shot-plan.json`.
|
|
19
|
+
- Multi-agent projects store project-level voice identity in `$OUTPUT_DIR/team-plan.json`.
|
|
20
|
+
- Every `SHOTNN.md` file must include a machine-readable `Audio Plan` JSON block for each VIDEO section.
|
|
21
|
+
- The `Audio direction:` prompt block must mirror the same `audioPlan` and remain present in the video prompt.
|
|
22
|
+
|
|
23
|
+
## Core Principle
|
|
24
|
+
|
|
25
|
+
Voice identity is character-level.
|
|
26
|
+
Performance is shot-level.
|
|
27
|
+
|
|
28
|
+
Do not redesign a character voice on every shot.
|
|
29
|
+
Design the voice once, bind it to a stable `speakerKey`, and reuse that identity across the entire project.
|
|
30
|
+
|
|
31
|
+
## Project-Level `voiceCast`
|
|
32
|
+
|
|
33
|
+
Create one `voiceCast` entry for every speaking character or narrator.
|
|
34
|
+
|
|
35
|
+
Required fields:
|
|
36
|
+
|
|
37
|
+
- `speaker`
|
|
38
|
+
- `speakerKey`
|
|
39
|
+
- `role`
|
|
40
|
+
- `language`
|
|
41
|
+
- `voiceIdentityPrompt`
|
|
42
|
+
- `voicePerformanceBase`
|
|
43
|
+
- `preferredProvider`
|
|
44
|
+
- `preferredModel`
|
|
45
|
+
- `saveToLibrary`
|
|
46
|
+
|
|
47
|
+
Recommended fields:
|
|
48
|
+
|
|
49
|
+
- `genderHint`
|
|
50
|
+
- `ageRange`
|
|
51
|
+
- `referenceAudioStrategy`
|
|
52
|
+
- `referenceAudioNeeded`
|
|
53
|
+
- `shouldGenerateVoice`
|
|
54
|
+
- `voiceDesignRecommended`
|
|
55
|
+
- `voiceDesignPriority`
|
|
56
|
+
- `referenceAudioSuggested`
|
|
57
|
+
- `referenceAudioDescription`
|
|
58
|
+
- `notes`
|
|
59
|
+
|
|
60
|
+
Example:
|
|
61
|
+
|
|
62
|
+
```json
|
|
63
|
+
{
|
|
64
|
+
"speaker": "Kemal",
|
|
65
|
+
"speakerKey": "kemal",
|
|
66
|
+
"role": "character",
|
|
67
|
+
"language": "turkish",
|
|
68
|
+
"genderHint": "male",
|
|
69
|
+
"ageRange": "35-45",
|
|
70
|
+
"voiceIdentityPrompt": "A realistic Turkish male voice in his late thirties to early forties. Low-mid register, restrained authority, subtle fatigue, natural diction, intimate close-mic realism, no theatricality, highly believable human texture.",
|
|
71
|
+
"voicePerformanceBase": "controlled, interior, understated",
|
|
72
|
+
"referenceAudioStrategy": "optional",
|
|
73
|
+
"referenceAudioNeeded": false,
|
|
74
|
+
"shouldGenerateVoice": true,
|
|
75
|
+
"preferredProvider": "elevenlabs",
|
|
76
|
+
"preferredModel": "eleven_ttv_v3",
|
|
77
|
+
"saveToLibrary": true,
|
|
78
|
+
"voiceDesignRecommended": true,
|
|
79
|
+
"voiceDesignPriority": "high",
|
|
80
|
+
"referenceAudioSuggested": false,
|
|
81
|
+
"referenceAudioDescription": null,
|
|
82
|
+
"notes": "Use the same voice identity across all shots. Performance may change per shot, identity must not."
|
|
83
|
+
}
|
|
84
|
+
```
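A minimal sketch of how the required-field list above could be checked before a project is written out. The helper name `missingVoiceCastFields` and the choice to treat `undefined`, `null`, and empty strings as missing are assumptions made for illustration; the contract itself only names the fields.

```javascript
// Required voiceCast fields, copied from the "Required fields" list above.
const REQUIRED_VOICE_CAST_FIELDS = [
  "speaker",
  "speakerKey",
  "role",
  "language",
  "voiceIdentityPrompt",
  "voicePerformanceBase",
  "preferredProvider",
  "preferredModel",
  "saveToLibrary",
];

// Hypothetical helper: returns the names of required fields that are absent.
// Assumption: undefined, null, and "" count as missing; `false` is a valid value.
function missingVoiceCastFields(entry) {
  return REQUIRED_VOICE_CAST_FIELDS.filter((field) => {
    const value = entry[field];
    return value === undefined || value === null || value === "";
  });
}

// An incomplete draft entry, used only to exercise the check.
const draftEntry = {
  speaker: "Kemal",
  speakerKey: "kemal",
  role: "character",
  language: "turkish",
  saveToLibrary: false,
};

// Reports the four fields the draft still lacks.
console.log(missingVoiceCastFields(draftEntry));
```

Returning the list of missing names, rather than a boolean, lets a report say exactly which field to fill in.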
|
|
85
|
+
|
|
86
|
+
## `speakerKey` Rule
|
|
87
|
+
|
|
88
|
+
The same speaker must use the same `speakerKey` everywhere.
|
|
89
|
+
If a dialogue shot references a `speakerKey` that does not exist in `voiceCast`, treat it as a contract failure.
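The rule above can be sketched as a cross-reference check between a shot's `audioPlan` and the project's `voiceCast`. The function name `findUnknownSpeakerKeys` and the idea of returning the offending keys are assumptions for illustration, not part of the contract; the key `kemal_v2` is a made-up example of a drifting key.

```javascript
// Hypothetical helper: collects every speakerKey a shot references and
// returns the ones that have no matching voiceCast entry.
function findUnknownSpeakerKeys(voiceCast, audioPlan) {
  const known = new Set(voiceCast.map((entry) => entry.speakerKey));
  const referenced = new Set();
  if (audioPlan.activeSpeakerKey) referenced.add(audioPlan.activeSpeakerKey);
  for (const line of audioPlan.dialogueLines ?? []) {
    if (line.speakerKey) referenced.add(line.speakerKey);
  }
  return [...referenced].filter((key) => !known.has(key));
}

const castForCheck = [{ speaker: "Kemal", speakerKey: "kemal" }];

// A shot that uses the registered key passes with an empty result.
const validShot = {
  type: "dialogue",
  activeSpeakerKey: "kemal",
  dialogueLines: [{ speaker: "Kemal", speakerKey: "kemal", text: "..." }],
};

// A shot that invents a new key for the same speaker is a contract failure.
const driftingShot = {
  type: "dialogue",
  activeSpeakerKey: "kemal_v2",
  dialogueLines: [],
};

console.log(findUnknownSpeakerKeys(castForCheck, validShot));
console.log(findUnknownSpeakerKeys(castForCheck, driftingShot));
```

A non-empty result would be reported as the contract failure described above.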
|
|
90
|
+
|
|
91
|
+
## `voiceIdentityPrompt` Rule
|
|
92
|
+
|
|
93
|
+
`voiceIdentityPrompt` describes the stable voice identity, not the temporary emotion of one shot.
|
|
94
|
+
|
|
95
|
+
Include:
|
|
96
|
+
|
|
97
|
+
- language
|
|
98
|
+
- age range
|
|
99
|
+
- gender feel when useful
|
|
100
|
+
- register and texture
|
|
101
|
+
- diction and articulation
|
|
102
|
+
- energy profile
|
|
103
|
+
- recording feel or mic proximity
|
|
104
|
+
- realism target such as believable, non-theatrical, non-announcer
|
|
105
|
+
|
|
106
|
+
Do not include:
|
|
107
|
+
|
|
108
|
+
- exact dialogue lines
|
|
109
|
+
- one-shot emotional spikes
|
|
110
|
+
- scene-specific actions such as running, shouting, or whispering one particular line
|
|
111
|
+
|
|
112
|
+
## `voicePerformanceBase`
|
|
113
|
+
|
|
114
|
+
Use `voicePerformanceBase` for the speaker's normal acting baseline.
|
|
115
|
+
This baseline may combine with the shot-level `performanceNote`, but it must not replace voice identity.
|
|
116
|
+
|
|
117
|
+
## Shot-Level `audioPlan`
|
|
118
|
+
|
|
119
|
+
Every VIDEO section must carry a machine-readable `Audio Plan` JSON block.
|
|
120
|
+
|
|
121
|
+
If dialogue or voiceover exists, these fields are mandatory:
|
|
122
|
+
|
|
123
|
+
- `language`
|
|
124
|
+
- `type`
|
|
125
|
+
- `activeSpeaker`
|
|
126
|
+
- `activeSpeakerKey`
|
|
127
|
+
- `dialogueLines`
|
|
128
|
+
- `dialogueTranscript`
|
|
129
|
+
- `performanceNote`
|
|
130
|
+
|
|
131
|
+
Recommended support fields:
|
|
132
|
+
|
|
133
|
+
- `delivery.pace`
|
|
134
|
+
- `delivery.intensity`
|
|
135
|
+
- `delivery.emotion`
|
|
136
|
+
- `delivery.projection`
|
|
137
|
+
- `delivery.stability`
|
|
138
|
+
- `sfx`
|
|
139
|
+
- `ambience`
|
|
140
|
+
- `music`
|
|
141
|
+
- `mixTarget`
|
|
142
|
+
- `hasSubtitles`
|
|
143
|
+
|
|
144
|
+
Example:
|
|
145
|
+
|
|
146
|
+
```json
|
|
147
|
+
{
|
|
148
|
+
"language": "turkish",
|
|
149
|
+
"type": "dialogue",
|
|
150
|
+
"activeSpeaker": "Kemal",
|
|
151
|
+
"activeSpeakerKey": "kemal",
|
|
152
|
+
"dialogueLines": [
|
|
153
|
+
{
|
|
154
|
+
"speaker": "Kemal",
|
|
155
|
+
"speakerKey": "kemal",
|
|
156
|
+
"text": "Burayi terk etmemiz lazim."
|
|
157
|
+
}
|
|
158
|
+
],
|
|
159
|
+
"dialogueTranscript": "Kemal: Burayi terk etmemiz lazim.",
|
|
160
|
+
"performanceNote": "Low, controlled urgency. Quiet but decisive. No melodrama.",
|
|
161
|
+
"delivery": {
|
|
162
|
+
"pace": "measured",
|
|
163
|
+
"intensity": "medium-low",
|
|
164
|
+
"emotion": "suppressed urgency",
|
|
165
|
+
"projection": "intimate",
|
|
166
|
+
"stability": "steady"
|
|
167
|
+
},
|
|
168
|
+
"sfx": [
|
|
169
|
+
"Ahsap sandalyenin hafif surtunmesi"
|
|
170
|
+
],
|
|
171
|
+
"ambience": [
|
|
172
|
+
"Kucuk odada dusuk floresan humu"
|
|
173
|
+
],
|
|
174
|
+
"music": null,
|
|
175
|
+
"mixTarget": {
|
|
176
|
+
"dialogue": 70,
|
|
177
|
+
"sfx": 5,
|
|
178
|
+
"ambience": 25
|
|
179
|
+
},
|
|
180
|
+
"hasSubtitles": false
|
|
181
|
+
}
|
|
182
|
+
```
|
|
183
|
+
|
|
184
|
+
## Single Active Speaker Rule
|
|
185
|
+
|
|
186
|
+
For realistic lipsync and stable delivery, keep only one active speaker per shot.
|
|
187
|
+
|
|
188
|
+
- Other characters may appear in frame, but they should listen silently.
|
|
189
|
+
- Back-and-forth dialogue should be split across multiple shots.
|
|
190
|
+
- Coverage shots may have `audioPlan.type` values like `sfx-only` or `ambience-only` and therefore require no voice binding.
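As a rough sketch, the single-speaker rule could be enforced per shot like this. The function name `checkSingleActiveSpeaker`, the problem strings, and the second speaker `ayse` are hypothetical; treating `sfx-only` and `ambience-only` as the only voiceless types is an assumption based on the two values named above.

```javascript
// Assumption: these type values mark shots that carry no voice at all.
const VOICELESS_TYPES = new Set(["sfx-only", "ambience-only"]);

// Hypothetical check: returns a list of problems, empty when the shot passes.
function checkSingleActiveSpeaker(audioPlan) {
  // Voiceless coverage shots need no voice binding, so nothing to verify.
  if (VOICELESS_TYPES.has(audioPlan.type)) return [];

  const problems = [];
  if (!audioPlan.activeSpeakerKey) problems.push("missing activeSpeakerKey");

  // Every spoken line must belong to the one active speaker.
  const lineKeys = new Set(
    (audioPlan.dialogueLines ?? []).map((line) => line.speakerKey)
  );
  for (const key of lineKeys) {
    if (key !== audioPlan.activeSpeakerKey) {
      problems.push(`extra speaker in shot: ${key}`);
    }
  }
  return problems;
}

// Back-and-forth dialogue packed into one shot: should be split in two.
const crowdedShot = {
  type: "dialogue",
  activeSpeakerKey: "kemal",
  dialogueLines: [
    { speakerKey: "kemal", text: "..." },
    { speakerKey: "ayse", text: "..." },
  ],
};

console.log(checkSingleActiveSpeaker(crowdedShot));
console.log(checkSingleActiveSpeaker({ type: "ambience-only" }));
```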
|
|
191
|
+
|
|
192
|
+
## Backward-Compatible Prompt Block
|
|
193
|
+
|
|
194
|
+
The prompt must still include the human-readable block:
|
|
195
|
+
|
|
196
|
+
```text
|
|
197
|
+
Audio direction:
|
|
198
|
+
Language: Turkish
|
|
199
|
+
Type: Dialogue
|
|
200
|
+
Dialogue transcript:
|
|
201
|
+
Kemal: Burayi terk etmemiz lazim.
|
|
202
|
+
SFX:
|
|
203
|
+
- Ahsap sandalyenin hafif surtunmesi
|
|
204
|
+
Ambience:
|
|
205
|
+
- Kucuk odada dusuk floresan humu
|
|
206
|
+
Music: NONE
|
|
207
|
+
Mix target: Dialogue 70%, SFX 5%, Ambience 25%
|
|
208
|
+
No on-screen subtitles/captions.
|
|
209
|
+
```
|
|
210
|
+
|
|
211
|
+
This block must be derived from the same `audioPlan`, not invented separately.
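One way to guarantee that derivation is to render the text block from the `audioPlan` object instead of writing it by hand. The sketch below assumes the field names from the example `audioPlan` above; the function names `renderAudioDirection` and `capitalize` are hypothetical, and printing `NONE` for an empty transcript is an assumption taken from the `[Lines or NONE]` placeholder in the prompt template.

```javascript
// Capitalizes the first letter, e.g. "turkish" -> "Turkish".
function capitalize(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

// Hypothetical renderer: builds the human-readable block from an audioPlan.
function renderAudioDirection(plan) {
  const lines = ["Audio direction:"];
  lines.push(`Language: ${capitalize(plan.language)}`);
  lines.push(`Type: ${capitalize(plan.type)}`);

  lines.push("Dialogue transcript:");
  const spoken = plan.dialogueLines ?? [];
  if (spoken.length === 0) lines.push("NONE");
  for (const line of spoken) lines.push(`${line.speaker}: ${line.text}`);

  lines.push("SFX:");
  for (const effect of plan.sfx ?? []) lines.push(`- ${effect}`);

  lines.push("Ambience:");
  for (const bed of plan.ambience ?? []) lines.push(`- ${bed}`);

  // A null music field maps to the default "Music: NONE".
  lines.push(`Music: ${plan.music ?? "NONE"}`);

  const mix = plan.mixTarget;
  if (mix) {
    lines.push(
      `Mix target: Dialogue ${mix.dialogue}%, SFX ${mix.sfx}%, Ambience ${mix.ambience}%`
    );
  }
  if (!plan.hasSubtitles) lines.push("No on-screen subtitles/captions.");
  return lines.join("\n");
}

// The same values as the example audioPlan in this contract.
const examplePlan = {
  language: "turkish",
  type: "dialogue",
  activeSpeaker: "Kemal",
  activeSpeakerKey: "kemal",
  dialogueLines: [
    { speaker: "Kemal", speakerKey: "kemal", text: "Burayi terk etmemiz lazim." },
  ],
  sfx: ["Ahsap sandalyenin hafif surtunmesi"],
  ambience: ["Kucuk odada dusuk floresan humu"],
  music: null,
  mixTarget: { dialogue: 70, sfx: 5, ambience: 25 },
  hasSubtitles: false,
};

console.log(renderAudioDirection(examplePlan));
```

Because the text is generated, a change to `audioPlan` cannot silently diverge from the prompt block.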
|
|
212
|
+
|
|
213
|
+
## Default Voice Design Policy
|
|
214
|
+
|
|
215
|
+
- Character voice stays fixed.
|
|
216
|
+
- Shot performance may change.
|
|
217
|
+
- Language and diction stay fixed.
|
|
218
|
+
- Realism is maximized.
|
|
219
|
+
- Theatricality is minimized.
|
|
220
|
+
- Promotional or announcer tone is forbidden unless explicitly requested.
|
|
221
|
+
- `Music: NONE` is the default.
|
|
222
|
+
- `No on-screen subtitles/captions.` is mandatory unless the user explicitly requests otherwise.
|
|
223
|
+
|
|
224
|
+
## Reference Audio Guidance
|
|
225
|
+
|
|
226
|
+
Recommend reference audio when the target voice depends on unusually specific texture:
|
|
227
|
+
|
|
228
|
+
- elderly or highly specific age character
|
|
229
|
+
- broken, smoky, hoarse, or damaged micro-texture
|
|
230
|
+
- strong local accent or highly specific diction
|
|
231
|
+
- very narrow performance tolerance between whisper and force
|
|
232
|
+
|
|
233
|
+
If reference audio is recommended, prefer:
|
|
234
|
+
|
|
235
|
+
- `preferredModel: "eleven_ttv_v3"`
|
|
236
|
+
- `referenceAudioNeeded: true`
|
|
237
|
+
|
|
238
|
+
## Provider Flow
|
|
239
|
+
|
|
240
|
+
When a provider supports voice design, the implementation flow is:
|
|
241
|
+
|
|
242
|
+
1. Use `voiceCast[].voiceIdentityPrompt` to design a preview voice.
|
|
243
|
+
2. Review multiple previews.
|
|
244
|
+
3. Select one preview.
|
|
245
|
+
4. Create a persistent `voice_id`.
|
|
246
|
+
5. Bind that `voice_id` to `speakerKey`.
|
|
247
|
+
6. Reuse the same voice identity across all future shots.
|
|
248
|
+
7. If needed, remix or refine the same voice without breaking identity continuity.
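The flow above might be wired up roughly as follows. This is a sketch under stated assumptions: the provider methods `designPreviews`, `choosePreview`, and `createVoice` are placeholder names, not calls from any real SDK, and the in-memory stub exists only so the example runs without network access. Step 7 (remixing) is left out.

```javascript
// Hypothetical binding routine. `provider` is any object that offers the
// three placeholder methods used below; none of them is a real API.
function bindVoices(voiceCast, provider, existingBindings = {}) {
  const bindings = { ...existingBindings };
  for (const entry of voiceCast) {
    // Step 6: a speakerKey that already has a voice_id is reused untouched.
    if (bindings[entry.speakerKey]) continue;
    const previews = provider.designPreviews(entry.voiceIdentityPrompt); // steps 1-2
    const chosen = provider.choosePreview(previews); // step 3
    bindings[entry.speakerKey] = provider.createVoice(chosen); // steps 4-5
  }
  return bindings;
}

// In-memory stand-in for a provider, for illustration only.
function makeStubProvider() {
  let created = 0;
  return {
    designPreviews: (prompt) => [`preview-a:${prompt.length}`, `preview-b:${prompt.length}`],
    choosePreview: (previews) => previews[0],
    createVoice: () => `voice_${++created}`,
    createdCount: () => created,
  };
}

const stubProvider = makeStubProvider();
const castToBind = [
  { speakerKey: "kemal", voiceIdentityPrompt: "A realistic Turkish male voice." },
];

// The second run receives the first run's bindings and must not create a new voice.
const firstRun = bindVoices(castToBind, stubProvider);
const secondRun = bindVoices(castToBind, stubProvider, firstRun);
console.log(firstRun, secondRun, stubProvider.createdCount());
```

Keying the bindings by `speakerKey` is what keeps identity stable: later shots look up the stored `voice_id` instead of designing a voice again.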
|
package/content/agents/prompt-engineer.md
CHANGED
|
@@ -133,13 +133,24 @@ First sentence carries the most weight. Every prompt MUST follow this order:
|
|
|
133
133
|
3. [Action] Micro-behavior, acting cue, physical gesture
|
|
134
134
|
4. [Camera + Lens] "85mm f/2.0, shallow DOF, static with handheld micro-movement"
|
|
135
135
|
5. [Light + Atmosphere] "Warm oil lamp key light from screen-left, deep shadows"
|
|
136
|
-
6. [Audio direction block] (VIDEO prompts only)
|
|
137
|
-
7. [Avoid line] (EVERY prompt — MANDATORY)
|
|
138
|
-
```
|
|
139
|
-
|
|
140
|
-
> **Kling adaptation:** When model is kling-3.0, replace step 3 (Action) with motion timeline (`first → then → finally`). Step 6 uses Kling tone markers instead of Audio Direction Block. See `.agent/model-profile.md` for golden prompt skeleton.
|
|
141
|
-
|
|
142
|
-
|
|
136
|
+
6. [Audio direction block] (VIDEO prompts only)
|
|
137
|
+
7. [Avoid line] (EVERY prompt — MANDATORY)
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
> **Kling adaptation:** When model is kling-3.0, replace step 3 (Action) with motion timeline (`first → then → finally`). Step 6 uses Kling tone markers instead of Audio Direction Block. See `.agent/model-profile.md` for golden prompt skeleton.
|
|
141
|
+
|
|
142
|
+
## VOICE DESIGN CONTRACT
|
|
143
|
+
|
|
144
|
+
When dialogue, narrator VO, or character speech exists:
|
|
145
|
+
|
|
146
|
+
- read `.agent/VOICE-DESIGN.md`
|
|
147
|
+
- create and maintain project-level `voiceCast` in `shot-plan.json`
|
|
148
|
+
- reuse the same `speakerKey` across all shots
|
|
149
|
+
- write a machine-readable `Audio Plan` JSON block for every VIDEO section
|
|
150
|
+
- keep `voiceIdentityPrompt` character-level and `performanceNote` shot-level
|
|
151
|
+
- keep one active speaker per shot
|
|
152
|
+
|
|
153
|
+
---
|
|
143
154
|
|
|
144
155
|
## AUDIO DIRECTION (Mandatory for Video)
|
|
145
156
|
|
|
@@ -195,22 +206,29 @@ Before outputting ANY shot:
|
|
|
195
206
|
- [ ] No active violence/gore?
|
|
196
207
|
|
|
197
208
|
### 2. Prompt Structure
|
|
198
|
-
- [ ] Follows 1-7 Flow Order?
|
|
199
|
-
- [ ] Avoid Line present on EVERY prompt (Image + Video + Coverage)?
|
|
200
|
-
- [ ] Audio direction present on EVERY video prompt? (Veo block or Kling inline)
|
|
201
|
-
- [ ]
|
|
209
|
+
- [ ] Follows 1-7 Flow Order?
|
|
210
|
+
- [ ] Avoid Line present on EVERY prompt (Image + Video + Coverage)?
|
|
211
|
+
- [ ] Audio direction present on EVERY video prompt? (Veo block or Kling inline)
|
|
212
|
+
- [ ] `shot-plan.json` contains top-level `voiceCast`?
|
|
213
|
+
- [ ] Every speaking shot has an `Audio Plan` block with valid `activeSpeakerKey`?
|
|
214
|
+
- [ ] Each active speaker resolves to a stable `speakerKey` in `voiceCast`?
|
|
215
|
+
- [ ] One active speaker per dialogue shot?
|
|
216
|
+
- [ ] 2-3 Coverage shots included in the same file?
|
|
202
217
|
- [ ] Quality floor passes? (ILK>=80, SON>=80, VIDEO>=120, coverage>=70)
|
|
203
218
|
- [ ] Specificity floor passes? (lens + lighting + FG/MG/BG action)
|
|
204
219
|
- [ ] Spatial realism passes? (eyeline target + plane map + shared light + contact/depth cues)
|
|
205
220
|
- [ ] Model Control block exists? (`Model`, `Preset`, `CFG`, `Transition Mode`)
|
|
206
221
|
|
|
207
|
-
### 3. Kling-Specific Gates (when model is kling-3.0)
|
|
208
|
-
- [ ] Motion timeline uses `first → then → finally` structure?
|
|
209
|
-
- [ ] "What stays the same" explicitly stated? (identity, background, costume)
|
|
210
|
-
- [ ] Camera movement is simple/safe? (no complex hybrid movements)
|
|
211
|
-
- [ ] Negative prompt includes Kling cleanup set? (warping, rubbery, melted)
|
|
212
|
-
- [ ] Duration matches transformation budget? (5s=1 change, 10s=2-3, 15s=complex)
|
|
213
|
-
- [ ] Start/end frames are in same visual universe? (angle, scale, light, lens)
|
|
222
|
+
### 3. Kling-Specific Gates (when model is kling-3.0)
|
|
223
|
+
- [ ] Motion timeline uses `first → then → finally` structure?
|
|
224
|
+
- [ ] "What stays the same" explicitly stated? (identity, background, costume)
|
|
225
|
+
- [ ] Camera movement is simple/safe? (no complex hybrid movements)
|
|
226
|
+
- [ ] Negative prompt includes Kling cleanup set? (warping, rubbery, melted)
|
|
227
|
+
- [ ] Duration matches transformation budget? (5s=1 change, 10s=2-3, 15s=complex)
|
|
228
|
+
- [ ] Start/end frames are in same visual universe? (angle, scale, light, lens)
|
|
229
|
+
- [ ] Used `single-transition` by default instead of forcing storyboard complexity?
|
|
230
|
+
- [ ] If custom storyboard is used, is it capped at 3 meaningful stages?
|
|
231
|
+
- [ ] Did each storyboard stage earn a distinct editorial job instead of representing micro-gestures?
|
|
214
232
|
|
|
215
233
|
---
|
|
216
234
|
|
|
@@ -241,15 +259,20 @@ For EACH shot, output exactly one file (`SHOTNN.md`) containing Main Shot + Cove
|
|
|
241
259
|
[If CHAINED: include "Use SHOT[prev]_END as exact first frame" inside code block]
|
|
242
260
|
[If FIRST/BREAK: image prompt in same code block]
|
|
243
261
|
|
|
244
|
-
### SON FRAME (SHOTNN_END)
|
|
245
|
-
```
|
|
246
|
-
[Image Prompt (Flow Order + Avoid Line)]
|
|
247
|
-
avoid: blurry, low-res, text, watermark, bad anatomy, distorted face ...
|
|
248
|
-
```
|
|
249
|
-
|
|
250
|
-
###
|
|
251
|
-
```
|
|
252
|
-
[
|
|
262
|
+
### SON FRAME (SHOTNN_END)
|
|
263
|
+
```
|
|
264
|
+
[Image Prompt (Flow Order + Avoid Line)]
|
|
265
|
+
avoid: blurry, low-res, text, watermark, bad anatomy, distorted face ...
|
|
266
|
+
```
|
|
267
|
+
|
|
268
|
+
### AUDIO PLAN
|
|
269
|
+
```json
|
|
270
|
+
[Machine-readable audioPlan JSON block aligned with shot-plan.json -> voiceCast]
|
|
271
|
+
```
|
|
272
|
+
|
|
273
|
+
### VİDEO
|
|
274
|
+
```
|
|
275
|
+
[Video Prompt (Flow Order)]
|
|
253
276
|
|
|
254
277
|
Audio direction:
|
|
255
278
|
- Language: ...
|
package/content/skills/audio-design/SKILL.md
CHANGED
|
@@ -3,12 +3,136 @@ name: audio-design
|
|
|
3
3
|
description: Sound design rules for model-aware profiles (Veo 3.1 / Kling 3.0). Voice realism, environmental sounds, SFX, ambience, and audio direction block formatting. Includes Kling native audio tone marking and anti-synthetic audio guidelines.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
|
-
# Audio Design System
|
|
7
|
-
|
|
8
|
-
> **Philosophy:** Sound is half the experience. Audio must be as real as visuals.
|
|
9
|
-
> **Core Principle:** Every sound must be recorded-quality. No synthetic giveaways.
|
|
10
|
-
|
|
11
|
-
---
|
|
6
|
+
# Audio Design System
|
|
7
|
+
|
|
8
|
+
> **Philosophy:** Sound is half the experience. Audio must be as real as visuals.
|
|
9
|
+
> **Core Principle:** Every sound must be recorded-quality. No synthetic giveaways.
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Voice Design Contract (MANDATORY)
|
|
14
|
+
|
|
15
|
+
When a scenario includes spoken dialogue, narrator VO, or any reusable character voice, use a three-layer contract:
|
|
16
|
+
|
|
17
|
+
1. `voiceCast`
|
|
18
|
+
Stable project-level voice identity package.
|
|
19
|
+
2. `audioPlan`
|
|
20
|
+
Shot-level performance and mix metadata.
|
|
21
|
+
3. `Audio direction:`
|
|
22
|
+
Human-readable prompt block for backward compatibility.
|
|
23
|
+
|
|
24
|
+
Use `.agent/VOICE-DESIGN.md` as the authoritative schema reference.
|
|
25
|
+
|
|
26
|
+
### Project-Level `voiceCast`
|
|
27
|
+
|
|
28
|
+
Every speaking character or narrator must have one `voiceCast` entry.
|
|
29
|
+
|
|
30
|
+
Required fields:
|
|
31
|
+
|
|
32
|
+
- `speaker`
|
|
33
|
+
- `speakerKey`
|
|
34
|
+
- `role`
|
|
35
|
+
- `language`
|
|
36
|
+
- `voiceIdentityPrompt`
|
|
37
|
+
- `voicePerformanceBase`
|
|
38
|
+
- `preferredProvider`
|
|
39
|
+
- `preferredModel`
|
|
40
|
+
- `saveToLibrary`
|
|
41
|
+
|
|
42
|
+
Rules:
|
|
43
|
+
|
|
44
|
+
- `speakerKey` must stay stable across the whole project.
|
|
45
|
+
- `voiceIdentityPrompt` describes the permanent voice identity, never a one-shot emotional spike.
|
|
46
|
+
- `voicePerformanceBase` describes the speaker's normal acting baseline.
|
|
47
|
+
- If a dialogue shot has no matching `voiceCast` entry, the output is invalid.
|
|
48
|
+
|
|
49
|
+
Example:
|
|
50
|
+
|
|
51
|
+
```json
|
|
52
|
+
{
|
|
53
|
+
"speaker": "Kemal",
|
|
54
|
+
"speakerKey": "kemal",
|
|
55
|
+
"role": "character",
|
|
56
|
+
"language": "turkish",
|
|
57
|
+
"voiceIdentityPrompt": "A realistic Turkish male voice in his late thirties to early forties. Low-mid register, restrained authority, subtle fatigue, natural diction, intimate close-mic realism, no theatricality, highly believable human texture.",
|
|
58
|
+
"voicePerformanceBase": "controlled, interior, understated",
|
|
59
|
+
"preferredProvider": "elevenlabs",
|
|
60
|
+
"preferredModel": "eleven_ttv_v3",
|
|
61
|
+
"saveToLibrary": true
|
|
62
|
+
}
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
### Shot-Level `audioPlan`
|
|
66
|
+
|
|
67
|
+
Every VIDEO block must include a machine-readable `Audio Plan` JSON section that maps to the project-level voice identity.
|
|
68
|
+
|
|
69
|
+
If the shot contains dialogue or voiceover, these fields are mandatory:
|
|
70
|
+
|
|
71
|
+
- `language`
|
|
72
|
+
- `type`
|
|
73
|
+
- `activeSpeaker`
|
|
74
|
+
- `activeSpeakerKey`
|
|
75
|
+
- `dialogueLines`
|
|
76
|
+
- `dialogueTranscript`
|
|
77
|
+
- `performanceNote`
|
|
78
|
+
|
|
79
|
+
Recommended support fields:
|
|
80
|
+
|
|
81
|
+
- `delivery.pace`
|
|
82
|
+
- `delivery.intensity`
|
|
83
|
+
- `delivery.emotion`
|
|
84
|
+
- `delivery.projection`
|
|
85
|
+
- `delivery.stability`
|
|
86
|
+
- `sfx`
|
|
87
|
+
- `ambience`
|
|
88
|
+
- `music`
|
|
89
|
+
- `mixTarget`
|
|
90
|
+
- `hasSubtitles`
|
|
91
|
+
|
|
92
|
+
### Single Active Speaker Rule
|
|
93
|
+
|
|
94
|
+
Keep only one active speaker per shot.
|
|
95
|
+
|
|
96
|
+
- Reaction shots should usually be silent or ambience-only.
|
|
97
|
+
- Back-and-forth dialogue should be split across multiple shots.
|
|
98
|
+
- OTS coverage may carry one voice, but do not ask the model to lipsync two people at once.
|
|
99
|
+
|
|
100
|
+
### Voice Identity vs Performance
|
|
101
|
+
|
|
102
|
+
Keep this distinction strict:
|
|
103
|
+
|
|
104
|
+
- `voiceIdentityPrompt` = who the speaker is across the whole film
|
|
105
|
+
- `performanceNote` = how that speaker performs in this shot
|
|
106
|
+
|
|
107
|
+
Correct identity cues:
|
|
108
|
+
|
|
109
|
+
- `low-mid register`
|
|
110
|
+
- `natural Turkish diction`
|
|
111
|
+
- `restrained authority`
|
|
112
|
+
- `breathy but realistic`
|
|
113
|
+
- `non-theatrical`
|
|
114
|
+
- `highly believable real-world tone`
|
|
115
|
+
|
|
116
|
+
Incorrect identity cues:
|
|
117
|
+
|
|
118
|
+
- exact dialogue lines
|
|
119
|
+
- `angrily says this line`
|
|
120
|
+
- `whispering while running`
|
|
121
|
+
- temporary scene emotion only
|
|
122
|
+
|
|
123
|
+
### Provider Flow
|
|
124
|
+
|
|
125
|
+
Providers that support voice design should follow this order:
|
|
126
|
+
|
|
127
|
+
1. design from `voiceCast[].voiceIdentityPrompt`
|
|
128
|
+
2. review previews
|
|
129
|
+
3. choose one
|
|
130
|
+
4. create persistent voice
|
|
131
|
+
5. bind voice to `speakerKey`
|
|
132
|
+
6. reuse across all shots
|
|
133
|
+
7. remix only as a refinement, not as a new character identity
|
|
134
|
+
|
|
135
|
+
---
|
|
12
136
|
|
|
13
137
|
## 🎯 Audio Realism Baseline
|
|
14
138
|
|
|
@@ -197,19 +321,22 @@ User: "Kalkın aslanlar! Düşman zırhlıları top menziline giriyorlar!"
|
|
|
197
321
|
✅ RIGHT: Dialogue transcript: "Kalkın aslanlar! Düşman zırhlıları top menziline giriyorlar!"
|
|
198
322
|
```
|
|
199
323
|
|
|
200
|
-
### When User Provides Dialogue
|
|
201
|
-
|
|
202
|
-
- Include VERBATIM in audio transcript
|
|
203
|
-
- Preserve original language (Turkish stays Turkish, English stays English)
|
|
204
|
-
- Note emotional delivery required
|
|
324
|
+
### When User Provides Dialogue
|
|
325
|
+
|
|
326
|
+
- Include VERBATIM in audio transcript
|
|
327
|
+
- Preserve original language (Turkish stays Turkish, English stays English)
|
|
328
|
+
- Note emotional delivery required
|
|
329
|
+
- Mirror the same transcript in `audioPlan.dialogueLines`
|
|
330
|
+
- Bind the speaking line to `activeSpeakerKey`
|
|
205
331
|
|
|
206
332
|
### 🗣️ SPEAKER ISOLATION RULE (Prevent Mixed Dialogue)
|
|
207
333
|
|
|
208
334
|
**Problem:** If two people are in the frame and both speak, AI often mixes lipsync or timing.
|
|
209
335
|
**Solution:** ONE active speaker per shot.
|
|
210
336
|
|
|
211
|
-
- **Shot A:** Character X speaks. Camera focuses on X (or over-the-shoulder of Y).
|
|
212
|
-
- **Shot B:** Character Y replies. Camera focuses on Y.
|
|
337
|
+
- **Shot A:** Character X speaks. Camera focuses on X (or over-the-shoulder of Y).
|
|
338
|
+
- **Shot B:** Character Y replies. Camera focuses on Y.
|
|
339
|
+
- Keep the same `speakerKey` and `voiceCast` identity in both shots. Only `performanceNote` changes.
|
|
213
340
|
|
|
214
341
|
**EXCEPTION:** If both MUST be in frame (Two-Shot):
|
|
215
342
|
1. Use a "Reaction Shot" for the listener (the listener nods while the speaker's voice is heard off-screen).
|
package/content/skills/prompt-structure/SKILL.md
CHANGED
|
@@ -81,6 +81,18 @@ Before using any video template in this file:
|
|
|
81
81
|
- If active model is **Kling 3.0**, do **not** default to the Veo-style `Audio direction:` bullet schema below.
|
|
82
82
|
- If project is a hybrid/smart-hybrid package, the local `.agent/skills/prompt-structure/SKILL.md` override takes precedence over this base file.
|
|
83
83
|
|
|
84
|
+
## Kling Storyboard / Multi-Shot Decision Rule
|
|
85
|
+
|
|
86
|
+
The official Kling app surface exposes start/end control, optional end-frame anchoring, and custom storyboard-style shot prompting.
|
|
87
|
+
Use that capability with restraint:
|
|
88
|
+
|
|
89
|
+
- default to a single Start+End transition prompt when the beat is one continuous motion arc, one reveal, or one emotional turn
|
|
90
|
+
- use custom storyboard only when the clip truly contains **2-3 distinct editorial phases** with different visual jobs
|
|
91
|
+
- in this toolkit, cap app.kling custom storyboard mode at **3 custom shots** per video
|
|
92
|
+
- never split one micro gesture, blink, glance, prop touch, or tiny head turn into separate storyboard shots
|
|
93
|
+
- if a beat needs 4+ distinct phases, split into chained videos instead of bloating one generation
|
|
94
|
+
- every storyboard shot must earn a clear job: establish / action / reveal / reaction / resolve
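The decision rule above reduces to a small function of the number of distinct editorial phases in a beat. This is a sketch only: the function name `chooseKlingMode`, the `chained-videos` label, and the estimate of one video per three phases are assumptions for illustration, while `single-transition` and `custom-storyboard` are the mode names this toolkit uses.

```javascript
// Hypothetical helper. `phaseCount` is the number of distinct editorial
// phases in the beat (micro gestures do not count as phases).
function chooseKlingMode(phaseCount) {
  if (!Number.isInteger(phaseCount) || phaseCount < 1) {
    throw new RangeError("phaseCount must be a positive integer");
  }
  // One continuous arc, reveal, or emotional turn: plain Start+End transition.
  if (phaseCount === 1) return { mode: "single-transition", videos: 1 };
  // 2-3 distinct phases fit inside the 3-shot storyboard cap.
  if (phaseCount <= 3) return { mode: "custom-storyboard", videos: 1 };
  // 4+ phases: split into chained videos instead of bloating one generation.
  // Assumption: each chained video carries at most 3 phases.
  return { mode: "chained-videos", videos: Math.ceil(phaseCount / 3) };
}

console.log(chooseKlingMode(1));
console.log(chooseKlingMode(3));
console.log(chooseKlingMode(5));
```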
|
|
95
|
+
|
|
84
96
|
## Video Prompt Structure (Veo-Oriented Default Template)
|
|
85
97
|
|
|
86
98
|
> Use this section when the active video model/routed section is **Veo 3.1**.
|
|
@@ -198,9 +210,16 @@ Avoid: distorted faces, morphing, bad anatomy, extra limbs/fingers, blurry, flic
|
|
|
198
210
|
| **Establishing** | 8s | 10–15s | Set scene, allow environment |
|
|
199
211
|
| **Complex transformation** | N/A | 15s (max) | Multi-step with single-scene continuity |
|
|
200
212
|
|
|
201
|
-
> **Kling Note:** Duration maps directly to **transformation budget**. More change between start and end frames = longer duration needed. See `.agent/model-profile.md` for details.
|
|
202
|
-
|
|
203
|
-
|
|
213
|
+
> **Kling Note:** Duration maps directly to **transformation budget**. More change between start and end frames = longer duration needed. See `.agent/model-profile.md` for details.
|
|
214
|
+
|
|
215
|
+
### Kling Anti-Fragmentation Rule
|
|
216
|
+
|
|
217
|
+
- do not turn every sentence or gesture into a new Kling storyboard phase
|
|
218
|
+
- if a single camera move can carry the beat cleanly, keep one shot and strengthen the motion path
|
|
219
|
+
- if the beat reads as setup -> action -> settle, `first -> then -> finally` is usually enough
|
|
220
|
+
- reserve multi-shot/storyboard mode for meaningful internal progression, not decorative complexity
|
|
221
|
+
|
|
222
|
+
---
|
|
204
223
|
|
|
205
224
|
## Prompt Re-Take Strategy
|
|
206
225
|
|
package/content/workflows/chain.md
CHANGED
|
@@ -34,6 +34,7 @@ All generated files must keep `SHOTNN.md` naming.
|
|
|
34
34
|
1. Read `$OUTPUT_DIR/shots/SHOT[last].md`.
|
|
35
35
|
2. Extract `SON FRAME` description.
|
|
36
36
|
3. Capture character/location continuity details.
|
|
37
|
+
4. Reuse `$OUTPUT_DIR/shot-plan.json -> voiceCast` and keep every existing `speakerKey` stable.
|
|
37
38
|
|
|
38
39
|
### 3. Continue Generation
|
|
39
40
|
|
|
@@ -49,6 +50,7 @@ If model is kling-3.0: keep Start+End transition mode and first/then/finally mot
|
|
|
49
50
|
- Write each shot to `$OUTPUT_DIR/shots/SHOT[NN].md`
|
|
50
51
|
- Keep one-file-per-shot contract
|
|
51
52
|
- Ensure `ILK/İLK FRAME` code block exists even when chained
|
|
53
|
+
- Keep `Audio Plan` blocks aligned to the existing `voiceCast`
|
|
52
54
|
- Update `$OUTPUT_DIR/_index.md`
|
|
53
55
|
|
|
54
56
|
### 5. Refresh Reports
|
package/content/workflows/finish.md
CHANGED
|
@@ -32,6 +32,8 @@ All shot files follow `SHOTNN.md` naming.
|
|
|
32
32
|
5. Verify chain continuity.
|
|
33
33
|
6. Verify Avoid line on every prompt.
|
|
34
34
|
7. Verify `Model Control` block exists in every shot file.
|
|
35
|
+
8. Verify `shot-plan.json` has `voiceCast` coverage for every speaker or narrator.
|
|
36
|
+
9. Verify every speaking VIDEO section has an `Audio Plan` block with valid `activeSpeakerKey`.
|
|
35
37
|
8. For `kling-3.0`, verify:
|
|
36
38
|
- `Transition Mode: Start+End`
|
|
37
39
|
- CFG value is documented
|
|
@@ -83,6 +85,7 @@ Do not declare completion unless:
|
|
|
83
85
|
- delivery report is pass
|
|
84
86
|
- final summary says pass
|
|
85
87
|
- model-specific checks pass (Kling or Veo profile)
|
|
88
|
+
- voice design contract passes (`voiceCast` + `Audio Plan` + single active speaker)
|
|
86
89
|
|
|
87
90
|
---
|
|
88
91
|
|
package/content/workflows/generate.md
CHANGED
|
@@ -30,7 +30,7 @@ Generate production-ready shot prompts from scenario and **SAVE TO FILES**.
|
|
|
30
30
|
```text
|
|
31
31
|
$OUTPUT_DIR/
|
|
32
32
|
├── project-info.md # Scenario, characters, settings
|
|
33
|
-
├── shot-plan.json # Single-agent plan + policy contract
|
|
33
|
+
├── shot-plan.json # Single-agent plan + policy + voiceCast contract
|
|
34
34
|
├── shots/
|
|
35
35
|
│ ├── SHOT01.md # First shot (main + coverage)
|
|
36
36
|
│ ├── SHOT02.md # Second shot
|
|
@@ -62,7 +62,10 @@ $OUTPUT_DIR/
|
|
|
62
62
|
- coverage strategy
|
|
63
63
|
- model contract (`model`, `preset`, `transition_mode`)
|
|
64
64
|
- `dialogue_name_policy` (`preserve-original-dialogue` or `anonymize-dialogue`)
|
|
65
|
-
|
|
65
|
+
- top-level `voiceCast`
|
|
66
|
+
- voice defaults (`single_active_speaker`, `music_default`, `subtitles_default`)
|
|
67
|
+
10. Ensure every speaker or narrator has a stable `speakerKey` before shot writing.
|
|
68
|
+
11. Ensure directories exist: `$OUTPUT_DIR/`, `$OUTPUT_DIR/shots/`, `$OUTPUT_DIR/reports/`.
|
|
66
69
|
|
|
67
70
|
### 2. Batch Strategy
|
|
68
71
|
|
|
@@ -83,9 +86,16 @@ For EACH shot:
|
|
|
83
86
|
1. Analyze scene type (Dialogue / Action / Emotional / Establishing).
|
|
84
87
|
2. Build dramaturgy behavior (objective, obstacle, stakes, subtext, beat turns).
|
|
85
88
|
3. If model is `kling-3.0`, perform Kling Frame Prep (see below).
|
|
86
|
-
4.
|
|
89
|
+
4. If model is `kling-3.0`, choose Kling execution mode:
|
|
90
|
+
- `single-transition` for one strong motion arc or reveal
|
|
91
|
+
- `custom-storyboard` only for 2-3 distinct editorial phases
|
|
92
|
+
4. Reuse or create the correct `voiceCast` entry for any speaking character or narrator.
|
|
93
|
+
5. Generate main shot prompts (`ILK/İLK FRAME`, `SON FRAME`, `VIDEO`).
|
|
87
94
|
5. `ILK/İLK FRAME` section MUST always include a fenced code block.
|
|
88
95
|
6. If chained, first-frame code block must explicitly state: `Use SHOT[prev]_END as exact first frame`.
|
|
96
|
+
7. Write a machine-readable `Audio Plan` JSON block for every VIDEO section.
|
|
97
|
+
8. If dialogue or voiceover exists, require `activeSpeakerKey`, `dialogueLines`, and `performanceNote`.
|
|
98
|
+
9. Keep one active speaker per shot. Split reply dialogue across multiple shots.
|
|
89
99
|
7. Enforce hard quality floor:
|
|
90
100
|
- `ILK FRAME`: minimum 80 words
|
|
91
101
|
- `SON FRAME`: minimum 80 words
|
|
@@ -98,11 +108,11 @@ For EACH shot:
|
|
|
98
108
|
- explicit eyeline target / body orientation when a subject looks at someone or something
|
|
99
109
|
- explicit shared light source / bounce logic when multiple subjects share frame
|
|
100
110
|
- explicit depth/scale integration when more than one plane is visible
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
|
|
111
|
+
10. Generate coverage prompts (2-3 per main shot, min 70 words each).
|
|
112
|
+
11. Add Turkish summary for shot and each coverage section.
|
|
113
|
+
12. Apply model-specific generation gates (see below).
|
|
114
|
+
13. Write shot file: `$OUTPUT_DIR/shots/SHOT[NN].md`.
|
|
115
|
+
14. Update `$OUTPUT_DIR/_index.md`.
 
 #### Kling Frame Prep (when model is kling-3.0)
 
@@ -113,9 +123,11 @@ Before writing prompts, design the Start→End transition:
 - 5s = 1 major change
 - 10s = 2-3 staged changes
 - 15s = complex multi-step
-3. **
-4. **
-5. **
+3. **Execution mode:** Default to `single-transition`; use `custom-storyboard` only when the shot truly has 2-3 meaningful internal phases.
+4. **Motion timeline:** Write 2-4 steps: `first → then → finally`.
+5. **Face/hands stability:** Match orientations between start and end — avoid >45° face rotation.
+6. **Camera safety:** Use only safe movements (slow push-in, pan, tilt, micro-sway, tripod-locked).
+7. **Anti-fragmentation:** Do not turn one glance, gesture, or prop touch into separate micro-shots. If custom storyboard is used, cap it at 3 stages and make each stage editorially distinct.
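The execution-mode rule above (default to `single-transition`, allow `custom-storyboard` only for 2-3 phases, never more than 3 stages) and the 2-4 step motion timeline can be expressed as a small decision helper. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and raising an error above three phases is one possible way to enforce the cap, not behavior documented by the package.

```python
MAX_STORYBOARD_STAGES = 3  # cap stated in the anti-fragmentation rule


def choose_execution_mode(phase_count):
    """Pick a Kling execution mode from the number of distinct phases."""
    if phase_count < 1:
        raise ValueError("a shot needs at least one phase")
    if phase_count > MAX_STORYBOARD_STAGES:
        # Assumption: exceeding the cap is treated as an error in this sketch.
        raise ValueError("custom storyboard is capped at 3 stages")
    return "custom-storyboard" if phase_count >= 2 else "single-transition"


def timeline_is_valid(steps):
    """The motion timeline should have between 2 and 4 steps."""
    return 2 <= len(steps) <= 4
```

For example, a shot with one reveal would map to `single-transition`, while a shot with three editorially distinct phases would map to `custom-storyboard`.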
 
 #### Veo Gate (when model is veo31)
 
@@ -134,6 +146,8 @@ Before writing prompts, design the Start→End transition:
 - [ ] Start/end frames verified for same visual universe
 - [ ] Dialogue uses `"quotation marks"` with tone markers (Kling inline format)
 - [ ] Eyelines, plane map, and shared-light logic stay consistent across start/end frames
+- [ ] Defaulted to `single-transition` unless 2-3 distinct internal phases genuinely require custom storyboard
+- [ ] If custom storyboard is used: maximum 3 stages, no decorative micro-beats, and each stage changes function/framing/action
 
 ### 4. Validation Pass (Mandatory Before Completion)
 
@@ -190,6 +204,36 @@ Avoid: blurry, low-res, text, watermark, bad anatomy, distorted face ...
 Avoid: blurry, low-res, text, watermark, bad anatomy, distorted face ...
 ```
 
+### AUDIO PLAN
+
+```json
+{
+  "language": "...",
+  "type": "...",
+  "activeSpeaker": "...",
+  "activeSpeakerKey": "...",
+  "dialogueLines": [],
+  "dialogueTranscript": "...",
+  "performanceNote": "...",
+  "delivery": {
+    "pace": "...",
+    "intensity": "...",
+    "emotion": "...",
+    "projection": "...",
+    "stability": "..."
+  },
+  "sfx": [],
+  "ambience": [],
+  "music": null,
+  "mixTarget": {
+    "dialogue": 0,
+    "sfx": 0,
+    "ambience": 0
+  },
+  "hasSubtitles": false
+}
+```
+
 ### VIDEO
 
 #### Veo Format:
@@ -240,6 +284,10 @@ Avoid: ...
 Avoid: ...
 ```
 
+```json
+[Coverage audioPlan JSON block - voice binding only when coverage actually contains speech]
+```
+
 ### SHOT[NN]B - [Type]
 [Same format]
 
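The `Audio Plan` template added to this workflow comes with a rule that `activeSpeakerKey`, `dialogueLines`, and `performanceNote` are required whenever dialogue or voiceover exists. A minimal sketch of that check follows. The function name `missing_speech_fields` and the sample values are hypothetical, and treating an empty string or empty list as "missing" is an assumption; the package may validate differently.

```python
import json

# Fields the workflow requires when a shot contains dialogue or voiceover.
SPEECH_FIELDS = ("activeSpeakerKey", "dialogueLines", "performanceNote")


def missing_speech_fields(plan):
    """Return the required speech fields that are absent or empty.

    Intended to be called only for plans that actually contain speech;
    how speech is detected (for example via the `type` field) is left
    to the caller because the source does not specify it.
    """
    return [field for field in SPEECH_FIELDS if not plan.get(field)]


# Hypothetical Audio Plan fragment, parsed the way a checker might read
# the fenced JSON block from a shot file.
sample = json.loads(
    '{"activeSpeakerKey": "narrator",'
    ' "dialogueLines": ["We leave at dawn."],'
    ' "performanceNote": ""}'
)
problems = missing_speech_fields(sample)
```

Here `problems` lists `performanceNote`, because the sample leaves that field empty.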
@@ -14,15 +14,17 @@ $ARGUMENTS
 - delivery report is fail
 - missing required shot sections
 - continuity mismatch between neighboring shots
+- missing `voiceCast` entry or broken `activeSpeakerKey` binding
 
 ## Recovery Steps
 
 1. Identify failed files and sections.
 2. Fix only affected shots first.
 3. Regenerate neighboring shots only if continuity requires it.
-4.
-5.
-6.
+4. Repair `voiceCast` or `Audio Plan` bindings before rerunning reports.
+5. Re-run `/safety-check`.
+6. Regenerate `$OUTPUT_DIR/reports/DELIVERY-REPORT.md`.
+7. Update `$OUTPUT_DIR/_index.md` with recovered status.
 
 ## Exit Criteria
 
@@ -38,6 +38,10 @@ Validate all prompts before delivery to ensure platform compliance.
 - coverage has 2-3 sub-shots in same file
 - all prompts include Avoid line
 - all video prompts include full audio direction block
+- `shot-plan.json` contains `voiceCast` when any shot has dialogue or narration
+- every speaking VIDEO section includes an `Audio Plan` block
+- every `activeSpeakerKey` resolves to `shot-plan.json -> voiceCast`
+- no dialogue shot contains more than one active speaker
 
 3. Continuity checks
 - `SHOT[N]_END` aligns with `SHOT[N+1]_START`
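The voice-binding checks added to the safety check (a `voiceCast` must exist when any shot speaks, and every `activeSpeakerKey` must resolve to it) amount to a cross-file lookup. The sketch below illustrates one way to do it. It assumes `voiceCast` is an object keyed by speaker key and that a shot counts as speaking when `dialogueLines` is non-empty; neither detail is confirmed by this diff, and `check_voice_bindings` is a hypothetical name.

```python
def check_voice_bindings(shot_plan, audio_plans):
    """Return human-readable errors for unresolved speaker bindings.

    shot_plan   -- parsed contents of shot-plan.json (a dict)
    audio_plans -- parsed Audio Plan blocks, one dict per VIDEO section
    """
    errors = []
    # Assumption: voiceCast maps speaker keys to voice definitions.
    voice_cast = shot_plan.get("voiceCast") or {}
    speaking = [p for p in audio_plans if p.get("dialogueLines")]

    if speaking and not voice_cast:
        errors.append("shot-plan.json has no voiceCast but dialogue exists")

    for index, plan in enumerate(speaking, start=1):
        key = plan.get("activeSpeakerKey")
        if key not in voice_cast:
            errors.append(
                "speaking plan %d: activeSpeakerKey %r not in voiceCast" % (index, key)
            )
    return errors
```

An empty result means every speaking shot binds to a known cast entry; silent shots are skipped entirely.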
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@milenyumai/film-kit",
-  "version": "1.4.
+  "version": "1.4.1",
   "description": "Hollywood-standard cinematic prompt engineering toolkit with model profiles (Veo 3.1 / Kling 3.0). Auto-configures AI agents (Cursor, Claude Code, VS Code Copilot, Antigravity) with production-grade shot generation system.",
   "type": "module",
   "main": "./build/index.js",