@milenyumai/film-kit 1.0.0
- package/LICENSE +21 -0
- package/README.md +89 -0
- package/build/index.d.ts +2 -0
- package/build/index.js +1 -0
- package/build/lib/configure.d.ts +2 -0
- package/build/lib/configure.js +119 -0
- package/build/lib/defaults.d.ts +2 -0
- package/build/lib/defaults.js +12 -0
- package/build/lib/fs.d.ts +3 -0
- package/build/lib/fs.js +23 -0
- package/build/lib/templates.d.ts +2 -0
- package/build/lib/templates.js +583 -0
- package/build/lib/types.d.ts +25 -0
- package/build/lib/types.js +1 -0
- package/build/postinstall.js +22 -0
- package/content/ARCHITECTURE.md +132 -0
- package/content/FILM-KIT-INFO.md +116 -0
- package/content/MASTER.md +689 -0
- package/content/RULES.md +233 -0
- package/content/agents/prompt-engineer.md +258 -0
- package/content/skills/audio-design/SKILL.md +307 -0
- package/content/skills/coverage-system/SKILL.md +681 -0
- package/content/skills/frame-chaining/SKILL.md +342 -0
- package/content/skills/prompt-structure/SKILL.md +429 -0
- package/content/skills/reference-locking/SKILL.md +303 -0
- package/content/skills/safety-compliance/SKILL.md +242 -0
- package/content/skills/visual-modes/SKILL.md +294 -0
- package/content/workflows/chain.md +106 -0
- package/content/workflows/finish.md +108 -0
- package/content/workflows/generate.md +207 -0
- package/content/workflows/safety-check.md +149 -0
- package/package.json +58 -0
@@ -0,0 +1,307 @@

---
name: audio-design
description: Sound design rules for Veo 3.1. Voice realism, environmental sounds, SFX, ambience, and audio direction block formatting. Includes anti-synthetic audio guidelines.
---

# Audio Design System

> **Philosophy:** Sound is half the experience. Audio must be as real as visuals.
> **Core Principle:** Every sound must be recorded-quality. No synthetic giveaways.

---

## 🎯 Audio Realism Baseline

### Human Voices (Ultra Realistic)

| Requirement | Description |
|-------------|-------------|
| **Natural delivery** | Sound like real human actors |
| **Breathing** | Natural breaths between phrases |
| **Micro-pauses** | Authentic hesitation, thinking |
| **Vocal imperfections** | Slight variations, not robotic |
| **Emotional authenticity** | Genuine feeling, not performed reading |

### ❌ Voice Artifacts to Avoid

- Robotic TTS quality
- Monotone delivery
- Machine-like precision
- Flat affect
- Unnatural pacing
- Over-articulation

---

## Environmental Sounds

### Acoustic Space Matching

| Environment | Acoustic Properties |
|-------------|---------------------|
| **Small room** | Tight reverb, close sounds |
| **Large hall** | Natural echo, distant ambience |
| **Outdoor open** | Minimal reverb, wind, distant sounds |
| **Enclosed bunker** | Muffled, resonant, claustrophobic |
| **Ship deck** | Wind, waves, metallic resonance |
| **Ship interior** | Engine hum, creaking, contained |

### Surface-Matched Footsteps

```
Wood: Creaking, hollow resonance
Stone: Sharp, echoing impact
Metal: Clanging, ringing
Gravel: Crunching, shifting
Dirt: Muffled, soft thuds
Wet ground: Splashing, squelching
```

---

## Military/Historical Sound Design

### Artillery Sounds

```
Cannon fire: Deep boom, mechanical operation, recoil sounds
Shell loading: Heavy metallic clunk, scraping
Ammunition handling: Weight sounds, brass on metal
Distant bombardment: Rumbling, delayed impact
Near misses: Whistling, earth shaking, debris
```

### Naval Sounds

```
Ship engines: Deep thrumming, mechanical rhythm
Hull creaking: Metal stress, wave impact
Bridge ambience: Telegraph, commands, equipment
Distant ships: Muffled horns, gunfire echoes
Water: Waves, spray, impacts
```

### Bunker/Trench Sounds

```
Explosions above: Muffled booms, earth falling
Structural stress: Creaking, dust falling
Radio/telephone: Static, crackling, distant voices
Soldier activity: Equipment rattling, footsteps, breathing
Silence tension: Heartbeats, shallow breathing
```

---

## Audio Direction Block Template

Include at the end of EVERY video prompt:

```
Audio direction:
- Language: [TURKISH/ENGLISH/NONE]
- Type: [Dialogue/SFX/Mixed/SFX Only]
- Dialogue transcript: [Exact lines OR "NONE"]
- SFX: [Specific sound effects list]
- Ambience: [Environmental background]
- Music: NONE <-- (STRICT DEFAULT: Only change if user explicitly requests music)
- Mix target: [Percentages, e.g., "Dialogue 60%, Ambience 30%, SFX 10%"]
- No on-screen subtitles/captions.
```

### Example: Battle Scene

```
Audio direction:
- Language: TURKISH
- Type: Mixed
- Dialogue transcript: "Şimdi tam zamanı. Bismillah, Ya Allah! Ateş serbest!"
- SFX: Cannon fire, shell loading, mechanical recoil, distant explosions
- Ambience: Wind, smoke whooshing, distant ship engines
- Music: NONE
- Mix target: Dialogue 50%, SFX 35%, Ambience 15%
- No on-screen subtitles/captions.
```

### Example: Tense Waiting

```
Audio direction:
- Language: NONE
- Type: SFX Only
- Dialogue transcript: NONE
- SFX: Distant rumbling, earth shaking, debris falling, heartbeats
- Ambience: Muffled explosions, soldiers breathing, equipment rattling
- Music: NONE
- Mix target: Ambience 70%, SFX 30%
- No on-screen subtitles/captions.
```

### Example: Dialogue Scene

```
Audio direction:
- Language: TURKISH
- Type: Dialogue
- Dialogue transcript: "Anladım paşam. Hemen harekete geçiyoruz."
- SFX: Telephone click, paper rustling
- Ambience: Bunker acoustics, distant explosions muffled
- Music: NONE
- Mix target: Dialogue 70%, Ambience 25%, SFX 5%
- No on-screen subtitles/captions.
```
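
For anyone assembling these blocks programmatically, the template could be rendered from structured fields. The following is only a minimal Python sketch: `AudioDirection` and `render_audio_direction` are hypothetical names that do not exist in the package, and the check that mix percentages sum to 100 is an assumption inferred from the examples, not a stated rule. The sample values reproduce the "Tense Waiting" example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AudioDirection:
    """Hypothetical container for the template fields (not part of film-kit)."""
    language: str = "NONE"                 # TURKISH / ENGLISH / NONE
    kind: str = "SFX Only"                 # Dialogue / SFX / Mixed / SFX Only
    dialogue: Optional[str] = None         # exact lines, or None for "NONE"
    sfx: List[str] = field(default_factory=list)
    ambience: List[str] = field(default_factory=list)
    music: str = "NONE"                    # strict default from the template
    mix: Dict[str, int] = field(default_factory=dict)


def render_audio_direction(a: AudioDirection) -> str:
    """Render the fields in the order the template lists them."""
    # Assumption: mix percentages should total 100, as in every example.
    if sum(a.mix.values()) != 100:
        raise ValueError("mix target percentages must sum to 100")
    dialogue = f'"{a.dialogue}"' if a.dialogue else "NONE"
    mix = ", ".join(f"{name} {pct}%" for name, pct in a.mix.items())
    return "\n".join([
        "Audio direction:",
        f"- Language: {a.language}",
        f"- Type: {a.kind}",
        f"- Dialogue transcript: {dialogue}",
        f"- SFX: {', '.join(a.sfx)}",
        f"- Ambience: {', '.join(a.ambience)}",
        f"- Music: {a.music}",
        f"- Mix target: {mix}",
        "- No on-screen subtitles/captions.",
    ])


# Values taken from the "Tense Waiting" example.
tense_waiting = AudioDirection(
    sfx=["Distant rumbling", "earth shaking", "debris falling", "heartbeats"],
    ambience=["Muffled explosions", "soldiers breathing", "equipment rattling"],
    mix={"Ambience": 70, "SFX": 30},
)
print(render_audio_direction(tense_waiting))
```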

---

## Anti-Synthetic Audio Rules

### ❌ AVOID

| Artifact | Description |
|----------|-------------|
| **Obviously synthesized** | AI-generated, robotic sounds |
| **Stock library generic** | Overused, recognizable loops |
| **Mismatched acoustics** | Sound doesn't match space |
| **Floating audio** | No spatial grounding |
| **Unnatural transitions** | Jarring sound cuts |
| **Uniform volume** | No natural dynamics |

### ✅ REQUIRE

| Quality | Description |
|---------|-------------|
| **Recorded quality** | Sounds like on-set capture |
| **Spatial grounding** | Sound has location in scene |
| **Natural dynamics** | Volume variations realistic |
| **Acoustic matching** | Sound matches environment |
| **Organic imperfections** | Slight variations natural |

---

## Dialogue Handling

### 🎯 NATIVE LANGUAGE RULE (CRITICAL!)

**NEVER translate the user's dialogue!**

| User Writes In | Audio Transcript Uses |
|----------------|----------------------|
| Turkish | Turkish (as-is) |
| English | English (as-is) |
| Any language | Same language (verbatim) |

```
User: "Kalkın aslanlar! Düşman zırhlıları top menziline giriyorlar!"

❌ WRONG: Dialogue transcript: "Rise lions! Enemy battleships are entering range!"
✅ RIGHT: Dialogue transcript: "Kalkın aslanlar! Düşman zırhlıları top menziline giriyorlar!"
```

### When User Provides Dialogue

- Include it VERBATIM in the audio transcript
- Preserve the original language (Turkish stays Turkish, English stays English)
- Note the emotional delivery required

### 🗣️ SPEAKER ISOLATION RULE (Prevent Mixed Dialogue)

**Problem:** If two people are in the frame and both speak, the AI often mixes up lip-sync or timing.
**Solution:** ONE active speaker per shot.

- **Shot A:** Character X speaks. Camera focuses on X (or uses an over-the-shoulder shot from behind Y).
- **Shot B:** Character Y replies. Camera focuses on Y.

**EXCEPTION:** If both MUST be in frame (Two-Shot):
1. Use a "Reaction Shot" for the listener (the listener nods while the speaker's voice is heard off-screen).
2. OR ensure the prompt explicitly states: "[Character A] talking, [Character B] listening silently".

Transcript with a delivery note:

```
Dialogue transcript: "Kalkın aslanlar! Düşman zırhlıları top menziline giriyorlar!"
[Delivery: Excited, commanding]
```
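
The speaker isolation rule can be sketched as a small planning step that turns an exchange into one shot per line, each with a single active speaker. This is a hypothetical helper, not something the package provides: `isolate_speakers` and the "Shot N" labels are assumptions, and the "listening silently" wording is borrowed from the two-shot exception.

```python
from typing import List, Tuple


def isolate_speakers(exchange: List[Tuple[str, str]]) -> List[str]:
    """Return one shot description per dialogue line, one active speaker each."""
    # Collect speakers in order of first appearance.
    speakers: List[str] = []
    for speaker, _ in exchange:
        if speaker not in speakers:
            speakers.append(speaker)

    shots = []
    for number, (speaker, line) in enumerate(exchange, start=1):
        roles = [f"{speaker} talking"]
        # Everyone else is explicitly marked silent for this shot.
        roles += [f"{other} listening silently" for other in speakers if other != speaker]
        shots.append(f'Shot {number}: {", ".join(roles)}. Dialogue transcript: "{line}"')
    return shots


# Lines reused from the examples in this document; speaker names are made up.
for shot in isolate_speakers([
    ("Commander", "Ateş serbest!"),
    ("Officer", "Anladım paşam. Hemen harekete geçiyoruz."),
]):
    print(shot)
```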

### When No Dialogue

```
Dialogue transcript: NONE
```

### Short Sentence Rule

**DO NOT** try to fit very long sentences into a single shot.

If the dialogue is long:
- Split the sentence across shots
- OR split the shot in two
- This helps the AI keep the audio aligned with each shot

```
❌ Wrong: One 4s shot with 20 seconds of dialogue
✅ Right: Split into multiple shots with shorter lines
```
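
One way to apply the short sentence rule is to cut dialogue at sentence boundaries first and then cap the number of words per shot. The sketch below is illustrative only: `split_dialogue` is a hypothetical helper and the default of 8 words per shot is an assumed budget, not a figure from this document. Words are regrouped, never translated, so the native language rule still holds.

```python
import re
from typing import List


def split_dialogue(text: str, max_words: int = 8) -> List[str]:
    """Split dialogue into per-shot lines of at most max_words words each."""
    # A sentence is a run of non-terminator characters plus its closing punctuation.
    sentences = re.findall(r"[^.!?]+[.!?]*", text)
    shots = []
    for sentence in sentences:
        words = sentence.split()
        # Long sentences are cut into consecutive groups of max_words words.
        for start in range(0, len(words), max_words):
            shots.append(" ".join(words[start:start + max_words]))
    return shots


line = "Kalkın aslanlar! Düşman zırhlıları top menziline giriyorlar!"
for number, part in enumerate(split_dialogue(line), start=1):
    print(f'Shot {number} dialogue transcript: "{part}"')
```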

---

## Audio Safety

| Rule | Requirement |
|------|-------------|
| **Minors on screen** | "Dialogue transcript: NONE" (platform requirement) |
| **Violence context** | Describe SFX as "mechanical operation" not "firing at targets" |
| **Threats** | Never script threatening dialogue |
| **Copyrighted music** | Never reference by name |

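
Two of the rules in the table above are mechanical enough to check automatically. The following is a rough sketch under that assumption: `check_audio_safety` is a hypothetical function, it only covers the minors rule and the "firing at targets" wording, and the threat and copyrighted music rules still need human review.

```python
from typing import List


def check_audio_safety(dialogue: str, sfx: str, minors_on_screen: bool) -> List[str]:
    """Return a list of rule violations; an empty list means no issue was found."""
    problems = []
    # Minors on screen: the transcript field must be exactly NONE.
    if minors_on_screen and dialogue.strip().upper() != "NONE":
        problems.append("Minors on screen: dialogue transcript must be NONE")
    # Violence context: flag the wording the table tells us to avoid.
    if "firing at targets" in sfx.lower():
        problems.append('SFX: describe as "mechanical operation", not "firing at targets"')
    return problems


print(check_audio_safety("NONE", "Cannon fire, mechanical operation", minors_on_screen=True))
print(check_audio_safety('"Merhaba"', "Cannon firing at targets", minors_on_screen=True))
```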

---

## 🔇 NO MUSIC POLICY

**Rule:** By default, generating music is BANNED.
**Why:** Cinematic realism relies on SFX and Ambience. Cheap stock music ruins immersion.

- **Default:** `Music: NONE`
- **Exception:** Only if the user explicitly asks for music (e.g., "Add sad music" or "Background score").
- **Strictness:** Even if the scene is emotional, use *Silence* or *Wind*, NOT music.

---

## Mix Targets by Scene Type

| Scene Type | Dialogue | SFX | Ambience | Music |
|------------|----------|-----|----------|-------|
| **Intense dialogue** | 70% | 5% | 25% | 0% |
| **Action/battle** | 20% | 50% | 30% | 0% |
| **Tension/waiting** | 0% | 30% | 70% | 0% |
| **Emotional moment** | 60% | 10% | 30% | 0% |
| **Establishing shot** | 0% | 20% | 80% | 0% |

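
The table above can be treated as a lookup from scene type to a ready-made `Mix target` line. This is a minimal sketch, assuming a hypothetical `mix_target_line` helper; the percentages are copied from the table, while dropping 0% channels and ordering by level (loudest first) are choices modelled on the examples, not rules stated here.

```python
# Rows copied from the table: (Dialogue, SFX, Ambience, Music) in percent.
MIX_TARGETS = {
    "intense dialogue": (70, 5, 25, 0),
    "action/battle": (20, 50, 30, 0),
    "tension/waiting": (0, 30, 70, 0),
    "emotional moment": (60, 10, 30, 0),
    "establishing shot": (0, 20, 80, 0),
}
CHANNELS = ("Dialogue", "SFX", "Ambience", "Music")


def mix_target_line(scene_type: str) -> str:
    """Build the "Mix target" line for a scene type from the table."""
    levels = MIX_TARGETS[scene_type.lower()]
    if sum(levels) != 100:
        raise ValueError(f"mix for {scene_type!r} does not sum to 100")
    # Keep only audible channels, loudest first (stable for ties).
    audible = [(name, pct) for name, pct in zip(CHANNELS, levels) if pct > 0]
    audible.sort(key=lambda item: -item[1])
    return "- Mix target: " + ", ".join(f"{name} {pct}%" for name, pct in audible)


print(mix_target_line("Tension/waiting"))
print(mix_target_line("Intense dialogue"))
```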

---

## Period-Specific Audio Notes

### WWI/Çanakkale

```
Artillery: Black powder era, deeper booms
Ships: Coal-powered, rhythmic engine sounds
Communication: Field telephone, no radio static
Commands: Shouted, no PA systems
Uniforms: Fabric rustling, leather creaking, metal equipment
```

---

## 🔊 Diegetic vs Non-Diegetic Sound

Sound must be handled differently in the prompt depending on its source:

| Type | Definition | Examples | How to Write It in the Prompt |
|------|------------|----------|-------------------------------|
| **Diegetic** | Sound that comes from inside the scene | Radio, TV, speech, footsteps | Write as SFX or Dialogue |
| **Non-Diegetic** | Sound that comes from outside the scene | Film score, narrator, effects | Write as Music or voiceover |
| **Meta-Diegetic** | Sound inside the character's mind | Inner voice, memory, imagination | "Internal monologue" + echo effect |

**Rule:** In Veo 3.1, all sounds behave as diegetic. Non-diegetic music is added only when the user explicitly requests it (default: `Music: NONE`).

---

> **Remember:** Audio sells the reality. A perfectly filmed scene fails if it sounds fake. Every sound must justify its existence and match its environment.