@kolbo/kolbo-code-linux-arm64-musl 1.1.72 → 1.1.73

@@ -0,0 +1,146 @@
1
+ ---
2
+ name: music-prompting
3
+ description: >
4
+ Music generation prompting guide: BPM selection by video type, key/mood mapping, prompt
5
+ structure for background music, duration matching, looping strategies, section-mapped scoring.
6
+ Use when generating background music for video or crafting music generation prompts.
7
+ Keywords: music, BPM, tempo, key, mood, instrumental, background music, suno, elevenlabs,
8
+ music generation, prompt, genre, looping, score, soundtrack
9
+ ---
10
+
11
+ # Music Generation — Prompting Guide
12
+
13
+ ## Quick Reference
14
+
15
+ ```
16
+ INSTRUMENTAL: Always force_instrumental=true for video background
17
+ PROMPT ORDER: genre/style → BPM → key/mood → instruments → energy → purpose
18
+ KEY RULE: Music must be 18-20 dB below narration (see sound-design skill)
19
+ ALWAYS INCLUDE: "background" or "underscore" in every prompt
20
+ ```
21
+
22
+ ## BPM Selection by Video Type
23
+
24
+ | Video Type | BPM Range | Prompt Fragment |
25
+ |-----------|-----------|-----------------|
26
+ | Educational explainer | 80-100 | "gentle ambient electronic, 90 BPM" |
27
+ | Corporate / tech | 100-120 | "upbeat corporate pop, 110 BPM, positive" |
28
+ | Epic / dramatic reveal | 60-80 | "cinematic orchestral, 70 BPM, building tension" |
29
+ | Fast-paced montage | 120-140 | "energetic electronic, 130 BPM, driving beat" |
30
+ | Meditation / calm | 50-70 | "ambient drone, 60 BPM, peaceful" |
31
+ | Comedy / lighthearted | 100-130 | "playful ukulele pop, 120 BPM, whimsical" |
32
+ | Sad / reflective | 60-80 | "melancholic piano, 65 BPM, minor key" |
33
+ | Action / hype | 140-170 | "high-intensity drum and bass, 160 BPM" |
34
+
35
+ ## Key and Mood Mapping
36
+
37
+ | Mood | Key | Musical Characteristics |
38
+ |------|-----|----------------------|
39
+ | Happy / upbeat | C major, G major | Bright, resolved, energetic |
40
+ | Serious / professional | D minor, A minor | Grounded, authoritative |
41
+ | Mysterious / curious | E minor, B minor | Tension, anticipation |
42
+ | Triumphant / inspiring | D major, Bb major | Expansive, climactic |
43
+ | Melancholic / thoughtful | F minor, C minor | Reflective, emotional |
44
+ | Neutral / ambient | C major, A minor | Unobtrusive, background |
45
+
46
+ ## Prompt Structure
47
+
48
+ ```
49
+ [GENRE/STYLE], [BPM], [KEY/MOOD], [INSTRUMENTS], [ENERGY LEVEL], [PURPOSE]
50
+ ```
51
+
52
+ ### Examples
53
+
54
+ **Educational explainer:**
55
+ ```
56
+ Gentle lo-fi ambient electronic, 90 BPM, C major, soft synth pads and light
57
+ percussion, calm and steady energy, background music for narration
58
+ ```
59
+
60
+ **Corporate product demo:**
61
+ ```
62
+ Modern upbeat corporate pop, 110 BPM, G major, acoustic guitar and light drums,
63
+ positive energy building gradually, underscore for product walkthrough
64
+ ```
65
+
66
+ **Technical deep-dive:**
67
+ ```
68
+ Minimal ambient electronic, 80 BPM, A minor, soft Rhodes piano and subtle
69
+ bass, contemplative and focused, background music for technical explanation
70
+ ```
71
+
72
+ ## Prompting Rules
73
+
74
+ 1. **Always include "background" or "underscore"**: signals that the track should keep even dynamics and stay behind the narration
75
+ 2. **Always use instrumental mode** — lyrics compete with narration
76
+ 3. **Specify BPM explicitly** — don't rely on genre to set tempo
77
+ 4. **Avoid "bright hi-hats" or "prominent vocals"** — high-frequency busy elements compete with speech in the 2-4 kHz intelligibility band
78
+ 5. **Include energy direction** — "steady energy" for explainers, "building gradually" for reveals
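The prompt order and the first rule above can be sketched as a small helper. This is a minimal illustration only: `build_music_prompt` and its parameter names are hypothetical and not part of any Kolbo API; it simply joins the six fields in the documented order and adds "background music for" when neither required word is present.

```python
def build_music_prompt(genre, bpm, key, instruments, energy, purpose):
    """Join the six prompt fields in the order given in the guide.

    Hypothetical helper for illustration; not part of any Kolbo tool.
    """
    # Rule 1: every prompt should mention "background" or "underscore".
    if not any(word in purpose.lower() for word in ("background", "underscore")):
        purpose = f"background music for {purpose}"
    # Rule 3: state the BPM explicitly instead of relying on the genre.
    return ", ".join([genre, f"{bpm} BPM", key, instruments, energy, purpose])


prompt = build_music_prompt(
    "Minimal ambient electronic",
    80,
    "A minor",
    "soft Rhodes piano and subtle bass",
    "contemplative and focused",
    "technical explanation",
)
print(prompt)
```

With these inputs the helper reproduces the "Technical deep-dive" example above as a single line.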
79
+
80
+ ## Duration Matching
81
+
82
+ - Generate at the exact video duration when possible
83
+ - For longer videos, generate a track 30-60% of video length and loop with crossfade
84
+ - **Section-mapped scoring** for videos with distinct acts:
85
+
86
+ | Video Section | Duration | Music Style |
87
+ |--------------|----------|-------------|
88
+ | Intro / hook | 8-10s | Soft, building |
89
+ | Main explanation | 90-120s | Steady, neutral |
90
+ | Key reveal | 20-30s | Intensified, fuller |
91
+ | Outro | 10-15s | Fading, gentle |
92
+
93
+ Generate each as a separate track and crossfade between them.
94
+
95
+ ## Looping
96
+
97
+ ```bash
98
+ # Loop a track 3x
99
+ ffmpeg -stream_loop 2 -i music.mp3 -c copy music_looped.mp3
100
+
101
+ # Add fades at loop points (2s fade; st=28 assumes a 30s track, so adjust st to track length minus 2)
102
+ ffmpeg -i music.mp3 -af "afade=t=out:st=28:d=2" part1.mp3
103
+ ffmpeg -i music.mp3 -af "afade=t=in:d=2" part2.mp3
104
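+ # Then concat: ffmpeg -i part1.mp3 -i part2.mp3 -filter_complex "[0:a][1:a]concat=n=2:v=0:a=1" music_faded_loop.mp3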
105
+ ```
106
+
107
+ Better approach: generate at the exact video duration to avoid loop artifacts.
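When looping is unavoidable, the number of copies needed can be worked out ahead of time. The sketch below assumes each join overlaps adjacent copies by the crossfade length; `copies_needed` is a hypothetical helper name, and the 120s/48s figures are illustrative only.

```python
import math


def copies_needed(video_s, track_s, crossfade_s=2.0):
    """Copies of a track required to cover a video when joins overlap.

    n copies overlapped by crossfade_s last n*track_s - (n-1)*crossfade_s.
    Hypothetical helper for illustration.
    """
    if track_s <= crossfade_s:
        raise ValueError("track must be longer than the crossfade")
    if video_s <= track_s:
        return 1
    return math.ceil((video_s - crossfade_s) / (track_s - crossfade_s))


copies = copies_needed(video_s=120, track_s=48)  # track is 40% of the video
# For plain looping without crossfade, -stream_loop counts the repeats
# after the first play, so it is one less than the number of copies.
stream_loop = copies - 1
print(copies, stream_loop)
```

Three copies of a 48s track with 2s overlaps last 140s, which covers a 120s video; two copies would last only 94s.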
108
+
109
+ ## Integration with Video
110
+
111
+ - Duck music 18-20 dB below narration during speech
112
+ - Cut 2-4 kHz on the music bed to clear speech intelligibility band
113
+ - Test on phone speakers — if narration disappears behind music, duck more
114
+ - One track per video: avoid switching styles mid-video unless there is a clear narrative shift
115
+ - Music should start at video start and fade out 2-3 seconds before end
116
+
117
+ ---
118
+
119
+ ## Kolbo MCP Integration
120
+
121
+ | Task | Kolbo MCP Tool | Notes |
122
+ |------|---------------|-------|
123
+ | Generate music | `generate_music` | Use prompt structure above |
124
+ | Instrumental | `generate_music` instrumental=true | Always for video background |
125
+ | With lyrics | `generate_music` lyrics="..." | Pass actual lyric text |
126
+ | Discover models | `list_models` type="music" | Check available music models |
127
+ | Sound effects | `generate_sound` | For whooshes, impacts, ambience |
128
+
129
+ **Workflow:**
130
+ 1. `list_models` type="music" → pick model or auto-select
131
+ 2. Write prompt using the BPM/key/mood tables above
132
+ 3. `generate_music` with `instrumental: true`, style tags, duration
133
+ 4. Download the result and mix with narration using FFmpeg (see `sound-design` skill)
134
+
135
+ ---
136
+
137
+ ## Local / Free Options
138
+
139
+ > **IMPORTANT:** Always use Kolbo MCP `generate_music` by default. Only mention these if the user explicitly asks for free alternatives. Confirm before installing anything.
140
+
141
+ **Free music libraries (no install, browser-based):**
142
+ - Pixabay Music — free, no attribution required
143
+ - Free Music Archive — CC-licensed
144
+ - Incompetech (Kevin MacLeod) — CC-BY, huge catalog
145
+
146
+ **Local generation:** If the user has a GPU (8GB+) and explicitly asks, `MusicGen` by Meta (`pip install audiocraft`) can generate music locally. Confirm before installing.
@@ -0,0 +1,152 @@
1
+ ---
2
+ name: production-review
3
+ description: >
4
+ Self-review quality gates for video production: post-render verification protocol, pre-delivery
5
+ checklist, audio verification, visual inspection, severity classification (critical/suggestion/nitpick),
6
+ review workflow. Use after completing any production stage to verify quality before delivery.
7
+ Keywords: review, quality, verification, checklist, render, audio check, video check, delivery,
8
+ QA, quality gate, self-review, post-render
9
+ ---
10
+
11
+ # Production Review — Quality Gates
12
+
13
+ ## When to Use
14
+
15
+ After completing any major production stage — especially after rendering, before delivering to the user. Read this skill and run through the relevant checklist.
16
+
17
+ ## Severity Levels
18
+
19
+ | Severity | Definition | Action |
20
+ |----------|-----------|--------|
21
+ | **CRITICAL** | Breaks the output, incomplete, or dangerously wrong | Must fix. Blocks delivery. |
22
+ | **SUGGESTION** | Improves quality significantly but doesn't block | Note it, fix if time allows |
23
+ | **NITPICK** | Nice-to-have polish | Log it, move on |
24
+
25
+ ## Decision Flow
26
+
27
+ 1. Run the relevant checklist below
28
+ 2. Count critical findings
29
+ 3. **0 critical** → PASS (note suggestions)
30
+ 4. **1+ critical** → REVISE (max 2 revision rounds)
31
+ 5. After 2 rounds, still critical → PASS_WITH_WARNINGS (inform user of known issues)
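The decision flow above is small enough to express as a function. This is a sketch of the rules as written; the function name and the `rounds_used` parameter are illustrative assumptions.

```python
def review_decision(critical_count, rounds_used, max_rounds=2):
    """Map critical findings and revision rounds to a review outcome.

    Hypothetical helper mirroring the five-step decision flow.
    """
    if critical_count == 0:
        return "PASS"
    if rounds_used < max_rounds:
        return "REVISE"
    # Revision budget is spent and critical issues remain.
    return "PASS_WITH_WARNINGS"


print(review_decision(critical_count=1, rounds_used=2))
```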
32
+
33
+ ---
34
+
35
+ ## Post-Render Verification (Video)
36
+
37
+ ### Step 1: Probe the Output (GATE — blocks all other steps)
38
+ ```bash
39
+ ffprobe -v quiet -print_format json -show_format -show_streams rendered_video.mp4
40
+ ```
41
+
42
+ Verify ALL of:
43
+ - [ ] Video stream exists with correct resolution and FPS
44
+ - [ ] **Audio stream exists** — if missing, STOP. Fix audio config, re-render
45
+ - [ ] Duration within +/-5% of target
46
+ - [ ] File size is reasonable (not 0 bytes, not suspiciously small)
47
+
48
+ **If audio stream is missing, do NOT proceed.** Most common cause: audio sources mixed externally but never embedded in the composition.
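The gate checks can be automated against the JSON that the `ffprobe` command above prints. The sketch below assumes the output has already been parsed into a dict; `check_probe` is a hypothetical helper, and the sample data is made up for illustration. Resolution and FPS checks are left out because the targets depend on the project.

```python
import json


def check_probe(probe, target_s, tolerance=0.05):
    """Return a list of critical findings for parsed ffprobe JSON output."""
    findings = []
    kinds = {s.get("codec_type") for s in probe.get("streams", [])}
    if "video" not in kinds:
        findings.append("no video stream")
    if "audio" not in kinds:
        findings.append("no audio stream")
    fmt = probe.get("format", {})
    # ffprobe reports duration and size as strings in its JSON output.
    duration = float(fmt.get("duration", 0))
    if abs(duration - target_s) > tolerance * target_s:
        findings.append(
            f"duration {duration:.1f}s outside +/-{tolerance:.0%} of {target_s}s"
        )
    if int(fmt.get("size", 0)) == 0:
        findings.append("file is empty")
    return findings


# Made-up sample: a render that lost its audio stream.
sample = json.loads(
    '{"streams": [{"codec_type": "video", "width": 1080, "height": 1920}],'
    ' "format": {"duration": "58.2", "size": "10485760"}}'
)
print(check_probe(sample, target_s=60))
```

Any non-empty result would count as CRITICAL under the severity table and block the remaining steps.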
49
+
50
+ ### Step 2: Extract Review Frames
51
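+ Sample frames at a fixed interval (the command below takes one frame every 5 seconds; adjust the rate so samples land near scene midpoints) and visually inspect: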
52
+ ```bash
53
+ ffmpeg -i rendered_video.mp4 -vf "fps=1/5" frame_%04d.png
54
+ ```
55
+ - [ ] No visual artifacts or glitches
56
+ - [ ] Text overlays readable and within safe zones
57
+ - [ ] Color grade consistent across scenes
58
+ - [ ] No black frames or flash frames at cuts
59
+
60
+ ### Step 3: Audio Verification
61
+ - [ ] Play back and confirm narration is audible over music
62
+ - [ ] No audio pops or clicks at cut points
63
+ - [ ] Music volume appropriate (18-20 dB below dialogue)
64
+ - [ ] Audio loudness within platform target (-14 LUFS for social)
65
+
66
+ ### Step 4: Present Review to User
67
+ Give the user a structured summary covering file stats, audio verification, visual findings, and caption status.
68
+
69
+ ---
70
+
71
+ ## Pre-Delivery Checklist by Content Type
72
+
73
+ ### Explainer Video
74
+ - [ ] Hook lands in first 3 seconds
75
+ - [ ] Core concept clearly explained (the "aha" moment)
76
+ - [ ] Captions present and synced
77
+ - [ ] Background music doesn't overpower narration
78
+ - [ ] Duration matches target (+/-10%)
79
+ - [ ] Output plays correctly on target platform
80
+
81
+ ### Short-Form (TikTok/Reels/Shorts)
82
+ - [ ] 9:16 aspect ratio, 1080x1920
83
+ - [ ] Important content within safe zones (900x1400)
84
+ - [ ] Hook in first 1-2 seconds
85
+ - [ ] Captions mandatory (85% watch muted)
86
+ - [ ] File size under platform limit
87
+ - [ ] H.264 High Profile, 8+ Mbps
88
+
89
+ ### Talking Head
90
+ - [ ] Filler words removed
91
+ - [ ] No awkward jump cuts (covered by B-roll or transition)
92
+ - [ ] Speaker's face never covered by overlays
93
+ - [ ] Audio clean — no background noise
94
+ - [ ] Eye-level framing maintained
95
+
96
+ ### Music/Audio
97
+ - [ ] Correct duration
98
+ - [ ] Instrumental if for background use
99
+ - [ ] BPM matches content energy
100
+ - [ ] No clipping or distortion
101
+ - [ ] Loudness normalized to target
102
+
103
+ ---
104
+
105
+ ## Remotion-Specific Verification
106
+
107
+ Before declaring a Remotion render complete:
108
+
109
+ - [ ] Run `composition_validator` before rendering
110
+ - [ ] All `staticFile()` references resolve to existing assets
111
+ - [ ] Composition duration matches sum of scene durations minus transition overlaps
112
+ - [ ] No CSS animations used (must use `useCurrentFrame()` + `interpolate()`)
113
+ - [ ] No Tailwind `animate-*` classes (break frame-based rendering)
114
+ - [ ] `interpolate()` calls use `extrapolateLeft: 'clamp', extrapolateRight: 'clamp'`
115
+ - [ ] Audio layers in sync with visual scenes
116
+ - [ ] Theme colors match the active style
117
+ - [ ] Text scenes use Remotion components, NOT AI-generated images with text
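The duration check in the list above is simple arithmetic: each transition overlaps two neighbouring scenes, so its frames are counted once rather than twice. The sketch below is in Python for illustration only (it is not Remotion code), and the frame counts are made-up examples.

```python
def composition_frames(scene_frames, transition_frames):
    """Expected composition length: scene frames minus transition overlaps.

    Hypothetical helper; pass 0 for a hard cut between two scenes.
    """
    if len(transition_frames) != max(len(scene_frames) - 1, 0):
        raise ValueError("expected one transition entry between each pair of scenes")
    return sum(scene_frames) - sum(transition_frames)


# Three scenes of 90, 150 and 120 frames joined by two 15-frame transitions.
total = composition_frames([90, 150, 120], [15, 15])
print(total, total / 30)  # frames, and seconds at an assumed 30 fps
```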
118
+
119
+ ## Review Log Format
120
+
121
+ When logging a review finding:
122
+ ```
123
+ [SEVERITY] Finding description
124
+ - What: specific issue observed
125
+ - Where: timestamp or scene reference
126
+ - Fix: recommended action
127
+ ```
128
+
129
+ ---
130
+
131
+ ## Kolbo MCP Integration
132
+
133
+ Use these tools during review:
134
+
135
+ | Review Step | Kolbo MCP Tool | What to Check |
136
+ |-------------|---------------|---------------|
137
+ | Audio verification | `transcribe_audio` | Transcribe the rendered video — if 0 words, audio is silent |
138
+ | Visual analysis | `chat_send_message` + Gemini | "Review this video for quality issues" |
139
+ | Credit check | `check_credits` | Verify budget before re-renders |
140
+
141
+ **Post-render verification with Kolbo:**
142
+ 1. `ffprobe` the output (always first — check streams exist)
143
+ 2. `transcribe_audio` the rendered video → compare word count to script
144
+ 3. If word count < 80% of script → audio is cut off → investigate
145
+ 4. `chat_send_message` with Gemini + video URL → visual quality review
146
+ 5. Present structured findings to user
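Steps 2 and 3 of the verification above amount to a word-count ratio. A minimal sketch, assuming the transcript text has already been obtained from `transcribe_audio`; the function name and return strings are illustrative.

```python
def audio_coverage(script, transcript, threshold=0.8):
    """Compare the transcript word count with the script word count."""
    script_words = len(script.split())
    heard_words = len(transcript.split())
    if heard_words == 0:
        return "CRITICAL: audio is silent"
    if script_words and heard_words / script_words < threshold:
        return "CRITICAL: audio may be cut off"
    return "OK"


script = "one two three four five six seven eight nine ten"
print(audio_coverage(script, "one two three four five six seven"))
```

Word counts are a coarse signal: transcription errors can add or drop words, so a result near the threshold is worth a manual listen.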
147
+
148
+ **Re-generation workflow (if review finds critical issues):**
149
+ 1. Identify the failed asset (video clip, audio, image)
150
+ 2. Re-generate with adjusted prompt via the appropriate Kolbo MCP tool
151
+ 3. Re-compose with FFmpeg or Remotion
152
+ 4. Run review again (max 2 revision rounds)
@@ -0,0 +1,168 @@
1
+ ---
2
+ name: short-form-video
3
+ description: >
4
+ Short-form video optimization for TikTok, Instagram Reels, YouTube Shorts, and Facebook Reels.
5
+ Platform safe zones, upload specs, hook techniques, pacing rules, duration strategy, retention
6
+ benchmarks, caption requirements. Use when creating vertical video content or optimizing for
7
+ social platforms.
8
+ Keywords: tiktok, reels, shorts, vertical video, 9:16, hook, retention, pacing, safe zone,
9
+ caption, short form, social media, viral
10
+ ---
11
+
12
+ # Short-Form Video (TikTok / Reels / Shorts)
13
+
14
+ ## Quick Reference
15
+
16
+ ```
17
+ ASPECT RATIO: 9:16 vertical (1080x1920)
18
+ SAFE ZONE: 900x1400px centered (universal cross-platform)
19
+ DURATION: 15s (highest completion) | 30s (best engagement) | 60s (most flexible)
20
+ HOOK: First 1-2 seconds — visual or text pattern interrupt
21
+ CAPTIONS: Mandatory (85% watch muted on mobile)
22
+ TEXT SIZE: 42px+ minimum, bold sans-serif
23
+ PACING: Visual change every 1-3 seconds
24
+ TARGET LUFS: -14 LUFS, true peak -1 dBTP
25
+ MUSIC: 120-140 BPM for energetic, 90-110 for explainers
26
+ CODEC: H.264 High Profile, 8-15 Mbps VBR
27
+ ```
28
+
29
+ ## Platform Safe Zones (1080x1920)
30
+
31
+ | Platform | Safe Zone | Top Dead | Bottom Dead | Right Dead |
32
+ |----------|-----------|----------|-------------|------------|
33
+ | TikTok | 900x1492 | 108px | 320px | 120px |
34
+ | Instagram Reels | 996x1400 | 210px | 310px | 84px |
35
+ | YouTube Shorts | 984x1500 | 120px | 300px | 96px |
36
+ | Facebook Reels | 1080x1520 | 100px | 300px | 60px |
37
+
38
+ **Universal safe zone: 900x1400px centered** — works across all platforms.
39
+
40
+ **Bottom dead zones are critical** — platform UI (comments, share, captions) covers the bottom 300-320px. Never put important content there.
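To check a layout against every platform at once, the dead-zone columns can be intersected. The sketch below takes the table's top, bottom and right values at face value and assumes no left dead zone, since the table does not list one; `common_safe_area` is a hypothetical helper.

```python
FRAME_W, FRAME_H = 1080, 1920

# (top, bottom, right) dead zones in px, copied from the table above.
DEAD_ZONES = {
    "TikTok": (108, 320, 120),
    "Instagram Reels": (210, 310, 84),
    "YouTube Shorts": (120, 300, 96),
    "Facebook Reels": (100, 300, 60),
}


def common_safe_area(zones):
    """Return (top_y, bottom_y, right_x) of the area clear on all platforms."""
    top = max(t for t, _, _ in zones.values())
    bottom = FRAME_H - max(b for _, b, _ in zones.values())
    right = FRAME_W - max(r for _, _, r in zones.values())
    return top, bottom, right


top, bottom, right = common_safe_area(DEAD_ZONES)
print(top, bottom, right)
```

With the table's numbers this gives y from 210 to 1600 and a right edge at x=960. A strictly centered 900x1400 box spans y 260 to 1660 and x 90 to 990, so elements near its bottom or right edge are worth checking per platform.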
41
+
42
+ ## Duration Strategy
43
+
44
+ | Duration | Avg Completion | Best For |
45
+ |----------|---------------|----------|
46
+ | 0-15s | 92% | Single fact, quick tip, visual gag |
47
+ | 16-30s | 84% | One concept explained, before/after |
48
+ | 31-60s | 68% | Mini tutorial, step-by-step, story arc |
49
+ | 60s+ | 48% | Deep explainer (only with strong retention structure) |
50
+
51
+ **Platform sweet spots:**
52
+ - TikTok: 21-34s for completion; 60-180s for maximum total watch time
53
+ - Reels: 15-30s for viral reach; 60-90s for highest engagement
54
+ - Shorts: Bimodal — ~13s OR full 60s
55
+
56
+ **Key formula:** A 45s video with 70% completion (31.5s watch time) outperforms a 15s video with 40% completion (6s). Total watch time is what the algorithm rewards.
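The comparison in the key formula is duration multiplied by completion rate. A minimal sketch of the arithmetic, with a hypothetical helper name; completion is passed as a whole percentage to keep the floating-point results exact.

```python
def watch_time_s(duration_s, completion_pct):
    """Average watch time in seconds for a given completion percentage."""
    return duration_s * completion_pct / 100


long_video = watch_time_s(45, 70)   # 31.5 seconds
short_video = watch_time_s(15, 40)  # 6.0 seconds
print(long_video, short_video, long_video > short_video)
```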
57
+
58
+ ## The 1-Second Hook
59
+
60
+ 70%+ of TikTok users decide to scroll or stay within 3 seconds (average decision: 1.7s).
61
+
62
+ ### 3-Second Retention Impact
63
+
64
+ | 3-Second Retention | Algorithmic Effect | View Multiplier |
65
+ |-------------------|-------------------|-----------------|
66
+ | Below 60% | Minimal promotion | 1.0x |
67
+ | 60-70% | Average distribution | 1.6x |
68
+ | 70-85% | Optimal reach | 2.2x |
69
+ | 85%+ | Viral potential | 2.8x |
70
+
71
+ ### Hook Techniques
72
+
73
+ | Technique | Example | When to Use |
74
+ |-----------|---------|-------------|
75
+ | **Bold text on screen** | "STOP doing this..." (frame 1) | Always — works even muted |
76
+ | **Pattern interrupt** | Unexpected visual, jump cut, color flash | Attention-grabbing |
77
+ | **Question** | "Why does X happen?" (text + voiceover) | Educational |
78
+ | **Result first** | Show finished result, then explain how | Tutorial/how-to |
79
+ | **Controversy** | "Everyone gets this wrong" | Engagement bait |
80
+
81
+ ### Hook Rules
82
+
83
+ 1. **Frame 1 must have visual interest** — no blank intros, no logos, no "hey guys"
84
+ 2. **Text appears in the first 0.5 seconds** — viewers scan text before listening
85
+ 3. **Voice starts immediately** — no silent buildup
86
+ 4. **Movement in frame 1** — static opening frames get scrolled past
87
+
88
+ ## Pacing Rules
89
+
90
+ - Visual change at least every **1-3 seconds**
91
+ - New information every **3-5 seconds**
92
+ - No static shot longer than **2 seconds** without text overlay or motion
93
+ - Scene transitions should be **hard cuts** (no slow fades on short-form)
94
+
95
+ ## Retention Checkpoints
96
+
97
+ | Timestamp | Target Retention |
98
+ |-----------|-----------------|
99
+ | 3 seconds | 70%+ |
100
+ | 15 seconds | 60%+ |
101
+ | 30 seconds | 50%+ |
102
+
103
+ ## Upload Specs
104
+
105
+ ```
106
+ CODEC: H.264 High Profile, Level 4.2
107
+ BITRATE: 8-15 Mbps VBR (below 5 Mbps triggers quality downgrade)
108
+ FORMAT: .mp4 preferred
109
+ MAX SIZE: 500 MB (desktop), 287.6 MB (iOS), 72 MB (Android)
110
+ ```
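Bitrate and duration together determine whether a render fits under the size limits listed above. The sketch below estimates the video stream only (audio and container overhead are ignored) and assumes 1 MB = 1,000,000 bytes; both helper names are hypothetical.

```python
def video_size_mb(bitrate_mbps, duration_s):
    """Approximate video stream size: megabits per second over 8 bits per byte."""
    return bitrate_mbps * duration_s / 8


def max_duration_s(size_limit_mb, bitrate_mbps):
    """Longest duration that stays under a size limit at a given bitrate."""
    return size_limit_mb * 8 / bitrate_mbps


print(video_size_mb(8, 60))    # 60.0 MB for a 60s clip at 8 Mbps
print(video_size_mb(15, 60))   # 112.5 MB for the same clip at 15 Mbps
print(max_duration_s(72, 15))  # 38.4 s fits in 72 MB at 15 Mbps
```

By this estimate a 60s clip at 15 Mbps would exceed the 72 MB figure listed above, so the bitrate may need to sit nearer the low end of the range for longer clips.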
111
+
112
+ ## Caption Requirements
113
+
114
+ - **85% of social video is watched on mute** — captions are mandatory
115
+ - Max 3-4 words per cue on vertical (narrow screen)
116
+ - Max 20 characters per line
117
+ - 42px+ minimum font size
118
+ - Bold sans-serif font (Arial, Inter, Montserrat)
119
+ - Thick outline (3px) for readability on varied backgrounds
120
+ - Position in bottom 20% but above the platform dead zone
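The word and character limits above can be applied with a greedy split. This sketch treats each cue as a single line, which is an assumption; `split_cues` is a hypothetical helper and does not handle timing, which would come from the word-level timestamps.

```python
def split_cues(text, max_words=4, max_chars=20):
    """Greedily group words into cues within the word and character limits."""
    cues, current = [], []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and (len(current) >= max_words or len(candidate) > max_chars):
            cues.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        cues.append(" ".join(current))
    return cues


cues = split_cues("Stop doing this if you want better retention on every video")
print(cues)
```

A single word longer than the character limit still becomes its own cue rather than being split.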
121
+
122
+ ## 9:16 Conversion (from 16:9 source)
123
+
124
+ Blurred background + centered content — never crop the original:
125
+
126
+ ```bash
127
+ ffmpeg -i input.mp4 -filter_complex \
128
+ "[0:v]split[bg][fg]; \
129
+ [bg]scale=1080:1920:force_original_aspect_ratio=increase, \
130
+ crop=1080:1920,gblur=sigma=40[blurred]; \
131
+ [fg]scale=1080:1920:force_original_aspect_ratio=decrease[front]; \
131
+ [blurred][front]overlay=(W-w)/2:(H-h)/2, \
132
+ format=yuv420p" \
134
+ -c:v libx264 -crf 18 -c:a aac output_vertical.mp4
135
+ ```
136
+
137
+ ---
138
+
139
+ ## Kolbo MCP Integration
140
+
141
+ | Task | Kolbo MCP Tool | Notes |
142
+ |------|---------------|-------|
143
+ | Generate vertical video | `generate_video` | Specify "9:16 vertical" in prompt |
144
+ | Image-to-video hook | `generate_video_from_image` | Animate a striking frame for the hook |
145
+ | Batch clips from long-form | `generate_creative_director` | Extract highlights |
146
+ | Add captions | `transcribe_audio` | Get word-level SRT, then burn-in with FFmpeg |
147
+ | Background music | `generate_music` | 120-140 BPM for energetic, instrumental=true |
148
+ | Sound effects | `generate_sound` | Whooshes, pops for transitions |
149
+ | Style consistency | `create_visual_dna` | Same look across a series |
150
+
151
+ **Short-form production workflow:**
152
+ 1. Script using the `storytelling` skill (hook → content → close)
153
+ 2. `generate_speech` → narration
154
+ 3. `generate_video` or `generate_video_from_image` → visual clips
155
+ 4. `generate_music` → background track (120-140 BPM, instrumental)
156
+ 5. `transcribe_audio` → get word-level SRT for captions
157
+ 6. FFmpeg: compose 9:16 video + burn-in captions + mix audio
158
+ 7. Review with `production-review` skill checklist
159
+
160
+ ---
161
+
162
+ ## Local / Free Options
163
+
164
+ > **IMPORTANT:** Always use Kolbo MCP tools by default. FFmpeg is the only tool safe to use without asking — it's standard software. For anything else, confirm with the user first.
165
+
166
+ **FFmpeg (safe, standard):** Handles 9:16 conversion, caption burn-in, audio mixing, silence removal — all commands in this skill and the `ffmpeg-patterns` skill.
167
+
168
+ **Transcription:** Kolbo's `transcribe_audio` is easiest. If the user explicitly wants offline transcription, `faster-whisper` runs on CPU with no GPU needed (`pip install faster-whisper`) — but confirm before installing.
@@ -0,0 +1,154 @@
1
+ ---
2
+ name: sound-design
3
+ description: >
4
+ Audio production rules for video: dialogue levels, music ducking, SFX placement and timing,
5
+ BPM selection by content type, platform loudness targets (LUFS), voice EQ and compression,
6
+ audio ducking levels. Use when mixing audio for video, choosing background music, or placing
7
+ sound effects.
8
+ Keywords: audio, sound design, ducking, LUFS, loudness, music, sfx, sound effects, mixing,
9
+ dialogue, voice, EQ, compression, BPM, volume, audio levels
10
+ ---
11
+
12
+ # Sound Design for Video Production
13
+
14
+ ## Quick Reference
15
+
16
+ ```
17
+ DIALOGUE: -12 dB peak | -16 to -14 LUFS integrated
18
+ MUSIC BED: -30 to -20 dB (18-20 dB below dialogue)
19
+ SFX: -18 to -12 dB (6 dB below dialogue minimum)
20
+ WHOOSH TIMING: Start 10-20ms before visual, duration 400-500ms
21
+ MUSIC BPM: Calm 60-80 | Standard 90-110 | Upbeat 120-140
22
+ TRUE PEAK: Never exceed -1.5 dBTP
23
+ VOICE EQ: HPF 80Hz, cut 500Hz, boost 2-5kHz, cut 6-8kHz
24
+ VOICE COMP: 3:1 ratio, 1-5ms attack, 10-20ms release
25
+ TARGET LUFS: -14 LUFS (YouTube/TikTok/IG) | -16 LUFS (podcasts)
26
+ ```
27
+
28
+ ## Audio Ducking Levels
29
+
30
+ | Element | Peak Level | Notes |
31
+ |---------|-----------|-------|
32
+ | Dialogue / Narration | -6 dB to -12 dB | Primary element |
33
+ | Background music (during speech) | -30 dB to -20 dB | 18-20 dB below dialogue |
34
+ | Sound effects | -12 dB to -18 dB | Between dialogue and music |
35
+ | Final mix | -10 dB to -20 dB | Never exceed 0 dB |
36
+
37
+ **Ducking rules:**
38
+ - W3C accessibility (WCAG Success Criterion 1.4.7, Level AAA): background audio should be at least **20 dB lower** than foreground speech
39
+ - BBC guideline: lower music by an additional **4 dB** from where you think it sounds right
40
+ - Duck music by **6-12 dB** from its speech-free level when narration is active
41
+ - EQ trick: cut **2-4 kHz** on background music to clear the speech intelligibility band
42
+ - When testing, adjust in **1 dB increments** from a -20 dB baseline upward
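The dB figures above map to linear gain factors through the standard amplitude relation gain = 10^(dB/20), which helps when a tool expects a multiplier instead of decibels. A minimal sketch with hypothetical helper names:

```python
import math


def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def gain_to_db(gain):
    """Convert a linear amplitude multiplier back to dB."""
    return 20 * math.log10(gain)


print(round(db_to_gain(-20), 3))  # about 0.1
print(round(db_to_gain(-18), 3))  # about 0.126
print(round(gain_to_db(0.15), 1))  # about -16.5
```

So a music bed 18-20 dB below dialogue corresponds to roughly 0.10-0.13 of the dialogue amplitude, and a mix weight of 0.15 is about 16.5 dB down.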
43
+
44
+ ## Music Selection by Content Type
45
+
46
+ | Content Type | BPM Range | Mood |
47
+ |-------------|-----------|------|
48
+ | Calm explainer / tutorial | 60-80 | Contemplative, focused |
49
+ | Corporate / testimonial | 60-100 | Professional, calm |
50
+ | Standard explainer | 90-110 | Steady, engaging |
51
+ | Upbeat promo | 110-130 | Enthusiastic |
52
+ | High-energy / demo | 120-140 | Exciting, dynamic |
53
+ | Action / fast-paced | 140-200 | Adrenaline |
54
+
55
+ **Genre recommendations for explainers:**
56
+ - Lo-fi (steady, non-distracting, modern feel)
57
+ - Ambient (atmospheric, stays in background)
58
+ - Light acoustic guitar instrumentals (warm, approachable)
59
+ - Inspiring soundtrack / cinematic light (builds emotion without overwhelming)
60
+
61
+ **Key rules:**
62
+ - Always use **instrumental** tracks when voiceover is present
63
+ - Choose dynamically **even** tracks — avoid dramatic crescendos or beat drops
64
+ - Match energy to the content: upbeat for "exciting new concept," gentle for serious topics
65
+
66
+ ## Sound Effects (SFX) Placement
67
+
68
+ | SFX Type | Use Case | Duration | Level |
69
+ |----------|----------|----------|-------|
70
+ | Whoosh / Swish | Scene transitions | 400-500ms | -18 to -12 dB |
71
+ | Pop / Pluck | Text appearing, bullet points | <200ms | -15 to -12 dB |
72
+ | Click / Tap | UI interactions | <100ms | -20 to -15 dB |
73
+ | Riser / Swell | Building to a reveal | 1-3s | -18 to -12 dB |
74
+ | Impact / Hit | Key reveal, stat | <300ms | -12 to -6 dB |
75
+ | Subtle whoosh | Element sliding in/out | 200-400ms | -20 to -15 dB |
76
+
77
+ ### Timing Rules
78
+ - Start whoosh **10-20ms before** the visual transition (brain processes audio faster)
79
+ - Peak of whoosh energy = **moment of greatest visual change**
80
+ - Fine-tune in **1-frame increments** for sync
81
+ - When stacking whooshes, keep them in different frequency bands
82
+
83
+ ## Platform Loudness Targets
84
+
85
+ | Platform | Integrated LUFS | True Peak |
86
+ |----------|----------------|-----------|
87
+ | YouTube | -14 LUFS | -1 dBTP |
88
+ | TikTok | -14 LUFS | -1 dBTP |
89
+ | Instagram Reels | -14 LUFS | -1 dBTP |
90
+ | Spotify (podcast) | -14 LUFS | -1 dBTP |
91
+ | Apple Podcasts | -16 LUFS | -1 dBTP |
92
+ | Broadcast TV | -24 LUFS | -2 dBTP |
93
+
94
+ ## Voice Processing Chain
95
+
96
+ Apply in this order:
97
+ 1. **High-pass filter** at 80 Hz (removes rumble)
98
+ 2. **Cut 500 Hz** by 2-3 dB (removes muddiness)
99
+ 3. **Boost 2-5 kHz** by 2-3 dB (presence and clarity)
100
+ 4. **Cut 6-8 kHz** by 1-2 dB (reduces sibilance)
101
+ 5. **Compress** at 3:1 ratio, 1-5ms attack, 10-20ms release
102
+ 6. **Normalize** to target LUFS
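The six steps above can be assembled into one FFmpeg `-af` chain. The sketch below only builds the filter string. The centre frequencies (3500 Hz, 7000 Hz), the Q width of 1, the gain midpoints and the attack/release midpoints are assumptions chosen from inside the stated ranges, and `voice_filter_chain` is a hypothetical helper name.

```python
def voice_filter_chain(target_lufs=-14):
    """Build an FFmpeg audio filter string following the six-step order."""
    steps = [
        "highpass=f=80",                            # 1. remove rumble
        "equalizer=f=500:t=q:w=1:g=-2.5",           # 2. cut muddiness
        "equalizer=f=3500:t=q:w=1:g=2.5",           # 3. presence boost (assumed centre)
        "equalizer=f=7000:t=q:w=1:g=-1.5",          # 4. tame sibilance (assumed centre)
        "acompressor=ratio=3:attack=3:release=15",  # 5. 3:1 compression
        f"loudnorm=I={target_lufs}:TP=-1",          # 6. normalize to target
    ]
    return ",".join(steps)


chain = voice_filter_chain()
print(f'ffmpeg -i narration.wav -af "{chain}" narration_processed.wav')
```

The printed command is a starting point; the EQ centres and gains are worth tuning by ear for each voice.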
103
+
104
+ ## FFmpeg Audio Commands
105
+
106
+ ### Loudness Normalization
107
+ ```bash
108
+ ffmpeg -i input.mp4 -af loudnorm=I=-14:LRA=11:TP=-1 -c:v copy output.mp4
109
+ ```
110
+
111
+ ### Audio Ducking with Sidechain
112
+ ```bash
113
+ ffmpeg -i narration.wav -i music.wav -filter_complex \
113
+ "[0:a]asplit=2[voice][key]; \
114
+ [1:a][key]sidechaincompress=threshold=0.02:ratio=9:attack=200:release=500[ducked]; \
115
+ [voice][ducked]amix=inputs=2:weights='1 0.15'" \
117
+ -c:a aac output.m4a
118
+ ```
119
+
120
+ ### Measure Loudness
121
+ ```bash
122
+ ffmpeg -i input.mp4 -af loudnorm=print_format=json -f null - 2>&1 | grep -A 20 "Parsed_loudnorm"
123
+ ```
124
+
125
+ ---
126
+
127
+ ## Kolbo MCP Integration
128
+
129
+ | Task | Kolbo MCP Tool | Notes |
130
+ |------|---------------|-------|
131
+ | Generate narration | `generate_speech` | See `list_voices` for voice options |
132
+ | Generate music | `generate_music` | Use BPM tables above, always instrumental=true |
133
+ | Generate SFX | `generate_sound` | Describe physically: "door slam in stone hallway" |
134
+ | Transcribe audio | `transcribe_audio` | Word-level timestamps for sync |
135
+ | Voice discovery | `list_voices` | Filter by language, gender, provider |
136
+
137
+ **Full audio production workflow:**
138
+ 1. `generate_speech` → narration track
139
+ 2. `generate_music` instrumental=true → background music
140
+ 3. `generate_sound` → individual SFX (whooshes, impacts)
141
+ 4. Mix with FFmpeg using the ducking commands above
142
+ 5. Normalize to -14 LUFS for social platforms
143
+
144
+ ---
145
+
146
+ ## Local / Free Options
147
+
148
+ > **IMPORTANT:** Always use Kolbo MCP tools by default. Only mention these if the user explicitly asks for free/offline options. Always confirm before installing anything.
149
+
150
+ **TTS:** `edge-tts` (free Microsoft voices, no GPU, `pip install edge-tts`) or `piper-tts` (fully offline, CPU-only). Both are safe, lightweight installs.
151
+
152
+ **SFX libraries (no install needed):** Freesound.org, Pixabay Sound Effects, BBC Sound Effects — all free, browser-based.
153
+
154
+ **FFmpeg** is the only tool you should use without asking — it's standard and safe. All the mixing/ducking/normalization commands in this skill use FFmpeg.