@kolbo/kolbo-code-linux-arm64-musl 1.1.72 → 1.1.73
- package/bin/kolbo +0 -0
- package/package.json +1 -1
- package/skills/color-grading/SKILL.md +152 -0
- package/skills/ffmpeg-patterns/SKILL.md +240 -0
- package/skills/image-prompting-guide/SKILL.md +143 -0
- package/skills/kolbo/SKILL.md +29 -0
- package/skills/music-prompting/SKILL.md +146 -0
- package/skills/production-review/SKILL.md +152 -0
- package/skills/short-form-video/SKILL.md +168 -0
- package/skills/sound-design/SKILL.md +154 -0
- package/skills/storytelling/SKILL.md +139 -0
- package/skills/subtitle-production/SKILL.md +244 -0
- package/skills/subtitle-production/reference/burn_to_video.py +222 -0
- package/skills/subtitle-production/reference/export_srts.py +127 -0
- package/skills/subtitle-production/reference/gen_srt.py +42 -0
- package/skills/typography-video/SKILL.md +182 -0
- package/skills/typography-video/reference/KineticTitleScene.tsx +345 -0
- package/skills/video-editing/SKILL.md +128 -0
- package/skills/video-production/SKILL.md +7 -8
- package/skills/video-prompting-guide/SKILL.md +268 -0
@@ -0,0 +1,139 @@
---
name: storytelling
description: >
  Narrative structure and storytelling for video content. Explainer arc templates, hook types,
  the 30-second retention rule, pacing by duration, the "but-therefore" method, concept
  introduction patterns. Use when scripting explainer videos, educational content, or any
  narrative-driven video.
  Keywords: storytelling, narrative, script, explainer, hook, arc, structure, pacing, retention,
  educational, concept, story, writing, script structure
---

# Storytelling & Narrative Structure for Video

## The Explainer Arc Template (3 minutes)

Scale proportionally for other lengths.

```
[0:00 - 0:08] HOOK
  Pattern interrupt or counterintuitive claim. 1-2 sentences max.
  Visual: striking image or animation that creates curiosity.

[0:08 - 0:30] TENSION / INFORMATION GAP
  "Here's what most people think... but that's not quite right."
  Establish stakes: why should I care?

[0:30 - 0:50] CONCEPT 1 (Foundation)
  Simplest building block needed. ONE idea, ONE visual.
  End with a "but" or "therefore" transition.

[0:50 - 1:15] CONCEPT 2 (Complication)
  Build on Concept 1. Introduce the wrinkle.
  Visual: transform/evolve the previous visual.

[1:15 - 1:20] PALATE CLEANSER
  Brief pause, visual gag, or "let that sink in" moment.
  Gives working memory a beat to consolidate.

[1:20 - 1:50] CONCEPT 3 (Key Insight)
  The "aha" moment. Core of the video.
  1-3 seconds of deliberate silence after the reveal.
  Visual: the most polished animation in the video.

[1:50 - 2:20] PROOF / EXAMPLE
  Concrete demonstration: "Watch what happens when..."
  Show the insight working in a specific case.

[2:20 - 2:45] IMPLICATIONS / "SO WHAT?"
  Connect back to the real world. "This means that..."
  Scale from specific back to general.

[2:45 - 3:00] REFRAME + CLOSE
  Callback to the hook. Restate core insight in one sentence.
  Optional: open a new curiosity gap.
```
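
The instruction to scale the template proportionally can be sketched in Python. The section boundaries below are copied from the 3-minute template; the helper names `scale_arc` and `fmt`, and the choice to round to whole seconds, are assumptions made for illustration and are not part of the skill.

```python
# Section boundaries (in seconds) copied from the 3-minute template.
ARC_3MIN = [
    ("HOOK", 0, 8),
    ("TENSION / INFORMATION GAP", 8, 30),
    ("CONCEPT 1 (Foundation)", 30, 50),
    ("CONCEPT 2 (Complication)", 50, 75),
    ("PALATE CLEANSER", 75, 80),
    ("CONCEPT 3 (Key Insight)", 80, 110),
    ("PROOF / EXAMPLE", 110, 140),
    ('IMPLICATIONS / "SO WHAT?"', 140, 165),
    ("REFRAME + CLOSE", 165, 180),
]


def scale_arc(total_seconds, template=ARC_3MIN):
    """Scale every boundary linearly to a new total length (hypothetical helper)."""
    factor = total_seconds / template[-1][2]
    return [(name, round(start * factor), round(end * factor))
            for name, start, end in template]


def fmt(seconds):
    """Format whole seconds as M:SS, matching the template's notation."""
    return f"{seconds // 60}:{seconds % 60:02d}"


if __name__ == "__main__":
    # Print a 2-minute version of the arc.
    for name, start, end in scale_arc(120):
        print(f"[{fmt(start)} - {fmt(end)}] {name}")
```

A purely linear scale is only a starting point: for longer videos it pushes the hook and tension past the 30-second mark, so the opening sections usually need to be trimmed by hand afterwards.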

## Scaling by Duration

| Length | Concepts | Hook | Tension | Core | Proof | Close |
|--------|----------|------|---------|------|-------|-------|
| 1 min | 1-2 | 5s | 10s | 30s | 10s | 5s |
| 2 min | 2-3 | 8s | 15s | 60s | 25s | 12s |
| 3 min | 3-5 | 8s | 22s | 100s | 30s | 15s |
| 5 min | 5-8 | 10s | 30s | 180s | 50s | 20s |

## Hook Types

| Type | Pattern | Best For |
|------|---------|----------|
| **Contrarian** | "Everything you've been told about X is wrong." | Science/myth-busting |
| **Outcome** | "By the end of this video, you'll understand X." | Math/concept explainers |
| **Mystery** | "In 1987, something impossible happened..." | Story-driven content |
| **Stakes** | "This one mistake costs people X every year." | Practical/how-to |

## The 30-Second Rule

50% of viewer drop-off happens in the first 30 seconds. The hook + tension setup MUST be complete by second 30. Retention curves that survive the 30-second cliff typically retain 40-60% through the full video.

## The "But-Therefore" Method

Never connect sections with "and then." Always use **"but"** or **"therefore."**

**Bad:** "Atoms have electrons, AND THEN those electrons have energy levels, AND THEN..."

**Good:** "Atoms have electrons. BUT those electrons can only exist at specific energy levels. THEREFORE, when they jump between levels, they release light at exact frequencies."

Each "but" creates tension. Each "therefore" resolves it. This is the engine of narrative momentum.

## Concept Introduction Pattern

For each new concept in the video:

1. **Name it**: give the concept a label the viewer can hold onto
2. **Show it**: visual representation (never just explain with words)
3. **Contrast it**: "unlike X, this works by..."
4. **Apply it**: concrete example in the real world
5. **Connect it**: link to the previous concept with "but" or "therefore"

## Pacing by Content Energy

| Energy Level | Pacing | Visual Change Rate |
|-------------|--------|-------------------|
| High (promo, action) | Fast cuts, 1-2s per shot | Every 1-2 seconds |
| Medium (tutorial, explainer) | Balanced, 3-5s per shot | Every 3-4 seconds |
| Low (meditation, documentary) | Let scenes breathe, 5-10s | Every 5-8 seconds |

## Common Mistakes

- **Info dump at the start**: frontload curiosity, not information
- **No stakes**: "here's a cool fact" vs "this changes how you should think about X"
- **Too many concepts**: one concept per minute is the maximum for retention
- **Missing the "aha"**: every video needs ONE clear revelation moment
- **Symmetric structure**: the most important concept should be at 60-70% through the video, not in the middle
- **No callback**: the close should reference the hook, creating a satisfying loop

---

## Kolbo MCP Integration

Use storytelling structure to guide Kolbo generation workflows:

**Scripted Explainer Workflow:**
1. Write script using the arc template above
2. `generate_speech` → narrate each section
3. `generate_image` or `generate_video` per scene (with visual direction from script)
4. `generate_music` → background track matching the energy arc
5. Compose in Remotion or FFmpeg following the timing structure
6. `transcribe_audio` → generate captions for accessibility

**AI-Assisted Scripting:**
Use `chat_send_message` with a video-capable model to brainstorm scripts. Feed the arc template as context and ask the AI to fill in each section for your topic.

**Creative Director for Visual Storyboarding:**
`generate_creative_director` with 4-8 scenes mapped to the arc sections:
- Scene 1: Hook visual
- Scene 2-3: Concept visuals
- Scene 4: Key insight / "aha" visual
- Scene 5-6: Proof / example
- Scene 7-8: Close / callback
@@ -0,0 +1,244 @@
---
name: subtitle-production
description: >
  Subtitle and caption production: timing strategies, cue length by format (vertical vs horizontal),
  ASS/SRT styling, word-level timing, RTL support for Hebrew/Arabic, burn-in with FFmpeg,
  readability rules. Use when generating, styling, or burning in subtitles.
  Keywords: subtitle, caption, SRT, ASS, VTT, timing, burn-in, word-level, karaoke, RTL,
  Hebrew, Arabic, font size, cue, readability
---

# Subtitle & Caption Production

## Output Formats

| Format | Extension | Use Case |
|--------|-----------|----------|
| SRT | `.srt` | Universal: FFmpeg, players, YouTube upload |
| VTT | `.vtt` | Web-native: HTML5 video, browser playback |
| ASS | `.ass` | Advanced styling, RTL support, per-word positioning |

## Cue Length by Format

### Vertical Short-Form (TikTok, Reels, Shorts)
- **Max 3-4 words per cue**: narrow screen, text must be large
- **Max 20 characters per line**
- Subtitles are **mandatory** (85% watch muted)

### Horizontal Standard (YouTube, web)
- **Max 6-8 words per cue**: wider screen
- **Max 42 characters per line** (broadcast standard)

### General Rules
- Average viewer reads ~15 characters/second
- Minimum display time: 0.5 seconds per cue
- Maximum display time: 5 seconds per cue
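
As a rough sketch of how these general rules combine, the display time for a cue can be estimated from its character count at ~15 characters/second and then clamped to the 0.5 to 5 second window. The function name and the decision to count spaces as characters are assumptions made for illustration.

```python
READING_SPEED_CPS = 15.0  # average reading speed from the general rules
MIN_DISPLAY_S = 0.5       # minimum display time per cue
MAX_DISPLAY_S = 5.0       # maximum display time per cue


def cue_display_time(text, cps=READING_SPEED_CPS):
    """Estimate how long a cue should stay on screen (hypothetical helper).

    Counts every character, including spaces, which is an assumption.
    """
    estimate = len(text) / cps
    return max(MIN_DISPLAY_S, min(MAX_DISPLAY_S, estimate))


if __name__ == "__main__":
    for cue in ["Hi", "Subtitles are mandatory here", "x" * 120]:
        print(f"{len(cue):3d} chars -> {cue_display_time(cue):.2f}s")
```

In practice the speech timing usually decides how long a cue is shown; an estimate like this is mainly useful for flagging cues that carry more text than a viewer can read in the time available.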

## Styling for Burn-in

### Vertical Video (1080x1920)
```
font: Arial (or Heebo Bold for Hebrew)
font_size: 18
bold: true
primary_color: &H00FFFFFF (white, ASS format)
outline_color: &H00000000 (black)
outline_width: 3 (thick for readability)
shadow: 2
margin_v: 50
alignment: 2 (bottom center)
```

### Horizontal Video (1920x1080)
```
font: Arial
font_size: 22
bold: true
primary_color: &H00FFFFFF
outline_color: &H00000000
outline_width: 2
shadow: 1
margin_v: 40
alignment: 2
```

### Common Mistakes
- **Wrong color format:** `&HFFFFFF` breaks positioning. Always use full 8-char `&H00FFFFFF`
- **Font too large on vertical:** `font_size: 28` fills center of 9:16. Use 18 max
- **Too many words per cue on vertical:** 5+ words creates multi-line blocks covering the face
- **MarginV too large:** Values over 200 push text off-screen. Stay under 100
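
The 8-character colour values follow the ASS layout `&HAABBGGRR`: alpha first, then blue, green, and red. White and black read the same in either byte order, so the reversal only shows up with other colours. The helper below is a minimal sketch for converting a web-style `#RRGGBB` value; its name and signature are assumptions made for illustration.

```python
def rgb_hex_to_ass(hex_color, alpha=0):
    """Convert '#RRGGBB' to an 8-digit ASS colour '&HAABBGGRR' (hypothetical helper).

    In ASS, alpha 00 is fully opaque and FF is fully transparent.
    """
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected #RRGGBB, got {hex_color!r}")
    int(value, 16)  # raises ValueError if the input is not hexadecimal
    red, green, blue = value[0:2], value[2:4], value[4:6]
    # ASS stores the channels in blue-green-red order after the alpha byte.
    return f"&H{alpha:02X}{blue}{green}{red}".upper()


if __name__ == "__main__":
    print(rgb_hex_to_ass("#FFFFFF"))  # white
    print(rgb_hex_to_ass("#3b82f6"))  # a blue accent colour
```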

## Timing Best Practices

- Cue start must match word onset (not before the speaker starts)
- Cue end should extend ~200ms past the last word for comfortable reading
- Never let a cue linger into the next speaker's turn
- Don't split a thought across two cues if it fits in one
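
A minimal sketch of the end-padding rule: each cue end is extended by about 200ms, but never past the start of the following cue. The `(start, end, text)` tuple shape and the function name are assumptions made for illustration.

```python
def pad_cue_ends(cues, pad=0.2):
    """Extend each cue's end by `pad` seconds without running into the next cue.

    `cues` is a list of (start, end, text) tuples sorted by start time
    (an assumed shape). Cues that already overlap are left unchanged.
    """
    padded = []
    for i, (start, end, text) in enumerate(cues):
        new_end = end + pad
        if i + 1 < len(cues):
            # Do not linger into the next cue.
            new_end = min(new_end, cues[i + 1][0])
        # Never shorten a cue; only extend it.
        padded.append((start, max(new_end, end), text))
    return padded


if __name__ == "__main__":
    demo = [(0.0, 1.0, "first"), (1.1, 2.0, "second"), (3.0, 4.0, "third")]
    for start, end, text in pad_cue_ends(demo):
        print(f"{start:.2f} -> {end:.2f}  {text}")
```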

## FFmpeg Burn-in Commands

### Simple SRT
```bash
ffmpeg -i input.mp4 -vf "subtitles=subs.srt:force_style='FontSize=22,Bold=1,PrimaryColour=&H00FFFFFF,OutlineColour=&H00000000,Outline=2'" -c:v libx264 -crf 18 -c:a copy output.mp4
```

### ASS with Custom Styling
```bash
ffmpeg -i input.mp4 -vf "ass=styled_subs.ass" -c:v libx264 -crf 18 -c:a copy output.mp4
```

### Windows Path Escaping
```bash
# Escape colons in subtitle filter paths on Windows
ffmpeg -i input.mp4 -vf "subtitles=C\\:/Users/path/subs.srt" output.mp4
```

## RTL (Hebrew/Arabic): Proven Patterns

RTL subtitles are tricky. These patterns are battle-tested in Kolbo's video production pipeline.

**Reference implementations (bundled in `./reference/`):**
- `reference/burn_to_video.py`: Full burn pipeline with RTL progress bar (`geq` filter), chapter compositing, NVENC encoding
- `reference/export_srts.py`: SRT generation with chapter divider offset accounting
- `reference/gen_srt.py`: Word-level SRT from transcript JSON (8-word grouping, 1.5s gap detection)

### Option 1: SRT with Simple Burn-in (easiest, works for most cases)

Plain SRT files work for Hebrew/Arabic if you use the right font and let FFmpeg's libass handle bidi:
```bash
ffmpeg -i input.mp4 -vf "subtitles=subs.srt:force_style='FontName=Heebo,FontSize=22,Bold=1,Encoding=177,PrimaryColour=&H00FFFFFF,OutlineColour=&H00000000,Outline=2'" -c:v libx264 -crf 18 -c:a copy output.mp4
```
- **Font**: Heebo Bold for Hebrew, Cairo Bold for Arabic
- **Encoding=177** (Hebrew) or **Encoding=178** (Arabic) in ASS style

### Option 2: ASS with Per-Word Positioning (for karaoke/highlighting)

When you need per-word color highlighting with RTL text, you MUST use separate ASS Dialogue lines per word:

- Each word gets its own `Dialogue` line with explicit `\pos(x,y)`
- Use PIL to measure word widths: apply `~0.74` scale factor (PIL→libass calibration)
- Use `Alignment=7` (top-left anchor) so `\pos` sets exact top-left of each word
- Two named ASS styles (e.g., White + Yellow) for highlight vs inactive, with NO inline `\c` tags

**CRITICAL:** Any inline ASS tag (`\c`, `\K`, `\1c`) between RTL words **breaks Unicode bidi in libass**, so words render LTR instead of RTL. Always use separate Dialogue lines per word.
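
The per-word approach can be sketched as follows. This is an illustrative sketch under stated assumptions, not Kolbo's implementation: word widths are assumed to be measured upstream (for example with PIL and the ~0.74 scale factor) and passed in as pixels, the style names `White` and `Yellow` are assumed to exist in the ASS header with `Alignment=7`, and the function names and the `gap` parameter are hypothetical.

```python
def ass_time(seconds):
    """Format seconds as the ASS timestamp H:MM:SS.cc (centiseconds)."""
    cs = int(round(seconds * 100))
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def rtl_word_dialogues(words, right_edge, y, gap=12,
                       active="Yellow", inactive="White"):
    """Build one Dialogue line per word and highlight state.

    `words` is a list of dicts with text, start, end, and width keys (assumed
    shape). The first word sits at the right edge and later words flow
    leftwards, which is the RTL reading order.
    """
    cue_start, cue_end = words[0]["start"], words[-1]["end"]
    lines = []
    x = right_edge
    for w in words:
        x -= w["width"]  # \pos anchors the word's top-left corner
        pos = f"{{\\pos({round(x)},{round(y)})}}"
        spans = [(cue_start, w["start"], inactive),  # before it is spoken
                 (w["start"], w["end"], active),      # while it is spoken
                 (w["end"], cue_end, inactive)]       # after it is spoken
        for start, end, style in spans:
            if end > start:  # skip empty spans
                lines.append(
                    f"Dialogue: 0,{ass_time(start)},{ass_time(end)},"
                    f"{style},,0,0,0,,{pos}{w['text']}")
        x -= gap
    return lines
```

Each emitted line holds a single word with only a leading `\pos` tag, so no override tag ever sits between two RTL words; the highlight is expressed purely by switching the named style.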

### Option 3: Remotion Captions (best for karaoke, full RTL control)

Remotion gives you full CSS control over RTL text. Proven pattern from Kolbo's video pipeline:

```tsx
// Detect language and set direction
const isHebrew = language === "he" || language === "iw";
const fontFamily = isHebrew ? "'Heebo', sans-serif" : "'Poppins', sans-serif";

// Root container
<div style={{
  direction: isHebrew ? "rtl" : "ltr",
  fontFamily,
  textTransform: isHebrew ? "none" : "uppercase",
  letterSpacing: isHebrew ? 0 : -2,
}}>
  {words.map((word, i) => {
    const progress = interpolate(frame, [word.startFrame, word.endFrame], [0, 1], {
      extrapolateLeft: "clamp", extrapolateRight: "clamp"
    });
    return (
      <span key={i} style={{
        color: progress > 0 ? accentColor : "#ffffff",
        transition: "none",  // No CSS transitions in Remotion!
      }}>
        {word.text}{" "}
      </span>
    );
  })}
</div>
```

**RTL-specific gotchas in Remotion (proven fixes):**
- Flip `paddingLeft` ↔ `paddingRight` for Hebrew
- Flip `transformOrigin`: `"top left"` → `"top right"` for Hebrew
- Gradient directions: `270deg` (RTL) vs `90deg` (LTR)
- Position logic: for Hebrew, "left" position actually means right side of screen
- `letterSpacing: 0` for Hebrew (negative kerning looks wrong with Hebrew fonts)
- `textTransform: "none"` for Hebrew (uppercase has no meaning in Hebrew)

### RTL Progress Bar (FFmpeg)

Animated progress bar that fills right-to-left for Hebrew, using the `geq` filter:

```python
duration = 5.0  # seconds
is_rtl = True   # example value; set from the detected language

if is_rtl:
    # Hebrew (RTL): bar fills RIGHT → LEFT
    bar_cond = f"gt(X,W*(1-T/{duration}))"
else:
    # English (LTR): bar fills LEFT → RIGHT
    bar_cond = f"lt(X,W*T/{duration})"

# Apply as geq filter on bottom 4px strip (performant: 5760px/frame not 2M)
bar_geq = (
    f"geq="
    f"r='if({bar_cond},59,r(X,Y))':"  # #3b82f6 blue
    f"g='if({bar_cond},130,g(X,Y))':"
    f"b='if({bar_cond},246,b(X,Y))'"
)
```
`geq` uses capital `T` for the timestamp, which avoids a conflict with drawbox's `t=fill`.

### Language Detection

```python
# Example input; in practice this is the language tag reported for the transcript
raw_lang = "heb"

# Normalize three-letter and legacy tags to two-letter codes
_lang_map = {"heb": "he", "eng": "en", "iw": "he", "ara": "ar", "rus": "ru"}
language_code = _lang_map.get(raw_lang, raw_lang)
is_rtl = language_code in ("he", "ar", "fa", "ur")
```

## Word-Level Timing (Karaoke / Motion Graphics)

For word-by-word highlighting:
1. `transcribe_audio` via Kolbo MCP → get `word_by_word_srt_url` (ElevenLabs Scribe word-level timestamps)
2. Each word has precise start/end timing
3. Group words into display cues (8+ words or >1.5s gap triggers new line)
4. **For Remotion**: use word timings directly as props; CSS `direction: rtl` handles Hebrew ordering automatically
5. **For FFmpeg**: use ASS with per-word Dialogue lines (see Option 2 above)
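
The grouping rule in step 3 can be sketched as below. This is an independent illustration, not the bundled `reference/gen_srt.py`: the `(text, start, end)` tuple shape and all function names are assumptions, and "8+ words" is read as "start a new cue once the current one holds 8 words".

```python
def srt_time(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def group_words(words, max_words=8, max_gap=1.5):
    """Group (text, start, end) word tuples into display cues."""
    cues, current = [], []
    for word in words:
        gap_too_long = bool(current) and word[1] - current[-1][2] > max_gap
        if current and (len(current) >= max_words or gap_too_long):
            cues.append(current)  # close the cue and start a new one
            current = []
        current.append(word)
    if current:
        cues.append(current)
    return cues


def to_srt(cues):
    """Render grouped cues as SRT text, numbering cues from 1."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w[0] for w in cue)
        blocks.append(
            f"{index}\n{srt_time(cue[0][1])} --> {srt_time(cue[-1][2])}\n{text}\n")
    return "\n".join(blocks)
```

For vertical video, passing a smaller `max_words` (for example 3 or 4) keeps the cues within the per-format limits listed earlier.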

## Quality Checklist

- [ ] Every spoken word appears in a subtitle cue
- [ ] No cue exceeds the character limit for target format
- [ ] Subtitles in bottom 20% of frame, never covering the face
- [ ] Text readable on mobile at native resolution
- [ ] Timing matches speech, with no early or late cues
- [ ] Cues don't overlap each other
- [ ] Outline/shadow provides sufficient contrast against all backgrounds
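
Two of these checks (character limits and overlapping cues) are mechanical and can be automated. The sketch below assumes cues are `(start, end, text)` tuples sorted by start time; the function name and message wording are hypothetical. The remaining checklist items still need a visual review.

```python
def check_cues(cues, max_chars_per_line=42):
    """Return a list of human-readable problems found in the cues.

    `cues` is a list of (start, end, text) tuples sorted by start time
    (an assumed shape). Use max_chars_per_line=20 for vertical video.
    """
    problems = []
    for i, (start, end, text) in enumerate(cues, start=1):
        if end <= start:
            problems.append(f"cue {i}: end is not after start")
        for line in text.splitlines():
            if len(line) > max_chars_per_line:
                problems.append(
                    f"cue {i}: line has {len(line)} chars (limit {max_chars_per_line})")
        # i is 1-based, so cues[i] is the cue that follows the current one.
        if i < len(cues) and end > cues[i][0]:
            problems.append(f"cue {i}: overlaps cue {i + 1}")
    return problems


if __name__ == "__main__":
    for problem in check_cues([(0, 2, "ok"), (1.5, 3, "x" * 50)]):
        print(problem)
```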

---

## Kolbo MCP Integration

| Task | Kolbo MCP Tool | Notes |
|------|---------------|-------|
| Transcribe → SRT | `transcribe_audio` | Returns `srt_url` (grouped) + `word_by_word_srt_url` |
| Word-level captions | `transcribe_audio` | `word_by_word_srt_url` for karaoke/Remotion |
| Burn-in to video | FFmpeg | Use SRT from transcription |
| Visual analysis | `chat_send_message` + Gemini | Analyze where speaker's face is for caption placement |

**Subtitle production workflow:**
1. `transcribe_audio` → get `srt_url` and `word_by_word_srt_url`
2. Download the SRT file
3. Adjust styling for target format (vertical vs horizontal)
4. Burn in with FFmpeg using the commands above
5. For Remotion: use `word_by_word_srt_url` with CaptionOverlay component

**For Remotion captions (preferred over FFmpeg burn-in):**
- Load the word-by-word SRT
- Use Remotion's CaptionOverlay for animated word highlighting
- See `remotion-best-practices` skill for details

---

## Local / Free Option

> **IMPORTANT:** Always use Kolbo's `transcribe_audio` by default, since it returns both grouped SRT and word-by-word SRT with no setup. FFmpeg burn-in is safe to use directly. Only suggest local transcription if the user explicitly asks for offline/free. Confirm before installing.

**Offline transcription:** `faster-whisper` runs on CPU, no GPU needed (`pip install faster-whisper`). Supports word-level timestamps for subtitle generation.
@@ -0,0 +1,222 @@
"""
Full burn pipeline:
  For each chapter:
    1. Render SectionDivider card (4s) + mux with SFX
    2. Cut raw footage segment
    3. Render ChapterBanner still PNG (single frame)
    4. Composite banner + animated progress bar onto footage
  Concatenate everything → final burned MP4
"""
import json, os, sys, subprocess

_root = r"G:\Projects\Master Agent"
for _p in [os.path.join(_root, 'core'), os.path.join(_root, 'agents', 'content-creation')]:
    if _p not in sys.path:
        sys.path.insert(0, _p)

import config
sys.path.insert(0, os.path.join(_root, 'agents', 'content-creation', 'modules'))
from remotion_render import render as remotion_render, render_still

# ── Config ───────────────────────────────────────────────────────────────────
SOURCE_VIDEO = r"C:\Users\Zohar\Downloads\מכללת ספיר H.264.mp4"
CHAPTERS_JSON = r"G:\Projects\Master Agent\ytp_jobs\sapir_test\chapters.json"
SFX_FILE = r"G:\Projects\Master Agent\ytp_jobs\sapir_test\sfx\v1_cinematic_eq.mp3"
WORK_DIR = r"G:\Projects\Master Agent\ytp_jobs\sapir_test\burn"
FINAL_OUTPUT = r"G:\Projects\Youtube Editings\renders\sapir_edited_final.mp4"
VIDEO_DURATION = 1041.1
FPS = 30
FFMPEG = "ffmpeg"
NVENC = True  # -bf 0 -rc-lookahead 0 eliminates encoder delay

os.makedirs(WORK_DIR, exist_ok=True)

# ── Load chapters ─────────────────────────────────────────────────────────────
with open(CHAPTERS_JSON, encoding='utf-8') as f:
    chapters = json.load(f)

# Each chapter ends where the next one starts; the last one ends with the video.
for i, ch in enumerate(chapters):
    ch['end_time'] = chapters[i + 1]['start_time'] if i + 1 < len(chapters) else VIDEO_DURATION
    ch['duration'] = ch['end_time'] - ch['start_time']

# ── Helpers ───────────────────────────────────────────────────────────────────
def _venc(cq=19):
    """Return video encoder args: NVENC (GPU, no delay) or libx264 fallback."""
    if NVENC:
        # -bf 0 -rc-lookahead 0: zero encoder delay → no A/V drift with -c:a copy
        return ["-c:v", "h264_nvenc", "-preset", "p4", "-cq", str(cq),
                "-bf", "0", "-rc-lookahead", "0"]
    return ["-c:v", "libx264", "-preset", "fast", "-crf", str(cq)]


def run(cmd, desc=""):
    print(f"[ffmpeg] {desc}", flush=True)
    r = subprocess.run(cmd, capture_output=True)
    if r.returncode != 0:
        raise RuntimeError(f"FAILED {desc}:\n{r.stderr.decode('utf-8','replace')[-600:]}")


def cut_footage(start, end, output):
    if os.path.exists(output):
        print(f"[skip] {os.path.basename(output)} exists")
        return
    duration = end - start
    # Dual-seek: fast input seek to 5s before target, then frame-accurate output seek
    # This gives exact A/V sync without decoding the full file from the beginning.
    pre = min(5.0, start)
    run([FFMPEG, "-y",
         "-ss", str(start - pre), "-i", SOURCE_VIDEO,
         "-ss", str(pre), "-t", str(duration),
         *_venc(),
         "-c:a", "copy",
         "-movflags", "+faststart", output],
        f"cut footage {start:.1f}s + {duration:.1f}s")


def mux_sfx(video, sfx, output, video_duration=4.0):
    if os.path.exists(output):
        print(f"[skip] {os.path.basename(output)} exists")
        return
    # Resample SFX to 48kHz to match source video, pad, then mux
    run([FFMPEG, "-y",
         "-i", video, "-i", sfx,
         "-filter_complex", f"[1:a]aresample=48000,apad=pad_dur={video_duration}[a]",
         "-map", "0:v", "-map", "[a]",
         *_venc(cq=12), "-c:a", "aac", "-b:a", "192k", "-ar", "48000",
         "-t", str(video_duration), output],
        f"mux SFX into {os.path.basename(video)}")


def composite_with_banner(footage, banner_png, output, duration_sec, is_hebrew=True):
    """Composite static banner PNG + animated progress bar.
    Uses crop+geq+overlay on just the bottom 4 rows, which is fast (5760 px/frame not 2M).
    geq uses capital T for timestamp, avoiding conflict with drawbox's t=fill.
    """
    if os.path.exists(output):
        print(f"[skip] {os.path.basename(output)} exists")
        return
    d = float(duration_sec)
    # geq on a 4px strip: T=timestamp(secs), W=strip width, X=pixel x-coord
    # Hebrew RTL: fill right side first → X > W*(1 - T/D)
    # LTR: fill left side first → X < W*T/D
    if is_hebrew:
        bar_cond = f"gt(X,W*(1-T/{d}))"
    else:
        bar_cond = f"lt(X,W*T/{d})"

    bar_geq = (
        f"geq="
        f"r='if({bar_cond},59,r(X,Y))':"
        f"g='if({bar_cond},130,g(X,Y))':"
        f"b='if({bar_cond},246,b(X,Y))'"
    )

    # overlay=0:0 composites banner PNG onto footage
    # split → crop bottom 4px → geq colors the bar → overlay back at bottom
    fc = (
        f"[0:v][1:v]overlay=0:0:format=auto,format=yuv420p[base];"
        f"[base]split[main][bot_src];"
        f"[bot_src]crop=iw:4:0:ih-4[strip];"
        f"[strip]{bar_geq}[bar];"
        f"[main][bar]overlay=0:H-4[v]"
    )

    run([FFMPEG, "-y",
         "-i", footage,
         "-i", banner_png,
         "-filter_complex", fc,
         "-map", "[v]", "-map", "0:a",
         *_venc(),
         "-c:a", "copy",
         "-movflags", "+faststart", output],
        f"composite+bar {os.path.basename(output)}")


# ── Main loop ─────────────────────────────────────────────────────────────────
parts = []
total = len(chapters)

for ch in chapters:
    n = ch['chapter_number']
    dur = ch['duration']
    dur_frames = int(round(dur * FPS))

    print(f"\n{'='*60}")
    title_safe = ch['title'].encode('ascii','replace').decode('ascii')
    print(f"Chapter {n}/{total}: {title_safe} ({dur:.1f}s = {dur_frames} frames)")
    print('='*60)

    # ── 1. SectionDivider render ──────────────────────────────────────────
    divider_raw = os.path.join(WORK_DIR, f"ch{n:02d}_divider_raw.mp4")
    divider_sfx = os.path.join(WORK_DIR, f"ch{n:02d}_divider.mp4")

    if not os.path.exists(divider_raw):
        print(f"[render] SectionDivider ch{n}...")
        remotion_render(
            composition_id="SectionDivider-16x9",
            props={
                "chapterNumber": n,
                "title": ch['title'],
                "subtitle": ch.get('subtitle', ''),
                "language": "he",
                "durationInFrames": 120,
                "fps": FPS,
            },
            output_path=divider_raw,
            job_dir=WORK_DIR,
            alpha=False,
            concurrency=16,
        )
    else:
        print(f"[skip] divider ch{n} exists")

    mux_sfx(divider_raw, SFX_FILE, divider_sfx, video_duration=4.0)
    parts.append(divider_sfx)

    # ── 2. Cut raw footage ────────────────────────────────────────────────
    raw_clip = os.path.join(WORK_DIR, f"ch{n:02d}_raw.mp4")
    cut_footage(ch['start_time'], ch['end_time'], raw_clip)

    # ── 3. ChapterBanner still PNG render (single frame, fast) ───────────
    banner_png = os.path.join(WORK_DIR, f"ch{n:02d}_banner.png")

    if not os.path.exists(banner_png):
        print(f"[still] ChapterBanner ch{n}...")
        render_still(
            composition_id="ChapterBanner-16x9",
            props={
                "chapterNumber": n,
                "title": ch['title'],
                "language": "he",
            },
            output_path=banner_png,
            job_dir=WORK_DIR,
        )
    else:
        print(f"[skip] banner ch{n} exists")

    # ── 4. Composite banner + progress bar onto footage ───────────────────
    composited = os.path.join(WORK_DIR, f"ch{n:02d}_composited.mp4")
    composite_with_banner(raw_clip, banner_png, composited, dur, is_hebrew=True)
    parts.append(composited)

# ── Concatenate all parts ─────────────────────────────────────────────────────
print(f"\n{'='*60}")
print(f"Concatenating {len(parts)} clips...")

concat_list = os.path.join(WORK_DIR, "concat.txt")
with open(concat_list, 'w', encoding='utf-8') as f:
    for p in parts:
        f.write(f"file '{p}'\n")

run([FFMPEG, "-y",
     "-f", "concat", "-safe", "0",
     "-i", concat_list,
     *_venc(),
     "-c:a", "copy",
     "-movflags", "+faststart", FINAL_OUTPUT],
    f"final concat -> {FINAL_OUTPUT}")

size_mb = os.path.getsize(FINAL_OUTPUT) / 1024 / 1024
print(f"\nDone! Final output: {FINAL_OUTPUT}")
print(f"Size: {size_mb:.0f} MB")