videopilot-0.1.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- videopilot-0.1.0/AGENT.md +380 -0
- videopilot-0.1.0/LICENSE +21 -0
- videopilot-0.1.0/MANIFEST.in +10 -0
- videopilot-0.1.0/PKG-INFO +285 -0
- videopilot-0.1.0/README.md +248 -0
- videopilot-0.1.0/examples/highlight-cut/compose-plan.json +12 -0
- videopilot-0.1.0/examples/highlight-cut/cut-plan.json +25 -0
- videopilot-0.1.0/examples/narrated-highlights/compose-plan.json +50 -0
- videopilot-0.1.0/examples/narrated-highlights/cut-plan.json +7 -0
- videopilot-0.1.0/examples/narrated-highlights/script.json +26 -0
- videopilot-0.1.0/examples/voiceover-only/compose-plan.json +35 -0
- videopilot-0.1.0/examples/voiceover-only/script.json +27 -0
- videopilot-0.1.0/lib/__init__.py +1 -0
- videopilot-0.1.0/lib/compose.py +450 -0
- videopilot-0.1.0/lib/cut.py +121 -0
- videopilot-0.1.0/lib/doctor.py +98 -0
- videopilot-0.1.0/lib/export.py +404 -0
- videopilot-0.1.0/lib/ffmpeg_wrap.py +164 -0
- videopilot-0.1.0/lib/init_cmd.py +163 -0
- videopilot-0.1.0/lib/silence.py +96 -0
- videopilot-0.1.0/lib/transcribe.py +116 -0
- videopilot-0.1.0/lib/tts.py +172 -0
- videopilot-0.1.0/lib/voices.py +78 -0
- videopilot-0.1.0/pyproject.toml +75 -0
- videopilot-0.1.0/setup.cfg +4 -0
- videopilot-0.1.0/videopilot.egg-info/PKG-INFO +285 -0
- videopilot-0.1.0/videopilot.egg-info/SOURCES.txt +32 -0
- videopilot-0.1.0/videopilot.egg-info/dependency_links.txt +1 -0
- videopilot-0.1.0/videopilot.egg-info/entry_points.txt +3 -0
- videopilot-0.1.0/videopilot.egg-info/requires.txt +8 -0
- videopilot-0.1.0/videopilot.egg-info/top_level.txt +4 -0
- videopilot-0.1.0/videopilot.py +166 -0
- videopilot-0.1.0/videopilot_cli.py +20 -0
- videopilot-0.1.0/videopilot_mcp.py +478 -0
videopilot-0.1.0/AGENT.md
ADDED
@@ -0,0 +1,380 @@

# video-creator — Agent Runbook

> **You are the calling LLM (GitHub Copilot CLI, Claude, etc.).** This document tells
> you how to drive `videopilot.py` from a user's natural-language request to produce
> a finished video. Read it in full before invoking the CLI.

## Mental model

`videopilot.py` is a **pure executor**. It does mechanical things (run ffmpeg, run
Whisper, run Edge TTS) and reads/writes JSON state files. **You** (the LLM) do the
creative + planning work:

- You write the voiceover script (`script.json`).
- You pick which sections to keep from a long video (`cut-plan.json`) — either by
  reading a transcript or by translating the user's explicit trim instructions.
- You assemble the final timeline (`compose-plan.json`) — which clip plays when,
  which voiceover overlays it, where slides go, where background music drops.
- Then you call `videopilot.py` subcommands to make it happen.

The JSON state files in `projects/<slug>/` are the **contract** between you and the
CLI. The CLI never invents content — it does exactly what the JSON says.

## Invocation

`videopilot` is installed as a console script — it works from **any** directory:

```powershell
videopilot <subcommand> <args>
```

If the command isn't found in a fresh shell, refresh PATH first:

```powershell
$env:Path = [System.Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path","User")
```

The CLI auto-detects ffmpeg in the WinGet `Gyan.FFmpeg` install location, so no manual ffmpeg PATH dance is needed.

> **Legacy invocation** (still works): `python C:\Work\tools\video-creator\videopilot.py <subcommand>`. The console-script form is preferred — every example below assumes it.

## First-time setup

Before doing any real work in a fresh environment, run:

```powershell
videopilot doctor
```

This reports missing prerequisites. If ffmpeg is missing, install with:

```powershell
winget install --id Gyan.FFmpeg -e
```

If Python packages are missing or the `videopilot` command itself is missing:

```powershell
cd C:\Work\tools\video-creator
pip install -e .
```

Re-run `doctor` until it shows all green.

> **`faster-whisper` (used by `transcribe`) is heavy.** On first run it downloads a
> ~150MB–1.5GB model depending on model size. Default is `base` (~150MB). Skip
> `transcribe` if the user only wants explicit-timestamp trims or pure voiceover
> output — you don't need it for those flows.

## State files — the contracts

Every project lives at `projects/<slug>/`. These JSON files are what you author.
**Always pretty-print with 2-space indent** so they remain human-editable.

### `project.json` — overall state (created by `init`, you may extend)

```json
{
  "name": "Q4 Demo",
  "slug": "q4-demo",
  "created_at": "2026-05-15T14:30:00Z",
  "sources": [
    { "id": "raw1", "path": "sources/raw-screencast.mp4", "duration_sec": 1834.5 }
  ]
}
```

`sources[].id` is what other state files reference. Source files are copied (or
symlinked on systems that support it) into `sources/` by `init` and `import`.
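
For instance, adding a second B-roll source to an existing project (the path is
illustrative; the flags match the subcommand reference below):

```powershell
videopilot import q4-demo "C:\videos\b-roll.mp4" --id raw2
```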

### `script.json` — voiceover script (you write this)

```json
{
  "voice_defaults": {
    "voice": "en-US-AndrewMultilingualNeural",
    "rate": "+0%",
    "pitch": "+0Hz",
    "engine": "edge-tts"
  },
  "segments": [
    {
      "id": "vo-intro",
      "text": "Welcome to our quarterly review of the product launch.",
      "voice": "en-US-AvaMultilingualNeural",
      "rate": "+5%",
      "pause_after_ms": 500
    },
    {
      "id": "vo-closer",
      "text": "Thanks for watching."
    }
  ]
}
```

- `engine`: `edge-tts` (default, free, no key) or `azure` (requires
  `AZURE_SPEECH_KEY` + `AZURE_SPEECH_REGION` env vars).
- `voice`: any Microsoft Neural voice short name. List them with
  `videopilot voices`.
- `rate`: SSML rate like `-10%`, `+0%`, `+25%`.
- `pitch`: SSML pitch like `-5Hz`, `+0Hz`, `+10Hz`.
- `text` may contain SSML if you wrap it in `<speak>...</speak>` — the CLI
  passes raw SSML through when it detects a top-level `<speak>` tag (see the
  sketch after this list).
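
For example, a segment that needs a mid-sentence pause can carry raw SSML. A
hedged sketch (`<break>` is standard SSML; whether a given voice honors every
tag depends on the engine):

```json
{
  "id": "vo-pause-demo",
  "text": "<speak>First point.<break time=\"700ms\"/>Second point.</speak>"
}
```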

`tts` writes `voice/<id>.mp3` for each segment + a `voice/manifest.json` with
durations (seconds, float) — read it when you author `compose-plan.json`.

### `cut-plan.json` — which sections of which sources to keep (you write this)

```json
{
  "clips": [
    {
      "id": "c-hook",
      "source": "raw1",
      "start": 12.3,
      "end": 28.5,
      "label": "the moment they show the wireframe"
    },
    {
      "id": "c-key",
      "source": "raw1",
      "start": 145.2,
      "end": 167.0,
      "label": "Sarah's key insight about onboarding"
    }
  ]
}
```

Times are in seconds (float). `cut` writes `clips/<id>.mp4` and a
`clips/manifest.json` with verified durations.

If the user gives explicit timestamps, translate directly. If they say "pick
the highlights," run `transcribe` first, read the transcript, and choose
spans yourself. **You** decide what's important — the CLI has no opinion.

### `compose-plan.json` — timeline assembly (you write this)

```json
{
  "output": {
    "filename": "final.mp4",
    "resolution": "1920x1080",
    "fps": 30,
    "video_bitrate": "8M",
    "audio_bitrate": "192k",
    "video_codec": "libx264",
    "audio_codec": "aac"
  },
  "timeline": [
    {
      "type": "slide",
      "duration_sec": 4.0,
      "background_color": "#0b132b",
      "title": "Q4 Product Review",
      "subtitle": "May 2026",
      "voiceover": "vo-intro"
    },
    {
      "type": "clip",
      "clip": "c-hook",
      "voiceover": null
    },
    {
      "type": "clip",
      "clip": "c-key",
      "voiceover": "vo-key-insight",
      "duck_source_db": -18,
      "pad_to_voiceover": true
    },
    {
      "type": "slide",
      "duration_sec": 3.0,
      "background_image": "sources/logo.png",
      "voiceover": "vo-closer"
    }
  ],
  "background_music": {
    "path": "sources/music.mp3",
    "volume_db": -22,
    "fade_in_sec": 1.0,
    "fade_out_sec": 2.0
  }
}
```

Timeline item types:

**`clip`** — plays a cut clip. Fields:

- `clip` (required): id from `cut-plan.json`
- `voiceover` (optional): id from `script.json`; mixed on top of clip audio
- `duck_source_db` (optional, default −15 when VO present, 0 otherwise): how
  much to attenuate the clip's original audio under the voiceover
- `pad_to_voiceover` (optional, default true): if VO is longer than the clip,
  freeze-frame the last frame to match; if VO is shorter than the clip, the
  clip plays out and the VO ends early (use a longer clip or split the VO if
  you don't want that)

**`slide`** — a static title card. Fields:

- `duration_sec` (required UNLESS `voiceover` is set; then VO duration +
  optional `pad_after_sec` determines it)
- `background_color` (optional, hex; default `#000000`)
- `background_image` (optional, path relative to project dir; if both color
  and image are set, image wins)
- `title` (optional): large heading text overlay
- `subtitle` (optional): smaller text below title
- `voiceover` (optional): id from `script.json`

**`gap`** — a fixed silent black pause (useful for breathing room):

- `duration_sec` (required)
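
For example, a one-second breather between two clips is a single timeline entry:

```json
{ "type": "gap", "duration_sec": 1.0 }
```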

`background_music` is mixed under the entire final video at the given volume,
with optional fades.

### Output files

After `compose`, `out/` contains:

- `final.mp4` — the rendered video
- `final.fcpxml` (if `--fcpxml` passed) — Final Cut XML, imports into Final Cut
  Pro, Premiere Pro, DaVinci Resolve
- `final.edl` (if `--edl` passed) — CMX 3600 EDL, broadly supported
- `render.ps1` (if `--script` passed) — a replayable PowerShell script that
  reproduces the render with raw ffmpeg commands; the user can hand-tweak it

## Workflow recipes

### Recipe 1 — "Make a 60-second voiceover video from this script"

User has a brief, no source video. Output: MP4 of slides + voiceover.

1. `videopilot init <slug>` — creates `projects/<slug>/`
2. **You write** `projects/<slug>/script.json` — one segment per beat
3. `videopilot tts <slug>` — synth voice MP3s, populates `voice/manifest.json`
4. Read `voice/manifest.json` to learn segment durations
5. **You write** `projects/<slug>/compose-plan.json` — slides keyed to VO ids
   (see the sketch below)
6. `videopilot compose <slug>` — renders `out/final.mp4`
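
A minimal `compose-plan.json` for this recipe could look like the sketch below,
assuming two script segments with ids `vo-intro` and `vo-closer`. Per the schema
above, slides that set `voiceover` and omit `duration_sec` take their length
from the VO:

```json
{
  "output": { "filename": "final.mp4", "resolution": "1920x1080", "fps": 30 },
  "timeline": [
    { "type": "slide", "title": "The Brief", "voiceover": "vo-intro" },
    { "type": "slide", "title": "Thanks!", "voiceover": "vo-closer" }
  ]
}
```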

### Recipe 2 — "Cut this long video down to highlights, no narration"

User has one source video, wants a tight cut. No voiceover.

1. `videopilot init <slug> --source <path>` — copies source into project
2. `videopilot transcribe <slug> raw1` — emits `transcripts/raw1.json`
   with word-level timestamps and `transcripts/raw1.srt`
3. **You read** the transcript and decide which spans to keep
4. **You write** `cut-plan.json` with chosen spans
5. **You write** a minimal `compose-plan.json` — one timeline entry per clip,
   no voiceover (see the sketch below)
6. `videopilot cut <slug>` — emits `clips/<id>.mp4`
7. `videopilot compose <slug>` — renders `out/final.mp4`
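
For step 5, the back-to-back timeline is just one `clip` entry per cut. A sketch
assuming the clip ids `c-hook` and `c-key` from the cut-plan example:

```json
{
  "output": { "filename": "final.mp4", "resolution": "1920x1080", "fps": 30 },
  "timeline": [
    { "type": "clip", "clip": "c-hook" },
    { "type": "clip", "clip": "c-key" }
  ]
}
```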

### Recipe 3 — "Cut these specific timestamps from this video"

User gives explicit "keep 0:30–1:15, 3:00–4:20" instructions. No need to transcribe.

1. `init` with source
2. **You write** `cut-plan.json` translating the user's timestamps directly
   (see the sketch below)
3. **You write** trivial `compose-plan.json` (each clip back-to-back)
4. `cut`
5. `compose`
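
Translating "keep 0:30–1:15, 3:00–4:20" is just minutes × 60 + seconds:
0:30 → 30.0, 1:15 → 75.0, 3:00 → 180.0, 4:20 → 260.0. A sketch of the resulting
cut-plan (ids and labels are illustrative):

```json
{
  "clips": [
    { "id": "c-1", "source": "raw1", "start": 30.0, "end": 75.0, "label": "keep 0:30-1:15" },
    { "id": "c-2", "source": "raw1", "start": 180.0, "end": 260.0, "label": "keep 3:00-4:20" }
  ]
}
```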

### Recipe 4 — "Take this raw recording, cut the highlights, narrate over them"

The full pipeline. Combines recipes 1 + 2.

1. `init` with source
2. `transcribe` the source
3. **You** pick highlights → write `cut-plan.json`; **you** write a narration
   that complements the clips → write `script.json`
4. `cut` and `tts` (order independent — can run in parallel via separate
   PowerShell sessions, but sequential is fine)
5. Read `voice/manifest.json` for VO durations; read `clips/manifest.json`
   for clip durations
6. **You write** `compose-plan.json` lining up VOs against clips with
   appropriate ducking
7. `compose`
8. `export --edl --fcpxml --script` if the user wants to keep editing in
   another NLE

### Recipe 5 — "Just trim the boring parts" (no AI)

User wants the long video minus dead air, no smart selection.

1. `init` with source
2. `videopilot silence <slug> raw1 --threshold-db -35 --min-silence-sec 1.5`
   — emits a candidate `cut-plan.json` of NON-silent spans
3. Optionally tweak the cut-plan
4. `cut`, `compose` as usual

## Subcommand reference

| Command | Purpose |
|---|---|
| `doctor` | Check prerequisites (ffmpeg, ffprobe, Python pkgs, optional Azure keys) |
| `voices [--locale en-US] [--engine edge-tts\|azure]` | List available TTS voices |
| `init <slug> [--source <path>...] [--name "Display Name"]` | Create a project |
| `import <slug> <path> [--id raw2]` | Add another source to an existing project |
| `tts <slug> [--only <id>...] [--force]` | Synthesize voiceover MP3s from `script.json` |
| `transcribe <slug> <source-id> [--model base\|small\|medium\|large-v3] [--language en]` | Transcribe a source with faster-whisper |
| `silence <slug> <source-id> [--threshold-db -35] [--min-silence-sec 1.0]` | Emit a cut-plan candidate of non-silent spans |
| `cut <slug> [--only <clip-id>...] [--force]` | Cut clips per `cut-plan.json` |
| `compose <slug>` | Render `out/final.mp4` per `compose-plan.json` |
| `export <slug> [--edl] [--fcpxml] [--script]` | Emit NLE/replay exports |

All commands accept `--quiet` and `--verbose`.

## Conventions you must follow

1. **Always run `doctor` first** in a new session if you don't know whether
   prerequisites are installed. Don't guess.
2. **Always pretty-print JSON state files with 2-space indent** and a trailing
   newline. The user reads these.
3. **Always preserve user-provided ids**. Never rename `clip` / `voiceover` /
   `source` ids without asking.
4. **Validate timing before you compose**: read `voice/manifest.json` and
   `clips/manifest.json` to confirm durations match what you wrote in
   `compose-plan.json`. If a VO is 12s and the clip is only 5s with
   `pad_to_voiceover: false`, warn the user (see the sketch after this list).
5. **Don't re-run `tts` for unchanged segments** — it's slow and the audio is
   deterministic. The CLI skips existing files by default unless `--force`.
6. **Don't re-run `cut` for unchanged clips** — same reason. `cut` is idempotent.
7. **Stop and ask** if the user's request is ambiguous about what to keep,
   what voice to use, what tone the script should take, etc. Don't fabricate.
8. **Show the user the script** before you run `tts` — VO audio is not free
   and the user often wants edits. Same for `cut-plan.json` before `cut`.
9. **Final render preview**: after `compose`, tell the user where `out/final.mp4`
   is and offer to open it (`Start-Process .\projects\<slug>\out\final.mp4`).
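
A minimal timing check for convention 4, in Python. This is a sketch, not part
of the CLI: the exact manifest shapes aren't documented here, so it assumes each
manifest maps ids to a float duration in seconds; adapt the two lookups to
whatever `voice/manifest.json` and `clips/manifest.json` actually contain:

```python
import json
from pathlib import Path

proj = Path("projects/q4-demo")  # hypothetical slug

# ASSUMPTION: manifests are {"<id>": <duration_sec>, ...}; adjust if not.
vo_dur = json.loads((proj / "voice" / "manifest.json").read_text())
clip_dur = json.loads((proj / "clips" / "manifest.json").read_text())
plan = json.loads((proj / "compose-plan.json").read_text())

for item in plan["timeline"]:
    vo = item.get("voiceover")
    if item["type"] != "clip" or not vo:
        continue
    # pad_to_voiceover defaults to true per the schema above
    if not item.get("pad_to_voiceover", True) and vo_dur[vo] > clip_dur[item["clip"]]:
        print(f"WARN: VO '{vo}' ({vo_dur[vo]:.1f}s) outruns clip "
              f"'{item['clip']}' ({clip_dur[item['clip']]:.1f}s)")
```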

## Why no Clipchamp export

Clipchamp's project format is proprietary and undocumented. Attempting to
generate one is unreliable and breaks across Clipchamp updates. Instead we
ship:

- **FCPXML** — imports into Premiere Pro, Final Cut Pro, DaVinci Resolve
- **EDL (CMX 3600)** — broadly supported, simple format
- **`render.ps1`** — a replayable ffmpeg script the user can hand-tweak and
  re-run

If the user really wants to edit in Clipchamp, the recommended workflow is:
import `final.mp4` plus the original `sources/*.mp4` and the `voice/*.mp3`
files into Clipchamp as separate tracks. The cut clips in `clips/` can also
be imported as pre-cut pieces.

## Failure modes and recovery

| Symptom | Cause | Fix |
|---|---|---|
| `doctor`: `ffmpeg not found` | Not on PATH | `winget install --id Gyan.FFmpeg -e`; restart shell |
| `doctor`: `edge_tts not installed` | Python deps missing | `pip install -r requirements.txt` |
| `tts`: HTTP errors / "no audio received" | Edge TTS service hiccup or network | Retry; check `https://speech.platform.bing.com` reachable; switch to `engine: azure` if persistent |
| `transcribe`: model download stalls | Slow network on first run | Pre-pull manually: `python -c "from faster_whisper import WhisperModel; WhisperModel('base')"` |
| `cut`: "stream copy failed" | Source has unusual codec / variable framerate | Pass `--reencode` to `cut` to force decode/encode |
| `compose`: timeline durations don't match expectation | `pad_to_voiceover` interaction with short clips | Re-check `clips/manifest.json` vs `voice/manifest.json`; adjust `cut-plan.json` |
| `compose`: audio glitch at clip boundaries | Concat demuxer with mismatched audio params | Already mitigated — `compose` re-encodes per-segment intermediates; if it persists, file a bug with the intermediates |

When in doubt, the intermediates in `tmp/` are kept after every `compose` run
(not auto-cleaned) so you can inspect what happened at each step.

videopilot-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@

MIT License

Copyright (c) 2026 Mazen Bahgat

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

videopilot-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,285 @@

Metadata-Version: 2.4
Name: videopilot
Version: 0.1.0
Summary: Agent-driven video creation toolkit: neural TTS voiceover, highlight cutting, timeline composition, and NLE export.
Author-email: Mazen Bahgat <mazenbahgat@microsoft.com>
License: MIT
Project-URL: Homepage, https://github.com/mbahgatTech/videopilot
Project-URL: Documentation, https://github.com/mbahgatTech/videopilot/blob/main/README.md
Project-URL: Agent Runbook, https://github.com/mbahgatTech/videopilot/blob/main/AGENT.md
Project-URL: Issues, https://github.com/mbahgatTech/videopilot/issues
Keywords: video,voiceover,tts,edge-tts,azure-speech,ffmpeg,screen-recording,highlight-reel,narration,fcpxml,edl,llm,agent
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: edge-tts>=7.0
Requires-Dist: faster-whisper>=1.0
Requires-Dist: azure-cognitiveservices-speech>=1.40
Requires-Dist: mcp>=1.0
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Dynamic: license-file

# videopilot

> Agent-driven video creation toolkit. Neural TTS voiceover, AI highlight cutting,
> timeline composition with slides and audio ducking, and NLE export — all driven
> by a calling LLM through a JSON state contract.

[PyPI](https://pypi.org/project/videopilot/)
[Python >=3.10](https://www.python.org)
[License: MIT](LICENSE)
[ffmpeg](https://ffmpeg.org)

`videopilot` is a Python CLI that turns raw screen recordings into narrated,
edited MP4s. The CLI does the **mechanical work** — ffmpeg, neural TTS,
faster-whisper transcription, timeline rendering. A calling **agent** (GitHub
Copilot CLI, Claude Code, Continue.dev, or any code-aware LLM) does the
**creative work** — writes the voiceover script, picks the highlight spans,
lays out the timeline — by reading the contract in [`AGENT.md`](AGENT.md) and
authoring small JSON state files.

You can also drive `videopilot` by hand. Each subcommand is independently usable.

```
source.mp4 -> script.json -> tts -> cut-plan.json -> cut -> compose-plan.json -> compose -> final.mp4
                                                                                 + EDL / FCPXML / replay script
```

## Highlights

| Capability | Engine |
|---|---|
| Neural voiceover, 400+ voices, 100+ locales | [Microsoft Edge TTS](https://github.com/rany2/edge-tts) (free, no key) |
| Premium neural voices | Azure Speech (optional, with key) |
| Word-level transcription | [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (local) |
| Silence trimming, scene cuts | ffmpeg |
| Title slides, picture-in-picture, audio ducking, music underlay | ffmpeg filter graph composer |
| MP4 render at any resolution / fps | ffmpeg |
| Hand-off to Premiere / Resolve / Final Cut | EDL (CMX 3600) + FCPXML export |
| Replayable render scripts | PowerShell / bash export |
| Agent-first design | JSON state-file contract documented in `AGENT.md` |

## Install

### From PyPI (recommended)

```
pip install --user videopilot
```

`videopilot` is a console script — after install it's on your `PATH`. Verify:

```
videopilot doctor
```

You also need **ffmpeg** on `PATH`:

| OS | Command |
|---|---|
| Windows | `winget install --id Gyan.FFmpeg -e` |
| macOS | `brew install ffmpeg` |
| Debian / Ubuntu | `sudo apt install ffmpeg` |
| Fedora | `sudo dnf install ffmpeg` |
| Arch | `sudo pacman -S ffmpeg` |

`videopilot doctor` exits 0 when ffmpeg, ffprobe, Python deps, and optional
Azure keys are all in order; otherwise it prints exactly what's missing.

### From source (development)

```
git clone https://github.com/mbahgatTech/videopilot.git
cd videopilot
pip install --user -e .
```

### Via the Agency plugin

If you use Copilot or Claude inside Microsoft and have access to the
[Agency Playground](https://github.com/agency-microsoft/playground)
marketplace, install the `videopilot` plugin and ask:

> set up videopilot

The plugin's `init` skill runs the same installer logic for you.

## Quick start

```
# 1. Create a project with a source video
videopilot init demo --source "/path/to/raw-recording.mp4"

# 2. Hand-author projects/demo/script.json (one segment per beat of narration),
#    OR have your agent draft it from AGENT.md.

# 3. Synthesize the voiceover
videopilot tts demo

# 4. (Optional) transcribe to help pick highlights
videopilot transcribe demo raw1

# 5. Hand-author projects/demo/cut-plan.json (which spans to keep)

# 6. Cut clips from sources
videopilot cut demo

# 7. Hand-author projects/demo/compose-plan.json (timeline + slides + ducking)

# 8. Render the final video
videopilot compose demo

# 9. Optional: emit NLE projects + replay script
videopilot export demo --edl --fcpxml --script
```

Final output: `projects/demo/out/final.mp4` plus optional `final.edl`,
`final.fcpxml`, and `render.ps1`.

## CLI reference

| Command | Purpose |
|---|---|
| `videopilot doctor` | Verify ffmpeg, ffprobe, Python deps, optional Azure keys. |
| `videopilot voices [--locale en-US]` | List available TTS voices. |
| `videopilot init <slug> [--source PATH]` | Create a new project with optional first source. |
| `videopilot import <slug> <path>` | Add another source to an existing project. |
| `videopilot tts <slug> [--force]` | Synthesize voiceover MP3s from `script.json`. |
| `videopilot transcribe <slug> <source-id>` | Run faster-whisper; emits word-level JSON + SRT. |
| `videopilot silence <slug> <source-id>` | Emit a cut-plan candidate that strips silence. |
| `videopilot cut <slug> [--force] [--reencode]` | Cut clips per `cut-plan.json`. |
| `videopilot compose <slug>` | Render final MP4 per `compose-plan.json`. |
| `videopilot export <slug> [--edl] [--fcpxml] [--script]` | Emit NLE projects + replayable ffmpeg script. |

Run `videopilot <command> --help` for per-command flags.

## Project layout

```
videopilot/
- AGENT.md           <- contract for calling LLMs (start here if you're driving the tool)
- README.md          <- this file
- LICENSE            <- MIT
- pyproject.toml
- videopilot.py      <- argparse router
- videopilot_cli.py  <- console-script shim
- lib/               <- implementation modules
  - tts.py
  - transcribe.py
  - silence.py
  - cut.py
  - compose.py
  - export.py
  - ffmpeg_wrap.py
  - voices.py
  - init_cmd.py
  - doctor.py
- examples/          <- copyable starter JSON state files
- projects/<slug>/   <- per-project workspace (one folder per video)
  - project.json
  - script.json
  - cut-plan.json
  - compose-plan.json
  - sources/
  - voice/
  - transcripts/
  - clips/
  - tmp/
  - out/
```

## Configuration

| Environment variable | Purpose |
|---|---|
| `AZURE_SPEECH_KEY` | Optional. Enables Azure Speech voices (premium neural TTS). |
| `AZURE_SPEECH_REGION` | Required when `AZURE_SPEECH_KEY` is set (e.g. `eastus`). |

Edge TTS is the default and requires no configuration.
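
To use Azure voices, set both variables before running `tts` (PowerShell shown;
the key value is a placeholder for your own Speech resource key):

```powershell
$env:AZURE_SPEECH_KEY = "<your-speech-resource-key>"
$env:AZURE_SPEECH_REGION = "eastus"
videopilot tts demo
```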

## Driving videopilot from an LLM

Read [`AGENT.md`](AGENT.md). It is the contract the calling LLM uses:

- the JSON schema for each state file (`project.json`, `script.json`,
  `cut-plan.json`, `compose-plan.json`);
- when to call which subcommand;
- conventions (2-space JSON, preserved ids, idempotent re-runs);
- common failure modes and recoveries.

The `videopilot` plugin in the Agency Playground packages this contract as a
Copilot/Claude skill so you can just say `set up videopilot` and `make a video
from <source>` instead of orchestrating by hand.

## Development

```
git clone https://github.com/mbahgatTech/videopilot.git
cd videopilot
pip install --user -e ".[dev]"

# Build the package
python -m build

# Validate the dist
python -m twine check dist/*

# Local smoke test
videopilot doctor
```

## Releasing

Releases are published to PyPI automatically when a `v*` tag is pushed.
The workflow uses [PyPI **Trusted Publishing** (OIDC)](https://docs.pypi.org/trusted-publishers/),
so **no API tokens are stored in the repo or in GitHub Secrets** — PyPI verifies
the GitHub OIDC token at publish time.

One-time setup (PyPI side, do this once before the first release):

1. Sign in to <https://pypi.org/>.
2. Account settings → Publishing → **Add a new pending publisher**:
   - PyPI project name: `videopilot`
   - Owner: `mbahgatTech`
   - Repository: `videopilot`
   - Workflow filename: `release.yml`
   - Environment name: `pypi`
3. On GitHub, repo Settings → Environments → **New environment** → `pypi`.
   Optionally add a required reviewer for an extra approval gate.

Cutting a release:

```
# bump pyproject.toml [project] version, e.g. 0.1.0 -> 0.2.0
git commit -am "release: 0.2.0"
git tag v0.2.0
git push origin main --tags
```

The `release` workflow then:

1. Builds sdist + wheel
2. Verifies tag matches `pyproject.toml` version
3. Runs `twine check`
4. Publishes to PyPI via OIDC
5. Creates a GitHub Release with the sdist + wheel attached

## License

MIT. See [`LICENSE`](LICENSE).