videopilot-0.1.0.tar.gz

Files changed (34)
  1. videopilot-0.1.0/AGENT.md +380 -0
  2. videopilot-0.1.0/LICENSE +21 -0
  3. videopilot-0.1.0/MANIFEST.in +10 -0
  4. videopilot-0.1.0/PKG-INFO +285 -0
  5. videopilot-0.1.0/README.md +248 -0
  6. videopilot-0.1.0/examples/highlight-cut/compose-plan.json +12 -0
  7. videopilot-0.1.0/examples/highlight-cut/cut-plan.json +25 -0
  8. videopilot-0.1.0/examples/narrated-highlights/compose-plan.json +50 -0
  9. videopilot-0.1.0/examples/narrated-highlights/cut-plan.json +7 -0
  10. videopilot-0.1.0/examples/narrated-highlights/script.json +26 -0
  11. videopilot-0.1.0/examples/voiceover-only/compose-plan.json +35 -0
  12. videopilot-0.1.0/examples/voiceover-only/script.json +27 -0
  13. videopilot-0.1.0/lib/__init__.py +1 -0
  14. videopilot-0.1.0/lib/compose.py +450 -0
  15. videopilot-0.1.0/lib/cut.py +121 -0
  16. videopilot-0.1.0/lib/doctor.py +98 -0
  17. videopilot-0.1.0/lib/export.py +404 -0
  18. videopilot-0.1.0/lib/ffmpeg_wrap.py +164 -0
  19. videopilot-0.1.0/lib/init_cmd.py +163 -0
  20. videopilot-0.1.0/lib/silence.py +96 -0
  21. videopilot-0.1.0/lib/transcribe.py +116 -0
  22. videopilot-0.1.0/lib/tts.py +172 -0
  23. videopilot-0.1.0/lib/voices.py +78 -0
  24. videopilot-0.1.0/pyproject.toml +75 -0
  25. videopilot-0.1.0/setup.cfg +4 -0
  26. videopilot-0.1.0/videopilot.egg-info/PKG-INFO +285 -0
  27. videopilot-0.1.0/videopilot.egg-info/SOURCES.txt +32 -0
  28. videopilot-0.1.0/videopilot.egg-info/dependency_links.txt +1 -0
  29. videopilot-0.1.0/videopilot.egg-info/entry_points.txt +3 -0
  30. videopilot-0.1.0/videopilot.egg-info/requires.txt +8 -0
  31. videopilot-0.1.0/videopilot.egg-info/top_level.txt +4 -0
  32. videopilot-0.1.0/videopilot.py +166 -0
  33. videopilot-0.1.0/videopilot_cli.py +20 -0
  34. videopilot-0.1.0/videopilot_mcp.py +478 -0
@@ -0,0 +1,380 @@
# videopilot — Agent Runbook

> **You are the calling LLM (GitHub Copilot CLI, Claude, etc.).** This document tells
> you how to drive `videopilot.py` from a user's natural-language request to produce
> a finished video. Read it in full before invoking the CLI.

## Mental model

`videopilot.py` is a **pure executor**. It does mechanical things (run ffmpeg, run
Whisper, run Edge TTS) and reads/writes JSON state files. **You** (the LLM) do the
creative + planning work:

- You write the voiceover script (`script.json`).
- You pick which sections to keep from a long video (`cut-plan.json`) — either by
  reading a transcript or by translating the user's explicit trim instructions.
- You assemble the final timeline (`compose-plan.json`) — which clip plays when,
  which voiceover overlays it, where slides go, where background music drops.
- Then you call `videopilot.py` subcommands to make it happen.

The JSON state files in `projects/<slug>/` are the **contract** between you and the
CLI. The CLI never invents content — it does exactly what the JSON says.

## Invocation

`videopilot` is installed as a console script — it works from **any** directory:

```powershell
videopilot <subcommand> <args>
```

If the command isn't found in a fresh shell, refresh PATH first:

```powershell
$env:Path = [System.Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path","User")
```

The CLI auto-detects ffmpeg in the WinGet `Gyan.FFmpeg` install location, so no manual ffmpeg PATH dance is needed.

> **Legacy invocation** (still works): `python C:\Work\tools\video-creator\videopilot.py <subcommand>`. The console-script form is preferred — every example below assumes it.

## First-time setup

Before doing any real work in a fresh environment, run:

```powershell
videopilot doctor
```

This reports missing prerequisites. If ffmpeg is missing, install it with:

```powershell
winget install --id Gyan.FFmpeg -e
```

If Python packages are missing or the `videopilot` command itself is missing:

```powershell
cd C:\Work\tools\video-creator
pip install -e .
```

Re-run `doctor` until it shows all green.

> **`faster-whisper` (used by `transcribe`) is heavy.** On first run it downloads a
> model of roughly 150MB–1.5GB depending on the model size; the default is `base`
> (~150MB). Skip `transcribe` if the user only wants explicit-timestamp trims or
> pure voiceover output — you don't need it for those flows.

## State files — the contracts

Every project lives at `projects/<slug>/`. These JSON files are what you author.
**Always pretty-print with 2-space indent** so they remain human-editable.

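A minimal way to honor that rule from Python — a hypothetical helper, not part of the CLI — which also adds the trailing newline the conventions section below asks for:

```python
import json

def write_state(path: str, data: dict) -> None:
    # 2-space indent keeps the file human-editable; the final write
    # gives it the trailing newline the conventions require.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
        f.write("\n")
```
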
### `project.json` — overall state (created by `init`, you may extend)

```json
{
  "name": "Q4 Demo",
  "slug": "q4-demo",
  "created_at": "2026-05-15T14:30:00Z",
  "sources": [
    { "id": "raw1", "path": "sources/raw-screencast.mp4", "duration_sec": 1834.5 }
  ]
}
```

`sources[].id` is what other state files reference. Source files are copied (or
symlinked on systems that support it) into `sources/` by `init` and `import`.

### `script.json` — voiceover script (you write this)

```json
{
  "voice_defaults": {
    "voice": "en-US-AndrewMultilingualNeural",
    "rate": "+0%",
    "pitch": "+0Hz",
    "engine": "edge-tts"
  },
  "segments": [
    {
      "id": "vo-intro",
      "text": "Welcome to our quarterly review of the product launch.",
      "voice": "en-US-AvaMultilingualNeural",
      "rate": "+5%",
      "pause_after_ms": 500
    },
    {
      "id": "vo-closer",
      "text": "Thanks for watching."
    }
  ]
}
```

- `engine`: `edge-tts` (default, free, no key) or `azure` (requires
  `AZURE_SPEECH_KEY` + `AZURE_SPEECH_REGION` env vars).
- `voice`: any Microsoft Neural voice short name. List them with
  `videopilot voices`.
- `rate`: SSML rate like `-10%`, `+0%`, `+25%`.
- `pitch`: SSML pitch like `-5Hz`, `+0Hz`, `+10Hz`.
- `text` may contain SSML if you wrap it in `<speak>...</speak>` — the CLI
  passes raw SSML through when it detects a top-level `<speak>` tag.

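For example, a segment that uses this passthrough could look like the following (illustrative only — which SSML tags each engine honors varies):

```json
{
  "id": "vo-with-pause",
  "text": "<speak>First point.<break time=\"700ms\"/>Second point, after a beat.</speak>"
}
```
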
`tts` writes `voice/<id>.mp3` for each segment + a `voice/manifest.json` with
durations (seconds, float) — read it when you author `compose-plan.json`.

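A sketch of the duration lookup you'd do before composing. The manifest's exact shape isn't documented here, so the key layout below is an assumption — check the real file first:

```python
import json
from pathlib import Path

project = Path("projects/q4-demo")

# Assumed shape: {"segments": [{"id": "vo-intro", "duration_sec": 4.2}, ...]}.
manifest = json.loads((project / "voice" / "manifest.json").read_text(encoding="utf-8"))
for seg in manifest["segments"]:
    print(f'{seg["id"]}: {seg["duration_sec"]:.2f}s')
```
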
### `cut-plan.json` — which sections of which sources to keep (you write this)

```json
{
  "clips": [
    {
      "id": "c-hook",
      "source": "raw1",
      "start": 12.3,
      "end": 28.5,
      "label": "the moment they show the wireframe"
    },
    {
      "id": "c-key",
      "source": "raw1",
      "start": 145.2,
      "end": 167.0,
      "label": "Sarah's key insight about onboarding"
    }
  ]
}
```

Times are in seconds (float). `cut` writes `clips/<id>.mp4` and a
`clips/manifest.json` with verified durations.

If the user gives explicit timestamps, translate directly. If they say "pick
the highlights," run `transcribe` first, read the transcript, and choose
spans yourself. **You** decide what's important — the CLI has no opinion.

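When translating user instructions like "keep 0:30–1:15" into the float seconds above, a tiny converter avoids arithmetic slips (hypothetical helper, shown for illustration):

```python
def to_seconds(ts: str) -> float:
    """Convert "SS", "MM:SS", or "HH:MM:SS" (fractions allowed) to seconds."""
    sec = 0.0
    for part in ts.split(":"):
        sec = sec * 60 + float(part)
    return sec

assert to_seconds("0:30") == 30.0
assert to_seconds("1:15") == 75.0
assert to_seconds("1:02:03.5") == 3723.5
```
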
### `compose-plan.json` — timeline assembly (you write this)

```json
{
  "output": {
    "filename": "final.mp4",
    "resolution": "1920x1080",
    "fps": 30,
    "video_bitrate": "8M",
    "audio_bitrate": "192k",
    "video_codec": "libx264",
    "audio_codec": "aac"
  },
  "timeline": [
    {
      "type": "slide",
      "duration_sec": 4.0,
      "background_color": "#0b132b",
      "title": "Q4 Product Review",
      "subtitle": "May 2026",
      "voiceover": "vo-intro"
    },
    {
      "type": "clip",
      "clip": "c-hook",
      "voiceover": null
    },
    {
      "type": "clip",
      "clip": "c-key",
      "voiceover": "vo-key-insight",
      "duck_source_db": -18,
      "pad_to_voiceover": true
    },
    {
      "type": "slide",
      "duration_sec": 3.0,
      "background_image": "sources/logo.png",
      "voiceover": "vo-closer"
    }
  ],
  "background_music": {
    "path": "sources/music.mp3",
    "volume_db": -22,
    "fade_in_sec": 1.0,
    "fade_out_sec": 2.0
  }
}
```

Timeline item types:

**`clip`** — plays a cut clip. Fields:
- `clip` (required): id from `cut-plan.json`
- `voiceover` (optional): id from `script.json`; mixed on top of clip audio
- `duck_source_db` (optional, default −15 when VO present, 0 otherwise): how
  much to attenuate the clip's original audio under the voiceover (see the
  worked example after this list)
- `pad_to_voiceover` (optional, default true): if VO is longer than the clip,
  freeze-frame the last frame to match; if VO is shorter than the clip, the
  clip plays out and the VO ends early (use a longer clip or split the VO if
  you don't want that)

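For intuition when picking `duck_source_db` (and `volume_db` for music): a dB offset maps to a linear amplitude factor of 10^(dB/20), so the defaults attenuate strongly without fully muting:

```python
# Linear amplitude factor for a dB offset: 10 ** (db / 20).
for db in (-15, -18, -22):
    print(f"{db} dB -> x{10 ** (db / 20):.3f} amplitude")
# -15 dB -> x0.178  (default ducking under a VO)
# -18 dB -> x0.126  (the clip example above)
# -22 dB -> x0.079  (the background-music example)
```
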
**`slide`** — a static title card. Fields:
- `duration_sec` (required UNLESS `voiceover` is set; then VO duration +
  optional `pad_after_sec` determines it)
- `background_color` (optional, hex; default `#000000`)
- `background_image` (optional, path relative to project dir; if both color
  and image are set, image wins)
- `title` (optional): large heading text overlay
- `subtitle` (optional): smaller text below title
- `voiceover` (optional): id from `script.json`

**`gap`** — a fixed silent black pause (useful for breathing room):
- `duration_sec` (required)

`background_music` is mixed under the entire final video at the given volume,
with optional fades.

### Output files

After `compose`, `out/` contains:

- `final.mp4` — the rendered video
- `final.fcpxml` (if `--fcpxml` passed) — Final Cut XML, imports into Final Cut
  Pro, Premiere Pro, DaVinci Resolve
- `final.edl` (if `--edl` passed) — CMX 3600 EDL, broadly supported
- `render.ps1` (if `--script` passed) — a replayable PowerShell script that
  reproduces the render with raw ffmpeg commands; the user can hand-tweak it

## Workflow recipes

### Recipe 1 — "Make a 60-second voiceover video from this script"

User has a brief, no source video. Output: MP4 of slides + voiceover.

1. `videopilot init <slug>` — creates `projects/<slug>/`
2. **You write** `projects/<slug>/script.json` — one segment per beat
3. `videopilot tts <slug>` — synth voice MP3s, populates `voice/manifest.json`
4. Read `voice/manifest.json` to learn segment durations
5. **You write** `projects/<slug>/compose-plan.json` — slides keyed to VO ids
6. `videopilot compose <slug>` — renders `out/final.mp4`

### Recipe 2 — "Cut this long video down to highlights, no narration"

User has one source video, wants a tight cut. No voiceover.

1. `videopilot init <slug> --source <path>` — copies source into project
2. `videopilot transcribe <slug> raw1` — emits `transcripts/raw1.json`
   with word-level timestamps and `transcripts/raw1.srt`
3. **You read** the transcript and decide which spans to keep
4. **You write** `cut-plan.json` with chosen spans
5. **You write** a minimal `compose-plan.json` — one timeline entry per clip,
   no voiceover
6. `videopilot cut <slug>` — emits `clips/<id>.mp4`
7. `videopilot compose <slug>` — renders `out/final.mp4`

### Recipe 3 — "Cut these specific timestamps from this video"

User gives explicit "keep 0:30–1:15, 3:00–4:20" instructions. No need to transcribe.

1. `init` with source
2. **You write** `cut-plan.json` translating the user's timestamps directly
3. **You write** a trivial `compose-plan.json` (each clip back-to-back)
4. `cut`
5. `compose`

### Recipe 4 — "Take this raw recording, cut the highlights, narrate over them"

The full pipeline. Combines recipes 1 + 2.

1. `init` with source
2. `transcribe` the source
3. **You** pick highlights → write `cut-plan.json`; **you** write a narration
   that complements the clips → write `script.json`
4. `cut` and `tts` (order independent — can run in parallel via separate
   PowerShell sessions, but sequential is fine)
5. Read `voice/manifest.json` for VO durations; read `clips/manifest.json`
   for clip durations
6. **You write** `compose-plan.json` lining up VOs against clips with
   appropriate ducking
7. `compose`
8. `export --edl --fcpxml --script` if the user wants to keep editing in
   another NLE

### Recipe 5 — "Just trim the boring parts" (no AI)

User wants the long video minus dead air, no smart selection.

1. `init` with source
2. `videopilot silence <slug> raw1 --threshold-db -35 --min-silence-sec 1.5`
   — emits a candidate `cut-plan.json` of NON-silent spans
3. Optionally tweak the cut-plan
4. `cut`, `compose` as usual

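The flags map directly onto ffmpeg's `silencedetect` filter, which this subcommand presumably wraps (`noise` is the threshold, `d` the minimum duration). To eyeball the raw silence boundaries yourself, an equivalent manual probe is:

```powershell
ffmpeg -i .\projects\<slug>\sources\raw-screencast.mp4 -af silencedetect=noise=-35dB:d=1.5 -f null - 2>&1 | Select-String "silence_"
```
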
## Subcommand reference

| Command | Purpose |
|---|---|
| `doctor` | Check prerequisites (ffmpeg, ffprobe, Python pkgs, optional Azure keys) |
| `voices [--locale en-US] [--engine edge-tts\|azure]` | List available TTS voices |
| `init <slug> [--source <path>...] [--name "Display Name"]` | Create a project |
| `import <slug> <path> [--id raw2]` | Add another source to an existing project |
| `tts <slug> [--only <id>...] [--force]` | Synthesize voiceover MP3s from `script.json` |
| `transcribe <slug> <source-id> [--model base\|small\|medium\|large-v3] [--language en]` | Transcribe a source with faster-whisper |
| `silence <slug> <source-id> [--threshold-db -35] [--min-silence-sec 1.0]` | Emit a cut-plan candidate of non-silent spans |
| `cut <slug> [--only <clip-id>...] [--force]` | Cut clips per `cut-plan.json` |
| `compose <slug>` | Render `out/final.mp4` per `compose-plan.json` |
| `export <slug> [--edl] [--fcpxml] [--script]` | Emit NLE/replay exports |

All commands accept `--quiet` and `--verbose`.

## Conventions you must follow

1. **Always run `doctor` first** in a new session if you don't know whether
   prerequisites are installed. Don't guess.
2. **Always pretty-print JSON state files with 2-space indent** and a trailing
   newline. The user reads these.
3. **Always preserve user-provided ids**. Never rename `clip` / `voiceover` /
   `source` ids without asking.
4. **Validate timing before you compose**: read `voice/manifest.json` and
   `clips/manifest.json` to confirm durations match what you wrote in
   `compose-plan.json`. If a VO is 12s and the clip is only 5s with
   `pad_to_voiceover: false`, warn the user (see the sketch after this list).
5. **Don't re-run `tts` for unchanged segments** — it's slow and the audio is
   deterministic. The CLI skips existing files by default unless `--force`.
6. **Don't re-run `cut` for unchanged clips** — same reason. `cut` is idempotent.
7. **Stop and ask** if the user's request is ambiguous about what to keep,
   what voice to use, what tone the script should take, etc. Don't fabricate.
8. **Show the user the script** before you run `tts` — VO audio is not free
   and the user often wants edits. Same for `cut-plan.json` before `cut`.
9. **Final render preview**: after `compose`, tell the user where `out/final.mp4`
   is and offer to open it (`Start-Process .\projects\<slug>\out\final.mp4`).

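A minimal version of the check in convention 4. The manifest key names are assumptions (verify against the real files), but the shape of the logic holds:

```python
import json
from pathlib import Path

project = Path("projects/q4-demo")
plan = json.loads((project / "compose-plan.json").read_text(encoding="utf-8"))

# Assumed manifest shapes: {"segments": [{"id": ..., "duration_sec": ...}]}
# for voice and {"clips": [{"id": ..., "duration_sec": ...}]} for clips.
voice = {s["id"]: s["duration_sec"]
         for s in json.loads((project / "voice" / "manifest.json").read_text())["segments"]}
clips = {c["id"]: c["duration_sec"]
         for c in json.loads((project / "clips" / "manifest.json").read_text())["clips"]}

for item in plan["timeline"]:
    if item.get("type") == "clip" and item.get("voiceover"):
        clip_d, vo_d = clips[item["clip"]], voice[item["voiceover"]]
        if vo_d > clip_d and not item.get("pad_to_voiceover", True):
            print(f'warn: VO {item["voiceover"]} ({vo_d:.1f}s) outlasts '
                  f'clip {item["clip"]} ({clip_d:.1f}s) and padding is off')
```
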
## Why no Clipchamp export

Clipchamp's project format is proprietary and undocumented. Attempting to
generate one is unreliable and breaks across Clipchamp updates. Instead we
ship:

- **FCPXML** — imports into Premiere Pro, Final Cut Pro, DaVinci Resolve
- **EDL (CMX 3600)** — broadly supported, simple format
- **`render.ps1`** — a replayable ffmpeg script the user can hand-tweak and
  re-run

If the user really wants to edit in Clipchamp, the recommended workflow is:
import `final.mp4` plus the original `sources/*.mp4` and the `voice/*.mp3`
files into Clipchamp as separate tracks. The cut clips in `clips/` can also
be imported as pre-cut pieces.

## Failure modes and recovery

| Symptom | Cause | Fix |
|---|---|---|
| `doctor`: `ffmpeg not found` | Not on PATH | `winget install --id Gyan.FFmpeg -e`; restart shell |
| `doctor`: `edge_tts not installed` | Python deps missing | `pip install -e .` from the checkout (see First-time setup) |
| `tts`: HTTP errors / "no audio received" | Edge TTS service hiccup or network | Retry; check `https://speech.platform.bing.com` is reachable; switch to `engine: azure` if persistent |
| `transcribe`: model download stalls | Slow network on first run | Pre-pull manually: `python -c "from faster_whisper import WhisperModel; WhisperModel('base')"` |
| `cut`: "stream copy failed" | Source has unusual codec / variable framerate | Pass `--reencode` to `cut` to force decode/encode |
| `compose`: timeline durations don't match expectation | `pad_to_voiceover` interaction with short clips | Re-check `clips/manifest.json` vs `voice/manifest.json`; adjust `cut-plan.json` |
| `compose`: audio glitch at clip boundaries | Concat demuxer with mismatched audio params | Already mitigated — `compose` re-encodes per-segment intermediates; if it persists, file a bug with the intermediates |

When in doubt, the intermediates in `tmp/` are kept after every `compose` run
(not auto-cleaned) so you can inspect what happened at each step.
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Mazen Bahgat

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,10 @@
include AGENT.md
include LICENSE
include README.md
recursive-include examples *.json
recursive-include examples *.md
graft projects
prune projects/*
include projects/.gitkeep
global-exclude __pycache__
global-exclude *.py[cod]
@@ -0,0 +1,285 @@
Metadata-Version: 2.4
Name: videopilot
Version: 0.1.0
Summary: Agent-driven video creation toolkit: neural TTS voiceover, highlight cutting, timeline composition, and NLE export.
Author-email: Mazen Bahgat <mazenbahgat@microsoft.com>
License: MIT
Project-URL: Homepage, https://github.com/mbahgatTech/videopilot
Project-URL: Documentation, https://github.com/mbahgatTech/videopilot/blob/main/README.md
Project-URL: Agent Runbook, https://github.com/mbahgatTech/videopilot/blob/main/AGENT.md
Project-URL: Issues, https://github.com/mbahgatTech/videopilot/issues
Keywords: video,voiceover,tts,edge-tts,azure-speech,ffmpeg,screen-recording,highlight-reel,narration,fcpxml,edl,llm,agent
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: edge-tts>=7.0
Requires-Dist: faster-whisper>=1.0
Requires-Dist: azure-cognitiveservices-speech>=1.40
Requires-Dist: mcp>=1.0
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Dynamic: license-file

# videopilot

> Agent-driven video creation toolkit. Neural TTS voiceover, AI highlight cutting,
> timeline composition with slides and audio ducking, and NLE export — all driven
> by a calling LLM through a JSON state contract.

[![PyPI](https://img.shields.io/badge/PyPI-videopilot-blue.svg)](https://pypi.org/project/videopilot/)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![ffmpeg](https://img.shields.io/badge/depends-ffmpeg-orange.svg)](https://ffmpeg.org)

`videopilot` is a Python CLI that turns raw screen recordings into narrated,
edited MP4s. The CLI does the **mechanical work** — ffmpeg, neural TTS,
faster-whisper transcription, timeline rendering. A calling **agent** (GitHub
Copilot CLI, Claude Code, Continue.dev, or any code-aware LLM) does the
**creative work** — writes the voiceover script, picks the highlight spans,
lays out the timeline — by reading the contract in [`AGENT.md`](AGENT.md) and
authoring small JSON state files.

You can also drive `videopilot` by hand. Each subcommand is independently usable.

```
source.mp4 -> script.json -> tts -> cut-plan.json -> cut -> compose-plan.json -> compose -> final.mp4
                                                                                            + EDL / FCPXML / replay script
```

## Highlights

| Capability | Engine |
|---|---|
| Neural voiceover, 400+ voices, 100+ locales | [Microsoft Edge TTS](https://github.com/rany2/edge-tts) (free, no key) |
| Premium neural voices | Azure Speech (optional, with key) |
| Word-level transcription | [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (local) |
| Silence trimming, scene cuts | ffmpeg |
| Title slides, picture-in-picture, audio ducking, music underlay | ffmpeg filter graph composer |
| MP4 render at any resolution / fps | ffmpeg |
| Hand-off to Premiere / Resolve / Final Cut | EDL (CMX 3600) + FCPXML export |
| Replayable render scripts | PowerShell / bash export |
| Agent-first design | JSON state-file contract documented in `AGENT.md` |

## Install

### From PyPI (recommended)

```
pip install --user videopilot
```

`videopilot` is a console script — after install it's on your `PATH`. Verify:

```
videopilot doctor
```

You also need **ffmpeg** on `PATH`:

| OS | Command |
|---|---|
| Windows | `winget install --id Gyan.FFmpeg -e` |
| macOS | `brew install ffmpeg` |
| Debian / Ubuntu | `sudo apt install ffmpeg` |
| Fedora | `sudo dnf install ffmpeg` |
| Arch | `sudo pacman -S ffmpeg` |

`videopilot doctor` exits 0 when ffmpeg, ffprobe, Python deps, and optional
Azure keys are all in order; otherwise it prints exactly what's missing.

### From source (development)

```
git clone https://github.com/mbahgatTech/videopilot.git
cd videopilot
pip install --user -e .
```

### Via the Agency plugin

If you use Copilot or Claude inside Microsoft and have access to the
[Agency Playground](https://github.com/agency-microsoft/playground)
marketplace, install the `videopilot` plugin and ask:

> set up videopilot

The plugin's `init` skill runs the same installer logic for you.

## Quick start

```
# 1. Create a project with a source video
videopilot init demo --source "/path/to/raw-recording.mp4"

# 2. Hand-author projects/demo/script.json (one segment per beat of narration),
#    OR have your agent draft it from AGENT.md.

# 3. Synthesize the voiceover
videopilot tts demo

# 4. (Optional) transcribe to help pick highlights
videopilot transcribe demo raw1

# 5. Hand-author projects/demo/cut-plan.json (which spans to keep)

# 6. Cut clips from sources
videopilot cut demo

# 7. Hand-author projects/demo/compose-plan.json (timeline + slides + ducking)

# 8. Render the final video
videopilot compose demo

# 9. Optional: emit NLE projects + replay script
videopilot export demo --edl --fcpxml --script
```

Final output: `projects/demo/out/final.mp4` plus optional `final.edl`,
`final.fcpxml`, and `render.ps1`.

## CLI reference

| Command | Purpose |
|---|---|
| `videopilot doctor` | Verify ffmpeg, ffprobe, Python deps, optional Azure keys. |
| `videopilot voices [--locale en-US]` | List available TTS voices. |
| `videopilot init <slug> [--source PATH]` | Create a new project with optional first source. |
| `videopilot import <slug> <path>` | Add another source to an existing project. |
| `videopilot tts <slug> [--force]` | Synthesize voiceover MP3s from `script.json`. |
| `videopilot transcribe <slug> <source-id>` | Run faster-whisper; emits word-level JSON + SRT. |
| `videopilot silence <slug> <source-id>` | Emit a cut-plan candidate that strips silence. |
| `videopilot cut <slug> [--force] [--reencode]` | Cut clips per `cut-plan.json`. |
| `videopilot compose <slug>` | Render final MP4 per `compose-plan.json`. |
| `videopilot export <slug> [--edl] [--fcpxml] [--script]` | Emit NLE projects + replayable ffmpeg script. |

Run `videopilot <command> --help` for per-command flags.

## Project layout

```
videopilot/
- AGENT.md            <- contract for calling LLMs (start here if you're driving the tool)
- README.md           <- this file
- LICENSE             <- MIT
- pyproject.toml
- videopilot.py       <- argparse router
- videopilot_cli.py   <- console-script shim
- lib/                <- implementation modules
  - tts.py
  - transcribe.py
  - silence.py
  - cut.py
  - compose.py
  - export.py
  - ffmpeg_wrap.py
  - voices.py
  - init_cmd.py
  - doctor.py
- examples/           <- copyable starter JSON state files
- projects/<slug>/    <- per-project workspace (one folder per video)
  - project.json
  - script.json
  - cut-plan.json
  - compose-plan.json
  - sources/
  - voice/
  - transcripts/
  - clips/
  - tmp/
  - out/
```

## Configuration

| Environment variable | Purpose |
|---|---|
| `AZURE_SPEECH_KEY` | Optional. Enables Azure Speech voices (premium neural TTS). |
| `AZURE_SPEECH_REGION` | Required when `AZURE_SPEECH_KEY` is set (e.g. `eastus`). |

Edge TTS is the default and requires no configuration.

## Driving videopilot from an LLM

Read [`AGENT.md`](AGENT.md). It is the contract the calling LLM uses:

- the JSON schema for each state file (`project.json`, `script.json`,
  `cut-plan.json`, `compose-plan.json`);
- when to call which subcommand;
- conventions (2-space JSON, preserved ids, idempotent re-runs);
- common failure modes and recoveries.

The `videopilot` plugin in the Agency Playground packages this contract as a
Copilot/Claude skill so you can just say `set up videopilot` and `make a video
from <source>` instead of orchestrating by hand.

## Development

```
git clone https://github.com/mbahgatTech/videopilot.git
cd videopilot
pip install --user -e ".[dev]"

# Build the package
python -m build

# Validate the dist
python -m twine check dist/*

# Local smoke test
videopilot doctor
```

## Releasing

Releases are published to PyPI automatically when a `v*` tag is pushed.
The workflow uses [PyPI **Trusted Publishing** (OIDC)](https://docs.pypi.org/trusted-publishers/),
so **no API tokens are stored in the repo or in GitHub Secrets** — PyPI verifies
the GitHub OIDC token at publish time.

One-time setup (PyPI side, do this once before the first release):

1. Sign in to <https://pypi.org/>.
2. Account settings → Publishing → **Add a new pending publisher**:
   - PyPI project name: `videopilot`
   - Owner: `mbahgatTech`
   - Repository: `videopilot`
   - Workflow filename: `release.yml`
   - Environment name: `pypi`
3. On GitHub, repo Settings → Environments → **New environment** → `pypi`.
   Optionally add a required reviewer for an extra approval gate.

Cutting a release:

```
# bump pyproject.toml [project] version, e.g. 0.1.0 -> 0.2.0
git commit -am "release: 0.2.0"
git tag v0.2.0
git push origin main --tags
```

The `release` workflow then:

1. Builds sdist + wheel
2. Verifies the tag matches the `pyproject.toml` version
3. Runs `twine check`
4. Publishes to PyPI via OIDC
5. Creates a GitHub Release with the sdist + wheel attached

## License

MIT. See [`LICENSE`](LICENSE).