videowright 0.1.0

Files changed (306)
  1. package/README.md +91 -0
  2. package/dist/cli/argv.d.ts +28 -0
  3. package/dist/cli/argv.d.ts.map +1 -0
  4. package/dist/cli/argv.js +115 -0
  5. package/dist/cli/argv.js.map +1 -0
  6. package/dist/cli/bin.d.ts +7 -0
  7. package/dist/cli/bin.d.ts.map +1 -0
  8. package/dist/cli/bin.js +10 -0
  9. package/dist/cli/bin.js.map +1 -0
  10. package/dist/cli/dev.d.ts +19 -0
  11. package/dist/cli/dev.d.ts.map +1 -0
  12. package/dist/cli/dev.js +104 -0
  13. package/dist/cli/dev.js.map +1 -0
  14. package/dist/cli/discover.d.ts +29 -0
  15. package/dist/cli/discover.d.ts.map +1 -0
  16. package/dist/cli/discover.js +104 -0
  17. package/dist/cli/discover.js.map +1 -0
  18. package/dist/cli/discover_project.d.ts +29 -0
  19. package/dist/cli/discover_project.d.ts.map +1 -0
  20. package/dist/cli/discover_project.js +108 -0
  21. package/dist/cli/discover_project.js.map +1 -0
  22. package/dist/cli/errors.d.ts +10 -0
  23. package/dist/cli/errors.d.ts.map +1 -0
  24. package/dist/cli/errors.js +13 -0
  25. package/dist/cli/errors.js.map +1 -0
  26. package/dist/cli/ffmpeg.d.ts +57 -0
  27. package/dist/cli/ffmpeg.d.ts.map +1 -0
  28. package/dist/cli/ffmpeg.js +122 -0
  29. package/dist/cli/ffmpeg.js.map +1 -0
  30. package/dist/cli/index.d.ts +7 -0
  31. package/dist/cli/index.d.ts.map +1 -0
  32. package/dist/cli/index.js +152 -0
  33. package/dist/cli/index.js.map +1 -0
  34. package/dist/cli/playwright_check.d.ts +44 -0
  35. package/dist/cli/playwright_check.d.ts.map +1 -0
  36. package/dist/cli/playwright_check.js +20 -0
  37. package/dist/cli/playwright_check.js.map +1 -0
  38. package/dist/cli/prompt.d.ts +13 -0
  39. package/dist/cli/prompt.d.ts.map +1 -0
  40. package/dist/cli/prompt.js +47 -0
  41. package/dist/cli/prompt.js.map +1 -0
  42. package/dist/cli/render.d.ts +60 -0
  43. package/dist/cli/render.d.ts.map +1 -0
  44. package/dist/cli/render.js +471 -0
  45. package/dist/cli/render.js.map +1 -0
  46. package/dist/cli/script_cmd.d.ts +26 -0
  47. package/dist/cli/script_cmd.d.ts.map +1 -0
  48. package/dist/cli/script_cmd.js +88 -0
  49. package/dist/cli/script_cmd.js.map +1 -0
  50. package/dist/cli/time_shim.d.ts +44 -0
  51. package/dist/cli/time_shim.d.ts.map +1 -0
  52. package/dist/cli/time_shim.js +390 -0
  53. package/dist/cli/time_shim.js.map +1 -0
  54. package/dist/cli/ts_loader.d.ts +28 -0
  55. package/dist/cli/ts_loader.d.ts.map +1 -0
  56. package/dist/cli/ts_loader.js +95 -0
  57. package/dist/cli/ts_loader.js.map +1 -0
  58. package/dist/cli/vite_helpers.d.ts +62 -0
  59. package/dist/cli/vite_helpers.d.ts.map +1 -0
  60. package/dist/cli/vite_helpers.js +273 -0
  61. package/dist/cli/vite_helpers.js.map +1 -0
  62. package/dist/index.d.ts +11 -0
  63. package/dist/index.d.ts.map +1 -0
  64. package/dist/index.js +14 -0
  65. package/dist/index.js.map +1 -0
  66. package/dist/player/hash_router.d.ts +23 -0
  67. package/dist/player/hash_router.d.ts.map +1 -0
  68. package/dist/player/hash_router.js +49 -0
  69. package/dist/player/hash_router.js.map +1 -0
  70. package/dist/player/hud.d.ts +33 -0
  71. package/dist/player/hud.d.ts.map +1 -0
  72. package/dist/player/hud.js +357 -0
  73. package/dist/player/hud.js.map +1 -0
  74. package/dist/player/index.d.ts +123 -0
  75. package/dist/player/index.d.ts.map +1 -0
  76. package/dist/player/index.js +848 -0
  77. package/dist/player/index.js.map +1 -0
  78. package/dist/player/input.d.ts +14 -0
  79. package/dist/player/input.d.ts.map +1 -0
  80. package/dist/player/input.js +90 -0
  81. package/dist/player/input.js.map +1 -0
  82. package/dist/player/slot.d.ts +22 -0
  83. package/dist/player/slot.d.ts.map +1 -0
  84. package/dist/player/slot.js +43 -0
  85. package/dist/player/slot.js.map +1 -0
  86. package/dist/player/transitions/cut.d.ts +7 -0
  87. package/dist/player/transitions/cut.d.ts.map +1 -0
  88. package/dist/player/transitions/cut.js +9 -0
  89. package/dist/player/transitions/cut.js.map +1 -0
  90. package/dist/player/transitions/fade.d.ts +7 -0
  91. package/dist/player/transitions/fade.d.ts.map +1 -0
  92. package/dist/player/transitions/fade.js +18 -0
  93. package/dist/player/transitions/fade.js.map +1 -0
  94. package/dist/player/transitions/index.d.ts +4 -0
  95. package/dist/player/transitions/index.d.ts.map +1 -0
  96. package/dist/player/transitions/index.js +4 -0
  97. package/dist/player/transitions/index.js.map +1 -0
  98. package/dist/player/transitions/slide.d.ts +6 -0
  99. package/dist/player/transitions/slide.d.ts.map +1 -0
  100. package/dist/player/transitions/slide.js +35 -0
  101. package/dist/player/transitions/slide.js.map +1 -0
  102. package/dist/script/index.d.ts +2 -0
  103. package/dist/script/index.d.ts.map +1 -0
  104. package/dist/script/index.js +2 -0
  105. package/dist/script/index.js.map +1 -0
  106. package/dist/script/script.d.ts +10 -0
  107. package/dist/script/script.d.ts.map +1 -0
  108. package/dist/script/script.js +41 -0
  109. package/dist/script/script.js.map +1 -0
  110. package/dist/segment/SegmentRunner.d.ts +52 -0
  111. package/dist/segment/SegmentRunner.d.ts.map +1 -0
  112. package/dist/segment/SegmentRunner.js +187 -0
  113. package/dist/segment/SegmentRunner.js.map +1 -0
  114. package/dist/segment/defineConfig.d.ts +6 -0
  115. package/dist/segment/defineConfig.d.ts.map +1 -0
  116. package/dist/segment/defineConfig.js +7 -0
  117. package/dist/segment/defineConfig.js.map +1 -0
  118. package/dist/segment/defineSegment.d.ts +7 -0
  119. package/dist/segment/defineSegment.d.ts.map +1 -0
  120. package/dist/segment/defineSegment.js +25 -0
  121. package/dist/segment/defineSegment.js.map +1 -0
  122. package/dist/segment/index.d.ts +5 -0
  123. package/dist/segment/index.d.ts.map +1 -0
  124. package/dist/segment/index.js +4 -0
  125. package/dist/segment/index.js.map +1 -0
  126. package/dist/timeline/index.d.ts +73 -0
  127. package/dist/timeline/index.d.ts.map +1 -0
  128. package/dist/timeline/index.js +142 -0
  129. package/dist/timeline/index.js.map +1 -0
  130. package/dist/timeline/loadAudioTrack.d.ts +18 -0
  131. package/dist/timeline/loadAudioTrack.d.ts.map +1 -0
  132. package/dist/timeline/loadAudioTrack.js +44 -0
  133. package/dist/timeline/loadAudioTrack.js.map +1 -0
  134. package/dist/timeline/loadVoiceover.d.ts +18 -0
  135. package/dist/timeline/loadVoiceover.d.ts.map +1 -0
  136. package/dist/timeline/loadVoiceover.js +38 -0
  137. package/dist/timeline/loadVoiceover.js.map +1 -0
  138. package/dist/timeline/resolveTiming.d.ts +28 -0
  139. package/dist/timeline/resolveTiming.d.ts.map +1 -0
  140. package/dist/timeline/resolveTiming.js +63 -0
  141. package/dist/timeline/resolveTiming.js.map +1 -0
  142. package/dist/timeline/validateTiming.d.ts +29 -0
  143. package/dist/timeline/validateTiming.d.ts.map +1 -0
  144. package/dist/timeline/validateTiming.js +62 -0
  145. package/dist/timeline/validateTiming.js.map +1 -0
  146. package/dist/types.d.ts +216 -0
  147. package/dist/types.d.ts.map +1 -0
  148. package/dist/types.js +6 -0
  149. package/dist/types.js.map +1 -0
  150. package/package.json +47 -0
  151. package/skill/SKILL.md +64 -0
  152. package/skill/assets/hello_world/PLAN.md +31 -0
  153. package/skill/assets/hello_world/README.md +27 -0
  154. package/skill/assets/hello_world/audio/audio_plan.md +14 -0
  155. package/skill/assets/hello_world/segments/hello_intro.ts +69 -0
  156. package/skill/assets/hello_world/segments/hello_outro.ts +71 -0
  157. package/skill/assets/hello_world/timeline.ts +15 -0
  158. package/skill/assets/hello_world/voiceover_script/script.md +10 -0
  159. package/skill/assets/install/package.json +10 -0
  160. package/skill/assets/install/tsconfig.json +23 -0
  161. package/skill/assets/styles/editorial-mono/STYLE.md +124 -0
  162. package/skill/assets/styles/editorial-mono/brand.md +85 -0
  163. package/skill/assets/styles/editorial-mono/reference/animations.jsx +752 -0
  164. package/skill/assets/styles/editorial-mono/reference/scenes.html +563 -0
  165. package/skill/assets/styles/editorial-mono/sample/bullet.ts +101 -0
  166. package/skill/assets/styles/editorial-mono/sample/content.ts +104 -0
  167. package/skill/assets/styles/editorial-mono/sample/cta.ts +113 -0
  168. package/skill/assets/styles/editorial-mono/sample/feature.ts +111 -0
  169. package/skill/assets/styles/editorial-mono/sample/grid.ts +97 -0
  170. package/skill/assets/styles/editorial-mono/sample/kinetic.ts +96 -0
  171. package/skill/assets/styles/editorial-mono/sample/section.ts +101 -0
  172. package/skill/assets/styles/editorial-mono/sample/stat.ts +128 -0
  173. package/skill/assets/styles/editorial-mono/sample/title.ts +97 -0
  174. package/skill/assets/styles/editorial-mono/sample/ui-showcase.ts +159 -0
  175. package/skill/assets/styles/editorial-mono/tokens.css +44 -0
  176. package/skill/assets/styles/iso-diagram/STYLE.md +109 -0
  177. package/skill/assets/styles/iso-diagram/brand.md +32 -0
  178. package/skill/assets/styles/iso-diagram/reference/animations.jsx +673 -0
  179. package/skill/assets/styles/iso-diagram/reference/scenes.html +427 -0
  180. package/skill/assets/styles/iso-diagram/sample/bullet.ts +144 -0
  181. package/skill/assets/styles/iso-diagram/sample/content.ts +192 -0
  182. package/skill/assets/styles/iso-diagram/sample/cta.ts +162 -0
  183. package/skill/assets/styles/iso-diagram/sample/feature.ts +205 -0
  184. package/skill/assets/styles/iso-diagram/sample/grid.ts +181 -0
  185. package/skill/assets/styles/iso-diagram/sample/kinetic.ts +102 -0
  186. package/skill/assets/styles/iso-diagram/sample/section.ts +149 -0
  187. package/skill/assets/styles/iso-diagram/sample/stat.ts +164 -0
  188. package/skill/assets/styles/iso-diagram/sample/title.ts +173 -0
  189. package/skill/assets/styles/iso-diagram/sample/ui-showcase.ts +162 -0
  190. package/skill/assets/styles/iso-diagram/tokens.css +40 -0
  191. package/skill/assets/styles/motion-engineering/STYLE.md +106 -0
  192. package/skill/assets/styles/motion-engineering/brand.md +29 -0
  193. package/skill/assets/styles/motion-engineering/reference/animations.jsx +673 -0
  194. package/skill/assets/styles/motion-engineering/reference/scenes.html +513 -0
  195. package/skill/assets/styles/motion-engineering/sample/bullet.ts +176 -0
  196. package/skill/assets/styles/motion-engineering/sample/content.ts +228 -0
  197. package/skill/assets/styles/motion-engineering/sample/cta.ts +209 -0
  198. package/skill/assets/styles/motion-engineering/sample/feature.ts +299 -0
  199. package/skill/assets/styles/motion-engineering/sample/grid.ts +190 -0
  200. package/skill/assets/styles/motion-engineering/sample/kinetic.ts +159 -0
  201. package/skill/assets/styles/motion-engineering/sample/section.ts +196 -0
  202. package/skill/assets/styles/motion-engineering/sample/stat.ts +230 -0
  203. package/skill/assets/styles/motion-engineering/sample/title.ts +219 -0
  204. package/skill/assets/styles/motion-engineering/sample/ui-showcase.ts +267 -0
  205. package/skill/assets/styles/motion-engineering/tokens.css +40 -0
  206. package/skill/assets/styles/neon-terminal/STYLE.md +105 -0
  207. package/skill/assets/styles/neon-terminal/brand.md +27 -0
  208. package/skill/assets/styles/neon-terminal/reference/animations.jsx +673 -0
  209. package/skill/assets/styles/neon-terminal/reference/scenes.html +387 -0
  210. package/skill/assets/styles/neon-terminal/sample/bullet.ts +113 -0
  211. package/skill/assets/styles/neon-terminal/sample/content.ts +117 -0
  212. package/skill/assets/styles/neon-terminal/sample/cta.ts +131 -0
  213. package/skill/assets/styles/neon-terminal/sample/feature.ts +112 -0
  214. package/skill/assets/styles/neon-terminal/sample/grid.ts +128 -0
  215. package/skill/assets/styles/neon-terminal/sample/kinetic.ts +105 -0
  216. package/skill/assets/styles/neon-terminal/sample/section.ts +96 -0
  217. package/skill/assets/styles/neon-terminal/sample/stat.ts +123 -0
  218. package/skill/assets/styles/neon-terminal/sample/title.ts +122 -0
  219. package/skill/assets/styles/neon-terminal/sample/ui-showcase.ts +127 -0
  220. package/skill/assets/styles/neon-terminal/tokens.css +39 -0
  221. package/skill/assets/styles/risograph/STYLE.md +110 -0
  222. package/skill/assets/styles/risograph/brand.md +26 -0
  223. package/skill/assets/styles/risograph/reference/animations.jsx +673 -0
  224. package/skill/assets/styles/risograph/reference/scenes.html +403 -0
  225. package/skill/assets/styles/risograph/sample/bullet.ts +124 -0
  226. package/skill/assets/styles/risograph/sample/content.ts +135 -0
  227. package/skill/assets/styles/risograph/sample/cta.ts +149 -0
  228. package/skill/assets/styles/risograph/sample/feature.ts +152 -0
  229. package/skill/assets/styles/risograph/sample/grid.ts +123 -0
  230. package/skill/assets/styles/risograph/sample/kinetic.ts +125 -0
  231. package/skill/assets/styles/risograph/sample/section.ts +130 -0
  232. package/skill/assets/styles/risograph/sample/stat.ts +145 -0
  233. package/skill/assets/styles/risograph/sample/title.ts +132 -0
  234. package/skill/assets/styles/risograph/sample/ui-showcase.ts +147 -0
  235. package/skill/assets/styles/risograph/tokens.css +39 -0
  236. package/skill/assets/styles/swiss-console/STYLE.md +107 -0
  237. package/skill/assets/styles/swiss-console/brand.md +37 -0
  238. package/skill/assets/styles/swiss-console/reference/animations.jsx +673 -0
  239. package/skill/assets/styles/swiss-console/reference/scenes.html +420 -0
  240. package/skill/assets/styles/swiss-console/sample/bullet.ts +122 -0
  241. package/skill/assets/styles/swiss-console/sample/content.ts +137 -0
  242. package/skill/assets/styles/swiss-console/sample/cta.ts +109 -0
  243. package/skill/assets/styles/swiss-console/sample/feature.ts +163 -0
  244. package/skill/assets/styles/swiss-console/sample/grid.ts +145 -0
  245. package/skill/assets/styles/swiss-console/sample/kinetic.ts +117 -0
  246. package/skill/assets/styles/swiss-console/sample/section.ts +127 -0
  247. package/skill/assets/styles/swiss-console/sample/stat.ts +148 -0
  248. package/skill/assets/styles/swiss-console/sample/title.ts +148 -0
  249. package/skill/assets/styles/swiss-console/sample/ui-showcase.ts +198 -0
  250. package/skill/assets/styles/swiss-console/tokens.css +39 -0
  251. package/skill/install/INSTALL.md +400 -0
  252. package/skill/references/audio/audio_plan.md +199 -0
  253. package/skill/references/audio/build.md +208 -0
  254. package/skill/references/audio/cue_template.md +219 -0
  255. package/skill/references/audio/ffmpeg_cookbook.md +267 -0
  256. package/skill/references/audio/music/music.md +171 -0
  257. package/skill/references/audio/music/providers/elevenlabs.md +170 -0
  258. package/skill/references/audio/music/providers/manual.md +140 -0
  259. package/skill/references/audio/music/providers/openverse.md +265 -0
  260. package/skill/references/audio/sfx/providers/elevenlabs.md +152 -0
  261. package/skill/references/audio/sfx/providers/manual.md +117 -0
  262. package/skill/references/audio/sfx/providers/openverse.md +243 -0
  263. package/skill/references/audio/sfx/sfx.md +149 -0
  264. package/skill/references/audio/styles.md +102 -0
  265. package/skill/references/audio/sync.md +237 -0
  266. package/skill/references/audio/voiceover/animation_sync.md +142 -0
  267. package/skill/references/audio/voiceover/provider_script.md +153 -0
  268. package/skill/references/audio/voiceover/providers/elevenlabs.md +288 -0
  269. package/skill/references/audio/voiceover/providers/manual.md +100 -0
  270. package/skill/references/audio/voiceover/script_writing.md +100 -0
  271. package/skill/references/audio/voiceover/style_intake.md +56 -0
  272. package/skill/references/audio/voiceover/sync_algorithm.md +167 -0
  273. package/skill/references/audio/voiceover.md +296 -0
  274. package/skill/references/audio.md +135 -0
  275. package/skill/references/authoring_segment.md +446 -0
  276. package/skill/references/create_or_edit_video.md +232 -0
  277. package/skill/references/dev_server.md +157 -0
  278. package/skill/references/export.md +145 -0
  279. package/skill/references/new_video.md +117 -0
  280. package/skill/references/project_structure.md +144 -0
  281. package/skill/references/setup.md +109 -0
  282. package/skill/references/setup_new_style.md +158 -0
  283. package/skill/references/styles.md +154 -0
  284. package/skill/references/testing.md +115 -0
  285. package/skill/references/types.md +240 -0
  286. package/src/cli/entry/components/copy_button.ts +42 -0
  287. package/src/cli/entry/components/download_modal.ts +204 -0
  288. package/src/cli/entry/components/empty_state.ts +55 -0
  289. package/src/cli/entry/components/hide_hud_tab.ts +37 -0
  290. package/src/cli/entry/components/icons.ts +31 -0
  291. package/src/cli/entry/components/top_bar.ts +69 -0
  292. package/src/cli/entry/components/video_card.ts +57 -0
  293. package/src/cli/entry/dev_frame.ts +189 -0
  294. package/src/cli/entry/entry_index.ts +16 -0
  295. package/src/cli/entry/entry_video.ts +24 -0
  296. package/src/cli/entry/index.html +12 -0
  297. package/src/cli/entry/parse_slug.ts +14 -0
  298. package/src/cli/entry/render.html +17 -0
  299. package/src/cli/entry/render_entry.ts +121 -0
  300. package/src/cli/entry/styles/base.css +45 -0
  301. package/src/cli/entry/styles/components.css +605 -0
  302. package/src/cli/entry/styles/tokens.css +44 -0
  303. package/src/cli/entry/video.html +22 -0
  304. package/src/cli/entry/views/homepage.ts +66 -0
  305. package/src/cli/entry/views/video_view.ts +286 -0
  306. package/src/cli/entry/virtual.d.ts +8 -0
@@ -0,0 +1,288 @@
1
+ # ElevenLabs
2
+
3
+ ## When this is loaded
4
+
5
+ You need to guide the user through ElevenLabs to generate voiceover audio and per-word timing data. This reference covers both the API-key flow (automated) and the portal flow (manual web UI).
6
+
7
+ This reference also covers Speech-to-Text for the manual voiceover flow (Flow B).
8
+
9
+ ## Mode selection
10
+
11
+ This question is asked at the start of Flow A (before style intake). Present the user with two options:
12
+
13
+ > Two ways to generate the voiceover with ElevenLabs:
14
+ >
15
+ > 1. **API key (recommended for repeated use)** -- set up once in `.env`, then the agent generates audio and timings via curl. **Requires a paid ElevenLabs plan** (not available on the free tier). Note: granting the agent API access means it will spend your ElevenLabs credits, which costs real money.
16
+ > 2. **Portal (web UI, works with any plan)** -- the agent walks you through TTS in the ElevenLabs web portal, then STT to extract timings.
17
+ >
18
+ > API key is faster and reusable across projects. Portal needs no setup but takes more clicks per video.
19
+ >
20
+ > If you don't have an account: open https://elevenlabs.io and sign up first.
21
+
22
+ After the user picks, if they chose **API key**, immediately present the curated voice catalog (see [voiceover.md curated voice catalog](../../voiceover.md#curated-voice-catalog)). Then continue with style intake and script writing. When it is time for audio generation, dispatch into the appropriate sub-flow below.
23
+
24
+ ---
25
+
26
+ ## Flow 1: API Key
27
+
28
+ ### Step 1: Credit / cost warning
29
+
30
+ Before generating audio, warn the user about costs:
31
+
32
+ > **Cost notice:** ElevenLabs charges credits for TTS generation. A 60-second voiceover (~900 characters) costs roughly 900-1,000 credits, which is a small fraction of most paid plan quotas.
33
+ >
34
+ > Check your plan's remaining quota at https://elevenlabs.io/app/subscription before generating.
35
+
36
+ <!-- TODO: Verify current pricing tiers -- ElevenLabs pricing may have changed. The rough estimate above is based on ~1 credit per character for standard voices. -->
37
+
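The estimate in the cost notice is simple arithmetic and can be sketched as below. This is a rough illustration only: `estimateCredits` is a hypothetical helper, and the default of 1 credit per character is the same unverified assumption noted in the TODO above, not a confirmed ElevenLabs rate.

```typescript
// Rough credit estimate for a script, assuming ~1 credit per character for
// standard voices (an unverified assumption; check current pricing).
// Note: `length` counts UTF-16 code units, which matches characters for
// plain English text.
function estimateCredits(scriptText: string, creditsPerCharacter = 1): number {
  return Math.ceil(scriptText.length * creditsPerCharacter);
}
```

Passing a different `creditsPerCharacter` lets the same helper follow whatever rate the user's plan actually charges.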
38
+ ### Step 2: Get the API key
39
+
40
+ Guide the user:
41
+
42
+ > **Creating your ElevenLabs API key:**
43
+ >
44
+ > 1. Go to https://elevenlabs.io/app/developers/api-keys (sign in if prompted).
45
+ > 2. Click **Create API Key**.
46
+ > 3. Give it a name (e.g., "videowright").
47
+ > 4. **Enable these permissions** on the key before creating it. The key creation UI has toggles for which features the key can access. Enable all of the following:
48
+ > - **Text to Speech** -- required for generating voiceover audio.
49
+ > - **Speech to Text** -- required for extracting word-level timing from audio.
50
+ > - **Sound Effects** -- for upcoming SFX generation features.
51
+ > - **Music Generation** -- for upcoming background music features.
52
+ >
53
+ > If you don't see individual permission toggles, create the key with full access -- the default may already include all required permissions.
54
+ > 5. Click **Create** and copy the key.
55
+
56
+ **Storage rules -- do NOT paste the key into chat:**
57
+
58
+ > **Important:** Do not paste your API key into this chat -- it would be sent to the LLM provider.
59
+ >
60
+ > Instead:
61
+ > 1. Create a `.env` file at your project root (if it doesn't already exist).
62
+ > 2. Add this line: `ELEVENLABS_API_KEY=your-key-here`
63
+ > 3. Make sure `.env` is in your `.gitignore` (add it if not).
64
+
65
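+ The agent reads the key from the `ELEVENLABS_API_KEY` environment variable (`process.env.ELEVENLABS_API_KEY` in Node, `${ELEVENLABS_API_KEY}` in the shell commands below). curl does not read `.env` itself, so the variable must be exported into the shell session before running the curl commands.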
66
+
67
+ ### Step 3: Voice (already selected)
68
+
69
+ The voice was chosen during approach selection (Flow A step 1). Use this voice ID when constructing the API call below, and write it to the `eleven_labs_voice_id` field when creating `voiceover.ts` in step 8. If no voice was explicitly chosen, default to Asher (`tMvyQtpCVQ0DkixuYm6J`). No action needed here -- proceed to audio generation.
70
+
71
+ The voice ID lookup table for the curated catalog:
72
+
73
+ | Voice | ID |
+ |---|---|
+ | Asher (default) | `tMvyQtpCVQ0DkixuYm6J` |
+ | Cecily | `Uc7anshoV8mdBhDnEZEX` |
+ | Don | `8IbUB2LiiCZ85IJAHNnZ` |
+ | Hanna | `Hh0rE70WfnSFN80K8uJC` |
79
+
80
+ ### Step 4: Generate audio with timestamps
81
+
82
+ Use the text-to-speech-with-timestamps endpoint. This returns both the audio and per-word timing in a single request.
83
+
84
+ **Endpoint:** `POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/with-timestamps`
85
+
86
+ The agent reads the voice ID from the `eleven_labs_voice_id` field in `voiceover.ts`. If not set, default to Asher: `tMvyQtpCVQ0DkixuYm6J`.
87
+
88
+ The agent constructs and runs a curl command like this:
89
+
90
+ ```bash
+ # Read the provider script (everything below the --- line)
+ SCRIPT_TEXT=$(sed '1,/^---$/d' "videos/<video>/audio/originals/voiceovers/<slug>/provider_script.md")
+
+ # Voice ID from eleven_labs_voice_id (default: Asher tMvyQtpCVQ0DkixuYm6J)
+ VOICE_ID="<selected voice ID>"
+
+ curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}/with-timestamps" \
+   -H "xi-api-key: ${ELEVENLABS_API_KEY}" \
+   -H "Content-Type: application/json" \
+   -d "$(jq -n \
+     --arg text "$SCRIPT_TEXT" \
+     --arg model "eleven_multilingual_v2" \
+     '{
+       text: $text,
+       model_id: $model,
+       voice_settings: {
+         stability: 0.5,
+         similarity_boost: 0.75
+       }
+     }'
+   )" \
+   -o "$TMPDIR/elevenlabs_response.json"
+ ```
114
+
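For agents that prefer to build the request in code rather than in the shell, the same call can be sketched in TypeScript. This is a minimal sketch, not part of videowright: `buildTtsRequest` and the `TtsRequest` type are hypothetical names, and the URL, headers, model ID, and voice settings are copied from the curl command above. The function only assembles the request; nothing is sent.

```typescript
// Hypothetical container for the pieces a fetch() call would need.
interface TtsRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Assemble the same request the curl command sends. The caller supplies the
// voice ID, the provider script text, and the API key read from the environment.
function buildTtsRequest(voiceId: string, text: string, apiKey: string): TtsRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}/with-timestamps`,
    init: {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      // JSON.stringify escapes quotes and newlines in the script text,
      // which is the job `jq --arg` does in the shell version.
      body: JSON.stringify({
        text,
        model_id: "eleven_multilingual_v2",
        voice_settings: { stability: 0.5, similarity_boost: 0.75 },
      }),
    },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch(url, init)`; keeping construction separate from sending makes the payload easy to inspect before any credits are spent.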
115
+ <!-- TODO: Verify the exact response shape of the with-timestamps endpoint. The format below is based on known ElevenLabs API behavior -- confirm against current docs. -->
116
+
117
+ **Processing the response:**
118
+
119
+ The response JSON contains base64-encoded audio and word-level alignment data. The agent must:
120
+
121
+ 1. **Extract the audio.** The response includes an `audio_base64` field. Decode and save it:
122
+
123
+ ```bash
+ jq -r '.audio_base64' "$TMPDIR/elevenlabs_response.json" | base64 --decode > "videos/<video>/audio/originals/voiceovers/<slug>/audio.mp3"
+ ```
126
+
127
+ 2. **Extract the timing data.** The response includes an `alignment` field whose timing may be per-word or per-character. The command below assumes word-level entries under `.alignment.words`. Transform it into the standard timing JSON format and save:
+
+ ```bash
+ # Assumes word-level alignment; character-level responses need aggregating first (see the note below).
+ jq '{
+   words: [.alignment.words[] | {word: .word, start: .start, end: .end}]
+ }' "$TMPDIR/elevenlabs_response.json" > "videos/<video>/audio/originals/voiceovers/<slug>/timing.json"
+ ```
134
+
135
+ If the response uses character-level alignment instead of word-level, aggregate characters into words by grouping on whitespace boundaries and using the start of the first character and end of the last character for each word.
136
+
137
+ 3. **Clean up** the temporary response file.
138
+
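The character-to-word aggregation described in step 2 can be sketched as follows. This is a minimal sketch under stated assumptions: it assumes the `alignment` object carries three parallel arrays named `characters`, `character_start_times_seconds`, and `character_end_times_seconds`. Those field names, and the `charsToWords` helper itself, are assumptions to confirm against the current ElevenLabs docs.

```typescript
// Assumed shape of a character-level `alignment` object (field names unverified).
interface CharAlignment {
  characters: string[];
  character_start_times_seconds: number[];
  character_end_times_seconds: number[];
}

interface WordTiming {
  word: string;
  start: number;
  end: number;
}

// Group characters into words on whitespace boundaries. Each word takes the
// start time of its first character and the end time of its last character.
function charsToWords(a: CharAlignment): WordTiming[] {
  const words: WordTiming[] = [];
  let current: WordTiming | null = null;
  for (let i = 0; i < a.characters.length; i++) {
    const ch = a.characters[i];
    if (/\s/.test(ch)) {
      // Whitespace closes the word in progress, if there is one.
      if (current !== null) {
        words.push(current);
        current = null;
      }
      continue;
    }
    if (current === null) {
      current = {
        word: ch,
        start: a.character_start_times_seconds[i],
        end: a.character_end_times_seconds[i],
      };
    } else {
      current.word += ch;
      current.end = a.character_end_times_seconds[i];
    }
  }
  // Flush the final word when the text does not end in whitespace.
  if (current !== null) words.push(current);
  return words;
}
```

Wrapping the result as `{ words: charsToWords(alignment) }` yields the same shape the jq command writes to `timing.json`. Punctuation stays attached to its word here, so the sync step may need to strip it before matching.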
139
+ ### API flow output
140
+
141
+ After a successful API call, the voiceover folder should contain:
142
+
143
+ ```
+ audio/originals/voiceovers/<slug>/
+   provider_script.md    # already created in prior step
+   audio.mp3             # decoded from API response
+   timing.json           # extracted from API response
+ ```
149
+
150
+ Proceed to the sync algorithm: [../sync_algorithm.md](../sync_algorithm.md).
151
+
152
+ ---
153
+
154
+ ## Flow 2: Portal (Web UI)
155
+
156
+ The portal flow has two steps: TTS to generate audio, then STT to extract per-word timing.
157
+
158
+ ### Step 1 -- Generate the audio (TTS)
159
+
160
+ Portal users do **not** use the curated voice catalog -- they pick a voice visually in the ElevenLabs UI below.
161
+
162
+ Guide the user:
163
+
164
+ > **Generating voiceover audio with ElevenLabs:**
165
+ >
166
+ > 1. Open https://elevenlabs.io/app and sign in.
167
+ > 2. Navigate to **Text to Speech** in the sidebar.
168
+ > 3. **Important: Select the v2 model in the model dropdown.** Look for **"Eleven Multilingual v2"** (or **"Multilingual v2"**). Do NOT use the default v3 model -- v3 does not honor exact pause timing via `<break>` tags.
169
+ > 4. Select a voice that matches your tone preferences. You can preview voices before generating.
170
+ > 5. Paste the content from `audio/originals/voiceovers/<slug>/provider_script.md` (everything below the horizontal rule) into the text area.
171
+ > 6. Click **Generate**.
172
+ > 7. Listen to the preview. If pauses or delivery need adjustment, update the provider script and regenerate.
173
+ > 8. **Download the audio file.** Click the download button on the generated audio. Save as `audio.mp3`.
174
+ > 9. Place the file in `audio/originals/voiceovers/<slug>/audio.mp3`.
175
+
176
+ ### Step 2 -- Extract timings (STT)
177
+
178
+ The TTS portal does not export per-word timing data. To get timings, run the generated audio through Speech-to-Text:
179
+
180
+ > **Extracting word-level timing via Speech-to-Text:**
181
+ >
182
+ > 1. In the ElevenLabs portal, switch to **Speech to Text** in the sidebar.
183
+ > 2. Upload the audio file you just saved (`audio/originals/voiceovers/<slug>/audio.mp3`).
184
+ > 3. Wait for transcription to complete.
185
+ > 4. **Export the result as JSON.** Look for an "Export" or "Download" option and select **JSON** format. **Do not use plain text export** -- plain text does not include per-word timing data.
186
+ > 5. Save the JSON file as `audio/originals/voiceovers/<slug>/timing.json`.
187
+
188
+ ### Portal flow output
189
+
190
+ After both steps, the voiceover folder should contain:
191
+
192
+ ```
+ audio/originals/voiceovers/<slug>/
+   provider_script.md    # already created in prior step
+   audio.mp3             # downloaded from TTS in step 1
+   timing.json           # exported from STT in step 2
+ ```
198
+
199
+ Proceed to the sync algorithm: [../sync_algorithm.md](../sync_algorithm.md).
200
+
201
+ ---
202
+
203
+ ## Speech-to-Text (for manual flow / Flow B)
204
+
205
+ Used when the user provides their own audio and needs per-word timing data. This is the same STT process as portal step 2 above.
206
+
207
+ ### Portal walkthrough: STT transcription
208
+
209
+ > **Transcribing audio with ElevenLabs Speech-to-Text:**
210
+ >
211
+ > 1. Open https://elevenlabs.io/app and sign in.
212
+ > 2. Navigate to **Speech to Text** in the sidebar.
213
+ > 3. Upload the audio file (mp3 or wav).
214
+ > 4. Wait for transcription to complete.
215
+ > 5. **Export as JSON.** Select the JSON export option -- plain text export does not include word timing data.
216
+ > 6. Save the JSON file in `audio/originals/voiceovers/<slug>/` as `timing.json`.
217
+
218
+ ### Timing JSON format (STT)
219
+
220
+ ElevenLabs STT output contains word-level timestamps:
221
+
222
+ ```json
+ {
+   "words": [
+     {
+       "word": "Welcome",
+       "start": 0.12,
+       "end": 0.58,
+       "confidence": 0.98
+     }
+   ]
+ }
+ ```
234
+
235
+ Additional fields like `confidence` can be ignored for sync purposes. The sync algorithm uses only `word`, `start`, and `end`.
236
+
237
+ ### STT accuracy notes
238
+
239
+ - STT may not perfectly transcribe the audio. Minor differences (filler words, slight wording changes) are normal.
240
+ - When the STT transcript differs from the PLAN.md script, use the STT timestamps for timing but the PLAN.md script text for the canonical record.
241
+ - Flag significant discrepancies to the user -- they may want to update PLAN.md to match what was actually spoken.
242
+
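One way to spot the discrepancies mentioned above is a simple bag-of-words comparison between the PLAN.md script and the STT words. This is a hedged sketch, not the sync algorithm's matching: `tokenize` and `diffWords` are hypothetical helpers, the tokenizer assumes English text, and the comparison ignores word order.

```typescript
// Lowercase, replace punctuation (except apostrophes) with spaces, split on whitespace.
// Assumes English text; other scripts would need a different character class.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Compare the script against the STT transcript as multisets of words.
// `missing` = words in the script that STT did not report.
// `extra`   = words STT reported that are not in the script.
function diffWords(scriptText: string, sttWords: string[]): { missing: string[]; extra: string[] } {
  const counts = new Map<string, number>();
  for (const w of tokenize(scriptText)) {
    counts.set(w, (counts.get(w) ?? 0) + 1);
  }
  const extra: string[] = [];
  for (const w of tokenize(sttWords.join(" "))) {
    const n = counts.get(w) ?? 0;
    if (n > 0) counts.set(w, n - 1);
    else extra.push(w);
  }
  const missing: string[] = [];
  counts.forEach((n, w) => {
    for (let i = 0; i < n; i++) missing.push(w);
  });
  return { missing, extra };
}
```

A short `extra` list of filler words is the normal case described above; a long `missing` list is the kind of significant discrepancy worth raising with the user.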
243
+ ---
244
+
245
+ ## Timing JSON format (canonical)
246
+
247
+ Both flows produce the same timing JSON format at `audio/originals/voiceovers/<slug>/timing.json`:
248
+
249
+ ```json
+ {
+   "words": [
+     {
+       "word": "Welcome",
+       "start": 0.0,
+       "end": 0.45
+     },
+     {
+       "word": "to",
+       "start": 0.47,
+       "end": 0.55
+     }
+   ]
+ }
+ ```
265
+
266
+ Fields:
267
+ - `word`: the spoken word
268
+ - `start`: seconds from audio start when the word begins
269
+ - `end`: seconds from audio start when the word ends
270
+
271
+ The exact JSON structure may vary by ElevenLabs endpoint or export version. Adapt by looking for word-level entries with start/end timestamps. The sync algorithm needs: word text, start time, end time.
272
+
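Adapting a varying export to the canonical format can be sketched as a small normalizer that keeps only word text, start, and end. This is a minimal sketch under stated assumptions: `normalizeTiming` is a hypothetical helper, and accepting `text` as an alternate key for `word` is an assumption about other export shapes, not something this reference confirms.

```typescript
interface WordTiming {
  word: string;
  start: number;
  end: number;
}

// Reduce a parsed timing export to the canonical { words: [...] } shape.
// Entries without word text or numeric start/end are skipped, and extra
// fields such as `confidence` are dropped.
function normalizeTiming(raw: unknown): { words: WordTiming[] } {
  const entries =
    typeof raw === "object" && raw !== null ? (raw as { words?: unknown }).words : undefined;
  if (!Array.isArray(entries)) {
    throw new Error("timing JSON has no top-level `words` array");
  }
  const words: WordTiming[] = [];
  for (const entry of entries) {
    if (typeof entry !== "object" || entry === null) continue;
    const e = entry as Record<string, unknown>;
    // `text` as a fallback key is an assumption, see the lead-in.
    const text = typeof e.word === "string" ? e.word : e.text;
    if (typeof text !== "string" || text.trim() === "") continue;
    if (typeof e.start !== "number" || typeof e.end !== "number") continue;
    words.push({ word: text, start: e.start, end: e.end });
  }
  return { words };
}
```

Running every `timing.json` through one normalizer before sync means the sync algorithm only ever sees `word`, `start`, and `end`, whichever flow produced the file.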
273
+ ## Known limitations
274
+
275
+ - **`<break>` tags cannot appear at the very start of audio.** ElevenLabs does not support a `<break time="Ns" />` tag as the first element in the provider script. The TTS engine requires spoken text before the first break. If the video needs an initial silent pause before narration begins, that must be handled separately (e.g., by adding a leading silent segment in the timeline). A proper initial-pause feature is planned but not yet available.
276
+
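The leading-break limitation can be caught before any credits are spent with a quick check on the provider script body. This is a minimal sketch: `startsWithBreakTag` is a hypothetical helper, and it assumes the caller passes only the text below the horizontal rule in `provider_script.md`.

```typescript
// Returns true when the first non-whitespace content of the script body is a
// <break ...> tag, which the limitation above says is not supported.
function startsWithBreakTag(scriptBody: string): boolean {
  return /^\s*<break\b[^>]*>/i.test(scriptBody);
}
```

When the check returns true, the agent can move the pause out of the script and handle it separately, for example with the leading silent segment mentioned above.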
277
+ ## Troubleshooting
278
+
279
+ | Issue | Resolution |
+ |---|---|
+ | API returns 401 | Check that `ELEVENLABS_API_KEY` is set correctly in `.env` and the key is valid. |
+ | API returns 429 | Rate limited. Wait a moment and retry, or check your plan's quota. |
+ | Audio quality is poor | Re-generate with a different voice or adjusted settings. Try adjusting `stability` (higher = more consistent) and `similarity_boost` in the API call. |
+ | Pauses are too short/long | Adjust the `provider_script.md` pause annotations and regenerate. For `<break>` tags, adjust the `time` value. |
+ | TTS mispronounces a word | Add phonetic spelling to the provider script and regenerate. |
+ | STT misses words or adds extra words | Use the best-match approach in the sync algorithm. Flag mismatches to the user. |
+ | Portal does not show v2 model option | The model may be listed as "Eleven Multilingual v2" or similar. Check the model dropdown carefully. If v2 is not available, the `<break>` tags for exact pause timing will not work reliably in v3 -- note this to the user. |
+ | STT JSON export option not visible | Look for a download/export button after transcription completes. The option may be labeled "Export", "Download", or appear as a dropdown with format choices. Select JSON specifically. |
@@ -0,0 +1,100 @@
1
+ # Manual Voiceover
2
+
3
+ ## When this is loaded
4
+
5
+ The user has their own audio file and wants to add it as a voiceover. This is Flow B from the main voiceover reference.
6
+
7
+ ## Overview
8
+
9
+ The manual flow starts from a user-provided audio file and produces the transcript and timing data needed for sync. The agent guides the user through ElevenLabs Speech-to-Text to get per-word timing, then runs the sync algorithm to produce a `Timing` object.
10
+
11
+ ## Step 1: Get the audio file
12
+
13
+ Ask the user for the audio file. Two options:
14
+
15
+ 1. **File path.** The user provides a path to an existing file on disk. Copy or move it into the voiceover folder.
16
+ 2. **Drop-in.** Create the voiceover folder and ask the user to place the file there.
17
+
18
+ ```
19
+ audio/originals/voiceovers/<slug>/
20
+ audio.mp3 # or .wav
21
+ ```
22
+
23
+ Slug naming:
24
+ - Ask the user for a slug, or suggest one based on context: `v1`, `narrator`, `take-1`.
25
+ - Create the directory: `videos/<video>/audio/originals/voiceovers/<slug>/`.
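
As an illustration of the slug and directory convention above, here is a minimal sketch. The `slugify` rules (lowercase letters, digits, and hyphens) are an assumption inferred from the example slugs, and both helper names are hypothetical:

```typescript
// Turn free-form input into a slug like "take-1". The allowed character set
// is an assumption based on the examples `v1`, `narrator`, `take-1`.
function slugify(input: string): string {
  return input
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything else into a hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

// Build the voiceover folder path relative to the project root.
function voiceoverDir(video: string, slug: string): string {
  return `videos/${video}/audio/originals/voiceovers/${slug}/`;
}
```

For example, `voiceoverDir("demo", slugify("Take 1"))` yields `videos/demo/audio/originals/voiceovers/take-1/`, where `demo` is a placeholder video name.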
26
+
27
+ ### Audio format requirements
28
+
29
+ - MP3 and WAV are the standard formats. Both work with ElevenLabs and ffmpeg.
30
+ - If the user provides a different format (M4A, OGG, FLAC), note that ffmpeg can handle most formats. ElevenLabs STT also accepts most common formats.
31
+ - No specific sample rate or bitrate requirements -- ElevenLabs and ffmpeg handle resampling.
32
+
33
+ ## Step 2: Generate transcript and timing
34
+
35
+ The user needs per-word timing data for the sync algorithm. Guide them through ElevenLabs Speech-to-Text:
36
+
37
+ > **To generate the timing data, we need to transcribe your audio with per-word timestamps.**
38
+ >
39
+ > Follow these steps:
40
+ >
41
+ > 1. Open https://elevenlabs.io/app and sign in (or create a free account).
42
+ > 2. Navigate to **Speech to Text** in the sidebar.
43
+ > 3. Upload your audio file (`<filename>`).
44
+ > 4. Wait for transcription to complete.
45
+ > 5. **Export as JSON.** Select the JSON export option -- **do not use plain text export**, which does not include per-word timing data.
46
+ > 6. Save the JSON file as `timing.json` in `audio/originals/voiceovers/<slug>/`.
47
+
48
+ See [elevenlabs.md](elevenlabs.md) for the detailed STT portal walkthrough and expected JSON format.
49
+
50
+ ## Step 3: Verify the transcript
51
+
52
+ After the user downloads the timing JSON:
53
+
54
+ 1. Read `timing.json` and extract the full transcript text.
55
+ 2. Present it to the user for review: "Here's what ElevenLabs heard. Does this match your recording?"
56
+ 3. If the transcript is significantly wrong (wrong words, missing sections), the timing data may be unreliable. Ask the user to:
57
+ - Re-record with clearer audio, or
58
+ - Manually correct the timing JSON (rare -- usually re-recording is easier).
59
+ 4. If the transcript has minor differences (filler words, slight variations), proceed -- the sync algorithm handles this.
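
Extracting the transcript for review (item 1 above) might look like the sketch below. It assumes the word entries have already been read from `timing.json` into objects with `text`, `start`, and `end` fields; the `TimedWord` type and `transcriptFromWords` name are hypothetical:

```typescript
// Hypothetical per-word shape; adapt the field names to the actual export.
interface TimedWord {
  text: string;
  start: number; // seconds
  end: number; // seconds
}

// Rebuild a readable transcript from per-word entries, ordered by start time.
function transcriptFromWords(words: TimedWord[]): string {
  return [...words] // copy so the caller's array is not reordered
    .sort((a, b) => a.start - b.start)
    .map((w) => w.text.trim())
    .filter((t) => t !== "")
    .join(" ");
}
```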
60
+
61
+ ## Step 4: Update PLAN.md script
62
+
63
+ If the video does not already have a script in PLAN.md:
64
+
65
+ 1. Use the STT transcript as the basis for the script.
66
+ 2. Divide it by segment, matching content to segment ids using the timeline's segment outline and each segment's `notes`/`voiceover` hints.
67
+ 3. Write the script section into PLAN.md.
68
+ 4. Update each segment's `voiceover` field to match.
69
+
70
+ If the video already has a script in PLAN.md:
71
+
72
+ 1. Compare the STT transcript with the existing script.
73
+ 2. Flag significant differences.
74
+ 3. The existing PLAN.md script is the canonical version -- use it for segment boundary alignment in the sync algorithm, but use the STT timestamps for timing.
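
One way to decide whether differences are "significant" is a word-level similarity score. The sketch below uses a longest-common-subsequence ratio; this is an illustrative technique, not necessarily what the sync algorithm does, and any cutoff for "significant" would be a judgment call the source does not specify:

```typescript
// Lowercase, drop punctuation, and split into word tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// Length of the longest common subsequence of two token lists (row-by-row DP).
function lcsLength(a: string[], b: string[]): number {
  let prev = new Array<number>(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array<number>(b.length + 1).fill(0);
    for (let j = 1; j <= b.length; j++) {
      curr[j] =
        a[i - 1] === b[j - 1]
          ? prev[j - 1] + 1
          : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means the same words in the same order.
function similarity(script: string, transcript: string): number {
  const a = tokenize(script);
  const b = tokenize(transcript);
  if (a.length === 0 && b.length === 0) return 1;
  return (2 * lcsLength(a, b)) / (a.length + b.length);
}
```

A low score suggests flagging the mismatch to the user, while a high score with a few extra filler words matches the "minor differences" case that the sync algorithm tolerates.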
75
+
76
+ ## Step 5: Proceed to sync
77
+
78
+ With the audio file and `timing.json` in place, proceed to the sync algorithm: [../sync_algorithm.md](../sync_algorithm.md).
79
+
80
+ The sync algorithm uses the provider timing JSON to compute a `Timing` object. The rest of the flow (default voiceover question, animation sync, writing `voiceover.ts`) is identical to the AI-generated flow.
81
+
82
+ ## Expected folder state after manual flow
83
+
84
+ ```
85
+ audio/originals/voiceovers/<slug>/
86
+ voiceover.ts # created at the end of the flow
87
+ audio.mp3 # user-provided audio
88
+ timing.json # from ElevenLabs STT
89
+ ```
90
+
91
+ Note: there is no `provider_script.md` in the manual flow since the user recorded the audio themselves.
92
+
93
+ ## Edge cases
94
+
95
+ | Situation | Behavior |
96
+ |---|---|
97
+ | Audio has background music or noise | STT accuracy may suffer. Warn the user that timing data could be less precise. Suggest clean audio for best results. |
98
+ | Audio has multiple speakers | Videowright supports only single-narrator voiceovers. Ask the user to provide a single-speaker recording. |
99
+ | Audio is very short (< 5 seconds) | Proceed normally, but the resulting timing may be too sparse for meaningful sync. |
100
+ | User does not want to use ElevenLabs STT | There is no alternative STT integration in V1. The user would need to manually create `timing.json` with per-word timestamps, which is impractical. Recommend ElevenLabs STT (free tier available). |
@@ -0,0 +1,100 @@
1
+ # Script Writing
2
+
3
+ ## When this is loaded
4
+
5
+ You are writing or integrating a voiceover script into the video's PLAN.md. This happens before generating the provider script.
6
+
7
+ ## Where the script lives
8
+
9
+ The canonical script lives in the video's `PLAN.md` under a `## Script` section, divided by segment. Each segment gets a subsection with its VO text:
10
+
11
+ ```markdown
12
+ ## Script
13
+
14
+ ### intro
15
+ Welcome to Acme Product. Today we'll walk through the three features that set us apart.
16
+
17
+ ### feature-cards
18
+ First up: real-time collaboration. Your team can edit simultaneously, with changes syncing instantly across devices.
19
+
20
+ [pause for animation]
21
+
22
+ Next, the analytics dashboard. Track engagement, conversion, and retention in one view.
23
+
24
+ [pause for animation]
25
+
26
+ Finally, integrations. Connect with the tools you already use -- Slack, GitHub, Jira, and more.
27
+
28
+ ### outro
29
+ Ready to get started? Visit acme.com for a free trial. Thanks for watching.
30
+ ```
31
+
32
+ ### Rules for the script section
33
+
34
+ - One subsection per segment, using the segment id as the heading.
35
+ - Segment ids must match the ids in the timeline.
36
+ - Use `[pause for animation]` markers where the script expects a visual beat to play before the narration continues. These are hints for the agent when computing multi-advance timing.
37
+ - Segments with no voiceover content are omitted from the script section.
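
The rules above imply a simple structure that can be parsed mechanically. The following is a sketch only: `parseScriptSection` and the `ScriptSegment` type are hypothetical, and it assumes the script section ends at the next level-2 heading:

```typescript
interface ScriptSegment {
  id: string; // segment id taken from the "### <id>" heading
  text: string; // narration with pause markers removed
  pauses: number; // count of "[pause for animation]" markers
}

function parseScriptSection(planMd: string): ScriptSegment[] {
  const raw: { id: string; body: string[] }[] = [];
  let inScript = false;

  for (const line of planMd.split(/\r?\n/)) {
    // Any level-2 heading either opens the Script section or closes it.
    if (/^##\s/.test(line)) {
      inScript = /^##\s+Script\s*$/.test(line);
      continue;
    }
    if (!inScript) continue;

    const heading = /^###\s+(.+?)\s*$/.exec(line);
    if (heading) {
      raw.push({ id: heading[1], body: [] });
    } else if (raw.length > 0) {
      raw[raw.length - 1].body.push(line);
    }
  }

  const marker = /\[pause for animation\]/g;
  return raw.map(({ id, body }) => {
    const joined = body.join("\n");
    return {
      id,
      text: joined.replace(marker, " ").replace(/\s+/g, " ").trim(),
      pauses: (joined.match(marker) ?? []).length,
    };
  });
}
```

The returned ids can then be compared against the timeline's segment ids to catch mismatches early.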
38
+
39
+ ## Writing a new script
40
+
41
+ When the user wants a script written for them:
42
+
43
+ 1. **Read the video's PLAN.md.** Understand the purpose, audience, segment outline, and any creative direction.
44
+ 2. **Read each segment's code** (or at minimum its `notes` and `voiceover` fields) to understand what visuals are on screen.
45
+ 3. **Draft the script** section-by-section, following the segment outline order.
46
+
47
+ ### Writing guidelines
48
+
49
+ - **Match visuals to narration.** The script should describe or complement what is on screen, not fight it. If a segment shows a data chart, the narration should reference the data.
50
+ - **Pacing: ~150 WPM.** A 30-word section takes about 12 seconds to speak. A 100-word section takes about 40 seconds. Use this to estimate segment durations. Err on the side of shorter scripts -- spoken delivery with pauses is slower than reading pace.
51
+ - **Use natural language.** Avoid jargon unless the audience expects it. Write like you are speaking to someone, not writing a whitepaper.
52
+ - **One idea per segment.** Each segment should have a single narrative focus. If a segment's script covers multiple unrelated ideas, consider splitting the segment.
53
+ - **Mark pauses explicitly.** Use `[pause for animation]` at points where the visual needs time to play without narration. These become multi-advance beats in the Timing.
54
+ - **End with a call to action or wrap-up.** The final segment's script should give the video a sense of closure.
55
+
56
+ ## Integrating a user-provided script
57
+
58
+ When the user provides their own script (pasted or from a document):
59
+
60
+ 1. **Chunk by segment.** Read the timeline's segment outline and divide the script into sections that match each segment's purpose. Use the segment's `notes` and `voiceover` hint fields as alignment cues.
61
+ 2. **Write into PLAN.md** using the subsection format above.
62
+ 3. **Add pause markers** where the script transitions between ideas within a segment, especially where an animation beat is expected.
63
+
64
+ If the script does not divide cleanly by segment:
65
+
66
+ - Propose a mapping and ask the user to confirm.
67
+ - If the script implies a different segment structure, suggest adding or removing segments to match.
68
+
69
+ ## Sanity checks
70
+
71
+ Run these checks before proceeding to the provider script:
72
+
73
+ ### Spelling and grammar
74
+
75
+ - Read the script aloud (mentally) and flag anything that sounds awkward when spoken.
76
+ - Flag common TTS pitfalls: abbreviations that may need spelling out (e.g., write "API" as "A-P-I" or leave it as "API", depending on the desired pronunciation), numbers that might be misread, technical terms with unusual pronunciation.
77
+
78
+ ### Script-video alignment
79
+
80
+ - For each segment, verify the script references content that is actually on screen.
81
+ - Flag cases where:
82
+ - The script mentions a feature or visual that does not appear in the segment.
83
+ - A segment has prominent visuals that the script ignores.
84
+ - The script order does not match the timeline order.
85
+
86
+ When a misalignment is found, present it to the user:
87
+
88
+ > The script for segment "feature-cards" mentions "pricing comparison" but that segment shows collaboration features. Would you like to: (1) update the script to match the video, (2) update the video to match the script, or (3) keep it as-is?
89
+
90
+ ### Duration estimate
91
+
92
+ - Count the words in each segment, sum them, and estimate total duration at ~150 WPM.
93
+ - Flag if any segment's script seems much longer or shorter than expected given the segment's visual complexity.
94
+ - Present the estimate: "The full script is ~320 words, which should take about 2 minutes 8 seconds to narrate at a moderate pace."
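
The estimate is plain arithmetic: seconds = words × 60 / WPM. A minimal sketch, with hypothetical helper names and the ~150 WPM figure from this document as the default:

```typescript
// Count whitespace-separated words. Strip "[pause for animation]" markers
// before calling this, or they will be counted as words.
function countWords(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Estimated narration time in whole seconds. Pauses are not included.
function estimateSeconds(words: number, wpm = 150): number {
  return Math.round((words * 60) / wpm);
}

// Format seconds as "M min S s" for the message shown to the user.
function formatDuration(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes} min ${seconds} s`;
}
```

With these defaults, 320 words gives 128 seconds, which formats as `2 min 8 s` and matches the example message above. Since pause markers add time on top of this, the figure is a lower bound.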
95
+
96
+ ## After the script is confirmed
97
+
98
+ 1. Update each segment's `voiceover` field to match its PLAN.md script section.
99
+ 2. Run `npx videowright script --write` to generate `voiceover_script/script.md`.
100
+ 3. Proceed to [provider_script.md](provider_script.md) for Flow A, or directly to sync for Flow B.
@@ -0,0 +1,56 @@
1
+ # Style Intake
2
+
3
+ ## When this is loaded
4
+
5
+ You are preparing to generate a voiceover and need to understand the user's tone and pacing preferences before creating the provider script.
6
+
7
+ ## Purpose
8
+
9
+ Style intake captures preferences that the agent **directly uses** when writing the provider script: tone, emotional arc, and reference style. Voice selection (which voice to use, gender, accent, speaking rate) is handled separately -- in the ElevenLabs provider walkthrough for API-mode users, or visually in the portal UI for portal-mode users.
10
+
11
+ ## Questions to ask
12
+
13
+ Ask these questions before generating the provider script. Group them into a single message -- do not ask one at a time. Skip any question whose answer is already clear from the user's input.
14
+
15
+ 1. **Tone.** What overall tone should the narration have?
16
+ - Examples: conversational, professional, enthusiastic, calm, authoritative, playful, serious, warm
17
+ - If the user is unsure, default to "conversational and warm" -- it works for most explainer videos.
18
+
19
+ 2. **Emotional arc.** Should the tone shift across the video? For example:
20
+ - Start serious, build to excited
21
+ - Maintain a steady calm throughout
22
+ - Start warm, shift to urgent for the call to action
23
+ - If the user has no preference, a steady tone is fine.
24
+
25
+ 3. **Reference** (optional). Is there a narrator or video style they want to emulate?
26
+ - "Like a TED talk", "like a product launch keynote", "like a podcast host"
27
+
28
+ ## How answers map to provider script
29
+
30
+ The agent targets ElevenLabs v2, which does not have v3-style emotion tags (`[excited]`, `[calm]`, etc.). Instead, tone is conveyed through punctuation, sentence structure, and pacing cues:
31
+
32
+ | Preference | Provider script effect |
33
+ |---|---|
34
+ | Tone: enthusiastic | Exclamation marks, short punchy sentences, emphatic word choice |
35
+ | Tone: calm/serious | Longer sentences, measured pacing, more pauses between phrases |
36
+ | Tone: warm | Natural conversational phrasing -- the default for most v2 voices |
37
+ | Emotional arc | Vary sentence structure and punctuation across sections of the script |
38
+
39
+ See [provider_script.md](provider_script.md) for the full v2 writing toolkit.
40
+
41
+ ## What NOT to ask
42
+
43
+ Voice attributes (gender, accent, age, speaking rate) are **not** part of style intake. They are chosen at voice-selection time:
44
+
45
+ - **API-mode users** pick from a curated voice catalog during the ElevenLabs provider walkthrough (see [providers/elevenlabs.md](providers/elevenlabs.md)).
46
+ - **Portal-mode users** select a voice visually in the ElevenLabs web UI.
47
+
48
+ If the user volunteers voice preferences during style intake, acknowledge them and note that they will be applied during voice selection.
49
+
50
+ ## When to skip style intake
51
+
52
+ - The user has explicitly described the voice style they want in their initial request.
53
+ - The user is iterating on an existing voiceover and the style is already established.
54
+ - The user is using the manual flow (Flow B) -- they already have the audio.
55
+
56
+ In these cases, proceed directly to the next step without asking style questions.