buttercut 0.5.0 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 78b845e8b54d03aee93f00bdbaa96f140d0f10c94910a117352b7401cf30bf63
4
- data.tar.gz: 0eb609100a9e2f367b493d6aef9f45d08e784ad573c4d033733009be40ccc525
3
+ metadata.gz: 57b76c4460b6e40602877e8d9c3a31f0b9748b1447b2dbb5d7189a497a17ad87
4
+ data.tar.gz: 87b7a604c6f807171c463b5e3b49fef95d88796ebd3803b9ab6f495ebdc2e0e5
5
5
  SHA512:
6
- metadata.gz: 33eb34693818323f40900bcea272781391c372fc5bf9adb71f11632da83aa3de5936538150c69a9d5407f5e5a6059342eed23eed5683e6b1fec36d9a55b37d2a
7
- data.tar.gz: f022d15eb198cb32dde550d0120598cf5d8eb7f95145e21bfb70401df1c762bcd4474f4836966601aa46ce232e2bd364a25cc8d1187ad2c1e7edf02164ca1789
6
+ metadata.gz: bbce3378342cc2c11ae5644f215162f523e564206519873ad58f30fc38cbb61363db95a323dfbb0bf9e4df735f7ed7ea49d812bb9f49f826e0c773a7c33df5d4
7
+ data.tar.gz: 3c602e886a0d2597be5e960f0cd5202b8c4143ae8ab66dfa88e04ccc57a982f075ad0a61b489340bde86c13653e469fac472f146977c2b1d174747eb3d9edee9
@@ -27,7 +27,8 @@
27
27
  "Bash(python3:*)",
28
28
  "Bash(gh api:*)",
29
29
  "Bash(gh pr:*)",
30
- "Bash(cp *)"
30
+ "Bash(cp *)",
31
+ "Bash(chmod +x .claude/skills/export-video/export_video.rb)"
31
32
  ],
32
33
  "deny": [],
33
34
  "ask": []
@@ -3,95 +3,28 @@ name: analyze-video
3
3
  description: Adds visual descriptions to transcripts by extracting and analyzing video frames with ffmpeg. Creates visual transcript with periodic visual descriptions of the video clip. Use when all files have audio transcripts present (transcript) but don't yet have visual transcripts created (visual_transcript).
4
4
  ---
5
5
 
6
- # Skill: Analyze Video
6
+ # Skill: Analyze Video (parent brief)
7
7
 
8
- Add visual descriptions to audio transcripts by extracting JPG frames with ffmpeg and analyzing them. **Never read video files directly** - extract frames first.
8
+ Adds visual descriptions to a video's audio transcript by extracting JPG frames with ffmpeg and analyzing them.
9
+
10
+ `SKILL.md` is the parent's dispatch brief. The sub-agent's working prompt lives in `agent_prompt.md` — inline its contents when launching the Task agent. Don't pass `SKILL.md`.
9
11
 
10
12
  ## Prerequisites
11
13
 
12
- Videos must have audio transcripts. Run **transcribe-audio** skill first if needed.
14
+ Each video must already have an audio transcript. Run `transcribe-audio` first if any are missing.
13
15
 
14
- ## Workflow
16
+ ## Parallelism
15
17
 
16
- ### 1. Inputs from the parent
18
+ Launch at most **8 in parallel**. ffmpeg frame extraction is a brief CPU burst at the start; the rest of the runtime is LLM API calls. 8 is a comfortable middle ground that won't saturate older machines.
17
19
 
18
- This skill runs as a sub-agent. Do NOT read `library.yaml` or `settings.yaml` — the parent has that context and passes everything inline in your prompt. Expect these inputs:
20
+ ## Inputs to gather and pass inline
19
21
 
20
22
  - `video_path` — absolute path to the video file
21
23
  - `audio_transcript_path` — absolute path to the prepared audio transcript JSON
22
24
  - `visual_transcript_path` — absolute path to write the visual transcript JSON
23
25
 
24
- ### 2. Copy & Clean Audio Transcript
25
-
26
- Don't read the audio transcript, just copy it and then prepare it by using the prepare_visual_script.rb file. This removes word-level timing data and prettifies the JSON for easier editing:
27
-
28
- ```bash
29
- cp <audio_transcript_path> <visual_transcript_path>
30
- ruby .claude/skills/analyze-video/prepare_visual_script.rb <visual_transcript_path>
31
- ```
32
-
33
- ### 3. Extract Frames (Binary Search)
34
-
35
- Create frame directory: `mkdir -p tmp/frames/[video_name]`
36
-
37
- **Videos ≤30s:** Extract one frame at 2s
38
- **Videos >30s:** Extract start (2s), middle (duration/2), end (duration-2s)
39
-
40
- ```bash
41
- ffmpeg -ss 00:00:02 -i video.mov -vframes 1 -vf "scale=1280:-1" tmp/frames/[video_name]/start.jpg
42
- ```
43
-
44
- **Subdivide when:** Footage start, middle and end have different subjects, setting or angle changes
45
- **Stop when:** The footage no longer seems to be changing or only has minor changes
46
- **Never sample** more frequently than once per 30 seconds
47
-
48
- ### 4. Add Visual Descriptions
49
-
50
- Read the visual video json file that you created earlier.
51
-
52
- **Read the JPG frames** from `tmp/frames/[video_name]/` using Read tool, then **Edit** the file at `<visual_transcript_path>`:
53
-
54
- Do these incrementally. You don't need to create a program or script to do this, just incrementally edit the json whenever you read new frames.
55
-
56
- **Dialogue segments - add `visual` field:**
57
- ```json
58
- {
59
- "start": 2.917,
60
- "end": 7.586,
61
- "text": "Hey, good afternoon everybody.",
62
- "visual": "Man in red shirt speaking to camera in medium shot. Home office with bookshelf. Natural lighting.",
63
- "words": [...]
64
- }
65
- ```
66
-
67
- **B-roll segments - insert new entries:**
68
- ```json
69
- {
70
- "start": 35.474,
71
- "end": 56.162,
72
- "text": "",
73
- "visual": "Green bicycle parked in front of building. Urban street with trees.",
74
- "b_roll": true,
75
- "words": []
76
- }
77
- ```
78
-
79
- **Guidelines:**
80
- - Descriptions should be 3 sentences max.
81
- - First segment: detailed (subject, setting, shot type, lighting, camera style)
82
- - Continuing shots: brief if similar, otherwise can be up to 3 sentences if drastically different.
83
-
84
- ### 5. Cleanup & Return
85
-
86
- ```bash
87
- rm -rf tmp/frames/[video_name]
88
- ```
26
+ After the agent returns, update `library.yaml` with `visual_transcript: <filename>.json`.
89
27
 
90
- Return structured response:
91
- ```
92
- ✓ [video_filename.mov] analyzed successfully
93
- Visual transcript: <visual_transcript_path>
94
- Video path: <video_path>
95
- ```
28
+ ## Next step
96
29
 
97
- **DO NOT update library.yaml** - parent agent handles this to avoid race conditions in parallel execution.
30
+ Once all videos have visual transcripts, dispatch `summarize-video` (Haiku model) to produce summaries.
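Reviewer sketch: one simple way a parent could respect the cap of 8 parallel agents stated in the Parallelism section is to dispatch in waves. The helper name `dispatch_waves` and the constant `MAX_PARALLEL` are hypothetical and not part of buttercut; a rolling pool that refills as agents finish would satisfy the same limit.

```ruby
# Hypothetical helper, not part of buttercut.
MAX_PARALLEL = 8 # cap stated in the parent brief

# Split the pending videos into waves of at most MAX_PARALLEL entries.
# Each wave would be launched together and awaited before the next one.
def dispatch_waves(video_paths)
  video_paths.each_slice(MAX_PARALLEL).to_a
end
```

Waves are the easiest form to reason about, at the cost of idling while the slowest agent in a wave finishes.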
@@ -0,0 +1,84 @@
1
+ # Analyze Video (sub-agent prompt)
2
+
3
+ You are a sub-agent. Add visual descriptions to one video's audio transcript by extracting JPG frames with ffmpeg and analyzing them. **Never read the video file directly** — extract frames first.
4
+
5
+ ## Inputs (passed inline by the parent)
6
+
7
+ - `video_path` — absolute path to the video file
8
+ - `audio_transcript_path` — absolute path to the prepared audio transcript JSON
9
+ - `visual_transcript_path` — absolute path to write the visual transcript JSON
10
+
11
+ Do NOT read `library.yaml` or `settings.yaml`.
12
+
13
+ ## 1. Copy & clean audio transcript
14
+
15
+ Don't read the audio transcript — just copy it, then prepare it via `prepare_visual_script.rb`. This removes word-level timing data and prettifies the JSON for easier editing:
16
+
17
+ ```bash
18
+ cp <audio_transcript_path> <visual_transcript_path>
19
+ ruby .claude/skills/analyze-video/prepare_visual_script.rb <visual_transcript_path>
20
+ ```
21
+
22
+ ## 2. Extract frames (binary search)
23
+
24
+ Create frame directory: `mkdir -p tmp/frames/[video_name]`
25
+
26
+ **Videos ≤30s:** extract one frame at 2s
27
+ **Videos >30s:** extract start (2s), middle (duration/2), end (duration-2s)
28
+
29
+ ```bash
30
+ ffmpeg -ss 00:00:02 -i video.mov -vframes 1 -vf "scale=1280:-1" tmp/frames/[video_name]/start.jpg
31
+ ```
32
+
33
+ **Subdivide when:** start, middle, and end have different subjects, settings, or angle changes
34
+ **Stop when:** the footage no longer seems to be changing or only has minor changes
35
+ **Never sample** more frequently than once per 30 seconds
36
+
37
+ ## 3. Add visual descriptions
38
+
39
+ Read the visual transcript JSON you created in step 1.
40
+
41
+ **Read the JPG frames** from `tmp/frames/[video_name]/` using the Read tool, then **Edit** the file at `<visual_transcript_path>`. Do this incrementally — no script needed; just edit the JSON each time you read new frames.
42
+
43
+ **Dialogue segments — add `visual` field:**
44
+ ```json
45
+ {
46
+ "start": 2.917,
47
+ "end": 7.586,
48
+ "text": "Hey, good afternoon everybody.",
49
+ "visual": "Man in red shirt speaking to camera in medium shot. Home office with bookshelf. Natural lighting.",
50
+ "words": [...]
51
+ }
52
+ ```
53
+
54
+ **B-roll segments — insert new entries:**
55
+ ```json
56
+ {
57
+ "start": 35.474,
58
+ "end": 56.162,
59
+ "text": "",
60
+ "visual": "Green bicycle parked in front of building. Urban street with trees.",
61
+ "b_roll": true,
62
+ "words": []
63
+ }
64
+ ```
65
+
66
+ **Guidelines:**
67
+ - Descriptions: 3 sentences max
68
+ - First segment: detailed (subject, setting, shot type, lighting, camera style)
69
+ - Continuing shots: brief if similar; up to 3 sentences if drastically different
70
+
71
+ ## 4. Cleanup & return
72
+
73
+ ```bash
74
+ rm -rf tmp/frames/[video_name]
75
+ ```
76
+
77
+ Return:
78
+ ```
79
+ ✓ [video_filename.mov] analyzed successfully
80
+ Visual transcript: <visual_transcript_path>
81
+ Video path: <video_path>
82
+ ```
83
+
84
+ **Do NOT update library.yaml** — parent handles this to avoid race conditions in parallel execution.
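Reviewer sketch: the sampling rules in step 2 of this prompt can be written down as two small functions. The names `initial_sample_points` and `subdivide` are hypothetical, and reading "never sample more frequently than once per 30 seconds" as a minimum gap of 30 seconds between neighbouring samples is an assumption.

```ruby
# Hypothetical helpers; names are illustrative, not part of buttercut.
MIN_GAP_SECONDS = 30.0 # assumed reading of the 30-second sampling floor

# First-pass sample points: one frame at 2s for clips of 30s or less,
# otherwise start (2s), middle (duration / 2), and end (duration - 2s).
def initial_sample_points(duration)
  return [2.0] if duration <= 30.0

  [2.0, duration / 2.0, duration - 2.0]
end

# Midpoint between two already-sampled timestamps, or nil when splitting
# would put samples closer together than MIN_GAP_SECONDS.
def subdivide(from, to)
  mid = (from + to) / 2.0
  return nil if (mid - from) < MIN_GAP_SECONDS || (to - mid) < MIN_GAP_SECONDS

  mid
end
```

Whether to call `subdivide` at all remains a judgment about how different the neighbouring frames look; the function only enforces the floor.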
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: backup-library
3
- description: Creates compressed ZIP backups of libraries directory. Backs up library.yaml, transcripts, and roughcuts (not video files). This skill can also be useful when you need to restore a library.
3
+ description: Backs up user libraries and all their contents (external video excluded). This skill can also be useful when you need to restore a library.
4
4
  ---
5
5
 
6
6
  # Skill: Backup Library
@@ -0,0 +1,74 @@
1
+ ---
2
+ name: cut-planner
3
+ description: Plans a cut (roughcut, sequence, or scene) from a library's clip summaries. Reads all clip summaries, then talks with the user and iteratively creates a plan markdown file until both agent and user are happy and understand the plan.
4
+ ---
5
+
6
+ # Skill: Cut Planner
7
+
8
+ ## Overview
9
+
10
+ In the cut-planner skill, the main thread reads clip summaries from a library to understand footage coverage, then asks the user about the footage to confirm its understanding. It confirms who the characters and locations are, then updates the library.yaml's footage_summary and user_context as it learns more about the footage. If it determines summaries are wrong or missing details, it also updates the summary markdowns.
11
+
12
+ After confirming its understanding of the footage, it works with the user to create a narrative plan markdown file.
13
+
14
+ This skill runs in the main thread and does not use a sub-agent.
15
+
16
+ ## Cut Planner Process
17
+
18
+ ### 1. Verify all clips have visual transcripts and summaries
19
+ Read `libraries/[library-name]/library.yaml`. Every clip must have `visual_transcript` and `summary` populated. If either is missing for any clip, stop and tell the user which clips still need processing — don't try to plan from incomplete footage. Then ask if they want to resume processing the library.
20
+
21
+ ### 2. Read summaries
22
+
23
+ #### Sequences
24
+ If the user explicitly says they want something short like a short sequence (60 seconds or less), consider asking them about what they want and then grepping through summaries to find the handful of files they might need.
25
+
26
+ #### Rough Cuts
27
+ If the user wants a full roughcut, read every `libraries/[library-name]/summaries/summary_*.md` file. This will give you full knowledge of the library.
28
+
29
+ ### 3. Confirm the footage knowledge and update incorrect summaries
30
+ Tell the user what you've learned about the footage, then confirm you understand the Five W's of all the footage: Who, What, When, Where, Why.
31
+
32
+ Don't say "Five W's" or label the questions Who, What, When, Where, Why; just talk with them conversationally, like an assistant editor getting a grip on the footage.
33
+
34
+ If they want a full roughcut, spend more time. If they just want a sequence, be brief.
35
+
36
+ Talk with the user until you confirm you understand the footage. Update library.yaml based on the user's responses as you work through questions.
37
+
38
+ Update footage_summary (locations, characters, narrative, dialogue, clips) and user_context (preferences, goals, etc.) as you iteratively learn more about the footage.
39
+
40
+ Updating user_context and footage_summary helps future agents understand the footage and the user.
41
+
42
+ For example, if a summary mentions a generic man or woman but you learn the person is actually the user, replace man/woman with the user's name. Ask the user's name if you don't know it already.
43
+
44
+ ### 4. Ask target length
45
+ If available, use the `AskUserQuestion` tool or similar to ask the user what length of video they want to create. Use your judgement based on the footage — options like short sequence (30–60s), medium cut (5–8 min), or longer roughcut (9+ min) make good starting points. Podcast footage will likely require a longer option.
46
+
47
+ ### 5. If creating a roughcut, propose 2–3 concepts (titles only)
48
+ Give the user 2–3 genuinely distinct narrative concepts. Keep this round short — it's about picking a direction, not approving a full plan. For each concept, write only:
49
+ - **Title** — short, evocative
50
+ - **Concept** — 1–2 sentences explaining the angle, tone, or arc
51
+
52
+ Do **not** include beats, footage suggestions, runtime breakdowns, or format notes yet. Those come in step 6 once a direction is chosen.
53
+
54
+ Make the options genuinely distinct — different angles, tones, or arcs. End with: "Which feels right, or want me to explore something different?"
55
+
56
+ If the user just wants a short sequence, give them information about what the sequence will contain.
57
+
58
+ ### 6. Flesh out the chosen concept
59
+ Once the user picks a direction for a full roughcut, expand it into a full plan and present that for approval. Now include:
60
+ - **Format** — vlog, YouTube Short, long-form, documentary, etc.
61
+ - **Beats** — 3–6 beats, each with editorial intent and a rough share of the runtime ("open with ~3 min of X", "montage of Y", "close on Z")
62
+ - **Footage suggestions per beat** — name a few videos likely to feed each beat, e.g. DJI_123, panasonic_1234. Include rough or specific dialogue if you think it will be helpful.
63
+ - **Approx. duration**
64
+
65
+ Iterate on the fleshed-out plan until the user explicitly signals go.
66
+
67
+ If the user wants a short sequence, be brief, just a few sentences, including dialogue if that makes sense.
68
+
69
+ ### 7. Save the plan
70
+ Copy `templates/plan_template.md` to `libraries/[library-name]/plans/plan_[short-name]_[YYYYMMDD_HHMMSS].md` and fill in every section. The template is the canonical structure — Concept, Format, Target Duration, Beats (with intent / approx. share / footage suggestions), Required Dialogue, Notes for the Build.
71
+
72
+ The plan is direction. The build agent confirms specific clips inside each beat.
73
+
74
+ Tell the user the plan is ready and confirm they want to move forward, then invoke the `roughcut` skill, passing the full plan path (`libraries/[library-name]/plans/plan_[short-name]_[YYYYMMDD_HHMMSS].md`) as a skill argument — `roughcut` hard-stops if it isn't given one.
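Reviewer sketch: the plan naming scheme from step 7 can be generated and parsed with a few lines of Ruby. Both helper names are hypothetical; only the path pattern `plan_[short-name]_[YYYYMMDD_HHMMSS].md` comes from the skill text.

```ruby
# Hypothetical helpers; only the filename pattern comes from the skill.

# Build the plan path for a library and short name at a given time.
def plan_path(library_name, short_name, now = Time.now)
  stamp = now.strftime("%Y%m%d_%H%M%S")
  "libraries/#{library_name}/plans/plan_#{short_name}_#{stamp}.md"
end

# Recover the [short-name] portion from a plan path, or nil if the
# filename does not follow the pattern.
def plan_short_name(path)
  match = File.basename(path).match(/\Aplan_(.+)_\d{8}_\d{6}\.md\z/)
  match && match[1]
end
```

Anchoring the regex on the fixed-width timestamp lets the short name itself contain underscores.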
@@ -1,71 +1,65 @@
1
1
  ---
2
2
  name: roughcut
3
- description: Creates video rough cut yaml file for use with Buttercut gem. Concatenates visual transcripts with file markers, creates a roughcut yaml with clip selections, then exports to XML format. Use this skill when users want a "roughcut", "sequence" or "scene" generated. These are all the same thing, just with different lengths.
3
+ description: Builds a roughcut YAML and exported XML (Final Cut, Premiere, or Resolve) from an approved plan markdown file produced by `cut-planner`. Spins up a sub-agent that reads the library directly, builds the cut iteratively, reviews against format conventions, then returns paths plus conversational editorial notes. If the user asks for a "roughcut", "sequence", or "scene" and no plan exists yet, run `cut-planner` first.
4
4
  ---
5
5
 
6
- # Skill: Create Rough Cut
6
+ # Skill: Roughcut Build
7
7
 
8
- This skill handles the editorial process of creating rough cut timeline scripts from transcribed video footage. It launches a specialized agent that analyzes transcripts, makes editorial decisions, outputs a structured YAML rough cut, and exports it to Final Cut Pro XML format.
8
+ Turns an approved plan into a working roughcut YAML and exported XML. The sub-agent runs async: it commits to a complete cut and returns with notes you can dialogue about.
9
9
 
10
- **Note:** This skill is used for both full-length rough cuts (multiple minutes) and short sequences (30-60 seconds).
10
+ ## 1. Locate the Plan
11
+ A plan path **must** be passed in as a skill argument (the format produced by `cut-planner` step 7: `libraries/[library-name]/plans/plan_[short-name]_[timestamp].md`). If no plan path is passed in, stop immediately and return a message to the parent saying a plan path is required and `cut-planner` should be run first. Do not search for plans, do not pick one, do not proceed without one.
11
12
 
12
- ## Prerequisites Check
13
+ ## 2. Resolve the Editor (Parent Only)
14
+ The sub-agent receives a final editor value:
15
+ 1. If `library.yaml` has `editor` set, use it.
16
+ 2. Otherwise fall back to `libraries/settings.yaml`'s `editor` and write the value back to `library.yaml`.
17
+ 3. If neither has one, ask the user (Final Cut Pro X / Adobe Premiere Pro / DaVinci Resolve), then save the choice to both `library.yaml` and `libraries/settings.yaml`.
13
18
 
14
- Before launching the roughcut agent, verify all transcripts are complete:
15
-
16
- 1. **Check library exists:**
17
- ```bash
18
- ls libraries/[library-name]/library.yaml
19
- ```
20
-
21
- 2. **Verify visual transcripts:**
22
- Read `libraries/[library-name]/library.yaml` and check that every video entry has both:
23
- - `transcript` populated (audio transcript filename)
24
- - `visual_transcript` populated (visual descriptions filename)
25
-
26
- If any visual transcripts are missing:
27
- - Inform user that transcript processing must be completed first
28
- - Ask if they want Claude to finish transcript processing using the `transcribe-audio` and `analyze-video` skills
29
- - Do not proceed with roughcut creation until all transcripts are complete
30
-
31
- ## Launch Roughcut Agent
32
-
33
- Once prerequisites are verified, launch the roughcut creation agent using the Task tool:
19
+ ## 3. Launch Build Agent
34
20
 
35
21
  ```
36
- Task tool with:
22
+ Agent tool with:
37
23
  - subagent_type: "general-purpose"
38
- - description: "Create rough cut from visual transcripts"
39
- - prompt: [See agent prompt template below]
24
+ - description: "Build roughcut YAML and XML from approved plan"
25
+ - prompt: [see template below]
40
26
  ```
41
27
 
42
28
  ### Agent Prompt Template
43
29
 
44
- When launching the agent, provide a detailed prompt with all necessary context:
45
-
46
30
  ```
47
- You are a video editor AI agent creating a rough cut or sequence for the "{library_name}" library.
31
+ You are a video editor AI agent for the "{library_name}" library. The plan below is approved direction — beats, intent, rough length, format. The specific clips are yours to find inside the library. Work iteratively, then review and refine before returning.
32
+
33
+ LIBRARY YAML: libraries/{library_name}/library.yaml
48
34
 
49
- USER REQUEST: {what_user_asked_for}
35
+ APPROVED PLAN:
36
+ {paste full plan markdown}
50
37
 
51
- LIBRARY CONTEXT:
52
- {paste relevant content from library.yaml - footage_summary, user_context, etc.}
38
+ EDITOR: {editor}
53
39
 
54
- YOUR TASK:
55
- 1. Read the roughcut creation instructions from .claude/skills/roughcut/agent_instructions.md
56
- 2. Follow those instructions to create the rough cut
57
- 3. Return the paths to the created YAML and XML files when complete
40
+ TASK:
41
+ 1. Read `.claude/skills/roughcut/agent_prompt.md`
42
+ 2. Follow the steps there in order (the plan is already approved — don't re-propose)
43
+ 3. Return paths to the YAML and XML, plus your editorial notes (alternatives, judgment calls, plan deviations) in conversational prose
44
+ ```
58
45
 
59
- DELIVERABLES:
60
- - Rough cut YAML file at: libraries/{library_name}/roughcuts/{roughcut_name}_{datetime}.yaml
61
- - Exported XML file for user's chosen video editor
62
- - Backup created via backup-library skill
46
+ ## 4. Context Contract
47
+ This sub-agent reads `library.yaml` directly because it needs the full inventory plus `footage_summary` and `user_context`. This is a deliberate carve-out from the parallel-skill contract: `roughcut` runs as a single agent (no race risk), and editorial work needs broader library context than inline-passing comfortably supports.
63
48
 
64
- Begin by reading the agent instructions file.
49
+ ## 5. Copy XML to Desktop (if enabled)
50
+ Check `libraries/settings.yaml` for `save_to_desktop_after_export`:
51
+ 1. If the key is `true`, copy the exported XML to `~/Desktop/` so it's easy to grab and import into the editor.
52
+ 2. If the key is `false`, skip this step.
53
+ 3. If the key is missing, ask the user whether to drop a copy of every export on the Desktop, save their answer (`true`/`false`) to `libraries/settings.yaml`, then act on it.
54
+
55
+ ```bash
56
+ cp [library xml path] ~/Desktop/
65
57
  ```
66
58
 
67
- ## After Agent Completes
59
+ The library copy stays as the canonical artifact; the desktop copy is a convenience drop.
60
+
61
+ ## 6. Backup the Library
62
+ Run the `backup-library` skill. This snapshots the library (yaml, transcripts, summaries, plans, roughcuts) so progress can be restored if needed.
68
63
 
69
- When the agent returns:
70
- 1. Inform the user of the created roughcut file (the xml file, not the yaml file) and its location
71
- 2. Confirm the rough cut is ready to import into their video editor
64
+ ## 7. Report Results
65
+ Surface the agent's return message to the user: the YAML path, the library XML path, the desktop XML path (only if step 5 actually copied one), plus the editorial notes. The notes are the conversational hook for what comes next: small fixes you can do directly in the YAML; larger restructures relaunch this skill with a revised plan.
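Reviewer sketch: the editor fallback in step 2 of this skill is a three-way decision. The function below is hypothetical; `library` and `settings` stand in for the parsed contents of `library.yaml` and `libraries/settings.yaml`, assumed to be hashes with string keys, and the editor strings in any example are illustrative.

```ruby
# Hypothetical sketch of the step 2 fallback; not part of buttercut.
# `library` and `settings` are assumed to be hashes parsed from YAML.
def resolve_editor(library, settings)
  if library["editor"]
    { editor: library["editor"], source: :library }
  elsif settings["editor"]
    # Write the global default back so the library carries its own value.
    library["editor"] = settings["editor"]
    { editor: settings["editor"], source: :settings }
  else
    # Neither file has a value: the parent asks the user, then saves
    # the answer to both files.
    { editor: nil, source: :ask_user }
  end
end
```

Returning the source alongside the value makes it explicit when the caller still has to persist a write-back or prompt the user.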
@@ -0,0 +1,153 @@
1
+ # Roughcut Agent Instructions
2
+
3
+ You are a video editor AI agent. The user approved a narrative plan in their main conversation — direction and structure, not a paper cut. Your job: explore the library, find real moments that fill each beat, build the rough cut iteratively, review and refine against format conventions, then return the cut with your editorial notes.
4
+
5
+ The plan is your compass. The library is your full toolkit.
6
+
7
+ ## Working style
8
+
9
+ This is async work. **You do not ping the user mid-task.** You commit to a complete cut, then return with your reasoning and any alternatives you considered. The parent dialogues with the user from there.
10
+
11
+ Within the task, work iteratively, not in one shot:
12
+ 1. Take one beat from the plan at a time.
13
+ 2. Read transcripts only for the videos you actually need.
14
+ 3. Drop candidate clips into the YAML — close enough, not perfect.
15
+ 4. Move on.
16
+ 5. After every couple of beats, **look back**. Cut earlier clips that get said better later. Tighten dragging beats. Swap in stronger moments.
17
+
18
+ You'll touch the YAML many times. That's the point.
19
+
20
+ The plan suggests footage per beat as a starting point. If a stronger moment lives in a video the plan didn't name, use it — note the deviation in your return notes so the user knows what you considered.
21
+
22
+ ## Workflow
23
+
24
+ ### 1. Read the library
25
+
26
+ Open `libraries/[library-name]/library.yaml`. The library includes:
27
+ - The full video inventory (filenames, paths, audio + visual transcript paths)
28
+ - `footage_summary` — what the project is, the tone, the subjects
29
+ - `user_context` — what you've learned about this user across sessions
30
+
31
+ After reading the library, you can determine what files you'll need to read beat-by-beat.
32
+
33
+ ### 2. Set up the YAML
34
+
35
+ Derive a slug from the plan's filename (the `[short-name]` portion of `plan_[short-name]_[timestamp].md`). Generate a fresh timestamp:
36
+
37
+ ```bash
38
+ date +%Y%m%d_%H%M%S
39
+ ```
40
+
41
+ Reuse the same timestamp string for the YAML and exported XML. Copy the template:
42
+
43
+ ```bash
44
+ cp templates/roughcut_template.yaml "libraries/[library-name]/roughcuts/[slug]_[timestamp].yaml"
45
+ ```
46
+
47
+ Set `description` in the YAML to a one-line summary of what the cut is.
48
+
49
+ ### 3. Build beat by beat
50
+
51
+ **Clip file types** (all under `libraries/[library-name]/`):
52
+ - **Summary** (`summaries/summary_*.md`) — high-level markdown about what happens in a clip. Short and quick to scan. Use to explore adjacent clips or remind yourself what's in a clip without loading the full transcript.
53
+ - **Visual transcript** (`transcripts/visual_*.json`) — segment-level (roughly sentence): `start`/`end` (seconds), `text` (dialogue, `""` if silent), `visual` (shot description, only when visuals change). This is the primary file for picking moments.
54
+ - **Audio transcript** (`transcripts/*.json`, same name without the `visual_` prefix) — same shape as the visual transcript plus a `words` array per segment with per-word `start`/`end`. Reach for it when you need word-level in/out points to trim inside a segment.
55
+
56
+ For each beat in the plan:
57
+ - Open visual transcripts for the videos that feed it.
58
+ - Pick moments that make sense and drop clips into the YAML.
59
+ - If the segment's dialogue should be cut down, grep to find the word-by-word timing in the audio transcript. These files can be large, so it's generally faster and better to grep for the moment, rather than loading the entire file into memory. See the worked example below.
60
+ - After you've completed a scene or beat, consider going back to improve earlier beats if you can make them stronger, more cohesive, or can remove redundancy.
61
+
62
+ **Worked example — trimming inside a segment.** A wordy segment from `transcripts/visual_DJI_123.json`:
63
+
64
+ ```json
65
+ {
66
+ "start": 15.129,
67
+ "end": 17.195,
68
+ "text": "We're also using AI on the back end to try to find issues as well as try to find more test issues."
69
+ }
70
+ ```
71
+
72
+ The line restates itself — "to try to find issues as well as try to find more test issues." End the clip after the first "issues" instead. The audio transcript lives at the same path without the `visual_` prefix (`transcripts/DJI_123.json`). Grep for the word to get its `end` time:
73
+
74
+ ```bash
75
+ grep -B 1 -A 2 '"word": "issues' libraries/[library-name]/transcripts/DJI_123.json
76
+ ```
77
+
78
+ Returns both occurrences — pick the one matching context (the first "issues" ends at 16.272s, the final "issues." at 17.195s):
79
+
80
+ ```json
81
+ { "word": "issues", "start": 16.152, "end": 16.272 },
82
+ { "word": "issues.", "start": 17.054, "end": 17.195 }
83
+ ```
84
+
85
+ Trimmed clip: `in_point: 00:00:15.13`, `out_point: 00:00:16.27`. Drops nearly a second of redundant phrasing.
86
+
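Reviewer sketch: the same lookup the grep performs can be expressed over parsed JSON. The function name is hypothetical, and it assumes the audio transcript has already been loaded into an array of segment hashes, each with a `words` array of `word`/`start`/`end` entries; the top-level layout of the real file is not shown in this diff.

```ruby
# Hypothetical helper; assumes `segments` is an array of segment hashes,
# each holding a "words" array of { "word", "start", "end" } entries.
# Matches by prefix, like grepping for '"word": "issues'.
def find_word_timings(segments, word)
  segments
    .flat_map { |segment| segment.fetch("words", []) }
    .select { |entry| entry["word"].downcase.start_with?(word.downcase) }
end
```

For files as large as the prompt warns about, the grep stays the cheaper option; this form is mainly useful when the transcript is already in memory.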
87
+ **Each clip needs:**
88
+ - `source_file`: filename only (from the video's entry in `library.yaml`)
89
+ - `in_point`: start of the FIRST segment in the clip, `HH:MM:SS.ss`
90
+ - `out_point`: end of the LAST segment in the clip, `HH:MM:SS.ss`
91
+ - `dialogue`: spoken words for the span — concatenate across segments if the clip covers more than one
92
+ - `visual_description`: shot description from the visual transcript
93
+
94
+ Use `start`/`end` from segments directly — preserve sub-second precision (e.g. 2.849s → `00:00:02.85`).
95
+
96
+ **Transcripts can be wrong — fix them in the `dialogue` field in the roughcut YAML.** Transcripts sometimes get technical terms, brand names, and proper nouns wrong, and they struggle with accented speech. If you can clearly tell from context what was actually said, write the corrected version into the clip's `dialogue` field in the roughcut YAML. Do NOT edit the transcript JSON files themselves.
97
+
98
+ #### Examples:
99
+ "RubyVeedums" → "Ruby Meetups"
100
+ "Cloud Code" → "Claude Code"
101
+ "Hot Wide Native" → "HotWire Native"
102
+
103
+ Only correct when you're confident based on context. If a phrase is genuinely ambiguous, leave it or see if another take or cut works better.
104
+
105
+ ### 4. Review pass — format-aware refinement
106
+
107
+ Once a complete first pass exists, do a deliberate review with the format in mind. The plan tells you what kind of cut this is (vlog, YouTube Short, long-form, documentary, etc.). Use that to ask:
108
+
109
+ - **Beat lengths.** Are individual beats the right length for this format? A one-minute static exposition might be right for a documentary but probably not correct for a vlog. Five-second B-roll clips might work for a documentary, but don't make sense for a vlog either. Think about what you're building and what the tone and pacing should feel like. Revise timings when it will improve the pacing.
110
+ - **Dialogue tightness.** Does any clip's dialogue feel too wordy for the format and audience? The audio transcript's word-level timestamps let you trim inside a segment — drop filler, weak openers, or restarts when sharpening helps. **Word-level trimming is a first-class part of this pass, not an edge case.**
111
+ - **Redundancy.** Is a point made twice across different beats? Cut the weaker version.
112
+
113
+ Use editorial judgment based on what you know about the user (`user_context`) and what the format calls for.
114
+
115
+ ### 5. Finalize the YAML
116
+
117
+ - `total_duration`: sum of all clips, `HH:MM:SS.ss`
118
+ - `created_date`: `YYYY-MM-DD HH:MM:SS`
119
+ - Confirm `description` still reflects the cut
120
+
121
+ ### 6. Export
122
+
123
+ Use the `editor` value passed inline in the prompt — the parent already resolved it. Run the matching command:
124
+
125
+ ```bash
126
+ # Final Cut Pro X
127
+ bundle exec ./.claude/skills/roughcut/export_to_fcpxml.rb libraries/[library-name]/roughcuts/[slug]_[timestamp].yaml libraries/[library-name]/roughcuts/[slug]_[timestamp].fcpxml fcpx
128
+
129
+ # Premiere Pro
130
+ bundle exec ./.claude/skills/roughcut/export_to_fcpxml.rb libraries/[library-name]/roughcuts/[slug]_[timestamp].yaml libraries/[library-name]/roughcuts/[slug]_[timestamp].xml premiere
131
+
132
+ # DaVinci Resolve
133
+ bundle exec ./.claude/skills/roughcut/export_to_fcpxml.rb libraries/[library-name]/roughcuts/[slug]_[timestamp].yaml libraries/[library-name]/roughcuts/[slug]_[timestamp].xml resolve
134
+ ```
135
+
136
+ ### 7. Return — with notes
137
+
138
+ Return a conversational message. Include:
139
+ - The path to the YAML
140
+ - The path to the exported XML in the library
141
+ - Your editorial notes — alternatives you considered, judgment calls, plan deviations, pacing flags
142
+
143
+ Example:
144
+
145
+ > YAML: libraries/foo/roughcuts/my_cut_20260501_143022.yaml
146
+ > XML: libraries/foo/roughcuts/my_cut_20260501_143022.fcpxml
147
+ >
148
+ > A couple of alternates I had in mind:
149
+ >
150
+ > - For the ending, the dinosaur-wins angle could work — we'd swap in clips X, Y, Z. Happy to rebuild if that's the direction.
151
+ > - The intro currently runs 35s; if you want it tighter, just the helicopter takeoff (clip K) lands in 8s.
152
+
153
+ The parent reads your notes and discusses them with the user. Small fixes happen at the parent level; bigger restructures may relaunch this skill with a revised plan.
@@ -0,0 +1,31 @@
1
+ ---
2
+ name: summarize-video
3
+ description: Generates a short markdown summary of a video from its visual transcript. Covers overview, key visuals, notable dialogue, and b-roll. Run after analyze-video as the final footage analysis step; summaries become a required field on every video before any roughcut can be created. Always launch this skill using the Haiku model.
4
+ ---
5
+
6
+ # Skill: Summarize Video (parent brief)
7
+
8
+ Generates a short markdown summary from a video's visual transcript. Always launch on the **Haiku model**.
9
+
10
+ `SKILL.md` is the parent's dispatch brief. The sub-agent's working prompt lives in `agent_prompt.md` — inline its contents when launching the Task agent. Don't pass `SKILL.md`.
11
+
12
+ ## Parallelism
13
+
14
+ Launch at most 10 agents in parallel at a time, repeating until all videos are summarized.
15
+
16
+ ## Pre-create the skeleton (parent step, before launching the agent)
17
+
18
+ For each video, the parent runs:
19
+
20
+ ```bash
21
+ ruby .claude/skills/summarize-video/summary_skeleton.rb <visual_transcript_path> <summary_output_path>
22
+ ```
23
+
24
+ This writes a skeleton file with the header (filename + duration) filled in and four `<!-- FILL_X -->` placeholders in the body. The agent fills them via `Edit`. The skeleton + Edit pattern is required: without it, Haiku frequently refuses Write and dumps markdown into its reply instead.
25
+
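The contents of `summary_skeleton.rb` are not shown here. As a rough illustration of the skeleton the agent later edits, the sketch below assumes a simple markdown layout. The section headings, the `Duration:` line, and the `build_skeleton` name are guesses; only the four placeholder markers come from the sub-agent prompt.

```ruby
# Hypothetical sketch; the real summary_skeleton.rb may differ.
# The headings are assumptions, the markers are the ones the sub-agent replaces.
SKELETON_SECTIONS = {
  "Overview" => "<!-- FILL_OVERVIEW -->",
  "Key Visuals" => "<!-- FILL_KEY_VISUALS -->",
  "Notable Dialogue" => "<!-- FILL_DIALOGUE -->",
  "B-Roll" => "<!-- FILL_BROLL -->"
}.freeze

# Returns skeleton text: a filled-in header plus one marker per section.
def build_skeleton(filename, duration)
  header = "# #{filename}\n\nDuration: #{duration}\n"
  body = SKELETON_SECTIONS.map do |heading, marker|
    "\n## #{heading}\n\n#{marker}\n"
  end
  header + body.join
end
```

Each marker appears exactly once, which keeps every later `Edit` replacement unambiguous.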
26
+ ## Inputs to gather and pass inline
27
+
28
+ - `visual_transcript_path` — absolute path to the visual transcript JSON
29
+ - `summary_output_path` — absolute path to the pre-created skeleton file
30
+
31
+ After the agent returns, update `library.yaml` with `summary: <filename>.md`.
@@ -0,0 +1,39 @@
1
+ # Summarize Video (sub-agent prompt)
2
+
3
+ You are a sub-agent on the Haiku model. The parent has pre-created a skeleton summary file at `<summary_output_path>` with the header (filename + duration) filled in and four placeholder markers in the body: `<!-- FILL_OVERVIEW -->`, `<!-- FILL_KEY_VISUALS -->`, `<!-- FILL_DIALOGUE -->`, `<!-- FILL_BROLL -->`.
4
+
5
+ Your job is to replace each placeholder with content using the **Edit** tool. Your text reply is just a one-line confirmation.
6
+
7
+ ## Inputs (passed inline by the parent)
8
+
9
+ - `visual_transcript_path` — absolute path to the visual transcript JSON
10
+ - `summary_output_path` — absolute path to the pre-created skeleton file
11
+
12
+ ## Action 1 — Bash: extract the script
13
+
14
+ ```bash
15
+ ruby .claude/skills/summarize-video/visual_script_extractor.rb <visual_transcript_path>
16
+ ```
17
+
18
+ The stdout is your input data: a header followed by interleaved `[VISUAL]` descriptions and timestamped dialogue.
19
+
20
+ ## Action 2 — Read the skeleton
21
+
22
+ Read `<summary_output_path>`. The Edit tool requires this before editing.
23
+
24
+ ## Action 3 — Edit each placeholder
25
+
26
+ Use the **Edit** tool four times to replace each `<!-- FILL_X -->` marker with the corresponding content:
27
+
28
+ - `<!-- FILL_OVERVIEW -->` → 2-3 sentences describing the narrative arc. Be specific; avoid vague endings like "the clip ends with..." or "discusses something."
29
+ - `<!-- FILL_KEY_VISUALS -->` → 3-6 bullets covering locations, distinctive shots, visual changes.
30
+ - `<!-- FILL_DIALOGUE -->` → 0–3 quotes formatted as `> [MM:SS] "Quote"`. For clips under 30 seconds, often 0 or 1 is enough — write `None` if nothing stands out. Skip filler ("um", "you know", "I have to be honest"). Use the `[MM:SS]` shown next to each line in the script.
31
+ - `<!-- FILL_BROLL -->` → cutaway descriptions distinct from the main subject. For single-shot clips, write `None`. Do not speculate about how the footage could be used as b-roll elsewhere.
32
+
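The dialogue placeholder expects either `None` or lines shaped like `> [MM:SS] "Quote"`. A minimal Ruby check for that shape, assuming straight double quotes and two-digit minutes and seconds; `valid_dialogue_line?` is a hypothetical helper, not part of the package:

```ruby
# Hypothetical validator for one line of the dialogue section.
# Accepts the literal "None" or a blockquote line with an [MM:SS] stamp.
QUOTE_LINE = /\A> \[\d{2}:\d{2}\] ".+"\z/

def valid_dialogue_line?(line)
  line == "None" || line.match?(QUOTE_LINE)
end
```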
33
+ ## Action 4 — Reply with one line
34
+
35
+ After the four Edits succeed, your text reply must be exactly:
36
+
37
+ `✓ <video_filename> summarized`
38
+
39
+ Nothing else. The file is the deliverable.
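Because the file is the deliverable, a parent may want to confirm that no marker survived the four Edits. A minimal Ruby sketch of such a check, assuming markers always follow the `<!-- FILL_X -->` shape described above; `unfilled_placeholders` is a hypothetical helper, not part of the package:

```ruby
# Hypothetical check: lists any FILL markers still present in a summary.
FILL_MARKER = /<!-- FILL_[A-Z_]+ -->/

def unfilled_placeholders(summary_text)
  summary_text.scan(FILL_MARKER)
end
```

An empty result means every placeholder was replaced; any returned marker names the section that still needs an Edit.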