vidpipe 1.2.3 → 1.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,294 +1,384 @@
1
- [![CI](https://github.com/htekdev/vidpipe/actions/workflows/ci.yml/badge.svg)](https://github.com/htekdev/vidpipe/actions/workflows/ci.yml)
2
- [![npm version](https://img.shields.io/npm/v/vidpipe)](https://www.npmjs.com/package/vidpipe)
3
- [![Node.js 20+](https://img.shields.io/badge/node-20%2B-brightgreen)](https://nodejs.org/)
4
- [![License: ISC](https://img.shields.io/badge/license-ISC-blue)](./LICENSE)
5
-
6
- # 🎬 VidPipe
7
-
8
- **Drop a video. Get transcripts, summaries, short clips, captions, blog posts, and social media posts — automatically.**
9
-
10
- An AI-powered CLI pipeline that watches for new video recordings and transforms them into rich, structured content using [GitHub Copilot SDK](https://github.com/github/copilot-sdk) agents and OpenAI Whisper.
11
-
12
- ```bash
13
- npm install -g vidpipe
14
- ```
15
-
16
- ---
17
-
18
- ## ✨ Features
19
-
20
- - 🎬 **14-Stage Automated Pipeline** — Drop a video and walk away; everything runs end-to-end
21
- - 🎙️ **Whisper Transcription** — Word-level timestamps via OpenAI Whisper API
22
- - 🔇 **AI-Driven Silence Removal** — Conservative, context-aware dead-air detection (capped at 20% removal)
23
- - 📐 **Smart Split-Screen Layouts** — Webcam + screen content for 3 aspect ratios: portrait (9:16), square (1:1), and feed (4:5)
24
- - 🔍 **Edge-Based Webcam Detection** — Detects webcam overlay position via skin-tone analysis and inter-frame edge refinement (no hardcoded margins)
25
- - 🎯 **Face-Aware AR-Matched Cropping** — Webcam region is aspect-ratio-matched and center-cropped to fill each layout with no black bars
26
- - 💬 **Karaoke Captions** — Opus Clips-style word-by-word highlighting with green active word on portrait, yellow on landscape
27
- - 🪝 **Hook Overlays** — Animated title text burned into portrait short clips
28
- - ✂️ **Short Clips** — AI identifies the best 15–60s moments, supports composite (multi-segment) shorts
29
- - 🎞️ **Medium Clips** — 1–3 min standalone segments for deeper content with crossfade transitions
30
- - 📑 **Chapter Detection** — AI-identified topic boundaries in 4 formats (JSON, Markdown, FFmetadata, YouTube timestamps)
31
- - 📱 **Social Media Posts** — Platform-tailored content for TikTok, YouTube, Instagram, LinkedIn, and X
32
- - 📰 **Dev.to Blog Post** — Long-form technical blog post with frontmatter and web-sourced links
33
- - 🔗 **Web Search Integration** — Finds relevant links for social posts and blog content via Exa
34
- 🔄 **Git Automation** — Auto-commits and pushes all generated content after each video
35
- - 🎨 **Brand Voice** — Customize AI tone, vocabulary, hashtags, and content style via `brand.json`
36
- - 👁️ **Watch Mode** — Monitors a folder and processes new `.mp4` files on arrival
37
- - 🧠 **Agent Architecture** — Powered by GitHub Copilot SDK with tool-calling agents
38
-
39
- ---
40
-
41
- ## 🚀 Quick Start
42
-
43
- ```bash
44
- # Install globally
45
- npm install -g vidpipe
46
-
47
- # Set up your environment
48
- # Unix/Mac
49
- cp .env.example .env
50
- # Windows (PowerShell)
51
- Copy-Item .env.example .env
52
-
53
- # Then edit .env and add your OpenAI API key (REQUIRED):
54
- # OPENAI_API_KEY=sk-your-key-here
55
-
56
- # Verify all prerequisites are met
57
- vidpipe --doctor
58
-
59
- # Process a single video
60
- vidpipe /path/to/video.mp4
61
-
62
- # Watch a folder for new recordings
63
- vidpipe --watch-dir ~/Videos/Recordings
64
-
65
- # Full example with options
66
- vidpipe \
67
- --watch-dir ~/Videos/Recordings \
68
- --output-dir ~/Content/processed \
69
- --openai-key sk-... \
70
- --brand ./brand.json \
71
- --verbose
72
- ```
73
-
74
- > **Prerequisites:**
75
- > - **Node.js 20+**
76
- > - **FFmpeg 6.0+** — Auto-bundled on common platforms (Windows x64, macOS, Linux x64) via [`ffmpeg-static`](https://www.npmjs.com/package/ffmpeg-static). On other architectures, install system FFmpeg (see [Troubleshooting](#troubleshooting)). Override with `FFMPEG_PATH` env var if you need a specific build.
77
- > - **OpenAI API key** (**required**) — Get one at [platform.openai.com/api-keys](https://platform.openai.com/api-keys). Needed for Whisper transcription and all AI features.
78
- > - **GitHub Copilot subscription** — Required for AI agent features (shorts generation, social media posts, summaries, blog posts). See [GitHub Copilot](https://github.com/features/copilot).
79
- >
80
- > See [Getting Started](./docs/getting-started.md) for full setup instructions.
81
-
82
- ---
83
-
84
- ## 🎮 CLI Usage
85
-
86
- ```
87
- vidpipe [options] [video-path]
88
- ```
89
-
90
- | Option | Description |
91
- |--------|-------------|
92
- | `--doctor` | Check that all prerequisites (FFmpeg, API keys, etc.) are installed and configured |
93
- | `[video-path]` | Process a specific video file (implies `--once`) |
94
- | `--watch-dir <path>` | Folder to watch for new recordings |
95
- | `--output-dir <path>` | Output directory (default: `./recordings`) |
96
- | `--openai-key <key>` | OpenAI API key |
97
- | `--exa-key <key>` | Exa AI key for web search in social posts |
98
- | `--brand <path>` | Path to `brand.json` (default: `./brand.json`) |
99
- | `--once` | Process next video and exit |
100
- | `--no-silence-removal` | Skip silence removal |
101
- | `--no-shorts` | Skip short clip extraction |
102
- | `--no-medium-clips` | Skip medium clip generation |
103
- | `--no-social` | Skip social media posts |
104
- | `--no-captions` | Skip caption generation/burning |
105
- | `--no-git` | Skip git commit/push |
106
- | `-v, --verbose` | Debug-level logging |
107
-
108
- ---
109
-
110
- ## 📁 Output Structure
111
-
112
- ```
113
- recordings/
114
- └── my-awesome-demo/
115
- ├── my-awesome-demo.mp4 # Original video
116
- ├── my-awesome-demo-edited.mp4 # Silence-removed
117
- ├── my-awesome-demo-captioned.mp4 # With burned-in captions
118
- ├── transcript.json # Word-level transcript
119
- ├── transcript-edited.json # Timestamps adjusted for silence removal
120
- ├── README.md # AI-generated summary with screenshots
121
- ├── captions/
122
- │ ├── captions.srt # SubRip subtitles
123
- │ ├── captions.vtt # WebVTT subtitles
124
- │ └── captions.ass # Advanced SSA (karaoke-style)
125
- ├── shorts/
126
- │ ├── catchy-title.mp4 # Landscape base clip
127
- │ ├── catchy-title-captioned.mp4 # Landscape + burned captions
128
- │ ├── catchy-title-portrait.mp4 # 9:16 split-screen
129
- │ ├── catchy-title-portrait-captioned.mp4 # Portrait + captions + hook overlay
130
- │ ├── catchy-title-feed.mp4 # 4:5 split-screen
131
- │ ├── catchy-title-square.mp4 # 1:1 split-screen
132
- │ ├── catchy-title.md # Clip metadata
133
- │ └── catchy-title/
134
- │ └── posts/ # Per-short social posts (5 platforms)
135
- ├── medium-clips/
136
- │ ├── deep-dive-topic.mp4 # Landscape base clip
137
- │ ├── deep-dive-topic-captioned.mp4 # With burned captions
138
- │ ├── deep-dive-topic.md # Clip metadata
139
- │ └── deep-dive-topic/
140
- │ └── posts/ # Per-clip social posts (5 platforms)
141
- ├── chapters/
142
- │ ├── chapters.json # Structured chapter data
143
- │ ├── chapters.md # Markdown table
144
- │ ├── chapters.ffmetadata # FFmpeg metadata format
145
- │ └── chapters-youtube.txt # YouTube description timestamps
146
- └── social-posts/
147
- ├── tiktok.md # Full-video social posts
148
- ├── youtube.md
149
- ├── instagram.md
150
- ├── linkedin.md
151
- ├── x.md
152
- └── devto.md # Dev.to blog post
153
- ```
154
-
155
- ---
156
-
157
- ## 🔄 Pipeline
158
-
159
- ```
160
- Ingest → Transcribe → Silence Removal → Captions → Caption Burn → Shorts → Medium Clips → Chapters → Summary → Social Media → Short Posts → Medium Clip Posts → Blog → Git Push
161
- ```
162
-
163
- | # | Stage | Description |
164
- |---|-------|-------------|
165
- | 1 | **Ingestion** | Copies video, extracts metadata with FFprobe |
166
- | 2 | **Transcription** | Extracts audio → OpenAI Whisper for word-level transcription |
167
- | 3 | **Silence Removal** | AI detects dead-air segments; context-aware removals capped at 20% |
168
- | 4 | **Captions** | Generates `.srt`, `.vtt`, and `.ass` subtitle files with karaoke word highlighting |
169
- | 5 | **Caption Burn** | Burns ASS captions into video (single-pass encode when silence was also removed) |
170
- | 6 | **Shorts** | AI identifies best 15–60s moments; extracts single and composite clips with 6 variants per short |
171
- | 7 | **Medium Clips** | AI identifies 1–3 min standalone segments with crossfade transitions |
172
- | 8 | **Chapters** | AI detects topic boundaries; outputs JSON, Markdown, FFmetadata, and YouTube timestamps |
173
- | 9 | **Summary** | AI writes a Markdown README with captured screenshots |
174
- | 10 | **Social Media** | Platform-tailored posts for TikTok, YouTube, Instagram, LinkedIn, and X |
175
- | 11 | **Short Posts** | Per-short social media posts for all 5 platforms |
176
- | 12 | **Medium Clip Posts** | Per-medium-clip social media posts for all 5 platforms |
177
- | 13 | **Blog** | Dev.to blog post with frontmatter, web-sourced links via Exa |
178
- | 14 | **Git Push** | Auto-commits and pushes to `origin main` |
179
-
180
- Each stage can be independently skipped with `--no-*` flags. A stage failure does not abort the pipeline — subsequent stages proceed with whatever data is available.
181
-
182
- ---
183
-
184
- ## 🤖 LLM Providers
185
-
186
- VidPipe supports multiple LLM providers:
187
-
188
- | Provider | Env Var | Default Model | Notes |
189
- |----------|---------|---------------|-------|
190
- | `copilot` (default) | — | Claude Opus 4.6 | Uses GitHub Copilot auth |
191
- | `openai` | `OPENAI_API_KEY` | gpt-4o | Direct OpenAI API |
192
- | `claude` | `ANTHROPIC_API_KEY` | claude-opus-4.6 | Direct Anthropic API |
193
-
194
- Set `LLM_PROVIDER` in your `.env` or pass via CLI. Override model with `LLM_MODEL`.
195
-
196
- The pipeline tracks token usage and estimated cost across all providers, displaying a summary at the end of each run.
197
-
198
- ---
199
-
200
- ## ⚙️ Configuration
201
-
202
- Configuration is loaded from CLI flags → environment variables → `.env` file → defaults.
203
-
204
- ```env
205
- # .env
206
- OPENAI_API_KEY=sk-your-key-here
207
- WATCH_FOLDER=/path/to/recordings
208
- OUTPUT_DIR=/path/to/output
209
- # EXA_API_KEY=your-exa-key # Optional: enables web search in social/blog posts
210
- # BRAND_PATH=./brand.json # Optional: path to brand voice config
211
- # FFMPEG_PATH=/usr/local/bin/ffmpeg
212
- # FFPROBE_PATH=/usr/local/bin/ffprobe
213
- ```
214
-
215
- ---
216
-
217
- ## 📚 Documentation
218
-
219
- | Guide | Description |
220
- |-------|-------------|
221
- | [Getting Started](./docs/getting-started.md) | Prerequisites, installation, and first run |
222
- | [Configuration](./docs/configuration.md) | All CLI flags, env vars, skip options, and examples |
223
- | [FFmpeg Setup](./docs/ffmpeg-setup.md) | Platform-specific install (Windows, macOS, Linux, ARM64) |
224
- | [Brand Customization](./docs/brand-customization.md) | Customize AI voice, vocabulary, hashtags, and content style |
225
-
226
- ---
227
-
228
- ## 🏗️ Architecture
229
-
230
- Agent-based architecture built on the [GitHub Copilot SDK](https://github.com/github/copilot-sdk):
231
-
232
- ```
233
- BaseAgent (abstract)
234
- ├── SilenceRemovalAgent → detect_silence, decide_removals
235
- ├── SummaryAgent → capture_frame, write_summary
236
- ├── ShortsAgent → plan_shorts
237
- ├── MediumVideoAgent → plan_medium_clips
238
- ├── ChapterAgent → generate_chapters
239
- ├── SocialMediaAgent → search_links, create_posts
240
- └── BlogAgent           → search_web, write_blog
241
- ```
242
-
243
- Each agent communicates with the LLM through structured tool calls, ensuring reliable, parseable outputs.
244
-
245
- ---
246
-
247
- ## 🛠️ Tech Stack
248
-
249
- | Technology | Purpose |
250
- |------------|---------|
251
- | [TypeScript](https://www.typescriptlang.org/) | Language (ES2022, ESM) |
252
- | [GitHub Copilot SDK](https://github.com/github/copilot-sdk) | AI agent framework |
253
- | [OpenAI Whisper](https://platform.openai.com/docs/guides/speech-to-text) | Speech-to-text |
254
- | [FFmpeg](https://ffmpeg.org/) | Video/audio processing |
255
- | [Sharp](https://sharp.pixelplumbing.com/) | Image analysis (webcam detection) |
256
- | [Commander.js](https://github.com/tj/commander.js) | CLI framework |
257
- | [Chokidar](https://github.com/paulmillr/chokidar) | File system watching |
258
- | [Winston](https://github.com/winstonjs/winston) | Logging |
259
- | [Exa AI](https://exa.ai/) | Web search for social posts and blog |
260
-
261
- ---
262
-
263
- ## 🗺️ Roadmap
264
-
265
- - [ ] **Automated social posting** — Publish directly to platforms via their APIs
266
- - [ ] **Multi-language support** — Transcription and summaries in multiple languages
267
- - [ ] **Custom templates** — User-defined Markdown & social post templates
268
- - [ ] **Web dashboard** — Browser UI for reviewing and editing outputs
269
- - [ ] **Batch processing** — Process an entire folder of existing videos
270
- - [ ] **Custom short criteria** — Configure what makes a "good" short for your content
271
- - [ ] **Thumbnail generation** — Auto-generate branded thumbnails for shorts
272
-
273
- ---
274
-
275
- ## 🔧 Troubleshooting
276
-
277
- ### `No binary found for architecture` during install
278
-
279
- `ffmpeg-static` (an optional dependency) bundles FFmpeg for common platforms. On unsupported architectures, it skips gracefully and vidpipe falls back to your system FFmpeg.
280
-
281
- **Fix:** Install FFmpeg on your system:
282
- - **Windows:** `winget install Gyan.FFmpeg`
283
- - **macOS:** `brew install ffmpeg`
284
- - **Linux:** `sudo apt install ffmpeg` (Debian/Ubuntu) or `sudo dnf install ffmpeg` (Fedora)
285
-
286
- You can also point to a custom binary: `export FFMPEG_PATH=/path/to/ffmpeg`
287
-
288
- Run `vidpipe --doctor` to verify your setup.
289
-
290
- ---
291
-
292
- ## 📄 License
293
-
294
- ISC © [htekdev](https://github.com/htekdev)
1
+ <div align="center">
2
+
3
+ ```
4
+ ██╗ ██╗██╗██████╗ ██████╗ ██╗██████╗ ███████╗
5
+ ██║ ██║██║██╔══██╗██╔══██╗██║██╔══██╗██╔════╝
6
+ ██║ ██║██║██║ ██║██████╔╝██║██████╔╝█████╗
7
+ ╚██╗ ██╔╝██║██║ ██║██╔═══╝ ██║██╔═══╝ ██╔══╝
8
+ ╚████╔╝ ██║██████╔╝██║ ██║██║ ███████╗
9
+ ╚═══╝ ╚═╝╚═════╝ ╚═╝ ╚═╝╚═╝ ╚══════╝
10
+ ```
11
+
12
+ **Drop a video. Get transcripts, summaries, short clips, captions, blog posts, and social media posts — automatically.**
13
+
14
+ An AI-powered CLI pipeline that watches for new video recordings and transforms them into rich, structured content using [GitHub Copilot SDK](https://github.com/github/copilot-sdk) agents and OpenAI Whisper.
15
+
16
+ [![CI](https://github.com/htekdev/vidpipe/actions/workflows/ci.yml/badge.svg)](https://github.com/htekdev/vidpipe/actions/workflows/ci.yml)
17
+ [![npm version](https://img.shields.io/npm/v/vidpipe)](https://www.npmjs.com/package/vidpipe)
18
+ [![Node.js 20+](https://img.shields.io/badge/node-20%2B-brightgreen)](https://nodejs.org/)
19
+ [![License: ISC](https://img.shields.io/badge/license-ISC-blue)](./LICENSE)
20
+ [![Docs](https://img.shields.io/badge/docs-vidpipe-a78bfa)](https://htekdev.github.io/vidpipe/)
21
+ [![Last Updated](https://img.shields.io/badge/last_updated-February_2026-informational)](.)
22
+
23
+ </div>
24
+
25
+ ```bash
26
+ npm install -g vidpipe
27
+ ```
28
+
29
+ ---
30
+
31
+ ## Features
32
+
33
+ <p align="center">
34
+ <img src="assets/features-infographic.png" alt="VidPipe Features: Input → AI Processing → Outputs" width="900" />
35
+ </p>
36
+
37
+ <br />
38
+
39
+ <table>
40
+ <tr>
41
+ <td>🎙️ <b>Whisper Transcription</b> — Word-level timestamps</td>
42
+ <td>📐 <b>Split-Screen Layouts</b> — Portrait, square, and feed</td>
43
+ </tr>
44
+ <tr>
45
+ <td>🔇 <b>AI Silence Removal</b> — Context-aware, capped at 20%</td>
46
+ <td>💬 <b>Karaoke Captions</b> — Word-by-word highlighting</td>
47
+ </tr>
48
+ <tr>
49
+ <td>✂️ <b>Short Clips</b> — Best 15–60s moments, multi-segment</td>
50
+ <td>🎞️ <b>Medium Clips</b> — 1–3 min with crossfade transitions</td>
51
+ </tr>
52
+ <tr>
53
+ <td>📑 <b>Chapter Detection</b> — JSON, Markdown, YouTube, FFmeta</td>
54
+ <td>📱 <b>Social Posts</b> — TikTok, YouTube, Instagram, LinkedIn, X</td>
55
+ </tr>
56
+ <tr>
57
+ <td>📰 <b>Blog Post</b> — Dev.to style with web-sourced links</td>
58
+ <td>🎨 <b>Brand Voice</b> — Custom tone, hashtags via brand.json</td>
59
+ </tr>
60
+ <tr>
61
+ <td>🔍 <b>Face Detection</b> — ONNX-based webcam cropping</td>
62
+ <td>🚀 <b>Auto-Publish</b> — Scheduled posting to TikTok, YouTube, Instagram, LinkedIn, X</td>
63
+ </tr>
64
+ </table>
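The AR-matched webcam cropping above is plain geometry: keep the region's center and trim whichever dimension overshoots the target ratio, so no black bars are needed. A minimal sketch (the function name and rounding are illustrative, not vidpipe's actual code):

```python
def center_crop_to_ar(w, h, target_ar):
    """Center-crop a w x h region to target_ar (width / height), no letterboxing."""
    if w / h > target_ar:               # region too wide: trim width
        new_w = round(h * target_ar)
        return (w - new_w) // 2, 0, new_w, h
    new_h = round(w / target_ar)        # region too tall (or exact): trim height
    return 0, (h - new_h) // 2, w, new_h

# A 16:9 webcam region cropped for the 1:1 square layout:
x, y, cw, ch = center_crop_to_ar(1920, 1080, 1.0)   # → (420, 0, 1080, 1080)
```

The same helper covers portrait (9/16), square (1.0), and feed (4/5) targets; only `target_ar` changes.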
65
+
66
+ ---
67
+
68
+ ## 🚀 Quick Start
69
+
70
+ ```bash
71
+ # Install globally
72
+ npm install -g vidpipe
73
+
74
+ # Set up your environment
75
+ # Unix/Mac
76
+ cp .env.example .env
77
+ # Windows (PowerShell)
78
+ Copy-Item .env.example .env
79
+
80
+ # Then edit .env and add your OpenAI API key (REQUIRED):
81
+ # OPENAI_API_KEY=sk-your-key-here
82
+
83
+ # Verify all prerequisites are met
84
+ vidpipe --doctor
85
+
86
+ # Process a single video
87
+ vidpipe /path/to/video.mp4
88
+
89
+ # Watch a folder for new recordings
90
+ vidpipe --watch-dir ~/Videos/Recordings
91
+
92
+ # Full example with options
93
+ vidpipe \
94
+ --watch-dir ~/Videos/Recordings \
95
+ --output-dir ~/Content/processed \
96
+ --openai-key sk-... \
97
+ --brand ./brand.json \
98
+ --verbose
99
+ ```
100
+
101
+ > **Prerequisites:**
102
+ > - **Node.js 20+**
103
+ > - **FFmpeg 6.0+** — Auto-bundled on common platforms (Windows x64, macOS, Linux x64) via [`ffmpeg-static`](https://www.npmjs.com/package/ffmpeg-static). On other architectures, install system FFmpeg (see [Troubleshooting](#troubleshooting)). Override with `FFMPEG_PATH` env var if you need a specific build.
104
+ > - **OpenAI API key** (**required**) — Get one at [platform.openai.com/api-keys](https://platform.openai.com/api-keys). Needed for Whisper transcription and all AI features.
105
+ > - **GitHub Copilot subscription** — Required for AI agent features (shorts generation, social media posts, summaries, blog posts). See [GitHub Copilot](https://github.com/features/copilot).
106
+ >
107
+ > See [Getting Started](./docs/getting-started.md) for full setup instructions.
108
+
109
+ ---
110
+
111
+ ## 🎮 CLI Usage
112
+
113
+ ```
114
+ vidpipe [options] [video-path]
115
+ vidpipe init # Interactive setup wizard
116
+ vidpipe review # Open post review web app
117
+ vidpipe schedule # View posting schedule
118
+ ```
119
+
120
+ | Option | Description |
121
+ |--------|-------------|
122
+ | `--doctor` | Check that all prerequisites (FFmpeg, API keys, etc.) are installed and configured |
123
+ | `[video-path]` | Process a specific video file (implies `--once`) |
124
+ | `--watch-dir <path>` | Folder to watch for new recordings |
125
+ | `--output-dir <path>` | Output directory (default: `./recordings`) |
126
+ | `--openai-key <key>` | OpenAI API key |
127
+ | `--exa-key <key>` | Exa AI key for web search in social posts |
128
+ | `--brand <path>` | Path to `brand.json` (default: `./brand.json`) |
129
+ | `--once` | Process next video and exit |
130
+ | `--no-silence-removal` | Skip silence removal |
131
+ | `--no-shorts` | Skip short clip extraction |
132
+ | `--no-medium-clips` | Skip medium clip generation |
133
+ | `--no-social` | Skip social media posts |
134
+ | `--no-social-publish` | Skip social media queue-build stage |
135
+ | `--late-api-key <key>` | Override Late API key |
136
+ | `--no-captions` | Skip caption generation/burning |
137
+ | `--no-git` | Skip git commit/push |
138
+ | `-v, --verbose` | Debug-level logging |
139
+
140
+ ---
141
+
142
+ ## 📁 Output Structure
143
+
144
+ ```
145
+ recordings/
146
+ └── my-awesome-demo/
147
+ ├── my-awesome-demo.mp4 # Original video
148
+ ├── my-awesome-demo-edited.mp4 # Silence-removed
149
+ ├── my-awesome-demo-captioned.mp4 # With burned-in captions
150
+ ├── transcript.json # Word-level transcript
151
+ ├── transcript-edited.json # Timestamps adjusted for silence removal
152
+ ├── README.md # AI-generated summary with screenshots
153
+ ├── captions/
154
+ │ ├── captions.srt # SubRip subtitles
155
+ │ ├── captions.vtt # WebVTT subtitles
156
+ │ └── captions.ass # Advanced SSA (karaoke-style)
157
+ ├── shorts/
158
+ │ ├── catchy-title.mp4 # Landscape base clip
159
+ │ ├── catchy-title-captioned.mp4 # Landscape + burned captions
160
+ │ ├── catchy-title-portrait.mp4 # 9:16 split-screen
161
+ │ ├── catchy-title-portrait-captioned.mp4 # Portrait + captions + hook overlay
162
+ │ ├── catchy-title-feed.mp4 # 4:5 split-screen
163
+ │ ├── catchy-title-square.mp4 # 1:1 split-screen
164
+ │ ├── catchy-title.md # Clip metadata
165
+ │ └── catchy-title/
166
+ │ └── posts/ # Per-short social posts (5 platforms)
167
+ ├── medium-clips/
168
+ │ ├── deep-dive-topic.mp4 # Landscape base clip
169
+ │ ├── deep-dive-topic-captioned.mp4 # With burned captions
170
+ │ ├── deep-dive-topic.md # Clip metadata
171
+ │ └── deep-dive-topic/
172
+ │ └── posts/ # Per-clip social posts (5 platforms)
173
+ ├── chapters/
174
+ │ ├── chapters.json # Structured chapter data
175
+ │ ├── chapters.md # Markdown table
176
+ │ ├── chapters.ffmetadata # FFmpeg metadata format
177
+ │ └── chapters-youtube.txt # YouTube description timestamps
178
+ └── social-posts/
179
+ ├── tiktok.md # Full-video social posts
180
+ ├── youtube.md
181
+ ├── instagram.md
182
+ ├── linkedin.md
183
+ ├── x.md
184
+ └── devto.md # Dev.to blog post
185
+ ```
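The `transcript-edited.json` adjustment is worth spelling out: each word timestamp shifts left by the total silence cut before it. A sketch, assuming kept words never fall inside a removed span (the helper name is hypothetical):

```python
def to_edited_timeline(t, removals):
    """Map a timestamp from the original video onto the silence-removed timeline.

    removals: non-overlapping (start, end) spans, in seconds, that were cut.
    """
    cut_before = sum(end - start for start, end in removals if end <= t)
    return t - cut_before

# With 5s and 2s of silence removed earlier in the video,
# a word spoken at 40.0s lands at 33.0s in the edited cut.
to_edited_timeline(40.0, [(10.0, 15.0), (30.0, 32.0)])   # → 33.0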
186
+
187
+ ---
188
+
189
+ ## 📺 Review App
190
+
191
+ VidPipe includes a built-in web app for reviewing, editing, and scheduling social media posts before publishing.
192
+
193
+ <div align="center">
194
+ <img src="assets/review-ui.png" alt="VidPipe Review UI" width="800" />
195
+ <br />
196
+ <em>Review and approve posts across YouTube, TikTok, Instagram, LinkedIn, and X/Twitter</em>
197
+ </div>
198
+
199
+ ```bash
200
+ # Launch the review app
201
+ vidpipe review
202
+ ```
203
+
204
+ - **Platform tabs** — Filter posts by platform (YouTube, TikTok, Instagram, LinkedIn, X)
205
+ - **Video preview** — See the video thumbnail and content before approving
206
+ - **Keyboard shortcuts** — Arrow keys to navigate, Enter to approve, Backspace to reject
207
+ - **Smart scheduling** — Posts are queued with optimal timing per platform
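The "smart scheduling" behavior can be pictured as a per-platform rate limit: a post goes out immediately unless that platform already posted within the minimum gap. This is a hypothetical sketch; the real slot logic lives in `schedule.json` and the Late API.

```python
from datetime import datetime, timedelta

def next_slot(platform, last_posted, now, min_gap=timedelta(hours=24)):
    """Earliest allowed publish time for platform, at most one post per min_gap."""
    earliest = last_posted.get(platform, datetime.min) + min_gap
    return max(earliest, now)

now = datetime(2026, 2, 1, 12, 0)
last = {"tiktok": datetime(2026, 2, 1, 0, 0)}
next_slot("tiktok", last, now)   # deferred to 2026-02-02 00:00
next_slot("x", last, now)        # nothing queued for X yet: posts now
```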
208
+
209
+ ---
210
+
211
+ ## 🔄 Pipeline
212
+
213
+ ```mermaid
214
+ graph LR
215
+ A[📥 Ingest] --> B[🎙️ Transcribe]
216
+ B --> C[🔇 Silence Removal]
217
+ C --> D[💬 Captions]
218
+ D --> E[🔥 Caption Burn]
219
+ E --> F[✂️ Shorts]
220
+ F --> G[🎞️ Medium Clips]
221
+ G --> H[📑 Chapters]
222
+ H --> I[📝 Summary]
223
+ I --> J[📱 Social Media]
224
+ J --> K[📱 Short Posts]
225
+ K --> L[📱 Medium Posts]
226
+ L --> M[📰 Blog]
227
+ M --> N[📦 Queue Build]
228
+ N --> O[🔄 Git Push]
229
+
230
+ style A fill:#2d5a27,stroke:#4ade80
231
+ style B fill:#1e3a5f,stroke:#60a5fa
232
+ style E fill:#5a2d27,stroke:#f87171
233
+ style F fill:#5a4d27,stroke:#fbbf24
234
+ style O fill:#2d5a27,stroke:#4ade80
235
+ ```
236
+
237
+ | # | Stage | Description |
238
+ |---|-------|-------------|
239
+ | 1 | **Ingestion** | Copies video, extracts metadata with FFprobe |
240
+ | 2 | **Transcription** | Extracts audio → OpenAI Whisper for word-level transcription |
241
+ | 3 | **Silence Removal** | AI detects dead-air segments; context-aware removals capped at 20% |
242
+ | 4 | **Captions** | Generates `.srt`, `.vtt`, and `.ass` subtitle files with karaoke word highlighting |
243
+ | 5 | **Caption Burn** | Burns ASS captions into video (single-pass encode when silence was also removed) |
244
+ | 6 | **Shorts** | AI identifies best 15–60s moments; extracts single and composite clips with 6 variants per short |
245
+ | 7 | **Medium Clips** | AI identifies 1–3 min standalone segments with crossfade transitions |
246
+ | 8 | **Chapters** | AI detects topic boundaries; outputs JSON, Markdown, FFmetadata, and YouTube timestamps |
247
+ | 9 | **Summary** | AI writes a Markdown README with captured screenshots |
248
+ | 10 | **Social Media** | Platform-tailored posts for TikTok, YouTube, Instagram, LinkedIn, and X |
249
+ | 11 | **Short Posts** | Per-short social media posts for all 5 platforms |
250
+ | 12 | **Medium Clip Posts** | Per-medium-clip social media posts for all 5 platforms |
251
+ | 13 | **Blog** | Dev.to blog post with frontmatter, web-sourced links via Exa |
252
+ | 14 | **Queue Build** | Builds publish queue from social posts with scheduled slots |
253
+ | 15 | **Git Push** | Auto-commits and pushes to `origin main` |
254
+
255
+ Each stage can be independently skipped with `--no-*` flags. A stage failure does not abort the pipeline — subsequent stages proceed with whatever data is available.
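Stage 4's karaoke effect comes from ASS `\k` override tags, which hold the highlight on each word for its duration in centiseconds. A minimal sketch of building one dialogue line's text field from Whisper word timings (the function name is illustrative):

```python
def ass_karaoke_text(words):
    """words: (text, start_sec, end_sec) triples -> ASS text with \\k karaoke tags."""
    parts = []
    for text, start, end in words:
        centiseconds = max(1, round((end - start) * 100))
        parts.append(f"{{\\k{centiseconds}}}{text} ")
    return "".join(parts).rstrip()

ass_karaoke_text([("Drop", 0.0, 0.25), ("a", 0.25, 0.35), ("video", 0.35, 0.90)])
# → "{\k25}Drop {\k10}a {\k55}video"
```

The highlight color (green on portrait, yellow on landscape) would come from the ASS Style line, not from the `\k` tags themselves.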
256
+
257
+ ---
258
+
259
+ ## 🤖 LLM Providers
260
+
261
+ VidPipe supports multiple LLM providers:
262
+
263
+ | Provider | Env Var | Default Model | Notes |
264
+ |----------|---------|---------------|-------|
265
+ | `copilot` (default) | — | Claude Opus 4.6 | Uses GitHub Copilot auth |
266
+ | `openai` | `OPENAI_API_KEY` | gpt-4o | Direct OpenAI API |
267
+ | `claude` | `ANTHROPIC_API_KEY` | claude-opus-4.6 | Direct Anthropic API |
268
+
269
+ Set `LLM_PROVIDER` in your `.env` or pass via CLI. Override model with `LLM_MODEL`.
270
+
271
+ The pipeline tracks token usage and estimated cost across all providers, displaying a summary at the end of each run.
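Resolving provider and model from the table above amounts to two lookups with env overrides. A sketch (the model identifier strings follow the table and may not match exact API ids):

```python
DEFAULT_MODELS = {
    "copilot": "claude-opus-4.6",   # via GitHub Copilot auth
    "openai": "gpt-4o",
    "claude": "claude-opus-4.6",
}

def pick_llm(env):
    """Provider from LLM_PROVIDER (default copilot); model overridable via LLM_MODEL."""
    provider = env.get("LLM_PROVIDER", "copilot")
    return provider, env.get("LLM_MODEL", DEFAULT_MODELS[provider])

pick_llm({})                                                      # → ("copilot", "claude-opus-4.6")
pick_llm({"LLM_PROVIDER": "openai", "LLM_MODEL": "gpt-4o-mini"})  # → ("openai", "gpt-4o-mini")
```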
272
+
273
+ ---
274
+
275
+ ## ⚙️ Configuration
276
+
277
+ Configuration is loaded from CLI flags → environment variables → `.env` file → defaults.
278
+
279
+ ```env
280
+ # .env
281
+ OPENAI_API_KEY=sk-your-key-here
282
+ WATCH_FOLDER=/path/to/recordings
283
+ OUTPUT_DIR=/path/to/output
284
+ # EXA_API_KEY=your-exa-key # Optional: enables web search in social/blog posts
285
+ # BRAND_PATH=./brand.json # Optional: path to brand voice config
286
+ # FFMPEG_PATH=/usr/local/bin/ffmpeg
287
+ # FFPROBE_PATH=/usr/local/bin/ffprobe
288
+ # LATE_API_KEY=sk_your_key_here # Optional: Late API for social publishing
289
+ ```
290
+
291
+ Social media publishing is configured via `schedule.json` and the Late API. See [Social Publishing Guide](./docs/social-publishing.md) for details.
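The precedence chain (CLI flags → environment variables → `.env` → defaults) reduces to "first layer that defines the key wins". A sketch of the rule, not vidpipe's actual loader:

```python
def resolve_setting(key, cli, env, dotenv, defaults):
    """Return the value from the highest-priority layer that defines key."""
    for layer in (cli, env, dotenv, defaults):
        if layer.get(key) is not None:
            return layer[key]
    return None

# A --output-dir flag beats both OUTPUT_DIR and the built-in default.
resolve_setting("output_dir",
                {"output_dir": "~/Content/processed"},
                {"output_dir": "/srv/out"},
                {},
                {"output_dir": "./recordings"})   # → "~/Content/processed"
```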
292
+
293
+ ---
294
+
295
+ ## 📚 Documentation
296
+
297
+ | Guide | Description |
298
+ |-------|-------------|
299
+ | [Getting Started](./docs/getting-started.md) | Prerequisites, installation, and first run |
300
+ | [Configuration](./docs/configuration.md) | All CLI flags, env vars, skip options, and examples |
301
+ | [FFmpeg Setup](./docs/ffmpeg-setup.md) | Platform-specific install (Windows, macOS, Linux, ARM64) |
302
+ | [Brand Customization](./docs/brand-customization.md) | Customize AI voice, vocabulary, hashtags, and content style |
303
+ | [Social Publishing](./docs/social-publishing.md) | Review, schedule, and publish social posts via Late API |
304
+
305
+ ---
306
+
307
+ ## 🏗️ Architecture
308
+
309
+ Agent-based architecture built on the [GitHub Copilot SDK](https://github.com/github/copilot-sdk):
310
+
311
+ ```mermaid
312
+ graph TD
313
+ BP[🧠 BaseAgent] --> SRA[SilenceRemovalAgent]
314
+ BP --> SA[SummaryAgent]
315
+ BP --> SHA[ShortsAgent]
316
+ BP --> MVA[MediumVideoAgent]
317
+ BP --> CA[ChapterAgent]
318
+ BP --> SMA[SocialMediaAgent]
319
+ BP --> BA[BlogAgent]
320
+
321
+ SRA -->|tools| T1[detect_silence, decide_removals]
322
+ SHA -->|tools| T2[plan_shorts]
323
+ MVA -->|tools| T3[plan_medium_clips]
324
+ CA -->|tools| T4[generate_chapters]
325
+ SA -->|tools| T5[capture_frame, write_summary]
326
+ SMA -->|tools| T6[search_links, create_posts]
327
+ BA -->|tools| T7[search_web, write_blog]
328
+
329
+ style BP fill:#1e3a5f,stroke:#60a5fa,color:#fff
330
+ ```
331
+
332
+ Each agent communicates with the LLM through structured tool calls, ensuring reliable, parseable outputs.
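The point of structured tool calls is that the pipeline dispatches on a tool name with typed arguments instead of parsing free-form model text. A hypothetical shape (the Copilot SDK's real types differ):

```python
def dispatch(call, handlers):
    """Route one structured tool call to its registered handler."""
    return handlers[call["tool"]](**call["arguments"])

# Stub handler standing in for the real plan_shorts tool.
handlers = {"plan_shorts": lambda min_seconds, max_seconds:
            [{"start": 12.0, "end": 12.0 + max_seconds}]}

dispatch({"tool": "plan_shorts",
          "arguments": {"min_seconds": 15, "max_seconds": 60}}, handlers)
# → [{"start": 12.0, "end": 72.0}]
```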
333
+
334
+ ---
335
+
336
+ ## 🛠️ Tech Stack
337
+
338
+ | Technology | Purpose |
339
+ |------------|---------|
340
+ | [TypeScript](https://www.typescriptlang.org/) | Language (ES2022, ESM) |
341
+ | [GitHub Copilot SDK](https://github.com/github/copilot-sdk) | AI agent framework |
342
+ | [OpenAI Whisper](https://platform.openai.com/docs/guides/speech-to-text) | Speech-to-text |
343
+ | [FFmpeg](https://ffmpeg.org/) | Video/audio processing |
344
+ | [Sharp](https://sharp.pixelplumbing.com/) | Image analysis (webcam detection) |
345
+ | [Commander.js](https://github.com/tj/commander.js) | CLI framework |
346
+ | [Chokidar](https://github.com/paulmillr/chokidar) | File system watching |
347
+ | [Winston](https://github.com/winstonjs/winston) | Logging |
348
+ | [Exa AI](https://exa.ai/) | Web search for social posts and blog |
349
+
350
+ ---
351
+
352
+ ## 🗺️ Roadmap
353
+
354
+ - [x] **Automated social posting** — Publish directly to platforms via Late API
355
+ - [ ] **Multi-language support** — Transcription and summaries in multiple languages
356
+ - [ ] **Custom templates** — User-defined Markdown & social post templates
357
+ - [ ] **Web dashboard** — Browser UI for reviewing and editing outputs
358
+ - [ ] **Batch processing** — Process an entire folder of existing videos
359
+ - [ ] **Custom short criteria** — Configure what makes a "good" short for your content
360
+ - [ ] **Thumbnail generation** — Auto-generate branded thumbnails for shorts
361
+
362
+ ---
363
+
364
+ ## 🔧 Troubleshooting
365
+
366
+ ### `No binary found for architecture` during install
367
+
368
+ `ffmpeg-static` (an optional dependency) bundles FFmpeg for common platforms. On unsupported architectures, it skips gracefully and vidpipe falls back to your system FFmpeg.
369
+
370
+ **Fix:** Install FFmpeg on your system:
371
+ - **Windows:** `winget install Gyan.FFmpeg`
372
+ - **macOS:** `brew install ffmpeg`
373
+ - **Linux:** `sudo apt install ffmpeg` (Debian/Ubuntu) or `sudo dnf install ffmpeg` (Fedora)
374
+
375
+ You can also point to a custom binary: `export FFMPEG_PATH=/path/to/ffmpeg`
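The lookup order implied above (explicit `FFMPEG_PATH`, then the `ffmpeg-static` bundle when the platform is supported, then the system `PATH`) can be sketched as follows; the function is illustrative, not vidpipe's code:

```python
import shutil

def resolve_ffmpeg(env, bundled=None):
    """First match wins: FFMPEG_PATH override, bundled binary, system ffmpeg."""
    return env.get("FFMPEG_PATH") or bundled or shutil.which("ffmpeg")

resolve_ffmpeg({"FFMPEG_PATH": "/opt/ffmpeg/bin/ffmpeg"})   # → "/opt/ffmpeg/bin/ffmpeg"
resolve_ffmpeg({}, bundled="/app/node_modules/ffmpeg-static/ffmpeg")
```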
376
+
377
+ Run `vidpipe --doctor` to verify your setup.
378
+
379
+ ---
380
+
381
+ ## 📄 License
382
+
383
+ ISC © [htekdev](https://github.com/htekdev)
384
+