vidpipe 1.3.0 → 1.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,384 +1,384 @@
1
- <div align="center">
2
-
3
- ```
4
- ██╗ ██╗██╗██████╗ ██████╗ ██╗██████╗ ███████╗
5
- ██║ ██║██║██╔══██╗██╔══██╗██║██╔══██╗██╔════╝
6
- ██║ ██║██║██║ ██║██████╔╝██║██████╔╝█████╗
7
- ╚██╗ ██╔╝██║██║ ██║██╔═══╝ ██║██╔═══╝ ██╔══╝
8
- ╚████╔╝ ██║██████╔╝██║ ██║██║ ███████╗
9
- ╚═══╝ ╚═╝╚═════╝ ╚═╝ ╚═╝╚═╝ ╚══════╝
10
- ```
11
-
12
- **Drop a video. Get transcripts, summaries, short clips, captions, blog posts, and social media posts automatically.**
13
-
14
- An AI-powered CLI pipeline that watches for new video recordings and transforms them into rich, structured content using [GitHub Copilot SDK](https://github.com/github/copilot-sdk) agents and OpenAI Whisper.
15
-
16
- [![CI](https://github.com/htekdev/vidpipe/actions/workflows/ci.yml/badge.svg)](https://github.com/htekdev/vidpipe/actions/workflows/ci.yml)
17
- [![npm version](https://img.shields.io/npm/v/vidpipe)](https://www.npmjs.com/package/vidpipe)
18
- [![Node.js 20+](https://img.shields.io/badge/node-20%2B-brightgreen)](https://nodejs.org/)
19
- [![License: ISC](https://img.shields.io/badge/license-ISC-blue)](./LICENSE)
20
- [![Docs](https://img.shields.io/badge/docs-vidpipe-a78bfa)](https://htekdev.github.io/vidpipe/)
21
- [![Last Updated](https://img.shields.io/badge/last_updated-February_2026-informational)](.)
22
-
23
- </div>
24
-
25
- ```bash
26
- npm install -g vidpipe
27
- ```
28
-
29
- ---
30
-
31
- ## ✨ Features
32
-
33
- <p align="center">
34
- <img src="assets/features-infographic.png" alt="VidPipe Features — Input → AI Processing → Outputs" width="900" />
35
- </p>
36
-
37
- <br />
38
-
39
- <table>
40
- <tr>
41
- <td>🎙️ <b>Whisper Transcription</b> — Word-level timestamps</td>
42
- <td>📐 <b>Split-Screen Layouts</b> — Portrait, square, and feed</td>
43
- </tr>
44
- <tr>
45
- <td>🔇 <b>AI Silence Removal</b> — Context-aware, capped at 20%</td>
46
- <td>💬 <b>Karaoke Captions</b> — Word-by-word highlighting</td>
47
- </tr>
48
- <tr>
49
- <td>✂️ <b>Short Clips</b> — Best 15–60s moments, multi-segment</td>
50
- <td>🎞️ <b>Medium Clips</b> — 1–3 min with crossfade transitions</td>
51
- </tr>
52
- <tr>
53
- <td>📑 <b>Chapter Detection</b> — JSON, Markdown, YouTube, FFmeta</td>
54
- <td>📱 <b>Social Posts</b> — TikTok, YouTube, Instagram, LinkedIn, X</td>
55
- </tr>
56
- <tr>
57
- <td>📰 <b>Blog Post</b> — Dev.to style with web-sourced links</td>
58
- <td>🎨 <b>Brand Voice</b> — Custom tone, hashtags via brand.json</td>
59
- </tr>
60
- <tr>
61
- <td>🔍 <b>Face Detection</b> — ONNX-based webcam cropping</td>
62
- <td>🚀 <b>Auto-Publish</b> — Scheduled posting to TikTok, YouTube, Instagram, LinkedIn, X</td>
63
- </tr>
64
- </table>
65
-
66
- ---
67
-
68
- ## 🚀 Quick Start
69
-
70
- ```bash
71
- # Install globally
72
- npm install -g vidpipe
73
-
74
- # Set up your environment
75
- # Unix/Mac
76
- cp .env.example .env
77
- # Windows (PowerShell)
78
- Copy-Item .env.example .env
79
-
80
- # Then edit .env and add your OpenAI API key (REQUIRED):
81
- # OPENAI_API_KEY=sk-your-key-here
82
-
83
- # Verify all prerequisites are met
84
- vidpipe --doctor
85
-
86
- # Process a single video
87
- vidpipe /path/to/video.mp4
88
-
89
- # Watch a folder for new recordings
90
- vidpipe --watch-dir ~/Videos/Recordings
91
-
92
- # Full example with options
93
- vidpipe \
94
- --watch-dir ~/Videos/Recordings \
95
- --output-dir ~/Content/processed \
96
- --openai-key sk-... \
97
- --brand ./brand.json \
98
- --verbose
99
- ```
100
-
101
- > **Prerequisites:**
102
- > - **Node.js 20+**
103
- > - **FFmpeg 6.0+** — Auto-bundled on common platforms (Windows x64, macOS, Linux x64) via [`ffmpeg-static`](https://www.npmjs.com/package/ffmpeg-static). On other architectures, install system FFmpeg (see [Troubleshooting](#troubleshooting)). Override with `FFMPEG_PATH` env var if you need a specific build.
104
- > - **OpenAI API key** (**required**) — Get one at [platform.openai.com/api-keys](https://platform.openai.com/api-keys). Needed for Whisper transcription and all AI features.
105
- > - **GitHub Copilot subscription** — Required for AI agent features (shorts generation, social media posts, summaries, blog posts). See [GitHub Copilot](https://github.com/features/copilot).
106
- >
107
- > See [Getting Started](./docs/getting-started.md) for full setup instructions.
108
-
109
- ---
110
-
111
- ## 🎮 CLI Usage
112
-
113
- ```
114
- vidpipe [options] [video-path]
115
- vidpipe init # Interactive setup wizard
116
- vidpipe review # Open post review web app
117
- vidpipe schedule # View posting schedule
118
- ```
119
-
120
- | Option | Description |
121
- |--------|-------------|
122
- | `--doctor` | Check that all prerequisites (FFmpeg, API keys, etc.) are installed and configured |
123
- | `[video-path]` | Process a specific video file (implies `--once`) |
124
- | `--watch-dir <path>` | Folder to watch for new recordings |
125
- | `--output-dir <path>` | Output directory (default: `./recordings`) |
126
- | `--openai-key <key>` | OpenAI API key |
127
- | `--exa-key <key>` | Exa AI key for web search in social posts |
128
- | `--brand <path>` | Path to `brand.json` (default: `./brand.json`) |
129
- | `--once` | Process next video and exit |
130
- | `--no-silence-removal` | Skip silence removal |
131
- | `--no-shorts` | Skip short clip extraction |
132
- | `--no-medium-clips` | Skip medium clip generation |
133
- | `--no-social` | Skip social media posts |
134
- | `--no-social-publish` | Skip social media queue-build stage |
135
- | `--late-api-key <key>` | Override Late API key |
136
- | `--no-captions` | Skip caption generation/burning |
137
- | `--no-git` | Skip git commit/push |
138
- | `-v, --verbose` | Debug-level logging |
139
-
140
- ---
141
-
142
- ## 📁 Output Structure
143
-
144
- ```
145
- recordings/
146
- └── my-awesome-demo/
147
- ├── my-awesome-demo.mp4 # Original video
148
- ├── my-awesome-demo-edited.mp4 # Silence-removed
149
- ├── my-awesome-demo-captioned.mp4 # With burned-in captions
150
- ├── transcript.json # Word-level transcript
151
- ├── transcript-edited.json # Timestamps adjusted for silence removal
152
- ├── README.md # AI-generated summary with screenshots
153
- ├── captions/
154
- │ ├── captions.srt # SubRip subtitles
155
- │ ├── captions.vtt # WebVTT subtitles
156
- │ └── captions.ass # Advanced SSA (karaoke-style)
157
- ├── shorts/
158
- │ ├── catchy-title.mp4 # Landscape base clip
159
- │ ├── catchy-title-captioned.mp4 # Landscape + burned captions
160
- │ ├── catchy-title-portrait.mp4 # 9:16 split-screen
161
- │ ├── catchy-title-portrait-captioned.mp4 # Portrait + captions + hook overlay
162
- │ ├── catchy-title-feed.mp4 # 4:5 split-screen
163
- │ ├── catchy-title-square.mp4 # 1:1 split-screen
164
- │ ├── catchy-title.md # Clip metadata
165
- │ └── catchy-title/
166
- │ └── posts/ # Per-short social posts (5 platforms)
167
- ├── medium-clips/
168
- │ ├── deep-dive-topic.mp4 # Landscape base clip
169
- │ ├── deep-dive-topic-captioned.mp4 # With burned captions
170
- │ ├── deep-dive-topic.md # Clip metadata
171
- │ └── deep-dive-topic/
172
- │ └── posts/ # Per-clip social posts (5 platforms)
173
- ├── chapters/
174
- │ ├── chapters.json # Structured chapter data
175
- │ ├── chapters.md # Markdown table
176
- │ ├── chapters.ffmetadata # FFmpeg metadata format
177
- │ └── chapters-youtube.txt # YouTube description timestamps
178
- └── social-posts/
179
- ├── tiktok.md # Full-video social posts
180
- ├── youtube.md
181
- ├── instagram.md
182
- ├── linkedin.md
183
- ├── x.md
184
- └── devto.md # Dev.to blog post
185
- ```
186
-
187
- ---
188
-
189
- ## 📺 Review App
190
-
191
- VidPipe includes a built-in web app for reviewing, editing, and scheduling social media posts before publishing.
192
-
193
- <div align="center">
194
- <img src="assets/review-ui.png" alt="VidPipe Review UI" width="800" />
195
- <br />
196
- <em>Review and approve posts across YouTube, TikTok, Instagram, LinkedIn, and X/Twitter</em>
197
- </div>
198
-
199
- ```bash
200
- # Launch the review app
201
- vidpipe review
202
- ```
203
-
204
- - **Platform tabs** — Filter posts by platform (YouTube, TikTok, Instagram, LinkedIn, X)
205
- - **Video preview** — See the video thumbnail and content before approving
206
- - **Keyboard shortcuts** — Arrow keys to navigate, Enter to approve, Backspace to reject
207
- - **Smart scheduling** — Posts are queued with optimal timing per platform
208
-
209
- ---
210
-
211
- ## 🔄 Pipeline
212
-
213
- ```mermaid
214
- graph LR
215
- A[📥 Ingest] --> B[🎙️ Transcribe]
216
- B --> C[🔇 Silence Removal]
217
- C --> D[💬 Captions]
218
- D --> E[🔥 Caption Burn]
219
- E --> F[✂️ Shorts]
220
- F --> G[🎞️ Medium Clips]
221
- G --> H[📑 Chapters]
222
- H --> I[📝 Summary]
223
- I --> J[📱 Social Media]
224
- J --> K[📱 Short Posts]
225
- K --> L[📱 Medium Posts]
226
- L --> M[📰 Blog]
227
- M --> N[📦 Queue Build]
228
- N --> O[🔄 Git Push]
229
-
230
- style A fill:#2d5a27,stroke:#4ade80
231
- style B fill:#1e3a5f,stroke:#60a5fa
232
- style E fill:#5a2d27,stroke:#f87171
233
- style F fill:#5a4d27,stroke:#fbbf24
234
- style O fill:#2d5a27,stroke:#4ade80
235
- ```
236
-
237
- | # | Stage | Description |
238
- |---|-------|-------------|
239
- | 1 | **Ingestion** | Copies video, extracts metadata with FFprobe |
240
- | 2 | **Transcription** | Extracts audio → OpenAI Whisper for word-level transcription |
241
- | 3 | **Silence Removal** | AI detects dead-air segments; context-aware removals capped at 20% |
242
- | 4 | **Captions** | Generates `.srt`, `.vtt`, and `.ass` subtitle files with karaoke word highlighting |
243
- | 5 | **Caption Burn** | Burns ASS captions into video (single-pass encode when silence was also removed) |
244
- | 6 | **Shorts** | AI identifies best 15–60s moments; extracts single and composite clips with 6 variants per short |
245
- | 7 | **Medium Clips** | AI identifies 1–3 min standalone segments with crossfade transitions |
246
- | 8 | **Chapters** | AI detects topic boundaries; outputs JSON, Markdown, FFmetadata, and YouTube timestamps |
247
- | 9 | **Summary** | AI writes a Markdown README with captured screenshots |
248
- | 10 | **Social Media** | Platform-tailored posts for TikTok, YouTube, Instagram, LinkedIn, and X |
249
- | 11 | **Short Posts** | Per-short social media posts for all 5 platforms |
250
- | 12 | **Medium Clip Posts** | Per-medium-clip social media posts for all 5 platforms |
251
- | 13 | **Blog** | Dev.to blog post with frontmatter, web-sourced links via Exa |
252
- | 14 | **Queue Build** | Builds publish queue from social posts with scheduled slots |
253
- | 15 | **Git Push** | Auto-commits and pushes to `origin main` |
254
-
255
- Each stage can be independently skipped with `--no-*` flags. A stage failure does not abort the pipeline — subsequent stages proceed with whatever data is available.
256
-
257
- ---
258
-
259
- ## 🤖 LLM Providers
260
-
261
- VidPipe supports multiple LLM providers:
262
-
263
- | Provider | Env Var | Default Model | Notes |
264
- |----------|---------|---------------|-------|
265
- | `copilot` (default) | — | Claude Opus 4.6 | Uses GitHub Copilot auth |
266
- | `openai` | `OPENAI_API_KEY` | gpt-4o | Direct OpenAI API |
267
- | `claude` | `ANTHROPIC_API_KEY` | claude-opus-4.6 | Direct Anthropic API |
268
-
269
- Set `LLM_PROVIDER` in your `.env` or pass via CLI. Override model with `LLM_MODEL`.
270
-
271
- The pipeline tracks token usage and estimated cost across all providers, displaying a summary at the end of each run.
272
-
273
- ---
274
-
275
- ## ⚙️ Configuration
276
-
277
- Configuration is loaded from CLI flags → environment variables → `.env` file → defaults.
278
-
279
- ```env
280
- # .env
281
- OPENAI_API_KEY=sk-your-key-here
282
- WATCH_FOLDER=/path/to/recordings
283
- OUTPUT_DIR=/path/to/output
284
- # EXA_API_KEY=your-exa-key # Optional: enables web search in social/blog posts
285
- # BRAND_PATH=./brand.json # Optional: path to brand voice config
286
- # FFMPEG_PATH=/usr/local/bin/ffmpeg
287
- # FFPROBE_PATH=/usr/local/bin/ffprobe
288
- # LATE_API_KEY=sk_your_key_here # Optional: Late API for social publishing
289
- ```
290
-
291
- Social media publishing is configured via `schedule.json` and the Late API. See [Social Publishing Guide](./docs/social-publishing.md) for details.
292
-
293
- ---
294
-
295
- ## 📚 Documentation
296
-
297
- | Guide | Description |
298
- |-------|-------------|
299
- | [Getting Started](./docs/getting-started.md) | Prerequisites, installation, and first run |
300
- | [Configuration](./docs/configuration.md) | All CLI flags, env vars, skip options, and examples |
301
- | [FFmpeg Setup](./docs/ffmpeg-setup.md) | Platform-specific install (Windows, macOS, Linux, ARM64) |
302
- | [Brand Customization](./docs/brand-customization.md) | Customize AI voice, vocabulary, hashtags, and content style |
303
- | [Social Publishing](./docs/social-publishing.md) | Review, schedule, and publish social posts via Late API |
304
-
305
- ---
306
-
307
- ## 🏗️ Architecture
308
-
309
- Agent-based architecture built on the [GitHub Copilot SDK](https://github.com/github/copilot-sdk):
310
-
311
- ```mermaid
312
- graph TD
313
- BP[🧠 BaseAgent] --> SRA[SilenceRemovalAgent]
314
- BP --> SA[SummaryAgent]
315
- BP --> SHA[ShortsAgent]
316
- BP --> MVA[MediumVideoAgent]
317
- BP --> CA[ChapterAgent]
318
- BP --> SMA[SocialMediaAgent]
319
- BP --> BA[BlogAgent]
320
-
321
- SRA -->|tools| T1[detect_silence, decide_removals]
322
- SHA -->|tools| T2[plan_shorts]
323
- MVA -->|tools| T3[plan_medium_clips]
324
- CA -->|tools| T4[generate_chapters]
325
- SA -->|tools| T5[capture_frame, write_summary]
326
- SMA -->|tools| T6[search_links, create_posts]
327
- BA -->|tools| T7[search_web, write_blog]
328
-
329
- style BP fill:#1e3a5f,stroke:#60a5fa,color:#fff
330
- ```
331
-
332
- Each agent communicates with the LLM through structured tool calls, ensuring reliable, parseable outputs.
333
-
334
- ---
335
-
336
- ## 🛠️ Tech Stack
337
-
338
- | Technology | Purpose |
339
- |------------|---------|
340
- | [TypeScript](https://www.typescriptlang.org/) | Language (ES2022, ESM) |
341
- | [GitHub Copilot SDK](https://github.com/github/copilot-sdk) | AI agent framework |
342
- | [OpenAI Whisper](https://platform.openai.com/docs/guides/speech-to-text) | Speech-to-text |
343
- | [FFmpeg](https://ffmpeg.org/) | Video/audio processing |
344
- | [Sharp](https://sharp.pixelplumbing.com/) | Image analysis (webcam detection) |
345
- | [Commander.js](https://github.com/tj/commander.js) | CLI framework |
346
- | [Chokidar](https://github.com/paulmillr/chokidar) | File system watching |
347
- | [Winston](https://github.com/winstonjs/winston) | Logging |
348
- | [Exa AI](https://exa.ai/) | Web search for social posts and blog |
349
-
350
- ---
351
-
352
- ## 🗺️ Roadmap
353
-
354
- - [x] **Automated social posting** — Publish directly to platforms via Late API
355
- - [ ] **Multi-language support** — Transcription and summaries in multiple languages
356
- - [ ] **Custom templates** — User-defined Markdown & social post templates
357
- - [ ] **Web dashboard** — Browser UI for reviewing and editing outputs
358
- - [ ] **Batch processing** — Process an entire folder of existing videos
359
- - [ ] **Custom short criteria** — Configure what makes a "good" short for your content
360
- - [ ] **Thumbnail generation** — Auto-generate branded thumbnails for shorts
361
-
362
- ---
363
-
364
- ## 🔧 Troubleshooting
365
-
366
- ### `No binary found for architecture` during install
367
-
368
- `ffmpeg-static` (an optional dependency) bundles FFmpeg for common platforms. On unsupported architectures, it skips gracefully and vidpipe falls back to your system FFmpeg.
369
-
370
- **Fix:** Install FFmpeg on your system:
371
- - **Windows:** `winget install Gyan.FFmpeg`
372
- - **macOS:** `brew install ffmpeg`
373
- - **Linux:** `sudo apt install ffmpeg` (Debian/Ubuntu) or `sudo dnf install ffmpeg` (Fedora)
374
-
375
- You can also point to a custom binary: `export FFMPEG_PATH=/path/to/ffmpeg`
376
-
377
- Run `vidpipe doctor` to verify your setup.
378
-
379
- ---
380
-
381
- ## 📄 License
382
-
383
- ISC © [htekdev](https://github.com/htekdev)
384
-
1
+ <div align="center">
2
+
3
+ ```
4
+ ██╗ ██╗██╗██████╗ ██████╗ ██╗██████╗ ███████╗
5
+ ██║ ██║██║██╔══██╗██╔══██╗██║██╔══██╗██╔════╝
6
+ ██║ ██║██║██║ ██║██████╔╝██║██████╔╝█████╗
7
+ ╚██╗ ██╔╝██║██║ ██║██╔═══╝ ██║██╔═══╝ ██╔══╝
8
+ ╚████╔╝ ██║██████╔╝██║ ██║██║ ███████╗
9
+ ╚═══╝ ╚═╝╚═════╝ ╚═╝ ╚═╝╚═╝ ╚══════╝
10
+ ```
11
+
12
+ **Your AI video editor turns raw recordings into shorts, reels, captions, social posts, and blog posts. Record once, publish everywhere.**
13
+
14
+ An agentic video editor that watches for new recordings and edits them into social-media-ready content — shorts, reels, captions, blog posts, and platform-tailored social posts — using [GitHub Copilot SDK](https://github.com/github/copilot-sdk) AI agents and OpenAI Whisper.
15
+
16
+ [![CI](https://github.com/htekdev/vidpipe/actions/workflows/ci.yml/badge.svg)](https://github.com/htekdev/vidpipe/actions/workflows/ci.yml)
17
+ [![npm version](https://img.shields.io/npm/v/vidpipe)](https://www.npmjs.com/package/vidpipe)
18
+ [![Node.js 20+](https://img.shields.io/badge/node-20%2B-brightgreen)](https://nodejs.org/)
19
+ [![License: ISC](https://img.shields.io/badge/license-ISC-blue)](./LICENSE)
20
+ [![Docs](https://img.shields.io/badge/docs-vidpipe-a78bfa)](https://htekdev.github.io/vidpipe/)
21
+ [![Last Updated](https://img.shields.io/badge/last_updated-February_2026-informational)](.)
22
+
23
+ </div>
24
+
25
+ ```bash
26
+ npm install -g vidpipe
27
+ ```
28
+
29
+ ---
30
+
31
+ ## ✨ Features
32
+
33
+ <p align="center">
34
+ <img src="assets/features-infographic.png" alt="VidPipe Features — Input → AI Processing → Outputs" width="900" />
35
+ </p>
36
+
37
+ <br />
38
+
39
+ <table>
40
+ <tr>
41
+ <td>🎙️ <b>Whisper Transcription</b> — Word-level timestamps</td>
42
+ <td>📐 <b>Split-Screen Layouts</b> — Portrait, square, and feed</td>
43
+ </tr>
44
+ <tr>
45
+ <td>🔇 <b>AI Silence Removal</b> — Context-aware, capped at 20%</td>
46
+ <td>💬 <b>Karaoke Captions</b> — Word-by-word highlighting</td>
47
+ </tr>
48
+ <tr>
49
+ <td>✂️ <b>Short Clips</b> — Best 15–60s moments, multi-segment</td>
50
+ <td>🎞️ <b>Medium Clips</b> — 1–3 min with crossfade transitions</td>
51
+ </tr>
52
+ <tr>
53
+ <td>📑 <b>Chapter Detection</b> — JSON, Markdown, YouTube, FFmeta</td>
54
+ <td>📱 <b>Social Posts</b> — TikTok, YouTube, Instagram, LinkedIn, X</td>
55
+ </tr>
56
+ <tr>
57
+ <td>📰 <b>Blog Post</b> — Dev.to style with web-sourced links</td>
58
+ <td>🎨 <b>Brand Voice</b> — Custom tone, hashtags via brand.json</td>
59
+ </tr>
60
+ <tr>
61
+ <td>🔍 <b>Face Detection</b> — ONNX-based webcam cropping</td>
62
+ <td>🚀 <b>Auto-Publish</b> — Scheduled posting to TikTok, YouTube, Instagram, LinkedIn, X</td>
63
+ </tr>
64
+ </table>
65
+
66
+ ---
67
+
68
+ ## 🚀 Quick Start
69
+
70
+ ```bash
71
+ # Install globally
72
+ npm install -g vidpipe
73
+
74
+ # Set up your environment
75
+ # Unix/Mac
76
+ cp .env.example .env
77
+ # Windows (PowerShell)
78
+ Copy-Item .env.example .env
79
+
80
+ # Then edit .env and add your OpenAI API key (REQUIRED):
81
+ # OPENAI_API_KEY=sk-your-key-here
82
+
83
+ # Verify all prerequisites are met
84
+ vidpipe --doctor
85
+
86
+ # Process a single video
87
+ vidpipe /path/to/video.mp4
88
+
89
+ # Watch a folder for new recordings
90
+ vidpipe --watch-dir ~/Videos/Recordings
91
+
92
+ # Full example with options
93
+ vidpipe \
94
+ --watch-dir ~/Videos/Recordings \
95
+ --output-dir ~/Content/processed \
96
+ --openai-key sk-... \
97
+ --brand ./brand.json \
98
+ --verbose
99
+ ```
100
+
101
+ > **Prerequisites:**
102
+ > - **Node.js 20+**
103
+ > - **FFmpeg 6.0+** — Auto-bundled on common platforms (Windows x64, macOS, Linux x64) via [`ffmpeg-static`](https://www.npmjs.com/package/ffmpeg-static). On other architectures, install system FFmpeg (see [Troubleshooting](#troubleshooting)). Override with `FFMPEG_PATH` env var if you need a specific build.
104
+ > - **OpenAI API key** (**required**) — Get one at [platform.openai.com/api-keys](https://platform.openai.com/api-keys). Needed for Whisper transcription and all AI features.
105
+ > - **GitHub Copilot subscription** — Required for AI agent features (shorts generation, social media posts, summaries, blog posts). See [GitHub Copilot](https://github.com/features/copilot).
106
+ >
107
+ > See [Getting Started](./docs/getting-started.md) for full setup instructions.
108
+
109
+ ---
110
+
111
+ ## 🎮 CLI Usage
112
+
113
+ ```
114
+ vidpipe [options] [video-path]
115
+ vidpipe init # Interactive setup wizard
116
+ vidpipe review # Open post review web app
117
+ vidpipe schedule # View posting schedule
118
+ ```
119
+
120
+ | Option | Description |
121
+ |--------|-------------|
122
+ | `--doctor` | Check that all prerequisites (FFmpeg, API keys, etc.) are installed and configured |
123
+ | `[video-path]` | Process a specific video file (implies `--once`) |
124
+ | `--watch-dir <path>` | Folder to watch for new recordings |
125
+ | `--output-dir <path>` | Output directory (default: `./recordings`) |
126
+ | `--openai-key <key>` | OpenAI API key |
127
+ | `--exa-key <key>` | Exa AI key for web search in social posts |
128
+ | `--brand <path>` | Path to `brand.json` (default: `./brand.json`) |
129
+ | `--once` | Process next video and exit |
130
+ | `--no-silence-removal` | Skip silence removal |
131
+ | `--no-shorts` | Skip short clip extraction |
132
+ | `--no-medium-clips` | Skip medium clip generation |
133
+ | `--no-social` | Skip social media posts |
134
+ | `--no-social-publish` | Skip social media queue-build stage |
135
+ | `--late-api-key <key>` | Override Late API key |
136
+ | `--no-captions` | Skip caption generation/burning |
137
+ | `--no-git` | Skip git commit/push |
138
+ | `-v, --verbose` | Debug-level logging |
139
+
140
+ ---
141
+
142
+ ## 📁 Output Structure
143
+
144
+ ```
145
+ recordings/
146
+ └── my-awesome-demo/
147
+ ├── my-awesome-demo.mp4 # Original video
148
+ ├── my-awesome-demo-edited.mp4 # Silence-removed
149
+ ├── my-awesome-demo-captioned.mp4 # With burned-in captions
150
+ ├── transcript.json # Word-level transcript
151
+ ├── transcript-edited.json # Timestamps adjusted for silence removal
152
+ ├── README.md # AI-generated summary with screenshots
153
+ ├── captions/
154
+ │ ├── captions.srt # SubRip subtitles
155
+ │ ├── captions.vtt # WebVTT subtitles
156
+ │ └── captions.ass # Advanced SSA (karaoke-style)
157
+ ├── shorts/
158
+ │ ├── catchy-title.mp4 # Landscape base clip
159
+ │ ├── catchy-title-captioned.mp4 # Landscape + burned captions
160
+ │ ├── catchy-title-portrait.mp4 # 9:16 split-screen
161
+ │ ├── catchy-title-portrait-captioned.mp4 # Portrait + captions + hook overlay
162
+ │ ├── catchy-title-feed.mp4 # 4:5 split-screen
163
+ │ ├── catchy-title-square.mp4 # 1:1 split-screen
164
+ │ ├── catchy-title.md # Clip metadata
165
+ │ └── catchy-title/
166
+ │ └── posts/ # Per-short social posts (5 platforms)
167
+ ├── medium-clips/
168
+ │ ├── deep-dive-topic.mp4 # Landscape base clip
169
+ │ ├── deep-dive-topic-captioned.mp4 # With burned captions
170
+ │ ├── deep-dive-topic.md # Clip metadata
171
+ │ └── deep-dive-topic/
172
+ │ └── posts/ # Per-clip social posts (5 platforms)
173
+ ├── chapters/
174
+ │ ├── chapters.json # Structured chapter data
175
+ │ ├── chapters.md # Markdown table
176
+ │ ├── chapters.ffmetadata # FFmpeg metadata format
177
+ │ └── chapters-youtube.txt # YouTube description timestamps
178
+ └── social-posts/
179
+ ├── tiktok.md # Full-video social posts
180
+ ├── youtube.md
181
+ ├── instagram.md
182
+ ├── linkedin.md
183
+ ├── x.md
184
+ └── devto.md # Dev.to blog post
185
+ ```
186
+
187
+ ---
188
+
189
+ ## 📺 Review App
190
+
191
+ VidPipe includes a built-in web app for reviewing, editing, and scheduling social media posts before publishing.
192
+
193
+ <div align="center">
194
+ <img src="assets/review-ui.png" alt="VidPipe Review UI" width="800" />
195
+ <br />
196
+ <em>Review and approve posts across YouTube, TikTok, Instagram, LinkedIn, and X/Twitter</em>
197
+ </div>
198
+
199
+ ```bash
200
+ # Launch the review app
201
+ vidpipe review
202
+ ```
203
+
204
+ - **Platform tabs** — Filter posts by platform (YouTube, TikTok, Instagram, LinkedIn, X)
205
+ - **Video preview** — See the video thumbnail and content before approving
206
+ - **Keyboard shortcuts** — Arrow keys to navigate, Enter to approve, Backspace to reject
207
+ - **Smart scheduling** — Posts are queued with optimal timing per platform
208
+
209
+ ---
210
+
211
+ ## 🔄 Pipeline
212
+
213
+ ```mermaid
214
+ graph LR
215
+ A[📥 Ingest] --> B[🎙️ Transcribe]
216
+ B --> C[🔇 Silence Removal]
217
+ C --> D[💬 Captions]
218
+ D --> E[🔥 Caption Burn]
219
+ E --> F[✂️ Shorts]
220
+ F --> G[🎞️ Medium Clips]
221
+ G --> H[📑 Chapters]
222
+ H --> I[📝 Summary]
223
+ I --> J[📱 Social Media]
224
+ J --> K[📱 Short Posts]
225
+ K --> L[📱 Medium Posts]
226
+ L --> M[📰 Blog]
227
+ M --> N[📦 Queue Build]
228
+ N --> O[🔄 Git Push]
229
+
230
+ style A fill:#2d5a27,stroke:#4ade80
231
+ style B fill:#1e3a5f,stroke:#60a5fa
232
+ style E fill:#5a2d27,stroke:#f87171
233
+ style F fill:#5a4d27,stroke:#fbbf24
234
+ style O fill:#2d5a27,stroke:#4ade80
235
+ ```
236
+
237
+ | # | Stage | Description |
238
+ |---|-------|-------------|
239
+ | 1 | **Ingestion** | Copies video, extracts metadata with FFprobe |
240
+ | 2 | **Transcription** | Extracts audio → OpenAI Whisper for word-level transcription |
241
+ | 3 | **Silence Removal** | AI detects dead-air segments; context-aware removals capped at 20% |
242
+ | 4 | **Captions** | Generates `.srt`, `.vtt`, and `.ass` subtitle files with karaoke word highlighting |
243
+ | 5 | **Caption Burn** | Burns ASS captions into video (single-pass encode when silence was also removed) |
244
+ | 6 | **Shorts** | AI identifies best 15–60s moments; extracts single and composite clips with 6 variants per short |
245
+ | 7 | **Medium Clips** | AI identifies 1–3 min standalone segments with crossfade transitions |
246
+ | 8 | **Chapters** | AI detects topic boundaries; outputs JSON, Markdown, FFmetadata, and YouTube timestamps |
247
+ | 9 | **Summary** | AI writes a Markdown README with captured screenshots |
248
+ | 10 | **Social Media** | Platform-tailored posts for TikTok, YouTube, Instagram, LinkedIn, and X |
249
+ | 11 | **Short Posts** | Per-short social media posts for all 5 platforms |
250
+ | 12 | **Medium Clip Posts** | Per-medium-clip social media posts for all 5 platforms |
251
+ | 13 | **Blog** | Dev.to blog post with frontmatter, web-sourced links via Exa |
252
+ | 14 | **Queue Build** | Builds publish queue from social posts with scheduled slots |
253
+ | 15 | **Git Push** | Auto-commits and pushes to `origin main` |
254
+
255
+ Each stage can be independently skipped with `--no-*` flags. A stage failure does not abort the pipeline — subsequent stages proceed with whatever data is available.
256
+
257
+ ---
258
+
259
+ ## 🤖 LLM Providers
260
+
261
+ VidPipe supports multiple LLM providers:
262
+
263
+ | Provider | Env Var | Default Model | Notes |
264
+ |----------|---------|---------------|-------|
265
+ | `copilot` (default) | — | Claude Opus 4.6 | Uses GitHub Copilot auth |
266
+ | `openai` | `OPENAI_API_KEY` | gpt-4o | Direct OpenAI API |
267
+ | `claude` | `ANTHROPIC_API_KEY` | claude-opus-4.6 | Direct Anthropic API |
268
+
269
+ Set `LLM_PROVIDER` in your `.env` or pass via CLI. Override model with `LLM_MODEL`.
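
As a concrete sketch, selecting the Anthropic provider in `.env` (provider and model names taken from the table above; the key value is a placeholder):

```env
LLM_PROVIDER=claude
LLM_MODEL=claude-opus-4.6
ANTHROPIC_API_KEY=your-anthropic-key-here
```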
270
+
271
+ The pipeline tracks token usage and estimated cost across all providers, displaying a summary at the end of each run.
272
+
273
+ ---
274
+
275
+ ## ⚙️ Configuration
276
+
277
+ Configuration is loaded from CLI flags → environment variables → `.env` file → defaults.
278
+
279
+ ```env
280
+ # .env
281
+ OPENAI_API_KEY=sk-your-key-here
282
+ WATCH_FOLDER=/path/to/recordings
283
+ OUTPUT_DIR=/path/to/output
284
+ # EXA_API_KEY=your-exa-key # Optional: enables web search in social/blog posts
285
+ # BRAND_PATH=./brand.json # Optional: path to brand voice config
286
+ # FFMPEG_PATH=/usr/local/bin/ffmpeg
287
+ # FFPROBE_PATH=/usr/local/bin/ffprobe
288
+ # LATE_API_KEY=sk_your_key_here # Optional: Late API for social publishing
289
+ ```
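
The `brand.json` schema is not documented in this README; as an illustrative sketch only — every field name below is an assumption based on the Brand Voice feature description, not the actual schema (see [Brand Customization](./docs/brand-customization.md) for the real format):

```json
{
  "tone": "casual, developer-focused",
  "vocabulary": ["vidpipe", "agentic pipeline"],
  "hashtags": ["#devtools", "#buildinpublic"]
}
```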
290
+
291
+ Social media publishing is configured via `schedule.json` and the Late API. See [Social Publishing Guide](./docs/social-publishing.md) for details.
292
+
293
+ ---
294
+
295
+ ## 📚 Documentation
296
+
297
+ | Guide | Description |
298
+ |-------|-------------|
299
+ | [Getting Started](./docs/getting-started.md) | Prerequisites, installation, and first run |
300
+ | [Configuration](./docs/configuration.md) | All CLI flags, env vars, skip options, and examples |
301
+ | [FFmpeg Setup](./docs/ffmpeg-setup.md) | Platform-specific install (Windows, macOS, Linux, ARM64) |
302
+ | [Brand Customization](./docs/brand-customization.md) | Customize AI voice, vocabulary, hashtags, and content style |
303
+ | [Social Publishing](./docs/social-publishing.md) | Review, schedule, and publish social posts via Late API |
304
+
305
+ ---
306
+
307
+ ## 🏗️ Architecture
308
+
309
+ Agentic architecture built on the [GitHub Copilot SDK](https://github.com/github/copilot-sdk) — each editing task is handled by a specialized AI agent:
310
+
311
+ ```mermaid
312
+ graph TD
313
+ BP[🧠 BaseAgent] --> SRA[SilenceRemovalAgent]
314
+ BP --> SA[SummaryAgent]
315
+ BP --> SHA[ShortsAgent]
316
+ BP --> MVA[MediumVideoAgent]
317
+ BP --> CA[ChapterAgent]
318
+ BP --> SMA[SocialMediaAgent]
319
+ BP --> BA[BlogAgent]
320
+
321
+ SRA -->|tools| T1[detect_silence, decide_removals]
322
+ SHA -->|tools| T2[plan_shorts]
323
+ MVA -->|tools| T3[plan_medium_clips]
324
+ CA -->|tools| T4[generate_chapters]
325
+ SA -->|tools| T5[capture_frame, write_summary]
326
+ SMA -->|tools| T6[search_links, create_posts]
327
+ BA -->|tools| T7[search_web, write_blog]
328
+
329
+ style BP fill:#1e3a5f,stroke:#60a5fa,color:#fff
330
+ ```
331
+
332
+ Each agent communicates with the LLM through structured tool calls, ensuring reliable, parseable outputs.
333
+
334
+ ---
335
+
336
+ ## 🛠️ Tech Stack
337
+
338
+ | Technology | Purpose |
339
+ |------------|---------|
340
+ | [TypeScript](https://www.typescriptlang.org/) | Language (ES2022, ESM) |
341
+ | [GitHub Copilot SDK](https://github.com/github/copilot-sdk) | AI agent framework |
342
+ | [OpenAI Whisper](https://platform.openai.com/docs/guides/speech-to-text) | Speech-to-text |
343
+ | [FFmpeg](https://ffmpeg.org/) | Video/audio processing |
344
+ | [Sharp](https://sharp.pixelplumbing.com/) | Image analysis (webcam detection) |
345
+ | [Commander.js](https://github.com/tj/commander.js) | CLI framework |
346
+ | [Chokidar](https://github.com/paulmillr/chokidar) | File system watching |
347
+ | [Winston](https://github.com/winstonjs/winston) | Logging |
348
+ | [Exa AI](https://exa.ai/) | Web search for social posts and blog |
349
+
350
+ ---
351
+
352
+ ## 🗺️ Roadmap
353
+
354
+ - [x] **Automated social posting** — Publish directly to platforms via Late API
355
+ - [ ] **Multi-language support** — Transcription and summaries in multiple languages
356
+ - [ ] **Custom templates** — User-defined Markdown & social post templates
357
+ - [ ] **Web dashboard** — Browser UI for reviewing and editing outputs
358
+ - [ ] **Batch processing** — Process an entire folder of existing videos
359
+ - [ ] **Custom short criteria** — Configure what makes a "good" short for your content
360
+ - [ ] **Thumbnail generation** — Auto-generate branded thumbnails for shorts
361
+
362
+ ---
363
+
364
+ ## 🔧 Troubleshooting
365
+
366
+ ### `No binary found for architecture` during install
367
+
368
+ `ffmpeg-static` (an optional dependency) bundles FFmpeg for common platforms. On unsupported architectures, it skips gracefully and vidpipe falls back to your system FFmpeg.
369
+
370
+ **Fix:** Install FFmpeg on your system:
371
+ - **Windows:** `winget install Gyan.FFmpeg`
372
+ - **macOS:** `brew install ffmpeg`
373
+ - **Linux:** `sudo apt install ffmpeg` (Debian/Ubuntu) or `sudo dnf install ffmpeg` (Fedora)
374
+
375
+ You can also point to a custom binary: `export FFMPEG_PATH=/path/to/ffmpeg`
376
+
377
+ Run `vidpipe --doctor` to verify your setup.
378
+
379
+ ---
380
+
381
+ ## 📄 License
382
+
383
+ ISC © [htekdev](https://github.com/htekdev)
384
+