@thunderkiller/video-clipper 1.2.0 → 1.3.1

Files changed (91)
  1. package/CHANGELOG.md +13 -0
  2. package/LICENSE +15 -0
  3. package/package.json +1 -1
  4. package/.github/workflows/ci.yml +0 -42
  5. package/.github/workflows/release.yml +0 -76
  6. package/.husky/pre-commit +0 -3
  7. package/.prettierignore +0 -6
  8. package/.prettierrc +0 -7
  9. package/.releaserc.json +0 -21
  10. package/AGENTS.md +0 -122
  11. package/docs/free-models.md +0 -78
  12. package/docs/plan.md +0 -442
  13. package/docs/refactorPhases.md +0 -105
  14. package/docs/yt-downloader.md +0 -440
  15. package/requirements.txt +0 -5
  16. package/scripts/detect_events.py +0 -81
  17. package/scripts/detect_events_whisper.py +0 -101
  18. package/scripts/transcribe_whisper.py +0 -70
  19. package/src/cli.ts +0 -186
  20. package/src/config/env.ts +0 -18
  21. package/src/config/index.ts +0 -2
  22. package/src/index.ts +0 -46
  23. package/src/pipeline/runner.ts +0 -147
  24. package/src/pipeline/stages/audioProcessor.ts +0 -127
  25. package/src/pipeline/stages/clipExporter.ts +0 -76
  26. package/src/pipeline/stages/segmentAnalyzer.ts +0 -72
  27. package/src/pipeline/stages/segmentSelector.ts +0 -39
  28. package/src/pipeline/stages/videoResolver.ts +0 -44
  29. package/src/services/audioAnalyzers/base.ts +0 -32
  30. package/src/services/audioAnalyzers/factory.ts +0 -69
  31. package/src/services/audioAnalyzers/gemini.ts +0 -136
  32. package/src/services/audioAnalyzers/index.ts +0 -6
  33. package/src/services/audioAnalyzers/whisper.ts +0 -80
  34. package/src/services/audioAnalyzers/yamnet.ts +0 -54
  35. package/src/services/audioDownloader/index.ts +0 -102
  36. package/src/services/chunkBuilder/index.ts +0 -82
  37. package/src/services/clipGenerator/index.ts +0 -210
  38. package/src/services/clipRefiner/index.ts +0 -141
  39. package/src/services/eventDetector/index.ts +0 -68
  40. package/src/services/llmAnalyzer/LLMAnalyzer.ts +0 -98
  41. package/src/services/llmAnalyzer/index.ts +0 -231
  42. package/src/services/metadataExtractor/index.ts +0 -83
  43. package/src/services/segmentRanker/index.ts +0 -88
  44. package/src/services/signalMerger/index.ts +0 -53
  45. package/src/services/transcriptAnalyzers/base.ts +0 -26
  46. package/src/services/transcriptAnalyzers/factory.ts +0 -66
  47. package/src/services/transcriptAnalyzers/gemini.ts +0 -24
  48. package/src/services/transcriptAnalyzers/index.ts +0 -6
  49. package/src/services/transcriptAnalyzers/whisper.ts +0 -68
  50. package/src/services/transcriptAnalyzers/ytdlp.ts +0 -19
  51. package/src/services/transcriptDetector/index.ts +0 -122
  52. package/src/services/transcriptFetcher/index.ts +0 -147
  53. package/src/services/urlParser/index.ts +0 -52
  54. package/src/services/videoDownloader/index.ts +0 -268
  55. package/src/types/analyzer.ts +0 -23
  56. package/src/types/audio.ts +0 -19
  57. package/src/types/cache.ts +0 -8
  58. package/src/types/cli.ts +0 -22
  59. package/src/types/config.ts +0 -151
  60. package/src/types/downloader.ts +0 -15
  61. package/src/types/factory.ts +0 -3
  62. package/src/types/index.ts +0 -40
  63. package/src/types/pipeline.ts +0 -60
  64. package/src/types/segment.ts +0 -43
  65. package/src/types/transcript.ts +0 -22
  66. package/src/types/video.ts +0 -18
  67. package/src/utils/cache.ts +0 -224
  68. package/src/utils/chunker.ts +0 -60
  69. package/src/utils/dumper.ts +0 -41
  70. package/src/utils/format.ts +0 -10
  71. package/src/utils/logger.ts +0 -17
  72. package/src/utils/modelFactory.ts +0 -71
  73. package/src/utils/redactConfig.ts +0 -23
  74. package/src/utils/sliceAudio.ts +0 -35
  75. package/test-trigger.txt +0 -1
  76. package/tests/analyzerFactory.test.ts +0 -146
  77. package/tests/audioEventDetector.test.ts +0 -69
  78. package/tests/cache.test.ts +0 -203
  79. package/tests/chunkBuilder.test.ts +0 -146
  80. package/tests/chunker.test.ts +0 -95
  81. package/tests/eventDetector.test.ts +0 -103
  82. package/tests/llmAnalyzer.test.ts +0 -283
  83. package/tests/segmentRanker.test.ts +0 -133
  84. package/tests/setup.ts +0 -48
  85. package/tests/signalMerger.test.ts +0 -197
  86. package/tests/transcriptDetector.test.ts +0 -150
  87. package/tests/transcriptFetcher.test.ts +0 -179
  88. package/tests/urlParser.test.ts +0 -70
  89. package/tsconfig.json +0 -16
  90. package/tsconfig.test.json +0 -8
  91. package/vitest.config.ts +0 -8
package/docs/plan.md DELETED
@@ -1,442 +0,0 @@
1
- Here's the full updated build plan in markdown:
2
-
3
- ```markdown
4
- # YouTube Clip Finder — Build Plan v2.0
5
-
6
- ### with Audio Event Detection
7
-
8
- ---
9
-
10
- ## Legend
11
-
12
- | Symbol | Meaning |
13
- | ---------- | ------------------- |
14
- | ✅ Done | Already built |
15
- | 🔧 To Do | Not built yet |
16
- | 🆕 New | Added in v2 |
17
- | ⚡ Upgrade | Existing + extended |
18
-
19
- ---
20
-
21
- ## 1. System Architecture
22
-
23
- The v2 pipeline adds audio event detection as a parallel signal alongside transcript analysis. Both signals feed into the merger before ranking.
24
- ```
25
-
26
- User Input (YouTube URL)
27
-
28
-
29
- Module 1 — URL Parser
30
-
31
-
32
- Module 2 — Video Metadata Extractor
33
-
34
- ├─────────────────────────────────┐
35
- ▼ ▼
36
- Module 3 — Transcript Fetcher Module 3b — Audio Downloader ★ NEW
37
- │ │
38
- ▼ ▼
39
- Module 4 — LLM Chunk Builder Module 3c — Audio Event Detector ★ NEW
40
- │ │
41
- ▼ │
42
- Module 5 — LLM Segment Analyzer │
43
- │ │
44
- └──────────────┬──────────────────┘
45
-
46
- Module 5b — Signal Merger ★ NEW
47
-
48
-
49
- Module 6 — Segment Ranking ⚡ UPGRADED
50
-
51
-
52
- Module 7 — Clip Refinement Pass
53
-
54
-
55
- Module 8 — Video Downloader
56
-
57
-
58
- Module 9 — Clip Generator (optional)
59
-
60
- ````
61
-
62
- ---
63
-
64
- ## 2. Module Status Overview
65
-
66
- | # | Module | Status |
67
- |---|--------|--------|
68
- | 1 | URL Parser | ✅ Done |
69
- | 2 | Video Metadata Extractor | ✅ Done |
70
- | 3 | Transcript Fetcher + Micro-block Grouper | ✅ Done |
71
- | 3b | Audio Downloader (yt-dlp audio-only) | 🆕 New |
72
- | 3c | Audio Event Detector (Gemini primary + YAMNet fallback) | 🆕 New |
73
- | 4 | LLM Chunk Builder | ✅ Done |
74
- | 5 | LLM Segment Analyzer | ✅ Done |
75
- | 5b | Signal Merger (transcript + audio events) | 🆕 New |
76
- | 6 | Segment Ranking | ⚡ Upgrade |
77
- | 7 | Clip Refinement Pass | ✅ Done |
78
- | 8 | Video Downloader | ✅ Done |
79
- | 9 | Clip Generator (ffmpeg) | ✅ Done |
80
-
81
- ---
82
-
83
- ## 3. Existing Modules
84
-
85
- Modules 1–5 and 7–9 are fully built per v1 spec. No changes required.
86
-
87
- ---
88
-
89
- ## 4. New Modules — Audio Event Detection
90
-
91
- ### Module 3b — Audio Downloader 🆕
92
-
93
- **Status: To Do**
94
-
95
- Downloads only the audio track from YouTube using yt-dlp, in parallel with transcript fetching. The output is 16 kHz mono WAV, the input format YAMNet requires.
96
-
97
- ```ts
98
- import { execa } from 'execa';
99
- import * as fs from 'fs';
100
- export async function downloadAudio(videoId: string, outputDir: string): Promise<string> {
101
- const outputPath = `${outputDir}/${videoId}_audio.wav`;
102
-
103
- if (fs.existsSync(outputPath)) {
104
- console.log(`[audio] Cache hit: ${outputPath}`);
105
- return outputPath;
106
- }
107
-
108
- await execa('yt-dlp', [
109
- '-x',
110
- '--audio-format', 'wav',
111
- '--audio-quality', '0',
112
- '--postprocessor-args', '-ar 16000 -ac 1', // 16kHz mono for YAMNet
113
- '-o', outputPath,
114
- `https://youtube.com/watch?v=${videoId}`,
115
- ]);
116
-
117
- return outputPath;
118
- }
119
- ````
120
-
121
- ---
122
-
123
- ### Module 3c — Audio Event Detector (Gemini primary + Whisper local + YAMNet legacy) 🆕
124
-
125
- **Status: To Do**
126
-
127
- Three-tier audio event detection. Gemini 1.5 Flash is tried first: it understands game context and needs no local setup. If Gemini fails or is disabled, Whisper runs locally, transcribing the audio chunk and scanning the transcript for the hype keywords defined per game profile. YAMNet remains available as a legacy option via `AUDIO_PROVIDER=yamnet`.
128
-
129
- | | Gemini Flash (primary) | Whisper (local fallback) | YAMNet (legacy) |
130
- | ------------ | ---------------------------------------- | ------------------------------------------------ | ------------------------------------- |
131
- | Cost | ~$0.001/video (free tier: 60/day) | Free, always | Free, always |
132
- | Setup | Just an API key | pip install openai-whisper | pip install tensorflow tensorflow-hub |
133
- | Game context | Understands "clutch", "ace", boss phases | Speech transcript + keyword matching per profile | Class IDs only (gunshot, explosion) |
134
- | Accuracy | High — semantic understanding | Medium-high — depends on speech clarity | Medium — fixed class threshold |
135
- | Offline | No | Yes | Yes |
136
-
137
- #### Tier 1 — Gemini 1.5 Flash (primary)
138
-
139
- ```ts
140
- import { GoogleGenerativeAI } from '@google/generative-ai';
141
- import * as fs from 'fs';
142
-
143
- const genai = new GoogleGenerativeAI(process.env.GOOGLE_GENERATIVE_AI_API_KEY!);
144
-
145
- export async function detectEventsGemini(
146
- audioPath: string,
147
- gameProfile: string,
148
- chunkOffsetSec: number,
149
- ): Promise<AudioEvent[]> {
150
- const model = genai.getGenerativeModel({ model: 'gemini-1.5-flash' });
151
-
152
- const audioData = fs.readFileSync(audioPath);
153
- const base64Audio = audioData.toString('base64');
154
-
155
- const prompt = `
156
- You are analyzing audio from a ${gameProfile} gaming video.
157
- Identify ALL significant game events: kills, deaths, explosions,
158
- ability uses, boss phases, crowd reactions, clutch moments.
159
- For each event return JSON: { time_sec, event, confidence }
160
- time_sec is relative to the START of this audio chunk.
161
- Return ONLY a JSON array, no explanation.
162
- `;
163
-
164
- const result = await model.generateContent([
165
- { inlineData: { mimeType: 'audio/wav', data: base64Audio } },
166
- prompt,
167
- ]);
168
-
169
- const events = JSON.parse(result.response.text());
170
-
171
- // Offset timestamps to absolute video time
172
- return events.map((e: any) => ({
173
- ...e,
174
- time: e.time_sec + chunkOffsetSec,
175
- source: 'gemini',
176
- }));
177
- }
178
- ```
179
-
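The Tier 1 snippet passes the model reply straight to `JSON.parse`, which assumes the reply is a bare JSON array. A minimal sketch of a more defensive parse follows, assuming the reply may arrive wrapped in a Markdown code fence or contain malformed entries. `parseEventArray`, `isRawEvent`, and `RawEvent` are hypothetical names used for illustration and are not part of the package.

```typescript
// Hypothetical shape of one entry in the model reply (mirrors the prompt's
// "{ time_sec, event, confidence }" contract).
interface RawEvent {
  time_sec: number;
  event: string;
  confidence: number;
}

// Type guard: true only when all three expected fields have the right type.
function isRawEvent(value: unknown): value is RawEvent {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.time_sec === 'number' &&
    typeof v.event === 'string' &&
    typeof v.confidence === 'number'
  );
}

// Hypothetical helper: returns [] instead of throwing on unusable replies.
function parseEventArray(text: string): RawEvent[] {
  const fence = '`'.repeat(3); // Markdown code-fence marker
  let body = text.trim();

  if (body.startsWith(fence)) {
    // Drop the opening fence line (which may carry a language tag).
    const firstNewline = body.indexOf('\n');
    body = firstNewline === -1 ? '' : body.slice(firstNewline + 1).trim();
    // Drop the closing fence if present.
    if (body.endsWith(fence)) {
      body = body.slice(0, -fence.length);
    }
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return []; // unparseable reply: treat as "no events"
  }
  return Array.isArray(parsed) ? parsed.filter(isRawEvent) : [];
}
```

Under these assumptions, the result could be mapped to absolute timestamps the same way the snippet above does with `chunkOffsetSec`.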
180
- #### Tier 2 — YAMNet fallback (Python)
181
-
182
- ```python
183
- import tensorflow_hub as hub
184
- import soundfile as sf
185
- import numpy as np, json, sys
186
-
187
- model = hub.load('https://tfhub.dev/google/yamnet/1')
188
-
189
- GAME_EVENTS = {
190
-     67: 'gunshot', 366: 'explosion',
191
-     389: 'crowd_cheering', 63: 'gunfire_burst',
192
- }
193
-
194
- def detect_events(audio_path, threshold=0.30):
195
-     wav, sr = sf.read(audio_path, dtype='float32')
196
-     scores, _, _ = model(wav)
197
-     events = []
198
-     for i, frame in enumerate(scores.numpy()):
199
-         for cid, label in GAME_EVENTS.items():
200
-             if frame[cid] > threshold:
201
-                 events.append({ 'time': round(i * 0.48, 2),
202
-                                 'event': label,
203
-                                 'confidence': float(frame[cid]),
204
-                                 'source': 'yamnet' })
205
-     return cluster_events(events, gap=1.5)
206
-
207
- print(json.dumps(detect_events(sys.argv[1], float(sys.argv[2]))))
208
- ```
209
-
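The YAMNet script calls `cluster_events(events, gap=1.5)`, but the snippet does not define it. One plausible reading is that consecutive detections of the same label, spaced no more than `gap` seconds apart, collapse into a single event that keeps the most confident frame. The sketch below illustrates that reading in TypeScript (the package's main language). `clusterEvents` and its behaviour are assumptions, not the script's actual helper.

```typescript
interface AudioEvent {
  time: number;
  event: string;
  confidence: number;
  source: string;
}

// Hypothetical clustering: merge same-label detections whose spacing is at
// most gapSec, keeping the highest-confidence detection of each run.
function clusterEvents(events: AudioEvent[], gapSec = 1.5): AudioEvent[] {
  const sorted = events.slice().sort((a, b) => a.time - b.time);
  const clustered: AudioEvent[] = [];
  // Per label: time of the last detection seen and the index of the
  // current run's representative in `clustered`.
  const lastSeen: Record<string, { time: number; index: number }> = {};

  for (const evt of sorted) {
    const prev = lastSeen[evt.event];
    if (prev !== undefined && evt.time - prev.time <= gapSec) {
      // Same run: replace the representative if this frame is more confident.
      if (evt.confidence > clustered[prev.index].confidence) {
        clustered[prev.index] = evt;
      }
      lastSeen[evt.event] = { time: evt.time, index: prev.index };
    } else {
      // New run for this label.
      clustered.push(evt);
      lastSeen[evt.event] = { time: evt.time, index: clustered.length - 1 };
    }
  }
  // Representatives may have moved in time, so restore chronological order.
  return clustered.sort((a, b) => a.time - b.time);
}
```

With YAMNet's 0.48 s frame spacing (as used in the script), a sustained burst of gunfire would otherwise produce one event per frame; clustering of this kind reduces it to one candidate per burst.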
210
- #### Fallback orchestrator (TypeScript)
211
-
212
- ```ts
213
- export async function detectAudioEvents(
214
- audioPath: string,
215
- gameProfile: string,
216
- chunkOffsetSec: number,
217
- ): Promise<AudioEvent[]> {
218
- // Try Gemini first
219
- if (config.AUDIO_PROVIDER !== 'yamnet' && config.GOOGLE_GENERATIVE_AI_API_KEY) {
220
- try {
221
- const events = await detectEventsGemini(audioPath, gameProfile, chunkOffsetSec);
222
- console.log(`[audio] Gemini detected ${events.length} events`);
223
- return events;
224
- } catch (err) {
225
- console.warn('[audio] Gemini failed, falling back to YAMNet:', (err as Error).message);
226
- }
227
- }
228
-
229
- // Fallback: YAMNet
230
- const { stdout } = await execa('python', [
231
- 'scripts/detect_events.py',
232
- audioPath,
233
- String(config.AUDIO_CONFIDENCE_THRESHOLD),
234
- ]);
235
- const events = JSON.parse(stdout) as AudioEvent[];
236
- console.log(`[audio] YAMNet detected ${events.length} events`);
237
- return events;
238
- }
239
- ```
240
-
241
- #### Output format (same shape from both providers)
242
-
243
- ```json
244
- {
245
- "time": 142.08,
246
- "event": "gunshot",
247
- "confidence": 0.74,
248
- "source": "gemini"
249
- }
250
- ```
251
-
252
- ---
253
-
254
- ### Module 5b — Signal Merger 🆕
255
-
256
- **Status: To Do**
257
-
258
- Merges LLM transcript scores and audio event detections into unified clip candidates.
259
-
260
- **Merging logic:**
261
-
262
- - Audio event at time T → clip window: `T - 5s` to `T + 15s`
263
- - LLM segment within ±10s of an audio event → score boosted by +2
264
- - Audio event with no nearby LLM signal → still a candidate (score = confidence × 10)
265
- - LLM segment with no audio event → unchanged (existing behavior)
266
-
267
- ```ts
268
- interface MergedCandidate {
269
- start: number;
270
- end: number;
271
- score: number;
272
- source: 'transcript' | 'audio' | 'both';
273
- reason: string;
274
- audio_event?: string;
275
- }
276
-
277
- export function mergeSignals(
278
- llmSegments: LLMSegment[],
279
- audioEvents: AudioEvent[],
280
- boostWindow = 10,
281
- ): MergedCandidate[] {
282
- const candidates: MergedCandidate[] = [];
283
-
284
- // Pass 1: LLM segments (possibly boosted by nearby audio)
285
- for (const seg of llmSegments) {
286
- const nearby = audioEvents.filter((e) => Math.abs(e.time - seg.clip_start) < boostWindow);
287
- candidates.push({
288
- start: seg.clip_start,
289
- end: seg.clip_end,
290
- score: seg.score + (nearby.length > 0 ? 2 : 0),
291
- source: nearby.length > 0 ? 'both' : 'transcript',
292
- reason: seg.reason,
293
- audio_event: nearby[0]?.event,
294
- });
295
- }
296
-
297
- // Pass 2: Audio-only events (the gap filler — silent kills, boss deaths)
298
- for (const evt of audioEvents) {
299
- const hasLLM = llmSegments.some((s) => Math.abs(s.clip_start - evt.time) < boostWindow);
300
- if (!hasLLM) {
301
- candidates.push({
302
- start: Math.max(0, evt.time - 5),
303
- end: evt.time + 15,
304
- score: Math.round(evt.confidence * 10),
305
- source: 'audio',
306
- reason: `Audio event: ${evt.event} (${(evt.confidence * 100).toFixed(0)}% confidence)`,
307
- audio_event: evt.event,
308
- });
309
- }
310
- }
311
-
312
- return candidates;
313
- }
314
- ```
315
-
316
- ---
317
-
318
- ## 5. Upgraded Modules
319
-
320
- ### Module 6 — Segment Ranking ⚡
321
-
322
- **Status: Upgrade — extend existing ranking to handle MergedCandidate[]**
323
-
324
- Changes from v1:
325
-
326
- - Input is now `MergedCandidate[]` instead of `LLMSegment[]`
327
- - New `source` field in output: `'transcript'`, `'audio'`, or `'both'`
328
- - Deduplication window widened to ±8s for audio events
329
- - `audio_event` field passed through to final JSON output
330
-
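The plan lists the ranking changes but shows no code for them. A minimal sketch of a ranker over `MergedCandidate[]` follows, assuming candidates are ordered by descending score and that the ±8 s deduplication window is applied uniformly to every candidate's start time. The plan only states the window for audio events, so the uniform window is a simplification. `rankCandidates`, `RankedSegment`, and `topN` are hypothetical names, and the existing `segmentRanker` may work differently.

```typescript
interface MergedCandidate {
  start: number;
  end: number;
  score: number;
  source: 'transcript' | 'audio' | 'both';
  reason: string;
  audio_event?: string;
}

interface RankedSegment extends MergedCandidate {
  rank: number;
}

// Hypothetical ranker: highest score first, near-duplicates dropped.
function rankCandidates(
  candidates: MergedCandidate[],
  dedupWindowSec = 8,
  topN = 10,
): RankedSegment[] {
  const byScore = candidates.slice().sort((a, b) => b.score - a.score);
  const kept: MergedCandidate[] = [];

  for (const cand of byScore) {
    if (kept.length >= topN) break;
    // A candidate is a duplicate if a better-scored one starts within the window.
    const duplicate = kept.some((k) => Math.abs(k.start - cand.start) <= dedupWindowSec);
    if (!duplicate) kept.push(cand);
  }

  // source and audio_event pass through unchanged; rank is 1-based.
  return kept.map((c, i) => ({ ...c, rank: i + 1 }));
}
```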
331
- ---
332
-
333
- ## 6. New Config Options (.env)
334
-
335
- ```env
336
- # Audio Event Detection
337
- AUDIO_DETECTION_ENABLED=true # set false to skip (transcript-only mode)
338
- AUDIO_PROVIDER=both # gemini | yamnet | whisper | both (gemini with whisper fallback)
339
- AUDIO_CONFIDENCE_THRESHOLD=0.30 # confidence minimum (0-1); for Whisper: 1.0=exact, 0.8=partial
340
- AUDIO_WHISPER_MODEL=medium # tiny | base | small | medium | large-v3
341
- AUDIO_CLIP_PRE_ROLL=5 # seconds before event to start clip
342
- AUDIO_CLIP_POST_ROLL=15 # seconds after event to end clip
343
- AUDIO_LLM_BOOST_WINDOW=10 # seconds within which audio boosts LLM score
344
- AUDIO_LLM_SCORE_BOOST=2 # score boost when audio+LLM both signal
345
-
346
- # Game Profile
347
- GAME_PROFILE=valorant # valorant | fps | boss_fight | general
348
- ```
349
-
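The checklist in the plan mentions adding these variables to a zod `ConfigSchema`. As a dependency-free illustration of the same idea, the sketch below parses a subset of the `AUDIO_*` variables from an environment-like object, using the defaults shown in the block above. `parseAudioConfig`, `numberFrom`, and the `AudioConfig` field names are hypothetical and do not reflect the package's actual schema.

```typescript
type AudioProvider = 'gemini' | 'yamnet' | 'whisper' | 'both';

// Hypothetical typed view of the AUDIO_* settings.
interface AudioConfig {
  enabled: boolean;
  provider: AudioProvider;
  confidenceThreshold: number;
  preRollSec: number;
  postRollSec: number;
  boostWindowSec: number;
  scoreBoost: number;
}

type Env = Record<string, string | undefined>;

const PROVIDERS: AudioProvider[] = ['gemini', 'yamnet', 'whisper', 'both'];

// Read a numeric variable, falling back to a default when unset or blank.
function numberFrom(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  if (!isFinite(value)) throw new Error(`${key} must be a number, got "${raw}"`);
  return value;
}

function parseAudioConfig(env: Env): AudioConfig {
  const provider = (env.AUDIO_PROVIDER ?? 'both') as AudioProvider;
  if (PROVIDERS.indexOf(provider) === -1) {
    throw new Error(`AUDIO_PROVIDER must be one of ${PROVIDERS.join(' | ')}`);
  }

  const confidenceThreshold = numberFrom(env, 'AUDIO_CONFIDENCE_THRESHOLD', 0.3);
  if (confidenceThreshold < 0 || confidenceThreshold > 1) {
    throw new Error('AUDIO_CONFIDENCE_THRESHOLD must be between 0 and 1');
  }

  return {
    // Anything other than the literal string "false" leaves detection on.
    enabled: (env.AUDIO_DETECTION_ENABLED ?? 'true') !== 'false',
    provider,
    confidenceThreshold,
    preRollSec: numberFrom(env, 'AUDIO_CLIP_PRE_ROLL', 5),
    postRollSec: numberFrom(env, 'AUDIO_CLIP_POST_ROLL', 15),
    boostWindowSec: numberFrom(env, 'AUDIO_LLM_BOOST_WINDOW', 10),
    scoreBoost: numberFrom(env, 'AUDIO_LLM_SCORE_BOOST', 2),
  };
}
```

In a Node process this could be called as `parseAudioConfig(process.env)`; failing fast on an unknown provider or out-of-range threshold surfaces misconfiguration before any download starts.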
350
- ---
351
-
352
- ## 7. Game Profiles 🆕
353
-
354
- | Profile | YAMNet classes boosted | LLM keyword hints |
355
- | ---------- | ----------------------------------------------- | ------------------------------------ |
356
- | valorant | gunshot, gunfire_burst, explosion | ace, clutch, defuse, spike, 1v1 |
357
- | boss_fight | explosion, crowd_cheering, battle_cry | phase, dead, down, finally, let's go |
358
- | fps | gunshot, gunfire_burst, explosion, crowd_booing | kill, streak, headshot, collateral |
359
- | general | crowd_cheering, applause | insane, crazy, no way, let's go |
360
-
361
- ---
362
-
363
- ## 8. Updated Final Output Format
364
-
365
- ```json
366
- {
367
- "video_id": "abc123",
368
- "title": "Valorant ranked grind",
369
- "duration": 3640,
370
- "segments": [
371
- {
372
- "rank": 1,
373
- "start": 834,
374
- "end": 849,
375
- "score": 9,
376
- "source": "both",
377
- "reason": "Ace reaction with hype phrases",
378
- "audio_event": "gunshot"
379
- },
380
- {
381
- "rank": 2,
382
- "start": 1205,
383
- "end": 1220,
384
- "score": 7,
385
- "source": "audio",
386
- "reason": "Audio event: gunshot (81% confidence)",
387
- "audio_event": "gunshot"
388
- },
389
- {
390
- "rank": 3,
391
- "start": 420,
392
- "end": 455,
393
- "score": 8,
394
- "source": "transcript",
395
- "reason": "Funny storytelling moment"
396
- }
397
- ]
398
- }
399
- ```
400
-
401
- ---
402
-
403
- ## 9. New Dependencies
404
-
405
- ```bash
406
- pip install tensorflow tensorflow-hub soundfile numpy
407
- ```
408
-
409
- | Package | Language | Cost | Purpose |
410
- | --------------------- | -------- | --------- | ---------------------- |
411
- | tensorflow_hub | Python | Free | Load YAMNet model |
412
- | soundfile | Python | Free | Read WAV files |
413
- | numpy | Python | Free | Score array processing |
414
- | @google/generative-ai | Node | Free tier | Gemini audio analysis |
415
-
416
- ---
417
-
418
- ## 10. To-Do Checklist
419
-
420
- - ✅ Module 1 — URL Parser
421
- - ✅ Module 2 — Video Metadata Extractor
422
- - ✅ Module 3 — Transcript Fetcher + Micro-block Grouper
423
- - ✅ Module 4 — LLM Chunk Builder
424
- - ✅ Module 5 — LLM Segment Analyzer
425
- - ✅ Module 7 — Clip Refinement Pass
426
- - ✅ Module 8 — Video Downloader
427
- - ✅ Module 9 — Clip Generator
428
- - 🆕 Module 3b — Audio Downloader (yt-dlp audio-only, 16kHz mono WAV)
429
- - 🆕 Module 3c — Gemini Flash audio detector (chunked audio + game prompt)
430
- - 🆕 Module 3c — YAMNet fallback (Python script + Node execa caller)
431
- - 🆕 Module 3c — Fallback orchestrator (try Gemini → catch → YAMNet)
432
- - 🆕 Module 5b — Signal Merger (transcript + audio candidates)
433
- - ⚡ Module 6 — Segment Ranking (update to accept MergedCandidate[])
434
- - 🆕 Add AUDIO_PROVIDER env var (gemini | yamnet | both)
435
- - 🆕 Add GAME_PROFILE env var + profile configs
436
- - 🆕 Add all AUDIO\_\* env vars to .env and zod ConfigSchema
437
- - 🆕 Update final JSON output — add source and audio_event fields
438
- - 🆕 Cache audio event results per video ID
439
-
440
- ```
441
-
442
- ```
package/docs/refactorPhases.md DELETED
@@ -1,105 +0,0 @@
1
- # Refactor Phases
2
-
3
- This document records the three-phase refactor of `video-clipper` from a
4
- monolithic `run()` function in `src/index.ts` into a clean, layered
5
- pipeline architecture.
6
-
7
- ---
8
-
9
- ## Goals
10
-
11
- - Make each pipeline concern independently testable and readable
12
- - Eliminate the `run()` god-function (574 lines → ~40 lines in `index.ts`)
13
- - Enable true parallelism between LLM analysis (pass 1) and audio detection
14
- - Introduce a typed `Cache` class injected top-down, replacing ad-hoc free functions
15
- - Extract CLI parsing into its own module for reuse and testability
16
-
17
- ---
18
-
19
- ## Phase 1 — Shared Utilities
20
-
21
- **Files created:**
22
-
23
- | File | Purpose |
24
- | ---------------------- | -------------------------------------------------------------------------------------- |
25
- | `src/utils/cache.ts` | Refactored: `Cache` class + `@deprecated` legacy free-function shims |
26
- | `src/utils/chunker.ts` | New: `buildWindows(totalDuration, windowSec, overlapSec?)` generic time-window builder |
27
- | `src/cli.ts` | New: `CliArgs`, `parseArgs()`, `printUsage()` extracted from `index.ts` |
28
-
29
- **Key decisions:**
30
-
31
- - `Cache` is constructed once per run in `runner.ts` and injected into every
32
- stage that needs caching. `disabled = true` short-circuits all I/O
33
- (replaces the scattered ad-hoc `--no-cache` checks).
34
- - The legacy free-function shims (`readTranscriptCache`, `writeChunkCache`,
35
- etc.) delegate to `new Cache(cacheDir)` so `llmAnalyzer` and `clipRefiner`
36
- compile with zero changes.
37
- - `buildWindows` replaces the manual `for (offset += chunkLength)` loop that
38
- was inline in `run()` and is now shared by `audioProcessor`.
39
-
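The document gives only the signature `buildWindows(totalDuration, windowSec, overlapSec?)`. A minimal sketch of what such a time-window builder might look like follows, assuming it returns `{ start, end }` pairs in seconds, advances by `windowSec - overlapSec`, and clips the final window to `totalDuration`. The return shape and the validation rules are assumptions, since the real `src/utils/chunker.ts` is not shown here.

```typescript
// Hypothetical return shape: a half-open span [start, end) in seconds.
interface TimeWindow {
  start: number;
  end: number;
}

function buildWindows(totalDuration: number, windowSec: number, overlapSec = 0): TimeWindow[] {
  if (windowSec <= 0) throw new Error('windowSec must be positive');
  if (overlapSec < 0 || overlapSec >= windowSec) {
    throw new Error('overlapSec must be in [0, windowSec)');
  }

  const step = windowSec - overlapSec; // distance between window starts
  const windows: TimeWindow[] = [];

  for (let start = 0; start < totalDuration; start += step) {
    // Clip the last window so it never runs past the end of the media.
    const end = Math.min(start + windowSec, totalDuration);
    windows.push({ start, end });
    // Stop once the end is reached; later starts would only repeat covered audio.
    if (end === totalDuration) break;
  }
  return windows;
}
```

For example, under these assumptions `buildWindows(100, 40, 10)` yields the windows 0–40, 30–70, and 60–100.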
40
- **Gate:** `npm run build` clean, all 50 existing tests green.
41
-
42
- ---
43
-
44
- ## Phase 2 — Pipeline Stages
45
-
46
- **Files created under `src/pipeline/`:**
47
-
48
- | File | Stage | Wraps |
49
- | ------------------------------- | ------------ | ------------------------------------------------------------------------ |
50
- | `stages/videoResolver.ts` | 1 | `urlParser` + `metadataExtractor` |
51
- | `stages/transcriptProcessor.ts` | 2 | `transcriptFetcher` + `chunkBuilder` + `dumper` |
52
- | `stages/audioProcessor.ts` | 3 | `audioDownloader` + `audioEventDetector` + `sliceAudio` + `buildWindows` |
53
- | `stages/segmentAnalyzer.ts` | 4a + 4b | `llmAnalyzer` (pass 1) + `clipRefiner` (pass 2) |
54
- | `stages/segmentSelector.ts` | 5 | `signalMerger` + `segmentRanker` |
55
- | `stages/clipExporter.ts` | 6 | `videoDownloader` + `clipGenerator` |
56
- | `runner.ts` | Orchestrator | Composes all six stages |
57
-
58
- **Parallelism gain:**
59
- `analyzeSegments` (LLM pass 1) and `processAudio` now run concurrently via
60
- `Promise.all` in `runner.ts`. They are fully independent — audio detection
61
- only needs the video duration (from stage 1 metadata).
62
-
63
- ```
64
- Stage 1 ──► videoResolver
65
- Stage 2 ──► transcriptProcessor
66
- ┌── analyzeSegments (LLM pass 1) ─┐
67
- Stage 3+4a processAudio │ Promise.all
68
- └────────────────────────────────────┘
69
- Stage 5 ──► selectSegments (merge + rank)
70
- Stage 4b ──► refineRankedSegments (LLM pass 2)
71
- Stage 6 ──► exportClips (optional, --clip)
72
- ```
73
-
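The concurrency described above can be sketched as follows. The two stage functions here are stand-in stubs with simplified, assumed signatures (the real `analyzeSegments` and `processAudio` take more inputs), and `runAnalysisStages` is a hypothetical name. The sketch only illustrates the `Promise.all` pattern and is not the actual `runner.ts`.

```typescript
interface LLMSegment {
  clip_start: number;
  clip_end: number;
  score: number;
  reason: string;
}

interface AudioEvent {
  time: number;
  event: string;
  confidence: number;
  source: string;
}

// Stub for LLM pass 1: depends only on transcript chunks.
async function analyzeSegments(chunks: string[]): Promise<LLMSegment[]> {
  return chunks.map((text, i) => ({
    clip_start: i * 60,
    clip_end: i * 60 + 30,
    score: 5,
    reason: text,
  }));
}

// Stub for audio detection: depends only on the video duration.
async function processAudio(durationSec: number): Promise<AudioEvent[]> {
  return durationSec > 0
    ? [{ time: 12, event: 'gunshot', confidence: 0.8, source: 'stub' }]
    : [];
}

// Both stages start immediately; the function resolves once both finish.
// If either rejects, Promise.all rejects and the error propagates to the caller.
async function runAnalysisStages(chunks: string[], durationSec: number) {
  const [llmSegments, audioEvents] = await Promise.all([
    analyzeSegments(chunks),
    processAudio(durationSec),
  ]);
  return { llmSegments, audioEvents };
}
```

Because neither stage reads the other's output, the total wall-clock time is bounded by the slower of the two rather than their sum.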
74
- **Clip export modes** handled by `clipExporter`:
75
-
76
- 1. `--local-video` — cut directly with ffmpeg, no download
77
- 2. `--download-sections N` — download top-N clips via yt-dlp `--download-sections`, organize to outputs/
78
- 3. Default / `--download-sections all` — download full video, cut with ffmpeg
79
-
80
- **Gate:** `npm run build` clean, all 50 existing tests green.
81
-
82
- ---
83
-
84
- ## Phase 3 — Slim Entrypoint + New Tests
85
-
86
- **Files changed / created:**
87
-
88
- | File | Change |
89
- | ------------------------ | ------------------------------------------------------------------------------------- |
90
- | `src/index.ts` | Replaced 574-line god-function with ~40-line entrypoint delegating to `runPipeline()` |
91
- | `tests/chunker.test.ts` | 14 unit tests for `buildWindows` (edge cases, overlaps, clipping) |
92
- | `tests/cache.test.ts` | 16 unit tests for `Cache` class (round-trips, misses, disabled mode, corrupt data) |
93
- | `docs/refactorPhases.md` | This file |
94
-
95
- **`src/index.ts` now:**
96
-
97
- 1. Parses CLI args via `parseArgs`
98
- 2. Validates required args and prints usage on error
99
- 3. Calls `runPipeline(args)` and catches any thrown error → `log.error` + `process.exit(1)`
100
-
101
- All error handling that previously used `process.exit(1)` inline inside
102
- `run()` now propagates as thrown `Error` objects from the stages, keeping
103
- the pipeline stages pure (no direct `process.exit` calls).
104
-
105
- **Gate:** `npm run build` clean, all 80 tests green (50 original + 14 chunker + 16 cache).