@thunderkiller/video-clipper 1.2.0 → 1.3.1
- package/CHANGELOG.md +13 -0
- package/LICENSE +15 -0
- package/package.json +1 -1
- package/.github/workflows/ci.yml +0 -42
- package/.github/workflows/release.yml +0 -76
- package/.husky/pre-commit +0 -3
- package/.prettierignore +0 -6
- package/.prettierrc +0 -7
- package/.releaserc.json +0 -21
- package/AGENTS.md +0 -122
- package/docs/free-models.md +0 -78
- package/docs/plan.md +0 -442
- package/docs/refactorPhases.md +0 -105
- package/docs/yt-downloader.md +0 -440
- package/requirements.txt +0 -5
- package/scripts/detect_events.py +0 -81
- package/scripts/detect_events_whisper.py +0 -101
- package/scripts/transcribe_whisper.py +0 -70
- package/src/cli.ts +0 -186
- package/src/config/env.ts +0 -18
- package/src/config/index.ts +0 -2
- package/src/index.ts +0 -46
- package/src/pipeline/runner.ts +0 -147
- package/src/pipeline/stages/audioProcessor.ts +0 -127
- package/src/pipeline/stages/clipExporter.ts +0 -76
- package/src/pipeline/stages/segmentAnalyzer.ts +0 -72
- package/src/pipeline/stages/segmentSelector.ts +0 -39
- package/src/pipeline/stages/videoResolver.ts +0 -44
- package/src/services/audioAnalyzers/base.ts +0 -32
- package/src/services/audioAnalyzers/factory.ts +0 -69
- package/src/services/audioAnalyzers/gemini.ts +0 -136
- package/src/services/audioAnalyzers/index.ts +0 -6
- package/src/services/audioAnalyzers/whisper.ts +0 -80
- package/src/services/audioAnalyzers/yamnet.ts +0 -54
- package/src/services/audioDownloader/index.ts +0 -102
- package/src/services/chunkBuilder/index.ts +0 -82
- package/src/services/clipGenerator/index.ts +0 -210
- package/src/services/clipRefiner/index.ts +0 -141
- package/src/services/eventDetector/index.ts +0 -68
- package/src/services/llmAnalyzer/LLMAnalyzer.ts +0 -98
- package/src/services/llmAnalyzer/index.ts +0 -231
- package/src/services/metadataExtractor/index.ts +0 -83
- package/src/services/segmentRanker/index.ts +0 -88
- package/src/services/signalMerger/index.ts +0 -53
- package/src/services/transcriptAnalyzers/base.ts +0 -26
- package/src/services/transcriptAnalyzers/factory.ts +0 -66
- package/src/services/transcriptAnalyzers/gemini.ts +0 -24
- package/src/services/transcriptAnalyzers/index.ts +0 -6
- package/src/services/transcriptAnalyzers/whisper.ts +0 -68
- package/src/services/transcriptAnalyzers/ytdlp.ts +0 -19
- package/src/services/transcriptDetector/index.ts +0 -122
- package/src/services/transcriptFetcher/index.ts +0 -147
- package/src/services/urlParser/index.ts +0 -52
- package/src/services/videoDownloader/index.ts +0 -268
- package/src/types/analyzer.ts +0 -23
- package/src/types/audio.ts +0 -19
- package/src/types/cache.ts +0 -8
- package/src/types/cli.ts +0 -22
- package/src/types/config.ts +0 -151
- package/src/types/downloader.ts +0 -15
- package/src/types/factory.ts +0 -3
- package/src/types/index.ts +0 -40
- package/src/types/pipeline.ts +0 -60
- package/src/types/segment.ts +0 -43
- package/src/types/transcript.ts +0 -22
- package/src/types/video.ts +0 -18
- package/src/utils/cache.ts +0 -224
- package/src/utils/chunker.ts +0 -60
- package/src/utils/dumper.ts +0 -41
- package/src/utils/format.ts +0 -10
- package/src/utils/logger.ts +0 -17
- package/src/utils/modelFactory.ts +0 -71
- package/src/utils/redactConfig.ts +0 -23
- package/src/utils/sliceAudio.ts +0 -35
- package/test-trigger.txt +0 -1
- package/tests/analyzerFactory.test.ts +0 -146
- package/tests/audioEventDetector.test.ts +0 -69
- package/tests/cache.test.ts +0 -203
- package/tests/chunkBuilder.test.ts +0 -146
- package/tests/chunker.test.ts +0 -95
- package/tests/eventDetector.test.ts +0 -103
- package/tests/llmAnalyzer.test.ts +0 -283
- package/tests/segmentRanker.test.ts +0 -133
- package/tests/setup.ts +0 -48
- package/tests/signalMerger.test.ts +0 -197
- package/tests/transcriptDetector.test.ts +0 -150
- package/tests/transcriptFetcher.test.ts +0 -179
- package/tests/urlParser.test.ts +0 -70
- package/tsconfig.json +0 -16
- package/tsconfig.test.json +0 -8
- package/vitest.config.ts +0 -8
package/docs/plan.md
DELETED
|
@@ -1,442 +0,0 @@
|
|
|
1
|
-
Here's the full updated build plan in markdown:
|
|
2
|
-
|
|
3
|
-
```markdown
|
|
4
|
-
# YouTube Clip Finder — Build Plan v2.0
|
|
5
|
-
|
|
6
|
-
### with Audio Event Detection
|
|
7
|
-
|
|
8
|
-
---
|
|
9
|
-
|
|
10
|
-
## Legend
|
|
11
|
-
|
|
12
|
-
| Symbol | Meaning |
|
|
13
|
-
| ---------- | ------------------- |
|
|
14
|
-
| ✅ Done | Already built |
|
|
15
|
-
| 🔧 To Do | Not built yet |
|
|
16
|
-
| 🆕 New | Added in v2 |
|
|
17
|
-
| ⚡ Upgrade | Existing + extended |
|
|
18
|
-
|
|
19
|
-
---
|
|
20
|
-
|
|
21
|
-
## 1. System Architecture
|
|
22
|
-
|
|
23
|
-
The v2 pipeline adds audio event detection as a parallel signal alongside transcript analysis. Both signals feed into the merger before ranking.
|
|
24
|
-
```
|
|
25
|
-
|
|
26
|
-
User Input (YouTube URL)
        │
        ▼
Module 1 — URL Parser
        │
        ▼
Module 2 — Video Metadata Extractor
        │
        ├─────────────────────────────────┐
        ▼                                 ▼
Module 3 — Transcript Fetcher     Module 3b — Audio Downloader ★ NEW
        │                                 │
        ▼                                 ▼
Module 4 — LLM Chunk Builder      Module 3c — Audio Event Detector ★ NEW
        │                                 │
        ▼                                 │
Module 5 — LLM Segment Analyzer           │
        │                                 │
        └──────────────┬──────────────────┘
                       ▼
Module 5b — Signal Merger ★ NEW
        │
        ▼
Module 6 — Segment Ranking ⚡ UPGRADED
        │
        ▼
Module 7 — Clip Refinement Pass
        │
        ▼
Module 8 — Video Downloader
        │
        ▼
Module 9 — Clip Generator (optional)
|
|
59
|
-
|
|
60
|
-
```
|
|
61
|
-
|
|
62
|
-
---
|
|
63
|
-
|
|
64
|
-
## 2. Module Status Overview
|
|
65
|
-
|
|
66
|
-
| # | Module | Status |
|
|
67
|
-
|---|--------|--------|
|
|
68
|
-
| 1 | URL Parser | ✅ Done |
|
|
69
|
-
| 2 | Video Metadata Extractor | ✅ Done |
|
|
70
|
-
| 3 | Transcript Fetcher + Micro-block Grouper | ✅ Done |
|
|
71
|
-
| 3b | Audio Downloader (yt-dlp audio-only) | 🆕 New |
|
|
72
|
-
| 3c | Audio Event Detector (Gemini primary + YAMNet fallback) | 🆕 New |
|
|
73
|
-
| 4 | LLM Chunk Builder | ✅ Done |
|
|
74
|
-
| 5 | LLM Segment Analyzer | ✅ Done |
|
|
75
|
-
| 5b | Signal Merger (transcript + audio events) | 🆕 New |
|
|
76
|
-
| 6 | Segment Ranking | ⚡ Upgrade |
|
|
77
|
-
| 7 | Clip Refinement Pass | ✅ Done |
|
|
78
|
-
| 8 | Video Downloader | ✅ Done |
|
|
79
|
-
| 9 | Clip Generator (ffmpeg) | ✅ Done |
|
|
80
|
-
|
|
81
|
-
---
|
|
82
|
-
|
|
83
|
-
## 3. Existing Modules
|
|
84
|
-
|
|
85
|
-
Modules 1–5 and 7–9 are fully built per v1 spec. No changes required.
|
|
86
|
-
|
|
87
|
-
---
|
|
88
|
-
|
|
89
|
-
## 4. New Modules — Audio Event Detection
|
|
90
|
-
|
|
91
|
-
### Module 3b — Audio Downloader 🆕
|
|
92
|
-
|
|
93
|
-
**Status: To Do**
|
|
94
|
-
|
|
95
|
-
Downloads audio only from YouTube using yt-dlp. Runs in parallel with transcript fetching. Outputs 16 kHz mono WAV, the input format YAMNet requires.
|
|
96
|
-
|
|
97
|
-
```ts
|
|
98
|
-
import * as fs from 'fs';
import { execa } from 'execa';

export async function downloadAudio(videoId: string, outputDir: string): Promise<string> {
  const outputPath = `${outputDir}/${videoId}_audio.wav`;

  // Reuse a previously downloaded file instead of downloading again
  if (fs.existsSync(outputPath)) {
    console.log(`[audio] Cache hit: ${outputPath}`);
    return outputPath;
  }

  await execa('yt-dlp', [
    '-x',
    '--audio-format', 'wav',
    '--audio-quality', '0',
    '--postprocessor-args', '-ar 16000 -ac 1', // 16kHz mono for YAMNet
    '-o', outputPath,
    `https://youtube.com/watch?v=${videoId}`,
  ]);

  return outputPath;
}
|
|
119
|
-
```
|
|
120
|
-
|
|
121
|
-
---
|
|
122
|
-
|
|
123
|
-
### Module 3c — Audio Event Detector (Gemini primary + Whisper local + YAMNet legacy) 🆕
|
|
124
|
-
|
|
125
|
-
**Status: To Do**
|
|
126
|
-
|
|
127
|
-
Three-tier audio event detection. Gemini 1.5 Flash is tried first because it understands game context and needs no local setup. If Gemini fails or is disabled, Whisper runs locally: it transcribes the audio chunk and scans the resulting transcript for each game profile's hype keywords. YAMNet remains available as a legacy option via `AUDIO_PROVIDER=yamnet`.
|
|
128
|
-
|
|
129
|
-
| | Gemini Flash (primary) | Whisper (local fallback) | YAMNet (legacy) |
|
|
130
|
-
| ------------ | ---------------------------------------- | ------------------------------------------------ | ------------------------------------- |
|
|
131
|
-
| Cost | ~$0.001/video (free tier: 60/day) | Free, always | Free, always |
|
|
132
|
-
| Setup | Just an API key | pip install openai-whisper | pip install tensorflow tensorflow-hub |
|
|
133
|
-
| Game context | Understands "clutch", "ace", boss phases | Speech transcript + keyword matching per profile | Class IDs only (gunshot, explosion) |
|
|
134
|
-
| Accuracy | High — semantic understanding | Medium-high — depends on speech clarity | Medium — fixed class threshold |
|
|
135
|
-
| Offline | No | Yes | Yes |
|
|
136
|
-
|
|
137
|
-
#### Tier 1 — Gemini 1.5 Flash (primary)
|
|
138
|
-
|
|
139
|
-
```ts
|
|
140
|
-
import { GoogleGenerativeAI } from '@google/generative-ai';
import * as fs from 'fs';

const genai = new GoogleGenerativeAI(process.env.GOOGLE_GENERATIVE_AI_API_KEY!);

export async function detectEventsGemini(
  audioPath: string,
  gameProfile: string,
  chunkOffsetSec: number,
): Promise<AudioEvent[]> {
  const model = genai.getGenerativeModel({ model: 'gemini-1.5-flash' });

  const audioData = fs.readFileSync(audioPath);
  const base64Audio = audioData.toString('base64');

  const prompt = `
    You are analyzing audio from a ${gameProfile} gaming video.
    Identify ALL significant game events: kills, deaths, explosions,
    ability uses, boss phases, crowd reactions, clutch moments.
    For each event return JSON: { time_sec, event, confidence }
    time_sec is relative to the START of this audio chunk.
    Return ONLY a JSON array, no explanation.
  `;

  const result = await model.generateContent([
    { inlineData: { mimeType: 'audio/wav', data: base64Audio } },
    prompt,
  ]);

  const events = JSON.parse(result.response.text());

  // Offset timestamps to absolute video time
  return events.map((e: any) => ({
    ...e,
    time: e.time_sec + chunkOffsetSec,
    source: 'gemini',
  }));
}
|
|
178
|
-
```
|
|
179
|
-
|
|
180
|
-
#### Tier 2 — YAMNet fallback (Python)
|
|
181
|
-
|
|
182
|
-
```python
|
|
183
|
-
import json
import sys

import numpy as np
import soundfile as sf
import tensorflow_hub as hub

model = hub.load('https://tfhub.dev/google/yamnet/1')

# YAMNet class index -> event label
GAME_EVENTS = {
    67: 'gunshot', 366: 'explosion',
    389: 'crowd_cheering', 63: 'gunfire_burst',
}


def cluster_events(events, gap=1.5):
    # Assumed helper, not shown in the original plan: collapse same-label
    # detections within `gap` seconds, keeping the most confident one.
    clustered = []
    for e in sorted(events, key=lambda e: (e['event'], e['time'])):
        last = clustered[-1] if clustered else None
        if last and last['event'] == e['event'] and e['time'] - last['time'] < gap:
            if e['confidence'] > last['confidence']:
                clustered[-1] = e
        else:
            clustered.append(e)
    return sorted(clustered, key=lambda e: e['time'])


def detect_events(audio_path, threshold=0.30):
    wav, sr = sf.read(audio_path, dtype='float32')  # expects 16 kHz mono
    scores, _, _ = model(wav)
    events = []
    # One score frame per 0.48 s of audio
    for i, frame in enumerate(scores.numpy()):
        for cid, label in GAME_EVENTS.items():
            if frame[cid] > threshold:
                events.append({'time': round(i * 0.48, 2),
                               'event': label,
                               'confidence': float(frame[cid]),
                               'source': 'yamnet'})
    return cluster_events(events, gap=1.5)


print(json.dumps(detect_events(sys.argv[1], float(sys.argv[2]))))
|
|
208
|
-
```
|
|
209
|
-
|
|
210
|
-
#### Fallback orchestrator (TypeScript)
|
|
211
|
-
|
|
212
|
-
```ts
|
|
213
|
-
import { execa } from 'execa';
// `config`, `AudioEvent`, and `detectEventsGemini` come from elsewhere in the project.

export async function detectAudioEvents(
  audioPath: string,
  gameProfile: string,
  chunkOffsetSec: number,
): Promise<AudioEvent[]> {
  // Try Gemini first
  if (config.AUDIO_PROVIDER !== 'yamnet' && config.GOOGLE_GENERATIVE_AI_API_KEY) {
    try {
      const events = await detectEventsGemini(audioPath, gameProfile, chunkOffsetSec);
      console.log(`[audio] Gemini detected ${events.length} events`);
      return events;
    } catch (err) {
      // `err` is `unknown` under strict TypeScript, so narrow it before reading `.message`
      console.warn('[audio] Gemini failed, falling back to YAMNet:', (err as Error).message);
    }
  }

  // Fallback: YAMNet
  const { stdout } = await execa('python', [
    'scripts/detect_events.py',
    audioPath,
    String(config.AUDIO_CONFIDENCE_THRESHOLD),
  ]);
  const events = JSON.parse(stdout) as AudioEvent[];
  console.log(`[audio] YAMNet detected ${events.length} events`);
  return events;
}
|
|
239
|
-
```
|
|
240
|
-
|
|
241
|
-
#### Output format (same shape from both providers)
|
|
242
|
-
|
|
243
|
-
```json
|
|
244
|
-
{
  "time": 142.08,
  "event": "gunshot",
  "confidence": 0.74,
  "source": "gemini"
}
|
|
250
|
-
```
|
|
251
|
-
|
|
252
|
-
---
|
|
253
|
-
|
|
254
|
-
### Module 5b — Signal Merger 🆕
|
|
255
|
-
|
|
256
|
-
**Status: To Do**
|
|
257
|
-
|
|
258
|
-
Merges LLM transcript scores and audio event detections into unified clip candidates.
|
|
259
|
-
|
|
260
|
-
**Merging logic:**
|
|
261
|
-
|
|
262
|
-
- Audio event at time T → clip window: `T - 5s` to `T + 15s`
|
|
263
|
-
- LLM segment within ±10s of an audio event → score boosted by +2
|
|
264
|
-
- Audio event with no nearby LLM signal → still a candidate (score = confidence × 10)
|
|
265
|
-
- LLM segment with no audio event → unchanged (existing behavior)
|
|
266
|
-
|
|
267
|
-
```ts
|
|
268
|
-
interface MergedCandidate {
  start: number;
  end: number;
  score: number;
  source: 'transcript' | 'audio' | 'both';
  reason: string;
  audio_event?: string;
}

export function mergeSignals(
  llmSegments: LLMSegment[],
  audioEvents: AudioEvent[],
  boostWindow = 10,
): MergedCandidate[] {
  const candidates: MergedCandidate[] = [];

  // Pass 1: LLM segments (possibly boosted by nearby audio)
  for (const seg of llmSegments) {
    const nearby = audioEvents.filter((e) => Math.abs(e.time - seg.clip_start) < boostWindow);
    candidates.push({
      start: seg.clip_start,
      end: seg.clip_end,
      score: seg.score + (nearby.length > 0 ? 2 : 0),
      source: nearby.length > 0 ? 'both' : 'transcript',
      reason: seg.reason,
      audio_event: nearby[0]?.event,
    });
  }

  // Pass 2: Audio-only events (the gap filler: silent kills, boss deaths)
  for (const evt of audioEvents) {
    const hasLLM = llmSegments.some((s) => Math.abs(s.clip_start - evt.time) < boostWindow);
    if (!hasLLM) {
      candidates.push({
        start: Math.max(0, evt.time - 5),
        end: evt.time + 15,
        score: Math.round(evt.confidence * 10),
        source: 'audio',
        reason: `Audio event: ${evt.event} (${(evt.confidence * 100).toFixed(0)}% confidence)`,
        audio_event: evt.event,
      });
    }
  }

  return candidates;
}
|
|
314
|
-
```
|
|
315
|
-
|
|
316
|
-
---
|
|
317
|
-
|
|
318
|
-
## 5. Upgraded Modules
|
|
319
|
-
|
|
320
|
-
### Module 6 — Segment Ranking ⚡
|
|
321
|
-
|
|
322
|
-
**Status: Upgrade — extend existing ranking to handle MergedCandidate[]**
|
|
323
|
-
|
|
324
|
-
Changes from v1:
|
|
325
|
-
|
|
326
|
-
- Input is now `MergedCandidate[]` instead of `LLMSegment[]`
|
|
327
|
-
- New `source` field in output: `'transcript'`, `'audio'`, or `'both'`
|
|
328
|
-
- Deduplication window widened to ±8s for audio events
|
|
329
|
-
- `audio_event` field passed through to final JSON output
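The upgraded ranking step could look roughly like the sketch below. It assumes candidates are ordered by descending score and that a candidate counts as a duplicate when its start lies within the window of an already kept clip. The ±8 s audio window comes from the list above; `rankCandidates`, `topN`, and the 5 s transcript window are hypothetical.

```typescript
type Source = 'transcript' | 'audio' | 'both';

interface MergedCandidate {
  start: number;
  end: number;
  score: number;
  source: Source;
  reason: string;
  audio_event?: string;
}

interface RankedSegment extends MergedCandidate {
  rank: number;
}

// Hypothetical ranking: highest score first, dropping near-duplicates.
// audioWindowSec = 8 follows the plan; transcriptWindowSec = 5 is an assumption.
function rankCandidates(
  candidates: MergedCandidate[],
  topN = 5,
  audioWindowSec = 8,
  transcriptWindowSec = 5,
): RankedSegment[] {
  const kept: MergedCandidate[] = [];
  const byScore = [...candidates].sort((a, b) => b.score - a.score);

  for (const c of byScore) {
    if (kept.length >= topN) break;
    const windowSec = c.source === 'transcript' ? transcriptWindowSec : audioWindowSec;
    const isDuplicate = kept.some((k) => Math.abs(k.start - c.start) <= windowSec);
    if (!isDuplicate) kept.push(c);
  }

  // `source` and `audio_event` pass through unchanged to the final output
  return kept.map((c, i) => ({ ...c, rank: i + 1 }));
}

const ranked = rankCandidates([
  { start: 834, end: 849, score: 9, source: 'both', reason: 'Ace reaction', audio_event: 'gunshot' },
  { start: 840, end: 860, score: 6, source: 'audio', reason: 'Audio event: gunshot', audio_event: 'gunshot' },
  { start: 420, end: 455, score: 8, source: 'transcript', reason: 'Funny storytelling moment' },
]);
console.log(ranked.map((r) => `${r.rank}: ${r.start}s (${r.source})`));
```

In this example the audio-only candidate at 840 s is dropped because it starts within 8 s of the higher-scoring clip at 834 s.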
|
|
330
|
-
|
|
331
|
-
---
|
|
332
|
-
|
|
333
|
-
## 6. New Config Options (.env)
|
|
334
|
-
|
|
335
|
-
```env
|
|
336
|
-
# Audio Event Detection
|
|
337
|
-
AUDIO_DETECTION_ENABLED=true # set false to skip (transcript-only mode)
|
|
338
|
-
AUDIO_PROVIDER=both # gemini | yamnet | whisper | both (gemini with whisper fallback)
|
|
339
|
-
AUDIO_CONFIDENCE_THRESHOLD=0.30 # confidence minimum (0-1); for Whisper: 1.0=exact, 0.8=partial
|
|
340
|
-
AUDIO_WHISPER_MODEL=medium # tiny | base | small | medium | large-v3
|
|
341
|
-
AUDIO_CLIP_PRE_ROLL=5 # seconds before event to start clip
|
|
342
|
-
AUDIO_CLIP_POST_ROLL=15 # seconds after event to end clip
|
|
343
|
-
AUDIO_LLM_BOOST_WINDOW=10 # seconds within which audio boosts LLM score
|
|
344
|
-
AUDIO_LLM_SCORE_BOOST=2 # score boost when audio+LLM both signal
|
|
345
|
-
|
|
346
|
-
# Game Profile
|
|
347
|
-
GAME_PROFILE=valorant # valorant | fps | boss_fight | general
|
|
348
|
-
```
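A minimal sketch of how these variables might be validated at startup. The checklist mentions a zod `ConfigSchema`; this version avoids the dependency and uses plain TypeScript. `loadAudioConfig`, `num`, and the `AudioConfig` field names are hypothetical, and treating the example values above as defaults is an assumption.

```typescript
type AudioProvider = 'gemini' | 'yamnet' | 'whisper' | 'both';

interface AudioConfig {
  enabled: boolean;
  provider: AudioProvider;
  confidenceThreshold: number;
  preRollSec: number;
  postRollSec: number;
  boostWindowSec: number;
  scoreBoost: number;
}

type Env = Record<string, string | undefined>;

const PROVIDERS: AudioProvider[] = ['gemini', 'yamnet', 'whisper', 'both'];

// Read a numeric variable, falling back to the given default when unset.
function num(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) throw new Error(`${key} must be a number, got "${raw}"`);
  return value;
}

function loadAudioConfig(env: Env): AudioConfig {
  const provider = (env['AUDIO_PROVIDER'] ?? 'both') as AudioProvider;
  if (!PROVIDERS.includes(provider)) {
    throw new Error(`AUDIO_PROVIDER must be one of ${PROVIDERS.join(' | ')}`);
  }
  const confidenceThreshold = num(env, 'AUDIO_CONFIDENCE_THRESHOLD', 0.3);
  if (confidenceThreshold < 0 || confidenceThreshold > 1) {
    throw new Error('AUDIO_CONFIDENCE_THRESHOLD must be between 0 and 1');
  }
  return {
    enabled: (env['AUDIO_DETECTION_ENABLED'] ?? 'true') !== 'false',
    provider,
    confidenceThreshold,
    preRollSec: num(env, 'AUDIO_CLIP_PRE_ROLL', 5),
    postRollSec: num(env, 'AUDIO_CLIP_POST_ROLL', 15),
    boostWindowSec: num(env, 'AUDIO_LLM_BOOST_WINDOW', 10),
    scoreBoost: num(env, 'AUDIO_LLM_SCORE_BOOST', 2),
  };
}

// In the real tool the argument would presumably be process.env
const audioConfig = loadAudioConfig({ AUDIO_PROVIDER: 'whisper', AUDIO_CLIP_POST_ROLL: '20' });
console.log(audioConfig);
```

Failing fast on an unknown provider or an out-of-range threshold keeps bad settings from surfacing later as confusing pipeline errors.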
|
|
349
|
-
|
|
350
|
-
---
|
|
351
|
-
|
|
352
|
-
## 7. Game Profiles 🆕
|
|
353
|
-
|
|
354
|
-
| Profile | YAMNet classes boosted | LLM keyword hints |
|
|
355
|
-
| ---------- | ----------------------------------------------- | ------------------------------------ |
|
|
356
|
-
| valorant | gunshot, gunfire_burst, explosion | ace, clutch, defuse, spike, 1v1 |
|
|
357
|
-
| boss_fight | explosion, crowd_cheering, battle_cry | phase, dead, down, finally, let's go |
|
|
358
|
-
| fps | gunshot, gunfire_burst, explosion, crowd_booing | kill, streak, headshot, collateral |
|
|
359
|
-
| general | crowd_cheering, applause | insane, crazy, no way, let's go |
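The keyword column above could drive the Whisper tier's transcript scan roughly as follows. This is a sketch under assumptions: `findKeywordHits` is a hypothetical name, and reading "1.0=exact, 0.8=partial" as whole-word match versus substring match is an interpretation of the config comment, not something the plan spells out.

```typescript
type GameProfile = 'valorant' | 'boss_fight' | 'fps' | 'general';

// Keyword hints copied from the profile table above.
const KEYWORD_HINTS: Record<GameProfile, string[]> = {
  valorant: ['ace', 'clutch', 'defuse', 'spike', '1v1'],
  boss_fight: ['phase', 'dead', 'down', 'finally', "let's go"],
  fps: ['kill', 'streak', 'headshot', 'collateral'],
  general: ['insane', 'crazy', 'no way', "let's go"],
};

interface KeywordHit {
  keyword: string;
  confidence: number; // 1.0 = whole-word match, 0.8 = substring match (assumed)
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function findKeywordHits(text: string, profile: GameProfile, threshold = 0.3): KeywordHit[] {
  const lower = text.toLowerCase();
  const hits: KeywordHit[] = [];
  for (const keyword of KEYWORD_HINTS[profile]) {
    // Whole-word match: keyword bounded by non-alphanumerics or string edges
    const exact = new RegExp(`(^|[^a-z0-9])${escapeRegExp(keyword)}($|[^a-z0-9])`).test(lower);
    const partial = !exact && lower.includes(keyword);
    const confidence = exact ? 1.0 : partial ? 0.8 : 0;
    if (confidence > 0 && confidence >= threshold) hits.push({ keyword, confidence });
  }
  return hits;
}

const hits = findKeywordHits('That was a clean ACE, what a clutching round', 'valorant');
console.log(hits);
```

Here "ACE" matches as a whole word, while "clutch" only matches inside "clutching", so raising the threshold above 0.8 would keep the first hit and drop the second.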
|
|
360
|
-
|
|
361
|
-
---
|
|
362
|
-
|
|
363
|
-
## 8. Updated Final Output Format
|
|
364
|
-
|
|
365
|
-
```json
|
|
366
|
-
{
  "video_id": "abc123",
  "title": "Valorant ranked grind",
  "duration": 3640,
  "segments": [
    {
      "rank": 1,
      "start": 834,
      "end": 849,
      "score": 9,
      "source": "both",
      "reason": "Ace reaction with hype phrases",
      "audio_event": "gunshot"
    },
    {
      "rank": 2,
      "start": 1205,
      "end": 1220,
      "score": 8,
      "source": "audio",
      "reason": "Audio event: gunshot (81% confidence)",
      "audio_event": "gunshot"
    },
    {
      "rank": 3,
      "start": 420,
      "end": 455,
      "score": 8,
      "source": "transcript",
      "reason": "Funny storytelling moment"
    }
  ]
}
|
|
399
|
-
```
|
|
400
|
-
|
|
401
|
-
---
|
|
402
|
-
|
|
403
|
-
## 9. New Dependencies
|
|
404
|
-
|
|
405
|
-
```bash
|
|
406
|
-
pip install tensorflow tensorflow-hub soundfile numpy
|
|
407
|
-
```
|
|
408
|
-
|
|
409
|
-
| Package | Language | Cost | Purpose |
|
|
410
|
-
| --------------------- | -------- | --------- | ---------------------- |
|
|
411
|
-
| tensorflow-hub | Python | Free | Load YAMNet model |
|
|
412
|
-
| soundfile | Python | Free | Read WAV files |
|
|
413
|
-
| numpy | Python | Free | Score array processing |
|
|
414
|
-
| @google/generative-ai | Node | Free tier | Gemini audio analysis |
|
|
415
|
-
|
|
416
|
-
---
|
|
417
|
-
|
|
418
|
-
## 10. To-Do Checklist
|
|
419
|
-
|
|
420
|
-
- ✅ Module 1 — URL Parser
|
|
421
|
-
- ✅ Module 2 — Video Metadata Extractor
|
|
422
|
-
- ✅ Module 3 — Transcript Fetcher + Micro-block Grouper
|
|
423
|
-
- ✅ Module 4 — LLM Chunk Builder
|
|
424
|
-
- ✅ Module 5 — LLM Segment Analyzer
|
|
425
|
-
- ✅ Module 7 — Clip Refinement Pass
|
|
426
|
-
- ✅ Module 8 — Video Downloader
|
|
427
|
-
- ✅ Module 9 — Clip Generator
|
|
428
|
-
- 🆕 Module 3b — Audio Downloader (yt-dlp audio-only, 16kHz mono WAV)
|
|
429
|
-
- 🆕 Module 3c — Gemini Flash audio detector (chunked audio + game prompt)
|
|
430
|
-
- 🆕 Module 3c — YAMNet fallback (Python script + Node execa caller)
|
|
431
|
-
- 🆕 Module 3c — Fallback orchestrator (try Gemini → catch → YAMNet)
|
|
432
|
-
- 🆕 Module 5b — Signal Merger (transcript + audio candidates)
|
|
433
|
-
- ⚡ Module 6 — Segment Ranking (update to accept MergedCandidate[])
|
|
434
|
-
- 🆕 Add AUDIO_PROVIDER env var (gemini | yamnet | whisper | both)
|
|
435
|
-
- 🆕 Add GAME_PROFILE env var + profile configs
|
|
436
|
-
- 🆕 Add all AUDIO\_\* env vars to .env and zod ConfigSchema
|
|
437
|
-
- 🆕 Update final JSON output — add source and audio_event fields
|
|
438
|
-
- 🆕 Cache audio event results per video ID
|
|
439
|
-
|
|
440
|
-
```
|
|
441
|
-
|
|
442
|
-
```
|
package/docs/refactorPhases.md
DELETED
|
@@ -1,105 +0,0 @@
|
|
|
1
|
-
# Refactor Phases
|
|
2
|
-
|
|
3
|
-
This document records the three-phase refactor of `video-clipper` from a
|
|
4
|
-
monolithic `run()` function in `src/index.ts` into a clean, layered
|
|
5
|
-
pipeline architecture.
|
|
6
|
-
|
|
7
|
-
---
|
|
8
|
-
|
|
9
|
-
## Goals
|
|
10
|
-
|
|
11
|
-
- Make each pipeline concern independently testable and readable
|
|
12
|
-
- Eliminate the `run()` god-function (574 lines → ~40 lines in `index.ts`)
|
|
13
|
-
- Enable true parallelism between LLM analysis (pass 1) and audio detection
|
|
14
|
-
- Introduce a typed `Cache` class injected top-down, replacing ad-hoc free functions
|
|
15
|
-
- Extract CLI parsing into its own module for reuse and testability
|
|
16
|
-
|
|
17
|
-
---
|
|
18
|
-
|
|
19
|
-
## Phase 1 — Shared Utilities
|
|
20
|
-
|
|
21
|
-
**Files created:**
|
|
22
|
-
|
|
23
|
-
| File | Purpose |
|
|
24
|
-
| ---------------------- | -------------------------------------------------------------------------------------- |
|
|
25
|
-
| `src/utils/cache.ts` | Refactored: `Cache` class + `@deprecated` legacy free-function shims |
|
|
26
|
-
| `src/utils/chunker.ts` | New: `buildWindows(totalDuration, windowSec, overlapSec?)` generic time-window builder |
|
|
27
|
-
| `src/cli.ts` | New: `CliArgs`, `parseArgs()`, `printUsage()` extracted from `index.ts` |
|
|
28
|
-
|
|
29
|
-
**Key decisions:**
|
|
30
|
-
|
|
31
|
-
- `Cache` is constructed once per run in `runner.ts` and injected into every
|
|
32
|
-
stage that needs caching. `disabled = true` short-circuits all I/O
|
|
33
|
-
(replaces the scattered ad-hoc `--no-cache` checks).
|
|
34
|
-
- The legacy free-function shims (`readTranscriptCache`, `writeChunkCache`,
|
|
35
|
-
etc.) delegate to `new Cache(cacheDir)` so `llmAnalyzer` and `clipRefiner`
|
|
36
|
-
compile with zero changes.
|
|
37
|
-
- `buildWindows` replaces the manual `for (offset += chunkLength)` loop that
|
|
38
|
-
was inline in `run()` and is now shared by `audioProcessor`.
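As an illustration of the window builder described above, here is a minimal sketch. Only the name and parameter list `buildWindows(totalDuration, windowSec, overlapSec?)` come from this document; the `TimeWindow` return shape and the validation rules are assumptions.

```typescript
interface TimeWindow {
  start: number; // seconds from the beginning of the video
  end: number; // seconds, clipped to totalDuration
}

function buildWindows(totalDuration: number, windowSec: number, overlapSec = 0): TimeWindow[] {
  if (windowSec <= 0) throw new Error('windowSec must be positive');
  if (overlapSec < 0 || overlapSec >= windowSec) {
    throw new Error('overlapSec must be in [0, windowSec)');
  }
  const step = windowSec - overlapSec; // how far each window advances
  const windows: TimeWindow[] = [];
  for (let start = 0; start < totalDuration; start += step) {
    const end = Math.min(start + windowSec, totalDuration);
    windows.push({ start, end });
    if (end === totalDuration) break; // the last window already reaches the end
  }
  return windows;
}

console.log(buildWindows(100, 40, 10));
// windows: 0-40, 30-70, 60-100
```

Stopping once a window reaches `totalDuration` avoids emitting a trailing window that only contains overlap.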
|
|
39
|
-
|
|
40
|
-
**Gate:** `npm run build` clean, all 50 existing tests green.
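The injected `Cache` from the Phase 1 key decisions might be shaped roughly like this. It is a sketch, not the actual class: the `read`/`write` method names, the one-JSON-file-per-key layout, and the constructor signature are assumptions. Only the `disabled` flag short-circuiting all I/O and the tolerance for corrupt data are taken from this document.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

class Cache {
  private readonly dir: string;
  private readonly disabled: boolean;

  constructor(dir: string, disabled = false) {
    this.dir = dir;
    this.disabled = disabled;
  }

  read<T>(key: string): T | undefined {
    if (this.disabled) return undefined; // no I/O at all when disabled
    try {
      return JSON.parse(fs.readFileSync(this.fileFor(key), 'utf8')) as T;
    } catch {
      return undefined; // missing or corrupt entries count as a miss
    }
  }

  write<T>(key: string, value: T): void {
    if (this.disabled) return;
    fs.mkdirSync(this.dir, { recursive: true });
    fs.writeFileSync(this.fileFor(key), JSON.stringify(value));
  }

  private fileFor(key: string): string {
    return path.join(this.dir, `${key}.json`);
  }
}

// Construct once per run, then hand the same instance to every stage
const cacheDir = fs.mkdtempSync(path.join(os.tmpdir(), 'clipper-cache-'));
const cache = new Cache(cacheDir);
cache.write('transcript-abc123', { lines: 42 });
console.log(cache.read('transcript-abc123'));

const noCache = new Cache(cacheDir, true);
console.log(noCache.read('transcript-abc123'));
```

Because stages receive the instance rather than checking a flag themselves, a `--no-cache` run only changes how the `Cache` is constructed.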
|
|
41
|
-
|
|
42
|
-
---
|
|
43
|
-
|
|
44
|
-
## Phase 2 — Pipeline Stages
|
|
45
|
-
|
|
46
|
-
**Files created under `src/pipeline/`:**
|
|
47
|
-
|
|
48
|
-
| File | Stage | Wraps |
|
|
49
|
-
| ------------------------------- | ------------ | ------------------------------------------------------------------------ |
|
|
50
|
-
| `stages/videoResolver.ts` | 1 | `urlParser` + `metadataExtractor` |
|
|
51
|
-
| `stages/transcriptProcessor.ts` | 2 | `transcriptFetcher` + `chunkBuilder` + `dumper` |
|
|
52
|
-
| `stages/audioProcessor.ts` | 3 | `audioDownloader` + `audioEventDetector` + `sliceAudio` + `buildWindows` |
|
|
53
|
-
| `stages/segmentAnalyzer.ts` | 4a + 4b | `llmAnalyzer` (pass 1) + `clipRefiner` (pass 2) |
|
|
54
|
-
| `stages/segmentSelector.ts` | 5 | `signalMerger` + `segmentRanker` |
|
|
55
|
-
| `stages/clipExporter.ts` | 6 | `videoDownloader` + `clipGenerator` |
|
|
56
|
-
| `runner.ts` | Orchestrator | Composes all six stages |
|
|
57
|
-
|
|
58
|
-
**Parallelism gain:**
|
|
59
|
-
`analyzeSegments` (LLM pass 1) and `processAudio` now run concurrently via
|
|
60
|
-
`Promise.all` in `runner.ts`. They are fully independent — audio detection
|
|
61
|
-
only needs the video duration (from stage 1 metadata).
|
|
62
|
-
|
|
63
|
-
```
|
|
64
|
-
Stage 1     ──► videoResolver
Stage 2     ──► transcriptProcessor
            ┌── analyzeSegments (LLM pass 1)    ─┐
Stage 3+4a      processAudio                     │  Promise.all
            └────────────────────────────────────┘
Stage 5     ──► selectSegments (merge + rank)
Stage 4b    ──► refineRankedSegments (LLM pass 2)
Stage 6     ──► exportClips (optional, --clip)
|
|
72
|
-
```
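The concurrent step in the diagram can be sketched as below. `analyzeSegments` and `processAudio` are the stage names from this document, but their bodies here are stand-ins with simulated delays, and `runStages3And4a`, the argument lists, and the result types are hypothetical.

```typescript
interface LLMSegment {
  clip_start: number;
  clip_end: number;
  score: number;
  reason: string;
}

interface AudioEvent {
  time: number;
  event: string;
  confidence: number;
  source: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Stand-ins for the real stages; the delays only simulate slow I/O.
async function analyzeSegments(chunks: string[]): Promise<LLMSegment[]> {
  await sleep(50);
  return chunks.map((_, i) => ({ clip_start: i * 60, clip_end: i * 60 + 30, score: 7, reason: 'stub' }));
}

async function processAudio(durationSec: number): Promise<AudioEvent[]> {
  await sleep(50);
  return [{ time: durationSec / 2, event: 'gunshot', confidence: 0.8, source: 'stub' }];
}

// Both stages start immediately; the await resumes once both have finished.
async function runStages3And4a(chunks: string[], durationSec: number) {
  const started = Date.now();
  const [llmSegments, audioEvents] = await Promise.all([
    analyzeSegments(chunks),
    processAudio(durationSec),
  ]);
  return { llmSegments, audioEvents, elapsedMs: Date.now() - started };
}

runStages3And4a(['chunk-0', 'chunk-1'], 600).then(({ llmSegments, audioEvents, elapsedMs }) => {
  console.log(`${llmSegments.length} segments, ${audioEvents.length} events in ~${elapsedMs} ms`);
});
```

Note that `Promise.all` rejects as soon as either stage throws, so a pipeline that should survive an audio failure would need to catch inside `processAudio` or use `Promise.allSettled` instead.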
|
|
73
|
-
|
|
74
|
-
**Clip export modes** handled by `clipExporter`:
|
|
75
|
-
|
|
76
|
-
1. `--local-video` — cut directly with ffmpeg, no download
|
|
77
|
-
2. `--download-sections N` — download top-N clips via yt-dlp `--download-sections`, organize to outputs/
|
|
78
|
-
3. Default / `--download-sections all` — download full video, cut with ffmpeg
|
|
79
|
-
|
|
80
|
-
**Gate:** `npm run build` clean, all 50 existing tests green.
|
|
81
|
-
|
|
82
|
-
---
|
|
83
|
-
|
|
84
|
-
## Phase 3 — Slim Entrypoint + New Tests
|
|
85
|
-
|
|
86
|
-
**Files changed / created:**
|
|
87
|
-
|
|
88
|
-
| File | Change |
|
|
89
|
-
| ------------------------ | ------------------------------------------------------------------------------------- |
|
|
90
|
-
| `src/index.ts` | Replaced 574-line god-function with ~40-line entrypoint delegating to `runPipeline()` |
|
|
91
|
-
| `tests/chunker.test.ts` | 14 unit tests for `buildWindows` (edge cases, overlaps, clipping) |
|
|
92
|
-
| `tests/cache.test.ts` | 16 unit tests for `Cache` class (round-trips, misses, disabled mode, corrupt data) |
|
|
93
|
-
| `docs/refactorPhases.md` | This file |
|
|
94
|
-
|
|
95
|
-
**`src/index.ts` now:**
|
|
96
|
-
|
|
97
|
-
1. Parses CLI args via `parseArgs`
|
|
98
|
-
2. Validates required args and prints usage on error
|
|
99
|
-
3. Calls `runPipeline(args)` and catches any thrown error → `log.error` + `process.exit(1)`
|
|
100
|
-
|
|
101
|
-
All error handling that previously used `process.exit(1)` inline inside
|
|
102
|
-
`run()` now propagates as thrown `Error` objects from the stages, keeping
|
|
103
|
-
the pipeline stages pure (no direct `process.exit` calls).
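The error-propagation pattern described above can be sketched like this. `runPipeline` is named in this document; its body, the `CliArgs` shape, the usage text, and the choice to return an exit code from `main` (so the sketch stays testable without calling `process.exit`) are assumptions.

```typescript
interface CliArgs {
  url?: string;
}

// Hypothetical stage logic: signals failure by throwing, never by exiting.
async function runPipeline(args: CliArgs): Promise<void> {
  if (!args.url || !args.url.startsWith('http')) {
    throw new Error(`Invalid video URL: ${args.url}`);
  }
  // ...the pipeline stages would run here
}

// The entrypoint is the only place that turns an error into an exit code.
async function main(args: CliArgs): Promise<number> {
  if (!args.url) {
    console.error('Usage: video-clipper <url>'); // placeholder usage text
    return 1;
  }
  try {
    await runPipeline(args);
    return 0;
  } catch (err) {
    console.error(`[error] ${err instanceof Error ? err.message : String(err)}`);
    return 1;
  }
}

main({ url: 'not-a-url' }).then((code) => {
  console.log(`exit code ${code}`);
  // A real entrypoint would call process.exit(code) here.
});
```

Keeping `process.exit` out of the stages means tests can assert on thrown errors directly instead of mocking the process.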
|
|
104
|
-
|
|
105
|
-
**Gate:** `npm run build` clean, all 80 tests green (50 original + 14 chunker + 16 cache).
|