vidclaude 0.2.0

package/README.md ADDED
@@ -0,0 +1,237 @@
+ # vidclaude
+
+ Multimodal video understanding for Claude Code. Extract frames, transcribe audio in 90+ languages, build temporal timelines — all from a single command. No API key needed.
+
+ ```bash
+ pip install vidclaude
+ vidclaude video.mp4 --mode standard --verbose
+ ```
+
+ ## What it does
+
+ Drop a video in, get structured evidence out. Claude in your conversation does the thinking.
+
+ ```
+ Video File
+
+ ├─ ffmpeg ──────────► Frames (adaptive, shot-aware sampling)
+ ├─ faster-whisper ──► Transcript with timestamps (large-v3, 90+ languages)
+ ├─ pytesseract ─────► On-screen text / OCR (optional)
+ ├─ scene detection ─► Shot boundaries
+
+ └─► Timeline ──► evidence.md ──► Claude reasons over it
+ ```
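Shot boundaries come from ffmpeg's scene-change filter. A minimal sketch of the kind of invocation involved — the 0.3 score threshold here is illustrative, not vidclaude's actual setting:

```python
def scene_detect_cmd(video_path: str, threshold: float = 0.3) -> list[str]:
    """Build an ffmpeg command that logs frames whose scene-change
    score exceeds `threshold`; timestamps show up in showinfo output."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"select='gt(scene,{threshold})',showinfo",
        "-f", "null", "-",  # decode only; discard the output stream
    ]

print(" ".join(scene_detect_cmd("video.mp4")))
```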
+
+ Works with Hindi, English, Spanish, Japanese, Arabic — any of the 90+ languages Whisper supports. Language is auto-detected.
+
+ ## Prerequisites
+
+ | Requirement | Install |
+ |-------------|---------|
+ | Python 3.10+ | [python.org](https://python.org) |
+ | ffmpeg | Windows: `winget install ffmpeg` / macOS: `brew install ffmpeg` / Linux: `sudo apt install ffmpeg` |
+
+ ## Install
+
+ ```bash
+ pip install vidclaude
+ ```
+
+ That's it. First run downloads the Whisper model (~3GB, one time).
+
+ ## Usage
+
+ ### With Claude Code (recommended)
+
+ Set up the skill once:
+
+ ```bash
+ vidclaude --install-skill
+ ```
+
+ Then in Claude Code, just say:
+
+ > "analyze the video at C:/Users/me/Videos/meeting.mp4"
+ >
+ > "what does the speaker say about the budget in presentation.mp4?"
+ >
+ > "when does the logo appear in intro.mov?"
+
+ Claude runs the extraction, reads the evidence report + frames, and answers your question. Your Max/Pro plan covers everything — no API key needed.
+
+ Follow-up questions about the same video are instant (cached).
+
+ ### From the command line
+
+ ```bash
+ # Standard analysis — good for most videos
+ vidclaude video.mp4 --mode standard --verbose
+
+ # Quick — fewer frames, faster
+ vidclaude video.mp4 --mode quick
+
+ # Deep — dense frames, full OCR, detailed
+ vidclaude video.mp4 --mode deep --verbose
+
+ # Batch process a folder
+ vidclaude ./videos/ --verbose
+
+ # Skip audio transcription
+ vidclaude video.mp4 --no-audio
+
+ # Force fresh extraction (ignore cache)
+ vidclaude video.mp4 --no-cache
+ ```
+
+ ### Output
+
+ Every run creates a `.vidcache/` directory next to the video:
+
+ ```
+ .vidcache/a3f7b2c1/
+   evidence.md       ← Human-readable report (Claude reads this)
+   frames/           ← Extracted JPEG frames
+   transcript.json   ← Timestamped transcript
+   timeline.json     ← Unified event timeline
+   meta.json         ← Video metadata
+   shots.json        ← Shot boundaries
+   ocr.json          ← On-screen text
+   summaries.json    ← Scene/chapter summaries
+ ```
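The JSON artifacts are plain files, so you can load them from your own scripts too. A minimal sketch, assuming only the file names shown above — the payload shapes are whatever the extractor wrote:

```python
import json
from pathlib import Path

def load_evidence(cache_dir: str) -> dict:
    """Collect the cached artifacts for one video into a dict."""
    root = Path(cache_dir)
    bundle = {"evidence": (root / "evidence.md").read_text(encoding="utf-8")}
    for name in ("transcript", "timeline", "meta", "shots"):
        path = root / f"{name}.json"
        if path.exists():  # some artifacts are optional (e.g. audio skipped)
            bundle[name] = json.loads(path.read_text(encoding="utf-8"))
    return bundle
```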
101
+
102
+ ## Modes
103
+
104
+ | Mode | Frames | Whisper model | OCR | Best for |
105
+ |------|--------|---------------|-----|----------|
106
+ | `quick` | ~20, uniform | base | skip | Short clips, fast overview |
107
+ | `standard` | ~60, shot-aware | large-v3 | keyframes | General use |
108
+ | `deep` | ~150, burst sampling | large-v3 | all frames | Long videos, detailed analysis |
109
+
110
+ **Smart frame budget**: If a video is too long for the frame limit, FPS is automatically reduced. Shot boundary frames are always prioritized.
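The budget rule can be sketched in a few lines — an illustrative formula, not necessarily the shipped logic:

```python
def effective_fps(duration_s: float, target_fps: float, max_frames: int) -> float:
    """Scale the sampling rate down when sampling at target_fps
    would exceed the frame budget for the whole video."""
    if duration_s <= 0:
        return target_fps
    budget_fps = max_frames / duration_s
    return min(target_fps, budget_fps)

# A 600 s video at 0.5 fps would need 300 frames; with a 60-frame
# budget the rate drops to 0.1 fps.
```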
111
+
112
+ ## CLI Reference
113
+
114
+ ```
115
+ vidclaude [input] [options]
116
+
117
+ positional:
118
+ input Video file or folder
119
+
120
+ setup:
121
+ --install-skill Copy SKILL.md to current dir for Claude Code
122
+
123
+ processing:
124
+ --mode {quick,standard,deep} Processing mode (default: standard)
125
+ -f, --fps N Override frames per second
126
+ -m, --max-frames N Override max frame count
127
+ --no-audio Skip audio transcription
128
+ --no-ocr Skip OCR extraction
129
+ --no-cache Force re-extraction
130
+
131
+ output:
132
+ --extract Print cache path summary
133
+ -o, --output FILE Write output to file
134
+ --verbose Show detailed progress
135
+ --batch-summary Cross-video summary for folders
136
+ ```
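Batch mode walks the folder for supported extensions. A sketch, assuming the formats listed under "How it works" below — the helper name is made up:

```python
from pathlib import Path

# Extensions match the formats the ingestion layer accepts
# (MP4, MOV, MKV, WebM, AVI).
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".webm", ".avi"}

def find_videos(folder: str) -> list[Path]:
    """Collect video files for batch processing, in a stable order."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTS
    )
```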
137
+
138
+ ## How it works
139
+
140
+ Built on a multi-layer video understanding architecture:
141
+
142
+ - **Layer A — Ingestion**: Validates format (MP4, MOV, MKV, WebM, AVI), extracts metadata via ffprobe
143
+ - **Layer B — Segmentation**: Detects shot boundaries using ffmpeg's scene change filter
144
+ - **Layer C — Adaptive Sampling**: Content-aware frame selection — more frames at scene transitions, smart frame budgets per mode
145
+ - **Layer D — Audio**: faster-whisper with large-v3 for multilingual transcription with word-level timestamps and VAD filtering
146
+ - **Layer E — OCR**: pytesseract text extraction from key frames (optional)
147
+ - **Layer G — Timeline**: Merges speech, OCR, and scene events into a single time-sorted list
148
+ - **Layer I — Memory**: Hierarchical summaries for longer videos (scene → chapter → global)
149
+ - **Layer J — Evidence Assembly**: Generates `evidence.md` with frame references, transcript, timeline for Claude to reason over
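The timeline layer amounts to a k-way merge of sorted event streams. A minimal sketch with an assumed event shape (a `t` timestamp key) — not vidclaude's actual schema:

```python
import heapq

def build_timeline(*event_streams):
    """Merge already-sorted event streams (speech, OCR, scene changes)
    into one list ordered by start time."""
    return list(heapq.merge(*event_streams, key=lambda e: e["t"]))

speech = [{"t": 1.2, "kind": "speech", "text": "hello"}]
scenes = [{"t": 0.0, "kind": "shot"}, {"t": 4.5, "kind": "shot"}]
timeline = build_timeline(speech, scenes)
# events come out interleaved in time order
```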
150
+
151
+ ## Intent-aware processing
152
+
153
+ The tool classifies your question and adjusts the pipeline:
154
+
155
+ | Question type | Example | What happens |
156
+ |--------------|---------|-------------|
157
+ | Description | "What happens in this video?" | Balanced extraction |
158
+ | Moment retrieval | "When does the person stand up?" | Prioritizes transcript + timeline |
159
+ | Temporal ordering | "Does X happen before Y?" | Prioritizes timeline events |
160
+ | Counting | "How many cars appear?" | Denser frame sampling |
161
+ | OCR / text | "What text is on the slide?" | Prioritizes OCR extraction |
162
+ | Speech | "What did they say about revenue?" | Prioritizes transcript |
163
+
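Classification like this can be done with simple keyword rules. A toy sketch mirroring the table — the real `intent.py` may use different rules entirely:

```python
# Rule order matters: first match wins; "description" is the fallback.
RULES = [
    ("counting", ("how many", "count")),
    ("moment", ("when does", "at what point")),
    ("ordering", ("before", "after", "order")),
    ("ocr", ("text", "slide", "caption")),
    ("speech", ("say", "said", "talk about")),
]

def classify(question: str) -> str:
    q = question.lower()
    for intent, keywords in RULES:
        if any(k in q for k in keywords):
            return intent
    return "description"  # default: balanced extraction
```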
+ ## Optional extras
+
+ ```bash
+ # OCR support (also needs the Tesseract binary installed)
+ pip install pytesseract
+
+ # Standalone API mode (use outside Claude Code)
+ pip install anthropic
+ export ANTHROPIC_API_KEY=sk-...
+ vidclaude video.mp4 --api -q "What happens in this video?"
+ ```
+
+ ## Examples
+
+ **Analyze a meeting recording:**
+ ```bash
+ vidclaude meeting.mp4 --mode standard --verbose
+ # → 55 frames, full transcript, timeline
+ # → In Claude Code: "summarize the key decisions from this meeting"
+ ```
+
+ **Review security footage:**
+ ```bash
+ vidclaude ./cameras/ --mode deep --verbose
+ # → Batch processes all videos in the folder
+ # → Dense frame extraction catches fast events
+ ```
+
+ **Extract text from a lecture:**
+ ```bash
+ pip install pytesseract
+ vidclaude lecture.mp4 --mode standard --verbose
+ # → Captures slide text via OCR + speaker transcript
+ ```
+
+ **Quick check on a short clip:**
+ ```bash
+ vidclaude clip.mp4 --mode quick
+ # → ~20 frames, base Whisper model, done in seconds
+ ```
+
+ ## Troubleshooting
+
+ | Problem | Fix |
+ |---------|-----|
+ | `ffmpeg not found` | Install: `winget install ffmpeg` (Windows) / `brew install ffmpeg` (macOS) / `sudo apt install ffmpeg` (Linux) |
+ | `No module named 'faster_whisper'` | `pip install faster-whisper` |
+ | Slow first run | Normal — downloading the Whisper large-v3 model (~3GB, one time) |
+ | Wrong language detected | Whisper auto-detects and is usually correct; in edge cases the transcript still captures the audio phonetically |
+ | Large `.vidcache` folder | Delete it to free space: `rm -rf .vidcache/` |
+ | Want fresh extraction | Use the `--no-cache` flag |
+ | OCR not working | Install pytesseract + the Tesseract binary |
+
+ ## Project structure
+
+ ```
+ vidclaude/
+   cli.py        Argument parsing, orchestration, caching
+   models.py     Data model (VideoMeta, Frame, Shot, TranscriptChunk, etc.)
+   ingest.py     Video validation + ffprobe metadata
+   segment.py    Shot detection + adaptive frame sampling
+   audio.py      faster-whisper transcription (large-v3)
+   ocr.py        pytesseract text extraction
+   intent.py     Question intent classification
+   timeline.py   Temporal event merging
+   memory.py     Hierarchical summaries
+   reason.py     Evidence assembly + optional API mode
+   util.py       ffmpeg helpers, image encoding, caching
+   SKILL.md      Claude Code skill definition (bundled)
+ ```
+
+ ## License
+
+ MIT
package/SKILL.md ADDED
@@ -0,0 +1,138 @@
+ ---
+ name: video-understand
+ description: >
+   Analyze video files using multimodal extraction (frames, audio transcript,
+   OCR, timeline) and Claude's reasoning. Use when the user asks to analyze,
+   understand, describe, or answer questions about a video file or folder of
+   videos. Triggers on: "analyze this video", "what happens in this video",
+   "describe the video", "video question", any path ending in
+   .mp4/.mov/.mkv/.webm followed by a question about it.
+ tools: ["Bash", "Read", "Glob"]
+ ---
+
+ # Video Understanding Skill
+
+ Analyze videos by extracting visual frames, audio transcripts, OCR text, and
+ temporal timelines, then reasoning over the combined evidence.
+
+ ## How It Works
+
+ This skill uses a Python extraction pipeline (`video_understand.py`) that:
+ 1. Ingests the video and extracts metadata via ffprobe
+ 2. Detects shot boundaries using ffmpeg scene change detection
+ 3. Adaptively samples frames (more frames at scene transitions)
+ 4. Transcribes audio using OpenAI Whisper (if installed)
+ 5. Extracts on-screen text via OCR (if pytesseract installed)
+ 6. Builds a unified temporal timeline of all events
+ 7. Generates an `evidence.md` report + cached artifacts
+
+ You (Claude) then read the evidence and frame images to reason over the video.
+
+ ## Step-by-Step Usage
+
+ ### Step 1: Extract evidence from the video
+
+ Run the extraction pipeline:
+
+ ```bash
+ python D:/ai_creative_stuff/claudevid/video_understand.py "<video_path>" --extract --mode standard --verbose
+ ```
+
+ For quick analysis (fewer frames, faster):
+ ```bash
+ python D:/ai_creative_stuff/claudevid/video_understand.py "<video_path>" --extract --mode quick --verbose
+ ```
+
+ For deep analysis (more frames, OCR on all frames):
+ ```bash
+ python D:/ai_creative_stuff/claudevid/video_understand.py "<video_path>" --extract --mode deep --verbose
+ ```
+
+ ### Step 2: Read the evidence report
+
+ The script outputs a cache directory path. Read the evidence report:
+
+ ```bash
+ # The cache path is printed by the script, e.g.:
+ # Cache: /path/to/.vidcache/a3f7b2c1
+ # Read the evidence:
+ cat /path/to/.vidcache/<hash>/evidence.md
+ ```
+
+ Use the Read tool to read `evidence.md` from the cache directory.
+
+ ### Step 3: View key frames
+
+ Read the frame images listed in evidence.md to see what's in the video.
+ Select 5-10 representative frames spread across the video's duration.
+ Use the Read tool on the frame image paths.
+
+ ### Step 4: Answer the question
+
+ Reason over the combined evidence (timeline, transcript, OCR, frames) to
+ answer the user's question. Ground claims in timestamps. Note uncertainties.
+
+ ### Follow-up Questions
+
+ For follow-up questions about the same video, the cache already exists.
+ Re-read the evidence.md and relevant frames — no need to re-extract.
+
+ To force re-extraction: add the `--no-cache` flag.
+
+ ## Batch Processing
+
+ Process a folder of videos:
+ ```bash
+ python D:/ai_creative_stuff/claudevid/video_understand.py "<folder_path>" --extract --mode standard --verbose
+ ```
+
+ ## CLI Reference
+
+ | Flag | Default | Description |
+ |------|---------|-------------|
+ | `input` | required | Video file or folder path |
+ | `--extract` | off | Extract only (skill mode, no API key needed) |
+ | `-q "..."` | none | Question (for API standalone mode) |
+ | `-f N` | mode default | Frames per second override |
+ | `-m N` | mode default | Max frames override |
+ | `--no-audio` | off | Skip audio transcription |
+ | `--no-ocr` | off | Skip OCR extraction |
+ | `--mode` | standard | quick / standard / deep |
+ | `--verbose` | off | Detailed progress output |
+ | `--no-cache` | off | Force re-extraction |
+ | `--api` | off | Standalone mode (needs ANTHROPIC_API_KEY) |
+ | `-o file` | stdout | Write output to file |
+
+ ## Modes Comparison
+
+ | Aspect | quick | standard | deep |
+ |--------|-------|----------|------|
+ | Frame sampling | 0.2fps, max 20 | 0.5fps + shot boundaries, max 60 | 1.0fps + burst, max 150 |
+ | Audio | whisper base | whisper base | whisper small |
+ | OCR | skip | keyframes only | all frames |
+ | Summaries | skip | for videos > 5min | always |
+
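The comparison table above, expressed as data with `-f`/`-m` style overrides applied on top. Names are illustrative — the shipped code may organize this differently:

```python
MODES = {
    "quick":    {"fps": 0.2, "max_frames": 20,  "whisper": "base",  "ocr": "skip"},
    "standard": {"fps": 0.5, "max_frames": 60,  "whisper": "base",  "ocr": "keyframes"},
    "deep":     {"fps": 1.0, "max_frames": 150, "whisper": "small", "ocr": "all"},
}

def resolve(mode: str, fps_override=None, max_frames_override=None) -> dict:
    """Start from a mode's defaults, then apply CLI-style overrides."""
    cfg = dict(MODES[mode])
    if fps_override is not None:
        cfg["fps"] = fps_override
    if max_frames_override is not None:
        cfg["max_frames"] = max_frames_override
    return cfg
```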
+ ## Prerequisites
+
+ 1. **Python 3.10+**
+ 2. **ffmpeg** on PATH — `winget install ffmpeg` or download from https://ffmpeg.org
+ 3. **Pillow**: `pip install Pillow`
+ 4. **Whisper** (optional): `pip install openai-whisper` (requires PyTorch)
+ 5. **Tesseract** (optional): Install Tesseract OCR + `pip install pytesseract`
+
+ ## Troubleshooting
+
+ - **"ffmpeg not found"**: Ensure ffmpeg is on your PATH. Run `ffmpeg -version` to verify.
+ - **"openai-whisper not installed"**: Audio will be skipped. Install with `pip install openai-whisper`.
+ - **Slow frame extraction**: Use `--mode quick` or reduce with `-m 10`.
+ - **Large cache**: Delete `.vidcache/` folders to free space.
+ - **Re-analyze same video**: Use `--no-cache` to force fresh extraction.
+
+ ## Extension Ideas
+
+ - **OmniVision offline footage review**: Process security/dashcam footage folders,
+   generate timeline reports, flag anomalous events
+ - **Meeting summarization**: Extract slides (OCR) + transcript, generate meeting notes
+ - **Content moderation**: Scan video batches for policy violations
+ - **Sports analysis**: Dense frame extraction for play-by-play breakdown
+ - **Accessibility**: Generate audio descriptions from visual content
package/bin/setup.js ADDED
@@ -0,0 +1,45 @@
+ #!/usr/bin/env node
+
+ const { execSync } = require("child_process");
+ const fs = require("fs");
+ const path = require("path");
+
+ const reqPath = path.join(__dirname, "..", "requirements.txt");
+
+ console.log("vidclaude: Installing Python dependencies...");
+
+ // Check Python exists
+ try {
+   execSync("python --version", { stdio: "pipe" });
+ } catch {
+   console.warn(
+     "\n⚠ Python not found. You need Python 3.10+ to use vidclaude.\n" +
+       "  Install from: https://python.org\n" +
+       "  Then run: pip install -r requirements.txt\n"
+   );
+   process.exit(0); // Don't fail npm install
+ }
+
+ // Check ffmpeg exists
+ try {
+   execSync("ffmpeg -version", { stdio: "pipe" });
+ } catch {
+   console.warn(
+     "\n⚠ ffmpeg not found. You need ffmpeg to use vidclaude.\n" +
+       "  Windows: winget install ffmpeg\n" +
+       "  macOS:   brew install ffmpeg\n" +
+       "  Linux:   sudo apt install ffmpeg\n"
+   );
+ }
+
+ // Install Python deps
+ try {
+   execSync(`pip install -r "${reqPath}"`, { stdio: "inherit" });
+   console.log("\nvidclaude: Setup complete!");
+   console.log("  Run: npx vidclaude video.mp4 --extract --mode standard --verbose\n");
+ } catch {
+   console.warn(
+     "\n⚠ Failed to install Python dependencies.\n" +
+       "  Try manually: pip install -r requirements.txt\n"
+   );
+ }
package/bin/vidclaude.js ADDED
@@ -0,0 +1,27 @@
+ #!/usr/bin/env node
+
+ const { spawn } = require("child_process");
+ const path = require("path");
+
+ const scriptPath = path.join(__dirname, "..", "video_understand.py");
+ const args = process.argv.slice(2);
+
+ // Pass all arguments through to the Python script
+ const proc = spawn("python", [scriptPath, ...args], {
+   stdio: "inherit",
+ });
+
+ proc.on("error", (err) => {
+   if (err.code === "ENOENT") {
+     console.error(
+       "Error: Python not found. Install Python 3.10+ from https://python.org"
+     );
+     process.exit(1);
+   }
+   console.error(`Error: ${err.message}`);
+   process.exit(1);
+ });
+
+ proc.on("close", (code) => {
+   process.exit(code || 0);
+ });
package/package.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "name": "vidclaude",
+   "version": "0.2.0",
+   "description": "Multimodal video understanding for Claude Code — extract frames, transcribe audio, build timelines from any video",
+   "bin": {
+     "vidclaude": "./bin/vidclaude.js"
+   },
+   "files": [
+     "bin/",
+     "vidclaude/",
+     "video_understand.py",
+     "requirements.txt",
+     "SKILL.md",
+     "README.md"
+   ],
+   "scripts": {
+     "postinstall": "node ./bin/setup.js"
+   },
+   "keywords": [
+     "video",
+     "claude",
+     "multimodal",
+     "whisper",
+     "analysis",
+     "claude-code"
+   ],
+   "license": "MIT",
+   "engines": {
+     "node": ">=16.0.0"
+   }
+ }
package/requirements.txt ADDED
@@ -0,0 +1,2 @@
+ Pillow>=10.0
+ faster-whisper>=1.0
package/vidclaude/SKILL.md ADDED
@@ -0,0 +1,138 @@
+ ---
+ name: video-understand
+ description: >
+   Analyze video files using multimodal extraction (frames, audio transcript,
+   OCR, timeline) and Claude's reasoning. Use when the user asks to analyze,
+   understand, describe, or answer questions about a video file or folder of
+   videos. Triggers on: "analyze this video", "what happens in this video",
+   "describe the video", "video question", any path ending in
+   .mp4/.mov/.mkv/.webm followed by a question about it.
+ tools: ["Bash", "Read", "Glob"]
+ ---
+
+ # Video Understanding Skill
+
+ Analyze videos by extracting visual frames, audio transcripts, OCR text, and
+ temporal timelines, then reasoning over the combined evidence.
+
+ ## How It Works
+
+ This skill uses a Python extraction pipeline (`video_understand.py`) that:
+ 1. Ingests the video and extracts metadata via ffprobe
+ 2. Detects shot boundaries using ffmpeg scene change detection
+ 3. Adaptively samples frames (more frames at scene transitions)
+ 4. Transcribes audio using OpenAI Whisper (if installed)
+ 5. Extracts on-screen text via OCR (if pytesseract installed)
+ 6. Builds a unified temporal timeline of all events
+ 7. Generates an `evidence.md` report + cached artifacts
+
+ You (Claude) then read the evidence and frame images to reason over the video.
+
+ ## Step-by-Step Usage
+
+ ### Step 1: Extract evidence from the video
+
+ Run the extraction pipeline:
+
+ ```bash
+ python D:/ai_creative_stuff/claudevid/video_understand.py "<video_path>" --extract --mode standard --verbose
+ ```
+
+ For quick analysis (fewer frames, faster):
+ ```bash
+ python D:/ai_creative_stuff/claudevid/video_understand.py "<video_path>" --extract --mode quick --verbose
+ ```
+
+ For deep analysis (more frames, OCR on all frames):
+ ```bash
+ python D:/ai_creative_stuff/claudevid/video_understand.py "<video_path>" --extract --mode deep --verbose
+ ```
+
+ ### Step 2: Read the evidence report
+
+ The script outputs a cache directory path. Read the evidence report:
+
+ ```bash
+ # The cache path is printed by the script, e.g.:
+ # Cache: /path/to/.vidcache/a3f7b2c1
+ # Read the evidence:
+ cat /path/to/.vidcache/<hash>/evidence.md
+ ```
+
+ Use the Read tool to read `evidence.md` from the cache directory.
+
+ ### Step 3: View key frames
+
+ Read the frame images listed in evidence.md to see what's in the video.
+ Select 5-10 representative frames spread across the video's duration.
+ Use the Read tool on the frame image paths.
+
+ ### Step 4: Answer the question
+
+ Reason over the combined evidence (timeline, transcript, OCR, frames) to
+ answer the user's question. Ground claims in timestamps. Note uncertainties.
+
+ ### Follow-up Questions
+
+ For follow-up questions about the same video, the cache already exists.
+ Re-read the evidence.md and relevant frames — no need to re-extract.
+
+ To force re-extraction: add the `--no-cache` flag.
+
+ ## Batch Processing
+
+ Process a folder of videos:
+ ```bash
+ python D:/ai_creative_stuff/claudevid/video_understand.py "<folder_path>" --extract --mode standard --verbose
+ ```
+
+ ## CLI Reference
+
+ | Flag | Default | Description |
+ |------|---------|-------------|
+ | `input` | required | Video file or folder path |
+ | `--extract` | off | Extract only (skill mode, no API key needed) |
+ | `-q "..."` | none | Question (for API standalone mode) |
+ | `-f N` | mode default | Frames per second override |
+ | `-m N` | mode default | Max frames override |
+ | `--no-audio` | off | Skip audio transcription |
+ | `--no-ocr` | off | Skip OCR extraction |
+ | `--mode` | standard | quick / standard / deep |
+ | `--verbose` | off | Detailed progress output |
+ | `--no-cache` | off | Force re-extraction |
+ | `--api` | off | Standalone mode (needs ANTHROPIC_API_KEY) |
+ | `-o file` | stdout | Write output to file |
+
+ ## Modes Comparison
+
+ | Aspect | quick | standard | deep |
+ |--------|-------|----------|------|
+ | Frame sampling | 0.2fps, max 20 | 0.5fps + shot boundaries, max 60 | 1.0fps + burst, max 150 |
+ | Audio | whisper base | whisper base | whisper small |
+ | OCR | skip | keyframes only | all frames |
+ | Summaries | skip | for videos > 5min | always |
+
+ ## Prerequisites
+
+ 1. **Python 3.10+**
+ 2. **ffmpeg** on PATH — `winget install ffmpeg` or download from https://ffmpeg.org
+ 3. **Pillow**: `pip install Pillow`
+ 4. **Whisper** (optional): `pip install openai-whisper` (requires PyTorch)
+ 5. **Tesseract** (optional): Install Tesseract OCR + `pip install pytesseract`
+
+ ## Troubleshooting
+
+ - **"ffmpeg not found"**: Ensure ffmpeg is on your PATH. Run `ffmpeg -version` to verify.
+ - **"openai-whisper not installed"**: Audio will be skipped. Install with `pip install openai-whisper`.
+ - **Slow frame extraction**: Use `--mode quick` or reduce with `-m 10`.
+ - **Large cache**: Delete `.vidcache/` folders to free space.
+ - **Re-analyze same video**: Use `--no-cache` to force fresh extraction.
+
+ ## Extension Ideas
+
+ - **OmniVision offline footage review**: Process security/dashcam footage folders,
+   generate timeline reports, flag anomalous events
+ - **Meeting summarization**: Extract slides (OCR) + transcript, generate meeting notes
+ - **Content moderation**: Scan video batches for policy violations
+ - **Sports analysis**: Dense frame extraction for play-by-play breakdown
+ - **Accessibility**: Generate audio descriptions from visual content
package/vidclaude/__init__.py ADDED
@@ -0,0 +1,3 @@
+ """vidclaude — Multimodal video understanding powered by Claude."""
+
+ __version__ = "0.2.0"