vidistill 0.2.3 → 0.2.5
- package/README.md +48 -10
- package/dist/index.js +842 -387
- package/package.json +1 -1
package/README.md (CHANGED)
@@ -1,8 +1,8 @@
 # vidistill

-Video intelligence distiller — turn any video into structured notes, transcripts, and insights using Gemini.
+Video intelligence distiller — turn any video or audio file into structured notes, transcripts, and insights using Gemini.

-Feed it a YouTube URL
+Feed it a YouTube URL, local video, or audio file. It analyzes the content through multiple AI passes (scene analysis, transcript, visuals, code extraction, people, chat, implicit signals) and synthesizes everything into organized markdown output.

 ## Install
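The updated README says vidistill now accepts a YouTube URL, a local video, or an audio file. The following minimal Python sketch shows one way such an input could be routed. `classify_input` and the YouTube host list are hypothetical. The extension sets come from the supported-format list in the README's How It Works section, and mapping MPEG to `.mpeg`/`.mpg` and 3GPP to `.3gp` is an assumption. vidistill itself ships as JavaScript (`dist/index.js`), so this is not its implementation.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical extension sets built from the README's supported-format list.
# Mapping MPEG to .mpeg/.mpg and 3GPP to .3gp is an assumption.
VIDEO_EXTS = {".mp4", ".mov", ".webm", ".mkv", ".avi", ".mpeg", ".mpg", ".flv", ".wmv", ".3gp"}
AUDIO_EXTS = {".mp3", ".aac", ".wav", ".flac", ".ogg", ".m4a"}

# Hypothetical set of hosts treated as YouTube.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_input(value: str) -> str:
    """Return "youtube", "video", or "audio" for a CLI input, or raise ValueError."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https"):
        if parsed.netloc.lower() in YOUTUBE_HOSTS:
            return "youtube"
        raise ValueError(f"unsupported URL: {value}")
    # Local path: decide by file extension, ignoring case.
    ext = Path(value).suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    raise ValueError(f"unsupported file type: {value}")


if __name__ == "__main__":
    for example in ["https://youtube.com/watch?v=dQw4w9WgXcQ", "./lecture.mp4", "./podcast.mp3"]:
        print(example, "->", classify_input(example))
```

The function checks for a URL first, so a local file that happens to be named like a URL is never mistaken for a download.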
@@ -20,12 +20,13 @@ vidistill [input] [options]

 **Arguments:**

-- `input` — YouTube URL or
+- `input` — YouTube URL, local video, or audio file path (prompted interactively if omitted)

 **Options:**

 - `-c, --context` — context about the video (e.g. "CS lecture", "product demo")
 - `-o, --output` — output directory (default: `./vidistill-output/`)
+- `-l, --lang <code>` — output language (e.g. `zh`, `ja`, `ko`, `es`, `fr`, `de`, `pt`, `ru`, `ar`, `hi`)

 **Examples:**
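The option list in this hunk maps directly onto a conventional argument parser. The sketch below mirrors only the documented flags using Python's `argparse`. `build_parser` is a hypothetical name and is not vidistill's actual parser. The README gives no default for `--lang`, so `None` here is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror of the documented vidistill options (hypothetical, for illustration)."""
    parser = argparse.ArgumentParser(prog="vidistill")
    # Optional positional: the README says vidistill prompts interactively if it is omitted.
    parser.add_argument("input", nargs="?", help="YouTube URL, local video, or audio file path")
    parser.add_argument("-c", "--context", help='context about the video, e.g. "CS lecture"')
    parser.add_argument("-o", "--output", default="./vidistill-output/", help="output directory")
    # No default language is documented; None here is an assumption.
    parser.add_argument("-l", "--lang", metavar="<code>", help="output language code, e.g. zh, ja, es")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["./lecture.mp4", "--lang", "zh"])
    print(args.input, args.output, args.lang)
```

Making `input` optional (`nargs="?"`) leaves it as `None` when omitted. That is where an interactive prompt would take over.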
@@ -39,10 +40,41 @@ vidistill "https://youtube.com/watch?v=dQw4w9WgXcQ"
 # Local file with context
 vidistill ./lecture.mp4 --context "distributed systems lecture"

+# Audio file
+vidistill ./podcast.mp3
+
 # Custom output directory
 vidistill ./demo.mp4 -o ./notes/
+
+# Output in another language
+vidistill ./lecture.mp4 --lang zh
+```
+
+### Extract
+
+Pull specific data from a previously processed video or re-run a targeted pass on a video file.
+
+```
+vidistill extract <type> <source>
 ```

+**Arguments:**
+
+- `type` — what to extract: `code`, `links`, `people`, `transcript`, or `commands`
+- `source` — path to a vidistill output directory or a video/audio file
+
+**Examples:**
+
+```bash
+# Extract code from existing output (no API calls)
+vidistill extract code ./vidistill-output/my-video/
+
+# Extract links from a video file (runs targeted pipeline)
+vidistill extract links ./lecture.mp4
+```
+
+When pointed at an output directory, extract reads from already-generated files with zero API calls. When pointed at a video file, it runs a minimal pipeline with only the passes needed for the requested data type.
+
 ## API Key

 vidistill needs a Gemini API key. It checks these sources in order:
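The new `extract` subcommand chooses between two behaviors based on what `source` points at. The following Python sketch shows that decision. `plan_extract` and the type-to-file mapping are hypothetical. The file names come from the README's output layout, and pairing `links` with `chat.md` ("chat messages and links") is an assumption. The README does not say which file holds `commands`, so that entry is left as `None`.

```python
from __future__ import annotations

from pathlib import Path

# Extract types documented in the README.
EXTRACT_TYPES = {"code", "links", "people", "transcript", "commands"}

# Hypothetical mapping from extract type to the already-generated output it would read.
# The README does not say which file holds "commands", so it maps to None.
OUTPUT_SOURCES = {
    "code": "code/",
    "links": "chat.md",
    "people": "people.md",
    "transcript": "transcript.md",
    "commands": None,
}


def plan_extract(kind: str, source: str) -> tuple[str, str | None]:
    """Decide whether an extract request can be served from disk or needs the pipeline."""
    if kind not in EXTRACT_TYPES:
        raise ValueError(f"unknown extract type: {kind}")
    path = Path(source)
    if path.is_dir():
        # Output directory: read existing files, no API calls.
        return ("read", OUTPUT_SOURCES[kind])
    if path.is_file():
        # Media file: run a minimal pipeline limited to the passes this type needs.
        return ("run-pipeline", kind)
    raise FileNotFoundError(source)
```

The directory check comes first because it is the zero-cost path. Only a real media file triggers any API work.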
@@ -63,7 +95,9 @@ vidistill-output/my-video/
 ├── transcript.md # full timestamped transcript
 ├── combined.md # transcript + visual notes merged
 ├── notes.md # meeting/lecture notes
-├── code
+├── code/ # extracted and reconstructed source files
+│ ├── *.ext # individual source files
+│ └── code-timeline.md # code evolution timeline
 ├── people.md # speakers and participants
 ├── chat.md # chat messages and links
 ├── action-items.md # tasks and follow-ups
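The new `code/` directory holds reconstructed source files. The How It Works section of the same README says pass 3a runs three times with consensus voting and validation, but it does not specify the voting rule. The sketch below assumes one plausible rule: keep a file only when a strict majority of runs produced identical text for it, ignoring trailing whitespace. `merge_runs` is a hypothetical name, and the validation step is not modeled.

```python
from __future__ import annotations

from collections import Counter


def _normalize(text: str) -> str:
    # Ignore trailing whitespace on each line so cosmetic differences don't split votes.
    return "\n".join(line.rstrip() for line in text.splitlines())


def merge_runs(runs: list[dict[str, str]]) -> dict[str, str]:
    """Majority-vote per file across independent reconstruction runs (hypothetical rule)."""
    filenames = {name for run in runs for name in run}
    merged: dict[str, str] = {}
    for name in sorted(filenames):
        votes = Counter(_normalize(run[name]) for run in runs if name in run)
        text, count = votes.most_common(1)[0]
        # Require a strict majority of all runs, not just of the runs that produced the file.
        if count * 2 > len(runs):
            merged[name] = text
    return merged
```

Counting against all runs means a file hallucinated by a single run is dropped, even when no other run contradicts it.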
@@ -73,22 +107,26 @@ vidistill-output/my-video/
 └── raw/ # raw pass outputs
 ```

-Which files are generated depends on the video content — a coding tutorial gets `code
+Which files are generated depends on the video content — a coding tutorial gets `code/`, a meeting gets `people.md` and `action-items.md`, etc.

 ## How It Works

-
+Supported video formats: MP4, MOV, WebM, MKV, AVI, MPEG, FLV, WMV, 3GPP. Supported audio formats: MP3, AAC, WAV, FLAC, OGG, M4A.
+
+1. **Input** — downloads YouTube video via yt-dlp or reads local file (video or audio), compresses if over 2GB
 2. **Pass 0** — scene analysis to classify video type and determine processing strategy
 3. **Pass 1** — transcript extraction with speaker identification
 4. **Pass 2** — visual content extraction (screen states, diagrams, slides)
 5. **Pass 3** — specialist passes based on video type:
--
--
--
--
+- 3c: chat and links (live streams) — per segment
+- 3d: implicit signals (all types) — per segment
+- 3b: people and social dynamics (meetings) — whole video
+- 3a: code reconstruction (coding videos) — whole video, runs 3x with consensus voting and validation
 6. **Synthesis** — cross-references all passes into unified analysis
 7. **Output** — generates structured markdown files

+Audio files skip visual passes and go straight to transcript, people, implicit signals, and synthesis.
+
 Long videos are segmented automatically. Passes that fail are skipped gracefully.

 ## License
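The How It Works list in this hunk amounts to a small planning rule. Which passes run depends on whether the input is audio and on the video type that pass 0 detects. The sketch below encodes that rule as the README states it. The type labels (`coding`, `meeting`, `live_stream`), the pass identifiers, and `plan_passes` are hypothetical. Input download and compression, segmentation, and the final output step are left out.

```python
from __future__ import annotations

# Specialist passes in the order the README lists them: (pass id, video type, scope).
# A video type of None means the pass applies to all types.
SPECIALIST_PASSES = [
    ("3c", "live_stream", "per segment"),  # chat and links
    ("3d", None, "per segment"),           # implicit signals
    ("3b", "meeting", "whole video"),      # people and social dynamics
    ("3a", "coding", "whole video"),       # code reconstruction (3 runs + consensus)
]


def plan_passes(kind: str, video_type: str | None = None) -> list[str]:
    """Return pass ids for an input of kind "audio", "video", or "youtube"."""
    if kind == "audio":
        # Audio skips visual passes: transcript, people, implicit signals, synthesis.
        return ["1", "3b", "3d", "synthesis"]
    # Passes 0-2 are listed unconditionally for video input in the README.
    passes = ["0", "1", "2"]
    for pass_id, applies_to, _scope in SPECIALIST_PASSES:
        if applies_to is None or applies_to == video_type:
            passes.append(pass_id)
    passes.append("synthesis")
    return passes
```

The scope column only mirrors the README. A fuller sketch would run the per-segment passes once for each segment of a long video, and the whole-video passes once overall.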