vidistill 0.2.3 → 0.2.5

Files changed (3)
  1. package/README.md +48 -10
  2. package/dist/index.js +842 -387
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -1,8 +1,8 @@
  # vidistill
 
- Video intelligence distiller — turn any video into structured notes, transcripts, and insights using Gemini.
+ Video intelligence distiller — turn any video or audio file into structured notes, transcripts, and insights using Gemini.
 
- Feed it a YouTube URL or local video file. It analyzes the content through multiple AI passes (scene analysis, transcript, visuals, code extraction, people, chat, implicit signals) and synthesizes everything into organized markdown output.
+ Feed it a YouTube URL, local video, or audio file. It analyzes the content through multiple AI passes (scene analysis, transcript, visuals, code extraction, people, chat, implicit signals) and synthesizes everything into organized markdown output.
 
  ## Install
 
@@ -20,12 +20,13 @@ vidistill [input] [options]
 
  **Arguments:**
 
- - `input` — YouTube URL or local file path (prompted interactively if omitted)
+ - `input` — YouTube URL, local video, or audio file path (prompted interactively if omitted)
 
  **Options:**
 
  - `-c, --context` — context about the video (e.g. "CS lecture", "product demo")
  - `-o, --output` — output directory (default: `./vidistill-output/`)
+ - `-l, --lang <code>` — output language (e.g. `zh`, `ja`, `ko`, `es`, `fr`, `de`, `pt`, `ru`, `ar`, `hi`)
 
  **Examples:**
 
@@ -39,10 +40,41 @@ vidistill "https://youtube.com/watch?v=dQw4w9WgXcQ"
  # Local file with context
  vidistill ./lecture.mp4 --context "distributed systems lecture"
 
+ # Audio file
+ vidistill ./podcast.mp3
+
  # Custom output directory
  vidistill ./demo.mp4 -o ./notes/
+
+ # Output in another language
+ vidistill ./lecture.mp4 --lang zh
+ ```
+
+ ### Extract
+
+ Pull specific data from a previously processed video or re-run a targeted pass on a video file.
+
+ ```
+ vidistill extract <type> <source>
  ```
 
+ **Arguments:**
+
+ - `type` — what to extract: `code`, `links`, `people`, `transcript`, or `commands`
+ - `source` — path to a vidistill output directory or a video/audio file
+
+ **Examples:**
+
+ ```bash
+ # Extract code from existing output (no API calls)
+ vidistill extract code ./vidistill-output/my-video/
+
+ # Extract links from a video file (runs targeted pipeline)
+ vidistill extract links ./lecture.mp4
+ ```
+
+ When pointed at an output directory, extract reads from already-generated files with zero API calls. When pointed at a video file, it runs a minimal pipeline with only the passes needed for the requested data type.
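The directory-versus-file behavior described in the added README paragraph could be decided with a simple path check. The following is a minimal sketch, not the package's actual code: `resolveExtractMode` and the two mode labels are hypothetical names chosen for illustration.

```javascript
const fs = require("node:fs");

// Hypothetical helper: decide how `vidistill extract` might treat <source>.
// The returned labels are illustrative, not identifiers from the package.
function resolveExtractMode(source) {
  const stats = fs.statSync(source); // throws if the path does not exist
  if (stats.isDirectory()) {
    // Existing output directory: read generated files, no API calls.
    return "read-existing-output";
  }
  if (stats.isFile()) {
    // Media file: run only the passes the requested type needs.
    return "run-targeted-pipeline";
  }
  throw new Error(`Unsupported source: ${source}`);
}
```

Under these assumptions, `resolveExtractMode("./vidistill-output/my-video/")` would select the no-API path, while `resolveExtractMode("./lecture.mp4")` would select the targeted pipeline.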
+
  ## API Key
 
  vidistill needs a Gemini API key. It checks these sources in order:
@@ -63,7 +95,9 @@ vidistill-output/my-video/
  ├── transcript.md # full timestamped transcript
  ├── combined.md # transcript + visual notes merged
  ├── notes.md # meeting/lecture notes
- ├── code.md # extracted code blocks and reconstructions
+ ├── code/ # extracted and reconstructed source files
+ │ ├── *.ext # individual source files
+ │ └── code-timeline.md # code evolution timeline
  ├── people.md # speakers and participants
  ├── chat.md # chat messages and links
  ├── action-items.md # tasks and follow-ups
@@ -73,22 +107,26 @@ vidistill-output/my-video/
  └── raw/ # raw pass outputs
  ```
 
- Which files are generated depends on the video content — a coding tutorial gets `code.md`, a meeting gets `people.md` and `action-items.md`, etc.
+ Which files are generated depends on the video content — a coding tutorial gets `code/`, a meeting gets `people.md` and `action-items.md`, etc.
 
  ## How It Works
 
- 1. **Input** downloads YouTube video via yt-dlp or reads local file, compresses if over 2GB
+ Supported video formats: MP4, MOV, WebM, MKV, AVI, MPEG, FLV, WMV, 3GPP. Supported audio formats: MP3, AAC, WAV, FLAC, OGG, M4A.
+
+ 1. **Input** — downloads YouTube video via yt-dlp or reads local file (video or audio), compresses if over 2GB
  2. **Pass 0** — scene analysis to classify video type and determine processing strategy
  3. **Pass 1** — transcript extraction with speaker identification
  4. **Pass 2** — visual content extraction (screen states, diagrams, slides)
  5. **Pass 3** — specialist passes based on video type:
- - 3a: code reconstruction (coding videos)
- - 3b: people and social dynamics (meetings)
- - 3c: chat and links (live streams)
- - 3d: implicit signals (all types)
+ - 3c: chat and links (live streams) — per segment
+ - 3d: implicit signals (all types) — per segment
+ - 3b: people and social dynamics (meetings) — whole video
+ - 3a: code reconstruction (coding videos) — whole video, runs 3x with consensus voting and validation
  6. **Synthesis** — cross-references all passes into unified analysis
  7. **Output** — generates structured markdown files
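
The new README text says pass 3a runs three times with consensus voting, but it does not say how the vote works. As one possible reading, a strict-majority vote over whole candidate outputs might look like the sketch below. `majorityVote` is a hypothetical name, and comparing trimmed strings is an assumption made only for illustration.

```javascript
// Hypothetical sketch: return the candidate that a strict majority of runs
// agree on, or null when there is no majority. The README does not describe
// vidistill's real voting or validation logic.
function majorityVote(candidates) {
  const counts = new Map();
  for (const candidate of candidates) {
    const key = candidate.trim(); // ignore leading/trailing whitespace
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  for (const [key, count] of counts) {
    if (count > bestCount) {
      best = key;
      bestCount = count;
    }
  }
  // Strict majority: more than half of all runs must agree.
  return bestCount * 2 > candidates.length ? best : null;
}
```

With three runs, two matching reconstructions are enough to win. Three mutually different outputs yield `null`, which a caller could treat as a failed pass.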
 
+ Audio files skip visual passes and go straight to transcript, people, implicit signals, and synthesis.
+
  Long videos are segmented automatically. Passes that fail are skipped gracefully.
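
The format lists and the audio rule added in this hunk suggest a two-step decision: classify the input, then choose passes. The sketch below illustrates that idea under stated assumptions. The function names and pass labels are hypothetical, and the mapping from format names to file extensions (for example 3GPP to `.3gp`) is a guess, since the README lists formats rather than extensions.

```javascript
const path = require("node:path");

// Assumed extensions for the formats named in the README.
const VIDEO_EXTENSIONS = new Set([
  ".mp4", ".mov", ".webm", ".mkv", ".avi", ".mpeg", ".flv", ".wmv", ".3gp",
]);
const AUDIO_EXTENSIONS = new Set([".mp3", ".aac", ".wav", ".flac", ".ogg", ".m4a"]);

// Hypothetical helper: classify a local file by its extension.
function detectMediaKind(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  if (VIDEO_EXTENSIONS.has(ext)) return "video";
  if (AUDIO_EXTENSIONS.has(ext)) return "audio";
  return "unknown";
}

// Hypothetical helper: pick pass labels for a media kind.
function planPasses(kind) {
  if (kind === "audio") {
    // Per the README: no visual passes for audio input.
    return ["transcript", "people", "implicit-signals", "synthesis"];
  }
  // For video, specialist passes depend on the video type found in pass 0.
  return ["scene-analysis", "transcript", "visuals", "specialists", "synthesis"];
}
```

For example, `planPasses(detectMediaKind("./podcast.mp3"))` leaves out the scene-analysis and visual steps.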
 
  ## License