vidistill 0.4.4 → 0.5.0
- package/README.md +7 -6
- package/dist/index.js +1168 -586
- package/package.json +1 -1
package/README.md
CHANGED
@@ -112,15 +112,16 @@ Supported video formats: MP4, MOV, WebM, MKV, AVI, MPEG, FLV, WMV, 3GPP. Support

 1. **Input** — accepts YouTube URL directly or reads local file (video or audio), compresses if over 2GB
 2. **Pass 0** — scene analysis to classify video type and determine processing strategy
-3. **Pass
-4. **Pass
-5. **Pass
+3. **Pass 1a** — pure verbatim transcription (timestamps, tone, emphasis — no speaker labels), runs 3x with consensus alignment
+4. **Pass 1b** — speaker diarization (assigns SPEAKER_XX labels to transcript entries using voice and visual cues, then merged with 1a), runs 3x with majority voting
+5. **Pass 2** — visual content extraction (screen states, diagrams, slides)
+6. **Pass 3** — specialist passes based on video type:
 - 3c: chat and links (live streams) — per segment, runs 3x with consensus voting
 - 3d: implicit signals (all types) — per segment
-- 3b: people and social dynamics (meetings) — whole video
+- 3b: people and social dynamics (meetings) — whole video, anchored to transcript speakers
 - 3a: code reconstruction (coding videos) — whole video, runs 3x with consensus voting and validation
-
-
+7. **Synthesis** — cross-references all passes into unified analysis
+8. **Output** — generates structured markdown files

 Audio files skip visual passes and go straight to transcript, people, implicit signals, and synthesis.

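The "runs 3x with majority voting" step that the new README text describes for Pass 1b can be pictured roughly as below. This is a minimal sketch and not vidistill's actual code: the function name `majorityVoteLabels`, the input shape (one label array per run, aligned by transcript entry index), and the tie-break rule (the label seen first wins) are all assumptions made for illustration.

```typescript
// Hypothetical sketch: majority vote over speaker labels from several runs.
// Assumes every run labels the same transcript entries in the same order.
type SpeakerLabel = string; // e.g. "SPEAKER_00"

function majorityVoteLabels(runs: SpeakerLabel[][]): SpeakerLabel[] {
  if (runs.length === 0) return [];
  const entryCount = runs[0].length;
  if (runs.some((run) => run.length !== entryCount)) {
    throw new Error("all runs must label the same number of entries");
  }

  const result: SpeakerLabel[] = [];
  for (let i = 0; i < entryCount; i++) {
    // Count how many runs chose each label for transcript entry i.
    const counts: Record<string, number> = {};
    for (let r = 0; r < runs.length; r++) {
      const label = runs[r][i];
      counts[label] = (counts[label] || 0) + 1;
    }

    // Pick the most frequent label. The strict ">" means that on a tie
    // the label that appeared first across the runs is kept (assumed rule).
    let best = runs[0][i];
    let bestCount = 0;
    for (const label of Object.keys(counts)) {
      if (counts[label] > bestCount) {
        best = label;
        bestCount = counts[label];
      }
    }
    result.push(best);
  }
  return result;
}

// Example: three runs over three transcript entries.
const merged = majorityVoteLabels([
  ["SPEAKER_00", "SPEAKER_01", "SPEAKER_00"],
  ["SPEAKER_00", "SPEAKER_01", "SPEAKER_01"],
  ["SPEAKER_00", "SPEAKER_00", "SPEAKER_01"],
]);
console.log(merged); // [ 'SPEAKER_00', 'SPEAKER_01', 'SPEAKER_01' ]
```

The sketch assumes the label names already mean the same speaker in every run. A real diarization pipeline would likely need to reconcile label permutations between runs before voting; whether and how vidistill does this is not stated in the diff.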