vidistill 0.4.4 → 0.5.0

Files changed (3)
  1. package/README.md +7 -6
  2. package/dist/index.js +1168 -586
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -112,15 +112,16 @@ Supported video formats: MP4, MOV, WebM, MKV, AVI, MPEG, FLV, WMV, 3GPP. Support
 
  1. **Input** — accepts YouTube URL directly or reads local file (video or audio), compresses if over 2GB
  2. **Pass 0** — scene analysis to classify video type and determine processing strategy
- 3. **Pass 1** — transcript extraction with speaker identification
- 4. **Pass 2** — visual content extraction (screen states, diagrams, slides)
- 5. **Pass 3** — specialist passes based on video type:
+ 3. **Pass 1a** — pure verbatim transcription (timestamps, tone, emphasis — no speaker labels), runs 3x with consensus alignment
+ 4. **Pass 1b** — speaker diarization (assigns SPEAKER_XX labels to transcript entries using voice and visual cues, then merged with 1a), runs 3x with majority voting
+ 5. **Pass 2** — visual content extraction (screen states, diagrams, slides)
+ 6. **Pass 3** — specialist passes based on video type:
  - 3c: chat and links (live streams) — per segment, runs 3x with consensus voting
  - 3d: implicit signals (all types) — per segment
- - 3b: people and social dynamics (meetings) — whole video
+ - 3b: people and social dynamics (meetings) — whole video, anchored to transcript speakers
  - 3a: code reconstruction (coding videos) — whole video, runs 3x with consensus voting and validation
- 6. **Synthesis** — cross-references all passes into unified analysis
- 7. **Output** — generates structured markdown files
+ 7. **Synthesis** — cross-references all passes into unified analysis
+ 8. **Output** — generates structured markdown files
 
  Audio files skip visual passes and go straight to transcript, people, implicit signals, and synthesis.
 
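
The new Pass 1b line says diarization "runs 3x with majority voting" but does not say how the votes are combined. The following is a minimal sketch of one way that could work, not vidistill's actual implementation. It assumes each run yields one label per transcript entry in the same order, and that labels are already comparable across runs (in practice `SPEAKER_00` in one run may need to be aligned with a different label in another). The names `SpeakerLabel`, `DiarizationRun`, `majorityLabel`, and `voteAcrossRuns` are hypothetical.

```typescript
// Hypothetical types and helpers for illustration; not taken from vidistill's source.
type SpeakerLabel = string; // e.g. "SPEAKER_00"

// One diarization run: a speaker label per transcript entry, in entry order.
type DiarizationRun = SpeakerLabel[];

// Returns the most frequent label among the votes.
// Ties go to the label that reached the top count first (an arbitrary choice for this sketch).
function majorityLabel(votes: SpeakerLabel[]): SpeakerLabel {
  const counts: Record<string, number> = Object.create(null);
  let best = votes[0];
  let bestCount = 0;
  for (const label of votes) {
    counts[label] = (counts[label] || 0) + 1;
    if (counts[label] > bestCount) {
      best = label;
      bestCount = counts[label];
    }
  }
  return best;
}

// Combines several runs into one label per transcript entry by voting entry by entry.
function voteAcrossRuns(runs: DiarizationRun[]): SpeakerLabel[] {
  if (runs.length === 0) return [];
  const entryCount = runs[0].length;
  for (const run of runs) {
    if (run.length !== entryCount) {
      throw new Error("all runs must label the same transcript entries");
    }
  }
  return runs[0].map((_, i) => majorityLabel(runs.map((run) => run[i])));
}

// Three runs over a three-entry transcript; the runs disagree on entries 2 and 3.
const runs: DiarizationRun[] = [
  ["SPEAKER_00", "SPEAKER_01", "SPEAKER_00"],
  ["SPEAKER_00", "SPEAKER_01", "SPEAKER_01"],
  ["SPEAKER_00", "SPEAKER_00", "SPEAKER_01"],
];
console.log(voteAcrossRuns(runs)); // [ 'SPEAKER_00', 'SPEAKER_01', 'SPEAKER_01' ]
```

With three runs and only two candidate labels per entry a strict majority always exists; the tie-break only matters when all three runs disagree.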