vg-hubert 1.0.0__tar.gz

Files changed (45)
  1. vg_hubert-1.0.0/.gitignore +31 -0
  2. vg_hubert-1.0.0/LICENSE +29 -0
  3. vg_hubert-1.0.0/MANIFEST.in +7 -0
  4. vg_hubert-1.0.0/PKG-INFO +375 -0
  5. vg_hubert-1.0.0/README.md +332 -0
  6. vg_hubert-1.0.0/configs/places.yaml +50 -0
  7. vg_hubert-1.0.0/configs/spokencoco.yaml +50 -0
  8. vg_hubert-1.0.0/demo.ipynb +469 -0
  9. vg_hubert-1.0.0/examples/basic_usage.py +109 -0
  10. vg_hubert-1.0.0/examples/batch_processing.py +131 -0
  11. vg_hubert-1.0.0/publish_to_hub.py +406 -0
  12. vg_hubert-1.0.0/pyproject.toml +58 -0
  13. vg_hubert-1.0.0/requirements.txt +26 -0
  14. vg_hubert-1.0.0/setup.cfg +4 -0
  15. vg_hubert-1.0.0/setup.py +75 -0
  16. vg_hubert-1.0.0/tests/test_mincutmerge.py +151 -0
  17. vg_hubert-1.0.0/tests/test_validation.py +256 -0
  18. vg_hubert-1.0.0/train.py +122 -0
  19. vg_hubert-1.0.0/vg_hubert/__init__.py +24 -0
  20. vg_hubert-1.0.0/vg_hubert/datasets/__init__.py +20 -0
  21. vg_hubert-1.0.0/vg_hubert/datasets/places_dataset.py +112 -0
  22. vg_hubert-1.0.0/vg_hubert/datasets/sampler.py +36 -0
  23. vg_hubert-1.0.0/vg_hubert/datasets/spokencoco_dataset.py +108 -0
  24. vg_hubert-1.0.0/vg_hubert/mincut.py +400 -0
  25. vg_hubert-1.0.0/vg_hubert/model/__init__.py +36 -0
  26. vg_hubert-1.0.0/vg_hubert/model/audio_encoder.py +985 -0
  27. vg_hubert-1.0.0/vg_hubert/model/dual_encoder.py +157 -0
  28. vg_hubert-1.0.0/vg_hubert/model/utils.py +558 -0
  29. vg_hubert-1.0.0/vg_hubert/model/vision_transformer.py +304 -0
  30. vg_hubert-1.0.0/vg_hubert/model/vit_utils.py +623 -0
  31. vg_hubert-1.0.0/vg_hubert/segmenter.py +493 -0
  32. vg_hubert-1.0.0/vg_hubert/tests/test_better_params.py +79 -0
  33. vg_hubert-1.0.0/vg_hubert/tests/test_hf_upload.py +224 -0
  34. vg_hubert-1.0.0/vg_hubert/tests/test_mincut_comparison.py +255 -0
  35. vg_hubert-1.0.0/vg_hubert/tests/test_mincut_quality.py +268 -0
  36. vg_hubert-1.0.0/vg_hubert/training/__init__.py +17 -0
  37. vg_hubert-1.0.0/vg_hubert/training/bert_adam.py +179 -0
  38. vg_hubert-1.0.0/vg_hubert/training/trainer.py +403 -0
  39. vg_hubert-1.0.0/vg_hubert/training/trainer_utils.py +113 -0
  40. vg_hubert-1.0.0/vg_hubert/training/utils.py +81 -0
  41. vg_hubert-1.0.0/vg_hubert.egg-info/PKG-INFO +375 -0
  42. vg_hubert-1.0.0/vg_hubert.egg-info/SOURCES.txt +43 -0
  43. vg_hubert-1.0.0/vg_hubert.egg-info/dependency_links.txt +1 -0
  44. vg_hubert-1.0.0/vg_hubert.egg-info/requires.txt +12 -0
  45. vg_hubert-1.0.0/vg_hubert.egg-info/top_level.txt +1 -0
@@ -0,0 +1,31 @@
1
+ __pycache__/
2
+ *.py[cod]
3
+ .DS_Store
4
+ .vscode/
5
+ .ipynb_checkpoints/
6
+ *.png
7
+ *.gif
8
+ *.pdf
9
+ test*
10
+ *.log
11
+ vg-hubert_3/
12
+ *.tar
13
+
14
+ # Original README and cleanup scripts (keep local only)
15
+ README_ORIGINAL.md
16
+ # Training artifacts
17
+ checkpoints/
18
+ logs/
19
+ runs/
20
+ wandb/
21
+ *.ckpt
22
+ *.pth.tar
23
+
24
+ # Data directories (user-specific)
25
+ data/
26
+ datasets_cache/
27
+
28
+ # Build artifacts
29
+ build/
30
+ dist/
31
+ *.egg-info/
@@ -0,0 +1,29 @@
1
+ BSD 3-Clause License
2
+
3
+ Copyright (c) 2022, Puyuan Peng
4
+ All rights reserved.
5
+
6
+ Redistribution and use in source and binary forms, with or without
7
+ modification, are permitted provided that the following conditions are met:
8
+
9
+ 1. Redistributions of source code must retain the above copyright notice, this
10
+ list of conditions and the following disclaimer.
11
+
12
+ 2. Redistributions in binary form must reproduce the above copyright notice,
13
+ this list of conditions and the following disclaimer in the documentation
14
+ and/or other materials provided with the distribution.
15
+
16
+ 3. Neither the name of the copyright holder nor the names of its
17
+ contributors may be used to endorse or promote products derived from
18
+ this software without specific prior written permission.
19
+
20
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,7 @@
1
+ include README.md
2
+ include LICENSE
3
+ include requirements.txt
4
+ recursive-include vg_hubert *.py
5
+ recursive-exclude * __pycache__
6
+ recursive-exclude * *.py[co]
7
+ recursive-exclude * .DS_Store
@@ -0,0 +1,375 @@
1
+ Metadata-Version: 2.4
2
+ Name: vg-hubert
3
+ Version: 1.0.0
4
+ Summary: VG-HuBERT: Simplified interface for speech segmentation with HuggingFace Hub integration
5
+ Home-page: https://github.com/human-ai-lab/VG-HuBERT
6
+ Author: Puyuan Peng, David Harwath
7
+ Author-email: Puyuan Peng <harwath@utexas.edu>, David Harwath <harwath@utexas.edu>
8
+ License: BSD-3-Clause
9
+ Project-URL: Homepage, https://github.com/human-ai-lab/VG-HuBERT
10
+ Project-URL: Original Paper (Words), https://arxiv.org/abs/2203.15081
11
+ Project-URL: Original Paper (Syllables), https://www.isca-speech.org/archive/interspeech_2023/peng23_interspeech.html
12
+ Project-URL: HuggingFace Model, https://huggingface.co/hjvm/VG-HuBERT
13
+ Project-URL: Bug Tracker, https://github.com/human-ai-lab/VG-HuBERT/issues
14
+ Keywords: speech,audio,segmentation,syllables,self-supervised,hubert,vg-hubert
15
+ Classifier: Development Status :: 4 - Beta
16
+ Classifier: Intended Audience :: Science/Research
17
+ Classifier: License :: OSI Approved :: BSD License
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3.8
20
+ Classifier: Programming Language :: Python :: 3.9
21
+ Classifier: Programming Language :: Python :: 3.10
22
+ Classifier: Programming Language :: Python :: 3.11
23
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
24
+ Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
25
+ Requires-Python: >=3.8
26
+ Description-Content-Type: text/markdown
27
+ License-File: LICENSE
28
+ Requires-Dist: torch>=2.0.0
29
+ Requires-Dist: transformers>=4.20.0
30
+ Requires-Dist: huggingface-hub>=0.10.0
31
+ Requires-Dist: numpy>=1.20.0
32
+ Requires-Dist: soundfile>=0.10.0
33
+ Requires-Dist: scipy>=1.6.0
34
+ Provides-Extra: dev
35
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
36
+ Requires-Dist: black>=22.0.0; extra == "dev"
37
+ Requires-Dist: isort>=5.10.0; extra == "dev"
38
+ Requires-Dist: flake8>=4.0.0; extra == "dev"
39
+ Dynamic: author
40
+ Dynamic: home-page
41
+ Dynamic: license-file
42
+ Dynamic: requires-python
43
+
44
+ # VG-HuBERT: Speech Segmentation with Simplified Interface
45
+
46
+ Unsupervised syllable and word segmentation using visually grounded HuBERT (VG-HuBERT). This fork provides a simplified interface with HuggingFace Hub integration, updated PyTorch version to eliminate the need for PyTorch `multi_head_attention_forward` patching, optimized MinCut algorithm (~40x speedup), and PyPI package distribution.
47
+
48
+ ## Quick Start
49
+
50
+ ```python
51
+ from vg_hubert import Segmenter
52
+
53
+ # Syllable segmentation (RECOMMENDED: includes MinCutMerge post-processing)
54
+ segmenter = Segmenter(mode="syllable", merge_threshold=0.3)
55
+ outputs = segmenter("audio.wav")
56
+
57
+ # Word segmentation
58
+ word_segmenter = Segmenter(mode="word")
59
+ word_outputs = word_segmenter("audio.wav")
60
+ ```
61
+
62
+ ## Installation
63
+
64
+ ```bash
65
+ # From source
66
+ pip install git+https://github.com/hjvm/VG-HuBERT.git
67
+
68
+ # Or PyPI (after publishing)
69
+ pip install vg-hubert
70
+ ```
71
+
72
+ **Requirements**: Python ≥3.8, PyTorch ≥2.0, transformers, scipy, soundfile
73
+
74
+ ## Features
75
+
76
+ ✨ **New in this fork:**
77
+ - 🚀 **40x faster MinCut**: Optimized algorithm from [SyllableLM](https://github.com/AlanBaade/SyllableLM) (Baade et al., 2024)
78
+ - 🔧 **MinCutMerge post-processing**: Prevents over-segmentation (matches original paper)
79
+ - 🤗 **HuggingFace integration**: Auto-download models from Hub
80
+ - 🍎 **Apple Silicon support**: Native MPS acceleration
81
+ - 📦 **PyPI distribution**: Simple `pip install`
82
+ - 🧹 **No fairseq for inference**: Removed complex dependency
83
+
84
+ ## Usage
85
+
86
+ ### Basic Example
87
+
88
+ ```python
89
+ from vg_hubert import Segmenter
90
+ import soundfile as sf
91
+
92
+ # Load and segment
93
+ segmenter = Segmenter(
94
+ model_ckpt="hjvm/VG-HuBERT", # HuggingFace Hub or local path
95
+ mode="syllable",
96
+ device="cuda", # or "mps" or "cpu" (auto-detects best available)
97
+ merge_threshold=0.3 # Enable MinCutMerge (recommended)
98
+ )
99
+
100
+ outputs = segmenter("audio.wav")
101
+
102
+ # Access results
103
+ for start, end in outputs['segments']:
104
+ print(f"Segment: {start:.2f}s - {end:.2f}s")
105
+
106
+ # Access features
107
+ segment_features = outputs['segment_features'] # [num_segments, 768]
108
+ frame_features = outputs['hidden_states'] # [num_frames, 768]
109
+ ```
110
+
111
+ ### MinCut Configuration
112
+
113
+ The package supports multiple MinCut configurations for different use cases:
114
+
115
+ ```python
116
+ # Configuration 1: RECOMMENDED (matches original paper)
117
+ # - Fast algorithm + MinCutMerge post-processing
118
+ # - Prevents over-segmentation
119
+ segmenter = Segmenter(
120
+ mode="syllable",
121
+ merge_threshold=0.3, # Original paper value
122
+ min_segment_frames=2 # Filter very short segments
123
+ )
124
+
125
+ # Configuration 2: Plain MinCut (no merging)
126
+ # - Useful for analysis or more granular segmentation
127
+ segmenter = Segmenter(
128
+ mode="syllable",
129
+ merge_threshold=None # Disable MinCutMerge
130
+ )
131
+
132
+ # Configuration 3: Custom merge threshold
133
+ # - Tune for your specific needs
134
+ # - Higher = more merging = fewer segments
135
+ # - Lower = less merging = more segments
136
+ segmenter = Segmenter(
137
+ mode="syllable",
138
+ merge_threshold=0.5 # More aggressive merging
139
+ )
140
+ ```
141
+
142
+ See [examples/mincut_comparison.py](examples/mincut_comparison.py) for detailed comparison.
143
+
144
+ ### Low-Level API
145
+
146
+ For advanced users who need full control:
147
+
148
+ ```python
149
+ from vg_hubert.mincut import segment_with_mincut
150
+ import numpy as np
151
+
152
+ # Extract features (see examples/ for full code)
153
+ features = ... # Shape: (num_frames, 768)
154
+
155
+ # Apply MinCut with full control
156
+ boundaries, ssm = segment_with_mincut(
157
+ features=features,
158
+ K=10, # Number of boundaries
159
+ merge_threshold=0.3, # Set to None for plain MinCut
160
+ min_segment_frames=2,
161
+ min_hop=3, # Minimum segment length
162
+ max_hop=50 # Maximum segment length
163
+ )
164
+ ```
165
+
166
+ ### Parameters
167
+
168
+ - **mode**: `"syllable"` (MinCut + feature similarity) or `"word"` (CLS attention)
169
+ - **layer**: HuBERT layer to use (default: 8 for syllables, 9 for words)
170
+ - **device**: `"cuda"`, `"mps"`, or `"cpu"` (defaults to CUDA if available, falls back to MPS on Apple Silicon, then CPU)
171
+ - **sec_per_syllable**: Target syllable duration for MinCut (default: 0.2)
172
+ - **merge_threshold**: Cosine similarity threshold for merging adjacent segments (default: 0.3, set to `None` to disable)
173
+ - **min_segment_frames**: Filter segments with ≤ this many frames (default: 2)
174
+ - **attn_threshold**: Attention threshold for word boundaries (default: 0.25)
175
+
176
+ See [examples/](examples/) for more usage patterns.
177
+
178
+ ## Model Details
179
+
180
+ ### Checkpoints
181
+
182
+ Two pre-trained models optimized for different tasks:
183
+
184
+ | Checkpoint | Task | Layer | Algorithm | Size |
185
+ |------------|------|-------|-----------|------|
186
+ | `vg-hubert-syllable.pth` | Syllable | 8 | MinCut + MinCutMerge | 474 MB |
187
+ | `vg-hubert-word.pth` | Word | 9 | CLS Attention | 361 MB |
188
+
189
+ ### Algorithm Details
190
+
191
+ **MinCut Segmentation (Syllables):**
192
+ 1. Extract HuBERT features from layer 8
193
+ 2. Compute self-similarity matrix (SSM)
194
+ 3. Apply efficient MinCut algorithm (Baade et al., 2024)
195
+ - ~40x faster than original O(N²K) implementation
196
+ - Uses cumulative sums for O(1) range queries
197
+ 4. **Optional**: Apply MinCutMerge post-processing (Peng et al., 2023)
198
+ - Iteratively merge adjacent segments with cosine similarity ≥ threshold
199
+ - Prevents over-segmentation
200
+ - Recommended for production use
201
+
202
+ **Performance Comparison:**
203
+
204
+ | Configuration | F1 (LibriSpeech) | Speed (ms/utt) | Speedup |
205
+ |---------------|------------------|----------------|---------|
206
+ | Original MinCut | 0.501 | 7524 | 1.0x |
207
+ | New MinCut | 0.501 | 169 | 44.5x |
208
+ | New + MinCutMerge-0.3 ⭐ | TBD | 171 | 44.0x |
209
+
210
+ *Note: LibriSpeech results shown; original paper reports F1=0.603 on SpokenCOCO*
211
+
212
+ ### Performance (SpokenCOCO - Original Paper)
213
+
214
+ **Syllable Segmentation:**
215
+ - Boundary F1: 0.603
216
+ - Boundary Precision: 0.574
217
+ - Boundary Recall: 0.636
218
+
219
+ **Word Discovery:**
220
+ - Token F1: 0.195
221
+ - Type F1: 0.174
222
+ - NED: 0.748
223
+
224
+ ## Training
225
+
226
+ VG-HuBERT uses **visually-grounded contrastive learning** to learn speech representations. The model jointly trains on speech and images using datasets like SpokenCOCO or Places.
227
+
228
+ ### Training Setup
229
+
230
+ 1. **Install training dependencies**:
231
+ ```bash
232
+ pip install -r requirements.txt # Includes fairseq, apex, Pillow, etc.
233
+ ```
234
+
235
+ 2. **Download datasets**:
236
+ - **SpokenCOCO**: [Spoken captions](https://data.csail.mit.edu/placesaudio/SpokenCOCO.tar.gz) + [MSCOCO images](http://cocodataset.org/#download)
237
+ - **Places**: [Spoken descriptions](https://data.csail.mit.edu/placesaudio/) + [Places365 images](http://places2.csail.mit.edu/)
238
+
239
+ 3. **Download pre-trained models** for initialization:
240
+ - [HuBERT Base](https://dl.fbaipublicfiles.com/hubert/hubert_base_ls960.pt) (pretrained on LibriSpeech 960h)
241
+ - [DINO ViT](https://dl.fbaipublicfiles.com/dino/dino_vitsmall8_pretrain/dino_vitsmall8_pretrain_full_checkpoint.pth) (vision encoder)
242
+
243
+ 4. **Configure training**:
244
+ ```yaml
245
+ # configs/spokencoco.yaml
246
+ train_audio_dataset_json_file: "/path/to/SpokenCOCO_train.json"
247
+ val_audio_dataset_json_file: "/path/to/SpokenCOCO_val.json"
248
+ load_hubert_weights: "/path/to/hubert_base_ls960.pt"
249
+ load_pretrained_vit: "/path/to/dino_vitsmall8_pretrain.pth"
250
+ batch_size: 32
251
+ n_epochs: 30
252
+ gpus: "0,1,2,3"
253
+ ```
254
+
255
+ 5. **Train**:
256
+ ```bash
257
+ python train.py --config configs/spokencoco.yaml
258
+ ```
259
+
260
+ ### Training Outputs
261
+
262
+ - Checkpoints saved to `exp_dir/` (default: `./checkpoints/`)
263
+ - TensorBoard logs in experiment directory
264
+ - Config saved as `config.yaml` in experiment directory
265
+
266
+ ### Architecture
267
+
268
+ **Dual-encoder with cross-modal transformer**:
269
+ - **Audio encoder**: HuBERT Base (12 layers, 768-dim)
270
+ - **Vision encoder**: ViT Small/Base (DINO pretrained)
271
+ - **Cross-modal layers**: 5 transformer layers for audio-image interaction
272
+ - **Loss**: Margin InfoNCE (contrastive learning in common embedding space)
273
+
274
+ The trained audio encoder can then be used for segmentation without the vision components.
275
+
276
+ ### Training from Scratch
277
+
278
+ The package includes all training code:
279
+ - `vg_hubert/model/`: Dual encoder, audio/vision transformers
280
+ - `vg_hubert/training/`: Trainer, optimizers, utilities
281
+ - `vg_hubert/datasets/`: SpokenCOCO and Places data loaders
282
+
283
+ See [configs/](configs/) for complete training examples.
284
+
285
+ ## What's Different in This Fork
286
+
287
+ 1. **No PyTorch patching**: Uses native `attn_implementation='eager'` (PyTorch 2.0+)
288
+ 2. **Simplified interface**: Single `Segmenter` class for all use cases
289
+ 3. **HuggingFace Hub**: Automatic model downloading
290
+ 4. **Complete package**: Both training and inference (like Sylber)
291
+ 5. **PyPI distribution**: Easy installation via pip
292
+ 6. **Apple Silicon support**: Automatic MPS (Metal Performance Shaders) GPU acceleration
293
+ 7. **Optimized MinCut**: **~20-50x faster** syllable segmentation using efficient algorithm from [SyllableLM](https://github.com/AlanBaade/SyllableLM) (Baade et al., 2024) with no quality degradation
294
+
295
+ ## Implementation Details
296
+
297
+ For inference, this package uses HuggingFace's `transformers.HubertModel` instead of the original fairseq implementation. This is possible because VG-HuBERT's audio encoder architecture is identical to the standard HuBERT model. The visual grounding training adds a vision encoder and cross-modal transformer layers, but these components are only used during training to learn better speech representations. At inference time, only the audio encoder weights are needed, which are fully compatible with the HuggingFace HuBERT architecture. This simplifies deployment and eliminates the fairseq dependency for inference.
298
+
299
+ ## Citations
300
+
301
+ ### VG-HuBERT Original Work
302
+
303
+ **Syllable Segmentation:**
304
+ ```bibtex
305
+ @inproceedings{peng2023syllable,
306
+ title={Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model},
307
+ author={Peng, Puyuan and Li, Shang-Wen and Räsänen, Okko and Mohamed, Abdelrahman and Harwath, David},
308
+ booktitle={Interspeech},
309
+ year={2023}
310
+ }
311
+ ```
312
+
313
+ **Word Discovery:**
314
+ ```bibtex
315
+ @inproceedings{peng2022word,
316
+ title={Word Discovery in Visually Grounded, Self-Supervised Speech Models},
317
+ author={Peng, Puyuan and Harwath, David},
318
+ booktitle={Interspeech},
319
+ year={2022}
320
+ }
321
+ ```
322
+
323
+ ### Interface Design
324
+
325
+ This package follows the interface design of Sylber:
326
+ ```bibtex
327
+ @article{cho2024sylber,
328
+ title={Sylber: Syllabic Embedding Representation of Speech from Raw Audio},
329
+ author={Cho, Cheol Jun and Lee, Nicholas and Gupta, Akshat and Agarwal, Dhruv and Chen, Ethan and Black, Alan W and Anumanchipalli, Gopala K},
330
+ journal={arXiv preprint arXiv:2410.07168},
331
+ year={2024}
332
+ }
333
+ ```
334
+
335
+ ### Optimized MinCut Algorithm
336
+
337
+ The MinCut algorithm used for syllable segmentation has been updated to use the efficient implementation from SyllableLM (Baade et al., 2024), which provides **~20-50x speedup** over the original with no statistically significant quality difference:
338
+
339
+ ```bibtex
340
+ @misc{baade2024syllablelmlearningcoarsesemantic,
341
+ title={SyllableLM: Learning Coarse Semantic Units for Speech Language Models},
342
+ author={Alan Baade and Puyuan Peng and David Harwath},
343
+ year={2024},
344
+ eprint={2410.04029},
345
+ archivePrefix={arXiv},
346
+ primaryClass={cs.CL},
347
+ url={https://arxiv.org/abs/2410.04029},
348
+ }
349
+ ```
350
+
351
+ **Performance Comparison (LibriSpeech test-clean, 50 utterances):**
352
+ - Speed: 6961ms → 133ms per utterance (**52x faster**)
353
+ - Quality: F1=0.377 → 0.372 (p=0.22, not significant)
354
+ - 82% of utterances produce identical segmentations
355
+
356
+ Key optimizations:
357
+ - Cumulative sum preprocessing for O(1) range queries
358
+ - Segment length constraints (min_hop=3, max_hop=50 frames)
359
+ - 5-component cost calculation
360
+
361
+ See [vg_hubert/tests/mincut_validation.ipynb](vg_hubert/tests/mincut_validation.ipynb) for full validation results.
362
+
363
+ ## Related Repositories
364
+
365
+ - **Original implementations**: [word-discovery](https://github.com/jasonppy/word-discovery), [syllable-discovery](https://github.com/jasonppy/syllable-discovery)
366
+ - **Fork parent**: [human-ai-lab/VG-HuBERT](https://github.com/human-ai-lab/VG-HuBERT)
367
+ - **Interface inspiration**: [Sylber](https://github.com/Berkeley-Speech-Group/sylber)
368
+
369
+ ## License
370
+
371
+ BSD-3-Clause License (same as original repositories)
372
+
373
+ ## Contributing
374
+
375
+ Issues and pull requests welcome. Please ensure changes maintain compatibility with original model weights and include proper attribution.
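
Reviewer note on the PKG-INFO README above (not part of the package). The README attributes the MinCut speedup to "cumulative sums for O(1) range queries" and describes MinCutMerge as iteratively merging adjacent segments whose cosine similarity is at or above a threshold. The sketch below illustrates both ideas in isolation. It is not the code in `vg_hubert/mincut.py`: the function names (`build_prefix_sums`, `block_sum`, `merge_adjacent`) are hypothetical, and comparing mean-pooled segment features is an assumption about how segment similarity might be measured.

```python
import numpy as np


def build_prefix_sums(ssm):
    """2D cumulative sums with a zero border, so block sums need no edge cases."""
    n = ssm.shape[0]
    prefix = np.zeros((n + 1, n + 1), dtype=np.float64)
    prefix[1:, 1:] = ssm.cumsum(axis=0).cumsum(axis=1)
    return prefix


def block_sum(prefix, r0, r1, c0, c1):
    """Sum of ssm[r0:r1, c0:c1] in O(1) via inclusion-exclusion."""
    return prefix[r1, c1] - prefix[r0, c1] - prefix[r1, c0] + prefix[r0, c0]


def merge_adjacent(features, boundaries, threshold):
    """Greedily merge neighbouring segments whose mean features are similar.

    `boundaries` is a sorted list of frame indices that includes 0 and
    num_frames. Mean pooling per segment is an illustrative assumption.
    """
    bounds = list(boundaries)
    merged = True
    while merged and len(bounds) > 2:
        merged = False
        means = [features[s:e].mean(axis=0) for s, e in zip(bounds[:-1], bounds[1:])]
        for i in range(len(means) - 1):
            a, b = means[i], means[i + 1]
            cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
            if cos >= threshold:
                del bounds[i + 1]  # drop the boundary between segments i and i+1
                merged = True
                break
    return bounds


# Range query: one O(N^2) preprocessing pass, then every block sum is O(1).
rng = np.random.default_rng(0)
frames = rng.normal(size=(20, 8))   # stand-in for (num_frames, 768) features
ssm = frames @ frames.T             # self-similarity matrix
prefix = build_prefix_sums(ssm)
print(np.isclose(block_sum(prefix, 3, 9, 3, 9), ssm[3:9, 3:9].sum()))

# Merging: four 2-frame segments collapse into two homogeneous ones.
toy = np.array([[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4)
print(merge_adjacent(toy, [0, 2, 4, 6, 8], threshold=0.3))  # [0, 4, 8]
```

With the prefix table in place, a dynamic program over candidate boundaries can score any segment's within-block similarity in constant time, which is consistent with the README's claim of removing a factor from the original O(N²K) cost. The merge loop matches the README's stated direction for `merge_threshold`: a higher threshold means fewer pairs qualify, so fewer merges and more segments remain only if the threshold is compared as "similarity ≥ threshold", as assumed here.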