easy-whisperx 0.0.4__tar.gz
- easy_whisperx-0.0.4/PKG-INFO +323 -0
- easy_whisperx-0.0.4/README.md +286 -0
- easy_whisperx-0.0.4/pyproject.toml +120 -0
- easy_whisperx-0.0.4/setup.cfg +4 -0
- easy_whisperx-0.0.4/src/easy_whisperx/__init__.py +12 -0
- easy_whisperx-0.0.4/src/easy_whisperx/aligner.py +92 -0
- easy_whisperx-0.0.4/src/easy_whisperx/base_model.py +94 -0
- easy_whisperx-0.0.4/src/easy_whisperx/bulk_executor.py +94 -0
- easy_whisperx-0.0.4/src/easy_whisperx/diarizer.py +88 -0
- easy_whisperx-0.0.4/src/easy_whisperx/performance.py +91 -0
- easy_whisperx-0.0.4/src/easy_whisperx/py.typed +0 -0
- easy_whisperx-0.0.4/src/easy_whisperx/transcriber.py +73 -0
- easy_whisperx-0.0.4/src/easy_whisperx/utils.py +38 -0
- easy_whisperx-0.0.4/src/easy_whisperx.egg-info/PKG-INFO +323 -0
- easy_whisperx-0.0.4/src/easy_whisperx.egg-info/SOURCES.txt +23 -0
- easy_whisperx-0.0.4/src/easy_whisperx.egg-info/dependency_links.txt +1 -0
- easy_whisperx-0.0.4/src/easy_whisperx.egg-info/requires.txt +19 -0
- easy_whisperx-0.0.4/src/easy_whisperx.egg-info/top_level.txt +1 -0
- easy_whisperx-0.0.4/tests/test_aligner.py +136 -0
- easy_whisperx-0.0.4/tests/test_basic.py +28 -0
- easy_whisperx-0.0.4/tests/test_diarizer.py +533 -0
- easy_whisperx-0.0.4/tests/test_integration.py +90 -0
- easy_whisperx-0.0.4/tests/test_performance.py +147 -0
- easy_whisperx-0.0.4/tests/test_transcriber.py +190 -0
- easy_whisperx-0.0.4/tests/test_utils.py +164 -0
@@ -0,0 +1,323 @@
+Metadata-Version: 2.4
+Name: easy-whisperx
+Version: 0.0.4
+Summary: A Python package for easy transcription using WhisperX.
+Author-email: Aryan Falahtpisheh <aryanfalahat@gmail.com>
+License: BSD-2-Clause
+Project-URL: Homepage, https://github.com/falahat/easy-whisperx
+Project-URL: Repository, https://github.com/falahat/easy-whisperx.git
+Project-URL: Issues, https://github.com/falahat/easy-whisperx/issues
+Keywords: whisperx,transcription,audio,speech-to-text
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Requires-Python: <3.13,>=3.10
+Description-Content-Type: text/markdown
+Requires-Dist: numpy
+Requires-Dist: torch
+Requires-Dist: pydub
+Requires-Dist: pandas
+Requires-Dist: whisperx-typed
+Requires-Dist: whisperx-stubs
+Provides-Extra: dev
+Requires-Dist: pytest>=6.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=2.0.0; extra == "dev"
+Requires-Dist: mypy>=1.0.0; extra == "dev"
+Requires-Dist: black>=22.0.0; extra == "dev"
+Requires-Dist: flake8>=4.0.0; extra == "dev"
+Requires-Dist: coverage; extra == "dev"
+Provides-Extra: notebook
+Requires-Dist: jupyter>=1.0.0; extra == "notebook"
+Requires-Dist: pandas; extra == "notebook"
+Requires-Dist: matplotlib; extra == "notebook"
+
+# easy-whisperx
+
+A streamlined Python wrapper around the [WhisperX](https://github.com/m-bain/whisperX) project, providing enhanced type safety, automatic resource management, and simplified API for audio transcription with GPU acceleration, word-level alignment, and speaker diarization.
+
+## Acknowledgments
+
+This project builds upon the outstanding work of the [WhisperX team](https://github.com/m-bain/whisperX), particularly:
+
+- **Max Bain** and contributors for creating WhisperX
+- The original [Whisper](https://github.com/openai/whisper) team at OpenAI
+- The [faster-whisper](https://github.com/guillaumekln/faster-whisper) project for performance improvements
+
+**What easy-whisperx adds:**
+
+- **Type Safety**: Comprehensive type hints and mypy compatibility
+- **Resource Management**: Automatic GPU memory cleanup using context managers
+- **Performance Tracking**: Built-in metrics collection for all operations
+- **Simplified API**: Cleaner interface with sensible defaults
+- **Error Handling**: Robust error handling with detailed logging
+- **Bulk Processing**: Efficient batch processing capabilities
+
+All the core transcription, alignment, and diarization capabilities are provided by the underlying WhisperX library.
+
+## Python Version Requirements
+
+**This package requires Python 3.10, 3.11, or 3.12.** Python 3.13+ is not supported due to dependency limitations with the WhisperX library.
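A runtime guard for the version range stated above could look like the following sketch. The helper name `is_supported_python` is hypothetical and not part of easy-whisperx; only the 3.10 to 3.12 range comes from the README text.

```python
import sys


def is_supported_python(version=None):
    """Return True for the Python versions the README lists (3.10, 3.11, 3.12)."""
    if version is None:
        version = sys.version_info
    major, minor = version[0], version[1]
    return major == 3 and 10 <= minor <= 12


# Report whether the running interpreter falls inside the supported range.
print(is_supported_python())
```

Passing a tuple such as `(3, 13)` makes the check easy to exercise without switching interpreters.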
+
+## Features
+
+- **Audio Transcription**: WhisperX-powered speech-to-text conversion
+- **Word-level Alignment**: Precise timestamp alignment for individual words
+- **Speaker Diarization**: Automatic speaker identification and assignment
+- **GPU Acceleration**: CUDA support for faster processing
+- **Performance Tracking**: Built-in metrics collection for all operations
+- **Bulk Processing**: Efficient batch processing with individual item tracking
+- **Type Safety**: Comprehensive type hints throughout
+- **Context Management**: Automatic resource cleanup and memory management
+
+## Installation
+
+### Standard Installation
+
+```bash
+git clone https://github.com/falahat/easy-whisperx.git
+cd easy-whisperx
+pip install -e .
+```
+
+### Development Installation
+
+```bash
+git clone https://github.com/falahat/easy-whisperx.git
+cd easy-whisperx
+pip install -e .[dev]
+```
+
+### Notebook Support
+
+```bash
+pip install -e .[notebook]
+```
+
+## Prerequisites for GPU Transcription
+
+1. **NVIDIA GPU** with CUDA support
+2. **Hugging Face Token** (for speaker diarization models)
+3. **PyTorch with GPU support**
+
+```bash
+# Install PyTorch with GPU support
+pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+```
+
+## Setting up Transcription Environment
+
+1. **Get a Hugging Face Token**:
+- Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
+- Create a token with "read" permissions
+- Accept user agreements for segmentation and diarization models
+
+2. **Set Environment Variable**:
+
+```powershell
+# Windows PowerShell
+$env:HF_TOKEN="your_token_here"
+```
+
+```bash
+# Linux/macOS
+export HF_TOKEN="your_token_here"
+```
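Once `HF_TOKEN` is exported as shown above, a script can read it back and decide whether diarization is possible. This is a minimal sketch; `read_hf_token` is a hypothetical helper, not something the package is known to provide.

```python
import os


def read_hf_token(env=None):
    """Return the HF_TOKEN value, or None when it is unset or blank."""
    if env is None:
        env = os.environ
    token = env.get("HF_TOKEN", "").strip()
    return token or None


token = read_hf_token()
if token is None:
    print("HF_TOKEN is not set; speaker diarization will be skipped.")
```

Treating a blank value the same as a missing one avoids passing an empty string to a model download that would then fail later with a less obvious error.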
+
+## Quick Start
+
+### Basic Transcription
+
+```python
+from easy_whisperx import Transcriber
+
+# Initialize transcriber
+transcriber = Transcriber(
+    model_size="base",
+    device="cuda",  # or "cpu"
+    compute_type="float16",
+    batch_size=16
+)
+
+# Transcribe audio file
+with transcriber:
+    result = transcriber("path/to/audio.mp3")
+    print(result["text"])
+```
+
+### Complete Pipeline with Alignment and Diarization
+
+```python
+import os
+from easy_whisperx import Transcriber, Aligner, Diarizer
+
+audio_path = "path/to/audio.mp3"
+hf_token = os.getenv("HF_TOKEN")
+
+# Step 1: Transcribe
+with Transcriber("base", "cuda", "float16", 16) as transcriber:
+    transcript = transcriber(audio_path)
+
+# Step 2: Align words
+with Aligner("cuda", "en") as aligner:
+    aligned_transcript = aligner(transcript["segments"], audio_path)
+
+# Step 3: Diarize speakers (optional)
+if hf_token:
+    with Diarizer("cuda", hf_token) as diarizer:
+        final_transcript = diarizer(aligned_transcript, audio_path)
+else:
+    final_transcript = aligned_transcript
+
+print(final_transcript)
+```
+
+### Bulk Processing
+
+```python
+from easy_whisperx import BulkExecutor, Transcriber
+
+audio_files = ["file1.mp3", "file2.mp3", "file3.mp3"]
+
+with Transcriber("base", "cuda", "float16", 16) as transcriber:
+    with BulkExecutor(transcriber) as executor:
+        def transcribe_file(model, audio_path, tracker):
+            result = model(audio_path)
+            tracker["segments_count"] = len(result.get("segments", []))
+
+        executor.for_each(audio_files, transcribe_file)
+        metrics = executor.get_metrics()
+        print(f"Bulk processing metrics: {metrics}")
+```
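The `for_each` callback above receives the model, one item, and a per-item tracker. How `BulkExecutor` implements this is not shown in the README, so the following is only one hypothetical way such a loop could work, with a plain dict standing in for the tracker. `for_each_sketch`, `fake_model`, and the metric keys `duration_seconds` and `error` are illustrative names, not the package's API.

```python
import time


def for_each_sketch(model, items, func):
    """Call func(model, item, tracker) per item and collect one metrics dict each."""
    all_metrics = []
    for item in items:
        tracker = {"item": item}
        start = time.perf_counter()
        try:
            func(model, item, tracker)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            tracker["error"] = repr(exc)
        tracker["duration_seconds"] = time.perf_counter() - start
        all_metrics.append(tracker)
    return all_metrics


# Stand-in "model" so the sketch runs without WhisperX installed.
def fake_model(path):
    if path.endswith(".txt"):
        raise ValueError("not an audio file")
    return {"segments": [{"text": "hello"}, {"text": "world"}]}


def transcribe_file(model, audio_path, tracker):
    result = model(audio_path)
    tracker["segments_count"] = len(result.get("segments", []))


metrics = for_each_sketch(fake_model, ["a.mp3", "notes.txt"], transcribe_file)
print(metrics)
```

Recording the failure on the tracker and continuing is a design choice of this sketch; whether the real executor continues or aborts on an error is not stated in the README.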
+
+## WhisperX Integration
+
+This package uses WhisperX as its core transcription engine. We maintain a fork at [falahat/whisperx](https://github.com/falahat/whisperx) that may include specific optimizations or compatibility fixes, but all credit for the underlying transcription technology goes to the original [WhisperX project](https://github.com/m-bain/whisperX).
+
+The original WhisperX provides:
+
+- Fast automatic speech recognition with word-level timestamps
+- Speaker diarization capabilities
+- Multiple language support
+- GPU acceleration with optimized inference
+
+Our wrapper adds the resource management and type safety layer on top of this excellent foundation.
+
+## Development
+
+### Setting up Development Environment
+
+```bash
+git clone https://github.com/falahat/easy-whisperx.git
+cd easy-whisperx
+
+# Create virtual environment (note the .venv name)
+python -m venv .venv
+
+# Activate virtual environment
+# Windows PowerShell:
+.\.venv\Scripts\Activate.ps1
+# Linux/macOS:
+source .venv/bin/activate
+
+# Install in development mode
+pip install -e .[dev]
+```
+
+### Running Tests
+
+```bash
+# Run all tests
+pytest
+
+# Run with coverage report
+pytest --cov=easy_whisperx --cov-report=html
+
+# Run specific test file
+pytest tests/test_transcriber.py -v
+
+# Run integration tests
+pytest -m integration
+```
+
+### Code Quality Tools
+
+The project uses:
+
+- **Black** for code formatting
+- **mypy** for type checking
+- **flake8** for linting
+- **pytest** for testing
+
+```bash
+# Format code
+black src/ tests/
+
+# Type checking
+mypy src/easy_whisperx/
+
+# Linting
+flake8 src/easy_whisperx/
+```
+
+## Core Components
+
+The package is built with a modular architecture:
+
+- **`Transcriber`** - Main transcription using WhisperX models
+- **`Aligner`** - Word-level timestamp alignment
+- **`Diarizer`** - Speaker identification and assignment
+- **`BulkExecutor`** - Bulk processing with performance tracking
+- **`PerformanceTracker`** - Performance metrics collection
+- **`BaseWhisperxModel`** - Abstract base for model management
+- **Utility functions** - Audio loading and device configuration
+
+## Performance and Memory Management
+
+The package includes automatic resource management:
+
+- **Context Managers**: All models automatically clean up GPU memory
+- **Performance Tracking**: Built-in metrics for all operations
+- **Memory Optimization**: Automatic garbage collection and CUDA cache clearing
+- **Error Handling**: Graceful failure handling with detailed logging
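The cleanup behaviour listed above (context manager, garbage collection, CUDA cache clearing) can be illustrated with a small sketch. This is not the package's `BaseWhisperxModel`; `ManagedModelSketch` and its `loader` argument are hypothetical, and the `torch` import is treated as optional so the example runs on machines without PyTorch.

```python
import gc


class ManagedModelSketch:
    """Hold a model inside a `with` block and release it on exit."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable that builds the model
        self.model = None

    def __enter__(self):
        self.model = self._loader()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.model = None  # drop our reference so the object can be collected
        gc.collect()
        try:
            import torch  # optional here; the sketch still runs without it
        except ImportError:
            return False
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached, unused GPU memory blocks
        return False  # never swallow exceptions raised inside the block


with ManagedModelSketch(lambda: object()) as managed:
    loaded = managed.model is not None
released = managed.model is None
print(loaded, released)
```

Dropping the reference before calling `gc.collect()` matters: `torch.cuda.empty_cache()` can only release memory that no live tensor still occupies.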
+
+## API Reference
+
+### Device Configuration
+
+```python
+from easy_whisperx.utils import _determine_device_config
+
+# Automatic device selection
+device, compute_type = _determine_device_config("auto", "auto")
+# Returns ("cuda", "float16") if GPU available, ("cpu", "int8") otherwise
+```
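The comment above states the outcome for `"auto"`. The body of `_determine_device_config` is not shown in the README, so the following is only a sketch of that rule. `pick_device_config` and its `cuda_available` parameter are hypothetical; in practice the flag would typically come from `torch.cuda.is_available()`, and passing explicit values through unchanged is an assumption.

```python
def pick_device_config(device="auto", compute_type="auto", cuda_available=False):
    """Resolve "auto" placeholders to a concrete (device, compute_type) pair."""
    if device == "auto":
        device = "cuda" if cuda_available else "cpu"
    if compute_type == "auto":
        # Half precision on GPU, 8-bit integers on CPU, as the README comment states.
        compute_type = "float16" if device == "cuda" else "int8"
    return device, compute_type


print(pick_device_config(cuda_available=True))   # ('cuda', 'float16')
print(pick_device_config(cuda_available=False))  # ('cpu', 'int8')
```

Taking the availability flag as a parameter keeps the rule testable on machines without a GPU or without PyTorch installed.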
+
+### Performance Tracking
+
+```python
+from easy_whisperx import PerformanceTracker
+
+with PerformanceTracker("my_operation") as tracker:
+    # Your code here
+    tracker["custom_metric"] = "value"
+
+metrics = tracker.to_dict()
+print(f"Operation took {metrics['my_operation']['duration_seconds']} seconds")
+```
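The usage above implies three behaviours: the tracker times the `with` block, accepts extra metrics through item assignment, and returns a dict keyed by the operation name. The sketch below reproduces just those behaviours; `TrackerSketch` is a hypothetical stand-in, not the package's `PerformanceTracker`, whose internals are not shown here.

```python
import time


class TrackerSketch:
    """Time a `with` block and store extra metrics set with tracker[key] = value."""

    def __init__(self, name):
        self.name = name
        self._metrics = {}
        self._start = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._metrics["duration_seconds"] = time.perf_counter() - self._start
        return False  # let exceptions propagate

    def __setitem__(self, key, value):
        self._metrics[key] = value

    def to_dict(self):
        # Nest the metrics under the operation name, matching the lookup above.
        return {self.name: dict(self._metrics)}


with TrackerSketch("my_operation") as tracker:
    tracker["custom_metric"] = "value"

metrics = tracker.to_dict()
print(f"Operation took {metrics['my_operation']['duration_seconds']} seconds")
```

`time.perf_counter()` is used rather than `time.time()` because it is monotonic and intended for measuring short durations.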
+
+## Contributing
+
+1. Fork the repository
+2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+3. Make your changes with tests
+4. Ensure all tests pass (`pytest`)
+5. Check code quality (`black src/ tests/` and `mypy src/`)
+6. Submit a pull request
+
+## License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
@@ -0,0 +1,286 @@
+# easy-whisperx
+
+A streamlined Python wrapper around the [WhisperX](https://github.com/m-bain/whisperX) project, providing enhanced type safety, automatic resource management, and simplified API for audio transcription with GPU acceleration, word-level alignment, and speaker diarization.
+
+## Acknowledgments
+
+This project builds upon the outstanding work of the [WhisperX team](https://github.com/m-bain/whisperX), particularly:
+
+- **Max Bain** and contributors for creating WhisperX
+- The original [Whisper](https://github.com/openai/whisper) team at OpenAI
+- The [faster-whisper](https://github.com/guillaumekln/faster-whisper) project for performance improvements
+
+**What easy-whisperx adds:**
+
+- **Type Safety**: Comprehensive type hints and mypy compatibility
+- **Resource Management**: Automatic GPU memory cleanup using context managers
+- **Performance Tracking**: Built-in metrics collection for all operations
+- **Simplified API**: Cleaner interface with sensible defaults
+- **Error Handling**: Robust error handling with detailed logging
+- **Bulk Processing**: Efficient batch processing capabilities
+
+All the core transcription, alignment, and diarization capabilities are provided by the underlying WhisperX library.
+
+## Python Version Requirements
+
+**This package requires Python 3.10, 3.11, or 3.12.** Python 3.13+ is not supported due to dependency limitations with the WhisperX library.
+
+## Features
+
+- **Audio Transcription**: WhisperX-powered speech-to-text conversion
+- **Word-level Alignment**: Precise timestamp alignment for individual words
+- **Speaker Diarization**: Automatic speaker identification and assignment
+- **GPU Acceleration**: CUDA support for faster processing
+- **Performance Tracking**: Built-in metrics collection for all operations
+- **Bulk Processing**: Efficient batch processing with individual item tracking
+- **Type Safety**: Comprehensive type hints throughout
+- **Context Management**: Automatic resource cleanup and memory management
+
+## Installation
+
+### Standard Installation
+
+```bash
+git clone https://github.com/falahat/easy-whisperx.git
+cd easy-whisperx
+pip install -e .
+```
+
+### Development Installation
+
+```bash
+git clone https://github.com/falahat/easy-whisperx.git
+cd easy-whisperx
+pip install -e .[dev]
+```
+
+### Notebook Support
+
+```bash
+pip install -e .[notebook]
+```
+
+## Prerequisites for GPU Transcription
+
+1. **NVIDIA GPU** with CUDA support
+2. **Hugging Face Token** (for speaker diarization models)
+3. **PyTorch with GPU support**
+
+```bash
+# Install PyTorch with GPU support
+pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+```
+
+## Setting up Transcription Environment
+
+1. **Get a Hugging Face Token**:
+- Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
+- Create a token with "read" permissions
+- Accept user agreements for segmentation and diarization models
+
+2. **Set Environment Variable**:
+
+```powershell
+# Windows PowerShell
+$env:HF_TOKEN="your_token_here"
+```
+
+```bash
+# Linux/macOS
+export HF_TOKEN="your_token_here"
+```
+
+## Quick Start
+
+### Basic Transcription
+
+```python
+from easy_whisperx import Transcriber
+
+# Initialize transcriber
+transcriber = Transcriber(
+    model_size="base",
+    device="cuda",  # or "cpu"
+    compute_type="float16",
+    batch_size=16
+)
+
+# Transcribe audio file
+with transcriber:
+    result = transcriber("path/to/audio.mp3")
+    print(result["text"])
+```
+
+### Complete Pipeline with Alignment and Diarization
+
+```python
+import os
+from easy_whisperx import Transcriber, Aligner, Diarizer
+
+audio_path = "path/to/audio.mp3"
+hf_token = os.getenv("HF_TOKEN")
+
+# Step 1: Transcribe
+with Transcriber("base", "cuda", "float16", 16) as transcriber:
+    transcript = transcriber(audio_path)
+
+# Step 2: Align words
+with Aligner("cuda", "en") as aligner:
+    aligned_transcript = aligner(transcript["segments"], audio_path)
+
+# Step 3: Diarize speakers (optional)
+if hf_token:
+    with Diarizer("cuda", hf_token) as diarizer:
+        final_transcript = diarizer(aligned_transcript, audio_path)
+else:
+    final_transcript = aligned_transcript
+
+print(final_transcript)
+```
+
+### Bulk Processing
+
+```python
+from easy_whisperx import BulkExecutor, Transcriber
+
+audio_files = ["file1.mp3", "file2.mp3", "file3.mp3"]
+
+with Transcriber("base", "cuda", "float16", 16) as transcriber:
+    with BulkExecutor(transcriber) as executor:
+        def transcribe_file(model, audio_path, tracker):
+            result = model(audio_path)
+            tracker["segments_count"] = len(result.get("segments", []))
+
+        executor.for_each(audio_files, transcribe_file)
+        metrics = executor.get_metrics()
+        print(f"Bulk processing metrics: {metrics}")
+```
+
+## WhisperX Integration
+
+This package uses WhisperX as its core transcription engine. We maintain a fork at [falahat/whisperx](https://github.com/falahat/whisperx) that may include specific optimizations or compatibility fixes, but all credit for the underlying transcription technology goes to the original [WhisperX project](https://github.com/m-bain/whisperX).
+
+The original WhisperX provides:
+
+- Fast automatic speech recognition with word-level timestamps
+- Speaker diarization capabilities
+- Multiple language support
+- GPU acceleration with optimized inference
+
+Our wrapper adds the resource management and type safety layer on top of this excellent foundation.
+
+## Development
+
+### Setting up Development Environment
+
+```bash
+git clone https://github.com/falahat/easy-whisperx.git
+cd easy-whisperx
+
+# Create virtual environment (note the .venv name)
+python -m venv .venv
+
+# Activate virtual environment
+# Windows PowerShell:
+.\.venv\Scripts\Activate.ps1
+# Linux/macOS:
+source .venv/bin/activate
+
+# Install in development mode
+pip install -e .[dev]
+```
+
+### Running Tests
+
+```bash
+# Run all tests
+pytest
+
+# Run with coverage report
+pytest --cov=easy_whisperx --cov-report=html
+
+# Run specific test file
+pytest tests/test_transcriber.py -v
+
+# Run integration tests
+pytest -m integration
+```
+
+### Code Quality Tools
+
+The project uses:
+
+- **Black** for code formatting
+- **mypy** for type checking
+- **flake8** for linting
+- **pytest** for testing
+
+```bash
+# Format code
+black src/ tests/
+
+# Type checking
+mypy src/easy_whisperx/
+
+# Linting
+flake8 src/easy_whisperx/
+```
+
+## Core Components
+
+The package is built with a modular architecture:
+
+- **`Transcriber`** - Main transcription using WhisperX models
+- **`Aligner`** - Word-level timestamp alignment
+- **`Diarizer`** - Speaker identification and assignment
+- **`BulkExecutor`** - Bulk processing with performance tracking
+- **`PerformanceTracker`** - Performance metrics collection
+- **`BaseWhisperxModel`** - Abstract base for model management
+- **Utility functions** - Audio loading and device configuration
+
+## Performance and Memory Management
+
+The package includes automatic resource management:
+
+- **Context Managers**: All models automatically clean up GPU memory
+- **Performance Tracking**: Built-in metrics for all operations
+- **Memory Optimization**: Automatic garbage collection and CUDA cache clearing
+- **Error Handling**: Graceful failure handling with detailed logging
+
+## API Reference
+
+### Device Configuration
+
+```python
+from easy_whisperx.utils import _determine_device_config
+
+# Automatic device selection
+device, compute_type = _determine_device_config("auto", "auto")
+# Returns ("cuda", "float16") if GPU available, ("cpu", "int8") otherwise
+```
+
+### Performance Tracking
+
+```python
+from easy_whisperx import PerformanceTracker
+
+with PerformanceTracker("my_operation") as tracker:
+    # Your code here
+    tracker["custom_metric"] = "value"
+
+metrics = tracker.to_dict()
+print(f"Operation took {metrics['my_operation']['duration_seconds']} seconds")
+```
+
+## Contributing
+
+1. Fork the repository
+2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+3. Make your changes with tests
+4. Ensure all tests pass (`pytest`)
+5. Check code quality (`black src/ tests/` and `mypy src/`)
+6. Submit a pull request
+
+## License
+
+This project is licensed under the MIT License - see the LICENSE file for details.