easy-whisperx 0.0.4__tar.gz

@@ -0,0 +1,323 @@
1
+ Metadata-Version: 2.4
2
+ Name: easy-whisperx
3
+ Version: 0.0.4
4
+ Summary: A Python package for easy transcription using WhisperX.
5
+ Author-email: Aryan Falahtpisheh <aryanfalahat@gmail.com>
6
+ License: BSD-2-Clause
7
+ Project-URL: Homepage, https://github.com/falahat/easy-whisperx
8
+ Project-URL: Repository, https://github.com/falahat/easy-whisperx.git
9
+ Project-URL: Issues, https://github.com/falahat/easy-whisperx/issues
10
+ Keywords: whisperx,transcription,audio,speech-to-text
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: License :: OSI Approved :: MIT License
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Requires-Python: <3.13,>=3.10
19
+ Description-Content-Type: text/markdown
20
+ Requires-Dist: numpy
21
+ Requires-Dist: torch
22
+ Requires-Dist: pydub
23
+ Requires-Dist: pandas
24
+ Requires-Dist: whisperx-typed
25
+ Requires-Dist: whisperx-stubs
26
+ Provides-Extra: dev
27
+ Requires-Dist: pytest>=6.0.0; extra == "dev"
28
+ Requires-Dist: pytest-cov>=2.0.0; extra == "dev"
29
+ Requires-Dist: mypy>=1.0.0; extra == "dev"
30
+ Requires-Dist: black>=22.0.0; extra == "dev"
31
+ Requires-Dist: flake8>=4.0.0; extra == "dev"
32
+ Requires-Dist: coverage; extra == "dev"
33
+ Provides-Extra: notebook
34
+ Requires-Dist: jupyter>=1.0.0; extra == "notebook"
35
+ Requires-Dist: pandas; extra == "notebook"
36
+ Requires-Dist: matplotlib; extra == "notebook"
37
+
38
+ # easy-whisperx
39
+
40
+ A streamlined Python wrapper around the [WhisperX](https://github.com/m-bain/whisperX) project, providing enhanced type safety, automatic resource management, and a simplified API for audio transcription with GPU acceleration, word-level alignment, and speaker diarization.
41
+
42
+ ## Acknowledgments
43
+
44
+ This project builds upon the outstanding work of the [WhisperX team](https://github.com/m-bain/whisperX), particularly:
45
+
46
+ - **Max Bain** and contributors for creating WhisperX
47
+ - The original [Whisper](https://github.com/openai/whisper) team at OpenAI
48
+ - The [faster-whisper](https://github.com/guillaumekln/faster-whisper) project for performance improvements
49
+
50
+ **What easy-whisperx adds:**
51
+
52
+ - **Type Safety**: Comprehensive type hints and mypy compatibility
53
+ - **Resource Management**: Automatic GPU memory cleanup using context managers
54
+ - **Performance Tracking**: Built-in metrics collection for all operations
55
+ - **Simplified API**: Cleaner interface with sensible defaults
56
+ - **Error Handling**: Robust error handling with detailed logging
57
+ - **Bulk Processing**: Efficient batch processing capabilities
58
+
59
+ All the core transcription, alignment, and diarization capabilities are provided by the underlying WhisperX library.
60
+
61
+ ## Python Version Requirements
62
+
63
+ **This package requires Python 3.10, 3.11, or 3.12.** Python 3.13+ is not supported due to dependency limitations with the WhisperX library.
64
+
65
+ ## Features
66
+
67
+ - **Audio Transcription**: WhisperX-powered speech-to-text conversion
68
+ - **Word-level Alignment**: Precise timestamp alignment for individual words
69
+ - **Speaker Diarization**: Automatic speaker identification and assignment
70
+ - **GPU Acceleration**: CUDA support for faster processing
71
+ - **Performance Tracking**: Built-in metrics collection for all operations
72
+ - **Bulk Processing**: Efficient batch processing with individual item tracking
73
+ - **Type Safety**: Comprehensive type hints throughout
74
+ - **Context Management**: Automatic resource cleanup and memory management
75
+
76
+ ## Installation
77
+
78
+ ### Standard Installation
79
+
80
+ ```bash
81
+ git clone https://github.com/falahat/easy-whisperx.git
82
+ cd easy-whisperx
83
+ pip install -e .
84
+ ```
85
+
86
+ ### Development Installation
87
+
88
+ ```bash
89
+ git clone https://github.com/falahat/easy-whisperx.git
90
+ cd easy-whisperx
91
+ pip install -e ".[dev]"
92
+ ```
93
+
94
+ ### Notebook Support
95
+
96
+ ```bash
97
+ pip install -e ".[notebook]"
98
+ ```
99
+
100
+ ## Prerequisites for GPU Transcription
101
+
102
+ 1. **NVIDIA GPU** with CUDA support
103
+ 2. **Hugging Face Token** (for speaker diarization models)
104
+ 3. **PyTorch with GPU support**
105
+
106
+ ```bash
107
+ # Install PyTorch with GPU support
108
+ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
109
+ ```
110
+
111
+ ## Setting up Transcription Environment
112
+
113
+ 1. **Get a Hugging Face Token**:
114
+ - Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
115
+ - Create a token with "read" permissions
116
+ - Accept user agreements for segmentation and diarization models
117
+
118
+ 2. **Set Environment Variable**:
119
+
120
+ ```powershell
121
+ # Windows PowerShell
122
+ $env:HF_TOKEN="your_token_here"
123
+ ```
124
+
125
+ ```bash
126
+ # Linux/macOS
127
+ export HF_TOKEN="your_token_here"
128
+ ```
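Before loading a diarization model, it can help to fail early when the token is missing. The following is a minimal sketch that assumes only the `HF_TOKEN` variable name from the steps above; `require_hf_token` is a hypothetical helper and is not part of easy-whisperx.

```python
import os


def require_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in env_var, or raise with a clear message.

    Hypothetical helper for illustration; not part of easy-whisperx.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it before running speaker diarization."
        )
    return token
```

Calling `require_hf_token()` at program start surfaces a missing token before any model download begins.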
129
+
130
+ ## Quick Start
131
+
132
+ ### Basic Transcription
133
+
134
+ ```python
135
+ from easy_whisperx import Transcriber
136
+
137
+ # Initialize transcriber
138
+ transcriber = Transcriber(
139
+ model_size="base",
140
+ device="cuda", # or "cpu"
141
+ compute_type="float16",
142
+ batch_size=16
143
+ )
144
+
145
+ # Transcribe audio file
146
+ with transcriber:
147
+ result = transcriber("path/to/audio.mp3")
148
+ print(result["text"])
149
+ ```
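Upstream WhisperX returns a `segments` list whose items carry `start`, `end`, and `text`. Assuming easy-whisperx passes that list through unchanged (an assumption, not confirmed by this README), the sketch below renders segments as timestamped lines. `format_segments` is a hypothetical helper, and the sample data is hand-written.

```python
from typing import Any, Dict, List


def format_segments(segments: List[Dict[str, Any]]) -> str:
    """Render segments as '[start - end] text' lines (hypothetical helper)."""
    lines = []
    for seg in segments:
        start = float(seg.get("start", 0.0))
        end = float(seg.get("end", 0.0))
        text = str(seg.get("text", "")).strip()
        lines.append(f"[{start:7.2f} - {end:7.2f}] {text}")
    return "\n".join(lines)


# Hand-written sample standing in for result["segments"].
sample = [
    {"start": 0.0, "end": 1.5, "text": " Hello there."},
    {"start": 1.5, "end": 3.25, "text": " General remarks."},
]
print(format_segments(sample))
```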
150
+
151
+ ### Complete Pipeline with Alignment and Diarization
152
+
153
+ ```python
154
+ import os
155
+ from easy_whisperx import Transcriber, Aligner, Diarizer
156
+
157
+ audio_path = "path/to/audio.mp3"
158
+ hf_token = os.getenv("HF_TOKEN")
159
+
160
+ # Step 1: Transcribe
161
+ with Transcriber("base", "cuda", "float16", 16) as transcriber:
162
+ transcript = transcriber(audio_path)
163
+
164
+ # Step 2: Align words
165
+ with Aligner("cuda", "en") as aligner:
166
+ aligned_transcript = aligner(transcript["segments"], audio_path)
167
+
168
+ # Step 3: Diarize speakers (optional)
169
+ if hf_token:
170
+ with Diarizer("cuda", hf_token) as diarizer:
171
+ final_transcript = diarizer(aligned_transcript, audio_path)
172
+ else:
173
+ final_transcript = aligned_transcript
174
+
175
+ print(final_transcript)
176
+ ```
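Upstream WhisperX labels diarized segments with a `speaker` key such as `SPEAKER_00`. Assuming `final_transcript` exposes a `segments` list in that form (an assumption about this wrapper's output), consecutive segments from the same speaker can be merged into readable turns. `merge_speaker_turns` is a hypothetical helper, not part of the package.

```python
from typing import Any, Dict, List, Tuple


def merge_speaker_turns(segments: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Collapse consecutive segments by the same speaker into one turn."""
    turns: List[Tuple[str, str]] = []
    for seg in segments:
        # Segments without a diarization label fall back to "UNKNOWN".
        speaker = seg.get("speaker", "UNKNOWN")
        text = str(seg.get("text", "")).strip()
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, f"{turns[-1][1]} {text}")
        else:
            turns.append((speaker, text))
    return turns


# Hand-written sample standing in for final_transcript["segments"].
sample_segments = [
    {"speaker": "SPEAKER_00", "text": " Hi."},
    {"speaker": "SPEAKER_00", "text": " How are you?"},
    {"speaker": "SPEAKER_01", "text": " Fine."},
]
for speaker, text in merge_speaker_turns(sample_segments):
    print(f"{speaker}: {text}")
```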
177
+
178
+ ### Bulk Processing
179
+
180
+ ```python
181
+ from easy_whisperx import BulkExecutor, Transcriber
182
+
183
+ audio_files = ["file1.mp3", "file2.mp3", "file3.mp3"]
184
+
185
+ with Transcriber("base", "cuda", "float16", 16) as transcriber:
186
+ with BulkExecutor(transcriber) as executor:
187
+ def transcribe_file(model, audio_path, tracker):
188
+ result = model(audio_path)
189
+ tracker["segments_count"] = len(result.get("segments", []))
190
+
191
+ executor.for_each(audio_files, transcribe_file)
192
+ metrics = executor.get_metrics()
193
+ print(f"Bulk processing metrics: {metrics}")
194
+ ```
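The `audio_files` list above is hard-coded. One way to build it from a directory is sketched below using only the standard library; `find_audio_files` and its default extension list are illustrative assumptions, not part of easy-whisperx.

```python
from pathlib import Path
from typing import Iterable, List


def find_audio_files(
    root: str, extensions: Iterable[str] = (".mp3", ".wav", ".flac")
) -> List[str]:
    """Return sorted paths under root whose suffix matches, case-insensitively."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

The returned list can be passed to `executor.for_each` in place of the literal list.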
195
+
196
+ ## WhisperX Integration
197
+
198
+ This package uses WhisperX as its core transcription engine. We maintain a fork at [falahat/whisperx](https://github.com/falahat/whisperx) that may include specific optimizations or compatibility fixes, but all credit for the underlying transcription technology goes to the original [WhisperX project](https://github.com/m-bain/whisperX).
199
+
200
+ The original WhisperX provides:
201
+
202
+ - Fast automatic speech recognition with word-level timestamps
203
+ - Speaker diarization capabilities
204
+ - Multiple language support
205
+ - GPU acceleration with optimized inference
206
+
207
+ Our wrapper adds a resource management and type safety layer on top of this excellent foundation.
208
+
209
+ ## Development
210
+
211
+ ### Setting up Development Environment
212
+
213
+ ```bash
214
+ git clone https://github.com/falahat/easy-whisperx.git
215
+ cd easy-whisperx
216
+
217
+ # Create virtual environment (note the .venv name)
218
+ python -m venv .venv
219
+
220
+ # Activate virtual environment (run only the line for your shell)
221
+ # Windows PowerShell:
222
+ #   .\.venv\Scripts\Activate.ps1
223
+ # Linux/macOS:
224
+ source .venv/bin/activate
225
+
226
+ # Install in development mode
227
+ pip install -e ".[dev]"
228
+ ```
229
+
230
+ ### Running Tests
231
+
232
+ ```bash
233
+ # Run all tests
234
+ pytest
235
+
236
+ # Run with coverage report
237
+ pytest --cov=easy_whisperx --cov-report=html
238
+
239
+ # Run specific test file
240
+ pytest tests/test_transcriber.py -v
241
+
242
+ # Run integration tests
243
+ pytest -m integration
244
+ ```
245
+
246
+ ### Code Quality Tools
247
+
248
+ The project uses:
249
+
250
+ - **Black** for code formatting
251
+ - **mypy** for type checking
252
+ - **flake8** for linting
253
+ - **pytest** for testing
254
+
255
+ ```bash
256
+ # Format code
257
+ black src/ tests/
258
+
259
+ # Type checking
260
+ mypy src/easy_whisperx/
261
+
262
+ # Linting
263
+ flake8 src/easy_whisperx/
264
+ ```
265
+
266
+ ## Core Components
267
+
268
+ The package is built with a modular architecture:
269
+
270
+ - **`Transcriber`** - Main transcription using WhisperX models
271
+ - **`Aligner`** - Word-level timestamp alignment
272
+ - **`Diarizer`** - Speaker identification and assignment
273
+ - **`BulkExecutor`** - Bulk processing with performance tracking
274
+ - **`PerformanceTracker`** - Performance metrics collection
275
+ - **`BaseWhisperxModel`** - Abstract base for model management
276
+ - **Utility functions** - Audio loading and device configuration
277
+
278
+ ## Performance and Memory Management
279
+
280
+ The package includes automatic resource management:
281
+
282
+ - **Context Managers**: All models automatically clean up GPU memory
283
+ - **Performance Tracking**: Built-in metrics for all operations
284
+ - **Memory Optimization**: Automatic garbage collection and CUDA cache clearing
285
+ - **Error Handling**: Graceful failure handling with detailed logging
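The cleanup pattern described above can be sketched as a small context manager. This is an illustration of the general technique (garbage collection followed by `torch.cuda.empty_cache()`), not the package's actual implementation; `gpu_cleanup` is a hypothetical name.

```python
import gc
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def gpu_cleanup() -> Iterator[None]:
    """Run the body, then collect garbage and clear the CUDA cache if possible."""
    try:
        yield
    finally:
        gc.collect()
        try:
            import torch  # treated as optional in this sketch
        except ImportError:
            torch = None
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()
```

Because the cleanup runs in `finally`, it executes even when the body raises, and the exception still propagates to the caller.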
286
+
287
+ ## API Reference
288
+
289
+ ### Device Configuration
290
+
291
+ ```python
292
+ from easy_whisperx.utils import _determine_device_config
293
+
294
+ # Automatic device selection
295
+ device, compute_type = _determine_device_config("auto", "auto")
296
+ # Returns ("cuda", "float16") if GPU available, ("cpu", "int8") otherwise
297
+ ```
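The leading underscore in `_determine_device_config` suggests a private helper whose signature may change. A standalone sketch of the same fallback, based only on the return values stated in the comment above, might look like this; `pick_device_config` is a hypothetical name.

```python
from typing import Tuple


def pick_device_config() -> Tuple[str, str]:
    """Return ("cuda", "float16") when CUDA is usable, else ("cpu", "int8")."""
    try:
        import torch

        has_cuda = torch.cuda.is_available()
    except ImportError:
        # Without torch there is no GPU path, so fall back to CPU settings.
        has_cuda = False
    return ("cuda", "float16") if has_cuda else ("cpu", "int8")
```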
298
+
299
+ ### Performance Tracking
300
+
301
+ ```python
302
+ from easy_whisperx import PerformanceTracker
303
+
304
+ with PerformanceTracker("my_operation") as tracker:
305
+ # Your code here
306
+ tracker["custom_metric"] = "value"
307
+
308
+ metrics = tracker.to_dict()
309
+ print(f"Operation took {metrics['my_operation']['duration_seconds']} seconds")
310
+ ```
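To make the shape of `to_dict()` concrete, here is a minimal stand-in that mimics the usage shown above. It is an assumption-laden sketch, not the real `PerformanceTracker`: the class name `SimpleTracker` and the `succeeded` key are invented for illustration, and only `duration_seconds` and the name-keyed layout come from the example above.

```python
import time
from typing import Any, Dict


class SimpleTracker(dict):
    """Dict-like timer: stores custom metrics plus the elapsed wall time."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name
        self._start = 0.0

    def __enter__(self) -> "SimpleTracker":
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self["duration_seconds"] = time.perf_counter() - self._start
        self["succeeded"] = exc_type is None  # invented key for this sketch
        return False  # never suppress exceptions from the body

    def to_dict(self) -> Dict[str, Dict[str, Any]]:
        return {self.name: dict(self)}
```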
311
+
312
+ ## Contributing
313
+
314
+ 1. Fork the repository
315
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
316
+ 3. Make your changes with tests
317
+ 4. Ensure all tests pass (`pytest`)
318
+ 5. Check code quality (`black src/ tests/` and `mypy src/`)
319
+ 6. Submit a pull request
320
+
321
+ ## License
322
+
323
+ This project is licensed under the MIT License - see the LICENSE file for details.
@@ -0,0 +1,286 @@
1
+ # easy-whisperx
2
+
3
+ A streamlined Python wrapper around the [WhisperX](https://github.com/m-bain/whisperX) project, providing enhanced type safety, automatic resource management, and a simplified API for audio transcription with GPU acceleration, word-level alignment, and speaker diarization.
4
+
5
+ ## Acknowledgments
6
+
7
+ This project builds upon the outstanding work of the [WhisperX team](https://github.com/m-bain/whisperX), particularly:
8
+
9
+ - **Max Bain** and contributors for creating WhisperX
10
+ - The original [Whisper](https://github.com/openai/whisper) team at OpenAI
11
+ - The [faster-whisper](https://github.com/guillaumekln/faster-whisper) project for performance improvements
12
+
13
+ **What easy-whisperx adds:**
14
+
15
+ - **Type Safety**: Comprehensive type hints and mypy compatibility
16
+ - **Resource Management**: Automatic GPU memory cleanup using context managers
17
+ - **Performance Tracking**: Built-in metrics collection for all operations
18
+ - **Simplified API**: Cleaner interface with sensible defaults
19
+ - **Error Handling**: Robust error handling with detailed logging
20
+ - **Bulk Processing**: Efficient batch processing capabilities
21
+
22
+ All the core transcription, alignment, and diarization capabilities are provided by the underlying WhisperX library.
23
+
24
+ ## Python Version Requirements
25
+
26
+ **This package requires Python 3.10, 3.11, or 3.12.** Python 3.13+ is not supported due to dependency limitations with the WhisperX library.
27
+
28
+ ## Features
29
+
30
+ - **Audio Transcription**: WhisperX-powered speech-to-text conversion
31
+ - **Word-level Alignment**: Precise timestamp alignment for individual words
32
+ - **Speaker Diarization**: Automatic speaker identification and assignment
33
+ - **GPU Acceleration**: CUDA support for faster processing
34
+ - **Performance Tracking**: Built-in metrics collection for all operations
35
+ - **Bulk Processing**: Efficient batch processing with individual item tracking
36
+ - **Type Safety**: Comprehensive type hints throughout
37
+ - **Context Management**: Automatic resource cleanup and memory management
38
+
39
+ ## Installation
40
+
41
+ ### Standard Installation
42
+
43
+ ```bash
44
+ git clone https://github.com/falahat/easy-whisperx.git
45
+ cd easy-whisperx
46
+ pip install -e .
47
+ ```
48
+
49
+ ### Development Installation
50
+
51
+ ```bash
52
+ git clone https://github.com/falahat/easy-whisperx.git
53
+ cd easy-whisperx
54
+ pip install -e ".[dev]"
55
+ ```
56
+
57
+ ### Notebook Support
58
+
59
+ ```bash
60
+ pip install -e ".[notebook]"
61
+ ```
62
+
63
+ ## Prerequisites for GPU Transcription
64
+
65
+ 1. **NVIDIA GPU** with CUDA support
66
+ 2. **Hugging Face Token** (for speaker diarization models)
67
+ 3. **PyTorch with GPU support**
68
+
69
+ ```bash
70
+ # Install PyTorch with GPU support
71
+ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
72
+ ```
73
+
74
+ ## Setting up Transcription Environment
75
+
76
+ 1. **Get a Hugging Face Token**:
77
+ - Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
78
+ - Create a token with "read" permissions
79
+ - Accept user agreements for segmentation and diarization models
80
+
81
+ 2. **Set Environment Variable**:
82
+
83
+ ```powershell
84
+ # Windows PowerShell
85
+ $env:HF_TOKEN="your_token_here"
86
+ ```
87
+
88
+ ```bash
89
+ # Linux/macOS
90
+ export HF_TOKEN="your_token_here"
91
+ ```
92
+
93
+ ## Quick Start
94
+
95
+ ### Basic Transcription
96
+
97
+ ```python
98
+ from easy_whisperx import Transcriber
99
+
100
+ # Initialize transcriber
101
+ transcriber = Transcriber(
102
+ model_size="base",
103
+ device="cuda", # or "cpu"
104
+ compute_type="float16",
105
+ batch_size=16
106
+ )
107
+
108
+ # Transcribe audio file
109
+ with transcriber:
110
+ result = transcriber("path/to/audio.mp3")
111
+ print(result["text"])
112
+ ```
113
+
114
+ ### Complete Pipeline with Alignment and Diarization
115
+
116
+ ```python
117
+ import os
118
+ from easy_whisperx import Transcriber, Aligner, Diarizer
119
+
120
+ audio_path = "path/to/audio.mp3"
121
+ hf_token = os.getenv("HF_TOKEN")
122
+
123
+ # Step 1: Transcribe
124
+ with Transcriber("base", "cuda", "float16", 16) as transcriber:
125
+ transcript = transcriber(audio_path)
126
+
127
+ # Step 2: Align words
128
+ with Aligner("cuda", "en") as aligner:
129
+ aligned_transcript = aligner(transcript["segments"], audio_path)
130
+
131
+ # Step 3: Diarize speakers (optional)
132
+ if hf_token:
133
+ with Diarizer("cuda", hf_token) as diarizer:
134
+ final_transcript = diarizer(aligned_transcript, audio_path)
135
+ else:
136
+ final_transcript = aligned_transcript
137
+
138
+ print(final_transcript)
139
+ ```
140
+
141
+ ### Bulk Processing
142
+
143
+ ```python
144
+ from easy_whisperx import BulkExecutor, Transcriber
145
+
146
+ audio_files = ["file1.mp3", "file2.mp3", "file3.mp3"]
147
+
148
+ with Transcriber("base", "cuda", "float16", 16) as transcriber:
149
+ with BulkExecutor(transcriber) as executor:
150
+ def transcribe_file(model, audio_path, tracker):
151
+ result = model(audio_path)
152
+ tracker["segments_count"] = len(result.get("segments", []))
153
+
154
+ executor.for_each(audio_files, transcribe_file)
155
+ metrics = executor.get_metrics()
156
+ print(f"Bulk processing metrics: {metrics}")
157
+ ```
158
+
159
+ ## WhisperX Integration
160
+
161
+ This package uses WhisperX as its core transcription engine. We maintain a fork at [falahat/whisperx](https://github.com/falahat/whisperx) that may include specific optimizations or compatibility fixes, but all credit for the underlying transcription technology goes to the original [WhisperX project](https://github.com/m-bain/whisperX).
162
+
163
+ The original WhisperX provides:
164
+
165
+ - Fast automatic speech recognition with word-level timestamps
166
+ - Speaker diarization capabilities
167
+ - Multiple language support
168
+ - GPU acceleration with optimized inference
169
+
170
+ Our wrapper adds a resource management and type safety layer on top of this excellent foundation.
171
+
172
+ ## Development
173
+
174
+ ### Setting up Development Environment
175
+
176
+ ```bash
177
+ git clone https://github.com/falahat/easy-whisperx.git
178
+ cd easy-whisperx
179
+
180
+ # Create virtual environment (note the .venv name)
181
+ python -m venv .venv
182
+
183
+ # Activate virtual environment (run only the line for your shell)
184
+ # Windows PowerShell:
185
+ #   .\.venv\Scripts\Activate.ps1
186
+ # Linux/macOS:
187
+ source .venv/bin/activate
188
+
189
+ # Install in development mode
190
+ pip install -e ".[dev]"
191
+ ```
192
+
193
+ ### Running Tests
194
+
195
+ ```bash
196
+ # Run all tests
197
+ pytest
198
+
199
+ # Run with coverage report
200
+ pytest --cov=easy_whisperx --cov-report=html
201
+
202
+ # Run specific test file
203
+ pytest tests/test_transcriber.py -v
204
+
205
+ # Run integration tests
206
+ pytest -m integration
207
+ ```
208
+
209
+ ### Code Quality Tools
210
+
211
+ The project uses:
212
+
213
+ - **Black** for code formatting
214
+ - **mypy** for type checking
215
+ - **flake8** for linting
216
+ - **pytest** for testing
217
+
218
+ ```bash
219
+ # Format code
220
+ black src/ tests/
221
+
222
+ # Type checking
223
+ mypy src/easy_whisperx/
224
+
225
+ # Linting
226
+ flake8 src/easy_whisperx/
227
+ ```
228
+
229
+ ## Core Components
230
+
231
+ The package is built with a modular architecture:
232
+
233
+ - **`Transcriber`** - Main transcription using WhisperX models
234
+ - **`Aligner`** - Word-level timestamp alignment
235
+ - **`Diarizer`** - Speaker identification and assignment
236
+ - **`BulkExecutor`** - Bulk processing with performance tracking
237
+ - **`PerformanceTracker`** - Performance metrics collection
238
+ - **`BaseWhisperxModel`** - Abstract base for model management
239
+ - **Utility functions** - Audio loading and device configuration
240
+
241
+ ## Performance and Memory Management
242
+
243
+ The package includes automatic resource management:
244
+
245
+ - **Context Managers**: All models automatically clean up GPU memory
246
+ - **Performance Tracking**: Built-in metrics for all operations
247
+ - **Memory Optimization**: Automatic garbage collection and CUDA cache clearing
248
+ - **Error Handling**: Graceful failure handling with detailed logging
249
+
250
+ ## API Reference
251
+
252
+ ### Device Configuration
253
+
254
+ ```python
255
+ from easy_whisperx.utils import _determine_device_config
256
+
257
+ # Automatic device selection
258
+ device, compute_type = _determine_device_config("auto", "auto")
259
+ # Returns ("cuda", "float16") if GPU available, ("cpu", "int8") otherwise
260
+ ```
261
+
262
+ ### Performance Tracking
263
+
264
+ ```python
265
+ from easy_whisperx import PerformanceTracker
266
+
267
+ with PerformanceTracker("my_operation") as tracker:
268
+ # Your code here
269
+ tracker["custom_metric"] = "value"
270
+
271
+ metrics = tracker.to_dict()
272
+ print(f"Operation took {metrics['my_operation']['duration_seconds']} seconds")
273
+ ```
274
+
275
+ ## Contributing
276
+
277
+ 1. Fork the repository
278
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
279
+ 3. Make your changes with tests
280
+ 4. Ensure all tests pass (`pytest`)
281
+ 5. Check code quality (`black src/ tests/` and `mypy src/`)
282
+ 6. Submit a pull request
283
+
284
+ ## License
285
+
286
+ This project is licensed under the MIT License - see the LICENSE file for details.