audio-scribe 0.1.0__tar.gz

@@ -0,0 +1,273 @@
1
+ Metadata-Version: 2.2
2
+ Name: audio_scribe
3
+ Version: 0.1.0
4
+ Summary: A command-line tool for audio transcription with Whisper and Pyannote.
5
+ Home-page: https://gitlab.genomicops.cloud/genomicops/audio-scribe
6
+ Author: Gurasis Osahan
7
+ Author-email: contact@genomicops.com
8
+ License: Apache-2.0
9
+ Project-URL: Source, https://gitlab.genomicops.cloud/genomicops/audio-scribe
10
+ Project-URL: Tracker, https://gitlab.genomicops.cloud/genomicops/audio-scribe/-/issues
11
+ Keywords: whisper pyannote transcription audio diarization
12
+ Classifier: Development Status :: 3 - Alpha
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: Intended Audience :: Science/Research
15
+ Classifier: Topic :: Multimedia :: Sound/Audio
16
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
17
+ Classifier: License :: OSI Approved :: Apache Software License
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3.8
20
+ Classifier: Programming Language :: Python :: 3.9
21
+ Classifier: Programming Language :: Python :: 3.10
22
+ Classifier: Operating System :: OS Independent
23
+ Requires-Python: >=3.8
24
+ Description-Content-Type: text/markdown
25
+ Requires-Dist: torch
26
+ Requires-Dist: openai-whisper
27
+ Requires-Dist: pyannote.audio
28
+ Requires-Dist: pytorch-lightning
29
+ Requires-Dist: keyring
30
+ Requires-Dist: cryptography
31
+ Requires-Dist: alive-progress
32
+ Requires-Dist: psutil
33
+ Requires-Dist: GPUtil
34
+ Dynamic: author
35
+ Dynamic: author-email
36
+ Dynamic: classifier
37
+ Dynamic: description
38
+ Dynamic: description-content-type
39
+ Dynamic: home-page
40
+ Dynamic: keywords
41
+ Dynamic: license
42
+ Dynamic: project-url
43
+ Dynamic: requires-dist
44
+ Dynamic: requires-python
45
+ Dynamic: summary
46
+
47
+ # Audio Scribe
48
+
49
+ **A Command-Line Tool for Audio Transcription (Audio Scribe) and Speaker Diarization Using OpenAI Whisper and Pyannote**
50
+
51
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
52
+
53
+ ## Overview
54
+
55
+ **Audio Scribe** is a command-line tool that transcribes audio files with speaker diarization. Leveraging [OpenAI Whisper](https://github.com/openai/whisper) for transcription and [Pyannote Audio](https://github.com/pyannote/pyannote-audio) for speaker diarization, this solution converts audio into segmented text files, identifying each speaker turn. Key features include:
56
+
57
+ - **Progress Bar & Resource Monitoring**: See real-time CPU, memory, and GPU usage with a live progress bar.
58
+ - **Speaker Diarization**: Automatically separates speaker turns using Pyannote’s state-of-the-art models.
59
+ - **Tab-Completion for File Paths**: Easily navigate your file system when prompted for the audio path.
60
+ - **Secure Token Storage**: Encrypts and stores your Hugging Face token for gated model downloads.
61
+ - **Customizable Whisper Models**: Defaults to `base.en`; you can also specify `tiny`, `small`, `medium`, `large`, etc.
62
+
63
+ This repository is licensed under the [Apache License 2.0](#license).
64
+
65
+ ---
66
+
67
+ ## Table of Contents
68
+
69
+ - [Audio Scribe](#audio-scribe)
70
+ - [Overview](#overview)
71
+ - [Table of Contents](#table-of-contents)
72
+ - [Features](#features)
73
+ - [Installation](#installation)
74
+ - [Installing from PyPI](#installing-from-pypi)
75
+ - [Installing from GitHub](#installing-from-github)
76
+ - [Quick Start](#quick-start)
77
+ - [Usage](#usage)
78
+ - [Dependencies](#dependencies)
79
+ - [Sample `requirements.txt`](#sample-requirementstxt)
80
+ - [Contributing](#contributing)
81
+ - [License](#license)
82
+
83
+ ---
84
+
85
+ ## Features
86
+
87
+ - **Whisper Transcription**
88
+ Utilizes [OpenAI Whisper](https://github.com/openai/whisper) to convert speech to text in multiple languages.
89
+ - **Pyannote Speaker Diarization**
90
+ Identifies different speakers and segments your audio output accordingly.
91
+ - **Progress Bar & Resource Usage**
92
+ Displays a live progress bar with CPU, memory, and GPU stats through [alive-progress](https://github.com/rsalmei/alive-progress), [psutil](https://pypi.org/project/psutil/), and [GPUtil](https://pypi.org/project/GPUtil/).
93
+ - **Tab-Completion**
94
+ Press **Tab** to autocomplete file paths on Unix-like systems (and on Windows with [pyreadline3](https://pypi.org/project/pyreadline3/)).
95
+ - **Secure Token Storage**
96
+ Saves your Hugging Face token via [cryptography](https://pypi.org/project/cryptography/) for model downloads (e.g., `pyannote/speaker-diarization-3.1`).
97
+ - **Configurable Models**
98
+ Default is `base.en` but you can specify any other Whisper model using `--whisper-model`.
99
+
100
+ ---
101
+
102
+ ## Installation
103
+
104
+ ### Installing from PyPI
105
+
106
+ **Audio Scribe** is available on PyPI. You can install it with:
107
+
108
+ ```bash
109
+ pip install audio-scribe
110
+ ```
111
+
112
+ After installation, the **`audio-scribe`** command should be available in your terminal, provided the directory where pip installs scripts is on your `PATH`. To run it as a Python module instead:
113
+
114
+ ```bash
115
+ python -m audio-scribe --audio path/to/yourfile.wav
116
+ ```
117
+
118
+ ### Installing from GitHub
119
+
120
+ To install the latest development version directly from the GitLab repository:
121
+
122
+ ```bash
123
+ git clone https://gitlab.genomicops.cloud/genomicops/audio-scribe.git
124
+ cd audio-scribe
125
+ pip install -r requirements.txt
126
+ ```
127
+
128
+ This approach is particularly useful if you want the newest changes or plan to contribute.
129
+
130
+ ---
131
+
132
+ ## Quick Start
133
+
134
+ 1. **Obtain a Hugging Face Token**
135
+ - Create a token at [Hugging Face Settings](https://huggingface.co/settings/tokens).
136
+ - Accept the model conditions for `pyannote/segmentation-3.0` and `pyannote/speaker-diarization-3.1`.
137
+
138
+ 2. **Run the Command-Line Tool**
139
+ ```bash
140
+ audio-scribe --audio path/to/audio.wav
141
+ ```
142
+ > On the first run, you’ll be prompted for your Hugging Face token if you haven’t stored one yet.
143
+
144
+ 3. **Watch the Progress Bar**
145
+ - The tool displays a progress bar for each diarized speaker turn, along with real-time CPU, GPU, and memory usage.
146
+
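As a rough illustration of what "one line per diarized speaker turn" can look like, the sketch below formats a list of turns into transcript lines. The `SpeakerTurn` class, the `format_transcript` function, and the line layout are assumptions made for this example; they are not Audio Scribe's actual data structures or output format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SpeakerTurn:
    """One diarized speaker turn (hypothetical structure for this sketch)."""
    start: float   # turn start, in seconds
    end: float     # turn end, in seconds
    speaker: str   # label such as "SPEAKER_00"
    text: str      # transcribed text for this turn


def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_transcript(turns: Iterable[SpeakerTurn]) -> List[str]:
    """Return one transcript line per non-empty turn, ordered by start time."""
    lines = []
    for turn in sorted(turns, key=lambda t: t.start):
        text = turn.text.strip()
        if not text:
            continue  # skip turns where nothing was transcribed
        span = f"{format_timestamp(turn.start)} - {format_timestamp(turn.end)}"
        lines.append(f"[{span}] {turn.speaker}: {text}")
    return lines
```

Sorting by start time keeps the output chronological even if turns arrive out of order, and empty turns are dropped so the transcript contains only spoken content.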
147
+ ---
148
+
149
+ ## Usage
150
+
151
+ Below is a summary of the main command-line options:
152
+
153
+ ```
154
+ usage: audio-scribe [options]
155
+
156
+ Audio Transcription (Audio Scribe) Pipeline using Whisper + Pyannote, with optional progress bar.
157
+
158
+ optional arguments:
159
+ --audio PATH Path to the audio file to transcribe.
160
+ --token TOKEN HuggingFace API token. Overrides any saved token.
161
+ --output PATH Path to the output directory for transcripts and temporary files.
162
+ --delete-token Delete any stored Hugging Face token and exit.
163
+ --show-warnings Enable user warnings (e.g., from pyannote.audio). Disabled by default.
164
+ --whisper-model MODEL Specify the Whisper model to use (default: 'base.en').
165
+ ```
166
+
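The options listed above map naturally onto Python's standard `argparse` module. The following is a minimal sketch of such a parser, assuming that `--delete-token` and `--show-warnings` are boolean flags and that every other option takes a single value; `build_parser` is a hypothetical name, and the project's real parser may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="audio-scribe",
        description=(
            "Audio Transcription (Audio Scribe) Pipeline using "
            "Whisper + Pyannote, with optional progress bar."
        ),
    )
    parser.add_argument("--audio", metavar="PATH",
                        help="Path to the audio file to transcribe.")
    parser.add_argument("--token", metavar="TOKEN",
                        help="HuggingFace API token. Overrides any saved token.")
    parser.add_argument("--output", metavar="PATH",
                        help="Path to the output directory for transcripts "
                             "and temporary files.")
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--delete-token", action="store_true",
                        help="Delete any stored Hugging Face token and exit.")
    parser.add_argument("--show-warnings", action="store_true",
                        help="Enable user warnings. Disabled by default.")
    parser.add_argument("--whisper-model", metavar="MODEL", default="base.en",
                        help="Specify the Whisper model to use "
                             "(default: 'base.en').")
    return parser
```

With this sketch, `build_parser().parse_args(["--audio", "meeting.wav"])` yields `audio="meeting.wav"` and `whisper_model="base.en"`; `argparse` converts the hyphenated option names to underscore attributes.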
167
+ **Examples:**
168
+
169
+ - **Basic Transcription**
170
+ ```bash
171
+ audio-scribe --audio meeting.wav
172
+ ```
173
+
174
+ - **Specify a Different Whisper Model**
175
+ ```bash
176
+ audio-scribe --audio webinar.mp3 --whisper-model small
177
+ ```
178
+
179
+ - **Delete a Stored Token**
180
+ ```bash
181
+ audio-scribe --delete-token
182
+ ```
183
+
184
+ - **Show Internal Warnings**
185
+ ```bash
186
+ audio-scribe --audio session.wav --show-warnings
187
+ ```
188
+
189
+ - **Tab-Completion**
190
+ ```bash
191
+ audio-scribe
192
+ # When prompted for an audio file path, press Tab to autocomplete
193
+ ```
194
+
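Path completion at an interactive prompt is typically built on Python's `readline` module (supplied on Windows by `pyreadline3`). The sketch below shows one way to do it; `complete_path` and `enable_tab_completion` are hypothetical names chosen for this example, and this is not necessarily how Audio Scribe implements the feature.

```python
import glob
import os


def complete_path(text: str, state: int):
    """readline-style completer: return the state-th path that starts with text."""
    pattern = os.path.expanduser(text) + "*"
    matches = sorted(glob.glob(pattern))
    # Append a separator to directories so the next Tab descends into them.
    matches = [m + os.sep if os.path.isdir(m) else m for m in matches]
    return matches[state] if state < len(matches) else None


def enable_tab_completion() -> bool:
    """Register the completer; return False when readline is unavailable."""
    try:
        import readline  # on Windows this module comes from pyreadline3
    except ImportError:
        return False
    # Treat only whitespace as a delimiter so "/" and "." stay part of the path.
    readline.set_completer_delims(" \t\n")
    readline.set_completer(complete_path)
    readline.parse_and_bind("tab: complete")
    return True
```

After `enable_tab_completion()` returns `True`, a subsequent `input("Audio file: ")` call completes paths on Tab. Paths containing glob metacharacters such as `[` or `*` are not escaped in this sketch.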
195
+ ---
196
+
197
+ ## Dependencies
198
+
199
+ **Core Libraries**
200
+ - **Python 3.8+**
201
+ - [PyTorch](https://pytorch.org/)
202
+ - [openai-whisper](https://github.com/openai/whisper)
203
+ - [pyannote.audio](https://github.com/pyannote/pyannote-audio)
204
+ - [pytorch-lightning](https://pypi.org/project/pytorch-lightning/)
205
+ - [cryptography](https://pypi.org/project/cryptography/)
206
+ - [keyring](https://pypi.org/project/keyring/)
207
+
208
+ **Optional for Extended Functionality**
209
+ - [alive-progress](https://pypi.org/project/alive-progress/) – Real-time progress bar
210
+ - [psutil](https://pypi.org/project/psutil/) – CPU/memory usage
211
+ - [GPUtil](https://pypi.org/project/GPUtil/) – GPU usage
212
+ - [pyreadline3](https://pypi.org/project/pyreadline3/) (for Windows tab-completion)
213
+
214
+ ### Sample `requirements.txt`
215
+
216
+ Below is a typical `requirements.txt` you can place in your repository:
217
+
218
+ ```
219
+ torch>=1.9
220
+ openai-whisper
221
+ pyannote.audio
222
+ pytorch-lightning
223
+ cryptography
224
+ keyring
225
+ alive-progress
226
+ psutil
227
+ GPUtil
228
+ pyreadline3; sys_platform == "win32"
229
+ ```
230
+
231
+ > Note:
232
+ > - `pyreadline3` is appended with a [PEP 508 marker](https://peps.python.org/pep-0508/) (`; sys_platform == "win32"`) so it only installs on Windows.
233
+ > - For GPU support, ensure you install a compatible PyTorch version with CUDA.
234
+
235
+ ---
236
+
237
+ ## Contributing
238
+
239
+ We welcome contributions to **Audio Scribe**!
240
+
241
+ 1. **Fork** the repository and clone your fork.
242
+ 2. **Create a new branch** for your feature or bugfix.
243
+ 3. **Implement your changes**, ensuring code is well-documented and follows best practices.
244
+ 4. **Open a pull request**, detailing the changes you’ve made.
245
+
246
+ Please read any available guidelines or templates in our repository (such as `CONTRIBUTING.md` or `CODE_OF_CONDUCT.md`) before submitting.
247
+
248
+ ---
249
+
250
+ ## License
251
+
252
+ This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
253
+
254
+ ```
255
+ Copyright 2025 Gurasis Osahan
256
+
257
+ Licensed under the Apache License, Version 2.0 (the "License");
258
+ you may not use this file except in compliance with the License.
259
+ You may obtain a copy of the License at
260
+
261
+ http://www.apache.org/licenses/LICENSE-2.0
262
+
263
+ Unless required by applicable law or agreed to in writing, software
264
+ distributed under the License is distributed on an "AS IS" BASIS,
265
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
266
+ See the License for the specific language governing permissions and
267
+ limitations under the License.
268
+ ```
269
+
270
+ ---
271
+
272
+ **Thank you for using Audio Scribe!**
273
+ For questions or feedback, please open an [issue](https://gitlab.genomicops.cloud/genomicops/audio-scribe/issues) or contact the maintainers.
@@ -0,0 +1,227 @@
1
+ # Audio Scribe
2
+
3
+ **A Command-Line Tool for Audio Transcription (Audio Scribe) and Speaker Diarization Using OpenAI Whisper and Pyannote**
4
+
5
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
6
+
7
+ ## Overview
8
+
9
+ **Audio Scribe** is a command-line tool that transcribes audio files with speaker diarization. Leveraging [OpenAI Whisper](https://github.com/openai/whisper) for transcription and [Pyannote Audio](https://github.com/pyannote/pyannote-audio) for speaker diarization, this solution converts audio into segmented text files, identifying each speaker turn. Key features include:
10
+
11
+ - **Progress Bar & Resource Monitoring**: See real-time CPU, memory, and GPU usage with a live progress bar.
12
+ - **Speaker Diarization**: Automatically separates speaker turns using Pyannote’s state-of-the-art models.
13
+ - **Tab-Completion for File Paths**: Easily navigate your file system when prompted for the audio path.
14
+ - **Secure Token Storage**: Encrypts and stores your Hugging Face token for gated model downloads.
15
+ - **Customizable Whisper Models**: Defaults to `base.en`; you can also specify `tiny`, `small`, `medium`, `large`, etc.
16
+
17
+ This repository is licensed under the [Apache License 2.0](#license).
18
+
19
+ ---
20
+
21
+ ## Table of Contents
22
+
23
+ - [Audio Scribe](#audio-scribe)
24
+ - [Overview](#overview)
25
+ - [Table of Contents](#table-of-contents)
26
+ - [Features](#features)
27
+ - [Installation](#installation)
28
+ - [Installing from PyPI](#installing-from-pypi)
29
+ - [Installing from GitHub](#installing-from-github)
30
+ - [Quick Start](#quick-start)
31
+ - [Usage](#usage)
32
+ - [Dependencies](#dependencies)
33
+ - [Sample `requirements.txt`](#sample-requirementstxt)
34
+ - [Contributing](#contributing)
35
+ - [License](#license)
36
+
37
+ ---
38
+
39
+ ## Features
40
+
41
+ - **Whisper Transcription**
42
+ Utilizes [OpenAI Whisper](https://github.com/openai/whisper) to convert speech to text in multiple languages.
43
+ - **Pyannote Speaker Diarization**
44
+ Identifies different speakers and segments your audio output accordingly.
45
+ - **Progress Bar & Resource Usage**
46
+ Displays a live progress bar with CPU, memory, and GPU stats through [alive-progress](https://github.com/rsalmei/alive-progress), [psutil](https://pypi.org/project/psutil/), and [GPUtil](https://pypi.org/project/GPUtil/).
47
+ - **Tab-Completion**
48
+ Press **Tab** to autocomplete file paths on Unix-like systems (and on Windows with [pyreadline3](https://pypi.org/project/pyreadline3/)).
49
+ - **Secure Token Storage**
50
+ Saves your Hugging Face token via [cryptography](https://pypi.org/project/cryptography/) for model downloads (e.g., `pyannote/speaker-diarization-3.1`).
51
+ - **Configurable Models**
52
+ Default is `base.en` but you can specify any other Whisper model using `--whisper-model`.
53
+
54
+ ---
55
+
56
+ ## Installation
57
+
58
+ ### Installing from PyPI
59
+
60
+ **Audio Scribe** is available on PyPI. You can install it with:
61
+
62
+ ```bash
63
+ pip install audio-scribe
64
+ ```
65
+
66
+ After installation, the **`audio-scribe`** command should be available in your terminal, provided the directory where pip installs scripts is on your `PATH`. To run it as a Python module instead:
67
+
68
+ ```bash
69
+ python -m audio-scribe --audio path/to/yourfile.wav
70
+ ```
71
+
72
+ ### Installing from GitHub
73
+
74
+ To install the latest development version directly from the GitLab repository:
75
+
76
+ ```bash
77
+ git clone https://gitlab.genomicops.cloud/genomicops/audio-scribe.git
78
+ cd audio-scribe
79
+ pip install -r requirements.txt
80
+ ```
81
+
82
+ This approach is particularly useful if you want the newest changes or plan to contribute.
83
+
84
+ ---
85
+
86
+ ## Quick Start
87
+
88
+ 1. **Obtain a Hugging Face Token**
89
+ - Create a token at [Hugging Face Settings](https://huggingface.co/settings/tokens).
90
+ - Accept the model conditions for `pyannote/segmentation-3.0` and `pyannote/speaker-diarization-3.1`.
91
+
92
+ 2. **Run the Command-Line Tool**
93
+ ```bash
94
+ audio-scribe --audio path/to/audio.wav
95
+ ```
96
+ > On the first run, you’ll be prompted for your Hugging Face token if you haven’t stored one yet.
97
+
98
+ 3. **Watch the Progress Bar**
99
+ - The tool displays a progress bar for each diarized speaker turn, along with real-time CPU, GPU, and memory usage.
100
+
101
+ ---
102
+
103
+ ## Usage
104
+
105
+ Below is a summary of the main command-line options:
106
+
107
+ ```
108
+ usage: audio-scribe [options]
109
+
110
+ Audio Transcription (Audio Scribe) Pipeline using Whisper + Pyannote, with optional progress bar.
111
+
112
+ optional arguments:
113
+ --audio PATH Path to the audio file to transcribe.
114
+ --token TOKEN HuggingFace API token. Overrides any saved token.
115
+ --output PATH Path to the output directory for transcripts and temporary files.
116
+ --delete-token Delete any stored Hugging Face token and exit.
117
+ --show-warnings Enable user warnings (e.g., from pyannote.audio). Disabled by default.
118
+ --whisper-model MODEL Specify the Whisper model to use (default: 'base.en').
119
+ ```
120
+
121
+ **Examples:**
122
+
123
+ - **Basic Transcription**
124
+ ```bash
125
+ audio-scribe --audio meeting.wav
126
+ ```
127
+
128
+ - **Specify a Different Whisper Model**
129
+ ```bash
130
+ audio-scribe --audio webinar.mp3 --whisper-model small
131
+ ```
132
+
133
+ - **Delete a Stored Token**
134
+ ```bash
135
+ audio-scribe --delete-token
136
+ ```
137
+
138
+ - **Show Internal Warnings**
139
+ ```bash
140
+ audio-scribe --audio session.wav --show-warnings
141
+ ```
142
+
143
+ - **Tab-Completion**
144
+ ```bash
145
+ audio-scribe
146
+ # When prompted for an audio file path, press Tab to autocomplete
147
+ ```
148
+
149
+ ---
150
+
151
+ ## Dependencies
152
+
153
+ **Core Libraries**
154
+ - **Python 3.8+**
155
+ - [PyTorch](https://pytorch.org/)
156
+ - [openai-whisper](https://github.com/openai/whisper)
157
+ - [pyannote.audio](https://github.com/pyannote/pyannote-audio)
158
+ - [pytorch-lightning](https://pypi.org/project/pytorch-lightning/)
159
+ - [cryptography](https://pypi.org/project/cryptography/)
160
+ - [keyring](https://pypi.org/project/keyring/)
161
+
162
+ **Optional for Extended Functionality**
163
+ - [alive-progress](https://pypi.org/project/alive-progress/) – Real-time progress bar
164
+ - [psutil](https://pypi.org/project/psutil/) – CPU/memory usage
165
+ - [GPUtil](https://pypi.org/project/GPUtil/) – GPU usage
166
+ - [pyreadline3](https://pypi.org/project/pyreadline3/) (for Windows tab-completion)
167
+
168
+ ### Sample `requirements.txt`
169
+
170
+ Below is a typical `requirements.txt` you can place in your repository:
171
+
172
+ ```
173
+ torch>=1.9
174
+ openai-whisper
175
+ pyannote.audio
176
+ pytorch-lightning
177
+ cryptography
178
+ keyring
179
+ alive-progress
180
+ psutil
181
+ GPUtil
182
+ pyreadline3; sys_platform == "win32"
183
+ ```
184
+
185
+ > Note:
186
+ > - `pyreadline3` is appended with a [PEP 508 marker](https://peps.python.org/pep-0508/) (`; sys_platform == "win32"`) so it only installs on Windows.
187
+ > - For GPU support, ensure you install a compatible PyTorch version with CUDA.
188
+
189
+ ---
190
+
191
+ ## Contributing
192
+
193
+ We welcome contributions to **Audio Scribe**!
194
+
195
+ 1. **Fork** the repository and clone your fork.
196
+ 2. **Create a new branch** for your feature or bugfix.
197
+ 3. **Implement your changes**, ensuring code is well-documented and follows best practices.
198
+ 4. **Open a pull request**, detailing the changes you’ve made.
199
+
200
+ Please read any available guidelines or templates in our repository (such as `CONTRIBUTING.md` or `CODE_OF_CONDUCT.md`) before submitting.
201
+
202
+ ---
203
+
204
+ ## License
205
+
206
+ This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
207
+
208
+ ```
209
+ Copyright 2025 Gurasis Osahan
210
+
211
+ Licensed under the Apache License, Version 2.0 (the "License");
212
+ you may not use this file except in compliance with the License.
213
+ You may obtain a copy of the License at
214
+
215
+ http://www.apache.org/licenses/LICENSE-2.0
216
+
217
+ Unless required by applicable law or agreed to in writing, software
218
+ distributed under the License is distributed on an "AS IS" BASIS,
219
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
220
+ See the License for the specific language governing permissions and
221
+ limitations under the License.
222
+ ```
223
+
224
+ ---
225
+
226
+ **Thank you for using Audio Scribe!**
227
+ For questions or feedback, please open an [issue](https://gitlab.genomicops.cloud/genomicops/audio-scribe/issues) or contact the maintainers.
File without changes