audio-scribe 0.1.0__tar.gz

@@ -0,0 +1,273 @@
1
+ Metadata-Version: 2.2
2
+ Name: audio_scribe
3
+ Version: 0.1.0
4
+ Summary: A command-line tool for audio transcription with Whisper and Pyannote.
5
+ Home-page: https://gitlab.genomicops.cloud/genomicops/audio-scribe
6
+ Author: Gurasis Osahan
7
+ Author-email: contact@genomicops.com
8
+ License: Apache-2.0
9
+ Project-URL: Source, https://gitlab.genomicops.cloud/genomicops/audio-scribe
10
+ Project-URL: Tracker, https://gitlab.genomicops.cloud/genomicops/audio-scribe/-/issues
11
+ Keywords: whisper pyannote transcription audio diarization
12
+ Classifier: Development Status :: 3 - Alpha
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: Intended Audience :: Science/Research
15
+ Classifier: Topic :: Multimedia :: Sound/Audio
16
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
17
+ Classifier: License :: OSI Approved :: Apache Software License
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3.8
20
+ Classifier: Programming Language :: Python :: 3.9
21
+ Classifier: Programming Language :: Python :: 3.10
22
+ Classifier: Operating System :: OS Independent
23
+ Requires-Python: >=3.8
24
+ Description-Content-Type: text/markdown
25
+ Requires-Dist: torch
26
+ Requires-Dist: openai-whisper
27
+ Requires-Dist: pyannote.audio
28
+ Requires-Dist: pytorch-lightning
29
+ Requires-Dist: keyring
30
+ Requires-Dist: cryptography
31
+ Requires-Dist: alive-progress
32
+ Requires-Dist: psutil
33
+ Requires-Dist: GPUtil
34
+ Dynamic: author
35
+ Dynamic: author-email
36
+ Dynamic: classifier
37
+ Dynamic: description
38
+ Dynamic: description-content-type
39
+ Dynamic: home-page
40
+ Dynamic: keywords
41
+ Dynamic: license
42
+ Dynamic: project-url
43
+ Dynamic: requires-dist
44
+ Dynamic: requires-python
45
+ Dynamic: summary
46
+
47
+ # Audio Scribe
48
+
49
+ **A Command-Line Tool for Audio Transcription (Audio Scribe) and Speaker Diarization Using OpenAI Whisper and Pyannote**
50
+
51
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
52
+
53
+ ## Overview
54
+
55
+ **Audio Scribe** is a command-line tool that transcribes audio files with speaker diarization. Leveraging [OpenAI Whisper](https://github.com/openai/whisper) for transcription and [Pyannote Audio](https://github.com/pyannote/pyannote-audio) for speaker diarization, this solution converts audio into segmented text files, identifying each speaker turn. Key features include:
56
+
57
+ - **Progress Bar & Resource Monitoring**: See real-time CPU, memory, and GPU usage with a live progress bar.
58
+ - **Speaker Diarization**: Automatically separates speaker turns using Pyannote’s state-of-the-art models.
59
+ - **Tab-Completion for File Paths**: Easily navigate your file system when prompted for the audio path.
60
+ - **Secure Token Storage**: Encrypts and stores your Hugging Face token, which is needed to download the gated Pyannote models.
61
+ - **Customizable Whisper Models**: Defaults to `base.en`; you can also specify `tiny`, `small`, `medium`, `large`, etc.
62
+
63
+ This repository is licensed under the [Apache License 2.0](#license).
64
+
65
+ ---
66
+
67
+ ## Table of Contents
68
+
69
+ - [Audio Scribe](#audio-scribe)
70
+ - [Overview](#overview)
71
+ - [Table of Contents](#table-of-contents)
72
+ - [Features](#features)
73
+ - [Installation](#installation)
74
+ - [Installing from PyPI](#installing-from-pypi)
75
+ - [Installing from GitHub](#installing-from-github)
76
+ - [Quick Start](#quick-start)
77
+ - [Usage](#usage)
78
+ - [Dependencies](#dependencies)
79
+ - [Sample `requirements.txt`](#sample-requirementstxt)
80
+ - [Contributing](#contributing)
81
+ - [License](#license)
82
+
83
+ ---
84
+
85
+ ## Features
86
+
87
+ - **Whisper Transcription**
88
+ Utilizes [OpenAI Whisper](https://github.com/openai/whisper) to convert speech to text in multiple languages.
89
+ - **Pyannote Speaker Diarization**
90
+ Identifies different speakers and segments your audio output accordingly.
91
+ - **Progress Bar & Resource Usage**
92
+ Displays a live progress bar with CPU, memory, and GPU stats through [alive-progress](https://github.com/rsalmei/alive-progress), [psutil](https://pypi.org/project/psutil/), and [GPUtil](https://pypi.org/project/GPUtil/).
93
+ - **Tab-Completion**
94
+ Press **Tab** to autocomplete file paths on Unix-like systems (and on Windows with [pyreadline3](https://pypi.org/project/pyreadline3/)).
95
+ - **Secure Token Storage**
96
+ Saves your Hugging Face token via [cryptography](https://pypi.org/project/cryptography/) for model downloads (e.g., `pyannote/speaker-diarization-3.1`).
97
+ - **Configurable Models**
98
+ The default is `base.en`, but you can specify any other Whisper model with `--whisper-model`.
99
+
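The Whisper and Pyannote features above produce two separate timelines: transcribed text segments and speaker turns. One common way to combine them, sketched here in plain Python, is to label each text segment with the speaker whose turn overlaps it the most. This is an illustration under stated assumptions, not Audio Scribe's actual code: the `Turn` and `Segment` classes, `label_segments`, the output line format, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """A diarized speaker turn (hypothetical structure)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """A transcribed text segment (hypothetical structure)."""
    start: float
    end: float
    text: str


def overlap(seg: Segment, turn: Turn) -> float:
    """Return how many seconds the segment and the turn share."""
    return max(0.0, min(seg.end, turn.end) - max(seg.start, turn.start))


def label_segments(segments: List[Segment], turns: List[Turn]) -> List[str]:
    """Attach to each segment the speaker whose turn overlaps it the most."""
    lines = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        # Fall back to a placeholder when no turn overlaps the segment at all.
        if best is None or overlap(seg, best) == 0.0:
            speaker = "UNKNOWN"
        else:
            speaker = best.speaker
        lines.append(f"[{seg.start:.1f}s - {seg.end:.1f}s] {speaker}: {seg.text.strip()}")
    return lines


if __name__ == "__main__":
    # Made-up values for illustration only.
    turns = [Turn(0.0, 5.0, "SPEAKER_00"), Turn(5.0, 9.0, "SPEAKER_01")]
    segments = [Segment(0.5, 4.0, " Hello there."), Segment(4.5, 8.0, " Hi.")]
    print("\n".join(label_segments(segments, turns)))
```

Choosing the speaker by maximum overlap keeps a segment that straddles a turn boundary attached to the speaker who holds most of it; other tools instead transcribe each diarized turn separately.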
100
+ ---
101
+
102
+ ## Installation
103
+
104
+ ### Installing from PyPI
105
+
106
+ **Audio Scribe** is available on PyPI. You can install it with:
107
+
108
+ ```bash
109
+ pip install audio-scribe
110
+ ```
111
+
112
+ After installation, the **`audio-scribe`** command should be available in your terminal (depending on how your PATH is configured). If you prefer to run it as a Python module, you can also use:
113
+
114
+ ```bash
115
+ python -m audio_scribe --audio path/to/yourfile.wav
116
+ ```
117
+
118
+ ### Installing from GitHub
119
+
120
+ To install the latest development version, clone the repository from the project's GitLab instance:
121
+
122
+ ```bash
123
+ git clone https://gitlab.genomicops.cloud/genomicops/audio-scribe.git
124
+ cd audio-scribe
125
+ pip install -r requirements.txt
126
+ ```
127
+
128
+ This approach is particularly useful if you want the newest changes or plan to contribute.
129
+
130
+ ---
131
+
132
+ ## Quick Start
133
+
134
+ 1. **Obtain a Hugging Face Token**
135
+ - Create a token at [Hugging Face Settings](https://huggingface.co/settings/tokens).
136
+ - Accept the model conditions for `pyannote/segmentation-3.0` and `pyannote/speaker-diarization-3.1`.
137
+
138
+ 2. **Run the Command-Line Tool**
139
+ ```bash
140
+ audio-scribe --audio path/to/audio.wav
141
+ ```
142
+ > On the first run, you’ll be prompted for your Hugging Face token if you haven’t stored one yet.
143
+
144
+ 3. **Watch the Progress Bar**
145
+ - The tool displays a progress bar for each diarized speaker turn, along with real-time CPU, GPU, and memory usage.
146
+
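The CPU, memory, and GPU figures shown next to the progress bar can be gathered with `psutil` and `GPUtil`. The following is a minimal sketch of that idea, not the tool's own implementation: `resource_snapshot` and `format_stats` are hypothetical helper names, and both libraries are imported defensively so the snippet still runs where they are not installed.

```python
try:
    import psutil  # third-party; listed in the package's Requires-Dist
except ImportError:  # keep the sketch usable without it
    psutil = None

try:
    import GPUtil  # third-party; reads NVIDIA GPU stats via nvidia-smi
except ImportError:
    GPUtil = None


def resource_snapshot() -> dict:
    """Collect current usage percentages; a value is None when unavailable."""
    stats = {"cpu": None, "mem": None, "gpu": None}
    if psutil is not None:
        # interval=None compares against the previous call, so the very
        # first reading is not meaningful.
        stats["cpu"] = psutil.cpu_percent(interval=None)
        stats["mem"] = psutil.virtual_memory().percent
    if GPUtil is not None:
        try:
            gpus = GPUtil.getGPUs()
        except Exception:
            gpus = []
        if gpus:
            stats["gpu"] = gpus[0].load * 100.0  # load is a 0..1 fraction
    return stats


def format_stats(stats: dict) -> str:
    """Render a snapshot as a short status string for a progress bar."""
    parts = []
    for name in ("cpu", "mem", "gpu"):
        value = stats.get(name)
        shown = "n/a" if value is None else f"{value:.0f}%"
        parts.append(f"{name.upper()}: {shown}")
    return " | ".join(parts)


if __name__ == "__main__":
    print(format_stats(resource_snapshot()))
```

A string like this could be refreshed once per processed speaker turn and shown alongside the bar.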
147
+ ---
148
+
149
+ ## Usage
150
+
151
+ Below is a summary of the main command-line options:
152
+
153
+ ```
154
+ usage: audio-scribe [options]
155
+
156
+ Audio Transcription (Audio Scribe) Pipeline using Whisper + Pyannote, with optional progress bar.
157
+
158
+ optional arguments:
159
+ --audio PATH Path to the audio file to transcribe.
160
+ --token TOKEN HuggingFace API token. Overrides any saved token.
161
+ --output PATH Path to the output directory for transcripts and temporary files.
162
+ --delete-token Delete any stored Hugging Face token and exit.
163
+ --show-warnings Enable user warnings (e.g., from pyannote.audio). Disabled by default.
164
+ --whisper-model MODEL Specify the Whisper model to use (default: 'base.en').
165
+ ```
166
+
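For readers who want to see how options like these are typically declared, here is a rough `argparse` sketch that mirrors the help text above. It is an assumption about structure, not a copy of the project's parser: `build_parser` is a hypothetical name, and the real tool may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the options listed in the usage summary (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="audio-scribe",
        description="Audio Transcription (Audio Scribe) Pipeline using "
                    "Whisper + Pyannote, with optional progress bar.",
    )
    parser.add_argument("--audio", metavar="PATH",
                        help="Path to the audio file to transcribe.")
    parser.add_argument("--token", metavar="TOKEN",
                        help="HuggingFace API token. Overrides any saved token.")
    parser.add_argument("--output", metavar="PATH",
                        help="Path to the output directory for transcripts and temporary files.")
    # Flags without a value become booleans that default to False.
    parser.add_argument("--delete-token", action="store_true",
                        help="Delete any stored Hugging Face token and exit.")
    parser.add_argument("--show-warnings", action="store_true",
                        help="Enable user warnings (e.g., from pyannote.audio). Disabled by default.")
    parser.add_argument("--whisper-model", metavar="MODEL", default="base.en",
                        help="Specify the Whisper model to use (default: 'base.en').")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--audio", "webinar.mp3", "--whisper-model", "small"])
    print(args.audio, args.whisper_model, args.delete_token)
```

Because `--audio` is optional in this sketch, a caller can detect its absence and fall back to an interactive prompt.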
167
+ **Examples:**
168
+
169
+ - **Basic Transcription**
170
+ ```bash
171
+ audio-scribe --audio meeting.wav
172
+ ```
173
+
174
+ - **Specify a Different Whisper Model**
175
+ ```bash
176
+ audio-scribe --audio webinar.mp3 --whisper-model small
177
+ ```
178
+
179
+ - **Delete a Stored Token**
180
+ ```bash
181
+ audio-scribe --delete-token
182
+ ```
183
+
184
+ - **Show Internal Warnings**
185
+ ```bash
186
+ audio-scribe --audio session.wav --show-warnings
187
+ ```
188
+
189
+ - **Tab-Completion**
190
+ ```bash
191
+ audio-scribe
192
+ # When prompted for an audio file path, press Tab to autocomplete
193
+ ```
194
+
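Tab-completion for an interactive path prompt is usually built on the `readline` module. The sketch below shows one way to do it, assuming a `readline`-compatible module is importable (standard on Unix-like systems, provided by `pyreadline3` on Windows). `complete_path` and `enable_tab_completion` are hypothetical names; Audio Scribe's own completer may work differently.

```python
import glob
import os


def complete_path(text: str, state: int):
    """readline-style completer: return the state-th path matching the prefix."""
    expanded = os.path.expanduser(text)
    # Escape glob metacharacters in what the user typed, then match any suffix.
    matches = sorted(glob.glob(glob.escape(expanded) + "*"))
    # Append a separator to directories so the next Tab descends into them.
    matches = [m + os.sep if os.path.isdir(m) else m for m in matches]
    return matches[state] if state < len(matches) else None


def enable_tab_completion() -> bool:
    """Register the completer; return False when readline is unavailable."""
    try:
        import readline
    except ImportError:
        return False
    # Treat only whitespace as a delimiter so "/" and "." stay part of the path.
    readline.set_completer_delims(" \t\n")
    readline.set_completer(complete_path)
    # GNU readline syntax; libedit-based builds may need a different binding.
    readline.parse_and_bind("tab: complete")
    return True


if __name__ == "__main__":
    enable_tab_completion()
    audio_path = input("Audio file path: ")
    print("Selected:", audio_path)
```

`readline` calls the completer repeatedly with `state` set to 0, 1, 2, and so on until it returns `None`, which is why the function indexes into the sorted match list.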
195
+ ---
196
+
197
+ ## Dependencies
198
+
199
+ **Core Libraries**
200
+ - **Python 3.8+**
201
+ - [PyTorch](https://pytorch.org/)
202
+ - [openai-whisper](https://github.com/openai/whisper)
203
+ - [pyannote.audio](https://github.com/pyannote/pyannote-audio)
204
+ - [pytorch-lightning](https://pypi.org/project/pytorch-lightning/)
205
+ - [cryptography](https://pypi.org/project/cryptography/)
206
+ - [keyring](https://pypi.org/project/keyring/)
207
+
208
+ **Extended Functionality** (the first three are installed automatically as declared dependencies; `pyreadline3` is optional)
209
+ - [alive-progress](https://pypi.org/project/alive-progress/) – Real-time progress bar
210
+ - [psutil](https://pypi.org/project/psutil/) – CPU/memory usage
211
+ - [GPUtil](https://pypi.org/project/GPUtil/) – GPU usage
212
+ - [pyreadline3](https://pypi.org/project/pyreadline3/) (for Windows tab-completion)
213
+
214
+ ### Sample `requirements.txt`
215
+
216
+ Below is a typical `requirements.txt` you can place in your repository:
217
+
218
+ ```
219
+ torch>=1.9
220
+ openai-whisper
221
+ pyannote.audio
222
+ pytorch-lightning
223
+ cryptography
224
+ keyring
225
+ alive-progress
226
+ psutil
227
+ GPUtil
228
+ pyreadline3; sys_platform == "win32"
229
+ ```
230
+
231
+ > Note:
232
+ > - `pyreadline3` carries a [PEP 508 environment marker](https://peps.python.org/pep-0508/) (`; sys_platform == "win32"`), so it is installed only on Windows.
233
+ > - For GPU support, install a PyTorch build that is compatible with your CUDA version.
234
+
235
+ ---
236
+
237
+ ## Contributing
238
+
239
+ We welcome contributions to **Audio Scribe**!
240
+
241
+ 1. **Fork** the repository and clone your fork.
242
+ 2. **Create a new branch** for your feature or bugfix.
243
+ 3. **Implement your changes**, ensuring code is well-documented and follows best practices.
244
+ 4. **Open a pull request**, detailing the changes you’ve made.
245
+
246
+ Please read any available guidelines or templates in our repository (such as `CONTRIBUTING.md` or `CODE_OF_CONDUCT.md`) before submitting.
247
+
248
+ ---
249
+
250
+ ## License
251
+
252
+ This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
253
+
254
+ ```
255
+ Copyright 2025 Gurasis Osahan
256
+
257
+ Licensed under the Apache License, Version 2.0 (the "License");
258
+ you may not use this file except in compliance with the License.
259
+ You may obtain a copy of the License at
260
+
261
+ http://www.apache.org/licenses/LICENSE-2.0
262
+
263
+ Unless required by applicable law or agreed to in writing, software
264
+ distributed under the License is distributed on an "AS IS" BASIS,
265
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
266
+ See the License for the specific language governing permissions and
267
+ limitations under the License.
268
+ ```
269
+
270
+ ---
271
+
272
+ **Thank you for using Audio Scribe!**
273
+ For questions or feedback, please open an [issue](https://gitlab.genomicops.cloud/genomicops/audio-scribe/issues) or contact the maintainers.
@@ -0,0 +1,227 @@
1
+ # Audio Scribe
2
+
3
+ **A Command-Line Tool for Audio Transcription (Audio Scribe) and Speaker Diarization Using OpenAI Whisper and Pyannote**
4
+
5
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
6
+
7
+ ## Overview
8
+
9
+ **Audio Scribe** is a command-line tool that transcribes audio files with speaker diarization. Leveraging [OpenAI Whisper](https://github.com/openai/whisper) for transcription and [Pyannote Audio](https://github.com/pyannote/pyannote-audio) for speaker diarization, this solution converts audio into segmented text files, identifying each speaker turn. Key features include:
10
+
11
+ - **Progress Bar & Resource Monitoring**: See real-time CPU, memory, and GPU usage with a live progress bar.
12
+ - **Speaker Diarization**: Automatically separates speaker turns using Pyannote’s state-of-the-art models.
13
+ - **Tab-Completion for File Paths**: Easily navigate your file system when prompted for the audio path.
14
+ - **Secure Token Storage**: Encrypts and stores your Hugging Face token, which is needed to download the gated Pyannote models.
15
+ - **Customizable Whisper Models**: Defaults to `base.en`; you can also specify `tiny`, `small`, `medium`, `large`, etc.
16
+
17
+ This repository is licensed under the [Apache License 2.0](#license).
18
+
19
+ ---
20
+
21
+ ## Table of Contents
22
+
23
+ - [Audio Scribe](#audio-scribe)
24
+ - [Overview](#overview)
25
+ - [Table of Contents](#table-of-contents)
26
+ - [Features](#features)
27
+ - [Installation](#installation)
28
+ - [Installing from PyPI](#installing-from-pypi)
29
+ - [Installing from GitHub](#installing-from-github)
30
+ - [Quick Start](#quick-start)
31
+ - [Usage](#usage)
32
+ - [Dependencies](#dependencies)
33
+ - [Sample `requirements.txt`](#sample-requirementstxt)
34
+ - [Contributing](#contributing)
35
+ - [License](#license)
36
+
37
+ ---
38
+
39
+ ## Features
40
+
41
+ - **Whisper Transcription**
42
+ Utilizes [OpenAI Whisper](https://github.com/openai/whisper) to convert speech to text in multiple languages.
43
+ - **Pyannote Speaker Diarization**
44
+ Identifies different speakers and segments your audio output accordingly.
45
+ - **Progress Bar & Resource Usage**
46
+ Displays a live progress bar with CPU, memory, and GPU stats through [alive-progress](https://github.com/rsalmei/alive-progress), [psutil](https://pypi.org/project/psutil/), and [GPUtil](https://pypi.org/project/GPUtil/).
47
+ - **Tab-Completion**
48
+ Press **Tab** to autocomplete file paths on Unix-like systems (and on Windows with [pyreadline3](https://pypi.org/project/pyreadline3/)).
49
+ - **Secure Token Storage**
50
+ Saves your Hugging Face token via [cryptography](https://pypi.org/project/cryptography/) for model downloads (e.g., `pyannote/speaker-diarization-3.1`).
51
+ - **Configurable Models**
52
+ The default is `base.en`, but you can specify any other Whisper model with `--whisper-model`.
53
+
54
+ ---
55
+
56
+ ## Installation
57
+
58
+ ### Installing from PyPI
59
+
60
+ **Audio Scribe** is available on PyPI. You can install it with:
61
+
62
+ ```bash
63
+ pip install audio-scribe
64
+ ```
65
+
66
+ After installation, the **`audio-scribe`** command should be available in your terminal (depending on how your PATH is configured). If you prefer to run it as a Python module, you can also use:
67
+
68
+ ```bash
69
+ python -m audio_scribe --audio path/to/yourfile.wav
70
+ ```
71
+
72
+ ### Installing from GitHub
73
+
74
+ To install the latest development version, clone the repository from the project's GitLab instance:
75
+
76
+ ```bash
77
+ git clone https://gitlab.genomicops.cloud/genomicops/audio-scribe.git
78
+ cd audio-scribe
79
+ pip install -r requirements.txt
80
+ ```
81
+
82
+ This approach is particularly useful if you want the newest changes or plan to contribute.
83
+
84
+ ---
85
+
86
+ ## Quick Start
87
+
88
+ 1. **Obtain a Hugging Face Token**
89
+ - Create a token at [Hugging Face Settings](https://huggingface.co/settings/tokens).
90
+ - Accept the model conditions for `pyannote/segmentation-3.0` and `pyannote/speaker-diarization-3.1`.
91
+
92
+ 2. **Run the Command-Line Tool**
93
+ ```bash
94
+ audio-scribe --audio path/to/audio.wav
95
+ ```
96
+ > On the first run, you’ll be prompted for your Hugging Face token if you haven’t stored one yet.
97
+
98
+ 3. **Watch the Progress Bar**
99
+ - The tool displays a progress bar for each diarized speaker turn, along with real-time CPU, GPU, and memory usage.
100
+
101
+ ---
102
+
103
+ ## Usage
104
+
105
+ Below is a summary of the main command-line options:
106
+
107
+ ```
108
+ usage: audio-scribe [options]
109
+
110
+ Audio Transcription (Audio Scribe) Pipeline using Whisper + Pyannote, with optional progress bar.
111
+
112
+ optional arguments:
113
+ --audio PATH Path to the audio file to transcribe.
114
+ --token TOKEN HuggingFace API token. Overrides any saved token.
115
+ --output PATH Path to the output directory for transcripts and temporary files.
116
+ --delete-token Delete any stored Hugging Face token and exit.
117
+ --show-warnings Enable user warnings (e.g., from pyannote.audio). Disabled by default.
118
+ --whisper-model MODEL Specify the Whisper model to use (default: 'base.en').
119
+ ```
120
+
121
+ **Examples:**
122
+
123
+ - **Basic Transcription**
124
+ ```bash
125
+ audio-scribe --audio meeting.wav
126
+ ```
127
+
128
+ - **Specify a Different Whisper Model**
129
+ ```bash
130
+ audio-scribe --audio webinar.mp3 --whisper-model small
131
+ ```
132
+
133
+ - **Delete a Stored Token**
134
+ ```bash
135
+ audio-scribe --delete-token
136
+ ```
137
+
138
+ - **Show Internal Warnings**
139
+ ```bash
140
+ audio-scribe --audio session.wav --show-warnings
141
+ ```
142
+
143
+ - **Tab-Completion**
144
+ ```bash
145
+ audio-scribe
146
+ # When prompted for an audio file path, press Tab to autocomplete
147
+ ```
148
+
149
+ ---
150
+
151
+ ## Dependencies
152
+
153
+ **Core Libraries**
154
+ - **Python 3.8+**
155
+ - [PyTorch](https://pytorch.org/)
156
+ - [openai-whisper](https://github.com/openai/whisper)
157
+ - [pyannote.audio](https://github.com/pyannote/pyannote-audio)
158
+ - [pytorch-lightning](https://pypi.org/project/pytorch-lightning/)
159
+ - [cryptography](https://pypi.org/project/cryptography/)
160
+ - [keyring](https://pypi.org/project/keyring/)
161
+
162
+ **Extended Functionality** (the first three are installed automatically as declared dependencies; `pyreadline3` is optional)
163
+ - [alive-progress](https://pypi.org/project/alive-progress/) – Real-time progress bar
164
+ - [psutil](https://pypi.org/project/psutil/) – CPU/memory usage
165
+ - [GPUtil](https://pypi.org/project/GPUtil/) – GPU usage
166
+ - [pyreadline3](https://pypi.org/project/pyreadline3/) (for Windows tab-completion)
167
+
168
+ ### Sample `requirements.txt`
169
+
170
+ Below is a typical `requirements.txt` you can place in your repository:
171
+
172
+ ```
173
+ torch>=1.9
174
+ openai-whisper
175
+ pyannote.audio
176
+ pytorch-lightning
177
+ cryptography
178
+ keyring
179
+ alive-progress
180
+ psutil
181
+ GPUtil
182
+ pyreadline3; sys_platform == "win32"
183
+ ```
184
+
185
+ > Note:
186
+ > - `pyreadline3` carries a [PEP 508 environment marker](https://peps.python.org/pep-0508/) (`; sys_platform == "win32"`), so it is installed only on Windows.
187
+ > - For GPU support, install a PyTorch build that is compatible with your CUDA version.
188
+
189
+ ---
190
+
191
+ ## Contributing
192
+
193
+ We welcome contributions to **Audio Scribe**!
194
+
195
+ 1. **Fork** the repository and clone your fork.
196
+ 2. **Create a new branch** for your feature or bugfix.
197
+ 3. **Implement your changes**, ensuring code is well-documented and follows best practices.
198
+ 4. **Open a pull request**, detailing the changes you’ve made.
199
+
200
+ Please read any available guidelines or templates in our repository (such as `CONTRIBUTING.md` or `CODE_OF_CONDUCT.md`) before submitting.
201
+
202
+ ---
203
+
204
+ ## License
205
+
206
+ This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
207
+
208
+ ```
209
+ Copyright 2025 Gurasis Osahan
210
+
211
+ Licensed under the Apache License, Version 2.0 (the "License");
212
+ you may not use this file except in compliance with the License.
213
+ You may obtain a copy of the License at
214
+
215
+ http://www.apache.org/licenses/LICENSE-2.0
216
+
217
+ Unless required by applicable law or agreed to in writing, software
218
+ distributed under the License is distributed on an "AS IS" BASIS,
219
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
220
+ See the License for the specific language governing permissions and
221
+ limitations under the License.
222
+ ```
223
+
224
+ ---
225
+
226
+ **Thank you for using Audio Scribe!**
227
+ For questions or feedback, please open an [issue](https://gitlab.genomicops.cloud/genomicops/audio-scribe/issues) or contact the maintainers.
File without changes