scribe-cli 0.9.0__tar.gz → 0.10.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {scribe_cli-0.9.0/scribe_cli.egg-info → scribe_cli-0.10.0}/PKG-INFO +19 -13
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/README.md +13 -12
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/pyproject.toml +2 -1
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/_version.py +2 -2
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/app.py +27 -18
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/models.py +51 -7
- {scribe_cli-0.9.0 → scribe_cli-0.10.0/scribe_cli.egg-info}/PKG-INFO +19 -13
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/requires.txt +6 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/.github/workflows/pypi.yml +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/.gitignore +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/LICENSE +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/icon.xcf +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/__init__.py +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/audio.py +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/install_desktop.py +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/keyboard.py +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/models.toml +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/saverecording.py +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/testpynput.py +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe/util.py +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/SOURCES.txt +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/dependency_links.txt +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/entry_points.txt +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/top_level.txt +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_data/__init__.py +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_data/share/icon.png +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_data/share/icon_recording.png +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_data/share/icon_writing.png +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/scribe_data/templates/scribe.desktop +0 -0
- {scribe_cli-0.9.0 → scribe_cli-0.10.0}/setup.cfg +0 -0
--- scribe_cli-0.9.0/scribe_cli.egg-info/PKG-INFO
+++ scribe_cli-0.10.0/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.
+Version: 0.10.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
 Provides-Extra: app
 Requires-Dist: pystray; extra == "app"
 Requires-Dist: PyGObject; extra == "app"
+Provides-Extra: openai
+Requires-Dist: openai; extra == "openai"
+Requires-Dist: soundfile; extra == "openai"
 Provides-Extra: all
 Requires-Dist: pynput; extra == "all"
 Requires-Dist: openai-whisper; extra == "all"
+Requires-Dist: openai; extra == "all"
+Requires-Dist: soundfile; extra == "all"
 Requires-Dist: vosk; extra == "all"
 Requires-Dist: pystray; extra == "all"
 
@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"
 
 # Scribe <img src="scribe_data/share/icon.png" width=48px>
 
-`scribe` is a
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).
 
 ## Compatibility
 
@@ -101,12 +108,10 @@ cd scribe
 pip install -e .[all]
 ```
 
-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
 
+The language models for local backends `vosk` and `whisper` will download on-the-fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 
 ## Usage
 
@@ -115,7 +120,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
 You can interrupt the recording via Ctrl + C and start again or change model.
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 
-
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
 ```bash
-scribe --backend
+scribe --backend openaiapi --api YOURAPIKEY
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
@@ -190,7 +195,8 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 
 ```bash
@@ -213,8 +219,8 @@ scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard -
 ```
 This will install three separate apps:
 - `Super + scribe` : will launch the default version with terminal prompt
-- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
-- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
+- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
+- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).
 
 
 ## Fine tuning
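PKG-INFO now declares a new `openai` extra. Core metadata files use an email-header layout, so the `Requires-Dist` entries for each extra can be read back with the standard library. The sketch below makes a few assumptions. `requirements_by_extra` is a hypothetical helper and is not part of scribe-cli. The sample text is a trimmed copy of the new-side lines in the hunk above. Only markers of the simple form `extra == "name"` are recognised; arbitrary environment markers would need a proper marker parser.

```python
from email.parser import HeaderParser

# Trimmed copy of the new (0.10.0) PKG-INFO lines shown in the hunk above.
PKG_INFO = """\
Metadata-Version: 2.2
Name: scribe-cli
Version: 0.10.0
Provides-Extra: app
Requires-Dist: pystray; extra == "app"
Requires-Dist: PyGObject; extra == "app"
Provides-Extra: openai
Requires-Dist: openai; extra == "openai"
Requires-Dist: soundfile; extra == "openai"
Provides-Extra: all
Requires-Dist: pynput; extra == "all"
Requires-Dist: openai-whisper; extra == "all"
Requires-Dist: openai; extra == "all"
Requires-Dist: soundfile; extra == "all"
Requires-Dist: vosk; extra == "all"
Requires-Dist: pystray; extra == "all"
"""


def requirements_by_extra(text):
    """Group Requires-Dist entries by extra (hypothetical helper).

    Only markers of the exact form `extra == "name"` are recognised;
    entries with any other marker, or none, are skipped.
    """
    msg = HeaderParser().parsestr(text)
    # Start with every declared extra, in declaration order.
    groups = {extra: [] for extra in msg.get_all("Provides-Extra", [])}
    for entry in msg.get_all("Requires-Dist", []):
        requirement, _, marker = entry.partition(";")
        marker = marker.strip()
        if marker.startswith("extra =="):
            extra = marker.split("==", 1)[1].strip().strip("\"'")
            groups.setdefault(extra, []).append(requirement.strip())
    return groups


print(requirements_by_extra(PKG_INFO)["openai"])  # ['openai', 'soundfile']
```

The grouping shows that `all` now pulls in both `openai` and `soundfile`, matching the `openai` extra.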
--- scribe_cli-0.9.0/README.md
+++ scribe_cli-0.10.0/README.md
@@ -3,7 +3,9 @@
 
 # Scribe <img src="scribe_data/share/icon.png" width=48px>
 
-`scribe` is a
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).
 
 ## Compatibility
 
@@ -38,12 +40,10 @@ cd scribe
 pip install -e .[all]
 ```
 
-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
 
+The language models for local backends `vosk` and `whisper` will download on-the-fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 
 ## Usage
 
@@ -52,7 +52,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
 You can interrupt the recording via Ctrl + C and start again or change model.
@@ -66,9 +66,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 
-
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
 ```bash
-scribe --backend
+scribe --backend openaiapi --api YOURAPIKEY
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
@@ -127,7 +127,8 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 
 ```bash
@@ -150,8 +151,8 @@ scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard -
 ```
 This will install three separate apps:
 - `Super + scribe` : will launch the default version with terminal prompt
-- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
-- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
+- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
+- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).
 
 
 ## Fine tuning
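The README example passes `--api`, but the `get_parser()` hunk in `scribe/app.py` registers the option as `--api-key`. By default, Python's `argparse` accepts any unambiguous prefix of a long option (`allow_abbrev=True`). The shorter spelling should therefore still resolve, as long as no other scribe option also starts with `--api`. The trimmed parser below is a hypothetical stand-in that reuses only option names visible in this diff.

```python
import argparse

# Hypothetical, trimmed-down parser: only the option names are taken from the
# README and the get_parser() hunk; scribe's real parser defines many more.
parser = argparse.ArgumentParser(prog="scribe")
parser.add_argument("--backend")
parser.add_argument("--app", action="store_true")
group = parser.add_argument_group("whisper api")
group.add_argument("--api-key", help="API key for the Whisper API backend.")

# "--api" is an unambiguous prefix of "--api-key" here, so argparse expands it.
args = parser.parse_args(["--backend", "openaiapi", "--api", "YOURAPIKEY"])
print(args.backend, args.api_key)  # openaiapi YOURAPIKEY
```

Writing the full `--api-key` avoids relying on prefix matching, which would stop working if another option beginning with `--api` were added.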
--- scribe_cli-0.9.0/pyproject.toml
+++ scribe_cli-0.10.0/pyproject.toml
@@ -44,7 +44,8 @@ keyboard = ["pynput"]
 whisper = ["openai-whisper"]
 vosk = ["vosk"]
 app = ["pystray", "PyGObject"]
-
+openai = ["openai", "soundfile"]
+all = ["pynput", "openai-whisper", "openai", "soundfile", "vosk", "pystray"]
 
 
 [tool.setuptools]
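With the new `openai` extra, a user may have any subset of the three backends installed. `importlib.util.find_spec` offers a rough way to see which ones are present, because it locates a module without importing it. This is a sketch, not scribe's own `check_dependencies()`. The `BACKEND_MODULES` mapping and the `available_backends` helper are assumptions built from the extras above; note that the `openai-whisper` distribution is imported as `whisper`. A found spec also does not guarantee a clean import; for example, `soundfile` may additionally need the native libsndfile library.

```python
import importlib.util

# Assumed mapping from scribe backend names to the top-level modules each one
# needs, following the extras declared in pyproject.toml.
BACKEND_MODULES = {
    "whisper": ["whisper"],  # installed by the openai-whisper distribution
    "vosk": ["vosk"],
    "openaiapi": ["openai", "soundfile"],
}


def available_backends(modules=BACKEND_MODULES):
    """Return the backends whose modules can all be located (hypothetical helper)."""
    return [
        backend
        for backend, names in modules.items()
        # find_spec returns None when a top-level module cannot be found
        if all(importlib.util.find_spec(name) is not None for name in names)
    ]


print(available_backends())
```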
--- scribe_cli-0.9.0/scribe/app.py
+++ scribe_cli-0.10.0/scribe/app.py
@@ -4,8 +4,8 @@ import re
 import time
 import argparse
 from scribe.audio import Microphone
-from scribe.util import print_partial, clear_line, prompt_choices,
-from scribe.models import VoskTranscriber, WhisperTranscriber
+from scribe.util import print_partial, clear_line, prompt_choices, ansi_link, colored
+from scribe.models import VoskTranscriber, WhisperTranscriber, OpenaiAPITranscriber
 
 with open(Path(__file__).parent / "models.toml", "rb") as f:
 language_config_default = tomllib.load(f)
@@ -24,7 +24,7 @@ def get_default_backend():
 except ImportError:
 raise ImportError("Please install either vosk or whisper to use this script.")
 
-BACKENDS = ["whisper", "vosk"]
+BACKENDS = ["whisper", "vosk", "openaiapi"]
 UNAVAILABLE_BACKENDS = []
 
 
@@ -59,6 +59,7 @@ def get_transcriber(o, prompt=True):
 
 whisper_models = ["tiny", "base", "small", "medium", "large", "turbo"]
 whisper_english_models = ["tiny.en", "base.en", "small.en", "medium.en"]
+whisperapi_models = ["whisper-1"]
 
 if o.dummy:
 return DummyTranscriber("whisper", "dummy")
@@ -68,26 +69,17 @@ def get_transcriber(o, prompt=True):
 o.backend = "vosk"
 elif o.model in whisper_models + whisper_english_models:
 o.backend = "whisper"
+elif o.model in whisperapi_models:
+o.backend = "openaiapi"
 
 if o.backend:
-checked_backend = check_dependencies(o.backend)
-if not checked_backend:
-print(f"Backend {o.backend} is not available.")
-exit(1)
 backend = o.backend
 
 elif not prompt:
 backend = BACKENDS[0]
 
 else:
-
-while not checked_backend:
-backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)
-# raise an error if the user has explicitly selected a backend that is not available
-checked_backend = check_dependencies(backend, raise_error=backend==o.backend)
-if not checked_backend:
-print(f"Backend {o.backend} is not available.")
-UNAVAILABLE_BACKENDS.append(backend)
+backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)
 
 print(f"Selected backend: {backend}")
 
@@ -131,6 +123,13 @@ def get_transcriber(o, prompt=True):
 
 model = pick_specialist_model(model, o.language, backend)
 
+elif backend == "openaiapi":
+model = o.model or "whisper-1"
+
+else:
+raise ValueError(f"Unknown backend: {backend}")
+
+
 print(f"Selected model: {model}")
 
 if backend == "vosk":
@@ -152,6 +151,12 @@ def get_transcriber(o, prompt=True):
 restart_after_silence=o.restart_after_silence,
 model_kwargs={"download_root": o.download_folder_whisper})
 
+elif backend == "openaiapi":
+transcriber = OpenaiAPITranscriber(model_name=model, samplerate=o.samplerate,
+timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
+restart_after_silence=o.restart_after_silence, api_key=o.api_key)
+
+
 else:
 raise ValueError(f"Unknown backend: {backend}")
 
@@ -195,6 +200,10 @@ def get_parser():
 group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
 group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
 
+group = parser.add_argument_group("whisper api")
+group.add_argument("--api-key",
+help="API key for the Whisper API backend.")
+
 parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
 parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
 
@@ -206,11 +215,11 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
 
 if keyboard:
 from scribe.keyboard import type_text
-
+transcriber.log("Change focus to target app during transcription.")
 
 if clipboard:
 import pyperclip
-
+transcriber.log("The full transcription will be copied to clipboard as it becomes available.")
 
 fulltext = ""
 
@@ -310,7 +319,7 @@ def create_app(micro, transcriber, **kwargs):
 def callback_record(icon, item):
 # kwargs["callback"] = icon.update_menu # NOTE: the thread will finish AFTER the callback is complete
 if transcriber.busy:
-
+transcriber.log("Still busy recording or transcribing.")
 return
 
 if hasattr(icon, "_recording_thread") and icon._recording_thread.is_alive():
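In `get_transcriber()`, the new `elif` means that setting the model to `whisper-1` is enough to select the `openaiapi` backend. The same dispatch can be written as a small standalone function. `infer_backend` is a hypothetical name. The whisper lists are copied from the hunk. The vosk condition is not visible in the diff, so `VOSK_MODELS` and its single entry are placeholders; in scribe, the vosk names come from `models.toml`.

```python
# Lists copied from the get_transcriber() hunk.
WHISPER_MODELS = ["tiny", "base", "small", "medium", "large", "turbo"]
WHISPER_ENGLISH_MODELS = ["tiny.en", "base.en", "small.en", "medium.en"]
WHISPERAPI_MODELS = ["whisper-1"]
# Hypothetical placeholder: the real vosk model names live in scribe/models.toml.
VOSK_MODELS = ["vosk-model-small-en-us-0.15"]


def infer_backend(model, default=None):
    """Return the backend implied by a model name, or `default` if unknown."""
    if model in VOSK_MODELS:  # assumed condition; not shown in the diff
        return "vosk"
    if model in WHISPER_MODELS + WHISPER_ENGLISH_MODELS:
        return "whisper"
    if model in WHISPERAPI_MODELS:  # the branch added in 0.10.0
        return "openaiapi"
    return default


print(infer_backend("whisper-1"))  # openaiapi
print(infer_backend("small.en"))   # whisper
print(infer_backend("unknown"))    # None
```

`whisper-1` appears in none of the local whisper lists, so the order of the checks does not change the result for these names.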
--- scribe_cli-0.9.0/scribe/models.py
+++ scribe_cli-0.10.0/scribe/models.py
@@ -22,7 +22,7 @@ class StopRecording(Exception):
 class AbstractTranscriber:
 backend = None
 def __init__(self, model, model_name=None, language=None, samplerate=16000, timeout=None, model_kwargs={},
-silence_thresh=-40, silence_duration=2, restart_after_silence=False):
+silence_thresh=-40, silence_duration=2, restart_after_silence=False, logger=None):
 self.model_name = model_name
 self.language = language
 self.model = model
@@ -36,6 +36,11 @@ class AbstractTranscriber:
 self.busy = False
 self.waiting = False
 self.interrupt = False
+if logger is None:
+import logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger("scribe")
+self.logger = logger
 self.reset()
 
 def get_elapsed(self):
@@ -55,9 +60,18 @@ class AbstractTranscriber:
 self.audio_buffer = b''
 self.start_time = time.time()
 
+def log(self, text):
+if text.startswith("\n"):
+print("")
+text = text[1:]
+if self.logger:
+self.logger.info(text)
+else:
+print(f"[{text}]")
+
 def start_recording(self, microphone,
 start_message="Recording... Press Ctrl+C to stop.",
-stop_message="
+stop_message="Exit."):
 
 self.reset()
 self.interrupt = False
@@ -73,7 +87,7 @@ class AbstractTranscriber:
 try:
 
 with microphone.open_stream():
-
+self.log(start_message)
 
 while not self.interrupt:
 while not microphone.q.empty():
@@ -107,7 +121,7 @@ class AbstractTranscriber:
 
 else:
 if not previous_waiting:
-
+self.log("Silence detected...waiting for more audio")
 
 if self.is_overtime():
 raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
@@ -125,7 +139,7 @@ class AbstractTranscriber:
 self.busy = False
 yield result
 
-
+self.log(stop_message)
 
 
 def get_vosk_model(model, download_root=None, url=None):
@@ -200,7 +214,7 @@ class WhisperTranscriber(AbstractTranscriber):
 super().__init__(model, model_name, language, model_kwargs=model_kwargs, **kwargs)
 
 def transcribe_audio(self, audio_bytes):
-
+self.log("\nTranscribing")
 audio_array = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
 return self.model.transcribe(audio_array, fp16=False, language=self.language)
 
@@ -209,4 +223,34 @@ class WhisperTranscriber(AbstractTranscriber):
 return {"text": ""}
 result = self.transcribe_audio(self.audio_buffer)
 self.audio_buffer = b''
-return result
+return result
+
+
+class OpenaiAPITranscriber(WhisperTranscriber):
+backend = "openaiapi"
+
+def __init__(self, model_name="whisper-1", language=None, model_kwargs={}, model=None, api_key=None, **kwargs):
+if model is None:
+import openai
+model = openai.OpenAI(
+api_key=api_key or openai.api_key,
+# 20 seconds (default is 10 minutes)
+timeout=20.0,
+)
+AbstractTranscriber.__init__(self, model, model_name, language, model_kwargs=model_kwargs, **kwargs)
+
+def transcribe_audio(self, audio_bytes):
+self.log("\nTranscribing")
+import io
+import soundfile as sf
+audio_data = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
+# Write the audio data to an in-memory file in WAV format
+buffer = io.BytesIO()
+sf.write(buffer, audio_data, self.samplerate, format='WAV')
+buffer.seek(0)
+buffer.name = "audio.wav" # Set a filename with a valid extension
+transcription = self.model.audio.transcriptions.create(
+model=self.model_name,
+file=buffer,
+)
+return {"text": transcription.text}
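The new `OpenaiAPITranscriber.transcribe_audio` works in two steps. First, it scales the int16 microphone buffer to float32 in the range [-1.0, 1.0). Then it lets `soundfile` encode an in-memory WAV file, naming it `audio.wav` so the upload carries a recognisable extension. The sketch below reproduces both steps with the standard library only. This is a swapped-in technique: `array` and `wave` stand in for NumPy and `soundfile`. It assumes mono, 16-bit samples in the machine's native byte order, which is also what `np.frombuffer(..., dtype=np.int16)` assumes. Both function names are hypothetical.

```python
import io
import wave
from array import array


def pcm16_to_floats(audio_bytes):
    """Scale int16 samples to [-1.0, 1.0), like the NumPy expression in the diff.

    array("h") reads signed 16-bit integers in native byte order.
    """
    samples = array("h")
    samples.frombytes(audio_bytes)
    return [s / 32768.0 for s in samples]


def pcm16_to_wav_buffer(audio_bytes, samplerate=16000, channels=1):
    """Wrap raw int16 PCM bytes in an in-memory WAV file (mono by default)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = int16
        wav.setframerate(samplerate)
        wav.writeframes(audio_bytes)
    buffer.seek(0)
    # Same idea as the diff: give the buffer a name with a .wav extension.
    buffer.name = "audio.wav"
    return buffer


pcm = array("h", [0, 16384, -32768]).tobytes()
print(pcm16_to_floats(pcm))  # [0.0, 0.5, -1.0]
with wave.open(pcm16_to_wav_buffer(pcm), "rb") as wav:
    print(wav.getnframes(), wav.getframerate())  # 3 16000
```

Writing the int16 frames directly skips the float round trip. The `wave` writer leaves a caller-supplied file object open when it closes, which is why the buffer can be rewound and reused afterwards.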
--- scribe_cli-0.9.0/PKG-INFO
+++ scribe_cli-0.10.0/scribe_cli.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.
+Version: 0.10.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
 Provides-Extra: app
 Requires-Dist: pystray; extra == "app"
 Requires-Dist: PyGObject; extra == "app"
+Provides-Extra: openai
+Requires-Dist: openai; extra == "openai"
+Requires-Dist: soundfile; extra == "openai"
 Provides-Extra: all
 Requires-Dist: pynput; extra == "all"
 Requires-Dist: openai-whisper; extra == "all"
+Requires-Dist: openai; extra == "all"
+Requires-Dist: soundfile; extra == "all"
 Requires-Dist: vosk; extra == "all"
 Requires-Dist: pystray; extra == "all"
 
@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"
 
 # Scribe <img src="scribe_data/share/icon.png" width=48px>
 
-`scribe` is a
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).
 
 ## Compatibility
 
@@ -101,12 +108,10 @@ cd scribe
 pip install -e .[all]
 ```
 
-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
 
+The language models for local backends `vosk` and `whisper` will download on-the-fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 
 ## Usage
 
@@ -115,7 +120,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
 You can interrupt the recording via Ctrl + C and start again or change model.
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 
-
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
 ```bash
-scribe --backend
+scribe --backend openaiapi --api YOURAPIKEY
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
@@ -190,7 +195,8 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 
 ```bash
@@ -213,8 +219,8 @@ scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard -
 ```
 This will install three separate apps:
 - `Super + scribe` : will launch the default version with terminal prompt
-- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
-- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
+- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
+- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).
 
 
 ## Fine tuning