scribe-cli 0.7.9.tar.gz → 0.7.11.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {scribe_cli-0.7.9/scribe_cli.egg-info → scribe_cli-0.7.11}/PKG-INFO +13 -10
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/README.md +12 -9
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/_version.py +2 -2
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/app.py +56 -28
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/models.py +22 -3
- {scribe_cli-0.7.9 → scribe_cli-0.7.11/scribe_cli.egg-info}/PKG-INFO +13 -10
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/.github/workflows/pypi.yml +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/.gitignore +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/LICENSE +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/icon.xcf +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/pyproject.toml +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/__init__.py +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/audio.py +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/install_desktop.py +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/keyboard.py +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/models.toml +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/saverecording.py +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/testpynput.py +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/util.py +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_cli.egg-info/SOURCES.txt +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_cli.egg-info/dependency_links.txt +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_cli.egg-info/entry_points.txt +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_cli.egg-info/requires.txt +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_cli.egg-info/top_level.txt +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_data/__init__.py +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_data/share/icon.png +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_data/share/icon_recording.png +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_data/share/icon_writing.png +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_data/templates/scribe.desktop +0 -0
- {scribe_cli-0.7.9 → scribe_cli-0.7.11}/setup.cfg +0 -0

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.7.9
+Version: 0.7.11
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -64,14 +64,17 @@ Requires-Dist: pystray; extra == "all"
 []()
 [](https://pypi.org/project/scribe-cli)
 
-# Scribe
+# Scribe <img src="scribe_data/share/icon.png" width=48px>
 
 `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
 
 ## Compatibility
 
-In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland)
-
+In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) for my own purposes so glitches are likely on other configurations.
+Moreover there are quite a bit of dependencies that rely on very OS-specific protocols under the hood, like access to the microphone, keyboard and clipboard,
+and even though the python dependencies `scribe` relies on are not restricted to a single platform, there may be limitation and additional binaries to install.
+This guide is based on python3.12 running on Ubuntu 24.04 with Gnome + Wayland, which is a relatively standard setting at the time of writing.
+Note as of February 19, 2025 python 13 does not seem to produce any transcription (I am not sure which dependency is to blame).
 A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
 
 ## Installation
@@ -101,7 +104,7 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
 The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
 the default is left to the `openai-whisper` package and might change in the future).
 
 
@@ -119,8 +122,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
-With the `
-
+With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
 The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -147,7 +150,7 @@ scribe --keyboard
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
-#### Use the keyboard
+#### Use the keyboard with Wayland (default for Ubuntu 24.04)
 
 In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
 
@@ -161,7 +164,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
 
-### System tray icon (experimental)
+### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -176,7 +179,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
 
-### Start as an application in
+### Start as an application in GNOME
 
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -1,14 +1,17 @@
 []()
 [](https://pypi.org/project/scribe-cli)
 
-# Scribe
+# Scribe <img src="scribe_data/share/icon.png" width=48px>
 
 `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
 
 ## Compatibility
 
-In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland)
-
+In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) for my own purposes so glitches are likely on other configurations.
+Moreover there are quite a bit of dependencies that rely on very OS-specific protocols under the hood, like access to the microphone, keyboard and clipboard,
+and even though the python dependencies `scribe` relies on are not restricted to a single platform, there may be limitation and additional binaries to install.
+This guide is based on python3.12 running on Ubuntu 24.04 with Gnome + Wayland, which is a relatively standard setting at the time of writing.
+Note as of February 19, 2025 python 13 does not seem to produce any transcription (I am not sure which dependency is to blame).
 A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
 
 ## Installation
@@ -38,7 +41,7 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
 The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
 the default is left to the `openai-whisper` package and might change in the future).
 
 
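The README hunk above documents the default model cache location: `$XDG_CACHE_HOME/{backend}`, with `$XDG_CACHE_HOME` falling back to `$HOME/.cache`. Below is a minimal sketch of that lookup. It assumes the backend name is used verbatim as the folder name. `default_download_folder` is a hypothetical helper, not scribe's own code. It only models the `vosk` case, because the `whisper` default is delegated to `openai-whisper`.

```python
import os


def default_download_folder(backend, environ=None):
    """Resolve $XDG_CACHE_HOME/{backend}, defaulting XDG_CACHE_HOME to $HOME/.cache.

    Hypothetical helper illustrating the documented default; not scribe's implementation.
    """
    env = os.environ if environ is None else environ
    # Per the XDG Base Directory convention, an unset or empty XDG_CACHE_HOME
    # falls back to $HOME/.cache.
    home = env.get("HOME") or os.path.expanduser("~")
    cache_home = env.get("XDG_CACHE_HOME") or os.path.join(home, ".cache")
    return os.path.join(cache_home, backend)


print(default_download_folder("vosk", {"HOME": "/home/user"}))
print(default_download_folder("vosk", {"HOME": "/home/user", "XDG_CACHE_HOME": "/tmp/cache"}))
```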
@@ -56,8 +59,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
-With the `
-
+With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
 The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
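The whisper recording rules described in this hunk can be modelled as a pure decision function. Recording stops after `--silence` seconds of silence (2 by default) and is capped at `--duration` seconds (120 by default). This is an illustrative sketch under those stated defaults, and `stop_reason` is hypothetical. The `has_speech` flag mirrors the `len(self.audio_buffer) > 0` condition in this release's `models.py` changes. The sketch ignores `--restart-after-silence`, which transcribes and keeps recording instead of stopping.

```python
def stop_reason(elapsed, silent_for, has_speech, duration=120.0, silence=2.0):
    """Return why a whisper recording should stop, or None to keep recording.

    elapsed: seconds since the recording started
    silent_for: seconds since the last non-silent audio chunk
    has_speech: whether any audio has been buffered yet (hypothetical flag)
    """
    # A silence only ends the recording once something has been said;
    # silence=None disables silence detection entirely.
    if silence is not None and has_speech and silent_for >= silence:
        return "silence"
    # Hard cap on the total recording time (--duration).
    if elapsed >= duration:
        return "overtime"
    return None


print(stop_reason(elapsed=5.0, silent_for=2.5, has_speech=True))    # silence
print(stop_reason(elapsed=5.0, silent_for=2.5, has_speech=False))   # None
print(stop_reason(elapsed=120.0, silent_for=0.1, has_speech=True))  # overtime
```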
@@ -84,7 +87,7 @@ scribe --keyboard
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
-#### Use the keyboard
+#### Use the keyboard with Wayland (default for Ubuntu 24.04)
 
 In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
 
@@ -98,7 +101,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
 
-### System tray icon (experimental)
+### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -113,7 +116,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
 
-### Start as an application in
+### Start as an application in GNOME
 
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -112,16 +112,17 @@ def get_transcriber(o, prompt=True):
 choices = list(zip(available_models, available_languages)) + [f" * [Any model from {ansi_link('https://alphacephei.com/vosk/models')}]"]
 default_model = choices[0] # this is a tuple !!
 
-print(f"For information about vosk models see: {ansi_link('https://alphacephei.com/vosk/models')}")
 if prompt:
+print(f"For information about vosk models see: {ansi_link('https://alphacephei.com/vosk/models')}")
 model = prompt_choices(choices, default=default_model, label="model") # this always returns a string
 else:
 model = default_model[0] if isinstance(default_model, tuple) else default_model # tuple -> string
 
 elif backend == "whisper":
 default_model = "small"
-print("Some models have a specialized English version (.en) which will be selected as default is `-l en` was requested, but can also be requested explicitly below (option not listed). See [documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages).")
 if prompt:
+# print("Some models have a specialized English version (.en) which will be selected as default is `-l en` was requested, but can also be requested explicitly below (option not listed). See [documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages).")
+print(f"See {ansi_link('https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages')} for available models.")
 model = prompt_choices(whisper_models, default=default_model, label="model",
 hidden_models=whisper_english_models)
 else:
@@ -146,7 +147,8 @@ def get_transcriber(o, prompt=True):
 
 elif backend == "whisper":
 transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
-timeout=o.duration, silence_duration=o.silence,
+timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
+restart_after_silence=o.restart_after_silence,
 model_kwargs={"download_root": o.download_folder_whisper})
 
 else:
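The `WhisperTranscriber` call now passes `silence_thresh=o.silence_db`. That value is the dB threshold later handed to `is_silent(data, self.silence_thresh)`. The body of `is_silent` is not part of this diff. One common way to implement such a check is to compare a chunk's RMS level, in dB relative to full scale, against the threshold. The sketch below does that under two assumptions: the audio is 16-bit signed little-endian PCM, and dBFS is the unit meant by `--silence-db`. `rms_dbfs` and `is_silent_sketch` are hypothetical stand-ins, not scribe's functions.

```python
import array
import math
import sys


def rms_dbfs(data):
    """RMS level of 16-bit signed PCM bytes in dB relative to full scale (0 dBFS = loudest)."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(data)
    if sys.byteorder == "big":
        samples.byteswap()  # assumption: input bytes are little-endian
    if not samples:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return -math.inf  # digital silence
    return 20 * math.log10(rms / 32768)


def is_silent_sketch(data, silence_thresh=-30.0):
    """Treat a chunk as silent when its RMS level is below the dB threshold."""
    return rms_dbfs(data) < silence_thresh
```

With the default threshold of `-30`, a chunk counts as silent when its RMS amplitude is below 10^(-30/20), about 3.2% of full scale. For 16-bit samples that is roughly 1036.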
@@ -177,12 +179,13 @@ def get_parser():
 parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
 parser.add_argument("--keyboard", action="store_true")
 parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
-parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
+parser.add_argument("--latency", default=0.01, type=float, help="keyboard latency")
 parser.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
 
 group = parser.add_argument_group("whisper options")
-group.add_argument("--duration", default=120, type=
-group.add_argument("--silence", default=2, type=float, help="silence duration
+group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)ss)")
+group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)ss)")
+group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in db (default %(default)ss)")
 group.add_argument("--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
 
 parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
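The whisper options group above can be exercised on its own. This standalone sketch copies the defaults and types from the hunk, with the help strings shortened. It shows two argparse details relevant here. First, a negative value such as `--silence-db -40` is accepted, because the parser defines no option that looks like a negative number. Second, `type=float` is applied only to command-line strings (and string defaults), so the untouched numeric defaults keep their integer values.

```python
import argparse

# Standalone copy of the "whisper options" group from the get_parser() hunk
# (defaults and types as in the diff; help strings shortened).
parser = argparse.ArgumentParser(prog="scribe")
group = parser.add_argument_group("whisper options")
group.add_argument("--duration", default=120, type=float, help="max recording duration in seconds")
group.add_argument("--silence", default=2, type=float, help="silence duration in seconds")
group.add_argument("--silence-db", default=-30, type=float, help="silence threshold in dB")
group.add_argument("--restart-after-silence", action="store_true")

defaults = parser.parse_args([])
custom = parser.parse_args(["--silence-db", "-40", "--duration", "60"])
print(defaults.silence_db, custom.silence_db, custom.duration)  # -30 -40.0 60.0
```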
@@ -248,7 +251,15 @@ def create_app(micro, transcriber, **kwargs):
 image_recording = Image.alpha_composite(image_recording.convert("RGBA"), image_writing.convert("RGBA"))
 
 def update_icon(icon, force=False):
-if transcriber.recording:
+if transcriber.recording and transcriber.waiting:
+# this is the situation with the whisper backend when the microphone is recording
+# but we wait for the speaker to speak (silence)
+if force or getattr(icon, "_icon_label", None) != None:
+icon.icon = image
+icon._icon_label = None
+icon.update_menu()
+
+elif transcriber.recording:
 if force or getattr(icon, "_icon_label", None) != "recording":
 icon.icon = image_recording
 icon._icon_label = "recording"
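The new `update_icon` branch tests "recording but waiting for speech" before plain "recording". It also only touches the icon when its label changes or `force` is set. The sketch below models that selection logic as two pure functions; both names are hypothetical. Only the first branches of `update_icon` appear in the hunk. The `"idle"` label for a transcriber that is not recording is therefore an assumption.

```python
def icon_label(recording, waiting):
    """Choose the tray icon label, following the branch order of the new update_icon."""
    if recording and waiting:
        # Whisper backend: the microphone is open but nobody is speaking yet,
        # so the default icon (label None) is shown.
        return None
    if recording:
        return "recording"
    # Hypothetical label: the remaining branches of update_icon are not shown in the diff.
    return "idle"


def needs_update(current_label, new_label, force=False):
    """Swap the icon image only when the label changes, or when forced."""
    return force or current_label != new_label


current = "recording"
new = icon_label(recording=True, waiting=True)
if needs_update(current, new):
    current = new  # update_icon swaps icon.icon and records the new label at this point
print(current)
```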
@@ -338,38 +349,48 @@ def main(args=None):
 micro = Microphone(samplerate=o.samplerate, device=o.microphone_device)
 
 transcriber = None
-
-toggle = {True: "On", False: "Off"}
+details = False
 
 while True:
 if transcriber is None:
 transcriber = get_transcriber(o, prompt=o.prompt)
 print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
+show_options = ["clipboard", "keyboard", "ascii", "app"]
+activated_options = [colored(option, 'light_blue') for option in show_options if getattr(o, option)]
+print(f"Options: {' | '.join(activated_options)}")
 if o.prompt:
 print(f"Choose any of the following actions")
 print(f"{colored('[q]', 'light_yellow')} quit")
 print(f"{colored('[e]', 'light_yellow')} change model")
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+if details:
+print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
+print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
+print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
+if o.keyboard:
+print(f"{colored('[latency]', 'light_yellow')} between keystrokes is {colored(o.latency, 'light_blue')} s")
+if transcriber.backend == "whisper":
+print(f"{colored('[t]', 'light_yellow')} change duration (currently {colored(transcriber.timeout, 'light_blue')} s)")
+print(f"{colored('[b]', 'light_yellow')} change silence (currently {colored(transcriber.silence_duration, 'light_blue')} s)")
+print(f"{colored('[db]', 'light_yellow')} change backround noise (currently {colored(transcriber.silence_thresh, 'light_blue')} db)")
+print(f"{colored('[a]', 'light_yellow')} auto-restart after silence is {colored(transcriber.restart_after_silence, 'light_blue')} toggle?")
+exclude_flags = ["keyboard", "clipboard", "app", "prompt", "restart_after_silence"]
+display_flags = [a.dest for a in parser._actions if a.help != argparse.SUPPRESS]
+for key, value in vars(o).items():
+if key not in display_flags or key in exclude_flags or not isinstance(value, bool):
+continue
+print(f"{colored(f'[{key}]', 'light_yellow')} is {colored(value, 'light_blue')} toggle?")
+print(f"{colored('[o]', 'light_yellow')} hide options")
+else:
+print(f"{colored('[o]', 'light_yellow')} show options")
 
 print(colored(f"Press [Enter] to start recording.", attrs=["bold"]))
 
 key = input()
 if key == "q":
 exit(0)
+if key == "o":
+details = not details
+continue
 if key == "e":
 transcriber = None
 o.model = None
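The `main()` hunk replaces the old `toggle` dictionary with a `details` flag, which the `[o]` key flips. It also prints the enabled boolean options on a single "Options:" line. The sketch below reproduces those two pieces without terminal colours. It assumes an `argparse.Namespace` with the same attribute names as scribe's options. `format_options` and `handle_key` are hypothetical helpers.

```python
from argparse import Namespace

SHOW_OPTIONS = ["clipboard", "keyboard", "ascii", "app"]  # same list as show_options in the hunk


def format_options(o):
    """Join the names of the enabled options, as the new "Options:" line does (without colours)."""
    return "Options: " + " | ".join(name for name in SHOW_OPTIONS if getattr(o, name))


def handle_key(key, details):
    """Return the new value of `details`: "o" toggles the detailed menu, other keys leave it alone."""
    return not details if key == "o" else details


o = Namespace(clipboard=True, keyboard=False, ascii=False, app=True)
print(format_options(o))  # Options: clipboard | app
```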
@@ -391,9 +412,9 @@ def main(args=None):
 if key == "t":
 ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
 try:
-o.duration = transcriber.timeout =
+o.duration = transcriber.timeout = float(ans)
 except:
-print("Invalid duration. Must be
+print("Invalid duration. Must be a float.")
 continue
 if key == "latency":
 ans = input(f"Enter new keyboard latency in seconds (current: {o.latency}): ")
@@ -405,9 +426,16 @@ def main(args=None):
 if key == "b":
 ans = input(f"Enter new silence break duration in seconds (current: {transcriber.silence_duration}): ")
 try:
-o.silence = transcriber.silence_duration =
+o.silence = transcriber.silence_duration = float(ans)
+except:
+print("Invalid duration. Must be a float.")
+continue
+if key == "db":
+ans = input(f"Enter new background noise threshold to detect silence (current: {transcriber.silence_thresh}): ")
+try:
+o.silence_db = transcriber.silence_thresh = float(ans)
 except:
-print("Invalid duration. Must be
+print("Invalid duration. Must be a float.")
 continue
 if key:
 if hasattr(o, key) and isinstance(getattr(o, key), bool):
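The `[t]`, `[b]` and `[db]` handlers share one pattern: read a value, convert it with `float()`, and print an error if the conversion fails. The sketch below folds that pattern into one reusable helper. `ask_float` is hypothetical. It catches only `ValueError`, the exception `float()` raises for malformed strings, rather than using a bare `except:`. Its `read` parameter lets it run without a terminal.

```python
def ask_float(prompt, current, read=input):
    """Ask for a float; return the parsed value, or None when the answer is not a number.

    Hypothetical helper. `read` defaults to input() but can be swapped for testing.
    """
    ans = read(f"{prompt} (current: {current}): ")
    try:
        return float(ans)
    except ValueError:  # float() raises ValueError for strings like "abc" or ""
        print("Invalid value. Must be a float.")
        return None


# Example without a terminal: feed a canned answer instead of calling input().
value = ask_float("Enter new background noise threshold", -30.0, read=lambda _prompt: "-45")
print(value)  # -45.0
```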
@@ -34,6 +34,7 @@ class AbstractTranscriber:
 self.restart_after_silence = restart_after_silence
 self.recording = False
 self.busy = False
+self.waiting = False
 self.reset()
 
 def get_elapsed(self):
@@ -52,7 +53,6 @@ class AbstractTranscriber:
 def reset(self):
 self.audio_buffer = b''
 self.start_time = time.time()
-self.last_sound_time = time.time()
 
 def start_recording(self, microphone,
 start_message="Recording... Press Ctrl+C to stop.",
@@ -60,7 +60,13 @@ class AbstractTranscriber:
 
 self.reset()
 self.recording = True
+self.waiting = True
 self.busy = True
+if self.silence_duration is not None:
+self.last_sound_time = time.time() - self.silence_duration
+else:
+self.last_sound_time = time.time()
+previous_waiting = self.waiting
 
 try:
 
@@ -75,19 +81,31 @@ class AbstractTranscriber:
 if is_silent(data, self.silence_thresh):
 silence_duration = time.time() - self.last_sound_time
 
-
+previous_waiting = self.waiting
+self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
+
+if self.waiting and len(self.audio_buffer) > 0:
 if self.restart_after_silence:
+self.recording = False # for the system tray icon
 result = self.finalize()
 microphone.q.queue.clear()
 self.reset()
 yield result
+self.recording = True # for the system tray icon
 else:
 raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
 
 else:
 self.last_sound_time = time.time()
+self.waiting = False
 
-
+# don't accumulate very long silences
+if not self.waiting:
+yield self.transcribe_realtime_audio(data)
+
+else:
+if not previous_waiting:
+print("Silence detected...waiting for more audio")
 
 if self.is_overtime():
 raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
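The reworked `start_recording` loop adds a `waiting` state with these rules:

- A silent chunk sets `waiting` once the silence has lasted `silence_duration`.
- While waiting, chunks are no longer passed to `transcribe_realtime_audio`.
- The "Silence detected" message is printed only on the transition into waiting.
- Because `last_sound_time` now starts at `now - silence_duration`, a fresh recording begins in the waiting state.

The sketch below replays that logic on `(timestamp, is_silent)` pairs. It is a simplified, hypothetical model: there is no microphone, no transcription and no overtime check. "transcribe" stands for a `transcribe_realtime_audio(data)` call, and it assumes each transcribed chunk lands in the audio buffer.

```python
def replay(chunks, silence_duration=2.0, start_time=0.0, restart_after_silence=True):
    """Replay (timestamp, is_silent) chunks through a simplified copy of the waiting logic.

    Returns a list of (timestamp, event) tuples.
    """
    events = []
    waiting = True
    # A fresh recording starts as if the silence threshold had already been reached.
    if silence_duration is not None:
        last_sound_time = start_time - silence_duration
    else:
        last_sound_time = start_time
    previous_waiting = waiting
    buffered = 0  # stands in for len(self.audio_buffer)

    for t, silent in chunks:
        if silent:
            silent_for = t - last_sound_time
            previous_waiting = waiting
            waiting = silence_duration is not None and silent_for >= silence_duration
            if waiting and buffered > 0:
                if not restart_after_silence:
                    events.append((t, "stop: silence"))  # StopRecording in the real code
                    return events
                events.append((t, "finalize"))
                buffered = 0  # reset() clears the audio buffer
        else:
            last_sound_time = t
            waiting = False

        if not waiting:
            events.append((t, "transcribe"))
            buffered += 1
        elif not previous_waiting:
            events.append((t, "waiting for more audio"))
    return events


chunks = [(0.0, True), (0.5, False), (1.0, True), (3.0, True), (3.5, True)]
for event in replay(chunks):
    print(event)
```

The leading silent chunk is neither transcribed nor reported, because the recording starts out waiting. Once speech has been buffered, the next silence of at least `silence_duration` either finalizes and keeps listening, or stops, depending on `restart_after_silence`.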
@@ -98,6 +116,7 @@ class AbstractTranscriber:
 pass
 
 finally:
+self.waiting = False
 self.recording = False
 result = self.finalize()
 microphone.q.queue.clear()
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.7.9
+Version: 0.7.11
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -64,14 +64,17 @@ Requires-Dist: pystray; extra == "all"
 []()
 [](https://pypi.org/project/scribe-cli)
 
-# Scribe
+# Scribe <img src="scribe_data/share/icon.png" width=48px>
 
 `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
 
 ## Compatibility
 
-In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland)
-
+In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) for my own purposes so glitches are likely on other configurations.
+Moreover there are quite a bit of dependencies that rely on very OS-specific protocols under the hood, like access to the microphone, keyboard and clipboard,
+and even though the python dependencies `scribe` relies on are not restricted to a single platform, there may be limitation and additional binaries to install.
+This guide is based on python3.12 running on Ubuntu 24.04 with Gnome + Wayland, which is a relatively standard setting at the time of writing.
+Note as of February 19, 2025 python 13 does not seem to produce any transcription (I am not sure which dependency is to blame).
 A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
 
 ## Installation
@@ -101,7 +104,7 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
 The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
 the default is left to the `openai-whisper` package and might change in the future).
 
 
@@ -119,8 +122,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
-With the `
-
+With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
 The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -147,7 +150,7 @@ scribe --keyboard
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
-#### Use the keyboard
+#### Use the keyboard with Wayland (default for Ubuntu 24.04)
 
 In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
 
@@ -161,7 +164,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
 
-### System tray icon (experimental)
+### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -176,7 +179,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
 
-### Start as an application in
+### Start as an application in GNOME
 
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.