scribe-cli 0.7.11__tar.gz → 0.8.0__tar.gz
- {scribe_cli-0.7.11/scribe_cli.egg-info → scribe_cli-0.8.0}/PKG-INFO +41 -12
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/README.md +40 -11
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/_version.py +2 -2
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/app.py +51 -28
- {scribe_cli-0.7.11 → scribe_cli-0.8.0/scribe_cli.egg-info}/PKG-INFO +41 -12
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/.github/workflows/pypi.yml +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/.gitignore +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/LICENSE +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/icon.xcf +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/pyproject.toml +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/__init__.py +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/audio.py +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/install_desktop.py +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/keyboard.py +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/models.py +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/models.toml +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/saverecording.py +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/testpynput.py +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/util.py +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe_cli.egg-info/SOURCES.txt +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe_cli.egg-info/dependency_links.txt +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe_cli.egg-info/entry_points.txt +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe_cli.egg-info/requires.txt +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe_cli.egg-info/top_level.txt +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe_data/__init__.py +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe_data/share/icon.png +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe_data/share/icon_recording.png +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe_data/share/icon_writing.png +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe_data/templates/scribe.desktop +0 -0
- {scribe_cli-0.7.11 → scribe_cli-0.8.0}/setup.cfg +0 -0
{scribe_cli-0.7.11/scribe_cli.egg-info → scribe_cli-0.8.0}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.
+Version: 0.8.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -118,7 +118,7 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model.
+You can interrupt the recording via Ctrl + C and start again or change model.
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -135,19 +135,38 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
-
+## Output media
+
+By default the transcription is printed on the terminal, but other output media are supported.
+
+### Clipboard
+
+The most straightforward is the clipboard:
+
+```bash
+scribe --clipboard
+```
+The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+
+### Output file
+
+Alternatively an output file can be indicated:
+
+```bash
+--keyboard -o transcription.txt
+```
 
-
-However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
-That can be achieve with the `--keyboard` option.
+### Virtual keyboard (experimental)
 
-With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the
+With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 
 ```bash
 scribe --keyboard
 ```
 
-
+This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
+
+The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
 #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -164,7 +183,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
 
-
+## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -179,7 +198,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
 
-
+## Start as an application in GNOME
 
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -187,7 +206,17 @@ to make it available from the quick launch menu. Any option will be passed on to
 e.g.
 
 ```bash
-scribe-install --backend whisper --model small
+scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
 ```
 
-After that just typing
+After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+
+
+## Fine tuning
+
+There are a number of options to control the silence threshold, duration and more.
+Best is to check the available options in the online help:
+
+```bash
+scribe --help
+```
{scribe_cli-0.7.11 → scribe_cli-0.8.0}/README.md
@@ -55,7 +55,7 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model.
+You can interrupt the recording via Ctrl + C and start again or change model.
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -72,19 +72,38 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
-
+## Output media
+
+By default the transcription is printed on the terminal, but other output media are supported.
+
+### Clipboard
+
+The most straightforward is the clipboard:
+
+```bash
+scribe --clipboard
+```
+The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
 
-
-However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
-That can be achieve with the `--keyboard` option.
+### Output file
 
-
+Alternatively an output file can be indicated:
+
+```bash
+--keyboard -o transcription.txt
+```
+
+### Virtual keyboard (experimental)
+
+With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 
 ```bash
 scribe --keyboard
 ```
 
-
+This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
+
+The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
 #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -101,7 +120,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
 
-
+## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -116,7 +135,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
 
-
+## Start as an application in GNOME
 
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -124,7 +143,17 @@ to make it available from the quick launch menu. Any option will be passed on to
 e.g.
 
 ```bash
-scribe-install --backend whisper --model small
+scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
 ```
 
-After that just typing
+After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+
+
+## Fine tuning
+
+There are a number of options to control the silence threshold, duration and more.
+Best is to check the available options in the online help:
+
+```bash
+scribe --help
+```
{scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/app.py
@@ -1,5 +1,6 @@
 from pathlib import Path
 import tomllib
+import re
 import time
 import argparse
 from scribe.audio import Microphone
@@ -177,16 +178,22 @@ def get_parser():
 
 parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
 parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
-
-parser.
-
-
+
+group = parser.add_argument_group("transcription output")
+group.add_argument("-c", "--clipboard", dest="clipboard", action="store_true")
+# group.add_argument("--no-clipboard", dest="clipboard", action="store_false", help=argparse.SUPPRESS)
+group.add_argument("-k", "--keyboard", action="store_true")
+group.add_argument("-o", "--output-file")
+
+group = parser.add_argument_group("keyboard options")
+group.add_argument("--latency", default=0.01, type=float, help="keyboard latency (default %(default)s s)")
+group.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
 
 group = parser.add_argument_group("whisper options")
-group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)
-group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)
-group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in
-group.add_argument("--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
+group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)s s)")
+group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)s s)")
+group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
+group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
 
 parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
 parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
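The two option groups added in the hunk above can be tried in isolation. The following is a minimal sketch that rebuilds only those groups with the standard `argparse` module; the `build_parser` helper and the `prog="scribe"` name are assumptions made for illustration, and the real `get_parser()` defines many more options.

```python
import argparse


def build_parser():
    """Rebuild only the option groups touched by this hunk (a sketch, not the full parser)."""
    parser = argparse.ArgumentParser(prog="scribe")

    # Output destinations: every flag is off unless given on the command line.
    group = parser.add_argument_group("transcription output")
    group.add_argument("-c", "--clipboard", dest="clipboard", action="store_true")
    group.add_argument("-k", "--keyboard", action="store_true")
    group.add_argument("-o", "--output-file")

    # Options that only matter when --keyboard is active.
    group = parser.add_argument_group("keyboard options")
    group.add_argument("--latency", default=0.01, type=float,
                       help="keyboard latency (default %(default)s s)")
    group.add_argument("--ascii", action="store_true",
                       help="Use unidecode for keyboard typing in ascii")
    return parser


# Equivalent to running: scribe -c -o transcription.txt
args = build_parser().parse_args(["-c", "-o", "transcription.txt"])
print(args.clipboard, args.keyboard, args.output_file, args.latency)
```

Because `--output-file` has no explicit `dest`, argparse stores it as `args.output_file`, which is the attribute name the later hunks read as `o.output_file`.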
@@ -195,18 +202,16 @@ def get_parser():
 
 
 # Commencer l'enregistrement
-def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, ascii=False, callback=None, **greetings):
+def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, ascii=False, output_file=None, callback=None, **greetings):
 
 if keyboard:
 from scribe.keyboard import type_text
 print("\nChange focus to target app during transcription.")
 
-
 if clipboard:
 import pyperclip
 print("\nThe full transcription will be copied to clipboard as it becomes available.")
 
-
 fulltext = ""
 
 for result in transcriber.start_recording(micro, **greetings):
@@ -217,6 +222,10 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
 if keyboard:
 type_text(result['text'] + " ", interval=latency, ascii=ascii) # Simulate typing
 
+if output_file:
+with open(output_file, "a") as f:
+f.write(result['text'] + "\n")
+
 if clipboard:
 fulltext += result['text'] + " "
 pyperclip.copy(fulltext.strip())
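The file output added in the hunk above appends each finalized result as its own line. A minimal sketch of that behavior follows, assuming results are dictionaries with a `text` key as in the diff; the `append_result` helper, the sample results and the temporary path are hypothetical.

```python
import os
import tempfile


def append_result(output_file, result):
    """Append one finalized transcription chunk to output_file, one chunk per line."""
    # Mode "a" creates the file if needed and never truncates earlier chunks.
    with open(output_file, "a") as f:
        f.write(result["text"] + "\n")


# Hypothetical results standing in for what the transcriber yields.
results = [{"text": "hello world"}, {"text": "second sentence"}]

path = os.path.join(tempfile.mkdtemp(), "transcription.txt")
for result in results:
    append_result(path, result)

with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```

Since the file is opened in append mode for every result, pointing two sessions at the same file accumulates their text rather than overwriting it.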
@@ -224,9 +233,6 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
 else:
 print_partial(result.get('partial', ''))
 
-if clipboard:
-print("Copied to clipboard.")
-
 if callback:
 callback()
 
@@ -355,17 +361,24 @@ def main(args=None):
 if transcriber is None:
 transcriber = get_transcriber(o, prompt=o.prompt)
 print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
-
-
-
+show_output = ["clipboard", "keyboard", "output_file"]
+show_options = ["ascii", "app"]
+activated_output = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_output if getattr(o, option)]
+activated_options = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_options if getattr(o, option)]
+if activated_output:
+print(f"Output: {' | '.join(activated_output)}")
+else:
+print(colored(f"No output selected -> terminal only", "light_red"))
+if activated_options:
+print(f"Options: {' | '.join(activated_options)}")
 if o.prompt:
 print(f"Choose any of the following actions")
-print(f"{colored('[q]', 'light_yellow')} quit")
 print(f"{colored('[e]', 'light_yellow')} change model")
+print(f"{colored('[f]', 'light_yellow')} output file is {colored(repr(o.output_file), 'light_blue')}")
+print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
+print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
+print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
 if details:
-print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
-print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
-print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
 if o.keyboard:
 print(f"{colored('[latency]', 'light_yellow')} between keystrokes is {colored(o.latency, 'light_blue')} s")
 if transcriber.backend == "whisper":
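The list comprehensions added in the hunk above build a one-line summary of the active outputs: a boolean option is shown by name, any other truthy option as `name=value`. The sketch below reproduces that logic without the coloring; the `describe` helper and the `SimpleNamespace` stand-in for the parsed options are assumptions made for illustration.

```python
from types import SimpleNamespace


def describe(o, names):
    """List the active options: bare name for booleans, name=value otherwise."""
    return [
        name if type(getattr(o, name)) is bool else f"{name}={getattr(o, name)}"
        for name in names
        if getattr(o, name)  # skips False, None and empty strings
    ]


# Hypothetical parsed options; in the diff these come from argparse.
o = SimpleNamespace(clipboard=True, keyboard=False, output_file="notes.txt")

active = describe(o, ["clipboard", "keyboard", "output_file"])
if active:
    print(f"Output: {' | '.join(active)}")
else:
    print("No output selected -> terminal only")
```

With the options above this prints `Output: clipboard | output_file=notes.txt`; when every output is off the list is empty and the fallback message is shown instead.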
@@ -379,21 +392,22 @@ def main(args=None):
 if key not in display_flags or key in exclude_flags or not isinstance(value, bool):
 continue
 print(f"{colored(f'[{key}]', 'light_yellow')} is {colored(value, 'light_blue')} toggle?")
-print(f"{colored('[
+print(f"{colored('[-]', 'light_yellow')} hide options")
 else:
-print(f"{colored('[
-
+print(f"{colored('[-]', 'light_yellow')} show more options")
+print(f"{colored('[q]', 'light_yellow')} quit")
 print(colored(f"Press [Enter] to start recording.", attrs=["bold"]))
 
 key = input()
 if key == "q":
 exit(0)
-if key
+if len(key) > 0 and key.strip() in ["", ".", "-", "+", 'o', '\x1b[A', '\x1b[B', '\x1b[C', '\x1b[D']: # arrow keys
 details = not details
 continue
 if key == "e":
 transcriber = None
 o.model = None
+o.dummy = False
 o.backend = None
 o.language = None
 continue
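The new condition in the hunk above decides which typed lines show or hide the extra options. A small sketch of that test follows; the `toggles_details` name and the `TOGGLE_KEYS` constant are hypothetical, and the reading that a bare Enter falls through to start the recording is inferred from the "Press [Enter] to start recording." prompt rather than stated in the diff.

```python
# Inputs that toggle the detailed menu; the escape sequences are the arrow keys.
TOGGLE_KEYS = ["", ".", "-", "+", "o", "\x1b[A", "\x1b[B", "\x1b[C", "\x1b[D"]


def toggles_details(key):
    """Return True when the typed line should show or hide the extra options."""
    # A bare Enter gives an empty string, which the len(key) > 0 guard excludes;
    # whitespace-only input strips to "" and therefore does toggle.
    return len(key) > 0 and key.strip() in TOGGLE_KEYS


for key in ["", " ", "-", "\x1b[A", "q"]:
    print(repr(key), toggles_details(key))
```

The escape character `\x1b` is not whitespace, so `strip()` leaves the arrow-key sequences intact and they still match the list.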
@@ -437,18 +451,27 @@ def main(args=None):
 except:
 print("Invalid duration. Must be a float.")
 continue
+if key == "f":
+ans = input(f"Enter output file (current: {o.output_file}): ")
+invalid_regex = re.compile(r'[^A-Za-z0-9_\-\\\/\.]')
+if not invalid_regex.search(ans):
+o.output_file = ans
+else:
+print(f"Invalid characters: {' '.join(map(repr, invalid_regex.findall(ans)))}")
+print(f"Invalid file name: {repr(ans)}")
+continue
 if key:
 if hasattr(o, key) and isinstance(getattr(o, key), bool):
 setattr(o, key, not getattr(o, key))
 print(f"Toggle {key} to [{getattr(o, key)}].")
-print(f"Invalid choice: {key}
+print(f"Invalid choice: {repr(key)}")
 continue
 
 if o.app:
 greetings = dict(
 start_message = "Listening... Use the try icon menu to stop.",
 )
-app = create_app(micro, transcriber, clipboard=o.clipboard,
+app = create_app(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
 keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
 print("Starting app...")
 app.run()
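The `[f]` prompt added in the hunk above accepts a file name only if it contains no character outside a small whitelist. The sketch below isolates that check using the same regular expression as the diff; the `check_file_name` helper and the sample names are hypothetical.

```python
import re

# Same character class as in the diff: anything other than letters, digits,
# underscore, hyphen, backslash, slash and dot counts as invalid.
INVALID = re.compile(r'[^A-Za-z0-9_\-\\\/\.]')


def check_file_name(ans):
    """Return (ok, offending_characters) for a candidate output file name."""
    bad = INVALID.findall(ans)
    return (not bad, bad)


print(check_file_name("notes/transcription.txt"))
print(check_file_name("my notes?.txt"))
```

Two consequences of this pattern are worth keeping in mind: paths containing spaces are rejected, and an empty answer passes the check, which in the diff's logic stores an empty (falsy) `output_file` and so switches file output off.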
@@ -456,7 +479,7 @@ def main(args=None):
 greetings = dict(
 start_message = "Listening... Press Ctrl+C to stop.",
 )
-start_recording(micro, transcriber, clipboard=o.clipboard,
+start_recording(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
 keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
 
 # if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
{scribe_cli-0.7.11 → scribe_cli-0.8.0/scribe_cli.egg-info}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.
+Version: 0.8.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -118,7 +118,7 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model.
+You can interrupt the recording via Ctrl + C and start again or change model.
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -135,19 +135,38 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
-
+## Output media
+
+By default the transcription is printed on the terminal, but other output media are supported.
+
+### Clipboard
+
+The most straightforward is the clipboard:
+
+```bash
+scribe --clipboard
+```
+The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+
+### Output file
+
+Alternatively an output file can be indicated:
+
+```bash
+--keyboard -o transcription.txt
+```
 
-
-However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
-That can be achieve with the `--keyboard` option.
+### Virtual keyboard (experimental)
 
-With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the
+With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 
 ```bash
 scribe --keyboard
 ```
 
-
+This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
+
+The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
 #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -164,7 +183,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
 
-
+## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -179,7 +198,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
 
-
+## Start as an application in GNOME
 
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -187,7 +206,17 @@ to make it available from the quick launch menu. Any option will be passed on to
 e.g.
 
 ```bash
-scribe-install --backend whisper --model small
+scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
 ```
 
-After that just typing
+After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+
+
+## Fine tuning
+
+There are a number of options to control the silence threshold, duration and more.
+Best is to check the available options in the online help:
+
+```bash
+scribe --help
+```
File without changes