scribe-cli 0.7.10__tar.gz → 0.8.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {scribe_cli-0.7.10/scribe_cli.egg-info → scribe_cli-0.8.0}/PKG-INFO +44 -15
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/README.md +43 -14
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/_version.py +2 -2
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/app.py +74 -33
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/models.py +22 -3
- {scribe_cli-0.7.10 → scribe_cli-0.8.0/scribe_cli.egg-info}/PKG-INFO +44 -15
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/.github/workflows/pypi.yml +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/.gitignore +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/LICENSE +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/icon.xcf +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/pyproject.toml +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/__init__.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/audio.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/install_desktop.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/keyboard.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/models.toml +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/saverecording.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/testpynput.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/util.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_cli.egg-info/SOURCES.txt +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_cli.egg-info/dependency_links.txt +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_cli.egg-info/entry_points.txt +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_cli.egg-info/requires.txt +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_cli.egg-info/top_level.txt +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_data/__init__.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_data/share/icon.png +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_data/share/icon_recording.png +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_data/share/icon_writing.png +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_data/templates/scribe.desktop +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.8.0}/setup.cfg +0 -0
```diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.
+Version: 0.8.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
```
```diff
@@ -104,7 +104,7 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
 The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
 the default is left to the `openai-whisper` package and might change in the future).
 
 
```
```diff
@@ -118,12 +118,12 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model.
+You can interrupt the recording via Ctrl + C and start again or change model.
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
-With the `
-
+With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
 The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
```
````diff
@@ -135,19 +135,38 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
-
+## Output media
+
+By default the transcription is printed on the terminal, but other output media are supported.
+
+### Clipboard
+
+The most straightforward is the clipboard:
+
+```bash
+scribe --clipboard
+```
+The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+
+### Output file
+
+Alternatively an output file can be indicated:
+
+```bash
+--keyboard -o transcription.txt
+```
 
-
-However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
-That can be achieve with the `--keyboard` option.
+### Virtual keyboard (experimental)
 
-With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the
+With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 
 ```bash
 scribe --keyboard
 ```
 
-
+This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
+
+The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
 #### Use the keyboard with Wayland (default for Ubuntu 24.04)
````
````diff
@@ -164,7 +183,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
 
-
+## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
````
````diff
@@ -179,7 +198,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
 
-
+## Start as an application in GNOME
 
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
````
````diff
@@ -187,7 +206,17 @@ to make it available from the quick launch menu. Any option will be passed on to
 e.g.
 
 ```bash
-scribe-install --backend whisper --model small
+scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
 ```
 
-After that just typing
+After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+
+
+## Fine tuning
+
+There are a number of options to control the silence threshold, duration and more.
+Best is to check the available options in the online help:
+
+```bash
+scribe --help
+```
````
```diff
@@ -41,7 +41,7 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
 The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
 the default is left to the `openai-whisper` package and might change in the future).
 
 
```
```diff
@@ -55,12 +55,12 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model.
+You can interrupt the recording via Ctrl + C and start again or change model.
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
-With the `
-
+With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
 The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
```
````diff
@@ -72,19 +72,38 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
-
+## Output media
+
+By default the transcription is printed on the terminal, but other output media are supported.
+
+### Clipboard
+
+The most straightforward is the clipboard:
+
+```bash
+scribe --clipboard
+```
+The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
 
-
-However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
-That can be achieve with the `--keyboard` option.
+### Output file
 
-
+Alternatively an output file can be indicated:
+
+```bash
+--keyboard -o transcription.txt
+```
+
+### Virtual keyboard (experimental)
+
+With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 
 ```bash
 scribe --keyboard
 ```
 
-
+This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
+
+The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
 #### Use the keyboard with Wayland (default for Ubuntu 24.04)
````
````diff
@@ -101,7 +120,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
 
-
+## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
````
````diff
@@ -116,7 +135,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
 
-
+## Start as an application in GNOME
 
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
````
````diff
@@ -124,7 +143,17 @@ to make it available from the quick launch menu. Any option will be passed on to
 e.g.
 
 ```bash
-scribe-install --backend whisper --model small
+scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
 ```
 
-After that just typing
+After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+
+
+## Fine tuning
+
+There are a number of options to control the silence threshold, duration and more.
+Best is to check the available options in the online help:
+
+```bash
+scribe --help
+```
````
```diff
@@ -1,5 +1,6 @@
 from pathlib import Path
 import tomllib
+import re
 import time
 import argparse
 from scribe.audio import Microphone
```
```diff
@@ -147,7 +148,8 @@ def get_transcriber(o, prompt=True):
 
 elif backend == "whisper":
 transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
-timeout=o.duration, silence_duration=o.silence,
+timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
+restart_after_silence=o.restart_after_silence,
 model_kwargs={"download_root": o.download_folder_whisper})
 
 else:
```
```diff
@@ -176,15 +178,22 @@ def get_parser():
 
 parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
 parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
-
-parser.
-
-
+
+group = parser.add_argument_group("transcription output")
+group.add_argument("-c", "--clipboard", dest="clipboard", action="store_true")
+# group.add_argument("--no-clipboard", dest="clipboard", action="store_false", help=argparse.SUPPRESS)
+group.add_argument("-k", "--keyboard", action="store_true")
+group.add_argument("-o", "--output-file")
+
+group = parser.add_argument_group("keyboard options")
+group.add_argument("--latency", default=0.01, type=float, help="keyboard latency (default %(default)s s)")
+group.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
 
 group = parser.add_argument_group("whisper options")
-group.add_argument("--duration", default=120, type=
-group.add_argument("--silence", default=2, type=float, help="silence duration
-group.add_argument("--
+group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)s s)")
+group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)s s)")
+group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
+group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
 
 parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
 parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
```
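To see how the options added to `get_parser()` parse, here is a minimal standalone sketch. It re-declares only the `+` lines of that hunk on a fresh `argparse.ArgumentParser`; the helper name `build_parser` and `prog="scribe"` are illustrative assumptions, and every other scribe option is omitted, so this is not scribe's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: only the option groups added in scribe 0.8.0."""
    parser = argparse.ArgumentParser(prog="scribe")

    group = parser.add_argument_group("transcription output")
    group.add_argument("-c", "--clipboard", dest="clipboard", action="store_true")
    group.add_argument("-k", "--keyboard", action="store_true")
    group.add_argument("-o", "--output-file")  # stored as o.output_file

    group = parser.add_argument_group("keyboard options")
    group.add_argument("--latency", default=0.01, type=float, help="keyboard latency (default %(default)s s)")
    group.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")

    group = parser.add_argument_group("whisper options")
    group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)s s)")
    group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)s s)")
    group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
    group.add_argument("-a", "--restart-after-silence", action="store_true",
                       help="Restart the recording after a transcription triggered by a silence")
    return parser


if __name__ == "__main__":
    # A negative value such as -40 is accepted as the argument of --silence-db
    # because no option string in this parser looks like a negative number.
    o = build_parser().parse_args(["-c", "-o", "notes.txt", "--silence-db", "-40", "-a"])
    print(o.clipboard, o.output_file, o.silence_db, o.restart_after_silence, o.duration)
```

Because `--output-file` has no explicit `dest`, argparse stores it as `output_file`, the attribute `main()` reads as `o.output_file`. Note also that argparse applies `type=float` only to string defaults, so `--duration` keeps the integer `120` unless it is given on the command line.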
```diff
@@ -193,18 +202,16 @@ def get_parser():
 
 
 # Commencer l'enregistrement
-def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, ascii=False, callback=None, **greetings):
+def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, ascii=False, output_file=None, callback=None, **greetings):
 
 if keyboard:
 from scribe.keyboard import type_text
 print("\nChange focus to target app during transcription.")
 
-
 if clipboard:
 import pyperclip
 print("\nThe full transcription will be copied to clipboard as it becomes available.")
 
-
 fulltext = ""
 
 for result in transcriber.start_recording(micro, **greetings):
```
```diff
@@ -215,6 +222,10 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
 if keyboard:
 type_text(result['text'] + " ", interval=latency, ascii=ascii) # Simulate typing
 
+if output_file:
+with open(output_file, "a") as f:
+f.write(result['text'] + "\n")
+
 if clipboard:
 fulltext += result['text'] + " "
 pyperclip.copy(fulltext.strip())
```
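The new `output_file` branch appends every finalized segment to the file, one segment per line, while the clipboard branch keeps accumulating the full text. The sketch below isolates that fan-out under stated assumptions: `write_outputs` is a hypothetical name, a list of `{"text": ...}` dicts stands in for the results yielded by `transcriber.start_recording(...)`, and the keyboard and `pyperclip` calls are left out so it runs without optional dependencies.

```python
import os
import tempfile


def write_outputs(results, output_file=None):
    """Mimic the 0.8.0 loop body for final results (hypothetical helper).

    Each segment is appended to output_file (if given) on its own line, and the
    joined text that scribe would hand to pyperclip.copy() is returned.
    """
    fulltext = ""
    for result in results:
        if output_file:
            # "a" mode: an existing transcript is extended, never overwritten
            with open(output_file, "a") as f:
                f.write(result["text"] + "\n")
        fulltext += result["text"] + " "
    return fulltext.strip()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "transcription.txt")
    segments = [{"text": "hello world"}, {"text": "second sentence"}]
    print(write_outputs(segments, path))
    with open(path) as f:
        print(f.read())
```

Since the file is opened in append mode, rerunning scribe with the same `-o` path adds to the existing transcript instead of replacing it.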
```diff
@@ -222,9 +233,6 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
 else:
 print_partial(result.get('partial', ''))
 
-if clipboard:
-print("Copied to clipboard.")
-
 if callback:
 callback()
 
```
```diff
@@ -249,7 +257,15 @@ def create_app(micro, transcriber, **kwargs):
 image_recording = Image.alpha_composite(image_recording.convert("RGBA"), image_writing.convert("RGBA"))
 
 def update_icon(icon, force=False):
-if transcriber.recording:
+if transcriber.recording and transcriber.waiting:
+# this is the situation with the whisper backend when the microphone is recording
+# but we wait for the speaker to speak (silence)
+if force or getattr(icon, "_icon_label", None) != None:
+icon.icon = image
+icon._icon_label = None
+icon.update_menu()
+
+elif transcriber.recording:
 if force or getattr(icon, "_icon_label", None) != "recording":
 icon.icon = image_recording
 icon._icon_label = "recording"
```
```diff
@@ -345,22 +361,30 @@ def main(args=None):
 if transcriber is None:
 transcriber = get_transcriber(o, prompt=o.prompt)
 print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
-
-
-
+show_output = ["clipboard", "keyboard", "output_file"]
+show_options = ["ascii", "app"]
+activated_output = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_output if getattr(o, option)]
+activated_options = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_options if getattr(o, option)]
+if activated_output:
+print(f"Output: {' | '.join(activated_output)}")
+else:
+print(colored(f"No output selected -> terminal only", "light_red"))
+if activated_options:
+print(f"Options: {' | '.join(activated_options)}")
 if o.prompt:
 print(f"Choose any of the following actions")
-print(f"{colored('[q]', 'light_yellow')} quit")
 print(f"{colored('[e]', 'light_yellow')} change model")
+print(f"{colored('[f]', 'light_yellow')} output file is {colored(repr(o.output_file), 'light_blue')}")
+print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
+print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
+print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
 if details:
-print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
-print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
-print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
 if o.keyboard:
 print(f"{colored('[latency]', 'light_yellow')} between keystrokes is {colored(o.latency, 'light_blue')} s")
 if transcriber.backend == "whisper":
 print(f"{colored('[t]', 'light_yellow')} change duration (currently {colored(transcriber.timeout, 'light_blue')} s)")
 print(f"{colored('[b]', 'light_yellow')} change silence (currently {colored(transcriber.silence_duration, 'light_blue')} s)")
+print(f"{colored('[db]', 'light_yellow')} change backround noise (currently {colored(transcriber.silence_thresh, 'light_blue')} db)")
 print(f"{colored('[a]', 'light_yellow')} auto-restart after silence is {colored(transcriber.restart_after_silence, 'light_blue')} toggle?")
 exclude_flags = ["keyboard", "clipboard", "app", "prompt", "restart_after_silence"]
 display_flags = [a.dest for a in parser._actions if a.help != argparse.SUPPRESS]
```
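The startup summary added to `main()` prints boolean flags by name and other truthy values as `name=value`, skipping anything falsy. Below is a color-free sketch of that formatting; `describe` is a hypothetical helper, `types.SimpleNamespace` stands in for the parsed `argparse` namespace, and the real code additionally wraps each entry in termcolor's `colored`.

```python
from types import SimpleNamespace


def describe(o, names):
    """Join active options the way the 0.8.0 summary does, minus the colors."""
    parts = []
    for option in names:
        value = getattr(o, option)
        if not value:
            # False, None and "" all count as "not selected"
            continue
        # booleans print as a bare flag name, anything else as name=value
        parts.append(option if type(value) is bool else f"{option}={value}")
    return " | ".join(parts)


if __name__ == "__main__":
    o = SimpleNamespace(clipboard=True, keyboard=False, output_file="transcription.txt",
                        ascii=False, app=True)
    output = describe(o, ["clipboard", "keyboard", "output_file"])
    print(f"Output: {output}" if output else "No output selected -> terminal only")
    print(f"Options: {describe(o, ['ascii', 'app'])}")
```

Using `type(value) is bool` rather than `isinstance` keeps the test strict, but since options such as `output_file` are strings or `None`, both checks would behave the same on this namespace.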
```diff
@@ -368,21 +392,22 @@ def main(args=None):
 if key not in display_flags or key in exclude_flags or not isinstance(value, bool):
 continue
 print(f"{colored(f'[{key}]', 'light_yellow')} is {colored(value, 'light_blue')} toggle?")
-print(f"{colored('[
+print(f"{colored('[-]', 'light_yellow')} hide options")
 else:
-print(f"{colored('[
-
+print(f"{colored('[-]', 'light_yellow')} show more options")
+print(f"{colored('[q]', 'light_yellow')} quit")
 print(colored(f"Press [Enter] to start recording.", attrs=["bold"]))
 
 key = input()
 if key == "q":
 exit(0)
-if key
+if len(key) > 0 and key.strip() in ["", ".", "-", "+", 'o', '\x1b[A', '\x1b[B', '\x1b[C', '\x1b[D']: # arrow keys
 details = not details
 continue
 if key == "e":
 transcriber = None
 o.model = None
+o.dummy = False
 o.backend = None
 o.language = None
 continue
```
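The rewritten condition that shows or hides the extra options reads more easily as a predicate. In this sketch, `toggles_details` and `TOGGLE_KEYS` are hypothetical names and the key list is copied from the `+` line. A bare Enter yields an empty string, which fails `len(key) > 0` and so falls through to start the recording, whereas a line of spaces strips to `""` and does toggle.

```python
# Answers that flip the "details" view, copied from the 0.8.0 condition.
# The "\x1b[..." entries are the escape sequences the source comments as arrow keys.
TOGGLE_KEYS = ["", ".", "-", "+", "o", "\x1b[A", "\x1b[B", "\x1b[C", "\x1b[D"]


def toggles_details(key: str) -> bool:
    """Return True if the prompt answer should show/hide the extra options."""
    return len(key) > 0 and key.strip() in TOGGLE_KEYS


if __name__ == "__main__":
    for answer in ["", "   ", "-", "o", "\x1b[A", "q", "e"]:
        print(repr(answer), toggles_details(answer))
```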
```diff
@@ -401,9 +426,9 @@ def main(args=None):
 if key == "t":
 ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
 try:
-o.duration = transcriber.timeout =
+o.duration = transcriber.timeout = float(ans)
 except:
-print("Invalid duration. Must be
+print("Invalid duration. Must be a float.")
 continue
 if key == "latency":
 ans = input(f"Enter new keyboard latency in seconds (current: {o.latency}): ")
```
```diff
@@ -415,22 +440,38 @@ def main(args=None):
 if key == "b":
 ans = input(f"Enter new silence break duration in seconds (current: {transcriber.silence_duration}): ")
 try:
-o.silence = transcriber.silence_duration =
+o.silence = transcriber.silence_duration = float(ans)
 except:
-print("Invalid duration. Must be
+print("Invalid duration. Must be a float.")
+continue
+if key == "db":
+ans = input(f"Enter new background noise threshold to detect silence (current: {transcriber.silence_thresh}): ")
+try:
+o.silence_db = transcriber.silence_thresh = float(ans)
+except:
+print("Invalid duration. Must be a float.")
+continue
+if key == "f":
+ans = input(f"Enter output file (current: {o.output_file}): ")
+invalid_regex = re.compile(r'[^A-Za-z0-9_\-\\\/\.]')
+if not invalid_regex.search(ans):
+o.output_file = ans
+else:
+print(f"Invalid characters: {' '.join(map(repr, invalid_regex.findall(ans)))}")
+print(f"Invalid file name: {repr(ans)}")
 continue
 if key:
 if hasattr(o, key) and isinstance(getattr(o, key), bool):
 setattr(o, key, not getattr(o, key))
 print(f"Toggle {key} to [{getattr(o, key)}].")
-print(f"Invalid choice: {key}
+print(f"Invalid choice: {repr(key)}")
 continue
 
 if o.app:
 greetings = dict(
 start_message = "Listening... Use the try icon menu to stop.",
 )
-app = create_app(micro, transcriber, clipboard=o.clipboard,
+app = create_app(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
 keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
 print("Starting app...")
 app.run()
```
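The `[f]` prompt accepts a file name only if every character is in a small whitelist: ASCII letters, digits, `_`, `-`, `\`, `/` and `.`. This sketch wraps the same regular expression in a hypothetical helper, `check_output_file`, which returns the accepted name or `None` instead of assigning `o.output_file`.

```python
import re

# Same pattern as the 0.8.0 [f] prompt: any character outside the whitelist is invalid
invalid_regex = re.compile(r'[^A-Za-z0-9_\-\\\/\.]')


def check_output_file(ans):
    """Return ans if it only uses whitelisted characters, otherwise report and return None."""
    if not invalid_regex.search(ans):
        return ans
    print(f"Invalid characters: {' '.join(map(repr, invalid_regex.findall(ans)))}")
    print(f"Invalid file name: {repr(ans)}")
    return None


if __name__ == "__main__":
    for name in ["notes/transcription.txt", "my notes.txt", "~/notes.txt", ""]:
        print(repr(name), "->", repr(check_output_file(name)))
```

With this whitelist, names containing spaces, `~`, `:` or non-ASCII letters are rejected, while an empty answer passes. Because `start_recording` only writes when `output_file` is truthy, entering nothing at the `[f]` prompt effectively switches file output off.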
```diff
@@ -438,7 +479,7 @@ def main(args=None):
 greetings = dict(
 start_message = "Listening... Press Ctrl+C to stop.",
 )
-start_recording(micro, transcriber, clipboard=o.clipboard,
+start_recording(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
 keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
 
 # if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
```
```diff
@@ -34,6 +34,7 @@ class AbstractTranscriber:
 self.restart_after_silence = restart_after_silence
 self.recording = False
 self.busy = False
+self.waiting = False
 self.reset()
 
 def get_elapsed(self):
```
```diff
@@ -52,7 +53,6 @@ class AbstractTranscriber:
 def reset(self):
 self.audio_buffer = b''
 self.start_time = time.time()
-self.last_sound_time = time.time()
 
 def start_recording(self, microphone,
 start_message="Recording... Press Ctrl+C to stop.",
```
```diff
@@ -60,7 +60,13 @@ class AbstractTranscriber:
 
 self.reset()
 self.recording = True
+self.waiting = True
 self.busy = True
+if self.silence_duration is not None:
+self.last_sound_time = time.time() - self.silence_duration
+else:
+self.last_sound_time = time.time()
+previous_waiting = self.waiting
 
 try:
 
```
```diff
@@ -75,19 +81,31 @@ class AbstractTranscriber:
 if is_silent(data, self.silence_thresh):
 silence_duration = time.time() - self.last_sound_time
 
-
+previous_waiting = self.waiting
+self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
+
+if self.waiting and len(self.audio_buffer) > 0:
 if self.restart_after_silence:
+self.recording = False # for the system tray icon
 result = self.finalize()
 microphone.q.queue.clear()
 self.reset()
 yield result
+self.recording = True # for the system tray icon
 else:
 raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
 
 else:
 self.last_sound_time = time.time()
+self.waiting = False
 
-
+# don't accumulate very long silences
+if not self.waiting:
+yield self.transcribe_realtime_audio(data)
+
+else:
+if not previous_waiting:
+print("Silence detected...waiting for more audio")
 
 if self.is_overtime():
 raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
```
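Taken together, the `models.py` hunks turn silence handling into a small state machine: recording starts in the `waiting` state (by back-dating `last_sound_time` by `silence_duration`), silent chunks stop being accumulated once the pause is long enough, and a long pause after speech finalizes the buffered audio. The following is a simplified, audio-free model of that logic, not scribe's code. `SilenceGate`, `start`, `feed` and the `(now, silent, chunk)` inputs are hypothetical; the `silent` flag stands in for `is_silent(data, self.silence_thresh)`, and appending to a list stands in for `transcribe_realtime_audio(data)`, which is assumed here to buffer the chunk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SilenceGate:
    """Audio-free model of the 0.8.0 `waiting` logic (hypothetical class).

    Models the restart_after_silence=True path: a long pause after speech
    returns the buffered segment instead of raising StopRecording.
    """
    silence_duration: Optional[float] = 2.0
    last_sound_time: float = 0.0
    waiting: bool = False
    buffer: List[str] = field(default_factory=list)

    def start(self, now: float) -> None:
        # Back-date the last sound so the gate starts in the waiting state
        self.waiting = True
        self.buffer = []
        if self.silence_duration is not None:
            self.last_sound_time = now - self.silence_duration
        else:
            self.last_sound_time = now

    def feed(self, now: float, silent: bool, chunk: str) -> Optional[List[str]]:
        """Process one chunk; return a finished segment when a pause ends an utterance."""
        if silent:
            pause = now - self.last_sound_time
            self.waiting = self.silence_duration is not None and pause >= self.silence_duration
            if self.waiting and self.buffer:
                # finalize() + reset(): hand over the segment, start a fresh buffer
                segment, self.buffer = self.buffer, []
                return segment
        else:
            self.last_sound_time = now
            self.waiting = False
        if not self.waiting:
            # sound, or a short pause inside speech: keep the chunk
            self.buffer.append(chunk)
        return None


if __name__ == "__main__":
    gate = SilenceGate(silence_duration=2.0)
    gate.start(0.0)
    frames = [(0.5, True, "s0"), (1.0, False, "a"), (1.5, False, "b"),
              (2.0, True, "c"), (3.6, True, "d"), (4.0, True, "e")]
    for now, silent, chunk in frames:
        segment = gate.feed(now, silent, chunk)
        if segment:
            print(f"t={now}: segment {segment}")
```

With `restart_after_silence` off, the real generator raises `StopRecording` at the point where this sketch returns the segment. Note that `reset()` no longer touches `last_sound_time` in 0.8.0, which is why the gate stays in the waiting state after a segment is handed over until sound is detected again.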
```diff
@@ -98,6 +116,7 @@ class AbstractTranscriber:
 pass
 
 finally:
+self.waiting = False
 self.recording = False
 result = self.finalize()
 microphone.q.queue.clear()
```
```diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.
+Version: 0.8.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
```
@@ -104,7 +104,7 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
 The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
 the default is left to the `openai-whisper` package and might change in the future).
 
 
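The README's rule for the default download folder can be expressed as a small path resolver. This is a hypothetical helper (`default_download_dir` is not part of scribe) that follows the stated rule, `$XDG_CACHE_HOME/{backend}` with `$XDG_CACHE_HOME` falling back to `$HOME/.cache`. Treating an empty `XDG_CACHE_HOME` as unset is an assumption borrowed from the XDG Base Directory convention. Per the README, the rule applies to `vosk`; for `whisper` the location is left to the `openai-whisper` package.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_download_dir(backend: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: resolve $XDG_CACHE_HOME/{backend} as described in the README.

    Falls back to $HOME/.cache when XDG_CACHE_HOME is unset or empty.
    `env` defaults to the process environment; pass a dict to test other setups.
    """
    env = os.environ if env is None else env
    cache_home = env.get("XDG_CACHE_HOME") or os.path.join(
        env.get("HOME", str(Path.home())), ".cache"
    )
    return Path(cache_home) / backend


if __name__ == "__main__":
    print(default_download_dir("vosk"))
```

Passing the environment as an optional mapping keeps the function pure enough to check both branches without modifying `os.environ`.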
@@ -118,12 +118,12 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model.
+You can interrupt the recording via Ctrl + C and start again or change model.
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
-With the `
-
+With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
 The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
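The README says recording stops after a 2-second silence, and the transcriber code in this diff decides silence per chunk with `is_silent(data, self.silence_thresh)`, whose implementation is not shown here. A common approach, assumed in this sketch, is to compare the RMS amplitude of the chunk with a threshold. `is_silent_rms` is a hypothetical name. The input format (16-bit signed little-endian mono PCM) and the threshold units are assumptions, not scribe's documented behavior.

```python
import math
import sys
from array import array


def is_silent_rms(data: bytes, threshold: float) -> bool:
    """Hypothetical silence test: True if the RMS amplitude of `data` is below `threshold`.

    Assumes `data` holds 16-bit signed little-endian mono PCM samples.
    """
    samples = array("h")
    samples.frombytes(data[: len(data) - len(data) % 2])  # drop a trailing odd byte, if any
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native order; the input is assumed little-endian
    if len(samples) == 0:
        return True  # an empty chunk carries no sound
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold


if __name__ == "__main__":
    quiet = b"".join(v.to_bytes(2, "little", signed=True) for v in (10, -10, 10, -10))
    loud = b"".join(v.to_bytes(2, "little", signed=True) for v in (3000, -3000, 3000, -3000))
    print(is_silent_rms(quiet, 500), is_silent_rms(loud, 500))  # prints: True False
```

A silence duration like the 2 seconds mentioned above would then be measured as the time elapsed since the last chunk for which such a test returned `False`.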
@@ -135,19 +135,38 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
-
+## Output media
+
+By default the transcription is printed on the terminal, but other output media are supported.
+
+### Clipboard
+
+The most straightforward is the clipboard:
+
+```bash
+scribe --clipboard
+```
+The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+
+### Output file
+
+Alternatively an output file can be indicated:
+
+```bash
+--keyboard -o transcription.txt
+```
 
-
-However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
-That can be achieve with the `--keyboard` option.
+### Virtual keyboard (experimental)
 
-With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the
+With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 
 ```bash
 scribe --keyboard
 ```
 
-
+This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
+
+The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
 #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -164,7 +183,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
 
-
+## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -179,7 +198,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
 
-
+## Start as an application in GNOME
 
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -187,7 +206,17 @@ to make it available from the quick launch menu. Any option will be passed on to
 e.g.
 
 ```bash
-scribe-install --backend whisper --model small
+scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
 ```
 
-After that just typing
+After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+
+
+## Fine tuning
+
+There are a number of options to control the silence threshold, duration and more.
+Best is to check the available options in the online help:
+
+```bash
+scribe --help
+```