scribe-cli 0.12.0__tar.gz → 0.12.2__tar.gz
- {scribe_cli-0.12.0/scribe_cli.egg-info → scribe_cli-0.12.2}/PKG-INFO +77 -28
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/README.md +71 -26
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/pyproject.toml +5 -1
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/_version.py +2 -2
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/app.py +41 -7
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/models.py +47 -30
- {scribe_cli-0.12.0 → scribe_cli-0.12.2/scribe_cli.egg-info}/PKG-INFO +77 -28
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe_cli.egg-info/SOURCES.txt +2 -1
- scribe_cli-0.12.2/scripts/test_python_versions_install.sh +20 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/.github/workflows/pypi.yml +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/.gitignore +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/LICENSE +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/icon.xcf +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/__init__.py +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/audio.py +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/install_desktop.py +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/keyboard.py +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/models.toml +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/saverecording.py +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/testpynput.py +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/util.py +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe_cli.egg-info/dependency_links.txt +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe_cli.egg-info/entry_points.txt +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe_cli.egg-info/requires.txt +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe_cli.egg-info/top_level.txt +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe_data/__init__.py +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe_data/share/icon.png +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe_data/share/icon_recording.png +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe_data/share/icon_writing.png +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe_data/templates/scribe.desktop +0 -0
- {scribe_cli-0.12.0 → scribe_cli-0.12.2}/setup.cfg +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.2
|
|
2
2
|
Name: scribe-cli
|
|
3
|
-
Version: 0.12.0
|
|
3
|
+
Version: 0.12.2
|
|
4
4
|
Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
|
|
5
5
|
Author-email: Mahé Perrette <mahe.perrette@gmail.com>
|
|
6
6
|
License: MIT License
|
|
@@ -34,7 +34,11 @@ License: MIT License
|
|
|
34
34
|
ensure compliance with their respective terms.
|
|
35
35
|
Project-URL: Homepage, https://github.com/perrette/scribe
|
|
36
36
|
Keywords: speech recognition,transcription,AI,language,vosk,whisper,openai,keyboard,clipboard
|
|
37
|
-
Classifier: Programming Language :: Python :: 3
|
|
37
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
38
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
39
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
40
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
41
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
38
42
|
Classifier: Operating System :: OS Independent
|
|
39
43
|
Requires-Python: >=3.9
|
|
40
44
|
Description-Content-Type: text/markdown
|
|
@@ -66,8 +70,8 @@ Requires-Dist: soundfile; extra == "all"
|
|
|
66
70
|
Requires-Dist: vosk; extra == "all"
|
|
67
71
|
Requires-Dist: pystray; extra == "all"
|
|
68
72
|
|
|
69
|
-
[]()
|
|
70
73
|
[](https://pypi.org/project/scribe-cli)
|
|
74
|
+

|
|
71
75
|
|
|
72
76
|
# Scribe <img src="scribe_data/share/icon.png" width=48px>
|
|
73
77
|
|
|
@@ -77,22 +81,29 @@ It features local, downloadable models with the `vosk` and `whisper` backends, a
|
|
|
77
81
|
|
|
78
82
|
## Compatibility
|
|
79
83
|
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
84
|
+
The package was initially developed for python 3.12 on Ubuntu 24.04 with GNOME + Wayland, but it should work on other platforms as well (feedback welcome).
|
|
85
|
+
For more information, check the documentation of the dependencies (e.g. pynput for the keyboard, pystray for the app).
|
|
86
|
+
|
|
87
|
+
- python 3.13:
|
|
88
|
+
- at the time of writing, `openai-whisper` does not install.
|
|
89
|
+
|
|
90
|
+
- Ubuntu:
|
|
91
|
+
- see caveats in the use of the keyboard under Wayland [keyboard section](#use-the-keyboard-with-wayland).
|
|
92
|
+
- MacOS:
|
|
93
|
+
- tested on a MacBook Air M1 with 8 GB RAM and python 3.12. It runs, but poorly, presumably because of the low memory: prefer the `openaiapi` backend for such machines
|
|
94
|
+
- I expect the local models to run fine on machines with more memory
|
|
95
|
+
- Windows:
|
|
96
|
+
- not tested yet
|
|
86
97
|
|
|
87
98
|
## Installation
|
|
88
99
|
|
|
89
|
-
Install PortAudio library and xclip library. E.g. on Ubuntu:
|
|
100
|
+
Install PortAudio library (required by `sounddevice`) and xclip library (required by `pyperclip`). E.g. on Ubuntu:
|
|
90
101
|
|
|
91
102
|
```bash
|
|
92
103
|
sudo apt-get install portaudio19-dev xclip
|
|
93
104
|
```
|
|
94
105
|
|
|
95
|
-
See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
|
|
106
|
+
See additional requirements for the [icon tray](#system-tray-icon-experimental-) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
|
|
96
107
|
|
|
97
108
|
```bash
|
|
98
109
|
pip install scribe-cli[all]
|
|
@@ -110,6 +121,37 @@ pip install -e .[all]
|
|
|
110
121
|
|
|
111
122
|
You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
|
|
112
123
|
|
|
124
|
+
At the time of writing `openai-whisper` does not install on `python 3.13`. You can install the packages manually and skip that package. This makes the `whisper` API unavailable.
|
|
125
|
+
|
|
126
|
+
### Manual selection of the dependencies
|
|
127
|
+
|
|
128
|
+
```bash
|
|
129
|
+
# language models (at least one must be installed !)
|
|
130
|
+
pip install vosk
|
|
131
|
+
pip install openai soundfile # openaiapi
|
|
132
|
+
pip install openai-whisper # FAILS IN PYTHON 3.13 on Ubuntu
|
|
133
|
+
|
|
134
|
+
# PortAUDIO (sounddevice)
|
|
135
|
+
pip install sounddevice # automatically installed as required dependency
|
|
136
|
+
sudo apt-get install portaudio19-dev
|
|
137
|
+
|
|
138
|
+
# clipboard
|
|
139
|
+
pip install pyperclip # automatically installed as required dependency
|
|
140
|
+
sudo apt-get install xclip
|
|
141
|
+
|
|
142
|
+
# keyboard
|
|
143
|
+
pip install pynput
|
|
144
|
+
|
|
145
|
+
# app mode
|
|
146
|
+
# The two lines below are required on Ubuntu!
|
|
147
|
+
sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1 # Ubuntu ONLY (not needed on MacOS)
|
|
148
|
+
pip install PyGObject # Ubuntu ONLY (not needed on MacOS)
|
|
149
|
+
pip install pystray
|
|
150
|
+
|
|
151
|
+
# And finally
|
|
152
|
+
pip install scribe-cli
|
|
153
|
+
```
|
|
154
|
+
|
|
113
155
|
The language models for local backends `vosk` and `whisper` will download on-the-fly.
|
|
114
156
|
The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
|
|
115
157
|
|
|
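The default download folder described in the hunk above can be resolved in Python roughly as follows. This is a minimal sketch: the helper name `default_model_folder` and its `environ` parameter are hypothetical and only serve the illustration; the fallback logic follows the `$XDG_CACHE_HOME` / `$HOME/.cache` rule stated in the README.

```python
import os


def default_model_folder(backend, environ=None):
    """Return `$XDG_CACHE_HOME/{backend}`, falling back to `$HOME/.cache/{backend}`.

    Hypothetical helper for illustration only; `environ` lets callers pass a
    custom mapping instead of the real process environment.
    """
    environ = os.environ if environ is None else environ
    home = environ.get("HOME", os.path.expanduser("~"))
    cache_home = environ.get("XDG_CACHE_HOME", os.path.join(home, ".cache"))
    return os.path.join(cache_home, backend)


# Prints the folder that would hold the vosk models on this machine.
print(default_model_folder("vosk"))
```

Setting `XDG_CACHE_HOME` before starting the program is therefore enough to move the downloaded models to another disk.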
@@ -134,11 +176,12 @@ The `vosk` backend is much faster and very good at doing real-time transcription
|
|
|
134
176
|
It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
|
|
135
177
|
There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
|
|
136
178
|
|
|
137
|
-
The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
|
|
179
|
+
The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key best passed as an environment variable, e.g. in bash:
|
|
138
180
|
```bash
|
|
139
|
-
|
|
181
|
+
export OPENAI_API_KEY=YOURAPIKEY
|
|
182
|
+
scribe --backend openaiapi
|
|
140
183
|
```
|
|
141
|
-
|
|
184
|
+
The `openaiapi` backend is lightweight and handy if you have an API key (you can create one for free for testing) and a low-spec computer (and don't care too much about privacy, obviously).
|
|
142
185
|
|
|
143
186
|
## Output media
|
|
144
187
|
|
|
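The `OPENAI_API_KEY` environment variable from the hunk above can be checked from Python before starting a session. The sketch below is an assumption about how one might validate the variable; `read_openai_api_key` is a hypothetical helper, and how `scribe` itself reads the key is not shown in this diff.

```python
import os


def read_openai_api_key(environ=None):
    """Hypothetical helper: return the key from OPENAI_API_KEY or fail clearly."""
    environ = os.environ if environ is None else environ
    # Treat a missing variable and a blank one the same way.
    key = environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running "
            "`scribe --backend openaiapi`"
        )
    return key
```

Failing early with an explicit message is easier to debug than an authentication error raised in the middle of a recording.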
@@ -174,9 +217,9 @@ This can be extremely useful with the `vosk` backend and its realtime transcript
|
|
|
174
217
|
The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
|
|
175
218
|
Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
|
|
176
219
|
|
|
177
|
-
#### Use the keyboard with Wayland
|
|
220
|
+
#### Use the keyboard with Wayland
|
|
178
221
|
|
|
179
|
-
In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
|
|
222
|
+
In my Ubuntu 24.04 + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
|
|
180
223
|
|
|
181
224
|
One workaround is to use the Xorg version of GNOME: in `/etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
|
|
182
225
|
|
|
@@ -190,40 +233,46 @@ You're on the right path :)
|
|
|
190
233
|
|
|
191
234
|
## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
|
|
192
235
|
|
|
236
|
+
<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
|
|
237
|
+
|
|
193
238
|
To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
|
|
194
239
|
To activate start with:
|
|
195
240
|
```bash
|
|
196
|
-
scribe --app
|
|
241
|
+
scribe --app ...
|
|
197
242
|
```
|
|
198
243
|
or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
|
|
199
244
|
of predefined models (controlled by `--vosk-models` and `--whisper-models`) and options, or to Quit and choose from the terminal before pressing Enter again.
|
|
200
245
|
For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording/waiting, transcribing and idle.
|
|
201
|
-
That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option.
|
|
246
|
+
That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option.
|
|
247
|
+
|
|
248
|
+
The `--vosk-models` and `--whisper-models` options predefine the set of models available to choose from in the app menu. E.g.
|
|
249
|
+
```bash
|
|
250
|
+
scribe --app --vosk-models vosk-model-fr-0.22 --whisper-models small turbo ...
|
|
251
|
+
```
|
|
252
|
+
|
|
253
|
+
### Ubuntu
|
|
254
|
+
|
|
255
|
+
In Ubuntu the following dependencies were required to make the menus appear:
|
|
202
256
|
|
|
203
257
|
```bash
|
|
204
258
|
sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
|
|
205
259
|
pip install PyGObject
|
|
206
260
|
```
|
|
207
261
|
|
|
208
|
-
<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
|
|
209
|
-
|
|
210
262
|
## Start as an application in GNOME
|
|
211
263
|
|
|
212
264
|
If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
|
|
213
265
|
to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
|
|
214
266
|
`--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
|
|
215
267
|
|
|
216
|
-
|
|
217
|
-
|
|
268
|
+
Consider the following two flavors:
|
|
218
269
|
```bash
|
|
219
|
-
scribe-install --clipboard
|
|
270
|
+
scribe-install --clipboard ...
|
|
271
|
+
scribe-install --name "Scribe App" --no-terminal --clipboard ...
|
|
220
272
|
```
|
|
221
|
-
(
|
|
273
|
+
The first will create an app named Scribe (the default) that simply opens a terminal and executes the command `scribe --clipboard ...`.
|
|
274
|
+
The second will create an app named Scribe App that executes in a hidden terminal: `scribe --no-prompt --app --clipboard ...`, thus leaving the tray icon as the only mode of interaction.
|
|
222
275
|
|
|
223
|
-
It is also possible to run an app fully outside the terminal:
|
|
224
|
-
```bash
|
|
225
|
-
scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY --vosk-models vosk-model-fr-0.22 --whisper-models small turbo
|
|
226
|
-
```
|
|
227
276
|
|
|
228
277
|
## Fine tuning
|
|
229
278
|
|
|
@@ -1,5 +1,5 @@
|
|
|
1
|
-
[]()
|
|
2
1
|
[](https://pypi.org/project/scribe-cli)
|
|
2
|
+

|
|
3
3
|
|
|
4
4
|
# Scribe <img src="scribe_data/share/icon.png" width=48px>
|
|
5
5
|
|
|
@@ -9,22 +9,29 @@ It features local, downloadable models with the `vosk` and `whisper` backends, a
|
|
|
9
9
|
|
|
10
10
|
## Compatibility
|
|
11
11
|
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
12
|
+
The package was initially developed for python 3.12 on Ubuntu 24.04 with GNOME + Wayland, but it should work on other platforms as well (feedback welcome).
|
|
13
|
+
For more information, check the documentation of the dependencies (e.g. pynput for the keyboard, pystray for the app).
|
|
14
|
+
|
|
15
|
+
- python 3.13:
|
|
16
|
+
- at the time of writing, `openai-whisper` does not install.
|
|
17
|
+
|
|
18
|
+
- Ubuntu:
|
|
19
|
+
- see caveats in the use of the keyboard under Wayland [keyboard section](#use-the-keyboard-with-wayland).
|
|
20
|
+
- MacOS:
|
|
21
|
+
- tested on a MacBook Air M1 with 8 GB RAM and python 3.12. It runs, but poorly, presumably because of the low memory: prefer the `openaiapi` backend for such machines
|
|
22
|
+
- I expect the local models to run fine on machines with more memory
|
|
23
|
+
- Windows:
|
|
24
|
+
- not tested yet
|
|
18
25
|
|
|
19
26
|
## Installation
|
|
20
27
|
|
|
21
|
-
Install PortAudio library and xclip library. E.g. on Ubuntu:
|
|
28
|
+
Install PortAudio library (required by `sounddevice`) and xclip library (required by `pyperclip`). E.g. on Ubuntu:
|
|
22
29
|
|
|
23
30
|
```bash
|
|
24
31
|
sudo apt-get install portaudio19-dev xclip
|
|
25
32
|
```
|
|
26
33
|
|
|
27
|
-
See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
|
|
34
|
+
See additional requirements for the [icon tray](#system-tray-icon-experimental-) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
|
|
28
35
|
|
|
29
36
|
```bash
|
|
30
37
|
pip install scribe-cli[all]
|
|
@@ -42,6 +49,37 @@ pip install -e .[all]
|
|
|
42
49
|
|
|
43
50
|
You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
|
|
44
51
|
|
|
52
|
+
At the time of writing `openai-whisper` does not install on `python 3.13`. You can install the packages manually and skip that package. This makes the `whisper` API unavailable.
|
|
53
|
+
|
|
54
|
+
### Manual selection of the dependencies
|
|
55
|
+
|
|
56
|
+
```bash
|
|
57
|
+
# language models (at least one must be installed !)
|
|
58
|
+
pip install vosk
|
|
59
|
+
pip install openai soundfile # openaiapi
|
|
60
|
+
pip install openai-whisper # FAILS IN PYTHON 3.13 on Ubuntu
|
|
61
|
+
|
|
62
|
+
# PortAUDIO (sounddevice)
|
|
63
|
+
pip install sounddevice # automatically installed as required dependency
|
|
64
|
+
sudo apt-get install portaudio19-dev
|
|
65
|
+
|
|
66
|
+
# clipboard
|
|
67
|
+
pip install pyperclip # automatically installed as required dependency
|
|
68
|
+
sudo apt-get install xclip
|
|
69
|
+
|
|
70
|
+
# keyboard
|
|
71
|
+
pip install pynput
|
|
72
|
+
|
|
73
|
+
# app mode
|
|
74
|
+
# The two lines below are required on Ubuntu!
|
|
75
|
+
sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1 # Ubuntu ONLY (not needed on MacOS)
|
|
76
|
+
pip install PyGObject # Ubuntu ONLY (not needed on MacOS)
|
|
77
|
+
pip install pystray
|
|
78
|
+
|
|
79
|
+
# And finally
|
|
80
|
+
pip install scribe-cli
|
|
81
|
+
```
|
|
82
|
+
|
|
45
83
|
The language models for local backends `vosk` and `whisper` will download on-the-fly.
|
|
46
84
|
The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
|
|
47
85
|
|
|
@@ -66,11 +104,12 @@ The `vosk` backend is much faster and very good at doing real-time transcription
|
|
|
66
104
|
It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
|
|
67
105
|
There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
|
|
68
106
|
|
|
69
|
-
The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
|
|
107
|
+
The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key best passed as an environment variable, e.g. in bash:
|
|
70
108
|
```bash
|
|
71
|
-
|
|
109
|
+
export OPENAI_API_KEY=YOURAPIKEY
|
|
110
|
+
scribe --backend openaiapi
|
|
72
111
|
```
|
|
73
|
-
|
|
112
|
+
The `openaiapi` backend is lightweight and handy if you have an API key (you can create one for free for testing) and a low-spec computer (and don't care too much about privacy, obviously).
|
|
74
113
|
|
|
75
114
|
## Output media
|
|
76
115
|
|
|
@@ -106,9 +145,9 @@ This can be extremely useful with the `vosk` backend and its realtime transcript
|
|
|
106
145
|
The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
|
|
107
146
|
Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
|
|
108
147
|
|
|
109
|
-
#### Use the keyboard with Wayland
|
|
148
|
+
#### Use the keyboard with Wayland
|
|
110
149
|
|
|
111
|
-
In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
|
|
150
|
+
In my Ubuntu 24.04 + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
|
|
112
151
|
|
|
113
152
|
One workaround is to use the Xorg version of GNOME: in `/etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
|
|
114
153
|
|
|
@@ -122,40 +161,46 @@ You're on the right path :)
|
|
|
122
161
|
|
|
123
162
|
## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
|
|
124
163
|
|
|
164
|
+
<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
|
|
165
|
+
|
|
125
166
|
To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
|
|
126
167
|
To activate start with:
|
|
127
168
|
```bash
|
|
128
|
-
scribe --app
|
|
169
|
+
scribe --app ...
|
|
129
170
|
```
|
|
130
171
|
or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
|
|
131
172
|
of predefined models (controlled by `--vosk-models` and `--whisper-models`) and options, or to Quit and choose from the terminal before pressing Enter again.
|
|
132
173
|
For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording/waiting, transcribing and idle.
|
|
133
|
-
That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option.
|
|
174
|
+
That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option.
|
|
175
|
+
|
|
176
|
+
The `--vosk-models` and `--whisper-models` options predefine the set of models available to choose from in the app menu. E.g.
|
|
177
|
+
```bash
|
|
178
|
+
scribe --app --vosk-models vosk-model-fr-0.22 --whisper-models small turbo ...
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
### Ubuntu
|
|
182
|
+
|
|
183
|
+
In Ubuntu the following dependencies were required to make the menus appear:
|
|
134
184
|
|
|
135
185
|
```bash
|
|
136
186
|
sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
|
|
137
187
|
pip install PyGObject
|
|
138
188
|
```
|
|
139
189
|
|
|
140
|
-
<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
|
|
141
|
-
|
|
142
190
|
## Start as an application in GNOME
|
|
143
191
|
|
|
144
192
|
If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
|
|
145
193
|
to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
|
|
146
194
|
`--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
|
|
147
195
|
|
|
148
|
-
|
|
149
|
-
|
|
196
|
+
Consider the following two flavors:
|
|
150
197
|
```bash
|
|
151
|
-
scribe-install --clipboard
|
|
198
|
+
scribe-install --clipboard ...
|
|
199
|
+
scribe-install --name "Scribe App" --no-terminal --clipboard ...
|
|
152
200
|
```
|
|
153
|
-
(
|
|
201
|
+
The first will create an app named Scribe (the default) that simply opens a terminal and executes the command `scribe --clipboard ...`.
|
|
202
|
+
The second will create an app named Scribe App that executes in a hidden terminal: `scribe --no-prompt --app --clipboard ...`, thus leaving the tray icon as the only mode of interaction.
|
|
154
203
|
|
|
155
|
-
It is also possible to run an app fully outside the terminal:
|
|
156
|
-
```bash
|
|
157
|
-
scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY --vosk-models vosk-model-fr-0.22 --whisper-models small turbo
|
|
158
|
-
```
|
|
159
204
|
|
|
160
205
|
## Fine tuning
|
|
161
206
|
|
|
@@ -23,7 +23,11 @@ dependencies = [
|
|
|
23
23
|
]
|
|
24
24
|
|
|
25
25
|
classifiers = [
|
|
26
|
-
"Programming Language :: Python :: 3",
|
|
26
|
+
"Programming Language :: Python :: 3.9",
|
|
27
|
+
"Programming Language :: Python :: 3.10",
|
|
28
|
+
"Programming Language :: Python :: 3.11",
|
|
29
|
+
"Programming Language :: Python :: 3.12",
|
|
30
|
+
"Programming Language :: Python :: 3.13",
|
|
27
31
|
"Operating System :: OS Independent",
|
|
28
32
|
]
|
|
29
33
|
|
|
@@ -3,6 +3,7 @@ import tomllib
|
|
|
3
3
|
import re
|
|
4
4
|
import time
|
|
5
5
|
import argparse
|
|
6
|
+
from typing import Iterable
|
|
6
7
|
from scribe.audio import Microphone
|
|
7
8
|
from scribe.util import print_partial, clear_line, prompt_choices, ansi_link, colored
|
|
8
9
|
from scribe.models import VoskTranscriber, WhisperTranscriber, OpenaiAPITranscriber
|
|
@@ -255,7 +256,7 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
|
|
|
255
256
|
callback()
|
|
256
257
|
|
|
257
258
|
|
|
258
|
-
def create_app(micro, transcriber, other_transcribers=None, **kwargs):
|
|
259
|
+
def create_app(micro, transcriber, other_transcribers=None, transcriber_options=[], **kwargs):
|
|
259
260
|
import pystray
|
|
260
261
|
from pystray import Menu as pystrayMenu, MenuItem as Item
|
|
261
262
|
from PIL import Image
|
|
@@ -344,6 +345,9 @@ def create_app(micro, transcriber, other_transcribers=None, **kwargs):
|
|
|
344
345
|
|
|
345
346
|
def callback_set_model(icon, item):
|
|
346
347
|
transcriber = icon._transcriber
|
|
348
|
+
if transcriber.model_name == str(item):
|
|
349
|
+
transcriber.log(f"Already using model {str(item)}")
|
|
350
|
+
return
|
|
347
351
|
callback_stop_recording(icon, item)
|
|
348
352
|
model_name = str(item)
|
|
349
353
|
meta = other_transcribers_dict[model_name]
|
|
@@ -356,7 +360,23 @@ def create_app(micro, transcriber, other_transcribers=None, **kwargs):
|
|
|
356
360
|
|
|
357
361
|
def callback_toggle_option(icon, item):
|
|
358
362
|
callback_stop_recording(icon, item)
|
|
359
|
-
|
|
363
|
+
if str(item) in transcriber_options:
|
|
364
|
+
# toggle the option on the current transcriber
|
|
365
|
+
if str(item) in icon._transcriber._frozen_options or type(getattr(icon._transcriber, str(item), None)) is not bool:
|
|
366
|
+
print("Skipped setting option", item)
|
|
367
|
+
return
|
|
368
|
+
newvalue = not getattr(icon._transcriber, str(item))
|
|
369
|
+
setattr(icon._transcriber, str(item), newvalue)
|
|
370
|
+
# set the option on the other transcribers as well
|
|
371
|
+
if other_transcribers:
|
|
372
|
+
for name in other_transcribers_dict:
|
|
373
|
+
meta = other_transcribers_dict[name]
|
|
374
|
+
if str(item) in meta:
|
|
375
|
+
meta[str(item)] = newvalue
|
|
376
|
+
|
|
377
|
+
else:
|
|
378
|
+
kwargs[str(item)] = not kwargs[str(item)]
|
|
379
|
+
print("Option set [", item, "] to", kwargs[str(item)])
|
|
360
380
|
|
|
361
381
|
def is_model_selection(item):
|
|
362
382
|
return icon._model_selection
|
|
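The new branch of `callback_toggle_option` in the hunk above flips a boolean option on the current transcriber unless that option is frozen or is not a boolean. A minimal, GUI-free sketch of that rule follows. The classes `DemoTranscriber` and `DemoFrozenTranscriber` and the function `toggle_option` are hypothetical stand-ins, and which options a real backend freezes is not shown in this diff.

```python
class DemoTranscriber:
    """Stand-in for a transcriber; only the attributes used below are modelled."""

    _frozen_options = frozenset()

    def __init__(self, restart_after_silence=False):
        self.restart_after_silence = restart_after_silence


class DemoFrozenTranscriber(DemoTranscriber):
    # Assumption for illustration: this backend cannot change the option.
    _frozen_options = frozenset({"restart_after_silence"})


def toggle_option(transcriber, name):
    """Flip a boolean option; return True if it was toggled, False if skipped."""
    if name in transcriber._frozen_options:
        return False
    # Same strictness as the hunk: only genuine bool attributes are toggled.
    if type(getattr(transcriber, name, None)) is not bool:
        return False
    setattr(transcriber, name, not getattr(transcriber, name))
    return True


free = DemoTranscriber()
frozen = DemoFrozenTranscriber()
print(toggle_option(free, "restart_after_silence"), free.restart_after_silence)
print(toggle_option(frozen, "restart_after_silence"), frozen.restart_after_silence)
```

Keeping the frozen set on the class rather than on the instance means the menu can hide an option for a whole backend without instantiating it.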
@@ -367,23 +387,34 @@ def create_app(micro, transcriber, other_transcribers=None, **kwargs):
|
|
|
367
387
|
def is_not_recording(item):
|
|
368
388
|
return not is_recording(item) and not is_model_selection(item)
|
|
369
389
|
|
|
370
|
-
def
|
|
390
|
+
def is_checked_model(item):
|
|
371
391
|
return icon._transcriber.model_name == str(item)
|
|
372
392
|
|
|
373
393
|
def is_checked_option(item):
|
|
394
|
+
if not is_option_visible(item):
|
|
395
|
+
return False
|
|
396
|
+
if str(item) in transcriber_options:
|
|
397
|
+
return getattr(icon._transcriber, str(item))
|
|
374
398
|
return kwargs[str(item)]
|
|
375
399
|
|
|
400
|
+
def is_option_visible(item):
|
|
401
|
+
if str(item) in transcriber_options:
|
|
402
|
+
return str(item) not in icon._transcriber._frozen_options
|
|
403
|
+
return True
|
|
404
|
+
|
|
376
405
|
modeltitle = f"{transcriber.backend} :: {transcriber.model_name}"
|
|
377
406
|
title = f"scribe :: {modeltitle}"
|
|
378
407
|
|
|
408
|
+
options = [name for name in kwargs if isinstance(kwargs[name], bool)] + [name for name in transcriber_options if isinstance(getattr(transcriber, name), bool)]
|
|
409
|
+
|
|
379
410
|
menus = []
|
|
380
411
|
menus.append(Item(f"Record", callback_record, visible=is_not_recording, default=True))
|
|
381
412
|
menus.append(Item("Stop", callback_stop_recording, visible=is_recording))
|
|
382
413
|
menus.append(Item("Choose Model", pystrayMenu(
|
|
383
|
-
*(Item(f"{name}", callback_set_model, checked=
|
|
414
|
+
*(Item(f"{name}", callback_set_model, checked=is_checked_model) for name in other_transcribers_dict)))
|
|
384
415
|
)
|
|
385
416
|
menus.append(Item("Toggle Options", pystrayMenu(
|
|
386
|
-
*(Item(f"{name}", callback_toggle_option, checked=is_checked_option) for name in
|
|
417
|
+
*(Item(f"{name}", callback_toggle_option, checked=is_checked_option, visible=is_option_visible) for name in options)))
|
|
387
418
|
)
|
|
388
419
|
menus.append(Item('Quit', callback_quit))
|
|
389
420
|
|
|
@@ -398,6 +429,8 @@ def create_app(micro, transcriber, other_transcribers=None, **kwargs):
|
|
|
398
429
|
|
|
399
430
|
return icon
|
|
400
431
|
|
|
432
|
+
def _filter_options(d: dict, exclude: Iterable) -> dict:
|
|
433
|
+
return {k: v for k, v in d.items() if k not in exclude}
|
|
401
434
|
|
|
402
435
|
def main(args=None):
|
|
403
436
|
|
|
@@ -531,9 +564,10 @@ def main(args=None):
|
|
|
531
564
|
app = create_app(micro, transcriber, other_transcribers=[
|
|
532
565
|
{**vars(o), "backend": "openaiapi", "model": "whisper-1"},
|
|
533
566
|
*[{**vars(o), "backend": "whisper", "model": model} for model in o.whisper_models],
|
|
534
|
-
*[{**vars(o), "backend": "vosk", "model": model} for model in o.vosk_models]],
|
|
567
|
+
*[{**_filter_options(vars(o), exclude=VoskTranscriber._frozen_options), "backend": "vosk", "model": model} for model in o.vosk_models]],
|
|
535
568
|
clipboard=o.clipboard, output_file=o.output_file,
|
|
536
|
-
keyboard=o.keyboard, latency=o.latency, ascii=o.ascii,
|
|
569
|
+
keyboard=o.keyboard, latency=o.latency, ascii=o.ascii,
|
|
570
|
+
transcriber_options=["restart_after_silence"], **greetings)
|
|
537
571
|
print("Starting app...")
|
|
538
572
|
app.run()
|
|
539
573
|
else:
|
|
@@ -16,11 +16,15 @@ HOME = os.environ.get('HOME', os.path.expanduser('~'))
|
|
|
16
16
|
XDG_CACHE_HOME = os.environ.get('XDG_CACHE_HOME', os.path.join(HOME, '.cache'))
|
|
17
17
|
VOSK_MODELS_FOLDER = os.path.join(XDG_CACHE_HOME, "vosk")
|
|
18
18
|
|
|
19
|
+
class SilenceDetected(Exception):
|
|
20
|
+
pass
|
|
21
|
+
|
|
19
22
|
class StopRecording(Exception):
|
|
20
23
|
pass
|
|
21
24
|
|
|
22
25
|
class AbstractTranscriber:
|
|
23
26
|
backend = None
|
|
27
|
+
_frozen_options = frozenset()
|
|
24
28
|
def __init__(self, model, model_name=None, language=None, samplerate=16000, timeout=None, model_kwargs={},
|
|
25
29
|
silence_thresh=-40, silence_duration=2, restart_after_silence=False, logger=None):
|
|
26
30
|
self.model_name = model_name
|
|
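The hunk above introduces `SilenceDetected` next to `StopRecording`. Elsewhere in this diff, `transcribe_realtime_audio` raises the former when `restart_after_silence` is set and the latter otherwise, and only when some audio has been buffered. The sketch below shows one plausible way a caller could treat the two differently. The functions `on_silence` and `record` are hypothetical, and the actual caller in `scribe` is not shown in this diff.

```python
class SilenceDetected(Exception):
    pass


class StopRecording(Exception):
    pass


def on_silence(has_audio, restart_after_silence, silence_seconds):
    """Mirror of the branch in the diff: raise only when audio was captured."""
    if not has_audio:
        return
    message = "Silence detected: {:.2f} seconds".format(silence_seconds)
    if restart_after_silence:
        raise SilenceDetected(message)
    raise StopRecording(message)


def record(events, restart_after_silence):
    """Hypothetical caller: count finished segments, stop on StopRecording."""
    segments = 0
    for has_audio, silence_seconds in events:
        try:
            on_silence(has_audio, restart_after_silence, silence_seconds)
        except SilenceDetected:
            # A segment ended; a real caller would transcribe it and keep listening.
            segments += 1
        except StopRecording:
            # A segment ended and the session is over.
            segments += 1
            break
    return segments
```

Using two exception types lets the recording loop distinguish "finish this segment and continue" from "finish and stop" without extra flags.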
@@ -50,7 +54,32 @@ class AbstractTranscriber:
|
|
|
50
54
|
return self.timeout is not None and time.time() - self.start_time > self.timeout
|
|
51
55
|
|
|
52
56
|
def transcribe_realtime_audio(self, audio_bytes=b""):
|
|
53
|
-
|
|
57
|
+
"""This method is generic and assumes the underlying model does not handle real-time audio.
|
|
58
|
+
The Vosk model handles real-time audio, so this method is overridden in the VoskTranscriber class.
|
|
59
|
+
"""
|
|
60
|
+
|
|
61
|
+
# Check whether the segment is silent
|
|
62
|
+
if is_silent(audio_bytes, self.silence_thresh):
|
|
63
|
+
self.silence_buffer += audio_bytes
|
|
64
|
+
silence_duration = time.time() - self.last_sound_time
|
|
65
|
+
self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
|
|
66
|
+
|
|
67
|
+
if self.waiting and len(self.audio_buffer) > 0:
|
|
68
|
+
if self.restart_after_silence:
|
|
69
|
+
raise SilenceDetected("Silence detected: {:.2f} seconds".format(silence_duration))
|
|
70
|
+
else:
|
|
71
|
+
raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
|
|
72
|
+
|
|
73
|
+
else:
|
|
74
|
+
self.last_sound_time = time.time()
|
|
75
|
+
self.waiting = False
|
|
76
|
+
silence_buffer_data = np.frombuffer(self.silence_buffer, dtype=np.int16)
|
|
77
|
+
# add 0.5 seconds worth of silent data back to the audio buffer
|
|
78
|
+
half_a_second = 0.5
|
|
79
|
+
length_of_half_a_second = int(half_a_second * self.samplerate)
|
|
80
|
+
self.audio_buffer += silence_buffer_data[-length_of_half_a_second:].tobytes() + audio_bytes
|
|
81
|
+
self.silence_buffer = b''
|
|
82
|
+
|
|
54
83
|
return {"partial": f"{len(self.audio_buffer)} bytes received (duration: {self.get_elapsed()} seconds)"}
|
|
55
84
|
|
|
56
85
|
def transcribe_audio(self, audio_data):
|
|
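The new `transcribe_realtime_audio` relies on `is_silent(audio_bytes, self.silence_thresh)`, which is not shown in this diff. The following is only a sketch of how such a check could work, assuming 16-bit mono PCM chunks and a threshold expressed in dBFS (the default of `-40` suggests this, but it is an assumption). `rms_dbfs`, `is_silent_sketch` and `keep_tail` are hypothetical names, and the standard library is used here instead of numpy:

```python
import math
from array import array


def rms_dbfs(audio_bytes):
    """RMS level of 16-bit mono PCM audio in dBFS (0 dBFS = full scale)."""
    samples = array("h")
    samples.frombytes(audio_bytes[: len(audio_bytes) // 2 * 2])  # drop a trailing odd byte
    if len(samples) == 0:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms > 0 else -math.inf


def is_silent_sketch(audio_bytes, silence_thresh=-40):
    """True when the chunk is quieter than `silence_thresh` (in dBFS)."""
    return rms_dbfs(audio_bytes) < silence_thresh


def keep_tail(silence_bytes, samplerate=16000, seconds=0.5):
    """Keep only the last `seconds` of buffered silence (2 bytes per sample)."""
    n_samples = int(seconds * samplerate)
    return silence_bytes[-2 * n_samples:] if n_samples > 0 else b""


quiet = array("h", [10] * 1600).tobytes()    # very low amplitude
loud = array("h", [8000] * 1600).tobytes()   # clearly audible level

print(is_silent_sketch(quiet), is_silent_sketch(loud))
# With a threshold of -inf nothing is ever below it, so detection is disabled.
print(is_silent_sketch(quiet, silence_thresh=-math.inf))
```

Under these assumptions, a threshold of `-inf` can never be undercut, which is consistent with the way `VoskTranscriber` sets `silence_thresh` to `-np.inf` to disable the check.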
@@ -59,6 +88,7 @@ class AbstractTranscriber:
|
|
|
59
88
|
def reset(self):
|
|
60
89
|
self.audio_buffer = b''
|
|
61
90
|
self.start_time = time.time()
|
|
91
|
+
self.silence_buffer = b''
|
|
62
92
|
|
|
63
93
|
def log(self, text):
|
|
64
94
|
if text.startswith("\n"):
|
|
@@ -82,7 +112,7 @@ class AbstractTranscriber:
|
|
|
82
112
|
self.last_sound_time = time.time() - self.silence_duration
|
|
83
113
|
else:
|
|
84
114
|
self.last_sound_time = time.time()
|
|
85
|
-
|
|
115
|
+
# self.silence_buffer = b'' # already reset in self.reset()
|
|
86
116
|
|
|
87
117
|
try:
|
|
88
118
|
|
|
@@ -93,35 +123,20 @@ class AbstractTranscriber:
|
|
|
93
123
|
while not microphone.q.empty():
|
|
94
124
|
data = microphone.q.get()
|
|
95
125
|
|
|
96
|
-
#
|
|
97
|
-
|
|
98
|
-
silence_duration = time.time() - self.last_sound_time
|
|
99
|
-
|
|
100
|
-
previous_waiting = self.waiting
|
|
101
|
-
self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
|
|
102
|
-
|
|
103
|
-
if self.waiting and len(self.audio_buffer) > 0:
|
|
104
|
-
if self.restart_after_silence:
|
|
105
|
-
self.recording = False # for the system tray icon
|
|
106
|
-
result = self.finalize()
|
|
107
|
-
microphone.q.queue.clear()
|
|
108
|
-
self.reset()
|
|
109
|
-
yield result
|
|
110
|
-
self.recording = True # for the system tray icon
|
|
111
|
-
else:
|
|
112
|
-
raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
|
|
113
|
-
|
|
114
|
-
else:
|
|
115
|
-
self.last_sound_time = time.time()
|
|
116
|
-
self.waiting = False
|
|
117
|
-
|
|
118
|
-
# don't accumulate very long silences
|
|
119
|
-
if not self.waiting:
|
|
126
|
+
# leave it to each transcriber to handle the silence in audio data
|
|
127
|
+
try:
|
|
120
128
|
yield self.transcribe_realtime_audio(data)
|
|
121
129
|
|
|
122
|
-
|
|
123
|
-
|
|
124
|
-
|
|
130
|
+
# This exception triggers a pause in recording to allow for a transcription of the audio buffer
|
|
131
|
+
except SilenceDetected as e:
|
|
132
|
+
self.log(str(e))
|
|
133
|
+
self.recording = False # for the system tray icon
|
|
134
|
+
result = self.finalize()
|
|
135
|
+
microphone.q.queue.clear()
|
|
136
|
+
self.reset()
|
|
137
|
+
yield result
|
|
138
|
+
self.recording = True # for the system tray icon
|
|
139
|
+
self.start_time = time.time() # reset the start time to avoid timeout
|
|
125
140
|
|
|
126
141
|
if self.is_overtime():
|
|
127
142
|
raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
|
|
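The refactoring above moves silence handling out of the recording loop: the per-chunk method raises `SilenceDetected`, and the loop catches it, finalizes the buffered audio and keeps going. A toy sketch of that control flow, with text chunks standing in for audio and an empty chunk standing in for a detected silence; `ToyTranscriber`, `feed` and `stream` are hypothetical names and not scribe's API:

```python
class SilenceDetected(Exception):
    """Raised by the per-chunk handler when a pause ends a segment."""


class ToyTranscriber:
    def __init__(self):
        self.audio_buffer = b""

    def feed(self, chunk):
        # An empty chunk after some content plays the role of a long silence.
        if chunk == b"" and self.audio_buffer:
            raise SilenceDetected("silence after {} bytes".format(len(self.audio_buffer)))
        self.audio_buffer += chunk
        return {"partial": "{} bytes received".format(len(self.audio_buffer))}

    def finalize(self):
        # "Transcribe" the buffered segment, then reset for the next one.
        result = {"text": self.audio_buffer.decode()}
        self.audio_buffer = b""
        return result

    def stream(self, chunks):
        for chunk in chunks:
            try:
                yield self.feed(chunk)
            except SilenceDetected:
                # Emit the finished segment and continue recording.
                yield self.finalize()


results = list(ToyTranscriber().stream([b"hello", b"", b"world", b""]))
texts = [r["text"] for r in results if "text" in r]
print(texts)
```

Signalling the pause with an exception lets each transcriber decide what counts as silence, while the loop only needs to know how to react to it.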
@@ -165,8 +180,10 @@ def get_vosk_recognizer(model, samplerate=16000):
|
|
|
165
180
|
|
|
166
181
|
class VoskTranscriber(AbstractTranscriber):
|
|
167
182
|
backend = "vosk"
|
|
183
|
+
_frozen_options = frozenset(["restart_after_silence", "silence_duration", "silence_thresh"])
|
|
168
184
|
|
|
169
185
|
def __init__(self, model_name, model=None, model_kwargs={}, **kwargs):
|
|
186
|
+
kwargs["silence_thresh"] = -np.inf # disable silence detection (this is handled by Vosk)
|
|
170
187
|
if model is None:
|
|
171
188
|
model = get_vosk_model(model_name, **model_kwargs)
|
|
172
189
|
super().__init__(model, model_name, model_kwargs=model_kwargs, **kwargs)
|
|
@@ -222,7 +239,7 @@ class WhisperTranscriber(AbstractTranscriber):
|
|
|
222
239
|
if len(self.audio_buffer) == 0:
|
|
223
240
|
return {"text": ""}
|
|
224
241
|
result = self.transcribe_audio(self.audio_buffer)
|
|
225
|
-
self.
|
|
242
|
+
self.reset()
|
|
226
243
|
return result
|
|
227
244
|
|
|
228
245
|
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.2
|
|
2
2
|
Name: scribe-cli
|
|
3
|
-
Version: 0.12.
|
|
3
|
+
Version: 0.12.2
|
|
4
4
|
Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
|
|
5
5
|
Author-email: Mahé Perrette <mahe.perrette@gmail.com>
|
|
6
6
|
License: MIT License
|
|
@@ -34,7 +34,11 @@ License: MIT License
|
|
|
34
34
|
ensure compliance with their respective terms.
|
|
35
35
|
Project-URL: Homepage, https://github.com/perrette/scribe
|
|
36
36
|
Keywords: speech recognition,transcription,AI,language,vosk,whisper,openai,keyboard,clipboard
|
|
37
|
-
Classifier: Programming Language :: Python :: 3
|
|
37
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
38
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
39
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
40
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
41
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
38
42
|
Classifier: Operating System :: OS Independent
|
|
39
43
|
Requires-Python: >=3.9
|
|
40
44
|
Description-Content-Type: text/markdown
|
|
@@ -66,8 +70,8 @@ Requires-Dist: soundfile; extra == "all"
|
|
|
66
70
|
Requires-Dist: vosk; extra == "all"
|
|
67
71
|
Requires-Dist: pystray; extra == "all"
|
|
68
72
|
|
|
69
|
-
[]()
|
|
70
73
|
[](https://pypi.org/project/scribe-cli)
|
|
74
|
+

|
|
71
75
|
|
|
72
76
|
# Scribe <img src="scribe_data/share/icon.png" width=48px>
|
|
73
77
|
|
|
@@ -77,22 +81,29 @@ It features local, downloadable models with the `vosk` and `whisper` backends, a
|
|
|
77
81
|
|
|
78
82
|
## Compatibility
|
|
79
83
|
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
84
|
+
The package was initially developed for Python 3.12 on Ubuntu 24.04 with GNOME + Wayland, but it should work on other platforms as well (feedback welcome).
|
|
85
|
+
Check the documentation of the dependencies for more information (e.g. pynput for the keyboard, pystray for the app).
|
|
86
|
+
|
|
87
|
+
- python 3.13:
|
|
88
|
+
- at the time of writing, `openai-whisper` does not install.
|
|
89
|
+
|
|
90
|
+
- Ubuntu:
|
|
91
|
+
- see caveats in the use of the keyboard under Wayland [keyboard section](#use-the-keyboard-with-wayland).
|
|
92
|
+
- MacOS:
|
|
93
|
+
- tested on a MacBook Air M1 with 8 GB RAM and Python 3.12. It runs, but poorly, presumably because of the low memory: prefer the `openaiapi` backend on such machines
|
|
94
|
+
- I expect the local models to run fine on machines with more memory
|
|
95
|
+
- Windows:
|
|
96
|
+
- not tested yet
|
|
86
97
|
|
|
87
98
|
## Installation
|
|
88
99
|
|
|
89
|
-
Install PortAudio library and xclip library. E.g. on Ubuntu:
|
|
100
|
+
Install the PortAudio library (required by `sounddevice`) and the xclip library (required by `pyperclip`). E.g. on Ubuntu:
|
|
90
101
|
|
|
91
102
|
```bash
|
|
92
103
|
sudo apt-get install portaudio19-dev xclip
|
|
93
104
|
```
|
|
94
105
|
|
|
95
|
-
See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
|
|
106
|
+
See additional requirements for the [icon tray](#system-tray-icon-experimental-) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
|
|
96
107
|
|
|
97
108
|
```bash
|
|
98
109
|
pip install scribe-cli[all]
|
|
@@ -110,6 +121,37 @@ pip install -e .[all]
|
|
|
110
121
|
|
|
111
122
|
You can leave out the optional dependencies (omit `[all]`), but you must install at least one of the `vosk`, `openai-whisper` or `openai` packages (see Usage below).
|
|
112
123
|
|
|
124
|
+
At the time of writing, `openai-whisper` does not install on Python 3.13. You can install the dependencies manually and skip that package, which makes the `whisper` backend unavailable.
|
|
125
|
+
|
|
126
|
+
### Manual selection of the dependencies
|
|
127
|
+
|
|
128
|
+
```bash
|
|
129
|
+
# language models (at least one must be installed !)
|
|
130
|
+
pip install vosk
|
|
131
|
+
pip install openai soundfile # openaiapi
|
|
132
|
+
pip install openai-whisper # FAILS IN PYTHON 3.13 on Ubuntu
|
|
133
|
+
|
|
134
|
+
# PortAUDIO (sounddevice)
|
|
135
|
+
pip install sounddevice # automatically installed as required dependency
|
|
136
|
+
sudo apt-get install portaudio19-dev
|
|
137
|
+
|
|
138
|
+
# clipboard
|
|
139
|
+
pip install pyperclip # automatically installed as required dependency
|
|
140
|
+
sudo apt-get install xclip
|
|
141
|
+
|
|
142
|
+
# keyboard
|
|
143
|
+
pip install pynput
|
|
144
|
+
|
|
145
|
+
# app mode
|
|
146
|
+
# Uncomment the line below for Ubuntu!
|
|
147
|
+
sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1 # Ubuntu ONLY (not needed on MacOS)
|
|
148
|
+
pip install PyGObject # Ubuntu ONLY (not needed on MacOS)
|
|
149
|
+
pip install pystray
|
|
150
|
+
|
|
151
|
+
# And finally
|
|
152
|
+
pip install scribe-cli
|
|
153
|
+
```
|
|
154
|
+
|
|
113
155
|
The language models for local backends `vosk` and `whisper` will download on-the-fly.
|
|
114
156
|
The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
|
|
115
157
|
|
|
@@ -134,11 +176,12 @@ The `vosk` backend is much faster and very good at doing real-time transcription
|
|
|
134
176
|
It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
|
|
135
177
|
There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
|
|
136
178
|
|
|
137
|
-
The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
|
|
179
|
+
The `openaiapi` backend uses the `whisper-1` model at the time of writing. It requires an API key, best passed as an environment variable, e.g. in bash:
|
|
138
180
|
```bash
|
|
139
|
-
|
|
181
|
+
export OPENAI_API_KEY=YOURAPIKEY
|
|
182
|
+
scribe --backend openaiapi
|
|
140
183
|
```
|
|
141
|
-
|
|
184
|
+
The `openaiapi` backend is lightweight and handy if you have an API key (you can create one for free for testing) and a low-spec computer (and don't care too much about privacy, obviously).
|
|
142
185
|
|
|
143
186
|
## Output media
|
|
144
187
|
|
|
@@ -174,9 +217,9 @@ This can be extremely useful with the `vosk` backend and its realtime transcript
|
|
|
174
217
|
The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
|
|
175
218
|
Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
|
|
176
219
|
|
|
177
|
-
#### Use the keyboard with Wayland
|
|
220
|
+
#### Use the keyboard with Wayland
|
|
178
221
|
|
|
179
|
-
In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
|
|
222
|
+
In my Ubuntu 24.04 + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
|
|
180
223
|
|
|
181
224
|
One workaround is to use the Xorg version of GNOME: in `/etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
|
|
182
225
|
|
|
@@ -190,40 +233,46 @@ You're on the right path :)
|
|
|
190
233
|
|
|
191
234
|
## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
|
|
192
235
|
|
|
236
|
+
<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
|
|
237
|
+
|
|
193
238
|
To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
|
|
194
239
|
To activate start with:
|
|
195
240
|
```bash
|
|
196
|
-
scribe --app
|
|
241
|
+
scribe --app ...
|
|
197
242
|
```
|
|
198
243
|
or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
|
|
199
244
|
of predefined models (controlled by `--vosk-models` and `--whisper-models`) and options, or to Quit and choose from the terminal before pressing Enter again.
|
|
200
245
|
For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording/waiting, transcribing and idle.
|
|
201
|
-
That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option.
|
|
246
|
+
That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option.
|
|
247
|
+
|
|
248
|
+
The `--vosk-models` and `--whisper-models` options predefine the set of models available in the app menu, e.g.:
|
|
249
|
+
```bash
|
|
250
|
+
scribe --app --vosk-models vosk-model-fr-0.22 --whisper-models small turbo ...
|
|
251
|
+
```
|
|
252
|
+
|
|
253
|
+
### Ubuntu
|
|
254
|
+
|
|
255
|
+
In Ubuntu the following dependencies were required to make the menus appear:
|
|
202
256
|
|
|
203
257
|
```bash
|
|
204
258
|
sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
|
|
205
259
|
pip install PyGObject
|
|
206
260
|
```
|
|
207
261
|
|
|
208
|
-
<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
|
|
209
|
-
|
|
210
262
|
## Start as an application in GNOME
|
|
211
263
|
|
|
212
264
|
If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
|
|
213
265
|
to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
|
|
214
266
|
`--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
|
|
215
267
|
|
|
216
|
-
|
|
217
|
-
|
|
268
|
+
Consider the following two flavors:
|
|
218
269
|
```bash
|
|
219
|
-
scribe-install --clipboard
|
|
270
|
+
scribe-install --clipboard ...
|
|
271
|
+
scribe-install --name "Scribe App" --no-terminal --clipboard ...
|
|
220
272
|
```
|
|
221
|
-
(
|
|
273
|
+
The first will create an app named Scribe (the default) that simply opens a terminal and executes the command `scribe --clipboard ...`.
|
|
274
|
+
The second will create an app named Scribe App that executes in a hidden terminal: `scribe --no-prompt --app --clipboard ...`, thus leaving the tray icon as the only mode of interaction.
|
|
222
275
|
|
|
223
|
-
It is also possible to run an app fully outside the terminal:
|
|
224
|
-
```bash
|
|
225
|
-
scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY --vosk-models vosk-model-fr-0.22 --whisper-models small turbo
|
|
226
|
-
```
|
|
227
276
|
|
|
228
277
|
## Fine tuning
|
|
229
278
|
|
|
@@ -0,0 +1,20 @@
|
|
|
1
|
+
subversion=$1
|
|
2
|
+
version=3.$subversion
|
|
3
|
+
name=py3$subversion
|
|
4
|
+
|
|
5
|
+
MAMBAENV=~/.local/share/mamba/envs/$name
|
|
6
|
+
VENVDIR=~/.virtualenvs/$name
|
|
7
|
+
|
|
8
|
+
if [ ! -d $MAMBAENV ] ; then
|
|
9
|
+
micromamba create -n $name python=$version --prefix $MAMBAENV -y
|
|
10
|
+
else
|
|
11
|
+
echo "Environment $name already exists at $MAMBAENV"
|
|
12
|
+
fi
|
|
13
|
+
if [ ! -d $VENVDIR ] ; then
|
|
14
|
+
$MAMBAENV/bin/python3 -m venv $VENVDIR
|
|
15
|
+
else
|
|
16
|
+
echo "Virtualenv $name already exists at $VENVDIR"
|
|
17
|
+
fi
|
|
18
|
+
source ~/.virtualenvs/$name/bin/activate
|
|
19
|
+
pip install -U pip
|
|
20
|
+
pip install scribe-cli[all]
|
|
File without changes
|
|