scribe-cli 0.7.4.tar.gz → 0.7.7.tar.gz
- {scribe_cli-0.7.4/scribe_cli.egg-info → scribe_cli-0.7.7}/PKG-INFO +27 -15
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/README.md +21 -13
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/pyproject.toml +10 -3
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/_version.py +2 -2
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/app.py +15 -6
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/keyboard.py +20 -3
- {scribe_cli-0.7.4 → scribe_cli-0.7.7/scribe_cli.egg-info}/PKG-INFO +27 -15
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_cli.egg-info/requires.txt +6 -1
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/.github/workflows/pypi.yml +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/.gitignore +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/LICENSE +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/__init__.py +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/audio.py +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/install_desktop.py +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/models.py +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/models.toml +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/saverecording.py +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/testpynput.py +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/util.py +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_cli.egg-info/SOURCES.txt +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_cli.egg-info/dependency_links.txt +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_cli.egg-info/entry_points.txt +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_cli.egg-info/top_level.txt +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_data/__init__.py +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_data/share/icon.jpg +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_data/templates/scribe.desktop +0 -0
- {scribe_cli-0.7.4 → scribe_cli-0.7.7}/setup.cfg +0 -0
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
Metadata-Version: 2.2
|
|
2
2
|
Name: scribe-cli
|
|
3
|
-
Version: 0.7.4
|
|
4
|
-
Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI
|
|
3
|
+
Version: 0.7.7
|
|
4
|
+
Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
|
|
5
5
|
Author-email: Mahé Perrette <mahe.perrette@gmail.com>
|
|
6
6
|
License: MIT License
|
|
7
7
|
|
|
@@ -44,34 +44,47 @@ Requires-Dist: sounddevice
|
|
|
44
44
|
Requires-Dist: tqdm
|
|
45
45
|
Requires-Dist: requests
|
|
46
46
|
Requires-Dist: pyperclip
|
|
47
|
-
Requires-Dist:
|
|
47
|
+
Requires-Dist: unidecode
|
|
48
48
|
Provides-Extra: keyboard
|
|
49
49
|
Requires-Dist: pynput; extra == "keyboard"
|
|
50
50
|
Provides-Extra: whisper
|
|
51
51
|
Requires-Dist: openai-whisper; extra == "whisper"
|
|
52
52
|
Provides-Extra: vosk
|
|
53
53
|
Requires-Dist: vosk; extra == "vosk"
|
|
54
|
+
Provides-Extra: app
|
|
55
|
+
Requires-Dist: pystray; extra == "app"
|
|
56
|
+
Requires-Dist: PyGObject; extra == "app"
|
|
54
57
|
Provides-Extra: all
|
|
55
58
|
Requires-Dist: pynput; extra == "all"
|
|
56
59
|
Requires-Dist: openai-whisper; extra == "all"
|
|
57
60
|
Requires-Dist: vosk; extra == "all"
|
|
61
|
+
Requires-Dist: pystray; extra == "all"
|
|
62
|
+
|
|
63
|
+
[]()
|
|
64
|
+
[](https://pypi.org/project/scribe-cli)
|
|
58
65
|
|
|
59
66
|
# Scribe
|
|
60
67
|
|
|
61
|
-
`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
|
|
68
|
+
`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
|
|
69
|
+
|
|
70
|
+
## Compatibility
|
|
71
|
+
|
|
72
|
+
In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
|
|
73
|
+
As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
|
|
74
|
+
A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
|
|
62
75
|
|
|
63
76
|
## Installation
|
|
64
77
|
|
|
65
|
-
Install PortAudio library. E.g. on Ubuntu:
|
|
78
|
+
Install PortAudio library and xclip library. E.g. on Ubuntu:
|
|
66
79
|
|
|
67
80
|
```bash
|
|
68
|
-
sudo apt-get install portaudio19-dev
|
|
81
|
+
sudo apt-get install portaudio19-dev xclip
|
|
69
82
|
```
|
|
70
83
|
|
|
71
|
-
The python dependencies should be dealt with automatically:
|
|
84
|
+
See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
|
|
72
85
|
|
|
73
86
|
```bash
|
|
74
|
-
pip install scribe-cli[all]
|
|
87
|
+
pip install scribe-cli[all]
|
|
75
88
|
```
|
|
76
89
|
|
|
77
90
|
(note the `-cli` suffix for client)
|
|
@@ -108,9 +121,8 @@ but it cannot do real-time, and depending on the model can have relatively long
|
|
|
108
121
|
With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
|
|
109
122
|
there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
|
|
110
123
|
|
|
111
|
-
The `vosk` backend is good at
|
|
112
|
-
|
|
113
|
-
Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
|
|
124
|
+
The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
|
|
125
|
+
It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
|
|
114
126
|
There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
|
|
115
127
|
|
|
116
128
|
To skip the initial selection menu you can do:
|
|
@@ -132,8 +144,7 @@ scribe --keyboard
|
|
|
132
144
|
```
|
|
133
145
|
|
|
134
146
|
It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
|
|
135
|
-
|
|
136
|
-
`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)).
|
|
147
|
+
Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
|
|
137
148
|
|
|
138
149
|
#### Use the keyboard in Ubuntu
|
|
139
150
|
|
|
@@ -141,14 +152,15 @@ In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in ch
|
|
|
141
152
|
|
|
142
153
|
One workaround is to use the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
|
|
143
154
|
|
|
144
|
-
Another workaround with Wayland is to use the low-level `uinput` backend but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
|
|
155
|
+
Another workaround while staying with Wayland is to use the low-level `uinput` backend of `pynput`, but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
|
|
156
|
+
Moreover, the keyboard must be set with an appropriate layout, for example to have the letter `é` you'd want a French or Italian layout otherwise the English will drop it or replace with something else. Another caveat I encountered is that the special characters (`é`) were inserted at the wrong place. Adding a small delay was enough to fix that with the additional parameter `--latency 0.01` Finally if you run as sudo you may need to reset some environment variable so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum-up, that gives something like:
|
|
145
157
|
```bash
|
|
146
158
|
sudo modprobe uinput
|
|
147
159
|
sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe) --latency 0.01
|
|
148
160
|
```
|
|
149
161
|
You're on the right path :)
|
|
150
162
|
|
|
151
|
-
### System
|
|
163
|
+
### System tray icon (experimental)
|
|
152
164
|
|
|
153
165
|
To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
|
|
154
166
|
To activate start with:
|
|
@@ -1,19 +1,28 @@
|
|
|
1
|
+
[]()
|
|
2
|
+
[](https://pypi.org/project/scribe-cli)
|
|
3
|
+
|
|
1
4
|
# Scribe
|
|
2
5
|
|
|
3
|
-
`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
|
|
6
|
+
`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
|
|
7
|
+
|
|
8
|
+
## Compatibility
|
|
9
|
+
|
|
10
|
+
In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
|
|
11
|
+
As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
|
|
12
|
+
A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
|
|
4
13
|
|
|
5
14
|
## Installation
|
|
6
15
|
|
|
7
|
-
Install PortAudio library. E.g. on Ubuntu:
|
|
16
|
+
Install PortAudio library and xclip library. E.g. on Ubuntu:
|
|
8
17
|
|
|
9
18
|
```bash
|
|
10
|
-
sudo apt-get install portaudio19-dev
|
|
19
|
+
sudo apt-get install portaudio19-dev xclip
|
|
11
20
|
```
|
|
12
21
|
|
|
13
|
-
The python dependencies should be dealt with automatically:
|
|
22
|
+
See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
|
|
14
23
|
|
|
15
24
|
```bash
|
|
16
|
-
pip install scribe-cli[all]
|
|
25
|
+
pip install scribe-cli[all]
|
|
17
26
|
```
|
|
18
27
|
|
|
19
28
|
(note the `-cli` suffix for client)
|
|
@@ -50,9 +59,8 @@ but it cannot do real-time, and depending on the model can have relatively long
|
|
|
50
59
|
With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
|
|
51
60
|
there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
|
|
52
61
|
|
|
53
|
-
The `vosk` backend is good at
|
|
54
|
-
|
|
55
|
-
Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
|
|
62
|
+
The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
|
|
63
|
+
It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
|
|
56
64
|
There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
|
|
57
65
|
|
|
58
66
|
To skip the initial selection menu you can do:
|
|
@@ -74,8 +82,7 @@ scribe --keyboard
|
|
|
74
82
|
```
|
|
75
83
|
|
|
76
84
|
It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
|
|
77
|
-
|
|
78
|
-
`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)).
|
|
85
|
+
Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
|
|
79
86
|
|
|
80
87
|
#### Use the keyboard in Ubuntu
|
|
81
88
|
|
|
@@ -83,14 +90,15 @@ In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in ch
|
|
|
83
90
|
|
|
84
91
|
One workaround is to use the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
|
|
85
92
|
|
|
86
|
-
Another workaround with Wayland is to use the low-level `uinput` backend but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
|
|
93
|
+
Another workaround while staying with Wayland is to use the low-level `uinput` backend of `pynput`, but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
|
|
94
|
+
Moreover, the keyboard must be set with an appropriate layout, for example to have the letter `é` you'd want a French or Italian layout otherwise the English will drop it or replace with something else. Another caveat I encountered is that the special characters (`é`) were inserted at the wrong place. Adding a small delay was enough to fix that with the additional parameter `--latency 0.01` Finally if you run as sudo you may need to reset some environment variable so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum-up, that gives something like:
|
|
87
95
|
```bash
|
|
88
96
|
sudo modprobe uinput
|
|
89
97
|
sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe) --latency 0.01
|
|
90
98
|
```
|
|
91
99
|
You're on the right path :)
|
|
92
100
|
|
|
93
|
-
### System
|
|
101
|
+
### System tray icon (experimental)
|
|
94
102
|
|
|
95
103
|
To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
|
|
96
104
|
To activate start with:
|
|
@@ -116,4 +124,4 @@ e.g.
|
|
|
116
124
|
scribe-install --backend whisper --model small
|
|
117
125
|
```
|
|
118
126
|
|
|
119
|
-
After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
|
|
127
|
+
After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
|
|
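The `uinput` invocation shown in the README hunks above can be hard to read as a single shell line. Here is a minimal Python sketch of how the same argument list might be assembled. The helper `build_uinput_command` is hypothetical and not part of `scribe`; it only mirrors the README command, assuming `sudo` accepts `VAR=value` pairs before the program name.

```python
import os
import shutil


def build_uinput_command(env, scribe_path, latency=0.01):
    """Hypothetical helper: mirror the README's sudo command as an argv list."""
    # Forward HOME and XDG_RUNTIME_DIR so that the model download folder and
    # the list of audio devices stay the same when running as root.
    preserved = [
        f"{name}={env[name]}"
        for name in ("HOME", "XDG_RUNTIME_DIR")
        if name in env
    ]
    return [
        "sudo",
        *preserved,
        "PYNPUT_BACKEND_KEYBOARD=uinput",  # select pynput's uinput backend
        scribe_path,
        "--latency",
        str(latency),
    ]


# Equivalent of `$(which scribe)`; falls back to the bare name if not found.
scribe_path = shutil.which("scribe") or "scribe"
command = build_uinput_command(dict(os.environ), scribe_path)
print(" ".join(command))
```

The list is only printed here; passing it to `subprocess.run` would start the program as root, which still requires the `uinput` kernel module to be loaded first.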
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
|
|
|
5
5
|
[project]
|
|
6
6
|
name = "scribe-cli"
|
|
7
7
|
dynamic = ["version"]
|
|
8
|
-
description = "scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI"
|
|
8
|
+
description = "scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer"
|
|
9
9
|
authors = [
|
|
10
10
|
{ name="Mahé Perrette", email="mahe.perrette@gmail.com" }
|
|
11
11
|
]
|
|
@@ -18,9 +18,8 @@ dependencies = [
|
|
|
18
18
|
"tqdm",
|
|
19
19
|
"requests",
|
|
20
20
|
"pyperclip",
|
|
21
|
-
"
|
|
21
|
+
"unidecode",
|
|
22
22
|
]
|
|
23
|
-
optional-dependencies = { keyboard = ["pynput"], whisper = ["openai-whisper"], vosk = ["vosk"], all = ["pynput", "openai-whisper", "vosk"] }
|
|
24
23
|
|
|
25
24
|
classifiers = [
|
|
26
25
|
"Programming Language :: Python :: 3",
|
|
@@ -39,6 +38,14 @@ keywords = [
|
|
|
39
38
|
"clipboard",
|
|
40
39
|
]
|
|
41
40
|
|
|
41
|
+
[project.optional-dependencies]
|
|
42
|
+
keyboard = ["pynput"]
|
|
43
|
+
whisper = ["openai-whisper"]
|
|
44
|
+
vosk = ["vosk"]
|
|
45
|
+
app = ["pystray", "PyGObject"]
|
|
46
|
+
all = ["pynput", "openai-whisper", "vosk", "pystray"]
|
|
47
|
+
|
|
48
|
+
|
|
42
49
|
[tool.setuptools]
|
|
43
50
|
packages = [ "scribe", "scribe_data" ]
|
|
44
51
|
|
|
@@ -151,6 +151,7 @@ def get_parser():
|
|
|
151
151
|
parser.add_argument("--keyboard", action="store_true")
|
|
152
152
|
parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
|
|
153
153
|
parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
|
|
154
|
+
parser.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
|
|
154
155
|
|
|
155
156
|
group = parser.add_argument_group("whisper options")
|
|
156
157
|
group.add_argument("--duration", default=120, type=int, help="Max duration of the whisper recording (default %(default)ss)")
|
|
@@ -164,7 +165,7 @@ def get_parser():
|
|
|
164
165
|
|
|
165
166
|
|
|
166
167
|
# Commencer l'enregistrement
|
|
167
|
-
def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, **greetings):
|
|
168
|
+
def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, ascii=False, **greetings):
|
|
168
169
|
|
|
169
170
|
if keyboard:
|
|
170
171
|
from scribe.keyboard import type_text
|
|
@@ -184,7 +185,7 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
|
|
|
184
185
|
clear_line()
|
|
185
186
|
print(result.get('text'))
|
|
186
187
|
if keyboard:
|
|
187
|
-
type_text(result['text'] + " ", interval=latency) # Simulate typing
|
|
188
|
+
type_text(result['text'] + " ", interval=latency, ascii=ascii) # Simulate typing
|
|
188
189
|
|
|
189
190
|
if clipboard:
|
|
190
191
|
fulltext += result['text'] + " "
|
|
@@ -280,7 +281,7 @@ def main(args=None):
|
|
|
280
281
|
transcriber = get_transcriber(o, prompt=o.prompt)
|
|
281
282
|
print(f">>> Model {transcriber.model_name} from {transcriber.backend} selected. Keyboard [{'on' if o.keyboard else 'off'}]. Clipboard [{'on' if o.clipboard else 'off'}] <<<")
|
|
282
283
|
if o.prompt:
|
|
283
|
-
print(f"Choose any of the following actions
|
|
284
|
+
print(f"Choose any of the following actions (or any command-line toggle flag by name)")
|
|
284
285
|
print(f"[q] quit")
|
|
285
286
|
print(f"[e] change model")
|
|
286
287
|
print(f"[x] toggle app [{toggle[o.app]}] -> [{toggle[not o.app]}]")
|
|
@@ -290,7 +291,7 @@ def main(args=None):
|
|
|
290
291
|
print(f"[t] change duration (currently {transcriber.timeout}s)")
|
|
291
292
|
print(f"[b] change silence duration (currently {transcriber.silence_duration}s)")
|
|
292
293
|
print(f"[a] toggle auto-restart after silence [{toggle[transcriber.restart_after_silence]}] -> [{toggle[not transcriber.restart_after_silence]}]")
|
|
293
|
-
print(colored(f"Press [Enter]
|
|
294
|
+
print(colored(f"Press [Enter] to start recording.", "BOLD"))
|
|
294
295
|
|
|
295
296
|
key = input()
|
|
296
297
|
if key == "q":
|
|
@@ -324,19 +325,27 @@ def main(args=None):
|
|
|
324
325
|
except:
|
|
325
326
|
print("Invalid duration. Must be an integer.")
|
|
326
327
|
continue
|
|
328
|
+
if key:
|
|
329
|
+
if hasattr(o, key) and isinstance(getattr(o, key), bool):
|
|
330
|
+
setattr(o, key, not getattr(o, key))
|
|
331
|
+
print(f"Toggle {key} to [{getattr(o, key)}].")
|
|
332
|
+
print(f"Invalid choice: {key}.")
|
|
333
|
+
continue
|
|
327
334
|
|
|
328
335
|
if o.app:
|
|
329
336
|
greetings = dict(
|
|
330
337
|
start_message = "Listening... Use the try icon menu to stop.",
|
|
331
338
|
)
|
|
332
|
-
app = create_app(micro, transcriber, clipboard=o.clipboard,
|
|
339
|
+
app = create_app(micro, transcriber, clipboard=o.clipboard,
|
|
340
|
+
keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
|
|
333
341
|
print("Starting app...")
|
|
334
342
|
app.run()
|
|
335
343
|
else:
|
|
336
344
|
greetings = dict(
|
|
337
345
|
start_message = "Listening... Press Ctrl+C to stop.",
|
|
338
346
|
)
|
|
339
|
-
start_recording(micro, transcriber, clipboard=o.clipboard,
|
|
347
|
+
start_recording(micro, transcriber, clipboard=o.clipboard,
|
|
348
|
+
keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
|
|
340
349
|
|
|
341
350
|
# if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
|
|
342
351
|
# So we leave the wider range of options to change the model.
|
|
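The `app.py` hunks above add an `--ascii` flag and let the interactive prompt flip any boolean option by typing its name. Since the diff view drops indentation, the exact control flow is not recoverable here; the following is a minimal sketch of the toggle idea only, with a hypothetical helper `toggle_flag` and a reduced parser that reuses three of the arguments shown in the diff.

```python
import argparse


def toggle_flag(options, key):
    """Hypothetical helper: flip a boolean option by name.

    Returns True if `key` names a boolean attribute and it was flipped,
    False otherwise (unknown name or non-boolean option).
    """
    if hasattr(options, key) and isinstance(getattr(options, key), bool):
        setattr(options, key, not getattr(options, key))
        return True
    return False


# Reduced parser with three of the arguments that appear in the diff.
parser = argparse.ArgumentParser()
parser.add_argument("--keyboard", action="store_true")
parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
parser.add_argument("--ascii", action="store_true")
o = parser.parse_args([])

for key in ("ascii", "latency", "unknown"):
    if toggle_flag(o, key):
        print(f"Toggle {key} to [{getattr(o, key)}].")
    else:
        print(f"Invalid choice: {key}.")
```

Returning a boolean keeps the "toggled" and "invalid choice" messages mutually exclusive, which is one way to structure the branch; it is not necessarily how the package does it.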
@@ -2,6 +2,8 @@
|
|
|
2
2
|
"""
|
|
3
3
|
import platform
|
|
4
4
|
import time
|
|
5
|
+
import unidecode
|
|
6
|
+
import logging
|
|
5
7
|
|
|
6
8
|
try:
|
|
7
9
|
# import pyautogui
|
|
@@ -30,11 +32,24 @@ def paste_text():
|
|
|
30
32
|
keyboard.release('v')
|
|
31
33
|
keyboard.release(Key.ctrl)
|
|
32
34
|
|
|
33
|
-
def type_text(text, interval=0, paste=False):
|
|
35
|
+
def safe_type_text(text):
|
|
36
|
+
"""I got key errors with the uinput mode, so I'm using unidecode to convert
|
|
37
|
+
the text to ASCII before typing it."""
|
|
38
|
+
try:
|
|
39
|
+
keyboard.type(text)
|
|
40
|
+
except KeyError:
|
|
41
|
+
asciitext = unidecode.unidecode(text)
|
|
42
|
+
logging.warning(f"Key error with {text} -> convert to {asciitext}")
|
|
43
|
+
keyboard.type(asciitext)
|
|
44
|
+
|
|
45
|
+
def type_text(text, interval=0, paste=False, ascii=False):
|
|
34
46
|
# Simulate typing a string
|
|
35
47
|
# import subprocess
|
|
36
48
|
# subprocess.run(["ydotool", "type", text])
|
|
37
49
|
|
|
50
|
+
if ascii:
|
|
51
|
+
text = unidecode.unidecode(text)
|
|
52
|
+
|
|
38
53
|
if paste:
|
|
39
54
|
import pyperclip
|
|
40
55
|
keep_state = pyperclip.paste()
|
|
@@ -45,7 +60,9 @@ def type_text(text, interval=0, paste=False):
|
|
|
45
60
|
|
|
46
61
|
if interval > 0:
|
|
47
62
|
for c in text:
|
|
48
|
-
keyboard.type(c)
|
|
63
|
+
# keyboard.type(c)
|
|
64
|
+
safe_type_text(c)
|
|
49
65
|
time.sleep(interval)
|
|
50
66
|
else:
|
|
51
|
-
keyboard.type(text)
|
|
67
|
+
# keyboard.type(text)
|
|
68
|
+
safe_type_text(text)
|
|
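The `keyboard.py` hunks above wrap `keyboard.type` in a fallback that converts text to ASCII when a `KeyError` is raised. The sketch below reproduces that pattern in a self-contained way under two loudly stated assumptions: `FakeKeyboard` is a hypothetical stand-in for the real controller, and `to_ascii` uses the standard-library `unicodedata` as a rough approximation of `unidecode.unidecode`, which handles many more characters.

```python
import logging
import unicodedata


class FakeKeyboard:
    """Hypothetical stand-in for a controller whose layout lacks non-ASCII keys."""

    def __init__(self):
        self.typed = []

    def type(self, text):
        # Reject the whole string if any character is outside ASCII.
        for ch in text:
            if ord(ch) > 127:
                raise KeyError(ch)
        self.typed.append(text)


def to_ascii(text):
    """Approximate unidecode with the stdlib: decompose, then drop accents."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def safe_type_text(keyboard, text):
    """Type text as-is, falling back to its ASCII form on a KeyError."""
    try:
        keyboard.type(text)
    except KeyError:
        ascii_text = to_ascii(text)
        logging.warning("Key error with %r -> convert to %r", text, ascii_text)
        keyboard.type(ascii_text)


kb = FakeKeyboard()
safe_type_text(kb, "café")
print(kb.typed)
```

With the `--ascii` option from the diff, the conversion is applied up front instead of waiting for the error, which avoids relying on the exception at all.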
@@ -1,7 +1,7 @@
|
|
|
1
1
|
Metadata-Version: 2.2
|
|
2
2
|
Name: scribe-cli
|
|
3
|
-
Version: 0.7.4
|
|
4
|
-
Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI
|
|
3
|
+
Version: 0.7.7
|
|
4
|
+
Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
|
|
5
5
|
Author-email: Mahé Perrette <mahe.perrette@gmail.com>
|
|
6
6
|
License: MIT License
|
|
7
7
|
|
|
@@ -44,34 +44,47 @@ Requires-Dist: sounddevice
|
|
|
44
44
|
Requires-Dist: tqdm
|
|
45
45
|
Requires-Dist: requests
|
|
46
46
|
Requires-Dist: pyperclip
|
|
47
|
-
Requires-Dist:
|
|
47
|
+
Requires-Dist: unidecode
|
|
48
48
|
Provides-Extra: keyboard
|
|
49
49
|
Requires-Dist: pynput; extra == "keyboard"
|
|
50
50
|
Provides-Extra: whisper
|
|
51
51
|
Requires-Dist: openai-whisper; extra == "whisper"
|
|
52
52
|
Provides-Extra: vosk
|
|
53
53
|
Requires-Dist: vosk; extra == "vosk"
|
|
54
|
+
Provides-Extra: app
|
|
55
|
+
Requires-Dist: pystray; extra == "app"
|
|
56
|
+
Requires-Dist: PyGObject; extra == "app"
|
|
54
57
|
Provides-Extra: all
|
|
55
58
|
Requires-Dist: pynput; extra == "all"
|
|
56
59
|
Requires-Dist: openai-whisper; extra == "all"
|
|
57
60
|
Requires-Dist: vosk; extra == "all"
|
|
61
|
+
Requires-Dist: pystray; extra == "all"
|
|
62
|
+
|
|
63
|
+
[]()
|
|
64
|
+
[](https://pypi.org/project/scribe-cli)
|
|
58
65
|
|
|
59
66
|
# Scribe
|
|
60
67
|
|
|
61
|
-
`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
|
|
68
|
+
`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
|
|
69
|
+
|
|
70
|
+
## Compatibility
|
|
71
|
+
|
|
72
|
+
In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
|
|
73
|
+
As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
|
|
74
|
+
A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
|
|
62
75
|
|
|
63
76
|
## Installation
|
|
64
77
|
|
|
65
|
-
Install PortAudio library. E.g. on Ubuntu:
|
|
78
|
+
Install PortAudio library and xclip library. E.g. on Ubuntu:
|
|
66
79
|
|
|
67
80
|
```bash
|
|
68
|
-
sudo apt-get install portaudio19-dev
|
|
81
|
+
sudo apt-get install portaudio19-dev xclip
|
|
69
82
|
```
|
|
70
83
|
|
|
71
|
-
The python dependencies should be dealt with automatically:
|
|
84
|
+
See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
|
|
72
85
|
|
|
73
86
|
```bash
|
|
74
|
-
pip install scribe-cli[all]
|
|
87
|
+
pip install scribe-cli[all]
|
|
75
88
|
```
|
|
76
89
|
|
|
77
90
|
(note the `-cli` suffix for client)
|
|
@@ -108,9 +121,8 @@ but it cannot do real-time, and depending on the model can have relatively long
|
|
|
108
121
|
With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
|
|
109
122
|
there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
|
|
110
123
|
|
|
111
|
-
The `vosk` backend is good at
|
|
112
|
-
|
|
113
|
-
Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
|
|
124
|
+
The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
|
|
125
|
+
It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
|
|
114
126
|
There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
|
|
115
127
|
|
|
116
128
|
To skip the initial selection menu you can do:
|
|
@@ -132,8 +144,7 @@ scribe --keyboard
|
|
|
132
144
|
```
|
|
133
145
|
|
|
134
146
|
It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
|
|
135
|
-
|
|
136
|
-
`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)).
|
|
147
|
+
Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
|
|
137
148
|
|
|
138
149
|
#### Use the keyboard in Ubuntu
|
|
139
150
|
|
|
@@ -141,14 +152,15 @@ In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in ch
|
|
|
141
152
|
|
|
142
153
|
One workaround is to use the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
|
|
143
154
|
|
|
144
|
-
Another workaround with Wayland is to use the low-level `uinput` backend but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
|
|
155
|
+
Another workaround while staying with Wayland is to use the low-level `uinput` backend of `pynput`, but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
|
|
156
|
+
Moreover, the keyboard must be set with an appropriate layout, for example to have the letter `é` you'd want a French or Italian layout otherwise the English will drop it or replace with something else. Another caveat I encountered is that the special characters (`é`) were inserted at the wrong place. Adding a small delay was enough to fix that with the additional parameter `--latency 0.01` Finally if you run as sudo you may need to reset some environment variable so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum-up, that gives something like:
|
|
145
157
|
```bash
|
|
146
158
|
sudo modprobe uinput
|
|
147
159
|
sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe) --latency 0.01
|
|
148
160
|
```
|
|
149
161
|
You're on the right path :)
|
|
150
162
|
|
|
151
|
-
### System
|
|
163
|
+
### System tray icon (experimental)
|
|
152
164
|
|
|
153
165
|
To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
|
|
154
166
|
To activate start with:
|
|
File without changes