scribe-cli 0.7.2__tar.gz → 0.7.4__tar.gz
- {scribe_cli-0.7.2/scribe_cli.egg-info → scribe_cli-0.7.4}/PKG-INFO +20 -4
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/README.md +19 -3
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe/_version.py +2 -2
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe/app.py +51 -31
- scribe_cli-0.7.4/scribe/keyboard.py +51 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe/models.py +6 -3
- scribe_cli-0.7.4/scribe/models.toml +23 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4/scribe_cli.egg-info}/PKG-INFO +20 -4
- scribe_cli-0.7.2/scribe/keyboard.py +0 -18
- scribe_cli-0.7.2/scribe/models.toml +0 -31
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/.github/workflows/pypi.yml +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/.gitignore +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/LICENSE +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/pyproject.toml +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe/__init__.py +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe/audio.py +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe/install_desktop.py +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe/saverecording.py +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe/testpynput.py +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe/util.py +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe_cli.egg-info/SOURCES.txt +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe_cli.egg-info/dependency_links.txt +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe_cli.egg-info/entry_points.txt +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe_cli.egg-info/requires.txt +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe_cli.egg-info/top_level.txt +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe_data/__init__.py +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe_data/share/icon.jpg +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/scribe_data/templates/scribe.desktop +0 -0
- {scribe_cli-0.7.2 → scribe_cli-0.7.4}/setup.cfg +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.2
|
|
2
2
|
Name: scribe-cli
|
|
3
|
-
Version: 0.7.
|
|
3
|
+
Version: 0.7.4
|
|
4
4
|
Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
|
|
5
5
|
Author-email: Mahé Perrette <mahe.perrette@gmail.com>
|
|
6
6
|
License: MIT License
|
|
@@ -87,8 +87,8 @@ pip install -e .[all]
|
|
|
87
87
|
You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
|
|
88
88
|
|
|
89
89
|
The `vosk` language models will download on-the-fly.
|
|
90
|
-
The default
|
|
91
|
-
|
|
90
|
+
The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
|
|
91
|
+
the default is left to the `openai-whisper` package and might change in the future).
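
A minimal sketch of the folder resolution described above, assuming only what this sentence and the `models.py` hunk further down show. The function name `default_model_folder` and its `environ` parameter are hypothetical, introduced here so the lookup can be exercised without touching the real environment.

```python
import os


def default_model_folder(backend, environ=None):
    """Return $XDG_CACHE_HOME/{backend}, falling back to $HOME/.cache/{backend}.

    `environ` is a hypothetical hook for testing; by default the real
    process environment is used.
    """
    env = os.environ if environ is None else environ
    home = env.get("HOME", os.path.expanduser("~"))
    cache_home = env.get("XDG_CACHE_HOME", os.path.join(home, ".cache"))
    return os.path.join(cache_home, backend)


if __name__ == "__main__":
    # Prints the folder that would be used for the vosk backend on this machine.
    print(default_model_folder("vosk"))
```

This also suggests why the `sudo` invocation shown later passes `HOME` through explicitly: without it, the fallback would resolve against root's home directory.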
|
|
92
92
|
|
|
93
93
|
|
|
94
94
|
## Usage
|
|
@@ -122,6 +122,9 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
|
|
|
122
122
|
### Virtual keyboard (experimental)
|
|
123
123
|
|
|
124
124
|
By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
|
|
125
|
+
However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
|
|
126
|
+
That can be achieved with the `--keyboard` option.
|
|
127
|
+
|
|
125
128
|
With the `--keyboard` option, `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
|
|
126
129
|
|
|
127
130
|
```bash
|
|
@@ -130,7 +133,20 @@ scribe --keyboard
|
|
|
130
133
|
|
|
131
134
|
It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
|
|
132
135
|
|
|
133
|
-
`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)).
|
|
136
|
+
`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
|
|
137
|
+
|
|
138
|
+
#### Use the keyboard in Ubuntu
|
|
139
|
+
|
|
140
|
+
On my Ubuntu + Wayland system, the keyboard simulation works out of the box in Chromium-based applications (including VS Code), but not in Firefox, Sublime Text, or any other application (not even in a terminal!). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
|
|
141
|
+
|
|
142
|
+
One workaround is to use the Xorg version of GNOME: in `/etc/gdm3/custom.conf`, uncomment `# WaylandEnable=false` and restart your computer.
|
|
143
|
+
|
|
144
|
+
Another workaround under Wayland is to use the low-level `uinput` backend, but that requires running `scribe` as root (sudo) and likely other configuration, such as activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make it persistent). Moreover, since `uinput` really only simulates key strokes, your keyboard must be set to an appropriate layout: for example, to get the letter `é` you'd want a French or Italian layout, otherwise the English layout will drop it or replace it with something else. Another caveat I encountered is that special characters were inserted at the wrong place; adding a small delay with the additional parameter `--latency 0.01` was enough to fix that. Finally, if you run with sudo you may need to reset some environment variables so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum up, that gives something like:
|
|
145
|
+
```bash
|
|
146
|
+
sudo modprobe uinput
|
|
147
|
+
sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe) --latency 0.01
|
|
148
|
+
```
|
|
149
|
+
You're on the right path :)
|
|
134
150
|
|
|
135
151
|
### System tray icon (experimental)
|
|
136
152
|
|
|
@@ -29,8 +29,8 @@ pip install -e .[all]
|
|
|
29
29
|
You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
|
|
30
30
|
|
|
31
31
|
The `vosk` language models will download on-the-fly.
|
|
32
|
-
The default
|
|
33
|
-
|
|
32
|
+
The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
|
|
33
|
+
the default is left to the `openai-whisper` package and might change in the future).
|
|
34
34
|
|
|
35
35
|
|
|
36
36
|
## Usage
|
|
@@ -64,6 +64,9 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
|
|
|
64
64
|
### Virtual keyboard (experimental)
|
|
65
65
|
|
|
66
66
|
By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
|
|
67
|
+
However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
|
|
68
|
+
That can be achieved with the `--keyboard` option.
|
|
69
|
+
|
|
67
70
|
With the `--keyboard` option, `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
|
|
68
71
|
|
|
69
72
|
```bash
|
|
@@ -72,7 +75,20 @@ scribe --keyboard
|
|
|
72
75
|
|
|
73
76
|
It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
|
|
74
77
|
|
|
75
|
-
`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)).
|
|
78
|
+
`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
|
|
79
|
+
|
|
80
|
+
#### Use the keyboard in Ubuntu
|
|
81
|
+
|
|
82
|
+
On my Ubuntu + Wayland system, the keyboard simulation works out of the box in Chromium-based applications (including VS Code), but not in Firefox, Sublime Text, or any other application (not even in a terminal!). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
|
|
83
|
+
|
|
84
|
+
One workaround is to use the Xorg version of GNOME: in `/etc/gdm3/custom.conf`, uncomment `# WaylandEnable=false` and restart your computer.
|
|
85
|
+
|
|
86
|
+
Another workaround under Wayland is to use the low-level `uinput` backend, but that requires running `scribe` as root (sudo) and likely other configuration, such as activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make it persistent). Moreover, since `uinput` really only simulates key strokes, your keyboard must be set to an appropriate layout: for example, to get the letter `é` you'd want a French or Italian layout, otherwise the English layout will drop it or replace it with something else. Another caveat I encountered is that special characters were inserted at the wrong place; adding a small delay with the additional parameter `--latency 0.01` was enough to fix that. Finally, if you run with sudo you may need to reset some environment variables so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum up, that gives something like:
|
|
87
|
+
```bash
|
|
88
|
+
sudo modprobe uinput
|
|
89
|
+
sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe) --latency 0.01
|
|
90
|
+
```
|
|
91
|
+
You're on the right path :)
|
|
76
92
|
|
|
77
93
|
### System tray icon (experimental)
|
|
78
94
|
|
|
@@ -3,7 +3,7 @@ import tomllib
|
|
|
3
3
|
import argparse
|
|
4
4
|
from scribe.audio import Microphone
|
|
5
5
|
from scribe.util import print_partial, clear_line, prompt_choices, check_dependencies, ansi_link, colored
|
|
6
|
-
from scribe.models import VoskTranscriber, WhisperTranscriber
|
|
6
|
+
from scribe.models import VoskTranscriber, WhisperTranscriber, StopRecording
|
|
7
7
|
|
|
8
8
|
with open(Path(__file__).parent / "models.toml", "rb") as f:
|
|
9
9
|
language_config_default = tomllib.load(f)
|
|
@@ -147,6 +147,7 @@ def get_parser():
|
|
|
147
147
|
parser.add_argument("--app", action="store_true", help="Start in app mode (relies on pystray)")
|
|
148
148
|
|
|
149
149
|
parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
|
|
150
|
+
parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
|
|
150
151
|
parser.add_argument("--keyboard", action="store_true")
|
|
151
152
|
parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
|
|
152
153
|
parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
|
|
@@ -163,36 +164,20 @@ def get_parser():
|
|
|
163
164
|
|
|
164
165
|
|
|
165
166
|
# Start recording
|
|
166
|
-
def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0):
|
|
167
|
+
def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, **greetings):
|
|
167
168
|
|
|
168
169
|
if keyboard:
|
|
169
|
-
|
|
170
|
-
from scribe.keyboard import type_text
|
|
171
|
-
except ImportError:
|
|
172
|
-
keyboard = False
|
|
173
|
-
print("Keyboard simulation is not available.")
|
|
174
|
-
return
|
|
175
|
-
|
|
170
|
+
from scribe.keyboard import type_text
|
|
176
171
|
print("\nChange focus to target app during transcription.")
|
|
177
172
|
|
|
178
173
|
|
|
179
174
|
if clipboard:
|
|
180
|
-
|
|
181
|
-
import pyperclip
|
|
182
|
-
except ImportError:
|
|
183
|
-
clipboard = False
|
|
184
|
-
print("Clipboard simulation is not available.")
|
|
185
|
-
return
|
|
186
|
-
|
|
175
|
+
import pyperclip
|
|
187
176
|
print("\nThe full transcription will be copied to clipboard as it becomes available.")
|
|
188
177
|
|
|
189
178
|
|
|
190
179
|
fulltext = ""
|
|
191
180
|
|
|
192
|
-
greetings = { k: v for k, v in language_config["_meta"].get(transcriber.language, {}).items()
|
|
193
|
-
if v is not None and k.startswith(("start", "stop"))
|
|
194
|
-
}
|
|
195
|
-
|
|
196
181
|
for result in transcriber.start_recording(micro, **greetings):
|
|
197
182
|
|
|
198
183
|
if result.get('text'):
|
|
@@ -212,6 +197,23 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
|
|
|
212
197
|
print("Copied to clipboard.")
|
|
213
198
|
|
|
214
199
|
|
|
200
|
+
def interrupt_app_thread(icon):
|
|
201
|
+
"""Thanks Le Chat for this solution: https://stackoverflow.com/a/325528/2192272
|
|
202
|
+
"""
|
|
203
|
+
import ctypes
|
|
204
|
+
thread = icon._recording_thread
|
|
205
|
+
# Raise an exception in the thread using ctypes
|
|
206
|
+
thread_id = thread.ident
|
|
207
|
+
if thread_id is not None:
|
|
208
|
+
res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
|
|
209
|
+
ctypes.c_long(thread_id),
|
|
210
|
+
ctypes.py_object(StopRecording)
|
|
211
|
+
)
|
|
212
|
+
if res > 1:
|
|
213
|
+
ctypes.pythonapi.PyThreadState_SetAsyncExc(thread_id, 0)
|
|
214
|
+
print("Failure to raise exception in thread")
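
The mechanism used by `interrupt_app_thread` can be tried in isolation. The following is a sketch only, assuming CPython (the call is a CPython C-API function reached through `ctypes`); the helper `raise_in_thread` and the `worker` loop are hypothetical stand-ins, not code from the package. Note that the exception is only delivered once the target thread executes Python bytecode again, so a thread blocked in a long C call will not stop immediately.

```python
import ctypes
import threading
import time


class StopRecording(Exception):
    """Stand-in for the exception class added in scribe/models.py."""


def raise_in_thread(thread, exc_type):
    """Ask CPython to raise `exc_type` asynchronously inside `thread`.

    Returns True if exactly one thread state was modified.
    """
    if thread.ident is None:  # thread was never started
        return False
    tid = ctypes.c_ulong(thread.ident)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, ctypes.py_object(exc_type))
    if res > 1:
        # More than one thread state was affected: undo the request.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        return False
    return res == 1


def worker(state):
    """Hypothetical recording loop that runs until it is interrupted."""
    try:
        while True:
            time.sleep(0.01)
    except StopRecording:
        state["stopped"] = True


if __name__ == "__main__":
    state = {}
    t = threading.Thread(target=worker, args=(state,), daemon=True)
    t.start()
    time.sleep(0.05)
    raise_in_thread(t, StopRecording)
    t.join(timeout=2)
    print("stopped:", state.get("stopped", False))
```

The sketch passes the same wrapped thread id to both calls and uses `None` to clear a pending request, which is one way to keep the argument types consistent between the two invocations.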
|
|
215
|
+
|
|
216
|
+
|
|
215
217
|
def create_app(micro, transcriber, **kwargs):
|
|
216
218
|
import pystray
|
|
217
219
|
from pystray import Menu as pystrayMenu, MenuItem as Item
|
|
@@ -219,26 +221,38 @@ def create_app(micro, transcriber, **kwargs):
|
|
|
219
221
|
import PIL.ImageOps
|
|
220
222
|
|
|
221
223
|
import scribe_data
|
|
224
|
+
import threading
|
|
222
225
|
|
|
223
226
|
# Load an image from a file
|
|
224
227
|
image = Image.open(Path(scribe_data.__file__).parent / "share" / "icon.jpg")
|
|
225
228
|
|
|
226
229
|
def callback_quit(icon, item):
|
|
227
230
|
icon.visible = False
|
|
231
|
+
## Here we need to stop the recording thread
|
|
232
|
+
callback_stop_recording(icon, item)
|
|
228
233
|
icon.stop()
|
|
229
234
|
|
|
235
|
+
def callback_stop_recording(icon, item):
|
|
236
|
+
## Here we need to stop the recording thread
|
|
237
|
+
interrupt_app_thread(icon)
|
|
238
|
+
icon._recording_thread.join()
|
|
239
|
+
|
|
230
240
|
def callback_record(icon, item):
|
|
231
|
-
|
|
232
|
-
|
|
233
|
-
|
|
234
|
-
|
|
235
|
-
|
|
236
|
-
|
|
237
|
-
|
|
241
|
+
icon._recording_thread = threading.Thread(target=start_recording, args=(micro, transcriber), kwargs=kwargs)
|
|
242
|
+
icon._recording_thread.start()
|
|
243
|
+
|
|
244
|
+
def is_recording(item):
|
|
245
|
+
return hasattr(icon, "_recording_thread") and icon._recording_thread.is_alive()
|
|
246
|
+
|
|
247
|
+
def is_not_recording(item):
|
|
248
|
+
return not is_recording(item)
|
|
249
|
+
|
|
238
250
|
|
|
239
251
|
# Create a menu
|
|
240
252
|
menu = pystrayMenu(
|
|
241
|
-
Item('Record', callback_record),
|
|
253
|
+
# Item('Record', callback_record),
|
|
254
|
+
Item("Record", callback_record, visible=is_not_recording),
|
|
255
|
+
Item("Stop", callback_stop_recording, visible=is_recording),
|
|
242
256
|
Item('Quit', callback_quit),
|
|
243
257
|
)
|
|
244
258
|
|
|
@@ -255,7 +269,7 @@ def main(args=None):
|
|
|
255
269
|
|
|
256
270
|
|
|
257
271
|
# Set up the microphone for recording
|
|
258
|
-
micro = Microphone(samplerate=o.samplerate)
|
|
272
|
+
micro = Microphone(samplerate=o.samplerate, device=o.microphone_device)
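
A small sketch of how the new `--microphone-device` flag reaches this call, using only `argparse`. The parser below is a reduced, hypothetical copy of `get_parser()` (the `prog` name is made up); it shows that the dashed option name becomes the attribute `microphone_device` and is `None` when the flag is omitted.

```python
import argparse


def build_parser():
    """Reduced, hypothetical version of the parser with the two relevant options."""
    parser = argparse.ArgumentParser(prog="scribe-sketch")
    parser.add_argument("--samplerate", default=16000, type=int)
    parser.add_argument("--microphone-device", type=int,
                        help="The device index of the microphone to use.")
    return parser


if __name__ == "__main__":
    o = build_parser().parse_args(["--microphone-device", "2"])
    # argparse converts the dash to an underscore in the attribute name.
    print(o.samplerate, o.microphone_device)
```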
|
|
259
273
|
|
|
260
274
|
transcriber = None
|
|
261
275
|
|
|
@@ -312,11 +326,17 @@ def main(args=None):
|
|
|
312
326
|
continue
|
|
313
327
|
|
|
314
328
|
if o.app:
|
|
315
|
-
|
|
329
|
+
greetings = dict(
|
|
330
|
+
start_message = "Listening... Use the try icon menu to stop.",
|
|
331
|
+
)
|
|
332
|
+
app = create_app(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency, **greetings)
|
|
316
333
|
print("Starting app...")
|
|
317
334
|
app.run()
|
|
318
335
|
else:
|
|
319
|
-
|
|
336
|
+
greetings = dict(
|
|
337
|
+
start_message = "Listening... Press Ctrl+C to stop.",
|
|
338
|
+
)
|
|
339
|
+
start_recording(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency, **greetings)
|
|
320
340
|
|
|
321
341
|
# if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
|
|
322
342
|
# So we leave the wider range of options to change the model.
|
|
@@ -0,0 +1,51 @@
|
|
|
1
|
+
"""This module handles typing characters as if they were typed on a keyboard.
|
|
2
|
+
"""
|
|
3
|
+
import platform
|
|
4
|
+
import time
|
|
5
|
+
|
|
6
|
+
try:
|
|
7
|
+
# import pyautogui
|
|
8
|
+
from pynput.keyboard import Controller, Key
|
|
9
|
+
|
|
10
|
+
except ImportError:
|
|
11
|
+
print("Please install pynput to use the keyboard feature.")
|
|
12
|
+
raise
|
|
13
|
+
|
|
14
|
+
# Create a keyboard controller
|
|
15
|
+
keyboard = Controller()
|
|
16
|
+
|
|
17
|
+
def paste_text():
|
|
18
|
+
"""This does not work with the uinput backend
|
|
19
|
+
"""
|
|
20
|
+
os_name = platform.system()
|
|
21
|
+
|
|
22
|
+
if os_name == "Darwin": # macOS
|
|
23
|
+
with keyboard.pressed(Key.cmd):
|
|
24
|
+
keyboard.press('v')
|
|
25
|
+
keyboard.release('v')
|
|
26
|
+
|
|
27
|
+
else: # Windows and Linux
|
|
28
|
+
keyboard.press(Key.ctrl)
|
|
29
|
+
keyboard.press('v')
|
|
30
|
+
keyboard.release('v')
|
|
31
|
+
keyboard.release(Key.ctrl)
|
|
32
|
+
|
|
33
|
+
def type_text(text, interval=0, paste=False):
|
|
34
|
+
# Simulate typing a string
|
|
35
|
+
# import subprocess
|
|
36
|
+
# subprocess.run(["ydotool", "type", text])
|
|
37
|
+
|
|
38
|
+
if paste:
|
|
39
|
+
import pyperclip
|
|
40
|
+
keep_state = pyperclip.paste()
|
|
41
|
+
pyperclip.copy(text)
|
|
42
|
+
paste_text()
|
|
43
|
+
pyperclip.copy(keep_state)
|
|
44
|
+
return
|
|
45
|
+
|
|
46
|
+
if interval > 0:
|
|
47
|
+
for c in text:
|
|
48
|
+
keyboard.type(c)
|
|
49
|
+
time.sleep(interval)
|
|
50
|
+
else:
|
|
51
|
+
keyboard.type(text)
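
The per-character branch of `type_text` can be illustrated without `pynput` installed. This is a sketch under stated assumptions: `type_text_sketch`, `send`, and `sleep` are hypothetical names, with `send` standing in for `keyboard.type` so the pacing logic can be checked on its own.

```python
import time


def type_text_sketch(text, send, interval=0, sleep=time.sleep):
    """Send `text` through `send`, optionally pausing after each character.

    `send` stands in for pynput's `Controller.type`; `sleep` is injectable
    so the pauses can be observed in a test.
    """
    if interval > 0:
        # One call per character, with a pause after each one.
        for c in text:
            send(c)
            sleep(interval)
    else:
        # No latency requested: hand the whole string over at once.
        send(text)


if __name__ == "__main__":
    type_text_sketch("hello", lambda s: print(repr(s)), interval=0.01)
```

Sending characters one at a time with a short pause corresponds to the `--latency 0.01` setting that the README reports as enough to fix misplaced special characters with the `uinput` backend.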
|
|
@@ -12,9 +12,12 @@ def is_silent(data, silence_thresh=-40):
|
|
|
12
12
|
"""
|
|
13
13
|
return calculate_decibels(data) < silence_thresh
|
|
14
14
|
|
|
15
|
-
|
|
16
|
-
|
|
15
|
+
HOME = os.environ.get('HOME', os.path.expanduser('~'))
|
|
16
|
+
XDG_CACHE_HOME = os.environ.get('XDG_CACHE_HOME', os.path.join(HOME, '.cache'))
|
|
17
|
+
VOSK_MODELS_FOLDER = os.path.join(XDG_CACHE_HOME, "vosk")
|
|
17
18
|
|
|
19
|
+
class StopRecording(Exception):
|
|
20
|
+
pass
|
|
18
21
|
|
|
19
22
|
class AbstractTranscriber:
|
|
20
23
|
backend = None
|
|
@@ -85,7 +88,7 @@ class AbstractTranscriber:
|
|
|
85
88
|
if self.is_overtime():
|
|
86
89
|
raise KeyboardInterrupt("Overtime: {:.2f} seconds".format(self.get_elapsed()))
|
|
87
90
|
|
|
88
|
-
except KeyboardInterrupt:
|
|
91
|
+
except (KeyboardInterrupt, StopRecording):
|
|
89
92
|
pass
|
|
90
93
|
|
|
91
94
|
finally:
|
|
@@ -0,0 +1,23 @@
|
|
|
1
|
+
[vosk.en]
|
|
2
|
+
model = "vosk-model-en-us-0.42-gigaspeech"
|
|
3
|
+
|
|
4
|
+
[vosk.fr]
|
|
5
|
+
model = "vosk-model-fr-0.22"
|
|
6
|
+
|
|
7
|
+
[vosk.de]
|
|
8
|
+
model = "vosk-model-de-tuda-0.6-900k"
|
|
9
|
+
|
|
10
|
+
[vosk.it]
|
|
11
|
+
model = "vosk-model-it-0.22"
|
|
12
|
+
|
|
13
|
+
[_meta.en]
|
|
14
|
+
language = "English (US)"
|
|
15
|
+
|
|
16
|
+
[_meta.fr]
|
|
17
|
+
language = "French"
|
|
18
|
+
|
|
19
|
+
[_meta.de]
|
|
20
|
+
language = "German"
|
|
21
|
+
|
|
22
|
+
[_meta.it]
|
|
23
|
+
language = "Italian"
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.2
|
|
2
2
|
Name: scribe-cli
|
|
3
|
-
Version: 0.7.
|
|
3
|
+
Version: 0.7.4
|
|
4
4
|
Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
|
|
5
5
|
Author-email: Mahé Perrette <mahe.perrette@gmail.com>
|
|
6
6
|
License: MIT License
|
|
@@ -87,8 +87,8 @@ pip install -e .[all]
|
|
|
87
87
|
You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
|
|
88
88
|
|
|
89
89
|
The `vosk` language models will download on-the-fly.
|
|
90
|
-
The default
|
|
91
|
-
|
|
90
|
+
The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
|
|
91
|
+
the default is left to the `openai-whisper` package and might change in the future).
|
|
92
92
|
|
|
93
93
|
|
|
94
94
|
## Usage
|
|
@@ -122,6 +122,9 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
|
|
|
122
122
|
### Virtual keyboard (experimental)
|
|
123
123
|
|
|
124
124
|
By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
|
|
125
|
+
However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
|
|
126
|
+
That can be achieved with the `--keyboard` option.
|
|
127
|
+
|
|
125
128
|
With the `--keyboard` option, `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
|
|
126
129
|
|
|
127
130
|
```bash
|
|
@@ -130,7 +133,20 @@ scribe --keyboard
|
|
|
130
133
|
|
|
131
134
|
It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
|
|
132
135
|
|
|
133
|
-
`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)).
|
|
136
|
+
`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
|
|
137
|
+
|
|
138
|
+
#### Use the keyboard in Ubuntu
|
|
139
|
+
|
|
140
|
+
On my Ubuntu + Wayland system, the keyboard simulation works out of the box in Chromium-based applications (including VS Code), but not in Firefox, Sublime Text, or any other application (not even in a terminal!). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
|
|
141
|
+
|
|
142
|
+
One workaround is to use the Xorg version of GNOME: in `/etc/gdm3/custom.conf`, uncomment `# WaylandEnable=false` and restart your computer.
|
|
143
|
+
|
|
144
|
+
Another workaround under Wayland is to use the low-level `uinput` backend, but that requires running `scribe` as root (sudo) and likely other configuration, such as activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make it persistent). Moreover, since `uinput` really only simulates key strokes, your keyboard must be set to an appropriate layout: for example, to get the letter `é` you'd want a French or Italian layout, otherwise the English layout will drop it or replace it with something else. Another caveat I encountered is that special characters were inserted at the wrong place; adding a small delay with the additional parameter `--latency 0.01` was enough to fix that. Finally, if you run with sudo you may need to reset some environment variables so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum up, that gives something like:
|
|
145
|
+
```bash
|
|
146
|
+
sudo modprobe uinput
|
|
147
|
+
sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe) --latency 0.01
|
|
148
|
+
```
|
|
149
|
+
You're on the right path :)
|
|
134
150
|
|
|
135
151
|
### System tray icon (experimental)
|
|
136
152
|
|
|
@@ -1,18 +0,0 @@
|
|
|
1
|
-
"""This module handles typing characters as if they were typed on a keyboard.
|
|
2
|
-
"""
|
|
3
|
-
try:
|
|
4
|
-
# import pyautogui
|
|
5
|
-
from pynput.keyboard import Controller
|
|
6
|
-
|
|
7
|
-
except ImportError:
|
|
8
|
-
print("Please install pynput to use the keyboard feature.")
|
|
9
|
-
raise
|
|
10
|
-
|
|
11
|
-
# Create a keyboard controller
|
|
12
|
-
keyboard = Controller()
|
|
13
|
-
|
|
14
|
-
def type_text(text, interval=0):
|
|
15
|
-
# Simulate typing a string
|
|
16
|
-
# import subprocess
|
|
17
|
-
# subprocess.run(["ydotool", "type", text])
|
|
18
|
-
keyboard.type(text)
|
|
@@ -1,31 +0,0 @@
|
|
|
1
|
-
[vosk.en]
|
|
2
|
-
model = "vosk-model-en-us-0.42-gigaspeech"
|
|
3
|
-
|
|
4
|
-
[vosk.fr]
|
|
5
|
-
model = "vosk-model-fr-0.22"
|
|
6
|
-
|
|
7
|
-
[vosk.de]
|
|
8
|
-
model = "vosk-model-de-tuda-0.6-900k"
|
|
9
|
-
|
|
10
|
-
[vosk.it]
|
|
11
|
-
model = "vosk-model-it-0.22"
|
|
12
|
-
|
|
13
|
-
[_meta.en]
|
|
14
|
-
language = "English (US)"
|
|
15
|
-
start_message = "Listening... Press Ctrl+C to stop."
|
|
16
|
-
stop_message = "Recording stopped."
|
|
17
|
-
|
|
18
|
-
[_meta.fr]
|
|
19
|
-
language = "French"
|
|
20
|
-
start_message = "En écoute... Appuyez sur Ctrl+C pour arrêter."
|
|
21
|
-
stop_message = "Écoute arrêtée."
|
|
22
|
-
|
|
23
|
-
[_meta.de]
|
|
24
|
-
language = "German"
|
|
25
|
-
start_message = "Hören... Drücken Sie Strg+C, um zu stoppen."
|
|
26
|
-
stop_message = "Aufnahme gestoppt."
|
|
27
|
-
|
|
28
|
-
[_meta.it]
|
|
29
|
-
language = "Italian"
|
|
30
|
-
start_message = "In ascolto... Premere Ctrl+C per interrompere."
|
|
31
|
-
stop_message = "Registrazione interrotta."
|
|
File without changes