scribe-cli 0.7.3__tar.gz → 0.7.6__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (29)
  1. {scribe_cli-0.7.3/scribe_cli.egg-info → scribe_cli-0.7.6}/PKG-INFO +40 -13
  2. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/README.md +34 -10
  3. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/pyproject.toml +9 -3
  4. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe/_version.py +2 -2
  5. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe/app.py +13 -22
  6. scribe_cli-0.7.6/scribe/keyboard.py +51 -0
  7. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe/models.py +3 -2
  8. scribe_cli-0.7.6/scribe/models.toml +23 -0
  9. {scribe_cli-0.7.3 → scribe_cli-0.7.6/scribe_cli.egg-info}/PKG-INFO +40 -13
  10. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe_cli.egg-info/requires.txt +5 -1
  11. scribe_cli-0.7.3/scribe/keyboard.py +0 -18
  12. scribe_cli-0.7.3/scribe/models.toml +0 -31
  13. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/.github/workflows/pypi.yml +0 -0
  14. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/.gitignore +0 -0
  15. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/LICENSE +0 -0
  16. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe/__init__.py +0 -0
  17. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe/audio.py +0 -0
  18. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe/install_desktop.py +0 -0
  19. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe/saverecording.py +0 -0
  20. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe/testpynput.py +0 -0
  21. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe/util.py +0 -0
  22. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe_cli.egg-info/SOURCES.txt +0 -0
  23. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe_cli.egg-info/dependency_links.txt +0 -0
  24. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe_cli.egg-info/entry_points.txt +0 -0
  25. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe_cli.egg-info/top_level.txt +0 -0
  26. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe_data/__init__.py +0 -0
  27. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe_data/share/icon.jpg +0 -0
  28. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe_data/templates/scribe.desktop +0 -0
  29. {scribe_cli-0.7.3 → scribe_cli-0.7.6}/setup.cfg +0 -0
@@ -1,7 +1,7 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.7.3
- Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+ Version: 0.7.6
+ Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
  
@@ -44,31 +44,43 @@ Requires-Dist: sounddevice
  Requires-Dist: tqdm
  Requires-Dist: requests
  Requires-Dist: pyperclip
- Requires-Dist: pystray
  Provides-Extra: keyboard
  Requires-Dist: pynput; extra == "keyboard"
  Provides-Extra: whisper
  Requires-Dist: openai-whisper; extra == "whisper"
  Provides-Extra: vosk
  Requires-Dist: vosk; extra == "vosk"
+ Provides-Extra: app
+ Requires-Dist: pystray; extra == "app"
+ Requires-Dist: PyGObject; extra == "app"
  Provides-Extra: all
  Requires-Dist: pynput; extra == "all"
  Requires-Dist: openai-whisper; extra == "all"
  Requires-Dist: vosk; extra == "all"
+ Requires-Dist: pystray; extra == "all"
+
+ [![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
+ [![pypi](https://github.com/perrette/scribe/actions/workflows/pypi.yml/badge.svg)](https://pypi.org/project/papers-cli)
  
  # Scribe
  
- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
+ `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+
+ ## Compatibility
+
+ In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
+ As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
+ A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
  
  ## Installation
  
- Install PortAudio library. E.g. on Ubuntu:
+ Install PortAudio library and xclip library. E.g. on Ubuntu:
  
  ```bash
- sudo apt-get install portaudio19-dev
+ sudo apt-get install portaudio19-dev xclip
  ```
  
- The python dependencies should be dealt with automatically:
+ See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
  
  ```bash
  pip install scribe-cli[all]"
@@ -87,8 +99,8 @@ pip install -e .[all]
  You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
  
  The `vosk` language models will download on-the-fly.
- The default data folder is `$HOME/.local/share/vosk/language-models`.
- This can be modified.
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+ the default is left to the `openai-whisper` package and might change in the future).
  
  
  ## Usage
@@ -108,8 +120,7 @@ but it cannot do real-time, and depending on the model can have relatively long
  With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
  there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
  
- The `vosk` backend is good at
- doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+ The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
  Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
  
@@ -122,6 +133,9 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
  ### Virtual keyboard (experimental)
  
  By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+ However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
+ That can be achieve with the `--keyboard` option.
+
  With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
  
  ```bash
@@ -129,10 +143,23 @@ scribe --keyboard
  ```
  
  It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
+ Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
+
+ #### Use the keyboard in Ubuntu
  
- `pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
+ In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
+
+ One workaround is to use the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
+
+ Another workaround while staying with Wayland is to use the low-level `uinput` backend of `pynput`, but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
+ Moreover, the keyboard must be set with an appropriate layout, for example to have the letter `é` you'd want a French or Italian layout otherwise the English will drop it or replace with something else. Another caveat I encountered is that the special characters (`é`) were inserted at the wrong place. Adding a small delay was enough to fix that with the additional parameter `--latency 0.01` Finally if you run as sudo you may need to reset some environment variable so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum-up, that gives something like:
+ ```bash
+ sudo modprobe uinput
+ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe) --latency 0.01
+ ```
+ You're on the right path :)
  
- ### System try icon (experimental)
+ ### System tray icon (experimental)
  
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
  To activate start with:
@@ -1,16 +1,25 @@
+ [![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
+ [![pypi](https://github.com/perrette/scribe/actions/workflows/pypi.yml/badge.svg)](https://pypi.org/project/papers-cli)
+
  # Scribe
  
- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
+ `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+
+ ## Compatibility
+
+ In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
+ As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
+ A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
  
  ## Installation
  
- Install PortAudio library. E.g. on Ubuntu:
+ Install PortAudio library and xclip library. E.g. on Ubuntu:
  
  ```bash
- sudo apt-get install portaudio19-dev
+ sudo apt-get install portaudio19-dev xclip
  ```
  
- The python dependencies should be dealt with automatically:
+ See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
  
  ```bash
  pip install scribe-cli[all]"
@@ -29,8 +38,8 @@ pip install -e .[all]
  You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
  
  The `vosk` language models will download on-the-fly.
- The default data folder is `$HOME/.local/share/vosk/language-models`.
- This can be modified.
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+ the default is left to the `openai-whisper` package and might change in the future).
  
  
  ## Usage
@@ -50,8 +59,7 @@ but it cannot do real-time, and depending on the model can have relatively long
  With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
  there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
  
- The `vosk` backend is good at
- doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+ The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
  Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
  
@@ -64,6 +72,9 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
  ### Virtual keyboard (experimental)
  
  By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+ However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
+ That can be achieve with the `--keyboard` option.
+
  With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
  
  ```bash
@@ -71,10 +82,23 @@ scribe --keyboard
  ```
  
  It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
+ Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
+
+ #### Use the keyboard in Ubuntu
  
- `pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
+ In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
+
+ One workaround is to use the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
+
+ Another workaround while staying with Wayland is to use the low-level `uinput` backend of `pynput`, but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
+ Moreover, the keyboard must be set with an appropriate layout, for example to have the letter `é` you'd want a French or Italian layout otherwise the English will drop it or replace with something else. Another caveat I encountered is that the special characters (`é`) were inserted at the wrong place. Adding a small delay was enough to fix that with the additional parameter `--latency 0.01` Finally if you run as sudo you may need to reset some environment variable so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum-up, that gives something like:
+ ```bash
+ sudo modprobe uinput
+ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe) --latency 0.01
+ ```
+ You're on the right path :)
  
- ### System try icon (experimental)
+ ### System tray icon (experimental)
  
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
  To activate start with:
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
  [project]
  name = "scribe-cli"
  dynamic = ["version"]
- description = "scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI."
+ description = "scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer"
  authors = [
  { name="Mahé Perrette", email="mahe.perrette@gmail.com" }
  ]
@@ -18,9 +18,7 @@ dependencies = [
  "tqdm",
  "requests",
  "pyperclip",
- "pystray",
  ]
- optional-dependencies = { keyboard = ["pynput"], whisper = ["openai-whisper"], vosk = ["vosk"], all = ["pynput", "openai-whisper", "vosk"] }
  
  classifiers = [
  "Programming Language :: Python :: 3",
@@ -39,6 +37,14 @@ keywords = [
  "clipboard",
  ]
  
+ [project.optional-dependencies]
+ keyboard = ["pynput"]
+ whisper = ["openai-whisper"]
+ vosk = ["vosk"]
+ app = ["pystray", "PyGObject"]
+ all = ["pynput", "openai-whisper", "vosk", "pystray"]
+
+
  [tool.setuptools]
  packages = [ "scribe", "scribe_data" ]
  
@@ -12,5 +12,5 @@ __version__: str
  __version_tuple__: VERSION_TUPLE
  version_tuple: VERSION_TUPLE
  
- __version__ = version = '0.7.3'
- __version_tuple__ = version_tuple = (0, 7, 3)
+ __version__ = version = '0.7.6'
+ __version_tuple__ = version_tuple = (0, 7, 6)
@@ -147,6 +147,7 @@ def get_parser():
      parser.add_argument("--app", action="store_true", help="Start in app mode (relies on pystray)")
  
      parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
+     parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
      parser.add_argument("--keyboard", action="store_true")
      parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
      parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
@@ -163,36 +164,20 @@ def get_parser():
  
  
  # Commencer l'enregistrement
- def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0):
+ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, **greetings):
  
      if keyboard:
-         try:
-             from scribe.keyboard import type_text
-         except ImportError:
-             keyboard = False
-             print("Keyboard simulation is not available.")
-             return
-
+         from scribe.keyboard import type_text
          print("\nChange focus to target app during transcription.")
  
  
      if clipboard:
-         try:
-             import pyperclip
-         except ImportError:
-             clipboard = False
-             print("Clipboard simulation is not available.")
-             return
-
+         import pyperclip
          print("\nThe full transcription will be copied to clipboard as it becomes available.")
  
  
      fulltext = ""
  
-     greetings = { k: v for k, v in language_config["_meta"].get(transcriber.language, {}).items()
-         if v is not None and k.startswith(("start", "stop"))
-     }
-
      for result in transcriber.start_recording(micro, **greetings):
  
          if result.get('text'):
@@ -284,7 +269,7 @@ def main(args=None):
  
  
      # Set up the microphone for recording
-     micro = Microphone(samplerate=o.samplerate)
+     micro = Microphone(samplerate=o.samplerate, device=o.microphone_device)
  
      transcriber = None
  
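The `--microphone-device` flag added in the `get_parser()` hunk takes an integer index and is forwarded to `Microphone` as `device=`. A minimal sketch of the parsing side, using only `argparse`: `build_parser` is a hypothetical, trimmed-down stand-in for the real `get_parser()`, which defines many more options. Because the flag has no default, the attribute is `None` when it is omitted; how `Microphone` treats `None` is not shown in this diff.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for scribe's get_parser(); only two options are kept.
    parser = argparse.ArgumentParser(prog="scribe")
    parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
    parser.add_argument("--microphone-device", type=int,
                        help="The device index of the microphone to use.")
    return parser


# argparse exposes "--microphone-device" as the attribute "microphone_device".
with_device = build_parser().parse_args(["--microphone-device", "2"])
without_device = build_parser().parse_args([])

print(with_device.microphone_device)
print(without_device.microphone_device)
```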
@@ -341,11 +326,17 @@ def main(args=None):
              continue
  
          if o.app:
-             app = create_app(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency)
+             greetings = dict(
+                 start_message = "Listening... Use the try icon menu to stop.",
+             )
+             app = create_app(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency, **greetings)
              print("Starting app...")
              app.run()
          else:
-             start_recording(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency)
+             greetings = dict(
+                 start_message = "Listening... Press Ctrl+C to stop.",
+             )
+             start_recording(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency, **greetings)
  
          # if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
          # So we leave the wider range of options to change the model.
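Taken together, the `app.py` hunks move the start/stop messages out of `models.toml` and into the caller, which now passes them through `**greetings`. A minimal sketch of that forwarding pattern follows. `fake_transcriber_recording` is a hypothetical stand-in for `transcriber.start_recording`, and what the real transcriber does with `start_message` is an assumption, since it is not shown in this diff.

```python
def fake_transcriber_recording(micro, start_message=None, stop_message=None):
    # Hypothetical stand-in for transcriber.start_recording(micro, **greetings).
    if start_message:
        yield {"message": start_message}
    yield {"text": "hello world"}
    if stop_message:
        yield {"message": stop_message}


def start_recording(micro, clipboard=True, keyboard=False, latency=0, **greetings):
    # Unknown keyword arguments are collected in `greetings` and forwarded untouched.
    return list(fake_transcriber_recording(micro, **greetings))


cli_results = start_recording(None, start_message="Listening... Press Ctrl+C to stop.")
silent_results = start_recording(None)
```

The caller decides which message fits its mode (terminal or tray icon), and the recording function no longer needs to look anything up per language.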
@@ -0,0 +1,51 @@
+ """This module handles typing characters as if they were typed on a keyboard.
+ """
+ import platform
+ import time
+
+ try:
+     # import pyautogui
+     from pynput.keyboard import Controller, Key
+
+ except ImportError:
+     print("Please install pynput to use the keyboard feature.")
+     raise
+
+ # Create a keyboard controller
+ keyboard = Controller()
+
+ def paste_text():
+     """This does not work with the uinput backend
+     """
+     os_name = platform.system()
+
+     if os_name == "Darwin": # macOS
+         with keyboard.pressed(Key.cmd):
+             keyboard.press('v')
+             keyboard.release('v')
+
+     else: # Windows and Linux
+         keyboard.press(Key.ctrl)
+         keyboard.press('v')
+         keyboard.release('v')
+         keyboard.release(Key.ctrl)
+
+ def type_text(text, interval=0, paste=False):
+     # Simulate typing a string
+     # import subprocess
+     # subprocess.run(["ydotool", "type", text])
+
+     if paste:
+         import pyperclip
+         keep_state = pyperclip.paste()
+         pyperclip.copy(text)
+         paste_text()
+         pyperclip.copy(keep_state)
+         return
+
+     if interval > 0:
+         for c in text:
+             keyboard.type(c)
+             time.sleep(interval)
+     else:
+         keyboard.type(text)
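The new `type_text` types one character at a time with a pause when `interval > 0`, and sends the whole string in a single call otherwise. That branching can be checked without a display server by injecting the controller. In the sketch below, `RecordingController` and the `controller` and `sleep` parameters are hypothetical additions for illustration; the packaged function uses a module-level `pynput` `Controller` and `time.sleep` directly.

```python
import time


class RecordingController:
    """Hypothetical stand-in for pynput.keyboard.Controller that records calls."""

    def __init__(self):
        self.typed = []

    def type(self, text):
        self.typed.append(text)


def type_text(text, controller, interval=0, sleep=time.sleep):
    # Same branching as the packaged function, with dependencies injected.
    if interval > 0:
        for c in text:
            controller.type(c)
            sleep(interval)
    else:
        controller.type(text)


slow = RecordingController()
pauses = []
# "caf\u00e9" is "café" with a precomposed e-acute, i.e. four characters.
type_text("caf\u00e9", slow, interval=0.01, sleep=pauses.append)

fast = RecordingController()
type_text("caf\u00e9", fast)
```

With an interval, each character is a separate `type` call followed by a pause, which matches the README's note that a small `--latency` fixed misplaced special characters.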
@@ -12,8 +12,9 @@ def is_silent(data, silence_thresh=-40):
      """
      return calculate_decibels(data) < silence_thresh
  
- VOSK_MODELS_FOLDER = os.path.join(os.environ.get("HOME"),
-     ".local/share/vosk/language-models")
+ HOME = os.environ.get('HOME', os.path.expanduser('~'))
+ XDG_CACHE_HOME = os.environ.get('XDG_CACHE_HOME', os.path.join(HOME, '.cache'))
+ VOSK_MODELS_FOLDER = os.path.join(XDG_CACHE_HOME, "vosk")
  
  class StopRecording(Exception):
      pass
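The `models.py` hunk replaces the hard-coded `$HOME/.local/share/...` path with a fallback chain: `XDG_CACHE_HOME` if set, else `$HOME/.cache`, with `~` expansion as a last resort for `HOME`. A small sketch of the same chain as a function over an explicit environment mapping; `vosk_models_folder` and its `env` parameter are hypothetical, since the package computes module-level constants from `os.environ`.

```python
import os


def vosk_models_folder(env):
    # Mirrors the fallback chain of the hunk, reading from a plain dict.
    home = env.get("HOME", os.path.expanduser("~"))
    xdg_cache_home = env.get("XDG_CACHE_HOME", os.path.join(home, ".cache"))
    return os.path.join(xdg_cache_home, "vosk")


default_folder = vosk_models_folder({"HOME": "/home/alice"})
custom_folder = vosk_models_folder(
    {"HOME": "/home/alice", "XDG_CACHE_HOME": "/data/cache"}
)
```

One caveat: the XDG Base Directory specification also asks for the default when the variable is set but empty, which a plain `.get` with a default does not cover.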
@@ -0,0 +1,23 @@
+ [vosk.en]
+ model = "vosk-model-en-us-0.42-gigaspeech"
+
+ [vosk.fr]
+ model = "vosk-model-fr-0.22"
+
+ [vosk.de]
+ model = "vosk-model-de-tuda-0.6-900k"
+
+ [vosk.it]
+ model = "vosk-model-it-0.22"
+
+ [_meta.en]
+ language = "English (US)"
+
+ [_meta.fr]
+ language = "French"
+
+ [_meta.de]
+ language = "German"
+
+ [_meta.it]
+ language = "Italian"
@@ -1,7 +1,7 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.7.3
- Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+ Version: 0.7.6
+ Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
  
@@ -44,31 +44,43 @@ Requires-Dist: sounddevice
  Requires-Dist: tqdm
  Requires-Dist: requests
  Requires-Dist: pyperclip
- Requires-Dist: pystray
  Provides-Extra: keyboard
  Requires-Dist: pynput; extra == "keyboard"
  Provides-Extra: whisper
  Requires-Dist: openai-whisper; extra == "whisper"
  Provides-Extra: vosk
  Requires-Dist: vosk; extra == "vosk"
+ Provides-Extra: app
+ Requires-Dist: pystray; extra == "app"
+ Requires-Dist: PyGObject; extra == "app"
  Provides-Extra: all
  Requires-Dist: pynput; extra == "all"
  Requires-Dist: openai-whisper; extra == "all"
  Requires-Dist: vosk; extra == "all"
+ Requires-Dist: pystray; extra == "all"
+
+ [![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
+ [![pypi](https://github.com/perrette/scribe/actions/workflows/pypi.yml/badge.svg)](https://pypi.org/project/papers-cli)
  
  # Scribe
  
- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
+ `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+
+ ## Compatibility
+
+ In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
+ As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
+ A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
  
  ## Installation
  
- Install PortAudio library. E.g. on Ubuntu:
+ Install PortAudio library and xclip library. E.g. on Ubuntu:
  
  ```bash
- sudo apt-get install portaudio19-dev
+ sudo apt-get install portaudio19-dev xclip
  ```
  
- The python dependencies should be dealt with automatically:
+ See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
  
  ```bash
  pip install scribe-cli[all]"
@@ -87,8 +99,8 @@ pip install -e .[all]
  You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
  
  The `vosk` language models will download on-the-fly.
- The default data folder is `$HOME/.local/share/vosk/language-models`.
- This can be modified.
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+ the default is left to the `openai-whisper` package and might change in the future).
  
  
  ## Usage
@@ -108,8 +120,7 @@ but it cannot do real-time, and depending on the model can have relatively long
  With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
  there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
  
- The `vosk` backend is good at
- doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+ The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
  Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
  
@@ -122,6 +133,9 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
  ### Virtual keyboard (experimental)
  
  By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+ However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
+ That can be achieve with the `--keyboard` option.
+
  With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
  
  ```bash
@@ -129,10 +143,23 @@ scribe --keyboard
  ```
  
  It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
+ Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
+
+ #### Use the keyboard in Ubuntu
  
- `pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
+ In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
+
+ One workaround is to use the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
+
+ Another workaround while staying with Wayland is to use the low-level `uinput` backend of `pynput`, but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
+ Moreover, the keyboard must be set with an appropriate layout, for example to have the letter `é` you'd want a French or Italian layout otherwise the English will drop it or replace with something else. Another caveat I encountered is that the special characters (`é`) were inserted at the wrong place. Adding a small delay was enough to fix that with the additional parameter `--latency 0.01` Finally if you run as sudo you may need to reset some environment variable so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum-up, that gives something like:
+ ```bash
+ sudo modprobe uinput
+ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe) --latency 0.01
+ ```
+ You're on the right path :)
  
- ### System try icon (experimental)
+ ### System tray icon (experimental)
  
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
  To activate start with:
@@ -3,12 +3,16 @@ sounddevice
  tqdm
  requests
  pyperclip
- pystray
  
  [all]
  pynput
  openai-whisper
  vosk
+ pystray
+
+ [app]
+ pystray
+ PyGObject
  
  [keyboard]
  pynput
@@ -1,18 +0,0 @@
- """This module handles typing characters as if they were typed on a keyboard.
- """
- try:
-     # import pyautogui
-     from pynput.keyboard import Controller
-
- except ImportError:
-     print("Please install pynput to use the keyboard feature.")
-     raise
-
- # Create a keyboard controller
- keyboard = Controller()
-
- def type_text(text, interval=0):
-     # Simulate typing a string
-     # import subprocess
-     # subprocess.run(["ydotool", "type", text])
-     keyboard.type(text)
@@ -1,31 +0,0 @@
- [vosk.en]
- model = "vosk-model-en-us-0.42-gigaspeech"
-
- [vosk.fr]
- model = "vosk-model-fr-0.22"
-
- [vosk.de]
- model = "vosk-model-de-tuda-0.6-900k"
-
- [vosk.it]
- model = "vosk-model-it-0.22"
-
- [_meta.en]
- language = "English (US)"
- start_message = "Listening... Press Ctrl+C to stop."
- stop_message = "Recording stopped."
-
- [_meta.fr]
- language = "French"
- start_message = "En écoute... Appuyez sur Ctrl+C pour arrêter."
- stop_message = "Écoute arrêtée."
-
- [_meta.de]
- language = "German"
- start_message = "Hören... Drücken Sie Strg+C, um zu stoppen."
- stop_message = "Aufnahme gestoppt."
-
- [_meta.it]
- language = "Italian"
- start_message = "In ascolto... Premere Ctrl+C per interrompere."
- stop_message = "Registrazione interrotta."
File without changes