scribe-cli 0.7.4__tar.gz → 0.7.7__tar.gz

Files changed (27)
  1. {scribe_cli-0.7.4/scribe_cli.egg-info → scribe_cli-0.7.7}/PKG-INFO +27 -15
  2. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/README.md +21 -13
  3. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/pyproject.toml +10 -3
  4. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/_version.py +2 -2
  5. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/app.py +15 -6
  6. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/keyboard.py +20 -3
  7. {scribe_cli-0.7.4 → scribe_cli-0.7.7/scribe_cli.egg-info}/PKG-INFO +27 -15
  8. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_cli.egg-info/requires.txt +6 -1
  9. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/.github/workflows/pypi.yml +0 -0
  10. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/.gitignore +0 -0
  11. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/LICENSE +0 -0
  12. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/__init__.py +0 -0
  13. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/audio.py +0 -0
  14. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/install_desktop.py +0 -0
  15. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/models.py +0 -0
  16. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/models.toml +0 -0
  17. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/saverecording.py +0 -0
  18. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/testpynput.py +0 -0
  19. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe/util.py +0 -0
  20. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_cli.egg-info/SOURCES.txt +0 -0
  21. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_cli.egg-info/dependency_links.txt +0 -0
  22. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_cli.egg-info/entry_points.txt +0 -0
  23. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_cli.egg-info/top_level.txt +0 -0
  24. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_data/__init__.py +0 -0
  25. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_data/share/icon.jpg +0 -0
  26. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/scribe_data/templates/scribe.desktop +0 -0
  27. {scribe_cli-0.7.4 → scribe_cli-0.7.7}/setup.cfg +0 -0
@@ -1,7 +1,7 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.7.4
- Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+ Version: 0.7.7
+ Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
 
@@ -44,34 +44,47 @@ Requires-Dist: sounddevice
  Requires-Dist: tqdm
  Requires-Dist: requests
  Requires-Dist: pyperclip
- Requires-Dist: pystray
+ Requires-Dist: unidecode
  Provides-Extra: keyboard
  Requires-Dist: pynput; extra == "keyboard"
  Provides-Extra: whisper
  Requires-Dist: openai-whisper; extra == "whisper"
  Provides-Extra: vosk
  Requires-Dist: vosk; extra == "vosk"
+ Provides-Extra: app
+ Requires-Dist: pystray; extra == "app"
+ Requires-Dist: PyGObject; extra == "app"
  Provides-Extra: all
  Requires-Dist: pynput; extra == "all"
  Requires-Dist: openai-whisper; extra == "all"
  Requires-Dist: vosk; extra == "all"
+ Requires-Dist: pystray; extra == "all"
+
+ [![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
+ [![pypi](https://github.com/perrette/scribe/actions/workflows/pypi.yml/badge.svg)](https://pypi.org/project/scribe-cli)
 
  # Scribe
 
- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
+ `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+
+ ## Compatibility
+
+ In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
+ As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
+ A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
 
  ## Installation
 
- Install PortAudio library. E.g. on Ubuntu:
+ Install PortAudio library and xclip library. E.g. on Ubuntu:
 
  ```bash
- sudo apt-get install portaudio19-dev
+ sudo apt-get install portaudio19-dev xclip
  ```
 
- The python dependencies should be dealt with automatically:
+ See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
 
  ```bash
- pip install scribe-cli[all]"
+ pip install scribe-cli[all]
  ```
 
  (note the `-cli` suffix for client)
@@ -108,9 +121,8 @@ but it cannot do real-time, and depending on the model can have relatively long
  With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
  there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
 
- The `vosk` backend is good at
- doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
- Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
+ The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+ It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 
  To skip the initial selection menu you can do:
@@ -132,8 +144,7 @@ scribe --keyboard
  ```
 
  It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
-
- `pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)).
+ Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
  #### Use the keyboard in Ubuntu
 
@@ -141,14 +152,15 @@ In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in ch
 
  One workaround is to use the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
 
- Another workaround with Wayland is to use the low-level `uinput` backend but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent). Moreover, since `uinput` really only simulates key strokes, your keyboard must be set with an appropriate layout, for example to have the letter `é` you'd want a French or Italian layout otherwise the English will drop it or replace with something else. Another caveat I encountered the caveat that the special characters were inserted at the wrong place. Adding a small delay was enough to fix that with the additional parameter `--latency 0.01` Finally if you run as sudo you may need to reset some environment variable so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum-up, that gives something like:
+ Another workaround while staying with Wayland is to use the low-level `uinput` backend of `pynput`, but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
+ Moreover, the keyboard must be set with an appropriate layout, for example to have the letter `é` you'd want a French or Italian layout otherwise the English will drop it or replace with something else. Another caveat I encountered is that the special characters (`é`) were inserted at the wrong place. Adding a small delay was enough to fix that with the additional parameter `--latency 0.01` Finally if you run as sudo you may need to reset some environment variable so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum-up, that gives something like:
  ```bash
  sudo modprobe uinput
  sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe) --latency 0.01
  ```
  You're on the right path :)
 
- ### System try icon (experimental)
+ ### System tray icon (experimental)
 
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
  To activate start with:
@@ -1,19 +1,28 @@
+ [![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
+ [![pypi](https://github.com/perrette/scribe/actions/workflows/pypi.yml/badge.svg)](https://pypi.org/project/scribe-cli)
+
  # Scribe
 
- `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
+ `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+
+ ## Compatibility
+
+ In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
+ As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
+ A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
 
  ## Installation
 
- Install PortAudio library. E.g. on Ubuntu:
+ Install PortAudio library and xclip library. E.g. on Ubuntu:
 
  ```bash
- sudo apt-get install portaudio19-dev
+ sudo apt-get install portaudio19-dev xclip
  ```
 
- The python dependencies should be dealt with automatically:
+ See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
 
  ```bash
- pip install scribe-cli[all]"
+ pip install scribe-cli[all]
  ```
 
  (note the `-cli` suffix for client)
@@ -50,9 +59,8 @@ but it cannot do real-time, and depending on the model can have relatively long
  With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
  there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
 
- The `vosk` backend is good at
- doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
- Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
+ The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+ It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
  There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 
  To skip the initial selection menu you can do:
@@ -74,8 +82,7 @@ scribe --keyboard
  ```
 
  It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
-
- `pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)).
+ Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
  #### Use the keyboard in Ubuntu
 
@@ -83,14 +90,15 @@ In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in ch
 
  One workaround is to use the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
 
- Another workaround with Wayland is to use the low-level `uinput` backend but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent). Moreover, since `uinput` really only simulates key strokes, your keyboard must be set with an appropriate layout, for example to have the letter `é` you'd want a French or Italian layout otherwise the English will drop it or replace with something else. Another caveat I encountered the caveat that the special characters were inserted at the wrong place. Adding a small delay was enough to fix that with the additional parameter `--latency 0.01` Finally if you run as sudo you may need to reset some environment variable so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum-up, that gives something like:
+ Another workaround while staying with Wayland is to use the low-level `uinput` backend of `pynput`, but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
+ Moreover, the keyboard must be set with an appropriate layout, for example to have the letter `é` you'd want a French or Italian layout otherwise the English will drop it or replace with something else. Another caveat I encountered is that the special characters (`é`) were inserted at the wrong place. Adding a small delay was enough to fix that with the additional parameter `--latency 0.01` Finally if you run as sudo you may need to reset some environment variable so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum-up, that gives something like:
  ```bash
  sudo modprobe uinput
  sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe) --latency 0.01
  ```
  You're on the right path :)
 
- ### System try icon (experimental)
+ ### System tray icon (experimental)
 
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
  To activate start with:
@@ -116,4 +124,4 @@ e.g.
  scribe-install --backend whisper --model small
  ```
 
- After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+ After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
  [project]
  name = "scribe-cli"
  dynamic = ["version"]
- description = "scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI."
+ description = "scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer"
  authors = [
  { name="Mahé Perrette", email="mahe.perrette@gmail.com" }
  ]
@@ -18,9 +18,8 @@ dependencies = [
  "tqdm",
  "requests",
  "pyperclip",
- "pystray",
+ "unidecode",
  ]
- optional-dependencies = { keyboard = ["pynput"], whisper = ["openai-whisper"], vosk = ["vosk"], all = ["pynput", "openai-whisper", "vosk"] }
 
  classifiers = [
  "Programming Language :: Python :: 3",
@@ -39,6 +38,14 @@ keywords = [
  "clipboard",
  ]
 
+ [project.optional-dependencies]
+ keyboard = ["pynput"]
+ whisper = ["openai-whisper"]
+ vosk = ["vosk"]
+ app = ["pystray", "PyGObject"]
+ all = ["pynput", "openai-whisper", "vosk", "pystray"]
+
+
  [tool.setuptools]
  packages = [ "scribe", "scribe_data" ]
 
@@ -12,5 +12,5 @@ __version__: str
  __version_tuple__: VERSION_TUPLE
  version_tuple: VERSION_TUPLE
 
- __version__ = version = '0.7.4'
- __version_tuple__ = version_tuple = (0, 7, 4)
+ __version__ = version = '0.7.7'
+ __version_tuple__ = version_tuple = (0, 7, 7)
@@ -151,6 +151,7 @@ def get_parser():
  parser.add_argument("--keyboard", action="store_true")
  parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
  parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
+ parser.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
 
  group = parser.add_argument_group("whisper options")
  group.add_argument("--duration", default=120, type=int, help="Max duration of the whisper recording (default %(default)ss)")
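The hunk above adds an `--ascii` switch next to the existing keyboard options. As a rough, standalone illustration of how these four flags parse (the `scribe-sketch` program name is made up, and the rest of the real parser is omitted), something like the following could be used:

```python
import argparse

# Only the four options visible in the hunk are reproduced here.
parser = argparse.ArgumentParser(prog="scribe-sketch")  # hypothetical name
parser.add_argument("--keyboard", action="store_true")
parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
parser.add_argument("--ascii", action="store_true",
                    help="Use unidecode for keyboard typing in ascii")

# No arguments: clipboard stays on, keyboard and ascii stay off.
defaults = parser.parse_args([])
# Explicit flags: every boolean flips, latency becomes a float.
options = parser.parse_args(
    ["--keyboard", "--ascii", "--latency", "0.01", "--no-clipboard"]
)

print(defaults)
print(options)
```

Because `--no-clipboard` stores into `clipboard`, the parsed namespace carries a boolean attribute named `clipboard`, not `no_clipboard`.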
@@ -164,7 +165,7 @@ def get_parser():
 
 
  # Commencer l'enregistrement
- def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, **greetings):
+ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, ascii=False, **greetings):
 
  if keyboard:
  from scribe.keyboard import type_text
@@ -184,7 +185,7 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
  clear_line()
  print(result.get('text'))
  if keyboard:
- type_text(result['text'] + " ", interval=latency) # Simulate typing
+ type_text(result['text'] + " ", interval=latency, ascii=ascii) # Simulate typing
 
  if clipboard:
  fulltext += result['text'] + " "
@@ -280,7 +281,7 @@ def main(args=None):
  transcriber = get_transcriber(o, prompt=o.prompt)
  print(f">>> Model {transcriber.model_name} from {transcriber.backend} selected. Keyboard [{'on' if o.keyboard else 'off'}]. Clipboard [{'on' if o.clipboard else 'off'}] <<<")
  if o.prompt:
- print(f"Choose any of the following actions:")
+ print(f"Choose any of the following actions (or any command-line toggle flag by name)")
  print(f"[q] quit")
  print(f"[e] change model")
  print(f"[x] toggle app [{toggle[o.app]}] -> [{toggle[not o.app]}]")
@@ -290,7 +291,7 @@ def main(args=None):
  print(f"[t] change duration (currently {transcriber.timeout}s)")
  print(f"[b] change silence duration (currently {transcriber.silence_duration}s)")
  print(f"[a] toggle auto-restart after silence [{toggle[transcriber.restart_after_silence]}] -> [{toggle[not transcriber.restart_after_silence]}]")
- print(colored(f"Press [Enter] or any other key to start recording.", "BOLD"))
+ print(colored(f"Press [Enter] to start recording.", "BOLD"))
 
  key = input()
  if key == "q":
@@ -324,19 +325,27 @@ def main(args=None):
  except:
  print("Invalid duration. Must be an integer.")
  continue
+ if key:
+ if hasattr(o, key) and isinstance(getattr(o, key), bool):
+ setattr(o, key, not getattr(o, key))
+ print(f"Toggle {key} to [{getattr(o, key)}].")
+ print(f"Invalid choice: {key}.")
+ continue
 
  if o.app:
  greetings = dict(
  start_message = "Listening... Use the try icon menu to stop.",
  )
- app = create_app(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency, **greetings)
+ app = create_app(micro, transcriber, clipboard=o.clipboard,
+ keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
  print("Starting app...")
  app.run()
  else:
  greetings = dict(
  start_message = "Listening... Press Ctrl+C to stop.",
  )
- start_recording(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency, **greetings)
+ start_recording(micro, transcriber, clipboard=o.clipboard,
+ keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
 
  # if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
  # So we leave the wider range of options to change the model.
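The new `if key:` branch in `main` lets any boolean option be flipped by typing its attribute name at the prompt. Indentation is lost in this rendering, and no `else` appears in the added lines, so it is unclear whether "Invalid choice" is also printed after a successful toggle. The sketch below is one possible reading in which the two outcomes are kept apart. That separation, the `toggle_flag` helper name, and the use of a bare `Namespace` are assumptions for illustration, not the package's code.

```python
from argparse import Namespace


def toggle_flag(options, key):
    """Flip the boolean attribute named `key`; return True if a toggle happened."""
    if hasattr(options, key) and isinstance(getattr(options, key), bool):
        setattr(options, key, not getattr(options, key))
        print(f"Toggle {key} to [{getattr(options, key)}].")
        return True
    # Unknown names and non-boolean options (e.g. latency) are rejected.
    print(f"Invalid choice: {key}.")
    return False


o = Namespace(keyboard=False, clipboard=True, ascii=False, latency=0.0)
toggle_flag(o, "ascii")     # boolean flag: flipped to True
toggle_flag(o, "latency")   # float option: left untouched
toggle_flag(o, "nonsense")  # no such attribute: left untouched
```

The `isinstance(..., bool)` check is what keeps numeric options such as `latency` from being silently replaced by `True` or `False`.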
@@ -2,6 +2,8 @@
  """
  import platform
  import time
+ import unidecode
+ import logging
 
  try:
  # import pyautogui
@@ -30,11 +32,24 @@ def paste_text():
  keyboard.release('v')
  keyboard.release(Key.ctrl)
 
- def type_text(text, interval=0, paste=False):
+ def safe_type_text(text):
+ """I got key errors with the uinput mode, so I'm using unidecode to convert
+ the text to ASCII before typing it."""
+ try:
+ keyboard.type(text)
+ except KeyError:
+ asciitext = unidecode.unidecode(text)
+ logging.warning(f"Key error with {text} -> convert to {asciitext}")
+ keyboard.type(asciitext)
+
+ def type_text(text, interval=0, paste=False, ascii=False):
  # Simulate typing a string
  # import subprocess
  # subprocess.run(["ydotool", "type", text])
 
+ if ascii:
+ text = unidecode.unidecode(text)
+
  if paste:
  import pyperclip
  keep_state = pyperclip.paste()
@@ -45,7 +60,9 @@ def type_text(text, interval=0, paste=False):
 
  if interval > 0:
  for c in text:
- keyboard.type(c)
+ # keyboard.type(c)
+ safe_type_text(c)
  time.sleep(interval)
  else:
- keyboard.type(text)
+ # keyboard.type(text)
+ safe_type_text(text)
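The `safe_type_text` helper added in `keyboard.py` tries the text as-is and falls back to an ASCII transliteration when the backend raises `KeyError`. The sketch below shows the same try-then-degrade pattern without third-party imports. It swaps `unidecode.unidecode` for a standard-library approximation (`unicodedata` NFKD normalization, which only strips accents and drops anything else) and replaces the real `pynput` controller with a hypothetical `FakeKeyboard`, so it illustrates the control flow only, not the package's exact behavior.

```python
import logging
import unicodedata


def to_ascii(text):
    """Stdlib stand-in for unidecode.unidecode: strip accents, drop the rest."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


class FakeKeyboard:
    """Hypothetical controller that rejects any non-ASCII character."""

    def __init__(self):
        self.typed = []

    def type(self, text):
        # Validate first, so nothing is recorded when a character is rejected.
        for char in text:
            if ord(char) > 127:
                raise KeyError(char)
        self.typed.append(text)


def safe_type_text(keyboard, text):
    """Type the text; on KeyError, retry with an ASCII approximation."""
    try:
        keyboard.type(text)
    except KeyError:
        ascii_text = to_ascii(text)
        logging.warning("Key error with %r -> convert to %r", text, ascii_text)
        keyboard.type(ascii_text)


keyboard = FakeKeyboard()
safe_type_text(keyboard, "café")   # rejected once, retried as "cafe"
safe_type_text(keyboard, "notes")  # accepted on the first attempt
print(keyboard.typed)
```

In this sketch the fake controller rejects a string before recording any of it. A real backend may behave differently when a failure occurs partway through a longer string, which is one reason the per-character path (`interval > 0`) is the easier one to reason about.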
@@ -3,12 +3,17 @@ sounddevice
  tqdm
  requests
  pyperclip
- pystray
+ unidecode
 
  [all]
  pynput
  openai-whisper
  vosk
+ pystray
+
+ [app]
+ pystray
+ PyGObject
 
  [keyboard]
  pynput
File without changes