scribe-cli 0.7.9__tar.gz → 0.7.11__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (30)
  1. {scribe_cli-0.7.9/scribe_cli.egg-info → scribe_cli-0.7.11}/PKG-INFO +13 -10
  2. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/README.md +12 -9
  3. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/_version.py +2 -2
  4. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/app.py +56 -28
  5. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/models.py +22 -3
  6. {scribe_cli-0.7.9 → scribe_cli-0.7.11/scribe_cli.egg-info}/PKG-INFO +13 -10
  7. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/.github/workflows/pypi.yml +0 -0
  8. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/.gitignore +0 -0
  9. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/LICENSE +0 -0
  10. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/icon.xcf +0 -0
  11. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/pyproject.toml +0 -0
  12. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/__init__.py +0 -0
  13. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/audio.py +0 -0
  14. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/install_desktop.py +0 -0
  15. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/keyboard.py +0 -0
  16. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/models.toml +0 -0
  17. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/saverecording.py +0 -0
  18. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/testpynput.py +0 -0
  19. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/util.py +0 -0
  20. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_cli.egg-info/SOURCES.txt +0 -0
  21. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_cli.egg-info/dependency_links.txt +0 -0
  22. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_cli.egg-info/entry_points.txt +0 -0
  23. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_cli.egg-info/requires.txt +0 -0
  24. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_cli.egg-info/top_level.txt +0 -0
  25. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_data/__init__.py +0 -0
  26. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_data/share/icon.png +0 -0
  27. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_data/share/icon_recording.png +0 -0
  28. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_data/share/icon_writing.png +0 -0
  29. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe_data/templates/scribe.desktop +0 -0
  30. {scribe_cli-0.7.9 → scribe_cli-0.7.11}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.7.9
+ Version: 0.7.11
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
@@ -64,14 +64,17 @@ Requires-Dist: pystray; extra == "all"
  [![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
  [![pypi](https://img.shields.io/pypi/v/scribe-cli)](https://pypi.org/project/scribe-cli)
 
- # Scribe
+ # Scribe <img src="scribe_data/share/icon.png" width=48px>
 
  `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
 
  ## Compatibility
 
- In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
- As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
+ In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) for my own purposes so glitches are likely on other configurations.
+ Moreover there are quite a bit of dependencies that rely on very OS-specific protocols under the hood, like access to the microphone, keyboard and clipboard,
+ and even though the python dependencies `scribe` relies on are not restricted to a single platform, there may be limitation and additional binaries to install.
+ This guide is based on python3.12 running on Ubuntu 24.04 with Gnome + Wayland, which is a relatively standard setting at the time of writing.
+ Note as of February 19, 2025 python 13 does not seem to produce any transcription (I am not sure which dependency is to blame).
  A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
 
  ## Installation
@@ -101,7 +104,7 @@ pip install -e .[all]
  You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
  The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
  the default is left to the `openai-whisper` package and might change in the future).
 
 
@@ -119,8 +122,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 
  The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
- With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
- there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
+ With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+ By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
  The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -147,7 +150,7 @@ scribe --keyboard
  It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
  Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
- #### Use the keyboard in Ubuntu
+ #### Use the keyboard with Wayland (default for Ubuntu 24.04)
 
  In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
 
@@ -161,7 +164,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
  ```
  You're on the right path :)
 
- ### System tray icon (experimental)
+ ### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
  To activate start with:
@@ -176,7 +179,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
  pip install PyGObject
  ```
 
- ### Start as an application in Ubuntu
+ ### Start as an application in GNOME
 
  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
  to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -1,14 +1,17 @@
  [![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
  [![pypi](https://img.shields.io/pypi/v/scribe-cli)](https://pypi.org/project/scribe-cli)
 
- # Scribe
+ # Scribe <img src="scribe_data/share/icon.png" width=48px>
 
  `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
 
  ## Compatibility
 
- In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
- As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
+ In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) for my own purposes so glitches are likely on other configurations.
+ Moreover there are quite a bit of dependencies that rely on very OS-specific protocols under the hood, like access to the microphone, keyboard and clipboard,
+ and even though the python dependencies `scribe` relies on are not restricted to a single platform, there may be limitation and additional binaries to install.
+ This guide is based on python3.12 running on Ubuntu 24.04 with Gnome + Wayland, which is a relatively standard setting at the time of writing.
+ Note as of February 19, 2025 python 13 does not seem to produce any transcription (I am not sure which dependency is to blame).
  A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
 
  ## Installation
@@ -38,7 +41,7 @@ pip install -e .[all]
  You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
  The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
  the default is left to the `openai-whisper` package and might change in the future).
 
 
@@ -56,8 +59,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 
  The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
- With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
- there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
+ With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+ By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
  The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -84,7 +87,7 @@ scribe --keyboard
  It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
  Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
- #### Use the keyboard in Ubuntu
+ #### Use the keyboard with Wayland (default for Ubuntu 24.04)
 
  In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
 
@@ -98,7 +101,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
  ```
  You're on the right path :)
 
- ### System tray icon (experimental)
+ ### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
  To activate start with:
@@ -113,7 +116,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
  pip install PyGObject
  ```
 
- ### Start as an application in Ubuntu
+ ### Start as an application in GNOME
 
  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
  to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -12,5 +12,5 @@ __version__: str
  __version_tuple__: VERSION_TUPLE
  version_tuple: VERSION_TUPLE
 
- __version__ = version = '0.7.9'
- __version_tuple__ = version_tuple = (0, 7, 9)
+ __version__ = version = '0.7.11'
+ __version_tuple__ = version_tuple = (0, 7, 11)
@@ -112,16 +112,17 @@ def get_transcriber(o, prompt=True):
  choices = list(zip(available_models, available_languages)) + [f" * [Any model from {ansi_link('https://alphacephei.com/vosk/models')}]"]
  default_model = choices[0] # this is a tuple !!
 
- print(f"For information about vosk models see: {ansi_link('https://alphacephei.com/vosk/models')}")
  if prompt:
+ print(f"For information about vosk models see: {ansi_link('https://alphacephei.com/vosk/models')}")
  model = prompt_choices(choices, default=default_model, label="model") # this always returns a string
  else:
  model = default_model[0] if isinstance(default_model, tuple) else default_model # tuple -> string
 
  elif backend == "whisper":
  default_model = "small"
- print("Some models have a specialized English version (.en) which will be selected as default is `-l en` was requested, but can also be requested explicitly below (option not listed). See [documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages).")
  if prompt:
+ # print("Some models have a specialized English version (.en) which will be selected as default is `-l en` was requested, but can also be requested explicitly below (option not listed). See [documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages).")
+ print(f"See {ansi_link('https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages')} for available models.")
  model = prompt_choices(whisper_models, default=default_model, label="model",
  hidden_models=whisper_english_models)
  else:
@@ -146,7 +147,8 @@ def get_transcriber(o, prompt=True):
 
  elif backend == "whisper":
  transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
- timeout=o.duration, silence_duration=o.silence, restart_after_silence=o.restart_after_silence,
+ timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
+ restart_after_silence=o.restart_after_silence,
  model_kwargs={"download_root": o.download_folder_whisper})
 
  else:
@@ -177,12 +179,13 @@ def get_parser():
  parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
  parser.add_argument("--keyboard", action="store_true")
  parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
- parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
+ parser.add_argument("--latency", default=0.01, type=float, help="keyboard latency")
  parser.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
 
  group = parser.add_argument_group("whisper options")
- group.add_argument("--duration", default=120, type=int, help="Max duration of the whisper recording (default %(default)ss)")
- group.add_argument("--silence", default=2, type=float, help="silence duration that prompt transcription (whisper) (default %(default)ss)")
+ group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)ss)")
+ group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)ss)")
+ group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in db (default %(default)ss)")
  group.add_argument("--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
 
  parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
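For readers who want to try the changed options in isolation, here is a minimal sketch that rebuilds only the arguments touched by this hunk. The option names, types and defaults are copied from the `+` lines above; the function name `build_parser` and the `prog` value are made up for the example, and the rest of the real `get_parser()` is left out.

```python
import argparse


def build_parser():
    """Rebuild only the options changed in 0.7.11 (illustrative subset)."""
    parser = argparse.ArgumentParser(prog="scribe-sketch")  # hypothetical prog name
    parser.add_argument("--latency", default=0.01, type=float, help="keyboard latency")

    group = parser.add_argument_group("whisper options")
    group.add_argument("--duration", default=120, type=float,
                       help="Max duration of the whisper recording (default %(default)ss)")
    group.add_argument("--silence", default=2, type=float,
                       help="silence duration (default %(default)ss)")
    group.add_argument("--silence-db", default=-30, type=float,
                       help="silence magnitude in db (default %(default)ss)")
    group.add_argument("--restart-after-silence", action="store_true",
                       help="Restart the recording after a transcription triggered by a silence")
    return parser


if __name__ == "__main__":
    # Fractional durations are accepted now that --duration is parsed as a float.
    options = build_parser().parse_args(["--duration", "7.5", "--silence-db=-40"])
    print(options)
```

The `--silence-db=-40` form is used so the negative value cannot be mistaken for an option name.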
@@ -248,7 +251,15 @@ def create_app(micro, transcriber, **kwargs):
  image_recording = Image.alpha_composite(image_recording.convert("RGBA"), image_writing.convert("RGBA"))
 
  def update_icon(icon, force=False):
- if transcriber.recording:
+ if transcriber.recording and transcriber.waiting:
+ # this is the situation with the whisper backend when the microphone is recording
+ # but we wait for the speaker to speak (silence)
+ if force or getattr(icon, "_icon_label", None) != None:
+ icon.icon = image
+ icon._icon_label = None
+ icon.update_menu()
+
+ elif transcriber.recording:
  if force or getattr(icon, "_icon_label", None) != "recording":
  icon.icon = image_recording
  icon._icon_label = "recording"
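The new branch gives "recording but waiting for speech" priority over plain "recording", and the icon is redrawn only when the label changes. A minimal sketch of that decision, separated from the tray library, could look like the following. `FakeIcon`, `icon_label` and the `"idle"` label are hypothetical stand-ins; the hunk does not show the branches that handle the non-recording states.

```python
from typing import Optional


def icon_label(recording: bool, waiting: bool) -> Optional[str]:
    """Pick the label for the current state; the waiting state wins over recording."""
    if recording and waiting:
        return None  # default icon: microphone open, nobody speaking yet
    if recording:
        return "recording"
    return "idle"  # hypothetical: the remaining branches are not shown in the diff


class FakeIcon:
    """Stand-in for a tray icon; it only counts how often it is redrawn."""

    def __init__(self):
        self._icon_label = "unset"
        self.redraws = 0


def update_icon(icon, recording: bool, waiting: bool, force: bool = False) -> None:
    """Redraw only when the label changes, or when a redraw is forced."""
    label = icon_label(recording, waiting)
    if force or icon._icon_label != label:
        icon._icon_label = label
        icon.redraws += 1
```

Calling `update_icon` repeatedly with the same state leaves `redraws` unchanged, which is the point of caching the label on the icon object.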
@@ -338,38 +349,48 @@ def main(args=None):
  micro = Microphone(samplerate=o.samplerate, device=o.microphone_device)
 
  transcriber = None
-
- toggle = {True: "On", False: "Off"}
+ details = False
 
  while True:
  if transcriber is None:
  transcriber = get_transcriber(o, prompt=o.prompt)
  print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
+ show_options = ["clipboard", "keyboard", "ascii", "app"]
+ activated_options = [colored(option, 'light_blue') for option in show_options if getattr(o, option)]
+ print(f"Options: {' | '.join(activated_options)}")
  if o.prompt:
  print(f"Choose any of the following actions")
  print(f"{colored('[q]', 'light_yellow')} quit")
  print(f"{colored('[e]', 'light_yellow')} change model")
- print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
- print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
- print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
- if o.keyboard:
- print(f"{colored('[latency]', 'light_yellow')} between keystrokes is {colored(o.latency, 'light_blue')} s")
- if transcriber.backend == "whisper":
- print(f"{colored('[t]', 'light_yellow')} change duration (currently {colored(transcriber.timeout, 'light_blue')} s)")
- print(f"{colored('[b]', 'light_yellow')} change silence duration (currently {colored(transcriber.silence_duration, 'light_blue')} s)")
- print(f"{colored('[a]', 'light_yellow')} auto-restart after silence is {colored(transcriber.restart_after_silence, 'light_blue')} toggle?")
- exclude_flags = ["keyboard", "clipboard", "app", "prompt", "restart_after_silence"]
- display_flags = [a.dest for a in parser._actions if a.help != argparse.SUPPRESS]
- for key, value in vars(o).items():
- if key not in display_flags or key in exclude_flags or not isinstance(value, bool):
- continue
- print(f"{colored(f'[{key}]', 'light_yellow')} is {colored(value, 'light_blue')} toggle?")
+ if details:
+ print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
+ print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
+ print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
+ if o.keyboard:
+ print(f"{colored('[latency]', 'light_yellow')} between keystrokes is {colored(o.latency, 'light_blue')} s")
+ if transcriber.backend == "whisper":
+ print(f"{colored('[t]', 'light_yellow')} change duration (currently {colored(transcriber.timeout, 'light_blue')} s)")
+ print(f"{colored('[b]', 'light_yellow')} change silence (currently {colored(transcriber.silence_duration, 'light_blue')} s)")
+ print(f"{colored('[db]', 'light_yellow')} change backround noise (currently {colored(transcriber.silence_thresh, 'light_blue')} db)")
+ print(f"{colored('[a]', 'light_yellow')} auto-restart after silence is {colored(transcriber.restart_after_silence, 'light_blue')} toggle?")
+ exclude_flags = ["keyboard", "clipboard", "app", "prompt", "restart_after_silence"]
+ display_flags = [a.dest for a in parser._actions if a.help != argparse.SUPPRESS]
+ for key, value in vars(o).items():
+ if key not in display_flags or key in exclude_flags or not isinstance(value, bool):
+ continue
+ print(f"{colored(f'[{key}]', 'light_yellow')} is {colored(value, 'light_blue')} toggle?")
+ print(f"{colored('[o]', 'light_yellow')} hide options")
+ else:
+ print(f"{colored('[o]', 'light_yellow')} show options")
 
  print(colored(f"Press [Enter] to start recording.", attrs=["bold"]))
 
  key = input()
  if key == "q":
  exit(0)
+ if key == "o":
+ details = not details
+ continue
  if key == "e":
  transcriber = None
  o.model = None
@@ -391,9 +412,9 @@ def main(args=None):
  if key == "t":
  ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
  try:
- o.duration = transcriber.timeout = int(ans)
+ o.duration = transcriber.timeout = float(ans)
  except:
- print("Invalid duration. Must be an integer.")
+ print("Invalid duration. Must be a float.")
  continue
  if key == "latency":
  ans = input(f"Enter new keyboard latency in seconds (current: {o.latency}): ")
@@ -405,9 +426,16 @@ def main(args=None):
  if key == "b":
  ans = input(f"Enter new silence break duration in seconds (current: {transcriber.silence_duration}): ")
  try:
- o.silence = transcriber.silence_duration = int(ans)
+ o.silence = transcriber.silence_duration = float(ans)
+ except:
+ print("Invalid duration. Must be a float.")
+ continue
+ if key == "db":
+ ans = input(f"Enter new background noise threshold to detect silence (current: {transcriber.silence_thresh}): ")
+ try:
+ o.silence_db = transcriber.silence_thresh = float(ans)
  except:
- print("Invalid duration. Must be an integer.")
+ print("Invalid duration. Must be a float.")
  continue
  if key:
  if hasattr(o, key) and isinstance(getattr(o, key), bool):
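The three prompts (`t`, `b`, `db`) now share the same parse-a-float pattern, each with a bare `except:` and, for `db`, an error message that still says "duration". One possible way to factor this out is sketched below. The helper name `parse_float` and its signature are not part of the package; they only illustrate catching `ValueError` specifically and naming the quantity in the message.

```python
def parse_float(answer: str, current: float, label: str) -> float:
    """Return float(answer), or keep `current` and report the problem if it does not parse."""
    try:
        return float(answer)
    except ValueError:
        # Catch only the parsing error so Ctrl+C and real bugs are not swallowed.
        print(f"Invalid {label}. Must be a number.")
        return current


if __name__ == "__main__":
    silence_thresh = -30.0
    silence_thresh = parse_float("-42.5", silence_thresh, "noise threshold")
    print(silence_thresh)
    silence_thresh = parse_float("loud", silence_thresh, "noise threshold")
    print(silence_thresh)
```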
@@ -34,6 +34,7 @@ class AbstractTranscriber:
  self.restart_after_silence = restart_after_silence
  self.recording = False
  self.busy = False
+ self.waiting = False
  self.reset()
 
  def get_elapsed(self):
@@ -52,7 +53,6 @@ class AbstractTranscriber:
  def reset(self):
  self.audio_buffer = b''
  self.start_time = time.time()
- self.last_sound_time = time.time()
 
  def start_recording(self, microphone,
  start_message="Recording... Press Ctrl+C to stop.",
@@ -60,7 +60,13 @@ class AbstractTranscriber:
 
  self.reset()
  self.recording = True
+ self.waiting = True
  self.busy = True
+ if self.silence_duration is not None:
+ self.last_sound_time = time.time() - self.silence_duration
+ else:
+ self.last_sound_time = time.time()
+ previous_waiting = self.waiting
 
  try:
 
@@ -75,19 +81,31 @@ class AbstractTranscriber:
  if is_silent(data, self.silence_thresh):
  silence_duration = time.time() - self.last_sound_time
 
- if self.silence_duration is not None and silence_duration >= self.silence_duration and len(self.audio_buffer) > 0:
+ previous_waiting = self.waiting
+ self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
+
+ if self.waiting and len(self.audio_buffer) > 0:
  if self.restart_after_silence:
+ self.recording = False # for the system tray icon
  result = self.finalize()
  microphone.q.queue.clear()
  self.reset()
  yield result
+ self.recording = True # for the system tray icon
  else:
  raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
 
  else:
  self.last_sound_time = time.time()
+ self.waiting = False
 
- yield self.transcribe_realtime_audio(data)
+ # don't accumulate very long silences
+ if not self.waiting:
+ yield self.transcribe_realtime_audio(data)
+
+ else:
+ if not previous_waiting:
+ print("Silence detected...waiting for more audio")
 
  if self.is_overtime():
  raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
@@ -98,6 +116,7 @@ class AbstractTranscriber:
  pass
 
  finally:
+ self.waiting = False
  self.recording = False
  result = self.finalize()
  microphone.q.queue.clear()
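The `models.py` hunks above add a `waiting` state: the recorder starts as if a full silence had already elapsed, any non-silent chunk clears the state, and chunks that arrive while waiting are not passed on to the transcriber. The sketch below isolates that state machine so it can be run without a microphone. It is a reading of the diff, not the package's code: `SilenceGate`, `level_db` and the injectable `clock` are invented for the example, the `is_silent` shown here assumes 16-bit mono PCM with an RMS level in dB relative to full scale (the real helper is not part of this diff), and the `silence_duration is None` case is left out.

```python
import math
import struct
import time


def level_db(chunk: bytes) -> float:
    """RMS level of 16-bit little-endian mono PCM, in dB relative to full scale."""
    n = len(chunk) // 2
    if n == 0:
        return -math.inf
    samples = struct.unpack(f"<{n}h", chunk[: 2 * n])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return 20 * math.log10(rms / 32768) if rms > 0 else -math.inf


def is_silent(chunk: bytes, thresh_db: float = -30.0) -> bool:
    """Assumed definition: a chunk is silent when its level is below the threshold."""
    return level_db(chunk) < thresh_db


class SilenceGate:
    """Tracks the waiting state and tells the caller whether to keep a chunk."""

    def __init__(self, silence_duration=2.0, silence_thresh=-30.0, clock=time.monotonic):
        self.silence_duration = silence_duration
        self.silence_thresh = silence_thresh
        self.clock = clock
        # Backdate the last sound so the gate starts in the waiting state.
        self.last_sound_time = clock() - silence_duration
        self.waiting = True

    def feed(self, chunk: bytes) -> bool:
        """Update the state with one chunk; return True if the chunk should be buffered."""
        if is_silent(chunk, self.silence_thresh):
            elapsed = self.clock() - self.last_sound_time
            self.waiting = elapsed >= self.silence_duration
        else:
            self.last_sound_time = self.clock()
            self.waiting = False
        return not self.waiting
```

With a fake clock, a quiet chunk fed right after start is dropped, a loud chunk opens the gate, a short pause is still buffered, and a pause longer than `silence_duration` closes the gate again.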
File without changes