scribe-cli 0.7.10__tar.gz → 0.8.0__tar.gz

Files changed (30)
  1. {scribe_cli-0.7.10/scribe_cli.egg-info → scribe_cli-0.8.0}/PKG-INFO +44 -15
  2. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/README.md +43 -14
  3. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/_version.py +2 -2
  4. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/app.py +74 -33
  5. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/models.py +22 -3
  6. {scribe_cli-0.7.10 → scribe_cli-0.8.0/scribe_cli.egg-info}/PKG-INFO +44 -15
  7. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/.github/workflows/pypi.yml +0 -0
  8. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/.gitignore +0 -0
  9. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/LICENSE +0 -0
  10. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/icon.xcf +0 -0
  11. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/pyproject.toml +0 -0
  12. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/__init__.py +0 -0
  13. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/audio.py +0 -0
  14. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/install_desktop.py +0 -0
  15. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/keyboard.py +0 -0
  16. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/models.toml +0 -0
  17. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/saverecording.py +0 -0
  18. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/testpynput.py +0 -0
  19. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe/util.py +0 -0
  20. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_cli.egg-info/SOURCES.txt +0 -0
  21. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_cli.egg-info/dependency_links.txt +0 -0
  22. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_cli.egg-info/entry_points.txt +0 -0
  23. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_cli.egg-info/requires.txt +0 -0
  24. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_cli.egg-info/top_level.txt +0 -0
  25. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_data/__init__.py +0 -0
  26. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_data/share/icon.png +0 -0
  27. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_data/share/icon_recording.png +0 -0
  28. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_data/share/icon_writing.png +0 -0
  29. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/scribe_data/templates/scribe.desktop +0 -0
  30. {scribe_cli-0.7.10 → scribe_cli-0.8.0}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.7.10
+ Version: 0.8.0
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
@@ -104,7 +104,7 @@ pip install -e .[all]
  You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
  The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
  the default is left to the `openai-whisper` package and might change in the future).
 
 
@@ -118,12 +118,12 @@ scribe
  and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
  or until after recording is complete (`whisper`).
- You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
+ You can interrupt the recording via Ctrl + C and start again or change model.
 
  The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
- With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
- there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
+ With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+ By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
  The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -135,19 +135,38 @@ scribe --backend whisper --model small --no-prompt
  ```
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
- ### Virtual keyboard (experimental)
+ ## Output media
+
+ By default the transcription is printed on the terminal, but other output media are supported.
+
+ ### Clipboard
+
+ The most straightforward is the clipboard:
+
+ ```bash
+ scribe --clipboard
+ ```
+ The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+
+ ### Output file
+
+ Alternatively an output file can be indicated:
+
+ ```bash
+ --keyboard -o transcription.txt
+ ```
 
- By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
- However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
- That can be achieve with the `--keyboard` option.
+ ### Virtual keyboard (experimental)
 
- With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
+ With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 
  ```bash
  scribe --keyboard
  ```
 
- It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
+ This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
+
+ The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
  Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
  #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -164,7 +183,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
  ```
  You're on the right path :)
 
- ### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
+ ## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
  To activate start with:
@@ -179,7 +198,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
  pip install PyGObject
  ```
 
- ### Start as an application in GNOME
+ ## Start as an application in GNOME
 
  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
  to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -187,7 +206,17 @@ to make it available from the quick launch menu. Any option will be passed on to
  e.g.
 
  ```bash
- scribe-install --backend whisper --model small
+ scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
  ```
 
- After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+ After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+
+
+ ## Fine tuning
+
+ There are a number of options to control the silence threshold, duration and more.
+ Best is to check the available options in the online help:
+
+ ```bash
+ scribe --help
+ ```
@@ -41,7 +41,7 @@ pip install -e .[all]
  You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
  The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
  the default is left to the `openai-whisper` package and might change in the future).
 
 
@@ -55,12 +55,12 @@ scribe
  and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
  or until after recording is complete (`whisper`).
- You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
+ You can interrupt the recording via Ctrl + C and start again or change model.
 
  The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
- With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
- there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
+ With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+ By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
  The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -72,19 +72,38 @@ scribe --backend whisper --model small --no-prompt
  ```
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
 
- ### Virtual keyboard (experimental)
+ ## Output media
+
+ By default the transcription is printed on the terminal, but other output media are supported.
+
+ ### Clipboard
+
+ The most straightforward is the clipboard:
+
+ ```bash
+ scribe --clipboard
+ ```
+ The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
 
- By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
- However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
- That can be achieve with the `--keyboard` option.
+ ### Output file
 
- With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
+ Alternatively an output file can be indicated:
+
+ ```bash
+ --keyboard -o transcription.txt
+ ```
+
+ ### Virtual keyboard (experimental)
+
+ With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 
  ```bash
  scribe --keyboard
  ```
 
- It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
+ This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
+
+ The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
  Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 
  #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -101,7 +120,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
  ```
  You're on the right path :)
 
- ### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
+ ## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
  To activate start with:
@@ -116,7 +135,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
  pip install PyGObject
  ```
 
- ### Start as an application in GNOME
+ ## Start as an application in GNOME
 
  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
  to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -124,7 +143,17 @@ to make it available from the quick launch menu. Any option will be passed on to
  e.g.
 
  ```bash
- scribe-install --backend whisper --model small
+ scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
  ```
 
- After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+ After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+
+
+ ## Fine tuning
+
+ There are a number of options to control the silence threshold, duration and more.
+ Best is to check the available options in the online help:
+
+ ```bash
+ scribe --help
+ ```
@@ -12,5 +12,5 @@ __version__: str
  __version_tuple__: VERSION_TUPLE
  version_tuple: VERSION_TUPLE
 
- __version__ = version = '0.7.10'
- __version_tuple__ = version_tuple = (0, 7, 10)
+ __version__ = version = '0.8.0'
+ __version_tuple__ = version_tuple = (0, 8, 0)
@@ -1,5 +1,6 @@
  from pathlib import Path
  import tomllib
+ import re
  import time
  import argparse
  from scribe.audio import Microphone
@@ -147,7 +148,8 @@ def get_transcriber(o, prompt=True):
 
      elif backend == "whisper":
          transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
-             timeout=o.duration, silence_duration=o.silence, restart_after_silence=o.restart_after_silence,
+             timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
+             restart_after_silence=o.restart_after_silence,
              model_kwargs={"download_root": o.download_folder_whisper})
 
      else:
@@ -176,15 +178,22 @@ def get_parser():
 
      parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
      parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
-     parser.add_argument("--keyboard", action="store_true")
-     parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
-     parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
-     parser.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
+
+     group = parser.add_argument_group("transcription output")
+     group.add_argument("-c", "--clipboard", dest="clipboard", action="store_true")
+     # group.add_argument("--no-clipboard", dest="clipboard", action="store_false", help=argparse.SUPPRESS)
+     group.add_argument("-k", "--keyboard", action="store_true")
+     group.add_argument("-o", "--output-file")
+
+     group = parser.add_argument_group("keyboard options")
+     group.add_argument("--latency", default=0.01, type=float, help="keyboard latency (default %(default)s s)")
+     group.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
 
      group = parser.add_argument_group("whisper options")
-     group.add_argument("--duration", default=120, type=int, help="Max duration of the whisper recording (default %(default)ss)")
-     group.add_argument("--silence", default=2, type=float, help="silence duration that prompt transcription (whisper) (default %(default)ss)")
-     group.add_argument("--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
+     group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)s s)")
+     group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)s s)")
+     group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
+     group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
 
      parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
      parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
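Note on the hunk above: a minimal sketch of how the regrouped options behave, assuming a stand-alone parser that mirrors only flags visible in this hunk. `build_parser` and the `scribe-sketch` program name are hypothetical and not part of `scribe`. It illustrates that `--clipboard` is now opt-in (`store_true`, so off unless requested, where 0.7.10 used `--no-clipboard` to switch it off) and that a negative `--silence-db` value is accepted as a float.

```python
import argparse


def build_parser():
    # Hypothetical stand-alone parser: only options shown in the hunk are mirrored.
    parser = argparse.ArgumentParser(prog="scribe-sketch")

    group = parser.add_argument_group("transcription output")
    group.add_argument("-c", "--clipboard", dest="clipboard", action="store_true")
    group.add_argument("-k", "--keyboard", action="store_true")
    group.add_argument("-o", "--output-file")

    group = parser.add_argument_group("whisper options")
    group.add_argument("--duration", default=120, type=float)
    group.add_argument("--silence", default=2, type=float)
    group.add_argument("--silence-db", default=-30, type=float)
    group.add_argument("-a", "--restart-after-silence", action="store_true")
    return parser


# No output flag given: every output destination stays disabled.
defaults = build_parser().parse_args([])

# Clipboard plus output file, with a stricter silence threshold.
args = build_parser().parse_args(["-c", "-o", "notes.txt", "--silence-db", "-35"])
print(args.clipboard, args.keyboard, args.output_file, args.silence_db)
```

With no output flag, `clipboard`, `keyboard` and `output_file` are all falsy, which matches the "No output selected -> terminal only" message added further down in `main`.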
@@ -193,18 +202,16 @@ def get_parser():
 
 
  # Commencer l'enregistrement
- def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, ascii=False, callback=None, **greetings):
+ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, ascii=False, output_file=None, callback=None, **greetings):
 
      if keyboard:
          from scribe.keyboard import type_text
          print("\nChange focus to target app during transcription.")
 
-
      if clipboard:
          import pyperclip
          print("\nThe full transcription will be copied to clipboard as it becomes available.")
 
-
      fulltext = ""
 
      for result in transcriber.start_recording(micro, **greetings):
@@ -215,6 +222,10 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
              if keyboard:
                  type_text(result['text'] + " ", interval=latency, ascii=ascii) # Simulate typing
 
+             if output_file:
+                 with open(output_file, "a") as f:
+                     f.write(result['text'] + "\n")
+
              if clipboard:
                  fulltext += result['text'] + " "
                  pyperclip.copy(fulltext.strip())
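Note on the hunk above: a small sketch of the append-per-result pattern it introduces, under the assumption that each transcription result is a plain string. `append_result` is a hypothetical helper name used only for illustration. Opening in `"a"` mode creates the file on first use and never truncates what earlier results (or earlier sessions) wrote.

```python
import os
import tempfile


def append_result(output_file, text):
    # Append mode: creates the file if missing, keeps existing content.
    with open(output_file, "a") as f:
        f.write(text + "\n")


# Write two results to a throwaway file, one line per result.
path = os.path.join(tempfile.mkdtemp(), "transcription.txt")
for chunk in ["hello world", "second sentence"]:
    append_result(path, chunk)

with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```

One consequence of this design is that re-running with the same `-o` target keeps adding to the file; removing or renaming it between sessions is left to the user.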
@@ -222,9 +233,6 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
          else:
              print_partial(result.get('partial', ''))
 
-     if clipboard:
-         print("Copied to clipboard.")
-
      if callback:
          callback()
 
@@ -249,7 +257,15 @@ def create_app(micro, transcriber, **kwargs):
      image_recording = Image.alpha_composite(image_recording.convert("RGBA"), image_writing.convert("RGBA"))
 
      def update_icon(icon, force=False):
-         if transcriber.recording:
+         if transcriber.recording and transcriber.waiting:
+             # this is the situation with the whisper backend when the microphone is recording
+             # but we wait for the speaker to speak (silence)
+             if force or getattr(icon, "_icon_label", None) != None:
+                 icon.icon = image
+                 icon._icon_label = None
+                 icon.update_menu()
+
+         elif transcriber.recording:
              if force or getattr(icon, "_icon_label", None) != "recording":
                  icon.icon = image_recording
                  icon._icon_label = "recording"
@@ -345,22 +361,30 @@ def main(args=None):
          if transcriber is None:
              transcriber = get_transcriber(o, prompt=o.prompt)
          print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
-         show_options = ["clipboard", "keyboard", "ascii", "app"]
-         activated_options = [colored(option, 'light_blue') for option in show_options if getattr(o, option)]
-         print(f"Options: {' | '.join(activated_options)}")
+         show_output = ["clipboard", "keyboard", "output_file"]
+         show_options = ["ascii", "app"]
+         activated_output = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_output if getattr(o, option)]
+         activated_options = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_options if getattr(o, option)]
+         if activated_output:
+             print(f"Output: {' | '.join(activated_output)}")
+         else:
+             print(colored(f"No output selected -> terminal only", "light_red"))
+         if activated_options:
+             print(f"Options: {' | '.join(activated_options)}")
          if o.prompt:
              print(f"Choose any of the following actions")
-             print(f"{colored('[q]', 'light_yellow')} quit")
              print(f"{colored('[e]', 'light_yellow')} change model")
+             print(f"{colored('[f]', 'light_yellow')} output file is {colored(repr(o.output_file), 'light_blue')}")
+             print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
+             print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
+             print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
              if details:
-                 print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
-                 print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
-                 print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
                  if o.keyboard:
                      print(f"{colored('[latency]', 'light_yellow')} between keystrokes is {colored(o.latency, 'light_blue')} s")
                  if transcriber.backend == "whisper":
                      print(f"{colored('[t]', 'light_yellow')} change duration (currently {colored(transcriber.timeout, 'light_blue')} s)")
                      print(f"{colored('[b]', 'light_yellow')} change silence (currently {colored(transcriber.silence_duration, 'light_blue')} s)")
+                     print(f"{colored('[db]', 'light_yellow')} change backround noise (currently {colored(transcriber.silence_thresh, 'light_blue')} db)")
                      print(f"{colored('[a]', 'light_yellow')} auto-restart after silence is {colored(transcriber.restart_after_silence, 'light_blue')} toggle?")
                  exclude_flags = ["keyboard", "clipboard", "app", "prompt", "restart_after_silence"]
                  display_flags = [a.dest for a in parser._actions if a.help != argparse.SUPPRESS]
@@ -368,21 +392,22 @@ def main(args=None):
                      if key not in display_flags or key in exclude_flags or not isinstance(value, bool):
                          continue
                      print(f"{colored(f'[{key}]', 'light_yellow')} is {colored(value, 'light_blue')} toggle?")
-                 print(f"{colored('[o]', 'light_yellow')} hide options")
+                 print(f"{colored('[-]', 'light_yellow')} hide options")
              else:
-                 print(f"{colored('[o]', 'light_yellow')} show options")
-
+                 print(f"{colored('[-]', 'light_yellow')} show more options")
+             print(f"{colored('[q]', 'light_yellow')} quit")
              print(colored(f"Press [Enter] to start recording.", attrs=["bold"]))
 
              key = input()
              if key == "q":
                  exit(0)
-             if key == "o":
+             if len(key) > 0 and key.strip() in ["", ".", "-", "+", 'o', '\x1b[A', '\x1b[B', '\x1b[C', '\x1b[D']: # arrow keys
                  details = not details
                  continue
              if key == "e":
                  transcriber = None
                  o.model = None
+                 o.dummy = False
                  o.backend = None
                  o.language = None
                  continue
@@ -401,9 +426,9 @@ def main(args=None):
              if key == "t":
                  ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
                  try:
-                     o.duration = transcriber.timeout = int(ans)
+                     o.duration = transcriber.timeout = float(ans)
                  except:
-                     print("Invalid duration. Must be an integer.")
+                     print("Invalid duration. Must be a float.")
                  continue
              if key == "latency":
                  ans = input(f"Enter new keyboard latency in seconds (current: {o.latency}): ")
@@ -415,22 +440,38 @@ def main(args=None):
              if key == "b":
                  ans = input(f"Enter new silence break duration in seconds (current: {transcriber.silence_duration}): ")
                  try:
-                     o.silence = transcriber.silence_duration = int(ans)
+                     o.silence = transcriber.silence_duration = float(ans)
                  except:
-                     print("Invalid duration. Must be an integer.")
+                     print("Invalid duration. Must be a float.")
+                 continue
+             if key == "db":
+                 ans = input(f"Enter new background noise threshold to detect silence (current: {transcriber.silence_thresh}): ")
+                 try:
+                     o.silence_db = transcriber.silence_thresh = float(ans)
+                 except:
+                     print("Invalid duration. Must be a float.")
+                 continue
+             if key == "f":
+                 ans = input(f"Enter output file (current: {o.output_file}): ")
+                 invalid_regex = re.compile(r'[^A-Za-z0-9_\-\\\/\.]')
+                 if not invalid_regex.search(ans):
+                     o.output_file = ans
+                 else:
+                     print(f"Invalid characters: {' '.join(map(repr, invalid_regex.findall(ans)))}")
+                     print(f"Invalid file name: {repr(ans)}")
                  continue
              if key:
                  if hasattr(o, key) and isinstance(getattr(o, key), bool):
                      setattr(o, key, not getattr(o, key))
                      print(f"Toggle {key} to [{getattr(o, key)}].")
-                 print(f"Invalid choice: {key}.")
+                 print(f"Invalid choice: {repr(key)}")
                  continue
 
          if o.app:
              greetings = dict(
                  start_message = "Listening... Use the try icon menu to stop.",
              )
-             app = create_app(micro, transcriber, clipboard=o.clipboard,
+             app = create_app(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
                  keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
              print("Starting app...")
              app.run()
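Note on the `[f]` branch in the hunk above: a short sketch of what the file-name check accepts and rejects. The pattern is copied from the hunk; `invalid_chars` is a hypothetical wrapper added only to make the behaviour easy to inspect. The character class allows ASCII letters, digits, `_`, `-`, `.`, `/` and `\`, so spaces, `~` and `:` are all reported as invalid.

```python
import re

# Same pattern as in the diff: matches any character outside the allowed set.
invalid_regex = re.compile(r'[^A-Za-z0-9_\-\\\/\.]')


def invalid_chars(name):
    # Returns the list of rejected characters (empty list means accepted).
    return invalid_regex.findall(name)


accepted = invalid_chars("notes/transcription.txt")
with_space = invalid_chars("my notes.txt")
with_tilde = invalid_chars("~/notes.txt")
empty = invalid_chars("")
print(accepted, with_space, with_tilde, empty)
```

Two consequences worth knowing, as far as this pattern alone is concerned: a path such as `~/notes.txt` is refused at the prompt, and an empty answer passes the check, which sets `output_file` to an empty string and therefore disables file output.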
@@ -438,7 +479,7 @@ def main(args=None):
              greetings = dict(
                  start_message = "Listening... Press Ctrl+C to stop.",
              )
-             start_recording(micro, transcriber, clipboard=o.clipboard,
+             start_recording(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
                  keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
 
          # if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
@@ -34,6 +34,7 @@ class AbstractTranscriber:
          self.restart_after_silence = restart_after_silence
          self.recording = False
          self.busy = False
+         self.waiting = False
          self.reset()
 
      def get_elapsed(self):
@@ -52,7 +53,6 @@ class AbstractTranscriber:
      def reset(self):
          self.audio_buffer = b''
          self.start_time = time.time()
-         self.last_sound_time = time.time()
 
      def start_recording(self, microphone,
                          start_message="Recording... Press Ctrl+C to stop.",
@@ -60,7 +60,13 @@ class AbstractTranscriber:
 
          self.reset()
          self.recording = True
+         self.waiting = True
          self.busy = True
+         if self.silence_duration is not None:
+             self.last_sound_time = time.time() - self.silence_duration
+         else:
+             self.last_sound_time = time.time()
+         previous_waiting = self.waiting
 
          try:
 
@@ -75,19 +81,31 @@ class AbstractTranscriber:
                  if is_silent(data, self.silence_thresh):
                      silence_duration = time.time() - self.last_sound_time
 
-                     if self.silence_duration is not None and silence_duration >= self.silence_duration and len(self.audio_buffer) > 0:
+                     previous_waiting = self.waiting
+                     self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
+
+                     if self.waiting and len(self.audio_buffer) > 0:
                          if self.restart_after_silence:
+                             self.recording = False # for the system tray icon
                              result = self.finalize()
                              microphone.q.queue.clear()
                              self.reset()
                              yield result
+                             self.recording = True # for the system tray icon
                          else:
                              raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
 
                  else:
                      self.last_sound_time = time.time()
+                     self.waiting = False
 
-                 yield self.transcribe_realtime_audio(data)
+                 # don't accumulate very long silences
+                 if not self.waiting:
+                     yield self.transcribe_realtime_audio(data)
+
+                 else:
+                     if not previous_waiting:
+                         print("Silence detected...waiting for more audio")
 
                  if self.is_overtime():
                      raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
@@ -98,6 +116,7 @@ class AbstractTranscriber:
98
116
  pass
99
117
 
100
118
  finally:
119
+ self.waiting = False
101
120
  self.recording = False
102
121
  result = self.finalize()
103
122
  microphone.q.queue.clear()
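The hunks above introduce a `waiting` flag: recording starts in the waiting state, audio chunks are forwarded only while sound is recent, and long silences are no longer accumulated. The sketch below is a toy model of that state machine, not the package's code. The class name `SilenceGate`, its `start`/`feed` methods and the explicit `now` argument are hypothetical, chosen so the behaviour can be checked without a microphone. It also leaves out the finalize/restart step that the real loop runs when the buffer is non-empty.

```python
from typing import Optional


class SilenceGate:
    """Toy model of the `waiting` flag (hypothetical helper, not scribe's API)."""

    def __init__(self, silence_duration: Optional[float] = 2.0):
        self.silence_duration = silence_duration
        self.waiting = False
        self.last_sound_time = 0.0

    def start(self, now: float) -> None:
        # Begin in the waiting state: backdate the last sound so that an
        # initial silence already counts as "long", as start_recording does.
        self.waiting = True
        if self.silence_duration is not None:
            self.last_sound_time = now - self.silence_duration
        else:
            self.last_sound_time = now

    def feed(self, now: float, silent: bool) -> bool:
        """Return True if this chunk should be passed to the transcriber."""
        if silent:
            elapsed = now - self.last_sound_time
            self.waiting = (
                self.silence_duration is not None
                and elapsed >= self.silence_duration
            )
        else:
            self.last_sound_time = now
            self.waiting = False
        # Chunks are dropped while waiting, so silence does not pile up.
        return not self.waiting


gate = SilenceGate(silence_duration=2.0)
gate.start(now=100.0)
# (timestamp, chunk_is_silent) pairs standing in for microphone chunks
chunks = [(100.0, True), (100.5, False), (101.0, True),
          (102.4, True), (102.6, True), (103.0, False)]
decisions = [gate.feed(t, s) for t, s in chunks]
print(decisions)
```

Passing the clock in explicitly keeps the sketch deterministic; the actual code calls `time.time()` inside the loop.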
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.2
2
2
  Name: scribe-cli
3
- Version: 0.7.10
3
+ Version: 0.8.0
4
4
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
5
5
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
6
6
  License: MIT License
@@ -104,7 +104,7 @@ pip install -e .[all]
104
104
  You can leave out the optional dependencies (omit `[all]`) but must install at least one of the `vosk` or `openai-whisper` packages (see Usage below).
105
105
 
106
106
  The `vosk` language models will download on-the-fly.
107
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
107
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
108
108
  the default is left to the `openai-whisper` package and might change in the future).
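As a rough illustration of the lookup rule described in this README passage, the sketch below resolves `$XDG_CACHE_HOME/{backend}` with the `$HOME/.cache` fallback. The function name `default_model_dir` and its `env` parameter are hypothetical; the package's own resolution code is not shown in this diff and may differ. For the `whisper` backend the location is decided by `openai-whisper` itself.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_model_dir(backend: str,
                      env: Optional[Mapping[str, str]] = None) -> Path:
    """Return $XDG_CACHE_HOME/<backend>, falling back to $HOME/.cache/<backend>.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    cache_home = env.get("XDG_CACHE_HOME")
    if not cache_home:
        # XDG default: use ~/.cache when the variable is unset or empty
        home = env.get("HOME") or str(Path.home())
        cache_home = os.path.join(home, ".cache")
    return Path(cache_home) / backend


print(default_model_dir("vosk"))
```

Passing a mapping as `env` makes it easy to try the rule without touching the real environment.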
109
109
 
110
110
 
@@ -118,12 +118,12 @@ scribe
118
118
  and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
119
119
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
120
120
  or until after recording is complete (`whisper`).
121
- You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
121
+ You can interrupt the recording via Ctrl + C and start again or change model.
122
122
 
123
123
  The default (`whisper`) is excellent at transcribing full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
124
124
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
125
- With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
126
- there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
125
+ With the `whisper` model the recording stops after a 2-second silence is detected. You can also stop the recording manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
126
+ By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
127
127
 
128
128
  The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
129
129
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -135,19 +135,38 @@ scribe --backend whisper --model small --no-prompt
135
135
  ```
136
136
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
137
137
 
138
- ### Virtual keyboard (experimental)
138
+ ## Output media
139
+
140
+ By default the transcription is printed on the terminal, but other output media are supported.
141
+
142
+ ### Clipboard
143
+
144
+ The most straightforward is the clipboard:
145
+
146
+ ```bash
147
+ scribe --clipboard
148
+ ```
149
+ The content of the (full) transcription is then copied to the clipboard, and it is up to the user to paste it (e.g. Ctrl + V).
150
+
151
+ ### Output file
152
+
153
+ Alternatively an output file can be indicated:
154
+
155
+ ```bash
156
+ --keyboard -o transcription.txt
157
+ ```
139
158
 
140
- By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
141
- However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
142
- That can be achieve with the `--keyboard` option.
159
+ ### Virtual keyboard (experimental)
143
160
 
144
- With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
161
+ With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
145
162
 
146
163
  ```bash
147
164
  scribe --keyboard
148
165
  ```
149
166
 
150
- It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
167
+ This can be extremely useful with the `vosk` backend and its real-time transcription, or alternatively with the `--restart-after-silence` option with the `whisper` backend.
168
+
169
+ The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
151
170
  Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
152
171
 
153
172
  #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -164,7 +183,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
164
183
  ```
165
184
  You're on the right path :)
166
185
 
167
- ### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
186
+ ## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
168
187
 
169
188
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
170
189
  To activate it, start with:
@@ -179,7 +198,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
179
198
  pip install PyGObject
180
199
  ```
181
200
 
182
- ### Start as an application in GNOME
201
+ ## Start as an application in GNOME
183
202
 
184
203
  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
185
204
  to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -187,7 +206,17 @@ to make it available from the quick launch menu. Any option will be passed on to
187
206
  e.g.
188
207
 
189
208
  ```bash
190
- scribe-install --backend whisper --model small
209
+ scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
191
210
  ```
192
211
 
193
- After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
212
+ After that, just typing Super + scri... at any time from anywhere will conveniently start the app in its own terminal with the prescribed options.
213
+
214
+
215
+ ## Fine tuning
216
+
217
+ There are a number of options to control the silence threshold, duration and more.
218
+ Best is to check the available options in the online help:
219
+
220
+ ```bash
221
+ scribe --help
222
+ ```
File without changes