scribe-cli 0.7.11__tar.gz → 0.9.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. {scribe_cli-0.7.11/scribe_cli.egg-info → scribe_cli-0.9.0}/PKG-INFO +47 -13
  2. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/README.md +46 -12
  3. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe/_version.py +2 -2
  4. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe/app.py +55 -30
  5. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe/install_desktop.py +14 -3
  6. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe/models.py +3 -1
  7. {scribe_cli-0.7.11 → scribe_cli-0.9.0/scribe_cli.egg-info}/PKG-INFO +47 -13
  8. scribe_cli-0.9.0/scribe_data/templates/scribe.desktop +8 -0
  9. scribe_cli-0.7.11/scribe_data/templates/scribe.desktop +0 -8
  10. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/.github/workflows/pypi.yml +0 -0
  11. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/.gitignore +0 -0
  12. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/LICENSE +0 -0
  13. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/icon.xcf +0 -0
  14. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/pyproject.toml +0 -0
  15. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe/__init__.py +0 -0
  16. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe/audio.py +0 -0
  17. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe/keyboard.py +0 -0
  18. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe/models.toml +0 -0
  19. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe/saverecording.py +0 -0
  20. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe/testpynput.py +0 -0
  21. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe/util.py +0 -0
  22. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe_cli.egg-info/SOURCES.txt +0 -0
  23. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe_cli.egg-info/dependency_links.txt +0 -0
  24. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe_cli.egg-info/entry_points.txt +0 -0
  25. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe_cli.egg-info/requires.txt +0 -0
  26. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe_cli.egg-info/top_level.txt +0 -0
  27. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe_data/__init__.py +0 -0
  28. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe_data/share/icon.png +0 -0
  29. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe_data/share/icon_recording.png +0 -0
  30. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/scribe_data/share/icon_writing.png +0 -0
  31. {scribe_cli-0.7.11 → scribe_cli-0.9.0}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.2
2
2
  Name: scribe-cli
3
- Version: 0.7.11
3
+ Version: 0.9.0
4
4
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
5
5
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
6
6
  License: MIT License
@@ -118,7 +118,7 @@ scribe
118
118
  and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
119
119
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
120
120
  or until after recording is complete (`whisper`).
121
- You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
121
+ You can interrupt the recording via Ctrl + C and start again or change model.
122
122
 
123
123
  The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
124
124
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -135,19 +135,38 @@ scribe --backend whisper --model small --no-prompt
135
135
  ```
136
136
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
137
137
 
138
- ### Virtual keyboard (experimental)
138
+ ## Output media
139
+
140
+ By default the transcription is printed on the terminal, but other output media are supported.
141
+
142
+ ### Clipboard
143
+
144
+ The most straightforward is the clipboard:
145
+
146
+ ```bash
147
+ scribe --clipboard
148
+ ```
149
+ The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
139
150
 
140
- By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
141
- However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
142
- That can be achieve with the `--keyboard` option.
151
+ ### Output file
143
152
 
144
- With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
153
+ Alternatively an output file can be indicated:
154
+
155
+ ```bash
156
+ --keyboard -o transcription.txt
157
+ ```
158
+
159
+ ### Virtual keyboard (experimental)
160
+
161
+ With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
145
162
 
146
163
  ```bash
147
164
  scribe --keyboard
148
165
  ```
149
166
 
150
- It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
167
+ This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
168
+
169
+ The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
151
170
  Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
152
171
 
153
172
  #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -164,7 +183,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
164
183
  ```
165
184
  You're on the right path :)
166
185
 
167
- ### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
186
+ ## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
168
187
 
169
188
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
170
189
  To activate start with:
@@ -179,15 +198,30 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
179
198
  pip install PyGObject
180
199
  ```
181
200
 
182
- ### Start as an application in GNOME
201
+ ## Start as an application in GNOME
183
202
 
184
203
  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
185
- to make it available from the quick launch menu. Any option will be passed on to `scribe`.
204
+ to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
205
+ `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
186
206
 
187
207
  e.g.
188
208
 
189
209
  ```bash
190
- scribe-install --backend whisper --model small
210
+ scribe-install
211
+ scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
212
+ scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
191
213
  ```
214
+ This will install three separate apps:
215
+ - `Super + scribe` : will launch the default version with terminal prompt
216
+ - `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
217
+ - `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
218
+
192
219
 
193
- After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
220
+ ## Fine tuning
221
+
222
+ There are a number of options to control the silence threshold, duration and more.
223
+ Best is to check the available options in the online help:
224
+
225
+ ```bash
226
+ scribe --help
227
+ ```
@@ -55,7 +55,7 @@ scribe
55
55
  and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
56
56
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
57
57
  or until after recording is complete (`whisper`).
58
- You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
58
+ You can interrupt the recording via Ctrl + C and start again or change model.
59
59
 
60
60
  The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
61
61
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -72,19 +72,38 @@ scribe --backend whisper --model small --no-prompt
72
72
  ```
73
73
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
74
74
 
75
- ### Virtual keyboard (experimental)
75
+ ## Output media
76
+
77
+ By default the transcription is printed on the terminal, but other output media are supported.
78
+
79
+ ### Clipboard
80
+
81
+ The most straightforward is the clipboard:
82
+
83
+ ```bash
84
+ scribe --clipboard
85
+ ```
86
+ The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
87
+
88
+ ### Output file
89
+
90
+ Alternatively an output file can be indicated:
76
91
 
77
- By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
78
- However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
79
- That can be achieve with the `--keyboard` option.
92
+ ```bash
93
+ --keyboard -o transcription.txt
94
+ ```
80
95
 
81
- With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
96
+ ### Virtual keyboard (experimental)
97
+
98
+ With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
82
99
 
83
100
  ```bash
84
101
  scribe --keyboard
85
102
  ```
86
103
 
87
- It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
104
+ This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
105
+
106
+ The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
88
107
  Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
89
108
 
90
109
  #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -101,7 +120,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
101
120
  ```
102
121
  You're on the right path :)
103
122
 
104
- ### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
123
+ ## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
105
124
 
106
125
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
107
126
  To activate start with:
@@ -116,15 +135,30 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
116
135
  pip install PyGObject
117
136
  ```
118
137
 
119
- ### Start as an application in GNOME
138
+ ## Start as an application in GNOME
120
139
 
121
140
  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
122
- to make it available from the quick launch menu. Any option will be passed on to `scribe`.
141
+ to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
142
+ `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
123
143
 
124
144
  e.g.
125
145
 
126
146
  ```bash
127
- scribe-install --backend whisper --model small
147
+ scribe-install
148
+ scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
149
+ scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
128
150
  ```
151
+ This will install three separate apps:
152
+ - `Super + scribe` : will launch the default version with terminal prompt
153
+ - `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
154
+ - `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
129
155
 
130
- After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
156
+
157
+ ## Fine tuning
158
+
159
+ There are a number of options to control the silence threshold, duration and more.
160
+ Best is to check the available options in the online help:
161
+
162
+ ```bash
163
+ scribe --help
164
+ ```
@@ -12,5 +12,5 @@ __version__: str
12
12
  __version_tuple__: VERSION_TUPLE
13
13
  version_tuple: VERSION_TUPLE
14
14
 
15
- __version__ = version = '0.7.11'
16
- __version_tuple__ = version_tuple = (0, 7, 11)
15
+ __version__ = version = '0.9.0'
16
+ __version_tuple__ = version_tuple = (0, 9, 0)
@@ -1,5 +1,6 @@
1
1
  from pathlib import Path
2
2
  import tomllib
3
+ import re
3
4
  import time
4
5
  import argparse
5
6
  from scribe.audio import Microphone
@@ -177,16 +178,22 @@ def get_parser():
177
178
 
178
179
  parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
179
180
  parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
180
- parser.add_argument("--keyboard", action="store_true")
181
- parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
182
- parser.add_argument("--latency", default=0.01, type=float, help="keyboard latency")
183
- parser.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
181
+
182
+ group = parser.add_argument_group("transcription output")
183
+ group.add_argument("-c", "--clipboard", dest="clipboard", action="store_true")
184
+ # group.add_argument("--no-clipboard", dest="clipboard", action="store_false", help=argparse.SUPPRESS)
185
+ group.add_argument("-k", "--keyboard", action="store_true")
186
+ group.add_argument("-o", "--output-file")
187
+
188
+ group = parser.add_argument_group("keyboard options")
189
+ group.add_argument("--latency", default=0.01, type=float, help="keyboard latency (default %(default)s s)")
190
+ group.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
184
191
 
185
192
  group = parser.add_argument_group("whisper options")
186
- group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)ss)")
187
- group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)ss)")
188
- group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in db (default %(default)ss)")
189
- group.add_argument("--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
193
+ group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)s s)")
194
+ group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)s s)")
195
+ group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
196
+ group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
190
197
 
191
198
  parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
192
199
  parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
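A note on the parser hunk above: the clipboard switch moves from an opt-out flag (`--no-clipboard`, `store_false`) to an opt-in flag (`-c/--clipboard`, `store_true`), so the clipboard is off unless requested. A minimal sketch of just the new "transcription output" group, assuming nothing beyond the three arguments shown in the diff (the `prog` name is made up for illustration):

```python
import argparse

# Stand-alone parser with only the new output group; "scribe-sketch" is a
# hypothetical program name, the three arguments are copied from the diff.
parser = argparse.ArgumentParser(prog="scribe-sketch")
group = parser.add_argument_group("transcription output")
group.add_argument("-c", "--clipboard", dest="clipboard", action="store_true")
group.add_argument("-k", "--keyboard", action="store_true")
group.add_argument("-o", "--output-file")

# No output flag at all: every output medium is disabled.
defaults = parser.parse_args([])
print(defaults.clipboard, defaults.keyboard, defaults.output_file)

# Short options can be combined with an output file.
o = parser.parse_args(["-k", "-o", "transcription.txt"])
print(o.clipboard, o.keyboard, o.output_file)
```

With no output flag the namespace holds `False`, `False` and `None`, which matches the "No output selected -> terminal only" message added further down in `main`.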
@@ -195,18 +202,16 @@ def get_parser():
195
202
 
196
203
 
197
204
  # Commencer l'enregistrement
198
- def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, ascii=False, callback=None, **greetings):
205
+ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, ascii=False, output_file=None, callback=None, **greetings):
199
206
 
200
207
  if keyboard:
201
208
  from scribe.keyboard import type_text
202
209
  print("\nChange focus to target app during transcription.")
203
210
 
204
-
205
211
  if clipboard:
206
212
  import pyperclip
207
213
  print("\nThe full transcription will be copied to clipboard as it becomes available.")
208
214
 
209
-
210
215
  fulltext = ""
211
216
 
212
217
  for result in transcriber.start_recording(micro, **greetings):
@@ -217,6 +222,10 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
217
222
  if keyboard:
218
223
  type_text(result['text'] + " ", interval=latency, ascii=ascii) # Simulate typing
219
224
 
225
+ if output_file:
226
+ with open(output_file, "a") as f:
227
+ f.write(result['text'] + "\n")
228
+
220
229
  if clipboard:
221
230
  fulltext += result['text'] + " "
222
231
  pyperclip.copy(fulltext.strip())
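The `output_file` branch added in this hunk reopens the file in append mode for every final result. A small sketch of that behaviour in isolation, assuming a hypothetical helper name `append_result` and a temporary directory instead of a user-supplied path:

```python
import os
import tempfile


def append_result(output_file, text):
    """Append one transcribed segment as its own line (hypothetical helper)."""
    # Same open mode and line ending as the branch in the diff.
    with open(output_file, "a") as f:
        f.write(text + "\n")


path = os.path.join(tempfile.mkdtemp(), "transcription.txt")
for sentence in ["hello world", "second sentence"]:
    append_result(path, sentence)

with open(path) as f:
    content = f.read()
print(content)
```

Because the mode is `"a"`, a second run pointed at the same file adds to the existing text rather than replacing it.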
@@ -224,9 +233,6 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
224
233
  else:
225
234
  print_partial(result.get('partial', ''))
226
235
 
227
- if clipboard:
228
- print("Copied to clipboard.")
229
-
230
236
  if callback:
231
237
  callback()
232
238
 
@@ -295,7 +301,7 @@ def create_app(micro, transcriber, **kwargs):
295
301
  def callback_stop_recording(icon, item):
296
302
  # Here we need to stop the recording thread
297
303
 
298
- transcriber.recording = False
304
+ transcriber.interrupt = True
299
305
  if hasattr(icon, "_recording_thread"):
300
306
  icon._recording_thread.join()
301
307
  if hasattr(icon, "_monitoring_thread"):
@@ -355,17 +361,26 @@ def main(args=None):
355
361
  if transcriber is None:
356
362
  transcriber = get_transcriber(o, prompt=o.prompt)
357
363
  print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
358
- show_options = ["clipboard", "keyboard", "ascii", "app"]
359
- activated_options = [colored(option, 'light_blue') for option in show_options if getattr(o, option)]
360
- print(f"Options: {' | '.join(activated_options)}")
364
+ show_output = ["clipboard", "keyboard", "output_file"]
365
+ show_options = ["ascii", "restart_after_silence"]
366
+ activated_output = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_output if getattr(o, option)]
367
+ activated_options = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_options if getattr(o, option)]
368
+ if activated_output:
369
+ print(f"Output: {' | '.join(activated_output)}")
370
+ else:
371
+ print(colored(f"No output selected -> terminal only", "light_red"))
372
+ if o.app:
373
+ print(colored("App mode enabled", "light_green"))
374
+ if activated_options:
375
+ print(f"Options: {' | '.join(activated_options)}")
361
376
  if o.prompt:
362
377
  print(f"Choose any of the following actions")
363
- print(f"{colored('[q]', 'light_yellow')} quit")
364
378
  print(f"{colored('[e]', 'light_yellow')} change model")
379
+ print(f"{colored('[f]', 'light_yellow')} output file is {colored(repr(o.output_file), 'light_blue')}")
380
+ print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
381
+ print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
382
+ print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
365
383
  if details:
366
- print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
367
- print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
368
- print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
369
384
  if o.keyboard:
370
385
  print(f"{colored('[latency]', 'light_yellow')} between keystrokes is {colored(o.latency, 'light_blue')} s")
371
386
  if transcriber.backend == "whisper":
@@ -379,21 +394,22 @@ def main(args=None):
379
394
  if key not in display_flags or key in exclude_flags or not isinstance(value, bool):
380
395
  continue
381
396
  print(f"{colored(f'[{key}]', 'light_yellow')} is {colored(value, 'light_blue')} toggle?")
382
- print(f"{colored('[o]', 'light_yellow')} hide options")
397
+ print(f"{colored('[-]', 'light_yellow')} hide options")
383
398
  else:
384
- print(f"{colored('[o]', 'light_yellow')} show options")
385
-
399
+ print(f"{colored('[-]', 'light_yellow')} show more options")
400
+ print(f"{colored('[q]', 'light_yellow')} quit")
386
401
  print(colored(f"Press [Enter] to start recording.", attrs=["bold"]))
387
402
 
388
403
  key = input()
389
404
  if key == "q":
390
405
  exit(0)
391
- if key == "o":
406
+ if len(key) > 0 and key.strip() in ["", ".", "-", "+", 'o', '\x1b[A', '\x1b[B', '\x1b[C', '\x1b[D']: # arrow keys
392
407
  details = not details
393
408
  continue
394
409
  if key == "e":
395
410
  transcriber = None
396
411
  o.model = None
412
+ o.dummy = False
397
413
  o.backend = None
398
414
  o.language = None
399
415
  continue
@@ -407,7 +423,7 @@ def main(args=None):
407
423
  o.app = not o.app
408
424
  continue
409
425
  if key == "a":
410
- transcriber.restart_after_silence = not transcriber.restart_after_silence
426
+ o.restart_after_silence = transcriber.restart_after_silence = not transcriber.restart_after_silence
411
427
  continue
412
428
  if key == "t":
413
429
  ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
@@ -437,18 +453,27 @@ def main(args=None):
437
453
  except:
438
454
  print("Invalid duration. Must be a float.")
439
455
  continue
456
+ if key == "f":
457
+ ans = input(f"Enter output file (current: {o.output_file}): ")
458
+ invalid_regex = re.compile(r'[^A-Za-z0-9_\-\\\/\.]')
459
+ if not invalid_regex.search(ans):
460
+ o.output_file = ans
461
+ else:
462
+ print(f"Invalid characters: {' '.join(map(repr, invalid_regex.findall(ans)))}")
463
+ print(f"Invalid file name: {repr(ans)}")
464
+ continue
440
465
  if key:
441
466
  if hasattr(o, key) and isinstance(getattr(o, key), bool):
442
467
  setattr(o, key, not getattr(o, key))
443
468
  print(f"Toggle {key} to [{getattr(o, key)}].")
444
- print(f"Invalid choice: {key}.")
469
+ print(f"Invalid choice: {repr(key)}")
445
470
  continue
446
471
 
447
472
  if o.app:
448
473
  greetings = dict(
449
474
  start_message = "Listening... Use the try icon menu to stop.",
450
475
  )
451
- app = create_app(micro, transcriber, clipboard=o.clipboard,
476
+ app = create_app(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
452
477
  keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
453
478
  print("Starting app...")
454
479
  app.run()
@@ -456,7 +481,7 @@ def main(args=None):
456
481
  greetings = dict(
457
482
  start_message = "Listening... Press Ctrl+C to stop.",
458
483
  )
459
- start_recording(micro, transcriber, clipboard=o.clipboard,
484
+ start_recording(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
460
485
  keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
461
486
 
462
487
  # if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
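The new `[f]` prompt branch in `scribe/app.py` accepts a file name only if it contains no character outside a fixed set. The check can be tried on its own; the sketch below reuses the regular expression from the diff, while the function name `check_output_file` is an assumption made for illustration:

```python
import re

# Same character class as in the diff: letters, digits, "_", "-", "\", "/"
# and "." are allowed, anything else counts as invalid.
INVALID_CHARS = re.compile(r'[^A-Za-z0-9_\-\\\/\.]')


def check_output_file(name):
    """Return (accepted, offending_characters) for a candidate file name."""
    bad = INVALID_CHARS.findall(name)
    return (not bad, bad)


print(check_output_file("notes/transcription.txt"))
print(check_output_file("my notes.txt"))
print(check_output_file("~/transcription.txt"))
```

Paths containing spaces or `~` are rejected by this pattern. An empty answer passes the check, and since `start_recording` tests `if output_file:`, an empty string would leave file output switched off.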
@@ -10,9 +10,17 @@ def main():
10
10
  sys.exit(0)
11
11
 
12
12
  parser = argparse.ArgumentParser("Install the desktop file for the scribe package. Any arguments to this script will be passed on to `scribe`.")
13
+ parser.add_argument("--name", help="The title of the desktop app", default="Scribe")
14
+ parser.add_argument("--startup-wm-class")
15
+ parser.add_argument("--no-terminal", action="store_false", dest="terminal", help="Don't show the terminal (goes in --app mode)")
13
16
  o, rest = parser.parse_known_args()
14
17
  o.arguments = rest
15
18
 
19
+ if not o.terminal and "--app" not in o.arguments:
20
+ o.arguments.append("--app")
21
+ if not o.terminal and "--no-prompt" not in o.arguments:
22
+ o.arguments.append("--no-prompt")
23
+
16
24
  SOURCE_SCRIBE_DATA = os.path.dirname(scribe_data.__file__)
17
25
 
18
26
  HOME = os.environ.get('HOME',os.path.expanduser('~'))
@@ -25,15 +33,18 @@ def main():
25
33
  with open(os.path.join(SOURCE_SCRIBE_DATA, 'templates', 'scribe.desktop')) as f:
26
34
  template = f.read()
27
35
 
36
+ simple_name = o.name.lower().replace(' ','-').replace(os.path.sep, '-')
28
37
  bin_folder = sysconfig.get_path("scripts")
29
38
  icon_folder = os.path.join(SOURCE_SCRIBE_DATA, 'share')
30
- desktop_filecontent = template.format(icon_folder=icon_folder, bin_folder=bin_folder, options=' '.join(o.arguments) if o.arguments else '')
39
+ desktop_filecontent = template.format(icon_folder=icon_folder, bin_folder=bin_folder,
40
+ name=o.name, terminal=str(o.terminal).lower(),
41
+ StartupWMClass=o.startup_wm_class or f"crx_mpnasdanpmm_{simple_name}",
42
+ options=' ' + ' '.join(o.arguments) if o.arguments else '')
31
43
 
32
- desktop_filepath = os.path.join(XDG_APP_DATA, 'scribe.desktop')
44
+ desktop_filepath = os.path.join(XDG_APP_DATA, f'{simple_name}.desktop')
33
45
  print("Writing GNOME desktop file:", desktop_filepath)
34
46
  with open(desktop_filepath, "w") as f:
35
47
  f.write(desktop_filecontent)
36
48
 
37
-
38
49
  if __name__ == "__main__":
39
50
  main()
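To see how `--name` and `--no-terminal` end up in the generated launcher, the formatting step of `scribe/install_desktop.py` can be reproduced without touching the file system. This is a sketch only: the template text is copied from the new `scribe.desktop` in this diff, the function name `render_desktop_entry` and the two folder defaults are placeholders, and the `--startup-wm-class` override is left out.

```python
import os

# Template copied from the 0.9.0 scribe.desktop shown in this diff.
TEMPLATE = """#!/usr/bin/env xdg-open
[Desktop Entry]
Terminal={terminal}
Type=Application
Name={name}
Exec={bin_folder}/scribe{options}
Icon={icon_folder}/icon.png
StartupWMClass={StartupWMClass}
"""


def render_desktop_entry(name, arguments, terminal=True,
                         bin_folder="/path/to/bin",       # placeholder
                         icon_folder="/path/to/share"):   # placeholder
    """Return (file name, file content) for one launcher."""
    arguments = list(arguments)
    # --no-terminal implies --app and --no-prompt, as in the diff.
    if not terminal:
        for flag in ("--app", "--no-prompt"):
            if flag not in arguments:
                arguments.append(flag)
    simple_name = name.lower().replace(' ', '-').replace(os.path.sep, '-')
    content = TEMPLATE.format(
        icon_folder=icon_folder, bin_folder=bin_folder,
        name=name, terminal=str(terminal).lower(),
        StartupWMClass=f"crx_mpnasdanpmm_{simple_name}",
        options=' ' + ' '.join(arguments) if arguments else '')
    return f"{simple_name}.desktop", content


filename, content = render_desktop_entry(
    "Scribe Vosk FR",
    ["--backend", "vosk", "--language", "fr", "--keyboard", "--clipboard"],
    terminal=False)
print(filename)
print(content)
```

Each distinct `--name` yields its own `.desktop` file name, which is what allows several launchers to coexist instead of overwriting a single `scribe.desktop`.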
@@ -35,6 +35,7 @@ class AbstractTranscriber:
35
35
  self.recording = False
36
36
  self.busy = False
37
37
  self.waiting = False
38
+ self.interrupt = False
38
39
  self.reset()
39
40
 
40
41
  def get_elapsed(self):
@@ -59,6 +60,7 @@ class AbstractTranscriber:
59
60
  stop_message="Done transcribing."):
60
61
 
61
62
  self.reset()
63
+ self.interrupt = False
62
64
  self.recording = True
63
65
  self.waiting = True
64
66
  self.busy = True
@@ -73,7 +75,7 @@ class AbstractTranscriber:
73
75
  with microphone.open_stream():
74
76
  print(start_message)
75
77
 
76
- while self.recording:
78
+ while not self.interrupt:
77
79
  while not microphone.q.empty():
78
80
  data = microphone.q.get()
79
81
 
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.2
2
2
  Name: scribe-cli
3
- Version: 0.7.11
3
+ Version: 0.9.0
4
4
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
5
5
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
6
6
  License: MIT License
@@ -118,7 +118,7 @@ scribe
118
118
  and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
119
119
  After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
120
120
  or until after recording is complete (`whisper`).
121
- You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
121
+ You can interrupt the recording via Ctrl + C and start again or change model.
122
122
 
123
123
  The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
124
124
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -135,19 +135,38 @@ scribe --backend whisper --model small --no-prompt
135
135
  ```
136
136
  where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
137
137
 
138
- ### Virtual keyboard (experimental)
138
+ ## Output media
139
+
140
+ By default the transcription is printed on the terminal, but other output media are supported.
141
+
142
+ ### Clipboard
143
+
144
+ The most straightforward is the clipboard:
145
+
146
+ ```bash
147
+ scribe --clipboard
148
+ ```
149
+ The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
139
150
 
140
- By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
141
- However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
142
- That can be achieve with the `--keyboard` option.
151
+ ### Output file
143
152
 
144
- With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
153
+ Alternatively an output file can be indicated:
154
+
155
+ ```bash
156
+ --keyboard -o transcription.txt
157
+ ```
158
+
159
+ ### Virtual keyboard (experimental)
160
+
161
+ With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
145
162
 
146
163
  ```bash
147
164
  scribe --keyboard
148
165
  ```
149
166
 
150
- It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
167
+ This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
168
+
169
+ The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
151
170
  Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
152
171
 
153
172
  #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -164,7 +183,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
164
183
  ```
165
184
  You're on the right path :)
166
185
 
167
- ### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
186
+ ## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
168
187
 
169
188
  To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
170
189
  To activate start with:
@@ -179,15 +198,30 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
179
198
  pip install PyGObject
180
199
  ```
181
200
 
182
- ### Start as an application in GNOME
201
+ ## Start as an application in GNOME
183
202
 
184
203
  If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
185
- to make it available from the quick launch menu. Any option will be passed on to `scribe`.
204
+ to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
205
+ `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
186
206
 
187
207
  e.g.
188
208
 
189
209
  ```bash
190
- scribe-install --backend whisper --model small
210
+ scribe-install
211
+ scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
212
+ scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
191
213
  ```
214
+ This will install three separate apps:
215
+ - `Super + scribe` : will launch the default version with terminal prompt
216
+ - `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
217
+ - `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
218
+
192
219
 
193
- After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
220
+ ## Fine tuning
221
+
222
+ There are a number of options to control the silence threshold, duration and more.
223
+ Best is to check the available options in the online help:
224
+
225
+ ```bash
226
+ scribe --help
227
+ ```
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env xdg-open
2
+ [Desktop Entry]
3
+ Terminal={terminal}
4
+ Type=Application
5
+ Name={name}
6
+ Exec={bin_folder}/scribe{options}
7
+ Icon={icon_folder}/icon.png
8
+ StartupWMClass={StartupWMClass}
@@ -1,8 +0,0 @@
1
- #!/usr/bin/env xdg-open
2
- [Desktop Entry]
3
- Terminal=true
4
- Type=Application
5
- Name=Scribe
6
- Exec={bin_folder}/scribe{options}
7
- Icon={icon_folder}/icon.jpg
8
- StartupWMClass=crx_mpnasdanpmmopoasdjdcgaaiekailkhb
File without changes