scribe-cli 0.7.10__tar.gz → 0.7.11__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (30)
  1. {scribe_cli-0.7.10/scribe_cli.egg-info → scribe_cli-0.7.11}/PKG-INFO +4 -4
  2. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/README.md +3 -3
  3. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/_version.py +2 -2
  4. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/app.py +27 -9
  5. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/models.py +22 -3
  6. {scribe_cli-0.7.10 → scribe_cli-0.7.11/scribe_cli.egg-info}/PKG-INFO +4 -4
  7. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/.github/workflows/pypi.yml +0 -0
  8. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/.gitignore +0 -0
  9. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/LICENSE +0 -0
  10. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/icon.xcf +0 -0
  11. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/pyproject.toml +0 -0
  12. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/__init__.py +0 -0
  13. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/audio.py +0 -0
  14. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/install_desktop.py +0 -0
  15. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/keyboard.py +0 -0
  16. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/models.toml +0 -0
  17. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/saverecording.py +0 -0
  18. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/testpynput.py +0 -0
  19. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/util.py +0 -0
  20. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_cli.egg-info/SOURCES.txt +0 -0
  21. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_cli.egg-info/dependency_links.txt +0 -0
  22. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_cli.egg-info/entry_points.txt +0 -0
  23. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_cli.egg-info/requires.txt +0 -0
  24. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_cli.egg-info/top_level.txt +0 -0
  25. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_data/__init__.py +0 -0
  26. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_data/share/icon.png +0 -0
  27. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_data/share/icon_recording.png +0 -0
  28. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_data/share/icon_writing.png +0 -0
  29. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_data/templates/scribe.desktop +0 -0
  30. {scribe_cli-0.7.10 → scribe_cli-0.7.11}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.7.10
+ Version: 0.7.11
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
@@ -104,7 +104,7 @@ pip install -e .[all]
  You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
  The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
  the default is left to the `openai-whisper` package and might change in the future).
 
 
@@ -122,8 +122,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 
  The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
- With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
- there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
+ With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+ By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
  The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -41,7 +41,7 @@ pip install -e .[all]
  You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
  The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
  the default is left to the `openai-whisper` package and might change in the future).
 
 
@@ -59,8 +59,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 
  The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
- With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
- there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
+ With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+ By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
  The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -12,5 +12,5 @@ __version__: str
  __version_tuple__: VERSION_TUPLE
  version_tuple: VERSION_TUPLE
 
- __version__ = version = '0.7.10'
- __version_tuple__ = version_tuple = (0, 7, 10)
+ __version__ = version = '0.7.11'
+ __version_tuple__ = version_tuple = (0, 7, 11)
@@ -147,7 +147,8 @@ def get_transcriber(o, prompt=True):
 
  elif backend == "whisper":
  transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
- timeout=o.duration, silence_duration=o.silence, restart_after_silence=o.restart_after_silence,
+ timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
+ restart_after_silence=o.restart_after_silence,
  model_kwargs={"download_root": o.download_folder_whisper})
 
  else:
@@ -178,12 +179,13 @@ def get_parser():
  parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
  parser.add_argument("--keyboard", action="store_true")
  parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
- parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
+ parser.add_argument("--latency", default=0.01, type=float, help="keyboard latency")
  parser.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
 
  group = parser.add_argument_group("whisper options")
- group.add_argument("--duration", default=120, type=int, help="Max duration of the whisper recording (default %(default)ss)")
- group.add_argument("--silence", default=2, type=float, help="silence duration that prompt transcription (whisper) (default %(default)ss)")
+ group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)ss)")
+ group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)ss)")
+ group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in db (default %(default)ss)")
  group.add_argument("--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
 
  parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
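The parser hunk switches `--duration` from `int` to `float` and adds `--silence-db`. A minimal, self-contained sketch of the resulting whisper option group, rebuilt outside of `scribe` (the function name `build_whisper_options` is hypothetical), shows that fractional durations now parse and how a negative dB threshold can be passed. One deliberate deviation: the `--silence-db` help text here uses a `dB` suffix, whereas the diff reuses the `%(default)ss` pattern, which renders as `-30s`.

```python
import argparse


def build_whisper_options(parser: argparse.ArgumentParser) -> None:
    """Add the whisper-related options as they look after this release (sketch)."""
    group = parser.add_argument_group("whisper options")
    # type=float: fractional values such as 1.5 are now accepted
    group.add_argument("--duration", default=120, type=float,
                       help="Max duration of the whisper recording (default %(default)ss)")
    group.add_argument("--silence", default=2, type=float,
                       help="silence duration (default %(default)ss)")
    # Threshold below which an audio chunk counts as silence
    group.add_argument("--silence-db", default=-30, type=float,
                       help="silence magnitude in dB (default %(default)s dB)")
    group.add_argument("--restart-after-silence", action="store_true",
                       help="Restart the recording after a transcription triggered by a silence")


parser = argparse.ArgumentParser(prog="scribe")
build_whisper_options(parser)
# The "=" form keeps the negative value unambiguously attached to its option
args = parser.parse_args(["--duration", "1.5", "--silence-db=-40"])
print(args.duration, args.silence, args.silence_db)
```

Note that argparse only runs `type` on string defaults, so an untouched `--silence` keeps its integer default `2`.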
@@ -249,7 +251,15 @@ def create_app(micro, transcriber, **kwargs):
  image_recording = Image.alpha_composite(image_recording.convert("RGBA"), image_writing.convert("RGBA"))
 
  def update_icon(icon, force=False):
- if transcriber.recording:
+ if transcriber.recording and transcriber.waiting:
+ # this is the situation with the whisper backend when the microphone is recording
+ # but we wait for the speaker to speak (silence)
+ if force or getattr(icon, "_icon_label", None) != None:
+ icon.icon = image
+ icon._icon_label = None
+ icon.update_menu()
+
+ elif transcriber.recording:
  if force or getattr(icon, "_icon_label", None) != "recording":
  icon.icon = image_recording
  icon._icon_label = "recording"
@@ -361,6 +371,7 @@ def main(args=None):
  if transcriber.backend == "whisper":
  print(f"{colored('[t]', 'light_yellow')} change duration (currently {colored(transcriber.timeout, 'light_blue')} s)")
  print(f"{colored('[b]', 'light_yellow')} change silence (currently {colored(transcriber.silence_duration, 'light_blue')} s)")
+ print(f"{colored('[db]', 'light_yellow')} change backround noise (currently {colored(transcriber.silence_thresh, 'light_blue')} db)")
  print(f"{colored('[a]', 'light_yellow')} auto-restart after silence is {colored(transcriber.restart_after_silence, 'light_blue')} toggle?")
  exclude_flags = ["keyboard", "clipboard", "app", "prompt", "restart_after_silence"]
  display_flags = [a.dest for a in parser._actions if a.help != argparse.SUPPRESS]
@@ -401,9 +412,9 @@ def main(args=None):
  if key == "t":
  ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
  try:
- o.duration = transcriber.timeout = int(ans)
+ o.duration = transcriber.timeout = float(ans)
  except:
- print("Invalid duration. Must be an integer.")
+ print("Invalid duration. Must be a float.")
  continue
  if key == "latency":
  ans = input(f"Enter new keyboard latency in seconds (current: {o.latency}): ")
@@ -415,9 +426,16 @@ def main(args=None):
  if key == "b":
  ans = input(f"Enter new silence break duration in seconds (current: {transcriber.silence_duration}): ")
  try:
- o.silence = transcriber.silence_duration = int(ans)
+ o.silence = transcriber.silence_duration = float(ans)
+ except:
+ print("Invalid duration. Must be a float.")
+ continue
+ if key == "db":
+ ans = input(f"Enter new background noise threshold to detect silence (current: {transcriber.silence_thresh}): ")
+ try:
+ o.silence_db = transcriber.silence_thresh = float(ans)
  except:
- print("Invalid duration. Must be an integer.")
+ print("Invalid duration. Must be a float.")
  continue
  if key:
  if hasattr(o, key) and isinstance(getattr(o, key), bool):
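The interactive handlers now parse answers with `float(ans)` inside a bare `except:`, and the new `[db]` branch reuses the "Invalid duration" message. A possible tightening, sketched with a hypothetical helper `read_float` that is not part of `scribe`, catches only `ValueError` and names the setting being edited:

```python
from typing import Optional


def read_float(ans: str, label: str) -> Optional[float]:
    """Convert a user answer to float; on failure, report which setting was rejected."""
    try:
        return float(ans)
    except ValueError:
        # Only conversion errors are caught; KeyboardInterrupt and the like still propagate
        print(f"Invalid {label}. Must be a number.")
        return None


# e.g. in the [db] branch of the prompt loop
value = read_float("-35", "silence threshold (dB)")
if value is not None:
    print(f"new threshold: {value} dB")
```

A bare `except:` also swallows `KeyboardInterrupt`, which matters in a loop that users typically leave with Ctrl + C.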
@@ -34,6 +34,7 @@ class AbstractTranscriber:
  self.restart_after_silence = restart_after_silence
  self.recording = False
  self.busy = False
+ self.waiting = False
  self.reset()
 
  def get_elapsed(self):
@@ -52,7 +53,6 @@ class AbstractTranscriber:
  def reset(self):
  self.audio_buffer = b''
  self.start_time = time.time()
- self.last_sound_time = time.time()
 
  def start_recording(self, microphone,
  start_message="Recording... Press Ctrl+C to stop.",
@@ -60,7 +60,13 @@ class AbstractTranscriber:
 
  self.reset()
  self.recording = True
+ self.waiting = True
  self.busy = True
+ if self.silence_duration is not None:
+ self.last_sound_time = time.time() - self.silence_duration
+ else:
+ self.last_sound_time = time.time()
+ previous_waiting = self.waiting
 
  try:
 
@@ -75,19 +81,31 @@ class AbstractTranscriber:
  if is_silent(data, self.silence_thresh):
  silence_duration = time.time() - self.last_sound_time
 
- if self.silence_duration is not None and silence_duration >= self.silence_duration and len(self.audio_buffer) > 0:
+ previous_waiting = self.waiting
+ self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
+
+ if self.waiting and len(self.audio_buffer) > 0:
  if self.restart_after_silence:
+ self.recording = False # for the system tray icon
  result = self.finalize()
  microphone.q.queue.clear()
  self.reset()
  yield result
+ self.recording = True # for the system tray icon
  else:
  raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
 
  else:
  self.last_sound_time = time.time()
+ self.waiting = False
 
- yield self.transcribe_realtime_audio(data)
+ # don't accumulate very long silences
+ if not self.waiting:
+ yield self.transcribe_realtime_audio(data)
+
+ else:
+ if not previous_waiting:
+ print("Silence detected...waiting for more audio")
 
  if self.is_overtime():
  raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
@@ -98,6 +116,7 @@ class AbstractTranscriber:
  pass
 
  finally:
+ self.waiting = False
  self.recording = False
  result = self.finalize()
  microphone.q.queue.clear()
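The `models.py` changes introduce a `waiting` state: `last_sound_time` is backdated by `silence_duration` so a recording starts out already waiting, silent chunks are dropped while waiting, and a long enough silence after buffered speech ends the take. The sketch below reproduces that control flow as a standalone generator. `is_silent` is called in the diff but its body is not shown, so the RMS-in-dBFS version here is an assumption, as is the 16-bit little-endian mono PCM format; `rms_dbfs`, `gate_chunks` and the injectable `clock` are hypothetical names for illustration only.

```python
import math
import sys
import time
from array import array
from typing import Callable, Iterable, Iterator


def rms_dbfs(data: bytes) -> float:
    """RMS level of 16-bit little-endian mono PCM, in dB relative to full scale (assumed format)."""
    samples = array("h")
    samples.frombytes(data)
    if sys.byteorder == "big":
        samples.byteswap()  # the buffer is assumed to be little-endian
    if len(samples) == 0:
        return -math.inf
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return -math.inf  # digital silence
    return 20 * math.log10(math.sqrt(mean_square) / 32768)


def is_silent(data: bytes, thresh_db: float = -30.0) -> bool:
    """Hypothetical stand-in for scribe's is_silent: quieter than the threshold counts as silence."""
    return rms_dbfs(data) < thresh_db


def gate_chunks(chunks: Iterable[bytes], silence_duration: float = 2.0,
                thresh_db: float = -30.0,
                clock: Callable[[], float] = time.monotonic) -> Iterator[bytes]:
    """Yield the chunks worth transcribing; stop once speech is followed by a long silence."""
    # Backdate the last sound so the recording starts in the waiting state
    last_sound_time = clock() - silence_duration
    waiting = True
    buffered = 0
    for data in chunks:
        if is_silent(data, thresh_db):
            waiting = clock() - last_sound_time >= silence_duration
            if waiting and buffered > 0:
                return  # speech followed by a long silence: time to finalize
        else:
            last_sound_time = clock()
            waiting = False
        # Short pauses inside speech are kept; leading silence is dropped
        if not waiting:
            buffered += len(data)
            yield data
```

The `restart_after_silence` branch of the diff, which finalizes and keeps listening instead of stopping, is left out for brevity; the `clock` parameter only exists so the timing can be driven deterministically.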
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: scribe-cli
- Version: 0.7.10
+ Version: 0.7.11
  Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
  Author-email: Mahé Perrette <mahe.perrette@gmail.com>
  License: MIT License
@@ -104,7 +104,7 @@ pip install -e .[all]
  You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
  The `vosk` language models will download on-the-fly.
- The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+ The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
  the default is left to the `openai-whisper` package and might change in the future).
 
 
@@ -122,8 +122,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 
  The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
  but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
- With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
- there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
+ With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+ By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
  The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
  It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.