scribe-cli 0.7.10__tar.gz → 0.7.11__tar.gz
- {scribe_cli-0.7.10/scribe_cli.egg-info → scribe_cli-0.7.11}/PKG-INFO +4 -4
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/README.md +3 -3
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/_version.py +2 -2
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/app.py +27 -9
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/models.py +22 -3
- {scribe_cli-0.7.10 → scribe_cli-0.7.11/scribe_cli.egg-info}/PKG-INFO +4 -4
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/.github/workflows/pypi.yml +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/.gitignore +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/LICENSE +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/icon.xcf +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/pyproject.toml +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/__init__.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/audio.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/install_desktop.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/keyboard.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/models.toml +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/saverecording.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/testpynput.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe/util.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_cli.egg-info/SOURCES.txt +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_cli.egg-info/dependency_links.txt +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_cli.egg-info/entry_points.txt +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_cli.egg-info/requires.txt +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_cli.egg-info/top_level.txt +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_data/__init__.py +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_data/share/icon.png +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_data/share/icon_recording.png +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_data/share/icon_writing.png +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/scribe_data/templates/scribe.desktop +0 -0
- {scribe_cli-0.7.10 → scribe_cli-0.7.11}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.7.
+Version: 0.7.11
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -104,7 +104,7 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
 The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
 the default is left to the `openai-whisper` package and might change in the future).
 
 
@@ -122,8 +122,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
-With the `
-
+With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
 The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -41,7 +41,7 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
 The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
 the default is left to the `openai-whisper` package and might change in the future).
 
 
@@ -59,8 +59,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
-With the `
-
+With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
 The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -147,7 +147,8 @@ def get_transcriber(o, prompt=True):
 
     elif backend == "whisper":
         transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
-                                         timeout=o.duration, silence_duration=o.silence,
+                                         timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
+                                         restart_after_silence=o.restart_after_silence,
                                          model_kwargs={"download_root": o.download_folder_whisper})
 
     else:
@@ -178,12 +179,13 @@ def get_parser():
     parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
     parser.add_argument("--keyboard", action="store_true")
     parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
-    parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
+    parser.add_argument("--latency", default=0.01, type=float, help="keyboard latency")
     parser.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
 
     group = parser.add_argument_group("whisper options")
-    group.add_argument("--duration", default=120, type=
-    group.add_argument("--silence", default=2, type=float, help="silence duration
+    group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)ss)")
+    group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)ss)")
+    group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in db (default %(default)ss)")
     group.add_argument("--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
 
     parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
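As a rough illustration of how the whisper options in this hunk behave once parsed, the sketch below rebuilds only the arguments visible above in a standalone parser. `build_whisper_parser` and the `scribe-sketch` program name are hypothetical and not part of scribe; the option names, defaults and types are copied from the added lines.

```python
import argparse


def build_whisper_parser():
    """Hypothetical standalone parser holding only the options shown in the hunk."""
    parser = argparse.ArgumentParser(prog="scribe-sketch")
    parser.add_argument("--latency", default=0.01, type=float, help="keyboard latency")

    group = parser.add_argument_group("whisper options")
    group.add_argument("--duration", default=120, type=float,
                       help="Max duration of the whisper recording (default %(default)ss)")
    group.add_argument("--silence", default=2, type=float,
                       help="silence duration (default %(default)ss)")
    group.add_argument("--silence-db", default=-30, type=float,
                       help="silence magnitude in db (default %(default)ss)")
    group.add_argument("--restart-after-silence", action="store_true",
                       help="Restart the recording after a transcription triggered by a silence")
    return parser


# argparse exposes "--silence-db" as the attribute "silence_db".
# The "=" form keeps the negative value unambiguous on the command line.
o = build_whisper_parser().parse_args(["--silence-db=-40", "--duration", "30"])
print(o.silence_db, o.duration, o.silence, o.restart_after_silence)
```

Values given on the command line go through `type=float`, while non-string defaults such as `default=2` are stored as written, so `o.silence` stays an `int` unless the user overrides it.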
@@ -249,7 +251,15 @@ def create_app(micro, transcriber, **kwargs):
     image_recording = Image.alpha_composite(image_recording.convert("RGBA"), image_writing.convert("RGBA"))
 
     def update_icon(icon, force=False):
-        if transcriber.recording:
+        if transcriber.recording and transcriber.waiting:
+            # this is the situation with the whisper backend when the microphone is recording
+            # but we wait for the speaker to speak (silence)
+            if force or getattr(icon, "_icon_label", None) != None:
+                icon.icon = image
+                icon._icon_label = None
+                icon.update_menu()
+
+        elif transcriber.recording:
             if force or getattr(icon, "_icon_label", None) != "recording":
                 icon.icon = image_recording
                 icon._icon_label = "recording"
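The branch order in `update_icon` can be read as a small state mapping. The sketch below is a simplified illustration only: `icon_label` is a hypothetical pure function, and any branches of `update_icon` not visible in this hunk are deliberately ignored.

```python
def icon_label(recording, waiting):
    """Return the tray label for a (recording, waiting) pair.

    None stands for the default icon. The waiting-for-speech case reuses it,
    as in the added branch above.
    """
    if recording and waiting:
        return None  # microphone open, but no speech detected yet
    if recording:
        return "recording"
    # Assumption: states handled by branches outside the hunk are collapsed here.
    return None


for state in [(True, True), (True, False), (False, False)]:
    print(state, "->", icon_label(*state))
```

Testing the combined condition first matters: with the plain `recording` check on top, the waiting case would never be reached.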
@@ -361,6 +371,7 @@ def main(args=None):
         if transcriber.backend == "whisper":
             print(f"{colored('[t]', 'light_yellow')} change duration (currently {colored(transcriber.timeout, 'light_blue')} s)")
             print(f"{colored('[b]', 'light_yellow')} change silence (currently {colored(transcriber.silence_duration, 'light_blue')} s)")
+            print(f"{colored('[db]', 'light_yellow')} change backround noise (currently {colored(transcriber.silence_thresh, 'light_blue')} db)")
             print(f"{colored('[a]', 'light_yellow')} auto-restart after silence is {colored(transcriber.restart_after_silence, 'light_blue')} toggle?")
         exclude_flags = ["keyboard", "clipboard", "app", "prompt", "restart_after_silence"]
         display_flags = [a.dest for a in parser._actions if a.help != argparse.SUPPRESS]
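The diff does not show how `is_silent` compares a chunk against `silence_thresh`. One common approach, sketched here purely as an assumption, is to compute the RMS level of the chunk in dB relative to full scale and compare it with the threshold. The function names and the 16-bit little-endian mono PCM format are hypothetical choices for the example.

```python
import math
import struct


def rms_dbfs(data):
    """RMS level of 16-bit little-endian mono PCM bytes, in dB relative to full scale."""
    count = len(data) // 2
    if count == 0:
        return float("-inf")
    samples = struct.unpack(f"<{count}h", data[:count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count)
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / 32768)


def is_silent_sketch(data, silence_thresh=-30):
    """Hypothetical stand-in for is_silent: True when the chunk is below the threshold."""
    return rms_dbfs(data) < silence_thresh


quiet = struct.pack("<4h", 100, -100, 100, -100)
loud = struct.pack("<4h", 16384, -16384, 16384, -16384)
print(is_silent_sketch(quiet), is_silent_sketch(loud))
```

Under this reading, a lower (more negative) `--silence-db` value treats only very quiet input as silence, while a value closer to 0 tolerates more background noise.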
@@ -401,9 +412,9 @@ def main(args=None):
             if key == "t":
                 ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
                 try:
-                    o.duration = transcriber.timeout =
+                    o.duration = transcriber.timeout = float(ans)
                 except:
-                    print("Invalid duration. Must be
+                    print("Invalid duration. Must be a float.")
                 continue
             if key == "latency":
                 ans = input(f"Enter new keyboard latency in seconds (current: {o.latency}): ")
@@ -415,9 +426,16 @@ def main(args=None):
             if key == "b":
                 ans = input(f"Enter new silence break duration in seconds (current: {transcriber.silence_duration}): ")
                 try:
-                    o.silence = transcriber.silence_duration =
+                    o.silence = transcriber.silence_duration = float(ans)
+                except:
+                    print("Invalid duration. Must be a float.")
+                continue
+            if key == "db":
+                ans = input(f"Enter new background noise threshold to detect silence (current: {transcriber.silence_thresh}): ")
+                try:
+                    o.silence_db = transcriber.silence_thresh = float(ans)
                 except:
-                    print("Invalid duration. Must be
+                    print("Invalid duration. Must be a float.")
                 continue
             if key:
                 if hasattr(o, key) and isinstance(getattr(o, key), bool):
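The three prompts (`t`, `b`, `db`) repeat the same parse-or-complain pattern. A minimal sketch of that pattern as one helper follows. `parse_float_or_keep` is a hypothetical name, and it catches `ValueError` specifically, a narrower choice than the bare `except:` in the diff.

```python
def parse_float_or_keep(ans, current):
    """Return float(ans), or keep `current` when the text is not a valid number."""
    try:
        return float(ans)
    except ValueError:
        print("Invalid value. Must be a float.")
        return current


# Simulated answers instead of input(), so the example runs non-interactively.
print(parse_float_or_keep("1.5", 2.0))
print(parse_float_or_keep("abc", 2.0))
print(parse_float_or_keep("-40", -30.0))
```

Catching only `ValueError` lets Ctrl + C (`KeyboardInterrupt`) propagate instead of being reported as an invalid number.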
@@ -34,6 +34,7 @@ class AbstractTranscriber:
         self.restart_after_silence = restart_after_silence
         self.recording = False
         self.busy = False
+        self.waiting = False
         self.reset()
 
     def get_elapsed(self):
@@ -52,7 +53,6 @@ class AbstractTranscriber:
     def reset(self):
         self.audio_buffer = b''
         self.start_time = time.time()
-        self.last_sound_time = time.time()
 
     def start_recording(self, microphone,
                         start_message="Recording... Press Ctrl+C to stop.",
@@ -60,7 +60,13 @@ class AbstractTranscriber:
 
         self.reset()
         self.recording = True
+        self.waiting = True
         self.busy = True
+        if self.silence_duration is not None:
+            self.last_sound_time = time.time() - self.silence_duration
+        else:
+            self.last_sound_time = time.time()
+        previous_waiting = self.waiting
 
         try:
 
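Backdating `last_sound_time` by `silence_duration` appears to make the very first silent chunk count as a full silence, so the transcriber begins in the waiting state. The sketch below illustrates the arithmetic with an injected clock value. `initial_last_sound_time` and its `now` parameter are hypothetical, added only so the example is deterministic.

```python
import time


def initial_last_sound_time(silence_duration, now=None):
    """Mirror the initialisation added in start_recording, with an injectable clock."""
    now = time.time() if now is None else now
    if silence_duration is not None:
        return now - silence_duration  # pretend the last sound was a full silence ago
    return now


now = 1000.0
last = initial_last_sound_time(2.0, now=now)
elapsed_silence = now - last
print(elapsed_silence, elapsed_silence >= 2.0)
```

With `silence_duration=None` the timestamp is simply the current time, matching the `else` branch above.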
@@ -75,19 +81,31 @@ class AbstractTranscriber:
                 if is_silent(data, self.silence_thresh):
                     silence_duration = time.time() - self.last_sound_time
 
-
+                    previous_waiting = self.waiting
+                    self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
+
+                    if self.waiting and len(self.audio_buffer) > 0:
                         if self.restart_after_silence:
+                            self.recording = False # for the system tray icon
                             result = self.finalize()
                             microphone.q.queue.clear()
                             self.reset()
                             yield result
+                            self.recording = True # for the system tray icon
                         else:
                             raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
 
                 else:
                     self.last_sound_time = time.time()
+                    self.waiting = False
 
-
+                # don't accumulate very long silences
+                if not self.waiting:
+                    yield self.transcribe_realtime_audio(data)
+
+                else:
+                    if not previous_waiting:
+                        print("Silence detected...waiting for more audio")
 
                 if self.is_overtime():
                     raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
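To make the `waiting` transitions easier to follow, here is a simplified replay of the logic over timestamped chunks. It is a sketch under stated assumptions: `simulate_waiting` is a hypothetical helper, `silence_duration` is assumed to be a number, and the audio buffer, transcription and restart handling from the hunk are left out.

```python
def simulate_waiting(chunks, silence_duration=2.0, start=0.0):
    """Replay (timestamp, is_silent) pairs and return the waiting flag after each chunk."""
    last_sound_time = start - silence_duration  # begin in the waiting state
    states = []
    for now, silent in chunks:
        if silent:
            # waiting once the silence has lasted at least silence_duration
            waiting = (now - last_sound_time) >= silence_duration
        else:
            last_sound_time = now
            waiting = False
        states.append(waiting)
    return states


chunks = [(0.0, True), (0.5, False), (1.0, True), (2.0, True), (2.5, True), (3.0, False)]
print(simulate_waiting(chunks))
# prints [True, False, False, False, True, False]
```

In this replay, silent chunks arriving while `waiting` is true are the ones the diff skips ("don't accumulate very long silences"), whereas short pauses inside speech are still passed on.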
@@ -98,6 +116,7 @@ class AbstractTranscriber:
             pass
 
         finally:
+            self.waiting = False
             self.recording = False
             result = self.finalize()
             microphone.q.queue.clear()
@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.7.
+Version: 0.7.11
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -104,7 +104,7 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 
 The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
 the default is left to the `openai-whisper` package and might change in the future).
 
 
@@ -122,8 +122,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
-With the `
-
+With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 
 The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
File without changes