PyPI - scribe-cli - Versions diffs - 0.11.1__tar.gz → 0.12.1__tar.gz - Mend

scribe-cli 0.11.1.tar.gz → 0.12.1.tar.gz

Files changed (30)

{scribe_cli-0.11.1/scribe_cli.egg-info → scribe_cli-0.12.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.11.1
+Version: 0.12.1
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -196,8 +196,8 @@ To activate start with:
 scribe --app
 ```
 or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
-of predefined models, or to Quit and choose from the terminal before pressing Enter again.
-For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
+of predefined models (controlled by `--vosk-models` and `whisper-models`) and options, or to Quit and choose from the terminal before pressing Enter again.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording/waiting, transcribing and idle.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 ```bash
@@ -220,13 +220,10 @@ scribe-install --clipboard  --api YOUROPENAIAPIKEY
 ```
 (`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
-And to make an app running outside the terminal:
+It is also possible to run an app fully outside the terminal:
 ```bash
-scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY
+scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY  --vosk-models vosk-model-fr-0.22 --whisper-models small turbo
 ```
-This will install two separate apps (names "Scribe" and "Scribe App")
 ## Fine tuning

{scribe_cli-0.11.1 → scribe_cli-0.12.1}/README.md RENAMED

@@ -128,8 +128,8 @@ To activate start with:
 scribe --app
 ```
 or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
-of predefined models, or to Quit and choose from the terminal before pressing Enter again.
-For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
+of predefined models (controlled by `--vosk-models` and `whisper-models`) and options, or to Quit and choose from the terminal before pressing Enter again.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording/waiting, transcribing and idle.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 ```bash
@@ -152,13 +152,10 @@ scribe-install --clipboard  --api YOUROPENAIAPIKEY
 ```
 (`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
-And to make an app running outside the terminal:
+It is also possible to run an app fully outside the terminal:
 ```bash
-scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY
+scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY  --vosk-models vosk-model-fr-0.22 --whisper-models small turbo
 ```
-This will install two separate apps (names "Scribe" and "Scribe App")
 ## Fine tuning

{scribe_cli-0.11.1 → scribe_cli-0.12.1}/scribe/_version.py RENAMED

@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
-__version__ = version = '0.11.1'
-__version_tuple__ = version_tuple = (0, 11, 1)
+__version__ = version = '0.12.1'
+__version_tuple__ = version_tuple = (0, 12, 1)

{scribe_cli-0.11.1 → scribe_cli-0.12.1}/scribe/app.py RENAMED

@@ -3,6 +3,7 @@ import tomllib
 import re
 import time
 import argparse
+from typing import Iterable
 from scribe.audio import Microphone
 from scribe.util import print_partial, clear_line, prompt_choices, ansi_link, colored
 from scribe.models import VoskTranscriber, WhisperTranscriber, OpenaiAPITranscriber
@@ -204,13 +205,17 @@ def get_parser():
     group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)s s)")
     group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
     group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
+    group.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
     group = parser.add_argument_group("whisper api")
     group.add_argument("--api-key",
                         help="API key for the Whisper API backend.")
+    group = parser.add_argument_group("App")
+    group.add_argument("--vosk-models", nargs="*", help="vosk models available for the app mode", default=vosk_models)
+    group.add_argument("--whisper-models", nargs="*", help="whisper models available for the app mode", default=whisper_models)
     parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
-    parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
     return parser
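
Review note: the new `--vosk-models` and `--whisper-models` options use `nargs="*"`, so each flag accepts zero or more values and falls back to its default list when omitted. The sketch below reproduces that behaviour in isolation. `DEFAULT_VOSK_MODELS` and `DEFAULT_WHISPER_MODELS` are placeholder lists assumed for illustration; the real module-level `vosk_models` and `whisper_models` defaults are not shown in this diff.

```python
import argparse

# Placeholder defaults (assumption): the actual lists live in scribe/app.py
# and are not visible in this diff.
DEFAULT_VOSK_MODELS = ["vosk-model-fr-0.22"]
DEFAULT_WHISPER_MODELS = ["small", "turbo"]


def build_parser():
    """Build a reduced parser that only carries the two new app options."""
    parser = argparse.ArgumentParser(prog="scribe-sketch")
    group = parser.add_argument_group("App")
    group.add_argument("--vosk-models", nargs="*", default=DEFAULT_VOSK_MODELS,
                       help="vosk models available for the app mode")
    group.add_argument("--whisper-models", nargs="*", default=DEFAULT_WHISPER_MODELS,
                       help="whisper models available for the app mode")
    return parser


# Several values after one flag are collected into a list.
args = build_parser().parse_args(["--whisper-models", "small", "turbo"])
print(args.whisper_models)

# A flag given without any value yields an empty list, not the default.
args = build_parser().parse_args(["--vosk-models"])
print(args.vosk_models)
```

One consequence worth keeping in mind: passing a flag with no values produces an empty list, which would presumably leave that backend with no entries in the "Choose Model" menu.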
@@ -251,7 +256,7 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
         callback()
-def create_app(micro, transcriber, other_transcribers=None, **kwargs):
+def create_app(micro, transcriber, other_transcribers=None, transcriber_options=[], **kwargs):
     import pystray
     from pystray import Menu as pystrayMenu, MenuItem as Item
     from PIL import Image
@@ -340,6 +345,9 @@ def create_app(micro, transcriber, other_transcribers=None, **kwargs):
     def callback_set_model(icon, item):
         transcriber = icon._transcriber
+        if transcriber.model_name == str(item):
+            transcriber.log(f"Already using model {str(item)}")
+            return
         callback_stop_recording(icon, item)
         model_name = str(item)
         meta = other_transcribers_dict[model_name]
@@ -351,7 +359,24 @@ def create_app(micro, transcriber, other_transcribers=None, **kwargs):
         icon.update_menu()
     def callback_toggle_option(icon, item):
-        kwargs[str(item)] = not kwargs[str(item)]
+        callback_stop_recording(icon, item)
+        if str(item) in transcriber_options:
+            # toggle the option on the current transcriber
+            if str(item) in icon._transcriber._frozen_options or type(getattr(icon._transcriber, str(item), None)) is not bool:
+                print("Skipped setting option", item)
+                return
+            newvalue = not getattr(icon._transcriber, str(item))
+            setattr(icon._transcriber, str(item), newvalue)
+            # set the option on the other transcribers as well
+            if other_transcribers:
+                for name in other_transcribers_dict:
+                    meta = other_transcribers_dict[name]
+                    if str(item) in meta:
+                        meta[str(item)] = newvalue
+        else:
+            kwargs[str(item)] = not kwargs[str(item)]
+            print("Option set [", item, "] to", kwargs[str(item)])
     def is_model_selection(item):
         return icon._model_selection
@@ -362,23 +387,34 @@ def create_app(micro, transcriber, other_transcribers=None, **kwargs):
     def is_not_recording(item):
         return not is_recording(item) and not is_model_selection(item)
-    def is_checked(item):
+    def is_checked_model(item):
         return icon._transcriber.model_name == str(item)
     def is_checked_option(item):
+        if not is_option_visible(item):
+            return False
+        if str(item) in transcriber_options:
+            return getattr(icon._transcriber, str(item))
         return kwargs[str(item)]
+    def is_option_visible(item):
+        if str(item) in transcriber_options:
+            return str(item) not in icon._transcriber._frozen_options
+        return True
     modeltitle = f"{transcriber.backend} :: {transcriber.model_name}"
     title = f"scribe :: {modeltitle}"
+    options = [name for name in kwargs if isinstance(kwargs[name], bool)] + [name for name in transcriber_options if isinstance(getattr(transcriber, name), bool)]
     menus = []
     menus.append(Item(f"Record", callback_record, visible=is_not_recording, default=True))
     menus.append(Item("Stop", callback_stop_recording, visible=is_recording))
     menus.append(Item("Choose Model", pystrayMenu(
-        *(Item(f"{name}", callback_set_model, checked=is_checked) for name in other_transcribers_dict)))
+        *(Item(f"{name}", callback_set_model, checked=is_checked_model) for name in other_transcribers_dict)))
     )
     menus.append(Item("Toggle Options", pystrayMenu(
-        *(Item(f"{name}", callback_toggle_option, checked=is_checked_option) for name in kwargs if isinstance(kwargs[name], bool))))
+        *(Item(f"{name}", callback_toggle_option, checked=is_checked_option, visible=is_option_visible) for name in options)))
     )
     menus.append(Item('Quit', callback_quit))
@@ -393,6 +429,8 @@ def create_app(micro, transcriber, other_transcribers=None, **kwargs):
     return icon
+def _filter_options(d: dict, exclude: Iterable) -> dict:
+    return {k: v for k, v in d.items() if k not in exclude}
 def main(args=None):
@@ -525,10 +563,11 @@ def main(args=None):
             app = create_app(micro, transcriber, other_transcribers=[
                 {**vars(o), "backend": "openaiapi", "model": "whisper-1"},
-                *[{**vars(o), "backend": "whisper", "model": model} for model in whisper_models],
-                *[{**vars(o), "backend": "vosk", "model": model} for model in vosk_models]],
+                *[{**vars(o), "backend": "whisper", "model": model} for model in o.whisper_models],
+                *[{**_filter_options(vars(o), exclude=VoskTranscriber._frozen_options), "backend": "vosk", "model": model} for model in o.vosk_models]],
                              clipboard=o.clipboard, output_file=o.output_file,
-                             keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
+                             keyboard=o.keyboard, latency=o.latency, ascii=o.ascii,
+                             transcriber_options=["restart_after_silence"], **greetings)
             print("Starting app...")
             app.run()
         else:
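
Review note: `_filter_options` strips the Vosk "frozen" options from the per-model metadata, and `callback_toggle_option` only propagates a toggled value to metadata entries that already contain the key. The sketch below shows how the two interact, under simplifying assumptions: `options` is a hypothetical stand-in for `vars(o)`, `FROZEN` copies the value of `VoskTranscriber._frozen_options` from this release, and `propagate` is a hypothetical helper that condenses the propagation loop of the callback.

```python
from typing import Iterable


def _filter_options(d: dict, exclude: Iterable) -> dict:
    """Return a copy of `d` without the keys listed in `exclude` (as in the diff)."""
    return {k: v for k, v in d.items() if k not in exclude}


# Copied from VoskTranscriber._frozen_options in this release.
FROZEN = frozenset(["restart_after_silence", "silence_duration", "silence_thresh"])

# Hypothetical stand-in for vars(o), the parsed command-line options.
options = {"restart_after_silence": True, "silence_duration": 2.0, "language": "fr"}

# Whisper entries keep every option; Vosk entries lose the frozen ones.
whisper_entries = [{**options, "backend": "whisper", "model": m} for m in ["small", "turbo"]]
vosk_entries = [{**_filter_options(options, exclude=FROZEN), "backend": "vosk", "model": m}
                for m in ["vosk-model-fr-0.22"]]


def propagate(entries, name, value):
    """Hypothetical helper: set `name` only on entries that already define it."""
    for meta in entries:
        if name in meta:
            meta[name] = value


propagate(whisper_entries + vosk_entries, "restart_after_silence", False)
print(whisper_entries[0]["restart_after_silence"])
print("restart_after_silence" in vosk_entries[0])
```

Because the Vosk entries never receive the frozen keys, toggling `restart_after_silence` from the tray menu leaves them untouched, which matches the intent of hiding that option while a Vosk model is active.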

{scribe_cli-0.11.1 → scribe_cli-0.12.1}/scribe/models.py RENAMED

@@ -16,11 +16,15 @@ HOME = os.environ.get('HOME', os.path.expanduser('~'))
 XDG_CACHE_HOME = os.environ.get('XDG_CACHE_HOME', os.path.join(HOME, '.cache'))
 VOSK_MODELS_FOLDER = os.path.join(XDG_CACHE_HOME, "vosk")
+class SilenceDetected(Exception):
+    pass
 class StopRecording(Exception):
     pass
 class AbstractTranscriber:
     backend = None
+    _frozen_options = frozenset()
     def __init__(self, model, model_name=None, language=None, samplerate=16000, timeout=None, model_kwargs={},
                  silence_thresh=-40, silence_duration=2, restart_after_silence=False, logger=None):
         self.model_name = model_name
@@ -50,7 +54,29 @@ class AbstractTranscriber:
         return self.timeout is not None and time.time() - self.start_time > self.timeout
     def transcribe_realtime_audio(self, audio_bytes=b""):
-        self.audio_buffer += audio_bytes
+        # Check whether the segment is silence
+        if is_silent(audio_bytes, self.silence_thresh):
+            self.silence_buffer += audio_bytes
+            silence_duration = time.time() - self.last_sound_time
+            self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
+            if self.waiting and len(self.audio_buffer) > 0:
+                if self.restart_after_silence:
+                    raise SilenceDetected("Silence detected: {:.2f} seconds".format(silence_duration))
+                else:
+                    raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
+        else:
+            self.last_sound_time = time.time()
+            self.waiting = False
+            silence_buffer_data = np.frombuffer(self.silence_buffer, dtype=np.int16)
+            # add 0.5 seconds worth of silent data back to the audio buffer
+            half_a_second = 0.5
+            length_of_half_a_second = int(half_a_second * self.samplerate)
+            self.audio_buffer += silence_buffer_data[-length_of_half_a_second:].tobytes() + audio_bytes
+            self.silence_buffer = b''
         return {"partial": f"{len(self.audio_buffer)} bytes received (duration: {self.get_elapsed()} seconds)"}
     def transcribe_audio(self, audio_data):
@@ -59,6 +85,7 @@ class AbstractTranscriber:
     def reset(self):
         self.audio_buffer = b''
         self.start_time = time.time()
+        self.silence_buffer = b''
     def log(self, text):
         if text.startswith("\n"):
@@ -82,7 +109,7 @@ class AbstractTranscriber:
             self.last_sound_time = time.time() - self.silence_duration
         else:
             self.last_sound_time = time.time()
-        previous_waiting = self.waiting
+        # self.silence_buffer = b'' # already reset in self.reset()
         try:
@@ -93,35 +120,20 @@ class AbstractTranscriber:
                     while not microphone.q.empty():
                         data = microphone.q.get()
-                        # Check whether the segment is silence
-                        if is_silent(data, self.silence_thresh):
-                            silence_duration = time.time() - self.last_sound_time
-                            previous_waiting = self.waiting
-                            self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
-                            if self.waiting and len(self.audio_buffer) > 0:
-                                if self.restart_after_silence:
-                                    self.recording = False # for the system tray icon
-                                    result = self.finalize()
-                                    microphone.q.queue.clear()
-                                    self.reset()
-                                    yield result
-                                    self.recording = True # for the system tray icon
-                                else:
-                                    raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
-                        else:
-                            self.last_sound_time = time.time()
-                            self.waiting = False
-                        # don't accumulate very long silences
-                        if not self.waiting:
+                        # leave it to each transcriber to handle the silence in audio data
+                        try:
                             yield self.transcribe_realtime_audio(data)
-                        else:
-                            if not previous_waiting:
-                                self.log("Silence detected...waiting for more audio")
+                        # This exception triggers a pause in recording to allow for a transcription of the audio buffer
+                        except SilenceDetected as e:
+                            self.log(str(e))
+                            self.recording = False # for the system tray icon
+                            result = self.finalize()
+                            microphone.q.queue.clear()
+                            self.reset()
+                            yield result
+                            self.recording = True # for the system tray icon
+                            self.start_time = time.time() # reset the start time to avoid timeout
                         if self.is_overtime():
                             raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
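
Review note: this hunk moves silence handling out of the recording loop and into the transcriber, which now signals through exceptions: `SilenceDetected` pauses to transcribe and then continues, `StopRecording` ends the session. The toy model below illustrates that control flow only. `ToyTranscriber`, `feed`, and `stream` are hypothetical names, text chunks stand in for audio, `None` stands in for "a long enough silence", and the handling of `StopRecording` in the outer `except` is an assumption since that part of the real method is outside this hunk.

```python
class SilenceDetected(Exception):
    pass


class StopRecording(Exception):
    pass


class ToyTranscriber:
    """Hypothetical stand-in: accumulates words instead of audio bytes."""

    def __init__(self, restart_after_silence):
        self.restart_after_silence = restart_after_silence
        self.buffer = []

    def feed(self, chunk):
        if chunk is None:  # None marks a silence longer than the threshold
            if self.buffer:
                if self.restart_after_silence:
                    raise SilenceDetected("silence")
                raise StopRecording("silence")
            return {"partial": ""}
        self.buffer.append(chunk)
        return {"partial": " ".join(self.buffer)}

    def finalize(self):
        result = {"text": " ".join(self.buffer)}
        self.buffer = []  # plays the role of reset()
        return result


def stream(transcriber, chunks):
    """Yield partial results, and a final text whenever a silence is signalled."""
    try:
        for chunk in chunks:
            try:
                yield transcriber.feed(chunk)
            except SilenceDetected:
                # Pause, transcribe what was buffered, then keep listening.
                yield transcriber.finalize()
    except StopRecording:
        # Assumed behaviour: transcribe once and end the session.
        yield transcriber.finalize()


chunks = ["hello", "world", None, "again", None]
for restart in (True, False):
    texts = [r["text"] for r in stream(ToyTranscriber(restart), chunks) if "text" in r]
    print(restart, texts)
```

With `restart_after_silence` enabled the loop survives each silence and produces one text per utterance; without it the first silence ends the stream.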
@@ -165,8 +177,10 @@ def get_vosk_recognizer(model, samplerate=16000):
 class VoskTranscriber(AbstractTranscriber):
     backend = "vosk"
+    _frozen_options = frozenset(["restart_after_silence", "silence_duration", "silence_thresh"])
     def __init__(self, model_name, model=None, model_kwargs={}, **kwargs):
+        kwargs["silence_thresh"] = -np.inf  # disable silence detection (this is handled by Vosk)
         if model is None:
             model = get_vosk_model(model_name, **model_kwargs)
         super().__init__(model, model_name, model_kwargs=model_kwargs, **kwargs)
@@ -222,7 +236,7 @@ class WhisperTranscriber(AbstractTranscriber):
         if len(self.audio_buffer) == 0:
             return {"text": ""}
         result = self.transcribe_audio(self.audio_buffer)
-        self.audio_buffer = b''
+        self.reset()
         return result
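
Review note: the new `transcribe_realtime_audio` parks silent chunks in `silence_buffer` and, when sound resumes, prepends only the last half second of that silence to `audio_buffer`. The sketch below isolates the buffering and leaves out the time-based `waiting` logic. It is a sketch under stated assumptions: `is_silent` here is a hypothetical RMS-based stand-in (the real helper's body is not in this diff), `SilenceAwareBuffer` and `push` are illustrative names, and plain byte slicing replaces the NumPy `int16` view used in the package.

```python
import array
import math

SAMPLE_WIDTH = 2  # bytes per 16-bit sample


def is_silent(chunk: bytes, thresh_db: float) -> bool:
    """Hypothetical stand-in: compare the RMS level (dB re full scale) to a threshold."""
    samples = array.array("h", chunk)
    if samples:
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    else:
        rms = 0.0
    level_db = 20 * math.log10(rms / 32768) if rms > 0 else -math.inf
    return level_db < thresh_db


class SilenceAwareBuffer:
    """Keep sound, plus at most `keep_seconds` of the silence that preceded it."""

    def __init__(self, samplerate=16000, thresh_db=-40.0, keep_seconds=0.5):
        self.thresh_db = thresh_db
        self.keep_bytes = int(keep_seconds * samplerate) * SAMPLE_WIDTH
        self.audio_buffer = b""
        self.silence_buffer = b""

    def push(self, chunk: bytes) -> None:
        if is_silent(chunk, self.thresh_db):
            # Silence is set aside instead of growing the audio buffer.
            self.silence_buffer += chunk
        else:
            # Sound resumed: restore only the tail of the preceding silence.
            self.audio_buffer += self.silence_buffer[-self.keep_bytes:] + chunk
            self.silence_buffer = b""


one_second_of_silence = array.array("h", [0] * 16000).tobytes()
short_tone = array.array("h", [10000] * 1600).tobytes()

buf = SilenceAwareBuffer()
buf.push(one_second_of_silence)
buf.push(one_second_of_silence)
buf.push(short_tone)
print(len(buf.audio_buffer))  # half a second of silence plus the tone, in bytes
```

In this stand-in, a threshold of `-math.inf` means no chunk is ever classified as silent, which is the effect `VoskTranscriber` relies on when it forces `silence_thresh` to `-np.inf` and leaves silence handling to Vosk.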

{scribe_cli-0.11.1 → scribe_cli-0.12.1/scribe_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.11.1
+Version: 0.12.1
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -196,8 +196,8 @@ To activate start with:
 scribe --app
 ```
 or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
-of predefined models, or to Quit and choose from the terminal before pressing Enter again.
-For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
+of predefined models (controlled by `--vosk-models` and `whisper-models`) and options, or to Quit and choose from the terminal before pressing Enter again.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording/waiting, transcribing and idle.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 ```bash
@@ -220,13 +220,10 @@ scribe-install --clipboard  --api YOUROPENAIAPIKEY
 ```
 (`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
-And to make an app running outside the terminal:
+It is also possible to run an app fully outside the terminal:
 ```bash
-scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY
+scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY  --vosk-models vosk-model-fr-0.22 --whisper-models small turbo
 ```
-This will install two separate apps (names "Scribe" and "Scribe App")
 ## Fine tuning