PyPI - scribe-cli - Versions diffs - 0.10.0__tar.gz → 0.11.1__tar.gz - Mend

scribe-cli 0.10.0tar.gz → 0.11.1tar.gz


Files changed (30)

{scribe_cli-0.10.0/scribe_cli.egg-info → scribe_cli-0.11.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.10.0
+Version: 0.11.1
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -158,7 +158,7 @@ The content of the (full) transcription is then pasted to the clipboard, and it
 Alternatively an output file can be indicated:
 ```bash
- --keyboard -o transcription.txt
+scribe -o transcription.txt
 ```
 ### Virtual keyboard (experimental)
@@ -195,7 +195,8 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
+of predefined models, or to Quit and choose from the terminal before pressing Enter again.
 For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
@@ -204,23 +205,27 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
+<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
 ## Start as an application in GNOME
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
 `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
-e.g.
+In a relatively basic form
+```bash
+scribe-install --clipboard  --api YOUROPENAIAPIKEY
+```
+(`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
+And to make an app running outside the terminal:
 ```bash
-scribe-install
-scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
-scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
+scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY
 ```
-This will install three separate apps:
-- `Super + scribe` : will launch the default version with terminal prompt
-- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
-- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).
+This will install two separate apps (names "Scribe" and "Scribe App")
 ## Fine tuning

{scribe_cli-0.10.0 → scribe_cli-0.11.1}/README.md RENAMED

@@ -90,7 +90,7 @@ The content of the (full) transcription is then pasted to the clipboard, and it
 Alternatively an output file can be indicated:
 ```bash
- --keyboard -o transcription.txt
+scribe -o transcription.txt
 ```
 ### Virtual keyboard (experimental)
@@ -127,7 +127,8 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
+of predefined models, or to Quit and choose from the terminal before pressing Enter again.
 For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
@@ -136,23 +137,27 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
+<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
 ## Start as an application in GNOME
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
 `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
-e.g.
+In a relatively basic form
+```bash
+scribe-install --clipboard  --api YOUROPENAIAPIKEY
+```
+(`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
+And to make an app running outside the terminal:
 ```bash
-scribe-install
-scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
-scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
+scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY
 ```
-This will install three separate apps:
-- `Super + scribe` : will launch the default version with terminal prompt
-- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
-- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).
+This will install two separate apps (names "Scribe" and "Scribe App")
 ## Fine tuning
@@ -162,4 +167,4 @@ Best is to check the available options in the online help:
 ```bash
 scribe --help
-```
+```

{scribe_cli-0.10.0 → scribe_cli-0.11.1}/scribe/_version.py RENAMED

@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
-__version__ = version = '0.10.0'
-__version_tuple__ = version_tuple = (0, 10, 0)
+__version__ = version = '0.11.1'
+__version_tuple__ = version_tuple = (0, 11, 1)

{scribe_cli-0.10.0 → scribe_cli-0.11.1}/scribe/app.py RENAMED

@@ -55,49 +55,54 @@ class DummyTranscriber:
     def __getattr__(self, item):
         return None
-def get_transcriber(o, prompt=True):
+whisper_models = ["tiny", "base", "small", "medium", "large", "turbo"]
+whisper_english_models = ["tiny.en", "base.en", "small.en", "medium.en"]
+whisperapi_models = ["whisper-1"]
+vosk_models = [language_config["vosk"][lang]["model"] for lang in language_config["vosk"]]
-    whisper_models = ["tiny", "base", "small", "medium", "large", "turbo"]
-    whisper_english_models = ["tiny.en", "base.en", "small.en", "medium.en"]
-    whisperapi_models = ["whisper-1"]
-    if o.dummy:
+def get_transcriber(model=None, backend=None, dummy=False, prompt=True, language=None,
+                    samplerate=None, duration=None, silence=None, silence_db=None, restart_after_silence=None,
+                    api_key=None,
+                    download_folder_vosk=None, download_folder_whisper=None, **kwargs):
+    if dummy:
         return DummyTranscriber("whisper", "dummy")
-    if o.model and not o.backend:
-        if o.model.startswith("vosk-"):
-            o.backend = "vosk"
-        elif o.model in whisper_models + whisper_english_models:
-            o.backend = "whisper"
-        elif o.model in whisperapi_models:
-            o.backend = "openaiapi"
+    if model and not backend:
+        if model.startswith("vosk-"):
+            backend = "vosk"
+        elif model in whisper_models + whisper_english_models:
+            backend = "whisper"
+        elif model in whisperapi_models:
+            backend = "openaiapi"
-    if o.backend:
-        backend = o.backend
+    if backend:
+        backend = backend
     elif not prompt:
         backend = BACKENDS[0]
     else:
-        backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)
+        backend = prompt_choices(BACKENDS, backend, "backend", UNAVAILABLE_BACKENDS)
     print(f"Selected backend: {backend}")
-    if o.model:
-        model = pick_specialist_model(o.model, o.language, backend)
+    if model:
+        model = pick_specialist_model(model, language, backend)
     else:
         if backend == "vosk":
             available_languages = list(language_config[backend])
-            if o.language:
-                if o.language not in available_languages:
-                    print(f"Language '{o.language}' is not pre-defined (yet) for backend '{backend}'.")
+            if language:
+                if language not in available_languages:
+                    print(f"Language '{language}' is not pre-defined (yet) for backend '{backend}'.")
                     print(f"Yet it may actually exist.")
                     print(f"Please choose the model explictly from {ansi_link('https://alphacephei.com/vosk/models')}.")
                     print(f"Or pick one of the pre-defined languages: ", " ".join(available_languages))
                     exit(1)
-                choices = [language_config[backend][o.language]["model"]]
+                choices = [language_config[backend][language]["model"]]
                 default_model = choices[0] # this is a string
             else:
@@ -121,10 +126,10 @@ def get_transcriber(o, prompt=True):
             else:
                 model = default_model
-            model = pick_specialist_model(model, o.language, backend)
+            model = pick_specialist_model(model, language, backend)
         elif backend == "openaiapi":
-            model = o.model or "whisper-1"
+            model = model or "whisper-1"
         else:
             raise ValueError(f"Unknown backend: {backend}")
@@ -135,26 +140,26 @@ def get_transcriber(o, prompt=True):
     if backend == "vosk":
         try:
             transcriber = VoskTranscriber(model_name=model,
-                                        language=o.language,
-                                        samplerate=o.samplerate,
+                                        language=language,
+                                        samplerate=samplerate,
                                         timeout=None, # vosk keeps going (no timeout)
                                         silence_duration=None, # vosk handles silences internally
-                                        model_kwargs={"download_root": o.download_folder_vosk})
+                                        model_kwargs={"download_root": download_folder_vosk})
         except Exception as error:
             print(error)
             print(f"Failed to (down)load model {model}.")
             exit(1)
     elif backend == "whisper":
-        transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
-                                         timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
-                                         restart_after_silence=o.restart_after_silence,
-                                         model_kwargs={"download_root": o.download_folder_whisper})
+        transcriber = WhisperTranscriber(model_name=model, language=language, samplerate=samplerate,
+                                         timeout=duration, silence_duration=silence, silence_thresh=silence_db,
+                                         restart_after_silence=restart_after_silence,
+                                         model_kwargs={"download_root": download_folder_whisper})
     elif backend == "openaiapi":
-        transcriber = OpenaiAPITranscriber(model_name=model, samplerate=o.samplerate,
-                                         timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
-                                         restart_after_silence=o.restart_after_silence, api_key=o.api_key)
+        transcriber = OpenaiAPITranscriber(model_name=model, samplerate=samplerate,
+                                         timeout=duration, silence_duration=silence, silence_thresh=silence_db,
+                                         restart_after_silence=restart_after_silence, api_key=api_key)
     else:
@@ -246,7 +251,7 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
         callback()
-def create_app(micro, transcriber, **kwargs):
+def create_app(micro, transcriber, other_transcribers=None, **kwargs):
     import pystray
     from pystray import Menu as pystrayMenu, MenuItem as Item
     from PIL import Image
@@ -266,15 +271,8 @@ def create_app(micro, transcriber, **kwargs):
         image_recording = Image.alpha_composite(image_recording.convert("RGBA"), image_writing.convert("RGBA"))
     def update_icon(icon, force=False):
-        if transcriber.recording and transcriber.waiting:
-            # this is the situation with the whisper backend when the microphone is recording
-            # but we wait for the speaker to speak (silence)
-            if force or getattr(icon, "_icon_label", None) != None:
-                icon.icon = image
-                icon._icon_label = None
-                icon.update_menu()
-        elif transcriber.recording:
+        transcriber = icon._transcriber
+        if transcriber.recording:
             if force or getattr(icon, "_icon_label", None) != "recording":
                 icon.icon = image_recording
                 icon._icon_label = "recording"
@@ -293,6 +291,7 @@ def create_app(micro, transcriber, **kwargs):
                 icon.update_menu()
     def start_monitoring(icon):
+        transcriber = icon._transcriber
         try:
             while transcriber.busy:
                 update_icon(icon)
@@ -308,8 +307,8 @@ def create_app(micro, transcriber, **kwargs):
         icon.stop()
     def callback_stop_recording(icon, item):
+        transcriber = icon._transcriber
         # Here we need to stop the recording thread
         transcriber.interrupt = True
         if hasattr(icon, "_recording_thread"):
             icon._recording_thread.join()
@@ -317,10 +316,10 @@ def create_app(micro, transcriber, **kwargs):
             icon._monitoring_thread.join()
     def callback_record(icon, item):
-        # kwargs["callback"] = icon.update_menu   # NOTE: the thread will finish AFTER the callback is complete
+        transcriber = icon._transcriber
         if transcriber.busy:
-            transcriber.log("Still busy recording or transcribing.")
-            return
+            # transcriber.log("Still busy recording or transcribing.")
+            return callback_stop_recording(icon, item)  # play / stop behavior
         if hasattr(icon, "_recording_thread") and icon._recording_thread.is_alive():
             icon._recording_thread.join()
@@ -334,22 +333,63 @@ def create_app(micro, transcriber, **kwargs):
         icon._monitoring_thread = threading.Thread(target=start_monitoring, args=(icon,))
         icon._monitoring_thread.start()
+    if other_transcribers:
+        other_transcribers_dict = {meta["model"]: meta for meta in other_transcribers}
+    else:
+        other_transcribers_dict = {}
+    def callback_set_model(icon, item):
+        transcriber = icon._transcriber
+        callback_stop_recording(icon, item)
+        model_name = str(item)
+        meta = other_transcribers_dict[model_name]
+        icon._transcriber = transcriber = get_transcriber(**meta)
+        icon.title = f"scribe :: {transcriber.backend} :: {transcriber.model_name}"
+        print("Set", transcriber.backend, transcriber.model_name)
+        # icon.menu.items[0].__name__ = f"Record [{str(item)}]"
+        icon._model_selection = False
+        icon.update_menu()
+    def callback_toggle_option(icon, item):
+        kwargs[str(item)] = not kwargs[str(item)]
+    def is_model_selection(item):
+        return icon._model_selection
     def is_recording(item):
-        return transcriber.busy
+        return icon._transcriber.busy
     def is_not_recording(item):
-        return not is_recording(item)
+        return not is_recording(item) and not is_model_selection(item)
+    def is_checked(item):
+        return icon._transcriber.model_name == str(item)
-    # Create a menu
-    menu = pystrayMenu(
-        Item("Record", callback_record, visible=is_not_recording),
-        Item("Stop", callback_stop_recording, visible=is_recording),
-        Item('Quit', callback_quit),
+    def is_checked_option(item):
+        return kwargs[str(item)]
+    modeltitle = f"{transcriber.backend} :: {transcriber.model_name}"
+    title = f"scribe :: {modeltitle}"
+    menus = []
+    menus.append(Item(f"Record", callback_record, visible=is_not_recording, default=True))
+    menus.append(Item("Stop", callback_stop_recording, visible=is_recording))
+    menus.append(Item("Choose Model", pystrayMenu(
+        *(Item(f"{name}", callback_set_model, checked=is_checked) for name in other_transcribers_dict)))
+    )
+    menus.append(Item("Toggle Options", pystrayMenu(
+        *(Item(f"{name}", callback_toggle_option, checked=is_checked_option) for name in kwargs if isinstance(kwargs[name], bool))))
     )
+    menus.append(Item('Quit', callback_quit))
+    # Create a menu
+    menu = pystrayMenu(*menus)
     # Create the system tray icon
-    icon = pystray.Icon('scribe', image, "scribe", menu)
+    icon = pystray.Icon('scribe', image, title, menu)
+    icon._model_selection = False
+    icon._transcriber = transcriber
+    del transcriber
     return icon
@@ -368,7 +408,7 @@ def main(args=None):
     while True:
         if transcriber is None:
-            transcriber = get_transcriber(o, prompt=o.prompt)
+            transcriber = get_transcriber(**vars(o))
         print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
         show_output = ["clipboard", "keyboard", "output_file"]
         show_options = ["ascii", "restart_after_silence"]
@@ -482,7 +522,12 @@ def main(args=None):
             greetings = dict(
                 start_message = "Listening... Use the try icon menu to stop.",
             )
-            app = create_app(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
+            app = create_app(micro, transcriber, other_transcribers=[
+                {**vars(o), "backend": "openaiapi", "model": "whisper-1"},
+                *[{**vars(o), "backend": "whisper", "model": model} for model in whisper_models],
+                *[{**vars(o), "backend": "vosk", "model": model} for model in vosk_models]],
+                             clipboard=o.clipboard, output_file=o.output_file,
                              keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
             print("Starting app...")
             app.run()
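
Note on the `scribe/app.py` changes above: `get_transcriber` now takes keyword arguments instead of the parsed options object `o`, and callers pass `**vars(o)`, optionally overriding `backend` and `model` per menu entry. The following is a minimal sketch of that calling pattern, not the package's code. The function name `infer_backend`, the trimmed option set, and the example vosk model name are assumptions made for illustration; only the model lists and the inference rule are taken from the diff.

```python
import argparse

# Model lists as they appear at module level in the new version of the diff.
WHISPER_MODELS = ["tiny", "base", "small", "medium", "large", "turbo"]
WHISPER_ENGLISH_MODELS = ["tiny.en", "base.en", "small.en", "medium.en"]
WHISPERAPI_MODELS = ["whisper-1"]


def infer_backend(model=None, backend=None, **kwargs):
    """Hypothetical helper: guess the backend from the model name.

    **kwargs absorbs every other command-line option, which is what lets
    the caller pass the whole namespace with **vars(o).
    """
    if model and not backend:
        if model.startswith("vosk-"):
            backend = "vosk"
        elif model in WHISPER_MODELS + WHISPER_ENGLISH_MODELS:
            backend = "whisper"
        elif model in WHISPERAPI_MODELS:
            backend = "openaiapi"
    return backend


# A trimmed, illustrative option set (the real CLI has many more options).
parser = argparse.ArgumentParser()
parser.add_argument("--model")
parser.add_argument("--backend")
parser.add_argument("--clipboard", action="store_true")
o = parser.parse_args(["--model", "small", "--clipboard"])

# vars(o) turns the Namespace into a plain dict of option names and values.
print(infer_backend(**vars(o)))

# Per-entry override, in the style of the other_transcribers list:
# later keys in the dict literal replace the ones coming from vars(o).
meta = {**vars(o), "backend": "vosk", "model": "vosk-model-small-en-us-0.15"}
print(infer_backend(**meta))
```

Because unknown options fall into `**kwargs`, the same dictionary can be reused for every menu entry without filtering it first.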

{scribe_cli-0.10.0 → scribe_cli-0.11.1}/scribe/models.py RENAMED

@@ -242,6 +242,7 @@ class OpenaiAPITranscriber(WhisperTranscriber):
     def transcribe_audio(self, audio_bytes):
         self.log("\nTranscribing")
         import io
+        import openai
         import soundfile as sf
         audio_data = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
         # Write the audio data to an in-memory file in WAV format
@@ -249,8 +250,12 @@ class OpenaiAPITranscriber(WhisperTranscriber):
         sf.write(buffer, audio_data, self.samplerate, format='WAV')
         buffer.seek(0)
         buffer.name = "audio.wav"  # Set a filename with a valid extension
-        transcription = self.model.audio.transcriptions.create(
-            model=self.model_name,
-            file=buffer,
-        )
+        try:
+            transcription = self.model.audio.transcriptions.create(
+                model=self.model_name,
+                file=buffer,
+            )
+        except openai.BadRequestError as e:
+            self.log(f"Error: {e}")
+            return {"text": ""}
         return {"text": transcription.text}
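
Note on the `scribe/models.py` change above: the API call is now wrapped so that a rejected request is logged and yields an empty transcription instead of raising. The sketch below shows the same pattern in isolation, under stated assumptions: `BadRequestError` here is a local stand-in for `openai.BadRequestError` so the example runs without the `openai` package, and `transcribe`, `ok_create` and `bad_create` are hypothetical names that do not exist in scribe.

```python
from types import SimpleNamespace


class BadRequestError(Exception):
    """Local stand-in for openai.BadRequestError (assumption for this sketch)."""


def transcribe(create, **request):
    """Call `create` and fall back to empty text when the request is rejected."""
    try:
        transcription = create(**request)
    except BadRequestError as e:
        # Same behavior as the diff: report the error, return no text.
        print(f"Error: {e}")
        return {"text": ""}
    return {"text": transcription.text}


def ok_create(**request):
    # Mimics a successful response object exposing a `.text` attribute.
    return SimpleNamespace(text="hello")


def bad_create(**request):
    # Mimics the API rejecting the request.
    raise BadRequestError("request rejected")


print(transcribe(ok_create, model="whisper-1"))
print(transcribe(bad_create, model="whisper-1"))
```

Only the one exception type is caught, so other failures (for example network errors) would still propagate to the caller, as they do in the diff.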

{scribe_cli-0.10.0 → scribe_cli-0.11.1/scribe_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.10.0
+Version: 0.11.1
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -158,7 +158,7 @@ The content of the (full) transcription is then pasted to the clipboard, and it
 Alternatively an output file can be indicated:
 ```bash
- --keyboard -o transcription.txt
+scribe -o transcription.txt
 ```
 ### Virtual keyboard (experimental)
@@ -195,7 +195,8 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record, Stop or Quit options. The icon will change based on what the app is doing.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
+of predefined models, or to Quit and choose from the terminal before pressing Enter again.
 For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
@@ -204,23 +205,27 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
+<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
 ## Start as an application in GNOME
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
 `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
-e.g.
+In a relatively basic form
+```bash
+scribe-install --clipboard  --api YOUROPENAIAPIKEY
+```
+(`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
+And to make an app running outside the terminal:
 ```bash
-scribe-install
-scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
-scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
+scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY
 ```
-This will install three separate apps:
-- `Super + scribe` : will launch the default version with terminal prompt
-- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard.
-- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal (you need to press Record in the tray icon menu to start the recording).
+This will install two separate apps (names "Scribe" and "Scribe App")
 ## Fine tuning