scribe-cli 0.12.0.tar.gz → 0.12.2.tar.gz

Files changed (31)

{scribe_cli-0.12.0/scribe_cli.egg-info → scribe_cli-0.12.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.12.0
+Version: 0.12.2
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -34,7 +34,11 @@ License: MIT License
         ensure compliance with their respective terms.
 Project-URL: Homepage, https://github.com/perrette/scribe
 Keywords: speech recognition,transcription,AI,language,vosk,whisper,openai,keyboard,clipboard
-Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
 Classifier: Operating System :: OS Independent
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
@@ -66,8 +70,8 @@ Requires-Dist: soundfile; extra == "all"
 Requires-Dist: vosk; extra == "all"
 Requires-Dist: pystray; extra == "all"
-[![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
 [![pypi](https://img.shields.io/pypi/v/scribe-cli)](https://pypi.org/project/scribe-cli)
+![](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fperrette%2Fscribe%2Frefs%2Fheads%2Fmain%2Fpyproject.toml)
 # Scribe  <img src="scribe_data/share/icon.png" width=48px>
@@ -77,22 +81,29 @@ It features local, downloadable models with the `vosk` and `whisper` backends, a
 ## Compatibility
-In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) for my own purposes so glitches are likely on other configurations.
-Moreover there are quite a bit of dependencies that rely on very OS-specific protocols under the hood, like access to the microphone, keyboard and clipboard,
-and even though the python dependencies `scribe` relies on are not restricted to a single platform, there may be limitation and additional binaries to install.
-This guide is based on python3.12 running on Ubuntu 24.04 with Gnome + Wayland, which is a relatively standard setting at the time of writing.
-Note as of February 19, 2025 python 13 does not seem to produce any transcription (I am not sure which dependency is to blame).
-A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
+The package is initially developped for python 3.12 with Ubuntu 24.04 with Gnome + Wayland, but it should work on other platforms as well (feedback welcome).
+Basically check the pages of the dependencies for more info (i.e. pynput for the keyboard, pystray for the app).
+- python 3.13:
+    - at the time of writing, `openai-whisper` does not install.
+- Ubuntu:
+    - see caveats in the use of the keyboard under Wayland [keyboard section](#use-the-keyboard-with-wayland).
+- MacOS:
+    - tested on a Macbook Air M1 8Gb RAM, with python 3.12. It runs, but poorly, presumably because of the low memory: prefer the `openaiapi` backend for such machines
+    - I expect better memory specs will have the local models run fine
+- Windows:
+    - not tested yet
 ## Installation
-Install PortAudio library and xclip library. E.g. on Ubuntu:
+Install PortAudio library (required by `sounddevice`) and xclip library (required by `pyperclip`). E.g. on Ubuntu:
 ```bash
 sudo apt-get install portaudio19-dev xclip
 ```
-See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
+See additional requirements for the [icon tray](#system-tray-icon-experimental-) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
 ```bash
 pip install scribe-cli[all]
@@ -110,6 +121,37 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
+At the time of writing `openai-whisper` does not install on `python 3.13`. You can install the packages manually and skip that package. This makes the `whisper` API unavailable.
+### Manual selection of the dependencies
+```bash
+# language models (at least one must be installed !)
+pip install vosk
+pip install openai soundfile  # openaiapi
+pip install openai-whisper   # FAILS IN PYTHON 3.13 on Ubuntu
+# PortAUDIO (sounddevice)
+pip install sounddevice # automatically installed as required dependency
+sudo apt-get install portaudio19-dev
+# clipboard
+pip install pyperclip  # automatically installed as required dependency
+sudo apt-get install xclip
+# keyboard
+pip install pynput
+# app mode
+# Uncommand the line below for Ubuntu !
+sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1  # Ubuntu ONLY (not needed on MacOS)
+pip install PyGObject # Ubuntu ONLY (not needed on MacOS)
+pip install pystray
+# And finally
+pip install scribe-cli
+```
 The language models for local backends `vosk` and `whisper` will download on-the-fly.
 The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
@@ -134,11 +176,12 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
-The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key best passed as an environment variable, e.g. in bash:
 ```bash
-scribe --backend openaiapi --api YOURAPIKEY
+export OPENAI_API_KEY=YOURAPIKEY
+scribe --backend openaiapi
 ```
-where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
+The `openaiapi` backend is lightweight and handy if you have an API (you can create one for free for testing) and a low-spec computer (and don't care too much about privacy, obviously).
 ## Output media
@@ -174,9 +217,9 @@ This can be extremely useful with the `vosk` backend and its realtime transcript
 The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
-#### Use the keyboard with Wayland (default for Ubuntu 24.04)
+#### Use the keyboard with Wayland
-In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
+In my Ubuntu 24.04 + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
 One workaround is to use the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
@@ -190,40 +233,46 @@ You're on the right path :)
 ## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
+<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
 ```bash
-scribe --app
+scribe --app ...
 ```
 or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
 of predefined models (controlled by `--vosk-models` and `whisper-models`) and options, or to Quit and choose from the terminal before pressing Enter again.
 For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording/waiting, transcribing and idle.
-That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
+That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option.
+The `--vosk-models` and `--whisper-models` allow to predefined the set of available models to choose from in the app manu. E.g.
+```bash
+scribe --app --vosk-models vosk-model-fr-0.22 --whisper-models small turbo ...
+```
+### Ubuntu
+In Ubuntu the following dependencies were required to make the menus appear:
 ```bash
 sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
-<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
 ## Start as an application in GNOME
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
 `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
-In a relatively basic form
+Consider the following two flavors:
 ```bash
-scribe-install --clipboard  --api YOUROPENAIAPIKEY
+scribe-install --clipboard ...
+scribe-install --name "Scribe App" --no-terminal --clipboard ...
 ```
-(`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
+The first will create an app named Scribe (the default) that simply opens a terminal and execute the command `scribe --clipboard ...`.
+The second will create an app named Scribe App that executes in a hidden terminal: `scribe --no-prompt --app --clipboard ...`, thus leaving the tray icon as only mode of interaction.
-It is also possible to run an app fully outside the terminal:
-```bash
-scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY  --vosk-models vosk-model-fr-0.22 --whisper-models small turbo
-```
 ## Fine tuning

{scribe_cli-0.12.0 → scribe_cli-0.12.2}/README.md RENAMED

@@ -1,5 +1,5 @@
-[![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
 [![pypi](https://img.shields.io/pypi/v/scribe-cli)](https://pypi.org/project/scribe-cli)
+![](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fperrette%2Fscribe%2Frefs%2Fheads%2Fmain%2Fpyproject.toml)
 # Scribe  <img src="scribe_data/share/icon.png" width=48px>
@@ -9,22 +9,29 @@ It features local, downloadable models with the `vosk` and `whisper` backends, a
 ## Compatibility
-In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) for my own purposes so glitches are likely on other configurations.
-Moreover there are quite a bit of dependencies that rely on very OS-specific protocols under the hood, like access to the microphone, keyboard and clipboard,
-and even though the python dependencies `scribe` relies on are not restricted to a single platform, there may be limitation and additional binaries to install.
-This guide is based on python3.12 running on Ubuntu 24.04 with Gnome + Wayland, which is a relatively standard setting at the time of writing.
-Note as of February 19, 2025 python 13 does not seem to produce any transcription (I am not sure which dependency is to blame).
-A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
+The package is initially developped for python 3.12 with Ubuntu 24.04 with Gnome + Wayland, but it should work on other platforms as well (feedback welcome).
+Basically check the pages of the dependencies for more info (i.e. pynput for the keyboard, pystray for the app).
+- python 3.13:
+    - at the time of writing, `openai-whisper` does not install.
+- Ubuntu:
+    - see caveats in the use of the keyboard under Wayland [keyboard section](#use-the-keyboard-with-wayland).
+- MacOS:
+    - tested on a Macbook Air M1 8Gb RAM, with python 3.12. It runs, but poorly, presumably because of the low memory: prefer the `openaiapi` backend for such machines
+    - I expect better memory specs will have the local models run fine
+- Windows:
+    - not tested yet
 ## Installation
-Install PortAudio library and xclip library. E.g. on Ubuntu:
+Install PortAudio library (required by `sounddevice`) and xclip library (required by `pyperclip`). E.g. on Ubuntu:
 ```bash
 sudo apt-get install portaudio19-dev xclip
 ```
-See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
+See additional requirements for the [icon tray](#system-tray-icon-experimental-) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
 ```bash
 pip install scribe-cli[all]
@@ -42,6 +49,37 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
+At the time of writing `openai-whisper` does not install on `python 3.13`. You can install the packages manually and skip that package. This makes the `whisper` API unavailable.
+### Manual selection of the dependencies
+```bash
+# language models (at least one must be installed !)
+pip install vosk
+pip install openai soundfile  # openaiapi
+pip install openai-whisper   # FAILS IN PYTHON 3.13 on Ubuntu
+# PortAUDIO (sounddevice)
+pip install sounddevice # automatically installed as required dependency
+sudo apt-get install portaudio19-dev
+# clipboard
+pip install pyperclip  # automatically installed as required dependency
+sudo apt-get install xclip
+# keyboard
+pip install pynput
+# app mode
+# Uncommand the line below for Ubuntu !
+sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1  # Ubuntu ONLY (not needed on MacOS)
+pip install PyGObject # Ubuntu ONLY (not needed on MacOS)
+pip install pystray
+# And finally
+pip install scribe-cli
+```
 The language models for local backends `vosk` and `whisper` will download on-the-fly.
 The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
@@ -66,11 +104,12 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
-The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key best passed as an environment variable, e.g. in bash:
 ```bash
-scribe --backend openaiapi --api YOURAPIKEY
+export OPENAI_API_KEY=YOURAPIKEY
+scribe --backend openaiapi
 ```
-where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
+The `openaiapi` backend is lightweight and handy if you have an API (you can create one for free for testing) and a low-spec computer (and don't care too much about privacy, obviously).
 ## Output media
@@ -106,9 +145,9 @@ This can be extremely useful with the `vosk` backend and its realtime transcript
 The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
-#### Use the keyboard with Wayland (default for Ubuntu 24.04)
+#### Use the keyboard with Wayland
-In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
+In my Ubuntu 24.04 + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
 One workaround is to use the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
@@ -122,40 +161,46 @@ You're on the right path :)
 ## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
+<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
 ```bash
-scribe --app
+scribe --app ...
 ```
 or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
 of predefined models (controlled by `--vosk-models` and `whisper-models`) and options, or to Quit and choose from the terminal before pressing Enter again.
 For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording/waiting, transcribing and idle.
-That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
+That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option.
+The `--vosk-models` and `--whisper-models` allow to predefined the set of available models to choose from in the app manu. E.g.
+```bash
+scribe --app --vosk-models vosk-model-fr-0.22 --whisper-models small turbo ...
+```
+### Ubuntu
+In Ubuntu the following dependencies were required to make the menus appear:
 ```bash
 sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
-<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
 ## Start as an application in GNOME
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
 `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
-In a relatively basic form
+Consider the following two flavors:
 ```bash
-scribe-install --clipboard  --api YOUROPENAIAPIKEY
+scribe-install --clipboard ...
+scribe-install --name "Scribe App" --no-terminal --clipboard ...
 ```
-(`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
+The first will create an app named Scribe (the default) that simply opens a terminal and execute the command `scribe --clipboard ...`.
+The second will create an app named Scribe App that executes in a hidden terminal: `scribe --no-prompt --app --clipboard ...`, thus leaving the tray icon as only mode of interaction.
-It is also possible to run an app fully outside the terminal:
-```bash
-scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY  --vosk-models vosk-model-fr-0.22 --whisper-models small turbo
-```
 ## Fine tuning

{scribe_cli-0.12.0 → scribe_cli-0.12.2}/pyproject.toml RENAMED

@@ -23,7 +23,11 @@ dependencies = [
 ]
 classifiers = [
-    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.9",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
     "Operating System :: OS Independent",
 ]

{scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/_version.py RENAMED

@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
-__version__ = version = '0.12.0'
-__version_tuple__ = version_tuple = (0, 12, 0)
+__version__ = version = '0.12.2'
+__version_tuple__ = version_tuple = (0, 12, 2)

{scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/app.py RENAMED

@@ -3,6 +3,7 @@ import tomllib
 import re
 import time
 import argparse
+from typing import Iterable
 from scribe.audio import Microphone
 from scribe.util import print_partial, clear_line, prompt_choices, ansi_link, colored
 from scribe.models import VoskTranscriber, WhisperTranscriber, OpenaiAPITranscriber
@@ -255,7 +256,7 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
         callback()
-def create_app(micro, transcriber, other_transcribers=None, **kwargs):
+def create_app(micro, transcriber, other_transcribers=None, transcriber_options=[], **kwargs):
     import pystray
     from pystray import Menu as pystrayMenu, MenuItem as Item
     from PIL import Image
@@ -344,6 +345,9 @@ def create_app(micro, transcriber, other_transcribers=None, **kwargs):
     def callback_set_model(icon, item):
         transcriber = icon._transcriber
+        if transcriber.model_name == str(item):
+            transcriber.log(f"Already using model {str(item)}")
+            return
         callback_stop_recording(icon, item)
         model_name = str(item)
         meta = other_transcribers_dict[model_name]
@@ -356,7 +360,23 @@ def create_app(micro, transcriber, other_transcribers=None, **kwargs):
     def callback_toggle_option(icon, item):
         callback_stop_recording(icon, item)
-        kwargs[str(item)] = not kwargs[str(item)]
+        if str(item) in transcriber_options:
+            # toggle the option on the current transcriber
+            if str(item) in icon._transcriber._frozen_options or type(getattr(icon._transcriber, str(item), None)) is not bool:
+                print("Skipped setting option", item)
+                return
+            newvalue = not getattr(icon._transcriber, str(item))
+            setattr(icon._transcriber, str(item), newvalue)
+            # set the option on the other transcribers as well
+            if other_transcribers:
+                for name in other_transcribers_dict:
+                    meta = other_transcribers_dict[name]
+                    if str(item) in meta:
+                        meta[str(item)] = newvalue
+        else:
+            kwargs[str(item)] = not kwargs[str(item)]
+            print("Option set [", item, "] to", kwargs[str(item)])
     def is_model_selection(item):
         return icon._model_selection
@@ -367,23 +387,34 @@ def create_app(micro, transcriber, other_transcribers=None, **kwargs):
     def is_not_recording(item):
         return not is_recording(item) and not is_model_selection(item)
-    def is_checked(item):
+    def is_checked_model(item):
         return icon._transcriber.model_name == str(item)
     def is_checked_option(item):
+        if not is_option_visible(item):
+            return False
+        if str(item) in transcriber_options:
+            return getattr(icon._transcriber, str(item))
         return kwargs[str(item)]
+    def is_option_visible(item):
+        if str(item) in transcriber_options:
+            return str(item) not in icon._transcriber._frozen_options
+        return True
     modeltitle = f"{transcriber.backend} :: {transcriber.model_name}"
     title = f"scribe :: {modeltitle}"
+    options = [name for name in kwargs if isinstance(kwargs[name], bool)] + [name for name in transcriber_options if isinstance(getattr(transcriber, name), bool)]
     menus = []
     menus.append(Item(f"Record", callback_record, visible=is_not_recording, default=True))
     menus.append(Item("Stop", callback_stop_recording, visible=is_recording))
     menus.append(Item("Choose Model", pystrayMenu(
-        *(Item(f"{name}", callback_set_model, checked=is_checked) for name in other_transcribers_dict)))
+        *(Item(f"{name}", callback_set_model, checked=is_checked_model) for name in other_transcribers_dict)))
     )
     menus.append(Item("Toggle Options", pystrayMenu(
-        *(Item(f"{name}", callback_toggle_option, checked=is_checked_option) for name in kwargs if isinstance(kwargs[name], bool))))
+        *(Item(f"{name}", callback_toggle_option, checked=is_checked_option, visible=is_option_visible) for name in options)))
     )
     menus.append(Item('Quit', callback_quit))
@@ -398,6 +429,8 @@ def create_app(micro, transcriber, other_transcribers=None, **kwargs):
     return icon
+def _filter_options(d: dict, exclude: Iterable) -> dict:
+    return {k: v for k, v in d.items() if k not in exclude}
 def main(args=None):
@@ -531,9 +564,10 @@ def main(args=None):
             app = create_app(micro, transcriber, other_transcribers=[
                 {**vars(o), "backend": "openaiapi", "model": "whisper-1"},
                 *[{**vars(o), "backend": "whisper", "model": model} for model in o.whisper_models],
-                *[{**vars(o), "backend": "vosk", "model": model} for model in o.vosk_models]],
+                *[{**_filter_options(vars(o), exclude=VoskTranscriber._frozen_options), "backend": "vosk", "model": model} for model in o.vosk_models]],
                              clipboard=o.clipboard, output_file=o.output_file,
-                             keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
+                             keyboard=o.keyboard, latency=o.latency, ascii=o.ascii,
+                             transcriber_options=["restart_after_silence"], **greetings)
             print("Starting app...")
             app.run()
         else:

{scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe/models.py RENAMED

@@ -16,11 +16,15 @@ HOME = os.environ.get('HOME', os.path.expanduser('~'))
 XDG_CACHE_HOME = os.environ.get('XDG_CACHE_HOME', os.path.join(HOME, '.cache'))
 VOSK_MODELS_FOLDER = os.path.join(XDG_CACHE_HOME, "vosk")
+class SilenceDetected(Exception):
+    pass
 class StopRecording(Exception):
     pass
 class AbstractTranscriber:
     backend = None
+    _frozen_options = frozenset()
     def __init__(self, model, model_name=None, language=None, samplerate=16000, timeout=None, model_kwargs={},
                  silence_thresh=-40, silence_duration=2, restart_after_silence=False, logger=None):
         self.model_name = model_name
@@ -50,7 +54,32 @@ class AbstractTranscriber:
         return self.timeout is not None and time.time() - self.start_time > self.timeout
     def transcribe_realtime_audio(self, audio_bytes=b""):
-        self.audio_buffer += audio_bytes
+        """This method is generic and assumes the underlying model does not handle real-time audio.
+        The Vosk model handles real-time audio, so this method is overridden in the VoskTranscriber class.
+        """
+        # Vérifier si le segment est un silence
+        if is_silent(audio_bytes, self.silence_thresh):
+            self.silence_buffer += audio_bytes
+            silence_duration = time.time() - self.last_sound_time
+            self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
+            if self.waiting and len(self.audio_buffer) > 0:
+                if self.restart_after_silence:
+                    raise SilenceDetected("Silence detected: {:.2f} seconds".format(silence_duration))
+                else:
+                    raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
+        else:
+            self.last_sound_time = time.time()
+            self.waiting = False
+            silence_buffer_data = np.frombuffer(self.silence_buffer, dtype=np.int16)
+            # add 0.5 seconds worth of silent data back to the audio buffer
+            half_a_second = 0.5
+            length_of_half_a_second = int(half_a_second * self.samplerate)
+            self.audio_buffer += silence_buffer_data[-length_of_half_a_second:].tobytes() + audio_bytes
+            self.silence_buffer = b''
         return {"partial": f"{len(self.audio_buffer)} bytes received (duration: {self.get_elapsed()} seconds)"}
     def transcribe_audio(self, audio_data):
@@ -59,6 +88,7 @@ class AbstractTranscriber:
     def reset(self):
         self.audio_buffer = b''
         self.start_time = time.time()
+        self.silence_buffer = b''
     def log(self, text):
         if text.startswith("\n"):
@@ -82,7 +112,7 @@ class AbstractTranscriber:
             self.last_sound_time = time.time() - self.silence_duration
         else:
             self.last_sound_time = time.time()
-        previous_waiting = self.waiting
+        # self.silence_buffer = b'' # already reset in self.reset()
         try:
@@ -93,35 +123,20 @@ class AbstractTranscriber:
                     while not microphone.q.empty():
                         data = microphone.q.get()
-                        # Vérifier si le segment est un silence
-                        if is_silent(data, self.silence_thresh):
-                            silence_duration = time.time() - self.last_sound_time
-                            previous_waiting = self.waiting
-                            self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
-                            if self.waiting and len(self.audio_buffer) > 0:
-                                if self.restart_after_silence:
-                                    self.recording = False # for the system tray icon
-                                    result = self.finalize()
-                                    microphone.q.queue.clear()
-                                    self.reset()
-                                    yield result
-                                    self.recording = True # for the system tray icon
-                                else:
-                                    raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
-                        else:
-                            self.last_sound_time = time.time()
-                            self.waiting = False
-                        # don't accumulate very long silences
-                        if not self.waiting:
+                        # leave it to each transcriber to handle the silence in audio data
+                        try:
                             yield self.transcribe_realtime_audio(data)
-                        else:
-                            if not previous_waiting:
-                                self.log("Silence detected...waiting for more audio")
+                        # This exception triggers a pause in recording to allow for a transcription of the audio buffer
+                        except SilenceDetected as e:
+                            self.log(str(e))
+                            self.recording = False # for the system tray icon
+                            result = self.finalize()
+                            microphone.q.queue.clear()
+                            self.reset()
+                            yield result
+                            self.recording = True # for the system tray icon
+                            self.start_time = time.time() # reset the start time to avoid timeout
                         if self.is_overtime():
                             raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
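
Note on the hunk above: silence handling moves out of the recording loop and into each transcriber, which signals a pause by raising `SilenceDetected`. Below is a minimal standalone sketch of that control flow, not the package's actual code. The helper `stream_results` and its callback parameters are hypothetical names introduced for illustration, and a `None` chunk stands in for detected silence.

```python
class SilenceDetected(Exception):
    """Raised by a per-chunk handler when enough silence has accumulated."""


def stream_results(chunks, handle_chunk, finalize, reset):
    """Yield per-chunk results; on SilenceDetected, yield the finalized result instead.

    All four arguments are hypothetical stand-ins for the microphone queue
    and the transcriber methods seen in the diff.
    """
    for chunk in chunks:
        try:
            yield handle_chunk(chunk)
        except SilenceDetected:
            # Flush what was buffered so far, then start over with a clean state.
            result = finalize()
            reset()
            yield result


# Tiny demonstration with an in-memory "transcriber" state.
buffer = []


def handle_chunk(chunk):
    if chunk is None:  # assumption: None marks a silent chunk
        raise SilenceDetected("silence")
    buffer.append(chunk)
    return {"partial": len(buffer)}


def finalize():
    return {"text": " ".join(buffer)}


def reset():
    buffer.clear()


results = list(stream_results(["a", "b", None, "c"], handle_chunk, finalize, reset))
```

The recording loop then needs no knowledge of thresholds or durations; each backend decides on its own when a pause is warranted.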
@@ -165,8 +180,10 @@ def get_vosk_recognizer(model, samplerate=16000):
 class VoskTranscriber(AbstractTranscriber):
     backend = "vosk"
+    _frozen_options = frozenset(["restart_after_silence", "silence_duration", "silence_thresh"])
     def __init__(self, model_name, model=None, model_kwargs={}, **kwargs):
+        kwargs["silence_thresh"] = -np.inf  # disable silence detection (this is handled by Vosk)
         if model is None:
             model = get_vosk_model(model_name, **model_kwargs)
         super().__init__(model, model_name, model_kwargs=model_kwargs, **kwargs)
@@ -222,7 +239,7 @@ class WhisperTranscriber(AbstractTranscriber):
         if len(self.audio_buffer) == 0:
             return {"text": ""}
         result = self.transcribe_audio(self.audio_buffer)
-        self.audio_buffer = b''
+        self.reset()
         return result
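
Note on the silence-buffer lines added earlier in this file: they keep only the last half second of buffered silence in front of the new audio. A minimal standalone sketch of that idea follows, assuming 16-bit mono PCM (2 bytes per sample), which the diff does not state. The function `prepend_trailing_silence` and its parameters are hypothetical; the package itself slices a NumPy array by sample count before converting to bytes.

```python
def prepend_trailing_silence(silence_buffer: bytes, audio_bytes: bytes,
                             samplerate: int = 16000, seconds: float = 0.5,
                             sample_width: int = 2) -> bytes:
    """Return the last `seconds` of silence followed by the new audio.

    sample_width=2 assumes 16-bit mono PCM (an assumption, not taken from the diff).
    """
    n_bytes = int(seconds * samplerate) * sample_width
    # Guard the zero case: silence_buffer[-0:] would return the whole buffer.
    tail = silence_buffer[-n_bytes:] if n_bytes > 0 else b""
    return tail + audio_bytes


two_seconds_of_silence = b"\x00" * (2 * 16000 * 2)
speech = b"\x01\x02" * 10
padded = prepend_trailing_silence(two_seconds_of_silence, speech)
short_padded = prepend_trailing_silence(b"\x00" * 100, speech)
```

When less than half a second of silence is buffered, the slice simply returns all of it, so no padding is invented.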

{scribe_cli-0.12.0 → scribe_cli-0.12.2/scribe_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.12.0
+Version: 0.12.2
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -34,7 +34,11 @@ License: MIT License
         ensure compliance with their respective terms.
 Project-URL: Homepage, https://github.com/perrette/scribe
 Keywords: speech recognition,transcription,AI,language,vosk,whisper,openai,keyboard,clipboard
-Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
 Classifier: Operating System :: OS Independent
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
@@ -66,8 +70,8 @@ Requires-Dist: soundfile; extra == "all"
 Requires-Dist: vosk; extra == "all"
 Requires-Dist: pystray; extra == "all"
-[![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
 [![pypi](https://img.shields.io/pypi/v/scribe-cli)](https://pypi.org/project/scribe-cli)
+![](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fperrette%2Fscribe%2Frefs%2Fheads%2Fmain%2Fpyproject.toml)
 # Scribe  <img src="scribe_data/share/icon.png" width=48px>
@@ -77,22 +81,29 @@ It features local, downloadable models with the `vosk` and `whisper` backends, a
 ## Compatibility
-In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) for my own purposes so glitches are likely on other configurations.
-Moreover there are quite a bit of dependencies that rely on very OS-specific protocols under the hood, like access to the microphone, keyboard and clipboard,
-and even though the python dependencies `scribe` relies on are not restricted to a single platform, there may be limitation and additional binaries to install.
-This guide is based on python3.12 running on Ubuntu 24.04 with Gnome + Wayland, which is a relatively standard setting at the time of writing.
-Note as of February 19, 2025 python 13 does not seem to produce any transcription (I am not sure which dependency is to blame).
-A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
+The package is initially developped for python 3.12 with Ubuntu 24.04 with Gnome + Wayland, but it should work on other platforms as well (feedback welcome).
+Basically check the pages of the dependencies for more info (i.e. pynput for the keyboard, pystray for the app).
+- python 3.13:
+    - at the time of writing, `openai-whisper` does not install.
+- Ubuntu:
+    - see caveats in the use of the keyboard under Wayland [keyboard section](#use-the-keyboard-with-wayland).
+- MacOS:
+    - tested on a Macbook Air M1 8Gb RAM, with python 3.12. It runs, but poorly, presumably because of the low memory: prefer the `openaiapi` backend for such machines
+    - I expect better memory specs will have the local models run fine
+- Windows:
+    - not tested yet
 ## Installation
-Install PortAudio library and xclip library. E.g. on Ubuntu:
+Install PortAudio library (required by `sounddevice`) and xclip library (required by `pyperclip`). E.g. on Ubuntu:
 ```bash
 sudo apt-get install portaudio19-dev xclip
 ```
-See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
+See additional requirements for the [icon tray](#system-tray-icon-experimental-) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
 ```bash
 pip install scribe-cli[all]
@@ -110,6 +121,37 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
+At the time of writing `openai-whisper` does not install on `python 3.13`. You can install the packages manually and skip that package. This makes the `whisper` API unavailable.
+### Manual selection of the dependencies
+```bash
+# language models (at least one must be installed !)
+pip install vosk
+pip install openai soundfile  # openaiapi
+pip install openai-whisper   # FAILS IN PYTHON 3.13 on Ubuntu
+# PortAUDIO (sounddevice)
+pip install sounddevice # automatically installed as required dependency
+sudo apt-get install portaudio19-dev
+# clipboard
+pip install pyperclip  # automatically installed as required dependency
+sudo apt-get install xclip
+# keyboard
+pip install pynput
+# app mode
+# Uncommand the line below for Ubuntu !
+sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1  # Ubuntu ONLY (not needed on MacOS)
+pip install PyGObject # Ubuntu ONLY (not needed on MacOS)
+pip install pystray
+# And finally
+pip install scribe-cli
+```
 The language models for local backends `vosk` and `whisper` will download on-the-fly.
 The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
@@ -134,11 +176,12 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
-The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key best passed as an environment variable, e.g. in bash:
 ```bash
-scribe --backend openaiapi --api YOURAPIKEY
+export OPENAI_API_KEY=YOURAPIKEY
+scribe --backend openaiapi
 ```
-where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
+The `openaiapi` backend is lightweight and handy if you have an API (you can create one for free for testing) and a low-spec computer (and don't care too much about privacy, obviously).
 ## Output media
@@ -174,9 +217,9 @@ This can be extremely useful with the `vosk` backend and its realtime transcript
 The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
-#### Use the keyboard with Wayland (default for Ubuntu 24.04)
+#### Use the keyboard with Wayland
-In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
+In my Ubuntu 24.04 + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
 One workaround is to use the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
@@ -190,40 +233,46 @@ You're on the right path :)
 ## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
+<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
 ```bash
-scribe --app
+scribe --app ...
 ```
 or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
 of predefined models (controlled by `--vosk-models` and `whisper-models`) and options, or to Quit and choose from the terminal before pressing Enter again.
 For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording/waiting, transcribing and idle.
-That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
+That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option.
+The `--vosk-models` and `--whisper-models` allow to predefined the set of available models to choose from in the app manu. E.g.
+```bash
+scribe --app --vosk-models vosk-model-fr-0.22 --whisper-models small turbo ...
+```
+### Ubuntu
+In Ubuntu the following dependencies were required to make the menus appear:
 ```bash
 sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
-<img src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px>
 ## Start as an application in GNOME
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
 `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
-In a relatively basic form
+Consider the following two flavors:
 ```bash
-scribe-install --clipboard  --api YOUROPENAIAPIKEY
+scribe-install --clipboard ...
+scribe-install --name "Scribe App" --no-terminal --clipboard ...
 ```
-(`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
+The first will create an app named Scribe (the default) that simply opens a terminal and execute the command `scribe --clipboard ...`.
+The second will create an app named Scribe App that executes in a hidden terminal: `scribe --no-prompt --app --clipboard ...`, thus leaving the tray icon as only mode of interaction.
-It is also possible to run an app fully outside the terminal:
-```bash
-scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --restart-after-silence --api YOUROPENAIAPIKEY  --vosk-models vosk-model-fr-0.22 --whisper-models small turbo
-```
 ## Fine tuning

{scribe_cli-0.12.0 → scribe_cli-0.12.2}/scribe_cli.egg-info/SOURCES.txt RENAMED

@@ -25,4 +25,5 @@ scribe_data/__init__.py
 scribe_data/share/icon.png
 scribe_data/share/icon_recording.png
 scribe_data/share/icon_writing.png
-scribe_data/templates/scribe.desktop
+scribe_data/templates/scribe.desktop
+scripts/test_python_versions_install.sh

scribe_cli-0.12.2/scripts/test_python_versions_install.sh ADDED

@@ -0,0 +1,20 @@
+subversion=$1
+version=3.$subversion
+name=py3$subversion
+MAMBAENV=~/.local/share/mamba/envs/$name
+VENVDIR=~/.virtualenvs/$name
+if [ ! -d $MAMBAENV ] ; then
+	micromamba create -n $name python=$version --prefix $MAMBAENV -y
+else
+	echo "Environment $name already exists at $MAMBAENV"
+fi
+if [ ! -d $VENVDIR ] ; then
+	$MAMBAENV/bin/python3 -m venv $VENVDIR
+else
+	echo "Virtualenv $name already exists at $VENVDIR"
+fi
+source ~/.virtualenvs/$name/bin/activate
+pip install -U pip
+pip install scribe-cli[all]