scribe-cli 0.7.3 (tar.gz) → 0.7.6 (tar.gz)

Files changed (29)

{scribe_cli-0.7.3/scribe_cli.egg-info → scribe_cli-0.7.6}/PKG-INFO RENAMED

@@ -1,7 +1,7 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.7.3
-Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+Version: 0.7.6
+Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -44,31 +44,43 @@ Requires-Dist: sounddevice
 Requires-Dist: tqdm
 Requires-Dist: requests
 Requires-Dist: pyperclip
-Requires-Dist: pystray
 Provides-Extra: keyboard
 Requires-Dist: pynput; extra == "keyboard"
 Provides-Extra: whisper
 Requires-Dist: openai-whisper; extra == "whisper"
 Provides-Extra: vosk
 Requires-Dist: vosk; extra == "vosk"
+Provides-Extra: app
+Requires-Dist: pystray; extra == "app"
+Requires-Dist: PyGObject; extra == "app"
 Provides-Extra: all
 Requires-Dist: pynput; extra == "all"
 Requires-Dist: openai-whisper; extra == "all"
 Requires-Dist: vosk; extra == "all"
+Requires-Dist: pystray; extra == "all"
+[![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
+[![pypi](https://github.com/perrette/scribe/actions/workflows/pypi.yml/badge.svg)](https://pypi.org/project/papers-cli)
 # Scribe
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
+`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+## Compatibility
+In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
+As of February 19, 2025, Python 3.13 is not supported (I can't recall now which dependency is to blame).
+A test on macOS (M1 Air with 8 GB RAM) worked with Python 3.12, though with much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U and 32 GB RAM).
 ## Installation
-Install PortAudio library. E.g. on Ubuntu:
+Install PortAudio library and xclip library. E.g. on Ubuntu:
 ```bash
-sudo apt-get install portaudio19-dev
+sudo apt-get install portaudio19-dev xclip
 ```
-The python dependencies should be dealt with automatically:
+See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
 ```bash
 pip install scribe-cli[all]
@@ -87,8 +99,8 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 The `vosk` language models will download on-the-fly.
-The default data folder is `$HOME/.local/share/vosk/language-models`.
-This can be modified.
+The default download folder is `$XDG_CACHE_HOME/{backend}`, where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note that for the `whisper` backend
+the default is left to the `openai-whisper` package and might change in the future).
 ## Usage
@@ -108,8 +120,7 @@ but it cannot do real-time, and depending on the model can have relatively long
 With the `whisper` model you need to stop the recording manually before the transcription occurs (Ctrl + C), though
 there is a maximum duration after which it will stop by itself, which is set to 60s by default (unless `--duration` is set to something else).
-The `vosk` backend is good at
-doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 Use it mainly for longer typing sessions with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
@@ -122,6 +133,9 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
 ### Virtual keyboard (experimental)
 By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
+That can be achieved with the `--keyboard` option.
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 ```bash
@@ -129,10 +143,23 @@ scribe --keyboard
 ```
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
+Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
+#### Use the keyboard in Ubuntu
-`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
+In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
+One workaround is to use the Xorg version of GNOME: in `/etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
+Another workaround while staying with Wayland is to use the low-level `uinput` backend of `pynput`, but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
+Moreover, the keyboard must be set to an appropriate layout: for example, to type the letter `é` you'd want a French or Italian layout, otherwise the English layout will drop it or replace it with something else. Another caveat I encountered is that special characters (`é`) were inserted at the wrong place. Adding a small delay with the additional parameter `--latency 0.01` was enough to fix that. Finally, if you run as sudo you may need to reset some environment variables so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum up, that gives something like:
+```bash
+sudo modprobe uinput
+sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe)  --latency 0.01
+```
+You're on the right path :)
-### System try icon (experimental)
+### System tray icon (experimental)
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:

{scribe_cli-0.7.3 → scribe_cli-0.7.6}/README.md RENAMED

@@ -1,16 +1,25 @@
+[![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
+[![pypi](https://github.com/perrette/scribe/actions/workflows/pypi.yml/badge.svg)](https://pypi.org/project/papers-cli)
 # Scribe
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
+`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+## Compatibility
+In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
+As of February 19, 2025, Python 3.13 is not supported (I can't recall now which dependency is to blame).
+A test on macOS (M1 Air with 8 GB RAM) worked with Python 3.12, though with much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U and 32 GB RAM).
 ## Installation
-Install PortAudio library. E.g. on Ubuntu:
+Install PortAudio library and xclip library. E.g. on Ubuntu:
 ```bash
-sudo apt-get install portaudio19-dev
+sudo apt-get install portaudio19-dev xclip
 ```
-The python dependencies should be dealt with automatically:
+See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
 ```bash
 pip install scribe-cli[all]
@@ -29,8 +38,8 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 The `vosk` language models will download on-the-fly.
-The default data folder is `$HOME/.local/share/vosk/language-models`.
-This can be modified.
+The default download folder is `$XDG_CACHE_HOME/{backend}`, where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note that for the `whisper` backend
+the default is left to the `openai-whisper` package and might change in the future).
 ## Usage
@@ -50,8 +59,7 @@ but it cannot do real-time, and depending on the model can have relatively long
 With the `whisper` model you need to stop the recording manually before the transcription occurs (Ctrl + C), though
 there is a maximum duration after which it will stop by itself, which is set to 60s by default (unless `--duration` is set to something else).
-The `vosk` backend is good at
-doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 Use it mainly for longer typing sessions with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
@@ -64,6 +72,9 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
 ### Virtual keyboard (experimental)
 By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
+That can be achieved with the `--keyboard` option.
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 ```bash
@@ -71,10 +82,23 @@ scribe --keyboard
 ```
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
+Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
+#### Use the keyboard in Ubuntu
-`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
+In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
+One workaround is to use the Xorg version of GNOME: in `/etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
+Another workaround while staying with Wayland is to use the low-level `uinput` backend of `pynput`, but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
+Moreover, the keyboard must be set to an appropriate layout: for example, to type the letter `é` you'd want a French or Italian layout, otherwise the English layout will drop it or replace it with something else. Another caveat I encountered is that special characters (`é`) were inserted at the wrong place. Adding a small delay with the additional parameter `--latency 0.01` was enough to fix that. Finally, if you run as sudo you may need to reset some environment variables so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum up, that gives something like:
+```bash
+sudo modprobe uinput
+sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe)  --latency 0.01
+```
+You're on the right path :)
-### System try icon (experimental)
+### System tray icon (experimental)
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:

{scribe_cli-0.7.3 → scribe_cli-0.7.6}/pyproject.toml RENAMED

@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "scribe-cli"
 dynamic = ["version"]
-description = "scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI."
+description = "scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer"
 authors = [
     { name="Mahé Perrette", email="mahe.perrette@gmail.com" }
 ]
@@ -18,9 +18,7 @@ dependencies = [
     "tqdm",
     "requests",
     "pyperclip",
-    "pystray",
 ]
-optional-dependencies = { keyboard = ["pynput"], whisper = ["openai-whisper"], vosk = ["vosk"], all = ["pynput", "openai-whisper", "vosk"] }
 classifiers = [
     "Programming Language :: Python :: 3",
@@ -39,6 +37,14 @@ keywords = [
     "clipboard",
 ]
+[project.optional-dependencies]
+keyboard = ["pynput"]
+whisper = ["openai-whisper"]
+vosk = ["vosk"]
+app = ["pystray", "PyGObject"]
+all = ["pynput", "openai-whisper", "vosk", "pystray"]
 [tool.setuptools]
 packages = [ "scribe", "scribe_data" ]

{scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe/_version.py RENAMED

@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
-__version__ = version = '0.7.3'
-__version_tuple__ = version_tuple = (0, 7, 3)
+__version__ = version = '0.7.6'
+__version_tuple__ = version_tuple = (0, 7, 6)

{scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe/app.py RENAMED

@@ -147,6 +147,7 @@ def get_parser():
     parser.add_argument("--app", action="store_true", help="Start in app mode (relies on pystray)")
     parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
+    parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
     parser.add_argument("--keyboard", action="store_true")
     parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
     parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
@@ -163,36 +164,20 @@ def get_parser():
 # Commencer l'enregistrement
-def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0):
+def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, **greetings):
     if keyboard:
-        try:
-            from scribe.keyboard import type_text
-        except ImportError:
-            keyboard = False
-            print("Keyboard simulation is not available.")
-            return
+        from scribe.keyboard import type_text
         print("\nChange focus to target app during transcription.")
     if clipboard:
-        try:
-            import pyperclip
-        except ImportError:
-            clipboard = False
-            print("Clipboard simulation is not available.")
-            return
+        import pyperclip
         print("\nThe full transcription will be copied to clipboard as it becomes available.")
     fulltext = ""
-    greetings = { k: v for k, v in language_config["_meta"].get(transcriber.language, {}).items()
-                if v is not None and k.startswith(("start", "stop"))
-    }
     for result in transcriber.start_recording(micro, **greetings):
         if result.get('text'):
@@ -284,7 +269,7 @@ def main(args=None):
     # Set up the microphone for recording
-    micro = Microphone(samplerate=o.samplerate)
+    micro = Microphone(samplerate=o.samplerate, device=o.microphone_device)
     transcriber = None
@@ -341,11 +326,17 @@ def main(args=None):
                 continue
         if o.app:
-            app = create_app(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency)
+            greetings = dict(
+                start_message = "Listening... Use the tray icon menu to stop.",
+            )
+            app = create_app(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency, **greetings)
             print("Starting app...")
             app.run()
         else:
-            start_recording(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency)
+            greetings = dict(
+                start_message = "Listening... Press Ctrl+C to stop.",
+            )
+            start_recording(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency, **greetings)
         # if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
         # So we leave the wider range of options to change the model.
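
The `app.py` hunks above add a `--microphone-device` option and move the start message out of `models.toml` into keyword arguments chosen by the caller. A minimal sketch of that pattern follows, assuming only the option names shown in the diff; `greetings_for` and the stubbed `start_recording` are hypothetical helpers for illustration, not the package's functions.

```python
import argparse


def get_parser():
    # Only the two options relevant to this sketch; the real parser has more.
    parser = argparse.ArgumentParser()
    parser.add_argument("--app", action="store_true")
    parser.add_argument("--microphone-device", type=int,
                        help="The device index of the microphone to use.")
    return parser


def greetings_for(app_mode):
    # Hypothetical helper: pick the start message depending on the front end.
    if app_mode:
        return dict(start_message="Listening... Use the tray icon menu to stop.")
    return dict(start_message="Listening... Press Ctrl+C to stop.")


def start_recording(clipboard=True, keyboard=False, latency=0, **greetings):
    # Stub: the real function records audio; here we only return the message
    # that would be forwarded to the transcriber.
    return greetings.get("start_message", "")


options = get_parser().parse_args(["--microphone-device", "2"])
message = start_recording(**greetings_for(options.app))
```

When `--microphone-device` is omitted, argparse leaves `microphone_device` as `None`, which presumably lets the audio library fall back to its default input device.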

scribe_cli-0.7.6/scribe/keyboard.py ADDED

@@ -0,0 +1,51 @@
+"""This module handles typing characters as if they were typed on a keyboard.
+"""
+import platform
+import time
+try:
+    # import pyautogui
+    from pynput.keyboard import Controller, Key
+except ImportError:
+    print("Please install pynput to use the keyboard feature.")
+    raise
+# Create a keyboard controller
+keyboard = Controller()
+def paste_text():
+    """This does not work with the uinput backend
+    """
+    os_name = platform.system()
+    if os_name == "Darwin":  # macOS
+        with keyboard.pressed(Key.cmd):
+            keyboard.press('v')
+            keyboard.release('v')
+    else:  # Windows and Linux
+        keyboard.press(Key.ctrl)
+        keyboard.press('v')
+        keyboard.release('v')
+        keyboard.release(Key.ctrl)
+def type_text(text, interval=0, paste=False):
+    # Simulate typing a string
+    # import subprocess
+    # subprocess.run(["ydotool", "type", text])
+    if paste:
+        import pyperclip
+        keep_state = pyperclip.paste()
+        pyperclip.copy(text)
+        paste_text()
+        pyperclip.copy(keep_state)
+        return
+    if interval > 0:
+        for c in text:
+            keyboard.type(c)
+            time.sleep(interval)
+    else:
+        keyboard.type(text)
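
The new `type_text` types one character at a time when `interval` is positive, which is what the `--latency` option relies on. The sketch below reproduces only that branch logic so it can be run without `pynput`; `RecordingController` is a hypothetical stand-in for `pynput.keyboard.Controller`, and passing the controller and `sleep` as arguments is an assumption made for testability, not the module's actual signature.

```python
import time


class RecordingController:
    """Hypothetical stand-in for pynput's Controller: records calls to type()."""

    def __init__(self):
        self.typed = []

    def type(self, text):
        self.typed.append(text)


def type_text(text, controller, interval=0, sleep=time.sleep):
    if interval > 0:
        # Send characters one by one, pausing between them, so that slow
        # backends do not reorder special characters such as "é".
        for char in text:
            controller.type(char)
            sleep(interval)
    else:
        # No delay requested: hand the whole string over in a single call.
        controller.type(text)


slow = RecordingController()
type_text("hé", slow, interval=0.001)

fast = RecordingController()
type_text("abc", fast)
```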

{scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe/models.py RENAMED

@@ -12,8 +12,9 @@ def is_silent(data, silence_thresh=-40):
     """
     return calculate_decibels(data) < silence_thresh
-VOSK_MODELS_FOLDER = os.path.join(os.environ.get("HOME"),
-                                      ".local/share/vosk/language-models")
+HOME = os.environ.get('HOME', os.path.expanduser('~'))
+XDG_CACHE_HOME = os.environ.get('XDG_CACHE_HOME', os.path.join(HOME, '.cache'))
+VOSK_MODELS_FOLDER = os.path.join(XDG_CACHE_HOME, "vosk")
 class StopRecording(Exception):
     pass
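
The `models.py` change moves the vosk download folder under `$XDG_CACHE_HOME`, falling back to `$HOME/.cache`. A small sketch of that lookup is given below, assuming a hypothetical `cache_dir` helper that takes the environment as a mapping so it can be exercised without touching the real one. Unlike the diff, this sketch also treats an empty variable as unset, which is an assumption on my part about the desired behaviour.

```python
import os


def cache_dir(backend, env=None):
    # Hypothetical helper, not part of scribe: resolve the per-backend folder.
    env = os.environ if env is None else env
    home = env.get("HOME") or os.path.expanduser("~")
    base = env.get("XDG_CACHE_HOME") or os.path.join(home, ".cache")
    return os.path.join(base, backend)


default_folder = cache_dir("vosk", {"HOME": "/home/user"})
custom_folder = cache_dir("vosk", {"HOME": "/home/user", "XDG_CACHE_HOME": "/tmp/cache"})
```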

scribe_cli-0.7.6/scribe/models.toml ADDED

@@ -0,0 +1,23 @@
+[vosk.en]
+model = "vosk-model-en-us-0.42-gigaspeech"
+[vosk.fr]
+model = "vosk-model-fr-0.22"
+[vosk.de]
+model = "vosk-model-de-tuda-0.6-900k"
+[vosk.it]
+model = "vosk-model-it-0.22"
+[_meta.en]
+language = "English (US)"
+[_meta.fr]
+language = "French"
+[_meta.de]
+language = "German"
+[_meta.it]
+language = "Italian"

{scribe_cli-0.7.3 → scribe_cli-0.7.6/scribe_cli.egg-info}/PKG-INFO RENAMED

@@ -1,7 +1,7 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.7.3
-Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+Version: 0.7.6
+Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -44,31 +44,43 @@ Requires-Dist: sounddevice
 Requires-Dist: tqdm
 Requires-Dist: requests
 Requires-Dist: pyperclip
-Requires-Dist: pystray
 Provides-Extra: keyboard
 Requires-Dist: pynput; extra == "keyboard"
 Provides-Extra: whisper
 Requires-Dist: openai-whisper; extra == "whisper"
 Provides-Extra: vosk
 Requires-Dist: vosk; extra == "vosk"
+Provides-Extra: app
+Requires-Dist: pystray; extra == "app"
+Requires-Dist: PyGObject; extra == "app"
 Provides-Extra: all
 Requires-Dist: pynput; extra == "all"
 Requires-Dist: openai-whisper; extra == "all"
 Requires-Dist: vosk; extra == "all"
+Requires-Dist: pystray; extra == "all"
+[![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
+[![pypi](https://github.com/perrette/scribe/actions/workflows/pypi.yml/badge.svg)](https://pypi.org/project/papers-cli)
 # Scribe
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
+`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+## Compatibility
+In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
+As of February 19, 2025, Python 3.13 is not supported (I can't recall now which dependency is to blame).
+A test on macOS (M1 Air with 8 GB RAM) worked with Python 3.12, though with much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U and 32 GB RAM).
 ## Installation
-Install PortAudio library. E.g. on Ubuntu:
+Install PortAudio library and xclip library. E.g. on Ubuntu:
 ```bash
-sudo apt-get install portaudio19-dev
+sudo apt-get install portaudio19-dev xclip
 ```
-The python dependencies should be dealt with automatically:
+See additional requirements for the [icon tray](#system-tray-icon-experimental) and [keyboard](#virtual-keyboard-experimental) options. The python dependencies should be dealt with automatically:
 ```bash
 pip install scribe-cli[all]
@@ -87,8 +99,8 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 The `vosk` language models will download on-the-fly.
-The default data folder is `$HOME/.local/share/vosk/language-models`.
-This can be modified.
+The default download folder is `$XDG_CACHE_HOME/{backend}`, where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note that for the `whisper` backend
+the default is left to the `openai-whisper` package and might change in the future).
 ## Usage
@@ -108,8 +120,7 @@ but it cannot do real-time, and depending on the model can have relatively long
 With the `whisper` model you need to stop the recording manually before the transcription occurs (Ctrl + C), though
 there is a maximum duration after which it will stop by itself, which is set to 60s by default (unless `--duration` is set to something else).
-The `vosk` backend is good at
-doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 Use it mainly for longer typing sessions with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
@@ -122,6 +133,9 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
 ### Virtual keyboard (experimental)
 By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
+That can be achieved with the `--keyboard` option.
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 ```bash
@@ -129,10 +143,23 @@ scribe --keyboard
 ```
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
+Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
+#### Use the keyboard in Ubuntu
-`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
+In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
+One workaround is to use the Xorg version of GNOME: in `/etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.
+Another workaround while staying with Wayland is to use the low-level `uinput` backend of `pynput`, but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).
+Moreover, the keyboard must be set to an appropriate layout: for example, to type the letter `é` you'd want a French or Italian layout, otherwise the English layout will drop it or replace it with something else. Another caveat I encountered is that special characters (`é`) were inserted at the wrong place. Adding a small delay with the additional parameter `--latency 0.01` was enough to fix that. Finally, if you run as sudo you may need to reset some environment variables so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum up, that gives something like:
+```bash
+sudo modprobe uinput
+sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe)  --latency 0.01
+```
+You're on the right path :)
-### System try icon (experimental)
+### System tray icon (experimental)
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:

{scribe_cli-0.7.3 → scribe_cli-0.7.6}/scribe_cli.egg-info/requires.txt RENAMED

@@ -3,12 +3,16 @@ sounddevice
 tqdm
 requests
 pyperclip
-pystray
 [all]
 pynput
 openai-whisper
 vosk
+pystray
+[app]
+pystray
+PyGObject
 [keyboard]
 pynput

scribe_cli-0.7.3/scribe/keyboard.py DELETED

@@ -1,18 +0,0 @@
-"""This module handles typing characters as if they were typed on a keyboard.
-"""
-try:
-    # import pyautogui
-    from pynput.keyboard import Controller
-except ImportError:
-    print("Please install pynput to use the keyboard feature.")
-    raise
-# Create a keyboard controller
-keyboard = Controller()
-def type_text(text, interval=0):
-    # Simulate typing a string
-    # import subprocess
-    # subprocess.run(["ydotool", "type", text])
-    keyboard.type(text)

scribe_cli-0.7.3/scribe/models.toml DELETED

@@ -1,31 +0,0 @@
-[vosk.en]
-model = "vosk-model-en-us-0.42-gigaspeech"
-[vosk.fr]
-model = "vosk-model-fr-0.22"
-[vosk.de]
-model = "vosk-model-de-tuda-0.6-900k"
-[vosk.it]
-model = "vosk-model-it-0.22"
-[_meta.en]
-language = "English (US)"
-start_message = "Listening... Press Ctrl+C to stop."
-stop_message = "Recording stopped."
-[_meta.fr]
-language = "French"
-start_message = "En écoute... Appuyez sur Ctrl+C pour arrêter."
-stop_message = "Écoute arrêtée."
-[_meta.de]
-language = "German"
-start_message = "Hören... Drücken Sie Strg+C, um zu stoppen."
-stop_message = "Aufnahme gestoppt."
-[_meta.it]
-language = "Italian"
-start_message = "In ascolto... Premere Ctrl+C per interrompere."
-stop_message = "Registrazione interrotta."