PyPI - scribe-cli - version diff: 0.6.1 (tar.gz) → 0.7.0 (tar.gz)

Files changed (27)

{scribe_cli-0.6.1/scribe_cli.egg-info → scribe_cli-0.7.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.6.1
+Version: 0.7.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -44,6 +44,7 @@ Requires-Dist: sounddevice
 Requires-Dist: tqdm
 Requires-Dist: requests
 Requires-Dist: pyperclip
+Requires-Dist: pystray
 Provides-Extra: keyboard
 Requires-Dist: pynput; extra == "keyboard"
 Provides-Extra: whisper
@@ -109,6 +110,7 @@ there is a maximum duration after which it will stop by itself, which is setup t
 The `vosk` backend is good at
 doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 To skip the initial selection menu you can do:
@@ -117,9 +119,9 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
-### Advanced usage as keyboard replacement
+### Virtual keyboard (experimental)
-By default the content of the transcription is paster to the clipboard, but is not propagated further.
+By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
 ```bash
@@ -128,10 +130,32 @@ scribe --keyboard
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
-`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html) (I *think* got it to work with `xhost +SI:localuser:$(whoami)` as far as the display is concerned). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !).
-Workarounds include using the Xorg version of GNOME... Suggestions welcome.
+`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
+### System try icon (experimental)
+To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
+To activate start with:
+```bash
+scribe --app
+```
+or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
+```bash
+sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
+pip install PyGObject
+```
 ### Start as an application in Ubuntu
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+e.g.
+```bash
+scribe-install --backend whisper --model small
+```
+After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.

{scribe_cli-0.6.1 → scribe_cli-0.7.0}/README.md RENAMED

@@ -52,6 +52,7 @@ there is a maximum duration after which it will stop by itself, which is setup t
 The `vosk` backend is good at
 doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 To skip the initial selection menu you can do:
@@ -60,9 +61,9 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
-### Advanced usage as keyboard replacement
+### Virtual keyboard (experimental)
-By default the content of the transcription is paster to the clipboard, but is not propagated further.
+By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
 ```bash
@@ -71,10 +72,32 @@ scribe --keyboard
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
-`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html) (I *think* got it to work with `xhost +SI:localuser:$(whoami)` as far as the display is concerned). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !).
-Workarounds include using the Xorg version of GNOME... Suggestions welcome.
+`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
+### System try icon (experimental)
+To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
+To activate start with:
+```bash
+scribe --app
+```
+or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
+```bash
+sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
+pip install PyGObject
+```
 ### Start as an application in Ubuntu
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
-to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+e.g.
+```bash
+scribe-install --backend whisper --model small
+```
+After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.

{scribe_cli-0.6.1 → scribe_cli-0.7.0}/pyproject.toml RENAMED

@@ -18,6 +18,7 @@ dependencies = [
     "tqdm",
     "requests",
     "pyperclip",
+    "pystray",
 ]
 optional-dependencies = { keyboard = ["pynput"], whisper = ["openai-whisper"], vosk = ["vosk"], all = ["pynput", "openai-whisper", "vosk"] }
@@ -49,5 +50,5 @@ write_to = "scribe/_version.py"
 Homepage = "https://github.com/perrette/scribe"
 [project.scripts]
-scribe = "scribe.streamer:main"
+scribe = "scribe.app:main"
 scribe-install = "scribe.install_desktop:main"
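
Note on the `[project.scripts]` hunk above: the `scribe` command now points at `main` in `scribe.app` instead of `scribe.streamer`. A minimal sketch of how such an entry-point string splits into a module and an attribute, using only the standard library (Python 3.9+). The `EntryPoint` objects are built by hand for illustration; nothing here inspects an installed copy of the package.

```python
from importlib.metadata import EntryPoint

# Entry-point strings copied from the two versions of pyproject.toml.
old = EntryPoint(name="scribe", value="scribe.streamer:main", group="console_scripts")
new = EntryPoint(name="scribe", value="scribe.app:main", group="console_scripts")

for label, ep in (("0.6.1", old), ("0.7.0", new)):
    # .module is the part before the colon, .attr the part after it.
    print(f"{label}: `{ep.name}` calls {ep.attr}() from module {ep.module}")
```

Code that imported `scribe.streamer` directly would presumably need to switch to `scribe.app`, since the file list below shows `streamer.py` removed.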

{scribe_cli-0.6.1 → scribe_cli-0.7.0}/scribe/_version.py RENAMED

@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
-__version__ = version = '0.6.1'
-__version_tuple__ = version_tuple = (0, 6, 1)
+__version__ = version = '0.7.0'
+__version_tuple__ = version_tuple = (0, 7, 0)

scribe_cli-0.6.1/scribe/streamer.py → scribe_cli-0.7.0/scribe/app.py RENAMED

@@ -47,7 +47,7 @@ def get_transcriber(o, prompt=True):
         backend = o.backend
     elif not prompt:
-        backend = choices[0]
+        backend = BACKENDS[0]
     else:
         checked_backend = False
@@ -113,14 +113,16 @@ def get_transcriber(o, prompt=True):
                                         samplerate=o.samplerate,
                                         timeout=None, # vosk keeps going (no timeout)
                                         silence_duration=None, # vosk handles silences internally
-                                        model_kwargs={"data_folder": o.data_folder})
+                                        model_kwargs={"download_root": o.download_folder_vosk})
         except Exception as error:
             print(error)
             print(f"Failed to (down)load model {model}.")
             exit(1)
     elif backend == "whisper":
-        transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate, timeout=o.duration, silence_duration=o.silence)
+        transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
+                                         timeout=o.duration, silence_duration=o.silence, restart_after_silence=o.restart_after_silence,
+                                         model_kwargs={"download_root": o.download_folder_whisper})
     else:
         raise ValueError(f"Unknown backend: {backend}")
@@ -142,6 +144,7 @@ def get_parser():
                         help="An alias for preselected models when using the vosk backend, or 'en' for the English version of whisper models.")
     parser.add_argument("--no-prompt", action="store_false", dest="prompt", help="Disable prompts for backend and model selection and jump to recording")
+    parser.add_argument("--app", action="store_true", help="Start in app mode (relies on pystray)")
     parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
     parser.add_argument("--keyboard", action="store_true")
@@ -153,7 +156,8 @@ def get_parser():
     group.add_argument("--silence", default=2, type=float, help="silence duration that prompt transcription (whisper) (default %(default)ss)")
     group.add_argument("--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
-    parser.add_argument("--data-folder", help="Folder to store Vosk models.")
+    parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
+    parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
     return parser
@@ -208,6 +212,41 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
         print("Copied to clipboard.")
+def create_app(micro, transcriber, **kwargs):
+    import pystray
+    from pystray import Menu as pystrayMenu, MenuItem as Item
+    from PIL import Image
+    import PIL.ImageOps
+    import scribe_data
+    # Load an image from a file
+    image = Image.open(Path(scribe_data.__file__).parent / "share" / "icon.jpg")
+    def callback_quit(icon, item):
+        icon.visible = False
+        icon.stop()
+    def callback_record(icon, item):
+        print(f"Clicked {item}")
+        # icon.icon = PIL.ImageOps.invert(icon.icon)
+        # icon.icon = PIL.ImageOps.invert(image)
+        # icon.update_menu()
+        start_recording(micro, transcriber, **kwargs)
+        # icon.icon = image
+        # icon.update_menu()
+    # Create a menu
+    menu = pystrayMenu(
+        Item('Record', callback_record),
+        Item('Quit', callback_quit),
+    )
+    # Create the system tray icon
+    icon = pystray.Icon('name', image, "My App", menu)
+    return icon
 def main(args=None):
@@ -230,12 +269,13 @@ def main(args=None):
             print(f"Choose any of the following actions:")
             print(f"[q] quit")
             print(f"[e] change model")
+            print(f"[x] toggle app [{toggle[o.app]}] -> [{toggle[not o.app]}]")
             print(f"[k] toggle keyboard [{toggle[o.keyboard]}] -> [{toggle[not o.keyboard]}]")
             print(f"[c] toggle clipboard [{toggle[o.clipboard]}] -> [{toggle[not o.clipboard]}]")
             if transcriber.backend == "whisper":
                 print(f"[t] change duration (currently {transcriber.timeout}s)")
                 print(f"[b] change silence duration (currently {transcriber.silence_duration}s)")
-                print(f"[a] toggle auto-restart after silence [{toggle[o.restart_after_silence]}] -> [{toggle[not o.restart_after_silence]}]")
+                print(f"[a] toggle auto-restart after silence [{toggle[transcriber.restart_after_silence]}] -> [{toggle[not transcriber.restart_after_silence]}]")
             print(colored(f"Press [Enter] or any other key to start recording.", "BOLD"))
             key = input()
@@ -250,6 +290,12 @@ def main(args=None):
             if key == "c":
                 o.clipboard = not o.clipboard
                 continue
+            if key == "x":
+                o.app = not o.app
+                continue
+            if key == "a":
+                transcriber.restart_after_silence = not transcriber.restart_after_silence
+                continue
             if key == "t":
                 ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
                 try:
@@ -265,7 +311,11 @@ def main(args=None):
                     print("Invalid duration. Must be an integer.")
                 continue
-        start_recording(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency)
+        if o.app:
+            app = create_app(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency)
+            app.run()
+        else:
+            start_recording(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency)
         # if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
         # So we leave the wider range of options to change the model.
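
Note on the `get_parser()` hunks above: `--data-folder` is replaced by two backend-specific options and `--app` is added. The sketch below shows how argparse exposes them as attributes (dashes become underscores). `build_parser` is a hypothetical stand-in that copies only the three changed `add_argument` calls from the diff, and the folder path is a made-up example.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for get_parser(), limited to the options changed in this diff."""
    parser = argparse.ArgumentParser(prog="scribe")
    parser.add_argument("--app", action="store_true", help="Start in app mode (relies on pystray)")
    parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
    parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
    return parser


# Example invocation; "/tmp/whisper-models" is an illustrative path only.
o = build_parser().parse_args(["--app", "--download-folder-whisper", "/tmp/whisper-models"])
print(o.app, o.download_folder_vosk, o.download_folder_whisper)
```

An option that is not given stays `None`, which is what `get_transcriber` then passes on as `download_root`.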

{scribe_cli-0.6.1 → scribe_cli-0.7.0}/scribe/keyboard.py RENAMED

@@ -6,7 +6,6 @@ try:
 except ImportError:
     print("Please install pynput to use the keyboard feature.")
-    print("Alternatively specify [keyboard] optional dependency to voskrealtime, e.g. `pip install -e .[keyboard]`")
     raise
 # Create a keyboard controller

{scribe_cli-0.6.1 → scribe_cli-0.7.0}/scribe/models.py RENAMED

@@ -95,16 +95,16 @@ class AbstractTranscriber:
             print(stop_message)
-def get_vosk_model(model, data_folder=None, url=None):
+def get_vosk_model(model, download_root=None, url=None):
     """Load the Vosk recognizer"""
     import vosk
-    if data_folder is None:
-        data_folder = VOSK_MODELS_FOLDER
-    model_path = os.path.join(data_folder, model)
+    if download_root is None:
+        download_root = VOSK_MODELS_FOLDER
+    model_path = os.path.join(download_root, model)
     if not os.path.exists(model_path):
         if url is None:
             url = f"https://alphacephei.com/vosk/models/{model}.zip"
-        download_model(url, data_folder)
+        download_model(url, download_root)
         assert os.path.exists(model_path)
     return vosk.Model(model_path)
@@ -162,11 +162,11 @@ class WhisperTranscriber(AbstractTranscriber):
     def __init__(self, model_name, language=None, model=None, model_kwargs={}, **kwargs):
         import whisper
         if model is None:
-            model = whisper.load_model(model_name)
+            model = whisper.load_model(model_name, **model_kwargs)
         super().__init__(model, model_name, language, model_kwargs=model_kwargs, **kwargs)
     def transcribe_audio(self, audio_bytes):
-        print("Transcribing...")
+        print("\nTranscribing...")
         audio_array = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
         return self.model.transcribe(audio_array, fp16=False, language=self.language)
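
Note on the `WhisperTranscriber.__init__` hunk above: the loader call now receives `**model_kwargs`, so a `download_root` chosen on the command line reaches the model loader. A minimal sketch of that forwarding pattern follows. `load_model` here is a stand-in that returns a path rather than a model, and `DEFAULT_ROOT` and `make_transcriber` are hypothetical names used only for illustration.

```python
import os

# Hypothetical default location, used only by this sketch.
DEFAULT_ROOT = os.path.join(os.path.expanduser("~"), ".cache", "demo-models")


def load_model(name, download_root=None):
    """Stand-in for a model loader; returns the path it would load from."""
    if download_root is None:
        download_root = DEFAULT_ROOT
    return os.path.join(download_root, name)


def make_transcriber(model_name, model_kwargs=None):
    """Forward optional loader keywords, as the changed line does with **model_kwargs."""
    model_kwargs = dict(model_kwargs or {})
    return load_model(model_name, **model_kwargs)


print(make_transcriber("small"))
print(make_transcriber("small", {"download_root": None}))
print(make_transcriber("small", {"download_root": "/data/models"}))
```

Passing `download_root=None` behaves like omitting it, which matters because the new options default to `None` when not given. The sketch uses `None` as the default for `model_kwargs` rather than a shared `{}`, a common Python precaution; the diff itself keeps `model_kwargs={}`.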

{scribe_cli-0.6.1 → scribe_cli-0.7.0/scribe_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.6.1
+Version: 0.7.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -44,6 +44,7 @@ Requires-Dist: sounddevice
 Requires-Dist: tqdm
 Requires-Dist: requests
 Requires-Dist: pyperclip
+Requires-Dist: pystray
 Provides-Extra: keyboard
 Requires-Dist: pynput; extra == "keyboard"
 Provides-Extra: whisper
@@ -109,6 +110,7 @@ there is a maximum duration after which it will stop by itself, which is setup t
 The `vosk` backend is good at
 doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 To skip the initial selection menu you can do:
@@ -117,9 +119,9 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
-### Advanced usage as keyboard replacement
+### Virtual keyboard (experimental)
-By default the content of the transcription is paster to the clipboard, but is not propagated further.
+By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
 ```bash
@@ -128,10 +130,32 @@ scribe --keyboard
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
-`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html) (I *think* got it to work with `xhost +SI:localuser:$(whoami)` as far as the display is concerned). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !).
-Workarounds include using the Xorg version of GNOME... Suggestions welcome.
+`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
+### System try icon (experimental)
+To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
+To activate start with:
+```bash
+scribe --app
+```
+or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
+```bash
+sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
+pip install PyGObject
+```
 ### Start as an application in Ubuntu
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+e.g.
+```bash
+scribe-install --backend whisper --model small
+```
+After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.

{scribe_cli-0.6.1 → scribe_cli-0.7.0}/scribe_cli.egg-info/SOURCES.txt RENAMED

@@ -5,13 +5,13 @@ pyproject.toml
 .github/workflows/pypi.yml
 scribe/__init__.py
 scribe/_version.py
+scribe/app.py
 scribe/audio.py
 scribe/install_desktop.py
 scribe/keyboard.py
 scribe/models.py
 scribe/models.toml
 scribe/saverecording.py
-scribe/streamer.py
 scribe/testpynput.py
 scribe/util.py
 scribe_cli.egg-info/PKG-INFO