PyPI - scribe-cli - Versions diffs - 0.6.0__tar.gz → 0.6.2__tar.gz - Mend

scribe-cli 0.6.0 (tar.gz) → 0.6.2 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

{scribe_cli-0.6.0/scribe_cli.egg-info → scribe_cli-0.6.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.6.0
+Version: 0.6.2
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -109,6 +109,7 @@ there is a maximum duration after which it will stop by itself, which is setup t
 The `vosk` backend is good at
 doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 To skip the initial selection menu you can do:
@@ -117,9 +118,9 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
-### Advanced usage as keyboard replacement
+### Virtual keyboard (experimental)
-By default the content of the transcription is paster to the clipboard, but is not propagated further.
+By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
 ```bash
@@ -128,10 +129,18 @@ scribe --keyboard
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
-`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html) (I *think* got it to work with `xhost +SI:localuser:$(whoami)` as far as the display is concerned). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !).
-Workarounds include using the Xorg version of GNOME... Suggestions welcome.
+`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
 ### Start as an application in Ubuntu
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+e.g.
+```bash
+scribe-install --backend whisper --model small
+```
+After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.

{scribe_cli-0.6.0 → scribe_cli-0.6.2}/README.md RENAMED

@@ -52,6 +52,7 @@ there is a maximum duration after which it will stop by itself, which is setup t
 The `vosk` backend is good at
 doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 To skip the initial selection menu you can do:
@@ -60,9 +61,9 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
-### Advanced usage as keyboard replacement
+### Virtual keyboard (experimental)
-By default the content of the transcription is paster to the clipboard, but is not propagated further.
+By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
 ```bash
@@ -71,10 +72,18 @@ scribe --keyboard
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
-`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html) (I *think* got it to work with `xhost +SI:localuser:$(whoami)` as far as the display is concerned). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !).
-Workarounds include using the Xorg version of GNOME... Suggestions welcome.
+`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
 ### Start as an application in Ubuntu
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
-to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+e.g.
+```bash
+scribe-install --backend whisper --model small
+```
+After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.

{scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/_version.py RENAMED

@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
-__version__ = version = '0.6.0'
-__version_tuple__ = version_tuple = (0, 6, 0)
+__version__ = version = '0.6.2'
+__version_tuple__ = version_tuple = (0, 6, 2)

{scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/keyboard.py RENAMED

@@ -6,7 +6,6 @@ try:
 except ImportError:
     print("Please install pynput to use the keyboard feature.")
-    print("Alternatively specify [keyboard] optional dependency to voskrealtime, e.g. `pip install -e .[keyboard]`")
     raise
 # Create a keyboard controller

{scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/models.py RENAMED

@@ -95,16 +95,16 @@ class AbstractTranscriber:
             print(stop_message)
-def get_vosk_model(model, data_folder=None, url=None):
+def get_vosk_model(model, download_root=None, url=None):
     """Load the Vosk recognizer"""
     import vosk
-    if data_folder is None:
-        data_folder = VOSK_MODELS_FOLDER
-    model_path = os.path.join(data_folder, model)
+    if download_root is None:
+        download_root = VOSK_MODELS_FOLDER
+    model_path = os.path.join(download_root, model)
     if not os.path.exists(model_path):
         if url is None:
             url = f"https://alphacephei.com/vosk/models/{model}.zip"
-        download_model(url, data_folder)
+        download_model(url, download_root)
         assert os.path.exists(model_path)
     return vosk.Model(model_path)
@@ -162,11 +162,11 @@ class WhisperTranscriber(AbstractTranscriber):
     def __init__(self, model_name, language=None, model=None, model_kwargs={}, **kwargs):
         import whisper
         if model is None:
-            model = whisper.load_model(model_name)
+            model = whisper.load_model(model_name, **model_kwargs)
         super().__init__(model, model_name, language, model_kwargs=model_kwargs, **kwargs)
     def transcribe_audio(self, audio_bytes):
-        print("Transcribing...")
+        print("\nTranscribing...")
         audio_array = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
         return self.model.transcribe(audio_array, fp16=False, language=self.language)
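
Note on the `WhisperTranscriber.__init__` hunk above: in 0.6.2 the `model_kwargs` dictionary is forwarded to the loader, which is what lets a `download_root` entry reach `whisper.load_model`. The following is a minimal sketch of that forwarding pattern, not the package's code. `load_with_kwargs` and `fake_loader` are hypothetical names introduced here so the example runs without `whisper` installed, and copying the dictionary is an added precaution against the shared mutable default (`model_kwargs={}`), which the original does not do.

```python
def load_with_kwargs(loader, model_name, model=None, model_kwargs=None):
    """Hypothetical helper: forward model_kwargs to the loader when no model is given."""
    # Copy so that a shared default dictionary is never mutated between calls.
    model_kwargs = dict(model_kwargs or {})
    if model is None:
        model = loader(model_name, **model_kwargs)
    return model


def fake_loader(name, download_root=None):
    """Stand-in for a real loader that accepts a download_root keyword."""
    return {"name": name, "download_root": download_root}


model = load_with_kwargs(fake_loader, "small", model_kwargs={"download_root": "/tmp/models"})
default_model = load_with_kwargs(fake_loader, "small")
```

With the 0.6.0 call (`whisper.load_model(model_name)` without the keyword arguments), an entry such as `download_root` in `model_kwargs` would not have reached the loader.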

{scribe_cli-0.6.0 → scribe_cli-0.6.2}/scribe/streamer.py RENAMED

@@ -47,7 +47,7 @@ def get_transcriber(o, prompt=True):
         backend = o.backend
     elif not prompt:
-        backend = choices[0]
+        backend = BACKENDS[0]
     else:
         checked_backend = False
@@ -113,14 +113,16 @@ def get_transcriber(o, prompt=True):
                                         samplerate=o.samplerate,
                                         timeout=None, # vosk keeps going (no timeout)
                                         silence_duration=None, # vosk handles silences internally
-                                        model_kwargs={"data_folder": o.data_folder})
+                                        model_kwargs={"download_root": o.download_folder_vosk})
         except Exception as error:
             print(error)
             print(f"Failed to (down)load model {model}.")
             exit(1)
     elif backend == "whisper":
-        transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate, timeout=o.duration, silence_duration=o.silence)
+        transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
+                                         timeout=o.duration, silence_duration=o.silence, restart_after_silence=o.restart_after_silence,
+                                         model_kwargs={"download_root": o.download_folder_whisper})
     else:
         raise ValueError(f"Unknown backend: {backend}")
@@ -153,7 +155,8 @@ def get_parser():
     group.add_argument("--silence", default=2, type=float, help="silence duration that prompt transcription (whisper) (default %(default)ss)")
     group.add_argument("--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
-    parser.add_argument("--data-folder", help="Folder to store Vosk models.")
+    parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
+    parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
     return parser
@@ -235,7 +238,7 @@ def main(args=None):
             if transcriber.backend == "whisper":
                 print(f"[t] change duration (currently {transcriber.timeout}s)")
                 print(f"[b] change silence duration (currently {transcriber.silence_duration}s)")
-                print(f"[a] toggle auto-restart after silence [{toggle[o.restart_after_silence]}] -> [{toggle[not o.restart_after_silence]}]")
+                print(f"[a] toggle auto-restart after silence [{toggle[transcriber.restart_after_silence]}] -> [{toggle[not transcriber.restart_after_silence]}]")
             print(colored(f"Press [Enter] or any other key to start recording.", "BOLD"))
             key = input()
@@ -250,6 +253,9 @@ def main(args=None):
             if key == "c":
                 o.clipboard = not o.clipboard
                 continue
+            if key == "a":
+                transcriber.restart_after_silence = not transcriber.restart_after_silence
+                continue
             if key == "t":
                 ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
                 try:
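
Note on the option rename in the `streamer.py` hunks above: `--data-folder` is replaced by `--download-folder-vosk` and `--download-folder-whisper`, and `get_transcriber` reads them as `o.download_folder_vosk` and `o.download_folder_whisper`. The sketch below relies only on standard `argparse` behavior (dashes in a long option become underscores in the attribute name). `build_parser` is a hypothetical, stripped-down stand-in for the package's `get_parser`, and the folder path is an arbitrary example.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for get_parser(), limited to the two new options."""
    parser = argparse.ArgumentParser(prog="scribe")
    parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
    parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
    return parser


# Only the whisper folder is given; the vosk option is left unset.
o = build_parser().parse_args(["--download-folder-whisper", "/tmp/whisper-models"])

# Same dictionary shape as the model_kwargs passed in the hunk above.
whisper_kwargs = {"download_root": o.download_folder_whisper}
vosk_kwargs = {"download_root": o.download_folder_vosk}  # None when the flag is omitted
```

When the vosk flag is omitted the value is `None`, which matches the `if download_root is None` fallback to `VOSK_MODELS_FOLDER` shown in the `get_vosk_model` hunk.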

{scribe_cli-0.6.0 → scribe_cli-0.6.2/scribe_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.6.0
+Version: 0.6.2
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -109,6 +109,7 @@ there is a maximum duration after which it will stop by itself, which is setup t
 The `vosk` backend is good at
 doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
+Use mainly for longer typing session with the [keyboard](#virtual-keyboard-advanced) option, e.g. to make notes.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
 To skip the initial selection menu you can do:
@@ -117,9 +118,9 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
-### Advanced usage as keyboard replacement
+### Virtual keyboard (experimental)
-By default the content of the transcription is paster to the clipboard, but is not propagated further.
+By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
 ```bash
@@ -128,10 +129,18 @@ scribe --keyboard
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
-`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html) (I *think* got it to work with `xhost +SI:localuser:$(whoami)` as far as the display is concerned). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !).
-Workarounds include using the Xorg version of GNOME... Suggestions welcome.
+`pynput` may require [some configuration](https://pynput.readthedocs.io/en/latest/limitations.html). It has [limitations]((https://pynput.readthedocs.io/en/latest/limitations.html)). In my Ubuntu + Wayland system it works in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). Workarounds include using the Xorg version of GNOME: in `etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart.
 ### Start as an application in Ubuntu
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+e.g.
+```bash
+scribe-install --backend whisper --model small
+```
+After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.