PyPI - scribe-cli - Versions diffs - 0.7.11__tar.gz → 0.8.0__tar.gz - Mend

scribe-cli 0.7.11.tar.gz → 0.8.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

{scribe_cli-0.7.11/scribe_cli.egg-info → scribe_cli-0.8.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.7.11
+Version: 0.8.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -118,7 +118,7 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
+You can interrupt the recording via Ctrl + C and start again or change model.
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -135,19 +135,38 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
-### Virtual keyboard (experimental)
+## Output media
+By default the transcription is printed on the terminal, but other output media are supported.
+### Clipboard
+The most straightforward is the clipboard:
+```bash
+scribe --clipboard
+```
+The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+### Output file
+Alternatively an output file can be indicated:
+```bash
+ --keyboard -o transcription.txt
+```
-By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
-However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
-That can be achieve with the `--keyboard` option.
+### Virtual keyboard (experimental)
-With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
+With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 ```bash
 scribe --keyboard
 ```
-It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
+This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
+The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -164,7 +183,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
-### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
+## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -179,7 +198,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
-### Start as an application in GNOME
+## Start as an application in GNOME
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -187,7 +206,17 @@ to make it available from the quick launch menu. Any option will be passed on to
 e.g.
 ```bash
-scribe-install --backend whisper --model small
+scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
 ```
-After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+## Fine tuning
+There are a number of options to control the silence threshold, duration and more.
+Best is to check the available options in the online help:
+```bash
+scribe --help
+```

{scribe_cli-0.7.11 → scribe_cli-0.8.0}/README.md RENAMED

@@ -55,7 +55,7 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
+You can interrupt the recording via Ctrl + C and start again or change model.
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -72,19 +72,38 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
-### Virtual keyboard (experimental)
+## Output media
+By default the transcription is printed on the terminal, but other output media are supported.
+### Clipboard
+The most straightforward is the clipboard:
+```bash
+scribe --clipboard
+```
+The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
-By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
-However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
-That can be achieve with the `--keyboard` option.
+### Output file
-With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
+Alternatively an output file can be indicated:
+```bash
+ --keyboard -o transcription.txt
+```
+### Virtual keyboard (experimental)
+With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 ```bash
 scribe --keyboard
 ```
-It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
+This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
+The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -101,7 +120,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
-### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
+## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -116,7 +135,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
-### Start as an application in GNOME
+## Start as an application in GNOME
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -124,7 +143,17 @@ to make it available from the quick launch menu. Any option will be passed on to
 e.g.
 ```bash
-scribe-install --backend whisper --model small
+scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
 ```
-After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+## Fine tuning
+There are a number of options to control the silence threshold, duration and more.
+Best is to check the available options in the online help:
+```bash
+scribe --help
+```

{scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/_version.py RENAMED

@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
-__version__ = version = '0.7.11'
-__version_tuple__ = version_tuple = (0, 7, 11)
+__version__ = version = '0.8.0'
+__version_tuple__ = version_tuple = (0, 8, 0)

{scribe_cli-0.7.11 → scribe_cli-0.8.0}/scribe/app.py RENAMED

@@ -1,5 +1,6 @@
 from pathlib import Path
 import tomllib
+import re
 import time
 import argparse
 from scribe.audio import Microphone
@@ -177,16 +178,22 @@ def get_parser():
     parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
     parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
-    parser.add_argument("--keyboard", action="store_true")
-    parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
-    parser.add_argument("--latency", default=0.01, type=float, help="keyboard latency")
-    parser.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
+    group = parser.add_argument_group("transcription output")
+    group.add_argument("-c", "--clipboard", dest="clipboard", action="store_true")
+    # group.add_argument("--no-clipboard", dest="clipboard", action="store_false", help=argparse.SUPPRESS)
+    group.add_argument("-k", "--keyboard", action="store_true")
+    group.add_argument("-o", "--output-file")
+    group = parser.add_argument_group("keyboard options")
+    group.add_argument("--latency", default=0.01, type=float, help="keyboard latency (default %(default)s s)")
+    group.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
     group = parser.add_argument_group("whisper options")
-    group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)ss)")
-    group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)ss)")
-    group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in db (default %(default)ss)")
-    group.add_argument("--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
+    group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)s s)")
+    group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)s s)")
+    group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
+    group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
     parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
     parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
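The hunk above moves the output flags into an argparse argument group ("transcription output"), adds short aliases (`-c`, `-k`, `-o`, `-a`), and turns the clipboard from default-on (the removed `--no-clipboard` used `store_false`) into an opt-in `store_true` flag, which matches the README change. The README also mentions a `--restart` option; argparse accepts unambiguous prefixes of long options by default (`allow_abbrev=True`), so `--restart` should resolve to `--restart-after-silence` as long as no other option shares that prefix. The sketch below is a reduced, hypothetical mirror of only these flags, not scribe's full `get_parser()`:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reduced parser: only the output-related flags from the hunk
    # above; scribe's real get_parser() defines many more options.
    parser = argparse.ArgumentParser(prog="scribe")
    group = parser.add_argument_group("transcription output")
    group.add_argument("-c", "--clipboard", dest="clipboard", action="store_true")
    group.add_argument("-k", "--keyboard", action="store_true")
    group.add_argument("-o", "--output-file")
    group = parser.add_argument_group("whisper options")
    group.add_argument("--silence-db", default=-30, type=float,
                       help="silence magnitude in decibel (default %(default)s dB)")
    group.add_argument("-a", "--restart-after-silence", action="store_true")
    return parser

parser = build_parser()

# With no flags, every output medium is off: the clipboard is now opt-in.
defaults = parser.parse_args([])
print(defaults.clipboard, defaults.keyboard, defaults.output_file)  # False False None

# Short aliases plus argparse prefix matching: "--restart" is an unambiguous
# abbreviation of "--restart-after-silence" in this parser.
opts = parser.parse_args(["-c", "-o", "transcription.txt", "--restart"])
print(opts.clipboard, opts.output_file, opts.restart_after_silence)  # True transcription.txt True
```

Argument groups only affect how `--help` organizes the options; parsing and the resulting namespace attributes are unchanged.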
@@ -195,18 +202,16 @@ def get_parser():
 # Commencer l'enregistrement
-def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, ascii=False, callback=None, **greetings):
+def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0, ascii=False, output_file=None, callback=None, **greetings):
     if keyboard:
         from scribe.keyboard import type_text
         print("\nChange focus to target app during transcription.")
     if clipboard:
         import pyperclip
         print("\nThe full transcription will be copied to clipboard as it becomes available.")
     fulltext = ""
     for result in transcriber.start_recording(micro, **greetings):
@@ -217,6 +222,10 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
             if keyboard:
                 type_text(result['text'] + " ", interval=latency, ascii=ascii) # Simulate typing
+            if output_file:
+                with open(output_file, "a") as f:
+                    f.write(result['text'] + "\n")
             if clipboard:
                 fulltext += result['text'] + " "
                 pyperclip.copy(fulltext.strip())
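The hunk above adds an `output_file` branch next to the keyboard and clipboard branches: each final result is appended to the file (mode `"a"`) as its own line, while the clipboard branch keeps re-copying the whole accumulated text. A minimal sketch of that fan-out follows, assuming final results carry a `text` key and partial ones a `partial` key (as the `result.get('partial', '')` call suggests); `dispatch_results` is a hypothetical name, the keyboard branch is omitted, and `copy` stands in for `pyperclip.copy` so the sketch runs without a clipboard:

```python
import os
import tempfile
from typing import Callable, Iterable, Optional

def dispatch_results(results: Iterable[dict],
                     output_file: Optional[str] = None,
                     copy: Optional[Callable[[str], None]] = None) -> str:
    """Hypothetical sketch of the output fan-out in start_recording().

    Returns the text last sent to `copy` (empty if no clipboard stand-in is given).
    """
    fulltext = ""
    for result in results:
        if "text" not in result:
            continue  # partial results are only printed in the real loop
        if output_file:
            # Append mode: one line per final result; earlier content is kept.
            with open(output_file, "a") as f:
                f.write(result["text"] + "\n")
        if copy:
            fulltext += result["text"] + " "
            copy(fulltext.strip())  # clipboard holds the full transcription so far
    return fulltext.strip()

# Usage: a list collects what would have been copied to the clipboard.
clipboard = []
path = os.path.join(tempfile.mkdtemp(), "transcription.txt")
results = [{"text": "hello world"}, {"partial": "how"}, {"text": "how are you"}]
full = dispatch_results(results, output_file=path, copy=clipboard.append)
with open(path) as f:
    print(f.read().splitlines())  # ['hello world', 'how are you']
print(clipboard[-1])  # hello world how are you
```

Because the file is opened in append mode, running scribe twice with the same `-o` target adds to the existing transcript rather than overwriting it.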
@@ -224,9 +233,6 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
         else:
             print_partial(result.get('partial', ''))
-    if clipboard:
-        print("Copied to clipboard.")
     if callback:
         callback()
@@ -355,17 +361,24 @@ def main(args=None):
         if transcriber is None:
             transcriber = get_transcriber(o, prompt=o.prompt)
         print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
-        show_options = ["clipboard", "keyboard", "ascii", "app"]
-        activated_options = [colored(option, 'light_blue') for option in show_options if getattr(o, option)]
-        print(f"Options: {' | '.join(activated_options)}")
+        show_output = ["clipboard", "keyboard", "output_file"]
+        show_options = ["ascii", "app"]
+        activated_output = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_output if getattr(o, option)]
+        activated_options = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_options if getattr(o, option)]
+        if activated_output:
+            print(f"Output: {' | '.join(activated_output)}")
+        else:
+            print(colored(f"No output selected -> terminal only", "light_red"))
+        if activated_options:
+            print(f"Options: {' | '.join(activated_options)}")
         if o.prompt:
             print(f"Choose any of the following actions")
-            print(f"{colored('[q]', 'light_yellow')} quit")
             print(f"{colored('[e]', 'light_yellow')} change model")
+            print(f"{colored('[f]', 'light_yellow')} output file is {colored(repr(o.output_file), 'light_blue')}")
+            print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
+            print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
+            print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
             if details:
-                print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
-                print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
-                print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
                 if o.keyboard:
                     print(f"{colored('[latency]', 'light_yellow')} between keystrokes is {colored(o.latency, 'light_blue')} s")
                 if transcriber.backend == "whisper":
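The status line in the hunk above lists boolean options by bare name and any other truthy value as `name=value`, skipping anything falsy, and warns when no output medium is selected. Below is a colour-free sketch of that formatting; `describe` is a hypothetical helper and `SimpleNamespace` stands in for the parsed `argparse.Namespace`:

```python
from types import SimpleNamespace

def describe(o, names):
    # Booleans print as their bare name; other truthy values as name=value.
    # Falsy values (False, None, "") are skipped, as in the hunk above.
    return [name if type(getattr(o, name)) is bool else f"{name}={getattr(o, name)}"
            for name in names if getattr(o, name)]

o = SimpleNamespace(clipboard=True, keyboard=False, output_file="notes.txt",
                    ascii=False, app=False)
activated_output = describe(o, ["clipboard", "keyboard", "output_file"])
if activated_output:
    print(f"Output: {' | '.join(activated_output)}")  # Output: clipboard | output_file=notes.txt
else:
    print("No output selected -> terminal only")
```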
@@ -379,21 +392,22 @@ def main(args=None):
                     if key not in display_flags or key in exclude_flags or not isinstance(value, bool):
                         continue
                     print(f"{colored(f'[{key}]', 'light_yellow')} is {colored(value, 'light_blue')} toggle?")
-                print(f"{colored('[o]', 'light_yellow')} hide options")
+                print(f"{colored('[-]', 'light_yellow')} hide options")
             else:
-                print(f"{colored('[o]', 'light_yellow')} show options")
+                print(f"{colored('[-]', 'light_yellow')} show more options")
+            print(f"{colored('[q]', 'light_yellow')} quit")
             print(colored(f"Press [Enter] to start recording.", attrs=["bold"]))
             key = input()
             if key == "q":
                 exit(0)
-            if key == "o":
+            if len(key) > 0 and key.strip() in ["", ".", "-", "+", 'o', '\x1b[A', '\x1b[B', '\x1b[C', '\x1b[D']:  # arrow keys
                 details = not details
                 continue
             if key == "e":
                 transcriber = None
                 o.model = None
+                o.dummy = False
                 o.backend = None
                 o.language = None
                 continue
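The hunk above replaces the single `o` key with a set of keys that toggle the extended option list. The `len(key) > 0` guard keeps a bare Enter (an empty string from `input()`) free to start recording, while `strip()` lets whitespace-only input such as a single space toggle the view. The escape sequences are presumably there because an arrow key can reach `input()` as a raw sequence like `'\x1b[A'` when line editing is not active. A small sketch of the check, with `toggles_details` as a hypothetical name:

```python
# Keys that toggle the "more options" view, copied from the hunk above.
TOGGLE_KEYS = ["", ".", "-", "+", "o", "\x1b[A", "\x1b[B", "\x1b[C", "\x1b[D"]

def toggles_details(key: str) -> bool:
    # Empty input fails the length check, so plain Enter still starts recording;
    # whitespace-only input strips to "" and therefore toggles.
    return len(key) > 0 and key.strip() in TOGGLE_KEYS

for key in ["", " ", "-", "o", "\x1b[A", "q"]:
    print(repr(key), toggles_details(key))
```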
@@ -437,18 +451,27 @@ def main(args=None):
                 except:
                     print("Invalid duration. Must be a float.")
                 continue
+            if key == "f":
+                ans = input(f"Enter output file (current: {o.output_file}): ")
+                invalid_regex = re.compile(r'[^A-Za-z0-9_\-\\\/\.]')
+                if not invalid_regex.search(ans):
+                    o.output_file = ans
+                else:
+                    print(f"Invalid characters: {' '.join(map(repr, invalid_regex.findall(ans)))}")
+                    print(f"Invalid file name: {repr(ans)}")
+                continue
             if key:
                 if hasattr(o, key) and isinstance(getattr(o, key), bool):
                     setattr(o, key, not getattr(o, key))
                     print(f"Toggle {key} to [{getattr(o, key)}].")
-                print(f"Invalid choice: {key}.")
+                print(f"Invalid choice: {repr(key)}")
                 continue
         if o.app:
             greetings = dict(
                 start_message = "Listening... Use the try icon menu to stop.",
             )
-            app = create_app(micro, transcriber, clipboard=o.clipboard,
+            app = create_app(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
                              keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
             print("Starting app...")
             app.run()
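The new `[f]` branch in the hunk above accepts a file name only if it contains no character outside a whitelist: ASCII letters, digits, `_`, `-`, `\`, `/` and `.`. As a consequence, spaces and `~` (no home-directory expansion) are rejected, and an empty answer passes; since the file is only written `if output_file:`, an empty answer effectively switches file output off. A sketch with the same character class, where `check_output_file` is a hypothetical helper:

```python
import re

# Same character class as the hunk: letters, digits, "_", "-", "\", "/", "."
INVALID = re.compile(r'[^A-Za-z0-9_\-\\\/\.]')

def check_output_file(ans: str):
    """Hypothetical helper mirroring the [f] branch: (accepted, offending chars)."""
    bad = INVALID.findall(ans)
    return (not bad, bad)

print(check_output_file("notes/transcription.txt"))  # (True, [])
print(check_output_file("my notes.txt"))             # (False, [' '])
print(check_output_file("~/notes.txt"))              # (False, ['~'])
print(check_output_file(""))                         # (True, [])
```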
@@ -456,7 +479,7 @@ def main(args=None):
             greetings = dict(
                 start_message = "Listening... Press Ctrl+C to stop.",
             )
-            start_recording(micro, transcriber, clipboard=o.clipboard,
+            start_recording(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
                             keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
         # if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.

{scribe_cli-0.7.11 → scribe_cli-0.8.0/scribe_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.7.11
+Version: 0.8.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -118,7 +118,7 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
+You can interrupt the recording via Ctrl + C and start again or change model.
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -135,19 +135,38 @@ scribe --backend whisper --model small --no-prompt
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
-### Virtual keyboard (experimental)
+## Output media
+By default the transcription is printed on the terminal, but other output media are supported.
+### Clipboard
+The most straightforward is the clipboard:
+```bash
+scribe --clipboard
+```
+The content of the (full) transcription is then pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
+### Output file
+Alternatively an output file can be indicated:
+```bash
+ --keyboard -o transcription.txt
+```
-By default the content of the transcription is pasted to the clipboard, and it is up to the user to paste (e.g. Ctrl + V).
-However with the `vosk` backend and its realtime transcription, it is very handy to have the keys sent directly to the keyboard.
-That can be achieve with the `--keyboard` option.
+### Virtual keyboard (experimental)
-With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
+With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:
 ```bash
 scribe --keyboard
 ```
-It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
+This can be extremely useful with the `vosk` backend and its realtime transcription, or alternatively with the `--restart` option with the `whisper` backend.
+The `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
 #### Use the keyboard with Wayland (default for Ubuntu 24.04)
@@ -164,7 +183,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
-### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
+## System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -179,7 +198,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
-### Start as an application in GNOME
+## Start as an application in GNOME
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.
@@ -187,7 +206,17 @@ to make it available from the quick launch menu. Any option will be passed on to
 e.g.
 ```bash
-scribe-install --backend whisper --model small
+scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
 ```
-After that just typing Cmd + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+## Fine tuning
+There are a number of options to control the silence threshold, duration and more.
+Best is to check the available options in the online help:
+```bash
+scribe --help
+```