PyPI: scribe-cli 0.4.0.tar.gz → 0.5.0.tar.gz

Files changed (27)

{scribe_cli-0.4.0/scribe_cli.egg-info → scribe_cli-0.5.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.4.0
+Version: 0.5.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -43,6 +43,7 @@ Requires-Dist: numpy
 Requires-Dist: sounddevice
 Requires-Dist: tqdm
 Requires-Dist: requests
+Requires-Dist: pyperclip
 Provides-Extra: keyboard
 Requires-Dist: pynput; extra == "keyboard"
 Provides-Extra: whisper
@@ -56,7 +57,7 @@ Requires-Dist: vosk; extra == "all"
 # Scribe
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
 ## Installation
@@ -99,7 +100,7 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model.
+You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -118,6 +119,7 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
 ### Advanced usage as keyboard replacement
+By default the content of the transcription is paster to the clipboard, but is not propagated further.
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
 ```bash

{scribe_cli-0.4.0 → scribe_cli-0.5.0}/README.md RENAMED

@@ -1,6 +1,6 @@
 # Scribe
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
 ## Installation
@@ -43,7 +43,7 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model.
+You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -62,6 +62,7 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
 ### Advanced usage as keyboard replacement
+By default the content of the transcription is paster to the clipboard, but is not propagated further.
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
 ```bash

{scribe_cli-0.4.0 → scribe_cli-0.5.0}/pyproject.toml RENAMED

@@ -17,6 +17,7 @@ dependencies = [
     "sounddevice",
     "tqdm",
     "requests",
+    "pyperclip",
 ]
 optional-dependencies = { keyboard = ["pynput"], whisper = ["openai-whisper"], vosk = ["vosk"], all = ["pynput", "openai-whisper", "vosk"] }

{scribe_cli-0.4.0 → scribe_cli-0.5.0}/scribe/_version.py RENAMED

@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
-__version__ = version = '0.4.0'
-__version_tuple__ = version_tuple = (0, 4, 0)
+__version__ = version = '0.5.0'
+__version_tuple__ = version_tuple = (0, 5, 0)

{scribe_cli-0.4.0 → scribe_cli-0.5.0}/scribe/models.py RENAMED

@@ -135,8 +135,8 @@ class WhisperTranscriber(AbstractTranscriber):
         super().__init__(model, model_name, language, model_kwargs=model_kwargs, **kwargs)
     def transcribe_audio(self, audio_bytes):
-        print("\nTranscribing...")
-        print("If --keyboard is set, change focus to target app NOW !")
+        print("\nIf --keyboard is set, change focus to target app NOW !")
+        print("Transcribing...")
         audio_array = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
         return self.model.transcribe(audio_array, fp16=False, language=self.language)

{scribe_cli-0.4.0 → scribe_cli-0.5.0}/scribe/streamer.py RENAMED

@@ -12,14 +12,25 @@ language_config = language_config_default.copy()
 # Commencer l'enregistrement
-def start_recording(micro, transcriber, keyboard=False, latency=0):
+def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=0):
     if keyboard:
         try:
             from scribe.keyboard import type_text
         except ImportError:
             keyboard = False
-            exit(1)
+            print("Keyboard simulation is not available.")
+            return
+    if clipboard:
+        try:
+            import pyperclip
+        except ImportError:
+            clipboard = False
+            print("Clipboard simulation is not available.")
+            return
+    fulltext = ""
     greetings = { k: v for k, v in language_config["_meta"].get(transcriber.language, {}).items()
                 if v is not None and k.startswith(("start", "stop"))
@@ -32,6 +43,11 @@ def start_recording(micro, transcriber, keyboard=False, latency=0):
             print(result.get('text'))
             if keyboard:
                 type_text(result['text'] + " ", interval=latency) # Simulate typing
+            if clipboard:
+                fulltext += result['text'] + " "
+                pyperclip.copy(fulltext)
         else:
             print_partial(result.get('partial', ''))
@@ -170,6 +186,7 @@ def get_parser():
     parser.add_argument("--samplerate", default=16000, type=int, help=argparse.SUPPRESS)
     parser.add_argument("--duration", default=60, type=int, help="duration in seconds before whisper models start transcribing (default %(default)ss)")
     parser.add_argument("--keyboard", action="store_true")
+    parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
     parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
     parser.add_argument("--data-folder", help="Folder to store Vosk models.")
@@ -191,12 +208,13 @@ def main(args=None):
     while True:
         if transcriber is None:
             transcriber = get_transcriber(o, prompt=o.prompt)
-        print(f"[ Model {transcriber.model_name} from {transcriber.backend} selected. ]")
+        print(f"[ Model {transcriber.model_name} from {transcriber.backend} selected. Keyboard [{'on' if o.keyboard else 'off'}]. Clipboard [{'on' if o.clipboard else 'off'}]]")
         if o.prompt:
             print(f"Choose any of the following actions:")
             print(f"[q] quit")
             print(f"[e] change model")
-            print(f"[k] toggle keyboard {'off' if o.keyboard else 'on'}")
+            print(f"[k] toggle keyboard [{'off' if o.keyboard else 'on'}]")
+            print(f"[c] toggle clipboard [{'off' if o.clipboard else 'on'}]")
             if transcriber.backend == "whisper":
                 print(f"[t] change duration (currently {transcriber.max_duration}s)")
             print(colored(f"Press [Enter] or any other key to start recording.", "BOLD"))
@@ -210,6 +228,9 @@ def main(args=None):
             if key == "k":
                 o.keyboard = not o.keyboard
                 continue
+            if key == "c":
+                o.clipboard = not o.clipboard
+                continue
             if key == "t":
                 duration = input(f"Enter new duration in seconds (current: {transcriber.max_duration}): ")
                 try:
@@ -218,7 +239,7 @@ def main(args=None):
                     print("Invalid duration. Must be an integer.")
                 continue
-        start_recording(micro, transcriber, keyboard=o.keyboard, latency=o.latency)
+        start_recording(micro, transcriber, clipboard=o.clipboard, keyboard=o.keyboard, latency=o.latency)
         # if we arrived so far, that means we pressed Ctrl + C anyway, and need Enter to move on.
         # So we leave the wider range of options to change the model.
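
Note on the streamer.py hunks above: 0.5.0 adds a `--no-clipboard` flag (stored as `clipboard`, so the clipboard is on by default) and copies the growing transcript to the clipboard after each final result. The following is a minimal, standalone sketch of that pattern, not the package's code. `build_parser`, `load_copy_function` and `accumulate` are hypothetical names, the `"text" in result` check stands in for a branch condition the diff does not show, and the copy function is injected so the sketch also runs when `pyperclip` is not installed.

```python
import argparse


def build_parser():
    """Hypothetical reduced parser holding only the two output flags."""
    parser = argparse.ArgumentParser(prog="scribe-sketch")
    parser.add_argument("--keyboard", action="store_true")
    # store_false with dest="clipboard": the attribute defaults to True
    # and is switched off only when --no-clipboard is passed.
    parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
    return parser


def load_copy_function():
    """Return pyperclip.copy when pyperclip is importable, otherwise None."""
    try:
        import pyperclip
    except ImportError:
        return None
    return pyperclip.copy


def accumulate(results, copy=None):
    """Append each final result to the transcript and copy the whole text.

    `results` is an iterable of dicts; entries without a "text" key are
    treated as partial results and skipped (an assumption of this sketch).
    """
    fulltext = ""
    for result in results:
        if "text" not in result:
            continue
        fulltext += result["text"] + " "
        if copy is not None:
            # Re-copy the full transcript so the clipboard always holds
            # everything transcribed so far, not just the last chunk.
            copy(fulltext)
    return fulltext


if __name__ == "__main__":
    options = build_parser().parse_args()
    copy = load_copy_function() if options.clipboard else None
    print(accumulate([{"text": "hello"}, {"partial": "wor"}, {"text": "world"}], copy))
```

Re-copying the full text on every result keeps the clipboard usable at any moment, including when recording is interrupted with Ctrl + C, at the cost of repeated writes to the clipboard.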

{scribe_cli-0.4.0 → scribe_cli-0.5.0/scribe_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.4.0
+Version: 0.5.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -43,6 +43,7 @@ Requires-Dist: numpy
 Requires-Dist: sounddevice
 Requires-Dist: tqdm
 Requires-Dist: requests
+Requires-Dist: pyperclip
 Provides-Extra: keyboard
 Requires-Dist: pynput; extra == "keyboard"
 Provides-Extra: whisper
@@ -56,7 +57,7 @@ Requires-Dist: vosk; extra == "all"
 # Scribe
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI.
+`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard.
 ## Installation
@@ -99,7 +100,7 @@ scribe
 and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
-You can interrupt the recording via Ctrl + C and start again or change model.
+You can interrupt the recording via Ctrl + C and start again or change model. The full content of the transcription will be pasted to the clipboard by default, until interruption.
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
@@ -118,6 +119,7 @@ where `--no-prompt` jumps right to the recording (after the first interruption,
 ### Advanced usage as keyboard replacement
+By default the content of the transcription is paster to the clipboard, but is not propagated further.
 With the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the applcation under focus:
 ```bash