PyPI - scribe-cli - Versions diffs - 0.8.0__tar.gz → 0.10.0__tar.gz - Mend

scribe-cli 0.8.0.tar.gz → 0.10.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31)

{scribe_cli-0.8.0/scribe_cli.egg-info → scribe_cli-0.10.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.8.0
+Version: 0.10.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
 Provides-Extra: app
 Requires-Dist: pystray; extra == "app"
 Requires-Dist: PyGObject; extra == "app"
+Provides-Extra: openai
+Requires-Dist: openai; extra == "openai"
+Requires-Dist: soundfile; extra == "openai"
 Provides-Extra: all
 Requires-Dist: pynput; extra == "all"
 Requires-Dist: openai-whisper; extra == "all"
+Requires-Dist: openai; extra == "all"
+Requires-Dist: soundfile; extra == "all"
 Requires-Dist: vosk; extra == "all"
 Requires-Dist: pystray; extra == "all"
@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"
 # Scribe  <img src="scribe_data/share/icon.png" width=48px>
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client for the OpenAI API via the `openaiapi` backend (API key required).
 ## Compatibility
@@ -101,12 +108,10 @@ cd scribe
 pip install -e .[all]
 ```
-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave out the optional dependencies (drop `[all]`), but you must install at least one of the `vosk`, `openai-whisper` or `openai` packages (see Usage below); the `openai` backend also needs `soundfile`, and both come with the `[openai]` extra.
+The language models for the local backends `vosk` and `whisper` are downloaded on the fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 ## Usage
@@ -115,7 +120,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper`, `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone; the transcribed text is printed in real time (`vosk`) or once recording is complete (`whisper`, `openaiapi`).
 You can interrupt the recording via Ctrl + C and start again or change model.
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing sessions with the [keyboard](#virtual-keyboard-experimental) option, e.g. to take notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and a few of them are associated here with [a handful of languages](scribe/models.toml): `en`, `fr`, `it`, `de` (so far).
-To skip the initial selection menu you can do:
+The `openaiapi` backend uses the `whisper-1` model (at the time of writing) and requires an API key:
 ```bash
-scribe --backend whisper --model small --no-prompt
+scribe --backend openaiapi --api-key YOURAPIKEY
 ```
 Adding `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
@@ -190,7 +195,8 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will appear in the system tray with Record, Stop and Quit options, and it changes to reflect what the app is doing.
+For the `vosk` backend there are only two states: recording + transcribing, or idle. For the `whisper` backend the icon shows three states: recording, transcribing, and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 ```bash
@@ -201,15 +207,20 @@ pip install PyGObject
 ## Start as an application in GNOME
 If you run Ubuntu (or another distribution) with GNOME, the script `scribe-install [...]` will create a `.desktop` file (`scribe.desktop` by default) and place it under `$HOME/.local/share/applications`
-to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+to make it available from the quick launch menu. All other options are passed on to `scribe`; `scribe-install` itself also accepts `--name` (the app title) and `--no-terminal`.
+`--no-terminal` hides the terminal and implies the options `--app --no-prompt`.
 e.g.
 ```bash
-scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
+scribe-install
+scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
+scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
 ```
-After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+This will install three separate apps:
+- `Super + scribe`: launches the default version with the terminal prompt.
+- `Super + whisper`: launches a preset version with the `small` model from `whisper` and starts recording right away. You can follow progress in the terminal, and the result is ready to paste from the clipboard.
+- `Super + vosk fr`: launches a preset version for real-time French transcription with the `vosk` backend, sending the output to both the clipboard and the keyboard without opening a terminal (press Record in the tray icon menu to start recording).
 ## Fine tuning
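The `Provides-Extra`/`Requires-Dist` lines at the top of this PKG-INFO diff are how the new `openai` extra is recorded in the package metadata. As a minimal sketch, assuming only the simple `name; extra == "x"` marker form used here (not full PEP 508 parsing), the 0.10.0 lines can be grouped by extra with a regular expression. `group_extras` is a hypothetical helper, not part of scribe:

```python
import re
from collections import defaultdict

# Requires-Dist lines copied from the 0.10.0 side of the PKG-INFO hunk above.
REQUIRES_DIST = [
    'Requires-Dist: pystray; extra == "app"',
    'Requires-Dist: PyGObject; extra == "app"',
    'Requires-Dist: openai; extra == "openai"',
    'Requires-Dist: soundfile; extra == "openai"',
    'Requires-Dist: pynput; extra == "all"',
    'Requires-Dist: openai-whisper; extra == "all"',
    'Requires-Dist: openai; extra == "all"',
    'Requires-Dist: soundfile; extra == "all"',
    'Requires-Dist: vosk; extra == "all"',
    'Requires-Dist: pystray; extra == "all"',
]

# Handles only `name; extra == "x"`; version specifiers or and/or markers
# would need a real PEP 508 parser.
PATTERN = re.compile(r'^Requires-Dist:\s*([A-Za-z0-9._-]+)\s*;\s*extra\s*==\s*"([^"]+)"\s*$')


def group_extras(lines):
    """Group requirement names by the extra that pulls them in."""
    extras = defaultdict(list)
    for line in lines:
        match = PATTERN.match(line)
        if match:
            name, extra = match.groups()
            extras[extra].append(name)
    return dict(extras)


extras = group_extras(REQUIRES_DIST)
print(extras["openai"])  # ['openai', 'soundfile']
```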

{scribe_cli-0.8.0 → scribe_cli-0.10.0}/README.md RENAMED

@@ -3,7 +3,9 @@
 # Scribe  <img src="scribe_data/share/icon.png" width=48px>
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client for the OpenAI API via the `openaiapi` backend (API key required).
 ## Compatibility
@@ -38,12 +40,10 @@ cd scribe
 pip install -e .[all]
 ```
-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave out the optional dependencies (drop `[all]`), but you must install at least one of the `vosk`, `openai-whisper` or `openai` packages (see Usage below); the `openai` backend also needs `soundfile`, and both come with the `[openai]` extra.
+The language models for the local backends `vosk` and `whisper` are downloaded on the fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 ## Usage
@@ -52,7 +52,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper`, `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone; the transcribed text is printed in real time (`vosk`) or once recording is complete (`whisper`, `openaiapi`).
 You can interrupt the recording via Ctrl + C and start again or change model.
@@ -66,9 +66,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing sessions with the [keyboard](#virtual-keyboard-experimental) option, e.g. to take notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and a few of them are associated here with [a handful of languages](scribe/models.toml): `en`, `fr`, `it`, `de` (so far).
-To skip the initial selection menu you can do:
+The `openaiapi` backend uses the `whisper-1` model (at the time of writing) and requires an API key:
 ```bash
-scribe --backend whisper --model small --no-prompt
+scribe --backend openaiapi --api-key YOURAPIKEY
 ```
 Adding `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
@@ -127,7 +127,8 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will appear in the system tray with Record, Stop and Quit options, and it changes to reflect what the app is doing.
+For the `vosk` backend there are only two states: recording + transcribing, or idle. For the `whisper` backend the icon shows three states: recording, transcribing, and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 ```bash
@@ -138,15 +139,20 @@ pip install PyGObject
 ## Start as an application in GNOME
 If you run Ubuntu (or another distribution) with GNOME, the script `scribe-install [...]` will create a `.desktop` file (`scribe.desktop` by default) and place it under `$HOME/.local/share/applications`
-to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+to make it available from the quick launch menu. All other options are passed on to `scribe`; `scribe-install` itself also accepts `--name` (the app title) and `--no-terminal`.
+`--no-terminal` hides the terminal and implies the options `--app --no-prompt`.
 e.g.
 ```bash
-scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
+scribe-install
+scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
+scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
 ```
-After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+This will install three separate apps:
+- `Super + scribe`: launches the default version with the terminal prompt.
+- `Super + whisper`: launches a preset version with the `small` model from `whisper` and starts recording right away. You can follow progress in the terminal, and the result is ready to paste from the clipboard.
+- `Super + vosk fr`: launches a preset version for real-time French transcription with the `vosk` backend, sending the output to both the clipboard and the keyboard without opening a terminal (press Record in the tray icon menu to start recording).
 ## Fine tuning
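For the `openaiapi` example in this README: `app.py` registers the option as `--api-key`. Because `argparse` accepts unambiguous prefixes of long options by default (`allow_abbrev=True`), a shortened `--api` should resolve to the same option, as long as no other `scribe` option begins with `--api`. A trimmed-down, hypothetical parser that keeps only the relevant options shows this:

```python
import argparse


def build_parser():
    # Hypothetical subset of scribe's parser: only --backend and the new "whisper api" group.
    parser = argparse.ArgumentParser(prog="scribe")
    parser.add_argument("--backend")
    group = parser.add_argument_group("whisper api")
    group.add_argument("--api-key", help="API key for the Whisper API backend.")
    return parser


parser = build_parser()
full = parser.parse_args(["--backend", "openaiapi", "--api-key", "YOURAPIKEY"])
# allow_abbrev=True (the default) lets "--api" match "--api-key" unambiguously.
short = parser.parse_args(["--backend", "openaiapi", "--api", "YOURAPIKEY"])
print(full.api_key == short.api_key)  # True
```

If the flag is omitted, `OpenaiAPITranscriber` passes `openai.api_key` (normally unset) to `openai.OpenAI`, which in openai-python 1.x then falls back to the `OPENAI_API_KEY` environment variable.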

{scribe_cli-0.8.0 → scribe_cli-0.10.0}/pyproject.toml RENAMED

@@ -44,7 +44,8 @@ keyboard = ["pynput"]
 whisper = ["openai-whisper"]
 vosk = ["vosk"]
 app = ["pystray", "PyGObject"]
-all = ["pynput", "openai-whisper", "vosk", "pystray"]
+openai = ["openai", "soundfile"]
+all = ["pynput", "openai-whisper", "openai", "soundfile", "vosk", "pystray"]
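0.10.0 removes the `check_dependencies` calls from `get_transcriber`, so a missing extra now surfaces only when the selected transcriber imports its library (for example `import openai` inside `OpenaiAPITranscriber.__init__`). If an up-front check is wanted, `importlib.util.find_spec` can probe for modules without importing them. The backend-to-module mapping below is an assumption derived from the extras above, and `available_backends` is a hypothetical helper:

```python
import importlib.util

# Assumed import names per backend: openai-whisper installs the `whisper`
# module; the openaiapi backend imports both `openai` and `soundfile`.
BACKEND_MODULES = {
    "whisper": ["whisper"],
    "vosk": ["vosk"],
    "openaiapi": ["openai", "soundfile"],
}


def available_backends(mapping=BACKEND_MODULES):
    """Return backends whose top-level modules are findable, without importing them."""
    return [
        backend
        for backend, modules in mapping.items()
        if all(importlib.util.find_spec(name) is not None for name in modules)
    ]


print(available_backends())  # depends on which extras are installed
```

`find_spec` only checks that a module with that name exists, so an unrelated package that also provides a top-level `whisper` module would give a false positive.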
 [tool.setuptools]

{scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/_version.py RENAMED

@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
-__version__ = version = '0.8.0'
-__version_tuple__ = version_tuple = (0, 8, 0)
+__version__ = version = '0.10.0'
+__version_tuple__ = version_tuple = (0, 10, 0)

{scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/app.py RENAMED

@@ -4,8 +4,8 @@ import re
 import time
 import argparse
 from scribe.audio import Microphone
-from scribe.util import print_partial, clear_line, prompt_choices, check_dependencies, ansi_link, colored
-from scribe.models import VoskTranscriber, WhisperTranscriber
+from scribe.util import print_partial, clear_line, prompt_choices, ansi_link, colored
+from scribe.models import VoskTranscriber, WhisperTranscriber, OpenaiAPITranscriber
 with open(Path(__file__).parent / "models.toml", "rb") as f:
     language_config_default = tomllib.load(f)
@@ -24,7 +24,7 @@ def get_default_backend():
         except ImportError:
             raise ImportError("Please install either vosk or whisper to use this script.")
-BACKENDS = ["whisper", "vosk"]
+BACKENDS = ["whisper", "vosk", "openaiapi"]
 UNAVAILABLE_BACKENDS = []
@@ -59,6 +59,7 @@ def get_transcriber(o, prompt=True):
     whisper_models = ["tiny", "base", "small", "medium", "large", "turbo"]
     whisper_english_models = ["tiny.en", "base.en", "small.en", "medium.en"]
+    whisperapi_models = ["whisper-1"]
     if o.dummy:
         return DummyTranscriber("whisper", "dummy")
@@ -68,26 +69,17 @@ def get_transcriber(o, prompt=True):
             o.backend = "vosk"
         elif o.model in whisper_models + whisper_english_models:
             o.backend = "whisper"
+        elif o.model in whisperapi_models:
+            o.backend = "openaiapi"
     if o.backend:
-        checked_backend = check_dependencies(o.backend)
-        if not checked_backend:
-            print(f"Backend {o.backend} is not available.")
-            exit(1)
         backend = o.backend
     elif not prompt:
         backend = BACKENDS[0]
     else:
-        checked_backend = False
-        while not checked_backend:
-            backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)
-            # raise an error if the user has explicitly selected a backend that is not available
-            checked_backend = check_dependencies(backend, raise_error=backend==o.backend)
-            if not checked_backend:
-                print(f"Backend {o.backend} is not available.")
-                UNAVAILABLE_BACKENDS.append(backend)
+        backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)
     print(f"Selected backend: {backend}")
@@ -131,6 +123,13 @@ def get_transcriber(o, prompt=True):
             model = pick_specialist_model(model, o.language, backend)
+        elif backend == "openaiapi":
+            model = o.model or "whisper-1"
+        else:
+            raise ValueError(f"Unknown backend: {backend}")
     print(f"Selected model: {model}")
     if backend == "vosk":
@@ -152,6 +151,12 @@ def get_transcriber(o, prompt=True):
                                          restart_after_silence=o.restart_after_silence,
                                          model_kwargs={"download_root": o.download_folder_whisper})
+    elif backend == "openaiapi":
+        transcriber = OpenaiAPITranscriber(model_name=model, samplerate=o.samplerate,
+                                         timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
+                                         restart_after_silence=o.restart_after_silence, api_key=o.api_key)
     else:
         raise ValueError(f"Unknown backend: {backend}")
@@ -195,6 +200,10 @@ def get_parser():
     group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
     group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
+    group = parser.add_argument_group("whisper api")
+    group.add_argument("--api-key",
+                        help="API key for the Whisper API backend.")
     parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
     parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
@@ -206,11 +215,11 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
     if keyboard:
         from scribe.keyboard import type_text
-        print("\nChange focus to target app during transcription.")
+        transcriber.log("Change focus to target app during transcription.")
     if clipboard:
         import pyperclip
-        print("\nThe full transcription will be copied to clipboard as it becomes available.")
+        transcriber.log("The full transcription will be copied to clipboard as it becomes available.")
     fulltext = ""
@@ -301,7 +310,7 @@ def create_app(micro, transcriber, **kwargs):
     def callback_stop_recording(icon, item):
         # Here we need to stop the recording thread
-        transcriber.recording = False
+        transcriber.interrupt = True
         if hasattr(icon, "_recording_thread"):
             icon._recording_thread.join()
         if hasattr(icon, "_monitoring_thread"):
@@ -310,7 +319,7 @@ def create_app(micro, transcriber, **kwargs):
     def callback_record(icon, item):
         # kwargs["callback"] = icon.update_menu   # NOTE: the thread will finish AFTER the callback is complete
         if transcriber.busy:
-            print("Still busy recording or transcribing.")
+            transcriber.log("Still busy recording or transcribing.")
             return
         if hasattr(icon, "_recording_thread") and icon._recording_thread.is_alive():
@@ -362,13 +371,15 @@ def main(args=None):
             transcriber = get_transcriber(o, prompt=o.prompt)
         print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
         show_output = ["clipboard", "keyboard", "output_file"]
-        show_options = ["ascii", "app"]
+        show_options = ["ascii", "restart_after_silence"]
         activated_output = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_output if getattr(o, option)]
         activated_options = [colored(option if type(getattr(o, option)) is bool else f'{option}={getattr(o, option)}', 'light_blue') for option in show_options if getattr(o, option)]
         if activated_output:
             print(f"Output: {' | '.join(activated_output)}")
         else:
             print(colored(f"No output selected -> terminal only", "light_red"))
+        if o.app:
+            print(colored("App mode enabled", "light_green"))
         if activated_options:
             print(f"Options: {' | '.join(activated_options)}")
         if o.prompt:
@@ -421,7 +432,7 @@ def main(args=None):
                 o.app = not o.app
                 continue
             if key == "a":
-                transcriber.restart_after_silence = not transcriber.restart_after_silence
+                o.restart_after_silence = transcriber.restart_after_silence = not transcriber.restart_after_silence
                 continue
             if key == "t":
                 ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
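In 0.10.0 the tray's Stop callback sets `transcriber.interrupt = True` instead of clearing `recording`, and the loop in `AbstractTranscriber.start_recording` now tests `while not self.interrupt`, presumably so that `recording` can remain a status flag. The toy class below is a hypothetical stand-in that keeps only this cooperative-stop pattern; the `running` event is an addition (not in scribe) that makes the demo deterministic:

```python
import threading
import time


class ToyTranscriber:
    """Hypothetical stand-in keeping only AbstractTranscriber's stop-flag handling."""

    def __init__(self):
        self.interrupt = False
        self.busy = False
        self.chunks = 0
        self.running = threading.Event()  # demo-only: signals the loop has started

    def start_recording(self):
        self.interrupt = False  # reset on entry, as 0.10.0's start_recording does
        self.busy = True
        try:
            while not self.interrupt:
                self.chunks += 1  # stands in for draining one block from microphone.q
                self.running.set()
                time.sleep(0.01)
        finally:
            self.busy = False


transcriber = ToyTranscriber()
worker = threading.Thread(target=transcriber.start_recording)
worker.start()
transcriber.running.wait()
# Equivalent of callback_stop_recording: raise the flag, then join the thread.
transcriber.interrupt = True
worker.join(timeout=1.0)
print(worker.is_alive(), transcriber.busy)  # False False
```

One property of this pattern: because the flag is reset on entry, a Stop that arrives before the worker reaches that reset would be overwritten. A `threading.Event` cleared by the caller before starting the thread is a more explicit alternative.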

{scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/install_desktop.py RENAMED

@@ -10,9 +10,17 @@ def main():
         sys.exit(0)
     parser = argparse.ArgumentParser("Install the desktop file for the scribe package. Any arguments to this script will be passed on to `scribe`.")
+    parser.add_argument("--name", help="The title of the desktop app", default="Scribe")
+    parser.add_argument("--startup-wm-class")
+    parser.add_argument("--no-terminal", action="store_false", dest="terminal", help="Don't show the terminal (goes in --app mode)")
     o, rest = parser.parse_known_args()
     o.arguments = rest
+    if not o.terminal and "--app" not in o.arguments:
+        o.arguments.append("--app")
+    if not o.terminal and "--no-prompt" not in o.arguments:
+        o.arguments.append("--no-prompt")
     SOURCE_SCRIBE_DATA = os.path.dirname(scribe_data.__file__)
     HOME = os.environ.get('HOME',os.path.expanduser('~'))
@@ -25,15 +33,18 @@ def main():
     with open(os.path.join(SOURCE_SCRIBE_DATA, 'templates', 'scribe.desktop')) as f:
         template = f.read()
+    simple_name = o.name.lower().replace(' ','-').replace(os.path.sep, '-')
     bin_folder = sysconfig.get_path("scripts")
     icon_folder = os.path.join(SOURCE_SCRIBE_DATA, 'share')
-    desktop_filecontent = template.format(icon_folder=icon_folder, bin_folder=bin_folder, options=' '.join(o.arguments) if o.arguments else '')
+    desktop_filecontent = template.format(icon_folder=icon_folder, bin_folder=bin_folder,
+                                          name=o.name, terminal=str(o.terminal).lower(),
+                                          StartupWMClass=o.startup_wm_class or f"crx_mpnasdanpmm_{simple_name}",
+                                          options=' ' + ' '.join(o.arguments) if o.arguments else '')
-    desktop_filepath = os.path.join(XDG_APP_DATA, 'scribe.desktop')
+    desktop_filepath = os.path.join(XDG_APP_DATA, f'{simple_name}.desktop')
     print("Writing GNOME desktop file:", desktop_filepath)
     with open(desktop_filepath, "w") as f:
         f.write(desktop_filecontent)
 if __name__ == "__main__":
     main()
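`install_desktop.py` now derives the `.desktop` file name and `StartupWMClass` from `--name`, and appends `--app`/`--no-prompt` when `--no-terminal` is given. The function below is a hypothetical, self-contained re-creation of that logic for previewing the output; the `bin_folder` and `icon_folder` defaults are placeholders (the script itself uses `sysconfig.get_path("scripts")` and the installed `scribe_data/share` folder). The template is copied from the added `scribe.desktop`:

```python
import os

TEMPLATE = """#!/usr/bin/env xdg-open
[Desktop Entry]
Terminal={terminal}
Type=Application
Name={name}
Exec={bin_folder}/scribe{options}
Icon={icon_folder}/icon.png
StartupWMClass={StartupWMClass}
"""


def render_desktop_file(name, arguments, terminal=True, bin_folder="/usr/local/bin",
                        icon_folder="/usr/share/scribe", startup_wm_class=None):
    """Return (filename, content) built the way install_desktop.main() does."""
    arguments = list(arguments)
    if not terminal:
        # --no-terminal implies tray-app mode without the interactive prompt.
        for flag in ("--app", "--no-prompt"):
            if flag not in arguments:
                arguments.append(flag)
    simple_name = name.lower().replace(" ", "-").replace(os.path.sep, "-")
    content = TEMPLATE.format(
        icon_folder=icon_folder,
        bin_folder=bin_folder,
        name=name,
        terminal=str(terminal).lower(),
        StartupWMClass=startup_wm_class or f"crx_mpnasdanpmm_{simple_name}",
        # The leading space separates the options from the executable path.
        options=(" " + " ".join(arguments)) if arguments else "",
    )
    return f"{simple_name}.desktop", content


filename, content = render_desktop_file(
    "Scribe Vosk FR", ["--backend", "vosk", "--language", "fr", "--keyboard", "--clipboard"],
    terminal=False)
print(filename)  # scribe-vosk-fr.desktop
```

The parentheses around `" " + ...` are only for readability: a conditional expression binds more loosely than `+`, so the unparenthesized form in the diff behaves the same. Arguments are joined with plain spaces, as in the diff, so an option value containing a space would be split in the `Exec=` line; the Desktop Entry specification expects such arguments to be double-quoted.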

{scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe/models.py RENAMED

@@ -22,7 +22,7 @@ class StopRecording(Exception):
 class AbstractTranscriber:
     backend = None
     def __init__(self, model, model_name=None, language=None, samplerate=16000, timeout=None, model_kwargs={},
-                 silence_thresh=-40, silence_duration=2, restart_after_silence=False):
+                 silence_thresh=-40, silence_duration=2, restart_after_silence=False, logger=None):
         self.model_name = model_name
         self.language = language
         self.model = model
@@ -35,6 +35,12 @@ class AbstractTranscriber:
         self.recording = False
         self.busy = False
         self.waiting = False
+        self.interrupt = False
+        if logger is None:
+            import logging
+            logging.basicConfig(level=logging.INFO)
+            logger = logging.getLogger("scribe")
+        self.logger = logger
         self.reset()
     def get_elapsed(self):
@@ -54,11 +60,21 @@ class AbstractTranscriber:
         self.audio_buffer = b''
         self.start_time = time.time()
+    def log(self, text):
+        if text.startswith("\n"):
+            print("")
+            text = text[1:]
+        if self.logger:
+            self.logger.info(text)
+        else:
+            print(f"[{text}]")
     def start_recording(self, microphone,
                         start_message="Recording... Press Ctrl+C to stop.",
-                        stop_message="Done transcribing."):
+                        stop_message="Exit."):
         self.reset()
+        self.interrupt = False
         self.recording = True
         self.waiting = True
         self.busy = True
@@ -71,9 +87,9 @@ class AbstractTranscriber:
         try:
             with microphone.open_stream():
-                print(start_message)
+                self.log(start_message)
-                while self.recording:
+                while not self.interrupt:
                     while not microphone.q.empty():
                         data = microphone.q.get()
@@ -105,7 +121,7 @@ class AbstractTranscriber:
                         else:
                             if not previous_waiting:
-                                print("Silence detected...waiting for more audio")
+                                self.log("Silence detected...waiting for more audio")
                         if self.is_overtime():
                             raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
@@ -123,7 +139,7 @@ class AbstractTranscriber:
             self.busy = False
             yield result
-        print(stop_message)
+        self.log(stop_message)
 def get_vosk_model(model, download_root=None, url=None):
@@ -198,7 +214,7 @@ class WhisperTranscriber(AbstractTranscriber):
         super().__init__(model, model_name, language, model_kwargs=model_kwargs, **kwargs)
     def transcribe_audio(self, audio_bytes):
-        print("\nTranscribing...")
+        self.log("\nTranscribing")
         audio_array = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
         return self.model.transcribe(audio_array, fp16=False, language=self.language)
@@ -207,4 +223,34 @@ class WhisperTranscriber(AbstractTranscriber):
             return {"text": ""}
         result = self.transcribe_audio(self.audio_buffer)
         self.audio_buffer = b''
-        return result
+        return result
+class OpenaiAPITranscriber(WhisperTranscriber):
+    backend = "openaiapi"
+    def __init__(self, model_name="whisper-1", language=None, model_kwargs={}, model=None, api_key=None, **kwargs):
+        if model is None:
+            import openai
+            model = openai.OpenAI(
+                api_key=api_key or openai.api_key,
+                # 20 seconds (default is 10 minutes)
+                timeout=20.0,
+            )
+        AbstractTranscriber.__init__(self, model, model_name, language, model_kwargs=model_kwargs, **kwargs)
+    def transcribe_audio(self, audio_bytes):
+        self.log("\nTranscribing")
+        import io
+        import soundfile as sf
+        audio_data = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
+        # Write the audio data to an in-memory file in WAV format
+        buffer = io.BytesIO()
+        sf.write(buffer, audio_data, self.samplerate, format='WAV')
+        buffer.seek(0)
+        buffer.name = "audio.wav"  # Set a filename with a valid extension
+        transcription = self.model.audio.transcriptions.create(
+            model=self.model_name,
+            file=buffer,
+        )
+        return {"text": transcription.text}
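`OpenaiAPITranscriber.transcribe_audio` normalizes the int16 buffer to float32 with NumPy, lets `soundfile` encode it as WAV in a `BytesIO`, and names the buffer `audio.wav` so the upload carries a valid extension. As a hedged alternative (a swapped-in technique, not what scribe does), the standard-library `wave` module can wrap the int16 bytes directly, avoiding both the float round trip and the `soundfile` dependency. `pcm16_to_wav_buffer` is a hypothetical helper, and the mono assumption is mine:

```python
import io
import wave
from array import array


def pcm16_to_wav_buffer(audio_bytes: bytes, samplerate: int = 16000) -> io.BytesIO:
    """Wrap raw mono 16-bit PCM bytes in an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)  # assumption: the microphone stream is mono
        wav.setsampwidth(2)  # int16 samples are 2 bytes wide
        wav.setframerate(samplerate)
        wav.writeframes(audio_bytes)
    # wave does not close a file object it did not open, so the buffer stays usable.
    buffer.seek(0)
    buffer.name = "audio.wav"  # same trick as the diff: a name with a valid extension
    return buffer


samples = array("h", [0, 16384, -32768, 32767])  # native-endian int16, like the mic bytes
audio_bytes = samples.tobytes()

# The diff's float conversion: int16 / 32768.0 maps samples into [-1.0, 1.0).
print([s / 32768.0 for s in samples])  # [0.0, 0.5, -1.0, 0.999969482421875]

wav_file = pcm16_to_wav_buffer(audio_bytes)
print(wav_file.getvalue()[:4])  # b'RIFF'
```

Since `soundfile` writes WAV as 16-bit PCM by default, both routes should carry essentially the same audio, although the float-to-integer rescaling may differ by a rounding step; the resulting buffer could then be passed as `file=` to `audio.transcriptions.create` just as in the diff.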

{scribe_cli-0.8.0 → scribe_cli-0.10.0/scribe_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.8.0
+Version: 0.10.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
 Provides-Extra: app
 Requires-Dist: pystray; extra == "app"
 Requires-Dist: PyGObject; extra == "app"
+Provides-Extra: openai
+Requires-Dist: openai; extra == "openai"
+Requires-Dist: soundfile; extra == "openai"
 Provides-Extra: all
 Requires-Dist: pynput; extra == "all"
 Requires-Dist: openai-whisper; extra == "all"
+Requires-Dist: openai; extra == "all"
+Requires-Dist: soundfile; extra == "all"
 Requires-Dist: vosk; extra == "all"
 Requires-Dist: pystray; extra == "all"
@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"
 # Scribe  <img src="scribe_data/share/icon.png" width=48px>
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client for the OpenAI API via the `openaiapi` backend (API key required).
 ## Compatibility
@@ -101,12 +108,10 @@ cd scribe
 pip install -e .[all]
 ```
-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave out the optional dependencies (drop `[all]`), but you must install at least one of the `vosk`, `openai-whisper` or `openai` packages (see Usage below); the `openai` backend also needs `soundfile`, and both come with the `[openai]` extra.
+The language models for the local backends `vosk` and `whisper` are downloaded on the fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 ## Usage
@@ -115,7 +120,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper`, `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone; the transcribed text is printed in real time (`vosk`) or once recording is complete (`whisper`, `openaiapi`).
 You can interrupt the recording via Ctrl + C and start again or change model.
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing sessions with the [keyboard](#virtual-keyboard-experimental) option, e.g. to take notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and a few of them are associated here with [a handful of languages](scribe/models.toml): `en`, `fr`, `it`, `de` (so far).
-To skip the initial selection menu you can do:
+The `openaiapi` backend uses the `whisper-1` model (at the time of writing) and requires an API key:
 ```bash
-scribe --backend whisper --model small --no-prompt
+scribe --backend openaiapi --api-key YOURAPIKEY
 ```
 Adding `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
@@ -190,7 +195,8 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will appear in the system tray with Record, Stop and Quit options, and it changes to reflect what the app is doing.
+For the `vosk` backend there are only two states: recording + transcribing, or idle. For the `whisper` backend the icon shows three states: recording, transcribing, and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 ```bash
@@ -201,15 +207,20 @@ pip install PyGObject
 ## Start as an application in GNOME
 If you run Ubuntu (or another distribution) with GNOME, the script `scribe-install [...]` will create a `.desktop` file (`scribe.desktop` by default) and place it under `$HOME/.local/share/applications`
-to make it available from the quick launch menu. Any option will be passed on to `scribe`.
+to make it available from the quick launch menu. All other options are passed on to `scribe`; `scribe-install` itself also accepts `--name` (the app title) and `--no-terminal`.
+`--no-terminal` hides the terminal and implies the options `--app --no-prompt`.
 e.g.
 ```bash
-scribe-install --backend whisper --model small --clipboard --keyboard --restart-after-silence
+scribe-install
+scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
+scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
 ```
-After that just typing Super + scri... at any time from any where will conveniently start the app in its own terminal with the prescribed options.
+This will install three separate apps:
+- `Super + scribe`: launches the default version with the terminal prompt.
+- `Super + whisper`: launches a preset version with the `small` model from `whisper` and starts recording right away. You can follow progress in the terminal, and the result is ready to paste from the clipboard.
+- `Super + vosk fr`: launches a preset version for real-time French transcription with the `vosk` backend, sending the output to both the clipboard and the keyboard without opening a terminal (press Record in the tray icon menu to start recording).
 ## Fine tuning

{scribe_cli-0.8.0 → scribe_cli-0.10.0}/scribe_cli.egg-info/requires.txt RENAMED

@@ -9,6 +9,8 @@ termcolor
 [all]
 pynput
 openai-whisper
+openai
+soundfile
 vosk
 pystray
@@ -19,6 +21,10 @@ PyGObject
 [keyboard]
 pynput
+[openai]
+openai
+soundfile
 [vosk]
 vosk

scribe_cli-0.10.0/scribe_data/templates/scribe.desktop ADDED

@@ -0,0 +1,8 @@
+#!/usr/bin/env xdg-open
+[Desktop Entry]
+Terminal={terminal}
+Type=Application
+Name={name}
+Exec={bin_folder}/scribe{options}
+Icon={icon_folder}/icon.png
+StartupWMClass={StartupWMClass}

scribe_cli-0.8.0/scribe_data/templates/scribe.desktop DELETED

@@ -1,8 +0,0 @@
-#!/usr/bin/env xdg-open
-[Desktop Entry]
-Terminal=true
-Type=Application
-Name=Scribe
-Exec={bin_folder}/scribe{options}
-Icon={icon_folder}/icon.jpg
-StartupWMClass=crx_mpnasdanpmmopoasdjdcgaaiekailkhb