PyPI - scribe-cli - Version diffs - 0.9.0.tar.gz → 0.11.0.tar.gz - Mend

scribe-cli 0.9.0.tar.gz → 0.11.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

{scribe_cli-0.9.0/scribe_cli.egg-info → scribe_cli-0.11.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.9.0
+Version: 0.11.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
 Provides-Extra: app
 Requires-Dist: pystray; extra == "app"
 Requires-Dist: PyGObject; extra == "app"
+Provides-Extra: openai
+Requires-Dist: openai; extra == "openai"
+Requires-Dist: soundfile; extra == "openai"
 Provides-Extra: all
 Requires-Dist: pynput; extra == "all"
 Requires-Dist: openai-whisper; extra == "all"
+Requires-Dist: openai; extra == "all"
+Requires-Dist: soundfile; extra == "all"
 Requires-Dist: vosk; extra == "all"
 Requires-Dist: pystray; extra == "all"
@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"
 # Scribe  <img src="scribe_data/share/icon.png" width=48px>
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).
 ## Compatibility
@@ -101,12 +108,10 @@ cd scribe
 pip install -e .[all]
 ```
-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
+The language models for local backends `vosk` and `whisper` will download on-the-fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 ## Usage
@@ -115,7 +120,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
 You can interrupt the recording via Ctrl + C and start again or change model.
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
-To skip the initial selection menu you can do:
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
 ```bash
-scribe --backend whisper --model small --no-prompt
+scribe --backend openaiapi --api YOURAPIKEY
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
@@ -190,7 +195,9 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
+of predefined models, or to Quit and choose from the terminal before pressing Enter again.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 ```bash
@@ -204,17 +211,19 @@ If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will
 to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
 `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
-e.g.
+In a relatively basic form
+```bash
+scribe-install --clipboard  --api YOUROPENAIAPIKEY
+```
+(`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
+And to make an app running outside the terminal:
 ```bash
-scribe-install
-scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
-scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
+scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal  --api YOUROPENAIAPIKEY
 ```
-This will install three separate apps:
-- `Super + scribe` : will launch the default version with terminal prompt
-- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
-- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
+This will install two separate apps (names "Scribe" and "Scribe App")
 ## Fine tuning

{scribe_cli-0.9.0 → scribe_cli-0.11.0}/README.md RENAMED

@@ -3,7 +3,9 @@
 # Scribe  <img src="scribe_data/share/icon.png" width=48px>
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client to open AI via `openaiapi` backend (API key required).
 ## Compatibility
@@ -38,12 +40,10 @@ cd scribe
 pip install -e .[all]
 ```
-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).
+The language models for local backends `vosk` and `whisper` will download on-the-fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 ## Usage
@@ -52,7 +52,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)
 or until after recording is complete (`whisper`).
 You can interrupt the recording via Ctrl + C and start again or change model.
@@ -66,9 +66,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated to [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).
-To skip the initial selection menu you can do:
+The `openaiapi` backend uses `whisper-1` model at the time of writing. It requires an API key
 ```bash
-scribe --backend whisper --model small --no-prompt
+scribe --backend openaiapi --api YOURAPIKEY
 ```
 where `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
@@ -127,7 +127,9 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. It is possible to choose from a set
+of predefined models, or to Quit and choose from the terminal before pressing Enter again.
+For the vosk model, there are only two states : recording + transcribing or Idle. For the whisper model there are three states visible from the icon: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 ```bash
@@ -141,17 +143,19 @@ If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will
 to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
 `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
-e.g.
+In a relatively basic form
+```bash
+scribe-install --clipboard  --api YOUROPENAIAPIKEY
+```
+(`--api` is optional and only useful if you plan to use `openaiapi` backend later on)
+And to make an app running outside the terminal:
 ```bash
-scribe-install
-scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
-scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
+scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal  --api YOUROPENAIAPIKEY
 ```
-This will install three separate apps:
-- `Super + scribe` : will launch the default version with terminal prompt
-- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
-- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
+This will install two separate apps (names "Scribe" and "Scribe App")
 ## Fine tuning
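The README examples pass the key as `--api YOURAPIKEY`, while `scribe/app.py` registers the option as `--api-key`. The shorter spelling works because `argparse` accepts any unambiguous prefix of a long option by default (`allow_abbrev=True`). Below is a minimal sketch, assuming a parser reduced to two options. The real `get_parser()` defines many more.

```python
import argparse

# Reduced stand-in for scribe's get_parser(): only two of its options are modelled here.
parser = argparse.ArgumentParser(prog="scribe")
parser.add_argument("--app", action="store_true")
group = parser.add_argument_group("whisper api")
group.add_argument("--api-key", help="API key for the Whisper API backend.")

# "--api" is an unambiguous prefix of "--api-key", so argparse expands it
# (allow_abbrev defaults to True) and stores the value under dest "api_key".
args = parser.parse_args(["--api", "YOURAPIKEY"])
print(args.api_key)
```

If a later release added another long option beginning with `--api`, the abbreviation would become ambiguous and `argparse` would exit with an error. The full `--api-key` spelling is therefore the more robust choice in scripts and `scribe-install` launchers.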

{scribe_cli-0.9.0 → scribe_cli-0.11.0}/pyproject.toml RENAMED

@@ -44,7 +44,8 @@ keyboard = ["pynput"]
 whisper = ["openai-whisper"]
 vosk = ["vosk"]
 app = ["pystray", "PyGObject"]
-all = ["pynput", "openai-whisper", "vosk", "pystray"]
+openai = ["openai", "soundfile"]
+all = ["pynput", "openai-whisper", "openai", "soundfile", "vosk", "pystray"]
 [tool.setuptools]
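The new `openai` extra installs two packages: `openai`, the API client, and `soundfile`, which encodes the recorded audio as WAV before upload. Version 0.11.0 also removes the `check_dependencies` import and calls from `app.py`. As far as this diff shows, a missing package is therefore no longer detected before a backend is selected.

The following is a minimal sketch of such an up-front check. `available_backends` and `BACKEND_MODULES` are hypothetical names and are not part of scribe. The mapping assumes these import names: `whisper` (for the `openai-whisper` distribution), `vosk`, `openai` and `soundfile`.

```python
import importlib.util

# Hypothetical mapping (not part of scribe): backend name -> top-level modules it needs,
# following the extras in pyproject.toml. "openai-whisper" is imported as "whisper".
BACKEND_MODULES = {
    "whisper": ["whisper"],
    "vosk": ["vosk"],
    "openaiapi": ["openai", "soundfile"],
}


def available_backends(modules=BACKEND_MODULES):
    """Return the backends whose required top-level modules are importable."""
    return [
        backend
        for backend, names in modules.items()
        # find_spec returns None for a top-level module that cannot be found
        if all(importlib.util.find_spec(name) is not None for name in names)
    ]


print(available_backends())
```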

{scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/_version.py RENAMED

@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
-__version__ = version = '0.9.0'
-__version_tuple__ = version_tuple = (0, 9, 0)
+__version__ = version = '0.11.0'
+__version_tuple__ = version_tuple = (0, 11, 0)

{scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/app.py RENAMED

@@ -4,8 +4,8 @@ import re
 import time
 import argparse
 from scribe.audio import Microphone
-from scribe.util import print_partial, clear_line, prompt_choices, check_dependencies, ansi_link, colored
-from scribe.models import VoskTranscriber, WhisperTranscriber
+from scribe.util import print_partial, clear_line, prompt_choices, ansi_link, colored
+from scribe.models import VoskTranscriber, WhisperTranscriber, OpenaiAPITranscriber
 with open(Path(__file__).parent / "models.toml", "rb") as f:
     language_config_default = tomllib.load(f)
@@ -24,7 +24,7 @@ def get_default_backend():
         except ImportError:
             raise ImportError("Please install either vosk or whisper to use this script.")
-BACKENDS = ["whisper", "vosk"]
+BACKENDS = ["whisper", "vosk", "openaiapi"]
 UNAVAILABLE_BACKENDS = []
@@ -55,57 +55,54 @@ class DummyTranscriber:
     def __getattr__(self, item):
         return None
-def get_transcriber(o, prompt=True):
+whisper_models = ["tiny", "base", "small", "medium", "large", "turbo"]
+whisper_english_models = ["tiny.en", "base.en", "small.en", "medium.en"]
+whisperapi_models = ["whisper-1"]
+vosk_models = [language_config["vosk"][lang]["model"] for lang in language_config["vosk"]]
-    whisper_models = ["tiny", "base", "small", "medium", "large", "turbo"]
-    whisper_english_models = ["tiny.en", "base.en", "small.en", "medium.en"]
-    if o.dummy:
+def get_transcriber(model=None, backend=None, dummy=False, prompt=True, language=None,
+                    samplerate=None, duration=None, silence=None, silence_db=None, restart_after_silence=None,
+                    api_key=None,
+                    download_folder_vosk=None, download_folder_whisper=None, **kwargs):
+    if dummy:
         return DummyTranscriber("whisper", "dummy")
-    if o.model and not o.backend:
-        if o.model.startswith("vosk-"):
-            o.backend = "vosk"
-        elif o.model in whisper_models + whisper_english_models:
-            o.backend = "whisper"
+    if model and not backend:
+        if model.startswith("vosk-"):
+            backend = "vosk"
+        elif model in whisper_models + whisper_english_models:
+            backend = "whisper"
+        elif model in whisperapi_models:
+            backend = "openaiapi"
-    if o.backend:
-        checked_backend = check_dependencies(o.backend)
-        if not checked_backend:
-            print(f"Backend {o.backend} is not available.")
-            exit(1)
-        backend = o.backend
+    if backend:
+        backend = backend
     elif not prompt:
         backend = BACKENDS[0]
     else:
-        checked_backend = False
-        while not checked_backend:
-            backend = prompt_choices(BACKENDS, o.backend, "backend", UNAVAILABLE_BACKENDS)
-            # raise an error if the user has explicitly selected a backend that is not available
-            checked_backend = check_dependencies(backend, raise_error=backend==o.backend)
-            if not checked_backend:
-                print(f"Backend {o.backend} is not available.")
-                UNAVAILABLE_BACKENDS.append(backend)
+        backend = prompt_choices(BACKENDS, backend, "backend", UNAVAILABLE_BACKENDS)
     print(f"Selected backend: {backend}")
-    if o.model:
-        model = pick_specialist_model(o.model, o.language, backend)
+    if model:
+        model = pick_specialist_model(model, language, backend)
     else:
         if backend == "vosk":
             available_languages = list(language_config[backend])
-            if o.language:
-                if o.language not in available_languages:
-                    print(f"Language '{o.language}' is not pre-defined (yet) for backend '{backend}'.")
+            if language:
+                if language not in available_languages:
+                    print(f"Language '{language}' is not pre-defined (yet) for backend '{backend}'.")
                     print(f"Yet it may actually exist.")
                     print(f"Please choose the model explictly from {ansi_link('https://alphacephei.com/vosk/models')}.")
                     print(f"Or pick one of the pre-defined languages: ", " ".join(available_languages))
                     exit(1)
-                choices = [language_config[backend][o.language]["model"]]
+                choices = [language_config[backend][language]["model"]]
                 default_model = choices[0] # this is a string
             else:
@@ -129,28 +126,41 @@ def get_transcriber(o, prompt=True):
             else:
                 model = default_model
-            model = pick_specialist_model(model, o.language, backend)
+            model = pick_specialist_model(model, language, backend)
+        elif backend == "openaiapi":
+            model = model or "whisper-1"
+        else:
+            raise ValueError(f"Unknown backend: {backend}")
     print(f"Selected model: {model}")
     if backend == "vosk":
         try:
             transcriber = VoskTranscriber(model_name=model,
-                                        language=o.language,
-                                        samplerate=o.samplerate,
+                                        language=language,
+                                        samplerate=samplerate,
                                         timeout=None, # vosk keeps going (no timeout)
                                         silence_duration=None, # vosk handles silences internally
-                                        model_kwargs={"download_root": o.download_folder_vosk})
+                                        model_kwargs={"download_root": download_folder_vosk})
         except Exception as error:
             print(error)
             print(f"Failed to (down)load model {model}.")
             exit(1)
     elif backend == "whisper":
-        transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
-                                         timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
-                                         restart_after_silence=o.restart_after_silence,
-                                         model_kwargs={"download_root": o.download_folder_whisper})
+        transcriber = WhisperTranscriber(model_name=model, language=language, samplerate=samplerate,
+                                         timeout=duration, silence_duration=silence, silence_thresh=silence_db,
+                                         restart_after_silence=restart_after_silence,
+                                         model_kwargs={"download_root": download_folder_whisper})
+    elif backend == "openaiapi":
+        transcriber = OpenaiAPITranscriber(model_name=model, samplerate=samplerate,
+                                         timeout=duration, silence_duration=silence, silence_thresh=silence_db,
+                                         restart_after_silence=restart_after_silence, api_key=api_key)
     else:
         raise ValueError(f"Unknown backend: {backend}")
@@ -195,6 +205,10 @@ def get_parser():
     group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in decibel (default %(default)s db)")
     group.add_argument("-a", "--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
+    group = parser.add_argument_group("whisper api")
+    group.add_argument("--api-key",
+                        help="API key for the Whisper API backend.")
     parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
     parser.add_argument("--download-folder-whisper", help="Folder to store Whisper models.")
@@ -206,11 +220,11 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
     if keyboard:
         from scribe.keyboard import type_text
-        print("\nChange focus to target app during transcription.")
+        transcriber.log("Change focus to target app during transcription.")
     if clipboard:
         import pyperclip
-        print("\nThe full transcription will be copied to clipboard as it becomes available.")
+        transcriber.log("The full transcription will be copied to clipboard as it becomes available.")
     fulltext = ""
@@ -237,7 +251,7 @@ def start_recording(micro, transcriber, clipboard=True, keyboard=False, latency=
         callback()
-def create_app(micro, transcriber, **kwargs):
+def create_app(micro, transcriber, other_transcribers=None, **kwargs):
     import pystray
     from pystray import Menu as pystrayMenu, MenuItem as Item
     from PIL import Image
@@ -257,6 +271,7 @@ def create_app(micro, transcriber, **kwargs):
         image_recording = Image.alpha_composite(image_recording.convert("RGBA"), image_writing.convert("RGBA"))
     def update_icon(icon, force=False):
+        transcriber = icon._transcriber
         if transcriber.recording and transcriber.waiting:
             # this is the situation with the whisper backend when the microphone is recording
             # but we wait for the speaker to speak (silence)
@@ -284,6 +299,7 @@ def create_app(micro, transcriber, **kwargs):
                 icon.update_menu()
     def start_monitoring(icon):
+        transcriber = icon._transcriber
         try:
             while transcriber.busy:
                 update_icon(icon)
@@ -299,8 +315,8 @@ def create_app(micro, transcriber, **kwargs):
         icon.stop()
     def callback_stop_recording(icon, item):
+        transcriber = icon._transcriber
         # Here we need to stop the recording thread
         transcriber.interrupt = True
         if hasattr(icon, "_recording_thread"):
             icon._recording_thread.join()
@@ -308,9 +324,9 @@ def create_app(micro, transcriber, **kwargs):
             icon._monitoring_thread.join()
     def callback_record(icon, item):
-        # kwargs["callback"] = icon.update_menu   # NOTE: the thread will finish AFTER the callback is complete
+        transcriber = icon._transcriber
         if transcriber.busy:
-            print("Still busy recording or transcribing.")
+            transcriber.log("Still busy recording or transcribing.")
             return
         if hasattr(icon, "_recording_thread") and icon._recording_thread.is_alive():
@@ -325,22 +341,67 @@ def create_app(micro, transcriber, **kwargs):
         icon._monitoring_thread = threading.Thread(target=start_monitoring, args=(icon,))
         icon._monitoring_thread.start()
+    if other_transcribers:
+        other_transcribers_dict = {meta["model"]: meta for meta in other_transcribers}
+    else:
+        other_transcribers_dict = {}
+    def callback_set_model(icon, item):
+        transcriber = icon._transcriber
+        callback_stop_recording(icon, item)
+        model_name = str(item)
+        meta = other_transcribers_dict[model_name]
+        icon._transcriber = transcriber = get_transcriber(**meta)
+        icon.title = f"scribe :: {transcriber.backend} :: {transcriber.model_name}"
+        print("Set", transcriber.backend, transcriber.model_name)
+        # icon.menu.items[0].__name__ = f"Record [{str(item)}]"
+        icon._model_selection = False
+        icon.update_menu()
+        icon.notify(f"Set {transcriber.backend} {transcriber.model_name}")
+    def callback_info(icon, item):
+        transcriber = icon._transcriber
+        # icon.notify(f"scribe {transcriber.backend} {transcriber.model_name}")
+        title = f"""{transcriber.backend} :: {transcriber.model_name}"""
+        info = [name for name in kwargs if isinstance(kwargs[name], bool) and kwargs[name]]
+        icon.notify(" | ".join(info), title=title)
+    def callback_toggle_option(icon, item):
+        kwargs[str(item)] = not kwargs[str(item)]
+        callback_info(icon, item)
+    def is_model_selection(item):
+        return icon._model_selection
     def is_recording(item):
-        return transcriber.busy
+        return icon._transcriber.busy
     def is_not_recording(item):
-        return not is_recording(item)
+        return not is_recording(item) and not is_model_selection(item)
+    modeltitle = f"{transcriber.backend} :: {transcriber.model_name}"
+    title = f"scribe :: {modeltitle}"
-    # Create a menu
-    menu = pystrayMenu(
-        Item("Record", callback_record, visible=is_not_recording),
-        Item("Stop", callback_stop_recording, visible=is_recording),
-        Item('Quit', callback_quit),
+    menus = []
+    menus.append(Item(f"Record" if len(other_transcribers_dict) <= 1 else f"Record", callback_record, visible=is_not_recording))
+    menus.append(Item("Stop", callback_stop_recording, visible=is_recording))
+    menus.append(Item("Choose Model", pystrayMenu(
+        *(Item(f"{name}", callback_set_model) for name in other_transcribers_dict)))
     )
+    menus.append(Item("Toggle Options", pystrayMenu(
+        *(Item(f"{name}", callback_toggle_option) for name in kwargs if isinstance(kwargs[name], bool))))
+    )
+    menus.append(Item(f"Info", callback_info))
+    menus.append(Item('Quit', callback_quit))
+    # Create a menu
+    menu = pystrayMenu(*menus)
     # Create the system tray icon
-    icon = pystray.Icon('scribe', image, "scribe", menu)
+    icon = pystray.Icon('scribe', image, title, menu)
+    icon._model_selection = False
+    icon._transcriber = transcriber
+    del transcriber
     return icon
@@ -359,7 +420,7 @@ def main(args=None):
     while True:
         if transcriber is None:
-            transcriber = get_transcriber(o, prompt=o.prompt)
+            transcriber = get_transcriber(**vars(o))
         print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
         show_output = ["clipboard", "keyboard", "output_file"]
         show_options = ["ascii", "restart_after_silence"]
@@ -473,7 +534,12 @@ def main(args=None):
             greetings = dict(
                 start_message = "Listening... Use the try icon menu to stop.",
             )
-            app = create_app(micro, transcriber, clipboard=o.clipboard, output_file=o.output_file,
+            app = create_app(micro, transcriber, other_transcribers=[
+                {**vars(o), "backend": "openaiapi", "model": "whisper-1"},
+                *[{**vars(o), "backend": "whisper", "model": model} for model in whisper_models],
+                *[{**vars(o), "backend": "vosk", "model": model} for model in vosk_models]],
+                             clipboard=o.clipboard, output_file=o.output_file,
                              keyboard=o.keyboard, latency=o.latency, ascii=o.ascii, **greetings)
             print("Starting app...")
             app.run()
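In 0.11.0, `get_transcriber()` takes keyword arguments and is called as `get_transcriber(**vars(o))`. When only `--model` is given, it infers the backend from the model name. Below is a standalone sketch of that branch, with the model lists copied from the diff. `infer_backend` is a hypothetical name used for illustration.

In the real function, an unrecognised name leaves `backend` unset. Scribe then prompts the user or, with `prompt=False`, falls back to `BACKENDS[0]` (`"whisper"`).

```python
# Model lists copied from scribe/app.py (0.11.0, module level).
whisper_models = ["tiny", "base", "small", "medium", "large", "turbo"]
whisper_english_models = ["tiny.en", "base.en", "small.en", "medium.en"]
whisperapi_models = ["whisper-1"]


def infer_backend(model=None, backend=None):
    """Hypothetical mirror of the first branch of get_transcriber():
    keep an explicit backend, otherwise guess it from the model name (None if unknown)."""
    if backend or not model:
        return backend
    if model.startswith("vosk-"):
        return "vosk"
    if model in whisper_models + whisper_english_models:
        return "whisper"
    if model in whisperapi_models:
        return "openaiapi"
    return None
```

The tray menu built in `main()` lists only `whisper_models`. The English-only `*.en` variants can still be chosen with `--model`, but they do not appear under Choose Model.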

{scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe/models.py RENAMED

@@ -22,7 +22,7 @@ class StopRecording(Exception):
 class AbstractTranscriber:
     backend = None
     def __init__(self, model, model_name=None, language=None, samplerate=16000, timeout=None, model_kwargs={},
-                 silence_thresh=-40, silence_duration=2, restart_after_silence=False):
+                 silence_thresh=-40, silence_duration=2, restart_after_silence=False, logger=None):
         self.model_name = model_name
         self.language = language
         self.model = model
@@ -36,6 +36,11 @@ class AbstractTranscriber:
         self.busy = False
         self.waiting = False
         self.interrupt = False
+        if logger is None:
+            import logging
+            logging.basicConfig(level=logging.INFO)
+            logger = logging.getLogger("scribe")
+        self.logger = logger
         self.reset()
     def get_elapsed(self):
@@ -55,9 +60,18 @@ class AbstractTranscriber:
         self.audio_buffer = b''
         self.start_time = time.time()
+    def log(self, text):
+        if text.startswith("\n"):
+            print("")
+            text = text[1:]
+        if self.logger:
+            self.logger.info(text)
+        else:
+            print(f"[{text}]")
     def start_recording(self, microphone,
                         start_message="Recording... Press Ctrl+C to stop.",
-                        stop_message="Done transcribing."):
+                        stop_message="Exit."):
         self.reset()
         self.interrupt = False
@@ -73,7 +87,7 @@ class AbstractTranscriber:
         try:
             with microphone.open_stream():
-                print(start_message)
+                self.log(start_message)
                 while not self.interrupt:
                     while not microphone.q.empty():
@@ -107,7 +121,7 @@ class AbstractTranscriber:
                         else:
                             if not previous_waiting:
-                                print("Silence detected...waiting for more audio")
+                                self.log("Silence detected...waiting for more audio")
                         if self.is_overtime():
                             raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
@@ -125,7 +139,7 @@ class AbstractTranscriber:
             self.busy = False
             yield result
-        print(stop_message)
+        self.log(stop_message)
 def get_vosk_model(model, download_root=None, url=None):
@@ -200,7 +214,7 @@ class WhisperTranscriber(AbstractTranscriber):
         super().__init__(model, model_name, language, model_kwargs=model_kwargs, **kwargs)
     def transcribe_audio(self, audio_bytes):
-        print("\nTranscribing...")
+        self.log("\nTranscribing")
         audio_array = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
         return self.model.transcribe(audio_array, fp16=False, language=self.language)
@@ -209,4 +223,39 @@ class WhisperTranscriber(AbstractTranscriber):
             return {"text": ""}
         result = self.transcribe_audio(self.audio_buffer)
         self.audio_buffer = b''
-        return result
+        return result
+class OpenaiAPITranscriber(WhisperTranscriber):
+    backend = "openaiapi"
+    def __init__(self, model_name="whisper-1", language=None, model_kwargs={}, model=None, api_key=None, **kwargs):
+        if model is None:
+            import openai
+            model = openai.OpenAI(
+                api_key=api_key or openai.api_key,
+                # 20 seconds (default is 10 minutes)
+                timeout=20.0,
+            )
+        AbstractTranscriber.__init__(self, model, model_name, language, model_kwargs=model_kwargs, **kwargs)
+    def transcribe_audio(self, audio_bytes):
+        self.log("\nTranscribing")
+        import io
+        import openai
+        import soundfile as sf
+        audio_data = np.frombuffer(audio_bytes, dtype=np.int16).flatten().astype(np.float32) / 32768.0
+        # Write the audio data to an in-memory file in WAV format
+        buffer = io.BytesIO()
+        sf.write(buffer, audio_data, self.samplerate, format='WAV')
+        buffer.seek(0)
+        buffer.name = "audio.wav"  # Set a filename with a valid extension
+        try:
+            transcription = self.model.audio.transcriptions.create(
+                model=self.model_name,
+                file=buffer,
+            )
+        except openai.BadRequestError as e:
+            self.log(f"Error: {e}")
+            return {"text": ""}
+        return {"text": transcription.text}
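The `transcribe_audio` override above turns the int16 microphone bytes into float32 samples in [-1, 1), then has `soundfile` encode them as an in-memory WAV and sets `buffer.name` so the upload carries a `.wav` filename. Because `soundfile`'s default WAV subtype is `PCM_16`, the float detour ends up as 16-bit samples again. Here is a minimal sketch of the same idea without `numpy` or `soundfile`. It uses the standard-library `wave` module and assumes native little-endian, mono int16 input. `pcm16_to_wav_buffer` is a hypothetical helper, not part of scribe.

```python
import io
import wave


def pcm16_to_wav_buffer(audio_bytes: bytes, samplerate: int, channels: int = 1) -> io.BytesIO:
    """Wrap raw 16-bit PCM bytes in an in-memory WAV file ready for upload.

    Hypothetical helper: assumes little-endian int16 samples and mono audio
    unless `channels` says otherwise.
    """
    buffer = io.BytesIO()
    # wave.open() on a file object does not close that object when the writer closes
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(samplerate)
        wav.writeframes(audio_bytes)  # writes the RIFF header and the sample data
    buffer.seek(0)  # rewind so the upload reads from the start
    buffer.name = "audio.wav"  # filename with a valid extension, as in the diff
    return buffer
```

With an `openai.OpenAI` client, the result could then be passed as `file=pcm16_to_wav_buffer(audio_bytes, samplerate)` to `client.audio.transcriptions.create(model="whisper-1", ...)`. The sample rate must match the rate the audio was recorded at.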

{scribe_cli-0.9.0 → scribe_cli-0.11.0/scribe_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.9.0
+Version: 0.11.0
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -55,9 +55,14 @@ Requires-Dist: vosk; extra == "vosk"
 Provides-Extra: app
 Requires-Dist: pystray; extra == "app"
 Requires-Dist: PyGObject; extra == "app"
+Provides-Extra: openai
+Requires-Dist: openai; extra == "openai"
+Requires-Dist: soundfile; extra == "openai"
 Provides-Extra: all
 Requires-Dist: pynput; extra == "all"
 Requires-Dist: openai-whisper; extra == "all"
+Requires-Dist: openai; extra == "all"
+Requires-Dist: soundfile; extra == "all"
 Requires-Dist: vosk; extra == "all"
 Requires-Dist: pystray; extra == "all"
@@ -66,7 +71,9 @@ Requires-Dist: pystray; extra == "all"
 # Scribe  <img src="scribe_data/share/icon.png" width=48px>
-`scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
+`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.
+It features local, downloadable models with the `vosk` and `whisper` backends, as well as a client for the OpenAI API via the `openaiapi` backend (API key required).
 ## Compatibility
@@ -101,12 +108,10 @@ cd scribe
 pip install -e .[all]
 ```
-You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
-The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
-the default is left to the `openai-whisper` package and might change in the future).
+You can leave out the optional dependencies (omit `[all]`), but you must install at least one of the `vosk`, `openai-whisper` or `openai` packages (see Usage below); the `openaiapi` backend also needs `soundfile` (both come with the `[openai]` extra).
+The language models for the local backends `vosk` and `whisper` are downloaded on the fly.
+The default download folder is `$XDG_CACHE_HOME/{backend}`, where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.
 ## Usage
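The default download folder described in the hunk above follows the XDG convention. Below is a minimal sketch of how such a path could be computed. `default_download_root` is a hypothetical name, and scribe's own code may differ in details such as how it treats an empty `XDG_CACHE_HOME`.

```python
import os
from pathlib import Path
from typing import Mapping


def default_download_root(backend: str, environ: Mapping[str, str] = os.environ) -> Path:
    """Return $XDG_CACHE_HOME/<backend>, with $XDG_CACHE_HOME defaulting to $HOME/.cache."""
    cache_home = environ.get("XDG_CACHE_HOME")
    if not cache_home:
        # Unset or empty: fall back to ~/.cache, as the XDG base directory spec suggests
        home = environ.get("HOME")
        base = Path(home) if home else Path.home()
        cache_home = base / ".cache"
    return Path(cache_home) / backend
```

For example, with `XDG_CACHE_HOME` unset and `HOME=/home/alice`, `vosk` models would land in `/home/alice/.cache/vosk`.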
@@ -115,7 +120,7 @@ Just type in the terminal:
 ```bash
 scribe
 ```
-and the script will guide you through the choice of backend (`whisper` or `vosk`) and the specific language model.
+and the script will guide you through the choice of backend (`whisper`, `vosk` or `openaiapi`) and the specific language model.
 After this, you will be prompted to start recording from your microphone, and the transcribed text will be printed in real time (`vosk`)
 or once recording is complete (`whisper`).
 You can interrupt the recording via Ctrl + C and start again or change model.
@@ -129,9 +134,9 @@ The `vosk` backend is much faster and very good at doing real-time transcription
 It becomes really powerful when used for longer or interactive typing sessions with the [keyboard](#virtual-keyboard-experimental) option, e.g. to take notes or chat with an AI.
 There are many [vosk models](https://alphacephei.com/vosk/models) available, and a few of them are associated here with [a handful of languages](scribe/models.toml): `en`, `fr`, `it`, `de` (so far).
-To skip the initial selection menu you can do:
+The `openaiapi` backend uses the `whisper-1` model (at the time of writing) and requires an OpenAI API key, passed with `--api`:
 ```bash
-scribe --backend whisper --model small --no-prompt
+scribe --backend openaiapi --api YOURAPIKEY
 ```
 Adding `--no-prompt` jumps right to the recording (after the first interruption, you can still choose to change the backend and model).
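The `OpenaiAPITranscriber` in the source diff passes `api_key or openai.api_key` to `openai.OpenAI(...)`. When both are empty, the `openai` client (v1.x) falls back to the `OPENAI_API_KEY` environment variable. Below is a minimal sketch of that key precedence, using the same 20-second timeout as the diff. Both helper names are hypothetical. The early `ValueError` is an assumption; scribe itself leaves that failure to the `openai` client.

```python
import os
from typing import Mapping, Optional


def resolve_openai_api_key(cli_key: Optional[str], environ: Mapping[str, str] = os.environ) -> str:
    """Return the --api value if given, else $OPENAI_API_KEY (hypothetical helper)."""
    key = cli_key or environ.get("OPENAI_API_KEY")
    if not key:
        # Assumption: fail early with a readable message instead of inside the client
        raise ValueError("the openaiapi backend needs --api or OPENAI_API_KEY")
    return key


def make_openai_client(api_key: str, timeout: float = 20.0):
    """Create an openai>=1.0 client with a short timeout (the library default is 10 minutes)."""
    import openai  # lazy import: other backends keep working without the package

    return openai.OpenAI(api_key=api_key, timeout=timeout)
```

Importing `openai` inside the function mirrors the diff, which also defers the import until the `openaiapi` backend is actually used.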
@@ -190,7 +195,9 @@ To activate start with:
 ```bash
 scribe --app
 ```
-or toggle the app option in the interactive menu. The scribe icon will show, with Record or Quit options.
+or toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon changes to reflect what the app is doing. From the menu you can choose from a set
+of predefined models, or Quit and choose from the terminal before pressing Enter again.
+With the `vosk` backend there are only two states: recording + transcribing, or idle. With the `whisper` backend the icon shows three states: recording, transcribing and idle/waiting.
 That option requires `pystray` to be installed. This is included with the `pip install ...[all]` option. In Ubuntu the following dependencies were required to make the menus appear:
 ```bash
@@ -204,17 +211,19 @@ If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will
 to make it available from the quick launch menu. Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.
 `--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.
-e.g.
+In its most basic form:
+```bash
+scribe-install --clipboard --api YOUROPENAIAPIKEY
+```
+(`--api` is optional and only useful if you plan to use the `openaiapi` backend later on.)
+To install an app that runs outside the terminal:
 ```bash
-scribe-install
-scribe-install --name "Scribe Whisper" --backend whisper --model small --clipboard --restart-after-silence --no-prompt
-scribe-install --name "Scribe Vosk FR" --backend vosk --language fr --keyboard --clipboard --no-terminal
+scribe-install --backend openaiapi --name "Scribe App" --keyboard --clipboard --app --no-prompt --no-terminal --api YOUROPENAIAPIKEY
 ```
-This will install three separate apps:
-- `Super + scribe` : will launch the default version with terminal prompt
-- `Super + whisper` : will launch a present version with the `small` model from `whisper` and start recording right away. You can see what is going on in the terminal and the result is ready to paste from the clipboard
-- `Super + vosk fr` : will launch a preset version for real-time transcription in French with the `vosk` backend, and throughput to the clipboard and the keyboard, not even opening a terminal.
+This will install two separate apps (named "Scribe" and "Scribe App").
 ## Fine tuning

{scribe_cli-0.9.0 → scribe_cli-0.11.0}/scribe_cli.egg-info/requires.txt RENAMED

@@ -9,6 +9,8 @@ termcolor
 [all]
 pynput
 openai-whisper
+openai
+soundfile
 vosk
 pystray
@@ -19,6 +21,10 @@ PyGObject
 [keyboard]
 pynput
+[openai]
+openai
+soundfile
 [vosk]
 vosk
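At least one backend's dependencies must be installed, so a launcher could check which backends are usable before offering them. The mapping below mirrors the packages listed in `requires.txt` above. Note that the `openai-whisper` distribution is imported as `whisper`. `BACKEND_MODULES` and `available_backends` are hypothetical names, not scribe's API.

```python
import importlib.util
from typing import Callable, Dict, List, Optional

# Import names each backend needs (hypothetical mapping based on the extras above)
BACKEND_MODULES: Dict[str, List[str]] = {
    "vosk": ["vosk"],
    "whisper": ["whisper"],  # provided by the openai-whisper package
    "openaiapi": ["openai", "soundfile"],  # the [openai] extra
}


def available_backends(
    find_spec: Callable[[str], Optional[object]] = importlib.util.find_spec,
) -> List[str]:
    """Return the backends whose required modules can all be found (without importing them)."""
    return [
        backend
        for backend, modules in BACKEND_MODULES.items()
        if all(find_spec(name) is not None for name in modules)
    ]
```

For instance, after `pip install "scribe-cli[openai]"`, `"openaiapi"` should appear in `available_backends()`. The `find_spec` parameter exists only so the check can be exercised without installing anything.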