
scribe-cli 0.7.9.tar.gz → 0.7.11.tar.gz


Files changed (30)

{scribe_cli-0.7.9/scribe_cli.egg-info → scribe_cli-0.7.11}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.7.9
+Version: 0.7.11
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -64,14 +64,17 @@ Requires-Dist: pystray; extra == "all"
 [![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
 [![pypi](https://img.shields.io/pypi/v/scribe-cli)](https://pypi.org/project/scribe-cli)
-# Scribe
+# Scribe  <img src="scribe_data/share/icon.png" width=48px>
 `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
 ## Compatibility
-In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
-As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
+In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) for my own purposes so glitches are likely on other configurations.
+Moreover there are quite a bit of dependencies that rely on very OS-specific protocols under the hood, like access to the microphone, keyboard and clipboard,
+and even though the python dependencies `scribe` relies on are not restricted to a single platform, there may be limitation and additional binaries to install.
+This guide is based on python3.12 running on Ubuntu 24.04 with Gnome + Wayland, which is a relatively standard setting at the time of writing.
+Note as of February 19, 2025 python 13 does not seem to produce any transcription (I am not sure which dependency is to blame).
 A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
 ## Installation
@@ -101,7 +104,7 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
 the default is left to the `openai-whisper` package and might change in the future).
@@ -119,8 +122,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
-With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
-there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
+With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -147,7 +150,7 @@ scribe --keyboard
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
-#### Use the keyboard in Ubuntu
+#### Use the keyboard with Wayland (default for Ubuntu 24.04)
 In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
@@ -161,7 +164,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
-### System tray icon (experimental)
+### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -176,7 +179,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
-### Start as an application in Ubuntu
+### Start as an application in GNOME
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.

{scribe_cli-0.7.9 → scribe_cli-0.7.11}/README.md RENAMED

@@ -1,14 +1,17 @@
 [![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
 [![pypi](https://img.shields.io/pypi/v/scribe-cli)](https://pypi.org/project/scribe-cli)
-# Scribe
+# Scribe  <img src="scribe_data/share/icon.png" width=48px>
 `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
 ## Compatibility
-In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
-As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
+In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) for my own purposes so glitches are likely on other configurations.
+Moreover there are quite a bit of dependencies that rely on very OS-specific protocols under the hood, like access to the microphone, keyboard and clipboard,
+and even though the python dependencies `scribe` relies on are not restricted to a single platform, there may be limitation and additional binaries to install.
+This guide is based on python3.12 running on Ubuntu 24.04 with Gnome + Wayland, which is a relatively standard setting at the time of writing.
+Note as of February 19, 2025 python 13 does not seem to produce any transcription (I am not sure which dependency is to blame).
 A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
 ## Installation
@@ -38,7 +41,7 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
 the default is left to the `openai-whisper` package and might change in the future).
@@ -56,8 +59,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
-With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
-there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
+With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -84,7 +87,7 @@ scribe --keyboard
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
-#### Use the keyboard in Ubuntu
+#### Use the keyboard with Wayland (default for Ubuntu 24.04)
 In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
@@ -98,7 +101,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
-### System tray icon (experimental)
+### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -113,7 +116,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
-### Start as an application in Ubuntu
+### Start as an application in GNOME
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.

{scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/_version.py RENAMED

@@ -12,5 +12,5 @@ __version__: str
 __version_tuple__: VERSION_TUPLE
 version_tuple: VERSION_TUPLE
-__version__ = version = '0.7.9'
-__version_tuple__ = version_tuple = (0, 7, 9)
+__version__ = version = '0.7.11'
+__version_tuple__ = version_tuple = (0, 7, 11)

{scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/app.py RENAMED

@@ -112,16 +112,17 @@ def get_transcriber(o, prompt=True):
                 choices = list(zip(available_models, available_languages)) + [f" * [Any model from {ansi_link('https://alphacephei.com/vosk/models')}]"]
                 default_model = choices[0]  # this is a tuple !!
-            print(f"For information about vosk models see: {ansi_link('https://alphacephei.com/vosk/models')}")
             if prompt:
+                print(f"For information about vosk models see: {ansi_link('https://alphacephei.com/vosk/models')}")
                 model = prompt_choices(choices, default=default_model, label="model")  # this always returns a string
             else:
                 model = default_model[0] if isinstance(default_model, tuple) else default_model  # tuple -> string
         elif backend == "whisper":
             default_model = "small"
-            print("Some models have a specialized English version (.en) which will be selected as default is `-l en` was requested, but can also be requested explicitly below (option not listed). See [documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages).")
             if prompt:
+                # print("Some models have a specialized English version (.en) which will be selected as default is `-l en` was requested, but can also be requested explicitly below (option not listed). See [documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages).")
+                print(f"See {ansi_link('https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages')} for available models.")
                 model = prompt_choices(whisper_models, default=default_model, label="model",
                                         hidden_models=whisper_english_models)
             else:
@@ -146,7 +147,8 @@ def get_transcriber(o, prompt=True):
     elif backend == "whisper":
         transcriber = WhisperTranscriber(model_name=model, language=o.language, samplerate=o.samplerate,
-                                         timeout=o.duration, silence_duration=o.silence, restart_after_silence=o.restart_after_silence,
+                                         timeout=o.duration, silence_duration=o.silence, silence_thresh=o.silence_db,
+                                         restart_after_silence=o.restart_after_silence,
                                          model_kwargs={"download_root": o.download_folder_whisper})
     else:
@@ -177,12 +179,13 @@ def get_parser():
     parser.add_argument("--microphone-device", help="The device index of the microphone to use.", type=int)
     parser.add_argument("--keyboard", action="store_true")
     parser.add_argument("--no-clipboard", dest="clipboard", action="store_false")
-    parser.add_argument("--latency", default=0, type=float, help="keyboard latency")
+    parser.add_argument("--latency", default=0.01, type=float, help="keyboard latency")
     parser.add_argument("--ascii", action="store_true", help="Use unidecode for keyboard typing in ascii")
     group = parser.add_argument_group("whisper options")
-    group.add_argument("--duration", default=120, type=int, help="Max duration of the whisper recording (default %(default)ss)")
-    group.add_argument("--silence", default=2, type=float, help="silence duration that prompt transcription (whisper) (default %(default)ss)")
+    group.add_argument("--duration", default=120, type=float, help="Max duration of the whisper recording (default %(default)ss)")
+    group.add_argument("--silence", default=2, type=float, help="silence duration (default %(default)ss)")
+    group.add_argument("--silence-db", default=-30, type=float, help="silence magnitude in db (default %(default)ss)")
     group.add_argument("--restart-after-silence", action="store_true", help="Restart the recording after a transcription triggered by a silence")
     parser.add_argument("--download-folder-vosk", help="Folder to store Vosk models.")
@@ -248,7 +251,15 @@ def create_app(micro, transcriber, **kwargs):
         image_recording = Image.alpha_composite(image_recording.convert("RGBA"), image_writing.convert("RGBA"))
     def update_icon(icon, force=False):
-        if transcriber.recording:
+        if transcriber.recording and transcriber.waiting:
+            # this is the situation with the whisper backend when the microphone is recording
+            # but we wait for the speaker to speak (silence)
+            if force or getattr(icon, "_icon_label", None) != None:
+                icon.icon = image
+                icon._icon_label = None
+                icon.update_menu()
+        elif transcriber.recording:
             if force or getattr(icon, "_icon_label", None) != "recording":
                 icon.icon = image_recording
                 icon._icon_label = "recording"
@@ -338,38 +349,48 @@ def main(args=None):
     micro = Microphone(samplerate=o.samplerate, device=o.microphone_device)
     transcriber = None
-    toggle = {True: "On", False: "Off"}
+    details = False
     while True:
         if transcriber is None:
             transcriber = get_transcriber(o, prompt=o.prompt)
         print(f"Model [{colored(transcriber.model_name, 'light_blue', attrs=['bold'])}] from [{colored(transcriber.backend, 'light_blue', attrs=['bold'])}] selected.")
+        show_options = ["clipboard", "keyboard", "ascii", "app"]
+        activated_options = [colored(option, 'light_blue') for option in show_options if getattr(o, option)]
+        print(f"Options: {' | '.join(activated_options)}")
         if o.prompt:
             print(f"Choose any of the following actions")
             print(f"{colored('[q]', 'light_yellow')} quit")
             print(f"{colored('[e]', 'light_yellow')} change model")
-            print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
-            print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
-            print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
-            if o.keyboard:
-                print(f"{colored('[latency]', 'light_yellow')} between keystrokes is {colored(o.latency, 'light_blue')} s")
-            if transcriber.backend == "whisper":
-                print(f"{colored('[t]', 'light_yellow')} change duration (currently {colored(transcriber.timeout, 'light_blue')} s)")
-                print(f"{colored('[b]', 'light_yellow')} change silence duration (currently {colored(transcriber.silence_duration, 'light_blue')} s)")
-                print(f"{colored('[a]', 'light_yellow')} auto-restart after silence is {colored(transcriber.restart_after_silence, 'light_blue')} toggle?")
-            exclude_flags = ["keyboard", "clipboard", "app", "prompt", "restart_after_silence"]
-            display_flags = [a.dest for a in parser._actions if a.help != argparse.SUPPRESS]
-            for key, value in vars(o).items():
-                if key not in display_flags or key in exclude_flags or not isinstance(value, bool):
-                    continue
-                print(f"{colored(f'[{key}]', 'light_yellow')} is {colored(value, 'light_blue')} toggle?")
+            if details:
+                print(f"{colored('[x]', 'light_yellow')} app is {colored(o.app, 'light_blue')} toggle?")
+                print(f"{colored('[c]', 'light_yellow')} clipboard is {colored(o.clipboard, 'light_blue')} toggle?")
+                print(f"{colored('[k]', 'light_yellow')} keyboard is {colored(o.keyboard, 'light_blue')} toggle?")
+                if o.keyboard:
+                    print(f"{colored('[latency]', 'light_yellow')} between keystrokes is {colored(o.latency, 'light_blue')} s")
+                if transcriber.backend == "whisper":
+                    print(f"{colored('[t]', 'light_yellow')} change duration (currently {colored(transcriber.timeout, 'light_blue')} s)")
+                    print(f"{colored('[b]', 'light_yellow')} change silence (currently {colored(transcriber.silence_duration, 'light_blue')} s)")
+                    print(f"{colored('[db]', 'light_yellow')} change backround noise (currently {colored(transcriber.silence_thresh, 'light_blue')} db)")
+                    print(f"{colored('[a]', 'light_yellow')} auto-restart after silence is {colored(transcriber.restart_after_silence, 'light_blue')} toggle?")
+                exclude_flags = ["keyboard", "clipboard", "app", "prompt", "restart_after_silence"]
+                display_flags = [a.dest for a in parser._actions if a.help != argparse.SUPPRESS]
+                for key, value in vars(o).items():
+                    if key not in display_flags or key in exclude_flags or not isinstance(value, bool):
+                        continue
+                    print(f"{colored(f'[{key}]', 'light_yellow')} is {colored(value, 'light_blue')} toggle?")
+                print(f"{colored('[o]', 'light_yellow')} hide options")
+            else:
+                print(f"{colored('[o]', 'light_yellow')} show options")
             print(colored(f"Press [Enter] to start recording.", attrs=["bold"]))
             key = input()
             if key == "q":
                 exit(0)
+            if key == "o":
+                details = not details
+                continue
             if key == "e":
                 transcriber = None
                 o.model = None
@@ -391,9 +412,9 @@ def main(args=None):
             if key == "t":
                 ans = input(f"Enter new duration in seconds (current: {transcriber.timeout}): ")
                 try:
-                    o.duration = transcriber.timeout = int(ans)
+                    o.duration = transcriber.timeout = float(ans)
                 except:
-                    print("Invalid duration. Must be an integer.")
+                    print("Invalid duration. Must be a float.")
                 continue
             if key == "latency":
                 ans = input(f"Enter new keyboard latency in seconds (current: {o.latency}): ")
@@ -405,9 +426,16 @@ def main(args=None):
             if key == "b":
                 ans = input(f"Enter new silence break duration in seconds (current: {transcriber.silence_duration}): ")
                 try:
-                    o.silence = transcriber.silence_duration = int(ans)
+                    o.silence = transcriber.silence_duration = float(ans)
+                except:
+                    print("Invalid duration. Must be a float.")
+                continue
+            if key == "db":
+                ans = input(f"Enter new background noise threshold to detect silence (current: {transcriber.silence_thresh}): ")
+                try:
+                    o.silence_db = transcriber.silence_thresh = float(ans)
                 except:
-                    print("Invalid duration. Must be an integer.")
+                    print("Invalid duration. Must be a float.")
                 continue
             if key:
                 if hasattr(o, key) and isinstance(getattr(o, key), bool):
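
Note on the parser changes in `scribe/app.py`: the hunk above switches `--duration` to `float`, raises the default `--latency` to 0.01, and adds `--silence-db`. The following is a minimal standalone sketch of just those options, not the package's full parser. The function name `build_parser` and the `prog` value are hypothetical, and the help strings are reworded (the diffed `--silence-db` help formats its default with a trailing `s`, whereas the sketch labels it dB).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reduced parser covering only the options touched by the diff."""
    parser = argparse.ArgumentParser(prog="scribe-sketch")
    parser.add_argument("--latency", default=0.01, type=float,
                        help="keyboard latency in seconds (default %(default)s)")

    group = parser.add_argument_group("whisper options")
    group.add_argument("--duration", default=120, type=float,
                       help="max duration of the recording in seconds (default %(default)s)")
    group.add_argument("--silence", default=2, type=float,
                       help="silence duration in seconds (default %(default)s)")
    group.add_argument("--silence-db", default=-30, type=float,
                       help="silence threshold in dB (default %(default)s)")
    group.add_argument("--restart-after-silence", action="store_true",
                       help="restart the recording after a silence-triggered transcription")
    return parser


# Using the `--flag=value` form keeps a negative value unambiguous on the command line.
o = build_parser().parse_args(["--silence", "1.5", "--silence-db=-40"])
```

With `type=float`, fractional values such as `--silence 1.5` are accepted, which the previous `int` conversion in the interactive prompt would have rejected.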

{scribe_cli-0.7.9 → scribe_cli-0.7.11}/scribe/models.py RENAMED

@@ -34,6 +34,7 @@ class AbstractTranscriber:
         self.restart_after_silence = restart_after_silence
         self.recording = False
         self.busy = False
+        self.waiting = False
         self.reset()
     def get_elapsed(self):
@@ -52,7 +53,6 @@ class AbstractTranscriber:
     def reset(self):
         self.audio_buffer = b''
         self.start_time = time.time()
-        self.last_sound_time = time.time()
     def start_recording(self, microphone,
                         start_message="Recording... Press Ctrl+C to stop.",
@@ -60,7 +60,13 @@ class AbstractTranscriber:
         self.reset()
         self.recording = True
+        self.waiting = True
         self.busy = True
+        if self.silence_duration is not None:
+            self.last_sound_time = time.time() - self.silence_duration
+        else:
+            self.last_sound_time = time.time()
+        previous_waiting = self.waiting
         try:
@@ -75,19 +81,31 @@ class AbstractTranscriber:
                         if is_silent(data, self.silence_thresh):
                             silence_duration = time.time() - self.last_sound_time
-                            if self.silence_duration is not None and silence_duration >= self.silence_duration and len(self.audio_buffer) > 0:
+                            previous_waiting = self.waiting
+                            self.waiting = self.silence_duration is not None and silence_duration >= self.silence_duration
+                            if self.waiting and len(self.audio_buffer) > 0:
                                 if self.restart_after_silence:
+                                    self.recording = False # for the system tray icon
                                     result = self.finalize()
                                     microphone.q.queue.clear()
                                     self.reset()
                                     yield result
+                                    self.recording = True # for the system tray icon
                                 else:
                                     raise StopRecording("Silence detected: {:.2f} seconds".format(silence_duration))
                         else:
                             self.last_sound_time = time.time()
+                            self.waiting = False
-                        yield self.transcribe_realtime_audio(data)
+                        # don't accumulate very long silences
+                        if not self.waiting:
+                            yield self.transcribe_realtime_audio(data)
+                        else:
+                            if not previous_waiting:
+                                print("Silence detected...waiting for more audio")
                         if self.is_overtime():
                             raise StopRecording("Overtime: {:.2f} seconds".format(self.get_elapsed()))
@@ -98,6 +116,7 @@ class AbstractTranscriber:
             pass
         finally:
+            self.waiting = False
             self.recording = False
             result = self.finalize()
             microphone.q.queue.clear()
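
Note on the `waiting` flag in `scribe/models.py`: the hunks above backdate `last_sound_time` by `silence_duration` at the start of a recording, so the transcriber begins in the waiting state and leaves it at the first non-silent chunk. The sketch below isolates that state logic so it can be reasoned about without a microphone. It is an illustration under assumptions, not the package's code: `is_silent` is not shown in the diff, so the RMS-in-dBFS version here, the 16-bit PCM format, and the `SilenceGate` class are all hypothetical.

```python
import math
import struct


def rms_dbfs(chunk: bytes) -> float:
    """RMS level of 16-bit little-endian PCM, in dB relative to full scale (assumed format)."""
    n = len(chunk) // 2
    if n == 0:
        return float("-inf")
    samples = struct.unpack(f"<{n}h", chunk[: 2 * n])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / 32768)


def is_silent(chunk: bytes, thresh_db: float = -30.0) -> bool:
    """Assumed stand-in for the package's `is_silent`: below the threshold counts as silence."""
    return rms_dbfs(chunk) < thresh_db


class SilenceGate:
    """Hypothetical helper tracking the `waiting` state described in the diff."""

    def __init__(self, silence_duration: float = 2.0, now: float = 0.0):
        self.silence_duration = silence_duration
        # Backdate the last sound so the gate starts out waiting for speech.
        self.last_sound_time = now - silence_duration
        self.waiting = True

    def update(self, chunk_is_silent: bool, now: float) -> bool:
        """Feed one chunk's silence flag and timestamp; return the new waiting state."""
        if chunk_is_silent:
            elapsed = now - self.last_sound_time
            self.waiting = elapsed >= self.silence_duration
        else:
            self.last_sound_time = now
            self.waiting = False
        return self.waiting
```

Passing `now` explicitly instead of calling `time.time()` inside the class is a choice made for this sketch only; it makes the timing behaviour deterministic and easy to test.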

{scribe_cli-0.7.9 → scribe_cli-0.7.11/scribe_cli.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: scribe-cli
-Version: 0.7.9
+Version: 0.7.11
 Summary: scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer
 Author-email: Mahé Perrette <mahe.perrette@gmail.com>
 License: MIT License
@@ -64,14 +64,17 @@ Requires-Dist: pystray; extra == "all"
 [![python](https://img.shields.io/badge/python-3.12-blue.svg)]()
 [![pypi](https://img.shields.io/pypi/v/scribe-cli)](https://pypi.org/project/scribe-cli)
-# Scribe
+# Scribe  <img src="scribe_data/share/icon.png" width=48px>
 `scribe` is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer.
 ## Compatibility
-In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) and develop it for my own purposes so glitches are likely on other configurations.
-As of February 19, 2025 python 13 is not supported (I can't recall now which dependency is to blame).
+In principle `scribe` is compatible with any OS but I develop it under Ubuntu (Wayland) for my own purposes so glitches are likely on other configurations.
+Moreover there are quite a bit of dependencies that rely on very OS-specific protocols under the hood, like access to the microphone, keyboard and clipboard,
+and even though the python dependencies `scribe` relies on are not restricted to a single platform, there may be limitation and additional binaries to install.
+This guide is based on python3.12 running on Ubuntu 24.04 with Gnome + Wayland, which is a relatively standard setting at the time of writing.
+Note as of February 19, 2025 python 13 does not seem to produce any transcription (I am not sure which dependency is to blame).
 A test on Mac OS (M1 Air with 8Gb RAM) worked with python 12, though with a much inferior performance compared to my own system (Lenovo T14 Gen 5 with i5 125U 32 Gb RAM).
 ## Installation
@@ -101,7 +104,7 @@ pip install -e .[all]
 You can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` packages (see Usage below).
 The `vosk` language models will download on-the-fly.
-The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisker` backend
+The default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache` (note for the `whisper` backend
 the default is left to the `openai-whisper` package and might change in the future).
@@ -119,8 +122,8 @@ You can interrupt the recording via Ctrl + C and start again or change model. Th
 The default (`whisper`) is excellent at transcribing a full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,
 but it cannot do real-time, and depending on the model can have relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.
-With the `whisker` model you need to stop the registration manually before the transcription occurs (Ctrl + C), though
-there is a maximum duration after which it will stop by itself, which is setup to 60s by default (unless `--duration` is set to something else).
+With the `whisper` model the registration stops after a 2-second silence is detected. You can also stop the registration manually before the transcription occurs (Ctrl + C or Stop in the `--app` mode).
+By default, the recording will only last 120 seconds. You can fine-tune this behaviour via the `--silence`, `--duration` and `--restart-after-silence` parameters.
 The `vosk` backend is much faster and very good at doing real-time transcription for one language, but tended to make more mistakes in my tests and it does not do punctuation.
 It becomes really powerful when used for longer or interactive typing session with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.
@@ -147,7 +150,7 @@ scribe --keyboard
 It relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).
 Depending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).
-#### Use the keyboard in Ubuntu
+#### Use the keyboard with Wayland (default for Ubuntu 24.04)
 In my Ubuntu + Wayland system the keyboard simulation works out-of-the-box in chromium based applications (including vscode) but it does not in firefox and sublime text and any of the rest (not even in a terminal !). I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.
@@ -161,7 +164,7 @@ sudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput
 ```
 You're on the right path :)
-### System tray icon (experimental)
+### System tray icon (experimental) <img src="scribe_data/share/icon.png" width=48px>
 To avoid switching back and forth with the terminal, it's possible to interact with the program via an icon tray.
 To activate start with:
@@ -176,7 +179,7 @@ sudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1
 pip install PyGObject
 ```
-### Start as an application in Ubuntu
+### Start as an application in GNOME
 If you run Ubuntu (or else?) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`
 to make it available from the quick launch menu. Any option will be passed on to `scribe`.