PyPI - whisper-key-local - Versions diffs - 0.5.3__tar.gz → 0.6.1__tar.gz - Mend

whisper-key-local 0.5.3__tar.gz → 0.6.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (66)

whisper_key_local-0.6.1/PKG-INFO ADDED

@@ -0,0 +1,159 @@
+Metadata-Version: 2.4
+Name: whisper-key-local
+Version: 0.6.1
+Summary: Local faster-whisper speech-to-text app with global hotkeys for Windows
+Author-email: Pin Wang <pinwang@gmail.com>
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+Requires-Dist: faster-whisper>=1.2.1
+Requires-Dist: ctranslate2>=4.6.3
+Requires-Dist: numpy>=1.24.0
+Requires-Dist: soxr>=0.3.0
+Requires-Dist: sounddevice>=0.4.6
+Requires-Dist: pyperclip>=1.8.2
+Requires-Dist: ruamel.yaml>=0.18.14
+Requires-Dist: pystray>=0.19.5
+Requires-Dist: Pillow>=10.0.0
+Requires-Dist: hf-xet>=1.1.5
+Requires-Dist: playsound3>=2.0
+Requires-Dist: ten-vad>=1.0.6
+Requires-Dist: global-hotkeys>=0.1.7; sys_platform == "win32"
+Requires-Dist: pywin32>=306; sys_platform == "win32"
+Requires-Dist: pyautogui>=0.9.54; sys_platform == "win32"
+Requires-Dist: pyobjc-framework-Quartz; sys_platform == "darwin"
+Requires-Dist: pyobjc-framework-ApplicationServices; sys_platform == "darwin"
+# Whisper Key - Local Speech-to-Text
+Global hotkeys to record speech and transcribe directly to your cursor.
+> **Now on Windows and macOS!** Questions or ideas? [Discord](https://discord.gg/uZnXV8snhz)
+## ✨ Features
+- **Global Hotkey**: Start recording speech from any app
+- **Auto-Paste**: Transcribe directly to cursor
+- **Auto-Send**: Optionally auto-send your transcription with ENTER
+- **Offline Capable**: No internet required after models downloaded
+- **Local Processing**: Voice data never leaves your computer
+- **Efficient Models**: Choose a small, efficient model for CPU
+- **CUDA Support**: Or leverage your nVidia GPU with bigger models
+- **Voice activity detection**: Auto-cancel after long silences and prevent hallucination
+- **Cross-platform**: Works on Windows and macOS
+- **Configurable**: Customize hotkeys, models, and [much more](#️-configuration)
+## 🚀 Quick Start
+### From PyPI (Recommended)
+Requires Python 3.11-3.13
+```bash
+# With pipx (isolated environment)
+pipx install whisper-key-local
+# Or with pip (simpler)
+pip install whisper-key-local
+```
+Then run: `whisper-key`
+### Portable App (Windows Only)
+1. [Download the latest release zip](https://github.com/PinW/whisper-key-local/releases/latest)
+2. Extract and run `whisper-key.exe`
+### From Source
+```bash
+git clone https://github.com/PinW/whisper-key-local.git
+cd whisper-key-local
+pip install -e .
+python whisper-key.py
+```
+## 🎤 Basic Usage
+| Hotkey | Windows | macOS |
+|--------|---------|-------|
+| Start recording | `Ctrl+Win` | `Fn+Ctrl` |
+| Stop & transcribe | `Ctrl` | `Fn` |
+| Stop & auto-send | `Alt` | `Option` |
+| Cancel recording | `Esc` | `Shift` |
+Open the system tray / menu bar icon to:
+- Toggle auto-paste vs clipboard-only
+- Change transcription model
+- Select audio device
+## ⚙️ Configuration
+Local settings at:
+- **Windows:** `%APPDATA%\whisperkey\user_settings.yaml`
+- **macOS:** `~/Library/Application Support/whisperkey/user_settings.yaml`
+Delete this file and restart app to reset to defaults.
+| Option | Default | Notes |
+|--------|---------|-------|
+| **Whisper** |||
+| `whisper.model` | `tiny` | Any model defined in `whisper.models` |
+| `whisper.device` | `cpu` | cpu or cuda (NVIDIA GPU) |
+| `whisper.compute_type` | `int8` | int8/float16/float32 |
+| `whisper.language` | `auto` | auto or language code (en, es, fr, etc.) |
+| `whisper.beam_size` | `5` | Higher = more accurate but slower (1-10) |
+| `whisper.models` | (see config) | Add custom HuggingFace or local models |
+| **Hotkeys** |||
+| `hotkey.recording_hotkey` | `ctrl+win` / `fn+ctrl` | Windows / macOS |
+| `hotkey.stop_with_modifier_enabled` | `true` | Stop with first modifier only |
+| `hotkey.auto_enter_enabled` | `true` | Enable auto-send hotkey |
+| `hotkey.auto_enter_combination` | `alt` / `option` | Stop + paste + Enter |
+| `hotkey.cancel_combination` | `esc` / `shift` | Cancel recording |
+| **Voice Activity Detection** |||
+| `vad.vad_precheck_enabled` | `true` | Prevent hallucinations on silence |
+| `vad.vad_onset_threshold` | `0.7` | Speech detection start (0.0-1.0) |
+| `vad.vad_offset_threshold` | `0.55` | Speech detection end (0.0-1.0) |
+| `vad.vad_min_speech_duration` | `0.1` | Min speech segment (seconds) |
+| `vad.vad_realtime_enabled` | `true` | Auto-stop on silence |
+| `vad.vad_silence_timeout_seconds` | `30.0` | Seconds before auto-stop |
+| **Audio** |||
+| `audio.host` | `null` | Audio API (WASAPI, Core Audio, etc.) |
+| `audio.channels` | `1` | 1 = mono, 2 = stereo |
+| `audio.dtype` | `float32` | float32/int16/int24/int32 |
+| `audio.max_duration` | `900` | Max recording seconds (0 = unlimited) |
+| `audio.input_device` | `default` | Device ID or "default" |
+| **Clipboard** |||
+| `clipboard.auto_paste` | `true` | false = clipboard only |
+| `clipboard.paste_hotkey` | `ctrl+v` / `cmd+v` | Paste key simulation |
+| `clipboard.preserve_clipboard` | `true` | Restore clipboard after paste |
+| `clipboard.key_simulation_delay` | `0.05` | Delay between keystrokes (seconds) |
+| **Logging** |||
+| `logging.level` | `INFO` | DEBUG/INFO/WARNING/ERROR/CRITICAL |
+| `logging.file.enabled` | `true` | Write to app.log |
+| `logging.console.enabled` | `true` | Print to console |
+| `logging.console.level` | `WARNING` | Console verbosity |
+| **Audio Feedback** |||
+| `audio_feedback.enabled` | `true` | Play sounds on record/stop |
+| `audio_feedback.start_sound` | `assets/sounds/...` | Custom sound file path |
+| `audio_feedback.stop_sound` | `assets/sounds/...` | Custom sound file path |
+| `audio_feedback.cancel_sound` | `assets/sounds/...` | Custom sound file path |
+| **System Tray** |||
+| `system_tray.enabled` | `true` | Show tray icon |
+| `system_tray.tooltip` | `Whisper Key` | Hover text |
+| **Console** |||
+| `console.start_hidden` | `false` | Start minimized to tray |
+## 📁 Model Cache
+Default path for transcription models (via HuggingFace):
+- **Windows:** `%USERPROFILE%\.cache\huggingface\hub\`
+- **macOS:** `~/.cache/huggingface/hub/`
+## 📦 Dependencies
+**Cross-platform:**
+`faster-whisper` · `numpy` · `sounddevice` · `soxr` · `pyperclip` · `ruamel.yaml` · `pystray` · `Pillow` · `playsound3` · `ten-vad` · `hf-xet`
+**Windows:** `global-hotkeys` · `pywin32` · `pyautogui`
+**macOS:** `pyobjc-framework-Quartz` · `pyobjc-framework-ApplicationServices`
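
The `vad.vad_onset_threshold` (0.7) and `vad.vad_offset_threshold` (0.55) defaults listed above suggest a hysteresis scheme: speech starts once the probability rises to the onset value and ends only when it falls below the lower offset value. The sketch below illustrates that reading. It is an assumption about how the two thresholds interact, not code from the package, and `speech_segments` is a hypothetical name.

```python
def speech_segments(probs, onset=0.7, offset=0.55):
    """Return (start, end) frame index pairs using two-threshold hysteresis.

    Assumption: a segment opens when a frame probability reaches `onset`
    and closes at the first later frame that drops below `offset`.
    """
    segments = []
    start = None
    for i, p in enumerate(probs):
        if start is None and p >= onset:
            start = i                      # speech begins
        elif start is not None and p < offset:
            segments.append((start, i))    # speech ends (end is exclusive)
            start = None
    if start is not None:
        segments.append((start, len(probs)))  # still speaking at the end
    return segments
```

Keeping the offset below the onset means a brief dip (for example 0.6) does not split one utterance into two segments.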

whisper_key_local-0.6.1/README.md ADDED

@@ -0,0 +1,134 @@
+# Whisper Key - Local Speech-to-Text
+Global hotkeys to record speech and transcribe directly to your cursor.
+> **Now on Windows and macOS!** Questions or ideas? [Discord](https://discord.gg/uZnXV8snhz)
+## ✨ Features
+- **Global Hotkey**: Start recording speech from any app
+- **Auto-Paste**: Transcribe directly to cursor
+- **Auto-Send**: Optionally auto-send your transcription with ENTER
+- **Offline Capable**: No internet required after models downloaded
+- **Local Processing**: Voice data never leaves your computer
+- **Efficient Models**: Choose a small, efficient model for CPU
+- **CUDA Support**: Or leverage your nVidia GPU with bigger models
+- **Voice activity detection**: Auto-cancel after long silences and prevent hallucination
+- **Cross-platform**: Works on Windows and macOS
+- **Configurable**: Customize hotkeys, models, and [much more](#️-configuration)
+## 🚀 Quick Start
+### From PyPI (Recommended)
+Requires Python 3.11-3.13
+```bash
+# With pipx (isolated environment)
+pipx install whisper-key-local
+# Or with pip (simpler)
+pip install whisper-key-local
+```
+Then run: `whisper-key`
+### Portable App (Windows Only)
+1. [Download the latest release zip](https://github.com/PinW/whisper-key-local/releases/latest)
+2. Extract and run `whisper-key.exe`
+### From Source
+```bash
+git clone https://github.com/PinW/whisper-key-local.git
+cd whisper-key-local
+pip install -e .
+python whisper-key.py
+```
+## 🎤 Basic Usage
+| Hotkey | Windows | macOS |
+|--------|---------|-------|
+| Start recording | `Ctrl+Win` | `Fn+Ctrl` |
+| Stop & transcribe | `Ctrl` | `Fn` |
+| Stop & auto-send | `Alt` | `Option` |
+| Cancel recording | `Esc` | `Shift` |
+Open the system tray / menu bar icon to:
+- Toggle auto-paste vs clipboard-only
+- Change transcription model
+- Select audio device
+## ⚙️ Configuration
+Local settings at:
+- **Windows:** `%APPDATA%\whisperkey\user_settings.yaml`
+- **macOS:** `~/Library/Application Support/whisperkey/user_settings.yaml`
+Delete this file and restart app to reset to defaults.
+| Option | Default | Notes |
+|--------|---------|-------|
+| **Whisper** |||
+| `whisper.model` | `tiny` | Any model defined in `whisper.models` |
+| `whisper.device` | `cpu` | cpu or cuda (NVIDIA GPU) |
+| `whisper.compute_type` | `int8` | int8/float16/float32 |
+| `whisper.language` | `auto` | auto or language code (en, es, fr, etc.) |
+| `whisper.beam_size` | `5` | Higher = more accurate but slower (1-10) |
+| `whisper.models` | (see config) | Add custom HuggingFace or local models |
+| **Hotkeys** |||
+| `hotkey.recording_hotkey` | `ctrl+win` / `fn+ctrl` | Windows / macOS |
+| `hotkey.stop_with_modifier_enabled` | `true` | Stop with first modifier only |
+| `hotkey.auto_enter_enabled` | `true` | Enable auto-send hotkey |
+| `hotkey.auto_enter_combination` | `alt` / `option` | Stop + paste + Enter |
+| `hotkey.cancel_combination` | `esc` / `shift` | Cancel recording |
+| **Voice Activity Detection** |||
+| `vad.vad_precheck_enabled` | `true` | Prevent hallucinations on silence |
+| `vad.vad_onset_threshold` | `0.7` | Speech detection start (0.0-1.0) |
+| `vad.vad_offset_threshold` | `0.55` | Speech detection end (0.0-1.0) |
+| `vad.vad_min_speech_duration` | `0.1` | Min speech segment (seconds) |
+| `vad.vad_realtime_enabled` | `true` | Auto-stop on silence |
+| `vad.vad_silence_timeout_seconds` | `30.0` | Seconds before auto-stop |
+| **Audio** |||
+| `audio.host` | `null` | Audio API (WASAPI, Core Audio, etc.) |
+| `audio.channels` | `1` | 1 = mono, 2 = stereo |
+| `audio.dtype` | `float32` | float32/int16/int24/int32 |
+| `audio.max_duration` | `900` | Max recording seconds (0 = unlimited) |
+| `audio.input_device` | `default` | Device ID or "default" |
+| **Clipboard** |||
+| `clipboard.auto_paste` | `true` | false = clipboard only |
+| `clipboard.paste_hotkey` | `ctrl+v` / `cmd+v` | Paste key simulation |
+| `clipboard.preserve_clipboard` | `true` | Restore clipboard after paste |
+| `clipboard.key_simulation_delay` | `0.05` | Delay between keystrokes (seconds) |
+| **Logging** |||
+| `logging.level` | `INFO` | DEBUG/INFO/WARNING/ERROR/CRITICAL |
+| `logging.file.enabled` | `true` | Write to app.log |
+| `logging.console.enabled` | `true` | Print to console |
+| `logging.console.level` | `WARNING` | Console verbosity |
+| **Audio Feedback** |||
+| `audio_feedback.enabled` | `true` | Play sounds on record/stop |
+| `audio_feedback.start_sound` | `assets/sounds/...` | Custom sound file path |
+| `audio_feedback.stop_sound` | `assets/sounds/...` | Custom sound file path |
+| `audio_feedback.cancel_sound` | `assets/sounds/...` | Custom sound file path |
+| **System Tray** |||
+| `system_tray.enabled` | `true` | Show tray icon |
+| `system_tray.tooltip` | `Whisper Key` | Hover text |
+| **Console** |||
+| `console.start_hidden` | `false` | Start minimized to tray |
+## 📁 Model Cache
+Default path for transcription models (via HuggingFace):
+- **Windows:** `%USERPROFILE%\.cache\huggingface\hub\`
+- **macOS:** `~/.cache/huggingface/hub/`
+## 📦 Dependencies
+**Cross-platform:**
+`faster-whisper` · `numpy` · `sounddevice` · `soxr` · `pyperclip` · `ruamel.yaml` · `pystray` · `Pillow` · `playsound3` · `ten-vad` · `hf-xet`
+**Windows:** `global-hotkeys` · `pywin32` · `pyautogui`
+**macOS:** `pyobjc-framework-Quartz` · `pyobjc-framework-ApplicationServices`
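
The README above gives one settings location per platform. A minimal sketch of how such a lookup could be written, using only the two paths stated in the README; `user_settings_path` and its parameters are hypothetical, and the package's own resolution logic is not shown in this diff.

```python
from pathlib import PurePosixPath, PureWindowsPath

def user_settings_path(system, appdata=None, home=None):
    """Build the user_settings.yaml location described in the README.

    `system` is a platform.system() style name; `appdata` stands in for
    %APPDATA% and `home` for the macOS home directory.
    """
    if system == "Windows":
        return PureWindowsPath(appdata) / "whisperkey" / "user_settings.yaml"
    if system == "Darwin":
        return (PurePosixPath(home) / "Library" / "Application Support"
                / "whisperkey" / "user_settings.yaml")
    raise ValueError(f"no documented settings path for {system!r}")
```

Pure paths are used so the example behaves the same on any host; real code would more likely read `os.environ["APPDATA"]` and `Path.home()`.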

{whisper_key_local-0.5.3 → whisper_key_local-0.6.1}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "whisper-key-local"
-version = "0.5.3"
+version = "0.6.1"
 description = "Local faster-whisper speech-to-text app with global hotkeys for Windows"
 readme = "README.md"
 authors = [
@@ -12,20 +12,28 @@ authors = [
 ]
 requires-python = ">=3.11"
 dependencies = [
+    # Cross-platform
     "faster-whisper>=1.2.1",
     "ctranslate2>=4.6.3",
     "numpy>=1.24.0",
     "soxr>=0.3.0",
     "sounddevice>=0.4.6",
-    "global-hotkeys>=0.1.7; platform_system=='Windows'",
     "pyperclip>=1.8.2",
     "ruamel.yaml>=0.18.14",
-    "pywin32>=306; platform_system=='Windows'",
-    "pyautogui>=0.9.54; platform_system=='Windows'",
     "pystray>=0.19.5",
     "Pillow>=10.0.0",
     "hf-xet>=1.1.5",
-    # Note: ten-vad git dependency handled separately
+    "playsound3>=2.0",
+    "ten-vad>=1.0.6",
+    # Windows-only
+    "global-hotkeys>=0.1.7; sys_platform=='win32'",
+    "pywin32>=306; sys_platform=='win32'",
+    "pyautogui>=0.9.54; sys_platform=='win32'",
+    # macOS-only
+    "pyobjc-framework-Quartz; sys_platform=='darwin'",
+    "pyobjc-framework-ApplicationServices; sys_platform=='darwin'",
 ]
 [project.scripts]
@@ -35,4 +43,4 @@ whisper-key = "whisper_key.main:main"
 where = ["src"]
 [tool.setuptools.package-data]
-"whisper_key" = ["assets/**/*", "config.defaults.yaml"]
+"whisper_key" = ["assets/**/*", "platform/*/assets/*", "config.defaults.yaml"]
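
The dependency hunk above replaces `platform_system=='Windows'` markers with `sys_platform=='win32'` and adds `sys_platform=='darwin'` entries. The sketch below shows which requirements each platform would end up with. It handles only the single `sys_platform=='value'` form seen in this diff and is not a general environment-marker evaluator; `applies` and `selected` are hypothetical helpers, and the list is a subset of the real dependencies.

```python
import sys

# A subset of the requirement strings from the 0.6.1 dependency list.
REQUIREMENTS = [
    "faster-whisper>=1.2.1",
    "playsound3>=2.0",
    "global-hotkeys>=0.1.7; sys_platform=='win32'",
    "pywin32>=306; sys_platform=='win32'",
    "pyobjc-framework-Quartz; sys_platform=='darwin'",
]

def applies(requirement: str, sys_platform: str) -> bool:
    """True if the requirement should be installed on `sys_platform`."""
    _, sep, marker = requirement.partition(";")
    if not sep:
        return True  # no marker: installed everywhere
    key, _, value = marker.strip().partition("==")
    if key.strip() != "sys_platform":
        raise ValueError(f"unsupported marker: {marker!r}")
    return value.strip().strip("'\"") == sys_platform

def selected(sys_platform: str = sys.platform) -> list[str]:
    """Requirement specifiers (marker stripped) that apply to the platform."""
    return [r.partition(";")[0].strip()
            for r in REQUIREMENTS if applies(r, sys_platform)]
```

Calling `selected("linux")` returns only the unmarked entries, which matches the diff: no Linux-specific dependencies are declared.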

whisper_key_local-0.6.1/src/whisper_key/assets/version.txt ADDED

@@ -0,0 +1 @@
+0.6.1

{whisper_key_local-0.5.3 → whisper_key_local-0.6.1}/src/whisper_key/audio_feedback.py RENAMED

@@ -1,56 +1,57 @@
 import logging
 import os
+import platform
 import threading
-import winsound
+from playsound3 import playsound
+SOUND_BACKEND = "winmm" if platform.system() == "Windows" else None
 from .utils import resolve_asset_path
-class AudioFeedback:
+class AudioFeedback:
     def __init__(self, enabled=True, start_sound='', stop_sound='', cancel_sound=''):
         self.enabled = enabled
         self.logger = logging.getLogger(__name__)
         self.start_sound_path = resolve_asset_path(start_sound)
         self.stop_sound_path = resolve_asset_path(stop_sound)
         self.cancel_sound_path = resolve_asset_path(cancel_sound)
         if not self.enabled:
             self.logger.info("Audio feedback disabled by configuration")
             print("   ✗ Audio feedback disabled")
         else:
             self._validate_sound_files()
             print("   ✓ Audio feedback enabled...")
     def _validate_sound_files(self):
         if self.start_sound_path and not os.path.isfile(self.start_sound_path):
             self.logger.warning(f"Start sound file not found: {self.start_sound_path}")
         if self.stop_sound_path and not os.path.isfile(self.stop_sound_path):
             self.logger.warning(f"Stop sound file not found: {self.stop_sound_path}")
         if self.cancel_sound_path and not os.path.isfile(self.cancel_sound_path):
             self.logger.warning(f"Cancel sound file not found: {self.cancel_sound_path}")
     def _play_sound_file_async(self, file_path: str):
-        def play_sound():
+        def play():
             try:
-                # SND_FILENAME = play from file, SND_ASYNC = don't block
-                winsound.PlaySound(file_path, winsound.SND_FILENAME | winsound.SND_ASYNC)
+                playsound(file_path, block=False, backend=SOUND_BACKEND)
             except Exception as e:
                 self.logger.warning(f"Failed to play sound file {file_path}: {e}")
-        sound_thread = threading.Thread(target=play_sound, daemon=True)
-        sound_thread.start()
+        threading.Thread(target=play, daemon=True).start()
     def play_start_sound(self):
         if self.enabled:
             self._play_sound_file_async(self.start_sound_path)
     def play_stop_sound(self):
-        if self.enabled:
+        if self.enabled:
             self._play_sound_file_async(self.stop_sound_path)
     def play_cancel_sound(self):
         if self.enabled:
-            self._play_sound_file_async(self.cancel_sound_path)
+            self._play_sound_file_async(self.cancel_sound_path)
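
The hunk above swaps `winsound` for `playsound3`, choosing the `"winmm"` backend on Windows and playing each sound from a daemon thread. A minimal sketch of that pattern follows. It takes a stand-in `player` callable instead of importing `playsound`, so it runs without the package installed; `pick_backend` and `play_async` are hypothetical names, not part of the package.

```python
import logging
import platform
import threading

logger = logging.getLogger(__name__)

def pick_backend(system=None):
    """Mirror the diff: force "winmm" on Windows, otherwise pass None."""
    system = platform.system() if system is None else system
    return "winmm" if system == "Windows" else None

def play_async(file_path, player, backend=None):
    """Call player(...) on a daemon thread; failures are logged, not raised."""
    def play():
        try:
            player(file_path, block=False, backend=backend)
        except Exception as exc:
            logger.warning("Failed to play sound file %s: %s", file_path, exc)

    thread = threading.Thread(target=play, daemon=True)
    thread.start()
    return thread  # returned so a caller can join() if it needs to wait
```

In the package itself the `player` argument corresponds to the imported `playsound` function; injecting it here only keeps the sketch self-contained.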

{whisper_key_local-0.5.3 → whisper_key_local-0.6.1}/src/whisper_key/clipboard_manager.py RENAMED

@@ -3,13 +3,10 @@ import time
 from typing import Optional
 import pyperclip
-import win32gui
-import pyautogui
+from .platform import keyboard
 from .utils import parse_hotkey
-pyautogui.FAILSAFE = True  # Enable "move mouse to corner to abort automation"
 class ClipboardManager:
     def __init__(self, key_simulation_delay, auto_paste, preserve_clipboard, paste_hotkey):
         self.logger = logging.getLogger(__name__)
@@ -18,90 +15,77 @@ class ClipboardManager:
         self.preserve_clipboard = preserve_clipboard
         self.paste_hotkey = paste_hotkey
         self.paste_keys = parse_hotkey(paste_hotkey)
-        self._configure_pyautogui_timing()
+        self._configure_keyboard_timing()
         self._test_clipboard_access()
         self._print_status()
-    def _configure_pyautogui_timing(self):
-        pyautogui.PAUSE = self.key_simulation_delay
+    def _configure_keyboard_timing(self):
+        keyboard.set_delay(self.key_simulation_delay)
     def _test_clipboard_access(self):
         try:
             pyperclip.paste()
             self.logger.info("Clipboard access test successful")
         except Exception as e:
             self.logger.error(f"Clipboard access test failed: {e}")
             raise
     def _print_status(self):
         hotkey_display = self.paste_hotkey.upper()
         if self.auto_paste:
             print(f"   ✓ Auto-paste is ENABLED using key simulation ({hotkey_display})")
         else:
             print(f"   ✗ Auto-paste is DISABLED - paste manually with {hotkey_display}")
     def copy_text(self, text: str) -> bool:
         if not text:
             return False
         try:
             self.logger.info(f"Copying text to clipboard ({len(text)} chars)")
             pyperclip.copy(text)
             return True
         except Exception as e:
             self.logger.error(f"Failed to copy text to clipboard: {e}")
             return False
     def get_clipboard_content(self) -> Optional[str]:
         try:
             clipboard_content = pyperclip.paste()
             if clipboard_content:
                 return clipboard_content
             else:
                 return None
         except Exception as e:
             self.logger.error(f"Failed to paste text from clipboard: {e}")
             return None
     def copy_with_notification(self, text: str) -> bool:
         if not text:
             return False
         success = self.copy_text(text)
         if success:
             print("   ✓ Copied to clipboard")
             print("   ✓ You can now paste with Ctrl+V in any application!")
         return success
     def clear_clipboard(self) -> bool:
         try:
             pyperclip.copy("")
             return True
         except Exception as e:
             self.logger.error(f"Failed to clear clipboard: {e}")
             return False
-    def get_active_window_handle(self) -> Optional[int]:
-        try:
-            hwnd = win32gui.GetForegroundWindow()
-            if hwnd:
-                window_title = win32gui.GetWindowText(hwnd)
-                self.logger.info(f"Active window: '{window_title}' (handle: {hwnd})")
-                return hwnd
-            else:
-                return None
-        except Exception as e:
-            self.logger.error(f"Failed to get active window handle: {e}")
-            return None
-    def execute_auto_paste(self, text: str, preserve_clipboard: bool) -> bool:
+    def execute_auto_paste(self, text: str, preserve_clipboard: bool) -> bool:
         try:
             original_content = None
             if preserve_clipboard:
@@ -109,8 +93,8 @@ class ClipboardManager:
             if not self.copy_text(text):
                 return False
-            pyautogui.hotkey(*self.paste_keys)
+            keyboard.send_hotkey(*self.paste_keys)
             print(f"   ✓ Auto-pasted via key simulation")
@@ -119,19 +103,19 @@ class ClipboardManager:
                 time.sleep(self.key_simulation_delay)
             return True
         except Exception as e:
             self.logger.error(f"Failed to simulate paste keypress: {e}")
             return False
     def send_enter_key(self) -> bool:
         try:
             self.logger.info("Sending ENTER key to active application")
-            pyautogui.press('enter')
+            keyboard.send_key('enter')
             print("   ✓ Text submitted with ENTER!")
             return True
         except Exception as e:
             self.logger.error(f"Failed to send ENTER key: {e}")
             return False
@@ -139,30 +123,29 @@ class ClipboardManager:
     def deliver_transcription(self,
                               transcribed_text: str,
                               use_auto_enter: bool = False) -> bool:
         try:
             if use_auto_enter:
                 print("🚀 Auto-pasting text and SENDING with ENTER...")
-                # Force auto-paste when using auto-enter hotkey
                 success = self.execute_auto_paste(transcribed_text, self.preserve_clipboard)
                 if success:
                     success = self.send_enter_key()
             elif self.auto_paste:
                 print("🚀 Auto-pasting text...")
-                success = self.execute_auto_paste(transcribed_text, self.preserve_clipboard)
+                success = self.execute_auto_paste(transcribed_text, self.preserve_clipboard)
             else:
                 print("📋 Copying to clipboard...")
-                success = self.copy_with_notification(transcribed_text)
+                success = self.copy_with_notification(transcribed_text)
             return success
         except Exception as e:
             self.logger.error(f"Delivery workflow failed: {e}")
             return False
     def update_auto_paste(self, enabled: bool):
         self.auto_paste = enabled
-        self._print_status()
+        self._print_status()
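
In the hunks above, `ClipboardManager` stops calling `pyautogui` directly and instead uses three functions on `.platform.keyboard`: `set_delay`, `send_hotkey`, and `send_key`. The platform module itself is not part of this excerpt, so the sketch below only models that three-call surface. The `Keyboard` protocol, `RecordingKeyboard`, and `paste` are hypothetical illustrations of how such an interface can be exercised without pressing real keys.

```python
from typing import Protocol

class Keyboard(Protocol):
    """The three calls clipboard_manager.py makes on the keyboard module."""
    def set_delay(self, seconds: float) -> None: ...
    def send_hotkey(self, *keys: str) -> None: ...
    def send_key(self, key: str) -> None: ...

class RecordingKeyboard:
    """Test double: records events instead of simulating key presses."""
    def __init__(self) -> None:
        self.delay = 0.0
        self.events: list[tuple] = []

    def set_delay(self, seconds: float) -> None:
        self.delay = seconds

    def send_hotkey(self, *keys: str) -> None:
        self.events.append(("hotkey", keys))

    def send_key(self, key: str) -> None:
        self.events.append(("key", key))

def paste(keyboard: Keyboard, paste_keys: list[str], auto_enter: bool = False) -> None:
    """Send the paste chord, then ENTER when auto-send is requested."""
    keyboard.send_hotkey(*paste_keys)
    if auto_enter:
        keyboard.send_key("enter")
```

A per-platform backend would satisfy the same protocol, which is what lets the manager code stay identical on Windows and macOS.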
