PyPI - python-voiceio - Versions diffs - 0.3.0__tar.gz → 0.3.2__tar.gz - Mend

python-voiceio 0.3.0__tar.gz → 0.3.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (104)

{python_voiceio-0.3.0/python_voiceio.egg-info → python_voiceio-0.3.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: python-voiceio
-Version: 0.3.0
+Version: 0.3.2
 Summary: Speak → text, locally, instantly.
 Author: Hugo Montenegro
 License-Expression: MIT
@@ -56,6 +56,7 @@ Dynamic: license-file
 [![PyPI](https://img.shields.io/pypi/v/python-voiceio)](https://pypi.org/project/python-voiceio/)
 [![Python](https://img.shields.io/pypi/pyversions/python-voiceio)](https://pypi.org/project/python-voiceio/)
 [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+[![Downloads](https://img.shields.io/pepy/dt/python-voiceio)](https://pepy.tech/projects/python-voiceio)
 Speak → text, locally, instantly.
@@ -153,6 +154,10 @@ Press your hotkey to start recording (1s pre-buffer catches the first syllable).
 - **Works everywhere**: IBus input method for GUI apps, clipboard for terminals
 - **Wayland + X11**: evdev hotkeys work on both, no root required
 - **Pre-buffer**: never miss the first syllable
+- **Voice commands**: "new line", "comma", "scratch that", punctuation by name
+- **Autocorrect**: LLM-powered review of recurring Whisper mistakes (`voiceio correct`)
+- **Text-to-speech**: hear selected text spoken back (Piper, eSpeak, Edge TTS)
+- **Smart post-processing**: numbers ("twenty five" → "25"), punctuation, capitalization
 - **Auto-healing**: falls back to the next working backend if one fails
 - **Autostart**: optional systemd service, restarts on crash
 - **Self-diagnosing**: `voiceio doctor` checks everything, `--fix` repairs it
@@ -176,7 +181,10 @@ voiceio                  Start the daemon
 voiceio setup            Interactive setup wizard
 voiceio doctor           Health check (--fix to auto-repair)
 voiceio test             Test microphone + live transcription
+voiceio demo             Interactive guided tour of all features
 voiceio toggle           Toggle recording on a running daemon
+voiceio correct          Review and fix recurring transcription errors
+voiceio history          View transcription history
 voiceio update           Update to latest version
 voiceio service install  Autostart on login (systemd / Windows Startup)
 voiceio logs             View recent logs
@@ -250,9 +258,8 @@ Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) and [open issues](
 - [ ] Multiple engine backends (whisper.cpp for Vulkan/AMD, VOSK for low-end hardware)
 - [ ] Echo cancellation (filter system audio for meeting use)
 - [ ] Wake word activation ("Hey voiceio")
-- [ ] Text-to-speech output (Piper/espeak-ng — completes the "io")
 **Done**
+- [x] Text-to-speech output (Piper/eSpeak/Edge TTS — completes the "io")
 - [x] LLM auto-audit dictionary (`voiceio correct --auto` — scan history with LLM, interactive correction)
 - [x] LLM post-processing via Ollama (grammar cleanup, spelling fixes on final pass)
 - [x] Corrections dictionary — auto-replace misheard words, "correct that" voice command

{python_voiceio-0.3.0 → python_voiceio-0.3.2}/README.md RENAMED

@@ -4,6 +4,7 @@
 [![PyPI](https://img.shields.io/pypi/v/python-voiceio)](https://pypi.org/project/python-voiceio/)
 [![Python](https://img.shields.io/pypi/pyversions/python-voiceio)](https://pypi.org/project/python-voiceio/)
 [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+[![Downloads](https://img.shields.io/pepy/dt/python-voiceio)](https://pepy.tech/projects/python-voiceio)
 Speak → text, locally, instantly.
@@ -101,6 +102,10 @@ Press your hotkey to start recording (1s pre-buffer catches the first syllable).
 - **Works everywhere**: IBus input method for GUI apps, clipboard for terminals
 - **Wayland + X11**: evdev hotkeys work on both, no root required
 - **Pre-buffer**: never miss the first syllable
+- **Voice commands**: "new line", "comma", "scratch that", punctuation by name
+- **Autocorrect**: LLM-powered review of recurring Whisper mistakes (`voiceio correct`)
+- **Text-to-speech**: hear selected text spoken back (Piper, eSpeak, Edge TTS)
+- **Smart post-processing**: numbers ("twenty five" → "25"), punctuation, capitalization
 - **Auto-healing**: falls back to the next working backend if one fails
 - **Autostart**: optional systemd service, restarts on crash
 - **Self-diagnosing**: `voiceio doctor` checks everything, `--fix` repairs it
@@ -124,7 +129,10 @@ voiceio                  Start the daemon
 voiceio setup            Interactive setup wizard
 voiceio doctor           Health check (--fix to auto-repair)
 voiceio test             Test microphone + live transcription
+voiceio demo             Interactive guided tour of all features
 voiceio toggle           Toggle recording on a running daemon
+voiceio correct          Review and fix recurring transcription errors
+voiceio history          View transcription history
 voiceio update           Update to latest version
 voiceio service install  Autostart on login (systemd / Windows Startup)
 voiceio logs             View recent logs
@@ -198,9 +206,8 @@ Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) and [open issues](
 - [ ] Multiple engine backends (whisper.cpp for Vulkan/AMD, VOSK for low-end hardware)
 - [ ] Echo cancellation (filter system audio for meeting use)
 - [ ] Wake word activation ("Hey voiceio")
-- [ ] Text-to-speech output (Piper/espeak-ng — completes the "io")
 **Done**
+- [x] Text-to-speech output (Piper/eSpeak/Edge TTS — completes the "io")
 - [x] LLM auto-audit dictionary (`voiceio correct --auto` — scan history with LLM, interactive correction)
 - [x] LLM post-processing via Ollama (grammar cleanup, spelling fixes on final pass)
 - [x] Corrections dictionary — auto-replace misheard words, "correct that" voice command

{python_voiceio-0.3.0 → python_voiceio-0.3.2}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "python-voiceio"
-version = "0.3.0"
+version = "0.3.2"
 description = "Speak → text, locally, instantly."
 readme = "README.md"
 license = "MIT"

{python_voiceio-0.3.0 → python_voiceio-0.3.2/python_voiceio.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: python-voiceio
-Version: 0.3.0
+Version: 0.3.2
 Summary: Speak → text, locally, instantly.
 Author: Hugo Montenegro
 License-Expression: MIT
@@ -56,6 +56,7 @@ Dynamic: license-file
 [![PyPI](https://img.shields.io/pypi/v/python-voiceio)](https://pypi.org/project/python-voiceio/)
 [![Python](https://img.shields.io/pypi/pyversions/python-voiceio)](https://pypi.org/project/python-voiceio/)
 [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+[![Downloads](https://img.shields.io/pepy/dt/python-voiceio)](https://pepy.tech/projects/python-voiceio)
 Speak → text, locally, instantly.
@@ -153,6 +154,10 @@ Press your hotkey to start recording (1s pre-buffer catches the first syllable).
 - **Works everywhere**: IBus input method for GUI apps, clipboard for terminals
 - **Wayland + X11**: evdev hotkeys work on both, no root required
 - **Pre-buffer**: never miss the first syllable
+- **Voice commands**: "new line", "comma", "scratch that", punctuation by name
+- **Autocorrect**: LLM-powered review of recurring Whisper mistakes (`voiceio correct`)
+- **Text-to-speech**: hear selected text spoken back (Piper, eSpeak, Edge TTS)
+- **Smart post-processing**: numbers ("twenty five" → "25"), punctuation, capitalization
 - **Auto-healing**: falls back to the next working backend if one fails
 - **Autostart**: optional systemd service, restarts on crash
 - **Self-diagnosing**: `voiceio doctor` checks everything, `--fix` repairs it
@@ -176,7 +181,10 @@ voiceio                  Start the daemon
 voiceio setup            Interactive setup wizard
 voiceio doctor           Health check (--fix to auto-repair)
 voiceio test             Test microphone + live transcription
+voiceio demo             Interactive guided tour of all features
 voiceio toggle           Toggle recording on a running daemon
+voiceio correct          Review and fix recurring transcription errors
+voiceio history          View transcription history
 voiceio update           Update to latest version
 voiceio service install  Autostart on login (systemd / Windows Startup)
 voiceio logs             View recent logs
@@ -250,9 +258,8 @@ Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) and [open issues](
 - [ ] Multiple engine backends (whisper.cpp for Vulkan/AMD, VOSK for low-end hardware)
 - [ ] Echo cancellation (filter system audio for meeting use)
 - [ ] Wake word activation ("Hey voiceio")
-- [ ] Text-to-speech output (Piper/espeak-ng — completes the "io")
 **Done**
+- [x] Text-to-speech output (Piper/eSpeak/Edge TTS — completes the "io")
 - [x] LLM auto-audit dictionary (`voiceio correct --auto` — scan history with LLM, interactive correction)
 - [x] LLM post-processing via Ollama (grammar cleanup, spelling fixes on final pass)
 - [x] Corrections dictionary — auto-replace misheard words, "correct that" voice command

{python_voiceio-0.3.0 → python_voiceio-0.3.2}/tests/test_llm_api.py RENAMED

@@ -6,7 +6,7 @@ import urllib.error
 from unittest.mock import MagicMock, patch
 from voiceio.config import AutocorrectConfig
-from voiceio.llm_api import chat, check_api_key, resolve_api_key
+from voiceio.llm_api import chat, check_api_key, detect_provider, resolve_api_key
 def _mock_response(data: dict) -> MagicMock:
@@ -116,3 +116,61 @@ def test_check_empty_key():
     cfg = _cfg(api_key="")
     with patch.dict("os.environ", {}, clear=True):
         assert check_api_key(cfg) is False
+# ── Anthropic native API ────────────────────────────────────────────────
+@patch("urllib.request.urlopen")
+def test_chat_anthropic_native(mock_urlopen):
+    mock_urlopen.return_value = _mock_response({
+        "content": [{"type": "text", "text": "Fixed text."}]
+    })
+    cfg = _cfg(base_url="https://api.anthropic.com/v1")
+    result = chat(cfg, "system prompt", "user message")
+    assert result == "Fixed text."
+    req = mock_urlopen.call_args[0][0]
+    assert req.get_header("X-api-key") == "test-key"
+    assert req.get_header("Anthropic-version") == "2023-06-01"
+    assert "Authorization" not in dict(req.header_items())
+    body = json.loads(req.data)
+    assert body["system"] == "system prompt"
+    assert body["messages"] == [{"role": "user", "content": "user message"}]
+    assert "/messages" in req.full_url
+@patch("urllib.request.urlopen")
+def test_check_api_key_anthropic(mock_urlopen):
+    mock_urlopen.return_value = _mock_response({
+        "content": [{"type": "text", "text": ""}]
+    })
+    cfg = _cfg(base_url="https://api.anthropic.com/v1")
+    assert check_api_key(cfg, "sk-ant-test") is True
+    req = mock_urlopen.call_args[0][0]
+    assert "/messages" in req.full_url
+# ── detect_provider ─────────────────────────────────────────────────────
+def test_detect_openrouter():
+    base_url, model = detect_provider("sk-or-abc123")
+    assert "openrouter" in base_url
+    assert "claude" in model
+def test_detect_anthropic():
+    base_url, model = detect_provider("sk-ant-abc123")
+    assert "anthropic.com" in base_url
+    assert "claude" in model
+def test_detect_openai():
+    base_url, model = detect_provider("sk-proj-abc123")
+    assert "openai.com" in base_url
+def test_detect_unknown_defaults_openrouter():
+    base_url, _ = detect_provider("unknown-key-format")
+    assert "openrouter" in base_url
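
The `detect_provider` tests above rely on the most specific key prefixes being checked before the generic `sk-` prefix. A minimal, table-driven sketch of that idea follows. The function name `detect_provider_sketch` and the table layout are hypothetical and are not the package's code; the prefixes, URLs, and model names are copied from the `voiceio/llm_api.py` diff.

```python
# Hypothetical table-driven variant of prefix-based provider detection.
# Order matters: "sk-or-" and "sk-ant-" must come before the generic "sk-".
_PREFIX_TABLE = [
    ("sk-or-", "https://openrouter.ai/api/v1", "anthropic/claude-sonnet-4"),
    ("sk-ant-", "https://api.anthropic.com/v1", "claude-sonnet-4-20250514"),
    ("sk-", "https://api.openai.com/v1", "gpt-4o-mini"),
]
_DEFAULT = ("https://openrouter.ai/api/v1", "anthropic/claude-sonnet-4")


def detect_provider_sketch(api_key: str) -> tuple[str, str]:
    """Return (base_url, model) for the first matching key prefix."""
    for prefix, base_url, model in _PREFIX_TABLE:
        if api_key.startswith(prefix):
            return base_url, model
    # Unrecognised key formats fall back to the default provider.
    return _DEFAULT
```

If the generic `sk-` row were moved to the top of the table, Anthropic and OpenRouter keys would be misrouted to the OpenAI URL, which is the case the tests guard against.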

{python_voiceio-0.3.0 → python_voiceio-0.3.2}/tests/test_tts.py RENAMED

@@ -142,7 +142,7 @@ def test_player_empty_audio():
 def test_tts_config_defaults():
     cfg = TTSConfig()
-    assert cfg.enabled is False
+    assert cfg.enabled is True
     assert cfg.engine == "auto"
     assert cfg.hotkey == "ctrl+alt+s"
     assert cfg.voice == ""
@@ -155,4 +155,4 @@ def test_tts_config_in_main_config():
     cfg = Config()
     assert hasattr(cfg, "tts")
     assert isinstance(cfg.tts, TTSConfig)
-    assert cfg.tts.enabled is False
+    assert cfg.tts.enabled is True

python_voiceio-0.3.2/voiceio/__init__.py ADDED

@@ -0,0 +1 @@
+__version__ = "0.3.2"

{python_voiceio-0.3.0 → python_voiceio-0.3.2}/voiceio/config.py RENAMED

@@ -105,7 +105,7 @@ class AutocorrectConfig:
 @dataclass
 class TTSConfig:
-    enabled: bool = False
+    enabled: bool = True
     engine: str = "auto"         # "auto" | "piper" | "espeak" | "edge-tts"
     hotkey: str = "ctrl+alt+s"   # "s" for speak
     voice: str = ""              # empty = engine default

python_voiceio-0.3.2/voiceio/llm_api.py ADDED

@@ -0,0 +1,183 @@
+"""Multi-provider chat completions API client.
+Supports OpenRouter, OpenAI, Anthropic (native Messages API), Together, Groq,
+local Ollama (via /v1/chat/completions), etc. Zero dependencies beyond stdlib.
+"""
+from __future__ import annotations
+import json
+import logging
+import os
+import urllib.error
+import urllib.request
+from voiceio.config import AutocorrectConfig
+log = logging.getLogger(__name__)
+def _is_anthropic(base_url: str) -> bool:
+    """Check if the base URL points to Anthropic's native API."""
+    return "api.anthropic.com" in base_url
+def resolve_api_key(cfg: AutocorrectConfig) -> str:
+    """Resolve API key from config or environment variables."""
+    if cfg.api_key:
+        return cfg.api_key
+    # Check common env vars in priority order
+    for var in ("OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
+        val = os.environ.get(var, "")
+        if val:
+            return val
+    return ""
+def _anthropic_request(
+    base_url: str,
+    model: str,
+    system: str,
+    messages: list[dict],
+    api_key: str,
+    max_tokens: int,
+    timeout: float,
+) -> str | None:
+    """Send a request using Anthropic's native Messages API."""
+    url = f"{base_url}/messages"
+    body: dict = {
+        "model": model,
+        "max_tokens": max_tokens,
+        "messages": messages,
+    }
+    if system:
+        body["system"] = system
+    headers = {
+        "Content-Type": "application/json",
+        "x-api-key": api_key,
+        "anthropic-version": "2023-06-01",
+    }
+    req = urllib.request.Request(
+        url, data=json.dumps(body).encode(), headers=headers, method="POST",
+    )
+    with urllib.request.urlopen(req, timeout=timeout) as resp:
+        data = json.loads(resp.read())
+    # Anthropic returns content as a list of blocks
+    blocks = data.get("content", [])
+    text = "".join(b.get("text", "") for b in blocks if b.get("type") == "text")
+    return text.strip() or None
+def _openai_request(
+    base_url: str,
+    model: str,
+    system: str,
+    messages: list[dict],
+    api_key: str,
+    max_tokens: int,
+    timeout: float,
+) -> str | None:
+    """Send a request using the OpenAI chat completions format."""
+    url = f"{base_url}/chat/completions"
+    all_messages = []
+    if system:
+        all_messages.append({"role": "system", "content": system})
+    all_messages.extend(messages)
+    body = {
+        "model": model,
+        "max_tokens": max_tokens,
+        "messages": all_messages,
+    }
+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": f"Bearer {api_key}",
+    }
+    req = urllib.request.Request(
+        url, data=json.dumps(body).encode(), headers=headers, method="POST",
+    )
+    with urllib.request.urlopen(req, timeout=timeout) as resp:
+        data = json.loads(resp.read())
+    return data["choices"][0]["message"]["content"].strip()
+def chat(
+    cfg: AutocorrectConfig,
+    system: str,
+    user_message: str,
+    *,
+    api_key: str = "",
+    max_tokens: int = 2048,
+) -> str | None:
+    """Send a chat completion request. Returns response text or None on failure.
+    Automatically detects Anthropic's native API vs OpenAI-compatible format
+    based on the configured base_url.
+    """
+    key = api_key or resolve_api_key(cfg)
+    if not key:
+        return None
+    base_url = cfg.base_url.rstrip("/")
+    messages = [{"role": "user", "content": user_message}]
+    try:
+        if _is_anthropic(base_url):
+            return _anthropic_request(
+                base_url, cfg.model, system, messages, key, max_tokens, cfg.timeout_secs,
+            )
+        return _openai_request(
+            base_url, cfg.model, system, messages, key, max_tokens, cfg.timeout_secs,
+        )
+    except urllib.error.HTTPError as e:
+        body_text = ""
+        try:
+            body_text = e.read().decode()[:200]
+        except Exception:
+            pass
+        log.warning("API request failed (HTTP %d): %s", e.code, body_text)
+        return None
+    except Exception as e:
+        log.warning("API request failed: %s", e)
+        return None
+def detect_provider(api_key: str) -> tuple[str, str]:
+    """Detect provider from API key prefix. Returns (base_url, model)."""
+    if api_key.startswith("sk-or-"):
+        return "https://openrouter.ai/api/v1", "anthropic/claude-sonnet-4"
+    if api_key.startswith("sk-ant-"):
+        return "https://api.anthropic.com/v1", "claude-sonnet-4-20250514"
+    if api_key.startswith(("sk-proj-", "sk-")):
+        return "https://api.openai.com/v1", "gpt-4o-mini"
+    # Default to OpenRouter (works with most keys)
+    return "https://openrouter.ai/api/v1", "anthropic/claude-sonnet-4"
+def check_api_key(cfg: AutocorrectConfig, api_key: str = "") -> bool:
+    """Validate an API key with a minimal request."""
+    key = api_key or resolve_api_key(cfg)
+    if not key:
+        return False
+    base_url = cfg.base_url.rstrip("/")
+    messages = [{"role": "user", "content": "hi"}]
+    try:
+        if _is_anthropic(base_url):
+            _anthropic_request(base_url, cfg.model, "", messages, key, 1, 10)
+        else:
+            _openai_request(base_url, cfg.model, "", messages, key, 1, 10)
+        return True
+    except urllib.error.HTTPError as e:
+        if e.code == 401:
+            return False
+        # Other errors (rate limit, etc.) mean the key itself is valid
+        return e.code != 403
+    except Exception:
+        return False
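
The module above sends the same conversation in two wire formats depending on `base_url`. The sketch below builds the request for each format without sending it, to make the differences easy to inspect: endpoint path, auth header, and where the system prompt goes. The helper name `build_chat_request` and its default `max_tokens` are hypothetical; the header names, paths, and body layout follow the diff.

```python
import json
import urllib.request


def build_chat_request(base_url, model, system, user_message, api_key, max_tokens=16):
    """Build (but do not send) a chat request in the provider's wire format."""
    base_url = base_url.rstrip("/")
    messages = [{"role": "user", "content": user_message}]
    headers = {"Content-Type": "application/json"}
    if "api.anthropic.com" in base_url:
        # Anthropic Messages API: system prompt is a top-level field,
        # and the key travels in x-api-key rather than Authorization.
        url = f"{base_url}/messages"
        body = {"model": model, "max_tokens": max_tokens, "messages": messages}
        if system:
            body["system"] = system
        headers["x-api-key"] = api_key
        headers["anthropic-version"] = "2023-06-01"
    else:
        # OpenAI-compatible format: system prompt is the first message.
        url = f"{base_url}/chat/completions"
        if system:
            messages = [{"role": "system", "content": system}] + messages
        body = {"model": model, "max_tokens": max_tokens, "messages": messages}
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url, data=json.dumps(body).encode(), headers=headers, method="POST",
    )
```

Note that `urllib.request.Request` stores header names capitalized, so `x-api-key` is read back as `X-api-key`, which matches how the tests in `tests/test_llm_api.py` query it.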

{python_voiceio-0.3.0 → python_voiceio-0.3.2}/voiceio/numbers.py RENAMED

@@ -142,6 +142,7 @@ def convert_numbers(text: str, language: str = "en") -> str:
         # Collect consecutive number words
         if _is_number_word(low) and low != "a" and low != "and":
             num_words = []
+            last_category = None  # "ones", "tens", "scale"
             j = i
             while j < len(words):
                 w = words[j].lower().rstrip(".,;:?!")
@@ -153,6 +154,7 @@ def convert_numbers(text: str, language: str = "en") -> str:
                         # "a" at start: only if followed by scale word
                         if j + 1 < len(words) and words[j + 1].lower().rstrip(".,;:?!") in _SCALES:
                             num_words.append(w)
+                            last_category = "ones"
                             j += 1
                             continue
                         break
@@ -163,7 +165,14 @@ def convert_numbers(text: str, language: str = "en") -> str:
                             j += 1
                             continue
                         break
+                    # Two consecutive ones-words = separate numbers
+                    # e.g. "one two three" should NOT become 6
+                    # But "twenty three", "one hundred", "thirteen thousand" are valid
+                    cat = "scale" if w in _SCALES else ("tens" if w in _TENS else "ones")
+                    if cat == "ones" and last_category == "ones":
+                        break
                     num_words.append(w)
+                    last_category = cat
                     j += 1
                 else:
                     break
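
The category check added in this hunk can be illustrated in isolation. The following is a simplified sketch, not the package's `convert_numbers`: the word sets are small hypothetical stand-ins for the module's `_TENS` and `_SCALES`, and the input is assumed to contain only number words. It groups words into runs and starts a new run whenever two ones-words appear back to back.

```python
# Hypothetical, reduced word sets; the real module defines its own.
_TENS = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}
_SCALES = {"hundred", "thousand", "million"}


def _category(word: str) -> str:
    """Classify a number word the same way the hunk does."""
    if word in _SCALES:
        return "scale"
    return "tens" if word in _TENS else "ones"


def group_number_words(words: list[str]) -> list[list[str]]:
    """Split number words into runs that each form a single number."""
    groups: list[list[str]] = []
    current: list[str] = []
    last_category = None
    for w in words:
        cat = _category(w)
        # Two consecutive ones-words are separate numbers ("one two three").
        if cat == "ones" and last_category == "ones":
            groups.append(current)
            current = []
        current.append(w)
        last_category = cat
    if current:
        groups.append(current)
    return groups
```

Under these assumptions, "one two three" yields three separate runs, while "twenty three" and "one hundred five" each stay together because a tens-word or scale-word sits between the ones-words.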

{python_voiceio-0.3.0 → python_voiceio-0.3.2}/voiceio/service.py RENAMED

@@ -57,6 +57,7 @@ Type=simple
 ExecStart={bin_path}
 Restart=on-failure
 RestartSec=3
+PassEnvironment=DISPLAY WAYLAND_DISPLAY XDG_SESSION_TYPE XDG_RUNTIME_DIR
 [Install]
 WantedBy=default.target

{python_voiceio-0.3.0 → python_voiceio-0.3.2}/voiceio/tts/edge_engine.py RENAMED

@@ -19,12 +19,25 @@ class EdgeEngine:
     def probe(self) -> ProbeResult:
         try:
             import edge_tts  # noqa: F401
-            return ProbeResult(ok=True)
         except ImportError:
             return ProbeResult(
                 ok=False, reason="edge-tts not installed",
                 fix_hint="pip install edge-tts",
             )
+        try:
+            import soundfile  # noqa: F401
+            return ProbeResult(ok=True)
+        except ImportError:
+            pass
+        try:
+            import pydub  # noqa: F401
+            return ProbeResult(ok=True)
+        except ImportError:
+            return ProbeResult(
+                ok=False,
+                reason="edge-tts needs soundfile or pydub to decode audio",
+                fix_hint="pip install soundfile",
+            )
     def synthesize(self, text: str, voice: str, speed: float) -> tuple[np.ndarray, int]:
         import asyncio
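
The revised `probe` requires one mandatory package plus at least one of several optional decoders. A generic sketch of that "required plus any-of" check is shown below. `ProbeResult` here is a hypothetical stand-in dataclass and `probe_with_decoders` is an invented helper name. Unlike the diff, which actually imports each module, this sketch uses `importlib.util.find_spec`, which only checks that a top-level module can be located without executing it.

```python
import importlib.util
from dataclasses import dataclass


@dataclass
class ProbeResult:  # hypothetical stand-in for the package's ProbeResult
    ok: bool
    reason: str = ""
    fix_hint: str = ""


def probe_with_decoders(required: str, decoders: list[str]) -> ProbeResult:
    """Succeed only if `required` and at least one of `decoders` are importable."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(required) is None:
        return ProbeResult(
            ok=False, reason=f"{required} not installed",
            fix_hint=f"pip install {required}",
        )
    for name in decoders:
        if importlib.util.find_spec(name) is not None:
            return ProbeResult(ok=True)
    return ProbeResult(
        ok=False,
        reason=f"{required} needs one of: {', '.join(decoders)}",
        fix_hint=f"pip install {decoders[0]}" if decoders else "",
    )
```

Locating a module is cheaper than importing it, but it will not catch a package that is present yet fails at import time, which is the kind of failure the real import-based probe surfaces.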

{python_voiceio-0.3.0 → python_voiceio-0.3.2}/voiceio/tts/piper_engine.py RENAMED

@@ -22,10 +22,11 @@ class PiperEngine:
     def probe(self) -> ProbeResult:
         try:
             import piper  # noqa: F401
+            from piper.download import ensure_voice_exists, get_voices  # noqa: F401
             return ProbeResult(ok=True)
-        except ImportError:
+        except ImportError as e:
             return ProbeResult(
-                ok=False, reason="piper-tts not installed",
+                ok=False, reason=f"piper-tts not fully installed: {e}",
                 fix_hint="pip install piper-tts",
             )
