voicesmith-mcp 1.0.17 → 1.0.18

Files changed (9)

package/README.md +17 -10
package/config.py +4 -0
package/package.json +1 -1
package/server.py +14 -0
package/stt/__pycache__/mic_capture.cpython-314.pyc +0 -0
package/tts/__pycache__/audio_player.cpython-314.pyc +0 -0
package/tts/__pycache__/kokoro_engine.cpython-314.pyc +0 -0
package/tts/__pycache__/media_duck.cpython-314.pyc +0 -0
package/tts/__pycache__/speech_queue.cpython-314.pyc +0 -0

package/README.md CHANGED

@@ -39,7 +39,7 @@ What the AI does automatically:
 | Moment | What happens |
 |--------|-------------|
-| You give it a task | Speaks a brief acknowledgment |
+| You give it a task | Gets to work (speaks only when clarifying approach) |
 | It finishes work | Speaks a summary of what was done |
 | It has a question | Asks out loud, then listens for your voice response |
 | Voice tools unavailable | Falls back to text silently |
@@ -112,7 +112,8 @@ The MCP server runs as a local process alongside your IDE. It communicates over
 - **TTS**: Kokoro ONNX — fast neural TTS, 54 voices, no GPU needed
 - **STT**: faster-whisper — OpenAI Whisper running locally via CTranslate2
 - **VAD**: Silero VAD — voice activity detection for clean recordings
-- **Audio**: mpv for playback, sounddevice for recording
+- **Audio**: mpv for playback; CoreAudio via native app bundle on macOS (sounddevice fallback on Linux)
+- **Media ducking**: Auto-pauses Apple Music, Spotify, and browser audio during speech (macOS)
 ## Multi-Session
@@ -131,16 +132,24 @@ Config lives at `~/.local/share/voicesmith-mcp/config.json`. Key settings:
   "main_agent": "Eric",
   "tts": {
     "default_voice": "am_eric",
-    "audio_player": "mpv"
+    "audio_player": "mpv",
+    "duck_media": true
   },
   "stt": {
     "model_size": "base",
     "language": "en",
-    "vad_threshold": 0.3
+    "vad_threshold": 0.3,
+    "nudge_on_timeout": false
   }
 }
 ```
+| Setting | Description | Default |
+|---------|-------------|---------|
+| `tts.duck_media` | Auto-pause music/browser audio during speech (macOS) | `true` |
+| `stt.nudge_on_timeout` | Speak "I didn't catch that" when listen times out | `false` |
+| `stt.vad_threshold` | Voice detection sensitivity (lower = more sensitive) | `0.3` |
 Re-run `npx voicesmith-mcp install` to change your voice or update settings. Existing configuration is preserved — only new defaults are added.
 ## Requirements
@@ -166,16 +175,14 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
 ### The AI can't hear me (listen returns empty or times out)
-**Check microphone permissions.** On macOS, the terminal app that runs your IDE needs microphone access:
+**Check microphone permissions.** On macOS, VoiceSmith uses a native app bundle (`VoiceSmithMCP.app`) for mic access. The first time it records, macOS should show a permission dialog for the app. If it didn't:
 1. Open **System Settings > Privacy & Security > Microphone**
-2. Make sure your terminal app is listed and enabled:
-   - **Warp**, **Terminal.app**, or **iTerm2** — for Claude Code
-   - **Cursor** or **VS Code** — if using those IDEs directly
-3. If the app isn't listed, the first `listen` call should trigger the permission prompt. Approve it and try again.
+2. Look for **VoiceSmithMCP** and make sure it's enabled
+3. If it's not listed, the LaunchAgent may not be running — try reinstalling: `npx voicesmith-mcp install`
 > [!IMPORTANT]
-> The Python process inherits microphone permissions from the app that launched it. If your terminal doesn't have mic access, listen will silently fail.
+> If the server detects silent audio (all zeros for ~320ms), it returns an error pointing you to the microphone permission settings. This usually means macOS TCC denied mic access.
 **Check your audio input device.** If an external mic is selected but not connected, the server opens it but gets silence:
 - Open **System Settings > Sound > Input** and verify the correct mic is selected
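The updated README note says the server reports an error when it sees silent audio (all zeros for ~320ms). The detection code is not part of this diff, so the following is only a minimal sketch of how such a check could look. The function name `looks_like_denied_mic` and the 16 kHz sample rate are assumptions for illustration, not names or values taken from the package.

```python
from typing import Sequence


def looks_like_denied_mic(
    samples: Sequence[float],
    sample_rate: int = 16_000,  # assumed rate; the real capture rate is not shown in the diff
    window_ms: int = 320,       # the ~320ms window mentioned in the README
) -> bool:
    """Return True when the first `window_ms` of audio is exactly zero.

    A real microphone almost always produces some noise, so a run of exact
    zeros suggests the OS handed back an empty stream.
    """
    window = int(sample_rate * window_ms / 1000)
    if len(samples) < window:
        return False  # not enough audio captured yet to decide
    return all(sample == 0 for sample in samples[:window])
```

At the assumed 16 kHz rate the window is 5120 samples, so a caller could run this on the first captured buffer and return a permission hint instead of waiting for the full listen timeout.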

package/config.py CHANGED

@@ -37,6 +37,7 @@ class STTConfig:
     silence_threshold: float = 1.5
     max_listen_timeout: float = 15
     vad_threshold: float = 0.3
+    nudge_on_timeout: bool = False
 @dataclass
@@ -117,6 +118,8 @@ def load_config(config_path: Optional[Path] = None) -> AppConfig:
                     config.stt.max_listen_timeout = float(stt["max_listen_timeout"])
                 if "vad_threshold" in stt:
                     config.stt.vad_threshold = float(stt["vad_threshold"])
+                if "nudge_on_timeout" in stt:
+                    config.stt.nudge_on_timeout = bool(stt["nudge_on_timeout"])
             # Top-level config
             if "main_agent" in data:
@@ -191,6 +194,7 @@ def save_config(config: AppConfig, config_path: Optional[Path] = None) -> None:
             "silence_threshold": config.stt.silence_threshold,
             "max_listen_timeout": config.stt.max_listen_timeout,
             "vad_threshold": config.stt.vad_threshold,
+            "nudge_on_timeout": config.stt.nudge_on_timeout,
         },
         "main_agent": config.main_agent,
         "last_voice_name": config.last_voice_name,
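The new loader line coerces the setting with `bool(...)`. That works for real JSON booleans, but a hand-edited config containing the string `"false"` would be read as `True`, because any non-empty string is truthy in Python. The sketch below shows one way a stricter parse could look. `parse_bool` and `load_stt` are hypothetical helpers written for this illustration; only the `STTConfig` fields and defaults are taken from the diff.

```python
import json
from dataclasses import dataclass


@dataclass
class STTConfig:
    # Fields and defaults as they appear in the diff context above.
    silence_threshold: float = 1.5
    max_listen_timeout: float = 15
    vad_threshold: float = 0.3
    nudge_on_timeout: bool = False


def parse_bool(value: object) -> bool:
    """Hypothetical helper: accept JSON booleans and common string spellings."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in {"true", "1", "yes", "on"}:
            return True
        if lowered in {"false", "0", "no", "off", ""}:
            return False
    raise ValueError(f"expected a boolean, got {value!r}")


def load_stt(raw_json: str) -> STTConfig:
    """Hypothetical loader: apply only the keys present, keep defaults otherwise."""
    config = STTConfig()
    stt = json.loads(raw_json).get("stt", {})
    if "vad_threshold" in stt:
        config.vad_threshold = float(stt["vad_threshold"])
    if "nudge_on_timeout" in stt:
        config.nudge_on_timeout = parse_bool(stt["nudge_on_timeout"])
    return config
```

With this sketch, `{"stt": {"nudge_on_timeout": "false"}}` leaves the nudge disabled, whereas plain `bool("false")` would enable it.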

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "voicesmith-mcp",
-  "version": "1.0.17",
+  "version": "1.0.18",
   "description": "Local AI voice for coding assistants — TTS & STT via MCP. Kokoro ONNX + faster-whisper, fully offline.",
   "bin": {
     "voicesmith-mcp": "bin/cli.js"

package/server.py CHANGED

@@ -565,6 +565,20 @@ async def speak_then_listen(
         listen_result = await listen(timeout=timeout, silence_threshold=silence_threshold)
+        # Optionally speak a nudge on timeout to prompt user to type instead
+        if (listen_result.get("error") == "timeout"
+                and _config and _config.stt.nudge_on_timeout
+                and _speech_queue):
+            nudge_text = "I didn't catch that. Go ahead and type it."
+            voice, _ = _registry.get_voice(name) if _registry else (None, False)
+            if voice and _tts_engine:
+                try:
+                    result = _tts_engine.synthesize(nudge_text, voice, speed)
+                    _audio_player.play(result.samples, result.sample_rate)
+                    listen_result["nudge_spoken"] = True
+                except Exception:
+                    pass
         return {"speak": speak_result, "listen": listen_result}
     finally:
         _suppress_duck = False
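The hunk above adds a best-effort spoken nudge when `listen` times out. To make the control flow easier to follow in isolation, here is a simplified sketch with the TTS engine and audio player collapsed into a single `speak` callable. `maybe_nudge` and `speak` are hypothetical names for this illustration. The real code uses the module-level `_tts_engine`, `_audio_player`, and `_registry` objects shown in the diff.

```python
from typing import Any, Callable, Dict

NUDGE_TEXT = "I didn't catch that. Go ahead and type it."


def maybe_nudge(
    listen_result: Dict[str, Any],
    nudge_enabled: bool,
    speak: Callable[[str], None],
) -> Dict[str, Any]:
    """Speak a short nudge after a listen timeout, if the setting is on.

    Mirrors the diff's behavior: any failure while speaking is swallowed,
    and `nudge_spoken` is set only when playback completed without error.
    """
    if listen_result.get("error") != "timeout" or not nudge_enabled:
        return listen_result
    try:
        speak(NUDGE_TEXT)
        listen_result["nudge_spoken"] = True
    except Exception:
        # Best effort: a failed nudge must not turn a timeout into a crash.
        pass
    return listen_result
```

Passing `speak` in as a parameter keeps the sketch testable without audio hardware. For example, `maybe_nudge({"error": "timeout"}, True, print)` prints the nudge text and marks the result with `nudge_spoken`.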

package/stt/__pycache__/mic_capture.cpython-314.pyc CHANGED

Binary file

package/tts/__pycache__/audio_player.cpython-314.pyc CHANGED

Binary file

package/tts/__pycache__/kokoro_engine.cpython-314.pyc CHANGED

Binary file

package/tts/__pycache__/media_duck.cpython-314.pyc CHANGED

Binary file

package/tts/__pycache__/speech_queue.cpython-314.pyc CHANGED

Binary file