npm - voicesmith-mcp - Versions diffs - 1.0.7 → 1.0.9 - Mend

voicesmith-mcp 1.0.7 → 1.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md +93 -0
package/hooks/session-start.sh +20 -5
package/package.json +1 -1
package/stt/__pycache__/mic_capture.cpython-314.pyc +0 -0
package/stt/mic_capture.py +17 -6

package/README.md CHANGED

@@ -30,6 +30,53 @@ The installer will:
 Restart your IDE session after installing. The AI will greet you by voice on the first response.
+## Usage
+> [!NOTE]
+> **Everything works out of the box.** After installing, just start a session — the AI speaks automatically. No configuration needed. The installer sets up voice behavior rules that teach the AI when and how to use its voice.
+What the AI does automatically:
+| Moment | What happens |
+|--------|-------------|
+| You give it a task | Speaks a brief acknowledgment |
+| It finishes work | Speaks a summary of what was done |
+| It has a question | Asks out loud, then listens for your voice response |
+| Voice tools unavailable | Falls back to text silently |
+---
+### Changing Voices Mid-Session
+Ask the AI to switch voices at any time:
+> *"Switch to Nova"*
+If the voice is available, the AI switches immediately. If it's occupied by another session, the AI will tell you and show available alternatives.
+Browse all 54 voices:
+> *"Show me the available voices"*
+Or preview them in a terminal: `npx voicesmith-mcp voices`
+---
+### Voice Persistence
+> [!TIP]
+> When you switch voices, the choice is saved automatically. Next time you start or resume a session, the AI uses the same voice — no need to switch again.
+---
+### Muting
+In a meeting or shared space? Just ask:
+> *"Mute the voice"*
+The AI continues working normally — it just won't play audio. Say *"unmute"* when you're ready.
 ## Alternative Install
 If you don't have Node.js or prefer a shell script:
@@ -104,6 +151,9 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
 - **mpv** — audio playback
 - ~500MB disk space for models
+> [!WARNING]
+> **Windows is not supported yet.** The server uses Unix-specific features (file locking, audio commands, process detection). Windows support is planned — see [TODO](TODO.md) for details.
 ## Supported IDEs
 | IDE | Config Location | Rules Location | Multi-Session |
@@ -112,6 +162,49 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
 | Cursor | `~/.cursor/mcp.json` | `~/.cursor/rules/voicesmith.mdc` | No (single server) |
 | Codex | `~/.codex/mcp.json` | `~/.codex/AGENTS.md` | No (single session) |
+## Troubleshooting
+### The AI can't hear me (listen returns empty or times out)
+**Check microphone permissions.** On macOS, the terminal app that runs your IDE needs microphone access:
+1. Open **System Settings > Privacy & Security > Microphone**
+2. Make sure your terminal app is listed and enabled:
+   - **Warp**, **Terminal.app**, or **iTerm2** — for Claude Code
+   - **Cursor** or **VS Code** — if using those IDEs directly
+3. If the app isn't listed, the first `listen` call should trigger the permission prompt. Approve it and try again.
+> [!IMPORTANT]
+> The Python process inherits microphone permissions from the app that launched it. If your terminal doesn't have mic access, listen will silently fail.
+**Check your audio input device.** If an external mic is selected but not connected, the server opens it but gets silence:
+- Open **System Settings > Sound > Input** and verify the correct mic is selected
+- Or ask the AI: *"What's the server status?"* — check that `stt.loaded` and `vad.loaded` are both `true`
+**Another app is using the mic.** Apps like Zoom, Teams, or FaceTime can hold exclusive mic access. Close them and try again.
+**Voice too quiet for VAD.** The voice activity detector might not pick up soft speech. You can lower the sensitivity threshold in `~/.local/share/voicesmith-mcp/config.json`:
+```json
+{
+  "stt": {
+    "vad_threshold": 0.2
+  }
+}
+```
+Lower values = more sensitive. Default is `0.3`. Restart the session after changing.
+### The AI doesn't speak
+- Check that **espeak-ng** and **mpv** are installed: `which espeak-ng mpv`
+- Check the AI's status: ask *"What's your voice status?"*
+- If muted, say *"Unmute"*
+### The AI speaks with the wrong voice
+This can happen when another session is holding your preferred voice name. Ask the AI: *"Switch to Eric"* — it will either switch or tell you what's available.
 ## Uninstall
 ```bash

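The README's troubleshooting section gives four concrete details:

- the config path `~/.local/share/voicesmith-mcp/config.json`
- the `stt.vad_threshold` key and its default of `0.3`
- the rule that lower values are more sensitive
- the `which espeak-ng mpv` check

The sketch below assumes only those details. It reads the threshold with that default and checks that both playback tools are on `PATH`.

The diff does not show how the server applies the threshold. `speech_frames` assumes a common VAD gating rule: a frame counts as speech when its probability is at or above the threshold, so lowering the threshold admits softer speech. `read_config`, `load_vad_threshold`, `speech_frames`, and `missing_tools` are hypothetical helpers, not part of voicesmith-mcp.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Sequence

# Path, key, and default come from the README; every helper here is hypothetical.
CONFIG_PATH = Path.home() / ".local" / "share" / "voicesmith-mcp" / "config.json"
DEFAULT_VAD_THRESHOLD = 0.3


def read_config(path: Path = CONFIG_PATH) -> Dict[str, Any]:
    """Parse the config file; a missing file means all defaults apply."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return {}


def load_vad_threshold(config: Dict[str, Any]) -> float:
    """Return stt.vad_threshold, falling back to the documented default of 0.3."""
    return float(config.get("stt", {}).get("vad_threshold", DEFAULT_VAD_THRESHOLD))


def speech_frames(probabilities: List[float], threshold: float) -> List[bool]:
    """Assumed gating rule: a frame is speech when its probability >= threshold."""
    return [p >= threshold for p in probabilities]


def missing_tools(
    names: Sequence[str] = ("espeak-ng", "mpv"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Python equivalent of `which espeak-ng mpv`: list the tools not found on PATH."""
    return [name for name in names if which(name) is None]
```

Consider per-frame probabilities `[0.1, 0.25, 0.35]`. `speech_frames(probs, 0.3)` flags only the last frame, while `speech_frames(probs, 0.2)` also flags the middle one.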
package/hooks/session-start.sh CHANGED

@@ -38,7 +38,16 @@ try:
                 raise SystemExit
         except (OSError, ProcessLookupError):
             pass
-    # Fallback: most recent alive session
+    # Prefer the most recent session without a session_id (just registered, waiting for hook)
+    for s in reversed(data.get('sessions', [])):
+        try:
+            os.kill(s['pid'], 0)
+            if not s.get('session_id'):
+                print(s['port'])
+                raise SystemExit
+        except (OSError, ProcessLookupError):
+            pass
+    # Final fallback: most recent alive session
     for s in reversed(data.get('sessions', [])):
         try:
             os.kill(s['pid'], 0)
@@ -51,11 +60,17 @@ except:
 " 2>/dev/null)
     # Send session_id to the server if we have both port and session_id
+    # Retry up to 3 times — the HTTP listener may not be ready yet
     if [ -n "$PORT" ] && [ -n "$SESSION_ID" ]; then
-        RESPONSE=$(curl -s --max-time 3 -X POST \
-            -H "Content-Type: application/json" \
-            -d "{\"session_id\": \"$SESSION_ID\"}" \
-            "http://127.0.0.1:$PORT/session" 2>/dev/null)
+        RESPONSE=""
+        for attempt in 1 2 3; do
+            RESPONSE=$(curl -s --max-time 3 -X POST \
+                -H "Content-Type: application/json" \
+                -d "{\"session_id\": \"$SESSION_ID\"}" \
+                "http://127.0.0.1:$PORT/session" 2>/dev/null)
+            [ -n "$RESPONSE" ] && break
+            sleep 0.5
+        done
         if [ -n "$RESPONSE" ]; then
             SESSION_NAME=$(echo "$RESPONSE" | python3 -c "

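The two hunks in `session-start.sh` change two things: which registered server the hook picks, and how persistently it posts the session ID. The sketch below restates that logic in Python so each step can be tested in isolation.

- `pick_port` mirrors the two passes visible in the diff. It first tries the newest live session that still lacks a `session_id`, then falls back to the newest live session of any kind. The earlier pass above the hunk is cut off, so it is omitted here.
- `post_with_retry` mirrors the three-attempt loop with a 0.5 s pause.

The session dictionaries use the `pid`, `port`, and `session_id` keys the hook reads. `make_sender` is a hypothetical standard-library stand-in for the `curl` call, not code from the package.

```python
import json
import os
import time
import urllib.request
from typing import Any, Callable, Dict, List, Optional

Session = Dict[str, Any]  # registry entry: {"pid": int, "port": int, optional "session_id": str}


def pid_alive(pid: int) -> bool:
    """Signal 0 checks that the process exists without delivering a signal."""
    try:
        os.kill(pid, 0)
    except OSError:  # includes ProcessLookupError; like the hook, treat any error as "skip"
        return False
    return True


def pick_port(sessions: List[Session], alive: Callable[[int], bool] = pid_alive) -> Optional[int]:
    newest_first = list(reversed(sessions))
    # Pass 1: newest live server that has not been told its session_id yet.
    for s in newest_first:
        if alive(s["pid"]) and not s.get("session_id"):
            return s["port"]
    # Pass 2 (final fallback): newest live server of any kind.
    for s in newest_first:
        if alive(s["pid"]):
            return s["port"]
    return None


def post_with_retry(send: Callable[[], str], attempts: int = 3, delay: float = 0.5) -> str:
    """Call send() until it returns a non-empty body or the attempts run out."""
    response = ""
    for attempt in range(attempts):
        response = send()
        if response:
            break
        if attempt < attempts - 1:  # the shell loop also sleeps after its last try; skipped here
            time.sleep(delay)
    return response


def make_sender(port: int, session_id: str, timeout: float = 3.0) -> Callable[[], str]:
    """Hypothetical stand-in for the hook's curl POST to /session."""
    def send() -> str:
        req = urllib.request.Request(
            f"http://127.0.0.1:{port}/session",
            data=json.dumps({"session_id": session_id}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.read().decode()
        except OSError:  # URLError, HTTPError, refused connections, and timeouts
            return ""
    return send
```

In use, the hook's flow becomes `port = pick_port(sessions)` followed by `post_with_retry(make_sender(port, session_id))` when both values are set. Building the body with `json.dumps` escapes the session ID. The shell version instead interpolates `$SESSION_ID` directly into the JSON string, which matters only if an ID could contain quotes or backslashes.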
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "voicesmith-mcp",
-  "version": "1.0.7",
+  "version": "1.0.9",
   "description": "Local AI voice for coding assistants — TTS & STT via MCP. Kokoro ONNX + faster-whisper, fully offline.",
   "bin": {
     "voicesmith-mcp": "bin/cli.js"

package/stt/__pycache__/mic_capture.cpython-314.pyc CHANGED

Binary file

package/stt/mic_capture.py CHANGED

@@ -2,6 +2,7 @@
 import asyncio
 import queue
+import time
 from typing import Optional
 import numpy as np
@@ -60,6 +61,7 @@ class MicCapture:
         silence_duration = 0.0
         loop = asyncio.get_event_loop()
+        stream = None
         try:
             stream = sd.InputStream(
                 samplerate=self._sample_rate,
@@ -77,15 +79,15 @@ class MicCapture:
                 # Check cancellation
                 if cancel_event and cancel_event.is_set():
                     logger.info("Recording cancelled by event")
-                    return None
+                    break
                 # Check timeout
                 elapsed = asyncio.get_event_loop().time() - start_time
                 if elapsed >= timeout:
                     if not speech_detected:
                         logger.info("Recording timed out with no speech detected")
-                        return None
-                    logger.info("Recording timed out")
+                    else:
+                        logger.info("Recording timed out")
                     break
                 # Get audio chunk from queue
@@ -112,9 +114,6 @@ class MicCapture:
                         )
                         break
-            stream.stop()
-            stream.close()
             if not chunks or not speech_detected:
                 return None
@@ -125,6 +124,18 @@ class MicCapture:
         except Exception as e:
             raise MicCaptureError(f"Recording failed: {e}") from e
         finally:
+            # Safely tear down the audio stream. The CoreAudio IO thread may
+            # still be executing the callback when we call stop(). Wait briefly
+            # between stop() and close() to let the IO thread finish — this
+            # prevents the segfault in libffi/PortAudio where the callback
+            # dereferences freed memory.
+            if stream is not None:
+                try:
+                    stream.stop()
+                    time.sleep(0.05)  # Let CoreAudio IO thread finish
+                    stream.close()
+                except Exception as e:
+                    logger.debug(f"Stream teardown: {e}")
             self._recording = False
     def _audio_callback(self, indata, frames, time, status) -> None:
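The `mic_capture.py` change is a resource-cleanup pattern with four parts:

- bind `stream` to `None` before the `try`
- move `stop()` and `close()` from the happy path into `finally`, so the cancellation and timeout exits can no longer skip them
- pause briefly between the two calls
- log teardown errors rather than raise them

The sketch below isolates that pattern. `Stream` is a hypothetical protocol standing in for `sd.InputStream`, which does provide `start()`, `stop()`, and `close()`. `open_stream` and `read_chunk` are hypothetical callables, so the code runs without audio hardware. The 50 ms pause is the value from the diff: it gives the callback thread time to finish, but it is a timing mitigation, not a guarantee.

```python
import logging
import time
from typing import Callable, List, Optional, Protocol

logger = logging.getLogger(__name__)


class Stream(Protocol):
    """Hypothetical minimal interface; sd.InputStream has these methods."""
    def start(self) -> None: ...
    def stop(self) -> None: ...
    def close(self) -> None: ...


def capture(
    open_stream: Callable[[], Stream],
    read_chunk: Callable[[], Optional[bytes]],
    max_chunks: int,
    settle: float = 0.05,
) -> Optional[bytes]:
    stream: Optional[Stream] = None  # bound first so finally never sees an unset name
    chunks: List[bytes] = []
    try:
        stream = open_stream()
        stream.start()
        for _ in range(max_chunks):
            chunk = read_chunk()
            if chunk is None:  # cancellation or timeout: leave the loop, teardown still runs
                break
            chunks.append(chunk)
        return b"".join(chunks) or None
    finally:
        if stream is not None:
            try:
                stream.stop()
                time.sleep(settle)  # let the audio IO thread leave its callback
                stream.close()
            except Exception as e:  # if stop() raises, close() is skipped, as in the diff
                logger.debug("Stream teardown: %s", e)
```

If `open_stream()` itself raises, `stream` is still `None`, so the `finally` block skips teardown. The original exception then propagates instead of being masked by an `UnboundLocalError`.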