npm - voicesmith-mcp - Versions diffs - 1.0.8 → 1.0.10 - Mend

voicesmith-mcp 1.0.8 → 1.0.10


Files changed (5)

package/README.md +93 -0
package/package.json +1 -1
package/stt/__pycache__/mic_capture.cpython-314.pyc +0 -0
package/stt/mic_capture.py +17 -6
package/templates/voice-rules.md +15 -11

package/README.md CHANGED

@@ -30,6 +30,53 @@ The installer will:
 Restart your IDE session after installing. The AI will greet you by voice on the first response.
+## Usage
+> [!NOTE]
+> **Everything works out of the box.** After installing, just start a session — the AI speaks automatically. No configuration needed. The installer sets up voice behavior rules that teach the AI when and how to use its voice.
+What the AI does automatically:
+| Moment | What happens |
+|--------|-------------|
+| You give it a task | Speaks a brief acknowledgment |
+| It finishes work | Speaks a summary of what was done |
+| It has a question | Asks out loud, then listens for your voice response |
+| Voice tools unavailable | Falls back to text silently |
+---
+### Changing Voices Mid-Session
+Ask the AI to switch voices at any time:
+> *"Switch to Nova"*
+If the voice is available, the AI switches immediately. If it's occupied by another session, the AI will tell you and show available alternatives.
+Browse all 54 voices:
+> *"Show me the available voices"*
+Or preview them in a terminal: `npx voicesmith-mcp voices`
+---
+### Voice Persistence
+> [!TIP]
+> When you switch voices, the choice is saved automatically. Next time you start or resume a session, the AI uses the same voice — no need to switch again.
+---
+### Muting
+In a meeting or shared space? Just ask:
+> *"Mute the voice"*
+The AI continues working normally — it just won't play audio. Say *"unmute"* when you're ready.
 ## Alternative Install
 If you don't have Node.js or prefer a shell script:
@@ -104,6 +151,9 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
 - **mpv** — audio playback
 - ~500MB disk space for models
+> [!WARNING]
+> **Windows is not supported yet.** The server uses Unix-specific features (file locking, audio commands, process detection). Windows support is planned — see [TODO](TODO.md) for details.
 ## Supported IDEs
 | IDE | Config Location | Rules Location | Multi-Session |
@@ -112,6 +162,49 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
 | Cursor | `~/.cursor/mcp.json` | `~/.cursor/rules/voicesmith.mdc` | No (single server) |
 | Codex | `~/.codex/mcp.json` | `~/.codex/AGENTS.md` | No (single session) |
+## Troubleshooting
+### The AI can't hear me (listen returns empty or times out)
+**Check microphone permissions.** On macOS, the terminal app that runs your IDE needs microphone access:
+1. Open **System Settings > Privacy & Security > Microphone**
+2. Make sure your terminal app is listed and enabled:
+   - **Warp**, **Terminal.app**, or **iTerm2** — for Claude Code
+   - **Cursor** or **VS Code** — if using those IDEs directly
+3. If the app isn't listed, the first `listen` call should trigger the permission prompt. Approve it and try again.
+> [!IMPORTANT]
+> The Python process inherits microphone permissions from the app that launched it. If your terminal doesn't have mic access, listen will silently fail.
+**Check your audio input device.** If an external mic is selected but not connected, the server opens it but gets silence:
+- Open **System Settings > Sound > Input** and verify the correct mic is selected
+- Or ask the AI: *"What's the server status?"* — check that `stt.loaded` and `vad.loaded` are both `true`
+**Another app is using the mic.** Apps like Zoom, Teams, or FaceTime can hold exclusive mic access. Close them and try again.
+**Voice too quiet for VAD.** The voice activity detector might not pick up soft speech. You can lower the sensitivity threshold in `~/.local/share/voicesmith-mcp/config.json`:
+```json
+{
+  "stt": {
+    "vad_threshold": 0.2
+  }
+}
+```
+Lower values = more sensitive. Default is `0.3`. Restart the session after changing.
+### The AI doesn't speak
+- Check that **espeak-ng** and **mpv** are installed: `which espeak-ng mpv`
+- Check the AI's status: ask *"What's your voice status?"*
+- If muted, say *"Unmute"*
+### The AI speaks with the wrong voice
+This can happen when another session is holding your preferred voice name. Ask the AI: *"Switch to Eric"* — it will either switch or tell you what's available.
 ## Uninstall
 ```bash

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "voicesmith-mcp",
-  "version": "1.0.8",
+  "version": "1.0.10",
   "description": "Local AI voice for coding assistants — TTS & STT via MCP. Kokoro ONNX + faster-whisper, fully offline.",
   "bin": {
     "voicesmith-mcp": "bin/cli.js"

package/stt/__pycache__/mic_capture.cpython-314.pyc CHANGED

Binary file

package/stt/mic_capture.py CHANGED

@@ -2,6 +2,7 @@
 import asyncio
 import queue
+import time
 from typing import Optional
 import numpy as np
@@ -60,6 +61,7 @@ class MicCapture:
         silence_duration = 0.0
         loop = asyncio.get_event_loop()
+        stream = None
         try:
             stream = sd.InputStream(
                 samplerate=self._sample_rate,
@@ -77,15 +79,15 @@ class MicCapture:
                 # Check cancellation
                 if cancel_event and cancel_event.is_set():
                     logger.info("Recording cancelled by event")
-                    return None
+                    break
                 # Check timeout
                 elapsed = asyncio.get_event_loop().time() - start_time
                 if elapsed >= timeout:
                     if not speech_detected:
                         logger.info("Recording timed out with no speech detected")
-                        return None
-                    logger.info("Recording timed out")
+                    else:
+                        logger.info("Recording timed out")
                     break
                 # Get audio chunk from queue
@@ -112,9 +114,6 @@ class MicCapture:
                         )
                         break
-            stream.stop()
-            stream.close()
             if not chunks or not speech_detected:
                 return None
@@ -125,6 +124,18 @@ class MicCapture:
         except Exception as e:
             raise MicCaptureError(f"Recording failed: {e}") from e
         finally:
+            # Safely tear down the audio stream. The CoreAudio IO thread may
+            # still be executing the callback when we call stop(). Wait briefly
+            # between stop() and close() to let the IO thread finish — this
+            # prevents the segfault in libffi/PortAudio where the callback
+            # dereferences freed memory.
+            if stream is not None:
+                try:
+                    stream.stop()
+                    time.sleep(0.05)  # Let CoreAudio IO thread finish
+                    stream.close()
+                except Exception as e:
+                    logger.debug(f"Stream teardown: {e}")
             self._recording = False
     def _audio_callback(self, indata, frames, time, status) -> None:
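
The `mic_capture.py` hunks above move stream teardown into the `finally` block, so every exit path (cancellation, timeout, error, normal return) stops and closes the stream. The pattern can be sketched in isolation as follows. This is a minimal illustration, not the package's code: `FakeStream`, `record`, `stream_factory`, `fail`, and `settle_seconds` are hypothetical names, and the fake stream stands in for `sd.InputStream` so the sketch runs without audio hardware.

```python
import time


class FakeStream:
    """Hypothetical stand-in for an audio input stream; records teardown calls."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")

    def close(self):
        self.calls.append("close")


def record(stream_factory, fail=False, settle_seconds=0.05):
    """Open a stream, do some work, and always tear the stream down in finally."""
    stream = None  # defined before try so the finally block can test it safely
    try:
        stream = stream_factory()
        stream.start()
        if fail:
            raise RuntimeError("simulated capture failure")
        return "ok"
    finally:
        if stream is not None:
            try:
                stream.stop()
                time.sleep(settle_seconds)  # brief pause between stop() and close()
                stream.close()
            except Exception as exc:
                # Teardown problems are reported rather than raised, as in the diff.
                print(f"Stream teardown: {exc}")
```

Initialising `stream = None` before the `try` matters: if the factory itself raises, the `finally` block still runs, and the `is not None` check skips teardown instead of failing on an unbound name.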

package/templates/voice-rules.md CHANGED Viewed

@@ -7,6 +7,7 @@ You have access to voice tools via the VoiceSmith MCP server.
 - **IMPORTANT:** If your session context says "Your assigned voice for this session is: [Name]", use THAT name — not "{{MAIN_AGENT}}". This is your real identity for this session.
 - On your first response, speak a brief intro using your assigned name: "[Name] here, ready to go."
 - Do not use your assigned name for sub-agents. Each agent needs its own unique name.
+- Tone: Be conversational and natural. Match the user's energy — casual if they're casual, focused if they're focused.
 ## Voice Switching
 - If the user asks to switch to a voice and `speak` returns `"error": "name_occupied"`, tell the user that voice is occupied by another session.
@@ -14,25 +15,28 @@ You have access to voice tools via the VoiceSmith MCP server.
 - Do NOT silently fall back to a different voice.
 ## Speaking
-- Speak twice per response:
-  1. **Opening** — Brief acknowledgment when starting work. Use `block: false` so work begins immediately in parallel.
-  2. **Closing** — Summary when done. Use `block: true`. Never skip this.
-- **Questions that need user input → use `speak_then_listen` as your closing voice.** If your response asks the user to make a decision, provide information, or confirm something (e.g., "which approach?", "should I?", "want me to?", "does this look right?"), your closing voice MUST be `speak_then_listen` — not regular `speak`. This way the mic opens right after you ask.
-- Rhetorical wrap-ups ("What's next?", "Standing by.") do NOT require listen — use regular `speak` for those.
-- Keep spoken messages to 1-2 sentences. Write details, speak summaries.
-- Do not speak code, file paths, or long lists aloud.
-- Speak at transitions only: start, finish, error, question. Do not narrate every action.
+- **Opening** — Only speak at the start when you have something meaningful to say (e.g., clarifying your approach, flagging an issue). Do NOT speak filler acknowledgments like "Let me look into that." Use `block: false` when you do speak an opening.
+- **Closing** — Always speak a summary when done. Use `block: true`. Never skip the closing.
+- **Questions requiring user input → use `speak_then_listen` as your closing.** If the user literally cannot continue without providing input (e.g., choosing between options, confirming a destructive action, providing missing info), use `speak_then_listen`. If you can reasonably continue without their answer, use regular `speak`.
+- Keep spoken output brief — prefer 1-2 sentences, never exceed 3. Write details, speak summaries. No code or paths aloud.
+## Speed Preferences
+- The `speak` tool accepts a `speed` parameter (default 1.0). Values < 1.0 are slower, > 1.0 are faster.
+- If the user asks to speak slower or faster, adjust the speed and remember their preference for the session.
 ## Listening
-- Use `speak_then_listen` whenever you need user input — it is your closing voice AND listen in one call.
+- Use `speak_then_listen` whenever you need user input — it combines speaking and opening the mic in one call.
 - If `listen` returns timeout or cancelled, fall back to requesting text input. Do not retry `listen`.
 ## Sub-Agents
-- Before assigning a name to a sub-agent, call `get_voice_registry` to see which names are already taken and which voices are available.
-- Pick a name that matches an available Kokoro voice (the voice ID suffix is the name — e.g., af_nova → "Nova", am_fenrir → "Fenrir").
+- Pick voice names matching available Kokoro voices (the voice ID suffix is the name — e.g., af_nova → "Nova", am_fenrir → "Fenrir").
 - Each sub-agent must use its own unique name. Never reuse "{{MAIN_AGENT}}".
 - On handoffs, both agents speak: the outgoing agent announces the handoff, the incoming agent acknowledges before starting.
+## Error Handling
+- If `speak` or `speak_then_listen` fails, fall back to text silently. Do not retry.
+- If `listen` times out, fall back to text. Do not retry.
 ## Fallback
 - If voice tools are not available, respond in text only. Do not mention voice capabilities.
 - If muted, `speak` succeeds silently. Do not call `unmute` unless the user asks.
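
The Sub-Agents rule in the template above says the voice ID suffix is the name (`af_nova` → "Nova", `am_fenrir` → "Fenrir"). A minimal sketch of that mapping follows, assuming IDs always have the form `<prefix>_<name>`. The helpers `voice_name` and `pick_unused` are hypothetical and are not part of the VoiceSmith server.

```python
def voice_name(voice_id: str) -> str:
    """Return the display name for a voice ID, e.g. 'af_nova' -> 'Nova'."""
    prefix, sep, suffix = voice_id.partition("_")
    if not sep or not suffix:
        raise ValueError(f"expected '<prefix>_<name>', got {voice_id!r}")
    return suffix.capitalize()


def pick_unused(voice_ids, taken_names):
    """Return the first derived name not already taken, or None if all are taken."""
    taken = {name.lower() for name in taken_names}
    for voice_id in voice_ids:
        name = voice_name(voice_id)
        if name.lower() not in taken:
            return name
    return None
```

For example, `pick_unused(["af_nova", "am_fenrir"], ["Nova"])` skips the occupied name and returns the next free one, which matches the rule that each sub-agent must use its own unique name.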