voicesmith-mcp 1.0.8 → 1.0.10

package/README.md CHANGED
@@ -30,6 +30,53 @@ The installer will:
30
30
 
31
31
  Restart your IDE session after installing. The AI will greet you by voice on the first response.
32
32
 
33
+ ## Usage
34
+
35
+ > [!NOTE]
36
+ > **Everything works out of the box.** After installing, just start a session — the AI speaks automatically. No configuration needed. The installer sets up voice behavior rules that teach the AI when and how to use its voice.
37
+
38
+ What the AI does automatically:
39
+
40
+ | Moment | What happens |
41
+ |--------|-------------|
42
+ | You give it a task | Speaks a brief acknowledgment |
43
+ | It finishes work | Speaks a summary of what was done |
44
+ | It has a question | Asks out loud, then listens for your voice response |
45
+ | Voice tools unavailable | Falls back to text silently |
46
+
47
+ ---
48
+
49
+ ### Changing Voices Mid-Session
50
+
51
+ Ask the AI to switch voices at any time:
52
+
53
+ > *"Switch to Nova"*
54
+
55
+ If the voice is available, the AI switches immediately. If it's occupied by another session, the AI will tell you and show available alternatives.
56
+
57
+ Browse all 54 voices:
58
+
59
+ > *"Show me the available voices"*
60
+
61
+ Or preview them in a terminal: `npx voicesmith-mcp voices`
62
+
63
+ ---
64
+
65
+ ### Voice Persistence
66
+
67
+ > [!TIP]
68
+ > When you switch voices, the choice is saved automatically. Next time you start or resume a session, the AI uses the same voice — no need to switch again.
69
+
70
+ ---
71
+
72
+ ### Muting
73
+
74
+ In a meeting or shared space? Just ask:
75
+
76
+ > *"Mute the voice"*
77
+
78
+ The AI continues working normally — it just won't play audio. Say *"unmute"* when you're ready.
79
+
33
80
  ## Alternative Install
34
81
 
35
82
  If you don't have Node.js or prefer a shell script:
@@ -104,6 +151,9 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
104
151
  - **mpv** — audio playback
105
152
  - ~500MB disk space for models
106
153
 
154
+ > [!WARNING]
155
+ > **Windows is not supported yet.** The server uses Unix-specific features (file locking, audio commands, process detection). Windows support is planned — see [TODO](TODO.md) for details.
156
+
107
157
  ## Supported IDEs
108
158
 
109
159
  | IDE | Config Location | Rules Location | Multi-Session |
@@ -112,6 +162,49 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
112
162
  | Cursor | `~/.cursor/mcp.json` | `~/.cursor/rules/voicesmith.mdc` | No (single server) |
113
163
  | Codex | `~/.codex/mcp.json` | `~/.codex/AGENTS.md` | No (single session) |
114
164
 
165
+ ## Troubleshooting
166
+
167
+ ### The AI can't hear me (listen returns empty or times out)
168
+
169
+ **Check microphone permissions.** On macOS, the terminal app that runs your IDE needs microphone access:
170
+
171
+ 1. Open **System Settings > Privacy & Security > Microphone**
172
+ 2. Make sure your terminal app is listed and enabled:
173
+ - **Warp**, **Terminal.app**, or **iTerm2** — for Claude Code
174
+ - **Cursor** or **VS Code** — if using those IDEs directly
175
+ 3. If the app isn't listed, the first `listen` call should trigger the permission prompt. Approve it and try again.
176
+
177
+ > [!IMPORTANT]
178
+ > The Python process inherits microphone permissions from the app that launched it. If your terminal doesn't have mic access, listen will silently fail.
179
+
180
+ **Check your audio input device.** If an external mic is selected but not connected, the server opens it but gets silence:
181
+ - Open **System Settings > Sound > Input** and verify the correct mic is selected
182
+ - Or ask the AI: *"What's the server status?"* — check that `stt.loaded` and `vad.loaded` are both `true`
183
+
184
+ **Another app is using the mic.** Apps like Zoom, Teams, or FaceTime can hold exclusive mic access. Close them and try again.
185
+
186
+ **Voice too quiet for VAD.** The voice activity detector might not pick up soft speech. You can lower the sensitivity threshold in `~/.local/share/voicesmith-mcp/config.json`:
187
+
188
+ ```json
189
+ {
190
+ "stt": {
191
+ "vad_threshold": 0.2
192
+ }
193
+ }
194
+ ```
195
+
196
+ Lower values = more sensitive. Default is `0.3`. Restart the session after changing.
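The config change above can also be scripted. What follows is a minimal sketch, not part of the package: `set_vad_threshold` is a hypothetical helper, and the accepted range (strictly between 0 and 1) is an assumption, since the README only states that lower values are more sensitive and that the default is `0.3`. It merges the new value into the existing file so other settings are kept.

```python
import json
from pathlib import Path


def set_vad_threshold(config_path: Path, threshold: float) -> dict:
    """Write stt.vad_threshold into the config file, keeping other keys."""
    if not 0.0 < threshold < 1.0:
        # Assumed valid range; the README only says lower = more sensitive.
        raise ValueError("threshold is assumed to lie strictly between 0 and 1")

    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))

    # Merge instead of overwriting so unrelated settings survive.
    config.setdefault("stt", {})["vad_threshold"] = threshold

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

A call such as `set_vad_threshold(Path.home() / ".local/share/voicesmith-mcp/config.json", 0.2)` would apply the value shown above; the session still needs a restart afterwards.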
197
+
198
+ ### The AI doesn't speak
199
+
200
+ - Check that **espeak-ng** and **mpv** are installed: `which espeak-ng mpv`
201
+ - Check the AI's status: ask *"What's your voice status?"*
202
+ - If muted, say *"Unmute"*
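The first check above (`which espeak-ng mpv`) has a direct Python equivalent in the standard library. This is only an illustrative sketch; `missing_tools` is a hypothetical name and is not provided by voicesmith-mcp.

```python
import shutil


def missing_tools(tools=("espeak-ng", "mpv")):
    """Return the entries of `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("espeak-ng and mpv are both on PATH")
```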
203
+
204
+ ### The AI speaks with the wrong voice
205
+
206
+ This can happen when another session is holding your preferred voice name. Ask the AI: *"Switch to Eric"* — it will either switch or tell you what's available.
207
+
115
208
  ## Uninstall
116
209
 
117
210
  ```bash
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "voicesmith-mcp",
3
- "version": "1.0.8",
3
+ "version": "1.0.10",
4
4
  "description": "Local AI voice for coding assistants — TTS & STT via MCP. Kokoro ONNX + faster-whisper, fully offline.",
5
5
  "bin": {
6
6
  "voicesmith-mcp": "bin/cli.js"
@@ -2,6 +2,7 @@
2
2
 
3
3
  import asyncio
4
4
  import queue
5
+ import time
5
6
  from typing import Optional
6
7
 
7
8
  import numpy as np
@@ -60,6 +61,7 @@ class MicCapture:
60
61
  silence_duration = 0.0
61
62
  loop = asyncio.get_event_loop()
62
63
 
64
+ stream = None
63
65
  try:
64
66
  stream = sd.InputStream(
65
67
  samplerate=self._sample_rate,
@@ -77,15 +79,15 @@ class MicCapture:
77
79
  # Check cancellation
78
80
  if cancel_event and cancel_event.is_set():
79
81
  logger.info("Recording cancelled by event")
80
- return None
82
+ break
81
83
 
82
84
  # Check timeout
83
85
  elapsed = asyncio.get_event_loop().time() - start_time
84
86
  if elapsed >= timeout:
85
87
  if not speech_detected:
86
88
  logger.info("Recording timed out with no speech detected")
87
- return None
88
- logger.info("Recording timed out")
89
+ else:
90
+ logger.info("Recording timed out")
89
91
  break
90
92
 
91
93
  # Get audio chunk from queue
@@ -112,9 +114,6 @@ class MicCapture:
112
114
  )
113
115
  break
114
116
 
115
- stream.stop()
116
- stream.close()
117
-
118
117
  if not chunks or not speech_detected:
119
118
  return None
120
119
 
@@ -125,6 +124,18 @@ class MicCapture:
125
124
  except Exception as e:
126
125
  raise MicCaptureError(f"Recording failed: {e}") from e
127
126
  finally:
127
+ # Safely tear down the audio stream. The CoreAudio IO thread may
128
+ # still be executing the callback when we call stop(). Wait briefly
129
+ # between stop() and close() to let the IO thread finish — this
130
+ # prevents the segfault in libffi/PortAudio where the callback
131
+ # dereferences freed memory.
132
+ if stream is not None:
133
+ try:
134
+ stream.stop()
135
+ time.sleep(0.05) # Let CoreAudio IO thread finish
136
+ stream.close()
137
+ except Exception as e:
138
+ logger.debug(f"Stream teardown: {e}")
128
139
  self._recording = False
129
140
 
130
141
  def _audio_callback(self, indata, frames, time, status) -> None:
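The teardown ordering introduced in the `finally` block above can be illustrated in isolation. The following is a minimal sketch, not the package's code: `FakeStream`, `record_once`, and `settle_seconds` are hypothetical names, and the fake stream stands in for `sd.InputStream` so the example runs without audio hardware. It shows the two points of the change: `stream` is bound to `None` before `try` so that `finally` can test it, and `stop()` is followed by a short pause before `close()`.

```python
import time


class FakeStream:
    """Hypothetical stand-in for an audio stream; records the call order."""

    def __init__(self):
        self.calls = []

    def stop(self):
        self.calls.append("stop")

    def close(self):
        self.calls.append("close")


def record_once(open_stream, settle_seconds=0.05):
    """Open a stream, do some work, and always tear it down in `finally`."""
    stream = None  # Bound before `try` so `finally` can test it safely.
    try:
        stream = open_stream()
        return "done"
    finally:
        if stream is not None:
            try:
                stream.stop()
                time.sleep(settle_seconds)  # Pause between stop() and close().
                stream.close()
            except Exception as exc:
                # Teardown problems are reported, not raised.
                print(f"Stream teardown: {exc}")
```

If `open_stream` itself raises, `stream` is still `None`, so the `finally` block skips teardown and the original exception propagates.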
@@ -7,6 +7,7 @@ You have access to voice tools via the VoiceSmith MCP server.
7
7
  - **IMPORTANT:** If your session context says "Your assigned voice for this session is: [Name]", use THAT name — not "{{MAIN_AGENT}}". This is your real identity for this session.
8
8
  - On your first response, speak a brief intro using your assigned name: "[Name] here, ready to go."
9
9
  - Do not use your assigned name for sub-agents. Each agent needs its own unique name.
10
+ - Tone: Be conversational and natural. Match the user's energy — casual if they're casual, focused if they're focused.
10
11
 
11
12
  ## Voice Switching
12
13
  - If the user asks to switch to a voice and `speak` returns `"error": "name_occupied"`, tell the user that voice is occupied by another session.
@@ -14,25 +15,28 @@ You have access to voice tools via the VoiceSmith MCP server.
14
15
  - Do NOT silently fall back to a different voice.
15
16
 
16
17
  ## Speaking
17
- - Speak twice per response:
18
- 1. **Opening** — Brief acknowledgment when starting work. Use `block: false` so work begins immediately in parallel.
19
- 2. **Closing:** Summary when done. Use `block: true`. Never skip this.
20
- - **Questions that need user input use `speak_then_listen` as your closing voice.** If your response asks the user to make a decision, provide information, or confirm something (e.g., "which approach?", "should I?", "want me to?", "does this look right?"), your closing voice MUST be `speak_then_listen` — not regular `speak`. This way the mic opens right after you ask.
21
- - Rhetorical wrap-ups ("What's next?", "Standing by.") do NOT require listen — use regular `speak` for those.
22
- - Keep spoken messages to 1-2 sentences. Write details, speak summaries.
23
- - Do not speak code, file paths, or long lists aloud.
24
- - Speak at transitions only: start, finish, error, question. Do not narrate every action.
18
+ - **Opening:** Only speak at the start when you have something meaningful to say (e.g., clarifying your approach, flagging an issue). Do NOT speak filler acknowledgments like "Let me look into that." Use `block: false` when you do speak an opening.
19
+ - **Closing** — Always speak a summary when done. Use `block: true`. Never skip the closing.
20
+ - **Questions requiring user input → use `speak_then_listen` as your closing.** If the user literally cannot continue without providing input (e.g., choosing between options, confirming a destructive action, providing missing info), use `speak_then_listen`. If you can reasonably continue without their answer, use regular `speak`.
21
+ - Keep spoken output brief: prefer 1-2 sentences, never exceed 3. Write details, speak summaries. No code or paths aloud.
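The closing rule in this section reduces to one decision: does the user have to answer before work can continue? A minimal sketch of that decision, assuming a hypothetical `closing_call` helper and a hypothetical `text` argument name (the real tool schemas are not shown in this diff; only the tool names and `block` come from the rules):

```python
def closing_call(summary: str, needs_user_input: bool) -> dict:
    """Pick the closing voice tool according to the Speaking rules."""
    if needs_user_input:
        # The user cannot continue without answering, so open the mic.
        return {"tool": "speak_then_listen", "text": summary}
    # Otherwise a blocking `speak` closes the response.
    return {"tool": "speak", "text": summary, "block": True}
```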
22
+
23
+ ## Speed Preferences
24
+ - The `speak` tool accepts a `speed` parameter (default 1.0). Values < 1.0 are slower, > 1.0 are faster.
25
+ - If the user asks to speak slower or faster, adjust the speed and remember their preference for the session.
25
26
 
26
27
  ## Listening
27
- - Use `speak_then_listen` whenever you need user input — it is your closing voice AND listen in one call.
28
+ - Use `speak_then_listen` whenever you need user input — it combines speaking and opening the mic in one call.
28
29
  - If `listen` returns timeout or cancelled, fall back to requesting text input. Do not retry `listen`.
29
30
 
30
31
  ## Sub-Agents
31
- - Before assigning a name to a sub-agent, call `get_voice_registry` to see which names are already taken and which voices are available.
32
- - Pick a name that matches an available Kokoro voice (the voice ID suffix is the name — e.g., af_nova → "Nova", am_fenrir → "Fenrir").
32
+ - Pick voice names matching available Kokoro voices (the voice ID suffix is the name, e.g., af_nova → "Nova", am_fenrir → "Fenrir").
33
33
  - Each sub-agent must use its own unique name. Never reuse "{{MAIN_AGENT}}".
34
34
  - On handoffs, both agents speak: the outgoing agent announces the handoff, the incoming agent acknowledges before starting.
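The naming convention mentioned in this section (voice ID suffix becomes the name) can be sketched as a small function. `voice_name` is a hypothetical helper for illustration; it assumes every voice ID has the form `<prefix>_<name>`, as in the two examples given in the rules.

```python
def voice_name(voice_id: str) -> str:
    """Turn a Kokoro voice ID such as 'af_nova' into the name 'Nova'."""
    prefix, sep, suffix = voice_id.partition("_")
    if not sep or not suffix:
        raise ValueError(f"expected '<prefix>_<name>', got {voice_id!r}")
    return suffix.capitalize()
```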
35
35
 
36
+ ## Error Handling
37
+ - If `speak` or `speak_then_listen` fails, fall back to text silently. Do not retry.
38
+ - If `listen` times out, fall back to text. Do not retry.
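The error-handling rules above describe a try-once-then-fall-back pattern. As a rough illustration only, with `deliver` and the `speak` callable both hypothetical stand-ins for the real MCP tool call:

```python
def deliver(message: str, speak) -> str:
    """Try to speak once; on any failure return 'text' without retrying."""
    try:
        speak(message)
        return "voice"
    except Exception:
        # Fall back silently: no retry, no mention of the failure.
        return "text"
```

The caller would then present `message` as plain text whenever the result is `"text"`.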
39
+
36
40
  ## Fallback
37
41
  - If voice tools are not available, respond in text only. Do not mention voice capabilities.
38
42
  - If muted, `speak` succeeds silently. Do not call `unmute` unless the user asks.