voicesmith-mcp 1.0.8 → 1.0.10
- package/README.md +93 -0
- package/package.json +1 -1
- package/stt/__pycache__/mic_capture.cpython-314.pyc +0 -0
- package/stt/mic_capture.py +17 -6
- package/templates/voice-rules.md +15 -11
package/README.md
CHANGED
@@ -30,6 +30,53 @@ The installer will:
 
 Restart your IDE session after installing. The AI will greet you by voice on the first response.
 
+## Usage
+
+> [!NOTE]
+> **Everything works out of the box.** After installing, just start a session — the AI speaks automatically. No configuration needed. The installer sets up voice behavior rules that teach the AI when and how to use its voice.
+
+What the AI does automatically:
+
+| Moment | What happens |
+|--------|-------------|
+| You give it a task | Speaks a brief acknowledgment |
+| It finishes work | Speaks a summary of what was done |
+| It has a question | Asks out loud, then listens for your voice response |
+| Voice tools unavailable | Falls back to text silently |
+
+---
+
+### Changing Voices Mid-Session
+
+Ask the AI to switch voices at any time:
+
+> *"Switch to Nova"*
+
+If the voice is available, the AI switches immediately. If it's occupied by another session, the AI will tell you and show available alternatives.
+
+Browse all 54 voices:
+
+> *"Show me the available voices"*
+
+Or preview them in a terminal: `npx voicesmith-mcp voices`
+
+---
+
+### Voice Persistence
+
+> [!TIP]
+> When you switch voices, the choice is saved automatically. Next time you start or resume a session, the AI uses the same voice — no need to switch again.
+
+---
+
+### Muting
+
+In a meeting or shared space? Just ask:
+
+> *"Mute the voice"*
+
+The AI continues working normally — it just won't play audio. Say *"unmute"* when you're ready.
+
 ## Alternative Install
 
 If you don't have Node.js or prefer a shell script:
@@ -104,6 +151,9 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
 - **mpv** — audio playback
 - ~500MB disk space for models
 
+> [!WARNING]
+> **Windows is not supported yet.** The server uses Unix-specific features (file locking, audio commands, process detection). Windows support is planned — see [TODO](TODO.md) for details.
+
 ## Supported IDEs
 
 | IDE | Config Location | Rules Location | Multi-Session |
@@ -112,6 +162,49 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
 | Cursor | `~/.cursor/mcp.json` | `~/.cursor/rules/voicesmith.mdc` | No (single server) |
 | Codex | `~/.codex/mcp.json` | `~/.codex/AGENTS.md` | No (single session) |
 
+## Troubleshooting
+
+### The AI can't hear me (listen returns empty or times out)
+
+**Check microphone permissions.** On macOS, the terminal app that runs your IDE needs microphone access:
+
+1. Open **System Settings > Privacy & Security > Microphone**
+2. Make sure your terminal app is listed and enabled:
+   - **Warp**, **Terminal.app**, or **iTerm2** — for Claude Code
+   - **Cursor** or **VS Code** — if using those IDEs directly
+3. If the app isn't listed, the first `listen` call should trigger the permission prompt. Approve it and try again.
+
+> [!IMPORTANT]
+> The Python process inherits microphone permissions from the app that launched it. If your terminal doesn't have mic access, listen will silently fail.
+
+**Check your audio input device.** If an external mic is selected but not connected, the server opens it but gets silence:
+- Open **System Settings > Sound > Input** and verify the correct mic is selected
+- Or ask the AI: *"What's the server status?"* — check that `stt.loaded` and `vad.loaded` are both `true`
+
+**Another app is using the mic.** Apps like Zoom, Teams, or FaceTime can hold exclusive mic access. Close them and try again.
+
+**Voice too quiet for VAD.** The voice activity detector might not pick up soft speech. You can lower the sensitivity threshold in `~/.local/share/voicesmith-mcp/config.json`:
+
+```json
+{
+  "stt": {
+    "vad_threshold": 0.2
+  }
+}
+```
+
+Lower values = more sensitive. Default is `0.3`. Restart the session after changing.
+
+### The AI doesn't speak
+
+- Check that **espeak-ng** and **mpv** are installed: `which espeak-ng mpv`
+- Check the AI's status: ask *"What's your voice status?"*
+- If muted, say *"Unmute"*
+
+### The AI speaks with the wrong voice
+
+This can happen when another session is holding your preferred voice name. Ask the AI: *"Switch to Eric"* — it will either switch or tell you what's available.
+
 ## Uninstall
 
 ```bash
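The troubleshooting text added to the README suggests lowering `stt.vad_threshold` in `~/.local/share/voicesmith-mcp/config.json`. The following is a minimal sketch of making that change without overwriting other settings. It assumes the file is plain JSON with the nesting shown in the README and that the threshold is a value between 0 and 1; the helper name `set_vad_threshold` is hypothetical and not part of the package.

```python
import json
from pathlib import Path

# Path as given in the README; adjust if your install differs.
CONFIG_PATH = Path.home() / ".local/share/voicesmith-mcp/config.json"


def set_vad_threshold(threshold: float, path: Path = CONFIG_PATH) -> dict:
    """Set stt.vad_threshold while keeping every other key in the file intact."""
    # Assumption: the threshold is a probability-like value in (0, 1).
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold is assumed to lie between 0 and 1")
    # Start from the existing config if there is one, otherwise from empty.
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("stt", {})["vad_threshold"] = threshold
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `set_vad_threshold(0.2)` would write the value from the README example; per the README, the session still needs a restart before the new threshold takes effect.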
package/package.json
CHANGED
|
Binary file
|
package/stt/mic_capture.py
CHANGED
@@ -2,6 +2,7 @@
 
 import asyncio
 import queue
+import time
 from typing import Optional
 
 import numpy as np
@@ -60,6 +61,7 @@ class MicCapture:
         silence_duration = 0.0
         loop = asyncio.get_event_loop()
 
+        stream = None
         try:
             stream = sd.InputStream(
                 samplerate=self._sample_rate,
@@ -77,15 +79,15 @@ class MicCapture:
                 # Check cancellation
                 if cancel_event and cancel_event.is_set():
                     logger.info("Recording cancelled by event")
-
+                    break
 
                 # Check timeout
                 elapsed = asyncio.get_event_loop().time() - start_time
                 if elapsed >= timeout:
                     if not speech_detected:
                         logger.info("Recording timed out with no speech detected")
-
-
+                    else:
+                        logger.info("Recording timed out")
                     break
 
                 # Get audio chunk from queue
@@ -112,9 +114,6 @@ class MicCapture:
                         )
                         break
 
-            stream.stop()
-            stream.close()
-
             if not chunks or not speech_detected:
                 return None
 
@@ -125,6 +124,18 @@ class MicCapture:
         except Exception as e:
             raise MicCaptureError(f"Recording failed: {e}") from e
         finally:
+            # Safely tear down the audio stream. The CoreAudio IO thread may
+            # still be executing the callback when we call stop(). Wait briefly
+            # between stop() and close() to let the IO thread finish — this
+            # prevents the segfault in libffi/PortAudio where the callback
+            # dereferences freed memory.
+            if stream is not None:
+                try:
+                    stream.stop()
+                    time.sleep(0.05)  # Let CoreAudio IO thread finish
+                    stream.close()
+                except Exception as e:
+                    logger.debug(f"Stream teardown: {e}")
             self._recording = False
 
     def _audio_callback(self, indata, frames, time, status) -> None:
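The `finally` block in this change follows a common teardown pattern: the stream handle is set to `None` before `try`, so cleanup can tell whether the stream was ever opened, and `stop()`, a short pause, and `close()` run on every exit path. The sketch below illustrates that pattern in isolation. `FakeStream` and `record_once` are hypothetical names standing in for the real audio stream and recording method, and the 0.05 s pause mirrors the value in the diff.

```python
import logging
import time

logger = logging.getLogger(__name__)


class FakeStream:
    """Stand-in for an audio input stream; records the calls made on it."""

    def __init__(self, fail_on_open: bool = False):
        if fail_on_open:
            raise OSError("no input device")
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")

    def close(self):
        self.calls.append("close")


def record_once(open_stream, settle_seconds: float = 0.05):
    """Open a stream, use it, and always tear it down as stop -> pause -> close."""
    stream = None  # defined before `try` so `finally` can test it safely
    try:
        stream = open_stream()
        stream.start()
        return stream
    finally:
        if stream is not None:
            try:
                stream.stop()
                time.sleep(settle_seconds)  # give the audio callback time to return
                stream.close()
            except Exception as exc:  # teardown errors are logged, not raised
                logger.debug("Stream teardown: %s", exc)
```

If opening the stream fails, `stream` is still `None` and the cleanup is skipped, so the original error propagates instead of a `NameError`. Note that `time.sleep` blocks the calling thread for the duration of the pause, which in an `async` method also pauses the event loop for that interval.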
package/templates/voice-rules.md
CHANGED
@@ -7,6 +7,7 @@ You have access to voice tools via the VoiceSmith MCP server.
 - **IMPORTANT:** If your session context says "Your assigned voice for this session is: [Name]", use THAT name — not "{{MAIN_AGENT}}". This is your real identity for this session.
 - On your first response, speak a brief intro using your assigned name: "[Name] here, ready to go."
 - Do not use your assigned name for sub-agents. Each agent needs its own unique name.
+- Tone: Be conversational and natural. Match the user's energy — casual if they're casual, focused if they're focused.
 
 ## Voice Switching
 - If the user asks to switch to a voice and `speak` returns `"error": "name_occupied"`, tell the user that voice is occupied by another session.
@@ -14,25 +15,28 @@ You have access to voice tools via the VoiceSmith MCP server.
 - Do NOT silently fall back to a different voice.
 
 ## Speaking
--
-
-
--
-
-
--
--
+- **Opening** — Only speak at the start when you have something meaningful to say (e.g., clarifying your approach, flagging an issue). Do NOT speak filler acknowledgments like "Let me look into that." Use `block: false` when you do speak an opening.
+- **Closing** — Always speak a summary when done. Use `block: true`. Never skip the closing.
+- **Questions requiring user input → use `speak_then_listen` as your closing.** If the user literally cannot continue without providing input (e.g., choosing between options, confirming a destructive action, providing missing info), use `speak_then_listen`. If you can reasonably continue without their answer, use regular `speak`.
+- Keep spoken output brief — prefer 1-2 sentences, never exceed 3. Write details, speak summaries. No code or paths aloud.
+
+## Speed Preferences
+- The `speak` tool accepts a `speed` parameter (default 1.0). Values < 1.0 are slower, > 1.0 are faster.
+- If the user asks to speak slower or faster, adjust the speed and remember their preference for the session.
 
 ## Listening
-- Use `speak_then_listen` whenever you need user input — it
+- Use `speak_then_listen` whenever you need user input — it combines speaking and opening the mic in one call.
 - If `listen` returns timeout or cancelled, fall back to requesting text input. Do not retry `listen`.
 
 ## Sub-Agents
--
-- Pick a name that matches an available Kokoro voice (the voice ID suffix is the name — e.g., af_nova → "Nova", am_fenrir → "Fenrir").
+- Pick voice names matching available Kokoro voices (the voice ID suffix is the name — e.g., af_nova → "Nova", am_fenrir → "Fenrir").
 - Each sub-agent must use its own unique name. Never reuse "{{MAIN_AGENT}}".
 - On handoffs, both agents speak: the outgoing agent announces the handoff, the incoming agent acknowledges before starting.
 
+## Error Handling
+- If `speak` or `speak_then_listen` fails, fall back to text silently. Do not retry.
+- If `listen` times out, fall back to text. Do not retry.
+
 ## Fallback
 - If voice tools are not available, respond in text only. Do not mention voice capabilities.
 - If muted, `speak` succeeds silently. Do not call `unmute` unless the user asks.
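The revised Speaking rules amount to a small decision table: openings are optional and non-blocking, closings are mandatory and blocking, and a closing that needs an answer uses `speak_then_listen` instead. The sketch below encodes that table as a function. It is an illustration of the rules as written in the template, not code from the package; `plan_voice_call`, the `phase` labels, and the returned dictionary shape are all hypothetical.

```python
from typing import Optional


def plan_voice_call(phase: str, meaningful: bool = False,
                    needs_user_input: bool = False) -> Optional[dict]:
    """Return which voice tool the rules call for, or None to stay silent.

    phase is "opening" or "closing" (hypothetical labels for this sketch).
    """
    if phase == "opening":
        # Openings are optional: skip filler acknowledgments entirely.
        if not meaningful:
            return None
        return {"tool": "speak", "block": False}
    if phase == "closing":
        # A closing is always spoken; a question the user must answer
        # also opens the mic.
        if needs_user_input:
            return {"tool": "speak_then_listen"}
        return {"tool": "speak", "block": True}
    raise ValueError(f"unknown phase: {phase!r}")
```

For example, `plan_voice_call("opening")` yields `None`, while `plan_voice_call("closing", needs_user_input=True)` selects `speak_then_listen`.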