voicesmith-mcp 1.0.7 → 1.0.9
- package/README.md +93 -0
- package/hooks/session-start.sh +20 -5
- package/package.json +1 -1
- package/stt/__pycache__/mic_capture.cpython-314.pyc +0 -0
- package/stt/mic_capture.py +17 -6
package/README.md
CHANGED
@@ -30,6 +30,53 @@ The installer will:
 
 Restart your IDE session after installing. The AI will greet you by voice on the first response.
 
+## Usage
+
+> [!NOTE]
+> **Everything works out of the box.** After installing, just start a session — the AI speaks automatically. No configuration needed. The installer sets up voice behavior rules that teach the AI when and how to use its voice.
+
+What the AI does automatically:
+
+| Moment | What happens |
+|--------|-------------|
+| You give it a task | Speaks a brief acknowledgment |
+| It finishes work | Speaks a summary of what was done |
+| It has a question | Asks out loud, then listens for your voice response |
+| Voice tools unavailable | Falls back to text silently |
+
+---
+
+### Changing Voices Mid-Session
+
+Ask the AI to switch voices at any time:
+
+> *"Switch to Nova"*
+
+If the voice is available, the AI switches immediately. If it's occupied by another session, the AI will tell you and show available alternatives.
+
+Browse all 54 voices:
+
+> *"Show me the available voices"*
+
+Or preview them in a terminal: `npx voicesmith-mcp voices`
+
+---
+
+### Voice Persistence
+
+> [!TIP]
+> When you switch voices, the choice is saved automatically. Next time you start or resume a session, the AI uses the same voice — no need to switch again.
+
+---
+
+### Muting
+
+In a meeting or shared space? Just ask:
+
+> *"Mute the voice"*
+
+The AI continues working normally — it just won't play audio. Say *"unmute"* when you're ready.
+
 ## Alternative Install
 
 If you don't have Node.js or prefer a shell script:
@@ -104,6 +151,9 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
 - **mpv** — audio playback
 - ~500MB disk space for models
 
+> [!WARNING]
+> **Windows is not supported yet.** The server uses Unix-specific features (file locking, audio commands, process detection). Windows support is planned — see [TODO](TODO.md) for details.
+
 ## Supported IDEs
 
 | IDE | Config Location | Rules Location | Multi-Session |
@@ -112,6 +162,49 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
 | Cursor | `~/.cursor/mcp.json` | `~/.cursor/rules/voicesmith.mdc` | No (single server) |
 | Codex | `~/.codex/mcp.json` | `~/.codex/AGENTS.md` | No (single session) |
 
+## Troubleshooting
+
+### The AI can't hear me (listen returns empty or times out)
+
+**Check microphone permissions.** On macOS, the terminal app that runs your IDE needs microphone access:
+
+1. Open **System Settings > Privacy & Security > Microphone**
+2. Make sure your terminal app is listed and enabled:
+   - **Warp**, **Terminal.app**, or **iTerm2** — for Claude Code
+   - **Cursor** or **VS Code** — if using those IDEs directly
+3. If the app isn't listed, the first `listen` call should trigger the permission prompt. Approve it and try again.
+
+> [!IMPORTANT]
+> The Python process inherits microphone permissions from the app that launched it. If your terminal doesn't have mic access, listen will silently fail.
+
+**Check your audio input device.** If an external mic is selected but not connected, the server opens it but gets silence:
+- Open **System Settings > Sound > Input** and verify the correct mic is selected
+- Or ask the AI: *"What's the server status?"* — check that `stt.loaded` and `vad.loaded` are both `true`
+
+**Another app is using the mic.** Apps like Zoom, Teams, or FaceTime can hold exclusive mic access. Close them and try again.
+
+**Voice too quiet for VAD.** The voice activity detector might not pick up soft speech. You can lower the sensitivity threshold in `~/.local/share/voicesmith-mcp/config.json`:
+
+```json
+{
+  "stt": {
+    "vad_threshold": 0.2
+  }
+}
+```
+
+Lower values = more sensitive. Default is `0.3`. Restart the session after changing.
+
+### The AI doesn't speak
+
+- Check that **espeak-ng** and **mpv** are installed: `which espeak-ng mpv`
+- Check the AI's status: ask *"What's your voice status?"*
+- If muted, say *"Unmute"*
+
+### The AI speaks with the wrong voice
+
+This can happen when another session is holding your preferred voice name. Ask the AI: *"Switch to Eric"* — it will either switch or tell you what's available.
+
 ## Uninstall
 
 ```bash
package/hooks/session-start.sh
CHANGED
@@ -38,7 +38,16 @@ try:
                 raise SystemExit
         except (OSError, ProcessLookupError):
             pass
-    #
+    # Prefer the most recent session without a session_id (just registered, waiting for hook)
+    for s in reversed(data.get('sessions', [])):
+        try:
+            os.kill(s['pid'], 0)
+            if not s.get('session_id'):
+                print(s['port'])
+                raise SystemExit
+        except (OSError, ProcessLookupError):
+            pass
+    # Final fallback: most recent alive session
     for s in reversed(data.get('sessions', [])):
         try:
             os.kill(s['pid'], 0)
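The two passes added in this hunk can be read as a small selection function. The following is only a sketch of that logic in standalone Python, not the hook's actual code: `pick_port`, `pid_alive`, and the injected `is_alive` callback are hypothetical names introduced here, and the earlier lookup that precedes these passes in the hook is not shown in the diff and is therefore omitted.

```python
import os
from typing import Callable, Optional


def pid_alive(pid: int) -> bool:
    """Probe a PID with signal 0, which checks existence without signalling (Unix)."""
    try:
        os.kill(pid, 0)
        return True
    except OSError:
        # ProcessLookupError is a subclass of OSError. Like the hook, this
        # also treats PermissionError (process exists, other owner) as dead.
        return False


def pick_port(sessions: list[dict], is_alive: Callable[[int], bool]) -> Optional[int]:
    """Return the port of the preferred live session, or None if none is alive."""
    # Pass 1: newest live session that has not received a session_id yet.
    for s in reversed(sessions):
        if is_alive(s["pid"]) and not s.get("session_id"):
            return s["port"]
    # Pass 2 (fallback): newest live session, whatever its session_id.
    for s in reversed(sessions):
        if is_alive(s["pid"]):
            return s["port"]
    return None
```

Passing the liveness check as a parameter keeps the ordering rule testable without real processes; in practice one would call `pick_port(data.get("sessions", []), pid_alive)`.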
@@ -51,11 +60,17 @@ except:
 " 2>/dev/null)
 
 # Send session_id to the server if we have both port and session_id
+# Retry up to 3 times — the HTTP listener may not be ready yet
 if [ -n "$PORT" ] && [ -n "$SESSION_ID" ]; then
-    RESPONSE
-
-        -
-
+    RESPONSE=""
+    for attempt in 1 2 3; do
+        RESPONSE=$(curl -s --max-time 3 -X POST \
+            -H "Content-Type: application/json" \
+            -d "{\"session_id\": \"$SESSION_ID\"}" \
+            "http://127.0.0.1:$PORT/session" 2>/dev/null)
+        [ -n "$RESPONSE" ] && break
+        sleep 0.5
+    done
 
     if [ -n "$RESPONSE" ]; then
         SESSION_NAME=$(echo "$RESPONSE" | python3 -c "
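The retry loop added above follows a common bounded-retry shape: try, stop on the first non-empty response, otherwise pause and try again. A rough Python rendering of the same control flow is sketched below. `post_with_retry` and its `send` and `sleep` parameters are hypothetical names for illustration; the real hook uses `curl` in a shell loop.

```python
import time
from typing import Callable, Optional


def post_with_retry(
    send: Callable[[], Optional[str]],
    attempts: int = 3,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it yields a non-empty body or the attempts run out."""
    response = ""
    for _ in range(attempts):
        response = send() or ""
        if response:
            break
        # Mirrors the shell loop: it also pauses after the final failed attempt.
        sleep(delay)
    return response
```

An empty string signals failure here, which matches how the script later tests `[ -n "$RESPONSE" ]`. Injecting `sleep` keeps the function fast to exercise in tests.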
package/package.json
CHANGED
package/stt/__pycache__/mic_capture.cpython-314.pyc
CHANGED
Binary file
package/stt/mic_capture.py
CHANGED
@@ -2,6 +2,7 @@
 
 import asyncio
 import queue
+import time
 from typing import Optional
 
 import numpy as np
@@ -60,6 +61,7 @@ class MicCapture:
         silence_duration = 0.0
         loop = asyncio.get_event_loop()
 
+        stream = None
         try:
             stream = sd.InputStream(
                 samplerate=self._sample_rate,
@@ -77,15 +79,15 @@ class MicCapture:
                 # Check cancellation
                 if cancel_event and cancel_event.is_set():
                     logger.info("Recording cancelled by event")
-
+                    break
 
                 # Check timeout
                 elapsed = asyncio.get_event_loop().time() - start_time
                 if elapsed >= timeout:
                     if not speech_detected:
                         logger.info("Recording timed out with no speech detected")
-
-
+                    else:
+                        logger.info("Recording timed out")
                     break
 
                 # Get audio chunk from queue
@@ -112,9 +114,6 @@ class MicCapture:
                         )
                         break
 
-            stream.stop()
-            stream.close()
-
             if not chunks or not speech_detected:
                 return None
 
@@ -125,6 +124,18 @@ class MicCapture:
         except Exception as e:
             raise MicCaptureError(f"Recording failed: {e}") from e
         finally:
+            # Safely tear down the audio stream. The CoreAudio IO thread may
+            # still be executing the callback when we call stop(). Wait briefly
+            # between stop() and close() to let the IO thread finish — this
+            # prevents the segfault in libffi/PortAudio where the callback
+            # dereferences freed memory.
+            if stream is not None:
+                try:
+                    stream.stop()
+                    time.sleep(0.05)  # Let CoreAudio IO thread finish
+                    stream.close()
+                except Exception as e:
+                    logger.debug(f"Stream teardown: {e}")
             self._recording = False
 
     def _audio_callback(self, indata, frames, time, status) -> None:
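The teardown moved into `finally` can be isolated as a helper to make its ordering explicit: stop, pause briefly, close, and never raise from cleanup. The sketch below is an illustration under assumptions, not the package's code. `teardown_stream` and `FakeStream` are hypothetical names, and the fake stands in for a real audio stream so the example runs without audio hardware.

```python
import logging
import time

logger = logging.getLogger(__name__)


def teardown_stream(stream, settle_seconds: float = 0.05) -> None:
    """Stop, wait briefly, then close the stream; swallow cleanup errors."""
    if stream is None:  # the stream was never opened
        return
    try:
        stream.stop()
        time.sleep(settle_seconds)  # give an in-flight callback time to return
        stream.close()
    except Exception as exc:
        logger.debug("Stream teardown: %s", exc)


class FakeStream:
    """Stand-in for an audio stream that records the calls it receives."""

    def __init__(self, fail_on_stop: bool = False):
        self.calls = []
        self.fail_on_stop = fail_on_stop

    def stop(self):
        self.calls.append("stop")
        if self.fail_on_stop:
            raise RuntimeError("device vanished")

    def close(self):
        self.calls.append("close")
```

As in the diff, a failure in `stop()` skips `close()`; whether that is the desired behaviour depends on the audio backend. Note also that `time.sleep` blocks the calling thread, so inside a coroutine the 50 ms pause briefly holds up the event loop.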