voice-mcp-server 0.1.18 → 0.1.20
- package/README.md +6 -6
- package/config/config.yaml +1 -1
- package/package.json +1 -1
package/README.md CHANGED

````diff
@@ -65,8 +65,8 @@ The system is built on a highly modular adapter pattern configured via `hydra` Y
 | | `kokoro_speaker` | High-quality, emotive local ML Text-to-Speech. |
 | | `elevenlabs_speaker` | Premium cloud-based ultra-realistic voices. |
 | **🎙️ Microphones** | `live_mic` | Direct hardware integration via PyAudio. |
-| **🤫 VAD (Activity)** | `silero_vad` | Conversational mode powered by Silero, heavily optimized for 1-second barge-ins. |
-| | `ptt_vad` | Manual Push-to-Talk
+| **🤫 VAD (Activity)** | `silero_vad` | Conversational mode powered by Silero, heavily optimized for 1-second barge-ins. **(Default)** |
+| | `ptt_vad` | Manual Push-to-Talk mode. *(Note: Requires macOS Input Monitoring permissions for your terminal).* |
 | **📝 STT (Transcription)**| `mlx_whisper_large_v3`| Blazing fast local transcription leveraging Apple's MLX framework. |
 
 -----
@@ -80,7 +80,7 @@ Once connected, the server equips your AI agent with two powerful MCP tools:
 The core communication loop. The AI calls this tool and passes a string of text it wants to say.
 
 1. The server renders and plays the TTS.
-2. The server instantly activates the microphone and listens for the user's reply via VAD. *(Note: By default, the server is configured to use
+2. The server instantly activates the microphone and listens for the user's reply via VAD. *(Note: By default, the server is configured to use Conversational Mode via Silero VAD).*
 3. The server transcribes the audio and returns the text to the AI.
 
 **Interrupt Handling (Barge-in):** If the user interrupts the AI mid-sentence, playback instantly stops. The server captures the interruption, transcribes it, and returns the response alongside a `was_interrupted: true` flag. This allows the AI to organically realize it was cut off and address the interruption naturally.
@@ -113,15 +113,15 @@ npm install -g voice-mcp-server
 ```
 
 ### 3. Connect to your MCP Client
-You can now add the server to your favorite client. Using
+You can now add the server to your favorite client. Using the globally installed command is the fastest method:
 
 **For Gemini CLI:**
 ```bash
-gemini mcp add voice-mcp-server --scope user
+gemini mcp add voice-mcp-server --scope user voice-mcp-server
 ```
 
 **For Cursor / Claude Desktop:**
-Simply use `
+Simply use `voice-mcp-server` as the command in your configuration.
 
 > [!NOTE]
 > **First Run Performance:** The very first time you invoke the voice tool, it will take a few minutes to initialize the Python environment and download the heavy ML weights (~4GB). **The tools will not be available until this background setup completes.** You can monitor progress in your terminal logs. *Depending on your AI client, you may need to restart the application/CLI for the tools to appear after setup.*
````
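For the "Cursor / Claude Desktop" instruction above, a minimal sketch of what such a client entry might look like — `mcpServers` is Claude Desktop's standard config key, while the `"voice"` label is an arbitrary name chosen here for illustration:

```json
{
  "mcpServers": {
    "voice": {
      "command": "voice-mcp-server"
    }
  }
}
```

Any client that launches MCP servers as subprocesses should work the same way, since the globally installed `voice-mcp-server` binary is on `PATH` after `npm install -g`.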
package/config/config.yaml CHANGED

````diff
@@ -11,7 +11,7 @@ defaults:
 # Available VADs:
 # - ptt_vad: Walkie-Talkie mode (Hold 'Shift' to talk. Instant response. Ignores TV/noise).
 # - silero_vad: Conversational AI mode (Listens automatically. Tuned for 1-second barge-ins).
-- vad:
+- vad: silero_vad
 
 # Available STTs: mlx_whisper_large_v3 (Apple Silicon GPU), whisper_stt (Google Cloud API)
 - stt: mlx_whisper_large_v3
````