voice-mcp-server 0.1.20 → 0.1.22
- package/README.md +7 -3
- package/config/config.yaml +1 -1
- package/package.json +1 -1
package/README.md
CHANGED

@@ -65,8 +65,8 @@ The system is built on a highly modular adapter pattern configured via `hydra` Y
 | | `kokoro_speaker` | High-quality, emotive local ML Text-to-Speech. |
 | | `elevenlabs_speaker` | Premium cloud-based ultra-realistic voices. |
 | **🎙️ Microphones** | `live_mic` | Direct hardware integration via PyAudio. |
-| **🤫 VAD (Activity)** | `silero_vad` | Conversational mode powered by Silero, heavily optimized for 1-second barge-ins.
-| | `ptt_vad` | Manual Push-to-Talk mode.
+| **🤫 VAD (Activity)** | `silero_vad` | Conversational mode powered by Silero, heavily optimized for 1-second barge-ins. *(Note: **Headphones are strictly required** for this mode to prevent the AI from hearing its own audio output and endlessly interrupting itself).* |
+| | `ptt_vad` | Manual Push-to-Talk / Walkie-Talkie mode. **(Default: Hold 'Shift' to talk)** |
 | **📝 STT (Transcription)**| `mlx_whisper_large_v3`| Blazing fast local transcription leveraging Apple's MLX framework. |
 
 -----
@@ -80,7 +80,7 @@ Once connected, the server equips your AI agent with two powerful MCP tools:
 The core communication loop. The AI calls this tool and passes a string of text it wants to say.
 
 1. The server renders and plays the TTS.
-2. The server instantly activates the microphone and listens for the user's reply via VAD. *(Note: By default, the server is configured to use
+2. The server instantly activates the microphone and listens for the user's reply via VAD. *(Note: By default, the server is configured to use Push-To-Talk. You must press and hold the **Shift** key on your keyboard to speak or interrupt. You can ask the AI to change this!)*
 3. The server transcribes the audio and returns the text to the AI.
 
 **Interrupt Handling (Barge-in):** If the user interrupts the AI mid-sentence, playback instantly stops. The server captures the interruption, transcribes it, and returns the response alongside a `was_interrupted: true` flag. This allows the AI to organically realize it was cut off and address the interruption naturally.
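As a rough illustration of the barge-in contract described in this hunk, a consuming client might branch on the returned flag as below. The `handle_reply` function and the exact result shape are assumptions for illustration, not the package's actual API; the README only guarantees that the transcription comes back alongside `was_interrupted: true` when the user cut the AI off.

```python
def handle_reply(result: dict) -> str:
    """Branch on a hypothetical converse-tool result.

    `result` is assumed to look like:
      {"text": "<user transcription>", "was_interrupted": bool}
    """
    if result.get("was_interrupted"):
        # The user cut the AI off mid-sentence: surface that so the
        # agent can acknowledge the interruption before continuing.
        return f"(interrupted) user said: {result['text']}"
    return f"user said: {result['text']}"


print(handle_reply({"text": "stop, wrong file", "was_interrupted": True}))
print(handle_reply({"text": "sounds good", "was_interrupted": False}))
```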
@@ -100,6 +100,10 @@ The easiest way to get started is to install the server globally via NPM. This w
 - **Node.js** (v18+)
 - **Python** (3.10, 3.11, or 3.12) *(Note: Python 3.13 is not yet supported by the Kokoro TTS library. The bridge will automatically search your system for a compatible version if your default is 3.13.)*
 
+> [!IMPORTANT]
+> **Input Monitoring Permission:** By default, the server uses **Push-to-Talk (Hold 'Shift')** to prevent the AI from hearing its own voice through your laptop speakers and interrupting itself. For the server to detect the Shift key globally, you **must** grant Input Monitoring permissions to your terminal/CLI.
+> Go to: `System Settings` > `Privacy & Security` > `Input Monitoring` > Toggle your terminal (e.g., Cursor, iTerm, Terminal) **ON**.
+
 You must also have the required system-level audio libraries installed via [Homebrew](https://brew.sh/):
 ```bash
 brew install portaudio espeak-ng
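The README's Python note says the bridge searches the system for a 3.10–3.12 interpreter when the default is 3.13. A minimal sketch of that compatibility gate is below; `SUPPORTED_MINORS` and `is_compatible` are illustrative names, not the bridge's actual code.

```python
import sys

# CPython minor versions the Kokoro TTS library supports, per the README note.
SUPPORTED_MINORS = {10, 11, 12}


def is_compatible(major: int, minor: int) -> bool:
    """Return True if this interpreter version can run the Kokoro bridge."""
    return major == 3 and minor in SUPPORTED_MINORS


# The bridge would reject a 3.13 default and keep searching the system:
print(is_compatible(sys.version_info.major, sys.version_info.minor))
```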
package/config/config.yaml
CHANGED

@@ -11,7 +11,7 @@ defaults:
   # Available VADs:
   # - ptt_vad: Walkie-Talkie mode (Hold 'Shift' to talk. Instant response. Ignores TV/noise).
   # - silero_vad: Conversational AI mode (Listens automatically. Tuned for 1-second barge-ins).
-  - vad:
+  - vad: ptt_vad
 
   # Available STTs: mlx_whisper_large_v3 (Apple Silicon GPU), whisper_stt (Google Cloud API)
   - stt: mlx_whisper_large_v3