voice-mode 0.1.24__tar.gz

@@ -0,0 +1,85 @@
1
+ # Environment configuration
2
+ .env.local
3
+ .env
4
+
5
+ # Python
6
+ __pycache__/
7
+ *.pyc
8
+ *.pyo
9
+ *.pyd
10
+ .Python
11
+ *.so
12
+ .pytest_cache/
13
+ .coverage
14
+ .coverage.*
15
+ htmlcov/
16
+ .mypy_cache/
17
+ .ruff_cache/
18
+ .hypothesis/
19
+ .tox/
20
+ *.cover
21
+ *.py,cover
22
+
23
+ # Virtual environments
24
+ venv/
25
+ env/
26
+ .venv/
27
+ test-env/
28
+ test-pkg-env/
29
+
30
+ # IDE
31
+ .vscode/
32
+ .idea/
33
+ *.swp
34
+ *.swo
35
+ *~
36
+
37
+ # OS
38
+ .DS_Store
39
+ Thumbs.db
40
+ desktop.ini
41
+
42
+ # Logs
43
+ *.log
44
+ logs/
45
+ *.err
46
+
47
+ # Temporary files
48
+ tmp/
49
+ temp/
50
+ .tmp/
51
+ *.tmp
52
+
53
+ # Build artifacts
54
+ dist/
55
+ build/
56
+ *.egg-info/
57
+ .eggs/
58
+ *.whl
59
+
60
+ # Test artifacts
61
+ test_output/
62
+ *.mp3
63
+ *.wav
64
+ *.pcm
65
+ .aider*
66
+
67
+ # Voice MCP specific
68
+ voice-mcp_recordings/
69
+ debug_recordings/
70
+
71
+ # Documentation build
72
+ docs/_build/
73
+ docs/promote/
74
+
75
+ # Local configuration
76
+ .claude/settings.local.json
77
+
78
+ # Node modules (for livekit frontend)
79
+ node_modules/
80
+ .npm/
81
+
82
+ # Misc
83
+ .cache/
84
+ *.bak
85
+ *.orig
@@ -0,0 +1,165 @@
1
+ # Changelog
2
+
3
+ All notable changes to voice-mcp will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ## [0.1.24] - 2025-06-17
11
+
12
+ ## [0.1.23] - 2025-06-17
13
+
14
+ ### Added
15
+ - Provider registry system MVP for managing TTS/STT providers
16
+ - Dynamic provider discovery and registration
17
+ - Automatic availability checking
18
+ - Feature-based provider filtering
19
+ - Dual package name support (voice-mcp and voice-mode)
20
+ - Both commands now available in voice-mode package
21
+ - Maintains backward compatibility
22
+ - Service management tools for Kokoro TTS:
23
+ - `start_kokoro` - Start the Kokoro TTS service using uvx
24
+ - `stop_kokoro` - Stop the running Kokoro service
25
+ - `kokoro_status` - Check service status with CPU/memory usage
26
+ - Automatic cleanup of services on server shutdown
27
+ - psutil dependency for process monitoring
28
+ - `list_tts_voices` tool to list all available TTS voices by provider
29
+ - Shows OpenAI standard and enhanced voices with characteristics
30
+ - Lists Kokoro voices with descriptions
31
+ - Includes usage examples and emotional speech guidance
32
+ - Checks API/service availability for each provider
33
+ - Voice-chat slash command improvements for better argument handling
34
+
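Reviewer sketch (not part of the package): the 0.1.23 entry describes a provider registry with registration, availability checking, and feature-based filtering. One way such a registry could be shaped is shown below. Every name here (`Provider`, `ProviderRegistry`, `features`, `is_available`) is hypothetical; this diff does not show the package's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


def _always_available() -> bool:
    """Default availability check: assume the provider is reachable."""
    return True


@dataclass
class Provider:
    """Hypothetical description of one TTS/STT provider."""
    name: str
    kind: str  # "tts" or "stt"
    features: Set[str] = field(default_factory=set)
    is_available: Callable[[], bool] = _always_available


class ProviderRegistry:
    """Stores providers by name and filters them on lookup."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}

    def register(self, provider: Provider) -> None:
        # Re-registering a name replaces the earlier entry.
        self._providers[provider.name] = provider

    def find(self, kind: str, feature: Optional[str] = None) -> List[Provider]:
        """Return available providers of `kind`, optionally requiring a feature."""
        matches = []
        for provider in self._providers.values():
            if provider.kind != kind:
                continue
            if feature is not None and feature not in provider.features:
                continue
            if not provider.is_available():
                continue
            matches.append(provider)
        return matches
```

Keeping the availability check as a callback means lookups always reflect the current state of a service, at the cost of running the check on every call.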
35
+ ### Changed
36
+ - Default TTS voices updated: alloy for OpenAI, af_sky for Kokoro
37
+
38
+ ## [0.1.22] - 2025-06-16
39
+
40
+ ### Added
41
+ - Local STT/TTS configuration support in .mcp.json
42
+ - Split TTS metrics into generation and playback components for better performance insights
43
+ - Tracks TTS generation time (API call) separately from playback time
44
+ - Displays metrics as tts_gen, tts_play, and tts_total
45
+
46
+ ### Changed
47
+ - Modified text_to_speech() to return (success, metrics) tuple
48
+ - Updated all tests to handle new TTS return format
49
+
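Reviewer sketch (not part of the package): the 0.1.22 entry says `text_to_speech()` now returns a `(success, metrics)` tuple with generation and playback timed separately. The function below illustrates that return shape only. The name `timed_text_to_speech` and the `generate`/`play` callbacks are assumptions standing in for the real API call and audio playback; the metric keys are taken from the changelog.

```python
import time
from typing import Callable, Dict, Tuple


def timed_text_to_speech(
    generate: Callable[[], bytes],
    play: Callable[[bytes], None],
) -> Tuple[bool, Dict[str, float]]:
    """Run generation and playback, timing each phase separately."""
    metrics: Dict[str, float] = {}
    start = time.perf_counter()
    try:
        audio = generate()  # stands in for the TTS API call
        after_gen = time.perf_counter()
        metrics["tts_gen"] = after_gen - start
        play(audio)  # stands in for audio playback
        metrics["tts_play"] = time.perf_counter() - after_gen
    except Exception:
        # On failure, return whatever was measured before the error.
        return False, metrics
    metrics["tts_total"] = metrics["tts_gen"] + metrics["tts_play"]
    return True, metrics
```

Callers unpack the result as `ok, metrics = timed_text_to_speech(...)`, which is the kind of call-site change the "Updated all tests" entry implies.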
50
+ ## [0.1.21] - 2025-06-16
51
+
52
+ ### Added
53
+ - VOICE_MCP_SAVE_AUDIO environment variable to save all TTS/STT audio files
54
+ - Audio files saved to ~/voice-mcp_audio/ with timestamps
55
+ - Documentation about gpt-4o-mini-tts being best for emotional speech
56
+ - Warning to never use coral voice and default to af_sky for Kokoro
57
+
58
+ ### Changed
59
+ - Voice parameter changed from Literal to str for flexibility in voice selection
60
+
61
+ ## [0.1.20] - 2025-06-15
62
+
63
+ ### Changed
64
+ - Voice parameter changed from Literal to str type for more flexibility
65
+
66
+ ## [0.1.19] - 2025-06-15
67
+
68
+ ### Added
69
+ - TTS provider selection parameter to converse function ("openai" or "kokoro")
70
+ - Auto-detection of TTS provider based on voice selection
71
+ - Support for multiple TTS endpoints with provider-specific clients
72
+
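Reviewer sketch (not part of the package): the 0.1.19 entry mentions auto-detecting the TTS provider from the selected voice. A minimal way to do that is a lookup against the Kokoro voice names listed in the 0.1.16 entry. The function name, the `explicit` parameter, and the fallback to `"openai"` are assumptions for illustration.

```python
from typing import Optional

# Kokoro voice names listed in the 0.1.16 changelog entry.
KOKORO_VOICES = {"af_sky", "af_sarah", "am_adam", "af_nicole", "am_michael"}


def detect_tts_provider(voice: Optional[str], explicit: Optional[str] = None) -> str:
    """Pick "openai" or "kokoro"; an explicitly requested provider always wins."""
    if explicit is not None:
        if explicit not in ("openai", "kokoro"):
            raise ValueError(f"unknown TTS provider: {explicit!r}")
        return explicit
    if voice in KOKORO_VOICES:
        return "kokoro"
    # Assumed fallback: any other voice (or none) goes to OpenAI.
    return "openai"
```

With this sketch, `detect_tts_provider("af_sky")` selects Kokoro and `detect_tts_provider("alloy")` selects OpenAI.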
73
+ ## [0.1.18] - 2025-06-15
74
+
75
+ ### Changed
76
+ - Removed mcp-neovim-server from .mcp.json configuration
77
+
78
+ ## [0.1.17] - 2025-06-15
79
+
80
+ ### Changed
81
+ - Minor version bump (no functional changes)
82
+
83
+ ## [0.1.16] - 2025-06-15
84
+
85
+ ## [0.1.16] - 2025-06-15
86
+
87
+ ### Added
88
+ - Voice parameter to converse function for dynamic TTS voice selection
89
+ - Support for Kokoro voices: af_sky, af_sarah, am_adam, af_nicole, am_michael
90
+ - Python 3.13 support with conditional audioop-lts dependency
91
+
92
+ ### Fixed
93
+ - BrokenResourceError when concurrent voice operations interfere with MCP stdio communication
94
+ - Enhanced sounddevice stderr redirection workaround to prevent stdio corruption
95
+ - Added concurrency lock to serialize audio operations and prevent race conditions
96
+ - Protected stdio file descriptors during audio recording and playback operations
97
+ - Added anyio.BrokenResourceError to exception handling for MCP disconnections
98
+ - Configure pytest to exclude manual test scripts from CI builds
99
+
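Reviewer sketch (not part of the package): the 0.1.16 fix describes a concurrency lock that serializes audio operations. The usual asyncio pattern for this is a shared `asyncio.Lock` held for the duration of each operation. The function names and the `log` list below are hypothetical, and the sleep stands in for real device I/O.

```python
import asyncio
from typing import List


async def audio_operation(name: str, lock: asyncio.Lock, log: List[str]) -> None:
    """Hypothetical recording or playback step that must not overlap another."""
    async with lock:  # only one audio operation runs at a time
        log.append(f"{name}:start")
        await asyncio.sleep(0.01)  # stands in for device I/O
        log.append(f"{name}:end")


async def run_demo() -> List[str]:
    lock = asyncio.Lock()  # created inside the running event loop
    log: List[str] = []
    # Both operations are started together; the lock forces them to run in turn.
    await asyncio.gather(
        audio_operation("speak", lock, log),
        audio_operation("listen", lock, log),
    )
    return log
```

Running `asyncio.run(run_demo())` yields a log in which each `start` is immediately followed by its matching `end`, showing that the two operations never interleave.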
100
+ ## [0.1.15] - 2025-06-14
101
+
102
+ ### Fixed
103
+ - Removed load_dotenv call that was causing import error
104
+
105
+ ## [0.1.14] - 2025-06-14
106
+
107
+ ### Fixed
108
+ - Updated GitHub workflows for new project structure
109
+
110
+ ## [0.1.13] - 2025-06-14
111
+
112
+ ### Added
113
+ - Performance timing in voice responses showing TTS, recording, and STT durations
114
+ - Local STT/TTS documentation for Whisper.cpp and Kokoro
115
+ - CONTRIBUTING.md with development setup instructions
116
+ - CHANGELOG.md for tracking changes
117
+
118
+ ### Changed
119
+ - Refactored from python-package subdirectory to top-level Python package
120
+ - Moved MCP server symlinks from mcp-servers/ to bin/ directory
121
+ - Updated wrapper script to properly resolve symlinks for venv detection
122
+ - Improved signal handlers to prevent premature exit
123
+ - Configure build to only include essential files in package
124
+
125
+ ### Fixed
126
+ - Audio playback dimension mismatch when adding silence buffer
127
+ - MCP server connection persistence (was disconnecting after each request)
128
+ - Event loop cleanup errors on shutdown
129
+ - Wrapper script path resolution for symlinked execution
130
+ - Critical syntax errors in voice-mcp script
131
+
132
+ ### Removed
133
+ - Unused python-dotenv dependency
134
+ - Temporary test files (test_audio.py, test_minimal_mcp.py)
135
+ - Redundant test dependencies in pyproject.toml
136
+ - All container/Docker support
137
+
138
+ ## [0.1.12] - 2025-06-14
139
+
140
+ ### Added
141
+ - Kokoro TTS support with configuration examples
142
+ - Export examples in .env.example for various setups
143
+ - Centralized version management and automatic PyPI publishing
144
+
145
+ ### Changed
146
+ - Simplified project structure with top-level package
147
+
148
+ ## [0.1.11] - 2025-06-13
149
+
150
+ ### Added
151
+ - Initial voice-mcp implementation
152
+ - OpenAI-compatible STT/TTS support
153
+ - LiveKit integration for room-based voice communication
154
+ - MCP tool interface with converse, listen_for_speech, check_room_status, and check_audio_devices
155
+ - Debug mode with audio recording capabilities
156
+ - Support for multiple transport methods (local microphone and LiveKit)
157
+
158
+ ## [0.1.0 - 0.1.10] - 2025-06-13
159
+
160
+ ### Added
161
+ - Initial development and iteration of voice-mcp
162
+ - Basic MCP server structure
163
+ - OpenAI API integration for STT/TTS
164
+ - Audio recording and playback functionality
165
+ - Configuration via environment variables
@@ -0,0 +1,270 @@
1
+ Metadata-Version: 2.4
2
+ Name: voice-mode
3
+ Version: 0.1.24
4
+ Summary: Voice interaction capabilities for Model Context Protocol (MCP) servers (also available as voice-mcp)
5
+ Project-URL: Homepage, https://github.com/mbailey/voice-mcp
6
+ Project-URL: Repository, https://github.com/mbailey/voice-mcp
7
+ Project-URL: Issues, https://github.com/mbailey/voice-mcp/issues
8
+ Author-email: mbailey <mbailey@example.com>
9
+ License: MIT
10
+ Keywords: ai,livekit,llm,mcp,speech,stt,tts,voice
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: License :: OSI Approved :: MIT License
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.8
16
+ Classifier: Programming Language :: Python :: 3.9
17
+ Classifier: Programming Language :: Python :: 3.10
18
+ Classifier: Programming Language :: Python :: 3.11
19
+ Classifier: Programming Language :: Python :: 3.12
20
+ Classifier: Programming Language :: Python :: 3.13
21
+ Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
22
+ Classifier: Topic :: Software Development :: Libraries
23
+ Requires-Python: >=3.10
24
+ Requires-Dist: audioop-lts; python_version >= '3.13'
25
+ Requires-Dist: fastmcp>=2.0.0
26
+ Requires-Dist: httpx
27
+ Requires-Dist: numpy
28
+ Requires-Dist: openai>=1.0.0
29
+ Requires-Dist: pydub
30
+ Requires-Dist: scipy
31
+ Requires-Dist: simpleaudio
32
+ Requires-Dist: sounddevice
33
+ Requires-Dist: uv>=0.4.0
34
+ Provides-Extra: dev
35
+ Requires-Dist: build>=1.0.0; extra == 'dev'
36
+ Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
37
+ Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
38
+ Requires-Dist: pytest-mock>=3.10.0; extra == 'dev'
39
+ Requires-Dist: pytest>=7.0.0; extra == 'dev'
40
+ Requires-Dist: twine>=4.0.0; extra == 'dev'
41
+ Provides-Extra: test
42
+ Requires-Dist: pytest-asyncio>=0.21.0; extra == 'test'
43
+ Requires-Dist: pytest-cov>=4.0.0; extra == 'test'
44
+ Requires-Dist: pytest-mock>=3.10.0; extra == 'test'
45
+ Requires-Dist: pytest>=7.0.0; extra == 'test'
46
+ Description-Content-Type: text/markdown
47
+
48
+ # voice-mcp - Voice Mode for Claude Code
49
+
50
+ A Model Context Protocol (MCP) server that enables voice interactions with Claude and other LLMs. Requires only an OpenAI API key and microphone/speakers.
51
+
52
+ ## 🖥️ Compatibility
53
+
54
+ **Runs on:** Linux • macOS • Windows (WSL) | **Python:** 3.10+ | **Tested:** Ubuntu 24.04 LTS, Fedora 42
55
+
56
+ ## ✨ Features
57
+
58
+ - **🎙️ Voice conversations** with Claude - ask questions and hear responses
59
+ - **🔄 Multiple transports** - local microphone or LiveKit room-based communication
60
+ - **🗣️ OpenAI-compatible** - works with any STT/TTS service (local or cloud)
61
+ - **⚡ Real-time** - low-latency voice interactions with automatic transport selection
62
+ - **🔧 MCP Integration** - seamless with Claude Desktop and other MCP clients
63
+
64
+ ## 🎯 Simple Requirements
65
+
66
+ **All you need to get started:**
67
+
68
+ 1. **🔑 OpenAI API Key** (or compatible service) - for speech-to-text and text-to-speech
69
+ 2. **🎤 Computer with microphone and speakers** OR **☁️ LiveKit server** ([LiveKit Cloud](https://docs.livekit.io/home/cloud/) or [self-hosted](https://github.com/livekit/livekit))
70
+
71
+ ## Quick Start
72
+
73
+ Setup for Claude Code:
74
+
75
+ ```bash
76
+ export OPENAI_API_KEY=your-openai-key
77
+ claude mcp add voice-mcp uvx voice-mcp
78
+ claude
79
+ ```
80
+
81
+ Try: *"Let's have a voice conversation"*
82
+
83
+ ## 🎬 Demo
84
+
85
+ Watch voice-mcp in action:
86
+
87
+ [![voice-mcp Demo](https://img.youtube.com/vi/aXRNWvpnwVs/maxresdefault.jpg)](https://www.youtube.com/watch?v=aXRNWvpnwVs)
88
+
89
+ ## Example Usage
90
+
91
+ Once configured, try these prompts with Claude:
92
+
93
+ - `"Let's have a voice conversation"`
94
+ - `"Ask me about my day using voice"`
95
+ - `"Tell me a joke"` (Claude will speak and wait for your response)
96
+ - `"Say goodbye"` (Claude will speak without waiting)
97
+
98
+ The new `converse` function makes voice interactions more natural - it automatically waits for your response by default.
99
+
100
+ ## Claude Desktop Setup
101
+
102
+ Add to your Claude Desktop configuration file:
103
+
104
+ **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
105
+ **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
106
+
107
+ <details>
108
+ <summary>Using uvx (recommended)</summary>
109
+
110
+ ```json
111
+ {
112
+   "mcpServers": {
113
+     "voice-mcp": {
114
+       "command": "uvx",
115
+       "args": ["voice-mcp"],
116
+       "env": {
117
+         "OPENAI_API_KEY": "your-openai-key"
118
+       }
119
+     }
120
+   }
121
+ }
122
+ ```
123
+
124
+ </details>
125
+
126
+ <details>
127
+ <summary>Using pip install</summary>
128
+
129
+ ```json
130
+ {
131
+   "mcpServers": {
132
+     "voice-mcp": {
133
+       "command": "voice-mcp",
134
+       "env": {
135
+         "OPENAI_API_KEY": "your-openai-key"
136
+       }
137
+     }
138
+   }
139
+ }
140
+ ```
141
+
142
+ </details>
143
+
144
+ ## Tools
145
+
146
+ | Tool | Description | Key Parameters |
147
+ |------|-------------|----------------|
148
+ | `converse` | Have a voice conversation - speak and optionally listen | `message`, `wait_for_response` (default: true), `listen_duration` (default: 10s), `transport` (auto/local/livekit) |
149
+ | `listen_for_speech` | Listen for speech and convert to text | `duration` (default: 5s) |
150
+ | `check_room_status` | Check LiveKit room status and participants | None |
151
+ | `check_audio_devices` | List available audio input/output devices | None |
152
+ | `start_kokoro` | Start the Kokoro TTS service | `models_dir` (optional, defaults to ~/Models/kokoro) |
153
+ | `stop_kokoro` | Stop the Kokoro TTS service | None |
154
+ | `kokoro_status` | Check the status of Kokoro TTS service | None |
155
+
156
+ **Note:** The `converse` tool is the primary interface for voice interactions, combining speaking and listening in a natural flow.
157
+
158
+ ## Configuration
159
+
160
+ **📖 See [docs/configuration.md](docs/configuration.md) for complete setup instructions for all MCP hosts**
161
+
162
+ **📁 Ready-to-use config files in [config-examples/](config-examples/)**
163
+
164
+ ### Quick Setup
165
+
166
+ The only required configuration is your OpenAI API key:
167
+
168
+ ```bash
169
+ export OPENAI_API_KEY="your-key"
170
+ ```
171
+
172
+ ### Optional Settings
173
+
174
+ ```bash
175
+ # Custom STT/TTS services (OpenAI-compatible)
176
+ export STT_BASE_URL="http://localhost:2022/v1" # Local Whisper
177
+ export TTS_BASE_URL="http://localhost:8880/v1" # Local TTS
178
+ export TTS_VOICE="alloy" # Voice selection
179
+
180
+ # LiveKit (for room-based communication)
181
+ # See docs/livekit/ for setup guide
182
+ export LIVEKIT_URL="wss://your-app.livekit.cloud"
183
+ export LIVEKIT_API_KEY="your-api-key"
184
+ export LIVEKIT_API_SECRET="your-api-secret"
185
+
186
+ # Debug mode
187
+ export VOICE_MCP_DEBUG="true"
188
+
189
+ # Save all audio (TTS output and STT input)
190
+ export VOICE_MCP_SAVE_AUDIO="true"
191
+ ```
192
+
193
+ ## Local STT/TTS Services
194
+
195
+ For privacy-focused or offline usage, voice-mcp supports local speech services:
196
+
197
+ - **[Whisper.cpp](docs/whisper.cpp.md)** - Local speech-to-text with OpenAI-compatible API
198
+ - **[Kokoro](docs/kokoro.md)** - Local text-to-speech with multiple voice options
199
+
200
+ These services provide the same API interface as OpenAI, allowing seamless switching between cloud and local processing.
201
+
202
+ ### OpenAI API Compatibility Benefits
203
+
204
+ By strictly adhering to OpenAI's API standard, voice-mcp enables powerful deployment flexibility:
205
+
206
+ - **🔀 Transparent Routing**: Users can implement their own API proxies or gateways outside of voice-mcp to route requests to different providers based on custom logic (cost, latency, availability, etc.)
207
+ - **🎯 Model Selection**: Deploy routing layers that select optimal models per request without modifying voice-mcp configuration
208
+ - **💰 Cost Optimization**: Build intelligent routers that balance between expensive cloud APIs and free local models
209
+ - **🔧 No Lock-in**: Switch providers by simply changing the `BASE_URL` - no code changes required
210
+
211
+ Example: Simply set `OPENAI_BASE_URL` to point to your custom router:
212
+ ```bash
213
+ export OPENAI_BASE_URL="https://router.example.com/v1"
214
+ export OPENAI_API_KEY="your-key"
215
+ # voice-mcp now uses your router for all OpenAI API calls
216
+ ```
217
+
218
+ The OpenAI SDK handles this automatically - no voice-mcp configuration needed!
219
+
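Reviewer sketch (not part of the package): the README describes per-service base URLs (`STT_BASE_URL`, `TTS_BASE_URL`) alongside the SDK-level `OPENAI_BASE_URL`. The snippet below shows one plausible resolution order for those variables. The precedence (service-specific variable, then `OPENAI_BASE_URL`, then the public OpenAI endpoint) and the names `SpeechEndpoints` and `resolve_endpoints` are assumptions; the package may resolve these differently. The `alloy` default comes from the 0.1.23 changelog entry.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Public OpenAI API endpoint, used when nothing else is configured.
OPENAI_DEFAULT_BASE_URL = "https://api.openai.com/v1"


@dataclass(frozen=True)
class SpeechEndpoints:
    stt_base_url: str
    tts_base_url: str
    tts_voice: str


def resolve_endpoints(env: Mapping[str, str] = os.environ) -> SpeechEndpoints:
    """Assumed order: service-specific variable, OPENAI_BASE_URL, public API."""
    fallback = env.get("OPENAI_BASE_URL", OPENAI_DEFAULT_BASE_URL)
    return SpeechEndpoints(
        stt_base_url=env.get("STT_BASE_URL", fallback),
        tts_base_url=env.get("TTS_BASE_URL", fallback),
        tts_voice=env.get("TTS_VOICE", "alloy"),
    )
```

Passing the environment as a mapping keeps the function easy to test: `resolve_endpoints({"TTS_BASE_URL": "http://localhost:8880/v1"})` routes only TTS to the local service and leaves STT on the default endpoint.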
220
+ ## Architecture
221
+
222
+ ```
223
+ ┌─────────────────────┐      ┌──────────────────┐      ┌─────────────────────┐
224
+ │   Claude/LLM        │      │  LiveKit Server  │      │  Voice Frontend     │
225
+ │   (MCP Client)      │◄────►│  (Optional)      │◄────►│  (Optional)         │
226
+ └─────────────────────┘      └──────────────────┘      └─────────────────────┘
227
+            │                          │
228
+            │                          │
229
+            ▼                          ▼
230
+ ┌─────────────────────┐      ┌──────────────────┐
231
+ │  Voice MCP Server   │      │  Audio Services  │
232
+ │  • converse         │      │  • OpenAI APIs   │
233
+ │  • listen_for_speech│◄────►│  • Local Whisper │
234
+ │  • check_room_status│      │  • Local TTS     │
235
+ │  • check_audio_devices     └──────────────────┘
236
+ └─────────────────────┘
237
+ ```
238
+
239
+ ## Troubleshooting
240
+
241
+ ### Common Issues
242
+
243
+ - **No microphone access**: Check system permissions for terminal/application
244
+ - **UV not found**: Install with `curl -LsSf https://astral.sh/uv/install.sh | sh`
245
+ - **OpenAI API error**: Verify your `OPENAI_API_KEY` is set correctly
246
+ - **No audio output**: Check system audio settings and available devices
247
+
248
+ ### Debug Mode
249
+
250
+ Enable detailed logging and audio file saving:
251
+
252
+ ```bash
253
+ export VOICE_MCP_DEBUG=true
254
+ ```
255
+
256
+ Debug audio files are saved to: `~/voice-mcp_recordings/`
257
+
258
+ ### Audio Saving
259
+
260
+ To save all audio files (both TTS output and STT input):
261
+
262
+ ```bash
263
+ export VOICE_MCP_SAVE_AUDIO=true
264
+ ```
265
+
266
+ Audio files are saved to: `~/voice-mcp_audio/` with timestamps in the filename.
267
+
268
+ ## License
269
+
270
+ MIT