voice-mode 0.1.24__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- voice_mode-0.1.24/.gitignore +85 -0
- voice_mode-0.1.24/CHANGELOG.md +165 -0
- voice_mode-0.1.24/PKG-INFO +270 -0
- voice_mode-0.1.24/README.md +223 -0
- voice_mode-0.1.24/pyproject.toml +98 -0
- voice_mode-0.1.24/voice_mcp/__init__.py +10 -0
- voice_mode-0.1.24/voice_mcp/__main__.py +8 -0
- voice_mode-0.1.24/voice_mcp/__version__.py +3 -0
- voice_mode-0.1.24/voice_mcp/cli.py +12 -0
- voice_mode-0.1.24/voice_mcp/core.py +304 -0
- voice_mode-0.1.24/voice_mcp/providers.py +215 -0
- voice_mode-0.1.24/voice_mcp/server.py +1613 -0
@@ -0,0 +1,85 @@
# Environment configuration
.env.local
.env

# Python
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
*.so
.pytest_cache/
.coverage
.coverage.*
htmlcov/
.mypy_cache/
.ruff_cache/
.hypothesis/
.tox/
*.cover
*.py,cover

# Virtual environments
venv/
env/
.venv/
test-env/
test-pkg-env/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db
desktop.ini

# Logs
*.log
logs/
*.err

# Temporary files
tmp/
temp/
.tmp/
*.tmp

# Build artifacts
dist/
build/
*.egg-info/
.eggs/
*.whl

# Test artifacts
test_output/
*.mp3
*.wav
*.pcm
.aider*

# Voice MCP specific
voice-mcp_recordings/
debug_recordings/

# Documentation build
docs/_build/
docs/promote/

# Local configuration
.claude/settings.local.json

# Node modules (for livekit frontend)
node_modules/
.npm/

# Misc
.cache/
*.bak
*.orig
@@ -0,0 +1,165 @@
# Changelog

All notable changes to voice-mcp will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.1.24] - 2025-06-17

## [0.1.23] - 2025-06-17

### Added
- Provider registry system MVP for managing TTS/STT providers
- Dynamic provider discovery and registration
- Automatic availability checking
- Feature-based provider filtering
- Dual package name support (voice-mcp and voice-mode)
- Both commands now available in voice-mode package
- Maintains backward compatibility
- Service management tools for Kokoro TTS:
- `start_kokoro` - Start the Kokoro TTS service using uvx
- `stop_kokoro` - Stop the running Kokoro service
- `kokoro_status` - Check service status with CPU/memory usage
- Automatic cleanup of services on server shutdown
- psutil dependency for process monitoring
- `list_tts_voices` tool to list all available TTS voices by provider
- Shows OpenAI standard and enhanced voices with characteristics
- Lists Kokoro voices with descriptions
- Includes usage examples and emotional speech guidance
- Checks API/service availability for each provider
- Voice-chat slash command improvements for better argument handling

### Changed
- Default TTS voices updated: alloy for OpenAI, af_sky for Kokoro

## [0.1.22] - 2025-06-16

### Added
- Local STT/TTS configuration support in .mcp.json
- Split TTS metrics into generation and playback components for better performance insights
- Tracks TTS generation time (API call) separately from playback time
- Displays metrics as tts_gen, tts_play, and tts_total

### Changed
- Modified text_to_speech() to return (success, metrics) tuple
- Updated all tests to handle new TTS return format

## [0.1.21] - 2025-06-16

### Added
- VOICE_MCP_SAVE_AUDIO environment variable to save all TTS/STT audio files
- Audio files saved to ~/voice-mcp_audio/ with timestamps
- Documentation about gpt-4o-mini-tts being best for emotional speech
- Warning to never use coral voice and default to af_sky for Kokoro

### Changed
- Voice parameter changed from Literal to str for flexibility in voice selection

## [0.1.20] - 2025-06-15

### Changed
- Voice parameter changed from Literal to str type for more flexibility

## [0.1.19] - 2025-06-15

### Added
- TTS provider selection parameter to converse function ("openai" or "kokoro")
- Auto-detection of TTS provider based on voice selection
- Support for multiple TTS endpoints with provider-specific clients

## [0.1.18] - 2025-06-15

### Changed
- Removed mcp-neovim-server from .mcp.json configuration

## [0.1.17] - 2025-06-15

### Changed
- Minor version bump (no functional changes)

## [0.1.16] - 2025-06-15

## [0.1.16] - 2025-06-15

### Added
- Voice parameter to converse function for dynamic TTS voice selection
- Support for Kokoro voices: af_sky, af_sarah, am_adam, af_nicole, am_michael
- Python 3.13 support with conditional audioop-lts dependency

### Fixed
- BrokenResourceError when concurrent voice operations interfere with MCP stdio communication
- Enhanced sounddevice stderr redirection workaround to prevent stdio corruption
- Added concurrency lock to serialize audio operations and prevent race conditions
- Protected stdio file descriptors during audio recording and playback operations
- Added anyio.BrokenResourceError to exception handling for MCP disconnections
- Configure pytest to exclude manual test scripts from CI builds

## [0.1.15] - 2025-06-14

### Fixed
- Removed load_dotenv call that was causing import error

## [0.1.14] - 2025-06-14

### Fixed
- Updated GitHub workflows for new project structure

## [0.1.13] - 2025-06-14

### Added
- Performance timing in voice responses showing TTS, recording, and STT durations
- Local STT/TTS documentation for Whisper.cpp and Kokoro
- CONTRIBUTING.md with development setup instructions
- CHANGELOG.md for tracking changes

### Changed
- Refactored from python-package subdirectory to top-level Python package
- Moved MCP server symlinks from mcp-servers/ to bin/ directory
- Updated wrapper script to properly resolve symlinks for venv detection
- Improved signal handlers to prevent premature exit
- Configure build to only include essential files in package

### Fixed
- Audio playback dimension mismatch when adding silence buffer
- MCP server connection persistence (was disconnecting after each request)
- Event loop cleanup errors on shutdown
- Wrapper script path resolution for symlinked execution
- Critical syntax errors in voice-mcp script

### Removed
- Unused python-dotenv dependency
- Temporary test files (test_audio.py, test_minimal_mcp.py)
- Redundant test dependencies in pyproject.toml
- All container/Docker support

## [0.1.12] - 2025-06-14

### Added
- Kokoro TTS support with configuration examples
- Export examples in .env.example for various setups
- Centralized version management and automatic PyPI publishing

### Changed
- Simplified project structure with top-level package

## [0.1.11] - 2025-06-13

### Added
- Initial voice-mcp implementation
- OpenAI-compatible STT/TTS support
- LiveKit integration for room-based voice communication
- MCP tool interface with converse, listen_for_speech, check_room_status, and check_audio_devices
- Debug mode with audio recording capabilities
- Support for multiple transport methods (local microphone and LiveKit)

## [0.1.0 - 0.1.10] - 2025-06-13

### Added
- Initial development and iteration of voice-mcp
- Basic MCP server structure
- OpenAI API integration for STT/TTS
- Audio recording and playback functionality
- Configuration via environment variables
@@ -0,0 +1,270 @@
Metadata-Version: 2.4
Name: voice-mode
Version: 0.1.24
Summary: Voice interaction capabilities for Model Context Protocol (MCP) servers (also available as voice-mcp)
Project-URL: Homepage, https://github.com/mbailey/voice-mcp
Project-URL: Repository, https://github.com/mbailey/voice-mcp
Project-URL: Issues, https://github.com/mbailey/voice-mcp/issues
Author-email: mbailey <mbailey@example.com>
License: MIT
Keywords: ai,livekit,llm,mcp,speech,stt,tts,voice
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Requires-Dist: audioop-lts; python_version >= '3.13'
Requires-Dist: fastmcp>=2.0.0
Requires-Dist: httpx
Requires-Dist: numpy
Requires-Dist: openai>=1.0.0
Requires-Dist: pydub
Requires-Dist: scipy
Requires-Dist: simpleaudio
Requires-Dist: sounddevice
Requires-Dist: uv>=0.4.0
Provides-Extra: dev
Requires-Dist: build>=1.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.10.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: twine>=4.0.0; extra == 'dev'
Provides-Extra: test
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'test'
Requires-Dist: pytest-cov>=4.0.0; extra == 'test'
Requires-Dist: pytest-mock>=3.10.0; extra == 'test'
Requires-Dist: pytest>=7.0.0; extra == 'test'
Description-Content-Type: text/markdown

# voice-mcp - Voice Mode for Claude Code

A Model Context Protocol (MCP) server that enables voice interactions with Claude and other LLMs. Requires only an OpenAI API key and microphone/speakers.

## 🖥️ Compatibility

**Runs on:** Linux • macOS • Windows (WSL) | **Python:** 3.10+ | **Tested:** Ubuntu 24.04 LTS, Fedora 42

## ✨ Features

- **Voice conversations** with Claude - ask questions and hear responses
- **Multiple transports** - local microphone or LiveKit room-based communication
- **🗣️ OpenAI-compatible** - works with any STT/TTS service (local or cloud)
- **⚡ Real-time** - low-latency voice interactions with automatic transport selection
- **🔧 MCP Integration** - seamless with Claude Desktop and other MCP clients

## 🎯 Simple Requirements

**All you need to get started:**

1. **OpenAI API Key** (or compatible service) - for speech-to-text and text-to-speech
2. **🎤 Computer with microphone and speakers** OR **LiveKit server** ([LiveKit Cloud](https://docs.livekit.io/home/cloud/) or [self-hosted](https://github.com/livekit/livekit))

## Quick Start

Setup for Claude Code:

```bash
export OPENAI_API_KEY=your-openai-key
claude mcp add voice-mcp uvx voice-mcp
claude
```

Try: *"Let's have a voice conversation"*

## 🎬 Demo

Watch voice-mcp in action:

[voice-mcp demo video](https://www.youtube.com/watch?v=aXRNWvpnwVs)

## Example Usage

Once configured, try these prompts with Claude:

- `"Let's have a voice conversation"`
- `"Ask me about my day using voice"`
- `"Tell me a joke"` (Claude will speak and wait for your response)
- `"Say goodbye"` (Claude will speak without waiting)

The new `converse` function makes voice interactions more natural - it automatically waits for your response by default.

## Claude Desktop Setup

Add to your Claude Desktop configuration file:

**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`

<details>
<summary>Using uvx (recommended)</summary>

```json
{
  "mcpServers": {
    "voice-mcp": {
      "command": "uvx",
      "args": ["voice-mcp"],
      "env": {
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}
```

</details>

<details>
<summary>Using pip install</summary>

```json
{
  "mcpServers": {
    "voice-mcp": {
      "command": "voice-mcp",
      "env": {
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}
```

</details>
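
The uvx entry above can also be merged into an existing configuration file with a small script. The following is a minimal sketch, not part of voice-mcp: the helper name `add_voice_mcp` is hypothetical, and it assumes the file is either absent or already valid JSON.

```python
import json
from pathlib import Path


def add_voice_mcp(config_path: Path, api_key: str) -> dict:
    """Merge a voice-mcp server entry into an MCP client config file.

    Hypothetical helper: other entries under "mcpServers" are left untouched.
    """
    # Start from the existing config if there is one, otherwise from scratch.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same shape as the "Using uvx" example in the README.
    servers["voice-mcp"] = {
        "command": "uvx",
        "args": ["voice-mcp"],
        "env": {"OPENAI_API_KEY": api_key},
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_voice_mcp(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json", "your-openai-key")` would target the macOS path listed above; restart the client afterwards so it picks up the change.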

## Tools

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `converse` | Have a voice conversation - speak and optionally listen | `message`, `wait_for_response` (default: true), `listen_duration` (default: 10s), `transport` (auto/local/livekit) |
| `listen_for_speech` | Listen for speech and convert to text | `duration` (default: 5s) |
| `check_room_status` | Check LiveKit room status and participants | None |
| `check_audio_devices` | List available audio input/output devices | None |
| `start_kokoro` | Start the Kokoro TTS service | `models_dir` (optional, defaults to ~/Models/kokoro) |
| `stop_kokoro` | Stop the Kokoro TTS service | None |
| `kokoro_status` | Check the status of Kokoro TTS service | None |

**Note:** The `converse` tool is the primary interface for voice interactions, combining speaking and listening in a natural flow.
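
For orientation, an MCP client invokes a tool by sending a JSON-RPC `tools/call` request. The sketch below builds such a request for `converse` using the parameter names and defaults from the table. The helper `converse_request` is hypothetical, and treating `auto` as the default transport and `listen_duration` as a number of seconds are assumptions.

```python
import json

VALID_TRANSPORTS = {"auto", "local", "livekit"}  # values listed in the Tools table


def converse_request(message: str, wait_for_response: bool = True,
                     listen_duration: float = 10.0, transport: str = "auto",
                     request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request for the `converse` tool.

    Hypothetical helper; the "auto" default and numeric duration are assumptions.
    """
    if transport not in VALID_TRANSPORTS:
        raise ValueError(f"unknown transport: {transport!r}")
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "converse",
            "arguments": {
                "message": message,
                "wait_for_response": wait_for_response,
                "listen_duration": listen_duration,
                "transport": transport,
            },
        },
    }
    return json.dumps(payload)
```

In practice an MCP host such as Claude Code assembles this request itself; the sketch only shows what the parameters in the table map to on the wire.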

## Configuration

**See [docs/configuration.md](docs/configuration.md) for complete setup instructions for all MCP hosts**

**Ready-to-use config files in [config-examples/](config-examples/)**

### Quick Setup

The only required configuration is your OpenAI API key:

```bash
export OPENAI_API_KEY="your-key"
```

### Optional Settings

```bash
# Custom STT/TTS services (OpenAI-compatible)
export STT_BASE_URL="http://localhost:2022/v1"  # Local Whisper
export TTS_BASE_URL="http://localhost:8880/v1"  # Local TTS
export TTS_VOICE="alloy"                        # Voice selection

# LiveKit (for room-based communication)
# See docs/livekit/ for setup guide
export LIVEKIT_URL="wss://your-app.livekit.cloud"
export LIVEKIT_API_KEY="your-api-key"
export LIVEKIT_API_SECRET="your-api-secret"

# Debug mode
export VOICE_MCP_DEBUG="true"

# Save all audio (TTS output and STT input)
export VOICE_MCP_SAVE_AUDIO="true"
```
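
As an illustration of how these variables might be consumed, here is a minimal sketch that reads them into a settings object. It is not the package's actual loader: `VoiceSettings`, `load_settings`, the fallback base URL, and the set of accepted truthy strings are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: fall back to the public OpenAI endpoint when no override is set.
OPENAI_DEFAULT_BASE_URL = "https://api.openai.com/v1"


def _flag(value: str) -> bool:
    """Interpret common truthy strings; the accepted set is an assumption."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class VoiceSettings:
    stt_base_url: str
    tts_base_url: str
    tts_voice: str
    livekit_url: Optional[str]
    debug: bool
    save_audio: bool


def load_settings(env: Mapping[str, str] = os.environ) -> VoiceSettings:
    """Read the optional settings documented above from an environment mapping."""
    return VoiceSettings(
        stt_base_url=env.get("STT_BASE_URL", OPENAI_DEFAULT_BASE_URL),
        tts_base_url=env.get("TTS_BASE_URL", OPENAI_DEFAULT_BASE_URL),
        tts_voice=env.get("TTS_VOICE", "alloy"),
        livekit_url=env.get("LIVEKIT_URL"),
        debug=_flag(env.get("VOICE_MCP_DEBUG", "")),
        save_audio=_flag(env.get("VOICE_MCP_SAVE_AUDIO", "")),
    )
```

Passing the environment as a mapping keeps the function easy to exercise with a plain dictionary, for example `load_settings({"TTS_VOICE": "af_sky"})`.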

## Local STT/TTS Services

For privacy-focused or offline usage, voice-mcp supports local speech services:

- **[Whisper.cpp](docs/whisper.cpp.md)** - Local speech-to-text with OpenAI-compatible API
- **[Kokoro](docs/kokoro.md)** - Local text-to-speech with multiple voice options

These services provide the same API interface as OpenAI, allowing seamless switching between cloud and local processing.

### OpenAI API Compatibility Benefits

By strictly adhering to OpenAI's API standard, voice-mcp enables powerful deployment flexibility:

- **Transparent Routing**: Users can implement their own API proxies or gateways outside of voice-mcp to route requests to different providers based on custom logic (cost, latency, availability, etc.)
- **🎯 Model Selection**: Deploy routing layers that select optimal models per request without modifying voice-mcp configuration
- **💰 Cost Optimization**: Build intelligent routers that balance between expensive cloud APIs and free local models
- **🔧 No Lock-in**: Switch providers by simply changing the `BASE_URL` - no code changes required

Example: Simply set `OPENAI_BASE_URL` to point to your custom router:
```bash
export OPENAI_BASE_URL="https://router.example.com/v1"
export OPENAI_API_KEY="your-key"
# voice-mcp now uses your router for all OpenAI API calls
```

The OpenAI SDK handles this automatically - no voice-mcp configuration needed!
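
To make the "change only the base URL" idea concrete, the sketch below builds (but does not send) a text-to-speech request against whatever `OPENAI_BASE_URL` points to. It uses only the standard library rather than the OpenAI SDK; the helper name `build_speech_request` and the `tts-1` model default are illustrative assumptions, while the `/audio/speech` path and the `model`, `voice`, and `input` fields follow OpenAI's speech endpoint.

```python
import json
import os
import urllib.request
from typing import Mapping


def build_speech_request(text: str, voice: str = "alloy", model: str = "tts-1",
                         env: Mapping[str, str] = os.environ) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /audio/speech endpoint.

    Hypothetical helper: only the base URL changes between providers.
    """
    base_url = env.get("OPENAI_BASE_URL", "https://api.openai.com/v1").rstrip("/")
    body = json.dumps({"model": model, "voice": voice, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=body,
        headers={
            "Authorization": f"Bearer {env['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With `OPENAI_BASE_URL` unset the request targets the public OpenAI endpoint; with the router URL from the example above, the same call targets `https://router.example.com/v1/audio/speech`.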

## Architecture

```
┌─────────────────────┐     ┌──────────────────┐     ┌─────────────────────┐
│   Claude/LLM        │     │  LiveKit Server  │     │  Voice Frontend     │
│   (MCP Client)      │◄───►│  (Optional)      │◄───►│  (Optional)         │
└─────────────────────┘     └──────────────────┘     └─────────────────────┘
          │                          │
          │                          │
          ▼                          ▼
┌─────────────────────┐     ┌──────────────────┐
│  Voice MCP Server   │     │  Audio Services  │
│  • converse         │     │  • OpenAI APIs   │
│  • listen_for_speech│◄───►│  • Local Whisper │
│  • check_room_status│     │  • Local TTS     │
│  • check_audio_devices    └──────────────────┘
└─────────────────────┘
```

## Troubleshooting

### Common Issues

- **No microphone access**: Check system permissions for terminal/application
- **UV not found**: Install with `curl -LsSf https://astral.sh/uv/install.sh | sh`
- **OpenAI API error**: Verify your `OPENAI_API_KEY` is set correctly
- **No audio output**: Check system audio settings and available devices

### Debug Mode

Enable detailed logging and audio file saving:

```bash
export VOICE_MCP_DEBUG=true
```

Debug audio files are saved to: `~/voice-mcp_recordings/`

### Audio Saving

To save all audio files (both TTS output and STT input):

```bash
export VOICE_MCP_SAVE_AUDIO=true
```

Audio files are saved to: `~/voice-mcp_audio/` with timestamps in the filename.
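
A timestamped save path of this kind could be produced as sketched below. The exact filename layout used by voice-mcp is not shown here, so the `YYYYMMDD_HHMMSS_<kind>.<ext>` pattern and the helper name `audio_save_path` are assumptions for illustration only.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def audio_save_path(kind: str, extension: str = "wav",
                    base_dir: Optional[Path] = None,
                    now: Optional[datetime] = None) -> Path:
    """Return a timestamped path for a saved audio file.

    Hypothetical helper: `kind` labels the source, e.g. "tts" or "stt",
    and the filename pattern is an assumption.
    """
    base = base_dir if base_dir is not None else Path.home() / "voice-mcp_audio"
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return base / f"{stamp}_{kind}.{extension}"
```

The function only computes the path; the caller is expected to create the directory (for example with `path.parent.mkdir(parents=True, exist_ok=True)`) before writing audio data.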

## License

MIT