voicesmith-mcp 1.0.17 → 1.0.19

package/README.md CHANGED
@@ -39,7 +39,7 @@ What the AI does automatically:
39
39
 
40
40
  | Moment | What happens |
41
41
  |--------|-------------|
42
- | You give it a task | Speaks a brief acknowledgment |
42
+ | You give it a task | Gets to work (speaks only when clarifying approach) |
43
43
  | It finishes work | Speaks a summary of what was done |
44
44
  | It has a question | Asks out loud, then listens for your voice response |
45
45
  | Voice tools unavailable | Falls back to text silently |
@@ -77,6 +77,43 @@ In a meeting or shared space? Just ask:
77
77
 
78
78
  The AI continues working normally — it just won't play audio. Say *"unmute"* when you're ready.
79
79
 
80
+ ---
81
+
82
+ ### Menu Bar App (macOS)
83
+
84
+ On macOS, VoiceSmith includes a native menu bar app for hands-free control:
85
+
86
+ - **Session Activity** — see all active sessions with real-time sparkline graphs
87
+ - **Quick toggles** — Media Ducking, Nudge on Timeout
88
+ - **Voice switcher** — browse and change from 54 voices, nested by language
89
+ - **Whisper model** — switch between base/small/medium/large-v3 with inline download progress
90
+ - **Audio devices** — choose audio output and input devices
91
+ - **Voice rules** — edit or reset to default
92
+ - **Updates** — check and install new versions
93
+
94
+ The menu bar app starts automatically at login and runs independently from IDE sessions.
95
+
96
+ ---
97
+
98
+ ### Audio Device Selection
99
+
100
+ Choose specific audio output (speakers/headphones) and input (microphone) devices from the menu bar app, or in config:
101
+
102
+ ```json
103
+ {
104
+ "tts": { "audio_output_device": "coreaudio/BuiltInSpeakerDevice" },
105
+ "stt": { "audio_input_device": 1 }
106
+ }
107
+ ```
108
+
109
+ Changes take effect immediately, with no restart needed. If a configured device is unavailable, VoiceSmith falls back to the system default.
110
+
111
+ ---
112
+
113
+ ### Interrupting Speech
114
+
115
+ Press **Escape** while the AI is speaking to stop audio immediately. The AI stops mid-sentence and waits for your next input.
116
+
80
117
  ## Alternative Install
81
118
 
82
119
  If you don't have Node.js or prefer a shell script:
@@ -87,7 +124,7 @@ cd voicesmith-mcp
87
124
  ./install.sh
88
125
  ```
89
126
 
90
- Supports the same flags: `--claude`, `--cursor`, `--codex`, `--all`.
127
+ Supports the same flags: `--claude`, `--cursor`, `--codex`, `--all`, `--uninstall`.
91
128
 
92
129
  ## MCP Tools
93
130
 
@@ -104,6 +141,7 @@ Once installed, your AI assistant has access to these tools:
104
141
  | `mute` / `unmute` | Silence or resume voice output |
105
142
  | `stop` | Stop playback or cancel an active recording |
106
143
  | `status` | Server health and session info |
144
+ | `list_audio_devices` | List available audio input and output devices |
107
145
 
108
146
  ## How It Works
109
147
 
@@ -112,7 +150,8 @@ The MCP server runs as a local process alongside your IDE. It communicates over
112
150
  - **TTS**: Kokoro ONNX — fast neural TTS, 54 voices, no GPU needed
113
151
  - **STT**: faster-whisper — OpenAI Whisper running locally via CTranslate2
114
152
  - **VAD**: Silero VAD — voice activity detection for clean recordings
115
- - **Audio**: mpv for playback, sounddevice for recording
153
+ - **Audio**: mpv for playback; CoreAudio via native app bundle on macOS (sounddevice fallback on Linux)
154
+ - **Media ducking**: Auto-pauses Apple Music, Spotify, and browser audio during speech (macOS)
116
155
 
117
156
  ## Multi-Session
118
157
 
@@ -131,16 +170,24 @@ Config lives at `~/.local/share/voicesmith-mcp/config.json`. Key settings:
131
170
  "main_agent": "Eric",
132
171
  "tts": {
133
172
  "default_voice": "am_eric",
134
- "audio_player": "mpv"
173
+ "audio_player": "mpv",
174
+ "duck_media": true
135
175
  },
136
176
  "stt": {
137
177
  "model_size": "base",
138
178
  "language": "en",
139
- "vad_threshold": 0.3
179
+ "vad_threshold": 0.3,
180
+ "nudge_on_timeout": false
140
181
  }
141
182
  }
142
183
  ```
143
184
 
185
+ | Setting | Description | Default |
186
+ |---------|-------------|---------|
187
+ | `tts.duck_media` | Auto-pause music/browser audio during speech (macOS) | `true` |
188
+ | `stt.nudge_on_timeout` | Speak "I didn't catch that" when listen times out | `false` |
189
+ | `stt.vad_threshold` | Voice detection sensitivity (lower = more sensitive) | `0.3` |
190
+
144
191
  Re-run `npx voicesmith-mcp install` to change your voice or update settings. Existing configuration is preserved — only new defaults are added.
145
192
 
146
193
  ## Requirements
@@ -166,16 +213,14 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
166
213
 
167
214
  ### The AI can't hear me (listen returns empty or times out)
168
215
 
169
- **Check microphone permissions.** On macOS, the terminal app that runs your IDE needs microphone access:
216
+ **Check microphone permissions.** On macOS, VoiceSmith uses a native app bundle (`VoiceSmithMCP.app`) for mic access. The first time it records, macOS should show a permission dialog for the app. If it didn't:
170
217
 
171
218
  1. Open **System Settings > Privacy & Security > Microphone**
172
- 2. Make sure your terminal app is listed and enabled:
173
- - **Warp**, **Terminal.app**, or **iTerm2** for Claude Code
174
- - **Cursor** or **VS Code** — if using those IDEs directly
175
- 3. If the app isn't listed, the first `listen` call should trigger the permission prompt. Approve it and try again.
219
+ 2. Look for **VoiceSmithMCP** and make sure it's enabled
220
+ 3. If it's not listed, the LaunchAgent may not be running; try reinstalling: `npx voicesmith-mcp install`
176
221
 
177
222
  > [!IMPORTANT]
178
- > The Python process inherits microphone permissions from the app that launched it. If your terminal doesn't have mic access, listen will silently fail.
223
+ > If the server detects silent audio (all zeros for ~320ms), it returns an error pointing you to the microphone permission settings. This usually means macOS TCC denied mic access.
179
224
 
180
225
  **Check your audio input device.** If an external mic is selected but not connected, the server opens it but gets silence:
181
226
  - Open **System Settings > Sound > Input** and verify the correct mic is selected
@@ -209,9 +254,11 @@ This can happen when another session is holding your preferred voice name. Ask t
209
254
 
210
255
  ```bash
211
256
  npx voicesmith-mcp uninstall
257
+ # or if installed via git clone:
258
+ ./install.sh --uninstall
212
259
  ```
213
260
 
214
- Removes all files, models, MCP config entries, and voice rules cleanly.
261
+ Removes all files, models, MCP config entries, voice rules, LaunchAgents, and hooks cleanly.
215
262
 
216
263
  ## License
217
264
 
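The silent-audio check mentioned in the README troubleshooting note above (all zeros for ~320ms) could be sketched roughly as follows. This is an illustration only: the 16 kHz sample rate, the function name `is_silent_prefix`, and the use of a plain sequence of integer samples are assumptions, not details taken from the package.

```python
from typing import Sequence

# Assumed values for illustration; the package's real sample rate is not shown in this diff.
SAMPLE_RATE_HZ = 16_000
SILENCE_WINDOW_MS = 320


def is_silent_prefix(samples: Sequence[int],
                     sample_rate_hz: int = SAMPLE_RATE_HZ,
                     window_ms: int = SILENCE_WINDOW_MS) -> bool:
    """Return True if the first `window_ms` of audio is exactly zero."""
    # Number of samples that make up the detection window.
    window = sample_rate_hz * window_ms // 1000
    if len(samples) < window:
        return False  # not enough audio yet to decide
    return all(s == 0 for s in samples[:window])
```

A stream of exact zeros usually points to a denied permission rather than a quiet room, since a live microphone almost always produces some noise. That is presumably why the server reports a permission error instead of an empty transcription.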
package/bin/install.js CHANGED
@@ -462,6 +462,122 @@ except Exception as e:
462
462
  }
463
463
  }
464
464
 
465
+ // ─── Step 5b: Menu Bar App (macOS only) ──────────────────────────────────────
466
+
467
+ async function step5b_menuBar() {
468
+ if (process.platform !== "darwin") return;
469
+
470
+ const menubarSrc = path.join(__dirname, "..", "menubar", "VoiceSmithMenu.swift");
471
+ const menubarIconSrc = path.join(__dirname, "..", "menubar", "app-icon.png");
472
+ const menubarPlistTemplate = path.join(__dirname, "..", "menubar", "com.voicesmith-mcp.menubar.plist");
473
+
474
+ if (!fs.existsSync(menubarSrc)) {
475
+ logWarn("Menu bar source not found — skipping");
476
+ return;
477
+ }
478
+
479
+ // Check for swiftc
480
+ if (!(await commandExists("swiftc"))) {
481
+ logWarn("swiftc not found — menu bar app requires Xcode Command Line Tools");
482
+ logInfo("Install with: xcode-select --install");
483
+ return;
484
+ }
485
+
486
+ const menubarApp = path.join(INSTALL_DIR, "VoiceSmith.app");
487
+ const menubarBinDir = path.join(menubarApp, "Contents", "MacOS");
488
+ const menubarResDir = path.join(menubarApp, "Contents", "Resources");
489
+ const menubarBinary = path.join(menubarBinDir, "VoiceSmith");
490
+ const menubarPlist = path.join(os.homedir(), "Library", "LaunchAgents", "com.voicesmith-mcp.menubar.plist");
491
+
492
+ // Create app bundle structure
493
+ fs.mkdirSync(menubarBinDir, { recursive: true });
494
+ fs.mkdirSync(menubarResDir, { recursive: true });
495
+
496
+ // Create Info.plist
497
+ const infoPlist = `<?xml version="1.0" encoding="UTF-8"?>
498
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
499
+ <plist version="1.0">
500
+ <dict>
501
+ <key>CFBundleExecutable</key>
502
+ <string>VoiceSmith</string>
503
+ <key>CFBundleIdentifier</key>
504
+ <string>com.voicesmith-mcp.menubar</string>
505
+ <key>CFBundleName</key>
506
+ <string>VoiceSmith</string>
507
+ <key>CFBundleDisplayName</key>
508
+ <string>VoiceSmith</string>
509
+ <key>CFBundlePackageType</key>
510
+ <string>APPL</string>
511
+ <key>CFBundleShortVersionString</key>
512
+ <string>1.0</string>
513
+ <key>CFBundleVersion</key>
514
+ <string>1</string>
515
+ <key>CFBundleIconFile</key>
516
+ <string>AppIcon</string>
517
+ <key>LSBackgroundOnly</key>
518
+ <true/>
519
+ <key>LSUIElement</key>
520
+ <true/>
521
+ </dict>
522
+ </plist>`;
523
+ fs.writeFileSync(path.join(menubarApp, "Contents", "Info.plist"), infoPlist);
524
+
525
+ // Compile Swift
526
+ logAction("Building VoiceSmith menu bar app...");
527
+ const buildResult = await runCommand("swiftc", [
528
+ "-parse-as-library",
529
+ "-framework", "SwiftUI",
530
+ "-framework", "AppKit",
531
+ menubarSrc,
532
+ "-o", menubarBinary,
533
+ ]);
534
+
535
+ if (!buildResult.success) {
536
+ logWarn("Menu bar build failed — menu bar will not be available");
537
+ return;
538
+ }
539
+
540
+ // Generate icon
541
+ if (fs.existsSync(menubarIconSrc)) {
542
+ const { execSync } = require("child_process");
543
+ const iconsetDir = path.join(os.tmpdir(), "VoiceSmithIcon.iconset");
544
+ fs.mkdirSync(iconsetDir, { recursive: true });
545
+ const sizes = [16, 32, 64, 128, 256, 512];
546
+ try {
547
+ for (const s of sizes) {
548
+ execSync(`sips -z ${s} ${s} "${menubarIconSrc}" --out "${path.join(iconsetDir, `icon_${s}x${s}.png`)}"`, { stdio: "ignore" });
549
+ }
550
+ execSync(`sips -z 32 32 "${menubarIconSrc}" --out "${path.join(iconsetDir, "icon_16x16@2x.png")}"`, { stdio: "ignore" });
551
+ execSync(`sips -z 64 64 "${menubarIconSrc}" --out "${path.join(iconsetDir, "icon_32x32@2x.png")}"`, { stdio: "ignore" });
552
+ execSync(`sips -z 256 256 "${menubarIconSrc}" --out "${path.join(iconsetDir, "icon_128x128@2x.png")}"`, { stdio: "ignore" });
553
+ execSync(`sips -z 512 512 "${menubarIconSrc}" --out "${path.join(iconsetDir, "icon_256x256@2x.png")}"`, { stdio: "ignore" });
554
+ execSync(`sips -z 1024 1024 "${menubarIconSrc}" --out "${path.join(iconsetDir, "icon_512x512@2x.png")}"`, { stdio: "ignore" });
555
+ execSync(`iconutil -c icns "${iconsetDir}" -o "${path.join(menubarResDir, "AppIcon.icns")}"`, { stdio: "ignore" });
556
+ } catch (e) { /* icon generation is optional */ }
557
+ fs.rmSync(iconsetDir, { recursive: true, force: true });
558
+ }
559
+
560
+ // Codesign
561
+ await runCommand("codesign", ["-s", "-", "--force", menubarApp]);
562
+ logActionDone("VoiceSmith menu bar app built");
563
+
564
+ // Install LaunchAgent
565
+ if (fs.existsSync(menubarPlistTemplate)) {
566
+ fs.mkdirSync(path.dirname(menubarPlist), { recursive: true });
567
+ let plistContent = fs.readFileSync(menubarPlistTemplate, "utf8");
568
+ plistContent = plistContent.replace(/MENUBAR_BINARY/g, menubarBinary);
569
+ fs.writeFileSync(menubarPlist, plistContent);
570
+
571
+ await runCommand("launchctl", ["unload", menubarPlist]);
572
+ const loadResult = await runCommand("launchctl", ["load", "-w", menubarPlist]);
573
+ if (loadResult.success) {
574
+ logOk("VoiceSmith menu bar started (runs at login)");
575
+ } else {
576
+ logWarn("Menu bar LaunchAgent install failed");
577
+ }
578
+ }
579
+ }
580
+
465
581
  // ─── Voice Picker ────────────────────────────────────────────────────────────
466
582
 
467
583
  const DEFAULT_VOICES = [
@@ -726,6 +842,7 @@ async function run() {
726
842
  await step3_models();
727
843
  const configuredIdes = await step4_mcpConfig(targetIdes);
728
844
  await step5_microphone();
845
+ await step5b_menuBar();
729
846
  await step6_voiceRules(configuredIdes);
730
847
 
731
848
  const ideNames = (configuredIdes || [])
package/bin/uninstall.js CHANGED
@@ -6,6 +6,7 @@
6
6
 
7
7
  const fs = require("fs");
8
8
  const path = require("path");
9
+ const os = require("os");
9
10
 
10
11
  const {
11
12
  INSTALL_DIR,
@@ -60,6 +61,22 @@ async function run() {
60
61
 
61
62
  console.log("");
62
63
 
64
+ // Unload and remove LaunchAgents before deleting the install directory
65
+ const { execSync } = require("child_process");
66
+ const launchAgents = [
67
+ path.join(os.homedir(), "Library", "LaunchAgents", "com.voicesmith-mcp.audio.plist"),
68
+ path.join(os.homedir(), "Library", "LaunchAgents", "com.voicesmith-mcp.menubar.plist"),
69
+ ];
70
+ for (const plist of launchAgents) {
71
+ if (fileExists(plist)) {
72
+ try {
73
+ execSync(`launchctl unload "${plist}" 2>/dev/null`, { stdio: "ignore" });
74
+ fs.unlinkSync(plist);
75
+ logOk(`Removed LaunchAgent: ${path.basename(plist)}`);
76
+ } catch (e) { /* ignore */ }
77
+ }
78
+ }
79
+
63
80
  // Remove install directory (venv, models, server files, config)
64
81
  if (dirExists(INSTALL_DIR)) {
65
82
  fs.rmSync(INSTALL_DIR, { recursive: true, force: true });
package/config.json CHANGED
@@ -20,6 +20,7 @@
20
20
  "log_level": "info",
21
21
  "log_file": false,
22
22
  "http_port": 7865,
23
+ "check_updates": true,
23
24
  "wake_word": {
24
25
  "enabled": false,
25
26
  "model": "hey_listen",
package/config.py CHANGED
@@ -28,6 +28,7 @@ class TTSConfig:
28
28
  default_speed: float = 1.0
29
29
  audio_player: str = "mpv"
30
30
  duck_media: bool = False
31
+ audio_output_device: Optional[str] = None # mpv device name, None = system default
31
32
 
32
33
 
33
34
  @dataclass
@@ -37,6 +38,8 @@ class STTConfig:
37
38
  silence_threshold: float = 1.5
38
39
  max_listen_timeout: float = 15
39
40
  vad_threshold: float = 0.3
41
+ nudge_on_timeout: bool = False
42
+ audio_input_device: Optional[int] = None # sounddevice device index, None = system default
40
43
 
41
44
 
42
45
  @dataclass
@@ -60,6 +63,7 @@ class AppConfig:
60
63
  log_level: str = "info"
61
64
  log_file: bool = False
62
65
  http_port: int = 7865
66
+ check_updates: bool = True
63
67
 
64
68
 
65
69
  def get_config_path() -> Path:
@@ -103,6 +107,8 @@ def load_config(config_path: Optional[Path] = None) -> AppConfig:
103
107
  config.tts.audio_player = tts["audio_player"]
104
108
  if "duck_media" in tts:
105
109
  config.tts.duck_media = bool(tts["duck_media"])
110
+ if "audio_output_device" in tts:
111
+ config.tts.audio_output_device = tts["audio_output_device"]
106
112
 
107
113
  # STT config
108
114
  if "stt" in data:
@@ -117,6 +123,11 @@ def load_config(config_path: Optional[Path] = None) -> AppConfig:
117
123
  config.stt.max_listen_timeout = float(stt["max_listen_timeout"])
118
124
  if "vad_threshold" in stt:
119
125
  config.stt.vad_threshold = float(stt["vad_threshold"])
126
+ if "nudge_on_timeout" in stt:
127
+ config.stt.nudge_on_timeout = bool(stt["nudge_on_timeout"])
128
+ if "audio_input_device" in stt:
129
+ val = stt["audio_input_device"]
130
+ config.stt.audio_input_device = int(val) if val is not None else None
120
131
 
121
132
  # Top-level config
122
133
  if "main_agent" in data:
@@ -131,6 +142,8 @@ def load_config(config_path: Optional[Path] = None) -> AppConfig:
131
142
  config.log_file = bool(data["log_file"])
132
143
  if "http_port" in data:
133
144
  config.http_port = int(data["http_port"])
145
+ if "check_updates" in data:
146
+ config.check_updates = bool(data["check_updates"])
134
147
 
135
148
  # Wake word config
136
149
  if "wake_word" in data:
@@ -184,6 +197,7 @@ def save_config(config: AppConfig, config_path: Optional[Path] = None) -> None:
184
197
  "default_speed": config.tts.default_speed,
185
198
  "audio_player": config.tts.audio_player,
186
199
  "duck_media": config.tts.duck_media,
200
+ "audio_output_device": config.tts.audio_output_device,
187
201
  },
188
202
  "stt": {
189
203
  "model_size": config.stt.model_size,
@@ -191,6 +205,8 @@ def save_config(config: AppConfig, config_path: Optional[Path] = None) -> None:
191
205
  "silence_threshold": config.stt.silence_threshold,
192
206
  "max_listen_timeout": config.stt.max_listen_timeout,
193
207
  "vad_threshold": config.stt.vad_threshold,
208
+ "nudge_on_timeout": config.stt.nudge_on_timeout,
209
+ "audio_input_device": config.stt.audio_input_device,
194
210
  },
195
211
  "main_agent": config.main_agent,
196
212
  "last_voice_name": config.last_voice_name,
@@ -198,6 +214,7 @@ def save_config(config: AppConfig, config_path: Optional[Path] = None) -> None:
198
214
  "log_level": config.log_level,
199
215
  "log_file": config.log_file,
200
216
  "http_port": config.http_port,
217
+ "check_updates": config.check_updates,
201
218
  "wake_word": {
202
219
  "enabled": config.wake_word.enabled,
203
220
  "model": config.wake_word.model,
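To illustrate how the new optional device settings are coerced on load, here is a minimal, self-contained sketch that mirrors the `audio_output_device` and `audio_input_device` handling in the `config.py` hunks above. The `DeviceSettings` class and `parse_devices` function are hypothetical names introduced for this example; they are not part of the package.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DeviceSettings:  # hypothetical container, not the package's class
    audio_output_device: Optional[str] = None  # None = system default
    audio_input_device: Optional[int] = None   # None = system default


def parse_devices(data: Dict[str, Any]) -> DeviceSettings:
    """Read the device keys the same way the diff's load_config does."""
    settings = DeviceSettings()
    tts = data.get("tts", {})
    stt = data.get("stt", {})
    if "audio_output_device" in tts:
        settings.audio_output_device = tts["audio_output_device"]
    if "audio_input_device" in stt:
        val = stt["audio_input_device"]
        # An explicit JSON null must stay None rather than raise inside int().
        settings.audio_input_device = int(val) if val is not None else None
    return settings
```

Passing the README's example config (`"coreaudio/BuiltInSpeakerDevice"` for output, `1` for input) through `parse_devices` yields a string output device and an integer input index. Omitting either key leaves it as `None`, which both dataclass comments describe as the system default.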
@@ -94,36 +94,29 @@ except:
94
94
  fi
95
95
  fi
96
96
 
97
- # Fallback: read sessions.json directly if HTTP call didn't work
98
- if [ -z "$SESSION_NAME" ]; then
99
- SESSION_INFO=$(python3 -c "
100
- import json, os
97
+ # Fallback: query the server's /status endpoint for the actual name
98
+ if [ -z "$SESSION_NAME" ] && [ -n "$PORT" ]; then
99
+ STATUS=$(curl -s --max-time 2 "http://127.0.0.1:$PORT/status" 2>/dev/null)
100
+ if [ -n "$STATUS" ]; then
101
+ SESSION_NAME=$(echo "$STATUS" | python3 -c "
102
+ import sys, json
101
103
  try:
102
- with open('$SESSIONS_FILE') as f:
103
- data = json.load(f)
104
- tmux = os.environ.get('VOICESMITH_TMUX', '')
105
- for s in data.get('sessions', []):
106
- try:
107
- os.kill(s['pid'], 0)
108
- if tmux and s.get('tmux_session') == tmux:
109
- print(f\"{s['name']}|{s['voice']}\")
110
- raise SystemExit
111
- except (OSError, ProcessLookupError):
112
- pass
113
- for s in reversed(data.get('sessions', [])):
114
- try:
115
- os.kill(s['pid'], 0)
116
- print(f\"{s['name']}|{s['voice']}\")
117
- break
118
- except (OSError, ProcessLookupError):
119
- pass
104
+ d = json.load(sys.stdin)
105
+ # Check session object first (new servers), fall back to top-level name
106
+ s = d.get('session') or d
107
+ print(s.get('name', ''))
108
+ except:
109
+ pass
110
+ " 2>/dev/null)
111
+ SESSION_VOICE=$(echo "$STATUS" | python3 -c "
112
+ import sys, json
113
+ try:
114
+ d = json.load(sys.stdin)
115
+ s = d.get('session') or d
116
+ print(s.get('voice', ''))
120
117
  except:
121
118
  pass
122
119
  " 2>/dev/null)
123
-
124
- if [ -n "$SESSION_INFO" ]; then
125
- SESSION_NAME=$(echo "$SESSION_INFO" | cut -d'|' -f1)
126
- SESSION_VOICE=$(echo "$SESSION_INFO" | cut -d'|' -f2)
127
120
  fi
128
121
  fi
129
122
  fi
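The new fallback in the hunk above parses the same `/status` JSON twice, once for the name and once for the voice. A single-pass equivalent might look like the sketch below. `extract_session` is a hypothetical helper, and the two payload shapes (a nested `session` object, or top-level `name` and `voice` keys) are inferred from the comment in the hunk rather than from a documented schema.

```python
import json
from typing import Tuple


def extract_session(status_json: str) -> Tuple[str, str]:
    """Return (name, voice) from a /status payload, or ('', '') on bad input."""
    try:
        d = json.loads(status_json)
    except (json.JSONDecodeError, TypeError):
        return "", ""
    if not isinstance(d, dict):
        return "", ""
    # Prefer the nested session object (new servers); otherwise use the top level.
    s = d.get("session") or d
    if not isinstance(s, dict):
        return "", ""
    return str(s.get("name", "")), str(s.get("voice", ""))
```

For example, both `'{"session": {"name": "Eric", "voice": "am_eric"}}'` and `'{"name": "Eric", "voice": "am_eric"}'` produce the same pair, and malformed input yields two empty strings instead of an exception, matching the hook's silent `except` behaviour.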
@@ -0,0 +1,25 @@
1
+ <?xml version="1.0" encoding="UTF-8"?>
2
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
3
+ "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
4
+ <plist version="1.0">
5
+ <dict>
6
+ <key>CFBundleExecutable</key>
7
+ <string>VoiceSmithMCP</string>
8
+ <key>CFBundleIdentifier</key>
9
+ <string>com.voicesmith-mcp.launcher</string>
10
+ <key>CFBundleName</key>
11
+ <string>VoiceSmithMCP</string>
12
+ <key>CFBundlePackageType</key>
13
+ <string>APPL</string>
14
+ <key>CFBundleShortVersionString</key>
15
+ <string>1.0</string>
16
+ <key>CFBundleVersion</key>
17
+ <string>1</string>
18
+ <!-- Background-only: no Dock icon or app switcher entry -->
19
+ <key>LSBackgroundOnly</key>
20
+ <true/>
21
+ <!-- Required for macOS TCC to show a mic permission dialog for this bundle -->
22
+ <key>NSMicrophoneUsageDescription</key>
23
+ <string>VoiceSmith MCP uses the microphone to transcribe voice input for Claude.</string>
24
+ </dict>
25
+ </plist>