npm - voicesmith-mcp - Versions diffs - 1.0.17 → 1.0.19 - Mend

voicesmith-mcp 1.0.17 → 1.0.19

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

package/README.md +59 -12
package/bin/install.js +117 -0
package/bin/uninstall.js +17 -0
package/config.json +1 -0
package/config.py +17 -0
package/hooks/session-start.sh +19 -26
package/launcher/Info.plist +25 -0
package/launcher/audio_service.c +267 -0
package/launcher/com.voicesmith-mcp.audio.plist +41 -0
package/launcher/main.c +91 -0
package/launcher/mic_capture.c +161 -0
package/menubar/VoiceSmithMenu.swift +1668 -0
package/menubar/app-icon.png +0 -0
package/menubar/com.voicesmith-mcp.menubar.plist +27 -0
package/package.json +6 -2
package/requirements.txt +1 -0
package/server.py +453 -42
package/stt/__pycache__/mic_capture.cpython-314.pyc +0 -0
package/stt/mic_capture.py +6 -2
package/templates/voice-rules.md +2 -1
package/tts/__pycache__/audio_player.cpython-314.pyc +0 -0
package/tts/__pycache__/kokoro_engine.cpython-314.pyc +0 -0
package/tts/__pycache__/media_duck.cpython-314.pyc +0 -0
package/tts/__pycache__/speech_queue.cpython-314.pyc +0 -0
package/tts/audio_player.py +80 -3
package/tts/kokoro_engine.py +11 -4
package/voice_registry.py +23 -10

package/README.md CHANGED

@@ -39,7 +39,7 @@ What the AI does automatically:
 | Moment | What happens |
 |--------|-------------|
-| You give it a task | Speaks a brief acknowledgment |
+| You give it a task | Gets to work (speaks only when clarifying approach) |
 | It finishes work | Speaks a summary of what was done |
 | It has a question | Asks out loud, then listens for your voice response |
 | Voice tools unavailable | Falls back to text silently |
@@ -77,6 +77,43 @@ In a meeting or shared space? Just ask:
 The AI continues working normally — it just won't play audio. Say *"unmute"* when you're ready.
+---
+### Menu Bar App (macOS)
+On macOS, VoiceSmith includes a native menu bar app for hands-free control:
+- **Session Activity** — see all active sessions with real-time sparkline graphs
+- **Quick toggles** — Media Ducking, Nudge on Timeout
+- **Voice switcher** — browse and change from 54 voices, nested by language
+- **Whisper model** — switch between base/small/medium/large-v3 with inline download progress
+- **Audio devices** — choose audio output and input devices
+- **Voice rules** — edit or reset to default
+- **Updates** — check and install new versions
+The menu bar app starts automatically at login and runs independently from IDE sessions.
+---
+### Audio Device Selection
+Choose specific audio output (speakers/headphones) and input (microphone) devices from the menu bar app, or in config:
+```json
+{
+  "tts": { "audio_output_device": "coreaudio/BuiltInSpeakerDevice" },
+  "stt": { "audio_input_device": 1 }
+}
+```
+Changes take effect immediately — no restart needed. If a configured device is unavailable, falls back to system default.
+---
+### Interrupting Speech
+Press **Escape** while the AI is speaking to stop audio immediately. The AI stops mid-sentence and waits for your next input.
 ## Alternative Install
 If you don't have Node.js or prefer a shell script:
@@ -87,7 +124,7 @@ cd voicesmith-mcp
 ./install.sh
 ```
-Supports the same flags: `--claude`, `--cursor`, `--codex`, `--all`.
+Supports the same flags: `--claude`, `--cursor`, `--codex`, `--all`, `--uninstall`.
 ## MCP Tools
@@ -104,6 +141,7 @@ Once installed, your AI assistant has access to these tools:
 | `mute` / `unmute` | Silence or resume voice output |
 | `stop` | Stop playback or cancel an active recording |
 | `status` | Server health and session info |
+| `list_audio_devices` | List available audio input and output devices |
 ## How It Works
@@ -112,7 +150,8 @@ The MCP server runs as a local process alongside your IDE. It communicates over
 - **TTS**: Kokoro ONNX — fast neural TTS, 54 voices, no GPU needed
 - **STT**: faster-whisper — OpenAI Whisper running locally via CTranslate2
 - **VAD**: Silero VAD — voice activity detection for clean recordings
-- **Audio**: mpv for playback, sounddevice for recording
+- **Audio**: mpv for playback; CoreAudio via native app bundle on macOS (sounddevice fallback on Linux)
+- **Media ducking**: Auto-pauses Apple Music, Spotify, and browser audio during speech (macOS)
 ## Multi-Session
@@ -131,16 +170,24 @@ Config lives at `~/.local/share/voicesmith-mcp/config.json`. Key settings:
   "main_agent": "Eric",
   "tts": {
     "default_voice": "am_eric",
-    "audio_player": "mpv"
+    "audio_player": "mpv",
+    "duck_media": true
   },
   "stt": {
     "model_size": "base",
     "language": "en",
-    "vad_threshold": 0.3
+    "vad_threshold": 0.3,
+    "nudge_on_timeout": false
   }
 }
 ```
+| Setting | Description | Default |
+|---------|-------------|---------|
+| `tts.duck_media` | Auto-pause music/browser audio during speech (macOS) | `true` |
+| `stt.nudge_on_timeout` | Speak "I didn't catch that" when listen times out | `false` |
+| `stt.vad_threshold` | Voice detection sensitivity (lower = more sensitive) | `0.3` |
 Re-run `npx voicesmith-mcp install` to change your voice or update settings. Existing configuration is preserved — only new defaults are added.
 ## Requirements
@@ -166,16 +213,14 @@ Re-run `npx voicesmith-mcp install` to change your voice or update settings. Exi
 ### The AI can't hear me (listen returns empty or times out)
-**Check microphone permissions.** On macOS, the terminal app that runs your IDE needs microphone access:
+**Check microphone permissions.** On macOS, VoiceSmith uses a native app bundle (`VoiceSmithMCP.app`) for mic access. The first time it records, macOS should show a permission dialog for the app. If it didn't:
 1. Open **System Settings > Privacy & Security > Microphone**
-2. Make sure your terminal app is listed and enabled:
-   - **Warp**, **Terminal.app**, or **iTerm2** — for Claude Code
-   - **Cursor** or **VS Code** — if using those IDEs directly
-3. If the app isn't listed, the first `listen` call should trigger the permission prompt. Approve it and try again.
+2. Look for **VoiceSmithMCP** and make sure it's enabled
+3. If it's not listed, the LaunchAgent may not be running — try reinstalling: `npx voicesmith-mcp install`
 > [!IMPORTANT]
-> The Python process inherits microphone permissions from the app that launched it. If your terminal doesn't have mic access, listen will silently fail.
+> If the server detects silent audio (all zeros for ~320ms), it returns an error pointing you to the microphone permission settings. This usually means macOS TCC denied mic access.
 **Check your audio input device.** If an external mic is selected but not connected, the server opens it but gets silence:
 - Open **System Settings > Sound > Input** and verify the correct mic is selected
@@ -209,9 +254,11 @@ This can happen when another session is holding your preferred voice name. Ask t
 ```bash
 npx voicesmith-mcp uninstall
+# or if installed via git clone:
+./install.sh --uninstall
 ```
-Removes all files, models, MCP config entries, and voice rules cleanly.
+Removes all files, models, MCP config entries, voice rules, LaunchAgents, and hooks cleanly.
 ## License
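
The troubleshooting note above says the server returns an error when captured audio is all zeros for about 320 ms, which usually means macOS TCC denied mic access. The server's actual check is not shown in this diff. Below is a minimal sketch of that kind of test. It assumes 16 kHz mono samples, and the names `is_digital_silence`, `SAMPLE_RATE_HZ` and `SILENCE_WINDOW_S` are hypothetical.

```python
from typing import Sequence

# Assumptions (not taken from server.py): 16 kHz mono capture and a check
# over the first ~320 ms of a recording, as described in the README.
SAMPLE_RATE_HZ = 16_000
SILENCE_WINDOW_S = 0.32


def is_digital_silence(samples: Sequence[float],
                       sample_rate: int = SAMPLE_RATE_HZ,
                       window_s: float = SILENCE_WINDOW_S) -> bool:
    """Return True if the first `window_s` seconds are exactly zero.

    The test is for exact zeros, not for quiet audio. A working mic in a
    quiet room still produces small non-zero values from its noise floor.
    """
    needed = int(round(sample_rate * window_s))  # 5120 samples at 16 kHz
    if len(samples) < needed:
        return False  # not enough audio captured yet to decide
    return all(s == 0 for s in samples[:needed])
```

Testing for exact zeros rather than a low RMS threshold keeps this check separate from VAD. A quiet speaker is VAD's concern, while a stream of zeros points to a permission or device problem.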

package/bin/install.js CHANGED

@@ -462,6 +462,122 @@ except Exception as e:
   }
 }
+// ─── Step 5b: Menu Bar App (macOS only) ──────────────────────────────────────
+async function step5b_menuBar() {
+  if (process.platform !== "darwin") return;
+  const menubarSrc = path.join(__dirname, "..", "menubar", "VoiceSmithMenu.swift");
+  const menubarIconSrc = path.join(__dirname, "..", "menubar", "app-icon.png");
+  const menubarPlistTemplate = path.join(__dirname, "..", "menubar", "com.voicesmith-mcp.menubar.plist");
+  if (!fs.existsSync(menubarSrc)) {
+    logWarn("Menu bar source not found — skipping");
+    return;
+  }
+  // Check for swiftc
+  if (!(await commandExists("swiftc"))) {
+    logWarn("swiftc not found — menu bar app requires Xcode Command Line Tools");
+    logInfo("Install with: xcode-select --install");
+    return;
+  }
+  const menubarApp = path.join(INSTALL_DIR, "VoiceSmith.app");
+  const menubarBinDir = path.join(menubarApp, "Contents", "MacOS");
+  const menubarResDir = path.join(menubarApp, "Contents", "Resources");
+  const menubarBinary = path.join(menubarBinDir, "VoiceSmith");
+  const menubarPlist = path.join(os.homedir(), "Library", "LaunchAgents", "com.voicesmith-mcp.menubar.plist");
+  // Create app bundle structure
+  fs.mkdirSync(menubarBinDir, { recursive: true });
+  fs.mkdirSync(menubarResDir, { recursive: true });
+  // Create Info.plist
+  const infoPlist = `<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>CFBundleExecutable</key>
+    <string>VoiceSmith</string>
+    <key>CFBundleIdentifier</key>
+    <string>com.voicesmith-mcp.menubar</string>
+    <key>CFBundleName</key>
+    <string>VoiceSmith</string>
+    <key>CFBundleDisplayName</key>
+    <string>VoiceSmith</string>
+    <key>CFBundlePackageType</key>
+    <string>APPL</string>
+    <key>CFBundleShortVersionString</key>
+    <string>1.0</string>
+    <key>CFBundleVersion</key>
+    <string>1</string>
+    <key>CFBundleIconFile</key>
+    <string>AppIcon</string>
+    <key>LSBackgroundOnly</key>
+    <true/>
+    <key>LSUIElement</key>
+    <true/>
+</dict>
+</plist>`;
+  fs.writeFileSync(path.join(menubarApp, "Contents", "Info.plist"), infoPlist);
+  // Compile Swift
+  logAction("Building VoiceSmith menu bar app...");
+  const buildResult = await runCommand("swiftc", [
+    "-parse-as-library",
+    "-framework", "SwiftUI",
+    "-framework", "AppKit",
+    menubarSrc,
+    "-o", menubarBinary,
+  ]);
+  if (!buildResult.success) {
+    logWarn("Menu bar build failed — menu bar will not be available");
+    return;
+  }
+  // Generate icon
+  if (fs.existsSync(menubarIconSrc)) {
+    const { execSync } = require("child_process");
+    const iconsetDir = path.join(os.tmpdir(), "VoiceSmithIcon.iconset");
+    fs.mkdirSync(iconsetDir, { recursive: true });
+    const sizes = [16, 32, 64, 128, 256, 512];
+    try {
+      for (const s of sizes) {
+        execSync(`sips -z ${s} ${s} "${menubarIconSrc}" --out "${path.join(iconsetDir, `icon_${s}x${s}.png`)}"`, { stdio: "ignore" });
+      }
+      execSync(`sips -z 32 32 "${menubarIconSrc}" --out "${path.join(iconsetDir, "icon_16x16@2x.png")}"`, { stdio: "ignore" });
+      execSync(`sips -z 64 64 "${menubarIconSrc}" --out "${path.join(iconsetDir, "icon_32x32@2x.png")}"`, { stdio: "ignore" });
+      execSync(`sips -z 256 256 "${menubarIconSrc}" --out "${path.join(iconsetDir, "icon_128x128@2x.png")}"`, { stdio: "ignore" });
+      execSync(`sips -z 512 512 "${menubarIconSrc}" --out "${path.join(iconsetDir, "icon_256x256@2x.png")}"`, { stdio: "ignore" });
+      execSync(`sips -z 1024 1024 "${menubarIconSrc}" --out "${path.join(iconsetDir, "icon_512x512@2x.png")}"`, { stdio: "ignore" });
+      execSync(`iconutil -c icns "${iconsetDir}" -o "${path.join(menubarResDir, "AppIcon.icns")}"`, { stdio: "ignore" });
+    } catch (e) { /* icon generation is optional */ }
+    fs.rmSync(iconsetDir, { recursive: true, force: true });
+  }
+  // Codesign
+  await runCommand("codesign", ["-s", "-", "--force", menubarApp]);
+  logActionDone("VoiceSmith menu bar app built");
+  // Install LaunchAgent
+  if (fs.existsSync(menubarPlistTemplate)) {
+    fs.mkdirSync(path.dirname(menubarPlist), { recursive: true });
+    let plistContent = fs.readFileSync(menubarPlistTemplate, "utf8");
+    plistContent = plistContent.replace(/MENUBAR_BINARY/g, menubarBinary);
+    fs.writeFileSync(menubarPlist, plistContent);
+    await runCommand("launchctl", ["unload", menubarPlist]);
+    const loadResult = await runCommand("launchctl", ["load", "-w", menubarPlist]);
+    if (loadResult.success) {
+      logOk("VoiceSmith menu bar started (runs at login)");
+    } else {
+      logWarn("Menu bar LaunchAgent install failed");
+    }
+  }
+}
 // ─── Voice Picker ────────────────────────────────────────────────────────────
 const DEFAULT_VOICES = [
@@ -726,6 +842,7 @@ async function run() {
   await step3_models();
   const configuredIdes = await step4_mcpConfig(targetIdes);
   await step5_microphone();
+  await step5b_menuBar();
   await step6_voiceRules(configuredIdes);
   const ideNames = (configuredIdes || [])
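
`step5b_menuBar` builds the `.iconset` by calling `sips` once per size with shell-interpolated command strings. The sketch below restates the same size table as data: six 1x files plus five `@2x` files, with pixel sizes copied from the `execSync` calls above. It then builds argument lists suitable for a non-shell runner such as Python's `subprocess.run`. The helper names are hypothetical, and the final `iconutil -c icns` step is left out.

```python
import os
from typing import List, Tuple

# Pixel sizes mirrored from the execSync calls in step5b_menuBar above.
BASE_SIZES = [16, 32, 64, 128, 256, 512]  # written as icon_NxN.png
RETINA_SIZES = [16, 32, 128, 256, 512]    # also written as icon_NxN@2x.png


def iconset_targets() -> List[Tuple[str, int]]:
    """Return (file name, pixel size) pairs for the .iconset directory."""
    targets = [(f"icon_{s}x{s}.png", s) for s in BASE_SIZES]
    # An @2x file for logical size N is rendered at 2N pixels.
    targets += [(f"icon_{s}x{s}@2x.png", s * 2) for s in RETINA_SIZES]
    return targets


def sips_commands(src: str, iconset_dir: str) -> List[List[str]]:
    """Build one `sips -z H W src --out dest` argument list per target."""
    return [
        ["sips", "-z", str(px), str(px), src,
         "--out", os.path.join(iconset_dir, name)]
        for name, px in iconset_targets()
    ]
```

Argument lists avoid shell quoting entirely, so paths that contain spaces or quotes need no escaping.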

package/bin/uninstall.js CHANGED

@@ -6,6 +6,7 @@
 const fs = require("fs");
 const path = require("path");
+const os = require("os");
 const {
   INSTALL_DIR,
@@ -60,6 +61,22 @@ async function run() {
   console.log("");
+  // Unload and remove LaunchAgents before deleting the install directory
+  const { execSync } = require("child_process");
+  const launchAgents = [
+    path.join(os.homedir(), "Library", "LaunchAgents", "com.voicesmith-mcp.audio.plist"),
+    path.join(os.homedir(), "Library", "LaunchAgents", "com.voicesmith-mcp.menubar.plist"),
+  ];
+  for (const plist of launchAgents) {
+    if (fileExists(plist)) {
+      try {
+        execSync(`launchctl unload "${plist}" 2>/dev/null`, { stdio: "ignore" });
+        fs.unlinkSync(plist);
+        logOk(`Removed LaunchAgent: ${path.basename(plist)}`);
+      } catch (e) { /* ignore */ }
+    }
+  }
   // Remove install directory (venv, models, server files, config)
   if (dirExists(INSTALL_DIR)) {
     fs.rmSync(INSTALL_DIR, { recursive: true, force: true });
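
The uninstaller now unloads and deletes the two LaunchAgents before it removes the install directory. Below is a Python sketch of the same unload-then-delete loop. The plist names come from the hunk above, while `remove_launch_agents` and its injectable `run` parameter are hypothetical and exist only so the logic can be exercised off macOS.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List

# LaunchAgent plist names taken from uninstall.js above.
AGENT_FILES = [
    "com.voicesmith-mcp.audio.plist",
    "com.voicesmith-mcp.menubar.plist",
]


def remove_launch_agents(agents_dir: Path,
                         names: Iterable[str] = AGENT_FILES,
                         run: Callable = subprocess.run) -> List[str]:
    """Unload and delete each LaunchAgent plist that exists; return removed names."""
    removed = []
    for name in names:
        plist = agents_dir / name
        if not plist.exists():
            continue
        try:
            # check=False: a non-zero exit (e.g. job not loaded) is not fatal.
            run(["launchctl", "unload", str(plist)],
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                check=False)
        except FileNotFoundError:
            pass  # launchctl is absent (not macOS); still delete the file
        plist.unlink()
        removed.append(plist.name)
    return removed
```

The sketch treats a failed `launchctl unload` as non-fatal, so the plist file is deleted either way. That way a stale agent cannot start again at the next login.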

package/config.json CHANGED

@@ -20,6 +20,7 @@
   "log_level": "info",
   "log_file": false,
   "http_port": 7865,
+  "check_updates": true,
   "wake_word": {
     "enabled": false,
     "model": "hey_listen",

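The new top-level `check_updates` flag (default `true`) presumably gates the update check that the README lists under the menu bar app. How that check runs is not part of this diff. Any such check needs to compare dotted versions numerically, not as strings: compared as strings, `"1.0.9"` sorts after `"1.0.17"`. A minimal sketch follows, assuming plain `MAJOR.MINOR.PATCH` versions without pre-release tags. Both function names are hypothetical.

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn "1.0.19" into (1, 0, 19); a leading "v" is tolerated.

    Assumes purely numeric dot-separated parts (no "-beta" suffixes).
    """
    return tuple(int(part) for part in v.lstrip("v").split("."))


def is_newer(candidate: str, installed: str) -> bool:
    """True if `candidate` is a strictly higher release than `installed`."""
    return parse_version(candidate) > parse_version(installed)
```

For example, `is_newer("1.0.19", "1.0.17")` is `True`, while `is_newer("1.0.9", "1.0.17")` is `False`.
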
package/config.py CHANGED

@@ -28,6 +28,7 @@ class TTSConfig:
     default_speed: float = 1.0
     audio_player: str = "mpv"
     duck_media: bool = False
+    audio_output_device: Optional[str] = None  # mpv device name, None = system default
 @dataclass
@@ -37,6 +38,8 @@ class STTConfig:
     silence_threshold: float = 1.5
     max_listen_timeout: float = 15
     vad_threshold: float = 0.3
+    nudge_on_timeout: bool = False
+    audio_input_device: Optional[int] = None  # sounddevice device index, None = system default
 @dataclass
@@ -60,6 +63,7 @@ class AppConfig:
     log_level: str = "info"
     log_file: bool = False
     http_port: int = 7865
+    check_updates: bool = True
 def get_config_path() -> Path:
@@ -103,6 +107,8 @@ def load_config(config_path: Optional[Path] = None) -> AppConfig:
                     config.tts.audio_player = tts["audio_player"]
                 if "duck_media" in tts:
                     config.tts.duck_media = bool(tts["duck_media"])
+                if "audio_output_device" in tts:
+                    config.tts.audio_output_device = tts["audio_output_device"]
             # STT config
             if "stt" in data:
@@ -117,6 +123,11 @@ def load_config(config_path: Optional[Path] = None) -> AppConfig:
                     config.stt.max_listen_timeout = float(stt["max_listen_timeout"])
                 if "vad_threshold" in stt:
                     config.stt.vad_threshold = float(stt["vad_threshold"])
+                if "nudge_on_timeout" in stt:
+                    config.stt.nudge_on_timeout = bool(stt["nudge_on_timeout"])
+                if "audio_input_device" in stt:
+                    val = stt["audio_input_device"]
+                    config.stt.audio_input_device = int(val) if val is not None else None
             # Top-level config
             if "main_agent" in data:
@@ -131,6 +142,8 @@ def load_config(config_path: Optional[Path] = None) -> AppConfig:
                 config.log_file = bool(data["log_file"])
             if "http_port" in data:
                 config.http_port = int(data["http_port"])
+            if "check_updates" in data:
+                config.check_updates = bool(data["check_updates"])
             # Wake word config
             if "wake_word" in data:
@@ -184,6 +197,7 @@ def save_config(config: AppConfig, config_path: Optional[Path] = None) -> None:
             "default_speed": config.tts.default_speed,
             "audio_player": config.tts.audio_player,
             "duck_media": config.tts.duck_media,
+            "audio_output_device": config.tts.audio_output_device,
         },
         "stt": {
             "model_size": config.stt.model_size,
@@ -191,6 +205,8 @@ def save_config(config: AppConfig, config_path: Optional[Path] = None) -> None:
             "silence_threshold": config.stt.silence_threshold,
             "max_listen_timeout": config.stt.max_listen_timeout,
             "vad_threshold": config.stt.vad_threshold,
+            "nudge_on_timeout": config.stt.nudge_on_timeout,
+            "audio_input_device": config.stt.audio_input_device,
         },
         "main_agent": config.main_agent,
         "last_voice_name": config.last_voice_name,
@@ -198,6 +214,7 @@ def save_config(config: AppConfig, config_path: Optional[Path] = None) -> None:
         "log_level": config.log_level,
         "log_file": config.log_file,
         "http_port": config.http_port,
+        "check_updates": config.check_updates,
         "wake_word": {
             "enabled": config.wake_word.enabled,
             "model": config.wake_word.model,

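The `load_config` hunks overlay each key found in `config.json` onto dataclass defaults. For `audio_input_device`, an explicit `null` is preserved as "system default" instead of being passed to `int()`. Below is a condensed, standalone restatement of the STT part of that logic, trimmed to the fields it touches. `apply_stt` is a hypothetical name, and the real function handles many more keys.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class STTConfig:
    # Defaults copied from the STTConfig fields shown in this diff.
    vad_threshold: float = 0.3
    nudge_on_timeout: bool = False
    audio_input_device: Optional[int] = None  # sounddevice index; None = system default


def apply_stt(config: STTConfig, data: Dict[str, Any]) -> STTConfig:
    """Overlay the "stt" section of a parsed config.json onto defaults."""
    stt = data.get("stt", {})
    if "vad_threshold" in stt:
        config.vad_threshold = float(stt["vad_threshold"])
    if "nudge_on_timeout" in stt:
        config.nudge_on_timeout = bool(stt["nudge_on_timeout"])
    if "audio_input_device" in stt:
        val = stt["audio_input_device"]
        # JSON null maps to None (system default) instead of failing int().
        config.audio_input_device = int(val) if val is not None else None
    return config
```

Keys missing from the file keep their defaults. This is how the installer can preserve existing settings while adding new defaults, as the README describes.
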
package/hooks/session-start.sh CHANGED

@@ -94,36 +94,29 @@ except:
         fi
     fi
-    # Fallback: read sessions.json directly if HTTP call didn't work
-    if [ -z "$SESSION_NAME" ]; then
-        SESSION_INFO=$(python3 -c "
-import json, os
+    # Fallback: query the server's /status endpoint for the actual name
+    if [ -z "$SESSION_NAME" ] && [ -n "$PORT" ]; then
+        STATUS=$(curl -s --max-time 2 "http://127.0.0.1:$PORT/status" 2>/dev/null)
+        if [ -n "$STATUS" ]; then
+            SESSION_NAME=$(echo "$STATUS" | python3 -c "
+import sys, json
 try:
-    with open('$SESSIONS_FILE') as f:
-        data = json.load(f)
-    tmux = os.environ.get('VOICESMITH_TMUX', '')
-    for s in data.get('sessions', []):
-        try:
-            os.kill(s['pid'], 0)
-            if tmux and s.get('tmux_session') == tmux:
-                print(f\"{s['name']}|{s['voice']}\")
-                raise SystemExit
-        except (OSError, ProcessLookupError):
-            pass
-    for s in reversed(data.get('sessions', [])):
-        try:
-            os.kill(s['pid'], 0)
-            print(f\"{s['name']}|{s['voice']}\")
-            break
-        except (OSError, ProcessLookupError):
-            pass
+    d = json.load(sys.stdin)
+    # Check session object first (new servers), fall back to top-level name
+    s = d.get('session') or d
+    print(s.get('name', ''))
+except:
+    pass
+" 2>/dev/null)
+            SESSION_VOICE=$(echo "$STATUS" | python3 -c "
+import sys, json
+try:
+    d = json.load(sys.stdin)
+    s = d.get('session') or d
+    print(s.get('voice', ''))
 except:
     pass
 " 2>/dev/null)
-        if [ -n "$SESSION_INFO" ]; then
-            SESSION_NAME=$(echo "$SESSION_INFO" | cut -d'|' -f1)
-            SESSION_VOICE=$(echo "$SESSION_INFO" | cut -d'|' -f2)
         fi
     fi
 fi
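
The new hook fallback queries `/status` and accepts two response shapes: a nested `session` object (newer servers) or top-level `name`/`voice` keys (older servers). Any failure yields an empty string. Below is a sketch of that same rule as one Python function. `session_from_status` is a hypothetical name, and the sample values come from the README's config example.

```python
import json
from typing import Tuple


def session_from_status(raw: str) -> Tuple[str, str]:
    """Return (name, voice) from a /status response body, or ("", "").

    Prefers a nested "session" object and falls back to top-level keys,
    matching the hook's `d.get('session') or d` rule.
    """
    try:
        d = json.loads(raw)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return "", ""
    if not isinstance(d, dict):
        return "", ""
    s = d.get("session") or d
    if not isinstance(s, dict):
        s = d
    return str(s.get("name", "")), str(s.get("voice", ""))
```

The hook starts `python3` twice, once for the name and once for the voice. A single parse that prints both fields would save the second interpreter start-up within the 2-second budget the hook already allows for `curl`.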

package/launcher/Info.plist ADDED

@@ -0,0 +1,25 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
+    "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>CFBundleExecutable</key>
+    <string>VoiceSmithMCP</string>
+    <key>CFBundleIdentifier</key>
+    <string>com.voicesmith-mcp.launcher</string>
+    <key>CFBundleName</key>
+    <string>VoiceSmithMCP</string>
+    <key>CFBundlePackageType</key>
+    <string>APPL</string>
+    <key>CFBundleShortVersionString</key>
+    <string>1.0</string>
+    <key>CFBundleVersion</key>
+    <string>1</string>
+    <!-- Background-only: no Dock icon or app switcher entry -->
+    <key>LSBackgroundOnly</key>
+    <true/>
+    <!-- Required for macOS TCC to show a mic permission dialog for this bundle -->
+    <key>NSMicrophoneUsageDescription</key>
+    <string>VoiceSmith MCP uses the microphone to transcribe voice input for Claude.</string>
+</dict>
+</plist>
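
According to the comment in this plist, `NSMicrophoneUsageDescription` is required for macOS TCC to show a mic permission dialog for the `VoiceSmithMCP` bundle. Below is a sketch of a post-build sanity check using the standard-library `plistlib`. The function names are hypothetical, and because this diff does not show where the launcher bundle is installed, the caller supplies the `.app` path.

```python
import plistlib
from pathlib import Path
from typing import Optional


def mic_usage_description(info_plist: bytes) -> Optional[str]:
    """Return the NSMicrophoneUsageDescription string, or None if absent/empty."""
    data = plistlib.loads(info_plist)
    value = data.get("NSMicrophoneUsageDescription")
    return value if isinstance(value, str) and value.strip() else None


def bundle_declares_mic_usage(app_path: Path) -> bool:
    """True if <app>/Contents/Info.plist has a non-empty mic usage string."""
    plist_path = app_path / "Contents" / "Info.plist"
    return mic_usage_description(plist_path.read_bytes()) is not None
```

A check like this could catch a bundle built from a template that is missing the key. Per the comment above, without the key TCC has no dialog to show.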