voicesmith-mcp 1.0.11 → 1.0.13 (npm)

Files changed (5)

package/bin/utils.js +1 -1
package/package.json +1 -1
package/stt/__pycache__/mic_capture.cpython-314.pyc +0 -0
package/stt/mic_capture.py +16 -0
package/templates/voice-rules.md +1 -1

package/bin/utils.js CHANGED

@@ -345,7 +345,7 @@ You have access to voice tools via the VoiceSmith MCP server.
 ## Speaking
 - **Opening** — Only speak at the start when you have something meaningful to say (e.g., clarifying your approach, flagging an issue). Do NOT speak filler acknowledgments like "Let me look into that." Use \`block: false\` when you do speak an opening.
 - **Closing** — Always speak a summary when done. Use \`block: true\`. Never skip the closing.
-- **Questions requiring user input → use \`speak_then_listen\` as your closing.** If the user literally cannot continue without providing input (e.g., choosing between options, confirming a destructive action, providing missing info), use \`speak_then_listen\`. If you can reasonably continue without their answer, use regular \`speak\`.
+- **Questions → use \`speak_then_listen\`.** If your closing statement ends with a question directed at the user (ends with \`?\`), use \`speak_then_listen\` — not regular \`speak\`. The only exceptions are rhetorical wrap-ups like "Standing by." or "What's next?" where you don't actually need an answer.
 - Keep spoken output brief — prefer 1-2 sentences, never exceed 3. Write details, speak summaries. No code or paths aloud.
 ## Speed Preferences

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "voicesmith-mcp",
-  "version": "1.0.11",
+  "version": "1.0.13",
   "description": "Local AI voice for coding assistants — TTS & STT via MCP. Kokoro ONNX + faster-whisper, fully offline.",
   "bin": {
     "voicesmith-mcp": "bin/cli.js"

package/stt/__pycache__/mic_capture.cpython-314.pyc CHANGED

Binary file

package/stt/mic_capture.py CHANGED

@@ -61,6 +61,11 @@ class MicCapture:
         silence_duration = 0.0
         loop = asyncio.get_event_loop()
+        # Reset VAD state — the LSTM hidden state and context window must
+        # be cleared between recordings to avoid stale state from previous
+        # audio affecting speech detection.
+        vad.reset()
         stream = None
         try:
             stream = sd.InputStream(
@@ -73,6 +78,17 @@ class MicCapture:
             stream.start()
             logger.info("Microphone recording started")
+            # Discard the first ~200ms of audio to avoid picking up residual
+            # speaker output (Tink sound or TTS playback that just finished).
+            # This prevents VAD from detecting speaker bleed as "speech" and
+            # then cutting off when the bleed stops.
+            flush_chunks = int(0.2 * self._sample_rate / 512)  # ~6 chunks
+            for _ in range(flush_chunks):
+                try:
+                    self._audio_queue.get(timeout=0.1)
+                except queue.Empty:
+                    break
             start_time = asyncio.get_event_loop().time()
             while not self._stop_flag:
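
Note on the flush added in this hunk: the `~6 chunks` comment only holds for a 16 kHz sample rate, which the diff does not state. The sketch below reproduces the arithmetic and the drain loop in isolation so the numbers can be checked. `flush_chunk_count` and `flush_queue` are hypothetical helper names, not functions from the package, and the 512-sample chunk size is taken from the divisor in the added line.

```python
import queue

CHUNK_SIZE = 512  # samples per chunk, taken from the divisor in the diff


def flush_chunk_count(sample_rate: int, flush_seconds: float = 0.2) -> int:
    """Number of whole chunks that fit in the flush window (rounded down)."""
    return int(flush_seconds * sample_rate / CHUNK_SIZE)


def flush_queue(audio_queue: queue.Queue, n_chunks: int, timeout: float = 0.1) -> int:
    """Discard up to n_chunks items; stop early if the queue runs dry."""
    discarded = 0
    for _ in range(n_chunks):
        try:
            audio_queue.get(timeout=timeout)
        except queue.Empty:
            break  # nothing left to discard, same early exit as the diff
        discarded += 1
    return discarded


if __name__ == "__main__":
    # Stand-in for the capture queue: ten fake chunks already buffered.
    q = queue.Queue()
    for i in range(10):
        q.put(i)

    n = flush_chunk_count(16000)  # assumed 16 kHz sample rate
    dropped = flush_queue(q, n)
    print(n, dropped, q.qsize())
```

Because the count is rounded down, six chunks at 16 kHz cover 192 ms rather than a full 200 ms. At other sample rates the count changes (18 chunks at 48 kHz, for instance), so the inline `~6 chunks` comment should be read as rate-specific.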

package/templates/voice-rules.md CHANGED Viewed

@@ -17,7 +17,7 @@ You have access to voice tools via the VoiceSmith MCP server.
 ## Speaking
 - **Opening** — Only speak at the start when you have something meaningful to say (e.g., clarifying your approach, flagging an issue). Do NOT speak filler acknowledgments like "Let me look into that." Use `block: false` when you do speak an opening.
 - **Closing** — Always speak a summary when done. Use `block: true`. Never skip the closing.
-- **Questions requiring user input → use `speak_then_listen` as your closing.** If the user literally cannot continue without providing input (e.g., choosing between options, confirming a destructive action, providing missing info), use `speak_then_listen`. If you can reasonably continue without their answer, use regular `speak`.
+- **Questions → use `speak_then_listen`.** If your closing statement ends with a question directed at the user (ends with `?`), use `speak_then_listen` — not regular `speak`. The only exceptions are rhetorical wrap-ups like "Standing by." or "What's next?" where you don't actually need an answer.
 - Keep spoken output brief — prefer 1-2 sentences, never exceed 3. Write details, speak summaries. No code or paths aloud.
 ## Speed Preferences
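
Note on the reworded rule: the new text replaces a judgment call ("the user literally cannot continue") with a mostly mechanical check on the closing statement. A minimal sketch of that check follows, assuming a small hard-coded set of rhetorical wrap-ups. `choose_voice_tool` and `RHETORICAL_WRAP_UPS` are hypothetical names for illustration only; in the package the rule is prompt text interpreted by the assistant, not code.

```python
import re

# Hypothetical stand-in for "rhetorical wrap-ups"; the template leaves this
# to the assistant's judgment and gives "What's next?" as one example.
RHETORICAL_WRAP_UPS = {"what's next?"}


def choose_voice_tool(closing: str) -> str:
    """Return the tool name the reworded rule would select for a closing."""
    text = closing.strip().replace("\u2019", "'")  # normalize curly apostrophes
    if not text.endswith("?"):
        return "speak"
    # Look only at the final sentence of the closing statement.
    last_sentence = re.split(r"(?<=[.!?])\s+", text)[-1].lower()
    if last_sentence in RHETORICAL_WRAP_UPS:
        return "speak"
    return "speak_then_listen"


if __name__ == "__main__":
    for closing in (
        "Should I delete the old branch?",
        "All tests pass. What's next?",
        "Standing by.",
    ):
        print(f"{closing!r} -> {choose_voice_tool(closing)}")
```

The fixed set is the weak point of the sketch: the rule exempts any question that does not actually need an answer, which a lookup table cannot fully capture.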