npm - verbalcoding - Versions diffs - 0.2.7 → 0.2.9 - Mend

verbalcoding 0.2.7 → 0.2.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (46)

package/README.md +12 -27
package/app-node/cli_install.test.mjs +32 -0
package/app-node/install_config.mjs +10 -0
package/docs/FRESH_INSTALL.md +8 -2
package/docs/assets/figures/verbalcoding-flow.svg +45 -30
package/docs/i18n/CONFIGURATION.es.md +138 -49
package/docs/i18n/CONFIGURATION.fr.md +138 -49
package/docs/i18n/CONFIGURATION.ja.md +137 -48
package/docs/i18n/CONFIGURATION.ko.md +137 -48
package/docs/i18n/CONFIGURATION.ru.md +138 -49
package/docs/i18n/CONFIGURATION.zh.md +137 -48
package/docs/i18n/FRESH_INSTALL.es.md +115 -32
package/docs/i18n/FRESH_INSTALL.fr.md +115 -32
package/docs/i18n/FRESH_INSTALL.ja.md +119 -36
package/docs/i18n/FRESH_INSTALL.ko.md +120 -37
package/docs/i18n/FRESH_INSTALL.ru.md +115 -32
package/docs/i18n/FRESH_INSTALL.zh.md +119 -36
package/docs/i18n/MULTI_INSTANCE.es.md +85 -26
package/docs/i18n/MULTI_INSTANCE.fr.md +85 -26
package/docs/i18n/MULTI_INSTANCE.ja.md +87 -29
package/docs/i18n/MULTI_INSTANCE.ko.md +87 -29
package/docs/i18n/MULTI_INSTANCE.ru.md +84 -26
package/docs/i18n/MULTI_INSTANCE.zh.md +87 -29
package/docs/i18n/README.es.md +109 -45
package/docs/i18n/README.fr.md +109 -45
package/docs/i18n/README.ja.md +109 -45
package/docs/i18n/README.ko.md +108 -45
package/docs/i18n/README.ru.md +109 -45
package/docs/i18n/README.zh.md +108 -45
package/docs/i18n/RELEASE.es.md +53 -37
package/docs/i18n/RELEASE.fr.md +53 -37
package/docs/i18n/RELEASE.ja.md +52 -36
package/docs/i18n/RELEASE.ko.md +52 -36
package/docs/i18n/RELEASE.ru.md +53 -37
package/docs/i18n/RELEASE.zh.md +53 -37
package/docs/i18n/USAGE.es.md +91 -64
package/docs/i18n/USAGE.fr.md +91 -64
package/docs/i18n/USAGE.ja.md +90 -63
package/docs/i18n/USAGE.ko.md +90 -63
package/docs/i18n/USAGE.ru.md +91 -64
package/docs/i18n/USAGE.zh.md +90 -63
package/package.json +1 -1
package/scripts/bootstrap_prereqs.sh +15 -3
package/scripts/cli.mjs +1 -1
package/scripts/doctor.mjs +173 -8
package/scripts/install.mjs +2 -0

package/README.md CHANGED

@@ -34,7 +34,7 @@ VerbalCoding turns a Discord voice channel into a hands-free control surface for
 | What you get | Why it feels good |
 |---|---|
 | Voice-first agent control | Talk to Hermes Agent, Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw, or any custom CLI harness. |
-| Local-first speech loop | Discord voice capture → `whisper.cpp` STT → agent → chunked TTS playback. |
+| On-device speech loop | Discord voice capture → local `whisper-cli` transcription → agent → chunked TTS playback. |
 | Shared voice + text context | Voice turns and `!ask` text commands can reuse the same supported agent session. |
 | Barge-in and sensitivity modes | Interrupt playback naturally and switch between normal and conservative/noisy environments. |
 | Multilingual voice presets | Switch STT, progress language, and TTS voice together with `vc language ko/en/auto`. |
@@ -69,23 +69,10 @@ vc doctor
 ./run.sh
 ```
-`vc setup --yes` and `./scripts/install.sh --yes` bootstrap local prerequisites where possible: Node/npm dependencies, `ffmpeg`, `whisper-cli`, the default whisper.cpp model, a local `.venv-tts` Edge TTS helper, and the short `vc` shell command for clone installs. They support macOS/Homebrew plus common Linux package managers (`apt`, `dnf`, `pacman`); rerun with `--no-wizard` for dependency-only setup or `--skip-system` if you want to install OS packages yourself.
+`vc setup --yes` bootstraps local prerequisites from the npm package. `./scripts/install.sh --yes` does the same for GitHub clone installs. Both cover Node/npm dependencies, `ffmpeg`, `whisper-cli`, the default whisper.cpp model, a local `.venv-tts` Edge TTS helper, and setup wizard configuration where possible. They support macOS/Homebrew plus common Linux package managers (`apt`, `dnf`, `pacman`); rerun with `--no-wizard` for dependency-only setup or `--skip-system` if you want to install OS packages yourself.
 Need a clean install walkthrough? Start with [Fresh Install](docs/FRESH_INSTALL.md).
-## How It Works
-```mermaid
-flowchart LR
-  A[Discord voice] --> B["@discordjs/voice"]
-  B --> C[PCM cleanup + gates]
-  C --> D["whisper.cpp STT"]
-  D --> E["CLI agent adapter"]
-  E --> F["Concise answer"]
-  F --> G["Chunked TTS"]
-  G --> H["Discord playback"]
-```
 ## Supported Agent Backends
 | Backend | Default command | Session support |
@@ -107,12 +94,6 @@ flowchart LR
 | [Configuration](docs/CONFIGURATION.md) | `.env`, agent backends, MCP, TTS backends, operational notes |
 | [Multi-Instance](docs/MULTI_INSTANCE.md) | One permanent Discord voice room per project |
 | [Release Notes](docs/RELEASE.md) | Current capabilities and pre-release checklist |
-| [한국어 문서](docs/i18n/README.ko.md) | npm 설치, 사용법, 설정, 멀티 인스턴스 한국어 가이드 |
-| [日本語 docs](docs/i18n/README.ja.md) | npm install, usage, configuration, multi-instance guide in Japanese |
-| [中文文档](docs/i18n/README.zh.md) | npm 安装、使用、配置和多实例中文指南 |
-| [Español docs](docs/i18n/README.es.md) | Instalación npm, uso, configuración y multiinstancia en español |
-| [Français docs](docs/i18n/README.fr.md) | Installation npm, utilisation, configuration et multi-instance en français |
-| [Русская документация](docs/i18n/README.ru.md) | npm установка, использование, конфигурация и мульти-инстансы на русском |
 ## Tiny Command Map
@@ -128,11 +109,15 @@ vc start                  # start the default bridge
 In Discord:
-```text
-!join        !ask <prompt>       !verbose on/off
-!latency     !sensitivity normal !sensitivity conservative
-!session new <name> <workdir> [context] --voice <voice-channel>
-```
+| Command | What it does |
+|---|---|
+| `!join` | Join your current voice channel. |
+| `!ask <prompt>` | Send text to the same agent backend. |
+| `!verbose on\|off` | Show/speak short progress updates. |
+| `!latency` | Summarize recent voice/STT/agent/TTS latency. |
+| `!sensitivity normal` | Use normal indoor barge-in sensitivity. |
+| `!sensitivity conservative` | Use stricter noisy/outdoor sensitivity. |
+| `!session new <name> <workdir> [context] --voice <voice-channel>` | Bind a project session to a voice room. |
 ## Requirements
@@ -140,7 +125,7 @@ In Discord:
 |---|---|
 | Runtime | Node.js 20+, npm; install script can install via Homebrew/apt/dnf/pacman |
 | Audio | `ffmpeg`; install script can install it |
-| STT | `whisper.cpp` / `whisper-cli`; install script uses Homebrew on macOS or local Linux build fallback |
+| Speech recognition | Local `whisper-cli` from whisper.cpp; install script uses Homebrew on macOS or local Linux build fallback |
 | TTS | Edge TTS CLI; install script creates `.venv-tts` if needed |
 | Discord | Bot token, Message Content intent, voice permissions |
 | Agent | At least one authenticated CLI harness, Hermes Agent by default |
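The README summarizes the speech loop as Discord capture, local `whisper-cli` transcription, agent, then chunked TTS playback. The configuration guide later in this diff caps chunk size with `TTS_MAX_CHARS="495"`. This diff does not show the actual splitting rules, so the sketch below assumes one plausible approach. It packs whole sentences up to the limit and falls back to word splits, then character splits, for oversized sentences. `chunkForTts` and `splitLong` are hypothetical names and are not part of VerbalCoding's API.

```javascript
// Hypothetical sketch: split an agent answer into TTS-sized chunks.
// Assumption: a sentence ends with . ! ? (or full-width 。！？) followed by whitespace.
function chunkForTts(text, maxChars = 495) {
  const sentences = text.trim().split(/(?<=[.!?。！？])\s+/).filter(Boolean);
  const chunks = [];
  let current = '';

  const flush = () => {
    if (current) chunks.push(current);
    current = '';
  };

  for (const sentence of sentences) {
    // Oversized sentences are broken into pieces that each fit the limit.
    const pieces = sentence.length <= maxChars ? [sentence] : splitLong(sentence, maxChars);
    for (const piece of pieces) {
      const candidate = current ? `${current} ${piece}` : piece;
      if (candidate.length <= maxChars) {
        current = candidate; // still fits: keep packing
      } else {
        flush(); // emit the full chunk and start a new one
        current = piece;
      }
    }
  }
  flush();
  return chunks;
}

// Split on whitespace, hard-splitting any single word longer than maxChars.
function splitLong(sentence, maxChars) {
  const out = [];
  for (const word of sentence.split(/\s+/)) {
    for (let i = 0; i < word.length; i += maxChars) {
      out.push(word.slice(i, i + maxChars));
    }
  }
  return out;
}
```

Sentence-first packing is a design assumption. It keeps each synthesized chunk a complete phrase. The character-level fallback only applies to text with no usable break points, such as long runs without spaces.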

package/app-node/cli_install.test.mjs CHANGED

@@ -63,6 +63,8 @@ test('bootstrap script installs cross-platform prerequisites and local model hel
   assert.match(script, /brew install/);
   assert.match(script, /apt-get install/);
+  assert.match(script, /has_cmd node \|\| packages\+\=\(nodejs\)/);
+  assert.match(script, /has_cmd npm \|\| packages\+\=\(npm\)/);
   assert.match(script, /dnf install/);
   assert.match(script, /pacman -Sy/);
   assert.match(script, /git clone --depth 1 https:\/\/github\.com\/ggml-org\/whisper\.cpp\.git/);
@@ -70,6 +72,36 @@ test('bootstrap script installs cross-platform prerequisites and local model hel
   assert.match(script, /\.venv-tts/);
 });
+test('doctor auto-bootstraps fixable prerequisites by default', () => {
+  const doctor = fs.readFileSync(path.join(ROOT, 'scripts', 'doctor.mjs'), 'utf8');
+  const cli = fs.readFileSync(path.join(ROOT, 'scripts', 'cli.mjs'), 'utf8');
+  assert.match(doctor, /fixablePrerequisites/);
+  assert.match(doctor, /bootstrap_prereqs\.sh'\), '--yes'/);
+  assert.match(doctor, /VERBALCODING_DOCTOR_AUTO_FIX/);
+  assert.match(doctor, /--no-fix/);
+  assert.match(doctor, /WHISPER_CPP_BIN/);
+  assert.match(doctor, /EDGE_TTS_COMMAND/);
+  assert.match(doctor, /installHermesCliIfNeeded/);
+  assert.match(doctor, /NousResearch\/hermes-agent\/main\/scripts\/install\.sh/);
+  assert.match(doctor, /VERBALCODING_DOCTOR_INSTALL_HERMES/);
+  assert.match(doctor, /Discord bot setup:/);
+  assert.match(doctor, /discord\.com\/developers\/applications/);
+  assert.match(cli, /doctor\.mjs'\), \.\.\.argv\.slice\(1\)/);
+});
+test('setup summary guides Discord app creation and records client id', () => {
+  const installer = fs.readFileSync(path.join(ROOT, 'scripts', 'install.mjs'), 'utf8');
+  const config = fs.readFileSync(path.join(ROOT, 'app-node', 'install_config.mjs'), 'utf8');
+  assert.match(installer, /Discord application\/client ID for invite URL/);
+  assert.match(config, /DISCORD_CLIENT_ID/);
+  assert.match(config, /Discord app setup:/);
+  assert.match(config, /https:\/\/discord\.com\/developers\/applications/);
+  assert.match(config, /vc bot invite <client-id>/);
+  assert.match(config, /buildDiscordBotInviteUrl\(\{ clientId: values\.DISCORD_CLIENT_ID \}\)/);
+});
 test('Ubuntu Docker smoke script validates clean install without secrets', () => {
   const script = fs.readFileSync(path.join(ROOT, 'scripts', 'docker_ubuntu_smoke.sh'), 'utf8');
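The two new assertions in the bootstrap test check that the script queues the `nodejs` and `npm` packages only when `has_cmd node` or `has_cmd npm` fails. The sketch below shows the same "install only what is missing" pattern in JavaScript. `planAptPackages`, `aptInstallCommand`, and the injected `hasCmd` predicate are hypothetical names. The `ffmpeg` entry and the `sudo apt-get install -y` form are assumptions for illustration, not the script's confirmed package list.

```javascript
// Hypothetical sketch of `has_cmd node || packages+=(nodejs)`:
// queue a package only when its command is not already available.
// The command -> apt package map is an assumption for illustration.
const APT_PACKAGES = [
  ['node', 'nodejs'],
  ['npm', 'npm'],
  ['ffmpeg', 'ffmpeg'],
];

function planAptPackages(hasCmd) {
  const packages = [];
  for (const [cmd, pkg] of APT_PACKAGES) {
    if (!hasCmd(cmd)) packages.push(pkg); // mirrors `has_cmd X || packages+=(Y)`
  }
  return packages;
}

// Build the install argv only when something is missing; otherwise skip apt entirely.
function aptInstallCommand(packages) {
  return packages.length ? ['sudo', 'apt-get', 'install', '-y', ...packages] : null;
}
```

Injecting `hasCmd` keeps the planning logic testable without touching `PATH`. For example, `planAptPackages((c) => c === 'node')` plans `npm` and `ffmpeg` but leaves an existing Node install alone.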

package/app-node/install_config.mjs CHANGED

@@ -26,6 +26,7 @@ export function normalizeInstallAnswers(input = {}) {
   const out = {
     AGENT_BACKEND: normalizedHarness,
     DISCORD_BOT_TOKEN: clean(input.discordBotToken || input.DISCORD_BOT_TOKEN),
+    DISCORD_CLIENT_ID: clean(input.discordClientId || input.DISCORD_CLIENT_ID || input.applicationId || input.APPLICATION_ID),
     DISCORD_ALLOWED_USERS: clean(input.allowedUsers || input.DISCORD_ALLOWED_USERS),
     AUTO_JOIN_VOICE_CHANNELS: clean(input.autoJoinVoiceChannels || input.AUTO_JOIN_VOICE_CHANNELS, '일반,General,general'),
     TRANSCRIPT_CHANNEL_ID: clean(input.transcriptChannelId || input.TRANSCRIPT_CHANNEL_ID),
@@ -101,6 +102,7 @@ export function slugifyInstanceName(name) {
 export function buildEnvFile(values = {}) {
   const ordered = [
     'DISCORD_BOT_TOKEN',
+    'DISCORD_CLIENT_ID',
     'DISCORD_ALLOWED_USERS',
     'AUTO_JOIN_VOICE_CHANNELS',
     'TRANSCRIPT_CHANNEL_ID',
@@ -243,9 +245,17 @@ export function parseKeyValueEnv(text) {
 export function renderInstallSummary(values = {}) {
   const backend = values.AGENT_BACKEND || 'hermes';
+  const inviteUrl = values.DISCORD_CLIENT_ID ? buildDiscordBotInviteUrl({ clientId: values.DISCORD_CLIENT_ID }) : '';
   return [
     `Configured Discord voice bridge for harness: ${backend}`,
     '',
+    'Discord app setup:',
+    '  1. Create an app: https://discord.com/developers/applications',
+    '  2. Bot tab: Add Bot, enable Message Content Intent, copy/reset the token.',
+    '  3. Put the token in .env as DISCORD_BOT_TOKEN="...".',
+    inviteUrl ? `  4. Invite URL: ${inviteUrl}` : '  4. Invite URL: vc bot invite <client-id>',
+    '  5. Make sure the bot can read/send text and connect/speak in voice.',
+    '',
     'Next commands:',
     '  vc doctor',
     '  vc start',
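When `DISCORD_CLIENT_ID` is set, `renderInstallSummary` now prints an invite URL from `buildDiscordBotInviteUrl({ clientId })`. `normalizeInstallAnswers` accepts that ID as `discordClientId`, `DISCORD_CLIENT_ID`, `applicationId`, or `APPLICATION_ID`. The builder's body is not part of this diff. The sketch below assumes Discord's OAuth2 authorize endpoint with the `bot` and `applications.commands` scopes, matching the "bot and slash-command scopes plus text/voice permissions" wording in FRESH_INSTALL.md. `sketchInviteUrl` is a hypothetical name. The chosen permission set is an assumption, but the bit values are Discord's documented flags.

```javascript
// Hypothetical sketch; not the package's buildDiscordBotInviteUrl.
// Discord permission bit flags (values from Discord's permissions reference).
const PERMISSIONS = {
  VIEW_CHANNEL: 1 << 10,
  SEND_MESSAGES: 1 << 11,
  READ_MESSAGE_HISTORY: 1 << 16,
  CONNECT: 1 << 20,
  SPEAK: 1 << 21,
};

function sketchInviteUrl({ clientId, permissions = Object.values(PERMISSIONS) }) {
  // Discord application IDs are numeric snowflakes; reject anything else early.
  if (!/^\d+$/.test(String(clientId || ''))) {
    throw new Error('clientId must be a numeric Discord application ID');
  }
  const bits = permissions.reduce((acc, bit) => acc | bit, 0);
  const url = new URL('https://discord.com/oauth2/authorize');
  url.searchParams.set('client_id', String(clientId));
  url.searchParams.set('scope', 'bot applications.commands');
  url.searchParams.set('permissions', String(bits));
  return url.toString();
}
```

With the default flags, the `permissions` parameter is `3214336`. That value covers reading and sending text plus connecting and speaking in voice, which matches step 5 of the new setup summary.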

package/docs/FRESH_INSTALL.md CHANGED

@@ -32,7 +32,13 @@ cd VerbalCoding
 ## 2. Bootstrap dependencies and run the setup wizard
-The npm commands above run the same bootstrapper as the clone install. For a clone, run:
+For an npm install, do not run `./scripts/install.sh` directly; there is no repository checkout in your current directory. Use the packaged CLI wrapper instead:
+```bash
+vc setup --yes
+```
+`vc setup` runs the `scripts/install.sh` bundled inside the installed npm package. Only use `./scripts/install.sh --yes` when you are inside a GitHub clone:
 ```bash
 ./scripts/install.sh --yes
@@ -104,7 +110,7 @@ The invite includes bot and slash-command scopes plus text/voice permissions use
 vc doctor
 ```
-`vc doctor` is redacted: it reports missing tokens/commands/models without printing secret values. Fix every `✗` item, then rerun it.
+`vc doctor` is redacted: it reports missing tokens/commands/models without printing secret values. When fixable local prerequisites are missing (`ffmpeg`, `whisper-cli`, the default model, or Edge TTS helper), it automatically reruns the packaged bootstrap first. Fix any remaining `✗` items, then rerun it.
 Expected success includes:
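The updated FRESH_INSTALL text says `vc doctor` now reruns the packaged bootstrap when fixable prerequisites are missing. The new test asserts that `doctor.mjs` references `fixablePrerequisites`, `VERBALCODING_DOCTOR_AUTO_FIX`, `--no-fix`, and `bootstrap_prereqs.sh` with `--yes`. It also asserts that `cli.mjs` forwards `argv.slice(1)` so flags reach the doctor. The decision logic itself is not shown. The sketch below assumes that `--no-fix` or a falsy `VERBALCODING_DOCTOR_AUTO_FIX` disables the rerun, and that only the four fixable items named in the docs trigger it. `planDoctorFix`, `autoFixEnabled`, and the check names are hypothetical.

```javascript
// Hypothetical sketch of doctor's auto-fix decision; names and env semantics are assumptions.
// Fixable items per the docs: ffmpeg, whisper-cli, the default model, the Edge TTS helper.
const FIXABLE = new Set(['ffmpeg', 'whisper-cli', 'whisper-model', 'edge-tts']);

function autoFixEnabled(argv, env) {
  if (argv.includes('--no-fix')) return false;
  const flag = String(env.VERBALCODING_DOCTOR_AUTO_FIX ?? '1').toLowerCase();
  return !['0', 'false', 'no', 'off'].includes(flag);
}

// checks: [{ name, ok }]. Returns the bootstrap plan, or null when nothing should run.
function planDoctorFix(checks, argv = [], env = {}) {
  const missing = checks.filter((c) => !c.ok && FIXABLE.has(c.name)).map((c) => c.name);
  if (missing.length === 0 || !autoFixEnabled(argv, env)) return null;
  // Script path relative to the package root (assumption).
  return { missing, command: ['bash', 'scripts/bootstrap_prereqs.sh', '--yes'] };
}
```

A missing `DISCORD_BOT_TOKEN` is never in the fixable set, so it stays a `✗` item for the user to resolve. That matches the "Fix any remaining `✗` items" wording.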

package/docs/assets/figures/verbalcoding-flow.svg CHANGED

@@ -1,6 +1,6 @@
 <svg width="1200" height="520" viewBox="0 0 1200 520" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="title desc">
-  <title id="title">VerbalCoding voice-to-agent flow</title>
-  <desc id="desc">A stylized pipeline from Discord voice to speech recognition, CLI agent, text answer, and TTS playback.</desc>
+  <title id="title">VerbalCoding natural voice loop</title>
+  <desc id="desc">A compact phone-call-like loop: user speaks in Discord, Local STT with whisper-cli transcribes, the CLI agent works, TTS speaks back, and the user can interrupt anytime.</desc>
   <defs>
     <linearGradient id="bg" x1="0" y1="0" x2="1200" y2="520" gradientUnits="userSpaceOnUse">
       <stop stop-color="#0F172A"/>
@@ -19,45 +19,60 @@
   <circle cx="1030" cy="90" r="190" fill="#6366F1" opacity="0.16"/>
   <circle cx="170" cy="430" r="210" fill="#06B6D4" opacity="0.13"/>
   <rect x="70" y="54" width="1060" height="412" rx="32" fill="url(#card)" stroke="#334155" filter="url(#shadow)"/>
   <text x="110" y="118" fill="#F8FAFC" font-family="Inter, ui-sans-serif, system-ui" font-size="42" font-weight="800">VerbalCoding</text>
-  <text x="110" y="154" fill="#94A3B8" font-family="Inter, ui-sans-serif, system-ui" font-size="20">Discord voice → local STT → CLI coding agent → spoken answer</text>
+  <text x="110" y="154" fill="#94A3B8" font-family="Inter, ui-sans-serif, system-ui" font-size="20">A natural Discord voice loop for coding agents — speak, listen, interrupt, continue</text>
   <g font-family="Inter, ui-sans-serif, system-ui" font-size="17" font-weight="700">
-    <rect x="110" y="220" width="150" height="92" rx="20" fill="#5865F2"/>
-    <text x="185" y="258" fill="white" text-anchor="middle">Discord</text>
-    <text x="185" y="284" fill="#E0E7FF" text-anchor="middle" font-size="14">voice channel</text>
+    <rect x="105" y="220" width="160" height="92" rx="20" fill="#5865F2"/>
+    <text x="185" y="254" fill="white" text-anchor="middle">Discord</text>
+    <text x="185" y="280" fill="#E0E7FF" text-anchor="middle" font-size="14">phone-call voice</text>
-    <rect x="305" y="220" width="150" height="92" rx="20" fill="#0891B2"/>
-    <text x="380" y="258" fill="white" text-anchor="middle">whisper.cpp</text>
-    <text x="380" y="284" fill="#CFFAFE" text-anchor="middle" font-size="14">local STT</text>
+    <rect x="305" y="220" width="165" height="92" rx="20" fill="#0891B2"/>
+    <text x="387.5" y="254" fill="white" text-anchor="middle">Local STT</text>
+    <text x="387.5" y="280" fill="#CFFAFE" text-anchor="middle" font-size="14">whisper-cli</text>
-    <rect x="500" y="220" width="150" height="92" rx="20" fill="#7C3AED"/>
-    <text x="575" y="258" fill="white" text-anchor="middle">Adapter</text>
-    <text x="575" y="284" fill="#EDE9FE" text-anchor="middle" font-size="14">Hermes / Claude / Codex</text>
+    <rect x="510" y="220" width="165" height="92" rx="20" fill="#7C3AED"/>
+    <text x="592.5" y="254" fill="white" text-anchor="middle">Adapter</text>
+    <text x="592.5" y="280" fill="#EDE9FE" text-anchor="middle" font-size="14">Hermes / Claude / Codex</text>
-    <rect x="695" y="220" width="150" height="92" rx="20" fill="#111827" stroke="#475569"/>
-    <text x="770" y="258" fill="white" text-anchor="middle">CLI Agent</text>
-    <text x="770" y="284" fill="#CBD5E1" text-anchor="middle" font-size="14">does the work</text>
+    <rect x="715" y="220" width="165" height="92" rx="20" fill="#111827" stroke="#475569"/>
+    <text x="797.5" y="254" fill="white" text-anchor="middle">CLI Agent</text>
+    <text x="797.5" y="280" fill="#CBD5E1" text-anchor="middle" font-size="14">does the work</text>
-    <rect x="890" y="220" width="150" height="92" rx="20" fill="#0EA5E9"/>
-    <text x="965" y="258" fill="white" text-anchor="middle">TTS</text>
-    <text x="965" y="284" fill="#E0F2FE" text-anchor="middle" font-size="14">chunked playback</text>
+    <rect x="920" y="220" width="165" height="92" rx="20" fill="#0EA5E9"/>
+    <text x="1002.5" y="254" fill="white" text-anchor="middle">TTS</text>
+    <text x="1002.5" y="280" fill="#E0F2FE" text-anchor="middle" font-size="14">spoken reply</text>
   </g>
   <g stroke="#94A3B8" stroke-width="4" stroke-linecap="round">
-    <path d="M266 266H296"/>
-    <path d="M461 266H491"/>
-    <path d="M656 266H686"/>
-    <path d="M851 266H881"/>
+    <path d="M275 266H295"/>
+    <path d="M480 266H500"/>
+    <path d="M685 266H705"/>
+    <path d="M890 266H910"/>
+  </g>
+  <g fill="#94A3B8" opacity="0.95">
+    <circle cx="285" cy="266" r="4"/>
+    <circle cx="490" cy="266" r="4"/>
+    <circle cx="695" cy="266" r="4"/>
+    <circle cx="900" cy="266" r="4"/>
   </g>
-  <g fill="#94A3B8">
-    <path d="M296 266l-10-7v14l10-7z"/>
-    <path d="M491 266l-10-7v14l10-7z"/>
-    <path d="M686 266l-10-7v14l10-7z"/>
-    <path d="M881 266l-10-7v14l10-7z"/>
+  <path d="M1002 330C1002 405 185 405 185 330" stroke="#67E8F9" stroke-width="4" stroke-linecap="round" stroke-dasharray="13 13"/>
+  <g fill="#67E8F9">
+    <circle cx="1002" cy="330" r="5"/>
+    <circle cx="185" cy="330" r="5"/>
+  </g>
+  <text x="594" y="438" fill="#A5F3FC" text-anchor="middle" font-family="Inter, ui-sans-serif, system-ui" font-size="17" font-weight="700">Conversation loop: hear the answer, speak again, or interrupt anytime</text>
+  <path d="M185 210C185 178 1002 178 1002 210" stroke="#FBBF24" stroke-width="3" stroke-linecap="round" stroke-dasharray="8 10" opacity="0.9"/>
+  <g fill="#FBBF24" opacity="0.95">
+    <circle cx="185" cy="210" r="4"/>
+    <circle cx="1002" cy="210" r="4"/>
   </g>
+  <text x="594" y="194" fill="#FDE68A" text-anchor="middle" font-family="Inter, ui-sans-serif, system-ui" font-size="15" font-weight="700">Barge-in stays open while the agent is thinking or speaking</text>
-  <rect x="150" y="360" width="900" height="54" rx="17" fill="#020617" stroke="#1F2937"/>
-  <text x="182" y="394" fill="#A7F3D0" font-family="SFMono-Regular, ui-monospace, monospace" font-size="18">$ vc language ko &amp;&amp; vc instance start my-project</text>
-  <text x="1045" y="394" fill="#64748B" text-anchor="end" font-family="Inter, ui-sans-serif, system-ui" font-size="15">hands-free coding loop</text>
+  <rect x="150" y="348" width="900" height="54" rx="17" fill="#020617" stroke="#1F2937"/>
+  <text x="182" y="382" fill="#A7F3D0" font-family="SFMono-Regular, ui-monospace, monospace" font-size="18">$ vc language ko &amp;&amp; vc instance start my-project</text>
+  <text x="1045" y="382" fill="#64748B" text-anchor="end" font-family="Inter, ui-sans-serif, system-ui" font-size="15">hands-free coding call</text>
 </svg>
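The redrawn figure widens the four right-hand boxes from 150 to 165 px, with the first box at 160 px. A 40 px gap separates adjacent boxes. Each connector is inset 10 px from both box edges and carries a dot at its midpoint. The short check below recomputes the coordinates in the `+` lines from those three rules. It verifies the geometry only and is not code from the package.

```javascript
// Recompute box, label-center, and connector coordinates for verbalcoding-flow.svg.
function layoutBoxes(startX, widths, gap = 40, inset = 10) {
  const boxes = [];
  let x = startX;
  for (const w of widths) {
    boxes.push({ x, w, center: x + w / 2 }); // text-anchor="middle" x position
    x += w + gap;
  }
  // One connector per pair of neighbouring boxes, inset from both edges.
  const connectors = boxes.slice(1).map((next, i) => {
    const from = boxes[i].x + boxes[i].w + inset;
    const to = next.x - inset;
    return { from, to, dot: (from + to) / 2 };
  });
  return { boxes, connectors };
}

const { boxes, connectors } = layoutBoxes(105, [160, 165, 165, 165, 165]);
console.log(boxes.map((b) => b.x).join(', '));      // 105, 305, 510, 715, 920
console.log(boxes.map((b) => b.center).join(', ')); // 185, 387.5, 592.5, 797.5, 1002.5
console.log(connectors.map((c) => `${c.from}-${c.to}@${c.dot}`).join(', '));
// 275-295@285, 480-500@490, 685-705@695, 890-910@900
```

The dashed loop and barge-in arcs anchor at x=185 and x=1002. Those are the first box's center and the last box's center, with the latter rounded down from 1002.5.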

package/docs/i18n/CONFIGURATION.es.md CHANGED

@@ -1,36 +1,40 @@
-# VerbalCoding Configuración
+# Configuración de VerbalCoding
-## Setup Wizard
+## Asistente de configuración
-Use upstream Discord-side guides first, then return to VerbalCoding:
+La configuración de la aplicación/bot de Discord no se vuelve a explicar desde cero aquí de forma intencionada. Usa estas guías originales para los pasos del lado de Discord y luego vuelve a la configuración de VerbalCoding:
-- Hermes Agent Discord messaging guide: <https://hermes-agent.nousresearch.com/docs/user-guide/messaging/discord>
-- Discord official bot overview: <https://docs.discord.com/developers/bots/overview>
-- Discord official quick start: <https://docs.discord.com/developers/quick-start/getting-started>
+- Guía de mensajería Discord de Hermes Agent: <https://hermes-agent.nousresearch.com/docs/user-guide/messaging/discord>
+- Resumen oficial de bots de Discord: <https://docs.discord.com/developers/bots/overview>
+- Inicio rápido oficial de Discord: <https://docs.discord.com/developers/quick-start/getting-started>
 ```bash
-vc setup --yes
-# or from a clone
 ./scripts/install.sh
 ```
-The installer asks for the Discord token, allowed users, auto-join voice channel names, transcript channel/thread, CLI harness backend, default voice language, TTS settings, and wake-word behavior. It writes `.env` with mode `0600`.
+El instalador solicita token de Discord, usuarios permitidos, nombres de canales de voz para auto-unión, canal/hilo de transcripción, backend de arnés CLI, idioma de voz predeterminado, ajustes de TTS y comportamiento de palabra de activación. Escribe `.env` con modo `0600`; `.env` está ignorado por git. También enlaza el comando corto de shell `vc`.
-## Supported Agent Backends
+Si solo necesitas el comando de shell después de una instalación manual:
-Set `AGENT_BACKEND` in `.env`.
+```bash
+npm link
+```
+## Backends de agentes compatibles
-| Backend | Default command | Notes |
+Define `AGENT_BACKEND` en `.env`.
+| Backend | Comando predeterminado | Notas |
 |---|---|---|
-| `hermes` | `hermes chat -Q -q` | Default; supports resume and verbose progress |
-| `claude-code` / `claude` | `claude -p` | Override with `CLAUDE_COMMAND` or `AGENT_COMMAND` |
-| `codex` | `codex exec` | Override with `CODEX_COMMAND` or `AGENT_COMMAND` |
-| `gemini` | `gemini -p` | Override with `GEMINI_COMMAND` or `AGENT_COMMAND` |
-| `opencode` | `opencode run` | Override with `OPENCODE_COMMAND` or `AGENT_COMMAND` |
-| `openclaw` | `openclaw run` | Override with `OPENCLAW_COMMAND` or `AGENT_COMMAND` |
-| `custom` | `AGENT_COMMAND` required | Prompt is appended as final argv |
+| `hermes` | `hermes chat -Q -q` | Predeterminado. Conserva el comportamiento de reanudación de `.verbalcoding-session`. |
+| `claude-code` / `claude` | `claude -p` | Sobrescribe con `CLAUDE_COMMAND` o `AGENT_COMMAND`. |
+| `codex` | `codex exec` | Sobrescribe con `CODEX_COMMAND` o `AGENT_COMMAND`. |
+| `gemini` | `gemini -p` | Sobrescribe con `GEMINI_COMMAND` o `AGENT_COMMAND`. |
+| `opencode` | `opencode run` | Sobrescribe con `OPENCODE_COMMAND` o `AGENT_COMMAND`. |
+| `openclaw` | `openclaw run` | Sobrescribe con `OPENCLAW_COMMAND` o `AGENT_COMMAND`. |
+| `custom` | `AGENT_COMMAND` requerido | El prompt se añade como argumento argv final. |
-Generic overrides:
+Sobrescrituras genéricas:
 ```bash
 AGENT_BACKEND=custom
@@ -43,23 +47,37 @@ UTTERANCE_IDLE_MS=4500
 LATENCY_LOG_PATH=./.logs/latency.jsonl
 ```
-## Example `.env`
+## Contrato del adaptador de agente
+El puente de voz habla con cada backend mediante un único contrato de adaptador:
+- `run({ text }, signal, plan)` devuelve estado, texto de respuesta final, etiqueta del backend, tiempo transcurrido y metadatos de sesión opcionales.
+- `ask(text, signal, plan)` es el atajo de compatibilidad que devuelve solo el texto de la respuesta final.
+- `capabilities` declara si el backend admite reanudación de sesión, progreso en streaming y cancelación.
+- Hermes es el adaptador de referencia: reanudación, streaming de progreso detallado, cancelación y recuperación de respuesta final desde archivos de sesión de Hermes.
+Los nuevos backends deberían implementar el mismo contrato y mantener el comportamiento de voz/STT/TTS fuera del adaptador.
+## Ejemplo de `.env`
 ```bash
 DISCORD_BOT_TOKEN="***"
 DISCORD_ALLOWED_USERS="123456789012345678"
 AUTO_JOIN_VOICE_CHANNELS="일반,General,general"
 TRANSCRIPT_CHANNEL_ID="123456789012345678"
 AGENT_BACKEND="hermes"
 STT_ENGINE="whisper_cpp"
 WHISPER_CPP_BIN="whisper-cli"
 WHISPER_CPP_MODEL="./models/ggml-small-q5_1.bin"
 TTS_BACKEND="edge"
 TTS_VOICE_TYPE="korean_female"
 TTS_VOICE="ko-KR-SunHiNeural"
 TTS_RATE="+10%"
 TTS_MAX_CHARS="495"
 TTS_VOLUME="1.0"
 REQUIRE_WAKE_WORD="0"
 MIN_UTTERANCE_SECONDS="1.0"
 UTTERANCE_IDLE_MS="4500"
@@ -69,39 +87,60 @@ AGENT_VERBOSE_PROGRESS="0"
 LATENCY_LOG_PATH="./.logs/latency.jsonl"
 ```
-## TTS Voice Selection
+## Selección de voz TTS
+Los preajustes de idioma y la selección de voz están separados:
-`vc language ko|en|auto` changes STT language, progress language, and default TTS voice. Live commands such as “남자 한국어 목소리로 바꿔”, “여자 한국어 목소리로 바꿔”, `change voice to Korean female`, and `switch speaker to English` change only the speaker/voice type.
+- `vc language ko|en|auto` cambia el idioma STT, el idioma de progreso y la voz predeterminada para ese idioma.
+- Comandos de voz en vivo como “남자 한국어 목소리로 바꿔”, “여자 한국어 목소리로 바꿔”, `change voice to Korean female` y `switch speaker to English` cambian solo el hablante/tipo de voz.
+- `!voice-test <text>` reproduce una muestra rápida con el backend y la voz actualmente seleccionados.
-Default Edge catalog:
+La selección de voz se guarda por defecto en `config/tts-voices.json`. Sobrescribe la ruta con `TTS_VOICE_CONFIG`. El puente en ejecución vuelve a leer/aplicar la selección de voz antes de sintetizar, por lo que los comandos de voz surten efecto sin reinicio completo.
-| `TTS_VOICE_TYPE` | `TTS_VOICE` | Language |
+Catálogo Edge predeterminado:
+| `TTS_VOICE_TYPE` | `TTS_VOICE` | Idioma |
 |---|---|---|
-| `korean_male` | `ko-KR-InJoonNeural` | Korean |
-| `korean_female` | `ko-KR-SunHiNeural` | Korean |
-| `korean_multilingual_male` | `ko-KR-HyunsuMultilingualNeural` | Korean |
-| `english_male` | `en-US-GuyNeural` | English |
-| `english_female` | `en-US-AriaNeural` | English |
+| `korean_male` | `ko-KR-InJoonNeural` | Coreano |
+| `korean_female` | `ko-KR-SunHiNeural` | Coreano |
+| `korean_multilingual_male` | `ko-KR-HyunsuMultilingualNeural` | Coreano |
+| `english_male` | `en-US-GuyNeural` | Inglés |
+| `english_female` | `en-US-AriaNeural` | Inglés |
+Sobrescritura manual persistente:
+```bash
+TTS_BACKEND="edge"
+TTS_VOICE_TYPE="korean_male"
+TTS_VOICE="ko-KR-InJoonNeural"
+TTS_VOICE_CONFIG="config/tts-voices.json"
+```
-Backend-specific voice options:
+Para OpenVoice, SpeechSwift o Supertonic, mantén los ajustes de voz/referencia específicos del backend en las secciones siguientes; el mismo archivo de catálogo de voces aún puede rastrear el tipo de voz activo.
-| Backend | Settings | Voice choices |
+Opciones de voz específicas de backend:
+| Backend | Ajustes | Opciones de voz |
 |---|---|---|
-| Edge | `TTS_VOICE_TYPE`, `TTS_VOICE` | Built-in types plus any `edge-tts --list-voices` voice |
-| Supertonic | `SUPERTONIC_VOICE`, `SUPERTONIC_LANGUAGE` | `M1`–`M5`, `F1`–`F5`; `ko`, `en`, `es`, `pt`, `fr` |
-| OpenVoice | `OPENVOICE_REF_AUDIO`, `OPENVOICE_STYLE`, `OPENVOICE_LANGUAGE` | User-provided permitted reference WAV |
-| SpeechSwift / CosyVoice | `SPEECHSWIFT_REF_AUDIO`, `SPEECHSWIFT_ENGINE`, `SPEECHSWIFT_SPEAKER`, `SPEECHSWIFT_MODEL_ID` | Reference-sample voice or backend speaker/model ID |
+| Edge | `TTS_VOICE_TYPE`, `TTS_VOICE` | Tipos integrados anteriores, más cualquier voz devuelta por `edge-tts --list-voices` |
+| Supertonic | `SUPERTONIC_VOICE`, `SUPERTONIC_LANGUAGE` | `M1`–`M5`, `F1`–`F5`; idioma `ko`, `en`, `es`, `pt`, `fr` |
+| OpenVoice | `OPENVOICE_REF_AUDIO`, `OPENVOICE_STYLE`, `OPENVOICE_LANGUAGE` | WAV de referencia permitido proporcionado por el usuario; el estilo predeterminado es `default` |
+| SpeechSwift / CosyVoice | `SPEECHSWIFT_REF_AUDIO`, `SPEECHSWIFT_ENGINE`, `SPEECHSWIFT_SPEAKER`, `SPEECHSWIFT_MODEL_ID` | Voces de muestra de referencia para CosyVoice, o IDs de hablante/modelo admitidos por el backend |
-## Utterance Segmentation
+## Segmentación de emisiones
-`UTTERANCE_IDLE_MS` controls how long the bridge waits after speech before starting STT. Default is `4500` ms.
+`UTTERANCE_IDLE_MS` controla cuánto espera el puente después de un segmento de habla antes de decidir que el usuario terminó y empezar STT. El valor predeterminado es `4500` ms para conservar instrucciones habladas más largas con pausas naturales. Los valores menores se sienten más rápidos para comandos cortos, pero pueden dividir dictado largo; los valores mayores son más seguros para habla reflexiva.
 ```bash
-UTTERANCE_IDLE_MS="4500"
-UTTERANCE_IDLE_MS="6000"
+UTTERANCE_IDLE_MS="4500"  # balanced default
+UTTERANCE_IDLE_MS="6000"  # safer for long dictation with pauses
 ```
-## MCP Server
+## Servidor MCP
+VerbalCoding incluye un servidor MCP stdio para que Hermes Agent o cualquier cliente MCP pueda controlar el puente mediante herramientas en lugar de depender de skills o comandos de shell de forma libre.
+Ejemplo de configuración de Hermes:
 ```yaml
 mcp_servers:
@@ -112,39 +151,89 @@ mcp_servers:
     connect_timeout: 30
 ```
-Tools: `status`, `doctor`, `set_auto_restart`, `set_language`, `start`, `stop`, and `restart`.
+Herramientas MCP expuestas:
+| Herramienta | Propósito |
+|---|---|
+| `status` | Informar estado del puente/configuración sin secretos |
+| `doctor` | Ejecutar la comprobación doctor con secretos redactados |
+| `set_auto_restart` | Habilitar/deshabilitar el reinicio automático del bot de voz al hacer commit |
+| `set_language` | Actualizar juntos STT/progreso/TTS |
+| `start`, `stop`, `restart` | Controlar el puente de voz de Discord |
-## Optional OpenVoice TTS
+## TTS OpenVoice opcional
+Edge TTS sigue siendo el valor predeterminado y la alternativa. Para probar clonación de voz local con OpenVoice V2:
 ```bash
 ./scripts/setup_openvoice.sh
+# Download checkpoints_v2_0417.zip from OpenVoice docs and extract under vendor/OpenVoice/checkpoints_v2/
+mkdir -p voice-samples
+# Put a permitted reference sample at voice-samples/user-reference.wav,
+# or capture one from Discord with !voice-clone capture.
 python3 integrations/openvoice/synth.py --openvoice-dir vendor/OpenVoice --ref-audio voice-samples/user-reference.wav --text '안녕하세요. 버벌코딩 목소리 복제 테스트입니다.' --output /tmp/verbalcoding-openvoice-smoke.wav
 ```
+Luego define:
 ```bash
 TTS_BACKEND="openvoice"
 OPENVOICE_REF_AUDIO="./voice-samples/user-reference.wav"
 OPENVOICE_PROGRESS="0"
 ```
-Only clone voices you own or have permission to use. OpenVoice falls back to Edge on failure.
+Clona solo voces que poseas o tengas permiso para usar. Si OpenVoice falla o agota el tiempo, VerbalCoding vuelve a Edge TTS.
-## Optional Supertonic TTS
+## TTS Supertonic opcional
 ```bash
 ./scripts/setup_supertonic.sh
 supertonic tts '안녕하세요. 수퍼토닉 테스트입니다.' --lang ko --voice M1 --steps 2 --speed 1.0 -o /tmp/verbalcoding-supertonic.wav
 ```
-## Optional SpeechSwift / CosyVoice TTS
+Luego define:
+```bash
+TTS_BACKEND="supertonic"
+SUPERTONIC_COMMAND="./.venv-supertonic/bin/supertonic"
+SUPERTONIC_VOICE="M1"
+SUPERTONIC_LANGUAGE="ko"
+SUPERTONIC_STEPS="2"
+SUPERTONIC_SPEED="1.0"
+SUPERTONIC_PROGRESS="0"
+```
+Si Supertonic falta, falla o agota el tiempo, VerbalCoding vuelve a Edge TTS.
+## TTS SpeechSwift / CosyVoice opcional
+En Apple Silicon, `speech-swift` es un backend local para clonación de voz coreana con CosyVoice/Qwen3-TTS nativo de MLX.
 ```bash
 brew tap soniqo/speech https://github.com/soniqo/speech-swift
 brew install speech
 ```
-Recommended env includes `TTS_BACKEND="speechswift"`, `SPEECHSWIFT_MODE="server"`, `SPEECHSWIFT_ENGINE="cosyvoice"`, `SPEECHSWIFT_REF_AUDIO`, and `SPEECHSWIFT_SERVER_URL`. Keep Edge for quick progress prompts.
+Entorno recomendado:
+```bash
+TTS_BACKEND="speechswift"
+SPEECHSWIFT_MODE="server"
+SPEECHSWIFT_ENGINE="cosyvoice"
+SPEECHSWIFT_LANGUAGE="korean"
+SPEECHSWIFT_REF_AUDIO="./voice-samples/user-reference.wav"
+SPEECHSWIFT_SERVER_HOST="127.0.0.1"
+SPEECHSWIFT_SERVER_PORT="18080"
+SPEECHSWIFT_SERVER_URL="http://127.0.0.1:18080"
+SPEECHSWIFT_PROGRESS="0"
+```
+Mantén Edge para prompts rápidos de progreso/backchannel.
-## Operational Notes
+## Notas operativas
-Enable Discord Message Content intent, grant voice connect/speak permissions, authenticate the selected CLI harness separately, and avoid reading diffs/log dumps aloud.
+- El bot necesita el intent privilegiado Message Content de Discord habilitado para comandos de texto.
+- El bot necesita permisos de conectar/hablar en el canal de voz.
+- Para Hermes Agent, configura/autentica Hermes normalmente (`hermes setup`, `hermes login`, etc.) en tu perfil predeterminado.
+- Para Claude Code, Codex, Gemini, OpenCode y OpenClaw, instala y autentica esas CLIs por separado.
+- Si una CLI emite salida de diff/código durante un timeout o fallo de señal, el puente evita leerla en voz alta y envía texto detallado en su lugar.
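The Spanish configuration guide adds an agent adapter contract section ("Contrato del adaptador de agente"), which specifies three members:

- `run({ text }, signal, plan)` returns status, final answer text, backend label, elapsed time, and optional session metadata.
- `ask(text, signal, plan)` is a compatibility shortcut that returns only the final text.
- `capabilities` declares support for session resume, streaming progress, and cancellation.

The sketch below is a hypothetical echo backend written against that description. The result field names (`status`, `text`, `backend`, `elapsedMs`, `session`), the capability keys, and the `fakeAgent` stand-in are assumptions. A real adapter would spawn the configured CLI harness, for example `hermes chat -Q -q`, instead of echoing.

```javascript
// Hypothetical adapter following the documented contract; field names are assumptions.
function createEchoAdapter({ label = 'echo', delayMs = 10 } = {}) {
  const capabilities = { resume: false, streamingProgress: false, cancel: true };

  // plan: per-turn options from the bridge (shape not shown in the docs; unused here).
  async function run({ text }, signal, plan = {}) {
    const started = Date.now();
    try {
      const answer = await fakeAgent(text, signal, delayMs);
      return { status: 'ok', text: answer, backend: label, elapsedMs: Date.now() - started, session: null };
    } catch (err) {
      const status = signal?.aborted ? 'cancelled' : 'error';
      return { status, text: '', backend: label, elapsedMs: Date.now() - started, session: null, error: String(err) };
    }
  }

  // Compatibility shortcut: only the final answer text.
  async function ask(text, signal, plan) {
    const result = await run({ text }, signal, plan);
    return result.text;
  }

  return { capabilities, run, ask };
}

// Stand-in for a CLI harness: resolves after delayMs unless the signal aborts first.
function fakeAgent(text, signal, delayMs) {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) return reject(new Error('aborted'));
    const timer = setTimeout(() => resolve(`You said: ${text}`), delayMs);
    signal?.addEventListener('abort', () => {
      clearTimeout(timer);
      reject(new Error('aborted'));
    }, { once: true });
  });
}
```

The guide says new backends should keep voice, STT, and TTS behavior outside the adapter. Accordingly, this sketch only maps text to an answer and honors the abort signal, which is what lets barge-in cancel an in-flight turn.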