verbalcoding 0.2.6 → 0.2.8
- package/README.md +12 -22
- package/app-node/cli_install.test.mjs +15 -0
- package/docs/FRESH_INSTALL.md +8 -2
- package/docs/assets/figures/verbalcoding-flow.svg +45 -30
- package/docs/i18n/CONFIGURATION.es.md +239 -0
- package/docs/i18n/CONFIGURATION.fr.md +239 -0
- package/docs/i18n/CONFIGURATION.ja.md +239 -0
- package/docs/i18n/CONFIGURATION.ko.md +66 -74
- package/docs/i18n/CONFIGURATION.ru.md +239 -0
- package/docs/i18n/CONFIGURATION.zh.md +239 -0
- package/docs/i18n/FRESH_INSTALL.es.md +207 -0
- package/docs/i18n/FRESH_INSTALL.fr.md +207 -0
- package/docs/i18n/FRESH_INSTALL.ja.md +207 -0
- package/docs/i18n/FRESH_INSTALL.ko.md +60 -54
- package/docs/i18n/FRESH_INSTALL.ru.md +207 -0
- package/docs/i18n/FRESH_INSTALL.zh.md +207 -0
- package/docs/i18n/MULTI_INSTANCE.es.md +180 -0
- package/docs/i18n/MULTI_INSTANCE.fr.md +180 -0
- package/docs/i18n/MULTI_INSTANCE.ja.md +179 -0
- package/docs/i18n/MULTI_INSTANCE.ko.md +46 -46
- package/docs/i18n/MULTI_INSTANCE.ru.md +179 -0
- package/docs/i18n/MULTI_INSTANCE.zh.md +179 -0
- package/docs/i18n/README.es.md +83 -55
- package/docs/i18n/README.fr.md +85 -57
- package/docs/i18n/README.ja.md +83 -55
- package/docs/i18n/README.ko.md +47 -56
- package/docs/i18n/README.ru.md +86 -58
- package/docs/i18n/README.zh.md +83 -56
- package/docs/i18n/RELEASE.es.md +74 -0
- package/docs/i18n/RELEASE.fr.md +74 -0
- package/docs/i18n/RELEASE.ja.md +74 -0
- package/docs/i18n/RELEASE.ko.md +38 -36
- package/docs/i18n/RELEASE.ru.md +74 -0
- package/docs/i18n/RELEASE.zh.md +74 -0
- package/docs/i18n/USAGE.es.md +161 -0
- package/docs/i18n/USAGE.fr.md +161 -0
- package/docs/i18n/USAGE.ja.md +161 -0
- package/docs/i18n/USAGE.ko.md +61 -72
- package/docs/i18n/USAGE.ru.md +161 -0
- package/docs/i18n/USAGE.zh.md +161 -0
- package/package.json +1 -1
- package/scripts/bootstrap_prereqs.sh +15 -3
- package/scripts/cli.mjs +1 -1
- package/scripts/doctor.mjs +114 -8
package/README.md
CHANGED
@@ -34,7 +34,7 @@ VerbalCoding turns a Discord voice channel into a hands-free control surface for
 | What you get | Why it feels good |
 |---|---|
 | Voice-first agent control | Talk to Hermes Agent, Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw, or any custom CLI harness. |
-|
+| On-device speech loop | Discord voice capture → local `whisper-cli` transcription → agent → chunked TTS playback. |
 | Shared voice + text context | Voice turns and `!ask` text commands can reuse the same supported agent session. |
 | Barge-in and sensitivity modes | Interrupt playback naturally and switch between normal and conservative/noisy environments. |
 | Multilingual voice presets | Switch STT, progress language, and TTS voice together with `vc language ko/en/auto`. |
@@ -69,23 +69,10 @@ vc doctor
 ./run.sh
 ```
 
-`vc setup --yes`
+`vc setup --yes` bootstraps local prerequisites from the npm package. `./scripts/install.sh --yes` does the same for GitHub clone installs. Both cover Node/npm dependencies, `ffmpeg`, `whisper-cli`, the default whisper.cpp model, a local `.venv-tts` Edge TTS helper, and setup wizard configuration where possible. They support macOS/Homebrew plus common Linux package managers (`apt`, `dnf`, `pacman`); rerun with `--no-wizard` for dependency-only setup or `--skip-system` if you want to install OS packages yourself.
 
 Need a clean install walkthrough? Start with [Fresh Install](docs/FRESH_INSTALL.md).
 
-## How It Works
-
-```mermaid
-flowchart LR
-A[Discord voice] --> B["@discordjs/voice"]
-B --> C[PCM cleanup + gates]
-C --> D["whisper.cpp STT"]
-D --> E["CLI agent adapter"]
-E --> F["Concise answer"]
-F --> G["Chunked TTS"]
-G --> H["Discord playback"]
-```
-
 ## Supported Agent Backends
 
 | Backend | Default command | Session support |
@@ -107,7 +94,6 @@ flowchart LR
 | [Configuration](docs/CONFIGURATION.md) | `.env`, agent backends, MCP, TTS backends, operational notes |
 | [Multi-Instance](docs/MULTI_INSTANCE.md) | One permanent Discord voice room per project |
 | [Release Notes](docs/RELEASE.md) | Current capabilities and pre-release checklist |
-| [한국어 문서](docs/i18n/README.ko.md) | npm 설치, 사용법, 설정, 멀티 인스턴스 한국어 가이드 |
 
 ## Tiny Command Map
 
@@ -123,11 +109,15 @@ vc start # start the default bridge
 
 In Discord:
 
-
-
-
-
-
+| Command | What it does |
+|---|---|
+| `!join` | Join your current voice channel. |
+| `!ask <prompt>` | Send text to the same agent backend. |
+| `!verbose on\|off` | Show/speak short progress updates. |
+| `!latency` | Summarize recent voice/STT/agent/TTS latency. |
+| `!sensitivity normal` | Use normal indoor barge-in sensitivity. |
+| `!sensitivity conservative` | Use stricter noisy/outdoor sensitivity. |
+| `!session new <name> <workdir> [context] --voice <voice-channel>` | Bind a project session to a voice room. |
 
 ## Requirements
 
@@ -135,7 +125,7 @@ In Discord:
 |---|---|
 | Runtime | Node.js 20+, npm; install script can install via Homebrew/apt/dnf/pacman |
 | Audio | `ffmpeg`; install script can install it |
-|
+| Speech recognition | Local `whisper-cli` from whisper.cpp; install script uses Homebrew on macOS or local Linux build fallback |
 | TTS | Edge TTS CLI; install script creates `.venv-tts` if needed |
 | Discord | Bot token, Message Content intent, voice permissions |
 | Agent | At least one authenticated CLI harness, Hermes Agent by default |
package/app-node/cli_install.test.mjs
CHANGED
@@ -63,6 +63,8 @@ test('bootstrap script installs cross-platform prerequisites and local model hel
 
 assert.match(script, /brew install/);
 assert.match(script, /apt-get install/);
+assert.match(script, /has_cmd node \|\| packages\+\=\(nodejs\)/);
+assert.match(script, /has_cmd npm \|\| packages\+\=\(npm\)/);
 assert.match(script, /dnf install/);
 assert.match(script, /pacman -Sy/);
 assert.match(script, /git clone --depth 1 https:\/\/github\.com\/ggml-org\/whisper\.cpp\.git/);
@@ -70,6 +72,19 @@ test('bootstrap script installs cross-platform prerequisites and local model hel
 assert.match(script, /\.venv-tts/);
 });
 
+test('doctor auto-bootstraps fixable prerequisites by default', () => {
+const doctor = fs.readFileSync(path.join(ROOT, 'scripts', 'doctor.mjs'), 'utf8');
+const cli = fs.readFileSync(path.join(ROOT, 'scripts', 'cli.mjs'), 'utf8');
+
+assert.match(doctor, /fixablePrerequisites/);
+assert.match(doctor, /bootstrap_prereqs\.sh'\), '--yes'/);
+assert.match(doctor, /VERBALCODING_DOCTOR_AUTO_FIX/);
+assert.match(doctor, /--no-fix/);
+assert.match(doctor, /WHISPER_CPP_BIN/);
+assert.match(doctor, /EDGE_TTS_COMMAND/);
+assert.match(cli, /doctor\.mjs'\), \.\.\.argv\.slice\(1\)/);
+});
+
 test('Ubuntu Docker smoke script validates clean install without secrets', () => {
 const script = fs.readFileSync(path.join(ROOT, 'scripts', 'docker_ubuntu_smoke.sh'), 'utf8');
 
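Note on the new doctor test: it only asserts that `doctor.mjs` mentions `fixablePrerequisites`, `--no-fix`, and `VERBALCODING_DOCTOR_AUTO_FIX`; the diff does not show how they interact. The following is a minimal sketch of one plausible gating rule, assuming auto-fix is on by default and that either the flag or an "off" value of the environment variable disables it. The helper name `shouldAutoFix` and the accepted "off" spellings are hypothetical, not taken from the package.

```javascript
// Hypothetical helper: decide whether doctor should rerun the bootstrap.
// Assumption: auto-fix is enabled unless the flag or the env var turns it off.
function shouldAutoFix(argv, env) {
  if (argv.includes('--no-fix')) return false;
  const raw = String(env.VERBALCODING_DOCTOR_AUTO_FIX ?? '').trim().toLowerCase();
  // Treat common "off" spellings as a request to skip the bootstrap (assumed list).
  return !['0', 'false', 'no', 'off'].includes(raw);
}

console.log(shouldAutoFix([], {}));                                    // true
console.log(shouldAutoFix(['--no-fix'], {}));                          // false
console.log(shouldAutoFix([], { VERBALCODING_DOCTOR_AUTO_FIX: '0' })); // false
```

Keeping the decision in a pure function of `argv` and `env` would let a test exercise the behavior directly instead of matching strings in the source file.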
package/docs/FRESH_INSTALL.md
CHANGED
@@ -32,7 +32,13 @@ cd VerbalCoding
 
 ## 2. Bootstrap dependencies and run the setup wizard
 
-
+For an npm install, do not run `./scripts/install.sh` directly; there is no repository checkout in your current directory. Use the packaged CLI wrapper instead:
+
+```bash
+vc setup --yes
+```
+
+`vc setup` runs the `scripts/install.sh` bundled inside the installed npm package. Only use `./scripts/install.sh --yes` when you are inside a GitHub clone:
 
 ```bash
 ./scripts/install.sh --yes
@@ -104,7 +110,7 @@ The invite includes bot and slash-command scopes plus text/voice permissions use
 vc doctor
 ```
 
-`vc doctor` is redacted: it reports missing tokens/commands/models without printing secret values. Fix
+`vc doctor` is redacted: it reports missing tokens/commands/models without printing secret values. When fixable local prerequisites are missing (`ffmpeg`, `whisper-cli`, the default model, or Edge TTS helper), it automatically reruns the packaged bootstrap first. Fix any remaining `✗` items, then rerun it.
 
 Expected success includes:
 
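Note on the updated FRESH_INSTALL text: it separates prerequisites that the bootstrap can install (`ffmpeg`, `whisper-cli`, the default model, the Edge TTS helper) from items the user must fix by hand. As a rough sketch of that split, assuming each check is reported as a small record, the code below partitions failed checks into the two groups. The record shape, the `fixable` flag, and `planDoctorRun` are illustrative assumptions, not the package's real data structures.

```javascript
// Hypothetical check records; in practice `ok` would come from real probes.
function planDoctorRun(checks) {
  const missing = checks.filter((c) => !c.ok);
  return {
    // Items a local bootstrap could install without user secrets.
    fixable: missing.filter((c) => c.fixable).map((c) => c.name),
    // Items the user has to resolve manually (for example a bot token).
    manual: missing.filter((c) => !c.fixable).map((c) => c.name),
  };
}

const plan = planDoctorRun([
  { name: 'ffmpeg', ok: false, fixable: true },
  { name: 'whisper-cli', ok: true, fixable: true },
  { name: 'DISCORD_BOT_TOKEN', ok: false, fixable: false },
]);
console.log(plan.fixable, plan.manual);
```

Under this sketch, a non-empty `fixable` list would trigger the bootstrap rerun, and whatever remains in `manual` corresponds to the `✗` items the user still has to fix.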
package/docs/assets/figures/verbalcoding-flow.svg
CHANGED
@@ -1,6 +1,6 @@
 <svg width="1200" height="520" viewBox="0 0 1200 520" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="title desc">
-<title id="title">VerbalCoding voice
-<desc id="desc">A
+<title id="title">VerbalCoding natural voice loop</title>
+<desc id="desc">A compact phone-call-like loop: user speaks in Discord, Local STT with whisper-cli transcribes, the CLI agent works, TTS speaks back, and the user can interrupt anytime.</desc>
 <defs>
 <linearGradient id="bg" x1="0" y1="0" x2="1200" y2="520" gradientUnits="userSpaceOnUse">
 <stop stop-color="#0F172A"/>
@@ -19,45 +19,60 @@
 <circle cx="1030" cy="90" r="190" fill="#6366F1" opacity="0.16"/>
 <circle cx="170" cy="430" r="210" fill="#06B6D4" opacity="0.13"/>
 <rect x="70" y="54" width="1060" height="412" rx="32" fill="url(#card)" stroke="#334155" filter="url(#shadow)"/>
+
 <text x="110" y="118" fill="#F8FAFC" font-family="Inter, ui-sans-serif, system-ui" font-size="42" font-weight="800">VerbalCoding</text>
-<text x="110" y="154" fill="#94A3B8" font-family="Inter, ui-sans-serif, system-ui" font-size="20">Discord voice
+<text x="110" y="154" fill="#94A3B8" font-family="Inter, ui-sans-serif, system-ui" font-size="20">A natural Discord voice loop for coding agents — speak, listen, interrupt, continue</text>
 
 <g font-family="Inter, ui-sans-serif, system-ui" font-size="17" font-weight="700">
-<rect x="
-<text x="185" y="
-<text x="185" y="
+<rect x="105" y="220" width="160" height="92" rx="20" fill="#5865F2"/>
+<text x="185" y="254" fill="white" text-anchor="middle">Discord</text>
+<text x="185" y="280" fill="#E0E7FF" text-anchor="middle" font-size="14">phone-call voice</text>
 
-<rect x="305" y="220" width="
-<text x="
-<text x="
+<rect x="305" y="220" width="165" height="92" rx="20" fill="#0891B2"/>
+<text x="387.5" y="254" fill="white" text-anchor="middle">Local STT</text>
+<text x="387.5" y="280" fill="#CFFAFE" text-anchor="middle" font-size="14">whisper-cli</text>
 
-<rect x="
-<text x="
-<text x="
+<rect x="510" y="220" width="165" height="92" rx="20" fill="#7C3AED"/>
+<text x="592.5" y="254" fill="white" text-anchor="middle">Adapter</text>
+<text x="592.5" y="280" fill="#EDE9FE" text-anchor="middle" font-size="14">Hermes / Claude / Codex</text>
 
-<rect x="
-<text x="
-<text x="
+<rect x="715" y="220" width="165" height="92" rx="20" fill="#111827" stroke="#475569"/>
+<text x="797.5" y="254" fill="white" text-anchor="middle">CLI Agent</text>
+<text x="797.5" y="280" fill="#CBD5E1" text-anchor="middle" font-size="14">does the work</text>
 
-<rect x="
-<text x="
-<text x="
+<rect x="920" y="220" width="165" height="92" rx="20" fill="#0EA5E9"/>
+<text x="1002.5" y="254" fill="white" text-anchor="middle">TTS</text>
+<text x="1002.5" y="280" fill="#E0F2FE" text-anchor="middle" font-size="14">spoken reply</text>
 </g>
 
 <g stroke="#94A3B8" stroke-width="4" stroke-linecap="round">
-<path d="
-<path d="
-<path d="
-<path d="
+<path d="M275 266H295"/>
+<path d="M480 266H500"/>
+<path d="M685 266H705"/>
+<path d="M890 266H910"/>
+</g>
+<g fill="#94A3B8" opacity="0.95">
+<circle cx="285" cy="266" r="4"/>
+<circle cx="490" cy="266" r="4"/>
+<circle cx="695" cy="266" r="4"/>
+<circle cx="900" cy="266" r="4"/>
 </g>
-
-
-
-<
-<
+
+<path d="M1002 330C1002 405 185 405 185 330" stroke="#67E8F9" stroke-width="4" stroke-linecap="round" stroke-dasharray="13 13"/>
+<g fill="#67E8F9">
+<circle cx="1002" cy="330" r="5"/>
+<circle cx="185" cy="330" r="5"/>
+</g>
+<text x="594" y="438" fill="#A5F3FC" text-anchor="middle" font-family="Inter, ui-sans-serif, system-ui" font-size="17" font-weight="700">Conversation loop: hear the answer, speak again, or interrupt anytime</text>
+
+<path d="M185 210C185 178 1002 178 1002 210" stroke="#FBBF24" stroke-width="3" stroke-linecap="round" stroke-dasharray="8 10" opacity="0.9"/>
+<g fill="#FBBF24" opacity="0.95">
+<circle cx="185" cy="210" r="4"/>
+<circle cx="1002" cy="210" r="4"/>
 </g>
+<text x="594" y="194" fill="#FDE68A" text-anchor="middle" font-family="Inter, ui-sans-serif, system-ui" font-size="15" font-weight="700">Barge-in stays open while the agent is thinking or speaking</text>
 
-<rect x="150" y="
-<text x="182" y="
-<text x="1045" y="
+<rect x="150" y="348" width="900" height="54" rx="17" fill="#020617" stroke="#1F2937"/>
+<text x="182" y="382" fill="#A7F3D0" font-family="SFMono-Regular, ui-monospace, monospace" font-size="18">$ vc language ko && vc instance start my-project</text>
+<text x="1045" y="382" fill="#64748B" text-anchor="end" font-family="Inter, ui-sans-serif, system-ui" font-size="15">hands-free coding call</text>
 </svg>
package/docs/i18n/CONFIGURATION.es.md
ADDED
@@ -0,0 +1,239 @@
+# Configuración de VerbalCoding
+
+## Asistente de configuración
+
+La configuración de la aplicación/bot de Discord no se vuelve a explicar desde cero aquí de forma intencionada. Usa estas guías originales para los pasos del lado de Discord y luego vuelve a la configuración de VerbalCoding:
+
+- Guía de mensajería Discord de Hermes Agent: <https://hermes-agent.nousresearch.com/docs/user-guide/messaging/discord>
+- Resumen oficial de bots de Discord: <https://docs.discord.com/developers/bots/overview>
+- Inicio rápido oficial de Discord: <https://docs.discord.com/developers/quick-start/getting-started>
+
+```bash
+./scripts/install.sh
+```
+
+El instalador solicita token de Discord, usuarios permitidos, nombres de canales de voz para auto-unión, canal/hilo de transcripción, backend de arnés CLI, idioma de voz predeterminado, ajustes de TTS y comportamiento de palabra de activación. Escribe `.env` con modo `0600`; `.env` está ignorado por git. También enlaza el comando corto de shell `vc`.
+
+Si solo necesitas el comando de shell después de una instalación manual:
+
+```bash
+npm link
+```
+
+## Backends de agentes compatibles
+
+Define `AGENT_BACKEND` en `.env`.
+
+| Backend | Comando predeterminado | Notas |
+|---|---|---|
+| `hermes` | `hermes chat -Q -q` | Predeterminado. Conserva el comportamiento de reanudación de `.verbalcoding-session`. |
+| `claude-code` / `claude` | `claude -p` | Sobrescribe con `CLAUDE_COMMAND` o `AGENT_COMMAND`. |
+| `codex` | `codex exec` | Sobrescribe con `CODEX_COMMAND` o `AGENT_COMMAND`. |
+| `gemini` | `gemini -p` | Sobrescribe con `GEMINI_COMMAND` o `AGENT_COMMAND`. |
+| `opencode` | `opencode run` | Sobrescribe con `OPENCODE_COMMAND` o `AGENT_COMMAND`. |
+| `openclaw` | `openclaw run` | Sobrescribe con `OPENCLAW_COMMAND` o `AGENT_COMMAND`. |
+| `custom` | `AGENT_COMMAND` requerido | El prompt se añade como argumento argv final. |
+
+Sobrescrituras genéricas:
+
+```bash
+AGENT_BACKEND=custom
+AGENT_LABEL="My Harness"
+AGENT_COMMAND="my-harness run --non-interactive"
+AGENT_TASK_TIMEOUT_MS=0
+AGENT_CHAT_TIMEOUT_MS=45000
+AGENT_VERBOSE_PROGRESS=0
+UTTERANCE_IDLE_MS=4500
+LATENCY_LOG_PATH=./.logs/latency.jsonl
+```
+
+## Contrato del adaptador de agente
+
+El puente de voz habla con cada backend mediante un único contrato de adaptador:
+
+- `run({ text }, signal, plan)` devuelve estado, texto de respuesta final, etiqueta del backend, tiempo transcurrido y metadatos de sesión opcionales.
+- `ask(text, signal, plan)` es el atajo de compatibilidad que devuelve solo el texto de la respuesta final.
+- `capabilities` declara si el backend admite reanudación de sesión, progreso en streaming y cancelación.
+- Hermes es el adaptador de referencia: reanudación, streaming de progreso detallado, cancelación y recuperación de respuesta final desde archivos de sesión de Hermes.
+
+Los nuevos backends deberían implementar el mismo contrato y mantener el comportamiento de voz/STT/TTS fuera del adaptador.
+
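Note on the adapter contract section added above: the doc names `run`, `ask`, and `capabilities` but does not give their concrete shapes. The following is a minimal sketch of an adapter that fits the description. It assumes result field names such as `status`, `text`, `backend`, and `elapsedMs`, and capability keys such as `sessionResume`; all of these are hypothetical, as is the stand-in echo backend used in place of a real CLI.

```javascript
// Hypothetical adapter factory; field and capability names are illustrative assumptions.
function createEchoAdapter(label = 'Echo') {
  const adapter = {
    // Declares what the backend supports, as the contract describes.
    capabilities: { sessionResume: false, streamingProgress: false, cancellation: true },

    // run() resolves to status, final text, backend label, and elapsed time.
    // `plan` is accepted for signature parity but unused in this sketch.
    async run({ text }, signal, plan) {
      const started = Date.now();
      if (signal?.aborted) {
        return { status: 'cancelled', text: '', backend: label, elapsedMs: 0 };
      }
      const reply = `echo: ${text}`; // a real adapter would spawn the CLI here
      return { status: 'ok', text: reply, backend: label, elapsedMs: Date.now() - started };
    },

    // ask() is the compatibility shortcut: final answer text only.
    async ask(text, signal, plan) {
      const result = await adapter.run({ text }, signal, plan);
      return result.text;
    },
  };
  return adapter;
}

createEchoAdapter().ask('hello').then((answer) => console.log(answer)); // echo: hello
```

Implementing `ask` on top of `run` keeps the two entry points from drifting apart, which matches the doc's description of `ask` as a shortcut rather than a separate code path.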
+## Ejemplo de `.env`
+
+```bash
+DISCORD_BOT_TOKEN="***"
+DISCORD_ALLOWED_USERS="123456789012345678"
+AUTO_JOIN_VOICE_CHANNELS="일반,General,general"
+TRANSCRIPT_CHANNEL_ID="123456789012345678"
+
+AGENT_BACKEND="hermes"
+STT_ENGINE="whisper_cpp"
+WHISPER_CPP_BIN="whisper-cli"
+WHISPER_CPP_MODEL="./models/ggml-small-q5_1.bin"
+
+TTS_BACKEND="edge"
+TTS_VOICE_TYPE="korean_female"
+TTS_VOICE="ko-KR-SunHiNeural"
+TTS_RATE="+10%"
+TTS_MAX_CHARS="495"
+TTS_VOLUME="1.0"
+
+REQUIRE_WAKE_WORD="0"
+MIN_UTTERANCE_SECONDS="1.0"
+UTTERANCE_IDLE_MS="4500"
+HERMES_TASK_TIMEOUT_MS="0"
+HERMES_CHAT_TIMEOUT_MS="45000"
+AGENT_VERBOSE_PROGRESS="0"
+LATENCY_LOG_PATH="./.logs/latency.jsonl"
+```
+
+## Selección de voz TTS
+
+Los preajustes de idioma y la selección de voz están separados:
+
+- `vc language ko|en|auto` cambia el idioma STT, el idioma de progreso y la voz predeterminada para ese idioma.
+- Comandos de voz en vivo como “남자 한국어 목소리로 바꿔”, “여자 한국어 목소리로 바꿔”, `change voice to Korean female` y `switch speaker to English` cambian solo el hablante/tipo de voz.
+- `!voice-test <text>` reproduce una muestra rápida con el backend y la voz actualmente seleccionados.
+
+La selección de voz se guarda por defecto en `config/tts-voices.json`. Sobrescribe la ruta con `TTS_VOICE_CONFIG`. El puente en ejecución vuelve a leer/aplicar la selección de voz antes de sintetizar, por lo que los comandos de voz surten efecto sin reinicio completo.
+
+Catálogo Edge predeterminado:
+
+| `TTS_VOICE_TYPE` | `TTS_VOICE` | Idioma |
+|---|---|---|
+| `korean_male` | `ko-KR-InJoonNeural` | Coreano |
+| `korean_female` | `ko-KR-SunHiNeural` | Coreano |
+| `korean_multilingual_male` | `ko-KR-HyunsuMultilingualNeural` | Coreano |
+| `english_male` | `en-US-GuyNeural` | Inglés |
+| `english_female` | `en-US-AriaNeural` | Inglés |
+
+Sobrescritura manual persistente:
+
+```bash
+TTS_BACKEND="edge"
+TTS_VOICE_TYPE="korean_male"
+TTS_VOICE="ko-KR-InJoonNeural"
+TTS_VOICE_CONFIG="config/tts-voices.json"
+```
+
+Para OpenVoice, SpeechSwift o Supertonic, mantén los ajustes de voz/referencia específicos del backend en las secciones siguientes; el mismo archivo de catálogo de voces aún puede rastrear el tipo de voz activo.
+
+Opciones de voz específicas de backend:
+
+| Backend | Ajustes | Opciones de voz |
+|---|---|---|
+| Edge | `TTS_VOICE_TYPE`, `TTS_VOICE` | Tipos integrados anteriores, más cualquier voz devuelta por `edge-tts --list-voices` |
+| Supertonic | `SUPERTONIC_VOICE`, `SUPERTONIC_LANGUAGE` | `M1`–`M5`, `F1`–`F5`; idioma `ko`, `en`, `es`, `pt`, `fr` |
+| OpenVoice | `OPENVOICE_REF_AUDIO`, `OPENVOICE_STYLE`, `OPENVOICE_LANGUAGE` | WAV de referencia permitido proporcionado por el usuario; el estilo predeterminado es `default` |
+| SpeechSwift / CosyVoice | `SPEECHSWIFT_REF_AUDIO`, `SPEECHSWIFT_ENGINE`, `SPEECHSWIFT_SPEAKER`, `SPEECHSWIFT_MODEL_ID` | Voces de muestra de referencia para CosyVoice, o IDs de hablante/modelo admitidos por el backend |
+
+## Segmentación de emisiones
+
+`UTTERANCE_IDLE_MS` controla cuánto espera el puente después de un segmento de habla antes de decidir que el usuario terminó y empezar STT. El valor predeterminado es `4500` ms para conservar instrucciones habladas más largas con pausas naturales. Los valores menores se sienten más rápidos para comandos cortos, pero pueden dividir dictado largo; los valores mayores son más seguros para habla reflexiva.
+
+```bash
+UTTERANCE_IDLE_MS="4500" # balanced default
+UTTERANCE_IDLE_MS="6000" # safer for long dictation with pauses
+```
+
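Note on the utterance segmentation section added above: `UTTERANCE_IDLE_MS` behaves like a silence timeout, where each new speech segment postpones the decision that the user has finished. The sketch below illustrates that trade-off over recorded segment timestamps. It assumes segments arrive as `{ startMs, endMs }` pairs; the function and field names are hypothetical and not taken from the package.

```javascript
// Group speech segments into utterances: a pause longer than idleMs between
// one segment's end and the next segment's start closes the current utterance.
function groupUtterances(segments, idleMs = 4500) {
  const utterances = [];
  let current = [];
  for (const seg of segments) {
    const prev = current[current.length - 1];
    if (prev && seg.startMs - prev.endMs > idleMs) {
      utterances.push(current);
      current = [];
    }
    current.push(seg);
  }
  if (current.length > 0) utterances.push(current);
  return utterances;
}

const segments = [
  { startMs: 0, endMs: 2000 },
  { startMs: 5000, endMs: 7000 },   // after a 3000 ms pause
  { startMs: 12500, endMs: 13000 }, // after a 5500 ms pause
];
console.log(groupUtterances(segments, 4500).length); // 2
console.log(groupUtterances(segments, 6000).length); // 1
console.log(groupUtterances(segments, 2500).length); // 3
```

With the default of 4500 ms the 3-second pause stays inside one utterance while the 5.5-second pause splits it; raising the value to 6000 ms keeps the whole dictation together, at the cost of a longer wait before STT starts.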
+## Servidor MCP
+
+VerbalCoding incluye un servidor MCP stdio para que Hermes Agent o cualquier cliente MCP pueda controlar el puente mediante herramientas en lugar de depender de skills o comandos de shell de forma libre.
+
+Ejemplo de configuración de Hermes:
+
+```yaml
+mcp_servers:
+  verbalcoding:
+    command: "node"
+    args: ["/path/to/VerbalCoding/scripts/mcp-server.mjs"]
+    timeout: 120
+    connect_timeout: 30
+```
+
+Herramientas MCP expuestas:
+
+| Herramienta | Propósito |
+|---|---|
+| `status` | Informar estado del puente/configuración sin secretos |
+| `doctor` | Ejecutar la comprobación doctor con secretos redactados |
+| `set_auto_restart` | Habilitar/deshabilitar el reinicio automático del bot de voz al hacer commit |
+| `set_language` | Actualizar juntos STT/progreso/TTS |
+| `start`, `stop`, `restart` | Controlar el puente de voz de Discord |
+
+## TTS OpenVoice opcional
+
+Edge TTS sigue siendo el valor predeterminado y la alternativa. Para probar clonación de voz local con OpenVoice V2:
+
+```bash
+./scripts/setup_openvoice.sh
+# Download checkpoints_v2_0417.zip from OpenVoice docs and extract under vendor/OpenVoice/checkpoints_v2/
+mkdir -p voice-samples
+# Put a permitted reference sample at voice-samples/user-reference.wav,
+# or capture one from Discord with !voice-clone capture.
+python3 integrations/openvoice/synth.py --openvoice-dir vendor/OpenVoice --ref-audio voice-samples/user-reference.wav --text '안녕하세요. 버벌코딩 목소리 복제 테스트입니다.' --output /tmp/verbalcoding-openvoice-smoke.wav
+```
+
+Luego define:
+
+```bash
+TTS_BACKEND="openvoice"
+OPENVOICE_REF_AUDIO="./voice-samples/user-reference.wav"
+OPENVOICE_PROGRESS="0"
+```
+
+Clona solo voces que poseas o tengas permiso para usar. Si OpenVoice falla o agota el tiempo, VerbalCoding vuelve a Edge TTS.
+
+## TTS Supertonic opcional
+
+```bash
+./scripts/setup_supertonic.sh
+supertonic tts '안녕하세요. 수퍼토닉 테스트입니다.' --lang ko --voice M1 --steps 2 --speed 1.0 -o /tmp/verbalcoding-supertonic.wav
+```
+
+Luego define:
+
+```bash
+TTS_BACKEND="supertonic"
+SUPERTONIC_COMMAND="./.venv-supertonic/bin/supertonic"
+SUPERTONIC_VOICE="M1"
+SUPERTONIC_LANGUAGE="ko"
+SUPERTONIC_STEPS="2"
+SUPERTONIC_SPEED="1.0"
+SUPERTONIC_PROGRESS="0"
+```
+
+Si Supertonic falta, falla o agota el tiempo, VerbalCoding vuelve a Edge TTS.
+
+## TTS SpeechSwift / CosyVoice opcional
+
+En Apple Silicon, `speech-swift` es un backend local para clonación de voz coreana con CosyVoice/Qwen3-TTS nativo de MLX.
+
+```bash
+brew tap soniqo/speech https://github.com/soniqo/speech-swift
+brew install speech
+```
+
+Entorno recomendado:
+
+```bash
+TTS_BACKEND="speechswift"
+SPEECHSWIFT_MODE="server"
+SPEECHSWIFT_ENGINE="cosyvoice"
+SPEECHSWIFT_LANGUAGE="korean"
+SPEECHSWIFT_REF_AUDIO="./voice-samples/user-reference.wav"
+SPEECHSWIFT_SERVER_HOST="127.0.0.1"
+SPEECHSWIFT_SERVER_PORT="18080"
+SPEECHSWIFT_SERVER_URL="http://127.0.0.1:18080"
+SPEECHSWIFT_PROGRESS="0"
+```
+
+Mantén Edge para prompts rápidos de progreso/backchannel.
+
+## Notas operativas
+
+- El bot necesita el intent privilegiado Message Content de Discord habilitado para comandos de texto.
+- El bot necesita permisos de conectar/hablar en el canal de voz.
+- Para Hermes Agent, configura/autentica Hermes normalmente (`hermes setup`, `hermes login`, etc.) en tu perfil predeterminado.
+- Para Claude Code, Codex, Gemini, OpenCode y OpenClaw, instala y autentica esas CLIs por separado.
+- Si una CLI emite salida de diff/código durante un timeout o fallo de señal, el puente evita leerla en voz alta y envía texto detallado en su lugar.
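Note on the optional TTS sections added above: they state that VerbalCoding falls back to Edge TTS when OpenVoice or Supertonic is missing, fails, or times out, without showing how. A common way to get that behavior is to race the primary backend against a timer and call the fallback on any error. The sketch below assumes each backend is an async function that returns an audio file path; `synthesizeWithFallback`, `timeoutMs`, and the stub backends are hypothetical.

```javascript
// Race the primary backend against a timer; on error or timeout use the fallback.
async function synthesizeWithFallback(text, primary, fallback, timeoutMs = 10000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('primary TTS timed out')), timeoutMs);
  });
  try {
    // A missing primary (undefined) throws a TypeError here and is also caught.
    return await Promise.race([primary(text), timeout]);
  } catch (err) {
    return fallback(text);
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer behind
  }
}

// Stub backends standing in for OpenVoice/Supertonic and Edge TTS.
const failing = async () => { throw new Error('model not installed'); };
const edgeStub = async (text) => `/tmp/edge-${text.length}.wav`;
synthesizeWithFallback('hello', failing, edgeStub).then((path) => console.log(path)); // /tmp/edge-5.wav
```

This sketch does not cancel a primary backend that is still running after the timeout; a fuller version would likely pass an `AbortSignal` so the slow process can be stopped.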