verbalcoding 0.2.7 → 0.2.9
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +12 -27
- package/app-node/cli_install.test.mjs +32 -0
- package/app-node/install_config.mjs +10 -0
- package/docs/FRESH_INSTALL.md +8 -2
- package/docs/assets/figures/verbalcoding-flow.svg +45 -30
- package/docs/i18n/CONFIGURATION.es.md +138 -49
- package/docs/i18n/CONFIGURATION.fr.md +138 -49
- package/docs/i18n/CONFIGURATION.ja.md +137 -48
- package/docs/i18n/CONFIGURATION.ko.md +137 -48
- package/docs/i18n/CONFIGURATION.ru.md +138 -49
- package/docs/i18n/CONFIGURATION.zh.md +137 -48
- package/docs/i18n/FRESH_INSTALL.es.md +115 -32
- package/docs/i18n/FRESH_INSTALL.fr.md +115 -32
- package/docs/i18n/FRESH_INSTALL.ja.md +119 -36
- package/docs/i18n/FRESH_INSTALL.ko.md +120 -37
- package/docs/i18n/FRESH_INSTALL.ru.md +115 -32
- package/docs/i18n/FRESH_INSTALL.zh.md +119 -36
- package/docs/i18n/MULTI_INSTANCE.es.md +85 -26
- package/docs/i18n/MULTI_INSTANCE.fr.md +85 -26
- package/docs/i18n/MULTI_INSTANCE.ja.md +87 -29
- package/docs/i18n/MULTI_INSTANCE.ko.md +87 -29
- package/docs/i18n/MULTI_INSTANCE.ru.md +84 -26
- package/docs/i18n/MULTI_INSTANCE.zh.md +87 -29
- package/docs/i18n/README.es.md +109 -45
- package/docs/i18n/README.fr.md +109 -45
- package/docs/i18n/README.ja.md +109 -45
- package/docs/i18n/README.ko.md +108 -45
- package/docs/i18n/README.ru.md +109 -45
- package/docs/i18n/README.zh.md +108 -45
- package/docs/i18n/RELEASE.es.md +53 -37
- package/docs/i18n/RELEASE.fr.md +53 -37
- package/docs/i18n/RELEASE.ja.md +52 -36
- package/docs/i18n/RELEASE.ko.md +52 -36
- package/docs/i18n/RELEASE.ru.md +53 -37
- package/docs/i18n/RELEASE.zh.md +53 -37
- package/docs/i18n/USAGE.es.md +91 -64
- package/docs/i18n/USAGE.fr.md +91 -64
- package/docs/i18n/USAGE.ja.md +90 -63
- package/docs/i18n/USAGE.ko.md +90 -63
- package/docs/i18n/USAGE.ru.md +91 -64
- package/docs/i18n/USAGE.zh.md +90 -63
- package/package.json +1 -1
- package/scripts/bootstrap_prereqs.sh +15 -3
- package/scripts/cli.mjs +1 -1
- package/scripts/doctor.mjs +173 -8
- package/scripts/install.mjs +2 -0
package/README.md
CHANGED
@@ -34,7 +34,7 @@ VerbalCoding turns a Discord voice channel into a hands-free control surface for
 | What you get | Why it feels good |
 |---|---|
 | Voice-first agent control | Talk to Hermes Agent, Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw, or any custom CLI harness. |
-|
+| On-device speech loop | Discord voice capture → local `whisper-cli` transcription → agent → chunked TTS playback. |
 | Shared voice + text context | Voice turns and `!ask` text commands can reuse the same supported agent session. |
 | Barge-in and sensitivity modes | Interrupt playback naturally and switch between normal and conservative/noisy environments. |
 | Multilingual voice presets | Switch STT, progress language, and TTS voice together with `vc language ko/en/auto`. |
@@ -69,23 +69,10 @@ vc doctor
 ./run.sh
 ```
 
-`vc setup --yes`
+`vc setup --yes` bootstraps local prerequisites from the npm package. `./scripts/install.sh --yes` does the same for GitHub clone installs. Both cover Node/npm dependencies, `ffmpeg`, `whisper-cli`, the default whisper.cpp model, a local `.venv-tts` Edge TTS helper, and setup wizard configuration where possible. They support macOS/Homebrew plus common Linux package managers (`apt`, `dnf`, `pacman`); rerun with `--no-wizard` for dependency-only setup or `--skip-system` if you want to install OS packages yourself.
 
 Need a clean install walkthrough? Start with [Fresh Install](docs/FRESH_INSTALL.md).
 
-## How It Works
-
-```mermaid
-flowchart LR
-A[Discord voice] --> B["@discordjs/voice"]
-B --> C[PCM cleanup + gates]
-C --> D["whisper.cpp STT"]
-D --> E["CLI agent adapter"]
-E --> F["Concise answer"]
-F --> G["Chunked TTS"]
-G --> H["Discord playback"]
-```
-
 ## Supported Agent Backends
 
 | Backend | Default command | Session support |
@@ -107,12 +94,6 @@ flowchart LR
 | [Configuration](docs/CONFIGURATION.md) | `.env`, agent backends, MCP, TTS backends, operational notes |
 | [Multi-Instance](docs/MULTI_INSTANCE.md) | One permanent Discord voice room per project |
 | [Release Notes](docs/RELEASE.md) | Current capabilities and pre-release checklist |
-| [한국어 문서](docs/i18n/README.ko.md) | npm 설치, 사용법, 설정, 멀티 인스턴스 한국어 가이드 |
-| [日本語 docs](docs/i18n/README.ja.md) | npm install, usage, configuration, multi-instance guide in Japanese |
-| [中文文档](docs/i18n/README.zh.md) | npm 安装、使用、配置和多实例中文指南 |
-| [Español docs](docs/i18n/README.es.md) | Instalación npm, uso, configuración y multiinstancia en español |
-| [Français docs](docs/i18n/README.fr.md) | Installation npm, utilisation, configuration et multi-instance en français |
-| [Русская документация](docs/i18n/README.ru.md) | npm установка, использование, конфигурация и мульти-инстансы на русском |
 
 ## Tiny Command Map
 
@@ -128,11 +109,15 @@ vc start # start the default bridge
 
 In Discord:
 
-
-
-
-
-
+| Command | What it does |
+|---|---|
+| `!join` | Join your current voice channel. |
+| `!ask <prompt>` | Send text to the same agent backend. |
+| `!verbose on\|off` | Show/speak short progress updates. |
+| `!latency` | Summarize recent voice/STT/agent/TTS latency. |
+| `!sensitivity normal` | Use normal indoor barge-in sensitivity. |
+| `!sensitivity conservative` | Use stricter noisy/outdoor sensitivity. |
+| `!session new <name> <workdir> [context] --voice <voice-channel>` | Bind a project session to a voice room. |
 
 ## Requirements
 
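The new Discord command table implies a small `!command args` text protocol. One way such messages could be split and validated is sketched below; `parseBangCommand`, `routeCommand`, and the "requires arguments" flags are hypothetical, not VerbalCoding's actual message handler.

```javascript
// Hypothetical helpers, not VerbalCoding's real handler.
// Commands from the README table, mapped to whether they require arguments.
const COMMANDS = new Map([
  ["join", false],
  ["ask", true],
  ["verbose", true],
  ["latency", false],
  ["sensitivity", true],
  ["session", true],
]);

// Split "!ask fix the tests" into { command: "ask", args: "fix the tests" }.
function parseBangCommand(message, prefix = "!") {
  const trimmed = message.trim();
  if (!trimmed.startsWith(prefix)) return null; // ordinary chat, not a command
  const match = trimmed.slice(prefix.length).match(/^(\S+)\s*([\s\S]*)$/);
  if (!match) return null; // a bare prefix with nothing after it
  return { command: match[1].toLowerCase(), args: match[2] };
}

function routeCommand(message) {
  const parsed = parseBangCommand(message);
  // A Map (not a plain object) keeps "!constructor" and similar keys from matching.
  if (!parsed || !COMMANDS.has(parsed.command)) return { ok: false, reason: "unknown" };
  if (COMMANDS.get(parsed.command) && parsed.args === "") return { ok: false, reason: "missing-args" };
  return { ok: true, command: parsed.command, args: parsed.args };
}

console.log(routeCommand("!ask run the unit tests"));
console.log(routeCommand("!sensitivity conservative"));
```

Keeping the argument string intact (rather than splitting it further) matters for `!ask`, whose prompt is free text.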
@@ -140,7 +125,7 @@ In Discord:
 |---|---|
 | Runtime | Node.js 20+, npm; install script can install via Homebrew/apt/dnf/pacman |
 | Audio | `ffmpeg`; install script can install it |
-|
+| Speech recognition | Local `whisper-cli` from whisper.cpp; install script uses Homebrew on macOS or local Linux build fallback |
 | TTS | Edge TTS CLI; install script creates `.venv-tts` if needed |
 | Discord | Bot token, Message Content intent, voice permissions |
 | Agent | At least one authenticated CLI harness, Hermes Agent by default |
package/app-node/cli_install.test.mjs
CHANGED
@@ -63,6 +63,8 @@ test('bootstrap script installs cross-platform prerequisites and local model hel
 
 assert.match(script, /brew install/);
 assert.match(script, /apt-get install/);
+assert.match(script, /has_cmd node \|\| packages\+\=\(nodejs\)/);
+assert.match(script, /has_cmd npm \|\| packages\+\=\(npm\)/);
 assert.match(script, /dnf install/);
 assert.match(script, /pacman -Sy/);
 assert.match(script, /git clone --depth 1 https:\/\/github\.com\/ggml-org\/whisper\.cpp\.git/);
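The two new assertions check that `bootstrap_prereqs.sh` only queues `nodejs`/`npm` when the matching command is missing (`has_cmd node || packages+=(nodejs)`). The same "install only what is missing" pattern is sketched in JavaScript below; the detection order, install flags, and per-manager package names are assumptions, not the script's actual values.

```javascript
// Sketch of the bootstrap's "queue only missing packages" pattern.
// Probe order, install flags and package names are assumptions.
const MANAGERS = [
  { probe: "brew", install: ["brew", "install"], names: { node: "node", npm: "node", ffmpeg: "ffmpeg" } },
  { probe: "apt-get", install: ["sudo", "apt-get", "install", "-y"], names: { node: "nodejs", npm: "npm", ffmpeg: "ffmpeg" } },
  { probe: "dnf", install: ["sudo", "dnf", "install", "-y"], names: { node: "nodejs", npm: "npm", ffmpeg: "ffmpeg" } },
  { probe: "pacman", install: ["sudo", "pacman", "-Sy", "--noconfirm"], names: { node: "nodejs", npm: "npm", ffmpeg: "ffmpeg" } },
];
const REQUIRED = ["node", "npm", "ffmpeg"];

// hasCmd(name) -> boolean, e.g. backed by a PATH lookup.
function planInstall(hasCmd) {
  const manager = MANAGERS.find((m) => hasCmd(m.probe));
  if (!manager) return null; // unsupported system: install OS packages by hand
  const missing = REQUIRED.filter((cmd) => !hasCmd(cmd));
  // A Set removes duplicates such as Homebrew's single "node" formula for node + npm.
  const packages = [...new Set(missing.map((cmd) => manager.names[cmd]))];
  return packages.length === 0 ? [] : [...manager.install, ...packages];
}

console.log(planInstall((cmd) => ["apt-get", "node"].includes(cmd)));
```

Returning an argv array (or `null`/`[]`) keeps the decision separate from actually running the installer.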
@@ -70,6 +72,36 @@ test('bootstrap script installs cross-platform prerequisites and local model hel
 assert.match(script, /\.venv-tts/);
 });
 
+test('doctor auto-bootstraps fixable prerequisites by default', () => {
+const doctor = fs.readFileSync(path.join(ROOT, 'scripts', 'doctor.mjs'), 'utf8');
+const cli = fs.readFileSync(path.join(ROOT, 'scripts', 'cli.mjs'), 'utf8');
+
+assert.match(doctor, /fixablePrerequisites/);
+assert.match(doctor, /bootstrap_prereqs\.sh'\), '--yes'/);
+assert.match(doctor, /VERBALCODING_DOCTOR_AUTO_FIX/);
+assert.match(doctor, /--no-fix/);
+assert.match(doctor, /WHISPER_CPP_BIN/);
+assert.match(doctor, /EDGE_TTS_COMMAND/);
+assert.match(doctor, /installHermesCliIfNeeded/);
+assert.match(doctor, /NousResearch\/hermes-agent\/main\/scripts\/install\.sh/);
+assert.match(doctor, /VERBALCODING_DOCTOR_INSTALL_HERMES/);
+assert.match(doctor, /Discord bot setup:/);
+assert.match(doctor, /discord\.com\/developers\/applications/);
+assert.match(cli, /doctor\.mjs'\), \.\.\.argv\.slice\(1\)/);
+});
+
+test('setup summary guides Discord app creation and records client id', () => {
+const installer = fs.readFileSync(path.join(ROOT, 'scripts', 'install.mjs'), 'utf8');
+const config = fs.readFileSync(path.join(ROOT, 'app-node', 'install_config.mjs'), 'utf8');
+
+assert.match(installer, /Discord application\/client ID for invite URL/);
+assert.match(config, /DISCORD_CLIENT_ID/);
+assert.match(config, /Discord app setup:/);
+assert.match(config, /https:\/\/discord\.com\/developers\/applications/);
+assert.match(config, /vc bot invite <client-id>/);
+assert.match(config, /buildDiscordBotInviteUrl\(\{ clientId: values\.DISCORD_CLIENT_ID \}\)/);
+});
+
 test('Ubuntu Docker smoke script validates clean install without secrets', () => {
 const script = fs.readFileSync(path.join(ROOT, 'scripts', 'docker_ubuntu_smoke.sh'), 'utf8');
 
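The new doctor test only checks for strings (`fixablePrerequisites`, `--no-fix`, `VERBALCODING_DOCTOR_AUTO_FIX`, `bootstrap_prereqs.sh` with `--yes`), so the decision logic itself is not visible in this diff. A minimal sketch of how such an opt-out could work, assuming `--no-fix` and a falsy-looking env value both disable the rerun; every name below is hypothetical.

```javascript
// Hypothetical sketch; item names and the accepted "off" values are assumptions.
const FIXABLE = new Set(["ffmpeg", "whisper-cli", "whisper-model", "edge-tts"]);

function shouldAutoFix(argv, env, missing) {
  if (argv.includes("--no-fix")) return false; // explicit CLI opt-out
  const flag = String(env.VERBALCODING_DOCTOR_AUTO_FIX ?? "1").trim().toLowerCase();
  if (["0", "false", "no", "off"].includes(flag)) return false; // assumed env opt-out values
  // Only items the bootstrap script can install trigger a rerun; a missing
  // DISCORD_BOT_TOKEN, for example, still has to be fixed by hand.
  return missing.some((item) => FIXABLE.has(item));
}

// Mirrors the `bootstrap_prereqs.sh'), '--yes'` fragment the test looks for.
function bootstrapCommand(packageRoot) {
  return ["bash", `${packageRoot}/scripts/bootstrap_prereqs.sh`, "--yes"];
}

console.log(shouldAutoFix(["doctor"], {}, ["ffmpeg", "DISCORD_BOT_TOKEN"]));
```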
package/app-node/install_config.mjs
CHANGED
@@ -26,6 +26,7 @@ export function normalizeInstallAnswers(input = {}) {
 const out = {
 AGENT_BACKEND: normalizedHarness,
 DISCORD_BOT_TOKEN: clean(input.discordBotToken || input.DISCORD_BOT_TOKEN),
+DISCORD_CLIENT_ID: clean(input.discordClientId || input.DISCORD_CLIENT_ID || input.applicationId || input.APPLICATION_ID),
 DISCORD_ALLOWED_USERS: clean(input.allowedUsers || input.DISCORD_ALLOWED_USERS),
 AUTO_JOIN_VOICE_CHANNELS: clean(input.autoJoinVoiceChannels || input.AUTO_JOIN_VOICE_CHANNELS, '일반,General,general'),
 TRANSCRIPT_CHANNEL_ID: clean(input.transcriptChannelId || input.TRANSCRIPT_CHANNEL_ID),
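The new `DISCORD_CLIENT_ID` line accepts four spellings of the same answer and takes the first truthy one. A sketch of that precedence follows; `clean` is not shown in this diff, so the trimming stand-in here is an assumption.

```javascript
// Stand-in for `clean`, whose body is not in this diff (assumed: trim, then default).
function clean(value, fallback = "") {
  const text = value == null ? "" : String(value).trim();
  return text === "" ? fallback : text;
}

// Same precedence as the added line: camelCase wizard answer, then the .env key,
// then the two "application id" aliases.
function resolveClientId(input = {}) {
  return clean(input.discordClientId || input.DISCORD_CLIENT_ID || input.applicationId || input.APPLICATION_ID);
}

console.log(resolveClientId({ APPLICATION_ID: " 123456789012345678 " }));
```

Because `||` only skips falsy values, a whitespace-only answer still wins the chain and is then cleaned to an empty string instead of falling through to a later alias (under the assumed `clean`).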
@@ -101,6 +102,7 @@ export function slugifyInstanceName(name) {
 export function buildEnvFile(values = {}) {
 const ordered = [
 'DISCORD_BOT_TOKEN',
+'DISCORD_CLIENT_ID',
 'DISCORD_ALLOWED_USERS',
 'AUTO_JOIN_VOICE_CHANNELS',
 'TRANSCRIPT_CHANNEL_ID',
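`buildEnvFile` writes keys in a fixed order, and the hunk header below references a matching `parseKeyValueEnv`. Neither body is shown, so the writer/reader pair below (`renderEnv`, `parseEnv`) is a hypothetical sketch: only the first keys' order comes from the diff, and the JSON-style quoting is an assumption.

```javascript
// Hypothetical writer/reader pair; only the leading key order comes from the diff.
const ORDERED = ["DISCORD_BOT_TOKEN", "DISCORD_CLIENT_ID", "DISCORD_ALLOWED_USERS", "AUTO_JOIN_VOICE_CHANNELS", "TRANSCRIPT_CHANNEL_ID"];

function renderEnv(values) {
  const rest = Object.keys(values).filter((k) => !ORDERED.includes(k)).sort();
  return [...ORDERED, ...rest]
    .filter((key) => values[key] !== undefined && values[key] !== "") // skip unanswered keys
    .map((key) => `${key}=${JSON.stringify(String(values[key]))}`) // KEY="value"
    .join("\n") + "\n";
}

function parseEnv(text) {
  const out = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // blank lines and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    let value = line.slice(eq + 1).trim();
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = JSON.parse(value); // undo the JSON-style quoting used by renderEnv
    }
    out[line.slice(0, eq).trim()] = value;
  }
  return out;
}

console.log(renderEnv({ AGENT_BACKEND: "hermes", DISCORD_CLIENT_ID: "123", DISCORD_BOT_TOKEN: "***" }));
```

The reader does not handle inline `# comments` after a quoted value. To match the `0600` permissions described in the configuration guide, the text could be written with Node's `fs.writeFileSync(path, text, { mode: 0o600 })`; note that `mode` only applies when the file is created.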
@@ -243,9 +245,17 @@ export function parseKeyValueEnv(text) {
 
 export function renderInstallSummary(values = {}) {
 const backend = values.AGENT_BACKEND || 'hermes';
+const inviteUrl = values.DISCORD_CLIENT_ID ? buildDiscordBotInviteUrl({ clientId: values.DISCORD_CLIENT_ID }) : '';
 return [
 `Configured Discord voice bridge for harness: ${backend}`,
 '',
+'Discord app setup:',
+' 1. Create an app: https://discord.com/developers/applications',
+' 2. Bot tab: Add Bot, enable Message Content Intent, copy/reset the token.',
+' 3. Put the token in .env as DISCORD_BOT_TOKEN="...".',
+inviteUrl ? ` 4. Invite URL: ${inviteUrl}` : ' 4. Invite URL: vc bot invite <client-id>',
+' 5. Make sure the bot can read/send text and connect/speak in voice.',
+'',
 'Next commands:',
 ' vc doctor',
 ' vc start',
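The summary now embeds the output of `buildDiscordBotInviteUrl`, whose body is not part of this diff. A hypothetical stand-in is sketched below using Discord's OAuth2 authorize URL with `bot` and `applications.commands` scopes; the permission bit positions follow Discord's documented permission flags, but which permissions VerbalCoding actually requests is an assumption.

```javascript
// Hypothetical stand-in for buildDiscordBotInviteUrl; the requested permission set is assumed.
const PERMISSIONS = {
  VIEW_CHANNEL: 1n << 10n,
  SEND_MESSAGES: 1n << 11n,
  READ_MESSAGE_HISTORY: 1n << 16n,
  CONNECT: 1n << 20n,
  SPEAK: 1n << 21n,
};

function inviteUrlSketch(clientId) {
  if (!/^\d+$/.test(String(clientId))) throw new Error("client id must be a numeric Discord snowflake");
  // BigInt keeps the bitfield exact even for permission bits above 2^53.
  const permissions = Object.values(PERMISSIONS).reduce((acc, bit) => acc | bit, 0n);
  const url = new URL("https://discord.com/oauth2/authorize");
  url.searchParams.set("client_id", String(clientId));
  url.searchParams.set("scope", "bot applications.commands");
  url.searchParams.set("permissions", permissions.toString());
  return url.toString();
}

console.log(inviteUrlSketch("123456789012345678"));
```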
package/docs/FRESH_INSTALL.md
CHANGED
@@ -32,7 +32,13 @@ cd VerbalCoding
 
 ## 2. Bootstrap dependencies and run the setup wizard
 
-
+For an npm install, do not run `./scripts/install.sh` directly; there is no repository checkout in your current directory. Use the packaged CLI wrapper instead:
+
+```bash
+vc setup --yes
+```
+
+`vc setup` runs the `scripts/install.sh` bundled inside the installed npm package. Only use `./scripts/install.sh --yes` when you are inside a GitHub clone:
 
 ```bash
 ./scripts/install.sh --yes
@@ -104,7 +110,7 @@ The invite includes bot and slash-command scopes plus text/voice permissions use
 vc doctor
 ```
 
-`vc doctor` is redacted: it reports missing tokens/commands/models without printing secret values. Fix
+`vc doctor` is redacted: it reports missing tokens/commands/models without printing secret values. When fixable local prerequisites are missing (`ffmpeg`, `whisper-cli`, the default model, or Edge TTS helper), it automatically reruns the packaged bootstrap first. Fix any remaining `✗` items, then rerun it.
 
 Expected success includes:
 
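`vc doctor` reports missing commands such as `ffmpeg`, `whisper-cli`, and the Edge TTS helper. The kind of PATH lookup behind such a check is sketched below; `findOnPath` and `missingCommands` are hypothetical helpers, and the executable test is injected so the sketch stays free of file-system access.

```javascript
// Hypothetical helpers; POSIX-style paths only.
// In Node, isExecutable could wrap fs.accessSync(file, fs.constants.X_OK) in a try/catch.
function findOnPath(command, pathValue, isExecutable, separator = ":") {
  // An explicit path (e.g. a configured WHISPER_CPP_BIN) is checked as-is.
  if (command.includes("/")) return isExecutable(command) ? command : null;
  for (const dir of pathValue.split(separator)) {
    if (dir === "") continue; // POSIX treats an empty entry as ".", skipped here for simplicity
    const candidate = dir.endsWith("/") ? dir + command : `${dir}/${command}`;
    if (isExecutable(candidate)) return candidate;
  }
  return null;
}

function missingCommands(commands, pathValue, isExecutable) {
  return commands.filter((cmd) => findOnPath(cmd, pathValue, isExecutable) === null);
}

const fakeInstalled = new Set(["/usr/bin/ffmpeg"]);
console.log(missingCommands(["ffmpeg", "whisper-cli"], "/usr/bin:/usr/local/bin", (p) => fakeInstalled.has(p)));
```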
package/docs/assets/figures/verbalcoding-flow.svg
CHANGED
@@ -1,6 +1,6 @@
 <svg width="1200" height="520" viewBox="0 0 1200 520" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="title desc">
-<title id="title">VerbalCoding voice
-<desc id="desc">A
+<title id="title">VerbalCoding natural voice loop</title>
+<desc id="desc">A compact phone-call-like loop: user speaks in Discord, Local STT with whisper-cli transcribes, the CLI agent works, TTS speaks back, and the user can interrupt anytime.</desc>
 <defs>
 <linearGradient id="bg" x1="0" y1="0" x2="1200" y2="520" gradientUnits="userSpaceOnUse">
 <stop stop-color="#0F172A"/>
@@ -19,45 +19,60 @@
 <circle cx="1030" cy="90" r="190" fill="#6366F1" opacity="0.16"/>
 <circle cx="170" cy="430" r="210" fill="#06B6D4" opacity="0.13"/>
 <rect x="70" y="54" width="1060" height="412" rx="32" fill="url(#card)" stroke="#334155" filter="url(#shadow)"/>
+
 <text x="110" y="118" fill="#F8FAFC" font-family="Inter, ui-sans-serif, system-ui" font-size="42" font-weight="800">VerbalCoding</text>
-<text x="110" y="154" fill="#94A3B8" font-family="Inter, ui-sans-serif, system-ui" font-size="20">Discord voice
+<text x="110" y="154" fill="#94A3B8" font-family="Inter, ui-sans-serif, system-ui" font-size="20">A natural Discord voice loop for coding agents — speak, listen, interrupt, continue</text>
 
 <g font-family="Inter, ui-sans-serif, system-ui" font-size="17" font-weight="700">
-<rect x="
-<text x="185" y="
-<text x="185" y="
+<rect x="105" y="220" width="160" height="92" rx="20" fill="#5865F2"/>
+<text x="185" y="254" fill="white" text-anchor="middle">Discord</text>
+<text x="185" y="280" fill="#E0E7FF" text-anchor="middle" font-size="14">phone-call voice</text>
 
-<rect x="305" y="220" width="
-<text x="
-<text x="
+<rect x="305" y="220" width="165" height="92" rx="20" fill="#0891B2"/>
+<text x="387.5" y="254" fill="white" text-anchor="middle">Local STT</text>
+<text x="387.5" y="280" fill="#CFFAFE" text-anchor="middle" font-size="14">whisper-cli</text>
 
-<rect x="
-<text x="
-<text x="
+<rect x="510" y="220" width="165" height="92" rx="20" fill="#7C3AED"/>
+<text x="592.5" y="254" fill="white" text-anchor="middle">Adapter</text>
+<text x="592.5" y="280" fill="#EDE9FE" text-anchor="middle" font-size="14">Hermes / Claude / Codex</text>
 
-<rect x="
-<text x="
-<text x="
+<rect x="715" y="220" width="165" height="92" rx="20" fill="#111827" stroke="#475569"/>
+<text x="797.5" y="254" fill="white" text-anchor="middle">CLI Agent</text>
+<text x="797.5" y="280" fill="#CBD5E1" text-anchor="middle" font-size="14">does the work</text>
 
-<rect x="
-<text x="
-<text x="
+<rect x="920" y="220" width="165" height="92" rx="20" fill="#0EA5E9"/>
+<text x="1002.5" y="254" fill="white" text-anchor="middle">TTS</text>
+<text x="1002.5" y="280" fill="#E0F2FE" text-anchor="middle" font-size="14">spoken reply</text>
 </g>
 
 <g stroke="#94A3B8" stroke-width="4" stroke-linecap="round">
-<path d="
-<path d="
-<path d="
-<path d="
+<path d="M275 266H295"/>
+<path d="M480 266H500"/>
+<path d="M685 266H705"/>
+<path d="M890 266H910"/>
+</g>
+<g fill="#94A3B8" opacity="0.95">
+<circle cx="285" cy="266" r="4"/>
+<circle cx="490" cy="266" r="4"/>
+<circle cx="695" cy="266" r="4"/>
+<circle cx="900" cy="266" r="4"/>
 </g>
-
-
-
-<
-<
+
+<path d="M1002 330C1002 405 185 405 185 330" stroke="#67E8F9" stroke-width="4" stroke-linecap="round" stroke-dasharray="13 13"/>
+<g fill="#67E8F9">
+<circle cx="1002" cy="330" r="5"/>
+<circle cx="185" cy="330" r="5"/>
+</g>
+<text x="594" y="438" fill="#A5F3FC" text-anchor="middle" font-family="Inter, ui-sans-serif, system-ui" font-size="17" font-weight="700">Conversation loop: hear the answer, speak again, or interrupt anytime</text>
+
+<path d="M185 210C185 178 1002 178 1002 210" stroke="#FBBF24" stroke-width="3" stroke-linecap="round" stroke-dasharray="8 10" opacity="0.9"/>
+<g fill="#FBBF24" opacity="0.95">
+<circle cx="185" cy="210" r="4"/>
+<circle cx="1002" cy="210" r="4"/>
 </g>
+<text x="594" y="194" fill="#FDE68A" text-anchor="middle" font-family="Inter, ui-sans-serif, system-ui" font-size="15" font-weight="700">Barge-in stays open while the agent is thinking or speaking</text>
 
-<rect x="150" y="
-<text x="182" y="
-<text x="1045" y="
+<rect x="150" y="348" width="900" height="54" rx="17" fill="#020617" stroke="#1F2937"/>
+<text x="182" y="382" fill="#A7F3D0" font-family="SFMono-Regular, ui-monospace, monospace" font-size="18">$ vc language ko && vc instance start my-project</text>
+<text x="1045" y="382" fill="#64748B" text-anchor="end" font-family="Inter, ui-sans-serif, system-ui" font-size="15">hands-free coding call</text>
 </svg>
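The redrawn figure promises that barge-in "stays open while the agent is thinking or speaking", and the README describes chunked TTS playback. A minimal sketch of both ideas together, assuming a sentence-based splitter and an async `play(chunk, signal)` callback; neither is VerbalCoding's actual player. The 495-character default echoes `TTS_MAX_CHARS` from the configuration example.

```javascript
// Hypothetical sketch: sentence-aware chunking plus abortable sequential playback.
const SENTENCE = /[^.!?。]*[.!?。]+\s*|[^.!?。]+$/g; // sentences, then any unpunctuated tail

function chunkText(text, maxChars = 495) {
  const chunks = [];
  let current = "";
  const flush = (piece) => {
    const trimmed = piece.trim();
    if (trimmed) chunks.push(trimmed);
  };
  for (const sentence of text.match(SENTENCE) ?? []) {
    if (current && (current + sentence).length > maxChars) {
      flush(current);
      current = "";
    }
    let piece = sentence;
    while (piece.length > maxChars) { // hard-split a single over-long sentence
      flush(piece.slice(0, maxChars));
      piece = piece.slice(maxChars);
    }
    current += piece;
  }
  flush(current);
  return chunks;
}

// Plays chunks in order and returns the ones that were started.
// Aborting (the user starts talking) stops before the next chunk.
async function speakChunks(chunks, signal, play) {
  const started = [];
  for (const chunk of chunks) {
    if (signal.aborted) break;
    started.push(chunk);
    await play(chunk, signal); // a real player would also stop mid-chunk on abort
  }
  return started;
}

console.log(chunkText("One. Two. Three.", 10));
```

Checking the signal between chunks keeps interruption latency bounded by one chunk even if the player ignores the signal; passing it into `play` allows faster cut-off.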
package/docs/i18n/CONFIGURATION.es.md
CHANGED
@@ -1,36 +1,40 @@
-# VerbalCoding
+# VerbalCoding Configuration
 
-##
+## Setup wizard
 
-
+Discord app/bot setup is intentionally not re-explained from scratch here. Use these original guides for the Discord-side steps, then return to the VerbalCoding configuration:
 
--
--
--
+- Hermes Agent Discord messaging guide: <https://hermes-agent.nousresearch.com/docs/user-guide/messaging/discord>
+- Official Discord bots overview: <https://docs.discord.com/developers/bots/overview>
+- Official Discord quick start: <https://docs.discord.com/developers/quick-start/getting-started>
 
 ```bash
-vc setup --yes
-# or from a clone
 ./scripts/install.sh
 ```
 
-
+The installer prompts for the Discord token, allowed users, voice channel names to auto-join, the transcript channel/thread, the CLI harness backend, the default voice language, TTS settings, and wake-word behavior. It writes `.env` with mode `0600`; `.env` is ignored by git. It also links the short `vc` shell command.
 
-
+If you only need the shell command after a manual install:
 
-
+```bash
+npm link
+```
+
+## Supported agent backends
 
-
+Set `AGENT_BACKEND` in `.env`.
+
+| Backend | Default command | Notes |
 |---|---|---|
-| `hermes` | `hermes chat -Q -q` |
-| `claude-code` / `claude` | `claude -p` |
-| `codex` | `codex exec` |
-| `gemini` | `gemini -p` |
-| `opencode` | `opencode run` |
-| `openclaw` | `openclaw run` |
-| `custom` | `AGENT_COMMAND`
+| `hermes` | `hermes chat -Q -q` | Default. Keeps the `.verbalcoding-session` resume behavior. |
+| `claude-code` / `claude` | `claude -p` | Override with `CLAUDE_COMMAND` or `AGENT_COMMAND`. |
+| `codex` | `codex exec` | Override with `CODEX_COMMAND` or `AGENT_COMMAND`. |
+| `gemini` | `gemini -p` | Override with `GEMINI_COMMAND` or `AGENT_COMMAND`. |
+| `opencode` | `opencode run` | Override with `OPENCODE_COMMAND` or `AGENT_COMMAND`. |
+| `openclaw` | `openclaw run` | Override with `OPENCLAW_COMMAND` or `AGENT_COMMAND`. |
+| `custom` | `AGENT_COMMAND` required | The prompt is appended as the final argv argument. |
 
-
+Generic overrides:
 
 ```bash
 AGENT_BACKEND=custom
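The backend table maps each `AGENT_BACKEND` to a default command, a per-backend override variable, and (for `custom`) a required `AGENT_COMMAND` with the prompt appended as the final argv argument. A sketch of that resolution follows; the precedence between the backend-specific variable and `AGENT_COMMAND`, and the plain whitespace splitting (no shell quoting), are assumptions.

```javascript
// Hypothetical sketch; override precedence and whitespace splitting are assumptions.
const BACKENDS = {
  hermes: { command: "hermes chat -Q -q" },
  "claude-code": { command: "claude -p", override: "CLAUDE_COMMAND" },
  claude: { command: "claude -p", override: "CLAUDE_COMMAND" },
  codex: { command: "codex exec", override: "CODEX_COMMAND" },
  gemini: { command: "gemini -p", override: "GEMINI_COMMAND" },
  opencode: { command: "opencode run", override: "OPENCODE_COMMAND" },
  openclaw: { command: "openclaw run", override: "OPENCLAW_COMMAND" },
  custom: { command: "" }, // AGENT_COMMAND is required
};

function buildAgentArgv(env, prompt) {
  const backend = (env.AGENT_BACKEND || "hermes").trim();
  if (!Object.prototype.hasOwnProperty.call(BACKENDS, backend)) {
    throw new Error(`unknown AGENT_BACKEND: ${backend}`);
  }
  const spec = BACKENDS[backend];
  const command = (spec.override && env[spec.override]) || env.AGENT_COMMAND || spec.command;
  if (!command.trim()) throw new Error("AGENT_BACKEND=custom requires AGENT_COMMAND");
  // The prompt stays one argv entry; if spawned without a shell, its spaces
  // and quotes are never re-parsed.
  return [...command.trim().split(/\s+/), prompt];
}

console.log(buildAgentArgv({ AGENT_BACKEND: "codex" }, "summarize the last commit"));
```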
@@ -43,23 +47,37 @@ UTTERANCE_IDLE_MS=4500
 LATENCY_LOG_PATH=./.logs/latency.jsonl
 ```
 
-##
+## Agent adapter contract
+
+The voice bridge talks to every backend through a single adapter contract:
+
+- `run({ text }, signal, plan)` returns the status, final reply text, backend label, elapsed time, and optional session metadata.
+- `ask(text, signal, plan)` is the compatibility shortcut that returns only the final reply text.
+- `capabilities` declares whether the backend supports session resume, streaming progress, and cancellation.
+- Hermes is the reference adapter: resume, detailed progress streaming, cancellation, and final-reply recovery from Hermes session files.
+
+New backends should implement the same contract and keep voice/STT/TTS behavior out of the adapter.
+
+## Example `.env`
 
 ```bash
 DISCORD_BOT_TOKEN="***"
 DISCORD_ALLOWED_USERS="123456789012345678"
 AUTO_JOIN_VOICE_CHANNELS="일반,General,general"
 TRANSCRIPT_CHANNEL_ID="123456789012345678"
+
 AGENT_BACKEND="hermes"
 STT_ENGINE="whisper_cpp"
 WHISPER_CPP_BIN="whisper-cli"
 WHISPER_CPP_MODEL="./models/ggml-small-q5_1.bin"
+
 TTS_BACKEND="edge"
 TTS_VOICE_TYPE="korean_female"
 TTS_VOICE="ko-KR-SunHiNeural"
 TTS_RATE="+10%"
 TTS_MAX_CHARS="495"
 TTS_VOLUME="1.0"
+
 REQUIRE_WAKE_WORD="0"
 MIN_UTTERANCE_SECONDS="1.0"
 UTTERANCE_IDLE_MS="4500"
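The adapter contract names `run`, `ask`, and `capabilities` and lists what a result contains, but not the property names. A toy echo adapter shaped after that contract is sketched below; the result fields (`status`, `text`, `backend`, `elapsedMs`, `session`) and capability keys are assumptions for illustration.

```javascript
// Hypothetical echo adapter; result and capability property names are assumptions.
function createEchoAdapter() {
  const adapter = {
    capabilities: { resume: false, streamingProgress: false, cancel: true },

    // `plan` is accepted to match the contract but unused by this toy backend.
    async run({ text }, signal, plan = {}) {
      const started = Date.now();
      if (signal?.aborted) {
        return { status: "cancelled", text: "", backend: "echo", elapsedMs: 0, session: null };
      }
      // A real adapter would launch its CLI here and forward `signal`
      // (Node's child_process.spawn accepts a `signal` option) so barge-in can stop it.
      const reply = `You said: ${text}`;
      return { status: "ok", text: reply, backend: "echo", elapsedMs: Date.now() - started, session: null };
    },

    // Compatibility shortcut: only the final reply text.
    async ask(text, signal, plan) {
      const result = await adapter.run({ text }, signal, plan);
      return result.text;
    },
  };
  return adapter;
}

createEchoAdapter().ask("hello", new AbortController().signal).then((reply) => console.log(reply));
```

Implementing `ask` on top of `run` keeps a single code path per backend, which matches the contract's intent of keeping voice/STT/TTS concerns outside the adapter.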
@@ -69,39 +87,60 @@ AGENT_VERBOSE_PROGRESS="0"
 LATENCY_LOG_PATH="./.logs/latency.jsonl"
 ```
 
-##
+## TTS voice selection
+
+Language presets and voice selection are separate:
 
-`vc language ko|en|auto`
+- `vc language ko|en|auto` changes the STT language, the progress language, and the default voice for that language.
+- Live voice commands such as “남자 한국어 목소리로 바꿔” (“switch to a male Korean voice”), “여자 한국어 목소리로 바꿔” (“switch to a female Korean voice”), `change voice to Korean female`, and `switch speaker to English` change only the speaker/voice type.
+- `!voice-test <text>` plays a quick sample with the currently selected backend and voice.
 
-
+By default the voice selection is saved to `config/tts-voices.json`; override the path with `TTS_VOICE_CONFIG`. The running bridge re-reads and reapplies the voice selection before synthesizing, so voice commands take effect without a full restart.
 
-
+Default Edge catalog:
+
+| `TTS_VOICE_TYPE` | `TTS_VOICE` | Language |
 |---|---|---|
-| `korean_male` | `ko-KR-InJoonNeural` |
-| `korean_female` | `ko-KR-SunHiNeural` |
-| `korean_multilingual_male` | `ko-KR-HyunsuMultilingualNeural` |
-| `english_male` | `en-US-GuyNeural` |
-| `english_female` | `en-US-AriaNeural` |
+| `korean_male` | `ko-KR-InJoonNeural` | Korean |
+| `korean_female` | `ko-KR-SunHiNeural` | Korean |
+| `korean_multilingual_male` | `ko-KR-HyunsuMultilingualNeural` | Korean |
+| `english_male` | `en-US-GuyNeural` | English |
+| `english_female` | `en-US-AriaNeural` | English |
+
+Persistent manual override:
+
+```bash
+TTS_BACKEND="edge"
+TTS_VOICE_TYPE="korean_male"
+TTS_VOICE="ko-KR-InJoonNeural"
+TTS_VOICE_CONFIG="config/tts-voices.json"
+```
 
-
+For OpenVoice, SpeechSwift, or Supertonic, keep the backend-specific voice/reference settings described in the sections below; the same voice catalog file can still track the active voice type.
 
-
+Backend-specific voice options:
+
+| Backend | Settings | Voice options |
 |---|---|---|
-| Edge | `TTS_VOICE_TYPE`, `TTS_VOICE` |
-| Supertonic | `SUPERTONIC_VOICE`, `SUPERTONIC_LANGUAGE` | `M1`–`M5`, `F1`–`F5`; `ko`, `en`, `es`, `pt`, `fr` |
-| OpenVoice | `OPENVOICE_REF_AUDIO`, `OPENVOICE_STYLE`, `OPENVOICE_LANGUAGE` |
-| SpeechSwift / CosyVoice | `SPEECHSWIFT_REF_AUDIO`, `SPEECHSWIFT_ENGINE`, `SPEECHSWIFT_SPEAKER`, `SPEECHSWIFT_MODEL_ID` |
+| Edge | `TTS_VOICE_TYPE`, `TTS_VOICE` | The built-in types above, plus any voice returned by `edge-tts --list-voices` |
+| Supertonic | `SUPERTONIC_VOICE`, `SUPERTONIC_LANGUAGE` | `M1`–`M5`, `F1`–`F5`; language `ko`, `en`, `es`, `pt`, `fr` |
+| OpenVoice | `OPENVOICE_REF_AUDIO`, `OPENVOICE_STYLE`, `OPENVOICE_LANGUAGE` | Permitted user-supplied reference WAV; the default style is `default` |
+| SpeechSwift / CosyVoice | `SPEECHSWIFT_REF_AUDIO`, `SPEECHSWIFT_ENGINE`, `SPEECHSWIFT_SPEAKER`, `SPEECHSWIFT_MODEL_ID` | Reference-sample voices for CosyVoice, or speaker/model IDs the backend supports |
 
-##
+## Utterance segmentation
 
-`UTTERANCE_IDLE_MS`
+`UTTERANCE_IDLE_MS` controls how long the bridge waits after a speech segment before deciding the user has finished and starting STT. The default is `4500` ms, which keeps longer spoken instructions with natural pauses together. Lower values feel faster for short commands but can split long dictation; higher values are safer for slower, more deliberate speech.
 
 ```bash
-UTTERANCE_IDLE_MS="4500"
-UTTERANCE_IDLE_MS="6000"
+UTTERANCE_IDLE_MS="4500" # balanced default
+UTTERANCE_IDLE_MS="6000" # safer for long dictation with pauses
 ```
 
-## MCP
+## MCP server
+
+VerbalCoding ships a stdio MCP server so that Hermes Agent or any other MCP client can control the bridge through tools instead of relying on skills or free-form shell commands.
+
+Hermes configuration example:
 
 ```yaml
 mcp_servers:
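The `UTTERANCE_IDLE_MS` trade-off can be seen by grouping speech segments offline: a segment that starts within the idle window after the previous one ends joins the same utterance, which is equivalent to a live timer that starts STT once no new speech arrives for that long. The segment shape (`{ startMs, endMs }`) and `groupUtterances` are assumptions for illustration, not the bridge's real data structures.

```javascript
// Hypothetical sketch of idle-gap segmentation; the segment shape is assumed.
function groupUtterances(segments, idleMs = 4500) {
  const sorted = [...segments].sort((a, b) => a.startMs - b.startMs);
  const utterances = [];
  for (const seg of sorted) {
    const last = utterances[utterances.length - 1];
    if (last && seg.startMs - last.endMs <= idleMs) {
      // Pause shorter than the idle window: same utterance.
      last.endMs = Math.max(last.endMs, seg.endMs);
      last.segments += 1;
    } else {
      utterances.push({ startMs: seg.startMs, endMs: seg.endMs, segments: 1 });
    }
  }
  return utterances;
}

const speech = [
  { startMs: 0, endMs: 1000 },
  { startMs: 3000, endMs: 4000 },   // 2 s pause
  { startMs: 10000, endMs: 11000 }, // 6 s pause
];
console.log(groupUtterances(speech, 4500).length, groupUtterances(speech, 6000).length);
```

With the `4500` default the 6-second pause splits the speech into two utterances; raising the window to `6000` keeps it as one, while a short window such as `1500` would split all three segments.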
@@ -112,39 +151,89 @@ mcp_servers:
 connect_timeout: 30
 ```
 
-
+Exposed MCP tools:
+
+| Tool | Purpose |
+|---|---|
+| `status` | Report bridge/configuration status without secrets |
+| `doctor` | Run the doctor check with secrets redacted |
+| `set_auto_restart` | Enable/disable automatic restart of the voice bot on commit |
+| `set_language` | Update STT/progress/TTS together |
+| `start`, `stop`, `restart` | Control the Discord voice bridge |
 
-##
+## Optional OpenVoice TTS
+
+Edge TTS remains the default and the fallback. To try local voice cloning with OpenVoice V2:
 
 ```bash
 ./scripts/setup_openvoice.sh
+# Download checkpoints_v2_0417.zip from OpenVoice docs and extract under vendor/OpenVoice/checkpoints_v2/
+mkdir -p voice-samples
+# Put a permitted reference sample at voice-samples/user-reference.wav,
+# or capture one from Discord with !voice-clone capture.
 python3 integrations/openvoice/synth.py --openvoice-dir vendor/OpenVoice --ref-audio voice-samples/user-reference.wav --text '안녕하세요. 버벌코딩 목소리 복제 테스트입니다.' --output /tmp/verbalcoding-openvoice-smoke.wav
 ```
 
+Then set:
+
 ```bash
 TTS_BACKEND="openvoice"
 OPENVOICE_REF_AUDIO="./voice-samples/user-reference.wav"
 OPENVOICE_PROGRESS="0"
 ```
 
-
+Only clone voices you own or have permission to use. If OpenVoice fails or times out, VerbalCoding falls back to Edge TTS.
 
-##
+## Optional Supertonic TTS
 
 ```bash
 ./scripts/setup_supertonic.sh
 supertonic tts '안녕하세요. 수퍼토닉 테스트입니다.' --lang ko --voice M1 --steps 2 --speed 1.0 -o /tmp/verbalcoding-supertonic.wav
 ```
 
-
+Then set:
+
+```bash
+TTS_BACKEND="supertonic"
+SUPERTONIC_COMMAND="./.venv-supertonic/bin/supertonic"
+SUPERTONIC_VOICE="M1"
+SUPERTONIC_LANGUAGE="ko"
+SUPERTONIC_STEPS="2"
+SUPERTONIC_SPEED="1.0"
+SUPERTONIC_PROGRESS="0"
+```
+
+If Supertonic is missing, fails, or times out, VerbalCoding falls back to Edge TTS.
+
+## Optional SpeechSwift / CosyVoice TTS
+
+On Apple Silicon, `speech-swift` is a local backend for Korean voice cloning with MLX-native CosyVoice/Qwen3-TTS.
 
 ```bash
 brew tap soniqo/speech https://github.com/soniqo/speech-swift
 brew install speech
 ```
 
-
+Recommended environment:
+
+```bash
+TTS_BACKEND="speechswift"
+SPEECHSWIFT_MODE="server"
+SPEECHSWIFT_ENGINE="cosyvoice"
+SPEECHSWIFT_LANGUAGE="korean"
+SPEECHSWIFT_REF_AUDIO="./voice-samples/user-reference.wav"
+SPEECHSWIFT_SERVER_HOST="127.0.0.1"
+SPEECHSWIFT_SERVER_PORT="18080"
+SPEECHSWIFT_SERVER_URL="http://127.0.0.1:18080"
+SPEECHSWIFT_PROGRESS="0"
+```
+
+Keep Edge for quick progress/backchannel prompts.
 
-##
+## Operational notes
 
-
+- The bot needs Discord's privileged Message Content intent enabled for text commands.
+- The bot needs connect/speak permissions in the voice channel.
+- For Hermes Agent, set up and authenticate Hermes as usual (`hermes setup`, `hermes login`, etc.) in your default profile.
+- For Claude Code, Codex, Gemini, OpenCode, and OpenClaw, install and authenticate those CLIs separately.
+- If a CLI emits diff/code output during a timeout or signal failure, the bridge avoids reading it aloud and sends detailed text instead.
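OpenVoice, Supertonic, and SpeechSwift are all described as optional backends that fall back to Edge TTS when they are missing, fail, or time out. That pattern is sketched below with a timeout race; the backend objects, their `synth` functions, and the 20-second default are placeholders, not VerbalCoding APIs.

```javascript
// Hypothetical sketch of "try the optional backend, fall back to Edge TTS".
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Always clear the timer so a fast result does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function synthesizeWithFallback(text, primary, fallback, timeoutMs = 20000) {
  try {
    const audio = await withTimeout(primary.synth(text), timeoutMs);
    return { backend: primary.name, audio };
  } catch (err) {
    // Missing binary, crash, or timeout: use the fallback backend instead.
    const audio = await fallback.synth(text);
    return { backend: fallback.name, audio, fallbackReason: String(err?.message ?? err) };
  }
}

const edge = { name: "edge", synth: async (t) => `edge-audio:${t}` };
const openvoice = { name: "openvoice", synth: async () => { throw new Error("checkpoints not found"); } };
synthesizeWithFallback("hello", openvoice, edge).then((r) => console.log(r.backend, r.fallbackReason));
```

The race only stops waiting; it does not kill a slow synthesizer. A real integration would also pass an `AbortSignal` to the child process so the timed-out backend stops consuming CPU.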