verbalcoding 0.2.11 → 0.2.12
- package/.env.example +27 -1
- package/README.es.md +132 -0
- package/README.fr.md +132 -0
- package/README.ja.md +132 -0
- package/README.ko.md +132 -0
- package/README.md +116 -74
- package/README.ru.md +132 -0
- package/README.zh.md +131 -0
- package/app-node/agent_adapters.mjs +37 -5
- package/app-node/agent_adapters.test.mjs +13 -1
- package/app-node/agent_detect.mjs +73 -0
- package/app-node/agent_detect.test.mjs +77 -0
- package/app-node/install_config.mjs +3 -0
- package/app-node/main.mjs +339 -4
- package/app-node/notify.mjs +73 -0
- package/app-node/notify.test.mjs +68 -0
- package/app-node/plan_mode.mjs +174 -0
- package/app-node/plan_mode.test.mjs +153 -0
- package/app-node/smart_progress.mjs +94 -0
- package/app-node/smart_progress.test.mjs +66 -0
- package/app-node/stream_sentencer.mjs +61 -0
- package/app-node/stream_sentencer.test.mjs +64 -0
- package/app-node/streaming_tts_queue.mjs +48 -0
- package/app-node/streaming_tts_queue.test.mjs +58 -0
- package/app-node/text_routing.mjs +20 -0
- package/app-node/text_routing.test.mjs +23 -1
- package/docs/CONFIGURATION.md +69 -96
- package/docs/FRESH_INSTALL.md +105 -63
- package/docs/HERMES_VOICE.md +65 -0
- package/docs/MULTI_INSTANCE.md +16 -0
- package/docs/README.md +49 -0
- package/docs/RELEASE.md +42 -19
- package/docs/ROADMAP.md +38 -0
- package/docs/TROUBLESHOOTING.md +126 -0
- package/docs/USAGE.md +72 -40
- package/docs/assets/figures/verbalcoding-flow.svg +1 -1
- package/docs/i18n/CONFIGURATION.es.md +25 -0
- package/docs/i18n/CONFIGURATION.fr.md +25 -0
- package/docs/i18n/CONFIGURATION.ja.md +25 -0
- package/docs/i18n/CONFIGURATION.ko.md +25 -0
- package/docs/i18n/CONFIGURATION.ru.md +25 -0
- package/docs/i18n/CONFIGURATION.zh.md +25 -0
- package/docs/i18n/FRESH_INSTALL.es.md +27 -2
- package/docs/i18n/FRESH_INSTALL.fr.md +27 -2
- package/docs/i18n/FRESH_INSTALL.ja.md +27 -2
- package/docs/i18n/FRESH_INSTALL.ko.md +27 -2
- package/docs/i18n/FRESH_INSTALL.ru.md +27 -2
- package/docs/i18n/FRESH_INSTALL.zh.md +27 -2
- package/docs/i18n/HERMES_VOICE.es.md +46 -0
- package/docs/i18n/HERMES_VOICE.fr.md +46 -0
- package/docs/i18n/HERMES_VOICE.ja.md +46 -0
- package/docs/i18n/HERMES_VOICE.ko.md +65 -0
- package/docs/i18n/HERMES_VOICE.ru.md +46 -0
- package/docs/i18n/HERMES_VOICE.zh.md +46 -0
- package/docs/i18n/MULTI_INSTANCE.es.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.fr.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.ja.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.ko.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.ru.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.zh.md +25 -0
- package/docs/i18n/README.es.md +20 -134
- package/docs/i18n/README.fr.md +20 -134
- package/docs/i18n/README.ja.md +20 -134
- package/docs/i18n/README.ko.md +20 -133
- package/docs/i18n/README.ru.md +20 -134
- package/docs/i18n/README.zh.md +20 -133
- package/docs/i18n/RELEASE.es.md +26 -1
- package/docs/i18n/RELEASE.fr.md +26 -1
- package/docs/i18n/RELEASE.ja.md +26 -1
- package/docs/i18n/RELEASE.ko.md +26 -1
- package/docs/i18n/RELEASE.ru.md +26 -1
- package/docs/i18n/RELEASE.zh.md +26 -1
- package/docs/i18n/TROUBLESHOOTING.es.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.fr.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.ja.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.ko.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.ru.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.zh.md +39 -0
- package/docs/i18n/USAGE.es.md +25 -0
- package/docs/i18n/USAGE.fr.md +25 -0
- package/docs/i18n/USAGE.ja.md +25 -0
- package/docs/i18n/USAGE.ko.md +25 -0
- package/docs/i18n/USAGE.ru.md +25 -0
- package/docs/i18n/USAGE.zh.md +25 -0
- package/docs/superpowers/plans/2026-05-13-phase1-streaming-pipeline.md +122 -0
- package/docs/superpowers/plans/2026-05-13-phase10-push-notifications.md +152 -0
- package/docs/superpowers/plans/2026-05-13-phase2-agent-adapters.md +242 -0
- package/docs/superpowers/plans/2026-05-13-phase6-smart-progress.md +172 -0
- package/docs/superpowers/plans/2026-05-13-phase7-voice-plan-mode.md +108 -0
- package/package.json +2 -1
- package/scripts/cli.mjs +4 -3
- package/scripts/doctor.mjs +11 -0
- package/scripts/install.mjs +15 -1
package/docs/FRESH_INSTALL.md
CHANGED
|
@@ -1,116 +1,142 @@
|
|
|
1
1
|
# Fresh install
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
<!-- readme-glow-up:intro -->
|
|
4
|
+
<p align="center">
|
|
5
|
+
<a href="../README.md">README</a> ·
|
|
6
|
+
<a href="README.md">Docs hub</a> ·
|
|
7
|
+
<a href="FRESH_INSTALL.md">Fresh Install</a> ·
|
|
8
|
+
<a href="USAGE.md">Usage</a> ·
|
|
9
|
+
<a href="CONFIGURATION.md">Configuration</a> ·
|
|
10
|
+
<a href="TROUBLESHOOTING.md">Troubleshooting</a> ·
|
|
11
|
+
<a href="MULTI_INSTANCE.md">Multi-Instance</a>
|
|
12
|
+
</p>
|
|
4
13
|
|
|
5
|
-
|
|
14
|
+
> Clean install path for humans first, automation second.
|
|
15
|
+
>
|
|
16
|
+
> Fast path: `npm install -g verbalcoding@latest → vc setup → vc doctor → vc start`
|
|
17
|
+
<!-- /readme-glow-up:intro -->
|
|
6
18
|
|
|
7
|
-
|
|
19
|
+
This guide is for a clean public install. It avoids local-only assumptions and uses the `vc` CLI to bootstrap as much as possible. Windows is not supported yet.
|
|
8
20
|
|
|
9
|
-
|
|
10
|
-
npm install -g verbalcoding
|
|
11
|
-
```
|
|
21
|
+
## 1. Install the CLI and run guided setup
|
|
12
22
|
|
|
13
|
-
|
|
23
|
+
Recommended npm path for humans:
|
|
14
24
|
|
|
15
25
|
```bash
|
|
16
|
-
|
|
26
|
+
npm install -g verbalcoding@latest
|
|
27
|
+
vc setup
|
|
17
28
|
```
|
|
18
29
|
|
|
19
|
-
|
|
30
|
+
`vc setup` bootstraps supported local prerequisites, then asks for the Discord bot token, application/client ID, auto-join voice channel names, transcript target, agent backend, and voice/TTS settings. Keep the Discord Developer Portal open while it runs.
|
|
31
|
+
|
|
32
|
+
Automation/CI path:
|
|
20
33
|
|
|
21
34
|
```bash
|
|
35
|
+
npm install -g verbalcoding@latest
|
|
22
36
|
vc setup --yes
|
|
37
|
+
vc setup token <bot-token> --client-id <discord-client-id>
|
|
38
|
+
vc setup channels "General,Team Voice"
|
|
23
39
|
```
|
|
24
40
|
|
|
41
|
+
Use `--yes` only when you need a non-interactive bootstrap and starter config. It cannot stop and wait for you to create a Discord application, so token/channel setup remains a follow-up step in that mode.
|
|
42
|
+
|
|
25
43
|
Contributor GitHub clone path:
|
|
26
44
|
|
|
27
45
|
```bash
|
|
28
46
|
git clone https://github.com/ca1773130n/VerbalCoding.git
|
|
29
47
|
cd VerbalCoding
|
|
30
|
-
./scripts/install.sh
|
|
48
|
+
./scripts/install.sh
|
|
31
49
|
```
|
|
32
50
|
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
For an npm install, do not run `./scripts/install.sh` directly; there is no repository checkout in your current directory. Use the packaged CLI wrapper instead:
|
|
51
|
+
For npm/global installs, use `vc ...` commands. Do not run `./scripts/install.sh` unless you are inside a repository clone.
|
|
36
52
|
|
|
37
|
-
|
|
38
|
-
vc setup --yes
|
|
39
|
-
```
|
|
53
|
+
## 2. What setup bootstraps
|
|
40
54
|
|
|
41
|
-
`vc setup` runs the
|
|
55
|
+
`vc setup` runs the bootstrap bundled in the npm package and writes `.env`. It can install or prepare:
|
|
42
56
|
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
- installs npm dependencies when `node_modules/` is missing,
|
|
50
|
-
- installs the short `vc` shell command with `npm link`,
|
|
51
|
-
- installs `ffmpeg`, Node/npm, and `whisper-cli` when supported by the OS package manager,
|
|
52
|
-
- downloads `models/ggml-small-q5_1.bin`,
|
|
53
|
-
- creates `.venv-tts` and installs `edge-tts` when `edge-tts` is not already on `PATH`,
|
|
54
|
-
- runs the interactive `.env` wizard.
|
|
57
|
+
- npm dependencies when `node_modules/` is missing,
|
|
58
|
+
- `ffmpeg`, Node/npm, Python venv support, build tools, and `whisper-cli` where supported,
|
|
59
|
+
- the default `models/ggml-small-q5_1.bin` whisper.cpp model,
|
|
60
|
+
- a local `.venv-tts` Edge TTS helper,
|
|
61
|
+
- the short `vc` shell command when running from a clone.
|
|
55
62
|
|
|
56
63
|
Supported system bootstrap paths:
|
|
57
64
|
|
|
58
65
|
| OS | System dependency path |
|
|
59
66
|
|---|---|
|
|
60
67
|
| macOS | Homebrew: `brew install node ffmpeg whisper-cpp` as needed |
|
|
61
|
-
| Debian/Ubuntu | `apt-get
|
|
62
|
-
| Fedora/RHEL | `dnf
|
|
63
|
-
| Arch | `pacman
|
|
68
|
+
| Debian/Ubuntu | `apt-get`; handles NodeSource npm conflicts and can locally build whisper.cpp |
|
|
69
|
+
| Fedora/RHEL | `dnf`; local whisper.cpp build fallback |
|
|
70
|
+
| Arch | `pacman`; local whisper.cpp build fallback |
|
|
71
|
+
| Windows | Not supported yet |
|
|
64
72
|
|
|
65
73
|
Useful installer variants:
|
|
66
74
|
|
|
67
75
|
```bash
|
|
68
76
|
vc setup --yes --no-wizard # dependency/bootstrap only from npm install
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
./scripts/install.sh --
|
|
73
|
-
VERBALCODING_SKIP_CLI_LINK=1 ./scripts/install.sh --yes
|
|
77
|
+
vc setup --yes --skip-system # skip OS package installation
|
|
78
|
+
vc setup --yes --skip-model # skip default STT model download
|
|
79
|
+
vc setup --yes --skip-edge-tts # skip local Edge TTS helper
|
|
80
|
+
./scripts/install.sh --yes --no-wizard # clone-only non-interactive equivalent
|
|
74
81
|
```
|
|
75
82
|
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
- Node.js 20+ and npm
|
|
79
|
-
- ffmpeg
|
|
80
|
-
- Python 3 with venv/pip
|
|
81
|
-
- whisper.cpp `whisper-cli`
|
|
82
|
-
- one authenticated CLI agent backend, Hermes Agent by default
|
|
83
|
+
## 3. Discord values collected by setup
|
|
83
84
|
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
Read the upstream Discord bot setup guides first if this is your first bot:
|
|
85
|
+
Read the upstream Discord bot setup guides if this is your first bot:
|
|
87
86
|
|
|
88
87
|
- Hermes Agent Discord messaging guide: <https://hermes-agent.nousresearch.com/docs/user-guide/messaging/discord>
|
|
89
88
|
- Discord official bot overview: <https://docs.discord.com/developers/bots/overview>
|
|
90
89
|
- Discord official getting started guide: <https://docs.discord.com/developers/quick-start/getting-started>
|
|
91
90
|
|
|
92
|
-
|
|
91
|
+
During `vc setup`:
|
|
93
92
|
|
|
94
|
-
1. Create a Discord application
|
|
93
|
+
1. Create a Discord application/bot in the Developer Portal.
|
|
95
94
|
2. Enable the Message Content privileged intent.
|
|
96
|
-
3.
|
|
97
|
-
4.
|
|
95
|
+
3. Paste the bot token when asked for `DISCORD_BOT_TOKEN`.
|
|
96
|
+
4. Paste the application/client ID when asked; setup can print the invite command.
|
|
97
|
+
5. Enter the real voice channel names the bot should auto-join.
|
|
98
|
+
|
|
99
|
+
Invite URL helper:
|
|
98
100
|
|
|
99
101
|
```bash
|
|
100
102
|
vc bot invite <discord-client-id>
|
|
101
|
-
# or pin it to one server:
|
|
102
103
|
vc bot invite <discord-client-id> --guild <guild-id>
|
|
103
104
|
```
|
|
104
105
|
|
|
105
|
-
|
|
106
|
+
If you skipped a value or need to rotate it later, update only that part:
|
|
107
|
+
|
|
108
|
+
```bash
|
|
109
|
+
vc setup token
|
|
110
|
+
vc setup token <bot-token> --client-id <discord-client-id>
|
|
111
|
+
vc setup channels "VerbalCoding,LLM-Wiki,General"
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
`vc setup token` updates `DISCORD_BOT_TOKEN` and optional `DISCORD_CLIENT_ID`; `vc setup channels` updates `AUTO_JOIN_VOICE_CHANNELS`. Both preserve unrelated `.env` values, set mode `0600`, and do not print secrets back.
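The preserve-and-replace behavior described above can be pictured with a small sketch. This is not the package's implementation: `upsertEnvValue` and `describeUpdate` are hypothetical names, and the sketch assumes `.env` holds plain `KEY=value` lines.

```javascript
// Hypothetical helper, not taken from the verbalcoding source.
// Replaces KEY=... when the key exists, otherwise appends it.
// Every other line (comments, blank lines, unrelated keys) is kept verbatim.
function upsertEnvValue(envText, key, value) {
  const lines = envText === '' ? [] : envText.replace(/\n$/, '').split('\n');
  let replaced = false;
  const next = lines.map((line) => {
    // Match "KEY=" at the start of a line, allowing an optional "export " prefix.
    const match = line.match(/^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/);
    if (match && match[1] === key) {
      replaced = true;
      return `${key}=${value}`;
    }
    return line;
  });
  if (!replaced) next.push(`${key}=${value}`);
  return next.join('\n') + '\n';
}

// Build a log message that names the key but never echoes the secret value.
function describeUpdate(key) {
  return `Updated ${key} (value hidden)`;
}
```

A caller would presumably write the returned text back with owner-only permissions, for example with Node's `fs.writeFileSync(path, text, { mode: 0o600 })` followed by `fs.chmodSync(path, 0o600)` for files that already exist, and log only `describeUpdate(key)`.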
|
|
106
115
|
|
|
107
|
-
## 4.
|
|
116
|
+
## 4. Auto-join voice channel names
|
|
117
|
+
|
|
118
|
+
Use the exact Discord voice channel names:
|
|
119
|
+
|
|
120
|
+
```bash
|
|
121
|
+
vc setup channels
|
|
122
|
+
vc setup channels "General,Team Voice"
|
|
123
|
+
vc setup channel "General"
|
|
124
|
+
vc setup voice "General"
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
Restart the bridge after changing channel names.
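As an illustration of why exact names matter, here is a minimal sketch of how a comma-separated channel list could be parsed and matched. It assumes `AUTO_JOIN_VOICE_CHANNELS` stores the same comma-separated string passed to `vc setup channels`; `parseChannelNames` and `shouldAutoJoin` are hypothetical names, and the case-sensitive comparison is an assumption rather than documented behavior.

```javascript
// Hypothetical sketch, not the verbalcoding implementation.
// Split a comma-separated list, trim whitespace, and drop empty entries.
function parseChannelNames(raw) {
  return (raw ?? '')
    .split(',')
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

// Assumption: a channel is joined only when its name matches exactly.
function shouldAutoJoin(channelName, configuredNames) {
  return configuredNames.includes(channelName);
}
```

Under these assumptions, `"General,Team Voice"` yields two names, and a channel called `general` or `Team-Voice` would not match.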
|
|
128
|
+
|
|
129
|
+
## 5. Verify
|
|
108
130
|
|
|
109
131
|
```bash
|
|
110
132
|
vc doctor
|
|
111
133
|
```
|
|
112
134
|
|
|
113
|
-
`vc doctor` is redacted: it reports missing tokens/commands/models without printing secret values.
|
|
135
|
+
`vc doctor` is redacted: it reports missing tokens/commands/models without printing secret values. On supported macOS/Linux installs it attempts to auto-fix installable prerequisites first, including `ffmpeg`, `whisper-cli`/model, Edge TTS helper, and Hermes CLI for the default Hermes backend. Use this opt-out if you only want diagnosis:
|
|
136
|
+
|
|
137
|
+
```bash
|
|
138
|
+
VERBALCODING_DOCTOR_INSTALL_HERMES=0 vc doctor
|
|
139
|
+
```
|
|
114
140
|
|
|
115
141
|
Expected success includes:
|
|
116
142
|
|
|
@@ -126,9 +152,9 @@ Expected success includes:
|
|
|
126
152
|
Doctor passed. Run vc start to start VerbalCoding.
|
|
127
153
|
```
|
|
128
154
|
|
|
129
|
-
If
|
|
155
|
+
If `DISCORD_BOT_TOKEN` is missing, run `vc setup token`. If no configured channel is found, run `vc setup channels "<actual voice channel name>"`.
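The redacted style of check described in this section can be sketched as follows. This is only an illustration under assumptions: `redactedReport` and `HINTS` are hypothetical names, and the real `vc doctor` may check more than environment values. The point is that the report records whether a value is set, never the value itself.

```javascript
// Hypothetical sketch of a redacted configuration check.
// Follow-up commands are the ones named in this guide.
const HINTS = {
  DISCORD_BOT_TOKEN: 'vc setup token',
  AUTO_JOIN_VOICE_CHANNELS: 'vc setup channels "<actual voice channel name>"',
};

// Return one entry per key with a set/missing status and an optional hint.
// The secret value is inspected but never copied into the report.
function redactedReport(env, requiredKeys) {
  return requiredKeys.map((key) => {
    const value = env[key];
    const present = typeof value === 'string' && value.trim() !== '';
    return {
      key,
      status: present ? 'set' : 'missing',
      hint: present ? null : (HINTS[key] ?? null),
    };
  });
}
```

Calling `redactedReport(process.env, Object.keys(HINTS))` would then give output that is safe to paste into a bug report.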
|
|
130
156
|
|
|
131
|
-
##
|
|
157
|
+
## 6. Run the single default bot
|
|
132
158
|
|
|
133
159
|
```bash
|
|
134
160
|
vc start
|
|
@@ -154,7 +180,25 @@ In Discord:
|
|
|
154
180
|
|
|
155
181
|
Then speak in the configured voice channel. You should see STT text, progress text when verbose mode is on, a final text answer, and hear TTS playback.
|
|
156
182
|
|
|
157
|
-
##
|
|
183
|
+
## 7. Docker and containers
|
|
184
|
+
|
|
185
|
+
Discord text/gateway login uses TCP/WebSocket, but Discord voice also needs UDP. If `vc start` logs this, the channel was found but voice UDP discovery failed:
|
|
186
|
+
|
|
187
|
+
```text
|
|
188
|
+
Cannot perform IP discovery - socket closed
|
|
189
|
+
```
|
|
190
|
+
|
|
191
|
+
On Linux Docker Compose, use host networking for the service running `vc start`:
|
|
192
|
+
|
|
193
|
+
```yaml
|
|
194
|
+
services:
|
|
195
|
+
verbalcoding:
|
|
196
|
+
network_mode: "host"
|
|
197
|
+
```
|
|
198
|
+
|
|
199
|
+
Remove any `ports:` block from that service when using host networking. On Docker Desktop for macOS/Windows, host networking behaves differently; if UDP voice still fails, run VerbalCoding directly on the host or in a Linux VM. See [Troubleshooting](TROUBLESHOOTING.md).
|
|
200
|
+
|
|
201
|
+
## 8. Project-per-room setup
|
|
158
202
|
|
|
159
203
|
For one permanent bot per project voice room, create one Discord application per project, then:
|
|
160
204
|
|
|
@@ -167,7 +211,7 @@ vc instance status my-project
|
|
|
167
211
|
|
|
168
212
|
Each instance writes an ignored `instances/<name>.env` with its own token, voice channel, transcript target, log path, Hermes session file, and optional Hermes profile.
|
|
169
213
|
|
|
170
|
-
##
|
|
214
|
+
## 9. Optional OpenVoice setup
|
|
171
215
|
|
|
172
216
|
OpenVoice voice cloning is optional. Keep `TTS_BACKEND=edge` for a fresh public install. To enable OpenVoice later:
|
|
173
217
|
|
|
@@ -181,7 +225,7 @@ python3 integrations/openvoice/synth.py --openvoice-dir vendor/OpenVoice --ref-a
|
|
|
181
225
|
|
|
182
226
|
Then set `TTS_BACKEND=openvoice`, run `vc doctor`, and test `!voice-test <text>` in Discord.
|
|
183
227
|
|
|
184
|
-
##
|
|
228
|
+
## 10. Clean clone smoke test for maintainers
|
|
185
229
|
|
|
186
230
|
Fast host-only smoke test:
|
|
187
231
|
|
|
@@ -196,12 +240,10 @@ chmod 600 .env
|
|
|
196
240
|
vc doctor || true
|
|
197
241
|
```
|
|
198
242
|
|
|
199
|
-
The expected failure at this point is missing local secrets or unauthenticated agent CLI, not leaked tokens or missing install scripts.
|
|
200
|
-
|
|
201
243
|
Docker-based Ubuntu clean install smoke test:
|
|
202
244
|
|
|
203
245
|
```bash
|
|
204
246
|
./scripts/docker_ubuntu_smoke.sh
|
|
205
247
|
```
|
|
206
248
|
|
|
207
|
-
This
|
|
249
|
+
This validates bootstrap and doctor behavior in a clean container. It does not connect to Discord voice; use a real Linux host/VM for end-to-end voice UDP testing.
|
|
package/docs/HERMES_VOICE.md
ADDED
|
@@ -0,0 +1,65 @@
|
|
|
1
|
+
# Hermes Built-in Voice vs VerbalCoding
|
|
2
|
+
|
|
3
|
+
<!-- readme-glow-up:intro -->
|
|
4
|
+
<p align="center">
|
|
5
|
+
<a href="../README.md">README</a> ·
|
|
6
|
+
<a href="README.md">Docs hub</a> ·
|
|
7
|
+
<a href="USAGE.md">Usage</a> ·
|
|
8
|
+
<a href="CONFIGURATION.md">Configuration</a> ·
|
|
9
|
+
<a href="TROUBLESHOOTING.md">Troubleshooting</a>
|
|
10
|
+
</p>
|
|
11
|
+
|
|
12
|
+
> Hermes already supports Discord voice channels. VerbalCoding is the workflow layer for people who want a coding-agent phone call, not just the baseline voice loop.
|
|
13
|
+
<!-- /readme-glow-up:intro -->
|
|
14
|
+
|
|
15
|
+
## What Hermes already does
|
|
16
|
+
|
|
17
|
+
Hermes Agent has built-in Discord voice-channel support through the Discord gateway. After the bot is in your server, slash commands such as `/voice join` or `/voice channel` can join the voice channel you are currently in. Hermes can then transcribe speech with Whisper/STT and speak replies back through TTS providers such as Edge TTS, ElevenLabs, OpenAI, or other configured providers.
|
|
18
|
+
|
|
19
|
+
For basic live voice chat, this is enough:
|
|
20
|
+
|
|
21
|
+
```text
|
|
22
|
+
Discord VC → Hermes STT → Hermes agent → TTS → Discord VC playback
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
If that is your whole requirement, use Hermes built-in voice mode first.
|
|
26
|
+
|
|
27
|
+
## What VerbalCoding adds
|
|
28
|
+
|
|
29
|
+
VerbalCoding keeps the same high-level loop, but makes it a coding-workflow runtime around CLI agents.
|
|
30
|
+
|
|
31
|
+
| Area | Hermes built-in voice | VerbalCoding |
|
|
32
|
+
|---|---|---|
|
|
33
|
+
| Primary goal | General Hermes conversation in a Discord VC | Phone-call-style coding workflow with CLI agents |
|
|
34
|
+
| Commands | `/voice join`, `/voice channel`, `/voice leave`, `/voice tts` | `vc setup`, `vc start`, `!join`, `!ask`, `!session`, `!verbose`, `!latency`, multi-instance commands |
|
|
35
|
+
| Backend | Hermes Agent | Hermes Agent, Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw, or custom command |
|
|
36
|
+
| Session model | Normal Hermes gateway session | Project/session routing, voice-channel bindings, shared voice + `!ask` text context where supported |
|
|
37
|
+
| Speech UX | Baseline STT + TTS | Tuned utterance windows, language presets, transcript cleanup, text mirrors, voice tests |
|
|
38
|
+
| Interruption | Basic voice playback behavior | Barge-in rules that stop playback without accidentally killing an active agent task |
|
|
39
|
+
| Long coding tasks | Generic agent response | Progress/status prompts, verbose tool-progress summaries, diff/log suppression for TTS |
|
|
40
|
+
| Operations | Hermes gateway setup and config | `vc doctor` auto-fixes, redacted diagnostics, latency metrics, Docker UDP guidance, multi-bot/project rooms |
|
|
41
|
+
|
|
42
|
+
## When to choose which
|
|
43
|
+
|
|
44
|
+
Use **Hermes built-in voice** when you want:
|
|
45
|
+
|
|
46
|
+
- one bot in one Discord voice channel;
|
|
47
|
+
- simple speak → transcribe → answer → speak-back behavior;
|
|
48
|
+
- the official Hermes gateway path with minimal extra software;
|
|
49
|
+
- Hermes-only sessions and tools.
|
|
50
|
+
|
|
51
|
+
Use **VerbalCoding** when you want:
|
|
52
|
+
|
|
53
|
+
- voice and text to cooperate around a coding project;
|
|
54
|
+
- multiple agent backends, not only Hermes;
|
|
55
|
+
- project-specific Discord rooms or multiple bot instances;
|
|
56
|
+
- Korean/English language presets and runtime voice controls;
|
|
57
|
+
- careful barge-in behavior during long agent work;
|
|
58
|
+
- spoken progress without reading giant diffs, stack traces, or logs aloud;
|
|
59
|
+
- operational debugging with `vc doctor`, latency summaries, and container voice-network guidance.
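To make the "no giant diffs, stack traces, or logs aloud" idea concrete, here is a rough sketch of a text filter applied before TTS. It is an assumption-laden illustration, not VerbalCoding's actual guardrail: `speakableText`, `NOISE_PATTERNS`, and the chosen heuristics are hypothetical.

```javascript
// Hypothetical sketch: drop fenced code, diff markers, and stack frames
// from an agent reply before handing it to TTS.
const FENCE = '`'.repeat(3); // Markdown code fence marker

const NOISE_PATTERNS = [
  /^diff --git /,    // git diff header
  /^(\+\+\+|---) /,  // diff file markers
  /^@@ .* @@/,       // diff hunk header
  /^\s+at \S+/,      // JavaScript-style stack frame
];

function speakableText(text) {
  const kept = [];
  let inFence = false;
  let dropped = 0;
  for (const line of text.split('\n')) {
    // Fence lines toggle code-block state and are never spoken.
    if (line.trimStart().startsWith(FENCE)) {
      inFence = !inFence;
      dropped += 1;
      continue;
    }
    if (inFence || NOISE_PATTERNS.some((re) => re.test(line))) {
      dropped += 1;
      continue;
    }
    kept.push(line);
  }
  const spoken = kept.join('\n').trim();
  // Tell the listener that something was skipped instead of silently hiding it.
  return dropped > 0
    ? `${spoken}\n(Technical output omitted from speech.)`.trim()
    : spoken;
}
```

The full, unfiltered reply can still be mirrored to the text channel; only the spoken version is trimmed.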
|
|
60
|
+
|
|
61
|
+
## Honest positioning
|
|
62
|
+
|
|
63
|
+
VerbalCoding should not be described as “adding Discord voice to Hermes from scratch.” Hermes already has that baseline. A better description is:
|
|
64
|
+
|
|
65
|
+
> VerbalCoding is a Discord voice workflow layer for CLI coding agents. It can use Hermes as the default backend, while adding project routing, interruption semantics, progress UX, diagnostics, and backend switching for long-running software work.
|
package/docs/MULTI_INSTANCE.md
CHANGED
|
@@ -1,5 +1,21 @@
|
|
|
1
1
|
# Multi-instance VerbalCoding
|
|
2
2
|
|
|
3
|
+
<!-- readme-glow-up:intro -->
|
|
4
|
+
<p align="center">
|
|
5
|
+
<a href="../README.md">README</a> ·
|
|
6
|
+
<a href="README.md">Docs hub</a> ·
|
|
7
|
+
<a href="FRESH_INSTALL.md">Fresh Install</a> ·
|
|
8
|
+
<a href="USAGE.md">Usage</a> ·
|
|
9
|
+
<a href="CONFIGURATION.md">Configuration</a> ·
|
|
10
|
+
<a href="TROUBLESHOOTING.md">Troubleshooting</a> ·
|
|
11
|
+
<a href="MULTI_INSTANCE.md">Multi-Instance</a>
|
|
12
|
+
</p>
|
|
13
|
+
|
|
14
|
+
> Run one isolated Discord voice bridge per project room.
|
|
15
|
+
>
|
|
16
|
+
> Fast path: `vc instance setup NAME → vc bot invite CLIENT_ID → vc instance start NAME`
|
|
17
|
+
<!-- /readme-glow-up:intro -->
|
|
18
|
+
|
|
3
19
|
VerbalCoding can run multiple independent Discord voice bridge processes. Each process is still the existing single-instance Node bridge, but it loads a different `instances/<name>.env` file and uses a different Discord bot token.
|
|
4
20
|
|
|
5
21
|
Use this when each project should permanently occupy its own Discord voice channel and write to its own transcript channel/thread.
|
package/docs/README.md
ADDED
|
@@ -0,0 +1,49 @@
|
|
|
1
|
+
# VerbalCoding docs
|
|
2
|
+
|
|
3
|
+
<p align="center">
|
|
4
|
+
<a href="../README.md">README</a> ·
|
|
5
|
+
<a href="./i18n/README.ko.md">한국어</a> ·
|
|
6
|
+
<a href="./i18n/README.ja.md">日本語</a> ·
|
|
7
|
+
<a href="./i18n/README.zh.md">中文</a> ·
|
|
8
|
+
<a href="./i18n/README.es.md">Español</a> ·
|
|
9
|
+
<a href="./i18n/README.fr.md">Français</a> ·
|
|
10
|
+
<a href="./i18n/README.ru.md">Русский</a>
|
|
11
|
+
</p>
|
|
12
|
+
|
|
13
|
+
This is the detailed manual behind the compact README. Start with the fresh install guide if you are setting up a real Discord voice bot for the first time.
|
|
14
|
+
|
|
15
|
+
## Fast path
|
|
16
|
+
|
|
17
|
+
```bash
|
|
18
|
+
npm install -g verbalcoding@latest
|
|
19
|
+
vc setup
|
|
20
|
+
vc doctor
|
|
21
|
+
vc start
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
## Guides
|
|
25
|
+
|
|
26
|
+
| Guide | Use it when you need |
|
|
27
|
+
|---|---|
|
|
28
|
+
| [Fresh Install](FRESH_INSTALL.md) | A clean npm/global install, Discord app setup, first bot invite, and first voice run. |
|
|
29
|
+
| [Usage](USAGE.md) | CLI commands, Discord commands, run modes, voice changes, progress, and latency metrics. |
|
|
30
|
+
| [Hermes Voice vs VerbalCoding](HERMES_VOICE.md) | What Hermes built-in Discord voice already does and what VerbalCoding adds. |
|
|
31
|
+
| [Configuration](CONFIGURATION.md) | `.env`, agent backends, MCP server, TTS backends, and operational settings. |
|
|
32
|
+
| [Troubleshooting](TROUBLESHOOTING.md) | Docker UDP, voice join failures, missing token/channel checks, and doctor behavior. |
|
|
33
|
+
| [Multi-Instance](MULTI_INSTANCE.md) | One permanent Discord voice bot per project room with isolated Hermes profiles. |
|
|
34
|
+
| [Release Notes](RELEASE.md) | Current capabilities, verification checklist, and pre-public-release gaps. |
|
|
35
|
+
|
|
36
|
+
## Localized guide sets
|
|
37
|
+
|
|
38
|
+
| Language | Docs index |
|
|
39
|
+
|---|---|
|
|
40
|
+
| Korean | [docs/i18n/README.ko.md](i18n/README.ko.md) |
|
|
41
|
+
| Japanese | [docs/i18n/README.ja.md](i18n/README.ja.md) |
|
|
42
|
+
| Chinese | [docs/i18n/README.zh.md](i18n/README.zh.md) |
|
|
43
|
+
| Spanish | [docs/i18n/README.es.md](i18n/README.es.md) |
|
|
44
|
+
| French | [docs/i18n/README.fr.md](i18n/README.fr.md) |
|
|
45
|
+
| Russian | [docs/i18n/README.ru.md](i18n/README.ru.md) |
|
|
46
|
+
|
|
47
|
+
## Contributor note
|
|
48
|
+
|
|
49
|
+
Use `vc ...` commands in user-facing docs. Keep `./scripts/...` commands for source-checkout contributor flows only.
|
package/docs/RELEASE.md
CHANGED
|
@@ -1,13 +1,29 @@
|
|
|
1
1
|
# VerbalCoding release notes
|
|
2
2
|
|
|
3
|
+
<!-- readme-glow-up:intro -->
|
|
4
|
+
<p align="center">
|
|
5
|
+
<a href="../README.md">README</a> ·
|
|
6
|
+
<a href="README.md">Docs hub</a> ·
|
|
7
|
+
<a href="FRESH_INSTALL.md">Fresh Install</a> ·
|
|
8
|
+
<a href="USAGE.md">Usage</a> ·
|
|
9
|
+
<a href="CONFIGURATION.md">Configuration</a> ·
|
|
10
|
+
<a href="TROUBLESHOOTING.md">Troubleshooting</a> ·
|
|
11
|
+
<a href="MULTI_INSTANCE.md">Multi-Instance</a>
|
|
12
|
+
</p>
|
|
13
|
+
|
|
14
|
+
> Release-facing capability list and verification checklist.
|
|
15
|
+
>
|
|
16
|
+
> Fast path: `npm pack --dry-run → npm test → vc doctor → manual Discord smoke test`
|
|
17
|
+
<!-- /readme-glow-up:intro -->
|
|
18
|
+
|
|
3
19
|
## Current release candidate
|
|
4
20
|
|
|
5
|
-
VerbalCoding is a Discord voice bridge for controlling CLI-based coding agents by voice. It is public-release oriented, with macOS / Apple Silicon as the most tested path and best-effort Linux bootstrap support for common package managers.
|
|
21
|
+
VerbalCoding is a Discord voice bridge for controlling CLI-based coding agents by voice. It is public-release oriented, with macOS / Apple Silicon as the most tested path and best-effort Linux bootstrap support for common package managers. Windows is not supported yet.
|
|
6
22
|
|
|
7
23
|
### Included
|
|
8
24
|
|
|
9
25
|
- Discord voice receive via Node `@discordjs/voice`.
|
|
10
|
-
- Local Korean STT via `whisper.cpp` + Metal.
|
|
26
|
+
- Local Korean STT via `whisper.cpp`, using Metal on macOS or a locally built fallback on Linux.
|
|
11
27
|
- Edge TTS playback with Korean default voice.
|
|
12
28
|
- Generic CLI harness adapter layer:
|
|
13
29
|
- Hermes Agent
|
|
@@ -21,12 +37,14 @@ VerbalCoding is a Discord voice bridge for controlling CLI-based coding agents b
|
|
|
21
37
|
- Long-answer TTS chunking and responsive barge-in.
|
|
22
38
|
- Diff/code/log guardrails so large technical output is not read aloud.
|
|
23
39
|
- Normal and conservative sensitivity modes for indoor vs. noisy/outdoor use.
|
|
24
|
-
-
|
|
25
|
-
-
|
|
40
|
+
- Public npm setup path: `npm install -g verbalcoding@latest`, guided `vc setup`, `vc doctor`, and `vc start`; `vc setup --yes`, `vc setup token`, and `vc setup channels` remain available for automation or later updates.
|
|
41
|
+
- `vc doctor` redacted prerequisite checker with supported auto-fixes for local media/STT/TTS prerequisites and Hermes CLI on macOS/Linux.
|
|
42
|
+
- Discord onboarding helpers: `vc bot invite <client-id>` plus token/client-id registration through `vc setup token`.
|
|
43
|
+
- Auto-join channel configuration through `vc setup channels`, `vc setup channel`, and `vc setup voice`.
|
|
26
44
|
- Optional verbose progress mode for text-only middle-step updates during long agent work.
|
|
27
|
-
- Always-on JSONL latency metrics plus `!latency` / `!metrics` summary
|
|
28
|
-
- More patient utterance idle wait (`UTTERANCE_IDLE_MS=4500`) so long spoken instructions with natural pauses are not split
|
|
29
|
-
- Multi-instance Hermes profile isolation: `vc instance setup <name>` auto-clones a Hermes profile to `~/.hermes/profiles/<name>` with the instance workdir, seeds SOUL.md, and writes `HERMES_HOME` into the instance env
|
|
45
|
+
- Always-on JSONL latency metrics plus `!latency` / `!metrics` summary.
|
|
46
|
+
- More patient utterance idle wait (`UTTERANCE_IDLE_MS=4500`) so long spoken instructions with natural pauses are not split too early.
|
|
47
|
+
- Multi-instance Hermes profile isolation: `vc instance setup <name>` auto-clones a Hermes profile to `~/.hermes/profiles/<name>` with the instance workdir, seeds SOUL.md, and writes `HERMES_HOME` into the instance env.
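The effect of the utterance idle wait listed above can be shown with a small sketch. It assumes speech arrives as timestamped segments and that a pause shorter than the idle window keeps the utterance open; `splitUtterances` and the segment shape are hypothetical, and only the `4500` default comes from the release notes.

```javascript
// Hypothetical sketch: group speech segments into utterances.
// A new utterance starts only when the silence since the previous
// segment reaches idleMs (default mirrors UTTERANCE_IDLE_MS=4500).
function splitUtterances(segments, idleMs = 4500) {
  const utterances = [];
  let current = null;
  for (const seg of segments) {
    if (current && seg.startMs - current.endMs < idleMs) {
      // Pause was short enough: extend the open utterance.
      current.endMs = Math.max(current.endMs, seg.endMs);
      current.segments += 1;
    } else {
      current = { startMs: seg.startMs, endMs: seg.endMs, segments: 1 };
      utterances.push(current);
    }
  }
  return utterances;
}
```

With a 4500 ms window, a 2-second thinking pause stays inside one instruction, while the same input with a 1500 ms window would be split into separate requests.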
|
|
30
48
|
|
|
31
49
|
### Pre-release checklist
|
|
32
50
|
|
|
@@ -35,7 +53,7 @@ Run from the repo root:
|
|
|
35
53
|
```bash
|
|
36
54
|
./scripts/install.sh --yes --no-wizard
|
|
37
55
|
./scripts/docker_ubuntu_smoke.sh # requires Docker; validates ubuntu:24.04 clean install
|
|
38
|
-
node --check app-node/main.mjs app-node/agent_adapters.mjs app-node/install_config.mjs scripts/install.mjs
|
|
56
|
+
node --check app-node/main.mjs app-node/agent_adapters.mjs app-node/install_config.mjs scripts/install.mjs scripts/cli.mjs scripts/doctor.mjs
|
|
39
57
|
npm test
|
|
40
58
|
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest tests/ -q || [ $? -eq 5 ] # ok when no Python tests exist
|
|
41
59
|
bash -n run.sh scripts/install.sh scripts/bootstrap_prereqs.sh scripts/docker_ubuntu_smoke.sh
|
|
@@ -46,22 +64,27 @@ git diff --check
|
|
|
46
64
|
|
|
47
65
|
Manual smoke test:
|
|
48
66
|
|
|
49
|
-
1.
|
|
50
|
-
2.
|
|
51
|
-
3. Verify log contains `
|
|
52
|
-
4.
|
|
53
|
-
5. In Discord
|
|
54
|
-
6.
|
|
67
|
+
1. Configure the app with `vc setup token` and `vc setup channels "<voice-channel>"`.
|
|
68
|
+
2. Start the bridge with `vc start` or `./run.sh`.
|
|
69
|
+
3. Verify log contains `Logged in as <bot-name>`.
|
|
70
|
+
4. Verify log contains `Listening in voice channel ... / <configured channel>`.
|
|
71
|
+
5. In Discord, run `!ping`.
|
|
72
|
+
6. In Discord voice, say a short Korean request.
|
|
73
|
+
7. Verify STT transcript, agent response, TTS playback, and barge-in behavior.
|
|
74
|
+
|
|
75
|
+
Container smoke note: the Docker smoke script checks install quality, not Discord voice UDP. For end-to-end voice in containers, Linux host networking is usually required.
|
|
55
76
|
|
|
56
77
|
### Known requirements
|
|
57
78
|
|
|
58
79
|
- macOS with Homebrew, or Linux with `apt`, `dnf`, or `pacman` for best-effort bootstrap.
|
|
59
|
-
- `ffmpeg`;
|
|
60
|
-
- `whisper-cli`;
|
|
61
|
-
- Default model at `models/ggml-small-q5_1.bin`;
|
|
62
|
-
- Edge TTS CLI on `PATH` or local `.venv-tts/bin/edge-tts`;
|
|
63
|
-
- Discord bot token in `.env`, `instances/<name>.env`, `~/.zshrc`, or runtime env.
|
|
80
|
+
- `ffmpeg`; setup/doctor attempts to install it.
|
|
81
|
+
- `whisper-cli`; setup uses Homebrew on macOS or local `vendor/whisper.cpp` build fallback on Linux.
|
|
82
|
+
- Default model at `models/ggml-small-q5_1.bin`; setup downloads it unless `--skip-model` is used.
|
|
83
|
+
- Edge TTS CLI on `PATH` or local `.venv-tts/bin/edge-tts`; setup creates the local helper when needed.
|
|
84
|
+
- Discord bot token registered with `vc setup token` or present in `.env`, `instances/<name>.env`, `~/.zshrc`, or runtime env.
|
|
85
|
+
- Auto-join voice channels registered with `vc setup channels` or present in `AUTO_JOIN_VOICE_CHANNELS`.
|
|
64
86
|
- Selected CLI harness installed and authenticated.
|
|
87
|
+
- For containerized Discord voice, UDP egress must work; Linux `network_mode: "host"` is the recommended Docker Compose setting.
|
|
65
88
|
|
|
66
89
|
### Not for public release yet
|
|
67
90
|
|
package/docs/ROADMAP.md
ADDED
|
@@ -0,0 +1,38 @@
|
|
|
1
|
+
# VerbalCoding Roadmap — 2026 H1 Differentiation Push
|
|
2
|
+
|
|
3
|
+
> Reframe: from "Discord bridge for Hermes" → **the voice layer for any coding agent — with real barge-in, streaming latency, and the agents you already use.**
|
|
4
|
+
|
|
5
|
+
This roadmap covers five differentiation phases that separate VerbalCoding from Hermes' built-in `/voice` (shipped Mar 2026, ~2 months old, no barge-in, Hermes-only, 2.5–9s practical latency).
|
|
6
|
+
|
|
7
|
+
## Phase Plans
|
|
8
|
+
|
|
9
|
+
| # | Phase | Status | Plan |
|
|
10
|
+
|---|---|---|---|
|
|
11
|
+
| 1 | Streaming end-to-end pipeline | designed | [phase1-streaming-pipeline.md](./superpowers/plans/2026-05-13-phase1-streaming-pipeline.md) |
|
|
12
|
+
| 2 | Agent-agnostic adapter completion | partial → designed | [phase2-agent-adapters.md](./superpowers/plans/2026-05-13-phase2-agent-adapters.md) |
|
|
13
|
+
| 6 | Smart progress summarization | designed | [phase6-smart-progress.md](./superpowers/plans/2026-05-13-phase6-smart-progress.md) |
|
|
14
|
+
| 7 | Voice plan mode | designed | [phase7-voice-plan-mode.md](./superpowers/plans/2026-05-13-phase7-voice-plan-mode.md) |
|
|
15
|
+
| 10 | Push notification handoff | designed | [phase10-push-notifications.md](./superpowers/plans/2026-05-13-phase10-push-notifications.md) |
|
|
16
|
+
|
|
17
|
+
## Sequencing rationale
|
|
18
|
+
|
|
19
|
+
1. **Phase 2 first** — adapter polish + Aider/Cursor + auto-detection. Foundational and unlocks marketing claim "any coding agent".
|
|
20
|
+
2. **Phase 1** — extend the existing `tts_prefetch.mjs` to consume streaming stdout. Big perceived-latency win.
|
|
21
|
+
3. **Phase 6** — replaces regex pattern matching with semantic summarization. Demo moment.
|
|
22
|
+
4. **Phase 7** — voice plan mode. UX feature, depends on adapter capability flags from Phase 2.
|
|
23
|
+
5. **Phase 10** — push notification handoff. Independent; ship after the core is tighter.
|
|
24
|
+
|
|
25
|
+
## Differentiation claims this roadmap unlocks
|
|
26
|
+
|
|
27
|
+
- **True barge-in** with smart resume (extend existing `barge_in.mjs`).
|
|
28
|
+
- **Streaming pipeline** so first audio plays before the agent finishes thinking (Hermes Phase-4 wishlist).
|
|
29
|
+
- **Agent-agnostic** — Hermes, Claude Code, Codex, Gemini, OpenCode, OpenClaw, Aider, Cursor CLI, custom.
|
|
30
|
+
- **Smart narration** — describes intent, not file names.
|
|
31
|
+
- **Voice plan mode** — narrate plan, edit by voice (`"skip step 3"`).
|
|
32
|
+
- **Phone-down mode** — push notification when long task completes with voice summary.
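The streaming-pipeline claim above depends on turning partial agent output into speakable sentences before the agent finishes. The following is a minimal sketch of that idea only; it is not the package's `stream_sentencer.mjs`, and `createSentencer` with its punctuation heuristic is a hypothetical design.

```javascript
// Hypothetical sketch: emit complete sentences from streamed text chunks
// so TTS can start before the full reply has arrived.
function createSentencer(onSentence) {
  let buffer = '';
  return {
    // Append a chunk and emit every sentence that is now complete.
    push(chunk) {
      buffer += chunk;
      let match;
      // Heuristic: a sentence ends at ".", "!", or "?" followed by whitespace.
      while ((match = buffer.match(/^([\s\S]*?[.!?])\s+/)) !== null) {
        onSentence(match[1].trim());
        buffer = buffer.slice(match[0].length);
      }
    },
    // Emit whatever is left when the stream ends.
    flush() {
      const rest = buffer.trim();
      buffer = '';
      if (rest) onSentence(rest);
    },
  };
}
```

Each emitted sentence could be queued for synthesis right away, which is where the perceived-latency win would come from.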
|
|
33
|
+
|
|
34
|
+
## Non-goals (for this cycle)
|
|
35
|
+
|
|
36
|
+
- PSTN bridge / actual phone calls (Phase 4 of the broader pitch; deferred).
|
|
37
|
+
- Local-first one-flag preset (Phase 5; deferred but trivial follow-up).
|
|
38
|
+
- Multi-agent in one VC with distinct voices (Phase 3; needs Phase 2 to land first).
|