verbalcoding 0.2.0
- package/.env.example +83 -0
- package/LICENSE +21 -0
- package/README.md +157 -0
- package/app-node/agent_adapters.mjs +576 -0
- package/app-node/agent_adapters.test.mjs +455 -0
- package/app-node/agent_contract.mjs +45 -0
- package/app-node/barge_in.mjs +148 -0
- package/app-node/barge_in.test.mjs +179 -0
- package/app-node/bridge_logger.mjs +66 -0
- package/app-node/bridge_logger.test.mjs +73 -0
- package/app-node/bridge_state.mjs +104 -0
- package/app-node/bridge_state.test.mjs +64 -0
- package/app-node/cli_install.test.mjs +97 -0
- package/app-node/deferred_queue.mjs +12 -0
- package/app-node/deferred_queue.test.mjs +20 -0
- package/app-node/discord_invite_cli.test.mjs +31 -0
- package/app-node/discord_text.mjs +29 -0
- package/app-node/discord_text.test.mjs +32 -0
- package/app-node/hermes_profiles.mjs +164 -0
- package/app-node/hermes_profiles.test.mjs +276 -0
- package/app-node/install_config.mjs +263 -0
- package/app-node/install_config.test.mjs +205 -0
- package/app-node/instance_doctor.mjs +137 -0
- package/app-node/instance_doctor.test.mjs +128 -0
- package/app-node/instance_profile_lifecycle.mjs +16 -0
- package/app-node/instances.mjs +153 -0
- package/app-node/instances.test.mjs +102 -0
- package/app-node/language_config.mjs +73 -0
- package/app-node/language_config.test.mjs +51 -0
- package/app-node/latency_metrics.mjs +133 -0
- package/app-node/latency_metrics.test.mjs +71 -0
- package/app-node/main.mjs +1771 -0
- package/app-node/mcp_tools.mjs +198 -0
- package/app-node/mcp_tools.test.mjs +39 -0
- package/app-node/progress_cache.mjs +7 -0
- package/app-node/progress_cache.test.mjs +23 -0
- package/app-node/progress_speech.mjs +102 -0
- package/app-node/progress_speech.test.mjs +48 -0
- package/app-node/project_sessions.mjs +148 -0
- package/app-node/project_sessions.test.mjs +77 -0
- package/app-node/restart_notice.mjs +57 -0
- package/app-node/restart_notice.test.mjs +37 -0
- package/app-node/restart_policy.mjs +27 -0
- package/app-node/restart_policy.test.mjs +33 -0
- package/app-node/text_routing.mjs +8 -0
- package/app-node/text_routing.test.mjs +18 -0
- package/app-node/tts_backends.mjs +251 -0
- package/app-node/tts_backends.test.mjs +400 -0
- package/app-node/tts_chunks.mjs +57 -0
- package/app-node/tts_chunks.test.mjs +35 -0
- package/app-node/tts_prefetch.mjs +38 -0
- package/app-node/tts_prefetch.test.mjs +49 -0
- package/app-node/tts_settings.mjs +72 -0
- package/app-node/tts_settings.test.mjs +127 -0
- package/app-node/tts_voice_config.mjs +127 -0
- package/app-node/tts_voice_config.test.mjs +64 -0
- package/app-node/voice_clone_capture.mjs +76 -0
- package/app-node/voice_clone_capture.test.mjs +51 -0
- package/app-node/voice_messages.mjs +62 -0
- package/app-node/voice_messages.test.mjs +33 -0
- package/docs/CONFIGURATION.md +183 -0
- package/docs/FRESH_INSTALL.md +193 -0
- package/docs/MULTI_INSTANCE.md +183 -0
- package/docs/RELEASE.md +72 -0
- package/docs/USAGE.md +108 -0
- package/docs/assets/figures/verbalcoding-flow.svg +63 -0
- package/docs/i18n/README.es.md +121 -0
- package/docs/i18n/README.fr.md +121 -0
- package/docs/i18n/README.ja.md +121 -0
- package/docs/i18n/README.ko.md +121 -0
- package/docs/i18n/README.ru.md +121 -0
- package/docs/i18n/README.zh.md +121 -0
- package/package.json +58 -0
- package/run.sh +82 -0
- package/scripts/bootstrap_prereqs.sh +193 -0
- package/scripts/cli.mjs +369 -0
- package/scripts/docker_ubuntu_smoke.sh +76 -0
- package/scripts/doctor.mjs +134 -0
- package/scripts/install.mjs +108 -0
- package/scripts/install.sh +44 -0
- package/scripts/mcp-server.mjs +84 -0
- package/scripts/openvoice_smoke.py +34 -0
- package/scripts/openvoice_synth.py +103 -0
- package/scripts/setup_openvoice.sh +34 -0
- package/scripts/setup_supertonic.sh +18 -0
@@ -0,0 +1,183 @@
# VerbalCoding Configuration

## Setup Wizard

```bash
./scripts/install.sh
```

The installer asks for Discord token, allowed users, auto-join voice channel names, transcript channel/thread, CLI harness backend, default voice language, TTS settings, and wake-word behavior. It writes `.env` with mode `0600`; `.env` is ignored by git. It also links the short shell command `vc`.

If you only need the shell command after manual install:

```bash
npm link
```

## Supported Agent Backends

Set `AGENT_BACKEND` in `.env`.

| Backend | Default command | Notes |
|---|---|---|
| `hermes` | `hermes chat -Q -q` | Default. Preserves `.verbalcoding-session` resume behavior. |
| `claude-code` / `claude` | `claude -p` | Override with `CLAUDE_COMMAND` or `AGENT_COMMAND`. |
| `codex` | `codex exec` | Override with `CODEX_COMMAND` or `AGENT_COMMAND`. |
| `gemini` | `gemini -p` | Override with `GEMINI_COMMAND` or `AGENT_COMMAND`. |
| `opencode` | `opencode run` | Override with `OPENCODE_COMMAND` or `AGENT_COMMAND`. |
| `openclaw` | `openclaw run` | Override with `OPENCLAW_COMMAND` or `AGENT_COMMAND`. |
| `custom` | required `AGENT_COMMAND` | Prompt is appended as the final argv argument. |
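
As a rough illustration of how the table above could translate into a spawned command, the sketch below resolves `AGENT_BACKEND` to an argv array and appends the prompt as the final argument. The helper name `resolveAgentCommand`, the precedence order (backend-specific variable, then `AGENT_COMMAND`, then the default), and the naive whitespace split are assumptions made for this example; the package's real resolution logic may differ.

```javascript
// Default commands copied from the backend table.
const DEFAULT_COMMANDS = {
  hermes: "hermes chat -Q -q",
  "claude-code": "claude -p",
  claude: "claude -p",
  codex: "codex exec",
  gemini: "gemini -p",
  opencode: "opencode run",
  openclaw: "openclaw run",
};

// Backend-specific override variables named in the table.
const OVERRIDE_VARS = {
  "claude-code": "CLAUDE_COMMAND",
  claude: "CLAUDE_COMMAND",
  codex: "CODEX_COMMAND",
  gemini: "GEMINI_COMMAND",
  opencode: "OPENCODE_COMMAND",
  openclaw: "OPENCLAW_COMMAND",
};

// Hypothetical helper: returns the argv array for one prompt.
function resolveAgentCommand(env, prompt) {
  const backend = env.AGENT_BACKEND || "hermes"; // hermes is the documented default
  if (backend !== "custom" && !Object.hasOwn(DEFAULT_COMMANDS, backend)) {
    throw new Error(`Unknown AGENT_BACKEND: ${backend}`);
  }
  // Assumed precedence: backend-specific variable, then AGENT_COMMAND, then default.
  const overrideVar = OVERRIDE_VARS[backend];
  const specific = overrideVar ? env[overrideVar] : undefined;
  const command = specific || env.AGENT_COMMAND || DEFAULT_COMMANDS[backend];
  if (!command) {
    throw new Error("AGENT_BACKEND=custom requires AGENT_COMMAND");
  }
  // Naive whitespace split; quoted arguments are not handled in this sketch.
  return [...command.trim().split(/\s+/), prompt];
}
```

Returning an argv array rather than a shell string means the prompt is passed as a single argument and never re-parsed by a shell, which matches the "final argv argument" wording in the table.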

Generic overrides:

```bash
AGENT_BACKEND=custom
AGENT_LABEL="My Harness"
AGENT_COMMAND="my-harness run --non-interactive"
AGENT_TASK_TIMEOUT_MS=0
AGENT_CHAT_TIMEOUT_MS=45000
AGENT_VERBOSE_PROGRESS=0
UTTERANCE_IDLE_MS=2000
LATENCY_LOG_PATH=./.logs/latency.jsonl
```

## Agent Adapter Contract

The voice bridge talks to every backend through one adapter contract:

- `run({ text }, signal, plan)` returns status, final answer text, backend label, elapsed time, and optional session metadata.
- `ask(text, signal, plan)` is the compatibility shortcut that returns only final answer text.
- `capabilities` declares whether the backend supports session resume, streaming progress, and cancellation.
- Hermes is the reference adapter: resume, verbose progress streaming, cancellation, and final-answer recovery from Hermes session files.

New backends should implement the same contract and keep voice/STT/TTS behavior outside the adapter.
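
To make the shape of that contract concrete, here is a minimal sketch of an adapter that only echoes its input. The factory name `createEchoAdapter`, the result field names (`status`, `text`, `backend`, `elapsedMs`, `session`), and the capability key names are hypothetical; the source only states which pieces of information are returned, not how they are named.

```javascript
// Hypothetical adapter that satisfies the contract described above.
// Field and capability names are assumptions made for this sketch.
function createEchoAdapter(label = "Echo") {
  const adapter = {
    capabilities: {
      sessionResume: false,
      streamingProgress: false,
      cancellation: true,
    },

    // Full result: status, answer text, backend label, elapsed time, session metadata.
    async run({ text }, signal, plan) {
      const started = Date.now();
      if (signal && signal.aborted) {
        return { status: "cancelled", text: "", backend: label, elapsedMs: 0, session: null };
      }
      // A real adapter would spawn its CLI here; this one just echoes the prompt.
      const answer = `You said: ${text}`;
      return {
        status: "ok",
        text: answer,
        backend: label,
        elapsedMs: Date.now() - started,
        session: null,
      };
    },

    // Compatibility shortcut: only the final answer text.
    async ask(text, signal, plan) {
      const result = await adapter.run({ text }, signal, plan);
      return result.text;
    },
  };
  return adapter;
}
```

Note that nothing in the adapter touches audio: it receives text and returns text, which is the separation the paragraph above asks new backends to keep.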

## Example `.env`

```bash
DISCORD_BOT_TOKEN="***"
DISCORD_ALLOWED_USERS="123456789012345678"
AUTO_JOIN_VOICE_CHANNELS="일반,General,general"
TRANSCRIPT_CHANNEL_ID="123456789012345678"

AGENT_BACKEND="hermes"
STT_ENGINE="whisper_cpp"
WHISPER_CPP_BIN="whisper-cli"
WHISPER_CPP_MODEL="./models/ggml-small-q5_1.bin"

TTS_BACKEND="edge"
TTS_VOICE="ko-KR-SunHiNeural"
TTS_RATE="+10%"
TTS_MAX_CHARS="495"
TTS_VOLUME="1.0"

REQUIRE_WAKE_WORD="0"
MIN_UTTERANCE_SECONDS="1.0"
UTTERANCE_IDLE_MS="2000"
HERMES_TASK_TIMEOUT_MS="0"
HERMES_CHAT_TIMEOUT_MS="45000"
AGENT_VERBOSE_PROGRESS="0"
LATENCY_LOG_PATH="./.logs/latency.jsonl"
```

## MCP Server

VerbalCoding ships a stdio MCP server so Hermes Agent or any MCP client can control the bridge through tools instead of relying on skills or free-form shell commands.

Hermes config example:

```yaml
mcp_servers:
  verbalcoding:
    command: "node"
    args: ["/path/to/VerbalCoding/scripts/mcp-server.mjs"]
    timeout: 120
    connect_timeout: 30
```

Exposed MCP tools:

| Tool | Purpose |
|---|---|
| `status` | Report bridge/config status without secrets |
| `doctor` | Run the redacted doctor check |
| `set_auto_restart` | Enable/disable commit-time voice-bot auto-restart |
| `set_language` | Update STT/progress/TTS language together |
| `start`, `stop`, `restart` | Control the Discord voice bridge |
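
For readers wiring up their own MCP client, the sketch below builds the JSON-RPC 2.0 `tools/call` message that an MCP client writes to the server's stdin, one JSON object per line. The helper name `buildToolCall` is hypothetical, and the argument shape shown for `set_language` is an assumption, since the tool schemas are not documented here. A real client must also complete the MCP `initialize` handshake before calling tools.

```javascript
// Hypothetical helper: serializes one MCP tools/call request for the stdio transport.
// Messages are newline-delimited JSON, so the payload must not contain raw newlines
// (JSON.stringify without indentation guarantees that).
function buildToolCall(id, name, args = {}) {
  const message = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  return JSON.stringify(message) + "\n";
}

// Example payloads. The `language` argument name is assumed, not documented.
const statusRequest = buildToolCall(1, "status");
const languageRequest = buildToolCall(2, "set_language", { language: "ko" });
```

In practice most clients (Hermes included, via the config above) generate these messages for you; the sketch is only meant to show what "control the bridge through tools" looks like on the wire.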

## Optional OpenVoice TTS

Edge TTS remains the default and fallback. To try local voice cloning with OpenVoice V2:

```bash
./scripts/setup_openvoice.sh
# Download checkpoints_v2_0417.zip from OpenVoice docs and extract under vendor/OpenVoice/checkpoints_v2/
mkdir -p voice-samples
# Put a permitted reference sample at voice-samples/user-reference.wav,
# or capture one from Discord with !voice-clone capture.
python3 scripts/openvoice_smoke.py
```

Then set:

```bash
TTS_BACKEND="openvoice"
OPENVOICE_REF_AUDIO="./voice-samples/user-reference.wav"
OPENVOICE_PROGRESS="0"
```

Only clone voices you own or have permission to use. If OpenVoice fails or times out, VerbalCoding falls back to Edge TTS.

## Optional Supertonic TTS

```bash
./scripts/setup_supertonic.sh
supertonic tts '안녕하세요. 수퍼토닉 테스트입니다.' --lang ko --voice M1 --steps 2 --speed 1.0 -o /tmp/verbalcoding-supertonic.wav
```

Then set:

```bash
TTS_BACKEND="supertonic"
SUPERTONIC_COMMAND="./.venv-supertonic/bin/supertonic"
SUPERTONIC_VOICE="M1"
SUPERTONIC_LANGUAGE="ko"
SUPERTONIC_STEPS="2"
SUPERTONIC_SPEED="1.0"
SUPERTONIC_PROGRESS="0"
```

If Supertonic is missing, fails, or times out, VerbalCoding falls back to Edge TTS.
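
The fallback behavior described for OpenVoice and Supertonic can be pictured as a timeout-guarded call with a catch-all. In this sketch, `withTimeout` and `synthesizeWithFallback` are hypothetical names, and `primary` and `fallback` stand in for whatever async functions turn text into audio; it illustrates the pattern, not the package's actual TTS code.

```javascript
// Reject if the promise does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// `primary` and `fallback` are hypothetical async functions: (text) => audio.
async function synthesizeWithFallback(text, primary, fallback, timeoutMs) {
  try {
    const audio = await withTimeout(primary(text), timeoutMs);
    return { backend: "primary", audio };
  } catch (err) {
    // A missing binary, a synthesis error, and a timeout all land here.
    const audio = await fallback(text);
    return { backend: "fallback", audio, reason: err.message };
  }
}
```

Keeping the failure reason in the result makes it possible to log why the fallback was used without interrupting playback.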

## Optional SpeechSwift / CosyVoice TTS

On Apple Silicon, `speech-swift` is a local backend for Korean voice cloning with MLX-native CosyVoice/Qwen3-TTS.

```bash
brew tap soniqo/speech https://github.com/soniqo/speech-swift
brew install speech
```

Recommended env:

```bash
TTS_BACKEND="speechswift"
SPEECHSWIFT_MODE="server"
SPEECHSWIFT_ENGINE="cosyvoice"
SPEECHSWIFT_LANGUAGE="korean"
SPEECHSWIFT_REF_AUDIO="./voice-samples/user-reference.wav"
SPEECHSWIFT_SERVER_HOST="127.0.0.1"
SPEECHSWIFT_SERVER_PORT="18080"
SPEECHSWIFT_SERVER_URL="http://127.0.0.1:18080"
SPEECHSWIFT_PROGRESS="0"
```

Keep Edge for quick progress/backchannel prompts.

## Operational Notes

- The bot needs the privileged Message Content intent enabled in Discord for text commands.
- The bot needs permission to connect and speak in voice channels.
- For Hermes Agent, configure and authenticate Hermes normally (`hermes setup`, `hermes login`, etc.) on your default profile.
- For Claude Code, Codex, Gemini, OpenCode, and OpenClaw, install and authenticate those CLIs separately.
- If a CLI emits diff/code output on timeout or signal failure, the bridge avoids reading it aloud and sends detailed text instead.
@@ -0,0 +1,193 @@
# Fresh install

This guide is for a clean public install. It avoids local-only assumptions and uses the installer to bootstrap as much as possible.

## 1. Install the CLI

Recommended npm path:

```bash
npm install -g verbalcoding
```

Or run the published package directly:

```bash
npx verbalcoding setup --yes
```

If you used `npm install -g`, continue with:

```bash
vc setup --yes
```

Contributor GitHub clone path:

```bash
git clone https://github.com/ca1773130n/VerbalCoding.git
cd VerbalCoding
./scripts/install.sh --yes
```

## 2. Bootstrap dependencies and run the setup wizard

The npm commands above run the same bootstrapper as the clone install. For a clone, run:

```bash
./scripts/install.sh --yes
```

What this does:

- installs npm dependencies when `node_modules/` is missing,
- installs the short `vc` shell command with `npm link`,
- installs `ffmpeg`, Node/npm, and `whisper-cli` when supported by the OS package manager,
- downloads `models/ggml-small-q5_1.bin`,
- creates `.venv-tts` and installs `edge-tts` when `edge-tts` is not already on `PATH`,
- runs the interactive `.env` wizard.

Supported system bootstrap paths:

| OS | System dependency path |
|---|---|
| macOS | Homebrew: `brew install node ffmpeg whisper-cpp` as needed |
| Debian/Ubuntu | `apt-get` for Node/npm, ffmpeg, Python, build tools; local whisper.cpp build fallback |
| Fedora/RHEL | `dnf` for Node/npm, ffmpeg, Python, build tools; local whisper.cpp build fallback |
| Arch | `pacman` for Node/npm, ffmpeg, Python, build tools; local whisper.cpp build fallback |

Useful installer variants:

```bash
vc setup --yes --no-wizard              # dependency/bootstrap only from npm install
./scripts/install.sh --yes --no-wizard  # dependency/bootstrap only from a clone
./scripts/install.sh --skip-system      # do not install OS packages
./scripts/install.sh --skip-model       # do not download the default STT model
./scripts/install.sh --skip-edge-tts    # do not create .venv-tts
VERBALCODING_SKIP_CLI_LINK=1 ./scripts/install.sh --yes
```

If your OS is unsupported, install these manually before rerunning:

- Node.js 20+ and npm
- ffmpeg
- Python 3 with venv/pip
- whisper.cpp `whisper-cli`
- one authenticated CLI agent backend, Hermes Agent by default

## 3. Discord application setup

1. Create a Discord application and bot in the Discord Developer Portal.
2. Enable the Message Content privileged intent.
3. Copy the bot token into the installer prompt or `.env` as `DISCORD_BOT_TOKEN`.
4. Generate an invite URL:

```bash
vc bot invite <discord-client-id>
# or pin it to one server:
vc bot invite <discord-client-id> --guild <guild-id>
```

The invite includes bot and slash-command scopes plus text/voice permissions used by VerbalCoding.

## 4. Verify

```bash
vc doctor
```

`vc doctor` is redacted: it reports missing tokens/commands/models without printing secret values. Fix every `✗` item, then rerun it.
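
The idea behind a redacted check is that it reports whether a secret is present, never what it is. The sketch below shows one way to do that; `describeSecret` and `redact` are hypothetical helpers written for this guide, and their output format is only loosely modeled on the doctor output shown in this section.

```javascript
// Hypothetical check line: states presence of a secret without echoing its value.
function describeSecret(name, value) {
  const configured = typeof value === "string" && value.trim() !== "";
  return configured ? `✓ ${name} configured [REDACTED]` : `✗ ${name} missing`;
}

// Hypothetical scrubber: replaces every occurrence of a known secret in free text,
// e.g. before an error message from a child process is printed or logged.
function redact(text, secrets) {
  let out = text;
  for (const secret of secrets) {
    if (secret) out = out.split(secret).join("[REDACTED]");
  }
  return out;
}
```

For example, `describeSecret("Discord bot token", process.env.DISCORD_BOT_TOKEN)` produces a status line that is safe to paste into a bug report.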

Expected success includes:

```text
✓ Node.js
✓ npm
✓ ffmpeg
✓ whisper-cli
✓ whisper.cpp model
✓ Discord bot token configured — [REDACTED]
✓ edge-tts
✓ hermes CLI
Doctor passed. Run vc start to start VerbalCoding.
```

If the installer created a local Edge TTS helper, `.env` should contain an `EDGE_TTS_COMMAND` path pointing at `.venv-tts/bin/edge-tts`.

## 5. Run the single default bot

```bash
vc start
# or, from a GitHub clone:
./run.sh
```

Successful startup logs include:

```text
Logged in as <bot-name>
Listening in voice channel <server> / <channel>
```

In Discord:

```text
!ping
!join
!ask say hello briefly
!verbose on
```

Then speak in the configured voice channel. You should see STT text, progress text when verbose mode is on, a final text answer, and hear TTS playback.

## 6. Project-per-room setup

For one permanent bot per project voice room, create one Discord application per project, then:

```bash
vc instance setup my-project
vc bot invite <that-project-client-id>
vc instance start my-project
vc instance status my-project
```

Each instance writes an ignored `instances/<name>.env` with its own token, voice channel, transcript target, log path, Hermes session file, and optional Hermes profile.

## 7. Optional OpenVoice setup

OpenVoice voice cloning is optional. Keep `TTS_BACKEND=edge` for a fresh public install. To enable OpenVoice later:

```bash
./scripts/setup_openvoice.sh
# Download OpenVoice V2 checkpoints into vendor/OpenVoice/checkpoints_v2/
# Add a permitted local sample at voice-samples/user-reference.wav,
# or run the bot, say "목소리 샘플 녹음 시작해" ("start recording a voice sample"),
# then speak for 10-30 seconds.
python3 scripts/openvoice_smoke.py
```

Then set `TTS_BACKEND=openvoice`, run `vc doctor`, and test `!voice-test <text>` in Discord.

## 8. Clean clone smoke test for maintainers

Fast host-only smoke test:

```bash
TMPDIR=$(mktemp -d)
git clone https://github.com/ca1773130n/VerbalCoding.git "$TMPDIR/VerbalCoding"
cd "$TMPDIR/VerbalCoding"
./scripts/install.sh --yes --no-wizard
npm pack --dry-run
cp .env.example .env
chmod 600 .env
vc doctor || true
```

The expected failure at this point is missing local secrets or unauthenticated agent CLI, not leaked tokens or missing install scripts.

Docker-based Ubuntu clean install smoke test:

```bash
./scripts/docker_ubuntu_smoke.sh
```

This runs `ubuntu:24.04`, copies the tracked repository tree into a clean container, runs `./scripts/install.sh --yes --no-wizard`, writes a non-secret smoke `.env`, checks `vc`, runs Node tests, and verifies `vc doctor`. It does not connect to Discord voice; use a real Ubuntu VM or WSL2 after this if you need an end-to-end voice-channel test.
@@ -0,0 +1,183 @@
# Multi-instance VerbalCoding

VerbalCoding can run multiple independent Discord voice bridge processes. Each process is still the existing single-instance Node bridge, but it loads a different `instances/<name>.env` file and uses a different Discord bot token.

Use this when each project should permanently occupy its own Discord voice channel and write to its own transcript channel/thread.

## Why multiple bot tokens are required

Discord voice residency is effectively one active voice connection per bot account per guild. If one bot token joins another voice channel in the same guild, it cannot also remain permanently connected to the previous channel. For simultaneous project rooms, create one Discord application/bot per project.

## File layout

```text
instances/
  README.md
  example.env
  llm-wiki.env        # local only, ignored by git
  verbalcoding.env    # local only, ignored by git
.run/instances/
  llm-wiki.pid        # runtime only, ignored by git
```

Real `instances/*.env` files are ignored because they may contain Discord tokens. `instances/example.env` is the committed template.

## Instance setup wizard

Users should not copy and manually edit env files for normal use. Run the wizard instead:

```bash
vc instance setup llm-wiki
# or through the project setup script:
./scripts/install.sh --instance llm-wiki
```

The wizard prompts for the bot token, Discord Application/Client ID, voice channel, transcript target, workdir, project context, and isolated runtime paths. It writes `instances/<name>.env` with mode `0600`, backs up an existing file before overwriting it, and prints the next start/status commands.

If you enter the Discord Application/Client ID during setup, the summary also prints the invite URL for that bot. You can generate the same URL any time with:

```bash
vc bot invite <client-id>
vc bot invite <client-id> --guild <guild-id>
```

Discord still requires one Developer Portal application/bot per simultaneous voice room, but this avoids manually building OAuth URLs or permission integers.

### Hermes profile isolation

Each instance gets its own Hermes home at `~/.hermes/profiles/<name>` so that
memory, MEMORY.md, SOUL.md, and learned skills do not leak across projects.

`vc instance setup <name>` automatically:

- runs `hermes profile create <name> --clone-from default` (carries API keys
  and model from your current `~/.hermes`; sessions and memory start fresh),
- sets the new profile's `terminal.cwd` to the instance workdir,
- seeds `<profile>/SOUL.md` from the wizard's project-context answer,
- writes `HERMES_HOME=...` into `instances/<name>.env`.

`vc instance start <name>` self-heals: if the env points at a Hermes profile
dir that no longer exists, the start command recreates it before launching.

Instance names must match `^[a-z0-9][a-z0-9_-]{0,63}$` because Hermes uses the
name as a directory and config key.
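
A small sketch of that naming rule follows. The regular expression is copied from the paragraph above; the function names `isValidInstanceName` and `profileDirFor` and the `home` parameter are illustrative assumptions.

```javascript
// Pattern from the documentation: 1 to 64 characters, lowercase letters, digits,
// underscore, or hyphen, and the first character must be a letter or digit.
const INSTANCE_NAME_RE = /^[a-z0-9][a-z0-9_-]{0,63}$/;

function isValidInstanceName(name) {
  return typeof name === "string" && INSTANCE_NAME_RE.test(name);
}

// Hypothetical helper: builds the profile directory path described above.
function profileDirFor(name, home) {
  if (!isValidInstanceName(name)) {
    throw new Error(`Invalid instance name: ${JSON.stringify(name)}`);
  }
  return `${home}/.hermes/profiles/${name}`;
}
```

Validating before building the path also rules out names such as `../other`, which would otherwise escape the profiles directory.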

## Minimal generated instance env

```env
INSTANCE_NAME=my-project
DISCORD_TOKEN=replace-with-bot-token
DISCORD_CLIENT_ID=123456789012345678
AUTO_JOIN_VOICE_CHANNELS=Project Room
TRANSCRIPT_CHANNEL_ID=123456789012345678
PROJECT_SESSIONS_FILE=config/project-sessions.my-project.json
BRIDGE_LOG_PATH=/tmp/verbalcoding-my-project.log
NODE_AUDIO_DEBUG_DIR=/tmp/verbalcoding-my-project-debug
HERMES_SESSION_FILE=.agent-sessions/hermes/my-project.session
HERMES_HOME=/home/you/.hermes/profiles/my-project
AGENT_LABEL=VerbalCoding · My Project
AGENT_CWD=/path/to/my-project
AGENT_PROJECT_CONTEXT=Project session: My Project
```

Give every instance unique values for log/debug/session files. `HERMES_HOME` and the matching `~/.hermes/profiles/<name>` directory are created automatically by `vc instance setup`. `vc doctor` checks for duplicate tokens, colliding runtime paths, missing profile directories, and `terminal.cwd` mismatches between profile and instance — all without printing secrets.
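
The duplicate-token and colliding-path checks can be sketched as a simple "group by value" pass over the parsed env files. Everything here is an assumption for illustration: the function name `findCollisions`, the input shape, and the choice of which keys must be unique (taken from the generated env shown above). The real doctor may check more or different keys.

```javascript
// Keys whose values should differ between instances (assumed list).
const UNIQUE_KEYS = [
  "DISCORD_TOKEN",
  "PROJECT_SESSIONS_FILE",
  "BRIDGE_LOG_PATH",
  "NODE_AUDIO_DEBUG_DIR",
  "HERMES_SESSION_FILE",
  "HERMES_HOME",
];
const SECRET_KEYS = new Set(["DISCORD_TOKEN"]);

// instances: { [instanceName]: { KEY: "value", ... } }
// Returns one human-readable finding per shared value, with secrets masked.
function findCollisions(instances) {
  const findings = [];
  for (const key of UNIQUE_KEYS) {
    const owners = new Map(); // value -> names of the instances using it
    for (const [name, env] of Object.entries(instances)) {
      const value = env[key];
      if (!value) continue;
      if (!owners.has(value)) owners.set(value, []);
      owners.get(value).push(name);
    }
    for (const [value, names] of owners) {
      if (names.length < 2) continue;
      const shown = SECRET_KEYS.has(key) ? "[REDACTED]" : value;
      findings.push(`${key} shared by ${names.join(", ")}: ${shown}`);
    }
  }
  return findings;
}
```

An empty result means no two instances share a token or runtime path among the keys checked.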

## Commands

```bash
vc instance list
vc instance status
vc instance status my-project
vc instance start my-project
vc instance stop my-project
vc instance restart my-project
```

`start` runs `./run.sh instances/<name>.env` detached and writes `.run/instances/<name>.pid`.

`stop` sends `SIGTERM`, waits up to 10 seconds, then falls back to `SIGKILL` and removes the pid file.
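
The graceful-then-forceful stop sequence can be sketched as follows. The function names, the polling interval, and the injectable `kill` parameter are assumptions added for illustration and testability; only the SIGTERM, 10-second wait, and SIGKILL order comes from the description above. Removing the pid file afterwards is left out of the sketch.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Signal 0 performs an existence check without delivering a signal.
function isAlive(pid, kill = process.kill.bind(process)) {
  try {
    kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === "EPERM"; // process exists but belongs to another user
  }
}

// Hypothetical stop routine: SIGTERM, poll until the grace period ends, then SIGKILL.
async function stopProcess(pid, options = {}) {
  const { graceMs = 10000, pollMs = 200, kill = process.kill.bind(process) } = options;
  if (!isAlive(pid, kill)) return "not-running";
  kill(pid, "SIGTERM");
  const deadline = Date.now() + graceMs;
  while (Date.now() < deadline) {
    if (!isAlive(pid, kill)) return "terminated";
    await sleep(pollMs);
  }
  if (!isAlive(pid, kill)) return "terminated";
  kill(pid, "SIGKILL");
  return "killed";
}
```

Polling instead of sleeping for the full grace period lets the command return as soon as the bridge exits cleanly.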

## Example: two permanent voice rooms

1. Create two Discord applications/bots:
   - VerbalCoding bot
   - LLM-Wiki bot

2. Invite both to the server with text and voice permissions:
   - View Channel
   - Send Messages
   - Send Messages in Threads
   - Read Message History
   - Use Application Commands
   - Connect
   - Speak

   Use `vc bot invite <client-id>` after creating each Discord application to print the exact invite URL with those permissions.

3. Run the setup wizard for each local instance:

   ```bash
   vc instance setup verbalcoding
   vc instance setup llm-wiki
   ```

   The wizard writes ignored `instances/verbalcoding.env` and `instances/llm-wiki.env` files with mode `0600`; it also backs up an existing instance env before replacing it. Each run also creates `~/.hermes/profiles/<name>` cloned from your default Hermes home, so the two instances start with the same auth/model but accumulate independent memory and skills as they learn each project.

4. Check config:

   ```bash
   vc doctor
   ```

5. Start both:

   ```bash
   vc instance start verbalcoding
   vc instance start llm-wiki
   vc instance status
   ```

6. Verify logs:

   ```bash
   tail -n 50 /tmp/verbalcoding-verbalcoding.log
   tail -n 50 /tmp/verbalcoding-llm-wiki.log
   ```

   Expected log lines:

   ```text
   Listening in voice channel ... / VerbalCoding
   Listening in voice channel ... / LLM-Wiki
   ```

7. Stop both:

   ```bash
   vc instance stop verbalcoding
   vc instance stop llm-wiki
   ```

## Short-term single-bot text/voice binding

If you only have one bot token, use project-session voice binding instead of simultaneous multi-channel residency.

Run this in the target text channel/thread:

```text
!session attach-voice --voice "LLM-Wiki"
```

Behavior:

- Binds the selected voice channel to the current text channel/thread.
- If the current text channel has no project session, creates an ad-hoc isolated session.
- Voice STT/result/progress/final-answer text routes to that active project transcript target.

To attach an existing named project session:

```text
!session voice llm-wiki --voice "LLM-Wiki"
```

This is convenient for routing, but it does not make one bot stay in two voice channels at the same time. Use multiple bot tokens/processes for simultaneous permanent residency.
package/docs/RELEASE.md
ADDED
@@ -0,0 +1,72 @@
# VerbalCoding release notes

## Current release candidate

VerbalCoding is a Discord voice bridge for controlling CLI-based coding agents by voice. It is public-release oriented, with macOS / Apple Silicon as the most tested path and best-effort Linux bootstrap support for common package managers.

### Included

- Discord voice receive via Node `@discordjs/voice`.
- Local Korean STT via `whisper.cpp` + Metal.
- Edge TTS playback with Korean default voice.
- Generic CLI harness adapter layer:
  - Hermes Agent
  - Claude Code
  - Codex CLI
  - Gemini CLI
  - OpenCode
  - OpenClaw
  - custom command
- Shared voice/text session support for Hermes backend.
- Long-answer TTS chunking and responsive barge-in.
- Diff/code/log guardrails so large technical output is not read aloud.
- Normal and conservative sensitivity modes for indoor vs. noisy/outdoor use.
- Setup wizard, `.env.example`, `vc doctor` prerequisite checker, and `./scripts/install.sh --yes` bootstrap for OS packages, npm dependencies, Edge TTS helper, and the default whisper.cpp model.
- Optional verbose progress mode for text-only middle-step updates during long agent work.
- Always-on JSONL latency metrics plus `!latency` / `!metrics` summary for pipeline optimization.
- Lower default utterance idle wait (`UTTERANCE_IDLE_MS=2000`) so STT starts about 0.6s sooner after speech ends.
- Multi-instance Hermes profile isolation: `vc instance setup <name>` auto-clones a Hermes profile to `~/.hermes/profiles/<name>` with the instance workdir, seeds SOUL.md, and writes `HERMES_HOME` into the instance env so per-project memory and skills stay separate; `vc instance start` self-heals a missing profile, and `vc doctor` checks profile-dir presence and `terminal.cwd` consistency.
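
To give a feel for the long-answer TTS chunking item in the list above, the sketch below splits an answer at sentence boundaries and packs sentences into chunks no longer than a character limit. The function name `chunkForTts` and the splitting strategy are assumptions for illustration; the default of 495 mirrors the `TTS_MAX_CHARS` value in the example configuration, and the package's own chunker may behave differently.

```javascript
// Hypothetical chunker: packs whole sentences into chunks of at most maxChars.
function chunkForTts(text, maxChars = 495) {
  // Split after sentence-ending punctuation, keeping the punctuation attached.
  const sentences = text
    .split(/(?<=[.!?。？！])\s+/)
    .map((s) => s.trim())
    .filter(Boolean);

  const chunks = [];
  let current = "";
  for (let sentence of sentences) {
    // A single sentence longer than the limit is hard-split.
    while (sentence.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(sentence.slice(0, maxChars));
      sentence = sentence.slice(maxChars);
    }
    if (!sentence) continue;
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Splitting at sentence boundaries keeps each synthesized clip natural-sounding, and short chunks give barge-in a place to stop playback between clips.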

### Pre-release checklist

Run from the repo root:

```bash
./scripts/install.sh --yes --no-wizard
./scripts/docker_ubuntu_smoke.sh  # requires Docker; validates ubuntu:24.04 clean install
node --check app-node/main.mjs app-node/agent_adapters.mjs app-node/install_config.mjs scripts/install.mjs
npm test
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest tests/ -q || [ $? -eq 5 ]  # ok when no Python tests exist
bash -n run.sh scripts/install.sh scripts/bootstrap_prereqs.sh scripts/docker_ubuntu_smoke.sh
vc doctor
git diff --check
```

Manual smoke test:

1. Start the bridge with `./run.sh`.
2. Verify the log contains `Logged in as Hermes#6718` (or your own bot's tag).
3. Verify the log contains `Listening in voice channel ... / 일반` or the configured default channel.
4. In Discord, run `!ping`.
5. In Discord voice, say a short Korean request.
6. Verify STT transcript, agent response, TTS playback, and barge-in behavior.

### Known requirements

- macOS with Homebrew, or Linux with `apt`, `dnf`, or `pacman` for best-effort bootstrap.
- `ffmpeg`; the installer attempts to install it.
- `whisper-cli`; the installer uses Homebrew on macOS or a local `vendor/whisper.cpp` build fallback on Linux.
- Default model at `models/ggml-small-q5_1.bin`; the installer downloads it unless `--skip-model` is used.
- Edge TTS CLI on `PATH` or local `.venv-tts/bin/edge-tts`; the installer creates the local helper when needed.
- Discord bot token in `.env`, `instances/<name>.env`, `~/.zshrc`, or runtime env.
- Selected CLI harness installed and authenticated.

### Not for public release yet

Before public release, consider adding:

- GitHub Actions CI.
- Demo video / GIF.
- Discord bot setup screenshots.
- Broader Linux validation on real distributions beyond script-level checks.
- Security review of all logging paths.