verbalcoding 0.2.11 → 0.2.13

Files changed (235)
  1. package/.env.example +98 -2
  2. package/README.es.md +134 -0
  3. package/README.fr.md +134 -0
  4. package/README.ja.md +134 -0
  5. package/README.ko.md +134 -0
  6. package/README.md +118 -74
  7. package/README.ru.md +134 -0
  8. package/README.zh.md +133 -0
  9. package/app-node/agent_adapters.mjs +37 -5
  10. package/app-node/agent_adapters.test.mjs +27 -1
  11. package/app-node/agent_detect.mjs +73 -0
  12. package/app-node/agent_detect.test.mjs +77 -0
  13. package/app-node/agent_routing.mjs +148 -0
  14. package/app-node/agent_routing.test.mjs +138 -0
  15. package/app-node/agent_turn.mjs +86 -0
  16. package/app-node/agent_turn.test.mjs +109 -0
  17. package/app-node/bridge_context.mjs +73 -0
  18. package/app-node/bridge_context.test.mjs +54 -0
  19. package/app-node/bridge_state.mjs +4 -0
  20. package/app-node/bridge_wireup.test.mjs +462 -0
  21. package/app-node/cli_install.test.mjs +31 -0
  22. package/app-node/cross_agent_routing.test.mjs +78 -0
  23. package/app-node/discord_command_router.mjs +204 -0
  24. package/app-node/discord_command_router.test.mjs +311 -0
  25. package/app-node/discord_voice_setup.mjs +251 -0
  26. package/app-node/discord_voice_setup.test.mjs +86 -0
  27. package/app-node/hermes_profiles.test.mjs +12 -1
  28. package/app-node/install_config.mjs +113 -3
  29. package/app-node/install_config.test.mjs +8 -0
  30. package/app-node/instance_doctor.test.mjs +9 -0
  31. package/app-node/instances.test.mjs +8 -1
  32. package/app-node/main.mjs +513 -1058
  33. package/app-node/mcp_tools.test.mjs +7 -0
  34. package/app-node/notification_handler.mjs +89 -0
  35. package/app-node/notification_handler.test.mjs +187 -0
  36. package/app-node/notify.mjs +73 -0
  37. package/app-node/notify.test.mjs +68 -0
  38. package/app-node/plan_dispatcher.mjs +215 -0
  39. package/app-node/plan_dispatcher.test.mjs +101 -0
  40. package/app-node/plan_mode.mjs +203 -0
  41. package/app-node/plan_mode.test.mjs +231 -0
  42. package/app-node/progress_handler.mjs +220 -0
  43. package/app-node/progress_handler.test.mjs +193 -0
  44. package/app-node/progress_speech.mjs +54 -32
  45. package/app-node/progress_speech.test.mjs +12 -3
  46. package/app-node/project_sessions.mjs +5 -2
  47. package/app-node/project_sessions.test.mjs +7 -0
  48. package/app-node/research_mode.mjs +282 -0
  49. package/app-node/research_mode.test.mjs +264 -0
  50. package/app-node/restart_notice.mjs +3 -0
  51. package/app-node/restart_notice.test.mjs +11 -0
  52. package/app-node/session_ontology.mjs +271 -0
  53. package/app-node/session_ontology.test.mjs +130 -0
  54. package/app-node/smart_progress.mjs +94 -0
  55. package/app-node/smart_progress.test.mjs +66 -0
  56. package/app-node/stream_sentencer.mjs +91 -0
  57. package/app-node/stream_sentencer.test.mjs +129 -0
  58. package/app-node/streaming_tts_queue.mjs +52 -0
  59. package/app-node/streaming_tts_queue.test.mjs +64 -0
  60. package/app-node/stt_whisper.mjs +24 -0
  61. package/app-node/stt_whisper.test.mjs +32 -0
  62. package/app-node/text_routing.mjs +22 -0
  63. package/app-node/text_routing.test.mjs +23 -1
  64. package/app-node/tts_backends.mjs +537 -3
  65. package/app-node/tts_backends.test.mjs +454 -0
  66. package/app-node/tts_player.mjs +164 -0
  67. package/app-node/tts_player.test.mjs +202 -0
  68. package/app-node/tts_runtime.mjs +134 -0
  69. package/app-node/tts_runtime.test.mjs +89 -0
  70. package/app-node/tts_settings.mjs +150 -3
  71. package/app-node/tts_settings.test.mjs +204 -0
  72. package/app-node/tts_voice_config.mjs +136 -2
  73. package/app-node/tts_voice_config.test.mjs +94 -0
  74. package/app-node/utterance_router.mjs +216 -0
  75. package/app-node/utterance_router.test.mjs +236 -0
  76. package/app-node/voice_autojoin.mjs +37 -0
  77. package/app-node/voice_autojoin.test.mjs +59 -0
  78. package/app-node/voice_io.mjs +272 -0
  79. package/app-node/voice_io.test.mjs +102 -0
  80. package/app-node/voice_turn_runner.mjs +449 -0
  81. package/app-node/voice_turn_runner.test.mjs +289 -0
  82. package/docs/CONFIGURATION.md +79 -96
  83. package/docs/FRESH_INSTALL.md +105 -63
  84. package/docs/HARNESSES.md +58 -0
  85. package/docs/HARNESS_AIDER.md +50 -0
  86. package/docs/HARNESS_CLAUDE.md +56 -0
  87. package/docs/HARNESS_CODEX.md +56 -0
  88. package/docs/HARNESS_CURSOR.md +45 -0
  89. package/docs/HARNESS_GEMINI.md +45 -0
  90. package/docs/HARNESS_HERMES.md +57 -0
  91. package/docs/HARNESS_OPENCLAW.md +44 -0
  92. package/docs/HARNESS_OPENCODE.md +44 -0
  93. package/docs/HERMES_VOICE.md +65 -0
  94. package/docs/MULTI_INSTANCE.md +16 -0
  95. package/docs/README.md +50 -0
  96. package/docs/RELEASE.md +42 -19
  97. package/docs/ROADMAP.md +53 -0
  98. package/docs/TROUBLESHOOTING.md +126 -0
  99. package/docs/TTS_BACKENDS.md +227 -0
  100. package/docs/USAGE.md +94 -40
  101. package/docs/assets/figures/verbalcoding-flow.svg +1 -1
  102. package/docs/i18n/AGENTS.es.md +34 -0
  103. package/docs/i18n/AGENTS.fr.md +34 -0
  104. package/docs/i18n/AGENTS.ja.md +34 -0
  105. package/docs/i18n/AGENTS.ko.md +34 -0
  106. package/docs/i18n/AGENTS.ru.md +34 -0
  107. package/docs/i18n/AGENTS.zh.md +34 -0
  108. package/docs/i18n/CONFIGURATION.es.md +25 -0
  109. package/docs/i18n/CONFIGURATION.fr.md +25 -0
  110. package/docs/i18n/CONFIGURATION.ja.md +25 -0
  111. package/docs/i18n/CONFIGURATION.ko.md +25 -0
  112. package/docs/i18n/CONFIGURATION.ru.md +25 -0
  113. package/docs/i18n/CONFIGURATION.zh.md +25 -0
  114. package/docs/i18n/FRESH_INSTALL.es.md +27 -2
  115. package/docs/i18n/FRESH_INSTALL.fr.md +27 -2
  116. package/docs/i18n/FRESH_INSTALL.ja.md +27 -2
  117. package/docs/i18n/FRESH_INSTALL.ko.md +27 -2
  118. package/docs/i18n/FRESH_INSTALL.ru.md +27 -2
  119. package/docs/i18n/FRESH_INSTALL.zh.md +27 -2
  120. package/docs/i18n/HARNESSES.es.md +58 -0
  121. package/docs/i18n/HARNESSES.fr.md +58 -0
  122. package/docs/i18n/HARNESSES.ja.md +58 -0
  123. package/docs/i18n/HARNESSES.ko.md +58 -0
  124. package/docs/i18n/HARNESSES.ru.md +58 -0
  125. package/docs/i18n/HARNESSES.zh.md +58 -0
  126. package/docs/i18n/HARNESS_AIDER.es.md +48 -0
  127. package/docs/i18n/HARNESS_AIDER.fr.md +48 -0
  128. package/docs/i18n/HARNESS_AIDER.ja.md +50 -0
  129. package/docs/i18n/HARNESS_AIDER.ko.md +50 -0
  130. package/docs/i18n/HARNESS_AIDER.ru.md +48 -0
  131. package/docs/i18n/HARNESS_AIDER.zh.md +48 -0
  132. package/docs/i18n/HARNESS_CLAUDE.es.md +55 -0
  133. package/docs/i18n/HARNESS_CLAUDE.fr.md +55 -0
  134. package/docs/i18n/HARNESS_CLAUDE.ja.md +56 -0
  135. package/docs/i18n/HARNESS_CLAUDE.ko.md +56 -0
  136. package/docs/i18n/HARNESS_CLAUDE.ru.md +55 -0
  137. package/docs/i18n/HARNESS_CLAUDE.zh.md +56 -0
  138. package/docs/i18n/HARNESS_CODEX.es.md +55 -0
  139. package/docs/i18n/HARNESS_CODEX.fr.md +55 -0
  140. package/docs/i18n/HARNESS_CODEX.ja.md +56 -0
  141. package/docs/i18n/HARNESS_CODEX.ko.md +56 -0
  142. package/docs/i18n/HARNESS_CODEX.ru.md +55 -0
  143. package/docs/i18n/HARNESS_CODEX.zh.md +56 -0
  144. package/docs/i18n/HARNESS_CURSOR.es.md +42 -0
  145. package/docs/i18n/HARNESS_CURSOR.fr.md +42 -0
  146. package/docs/i18n/HARNESS_CURSOR.ja.md +45 -0
  147. package/docs/i18n/HARNESS_CURSOR.ko.md +45 -0
  148. package/docs/i18n/HARNESS_CURSOR.ru.md +42 -0
  149. package/docs/i18n/HARNESS_CURSOR.zh.md +42 -0
  150. package/docs/i18n/HARNESS_GEMINI.es.md +44 -0
  151. package/docs/i18n/HARNESS_GEMINI.fr.md +44 -0
  152. package/docs/i18n/HARNESS_GEMINI.ja.md +45 -0
  153. package/docs/i18n/HARNESS_GEMINI.ko.md +45 -0
  154. package/docs/i18n/HARNESS_GEMINI.ru.md +44 -0
  155. package/docs/i18n/HARNESS_GEMINI.zh.md +45 -0
  156. package/docs/i18n/HARNESS_HERMES.es.md +54 -0
  157. package/docs/i18n/HARNESS_HERMES.fr.md +54 -0
  158. package/docs/i18n/HARNESS_HERMES.ja.md +57 -0
  159. package/docs/i18n/HARNESS_HERMES.ko.md +57 -0
  160. package/docs/i18n/HARNESS_HERMES.ru.md +54 -0
  161. package/docs/i18n/HARNESS_HERMES.zh.md +57 -0
  162. package/docs/i18n/HARNESS_OPENCLAW.es.md +41 -0
  163. package/docs/i18n/HARNESS_OPENCLAW.fr.md +41 -0
  164. package/docs/i18n/HARNESS_OPENCLAW.ja.md +44 -0
  165. package/docs/i18n/HARNESS_OPENCLAW.ko.md +44 -0
  166. package/docs/i18n/HARNESS_OPENCLAW.ru.md +41 -0
  167. package/docs/i18n/HARNESS_OPENCLAW.zh.md +42 -0
  168. package/docs/i18n/HARNESS_OPENCODE.es.md +41 -0
  169. package/docs/i18n/HARNESS_OPENCODE.fr.md +41 -0
  170. package/docs/i18n/HARNESS_OPENCODE.ja.md +44 -0
  171. package/docs/i18n/HARNESS_OPENCODE.ko.md +44 -0
  172. package/docs/i18n/HARNESS_OPENCODE.ru.md +41 -0
  173. package/docs/i18n/HARNESS_OPENCODE.zh.md +44 -0
  174. package/docs/i18n/HERMES_VOICE.es.md +46 -0
  175. package/docs/i18n/HERMES_VOICE.fr.md +46 -0
  176. package/docs/i18n/HERMES_VOICE.ja.md +46 -0
  177. package/docs/i18n/HERMES_VOICE.ko.md +65 -0
  178. package/docs/i18n/HERMES_VOICE.ru.md +46 -0
  179. package/docs/i18n/HERMES_VOICE.zh.md +46 -0
  180. package/docs/i18n/MULTI_INSTANCE.es.md +25 -0
  181. package/docs/i18n/MULTI_INSTANCE.fr.md +25 -0
  182. package/docs/i18n/MULTI_INSTANCE.ja.md +25 -0
  183. package/docs/i18n/MULTI_INSTANCE.ko.md +25 -0
  184. package/docs/i18n/MULTI_INSTANCE.ru.md +25 -0
  185. package/docs/i18n/MULTI_INSTANCE.zh.md +25 -0
  186. package/docs/i18n/README.es.md +20 -134
  187. package/docs/i18n/README.fr.md +20 -134
  188. package/docs/i18n/README.ja.md +20 -134
  189. package/docs/i18n/README.ko.md +20 -133
  190. package/docs/i18n/README.ru.md +20 -134
  191. package/docs/i18n/README.zh.md +20 -133
  192. package/docs/i18n/RELEASE.es.md +26 -1
  193. package/docs/i18n/RELEASE.fr.md +26 -1
  194. package/docs/i18n/RELEASE.ja.md +26 -1
  195. package/docs/i18n/RELEASE.ko.md +26 -1
  196. package/docs/i18n/RELEASE.ru.md +26 -1
  197. package/docs/i18n/RELEASE.zh.md +26 -1
  198. package/docs/i18n/TROUBLESHOOTING.es.md +39 -0
  199. package/docs/i18n/TROUBLESHOOTING.fr.md +39 -0
  200. package/docs/i18n/TROUBLESHOOTING.ja.md +39 -0
  201. package/docs/i18n/TROUBLESHOOTING.ko.md +39 -0
  202. package/docs/i18n/TROUBLESHOOTING.ru.md +39 -0
  203. package/docs/i18n/TROUBLESHOOTING.zh.md +39 -0
  204. package/docs/i18n/USAGE.es.md +25 -0
  205. package/docs/i18n/USAGE.fr.md +25 -0
  206. package/docs/i18n/USAGE.ja.md +25 -0
  207. package/docs/i18n/USAGE.ko.md +25 -0
  208. package/docs/i18n/USAGE.ru.md +25 -0
  209. package/docs/i18n/USAGE.zh.md +25 -0
  210. package/docs/superpowers/plans/2026-05-13-phase1-streaming-pipeline.md +122 -0
  211. package/docs/superpowers/plans/2026-05-13-phase10-push-notifications.md +152 -0
  212. package/docs/superpowers/plans/2026-05-13-phase2-agent-adapters.md +242 -0
  213. package/docs/superpowers/plans/2026-05-13-phase6-smart-progress.md +172 -0
  214. package/docs/superpowers/plans/2026-05-13-phase7-voice-plan-mode.md +108 -0
  215. package/docs/superpowers/plans/2026-05-14-cross-agent-voice-transfer.md +625 -0
  216. package/docs/superpowers/plans/2026-05-21-audio-overview-narrated-diffs.md +95 -0
  217. package/docs/superpowers/plans/2026-05-21-autoresearch-ontology.md +83 -0
  218. package/docs/superpowers/plans/2026-05-21-phase11-push-to-talk-wakeword-v2.md +77 -0
  219. package/docs/superpowers/plans/2026-05-21-phase12-multi-user-voice.md +147 -0
  220. package/docs/superpowers/plans/2026-05-21-phase14-verbalbench.md +136 -0
  221. package/docs/superpowers/plans/2026-05-21-phase15-phone-companion.md +72 -0
  222. package/integrations/fireredtts2/mlx_llm.py +183 -0
  223. package/integrations/fireredtts2/synth.py +156 -0
  224. package/integrations/fireredtts2/synth_mlx.py +196 -0
  225. package/integrations/mlxaudio/synth.py +74 -0
  226. package/integrations/neuttsair/synth.py +104 -0
  227. package/integrations/omnivoice/synth.py +110 -0
  228. package/package.json +7 -1
  229. package/scripts/cli.mjs +88 -3
  230. package/scripts/doctor.mjs +115 -4
  231. package/scripts/install.mjs +20 -2
  232. package/scripts/install_fireredtts2.sh +109 -0
  233. package/scripts/install_mlxaudio.sh +34 -0
  234. package/scripts/install_mossttsnano.sh +46 -0
  235. package/scripts/postinstall.mjs +34 -0
package/docs/HARNESS_OPENCLAW.md ADDED
@@ -0,0 +1,44 @@
+ # OpenClaw — Harness Notes
+
+ <p align="center">
+ <a href="../README.md">README</a> ·
+ <a href="HARNESSES.md">Harnesses</a> ·
+ <a href="USAGE.md">Usage</a> ·
+ <a href="CONFIGURATION.md">Configuration</a>
+ </p>
+
+ OpenClaw is an open-source terminal coding agent. VerbalCoding drives it through `openclaw run`.
+
+ ## Install
+
+ Follow the upstream OpenClaw install guide. Confirm:
+
+ ```bash
+ openclaw run "hello"
+ ```
+
+ ## Configure VerbalCoding
+
+ ```bash
+ # .env
+ AGENT_BACKEND=openclaw
+ # optional
+ OPENCLAW_COMMAND="openclaw run" # default
+ AGENT_PROJECT_CONTEXT="..."
+ AGENT_WORKDIR=/Users/you/code/your-project
+ AGENT_CHAT_TIMEOUT_MS=45000
+ AGENT_TASK_TIMEOUT_MS=0
+ ```
+
+ ## Voice phrases to switch TO OpenClaw
+
+ - en: `"switch to OpenClaw"`, `"ask OpenClaw ..."`, `"switch to open claw"`
+ - ko: `"openclaw로 전환"`
+
+ The matcher accepts `openclaw` and `open claw`.
+
+ ## Gotchas
+
+ - **No session resume** in the default command. Add a resume flag via `OPENCLAW_COMMAND` if your build supports one.
+ - **Verbose progress.** Same as OpenCode — keyword-based labels unless `SMART_PROGRESS_API_KEY` is configured for the LLM summarizer.
+ - **Naming clash.** Both the parser alias `openclaw` and the user-facing label `OpenClaw` are distinct from `claude` / `claude code`; the strict-mode router won't conflate them.
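The alias rule in these notes (the matcher accepts both `openclaw` and `open claw`) can be pictured as a small normalization step: lowercase the transcript, then look for either the joined or the spoken two-word form. The sketch below is a hypothetical shell illustration, not VerbalCoding's router; it assumes plain substring matching is enough, and the function name `match_agent_alias` is invented here. The bridge's actual routing code may apply stricter rules.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not VerbalCoding's router: map a spoken transcript
# to a backend alias using case-insensitive substring checks.
match_agent_alias() {
  local text
  # Lowercase ASCII letters; Korean text has no case and passes through unchanged.
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$text" in
    *openclaw* | *"open claw"*) echo "openclaw" ;;
    *opencode* | *"open code"* | *오픈코드*) echo "opencode" ;;
    *) echo "none" ;;
  esac
}

match_agent_alias "Switch to Open Claw"     # prints: openclaw
match_agent_alias "opencode로 전환"          # prints: opencode
match_agent_alias "switch to claude code"   # prints: none
```

Because both patterns require the `open` prefix, a transcript such as `switch to claude code` falls through to `none`. That is the outcome the naming-clash gotcha describes, though how the real strict-mode router achieves it is not shown here.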
package/docs/HARNESS_OPENCODE.md ADDED
@@ -0,0 +1,44 @@
+ # OpenCode — Harness Notes
+
+ <p align="center">
+ <a href="../README.md">README</a> ·
+ <a href="HARNESSES.md">Harnesses</a> ·
+ <a href="USAGE.md">Usage</a> ·
+ <a href="CONFIGURATION.md">Configuration</a>
+ </p>
+
+ OpenCode is an open-source terminal coding agent. VerbalCoding drives it through `opencode run`.
+
+ ## Install
+
+ Follow the upstream OpenCode install guide. Confirm:
+
+ ```bash
+ opencode run "hello"
+ ```
+
+ ## Configure VerbalCoding
+
+ ```bash
+ # .env
+ AGENT_BACKEND=opencode
+ # optional
+ OPENCODE_COMMAND="opencode run" # default
+ AGENT_PROJECT_CONTEXT="..."
+ AGENT_WORKDIR=/Users/you/code/your-project
+ AGENT_CHAT_TIMEOUT_MS=45000
+ AGENT_TASK_TIMEOUT_MS=0
+ ```
+
+ ## Voice phrases to switch TO OpenCode
+
+ - en: `"switch to OpenCode"`, `"ask OpenCode ..."`, `"switch to open code"`
+ - ko: `"opencode로 전환"`, `"오픈코드로 전환"`
+
+ The matcher accepts `opencode` and `open code`.
+
+ ## Gotchas
+
+ - **No session resume** in the default command. If your OpenCode build supports a resume flag, append it via `OPENCODE_COMMAND="opencode run --resume"` (the adapter passes the prompt as the final positional arg).
+ - **Model choice.** Append `--model` flags via `OPENCODE_COMMAND` if your OpenCode build expects them.
+ - **Verbose progress.** Whatever events OpenCode prints on stdout/stderr get keyword-matched (file reads, web search, terminal); without `SMART_PROGRESS_API_KEY` the bridge falls back to those raw labels.
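The gotcha that the adapter passes the prompt as the final positional argument can be illustrated with a short shell sketch. It assumes the command string is split on whitespace only (no quoted arguments inside `OPENCODE_COMMAND`); the function name `build_agent_argv` is hypothetical, and `--resume` appears only because the gotcha mentions it, not because every OpenCode build accepts it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the real adapter: split a command string into
# words and append the prompt as one final positional argument.
build_agent_argv() {
  local command_string=$1 prompt=$2
  local -a argv
  # Plain whitespace splitting; quoted words inside the command string are NOT honored.
  read -r -a argv <<< "$command_string"
  argv+=("$prompt")
  printf '%s\n' "${argv[@]}"   # print one argv element per line
}

# With the default command:
build_agent_argv "${OPENCODE_COMMAND:-opencode run}" "hello"
# With a resume flag appended (only if your build supports one):
build_agent_argv "opencode run --resume" "fix the failing test"
```

Holding the prompt as a single array element keeps the spaces in a spoken request from being split into extra arguments; to run the command instead of printing it, expand the array quoted, as `"${argv[@]}"`.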
package/docs/HERMES_VOICE.md ADDED
@@ -0,0 +1,65 @@
+ # Hermes Built-in Voice vs VerbalCoding
+
+ <!-- readme-glow-up:intro -->
+ <p align="center">
+ <a href="../README.md">README</a> ·
+ <a href="README.md">Docs hub</a> ·
+ <a href="USAGE.md">Usage</a> ·
+ <a href="CONFIGURATION.md">Configuration</a> ·
+ <a href="TROUBLESHOOTING.md">Troubleshooting</a>
+ </p>
+
+ > Hermes already supports Discord voice channels. VerbalCoding is the workflow layer for people who want a coding-agent phone call, not just the baseline voice loop.
+ <!-- /readme-glow-up:intro -->
+
+ ## What Hermes already does
+
+ Hermes Agent has built-in Discord voice-channel support through the Discord gateway. After the bot is in your server, slash commands such as `/voice join` or `/voice channel` can join the voice channel you are currently in. Hermes can then transcribe speech with Whisper/STT and speak replies back through TTS providers such as Edge TTS, ElevenLabs, OpenAI, or other configured providers.
+
+ For basic live voice chat, this is enough:
+
+ ```text
+ Discord VC → Hermes STT → Hermes agent → TTS → Discord VC playback
+ ```
+
+ If that is your whole requirement, use Hermes built-in voice mode first.
+
+ ## What VerbalCoding adds
+
+ VerbalCoding keeps the same high-level loop, but makes it a coding-workflow runtime around CLI agents.
+
+ | Area | Hermes built-in voice | VerbalCoding |
+ |---|---|---|
+ | Primary goal | General Hermes conversation in a Discord VC | Phone-call-style coding workflow with CLI agents |
+ | Commands | `/voice join`, `/voice channel`, `/voice leave`, `/voice tts` | `vc setup`, `vc start`, `!join`, `!ask`, `!session`, `!verbose`, `!latency`, multi-instance commands |
+ | Backend | Hermes Agent | Hermes Agent, Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw, or custom command |
+ | Session model | Normal Hermes gateway session | Project/session routing, voice-channel bindings, shared voice + `!ask` text context where supported |
+ | Speech UX | Baseline STT + TTS | Tuned utterance windows, language presets, transcript cleanup, text mirrors, voice tests |
+ | Interruption | Basic voice playback behavior | Barge-in rules that stop playback without accidentally killing an active agent task |
+ | Long coding tasks | Generic agent response | Progress/status prompts, verbose tool-progress summaries, diff/log suppression for TTS |
+ | Operations | Hermes gateway setup and config | `vc doctor` auto-fixes, redacted diagnostics, latency metrics, Docker UDP guidance, multi-bot/project rooms |
+
+ ## When to choose which
+
+ Use **Hermes built-in voice** when you want:
+
+ - one bot in one Discord voice channel;
+ - simple speak → transcribe → answer → speak-back behavior;
+ - the official Hermes gateway path with minimal extra software;
+ - Hermes-only sessions and tools.
+
+ Use **VerbalCoding** when you want:
+
+ - voice and text to cooperate around a coding project;
+ - multiple agent backends, not only Hermes;
+ - project-specific Discord rooms or multiple bot instances;
+ - Korean/English language presets and runtime voice controls;
+ - careful barge-in behavior during long agent work;
+ - spoken progress without reading giant diffs, stack traces, or logs aloud;
+ - operational debugging with `vc doctor`, latency summaries, and container voice-network guidance.
+
+ ## Honest positioning
+
+ VerbalCoding should not be described as “adding Discord voice to Hermes from scratch.” Hermes already has that baseline. A better description is:
+
+ > VerbalCoding is a Discord voice workflow layer for CLI coding agents. It can use Hermes as the default backend, while adding project routing, interruption semantics, progress UX, diagnostics, and backend switching for long-running software work.
package/docs/MULTI_INSTANCE.md CHANGED
@@ -1,5 +1,21 @@
  # Multi-instance VerbalCoding
 
+ <!-- readme-glow-up:intro -->
+ <p align="center">
+ <a href="../README.md">README</a> ·
+ <a href="README.md">Docs hub</a> ·
+ <a href="FRESH_INSTALL.md">Fresh Install</a> ·
+ <a href="USAGE.md">Usage</a> ·
+ <a href="CONFIGURATION.md">Configuration</a> ·
+ <a href="TROUBLESHOOTING.md">Troubleshooting</a> ·
+ <a href="MULTI_INSTANCE.md">Multi-Instance</a>
+ </p>
+
+ > Run one isolated Discord voice bridge per project room.
+ >
+ > Fast path: `vc instance setup NAME → vc bot invite CLIENT_ID → vc instance start NAME`
+ <!-- /readme-glow-up:intro -->
+
  VerbalCoding can run multiple independent Discord voice bridge processes. Each process is still the existing single-instance Node bridge, but it loads a different `instances/<name>.env` file and uses a different Discord bot token.
 
  Use this when each project should permanently occupy its own Discord voice channel and write to its own transcript channel/thread.
package/docs/README.md ADDED
@@ -0,0 +1,50 @@
+ # VerbalCoding docs
+
+ <p align="center">
+ <a href="../README.md">README</a> ·
+ <a href="./i18n/README.ko.md">한국어</a> ·
+ <a href="./i18n/README.ja.md">日本語</a> ·
+ <a href="./i18n/README.zh.md">中文</a> ·
+ <a href="./i18n/README.es.md">Español</a> ·
+ <a href="./i18n/README.fr.md">Français</a> ·
+ <a href="./i18n/README.ru.md">Русский</a>
+ </p>
+
+ This is the detailed manual behind the compact README. Start with the fresh install guide if you are setting up a real Discord voice bot for the first time.
+
+ ## Fast path
+
+ ```bash
+ npm install -g verbalcoding@latest
+ vc setup
+ vc doctor
+ vc start
+ ```
+
+ ## Guides
+
+ | Guide | Use it when you need |
+ |---|---|
+ | [Fresh Install](FRESH_INSTALL.md) | A clean npm/global install, Discord app setup, first bot invite, and first voice run. |
+ | [Usage](USAGE.md) | CLI commands, Discord commands, run modes, voice changes, progress, and latency metrics. |
+ | [Hermes Voice vs VerbalCoding](HERMES_VOICE.md) | What Hermes built-in Discord voice already does and what VerbalCoding adds. |
+ | [Configuration](CONFIGURATION.md) | `.env`, agent backends, MCP server, TTS backends, and operational settings. |
+ | [TTS Backends](TTS_BACKENDS.md) | Optional local/cloud TTS backends, aliases, latency observations, and Mac mini caveats. |
+ | [Troubleshooting](TROUBLESHOOTING.md) | Docker UDP, voice join failures, missing token/channel checks, and doctor behavior. |
+ | [Multi-Instance](MULTI_INSTANCE.md) | One permanent Discord voice bot per project room with isolated Hermes profiles. |
+ | [Release Notes](RELEASE.md) | Current capabilities, verification checklist, and pre-public-release gaps. |
+
+ ## Localized guide sets
+
+ | Language | Docs index |
+ |---|---|
+ | Korean | [docs/i18n/README.ko.md](i18n/README.ko.md) |
+ | Japanese | [docs/i18n/README.ja.md](i18n/README.ja.md) |
+ | Chinese | [docs/i18n/README.zh.md](i18n/README.zh.md) |
+ | Spanish | [docs/i18n/README.es.md](i18n/README.es.md) |
+ | French | [docs/i18n/README.fr.md](i18n/README.fr.md) |
+ | Russian | [docs/i18n/README.ru.md](i18n/README.ru.md) |
+
+ ## Contributor note
+
+ Use `vc ...` commands in user-facing docs. Keep `./scripts/...` commands for source-checkout contributor flows only.
package/docs/RELEASE.md CHANGED
@@ -1,13 +1,29 @@
  # VerbalCoding release notes
 
+ <!-- readme-glow-up:intro -->
+ <p align="center">
+ <a href="../README.md">README</a> ·
+ <a href="README.md">Docs hub</a> ·
+ <a href="FRESH_INSTALL.md">Fresh Install</a> ·
+ <a href="USAGE.md">Usage</a> ·
+ <a href="CONFIGURATION.md">Configuration</a> ·
+ <a href="TROUBLESHOOTING.md">Troubleshooting</a> ·
+ <a href="MULTI_INSTANCE.md">Multi-Instance</a>
+ </p>
+
+ > Release-facing capability list and verification checklist.
+ >
+ > Fast path: `npm pack --dry-run → npm test → vc doctor → manual Discord smoke test`
+ <!-- /readme-glow-up:intro -->
+
  ## Current release candidate
 
- VerbalCoding is a Discord voice bridge for controlling CLI-based coding agents by voice. It is public-release oriented, with macOS / Apple Silicon as the most tested path and best-effort Linux bootstrap support for common package managers.
+ VerbalCoding is a Discord voice bridge for controlling CLI-based coding agents by voice. It is public-release oriented, with macOS / Apple Silicon as the most tested path and best-effort Linux bootstrap support for common package managers. Windows is not supported yet.
 
  ### Included
 
  - Discord voice receive via Node `@discordjs/voice`.
- - Local Korean STT via `whisper.cpp` + Metal.
+ - Local Korean STT via `whisper.cpp` + Metal or local Linux build fallback.
  - Edge TTS playback with Korean default voice.
  - Generic CLI harness adapter layer:
  - Hermes Agent
@@ -21,12 +37,14 @@ VerbalCoding is a Discord voice bridge for controlling CLI-based coding agents b
  - Long-answer TTS chunking and responsive barge-in.
  - Diff/code/log guardrails so large technical output is not read aloud.
  - Normal and conservative sensitivity modes for indoor vs. noisy/outdoor use.
- - Setup wizard, `.env.example`, `vc doctor` prerequisite checker, and `./scripts/install.sh --yes` bootstrap for OS packages, npm dependencies, Edge TTS helper, and the default whisper.cpp model.
- - npm package install path: `npm install -g verbalcoding`, `vc setup --yes`, and `vc start`.
+ - Public npm setup path: `npm install -g verbalcoding@latest`, guided `vc setup`, `vc doctor`, and `vc start`; `vc setup --yes`, `vc setup token`, and `vc setup channels` remain available for automation or later updates.
+ - `vc doctor` redacted prerequisite checker with supported auto-fixes for local media/STT/TTS prerequisites and Hermes CLI on macOS/Linux.
+ - Discord onboarding helpers: `vc bot invite <client-id>` plus token/client-id registration through `vc setup token`.
+ - Auto-join channel configuration through `vc setup channels`, `vc setup channel`, and `vc setup voice`.
  - Optional verbose progress mode for text-only middle-step updates during long agent work.
- - Always-on JSONL latency metrics plus `!latency` / `!metrics` summary for pipeline optimization.
- - More patient utterance idle wait (`UTTERANCE_IDLE_MS=4500`) so long spoken instructions with natural pauses are not split into a partial prompt plus ignored processing-time speech.
- - Multi-instance Hermes profile isolation: `vc instance setup <name>` auto-clones a Hermes profile to `~/.hermes/profiles/<name>` with the instance workdir, seeds SOUL.md, and writes `HERMES_HOME` into the instance env so per-project memory and skills stay separate; `vc instance start` self-heals a missing profile, and `vc doctor` checks profile-dir presence and `terminal.cwd` consistency.
+ - Always-on JSONL latency metrics plus `!latency` / `!metrics` summary.
+ - More patient utterance idle wait (`UTTERANCE_IDLE_MS=4500`) so long spoken instructions with natural pauses are not split too early.
+ - Multi-instance Hermes profile isolation: `vc instance setup <name>` auto-clones a Hermes profile to `~/.hermes/profiles/<name>` with the instance workdir, seeds SOUL.md, and writes `HERMES_HOME` into the instance env.
 
  ### Pre-release checklist
 
@@ -35,7 +53,7 @@ Run from the repo root:
  ```bash
  ./scripts/install.sh --yes --no-wizard
  ./scripts/docker_ubuntu_smoke.sh # requires Docker; validates ubuntu:24.04 clean install
- node --check app-node/main.mjs app-node/agent_adapters.mjs app-node/install_config.mjs scripts/install.mjs
+ node --check app-node/main.mjs app-node/agent_adapters.mjs app-node/install_config.mjs scripts/install.mjs scripts/cli.mjs scripts/doctor.mjs
  npm test
  PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest tests/ -q || [ $? -eq 5 ] # ok when no Python tests exist
  bash -n run.sh scripts/install.sh scripts/bootstrap_prereqs.sh scripts/docker_ubuntu_smoke.sh
@@ -46,22 +64,27 @@ git diff --check
 
  Manual smoke test:
 
- 1. Start the bridge with `vc start` or `./run.sh`.
- 2. Verify log contains `Logged in as <bot-name>`.
- 3. Verify log contains `Listening in voice channel ... / 일반` or the configured default channel.
- 4. In Discord, run `!ping`.
- 5. In Discord voice, say a short Korean request.
- 6. Verify STT transcript, agent response, TTS playback, and barge-in behavior.
+ 1. Configure the app with `vc setup token` and `vc setup channels "<voice-channel>"`.
+ 2. Start the bridge with `vc start` or `./run.sh`.
+ 3. Verify log contains `Logged in as <bot-name>`.
+ 4. Verify log contains `Listening in voice channel ... / <configured channel>`.
+ 5. In Discord, run `!ping`.
+ 6. In Discord voice, say a short Korean request.
+ 7. Verify STT transcript, agent response, TTS playback, and barge-in behavior.
+
+ Container smoke note: the Docker script checks install quality, not Discord voice UDP. For end-to-end voice in containers, Linux host networking is usually required.
 
  ### Known requirements
 
  - macOS with Homebrew, or Linux with `apt`, `dnf`, or `pacman` for best-effort bootstrap.
- - `ffmpeg`; installer attempts to install it.
- - `whisper-cli`; installer uses Homebrew on macOS or local `vendor/whisper.cpp` build fallback on Linux.
- - Default model at `models/ggml-small-q5_1.bin`; installer downloads it unless `--skip-model` is used.
- - Edge TTS CLI on `PATH` or local `.venv-tts/bin/edge-tts`; installer creates the local helper when needed.
- - Discord bot token in `.env`, `instances/<name>.env`, `~/.zshrc`, or runtime env.
+ - `ffmpeg`; setup/doctor attempts to install it.
+ - `whisper-cli`; setup uses Homebrew on macOS or local `vendor/whisper.cpp` build fallback on Linux.
+ - Default model at `models/ggml-small-q5_1.bin`; setup downloads it unless `--skip-model` is used.
+ - Edge TTS CLI on `PATH` or local `.venv-tts/bin/edge-tts`; setup creates the local helper when needed.
+ - Discord bot token registered with `vc setup token` or present in `.env`, `instances/<name>.env`, `~/.zshrc`, or runtime env.
+ - Auto-join voice channels registered with `vc setup channels` or present in `AUTO_JOIN_VOICE_CHANNELS`.
  - Selected CLI harness installed and authenticated.
+ - For containerized Discord voice, UDP egress must work; Linux `network_mode: "host"` is the recommended Docker Compose setting.
 
  ### Not for public release yet
 
package/docs/ROADMAP.md ADDED
@@ -0,0 +1,53 @@
+ # VerbalCoding Roadmap — 2026 H1 Differentiation Push
+
+ > Reframe: from "Discord bridge for Hermes" → **the voice layer for any coding agent — with real barge-in, streaming latency, and the agents you already use.**
+
+ This roadmap covers five differentiation phases that separate VerbalCoding from Hermes' built-in `/voice` (shipped Mar 2026, ~2 months old, no barge-in, Hermes-only, 2.5–9s practical latency).
+
+ ## Phase Plans
+
+ | # | Phase | Status | Plan |
+ |---|---|---|---|
+ | 1 | Streaming end-to-end pipeline | shipped | [phase1-streaming-pipeline.md](./superpowers/plans/2026-05-13-phase1-streaming-pipeline.md) |
+ | 2 | Agent-agnostic adapter completion | shipped (incl. cross-agent voice routing) | [phase2-agent-adapters.md](./superpowers/plans/2026-05-13-phase2-agent-adapters.md), [cross-agent-voice-transfer.md](./superpowers/plans/2026-05-14-cross-agent-voice-transfer.md) |
+ | 6 | Smart progress summarization | shipped | [phase6-smart-progress.md](./superpowers/plans/2026-05-13-phase6-smart-progress.md) |
+ | 7 | Voice plan mode | shipped (incl. `which_agent` slot) | [phase7-voice-plan-mode.md](./superpowers/plans/2026-05-13-phase7-voice-plan-mode.md) |
+ | 10 | Push notification handoff | shipped | [phase10-push-notifications.md](./superpowers/plans/2026-05-13-phase10-push-notifications.md) |
+
+ ## Sequencing rationale
+
+ 1. **Phase 2 first** — adapter polish + Aider/Cursor + auto-detection. Foundational, and unlocks the marketing claim "any coding agent".
20
+ 2. **Phase 1** — extend the existing `tts_prefetch.mjs` to consume streaming stdout. Big perceived-latency win.
21
+ 3. **Phase 6** — replaces regex pattern matching with semantic summarization. Demo moment.
22
+ 4. **Phase 7** — voice plan mode. UX feature, depends on adapter capability flags from Phase 2.
23
+ 5. **Phase 10** — push notification handoff. Independent; ship after the core is tighter.
+
+ ## Differentiation claims this roadmap unlocks
+
+ - **True barge-in** with smart resume (extends the existing `barge_in.mjs`).
+ - **Streaming pipeline**, so the first audio plays before the agent finishes thinking (an item on the Hermes Phase-4 wishlist).
+ - **Agent-agnostic**: Hermes, Claude Code, Codex, Gemini, OpenCode, OpenClaw, Aider, Cursor CLI, or a custom agent.
+ - **Smart narration**: describes intent, not file names.
+ - **Voice plan mode**: narrates the plan and lets you edit it by voice (`"skip step 3"`).
+ - **Phone-down mode**: sends a push notification with a voice summary when a long task completes.
+
+ ## Non-goals (for this cycle)
+
+ - PSTN bridge or actual phone calls (Phase 4 of the broader pitch; deferred).
+ - Local-first one-flag preset (Phase 5; deferred, but a trivial follow-up).
+ - Multiple agents in one VC with distinct voices (Phase 3; needs Phase 2 to land first).
+
+ ## What's next (2026 H2 candidates)
+
+ The differentiation push above has shipped, so the foundation is in place. Candidate next phases, not yet planned:
+
+ | # | Candidate | Why | Status |
+ |---|---|---|---|
+ | 11 | Push-to-talk and wake-word v2 | Reduce false barge-ins in shared rooms; pair with hardware push-to-talk via a Discord overlay or a key-binding companion. | candidate |
+ | 12 | Multi-user voice in one VC | Each speaker resolves to a distinct routing state and session, with per-speaker plan mode and decision answers. Builds on the per-channel routing state. | candidate |
+ | 13 | Output voice cloning per agent | Distinct voices per backend (e.g. Codex gets a different TTS voice than Claude Code); piggybacks on the existing voice-clone capture flow. | candidate |
+ | 14 | Latency benchmarking + regression gate | Codify the `latency_metrics` output into a benchmark harness and CI threshold so any regression in the STT, agent, or TTS stages is caught. | candidate |
+ | 15 | Phone-app companion (deferred) | Today the push-handoff notification deep-links back to Discord; a thin phone app or PWA could replay a redacted transcript on demand. | candidate |
+ | 16 | Voice-clone reference auto-detect | Detect that an OpenVoice/FireRedTTS reference sample is missing and proactively propose `!voice-clone capture` when the user selects a clone-only backend. | candidate |
+
+ These aren't sequenced yet. Phases 11, 12, and 14 have the highest leverage if the goal is to make the bridge feel solid in shared rooms; 13 and 16 are quality-of-life improvements on top of the existing voice stack.
@@ -0,0 +1,126 @@
+ # VerbalCoding Troubleshooting
+
+ <!-- readme-glow-up:intro -->
+ <p align="center">
+ <a href="../README.md">README</a> ·
+ <a href="README.md">Docs hub</a> ·
+ <a href="FRESH_INSTALL.md">Fresh Install</a> ·
+ <a href="USAGE.md">Usage</a> ·
+ <a href="CONFIGURATION.md">Configuration</a> ·
+ <a href="TROUBLESHOOTING.md">Troubleshooting</a> ·
+ <a href="MULTI_INSTANCE.md">Multi-Instance</a>
+ </p>
+
+ > Fast diagnosis for Discord voice, Docker UDP, tokens, channels, and doctor checks.
+ >
+ > Fast path: `vc doctor` → check channel names → check UDP/permissions
+ <!-- /readme-glow-up:intro -->
+
+ ## `Cannot perform IP discovery - socket closed`
+
+ This usually means the bot logged into Discord and found a voice channel, but Discord voice UDP discovery failed.
+
+ Typical log sequence:
+
+ ```text
+ Logged in as <bot-name>
+ auto-join failed; trying next configured voice channel <server> <channel> AbortError: The operation was aborted
+ voice connection error Error: Cannot perform IP discovery - socket closed
+ No auto-join channel found or reachable ... attempted <server>/<channel>
+ ```
+
+ Interpretation:
+
+ | Log signal | Meaning |
+ |---|---|
+ | `Logged in as ...` | Token and Discord gateway login worked. |
+ | `attempted <server>/<channel>` | Channel lookup worked; names are probably correct. |
+ | `AbortError` after ~30s | Voice connection did not become ready in time. |
+ | `Cannot perform IP discovery - socket closed` | UDP voice discovery failed, often because Docker/firewall/NAT blocked UDP. |
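The interpretation table can be applied mechanically to a saved log. The Bash function below is a minimal sketch, not a VerbalCoding command. `classify_voice_log` is a hypothetical name, and the function assumes the bridge log is piped in on stdin. It only greps for the four signals listed in the table.

```bash
# Hypothetical helper (not part of VerbalCoding): report which signals from
# the interpretation table appear in a log read from stdin.
classify_voice_log() {
  local log
  log="$(cat)"
  # Gateway login succeeded, so the token is valid.
  if grep -q 'Logged in as ' <<<"$log"; then
    echo "login-ok: token and Discord gateway login worked"
  fi
  # An "attempted <server>/<channel>" entry means channel lookup worked.
  if grep -q 'attempted .*/' <<<"$log"; then
    echo "channel-found: channel lookup worked; names are probably correct"
  fi
  # The voice connection timed out before becoming ready.
  if grep -q 'AbortError' <<<"$log"; then
    echo "voice-timeout: voice connection did not become ready in time"
  fi
  # UDP IP discovery failed: suspect Docker networking, a firewall, or NAT.
  if grep -q 'Cannot perform IP discovery - socket closed' <<<"$log"; then
    echo "udp-blocked: UDP voice discovery failed"
  fi
}
```

For example, run `classify_voice_log < verbalcoding.log` (the log filename is an assumption). Because the function matches only the message substrings shown in the sample log, it does not depend on any other part of the log format.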
+
+ Fixes:
+
+ 1. Run outside Docker first to rule out container networking.
+ 2. On Linux Docker Compose, use host networking:
+
+    ```yaml
+    services:
+      verbalcoding:
+        network_mode: "host"
+    ```
+
+ 3. Remove `ports:` from the same service. Host networking and port publishing should not be combined; published ports do not apply in host mode.
+ 4. Check the host firewall, cloud security groups, VPN, proxy, and corporate network policies for rules that block outbound UDP.
+ 5. On Docker Desktop for macOS/Windows, host networking behaves differently. If voice UDP still fails, run VerbalCoding directly on the macOS/Linux host or in a Linux VM.
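Steps 2 and 3 can be checked before the container starts. The following is a rough sketch that assumes a Compose file with a single service. `check_compose_host_networking` is a hypothetical helper. It scans lines with `grep` instead of parsing YAML, so it cannot tell which service a key belongs to.

```bash
# Hypothetical pre-flight check (not part of VerbalCoding). Naive line-based
# scan: assumes one service per Compose file, since it does not parse YAML.
check_compose_host_networking() {
  local file="$1" has_host=0 has_ports=0
  # Match network_mode: host, "host", or 'host', optionally followed by a comment.
  if grep -Eq "^[[:space:]]*network_mode:[[:space:]]*[\"']?host[\"']?([[:space:]]|#|\$)" "$file"; then
    has_host=1
  fi
  # Any published ports mapping.
  if grep -Eq '^[[:space:]]*ports:' "$file"; then
    has_ports=1
  fi
  if (( has_host && has_ports )); then
    echo "conflict: remove ports: when network_mode is host"
    return 1
  elif (( has_host )); then
    echo "ok: host networking without published ports"
  else
    echo "warn: network_mode host not set; Discord voice UDP may fail in Docker"
  fi
}
```

`check_compose_host_networking docker-compose.yml` prints a one-line verdict and returns non-zero on a conflict, so it could gate a start script.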
+
+ ## `No auto-join channel found or reachable`
+
+ First confirm the configured names:
+
+ ```bash
+ vc setup channels "VerbalCoding,LLM-Wiki,General"
+ vc doctor
+ vc start
+ ```
+
+ If the log includes `attempted <server>/<channel>`, the channel was found and the remaining issue is reachability, permissions, or UDP voice transport. If the log lists no attempted channel, correct the channel names so they match Discord exactly.
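The rule in the previous paragraph reduces to one check on the log. This minimal sketch assumes the log arrives on stdin and that the `attempted ...` fragment sits on a single line, as in the sample log. `autojoin_next_step` is a hypothetical name.

```bash
# Hypothetical triage helper (not part of VerbalCoding). Reads the bridge log
# on stdin and applies the rule above to the first "attempted ..." entry.
autojoin_next_step() {
  local attempted
  attempted="$(grep -o 'attempted .*' | head -n 1)"
  if [[ -n "$attempted" ]]; then
    # A channel was attempted, so the configured names resolved.
    echo "names resolved (${attempted#attempted }); check reachability, permissions, UDP"
  else
    echo "no channel attempted; fix channel names to match Discord exactly"
  fi
}
```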
+
+ ## Missing Discord token
+
+ Run:
+
+ ```bash
+ vc setup token
+ # or:
+ vc setup token <bot-token> --client-id <discord-client-id>
+ vc doctor
+ ```
+
+ Do not paste real tokens into issues, logs, screenshots, or docs. `vc doctor` redacts configured token values.
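If a log must be shared anyway, scrub token-shaped strings first. This is a heuristic sketch, not the redaction `vc doctor` performs. It assumes a Discord bot token looks like three base64url segments joined by dots. The segment length bounds are guesses and may over-redact other dotted base64 strings, such as JWTs.

```bash
# Hypothetical helper (not the redaction vc doctor performs): replace strings
# shaped like a Discord bot token, i.e. three base64url segments joined by dots.
# The segment length bounds are a heuristic and may over-redact.
redact_discord_tokens() {
  sed -E 's/[A-Za-z0-9_-]{20,}\.[A-Za-z0-9_-]{6,}\.[A-Za-z0-9_-]{20,}/[REDACTED_TOKEN]/g'
}
```

Example: `redact_discord_tokens < verbalcoding.log > verbalcoding.redacted.log` (the filenames are placeholders). Review the output before posting it; a pattern match cannot guarantee that every secret is caught.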
+
+ ## Bot invited but cannot speak or send text
+
+ Verify Discord permissions on the exact text channel, thread, and voice channel the bot uses:
+
+ - View Channel
+ - Send Messages
+ - Send Messages in Threads
+ - Read Message History
+ - Use Application Commands
+ - Connect
+ - Speak
+
+ Channel-level overwrites can deny access even when the bot has server-level permissions.
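Each permission in the list above maps to a fixed bit in Discord's permission bitfield; the bit positions below follow the Discord API permissions documentation. This sketch sums the bits into one integer and reports which ones a given permissions integer lacks. The variable and function names are hypothetical.

```bash
# Bit positions from Discord's permission bitfield for the permissions listed
# above, as NAME:BIT pairs. Variable and function names are hypothetical.
DISCORD_REQUIRED_PERMS="VIEW_CHANNEL:10 SEND_MESSAGES:11 READ_MESSAGE_HISTORY:16 CONNECT:20 SPEAK:21 USE_APPLICATION_COMMANDS:31 SEND_MESSAGES_IN_THREADS:38"

# OR together every required bit (bash arithmetic is 64-bit, so bit 38 fits).
required_permissions() {
  local total=0 entry
  for entry in $DISCORD_REQUIRED_PERMS; do
    total=$(( total | (1 << ${entry#*:}) ))
  done
  echo "$total"
}

# Print the name of each required permission whose bit is clear in $1.
missing_permissions() {
  local granted="$1" entry
  for entry in $DISCORD_REQUIRED_PERMS; do
    if (( ((granted >> ${entry#*:}) & 1) == 0 )); then
      echo "${entry%%:*}"
    fi
  done
}
```

The total, 277028604928, could serve as the `permissions` parameter of a standard bot invite URL such as `https://discord.com/oauth2/authorize?client_id=<discord-client-id>&scope=bot%20applications.commands&permissions=277028604928`. That only grants server-level permissions; channel-level overwrites can still deny them.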
+
+ ## Text delivery warning
+
+ ```text
+ sendText missing transcript channel id; text not delivered
+ ```
+
+ Voice can still work. The warning means `TRANSCRIPT_CHANNEL_ID` is unset, so restart, final, and progress messages cannot be mirrored to Discord text. Rerun setup, or set `TRANSCRIPT_CHANNEL_ID` to a transcript text channel or thread ID.
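You can confirm the setting before restarting. This minimal sketch assumes the value lives either in the environment or in a dotenv-style file passed as the first argument; where your instance actually reads it from is an assumption. `check_transcript_channel` is hypothetical, and the 17 to 20 digit check is only a heuristic for Discord snowflake IDs.

```bash
# Hypothetical check (not part of VerbalCoding). Uses $TRANSCRIPT_CHANNEL_ID
# from the environment, or the last TRANSCRIPT_CHANNEL_ID= line of the
# dotenv-style file given as $1. The 17-20 digit test is a heuristic for
# Discord snowflake IDs.
check_transcript_channel() {
  local id="${TRANSCRIPT_CHANNEL_ID:-}"
  if [[ -n "${1:-}" && -f "$1" ]]; then
    id="$(grep -E '^TRANSCRIPT_CHANNEL_ID=' "$1" | tail -n 1 | cut -d= -f2-)"
    id=${id//\"/}  # drop double quotes around the value, if any
  fi
  if [[ -z "$id" ]]; then
    echo "unset: text mirroring is disabled; rerun setup or set TRANSCRIPT_CHANNEL_ID"
    return 1
  elif [[ ! "$id" =~ ^[0-9]{17,20}$ ]]; then
    echo "suspicious: '$id' does not look like a Discord channel or thread ID"
    return 1
  fi
  echo "ok: TRANSCRIPT_CHANNEL_ID=$id"
}
```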
+
+ ## Docker Compose host networking
+
+ The Compose equivalent of `docker run --network=host`:
+
+ ```yaml
+ services:
+   verbalcoding:
+     network_mode: "host"
+ ```
+
+ Notes:
+
+ - Linux Docker: this is the most useful fix for Discord voice UDP issues.
+ - Docker Desktop macOS/Windows: host networking is limited and behaves differently; if voice still fails, test on the host or in a Linux VM.
+ - Do not include `ports:` for that service when using `network_mode: "host"`.
+
+ ## Doctor auto-fix behavior
+
+ `vc doctor` may install or repair local prerequisites on supported macOS/Linux installs. It does not create Discord secrets or authenticate external agent CLIs for you.
+
+ ```bash
+ vc doctor
+ VERBALCODING_DOCTOR_INSTALL_HERMES=0 vc doctor # skip Hermes CLI auto-install
+ ```