verbalcoding 0.2.11 → 0.2.13

Files changed (235)
  1. package/.env.example +98 -2
  2. package/README.es.md +134 -0
  3. package/README.fr.md +134 -0
  4. package/README.ja.md +134 -0
  5. package/README.ko.md +134 -0
  6. package/README.md +118 -74
  7. package/README.ru.md +134 -0
  8. package/README.zh.md +133 -0
  9. package/app-node/agent_adapters.mjs +37 -5
  10. package/app-node/agent_adapters.test.mjs +27 -1
  11. package/app-node/agent_detect.mjs +73 -0
  12. package/app-node/agent_detect.test.mjs +77 -0
  13. package/app-node/agent_routing.mjs +148 -0
  14. package/app-node/agent_routing.test.mjs +138 -0
  15. package/app-node/agent_turn.mjs +86 -0
  16. package/app-node/agent_turn.test.mjs +109 -0
  17. package/app-node/bridge_context.mjs +73 -0
  18. package/app-node/bridge_context.test.mjs +54 -0
  19. package/app-node/bridge_state.mjs +4 -0
  20. package/app-node/bridge_wireup.test.mjs +462 -0
  21. package/app-node/cli_install.test.mjs +31 -0
  22. package/app-node/cross_agent_routing.test.mjs +78 -0
  23. package/app-node/discord_command_router.mjs +204 -0
  24. package/app-node/discord_command_router.test.mjs +311 -0
  25. package/app-node/discord_voice_setup.mjs +251 -0
  26. package/app-node/discord_voice_setup.test.mjs +86 -0
  27. package/app-node/hermes_profiles.test.mjs +12 -1
  28. package/app-node/install_config.mjs +113 -3
  29. package/app-node/install_config.test.mjs +8 -0
  30. package/app-node/instance_doctor.test.mjs +9 -0
  31. package/app-node/instances.test.mjs +8 -1
  32. package/app-node/main.mjs +513 -1058
  33. package/app-node/mcp_tools.test.mjs +7 -0
  34. package/app-node/notification_handler.mjs +89 -0
  35. package/app-node/notification_handler.test.mjs +187 -0
  36. package/app-node/notify.mjs +73 -0
  37. package/app-node/notify.test.mjs +68 -0
  38. package/app-node/plan_dispatcher.mjs +215 -0
  39. package/app-node/plan_dispatcher.test.mjs +101 -0
  40. package/app-node/plan_mode.mjs +203 -0
  41. package/app-node/plan_mode.test.mjs +231 -0
  42. package/app-node/progress_handler.mjs +220 -0
  43. package/app-node/progress_handler.test.mjs +193 -0
  44. package/app-node/progress_speech.mjs +54 -32
  45. package/app-node/progress_speech.test.mjs +12 -3
  46. package/app-node/project_sessions.mjs +5 -2
  47. package/app-node/project_sessions.test.mjs +7 -0
  48. package/app-node/research_mode.mjs +282 -0
  49. package/app-node/research_mode.test.mjs +264 -0
  50. package/app-node/restart_notice.mjs +3 -0
  51. package/app-node/restart_notice.test.mjs +11 -0
  52. package/app-node/session_ontology.mjs +271 -0
  53. package/app-node/session_ontology.test.mjs +130 -0
  54. package/app-node/smart_progress.mjs +94 -0
  55. package/app-node/smart_progress.test.mjs +66 -0
  56. package/app-node/stream_sentencer.mjs +91 -0
  57. package/app-node/stream_sentencer.test.mjs +129 -0
  58. package/app-node/streaming_tts_queue.mjs +52 -0
  59. package/app-node/streaming_tts_queue.test.mjs +64 -0
  60. package/app-node/stt_whisper.mjs +24 -0
  61. package/app-node/stt_whisper.test.mjs +32 -0
  62. package/app-node/text_routing.mjs +22 -0
  63. package/app-node/text_routing.test.mjs +23 -1
  64. package/app-node/tts_backends.mjs +537 -3
  65. package/app-node/tts_backends.test.mjs +454 -0
  66. package/app-node/tts_player.mjs +164 -0
  67. package/app-node/tts_player.test.mjs +202 -0
  68. package/app-node/tts_runtime.mjs +134 -0
  69. package/app-node/tts_runtime.test.mjs +89 -0
  70. package/app-node/tts_settings.mjs +150 -3
  71. package/app-node/tts_settings.test.mjs +204 -0
  72. package/app-node/tts_voice_config.mjs +136 -2
  73. package/app-node/tts_voice_config.test.mjs +94 -0
  74. package/app-node/utterance_router.mjs +216 -0
  75. package/app-node/utterance_router.test.mjs +236 -0
  76. package/app-node/voice_autojoin.mjs +37 -0
  77. package/app-node/voice_autojoin.test.mjs +59 -0
  78. package/app-node/voice_io.mjs +272 -0
  79. package/app-node/voice_io.test.mjs +102 -0
  80. package/app-node/voice_turn_runner.mjs +449 -0
  81. package/app-node/voice_turn_runner.test.mjs +289 -0
  82. package/docs/CONFIGURATION.md +79 -96
  83. package/docs/FRESH_INSTALL.md +105 -63
  84. package/docs/HARNESSES.md +58 -0
  85. package/docs/HARNESS_AIDER.md +50 -0
  86. package/docs/HARNESS_CLAUDE.md +56 -0
  87. package/docs/HARNESS_CODEX.md +56 -0
  88. package/docs/HARNESS_CURSOR.md +45 -0
  89. package/docs/HARNESS_GEMINI.md +45 -0
  90. package/docs/HARNESS_HERMES.md +57 -0
  91. package/docs/HARNESS_OPENCLAW.md +44 -0
  92. package/docs/HARNESS_OPENCODE.md +44 -0
  93. package/docs/HERMES_VOICE.md +65 -0
  94. package/docs/MULTI_INSTANCE.md +16 -0
  95. package/docs/README.md +50 -0
  96. package/docs/RELEASE.md +42 -19
  97. package/docs/ROADMAP.md +53 -0
  98. package/docs/TROUBLESHOOTING.md +126 -0
  99. package/docs/TTS_BACKENDS.md +227 -0
  100. package/docs/USAGE.md +94 -40
  101. package/docs/assets/figures/verbalcoding-flow.svg +1 -1
  102. package/docs/i18n/AGENTS.es.md +34 -0
  103. package/docs/i18n/AGENTS.fr.md +34 -0
  104. package/docs/i18n/AGENTS.ja.md +34 -0
  105. package/docs/i18n/AGENTS.ko.md +34 -0
  106. package/docs/i18n/AGENTS.ru.md +34 -0
  107. package/docs/i18n/AGENTS.zh.md +34 -0
  108. package/docs/i18n/CONFIGURATION.es.md +25 -0
  109. package/docs/i18n/CONFIGURATION.fr.md +25 -0
  110. package/docs/i18n/CONFIGURATION.ja.md +25 -0
  111. package/docs/i18n/CONFIGURATION.ko.md +25 -0
  112. package/docs/i18n/CONFIGURATION.ru.md +25 -0
  113. package/docs/i18n/CONFIGURATION.zh.md +25 -0
  114. package/docs/i18n/FRESH_INSTALL.es.md +27 -2
  115. package/docs/i18n/FRESH_INSTALL.fr.md +27 -2
  116. package/docs/i18n/FRESH_INSTALL.ja.md +27 -2
  117. package/docs/i18n/FRESH_INSTALL.ko.md +27 -2
  118. package/docs/i18n/FRESH_INSTALL.ru.md +27 -2
  119. package/docs/i18n/FRESH_INSTALL.zh.md +27 -2
  120. package/docs/i18n/HARNESSES.es.md +58 -0
  121. package/docs/i18n/HARNESSES.fr.md +58 -0
  122. package/docs/i18n/HARNESSES.ja.md +58 -0
  123. package/docs/i18n/HARNESSES.ko.md +58 -0
  124. package/docs/i18n/HARNESSES.ru.md +58 -0
  125. package/docs/i18n/HARNESSES.zh.md +58 -0
  126. package/docs/i18n/HARNESS_AIDER.es.md +48 -0
  127. package/docs/i18n/HARNESS_AIDER.fr.md +48 -0
  128. package/docs/i18n/HARNESS_AIDER.ja.md +50 -0
  129. package/docs/i18n/HARNESS_AIDER.ko.md +50 -0
  130. package/docs/i18n/HARNESS_AIDER.ru.md +48 -0
  131. package/docs/i18n/HARNESS_AIDER.zh.md +48 -0
  132. package/docs/i18n/HARNESS_CLAUDE.es.md +55 -0
  133. package/docs/i18n/HARNESS_CLAUDE.fr.md +55 -0
  134. package/docs/i18n/HARNESS_CLAUDE.ja.md +56 -0
  135. package/docs/i18n/HARNESS_CLAUDE.ko.md +56 -0
  136. package/docs/i18n/HARNESS_CLAUDE.ru.md +55 -0
  137. package/docs/i18n/HARNESS_CLAUDE.zh.md +56 -0
  138. package/docs/i18n/HARNESS_CODEX.es.md +55 -0
  139. package/docs/i18n/HARNESS_CODEX.fr.md +55 -0
  140. package/docs/i18n/HARNESS_CODEX.ja.md +56 -0
  141. package/docs/i18n/HARNESS_CODEX.ko.md +56 -0
  142. package/docs/i18n/HARNESS_CODEX.ru.md +55 -0
  143. package/docs/i18n/HARNESS_CODEX.zh.md +56 -0
  144. package/docs/i18n/HARNESS_CURSOR.es.md +42 -0
  145. package/docs/i18n/HARNESS_CURSOR.fr.md +42 -0
  146. package/docs/i18n/HARNESS_CURSOR.ja.md +45 -0
  147. package/docs/i18n/HARNESS_CURSOR.ko.md +45 -0
  148. package/docs/i18n/HARNESS_CURSOR.ru.md +42 -0
  149. package/docs/i18n/HARNESS_CURSOR.zh.md +42 -0
  150. package/docs/i18n/HARNESS_GEMINI.es.md +44 -0
  151. package/docs/i18n/HARNESS_GEMINI.fr.md +44 -0
  152. package/docs/i18n/HARNESS_GEMINI.ja.md +45 -0
  153. package/docs/i18n/HARNESS_GEMINI.ko.md +45 -0
  154. package/docs/i18n/HARNESS_GEMINI.ru.md +44 -0
  155. package/docs/i18n/HARNESS_GEMINI.zh.md +45 -0
  156. package/docs/i18n/HARNESS_HERMES.es.md +54 -0
  157. package/docs/i18n/HARNESS_HERMES.fr.md +54 -0
  158. package/docs/i18n/HARNESS_HERMES.ja.md +57 -0
  159. package/docs/i18n/HARNESS_HERMES.ko.md +57 -0
  160. package/docs/i18n/HARNESS_HERMES.ru.md +54 -0
  161. package/docs/i18n/HARNESS_HERMES.zh.md +57 -0
  162. package/docs/i18n/HARNESS_OPENCLAW.es.md +41 -0
  163. package/docs/i18n/HARNESS_OPENCLAW.fr.md +41 -0
  164. package/docs/i18n/HARNESS_OPENCLAW.ja.md +44 -0
  165. package/docs/i18n/HARNESS_OPENCLAW.ko.md +44 -0
  166. package/docs/i18n/HARNESS_OPENCLAW.ru.md +41 -0
  167. package/docs/i18n/HARNESS_OPENCLAW.zh.md +42 -0
  168. package/docs/i18n/HARNESS_OPENCODE.es.md +41 -0
  169. package/docs/i18n/HARNESS_OPENCODE.fr.md +41 -0
  170. package/docs/i18n/HARNESS_OPENCODE.ja.md +44 -0
  171. package/docs/i18n/HARNESS_OPENCODE.ko.md +44 -0
  172. package/docs/i18n/HARNESS_OPENCODE.ru.md +41 -0
  173. package/docs/i18n/HARNESS_OPENCODE.zh.md +44 -0
  174. package/docs/i18n/HERMES_VOICE.es.md +46 -0
  175. package/docs/i18n/HERMES_VOICE.fr.md +46 -0
  176. package/docs/i18n/HERMES_VOICE.ja.md +46 -0
  177. package/docs/i18n/HERMES_VOICE.ko.md +65 -0
  178. package/docs/i18n/HERMES_VOICE.ru.md +46 -0
  179. package/docs/i18n/HERMES_VOICE.zh.md +46 -0
  180. package/docs/i18n/MULTI_INSTANCE.es.md +25 -0
  181. package/docs/i18n/MULTI_INSTANCE.fr.md +25 -0
  182. package/docs/i18n/MULTI_INSTANCE.ja.md +25 -0
  183. package/docs/i18n/MULTI_INSTANCE.ko.md +25 -0
  184. package/docs/i18n/MULTI_INSTANCE.ru.md +25 -0
  185. package/docs/i18n/MULTI_INSTANCE.zh.md +25 -0
  186. package/docs/i18n/README.es.md +20 -134
  187. package/docs/i18n/README.fr.md +20 -134
  188. package/docs/i18n/README.ja.md +20 -134
  189. package/docs/i18n/README.ko.md +20 -133
  190. package/docs/i18n/README.ru.md +20 -134
  191. package/docs/i18n/README.zh.md +20 -133
  192. package/docs/i18n/RELEASE.es.md +26 -1
  193. package/docs/i18n/RELEASE.fr.md +26 -1
  194. package/docs/i18n/RELEASE.ja.md +26 -1
  195. package/docs/i18n/RELEASE.ko.md +26 -1
  196. package/docs/i18n/RELEASE.ru.md +26 -1
  197. package/docs/i18n/RELEASE.zh.md +26 -1
  198. package/docs/i18n/TROUBLESHOOTING.es.md +39 -0
  199. package/docs/i18n/TROUBLESHOOTING.fr.md +39 -0
  200. package/docs/i18n/TROUBLESHOOTING.ja.md +39 -0
  201. package/docs/i18n/TROUBLESHOOTING.ko.md +39 -0
  202. package/docs/i18n/TROUBLESHOOTING.ru.md +39 -0
  203. package/docs/i18n/TROUBLESHOOTING.zh.md +39 -0
  204. package/docs/i18n/USAGE.es.md +25 -0
  205. package/docs/i18n/USAGE.fr.md +25 -0
  206. package/docs/i18n/USAGE.ja.md +25 -0
  207. package/docs/i18n/USAGE.ko.md +25 -0
  208. package/docs/i18n/USAGE.ru.md +25 -0
  209. package/docs/i18n/USAGE.zh.md +25 -0
  210. package/docs/superpowers/plans/2026-05-13-phase1-streaming-pipeline.md +122 -0
  211. package/docs/superpowers/plans/2026-05-13-phase10-push-notifications.md +152 -0
  212. package/docs/superpowers/plans/2026-05-13-phase2-agent-adapters.md +242 -0
  213. package/docs/superpowers/plans/2026-05-13-phase6-smart-progress.md +172 -0
  214. package/docs/superpowers/plans/2026-05-13-phase7-voice-plan-mode.md +108 -0
  215. package/docs/superpowers/plans/2026-05-14-cross-agent-voice-transfer.md +625 -0
  216. package/docs/superpowers/plans/2026-05-21-audio-overview-narrated-diffs.md +95 -0
  217. package/docs/superpowers/plans/2026-05-21-autoresearch-ontology.md +83 -0
  218. package/docs/superpowers/plans/2026-05-21-phase11-push-to-talk-wakeword-v2.md +77 -0
  219. package/docs/superpowers/plans/2026-05-21-phase12-multi-user-voice.md +147 -0
  220. package/docs/superpowers/plans/2026-05-21-phase14-verbalbench.md +136 -0
  221. package/docs/superpowers/plans/2026-05-21-phase15-phone-companion.md +72 -0
  222. package/integrations/fireredtts2/mlx_llm.py +183 -0
  223. package/integrations/fireredtts2/synth.py +156 -0
  224. package/integrations/fireredtts2/synth_mlx.py +196 -0
  225. package/integrations/mlxaudio/synth.py +74 -0
  226. package/integrations/neuttsair/synth.py +104 -0
  227. package/integrations/omnivoice/synth.py +110 -0
  228. package/package.json +7 -1
  229. package/scripts/cli.mjs +88 -3
  230. package/scripts/doctor.mjs +115 -4
  231. package/scripts/install.mjs +20 -2
  232. package/scripts/install_fireredtts2.sh +109 -0
  233. package/scripts/install_mlxaudio.sh +34 -0
  234. package/scripts/install_mossttsnano.sh +46 -0
  235. package/scripts/postinstall.mjs +34 -0
package/docs/TTS_BACKENDS.md ADDED
@@ -0,0 +1,227 @@
1
+ # TTS backends and latency notes
2
+
3
+ This document captures the current VerbalCoding TTS backends, the live-selection rules, and the latency caveats observed while testing on the current Mac mini.
4
+
5
+ ## Current test machine
6
+
7
+ Observed host for these notes:
8
+
9
+ - Machine: Mac mini, Apple M4
10
+ - Memory: 16 GB
11
+ - OS: macOS 26.3 / Darwin 25.3.0 arm64
12
+ - Workload caveat: several measurements were taken while other heavy local processes or model-training jobs could be active. Treat local neural TTS timings as operational observations, not clean benchmarks.
13
+
14
+ ## Operational rule
15
+
16
+ Edge TTS is the default safe live backend. Local neural backends are optional and should normally fall back to Edge for progress prompts unless explicitly enabled with each backend's `*_PROGRESS=1` setting.
17
+
18
+ When a user explicitly asks to switch to a specific backend, update both:
19
+
20
+ ```bash
21
+ TTS_BACKEND=<backend>
22
+ TTS_VOICE_TYPE=<voice-type>
23
+ ```
24
+
25
+ and `config/tts-voices.json`:
26
+
27
+ ```json
28
+ {
29
+ "currentBackend": "<backend>",
30
+ "currentVoiceType": "<voice-type>"
31
+ }
32
+ ```
33
+
34
+ The runtime re-reads voice config, so changing only `.env` can be overridden.
35
+
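Reviewer sketch (not part of the package): the rule above means a backend switch has to touch both `.env` and `config/tts-voices.json`. The following is a minimal illustration of doing both in one step. The helper names `upsertEnv` and `switchBackend` are hypothetical, and the code works on text rather than files so it makes no assumptions about where the bridge stores them.

```javascript
// Hypothetical helper: set KEY=value pairs in .env text, keeping unrelated lines.
function upsertEnv(envText, updates) {
  const hadTrailingNewline = envText.endsWith("\n");
  const body = hadTrailingNewline ? envText.slice(0, -1) : envText;
  const seen = new Set();
  const lines = (body === "" ? [] : body.split("\n")).map((line) => {
    // Commented-out lines start with "#" and are left untouched.
    const match = /^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    if (match && Object.prototype.hasOwnProperty.call(updates, match[1])) {
      seen.add(match[1]);
      return `${match[1]}=${updates[match[1]]}`;
    }
    return line;
  });
  // Append keys that were not present in the original text.
  for (const [key, value] of Object.entries(updates)) {
    if (!seen.has(key)) lines.push(`${key}=${value}`);
  }
  return lines.join("\n") + (hadTrailingNewline ? "\n" : "");
}

// Hypothetical helper: apply the same selection to both config sources.
function switchBackend(envText, voiceConfigJson, backend, voiceType) {
  const env = upsertEnv(envText, {
    TTS_BACKEND: backend,
    TTS_VOICE_TYPE: voiceType,
  });
  const config = JSON.parse(voiceConfigJson);
  config.currentBackend = backend;
  config.currentVoiceType = voiceType;
  return { env, voiceConfig: JSON.stringify(config, null, 2) };
}
```

Writing the two results back to `.env` and `config/tts-voices.json` (for example with `fs.writeFileSync`) is left to the caller.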
36
+ ### Fallback notice
37
+
38
+ When a non-Edge backend fails to synthesize (model missing, runtime crash, timeout, install error), the bridge silently re-routes that utterance through Edge so the user still hears a response. The first time this happens for each backend in a session, VerbalCoding posts a one-shot warning to the active Discord text channel and speaks the same message ("`<backend>` synthesis failed; using Edge for the rest of this session." / "`<backend>` 음성 생성에 실패해서 이번 세션은 Edge로 진행할게."). Subsequent failures for the same backend stay silent.
39
+
40
+ If you see the warning, check `vc doctor` and the backend's venv/model install — the bridge will keep using Edge until the next `vc start`.
41
+
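Reviewer sketch (not part of the package): the one-shot warning described above can be modelled as a per-session set of backends that have already been reported. The function name `createFallbackNotifier` is hypothetical, and only the English message from the doc is reproduced; how the bridge actually tracks this state is not shown in this hunk.

```javascript
// Hypothetical sketch: emit the fallback warning once per backend per session.
function createFallbackNotifier() {
  const warned = new Set(); // backends already reported in this session

  return function onSynthFailure(backend) {
    // Edge is the fallback target itself, so it never triggers the notice.
    if (backend === "edge" || warned.has(backend)) return null;
    warned.add(backend);
    return `${backend} synthesis failed; using Edge for the rest of this session.`;
  };
}
```

A caller would post and speak the returned string when it is non-null and stay silent otherwise. Creating a fresh notifier on each `vc start` would reset the set.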
42
+ ## Supported backends
43
+
44
+ | Backend | Purpose | Default path / command | Live-call suitability | Notes |
45
+ |---|---|---|---|---|
46
+ | `edge` | Free cloud TTS baseline | `edge-tts` | Best current default | Korean and English voices, fast enough for phone-call mode, progress cache works well. |
47
+ | `openvoice` | Reference-sample voice cloning | `integrations/openvoice/synth.py` | Experimental | Requires permitted reference audio. Progress falls back to Edge unless `OPENVOICE_PROGRESS=1`. |
48
+ | `speechswift` | Apple Silicon local CosyVoice / Qwen3 wrapper | `audio speak ...` | Experimental | CosyVoice is usable for demos but not as responsive as Edge; Qwen3 path is much slower. |
49
+ | `supertonic` | Local Supertonic CLI wrapper | `supertonic tts ...` | Experimental | Supports voice IDs such as `M1`; falls back to Edge on failure. |
50
+ | `omnivoice` | OmniVoice local reference/design voice | `.venv-omnivoice/bin/python integrations/omnivoice/synth.py` | Experimental | Startup/model load can feel hung. Keep Edge for live mode unless explicitly testing. |
51
+ | `qwen3tts` | Qwen3 TTS via `audio` CLI | `audio speak --engine qwen3 ...` | Slow experimental | Correct backend name is `qwen3tts` / alias `qwen3`; do not use old `q13` aliases. |
52
+ | `mlxaudio` | MLX Audio Qwen3 wrapper | `.venv-mlxaudio/bin/python integrations/mlxaudio/synth.py` | Experimental | Uses MLX Qwen3 model defaults; validate actual audible output, not only file existence. |
53
+ | `neuttsair` | NeuTTS-Air English reference cloning | `.venv-neuttsair/bin/python integrations/neuttsair/synth.py` | Too slow for current live use | English-only in practice. Q4 GGUF lowers latency but still felt unusably slow under contention. |
54
+ | `fireredtts2` | FireRedTTS-2 prompt-reference backend | `./.local/bin/fireredtts2` | Slow experimental | Can stall restart/final TTS long enough to feel broken. Honor explicit user selection, but report slowness instead of silently reverting. |
55
+ | FireRedTTS-2 MLX helper | Apple Silicon FireRed LLM-port experiment | `integrations/fireredtts2/synth_mlx.py` | Not wired as canonical backend yet | Ports the FireRed LLM token generator to MLX/Metal while keeping RedCodec in Torch; intended to avoid upstream Torch Qwen hangs/slowness. |
56
+ | `mossttsnano` | OpenMOSS / MOSS-TTS-Nano PyTorch backend | `.venv-mossttsnano/bin/python vendor/MOSS-TTS-Nano/infer.py` | Very slow experimental | On macOS use Python 3.11 venv and `--disable-wetext-processing`. |
57
+ | `mossttsnano_mlx` | MOSS-TTS-Nano hybrid MLX port | `.venv-mossttsnano/bin/python integrations/mossttsnano_mlx/synth.py` | Active experiment, not live default | Native MLX generator, KV cache, and persistent JSON-line worker were added. Still verify audibility and tokenizer/model parity. |
58
+
59
+ ## Backend aliases
60
+
61
+ Accepted aliases normalize to canonical backend names:
62
+
63
+ | Alias examples | Canonical backend |
64
+ |---|---|
65
+ | `qwen3`, `qwen3-tts`, `qtts` | `qwen3tts` |
66
+ | `mlx`, `mlx-audio`, `qwen3-mlx` | `mlxaudio` |
67
+ | `neutts`, `neutts-air`, `neu tts air` | `neuttsair` |
68
+ | `firered`, `fireredtts`, `firered-tts-2` | `fireredtts2` |
69
+ | `moss`, `moss-tts`, `mossnano`, `openmoss` | `mossttsnano` |
70
+ | `moss-mlx`, `mossttsnano-mlx`, `openmoss-mlx` | `mossttsnano_mlx` |
71
+
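Reviewer sketch (not part of the package): the alias table above amounts to a lookup from alias to canonical name. The version below is a minimal illustration built only from the listed examples. The name `normalizeBackend`, the trimming and lowercasing, and the pass-through of unknown names are assumptions; the real implementation may accept more aliases.

```javascript
// Alias examples copied from the table; the real list may be longer.
const BACKEND_ALIASES = {
  qwen3tts: ["qwen3", "qwen3-tts", "qtts"],
  mlxaudio: ["mlx", "mlx-audio", "qwen3-mlx"],
  neuttsair: ["neutts", "neutts-air", "neu tts air"],
  fireredtts2: ["firered", "fireredtts", "firered-tts-2"],
  mossttsnano: ["moss", "moss-tts", "mossnano", "openmoss"],
  mossttsnano_mlx: ["moss-mlx", "mossttsnano-mlx", "openmoss-mlx"],
};

// Build a flat alias -> canonical map once.
const ALIAS_TO_CANONICAL = new Map();
for (const [canonical, aliases] of Object.entries(BACKEND_ALIASES)) {
  ALIAS_TO_CANONICAL.set(canonical, canonical);
  for (const alias of aliases) ALIAS_TO_CANONICAL.set(alias, canonical);
}

// Hypothetical helper: unknown names are returned unchanged (after cleanup).
function normalizeBackend(name) {
  const key = String(name).trim().toLowerCase().replace(/\s+/g, " ");
  return ALIAS_TO_CANONICAL.get(key) ?? key;
}
```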
72
+ ## Observed latency
73
+
74
+ ### End-to-end voice loop log
75
+
76
+ From `.logs/latency.jsonl`, 160 successful voice turns were available. These measure the whole Discord voice loop, not just TTS:
77
+
78
+ | Stage | Median | P90 | Min | Max |
79
+ |---|---:|---:|---:|---:|
80
+ | STT | 3.81 s | 4.60 s | 0.75 s | 23.70 s |
81
+ | Agent call | 16.90 s | 209.74 s | 5.58 s | 825.90 s |
82
+ | TTS synth | 3.98 s | 12.77 s | 0.72 s | 760.73 s |
83
+ | TTS playback | 19.50 s | 47.16 s | 0.99 s | 90.36 s |
84
+ | TTS total | 23.14 s | 62.28 s | 1.90 s | 782.89 s |
85
+ | Voice capture | 11.66 s | 30.02 s | 3.20 s | 109.10 s |
86
+ | Utterance idle wait | 2.60 s | 4.50 s | 2.60 s | 4.54 s |
87
+ | Total turn | 69.06 s | 289.56 s | 20.99 s | 905.24 s |
88
+
89
+ Interpretation:
90
+
91
+ - Long perceived latency is often not only TTS. Agent work and spoken playback length dominate many turns.
92
+ - A high TTS-synth max indicates local/experimental TTS can stall badly under load or fallback paths.
93
+ - Playback time is real audio duration, so long answers sound slow even if synthesis is fast.
94
+ - The idle wait is intentionally a few seconds to avoid cutting off Korean phone-call utterances.
95
+
96
+ ### Local neural TTS observations
97
+
98
+ | Backend / mode | Observed behavior on this Mac mini | Practical conclusion |
99
+ |---|---|---|
100
+ | Edge TTS | Usually low seconds for chunks; reliable enough for current live mode. | Keep as default/fallback. |
101
+ | SpeechSwift CosyVoice CLI | About 6.9 s wall time for a 1.68 s Korean sample after warm-up. | Demo-capable, but sluggish for conversation. |
102
+ | SpeechSwift audio-server | Warm short Korean requests varied around 4.5-7.7 s and sometimes hung. | Not safe as the always-on live backend yet. |
103
+ | SpeechSwift/Qwen3 | About 62.5 s wall time, first chunk around 47.6 s in prior testing. | Too slow for live phone-call mode. |
104
+ | NeuTTS Air | Produced valid WAVs, but felt unusably slow while the machine was under unrelated GPU/model load. | English-only experiment; use Edge for live answers. |
105
+ | FireRedTTS-2 | Can be slow enough that restart/final TTS appears stalled. Timeout is 180 s by default. | Useful to test, but report slowness clearly. |
106
+ | FireRedTTS-2 MLX helper | Added as an Apple Silicon experiment that moves the LLM token generator to MLX/Metal and keeps RedCodec encode/decode in Torch. | Not a production backend yet; verify dependencies, imports, generated frames, and decoded volume before wiring it to `TTS_BACKEND`. |
107
+ | MOSS-TTS-Nano PyTorch | Works as an OpenMOSS path but is very slow on macOS. | Keep as correctness baseline, not live default. |
108
+ | MOSS-TTS-Nano MLX | Added native generator, sampling fixes, KV cache, and persistent worker; can reduce repeated startup overhead. | Still experimental; verify audible volume and parity before live use. |
109
+
110
+ ## MOSS-TTS-Nano MLX status
111
+
112
+ Recent implementation work added:
113
+
114
+ - `integrations/mossttsnano_mlx/convert.py` for conversion experiments.
115
+ - `integrations/mossttsnano_mlx/gpt2_mlx.py` for a native MLX GPT2-like generator.
116
+ - `integrations/mossttsnano_mlx/synth.py` for the hybrid synthesis path.
117
+ - `integrations/mossttsnano_mlx/worker.py` for a persistent JSON-line worker.
118
+ - `MOSSTTSNANO_MLX_WORKER=1` to keep the worker hot between requests.
119
+ - KV cache and sampling-semantics fixes in the MLX generator.
120
+
121
+ Known caution:
122
+
123
+ - A generated WAV is not enough. Check audibility with playback or `ffmpeg volumedetect`.
124
+ - Near-silent or strange audio usually means model/tokenizer/audio-code parity is still wrong.
125
+ - Keep the PyTorch MOSS path as a reference until MLX parity is proven.
126
+
127
+ ## Configuration examples
128
+
129
+ ### Safe live default
130
+
131
+ ```bash
132
+ TTS_BACKEND=edge
133
+ TTS_VOICE_TYPE=korean_male
134
+ TTS_VOICE=ko-KR-InJoonNeural
135
+ TTS_RATE=+10%
136
+ ```
137
+
138
+ ### Qwen3 TTS preset
139
+
140
+ ```bash
141
+ TTS_BACKEND=qwen3tts
142
+ TTS_VOICE_TYPE=korean_preset
143
+ QWEN3TTS_COMMAND=audio
144
+ QWEN3TTS_MODE=custom
145
+ QWEN3TTS_MODEL=customVoice
146
+ QWEN3TTS_LANGUAGE=korean
147
+ QWEN3TTS_SPEAKER=sohee
148
+ QWEN3TTS_PROGRESS=0
149
+ ```
150
+
151
+ ### NeuTTS Air English experiment
152
+
153
+ ```bash
154
+ TTS_BACKEND=neuttsair
155
+ TTS_VOICE_TYPE=cloned_reference
156
+ VOICE_LANGUAGE=en
157
+ STT_LANGUAGE=en
158
+ WHISPER_CPP_LANGUAGE=en
159
+ NEUTTSAIR_PYTHON=./.venv-neuttsair/bin/python
160
+ NEUTTSAIR_SCRIPT=integrations/neuttsair/synth.py
161
+ NEUTTSAIR_BACKBONE_REPO=neuphonic/neutts-air-q4-gguf
162
+ NEUTTSAIR_CODEC_REPO=neuphonic/neucodec
163
+ NEUTTSAIR_PROGRESS=0
164
+ ```
165
+
166
+ ### FireRedTTS-2 experiment
167
+
168
+ ```bash
169
+ TTS_BACKEND=fireredtts2
170
+ TTS_VOICE_TYPE=prompt_reference
171
+ FIREREDTTS2_COMMAND=./.local/bin/fireredtts2
172
+ FIREREDTTS2_PRETRAINED_DIR=./pretrained_models/FireRedTTS2
173
+ FIREREDTTS2_PROMPT_AUDIO=./voice-samples/user-reference.wav
174
+ FIREREDTTS2_PROGRESS=0
175
+ ```
176
+
177
+ ### MOSS-TTS-Nano PyTorch experiment
178
+
179
+ ```bash
180
+ TTS_BACKEND=mossttsnano
181
+ TTS_VOICE_TYPE=prompt_reference
182
+ MOSSTTSNANO_COMMAND=./.venv-mossttsnano/bin/python
183
+ MOSSTTSNANO_SCRIPT=vendor/MOSS-TTS-Nano/infer.py
184
+ MOSSTTSNANO_CHECKPOINT=OpenMOSS-Team/MOSS-TTS-Nano
185
+ MOSSTTSNANO_PROMPT_AUDIO=./voice-samples/user-reference.wav
186
+ MOSSTTSNANO_PROGRESS=0
187
+ ```
188
+
189
+ ### MOSS-TTS-Nano MLX worker experiment
190
+
191
+ ```bash
192
+ TTS_BACKEND=mossttsnano_mlx
193
+ TTS_VOICE_TYPE=prompt_reference
194
+ MOSSTTSNANO_MLX_PYTHON=./.venv-mossttsnano/bin/python
195
+ MOSSTTSNANO_MLX_SCRIPT=integrations/mossttsnano_mlx/synth.py
196
+ MOSSTTSNANO_MLX_WORKER=1
197
+ MOSSTTSNANO_MLX_WORKER_SCRIPT=integrations/mossttsnano_mlx/worker.py
198
+ MOSSTTSNANO_TORCH_DEVICE=cpu
199
+ MOSSTTSNANO_TORCH_DTYPE=float32
200
+ MOSSTTSNANO_PROMPT_AUDIO=./voice-samples/user-reference.wav
201
+ MOSSTTSNANO_MLX_PROGRESS=0
202
+ ```
203
+
204
+ ## How to benchmark safely
205
+
206
+ Use a quiet machine, short fixed text, and separate synthesis from playback:
207
+
208
+ ```bash
209
+ vc doctor
210
+ node --test app-node/tts_backends.test.mjs app-node/tts_settings.test.mjs app-node/tts_voice_config.test.mjs
211
+ ```
212
+
213
+ For live logs, compare these fields in `.logs/latency.jsonl`:
214
+
215
+ - `stt_ms`: speech-to-text time.
216
+ - `agent_ms`: CLI agent time.
217
+ - `tts_synth_ms`: time to synthesize audio files.
218
+ - `tts_play_ms`: time spent playing generated audio.
219
+ - `total_ms`: full turn time.
220
+
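Reviewer sketch (not part of the package): the fields listed above can be summarized into median and P90 figures like those in the latency table. This assumes each line of `.logs/latency.jsonl` is a JSON object whose `*_ms` fields are numbers in milliseconds, and it uses the nearest-rank percentile definition, which may differ from whatever produced the table. The names `percentile` and `summarizeLatency` are hypothetical.

```javascript
// Nearest-rank percentile over an ascending array (assumed definition).
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return NaN;
  const rank = Math.ceil((p * sortedValues.length) / 100);
  return sortedValues[Math.max(rank, 1) - 1];
}

// Hypothetical helper: summarize selected *_ms fields from JSONL text, in seconds.
function summarizeLatency(
  jsonlText,
  fields = ["stt_ms", "agent_ms", "tts_synth_ms", "tts_play_ms", "total_ms"]
) {
  const rows = jsonlText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));

  const summary = {};
  for (const field of fields) {
    // Rows that lack the field (or hold a non-number) are skipped.
    const values = rows
      .map((row) => row[field])
      .filter((value) => Number.isFinite(value))
      .sort((a, b) => a - b);
    summary[field] = {
      count: values.length,
      medianS: percentile(values, 50) / 1000,
      p90S: percentile(values, 90) / 1000,
    };
  }
  return summary;
}
```

To use it on a live log, read the file as text first, for example with `fs.readFileSync(".logs/latency.jsonl", "utf8")`.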
221
+ When testing local neural backends, also verify:
222
+
223
+ ```bash
224
+ ffmpeg -i output.wav -af volumedetect -f null -
225
+ ```
226
+
227
+ A non-empty file can still be inaudible or near-silent.
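Reviewer sketch (not part of the package): the audibility check above can be automated by reading the `mean_volume` and `max_volume` lines that the `volumedetect` filter prints to stderr. The helper names and the -50 dB threshold are assumptions chosen for illustration; pick a threshold that matches your own recordings.

```javascript
// Extract mean/max volume (in dB) from ffmpeg volumedetect stderr text.
function parseVolumedetect(stderrText) {
  const read = (label) => {
    const match = new RegExp(`${label}:\\s*(-?\\d+(?:\\.\\d+)?) dB`).exec(stderrText);
    return match ? Number(match[1]) : null; // null when missing or non-numeric (e.g. "-inf")
  };
  return { meanDb: read("mean_volume"), maxDb: read("max_volume") };
}

// Hypothetical rule: treat the file as near-silent when the peak is below the
// threshold, or when no numeric peak could be read at all.
function isNearSilent(stats, maxDbThreshold = -50) {
  return stats.maxDb === null || stats.maxDb < maxDbThreshold;
}
```

The stderr text can be captured by running the `ffmpeg` command above through `child_process.spawnSync` and reading its `stderr` property.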
package/docs/USAGE.md CHANGED
@@ -1,45 +1,82 @@
1
1
  # VerbalCoding Usage Guide
2
2
 
3
- This page holds the operational details that used to make the README too long.
3
+ <!-- readme-glow-up:intro -->
4
+ <p align="center">
5
+ <a href="../README.md">README</a> ·
6
+ <a href="README.md">Docs hub</a> ·
7
+ <a href="FRESH_INSTALL.md">Fresh Install</a> ·
8
+ <a href="USAGE.md">Usage</a> ·
9
+ <a href="CONFIGURATION.md">Configuration</a> ·
10
+ <a href="TROUBLESHOOTING.md">Troubleshooting</a> ·
11
+ <a href="MULTI_INSTANCE.md">Multi-Instance</a>
12
+ </p>
13
+
14
+ > Operational command reference for the voice bridge.
15
+ >
16
+ > Fast path: `vc setup → vc start → speak or use !ask in Discord`
17
+ <!-- /readme-glow-up:intro -->
18
+
19
+ This page holds the operational details that should stay out of the README.
4
20
 
5
21
  ## CLI Commands
6
22
 
7
23
  ```bash
8
- vc status # show STT language, progress language, and TTS voice
9
- vc language en # English STT + English progress/TTS voice
10
- vc language ko # Korean STT + Korean progress/TTS voice
11
- vc language auto # Whisper auto-detect STT + English progress/TTS voice
12
- vc restart auto status # show commit-time voice-bot auto-restart setting
13
- vc restart auto on # enable commit-time voice-bot auto-restart
14
- vc restart auto off # disable it; this is the default
15
- vc bot invite CLIENT_ID # print a Discord invite URL with required permissions
16
- vc instance status # list per-instance bridge configs and process status
17
- vc instance setup NAME # write instances/NAME.env and create ~/.hermes/profiles/NAME
18
- vc instance start NAME # start ./run.sh instances/NAME.env detached
19
- vc instance stop NAME # stop a detached instance and remove its pid file
20
- vc doctor # run the redacted doctor check
21
- npm run mcp # run the stdio MCP server
24
+ vc setup # guided setup: prerequisites, Discord token, voice channels
25
+ vc setup --yes # non-interactive bootstrap/starter config for automation
26
+ vc setup --yes --no-wizard # dependency/bootstrap only
27
+ vc setup token # later update Discord bot token
28
+ vc setup token TOKEN --client-id ID # non-interactive token/client-id update
29
+ vc setup channels "General,Team Voice" # later update auto-join voice channel names
30
+ vc setup channel "General" # alias for setup channels
31
+ vc setup voice "General" # alias for setup channels
32
+ vc bot invite CLIENT_ID # print a Discord invite URL with required permissions
33
+ vc status # show STT language, progress language, and TTS voice
34
+ vc language en # English STT + English progress/TTS voice
35
+ vc language ko # Korean STT + Korean progress/TTS voice
36
+ vc language auto # Whisper auto-detect STT + English progress/TTS voice
37
+ vc restart auto status # show commit-time voice-bot auto-restart setting
38
+ vc restart auto on # enable commit-time voice-bot auto-restart
39
+ vc restart auto off # disable it; this is the default
40
+ vc instance list # list per-instance bridge configs
41
+ vc instance status [NAME] # show instance process status
42
+ vc instance setup NAME # write instances/NAME.env and create ~/.hermes/profiles/NAME
43
+ vc instance start NAME # start ./run.sh instances/NAME.env detached
44
+ vc instance stop NAME # stop a detached instance and remove its pid file
45
+ vc doctor # run the redacted doctor check and supported auto-fixes
46
+ vc start # start the default bridge
47
+ npm run mcp # run the stdio MCP server from a clone
22
48
  ```
23
49
 
24
- Language changes update `.env`; restart the bridge with `./run.sh` or your process manager for them to take effect.
50
+ For npm/global installs, prefer `vc ...` commands. Use `./scripts/install.sh` only from a GitHub clone.
51
+
52
+ `vc setup token` and `vc setup channels` are safe follow-up commands: they update `.env` in place, preserve unrelated keys, set file mode `0600`, and avoid printing secrets.
53
+
54
+ Language changes update `.env`; restart the bridge with `vc start`, `./run.sh`, or your process manager for them to take effect.
25
55
 
26
56
  ## Run Modes
27
57
 
28
58
  Single-instance bridge:
29
59
 
30
60
  ```bash
61
+ vc start
62
+ # clone equivalent:
31
63
  ./run.sh
32
64
  ```
33
65
 
34
66
  Per-instance bridge using a local override env:
35
67
 
36
68
  ```bash
69
+ vc instance start my-project
70
+ # clone/debug equivalent:
37
71
  ./run.sh instances/my-project.env
38
- # or
39
72
  VERBALCODING_INSTANCE_ENV=instances/my-project.env ./run.sh
40
73
  ```
41
74
 
42
- The bot auto-joins the first configured channel name, defaulting to `일반,General,general`.
75
+ The bot auto-joins the first matching configured channel name. Set it with:
76
+
77
+ ```bash
78
+ vc setup channels "VerbalCoding,LLM-Wiki,General"
79
+ ```
43
80
 
44
81
  ## Discord Commands
45
82
 
@@ -70,6 +107,28 @@ Then use `vc bot invite CLIENT_ID` to generate the VerbalCoding-specific invite
70
107
 
71
108
  Voice equivalents such as “외부 모드”, “보수 모드”, “실내”, “기본 감도”, and clear stop phrases like “잠깐”, “멈춰”, “그만” are handled by the bridge. You can also say “상세 진행 켜” / “상세 진행 꺼” to toggle verbose progress by voice.
72
109
 
110
+ ## Cross-agent voice routing
111
+
112
+ VerbalCoding can route a single turn (or the rest of the session) to a different installed CLI agent without restarting.
113
+
114
+ | Voice phrase (en) | Voice phrase (ko) | Behavior |
115
+ |---|---|---|
116
+ | `ask Codex what it thinks` | `코덱스한테 물어봐` | Single-turn route to Codex; next utterance returns to the default. |
117
+ | `switch to Aider` | `aider로 전환` | Sticky route — every following utterance goes to Aider. |
118
+ | `back to default` | `기본으로 돌아가` | Restore the default agent (`AGENT_BACKEND` / `vc setup` selection). |
119
+ | `let Claude finish this` | — | Treated as sticky route to Claude Code. |
120
+
121
+ Recognized aliases: `hermes`, `claude` / `claude code`, `codex` / `코덱스`, `gemini` / `gemini cli` / `제미나이`, `opencode`, `openclaw`, `aider` / `에이더`, `cursor` / `cursor cli`.
122
+
123
+ Behaviors on top:
124
+
125
+ - **Missing-binary fallback** — if the requested backend's binary is not on `PATH` (resolved against the active project session's workdir when applicable), the bridge asks "Want me to use the default agent instead?" Answer "yes" / "예" to retry on the default; "no" / "아니오" to cancel.
126
+ - **TTS prefix on backend change** — when the active backend changes between turns, the spoken answer is prefixed (`Codex says: …` / `코덱스: …`). No prefix on stable backends.
127
+ - **Cross-agent context handoff** — the routed agent receives a prompt block containing the prior agent label, recent voice utterances (last 4), and the most recently resolved plan decisions, so it doesn't restart cold.
128
+ - **Plan-mode `which_agent` slot** — plans can include a `which_agent` decision listing CLI options (e.g. `codex, aider, claude, gemini, opencode, openclaw, cursor, hermes`); the user's voice answer selects which agent executes that plan.
129
+ - **Per-channel state** — routing is scoped per Discord channel; switching agents in one project room does not affect others.
130
+ - **Sticky survives interrupts** — barge-in or aborted turns keep a sticky route intact; only single-turn routes are cleared.
131
+
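Reviewer sketch (not part of the package): the routing rules above (single-turn versus sticky routes, per-channel scope, and sticky routes surviving interrupts) can be modelled with a small state object per channel. The name `createRouter` and its method names are hypothetical; the actual logic lives in the package's routing modules, which are not shown in this hunk.

```javascript
// Hypothetical sketch of per-channel agent routing state.
function createRouter(defaultAgent) {
  const channels = new Map(); // channelId -> { sticky, oneShot }

  const stateFor = (channelId) => {
    if (!channels.has(channelId)) {
      channels.set(channelId, { sticky: null, oneShot: null });
    }
    return channels.get(channelId);
  };

  return {
    // "ask Codex what it thinks": applies to the next turn only.
    routeOnce(channelId, agent) {
      stateFor(channelId).oneShot = agent;
    },
    // "switch to Aider": applies to every following turn.
    routeSticky(channelId, agent) {
      const state = stateFor(channelId);
      state.sticky = agent;
      state.oneShot = null;
    },
    // "back to default": drop all routing for this channel.
    reset(channelId) {
      channels.delete(channelId);
    },
    // Barge-in or aborted turn: clear only the single-turn route.
    interrupt(channelId) {
      stateFor(channelId).oneShot = null;
    },
    // Resolve the agent for the next turn and consume any single-turn route.
    nextAgent(channelId) {
      const state = stateFor(channelId);
      const agent = state.oneShot ?? state.sticky ?? defaultAgent;
      state.oneShot = null;
      return agent;
    },
  };
}
```

Keying the map by Discord channel ID keeps a switch in one project room from affecting the others.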
73
132
  ## Changing the Voice
74
133
 
75
134
  `vc language ko|en|auto` changes STT language, progress language, and the matching default TTS voice together. If you only want to change the speaker/voice while the bridge is running, say it in Discord voice:
@@ -81,7 +140,7 @@ change voice to Korean female
81
140
  switch speaker to English
82
141
  ```
83
142
 
84
- The live bridge recognizes these as voice-control commands, updates `config/tts-voices.json`, updates the effective TTS env for the running process, and answers with a short confirmation such as “목소리를 Korean male로 바꿨어.” Use `!voice-test <text>` right after changing it to hear the current backend and voice.
143
+ The live bridge recognizes these as voice-control commands, updates `config/tts-voices.json`, updates the effective TTS env for the running process, and answers with a short confirmation. Use `!voice-test <text>` right after changing it to hear the current backend and voice.
85
144
 
86
145
  Built-in Edge voice types:
87
146
 
@@ -93,27 +152,12 @@ Built-in Edge voice types:
93
152
  | `english_male` | `en-US-GuyNeural` |
94
153
  | `english_female` | `en-US-AriaNeural` |
95
154
 
96
- For persistent manual config, set `TTS_BACKEND=edge`, `TTS_VOICE_TYPE=<voice-type>`, and optionally `TTS_VOICE=<edge-voice>` in `.env`, or edit `config/tts-voices.json` for custom voice catalogs.
97
-
98
- Backend-specific voice knobs:
99
-
100
- | Backend | Voice setting | Common choices |
101
- |---|---|---|
102
- | Edge | `TTS_VOICE_TYPE`, `TTS_VOICE` | `korean_male`, `korean_female`, `korean_multilingual_male`, `english_male`, `english_female`; any Edge voice from `edge-tts --list-voices` |
103
- | Supertonic | `SUPERTONIC_VOICE` | `M1`–`M5`, `F1`–`F5`; set `SUPERTONIC_LANGUAGE=ko|en|es|pt|fr` |
104
- | OpenVoice | `OPENVOICE_REF_AUDIO`, `OPENVOICE_STYLE` | a permitted reference WAV plus style such as `default` |
105
- | SpeechSwift / CosyVoice | `SPEECHSWIFT_REF_AUDIO`, `SPEECHSWIFT_ENGINE`, `SPEECHSWIFT_SPEAKER` | reference WAV for CosyVoice, or backend-supported speaker/model values |
106
-
107
- For Supertonic and local clone backends, use the backend env vars above plus `!voice-test <text>` to audition changes. Voice-command switching currently maps the built-in Edge-style voice types; richer backend catalogs can be added in `config/tts-voices.json`.
108
-
109
155
  ## Long Dictation and Pauses
110
156
 
111
- VerbalCoding waits for an idle window before sending speech to STT. The default `UTTERANCE_IDLE_MS=4500` is intentionally a bit patient so a natural pause in a long instruction does not split the sentence, start an agent turn too early, and then treat the rest as a processing-time interruption.
112
-
113
- If you prefer faster short commands, lower it in `.env`; if long Korean dictation is still being split, raise it:
157
+ VerbalCoding waits for an idle window before sending speech to STT. The default `UTTERANCE_IDLE_MS=4500` is intentionally patient so a natural pause in a long instruction does not split the sentence.
114
158
 
115
159
  ```bash
116
- UTTERANCE_IDLE_MS="6000"
160
+ UTTERANCE_IDLE_MS="6000" # safer for long dictation with pauses
117
161
  ```
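To see why a larger idle window keeps a paused sentence together, the idea can be modelled offline. The sketch below is an assumption-laden illustration, not the bridge's capture code: `splitUtterances` is a hypothetical helper that takes audio-chunk arrival times in milliseconds and starts a new utterance whenever the silence between two chunks reaches the idle window.

```javascript
// Hypothetical model of idle-window segmentation (not the bridge's real code).
function splitUtterances(chunkTimesMs, idleMs = 4500) {
  const utterances = [];
  let current = [];
  for (const t of chunkTimesMs) {
    const last = current[current.length - 1];
    // A gap at least as long as the idle window closes the current utterance.
    if (current.length > 0 && t - last >= idleMs) {
      utterances.push(current);
      current = [];
    }
    current.push(t);
  }
  if (current.length > 0) utterances.push(current);
  return utterances;
}

// A speaker pauses for 5 seconds in the middle of one instruction.
const chunkTimes = [0, 1000, 6000, 7000];
console.log(splitUtterances(chunkTimes, 4500).length); // 2: the pause splits it
console.log(splitUtterances(chunkTimes, 6000).length); // 1: kept as one utterance
```

The trade-off is latency: every utterance, including a short command, waits for the full idle window before it is sent to STT.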
118
162
 
119
163
  ## Verbose Progress Mode
@@ -128,7 +172,19 @@ Verbose progress is off by default unless `AGENT_VERBOSE_PROGRESS=1` is set. Ena
128
172
  🤖 Hermes Agent 응답 수신
129
173
  ```
130
174
 
131
- This mode asks the selected CLI harness to emit `VERBALCODING_PROGRESS: ...` lines and summarizes common tool markers from streaming stdout/stderr when available. Secret-looking fields are redacted and progress lines are removed from the final spoken answer.
175
+ Secret-looking fields are redacted and progress lines are removed from the final spoken answer.
176
+
177
+ ## Docker / Container Run Mode
178
+
179
+ If you run VerbalCoding in Docker and voice auto-join fails with `Cannot perform IP discovery - socket closed`, the likely issue is UDP connectivity, not channel lookup. For Linux Docker Compose:
180
+
181
+ ```yaml
182
+ services:
183
+   verbalcoding:
184
+     network_mode: "host"
185
+ ```
186
+
187
+ Remove `ports:` from that service. Docker Desktop for macOS/Windows has different host networking behavior; if UDP voice still fails there, run on the host or in a Linux VM. See [Troubleshooting](TROUBLESHOOTING.md).
132
188
 
133
189
  ## Latency Metrics
134
190
 
@@ -138,8 +194,6 @@ VerbalCoding writes per-turn latency records as JSONL. Default path:
138
194
  ./.logs/latency.jsonl
139
195
  ```
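Because the log is plain JSONL, the same kind of summary the bridge reports (count, average, p95, max over the latest 200 records) can be reproduced offline. The sketch below is only an illustration: `summarizeLatency` is a hypothetical helper, and the field name `total_ms` is an assumption, so check the actual keys in your `latency.jsonl` before relying on it.

```javascript
// Hypothetical offline summary of a latency JSONL log.
// ASSUMPTION: each record has a numeric field such as "total_ms".
function summarizeLatency(jsonlText, field = "total_ms", limit = 200) {
  const values = jsonlText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line)[field])
    .filter((value) => typeof value === "number")
    .slice(-limit); // keep only the most recent records

  if (values.length === 0) return { count: 0 };

  const sorted = [...values].sort((a, b) => a - b);
  const sum = values.reduce((acc, value) => acc + value, 0);
  // Nearest-rank percentile: smallest value with at least 95% at or below it.
  const p95 = sorted[Math.ceil(0.95 * sorted.length) - 1];

  return {
    count: values.length,
    average: sum / values.length,
    p95,
    max: sorted[sorted.length - 1],
  };
}

// Example with 20 synthetic records: 100 ms, 200 ms, ..., 2000 ms.
const sample = Array.from({ length: 20 }, (_, i) =>
  JSON.stringify({ status: "ok", total_ms: (i + 1) * 100 })
).join("\n");
console.log(summarizeLatency(sample));
```

To run it against a real log, read the file with `fs.readFileSync(path, "utf8")` and pass the text in; the percentile method here (nearest rank) may differ from the one the bridge uses.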
140
196
 
141
- Each record includes status, total time, voice capture time, utterance idle wait, STT time, agent time, TTS synthesis/playback time, chunk counts, transcript length, answer length, and audio levels where available.
142
-
143
197
  In Discord:
144
198
 
145
199
  ```text
@@ -154,7 +208,7 @@ The summary uses the latest 200 records: count, average, p95, max, and non-OK st
154
208
  ```bash
155
209
  node --check app-node/main.mjs
156
210
  npm test
157
- bash -n run.sh scripts/install.sh
211
+ bash -n run.sh scripts/install.sh scripts/bootstrap_prereqs.sh
158
212
  vc doctor
159
213
  ```
160
214
 
@@ -73,6 +73,6 @@
73
73
  <text x="594" y="194" fill="#FDE68A" text-anchor="middle" font-family="Inter, ui-sans-serif, system-ui" font-size="15" font-weight="700">Barge-in stays open while the agent is thinking or speaking</text>
74
74
 
75
75
  <rect x="150" y="348" width="900" height="54" rx="17" fill="#020617" stroke="#1F2937"/>
76
- <text x="182" y="382" fill="#A7F3D0" font-family="SFMono-Regular, ui-monospace, monospace" font-size="18">$ vc language ko &amp;&amp; vc instance start my-project</text>
76
+ <text x="182" y="382" fill="#A7F3D0" font-family="SFMono-Regular, ui-monospace, monospace" font-size="18">$ vc setup vc doctor → vc start</text>
77
77
  <text x="1045" y="382" fill="#64748B" text-anchor="end" font-family="Inter, ui-sans-serif, system-ui" font-size="15">hands-free coding call</text>
78
78
  </svg>
@@ -0,0 +1,34 @@
1
+ # Guía del repositorio (español)
2
+
3
+ > Este fichero es un resumen en español de [`AGENTS.md`](../../AGENTS.md). Las reglas formales viven en el inglés original.
4
+
5
+ VerbalCoding es un puente de voz Discord para agentes de codificación. El runtime es la implementación Node en `app-node/`, lanzada vía `run.sh` o el CLI `vc`.
6
+
7
+ ## Desarrollo
8
+
9
+ - En docs y ejemplos prefiere `vc ...` sobre `npm run vc -- ...`.
10
+ - Los secretos locales viven en `.env` o `instances/*.env`; nunca commits con tokens Discord, IDs de canal, ficheros de sesión, muestras de voz, pesos de modelo, virtualenvs, logs ni cachés.
11
+ - Edita ficheros fuente, no artefactos generados.
12
+ - Ejemplos públicos: usa placeholders para rutas locales, IDs de usuario, IDs Discord y tokens.
13
+
14
+ ## Verificación
15
+
16
+ Antes de marcar un cambio como completo, ejecuta:
17
+
18
+ ```bash
19
+ npm test
20
+ ```
21
+
22
+ ## Layout de módulos
23
+
24
+ Detalle en [`AGENTS.md`](../../AGENTS.md). Módulos clave:
25
+
26
+ - `main.mjs` — dispatcher Discord / voz / agente
27
+ - `agent_routing.mjs` — enrutamiento entre agentes por voz
28
+ - `plan_mode.mjs` — modo plan por voz (slot `which_agent`)
29
+ - `session_ontology.mjs` — grafo tipado por canal (handoff)
30
+ - `research_mode.mjs` — comando `"research X"`
31
+
32
+ ## Bloque gestionado
33
+
34
+ HarnessSync sincroniza las reglas de `CLAUDE.md` dentro de `AGENTS.md`. No edites manualmente ese bloque.
@@ -0,0 +1,34 @@
1
+ # Guide du dépôt (français)
2
+
3
+ > Ce fichier est un résumé français de [`AGENTS.md`](../../AGENTS.md). Les règles formelles restent dans l'original anglais.
4
+
5
+ VerbalCoding est un pont vocal Discord pour les agents de codage. Le runtime est l'implémentation Node sous `app-node/`, lancée via `run.sh` ou le CLI `vc`.
6
+
7
+ ## Développement
8
+
9
+ - Dans les docs et exemples, préférez `vc ...` à `npm run vc -- ...`.
10
+ - Les secrets locaux vivent dans `.env` ou `instances/*.env`; ne commitez jamais de vrais tokens Discord, IDs de salon, fichiers de session, échantillons vocaux, poids de modèle, virtualenvs, logs ni caches.
11
+ - Modifiez les fichiers source, pas les artefacts générés.
12
+ - Exemples publics-safe : placeholders pour chemins locaux, IDs utilisateur, IDs Discord, tokens.
13
+
14
+ ## Vérification
15
+
16
+ Avant de signaler un changement comme terminé, exécutez:
17
+
18
+ ```bash
19
+ npm test
20
+ ```
21
+
22
+ ## Cartographie des modules
23
+
24
+ Détail dans [`AGENTS.md`](../../AGENTS.md). Modules clés :
25
+
26
+ - `main.mjs` — dispatcher Discord / voix / agent
27
+ - `agent_routing.mjs` — routage inter-agent par voix
28
+ - `plan_mode.mjs` — mode plan vocal (slot `which_agent`)
29
+ - `session_ontology.mjs` — graphe typé par salon (handoff)
30
+ - `research_mode.mjs` — commande `"research X"`
31
+
32
+ ## Bloc géré
33
+
34
+ HarnessSync synchronise les règles de `CLAUDE.md` dans `AGENTS.md`. Ne pas éditer manuellement ce bloc.
@@ -0,0 +1,34 @@
1
+ # リポジトリガイドライン (日本語)
2
+
3
+ > 本ファイルは [`AGENTS.md`](../../AGENTS.md) の日本語要約です。正式なルールは英語の本文を参照してください。
4
+
5
+ VerbalCoding はコーディングエージェント向けの Discord 音声ブリッジです。実装は `app-node/` 配下の Node 実装で、`run.sh` または `vc` CLI 経由で起動します。
6
+
7
+ ## 開発
8
+
9
+ - ドキュメント / サンプルでは `vc ...` を `npm run vc -- ...` より優先してください。
10
+ - ローカルシークレットは `.env` または `instances/*.env` に置き、実 Discord トークン、チャンネル ID、セッションファイル、音声サンプル、モデル重み、venv、ログ、キャッシュ出力はコミットしないでください。
11
+ - 自動生成物ではなくソースファイルを編集してください。
12
+ - サンプルは公開しても安全な値で。ローカルパス、ユーザー ID、Discord ID、トークンはプレースホルダで。
13
+
14
+ ## 検証
15
+
16
+ コード変更を完了とする前に Node テストを走らせてください:
17
+
18
+ ```bash
19
+ npm test
20
+ ```
21
+
22
+ ## モジュール構成
23
+
24
+ 詳細は [`AGENTS.md`](../../AGENTS.md) を参照。主要モジュール:
25
+
26
+ - `main.mjs` — Discord / 音声 / エージェントのディスパッチャ
27
+ - `agent_routing.mjs` — 音声主導のクロスエージェントルーティング
28
+ - `plan_mode.mjs` — 音声プランモード(`which_agent` スロット)
29
+ - `session_ontology.mjs` — チャネル単位の typed graph(handoff 用)
30
+ - `research_mode.mjs` — `"research X"` 音声コマンドパイプライン
31
+
32
+ ## 管理ブロック
33
+
34
+ HarnessSync が `AGENTS.md` に `CLAUDE.md` のルールを同期します。当該ブロックは編集しないでください。
@@ -0,0 +1,34 @@
1
+ # 저장소 가이드라인 (한국어)
2
+
3
+ > 이 파일은 [`AGENTS.md`](../../AGENTS.md)의 한국어 요약입니다. 정식 규칙은 원본 영어 문서를 따라주세요.
4
+
5
+ VerbalCoding은 코딩 에이전트용 Discord 음성 브릿지입니다. 실제 런타임은 `app-node/` 하위 Node 구현체이고, `run.sh` 또는 `vc` CLI로 실행합니다.
6
+
7
+ ## 개발
8
+
9
+ - 문서·예제에서는 `npm run vc -- ...` 보다 `vc ...` 형태를 우선 사용합니다.
10
+ - 로컬 비밀은 `.env` 또는 `instances/*.env`에만 두고 절대 커밋하지 마세요. 실제 Discord 토큰, 채널 ID, 세션 파일, 음성 샘플, 모델 가중치, 가상환경, 로그, 캐시 출력도 마찬가지입니다.
11
+ - 생성물/런타임 산출물 대신 소스 파일을 수정합니다.
12
+ - 예제는 공개 안전한 값으로 유지합니다. 로컬 경로, 사용자 ID, Discord ID, 토큰은 플레이스홀더로.
13
+
14
+ ## 검증
15
+
16
+ 코드 변경을 완료로 보고하기 전에 Node 테스트 스위트를 실행하세요:
17
+
18
+ ```bash
19
+ npm test
20
+ ```
21
+
22
+ ## 모듈 맵
23
+
24
+ 자세한 내용은 [`AGENTS.md`](../../AGENTS.md)를 참고하세요. 핵심 모듈:
25
+
26
+ - `main.mjs` — Discord/음성/에이전트 디스패처
27
+ - `agent_routing.mjs` — 음성 기반 크로스 에이전트 라우팅
28
+ - `plan_mode.mjs` — 음성 플랜 모드 (which_agent 슬롯)
29
+ - `session_ontology.mjs` — 채널별 타입드 그래프 (cross-agent 핸드오프 컨텍스트)
30
+ - `research_mode.mjs` — `"리서치 X"` 음성 명령 파이프라인
31
+
32
+ ## 관리되는 영역
33
+
34
+ HarnessSync가 `AGENTS.md`에 `CLAUDE.md`의 규칙을 자동 동기화합니다. 그 블록은 손대지 마세요.
@@ -0,0 +1,34 @@
1
+ # Руководство по репозиторию (русский)
2
+
3
+ > Этот файл — русское резюме [`AGENTS.md`](../../AGENTS.md). Формальные правила — в оригинале на английском.
4
+
5
+ VerbalCoding — голосовой мост Discord для кодинг-агентов. Рантайм — Node-реализация в `app-node/`, запускается через `run.sh` или CLI `vc`.
6
+
7
+ ## Разработка
8
+
9
+ - В документации и примерах предпочитайте `vc ...`, а не `npm run vc -- ...`.
10
+ - Локальные секреты — в `.env` или `instances/*.env`. Не коммитьте реальные Discord-токены, channel ID, session-файлы, голосовые сэмплы, веса моделей, venv, логи, кеши.
11
+ - Правьте исходные файлы, а не сгенерированные артефакты.
12
+ - Примеры — публично-безопасные: плейсхолдеры для локальных путей, user ID, Discord ID, токенов.
13
+
14
+ ## Проверка
15
+
16
+ Перед тем как считать изменения готовыми, запустите:
17
+
18
+ ```bash
19
+ npm test
20
+ ```
21
+
22
+ ## Карта модулей
23
+
24
+ Подробности в [`AGENTS.md`](../../AGENTS.md). Ключевые модули:
25
+
26
+ - `main.mjs` — диспетчер Discord / голос / агенты
27
+ - `agent_routing.mjs` — голосовая маршрутизация между агентами
28
+ - `plan_mode.mjs` — голосовой plan-mode (слот `which_agent`)
29
+ - `session_ontology.mjs` — типизированный граф на канал (handoff)
30
+ - `research_mode.mjs` — голосовая команда `"research X"`
31
+
32
+ ## Управляемый блок
33
+
34
+ HarnessSync синхронизирует правила из `CLAUDE.md` в управляемый блок `AGENTS.md`. Не редактируйте этот блок вручную.
@@ -0,0 +1,34 @@
1
+ # 仓库指南 (中文)
2
+
3
+ > 本文是 [`AGENTS.md`](../../AGENTS.md) 的中文摘要。正式规则以英文原文为准。
4
+
5
+ VerbalCoding 是面向编码代理的 Discord 语音桥。运行时位于 `app-node/`,通过 `run.sh` 或 `vc` CLI 启动。
6
+
7
+ ## 开发
8
+
9
+ - 文档与示例优先使用 `vc ...` 形式,而不是 `npm run vc -- ...`。
10
+ - 本地密钥放在 `.env` 或 `instances/*.env`,不要提交真实 Discord token、频道 ID、会话文件、语音样本、模型权重、虚拟环境、日志、缓存。
11
+ - 修改源文件而非自动生成物。
12
+ - 示例保持公开安全:本地路径、用户 ID、Discord ID、token 用占位符替代。
13
+
14
+ ## 验证
15
+
16
+ 报告完成前请运行 Node 测试:
17
+
18
+ ```bash
19
+ npm test
20
+ ```
21
+
22
+ ## 模块布局
23
+
24
+ 详情见 [`AGENTS.md`](../../AGENTS.md)。核心模块:
25
+
26
+ - `main.mjs` — Discord / 语音 / 代理调度器
27
+ - `agent_routing.mjs` — 语音驱动的跨代理路由
28
+ - `plan_mode.mjs` — 语音 plan 模式 (`which_agent` 槽)
29
+ - `session_ontology.mjs` — 按频道的类型图 (用于 handoff)
30
+ - `research_mode.mjs` — `"research X"` 语音命令流程
31
+
32
+ ## 托管区域
33
+
34
+ HarnessSync 会把 `CLAUDE.md` 的规则同步进 `AGENTS.md` 的托管块,请勿手动修改该块。