verbalcoding 0.2.0

Files changed (85)
  1. package/.env.example +83 -0
  2. package/LICENSE +21 -0
  3. package/README.md +157 -0
  4. package/app-node/agent_adapters.mjs +576 -0
  5. package/app-node/agent_adapters.test.mjs +455 -0
  6. package/app-node/agent_contract.mjs +45 -0
  7. package/app-node/barge_in.mjs +148 -0
  8. package/app-node/barge_in.test.mjs +179 -0
  9. package/app-node/bridge_logger.mjs +66 -0
  10. package/app-node/bridge_logger.test.mjs +73 -0
  11. package/app-node/bridge_state.mjs +104 -0
  12. package/app-node/bridge_state.test.mjs +64 -0
  13. package/app-node/cli_install.test.mjs +97 -0
  14. package/app-node/deferred_queue.mjs +12 -0
  15. package/app-node/deferred_queue.test.mjs +20 -0
  16. package/app-node/discord_invite_cli.test.mjs +31 -0
  17. package/app-node/discord_text.mjs +29 -0
  18. package/app-node/discord_text.test.mjs +32 -0
  19. package/app-node/hermes_profiles.mjs +164 -0
  20. package/app-node/hermes_profiles.test.mjs +276 -0
  21. package/app-node/install_config.mjs +263 -0
  22. package/app-node/install_config.test.mjs +205 -0
  23. package/app-node/instance_doctor.mjs +137 -0
  24. package/app-node/instance_doctor.test.mjs +128 -0
  25. package/app-node/instance_profile_lifecycle.mjs +16 -0
  26. package/app-node/instances.mjs +153 -0
  27. package/app-node/instances.test.mjs +102 -0
  28. package/app-node/language_config.mjs +73 -0
  29. package/app-node/language_config.test.mjs +51 -0
  30. package/app-node/latency_metrics.mjs +133 -0
  31. package/app-node/latency_metrics.test.mjs +71 -0
  32. package/app-node/main.mjs +1771 -0
  33. package/app-node/mcp_tools.mjs +198 -0
  34. package/app-node/mcp_tools.test.mjs +39 -0
  35. package/app-node/progress_cache.mjs +7 -0
  36. package/app-node/progress_cache.test.mjs +23 -0
  37. package/app-node/progress_speech.mjs +102 -0
  38. package/app-node/progress_speech.test.mjs +48 -0
  39. package/app-node/project_sessions.mjs +148 -0
  40. package/app-node/project_sessions.test.mjs +77 -0
  41. package/app-node/restart_notice.mjs +57 -0
  42. package/app-node/restart_notice.test.mjs +37 -0
  43. package/app-node/restart_policy.mjs +27 -0
  44. package/app-node/restart_policy.test.mjs +33 -0
  45. package/app-node/text_routing.mjs +8 -0
  46. package/app-node/text_routing.test.mjs +18 -0
  47. package/app-node/tts_backends.mjs +251 -0
  48. package/app-node/tts_backends.test.mjs +400 -0
  49. package/app-node/tts_chunks.mjs +57 -0
  50. package/app-node/tts_chunks.test.mjs +35 -0
  51. package/app-node/tts_prefetch.mjs +38 -0
  52. package/app-node/tts_prefetch.test.mjs +49 -0
  53. package/app-node/tts_settings.mjs +72 -0
  54. package/app-node/tts_settings.test.mjs +127 -0
  55. package/app-node/tts_voice_config.mjs +127 -0
  56. package/app-node/tts_voice_config.test.mjs +64 -0
  57. package/app-node/voice_clone_capture.mjs +76 -0
  58. package/app-node/voice_clone_capture.test.mjs +51 -0
  59. package/app-node/voice_messages.mjs +62 -0
  60. package/app-node/voice_messages.test.mjs +33 -0
  61. package/docs/CONFIGURATION.md +183 -0
  62. package/docs/FRESH_INSTALL.md +193 -0
  63. package/docs/MULTI_INSTANCE.md +183 -0
  64. package/docs/RELEASE.md +72 -0
  65. package/docs/USAGE.md +108 -0
  66. package/docs/assets/figures/verbalcoding-flow.svg +63 -0
  67. package/docs/i18n/README.es.md +121 -0
  68. package/docs/i18n/README.fr.md +121 -0
  69. package/docs/i18n/README.ja.md +121 -0
  70. package/docs/i18n/README.ko.md +121 -0
  71. package/docs/i18n/README.ru.md +121 -0
  72. package/docs/i18n/README.zh.md +121 -0
  73. package/package.json +58 -0
  74. package/run.sh +82 -0
  75. package/scripts/bootstrap_prereqs.sh +193 -0
  76. package/scripts/cli.mjs +369 -0
  77. package/scripts/docker_ubuntu_smoke.sh +76 -0
  78. package/scripts/doctor.mjs +134 -0
  79. package/scripts/install.mjs +108 -0
  80. package/scripts/install.sh +44 -0
  81. package/scripts/mcp-server.mjs +84 -0
  82. package/scripts/openvoice_smoke.py +34 -0
  83. package/scripts/openvoice_synth.py +103 -0
  84. package/scripts/setup_openvoice.sh +34 -0
  85. package/scripts/setup_supertonic.sh +18 -0
package/docs/USAGE.md ADDED
@@ -0,0 +1,108 @@
1
+ # VerbalCoding Usage Guide
2
+
3
+ This page holds the operational details that used to make the README too long.
4
+
5
+ ## CLI Commands
6
+
7
+ ```bash
8
+ vc status # show STT language, progress language, and TTS voice
9
+ vc language en # English STT + English progress/TTS voice
10
+ vc language ko # Korean STT + Korean progress/TTS voice
11
+ vc language auto # Whisper auto-detect STT + English progress/TTS voice
12
+ vc restart auto status # show commit-time voice-bot auto-restart setting
13
+ vc restart auto on # enable commit-time voice-bot auto-restart
14
+ vc restart auto off # disable it; this is the default
15
+ vc bot invite CLIENT_ID # print a Discord invite URL with required permissions
16
+ vc instance status # list per-instance bridge configs and process status
17
+ vc instance setup NAME # write instances/NAME.env and create ~/.hermes/profiles/NAME
18
+ vc instance start NAME # start ./run.sh instances/NAME.env detached
19
+ vc instance stop NAME # stop a detached instance and remove its pid file
20
+ vc doctor # run the redacted doctor check
21
+ npm run mcp # run the stdio MCP server
22
+ ```
23
+
24
+ Language changes update `.env`; restart the bridge with `./run.sh` or your process manager for them to take effect.
25
+
26
+ ## Run Modes
27
+
28
+ Single-instance bridge:
29
+
30
+ ```bash
31
+ ./run.sh
32
+ ```
33
+
34
+ Per-instance bridge using a local override env:
35
+
36
+ ```bash
37
+ ./run.sh instances/my-project.env
38
+ # or
39
+ VERBALCODING_INSTANCE_ENV=instances/my-project.env ./run.sh
40
+ ```
41
+
42
+ The bot auto-joins the first configured channel name, defaulting to `일반,General,general`.
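One way to read the auto-join rule is "walk the configured names in order and join the first one that exists in the guild". The sketch below illustrates that reading only. `pickAutoJoinChannel` and the `{ id, name }` channel objects are hypothetical stand-ins, not the bridge's actual code or the discord.js channel type.

```javascript
// Hypothetical helper: choose the voice channel to auto-join.
// `configured` is a comma-separated list of channel names in priority order.
// `channels` is a plain array of { id, name } objects standing in for a guild's voice channels.
function pickAutoJoinChannel(configured, channels) {
  const names = String(configured ?? '')
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean);
  for (const name of names) {
    const match = channels.find((channel) => channel.name === name);
    if (match) return match; // the first configured name that exists wins
  }
  return null; // nothing matched; the caller decides what to do next
}

// The default list comes from the guide; the channel data is made up.
const DEFAULT_CHANNEL_NAMES = '일반,General,general';
const guildVoiceChannels = [
  { id: '2', name: 'general' },
  { id: '1', name: 'General' },
];
console.log(pickAutoJoinChannel(DEFAULT_CHANNEL_NAMES, guildVoiceChannels));
```

Because the match is exact and case-sensitive in this sketch, `General` and `general` are different channels, which would explain why the default list carries both spellings.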
43
+
44
+ ## Discord Commands
45
+
46
+ | Command | Purpose |
47
+ |---|---|
48
+ | `!ping` | Basic bot check |
49
+ | `!join` / `!leave` | Join or leave voice |
50
+ | `!say <text>` | Speak text directly through TTS |
51
+ | `!voice-test <text>` | Test the active TTS backend |
52
+ | `!voice-clone capture` | Save the next valid utterance as an OpenVoice reference sample |
53
+ | `!voice-clone status` / `!voice-clone cancel` | Inspect or cancel capture |
54
+ | `!ask <prompt>` | Send text through the same selected harness adapter as voice |
55
+ | `!session status` | Show current project/default adapter session |
56
+ | `!session new <name> <workdir> [context] --voice <voice-channel>` | Create a project-scoped Hermes session |
57
+ | `!session attach-voice [sessionName] --voice <voice-channel>` | Bind text channel/thread to a voice channel |
58
+ | `!session list` | List configured project sessions |
59
+ | `!session reset` / `!reset-session` | Clear current project/default adapter session file |
60
+ | `!verbose on/off` | Toggle detailed progress updates |
61
+ | `!latency` / `!metrics` | Show recent latency summary |
62
+ | `!sensitivity normal/conservative` | Switch barge-in sensitivity |
63
+
64
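+ Voice equivalents such as “외부 모드” (outdoor mode), “보수 모드” (conservative mode), “실내” (indoors), and “기본 감도” (default sensitivity), along with clear stop phrases like “잠깐” (wait), “멈춰” (stop), and “그만” (enough), are handled by the bridge. You can also say “상세 진행 켜” (detailed progress on) or “상세 진행 꺼” (detailed progress off) to toggle verbose progress by voice.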
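As a rough illustration of how spoken phrases could be mapped to the same actions as the text commands, here is a minimal sketch. Only the phrases come from this guide. The action names, the grouping of phrases under each sensitivity mode, and the substring matching are assumptions, not the bridge's actual logic.

```javascript
// Assumed phrase-to-action table. The phrases are from the guide; the grouping is a guess.
const VOICE_COMMANDS = [
  { action: 'verbose-on', phrases: ['상세 진행 켜'] },
  { action: 'verbose-off', phrases: ['상세 진행 꺼'] },
  { action: 'sensitivity-conservative', phrases: ['외부 모드', '보수 모드'] },
  { action: 'sensitivity-normal', phrases: ['실내', '기본 감도'] },
  { action: 'stop', phrases: ['잠깐', '멈춰', '그만'] },
];

// Drop whitespace and common punctuation so STT variants such as
// "상세 진행 켜." and "상세진행켜" compare equal.
function normalize(text) {
  return text.normalize('NFC').replace(/[\s.,!?~]+/g, '');
}

// Return the first action whose phrase appears in the transcript, or null.
function matchVoiceCommand(transcript) {
  const heard = normalize(transcript);
  for (const { action, phrases } of VOICE_COMMANDS) {
    if (phrases.some((phrase) => heard.includes(normalize(phrase)))) return action;
  }
  return null;
}

console.log(matchVoiceCommand('상세 진행 켜.'));
```

Substring matching is the simplest option, but it would also fire on a long sentence that merely contains a stop word. A stricter matcher might require the whole utterance to be short before treating it as a stop phrase.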
65
+
66
+ ## Verbose Progress Mode
67
+
68
+ Verbose progress is off by default unless `AGENT_VERBOSE_PROGRESS=1` is set. Enable it with `!verbose on` or a voice command like “상세 진행 켜”. It can emit short progress lines such as:
69
+
70
+ ```text
71
+ 🤖 Hermes Agent 호출 시작
72
+ 📖 파일 읽기 app-node/main.mjs
73
+ 🔎 웹 검색 실행
74
+ ⌨️ 터미널 명령 실행
75
+ 🤖 Hermes Agent 응답 수신
76
+ ```
77
+
78
+ This mode asks the selected CLI harness to emit `VERBALCODING_PROGRESS: ...` lines and summarizes common tool markers from streaming stdout/stderr when available. Secret-looking fields are redacted and progress lines are removed from the final spoken answer.
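The separation of progress lines from the spoken answer can be pictured with a small sketch. The `VERBALCODING_PROGRESS:` prefix is taken from the paragraph above. The function names and the redaction pattern are assumptions for illustration, since the guide does not define what counts as "secret-looking".

```javascript
const PROGRESS_PREFIX = 'VERBALCODING_PROGRESS:';

// Assumption: a "secret-looking" field is a key=value or key: value pair whose
// key contains token, secret, password, or api key. The real rules may differ.
function redactSecrets(text) {
  return text.replace(
    /\b([\w-]*(?:token|secret|password|api[_-]?key)[\w-]*)(\s*[=:]\s*)(\S+)/gi,
    '$1$2[REDACTED]',
  );
}

// Split raw harness output into progress events and the answer to be spoken.
function splitProgress(output) {
  const progress = [];
  const answerLines = [];
  for (const line of output.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed.startsWith(PROGRESS_PREFIX)) {
      // Progress lines are collected separately and never reach TTS.
      progress.push(redactSecrets(trimmed.slice(PROGRESS_PREFIX.length).trim()));
    } else {
      answerLines.push(line);
    }
  }
  return { progress, answer: answerLines.join('\n').trim() };
}

const sampleOutput = [
  'VERBALCODING_PROGRESS: reading app-node/main.mjs',
  'VERBALCODING_PROGRESS: calling API_TOKEN=abc123',
  'Done. Two files changed.',
].join('\n');
console.log(splitProgress(sampleOutput));
```

This sketch works on a complete output string. A streaming version would apply the same prefix check to each line as it arrives on stdout or stderr.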
79
+
80
+ ## Latency Metrics
81
+
82
+ VerbalCoding writes per-turn latency records as JSONL. Default path:
83
+
84
+ ```text
85
+ ./.logs/latency.jsonl
86
+ ```
87
+
88
+ Each record includes status, total time, voice capture time, utterance idle wait, STT time, agent time, TTS synthesis/playback time, chunk counts, transcript length, answer length, and audio levels where available.
89
+
90
+ In Discord:
91
+
92
+ ```text
93
+ !latency
94
+ !metrics
95
+ ```
96
+
97
+ The summary covers the latest 200 records and reports count, average, p95, max, and non-OK statuses.
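A summary like this can be computed directly from the JSONL text. The sketch below is an assumption-laden illustration: the field names `status` and `totalMs`, the `'ok'` status value, and the nearest-rank p95 method are hypothetical and are not taken from the package.

```javascript
// Summarize the most recent `limit` latency records from JSONL text.
// `status` and `totalMs` are assumed field names; check real records for the actual keys.
function summarizeLatency(jsonl, limit = 200) {
  const records = jsonl
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line)) // throws on a malformed line
    .slice(-limit); // keep only the most recent records

  const totals = records
    .map((record) => record.totalMs)
    .filter((value) => Number.isFinite(value))
    .sort((a, b) => a - b);

  // Count each non-OK status separately.
  const nonOk = {};
  for (const record of records) {
    if (String(record.status).toLowerCase() !== 'ok') {
      nonOk[record.status] = (nonOk[record.status] ?? 0) + 1;
    }
  }

  if (totals.length === 0) {
    return { count: records.length, average: null, p95: null, max: null, nonOk };
  }
  const sum = totals.reduce((acc, value) => acc + value, 0);
  // Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
  const p95Index = Math.ceil(0.95 * totals.length) - 1;
  return {
    count: records.length,
    average: sum / totals.length,
    p95: totals[p95Index],
    max: totals[totals.length - 1],
    nonOk,
  };
}

const sampleJsonl = [
  { status: 'ok', totalMs: 100 },
  { status: 'ok', totalMs: 200 },
  { status: 'timeout', totalMs: 300 },
  { status: 'ok', totalMs: 400 },
].map((record) => JSON.stringify(record)).join('\n');
console.log(summarizeLatency(sampleJsonl));
```

With only a few records, p95 and max coincide. The two start to diverge once the window holds more than twenty samples.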
98
+
99
+ ## Testing
100
+
101
+ ```bash
102
+ node --check app-node/main.mjs
103
+ npm test
104
+ bash -n run.sh scripts/install.sh
105
+ vc doctor
106
+ ```
107
+
108
+ `vc doctor` intentionally redacts secrets and only reports whether required values are configured. It also checks `instances/*.env` for duplicate token fingerprints and colliding runtime paths.
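The runtime-path half of that check can be sketched as "parse each instance env file, then report any watched key whose value is shared by more than one instance". Everything named below is hypothetical, including `parseEnv`, `findCollisions`, and the `EXAMPLE_*` keys. The token-fingerprint half is not shown.

```javascript
// Parse KEY=VALUE lines from env-file text, skipping blank lines and comments.
function parseEnv(text) {
  const values = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq <= 0) continue; // not a KEY=VALUE line
    values[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return values;
}

// Report watched keys whose value is shared by two or more instances.
// Only the key and the instance names are returned, never the value itself.
function findCollisions(envTextByInstance, keys) {
  const owners = new Map(); // "KEY=value" -> list of instance names
  for (const [instance, text] of Object.entries(envTextByInstance)) {
    const env = parseEnv(text);
    for (const key of keys) {
      if (!env[key]) continue;
      const slot = `${key}=${env[key]}`;
      owners.set(slot, [...(owners.get(slot) ?? []), instance]);
    }
  }
  return [...owners.entries()]
    .filter(([, names]) => names.length > 1)
    .map(([slot, names]) => ({ key: slot.slice(0, slot.indexOf('=')), instances: names }));
}

// Hypothetical key names and file contents, for illustration only.
const instanceEnvs = {
  alpha: 'EXAMPLE_LOG_DIR=./.logs/shared\nEXAMPLE_SESSION_FILE=./alpha.session',
  beta: '# beta instance\nEXAMPLE_LOG_DIR=./.logs/shared\nEXAMPLE_SESSION_FILE=./beta.session',
};
console.log(findCollisions(instanceEnvs, ['EXAMPLE_LOG_DIR', 'EXAMPLE_SESSION_FILE']));
```

Returning only key names and instance names keeps the report in the same redacted spirit as the doctor output described above.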
package/docs/assets/figures/verbalcoding-flow.svg ADDED
@@ -0,0 +1,63 @@
1
+ <svg width="1200" height="520" viewBox="0 0 1200 520" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="title desc">
2
+ <title id="title">VerbalCoding voice-to-agent flow</title>
3
+ <desc id="desc">A stylized pipeline from Discord voice to speech recognition, CLI agent, text answer, and TTS playback.</desc>
4
+ <defs>
5
+ <linearGradient id="bg" x1="0" y1="0" x2="1200" y2="520" gradientUnits="userSpaceOnUse">
6
+ <stop stop-color="#0F172A"/>
7
+ <stop offset="0.55" stop-color="#111827"/>
8
+ <stop offset="1" stop-color="#312E81"/>
9
+ </linearGradient>
10
+ <linearGradient id="card" x1="120" y1="84" x2="1080" y2="436" gradientUnits="userSpaceOnUse">
11
+ <stop stop-color="#1E293B" stop-opacity="0.92"/>
12
+ <stop offset="1" stop-color="#020617" stop-opacity="0.86"/>
13
+ </linearGradient>
14
+ <filter id="shadow" x="-20%" y="-20%" width="140%" height="140%">
15
+ <feDropShadow dx="0" dy="18" stdDeviation="22" flood-color="#000" flood-opacity="0.35"/>
16
+ </filter>
17
+ </defs>
18
+ <rect width="1200" height="520" rx="34" fill="url(#bg)"/>
19
+ <circle cx="1030" cy="90" r="190" fill="#6366F1" opacity="0.16"/>
20
+ <circle cx="170" cy="430" r="210" fill="#06B6D4" opacity="0.13"/>
21
+ <rect x="70" y="54" width="1060" height="412" rx="32" fill="url(#card)" stroke="#334155" filter="url(#shadow)"/>
22
+ <text x="110" y="118" fill="#F8FAFC" font-family="Inter, ui-sans-serif, system-ui" font-size="42" font-weight="800">VerbalCoding</text>
23
+ <text x="110" y="154" fill="#94A3B8" font-family="Inter, ui-sans-serif, system-ui" font-size="20">Discord voice → local STT → CLI coding agent → spoken answer</text>
24
+
25
+ <g font-family="Inter, ui-sans-serif, system-ui" font-size="17" font-weight="700">
26
+ <rect x="110" y="220" width="150" height="92" rx="20" fill="#5865F2"/>
27
+ <text x="185" y="258" fill="white" text-anchor="middle">Discord</text>
28
+ <text x="185" y="284" fill="#E0E7FF" text-anchor="middle" font-size="14">voice channel</text>
29
+
30
+ <rect x="305" y="220" width="150" height="92" rx="20" fill="#0891B2"/>
31
+ <text x="380" y="258" fill="white" text-anchor="middle">whisper.cpp</text>
32
+ <text x="380" y="284" fill="#CFFAFE" text-anchor="middle" font-size="14">local STT</text>
33
+
34
+ <rect x="500" y="220" width="150" height="92" rx="20" fill="#7C3AED"/>
35
+ <text x="575" y="258" fill="white" text-anchor="middle">Adapter</text>
36
+ <text x="575" y="284" fill="#EDE9FE" text-anchor="middle" font-size="14">Hermes / Claude / Codex</text>
37
+
38
+ <rect x="695" y="220" width="150" height="92" rx="20" fill="#111827" stroke="#475569"/>
39
+ <text x="770" y="258" fill="white" text-anchor="middle">CLI Agent</text>
40
+ <text x="770" y="284" fill="#CBD5E1" text-anchor="middle" font-size="14">does the work</text>
41
+
42
+ <rect x="890" y="220" width="150" height="92" rx="20" fill="#0EA5E9"/>
43
+ <text x="965" y="258" fill="white" text-anchor="middle">TTS</text>
44
+ <text x="965" y="284" fill="#E0F2FE" text-anchor="middle" font-size="14">chunked playback</text>
45
+ </g>
46
+
47
+ <g stroke="#94A3B8" stroke-width="4" stroke-linecap="round">
48
+ <path d="M266 266H296"/>
49
+ <path d="M461 266H491"/>
50
+ <path d="M656 266H686"/>
51
+ <path d="M851 266H881"/>
52
+ </g>
53
+ <g fill="#94A3B8">
54
+ <path d="M296 266l-10-7v14l10-7z"/>
55
+ <path d="M491 266l-10-7v14l10-7z"/>
56
+ <path d="M686 266l-10-7v14l10-7z"/>
57
+ <path d="M881 266l-10-7v14l10-7z"/>
58
+ </g>
59
+
60
+ <rect x="150" y="360" width="900" height="54" rx="17" fill="#020617" stroke="#1F2937"/>
61
+ <text x="182" y="394" fill="#A7F3D0" font-family="SFMono-Regular, ui-monospace, monospace" font-size="18">$ vc language ko &amp;&amp; vc instance start my-project</text>
62
+ <text x="1045" y="394" fill="#64748B" text-anchor="end" font-family="Inter, ui-sans-serif, system-ui" font-size="15">hands-free coding loop</text>
63
+ </svg>
package/docs/i18n/README.es.md ADDED
@@ -0,0 +1,121 @@
1
+ # VerbalCoding
2
+
3
+ <p align="center">
4
+ <strong>Habla con tus agentes de programación CLI por voz en Discord, como en una llamada.</strong>
5
+ </p>
6
+
7
+ <p align="center">
8
+ <a href="../../README.md">English</a> ·
9
+ <a href="README.ko.md">한국어</a> ·
10
+ <a href="README.ja.md">日本語</a> ·
11
+ <a href="README.zh.md">中文</a> ·
12
+ <a href="README.es.md">Español</a> ·
13
+ <a href="README.fr.md">Français</a> ·
14
+ <a href="README.ru.md">Русский</a>
15
+ </p>
16
+
17
+ <p align="center">
18
+ <img alt="Node.js" src="https://img.shields.io/badge/Node.js-20%2B-339933?logo=node.js&logoColor=white">
19
+ <img alt="Discord" src="https://img.shields.io/badge/Discord-voice%20bridge-5865F2?logo=discord&logoColor=white">
20
+ <img alt="STT" src="https://img.shields.io/badge/STT-whisper.cpp-7C3AED">
21
+ <img alt="TTS" src="https://img.shields.io/badge/TTS-Edge%20%7C%20OpenVoice%20%7C%20Supertonic%20%7C%20SpeechSwift-0EA5E9">
22
+ </p>
23
+
24
+ <p align="center">
25
+ <img src="../assets/figures/verbalcoding-flow.svg" alt="VerbalCoding voice-to-agent flow" width="860">
26
+ </p>
27
+
28
+ ## Why
29
+
30
+ VerbalCoding convierte un canal de voz de Discord en una superficie manos libres para agentes de programación. Di una petición, deja que el agente CLI trabaje y escucha una respuesta concisa, con transcripciones, eventos de progreso y protecciones para no leer código o logs interminables.
31
+
32
+ ## Puntos clave
33
+
34
+ | Qué ofrece | Por qué importa |
35
+ |---|---|
36
+ | Control por voz primero | Controla Hermes Agent, Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw o cualquier CLI propia con la voz. |
37
+ | Bucle de voz local-first | Voz de Discord → STT `whisper.cpp` → agente → reproducción TTS por fragmentos. |
38
+ | Contexto compartido voz + texto | Los turnos de voz y `!ask` pueden reutilizar la misma sesión del agente compatible. |
39
+ | Interrupciones y sensibilidad | Interrumpe la reproducción de forma natural y cambia entre sensibilidad normal o conservadora. |
40
+ | Preajustes multilingües | `vc language ko/en/auto` cambia STT, idioma de progreso y voz TTS a la vez. |
41
+ | Aislamiento por proyecto | Un bot, perfil Hermes, sesión, memoria y logs por sala/proyecto. |
42
+
43
+ ## Inicio rápido
44
+
45
+ ```bash
46
+ git clone git@github.com:ca1773130n/VerbalCoding.git
47
+ cd VerbalCoding
48
+ ./scripts/install.sh
49
+ vc doctor
50
+ ./run.sh
51
+ ```
52
+
53
+ ## Cómo funciona
54
+
55
+ ```mermaid
56
+ flowchart LR
57
+ A[Discord voice] --> B["@discordjs/voice"]
58
+ B --> C[PCM cleanup + gates]
59
+ C --> D["whisper.cpp STT"]
60
+ D --> E["CLI agent adapter"]
61
+ E --> F["Concise answer"]
62
+ F --> G["Chunked TTS"]
63
+ G --> H["Discord playback"]
64
+ ```
65
+
66
+ ## Backends de agentes compatibles
67
+
68
+ | Backend | Default command | Session support |
69
+ |---|---:|---|
70
+ | Hermes Agent | `hermes chat -Q -q` | Resume, verbose progress, cancellation, final-answer recovery |
71
+ | Claude Code | `claude -p` | CLI session file support through adapter defaults |
72
+ | Codex CLI | `codex exec` | CLI session file support through adapter defaults |
73
+ | Gemini CLI | `gemini -p` | CLI session file support through adapter defaults |
74
+ | OpenCode | `opencode run` | CLI session file support through adapter defaults |
75
+ | OpenClaw | `openclaw run` | CLI session file support through adapter defaults |
76
+ | Custom | `AGENT_COMMAND` | Bring your own non-interactive command |
77
+
78
+ ## Aprende más
79
+
80
+ | Guide | What you get |
81
+ |---|---|
82
+ | [Fresh Install](../FRESH_INSTALL.md) | Instalación desde cero, descarga del modelo y primera ejecución |
83
+ | [Usage Guide](../USAGE.md) | Comandos CLI, comandos de Discord, progreso y métricas de latencia |
84
+ | [Configuration](../CONFIGURATION.md) | .env, backends de agente, MCP, TTS y notas operativas |
85
+ | [Multi-Instance](../MULTI_INSTANCE.md) | Una sala de voz persistente por proyecto |
86
+ | [Release Notes](../RELEASE.md) | Capacidades actuales y checklist previo al lanzamiento |
87
+
88
+ ## Mapa rápido de comandos
89
+
90
+ ```bash
91
+ vc status
92
+ vc language ko|en|auto
93
+ vc bot invite CLIENT_ID
94
+ vc instance setup NAME
95
+ vc instance start NAME
96
+ vc doctor
97
+ ```
98
+
99
+ ## Requisitos
100
+
101
+ | Layer | Default |
102
+ |---|---|
103
+ | Runtime | Node.js 20+, npm |
104
+ | Audio | `ffmpeg` |
105
+ | STT | `whisper.cpp` / `whisper-cli` |
106
+ | Discord | Bot token, Message Content intent, voice permissions |
107
+ | Agent | At least one authenticated CLI harness, Hermes Agent by default |
108
+ | Platform focus | macOS / Apple Silicon currently gets the most testing |
109
+
110
+ ## Contribuir
111
+
112
+ ```bash
113
+ node --check app-node/main.mjs
114
+ npm test
115
+ bash -n run.sh scripts/install.sh
116
+ vc doctor
117
+ ```
118
+
119
+ ## Estado
120
+
121
+ VerbalCoding is public-release oriented but still early. Demo video/GIF, broader Linux notes, and a formal license file are still TODOs.
package/docs/i18n/README.fr.md ADDED
@@ -0,0 +1,121 @@
1
+ # VerbalCoding
2
+
3
+ <p align="center">
4
+ <strong>Pilotez vos agents de code CLI à la voix dans Discord, comme au téléphone.</strong>
5
+ </p>
6
+
7
+ <p align="center">
8
+ <a href="../../README.md">English</a> ·
9
+ <a href="README.ko.md">한국어</a> ·
10
+ <a href="README.ja.md">日本語</a> ·
11
+ <a href="README.zh.md">中文</a> ·
12
+ <a href="README.es.md">Español</a> ·
13
+ <a href="README.fr.md">Français</a> ·
14
+ <a href="README.ru.md">Русский</a>
15
+ </p>
16
+
17
+ <p align="center">
18
+ <img alt="Node.js" src="https://img.shields.io/badge/Node.js-20%2B-339933?logo=node.js&logoColor=white">
19
+ <img alt="Discord" src="https://img.shields.io/badge/Discord-voice%20bridge-5865F2?logo=discord&logoColor=white">
20
+ <img alt="STT" src="https://img.shields.io/badge/STT-whisper.cpp-7C3AED">
21
+ <img alt="TTS" src="https://img.shields.io/badge/TTS-Edge%20%7C%20OpenVoice%20%7C%20Supertonic%20%7C%20SpeechSwift-0EA5E9">
22
+ </p>
23
+
24
+ <p align="center">
25
+ <img src="../assets/figures/verbalcoding-flow.svg" alt="VerbalCoding voice-to-agent flow" width="860">
26
+ </p>
27
+
28
+ ## Why
29
+
30
+ VerbalCoding transforme un salon vocal Discord en interface mains libres pour agents de code. Dictez une demande, laissez le CLI travailler, puis écoutez une réponse concise — avec transcription texte, événements de progression et garde-fous pour éviter de lire de longs blocs de code ou logs.
31
+
32
+ ## Points forts
33
+
34
+ | Fonction | Pourquoi c’est utile |
35
+ |---|---|
36
+ | Contrôle vocal d’abord | Pilotez Hermes Agent, Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw ou un CLI personnalisé à la voix. |
37
+ | Boucle vocale locale | Voix Discord → STT `whisper.cpp` → agent → lecture TTS par segments. |
38
+ | Contexte partagé voix + texte | Les tours vocaux et `!ask` peuvent réutiliser la même session d’agent compatible. |
39
+ | Interruption et sensibilité | Interrompez naturellement la lecture et basculez entre sensibilité normale ou conservatrice. |
40
+ | Préréglages vocaux multilingues | `vc language ko/en/auto` change ensemble STT, langue de progression et voix TTS. |
41
+ | Isolation par projet | Un bot, profil Hermes, session, mémoire et logs par salon/projet. |
42
+
43
+ ## Démarrage rapide
44
+
45
+ ```bash
46
+ git clone git@github.com:ca1773130n/VerbalCoding.git
47
+ cd VerbalCoding
48
+ ./scripts/install.sh
49
+ vc doctor
50
+ ./run.sh
51
+ ```
52
+
53
+ ## Fonctionnement
54
+
55
+ ```mermaid
56
+ flowchart LR
57
+ A[Discord voice] --> B["@discordjs/voice"]
58
+ B --> C[PCM cleanup + gates]
59
+ C --> D["whisper.cpp STT"]
60
+ D --> E["CLI agent adapter"]
61
+ E --> F["Concise answer"]
62
+ F --> G["Chunked TTS"]
63
+ G --> H["Discord playback"]
64
+ ```
65
+
66
+ ## Backends d’agents pris en charge
67
+
68
+ | Backend | Default command | Session support |
69
+ |---|---:|---|
70
+ | Hermes Agent | `hermes chat -Q -q` | Resume, verbose progress, cancellation, final-answer recovery |
71
+ | Claude Code | `claude -p` | CLI session file support through adapter defaults |
72
+ | Codex CLI | `codex exec` | CLI session file support through adapter defaults |
73
+ | Gemini CLI | `gemini -p` | CLI session file support through adapter defaults |
74
+ | OpenCode | `opencode run` | CLI session file support through adapter defaults |
75
+ | OpenClaw | `openclaw run` | CLI session file support through adapter defaults |
76
+ | Custom | `AGENT_COMMAND` | Bring your own non-interactive command |
77
+
78
+ ## En savoir plus
79
+
80
+ | Guide | What you get |
81
+ |---|---|
82
+ | [Fresh Install](../FRESH_INSTALL.md) | Installation propre, téléchargement du modèle, premier lancement |
83
+ | [Usage Guide](../USAGE.md) | Commandes CLI, commandes Discord, progression, métriques de latence |
84
+ | [Configuration](../CONFIGURATION.md) | .env, backends agent, MCP, TTS et notes d’exploitation |
85
+ | [Multi-Instance](../MULTI_INSTANCE.md) | Un salon vocal Discord permanent par projet |
86
+ | [Release Notes](../RELEASE.md) | Fonctionnalités actuelles et checklist pré-release |
87
+
88
+ ## Mini carte des commandes
89
+
90
+ ```bash
91
+ vc status
92
+ vc language ko|en|auto
93
+ vc bot invite CLIENT_ID
94
+ vc instance setup NAME
95
+ vc instance start NAME
96
+ vc doctor
97
+ ```
98
+
99
+ ## Prérequis
100
+
101
+ | Layer | Default |
102
+ |---|---|
103
+ | Runtime | Node.js 20+, npm |
104
+ | Audio | `ffmpeg` |
105
+ | STT | `whisper.cpp` / `whisper-cli` |
106
+ | Discord | Bot token, Message Content intent, voice permissions |
107
+ | Agent | At least one authenticated CLI harness, Hermes Agent by default |
108
+ | Platform focus | macOS / Apple Silicon currently gets the most testing |
109
+
110
+ ## Contribuer
111
+
112
+ ```bash
113
+ node --check app-node/main.mjs
114
+ npm test
115
+ bash -n run.sh scripts/install.sh
116
+ vc doctor
117
+ ```
118
+
119
+ ## Statut
120
+
121
+ VerbalCoding is public-release oriented but still early. Demo video/GIF, broader Linux notes, and a formal license file are still TODOs.
package/docs/i18n/README.ja.md ADDED
@@ -0,0 +1,121 @@
1
+ # VerbalCoding
2
+
3
+ <p align="center">
4
+ <strong>Discord音声でCLIコーディングエージェントと通話するように作業できます。</strong>
5
+ </p>
6
+
7
+ <p align="center">
8
+ <a href="../../README.md">English</a> ·
9
+ <a href="README.ko.md">한국어</a> ·
10
+ <a href="README.ja.md">日本語</a> ·
11
+ <a href="README.zh.md">中文</a> ·
12
+ <a href="README.es.md">Español</a> ·
13
+ <a href="README.fr.md">Français</a> ·
14
+ <a href="README.ru.md">Русский</a>
15
+ </p>
16
+
17
+ <p align="center">
18
+ <img alt="Node.js" src="https://img.shields.io/badge/Node.js-20%2B-339933?logo=node.js&logoColor=white">
19
+ <img alt="Discord" src="https://img.shields.io/badge/Discord-voice%20bridge-5865F2?logo=discord&logoColor=white">
20
+ <img alt="STT" src="https://img.shields.io/badge/STT-whisper.cpp-7C3AED">
21
+ <img alt="TTS" src="https://img.shields.io/badge/TTS-Edge%20%7C%20OpenVoice%20%7C%20Supertonic%20%7C%20SpeechSwift-0EA5E9">
22
+ </p>
23
+
24
+ <p align="center">
25
+ <img src="../assets/figures/verbalcoding-flow.svg" alt="VerbalCoding voice-to-agent flow" width="860">
26
+ </p>
27
+
28
+ ## Why
29
+
30
+ VerbalCodingはDiscordの音声チャンネルを、コーディングエージェントのハンズフリー操作面に変えます。声で依頼し、CLIエージェントに作業させ、要点だけを音声で受け取れます。テキスト記録、進捗イベント、コードやログを読み上げすぎないガードも備えています。
31
+
32
+ ## ハイライト
33
+
34
+ | できること | うれしい理由 |
35
+ |---|---|
36
+ | 音声ファーストのAgent操作 | Hermes Agent、Claude Code、Codex、Gemini CLI、OpenCode、OpenClaw、カスタムCLIを声で操作できます。 |
37
+ | ローカル優先の音声ループ | Discord音声キャプチャ → `whisper.cpp` STT → Agent → 分割TTS再生。 |
38
+ | 音声とテキストの共有コンテキスト | 対応Agentでは音声ターンと`!ask`テキストコマンドが同じセッションを再利用できます。 |
39
+ | 割り込みと感度モード | 再生中に自然に割り込み、通常/保守的な感度を切り替えられます。 |
40
+ | 多言語音声プリセット | `vc language ko/en/auto`でSTT、進捗言語、TTS音声をまとめて変更できます。 |
41
+ | プロジェクト別マルチルーム分離 | プロジェクトごとにBot、Hermesプロファイル、セッション、メモリ、ログを分離します。 |
42
+
43
+ ## クイックスタート
44
+
45
+ ```bash
46
+ git clone git@github.com:ca1773130n/VerbalCoding.git
47
+ cd VerbalCoding
48
+ ./scripts/install.sh
49
+ vc doctor
50
+ ./run.sh
51
+ ```
52
+
53
+ ## 仕組み
54
+
55
+ ```mermaid
56
+ flowchart LR
57
+ A[Discord voice] --> B["@discordjs/voice"]
58
+ B --> C[PCM cleanup + gates]
59
+ C --> D["whisper.cpp STT"]
60
+ D --> E["CLI agent adapter"]
61
+ E --> F["Concise answer"]
62
+ F --> G["Chunked TTS"]
63
+ G --> H["Discord playback"]
64
+ ```
65
+
66
+ ## 対応エージェントバックエンド
67
+
68
+ | Backend | Default command | Session support |
69
+ |---|---:|---|
70
+ | Hermes Agent | `hermes chat -Q -q` | Resume, verbose progress, cancellation, final-answer recovery |
71
+ | Claude Code | `claude -p` | CLI session file support through adapter defaults |
72
+ | Codex CLI | `codex exec` | CLI session file support through adapter defaults |
73
+ | Gemini CLI | `gemini -p` | CLI session file support through adapter defaults |
74
+ | OpenCode | `opencode run` | CLI session file support through adapter defaults |
75
+ | OpenClaw | `openclaw run` | CLI session file support through adapter defaults |
76
+ | Custom | `AGENT_COMMAND` | Bring your own non-interactive command |
77
+
78
+ ## 詳しく見る
79
+
80
+ | Guide | What you get |
81
+ |---|---|
82
+ | [Fresh Install](../FRESH_INSTALL.md) | クリーンなクローンからのセットアップ、モデル取得、初回起動 |
83
+ | [Usage Guide](../USAGE.md) | CLIコマンド、Discordコマンド、進捗モード、レイテンシ指標 |
84
+ | [Configuration](../CONFIGURATION.md) | .env、エージェントバックエンド、MCP、TTSバックエンド、運用メモ |
85
+ | [Multi-Instance](../MULTI_INSTANCE.md) | プロジェクトごとに常駐Discord音声ルームを用意 |
86
+ | [Release Notes](../RELEASE.md) | 現在の機能とリリース前チェックリスト |
87
+
88
+ ## 小さなコマンド表
89
+
90
+ ```bash
91
+ vc status
92
+ vc language ko|en|auto
93
+ vc bot invite CLIENT_ID
94
+ vc instance setup NAME
95
+ vc instance start NAME
96
+ vc doctor
97
+ ```
98
+
99
+ ## 要件
100
+
101
+ | Layer | Default |
102
+ |---|---|
103
+ | Runtime | Node.js 20+, npm |
104
+ | Audio | `ffmpeg` |
105
+ | STT | `whisper.cpp` / `whisper-cli` |
106
+ | Discord | Bot token, Message Content intent, voice permissions |
107
+ | Agent | At least one authenticated CLI harness, Hermes Agent by default |
108
+ | Platform focus | macOS / Apple Silicon currently gets the most testing |
109
+
110
+ ## コントリビュート
111
+
112
+ ```bash
113
+ node --check app-node/main.mjs
114
+ npm test
115
+ bash -n run.sh scripts/install.sh
116
+ vc doctor
117
+ ```
118
+
119
+ ## ステータス
120
+
121
+ VerbalCoding is public-release oriented but still early. Demo video/GIF, broader Linux notes, and a formal license file are still TODOs.
package/docs/i18n/README.ko.md ADDED
@@ -0,0 +1,121 @@
1
+ # VerbalCoding
2
+
3
+ <p align="center">
4
+ <strong>Discord 음성으로 CLI 코딩 에이전트와 통화하듯 작업하세요.</strong>
5
+ </p>
6
+
7
+ <p align="center">
8
+ <a href="../../README.md">English</a> ·
9
+ <a href="README.ko.md">한국어</a> ·
10
+ <a href="README.ja.md">日本語</a> ·
11
+ <a href="README.zh.md">中文</a> ·
12
+ <a href="README.es.md">Español</a> ·
13
+ <a href="README.fr.md">Français</a> ·
14
+ <a href="README.ru.md">Русский</a>
15
+ </p>
16
+
17
+ <p align="center">
18
+ <img alt="Node.js" src="https://img.shields.io/badge/Node.js-20%2B-339933?logo=node.js&logoColor=white">
19
+ <img alt="Discord" src="https://img.shields.io/badge/Discord-voice%20bridge-5865F2?logo=discord&logoColor=white">
20
+ <img alt="STT" src="https://img.shields.io/badge/STT-whisper.cpp-7C3AED">
21
+ <img alt="TTS" src="https://img.shields.io/badge/TTS-Edge%20%7C%20OpenVoice%20%7C%20Supertonic%20%7C%20SpeechSwift-0EA5E9">
22
+ </p>
23
+
24
+ <p align="center">
25
+ <img src="../assets/figures/verbalcoding-flow.svg" alt="VerbalCoding voice-to-agent flow" width="860">
26
+ </p>
27
+
28
+ ## Why
29
+
30
+ VerbalCoding은 Discord 음성 채널을 코딩 에이전트용 핸즈프리 조작면으로 바꿉니다. 말로 요청하고, CLI 에이전트가 작업하게 두고, 핵심 답변을 음성으로 다시 들을 수 있습니다 — 텍스트 기록, 진행 이벤트, 코드/로그 낭독 방지 장치까지 함께 제공합니다.
31
+
32
+ ## 핵심 기능
33
+
34
+ | 제공 기능 | 좋은 이유 |
35
+ |---|---|
36
+ | 음성 우선 에이전트 제어 | Hermes Agent, Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw 또는 커스텀 CLI를 말로 제어합니다. |
37
+ | 로컬 우선 음성 루프 | Discord 음성 캡처 → `whisper.cpp` STT → 에이전트 → 분할 TTS 재생. |
38
+ | 음성 + 텍스트 컨텍스트 공유 | 지원되는 에이전트에서는 음성 턴과 `!ask` 텍스트 명령이 같은 세션을 재사용합니다. |
39
+ | 바지인과 감도 모드 | 재생 중 자연스럽게 끼어들고, 일반/보수 감도 모드를 전환합니다. |
40
+ | 다국어 음성 프리셋 | `vc language ko/en/auto`로 STT, 진행 언어, TTS 음성을 함께 바꿉니다. |
41
+ | 프로젝트별 멀티룸 격리 | 프로젝트 방마다 별도 봇과 Hermes 프로필, 세션, 메모리, 로그를 둡니다. |
42
+
43
+ ## 빠른 시작
44
+
45
+ ```bash
46
+ git clone git@github.com:ca1773130n/VerbalCoding.git
47
+ cd VerbalCoding
48
+ ./scripts/install.sh
49
+ vc doctor
50
+ ./run.sh
51
+ ```
52
+
53
+ ## 동작 방식
54
+
55
+ ```mermaid
56
+ flowchart LR
57
+ A[Discord voice] --> B["@discordjs/voice"]
58
+ B --> C[PCM cleanup + gates]
59
+ C --> D["whisper.cpp STT"]
60
+ D --> E["CLI agent adapter"]
61
+ E --> F["Concise answer"]
62
+ F --> G["Chunked TTS"]
63
+ G --> H["Discord playback"]
64
+ ```
65
+
66
+ ## 지원 에이전트 백엔드
67
+
68
+ | Backend | Default command | Session support |
69
+ |---|---:|---|
70
+ | Hermes Agent | `hermes chat -Q -q` | Resume, verbose progress, cancellation, final-answer recovery |
71
+ | Claude Code | `claude -p` | CLI session file support through adapter defaults |
72
+ | Codex CLI | `codex exec` | CLI session file support through adapter defaults |
73
+ | Gemini CLI | `gemini -p` | CLI session file support through adapter defaults |
74
+ | OpenCode | `opencode run` | CLI session file support through adapter defaults |
75
+ | OpenClaw | `openclaw run` | CLI session file support through adapter defaults |
76
+ | Custom | `AGENT_COMMAND` | Bring your own non-interactive command |
77
+
78
+ ## 더 알아보기
79
+
80
+ | Guide | What you get |
81
+ |---|---|
82
+ | [Fresh Install](../FRESH_INSTALL.md) | 클린 클론 설치, 모델 다운로드, 첫 실행 |
83
+ | [Usage Guide](../USAGE.md) | CLI 명령, Discord 명령, 진행 모드, 지연 시간 지표 |
84
+ | [Configuration](../CONFIGURATION.md) | .env, 에이전트 백엔드, MCP, TTS 백엔드, 운영 노트 |
85
+ | [Multi-Instance](../MULTI_INSTANCE.md) | 프로젝트마다 영구 Discord 음성방 하나씩 |
86
+ | [Release Notes](../RELEASE.md) | 현재 기능과 릴리스 전 체크리스트 |
87
+
88
+ ## 작은 명령 지도
89
+
90
+ ```bash
91
+ vc status
92
+ vc language ko|en|auto
93
+ vc bot invite CLIENT_ID
94
+ vc instance setup NAME
95
+ vc instance start NAME
96
+ vc doctor
97
+ ```
98
+
99
+ ## 요구 사항
100
+
101
+ | Layer | Default |
102
+ |---|---|
103
+ | Runtime | Node.js 20+, npm |
104
+ | Audio | `ffmpeg` |
105
+ | STT | `whisper.cpp` / `whisper-cli` |
106
+ | Discord | Bot token, Message Content intent, voice permissions |
107
+ | Agent | At least one authenticated CLI harness, Hermes Agent by default |
108
+ | Platform focus | macOS / Apple Silicon currently gets the most testing |
109
+
110
+ ## 기여
111
+
112
+ ```bash
113
+ node --check app-node/main.mjs
114
+ npm test
115
+ bash -n run.sh scripts/install.sh
116
+ vc doctor
117
+ ```
118
+
119
+ ## 상태
120
+
121
+ VerbalCoding is public-release oriented but still early. Demo video/GIF, broader Linux notes, and a formal license file are still TODOs.