verbalcoding 0.2.11 → 0.2.13
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.env.example +98 -2
- package/README.es.md +134 -0
- package/README.fr.md +134 -0
- package/README.ja.md +134 -0
- package/README.ko.md +134 -0
- package/README.md +118 -74
- package/README.ru.md +134 -0
- package/README.zh.md +133 -0
- package/app-node/agent_adapters.mjs +37 -5
- package/app-node/agent_adapters.test.mjs +27 -1
- package/app-node/agent_detect.mjs +73 -0
- package/app-node/agent_detect.test.mjs +77 -0
- package/app-node/agent_routing.mjs +148 -0
- package/app-node/agent_routing.test.mjs +138 -0
- package/app-node/agent_turn.mjs +86 -0
- package/app-node/agent_turn.test.mjs +109 -0
- package/app-node/bridge_context.mjs +73 -0
- package/app-node/bridge_context.test.mjs +54 -0
- package/app-node/bridge_state.mjs +4 -0
- package/app-node/bridge_wireup.test.mjs +462 -0
- package/app-node/cli_install.test.mjs +31 -0
- package/app-node/cross_agent_routing.test.mjs +78 -0
- package/app-node/discord_command_router.mjs +204 -0
- package/app-node/discord_command_router.test.mjs +311 -0
- package/app-node/discord_voice_setup.mjs +251 -0
- package/app-node/discord_voice_setup.test.mjs +86 -0
- package/app-node/hermes_profiles.test.mjs +12 -1
- package/app-node/install_config.mjs +113 -3
- package/app-node/install_config.test.mjs +8 -0
- package/app-node/instance_doctor.test.mjs +9 -0
- package/app-node/instances.test.mjs +8 -1
- package/app-node/main.mjs +513 -1058
- package/app-node/mcp_tools.test.mjs +7 -0
- package/app-node/notification_handler.mjs +89 -0
- package/app-node/notification_handler.test.mjs +187 -0
- package/app-node/notify.mjs +73 -0
- package/app-node/notify.test.mjs +68 -0
- package/app-node/plan_dispatcher.mjs +215 -0
- package/app-node/plan_dispatcher.test.mjs +101 -0
- package/app-node/plan_mode.mjs +203 -0
- package/app-node/plan_mode.test.mjs +231 -0
- package/app-node/progress_handler.mjs +220 -0
- package/app-node/progress_handler.test.mjs +193 -0
- package/app-node/progress_speech.mjs +54 -32
- package/app-node/progress_speech.test.mjs +12 -3
- package/app-node/project_sessions.mjs +5 -2
- package/app-node/project_sessions.test.mjs +7 -0
- package/app-node/research_mode.mjs +282 -0
- package/app-node/research_mode.test.mjs +264 -0
- package/app-node/restart_notice.mjs +3 -0
- package/app-node/restart_notice.test.mjs +11 -0
- package/app-node/session_ontology.mjs +271 -0
- package/app-node/session_ontology.test.mjs +130 -0
- package/app-node/smart_progress.mjs +94 -0
- package/app-node/smart_progress.test.mjs +66 -0
- package/app-node/stream_sentencer.mjs +91 -0
- package/app-node/stream_sentencer.test.mjs +129 -0
- package/app-node/streaming_tts_queue.mjs +52 -0
- package/app-node/streaming_tts_queue.test.mjs +64 -0
- package/app-node/stt_whisper.mjs +24 -0
- package/app-node/stt_whisper.test.mjs +32 -0
- package/app-node/text_routing.mjs +22 -0
- package/app-node/text_routing.test.mjs +23 -1
- package/app-node/tts_backends.mjs +537 -3
- package/app-node/tts_backends.test.mjs +454 -0
- package/app-node/tts_player.mjs +164 -0
- package/app-node/tts_player.test.mjs +202 -0
- package/app-node/tts_runtime.mjs +134 -0
- package/app-node/tts_runtime.test.mjs +89 -0
- package/app-node/tts_settings.mjs +150 -3
- package/app-node/tts_settings.test.mjs +204 -0
- package/app-node/tts_voice_config.mjs +136 -2
- package/app-node/tts_voice_config.test.mjs +94 -0
- package/app-node/utterance_router.mjs +216 -0
- package/app-node/utterance_router.test.mjs +236 -0
- package/app-node/voice_autojoin.mjs +37 -0
- package/app-node/voice_autojoin.test.mjs +59 -0
- package/app-node/voice_io.mjs +272 -0
- package/app-node/voice_io.test.mjs +102 -0
- package/app-node/voice_turn_runner.mjs +449 -0
- package/app-node/voice_turn_runner.test.mjs +289 -0
- package/docs/CONFIGURATION.md +79 -96
- package/docs/FRESH_INSTALL.md +105 -63
- package/docs/HARNESSES.md +58 -0
- package/docs/HARNESS_AIDER.md +50 -0
- package/docs/HARNESS_CLAUDE.md +56 -0
- package/docs/HARNESS_CODEX.md +56 -0
- package/docs/HARNESS_CURSOR.md +45 -0
- package/docs/HARNESS_GEMINI.md +45 -0
- package/docs/HARNESS_HERMES.md +57 -0
- package/docs/HARNESS_OPENCLAW.md +44 -0
- package/docs/HARNESS_OPENCODE.md +44 -0
- package/docs/HERMES_VOICE.md +65 -0
- package/docs/MULTI_INSTANCE.md +16 -0
- package/docs/README.md +50 -0
- package/docs/RELEASE.md +42 -19
- package/docs/ROADMAP.md +53 -0
- package/docs/TROUBLESHOOTING.md +126 -0
- package/docs/TTS_BACKENDS.md +227 -0
- package/docs/USAGE.md +94 -40
- package/docs/assets/figures/verbalcoding-flow.svg +1 -1
- package/docs/i18n/AGENTS.es.md +34 -0
- package/docs/i18n/AGENTS.fr.md +34 -0
- package/docs/i18n/AGENTS.ja.md +34 -0
- package/docs/i18n/AGENTS.ko.md +34 -0
- package/docs/i18n/AGENTS.ru.md +34 -0
- package/docs/i18n/AGENTS.zh.md +34 -0
- package/docs/i18n/CONFIGURATION.es.md +25 -0
- package/docs/i18n/CONFIGURATION.fr.md +25 -0
- package/docs/i18n/CONFIGURATION.ja.md +25 -0
- package/docs/i18n/CONFIGURATION.ko.md +25 -0
- package/docs/i18n/CONFIGURATION.ru.md +25 -0
- package/docs/i18n/CONFIGURATION.zh.md +25 -0
- package/docs/i18n/FRESH_INSTALL.es.md +27 -2
- package/docs/i18n/FRESH_INSTALL.fr.md +27 -2
- package/docs/i18n/FRESH_INSTALL.ja.md +27 -2
- package/docs/i18n/FRESH_INSTALL.ko.md +27 -2
- package/docs/i18n/FRESH_INSTALL.ru.md +27 -2
- package/docs/i18n/FRESH_INSTALL.zh.md +27 -2
- package/docs/i18n/HARNESSES.es.md +58 -0
- package/docs/i18n/HARNESSES.fr.md +58 -0
- package/docs/i18n/HARNESSES.ja.md +58 -0
- package/docs/i18n/HARNESSES.ko.md +58 -0
- package/docs/i18n/HARNESSES.ru.md +58 -0
- package/docs/i18n/HARNESSES.zh.md +58 -0
- package/docs/i18n/HARNESS_AIDER.es.md +48 -0
- package/docs/i18n/HARNESS_AIDER.fr.md +48 -0
- package/docs/i18n/HARNESS_AIDER.ja.md +50 -0
- package/docs/i18n/HARNESS_AIDER.ko.md +50 -0
- package/docs/i18n/HARNESS_AIDER.ru.md +48 -0
- package/docs/i18n/HARNESS_AIDER.zh.md +48 -0
- package/docs/i18n/HARNESS_CLAUDE.es.md +55 -0
- package/docs/i18n/HARNESS_CLAUDE.fr.md +55 -0
- package/docs/i18n/HARNESS_CLAUDE.ja.md +56 -0
- package/docs/i18n/HARNESS_CLAUDE.ko.md +56 -0
- package/docs/i18n/HARNESS_CLAUDE.ru.md +55 -0
- package/docs/i18n/HARNESS_CLAUDE.zh.md +56 -0
- package/docs/i18n/HARNESS_CODEX.es.md +55 -0
- package/docs/i18n/HARNESS_CODEX.fr.md +55 -0
- package/docs/i18n/HARNESS_CODEX.ja.md +56 -0
- package/docs/i18n/HARNESS_CODEX.ko.md +56 -0
- package/docs/i18n/HARNESS_CODEX.ru.md +55 -0
- package/docs/i18n/HARNESS_CODEX.zh.md +56 -0
- package/docs/i18n/HARNESS_CURSOR.es.md +42 -0
- package/docs/i18n/HARNESS_CURSOR.fr.md +42 -0
- package/docs/i18n/HARNESS_CURSOR.ja.md +45 -0
- package/docs/i18n/HARNESS_CURSOR.ko.md +45 -0
- package/docs/i18n/HARNESS_CURSOR.ru.md +42 -0
- package/docs/i18n/HARNESS_CURSOR.zh.md +42 -0
- package/docs/i18n/HARNESS_GEMINI.es.md +44 -0
- package/docs/i18n/HARNESS_GEMINI.fr.md +44 -0
- package/docs/i18n/HARNESS_GEMINI.ja.md +45 -0
- package/docs/i18n/HARNESS_GEMINI.ko.md +45 -0
- package/docs/i18n/HARNESS_GEMINI.ru.md +44 -0
- package/docs/i18n/HARNESS_GEMINI.zh.md +45 -0
- package/docs/i18n/HARNESS_HERMES.es.md +54 -0
- package/docs/i18n/HARNESS_HERMES.fr.md +54 -0
- package/docs/i18n/HARNESS_HERMES.ja.md +57 -0
- package/docs/i18n/HARNESS_HERMES.ko.md +57 -0
- package/docs/i18n/HARNESS_HERMES.ru.md +54 -0
- package/docs/i18n/HARNESS_HERMES.zh.md +57 -0
- package/docs/i18n/HARNESS_OPENCLAW.es.md +41 -0
- package/docs/i18n/HARNESS_OPENCLAW.fr.md +41 -0
- package/docs/i18n/HARNESS_OPENCLAW.ja.md +44 -0
- package/docs/i18n/HARNESS_OPENCLAW.ko.md +44 -0
- package/docs/i18n/HARNESS_OPENCLAW.ru.md +41 -0
- package/docs/i18n/HARNESS_OPENCLAW.zh.md +42 -0
- package/docs/i18n/HARNESS_OPENCODE.es.md +41 -0
- package/docs/i18n/HARNESS_OPENCODE.fr.md +41 -0
- package/docs/i18n/HARNESS_OPENCODE.ja.md +44 -0
- package/docs/i18n/HARNESS_OPENCODE.ko.md +44 -0
- package/docs/i18n/HARNESS_OPENCODE.ru.md +41 -0
- package/docs/i18n/HARNESS_OPENCODE.zh.md +44 -0
- package/docs/i18n/HERMES_VOICE.es.md +46 -0
- package/docs/i18n/HERMES_VOICE.fr.md +46 -0
- package/docs/i18n/HERMES_VOICE.ja.md +46 -0
- package/docs/i18n/HERMES_VOICE.ko.md +65 -0
- package/docs/i18n/HERMES_VOICE.ru.md +46 -0
- package/docs/i18n/HERMES_VOICE.zh.md +46 -0
- package/docs/i18n/MULTI_INSTANCE.es.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.fr.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.ja.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.ko.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.ru.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.zh.md +25 -0
- package/docs/i18n/README.es.md +20 -134
- package/docs/i18n/README.fr.md +20 -134
- package/docs/i18n/README.ja.md +20 -134
- package/docs/i18n/README.ko.md +20 -133
- package/docs/i18n/README.ru.md +20 -134
- package/docs/i18n/README.zh.md +20 -133
- package/docs/i18n/RELEASE.es.md +26 -1
- package/docs/i18n/RELEASE.fr.md +26 -1
- package/docs/i18n/RELEASE.ja.md +26 -1
- package/docs/i18n/RELEASE.ko.md +26 -1
- package/docs/i18n/RELEASE.ru.md +26 -1
- package/docs/i18n/RELEASE.zh.md +26 -1
- package/docs/i18n/TROUBLESHOOTING.es.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.fr.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.ja.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.ko.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.ru.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.zh.md +39 -0
- package/docs/i18n/USAGE.es.md +25 -0
- package/docs/i18n/USAGE.fr.md +25 -0
- package/docs/i18n/USAGE.ja.md +25 -0
- package/docs/i18n/USAGE.ko.md +25 -0
- package/docs/i18n/USAGE.ru.md +25 -0
- package/docs/i18n/USAGE.zh.md +25 -0
- package/docs/superpowers/plans/2026-05-13-phase1-streaming-pipeline.md +122 -0
- package/docs/superpowers/plans/2026-05-13-phase10-push-notifications.md +152 -0
- package/docs/superpowers/plans/2026-05-13-phase2-agent-adapters.md +242 -0
- package/docs/superpowers/plans/2026-05-13-phase6-smart-progress.md +172 -0
- package/docs/superpowers/plans/2026-05-13-phase7-voice-plan-mode.md +108 -0
- package/docs/superpowers/plans/2026-05-14-cross-agent-voice-transfer.md +625 -0
- package/docs/superpowers/plans/2026-05-21-audio-overview-narrated-diffs.md +95 -0
- package/docs/superpowers/plans/2026-05-21-autoresearch-ontology.md +83 -0
- package/docs/superpowers/plans/2026-05-21-phase11-push-to-talk-wakeword-v2.md +77 -0
- package/docs/superpowers/plans/2026-05-21-phase12-multi-user-voice.md +147 -0
- package/docs/superpowers/plans/2026-05-21-phase14-verbalbench.md +136 -0
- package/docs/superpowers/plans/2026-05-21-phase15-phone-companion.md +72 -0
- package/integrations/fireredtts2/mlx_llm.py +183 -0
- package/integrations/fireredtts2/synth.py +156 -0
- package/integrations/fireredtts2/synth_mlx.py +196 -0
- package/integrations/mlxaudio/synth.py +74 -0
- package/integrations/neuttsair/synth.py +104 -0
- package/integrations/omnivoice/synth.py +110 -0
- package/package.json +7 -1
- package/scripts/cli.mjs +88 -3
- package/scripts/doctor.mjs +115 -4
- package/scripts/install.mjs +20 -2
- package/scripts/install_fireredtts2.sh +109 -0
- package/scripts/install_mlxaudio.sh +34 -0
- package/scripts/install_mossttsnano.sh +46 -0
- package/scripts/postinstall.mjs +34 -0
package/.env.example
CHANGED
@@ -1,17 +1,44 @@
 # Copy to .env and fill local values. Do not commit .env.
+# Preferred setup commands:
+# vc setup token
+# vc setup channels "General,Team Voice"
 
 DISCORD_BOT_TOKEN=""
+DISCORD_CLIENT_ID=""
 DISCORD_ALLOWED_USERS=""
 AUTO_JOIN_VOICE_CHANNELS="일반,General,general"
 TRANSCRIPT_CHANNEL_ID=""
+VOICE_CONNECT_TIMEOUT_MS="60000"
 
-# Agent harness: hermes, claude-code, claude, codex, gemini, opencode, openclaw, custom
+# Agent harness: hermes, claude-code, claude, codex, gemini, opencode, openclaw, aider, cursor, custom
+# `vc setup` auto-detects which agents are installed and lets you pick.
 AGENT_BACKEND="hermes"
 # AGENT_LABEL="My Harness"
 # AGENT_COMMAND="my-harness run --non-interactive"
+# AIDER_COMMAND="aider --no-pretty --yes-always --message"
+# CURSOR_COMMAND="cursor-agent --print --prompt"
 AGENT_TASK_TIMEOUT_MS="0"
 AGENT_CHAT_TIMEOUT_MS="45000"
 AGENT_VERBOSE_PROGRESS="0" # default off; toggle in Discord with !verbose on/off
+
+# Streaming TTS pipeline: sentence-by-sentence playback while the agent is still writing.
+# First audio plays before the agent finishes. Set to "0" to fall back to whole-reply playback.
+STREAMING_TTS="1"
+
+# Smart progress summarization. When SMART_PROGRESS_API_KEY is set, raw progress events get
+# folded into a single human sentence via a small LLM (Groq OpenAI-compatible API by default).
+# Without an API key it falls back to the existing regex categories.
+# SMART_PROGRESS_API_KEY=""
+# SMART_PROGRESS_BASE_URL="https://api.groq.com/openai/v1"
+# SMART_PROGRESS_MODEL="llama-3.1-8b-instant"
+
+# Push notification handoff for long tasks when the voice channel is empty.
+# Provider: ntfy (free, no account, mobile apps) | pushover | noop.
+# NOTIFY_PROVIDER="ntfy"
+# NTFY_TOPIC="" # pick something unguessable; subscribe with the ntfy app
+# PUSHOVER_USER=""
+# PUSHOVER_TOKEN=""
+NOTIFY_MIN_TASK_MS="60000" # only notify when a task ran at least this long
 LATENCY_LOG_PATH="./.logs/latency.jsonl"
 PROJECT_SESSIONS_FILE="./config/project-sessions.json"
 # Agent workflow helper: off by default. Toggle with `vc restart auto on|off`.
@@ -24,7 +51,7 @@ VOICE_LANGUAGE="ko" # ko | en | auto; controls progress/status language
 WHISPER_CPP_LANGUAGE="ko" # ko | en | auto; auto omits forced whisper language
 STT_LANGUAGE="ko"
 
-TTS_BACKEND="edge" # edge | openvoice | speechswift | supertonic
+TTS_BACKEND="edge" # edge | openvoice | speechswift | supertonic | omnivoice | qwen3tts | mlxaudio | fireredtts2 | mossttsnano | neuttsair
 EDGE_TTS_COMMAND="edge-tts"
 TTS_VOICE_TYPE="korean_female" # edge: korean_male | korean_female | korean_multilingual_male | english_male | english_female
 TTS_VOICE="ko-KR-SunHiNeural"
@@ -66,6 +93,33 @@ OPENVOICE_LANGUAGE="KR"
 OPENVOICE_STYLE="default"
 OPENVOICE_TIMEOUT_MS="90000"
 OPENVOICE_PROGRESS="0" # keep progress prompts fast via Edge unless set to 1
+
+# Optional k2-fsa/OmniVoice TTS backend (600+ language zero-shot TTS / voice cloning).
+# Recommended: create a separate env, install torch/torchaudio/soundfile/omnivoice, and keep progress prompts on Edge.
+OMNIVOICE_PYTHON="./.venv-omnivoice/bin/python"
+OMNIVOICE_MODEL="k2-fsa/OmniVoice"
+OMNIVOICE_DEVICE="mps" # mps on Apple Silicon, cuda:0 on NVIDIA, cpu as fallback
+OMNIVOICE_DTYPE="float16"
+OMNIVOICE_REF_AUDIO="./voice-samples/user-reference.wav"
+OMNIVOICE_REF_TEXT="" # optional transcript of the reference sample
+OMNIVOICE_LANGUAGE="ko"
+OMNIVOICE_SPEAKER="" # optional voice-design attributes when no ref sample is desired
+OMNIVOICE_TIMEOUT_MS="180000"
+OMNIVOICE_PROGRESS="0" # keep progress prompts fast via Edge unless set to 1
+
+# Optional Qwen3-TTS CLI backend via speech-swift `audio speak`.
+# Install speech-swift/audio separately, then set TTS_BACKEND="qwen3tts" or "qwen3".
+QWEN3TTS_COMMAND="audio"
+QWEN3TTS_MODE="custom" # custom | clone | design
+QWEN3TTS_MODEL="customVoice" # customVoice for preset speakers, base/base-8bit for cloning
+QWEN3TTS_LANGUAGE="korean"
+QWEN3TTS_SPEAKER="sohee" # used in custom mode
+QWEN3TTS_INSTRUCT="" # emotion/style instruction; used in custom/design modes
+QWEN3TTS_REF_AUDIO="./voice-samples/user-reference.wav" # used in clone mode
+QWEN3TTS_REF_TEXT="" # optional note for your reference sample
+QWEN3TTS_STREAM="1"
+QWEN3TTS_TIMEOUT_MS="120000"
+QWEN3TTS_PROGRESS="0" # keep progress prompts fast via Edge unless set to 1
 REQUIRE_WAKE_WORD="0"
 MIN_UTTERANCE_SECONDS="1.0"
 # Wait for natural thinking pauses before STT. Lower for faster but more fragmented turns.
@@ -83,3 +137,45 @@ BARGE_IN_CONSERVATIVE_MIN_SECONDS="1.8"
 BARGE_IN_CONSERVATIVE_MIN_MEAN_VOLUME_DB="-27"
 BARGE_IN_CONSERVATIVE_MIN_MAX_VOLUME_DB="-12"
 MAX_DEFERRED_PROCESSING_UTTERANCES="0"
+
+
+# Optional local TTS backends (final answers only by default; progress falls back to Edge)
+# TTS_BACKEND=fireredtts2
+FIREREDTTS2_COMMAND=./.local/bin/fireredtts2
+FIREREDTTS2_PRETRAINED_DIR=pretrained_models/FireRedTTS2
+FIREREDTTS2_DEVICE=auto
+FIREREDTTS2_GEN_TYPE=monologue
+FIREREDTTS2_SPEAKER=S1
+FIREREDTTS2_PROMPT_AUDIO=voice-samples/user-reference.wav
+FIREREDTTS2_PROMPT_TEXT=
+FIREREDTTS2_BF16=0
+FIREREDTTS2_TIMEOUT_MS=180000
+FIREREDTTS2_PROGRESS=0
+
+# TTS_BACKEND=mossttsnano
+MOSSTTSNANO_COMMAND=python3
+MOSSTTSNANO_SCRIPT=vendor/MOSS-TTS-Nano/infer.py
+MOSSTTSNANO_CHECKPOINT=OpenMOSS-Team/MOSS-TTS-Nano
+MOSSTTSNANO_AUDIO_TOKENIZER=
+MOSSTTSNANO_MODE=continuation
+MOSSTTSNANO_DEVICE=auto
+MOSSTTSNANO_DTYPE=auto
+MOSSTTSNANO_PROMPT_AUDIO=voice-samples/user-reference.wav
+MOSSTTSNANO_PROMPT_TEXT=
+MOSSTTSNANO_MAX_NEW_FRAMES=375
+MOSSTTSNANO_TIMEOUT_MS=120000
+MOSSTTSNANO_PROGRESS=0
+
+# TTS_BACKEND=neuttsair # NeuTTS Air is English-only; progress falls back to Edge by default.
+NEUTTSAIR_PYTHON=./.venv-neuttsair/bin/python
+NEUTTSAIR_SCRIPT=integrations/neuttsair/synth.py
+NEUTTSAIR_BACKBONE_REPO=neuphonic/neutts-air-q4-gguf
+NEUTTSAIR_BACKBONE_DEVICE=mps
+NEUTTSAIR_CODEC_REPO=neuphonic/neucodec
+NEUTTSAIR_CODEC_DEVICE=mps
+NEUTTSAIR_REF_AUDIO=voice-samples/user-reference.wav
+NEUTTSAIR_REF_TEXT=
+NEUTTSAIR_LANGUAGE=en
+NEUTTSAIR_SAMPLE_RATE=24000
+NEUTTSAIR_TIMEOUT_MS=120000
+NEUTTSAIR_PROGRESS=0
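Reviewer note on `STREAMING_TTS`: the new `.env.example` comment describes sentence-by-sentence playback while the agent is still writing. The diff shown here does not include the contents of `stream_sentencer.mjs`, so the following is only a minimal sketch of the general idea, assuming a hypothetical `SentenceBuffer` class that is not part of the package. It buffers streamed text and releases each sentence once a terminator followed by whitespace has arrived.

```javascript
// Hypothetical sketch; not the package's stream_sentencer.mjs.
// Buffers streamed agent output and emits sentences as soon as they complete,
// so a TTS queue could start speaking before the full reply exists.
class SentenceBuffer {
  constructor() {
    this.pending = "";
  }

  // Feed one streamed chunk; returns the sentences that this chunk completed.
  push(chunk) {
    this.pending += chunk;
    const sentences = [];
    // Assumption: a sentence ends at ., ! or ? followed by whitespace.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.pending.slice(start, end).trim();
      if (sentence) sentences.push(sentence);
      start = end;
    }
    // Keep the unfinished tail for the next chunk.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Call once the agent has finished to release any trailing text.
  flush() {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [rest] : [];
  }
}
```

With `STREAMING_TTS="0"`, the equivalent behaviour would be to skip `push` and speak the whole reply once. The boundary rule is deliberately naive: abbreviations such as "e.g. " would be split, which a real sentencer would need to handle.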
package/README.es.md
ADDED
@@ -0,0 +1,134 @@
+# VerbalCoding
+
+<p align="center"><strong>Habla con agentes de programación CLI por voz en Discord, como en una llamada.</strong></p>
+
+<p align="center"><a href="./README.md">English</a> · <a href="./README.ko.md">한국어</a> · <a href="./README.ja.md">日本語</a> · <a href="./README.zh.md">中文</a> · <a href="./README.fr.md">Français</a> · <a href="./README.ru.md">Русский</a></p>
+
+<p align="center">
+<img alt="npm" src="https://img.shields.io/npm/v/verbalcoding?color=CB3837&logo=npm&logoColor=white">
+<img alt="Node.js" src="https://img.shields.io/badge/Node.js-20%2B-339933?logo=node.js&logoColor=white">
+<img alt="Discord" src="https://img.shields.io/badge/Discord-voice%20bridge-5865F2?logo=discord&logoColor=white">
+<img alt="STT" src="https://img.shields.io/badge/STT-whisper.cpp-7C3AED">
+<img alt="TTS" src="https://img.shields.io/badge/TTS-Edge%20%7C%20OpenVoice%20%7C%20SpeechSwift-0EA5E9">
+<img alt="License" src="https://img.shields.io/github/license/ca1773130n/VerbalCoding">
+</p>
+
+<p align="center">
+<img src="docs/assets/figures/verbalcoding-flow.svg" alt="VerbalCoding voice-to-agent flow" width="860">
+</p>
+
+## Por qué existe
+
+VerbalCoding convierte una sala de voz de Discord en una cabina manos libres para agentes de programación. Pides algo hablando, dejas trabajar al agente CLI y recibes una respuesta breve por voz con transcripción y eventos de progreso. Los diffs y logs quedan fuera del TTS largo.
+
+> **¿Ya usas Hermes Agent?** Hermes ya trae soporte de canales de voz de Discord con `/voice join` / `/voice channel`: puede unirse al VC actual, transcribir con Whisper y responder por TTS. Para ese bucle básico, VerbalCoding no es obligatorio. VerbalCoding añade una capa de flujo de trabajo: enrutamiento de proyectos/sesiones, contexto compartido de voz+texto, reglas de interrupción, avisos de progreso, presets de idioma, métricas de latencia y cambio de backend CLI más allá de Hermes.
+
+## Qué lo hace distinto
+
+| Capacidad | Por qué importa |
+|---|---|
+| Flujo tipo llamada | Habla, escucha, interrumpe y continúa en el mismo canal de voz de Discord. |
+| Configuración guiada | `vc setup` reúne prerequisites, Discord token/client ID, voice channel, transcript target, backend y TTS settings en un solo flujo. |
+| Bucle de voz local | Discord audio → local `whisper-cli` → selected CLI agent → TTS reply. |
+| Elección de agente | Hermes Agent, Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw, Aider, Cursor CLI o custom command. `vc setup` autodetecta lo que tienes instalado. |
+| Enrutamiento de agente por voz | `"ask Codex what it thinks"` (un turno), `"switch to Aider"` (sticky), `"back to default"` para volver. Si el binario no está instalado, el puente ofrece fallback al agente por defecto. |
+| Más allá de la voz integrada de Hermes | Mantiene el mismo bucle de voz en VC y añade salas de proyecto, contexto compartido con `!ask`, interrupciones afinadas, voz de progreso/estado y control de backends multiagente. |
+| Operación real | Incluye doctor auto-fix, guía Docker UDP, latency metrics, multi-instance rooms y redacted config checks. |
+
+## Inicio rápido
+
+```bash
+npm install -g verbalcoding@latest
+vc setup
+vc doctor
+vc start
+```
+
+`vc setup` es la ruta normal para personas. Mantén abierto Discord Developer Portal mientras introduces bot token, application/client ID, transcript target y voice channel names.
+
+En automatización puedes omitir prompts y completar los datos de Discord después.
+
+```bash
+vc setup --yes
+vc setup token <bot-token> --client-id <discord-client-id>
+vc setup channels "General,Team Voice"
+vc doctor
+```
+
+## Discord en un minuto
+
+1. Crea una application y un bot en Discord Developer Portal.
+2. Activa Message Content privileged intent.
+3. Ejecuta `vc setup` y pega bot token y application/client ID.
+4. Introduce los nombres exactos de los voice channels para auto-join.
+5. Invita el bot con estos comandos.
+
+```bash
+vc bot invite <discord-client-id>
+vc bot invite <discord-client-id> --guild <guild-id>
+```
+
+## Mapa mínimo de comandos
+
+```bash
+vc setup # configuración guiada: prerequisites, Discord, backend, voice
+vc setup --yes # bootstrap/starter config no interactiva
+vc setup token # rotar o añadir Discord bot token/client ID después
+vc setup channels "General,Team Voice" # actualizar auto-join voice channel names
+vc bot invite CLIENT_ID # generar Discord bot invite URL
+vc status # mostrar configuración actual
+vc language ko|en|auto # cambiar language preset
+vc doctor # redacted health check y auto-fixes
+vc start # iniciar bridge por defecto
+vc instance setup NAME # crear project voice bot aislado
+vc instance start NAME # ejecutar ese bot en background
+```
+
+## Más información
+
+| Guía | Qué obtienes |
+|---|---|
+| [Centro de documentación](docs/i18n/README.es.md) | Índice de guías localizadas. |
+| [Fresh Install](docs/i18n/FRESH_INSTALL.es.md) | npm/global setup, configuración de Discord y primera ejecución. |
+| [Usage](docs/i18n/USAGE.es.md) | Comandos CLI, comandos Discord, modos de ejecución y latency. |
+| [Uso por harness](docs/i18n/HARNESSES.es.md) | Instalación, configuración y enrutamiento por voz para Claude Code, Codex, Aider y demás. |
+| [Voz integrada de Hermes vs VerbalCoding](docs/i18n/HERMES_VOICE.es.md) | La voz Discord que Hermes ya ofrece y la diferencia de VerbalCoding. |
+| [Configuration](docs/i18n/CONFIGURATION.es.md) | .env, agent backends, MCP, TTS y operación. |
+| [Troubleshooting](docs/i18n/TROUBLESHOOTING.es.md) | Docker UDP y comprobaciones de token/channel. |
+| [Multi-Instance](docs/i18n/MULTI_INSTANCE.es.md) | Una sala de voz fija por proyecto. |
+
+## Requisitos
+
+| Capa | Predeterminado |
+|---|---|
+| Runtime | Node.js 20+ y npm. |
+| Audio | `ffmpeg` y local `whisper-cli`. |
+| TTS | Edge TTS por defecto; OpenVoice, SpeechSwift/CosyVoice y Supertonic opcionales. |
+| Discord | Bot token, Message Content intent, voice permissions y channel names coincidentes. |
+| Agent | Al menos un CLI harness autenticado; Hermes Agent por defecto. |
+
+## Nota Docker / contenedores
+
+Si los logs muestran `Cannot perform IP discovery - socket closed`, Discord voice UDP está bloqueado. En Linux Docker Compose usa:
+
+```yaml
+services:
+  verbalcoding:
+    network_mode: "host"
+```
+
+No combines `network_mode: "host"` con `ports:`.
+
+## Contribuir
+
+```bash
+node --check app-node/main.mjs
+npm test
+bash -n run.sh scripts/install.sh scripts/bootstrap_prereqs.sh
+npm pack --dry-run
+vc doctor
+```
+
+## Estado
+
+VerbalCoding apunta a publicación pública, pero todavía es temprano. Demo video/GIF, validación Linux más amplia, CI y revisión de seguridad siguen como TODO.
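Reviewer note on voice routing: the README added above lists three routing phrases: `"ask Codex what it thinks"` (one turn), `"switch to Aider"` (sticky) and `"back to default"`, with a fallback to the default agent when a binary is missing. The contents of `agent_routing.mjs` are not shown in this diff, so the sketch below is only an illustration of that behaviour. `KNOWN_AGENTS`, `parseRoutingPhrase`, `createRouter` and the `isInstalled` callback are hypothetical names, and the sketch applies the fallback automatically, whereas the README says the bridge offers it.

```javascript
// Hypothetical sketch; not the package's agent_routing.mjs.
const KNOWN_AGENTS = ["hermes", "claude", "codex", "gemini", "opencode", "openclaw", "aider", "cursor"];

// Classify an utterance as a routing command, or "none" for ordinary speech.
function parseRoutingPhrase(utterance) {
  const text = utterance.trim();
  if (/^back to default\b/i.test(text)) return { kind: "reset" };
  const sticky = /^switch to (\w+)\b/i.exec(text);
  if (sticky && KNOWN_AGENTS.includes(sticky[1].toLowerCase())) {
    return { kind: "sticky", agent: sticky[1].toLowerCase() };
  }
  const oneTurn = /^ask (\w+)\s+(.+)$/i.exec(text);
  if (oneTurn && KNOWN_AGENTS.includes(oneTurn[1].toLowerCase())) {
    return { kind: "one-turn", agent: oneTurn[1].toLowerCase(), prompt: oneTurn[2] };
  }
  return { kind: "none" };
}

// Returns a route(utterance) function that remembers the sticky agent.
// isInstalled(name) is an assumed callback reporting whether the CLI exists.
function createRouter(defaultAgent, isInstalled) {
  let stickyAgent = defaultAgent;
  return function route(utterance) {
    const cmd = parseRoutingPhrase(utterance);
    if (cmd.kind === "reset") {
      stickyAgent = defaultAgent;
      return { agent: stickyAgent, prompt: null, fellBack: false };
    }
    if (cmd.kind === "none") {
      return { agent: stickyAgent, prompt: utterance, fellBack: false };
    }
    const available = isInstalled(cmd.agent);
    const agent = available ? cmd.agent : defaultAgent;
    if (cmd.kind === "sticky") {
      // Only change the sticky agent when the requested CLI is present.
      if (available) stickyAgent = cmd.agent;
      return { agent, prompt: null, fellBack: !available };
    }
    // One-turn: route this prompt only; the sticky agent is unchanged.
    return { agent, prompt: cmd.prompt, fellBack: !available };
  };
}
```

Keeping the phrase parser separate from the stateful router means the parser can be tested with plain strings, while the sticky state lives in a single closure.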
package/README.fr.md
ADDED
|
@@ -0,0 +1,134 @@
|
|
|
1
|
+
# VerbalCoding
|
|
2
|
+
|
|
3
|
+
<p align="center"><strong>Parlez à des agents de code CLI depuis Discord vocal, comme lors d’un appel.</strong></p>
|
|
4
|
+
|
|
5
|
+
<p align="center"><a href="./README.md">English</a> · <a href="./README.ko.md">한국어</a> · <a href="./README.ja.md">日本語</a> · <a href="./README.zh.md">中文</a> · <a href="./README.es.md">Español</a> · <a href="./README.ru.md">Русский</a></p>
|
|
6
|
+
|
|
7
|
+
<p align="center">
|
|
8
|
+
<img alt="npm" src="https://img.shields.io/npm/v/verbalcoding?color=CB3837&logo=npm&logoColor=white">
|
|
9
|
+
<img alt="Node.js" src="https://img.shields.io/badge/Node.js-20%2B-339933?logo=node.js&logoColor=white">
|
|
10
|
+
<img alt="Discord" src="https://img.shields.io/badge/Discord-voice%20bridge-5865F2?logo=discord&logoColor=white">
|
|
11
|
+
<img alt="STT" src="https://img.shields.io/badge/STT-whisper.cpp-7C3AED">
|
|
12
|
+
<img alt="TTS" src="https://img.shields.io/badge/TTS-Edge%20%7C%20OpenVoice%20%7C%20SpeechSwift-0EA5E9">
|
|
13
|
+
<img alt="License" src="https://img.shields.io/github/license/ca1773130n/VerbalCoding">
|
|
14
|
+
</p>
|
|
15
|
+
|
|
16
|
+
<p align="center">
|
|
17
|
+
<img src="docs/assets/figures/verbalcoding-flow.svg" alt="VerbalCoding voice-to-agent flow" width="860">
|
|
18
|
+
</p>
|
|
19
|
+
|
|
20
|
+
## Pourquoi ce projet existe
|
|
21
|
+
|
|
22
|
+
VerbalCoding transforme un salon vocal Discord en poste de pilotage mains libres pour agents de code. Dictez une demande, laissez le CLI travailler, puis recevez une réponse vocale concise avec transcription et progression. Les diffs et logs ne sont pas lus longuement par TTS.
|
|
23
|
+
|
|
24
|
+
> **Vous utilisez déjà Hermes Agent ?** Hermes prend déjà en charge les salons vocaux Discord via `/voice join` / `/voice channel` : il peut rejoindre votre VC, transcrire avec Whisper et répondre en TTS. Pour cette boucle de base, VerbalCoding n’est pas obligatoire. VerbalCoding ajoute une couche de workflow : routage projet/session, contexte voix+texte partagé, règles d’interruption, annonces de progression, préréglages de langue, métriques de latence et changement de backend CLI au-delà de Hermes.
|
|
25
|
+
|
|
26
|
+
## Ce qui change
|
|
27
|
+
|
|
28
|
+
| Capacité | Pourquoi c’est utile |
|
|
29
|
+
|---|---|
|
|
30
|
+
| Flux type appel | Parler, écouter, interrompre et continuer dans le même salon vocal Discord. |
|
|
31
|
+
| Configuration guidée | `vc setup` couvre prerequisites, Discord token/client ID, voice channel, transcript target, backend et TTS settings en un seul flux. |
|
|
32
|
+
| Boucle vocale locale | Discord audio → local `whisper-cli` → selected CLI agent → TTS reply. |
|
|
33
|
+
| Choix de l’agent | Hermes Agent, Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw, Aider, Cursor CLI ou custom command. `vc setup` détecte automatiquement ce qui est installé. |
|
|
34
|
+
| Routage d’agent par voix | `"ask Codex what it thinks"` pour un tour, `"switch to Aider"` en sticky, `"back to default"` pour revenir. Les binaires absents sont détectés et le pont propose un fallback vers l’agent par défaut. |
|
|
35
|
+
| Au-delà de la voix intégrée de Hermes | Garde la même boucle vocale VC, puis ajoute salons de projet, contexte partagé `!ask`, interruptions réglées, annonces progression/état et contrôle de backends multiagents. |
|
|
36
|
+
| Exploitation réelle | doctor auto-fix, guide Docker UDP, latency metrics, multi-instance rooms et redacted config checks inclus. |
|
|
37
|
+
|
|
38
|
+
## Démarrage rapide
|
|
39
|
+
|
|
40
|
+
```bash
|
|
41
|
+
npm install -g verbalcoding@latest
|
|
42
|
+
vc setup
|
|
43
|
+
vc doctor
|
|
44
|
+
vc start
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
`vc setup` est le parcours normal pour une personne. Gardez Discord Developer Portal ouvert pendant la saisie du bot token, application/client ID, transcript target et voice channel names.
|
|
48
|
+
|
|
49
|
+
En automatisation, vous pouvez ignorer les prompts puis renseigner Discord ensuite.
|
|
50
|
+
|
|
51
|
+
```bash
|
|
52
|
+
vc setup --yes
|
|
53
|
+
vc setup token <bot-token> --client-id <discord-client-id>
|
|
54
|
+
vc setup channels "General,Team Voice"
|
|
55
|
+
vc doctor
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
## Discord en une minute
|
|
59
|
+
|
|
60
|
+
1. Créez une application et un bot dans Discord Developer Portal.
|
|
61
|
+
2. Activez Message Content privileged intent.
|
|
62
|
+
3. Lancez `vc setup` et collez bot token et application/client ID.
|
|
63
|
+
4. Saisissez les noms exacts des voice channels à rejoindre.
|
|
64
|
+
5. Invitez le bot avec ces commandes.
|
|
65
|
+
|
|
66
|
+
```bash
|
|
67
|
+
vc bot invite <discord-client-id>
|
|
68
|
+
vc bot invite <discord-client-id> --guild <guild-id>
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
## Carte rapide des commandes
|
|
72
|
+
|
|
73
|
+
```bash
|
|
74
|
+
vc setup # configuration guidée: prerequisites, Discord, backend, voice
|
|
75
|
+
vc setup --yes # non-interactive bootstrap/starter config
|
|
76
|
+
vc setup token # change or add the Discord bot token/client ID later
|
|
77
|
+
vc setup channels "General,Team Voice" # update the auto-join voice channel names
|
|
78
|
+
vc bot invite CLIENT_ID # generate a Discord bot invite URL
|
|
79
|
+
vc status # show the current settings
|
|
80
|
+
vc language ko|en|auto # change the language preset
|
|
81
|
+
vc doctor # redacted health check and auto-fixes
|
|
82
|
+
vc start # start the default bridge
|
|
83
|
+
vc instance setup NAME # create an isolated project voice bot
|
|
84
|
+
vc instance start NAME # run that bot in the background
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
## Learn more
|
|
88
|
+
|
|
89
|
+
| Guide | Contents |
|
|
90
|
+
|---|---|
|
|
91
|
+
| [Documentation hub](docs/i18n/README.fr.md) | Index of the localized guides. |
|
|
92
|
+
| [Fresh Install](docs/i18n/FRESH_INSTALL.fr.md) | npm/global setup, Discord configuration, first launch. |
|
|
93
|
+
| [Usage](docs/i18n/USAGE.fr.md) | CLI commands, Discord commands, run modes, latency. |
|
|
94
|
+
| [Usage by harness](docs/i18n/HARNESSES.fr.md) | Installation, configuration, and voice routing for Claude Code, Codex, Aider, and the others. |
|
|
95
|
+
| [Hermes built-in voice vs VerbalCoding](docs/i18n/HERMES_VOICE.fr.md) | The Discord voice support Hermes already provides, and what VerbalCoding adds. |
|
|
96
|
+
| [Configuration](docs/i18n/CONFIGURATION.fr.md) | .env, agent backends, MCP, TTS, operations. |
|
|
97
|
+
| [Troubleshooting](docs/i18n/TROUBLESHOOTING.fr.md) | Docker UDP and token/channel checks. |
|
|
98
|
+
| [Multi-Instance](docs/i18n/MULTI_INSTANCE.fr.md) | One fixed voice room per project. |
|
|
99
|
+
|
|
100
|
+
## Requirements
|
|
101
|
+
|
|
102
|
+
| Layer | Default |
|
|
103
|
+
|---|---|
|
|
104
|
+
| Runtime | Node.js 20+ and npm. |
|
|
105
|
+
| Audio | `ffmpeg` and a local `whisper-cli`. |
|
|
106
|
+
| TTS | Edge TTS by default; OpenVoice, SpeechSwift/CosyVoice, and Supertonic are optional. |
|
|
107
|
+
| Discord | Bot token, Message Content intent, voice permissions, and matching channel names. |
|
|
108
|
+
| Agent | At least one authenticated CLI harness; Hermes Agent by default. |
|
|
109
|
+
|
|
110
|
+
## Docker / container note
|
|
111
|
+
|
|
112
|
+
If the logs show `Cannot perform IP discovery - socket closed`, Discord voice UDP is blocked. With Docker Compose on Linux, use:
|
|
113
|
+
|
|
114
|
+
```yaml
|
|
115
|
+
services:
|
|
116
|
+
  verbalcoding:
|
|
117
|
+
    network_mode: "host"
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
Do not combine `network_mode: "host"` with `ports:`.
|
|
121
|
+
|
|
122
|
+
## Contributing
|
|
123
|
+
|
|
124
|
+
```bash
|
|
125
|
+
node --check app-node/main.mjs
|
|
126
|
+
npm test
|
|
127
|
+
bash -n run.sh scripts/install.sh scripts/bootstrap_prereqs.sh
|
|
128
|
+
npm pack --dry-run
|
|
129
|
+
vc doctor
|
|
130
|
+
```
|
|
131
|
+
|
|
132
|
+
## Status
|
|
133
|
+
|
|
134
|
+
VerbalCoding is aiming for a public release but is still young. A demo video/GIF, broader Linux validation, CI, and a security review remain TODO.
|
package/README.ja.md
ADDED
|
@@ -0,0 +1,134 @@
|
|
|
1
|
+
# VerbalCoding
|
|
2
|
+
|
|
3
|
+
<p align="center"><strong>Work with CLI coding agents over Discord voice, as if you were on a phone call.</strong></p>
|
|
4
|
+
|
|
5
|
+
<p align="center"><a href="./README.md">English</a> · <a href="./README.ko.md">한국어</a> · <a href="./README.zh.md">中文</a> · <a href="./README.es.md">Español</a> · <a href="./README.fr.md">Français</a> · <a href="./README.ru.md">Русский</a></p>
|
|
6
|
+
|
|
7
|
+
<p align="center">
|
|
8
|
+
<img alt="npm" src="https://img.shields.io/npm/v/verbalcoding?color=CB3837&logo=npm&logoColor=white">
|
|
9
|
+
<img alt="Node.js" src="https://img.shields.io/badge/Node.js-20%2B-339933?logo=node.js&logoColor=white">
|
|
10
|
+
<img alt="Discord" src="https://img.shields.io/badge/Discord-voice%20bridge-5865F2?logo=discord&logoColor=white">
|
|
11
|
+
<img alt="STT" src="https://img.shields.io/badge/STT-whisper.cpp-7C3AED">
|
|
12
|
+
<img alt="TTS" src="https://img.shields.io/badge/TTS-Edge%20%7C%20OpenVoice%20%7C%20SpeechSwift-0EA5E9">
|
|
13
|
+
<img alt="License" src="https://img.shields.io/github/license/ca1773130n/VerbalCoding">
|
|
14
|
+
</p>
|
|
15
|
+
|
|
16
|
+
<p align="center">
|
|
17
|
+
<img src="docs/assets/figures/verbalcoding-flow.svg" alt="VerbalCoding voice-to-agent flow" width="860">
|
|
18
|
+
</p>
|
|
19
|
+
|
|
20
|
+
## Why it was built
|
|
21
|
+
|
|
22
|
+
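VerbalCoding turns a Discord voice room into a hands-free control seat for coding agents. Ask by voice, let a CLI agent do the work, and get back a short spoken answer plus a text record. It also includes safeguards that keep it from reading diffs and logs aloud at length.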
|
|
23
|
+
|
|
24
|
+
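> **Already using Hermes Agent?** Hermes itself supports Discord voice channels through `/voice join` / `/voice channel`. It can join your current VC, transcribe with Whisper STT, and reply with TTS. If that basic loop is all you need, VerbalCoding is not required. VerbalCoding is a workflow layer on top of it that adds project/session routing, shared voice+text context, interruption rules, spoken progress updates, language presets, latency metrics, and switching to non-Hermes CLI backends.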
|
|
25
|
+
|
|
26
|
+
## What feels different
|
|
27
|
+
|
|
28
|
+
| Feature | Value |
|
|
29
|
+
|---|---|
|
|
30
|
+
| Phone-like flow | Talk, listen, interrupt, and continue in the same Discord voice channel. |
|
|
31
|
+
| Guided setup for humans | `vc setup` walks through prerequisites, the Discord token/client ID, voice channel, transcript target, backend, and TTS settings in a single flow. |
|
|
32
|
+
| Local voice loop | Discord audio → local `whisper-cli` → selected CLI agent → TTS response. |
|
|
33
|
+
| Agent choice | Supports Hermes Agent, Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw, Aider, Cursor CLI, or a custom command. `vc setup` auto-detects the ones that are installed. |
|
|
34
|
+
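| Switch agents by voice | `"ask Codex what it thinks"` applies to one turn only, `"switch to Aider"` is sticky, and `"back to default"` returns to the default agent. Binaries that are not installed are detected, and a fallback to the default agent is offered. |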
|
|
35
|
+
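| Beyond Hermes built-in voice | Builds on the same VC voice loop and adds project rooms, shared `!ask` context, fine-grained interruption handling, spoken progress/status announcements, and multi-agent backend control. |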
|
|
36
|
+
| Operations features | Includes doctor auto-fix, a Docker UDP guide, latency metrics, multi-instance rooms, and redacted config checks. |
|
|
37
|
+
|
|
38
|
+
## Quick start
|
|
39
|
+
|
|
40
|
+
```bash
|
|
41
|
+
npm install -g verbalcoding@latest
|
|
42
|
+
vc setup
|
|
43
|
+
vc doctor
|
|
44
|
+
vc start
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
The normal path for a human user is `vc setup`. Keep the Discord Developer Portal open while you enter the bot token, application/client ID, transcript target, and voice channel names.
|
|
48
|
+
|
|
49
|
+
For automation, you can skip the prompts and set the Discord values afterwards.
|
|
50
|
+
|
|
51
|
+
```bash
|
|
52
|
+
vc setup --yes
|
|
53
|
+
vc setup token <bot-token> --client-id <discord-client-id>
|
|
54
|
+
vc setup channels "General,Team Voice"
|
|
55
|
+
vc doctor
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
## Discord setup in one minute
|
|
59
|
+
|
|
60
|
+
1. Create an application and a bot in the Discord Developer Portal.
|
|
61
|
+
2. Enable the Message Content privileged intent.
|
|
62
|
+
3. Run `vc setup` and paste the bot token and application/client ID.
|
|
63
|
+
4. Enter the exact names of the voice channels to auto-join.
|
|
64
|
+
5. Invite the bot with the following commands.
|
|
65
|
+
|
|
66
|
+
```bash
|
|
67
|
+
vc bot invite <discord-client-id>
|
|
68
|
+
vc bot invite <discord-client-id> --guild <guild-id>
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
## Small command table
|
|
72
|
+
|
|
73
|
+
```bash
|
|
74
|
+
vc setup # guided setup: prerequisites, Discord, backend, voice
|
|
75
|
+
vc setup --yes # non-interactive bootstrap/starter config
|
|
76
|
+
vc setup token # update or add the Discord bot token and client ID later
|
|
77
|
+
vc setup channels "General,Team Voice" # update the auto-join voice channel names
|
|
78
|
+
vc bot invite CLIENT_ID # generate a Discord bot invite URL
|
|
79
|
+
vc status # show the current settings
|
|
80
|
+
vc language ko|en|auto # switch the language preset
|
|
81
|
+
vc doctor # redacted health check and auto-fix
|
|
82
|
+
vc start # start the default bridge
|
|
83
|
+
vc instance setup NAME # create an isolated project voice bot
|
|
84
|
+
vc instance start NAME # run that bot in the background
|
|
85
|
+
```
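
The command table describes `vc doctor` as a redacted health check. As a rough illustration of what "redacted" can mean in practice, the sketch below masks secret-looking values before a config is printed. The helper names (`redact`, `redactConfig`), the key patterns, and the example variable names are all assumptions; they are not taken from the VerbalCoding source, and the real check may redact differently.

```javascript
// Hypothetical sketch: shows only enough of a secret to recognize it.
function redact(value) {
  if (!value) return "(unset)";
  if (value.length <= 8) return "*".repeat(value.length);
  return `${value.slice(0, 4)}…${value.slice(-2)} (${value.length} chars)`;
}

// Masks every entry whose key looks like a secret; other entries pass through unchanged.
function redactConfig(config, secretKeys = [/TOKEN/i, /SECRET/i, /KEY/i]) {
  return Object.fromEntries(
    Object.entries(config).map(([key, value]) => [
      key,
      secretKeys.some((re) => re.test(key)) ? redact(String(value ?? "")) : value,
    ])
  );
}
```

Matching on key names rather than value shapes keeps the sketch simple, at the cost of missing secrets stored under unexpected names.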
|
|
86
|
+
|
|
87
|
+
## Learn more
|
|
88
|
+
|
|
89
|
+
| Guide | What you get |
|
|
90
|
+
|---|---|
|
|
91
|
+
| [Documentation hub](docs/i18n/README.ja.md) | Index of the localized guides. |
|
|
92
|
+
| [Fresh Install](docs/i18n/FRESH_INSTALL.ja.md) | npm/global setup, Discord configuration, first launch. |
|
|
93
|
+
| [Usage](docs/i18n/USAGE.ja.md) | CLI commands, Discord commands, run modes, latency. |
|
|
94
|
+
| [Usage by harness](docs/i18n/HARNESSES.ja.md) | Per-backend installation, configuration, and voice routing for Claude Code, Codex, Aider, and others. |
|
|
95
|
+
| [Hermes built-in voice vs VerbalCoding](docs/i18n/HERMES_VOICE.ja.md) | The Discord voice support Hermes already provides, and how VerbalCoding differs. |
|
|
96
|
+
| [Configuration](docs/i18n/CONFIGURATION.ja.md) | .env, agent backends, MCP, TTS, operations. |
|
|
97
|
+
| [Troubleshooting](docs/i18n/TROUBLESHOOTING.ja.md) | Docker UDP and checks for a missing token or channel. |
|
|
98
|
+
| [Multi-Instance](docs/i18n/MULTI_INSTANCE.ja.md) | One fixed voice room per project. |
|
|
99
|
+
|
|
100
|
+
## Requirements
|
|
101
|
+
|
|
102
|
+
| Layer | Default |
|
|
103
|
+
|---|---|
|
|
104
|
+
| Runtime | Node.js 20+ and npm. |
|
|
105
|
+
| Audio | `ffmpeg` and a local `whisper-cli`. |
|
|
106
|
+
| TTS | Edge TTS by default; OpenVoice, SpeechSwift/CosyVoice, and Supertonic are optional. |
|
|
107
|
+
| Discord | Bot token, Message Content intent, voice permissions, and matching channel names. |
|
|
108
|
+
| Agent | At least one authenticated CLI harness; Hermes Agent by default. |
|
|
109
|
+
|
|
110
|
+
## Docker / container note
|
|
111
|
+
|
|
112
|
+
If the logs show `Cannot perform IP discovery - socket closed`, Discord voice UDP is blocked. With Docker Compose on Linux, use:
|
|
113
|
+
|
|
114
|
+
```yaml
|
|
115
|
+
services:
|
|
116
|
+
  verbalcoding:
|
|
117
|
+
    network_mode: "host"
|
|
118
|
+
```
|
|
119
|
+
|
|
120
|
+
Do not combine `network_mode: "host"` with `ports:`.
|
|
121
|
+
|
|
122
|
+
## Contributing
|
|
123
|
+
|
|
124
|
+
```bash
|
|
125
|
+
node --check app-node/main.mjs
|
|
126
|
+
npm test
|
|
127
|
+
bash -n run.sh scripts/install.sh scripts/bootstrap_prereqs.sh
|
|
128
|
+
npm pack --dry-run
|
|
129
|
+
vc doctor
|
|
130
|
+
```
|
|
131
|
+
|
|
132
|
+
## Status
|
|
133
|
+
|
|
134
|
+
VerbalCoding is aiming for a public release but is still at an early stage. A demo video/GIF, broader Linux validation, CI, and a security review are still TODO.
|