verbalcoding 0.2.11 → 0.2.13
- package/.env.example +98 -2
- package/README.es.md +134 -0
- package/README.fr.md +134 -0
- package/README.ja.md +134 -0
- package/README.ko.md +134 -0
- package/README.md +118 -74
- package/README.ru.md +134 -0
- package/README.zh.md +133 -0
- package/app-node/agent_adapters.mjs +37 -5
- package/app-node/agent_adapters.test.mjs +27 -1
- package/app-node/agent_detect.mjs +73 -0
- package/app-node/agent_detect.test.mjs +77 -0
- package/app-node/agent_routing.mjs +148 -0
- package/app-node/agent_routing.test.mjs +138 -0
- package/app-node/agent_turn.mjs +86 -0
- package/app-node/agent_turn.test.mjs +109 -0
- package/app-node/bridge_context.mjs +73 -0
- package/app-node/bridge_context.test.mjs +54 -0
- package/app-node/bridge_state.mjs +4 -0
- package/app-node/bridge_wireup.test.mjs +462 -0
- package/app-node/cli_install.test.mjs +31 -0
- package/app-node/cross_agent_routing.test.mjs +78 -0
- package/app-node/discord_command_router.mjs +204 -0
- package/app-node/discord_command_router.test.mjs +311 -0
- package/app-node/discord_voice_setup.mjs +251 -0
- package/app-node/discord_voice_setup.test.mjs +86 -0
- package/app-node/hermes_profiles.test.mjs +12 -1
- package/app-node/install_config.mjs +113 -3
- package/app-node/install_config.test.mjs +8 -0
- package/app-node/instance_doctor.test.mjs +9 -0
- package/app-node/instances.test.mjs +8 -1
- package/app-node/main.mjs +513 -1058
- package/app-node/mcp_tools.test.mjs +7 -0
- package/app-node/notification_handler.mjs +89 -0
- package/app-node/notification_handler.test.mjs +187 -0
- package/app-node/notify.mjs +73 -0
- package/app-node/notify.test.mjs +68 -0
- package/app-node/plan_dispatcher.mjs +215 -0
- package/app-node/plan_dispatcher.test.mjs +101 -0
- package/app-node/plan_mode.mjs +203 -0
- package/app-node/plan_mode.test.mjs +231 -0
- package/app-node/progress_handler.mjs +220 -0
- package/app-node/progress_handler.test.mjs +193 -0
- package/app-node/progress_speech.mjs +54 -32
- package/app-node/progress_speech.test.mjs +12 -3
- package/app-node/project_sessions.mjs +5 -2
- package/app-node/project_sessions.test.mjs +7 -0
- package/app-node/research_mode.mjs +282 -0
- package/app-node/research_mode.test.mjs +264 -0
- package/app-node/restart_notice.mjs +3 -0
- package/app-node/restart_notice.test.mjs +11 -0
- package/app-node/session_ontology.mjs +271 -0
- package/app-node/session_ontology.test.mjs +130 -0
- package/app-node/smart_progress.mjs +94 -0
- package/app-node/smart_progress.test.mjs +66 -0
- package/app-node/stream_sentencer.mjs +91 -0
- package/app-node/stream_sentencer.test.mjs +129 -0
- package/app-node/streaming_tts_queue.mjs +52 -0
- package/app-node/streaming_tts_queue.test.mjs +64 -0
- package/app-node/stt_whisper.mjs +24 -0
- package/app-node/stt_whisper.test.mjs +32 -0
- package/app-node/text_routing.mjs +22 -0
- package/app-node/text_routing.test.mjs +23 -1
- package/app-node/tts_backends.mjs +537 -3
- package/app-node/tts_backends.test.mjs +454 -0
- package/app-node/tts_player.mjs +164 -0
- package/app-node/tts_player.test.mjs +202 -0
- package/app-node/tts_runtime.mjs +134 -0
- package/app-node/tts_runtime.test.mjs +89 -0
- package/app-node/tts_settings.mjs +150 -3
- package/app-node/tts_settings.test.mjs +204 -0
- package/app-node/tts_voice_config.mjs +136 -2
- package/app-node/tts_voice_config.test.mjs +94 -0
- package/app-node/utterance_router.mjs +216 -0
- package/app-node/utterance_router.test.mjs +236 -0
- package/app-node/voice_autojoin.mjs +37 -0
- package/app-node/voice_autojoin.test.mjs +59 -0
- package/app-node/voice_io.mjs +272 -0
- package/app-node/voice_io.test.mjs +102 -0
- package/app-node/voice_turn_runner.mjs +449 -0
- package/app-node/voice_turn_runner.test.mjs +289 -0
- package/docs/CONFIGURATION.md +79 -96
- package/docs/FRESH_INSTALL.md +105 -63
- package/docs/HARNESSES.md +58 -0
- package/docs/HARNESS_AIDER.md +50 -0
- package/docs/HARNESS_CLAUDE.md +56 -0
- package/docs/HARNESS_CODEX.md +56 -0
- package/docs/HARNESS_CURSOR.md +45 -0
- package/docs/HARNESS_GEMINI.md +45 -0
- package/docs/HARNESS_HERMES.md +57 -0
- package/docs/HARNESS_OPENCLAW.md +44 -0
- package/docs/HARNESS_OPENCODE.md +44 -0
- package/docs/HERMES_VOICE.md +65 -0
- package/docs/MULTI_INSTANCE.md +16 -0
- package/docs/README.md +50 -0
- package/docs/RELEASE.md +42 -19
- package/docs/ROADMAP.md +53 -0
- package/docs/TROUBLESHOOTING.md +126 -0
- package/docs/TTS_BACKENDS.md +227 -0
- package/docs/USAGE.md +94 -40
- package/docs/assets/figures/verbalcoding-flow.svg +1 -1
- package/docs/i18n/AGENTS.es.md +34 -0
- package/docs/i18n/AGENTS.fr.md +34 -0
- package/docs/i18n/AGENTS.ja.md +34 -0
- package/docs/i18n/AGENTS.ko.md +34 -0
- package/docs/i18n/AGENTS.ru.md +34 -0
- package/docs/i18n/AGENTS.zh.md +34 -0
- package/docs/i18n/CONFIGURATION.es.md +25 -0
- package/docs/i18n/CONFIGURATION.fr.md +25 -0
- package/docs/i18n/CONFIGURATION.ja.md +25 -0
- package/docs/i18n/CONFIGURATION.ko.md +25 -0
- package/docs/i18n/CONFIGURATION.ru.md +25 -0
- package/docs/i18n/CONFIGURATION.zh.md +25 -0
- package/docs/i18n/FRESH_INSTALL.es.md +27 -2
- package/docs/i18n/FRESH_INSTALL.fr.md +27 -2
- package/docs/i18n/FRESH_INSTALL.ja.md +27 -2
- package/docs/i18n/FRESH_INSTALL.ko.md +27 -2
- package/docs/i18n/FRESH_INSTALL.ru.md +27 -2
- package/docs/i18n/FRESH_INSTALL.zh.md +27 -2
- package/docs/i18n/HARNESSES.es.md +58 -0
- package/docs/i18n/HARNESSES.fr.md +58 -0
- package/docs/i18n/HARNESSES.ja.md +58 -0
- package/docs/i18n/HARNESSES.ko.md +58 -0
- package/docs/i18n/HARNESSES.ru.md +58 -0
- package/docs/i18n/HARNESSES.zh.md +58 -0
- package/docs/i18n/HARNESS_AIDER.es.md +48 -0
- package/docs/i18n/HARNESS_AIDER.fr.md +48 -0
- package/docs/i18n/HARNESS_AIDER.ja.md +50 -0
- package/docs/i18n/HARNESS_AIDER.ko.md +50 -0
- package/docs/i18n/HARNESS_AIDER.ru.md +48 -0
- package/docs/i18n/HARNESS_AIDER.zh.md +48 -0
- package/docs/i18n/HARNESS_CLAUDE.es.md +55 -0
- package/docs/i18n/HARNESS_CLAUDE.fr.md +55 -0
- package/docs/i18n/HARNESS_CLAUDE.ja.md +56 -0
- package/docs/i18n/HARNESS_CLAUDE.ko.md +56 -0
- package/docs/i18n/HARNESS_CLAUDE.ru.md +55 -0
- package/docs/i18n/HARNESS_CLAUDE.zh.md +56 -0
- package/docs/i18n/HARNESS_CODEX.es.md +55 -0
- package/docs/i18n/HARNESS_CODEX.fr.md +55 -0
- package/docs/i18n/HARNESS_CODEX.ja.md +56 -0
- package/docs/i18n/HARNESS_CODEX.ko.md +56 -0
- package/docs/i18n/HARNESS_CODEX.ru.md +55 -0
- package/docs/i18n/HARNESS_CODEX.zh.md +56 -0
- package/docs/i18n/HARNESS_CURSOR.es.md +42 -0
- package/docs/i18n/HARNESS_CURSOR.fr.md +42 -0
- package/docs/i18n/HARNESS_CURSOR.ja.md +45 -0
- package/docs/i18n/HARNESS_CURSOR.ko.md +45 -0
- package/docs/i18n/HARNESS_CURSOR.ru.md +42 -0
- package/docs/i18n/HARNESS_CURSOR.zh.md +42 -0
- package/docs/i18n/HARNESS_GEMINI.es.md +44 -0
- package/docs/i18n/HARNESS_GEMINI.fr.md +44 -0
- package/docs/i18n/HARNESS_GEMINI.ja.md +45 -0
- package/docs/i18n/HARNESS_GEMINI.ko.md +45 -0
- package/docs/i18n/HARNESS_GEMINI.ru.md +44 -0
- package/docs/i18n/HARNESS_GEMINI.zh.md +45 -0
- package/docs/i18n/HARNESS_HERMES.es.md +54 -0
- package/docs/i18n/HARNESS_HERMES.fr.md +54 -0
- package/docs/i18n/HARNESS_HERMES.ja.md +57 -0
- package/docs/i18n/HARNESS_HERMES.ko.md +57 -0
- package/docs/i18n/HARNESS_HERMES.ru.md +54 -0
- package/docs/i18n/HARNESS_HERMES.zh.md +57 -0
- package/docs/i18n/HARNESS_OPENCLAW.es.md +41 -0
- package/docs/i18n/HARNESS_OPENCLAW.fr.md +41 -0
- package/docs/i18n/HARNESS_OPENCLAW.ja.md +44 -0
- package/docs/i18n/HARNESS_OPENCLAW.ko.md +44 -0
- package/docs/i18n/HARNESS_OPENCLAW.ru.md +41 -0
- package/docs/i18n/HARNESS_OPENCLAW.zh.md +42 -0
- package/docs/i18n/HARNESS_OPENCODE.es.md +41 -0
- package/docs/i18n/HARNESS_OPENCODE.fr.md +41 -0
- package/docs/i18n/HARNESS_OPENCODE.ja.md +44 -0
- package/docs/i18n/HARNESS_OPENCODE.ko.md +44 -0
- package/docs/i18n/HARNESS_OPENCODE.ru.md +41 -0
- package/docs/i18n/HARNESS_OPENCODE.zh.md +44 -0
- package/docs/i18n/HERMES_VOICE.es.md +46 -0
- package/docs/i18n/HERMES_VOICE.fr.md +46 -0
- package/docs/i18n/HERMES_VOICE.ja.md +46 -0
- package/docs/i18n/HERMES_VOICE.ko.md +65 -0
- package/docs/i18n/HERMES_VOICE.ru.md +46 -0
- package/docs/i18n/HERMES_VOICE.zh.md +46 -0
- package/docs/i18n/MULTI_INSTANCE.es.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.fr.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.ja.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.ko.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.ru.md +25 -0
- package/docs/i18n/MULTI_INSTANCE.zh.md +25 -0
- package/docs/i18n/README.es.md +20 -134
- package/docs/i18n/README.fr.md +20 -134
- package/docs/i18n/README.ja.md +20 -134
- package/docs/i18n/README.ko.md +20 -133
- package/docs/i18n/README.ru.md +20 -134
- package/docs/i18n/README.zh.md +20 -133
- package/docs/i18n/RELEASE.es.md +26 -1
- package/docs/i18n/RELEASE.fr.md +26 -1
- package/docs/i18n/RELEASE.ja.md +26 -1
- package/docs/i18n/RELEASE.ko.md +26 -1
- package/docs/i18n/RELEASE.ru.md +26 -1
- package/docs/i18n/RELEASE.zh.md +26 -1
- package/docs/i18n/TROUBLESHOOTING.es.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.fr.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.ja.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.ko.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.ru.md +39 -0
- package/docs/i18n/TROUBLESHOOTING.zh.md +39 -0
- package/docs/i18n/USAGE.es.md +25 -0
- package/docs/i18n/USAGE.fr.md +25 -0
- package/docs/i18n/USAGE.ja.md +25 -0
- package/docs/i18n/USAGE.ko.md +25 -0
- package/docs/i18n/USAGE.ru.md +25 -0
- package/docs/i18n/USAGE.zh.md +25 -0
- package/docs/superpowers/plans/2026-05-13-phase1-streaming-pipeline.md +122 -0
- package/docs/superpowers/plans/2026-05-13-phase10-push-notifications.md +152 -0
- package/docs/superpowers/plans/2026-05-13-phase2-agent-adapters.md +242 -0
- package/docs/superpowers/plans/2026-05-13-phase6-smart-progress.md +172 -0
- package/docs/superpowers/plans/2026-05-13-phase7-voice-plan-mode.md +108 -0
- package/docs/superpowers/plans/2026-05-14-cross-agent-voice-transfer.md +625 -0
- package/docs/superpowers/plans/2026-05-21-audio-overview-narrated-diffs.md +95 -0
- package/docs/superpowers/plans/2026-05-21-autoresearch-ontology.md +83 -0
- package/docs/superpowers/plans/2026-05-21-phase11-push-to-talk-wakeword-v2.md +77 -0
- package/docs/superpowers/plans/2026-05-21-phase12-multi-user-voice.md +147 -0
- package/docs/superpowers/plans/2026-05-21-phase14-verbalbench.md +136 -0
- package/docs/superpowers/plans/2026-05-21-phase15-phone-companion.md +72 -0
- package/integrations/fireredtts2/mlx_llm.py +183 -0
- package/integrations/fireredtts2/synth.py +156 -0
- package/integrations/fireredtts2/synth_mlx.py +196 -0
- package/integrations/mlxaudio/synth.py +74 -0
- package/integrations/neuttsair/synth.py +104 -0
- package/integrations/omnivoice/synth.py +110 -0
- package/package.json +7 -1
- package/scripts/cli.mjs +88 -3
- package/scripts/doctor.mjs +115 -4
- package/scripts/install.mjs +20 -2
- package/scripts/install_fireredtts2.sh +109 -0
- package/scripts/install_mlxaudio.sh +34 -0
- package/scripts/install_mossttsnano.sh +46 -0
- package/scripts/postinstall.mjs +34 -0
@@ -0,0 +1,289 @@
+import test from 'node:test';
+import assert from 'node:assert/strict';
+import { createVoiceTurnRunner } from './voice_turn_runner.mjs';
+import { createUtteranceRouter } from './utterance_router.mjs';
+import { createPlanDispatcher } from './plan_dispatcher.mjs';
+import { createBridge } from './bridge_context.mjs';
+import { createAgentTurnLifecycle } from './agent_turn.mjs';
+
+function noop() {}
+async function noopAsync() {}
+
+// Build a complete dep set for voiceTurnRunner by first constructing a
+// real utterance_router with stubbed pure-function deps, then threading
+// its outputs into the runner alongside the rest. This mirrors main.mjs's
+// real construction order so the tests catch any inter-module wiring drift.
+function makeDeps(overrides = {}) {
+  const bridge = createBridge();
+  bridge.bridgeState = {
+    deferredSize: () => 0,
+    currentEpoch: () => 1,
+    discardQueues: () => 0,
+  };
+  const agentTurnLifecycle = createAgentTurnLifecycle({ bridge, warn: noop });
+
+  const agentAdapter = {
+    label: 'default-agent', backend: 'hermes',
+    readSessionId: () => null,
+    ask: async () => 'mock agent answer',
+  };
+
+  // Construct the router (post-Phase 7b: dispatch + adapter selection only).
+  const router = createUtteranceRouter({
+    bridge,
+    log: noop, warn: noop, path: { join: (...a) => a.join('/') },
+    ROOT: '/tmp/vc', TTS_VOICE_CONFIG_PATH: '/tmp/voices.json',
+    agentAdapter,
+    settings: { voiceLanguage: 'ko', transcriptChannelId: 'tx-ch', agent: { backend: 'hermes', label: 'hermes' }, tts: {} },
+    projectSessionContextText: () => '',
+    createBridgeAgentAdapter: s => ({ label: s?.label || 'fake', backend: s?.backend || 'hermes', ask: async () => '' }),
+    buildAgentSettings: () => ({ backend: 'hermes', label: 'hermes' }),
+    commandIsInstalled: async () => true,
+    shellSplit: s => String(s).split(' '),
+    sendText: noopAsync, speakText: noopAsync,
+    ensureTtsVoiceConfig: () => ({ backends: {} }),
+    updateTtsVoiceConfig: c => c,
+    writeTtsVoiceConfig: noop,
+    applyVoiceConfigToProcessEnv: () => ({ selection: { backend: 'edge', voiceType: 'female', voice: { language: 'ko', voice: 'x' } } }),
+    ensureSelectedTtsBackendInstalled: noopAsync,
+    rebuildTtsRuntimeSettings: noop,
+    voiceCommandFromTranscript: () => null,
+    voiceChangedText: () => '',
+    voiceLanguageCommandFromTranscript: () => null,
+    voiceCloneCommandFromText: () => null,
+    voiceCloneCapture: { arm: () => ({ targetPath: '' }), cancel: () => false, current: () => null },
+    notifyVoiceCloneSampleGapIfNeeded: noopAsync,
+    languageChangedText: () => '',
+    applyRuntimeLanguage: noop,
+    persistEnvValues: noop,
+    discardVoiceInputQueues: () => 0,
+  });
+
+  // Construct the plan dispatcher (Phase 7b) consuming router outputs.
+  const planDispatcher = createPlanDispatcher({
+    bridge,
+    settings: { voiceLanguage: 'ko', transcriptChannelId: 'tx-ch', agent: { backend: 'hermes', label: 'hermes' } },
+    sendText: noopAsync,
+    speakText: noopAsync,
+    routingStateFor: router.routingStateFor,
+    adapterForBackend: router.adapterForBackend,
+    adapterForProjectSession: router.adapterForProjectSession,
+    resolveProjectSessionForChannel: () => null,
+    isAgentRoutingDecision: () => false,
+    parseDecisionAnswer: () => ({ type: 'unknown' }),
+    parsePlanVoiceCommand: () => ({ type: 'unknown' }),
+    applyPlanCommand: s => s,
+    parsePlanOutput: () => ({ steps: [], decisions: [] }),
+    renderDecisionPrompt: d => d?.text || '',
+    renderResolvedDecisions: () => '',
+    renderFinalPlan: () => '',
+    planModePreamble: () => '',
+    planExecutionPreamble: () => '',
+    isPlanEntryUtterance: () => false,
+  });
+
+  const settings = { voiceLanguage: 'ko', transcriptChannelId: 'tx-ch', agent: { backend: 'hermes', label: 'hermes' }, tts: {} };
+
+  return {
+    bridge,
+    agentTurnLifecycle,
+    settings,
+    client: { channels: { cache: new Map() } },
+    log: noop, warn: noop, fs: { rm: (_p, _o, cb) => cb && cb() },
+    // voice_io
+    transcribe: async () => 'hey hermes do a thing',
+    // tts_player
+    beginStreamingTurn: () => false,
+    endStreamingTurn: noopAsync,
+    speakText: noopAsync,
+    // progress_handler
+    queueProgressSpeechText: noop,
+    stopProgressSpeech: noop,
+    speakImmediateNotice: noopAsync,
+    // notification_handler
+    maybeNotifyTaskComplete: noopAsync,
+    // utterance_router outputs (real router instance built above)
+    handleLanguageCommand: router.handleLanguageCommand,
+    handleTtsVoiceCommand: router.handleTtsVoiceCommand,
+    handleVoiceCloneCommand: router.handleVoiceCloneCommand,
+    dispatchPlanModeUtterance: planDispatcher.dispatchPlanModeUtterance,
+    adapterForBackend: router.adapterForBackend,
+    adapterForProjectSession: router.adapterForProjectSession,
+    planChannelKey: planDispatcher.planChannelKey,
+    routingStateFor: router.routingStateFor,
+    recordUtterance: router.recordUtterance,
+    clearTransientRouting: router.clearTransientRouting,
+    // pure helpers
+    isAllowed: () => true,
+    isAbortError: e => e?.name === 'AbortError',
+    sleep: async () => {},
+    sendText: noopAsync,
+    sendEmbed: async () => true,
+    reloadRuntimeLanguageFromEnv: () => ({ changed: false, voiceLanguage: 'ko', whisperLanguage: 'ko' }),
+    drainDeferredProcessingUtterances: noopAsync,
+    resolveProjectSessionForChannel: () => null,
+    projectSessionContextText: () => '',
+    ontologyStateFor: () => ({ nodeCount: 0, serializeForHandoff: () => '' }),
+    captureOntologyFromTurn: noop,
+    formatRecentDiscordContext: () => '',
+    formatSttResultMessage: (_lang, _u, t) => `you said: ${t}`,
+    formatSttStartMessage: () => '🎧',
+    formatVoiceErrorMessage: (_lang, m) => m,
+    formatWakeRejectedMessage: () => 'no wake word',
+    agentAnswerHeader: () => 'agent says:',
+    emptyAgentAnswer: () => '(empty)',
+    spokenResultOnly: (_p, a) => a,
+    stripWake: t => t,
+    acceptsWake: () => true,
+    sensitivityChangedSpeech: () => '',
+    sensitivityModeFromTranscript: () => null,
+    sensitivityStatusText: () => '',
+    setSensitivityMode: () => ({ mode: 'normal' }),
+    isSensitivityOnlyRequest: () => false,
+    verboseChangedSpeech: () => '',
+    verboseModeFromTranscript: () => null,
+    verboseStatusText: () => '',
+    setVerboseProgress: noop,
+    isVerboseOnlyRequest: () => false,
+    isRoutingOnlyUtterance: () => false,
+    parseAgentRoutingCommand: () => ({ type: 'none' }),
+    renderAgentPrefix: () => '',
+    buildCrossAgentPrompt: ({ prompt }) => prompt,
+    buildFallbackDecision: () => ({ slot: 'fallback' }),
+    parseDecisionAnswer: () => ({ type: 'unknown' }),
+    parseResearchCommand: () => ({ type: 'none' }),
+    runResearchTurn: async () => ({ status: 'no_backend' }),
+    PROGRESS_IDLE_CHECK_MS: 5000,
+    PROGRESS_IDLE_NOTICE_INITIAL_MS: 10000,
+    PROGRESS_IDLE_NOTICE_LIMIT: 20,
+    PROGRESS_IDLE_NOTICE_MAX_MS: 30000,
+    PROGRESS_IDLE_NOTICE_MULTIPLIER: 1.8,
+    STT_START_VOICE_NOTICE: false,
+    ...overrides,
+  };
+}
+
+test('createVoiceTurnRunner exposes handleRecording', () => {
+  const runner = createVoiceTurnRunner(makeDeps());
+  assert.equal(typeof runner.handleRecording, 'function');
+});
+
+test('handleRecording happy path: transcribe -> agent -> send + speak + notify, cleanup green', async () => {
+  const calls = { transcribe: 0, askPrompt: '', sendText: [], speakText: [], notify: [] };
+  // Capture the exact prompt the runner sends to the agent and the exact
+  // answer that flows back out.
+  const fakeAdapter = {
+    label: 'hermes', backend: 'hermes', readSessionId: () => null,
+    ask: async (prompt, _signal, plan) => {
+      calls.askPrompt = prompt;
+      assert.equal(plan.label, 'hermes', 'plan label = agent label');
+      assert.equal(plan.task, true, 'plan.task=true for voice turn');
+      return 'twelve apples';
+    },
+  };
+  const deps = makeDeps({
+    transcribe: async wav => { calls.transcribe++; assert.equal(wav, '/tmp/u.wav'); return 'hermes do the thing'; },
+    sendText: async t => { calls.sendText.push(t); return true; },
+    speakText: async t => { calls.speakText.push(t); },
+    maybeNotifyTaskComplete: async ({ answer, label }) => { calls.notify.push({ answer, label }); },
+    // Force the runner's adapter lookup to return our test adapter rather
+    // than the router's auto-created one (the router builds a fresh adapter
+    // per backend via createBridgeAgentAdapter).
+    adapterForBackend: () => fakeAdapter,
+    adapterForProjectSession: () => fakeAdapter,
+  });
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  assert.equal(deps.bridge.processing, false);
+  await handleRecording('user-1', '/tmp/u.wav', 8192, 1, null);
+  assert.equal(calls.transcribe, 1, 'transcribe called once');
+  assert.equal(calls.askPrompt, 'hermes do the thing', 'agent receives the post-wake prompt');
+  assert.deepEqual(calls.speakText.at(-1), 'twelve apples', 'agent answer is spoken');
+  assert.ok(calls.sendText.some(s => /you said: hermes do the thing/.test(s)), 'STT echoed');
+  assert.ok(calls.sendText.some(s => /twelve apples/.test(s)), 'agent answer surfaced as text');
+  assert.equal(calls.notify.length, 1, 'maybeNotifyTaskComplete fired once');
+  assert.equal(calls.notify[0].label, 'hermes', 'notify carries the agent label');
+  assert.equal(deps.bridge.processing, false, 'processing flag cleared in finally');
+  assert.equal(deps.bridge.activeTurnId, 0, 'activeTurnId cleared');
+  assert.equal(deps.bridge.activeProgressAbortController, null, 'progress controller cleared');
+});
+
+test('handleRecording cleans up progress controller even when agent throws', async () => {
+  const fakeAdapter = {
+    label: 'hermes', backend: 'hermes', readSessionId: () => null,
+    ask: async () => { throw new Error('agent boom'); },
+  };
+  const deps = makeDeps({
+    adapterForBackend: () => fakeAdapter,
+    adapterForProjectSession: () => fakeAdapter,
+  });
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  let finishStatus = null;
+  let finishError = null;
+  const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; finishError = r.error; } };
+  await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+  assert.equal(finishStatus, 'error');
+  assert.match(finishError || '', /agent boom/);
+  // Cleanup invariants — the bug Codex flagged on the original voice-path
+  // finally is now guarded by agentTurnLifecycle.finish, so we double-check.
+  assert.equal(deps.bridge.processing, false);
+  assert.equal(deps.bridge.activeProgressAbortController, null);
+  assert.equal(deps.bridge.currentAbortController, null);
+});
+
+test('handleRecording drops when bridge.processing is already true', async () => {
+  const deps = makeDeps();
+  deps.bridge.processing = true;
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  let finishStatus = null;
+  const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+  await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+  assert.equal(finishStatus, 'drop_processing');
+  assert.equal(deps.bridge.processing, true, 'processing flag left intact (other turn owns it)');
+});
+
+test('handleRecording rejects unauthorized users', async () => {
+  const deps = makeDeps({ isAllowed: () => false });
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  let finishStatus = null;
+  const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+  await handleRecording('intruder', '/tmp/u.wav', 8192, 1, metricsTurn);
+  assert.equal(finishStatus, 'unauthorized');
+});
+
+test('handleRecording short-circuits on empty transcript', async () => {
+  const deps = makeDeps({ transcribe: async () => '' });
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  let finishStatus = null;
+  const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+  await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+  assert.equal(finishStatus, 'empty_transcript');
+  assert.equal(deps.bridge.processing, false, 'processing flag still cleaned up');
+});
+
+test('handleRecording short-circuits when wake word missing', async () => {
+  const sent = [];
+  const deps = makeDeps({
+    acceptsWake: () => false,
+    sendText: async t => { sent.push(t); return true; },
+  });
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  let finishStatus = null;
+  const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+  await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+  assert.equal(finishStatus, 'wake_rejected');
+  assert.ok(sent.some(t => /no wake word/.test(t)), 'wake-rejected message sent');
+});
+
+test('handleRecording with stale language reload aborts before transcribe', async () => {
+  let transcribed = false;
+  const deps = makeDeps({
+    reloadRuntimeLanguageFromEnv: () => ({ changed: true, voiceLanguage: 'en', whisperLanguage: 'en' }),
+    transcribe: async () => { transcribed = true; return 'hi'; },
+  });
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  let finishStatus = null;
+  const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+  await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+  assert.equal(transcribed, false, 'transcribe not called when language changed');
+  assert.equal(finishStatus, 'drop_stale_language_change');
+});
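Reviewer note: six of the tests in the hunk above build the same inline `metricsTurn` stub (`mark`, `addMeta`, `stage`, `finish`) and read `status` or `error` off the `finish` argument. A minimal sketch of how that stub could be shared is below. `makeMetricsTurn` and `recorded` are hypothetical names that do not appear in the package, and the stub shape is assumed from the tests alone rather than from the real metrics module.

```javascript
// Hypothetical test helper, not part of verbalcoding. It mirrors the inline
// metricsTurn stubs in the tests: three no-op hooks plus a finish() that
// records what the runner reported.
function makeMetricsTurn() {
  const recorded = { status: null, error: null, finishCalls: 0 };
  const metricsTurn = {
    mark: () => {},
    addMeta: () => {},
    stage: () => {},
    finish: (result) => {
      recorded.finishCalls += 1;
      recorded.status = result?.status ?? null;
      recorded.error = result?.error ?? null;
    },
  };
  return { metricsTurn, recorded };
}

// Usage sketch: pass metricsTurn as the last handleRecording argument,
// then assert on recorded.status afterwards.
const { metricsTurn, recorded } = makeMetricsTurn();
metricsTurn.finish({ status: 'wake_rejected' });
console.log(recorded.status, recorded.finishCalls);
```

Counting `finish` calls is an addition beyond what the tests check; it would let a test also assert that a turn is finished exactly once.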
package/docs/CONFIGURATION.md
CHANGED
@@ -1,32 +1,70 @@
 # VerbalCoding Configuration
 
-
+<!-- readme-glow-up:intro -->
+<p align="center">
+<a href="../README.md">README</a> ·
+<a href="README.md">Docs hub</a> ·
+<a href="FRESH_INSTALL.md">Fresh Install</a> ·
+<a href="USAGE.md">Usage</a> ·
+<a href="CONFIGURATION.md">Configuration</a> ·
+<a href="TROUBLESHOOTING.md">Troubleshooting</a> ·
+<a href="MULTI_INSTANCE.md">Multi-Instance</a>
+</p>
+
+> Settings reference for Discord, agents, TTS, MCP, and runtime behavior.
+>
+> Fast path: `vc setup handles normal config; edit .env only for advanced overrides`
+<!-- /readme-glow-up:intro -->
+
+## Setup Command Map
+
+For npm/global installs, use `vc` commands instead of manually editing `.env`:
 
-
+```bash
+vc setup # guided setup: prerequisites, Discord token, voice channels
+vc setup --yes # non-interactive bootstrap/starter config
+vc setup token # later update Discord bot token
+vc setup channels "General,Team Voice" # later update auto-join voice channel names
+vc setup channel "General" # alias
+vc setup voice "General" # alias
+vc doctor # redacted health check and supported auto-fixes
+vc start # run the default bridge
+```
 
--
-- Discord official bot overview: <https://docs.discord.com/developers/bots/overview>
-- Discord official quick start: <https://docs.discord.com/developers/quick-start/getting-started>
+Clone-only setup remains available:
 
 ```bash
-./scripts/install.sh
+./scripts/install.sh --yes
 ```
 
-
+`vc setup token` updates `DISCORD_BOT_TOKEN` and optional `DISCORD_CLIENT_ID`. `vc setup channels` updates `AUTO_JOIN_VOICE_CHANNELS`. Both preserve unrelated `.env` values, write the file with mode `0600`, and avoid printing token values.
+
+## Discord Bot/Application Setup
+
+Use these upstream guides for the Discord-side steps, then return to VerbalCoding setup:
 
-
+- Hermes Agent Discord messaging guide: <https://hermes-agent.nousresearch.com/docs/user-guide/messaging/discord>
+- Discord official bot overview: <https://docs.discord.com/developers/bots/overview>
+- Discord official quick start: <https://docs.discord.com/developers/quick-start/getting-started>
+
+Minimum flow:
 
 ```bash
-
+vc bot invite <discord-client-id>
+vc setup token <bot-token> --client-id <discord-client-id>
+vc setup channels "VerbalCoding,General"
+vc doctor
 ```
 
+The bot needs Message Content privileged intent plus text/voice permissions for the target channels.
+
 ## Supported Agent Backends
 
 Set `AGENT_BACKEND` in `.env`.
 
 | Backend | Default command | Notes |
 |---|---|---|
-| `hermes` | `hermes chat -Q -q` | Default. Preserves `.verbalcoding-session` resume behavior. |
+| `hermes` | `hermes chat -Q -q` | Default. Preserves `.verbalcoding-session` resume behavior. `vc doctor` can auto-install the Hermes CLI on supported macOS/Linux installs. |
 | `claude-code` / `claude` | `claude -p` | Override with `CLAUDE_COMMAND` or `AGENT_COMMAND`. |
 | `codex` | `codex exec` | Override with `CODEX_COMMAND` or `AGENT_COMMAND`. |
 | `gemini` | `gemini -p` | Override with `GEMINI_COMMAND` or `AGENT_COMMAND`. |
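Reviewer note: the hunk above says `vc setup token` and `vc setup channels` preserve unrelated `.env` values, write the file with mode `0600`, and avoid printing token values. The sketch below shows one way such behavior could be implemented. It is not the package's code: `upsertEnvValues`, `writeEnvFile`, and `describeUpdate` are hypothetical names, and the double-quoted `KEY="value"` output is an assumption based on the `.env` sample in this document. The `fs` module is passed in as a parameter, following the dependency-injection style of the tests in this diff.

```javascript
// Hypothetical sketch, not taken from verbalcoding.

// Return new .env text with `updates` applied. Comments, blank lines, and
// lines for other keys pass through unchanged; missing keys are appended.
function upsertEnvValues(envText, updates) {
  const wanted = new Map(Object.entries(updates));
  const seen = new Set();
  const body = envText.replace(/\n+$/, '');
  const lines = body === '' ? [] : body.split('\n');
  const out = lines.map((line) => {
    const match = /^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    if (!match || !wanted.has(match[1])) return line;
    seen.add(match[1]);
    return `${match[1]}=${JSON.stringify(String(wanted.get(match[1])))}`;
  });
  for (const [key, value] of wanted) {
    if (!seen.has(key)) out.push(`${key}=${JSON.stringify(String(value))}`);
  }
  return out.join('\n') + '\n';
}

// Write with owner-only permissions. The `mode` option only applies when the
// file is created, so chmod as well to tighten an existing file.
function writeEnvFile(fs, filePath, text) {
  fs.writeFileSync(filePath, text, { mode: 0o600 });
  fs.chmodSync(filePath, 0o600);
}

// Report which keys changed without echoing their values.
function describeUpdate(updates) {
  return `updated: ${Object.keys(updates).sort().join(', ')}`;
}

const sample = '# local overrides\nAGENT_BACKEND="hermes"\nDISCORD_BOT_TOKEN="old"\n';
console.log(upsertEnvValues(sample, { DISCORD_BOT_TOKEN: 'new-token' }));
console.log(describeUpdate({ DISCORD_BOT_TOKEN: 'new-token' }));
```

With Node's built-in module, the call would be `writeEnvFile(require('node:fs'), '.env', text)` in CommonJS, or the same call with an imported `fs` in an `.mjs` file.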
@@ -62,8 +100,9 @@ New backends should implement the same contract and keep voice/STT/TTS behavior
 
 ```bash
 DISCORD_BOT_TOKEN="***"
+DISCORD_CLIENT_ID="123456789012345678"
 DISCORD_ALLOWED_USERS="123456789012345678"
-AUTO_JOIN_VOICE_CHANNELS="
+AUTO_JOIN_VOICE_CHANNELS="VerbalCoding,General"
 TRANSCRIPT_CHANNEL_ID="123456789012345678"
 
 AGENT_BACKEND="hermes"
@@ -95,9 +134,7 @@ Language presets and voice selection are separate:
 - Live voice commands such as “남자 한국어 목소리로 바꿔”, “여자 한국어 목소리로 바꿔”, `change voice to Korean female`, and `switch speaker to English` change only the speaker/voice type.
 - `!voice-test <text>` plays a quick sample with the currently selected backend and voice.
 
-Voice selection is stored in `config/tts-voices.json` by default. Override the path with `TTS_VOICE_CONFIG`.
-
-Default Edge catalog:
+Voice selection is stored in `config/tts-voices.json` by default. Override the path with `TTS_VOICE_CONFIG`.
 
 | `TTS_VOICE_TYPE` | `TTS_VOICE` | Language |
 |---|---|---|
@@ -107,29 +144,9 @@ Default Edge catalog:
 | `english_male` | `en-US-GuyNeural` | English |
 | `english_female` | `en-US-AriaNeural` | English |
 
-Manual persistent override:
-
-```bash
-TTS_BACKEND="edge"
-TTS_VOICE_TYPE="korean_male"
-TTS_VOICE="ko-KR-InJoonNeural"
-TTS_VOICE_CONFIG="config/tts-voices.json"
-```
-
-For OpenVoice, SpeechSwift, or Supertonic, keep the backend-specific voice/reference settings in the sections below; the same voice catalog file can still track the active voice type.
-
-Backend-specific voice options:
-
-| Backend | Settings | Voice choices |
-|---|---|---|
-| Edge | `TTS_VOICE_TYPE`, `TTS_VOICE` | Built-in types above, plus any voice returned by `edge-tts --list-voices` |
-| Supertonic | `SUPERTONIC_VOICE`, `SUPERTONIC_LANGUAGE` | `M1`–`M5`, `F1`–`F5`; language `ko`, `en`, `es`, `pt`, `fr` |
-| OpenVoice | `OPENVOICE_REF_AUDIO`, `OPENVOICE_STYLE`, `OPENVOICE_LANGUAGE` | User-provided permitted reference WAV; style defaults to `default` |
-| SpeechSwift / CosyVoice | `SPEECHSWIFT_REF_AUDIO`, `SPEECHSWIFT_ENGINE`, `SPEECHSWIFT_SPEAKER`, `SPEECHSWIFT_MODEL_ID` | Reference-sample voices for CosyVoice, or backend-supported speaker/model IDs |
-
 ## Utterance Segmentation
 
-`UTTERANCE_IDLE_MS` controls how long the bridge waits after a speech segment before it decides the user is done and starts STT.
+`UTTERANCE_IDLE_MS` controls how long the bridge waits after a speech segment before it decides the user is done and starts STT.
 
 ```bash
 UTTERANCE_IDLE_MS="4500" # balanced default
@@ -138,7 +155,7 @@ UTTERANCE_IDLE_MS="6000" # safer for long dictation with pauses
 
 ## MCP Server
 
-VerbalCoding ships a stdio MCP server so Hermes Agent or any MCP client can control the bridge through tools
+VerbalCoding ships a stdio MCP server so Hermes Agent or any MCP client can control the bridge through tools.
 
 Hermes config example:
 
@@ -161,74 +178,40 @@ Exposed MCP tools:
 | `set_language` | Update STT/progress/TTS language together |
 | `start`, `stop`, `restart` | Control the Discord voice bridge |
 
-##
+## Docker / Container Networking
 
-
+Discord voice needs outbound UDP. If Docker logs show `Cannot perform IP discovery - socket closed`, try Linux host networking:
 
-```
-
-
-
-# Put a permitted reference sample at voice-samples/user-reference.wav,
-# or capture one from Discord with !voice-clone capture.
-python3 integrations/openvoice/synth.py --openvoice-dir vendor/OpenVoice --ref-audio voice-samples/user-reference.wav --text '안녕하세요. 버벌코딩 목소리 복제 테스트입니다.' --output /tmp/verbalcoding-openvoice-smoke.wav
-```
-
-Then set:
-
-```bash
-TTS_BACKEND="openvoice"
-OPENVOICE_REF_AUDIO="./voice-samples/user-reference.wav"
-OPENVOICE_PROGRESS="0"
-```
-
-Only clone voices you own or have permission to use. If OpenVoice fails or times out, VerbalCoding falls back to Edge TTS.
-
-## Optional Supertonic TTS
-
-```bash
-./scripts/setup_supertonic.sh
-supertonic tts '안녕하세요. 수퍼토닉 테스트입니다.' --lang ko --voice M1 --steps 2 --speed 1.0 -o /tmp/verbalcoding-supertonic.wav
-```
-
-Then set:
-
-```bash
-TTS_BACKEND="supertonic"
-SUPERTONIC_COMMAND="./.venv-supertonic/bin/supertonic"
-SUPERTONIC_VOICE="M1"
-SUPERTONIC_LANGUAGE="ko"
-SUPERTONIC_STEPS="2"
-SUPERTONIC_SPEED="1.0"
-SUPERTONIC_PROGRESS="0"
+```yaml
+services:
+  verbalcoding:
+    network_mode: "host"
 ```
 
-
-
-## Optional SpeechSwift / CosyVoice TTS
+Remove `ports:` from that Compose service. On Docker Desktop for macOS/Windows, host networking may not expose UDP the same way; run on the host or a Linux VM if voice still fails.
 
-
+## Optional TTS Backends
 
-
-brew tap soniqo/speech https://github.com/soniqo/speech-swift
-brew install speech
-```
+For the full backend matrix, latency notes, aliases, and Mac mini caveats, see [TTS Backends](TTS_BACKENDS.md).
 
-
-
-```bash
-TTS_BACKEND="speechswift"
-SPEECHSWIFT_MODE="server"
-SPEECHSWIFT_ENGINE="cosyvoice"
-SPEECHSWIFT_LANGUAGE="korean"
-SPEECHSWIFT_REF_AUDIO="./voice-samples/user-reference.wav"
-SPEECHSWIFT_SERVER_HOST="127.0.0.1"
-SPEECHSWIFT_SERVER_PORT="18080"
-SPEECHSWIFT_SERVER_URL="http://127.0.0.1:18080"
-SPEECHSWIFT_PROGRESS="0"
-```
+Edge TTS remains the default and fallback. Optional local backends are configured with their own env vars:
 
-
+| Backend | Settings | Voice choices |
+|---|---|---|
+| Edge | `TTS_VOICE_TYPE`, `TTS_VOICE` | Built-in types above, plus any voice returned by `edge-tts --list-voices` |
+| Supertonic | `SUPERTONIC_VOICE`, `SUPERTONIC_LANGUAGE` | `M1`–`M5`, `F1`–`F5`; language `ko`, `en`, `es`, `pt`, `fr` |
+| OpenVoice | `OPENVOICE_REF_AUDIO`, `OPENVOICE_STYLE`, `OPENVOICE_LANGUAGE` | User-provided permitted reference WAV; style defaults to `default` |
+| SpeechSwift / CosyVoice | `SPEECHSWIFT_REF_AUDIO`, `SPEECHSWIFT_ENGINE`, `SPEECHSWIFT_SPEAKER`, `SPEECHSWIFT_MODEL_ID` | Reference-sample voices for CosyVoice, or backend-supported speaker/model IDs |
+| OmniVoice | `OMNIVOICE_PYTHON`, `OMNIVOICE_MODEL`, `OMNIVOICE_REF_AUDIO`, `OMNIVOICE_REF_TEXT`, `OMNIVOICE_LANGUAGE`, `OMNIVOICE_SPEAKER` | k2-fsa/OmniVoice reference-sample cloning or optional voice-design attributes |
+| Qwen3 TTS | `QWEN3TTS_COMMAND`, `QWEN3TTS_MODE`, `QWEN3TTS_MODEL`, `QWEN3TTS_SPEAKER` | Preset speaker such as `sohee`, reference mode, or designed speaker text |
+| MLX Audio | `MLXAUDIO_PYTHON`, `MLXAUDIO_MODEL`, `MLXAUDIO_VOICE`, `MLXAUDIO_LANG_CODE` | MLX Qwen3 voice/speaker IDs such as `Chelsie` |
+| NeuTTS Air | `NEUTTSAIR_PYTHON`, `NEUTTSAIR_BACKBONE_REPO`, `NEUTTSAIR_CODEC_REPO`, `NEUTTSAIR_REF_AUDIO`, `NEUTTSAIR_REF_TEXT` | English NeuTTS Air reference-sample cloning; use Q4 GGUF for lower latency |
+| FireRedTTS-2 | `FIREREDTTS2_COMMAND`, `FIREREDTTS2_PRETRAINED_DIR`, `FIREREDTTS2_PROMPT_AUDIO`, `FIREREDTTS2_PROMPT_TEXT` | Prompt-reference voice or random speaker |
+| FireRedTTS-2 MLX helper | `integrations/fireredtts2/synth_mlx.py` | Experimental Apple Silicon LLM-port helper; not a canonical `TTS_BACKEND` yet |
+| MOSS-TTS-Nano | `MOSSTTSNANO_COMMAND`, `MOSSTTSNANO_SCRIPT`, `MOSSTTSNANO_CHECKPOINT`, `MOSSTTSNANO_PROMPT_AUDIO` | OpenMOSS prompt reference or continuation mode |
+| MOSS-TTS-Nano MLX | `MOSSTTSNANO_MLX_PYTHON`, `MOSSTTSNANO_MLX_SCRIPT`, `MOSSTTSNANO_MLX_WORKER`, `MOSSTTSNANO_PROMPT_AUDIO` | Experimental MLX hybrid prompt reference or continuation mode |
+
+Only clone voices you own or have permission to use. For OmniVoice, install it in a separate Python environment such as `.venv-omnivoice` (`pip install torch torchaudio soundfile omnivoice`) and set `TTS_BACKEND=omnivoice`. For NeuTTS Air, install the local `neutts` package in `.venv-neuttsair`, set `TTS_BACKEND=neuttsair`, and keep progress prompts on Edge unless explicitly testing local progress TTS. If a local backend fails or times out, VerbalCoding falls back to Edge TTS.
 
 ## Operational Notes
 