verbalcoding 0.2.12 → 0.2.13
- package/.env.example +74 -4
- package/README.es.md +3 -1
- package/README.fr.md +3 -1
- package/README.ja.md +3 -1
- package/README.ko.md +4 -2
- package/README.md +4 -2
- package/README.ru.md +3 -1
- package/README.zh.md +3 -1
- package/app-node/agent_adapters.test.mjs +14 -0
- package/app-node/agent_routing.mjs +148 -0
- package/app-node/agent_routing.test.mjs +138 -0
- package/app-node/agent_turn.mjs +86 -0
- package/app-node/agent_turn.test.mjs +109 -0
- package/app-node/bridge_context.mjs +73 -0
- package/app-node/bridge_context.test.mjs +54 -0
- package/app-node/bridge_state.mjs +4 -0
- package/app-node/bridge_wireup.test.mjs +462 -0
- package/app-node/cli_install.test.mjs +31 -0
- package/app-node/cross_agent_routing.test.mjs +78 -0
- package/app-node/discord_command_router.mjs +204 -0
- package/app-node/discord_command_router.test.mjs +311 -0
- package/app-node/discord_voice_setup.mjs +251 -0
- package/app-node/discord_voice_setup.test.mjs +86 -0
- package/app-node/hermes_profiles.test.mjs +12 -1
- package/app-node/install_config.mjs +110 -3
- package/app-node/install_config.test.mjs +8 -0
- package/app-node/instance_doctor.test.mjs +9 -0
- package/app-node/instances.test.mjs +8 -1
- package/app-node/main.mjs +488 -1368
- package/app-node/mcp_tools.test.mjs +7 -0
- package/app-node/notification_handler.mjs +89 -0
- package/app-node/notification_handler.test.mjs +187 -0
- package/app-node/plan_dispatcher.mjs +215 -0
- package/app-node/plan_dispatcher.test.mjs +101 -0
- package/app-node/plan_mode.mjs +36 -7
- package/app-node/plan_mode.test.mjs +78 -0
- package/app-node/progress_handler.mjs +220 -0
- package/app-node/progress_handler.test.mjs +193 -0
- package/app-node/progress_speech.mjs +54 -32
- package/app-node/progress_speech.test.mjs +12 -3
- package/app-node/project_sessions.mjs +5 -2
- package/app-node/project_sessions.test.mjs +7 -0
- package/app-node/research_mode.mjs +282 -0
- package/app-node/research_mode.test.mjs +264 -0
- package/app-node/restart_notice.mjs +3 -0
- package/app-node/restart_notice.test.mjs +11 -0
- package/app-node/session_ontology.mjs +271 -0
- package/app-node/session_ontology.test.mjs +130 -0
- package/app-node/smart_progress.mjs +1 -1
- package/app-node/stream_sentencer.mjs +32 -2
- package/app-node/stream_sentencer.test.mjs +65 -0
- package/app-node/streaming_tts_queue.mjs +5 -1
- package/app-node/streaming_tts_queue.test.mjs +7 -1
- package/app-node/stt_whisper.mjs +24 -0
- package/app-node/stt_whisper.test.mjs +32 -0
- package/app-node/text_routing.mjs +4 -2
- package/app-node/tts_backends.mjs +537 -3
- package/app-node/tts_backends.test.mjs +454 -0
- package/app-node/tts_player.mjs +164 -0
- package/app-node/tts_player.test.mjs +202 -0
- package/app-node/tts_runtime.mjs +134 -0
- package/app-node/tts_runtime.test.mjs +89 -0
- package/app-node/tts_settings.mjs +150 -3
- package/app-node/tts_settings.test.mjs +204 -0
- package/app-node/tts_voice_config.mjs +136 -2
- package/app-node/tts_voice_config.test.mjs +94 -0
- package/app-node/utterance_router.mjs +216 -0
- package/app-node/utterance_router.test.mjs +236 -0
- package/app-node/voice_autojoin.mjs +37 -0
- package/app-node/voice_autojoin.test.mjs +59 -0
- package/app-node/voice_io.mjs +272 -0
- package/app-node/voice_io.test.mjs +102 -0
- package/app-node/voice_turn_runner.mjs +449 -0
- package/app-node/voice_turn_runner.test.mjs +289 -0
- package/docs/CONFIGURATION.md +12 -2
- package/docs/HARNESSES.md +58 -0
- package/docs/HARNESS_AIDER.md +50 -0
- package/docs/HARNESS_CLAUDE.md +56 -0
- package/docs/HARNESS_CODEX.md +56 -0
- package/docs/HARNESS_CURSOR.md +45 -0
- package/docs/HARNESS_GEMINI.md +45 -0
- package/docs/HARNESS_HERMES.md +57 -0
- package/docs/HARNESS_OPENCLAW.md +44 -0
- package/docs/HARNESS_OPENCODE.md +44 -0
- package/docs/README.md +1 -0
- package/docs/ROADMAP.md +20 -5
- package/docs/TTS_BACKENDS.md +227 -0
- package/docs/USAGE.md +22 -0
- package/docs/i18n/AGENTS.es.md +34 -0
- package/docs/i18n/AGENTS.fr.md +34 -0
- package/docs/i18n/AGENTS.ja.md +34 -0
- package/docs/i18n/AGENTS.ko.md +34 -0
- package/docs/i18n/AGENTS.ru.md +34 -0
- package/docs/i18n/AGENTS.zh.md +34 -0
- package/docs/i18n/HARNESSES.es.md +58 -0
- package/docs/i18n/HARNESSES.fr.md +58 -0
- package/docs/i18n/HARNESSES.ja.md +58 -0
- package/docs/i18n/HARNESSES.ko.md +58 -0
- package/docs/i18n/HARNESSES.ru.md +58 -0
- package/docs/i18n/HARNESSES.zh.md +58 -0
- package/docs/i18n/HARNESS_AIDER.es.md +48 -0
- package/docs/i18n/HARNESS_AIDER.fr.md +48 -0
- package/docs/i18n/HARNESS_AIDER.ja.md +50 -0
- package/docs/i18n/HARNESS_AIDER.ko.md +50 -0
- package/docs/i18n/HARNESS_AIDER.ru.md +48 -0
- package/docs/i18n/HARNESS_AIDER.zh.md +48 -0
- package/docs/i18n/HARNESS_CLAUDE.es.md +55 -0
- package/docs/i18n/HARNESS_CLAUDE.fr.md +55 -0
- package/docs/i18n/HARNESS_CLAUDE.ja.md +56 -0
- package/docs/i18n/HARNESS_CLAUDE.ko.md +56 -0
- package/docs/i18n/HARNESS_CLAUDE.ru.md +55 -0
- package/docs/i18n/HARNESS_CLAUDE.zh.md +56 -0
- package/docs/i18n/HARNESS_CODEX.es.md +55 -0
- package/docs/i18n/HARNESS_CODEX.fr.md +55 -0
- package/docs/i18n/HARNESS_CODEX.ja.md +56 -0
- package/docs/i18n/HARNESS_CODEX.ko.md +56 -0
- package/docs/i18n/HARNESS_CODEX.ru.md +55 -0
- package/docs/i18n/HARNESS_CODEX.zh.md +56 -0
- package/docs/i18n/HARNESS_CURSOR.es.md +42 -0
- package/docs/i18n/HARNESS_CURSOR.fr.md +42 -0
- package/docs/i18n/HARNESS_CURSOR.ja.md +45 -0
- package/docs/i18n/HARNESS_CURSOR.ko.md +45 -0
- package/docs/i18n/HARNESS_CURSOR.ru.md +42 -0
- package/docs/i18n/HARNESS_CURSOR.zh.md +42 -0
- package/docs/i18n/HARNESS_GEMINI.es.md +44 -0
- package/docs/i18n/HARNESS_GEMINI.fr.md +44 -0
- package/docs/i18n/HARNESS_GEMINI.ja.md +45 -0
- package/docs/i18n/HARNESS_GEMINI.ko.md +45 -0
- package/docs/i18n/HARNESS_GEMINI.ru.md +44 -0
- package/docs/i18n/HARNESS_GEMINI.zh.md +45 -0
- package/docs/i18n/HARNESS_HERMES.es.md +54 -0
- package/docs/i18n/HARNESS_HERMES.fr.md +54 -0
- package/docs/i18n/HARNESS_HERMES.ja.md +57 -0
- package/docs/i18n/HARNESS_HERMES.ko.md +57 -0
- package/docs/i18n/HARNESS_HERMES.ru.md +54 -0
- package/docs/i18n/HARNESS_HERMES.zh.md +57 -0
- package/docs/i18n/HARNESS_OPENCLAW.es.md +41 -0
- package/docs/i18n/HARNESS_OPENCLAW.fr.md +41 -0
- package/docs/i18n/HARNESS_OPENCLAW.ja.md +44 -0
- package/docs/i18n/HARNESS_OPENCLAW.ko.md +44 -0
- package/docs/i18n/HARNESS_OPENCLAW.ru.md +41 -0
- package/docs/i18n/HARNESS_OPENCLAW.zh.md +42 -0
- package/docs/i18n/HARNESS_OPENCODE.es.md +41 -0
- package/docs/i18n/HARNESS_OPENCODE.fr.md +41 -0
- package/docs/i18n/HARNESS_OPENCODE.ja.md +44 -0
- package/docs/i18n/HARNESS_OPENCODE.ko.md +44 -0
- package/docs/i18n/HARNESS_OPENCODE.ru.md +41 -0
- package/docs/i18n/HARNESS_OPENCODE.zh.md +44 -0
- package/docs/superpowers/plans/2026-05-14-cross-agent-voice-transfer.md +625 -0
- package/docs/superpowers/plans/2026-05-21-audio-overview-narrated-diffs.md +95 -0
- package/docs/superpowers/plans/2026-05-21-autoresearch-ontology.md +83 -0
- package/docs/superpowers/plans/2026-05-21-phase11-push-to-talk-wakeword-v2.md +77 -0
- package/docs/superpowers/plans/2026-05-21-phase12-multi-user-voice.md +147 -0
- package/docs/superpowers/plans/2026-05-21-phase14-verbalbench.md +136 -0
- package/docs/superpowers/plans/2026-05-21-phase15-phone-companion.md +72 -0
- package/integrations/fireredtts2/mlx_llm.py +183 -0
- package/integrations/fireredtts2/synth.py +156 -0
- package/integrations/fireredtts2/synth_mlx.py +196 -0
- package/integrations/mlxaudio/synth.py +74 -0
- package/integrations/neuttsair/synth.py +104 -0
- package/integrations/omnivoice/synth.py +110 -0
- package/package.json +6 -1
- package/scripts/cli.mjs +84 -0
- package/scripts/doctor.mjs +104 -4
- package/scripts/install.mjs +5 -1
- package/scripts/install_fireredtts2.sh +109 -0
- package/scripts/install_mlxaudio.sh +34 -0
- package/scripts/install_mossttsnano.sh +46 -0
- package/scripts/postinstall.mjs +34 -0
@@ -0,0 +1,289 @@
+import test from 'node:test';
+import assert from 'node:assert/strict';
+import { createVoiceTurnRunner } from './voice_turn_runner.mjs';
+import { createUtteranceRouter } from './utterance_router.mjs';
+import { createPlanDispatcher } from './plan_dispatcher.mjs';
+import { createBridge } from './bridge_context.mjs';
+import { createAgentTurnLifecycle } from './agent_turn.mjs';
+
+function noop() {}
+async function noopAsync() {}
+
+// Build a complete dep set for voiceTurnRunner by first constructing a
+// real utterance_router with stubbed pure-function deps, then threading
+// its outputs into the runner alongside the rest. This mirrors main.mjs's
+// real construction order so the tests catch any inter-module wiring drift.
+function makeDeps(overrides = {}) {
+  const bridge = createBridge();
+  bridge.bridgeState = {
+    deferredSize: () => 0,
+    currentEpoch: () => 1,
+    discardQueues: () => 0,
+  };
+  const agentTurnLifecycle = createAgentTurnLifecycle({ bridge, warn: noop });
+
+  const agentAdapter = {
+    label: 'default-agent', backend: 'hermes',
+    readSessionId: () => null,
+    ask: async () => 'mock agent answer',
+  };
+
+  // Construct the router (post-Phase 7b: dispatch + adapter selection only).
+  const router = createUtteranceRouter({
+    bridge,
+    log: noop, warn: noop, path: { join: (...a) => a.join('/') },
+    ROOT: '/tmp/vc', TTS_VOICE_CONFIG_PATH: '/tmp/voices.json',
+    agentAdapter,
+    settings: { voiceLanguage: 'ko', transcriptChannelId: 'tx-ch', agent: { backend: 'hermes', label: 'hermes' }, tts: {} },
+    projectSessionContextText: () => '',
+    createBridgeAgentAdapter: s => ({ label: s?.label || 'fake', backend: s?.backend || 'hermes', ask: async () => '' }),
+    buildAgentSettings: () => ({ backend: 'hermes', label: 'hermes' }),
+    commandIsInstalled: async () => true,
+    shellSplit: s => String(s).split(' '),
+    sendText: noopAsync, speakText: noopAsync,
+    ensureTtsVoiceConfig: () => ({ backends: {} }),
+    updateTtsVoiceConfig: c => c,
+    writeTtsVoiceConfig: noop,
+    applyVoiceConfigToProcessEnv: () => ({ selection: { backend: 'edge', voiceType: 'female', voice: { language: 'ko', voice: 'x' } } }),
+    ensureSelectedTtsBackendInstalled: noopAsync,
+    rebuildTtsRuntimeSettings: noop,
+    voiceCommandFromTranscript: () => null,
+    voiceChangedText: () => '',
+    voiceLanguageCommandFromTranscript: () => null,
+    voiceCloneCommandFromText: () => null,
+    voiceCloneCapture: { arm: () => ({ targetPath: '' }), cancel: () => false, current: () => null },
+    notifyVoiceCloneSampleGapIfNeeded: noopAsync,
+    languageChangedText: () => '',
+    applyRuntimeLanguage: noop,
+    persistEnvValues: noop,
+    discardVoiceInputQueues: () => 0,
+  });
+
+  // Construct the plan dispatcher (Phase 7b) consuming router outputs.
+  const planDispatcher = createPlanDispatcher({
+    bridge,
+    settings: { voiceLanguage: 'ko', transcriptChannelId: 'tx-ch', agent: { backend: 'hermes', label: 'hermes' } },
+    sendText: noopAsync,
+    speakText: noopAsync,
+    routingStateFor: router.routingStateFor,
+    adapterForBackend: router.adapterForBackend,
+    adapterForProjectSession: router.adapterForProjectSession,
+    resolveProjectSessionForChannel: () => null,
+    isAgentRoutingDecision: () => false,
+    parseDecisionAnswer: () => ({ type: 'unknown' }),
+    parsePlanVoiceCommand: () => ({ type: 'unknown' }),
+    applyPlanCommand: s => s,
+    parsePlanOutput: () => ({ steps: [], decisions: [] }),
+    renderDecisionPrompt: d => d?.text || '',
+    renderResolvedDecisions: () => '',
+    renderFinalPlan: () => '',
+    planModePreamble: () => '',
+    planExecutionPreamble: () => '',
+    isPlanEntryUtterance: () => false,
+  });
+
+  const settings = { voiceLanguage: 'ko', transcriptChannelId: 'tx-ch', agent: { backend: 'hermes', label: 'hermes' }, tts: {} };
+
+  return {
+    bridge,
+    agentTurnLifecycle,
+    settings,
+    client: { channels: { cache: new Map() } },
+    log: noop, warn: noop, fs: { rm: (_p, _o, cb) => cb && cb() },
+    // voice_io
+    transcribe: async () => 'hey hermes do a thing',
+    // tts_player
+    beginStreamingTurn: () => false,
+    endStreamingTurn: noopAsync,
+    speakText: noopAsync,
+    // progress_handler
+    queueProgressSpeechText: noop,
+    stopProgressSpeech: noop,
+    speakImmediateNotice: noopAsync,
+    // notification_handler
+    maybeNotifyTaskComplete: noopAsync,
+    // utterance_router outputs (real router instance built above)
+    handleLanguageCommand: router.handleLanguageCommand,
+    handleTtsVoiceCommand: router.handleTtsVoiceCommand,
+    handleVoiceCloneCommand: router.handleVoiceCloneCommand,
+    dispatchPlanModeUtterance: planDispatcher.dispatchPlanModeUtterance,
+    adapterForBackend: router.adapterForBackend,
+    adapterForProjectSession: router.adapterForProjectSession,
+    planChannelKey: planDispatcher.planChannelKey,
+    routingStateFor: router.routingStateFor,
+    recordUtterance: router.recordUtterance,
+    clearTransientRouting: router.clearTransientRouting,
+    // pure helpers
+    isAllowed: () => true,
+    isAbortError: e => e?.name === 'AbortError',
+    sleep: async () => {},
+    sendText: noopAsync,
+    sendEmbed: async () => true,
+    reloadRuntimeLanguageFromEnv: () => ({ changed: false, voiceLanguage: 'ko', whisperLanguage: 'ko' }),
+    drainDeferredProcessingUtterances: noopAsync,
+    resolveProjectSessionForChannel: () => null,
+    projectSessionContextText: () => '',
+    ontologyStateFor: () => ({ nodeCount: 0, serializeForHandoff: () => '' }),
+    captureOntologyFromTurn: noop,
+    formatRecentDiscordContext: () => '',
+    formatSttResultMessage: (_lang, _u, t) => `you said: ${t}`,
+    formatSttStartMessage: () => '🎧',
+    formatVoiceErrorMessage: (_lang, m) => m,
+    formatWakeRejectedMessage: () => 'no wake word',
+    agentAnswerHeader: () => 'agent says:',
+    emptyAgentAnswer: () => '(empty)',
+    spokenResultOnly: (_p, a) => a,
+    stripWake: t => t,
+    acceptsWake: () => true,
+    sensitivityChangedSpeech: () => '',
+    sensitivityModeFromTranscript: () => null,
+    sensitivityStatusText: () => '',
+    setSensitivityMode: () => ({ mode: 'normal' }),
+    isSensitivityOnlyRequest: () => false,
+    verboseChangedSpeech: () => '',
+    verboseModeFromTranscript: () => null,
+    verboseStatusText: () => '',
+    setVerboseProgress: noop,
+    isVerboseOnlyRequest: () => false,
+    isRoutingOnlyUtterance: () => false,
+    parseAgentRoutingCommand: () => ({ type: 'none' }),
+    renderAgentPrefix: () => '',
+    buildCrossAgentPrompt: ({ prompt }) => prompt,
+    buildFallbackDecision: () => ({ slot: 'fallback' }),
+    parseDecisionAnswer: () => ({ type: 'unknown' }),
+    parseResearchCommand: () => ({ type: 'none' }),
+    runResearchTurn: async () => ({ status: 'no_backend' }),
+    PROGRESS_IDLE_CHECK_MS: 5000,
+    PROGRESS_IDLE_NOTICE_INITIAL_MS: 10000,
+    PROGRESS_IDLE_NOTICE_LIMIT: 20,
+    PROGRESS_IDLE_NOTICE_MAX_MS: 30000,
+    PROGRESS_IDLE_NOTICE_MULTIPLIER: 1.8,
+    STT_START_VOICE_NOTICE: false,
+    ...overrides,
+  };
+}
+
+test('createVoiceTurnRunner exposes handleRecording', () => {
+  const runner = createVoiceTurnRunner(makeDeps());
+  assert.equal(typeof runner.handleRecording, 'function');
+});
+
+test('handleRecording happy path: transcribe -> agent -> send + speak + notify, cleanup green', async () => {
+  const calls = { transcribe: 0, askPrompt: '', sendText: [], speakText: [], notify: [] };
+  // Capture the exact prompt the runner sends to the agent and the exact
+  // answer that flows back out.
+  const fakeAdapter = {
+    label: 'hermes', backend: 'hermes', readSessionId: () => null,
+    ask: async (prompt, _signal, plan) => {
+      calls.askPrompt = prompt;
+      assert.equal(plan.label, 'hermes', 'plan label = agent label');
+      assert.equal(plan.task, true, 'plan.task=true for voice turn');
+      return 'twelve apples';
+    },
+  };
+  const deps = makeDeps({
+    transcribe: async wav => { calls.transcribe++; assert.equal(wav, '/tmp/u.wav'); return 'hermes do the thing'; },
+    sendText: async t => { calls.sendText.push(t); return true; },
+    speakText: async t => { calls.speakText.push(t); },
+    maybeNotifyTaskComplete: async ({ answer, label }) => { calls.notify.push({ answer, label }); },
+    // Force the runner's adapter lookup to return our test adapter rather
+    // than the router's auto-created one (the router builds a fresh adapter
+    // per backend via createBridgeAgentAdapter).
+    adapterForBackend: () => fakeAdapter,
+    adapterForProjectSession: () => fakeAdapter,
+  });
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  assert.equal(deps.bridge.processing, false);
+  await handleRecording('user-1', '/tmp/u.wav', 8192, 1, null);
+  assert.equal(calls.transcribe, 1, 'transcribe called once');
+  assert.equal(calls.askPrompt, 'hermes do the thing', 'agent receives the post-wake prompt');
+  assert.deepEqual(calls.speakText.at(-1), 'twelve apples', 'agent answer is spoken');
+  assert.ok(calls.sendText.some(s => /you said: hermes do the thing/.test(s)), 'STT echoed');
+  assert.ok(calls.sendText.some(s => /twelve apples/.test(s)), 'agent answer surfaced as text');
+  assert.equal(calls.notify.length, 1, 'maybeNotifyTaskComplete fired once');
+  assert.equal(calls.notify[0].label, 'hermes', 'notify carries the agent label');
+  assert.equal(deps.bridge.processing, false, 'processing flag cleared in finally');
+  assert.equal(deps.bridge.activeTurnId, 0, 'activeTurnId cleared');
+  assert.equal(deps.bridge.activeProgressAbortController, null, 'progress controller cleared');
+});
+
+test('handleRecording cleans up progress controller even when agent throws', async () => {
+  const fakeAdapter = {
+    label: 'hermes', backend: 'hermes', readSessionId: () => null,
+    ask: async () => { throw new Error('agent boom'); },
+  };
+  const deps = makeDeps({
+    adapterForBackend: () => fakeAdapter,
+    adapterForProjectSession: () => fakeAdapter,
+  });
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  let finishStatus = null;
+  let finishError = null;
+  const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; finishError = r.error; } };
+  await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+  assert.equal(finishStatus, 'error');
+  assert.match(finishError || '', /agent boom/);
+  // Cleanup invariants — the bug Codex flagged on the original voice-path
+  // finally is now guarded by agentTurnLifecycle.finish, so we double-check.
+  assert.equal(deps.bridge.processing, false);
+  assert.equal(deps.bridge.activeProgressAbortController, null);
+  assert.equal(deps.bridge.currentAbortController, null);
+});
+
+test('handleRecording drops when bridge.processing is already true', async () => {
+  const deps = makeDeps();
+  deps.bridge.processing = true;
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  let finishStatus = null;
+  const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+  await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+  assert.equal(finishStatus, 'drop_processing');
+  assert.equal(deps.bridge.processing, true, 'processing flag left intact (other turn owns it)');
+});
+
+test('handleRecording rejects unauthorized users', async () => {
+  const deps = makeDeps({ isAllowed: () => false });
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  let finishStatus = null;
+  const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+  await handleRecording('intruder', '/tmp/u.wav', 8192, 1, metricsTurn);
+  assert.equal(finishStatus, 'unauthorized');
+});
+
+test('handleRecording short-circuits on empty transcript', async () => {
+  const deps = makeDeps({ transcribe: async () => '' });
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  let finishStatus = null;
+  const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+  await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+  assert.equal(finishStatus, 'empty_transcript');
+  assert.equal(deps.bridge.processing, false, 'processing flag still cleaned up');
+});
+
+test('handleRecording short-circuits when wake word missing', async () => {
+  const sent = [];
+  const deps = makeDeps({
+    acceptsWake: () => false,
+    sendText: async t => { sent.push(t); return true; },
+  });
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  let finishStatus = null;
+  const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+  await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+  assert.equal(finishStatus, 'wake_rejected');
+  assert.ok(sent.some(t => /no wake word/.test(t)), 'wake-rejected message sent');
+});
+
+test('handleRecording with stale language reload aborts before transcribe', async () => {
+  let transcribed = false;
+  const deps = makeDeps({
+    reloadRuntimeLanguageFromEnv: () => ({ changed: true, voiceLanguage: 'en', whisperLanguage: 'en' }),
+    transcribe: async () => { transcribed = true; return 'hi'; },
+  });
+  const { handleRecording } = createVoiceTurnRunner(deps);
+  let finishStatus = null;
+  const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+  await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+  assert.equal(transcribed, false, 'transcribe not called when language changed');
+  assert.equal(finishStatus, 'drop_stale_language_change');
+});
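Review note: the failure-path and drop-path tests in this file pin down one invariant. Whichever way a turn ends, `bridge.processing` and the abort controller are reset, and a recording that arrives while another turn is running is dropped without touching the flag. The sketch below shows that pattern in isolation, assuming the bridge is a plain object. `createTurnLifecycle`, `runTurn`, and the `'ok'` status are hypothetical names chosen for illustration; they are not taken from `agent_turn.mjs` or `voice_turn_runner.mjs`.

```javascript
// Hypothetical sketch of a begin/finish turn lifecycle guarded by try/finally.
function createTurnLifecycle(bridge) {
  return {
    begin() {
      if (bridge.processing) return false; // another turn owns the bridge
      bridge.processing = true;
      bridge.currentAbortController = new AbortController();
      return true;
    },
    finish() {
      bridge.processing = false;
      bridge.currentAbortController = null;
    },
  };
}

async function runTurn(bridge, ask, onFinish) {
  const lifecycle = createTurnLifecycle(bridge);
  if (!lifecycle.begin()) {
    onFinish({ status: 'drop_processing' }); // leave the other turn's state alone
    return;
  }
  try {
    const answer = await ask(bridge.currentAbortController.signal);
    onFinish({ status: 'ok', answer }); // 'ok' is an assumed status name
  } catch (err) {
    onFinish({ status: 'error', error: String(err?.message || err) });
  } finally {
    lifecycle.finish(); // runs on success and on throw
  }
}
```

Keeping the reset in a single `finish()` called from `finally` means a new early-return or error branch cannot forget to clear the flag, which appears to be the kind of regression these tests guard against.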
package/docs/CONFIGURATION.md
CHANGED
@@ -192,6 +192,8 @@ Remove `ports:` from that Compose service. On Docker Desktop for macOS/Windows,
 
 ## Optional TTS Backends
 
+For the full backend matrix, latency notes, aliases, and Mac mini caveats, see [TTS Backends](TTS_BACKENDS.md).
+
 Edge TTS remains the default and fallback. Optional local backends are configured with their own env vars:
 
 | Backend | Settings | Voice choices |
@@ -200,8 +202,16 @@ Edge TTS remains the default and fallback. Optional local backends are configure
 | Supertonic | `SUPERTONIC_VOICE`, `SUPERTONIC_LANGUAGE` | `M1`–`M5`, `F1`–`F5`; language `ko`, `en`, `es`, `pt`, `fr` |
 | OpenVoice | `OPENVOICE_REF_AUDIO`, `OPENVOICE_STYLE`, `OPENVOICE_LANGUAGE` | User-provided permitted reference WAV; style defaults to `default` |
 | SpeechSwift / CosyVoice | `SPEECHSWIFT_REF_AUDIO`, `SPEECHSWIFT_ENGINE`, `SPEECHSWIFT_SPEAKER`, `SPEECHSWIFT_MODEL_ID` | Reference-sample voices for CosyVoice, or backend-supported speaker/model IDs |
-
-
+| OmniVoice | `OMNIVOICE_PYTHON`, `OMNIVOICE_MODEL`, `OMNIVOICE_REF_AUDIO`, `OMNIVOICE_REF_TEXT`, `OMNIVOICE_LANGUAGE`, `OMNIVOICE_SPEAKER` | k2-fsa/OmniVoice reference-sample cloning or optional voice-design attributes |
+| Qwen3 TTS | `QWEN3TTS_COMMAND`, `QWEN3TTS_MODE`, `QWEN3TTS_MODEL`, `QWEN3TTS_SPEAKER` | Preset speaker such as `sohee`, reference mode, or designed speaker text |
+| MLX Audio | `MLXAUDIO_PYTHON`, `MLXAUDIO_MODEL`, `MLXAUDIO_VOICE`, `MLXAUDIO_LANG_CODE` | MLX Qwen3 voice/speaker IDs such as `Chelsie` |
+| NeuTTS Air | `NEUTTSAIR_PYTHON`, `NEUTTSAIR_BACKBONE_REPO`, `NEUTTSAIR_CODEC_REPO`, `NEUTTSAIR_REF_AUDIO`, `NEUTTSAIR_REF_TEXT` | English NeuTTS Air reference-sample cloning; use Q4 GGUF for lower latency |
+| FireRedTTS-2 | `FIREREDTTS2_COMMAND`, `FIREREDTTS2_PRETRAINED_DIR`, `FIREREDTTS2_PROMPT_AUDIO`, `FIREREDTTS2_PROMPT_TEXT` | Prompt-reference voice or random speaker |
+| FireRedTTS-2 MLX helper | `integrations/fireredtts2/synth_mlx.py` | Experimental Apple Silicon LLM-port helper; not a canonical `TTS_BACKEND` yet |
+| MOSS-TTS-Nano | `MOSSTTSNANO_COMMAND`, `MOSSTTSNANO_SCRIPT`, `MOSSTTSNANO_CHECKPOINT`, `MOSSTTSNANO_PROMPT_AUDIO` | OpenMOSS prompt reference or continuation mode |
+| MOSS-TTS-Nano MLX | `MOSSTTSNANO_MLX_PYTHON`, `MOSSTTSNANO_MLX_SCRIPT`, `MOSSTTSNANO_MLX_WORKER`, `MOSSTTSNANO_PROMPT_AUDIO` | Experimental MLX hybrid prompt reference or continuation mode |
+
+Only clone voices you own or have permission to use. For OmniVoice, install it in a separate Python environment such as `.venv-omnivoice` (`pip install torch torchaudio soundfile omnivoice`) and set `TTS_BACKEND=omnivoice`. For NeuTTS Air, install the local `neutts` package in `.venv-neuttsair`, set `TTS_BACKEND=neuttsair`, and keep progress prompts on Edge unless explicitly testing local progress TTS. If a local backend fails or times out, VerbalCoding falls back to Edge TTS.
 
 ## Operational Notes
 
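Review note: the paragraph added in this hunk says that a local backend which fails or times out falls back to Edge TTS. One way to express that behaviour is a race between the primary synthesizer and a timer, sketched below. `synthesizeWithFallback` and its parameters are hypothetical and are not meant to describe the code in `tts_backends.mjs`; `primary` and `fallback` stand for any async `(text) => audio` functions.

```javascript
// Hypothetical sketch: try the primary backend, fall back on failure or timeout.
async function synthesizeWithFallback(text, primary, fallback, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('tts timeout')), timeoutMs);
  });
  try {
    // Whichever settles first wins: the primary result or the timeout rejection.
    return await Promise.race([primary(text), timeout]);
  } catch {
    // Any primary error, including the timeout, routes to the fallback backend.
    return await fallback(text);
  } finally {
    clearTimeout(timer); // avoid a stray timer once the turn is done
  }
}
```

This sketch does not cancel the slow primary call after a timeout; a real implementation that spawns a subprocess would probably also want to kill it, for example through an `AbortSignal`.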
@@ -0,0 +1,58 @@
|
|
|
1
|
+
# Coding Agent Harnesses
|
|
2
|
+
|
|
3
|
+
<p align="center">
|
|
4
|
+
<a href="../README.md">README</a> ·
|
|
5
|
+
<a href="README.md">Docs hub</a> ·
|
|
6
|
+
<a href="USAGE.md">Usage</a> ·
|
|
7
|
+
<a href="CONFIGURATION.md">Configuration</a> ·
|
|
8
|
+
<a href="TROUBLESHOOTING.md">Troubleshooting</a>
|
|
9
|
+
</p>
|
|
10
|
+
|
|
11
|
+
VerbalCoding is agent-agnostic. It drives whichever CLI coding agent you have installed by spawning it once per voice turn, feeding the transcript as a prompt, and speaking the response back. Pick **one** as your default; the cross-agent voice routing lets you reach the others mid-session.
|
|
12
|
+
|
|
13
|
+
| Harness | Default command | Session resume | Per-harness doc |
|
|
14
|
+
|---|---|---|---|
|
|
15
|
+
| Hermes Agent | `hermes chat -Q -q` | ✅ (`--resume <id>`) | [HERMES_VOICE.md](./HERMES_VOICE.md) (positioning) + [HARNESS_HERMES.md](./HARNESS_HERMES.md) |
|
|
16
|
+
| Claude Code | `claude -p` | ❌ | [HARNESS_CLAUDE.md](./HARNESS_CLAUDE.md) |
|
|
17
|
+
| Codex | `codex exec` | ❌ (output-last-message capture) | [HARNESS_CODEX.md](./HARNESS_CODEX.md) |
|
|
18
|
+
| Gemini CLI | `gemini -p` | ❌ | [HARNESS_GEMINI.md](./HARNESS_GEMINI.md) |
|
|
19
|
+
| OpenCode | `opencode run` | ❌ | [HARNESS_OPENCODE.md](./HARNESS_OPENCODE.md) |
|
|
20
|
+
| OpenClaw | `openclaw run` | ❌ | [HARNESS_OPENCLAW.md](./HARNESS_OPENCLAW.md) |
|
|
21
|
+
| Aider | `aider --no-pretty --yes-always --message` | ❌ | [HARNESS_AIDER.md](./HARNESS_AIDER.md) |
|
|
22
|
+
| Cursor CLI | `cursor-agent --print --prompt` | ❌ | [HARNESS_CURSOR.md](./HARNESS_CURSOR.md) |
|
|
23
|
+
|
|
24
|
+
## Pick your default
|
|
25
|
+
|
|
26
|
+
`vc setup` auto-detects installed binaries and lets you pick. Non-interactive override:
|
|
27
|
+
|
|
28
|
+
```bash
|
|
29
|
+
# .env or instance .env
|
|
30
|
+
AGENT_BACKEND=claude # hermes | claude | codex | gemini | opencode | openclaw | aider | cursor | custom
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
Each harness picks up its own command from a matching env var (`HERMES_COMMAND`, `CLAUDE_COMMAND`, etc.). The shared envs `AGENT_LABEL`, `AGENT_COMMAND`, `AGENT_SESSION_FILE`, `AGENT_WORKDIR`, `AGENT_PROJECT_CONTEXT`, `AGENT_TASK_TIMEOUT_MS`, `AGENT_CHAT_TIMEOUT_MS`, `AGENT_VERBOSE_PROGRESS` override per-harness defaults when set.
|
|
34
|
+
|
|
35
|
+
## Routing between harnesses by voice
|
|
36
|
+
|
|
37
|
+
Once configured, you can reach any **installed** harness from a voice channel without restarting:
|
|
38
|
+
|
|
39
|
+
- `"ask Codex what it thinks"` — single-turn route, next utterance returns to the default.
|
|
40
|
+
- `"switch to Aider"` — sticky route until you say `"back to default"`.
|
|
41
|
+
- Plan-mode `which_agent` slot — the agent itself proposes which backend runs the next plan.
|
|
42
|
+
|
|
43
|
+
The routing layer detects whether the binary is on `PATH` (resolving relative commands against the active project session's workdir). If not installed, the bridge asks `"Want me to use the default agent instead?"` — answer `"yes"` to fall back or `"no"` to cancel.
|
|
44
|
+
|
|
45
|
+
Aliases recognized by the parser: `claude` / `claude code`, `codex` / `코덱스`, `gemini` / `gemini cli` / `제미나이`, `opencode`, `openclaw`, `aider` / `에이더`, `cursor` / `cursor cli`, `hermes` / `헤르메스`.
|
|
46
|
+
|
|
47
|
+
## Shared semantics
|
|
48
|
+
|
|
49
|
+
Things every harness adapter respects:
|
|
50
|
+
|
|
51
|
+
- **Voice plan mode** — `"plan it first"` → narrate a plan; edit by voice; `"approve"` to execute against the chosen harness.
|
|
52
|
+
- **Barge-in** — interrupting cuts the current TTS and aborts the agent task. Sticky routing survives interrupts; only single-turn routes are cleared.
|
|
53
|
+
- **Verbose progress** — `AGENT_VERBOSE_PROGRESS=1` (or `"상세 진행 켜"`) prints structured progress events the harness emits (file reads, web search, tool use). Smart-progress, if `SMART_PROGRESS_API_KEY` is set, summarizes these into one sentence per batch.
|
|
54
|
+
- **Push handoff** — `NOTIFY_PROVIDER=ntfy|pushover` plus `NOTIFY_MIN_TASK_MS` fires a push notification when a long task completes and the voice channel is empty. Debounced by body + `NOTIFY_DEBOUNCE_MS`.
|
|
55
|
+
- **Per-channel state** — each Discord voice channel keeps its own routing, plan-mode, and recent-utterance ring buffer.
|
|
56
|
+
- **Project sessions** — `!session new <name> <workdir>` binds a Discord channel to a project; per-(harness, session) adapters are cached and invalidated on rebind.
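The routing, barge-in, and per-channel rules above can be pictured as a small state machine. The sketch below is illustrative only: every name in it is hypothetical, and it assumes that a single-turn route spoken while a sticky route is active returns to the sticky route afterwards, which the text does not specify.

```javascript
// Hypothetical per-channel routing state; names are illustrative, not VerbalCoding's API.
const channels = new Map();

function stateFor(channelId) {
  if (!channels.has(channelId)) {
    channels.set(channelId, { sticky: null, oneShot: null });
  }
  return channels.get(channelId);
}

const ask = (id, backend) => { stateFor(id).oneShot = backend; };     // "ask Codex ..."
const switchTo = (id, backend) => { stateFor(id).sticky = backend; }; // "switch to Aider"
const backToDefault = (id) => { stateFor(id).sticky = null; };        // "back to default"
const bargeIn = (id) => { stateFor(id).oneShot = null; };             // interrupt keeps sticky

// Pick the backend for the next turn and consume any single-turn route.
function nextBackend(id, defaultBackend) {
  const state = stateFor(id);
  const chosen = state.oneShot ?? state.sticky ?? defaultBackend;
  state.oneShot = null;
  return chosen;
}

switchTo("room-a", "aider");
ask("room-a", "codex");
bargeIn("room-a"); // clears only the single-turn route
console.log(nextBackend("room-a", "claude")); // "aider"
console.log(nextBackend("room-b", "claude")); // "claude" (other channels are unaffected)
```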
|
|
57
|
+
|
|
58
|
+
See the per-harness docs for install paths, auth, and gotchas. `docs/CONFIGURATION.md` is the canonical env-var reference.
|
|
@@ -0,0 +1,50 @@
|
|
|
1
|
+
# Aider — Harness Notes
|
|
2
|
+
|
|
3
|
+
<p align="center">
|
|
4
|
+
<a href="../README.md">README</a> ·
|
|
5
|
+
<a href="HARNESSES.md">Harnesses</a> ·
|
|
6
|
+
<a href="USAGE.md">Usage</a> ·
|
|
7
|
+
<a href="CONFIGURATION.md">Configuration</a>
|
|
8
|
+
</p>
|
|
9
|
+
|
|
10
|
+
Aider is a pair-programming AI CLI focused on direct edits. VerbalCoding drives it through `aider --no-pretty --yes-always --message` — the prompt is passed as the `--message` value so each voice turn becomes one non-interactive Aider run that may modify files in `AGENT_WORKDIR`.
|
|
11
|
+
|
|
12
|
+
## Install
|
|
13
|
+
|
|
14
|
+
```bash
|
|
15
|
+
pip install aider-chat
|
|
16
|
+
aider --version
|
|
17
|
+
# Confirm a single-message run works:
|
|
18
|
+
aider --no-pretty --yes-always --message "list the top-level files"
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
Aider needs an API key for the model you point it at (OpenAI / Anthropic / a local server). See <https://aider.chat>.
|
|
22
|
+
|
|
23
|
+
## Configure VerbalCoding
|
|
24
|
+
|
|
25
|
+
```bash
|
|
26
|
+
# .env
|
|
27
|
+
AGENT_BACKEND=aider
|
|
28
|
+
# optional
|
|
29
|
+
AIDER_COMMAND="aider --no-pretty --yes-always --message" # default
|
|
30
|
+
AGENT_WORKDIR=/Users/you/code/your-project # where Aider should edit
|
|
31
|
+
AGENT_PROJECT_CONTEXT="..."
|
|
32
|
+
AGENT_CHAT_TIMEOUT_MS=120000 # Aider can take longer
|
|
33
|
+
AGENT_TASK_TIMEOUT_MS=0
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
`--no-pretty` strips Rich-formatting box characters so the stream sentencer doesn't choke on them. `--yes-always` keeps the run non-interactive (Aider won't pause for "apply this diff?" prompts).
|
|
37
|
+
|
|
38
|
+
## Voice phrases to switch TO Aider
|
|
39
|
+
|
|
40
|
+
- en: `"switch to Aider"`, `"ask Aider to ..."`
|
|
41
|
+
- ko: `"aider로 전환해줘"`, `"에이더로 전환"`
|
|
42
|
+
|
|
43
|
+
The matcher accepts `aider` and `에이더`.
|
|
44
|
+
|
|
45
|
+
## Gotchas
|
|
46
|
+
|
|
47
|
+
- **Aider edits files.** Unlike Claude and Gemini under `-p` or Codex under `exec`, Aider directly modifies the working tree as part of answering. Be deliberate about `AGENT_WORKDIR`, which is usually a project session's `workdir`.
|
|
48
|
+
- **Diffs in output.** Aider often emits diff-shaped text. If a turn is interrupted, the bridge speaks an "interrupted" notice and skips reading the diff aloud — check the text channel and `git status`.
|
|
49
|
+
- **Auth.** `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` need to be in Aider's environment; instance-isolated installs typically use `instances/<project>.env`.
|
|
50
|
+
- **Per-channel state.** Cross-agent routing is per Discord channel; switching to Aider in one project room does not affect another.
|
|
@@ -0,0 +1,56 @@
|
|
|
1
|
+
# Claude Code — Harness Notes
|
|
2
|
+
|
|
3
|
+
<p align="center">
|
|
4
|
+
<a href="../README.md">README</a> ·
|
|
5
|
+
<a href="HARNESSES.md">Harnesses</a> ·
|
|
6
|
+
<a href="USAGE.md">Usage</a> ·
|
|
7
|
+
<a href="CONFIGURATION.md">Configuration</a>
|
|
8
|
+
</p>
|
|
9
|
+
|
|
10
|
+
Claude Code is Anthropic's official terminal-resident coding agent. VerbalCoding drives it through `claude -p`, where each voice turn is one invocation. Claude Code does not expose a stable session-resume contract over `-p`, so each call is a fresh context — use `AGENT_PROJECT_CONTEXT` and the cross-agent handoff block to keep continuity.
|
|
11
|
+
|
|
12
|
+
## Install
|
|
13
|
+
|
|
14
|
+
```bash
|
|
15
|
+
npm install -g @anthropic-ai/claude-code
|
|
16
|
+
claude login
|
|
17
|
+
claude -p "hello" # confirm it answers
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
## Configure VerbalCoding
|
|
21
|
+
|
|
22
|
+
```bash
|
|
23
|
+
# .env
|
|
24
|
+
AGENT_BACKEND=claude # alias 'claude-code' also accepted
|
|
25
|
+
# optional
|
|
26
|
+
CLAUDE_COMMAND="claude -p" # default; override e.g. to add --model, --debug
|
|
27
|
+
AGENT_PROJECT_CONTEXT="Working on the auth module; previous decisions: oauth=github."
|
|
28
|
+
AGENT_WORKDIR=/Users/you/code/your-project
|
|
29
|
+
AGENT_CHAT_TIMEOUT_MS=45000
|
|
30
|
+
AGENT_TASK_TIMEOUT_MS=0
|
|
31
|
+
AGENT_VERBOSE_PROGRESS=0
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
`AGENT_SESSION_FILE` defaults to `<repo>/.agent-sessions/claude` but is **unused** by this harness — Claude Code's `-p` is stateless. Leave it set; it just becomes a no-op.
|
|
35
|
+
|
|
36
|
+
## What Claude sees per turn
|
|
37
|
+
|
|
38
|
+
On every turn the adapter prepends a Discord-aware preamble (English or Korean, depending on `VOICE_LANGUAGE`), the project context, and recent Discord text context, then appends the user's transcribed utterance. On a cross-agent handoff (e.g. you said `"ask Codex ..."` last turn and have just spoken again), the prepended block also includes a "Recent user voice" line with up to the last 4 utterances, plus the most recently resolved plan decisions, so Claude doesn't start cold.
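A rough sketch of that assembly order follows. The function name, field names, labels other than "Recent user voice", and the blank-line separator are all assumptions made for illustration; the real adapter may format the block differently.

```javascript
// Hypothetical prompt assembly; field names and separators are assumptions.
function buildPrompt({ preamble, projectContext, textContext, utterance, handoff }) {
  const parts = [preamble, projectContext, textContext];
  if (handoff) {
    // Only the last 4 utterances are carried over on a cross-agent handoff.
    parts.push(`Recent user voice: ${handoff.utterances.slice(-4).join(" / ")}`);
    if (handoff.planDecisions.length > 0) {
      parts.push(`Plan decisions: ${handoff.planDecisions.join("; ")}`);
    }
  }
  parts.push(utterance); // the transcribed utterance always comes last
  return parts.filter(Boolean).join("\n\n"); // skip empty sections
}

const prompt = buildPrompt({
  preamble: "You are answering in a Discord voice channel.",
  projectContext: "Working on the auth module.",
  textContext: "",
  utterance: "fix the failing test",
  handoff: { utterances: ["u1", "u2", "u3", "u4", "u5"], planDecisions: ["oauth=github"] },
});
console.log(prompt);
```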
|
|
39
|
+
|
|
40
|
+
## Verbose progress
|
|
41
|
+
|
|
42
|
+
Claude Code does not emit a standard progress stream over `-p`. `AGENT_VERBOSE_PROGRESS=1` still works — the adapter parses tool/file/web mentions out of stdout/stderr if Claude prints them — but expect coarser progress than Hermes.
|
|
43
|
+
|
|
44
|
+
## Voice phrases to switch TO Claude Code
|
|
45
|
+
|
|
46
|
+
- en: `"switch to Claude Code"`, `"ask Claude ..."`, `"let Claude finish this"`
|
|
47
|
+
- ko: `"클로드로 전환"`, `"claude한테 물어봐"`
|
|
48
|
+
|
|
49
|
+
The matcher accepts both `claude` and `claude code` as aliases; strict mode (used for routing-only utterances) requires an exact alias.
|
|
50
|
+
|
|
51
|
+
## Gotchas
|
|
52
|
+
|
|
53
|
+
- **No session resume.** A long-running pair-programming session needs the cross-agent handoff context block to carry decisions forward. The bridge does this automatically on backend changes; within the same backend, set `AGENT_PROJECT_CONTEXT` to a short summary.
|
|
54
|
+
- **Quoted command paths.** `CLAUDE_COMMAND` may contain a quoted absolute path (e.g. `"/Applications/Claude Code/claude" -p`); VerbalCoding's installation probe splits the command with `shellSplit`, which honors the quotes.
|
|
55
|
+
- **Auth refresh.** An expired `claude login` token surfaces as a non-zero exit; the bridge reports the failure and, if Claude was routed as a non-default backend, the fallback prompt offers to retry on the default agent.
|
|
56
|
+
- **Patch-like output.** If Claude returns a diff and the turn is interrupted, the bridge says `"the agent was interrupted; check the text channel for files and tests"` rather than reading the diff aloud.
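To make the quoted-path gotcha concrete, here is a minimal quote-aware splitter. It is not the project's `shellSplit`; `splitCommand` is a hypothetical stand-in that handles only single and double quotes (no backslash escapes or variable expansion).

```javascript
// Hypothetical stand-in for a quote-aware command splitter (not the real shellSplit).
function splitCommand(command) {
  const args = [];
  let current = "";
  let quote = null;    // the currently open quote character, if any
  let started = false; // true once the current token has begun (even if empty)
  for (const ch of command) {
    if (quote) {
      if (ch === quote) quote = null; // closing quote
      else current += ch;             // whitespace inside quotes is kept
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      started = true;
    } else if (/\s/.test(ch)) {
      if (started) {
        args.push(current);
        current = "";
        started = false;
      }
    } else {
      current += ch;
      started = true;
    }
  }
  if (quote) throw new Error("unterminated quote in command");
  if (started) args.push(current);
  return args;
}

console.log(splitCommand('"/Applications/Claude Code/claude" -p'));
// [ '/Applications/Claude Code/claude', '-p' ]
```

A naive `command.split(" ")` would break the path at the space in `Claude Code`, which is why the probe needs quote handling.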
|
|
@@ -0,0 +1,56 @@
|
|
|
1
|
+
# Codex — Harness Notes
|
|
2
|
+
|
|
3
|
+
<p align="center">
|
|
4
|
+
<a href="../README.md">README</a> ·
|
|
5
|
+
<a href="HARNESSES.md">Harnesses</a> ·
|
|
6
|
+
<a href="USAGE.md">Usage</a> ·
|
|
7
|
+
<a href="CONFIGURATION.md">Configuration</a>
|
|
8
|
+
</p>
|
|
9
|
+
|
|
10
|
+
Codex CLI is OpenAI's terminal coding agent. VerbalCoding drives it through `codex exec`. When `--output-last-message <path>` is passed, `codex exec` writes its final assistant message to that file, so the adapter inserts the flag automatically and reads the file back, which works even if stdout is noisy.
|
|
11
|
+
|
|
12
|
+
## Install
|
|
13
|
+
|
|
14
|
+
```bash
|
|
15
|
+
npm install -g @openai/codex
|
|
16
|
+
codex login # or set OPENAI_API_KEY for headless use
|
|
17
|
+
codex exec "hello"
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
## Configure VerbalCoding
|
|
21
|
+
|
|
22
|
+
```bash
|
|
23
|
+
# .env
|
|
24
|
+
AGENT_BACKEND=codex
|
|
25
|
+
# optional
|
|
26
|
+
CODEX_COMMAND="codex exec" # default
|
|
27
|
+
AGENT_PROJECT_CONTEXT="What we're working on, what's already decided."
|
|
28
|
+
AGENT_WORKDIR=/Users/you/code/your-project
|
|
29
|
+
AGENT_CHAT_TIMEOUT_MS=45000
|
|
30
|
+
AGENT_TASK_TIMEOUT_MS=0
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
`AGENT_SESSION_FILE` is unused (Codex `exec` is stateless across calls).
|
|
34
|
+
|
|
35
|
+
## Output capture
|
|
36
|
+
|
|
37
|
+
For Codex, the adapter:
|
|
38
|
+
|
|
39
|
+
1. Generates a temp path under `os.tmpdir()` like `verbalcoding-codex-last-<pid>-<ts>.txt`.
|
|
40
|
+
2. Inserts `--output-last-message <path>` immediately before the final positional prompt arg.
|
|
41
|
+
3. After the run, reads that file as the authoritative answer (preferred over `stdout`).
|
|
42
|
+
4. Deletes the temp file.
|
|
43
|
+
|
|
44
|
+
This is robust to Codex emitting tool-use chatter on stdout; the spoken answer always comes from the captured file.
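Steps 1 and 2 above amount to a small argv rewrite, sketched here. `withLastMessageFile` and its parameters are hypothetical; the temp directory, pid, and timestamp are passed in explicitly so the sketch needs no imports, whereas a real adapter would presumably take them from `os.tmpdir()`, `process.pid`, and `Date.now()`.

```javascript
// Hypothetical sketch of steps 1 and 2: build the temp path and splice in the flag.
function withLastMessageFile(argv, { tmpDir, pid, now }) {
  const outPath = `${tmpDir}/verbalcoding-codex-last-${pid}-${now}.txt`;
  const head = argv.slice(0, -1);       // command plus existing flags
  const prompt = argv[argv.length - 1]; // final positional prompt argument
  return {
    outPath,
    argv: [...head, "--output-last-message", outPath, prompt],
  };
}

const run = withLastMessageFile(["codex", "exec", "hello"], {
  tmpDir: "/tmp",
  pid: 42,
  now: 1000,
});
console.log(run.argv.join(" "));
// codex exec --output-last-message /tmp/verbalcoding-codex-last-42-1000.txt hello
```

After the process exits, the caller would read `run.outPath` as the answer and then delete it (steps 3 and 4).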
|
|
45
|
+
|
|
46
|
+
## Voice phrases to switch TO Codex
|
|
47
|
+
|
|
48
|
+
- en: `"switch to Codex"`, `"ask Codex what it thinks"`
|
|
49
|
+
- ko: `"코덱스로 전환"`, `"코덱스한테 물어봐"`
|
|
50
|
+
|
|
51
|
+
## Gotchas
|
|
52
|
+
|
|
53
|
+
- **Long tasks.** Set `AGENT_TASK_TIMEOUT_MS=0` for codegen runs that may take minutes. The adapter respects `signal.aborted` so barge-in still cuts cleanly.
|
|
54
|
+
- **No session resume.** Pass context via `AGENT_PROJECT_CONTEXT` and rely on the cross-agent handoff block for continuity after a route change.
|
|
55
|
+
- **Patch-like output safety.** If a turn is interrupted and Codex was mid-diff, the bridge does **not** read the diff aloud — it speaks an "interrupted" notice and asks you to check the text channel.
|
|
56
|
+
- **Auth.** A 401 from the OpenAI backend surfaces as a non-zero exit; the bridge reports the failure and the cross-agent fallback prompt offers the default agent.
|
|
@@ -0,0 +1,45 @@
|
|
|
1
|
+
# Cursor CLI — Harness Notes
|
|
2
|
+
|
|
3
|
+
<p align="center">
|
|
4
|
+
<a href="../README.md">README</a> ·
|
|
5
|
+
<a href="HARNESSES.md">Harnesses</a> ·
|
|
6
|
+
<a href="USAGE.md">Usage</a> ·
|
|
7
|
+
<a href="CONFIGURATION.md">Configuration</a>
|
|
8
|
+
</p>
|
|
9
|
+
|
|
10
|
+
Cursor CLI (`cursor-agent`) is Cursor's terminal agent. VerbalCoding drives it through `cursor-agent --print --prompt`, passing the user's transcribed utterance as the prompt value. `--print` keeps the run non-interactive.
|
|
11
|
+
|
|
12
|
+
## Install
|
|
13
|
+
|
|
14
|
+
Follow the upstream Cursor CLI install. Confirm:
|
|
15
|
+
|
|
16
|
+
```bash
|
|
17
|
+
cursor-agent --print --prompt "hello"
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
## Configure VerbalCoding
|
|
21
|
+
|
|
22
|
+
```bash
|
|
23
|
+
# .env
|
|
24
|
+
AGENT_BACKEND=cursor # alias 'cursor-cli' also accepted
|
|
25
|
+
# optional
|
|
26
|
+
CURSOR_COMMAND="cursor-agent --print --prompt" # default
|
|
27
|
+
AGENT_PROJECT_CONTEXT="..."
|
|
28
|
+
AGENT_WORKDIR=/Users/you/code/your-project
|
|
29
|
+
AGENT_CHAT_TIMEOUT_MS=45000
|
|
30
|
+
AGENT_TASK_TIMEOUT_MS=0
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
## Voice phrases to switch TO Cursor
|
|
34
|
+
|
|
35
|
+
- en: `"switch to Cursor"`, `"ask Cursor ..."`, `"switch to cursor cli"`, `"switch to cursor agent"`
|
|
36
|
+
- ko: `"커서로 전환"`, `"cursor한테 물어봐"`
|
|
37
|
+
|
|
38
|
+
The matcher accepts `cursor`, `cursor cli`, `cursor-cli`, `cursor agent`, and `cursor-agent`.
|
|
39
|
+
|
|
40
|
+
## Gotchas
|
|
41
|
+
|
|
42
|
+
- **Prompt placement.** `--prompt` expects the value to follow; VerbalCoding's shell-aware argv builder places the transcribed utterance as the final positional argument, so `CURSOR_COMMAND` must end with `--prompt`.
|
|
43
|
+
- **Editor side-effects.** Cursor's CLI may touch local Cursor-related state files in the working directory; if that's surprising for a voice-only flow, point `AGENT_WORKDIR` at an isolated project directory.
|
|
44
|
+
- **No session resume.** Use `AGENT_PROJECT_CONTEXT` for cross-turn continuity, plus the cross-agent handoff block when routing back from a different harness.
|
|
45
|
+
- **Patch safety.** If Cursor returns a diff and the turn is interrupted, the bridge does not read the diff aloud.
|
|
@@ -0,0 +1,45 @@
|
|
|
1
|
+
# Gemini CLI — Harness Notes
|
|
2
|
+
|
|
3
|
+
<p align="center">
|
|
4
|
+
<a href="../README.md">README</a> ·
|
|
5
|
+
<a href="HARNESSES.md">Harnesses</a> ·
|
|
6
|
+
<a href="USAGE.md">Usage</a> ·
|
|
7
|
+
<a href="CONFIGURATION.md">Configuration</a>
|
|
8
|
+
</p>
|
|
9
|
+
|
|
10
|
+
Gemini CLI is Google's terminal coding agent. VerbalCoding drives it through `gemini -p`. Each voice turn is one invocation; there is no built-in session-resume across calls.
|
|
11
|
+
|
|
12
|
+
## Install
|
|
13
|
+
|
|
14
|
+
Follow the upstream Gemini CLI install guide. Confirm:
|
|
15
|
+
|
|
16
|
+
```bash
|
|
17
|
+
gemini -p "hello"
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
## Configure VerbalCoding
|
|
21
|
+
|
|
22
|
+
```bash
|
|
23
|
+
# .env
|
|
24
|
+
AGENT_BACKEND=gemini
|
|
25
|
+
# optional
|
|
26
|
+
GEMINI_COMMAND="gemini -p" # default; add --model, --debug as needed
|
|
27
|
+
AGENT_PROJECT_CONTEXT="..."
|
|
28
|
+
AGENT_WORKDIR=/Users/you/code/your-project
|
|
29
|
+
AGENT_CHAT_TIMEOUT_MS=45000
|
|
30
|
+
AGENT_TASK_TIMEOUT_MS=0
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
## Voice phrases to switch TO Gemini
|
|
34
|
+
|
|
35
|
+
- en: `"switch to Gemini"`, `"ask Gemini ..."`, `"switch to Gemini CLI"`
|
|
36
|
+
- ko: `"제미나이로 전환"`, `"gemini한테 물어봐"`
|
|
37
|
+
|
|
38
|
+
The matcher accepts `gemini`, `gemini cli`, `gemini-cli`, and `제미나이`.
|
|
39
|
+
|
|
40
|
+
## Gotchas
|
|
41
|
+
|
|
42
|
+
- **No session resume.** Same continuity story as Claude / Codex: rely on `AGENT_PROJECT_CONTEXT` and the cross-agent handoff block.
|
|
43
|
+
- **Long answers.** Gemini sometimes returns large structured responses; the stream sentencer splits them into TTS-able sentences. Code fences are stripped from speech (the text channel still gets the full answer with code).
|
|
44
|
+
- **API key.** If Gemini exits non-zero with an auth error, the bridge reports the message; the cross-agent fallback prompt offers the default agent if Gemini was a non-default route.
|
|
45
|
+
- **Verbose progress.** Gemini's stdout doesn't follow Hermes' `┊`-style preview format, so verbose progress mostly relies on the smart-progress LLM summarizer.
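The code-fence stripping mentioned under "Long answers" could look something like the sketch below. `stripCodeForSpeech` is a hypothetical name, and simply dropping fenced blocks (rather than announcing them) is an assumption about how the speech path behaves.

```javascript
// Hypothetical sketch: remove fenced code blocks before text is sent to TTS.
const FENCE = "`".repeat(3); // built at runtime so this example can live inside a fence

function stripCodeForSpeech(text) {
  // Non-greedy match from an opening fence to the next closing fence.
  const fenced = new RegExp(`${FENCE}[\\s\\S]*?${FENCE}`, "g");
  return text.replace(fenced, " ").replace(/\s+/g, " ").trim();
}

const answer = `Run this:\n${FENCE}js\nfoo()\n${FENCE}\nThen commit.`;
console.log(stripCodeForSpeech(answer)); // "Run this: Then commit."
```

The text channel would still receive `answer` unchanged; only the spoken copy is filtered.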
|