verbalcoding 0.2.12 → 0.2.13

Files changed (169)
  1. package/.env.example +74 -4
  2. package/README.es.md +3 -1
  3. package/README.fr.md +3 -1
  4. package/README.ja.md +3 -1
  5. package/README.ko.md +4 -2
  6. package/README.md +4 -2
  7. package/README.ru.md +3 -1
  8. package/README.zh.md +3 -1
  9. package/app-node/agent_adapters.test.mjs +14 -0
  10. package/app-node/agent_routing.mjs +148 -0
  11. package/app-node/agent_routing.test.mjs +138 -0
  12. package/app-node/agent_turn.mjs +86 -0
  13. package/app-node/agent_turn.test.mjs +109 -0
  14. package/app-node/bridge_context.mjs +73 -0
  15. package/app-node/bridge_context.test.mjs +54 -0
  16. package/app-node/bridge_state.mjs +4 -0
  17. package/app-node/bridge_wireup.test.mjs +462 -0
  18. package/app-node/cli_install.test.mjs +31 -0
  19. package/app-node/cross_agent_routing.test.mjs +78 -0
  20. package/app-node/discord_command_router.mjs +204 -0
  21. package/app-node/discord_command_router.test.mjs +311 -0
  22. package/app-node/discord_voice_setup.mjs +251 -0
  23. package/app-node/discord_voice_setup.test.mjs +86 -0
  24. package/app-node/hermes_profiles.test.mjs +12 -1
  25. package/app-node/install_config.mjs +110 -3
  26. package/app-node/install_config.test.mjs +8 -0
  27. package/app-node/instance_doctor.test.mjs +9 -0
  28. package/app-node/instances.test.mjs +8 -1
  29. package/app-node/main.mjs +488 -1368
  30. package/app-node/mcp_tools.test.mjs +7 -0
  31. package/app-node/notification_handler.mjs +89 -0
  32. package/app-node/notification_handler.test.mjs +187 -0
  33. package/app-node/plan_dispatcher.mjs +215 -0
  34. package/app-node/plan_dispatcher.test.mjs +101 -0
  35. package/app-node/plan_mode.mjs +36 -7
  36. package/app-node/plan_mode.test.mjs +78 -0
  37. package/app-node/progress_handler.mjs +220 -0
  38. package/app-node/progress_handler.test.mjs +193 -0
  39. package/app-node/progress_speech.mjs +54 -32
  40. package/app-node/progress_speech.test.mjs +12 -3
  41. package/app-node/project_sessions.mjs +5 -2
  42. package/app-node/project_sessions.test.mjs +7 -0
  43. package/app-node/research_mode.mjs +282 -0
  44. package/app-node/research_mode.test.mjs +264 -0
  45. package/app-node/restart_notice.mjs +3 -0
  46. package/app-node/restart_notice.test.mjs +11 -0
  47. package/app-node/session_ontology.mjs +271 -0
  48. package/app-node/session_ontology.test.mjs +130 -0
  49. package/app-node/smart_progress.mjs +1 -1
  50. package/app-node/stream_sentencer.mjs +32 -2
  51. package/app-node/stream_sentencer.test.mjs +65 -0
  52. package/app-node/streaming_tts_queue.mjs +5 -1
  53. package/app-node/streaming_tts_queue.test.mjs +7 -1
  54. package/app-node/stt_whisper.mjs +24 -0
  55. package/app-node/stt_whisper.test.mjs +32 -0
  56. package/app-node/text_routing.mjs +4 -2
  57. package/app-node/tts_backends.mjs +537 -3
  58. package/app-node/tts_backends.test.mjs +454 -0
  59. package/app-node/tts_player.mjs +164 -0
  60. package/app-node/tts_player.test.mjs +202 -0
  61. package/app-node/tts_runtime.mjs +134 -0
  62. package/app-node/tts_runtime.test.mjs +89 -0
  63. package/app-node/tts_settings.mjs +150 -3
  64. package/app-node/tts_settings.test.mjs +204 -0
  65. package/app-node/tts_voice_config.mjs +136 -2
  66. package/app-node/tts_voice_config.test.mjs +94 -0
  67. package/app-node/utterance_router.mjs +216 -0
  68. package/app-node/utterance_router.test.mjs +236 -0
  69. package/app-node/voice_autojoin.mjs +37 -0
  70. package/app-node/voice_autojoin.test.mjs +59 -0
  71. package/app-node/voice_io.mjs +272 -0
  72. package/app-node/voice_io.test.mjs +102 -0
  73. package/app-node/voice_turn_runner.mjs +449 -0
  74. package/app-node/voice_turn_runner.test.mjs +289 -0
  75. package/docs/CONFIGURATION.md +12 -2
  76. package/docs/HARNESSES.md +58 -0
  77. package/docs/HARNESS_AIDER.md +50 -0
  78. package/docs/HARNESS_CLAUDE.md +56 -0
  79. package/docs/HARNESS_CODEX.md +56 -0
  80. package/docs/HARNESS_CURSOR.md +45 -0
  81. package/docs/HARNESS_GEMINI.md +45 -0
  82. package/docs/HARNESS_HERMES.md +57 -0
  83. package/docs/HARNESS_OPENCLAW.md +44 -0
  84. package/docs/HARNESS_OPENCODE.md +44 -0
  85. package/docs/README.md +1 -0
  86. package/docs/ROADMAP.md +20 -5
  87. package/docs/TTS_BACKENDS.md +227 -0
  88. package/docs/USAGE.md +22 -0
  89. package/docs/i18n/AGENTS.es.md +34 -0
  90. package/docs/i18n/AGENTS.fr.md +34 -0
  91. package/docs/i18n/AGENTS.ja.md +34 -0
  92. package/docs/i18n/AGENTS.ko.md +34 -0
  93. package/docs/i18n/AGENTS.ru.md +34 -0
  94. package/docs/i18n/AGENTS.zh.md +34 -0
  95. package/docs/i18n/HARNESSES.es.md +58 -0
  96. package/docs/i18n/HARNESSES.fr.md +58 -0
  97. package/docs/i18n/HARNESSES.ja.md +58 -0
  98. package/docs/i18n/HARNESSES.ko.md +58 -0
  99. package/docs/i18n/HARNESSES.ru.md +58 -0
  100. package/docs/i18n/HARNESSES.zh.md +58 -0
  101. package/docs/i18n/HARNESS_AIDER.es.md +48 -0
  102. package/docs/i18n/HARNESS_AIDER.fr.md +48 -0
  103. package/docs/i18n/HARNESS_AIDER.ja.md +50 -0
  104. package/docs/i18n/HARNESS_AIDER.ko.md +50 -0
  105. package/docs/i18n/HARNESS_AIDER.ru.md +48 -0
  106. package/docs/i18n/HARNESS_AIDER.zh.md +48 -0
  107. package/docs/i18n/HARNESS_CLAUDE.es.md +55 -0
  108. package/docs/i18n/HARNESS_CLAUDE.fr.md +55 -0
  109. package/docs/i18n/HARNESS_CLAUDE.ja.md +56 -0
  110. package/docs/i18n/HARNESS_CLAUDE.ko.md +56 -0
  111. package/docs/i18n/HARNESS_CLAUDE.ru.md +55 -0
  112. package/docs/i18n/HARNESS_CLAUDE.zh.md +56 -0
  113. package/docs/i18n/HARNESS_CODEX.es.md +55 -0
  114. package/docs/i18n/HARNESS_CODEX.fr.md +55 -0
  115. package/docs/i18n/HARNESS_CODEX.ja.md +56 -0
  116. package/docs/i18n/HARNESS_CODEX.ko.md +56 -0
  117. package/docs/i18n/HARNESS_CODEX.ru.md +55 -0
  118. package/docs/i18n/HARNESS_CODEX.zh.md +56 -0
  119. package/docs/i18n/HARNESS_CURSOR.es.md +42 -0
  120. package/docs/i18n/HARNESS_CURSOR.fr.md +42 -0
  121. package/docs/i18n/HARNESS_CURSOR.ja.md +45 -0
  122. package/docs/i18n/HARNESS_CURSOR.ko.md +45 -0
  123. package/docs/i18n/HARNESS_CURSOR.ru.md +42 -0
  124. package/docs/i18n/HARNESS_CURSOR.zh.md +42 -0
  125. package/docs/i18n/HARNESS_GEMINI.es.md +44 -0
  126. package/docs/i18n/HARNESS_GEMINI.fr.md +44 -0
  127. package/docs/i18n/HARNESS_GEMINI.ja.md +45 -0
  128. package/docs/i18n/HARNESS_GEMINI.ko.md +45 -0
  129. package/docs/i18n/HARNESS_GEMINI.ru.md +44 -0
  130. package/docs/i18n/HARNESS_GEMINI.zh.md +45 -0
  131. package/docs/i18n/HARNESS_HERMES.es.md +54 -0
  132. package/docs/i18n/HARNESS_HERMES.fr.md +54 -0
  133. package/docs/i18n/HARNESS_HERMES.ja.md +57 -0
  134. package/docs/i18n/HARNESS_HERMES.ko.md +57 -0
  135. package/docs/i18n/HARNESS_HERMES.ru.md +54 -0
  136. package/docs/i18n/HARNESS_HERMES.zh.md +57 -0
  137. package/docs/i18n/HARNESS_OPENCLAW.es.md +41 -0
  138. package/docs/i18n/HARNESS_OPENCLAW.fr.md +41 -0
  139. package/docs/i18n/HARNESS_OPENCLAW.ja.md +44 -0
  140. package/docs/i18n/HARNESS_OPENCLAW.ko.md +44 -0
  141. package/docs/i18n/HARNESS_OPENCLAW.ru.md +41 -0
  142. package/docs/i18n/HARNESS_OPENCLAW.zh.md +42 -0
  143. package/docs/i18n/HARNESS_OPENCODE.es.md +41 -0
  144. package/docs/i18n/HARNESS_OPENCODE.fr.md +41 -0
  145. package/docs/i18n/HARNESS_OPENCODE.ja.md +44 -0
  146. package/docs/i18n/HARNESS_OPENCODE.ko.md +44 -0
  147. package/docs/i18n/HARNESS_OPENCODE.ru.md +41 -0
  148. package/docs/i18n/HARNESS_OPENCODE.zh.md +44 -0
  149. package/docs/superpowers/plans/2026-05-14-cross-agent-voice-transfer.md +625 -0
  150. package/docs/superpowers/plans/2026-05-21-audio-overview-narrated-diffs.md +95 -0
  151. package/docs/superpowers/plans/2026-05-21-autoresearch-ontology.md +83 -0
  152. package/docs/superpowers/plans/2026-05-21-phase11-push-to-talk-wakeword-v2.md +77 -0
  153. package/docs/superpowers/plans/2026-05-21-phase12-multi-user-voice.md +147 -0
  154. package/docs/superpowers/plans/2026-05-21-phase14-verbalbench.md +136 -0
  155. package/docs/superpowers/plans/2026-05-21-phase15-phone-companion.md +72 -0
  156. package/integrations/fireredtts2/mlx_llm.py +183 -0
  157. package/integrations/fireredtts2/synth.py +156 -0
  158. package/integrations/fireredtts2/synth_mlx.py +196 -0
  159. package/integrations/mlxaudio/synth.py +74 -0
  160. package/integrations/neuttsair/synth.py +104 -0
  161. package/integrations/omnivoice/synth.py +110 -0
  162. package/package.json +6 -1
  163. package/scripts/cli.mjs +84 -0
  164. package/scripts/doctor.mjs +104 -4
  165. package/scripts/install.mjs +5 -1
  166. package/scripts/install_fireredtts2.sh +109 -0
  167. package/scripts/install_mlxaudio.sh +34 -0
  168. package/scripts/install_mossttsnano.sh +46 -0
  169. package/scripts/postinstall.mjs +34 -0
@@ -0,0 +1,289 @@
+ import test from 'node:test';
+ import assert from 'node:assert/strict';
+ import { createVoiceTurnRunner } from './voice_turn_runner.mjs';
+ import { createUtteranceRouter } from './utterance_router.mjs';
+ import { createPlanDispatcher } from './plan_dispatcher.mjs';
+ import { createBridge } from './bridge_context.mjs';
+ import { createAgentTurnLifecycle } from './agent_turn.mjs';
+
+ function noop() {}
+ async function noopAsync() {}
+
+ // Build a complete dep set for voiceTurnRunner by first constructing a
+ // real utterance_router with stubbed pure-function deps, then threading
+ // its outputs into the runner alongside the rest. This mirrors main.mjs's
+ // real construction order so the tests catch any inter-module wiring drift.
+ function makeDeps(overrides = {}) {
+   const bridge = createBridge();
+   bridge.bridgeState = {
+     deferredSize: () => 0,
+     currentEpoch: () => 1,
+     discardQueues: () => 0,
+   };
+   const agentTurnLifecycle = createAgentTurnLifecycle({ bridge, warn: noop });
+
+   const agentAdapter = {
+     label: 'default-agent', backend: 'hermes',
+     readSessionId: () => null,
+     ask: async () => 'mock agent answer',
+   };
+
+   // Construct the router (post-Phase 7b: dispatch + adapter selection only).
+   const router = createUtteranceRouter({
+     bridge,
+     log: noop, warn: noop, path: { join: (...a) => a.join('/') },
+     ROOT: '/tmp/vc', TTS_VOICE_CONFIG_PATH: '/tmp/voices.json',
+     agentAdapter,
+     settings: { voiceLanguage: 'ko', transcriptChannelId: 'tx-ch', agent: { backend: 'hermes', label: 'hermes' }, tts: {} },
+     projectSessionContextText: () => '',
+     createBridgeAgentAdapter: s => ({ label: s?.label || 'fake', backend: s?.backend || 'hermes', ask: async () => '' }),
+     buildAgentSettings: () => ({ backend: 'hermes', label: 'hermes' }),
+     commandIsInstalled: async () => true,
+     shellSplit: s => String(s).split(' '),
+     sendText: noopAsync, speakText: noopAsync,
+     ensureTtsVoiceConfig: () => ({ backends: {} }),
+     updateTtsVoiceConfig: c => c,
+     writeTtsVoiceConfig: noop,
+     applyVoiceConfigToProcessEnv: () => ({ selection: { backend: 'edge', voiceType: 'female', voice: { language: 'ko', voice: 'x' } } }),
+     ensureSelectedTtsBackendInstalled: noopAsync,
+     rebuildTtsRuntimeSettings: noop,
+     voiceCommandFromTranscript: () => null,
+     voiceChangedText: () => '',
+     voiceLanguageCommandFromTranscript: () => null,
+     voiceCloneCommandFromText: () => null,
+     voiceCloneCapture: { arm: () => ({ targetPath: '' }), cancel: () => false, current: () => null },
+     notifyVoiceCloneSampleGapIfNeeded: noopAsync,
+     languageChangedText: () => '',
+     applyRuntimeLanguage: noop,
+     persistEnvValues: noop,
+     discardVoiceInputQueues: () => 0,
+   });
+
+   // Construct the plan dispatcher (Phase 7b) consuming router outputs.
+   const planDispatcher = createPlanDispatcher({
+     bridge,
+     settings: { voiceLanguage: 'ko', transcriptChannelId: 'tx-ch', agent: { backend: 'hermes', label: 'hermes' } },
+     sendText: noopAsync,
+     speakText: noopAsync,
+     routingStateFor: router.routingStateFor,
+     adapterForBackend: router.adapterForBackend,
+     adapterForProjectSession: router.adapterForProjectSession,
+     resolveProjectSessionForChannel: () => null,
+     isAgentRoutingDecision: () => false,
+     parseDecisionAnswer: () => ({ type: 'unknown' }),
+     parsePlanVoiceCommand: () => ({ type: 'unknown' }),
+     applyPlanCommand: s => s,
+     parsePlanOutput: () => ({ steps: [], decisions: [] }),
+     renderDecisionPrompt: d => d?.text || '',
+     renderResolvedDecisions: () => '',
+     renderFinalPlan: () => '',
+     planModePreamble: () => '',
+     planExecutionPreamble: () => '',
+     isPlanEntryUtterance: () => false,
+   });
+
+   const settings = { voiceLanguage: 'ko', transcriptChannelId: 'tx-ch', agent: { backend: 'hermes', label: 'hermes' }, tts: {} };
+
+   return {
+     bridge,
+     agentTurnLifecycle,
+     settings,
+     client: { channels: { cache: new Map() } },
+     log: noop, warn: noop, fs: { rm: (_p, _o, cb) => cb && cb() },
+     // voice_io
+     transcribe: async () => 'hey hermes do a thing',
+     // tts_player
+     beginStreamingTurn: () => false,
+     endStreamingTurn: noopAsync,
+     speakText: noopAsync,
+     // progress_handler
+     queueProgressSpeechText: noop,
+     stopProgressSpeech: noop,
+     speakImmediateNotice: noopAsync,
+     // notification_handler
+     maybeNotifyTaskComplete: noopAsync,
+     // utterance_router outputs (real router instance built above)
+     handleLanguageCommand: router.handleLanguageCommand,
+     handleTtsVoiceCommand: router.handleTtsVoiceCommand,
+     handleVoiceCloneCommand: router.handleVoiceCloneCommand,
+     dispatchPlanModeUtterance: planDispatcher.dispatchPlanModeUtterance,
+     adapterForBackend: router.adapterForBackend,
+     adapterForProjectSession: router.adapterForProjectSession,
+     planChannelKey: planDispatcher.planChannelKey,
+     routingStateFor: router.routingStateFor,
+     recordUtterance: router.recordUtterance,
+     clearTransientRouting: router.clearTransientRouting,
+     // pure helpers
+     isAllowed: () => true,
+     isAbortError: e => e?.name === 'AbortError',
+     sleep: async () => {},
+     sendText: noopAsync,
+     sendEmbed: async () => true,
+     reloadRuntimeLanguageFromEnv: () => ({ changed: false, voiceLanguage: 'ko', whisperLanguage: 'ko' }),
+     drainDeferredProcessingUtterances: noopAsync,
+     resolveProjectSessionForChannel: () => null,
+     projectSessionContextText: () => '',
+     ontologyStateFor: () => ({ nodeCount: 0, serializeForHandoff: () => '' }),
+     captureOntologyFromTurn: noop,
+     formatRecentDiscordContext: () => '',
+     formatSttResultMessage: (_lang, _u, t) => `you said: ${t}`,
+     formatSttStartMessage: () => '🎧',
+     formatVoiceErrorMessage: (_lang, m) => m,
+     formatWakeRejectedMessage: () => 'no wake word',
+     agentAnswerHeader: () => 'agent says:',
+     emptyAgentAnswer: () => '(empty)',
+     spokenResultOnly: (_p, a) => a,
+     stripWake: t => t,
+     acceptsWake: () => true,
+     sensitivityChangedSpeech: () => '',
+     sensitivityModeFromTranscript: () => null,
+     sensitivityStatusText: () => '',
+     setSensitivityMode: () => ({ mode: 'normal' }),
+     isSensitivityOnlyRequest: () => false,
+     verboseChangedSpeech: () => '',
+     verboseModeFromTranscript: () => null,
+     verboseStatusText: () => '',
+     setVerboseProgress: noop,
+     isVerboseOnlyRequest: () => false,
+     isRoutingOnlyUtterance: () => false,
+     parseAgentRoutingCommand: () => ({ type: 'none' }),
+     renderAgentPrefix: () => '',
+     buildCrossAgentPrompt: ({ prompt }) => prompt,
+     buildFallbackDecision: () => ({ slot: 'fallback' }),
+     parseDecisionAnswer: () => ({ type: 'unknown' }),
+     parseResearchCommand: () => ({ type: 'none' }),
+     runResearchTurn: async () => ({ status: 'no_backend' }),
+     PROGRESS_IDLE_CHECK_MS: 5000,
+     PROGRESS_IDLE_NOTICE_INITIAL_MS: 10000,
+     PROGRESS_IDLE_NOTICE_LIMIT: 20,
+     PROGRESS_IDLE_NOTICE_MAX_MS: 30000,
+     PROGRESS_IDLE_NOTICE_MULTIPLIER: 1.8,
+     STT_START_VOICE_NOTICE: false,
+     ...overrides,
+   };
+ }
+
+ test('createVoiceTurnRunner exposes handleRecording', () => {
+   const runner = createVoiceTurnRunner(makeDeps());
+   assert.equal(typeof runner.handleRecording, 'function');
+ });
+
+ test('handleRecording happy path: transcribe -> agent -> send + speak + notify, cleanup green', async () => {
+   const calls = { transcribe: 0, askPrompt: '', sendText: [], speakText: [], notify: [] };
+   // Capture the exact prompt the runner sends to the agent and the exact
+   // answer that flows back out.
+   const fakeAdapter = {
+     label: 'hermes', backend: 'hermes', readSessionId: () => null,
+     ask: async (prompt, _signal, plan) => {
+       calls.askPrompt = prompt;
+       assert.equal(plan.label, 'hermes', 'plan label = agent label');
+       assert.equal(plan.task, true, 'plan.task=true for voice turn');
+       return 'twelve apples';
+     },
+   };
+   const deps = makeDeps({
+     transcribe: async wav => { calls.transcribe++; assert.equal(wav, '/tmp/u.wav'); return 'hermes do the thing'; },
+     sendText: async t => { calls.sendText.push(t); return true; },
+     speakText: async t => { calls.speakText.push(t); },
+     maybeNotifyTaskComplete: async ({ answer, label }) => { calls.notify.push({ answer, label }); },
+     // Force the runner's adapter lookup to return our test adapter rather
+     // than the router's auto-created one (the router builds a fresh adapter
+     // per backend via createBridgeAgentAdapter).
+     adapterForBackend: () => fakeAdapter,
+     adapterForProjectSession: () => fakeAdapter,
+   });
+   const { handleRecording } = createVoiceTurnRunner(deps);
+   assert.equal(deps.bridge.processing, false);
+   await handleRecording('user-1', '/tmp/u.wav', 8192, 1, null);
+   assert.equal(calls.transcribe, 1, 'transcribe called once');
+   assert.equal(calls.askPrompt, 'hermes do the thing', 'agent receives the post-wake prompt');
+   assert.deepEqual(calls.speakText.at(-1), 'twelve apples', 'agent answer is spoken');
+   assert.ok(calls.sendText.some(s => /you said: hermes do the thing/.test(s)), 'STT echoed');
+   assert.ok(calls.sendText.some(s => /twelve apples/.test(s)), 'agent answer surfaced as text');
+   assert.equal(calls.notify.length, 1, 'maybeNotifyTaskComplete fired once');
+   assert.equal(calls.notify[0].label, 'hermes', 'notify carries the agent label');
+   assert.equal(deps.bridge.processing, false, 'processing flag cleared in finally');
+   assert.equal(deps.bridge.activeTurnId, 0, 'activeTurnId cleared');
+   assert.equal(deps.bridge.activeProgressAbortController, null, 'progress controller cleared');
+ });
+
+ test('handleRecording cleans up progress controller even when agent throws', async () => {
+   const fakeAdapter = {
+     label: 'hermes', backend: 'hermes', readSessionId: () => null,
+     ask: async () => { throw new Error('agent boom'); },
+   };
+   const deps = makeDeps({
+     adapterForBackend: () => fakeAdapter,
+     adapterForProjectSession: () => fakeAdapter,
+   });
+   const { handleRecording } = createVoiceTurnRunner(deps);
+   let finishStatus = null;
+   let finishError = null;
+   const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; finishError = r.error; } };
+   await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+   assert.equal(finishStatus, 'error');
+   assert.match(finishError || '', /agent boom/);
+   // Cleanup invariants — the bug Codex flagged on the original voice-path
+   // finally is now guarded by agentTurnLifecycle.finish, so we double-check.
+   assert.equal(deps.bridge.processing, false);
+   assert.equal(deps.bridge.activeProgressAbortController, null);
+   assert.equal(deps.bridge.currentAbortController, null);
+ });
+
+ test('handleRecording drops when bridge.processing is already true', async () => {
+   const deps = makeDeps();
+   deps.bridge.processing = true;
+   const { handleRecording } = createVoiceTurnRunner(deps);
+   let finishStatus = null;
+   const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+   await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+   assert.equal(finishStatus, 'drop_processing');
+   assert.equal(deps.bridge.processing, true, 'processing flag left intact (other turn owns it)');
+ });
+
+ test('handleRecording rejects unauthorized users', async () => {
+   const deps = makeDeps({ isAllowed: () => false });
+   const { handleRecording } = createVoiceTurnRunner(deps);
+   let finishStatus = null;
+   const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+   await handleRecording('intruder', '/tmp/u.wav', 8192, 1, metricsTurn);
+   assert.equal(finishStatus, 'unauthorized');
+ });
+
+ test('handleRecording short-circuits on empty transcript', async () => {
+   const deps = makeDeps({ transcribe: async () => '' });
+   const { handleRecording } = createVoiceTurnRunner(deps);
+   let finishStatus = null;
+   const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+   await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+   assert.equal(finishStatus, 'empty_transcript');
+   assert.equal(deps.bridge.processing, false, 'processing flag still cleaned up');
+ });
+
+ test('handleRecording short-circuits when wake word missing', async () => {
+   const sent = [];
+   const deps = makeDeps({
+     acceptsWake: () => false,
+     sendText: async t => { sent.push(t); return true; },
+   });
+   const { handleRecording } = createVoiceTurnRunner(deps);
+   let finishStatus = null;
+   const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+   await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+   assert.equal(finishStatus, 'wake_rejected');
+   assert.ok(sent.some(t => /no wake word/.test(t)), 'wake-rejected message sent');
+ });
+
+ test('handleRecording with stale language reload aborts before transcribe', async () => {
+   let transcribed = false;
+   const deps = makeDeps({
+     reloadRuntimeLanguageFromEnv: () => ({ changed: true, voiceLanguage: 'en', whisperLanguage: 'en' }),
+     transcribe: async () => { transcribed = true; return 'hi'; },
+   });
+   const { handleRecording } = createVoiceTurnRunner(deps);
+   let finishStatus = null;
+   const metricsTurn = { mark: () => {}, addMeta: () => {}, stage: () => {}, finish: r => { finishStatus = r.status; } };
+   await handleRecording('user-1', '/tmp/u.wav', 8192, 1, metricsTurn);
+   assert.equal(transcribed, false, 'transcribe not called when language changed');
+   assert.equal(finishStatus, 'drop_stale_language_change');
+ });
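
The error-path and short-circuit tests in this hunk each rebuild the same `metricsTurn` stub inline. As a minimal sketch, assuming the runner only ever calls `mark`, `addMeta`, `stage`, and `finish` on that object (which is all the stubs above provide), the stub could be factored into one recording helper. `makeMetricsTurn` is a hypothetical name and is not part of the package.

```javascript
// Hypothetical test helper (not in the package): a metricsTurn stub that
// records whatever the runner reports through finish().
function makeMetricsTurn() {
  const recorded = { status: null, error: null, finishCalls: 0 };
  return {
    recorded,
    mark() {},
    addMeta() {},
    stage() {},
    finish(result) {
      recorded.finishCalls += 1;
      recorded.status = result?.status ?? null;
      recorded.error = result?.error ?? null;
    },
  };
}

// Example: simulate the runner reporting a rejected wake word.
const metricsTurn = makeMetricsTurn();
metricsTurn.finish({ status: 'wake_rejected' });
console.log(metricsTurn.recorded.status, metricsTurn.recorded.finishCalls);
```

A test would then pass `metricsTurn` as the fifth argument to `handleRecording` and assert on `metricsTurn.recorded` instead of on closure variables.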
@@ -192,6 +192,8 @@ Remove `ports:` from that Compose service. On Docker Desktop for macOS/Windows,
 
  ## Optional TTS Backends
 
+ For the full backend matrix, latency notes, aliases, and Mac mini caveats, see [TTS Backends](TTS_BACKENDS.md).
+
  Edge TTS remains the default and fallback. Optional local backends are configured with their own env vars:
 
  | Backend | Settings | Voice choices |
@@ -200,8 +202,16 @@ Edge TTS remains the default and fallback. Optional local backends are configure
  | Supertonic | `SUPERTONIC_VOICE`, `SUPERTONIC_LANGUAGE` | `M1`–`M5`, `F1`–`F5`; language `ko`, `en`, `es`, `pt`, `fr` |
  | OpenVoice | `OPENVOICE_REF_AUDIO`, `OPENVOICE_STYLE`, `OPENVOICE_LANGUAGE` | User-provided permitted reference WAV; style defaults to `default` |
  | SpeechSwift / CosyVoice | `SPEECHSWIFT_REF_AUDIO`, `SPEECHSWIFT_ENGINE`, `SPEECHSWIFT_SPEAKER`, `SPEECHSWIFT_MODEL_ID` | Reference-sample voices for CosyVoice, or backend-supported speaker/model IDs |
-
- Only clone voices you own or have permission to use. If a local backend fails or times out, VerbalCoding falls back to Edge TTS.
+ | OmniVoice | `OMNIVOICE_PYTHON`, `OMNIVOICE_MODEL`, `OMNIVOICE_REF_AUDIO`, `OMNIVOICE_REF_TEXT`, `OMNIVOICE_LANGUAGE`, `OMNIVOICE_SPEAKER` | k2-fsa/OmniVoice reference-sample cloning or optional voice-design attributes |
+ | Qwen3 TTS | `QWEN3TTS_COMMAND`, `QWEN3TTS_MODE`, `QWEN3TTS_MODEL`, `QWEN3TTS_SPEAKER` | Preset speaker such as `sohee`, reference mode, or designed speaker text |
+ | MLX Audio | `MLXAUDIO_PYTHON`, `MLXAUDIO_MODEL`, `MLXAUDIO_VOICE`, `MLXAUDIO_LANG_CODE` | MLX Qwen3 voice/speaker IDs such as `Chelsie` |
+ | NeuTTS Air | `NEUTTSAIR_PYTHON`, `NEUTTSAIR_BACKBONE_REPO`, `NEUTTSAIR_CODEC_REPO`, `NEUTTSAIR_REF_AUDIO`, `NEUTTSAIR_REF_TEXT` | English NeuTTS Air reference-sample cloning; use Q4 GGUF for lower latency |
+ | FireRedTTS-2 | `FIREREDTTS2_COMMAND`, `FIREREDTTS2_PRETRAINED_DIR`, `FIREREDTTS2_PROMPT_AUDIO`, `FIREREDTTS2_PROMPT_TEXT` | Prompt-reference voice or random speaker |
+ | FireRedTTS-2 MLX helper | `integrations/fireredtts2/synth_mlx.py` | Experimental Apple Silicon LLM-port helper; not a canonical `TTS_BACKEND` yet |
+ | MOSS-TTS-Nano | `MOSSTTSNANO_COMMAND`, `MOSSTTSNANO_SCRIPT`, `MOSSTTSNANO_CHECKPOINT`, `MOSSTTSNANO_PROMPT_AUDIO` | OpenMOSS prompt reference or continuation mode |
+ | MOSS-TTS-Nano MLX | `MOSSTTSNANO_MLX_PYTHON`, `MOSSTTSNANO_MLX_SCRIPT`, `MOSSTTSNANO_MLX_WORKER`, `MOSSTTSNANO_PROMPT_AUDIO` | Experimental MLX hybrid prompt reference or continuation mode |
+
+ Only clone voices you own or have permission to use. For OmniVoice, install it in a separate Python environment such as `.venv-omnivoice` (`pip install torch torchaudio soundfile omnivoice`) and set `TTS_BACKEND=omnivoice`. For NeuTTS Air, install the local `neutts` package in `.venv-neuttsair`, set `TTS_BACKEND=neuttsair`, and keep progress prompts on Edge unless explicitly testing local progress TTS. If a local backend fails or times out, VerbalCoding falls back to Edge TTS.
 
  ## Operational Notes
 
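
The configuration text in this hunk says that a local backend which fails or times out falls back to Edge TTS. The diff shown here does not include that code path, so the following is only a sketch of the general pattern under stated assumptions: `synthesizeWithFallback`, `withTimeout`, the `local` and `edge` callbacks, and the 5000 ms default are all hypothetical and do not come from `tts_backends.mjs`.

```javascript
// Hypothetical sketch of "try local backend, fall back to Edge on failure or timeout".
// None of these names come from the package.

// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Always clear the timer so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// `local` and `edge` are assumed to be async functions: text -> audio.
async function synthesizeWithFallback(text, { local, edge, timeoutMs = 5000 }) {
  try {
    const audio = await withTimeout(local(text), timeoutMs);
    return { backend: 'local', audio };
  } catch (err) {
    // A throw and a timeout are treated the same: send the text to Edge.
    const audio = await edge(text);
    return { backend: 'edge', audio, reason: err.message };
  }
}
```

Returning the `reason` alongside the audio is a design choice in this sketch; it lets a caller log why the fallback happened without changing what gets played.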
@@ -0,0 +1,58 @@
+ # Coding Agent Harnesses
+
+ <p align="center">
+ <a href="../README.md">README</a> ·
+ <a href="README.md">Docs hub</a> ·
+ <a href="USAGE.md">Usage</a> ·
+ <a href="CONFIGURATION.md">Configuration</a> ·
+ <a href="TROUBLESHOOTING.md">Troubleshooting</a>
+ </p>
+
+ VerbalCoding is agent-agnostic. It drives whichever CLI coding agent you have installed by spawning it once per voice turn, feeding the transcript as a prompt, and speaking the response back. Pick **one** as your default; the cross-agent voice routing lets you reach the others mid-session.
+
+ | Harness | Default command | Session resume | Per-harness doc |
+ |---|---|---|---|
+ | Hermes Agent | `hermes chat -Q -q` | ✅ (`--resume <id>`) | [HERMES_VOICE.md](./HERMES_VOICE.md) (positioning) + [HARNESS_HERMES.md](./HARNESS_HERMES.md) |
+ | Claude Code | `claude -p` | ❌ | [HARNESS_CLAUDE.md](./HARNESS_CLAUDE.md) |
+ | Codex | `codex exec` | ❌ (output-last-message capture) | [HARNESS_CODEX.md](./HARNESS_CODEX.md) |
+ | Gemini CLI | `gemini -p` | ❌ | [HARNESS_GEMINI.md](./HARNESS_GEMINI.md) |
+ | OpenCode | `opencode run` | ❌ | [HARNESS_OPENCODE.md](./HARNESS_OPENCODE.md) |
+ | OpenClaw | `openclaw run` | ❌ | [HARNESS_OPENCLAW.md](./HARNESS_OPENCLAW.md) |
+ | Aider | `aider --no-pretty --yes-always --message` | ❌ | [HARNESS_AIDER.md](./HARNESS_AIDER.md) |
+ | Cursor CLI | `cursor-agent --print --prompt` | ❌ | [HARNESS_CURSOR.md](./HARNESS_CURSOR.md) |
+
+ ## Pick your default
+
+ `vc setup` auto-detects installed binaries and lets you pick. Non-interactive override:
+
+ ```bash
+ # .env or instance .env
+ AGENT_BACKEND=claude # hermes | claude | codex | gemini | opencode | openclaw | aider | cursor | custom
+ ```
+
+ Each harness picks up its own command from a matching env var (`HERMES_COMMAND`, `CLAUDE_COMMAND`, etc.). The shared envs `AGENT_LABEL`, `AGENT_COMMAND`, `AGENT_SESSION_FILE`, `AGENT_WORKDIR`, `AGENT_PROJECT_CONTEXT`, `AGENT_TASK_TIMEOUT_MS`, `AGENT_CHAT_TIMEOUT_MS`, `AGENT_VERBOSE_PROGRESS` override per-harness defaults when set.
+
+ ## Routing between harnesses by voice
+
+ Once configured, you can reach any **installed** harness from a voice channel without restarting:
+
+ - `"ask Codex what it thinks"` — single-turn route, next utterance returns to the default.
+ - `"switch to Aider"` — sticky route until you say `"back to default"`.
+ - Plan-mode `which_agent` slot — the agent itself proposes which backend runs the next plan.
+
+ The routing layer detects whether the binary is on `PATH` (resolving relative commands against the active project session's workdir). If not installed, the bridge asks `"Want me to use the default agent instead?"` — answer `"yes"` to fall back or `"no"` to cancel.
+
+ Aliases recognized by the parser: `claude` / `claude code`, `codex` / `코덱스`, `gemini` / `gemini cli` / `제미나이`, `opencode`, `openclaw`, `aider` / `에이더`, `cursor` / `cursor cli`, `hermes` / `헤르메스`.
+
+ ## Shared semantics
+
+ Things every harness adapter respects:
+
+ - **Voice plan mode** — `"plan it first"` → narrate a plan; edit by voice; `"approve"` to execute against the chosen harness.
+ - **Barge-in** — interrupting cuts the current TTS and aborts the agent task. Sticky routing survives interrupts; only single-turn routes are cleared.
+ - **Verbose progress** — `AGENT_VERBOSE_PROGRESS=1` (or `"상세 진행 켜"`) prints structured progress events the harness emits (file reads, web search, tool use). Smart-progress, if `SMART_PROGRESS_API_KEY` is set, summarizes these into one sentence per batch.
+ - **Push handoff** — `NOTIFY_PROVIDER=ntfy|pushover` plus `NOTIFY_MIN_TASK_MS` fires a push notification when a long task completes and the voice channel is empty. Debounced by body + `NOTIFY_DEBOUNCE_MS`.
+ - **Per-channel state** — each Discord voice channel keeps its own routing, plan-mode, and recent-utterance ring buffer.
+ - **Project sessions** — `!session new <name> <workdir>` binds a Discord channel to a project; per-(harness, session) adapters are cached and invalidated on rebind.
+
+ See per-harness docs for install paths, auth, and gotchas. `docs/CONFIGURATION.md` is the canonical env-var reference.
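
The voice-routing section of HARNESSES.md distinguishes single-turn routes (`"ask Codex ..."`), sticky routes (`"switch to Aider"`), and a reset phrase (`"back to default"`), and lists the aliases the parser recognizes. The sketch below illustrates that classification using the alias list from the doc. The matching rules, the function names `findBackend` and `parseRouteSketch`, and the returned object shapes are assumptions for illustration; the package's own parser in `agent_routing.mjs` is not shown in this part of the diff and may behave differently.

```javascript
// Alias list copied from HARNESSES.md; everything else here is a hypothetical sketch.
const ALIASES = {
  hermes: ['hermes', '헤르메스'],
  claude: ['claude code', 'claude'],
  codex: ['codex', '코덱스'],
  gemini: ['gemini cli', 'gemini', '제미나이'],
  opencode: ['opencode'],
  openclaw: ['openclaw'],
  aider: ['aider', '에이더'],
  cursor: ['cursor cli', 'cursor'],
};

// Return the backend whose alias starts the given text, or null.
function findBackend(text) {
  const lower = text.toLowerCase();
  for (const [backend, names] of Object.entries(ALIASES)) {
    if (names.some(name => lower.startsWith(name))) return backend;
  }
  return null;
}

// Classify an utterance as a reset, a sticky route, a single-turn route, or none.
function parseRouteSketch(utterance) {
  const text = utterance.trim();
  if (/^back to default\b/i.test(text)) return { type: 'reset' };
  const sticky = /^switch to\s+(.+)$/i.exec(text);
  const single = /^ask\s+(.+)$/i.exec(text);
  const match = sticky || single;
  if (!match) return { type: 'none' };
  const backend = findBackend(match[1]);
  if (!backend) return { type: 'none' };
  return { type: sticky ? 'sticky' : 'single', backend };
}

console.log(parseRouteSketch('ask Codex what it thinks')); // { type: 'single', backend: 'codex' }
console.log(parseRouteSketch('switch to Aider'));          // { type: 'sticky', backend: 'aider' }
```

A caller would keep a sticky backend in per-channel state until a `reset` arrives, and discard a `single` route after one turn, matching the behavior the doc describes.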
@@ -0,0 +1,50 @@
+ # Aider — Harness Notes
+
+ <p align="center">
+ <a href="../README.md">README</a> ·
+ <a href="HARNESSES.md">Harnesses</a> ·
+ <a href="USAGE.md">Usage</a> ·
+ <a href="CONFIGURATION.md">Configuration</a>
+ </p>
+
+ Aider is a pair-programming AI CLI focused on direct edits. VerbalCoding drives it through `aider --no-pretty --yes-always --message` — the prompt is passed as the `--message` value so each voice turn becomes one non-interactive Aider run that may modify files in `AGENT_WORKDIR`.
+
+ ## Install
+
+ ```bash
+ pip install aider-chat
+ aider --version
+ # Confirm a single-message run works:
+ aider --no-pretty --yes-always --message "list the top-level files"
+ ```
+
+ Aider needs an API key for the model you point it at (OpenAI / Anthropic / a local server). See <https://aider.chat>.
+
+ ## Configure VerbalCoding
+
+ ```bash
+ # .env
+ AGENT_BACKEND=aider
+ # optional
+ AIDER_COMMAND="aider --no-pretty --yes-always --message" # default
+ AGENT_WORKDIR=/Users/you/code/your-project # where Aider should edit
+ AGENT_PROJECT_CONTEXT="..."
+ AGENT_CHAT_TIMEOUT_MS=120000 # Aider can take longer
+ AGENT_TASK_TIMEOUT_MS=0
+ ```
+
+ `--no-pretty` strips Rich-formatting box characters so the stream sentencer doesn't choke on them. `--yes-always` keeps the run non-interactive (Aider won't pause for "apply this diff?" prompts).
+
+ ## Voice phrases to switch TO Aider
+
+ - en: `"switch to Aider"`, `"ask Aider to ..."`
+ - ko: `"aider로 전환해줘"`, `"에이더로 전환"`
+
+ The matcher accepts `aider` and `에이더`.
+
+ ## Gotchas
+
+ - **Aider edits files.** Unlike Claude / Codex / Gemini under `-p`, Aider directly modifies the working tree as part of answering. Be deliberate about `AGENT_WORKDIR` — usually a project session's `workdir`.
+ - **Diffs in output.** Aider often emits diff-shaped text. If a turn is interrupted, the bridge speaks an "interrupted" notice and skips reading the diff aloud — check the text channel and `git status`.
+ - **Auth.** `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` need to be in Aider's environment; instance-isolated installs typically use `instances/<project>.env`.
+ - **Per-channel state.** Cross-agent routing is per Discord channel; switching to Aider in one project room does not affect another.
@@ -0,0 +1,56 @@
1
+ # Claude Code — Harness Notes
2
+
3
+ <p align="center">
4
+ <a href="../README.md">README</a> ·
5
+ <a href="HARNESSES.md">Harnesses</a> ·
6
+ <a href="USAGE.md">Usage</a> ·
7
+ <a href="CONFIGURATION.md">Configuration</a>
8
+ </p>
9
+
10
+ Claude Code is Anthropic's official terminal-resident coding agent. VerbalCoding drives it through `claude -p`, where each voice turn is one invocation. Claude Code does not expose a stable session-resume contract over `-p`, so each call is a fresh context — use `AGENT_PROJECT_CONTEXT` and the cross-agent handoff block to keep continuity.
11
+
12
+ ## Install
13
+
14
+ ```bash
15
+ npm install -g @anthropic-ai/claude-code
16
+ claude login
17
+ claude -p "hello" # confirm it answers
18
+ ```
19
+
20
+ ## Configure VerbalCoding
21
+
22
+ ```bash
23
+ # .env
24
+ AGENT_BACKEND=claude # alias 'claude-code' also accepted
25
+ # optional
26
+ CLAUDE_COMMAND="claude -p" # default; override e.g. to add --model, --debug
27
+ AGENT_PROJECT_CONTEXT="Working on the auth module; previous decisions: oauth=github."
28
+ AGENT_WORKDIR=/Users/you/code/your-project
29
+ AGENT_CHAT_TIMEOUT_MS=45000
30
+ AGENT_TASK_TIMEOUT_MS=0
31
+ AGENT_VERBOSE_PROGRESS=0
32
+ ```
33
+
34
+ `AGENT_SESSION_FILE` defaults to `<repo>/.agent-sessions/claude` but is **unused** by this harness — Claude Code's `-p` is stateless. Leave it set; it just becomes a no-op.
35
+
36
+ ## What Claude sees per turn
37
+
38
+ Every turn the adapter prepends a Discord-aware preamble (English or Korean depending on `VOICE_LANGUAGE`), the project context, recent Discord text context, and finally the user's transcribed utterance. On cross-agent handoff (e.g. you said `"ask Codex ..."` last turn and just spoke again), the prepended block also includes a "Recent user voice" line of up to the last 4 utterances plus the most recently resolved plan decisions, so Claude doesn't start cold.
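As a rough picture of the ordering described above, the per-turn prompt could be assembled like this. It is a sketch under stated assumptions: `buildTurnPrompt`, its field names, and the label strings other than "Recent user voice" are hypothetical and not taken from the adapter's source.

```javascript
// Illustrative only: field names and most labels are assumptions, not the adapter's real API.
function buildTurnPrompt({
  preamble,
  projectContext,
  discordContext,
  recentUtterances = [],
  planDecisions = [],
  utterance,
  isHandoff = false,
}) {
  const parts = [preamble];
  if (projectContext) parts.push(`Project context: ${projectContext}`);
  if (discordContext) parts.push(`Recent Discord text: ${discordContext}`);
  if (isHandoff) {
    // On a cross-agent handoff, carry at most the last 4 utterances forward.
    const lastFour = recentUtterances.slice(-4);
    if (lastFour.length > 0) parts.push(`Recent user voice: ${lastFour.join(" | ")}`);
    if (planDecisions.length > 0) parts.push(`Plan decisions: ${planDecisions.join("; ")}`);
  }
  parts.push(utterance); // the transcribed utterance always comes last
  return parts.join("\n\n");
}
```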
39
+
40
+ ## Verbose progress
41
+
42
+ Claude Code does not emit a standard progress stream over `-p`. `AGENT_VERBOSE_PROGRESS=1` still works — the adapter parses tool/file/web mentions out of stdout/stderr if Claude prints them — but expect coarser progress than Hermes.
43
+
44
+ ## Voice phrases to switch TO Claude Code
45
+
46
+ - en: `"switch to Claude Code"`, `"ask Claude ..."`, `"let Claude finish this"`
47
+ - ko: `"클로드로 전환"`, `"claude한테 물어봐"`
48
+
49
+ The matcher accepts both `claude` and `claude code` as aliases; strict mode (used for routing-only utterances) requires an exact alias.
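One way to picture the loose versus strict distinction is the sketch below. It assumes that strict mode means the whole normalized utterance equals an alias and that loose mode is a plain substring check; the real matcher may differ, and `matchesClaude` and `normalize` are hypothetical names.

```javascript
// Aliases listed in the text above; the matching rules are assumptions.
const CLAUDE_ALIASES = ["claude code", "claude"];

// Lowercase and collapse punctuation and whitespace runs into single spaces.
function normalize(text) {
  return text.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, " ").trim();
}

function matchesClaude(utterance, { strict = false } = {}) {
  const words = normalize(utterance);
  if (strict) {
    // Assumed strict rule: the utterance must be exactly one alias.
    return CLAUDE_ALIASES.includes(words);
  }
  // Loose rule: a substring match, so "claude한테" (alias plus Korean particle) also hits.
  return CLAUDE_ALIASES.some((alias) => words.includes(alias));
}
```

A substring check is deliberately permissive here; it would also match unrelated words that happen to contain "claude", which a production matcher would likely guard against.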
50
+
51
+ ## Gotchas
52
+
53
+ - **No session resume.** A long-running pair-programming session needs the cross-agent handoff context block to carry decisions forward. The bridge does this automatically on backend changes; within the same backend, set `AGENT_PROJECT_CONTEXT` to a short summary.
54
+ - **Quoted command paths.** `CLAUDE_COMMAND` may use a quoted absolute path (e.g. `"/Applications/Claude Code/claude" -p`); VerbalCoding's installation probe splits the command with `shellSplit`, which honors the quotes.
55
+ - **Auth refresh.** An expired `claude login` token surfaces as a non-zero exit; the bridge reports the failure and, if Claude was a non-default route, the fallback prompt offers to retry on the default agent.
56
+ - **Patch-like output.** If Claude returns a diff and the turn is interrupted, the bridge says `"the agent was interrupted; check the text channel for files and tests"` rather than reading the diff aloud.
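To illustrate why quote handling matters for the quoted command path mentioned in the gotchas above, here is a minimal quote-aware splitter. It is a hypothetical stand-in, not VerbalCoding's actual `shellSplit`: it handles single and double quotes only and ignores backslash escapes.

```javascript
// Hypothetical stand-in for a quote-aware command splitter (no backslash escapes).
function splitCommand(command) {
  const args = [];
  let current = "";
  let quote = null; // the active quote character, if any
  let started = false; // true once the current token has begun (so "" yields an empty arg)

  for (const ch of command) {
    if (quote) {
      if (ch === quote) quote = null; // closing quote ends the quoted run
      else current += ch; // spaces inside quotes stay in the token
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      started = true;
    } else if (/\s/.test(ch)) {
      if (started) {
        args.push(current);
        current = "";
        started = false;
      }
    } else {
      current += ch;
      started = true;
    }
  }
  if (quote) throw new Error("unterminated quote in command");
  if (started) args.push(current);
  return args;
}
```

A naive `command.split(" ")` would break `/Applications/Claude Code/claude` into two arguments, which is the failure the quote-aware split avoids.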
@@ -0,0 +1,56 @@
1
+ # Codex — Harness Notes
2
+
3
+ <p align="center">
4
+ <a href="../README.md">README</a> ·
5
+ <a href="HARNESSES.md">Harnesses</a> ·
6
+ <a href="USAGE.md">Usage</a> ·
7
+ <a href="CONFIGURATION.md">Configuration</a>
8
+ </p>
9
+
10
+ Codex CLI is OpenAI's terminal coding agent. VerbalCoding drives it through `codex exec`. When `--output-last-message <path>` is passed, `codex exec` writes its final assistant message to `<path>`, so the adapter adds that flag automatically (pointing at a temp file) and reads the file back; the answer stays recoverable even if stdout is noisy.
11
+
12
+ ## Install
13
+
14
+ ```bash
15
+ npm install -g @openai/codex
16
+ codex login # or set OPENAI_API_KEY for headless use
17
+ codex exec "hello"
18
+ ```
19
+
20
+ ## Configure VerbalCoding
21
+
22
+ ```bash
23
+ # .env
24
+ AGENT_BACKEND=codex
25
+ # optional
26
+ CODEX_COMMAND="codex exec" # default
27
+ AGENT_PROJECT_CONTEXT="What we're working on, what's already decided."
28
+ AGENT_WORKDIR=/Users/you/code/your-project
29
+ AGENT_CHAT_TIMEOUT_MS=45000
30
+ AGENT_TASK_TIMEOUT_MS=0
31
+ ```
32
+
33
+ `AGENT_SESSION_FILE` is unused (`codex exec` is stateless across calls).
34
+
35
+ ## Output capture
36
+
37
+ For Codex, the adapter:
38
+
39
+ 1. Generates a temp path under `os.tmpdir()` like `verbalcoding-codex-last-<pid>-<ts>.txt`.
40
+ 2. Inserts `--output-last-message <path>` immediately before the final positional prompt arg.
41
+ 3. After the run, reads that file as the authoritative answer (preferred over `stdout`).
42
+ 4. Deletes the temp file.
43
+
44
+ This is robust to Codex emitting tool-use chatter on stdout; the spoken answer always comes from the captured file.
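The four steps above can be sketched with a few pure helpers. This is an illustration under assumptions, not the adapter's code: the function names are hypothetical, the temp directory is passed in rather than read from `os.tmpdir()`, paths are joined with `/` for brevity, and the fallback to stdout when the file is empty is an assumption drawn from the "preferred over `stdout`" wording.

```javascript
// Step 1: build a temp path in the documented naming shape.
function lastMessagePath(tmpDir, pid, timestamp) {
  return `${tmpDir}/verbalcoding-codex-last-${pid}-${timestamp}.txt`;
}

// Step 2: insert the flag and its value immediately before the final positional prompt.
function withOutputLastMessage(argv, outPath) {
  if (argv.length === 0) throw new Error("argv must end with the prompt");
  const prompt = argv[argv.length - 1];
  return [...argv.slice(0, -1), "--output-last-message", outPath, prompt];
}

// Step 3: prefer the captured file; fall back to stdout only if the file is empty or missing.
function pickAnswer(fileText, stdout) {
  const captured = (fileText ?? "").trim();
  return captured !== "" ? captured : stdout.trim();
}
```

Step 4 (deleting the temp file) is omitted because it is plain file-system cleanup.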
45
+
46
+ ## Voice phrases to switch TO Codex
47
+
48
+ - en: `"switch to Codex"`, `"ask Codex what it thinks"`
49
+ - ko: `"코덱스로 전환"`, `"코덱스한테 물어봐"`
50
+
51
+ ## Gotchas
52
+
53
+ - **Long tasks.** Set `AGENT_TASK_TIMEOUT_MS=0` for codegen runs that may take minutes. The adapter respects `signal.aborted` so barge-in still cuts cleanly.
54
+ - **No session resume.** Pass context via `AGENT_PROJECT_CONTEXT` and rely on the cross-agent handoff block for continuity after a route change.
55
+ - **Patch-like output safety.** If a turn is interrupted and Codex was mid-diff, the bridge does **not** read the diff aloud — it speaks an "interrupted" notice and asks you to check the text channel.
56
+ - **Auth.** A 401 from the OpenAI backend surfaces as a non-zero exit; the bridge reports the failure and the cross-agent fallback prompt offers the default agent.
@@ -0,0 +1,45 @@
1
+ # Cursor CLI — Harness Notes
2
+
3
+ <p align="center">
4
+ <a href="../README.md">README</a> ·
5
+ <a href="HARNESSES.md">Harnesses</a> ·
6
+ <a href="USAGE.md">Usage</a> ·
7
+ <a href="CONFIGURATION.md">Configuration</a>
8
+ </p>
9
+
10
+ Cursor CLI (`cursor-agent`) is Cursor's terminal agent. VerbalCoding drives it through `cursor-agent --print --prompt`, passing the user's transcribed utterance as the prompt value. `--print` keeps the run non-interactive.
11
+
12
+ ## Install
13
+
14
+ Follow the upstream Cursor CLI install. Confirm:
15
+
16
+ ```bash
17
+ cursor-agent --print --prompt "hello"
18
+ ```
19
+
20
+ ## Configure VerbalCoding
21
+
22
+ ```bash
23
+ # .env
24
+ AGENT_BACKEND=cursor # alias 'cursor-cli' also accepted
25
+ # optional
26
+ CURSOR_COMMAND="cursor-agent --print --prompt" # default
27
+ AGENT_PROJECT_CONTEXT="..."
28
+ AGENT_WORKDIR=/Users/you/code/your-project
29
+ AGENT_CHAT_TIMEOUT_MS=45000
30
+ AGENT_TASK_TIMEOUT_MS=0
31
+ ```
32
+
33
+ ## Voice phrases to switch TO Cursor
34
+
35
+ - en: `"switch to Cursor"`, `"ask Cursor ..."`, `"switch to cursor cli"`, `"switch to cursor agent"`
36
+ - ko: `"커서로 전환"`, `"cursor한테 물어봐"`
37
+
38
+ The matcher accepts `cursor`, `cursor cli`, `cursor-cli`, `cursor agent`, and `cursor-agent`.
39
+
40
+ ## Gotchas
41
+
42
+ - **Prompt placement.** `--prompt` expects the value to follow; VerbalCoding's shell-aware argv builder places the transcribed utterance as the final positional argument, so `CURSOR_COMMAND` must end with `--prompt`.
43
+ - **Editor side-effects.** Cursor's CLI may write Cursor-related state files in the working directory; if that is unwanted in a voice-only flow, point `AGENT_WORKDIR` at an isolated project directory.
44
+ - **No session resume.** Use `AGENT_PROJECT_CONTEXT` for cross-turn continuity, plus the cross-agent handoff block when routing back from a different harness.
45
+ - **Patch safety.** If Cursor returns a diff and the turn is interrupted, the bridge does not read the diff aloud.
@@ -0,0 +1,45 @@
1
+ # Gemini CLI — Harness Notes
2
+
3
+ <p align="center">
4
+ <a href="../README.md">README</a> ·
5
+ <a href="HARNESSES.md">Harnesses</a> ·
6
+ <a href="USAGE.md">Usage</a> ·
7
+ <a href="CONFIGURATION.md">Configuration</a>
8
+ </p>
9
+
10
+ Gemini CLI is Google's terminal coding agent. VerbalCoding drives it through `gemini -p`. Each voice turn is one invocation; there is no built-in session-resume across calls.
11
+
12
+ ## Install
13
+
14
+ Follow the upstream Gemini CLI install guide. Confirm:
15
+
16
+ ```bash
17
+ gemini -p "hello"
18
+ ```
19
+
20
+ ## Configure VerbalCoding
21
+
22
+ ```bash
23
+ # .env
24
+ AGENT_BACKEND=gemini
25
+ # optional
26
+ GEMINI_COMMAND="gemini -p" # default; add --model, --debug as needed
27
+ AGENT_PROJECT_CONTEXT="..."
28
+ AGENT_WORKDIR=/Users/you/code/your-project
29
+ AGENT_CHAT_TIMEOUT_MS=45000
30
+ AGENT_TASK_TIMEOUT_MS=0
31
+ ```
32
+
33
+ ## Voice phrases to switch TO Gemini
34
+
35
+ - en: `"switch to Gemini"`, `"ask Gemini ..."`, `"switch to Gemini CLI"`
36
+ - ko: `"제미나이로 전환"`, `"gemini한테 물어봐"`
37
+
38
+ The matcher accepts `gemini`, `gemini cli`, `gemini-cli`, and `제미나이`.
39
+
40
+ ## Gotchas
41
+
42
+ - **No session resume.** Same continuity story as Claude / Codex: rely on `AGENT_PROJECT_CONTEXT` and the cross-agent handoff block.
43
+ - **Long answers.** Gemini sometimes returns large structured responses; the stream sentencer splits them into TTS-able sentences. Code fences are stripped from speech (the text channel still gets the full answer with code).
44
+ - **API key.** If Gemini exits non-zero with an auth error, the bridge reports the message; the cross-agent fallback prompt offers the default agent if Gemini was a non-default route.
45
+ - **Verbose progress.** Gemini's stdout doesn't follow Hermes' `┊`-style preview format, so verbose progress mostly relies on the smart-progress LLM summarizer.
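The long-answer handling in the gotchas above (code fences dropped from speech, prose split into sentences) might look roughly like the sketch below. It is not the package's stream sentencer: `toSpeakableSentences` is a hypothetical name, it works on a complete answer rather than a stream, and the sentence split is a simple punctuation rule.

```javascript
// Build the fence marker programmatically so this example can itself sit inside a fenced block.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(`${FENCE}[\\s\\S]*?${FENCE}`, "g");

function toSpeakableSentences(answer) {
  // Drop fenced code, then collapse whitespace so sentences sit on one line.
  const prose = answer.replace(FENCED_BLOCK, " ").replace(/\s+/g, " ").trim();
  if (prose === "") return [];
  // Split after ., ! or ? when followed by whitespace.
  return prose.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
}
```

The text channel would still receive the unmodified answer; only the speech path uses the stripped sentences.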