@elvatis_com/openclaw-cli-bridge-elvatis 3.6.0 → 3.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CLAUDE.md CHANGED
@@ -31,7 +31,14 @@ OpenClaw Gateway ──(HTTP)──> proxy-server.ts ──(spawn)──> claude
  - **Compact tool schema** — when >8 tools, only send name+params (skip descriptions/full JSON schema), cuts prompt ~60%
  - **Exit 143 = our SIGTERM** — not OOM, not crash. The bridge's timeout/stale-output detector sends SIGTERM, Claude CLI exits 143
  - **Consecutive timeout rotation** — after 3 timeouts in a row on the same session, auto-expire it and create a fresh one. Prevents poisoned sessions from blocking all requests
- - **Workspace project auto-detection** — scans `~/.openclaw/workspace/` for project directories; when the prompt contains an exact match of a project name, auto-sets `workdir` and injects `[Context: Working directory is ...]` into the prompt
+ - **Workspace project auto-detection** — scans `~/.openclaw/workspace/` for project directories; when the prompt contains an exact match of a project name (from user messages only), auto-sets `workdir` and injects context
+ - **Opus escalation** — when conversations exceed 20 messages with tools, automatically routes from Sonnet to Opus. Opus handles large contexts reliably (94% success vs Sonnet's 55%)
+ - **Opus 90s stale timeout** — Opus gets 90s stale-output timeout (vs 30s for Sonnet) to allow time for long-form generation (blog posts, Lexical JSON)
+ - **Session resume: Opus only** — Sonnet/Haiku use fresh `claude -p` every call (session resume caused 45% hang rate). Opus uses `--session-id`/`--resume` for context continuity
+ - **Generic skill auto-detection** — scans `~/.openclaw/skills/` for SKILL.md files, injects pointers when prompt matches a skill name. Fully generic, works with any installed skill
+ - **First user message pinning** — original user request is always included in the prompt window, even when conversation exceeds MAX_MESSAGES
+ - **Haiku skip in tool loops** — fallback chain skips Haiku when tool_calls are expected (Haiku consistently returns text instead of tool_calls in tool loops)
+ - **Improved JSON parser** — tries multiple `{` positions for embedded JSON, rescue-from-raw strategy, handles malformed tool_calls from fallback models
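
The multi-`{` rescue strategy described in the last bullet can be sketched as follows. This is a minimal illustration only — `parseEmbeddedJson` is a hypothetical name, not an export of this package, and the real parser layers this behind its Claude-wrapper and code-block strategies:

```typescript
// Sketch: scan candidate "{" start positions and "}" end positions until one
// slice parses. O(n²) worst case, which is acceptable for CLI-sized outputs.
// parseEmbeddedJson is a hypothetical helper, not the bridge's actual parser.
function parseEmbeddedJson(raw: string): unknown | null {
  for (let start = raw.indexOf("{"); start !== -1; start = raw.indexOf("{", start + 1)) {
    for (let end = raw.lastIndexOf("}"); end > start; end = raw.lastIndexOf("}", end - 1)) {
      try {
        return JSON.parse(raw.slice(start, end + 1)); // first successful parse wins
      } catch {
        // keep trying shorter/later slices
      }
    }
  }
  return null; // no parseable JSON object found
}
```

For example, `parseEmbeddedJson('model said: {"content":"hi"} thanks')` recovers the embedded object even though the model wrapped it in prose.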
 
  ## Build & Test
 
@@ -84,9 +91,12 @@ Parser tries 5 strategies: Claude JSON wrapper, direct JSON, code blocks, embedd
 
  ## Known Issues
 
- - **Sonnet intermittent hangs** — `claude -p` with Sonnet goes completely silent (~50% of the time) on large tool prompts (20KB+). First call often works, subsequent calls hang. NOT RAM-related. Likely API-side rate limiting or request dedup. Workaround: 30s stale-output detection + Haiku fallback.
- - **Haiku empty responses** — occasionally returns zero stdout (len:0). Cause unclear. The JSON reminder at prompt end helps but doesn't fully solve it.
- - **Pre-existing tsc errors** — 5 errors about `openclaw/plugin-sdk` module not found. These are expected — the SDK is injected at runtime by the gateway. Dist output is still generated.
+ - **Sonnet intermittent hangs** — `claude -p` with Sonnet goes completely silent (~45% of requests). Session resume makes it worse (corrupted sessions after SIGTERM). Workaround: session resume disabled for Sonnet (fresh `-p` every call), auto-escalate to Opus at 20+ messages. Opus has ~94% success rate.
+ - **Sonnet session resume disabled** — session resume caused corrupted sessions when SIGTERM killed processes. Only Opus uses `--session-id`/`--resume` now. Sonnet/Haiku send the full prompt every time (more tokens, but reliable).
+ - **Haiku unreliable for tool_calls** — returns text instead of tool_calls ~80% of the time in tool loops. Skipped in fallback chain when tools are expected.
+ - **Long-form generation limit** — generating 15KB+ responses (blog posts as Lexical JSON) can exceed even Opus's 90s stale timeout. The `claude -p` CLI sometimes goes silent during long generation. No workaround from the bridge side.
+ - **Agent delegation (disabled)** — infrastructure for delegating skills to `openclaw agent` is built but disabled. `openclaw agent` is single-turn only; multi-turn skill execution needs OpenClaw-side support.
+ - **Pre-existing tsc errors** — errors about `openclaw/plugin-sdk` module not found. Expected — the SDK is injected at runtime by the gateway. Dist output is still generated.
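
The Haiku-skip behaviour noted above reduces to filtering the fallback chain. A minimal sketch (illustrative names only — `buildFallbackChain` is not a package export):

```typescript
// Sketch: drop Haiku from the fallback chain whenever tool_calls are expected,
// since Haiku tends to answer in prose rather than the required JSON shape.
// Function name and model ids are illustrative, not the bridge's actual symbols.
function buildFallbackChain(chain: string[], expectToolCalls: boolean): string[] {
  return expectToolCalls
    ? chain.filter((m) => !m.includes("haiku")) // Haiku unreliable for tool_calls
    : chain;
}
```

In a tool loop, `buildFallbackChain(["sonnet", "haiku"], true)` would leave only Sonnet as the fallback; plain-text requests keep the full chain.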
 
  ## Testing
 
package/README.md CHANGED
@@ -2,7 +2,7 @@
 
  > OpenClaw plugin that bridges locally installed AI CLIs (Codex, Gemini, Claude Code, OpenCode, Pi) as model providers — with slash commands for instant model switching, restore, health testing, and model listing.
 
- **Current version:** `3.6.0`
+ **Current version:** `3.7.0`
 
  ---
 
package/SKILL.md CHANGED
@@ -68,4 +68,4 @@ On gateway restart, if any session has expired, a **WhatsApp alert** is sent aut
 
  See `README.md` for full configuration reference and architecture diagram.
 
- **Version:** 3.6.0
+ **Version:** 3.7.0
@@ -2,7 +2,7 @@
  "id": "openclaw-cli-bridge-elvatis",
  "slug": "openclaw-cli-bridge-elvatis",
  "name": "OpenClaw CLI Bridge",
- "version": "3.6.0",
+ "version": "3.7.0",
  "license": "MIT",
  "description": "Phase 1: openai-codex auth bridge. Phase 2: local HTTP proxy routing model calls through gemini/claude CLIs (vllm provider).",
  "providers": [
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@elvatis_com/openclaw-cli-bridge-elvatis",
- "version": "3.6.0",
+ "version": "3.7.0",
  "description": "Bridges gemini, claude, and codex CLI tools as OpenClaw model providers. Reads existing CLI auth without re-login.",
  "type": "module",
  "openclaw": {
package/src/cli-runner.ts CHANGED
@@ -304,6 +304,8 @@ export interface RunCliOptions {
  */
  cwd?: string;
  timeoutMs?: number;
+ /** Override stale-output timeout (ms). Opus needs longer (90s) for long-form generation. */
+ staleTimeoutMs?: number;
  /** Optional logger for timeout events. */
  log?: (msg: string) => void;
  }
@@ -373,12 +375,13 @@ export function runCli(
  doKill(`timeout after ${Math.round(timeoutMs / 1000)}s`);
  }, timeoutMs);
 
- // ── Stale-output detection: kill if no stdout for STALE_OUTPUT_TIMEOUT_MS
- if (STALE_OUTPUT_TIMEOUT_MS > 0) {
+ // ── Stale-output detection: kill if no stdout for staleTimeoutMs
+ const effectiveStaleTimeout = opts.staleTimeoutMs ?? STALE_OUTPUT_TIMEOUT_MS;
+ if (effectiveStaleTimeout > 0) {
  const checkInterval = 15_000; // check every 15s
  staleTimer = setInterval(() => {
  const silent = Date.now() - lastOutputAt;
- if (silent >= STALE_OUTPUT_TIMEOUT_MS) {
+ if (silent >= effectiveStaleTimeout) {
  doKill(`stale output — no stdout for ${Math.round(silent / 1000)}s`);
  }
  }, checkInterval);
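
The `staleTimeoutMs ?? STALE_OUTPUT_TIMEOUT_MS` override in this hunk can be exercised in isolation with an injected clock. A minimal sketch — all names here are illustrative, not the module's actual exports:

```typescript
// Sketch: a stale-output watchdog whose limit follows staleTimeoutMs ?? default,
// mirroring the effectiveStaleTimeout logic above. Injecting `now` makes it
// testable without real timers or child processes.
const DEFAULT_STALE_MS = 30_000; // Sonnet/Haiku default

function makeStaleWatchdog(staleTimeoutMs: number | undefined, now: () => number) {
  const limit = staleTimeoutMs ?? DEFAULT_STALE_MS;
  let lastOutputAt = now();
  return {
    touch() { lastOutputAt = now(); },                                // call on every stdout chunk
    isStale() { return limit > 0 && now() - lastOutputAt >= limit; }, // poll on an interval
  };
}
```

An Opus call would construct this with `90_000`; any stdout chunk calls `touch()` and resets the silence clock, and a periodic check kills the child once `isStale()` reports true.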
@@ -667,7 +670,9 @@ export async function runClaude(
 
  const model = stripPrefix(modelId);
  const session = getOrCreateSession("claude", model);
- const isResume = session.requestCount > 0;
+ // Session resume: enabled for Opus (reliable), disabled for Sonnet/Haiku (45% hang rate)
+ const isOpus = model.includes("opus");
+ const isResume = isOpus && session.requestCount > 0;
 
  const args: string[] = [
  "-p",
@@ -679,24 +684,28 @@ export async function runClaude(
 
  if (isResume) {
  args.push("--resume", session.sessionId);
- } else {
+ } else if (isOpus) {
  args.push("--session-id", session.sessionId);
  }
+ // Sonnet/Haiku: no session args — fresh call every time
 
- // When tools are present, sandwich the conversation between tool instructions.
- // On resume: only send the last user message (Claude has the full history).
- // On first request: send the full prompt with tool block.
+ // On resume: only send the last user message (Opus has the full history).
+ // On fresh: send the full prompt with tool block.
  const effectivePrompt = opts?.tools?.length
- ? buildToolPromptBlock(opts.tools) + "\n\n" + prompt + "\n\nREMINDER: You MUST respond with ONLY valid JSON — either {\"tool_calls\":[...]} or {\"content\":\"...\"}. Nothing else."
+ ? (isResume
+ ? prompt + "\n\nREMINDER: You MUST respond with ONLY valid JSON — either {\"tool_calls\":[...]} or {\"content\":\"...\"}. Nothing else."
+ : buildToolPromptBlock(opts.tools) + "\n\n" + prompt + "\n\nREMINDER: You MUST respond with ONLY valid JSON — either {\"tool_calls\":[...]} or {\"content\":\"...\"}. Nothing else.")
  : prompt;
 
  const cwd = workdir ?? homedir();
- debugLog("CLAUDE", `${isResume ? "resume" : "new"} ${model} session=${session.sessionId.slice(0, 8)}`, {
+ debugLog("CLAUDE", `${isResume ? "resume" : "fresh"} ${model}${isResume ? ` session=${session.sessionId.slice(0, 8)}` : ""}`, {
  promptLen: effectivePrompt.length, promptKB: Math.round(effectivePrompt.length / 1024),
- requestCount: session.requestCount, cwd, timeoutMs: Math.round(timeoutMs / 1000),
+ cwd, timeoutMs: Math.round(timeoutMs / 1000), ...(isOpus ? { requestCount: session.requestCount } : {}),
  });
 
- const result = await runCli("claude", args, effectivePrompt, timeoutMs, { cwd, log: opts?.log });
+ // Opus gets a 90s stale timeout — it needs time for long-form generation (blog posts, Lexical JSON)
+ const staleMs = isOpus ? 90_000 : undefined;
+ const result = await runCli("claude", args, effectivePrompt, timeoutMs, { cwd, log: opts?.log, staleTimeoutMs: staleMs });
 
  // Session succeeded — update registry
  if (result.exitCode === 0 || result.stdout.length > 0) {
@@ -1066,43 +1075,27 @@ function detectSkillHints(userText: string): string | null {
  const skills = getSkillRegistry();
  if (!skills.length) return null;
 
- const lower = userText.toLowerCase();
  const matched: SkillEntry[] = [];
 
  for (const skill of skills) {
- // Match by exact skill name in prompt
+ // Match by exact skill name in prompt only
  const nameRegex = new RegExp(`\\b${skill.name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")}\\b`, "i");
  if (nameRegex.test(userText)) {
  matched.push(skill);
- continue;
- }
- // Match by description keywords (need at least 2 keyword hits)
- const uniqueKeywords = [...new Set(skill.keywords)];
- const hits = uniqueKeywords.filter(kw => lower.includes(kw.toLowerCase()));
- if (hits.length >= 2) {
- matched.push(skill);
  }
  }
 
  if (!matched.length) return null;
 
+ // Keep hints compact — every byte counts at high message counts
  const hints = matched.map(skill => {
- const lines = [
- `[Skill: ${skill.name}]`,
- `Read the skill instructions with the read tool: ${skill.path}/SKILL.md`,
- `Then follow the workflow step by step using the available tools (read, exec, web_fetch, etc.).`,
- ];
- if (skill.scripts.length > 0) {
- lines.push(`Available scripts (use exec tool to run them):`);
- for (const s of skill.scripts) {
- lines.push(` - python3 ${skill.path}/scripts/${s}`);
- }
- lines.push(`Always use exec to run scripts. Do NOT output results as plain text when a script can do it.`);
- }
- return lines.join("\n");
+ const scripts = skill.scripts.length > 0
+ ? ` Scripts: ${skill.scripts.map(s => `${skill.path}/scripts/${s}`).join(", ")}`
+ : "";
+ return `[Skill: ${skill.name}] Read: ${skill.path}/SKILL.md follow workflow with read/exec tools.${scripts}`;
  });
 
- return hints.join("\n\n");
+ return hints.join("\n");
  }
 
  /**
@@ -1132,7 +1125,7 @@ export async function routeToCliRunner(
  // Auto-detect project from user messages only (not tool results which mention other projects)
  const userText = messages
  .filter((m) => m.role === "user")
- .map((m) => typeof m.content === "string" ? m.content : "")
+ .map((m) => contentToString(m.content))
  .join(" ");
 
  if (!opts.workdir) {
@@ -1144,11 +1137,11 @@ export async function routeToCliRunner(
  }
  }
 
- // Skill hints: inject pointers to local skill files when user prompt matches known patterns
+ // Skill hints: inject at END of prompt so they're the freshest context (not buried under system msg)
  const skillHints = detectSkillHints(userText);
  if (skillHints) {
- prompt = `${skillHints}\n\n${prompt}`;
- debugLog("SKILL-HINT", "injected skill hints", { len: skillHints.length });
+ prompt = `${prompt}\n\n${skillHints}`;
+ debugLog("SKILL-HINT", "injected skill hints at end of prompt", { len: skillHints.length });
  }
 
  // Strip "vllm/" prefix if present — OpenClaw sends the full provider path
package/src/config.ts CHANGED
@@ -78,6 +78,12 @@ export const TOOL_HEAVY_THRESHOLD = 10;
  */
  export const TOOL_ROUTING_THRESHOLD = 8;
 
+ /**
+ * Prompt size threshold (bytes) for escalating Sonnet to Opus.
+ * Sonnet hangs ~50% at 30KB+ prompts. Opus handles large contexts reliably.
+ */
+ export const OPUS_ESCALATION_THRESHOLD = 30_000;
+
  /** Max characters per message content before truncation. */
  export const MAX_MSG_CHARS = 4_000;
 
@@ -9,7 +9,8 @@
  */
 
  import http from "node:http";
- import { randomBytes } from "node:crypto";
+ import { execSync } from "node:child_process";
+ import { randomBytes, createHash } from "node:crypto";
  import { type ChatMessage, type CliToolResult, type ToolDefinition, routeToCliRunner, extractMultimodalParts, cleanupMediaFiles } from "./cli-runner.js";
  import { scheduleTokenRefresh, setAuthLogger, stopTokenRefresh } from "./claude-auth.js";
  import { grokComplete, grokCompleteStream, type ChatMessage as GrokChatMessage } from "./grok-client.js";
@@ -33,9 +34,118 @@ import {
  BITNET_SYSTEM_PROMPT,
  DEFAULT_MODEL_TIMEOUTS,
  TOOL_ROUTING_THRESHOLD,
+ OPUS_ESCALATION_THRESHOLD,
  } from "./config.js";
  import { debugLog, DEBUG_LOG_PATH, getLogTail, watchLogFile, setDebugLogEnabled } from "./debug-log.js";
 
+ // ── Skill delegation via openclaw agent ─────────────────────────────────────
+
+ import { existsSync, readdirSync, statSync } from "node:fs";
+ import { join } from "node:path";
+ import { homedir } from "node:os";
+ import { spawn as spawnChild } from "node:child_process";
+
+ const activeDelegations = new Set<string>();
+
+ function extractUserText(messages: ChatMessage[]): string {
+ return messages
+ .filter((m) => m.role === "user")
+ .map((m) => {
+ if (typeof m.content === "string") return m.content;
+ if (Array.isArray(m.content)) {
+ return (m.content as Array<{ type: string; text?: string }>)
+ .filter((p) => p.type === "text" && p.text)
+ .map((p) => p.text!)
+ .join(" ");
+ }
+ return "";
+ })
+ .join(" ");
+ }
+
+ let _skillNames: string[] | null = null;
+ let _skillNamesAt = 0;
+
+ function getSkillNames(): string[] {
+ const now = Date.now();
+ if (_skillNames && (now - _skillNamesAt) < 120_000) return _skillNames;
+ _skillNames = [];
+ const dir = join(homedir(), ".openclaw", "skills");
+ try {
+ if (!existsSync(dir)) return _skillNames;
+ for (const name of readdirSync(dir)) {
+ try {
+ if (statSync(join(dir, name)).isDirectory() && existsSync(join(dir, name, "SKILL.md"))) {
+ _skillNames.push(name);
+ }
+ } catch {}
+ }
+ } catch {}
+ _skillNamesAt = now;
+ return _skillNames;
+ }
+
+ function detectMatchedSkill(userText: string): string | null {
+ for (const name of getSkillNames()) {
+ const re = new RegExp(`\\b${name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")}\\b`, "i");
+ if (re.test(userText)) return name;
+ }
+ return null;
+ }
+
+ async function delegateToAgent(prompt: string, timeoutMs: number): Promise<{ text: string; durationMs: number }> {
+ const start = Date.now();
+ const timeoutSec = Math.min(Math.floor(timeoutMs / 1000), 300);
+
+ return new Promise((resolve, reject) => {
+ // Use the same Node + openclaw entry point as the systemd service to avoid version mismatches
+ const openclawEntry = join(homedir(), ".npm-global", "lib", "node_modules", "openclaw", "dist", "entry.js");
+ const useEntryJs = existsSync(openclawEntry);
+ const cmd = useEntryJs ? process.execPath : "openclaw"; // process.execPath = /usr/bin/node
+ const args = useEntryJs
+ ? [openclawEntry, "agent", "--agent", "main", "--message", prompt, "--json", "--timeout", String(timeoutSec)]
+ : ["agent", "--agent", "main", "--message", prompt, "--json", "--timeout", String(timeoutSec)];
+ const child = spawnChild(cmd, args, {
+ env: { ...process.env, PATH: `${join(homedir(), ".local", "bin")}:${process.env.PATH ?? ""}` },
+ stdio: ["pipe", "pipe", "pipe"],
+ });
+
+ let stdout = "";
+ let stderr = "";
+ child.stdout?.on("data", (d: Buffer) => { stdout += d.toString(); });
+ child.stderr?.on("data", (d: Buffer) => { stderr += d.toString(); });
+
+ const timer = setTimeout(() => { child.kill("SIGTERM"); }, timeoutMs + 10_000);
+
+ child.on("close", (code) => {
+ clearTimeout(timer);
+ const durationMs = Date.now() - start;
+ // Only fail if no JSON output at all — stderr always has plugin log noise
+ const hasJsonOutput = stdout.includes('"status"') || stdout.includes('"result"');
+ if (code !== 0 && !hasJsonOutput) {
+ // Filter out plugin log lines from stderr to find real errors
+ const realErrors = stderr.split("\n").filter(l => !l.includes("[plugins]") && !l.includes("[memory-") && l.trim()).join("\n");
+ reject(new Error(`openclaw agent exited ${code}: ${realErrors.slice(0, 500) || stderr.slice(0, 500)}`));
+ return;
+ }
+ try {
+ const jsonStart = stdout.indexOf("{");
+ if (jsonStart === -1) {
+ reject(new Error("No JSON in openclaw agent output"));
+ return;
+ }
+ const result = JSON.parse(stdout.slice(jsonStart));
+ const text = result?.result?.payloads?.[0]?.text ?? result?.result?.text ?? "";
+ resolve({ text, durationMs });
+ } catch (e) {
+ reject(new Error(`Failed to parse agent result: ${(e as Error).message}`));
+ }
+ });
+
+ child.on("error", (err) => { clearTimeout(timer); reject(err); });
+ });
+ }
+
  // ── Active request tracking ─────────────────────────────────────────────────
 
  export interface ActiveRequest {
@@ -846,15 +956,107 @@ async function handleRequest(
  }
  // ─────────────────────────────────────────────────────────────────────────
 
+ // ── Skill delegation: delegate to openclaw agent for full workflow execution ──
+ const userText = extractUserText(cleanMessages);
+ const matchedSkill = detectMatchedSkill(userText);
+ const delegationKey = matchedSkill ? `${matchedSkill}:${createHash("md5").update(userText.slice(0, 500)).digest("hex").slice(0, 12)}` : null;
+
+ // TODO: delegation needs a multi-turn agent runner, not single-turn `openclaw agent`.
+ // `openclaw agent` returns after one turn (220ms) without executing the full workflow.
+ // Re-enable when OpenClaw supports multi-turn skill execution (e.g., `openclaw skill run blog-writer`).
+ if (false && matchedSkill && delegationKey && activeDelegations.size === 0) {
+ debugLog("DELEGATE", `skill "${matchedSkill}" detected, delegating to openclaw agent`, { msgs: cleanMessages.length });
+ activeDelegations.add(delegationKey);
+
+ // Send SSE headers early if streaming
+ if (stream) {
+ res.writeHead(200, { "Content-Type": "text/event-stream", "Cache-Control": "no-cache", Connection: "keep-alive", ...corsHeaders() });
+ res.write(": delegating to openclaw agent\n\n");
+ // Keepalive while agent runs
+ const ka = setInterval(() => { res.write(": agent working\n\n"); }, 15_000);
+ try {
+ const lastUser = [...cleanMessages].reverse().find(m => m.role === "user");
+ const delegatePrompt = typeof lastUser?.content === "string" ? lastUser.content
+ : Array.isArray(lastUser?.content) ? (lastUser!.content as Array<{ type: string; text?: string }>).filter(p => p.type === "text").map(p => p.text).join(" ")
+ : userText.slice(-2000);
+
+ const agentResult = await delegateToAgent(delegatePrompt, MAX_EFFECTIVE_TIMEOUT_MS);
+ debugLog("DELEGATE-OK", `skill "${matchedSkill}" completed in ${(agentResult.durationMs / 1000).toFixed(1)}s`, { contentLen: agentResult.text.length });
+ metrics.recordRequest(model, agentResult.durationMs, true, estPromptTokens, estimateTokens(agentResult.text), promptPreview);
+
+ const chunk = { id, object: "chat.completion.chunk", created, model, choices: [{ index: 0, delta: { role: "assistant", content: agentResult.text }, finish_reason: "stop" }] };
+ res.write(`data: ${JSON.stringify(chunk)}\n\n`);
+ res.write("data: [DONE]\n\n");
+ res.end();
+ } catch (err) {
+ const msg = (err as Error).message;
+ debugLog("DELEGATE-FAIL", `skill "${matchedSkill}" failed`, { error: msg.slice(0, 200) });
+ opts.warn(`[cli-bridge] agent delegation failed: ${msg.slice(0, 100)}`);
+ clearInterval(ka);
+ activeDelegations.delete(delegationKey);
+ // Can't fall through after sending SSE headers — send error
+ res.write(`data: ${JSON.stringify({ error: { message: `Agent delegation failed: ${msg.slice(0, 200)}. Retrying via CLI.`, type: "cli_error" } })}\n\n`);
+ res.write("data: [DONE]\n\n");
+ res.end();
+ activeRequests.delete(id);
+ cleanupMediaFiles(mediaFiles);
+ return;
+ } finally {
+ clearInterval(ka);
+ activeDelegations.delete(delegationKey);
+ }
+ activeRequests.delete(id);
+ cleanupMediaFiles(mediaFiles);
+ return;
+ }
+
+ // Non-streaming delegation
+ try {
+ const lastUser = [...cleanMessages].reverse().find(m => m.role === "user");
+ const delegatePrompt = typeof lastUser?.content === "string" ? lastUser.content
+ : Array.isArray(lastUser?.content) ? (lastUser!.content as Array<{ type: string; text?: string }>).filter(p => p.type === "text").map(p => p.text).join(" ")
+ : userText.slice(-2000);
+
+ const agentResult = await delegateToAgent(delegatePrompt, MAX_EFFECTIVE_TIMEOUT_MS);
+ debugLog("DELEGATE-OK", `skill "${matchedSkill}" completed in ${(agentResult.durationMs / 1000).toFixed(1)}s`, { contentLen: agentResult.text.length });
+
+ res.writeHead(200, { "Content-Type": "application/json", ...corsHeaders() });
+ res.end(JSON.stringify({
+ id, object: "chat.completion", created, model,
+ choices: [{ index: 0, message: { role: "assistant", content: agentResult.text }, finish_reason: "stop" }],
+ usage: { prompt_tokens: estPromptTokens, completion_tokens: estimateTokens(agentResult.text), total_tokens: estPromptTokens + estimateTokens(agentResult.text) },
+ }));
+ activeRequests.delete(id);
+ cleanupMediaFiles(mediaFiles);
+ return;
+ } catch (err) {
+ debugLog("DELEGATE-FAIL", `skill "${matchedSkill}" failed, falling through to CLI`, { error: (err as Error).message.slice(0, 200) });
+ activeDelegations.delete(delegationKey);
+ // Fall through to normal CLI routing
+ } finally {
+ activeDelegations.delete(delegationKey);
+ }
+ }
+
  // ── CLI runner routing (Gemini / Claude Code / Codex) ──────────────────────
  let result: CliToolResult;
  let usedModel = model;
 
- // ── Smart tool routing: Sonnet first (better reasoning), fast fallback to Haiku ──
- // Sonnet picks the right tools but intermittently hangs on large prompts.
- // Strategy: let Sonnet try first if it responds, great (better tool selection).
- // The stale-output detector (60s) kills it fast if it hangs, then fallback to Haiku.
- // This preserves Sonnet's intelligence for tool selection while keeping Haiku as safety net.
+ // ── Opus escalation: route heavy conversations to Opus instead of Sonnet ──
+ // Sonnet hangs ~50% at 30KB+ prompts. Opus handles large contexts reliably.
+ // Measure by message count (proxy for formatted prompt size after truncation):
+ // - With 21 tools + 12 messages (heavy tools window), prompt hits ~30KB
+ // - Escalate when messages > 20 (conversation is deep enough to cause hangs)
+ const shouldEscalate = model === "cli-claude/claude-sonnet-4-6"
+ && cleanMessages.length > 20
+ && hasTools;
+ if (shouldEscalate) {
+ const originalModel = model;
+ usedModel = "cli-claude/claude-opus-4-6";
+ debugLog("OPUS-ESCALATE", `${originalModel} → ${usedModel}`, { msgs: cleanMessages.length, tools: tools?.length ?? 0 });
+ opts.log(`[cli-bridge] escalating to Opus (${cleanMessages.length} msgs with ${tools?.length ?? 0} tools)`);
+ }
 
  const routeOpts = { workdir, tools: hasTools ? tools : undefined, mediaFiles: mediaFiles.length ? mediaFiles : undefined, log: opts.log };
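
The escalation rule in the hunk above reduces to a small predicate. A sketch under assumed shapes — `pickModel` is illustrative, not the proxy's actual symbol, though the model ids and the >20-message threshold mirror the diff:

```typescript
// Sketch: escalate Sonnet to Opus when a tool conversation is deep enough
// (>20 messages) that the formatted prompt likely exceeds ~30KB.
// pickModel is a hypothetical helper; thresholds/ids are taken from the diff.
function pickModel(model: string, messageCount: number, hasTools: boolean): string {
  const shouldEscalate =
    model === "cli-claude/claude-sonnet-4-6" && messageCount > 20 && hasTools;
  return shouldEscalate ? "cli-claude/claude-opus-4-6" : model;
}
```

Only Sonnet escalates; Haiku and already-Opus conversations, and conversations without tools, keep their original routing.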