npm - @elvatis_com/openclaw-cli-bridge-elvatis - Versions diffs - 3.5.1 → 3.7.0 - Mend

Files changed (8)

package/CLAUDE.md CHANGED

@@ -31,7 +31,14 @@ OpenClaw Gateway ──(HTTP)──> proxy-server.ts ──(spawn)──> claude
 - **Compact tool schema** — when >8 tools, only send name+params (skip descriptions/full JSON schema), cuts prompt ~60%
 - **Exit 143 = our SIGTERM** — not OOM, not crash. The bridge's timeout/stale-output detector sends SIGTERM, Claude CLI exits 143
 - **Consecutive timeout rotation** — after 3 timeouts in a row on the same session, auto-expire it and create a fresh one. Prevents poisoned sessions from blocking all requests
-- **Workspace project auto-detection** — scans `~/.openclaw/workspace/` for project directories; when the prompt contains an exact match of a project name, auto-sets `workdir` and injects `[Context: Working directory is ...]` into the prompt
+- **Workspace project auto-detection** — scans `~/.openclaw/workspace/` for project directories; when the prompt contains an exact match of a project name (from user messages only), auto-sets `workdir` and injects context
+- **Opus escalation** — when conversations exceed 20 messages with tools, automatically routes from Sonnet to Opus. Opus handles large contexts reliably (94% success vs Sonnet's 55%)
+- **Opus 90s stale timeout** — Opus gets 90s stale-output timeout (vs 30s for Sonnet) to allow time for long-form generation (blog posts, Lexical JSON)
+- **Session resume: Opus only** — Sonnet/Haiku use fresh `claude -p` every call (session resume caused 45% hang rate). Opus uses `--session-id`/`--resume` for context continuity
+- **Generic skill auto-detection** — scans `~/.openclaw/skills/` for SKILL.md files, injects pointers when prompt matches a skill name. Fully generic, works with any installed skill
+- **First user message pinning** — original user request is always included in the prompt window, even when conversation exceeds MAX_MESSAGES
+- **Haiku skip in tool loops** — fallback chain skips Haiku when tool_calls are expected (Haiku consistently returns text instead of tool_calls in tool loops)
+- **Improved JSON parser** — tries multiple `{` positions for embedded JSON, rescue-from-raw strategy, handles malformed tool_calls from fallback models
 ## Build & Test
@@ -84,9 +91,12 @@ Parser tries 5 strategies: Claude JSON wrapper, direct JSON, code blocks, embedd
 ## Known Issues
-- **Sonnet intermittent hangs** — `claude -p` with Sonnet goes completely silent (~50% of the time) on large tool prompts (20KB+). First call often works, subsequent calls hang. NOT RAM-related. Likely API-side rate limiting or request dedup. Workaround: 30s stale-output detection + Haiku fallback.
-- **Haiku empty responses** — occasionally returns zero stdout (len:0). Cause unclear. The JSON reminder at prompt end helps but doesn't fully solve it.
-- **Pre-existing tsc errors** — 5 errors about `openclaw/plugin-sdk` module not found. These are expected — the SDK is injected at runtime by the gateway. Dist output is still generated.
+- **Sonnet intermittent hangs** — `claude -p` with Sonnet goes completely silent (~45% of requests). Session resume makes it worse (corrupted sessions after SIGTERM). Workaround: session resume disabled for Sonnet (fresh `-p` every call), auto-escalate to Opus at 20+ messages. Opus has ~94% success rate.
+- **Sonnet session resume disabled** — session resume caused corrupted sessions when SIGTERM killed processes. Only Opus uses `--session-id`/`--resume` now. Sonnet/Haiku send the full prompt every time (more tokens, but reliable).
+- **Haiku unreliable for tool_calls** — returns text instead of tool_calls ~80% of the time in tool loops. Skipped in fallback chain when tools are expected.
+- **Long-form generation limit** — generating 15KB+ responses (blog posts as Lexical JSON) can exceed even Opus's 90s stale timeout. The `claude -p` CLI sometimes goes silent during long generation. No workaround from the bridge side.
+- **Agent delegation (disabled)** — infrastructure for delegating skills to `openclaw agent` is built but disabled. `openclaw agent` is single-turn only; multi-turn skill execution needs OpenClaw-side support.
+- **Pre-existing tsc errors** — errors about `openclaw/plugin-sdk` module not found. Expected — the SDK is injected at runtime by the gateway. Dist output is still generated.
 ## Testing
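
The "First user message pinning" bullet in the CLAUDE.md hunk above says the original user request stays in the prompt window even when the conversation exceeds MAX_MESSAGES. The diff does not show that code, so what follows is only a minimal sketch of one way such a window could work. `Msg`, `windowWithPinnedFirstUser`, and the `maxMessages` parameter are hypothetical names chosen for illustration, not the package's API.

```typescript
// Hypothetical message shape; the real ChatMessage type may carry more fields.
interface Msg {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Keep the newest `maxMessages` entries, but make sure the first user message
// survives. Assumes maxMessages >= 1.
function windowWithPinnedFirstUser(messages: Msg[], maxMessages: number): Msg[] {
  if (messages.length <= maxMessages) return messages.slice();
  const tail = messages.slice(-maxMessages);
  const firstUser = messages.find((m) => m.role === "user");
  // Nothing to pin, or the first user message is already inside the window.
  if (!firstUser || tail.includes(firstUser)) return tail;
  // Drop the oldest entry of the tail so the window size stays fixed.
  return [firstUser, ...tail.slice(1)];
}
```

This variant trades the oldest message in the window for the pinned one, so the window never grows past `maxMessages`. Whether the package keeps the size fixed or allows one extra message is not visible in this diff.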

package/README.md CHANGED

@@ -2,7 +2,7 @@
 > OpenClaw plugin that bridges locally installed AI CLIs (Codex, Gemini, Claude Code, OpenCode, Pi) as model providers — with slash commands for instant model switching, restore, health testing, and model listing.
-**Current version:** `3.5.1`
+**Current version:** `3.7.0`
 ---

package/SKILL.md CHANGED

@@ -68,4 +68,4 @@ On gateway restart, if any session has expired, a **WhatsApp alert** is sent aut
 See `README.md` for full configuration reference and architecture diagram.
-**Version:** 3.5.1
+**Version:** 3.7.0

package/openclaw.plugin.json CHANGED

@@ -2,7 +2,7 @@
   "id": "openclaw-cli-bridge-elvatis",
   "slug": "openclaw-cli-bridge-elvatis",
   "name": "OpenClaw CLI Bridge",
-  "version": "3.5.1",
+  "version": "3.7.0",
   "license": "MIT",
   "description": "Phase 1: openai-codex auth bridge. Phase 2: local HTTP proxy routing model calls through gemini/claude CLIs (vllm provider).",
   "providers": [

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@elvatis_com/openclaw-cli-bridge-elvatis",
-  "version": "3.5.1",
+  "version": "3.7.0",
   "description": "Bridges gemini, claude, and codex CLI tools as OpenClaw model providers. Reads existing CLI auth without re-login.",
   "type": "module",
   "openclaw": {

package/src/cli-runner.ts CHANGED

@@ -304,6 +304,8 @@ export interface RunCliOptions {
    */
   cwd?: string;
   timeoutMs?: number;
+  /** Override stale-output timeout (ms). Opus needs longer (90s) for long-form generation. */
+  staleTimeoutMs?: number;
   /** Optional logger for timeout events. */
   log?: (msg: string) => void;
 }
@@ -373,12 +375,13 @@ export function runCli(
       doKill(`timeout after ${Math.round(timeoutMs / 1000)}s`);
     }, timeoutMs);
-    // ── Stale-output detection: kill if no stdout for STALE_OUTPUT_TIMEOUT_MS
-    if (STALE_OUTPUT_TIMEOUT_MS > 0) {
+    // ── Stale-output detection: kill if no stdout for staleTimeoutMs
+    const effectiveStaleTimeout = opts.staleTimeoutMs ?? STALE_OUTPUT_TIMEOUT_MS;
+    if (effectiveStaleTimeout > 0) {
       const checkInterval = 15_000; // check every 15s
       staleTimer = setInterval(() => {
         const silent = Date.now() - lastOutputAt;
-        if (silent >= STALE_OUTPUT_TIMEOUT_MS) {
+        if (silent >= effectiveStaleTimeout) {
           doKill(`stale output — no stdout for ${Math.round(silent / 1000)}s`);
         }
       }, checkInterval);
@@ -667,7 +670,9 @@ export async function runClaude(
   const model = stripPrefix(modelId);
   const session = getOrCreateSession("claude", model);
-  const isResume = session.requestCount > 0;
+  // Session resume: enabled for Opus (reliable), disabled for Sonnet/Haiku (45% hang rate)
+  const isOpus = model.includes("opus");
+  const isResume = isOpus && session.requestCount > 0;
   const args: string[] = [
     "-p",
@@ -679,24 +684,28 @@ export async function runClaude(
   if (isResume) {
     args.push("--resume", session.sessionId);
-  } else {
+  } else if (isOpus) {
     args.push("--session-id", session.sessionId);
   }
+  // Sonnet/Haiku: no session args — fresh call every time
-  // When tools are present, sandwich the conversation between tool instructions.
-  // On resume: only send the last user message (Claude has the full history).
-  // On first request: send the full prompt with tool block.
+  // On resume: only send the last user message (Opus has the full history).
+  // On fresh: send the full prompt with tool block.
   const effectivePrompt = opts?.tools?.length
-    ? buildToolPromptBlock(opts.tools) + "\n\n" + prompt + "\n\nREMINDER: You MUST respond with ONLY valid JSON — either {\"tool_calls\":[...]} or {\"content\":\"...\"}. Nothing else."
+    ? (isResume
+      ? prompt + "\n\nREMINDER: You MUST respond with ONLY valid JSON — either {\"tool_calls\":[...]} or {\"content\":\"...\"}. Nothing else."
+      : buildToolPromptBlock(opts.tools) + "\n\n" + prompt + "\n\nREMINDER: You MUST respond with ONLY valid JSON — either {\"tool_calls\":[...]} or {\"content\":\"...\"}. Nothing else.")
     : prompt;
   const cwd = workdir ?? homedir();
-  debugLog("CLAUDE", `${isResume ? "resume" : "new"} ${model} session=${session.sessionId.slice(0, 8)}`, {
+  debugLog("CLAUDE", `${isResume ? "resume" : "fresh"} ${model}${isResume ? ` session=${session.sessionId.slice(0, 8)}` : ""}`, {
     promptLen: effectivePrompt.length, promptKB: Math.round(effectivePrompt.length / 1024),
-    requestCount: session.requestCount, cwd, timeoutMs: Math.round(timeoutMs / 1000),
+    cwd, timeoutMs: Math.round(timeoutMs / 1000), ...(isOpus ? { requestCount: session.requestCount } : {}),
   });
-  const result = await runCli("claude", args, effectivePrompt, timeoutMs, { cwd, log: opts?.log });
+  // Opus gets 90s stale timeout — it needs think time for long-form generation (blog posts, Lexical JSON)
+  const staleMs = isOpus ? 90_000 : undefined;
+  const result = await runCli("claude", args, effectivePrompt, timeoutMs, { cwd, log: opts?.log, staleTimeoutMs: staleMs });
   // Session succeeded — update registry
   if (result.exitCode === 0 || result.stdout.length > 0) {
@@ -1003,6 +1012,92 @@ function detectProjectFromPrompt(prompt: string): { name: string; path: string }
   return null;
 }
+// ── Skill hint injection ─────────────────────────────────────────────────────
+// Scans ~/.openclaw/skills/ for skill directories with SKILL.md files.
+// When user prompt mentions a skill name (from the directory name or the SKILL.md
+// description), injects a pointer so the model knows where to find it.
+interface SkillEntry {
+  name: string;
+  path: string;
+  description: string;
+  keywords: string[];
+  scripts: string[];
+}
+let _skillRegistry: SkillEntry[] | null = null;
+let _skillRegistryRefreshedAt = 0;
+const SKILL_REGISTRY_CACHE_TTL = 120_000; // refresh every 2 min
+function getSkillRegistry(): SkillEntry[] {
+  const now = Date.now();
+  if (_skillRegistry && (now - _skillRegistryRefreshedAt) < SKILL_REGISTRY_CACHE_TTL) {
+    return _skillRegistry;
+  }
+  _skillRegistry = [];
+  const skillsDir = join(homedir(), ".openclaw", "skills");
+  try {
+    if (!existsSync(skillsDir)) return _skillRegistry;
+    const entries = readdirSync(skillsDir);
+    for (const name of entries) {
+      const skillDir = join(skillsDir, name);
+      const skillMd = join(skillDir, "SKILL.md");
+      try {
+        if (!statSync(skillDir).isDirectory()) continue;
+        if (!existsSync(skillMd)) continue;
+        // Read first 500 chars of SKILL.md to extract description and keywords
+        const content = readFileSync(skillMd, "utf8").slice(0, 500);
+        const descMatch = content.match(/description:\s*"([^"]+)"/);
+        const description = descMatch?.[1] ?? "";
+        // Build keywords from: skill name, words in description, hyphen-split name parts
+        const keywords = [
+          name,
+          ...name.split("-"),
+          ...description.toLowerCase().split(/[\s,.:;]+/).filter(w => w.length > 3),
+        ];
+        // Find scripts
+        const scriptsDir = join(skillDir, "scripts");
+        let scripts: string[] = [];
+        try {
+          if (existsSync(scriptsDir) && statSync(scriptsDir).isDirectory()) {
+            scripts = readdirSync(scriptsDir).filter(f => f.endsWith(".py") || f.endsWith(".sh"));
+          }
+        } catch { /* no scripts dir */ }
+        _skillRegistry.push({ name, path: skillDir, description, keywords, scripts });
+      } catch { /* skip unreadable skill */ }
+    }
+  } catch { /* no skills dir */ }
+  _skillRegistryRefreshedAt = now;
+  return _skillRegistry;
+}
+function detectSkillHints(userText: string): string | null {
+  const skills = getSkillRegistry();
+  if (!skills.length) return null;
+  const matched: SkillEntry[] = [];
+  for (const skill of skills) {
+    // Match by exact skill name in prompt only
+    const nameRegex = new RegExp(`\\b${skill.name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")}\\b`, "i");
+    if (nameRegex.test(userText)) {
+      matched.push(skill);
+    }
+  }
+  if (!matched.length) return null;
+  // Keep hints compact — every byte counts at high message counts
+  const hints = matched.map(skill => {
+    const scripts = skill.scripts.length > 0
+      ? ` Scripts: ${skill.scripts.map(s => `${skill.path}/scripts/${s}`).join(", ")}`
+      : "";
+    return `[Skill: ${skill.name}] Read: ${skill.path}/SKILL.md — follow workflow with read/exec tools.${scripts}`;
+  });
+  return hints.join("\n");
+}
 /**
  * Route a chat completion to the correct CLI based on model prefix.
  *   cli-gemini/<id>      → gemini CLI
@@ -1028,11 +1123,12 @@ export async function routeToCliRunner(
   const hasTools = toolCount > 0;
   // Auto-detect project from user messages only (not tool results which mention other projects)
+  const userText = messages
+    .filter((m) => m.role === "user")
+    .map((m) => contentToString(m.content))
+    .join(" ");
   if (!opts.workdir) {
-    const userText = messages
-      .filter((m) => m.role === "user")
-      .map((m) => typeof m.content === "string" ? m.content : "")
-      .join(" ");
     const detected = detectProjectFromPrompt(userText);
     if (detected) {
       opts = { ...opts, workdir: detected.path };
@@ -1041,6 +1137,13 @@ export async function routeToCliRunner(
     }
   }
+  // Skill hints: inject at END of prompt so they're the freshest context (not buried under system msg)
+  const skillHints = detectSkillHints(userText);
+  if (skillHints) {
+    prompt = `${prompt}\n\n${skillHints}`;
+    debugLog("SKILL-HINT", "injected skill hints at end of prompt", { len: skillHints.length });
+  }
   // Strip "vllm/" prefix if present — OpenClaw sends the full provider path
   // (e.g. "vllm/cli-claude/claude-sonnet-4-6") but the router only needs the
   // "cli-<type>/<model>" portion.
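
The `detectSkillHints` function added in this file matches each skill name against the user text with a case-insensitive, word-bounded regular expression, escaping regex metacharacters in the name first. The standalone sketch below isolates that matching step so it can be tried without a skills directory. `escapeRegExp` and `matchSkillNames` are illustrative names, not functions exported by the package.

```typescript
// Escape characters that have special meaning in a regular expression,
// so a skill name such as "a.b" only matches itself.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the skill names that appear in the text as whole words.
// Note: \b only anchors next to word characters, so names that start or end
// with punctuation may not match the way one would expect.
function matchSkillNames(skillNames: string[], userText: string): string[] {
  return skillNames.filter((name) =>
    new RegExp(`\\b${escapeRegExp(name)}\\b`, "i").test(userText),
  );
}

// Example: "seo" is not matched inside "Seoul" because of the word boundary.
const exampleMatches = matchSkillNames(
  ["blog-writer", "seo"],
  "Use Blog-Writer for the Seoul post",
);
```

Matching on the exact name only, as the hunk does, keeps false positives low. The `keywords` array built in `getSkillRegistry` is not consulted by the matching code shown in this diff.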

package/src/config.ts CHANGED

@@ -78,6 +78,12 @@ export const TOOL_HEAVY_THRESHOLD = 10;
  */
 export const TOOL_ROUTING_THRESHOLD = 8;
+/**
+ * Prompt size threshold (bytes) for escalating Sonnet to Opus.
+ * Sonnet hangs ~50% at 30KB+ prompts. Opus handles large contexts reliably.
+ */
+export const OPUS_ESCALATION_THRESHOLD = 30_000;
 /** Max characters per message content before truncation. */
 export const MAX_MSG_CHARS = 4_000;

package/src/proxy-server.ts CHANGED

@@ -9,7 +9,8 @@
  */
 import http from "node:http";
-import { randomBytes } from "node:crypto";
+import { execSync } from "node:child_process";
+import { randomBytes, createHash } from "node:crypto";
 import { type ChatMessage, type CliToolResult, type ToolDefinition, routeToCliRunner, extractMultimodalParts, cleanupMediaFiles } from "./cli-runner.js";
 import { scheduleTokenRefresh, setAuthLogger, stopTokenRefresh } from "./claude-auth.js";
 import { grokComplete, grokCompleteStream, type ChatMessage as GrokChatMessage } from "./grok-client.js";
@@ -33,9 +34,118 @@ import {
   BITNET_SYSTEM_PROMPT,
   DEFAULT_MODEL_TIMEOUTS,
   TOOL_ROUTING_THRESHOLD,
+  OPUS_ESCALATION_THRESHOLD,
 } from "./config.js";
 import { debugLog, DEBUG_LOG_PATH, getLogTail, watchLogFile, setDebugLogEnabled } from "./debug-log.js";
+// ── Skill delegation via openclaw agent ─────────────────────────────────────
+import { existsSync, readdirSync, statSync } from "node:fs";
+import { join } from "node:path";
+import { homedir } from "node:os";
+import { spawn as spawnChild } from "node:child_process";
+const activeDelegations = new Set<string>();
+function extractUserText(messages: ChatMessage[]): string {
+  return messages
+    .filter((m) => m.role === "user")
+    .map((m) => {
+      if (typeof m.content === "string") return m.content;
+      if (Array.isArray(m.content)) {
+        return (m.content as Array<{ type: string; text?: string }>)
+          .filter((p) => p.type === "text" && p.text)
+          .map((p) => p.text!)
+          .join(" ");
+      }
+      return "";
+    })
+    .join(" ");
+}
+let _skillNames: string[] | null = null;
+let _skillNamesAt = 0;
+function getSkillNames(): string[] {
+  const now = Date.now();
+  if (_skillNames && (now - _skillNamesAt) < 120_000) return _skillNames;
+  _skillNames = [];
+  const dir = join(homedir(), ".openclaw", "skills");
+  try {
+    if (!existsSync(dir)) return _skillNames;
+    for (const name of readdirSync(dir)) {
+      try {
+        if (statSync(join(dir, name)).isDirectory() && existsSync(join(dir, name, "SKILL.md"))) {
+          _skillNames.push(name);
+        }
+      } catch {}
+    }
+  } catch {}
+  _skillNamesAt = now;
+  return _skillNames;
+}
+function detectMatchedSkill(userText: string): string | null {
+  for (const name of getSkillNames()) {
+    const re = new RegExp(`\\b${name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")}\\b`, "i");
+    if (re.test(userText)) return name;
+  }
+  return null;
+}
+async function delegateToAgent(prompt: string, timeoutMs: number): Promise<{ text: string; durationMs: number }> {
+  const start = Date.now();
+  const timeoutSec = Math.min(Math.floor(timeoutMs / 1000), 300);
+  return new Promise((resolve, reject) => {
+    // Use the same Node + openclaw entry point as the systemd service to avoid version mismatches
+    const openclawEntry = join(homedir(), ".npm-global", "lib", "node_modules", "openclaw", "dist", "entry.js");
+    const useEntryJs = existsSync(openclawEntry);
+    const cmd = useEntryJs ? process.execPath : "openclaw"; // process.execPath = /usr/bin/node
+    const args = useEntryJs
+      ? [openclawEntry, "agent", "--agent", "main", "--message", prompt, "--json", "--timeout", String(timeoutSec)]
+      : ["agent", "--agent", "main", "--message", prompt, "--json", "--timeout", String(timeoutSec)];
+    const child = spawnChild(cmd, args, {
+      env: { ...process.env, PATH: `${join(homedir(), ".local", "bin")}:${process.env.PATH ?? ""}` },
+      stdio: ["pipe", "pipe", "pipe"],
+    });
+    let stdout = "";
+    let stderr = "";
+    child.stdout?.on("data", (d: Buffer) => { stdout += d.toString(); });
+    child.stderr?.on("data", (d: Buffer) => { stderr += d.toString(); });
+    const timer = setTimeout(() => { child.kill("SIGTERM"); }, timeoutMs + 10_000);
+    child.on("close", (code) => {
+      clearTimeout(timer);
+      const durationMs = Date.now() - start;
+      // Only fail if no JSON output at all — stderr always has plugin log noise
+      const hasJsonOutput = stdout.includes('"status"') || stdout.includes('"result"');
+      if (code !== 0 && !hasJsonOutput) {
+        // Filter out plugin log lines from stderr to find real errors
+        const realErrors = stderr.split("\n").filter(l => !l.includes("[plugins]") && !l.includes("[memory-") && l.trim()).join("\n");
+        reject(new Error(`openclaw agent exited ${code}: ${realErrors.slice(0, 500) || stderr.slice(0, 500)}`));
+        return;
+      }
+      try {
+        const jsonStart = stdout.indexOf("{");
+        if (jsonStart === -1) {
+          reject(new Error("No JSON in openclaw agent output"));
+          return;
+        }
+        const result = JSON.parse(stdout.slice(jsonStart));
+        const text = result?.result?.payloads?.[0]?.text ?? result?.result?.text ?? "";
+        resolve({ text, durationMs });
+      } catch (e) {
+        reject(new Error(`Failed to parse agent result: ${(e as Error).message}`));
+      }
+    });
+    child.on("error", (err) => { clearTimeout(timer); reject(err); });
+  });
+}
 // ── Active request tracking ─────────────────────────────────────────────────
 export interface ActiveRequest {
@@ -846,15 +956,107 @@ async function handleRequest(
     }
     // ─────────────────────────────────────────────────────────────────────────
+    // ── Skill delegation: delegate to openclaw agent for full workflow execution ──
+    const userText = extractUserText(cleanMessages);
+    const matchedSkill = detectMatchedSkill(userText);
+    const delegationKey = matchedSkill ? `${matchedSkill}:${createHash("md5").update(userText.slice(0, 500)).digest("hex").slice(0, 12)}` : null;
+    // TODO: delegation needs a multi-turn agent runner, not single-turn `openclaw agent`.
+    // `openclaw agent` returns after one turn (220ms) without executing the full workflow.
+    // Re-enable when OpenClaw supports multi-turn skill execution (e.g., `openclaw skill run blog-writer`).
+    if (false && matchedSkill && delegationKey && activeDelegations.size === 0) {
+      debugLog("DELEGATE", `skill "${matchedSkill}" detected, delegating to openclaw agent`, { msgs: cleanMessages.length });
+      activeDelegations.add(delegationKey);
+      // Send SSE headers early if streaming
+      if (stream) {
+        res.writeHead(200, { "Content-Type": "text/event-stream", "Cache-Control": "no-cache", Connection: "keep-alive", ...corsHeaders() });
+        res.write(": delegating to openclaw agent\n\n");
+        // Keepalive while agent runs
+        const ka = setInterval(() => { res.write(": agent working\n\n"); }, 15_000);
+        try {
+          const lastUser = [...cleanMessages].reverse().find(m => m.role === "user");
+          const delegatePrompt = typeof lastUser?.content === "string" ? lastUser.content
+            : Array.isArray(lastUser?.content) ? (lastUser!.content as Array<{ type: string; text?: string }>).filter(p => p.type === "text").map(p => p.text).join(" ")
+            : userText.slice(-2000);
+          const agentResult = await delegateToAgent(delegatePrompt, MAX_EFFECTIVE_TIMEOUT_MS);
+          debugLog("DELEGATE-OK", `skill "${matchedSkill}" completed in ${(agentResult.durationMs / 1000).toFixed(1)}s`, { contentLen: agentResult.text.length });
+          metrics.recordRequest(model, agentResult.durationMs, true, estPromptTokens, estimateTokens(agentResult.text), promptPreview);
+          const chunk = { id, object: "chat.completion.chunk", created, model, choices: [{ index: 0, delta: { role: "assistant", content: agentResult.text }, finish_reason: "stop" }] };
+          res.write(`data: ${JSON.stringify(chunk)}\n\n`);
+          res.write("data: [DONE]\n\n");
+          res.end();
+        } catch (err) {
+          const msg = (err as Error).message;
+          debugLog("DELEGATE-FAIL", `skill "${matchedSkill}" failed`, { error: msg.slice(0, 200) });
+          opts.warn(`[cli-bridge] agent delegation failed: ${msg.slice(0, 100)}, falling through to CLI`);
+          // Fall through to normal CLI routing below
+          clearInterval(ka);
+          activeDelegations.delete(delegationKey);
+          // Can't fall through after sending SSE headers — send error
+          res.write(`data: ${JSON.stringify({ error: { message: `Agent delegation failed: ${msg.slice(0, 200)}. Retrying via CLI.`, type: "cli_error" } })}\n\n`);
+          res.write("data: [DONE]\n\n");
+          res.end();
+          activeRequests.delete(id);
+          cleanupMediaFiles(mediaFiles);
+          return;
+        } finally {
+          clearInterval(ka);
+          activeDelegations.delete(delegationKey);
+        }
+        activeRequests.delete(id);
+        cleanupMediaFiles(mediaFiles);
+        return;
+      }
+      // Non-streaming delegation
+      try {
+        const lastUser = [...cleanMessages].reverse().find(m => m.role === "user");
+        const delegatePrompt = typeof lastUser?.content === "string" ? lastUser.content
+          : Array.isArray(lastUser?.content) ? (lastUser!.content as Array<{ type: string; text?: string }>).filter(p => p.type === "text").map(p => p.text).join(" ")
+          : userText.slice(-2000);
+        const agentResult = await delegateToAgent(delegatePrompt, MAX_EFFECTIVE_TIMEOUT_MS);
+        debugLog("DELEGATE-OK", `skill "${matchedSkill}" completed in ${(agentResult.durationMs / 1000).toFixed(1)}s`, { contentLen: agentResult.text.length });
+        res.writeHead(200, { "Content-Type": "application/json", ...corsHeaders() });
+        res.end(JSON.stringify({
+          id, object: "chat.completion", created, model,
+          choices: [{ index: 0, message: { role: "assistant", content: agentResult.text }, finish_reason: "stop" }],
+          usage: { prompt_tokens: estPromptTokens, completion_tokens: estimateTokens(agentResult.text), total_tokens: estPromptTokens + estimateTokens(agentResult.text) },
+        }));
+        activeRequests.delete(id);
+        cleanupMediaFiles(mediaFiles);
+        return;
+      } catch (err) {
+        debugLog("DELEGATE-FAIL", `skill "${matchedSkill}" failed, falling through to CLI`, { error: (err as Error).message.slice(0, 200) });
+        activeDelegations.delete(delegationKey);
+        // Fall through to normal CLI routing
+      } finally {
+        activeDelegations.delete(delegationKey);
+      }
+    }
     // ── CLI runner routing (Gemini / Claude Code / Codex) ──────────────────────
     let result: CliToolResult;
     let usedModel = model;
-    // ── Smart tool routing: Sonnet first (better reasoning), fast fallback to Haiku ──
-    // Sonnet picks the right tools but intermittently hangs on large prompts.
-    // Strategy: let Sonnet try first — if it responds, great (better tool selection).
-    // The stale-output detector (60s) kills it fast if it hangs, then fallback to Haiku.
-    // This preserves Sonnet's intelligence for tool selection while keeping Haiku as safety net.
+    // ── Opus escalation: route heavy conversations to Opus instead of Sonnet ──
+    // Sonnet hangs ~50% at 30KB+ prompts. Opus handles large contexts reliably.
+    // Measure by message count (proxy for formatted prompt size after truncation):
+    //   - With 21 tools + 12 messages (heavy tools window), prompt hits ~30KB
+    //   - Escalate when messages > 20 (conversation is deep enough to cause hangs)
+    const shouldEscalate = model === "cli-claude/claude-sonnet-4-6"
+      && cleanMessages.length > 20
+      && hasTools;
+    if (shouldEscalate) {
+      const originalModel = model;
+      usedModel = "cli-claude/claude-opus-4-6";
+      debugLog("OPUS-ESCALATE", `${originalModel} → ${usedModel}`, { msgs: cleanMessages.length, tools: tools?.length ?? 0 });
+      opts.log(`[cli-bridge] escalating to Opus (${cleanMessages.length} msgs with ${tools?.length ?? 0} tools)`);
+    }
     const routeOpts = { workdir, tools: hasTools ? tools : undefined, mediaFiles: mediaFiles.length ? mediaFiles : undefined, log: opts.log };
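
The escalation check in the hunk above can be read as a small pure function: escalate only when the requested model is the Sonnet ID, the conversation has more than 20 messages, and tools are present. The sketch below restates that condition for experimentation. `chooseModel` and `ESCALATION_MESSAGE_COUNT` are hypothetical names, and the two model IDs are copied from the hunk. In the lines shown here the condition uses message count, while `OPUS_ESCALATION_THRESHOLD` in config.ts is documented as a byte size, so the sketch follows the visible condition only.

```typescript
const SONNET_MODEL = "cli-claude/claude-sonnet-4-6";
const OPUS_MODEL = "cli-claude/claude-opus-4-6";
// Hypothetical constant name; the hunk hardcodes the literal 20.
const ESCALATION_MESSAGE_COUNT = 20;

// Return the model that should actually serve the request.
function chooseModel(model: string, messageCount: number, hasTools: boolean): string {
  const shouldEscalate =
    model === SONNET_MODEL &&
    messageCount > ESCALATION_MESSAGE_COUNT &&
    hasTools;
  return shouldEscalate ? OPUS_MODEL : model;
}
```

Because the comparison is strict, a conversation of exactly 20 messages stays on Sonnet, and requests without tools are never escalated regardless of length.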