npm - mobygate - Versions diffs - 0.5.3 → 0.6.0 - Mend

mobygate 0.5.3 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/CHANGELOG.md CHANGED Viewed

@@ -4,6 +4,83 @@ All notable changes to mobygate are documented here. Format loosely follows
 [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); version numbers are
 [Semantic Versioning](https://semver.org/).
+## [0.6.0] — 2026-04-24
+Big one. Native tool calling + in-dashboard self-update.
+### Added
+- **Native MCP tool calling.** Client-supplied OpenAI tools are now
+  registered with the Claude Agent SDK as in-process MCP tools (with
+  Zod schemas converted from JSON Schema). The model emits genuine
+  `tool_use` content blocks instead of the old `<tool_call>...`
+  text-pattern hack. Tool IDs returned to clients are now Anthropic-
+  native `toolu_*` strings, not synthesized `call_*` ones. New module:
+  `lib/tool-bridge.js`.
+- **Dashboard update banner.** When npm has a newer mobygate, the
+  dashboard shows an orange pill at the top: `v0.6.0 → v0.6.1 available
+  · npm install · [changelog] [dismiss] [update now]`. Clicking
+  "update now" fires `npm install -g mobygate@latest` (or `git pull`
+  for git-mode installs) in a detached child process, restarts the
+  service, and auto-reloads the page. Dismissals stick per-version
+  via localStorage. New module: `lib/updater.js`.
+- New endpoints: `GET /update/check`, `POST /update/apply`,
+  `GET /update/status`. The check endpoint caches the npm registry
+  lookup for 15 minutes so dashboards open all day don't hammer it.
+### Changed
+- **No more prompt-injected tool definitions.** The `<system>...</system>`
+  block listing available tools as XML is gone — the SDK's MCP
+  registration is the model's source of truth now. This shrinks every
+  tool-enabled prompt by ~200-500 tokens depending on tool count.
+- **Tool-flow detection** moved from text-pattern matching
+  (`hasCompleteToolCall`, `parseToolCalls` regexes) to native
+  `tool_use` content-block detection in the assistant message stream
+  (`hasToolUse`, `extractToolUses`). The moment a tool_use lands,
+  we abort the SDK and emit OpenAI-shape `tool_calls`.
+- **`alwaysLoad: true`** on every registered tool. Without this, the
+  SDK lazily defers MCP tool schemas — the model has to call the
+  built-in `ToolSearch` tool to fetch each definition before invoking,
+  which leaks through to OpenAI clients as a confusing tool_call
+  for `ToolSearch` instead of their actual tool. Eager loading
+  keeps the surface clean.
+### Removed
+- `buildToolInstructions` — the `<tool_call>...` protocol prose.
+- `parseToolCalls` — the regex parser for `<tool_call>` JSON blocks.
+- `hasCompleteToolCall` — the streaming-buffer heuristic that aborted
+  the SDK when a complete tag pair appeared.
+- `formatAssistantForReplay`'s tool_calls→`<tool_call>` text
+  serialization (assistant replay is now best-effort text only).
+- The "Use the tool results above to continue toward the final answer"
+  nudge — tool results are visible in conversation context now, so
+  the model handles continuation naturally without coaxing.
+### Known limitation (Phase 1 deliberate)
+- Tool *results* coming back from the client are still spliced as
+  `<tool_results>` text in the resumed prompt, not native Anthropic
+  `tool_result` content blocks. Reason: aborting the SDK on a
+  `tool_use` block prevents the assistant turn from being persisted
+  in session state — on resume, native tool_result blocks have
+  nothing to bind to and the model re-calls the tool. Text-form
+  results work because the resumed model has the prior turn in
+  context. Phase 2's full Anthropic Messages wire surface will
+  keep the SDK alive through the tool turn and switch to native
+  tool_result blocks end-to-end.
+### Migration
+- No client-facing changes. Existing OpenAI-shape requests with
+  `tools: [...]` work the same as before — what's improved is
+  reliability ("Model returned empty after tool calls" warnings
+  should largely disappear) and surface fidelity (tool_call IDs
+  are now native Anthropic IDs, not synthesized).
+- Update with `mobygate update` (CLI) or click the new "update now"
+  button in the dashboard once it appears.
 ## [0.5.3] — 2026-04-19
 Security pass.

package/index.html CHANGED Viewed

@@ -49,6 +49,34 @@
 <body class="antialiased">
   <div class="mx-auto px-12 pt-8 pb-7 flex flex-col gap-6 max-w-[1440px] min-h-screen">
+    <!-- ===== Update banner ===== -->
+    <!-- Hidden until /update/check reports updateAvailable=true. During
+         apply, this becomes a progress strip showing live log tail. -->
+    <section id="updateBanner" style="display:none" class="items-center gap-4 py-3 px-5 bg-[#121210] border-l-2 border-l-[#E89B2E] border-t border-b border-r border-[#2A2A1F] rounded-r-md">
+      <div class="flex items-center gap-2.5">
+        <span class="rounded-full bg-[#E89B2E] w-2 h-2 pulse-dot"></span>
+        <span class="uppercase text-[#E89B2E] font-medium text-[10px] tracking-[0.22em]">Update</span>
+      </div>
+      <div id="updateBannerText" class="grow text-[#F3EFE4] text-xs leading-4"></div>
+      <div id="updateBannerActions" class="flex items-center gap-2 shrink-0">
+        <a id="updateBannerChangelog" href="https://github.com/khnfrhn/mobygate/blob/master/CHANGELOG.md" target="_blank" rel="noreferrer" class="text-[#8A9A6A] hover:text-[#C9D9A8] text-[11px] tracking-[0.04em] underline decoration-dotted">changelog</a>
+        <button id="updateDismissBtn" class="rounded-full py-1.5 px-3 border border-[#2A2A1F] text-[#8A9A6A] hover:text-[#C9D9A8] hover:border-[#5A5F54] font-medium text-[11px] tracking-[0.04em] transition">dismiss</button>
+        <button id="updateApplyBtn" class="rounded-full py-1.5 px-3.5 bg-[#E89B2E] hover:brightness-110 text-[#0B0B09] font-bold text-[11px] tracking-[0.04em] transition">update now</button>
+      </div>
+    </section>
+    <!-- Apply-in-progress shelf: expands below the banner during update. -->
+    <section id="updateProgress" style="display:none" class="flex-col gap-2 py-3 px-5 bg-[#121210] border border-[#2A2A1F] rounded-md">
+      <div class="flex items-center justify-between">
+        <div class="flex items-center gap-2">
+          <span id="updateSpinner" class="rounded-full bg-[#E89B2E] w-2 h-2 pulse-dot"></span>
+          <span id="updateProgressTitle" class="uppercase text-[#C9D9A8] font-medium text-[10px] tracking-[0.22em]">Installing</span>
+          <span id="updateProgressSub" class="text-[#5A5F54] text-[11px]"></span>
+        </div>
+        <button id="updateProgressClose" style="display:none" class="text-[#5A5F54] hover:text-[#C9D9A8] text-[11px]">close ✕</button>
+      </div>
+      <pre id="updateProgressLog" class="text-[11px] leading-[15px] text-[#8A9A6A] max-h-[180px] overflow-auto whitespace-pre-wrap m-0"></pre>
+    </section>
     <!-- ===== Header ===== -->
     <header class="flex justify-between items-center shrink-0">
       <div class="flex items-center gap-[22px]">
@@ -804,6 +832,117 @@
       }
     }, 1000);
+    // ───────────────────────── Updater
+    // Dashboard-driven upgrade flow. On load (and every 30 min) we ask
+    // /update/check whether a newer mobygate is on npm. If so, a pill
+    // appears at the top of the page — click "update now" to fire the
+    // update, watch log lines stream in, then auto-reload when the new
+    // server is up. The child process is detached, so the server
+    // restart doesn't orphan it.
+    const UPDATE_DISMISS_KEY = 'mobygate:update:dismissedVersion';
+    let updateInfo = null;
+    let updatePollTimer = null;
+    function showBanner(info) {
+      if (!info?.updateAvailable) {
+        $('updateBanner').style.display = 'none';
+        return;
+      }
+      // Respect dismissal: if the user dismissed this exact version, don't
+      // re-pester until a newer one lands.
+      const dismissed = localStorage.getItem(UPDATE_DISMISS_KEY);
+      if (dismissed === info.latest) {
+        $('updateBanner').style.display = 'none';
+        return;
+      }
+      const msg = info.canApply
+        ? `v${escHtml(info.current)} → <span class="text-[#B7E56D]">v${escHtml(info.latest)}</span> available · <span class="text-[#5A5F54]">${escHtml(info.installMode)} install</span>`
+        : `v${escHtml(info.current)} → <span class="text-[#B7E56D]">v${escHtml(info.latest)}</span> available · <span class="text-[#E89B2E]">${escHtml(info.installMode)} install — update manually</span>`;
+      $('updateBannerText').innerHTML = msg;
+      $('updateApplyBtn').style.display = info.canApply ? '' : 'none';
+      $('updateBanner').style.display = 'flex';
+    }
+    async function checkForUpdates({ force = false } = {}) {
+      try {
+        const r = await fetch(`/update/check${force ? '?force=1' : ''}`);
+        if (!r.ok) return;
+        updateInfo = await r.json();
+        showBanner(updateInfo);
+      } catch (e) { /* offline is fine */ }
+    }
+    function renderUpdateLog(lines) {
+      const el = $('updateProgressLog');
+      el.textContent = (lines || []).join('\n');
+      // Pin to bottom so the user sees the latest line.
+      el.scrollTop = el.scrollHeight;
+    }
+    async function pollUpdateStatus() {
+      try {
+        const r = await fetch('/update/status?lines=200');
+        if (!r.ok) return;
+        const s = await r.json();
+        renderUpdateLog(s.lines);
+        if (!s.running) {
+          // Update finished. The service restart may have already swapped
+          // the running binary — our `currentVersion` reflects whatever
+          // server answered. If it matches `latest`, celebrate. Either
+          // way, give it a moment then reload so the dashboard comes
+          // back on the new code path.
+          clearInterval(updatePollTimer); updatePollTimer = null;
+          $('updateSpinner').classList.remove('pulse-dot');
+          $('updateSpinner').classList.remove('bg-[#E89B2E]');
+          $('updateSpinner').classList.add('bg-[#B7E56D]');
+          $('updateProgressTitle').textContent = 'Installed';
+          $('updateProgressSub').textContent = `now on v${s.currentVersion} — reloading in 3s…`;
+          $('updateProgressClose').style.display = '';
+          setTimeout(() => location.reload(), 3000);
+        }
+      } catch (e) {
+        // Server is mid-restart — keep polling, it'll come back.
+      }
+    }
+    function startUpdateProgress(mode) {
+      $('updateBanner').style.display = 'none';
+      $('updateProgress').style.display = 'flex';
+      $('updateProgressSub').textContent = mode ? `(${mode} install)` : '';
+      $('updateProgressTitle').textContent = 'Installing';
+      $('updateSpinner').classList.add('pulse-dot');
+      $('updateProgressLog').textContent = 'starting update…';
+      if (updatePollTimer) clearInterval(updatePollTimer);
+      updatePollTimer = setInterval(pollUpdateStatus, 1500);
+      pollUpdateStatus();
+    }
+    $('updateApplyBtn')?.addEventListener('click', async () => {
+      $('updateApplyBtn').disabled = true;
+      try {
+        const r = await fetch('/update/apply', { method: 'POST' });
+        const j = await r.json().catch(() => ({}));
+        if (!r.ok || !j.started) {
+          $('updateBannerText').innerHTML += ` <span class="text-[#E89B2E]">— ${escHtml(j.error || 'update failed to start')}</span>`;
+          $('updateApplyBtn').disabled = false;
+          return;
+        }
+        startUpdateProgress(j.mode);
+      } catch (e) {
+        $('updateBannerText').innerHTML += ` <span class="text-[#E89B2E]">— ${escHtml(e.message)}</span>`;
+        $('updateApplyBtn').disabled = false;
+      }
+    });
+    $('updateDismissBtn')?.addEventListener('click', () => {
+      if (updateInfo?.latest) localStorage.setItem(UPDATE_DISMISS_KEY, updateInfo.latest);
+      $('updateBanner').style.display = 'none';
+    });
+    $('updateProgressClose')?.addEventListener('click', () => {
+      $('updateProgress').style.display = 'none';
+    });
     // Kick off
     loadSnapshot();
     loadAuth({ verify: false });
@@ -811,6 +950,21 @@
     loadLogs();
     armLogAutoRefresh();
     connectStream();
+    // Surface update availability on load + every 30 min. The backend
+    // caches the npm registry lookup for 15 min, so this doesn't hammer
+    // the registry even with the dashboard open all day.
+    checkForUpdates();
+    setInterval(() => checkForUpdates(), 30 * 60 * 1000);
+    // If an update is in-flight when the page loads (e.g., user refreshed
+    // mid-apply), pick up where it left off.
+    (async () => {
+      try {
+        const r = await fetch('/update/status?lines=50');
+        if (!r.ok) return;
+        const s = await r.json();
+        if (s.running) startUpdateProgress(s.mode);
+      } catch {}
+    })();
   </script>
 </body>
 </html>

package/lib/tool-bridge.js ADDED Viewed

@@ -0,0 +1,257 @@
+/**
+ * Native tool bridge — translates between OpenAI client tools and the
+ * Claude Agent SDK's MCP-tool model.
+ *
+ * Why this exists (Phase 1 of the mobygate native-tools refactor):
+ *
+ * Until now, mobygate handled client-supplied tools by injecting their
+ * schemas into the system prompt as <tool> XML and instructing the model
+ * to emit <tool_call>{...}</tool_call> tags in its text output. We then
+ * regex-parsed those tags. Fragile in obvious ways: the model sometimes
+ * wrapped tags in code fences, sometimes hallucinated partial blocks,
+ * and the "empty after tool_results" nudge existed to paper over the
+ * model treating bare <tool_results> as inert data.
+ *
+ * The SDK actually supports native tool definitions via MCP — but its
+ * MCP model assumes the **handler runs in-process** and returns a
+ * synchronous result. Our case is different: we're a proxy. The actual
+ * tool implementations live on the *other* side of an HTTP boundary,
+ * inside the client (Hermes / OpenClaw / etc.). We can't run them.
+ *
+ * The trick: register client tools as MCP tools with stub handlers that
+ * never resolve. The model emits **native** `tool_use` content blocks
+ * (in the SDKAssistantMessage stream, not buried in text). We watch the
+ * stream, abort the SDK on the first complete `tool_use`, and surface
+ * it to the client as an OpenAI `tool_calls` response. The stub handler
+ * is then aborted via the SDK's signal — we never actually execute it,
+ * the client does.
+ *
+ * The other end of the round-trip: when the client sends a follow-up
+ * request with tool results (role:'tool' messages), we convert those
+ * into native `tool_result` content blocks inside an SDKUserMessage,
+ * resuming the SDK session. The model sees structured tool results,
+ * not <tool_result> XML, and continues the conversation cleanly.
+ *
+ * Names round-trip via the MCP prefix convention. A client tool named
+ * `getWeather` is registered as `mcp__mobygate__getWeather` with the
+ * SDK; the model emits tool_use blocks under that prefixed name; we
+ * strip the prefix on the way back so the client sees its original name.
+ */
+import { z } from 'zod';
+import { tool, createSdkMcpServer } from '@anthropic-ai/claude-agent-sdk';
+export const MCP_SERVER_NAME = 'mobygate';
+export const MCP_TOOL_PREFIX = `mcp__${MCP_SERVER_NAME}__`;
+// ---------------------------------------------------------------------------
+// JSON Schema → Zod RawShape
+// ---------------------------------------------------------------------------
+// The SDK's `tool()` helper takes a Zod RawShape (a record of ZodTypes,
+// like `{name: z.string(), age: z.number()}`) — NOT a JSON Schema object.
+// OpenAI clients send JSON Schema (`{type:'object', properties:{...}, required:[...]}`),
+// so we need to convert. This handles the common cases that cover ~95% of
+// real-world tool schemas; anything weirder falls through to z.unknown().
+function jsonSchemaPropToZod(prop) {
+  if (!prop || typeof prop !== 'object') return z.unknown();
+  // Handle enums up front — they apply across types.
+  if (Array.isArray(prop.enum) && prop.enum.length > 0) {
+    const stringy = prop.enum.every((v) => typeof v === 'string');
+    if (stringy) return z.enum(prop.enum);
+    // mixed-type enums fall through to z.union of literals
+    return z.union(prop.enum.map((v) => z.literal(v)));
+  }
+  switch (prop.type) {
+    case 'string':  return z.string();
+    case 'number':  return z.number();
+    case 'integer': return z.number().int();
+    case 'boolean': return z.boolean();
+    case 'null':    return z.null();
+    case 'array': {
+      const item = prop.items ? jsonSchemaPropToZod(prop.items) : z.unknown();
+      return z.array(item);
+    }
+    case 'object': {
+      const shape = jsonSchemaToZodShape(prop);
+      return z.object(shape).passthrough();
+    }
+    default: return z.unknown();
+  }
+}
+/**
+ * Convert a JSON Schema *object* (with `properties` + `required`) into
+ * a Zod RawShape suitable for the SDK's `tool()` helper.
+ *
+ * Returns an empty shape `{}` when the schema isn't an object — the
+ * caller will pass this to `tool()`, and the model will see "no
+ * structured input expected." That's the right default for tool defs
+ * that arrive without a properties block (which OpenAI permits).
+ */
+export function jsonSchemaToZodShape(schema) {
+  if (!schema || schema.type !== 'object' || !schema.properties) return {};
+  const shape = {};
+  const required = new Set(Array.isArray(schema.required) ? schema.required : []);
+  for (const [key, prop] of Object.entries(schema.properties)) {
+    let zType = jsonSchemaPropToZod(prop);
+    if (!required.has(key)) zType = zType.optional();
+    if (prop?.description) zType = zType.describe(prop.description);
+    shape[key] = zType;
+  }
+  return shape;
+}
+// ---------------------------------------------------------------------------
+// Build the MCP server that exposes client tools to the SDK
+// ---------------------------------------------------------------------------
+/**
+ * Stub handler. The model emits a tool_use block, the SDK calls us, but
+ * we don't actually have an implementation to run — the client does.
+ * So we wait. The stream-watcher in server.js will abort the SDK as
+ * soon as it sees the tool_use block, which propagates here as a signal
+ * abort. We reject and the SDK cleans up.
+ *
+ * The 30s safety timeout is for the (rare) case where the SDK fires our
+ * handler but the abort never propagates back — we don't want to leak
+ * a Promise forever. 30s is well past any reasonable abort latency.
+ */
+function deferredToolHandler(_args, extra) {
+  return new Promise((resolve, reject) => {
+    const onAbort = () => {
+      cleanup();
+      reject(new Error('mobygate: tool execution deferred to client (aborted)'));
+    };
+    const timer = setTimeout(() => {
+      cleanup();
+      reject(new Error('mobygate: tool execution deferred to client (timeout)'));
+    }, 30_000);
+    function cleanup() {
+      clearTimeout(timer);
+      extra?.signal?.removeEventListener?.('abort', onAbort);
+    }
+    if (extra?.signal?.aborted) return onAbort();
+    extra?.signal?.addEventListener?.('abort', onAbort, { once: true });
+  });
+}
+/**
+ * Build an in-process MCP server exposing the client's tools to the SDK.
+ * Returns the McpSdkServerConfigWithInstance; pass it to `query({options: { mcpServers: { [MCP_SERVER_NAME]: config } }})`.
+ *
+ * Returns `null` when there are no valid tools — caller should skip
+ * MCP setup entirely in that case.
+ */
+export function buildClientToolsServer(openaiTools) {
+  if (!Array.isArray(openaiTools) || openaiTools.length === 0) return null;
+  const toolDefs = [];
+  for (const t of openaiTools) {
+    if (t?.type !== 'function' || !t.function?.name) continue;
+    const fn = t.function;
+    const shape = jsonSchemaToZodShape(fn.parameters);
+    toolDefs.push(tool(
+      fn.name,
+      fn.description || `Client-defined tool: ${fn.name}`,
+      shape,
+      deferredToolHandler,
+      // alwaysLoad: the SDK otherwise marks MCP tools as "deferred" — the
+      // model has to call the built-in `ToolSearch` to fetch the schema
+      // before invoking. That round-trip is invisible to OpenAI clients,
+      // who see a confusing tool_call for ToolSearch instead of getWeather.
+      // Eagerly loading our tools keeps the OpenAI surface clean.
+      { alwaysLoad: true },
+    ));
+  }
+  if (toolDefs.length === 0) return null;
+  return createSdkMcpServer({
+    name: MCP_SERVER_NAME,
+    version: '1.0.0',
+    tools: toolDefs,
+  });
+}
+// ---------------------------------------------------------------------------
+// Tool-use extraction (SDK assistant message → OpenAI tool_calls)
+// ---------------------------------------------------------------------------
+/**
+ * Walk an SDKAssistantMessage's content array for native `tool_use` blocks.
+ * Returns an array of `{ id, name, arguments }` formatted for OpenAI
+ * tool_calls — name has the MCP prefix stripped, arguments is a JSON string.
+ *
+ * Returns `[]` when the message has no tool_use blocks (most assistant
+ * messages don't — they're just text deltas).
+ */
+export function extractToolUses(assistantMessage) {
+  const content = assistantMessage?.message?.content;
+  if (!Array.isArray(content)) return [];
+  const calls = [];
+  for (const block of content) {
+    if (block?.type !== 'tool_use' || !block.id || !block.name) continue;
+    // Strip the MCP prefix so the client sees its original tool name.
+    const name = block.name.startsWith(MCP_TOOL_PREFIX)
+      ? block.name.slice(MCP_TOOL_PREFIX.length)
+      : block.name;
+    let argsString = '{}';
+    try { argsString = JSON.stringify(block.input ?? {}); } catch {}
+    calls.push({ id: block.id, name, arguments: argsString });
+  }
+  return calls;
+}
+/**
+ * Quick liveness check used by the stream loop to decide whether to abort
+ * early. Returns true the moment any tool_use block appears.
+ */
+export function hasToolUse(assistantMessage) {
+  const content = assistantMessage?.message?.content;
+  if (!Array.isArray(content)) return false;
+  return content.some((b) => b?.type === 'tool_use');
+}
+// ---------------------------------------------------------------------------
+// Tool results (OpenAI tool messages → Anthropic tool_result content blocks)
+// ---------------------------------------------------------------------------
+/**
+ * Format OpenAI role:'tool' messages as a single user-readable text
+ * block to splice into a resumed prompt.
+ *
+ * NOTE: Phase 1 deliberately does *not* round-trip tool results as
+ * native Anthropic `tool_result` content blocks. Why: when we abort
+ * the SDK on a tool_use, the assistant turn isn't persisted in the
+ * SDK's session state (we observed `msgs=1` on resume after a tool
+ * call, meaning the partial turn was dropped). On resume, sending a
+ * native tool_result block then has nothing to bind to — the model
+ * sees an orphan tool_result and re-calls the tool.
+ *
+ * Phase 2's full Anthropic Messages wire format will keep the SDK
+ * alive long enough to persist the turn properly. Until then, text-
+ * form tool results (which the model handles fine — it has the
+ * preceding tool_use in resume context) is the pragmatic answer.
+ *
+ * Returns a single string suitable for prepending to (or replacing)
+ * the user's prompt text on a resumed turn. Returns '' when there
+ * are no tool messages.
+ */
+export function toolMessagesToText(toolMessages) {
+  const lines = [];
+  for (const msg of toolMessages) {
+    if (msg?.role !== 'tool') continue;
+    const id = msg.tool_call_id || 'unknown';
+    const name = msg.name || '';
+    const content = typeof msg.content === 'string'
+      ? msg.content
+      : Array.isArray(msg.content)
+        ? msg.content.map((c) => (typeof c === 'string' ? c : c?.text || '')).join('')
+        : (msg.content == null ? '' : String(msg.content));
+    lines.push(`<tool_result id="${id}"${name ? ` name="${name}"` : ''}>\n${content}\n</tool_result>`);
+  }
+  if (lines.length === 0) return '';
+  return `<tool_results>\n${lines.join('\n')}\n</tool_results>`;
+}

package/lib/updater.js ADDED Viewed

@@ -0,0 +1,275 @@
+/**
+ * mobygate updater
+ *
+ * Shared helpers powering the dashboard's "update available → update now"
+ * flow (and re-usable from the CLI later). Two concerns:
+ *
+ *   1. Version lookup — read local package.json + query npm registry for
+ *      the latest published version. Cached for 15 min so the dashboard
+ *      can poll `/update/check` on load + every 30 min without hammering
+ *      the registry (or getting rate-limited).
+ *
+ *   2. Apply — spawn the upgrade as a **detached** child process so the
+ *      restart-the-service step can kill the running mobygate server
+ *      without orphaning the update or losing log lines. Progress is
+ *      streamed to `~/.mobygate/logs/update.log`, which the dashboard
+ *      polls via `/update/status`.
+ *
+ * Install-mode routing matches `bin/mobygate.js cmdUpdate`:
+ *   - `npm` → `npm install -g mobygate@latest` → restart service
+ *   - `git` → `git pull && npm install`        → restart service
+ *   - `unknown` → refuse, surface a readable message
+ */
+import { spawn, spawnSync } from 'child_process';
+import { readFileSync, writeFileSync, existsSync, mkdirSync, openSync } from 'fs';
+import { join, sep, dirname } from 'path';
+import { fileURLToPath } from 'url';
+import { LOGS_DIR } from './config.js';
+const __filename = fileURLToPath(import.meta.url);
+const REPO_ROOT = dirname(dirname(__filename)); // lib/updater.js → repo root
+const IS_WIN = process.platform === 'win32';
+const IS_MAC = process.platform === 'darwin';
+const IS_LINUX = process.platform === 'linux';
+const SERVER_LABEL = 'ai.mobygate.server';
+const WIN_SERVER_TASK = 'ai.mobygate.server';
+const LINUX_SERVER_UNIT = 'mobygate-server.service';
+const UPDATE_LOG = join(LOGS_DIR, 'update.log');
+const UPDATE_MARKER = join(LOGS_DIR, 'update.state.json');
+// ---------------------------------------------------------------------------
+// Version lookup
+// ---------------------------------------------------------------------------
+export function getCurrentVersion() {
+  try {
+    const pkg = JSON.parse(readFileSync(join(REPO_ROOT, 'package.json'), 'utf8'));
+    return pkg.version;
+  } catch {
+    return 'unknown';
+  }
+}
+export function detectInstallMode() {
+  if (existsSync(join(REPO_ROOT, '.git'))) return 'git';
+  if (REPO_ROOT.includes(`${sep}node_modules${sep}mobygate`)) return 'npm';
+  return 'unknown';
+}
+// 15-min in-memory cache so `/update/check` on dashboard load + 30-min
+// repolls don't hit the npm registry every time. Bust with { force: true }.
+const NPM_CACHE_MS = 15 * 60 * 1000;
+let _npmCache = { version: null, fetchedAt: 0, error: null };
+/**
+ * Resolve the latest published version on npm. Returns `{ version, cached, error }`.
+ * Never throws — on failure returns an error string so the endpoint can
+ * report "check failed" without 500'ing the dashboard.
+ */
+export async function getLatestVersion({ force = false } = {}) {
+  const now = Date.now();
+  if (!force && _npmCache.version && (now - _npmCache.fetchedAt) < NPM_CACHE_MS) {
+    return { version: _npmCache.version, cached: true, error: null };
+  }
+  // `npm view` is a network call; cap at 10s so a bad connection doesn't
+  // wedge the dashboard.
+  const r = spawnSync('npm', ['view', 'mobygate', 'version'], {
+    encoding: 'utf8',
+    timeout: 10_000,
+    shell: IS_WIN, // on Windows, npm is a .cmd — needs shell resolution
+  });
+  if (r.status !== 0) {
+    const err = r.stderr?.trim() || r.error?.message || `npm exited ${r.status}`;
+    _npmCache = { version: null, fetchedAt: now, error: err };
+    return { version: null, cached: false, error: err };
+  }
+  const version = r.stdout.trim();
+  _npmCache = { version, fetchedAt: now, error: null };
+  return { version, cached: false, error: null };
+}
+/**
+ * Compare two semver-ish strings (x.y.z). Returns -1/0/1.
+ * Non-numeric suffixes (`-beta.1`) are ignored for simplicity — this
+ * matches what the registry returns for mobygate's release channel.
+ */
+export function compareVersions(a, b) {
+  const pa = String(a).split('-')[0].split('.').map((n) => parseInt(n, 10) || 0);
+  const pb = String(b).split('-')[0].split('.').map((n) => parseInt(n, 10) || 0);
+  for (let i = 0; i < 3; i++) {
+    if ((pa[i] || 0) > (pb[i] || 0)) return 1;
+    if ((pa[i] || 0) < (pb[i] || 0)) return -1;
+  }
+  return 0;
+}
+export async function getUpdateCheck({ force = false } = {}) {
+  const current = getCurrentVersion();
+  const mode = detectInstallMode();
+  const { version: latest, cached, error } = await getLatestVersion({ force });
+  const updateAvailable = !!latest && compareVersions(latest, current) > 0;
+  return {
+    current,
+    latest: latest || null,
+    updateAvailable,
+    installMode: mode,
+    checkedAt: new Date().toISOString(),
+    cached,
+    error,
+    // If the install layout can't self-update, surface it so the UI can
+    // show "run manually" instead of a live-updating button.
+    canApply: updateAvailable && (mode === 'npm' || mode === 'git'),
+  };
+}
+// ---------------------------------------------------------------------------
+// Apply
+// ---------------------------------------------------------------------------
+function ensureLogsDir() {
+  if (!existsSync(LOGS_DIR)) mkdirSync(LOGS_DIR, { recursive: true });
+}
+export function readUpdateState() {
+  if (!existsSync(UPDATE_MARKER)) return { running: false, startedAt: null, finishedAt: null, exitCode: null, pid: null };
+  try { return JSON.parse(readFileSync(UPDATE_MARKER, 'utf8')); } catch { return { running: false, error: 'marker-unreadable' }; }
+}
+export function readUpdateLogTail({ lines = 500 } = {}) {
+  if (!existsSync(UPDATE_LOG)) return [];
+  try {
+    const raw = readFileSync(UPDATE_LOG, 'utf8');
+    const split = raw.split(/\r?\n/);
+    return split.slice(-lines - 1, -1);
+  } catch { return []; }
+}
+function writeUpdateState(patch) {
+  ensureLogsDir();
+  const prev = readUpdateState();
+  const merged = { ...prev, ...patch, updatedAt: new Date().toISOString() };
+  writeFileSync(UPDATE_MARKER, JSON.stringify(merged, null, 2));
+  return merged;
+}
+/**
+ * Build the shell command that performs update + restart. Returned as a
+ * single string we can hand to `sh -c` / `cmd /c`. Written as a string
+ * (not an array) because we want shell redirection for log capture.
+ */
+function buildUpdateCommand({ mode, repoRoot, logPath }) {
+  if (IS_WIN) {
+    // cmd.exe — `>>` for append, `2>&1` to merge. Each step on its own
+    // line so failures short-circuit via `||`.
+    const steps = [];
+    steps.push(`echo [mobygate-update] start at %DATE% %TIME%`);
+    if (mode === 'npm') {
+      steps.push(`npm install -g mobygate@latest`);
+    } else if (mode === 'git') {
+      steps.push(`cd /d "${repoRoot}"`);
+      steps.push(`git pull --ff-only`);
+      steps.push(`npm install`);
+    }
+    steps.push(`echo [mobygate-update] restarting service`);
+    steps.push(`schtasks /End   /TN "${WIN_SERVER_TASK}"`);
+    steps.push(`schtasks /Run   /TN "${WIN_SERVER_TASK}"`);
+    steps.push(`echo [mobygate-update] done`);
+    // Join with && so any failure stops the chain. Final redirect to log.
+    const inner = steps.map((s) => `(${s})`).join(' && ');
+    return { shell: 'cmd', cmd: `${inner} >> "${logPath}" 2>&1` };
+  }
+  // POSIX: sh -c, bail-on-first-failure via set -e
+  const parts = [`set -e`, `echo "[mobygate-update] start $(date)"`];
+  if (mode === 'npm') {
+    parts.push(`npm install -g mobygate@latest`);
+  } else if (mode === 'git') {
+    parts.push(`cd "${repoRoot}"`);
+    parts.push(`git pull --ff-only`);
+    parts.push(`npm install`);
+  }
+  parts.push(`echo "[mobygate-update] restarting service"`);
+  if (IS_MAC) {
+    const plist = join(process.env.HOME || '~', 'Library', 'LaunchAgents', `${SERVER_LABEL}.plist`);
+    // unload may fail if not loaded — tolerate that specific case
+    parts.push(`launchctl unload "${plist}" 2>/dev/null || true`);
+    parts.push(`launchctl load   "${plist}"`);
+  } else if (IS_LINUX) {
+    parts.push(`systemctl --user restart ${LINUX_SERVER_UNIT}`);
+  }
+  parts.push(`echo "[mobygate-update] done"`);
+  const script = parts.join('\n');
+  return { shell: 'sh', cmd: script };
+}
+/**
+ * Kick off the update in a **detached** child process. The running
+ * mobygate server returns immediately and is then killed by the restart
+ * step. The dashboard polls `/update/status` to see progress, and the
+ * new server comes up with the upgraded code.
+ *
+ * Returns `{ started, pid, error }` — never throws. If another update
+ * is already running, returns `{ started: false, error: 'in-progress' }`.
+ */
+export function applyUpdate({ mode, repoRoot = REPO_ROOT } = {}) {
+  const resolvedMode = mode || detectInstallMode();
+  if (resolvedMode !== 'npm' && resolvedMode !== 'git') {
+    return { started: false, error: `install-mode ${resolvedMode} can't auto-update` };
+  }
+  ensureLogsDir();
+  const prev = readUpdateState();
+  if (prev.running && prev.pid) {
+    // Treat unknown-alive PIDs as in-progress. Dead PIDs fall through.
+    let alive = false;
+    try { process.kill(prev.pid, 0); alive = true; } catch {}
+    if (alive) return { started: false, error: 'update-already-running', pid: prev.pid };
+  }
+  // Truncate the log so each update starts fresh (the old content is
+  // still available via `mobygate logs` up to the boundary).
+  writeFileSync(UPDATE_LOG, '');
+  const { shell, cmd } = buildUpdateCommand({ mode: resolvedMode, repoRoot, logPath: UPDATE_LOG });
+  let child;
+  try {
+    if (shell === 'cmd') {
+      // Windows: use cmd.exe /c; redirection is inside cmd string.
+      child = spawn('cmd.exe', ['/c', cmd], {
+        detached: true,
+        stdio: 'ignore',
+        windowsHide: true,
+      });
+    } else {
+      // POSIX: use sh -c; redirect stdout/stderr into the log file via Node
+      // (simpler than embedding `>>` into the script, and avoids quoting
+      // headaches across macOS sh versions).
+      const fd = openSync(UPDATE_LOG, 'a');
+      child = spawn('sh', ['-c', cmd], {
+        detached: true,
+        stdio: ['ignore', fd, fd],
+      });
+    }
+    child.unref();
+  } catch (e) {
+    return { started: false, error: `spawn failed: ${e.message}` };
+  }
+  writeUpdateState({
+    running: true,
+    startedAt: new Date().toISOString(),
+    finishedAt: null,
+    exitCode: null,
+    pid: child.pid,
+    mode: resolvedMode,
+  });
+  // We don't await the child — it's detached and will outlive us. The
+  // dashboard determines completion by polling /update/status, which
+  // checks `process.kill(pid, 0)` to test liveness.
+  return { started: true, pid: child.pid, mode: resolvedMode };
+}

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "mobygate",
-  "version": "0.5.3",
+  "version": "0.6.0",
   "description": "OpenAI-compatible local proxy for Claude Max. The Möbius-strip gateway: OpenAI shape in, Claude Max out.",
   "type": "module",
   "main": "server.js",

package/server.js CHANGED Viewed

@@ -53,6 +53,21 @@ import { banner } from './lib/ascii.js';
 import { bus as dashboardBus } from './lib/dashboard-bus.js';
 import { loadSessions, saveSessions, flushSessionsNow } from './lib/session-store.js';
 import { LOGS_DIR } from './lib/config.js';
+import {
+  buildClientToolsServer,
+  extractToolUses,
+  hasToolUse,
+  toolMessagesToText,
+  MCP_SERVER_NAME,
+  MCP_TOOL_PREFIX,
+} from './lib/tool-bridge.js';
+import {
+  getUpdateCheck,
+  applyUpdate,
+  readUpdateState,
+  readUpdateLogTail,
+  getCurrentVersion,
+} from './lib/updater.js';
 const __filename = fileURLToPath(import.meta.url);
 const __dirname = dirname(__filename);
@@ -214,114 +229,45 @@ function collectImages(messages) {
 }
 // ---------------------------------------------------------------------------
-// Tool calling (Path B: prompt-embedded protocol)
+// Tool calling (Phase 1: native MCP tools — no more <tool_call> text hack)
 // ---------------------------------------------------------------------------
-// The Claude Agent SDK cannot stream OpenAI-style function-call events back to
-// the caller (MCP handlers execute in-process and pollute session state; see
-// README "Known Gaps"). Workaround: inject client-provided tool schemas into
-// the system prompt and instruct the model to emit <tool_call>{...}</tool_call>
-// tags. We parse those out and re-emit as OpenAI `tool_calls`. Tool results
-// coming back from the client get wrapped in <tool_result> blocks.
+// Client-provided OpenAI tools are registered with the SDK as in-process MCP
+// tools (see lib/tool-bridge.js). The model emits **native** tool_use content
+// blocks in its assistant messages; we abort the SDK on the first one and
+// return OpenAI tool_calls to the client. When the client replies with tool
+// results, we send them back as Anthropic tool_result content blocks inside
+// a single SDKUserMessage — round-tripping cleanly through the SDK session.
 function hasTools(body) {
   return Array.isArray(body?.tools) && body.tools.length > 0;
 }
-function buildToolInstructions(tools) {
-  const lines = [
-    'You have access to CLIENT-DEFINED tools listed below. To invoke a tool, emit one or more <tool_call> tags, each containing a strict JSON object with "name" and "arguments":',
-    '',
-    '<tool_call>{"name":"<tool_name>","arguments":{<args>}}</tool_call>',
-    '',
-    'Rules:',
-    '- Do NOT wrap <tool_call> tags in markdown code fences.',
-    '- When you emit <tool_call> tags, output ONLY the tags — no prose, no explanation, no other text.',
-    '- You may emit multiple <tool_call> tags to request parallel calls.',
-    '- Tool results will be returned as <tool_result id="..." name="...">...</tool_result> blocks. After results arrive, continue toward the final answer.',
-    '- When you have the final answer and need no more tool calls, respond normally WITHOUT any <tool_call> tag.',
-    '- Do NOT call any other tool (Read, Bash, Grep, etc.) — only the tools listed below.',
-    '',
-    'Available tools:',
-  ];
-  for (const t of tools) {
-    if (t?.type !== 'function' || !t.function) continue;
-    const fn = t.function;
-    lines.push(`<tool name="${fn.name}">`);
-    if (fn.description) lines.push(`  <description>${fn.description}</description>`);
-    lines.push(`  <parameters>${JSON.stringify(fn.parameters || { type: 'object', properties: {} })}</parameters>`);
-    lines.push('</tool>');
-  }
-  return lines.join('\n');
-}
-function formatAssistantForReplay(msg) {
-  const parts = [];
-  const text = extractContent(msg.content);
-  if (text) parts.push(text);
-  if (Array.isArray(msg.tool_calls)) {
-    for (const tc of msg.tool_calls) {
-      if (tc?.type === 'function' && tc.function) {
-        let args = {};
-        try { args = JSON.parse(tc.function.arguments || '{}'); } catch {}
-        parts.push(`<tool_call>${JSON.stringify({ name: tc.function.name, arguments: args })}</tool_call>`);
-      }
-    }
-  }
-  return parts.join('\n');
-}
-function formatToolResult(msg) {
-  const content = extractContent(msg.content);
-  const id = msg.tool_call_id || 'unknown';
-  const name = msg.name || '';
-  return `<tool_result id="${id}" name="${name}">\n${content}\n</tool_result>`;
-}
-// Parse the model's text output for <tool_call> tags. Returns
-//   { toolCalls: [{id, name, arguments}], textBefore: string }
-// when at least one valid call is found, else null.
-function parseToolCalls(text) {
-  if (!text || !text.includes('<tool_call>')) return null;
-  const re = /<tool_call>\s*([\s\S]*?)\s*<\/tool_call>/g;
-  const calls = [];
-  let firstIdx = -1;
-  let m;
-  while ((m = re.exec(text)) !== null) {
-    if (firstIdx === -1) firstIdx = m.index;
-    try {
-      const obj = JSON.parse(m[1]);
-      if (obj && typeof obj.name === 'string') {
-        calls.push({
-          id: `call_${uuidv4().replace(/-/g, '').slice(0, 20)}`,
-          name: obj.name,
-          arguments: JSON.stringify(obj.arguments ?? {}),
-        });
-      }
-    } catch {
-      // ignore malformed tool_call blocks
-    }
-  }
-  if (!calls.length) return null;
-  return { toolCalls: calls, textBefore: text.slice(0, firstIdx).trim() };
-}
-// Detect whether the running text contains a COMPLETE <tool_call>...</tool_call>
-// pair — used to abort the SDK early once a call has been emitted.
-function hasCompleteToolCall(text) {
-  return /<tool_call>\s*[\s\S]*?<\/tool_call>/.test(text);
-}
-function messagesToPrompt(messages, { resuming = false, tools = null } = {}) {
-  // When resuming, the SDK already has full history. Only send the new tail:
-  // tool_results (if the client is replying with tool outputs) and/or a fresh
-  // user message.
+/**
+ * Build the prompt text from the OpenAI messages array.
+ *
+ * Returns `{ promptText }` — a single string ready for the SDK. Tool
+ * results are spliced in as <tool_results> XML when present (see
+ * lib/tool-bridge.js#toolMessagesToText for why we don't use native
+ * tool_result content blocks yet).
+ *
+ * Resuming vs fresh:
+ *   - Resuming: SDK has full history. We only send the new tail —
+ *     trailing tool results plus the most recent user text, if any.
+ *   - Fresh: SDK starts cold. We serialize the visible history with
+ *     <system>/<previous_response>/<tool_results> tags. No tool-
+ *     instruction injection — the SDK MCP registration handles that.
+ */
+function messagesToPrompt(messages, { resuming = false } = {}) {
   if (resuming) {
-    const toolResults = [];
+    // Walk backwards from the end, collecting trailing tool messages and
+    // the most recent user text. Tool results are formatted as a text
+    // block (see lib/tool-bridge.js#toolMessagesToText for the rationale).
+    const trailingToolMessages = [];
     let userText = '';
     for (let i = messages.length - 1; i >= 0; i--) {
       const msg = messages[i];
       if (msg.role === 'tool') {
-        toolResults.unshift(formatToolResult(msg));
+        trailingToolMessages.unshift(msg);
       } else if (msg.role === 'user') {
         userText = extractContent(msg.content);
         break;
@@ -329,39 +275,20 @@ function messagesToPrompt(messages, { resuming = false, tools = null } = {}) {
         break;
       }
     }
+    const toolResultsText = toolMessagesToText(trailingToolMessages);
     const parts = [];
-    if (toolResults.length) {
-      parts.push(`<tool_results>\n${toolResults.join('\n')}\n</tool_results>`);
-      // The model sometimes treats a bare <tool_results> block as "just data"
-      // and returns empty. A short nudge keeps the turn productive without
-      // biasing what comes next.
-      if (!userText) parts.push('Use the tool results above to continue toward the final answer. If more tool calls are needed, emit them; otherwise respond directly.');
-    }
+    if (toolResultsText) parts.push(toolResultsText);
     if (userText) parts.push(userText);
-    return parts.join('\n\n') || extractContent(messages[messages.length - 1].content);
+    return {
+      promptText: parts.join('\n\n') || extractContent(messages[messages.length - 1]?.content || ''),
+    };
   }
+  // Fresh request: serialize visible history as XML-wrapped text. No
+  // tool-instruction injection (the model learns about tools via the SDK
+  // MCP registration, not the prompt).
   const parts = [];
-  // Tool instructions prepended once at the top of the system context.
-  if (tools && tools.length) {
-    parts.push(`<system>\n${buildToolInstructions(tools)}\n</system>\n`);
-  }
-  // Group consecutive tool-role messages so they emit as one <tool_results> block.
-  let toolBuffer = [];
-  const flushTools = () => {
-    if (toolBuffer.length) {
-      parts.push(`<tool_results>\n${toolBuffer.join('\n')}\n</tool_results>\n`);
-      toolBuffer = [];
-    }
-  };
   for (const msg of messages) {
-    if (msg.role === 'tool') {
-      toolBuffer.push(formatToolResult(msg));
-      continue;
-    }
-    flushTools();
     switch (msg.role) {
       case 'system':
         parts.push(`<system>\n${extractContent(msg.content)}\n</system>\n`);
@@ -369,18 +296,34 @@ function messagesToPrompt(messages, { resuming = false, tools = null } = {}) {
       case 'user':
         parts.push(extractContent(msg.content));
         break;
-      case 'assistant':
-        parts.push(`<previous_response>\n${formatAssistantForReplay(msg)}\n</previous_response>\n`);
+      case 'assistant': {
+        // Best-effort replay. tool_calls in non-resume history are dropped;
+        // the model can usually infer continuity from the surrounding text.
+        const text = extractContent(msg.content);
+        if (text) parts.push(`<previous_response>\n${text}\n</previous_response>\n`);
+        break;
+      }
+      case 'tool': {
+        // Tool messages on a fresh turn (rare — clients normally use
+        // session keys). Splice as text since there's no preceding
+        // tool_use turn we can bind to natively.
+        const text = toolMessagesToText([msg]);
+        if (text) parts.push(text);
         break;
+      }
     }
   }
-  flushTools();
-  return parts.join('\n').trim();
+  return {
+    promptText: parts.join('\n').trim(),
+  };
 }
-// Wrap a prompt + optional image blocks into the form query() expects.
-// Returns a string when there are no images (fast path), or an async iterable
-// yielding one SDKUserMessage with multi-part content when there are.
+/**
+ * Wrap promptText + optional image blocks into the form query() expects.
+ * Returns a string for the fast path (text-only, no images), or an
+ * async iterable yielding one SDKUserMessage with multi-part content
+ * when there are images.
+ */
 function buildQueryPrompt(promptText, imageBlocks) {
   if (!imageBlocks.length) return promptText;
   const content = [
@@ -443,12 +386,15 @@ async function handleStreaming(req, res, body, requestId, sessionKey) {
   const existing = getSession(sessionKey);
   const resuming = !!existing?.sdkSessionId;
   const toolsEnabled = hasTools(body);
-  const promptText = messagesToPrompt(body.messages, { resuming, tools: toolsEnabled ? body.tools : null });
+  const { promptText } = messagesToPrompt(body.messages, { resuming });
   const images = collectImages(body.messages);
   const prompt = buildQueryPrompt(promptText, images);
   const model = resolveModel(body.model);
+  // Build the in-process MCP server exposing client tools to the SDK.
+  // null when toolsEnabled is false (or all tools are malformed).
+  const clientToolsServer = toolsEnabled ? buildClientToolsServer(body.tools) : null;
   if (images.length) console.log(`  [multimodal] ${images.length} image block(s)`);
-  if (toolsEnabled) console.log(`  [tools] ${body.tools.length} client tool(s) — buffering stream`);
+  if (toolsEnabled) console.log(`  [tools] ${body.tools.length} client tool(s) registered as MCP`);
   res.setHeader('Content-Type', 'text/event-stream');
   res.setHeader('Cache-Control', 'no-cache');
@@ -473,11 +419,17 @@ async function handleStreaming(req, res, body, requestId, sessionKey) {
     console.log(`  [session] resuming: ${sessionKey} → sdk=${existing.sdkSessionId} (msgs=${existing.messageCount})`);
   }
-  let bufferedText = ''; // only used when toolsEnabled
+  // Tools-mode buffers text and collects native tool_use blocks. If the
+  // model emits text first then a tool_use, we want both: textBefore as
+  // the assistant content, plus the tool_calls. (Most clients display the
+  // text and then act on the tool_calls.)
+  let bufferedText = '';
+  let collectedToolCalls = []; // [{id, name, arguments}] from extractToolUses()
   const runQuery = async () => {
     // Reset per-attempt state so a 401 retry starts clean
     bufferedText = '';
+    collectedToolCalls = [];
     isFirst = true;
     resolvedModel = model;
     capturedSessionId = existing?.sdkSessionId || null;
@@ -490,7 +442,18 @@ async function handleStreaming(req, res, body, requestId, sessionKey) {
         permissionMode: 'bypassPermissions',
         allowDangerouslySkipPermissions: true,
         abortController,
-        ...(toolsEnabled ? { allowedTools: [] } : {}),
+        // Tools-mode: register client tools as an in-process MCP server
+        // and allow only those (no Bash/Read/etc. — the SDK's built-ins
+        // would pollute the session and leak through to the model).
+        ...(clientToolsServer
+          ? {
+              mcpServers: { [MCP_SERVER_NAME]: clientToolsServer },
+              allowedTools: [`${MCP_TOOL_PREFIX}*`],
+            }
+          : toolsEnabled
+            // Tools were requested but none were valid — disable all tools.
+            ? { allowedTools: [] }
+            : {}),
         ...(resuming ? { resume: existing.sdkSessionId } : {}),
         ...(sessionKey && !resuming ? { persistSession: true } : {}),
       },
@@ -532,15 +495,25 @@ async function handleStreaming(req, res, body, requestId, sessionKey) {
         throw new AuthFailureInResultText(turnText);
       }
+      // Tools-mode: check for native tool_use content blocks. The moment
+      // we see one, abort the SDK — we don't want our stub handler to
+      // hang waiting on an execution that's actually happening client-side.
+      if (toolsEnabled && message.type === 'assistant' && hasToolUse(message)) {
+        const calls = extractToolUses(message);
+        if (calls.length) {
+          collectedToolCalls.push(...calls);
+          if (turnText) bufferedText += turnText;
+          console.log(`  [tools] ${calls.length} native tool_use block(s) — aborting SDK`);
+          abortController.abort();
+          break;
+        }
+      }
       if (turnText) {
         if (toolsEnabled) {
+          // Buffer text in case it precedes a tool_use, or ends up as the
+          // final response when the model decides not to call any tools.
           bufferedText += turnText;
-          // Abort early once we see a complete <tool_call>...</tool_call>
-          if (hasCompleteToolCall(bufferedText)) {
-            console.log('  [tools] complete tool_call detected — aborting SDK');
-            abortController.abort();
-            break;
-          }
         } else {
           sendSSE(res, makeChunk(requestId, resolvedModel, turnText, isFirst ? 'assistant' : undefined, null));
           isFirst = false;
@@ -586,9 +559,8 @@ async function handleStreaming(req, res, body, requestId, sessionKey) {
   // Tools mode: emit the buffered response as a single chunk with either
   // tool_calls (+ finish_reason: tool_calls) or plain text (+ stop).
   if (toolsEnabled && !res.writableEnded) {
-    const parsed = parseToolCalls(bufferedText);
-    if (parsed) {
-      console.log(`  [tools] emitting ${parsed.toolCalls.length} tool_call(s)`);
+    if (collectedToolCalls.length > 0) {
+      console.log(`  [tools] emitting ${collectedToolCalls.length} tool_call(s)`);
       const chunk = {
         id: `chatcmpl-${requestId}`,
         object: 'chat.completion.chunk',
@@ -598,8 +570,8 @@ async function handleStreaming(req, res, body, requestId, sessionKey) {
           index: 0,
           delta: {
             role: 'assistant',
-            content: parsed.textBefore || null,
-            tool_calls: parsed.toolCalls.map((tc, i) => ({
+            content: bufferedText.trim() || null,
+            tool_calls: collectedToolCalls.map((tc, i) => ({
               index: i,
               id: tc.id,
               type: 'function',
@@ -634,14 +606,16 @@ async function handleNonStreaming(res, body, requestId, sessionKey) {
   const existing = getSession(sessionKey);
   const resuming = !!existing?.sdkSessionId;
   const toolsEnabled = hasTools(body);
-  const promptText = messagesToPrompt(body.messages, { resuming, tools: toolsEnabled ? body.tools : null });
+  const { promptText } = messagesToPrompt(body.messages, { resuming });
   const images = collectImages(body.messages);
   const prompt = buildQueryPrompt(promptText, images);
   const model = resolveModel(body.model);
+  const clientToolsServer = toolsEnabled ? buildClientToolsServer(body.tools) : null;
   if (images.length) console.log(`  [multimodal] ${images.length} image block(s)`);
-  if (toolsEnabled) console.log(`  [tools] ${body.tools.length} client tool(s)`);
+  if (toolsEnabled) console.log(`  [tools] ${body.tools.length} client tool(s) registered as MCP`);
   let resultText = '';
+  let collectedToolCalls = [];
   let resolvedModel = model;
   let inputTokens = 0;
   let outputTokens = 0;
@@ -655,6 +629,7 @@ async function handleNonStreaming(res, body, requestId, sessionKey) {
   const runQuery = async () => {
     // Reset per-attempt state so a 401 retry starts clean
     resultText = '';
+    collectedToolCalls = [];
     resolvedModel = model;
     inputTokens = 0;
     outputTokens = 0;
@@ -668,7 +643,14 @@ async function handleNonStreaming(res, body, requestId, sessionKey) {
         permissionMode: 'bypassPermissions',
         allowDangerouslySkipPermissions: true,
         abortController,
-        ...(toolsEnabled ? { allowedTools: [] } : {}),
+        ...(clientToolsServer
+          ? {
+              mcpServers: { [MCP_SERVER_NAME]: clientToolsServer },
+              allowedTools: [`${MCP_TOOL_PREFIX}*`],
+            }
+          : toolsEnabled
+            ? { allowedTools: [] }
+            : {}),
         ...(resuming ? { resume: existing.sdkSessionId } : {}),
         ...(sessionKey && !resuming ? { persistSession: true } : {}),
       },
@@ -696,11 +678,15 @@ async function handleNonStreaming(res, body, requestId, sessionKey) {
           abortController.abort();
           throw new AuthFailureInResultText(resultText);
         }
-        // Abort early once we see a complete <tool_call>...</tool_call>
-        if (toolsEnabled && hasCompleteToolCall(resultText)) {
-          console.log('  [tools] complete tool_call detected — aborting SDK');
-          abortController.abort();
-          break;
+        // Native tool_use detection — abort the moment a tool_use lands.
+        if (toolsEnabled && hasToolUse(message)) {
+          const calls = extractToolUses(message);
+          if (calls.length) {
+            collectedToolCalls.push(...calls);
+            console.log(`  [tools] ${calls.length} native tool_use block(s) — aborting SDK`);
+            abortController.abort();
+            break;
+          }
         }
       }
@@ -740,32 +726,29 @@ async function handleNonStreaming(res, body, requestId, sessionKey) {
   if (sessionKey) responseHeaders['X-Session-Id'] = sessionKey;
   // Tool-calling response shape
-  if (toolsEnabled) {
-    const parsed = parseToolCalls(resultText);
-    if (parsed) {
-      console.log(`  [tools] emitting ${parsed.toolCalls.length} tool_call(s)`);
-      return res.set(responseHeaders).json({
-        id: `chatcmpl-${requestId}`,
-        object: 'chat.completion',
-        created: Math.floor(Date.now() / 1000),
-        model: normalizeModelName(resolvedModel),
-        choices: [{
-          index: 0,
-          message: {
-            role: 'assistant',
-            content: parsed.textBefore || null,
-            tool_calls: parsed.toolCalls.map((tc) => ({
-              id: tc.id,
-              type: 'function',
-              function: { name: tc.name, arguments: tc.arguments },
-            })),
-          },
-          finish_reason: 'tool_calls',
-        }],
-        usage: { prompt_tokens: inputTokens, completion_tokens: outputTokens, total_tokens: inputTokens + outputTokens },
-      });
-    }
-    // No tool_call tags → fall through to normal text response
+  if (toolsEnabled && collectedToolCalls.length > 0) {
+    console.log(`  [tools] emitting ${collectedToolCalls.length} tool_call(s)`);
+    return res.set(responseHeaders).json({
+      id: `chatcmpl-${requestId}`,
+      object: 'chat.completion',
+      created: Math.floor(Date.now() / 1000),
+      model: normalizeModelName(resolvedModel),
+      choices: [{
+        index: 0,
+        message: {
+          role: 'assistant',
+          content: resultText.trim() || null,
+          tool_calls: collectedToolCalls.map((tc) => ({
+            id: tc.id,
+            type: 'function',
+            function: { name: tc.name, arguments: tc.arguments },
+          })),
+        },
+        finish_reason: 'tool_calls',
+      }],
+      usage: { prompt_tokens: inputTokens, completion_tokens: outputTokens, total_tokens: inputTokens + outputTokens },
+    });
+    // No tool_use blocks → fall through to normal text response
   }
   res.set(responseHeaders).json({
@@ -1090,6 +1073,62 @@ app.get('/dashboard/logs', async (req, res) => {
   }
 });
+// ---------------------------------------------------------------------------
+// Updater — dashboard-driven "update available → update now" flow
+// ---------------------------------------------------------------------------
+// GET /update/check — is there a newer mobygate on npm?
+// Response: { current, latest, updateAvailable, installMode, canApply, cached, error }
+// Safe to poll: the npm registry call is cached for 15 min in-process.
+app.get('/update/check', async (req, res) => {
+  try {
+    const force = req.query.force === '1' || req.query.force === 'true';
+    const info = await getUpdateCheck({ force });
+    res.json(info);
+  } catch (e) {
+    res.status(500).json({ error: e.message });
+  }
+});
+// POST /update/apply — fire the update in a detached child process.
+// We return immediately with { started, pid }. The child runs
+// `npm install -g mobygate@latest` (or `git pull && npm install`), then
+// restarts the service — which kills us. The dashboard polls
+// /update/status to show progress and reconnects once the new server is up.
+app.post('/update/apply', (_req, res) => {
+  try {
+    const result = applyUpdate({});
+    const status = result.started ? 202 : 409;
+    res.status(status).json({ ...result, currentVersion: getCurrentVersion() });
+    if (result.started) {
+      dashboardBus.emitEvent({ type: 'update.started', pid: result.pid, mode: result.mode });
+    }
+  } catch (e) {
+    res.status(500).json({ started: false, error: e.message });
+  }
+});
+// GET /update/status — progress for a running (or just-finished) update.
+// The dashboard polls this during apply. `running` is determined by
+// PID liveness, so even if our process is the one getting restarted,
+// the new one answers correctly.
+app.get('/update/status', (req, res) => {
+  const state = readUpdateState();
+  let running = false;
+  if (state.pid) {
+    try { process.kill(state.pid, 0); running = true; } catch {}
+  }
+  const lines = Math.min(1000, parseInt(req.query.lines || '200', 10));
+  res.json({
+    running,
+    pid: state.pid || null,
+    startedAt: state.startedAt || null,
+    mode: state.mode || null,
+    lines: readUpdateLogTail({ lines }),
+    currentVersion: getCurrentVersion(),
+  });
+});
 // ---------------------------------------------------------------------------
 // Start
 // ---------------------------------------------------------------------------