mobygate 0.5.2 → 0.6.0

package/CHANGELOG.md CHANGED
@@ -4,6 +4,128 @@ All notable changes to mobygate are documented here. Format loosely follows
4
4
  [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); version numbers are
5
5
  [Semantic Versioning](https://semver.org/).
6
6
 
7
+ ## [0.6.0] — 2026-04-24
8
+
9
+ Big one. Native tool calling + in-dashboard self-update.
10
+
11
+ ### Added
12
+
13
+ - **Native MCP tool calling.** Client-supplied OpenAI tools are now
14
+ registered with the Claude Agent SDK as in-process MCP tools (with
15
+ Zod schemas converted from JSON Schema). The model emits genuine
16
+ `tool_use` content blocks instead of the old `<tool_call>...`
17
+ text-pattern hack. Tool IDs returned to clients are now Anthropic-
18
+ native `toolu_*` strings, not synthesized `call_*` ones. New module:
19
+ `lib/tool-bridge.js`.
20
+ - **Dashboard update banner.** When npm has a newer mobygate, the
21
+ dashboard shows an orange pill at the top: `v0.6.0 → v0.6.1 available
22
+ · npm install · [changelog] [dismiss] [update now]`. Clicking
23
+ "update now" fires `npm install -g mobygate@latest` (or `git pull`
24
+ for git-mode installs) in a detached child process, restarts the
25
+ service, and auto-reloads the page. Dismissals stick per-version
26
+ via localStorage. New module: `lib/updater.js`.
27
+ - New endpoints: `GET /update/check`, `POST /update/apply`,
28
+ `GET /update/status`. The check endpoint caches the npm registry
29
+ lookup for 15 minutes so dashboards open all day don't hammer it.
30
+
31
+ ### Changed
32
+
33
+ - **No more prompt-injected tool definitions.** The `<system>...</system>`
34
+ block listing available tools as XML is gone — the SDK's MCP
35
+ registration is the model's source of truth now. This shrinks every
36
+ tool-enabled prompt by ~200-500 tokens depending on tool count.
37
+ - **Tool-flow detection** moved from text-pattern matching
38
+ (`hasCompleteToolCall`, `parseToolCalls` regexes) to native
39
+ `tool_use` content-block detection in the assistant message stream
40
+ (`hasToolUse`, `extractToolUses`). The moment a tool_use lands,
41
+ we abort the SDK and emit OpenAI-shape `tool_calls`.
42
+ - **`alwaysLoad: true`** on every registered tool. Without this, the
43
+ SDK lazily defers MCP tool schemas — the model has to call the
44
+ built-in `ToolSearch` tool to fetch each definition before invoking,
45
+ which leaks through to OpenAI clients as a confusing tool_call
46
+ for `ToolSearch` instead of their actual tool. Eager loading
47
+ keeps the surface clean.
48
+
49
+ ### Removed
50
+
51
+ - `buildToolInstructions` — the `<tool_call>...` protocol prose.
52
+ - `parseToolCalls` — the regex parser for `<tool_call>` JSON blocks.
53
+ - `hasCompleteToolCall` — the streaming-buffer heuristic that aborted
54
+ the SDK when a complete tag pair appeared.
55
+ - `formatAssistantForReplay`'s tool_calls→`<tool_call>` text
56
+ serialization (assistant replay is now best-effort text only).
57
+ - The "Use the tool results above to continue toward the final answer"
58
+ nudge — tool results are visible in conversation context now, so
59
+ the model handles continuation naturally without coaxing.
60
+
61
+ ### Known limitation (Phase 1 deliberate)
62
+
63
+ - Tool *results* coming back from the client are still spliced as
64
+ `<tool_results>` text in the resumed prompt, not native Anthropic
65
+ `tool_result` content blocks. Reason: aborting the SDK on a
66
+ `tool_use` block prevents the assistant turn from being persisted
67
+ in session state — on resume, native tool_result blocks have
68
+ nothing to bind to and the model re-calls the tool. Text-form
69
+ results work because the resumed model has the prior turn in
70
+ context. Phase 2's full Anthropic Messages wire surface will
71
+ keep the SDK alive through the tool turn and switch to native
72
+ tool_result blocks end-to-end.
73
+
74
+ ### Migration
75
+
76
+ - No request-side changes. Existing OpenAI-shape requests with
77
+ `tools: [...]` work the same as before. What improves is
78
+ reliability ("Model returned empty after tool calls" warnings
79
+ should largely disappear) and surface fidelity (tool_call IDs
80
+ are now native Anthropic `toolu_*` IDs, not synthesized `call_*`).
81
+ - Update with `mobygate update` (CLI) or click the new "update now"
82
+ button in the dashboard once it appears.
83
+
84
+ ## [0.5.3] — 2026-04-19
85
+
86
+ Security pass.
87
+
88
+ ### Changed
89
+
90
+ - **Default listen address is now `127.0.0.1`** (loopback only). Earlier
91
+ versions called `app.listen(PORT)` with no host, which on macOS binds
92
+ to `::` (IPv6 all interfaces) — meaning anyone on your Wi-Fi could
93
+ reach `:3456`, use your Claude Max subscription, and read your
94
+ request logs. New default blocks that; startup banner now calls
95
+ out the bind ("loopback only" vs "⚠ network-reachable — add auth").
96
+ - **Opt-in LAN sharing** via `bind: 0.0.0.0` (or any specific
97
+ interface) in `~/.mobygate/config.yaml`, or via the `BIND` env
98
+ var. If you opt in, consider putting an auth proxy in front of
99
+ the port — the dashboard and HTTP endpoints have no authentication.
100
+
101
+ ### Fixed
102
+
103
+ - **Dashboard XSS** in live-requests and sessions rows. User-
104
+ controlled fields (`model` and `session` on request rows, `key`
105
+ and `model` on session rows) were interpolated directly into `innerHTML`. A
106
+ malicious local process that can reach :3456 could have
107
+ injected `<script>` via a crafted `model` string and executed
108
+ JS in whichever browser tab had the dashboard open. Added an
109
+ `escHtml()` helper and wrapped every user-controlled field
110
+ interpolated via innerHTML.
111
+ - Added `hono >= 4.12.14` as an npm `overrides` entry to clear
112
+ the single `moderate` audit finding (a transitive via
113
+ `@modelcontextprotocol/sdk` → `hono/jsx`). We don't actually
114
+ load hono/jsx, so it was never exploitable, but `npm audit`
115
+ now reports `0 vulnerabilities` — cleaner for downstream users.
116
+
117
+ ### Migration
118
+
119
+ For existing installs: `mobygate update` or
120
+ `npm install -g mobygate@latest` — the postinstall hook restarts
121
+ your service and the new loopback-only bind kicks in.
122
+
123
+ If you were **intentionally** exposing mobygate on the LAN (e.g.,
124
+ "one proxy for the family"), add `bind: 0.0.0.0` to
125
+ `~/.mobygate/config.yaml` and restart. Strongly recommend adding
126
+ an auth proxy (nginx with Basic Auth, Cloudflare Access, etc.)
127
+ in front of the port if you do this.
128
+
7
129
  ## [0.5.2] — 2026-04-19
8
130
 
9
131
  ### Added
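Reviewer sketch (not part of the package): `lib/updater.js` is not included in this diff, so the following only illustrates how a 15-minute cache in front of the npm registry lookup could behave, including a bypass like the `?force=1` query the dashboard sends. `makeCachedLookup`, its parameters, and the injected clock are hypothetical names, not the module's actual API.

```javascript
// Hypothetical helper: reuse the last result of `lookup` until `ttlMs` has elapsed.
// `now` is injectable so the behaviour can be checked without waiting 15 minutes.
function makeCachedLookup(lookup, ttlMs, now = Date.now) {
  let cachedAt = -Infinity; // forces a real lookup on first use
  let cachedValue;
  return function get({ force = false } = {}) {
    const t = now();
    if (force || t - cachedAt >= ttlMs) {
      // `lookup` may return a Promise (e.g. a registry fetch); callers await it.
      // A real implementation would likely avoid caching a rejected Promise.
      cachedValue = lookup();
      cachedAt = t;
    }
    return cachedValue;
  };
}

// Usage with a fake clock and a counter standing in for the registry call.
let registryCalls = 0;
let fakeClock = 0;
const FIFTEEN_MIN = 15 * 60 * 1000;
const checkLatest = makeCachedLookup(() => ++registryCalls, FIFTEEN_MIN, () => fakeClock);

const first = checkLatest();                 // hits the "registry"
fakeClock = 14 * 60 * 1000;
const second = checkLatest();                // served from cache
fakeClock = FIFTEEN_MIN;
const third = checkLatest();                 // TTL elapsed, looks up again
const forced = checkLatest({ force: true }); // bypasses the cache
```

With the dashboard polling every 30 minutes and a 15-minute TTL, a single open tab would trigger at most one registry lookup per poll; the cache mainly helps when several tabs or reloads hit `/update/check` close together.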
package/index.html CHANGED
@@ -49,6 +49,34 @@
49
49
  <body class="antialiased">
50
50
  <div class="mx-auto px-12 pt-8 pb-7 flex flex-col gap-6 max-w-[1440px] min-h-screen">
51
51
 
52
+ <!-- ===== Update banner ===== -->
53
+ <!-- Hidden until /update/check reports updateAvailable=true. During
54
+ apply, this becomes a progress strip showing live log tail. -->
55
+ <section id="updateBanner" style="display:none" class="items-center gap-4 py-3 px-5 bg-[#121210] border-l-2 border-l-[#E89B2E] border-t border-b border-r border-[#2A2A1F] rounded-r-md">
56
+ <div class="flex items-center gap-2.5">
57
+ <span class="rounded-full bg-[#E89B2E] w-2 h-2 pulse-dot"></span>
58
+ <span class="uppercase text-[#E89B2E] font-medium text-[10px] tracking-[0.22em]">Update</span>
59
+ </div>
60
+ <div id="updateBannerText" class="grow text-[#F3EFE4] text-xs leading-4"></div>
61
+ <div id="updateBannerActions" class="flex items-center gap-2 shrink-0">
62
+ <a id="updateBannerChangelog" href="https://github.com/khnfrhn/mobygate/blob/master/CHANGELOG.md" target="_blank" rel="noreferrer" class="text-[#8A9A6A] hover:text-[#C9D9A8] text-[11px] tracking-[0.04em] underline decoration-dotted">changelog</a>
63
+ <button id="updateDismissBtn" class="rounded-full py-1.5 px-3 border border-[#2A2A1F] text-[#8A9A6A] hover:text-[#C9D9A8] hover:border-[#5A5F54] font-medium text-[11px] tracking-[0.04em] transition">dismiss</button>
64
+ <button id="updateApplyBtn" class="rounded-full py-1.5 px-3.5 bg-[#E89B2E] hover:brightness-110 text-[#0B0B09] font-bold text-[11px] tracking-[0.04em] transition">update now</button>
65
+ </div>
66
+ </section>
67
+ <!-- Apply-in-progress shelf: expands below the banner during update. -->
68
+ <section id="updateProgress" style="display:none" class="flex-col gap-2 py-3 px-5 bg-[#121210] border border-[#2A2A1F] rounded-md">
69
+ <div class="flex items-center justify-between">
70
+ <div class="flex items-center gap-2">
71
+ <span id="updateSpinner" class="rounded-full bg-[#E89B2E] w-2 h-2 pulse-dot"></span>
72
+ <span id="updateProgressTitle" class="uppercase text-[#C9D9A8] font-medium text-[10px] tracking-[0.22em]">Installing</span>
73
+ <span id="updateProgressSub" class="text-[#5A5F54] text-[11px]"></span>
74
+ </div>
75
+ <button id="updateProgressClose" style="display:none" class="text-[#5A5F54] hover:text-[#C9D9A8] text-[11px]">close ✕</button>
76
+ </div>
77
+ <pre id="updateProgressLog" class="text-[11px] leading-[15px] text-[#8A9A6A] max-h-[180px] overflow-auto whitespace-pre-wrap m-0"></pre>
78
+ </section>
79
+
52
80
  <!-- ===== Header ===== -->
53
81
  <header class="flex justify-between items-center shrink-0">
54
82
  <div class="flex items-center gap-[22px]">
@@ -357,6 +385,17 @@
357
385
  <script type="module">
358
386
  // ───────────────────────── helpers
359
387
  const $ = (id) => document.getElementById(id);
388
+ // Escape HTML in user-controlled strings (request model/session/error
389
+ // fields, session keys, etc.) before innerHTML interpolation. The
390
+ // dashboard is unauthenticated, so any process that can reach the
391
+ // proxy could otherwise inject a <script> via a crafted request and
392
+ // execute JS in whoever's tab is viewing the dashboard.
393
+ const escHtml = (s) => String(s ?? '')
394
+ .replace(/&/g, '&amp;')
395
+ .replace(/</g, '&lt;')
396
+ .replace(/>/g, '&gt;')
397
+ .replace(/"/g, '&quot;')
398
+ .replace(/'/g, '&#39;');
360
399
  const fmt = {
361
400
  time(ts) { return new Date(ts).toLocaleTimeString([], { hour12: false }); },
362
401
  ms(n) { return n == null ? '—' : `${n}`; },
@@ -553,10 +592,10 @@
553
592
  <div class="w-[72px] shrink-0 text-[#C9D9A8] text-xs leading-4">${fmt.time(startEv.ts)}</div>
554
593
  <div class="w-[100px] flex shrink-0 gap-1">${kindChips(startEv)}</div>
555
594
  <div class="w-[180px] flex flex-col shrink-0 gap-0.5">
556
- <div class="text-[#F3EFE4] text-xs leading-4 truncate">${startEv.model || '—'}</div>
557
- <div class="text-[#5A5F54] text-[10px] leading-3">${fmt.modelBase(startEv.model)} · ${fmt.modelCtx(startEv.resolvedModel)}</div>
595
+ <div class="text-[#F3EFE4] text-xs leading-4 truncate">${escHtml(startEv.model) || '—'}</div>
596
+ <div class="text-[#5A5F54] text-[10px] leading-3">${escHtml(fmt.modelBase(startEv.model))} · ${escHtml(fmt.modelCtx(startEv.resolvedModel))}</div>
558
597
  </div>
559
- <div class="w-[110px] shrink-0 text-[#8A9A6A] text-xs leading-4 truncate" title="${startEv.session || ''}">${startEv.session ? fmt.short(startEv.session) : '—'}</div>
598
+ <div class="w-[110px] shrink-0 text-[#8A9A6A] text-xs leading-4 truncate" title="${escHtml(startEv.session || '')}">${startEv.session ? escHtml(fmt.short(startEv.session)) : '—'}</div>
560
599
  <div class="grow flex flex-col gap-1">${latencyBar(endEv)}</div>
561
600
  <div class="w-[100px] text-right shrink-0 text-[#8A9A6A] text-[11px] leading-[14px]">${endEv && (endEv.inputTokens || endEv.outputTokens) ? `${endEv.inputTokens || 0}/${endEv.outputTokens || 0}` : '—'}</div>
562
601
  <div class="w-[70px] flex justify-end shrink-0">${statusPill(endEv)}</div>
@@ -680,12 +719,12 @@
680
719
  const row = document.createElement('div');
681
720
  row.className = 'flex items-center py-3 px-6 gap-4 border-b border-[#1A1A15]';
682
721
  row.innerHTML = `
683
- <div class="grow min-w-0 text-[#F3EFE4] text-xs leading-4 truncate" title="${s.key}">${s.key}</div>
684
- <div class="w-[160px] shrink-0 text-[#8A9A6A] text-xs leading-4 truncate">${s.model || '—'}</div>
685
- <div class="w-[60px] text-right shrink-0 text-[#8A9A6A] text-xs">${s.messageCount}</div>
722
+ <div class="grow min-w-0 text-[#F3EFE4] text-xs leading-4 truncate" title="${escHtml(s.key)}">${escHtml(s.key)}</div>
723
+ <div class="w-[160px] shrink-0 text-[#8A9A6A] text-xs leading-4 truncate">${escHtml(s.model) || '—'}</div>
724
+ <div class="w-[60px] text-right shrink-0 text-[#8A9A6A] text-xs">${Number(s.messageCount) || 0}</div>
686
725
  <div class="w-[80px] text-right shrink-0 text-[#5A5F54] text-[11px]">${fmt.uptime(s.idleSec)}</div>
687
726
  <div class="w-[80px] text-right shrink-0 text-[#5A5F54] text-[11px]">${fmt.uptime(s.ttlRemainingSec)} left</div>
688
- <button class="text-[#E89B2E] text-[11px] hover:brightness-110 shrink-0" data-key="${s.key}">expire</button>
727
+ <button class="text-[#E89B2E] text-[11px] hover:brightness-110 shrink-0" data-key="${escHtml(s.key)}">expire</button>
689
728
  `;
690
729
  row.querySelector('button').addEventListener('click', async () => {
691
730
  await fetch('/sessions/' + encodeURIComponent(s.key), { method: 'DELETE' });
@@ -793,6 +832,117 @@
793
832
  }
794
833
  }, 1000);
795
834
 
835
+ // ───────────────────────── Updater
836
+ // Dashboard-driven upgrade flow. On load (and every 30 min) we ask
837
+ // /update/check whether a newer mobygate is on npm. If so, a pill
838
+ // appears at the top of the page — click "update now" to fire the
839
+ // update, watch log lines stream in, then auto-reload when the new
840
+ // server is up. The child process is detached, so the server
841
+ // restart doesn't orphan it.
842
+ const UPDATE_DISMISS_KEY = 'mobygate:update:dismissedVersion';
843
+ let updateInfo = null;
844
+ let updatePollTimer = null;
845
+
846
+ function showBanner(info) {
847
+ if (!info?.updateAvailable) {
848
+ $('updateBanner').style.display = 'none';
849
+ return;
850
+ }
851
+ // Respect dismissal: if the user dismissed this exact version, don't
852
+ // re-pester until a newer one lands.
853
+ const dismissed = localStorage.getItem(UPDATE_DISMISS_KEY);
854
+ if (dismissed === info.latest) {
855
+ $('updateBanner').style.display = 'none';
856
+ return;
857
+ }
858
+ const msg = info.canApply
859
+ ? `v${escHtml(info.current)} → <span class="text-[#B7E56D]">v${escHtml(info.latest)}</span> available · <span class="text-[#5A5F54]">${escHtml(info.installMode)} install</span>`
860
+ : `v${escHtml(info.current)} → <span class="text-[#B7E56D]">v${escHtml(info.latest)}</span> available · <span class="text-[#E89B2E]">${escHtml(info.installMode)} install — update manually</span>`;
861
+ $('updateBannerText').innerHTML = msg;
862
+ $('updateApplyBtn').style.display = info.canApply ? '' : 'none';
863
+ $('updateBanner').style.display = 'flex';
864
+ }
865
+
866
+ async function checkForUpdates({ force = false } = {}) {
867
+ try {
868
+ const r = await fetch(`/update/check${force ? '?force=1' : ''}`);
869
+ if (!r.ok) return;
870
+ updateInfo = await r.json();
871
+ showBanner(updateInfo);
872
+ } catch (e) { /* offline is fine */ }
873
+ }
874
+
875
+ function renderUpdateLog(lines) {
876
+ const el = $('updateProgressLog');
877
+ el.textContent = (lines || []).join('\n');
878
+ // Pin to bottom so the user sees the latest line.
879
+ el.scrollTop = el.scrollHeight;
880
+ }
881
+
882
+ async function pollUpdateStatus() {
883
+ try {
884
+ const r = await fetch('/update/status?lines=200');
885
+ if (!r.ok) return;
886
+ const s = await r.json();
887
+ renderUpdateLog(s.lines);
888
+ if (!s.running) {
889
+ // Update finished. The service restart may have already swapped
890
+ // the running binary — our `currentVersion` reflects whatever
891
+ // server answered. If it matches `latest`, celebrate. Either
892
+ // way, give it a moment then reload so the dashboard comes
893
+ // back on the new code path.
894
+ clearInterval(updatePollTimer); updatePollTimer = null;
895
+ $('updateSpinner').classList.remove('pulse-dot');
896
+ $('updateSpinner').classList.remove('bg-[#E89B2E]');
897
+ $('updateSpinner').classList.add('bg-[#B7E56D]');
898
+ $('updateProgressTitle').textContent = 'Installed';
899
+ $('updateProgressSub').textContent = `now on v${s.currentVersion} — reloading in 3s…`;
900
+ $('updateProgressClose').style.display = '';
901
+ setTimeout(() => location.reload(), 3000);
902
+ }
903
+ } catch (e) {
904
+ // Server is mid-restart — keep polling, it'll come back.
905
+ }
906
+ }
907
+
908
+ function startUpdateProgress(mode) {
909
+ $('updateBanner').style.display = 'none';
910
+ $('updateProgress').style.display = 'flex';
911
+ $('updateProgressSub').textContent = mode ? `(${mode} install)` : '';
912
+ $('updateProgressTitle').textContent = 'Installing';
913
+ $('updateSpinner').classList.add('pulse-dot');
914
+ $('updateProgressLog').textContent = 'starting update…';
915
+ if (updatePollTimer) clearInterval(updatePollTimer);
916
+ updatePollTimer = setInterval(pollUpdateStatus, 1500);
917
+ pollUpdateStatus();
918
+ }
919
+
920
+ $('updateApplyBtn')?.addEventListener('click', async () => {
921
+ $('updateApplyBtn').disabled = true;
922
+ try {
923
+ const r = await fetch('/update/apply', { method: 'POST' });
924
+ const j = await r.json().catch(() => ({}));
925
+ if (!r.ok || !j.started) {
926
+ $('updateBannerText').innerHTML += ` <span class="text-[#E89B2E]">— ${escHtml(j.error || 'update failed to start')}</span>`;
927
+ $('updateApplyBtn').disabled = false;
928
+ return;
929
+ }
930
+ startUpdateProgress(j.mode);
931
+ } catch (e) {
932
+ $('updateBannerText').innerHTML += ` <span class="text-[#E89B2E]">— ${escHtml(e.message)}</span>`;
933
+ $('updateApplyBtn').disabled = false;
934
+ }
935
+ });
936
+
937
+ $('updateDismissBtn')?.addEventListener('click', () => {
938
+ if (updateInfo?.latest) localStorage.setItem(UPDATE_DISMISS_KEY, updateInfo.latest);
939
+ $('updateBanner').style.display = 'none';
940
+ });
941
+
942
+ $('updateProgressClose')?.addEventListener('click', () => {
943
+ $('updateProgress').style.display = 'none';
944
+ });
945
+
796
946
  // Kick off
797
947
  loadSnapshot();
798
948
  loadAuth({ verify: false });
@@ -800,6 +950,21 @@
800
950
  loadLogs();
801
951
  armLogAutoRefresh();
802
952
  connectStream();
953
+ // Surface update availability on load + every 30 min. The backend
954
+ // caches the npm registry lookup for 15 min, so this doesn't hammer
955
+ // the registry even with the dashboard open all day.
956
+ checkForUpdates();
957
+ setInterval(() => checkForUpdates(), 30 * 60 * 1000);
958
+ // If an update is in-flight when the page loads (e.g., user refreshed
959
+ // mid-apply), pick up where it left off.
960
+ (async () => {
961
+ try {
962
+ const r = await fetch('/update/status?lines=50');
963
+ if (!r.ok) return;
964
+ const s = await r.json();
965
+ if (s.running) startUpdateProgress(s.mode);
966
+ } catch {}
967
+ })();
803
968
  </script>
804
969
  </body>
805
970
  </html>
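Reviewer sketch: the `escHtml` helper below is copied from the hunk above, and the crafted `model` value is an illustrative assumption about what an attacker might send. An event-handler attribute is used because markup assigned through `innerHTML` does not execute `<script>` elements, while attributes such as `onerror` do fire.

```javascript
// Copied from the diff: escape the five characters that matter in HTML
// text and quoted attribute values. `&` is replaced first so the entities
// produced by the later replacements are not escaped a second time.
const escHtml = (s) => String(s ?? '')
  .replace(/&/g, '&amp;')
  .replace(/</g, '&lt;')
  .replace(/>/g, '&gt;')
  .replace(/"/g, '&quot;')
  .replace(/'/g, '&#39;');

// Hypothetical attacker-controlled value arriving in a request's `model` field.
const craftedModel = '<img src=x onerror="alert(1)">';

// Same interpolation pattern as the patched request row.
const rowHtml = `<div class="truncate">${escHtml(craftedModel) || '—'}</div>`;
// rowHtml now contains &lt;img src=x onerror=&quot;alert(1)&quot;&gt; as inert text.

// Nullish input collapses to '', so the `|| '—'` fallback still applies.
const emptyRow = escHtml(undefined) || '—';
```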
package/lib/config.js CHANGED
@@ -27,6 +27,7 @@ export const LOGS_DIR = join(CONFIG_DIR, 'logs');
27
27
 
28
28
  const DEFAULTS = {
29
29
  port: 3456,
30
+ bind: '127.0.0.1', // loopback only by default (no LAN exposure)
30
31
  default_model: 'claude-opus-4-7[1m]',
31
32
  session_ttl_minutes: 60,
32
33
  max_concurrent: null, // reserved for future (per-session throttling)
@@ -57,6 +58,7 @@ export function loadConfig() {
57
58
 
58
59
  const merged = {
59
60
  port: parseInt(process.env.PORT || String(fileConfig.port ?? DEFAULTS.port), 10),
61
+ bind: process.env.BIND || fileConfig.bind || DEFAULTS.bind,
60
62
  default_model: process.env.DEFAULT_MODEL || fileConfig.default_model || DEFAULTS.default_model,
61
63
  session_ttl_minutes: parseInt(
62
64
  process.env.SESSION_TTL_MINUTES
@@ -91,6 +93,13 @@ export function writeConfig(values = {}) {
91
93
  `# HTTP port the proxy listens on.`,
92
94
  `port: ${merged.port}`,
93
95
  '',
96
+ `# Network interface to bind to. Defaults to 127.0.0.1 (loopback only —`,
97
+ `# the proxy is only reachable from this machine). Change to 0.0.0.0 to`,
98
+ `# share it on the LAN (e.g., "one proxy for the whole family"), but be`,
99
+ `# aware: whoever can reach :port can use your Claude Max subscription`,
100
+ `# and read logs containing your prompts. Add auth if you go LAN-public.`,
101
+ `bind: ${JSON.stringify(merged.bind)}`,
102
+ '',
94
103
  `# Default Claude model when the client does not specify one.`,
95
104
  `# Other aliases (opus, sonnet, haiku) resolve per MODEL_MAP in server.js.`,
96
105
  `default_model: ${JSON.stringify(merged.default_model)}`,
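Reviewer sketch: the precedence below mirrors the `process.env.BIND || fileConfig.bind || DEFAULTS.bind` line in the hunk above. `resolveBind`, `isLoopback`, and `describeBind` are hypothetical helper names, and the banner wording only approximates the strings quoted in the changelog.

```javascript
const DEFAULTS = { port: 3456, bind: '127.0.0.1' };

// Same precedence as the diff: env var, then config file, then default.
function resolveBind(env, fileConfig) {
  return env.BIND || fileConfig.bind || DEFAULTS.bind;
}

// Hypothetical check: treat 127.0.0.0/8, ::1 and "localhost" as loopback.
function isLoopback(host) {
  return host === '::1' || host === 'localhost' || host.startsWith('127.');
}

// Hypothetical banner text, modelled on the startup banner described in the changelog.
function describeBind(host) {
  return isLoopback(host) ? 'loopback only' : 'network-reachable';
}

const defaultBind = resolveBind({}, {});                                    // '127.0.0.1'
const fileBind = resolveBind({}, { bind: '0.0.0.0' });                      // '0.0.0.0'
const envBind = resolveBind({ BIND: '192.168.1.5' }, { bind: '0.0.0.0' }); // env wins
```

Because the chain uses `||`, an empty `BIND` environment variable falls through to the config file value rather than producing an empty host.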
package/lib/tool-bridge.js ADDED
@@ -0,0 +1,257 @@
1
+ /**
2
+ * Native tool bridge — translates between OpenAI client tools and the
3
+ * Claude Agent SDK's MCP-tool model.
4
+ *
5
+ * Why this exists (Phase 1 of the mobygate native-tools refactor):
6
+ *
7
+ * Until now, mobygate handled client-supplied tools by injecting their
8
+ * schemas into the system prompt as <tool> XML and instructing the model
9
+ * to emit <tool_call>{...}</tool_call> tags in its text output. We then
10
+ * regex-parsed those tags. Fragile in obvious ways: the model sometimes
11
+ * wrapped tags in code fences, sometimes hallucinated partial blocks,
12
+ * and the "empty after tool_results" nudge existed to paper over the
13
+ * model treating bare <tool_results> as inert data.
14
+ *
15
+ * The SDK actually supports native tool definitions via MCP — but its
16
+ * MCP model assumes the **handler runs in-process** and returns a
17
+ * synchronous result. Our case is different: we're a proxy. The actual
18
+ * tool implementations live on the *other* side of an HTTP boundary,
19
+ * inside the client (Hermes / OpenClaw / etc.). We can't run them.
20
+ *
21
+ * The trick: register client tools as MCP tools with stub handlers that
22
+ * never resolve. The model emits **native** `tool_use` content blocks
23
+ * (in the SDKAssistantMessage stream, not buried in text). We watch the
24
+ * stream, abort the SDK on the first complete `tool_use`, and surface
25
+ * it to the client as an OpenAI `tool_calls` response. The stub handler
26
+ * is then aborted via the SDK's signal — we never actually execute it,
27
+ * the client does.
28
+ *
29
+ * The other end of the round-trip: when the client sends a follow-up
30
+ * request with tool results (role:'tool' messages), Phase 1 splices
31
+ * them into the resumed prompt as <tool_results> text (see
32
+ * toolMessagesToText below). Native `tool_result` content blocks are
33
+ * deferred to Phase 2, once the tool_use turn persists across resume.
34
+ *
35
+ * Names round-trip via the MCP prefix convention. A client tool named
36
+ * `getWeather` is registered as `mcp__mobygate__getWeather` with the
37
+ * SDK; the model emits tool_use blocks under that prefixed name; we
38
+ * strip the prefix on the way back so the client sees its original name.
39
+ */
40
+
41
+ import { z } from 'zod';
42
+ import { tool, createSdkMcpServer } from '@anthropic-ai/claude-agent-sdk';
43
+
44
+ export const MCP_SERVER_NAME = 'mobygate';
45
+ export const MCP_TOOL_PREFIX = `mcp__${MCP_SERVER_NAME}__`;
46
+
47
+ // ---------------------------------------------------------------------------
48
+ // JSON Schema → Zod RawShape
49
+ // ---------------------------------------------------------------------------
50
+ // The SDK's `tool()` helper takes a Zod RawShape (a record of ZodTypes,
51
+ // like `{name: z.string(), age: z.number()}`) — NOT a JSON Schema object.
52
+ // OpenAI clients send JSON Schema (`{type:'object', properties:{...}, required:[...]}`),
53
+ // so we need to convert. This handles the common cases that cover ~95% of
54
+ // real-world tool schemas; anything weirder falls through to z.unknown().
55
+
56
+ function jsonSchemaPropToZod(prop) {
57
+ if (!prop || typeof prop !== 'object') return z.unknown();
58
+
59
+ // Handle enums up front — they apply across types.
60
+ if (Array.isArray(prop.enum) && prop.enum.length > 0) {
61
+ const stringy = prop.enum.every((v) => typeof v === 'string');
62
+ if (stringy) return z.enum(prop.enum);
63
+ // mixed-type enums fall through to z.union of literals
64
+ return z.union(prop.enum.map((v) => z.literal(v)));
65
+ }
66
+
67
+ switch (prop.type) {
68
+ case 'string': return z.string();
69
+ case 'number': return z.number();
70
+ case 'integer': return z.number().int();
71
+ case 'boolean': return z.boolean();
72
+ case 'null': return z.null();
73
+ case 'array': {
74
+ const item = prop.items ? jsonSchemaPropToZod(prop.items) : z.unknown();
75
+ return z.array(item);
76
+ }
77
+ case 'object': {
78
+ const shape = jsonSchemaToZodShape(prop);
79
+ return z.object(shape).passthrough();
80
+ }
81
+ default: return z.unknown();
82
+ }
83
+ }
84
+
85
+ /**
86
+ * Convert a JSON Schema *object* (with `properties` + `required`) into
87
+ * a Zod RawShape suitable for the SDK's `tool()` helper.
88
+ *
89
+ * Returns an empty shape `{}` when the schema isn't an object — the
90
+ * caller will pass this to `tool()`, and the model will see "no
91
+ * structured input expected." That's the right default for tool defs
92
+ * that arrive without a properties block (which OpenAI permits).
93
+ */
94
+ export function jsonSchemaToZodShape(schema) {
95
+ if (!schema || schema.type !== 'object' || !schema.properties) return {};
96
+ const shape = {};
97
+ const required = new Set(Array.isArray(schema.required) ? schema.required : []);
98
+ for (const [key, prop] of Object.entries(schema.properties)) {
99
+ let zType = jsonSchemaPropToZod(prop);
100
+ if (!required.has(key)) zType = zType.optional();
101
+ if (prop?.description) zType = zType.describe(prop.description);
102
+ shape[key] = zType;
103
+ }
104
+ return shape;
105
+ }
106
+
107
+ // ---------------------------------------------------------------------------
108
+ // Build the MCP server that exposes client tools to the SDK
109
+ // ---------------------------------------------------------------------------
110
+
111
+ /**
112
+ * Stub handler. The model emits a tool_use block, the SDK calls us, but
113
+ * we don't actually have an implementation to run — the client does.
114
+ * So we wait. The stream-watcher in server.js will abort the SDK as
115
+ * soon as it sees the tool_use block, which propagates here as a signal
116
+ * abort. We reject and the SDK cleans up.
117
+ *
118
+ * The 30s safety timeout is for the (rare) case where the SDK fires our
119
+ * handler but the abort never propagates back — we don't want to leak
120
+ * a Promise forever. 30s is well past any reasonable abort latency.
121
+ */
122
+ function deferredToolHandler(_args, extra) {
123
+ return new Promise((resolve, reject) => {
124
+ const onAbort = () => {
125
+ cleanup();
126
+ reject(new Error('mobygate: tool execution deferred to client (aborted)'));
127
+ };
128
+ const timer = setTimeout(() => {
129
+ cleanup();
130
+ reject(new Error('mobygate: tool execution deferred to client (timeout)'));
131
+ }, 30_000);
132
+ function cleanup() {
133
+ clearTimeout(timer);
134
+ extra?.signal?.removeEventListener?.('abort', onAbort);
135
+ }
136
+ if (extra?.signal?.aborted) return onAbort();
137
+ extra?.signal?.addEventListener?.('abort', onAbort, { once: true });
138
+ });
139
+ }
140
+
141
+ /**
142
+ * Build an in-process MCP server exposing the client's tools to the SDK.
143
+ * Returns the McpSdkServerConfigWithInstance; pass it to `query({options: { mcpServers: { [MCP_SERVER_NAME]: config } }})`.
144
+ *
145
+ * Returns `null` when there are no valid tools — caller should skip
146
+ * MCP setup entirely in that case.
147
+ */
148
+ export function buildClientToolsServer(openaiTools) {
149
+ if (!Array.isArray(openaiTools) || openaiTools.length === 0) return null;
150
+
151
+ const toolDefs = [];
152
+ for (const t of openaiTools) {
153
+ if (t?.type !== 'function' || !t.function?.name) continue;
154
+ const fn = t.function;
155
+ const shape = jsonSchemaToZodShape(fn.parameters);
156
+ toolDefs.push(tool(
157
+ fn.name,
158
+ fn.description || `Client-defined tool: ${fn.name}`,
159
+ shape,
160
+ deferredToolHandler,
161
+ // alwaysLoad: the SDK otherwise marks MCP tools as "deferred" — the
162
+ // model has to call the built-in `ToolSearch` to fetch the schema
163
+ // before invoking. That round-trip leaks through to OpenAI clients,
164
+ // who see a confusing tool_call for ToolSearch instead of getWeather.
165
+ // Eagerly loading our tools keeps the OpenAI surface clean.
166
+ { alwaysLoad: true },
167
+ ));
168
+ }
169
+ if (toolDefs.length === 0) return null;
170
+
171
+ return createSdkMcpServer({
172
+ name: MCP_SERVER_NAME,
173
+ version: '1.0.0',
174
+ tools: toolDefs,
175
+ });
176
+ }
177
+
178
+ // ---------------------------------------------------------------------------
179
+ // Tool-use extraction (SDK assistant message → OpenAI tool_calls)
180
+ // ---------------------------------------------------------------------------
181
+
182
+ /**
183
+ * Walk an SDKAssistantMessage's content array for native `tool_use` blocks.
184
+ * Returns an array of `{ id, name, arguments }` formatted for OpenAI
185
+ * tool_calls — name has the MCP prefix stripped, arguments is a JSON string.
186
+ *
187
+ * Returns `[]` when the message has no tool_use blocks (most assistant
188
+ * messages don't — they're just text deltas).
189
+ */
190
+ export function extractToolUses(assistantMessage) {
191
+ const content = assistantMessage?.message?.content;
192
+ if (!Array.isArray(content)) return [];
193
+ const calls = [];
194
+ for (const block of content) {
195
+ if (block?.type !== 'tool_use' || !block.id || !block.name) continue;
196
+ // Strip the MCP prefix so the client sees its original tool name.
197
+ const name = block.name.startsWith(MCP_TOOL_PREFIX)
198
+ ? block.name.slice(MCP_TOOL_PREFIX.length)
199
+ : block.name;
200
+ let argsString = '{}';
201
+ try { argsString = JSON.stringify(block.input ?? {}); } catch {}
202
+ calls.push({ id: block.id, name, arguments: argsString });
203
+ }
204
+ return calls;
205
+ }
206
+
207
+ /**
208
+ * Quick presence check used by the stream loop to decide whether to abort
209
+ * early. Returns true the moment any tool_use block appears.
210
+ */
211
+ export function hasToolUse(assistantMessage) {
212
+ const content = assistantMessage?.message?.content;
213
+ if (!Array.isArray(content)) return false;
214
+ return content.some((b) => b?.type === 'tool_use');
215
+ }
216
+
217
+ // ---------------------------------------------------------------------------
218
+ // Tool results (OpenAI tool messages → <tool_results> text for the resumed prompt)
219
+ // ---------------------------------------------------------------------------
220
+
221
+ /**
222
+ * Format OpenAI role:'tool' messages as a single user-readable text
223
+ * block to splice into a resumed prompt.
224
+ *
225
+ * NOTE: Phase 1 deliberately does *not* round-trip tool results as
226
+ * native Anthropic `tool_result` content blocks. Why: when we abort
227
+ * the SDK on a tool_use, the assistant turn isn't persisted in the
228
+ * SDK's session state (we observed `msgs=1` on resume after a tool
229
+ * call, meaning the partial turn was dropped). On resume, sending a
230
+ * native tool_result block then has nothing to bind to — the model
231
+ * sees an orphan tool_result and re-calls the tool.
232
+ *
233
+ * Phase 2's full Anthropic Messages wire format will keep the SDK
234
+ * alive long enough to persist the turn properly. Until then, text-
235
+ * form tool results (which the model handles fine — it has the
236
+ * preceding tool_use in resume context) are the pragmatic answer.
237
+ *
238
+ * Returns a single string suitable for prepending to (or replacing)
239
+ * the user's prompt text on a resumed turn. Returns '' when there
240
+ * are no tool messages.
241
+ */
242
+ export function toolMessagesToText(toolMessages) {
243
+ const lines = [];
244
+ for (const msg of Array.isArray(toolMessages) ? toolMessages : []) {
245
+ if (msg?.role !== 'tool') continue;
246
+ const id = msg.tool_call_id || 'unknown';
247
+ const name = msg.name || '';
248
+ const content = typeof msg.content === 'string'
249
+ ? msg.content
250
+ : Array.isArray(msg.content)
251
+ ? msg.content.map((c) => (typeof c === 'string' ? c : c?.text || '')).join('')
252
+ : (msg.content == null ? '' : String(msg.content));
253
+ lines.push(`<tool_result id="${id}"${name ? ` name="${name}"` : ''}>\n${content}\n</tool_result>`);
254
+ }
255
+ if (lines.length === 0) return '';
256
+ return `<tool_results>\n${lines.join('\n')}\n</tool_results>`;
257
+ }
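Reviewer sketch: `server.js` is not part of this diff, so the step that turns `extractToolUses` output into an OpenAI `tool_calls` array is shown here only as an assumption. `extractToolUses` is a condensed restatement of the function above; `toOpenAiToolCalls`, the sample message, and the `toolu_example` ID are hypothetical.

```javascript
const MCP_TOOL_PREFIX = 'mcp__mobygate__';

// Condensed restatement of extractToolUses from the diff: keep tool_use blocks,
// strip the MCP prefix, and serialise the input as a JSON string.
function extractToolUses(assistantMessage) {
  const content = assistantMessage?.message?.content;
  if (!Array.isArray(content)) return [];
  return content
    .filter((b) => b?.type === 'tool_use' && b.id && b.name)
    .map((b) => ({
      id: b.id,
      name: b.name.startsWith(MCP_TOOL_PREFIX)
        ? b.name.slice(MCP_TOOL_PREFIX.length)
        : b.name,
      arguments: JSON.stringify(b.input ?? {}),
    }));
}

// Hypothetical wrapper: OpenAI chat completions nest name and arguments under
// `function` and tag each entry with type 'function'.
function toOpenAiToolCalls(calls) {
  return calls.map((c) => ({
    id: c.id,
    type: 'function',
    function: { name: c.name, arguments: c.arguments },
  }));
}

// Illustrative assistant message with one text block and one native tool_use block.
const sampleMessage = {
  message: {
    content: [
      { type: 'text', text: 'Checking the weather.' },
      {
        type: 'tool_use',
        id: 'toolu_example',
        name: 'mcp__mobygate__getWeather',
        input: { city: 'Oslo' },
      },
    ],
  },
};

const toolCalls = toOpenAiToolCalls(extractToolUses(sampleMessage));
```

The client sees `getWeather` with a `toolu_*` ID and a JSON-string `arguments` field, matching the surface described in the 0.6.0 changelog entry.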