npm - mobygate - Versions diffs - 0.7.0 → 0.7.2 - Mend

mobygate 0.7.0 → 0.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/CHANGELOG.md CHANGED

@@ -4,6 +4,80 @@ All notable changes to mobygate are documented here. Format loosely follows
 [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); version numbers are
 [Semantic Versioning](https://semver.org/).
+## [0.7.2] — 2026-04-25
+### Fixed
+- **"I can't use the tool 'grep' here because it isn't available" refusals**
+  in long-running tasks. Even with `allowedTools: ['mcp__mobygate__*']`
+  blocking everything except client-defined tools, the model retains
+  strong priors from training for Claude Code's built-ins (Bash, Grep,
+  Read, Edit, Glob, WebFetch, ToolSearch, etc.). When a task seemed to
+  call for one — e.g., "find all TODOs" → instinctive reach for Grep —
+  the model would attempt it, get blocked, refuse the task, and stop
+  instead of falling back to the available client tools (`searchFiles`,
+  `terminal`, etc.).
+  **Fix:** for any tool-enabled request, append a short system-prompt
+  block (~150 tokens) via the SDK's
+  `systemPrompt: { type: 'preset', preset: 'claude_code', append: ... }`
+  option. The append explicitly lists the available client tools and
+  states that Claude Code's built-ins are NOT in this environment.
+  Calibrated to be matter-of-fact ("here's the environment, work
+  within it") rather than over-restrictive — the model now uses
+  available tools or briefly says what's missing, instead of refusing
+  silently.
+  Applies to both `/v1/chat/completions` and `/v1/messages`.
+### Notes
+- New helper: `buildToolUsageGuidance(tools)` in `lib/tool-bridge.js`
+  produces the append text from the OpenAI-shape tool array. The
+  Anthropic surface already translates its tool defs to OpenAI shape for
+  the bridge, so the helper takes a single input shape for both surfaces.
+- Per-request token overhead: ~150 tokens, only when `tools` is non-empty.
+  No effect on text-only chat or non-tool requests.
+## [0.7.1] — 2026-04-24
+Fixes a meaningful token-burn issue for clients that don't pass session
+keys.
+### Added
+- **Auto-derived session keys.** When a request arrives without an
+  `X-Session-Id` header (and without `body.session_id`), mobygate now
+  hashes a stable signature of the conversation — model + system
+  prompt + first user message — and uses that as the session key.
+  Subsequent turns of the same conversation hit the same auto-key,
+  the SDK resume kicks in, and the client only pays input-token cost
+  for the *new* tail of each turn instead of resending 200 messages
+  of history every time.
+  Surfaced in logs as `session=auto_<hash> (auto)` so you can tell
+  client-keyed sessions from server-derived ones at a glance. New
+  module: `lib/session-derive.js`.
+  In production we observed an OpenClaw client repeatedly sending
+  175–211-message conversation histories without a session key,
+  burning through Max usage in minutes. With this change, the same
+  workload re-uses the SDK session and only the new turn gets billed.
+- **Per-request opt-out:** `X-Session-Id: none` (literal string) tells
+  mobygate to skip auto-derive and run the request fully stateless.
+### Notes
+- Applies to both `/v1/chat/completions` (OpenAI) and `/v1/messages`
+  (Anthropic) surfaces.
+- Auto-keys obey the same 60-minute idle TTL as explicit ones, so
+  stale auto-sessions clean themselves up.
+- Two unrelated users starting with identical model + system + first
+  message would share an auto-session — fine for single-user dev
+  setups, but multi-tenant deployments should pass `X-Session-Id`
+  explicitly to scope per-user.
 ## [0.7.0] — 2026-04-24
 Phase 2: native Anthropic Messages surface.
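The 0.7.1 notes recommend that multi-tenant deployments pass `X-Session-Id` explicitly, and that the literal value `none` opts a single request out of auto-derivation. A minimal client-side sketch of that header logic follows. The helper name `sessionHeaders`, the `userId:conversationId` key format, and the `stateless` flag are illustrative assumptions, not part of mobygate; the only behavior taken from the diff is that an explicit key is used as-is (after trimming), `none` means stateless, and an absent header triggers auto-derive.

```javascript
// Hypothetical client-side helper (not part of mobygate): builds the
// headers a multi-tenant client might send with each request.
function sessionHeaders({ userId, conversationId, stateless = false }) {
  const headers = { 'Content-Type': 'application/json' };
  if (stateless) {
    // Literal "none" tells mobygate to skip auto-derive for this request.
    headers['X-Session-Id'] = 'none';
  } else if (userId && conversationId) {
    // Scope the key per user so two users who open with the same
    // model + system + first message never share an SDK session.
    headers['X-Session-Id'] = `${userId}:${conversationId}`;
  }
  // With neither, the header is omitted and mobygate auto-derives a key.
  return headers;
}
```

For example, `sessionHeaders({ userId: 'alice', conversationId: 'c1' })` yields `X-Session-Id: alice:c1`, while a user named `bob` on the same conversation id gets a distinct key even if both send identical opening messages.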

package/lib/session-derive.js ADDED

@@ -0,0 +1,164 @@
+/**
+ * Auto-derive session keys for clients that don't send `X-Session-Id`.
+ *
+ * Why this exists: OpenAI's wire format is stateless by design — clients
+ * are expected to send the entire conversation history with every turn,
+ * and many clients (OpenClaw at the time of writing, plenty of others)
+ * don't bother passing a session identifier. Without one, mobygate
+ * treats every request as a fresh SDK session and the client ends up
+ * paying input-token cost for the full history on every single turn.
+ * On long conversations (175+ messages observed in production), this
+ * burns through Claude Max usage budgets in minutes.
+ *
+ * The fix: when a request arrives without an explicit session key, we
+ * compute one ourselves from a *stable signature* of the conversation —
+ * model + system prompt + first user message. The same conversation
+ * thread produces the same auto-key turn after turn, so the SDK resume
+ * machinery kicks in and only the new tail of each turn gets billed.
+ * Different conversations naturally produce different signatures and
+ * stay isolated. The existing 60-minute idle TTL keeps stale auto-keys
+ * from lingering forever.
+ *
+ * What's hashed (and why each piece):
+ *   - **model** — different agent configs shouldn't share a session.
+ *   - **system** (string or content blocks, plus any system-role
+ *     messages) — typically stable for the lifetime of a conversation
+ *     thread, distinguishes one agent's persona from another's.
+ *   - **first user message text** — anchors the thread. Stable until
+ *     the client prunes it from history; if/when that happens, a new
+ *     auto-key forms and we lose continuity for that one transition.
+ *     Graceful degradation, not a crash.
+ *
+ * Limitations to be aware of:
+ *   - **Collisions across users:** if two unrelated users happen to
+ *     start with the same model + system + first message ("hello"),
+ *     they'd share a session. In single-user dev contexts (Hermes,
+ *     OpenClaw on a personal machine) this is fine. For multi-tenant
+ *     deployments, clients should pass `X-Session-Id` explicitly to
+ *     scope per-user.
+ *   - **History pruning shifts the key:** if the client drops the first
+ *     user message from history mid-conversation, the auto-key changes
+ *     and the SDK starts a new session. One turn of double-billing,
+ *     then we're back on the new key. Acceptable.
+ *
+ * Opt-out: `X-Session-Id: none` tells us the client explicitly wants
+ * stateless behavior — we return null and the request flows through
+ * as a fresh SDK call. (An *empty* X-Session-Id is indistinguishable
+ * from "header not set" at the Express layer, so we treat it as
+ * "no explicit key, please auto-derive" rather than as opt-out.)
+ */
+import { createHash } from 'crypto';
+const HASH_LEN = 16;
+const SYSTEM_TRIM = 500;
+const USER_TRIM = 500;
+/**
+ * Extract a flat text representation of a content field that might be
+ * either a string or an array of OpenAI/Anthropic content parts. We
+ * only pull the text — images/tool blocks/etc. are ignored for hashing
+ * because they vary in serialization but don't change conversation
+ * identity.
+ */
+function flattenContent(content) {
+  if (typeof content === 'string') return content;
+  if (!Array.isArray(content)) return '';
+  const out = [];
+  for (const part of content) {
+    if (typeof part === 'string') out.push(part);
+    else if (part?.type === 'text' && part.text) out.push(part.text);
+    // image_url / image / tool_use / tool_result intentionally skipped
+  }
+  return out.join(' ');
+}
+/**
+ * Pull the system text out of a request body. The Anthropic surface
+ * carries it on `body.system` (string OR content blocks), the OpenAI
+ * surface carries it as messages with `role: 'system'`. Combine both.
+ */
+function extractSystemText(body) {
+  let parts = [];
+  if (typeof body?.system === 'string') {
+    parts.push(body.system);
+  } else if (Array.isArray(body?.system)) {
+    parts.push(flattenContent(body.system));
+  }
+  for (const msg of body?.messages || []) {
+    if (msg?.role === 'system') {
+      parts.push(flattenContent(msg.content));
+    }
+  }
+  return parts.join('\n').slice(0, SYSTEM_TRIM);
+}
+/**
+ * First user-role message in the array, flattened to text. We use the
+ * first (oldest) one because it's the most stable anchor — later turns
+ * change every request.
+ */
+function extractFirstUserText(body) {
+  for (const msg of body?.messages || []) {
+    if (msg?.role === 'user') {
+      const text = flattenContent(msg.content);
+      if (text) return text.slice(0, USER_TRIM);
+    }
+  }
+  return '';
+}
+/**
+ * Compute a stable session key from a request body. Returns a string
+ * like `auto_<16hex>` when there's enough signal to hash, or `null`
+ * when the body is too sparse (no model, no system, no user text — the
+ * caller should fall through to stateless behavior in that case).
+ *
+ * The hash uses SHA-256 truncated to 16 hex chars (~64 bits of
+ * collision space), a few orders of magnitude more than the
+ * "same conversation prefix" matching use case needs.
+ */
+export function deriveSessionKey(body) {
+  const model = body?.model || '';
+  const system = extractSystemText(body);
+  const firstUser = extractFirstUserText(body);
+  // Need at least *something* to anchor on. If the request has no
+  // model and no user message, there's literally nothing to identify
+  // the conversation with — better to return null and let the caller
+  // run stateless than to bucket everything into the same auto-key.
+  if (!model && !system && !firstUser) return null;
+  if (!firstUser) return null; // first user msg is the anchor; no anchor → no auto-key
+  const signature = [model, system, firstUser].join('||');
+  const digest = createHash('sha256').update(signature).digest('hex').slice(0, HASH_LEN);
+  return `auto_${digest}`;
+}
+/**
+ * Resolve the effective session key for a request. Order:
+ *   1. Explicit `X-Session-Id` header (or `body.session_id`) wins.
+ *      Special value `'none'` means "explicitly stateless" and
+ *      short-circuits to null without auto-deriving.
+ *   2. Auto-derived key from the conversation signature.
+ *   3. null (stateless) — only when there's nothing useful to hash.
+ *
+ * Returns `{ key, source }` where source is `'explicit' | 'auto' | 'none'`.
+ * The source label is informational — server.js logs it and the dashboard
+ * shows it so you can tell at a glance whether a session was client-keyed
+ * or server-derived.
+ */
+export function resolveSessionKey({ headerKey, bodyKey, body }) {
+  const explicit = headerKey || bodyKey;
+  if (explicit) {
+    const trimmed = String(explicit).trim();
+    if (trimmed.toLowerCase() === 'none') {
+      return { key: null, source: 'none' };
+    }
+    if (trimmed) return { key: trimmed, source: 'explicit' };
+  }
+  const derived = deriveSessionKey(body);
+  if (derived) return { key: derived, source: 'auto' };
+  return { key: null, source: 'none' };
+}
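To see why an auto-key survives across turns, the sketch below rebuilds only the pre-hash signature (model, system text, first user text) for an OpenAI-shape body with plain string content. Equal signatures always produce equal SHA-256 digests, so comparing signatures is enough to predict whether two requests land on the same `auto_<hash>` key. `signatureOf` is a hypothetical simplification of the module's private helpers, not an export, and the model name is a placeholder.

```javascript
// Hypothetical, simplified restatement of deriveSessionKey's pre-hash
// signature. Handles string content only; the real module also flattens
// content-part arrays and reads Anthropic's top-level `system` field.
function signatureOf(body) {
  const msgs = body.messages || [];
  const system = msgs
    .filter((m) => m.role === 'system')
    .map((m) => m.content)
    .join('\n')
    .slice(0, 500); // SYSTEM_TRIM
  const firstUser = msgs.find((m) => m.role === 'user' && m.content);
  if (!firstUser) return null; // no anchor, so no auto-key
  return [body.model || '', system, firstUser.content.slice(0, 500)].join('||');
}

const sys = { role: 'system', content: 'You are a coding agent.' };
const turn1 = {
  model: 'example-model',
  messages: [sys, { role: 'user', content: 'find all TODOs' }],
};
// A later turn resends the full history plus a new tail.
const turn3 = {
  model: 'example-model',
  messages: [
    ...turn1.messages,
    { role: 'assistant', content: 'Found 12.' },
    { role: 'user', content: 'now fix them' },
  ],
};
// The client has pruned the oldest user message from history.
const pruned = {
  model: 'example-model',
  messages: [sys, { role: 'assistant', content: 'Found 12.' }, { role: 'user', content: 'now fix them' }],
};
```

`turn1` and `turn3` share a signature because only the tail changed, while `pruned` anchors on a different first user message. That is the one-time key shift described under "History pruning shifts the key" in the module header.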

package/lib/tool-bridge.js CHANGED

@@ -218,6 +218,50 @@ export function hasToolUse(assistantMessage) {
 // Tool results (OpenAI tool messages → Anthropic tool_result content blocks)
 // ---------------------------------------------------------------------------
+// ---------------------------------------------------------------------------
+// Strict-tool guidance (system-prompt append for tool-enabled requests)
+// ---------------------------------------------------------------------------
+// Even with native MCP registration + a tight `allowedTools` allowlist, the
+// model retains strong priors for Claude Code's built-in tools (Bash, Read,
+// Edit, Grep, Glob, WebFetch, ToolSearch, etc.) from training. When a task
+// seems to need one of those, the model reaches for it, gets blocked by
+// `allowedTools`, says "I can't use the tool 'grep' here because it isn't
+// available," and gives up — instead of falling back to the available
+// client-defined tools. We saw this in production OpenClaw use.
+//
+// The fix: append a short, explicit guidance block to Claude Code's system
+// prompt (via `systemPrompt: { type: 'preset', preset: 'claude_code', append: ... }`)
+// telling the model exactly which tools are available and that built-ins
+// are NOT in this environment. The positive list reinforces what the model
+// already sees via MCP registration; the negative list shuts down the
+// trained-in instinct to reach for built-ins.
+//
+// Calibration matters: too directive and the model becomes over-conservative
+// and refuses legitimate work. We aim for matter-of-fact "here's the
+// environment, work within it" rather than threatening prohibition.
+const KNOWN_BUILTINS = 'Bash, Read, Edit, Write, Grep, Glob, NotebookEdit, WebFetch, WebSearch, Task, ToolSearch';
+export function buildToolUsageGuidance(openaiTools) {
+  if (!Array.isArray(openaiTools) || openaiTools.length === 0) return null;
+  const names = [];
+  for (const t of openaiTools) {
+    if (t?.type !== 'function' || !t.function?.name) continue;
+    names.push(t.function.name);
+  }
+  if (names.length === 0) return null;
+  return [
+    'Tool environment: this session is running through a proxy that exposes only the client-defined tools listed below. Claude Code\'s default built-in tools',
+    `(${KNOWN_BUILTINS}, etc.) are NOT available in this environment and cannot be invoked — calls to them will fail.`,
+    '',
+    'Available tools:',
+    ...names.map((n) => `  - ${n}`),
+    '',
+    'If a task seems to require a built-in tool that isn\'t in this list, accomplish what you can with the available tools and briefly note what\'s missing — do not refuse silently or claim you have no tools.',
+  ].join('\n');
+}
 /**
  * Format OpenAI role:'tool' messages as a single user-readable text
  * block to splice into a resumed prompt.
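The 0.7.2 notes say the Anthropic surface converts its tool definitions to OpenAI shape before calling `buildToolUsageGuidance`, whose loop keeps only entries with `type: 'function'` and a `function.name`. The conversion itself is elided from this diff (only its closing `}))` is visible in server.js), so the mapping below is an assumption based on the two public formats: Anthropic's `name`/`description`/`input_schema` mapped onto OpenAI's `function.name`/`description`/`parameters`. `toOpenAITool` and `guidanceToolNames` are hypothetical names.

```javascript
// Hypothetical conversion; the real server.js code for this step is not
// shown in the diff.
function toOpenAITool(anthropicTool) {
  return {
    type: 'function',
    function: {
      name: anthropicTool.name,
      description: anthropicTool.description || '',
      parameters: anthropicTool.input_schema || { type: 'object', properties: {} },
    },
  };
}

// Mirrors the name-collection loop in buildToolUsageGuidance: anything
// that is not a named function tool is skipped.
function guidanceToolNames(openaiTools) {
  const names = [];
  for (const t of openaiTools) {
    if (t?.type !== 'function' || !t.function?.name) continue;
    names.push(t.function.name);
  }
  return names;
}

const anthropicTools = [
  {
    name: 'searchFiles',
    description: 'Search the workspace',
    input_schema: { type: 'object', properties: { query: { type: 'string' } } },
  },
  { name: 'terminal', input_schema: { type: 'object', properties: { cmd: { type: 'string' } } } },
];
const bridged = anthropicTools.map(toOpenAITool);
```

Here `guidanceToolNames(bridged)` returns `['searchFiles', 'terminal']`, which become the `  - searchFiles` and `  - terminal` lines under "Available tools:" in the appended guidance.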

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "mobygate",
-  "version": "0.7.0",
+  "version": "0.7.2",
   "description": "OpenAI-compatible local proxy for Claude Max. The Möbius-strip gateway: OpenAI shape in, Claude Max out.",
   "type": "module",
   "main": "server.js",

package/server.js CHANGED

@@ -55,6 +55,7 @@ import { loadSessions, saveSessions, flushSessionsNow } from './lib/session-stor
 import { LOGS_DIR } from './lib/config.js';
 import {
   buildClientToolsServer,
+  buildToolUsageGuidance,
   extractToolUses,
   hasToolUse,
   toolMessagesToText,
@@ -76,6 +77,7 @@ import {
   hasAnthropicTools,
   mapStopReason,
 } from './lib/anthropic.js';
+import { resolveSessionKey } from './lib/session-derive.js';
 const __filename = fileURLToPath(import.meta.url);
 const __dirname = dirname(__filename);
@@ -401,6 +403,12 @@ async function handleStreaming(req, res, body, requestId, sessionKey) {
   // Build the in-process MCP server exposing client tools to the SDK.
   // null when toolsEnabled is false (or all tools are malformed).
   const clientToolsServer = toolsEnabled ? buildClientToolsServer(body.tools) : null;
+  // System-prompt append: tells the model exactly which tools are
+  // available and that Claude Code's built-ins (Bash, Grep, Read, etc.)
+  // are NOT in this environment. Without this, the model's trained-in
+  // priors lead it to call Grep/Bash, get blocked by allowedTools, and
+  // refuse the task instead of falling back to client tools. ~150 tokens.
+  const toolsGuidance = clientToolsServer ? buildToolUsageGuidance(body.tools) : null;
   if (images.length) console.log(`  [multimodal] ${images.length} image block(s)`);
   if (toolsEnabled) console.log(`  [tools] ${body.tools.length} client tool(s) registered as MCP`);
@@ -457,6 +465,7 @@ async function handleStreaming(req, res, body, requestId, sessionKey) {
           ? {
               mcpServers: { [MCP_SERVER_NAME]: clientToolsServer },
               allowedTools: [`${MCP_TOOL_PREFIX}*`],
+              systemPrompt: { type: 'preset', preset: 'claude_code', append: toolsGuidance },
             }
           : toolsEnabled
             // Tools were requested but none were valid — disable all tools.
@@ -619,6 +628,7 @@ async function handleNonStreaming(res, body, requestId, sessionKey) {
   const prompt = buildQueryPrompt(promptText, images);
   const model = resolveModel(body.model);
   const clientToolsServer = toolsEnabled ? buildClientToolsServer(body.tools) : null;
+  const toolsGuidance = clientToolsServer ? buildToolUsageGuidance(body.tools) : null;
   if (images.length) console.log(`  [multimodal] ${images.length} image block(s)`);
   if (toolsEnabled) console.log(`  [tools] ${body.tools.length} client tool(s) registered as MCP`);
@@ -655,6 +665,7 @@ async function handleNonStreaming(res, body, requestId, sessionKey) {
           ? {
               mcpServers: { [MCP_SERVER_NAME]: clientToolsServer },
               allowedTools: [`${MCP_TOOL_PREFIX}*`],
+              systemPrompt: { type: 'preset', preset: 'claude_code', append: toolsGuidance },
             }
           : toolsEnabled
             ? { allowedTools: [] }
@@ -805,6 +816,7 @@ async function handleAnthropicNonStreaming(res, body, requestId, sessionKey) {
       }))
     : null;
   const clientToolsServer = toolsForBridge ? buildClientToolsServer(toolsForBridge) : null;
+  const toolsGuidance = clientToolsServer ? buildToolUsageGuidance(toolsForBridge) : null;
   if (images.length) console.log(`  [multimodal] ${images.length} image block(s)`);
   if (toolsEnabled) console.log(`  [tools] ${body.tools.length} client tool(s) registered as MCP`);
@@ -843,6 +855,7 @@ async function handleAnthropicNonStreaming(res, body, requestId, sessionKey) {
           ? {
               mcpServers: { [MCP_SERVER_NAME]: clientToolsServer },
               allowedTools: [`${MCP_TOOL_PREFIX}*`],
+              systemPrompt: { type: 'preset', preset: 'claude_code', append: toolsGuidance },
             }
           : toolsEnabled
             ? { allowedTools: [] }
@@ -947,6 +960,7 @@ async function handleAnthropicStreaming(req, res, body, requestId, sessionKey) {
       }))
     : null;
   const clientToolsServer = toolsForBridge ? buildClientToolsServer(toolsForBridge) : null;
+  const toolsGuidance = clientToolsServer ? buildToolUsageGuidance(toolsForBridge) : null;
   if (images.length) console.log(`  [multimodal] ${images.length} image block(s)`);
   if (toolsEnabled) console.log(`  [tools] ${body.tools.length} client tool(s) registered as MCP`);
@@ -1003,6 +1017,7 @@ async function handleAnthropicStreaming(req, res, body, requestId, sessionKey) {
           ? {
               mcpServers: { [MCP_SERVER_NAME]: clientToolsServer },
               allowedTools: [`${MCP_TOOL_PREFIX}*`],
+              systemPrompt: { type: 'preset', preset: 'claude_code', append: toolsGuidance },
             }
           : toolsEnabled
             ? { allowedTools: [] }
@@ -1193,10 +1208,20 @@ app.post('/v1/chat/completions', async (req, res) => {
     });
   }
-  // Session key: X-Session-Id header > body.session_id > null (stateless)
-  const sessionKey = req.headers['x-session-id'] || body.session_id || null;
+  // Session key resolution: X-Session-Id header > body.session_id >
+  // auto-derived from conversation signature > null (stateless).
+  // Auto-derive protects clients that don't pass a session header from
+  // re-paying input-token cost on every turn of a long conversation —
+  // see lib/session-derive.js for the rationale and trade-offs.
+  const { key: sessionKey, source: sessionKeySource } = resolveSessionKey({
+    headerKey: req.headers['x-session-id'],
+    bodyKey: body.session_id,
+    body,
+  });
   const existing = getSession(sessionKey);
-  const sessionTag = sessionKey ? ` | session=${sessionKey}${existing ? ' (resume)' : ' (new)'}` : '';
+  const sessionTag = sessionKey
+    ? ` | session=${sessionKey}${sessionKeySource === 'auto' ? ' (auto)' : ''}${existing ? ' (resume)' : ' (new)'}`
+    : '';
   console.log(`[${new Date().toISOString()}] ${body.stream ? 'stream' : 'sync'} | model=${body.model} → ${resolveModel(body.model)} | msgs=${body.messages.length}${sessionTag}`);
@@ -1260,9 +1285,15 @@ app.post('/v1/messages', async (req, res) => {
     });
   }
-  const sessionKey = req.headers['x-session-id'] || body.session_id || null;
+  const { key: sessionKey, source: sessionKeySource } = resolveSessionKey({
+    headerKey: req.headers['x-session-id'],
+    bodyKey: body.session_id,
+    body,
+  });
   const existing = getSession(sessionKey);
-  const sessionTag = sessionKey ? ` | session=${sessionKey}${existing ? ' (resume)' : ' (new)'}` : '';
+  const sessionTag = sessionKey
+    ? ` | session=${sessionKey}${sessionKeySource === 'auto' ? ' (auto)' : ''}${existing ? ' (resume)' : ' (new)'}`
+    : '';
   console.log(`[${new Date().toISOString()}] anthropic ${body.stream ? 'stream' : 'sync'} | model=${body.model} → ${resolveModel(body.model)} | msgs=${body.messages.length}${sessionTag}`);
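Both routes pass the resolved key to `getSession`, and the 0.7.1 notes state that auto-keys expire under the same 60-minute idle TTL as explicit ones. The session store is not part of this diff, so what follows is only a sketch of idle-TTL semantics under assumed names (`IdleSessionStore`, `get`, `set`) with an injectable clock for testing. It is not mobygate's session-store implementation, and treating a read as activity is an assumption.

```javascript
const IDLE_TTL_MS = 60 * 60 * 1000; // 60-minute idle TTL from the changelog

// Hypothetical idle-TTL store: an entry expires once it has gone unused for
// longer than IDLE_TTL_MS, whether its key was explicit or auto-derived.
class IdleSessionStore {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map(); // key -> { value, lastUsed }
  }

  get(key) {
    if (!key) return null; // stateless request (null key) never resumes
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (this.now() - entry.lastUsed > IDLE_TTL_MS) {
      this.entries.delete(key); // idle too long: drop it and start fresh
      return null;
    }
    entry.lastUsed = this.now(); // assumption: a read counts as activity
    return entry.value;
  }

  set(key, value) {
    if (key) this.entries.set(key, { value, lastUsed: this.now() });
  }
}
```

Because expiry is measured from last use rather than creation, a conversation that keeps sending turns less than an hour apart keeps its session indefinitely, while an abandoned auto-key is dropped on the first lookup after an hour of silence.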