mobygate 0.6.2 → 0.7.1

package/CHANGELOG.md CHANGED
@@ -4,6 +4,111 @@ All notable changes to mobygate are documented here. Format loosely follows
  [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); version numbers are
  [Semantic Versioning](https://semver.org/).
 
+ ## [0.7.1] — 2026-04-24
+
+ Fixes a meaningful token-burn issue for clients that don't pass session
+ keys.
+
+ ### Added
+
+ - **Auto-derived session keys.** When a request arrives without an
+   `X-Session-Id` header (and without `body.session_id`), mobygate now
+   hashes a stable signature of the conversation — model + system
+   prompt + first user message — and uses that as the session key.
+   Subsequent turns of the same conversation hit the same auto-key,
+   the SDK resume kicks in, and the client only pays input-token cost
+   for the *new* tail of each turn instead of resending 200 messages
+   of history every time.
+
+   Surfaced in logs as `session=auto_<hash> (auto)` so you can tell
+   client-keyed sessions from server-derived ones at a glance. New
+   module: `lib/session-derive.js`.
+
+   In production we observed an OpenClaw client repeatedly sending
+   175–211-message conversation histories without a session key,
+   burning through Max usage in minutes. With this change, the same
+   workload re-uses the SDK session and only the new turn gets billed.
+
+ - **Per-request opt-out:** `X-Session-Id: none` (literal string) tells
+   mobygate to skip auto-derive and run the request fully stateless.
+
+ ### Notes
+
+ - Applies to both `/v1/chat/completions` (OpenAI) and `/v1/messages`
+   (Anthropic) surfaces.
+ - Auto-keys obey the same 60-minute idle TTL as explicit ones, so
+   stale auto-sessions clean themselves up.
+ - Two unrelated users starting with identical model + system + first
+   message would share an auto-session — fine for single-user dev
+   setups, but multi-tenant deployments should pass `X-Session-Id`
+   explicitly to scope per-user.
+
+ ## [0.7.0] — 2026-04-24
+
+ Phase 2: native Anthropic Messages surface.
+
+ Mobygate is now a **dual-surface gateway** — the existing OpenAI-compat
+ endpoint at `/v1/chat/completions` keeps working unchanged for Hermes
+ and other OpenAI-shaped clients, and a new `POST /v1/messages` endpoint
+ speaks native Anthropic Messages wire format for OpenClaw and any other
+ Anthropic-shaped client. Both surfaces translate to the same underlying
+ `query()` call on the Claude Agent SDK.
+
+ ### Added
+
+ - **`POST /v1/messages`** — non-streaming + streaming. Accepts the
+   Anthropic Messages request shape (model, messages, system, tools,
+   max_tokens, stream, etc.) with native content blocks (`text`, `image`,
+   `tool_use`, `tool_result`) and returns native Anthropic responses.
+ - **Native Anthropic SSE streaming** — emits `message_start` →
+   `content_block_start/delta/stop` (per block, with sequential index) →
+   `message_delta` (stop_reason, usage) → `message_stop`. Tool calls
+   stream as `content_block_start` with `content_block: {type: 'tool_use'}`
+   followed by `content_block_delta` with `delta: {type: 'input_json_delta'}`.
+ - **Image passthrough** on `/v1/messages` — base64 data URLs and HTTP
+   URLs both flow through to the SDK as Anthropic image content blocks.
+ - New module: `lib/anthropic.js` — request translator, response builder,
+   streaming SSE translator, stop-reason mapper.
+
+ ### Changed
+
+ - **Tool calling on `/v1/messages` reuses Phase 1's native MCP path**
+   (from `lib/tool-bridge.js`). No prompt-injected `<tool_call>` text
+   protocol on the new surface — the model emits genuine `tool_use`
+   content blocks via SDK MCP registration, and we surface them
+   structurally. (Earlier WIP work attempted to revert the Phase 1
+   tool architecture for this surface; that's been undone in favor of
+   reusing the proven path that ships in Hermes today.)
+
+ ### Known limitation (carried from Phase 1)
+
+ - Inbound `tool_result` blocks on a resumed turn are still spliced as
+   `<tool_results>` text into the next prompt, rather than passed
+   through as native Anthropic `tool_result` content blocks. Reason:
+   aborting the SDK on a `tool_use` prevents the assistant turn from
+   being persisted in session state — on resume, native tool_result
+   blocks have nothing to bind to and the model re-calls the tool.
+   Text-form works because the resumed model has the prior turn in
+   conversational context. A future refactor will keep the SDK
+   iterator alive across HTTP request boundaries to lift this.
+
+ ### Not in scope (deferred to a later release)
+
+ - Streaming retrofit on the `/v1/chat/completions` endpoint (currently
+   buffers tool-mode responses). Mentioned as a Phase 2 candidate; held
+   for a focused pass.
+ - `cache_control` passthrough — Anthropic's prompt caching is a billing
+   feature on API keys, not OAuth Max. We don't pass these headers
+   through; nothing to gain on this billing tier.
+
+ ### OpenClaw migration
+
+ After this release, register a second provider entry pointing at the
+ new endpoint (`api: "anthropic-messages"`, `baseUrl: "http://localhost:3456"`,
+ endpoint resolved as `:baseUrl/v1/messages`). The existing
+ `claude-max-proxy/*` provider stays registered for clients that want
+ the OpenAI-compat surface (Hermes).
+
  ## [0.6.2] — 2026-04-24
 
  ### Fixed
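Why a missing session key was so expensive is easiest to see with a little arithmetic. The sketch below is illustrative only: it assumes every turn appends exactly one message of uniform size and ignores assistant replies and real token counts, and `messagesSent` is a hypothetical helper, not part of mobygate. Stateless requests resend the whole history each turn, so total input grows quadratically with conversation length, while a resumed SDK session only receives the new tail.

```javascript
// Count message-units of input sent over a conversation of `turns` turns.
// Assumption: each turn adds exactly one message of equal size.
function messagesSent(turns, { resumed }) {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    // Stateless: the client resends the whole history (`turn` messages).
    // Resumed: the SDK already holds history, so only the new message is sent.
    total += resumed ? 1 : turn;
  }
  return total;
}

const turns = 200;
console.log(messagesSent(turns, { resumed: false })); // 20100
console.log(messagesSent(turns, { resumed: true }));  // 200
```

Under these assumptions a 200-turn conversation costs 20,100 message-units stateless versus 200 with resume; that gap is what the auto-derived session keys are meant to close.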
@@ -0,0 +1,379 @@
+ /**
+  * Anthropic Messages translation layer.
+  *
+  * Translates between the native Anthropic Messages wire format
+  * (POST /v1/messages) and the Claude Agent SDK's `query()` shape used
+  * internally by mobygate. The SDK is the single source of truth for
+  * inference; this module just bridges request and response shapes so
+  * Anthropic-shaped clients (OpenClaw, etc.) can use native blocks
+  * (`text` / `image` / `tool_use` / `tool_result`) over the wire.
+  *
+  * Tool calling reuses the Phase 1 native-MCP path from lib/tool-bridge.js
+  * — client-defined tools are registered with the SDK as in-process MCP
+  * tools (Zod schemas converted from JSON Schema), the model emits real
+  * `tool_use` content blocks in its assistant stream, and we surface
+  * those structurally instead of regex-parsing them out of text. NO
+  * `<tool_call>` text protocol on this surface.
+  *
+  * Inbound `tool_result` blocks (when the client returns tool outputs in
+  * a follow-up turn) are still spliced as text on the resumed prompt.
+  * Same Phase 1 limitation: aborting the SDK on a tool_use prevents the
+  * assistant turn from being persisted in session state, so a native
+  * tool_result has nothing to bind to on resume. A future refactor that
+  * keeps the SDK iterator alive across HTTP request boundaries will lift
+  * this; until then, text-form works because the resumed model has the
+  * prior turn in conversational context.
+  */
+
+ import { v4 as uuidv4 } from 'uuid';
+
+ // ---------------------------------------------------------------------------
+ // Content extraction — read individual block types out of an Anthropic message
+ // ---------------------------------------------------------------------------
+
+ export function anthropicTextOf(content) {
+   if (typeof content === 'string') return content;
+   if (!Array.isArray(content)) return '';
+   return content
+     .filter((b) => b?.type === 'text')
+     .map((b) => b.text || '')
+     .join('');
+ }
+
+ export function anthropicImagesOf(content) {
+   if (!Array.isArray(content)) return [];
+   return content
+     .filter((b) => b?.type === 'image' && b.source)
+     .map((b) => ({ type: 'image', source: b.source }));
+ }
+
+ export function anthropicToolResultsOf(content) {
+   if (!Array.isArray(content)) return [];
+   return content.filter((b) => b?.type === 'tool_result');
+ }
+
+ export function buildAnthropicSystemString(system) {
+   if (!system) return '';
+   if (typeof system === 'string') return system;
+   if (Array.isArray(system)) {
+     return system
+       .filter((b) => b?.type === 'text')
+       .map((b) => b.text || '')
+       .join('\n');
+   }
+   return '';
+ }
+
+ export function hasAnthropicTools(body) {
+   return Array.isArray(body?.tools) && body.tools.length > 0;
+ }
+
+ // ---------------------------------------------------------------------------
+ // Tool-result text wrapping (inbound side, Phase 1 limitation persists)
+ // ---------------------------------------------------------------------------
+
+ function stringifyToolResultBody(content) {
+   if (typeof content === 'string') return content;
+   if (Array.isArray(content)) {
+     return content
+       .map((b) => {
+         if (b?.type === 'text') return b.text || '';
+         if (b?.type === 'image') return '[image content omitted in tool_result text replay]';
+         return JSON.stringify(b);
+       })
+       .filter(Boolean)
+       .join('\n');
+   }
+   return content == null ? '' : String(content);
+ }
+
+ function formatToolResultBlock(block) {
+   const id = block.tool_use_id || 'unknown';
+   const body = stringifyToolResultBody(block.content);
+   const errAttr = block.is_error ? ' is_error="true"' : '';
+   return `<tool_result id="${id}"${errAttr}>\n${body}\n</tool_result>`;
+ }
+
+ // ---------------------------------------------------------------------------
+ // Request translation: Anthropic Messages → SDK prompt string
+ // ---------------------------------------------------------------------------
+ // IMPORTANT: this returns just a string. Tool definitions are NOT injected
+ // into the prompt — the caller registers them with the SDK as MCP tools
+ // (see lib/tool-bridge.js #buildClientToolsServer). This is a deliberate
+ // reversal of OpenClaw's earlier WIP, which fell back to the legacy
+ // `<tool_call>` text protocol; the native MCP path proven in Phase 1
+ // works fine and we don't need to maintain two tool implementations.
+
+ export function anthropicMessagesToPrompt(body, { resuming = false } = {}) {
+   const messages = body.messages || [];
+   const system = buildAnthropicSystemString(body.system);
+
+   if (resuming) {
+     // SDK has full history. Send only the new tail: tool_results from
+     // the last user message (if any) plus any fresh user text.
+     const last = messages[messages.length - 1];
+     if (!last || last.role !== 'user') return '';
+     const trBlocks = anthropicToolResultsOf(last.content);
+     const text = anthropicTextOf(last.content);
+     const parts = [];
+     if (trBlocks.length) {
+       parts.push(`<tool_results>\n${trBlocks.map(formatToolResultBlock).join('\n')}\n</tool_results>`);
+     }
+     if (text) parts.push(text);
+     return parts.join('\n\n');
+   }
+
+   // Fresh request: serialize visible history. System prompt at top, then
+   // each turn. Assistant turns replay as best-effort text — tool_use
+   // blocks in the history are dropped (rare in practice; clients almost
+   // always use session keys for multi-turn tool conversations).
+   const parts = [];
+   if (system) parts.push(`<system>\n${system}\n</system>\n`);
+
+   let toolBuffer = [];
+   const flushTools = () => {
+     if (toolBuffer.length) {
+       parts.push(`<tool_results>\n${toolBuffer.join('\n')}\n</tool_results>\n`);
+       toolBuffer = [];
+     }
+   };
+
+   for (const msg of messages) {
+     if (msg.role === 'user') {
+       const trBlocks = anthropicToolResultsOf(msg.content);
+       for (const b of trBlocks) toolBuffer.push(formatToolResultBlock(b));
+       const text = anthropicTextOf(msg.content);
+       if (text) {
+         flushTools();
+         parts.push(text);
+       }
+     } else if (msg.role === 'assistant') {
+       flushTools();
+       const text = anthropicTextOf(msg.content);
+       if (text) parts.push(`<previous_response>\n${text}\n</previous_response>\n`);
+     }
+   }
+   flushTools();
+   return parts.join('\n').trim();
+ }
+
+ /**
+  * Pull image blocks from the latest user message. Anthropic only attaches
+  * images to user turns; we ignore older turns to mirror how the SDK + API
+  * treat current-turn vs historical content.
+  */
+ export function collectAnthropicImages(messages) {
+   for (let i = messages.length - 1; i >= 0; i--) {
+     const msg = messages[i];
+     if (msg.role === 'user') {
+       const imgs = anthropicImagesOf(msg.content);
+       if (imgs.length) return imgs;
+     }
+   }
+   return [];
+ }
+
+ // ---------------------------------------------------------------------------
+ // Stop reason mapping
+ // ---------------------------------------------------------------------------
+
+ export function mapStopReason(sdkResult) {
+   if (!sdkResult) return 'end_turn';
+   const sr = sdkResult.stop_reason;
+   if (sr === 'tool_use') return 'tool_use';
+   if (sr === 'max_tokens' || sr === 'max_output_tokens') return 'max_tokens';
+   if (sr === 'stop_sequence') return 'stop_sequence';
+   if (sdkResult.subtype === 'error_max_turns') return 'max_tokens';
+   return 'end_turn';
+ }
+
+ // ---------------------------------------------------------------------------
+ // Non-streaming response builder
+ // ---------------------------------------------------------------------------
+ // Takes already-collected text + native tool_use blocks (from
+ // extractToolUses in tool-bridge.js) — does NOT parse anything from text.
+ // The handler in server.js does the SDK iteration and hands us the result.
+
+ export function buildAnthropicResponse({
+   rawText = '',
+   toolUses = [],
+   model,
+   usage,
+   requestId,
+   stopReason,
+ }) {
+   const id = `msg_${(requestId || uuidv4().replace(/-/g, '')).slice(0, 24)}`;
+   const content = [];
+   if (rawText) content.push({ type: 'text', text: rawText });
+   for (const tu of toolUses) {
+     // tool_use blocks from extractToolUses() are formatted for OpenAI:
+     // {id, name, arguments: <stringified-json>}. Anthropic wants {id, name, input}
+     // where input is the parsed object. Reverse the stringify.
+     let input = {};
+     try { input = JSON.parse(tu.arguments || '{}'); } catch {}
+     content.push({ type: 'tool_use', id: tu.id, name: tu.name, input });
+   }
+   // Empty content array would be invalid in the Anthropic API. If the
+   // model produced nothing actionable (rare — usually means an SDK error
+   // path), emit a single empty text block so clients don't crash on it.
+   if (content.length === 0) content.push({ type: 'text', text: '' });
+
+   return {
+     id,
+     type: 'message',
+     role: 'assistant',
+     model: model || 'claude-opus-4',
+     content,
+     stop_reason: stopReason || (toolUses.length ? 'tool_use' : 'end_turn'),
+     stop_sequence: null,
+     usage: {
+       input_tokens: usage?.input_tokens || 0,
+       output_tokens: usage?.output_tokens || 0,
+     },
+   };
+ }
+
+ // ---------------------------------------------------------------------------
+ // Streaming SSE translator
+ // ---------------------------------------------------------------------------
+ // Emits Anthropic-shaped events on an Express res. The caller drives it
+ // from the SDK iteration loop:
+ //
+ //   const tx = makeStreamTranslator({ res, requestId, model });
+ //   tx.start(resolvedModel, inputTokens);
+ //   for await (const message of query(...)) {
+ //     // text deltas:
+ //     for (const block of message.message?.content || []) {
+ //       if (block.type === 'text') tx.pushTextDelta(block.text);
+ //     }
+ //     // native tool_use:
+ //     if (hasToolUse(message)) {
+ //       for (const tu of extractToolUses(message)) tx.pushToolUse(tu);
+ //       tx.finish({ stopReason: 'tool_use', usage: ... });
+ //       break;
+ //     }
+ //   }
+ //   tx.finish({ stopReason: 'end_turn', usage: ... });
+
+ export function makeStreamTranslator({ res, requestId, model }) {
+   let started = false;
+   let blockIndex = -1;
+   let textBlockOpen = false;
+   let finished = false;
+   const messageId = `msg_${(requestId || uuidv4().replace(/-/g, '')).slice(0, 24)}`;
+
+   const sendEvent = (event, data) => {
+     if (res.writableEnded) return;
+     res.write(`event: ${event}\n`);
+     res.write(`data: ${JSON.stringify(data)}\n\n`);
+   };
+
+   const start = (resolvedModel, inputTokens = 0) => {
+     if (started) return;
+     started = true;
+     sendEvent('message_start', {
+       type: 'message_start',
+       message: {
+         id: messageId,
+         type: 'message',
+         role: 'assistant',
+         model: resolvedModel || model,
+         content: [],
+         stop_reason: null,
+         stop_sequence: null,
+         usage: { input_tokens: inputTokens, output_tokens: 0 },
+       },
+     });
+   };
+
+   const openTextBlock = () => {
+     if (textBlockOpen) return;
+     blockIndex++;
+     textBlockOpen = true;
+     sendEvent('content_block_start', {
+       type: 'content_block_start',
+       index: blockIndex,
+       content_block: { type: 'text', text: '' },
+     });
+   };
+
+   const closeTextBlock = () => {
+     if (!textBlockOpen) return;
+     sendEvent('content_block_stop', { type: 'content_block_stop', index: blockIndex });
+     textBlockOpen = false;
+   };
+
+   const pushTextDelta = (text) => {
+     if (!text || finished) return;
+     if (!started) start(model, 0);
+     openTextBlock();
+     sendEvent('content_block_delta', {
+       type: 'content_block_delta',
+       index: blockIndex,
+       delta: { type: 'text_delta', text },
+     });
+   };
+
+   /**
+    * Emit a native tool_use as content_block_start + input_json_delta +
+    * content_block_stop. The SDK gives us the full input object up-front
+    * (we don't see the model streaming JSON character by character —
+    * that's exposed via the raw API but the Agent SDK aggregates), so
+    * we ship it as one delta. Clients that handle character-streamed
+    * input_json_delta still parse fine because partial_json across
+    * deltas concatenates to the same final string.
+    *
+    * `tu` is in OpenAI shape from extractToolUses: {id, name, arguments}
+    * where arguments is a JSON string.
+    */
+   const pushToolUse = (tu) => {
+     if (finished) return;
+     if (!started) start(model, 0);
+     closeTextBlock();
+     blockIndex++;
+     sendEvent('content_block_start', {
+       type: 'content_block_start',
+       index: blockIndex,
+       content_block: { type: 'tool_use', id: tu.id, name: tu.name, input: {} },
+     });
+     sendEvent('content_block_delta', {
+       type: 'content_block_delta',
+       index: blockIndex,
+       delta: { type: 'input_json_delta', partial_json: tu.arguments || '{}' },
+     });
+     sendEvent('content_block_stop', { type: 'content_block_stop', index: blockIndex });
+   };
+
+   const finish = ({ stopReason = 'end_turn', usage = {} } = {}) => {
+     if (finished) return;
+     finished = true;
+     if (!started) start(model, 0);
+     closeTextBlock();
+     sendEvent('message_delta', {
+       type: 'message_delta',
+       delta: { stop_reason: stopReason, stop_sequence: null },
+       usage: { output_tokens: usage.output_tokens || 0 },
+     });
+     sendEvent('message_stop', { type: 'message_stop' });
+     if (!res.writableEnded) res.end();
+   };
+
+   const error = (err) => {
+     if (finished || res.writableEnded) return;
+     finished = true;
+     sendEvent('error', {
+       type: 'error',
+       error: { type: 'api_error', message: err?.message || String(err) },
+     });
+     if (!res.writableEnded) res.end();
+   };
+
+   return {
+     start,
+     pushTextDelta,
+     pushToolUse,
+     finish,
+     error,
+     get hasStarted() { return started; },
+   };
+ }
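The event order that `makeStreamTranslator` produces can be checked without an Express response object. The following is a standalone sketch, not mobygate code: `sseFrame` and `frameTurn` are hypothetical helpers that collect frames in an array, with payload shapes copied from the translator above, and the `get_weather` tool call is an illustrative value. It covers one turn with a text block followed by a single tool call.

```javascript
// Format one server-sent event the way sendEvent() writes it:
// an `event:` line, a `data:` line holding JSON, then a blank line.
function sseFrame(event, data) {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Hypothetical helper: frames for optional text followed by one optional tool_use.
// `toolUse` uses the OpenAI-style shape {id, name, arguments: <JSON string>}.
function frameTurn({ messageId, model, text, toolUse }) {
  const frames = [];
  let index = -1; // block indices are sequential, starting at 0
  frames.push(sseFrame('message_start', {
    type: 'message_start',
    message: {
      id: messageId, type: 'message', role: 'assistant', model,
      content: [], stop_reason: null, stop_sequence: null,
      usage: { input_tokens: 0, output_tokens: 0 },
    },
  }));
  if (text) {
    index++;
    frames.push(sseFrame('content_block_start', { type: 'content_block_start', index, content_block: { type: 'text', text: '' } }));
    frames.push(sseFrame('content_block_delta', { type: 'content_block_delta', index, delta: { type: 'text_delta', text } }));
    frames.push(sseFrame('content_block_stop', { type: 'content_block_stop', index }));
  }
  if (toolUse) {
    index++;
    frames.push(sseFrame('content_block_start', {
      type: 'content_block_start', index,
      content_block: { type: 'tool_use', id: toolUse.id, name: toolUse.name, input: {} },
    }));
    // The whole argument string is shipped as a single input_json_delta.
    frames.push(sseFrame('content_block_delta', {
      type: 'content_block_delta', index,
      delta: { type: 'input_json_delta', partial_json: toolUse.arguments || '{}' },
    }));
    frames.push(sseFrame('content_block_stop', { type: 'content_block_stop', index }));
  }
  frames.push(sseFrame('message_delta', {
    type: 'message_delta',
    delta: { stop_reason: toolUse ? 'tool_use' : 'end_turn', stop_sequence: null },
    usage: { output_tokens: 0 },
  }));
  frames.push(sseFrame('message_stop', { type: 'message_stop' }));
  return frames;
}

const frames = frameTurn({
  messageId: 'msg_example',
  model: 'claude-opus-4',
  text: 'Checking the weather.',
  toolUse: { id: 'toolu_1', name: 'get_weather', arguments: '{"city":"Oslo"}' },
});
// Print just the event names, in emission order.
console.log(frames.map((f) => f.split('\n')[0].slice('event: '.length)));
```

Because the arguments go out as one `input_json_delta`, a client that concatenates `partial_json` across deltas ends up with the same string it would get from a character-by-character stream.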
@@ -0,0 +1,164 @@
+ /**
+  * Auto-derive session keys for clients that don't send `X-Session-Id`.
+  *
+  * Why this exists: OpenAI's wire format is stateless by design — clients
+  * are expected to send the entire conversation history with every turn,
+  * and many clients (OpenClaw at the time of writing, plenty of others)
+  * don't bother passing a session identifier. Without one, mobygate
+  * treats every request as a fresh SDK session and the client ends up
+  * paying input-token cost for the full history on every single turn.
+  * On long conversations (175+ messages observed in production), this
+  * burns through Claude Max usage budgets in minutes.
+  *
+  * The fix: when a request arrives without an explicit session key, we
+  * compute one ourselves from a *stable signature* of the conversation —
+  * model + system prompt + first user message. The same conversation
+  * thread produces the same auto-key turn after turn, so the SDK resume
+  * machinery kicks in and only the new tail of each turn gets billed.
+  * Different conversations naturally produce different signatures and
+  * stay isolated. The existing 60-minute idle TTL keeps stale auto-keys
+  * from lingering forever.
+  *
+  * What's hashed (and why each piece):
+  * - **model** — different agent configs shouldn't share a session.
+  * - **system** (string or content blocks, plus any system-role
+  *   messages) — typically stable for the lifetime of a conversation
+  *   thread, distinguishes one agent's persona from another's.
+  * - **first user message text** — anchors the thread. Stable until
+  *   the client prunes it from history; if/when that happens, a new
+  *   auto-key forms and we lose continuity for that one transition.
+  *   Graceful degradation, not a crash.
+  *
+  * Limitations to be aware of:
+  * - **Collisions across users:** if two unrelated users happen to
+  *   start with the same model + system + first message ("hello"),
+  *   they'd share a session. In single-user dev contexts (Hermes,
+  *   OpenClaw on a personal machine) this is fine. For multi-tenant
+  *   deployments, clients should pass `X-Session-Id` explicitly to
+  *   scope per-user.
+  * - **History pruning shifts the key:** if the client drops the first
+  *   user message from history mid-conversation, the auto-key changes
+  *   and the SDK starts a new session. One turn of double-billing,
+  *   then we're back on the new key. Acceptable.
+  *
+  * Opt-out: `X-Session-Id: none` tells us the client explicitly wants
+  * stateless behavior — we return null and the request flows through
+  * as a fresh SDK call. (An *empty* X-Session-Id is indistinguishable
+  * from "header not set" at the Express layer, so we treat it as
+  * "no explicit key, please auto-derive" rather than as opt-out.)
+  */
+
+ import { createHash } from 'crypto';
+
+ const HASH_LEN = 16;
+ const SYSTEM_TRIM = 500;
+ const USER_TRIM = 500;
+
+ /**
+  * Extract a flat text representation of a content field that might be
+  * either a string or an array of OpenAI/Anthropic content parts. We
+  * only pull the text — images/tool blocks/etc. are ignored for hashing
+  * because they vary in serialization but don't change conversation
+  * identity.
+  */
+ function flattenContent(content) {
+   if (typeof content === 'string') return content;
+   if (!Array.isArray(content)) return '';
+   const out = [];
+   for (const part of content) {
+     if (typeof part === 'string') out.push(part);
+     else if (part?.type === 'text' && part.text) out.push(part.text);
+     // image_url / image / tool_use / tool_result intentionally skipped
+   }
+   return out.join(' ');
+ }
+
+ /**
+  * Pull the system text out of a request body. The Anthropic surface
+  * carries it on `body.system` (string OR content blocks), the OpenAI
+  * surface carries it as messages with `role: 'system'`. Combine both.
+  */
+ function extractSystemText(body) {
+   let parts = [];
+   if (typeof body?.system === 'string') {
+     parts.push(body.system);
+   } else if (Array.isArray(body?.system)) {
+     parts.push(flattenContent(body.system));
+   }
+   for (const msg of body?.messages || []) {
+     if (msg?.role === 'system') {
+       parts.push(flattenContent(msg.content));
+     }
+   }
+   return parts.join('\n').slice(0, SYSTEM_TRIM);
+ }
+
+ /**
+  * First user-role message in the array, flattened to text. We use the
+  * first (oldest) one because it's the most stable anchor — later turns
+  * change every request.
+  */
+ function extractFirstUserText(body) {
+   for (const msg of body?.messages || []) {
+     if (msg?.role === 'user') {
+       const text = flattenContent(msg.content);
+       if (text) return text.slice(0, USER_TRIM);
+     }
+   }
+   return '';
+ }
+
+ /**
+  * Compute a stable session key from a request body. Returns a string
+  * like `auto_<16hex>` when there's enough signal to hash, or `null`
+  * when the body is too sparse (no model, no system, no user text — the
+  * caller should fall through to stateless behavior in that case).
+  *
+  * The hash uses SHA-256 truncated to 16 hex chars (~64 bits of
+  * collision space). A few orders of magnitude more than needed for the
+  * "same conversation prefix" matching use case.
+  */
+ export function deriveSessionKey(body) {
+   const model = body?.model || '';
+   const system = extractSystemText(body);
+   const firstUser = extractFirstUserText(body);
+
+   // Need at least *something* to anchor on. If the request has no
+   // model and no user message, there's literally nothing to identify
+   // the conversation with — better to return null and let the caller
+   // run stateless than to bucket everything into the same auto-key.
+   if (!model && !system && !firstUser) return null;
+   if (!firstUser) return null; // first user msg is the anchor; no anchor → no auto-key
+
+   const signature = [model, system, firstUser].join('||');
+   const digest = createHash('sha256').update(signature).digest('hex').slice(0, HASH_LEN);
+   return `auto_${digest}`;
+ }
+
+ /**
+  * Resolve the effective session key for a request. Order:
+  * 1. Explicit `X-Session-Id` header (or `body.session_id`) wins.
+  *    Special value `'none'` means "explicitly stateless" and
+  *    short-circuits to null without auto-deriving.
+  * 2. Auto-derived key from the conversation signature.
+  * 3. null (stateless) — only when there's nothing useful to hash.
+  *
+  * Returns `{ key, source }` where source is `'explicit' | 'auto' | 'none'`.
+  * The source label is informational — server.js logs it and the dashboard
+  * shows it so you can tell at a glance whether a session was client-keyed
+  * or server-derived.
+  */
+ export function resolveSessionKey({ headerKey, bodyKey, body }) {
+   const explicit = headerKey || bodyKey;
+   if (explicit) {
+     const trimmed = String(explicit).trim();
+     if (trimmed.toLowerCase() === 'none') {
+       return { key: null, source: 'none' };
+     }
+     if (trimmed) return { key: trimmed, source: 'explicit' };
+   }
+
+   const derived = deriveSessionKey(body);
+   if (derived) return { key: derived, source: 'auto' };
+   return { key: null, source: 'none' };
+ }
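To see why the auto-key stays stable across turns, it helps to look at the signature before it is hashed. This is a simplified sketch, not the module's code: `signatureOf` is a hypothetical helper that assumes plain-string content and OpenAI-style `system` messages, and it skips the content-block flattening, the 500-character trimming, and the SHA-256 step that `deriveSessionKey` applies on top. The conversation text is illustrative.

```javascript
// Simplified re-statement of the pre-hash signature:
// model + system text + first user message text, joined with '||'.
// Assumptions: string-only content; system prompt sent as role 'system' messages.
function signatureOf(body) {
  const messages = body.messages || [];
  const system = messages
    .filter((m) => m.role === 'system')
    .map((m) => m.content)
    .join('\n');
  const firstUser = messages.find((m) => m.role === 'user')?.content || '';
  return [body.model || '', system, firstUser].join('||');
}

const turn1 = {
  model: 'claude-opus-4',
  messages: [
    { role: 'system', content: 'You are a terse assistant.' },
    { role: 'user', content: 'Summarize RFC 9110.' },
  ],
};
// Turn 2 resends the same prefix plus the assistant reply and a new question.
const turn2 = {
  ...turn1,
  messages: [
    ...turn1.messages,
    { role: 'assistant', content: 'It defines HTTP semantics.' },
    { role: 'user', content: 'And RFC 9112?' },
  ],
};
// A client that prunes the first user message shifts the anchor.
const pruned = {
  ...turn2,
  messages: turn2.messages.filter((m) => m.content !== 'Summarize RFC 9110.'),
};

console.log(signatureOf(turn1) === signatureOf(turn2));  // true
console.log(signatureOf(turn2) === signatureOf(pruned)); // false
```

Equal signatures hash to equal `auto_<hash>` keys, so turn 2 resumes turn 1's SDK session; the pruned request anchors on a different first user message and gets a new key, which is the one-turn continuity loss listed under "History pruning shifts the key".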
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "mobygate",
-   "version": "0.6.2",
+   "version": "0.7.1",
    "description": "OpenAI-compatible local proxy for Claude Max. The Möbius-strip gateway: OpenAI shape in, Claude Max out.",
    "type": "module",
    "main": "server.js",
package/server.js CHANGED
@@ -68,6 +68,15 @@ import {
    readUpdateLogTail,
    getCurrentVersion,
  } from './lib/updater.js';
+ import {
+   anthropicMessagesToPrompt,
+   collectAnthropicImages,
+   buildAnthropicResponse,
+   makeStreamTranslator,
+   hasAnthropicTools,
+   mapStopReason,
+ } from './lib/anthropic.js';
+ import { resolveSessionKey } from './lib/session-derive.js';
 
  const __filename = fileURLToPath(import.meta.url);
  const __dirname = dirname(__filename);
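The `/v1/messages` handler adapts Anthropic tool definitions to the OpenAI-style wrapper that `buildClientToolsServer` expects, and `buildAnthropicResponse` reverses the shape for outgoing `tool_use` blocks. A minimal sketch of both directions, with hypothetical helper names (`toBridgeTools`, `toAnthropicToolUse`) and an illustrative `get_weather` tool that does not come from the source:

```javascript
// Anthropic tool definition -> OpenAI-style function wrapper, mirroring the
// mapping in handleAnthropicNonStreaming before buildClientToolsServer().
function toBridgeTools(tools) {
  if (!Array.isArray(tools) || tools.length === 0) return null;
  return tools.map((t) => ({
    type: 'function',
    function: {
      name: t.name,
      description: t.description || '',
      parameters: t.input_schema || {}, // JSON Schema passes through unchanged
    },
  }));
}

// OpenAI-style tool call {id, name, arguments: <JSON string>} -> Anthropic
// tool_use block, mirroring buildAnthropicResponse(). Bad JSON yields {}.
function toAnthropicToolUse(tu) {
  let input = {};
  try {
    input = JSON.parse(tu.arguments || '{}');
  } catch {
    // keep the empty object
  }
  return { type: 'tool_use', id: tu.id, name: tu.name, input };
}

// Illustrative tool definition (not from mobygate).
const anthropicTools = [
  {
    name: 'get_weather',
    description: 'Current weather for a city',
    input_schema: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
];

const bridged = toBridgeTools(anthropicTools);
console.log(bridged[0].function.name); // get_weather

const block = toAnthropicToolUse({ id: 'toolu_1', name: 'get_weather', arguments: '{"city":"Oslo"}' });
console.log(block.input); // { city: 'Oslo' }
```

Only the wrapper changes in each direction: `input_schema` travels untouched as `parameters`, and malformed `arguments` fall back to an empty `input` object, as the `try`/`catch` in `buildAnthropicResponse` does.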
@@ -765,6 +774,376 @@ async function handleNonStreaming(res, body, requestId, sessionKey) {
765
774
  });
766
775
  }
767
776
 
777
+ // ---------------------------------------------------------------------------
778
+ // POST /v1/messages — Anthropic-native surface (non-streaming + streaming)
779
+ // ---------------------------------------------------------------------------
780
+ // The dual-surface architecture: Hermes uses /v1/chat/completions
781
+ // (OpenAI shape), OpenClaw uses /v1/messages (Anthropic shape). Both
782
+ // translate to the SAME underlying SDK query() — the surfaces are pure
783
+ // translators over a single inference engine.
784
+ //
785
+ // Tool calling: reuses Phase 1's native MCP path from lib/tool-bridge.js.
786
+ // No prompt-injected tool definitions, no <tool_call> text parsing.
787
+ // Inbound tool_results still spliced as text on resume (see anthropic.js
788
+ // docstring for why — Phase 1 limitation, not lifted here).
789
+
790
+ async function handleAnthropicNonStreaming(res, body, requestId, sessionKey) {
791
+ const existing = getSession(sessionKey);
792
+ const resuming = !!existing?.sdkSessionId;
793
+ const toolsEnabled = hasAnthropicTools(body);
794
+ const promptText = anthropicMessagesToPrompt(body, { resuming });
795
+ const images = collectAnthropicImages(body.messages || []);
796
+ const prompt = buildQueryPrompt(promptText, images);
797
+ const model = resolveModel(body.model);
798
+ // Translate Anthropic tool defs → OpenAI shape that buildClientToolsServer
799
+ // expects. Both go through the same JSON-Schema → Zod path on the way to
800
+ // MCP; the wrapper shape difference is just `function:{name, parameters}`
801
+ // vs `{name, input_schema}`.
802
+ const toolsForBridge = toolsEnabled
803
+ ? body.tools.map((t) => ({
804
+ type: 'function',
805
+ function: { name: t.name, description: t.description || '', parameters: t.input_schema || {} },
806
+ }))
807
+ : null;
808
+ const clientToolsServer = toolsForBridge ? buildClientToolsServer(toolsForBridge) : null;
809
+
810
+ if (images.length) console.log(` [multimodal] ${images.length} image block(s)`);
811
+ if (toolsEnabled) console.log(` [tools] ${body.tools.length} client tool(s) registered as MCP`);
812
+
813
+ let resultText = '';
814
+ let collectedToolCalls = [];
815
+ let resolvedModel = model;
816
+ let inputTokens = 0;
817
+ let outputTokens = 0;
818
+ let capturedSessionId = existing?.sdkSessionId || null;
819
+ let stopReason = 'end_turn';
820
+ const abortController = new AbortController();
821
+
822
+ if (resuming) {
823
+ console.log(` [session] resuming: ${sessionKey} → sdk=${existing.sdkSessionId} (msgs=${existing.messageCount})`);
824
+ }
825
+
826
+ const runQuery = async () => {
827
+ resultText = '';
828
+ collectedToolCalls = [];
829
+ resolvedModel = model;
830
+ inputTokens = 0;
831
+ outputTokens = 0;
832
+ capturedSessionId = existing?.sdkSessionId || null;
833
+ stopReason = 'end_turn';
834
+
835
+ for await (const message of query({
836
+ prompt,
837
+ options: {
838
+ model,
839
+ maxTurns: toolsEnabled ? 5 : 200,
840
+ permissionMode: 'bypassPermissions',
841
+ allowDangerouslySkipPermissions: true,
842
+ abortController,
843
+ ...(clientToolsServer
844
+ ? {
845
+ mcpServers: { [MCP_SERVER_NAME]: clientToolsServer },
846
+ allowedTools: [`${MCP_TOOL_PREFIX}*`],
847
+ }
848
+ : toolsEnabled
849
+ ? { allowedTools: [] }
850
+ : {}),
851
+ ...(resuming ? { resume: existing.sdkSessionId } : {}),
852
+ ...(sessionKey && !resuming ? { persistSession: true } : {}),
853
+ },
854
+ })) {
855
+ if (message.type === 'system' && message.subtype === 'init' && message.model) {
856
+ resolvedModel = message.model;
857
+ }
858
+
859
+ if (message.type === 'assistant' && message.session_id && !capturedSessionId) {
860
+ capturedSessionId = message.session_id;
861
+ console.log(` [session] captured sdk session: ${capturedSessionId}`);
862
+ }
863
+
864
+ if (message.type === 'assistant' && message.message?.content) {
865
+ const content = message.message.content;
866
+ if (Array.isArray(content)) {
867
+ for (const block of content) {
868
+ if (block.type === 'text') resultText += block.text || '';
869
+ }
870
+ } else if (typeof content === 'string') {
871
+ resultText += content;
872
+ }
873
+ if (isAuthFailureText(resultText)) {
874
+ abortController.abort();
875
+ throw new AuthFailureInResultText(resultText);
876
+ }
877
+ if (toolsEnabled && hasToolUse(message)) {
878
+ const calls = extractToolUses(message);
879
+ if (calls.length) {
880
+ collectedToolCalls.push(...calls);
881
+ stopReason = 'tool_use';
882
+ console.log(` [tools] ${calls.length} native tool_use block(s) — aborting SDK`);
883
+ abortController.abort();
884
+ break;
885
+ }
886
+ }
887
+ }
888
+
889
+ if (message.type === 'result') {
890
+ if (message.result && !resultText) resultText = message.result;
891
+ if (isAuthFailureText(resultText)) {
892
+ throw new AuthFailureInResultText(resultText);
893
+ }
894
+ inputTokens = message.usage?.input_tokens ?? message.input_tokens ?? 0;
895
+ outputTokens = message.usage?.output_tokens ?? message.output_tokens ?? 0;
896
+ stopReason = mapStopReason(message);
897
+ break;
898
+ }
899
+ }
900
+ };
901
+
902
+ try {
903
+ await runWithAuthRetry({
904
+ attempt: runQuery,
905
+ bailIfStarted: () => false,
906
+ onRefreshing: (err) => console.warn(`[auth] 401 on /v1/messages — refreshing (${err.message?.slice(0, 80)})`),
907
+ onRetry: (r) => console.log(`[auth] refreshed in ${r.durationMs}ms — retrying /v1/messages`),
908
+ });
909
+ } catch (err) {
910
+ const isAbort = err?.name === 'AbortError' || /aborted/i.test(err?.message || '');
911
+ if (!(toolsEnabled && isAbort)) {
912
+ console.error('[/v1/messages] SDK error:', err.message);
913
+ return res.status(500).json({
914
+ type: 'error',
915
+ error: { type: 'api_error', message: err.message },
916
+ });
917
+ }
918
+ }
919
+
920
+ if (sessionKey && capturedSessionId) {
921
+ upsertSession(sessionKey, capturedSessionId, resolvedModel);
922
+ }
923
+
924
+ if (sessionKey) res.setHeader('X-Session-Id', sessionKey);
925
+
926
+ res.json(buildAnthropicResponse({
927
+ rawText: resultText.trim(),
928
+ toolUses: collectedToolCalls,
929
+ model: resolvedModel,
930
+ usage: { input_tokens: inputTokens, output_tokens: outputTokens },
931
+ requestId,
932
+ stopReason,
933
+ }));
934
+ }
935
+
936
+ async function handleAnthropicStreaming(req, res, body, requestId, sessionKey) {
937
+ const existing = getSession(sessionKey);
938
+ const resuming = !!existing?.sdkSessionId;
939
+ const toolsEnabled = hasAnthropicTools(body);
940
+ const promptText = anthropicMessagesToPrompt(body, { resuming });
941
+ const images = collectAnthropicImages(body.messages || []);
942
+ const prompt = buildQueryPrompt(promptText, images);
943
+ const model = resolveModel(body.model);
944
+ const toolsForBridge = toolsEnabled
945
+ ? body.tools.map((t) => ({
946
+ type: 'function',
947
+ function: { name: t.name, description: t.description || '', parameters: t.input_schema || {} },
948
+ }))
949
+ : null;
950
+ const clientToolsServer = toolsForBridge ? buildClientToolsServer(toolsForBridge) : null;
951
+
952
+ if (images.length) console.log(` [multimodal] ${images.length} image block(s)`);
953
+ if (toolsEnabled) console.log(` [tools] ${body.tools.length} client tool(s) registered as MCP`);
954
+
955
+ res.setHeader('Content-Type', 'text/event-stream');
956
+ res.setHeader('Cache-Control', 'no-cache');
957
+ res.setHeader('Connection', 'keep-alive');
958
+ res.setHeader('X-Request-Id', requestId);
959
+ if (sessionKey) res.setHeader('X-Session-Id', sessionKey);
960
+ res.flushHeaders();
961
+
962
+ const tx = makeStreamTranslator({ res, requestId, model });
963
+ let abortController = new AbortController();
964
+ let resolvedModel = model;
965
+ let capturedSessionId = existing?.sdkSessionId || null;
966
+ let inputTokens = 0;
967
+ let outputTokens = 0;
968
+ let stopReason = 'end_turn';
969
+ let clientDisconnected = false;
970
+ let textEmittedSoFar = ''; // dedup against same-message reflow from SDK
971
+ let toolUseEmitted = false;
972
+
973
+ res.on('close', () => {
974
+ clientDisconnected = true;
975
+ abortController.abort();
976
+ });
977
+
978
+ if (resuming) {
979
+ console.log(` [session] resuming: ${sessionKey} → sdk=${existing.sdkSessionId} (msgs=${existing.messageCount})`);
980
+ }
981
+
982
+ const runQuery = async () => {
983
+ // Reset per-attempt state in case of 401-retry. Note: tx is reused
984
+ // across retries, so a retry that runs after we already emitted
985
+ // message_start would write a second message_start into the same stream. We bail
986
+ // out of retry once the translator has started (see bailIfStarted).
987
+ resolvedModel = model;
988
+ capturedSessionId = existing?.sdkSessionId || null;
989
+ inputTokens = 0;
990
+ outputTokens = 0;
991
+ stopReason = 'end_turn';
992
+ textEmittedSoFar = '';
993
+ toolUseEmitted = false;
994
+ abortController = new AbortController(); // fresh signal per attempt; the 'close' handler aborts whichever is current
995
+ for await (const message of query({
996
+ prompt,
997
+ options: {
998
+ model,
999
+ maxTurns: toolsEnabled ? 5 : 200,
1000
+ permissionMode: 'bypassPermissions',
1001
+ allowDangerouslySkipPermissions: true,
1002
+ abortController,
1003
+ ...(clientToolsServer
1004
+ ? {
1005
+ mcpServers: { [MCP_SERVER_NAME]: clientToolsServer },
1006
+ allowedTools: [`${MCP_TOOL_PREFIX}*`],
1007
+ }
1008
+ : toolsEnabled
1009
+ ? { allowedTools: [] }
1010
+ : {}),
1011
+ ...(resuming ? { resume: existing.sdkSessionId } : {}),
1012
+ ...(sessionKey && !resuming ? { persistSession: true } : {}),
1013
+ },
1014
+ })) {
1015
+ if (clientDisconnected) break;
1016
+
1017
+ if (message.type === 'system' && message.subtype === 'init' && message.model) {
1018
+ resolvedModel = message.model;
1019
+ tx.start(resolvedModel, 0);
1020
+ }
1021
+
1022
+ if (message.type === 'assistant' && message.session_id && !capturedSessionId) {
1023
+ capturedSessionId = message.session_id;
1024
+ console.log(` [session] captured sdk session: ${capturedSessionId}`);
1025
+ }
1026
+
1027
+ if (message.type === 'assistant' && message.message?.content) {
1028
+ const content = message.message.content;
1029
+
1030
+ // Auth-failure short-circuit: throw so runWithAuthRetry handles it.
1031
+ // Only safe before any text has been streamed (otherwise we've
1032
+ // already corrupted the SSE stream and can't undo).
1033
+ if (Array.isArray(content)) {
1034
+ let combined = '';
1035
+ for (const b of content) if (b?.type === 'text' && b.text) combined += b.text;
1036
+ if (combined && isAuthFailureText(combined) && !tx.hasStarted) {
1037
+ abortController.abort();
1038
+ throw new AuthFailureInResultText(combined);
1039
+ }
1040
+ }
1041
+
1042
+ // Tool_use detection: emit tool_use blocks structurally and abort.
1043
+ // We do this BEFORE streaming text deltas from this message so the
1044
+ // tool_use block is properly framed (after any pending text block
1045
+ // closes). The translator handles the close-text → open-tool-use
1046
+ // sequencing internally.
1047
+ if (toolsEnabled && hasToolUse(message)) {
1048
+ const calls = extractToolUses(message);
1049
+ if (calls.length) {
1050
+ // Emit any text from this same message *before* the tool_use
1051
+ // (Anthropic streams sometimes have text + tool_use in one
1052
+ // assistant message — preserve that ordering).
1053
+ if (Array.isArray(content)) {
1054
+ for (const b of content) {
1055
+ if (b?.type === 'text' && b.text) {
1056
+ // Compute delta vs what we've emitted to avoid duplication
1057
+ // on aggregator-style assistant messages that resend the
1058
+ // whole accumulated text.
1059
+ const delta = b.text.startsWith(textEmittedSoFar)
1060
+ ? b.text.slice(textEmittedSoFar.length)
1061
+ : b.text;
1062
+ if (delta) {
1063
+ tx.pushTextDelta(delta);
1064
+ textEmittedSoFar += delta;
1065
+ }
1066
+ }
1067
+ }
1068
+ }
1069
+ for (const tu of calls) tx.pushToolUse(tu);
1070
+ toolUseEmitted = true;
1071
+ stopReason = 'tool_use';
1072
+ console.log(` [tools] ${calls.length} native tool_use block(s) — aborting SDK`);
1073
+ abortController.abort();
1074
+ break;
1075
+ }
1076
+ }
1077
+
1078
+ // Plain text-only assistant message: stream the delta.
1079
+ if (Array.isArray(content)) {
1080
+ let combined = '';
1081
+ for (const b of content) if (b?.type === 'text' && b.text) combined += b.text;
1082
+ if (combined) {
1083
+ const delta = combined.startsWith(textEmittedSoFar)
1084
+ ? combined.slice(textEmittedSoFar.length)
1085
+ : combined;
1086
+ if (delta) {
1087
+ tx.pushTextDelta(delta);
1088
+ textEmittedSoFar += delta;
1089
+ }
1090
+ }
1091
+ } else if (typeof content === 'string' && content) {
1092
+ const delta = content.startsWith(textEmittedSoFar)
1093
+ ? content.slice(textEmittedSoFar.length)
1094
+ : content;
1095
+ if (delta) {
1096
+ tx.pushTextDelta(delta);
1097
+ textEmittedSoFar += delta;
1098
+ }
1099
+ }
1100
+ }
1101
+
1102
+ if (message.type === 'result') {
1103
+ if (isAuthFailureText(message.result || '') && !tx.hasStarted) {
1104
+ throw new AuthFailureInResultText(message.result); // checked before emitting below, which would commit the stream
1105
+ }
1106
+ if (message.result && !textEmittedSoFar && !toolUseEmitted) {
1107
+ // Some SDK paths only deliver text via the final result message
1108
+ // (no streaming assistant messages). Emit it here as a single
1109
+ // delta: clients see this as "model started + finished in one
1110
+ // chunk", which is valid SSE.
1111
+ tx.pushTextDelta(message.result);
1112
+ }
1113
+ inputTokens = message.usage?.input_tokens ?? message.input_tokens ?? 0;
1114
+ outputTokens = message.usage?.output_tokens ?? message.output_tokens ?? 0;
1115
+ if (!toolUseEmitted) stopReason = mapStopReason(message);
1116
+ break;
1117
+ }
1118
+ }
1119
+ };
1120
+
1121
+ try {
1122
+ await runWithAuthRetry({
1123
+ attempt: runQuery,
1124
+ // Once we've emitted message_start or any deltas, the SSE stream is
1125
+ // committed — a retry would fragment it. Same logic as the OpenAI
1126
+ // surface (bail once anything has been written).
1127
+ bailIfStarted: () => tx.hasStarted,
1128
+ onRefreshing: (err) => console.warn(`[auth] 401 on /v1/messages stream — refreshing (${err.message?.slice(0, 80)})`),
1129
+ onRetry: (r) => console.log(`[auth] refreshed in ${r.durationMs}ms — retrying /v1/messages stream`),
1130
+ });
1131
+ } catch (err) {
1132
+ const isAbort = err?.name === 'AbortError' || /aborted/i.test(err?.message || '');
1133
+ if (!clientDisconnected && !(toolsEnabled && isAbort)) {
1134
+ console.error('[/v1/messages stream] SDK error:', err.message);
1135
+ tx.error(err);
1136
+ return;
1137
+ }
1138
+ }
1139
+
1140
+ if (sessionKey && capturedSessionId) {
1141
+ upsertSession(sessionKey, capturedSessionId, resolvedModel);
1142
+ }
1143
+
1144
+ tx.finish({ stopReason, usage: { output_tokens: outputTokens } });
1145
+ }
1146
+
768
1147
  // ---------------------------------------------------------------------------
769
1148
  // Express app
770
1149
  // ---------------------------------------------------------------------------
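The streaming handler above deduplicates text because some SDK assistant messages resend the whole accumulated text rather than only the new tail. The rule it applies inline through `textEmittedSoFar` can be isolated as a pure function. This is an illustrative sketch only: `nextTextDelta` is a hypothetical name, not a mobygate export, and it reproduces the handler's prefix check as written rather than proposing a different strategy.

```javascript
// Hypothetical helper (not a mobygate export): the dedup rule the streaming
// handler applies inline via `textEmittedSoFar`, isolated as a pure function.
// Returns the text to emit now and the updated "already emitted" text.
function nextTextDelta(emittedSoFar, incoming) {
  if (!incoming) return { delta: '', emitted: emittedSoFar };
  // Aggregator-style message: it resends everything so far plus a new tail,
  // so only the tail is emitted.
  const delta = incoming.startsWith(emittedSoFar)
    ? incoming.slice(emittedSoFar.length)
    : incoming; // Not a continuation: emit it whole, as the handler does.
  return { delta, emitted: emittedSoFar + delta };
}

// Usage: two aggregator-style messages, then an unrelated chunk.
let emitted = '';
const deltas = [];
for (const text of ['Hello', 'Hello, world', 'Bye']) {
  const result = nextTextDelta(emitted, text);
  if (result.delta) deltas.push(result.delta);
  emitted = result.emitted;
}
console.log(deltas); // [ 'Hello', ', world', 'Bye' ]
```

One property of the prefix check is worth knowing. Once a non-continuation chunk has been emitted whole, the tracked text is the concatenation of everything sent. A later aggregator message that restarts from the new segment (for example `'Bye!'` after `'Bye'`) no longer matches the prefix and is emitted in full again.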
@@ -815,10 +1194,20 @@ app.post('/v1/chat/completions', async (req, res) => {
815
1194
  });
816
1195
  }
817
1196
 
818
- // Session key: X-Session-Id header > body.session_id > null (stateless)
819
- const sessionKey = req.headers['x-session-id'] || body.session_id || null;
1197
+ // Session key resolution: X-Session-Id header > body.session_id >
1198
+ // auto-derived from conversation signature > null (stateless).
1199
+ // Auto-derive protects clients that don't pass a session header from
1200
+ // re-paying input-token cost on every turn of a long conversation —
1201
+ // see lib/session-derive.js for the rationale and trade-offs.
1202
+ const { key: sessionKey, source: sessionKeySource } = resolveSessionKey({
1203
+ headerKey: req.headers['x-session-id'],
1204
+ bodyKey: body.session_id,
1205
+ body,
1206
+ });
820
1207
  const existing = getSession(sessionKey);
821
- const sessionTag = sessionKey ? ` | session=${sessionKey}${existing ? ' (resume)' : ' (new)'}` : '';
1208
+ const sessionTag = sessionKey
1209
+ ? ` | session=${sessionKey}${sessionKeySource === 'auto' ? ' (auto)' : ''}${existing ? ' (resume)' : ' (new)'}`
1210
+ : '';
822
1211
 
823
1212
  console.log(`[${new Date().toISOString()}] ${body.stream ? 'stream' : 'sync'} | model=${body.model} → ${resolveModel(body.model)} | msgs=${body.messages.length}${sessionTag}`);
824
1213
 
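The route now obtains its session key from `resolveSessionKey`, whose implementation lives in `lib/session-derive.js` and is not shown here. According to the 0.7.1 changelog, precedence is the `X-Session-Id` header, then `body.session_id`, then a hash of model + system prompt + first user message, then stateless. A literal `X-Session-Id: none` forces stateless. The sketch below is a minimal version of that precedence. It assumes SHA-256 truncated to 16 hex characters for the hash and `'header'`/`'body'`/`'none'` as the non-auto `source` values. The source confirms only `'auto'` and the `auto_<hash>` key shape.

```javascript
const crypto = require('node:crypto');

// Hypothetical sketch of the precedence in the 0.7.1 changelog. The real
// lib/session-derive.js may differ: the hash (SHA-256, 16 hex chars) and the
// non-auto `source` labels are assumptions made for illustration.
function resolveSessionKey({ headerKey, bodyKey, body }) {
  // Literal "none" opts the request out of auto-derive: run fully stateless.
  if (headerKey === 'none') return { key: null, source: 'none' };
  if (headerKey) return { key: headerKey, source: 'header' };
  if (bodyKey) return { key: bodyKey, source: 'body' };

  const messages = Array.isArray(body?.messages) ? body.messages : [];
  // Anthropic bodies carry `system` at the top level; OpenAI bodies carry it
  // as a role:'system' message. Accept either.
  const system =
    body?.system ?? messages.find((m) => m.role === 'system')?.content ?? '';
  const firstUser = messages.find((m) => m.role === 'user');
  if (!firstUser) return { key: null, source: 'none' }; // nothing stable to hash

  // These three fields stay fixed as the conversation grows, so every turn
  // of the same conversation hashes to the same key.
  const signature = JSON.stringify([body.model ?? '', system, firstUser.content]);
  const hash = crypto.createHash('sha256').update(signature).digest('hex').slice(0, 16);
  return { key: `auto_${hash}`, source: 'auto' };
}

// Usage: turn 2 appends history but keeps the same opening, so the key matches.
const turn1 = {
  model: 'example-model', // placeholder model name
  system: 'Be terse.',
  messages: [{ role: 'user', content: 'hi' }],
};
const turn2 = {
  ...turn1,
  messages: [
    ...turn1.messages,
    { role: 'assistant', content: 'hello' },
    { role: 'user', content: 'again' },
  ],
};
const a = resolveSessionKey({ body: turn1 });
const b = resolveSessionKey({ body: turn2 });
console.log(a.key === b.key, a.source); // true auto
console.log(resolveSessionKey({ headerKey: 'none', body: turn1 }).key); // null
```

Hashing only the conversation's opening works because those fields do not change as turns are appended. Turn N and turn N+1 therefore land on the same key, and the SDK resume path applies. The trade-off is that two distinct conversations opening with the same model, system prompt, and first user message will share a key. The `none` opt-out or an explicit session ID avoids that.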
@@ -866,6 +1255,75 @@ app.post('/v1/chat/completions', async (req, res) => {
866
1255
  }
867
1256
  });
868
1257
 
1258
+ // POST /v1/messages — Anthropic-native surface (for OpenClaw etc.).
1259
+ // Same dispatch shape as /v1/chat/completions, different translator pair.
1260
+ // Both endpoints terminate at the same SDK query() under the hood; this
1261
+ // route exists so Anthropic-shaped clients get native blocks (text /
1262
+ // image / tool_use / tool_result) without going through OpenAI shape.
1263
+ app.post('/v1/messages', async (req, res) => {
1264
+ const requestId = uuidv4().replace(/-/g, '').slice(0, 24);
1265
+ const body = req.body;
1266
+
1267
+ if (!body?.messages || !Array.isArray(body.messages) || body.messages.length === 0) {
1268
+ return res.status(400).json({
1269
+ type: 'error',
1270
+ error: { type: 'invalid_request_error', message: 'messages is required and must be a non-empty array' },
1271
+ });
1272
+ }
1273
+
1274
+ const { key: sessionKey, source: sessionKeySource } = resolveSessionKey({
1275
+ headerKey: req.headers['x-session-id'],
1276
+ bodyKey: body.session_id,
1277
+ body,
1278
+ });
1279
+ const existing = getSession(sessionKey);
1280
+ const sessionTag = sessionKey
1281
+ ? ` | session=${sessionKey}${sessionKeySource === 'auto' ? ' (auto)' : ''}${existing ? ' (resume)' : ' (new)'}`
1282
+ : '';
1283
+
1284
+ console.log(`[${new Date().toISOString()}] anthropic ${body.stream ? 'stream' : 'sync'} | model=${body.model} → ${resolveModel(body.model)} | msgs=${body.messages.length}${sessionTag}`);
1285
+
1286
+ // Dashboard event — same shape as the OpenAI route, just labeled by path.
1287
+ const startedAt = Date.now();
1288
+ const imageBlocks = collectAnthropicImages(body.messages || []).length;
1289
+ dashboardBus.emitEvent({
1290
+ type: 'request.start',
1291
+ id: requestId,
1292
+ method: 'POST',
1293
+ path: '/v1/messages',
1294
+ model: body.model,
1295
+ resolvedModel: resolveModel(body.model),
1296
+ session: sessionKey,
1297
+ stream: !!body.stream,
1298
+ tools: hasAnthropicTools(body),
1299
+ images: imageBlocks,
1300
+ messages: body.messages.length,
1301
+ resuming: !!existing,
1302
+ });
1303
+
1304
+ let endEmitted = false;
1305
+ const emitEnd = (overrides = {}) => {
1306
+ if (endEmitted) return;
1307
+ endEmitted = true;
1308
+ dashboardBus.emitEvent({
1309
+ type: 'request.end',
1310
+ id: requestId,
1311
+ durationMs: Date.now() - startedAt,
1312
+ status: res.statusCode < 400 ? 'ok' : 'error',
1313
+ httpStatus: res.statusCode,
1314
+ ...overrides,
1315
+ });
1316
+ };
1317
+ res.on('finish', () => emitEnd());
1318
+ res.on('close', () => { if (!endEmitted) emitEnd({ status: 'error', error: 'client_disconnect' }); });
1319
+
1320
+ if (body.stream) {
1321
+ await handleAnthropicStreaming(req, res, body, requestId, sessionKey);
1322
+ } else {
1323
+ await handleAnthropicNonStreaming(res, body, requestId, sessionKey);
1324
+ }
1325
+ });
1326
+
869
1327
  // GET /v1/models
870
1328
  app.get('/v1/models', (_req, res) => {
871
1329
  const now = Math.floor(Date.now() / 1000);