mobygate 0.8.4 → 0.9.4

package/CHANGELOG.md CHANGED
@@ -4,6 +4,550 @@ All notable changes to mobygate are documented here. Format loosely follows
4
4
  [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); version numbers are
5
5
  [Semantic Versioning](https://semver.org/).
6
6
 
7
+ ## [0.9.4] — 2026-05-29
8
+
9
+ Fable 5 support. Anthropic shipped the Fable 5 model family (parallel to
10
+ the Opus 4.x line); this adds it to the model map so `claude-fable-5`
11
+ (and the `fable` alias) resolve to the 1M-context variant instead of
12
+ silently falling back to the default.
13
+
14
+ ### Added
15
+
16
+ - **Fable 5 model map entries:** `claude-fable-5`, `claude-fable-5[1m]`,
17
+ `claude-fable-5-1m`, `claude-fable-5-200k`, plus the `fable` /
18
+ `fable-200k` short aliases. The bare `claude-fable-5` and `fable`
19
+ aliases route to `claude-fable-5[1m]` (1M context, verified
20
+ Max-included — the probe ran on Claude Max OAuth with no extra-usage
21
+ rejection). Use `claude-fable-5-200k` for the standard 200k variant.
22
+
23
+ ### Notes
24
+
25
+ - Additive only. Opus 4.8 stays the default model; Fable resolves only
26
+ when explicitly requested. All Opus / Sonnet / Haiku entries unchanged.
27
+ - Verified live 2026-05-29 against Anthropic via the `claude` CLI:
28
+ `claude-fable-5` accepted and self-identifies; `claude-fable-9`
29
+ (control) rejected as nonexistent.
30
+
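A minimal sketch of the alias resolution this entry describes. `MODEL_MAP` and `resolveModel` are hypothetical names, not taken from mobygate's source. Only the routing of the bare `claude-fable-5` and `fable` aliases to `claude-fable-5[1m]` is stated above; the targets for the other keys are assumptions inferred from their names.

```javascript
// Default model as of 0.9.3, per the DEFAULT_MODEL bump noted in that entry.
const DEFAULT_MODEL = 'claude-opus-4-8[1m]';

// Hypothetical map. Entries marked "assumed" are guesses from the alias names.
const MODEL_MAP = new Map([
  ['claude-fable-5', 'claude-fable-5[1m]'],        // stated in the changelog
  ['fable', 'claude-fable-5[1m]'],                 // stated in the changelog
  ['claude-fable-5[1m]', 'claude-fable-5[1m]'],    // assumed identity
  ['claude-fable-5-1m', 'claude-fable-5[1m]'],     // assumed
  ['claude-fable-5-200k', 'claude-fable-5-200k'],  // assumed identity
  ['fable-200k', 'claude-fable-5-200k'],           // assumed
]);

// Unmapped names fall back to the default, which is the silent
// fallback this release avoids for the Fable aliases.
function resolveModel(requested) {
  const key = String(requested ?? '').trim().toLowerCase();
  return MODEL_MAP.get(key) ?? DEFAULT_MODEL;
}
```

With a map like this, a request for `fable` resolves to the 1M variant, while an unknown name still lands on the default.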
31
+ ## [0.9.3] — 2026-05-29
32
+
33
+ Opus 4.8 support + control-plane Host-header hardening.
34
+
35
+ Two threads land together. (1) Anthropic shipped Opus 4.8; mobygate was
36
+ silently downgrading every `claude-opus-4-8` request to `4-7[1m]` because
37
+ the alias wasn't in the model map — so clients (including Taste, which
38
+ already targets 4-8) were unknowingly running on the older model. (2) A
39
+ batch of sensitive control-plane GET endpoints were reachable from
40
+ non-local Host headers (DNS-rebinding / CSRF surface).
41
+
42
+ ### Added
43
+
44
+ - **Opus 4.8 model map entries:** `claude-opus-4-8`,
45
+ `claude-opus-4-8[1m]`, `claude-opus-4-8-1m`, `claude-opus-4-8-200k`.
46
+ Bare `claude-opus-4-8`, the `opus` alias, and the catch-all
47
+ `claude-opus-4` now resolve to `claude-opus-4-8[1m]` (1M, Max-included
48
+ — verified live). All `claude-opus-4-7*` entries retained so explicit
49
+ 4-7 requests still resolve correctly.
50
+ - **`DEFAULT_MODEL` bumped** `claude-opus-4-7[1m]` → `claude-opus-4-8[1m]`.
51
+
52
+ ### Changed (security)
53
+
54
+ - **`requireLocalOrigin` now guards** `/sessions`, `/sessions/:key`,
55
+ `/dashboard/recent`, `/dashboard/sessions`, `/auth/status`,
56
+ `/update/check`, `/update/status`. These previously answered requests
57
+ with hostile `Host` headers, exposing session IDs, dashboard event
58
+ metadata, auth state, and update logs to DNS-rebinding / CSRF.
59
+ - **Extracted `serializeSession()`** — collapses the duplicated
60
+ session-JSON formatting that the `/sessions` routes and dashboard
61
+ shared into one helper (dashboard vs API shape via an option flag).
62
+
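A minimal sketch of a Host-header guard in the spirit of `requireLocalOrigin`. The allowlist, the 403 status, the error body, and the Express-style `(req, res, next)` signature are all assumptions for illustration; the changelog only states that the listed endpoints now reject non-local `Host` headers.

```javascript
// Assumed allowlist of loopback host names (IPv6 literals are bracketed in Host).
const LOCAL_HOSTNAMES = new Set(['localhost', '127.0.0.1', '[::1]']);

// True only when the Host header names a loopback host, with or without a port.
function isLocalHostHeader(hostHeader) {
  if (typeof hostHeader !== 'string' || hostHeader.length === 0) return false;
  const host = hostHeader.trim().toLowerCase();
  const hostname = host.startsWith('[')
    ? host.slice(0, host.indexOf(']') + 1) // keep "[::1]", drop ":port"
    : host.split(':')[0];
  return LOCAL_HOSTNAMES.has(hostname);
}

// Hypothetical middleware shape: pass local requests through, reject the rest.
function requireLocalOrigin(req, res, next) {
  if (isLocalHostHeader(req.headers.host)) return next();
  res.statusCode = 403; // status code is an assumption
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify({ error: 'forbidden: non-local Host header' }));
}
```

Matching on the exact hostname rather than a prefix means a rebinding name such as `localhost.evil.example` is rejected.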
63
+ ### Added (dashboard)
64
+
65
+ - **`GET /dashboard.css`** serves the bundled dashboard stylesheet
66
+ (extracted from inline `index.html`), with no-cache headers so edits
67
+ show on reload. Added `dashboard.css` to the published `files[]`.
68
+ - **`test/smoke.test.mjs`** gains a raw-`node:http` case that spoofs
69
+ `Host: evil.example` to assert the guarded endpoints reject it. Full
70
+ suite: 11/11 passing.
71
+
72
+ ### Fixed
73
+
74
+ - **Silent model downgrade.** A `claude-opus-4-8` request returned
75
+ `model: claude-opus-4-7[1m]` in the response body before this fix.
76
+ Confirmed in a real capture (requested 4-8, ran as 4-7) — the bug had
77
+ been quietly affecting live Claude Code and Taste traffic.
78
+
79
+ ## [0.9.2] — 2026-05-03
80
+
81
+ Return `405 Method Not Allowed` (with `Allow: POST` header) for `GET`
82
+ on `/v1/chat/completions`, `/v1/messages`, and `/quiet/v1/messages`
83
+ instead of the default `404`. RFC 9110-correct, and unblocks
84
+ endpoint-detection probes (e.g. Hermes onboarding) that treat 404 as
85
+ "missing" but 405 as "exists, wrong verb."
86
+
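A small sketch of the method check described above, written as a pure function so it does not assume any particular HTTP framework. `methodNotAllowedFor` and the response-body shape are hypothetical; the three routes, the 405 status, and the `Allow: POST` header come from the entry.

```javascript
const POST_ONLY_ROUTES = new Set([
  '/v1/chat/completions',
  '/v1/messages',
  '/quiet/v1/messages',
]);

// Returns a response descriptor for GET on a POST-only route,
// or null so that normal routing (including the default 404) continues.
function methodNotAllowedFor(method, path) {
  if (method === 'GET' && POST_ONLY_ROUTES.has(path)) {
    return {
      status: 405,
      headers: { Allow: 'POST' },
      body: { error: 'Method Not Allowed' }, // body shape is an assumption
    };
  }
  return null;
}
```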
87
+ ## [0.9.1] — 2026-05-03
88
+
89
+ Smoke-test suite + `X-Request-Id` on JSON responses.
90
+
91
+ ### Added
92
+
93
+ - **`test/smoke.test.mjs`** — 9 integration tests that spin up a real
94
+ mobygate on a side port and hit each surface×mode combination on the
95
+ three endpoints. Asserts wire shape (Anthropic `message_start` / SSE
96
+ events; OpenAI `chat.completion` / chunks), tools mode (native
97
+ `tool_use` / `tool_calls`), `/quiet/v1/messages` scrubbing, and the
98
+ capture summary + SQLite index row landing with the right status,
99
+ stop reason, and usage. Run with `npm run test:smoke` (~65s, ~$0.005
100
+ per run on haiku).
101
+
102
+ ### Changed
103
+
104
+ - **JSON responses now set `X-Request-Id`.** Previously only the SSE
105
+ paths (streaming) returned it. Adding it to non-streaming closes a
106
+ debug gap — you can now correlate any response back to its capture
107
+ file by header without re-parsing the JSON body.
108
+
109
+ ## [0.9.0] — 2026-05-03
110
+
111
+ Four inference handlers consolidated into one runner. The OpenAI and
112
+ Anthropic surfaces (`/v1/chat/completions`, `/v1/messages`,
113
+ `/quiet/v1/messages`) now share a single inference loop driven by
114
+ surface adapters and a `mode = 'stream' | 'json'` flag. Same wire
115
+ behavior on every endpoint; ~1200 lines deleted from `server.js`.
116
+
117
+ This release bundles the v0.8.6–v0.8.11 work that was sitting
118
+ uncommitted on top of v0.8.5 (quiet mode, capture summary completeness,
119
+ client-disconnect preservation, SQLite captures index + `mobygate
120
+ captures` CLI). See the per-version sections below for those.
121
+
122
+ ### Why
123
+
124
+ The four handlers (handleStreaming, handleNonStreaming,
125
+ handleAnthropicNonStreaming, handleAnthropicStreaming) were ~80%
126
+ identical — same SDK iteration, same tool_use detection, same
127
+ auth-failure-text bail, same per-turn usage tracking, same
128
+ post-disconnect grace window. Bug fixes had to land in 2-4 places,
129
+ which is exactly how regressions slip in: v0.8.9's tool_use usage fix
130
+ landed in 4 spots; v0.8.10's client-disconnect-preservation only
131
+ applied to streaming; the OpenAI streaming path had no text-delta
132
+ dedup while the Anthropic streaming path did. One runner means one
133
+ place to fix, three endpoints to test.
134
+
135
+ ### Added
136
+
137
+ - **`lib/inference-runner.js`** — single `runInference()` driving the
138
+ SDK loop, plus `openaiSurface` and `anthropicSurface` adapters that
139
+ encapsulate per-surface translation (prompt parsing, image
140
+ collection, tool-bridge shape, error envelopes, stop-reason mapping)
141
+ and per-mode sinks (`createSink({ mode: 'stream'|'json', ... })`).
142
+ The sink contract is uniform (`start / pushTextDelta / pushToolUse /
143
+ finish / error / hasStarted`); each surface decides whether deltas
144
+ emit live or buffer.
145
+ - **`lib/openai-translation.js`** — extracted the OpenAI-shape input
146
+ helpers (`messagesToPrompt`, `collectImages`, `hasTools`,
147
+ `extractContent`, `extractImageBlocks`, `normalizeModelName`) into
148
+ their own module. Mirrors the structure of `lib/anthropic.js`.
149
+
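To make the sink contract concrete, here is a hedged sketch of a `createSink` factory exposing the six methods named above. It is not mobygate's implementation: the event shapes, the `write` callback, and the choice to let `mode` alone decide between live emission and buffering are simplifying assumptions (the entry says each surface makes that decision).

```javascript
// Hypothetical sink: 'stream' mode forwards each event to write(),
// 'json' mode buffers and returns the assembled body from finish().
function createSink({ mode, write = () => {} }) {
  let started = false;
  const textParts = [];
  const toolUses = [];

  const emit = (event) => {
    if (mode === 'stream') write(event);
  };

  return {
    start() {
      started = true;
      emit({ type: 'start' });
    },
    pushTextDelta(delta) {
      textParts.push(delta);
      emit({ type: 'text_delta', delta });
    },
    pushToolUse(toolUse) {
      toolUses.push(toolUse);
      emit({ type: 'tool_use', toolUse });
    },
    finish(meta = {}) {
      emit({ type: 'finish', ...meta });
      return mode === 'json'
        ? { text: textParts.join(''), toolUses, ...meta }
        : null;
    },
    error(err) {
      const payload = { error: String(err && err.message ? err.message : err) };
      emit({ type: 'error', ...payload });
      return payload;
    },
    hasStarted() {
      return started;
    },
  };
}
```

Because both modes share one interface, a runner loop can call `pushTextDelta` and `finish` without knowing whether the client asked for SSE or a JSON body.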
150
+ ### Changed
151
+
152
+ - **`server.js`** shrinks from 2388 to 1197 lines. The four handlers and
153
+ their helpers are gone; the three route handlers now do request
154
+ validation + dashboard event emission + capture, then dispatch to
155
+ `runInference(ctx, surface, { mode, deps })`.
156
+ - **OpenAI streaming** now does text-delta dedup (the `startsWith` check
157
+ the Anthropic stream had). Safe for SDK paths that don't re-send
158
+ accumulated text (no-op); fixes a class of duplication on paths that
159
+ do.
160
+ - **OpenAI streaming** now uses the SDK `result.result` fallback when
161
+ no streaming text was emitted, matching the Anthropic stream's
162
+ `if (message.result && !textEmittedSoFar && !toolUseEmitted)` branch.
163
+ Surfaces a final response in tools-mode-without-tools-call paths
164
+ that previously dropped to empty.
165
+
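The `startsWith` dedup mentioned above can be sketched as follows. This is an illustrative reconstruction under the assumption that some SDK paths re-send the accumulated text instead of a pure delta; `createDeltaDeduper` is a hypothetical name.

```javascript
// Returns a function that maps each incoming text chunk to the part
// that has not been emitted yet.
function createDeltaDeduper() {
  let emitted = '';
  return function nextDelta(text) {
    if (emitted && text.startsWith(emitted)) {
      // The chunk repeats everything sent so far: emit only the new suffix.
      const fresh = text.slice(emitted.length);
      emitted = text;
      return fresh;
    }
    // A true delta: pass it through unchanged (the no-op case).
    emitted += text;
    return text;
  };
}
```

The check is a heuristic: a genuine delta that happens to begin with the full text emitted so far would be trimmed, which is unlikely beyond the first few characters of a response.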
166
+ ### Removed
167
+
168
+ - `handleStreaming`, `handleNonStreaming`, `handleAnthropicNonStreaming`,
169
+ `handleAnthropicStreaming` from `server.js`.
170
+ - `extractContent`, `extractImageBlocks`, `collectImages`, `hasTools`,
171
+ `messagesToPrompt`, `buildQueryPrompt`, `normalizeModelName`,
172
+ `makeChunk`, `sendSSE` from `server.js` (moved to
173
+ `lib/openai-translation.js` and `lib/inference-runner.js`).
174
+
175
+ ## [0.8.11] — 2026-05-02
176
+
177
+ Searchable captures: SQLite index over capture summaries + a `mobygate
178
+ captures` CLI for querying. The on-disk format keeps the human-readable
179
+ `.summary.txt` and full `.json` files unchanged — the index is a
180
+ sidecar that mirrors structured fields so you can find that one
181
+ request by model / session / status / stop_reason / duration / text.
182
+
183
+ ### Added
184
+
185
+ - **`lib/captures-index.js`** — better-sqlite3-backed index at
186
+ `~/.mobygate/captures.sqlite` (WAL mode for concurrent CLI reads).
187
+ Schema includes timing, token usage, cache-hit %, tool counts,
188
+ message previews, and pointers back to the json + summary files.
189
+ Loads better-sqlite3 lazily — if the native build fails, every export
190
+ becomes a no-op and the proxy keeps running.
191
+ - **Live indexing.** `request-capture.js` now fires `indexCapture()` on
192
+ request and `updateCaptureResponse()` on response, both wrapped in
193
+ best-effort try/catch so the index can never block proxying.
194
+ - **`mobygate captures` CLI.** Subcommands:
195
+ - `query [text] [--since 1h] [--model opus] [--session <key>]`
196
+ `[--status ok|client_disconnect|error] [--stop end_turn|tool_use|...]`
197
+ `[--min-duration ms] [--max-duration ms] [--has-tools]`
198
+ `[--limit n] [--json]`
199
+ Positional text runs a `LIKE %text%` match over the first/last user message and
200
+ session_key. Default output is a colored table with TIME / ID /
201
+ STATUS / STOP / DUR / IN / OUT / CACHE / PREVIEW columns.
202
+ - `show <request_id>` prints the human-readable summary file.
203
+ - `stats` rolls up totals: by status, top models, stop reasons,
204
+ cumulative tokens, avg cache hit, avg duration.
205
+ - `rebuild` walks `~/.mobygate/captures/`, parses each `.json` +
206
+ `.summary.txt` pair, and upserts. Idempotent (REPLACE on
207
+ `request_id`). Backfilling 99 historical captures took 77ms.
208
+
209
+ ### Why
210
+
211
+ The captures dir hits hundreds of files within a day of regular use.
212
+ Finding *the* request that disconnected mid-stream, or all the slow
213
+ opus calls, or every tool_use stop in the last hour — those questions
214
+ turn into multi-grep workflows on the filesystem. SQL answers them in
215
+ one command. Examples:
216
+
217
+ ```
218
+ mobygate captures query --status client_disconnect --since 1d
219
+ mobygate captures query --has-tools --min-duration 5000
220
+ mobygate captures query "webflow" --model opus --limit 5
221
+ mobygate captures stats
222
+ ```
223
+
224
+ ### Notes
225
+
226
+ - New dep: `better-sqlite3` (native, ~5MB binary). Compiles at install
227
+ time via node-gyp. If the build fails on an exotic platform, capture
228
+ files keep working — only the index disappears.
229
+ - The index is a *cache*, not the source of truth. The `.json` and
230
+ `.summary.txt` files remain authoritative. Delete the .sqlite at any
231
+ time and `mobygate captures rebuild` reconstructs it.
232
+ - Schema uses `INSERT OR REPLACE` keyed on `request_id`, so re-running
233
+ rebuild is safe and incremental adds work without conflicts.
234
+ - Backfill from filenames assumes the standard format
235
+ `YYYY-MM-DD_HH-MM-SS_<route>_<requestId>.json` written by
236
+ `captureRequest`. Older or hand-edited filenames fall back to the
237
+ file's mtime.
238
+
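A hedged sketch of the filename backfill with the mtime fallback described in the last note. `parseCaptureFilename` is a hypothetical helper. It assumes the request ID contains no underscores and treats the timestamp as UTC; the changelog specifies neither.

```javascript
// Assumed pattern: YYYY-MM-DD_HH-MM-SS_<route>_<requestId>.json
const CAPTURE_NAME =
  /^(\d{4})-(\d{2})-(\d{2})_(\d{2})-(\d{2})-(\d{2})_(.+)_([^_]+)\.json$/;

function parseCaptureFilename(filename, mtimeMs) {
  const m = CAPTURE_NAME.exec(filename);
  if (!m) {
    // Older or hand-edited names: fall back to the file's mtime.
    return { timestampMs: mtimeMs, route: null, requestId: null, fromMtime: true };
  }
  const [, y, mo, d, h, mi, s, route, requestId] = m;
  // Interpreting the stamp as UTC is an assumption.
  const timestampMs = Date.UTC(+y, +mo - 1, +d, +h, +mi, +s);
  return { timestampMs, route, requestId, fromMtime: false };
}
```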
239
+ ## [0.8.10] — 2026-05-02
240
+
241
+ Don't abort upstream SDK on client disconnect — preserve the partial
242
+ generation so it lands in the capture file even when the client drops.
243
+
244
+ ### Why
245
+
246
+ Previously, both streaming handlers (`/v1/chat/completions` and
247
+ `/v1/messages`) called `abortController.abort()` the moment
248
+ `res.on('close')` fired. That killed the SDK call mid-stream, even
249
+ though the tokens were already in flight and being billed. Result: a
250
+ dropped client wasted billing AND lost the capture — the summary file
251
+ showed `usage: 0` and a truncated body, defeating the purpose of
252
+ captures for post-mortem on long generations.
253
+
254
+ ### Changed
255
+
256
+ - **Streaming handlers no longer abort SDK on client disconnect.** The
257
+ for-await loop keeps consuming SDK messages, so the final `result`
258
+ message lands and `inputTokens/outputTokens/cache*` populate
259
+ correctly. `tx.pushTextDelta` / `tx.pushToolUse` writes silently
260
+ no-op once the response is closed (translator already guards every
261
+ write with `res.writableEnded`).
262
+ - **Removed `if (clientDisconnected) break;`** from the loop in both
263
+ handlers — the break short-circuited the natural completion path.
264
+ - **60s post-disconnect safety cap.** A `setTimeout(..., 60_000)`
265
+ schedules an abort if the SDK hasn't finished within a minute of the
266
+ client dropping. Prevents a flapping client from burning unbounded
267
+ tokens. Timer is `.unref()`'d so it doesn't keep the process alive.
268
+
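The post-disconnect safety cap can be sketched like this. `armPostDisconnectCap` is a hypothetical name; the 60-second default, the abort, and the `.unref()` call follow the description above, while the returned cancel function is an added convenience.

```javascript
// Arms a timer that aborts the SDK call if it is still running
// capMs after the client dropped. Returns a function that cancels the cap.
function armPostDisconnectCap(abortController, capMs = 60_000) {
  const timer = setTimeout(() => {
    if (!abortController.signal.aborted) abortController.abort();
  }, capMs);
  // Node timers expose unref(); guard the call in case another runtime does not.
  if (typeof timer.unref === 'function') timer.unref();
  return () => clearTimeout(timer);
}
```

In a handler this would be armed from the response's `close` event and cancelled once the SDK loop finishes on its own, so a normal completion never triggers the abort.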
269
+ ### Notes
270
+
271
+ - Capture status field still records `client_disconnect` vs `ok` — the
272
+ status reflects whether the user actually got the response, not
273
+ whether the SDK completed.
274
+ - The tools-mode `tool_use → abort` path is unchanged; we still need to
275
+ abort there because the SDK would otherwise hang waiting for a tool
276
+ result that's coming in via the client (which may have just dropped).
277
+ - Non-streaming handlers were already correct (no `res.on('close')`
278
+ abort), so no changes there.
279
+ - Verification: trigger a streaming request, kill the client mid-stream
280
+ (`pkill` or close terminal), then check the latest capture summary —
281
+ should show full `usage:` block with non-zero tokens and a complete
282
+ `duration:`.
283
+
284
+ ## [0.8.9] — 2026-05-02
285
+
286
+ Capture summary completeness: streaming `/v1/messages` responses were
287
+ logging `duration: (unknown)` and `usage: 0` for every request that
288
+ ended in `tool_use`. Root cause: the four inference handlers only
289
+ populated their token trackers inside the SDK `result` message branch,
290
+ but the moment the model emits `tool_use`, mobygate aborts the SDK to
291
+ hand off to the client — so `result` never arrives, and the trackers
292
+ stay at zero. The fix mirrors usage from each assistant turn's inner
293
+ Anthropic Message (`message.message.usage`) so the just-completed turn
294
+ is captured even on the abort path.
295
+
296
+ This is the first agent-driven fix surfaced from the #mobygate channel:
297
+ signal in (capture summary blind on the most common request shape),
298
+ signal out (next capture summary will be honest).
299
+
300
+ ### Changed
301
+
302
+ - **`extractSdkUsage` falls back to `message.message?.usage`.** SDK
303
+ `result` messages still take precedence; the new fallback only kicks
304
+ in for assistant turns, where usage lives one level deeper. Last-
305
+ resort flat-field read on the message itself is preserved.
306
+ - **All four inference handlers** (`handleStreaming`,
307
+ `handleNonStreaming`, `handleAnthropicNonStreaming`,
308
+ `handleAnthropicStreaming`) now mirror per-turn usage into their
309
+ local trackers as assistant messages stream in. On multi-turn loops,
310
+ the `result` message still overwrites with the SDK's aggregated
311
+ total (correct). On `tool_use` aborts, the latest assistant turn's
312
+ usage is preserved (correct, instead of zero).
313
+ - **`captureResponse` now receives `durationMs`** from every handler.
314
+ Each handler grabs `Date.now()` at entry and passes
315
+ `durationMs: Date.now() - startedAt` to the summary writer. Capture
316
+ files no longer say `duration: (unknown)`.
317
+ - **`handleAnthropicStreaming` SSE `message_delta`** now carries the
318
+ full usage shape (input + output + cache_read + cache_create), not
319
+ just `output_tokens`. Clients reading `message_delta.usage` for
320
+ their own billing now get accurate numbers on streaming responses.
321
+ - **Streaming Anthropic captureResponse status** distinguishes
322
+ `client_disconnect` from `ok` (was always reporting `ok`). Matches
323
+ the chat-completions handler's existing behavior.
324
+
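A minimal sketch of the fallback order described for `extractSdkUsage`. The returned field names are assumptions modeled on the tracker names mentioned in this changelog; the snake_case input fields are the usage keys listed under the verification path below.

```javascript
// Lookup order: usage on the message (SDK result), then the inner
// Anthropic Message (assistant turn), then flat fields as a last resort.
function extractSdkUsage(message) {
  if (!message || typeof message !== 'object') return null;
  const raw = message.usage ?? message.message?.usage ?? message;
  return {
    inputTokens: raw.input_tokens ?? 0,
    outputTokens: raw.output_tokens ?? 0,
    cacheReadTokens: raw.cache_read_input_tokens ?? 0,
    cacheCreationTokens: raw.cache_creation_input_tokens ?? 0,
  };
}
```

On a `tool_use` abort no `result` message arrives, so the second branch is the one that keeps the tracker from staying at zero.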
325
+ ### Verification path
326
+
327
+ After restart, any new capture summary in `~/.mobygate/captures/` for
328
+ a `tool_use`-ending request should show non-zero `input_tokens`,
329
+ `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`
330
+ and a real `duration: <N> ms`. Pre-fix summaries on the same shape were
331
+ uniformly zeroed.
332
+
333
+ ## [0.8.8] — 2026-05-01
334
+
335
+ Quiet mode: scrub tool descriptions too. Live test of v0.8.7 caught
336
+ that "hermes" was still leaking through 6 times in a Hermes-driven
337
+ request — all from `body.tools[].description` and nested
338
+ `input_schema.properties.<name>.description` fields. The earlier
339
+ `scrubAnthropicBody` only walked `system`, `messages`, and `metadata`,
340
+ so tool definitions slipped past.
341
+
342
+ ### Changed
343
+
344
+ - **`scrubAnthropicBody` now scrubs tool definitions:** `tools[].description`
345
+ is replaced via `scrubString`, and `tools[].input_schema` is recursed
346
+ through with a new `scrubSchemaDescriptions` helper that touches only
347
+ `description` fields (never `type`, `enum`, property keys, etc., so
348
+ schemas stay structurally valid).
349
+ - **`quietDiagnose` walks tools too**, so the per-request log line
350
+ reflects actual coverage.
351
+
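The description-only recursion can be illustrated with the sketch below. The word map is a two-entry sample taken from the 0.8.6 defaults, and the case-insensitive matching is an assumption; only the rule "touch `description` strings, nothing else" comes from the entry above.

```javascript
// Sample word map; the real list is configurable.
const WORDS = new Map([
  ['hermes', 'assistant'],
  ['openclaw', 'orchestrator'],
]);

// Replaces every occurrence of each term (case-insensitive by assumption).
function scrubString(text, words = WORDS) {
  let out = text;
  for (const [term, replacement] of words) {
    const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    out = out.replace(new RegExp(escaped, 'gi'), () => replacement);
  }
  return out;
}

// Walks a JSON schema in place and scrubs only string-valued
// `description` fields. Keys, `type`, `enum`, etc. are left untouched.
function scrubSchemaDescriptions(node, words = WORDS) {
  if (Array.isArray(node)) {
    node.forEach((child) => scrubSchemaDescriptions(child, words));
  } else if (node && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      if (key === 'description' && typeof value === 'string') {
        node[key] = scrubString(value, words);
      } else {
        scrubSchemaDescriptions(value, words);
      }
    }
  }
  return node;
}
```

Checking that the value is a string matters: a property that is itself named `description` holds a schema object, which is recursed into rather than overwritten.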
352
+ ### Notes
353
+
354
+ - **Tool *names* still pass through unchanged.** Renaming names
355
+ requires bidirectional mapping (model emits scrubbed name in
356
+ `tool_use`, mobygate has to translate back to original before
357
+ forwarding to the client). That's a future release. If your tool
358
+ names contain brand strings, rename them client-side for now.
359
+ - **Verification path:** in the captured `.json`, grep for the brand
360
+ word (should be 0) and the replacement (should match the original
361
+ count). Live test against a Hermes session with 28 declared tools
362
+ brought "hermes" residue from 6 to 0 (assuming no tool *names*
363
+ contain it).
364
+
365
+ ## [0.8.7] — 2026-05-01
366
+
367
+ Quiet mode correction. v0.8.6 implemented `/quiet/v1/messages` as
368
+ "scrub brand names AND drop the claude_code preset," on the assumption
369
+ that the preset was just identity framing. Live testing showed it's
370
+ also Anthropic's *"this is an approved Claude Code client, bill flat
371
+ Max"* signal — dropping the preset flipped requests into extra-usage
372
+ billing and produced a confusing 400 ("You're out of extra usage"
373
+ even with 89% of cap unused).
374
+
375
+ This release fixes the architecture: quiet mode now layers scrubbing
376
+ **on top of** the preset, not instead of it. Same billing path as
377
+ `/v1/messages`, same Claude-Code wrapping, plus the brand-name scrub.
378
+
379
+ ### Changed
380
+
381
+ - **`/quiet/v1/messages` keeps the `claude_code` preset.** The route
382
+ still applies `scrubAnthropicBody()` to the inbound payload
383
+ (`system`, `messages[].content`, `tool_result` content,
384
+ `metadata.user_id`), but the SDK call now uses the same
385
+ `{ type: 'preset', preset: 'claude_code', append: toolsGuidance }`
386
+ systemPrompt as the regular `/v1/messages` route. Net effect: a
387
+ Claude-Code-shaped request that has had brand-name strings
388
+ substituted, which was the original intent.
389
+ - **Removed the `mode` parameter** from `handleAnthropicNonStreaming`
390
+ and `handleAnthropicStreaming`. Both handlers are now mode-agnostic;
391
+ per-route differences (scrubbing) happen at the route level before
392
+ the handler is invoked.
393
+
394
+ ### Fixed
395
+
396
+ - **"Out of extra usage" 400 on `/quiet/v1/messages`.** Caused by the
397
+ v0.8.6 systemPrompt-skipping logic; reverted.
398
+
399
+ ### Notes
400
+
401
+ - The scrubbing itself was always working in v0.8.6 (verified
402
+ standalone). The bug was in the parallel preset-removal change.
403
+ - If you tested v0.8.6 and saw the 400, your account is *not* flagged —
404
+ the regular `/v1/messages` route works for you, and after this fix,
405
+ `/quiet/v1/messages` will too.
406
+
407
+ ## [0.8.6] — 2026-05-01
408
+
409
+ Quiet mode — a new request lane that strips third-party agent harness
410
+ identifiers from the request body before the Claude Agent SDK forwards
411
+ to Anthropic.
412
+
413
+ Motivation: community reports (on X) describe Anthropic's API surface
414
+ appearing to scan request bodies for known third-party harness names
415
+ (e.g. "openclaw" appearing in a `package.json` that the agent reads
416
+ into context). The match seems to flip the account into per-token
417
+ extra-usage billing even when the harness isn't actually invoked. Quiet
418
+ mode is a defensive content-rewrite pass that runs in mobygate before
419
+ the SDK call: substitute `openclaw → orchestrator`, `hermes → assistant`,
420
+ `mobius → bot`, etc., so the outbound payload reads as a vanilla
421
+ Anthropic-shape call.
422
+
423
+ This is best-effort. Tool results / file contents that the agent reads
424
+ mid-conversation still flow back through the same scrub layer, but
425
+ aggressive scrubbing of arbitrary content can distort the model's
426
+ understanding (e.g. "install openclaw" becoming "install orchestrator"
427
+ breaks shell instructions). Quiet mode is **opt-in via a separate route**
428
+ (`/quiet/v1/messages`) so the default `/v1/messages` path stays
429
+ untouched and faithful.
430
+
431
+ ### Added
432
+
433
+ - **`POST /quiet/v1/messages`** — Anthropic-shape route that scrubs
434
+ third-party harness names from the request body and switches the SDK
435
+ from the `claude_code` system-prompt preset to a raw string
436
+ systemPrompt (so the response classifier sees neither identifying
437
+ brand strings nor the Claude-Code preset wrapping). Same SDK path,
438
+ same auth, same Max-OAuth flat billing — just a cleaner outbound
439
+ payload. Captures still write to `~/.mobygate/captures/` (tagged
440
+ `path: /quiet/v1/messages`) so you can inspect what got scrubbed.
441
+ - **`lib/quiet.js`** — scrub primitives: `scrubAnthropicBody(body)`
442
+ (mutates in place across `system`, `messages[].content` strings/blocks,
443
+ `tool_result` content, and `metadata.user_id`), plus `quietDiagnose(body)`
444
+ (counts matches without mutating, useful for logging and capture
445
+ summaries). Word list is configurable via
446
+ `~/.mobygate/quiet-words.txt` (one `term=replacement` per line); falls
447
+ back to a sensible default map covering openclaw, hermes, mobius,
448
+ nous, mobygate, claude-max-proxy and a few variants.
449
+ - **Per-request scrub diagnostic** — every `/quiet/v1/messages` call
450
+ logs `[quiet] scrubbed N occurrence(s): word×n word×n …` so you can
451
+ see at a glance whether the route is actually doing work or whether
452
+ the payload was already clean.
453
+
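A hedged sketch of the word-file format and the per-request diagnostic line. The helper names are hypothetical, and the handling of blank or malformed lines and the case-insensitive counting are assumptions; the `term=replacement` format and the log-line shape come from the bullets above.

```javascript
// Parses "term=replacement" lines; blank lines and lines without "=" are skipped.
function parseQuietWords(fileText) {
  const words = new Map();
  for (const line of fileText.split(/\r?\n/)) {
    const trimmed = line.trim();
    const eq = trimmed.indexOf('=');
    if (eq <= 0) continue;
    words.set(trimmed.slice(0, eq).trim().toLowerCase(), trimmed.slice(eq + 1).trim());
  }
  return words;
}

// Counts non-overlapping, case-insensitive occurrences of each term.
function countMatches(text, words) {
  const counts = new Map();
  const haystack = text.toLowerCase();
  for (const term of words.keys()) {
    let n = 0;
    let i = haystack.indexOf(term);
    while (i !== -1) {
      n += 1;
      i = haystack.indexOf(term, i + term.length);
    }
    if (n > 0) counts.set(term, n);
  }
  return counts;
}

// Formats the diagnostic in the shape quoted in the changelog.
function formatDiagnostic(counts) {
  const total = [...counts.values()].reduce((a, b) => a + b, 0);
  const detail = [...counts].map(([word, n]) => `${word}×${n}`).join(' ');
  return `[quiet] scrubbed ${total} occurrence(s): ${detail}`;
}
```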
454
+ ### Changed
455
+
456
+ - **`handleAnthropicNonStreaming` / `handleAnthropicStreaming` accept
457
+ a `mode` param** — defaults to `'normal'` (current behavior, claude_code
458
+ preset). When called with `'quiet'`, both handlers compute
459
+ `sdkSystemPrompt` as a raw string (combining the inbound system block
460
+ and tools-usage guidance) instead of the preset object. The OpenAI
461
+ surface (`/v1/chat/completions`) keeps the preset unconditionally.
462
+
463
+ ### Notes
464
+
465
+ - **Tool definitions are not (yet) renamed.** If a client registers a
466
+ tool literally named `openclaw_send_message`, the name still goes
467
+ through unchanged. We may add a tool-rename map (with reverse-mapping
468
+ for tool_use responses) in a later release. For now, rename your
469
+ tools client-side if you want full quiet coverage.
470
+ - **No response-side scrubbing.** The model rarely emits the brand
471
+ names unprompted (and if the input was scrubbed it doesn't know them
472
+ anyway), so we save the round-trip cost.
473
+ - **Scrub is best-effort.** The exhaustive list of vectors Anthropic
474
+ may match against is unknown; the default word list covers the names
475
+ reported on X. Add your own via the words file.
476
+
477
+ ## [0.8.5] — 2026-04-30
478
+
479
+ Cost visibility + session TTL bump. Born from a per-channel cost audit
480
+ that found 38.9% of total spend going to "singleton" sessions —
481
+ channels that fired once, idled past the wire-cache TTL, and paid
482
+ full `cache_creation` tax on the next turn. The original v0.8.5 plan
483
+ included passing through Anthropic's 1-hour cache TTL beta header,
484
+ but the Claude Agent SDK doesn't expose that beta or `cache_control`
485
+ markers — so this release does what's actually achievable from the
486
+ mobygate side: visibility + a TTL bump + an upstream feature request.
487
+
488
+ ### Added
489
+
490
+ - **`/dashboard/session-costs` endpoint** — aggregates the
491
+ `[model-billed]` log lines per session_key. Returns cost/turn,
492
+ bucket (singleton/short/medium/warm), per-model breakdown, and the
493
+ first user message for human-readable identification.
494
+
495
+ - **Sessions tab in the inspector** — new view at `/inspector` showing
496
+ cost-per-session sorted descending. Surfaces the singleton-bleed
497
+ pattern in real-time. Includes a bucket overview (% of total cost in
498
+ warm vs singleton sessions) so the cache-amortization story jumps
499
+ out at a glance.
500
+
501
+ This view is built directly on top of v0.8.3's `[model-billed]`
502
+ log line — the diagnostic logging continues to pay dividends.
503
+
504
+ ### Changed
505
+
506
+ - **Default `SESSION_TTL_MS` raised from 1h to 4h.** Multi-channel
507
+ Discord users (mobygate's primary use case) typically revisit
508
+ channels every few hours, and a 1h TTL forced mobygate to issue a
509
+ fresh `query()` (full prompt re-send) on every wake-up. With 4h,
510
+ mobygate's session-store retains the SDK session ID for four hours,
511
+ so the next request can resume rather than reissuing the entire
512
+ prompt prefix.
513
+
514
+ **Caveat:** this only addresses SDK session continuity inside
515
+ mobygate. Anthropic's wire-side prompt cache is still 5 min by
516
+ default — the SDK doesn't currently expose the
517
+ `extended-cache-ttl-2025-04-11` beta header to callers, so the
518
+ cache_creation tax on truly cold channels (no traffic in 5+ min)
519
+ is unchanged.
520
+
521
+ - **New env var `MOBY_SESSION_TTL_HOURS`** as a more readable override:
522
+ `MOBY_SESSION_TTL_HOURS=8` is equivalent to setting `SESSION_TTL_MS`
523
+ to 28800000. Either works; legacy var still respected.
524
+
525
+ - **Startup banner** now shows TTL in both minutes and hours:
526
+ `session TTL 240 min (4.0h)`.
527
+
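A small sketch of how the two TTL variables and the banner could fit together. The precedence (hours variable first, then `SESSION_TTL_MS`, then the 4h default) is an assumption; the changelog only says both variables are respected. Function names are hypothetical.

```javascript
const DEFAULT_TTL_MS = 4 * 60 * 60 * 1000; // 4h default since 0.8.5

// Resolves the TTL from an env-like object; invalid or missing values are ignored.
function resolveSessionTtlMs(env) {
  const hours = Number(env.MOBY_SESSION_TTL_HOURS);
  if (Number.isFinite(hours) && hours > 0) return hours * 3_600_000;
  const ms = Number(env.SESSION_TTL_MS);
  if (Number.isFinite(ms) && ms > 0) return ms;
  return DEFAULT_TTL_MS;
}

// Formats the TTL the way the startup banner is quoted above.
function ttlBanner(ttlMs) {
  const minutes = Math.round(ttlMs / 60_000);
  const hours = (ttlMs / 3_600_000).toFixed(1);
  return `session TTL ${minutes} min (${hours}h)`;
}
```

Passing the environment in as an argument (for example `resolveSessionTtlMs(process.env)`) keeps the function easy to test.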
528
+ ### Notes
529
+
530
+ The per-channel audit findings (43-channel installation, ~$48 over
531
+ a recent capture window):
532
+
533
+ ```
534
+ 72 singleton sessions → $18.61 (38.9% of total cost)
535
+ 20 short (2-3 turns) → $ 5.25 (11.0%)
536
+ 7 medium (4-10 turns) → $ 4.21 ( 8.8%)
537
+ 6 warm (11+ turns) → $19.73 (41.3%) ← 6 sessions = 41% of value
538
+ ```
539
+
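The bucket boundaries and percentages in the table above can be reproduced with a few lines. `bucketFor` and `costShares` are hypothetical helpers; the turn ranges and dollar figures are taken from the table.

```javascript
// Turn-count buckets as labeled in the audit table.
function bucketFor(turns) {
  if (turns <= 1) return 'singleton';
  if (turns <= 3) return 'short';
  if (turns <= 10) return 'medium';
  return 'warm';
}

// Share of total cost per bucket, as a percentage string with one decimal.
function costShares(costByBucket) {
  const total = Object.values(costByBucket).reduce((a, b) => a + b, 0);
  return Object.fromEntries(
    Object.entries(costByBucket).map(([bucket, cost]) => [
      bucket,
      ((cost / total) * 100).toFixed(1),
    ])
  );
}

const shares = costShares({ singleton: 18.61, short: 5.25, medium: 4.21, warm: 19.73 });
```

The four bucket costs sum to $47.80, which matches the "~$48" figure quoted for the capture window.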
540
+ Most users don't need to act on this — the fix is upstream in the
541
+ Claude Agent SDK exposing `extended-cache-ttl-2025-04-11`. Until then
542
+ the new sessions view at least makes the bleed visible so users can
543
+ decide which channels to keep warm via an OpenClaw cron ping.
544
+
545
+ A feature request was filed with the SDK team:
546
+ > "Expose `extended-cache-ttl-2025-04-11` beta header in `Options.betas[]`
547
+ > and a `cacheTtl: '1h' | '5m'` option on `query()` so multi-tenant
548
+ > / multi-channel mobygate-style proxies can extend wire-cache lifetime
549
+ > for sporadically-active sessions."
550
+
7
551
  ## [0.8.4] — 2026-04-28
8
552
 
9
553
  Sonnet 1M context fix — gated billing tier mismatch. v0.8.3's "match