mobygate 0.8.4 → 0.9.2

package/CHANGELOG.md CHANGED
@@ -4,6 +4,478 @@ All notable changes to mobygate are documented here. Format loosely follows
4
4
  [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); version numbers are
5
5
  [Semantic Versioning](https://semver.org/).
6
6
 
7
+ ## [0.9.2] — 2026-05-03
8
+
9
+ Return `405 Method Not Allowed` (with `Allow: POST` header) for `GET`
10
+ on `/v1/chat/completions`, `/v1/messages`, and `/quiet/v1/messages`
11
+ instead of the default `404`. RFC 9110-correct, and unblocks
12
+ endpoint-detection probes (e.g. Hermes onboarding) that treat 404 as
13
+ "missing" but 405 as "exists, wrong verb."
14
+
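The behavior described in this entry could be sketched as a small guard in front of the router. This is a minimal illustration, not mobygate's actual code: the function name `rejectGetOnPostRoute` and the JSON error body are assumptions; only the three routes, the `405` status, and the `Allow: POST` header come from the entry above.

```javascript
// Hypothetical guard. Route list taken from the changelog entry; everything
// else (name, error body shape) is an assumption for illustration.
const POST_ONLY_ROUTES = new Set([
  '/v1/chat/completions',
  '/v1/messages',
  '/quiet/v1/messages',
]);

// Returns true if it answered the request with a 405, false otherwise.
function rejectGetOnPostRoute(req, res) {
  const path = req.url.split('?')[0]; // ignore any query string
  if (req.method !== 'GET' || !POST_ONLY_ROUTES.has(path)) return false;
  res.writeHead(405, { Allow: 'POST', 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ error: 'Method Not Allowed' }));
  return true;
}
```

A probe that sends `GET /v1/messages` would then see a 405 with `Allow: POST` rather than a 404, which is the signal the entry says detection tools rely on.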
15
+ ## [0.9.1] — 2026-05-03
16
+
17
+ Smoke-test suite + `X-Request-Id` on JSON responses.
18
+
19
+ ### Added
20
+
21
+ - **`test/smoke.test.mjs`** — 9 integration tests that spin up a real
22
+ mobygate on a side port and hit each surface×mode combination on the
23
+ three endpoints. Asserts wire shape (Anthropic `message_start` / SSE
24
+ events; OpenAI `chat.completion` / chunks), tools mode (native
25
+ `tool_use` / `tool_calls`), `/quiet/v1/messages` scrubbing, and the
26
+ capture summary + SQLite index row landing with the right status,
27
+ stop reason, and usage. Run with `npm run test:smoke` (~65s, ~$0.005
28
+ per run on haiku).
29
+
30
+ ### Changed
31
+
32
+ - **JSON responses now set `X-Request-Id`.** Previously only the SSE
33
+ paths (streaming) returned it. Adding it to non-streaming closes a
34
+ debug gap — you can now correlate any response back to its capture
35
+ file by header without re-parsing the JSON body.
36
+
37
+ ## [0.9.0] — 2026-05-03
38
+
39
+ Four inference handlers consolidated into one runner. The OpenAI and
40
+ Anthropic surfaces (`/v1/chat/completions`, `/v1/messages`,
41
+ `/quiet/v1/messages`) now share a single inference loop driven by
42
+ surface adapters and a `mode = 'stream' | 'json'` flag. Same wire
43
+ behavior on every endpoint; ~1200 lines deleted from `server.js`.
44
+
45
+ This release bundles the v0.8.6–v0.8.11 work that was sitting
46
+ uncommitted on top of v0.8.5 (quiet mode, capture summary completeness,
47
+ client-disconnect preservation, SQLite captures index + `mobygate
48
+ captures` CLI). See the per-version sections below for those.
49
+
50
+ ### Why
51
+
52
+ The four handlers (handleStreaming, handleNonStreaming,
53
+ handleAnthropicNonStreaming, handleAnthropicStreaming) were ~80%
54
+ identical — same SDK iteration, same tool_use detection, same
55
+ auth-failure-text bail, same per-turn usage tracking, same
56
+ post-disconnect grace window. Bug fixes had to land in 2-4 places,
57
+ which is exactly how regressions slip in: v0.8.9's tool_use usage fix
58
+ landed in 4 spots; v0.8.10's client-disconnect-preservation only
59
+ applied to streaming; the OpenAI streaming path had no text-delta
60
+ dedup while the Anthropic streaming path did. One runner means one
61
+ place to fix, three endpoints to test.
62
+
63
+ ### Added
64
+
65
+ - **`lib/inference-runner.js`** — single `runInference()` driving the
66
+ SDK loop, plus `openaiSurface` and `anthropicSurface` adapters that
67
+ encapsulate per-surface translation (prompt parsing, image
68
+ collection, tool-bridge shape, error envelopes, stop-reason mapping)
69
+ and per-mode sinks (`createSink({ mode: 'stream'|'json', ... })`).
70
+ The sink contract is uniform (`start / pushTextDelta / pushToolUse /
71
+ finish / error / hasStarted`); each surface decides whether deltas
72
+ emit live or buffer.
73
+ - **`lib/openai-translation.js`** — extracted the OpenAI-shape input
74
+ helpers (`messagesToPrompt`, `collectImages`, `hasTools`,
75
+ `extractContent`, `extractImageBlocks`, `normalizeModelName`) into
76
+ their own module. Mirrors the structure of `lib/anthropic.js`.
77
+
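To make the uniform sink contract concrete, here is a rough sketch of a sink factory that either emits deltas live or buffers them. The method names follow the contract listed above; the bodies, the `write` callback, and the event shapes are assumptions and do not reflect the real `lib/inference-runner.js`.

```javascript
// Illustrative only: method names mirror the contract in the changelog;
// the `write` callback and the event objects are hypothetical.
function createSink({ mode, write }) {
  let started = false;
  const text = [];
  const toolUses = [];
  return {
    start() { started = true; },
    hasStarted() { return started; },
    pushTextDelta(delta) {
      text.push(delta);
      // stream mode emits live; json mode only buffers
      if (mode === 'stream') write({ type: 'text_delta', text: delta });
    },
    pushToolUse(toolUse) {
      toolUses.push(toolUse);
      if (mode === 'stream') write({ type: 'tool_use', toolUse });
    },
    finish(stopReason) {
      // json mode emits one buffered payload; stream mode only closes out
      write(mode === 'json'
        ? { type: 'response', text: text.join(''), toolUses, stopReason }
        : { type: 'done', stopReason });
    },
    error(err) {
      write({ type: 'error', message: String((err && err.message) || err) });
    },
  };
}
```

With this shape the runner can call the same six methods regardless of mode, and the decision to stream or buffer stays inside the sink.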
78
+ ### Changed
79
+
80
+ - **`server.js`** shrinks from 2388 → 1197 lines. The four handlers and
81
+ their helpers are gone; the three route handlers now do request
82
+ validation + dashboard event emission + capture, then dispatch to
83
+ `runInference(ctx, surface, { mode, deps })`.
84
+ - **OpenAI streaming** now does text-delta dedup (the `startsWith` check
85
+ the Anthropic stream had). Safe for SDK paths that don't re-send
86
+ accumulated text (no-op); fixes a class of duplication on paths that
87
+ do.
88
+ - **OpenAI streaming** now uses the SDK `result.result` fallback when
89
+ no streaming text was emitted, matching the Anthropic stream's
90
+ `if (message.result && !textEmittedSoFar && !toolUseEmitted)` branch.
91
+ Surfaces a final response in tools-mode-without-tools-call paths
92
+ that previously dropped to empty.
93
+
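The `startsWith` dedup mentioned above can be illustrated with a small helper. This is a sketch under the assumption that a re-sending path delivers the full accumulated text each time; the name `newTextSince` is hypothetical.

```javascript
// Returns only the part of `incoming` that has not been emitted yet.
// Hypothetical helper; the real check lives inside the runner.
function newTextSince(emitted, incoming) {
  // Accumulated re-send: incoming repeats everything emitted so far,
  // so strip the already-emitted prefix.
  if (emitted.length > 0 && incoming.startsWith(emitted)) {
    return incoming.slice(emitted.length);
  }
  // Plain delta: pass through unchanged (the no-op case).
  return incoming;
}
```

One caveat of any prefix check like this: a genuine delta that happens to begin with the exact text emitted so far would be trimmed as well, which mostly matters for very short outputs.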
94
+ ### Removed
95
+
96
+ - `handleStreaming`, `handleNonStreaming`, `handleAnthropicNonStreaming`,
97
+ `handleAnthropicStreaming` from `server.js`.
98
+ - `extractContent`, `extractImageBlocks`, `collectImages`, `hasTools`,
99
+ `messagesToPrompt`, `buildQueryPrompt`, `normalizeModelName`,
100
+ `makeChunk`, `sendSSE` from `server.js` (moved to
101
+ `lib/openai-translation.js` and `lib/inference-runner.js`).
102
+
103
+ ## [0.8.11] — 2026-05-02
104
+
105
+ Searchable captures: SQLite index over capture summaries + a `mobygate
106
+ captures` CLI for querying. The on-disk format keeps the human-readable
107
+ `.summary.txt` and full `.json` files unchanged — the index is a
108
+ sidecar that mirrors structured fields so you can find that one
109
+ request by model / session / status / stop_reason / duration / text.
110
+
111
+ ### Added
112
+
113
+ - **`lib/captures-index.js`** — better-sqlite3-backed index at
114
+ `~/.mobygate/captures.sqlite` (WAL mode for concurrent CLI reads).
115
+ Schema includes timing, token usage, cache-hit %, tool counts,
116
+ message previews, and pointers back to the json + summary files.
117
+ Loads better-sqlite3 lazily — if the native build fails, every export
118
+ becomes a no-op and the proxy keeps running.
119
+ - **Live indexing.** `request-capture.js` now fires `indexCapture()` on
120
+ request and `updateCaptureResponse()` on response, both wrapped in
121
+ best-effort try/catch so the index can never block proxying.
122
+ - **`mobygate captures` CLI.** Subcommands:
123
+ - `query [text] [--since 1h] [--model opus] [--session <key>]`
124
+ `[--status ok|client_disconnect|error] [--stop end_turn|tool_use|...]`
125
+ `[--min-duration ms] [--max-duration ms] [--has-tools]`
126
+ `[--limit n] [--json]`
127
+ Positional text does a LIKE %text% over first/last user message and
128
+ session_key. Default output is a colored table with TIME / ID /
129
+ STATUS / STOP / DUR / IN / OUT / CACHE / PREVIEW columns.
130
+ - `show <request_id>` prints the human-readable summary file.
131
+ - `stats` rolls up totals: by status, top models, stop reasons,
132
+ cumulative tokens, avg cache hit, avg duration.
133
+ - `rebuild` walks `~/.mobygate/captures/`, parses each `.json` +
134
+ `.summary.txt` pair, and upserts. Idempotent (REPLACE on
135
+ `request_id`). Took 77ms to backfill 99 historical captures.
136
+
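The "lazy load, degrade to no-op" behavior described for the index could look roughly like the sketch below. It is not the real `lib/captures-index.js`: the `loadDriver` callback and the driver's `upsert` method are hypothetical stand-ins for whatever wraps better-sqlite3.

```javascript
// Hypothetical wrapper: an optional dependency that never throws at callers.
function makeOptionalIndex(loadDriver) {
  let driver; // undefined = not tried yet, null = load failed
  function get() {
    if (driver === undefined) {
      try { driver = loadDriver(); } catch { driver = null; }
    }
    return driver;
  }
  return {
    available: () => get() !== null,
    // Best-effort write: returns false instead of throwing, so indexing
    // can never block the request path.
    indexCapture(row) {
      const d = get();
      if (!d) return false;
      try { d.upsert(row); return true; } catch { return false; }
    },
  };
}
```

A caller would pass a loader that performs the actual `require` of the native module, so a failed native build surfaces once as `available() === false` and every later call is a cheap no-op.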
137
+ ### Why
138
+
139
+ The captures dir hits hundreds of files within a day of regular use.
140
+ Finding *the* request that disconnected mid-stream, or all the slow
141
+ opus calls, or every tool_use stop in the last hour — those questions
142
+ turn into multi-grep workflows on the filesystem. SQL answers them in
143
+ one command. Examples:
144
+
145
+ ```
146
+ mobygate captures query --status client_disconnect --since 1d
147
+ mobygate captures query --has-tools --min-duration 5000
148
+ mobygate captures query "webflow" --model opus --limit 5
149
+ mobygate captures stats
150
+ ```
151
+
152
+ ### Notes
153
+
154
+ - New dep: `better-sqlite3` (native, ~5MB binary). Compiles at install
155
+ time via node-gyp. If the build fails on an exotic platform, capture
156
+ files keep working — only the index disappears.
157
+ - The index is a *cache*, not the source of truth. The `.json` and
158
+ `.summary.txt` files remain authoritative. Delete the .sqlite at any
159
+ time and `mobygate captures rebuild` reconstructs it.
160
+ - Schema uses `INSERT OR REPLACE` keyed on `request_id`, so re-running
161
+ rebuild is safe and incremental adds work without conflicts.
162
+ - Backfill from filenames assumes the standard format
163
+ `YYYY-MM-DD_HH-MM-SS_<route>_<requestId>.json` written by
164
+ `captureRequest`. Older or hand-edited filenames fall back to the
165
+ file's mtime.
166
+
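As an illustration of the filename convention in the last note, the sketch below parses `YYYY-MM-DD_HH-MM-SS_<route>_<requestId>.json` and returns `null` for anything else, at which point a caller would fall back to the file's mtime. It assumes the request ID contains no underscores (the route may); that assumption, the function name, and the returned shape are not taken from mobygate's source.

```javascript
// Assumption: requestId has no underscores, so the last `_` separates it
// from the route. Timezone is not encoded in the name, so none is implied.
const CAPTURE_NAME =
  /^(\d{4})-(\d{2})-(\d{2})_(\d{2})-(\d{2})-(\d{2})_(.+)_([^_]+)\.json$/;

function parseCaptureFilename(name) {
  const m = CAPTURE_NAME.exec(name);
  if (!m) return null; // caller falls back to the file's mtime
  const [, y, mo, d, h, mi, s, route, requestId] = m;
  return { timestamp: `${y}-${mo}-${d}T${h}:${mi}:${s}`, route, requestId };
}
```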
167
+ ## [0.8.10] — 2026-05-02
168
+
169
+ Don't abort upstream SDK on client disconnect — preserve the partial
170
+ generation so it lands in the capture file even when the client drops.
171
+
172
+ ### Why
173
+
174
+ Previously, both streaming handlers (`/v1/chat/completions` and
175
+ `/v1/messages`) called `abortController.abort()` the moment
176
+ `res.on('close')` fired. That killed the SDK call mid-stream, even
177
+ though the tokens were already in flight and being billed. Result: a
178
+ dropped client wasted billing AND lost the capture — the summary file
179
+ showed `usage: 0` and a truncated body, defeating the purpose of
180
+ captures for post-mortem on long generations.
181
+
182
+ ### Changed
183
+
184
+ - **Streaming handlers no longer abort SDK on client disconnect.** The
185
+ for-await loop keeps consuming SDK messages, so the final `result`
186
+ message lands and `inputTokens/outputTokens/cache*` populate
187
+ correctly. `tx.pushTextDelta` / `tx.pushToolUse` writes silently
188
+ no-op once the response is closed (translator already guards every
189
+ write with `res.writableEnded`).
190
+ - **Removed `if (clientDisconnected) break;`** from the loop in both
191
+ handlers — the break short-circuited the natural completion path.
192
+ - **60s post-disconnect safety cap.** A `setTimeout(..., 60_000)`
193
+ schedules an abort if the SDK hasn't finished within a minute of the
194
+ client dropping. Prevents a flapping client from burning unbounded
195
+ tokens. Timer is `.unref()`'d so it doesn't keep the process alive.
196
+
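The post-disconnect safety cap could be wired roughly as follows. This is a sketch, not the shipped handler: `armDisconnectCap` and its returned `cancel` function are hypothetical names, and `res` only needs an `on('close', ...)` method.

```javascript
// Arms an abort timer only after the client disconnects. Returns a cancel
// function to call when the SDK loop finishes on its own.
function armDisconnectCap(res, abortController, capMs = 60_000) {
  let timer = null;
  let done = false;
  res.on('close', () => {
    if (done || timer) return;
    timer = setTimeout(() => abortController.abort(), capMs);
    timer.unref(); // don't keep the process alive just for this timer
  });
  return function cancel() {
    done = true;
    if (timer) clearTimeout(timer);
    timer = null;
  };
}
```

The important property is that the disconnect itself no longer aborts anything; the abort only fires if the generation is still running when the cap expires.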
197
+ ### Notes
198
+
199
+ - Capture status field still records `client_disconnect` vs `ok` — the
200
+ status reflects whether the user actually got the response, not
201
+ whether the SDK completed.
202
+ - The tools-mode `tool_use → abort` path is unchanged; we still need to
203
+ abort there because the SDK would otherwise hang waiting for a tool
204
+ result that's coming in via the client (which may have just dropped).
205
+ - Non-streaming handlers were already correct (no `res.on('close')`
206
+ abort), so no changes there.
207
+ - Verification: trigger a streaming request, kill the client mid-stream
208
+ (`pkill` or close terminal), then check the latest capture summary —
209
+ should show a full `usage:` block with non-zero tokens and a complete
210
+ `duration:`.
211
+
212
+ ## [0.8.9] — 2026-05-02
213
+
214
+ Capture summary completeness: streaming `/v1/messages` responses were
215
+ logging `duration: (unknown)` and `usage: 0` for every request that
216
+ ended in `tool_use`. Root cause: the four inference handlers only
217
+ populated their token trackers inside the SDK `result` message branch,
218
+ but the moment the model emits `tool_use`, mobygate aborts the SDK to
219
+ hand off to the client — so `result` never arrives, and the trackers
220
+ stay at zero. The fix mirrors usage from each assistant turn's inner
221
+ Anthropic Message (`message.message.usage`) so the just-completed turn
222
+ is captured even on the abort path.
223
+
224
+ This is the first agent-driven fix surfaced from the #mobygate channel:
225
+ signal in (capture summary blind on the most common request shape),
226
+ signal out (next capture summary will be honest).
227
+
228
+ ### Changed
229
+
230
+ - **`extractSdkUsage` falls back to `message.message?.usage`.** SDK
231
+ `result` messages still take precedence; the new fallback only kicks
232
+ in for assistant turns, where usage lives one level deeper. Last-
233
+ resort flat-field read on the message itself is preserved.
234
+ - **All four inference handlers** (`handleStreaming`,
235
+ `handleNonStreaming`, `handleAnthropicNonStreaming`,
236
+ `handleAnthropicStreaming`) now mirror per-turn usage into their
237
+ local trackers as assistant messages stream in. On multi-turn loops,
238
+ the `result` message still overwrites with the SDK's aggregated
239
+ total (correct). On `tool_use` aborts, the latest assistant turn's
240
+ usage is preserved (correct, instead of zero).
241
+ - **`captureResponse` now receives `durationMs`** from every handler.
242
+ Each handler grabs `Date.now()` at entry and passes
243
+ `durationMs: Date.now() - startedAt` to the summary writer. Capture
244
+ files no longer say `duration: (unknown)`.
245
+ - **`handleAnthropicStreaming` SSE `message_delta`** now carries the
246
+ full usage shape (input + output + cache_read + cache_create), not
247
+ just `output_tokens`. Clients reading `message_delta.usage` for
248
+ their own billing now get accurate numbers on streaming responses.
249
+ - **Streaming Anthropic captureResponse status** distinguishes
250
+ `client_disconnect` from `ok` (was always reporting `ok`). Matches
251
+ the chat-completions handler's existing behavior.
252
+
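The fallback order described for `extractSdkUsage` can be sketched as below. This is a hypothetical re-creation for illustration (named `extractUsage` to avoid confusion with the real function); the field names follow Anthropic's usage shape, while the `type === 'result'` check and the returned object layout are assumptions.

```javascript
// Precedence: SDK `result` usage, then the assistant turn's inner message
// usage, then flat fields on the message itself.
function extractUsage(message) {
  if (!message) return null;
  const raw =
    (message.type === 'result' && message.usage) || // SDK aggregate wins
    message.message?.usage ||                        // one level deeper
    message;                                         // last-resort flat read
  if (raw.input_tokens == null && raw.output_tokens == null) return null;
  return {
    inputTokens: raw.input_tokens ?? 0,
    outputTokens: raw.output_tokens ?? 0,
    cacheReadTokens: raw.cache_read_input_tokens ?? 0,
    cacheCreationTokens: raw.cache_creation_input_tokens ?? 0,
  };
}
```

A handler that calls this on every assistant message and overwrites its tracker keeps the latest turn's numbers even if the loop is aborted before a `result` message arrives.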
253
+ ### Verification path
254
+
255
+ After restart, any new capture summary in `~/.mobygate/captures/` for
256
+ a `tool_use`-ending request should show non-zero `input_tokens`,
257
+ `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`
258
+ and a real `duration: <N> ms`. Pre-fix summaries on the same shape were
259
+ uniformly zeroed.
260
+
261
+ ## [0.8.8] — 2026-05-01
262
+
263
+ Quiet mode: scrub tool descriptions too. Live test of v0.8.7 caught
264
+ that "hermes" was still leaking through 6 times in a Hermes-driven
265
+ request — all from `body.tools[].description` and nested
266
+ `input_schema.properties.<name>.description` fields. The earlier
267
+ `scrubAnthropicBody` only walked `system`, `messages`, and `metadata`,
268
+ so tool definitions slipped past.
269
+
270
+ ### Changed
271
+
272
+ - **`scrubAnthropicBody` now scrubs tool definitions:** `tools[].description`
273
+ is replaced via `scrubString`, and `tools[].input_schema` is recursed
274
+ through with a new `scrubSchemaDescriptions` helper that touches only
275
+ `description` fields (never `type`, `enum`, property keys, etc., so
276
+ schemas stay structurally valid).
277
+ - **`quietDiagnose` walks tools too**, so the per-request log line
278
+ reflects actual coverage.
279
+
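A recursive walk that rewrites only string-valued `description` fields might look like the sketch below. It is an illustration of the idea rather than the actual helper: the two-argument signature (schema plus a scrub callback) is an assumption.

```javascript
// Mutates `node` in place. Only string values under a `description` key are
// rewritten; types, enums, and property keys are left alone.
function scrubSchemaDescriptions(node, scrubString) {
  if (Array.isArray(node)) {
    node.forEach((item) => scrubSchemaDescriptions(item, scrubString));
  } else if (node && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      if (key === 'description' && typeof value === 'string') {
        node[key] = scrubString(value);
      } else {
        // Includes a property that is itself *named* "description":
        // its value is a schema object, so we recurse instead of replacing.
        scrubSchemaDescriptions(value, scrubString);
      }
    }
  }
  return node;
}
```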
280
+ ### Notes
281
+
282
+ - **Tool *names* still pass through unchanged.** Renaming names
283
+ requires bidirectional mapping (model emits scrubbed name in
284
+ `tool_use`, mobygate has to translate back to original before
285
+ forwarding to the client). That's a future release. If your tool
286
+ names contain brand strings, rename them client-side for now.
287
+ - **Verification path:** in the captured `.json`, grep for the brand
288
+ word (should be 0) and the replacement (should match the original
289
+ count). Live test against a Hermes session with 28 declared tools
290
+ brought "hermes" residue from 6 to 0 (assuming no tool *names*
291
+ contain it).
292
+
293
+ ## [0.8.7] — 2026-05-01
294
+
295
+ Quiet mode correction. v0.8.6 implemented `/quiet/v1/messages` as
296
+ "scrub brand names AND drop the claude_code preset," on the assumption
297
+ that the preset was just identity framing. Live testing showed it's
298
+ also Anthropic's *"this is an approved Claude Code client, bill flat
299
+ Max"* signal — dropping the preset flipped requests into extra-usage
300
+ billing and produced a confusing 400 ("You're out of extra usage"
301
+ even with 89% of cap unused).
302
+
303
+ This release fixes the architecture: quiet mode now layers scrubbing
304
+ **on top of** the preset, not instead of it. Same billing path as
305
+ `/v1/messages`, same Claude-Code wrapping, plus the brand-name scrub.
306
+
307
+ ### Changed
308
+
309
+ - **`/quiet/v1/messages` keeps the `claude_code` preset.** The route
310
+ still applies `scrubAnthropicBody()` to the inbound payload
311
+ (`system`, `messages[].content`, `tool_result` content,
312
+ `metadata.user_id`), but the SDK call now uses the same
313
+ `{ type: 'preset', preset: 'claude_code', append: toolsGuidance }`
314
+ systemPrompt as the regular `/v1/messages` route. Net effect: a
315
+ Claude-Code-shaped request that has had brand-name strings
316
+ substituted, which was the original intent.
317
+ - **Removed the `mode` parameter** from `handleAnthropicNonStreaming`
318
+ and `handleAnthropicStreaming`. Both handlers are now mode-agnostic;
319
+ per-route differences (scrubbing) happen at the route level before
320
+ the handler is invoked.
321
+
322
+ ### Fixed
323
+
324
+ - **"Out of extra usage" 400 on `/quiet/v1/messages`.** Caused by the
325
+ v0.8.6 systemPrompt-skipping logic; reverted.
326
+
327
+ ### Notes
328
+
329
+ - The scrubbing itself was always working in v0.8.6 (verified
330
+ standalone). The bug was in the parallel preset-removal change.
331
+ - If you tested v0.8.6 and saw the 400, your account is *not* flagged —
332
+ the regular `/v1/messages` route works for you, and after this fix,
333
+ `/quiet/v1/messages` will too.
334
+
335
+ ## [0.8.6] — 2026-05-01
336
+
337
+ Quiet mode — a new request lane that strips third-party agent harness
338
+ identifiers from the request body before the Claude Agent SDK forwards
339
+ to Anthropic.
340
+
341
+ Motivation: community reports (X) describe Anthropic's API surface
342
+ appearing to scan request bodies for known third-party harness names
343
+ (e.g. "openclaw" appearing in a `package.json` that the agent reads
344
+ into context). The match seems to flip the account into per-token
345
+ extra-usage billing even when the harness isn't actually invoked. Quiet
346
+ mode is a defensive content-rewrite pass that runs in mobygate before
347
+ the SDK call: substitute `openclaw → orchestrator`, `hermes → assistant`,
348
+ `mobius → bot`, etc., so the outbound payload reads as a vanilla
349
+ Anthropic-shape call.
350
+
351
+ This is best-effort. Tool results / file contents that the agent reads
352
+ mid-conversation still flow back through the same scrub layer, but
353
+ aggressive scrubbing of arbitrary content can distort the model's
354
+ understanding (e.g. "install openclaw" becoming "install orchestrator"
355
+ breaks shell instructions). Quiet mode is **opt-in via a separate route**
356
+ (`/quiet/v1/messages`) so the default `/v1/messages` path stays
357
+ untouched and faithful.
358
+
359
+ ### Added
360
+
361
+ - **`POST /quiet/v1/messages`** — Anthropic-shape route that scrubs
362
+ third-party harness names from the request body and switches the SDK
363
+ from the `claude_code` system-prompt preset to a raw string
364
+ systemPrompt (so the response classifier sees neither identifying
365
+ brand strings nor the Claude-Code preset wrapping). Same SDK path,
366
+ same auth, same Max-OAuth flat billing — just a cleaner outbound
367
+ payload. Captures still write to `~/.mobygate/captures/` (tagged
368
+ `path: /quiet/v1/messages`) so you can inspect what got scrubbed.
369
+ - **`lib/quiet.js`** — scrub primitives: `scrubAnthropicBody(body)`
370
+ (mutates in place across `system`, `messages[].content` strings/blocks,
371
+ `tool_result` content, and `metadata.user_id`), plus `quietDiagnose(body)`
372
+ (counts matches without mutating, useful for logging and capture
373
+ summaries). Word list is configurable via
374
+ `~/.mobygate/quiet-words.txt` (one `term=replacement` per line); falls
375
+ back to a sensible default map covering openclaw, hermes, mobius,
376
+ nous, mobygate, claude-max-proxy and a few variants.
377
+ - **Per-request scrub diagnostic** — every `/quiet/v1/messages` call
378
+ logs `[quiet] scrubbed N occurrence(s): word×n word×n …` so you can
379
+ see at a glance whether the route is actually doing work or whether
380
+ the payload was already clean.
381
+
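For a sense of how a `term=replacement` word list could drive the scrub, here is a minimal sketch. The function names, the case-insensitive matching, and the returned `{ output, count }` shape are all assumptions; the real `lib/quiet.js` may differ on each point.

```javascript
// Parses one `term=replacement` per line; blank or malformed lines are skipped.
function parseWordList(text) {
  const map = new Map();
  for (const line of text.split(/\r?\n/)) {
    const eq = line.indexOf('=');
    if (eq <= 0) continue;
    const term = line.slice(0, eq).trim().toLowerCase();
    const replacement = line.slice(eq + 1).trim();
    if (term) map.set(term, replacement);
  }
  return map;
}

// Replaces every occurrence of each term (case-insensitive, an assumption)
// and reports how many substitutions were made.
function scrubWithMap(input, map) {
  let count = 0;
  let output = input;
  for (const [term, replacement] of map) {
    const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    output = output.replace(new RegExp(escaped, 'gi'), () => {
      count += 1;
      return replacement;
    });
  }
  return { output, count };
}
```

Because terms are applied one after another, a replacement that itself contains a later term would be rewritten again, so a word list is safest when replacements avoid the listed terms.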
382
+ ### Changed
383
+
384
+ - **`handleAnthropicNonStreaming` / `handleAnthropicStreaming` accept
385
+ a `mode` param** — defaults to `'normal'` (current behavior, claude_code
386
+ preset). When called with `'quiet'`, both handlers compute
387
+ `sdkSystemPrompt` as a raw string (combining the inbound system block
388
+ and tools-usage guidance) instead of the preset object. The OpenAI
389
+ surface (`/v1/chat/completions`) keeps the preset unconditionally.
390
+
391
+ ### Notes
392
+
393
+ - **Tool definitions are not (yet) renamed.** If a client registers a
394
+ tool literally named `openclaw_send_message`, the name still goes
395
+ through unchanged. We may add a tool-rename map (with reverse-mapping
396
+ for tool_use responses) in a later release. For now, rename your
397
+ tools client-side if you want full quiet coverage.
398
+ - **No response-side scrubbing.** The model rarely emits the brand
399
+ names unprompted (and if the input was scrubbed it doesn't know them
400
+ anyway), so we save the round-trip cost.
401
+ - **Scrub is best-effort.** The exhaustive list of vectors Anthropic
402
+ may match against is unknown; the default word list covers the names
403
+ reported on X. Add your own via the words file.
404
+
405
+ ## [0.8.5] — 2026-04-30
406
+
407
+ Cost visibility + session TTL bump. Born from a per-channel cost audit
408
+ that found 38.9% of total spend going to "singleton" sessions —
409
+ channels that fired once, idled past the wire-cache TTL, and paid
410
+ full `cache_creation` tax on the next turn. The original v0.8.5 plan
411
+ included passing through Anthropic's 1-hour cache TTL beta header,
412
+ but the Claude Agent SDK doesn't expose that beta or `cache_control`
413
+ markers — so this release does what's actually achievable from the
414
+ mobygate side: visibility + a TTL bump + an upstream feature request.
415
+
416
+ ### Added
417
+
418
+ - **`/dashboard/session-costs` endpoint** — aggregates the
419
+ `[model-billed]` log lines per session_key. Returns cost/turn,
420
+ bucket (singleton/short/medium/warm), per-model breakdown, and the
421
+ first user message for human-readable identification.
422
+
423
+ - **Sessions tab in the inspector** — new view at `/inspector` showing
424
+ cost-per-session sorted descending. Surfaces the singleton-bleed
425
+ pattern in real-time. Includes a bucket overview (% of total cost in
426
+ warm vs singleton sessions) so the cache-amortization story jumps
427
+ out at a glance.
428
+
429
+ This view is built directly on top of v0.8.3's `[model-billed]`
430
+ log line — the diagnostic logging continues to pay dividends.
431
+
432
+ ### Changed
433
+
434
+ - **Default `SESSION_TTL_MS` raised from 1h → 4h.** Multi-channel
435
+ Discord users (mobygate's primary use case) typically revisit
436
+ channels every few hours, and a 1h TTL forced mobygate to issue a
437
+ fresh `query()` (full prompt re-send) on every wake-up. With 4h,
438
+ mobygate's session-store retains the SDK session ID for four hours,
439
+ so the next request can resume rather than reissuing the entire
440
+ prompt prefix.
441
+
442
+ **Caveat:** this only addresses SDK session continuity inside
443
+ mobygate. Anthropic's wire-side prompt cache is still 5 min by
444
+ default — the SDK doesn't currently expose the
445
+ `extended-cache-ttl-2025-04-11` beta header to callers, so the
446
+ cache_creation tax on truly cold channels (no traffic in 5+ min)
447
+ is unchanged.
448
+
449
+ - **New env var `MOBY_SESSION_TTL_HOURS`** as a more readable override:
450
+ `MOBY_SESSION_TTL_HOURS=8` is equivalent to setting `SESSION_TTL_MS`
451
+ to 28800000. Either works; legacy var still respected.
452
+
453
+ - **Startup banner** now shows TTL in both minutes and hours:
454
+ `session TTL 240 min (4.0h)`.
455
+
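The relationship between the two TTL variables and the banner format can be sketched as below. The precedence (hours variable wins when both are set) and the function names are assumptions; the 4h default, the `8` hours to `28800000` ms equivalence, and the banner text come from the entries above.

```javascript
const DEFAULT_TTL_MS = 4 * 60 * 60 * 1000; // 4h default per this release

// Assumption: MOBY_SESSION_TTL_HOURS takes precedence over SESSION_TTL_MS.
function resolveSessionTtlMs(env) {
  const hours = Number(env.MOBY_SESSION_TTL_HOURS);
  if (env.MOBY_SESSION_TTL_HOURS && Number.isFinite(hours) && hours > 0) {
    return hours * 3_600_000;
  }
  const ms = Number(env.SESSION_TTL_MS);
  if (env.SESSION_TTL_MS && Number.isFinite(ms) && ms > 0) return ms;
  return DEFAULT_TTL_MS;
}

function ttlBanner(ttlMs) {
  const minutes = Math.round(ttlMs / 60_000);
  const hours = (ttlMs / 3_600_000).toFixed(1);
  return `session TTL ${minutes} min (${hours}h)`;
}
```

In a real process the argument would be `process.env`; passing a plain object keeps the helper easy to exercise in isolation.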
456
+ ### Notes
457
+
458
+ The per-channel audit findings (43-channel installation, ~$48 over
459
+ a recent capture window):
460
+
461
+ ```
462
+ 72 singleton sessions → $18.61 (38.9% of total cost)
463
+ 20 short (2-3 turns) → $ 5.25 (11.0%)
464
+ 7 medium (4-10 turns) → $ 4.21 ( 8.8%)
465
+ 6 warm (11+ turns) → $19.73 (41.3%) ← 6 sessions = 41% of value
466
+ ```
467
+
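The bucket boundaries in the table (1 turn, 2-3, 4-10, 11+) and the percentage column can be reproduced with a short sketch. The function names and the session record shape are hypothetical; the sample numbers in the usage note are the per-bucket totals from the table, each represented as a single session for simplicity.

```javascript
// Bucket boundaries as listed in the audit table.
function bucketForTurns(turns) {
  if (turns <= 1) return 'singleton';
  if (turns <= 3) return 'short';
  if (turns <= 10) return 'medium';
  return 'warm';
}

// Sums cost per bucket and computes each bucket's share of total cost.
function costShareByBucket(sessions) {
  const totals = { singleton: 0, short: 0, medium: 0, warm: 0 };
  let grand = 0;
  for (const { turns, cost } of sessions) {
    totals[bucketForTurns(turns)] += cost;
    grand += cost;
  }
  const share = {};
  for (const [bucket, cost] of Object.entries(totals)) {
    share[bucket] = grand > 0 ? (100 * cost) / grand : 0;
  }
  return { totals, share };
}
```

Feeding in the four bucket totals (18.61, 5.25, 4.21, 19.73) gives shares that round to the 38.9%, 11.0%, 8.8%, and 41.3% shown in the table.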
468
+ Most users don't need to act on this — the fix is upstream in the
469
+ Claude Agent SDK exposing `extended-cache-ttl-2025-04-11`. Until then
470
+ the new sessions view at least makes the bleed visible so users can
471
+ decide which channels to keep warm via an OpenClaw cron ping.
472
+
473
+ A feature request was filed with the SDK team:
474
+ > "Expose `extended-cache-ttl-2025-04-11` beta header in `Options.betas[]`
475
+ > and a `cacheTtl: '1h' | '5m'` option on `query()` so multi-tenant
476
+ > / multi-channel mobygate-style proxies can extend wire-cache lifetime
477
+ > for sporadically-active sessions."
478
+
7
479
  ## [0.8.4] — 2026-04-28
8
480
 
9
481
  Sonnet 1M context fix — gated billing tier mismatch. v0.8.3's "match