npm - mobygate - Versions diffs - 0.8.4 → 0.9.2 - Mend

mobygate 0.8.4 → 0.9.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/CHANGELOG.md +472 -0
package/bin/mobygate.js +214 -0
package/inspector.html +200 -3
package/lib/anthropic.js +6 -1
package/lib/captures-index.js +524 -0
package/lib/inference-runner.js +753 -0
package/lib/openai-translation.js +146 -0
package/lib/quiet.js +249 -0
package/lib/request-capture.js +24 -0
package/package.json +3 -1
package/server.js +318 -1110

package/CHANGELOG.md CHANGED

@@ -4,6 +4,478 @@ All notable changes to mobygate are documented here. Format loosely follows
 [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); version numbers are
 [Semantic Versioning](https://semver.org/).
+## [0.9.2] — 2026-05-03
+Return `405 Method Not Allowed` (with `Allow: POST` header) for `GET`
+on `/v1/chat/completions`, `/v1/messages`, and `/quiet/v1/messages`
+instead of the default `404`. RFC 9110-correct, and unblocks
+endpoint-detection probes (e.g. Hermes onboarding) that treat 404 as
+"missing" but 405 as "exists, wrong verb."
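The 404-versus-405 distinction in the 0.9.2 entry can be sketched as a small pre-dispatch check. This is a minimal illustration, not mobygate's routing code: `POST_ONLY_PATHS` and `preDispatchStatus` are hypothetical names, and treating every non-POST verb like `GET` is an assumption (the changelog only mentions `GET`).

```javascript
// Paths are taken from the changelog entry; the helper names are hypothetical.
const POST_ONLY_PATHS = new Set([
  '/v1/chat/completions',
  '/v1/messages',
  '/quiet/v1/messages',
]);

// Returns an early response descriptor, or null when the request
// should proceed to the normal POST handler.
function preDispatchStatus(method, path) {
  if (!POST_ONLY_PATHS.has(path)) {
    return { status: 404, headers: {} };
  }
  if (method !== 'POST') {
    // A 405 response must carry an Allow header listing the supported methods.
    return { status: 405, headers: { Allow: 'POST' } };
  }
  return null;
}
```

A probe that sends `GET /v1/messages` would then see `405` with `Allow: POST` and can conclude that the endpoint exists.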
+## [0.9.1] — 2026-05-03
+Smoke-test suite + `X-Request-Id` on JSON responses.
+### Added
+- **`test/smoke.test.mjs`** — 9 integration tests that spin up a real
+  mobygate on a side port and hit each surface×mode combination on the
+  three endpoints. Asserts wire shape (Anthropic `message_start` / SSE
+  events; OpenAI `chat.completion` / chunks), tools mode (native
+  `tool_use` / `tool_calls`), `/quiet/v1/messages` scrubbing, and the
+  capture summary + SQLite index row landing with the right status,
+  stop reason, and usage. Run with `npm run test:smoke` (~65s, ~$0.005
+  per run on haiku).
+### Changed
+- **JSON responses now set `X-Request-Id`.** Previously only the SSE
+  paths (streaming) returned it. Adding it to non-streaming closes a
+  debug gap — you can now correlate any response back to its capture
+  file by header without re-parsing the JSON body.
+## [0.9.0] — 2026-05-03
+Four inference handlers consolidated into one runner. The OpenAI and
+Anthropic surfaces (`/v1/chat/completions`, `/v1/messages`,
+`/quiet/v1/messages`) now share a single inference loop driven by
+surface adapters and a `mode = 'stream' | 'json'` flag. Same wire
+behavior on every endpoint; ~1200 lines deleted from `server.js`.
+This release bundles the v0.8.6–v0.8.11 work that was sitting
+uncommitted on top of v0.8.5 (quiet mode, capture summary completeness,
+client-disconnect preservation, SQLite captures index + `mobygate
+captures` CLI). See the per-version sections below for those.
+### Why
+The four handlers (handleStreaming, handleNonStreaming,
+handleAnthropicNonStreaming, handleAnthropicStreaming) were ~80%
+identical — same SDK iteration, same tool_use detection, same
+auth-failure-text bail, same per-turn usage tracking, same
+post-disconnect grace window. Bug fixes had to land in 2-4 places,
+which is exactly how regressions slip in: v0.8.9's tool_use usage fix
+landed in 4 spots; v0.8.10's client-disconnect-preservation only
+applied to streaming; the OpenAI streaming path had no text-delta
+dedup while the Anthropic streaming path did. One runner means one
+place to fix, four endpoints to test.
+### Added
+- **`lib/inference-runner.js`** — single `runInference()` driving the
+  SDK loop, plus `openaiSurface` and `anthropicSurface` adapters that
+  encapsulate per-surface translation (prompt parsing, image
+  collection, tool-bridge shape, error envelopes, stop-reason mapping)
+  and per-mode sinks (`createSink({ mode: 'stream'|'json', ... })`).
+  The sink contract is uniform (`start / pushTextDelta / pushToolUse /
+  finish / error / hasStarted`); each surface decides whether deltas
+  emit live or buffer.
+- **`lib/openai-translation.js`** — extracted the OpenAI-shape input
+  helpers (`messagesToPrompt`, `collectImages`, `hasTools`,
+  `extractContent`, `extractImageBlocks`, `normalizeModelName`) into
+  their own module. Mirrors the structure of `lib/anthropic.js`.
+### Changed
+- **`server.js`** shrinks from 2388 to 1197 lines. The four handlers and
+  their helpers are gone; the three route handlers now do request
+  validation + dashboard event emission + capture, then dispatch to
+  `runInference(ctx, surface, { mode, deps })`.
+- **OpenAI streaming** now does text-delta dedup (the `startsWith` check
+  the Anthropic stream had). Safe for SDK paths that don't re-send
+  accumulated text (no-op); fixes a class of duplication on paths that
+  do.
+- **OpenAI streaming** now uses the SDK `result.result` fallback when
+  no streaming text was emitted, matching the Anthropic stream's
+  `if (message.result && !textEmittedSoFar && !toolUseEmitted)` branch.
+  Surfaces a final response in tools-mode-without-tools-call paths
+  that previously dropped to empty.
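The `startsWith` dedup mentioned above can be illustrated as follows. This is a sketch under assumptions: `createDeltaDeduper` is a hypothetical name, and the real runner may track emitted text differently. The idea is that if an incoming chunk begins with everything emitted so far, it is treated as a re-send of the accumulated text and only the new suffix is forwarded.

```javascript
// Hypothetical helper illustrating startsWith-based text-delta dedup.
function createDeltaDeduper() {
  let emitted = ''; // everything forwarded to the client so far

  return function nextDelta(incoming) {
    let delta = incoming;
    // If the SDK re-sent the accumulated text, keep only the unseen suffix.
    if (emitted.length > 0 && incoming.startsWith(emitted)) {
      delta = incoming.slice(emitted.length);
    }
    emitted += delta;
    return delta;
  };
}
```

For paths that only ever send incremental deltas the check rarely matches, so chunks pass through unchanged. One caveat of this sketch: a genuine incremental chunk that happens to begin with the full emitted text would be trimmed, which is more likely early in a response when little text has been emitted.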
+### Removed
+- `handleStreaming`, `handleNonStreaming`, `handleAnthropicNonStreaming`,
+  `handleAnthropicStreaming` from `server.js`.
+- `extractContent`, `extractImageBlocks`, `collectImages`, `hasTools`,
+  `messagesToPrompt`, `buildQueryPrompt`, `normalizeModelName`,
+  `makeChunk`, `sendSSE` from `server.js` (moved to
+  `lib/openai-translation.js` and `lib/inference-runner.js`).
+## [0.8.11] — 2026-05-02
+Searchable captures: SQLite index over capture summaries + a `mobygate
+captures` CLI for querying. The on-disk format keeps the human-readable
+`.summary.txt` and full `.json` files unchanged — the index is a
+sidecar that mirrors structured fields so you can find that one
+request by model / session / status / stop_reason / duration / text.
+### Added
+- **`lib/captures-index.js`** — better-sqlite3-backed index at
+  `~/.mobygate/captures.sqlite` (WAL mode for concurrent CLI reads).
+  Schema includes timing, token usage, cache-hit %, tool counts,
+  message previews, and pointers back to the json + summary files.
+  Loads better-sqlite3 lazily — if the native build fails, every export
+  becomes a no-op and the proxy keeps running.
+- **Live indexing.** `request-capture.js` now fires `indexCapture()` on
+  request and `updateCaptureResponse()` on response, both wrapped in
+  best-effort try/catch so the index can never block proxying.
+- **`mobygate captures` CLI.** Subcommands:
+    - `query [text] [--since 1h] [--model opus] [--session <key>]`
+      `[--status ok|client_disconnect|error] [--stop end_turn|tool_use|...]`
+      `[--min-duration ms] [--max-duration ms] [--has-tools]`
+      `[--limit n] [--json]`
+      Positional text does a LIKE %text% over first/last user message and
+      session_key. Default output is a colored table with TIME / ID /
+      STATUS / STOP / DUR / IN / OUT / CACHE / PREVIEW columns.
+    - `show <request_id>` prints the human-readable summary file.
+    - `stats` rolls up totals: by status, top models, stop reasons,
+      cumulative tokens, avg cache hit, avg duration.
+    - `rebuild` walks `~/.mobygate/captures/`, parses each `.json` +
+      `.summary.txt` pair, and upserts. Idempotent (REPLACE on
+      `request_id`). Took 77ms to backfill 99 historical captures.
+### Why
+The captures dir hits hundreds of files within a day of regular use.
+Finding *the* request that disconnected mid-stream, or all the slow
+opus calls, or every tool_use stop in the last hour — those questions
+turn into multi-grep workflows on the filesystem. SQL answers them in
+one command. Examples:
+```
+mobygate captures query --status client_disconnect --since 1d
+mobygate captures query --has-tools --min-duration 5000
+mobygate captures query "webflow" --model opus --limit 5
+mobygate captures stats
+```
+### Notes
+- New dep: `better-sqlite3` (native, ~5MB binary). Compiles at install
+  time via node-gyp. If the build fails on an exotic platform, capture
+  files keep working — only the index disappears.
+- The index is a *cache*, not the source of truth. The `.json` and
+  `.summary.txt` files remain authoritative. Delete the .sqlite at any
+  time and `mobygate captures rebuild` reconstructs it.
+- Schema uses `INSERT OR REPLACE` keyed on `request_id`, so re-running
+  rebuild is safe and incremental adds work without conflicts.
+- Backfill from filenames assumes the standard format
+  `YYYY-MM-DD_HH-MM-SS_<route>_<requestId>.json` written by
+  `captureRequest`. Older or hand-edited filenames fall back to the
+  file's mtime.
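The filename-based backfill with an mtime fallback could look roughly like this. It is a sketch only: `parseCaptureFilename` is a hypothetical name, the route string in the usage note is made up, and two assumptions are baked in (request ids contain no underscore, and the filename timestamp is UTC).

```javascript
// Matches YYYY-MM-DD_HH-MM-SS_<route>_<requestId>.json
// Assumption: requestId has no "_", so the last "_" splits route from id.
const CAPTURE_NAME =
  /^(\d{4}-\d{2}-\d{2})_(\d{2})-(\d{2})-(\d{2})_(.+)_([^_]+)\.json$/;

function parseCaptureFilename(name, mtimeMs) {
  const fallback = { timestampMs: mtimeMs, route: null, requestId: null, fromMtime: true };
  const m = CAPTURE_NAME.exec(name);
  if (!m) return fallback; // older or hand-edited filename

  const [, date, hh, mm, ss, route, requestId] = m;
  // Assumption: timestamps in filenames are UTC.
  const timestampMs = Date.parse(`${date}T${hh}:${mm}:${ss}Z`);
  if (Number.isNaN(timestampMs)) return fallback; // e.g. month 13

  return { timestampMs, route, requestId, fromMtime: false };
}
```

Calling `parseCaptureFilename('2026-05-02_14-03-09_v1-messages_abc123.json', stat.mtimeMs)` would yield the parsed timestamp, while a name such as `notes.json` falls back to the supplied mtime.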
+## [0.8.10] — 2026-05-02
+Don't abort upstream SDK on client disconnect — preserve the partial
+generation so it lands in the capture file even when the client drops.
+### Why
+Previously, both streaming handlers (`/v1/chat/completions` and
+`/v1/messages`) called `abortController.abort()` the moment
+`res.on('close')` fired. That killed the SDK call mid-stream, even
+though the tokens were already in flight and being billed. Result: a
+dropped client wasted billing AND lost the capture — the summary file
+showed `usage: 0` and a truncated body, defeating the purpose of
+captures for post-mortem on long generations.
+### Changed
+- **Streaming handlers no longer abort SDK on client disconnect.** The
+  for-await loop keeps consuming SDK messages, so the final `result`
+  message lands and `inputTokens/outputTokens/cache*` populate
+  correctly. `tx.pushTextDelta` / `tx.pushToolUse` writes silently
+  no-op once the response is closed (translator already guards every
+  write with `res.writableEnded`).
+- **Removed `if (clientDisconnected) break;`** from the loop in both
+  handlers — the break short-circuited the natural completion path.
+- **60s post-disconnect safety cap.** A `setTimeout(..., 60_000)`
+  schedules an abort if the SDK hasn't finished within a minute of the
+  client dropping. Prevents a flapping client from burning unbounded
+  tokens. Timer is `.unref()`'d so it doesn't keep the process alive.
+### Notes
+- Capture status field still records `client_disconnect` vs `ok` — the
+  status reflects whether the user actually got the response, not
+  whether the SDK completed.
+- The tools-mode `tool_use → abort` path is unchanged; we still need to
+  abort there because the SDK would otherwise hang waiting for a tool
+  result that's coming in via the client (which may have just dropped).
+- Non-streaming handlers were already correct (no `res.on('close')`
+  abort), so no changes there.
+- Verification: trigger a streaming request, kill the client mid-stream
+  (`pkill` or close terminal), then check the latest capture summary —
+  should show full `usage:` block with non-zero tokens and a complete
+  `duration:`.
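The post-disconnect safety cap described in this entry can be sketched with standard Node timers. `armDisconnectCap` is a hypothetical name, and how mobygate wires it to `res.on('close')` is not shown in the changelog.

```javascript
// Schedule an abort if the upstream call is still running capMs after
// the client dropped. Returns the timer so the caller can clear it.
function armDisconnectCap(abortController, capMs = 60_000) {
  const timer = setTimeout(() => abortController.abort(), capMs);
  // unref() keeps this timer from holding the process open on shutdown.
  timer.unref();
  return timer;
}
```

A caller would arm the cap inside its `close` listener and call `clearTimeout()` on the returned timer once the SDK loop finishes naturally, so a completed generation is never aborted after the fact.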
+## [0.8.9] — 2026-05-02
+Capture summary completeness: streaming `/v1/messages` responses were
+logging `duration: (unknown)` and `usage: 0` for every request that
+ended in `tool_use`. Root cause: the four inference handlers only
+populated their token trackers inside the SDK `result` message branch,
+but the moment the model emits `tool_use`, mobygate aborts the SDK to
+hand off to the client — so `result` never arrives, and the trackers
+stay at zero. The fix mirrors usage from each assistant turn's inner
+Anthropic Message (`message.message.usage`) so the just-completed turn
+is captured even on the abort path.
+This is the first agent-driven fix surfaced from the #mobygate channel:
+signal in (capture summary blind on the most common request shape),
+signal out (next capture summary will be honest).
+### Changed
+- **`extractSdkUsage` falls back to `message.message?.usage`.** SDK
+  `result` messages still take precedence; the new fallback only kicks
+  in for assistant turns, where usage lives one level deeper. The
+  last-resort flat-field read on the message itself is preserved.
+- **All four inference handlers** (`handleStreaming`,
+  `handleNonStreaming`, `handleAnthropicNonStreaming`,
+  `handleAnthropicStreaming`) now mirror per-turn usage into their
+  local trackers as assistant messages stream in. On multi-turn loops,
+  the `result` message still overwrites with the SDK's aggregated
+  total (correct). On `tool_use` aborts, the latest assistant turn's
+  usage is preserved (correct, instead of zero).
+- **`captureResponse` now receives `durationMs`** from every handler.
+  Each handler grabs `Date.now()` at entry and passes
+  `durationMs: Date.now() - startedAt` to the summary writer. Capture
+  files no longer say `duration: (unknown)`.
+- **`handleAnthropicStreaming` SSE `message_delta`** now carries the
+  full usage shape (input + output + cache_read + cache_create), not
+  just `output_tokens`. Clients reading `message_delta.usage` for
+  their own billing now get accurate numbers on streaming responses.
+- **Streaming Anthropic captureResponse status** distinguishes
+  `client_disconnect` from `ok` (was always reporting `ok`). Matches
+  the chat-completions handler's existing behavior.
+### Verification path
+After restart, any new capture summary in `~/.mobygate/captures/` for
+a `tool_use`-ending request should show non-zero `input_tokens`,
+`output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`
+and a real `duration: <N> ms`. Pre-fix summaries on the same shape were
+uniformly zeroed.
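The precedence order described for `extractSdkUsage` (result message first, then the inner Anthropic Message, then flat fields) might be sketched like this. The function below is a hypothetical stand-in, not the shipped helper: the `type === 'result'` check and the returned field names are assumptions, while the `*_tokens` keys and `message.message.usage` come from the changelog text.

```javascript
// Normalize a raw usage object into camelCase counters (names are illustrative).
function normalizeUsage(u) {
  return {
    inputTokens: u.input_tokens ?? 0,
    outputTokens: u.output_tokens ?? 0,
    cacheReadTokens: u.cache_read_input_tokens ?? 0,
    cacheCreationTokens: u.cache_creation_input_tokens ?? 0,
  };
}

function extractUsageSketch(message) {
  if (!message || typeof message !== 'object') return null;
  // 1. SDK result message: aggregated usage sits directly on the message.
  if (message.type === 'result' && message.usage) {
    return normalizeUsage(message.usage);
  }
  // 2. Assistant turn: usage lives one level deeper, on the inner Message.
  if (message.message?.usage) {
    return normalizeUsage(message.message.usage);
  }
  // 3. Last resort: flat token fields on the message itself.
  if (typeof message.input_tokens === 'number' ||
      typeof message.output_tokens === 'number') {
    return normalizeUsage(message);
  }
  return null;
}
```

With this ordering, a handler that mirrors usage on every assistant turn still holds the latest turn's numbers if the loop is aborted on `tool_use` before any `result` message arrives.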
+## [0.8.8] — 2026-05-01
+Quiet mode: scrub tool descriptions too. Live test of v0.8.7 caught
+that "hermes" was still leaking through 6 times in a Hermes-driven
+request — all from `body.tools[].description` and nested
+`input_schema.properties.<name>.description` fields. The earlier
+`scrubAnthropicBody` only walked `system`, `messages`, and `metadata`,
+so tool definitions slipped past.
+### Changed
+- **`scrubAnthropicBody` now scrubs tool definitions:** `tools[].description`
+  is replaced via `scrubString`, and `tools[].input_schema` is recursed
+  through with a new `scrubSchemaDescriptions` helper that touches only
+  `description` fields (never `type`, `enum`, property keys, etc., so
+  schemas stay structurally valid).
+- **`quietDiagnose` walks tools too**, so the per-request log line
+  reflects actual coverage.
+### Notes
+- **Tool *names* still pass through unchanged.** Renaming names
+  requires bidirectional mapping (model emits scrubbed name in
+  `tool_use`, mobygate has to translate back to original before
+  forwarding to the client). That's a future release. If your tool
+  names contain brand strings, rename them client-side for now.
+- **Verification path:** in the captured `.json`, grep for the brand
+  word (should be 0) and the replacement (should match the original
+  count). Live test against a Hermes session with 28 declared tools
+  brought "hermes" residue from 6 to 0 (assuming no tool *names*
+  contain it).
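A schema walk that touches only `description` strings, as described for `scrubSchemaDescriptions`, could be written along these lines. This is a sketch, not the code in `lib/quiet.js`: the function name and the injected `scrubString` callback are illustrative.

```javascript
// Recursively rewrite string-valued "description" fields in place.
// Property keys, "type", "enum" members, and other values are left alone.
function scrubDescriptionsSketch(node, scrubString) {
  if (Array.isArray(node)) {
    for (const item of node) scrubDescriptionsSketch(item, scrubString);
    return node;
  }
  if (node === null || typeof node !== 'object') return node;

  for (const [key, value] of Object.entries(node)) {
    if (key === 'description' && typeof value === 'string') {
      node[key] = scrubString(value);
    } else {
      // A property that is itself *named* "description" has an object
      // value, so it is recursed into rather than replaced.
      scrubDescriptionsSketch(value, scrubString);
    }
  }
  return node;
}
```

Checking `typeof value === 'string'` is what keeps a schema property literally called `description` structurally intact while still scrubbing its own description text.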
+## [0.8.7] — 2026-05-01
+Quiet mode correction. v0.8.6 implemented `/quiet/v1/messages` as
+"scrub brand names AND drop the claude_code preset," on the assumption
+that the preset was just identity framing. Live testing showed it's
+also Anthropic's *"this is an approved Claude Code client, bill flat
+Max"* signal — dropping the preset flipped requests into extra-usage
+billing and produced a confusing 400 ("You're out of extra usage"
+even with 89% of cap unused).
+This release fixes the architecture: quiet mode now layers scrubbing
+**on top of** the preset, not instead of it. Same billing path as
+`/v1/messages`, same Claude-Code wrapping, plus the brand-name scrub.
+### Changed
+- **`/quiet/v1/messages` keeps the `claude_code` preset.** The route
+  still applies `scrubAnthropicBody()` to the inbound payload
+  (`system`, `messages[].content`, `tool_result` content,
+  `metadata.user_id`), but the SDK call now uses the same
+  `{ type: 'preset', preset: 'claude_code', append: toolsGuidance }`
+  systemPrompt as the regular `/v1/messages` route. Net effect: a
+  Claude-Code-shaped request that has had brand-name strings
+  substituted, which was the original intent.
+- **Removed the `mode` parameter** from `handleAnthropicNonStreaming`
+  and `handleAnthropicStreaming`. Both handlers are now mode-agnostic;
+  per-route differences (scrubbing) happen at the route level before
+  the handler is invoked.
+### Fixed
+- **"Out of extra usage" 400 on `/quiet/v1/messages`.** Caused by the
+  v0.8.6 systemPrompt-skipping logic; reverted.
+### Notes
+- The scrubbing itself was always working in v0.8.6 (verified
+  standalone). The bug was in the parallel preset-removal change.
+- If you tested v0.8.6 and saw the 400, your account is *not* flagged —
+  the regular `/v1/messages` route works for you, and after this fix,
+  `/quiet/v1/messages` will too.
+## [0.8.6] — 2026-05-01
+Quiet mode — a new request lane that strips third-party agent harness
+identifiers from the request body before the Claude Agent SDK forwards
+to Anthropic.
+Motivation: community reports on X describe Anthropic's API surface
+appearing to scan request bodies for known third-party harness names
+(e.g. "openclaw" appearing in a `package.json` that the agent reads
+into context). The match seems to flip the account into per-token
+extra-usage billing even when the harness isn't actually invoked. Quiet
+mode is a defensive content-rewrite pass that runs in mobygate before
+the SDK call: substitute `openclaw → orchestrator`, `hermes → assistant`,
+`mobius → bot`, etc., so the outbound payload reads as a vanilla
+Anthropic-shape call.
+This is best-effort. Tool results / file contents that the agent reads
+mid-conversation still flow back through the same scrub layer, but
+aggressive scrubbing of arbitrary content can distort the model's
+understanding (e.g. "install openclaw" becoming "install orchestrator"
+breaks shell instructions). Quiet mode is **opt-in via a separate route**
+(`/quiet/v1/messages`) so the default `/v1/messages` path stays
+untouched and faithful.
+### Added
+- **`POST /quiet/v1/messages`** — Anthropic-shape route that scrubs
+  third-party harness names from the request body and switches the SDK
+  from the `claude_code` system-prompt preset to a raw string
+  systemPrompt (so the response classifier sees neither identifying
+  brand strings nor the Claude-Code preset wrapping). Same SDK path,
+  same auth, same Max-OAuth flat billing — just a cleaner outbound
+  payload. Captures still write to `~/.mobygate/captures/` (tagged
+  `path: /quiet/v1/messages`) so you can inspect what got scrubbed.
+- **`lib/quiet.js`** — scrub primitives: `scrubAnthropicBody(body)`
+  (mutates in place across `system`, `messages[].content` strings/blocks,
+  `tool_result` content, and `metadata.user_id`), plus `quietDiagnose(body)`
+  (counts matches without mutating, useful for logging and capture
+  summaries). Word list is configurable via
+  `~/.mobygate/quiet-words.txt` (one `term=replacement` per line); falls
+  back to a sensible default map covering openclaw, hermes, mobius,
+  nous, mobygate, claude-max-proxy and a few variants.
+- **Per-request scrub diagnostic** — every `/quiet/v1/messages` call
+  logs `[quiet] scrubbed N occurrence(s): word×n word×n …` so you can
+  see at a glance whether the route is actually doing work or whether
+  the payload was already clean.
+### Changed
+- **`handleAnthropicNonStreaming` / `handleAnthropicStreaming` accept
+  a `mode` param** — defaults to `'normal'` (current behavior, claude_code
+  preset). When called with `'quiet'`, both handlers compute
+  `sdkSystemPrompt` as a raw string (combining the inbound system block
+  and tools-usage guidance) instead of the preset object. The OpenAI
+  surface (`/v1/chat/completions`) keeps the preset unconditionally.
+### Notes
+- **Tool definitions are not (yet) renamed.** If a client registers a
+  tool literally named `openclaw_send_message`, the name still goes
+  through unchanged. We may add a tool-rename map (with reverse-mapping
+  for tool_use responses) in a later release. For now, rename your
+  tools client-side if you want full quiet coverage.
+- **No response-side scrubbing.** The model rarely emits the brand
+  names unprompted (and if the input was scrubbed it doesn't know them
+  anyway), so we save the round-trip cost.
+- **Scrub is best-effort.** The exhaustive list of vectors Anthropic
+  may match against is unknown; the default word list covers the names
+  reported on X. Add your own via the words file.
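To make the `term=replacement` word list and the per-request count concrete, here is a small sketch. It is not the code in `lib/quiet.js`: `parseWordList` and `scrubWithCounts` are hypothetical names, and case-insensitive substring matching and `#` comment lines are assumptions.

```javascript
// Parse "term=replacement" lines into a Map keyed by lowercased term.
function parseWordList(text) {
  const words = new Map();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith('#')) continue; // comment support is assumed
    const eq = line.indexOf('=');
    if (eq <= 0) continue; // skip malformed lines
    words.set(line.slice(0, eq).trim().toLowerCase(), line.slice(eq + 1).trim());
  }
  return words;
}

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace every term and count matches per term for the diagnostic line.
function scrubWithCounts(input, words) {
  const counts = {};
  let output = input;
  for (const [term, replacement] of words) {
    const re = new RegExp(escapeRegExp(term), 'gi');
    output = output.replace(re, () => {
      counts[term] = (counts[term] || 0) + 1;
      return replacement; // returned from a function, so "$" is not special
    });
  }
  return { output, counts };
}
```

Running this over `Install OpenClaw, then ask Hermes about openclaw.` with the two default pairs named in this entry produces `Install orchestrator, then ask assistant about orchestrator.`, which also shows the shell-instruction distortion the entry warns about.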
+## [0.8.5] — 2026-04-30
+Cost visibility + session TTL bump. Born from a per-channel cost audit
+that found 38.9% of total spend going to "singleton" sessions —
+channels that fired once, idled past the wire-cache TTL, and paid
+full `cache_creation` tax on the next turn. The original v0.8.5 plan
+included passing through Anthropic's 1-hour cache TTL beta header,
+but the Claude Agent SDK doesn't expose that beta or `cache_control`
+markers — so this release does what's actually achievable from the
+mobygate side: visibility + a TTL bump + an upstream feature request.
+### Added
+- **`/dashboard/session-costs` endpoint** — aggregates the
+  `[model-billed]` log lines per session_key. Returns cost/turn,
+  bucket (singleton/short/medium/warm), per-model breakdown, and the
+  first user message for human-readable identification.
+- **Sessions tab in the inspector** — new view at `/inspector` showing
+  cost-per-session sorted descending. Surfaces the singleton-bleed
+  pattern in real-time. Includes a bucket overview (% of total cost in
+  warm vs singleton sessions) so the cache-amortization story jumps
+  out at a glance.
+  This view is built directly on top of v0.8.3's `[model-billed]`
+  log line — the diagnostic logging continues to pay dividends.
+### Changed
+- **Default `SESSION_TTL_MS` raised from 1h → 4h.** Multi-channel
+  Discord users (mobygate's primary use case) typically revisit
+  channels every few hours, and a 1h TTL forced mobygate to issue a
+  fresh `query()` (full prompt re-send) on every wake-up. With 4h,
+  mobygate's session store retains the SDK session ID for four hours,
+  so a request that arrives within that window can resume rather than
+  reissuing the entire prompt prefix.
+  **Caveat:** this only addresses SDK session continuity inside
+  mobygate. Anthropic's wire-side prompt cache is still 5 min by
+  default — the SDK doesn't currently expose the
+  `extended-cache-ttl-2025-04-11` beta header to callers, so the
+  cache_creation tax on truly cold channels (no traffic in 5+ min)
+  is unchanged.
+- **New env var `MOBY_SESSION_TTL_HOURS`** as a more readable override:
+  `MOBY_SESSION_TTL_HOURS=8` is equivalent to setting `SESSION_TTL_MS`
+  to 28800000. Either works; legacy var still respected.
+- **Startup banner** now shows TTL in both minutes and hours:
+  `session TTL  240 min (4.0h)`.
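The relationship between the two environment variables and the banner text can be sketched as below. The function names are hypothetical, and letting `MOBY_SESSION_TTL_HOURS` win when both variables are set is an assumption; the changelog only says that either works.

```javascript
const DEFAULT_TTL_MS = 4 * 60 * 60 * 1000; // 4h default from this release

// Resolve the session TTL from an env-like object.
// Assumption: the hours variable takes precedence over the legacy one.
function resolveSessionTtlMs(env) {
  const hours = Number(env.MOBY_SESSION_TTL_HOURS);
  if (env.MOBY_SESSION_TTL_HOURS !== undefined && Number.isFinite(hours) && hours > 0) {
    return Math.round(hours * 3_600_000);
  }
  const ms = Number(env.SESSION_TTL_MS);
  if (env.SESSION_TTL_MS !== undefined && Number.isFinite(ms) && ms > 0) {
    return ms;
  }
  return DEFAULT_TTL_MS;
}

// Format the TTL the way the startup banner line shows it.
function ttlBanner(ttlMs) {
  const minutes = Math.round(ttlMs / 60_000);
  const hours = (ttlMs / 3_600_000).toFixed(1);
  return `session TTL  ${minutes} min (${hours}h)`;
}
```

With `MOBY_SESSION_TTL_HOURS=8` this resolves to 28800000 ms, matching the equivalence stated above, and the default renders as `session TTL  240 min (4.0h)`.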
+### Notes
+The per-channel audit findings (43-channel installation, ~$48 over
+recent capture window):
+```
+72 singleton sessions  →  $18.61  (38.9% of total cost)
+20 short  (2-3 turns) →  $ 5.25  (11.0%)
+ 7 medium (4-10 turns) →  $ 4.21  ( 8.8%)
+ 6 warm   (11+ turns) →  $19.73  (41.3%)  ← 6 sessions = 41% of value
+```
+Most users don't need to act on this — the fix is upstream in the
+Claude Agent SDK exposing `extended-cache-ttl-2025-04-11`. Until then
+the new sessions view at least makes the bleed visible so users can
+decide which channels to keep warm via an OpenClaw cron ping.
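The bucket boundaries and percentages in the audit table can be reproduced with a few lines of arithmetic. The dollar figures are copied from the table; `bucketFor` and `costShares` are hypothetical names, and how `/dashboard/session-costs` computes its buckets internally is not shown in the changelog.

```javascript
// Bucket boundaries as listed in the audit table.
function bucketFor(turns) {
  if (turns <= 1) return 'singleton';
  if (turns <= 3) return 'short';
  if (turns <= 10) return 'medium';
  return 'warm';
}

// Per-bucket cost in USD, copied from the audit table.
const auditCosts = { singleton: 18.61, short: 5.25, medium: 4.21, warm: 19.73 };

// Compute the total and each bucket's share, rounded to one decimal place.
function costShares(costs) {
  const total = Object.values(costs).reduce((sum, c) => sum + c, 0);
  const shares = {};
  for (const [bucket, cost] of Object.entries(costs)) {
    shares[bucket] = ((100 * cost) / total).toFixed(1);
  }
  return { total: total.toFixed(2), shares };
}
```

Summing the four buckets gives 47.80, consistent with the "~$48" figure, and the singleton share works out to 38.9%.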
+A feature request was filed with the SDK team:
+> "Expose `extended-cache-ttl-2025-04-11` beta header in `Options.betas[]`
+>  and a `cacheTtl: '1h' | '5m'` option on `query()` so multi-tenant
+>  / multi-channel mobygate-style proxies can extend wire-cache lifetime
+>  for sporadically-active sessions."
 ## [0.8.4] — 2026-04-28
 Sonnet 1M context fix — gated billing tier mismatch. v0.8.3's "match