llm-cli-gateway 1.13.2 → 1.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,296 @@
 
  All notable changes to the llm-cli-gateway project.
 
+ ## [1.15.0] - 2026-05-28 — Phase 4 slice λ (gateway-owned worktree lifecycle)
+
+ Ships the tenth Phase 4 slice: a new top-level `worktree` field on every
+ `*_request` and `*_request_async` tool lets a caller run the request
+ inside a dedicated git worktree that the gateway owns and manages for
+ its whole lifecycle. The provider audit listed `-w/--worktree` as a
+ per-CLI flag on Claude / Gemini / Grok; this slice deliberately does
+ **not** wire any `-w` passthrough. Instead, the gateway pre-creates a
+ worktree via `git worktree add`, spawns the child CLI with
+ `cwd: <worktree-path>`, and persists `worktreePath` on
+ `session.metadata` for reuse. All ten tools (five CLIs × two
+ transports, sync + async) share one resolver, so the surface lands as
+ one Zod schema plus one resolver call per tool rather than ten per-CLI
+ argv wirings.
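The reuse rule described above can be sketched in a few lines. This is a minimal illustration, not the gateway's `resolveWorktreeForRequest`: `resolveWorktreeSketch`, `SessionLike`, and the `create` callback (standing in for `git worktree add`) are hypothetical, and sessions live in an in-memory `Map` rather than in `FileSessionManager`.

```typescript
// Hypothetical shape: only the `metadata.worktreePath` key comes from the changelog.
interface SessionLike {
  id: string;
  metadata: Record<string, unknown>;
}

// Stands in for `git worktree add`; returns the new worktree's absolute path.
type CreateWorktree = () => string;

// Reuse the worktree recorded on the session, or create one and persist it.
function resolveWorktreeSketch(
  sessions: Map<string, SessionLike>,
  sessionId: string | undefined,
  create: CreateWorktree,
): string {
  const session = sessionId !== undefined ? sessions.get(sessionId) : undefined;
  const existing = session?.metadata.worktreePath;
  if (typeof existing === "string") {
    return existing; // same-session reuse: no second `git worktree add`
  }
  const path = create();
  if (session) {
    session.metadata.worktreePath = path; // persisted so the next request reuses it
  }
  return path;
}
```

The returned path is what the spawned CLI would receive as its `cwd`; a request without a `sessionId` always gets a fresh worktree in this sketch.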
+
+ ### Added — gateway-owned worktree surface
+
+ - **`WORKTREE_SCHEMA`** (`src/index.ts`): top-level Zod field
+ registered on all ten tools — `claude_request`, `codex_request`,
+ `gemini_request`, `grok_request`, `mistral_request`, plus the five
+ `*_request_async` siblings. Accepts `true` (anonymous UUID worktree
+ at `<repoRoot>/.worktrees/<uuid>` branched from HEAD) or
+ `{ name?, ref? }` (sanitised name and/or explicit git ref).
+ - **`src/worktree-manager.ts`** (new file, 277 lines):
+ `sanitizeWorktreeName` (rejects path traversal — `..`, leading `/`,
+ control chars, length > 64), `createWorktree`
+ (`git rev-parse --verify <ref>` before `git worktree add`,
+ collision detection via `WorktreeCollisionError`, branch-namespaced
+ `gateway/<name>` worktrees), `removeWorktree`
+ (`git worktree remove --force`), and `createWorktreeSessionCleanupHook`
+ (hooks into the session manager).
+ - **`resolveWorktreeForRequest`** (`src/index.ts`): single per-request
+ resolver consumed by every tool handler. When the request carries
+ a `sessionId` and the session already has `metadata.worktreePath`,
+ the worktree is reused (no second `git worktree add`); otherwise a
+ new worktree is created and persisted onto the session via
+ `updateSessionMetadata`. The resolved path is threaded to the
+ executor via the existing `cwd` plumbing.
+ - **`formatWorktreePrefix(path)`** (`src/index.ts:826`): every
+ successful tool result is prefixed with
+ `[gateway] worktree=<absolute-path>\n` so the caller can drive
+ `Bash(cd <path>)`, `Read <path>/...`, etc. The prefix is empty when
+ the request did not use a worktree (zero behaviour change for non-λ
+ callers).
+ - **`Session.metadata` extension** (`src/session-manager.ts`):
+ `worktreePath` + `worktreeName` land on the existing `metadata`
+ bag — no `Session` interface changes. `FileSessionManager` accepts
+ a `cleanupHook` option that fires on `deleteSession` and on
+ TTL-driven eviction; the hook calls `git worktree remove --force`
+ before the session record is dropped.
+ - **`AsyncJobManager` cwd-aware dedup** (`src/async-job-manager.ts`):
+ the dedup key now includes the resolved `cwd`, so two
+ `*_request_async` calls with identical argv but different
+ worktree paths cannot collide (REGRESSIONS Lθ).
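The rejection rules listed for `sanitizeWorktreeName` can be expressed as a small validator. This sketch is hypothetical: it throws a plain `Error` rather than any gateway error type, rejects any `..` substring (stricter than segment-only matching), and treats an empty name as invalid, which the changelog does not state.

```typescript
const MAX_WORKTREE_NAME_LENGTH = 64;

// Sketch of the documented rejection rules; names that pass are returned unchanged.
function sanitizeWorktreeNameSketch(name: string): string {
  // Assumption: empty names are rejected along with names longer than 64 chars.
  if (name.length === 0 || name.length > MAX_WORKTREE_NAME_LENGTH) {
    throw new Error(`worktree name must be 1-${MAX_WORKTREE_NAME_LENGTH} characters`);
  }
  if (name.startsWith("/")) {
    throw new Error("worktree name must not start with '/'");
  }
  // Conservative: any ".." substring is rejected, not only ".." path segments.
  if (name.includes("..")) {
    throw new Error("worktree name must not contain '..'");
  }
  // C0 control characters (U+0000-U+001F) and DEL (U+007F).
  if (/[\u0000-\u001f\u007f]/.test(name)) {
    throw new Error("worktree name must not contain control characters");
  }
  return name;
}
```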
+
+ ### Out of scope — explicitly deferred
+
+ - **Grok's `worktree` subcommand** (separate top-level subcommand
+ on the Grok CLI, distinct from `-w/--worktree`).
+ - **Claude's `--tmux`** (terminal-multiplexer integration).
+ - **Startup sweep of orphaned `.worktrees/*`** — left to future
+ housekeeping; the cleanup hook covers the happy path
+ (session_delete + TTL eviction).
+ - **Multi-repo / submodule semantics** — the gateway assumes a single
+ primary repo at `<repoRoot>`; multi-root behaviour is undefined.
+
+ ### Test surface
+
+ `940 → 989` tests pass (+49):
+
+ - **`src/__tests__/worktree-manager.test.ts`** (new, 26 tests) —
+ unit tests for `sanitizeWorktreeName`, `createWorktree` (including
+ the rev-parse-before-add invariant + `WorktreeCollisionError`),
+ `removeWorktree`, and `createWorktreeSessionCleanupHook`.
+ - **`src/__tests__/test-veracity-regressions-slice-lambda.test.ts`**
+ (new, 23 tests across REGRESSIONS Lα–Lθ + Lψ):
+ - **Lα** — `sanitizeWorktreeName` path-traversal rejection.
+ - **Lβ** — `createWorktree` runs `git rev-parse --verify` BEFORE
+ `git worktree add`.
+ - **Lγ** — `resolveWorktreeForRequest` persists `worktreePath`
+ onto session metadata via `updateSessionMetadata`.
+ - **Lδ** — same-session reuse: the second request with the same
+ `sessionId` skips `git worktree add`.
+ - **Lε** — `FileSessionManager.deleteSession` invokes the cleanup
+ hook (and TTL eviction does too).
+ - **Lζ** — `executor.executeCli` honours the resolved `cwd`.
+ - **Lη** — contract-as-negative-oracle: no CLI receives
+ `-w`/`--worktree` in emitted argv across all five providers
+ (pairs with slice δ's contract-as-positive-oracle).
+ - **Lθ** — `AsyncJobManager` dedup key includes `cwd`.
+ - **Lψ** — `formatWorktreePrefix` envelope shape locked
+ (`[gateway] worktree=<abs>\n`; empty when path missing).
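The envelope locked by Lψ is simple enough to show on both sides of the wire. `formatWorktreePrefixSketch`, `withWorktreePrefix`, and `parseWorktreePrefix` are hypothetical helpers; only the `[gateway] worktree=<abs>\n` shape and the empty-string case come from the changelog.

```typescript
// Gateway side: one header line when a worktree was used, otherwise "".
function formatWorktreePrefixSketch(path: string | undefined): string {
  return path ? `[gateway] worktree=${path}\n` : "";
}

// Prepend the envelope to a tool result's text.
function withWorktreePrefix(result: string, path: string | undefined): string {
  return formatWorktreePrefixSketch(path) + result;
}

// Caller side: recover the path only if the text starts with the envelope.
function parseWorktreePrefix(text: string): string | undefined {
  const match = /^\[gateway\] worktree=(.+)\n/.exec(text);
  return match?.[1];
}
```

Anchoring the match at the start of the text means a path mentioned later in the CLI's own output is never mistaken for the envelope.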
+
+ ### Multi-LLM strict-evidence audit
+
+ Per the standing protocol (`feedback_test_veracity_audit_protocol` and
+ `feedback_multi_llm_review_gate`), the slice was audited in round 1
+ on 2026-05-28 against `docs/plans/slice-lambda.spec.md`.
+
+ **Round 1 outcomes:**
+
+ - Codex: UNCONDITIONAL APPROVE — 9/9 mutation probes RED as
+ predicted; per-probe verbatim assertion text and pre/post-revert
+ test counts. Worktree at `audit/codex-round-1`.
+ - Grok: UNCONDITIONAL APPROVE — 9/9 RED, per-probe verbatim
+ assertion text. Worktree at `audit/grok-round-1`.
+ - Mistral: UNCONDITIONAL APPROVE — 9/9 RED with per-probe
+ failed-count summaries. Worktree at `audit/mistral-round-1`
+ (`5d75099`).
+ - Gemini: **PARTIAL (quota-blocked)** — confirmed Lα–Lε RED (5/9)
+ with assertion text matching the substantive reviewers before
+ `TerminalQuotaError` (4h35m reset window > round budget) forced
+ a stop. No findings, no contradictions.
+ - Claude: **STRUCTURAL BLOCKER** — two `claude_request_async`
+ jobs (`135c05c3-…`, `e411e8cc-…`) stalled silently
+ (`stdoutBytes: 0` for ≥10 minutes); the second produced a
+ 1126-byte fabricated meta-summary with no per-probe evidence,
+ which was rejected per the strict-evidence rule. This is a
+ documented stall pattern, not a defect in slice λ.
+
+ Four out of five independent vendor voices contributed evidence
+ (three full + one partial corroborating) with one documented
+ unfixable structural block, satisfying the slice-δ "4/5 minimum
+ with documented block" bar. The three full audits are unanimous;
+ the partial fourth corroborates without contradiction. Verdict:
+ slice λ passes the gate and ships as v1.15.0.
+
+ Full per-reviewer reports are preserved at
+ `docs/reviews/slice-lambda/{README,round-1-{codex,grok,mistral,gemini,claude}}.md`.
+
+ ### Mechanical anchors (verify with `rg` before relying on them)
+
+ - `src/worktree-manager.ts` — new module, 277 lines.
+ - `src/index.ts` — `WORKTREE_SCHEMA` (`:419-444`),
+ `formatWorktreePrefix` (`:826-828`), `resolveWorktreeForRequest`
+ and per-tool prefix injection (search `formatWorktreePrefix(`), and
+ 10 × `worktree: WORKTREE_SCHEMA.optional()` registrations on
+ every `*_request` / `*_request_async` tool input.
+ - `src/session-manager.ts` — `cleanupHook` plumbing
+ (`:53-90, 318-342`).
+ - `src/async-job-manager.ts` — dedup-key cwd inclusion.
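A dedup key that includes `cwd` (and, since 1.14.0, the stdin payload) might look like the following. The field list, the JSON-array canonicalisation, and SHA-256 hashing are assumptions for illustration; the changelog states only which inputs participate, and `dedupKeySketch` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Hypothetical inputs; the real AsyncJobManager key layout is not documented here.
interface DedupInput {
  cli: string;
  argv: string[];
  cwd?: string;
  stdin?: string;
}

// Hash everything that changes what the child process sees, so identical argv
// run in different worktrees (or fed different stdin) never share a key.
function dedupKeySketch(input: DedupInput): string {
  // A fixed-order array keeps the JSON text canonical.
  const canonical = JSON.stringify([input.cli, input.argv, input.cwd ?? null, input.stdin ?? null]);
  return createHash("sha256").update(canonical).digest("hex");
}
```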
146
+
147
+ ## [1.14.0] - 2026-05-28 — Phase 4 slice κ (Claude explicit `cache_control` via `--input-format stream-json`)
148
+
149
+ Ships the ninth Phase 4 slice. Callers can now opt their stable
150
+ `promptParts` blocks into Anthropic's explicit `cache_control`
151
+ breakpoints — the gateway switches from positional `-p <prompt>` to
152
+ `claude -p --input-format stream-json` and pipes a JSON content-blocks
153
+ payload via stdin. Smoke-test against a live 1-hour-cache-enabled
154
+ account observed a **15,511-token shift from `cache_creation` to
155
+ `cache_read` on the second call, 82 % cost drop, 36 % latency drop**.
156
+
157
+ Seven recommendation commits land alongside the feature (default
158
+ `outputFormat`, auto-emit-from-config, observability split, warning,
159
+ schema mutex, smoke-script gate, tool description) plus three
160
+ falsifiability-tightening commits driven by the multi-LLM review gate.
161
+
162
+ ### Added — slice κ feature
163
+
164
+ - **`PromptParts.cacheControl`** (`src/prompt-parts.ts`): per-block
165
+ boolean opt-in (`system?`/`tools?`/`context?`) with strict Zod
166
+ schema. The `task` field is intentionally never markable — it's the
167
+ volatile tail. Setting any flag activates the κ emission path.
168
+ - **`assembleClaudeCacheBlocks(parts)`** helper (`src/prompt-parts.ts`):
169
+ builds the `{type:"user",message:{role:"user",content:[…]}}` payload
170
+ in `system → tools → context → task` order. Each marked non-empty
171
+ block gets `cache_control: {type:"ephemeral", ttl:"1h"}`. Empty
172
+ parts are silently skipped; markers on empty parts are a no-op.
173
+ - **`prepareClaudeRequest` κ branch** (`src/index.ts`): when the
174
+ caller marks any block AND requests `outputFormat: "stream-json"`,
175
+ argv switches to `-p --input-format stream-json --output-format
176
+ stream-json --include-partial-messages --verbose` with NO positional
177
+ prompt; the prep result carries `stdinPayload` + `cacheControlBlocks`.
178
+ Mixing `cacheControl` with `text`/`json` output returns an
179
+ actionable error instead of silently coercing.
180
+ - **`-p` arity widened** to a new `"optional"` (`src/upstream-contracts.ts`):
181
+ consumes the next token as a value iff it does not start with `-`.
182
+ Preserves the legacy `-p <prompt>` positional form AND validates the
183
+ κ `-p` standalone form. New `--input-format` flag registered with
184
+ `values: ["text","stream-json"]`. New conformance fixture
185
+ `claude-input-format-stream-json` pins the exact κ argv combo.
186
+ - **Executor + AsyncJobManager stdin** (`src/executor.ts`,
187
+ `src/async-job-manager.ts`): both gain `stdin?: string` options.
188
+ When set, stdio[0] switches from `"ignore"` to `"pipe"` and the
189
+ payload is written. The stdin payload participates in the
190
+ AsyncJobManager dedup key — two requests with identical argv but
191
+ different cache_control payloads cannot collide.
192
+ - **Flight recorder migration v4** (`src/flight-recorder.ts`):
193
+ `cache_control_blocks INTEGER` column added idempotently;
194
+ `FlightLogStart.cacheControlBlocks?` persists the per-request
195
+ marker count for cache_state aggregates.
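The block ordering and marker rules for `assembleClaudeCacheBlocks` can be sketched as below. `assembleCacheBlocksSketch`, `PromptPartsSketch`, and `toStdinPayload` are hypothetical names; the `{ type: "text", text, cache_control }` block shape follows Anthropic's Messages API content blocks, and the trailing newline assumes one NDJSON message per line on stdin.

```typescript
// Hypothetical mirror of the PromptParts shape described above.
interface PromptPartsSketch {
  system?: string;
  tools?: string;
  context?: string;
  task: string;
  cacheControl?: { system?: boolean; tools?: boolean; context?: boolean };
}

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl: "1h" };
}

function assembleCacheBlocksSketch(parts: PromptPartsSketch) {
  const content: TextBlock[] = [];
  // Canonical stable order; the task is appended last as the volatile tail.
  for (const key of ["system", "tools", "context"] as const) {
    const text = parts[key];
    if (!text) continue; // empty parts are skipped, so their markers are no-ops
    const block: TextBlock = { type: "text", text };
    if (parts.cacheControl?.[key]) {
      block.cache_control = { type: "ephemeral", ttl: "1h" };
    }
    content.push(block);
  }
  if (parts.task) content.push({ type: "text", text: parts.task }); // never marked
  return { type: "user", message: { role: "user", content } };
}

// One NDJSON line for `claude -p --input-format stream-json` stdin (assumed framing).
const toStdinPayload = (parts: PromptPartsSketch): string =>
  JSON.stringify(assembleCacheBlocksSketch(parts)) + "\n";
```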
196
+
197
+ ### Added — seven recommendations (rec #1..#7)
198
+
199
+ - **Rec #1** — `claude_request` + `claude_request_async` default
200
+ `outputFormat` changes from `"text"` to `"stream-json"`. The gateway
201
+ already parses NDJSON usage events; the prior default routed every
202
+ call through unparseable text, leaving 1,078 historic FR rows with
203
+ NULL tokens. Override to `"text"` still works for callers that
204
+ truly want raw stdout (loses observability).
205
+ - **Rec #2** — `[cache_awareness].emit_anthropic_cache_control`
206
+ config flag is now wired. When enabled AND the caller passes a
207
+ `promptParts` whose stable prefix exceeds the per-model threshold
208
+ (`minStableTokensForModel`), the gateway auto-marks the rightmost
209
+ non-empty stable block (context → tools → system priority) with
210
+ `ttl: "1h"`. Skipped when `optimizePrompt: true` (rec #5 desync
211
+ risk) or `outputFormat !== "stream-json"`.
212
+ - **Rec #3** — `GlobalCacheStats` (`src/cache-stats.ts`) gains five
213
+ derived metrics that distinguish κ-explicit hits from Claude Code's
214
+ baseline cache reads in the same flight-recorder window:
215
+ `explicitCacheControlRows`, `explicitCacheControlHits`,
216
+ `explicitCacheControlHitRate`, `stablePrefixReuseCount`,
217
+ `avgCacheCreationAfterFirstCall` (averaged over rows AFTER the
218
+ first-by-datetime in each stable-prefix reuse group).
219
+ - **Rec #4** — new structured warning `cacheable_prefix_uncached`
220
+ (`src/index.ts`): fires when `promptParts`' stable prefix is above
221
+ the per-model threshold but no `cache_control` breakpoint will be
222
+ emitted (caller didn't set it AND auto-emit also didn't fire). The
223
+ warning includes the measured `stablePrefixTokens`, `threshold`,
224
+ and `reason` (outputFormat-not-streamjson / config-off /
225
+ no-eligible-block). Threaded through both Claude handlers.
226
+ - **Rec #5** — `prepareClaudeRequest` refuses `optimizePrompt: true`
227
+ combined with `promptParts.cacheControl` (`src/index.ts:1455`)
228
+ before optimization runs. Without this mutex the FR `prompt` column
229
+ would log optimized text while Claude actually received raw
230
+ promptParts blocks via stdin, breaking prefix-cache reuse on the
231
+ next call. Actionable error message points the caller at the
232
+ combination to drop.
233
+ - **Rec #6** — new `npm run smoke:cache-control` script
234
+ (`package.json`). Runs `docs/plans/slice-kappa-smoke-test.mjs`,
235
+ which gates on `SMOKE_CACHE_CONTROL=1` env var with a "BILLABLE
236
+ TEST" banner so accidental invocation in CI does not burn live
237
+ Anthropic credit (~$0.08 per run).
238
+ - **Rec #7** — both Claude tools' `promptParts` descriptions now
239
+ explicitly document the `cacheControl` opt-in, the
240
+ `outputFormat: "stream-json"` requirement, the `ttl='1h'`
241
+ hard-code, and the "task is the volatile tail" convention.
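Rec #2's eligibility checks and block choice reduce to a short decision function. Everything here is a hypothetical sketch: `pickAutoMarkBlock` and `AutoEmitInput` are invented names, the stable prefix is assumed to be the plain concatenation of `system`, `tools`, and `context`, auto-emit is assumed to stay off whenever the caller set any marker, and tokens are estimated with the bytes/4 heuristic the flight recorder uses for `stable_prefix_tokens` (rounded down here by assumption).

```typescript
type StableKey = "system" | "tools" | "context";

interface AutoEmitInput {
  parts: Partial<Record<StableKey, string>>;
  callerMarked: boolean; // caller already set a cacheControl flag (assumed to disable auto-emit)
  configEnabled: boolean; // [cache_awareness].emit_anthropic_cache_control
  optimizePrompt: boolean;
  outputFormat: "text" | "json" | "stream-json";
  threshold: number; // per-model minimum stable tokens
}

// bytes/4 heuristic; UTF-8 byte length via TextEncoder.
function estimateTokens(text: string): number {
  return Math.floor(new TextEncoder().encode(text).length / 4);
}

// Returns the block to auto-mark, or undefined when auto-emit should not fire.
function pickAutoMarkBlock(input: AutoEmitInput): StableKey | undefined {
  if (!input.configEnabled || input.callerMarked) return undefined;
  if (input.optimizePrompt || input.outputFormat !== "stream-json") return undefined;
  const stablePrefix = (["system", "tools", "context"] as const)
    .map((k) => input.parts[k] ?? "")
    .join("");
  if (estimateTokens(stablePrefix) <= input.threshold) return undefined;
  // Rightmost non-empty stable block wins: context, then tools, then system.
  for (const key of ["context", "tools", "system"] as const) {
    if (input.parts[key]) return key;
  }
  return undefined;
}
```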
242
+
243
+ ### Tests + multi-LLM review gate
244
+
245
+ `886 → 940` tests pass. 54 new tests across `Kα/Kβ/Kγ/Kδ/Kε/Kζ`
246
+ regression sets + 13 falsifiability-gap closures + 1 SQL-drop
247
+ falsifier strengthening. Every new test is mutation-probe-verified:
248
+ the targeted regression goes red on the predicted mutation.
249
+
250
+ The branch passed a strict-evidence multi-LLM review gate per the
251
+ project's standing protocol (`feedback_multi_llm_review_gate.md` and
252
+ `feedback_test_veracity_audit_protocol.md`). Round 3 was sequential
253
+ to avoid concurrent gateway contention; all four reviewers — Codex
254
+ (`gpt-5.4`), Grok (`grok-build`), Mistral (`mistral-medium-3.5`),
255
+ Claude (`sonnet-4-6`) — issued **UNCONDITIONAL APPROVE** against the
256
+ head with file:line citations and executed mutation probes. The
257
+ iteration trail (Codex round-3 REJECT → fix → recheck APPROVE; Grok
258
+ round-3 REJECT → fix → recheck APPROVE; Mistral + Claude first-pass
259
+ APPROVE) is preserved in commit history (`bea1aee` and `bbc3b5f`).
260
+
261
+ ### Caller-honest framing
262
+
263
+ - κ adds caller-side reuse ON TOP of the irreducible ~10–12K
264
+ `cache_creation` token floor that every fresh `claude -p` session
265
+ rebuilds (Claude Code's session-wrap content). The _added_ benefit
266
+ scales with the caller's stable block size, not the total prompt.
267
+ - The `ttl='1h'` hard-code is mandatory because Anthropic rejects a
268
+ `5m` block after Claude Code's own 1h-marked session blocks; the
269
+ gateway warns if `[cache_awareness].anthropic_ttl_seconds` says 300.
270
+ - Recommended migration: callers running batch / orchestration /
271
+ repeated similar prompts should opt in; callers running one-shot
272
+ ad-hoc prompts won't see benefit.
273
+
274
+ ### Files
275
+
276
+ ```
277
+ src/prompt-parts.ts — PromptParts.cacheControl + assembleClaudeCacheBlocks
278
+ src/index.ts — prepareClaudeRequest κ branch + rec #1/#2/#4/#5/#7 + handler threading
279
+ src/upstream-contracts.ts — arity "optional", --input-format, claude-input-format-stream-json fixture
280
+ src/executor.ts — ExecuteOptions.stdin? threading
281
+ src/async-job-manager.ts — stdin? + dedup-key + cacheControlBlocks plumbing
282
+ src/flight-recorder.ts — migration v4 + cache_control_blocks column
283
+ src/cache-stats.ts — GlobalCacheStats 5 new derived metrics
284
+ package.json — smoke:cache-control script
285
+ docs/plans/slice-kappa.spec.md — audit spec
286
+ docs/plans/slice-kappa-final-review.spec.md — round-3 review spec
287
+ docs/plans/slice-kappa-captures/ — live smoke evidence
288
+ docs/plans/slice-kappa-smoke-test.mjs — billable smoke script (SMOKE_CACHE_CONTROL gated)
289
+ src/__tests__/test-veracity-regressions-slice-kappa.test.ts — 40 κ regressions (Kα/Kβ/Kγ/Kδ/Kε/Kζ)
290
+ src/__tests__/cache-stats.test.ts — +7 rec #3 + SQL-drop falsifier tests
291
+ src/__tests__/prompt-parts-tool-wiring.test.ts — +5 B1/B2/D1/D2 schema falsifiers
292
+ src/__tests__/smoke-script-gate.test.ts — 2 I2 subprocess tests
293
+ ```
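The widened `-p` arity can be illustrated with a toy argv walker. This is not the gateway's contract validator in `src/upstream-contracts.ts`: `parseArgvSketch`, its flag table, and the rule that unknown tokens are errors are assumptions; only the "consume the next token iff it does not start with `-`" behaviour comes from the κ entry.

```typescript
type Arity = "none" | "required" | "optional";

// Hypothetical flag table; only the "-p" optional arity reflects the changelog.
const FLAGS: Record<string, Arity> = {
  "-p": "optional",
  "--input-format": "required",
  "--output-format": "required",
  "--verbose": "none",
  "--include-partial-messages": "none",
};

interface ParsedFlag {
  flag: string;
  value?: string;
}

function parseArgvSketch(argv: string[], flags: Record<string, Arity> = FLAGS): ParsedFlag[] {
  const out: ParsedFlag[] = [];
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    const arity: Arity | undefined = flags[flag];
    if (arity === undefined) throw new Error(`unknown flag: ${flag}`);
    const next: string | undefined = argv[i + 1];
    // "optional" consumes the next token only when it does not look like a flag.
    const takesValue =
      arity === "required" || (arity === "optional" && next !== undefined && !next.startsWith("-"));
    if (takesValue) {
      if (next === undefined) throw new Error(`${flag} requires a value`);
      out.push({ flag, value: next });
      i++; // skip the consumed value
    } else {
      out.push({ flag });
    }
  }
  return out;
}
```

A consequence of the rule: a positional prompt that itself begins with `-` is not consumed as the `-p` value and would be read as a flag instead.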
+
  ## [1.13.2] - 2026-05-27 — Claude stream-json regression fix (--verbose now required)
 
  Patch release. Single user-facing fix to `claude_request` /
@@ -12,7 +302,7 @@ Patch release. Single user-facing fix to `claude_request` /
  - Claude CLI 2.x rejects `--print --output-format=stream-json` without
  `--verbose` ("When using --print, --output-format=stream-json requires
  --verbose"). The gateway was emitting `--output-format stream-json
- --include-partial-messages` without `--verbose`, so every claude
+ --include-partial-messages` without `--verbose`, so every claude
  request configured for stream-json (sync or async) was exiting 1.
  - `prepareClaudeRequest` now pushes `--verbose` as part of the
  stream-json arg group. `--verbose` only affects what claude writes to
@@ -26,7 +316,7 @@ Patch release. Single user-facing fix to `claude_request` /
  recorded in the FR for the first time since the CLI started enforcing
  `--verbose`.
  - Direct CLI verification: `claude -p ... --output-format stream-json
- --verbose --include-partial-messages` returned a clean NDJSON stream
+ --verbose --include-partial-messages` returned a clean NDJSON stream
  with `cache_read_input_tokens: 17978` and
  `cache_creation_input_tokens: 17435` on a 1-hour-cache-enabled
  account. The parser path is correct; only the missing flag was
@@ -36,7 +326,7 @@ Patch release. Single user-facing fix to `claude_request` /
 
  - New regression: `prepareClaudeRequest` emits `--verbose` when
  `outputFormat: "stream-json"` and does NOT emit it for `text` / `json`
- (src/__tests__/claude-handler.test.ts).
+ (src/**tests**/claude-handler.test.ts).
  - Updated `upstream-contracts.test.ts` "accepts a valid Claude argv
  emitted by the gateway" to pin the three-flag combo so a future
  removal of `--verbose` fails at the contract gate.
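The flag grouping this fix enforces can be sketched as a small helper. `outputFormatArgsSketch` is hypothetical, and emitting `--output-format text` for the default format is an assumption; the 1.13.2 entry only pins that `--verbose` travels with `stream-json` and is absent for `text` / `json`.

```typescript
type ClaudeOutputFormat = "text" | "json" | "stream-json";

// Claude CLI 2.x refuses `--print --output-format=stream-json` without
// `--verbose`, so the stream-json group always carries the flag.
function outputFormatArgsSketch(format: ClaudeOutputFormat): string[] {
  if (format === "stream-json") {
    return ["--output-format", "stream-json", "--include-partial-messages", "--verbose"];
  }
  // Assumption: text/json emit only the format flag, never --verbose.
  return ["--output-format", format];
}
```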
@@ -106,7 +396,7 @@ regressions) plus this release commit.
  enumerate). Also settable via the `GROK_SANDBOX` env var. Caller
  responsibility to pass a valid profile name. The slice deliberately
  does **not** integrate `--sandbox` with `approvalStrategy:
- "mcp_managed"` because the value is unbounded — Grok's approval
+ "mcp_managed"` because the value is unbounded — Grok's approval
  semantics are already covered by `permissionMode` + `alwaysApprove` +
  `approvalStrategy`.
  - **`rules`** → `--rules <RULES>`. Supports `@file` prefix per
@@ -172,7 +462,7 @@ parallel with mandatory mutation-probe execution against
 
  - Codex: UNCONDITIONAL APPROVE — all 12 probes [as predicted], all
  26 tests VERIFIED. Baseline (`npm test`: 55 files / 884 tests; build
- + format:check clean; slice file 31/31).
+ - format:check clean; slice file 31/31).
  - Grok: UNCONDITIONAL APPROVE — all 12 probes [as predicted]; ran in
  an isolated worktree at `/tmp/theta-audit-grok` per the slice-ζ
  reviewer-stomping lesson.
@@ -182,8 +472,8 @@ parallel with mandatory mutation-probe execution against
  beyond the spec and closes the "enum-mistake stays silent if fixture
  uses a listed value" gap.
  - Gemini: **FAILED at 10s** with `TerminalQuotaError: You have
- exhausted your capacity on this model. Your quota will reset after
- 52m10s.` (Google 429). Documented quota blocker per protocol clause
+ exhausted your capacity on this model. Your quota will reset after
+ 52m10s.` (Google 429). Documented quota blocker per protocol clause
  5+6 — counts as "concrete unfixable when documented". Four
  substantive valid approves from independent vendor families (OpenAI,
  xAI, Mistral, Anthropic) satisfy the gate.
@@ -352,7 +642,7 @@ this release commit.
  so no extra gating required.
  - Both tools accept a new `jsonSchema` field
  (`string | Record<string, unknown>`). Per `claude --help`, the CLI
- argument is the JSON Schema *literal* (not a path; contrast with Codex
+ argument is the JSON Schema _literal_ (not a path; contrast with Codex
  `--output-schema`). Object values are `JSON.stringify`-d; string values
  pass verbatim. Use with `outputFormat: "json"` for structured output
  validation. Achieves Codex parity for structured-output validation
@@ -650,7 +940,7 @@ for the async tools and the codex CLI.
  already terminated before the arm signal landed.
  - `JobStore.markOrphanedOnStartup()` return shape extended from `number`
  to `{ count, orphaned: Array<{ id, correlationId, startedAt, stdout,
- stderr, exitCode }> }` so the manager constructor can write FR
+ stderr, exitCode }> }` so the manager constructor can write FR
  `logComplete` rows for previously orphaned jobs with proper audit data
  (durationMs from `startedAt`, response from `stderr || stdout`,
  errorMessage `"orphaned after gateway restart"`). `SqliteJobStore`
@@ -782,8 +1072,9 @@ Pure documentation release; zero source-code changes since 1.6.0.
  ### Fixed — `docs/launch/blog-cache-awareness.md` accuracy + voice
 
  Technical corrections from the multi-LLM voice + technical review:
+
  - Mutually-exclusive error-string quotation reformatted so the
- ``provide exactly one of `prompt` or `promptParts``` example renders
+ ``provide exactly one of `prompt`or`promptParts``` example renders
  correctly in markdown.
  - `lastWriteAt` references corrected to `lastRequestAt` (the actual
  public field name on `SessionCacheStats`).
@@ -854,8 +1145,7 @@ Also includes (beyond cache-awareness):
  The gateway concatenates in canonical order (`system → tools → context → task`)
  so the stable prefix bytes precede the volatile task tail unchanged across
  calls — raising implicit cache hit rate without calling provider cache APIs.
- The exact error strings `provide exactly one of \`prompt\` or \`promptParts\``
- and `one of \`prompt\` or \`promptParts\` is required` are stable API
+ The exact error strings `provide exactly one of \`prompt\` or \`promptParts\``and`one of \`prompt\` or \`promptParts\` is required` are stable API
  contract.
  - **Flight-recorder v3 migration**: new columns `stable_prefix_hash`
  (sha256) and `stable_prefix_tokens` (integer bytes/4 heuristic) on
@@ -886,9 +1176,9 @@ Also includes (beyond cache-awareness):
  - `warn_on_ttl_expiry = false`
  - `[cache_awareness.min_stable_tokens_for_cache_control]` per-family
  table (sonnet=1024, opus=4096, haiku=4096, default=4096).
- Validated by a separate Zod schema and loader (`loadCacheAwarenessConfig`);
- a malformed `[cache_awareness]` block does NOT break `loadPersistenceConfig`
- and vice versa. No env-var overrides.
+ Validated by a separate Zod schema and loader (`loadCacheAwarenessConfig`);
+ a malformed `[cache_awareness]` block does NOT break `loadPersistenceConfig`
+ and vice versa. No env-var overrides.
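The per-family threshold table can be read with a lookup like the one below. `minStableTokensForModelSketch` is hypothetical, as is matching the family by substring of the model id; the default numbers are the ones listed in the table entry.

```typescript
// Defaults from the [cache_awareness.min_stable_tokens_for_cache_control] table.
const MIN_STABLE_TOKENS: Record<string, number> = {
  sonnet: 1024,
  opus: 4096,
  haiku: 4096,
  default: 4096,
};

// Assumption: the family is detected by substring match on the model id.
function minStableTokensForModelSketch(
  model: string | undefined,
  table: Record<string, number> = MIN_STABLE_TOKENS,
): number {
  const id = (model ?? "").toLowerCase();
  for (const family of ["sonnet", "opus", "haiku"]) {
    if (id.includes(family) && table[family] !== undefined) return table[family];
  }
  return table.default ?? 4096;
}
```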
 
  ### Decision: Branch B (prefix-discipline only) for slice 1
 
@@ -1208,6 +1498,7 @@ Lands DAG layers 6-12 — the personal-MCP MVP terminal plus all of Phase 0-3 pr
  - **No self-update** — `cli_upgrade --cli mistral` detects pip / uv / brew via probes and dispatches to `pip install -U vibe-cli`, `uv tool upgrade vibe-cli`, or `brew upgrade mistral-vibe`. Unknown installations return an actionable error rather than running a non-existent `vibe update`.
 
  Other surfaces extended: `SESSION_PROVIDER_VALUES` now includes `"mistral"`; `list_models`, `cli_versions`, `cli_upgrade`, `approval_list`, `session_create`, `session_list`, and `session_clear_all` accept the fifth provider; new MCP resources `sessions://mistral` and `models://mistral` are registered; `validate_with_models` / `consensus_check` / `red_team_review` can route to Mistral.
+
  - **U23 — JSON output + token/cost parity across providers.** New `src/codex-json-parser.ts` parses the Codex `--json` JSONL event stream (`thread.started`, `turn.started`/`completed`/`failed`, `item.*`, `error`); lenient against partial streams and garbage preamble. New `src/gemini-json-parser.ts` parses `gemini -o json` output and maps `usageMetadata.{promptTokenCount, candidatesTokenCount, cachedContentTokenCount}`. `extractUsageAndCost` is now a thin per-provider dispatcher returning `{inputTokens, outputTokens, cacheReadTokens?, cacheCreationTokens?, costUsd?}` for every provider that supports JSON; Claude `cache_read_input_tokens` / `cache_creation_input_tokens` are now plumbed through instead of being discarded. `codex_request`, `codex_request_async`, `gemini_request`, and `gemini_request_async` now expose `outputFormat: enum("text","json")` — set to `"json"` and the gateway emits `--json` (Codex) or `-o json` (Gemini) and forwards parsed usage/cost into the flight recorder. Flight-recorder schema gains `cache_read_tokens` and `cache_creation_tokens` columns via idempotent migration (`PRAGMA table_info` → `ALTER TABLE ADD COLUMN`); existing `logs.db` files are upgraded in place. 15 new tests.
  - **U24 — Permission/approval-mode parity across providers.** Claude `permissionMode` enum (`default | acceptEdits | plan | auto | dontAsk | bypassPermissions`) replaces the boolean `dangerouslySkipPermissions` (the boolean still works and now maps to `permissionMode: "bypassPermissions"`; setting both logs a warning, `permissionMode` wins). Gemini `approvalMode` gains `plan`. Codex splits `--full-auto` into `sandboxMode: enum("read-only","workspace-write","danger-full-access")` and `askForApproval: enum("untrusted","on-request","never")`, emitting `--sandbox <mode>` and `--ask-for-approval <mode>` independently; legacy `fullAuto: true` still works and expands to `--sandbox workspace-write --ask-for-approval never` by default, with `useLegacyFullAutoFlag: true` as an explicit escape hatch to emit `--full-auto` directly. Codex resume mode filters all three flags (`--full-auto`, `--sandbox`, `--ask-for-approval`) since `codex exec resume` inherits the session's policy. 26 new tests.
  - **U25 — Claude high-impact features.** `claude_request` / `claude_request_async` schemas gain `agent?: string` (single sub-agent dispatch), `agents?: Record<string, object>` (multi-agent JSON, validated against `CLAUDE_AGENT_DEFINITION_SCHEMA` before emit), `forkSession?: boolean`, `systemPrompt?: string`, `appendSystemPrompt?: string` (mutually exclusive at the schema + tool-callback boundary), `maxBudgetUsd?: number`, `maxTurns?: number`, `effort?: enum("low","medium","high","xhigh","max")`, and `excludeDynamicSystemPromptSections?: boolean`. Each emits the documented `--<flag>` form. 25 new tests in `src/__tests__/claude-handler.test.ts`.
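The U24 Codex flag mapping can be summarised as a pure function. `codexPolicyArgsSketch` and `CodexPolicyOptions` are hypothetical, and letting explicit `sandboxMode` / `askForApproval` values override the `fullAuto` expansion is an assumption the entry does not spell out.

```typescript
type SandboxMode = "read-only" | "workspace-write" | "danger-full-access";
type AskForApproval = "untrusted" | "on-request" | "never";

interface CodexPolicyOptions {
  sandboxMode?: SandboxMode;
  askForApproval?: AskForApproval;
  fullAuto?: boolean;
  useLegacyFullAutoFlag?: boolean;
  resume?: boolean; // `codex exec resume` inherits the session's policy
}

function codexPolicyArgsSketch(opts: CodexPolicyOptions): string[] {
  if (opts.resume) return []; // resume filters --full-auto, --sandbox and --ask-for-approval
  if (opts.fullAuto && opts.useLegacyFullAutoFlag) return ["--full-auto"]; // escape hatch
  // Assumption: explicit fields take precedence over the fullAuto expansion.
  const sandbox = opts.sandboxMode ?? (opts.fullAuto ? "workspace-write" : undefined);
  const approval = opts.askForApproval ?? (opts.fullAuto ? "never" : undefined);
  const args: string[] = [];
  if (sandbox) args.push("--sandbox", sandbox);
  if (approval) args.push("--ask-for-approval", approval);
  return args;
}
```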
@@ -1300,7 +1591,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 
  ### Fixed
 
- - **SIGTERM→SIGKILL escalation bug** — `proc.killed` becomes `true` after `.kill()` is *called*, not after the process *exits*, so the SIGKILL guard (`if (!proc.killed)`) was always false. Replaced with an `exited` flag set by `close`/`error` events in both `executor.ts` and `async-job-manager.ts`
+ - **SIGTERM→SIGKILL escalation bug** — `proc.killed` becomes `true` after `.kill()` is _called_, not after the process _exits_, so the SIGKILL guard (`if (!proc.killed)`) was always false. Replaced with an `exited` flag set by `close`/`error` events in both `executor.ts` and `async-job-manager.ts`
  - **Timer priority race** — When both `timeout` and `idleTimeout` are set, idle timeout now clears the wall-clock timer to prevent `timedOut` from overriding `idledOut` in the close handler (which would misclassify code 125 as transient code 124)
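The escalation fix can be sketched with Node's `child_process` API. `terminateWithEscalation` and its `graceMs` parameter are hypothetical; the point carried over from the entry is that the SIGKILL guard checks a flag set by `close` / `error` events instead of `proc.killed`, which only records that `kill()` was called.

```typescript
import type { ChildProcess } from "node:child_process";

// Send SIGTERM, then SIGKILL after graceMs if the child has not really exited.
// `proc.killed` turns true as soon as kill() is called, so it cannot tell us
// whether the process is gone; track the actual exit with a local flag instead.
function terminateWithEscalation(proc: ChildProcess, graceMs: number): void {
  let exited = proc.exitCode !== null || proc.signalCode !== null;
  if (exited) return;
  const markExited = (): void => {
    exited = true;
  };
  proc.once("close", markExited);
  proc.once("error", markExited);
  proc.kill("SIGTERM");
  const timer = setTimeout(() => {
    if (!exited) proc.kill("SIGKILL"); // guard on real exit, not on proc.killed
  }, graceMs);
  proc.once("close", () => clearTimeout(timer));
}
```

Checking `exitCode` / `signalCode` up front covers a child that had already exited when the helper is called, because a `close` event that was already emitted will not fire again for listeners added afterwards.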
 
  ### Added
@@ -1385,6 +1676,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  ## Core Features
 
  ### Multi-LLM Orchestration
+
  - **3 CLI tools supported**: Claude Code, Codex, Gemini
  - **Unified MCP interface**: Single protocol for all LLMs
  - **Cross-tool collaboration**: LLMs can use each other via MCP
@@ -1392,6 +1684,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - **Correlation ID tracking**: Full request tracing
 
  ### Token Optimization
+
  - **Auto-optimization middleware**: 44% reduction on prompts, 37% on responses
  - **15+ optimization patterns**: Remove filler, compact types, arrow notation
  - **Opt-in feature**: `optimizePrompt` and `optimizeResponse` flags
@@ -1399,6 +1692,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - **Research-backed**: 42 sources, best practices documented
 
  ### Reliability & Performance
+
  - **Retry logic**: Exponential backoff with circuit breaker
  - **Atomic file writes**: Process-specific temp files with fsync
  - **Memory limits**: 50MB cap on CLI output prevents DoS
@@ -1406,6 +1700,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - **Non-zero exit code handling**: Proper retry behavior
 
  ### Security Hardening
+
  - **No secret leakage**: Generic session descriptions only
  - **File permissions**: 0o600 on sensitive files
  - **No ReDoS vulnerabilities**: Bounded regex patterns
@@ -1414,6 +1709,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - **Custom storage paths**: Secure directory creation
 
  ### Testing & Quality
+
  - **114 tests**: 68 unit, 41 integration, 5 optimizer
  - **Real CLI integration**: Not mocks
  - **Regression tests**: ReDoS, schema validation, retry behavior
@@ -1421,6 +1717,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - **Edge case coverage**: Timeouts, errors, concurrency
 
  ### Documentation Excellence
+
  - **7 comprehensive guides**: 4,000+ lines total
  - **Research-backed**: TOKEN_OPTIMIZATION_GUIDE.md with 42 sources
  - **Real-world examples**: PROMPT_OPTIMIZATION_EXAMPLES.md with 5 examples
@@ -1432,6 +1729,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  ## Added
 
  ### Features
+
  - Multi-LLM CLI orchestration via MCP
  - Session management with persistence
  - Correlation ID tracking for request tracing
@@ -1443,6 +1741,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - Custom storage path support
 
  ### Tools (MCP)
+
  - `claude_request` - Execute Claude Code CLI
  - `codex_request` - Execute Codex CLI
  - `gemini_request` - Execute Gemini CLI
@@ -1456,6 +1755,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - `list_models` - List available models for each CLI
 
  ### Resources (MCP)
+
  - `sessions://all` - All sessions across CLIs
  - `sessions://claude` - Claude-specific sessions
  - `sessions://codex` - Codex-specific sessions
@@ -1464,6 +1764,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - `metrics://performance` - Performance metrics and stats
 
  ### Documentation
+
  - `README.md` - Installation and usage guide
  - `BEST_PRACTICES.md` - Design and implementation patterns
  - `TOKEN_OPTIMIZATION_GUIDE.md` - Research-backed optimization techniques (42 sources)
@@ -1477,6 +1778,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - `CROSS_TOOL_SUCCESS.md` - Cross-LLM collaboration validation
 
  ### Tests
+
  - 68 unit tests (executor, sessions, metrics, optimizer)
  - 41 integration tests (full MCP with real CLIs)
1482
1784
  - 5 optimizer tests (pattern validation, ReDoS prevention)
@@ -1489,6 +1791,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1489
1791
  ### First Review Round (8 bugs)
1490
1792
 
1491
1793
  **Critical:**
1794
+
1492
1795
  1. **session_set_active schema mismatch** (src/index.ts:430)
1493
1796
  - Issue: Documentation said "null to clear" but z.string() rejected null
1494
1797
  - Fix: Changed to z.string().nullable()
@@ -1504,12 +1807,12 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1504
1807
  - Fix: Integrated withRetry + CircuitBreaker into executeCli
1505
1808
  - Impact: Transient failures now retried automatically
1506
1809
 
1507
- **Medium:**
1508
- 4. **Integration test brittleness**
1509
- - Issue: Tests failed without dist/ or CLIs installed
1510
- - Fix: Tests properly skip when CLIs unavailable
1810
+ **Medium:** 4. **Integration test brittleness**
1511
1811
 
1512
- 5. **Test timing issues** (src/__tests__/session-manager.test.ts:216,429)
1812
+ - Issue: Tests failed without dist/ or CLIs installed
1813
+ - Fix: Tests properly skip when CLIs unavailable
1814
+
1815
+ 5. **Test timing issues** (src/**tests**/session-manager.test.ts:216,429)
1513
1816
  - Issue: setTimeout not awaited → false positives
1514
1817
  - Fix: Proper async/await patterns
1515
1818
 
@@ -1517,10 +1820,10 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1517
1820
  - Issue: All stdout/stderr buffered in memory with no cap
1518
1821
  - Fix: Added 50MB limit with early termination
1519
1822
 
1520
- **Low:**
1521
- 7. **Model data duplication** (src/index.ts:64, src/resources.ts:22)
1522
- - Issue: CLI_INFO defined in two places
1523
- - Fix: Centralized in single location
1823
+ **Low:** 7. **Model data duplication** (src/index.ts:64, src/resources.ts:22)
1824
+
1825
+ - Issue: CLI_INFO defined in two places
1826
+ - Fix: Centralized in single location
1524
1827
 
1525
1828
  8. **Unused code** (src/resources.ts:33)
1526
1829
  - Issue: listResources() never called
@@ -1529,27 +1832,28 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1529
1832
  ### Second Review Round (8 bugs)
1530
1833
 
1531
1834
  **Critical:**
1835
+
1532
1836
  1. **Secret leakage via session descriptions** (src/index.ts + src/session-manager.ts)
1533
1837
  - Issue: First 50 chars of prompts stored in plain text
1534
1838
  - Fix: Generic descriptions ("Claude Session"), file permissions 0o600
1535
1839
  - Impact: No user data exposed in session files
1536
1840
 
1537
- **High:**
1538
- 2. **ReDoS in optimizer regex** (src/optimizer.ts:241,244)
1539
- - Issue: Catastrophic backtracking with .+? patterns
1540
- - Fix: Bounded character sets [A-Za-z][\w-]*
1541
- - Impact: No DoS from malicious prompts
1841
+ **High:** 2. **ReDoS in optimizer regex** (src/optimizer.ts:241,244)
1842
+
1843
+ - Issue: Catastrophic backtracking with .+? patterns
1844
+ - Fix: Bounded character sets [A-Za-z][\w-]\*
1845
+ - Impact: No DoS from malicious prompts
1542
1846
 
1543
1847
  3. **Custom storage path directory not created** (src/session-manager.ts:36)
1544
1848
  - Issue: ensureStorageDirectory only created default path
1545
1849
  - Fix: Create dirname(storagePath) for custom paths
1546
1850
  - Impact: Custom storage paths work without errors
1547
1851
 
1548
- **Medium:**
1549
- 4. **Atomic write temp filename collision** (src/session-manager.ts:57)
1550
- - Issue: All processes used same .tmp filename
1551
- - Fix: Process-specific temp files (sessions.json.tmp.${process.pid})
1552
- - Impact: Safe multi-process deployments
1852
+ **Medium:** 4. **Atomic write temp filename collision** (src/session-manager.ts:57)
1853
+
1854
+ - Issue: All processes used same .tmp filename
1855
+ - Fix: Process-specific temp files (sessions.json.tmp.${process.pid})
1856
+ - Impact: Safe multi-process deployments
1553
1857
 
1554
1858
  5. **Retry doesn't handle non-zero exit codes** (src/executor.ts:99)
1555
1859
  - Issue: Only thrown errors triggered retry
@@ -1561,11 +1865,11 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1561
1865
  - Fix: 50MB limit with process termination
1562
1866
  - Impact: DoS prevention
1563
1867
 
1564
- **Low:**
1565
- 7. **Performance overhead from NVM scanning** (src/executor.ts:41)
1566
- - Issue: Filesystem scan on every request
1567
- - Fix: Cache NVM path at module load
1568
- - Impact: Performance improvement
1868
+ **Low:** 7. **Performance overhead from NVM scanning** (src/executor.ts:41)
1869
+
1870
+ - Issue: Filesystem scan on every request
1871
+ - Fix: Cache NVM path at module load
1872
+ - Impact: Performance improvement
1569
1873
 
1570
1874
  8. **Unused imports** (src/session-manager.ts:4, src/executor.ts:7)
1571
1875
  - Issue: Dead code and unused parameters
@@ -1577,6 +1881,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1577
1881
  ## Security
1578
1882
 
1579
1883
  ### Vulnerabilities Fixed
1884
+
1580
1885
  - ✅ **Secret leakage**: No user data in session descriptions
1581
1886
  - ✅ **File permissions**: 0o600 on sessions.json
1582
1887
  - ✅ **ReDoS**: Bounded regex patterns prevent DoS
@@ -1585,6 +1890,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1585
1890
  - ✅ **Command injection**: Already prevented via spawn with args
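The command-injection note and the 50MB output cap both concern how the child CLI is launched. Below is a minimal sketch of both. It uses Node's `child_process.spawn` with an argument array and no shell. `runCli` and `CappedOutput` are hypothetical names. In this sketch, each of stdout and stderr gets its own cap, and the child is killed as soon as either cap would be exceeded.

```typescript
import { spawn } from "node:child_process";

// Cap taken from the changelog's "50MB" figure; helper names are hypothetical.
const MAX_OUTPUT_BYTES = 50 * 1024 * 1024;

class CappedOutput {
  private readonly chunks: Buffer[] = [];
  private size = 0;

  constructor(private readonly limit: number) {}

  // Keep the chunk if it fits under the cap; return false once the cap would be exceeded.
  push(chunk: Buffer): boolean {
    if (this.size + chunk.length > this.limit) return false;
    this.chunks.push(chunk);
    this.size += chunk.length;
    return true;
  }

  toString(): string {
    return Buffer.concat(this.chunks).toString("utf8");
  }
}

interface CliResult {
  code: number | null;
  stdout: string;
  stderr: string;
  truncated: boolean;
}

function runCli(command: string, args: string[], limit = MAX_OUTPUT_BYTES): Promise<CliResult> {
  return new Promise((resolve, reject) => {
    // Arguments go in as an array with no shell, so a prompt containing `; rm -rf ~`
    // reaches the CLI as literal text instead of being interpreted.
    const child = spawn(command, args, { shell: false, stdio: ["ignore", "pipe", "pipe"] });
    const stdout = new CappedOutput(limit); // each stream gets its own cap here
    const stderr = new CappedOutput(limit);
    let truncated = false;

    const collect = (sink: CappedOutput) => (chunk: Buffer) => {
      if (truncated) return;
      if (!sink.push(chunk)) {
        truncated = true;
        child.kill("SIGTERM"); // early termination instead of buffering without bound
      }
    };
    child.stdout.on("data", collect(stdout));
    child.stderr.on("data", collect(stderr));
    child.on("error", reject);
    child.on("close", (code) =>
      resolve({ code, stdout: stdout.toString(), stderr: stderr.toString(), truncated }),
    );
  });
}
```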
1586
1891
 
1587
1892
  ### Security Best Practices
1893
+
1588
1894
  - Input validation with Zod schemas
1589
1895
  - No stack trace leakage in errors
1590
1896
  - Atomic file writes with fsync
@@ -1596,6 +1902,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1596
1902
  ## Performance
1597
1903
 
1598
1904
  ### Optimizations Added
1905
+
1599
1906
  - **Token optimization**: 44% reduction on prompts, 37% on responses
1600
1907
  - **NVM path caching**: Eliminates I/O on every request
1601
1908
  - **Circuit breaker**: Fast-fail during outages
@@ -1603,6 +1910,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1603
1910
  - **Memory limits**: Prevents resource exhaustion
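NVM path caching means doing the filesystem scan once, at module load, rather than on every request. Below is a minimal sketch of that approach. It assumes nvm's usual layout, `$NVM_DIR/versions/node/vX.Y.Z/bin`, with `$NVM_DIR` commonly `~/.nvm`. It also assumes a hypothetical policy of picking the highest installed version; the changelog does not state the gateway's actual selection rule. `scanNvmBinDir` and `NVM_BIN_DIR` are illustrative names.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical scanner: returns <nvmDir>/versions/node/<highest version>/bin, or null.
function scanNvmBinDir(nvmDir: string): string | null {
  const versionsDir = path.join(nvmDir, "versions", "node");
  let entries: string[];
  try {
    entries = fs.readdirSync(versionsDir);
  } catch {
    return null; // nvm not installed at this location
  }
  const versions = entries.filter((name) => /^v\d+\.\d+\.\d+$/.test(name));
  if (versions.length === 0) return null;

  // Compare numerically so v20.10.0 ranks above v20.9.0 (a string sort would not).
  const parts = (v: string) => v.slice(1).split(".").map(Number);
  versions.sort((a, b) => {
    const pa = parts(a);
    const pb = parts(b);
    for (let i = 0; i < 3; i++) {
      if (pa[i] !== pb[i]) return pb[i] - pa[i];
    }
    return 0;
  });
  return path.join(versionsDir, versions[0], "bin");
}

// Resolved once, when the module is first imported; request handlers reuse this
// constant instead of touching the filesystem on every call.
const NVM_BIN_DIR: string | null = scanNvmBinDir(
  process.env.NVM_DIR ?? path.join(process.env.HOME ?? "", ".nvm"),
);
```

The trade-off is that a Node version installed while the gateway is running is not picked up until restart. That is usually acceptable for a long-lived server.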
1604
1911
 
1605
1912
  ### Metrics
1913
+
1606
1914
  - Request counts per CLI tool
1607
1915
  - Response times with percentiles
1608
1916
  - Success/failure rates
@@ -1614,6 +1922,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1614
1922
  ## Testing
1615
1923
 
1616
1924
  ### Test Growth
1925
+
1617
1926
  - **Initial**: 104 tests
1618
1927
  - **After first fixes**: 109 tests (+5 from retry integration)
1619
1928
  - **After optimizer**: 113 tests (+4 from optimizer)
@@ -1621,6 +1930,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1621
1930
  - **Growth**: +10 tests (9.6% increase)
1622
1931
 
1623
1932
  ### Coverage Areas
1933
+
1624
1934
  - Unit: Executor, session manager, metrics, optimizer
1625
1935
  - Integration: Full MCP protocol with real CLI execution
1626
1936
  - Regression: Schema validation, ReDoS, retry behavior
@@ -1631,6 +1941,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1631
1941
  ## Documentation
1632
1942
 
1633
1943
  ### Guides Created
1944
+
1634
1945
  1. **README.md** - Installation, usage, API reference
1635
1946
  2. **BEST_PRACTICES.md** - Design patterns and architecture
1636
1947
  3. **TOKEN_OPTIMIZATION_GUIDE.md** - Research (42 sources)
@@ -1644,6 +1955,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1644
1955
  11. **CROSS_TOOL_SUCCESS.md** - Collaboration proof
1645
1956
 
1646
1957
  ### Total Documentation
1958
+
1647
1959
  - **11 comprehensive files**
1648
1960
  - **~8,000 lines** of documentation
1649
1961
  - **Research-backed** with citations
@@ -1654,17 +1966,20 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1654
1966
  ## Dogfooding Validation
1655
1967
 
1656
1968
  ### Multi-LLM Review Process
1969
+
1657
1970
  - **Claude Sonnet 4.5**: Strategic/product review (8.5/10 → 10/10)
1658
1971
  - **Codex**: Bug finding and implementation (13 bugs found, 13 fixed)
1659
1972
  - **Gemini 2.5 Pro**: Security analysis (3 critical issues found, 3 fixed)
1660
1973
 
1661
1974
  ### Self-Improvement Cycle
1975
+
1662
1976
  1. ✅ Multi-LLM review found 16 bugs
1663
1977
  2. ✅ Codex fixed all bugs via MCP
1664
1978
  3. ✅ Gateway validated fixes via test suite
1665
1979
  4. ✅ Complete autonomous improvement demonstrated
1666
1980
 
1667
1981
  ### Workflow Validated
1982
+
1668
1983
  ```
1669
1984
  Implement (Codex) → Review (Gemini) → Fix (Codex) → Verify (Tests) → Iterate
1670
1985
  ```
@@ -1674,41 +1989,45 @@ Implement (Codex) → Review (Gemini) → Fix (Codex) → Verify (Tests) → Ite
1674
1989
  ## Migration Guide
1675
1990
 
1676
1991
  ### Breaking Changes
1992
+
1677
1993
  None - This is the first release.
1678
1994
 
1679
1995
  ### New Features to Adopt
1680
1996
 
1681
1997
  **1. Token Optimization** (Optional, Opt-in)
1998
+
1682
1999
  ```typescript
1683
2000
  // Enable prompt optimization
1684
2001
  await callTool("codex_request", {
1685
2002
  prompt: "Your verbose prompt...",
1686
- optimizePrompt: true // 44% token reduction
2003
+ optimizePrompt: true, // 44% token reduction
1687
2004
  });
1688
2005
 
1689
2006
  // Enable response optimization
1690
2007
  await callTool("claude_request", {
1691
2008
  prompt: "Generate docs...",
1692
- optimizeResponse: true // 37% token reduction
2009
+ optimizeResponse: true, // 37% token reduction
1693
2010
  });
1694
2011
  ```
1695
2012
 
1696
2013
  **2. Session Management**
2014
+
1697
2015
  ```typescript
1698
2016
  // Create and use sessions
1699
2017
  const session = await callTool("session_create", {
1700
2018
  cli: "claude",
1701
- description: "My coding session"
2019
+ description: "My coding session",
1702
2020
  });
1703
2021
 
1704
2022
  // Continue conversations
1705
2023
  await callTool("claude_request", {
1706
2024
  prompt: "Continue from previous context",
1707
- sessionId: session.id
2025
+ sessionId: session.id,
1708
2026
  });
1709
2027
  ```
1710
2028
 
1711
2029
  **3. Correlation IDs** (Automatic)
2030
+
1712
2031
  ```typescript
1713
2032
  // Automatically generated for tracing
1714
2033
  // Check logs: [corrId] prefix on all log lines
@@ -1719,6 +2038,7 @@ await callTool("claude_request", {
1719
2038
  ## Known Limitations
1720
2039
 
1721
2040
  ### Documented Constraints
2041
+
1722
2042
  1. **Multi-level orchestration unsupported**
1723
2043
  - Nested MCP connections fail
1724
2044
  - LLMs can't spawn sub-LLMs via gateway
@@ -1733,6 +2053,7 @@ await callTool("claude_request", {
1733
2053
  - Consider encryption for sensitive data (future)
1734
2054
 
1735
2055
  ### Future Enhancements
2056
+
1736
2057
  - Session encryption at rest
1737
2058
  - Session TTL and automatic cleanup
1738
2059
  - Redis/DynamoDB backend for horizontal scaling
@@ -1745,16 +2066,19 @@ await callTool("claude_request", {
1745
2066
  ## Credits
1746
2067
 
1747
2068
  ### Development
2069
+
1748
2070
  - **Architecture & Orchestration**: Claude Sonnet 4.5
1749
2071
  - **Implementation & Bug Fixes**: Codex via llm-cli-gateway MCP
1750
2072
  - **Security Analysis**: Gemini 2.5 Pro via llm-cli-gateway MCP
1751
2073
 
1752
2074
  ### Research
2075
+
1753
2076
  - Token optimization: 42 research sources (2025-2026)
1754
2077
  - Compression validation: Compel paper (OpenReview 2025)
1755
2078
  - Best practices: Industry standards + dogfooding
1756
2079
 
1757
2080
  ### Validation
2081
+
1758
2082
  - **Self-dogfooding**: Gateway reviewed and fixed itself
1759
2083
  - **Multi-LLM collaboration**: 3 LLMs working via MCP
1760
2084
  - **Iterative quality**: 2 review rounds, 16 bugs found and fixed
@@ -1764,6 +2088,7 @@ await callTool("claude_request", {
1764
2088
  ## Statistics
1765
2089
 
1766
2090
  ### Development Timeline
2091
+
1767
2092
  - **Total time**: ~2.5 hours (from first review to 100% bug-free)
1768
2093
  - **Review rounds**: 2 comprehensive multi-LLM reviews
1769
2094
  - **Bugs found**: 16 total
@@ -1771,12 +2096,14 @@ await callTool("claude_request", {
1771
2096
  - **Test growth**: 104 → 114 tests (+9.6%)
1772
2097
 
1773
2098
  ### Code Metrics
2099
+
1774
2100
  - **Files modified**: 12 files
1775
2101
  - **Lines added**: ~2,500 lines
1776
2102
  - **Documentation**: ~8,000 lines (11 files)
1777
2103
  - **Test coverage**: 114 tests across unit/integration/regression
1778
2104
 
1779
2105
  ### Quality Metrics
2106
+
1780
2107
  - **Bug-free rate**: 100%
1781
2108
  - **Test pass rate**: 100%
1782
2109
  - **Build success**: ✅