llm-cli-gateway 1.14.0 → 1.15.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,148 @@
2
2
 
3
3
  All notable changes to the llm-cli-gateway project.
4
4
 
5
+ ## [1.15.0] - 2026-05-28 — Phase 4 slice λ (gateway-owned worktree lifecycle)
6
+
7
+ Ships the tenth Phase 4 slice: a new top-level `worktree` field on every
8
+ `*_request` and `*_request_async` tool lets a caller run the request
9
+ inside a dedicated git worktree owned and lifecycle-managed by the
10
+ gateway. The provider audit listed `-w/--worktree` as a per-CLI flag on
11
+ Claude / Gemini / Grok; this slice deliberately does **not** wire any
12
+ `-w` passthrough. Instead the gateway pre-creates a worktree via
13
+ `git worktree add`, spawns the child CLI with `cwd: <worktree-path>`,
14
+ and persists `worktreePath` on `session.metadata` for reuse. Five CLIs
15
+ × two transports (sync + async) = ten tools all share one resolver, so
16
+ the surface lands as one Zod schema + one helper per tool rather than
17
+ five-times-two per-CLI argv wirings.
18
+
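Editorial sketch of the flow summarised above: verify the ref, then `git worktree add`, then hand the resulting path to the child process as `cwd`. It only builds the argv lists and does not run git. The names `planWorktree`, `GitCommand`, and `WorktreePlan` are hypothetical, and the use of `-b gateway/<name>` is an assumption inferred from the "branch-namespaced" wording, not something the changelog spells out.

```typescript
// Hypothetical sketch: none of these names are taken from the gateway source.
type GitCommand = { argv: string[]; cwd: string };

interface WorktreePlan {
  worktreePath: string;
  branch: string;
  commands: GitCommand[]; // in execution order
}

// Build (but do not run) the git invocations for a gateway-owned worktree.
function planWorktree(repoRoot: string, name: string, ref: string = "HEAD"): WorktreePlan {
  const root = repoRoot.replace(/\/+$/, ""); // drop trailing slashes
  const worktreePath = `${root}/.worktrees/${name}`;
  const branch = `gateway/${name}`; // assumed branch namespace
  return {
    worktreePath,
    branch,
    commands: [
      // Verify the ref first so a bad ref fails before anything is created.
      { argv: ["git", "rev-parse", "--verify", ref], cwd: root },
      // `git worktree add -b <branch> <path> <commit-ish>` creates the checkout.
      { argv: ["git", "worktree", "add", "-b", branch, worktreePath, ref], cwd: root },
    ],
  };
}

const plan = planWorktree("/srv/repo", "audit-1", "main");
console.log(plan.commands.map((c) => c.argv.join(" ")).join("\n"));
console.log(`child CLI would be spawned with cwd: ${plan.worktreePath}`);
```

Keeping the plan as data makes the rev-parse-before-add ordering easy to assert in a test without touching a real repository.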
19
+ ### Added — gateway-owned worktree surface
20
+
21
+ - **`WORKTREE_SCHEMA`** (`src/index.ts`): top-level Zod field
22
+ registered on all ten tools — `claude_request`, `codex_request`,
23
+ `gemini_request`, `grok_request`, `mistral_request`, plus the five
24
+ `*_request_async` siblings. Accepts `true` (anonymous UUID worktree
25
+ at `<repoRoot>/.worktrees/<uuid>` branched from HEAD) or
26
+ `{ name?, ref? }` (sanitised name and/or explicit git ref).
27
+ - **`src/worktree-manager.ts`** (new file, 277 lines):
28
+ `sanitizeWorktreeName` (rejects path traversal — `..`, leading `/`,
29
+ control chars, length > 64), `createWorktree`
30
+ (`git rev-parse --verify <ref>` before `git worktree add`,
31
+ collision detection via `WorktreeCollisionError`, branch-namespaced
32
+ `gateway/<name>` worktrees), `removeWorktree`
33
+ (`git worktree remove --force`), and `createWorktreeSessionCleanupHook`
34
+ (hooks into session manager).
35
+ - **`resolveWorktreeForRequest`** (`src/index.ts`): single per-request
36
+ resolver consumed by every tool handler. When the request carries
37
+ a `sessionId` and the session already has `metadata.worktreePath`,
38
+ the worktree is reused (no second `git worktree add`); otherwise a
39
+ new worktree is created and persisted onto the session via
40
+ `updateSessionMetadata`. The resolved path is threaded to the
41
+ executor via the existing `cwd` plumbing.
42
+ - **`formatWorktreePrefix(path)`** (`src/index.ts:826`): every
43
+ successful tool result is prefixed with
44
+ `[gateway] worktree=<absolute-path>\n` so the caller can drive
45
+ `Bash(cd <path>)`, `Read <path>/...`, etc. Empty when the request
46
+ did not use a worktree (zero behaviour change for non-λ callers).
47
+ - **`Session.metadata` extension** (`src/session-manager.ts`):
48
+ `worktreePath` + `worktreeName` land on the existing `metadata`
49
+ bag — no `Session` interface changes. `FileSessionManager` accepts
50
+ a `cleanupHook` option that fires on `deleteSession` and on
51
+ TTL-driven eviction; the hook calls `git worktree remove --force`
52
+ before the session record is dropped.
53
+ - **`AsyncJobManager` cwd-aware dedup** (`src/async-job-manager.ts`):
54
+ the dedup key now includes the resolved `cwd`, so two
55
+ `*_request_async` calls with identical argv but different
56
+ worktree paths cannot collide (REGRESSIONS Lθ).
57
+
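Editorial sketch of the session behaviour listed above: a request reuses the worktree recorded on its session, a new one is persisted via `updateSessionMetadata` otherwise, and a cleanup hook fires on `deleteSession`. This is an in-memory toy under stated assumptions. `InMemorySessions`, `resolveWorktree`, and the injected `createWorktree` callback are hypothetical stand-ins for `FileSessionManager`, `resolveWorktreeForRequest`, and the real git calls.

```typescript
// Hypothetical stand-ins: the gateway's real Session and manager types are not shown in the changelog.
interface SessionMetadata {
  worktreePath?: string;
  worktreeName?: string;
}
interface Session {
  id: string;
  metadata: SessionMetadata;
}
type CleanupHook = (session: Session) => void;

class InMemorySessions {
  private sessions = new Map<string, Session>();
  private cleanupHook: CleanupHook | undefined;

  constructor(cleanupHook?: CleanupHook) {
    this.cleanupHook = cleanupHook;
  }

  create(id: string): Session {
    const session: Session = { id, metadata: {} };
    this.sessions.set(id, session);
    return session;
  }

  get(id: string): Session | undefined {
    return this.sessions.get(id);
  }

  updateSessionMetadata(id: string, patch: SessionMetadata): void {
    const session = this.sessions.get(id);
    if (session) session.metadata = { ...session.metadata, ...patch };
  }

  deleteSession(id: string): void {
    const session = this.sessions.get(id);
    if (!session) return;
    // The hook runs before the record is dropped, so it can still read worktreePath.
    if (this.cleanupHook) this.cleanupHook(session);
    this.sessions.delete(id);
  }
}

// Reuse the session's worktree when one is recorded; otherwise create and persist one.
function resolveWorktree(
  sessions: InMemorySessions,
  sessionId: string | undefined,
  createWorktree: () => string,
): string {
  const existing = sessionId ? sessions.get(sessionId)?.metadata.worktreePath : undefined;
  if (existing) return existing;
  const worktreePath = createWorktree();
  if (sessionId) sessions.updateSessionMetadata(sessionId, { worktreePath });
  return worktreePath;
}

// Usage: two requests on one session create a single worktree; deletion triggers the hook.
const removedPaths: string[] = [];
const demoSessions = new InMemorySessions((s) => {
  if (s.metadata.worktreePath) removedPaths.push(s.metadata.worktreePath);
});
demoSessions.create("s1");
let createCalls = 0;
const fakeCreate = () => `/repo/.worktrees/wt-${++createCalls}`;
const firstPath = resolveWorktree(demoSessions, "s1", fakeCreate);
const secondPath = resolveWorktree(demoSessions, "s1", fakeCreate);
demoSessions.deleteSession("s1");
console.log(firstPath === secondPath, createCalls, removedPaths);
```

In the real gateway the hook would call `git worktree remove --force`; here it only records the path so the ordering can be observed.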
58
+ ### Out of scope — explicitly deferred
59
+
60
+ - **Grok's `worktree` subcommand** (separate top-level subcommand
61
+ on the Grok CLI, distinct from `-w/--worktree`).
62
+ - **Claude's `--tmux`** (terminal-multiplexer integration).
63
+ - **Startup sweep of orphaned `.worktrees/*`** — left to future
64
+ housekeeping; the cleanup hook covers the happy path
65
+ (session_delete + TTL eviction).
66
+ - **Multi-repo / submodule semantics** — gateway assumes a single
67
+ primary repo at `<repoRoot>`; multi-root behaviour is undefined.
68
+
69
+ ### Test surface
70
+
71
+ `940 → 989` tests pass (+49):
72
+
73
+ - **`src/__tests__/worktree-manager.test.ts`** (new, 26 tests) —
74
+ unit-tests for `sanitizeWorktreeName`, `createWorktree` (including
75
+ the rev-parse-before-add invariant + `WorktreeCollisionError`),
76
+ `removeWorktree`, and `createWorktreeSessionCleanupHook`.
77
+ - **`src/__tests__/test-veracity-regressions-slice-lambda.test.ts`**
78
+ (new, 23 tests across REGRESSIONS Lα–Lθ + Lψ):
79
+ - **Lα** — `sanitizeWorktreeName` path-traversal rejection.
80
+ - **Lβ** — `createWorktree` runs `git rev-parse --verify` BEFORE
81
+ `git worktree add`.
82
+ - **Lγ** — `resolveWorktreeForRequest` persists `worktreePath`
83
+ onto session metadata via `updateSessionMetadata`.
84
+ - **Lδ** — same-session reuse: the second request with the same
85
+ `sessionId` skips `git worktree add`.
86
+ - **Lε** — `FileSessionManager.deleteSession` invokes the cleanup
87
+ hook (and TTL eviction does too).
88
+ - **Lζ** — `executor.executeCli` honours the resolved `cwd`.
89
+ - **Lη** — contract-as-negative-oracle: no CLI receives
90
+ `-w`/`--worktree` in emitted argv across all five providers
91
+ (pairs with slice δ's contract-as-positive-oracle).
92
+ - **Lθ** — `AsyncJobManager` dedup key includes `cwd`.
93
+ - **Lψ** — `formatWorktreePrefix` envelope shape locked
94
+ (`[gateway] worktree=<abs>\n`; empty when path missing).
95
+
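Editorial sketch of the two pure helpers pinned by Lα and Lψ above. Both functions are re-implemented from the rules stated in the changelog (`..`, leading `/`, control characters, length > 64; the `[gateway] worktree=<abs>\n` envelope) and are not the gateway's actual code. Rejecting an empty name is an extra assumption, and `isRejected` is a hypothetical demo helper.

```typescript
// Hypothetical re-implementation of the documented rules; not the gateway's actual code.
function sanitizeWorktreeName(name: string): string {
  // Rejecting the empty string is an assumption; the changelog only lists the four rules below.
  if (name.length === 0) throw new Error("worktree name must not be empty");
  if (name.length > 64) throw new Error("worktree name exceeds 64 characters");
  if (name.startsWith("/")) throw new Error("worktree name must not start with '/'");
  if (name.includes("..")) throw new Error("worktree name must not contain '..'");
  if (/[\u0000-\u001f\u007f]/.test(name)) throw new Error("worktree name contains control characters");
  return name;
}

// Envelope shape from the changelog: "[gateway] worktree=<abs>\n", or "" when no path is given.
function formatWorktreePrefix(path?: string): string {
  return path ? `[gateway] worktree=${path}\n` : "";
}

// Demo helper: true when the sanitiser throws for the given name.
function isRejected(name: string): boolean {
  try {
    sanitizeWorktreeName(name);
    return false;
  } catch {
    return true;
  }
}

console.log(sanitizeWorktreeName("audit-1"));
console.log(["../etc", "/abs", "a\u0000b", "x".repeat(65)].map(isRejected));
console.log(JSON.stringify(formatWorktreePrefix("/repo/.worktrees/audit-1")));
console.log(JSON.stringify(formatWorktreePrefix()));
```

Throwing on bad names, instead of silently rewriting them, keeps the caller's requested name and the on-disk directory from drifting apart.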
96
+ ### Multi-LLM strict-evidence audit
97
+
98
+ Per the standing protocol
99
+ (`feedback_test_veracity_audit_protocol` and
100
+ `feedback_multi_llm_review_gate`), the slice was audited round-1
101
+ on 2026-05-28 against `docs/plans/slice-lambda.spec.md`.
102
+
103
+ **Round 1 outcomes:**
104
+
105
+ - Codex: UNCONDITIONAL APPROVE — 9/9 mutation probes RED as
106
+ predicted; per-probe verbatim assertion text and pre/post-revert
107
+ test counts. Worktree at `audit/codex-round-1`.
108
+ - Grok: UNCONDITIONAL APPROVE — 9/9 RED, per-probe verbatim
109
+ assertion text. Worktree at `audit/grok-round-1`.
110
+ - Mistral: UNCONDITIONAL APPROVE — 9/9 RED with per-probe failed-
111
+ count summaries. Worktree at `audit/mistral-round-1`
112
+ (`5d75099`).
113
+ - Gemini: **PARTIAL (quota-blocked)** — confirmed Lα–Lε RED (5/9)
114
+ with assertion text matching the substantive reviewers before
115
+ `TerminalQuotaError` (4h35m reset window > round budget) forced
116
+ a stop. No findings, no contradictions.
117
+ - Claude: **STRUCTURAL BLOCKER** — two `claude_request_async`
118
+ jobs (`135c05c3-…`, `e411e8cc-…`) stalled silently
119
+ (`stdoutBytes: 0` for ≥10 minutes); the second produced a
120
+ 1126-byte fabricated meta-summary with no per-probe evidence,
121
+ rejected per the strict-evidence rule. Documented stall pattern,
122
+ not a defect in slice λ.
123
+
124
+ Four out of five independent vendor voices contributed evidence
125
+ (three full + one partial corroborating) with one documented
126
+ unfixable structural block, satisfying the slice-δ "4/5 minimum
127
+ with documented block" bar. The three full audits are unanimous;
128
+ the partial fourth corroborates without contradiction. Verdict:
129
+ slice λ passes the gate and ships as v1.15.0.
130
+
131
+ Full per-reviewer reports preserved at
132
+ `docs/reviews/slice-lambda/{README,round-1-{codex,grok,mistral,
133
+ gemini,claude}}.md`.
134
+
135
+ ### Mechanical anchors (verify with `rg` before relying)
136
+
137
+ - `src/worktree-manager.ts` — new module, 277 lines.
138
+ - `src/index.ts` — `WORKTREE_SCHEMA` (`:419-444`),
139
+ `formatWorktreePrefix` (`:826-828`), `resolveWorktreeForRequest`
140
+ plus per-tool prefix injection (search `formatWorktreePrefix(`),
141
+ 10 × `worktree: WORKTREE_SCHEMA.optional()` registrations on
142
+ every `*_request` / `*_request_async` tool input.
143
+ - `src/session-manager.ts` — `cleanupHook` plumbing
144
+ (`:53-90, 318-342`).
145
+ - `src/async-job-manager.ts` — dedup-key cwd inclusion.
146
+
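Editorial sketch of the cwd-aware dedup key described in the 1.15.0 entry above (REGRESSIONS Lθ): two async calls with identical argv but different worktree paths must not be treated as the same job. The changelog does not document the real key format, so `dedupKey`, `submit`, and the `inFlight` table are hypothetical.

```typescript
// Hypothetical key builder: the real AsyncJobManager key format is not documented in the changelog.
function dedupKey(cli: string, argv: readonly string[], cwd: string): string {
  // JSON.stringify preserves element boundaries, so ["a b"] and ["a", "b"] produce different keys.
  return JSON.stringify([cli, argv, cwd]);
}

// Toy in-flight table: a second submit with the same key returns the existing job id.
const inFlight = new Map<string, string>();
let nextJobId = 0;

function submit(cli: string, argv: readonly string[], cwd: string): string {
  const key = dedupKey(cli, argv, cwd);
  const running = inFlight.get(key);
  if (running !== undefined) return running;
  const jobId = `job-${++nextJobId}`;
  inFlight.set(key, jobId);
  return jobId;
}

const sameArgv = ["-p", "summarise the diff"];
const jobA = submit("claude", sameArgv, "/repo/.worktrees/a");
const jobB = submit("claude", sameArgv, "/repo/.worktrees/b"); // same argv, different worktree
const jobA2 = submit("claude", sameArgv, "/repo/.worktrees/a"); // exact repeat
console.log(jobA, jobB, jobA2);
```

Serialising the tuple instead of concatenating strings avoids accidental collisions when an argument happens to contain the separator.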
5
147
  ## [1.14.0] - 2026-05-28 — Phase 4 slice κ (Claude explicit `cache_control` via `--input-format stream-json`)
6
148
 
7
149
  Ships the ninth Phase 4 slice. Callers can now opt their stable
@@ -31,7 +173,7 @@ falsifiability-tightening commits driven by the multi-LLM review gate.
31
173
  - **`prepareClaudeRequest` κ branch** (`src/index.ts`): when the
32
174
  caller marks any block AND requests `outputFormat: "stream-json"`,
33
175
  argv switches to `-p --input-format stream-json --output-format
34
- stream-json --include-partial-messages --verbose` with NO positional
176
+ stream-json --include-partial-messages --verbose` with NO positional
35
177
  prompt; the prep result carries `stdinPayload` + `cacheControlBlocks`.
36
178
  Mixing `cacheControl` with `text`/`json` output returns an
37
179
  actionable error instead of silently coercing.
@@ -120,7 +262,7 @@ APPROVE) is preserved in commit history (`bea1aee` and `bbc3b5f`).
120
262
 
121
263
  - κ adds caller-side reuse ON TOP of the irreducible ~10–12K
122
264
  `cache_creation` token floor that every fresh `claude -p` session
123
- rebuilds (Claude Code's session-wrap content). The *added* benefit
265
+ rebuilds (Claude Code's session-wrap content). The _added_ benefit
124
266
  scales with the caller's stable block size, not the total prompt.
125
267
  - The `ttl='1h'` hard-code is mandatory because Anthropic rejects a
126
268
  `5m` block after Claude Code's own 1h-marked session blocks; the
@@ -160,7 +302,7 @@ Patch release. Single user-facing fix to `claude_request` /
160
302
  - Claude CLI 2.x rejects `--print --output-format=stream-json` without
161
303
  `--verbose` ("When using --print, --output-format=stream-json requires
162
304
  --verbose"). The gateway was emitting `--output-format stream-json
163
- --include-partial-messages` without `--verbose`, so every claude
305
+ --include-partial-messages` without `--verbose`, so every claude
164
306
  request configured for stream-json (sync or async) was exiting 1.
165
307
  - `prepareClaudeRequest` now pushes `--verbose` as part of the
166
308
  stream-json arg group. `--verbose` only affects what claude writes to
@@ -174,7 +316,7 @@ Patch release. Single user-facing fix to `claude_request` /
174
316
  recorded in the FR for the first time since the CLI started enforcing
175
317
  `--verbose`.
176
318
  - Direct CLI verification: `claude -p ... --output-format stream-json
177
- --verbose --include-partial-messages` returned a clean NDJSON stream
319
+ --verbose --include-partial-messages` returned a clean NDJSON stream
178
320
  with `cache_read_input_tokens: 17978` and
179
321
  `cache_creation_input_tokens: 17435` on a 1-hour-cache-enabled
180
322
  account. The parser path is correct; only the missing flag was
@@ -184,7 +326,7 @@ Patch release. Single user-facing fix to `claude_request` /
184
326
 
185
327
  - New regression: `prepareClaudeRequest` emits `--verbose` when
186
328
  `outputFormat: "stream-json"` and does NOT emit it for `text` / `json`
187
- (src/__tests__/claude-handler.test.ts).
329
+ (src/**tests**/claude-handler.test.ts).
188
330
  - Updated `upstream-contracts.test.ts` "accepts a valid Claude argv
189
331
  emitted by the gateway" to pin the three-flag combo so a future
190
332
  removal of `--verbose` fails at the contract gate.
@@ -254,7 +396,7 @@ regressions) plus this release commit.
254
396
  enumerate). Also settable via the `GROK_SANDBOX` env var. Caller
255
397
  responsibility to pass a valid profile name. The slice deliberately
256
398
  does **not** integrate `--sandbox` with `approvalStrategy:
257
- "mcp_managed"` because the value is unbounded — Grok's approval
399
+ "mcp_managed"` because the value is unbounded — Grok's approval
258
400
  semantics are already covered by `permissionMode` + `alwaysApprove` +
259
401
  `approvalStrategy`.
260
402
  - **`rules`** → `--rules <RULES>`. Supports `@file` prefix per
@@ -320,7 +462,7 @@ parallel with mandatory mutation-probe execution against
320
462
 
321
463
  - Codex: UNCONDITIONAL APPROVE — all 12 probes [as predicted], all
322
464
  26 tests VERIFIED. Baseline (`npm test`: 55 files / 884 tests; build
323
- + format:check clean; slice file 31/31).
465
+ - format:check clean; slice file 31/31).
324
466
  - Grok: UNCONDITIONAL APPROVE — all 12 probes [as predicted]; ran in
325
467
  an isolated worktree at `/tmp/theta-audit-grok` per the slice-ζ
326
468
  reviewer-stomping lesson.
@@ -330,8 +472,8 @@ parallel with mandatory mutation-probe execution against
330
472
  beyond the spec and closes the "enum-mistake stays silent if fixture
331
473
  uses a listed value" gap.
332
474
  - Gemini: **FAILED at 10s** with `TerminalQuotaError: You have
333
- exhausted your capacity on this model. Your quota will reset after
334
- 52m10s.` (Google 429). Documented quota blocker per protocol clause
475
+ exhausted your capacity on this model. Your quota will reset after
476
+ 52m10s.` (Google 429). Documented quota blocker per protocol clause
335
477
  5+6 — counts as "concrete unfixable when documented". Four
336
478
  substantive valid approves from independent vendor families (OpenAI,
337
479
  xAI, Mistral, Anthropic) satisfy the gate.
@@ -500,7 +642,7 @@ this release commit.
500
642
  so no extra gating required.
501
643
  - Both tools accept a new `jsonSchema` field
502
644
  (`string | Record<string, unknown>`). Per `claude --help`, the CLI
503
- argument is the JSON Schema *literal* (not a path; contrast with Codex
645
+ argument is the JSON Schema _literal_ (not a path; contrast with Codex
504
646
  `--output-schema`). Object values are `JSON.stringify`-d; string values
505
647
  pass verbatim. Use with `outputFormat: "json"` for structured output
506
648
  validation. Achieves Codex parity for structured-output validation
@@ -798,7 +940,7 @@ for the async tools and the codex CLI.
798
940
  already terminated before the arm signal landed.
799
941
  - `JobStore.markOrphanedOnStartup()` return shape extended from `number`
800
942
  to `{ count, orphaned: Array<{ id, correlationId, startedAt, stdout,
801
- stderr, exitCode }> }` so the manager constructor can write FR
943
+ stderr, exitCode }> }` so the manager constructor can write FR
802
944
  `logComplete` rows for previously orphaned jobs with proper audit data
803
945
  (durationMs from `startedAt`, response from `stderr || stdout`,
804
946
  errorMessage `"orphaned after gateway restart"`). `SqliteJobStore`
@@ -930,8 +1072,9 @@ Pure documentation release; zero source-code changes since 1.6.0.
930
1072
  ### Fixed — `docs/launch/blog-cache-awareness.md` accuracy + voice
931
1073
 
932
1074
  Technical corrections from the multi-LLM voice + technical review:
1075
+
933
1076
  - Mutually-exclusive error-string quotation reformatted so the
934
- ``provide exactly one of `prompt` or `promptParts``` example renders
1077
+ ``provide exactly one of `prompt`or`promptParts``` example renders
935
1078
  correctly in markdown.
936
1079
  - `lastWriteAt` references corrected to `lastRequestAt` (the actual
937
1080
  public field name on `SessionCacheStats`).
@@ -1002,8 +1145,7 @@ Also includes (beyond cache-awareness):
1002
1145
  The gateway concatenates in canonical order (`system → tools → context → task`)
1003
1146
  so the stable prefix bytes precede the volatile task tail unchanged across
1004
1147
  calls — raising implicit cache hit rate without calling provider cache APIs.
1005
- The exact error strings `provide exactly one of \`prompt\` or \`promptParts\``
1006
- and `one of \`prompt\` or \`promptParts\` is required` are stable API
1148
+ The exact error strings `provide exactly one of \`prompt\` or \`promptParts\``and`one of \`prompt\` or \`promptParts\` is required` are stable API
1007
1149
  contract.
1008
1150
  - **Flight-recorder v3 migration**: new columns `stable_prefix_hash`
1009
1151
  (sha256) and `stable_prefix_tokens` (integer bytes/4 heuristic) on
@@ -1034,9 +1176,9 @@ Also includes (beyond cache-awareness):
1034
1176
  - `warn_on_ttl_expiry = false`
1035
1177
  - `[cache_awareness.min_stable_tokens_for_cache_control]` per-family
1036
1178
  table (sonnet=1024, opus=4096, haiku=4096, default=4096).
1037
- Validated by a separate Zod schema and loader (`loadCacheAwarenessConfig`);
1038
- a malformed `[cache_awareness]` block does NOT break `loadPersistenceConfig`
1039
- and vice versa. No env-var overrides.
1179
+ Validated by a separate Zod schema and loader (`loadCacheAwarenessConfig`);
1180
+ a malformed `[cache_awareness]` block does NOT break `loadPersistenceConfig`
1181
+ and vice versa. No env-var overrides.
1040
1182
 
1041
1183
  ### Decision: Branch B (prefix-discipline only) for slice 1
1042
1184
 
@@ -1356,6 +1498,7 @@ Lands DAG layers 6-12 — the personal-MCP MVP terminal plus all of Phase 0-3 pr
1356
1498
  - **No self-update** — `cli_upgrade --cli mistral` detects pip / uv / brew via probes and dispatches to `pip install -U vibe-cli`, `uv tool upgrade vibe-cli`, or `brew upgrade mistral-vibe`. Unknown installations return an actionable error rather than running a non-existent `vibe update`.
1357
1499
 
1358
1500
  Other surfaces extended: `SESSION_PROVIDER_VALUES` now includes `"mistral"`; `list_models`, `cli_versions`, `cli_upgrade`, `approval_list`, `session_create`, `session_list`, and `session_clear_all` accept the fifth provider; new MCP resources `sessions://mistral` and `models://mistral` are registered; `validate_with_models` / `consensus_check` / `red_team_review` can route to Mistral.
1501
+
1359
1502
  - **U23 — JSON output + token/cost parity across providers.** New `src/codex-json-parser.ts` parses the Codex `--json` JSONL event stream (`thread.started`, `turn.started`/`completed`/`failed`, `item.*`, `error`); lenient against partial streams and garbage preamble. New `src/gemini-json-parser.ts` parses `gemini -o json` output and maps `usageMetadata.{promptTokenCount, candidatesTokenCount, cachedContentTokenCount}`. `extractUsageAndCost` is now a thin per-provider dispatcher returning `{inputTokens, outputTokens, cacheReadTokens?, cacheCreationTokens?, costUsd?}` for every provider that supports JSON; Claude `cache_read_input_tokens` / `cache_creation_input_tokens` are now plumbed through instead of being discarded. `codex_request`, `codex_request_async`, `gemini_request`, and `gemini_request_async` now expose `outputFormat: enum("text","json")` — set to `"json"` and the gateway emits `--json` (Codex) or `-o json` (Gemini) and forwards parsed usage/cost into the flight recorder. Flight-recorder schema gains `cache_read_tokens` and `cache_creation_tokens` columns via idempotent migration (`PRAGMA table_info` → `ALTER TABLE ADD COLUMN`); existing `logs.db` files are upgraded in place. 15 new tests.
1360
1503
  - **U24 — Permission/approval-mode parity across providers.** Claude `permissionMode` enum (`default | acceptEdits | plan | auto | dontAsk | bypassPermissions`) replaces the boolean `dangerouslySkipPermissions` (the boolean still works and now maps to `permissionMode: "bypassPermissions"`; setting both logs a warning, `permissionMode` wins). Gemini `approvalMode` gains `plan`. Codex splits `--full-auto` into `sandboxMode: enum("read-only","workspace-write","danger-full-access")` and `askForApproval: enum("untrusted","on-request","never")`, emitting `--sandbox <mode>` and `--ask-for-approval <mode>` independently; legacy `fullAuto: true` still works and expands to `--sandbox workspace-write --ask-for-approval never` by default, with `useLegacyFullAutoFlag: true` as an explicit escape hatch to emit `--full-auto` directly. Codex resume mode filters all three flags (`--full-auto`, `--sandbox`, `--ask-for-approval`) since `codex exec resume` inherits the session's policy. 26 new tests.
1361
1504
  - **U25 — Claude high-impact features.** `claude_request` / `claude_request_async` schemas gain `agent?: string` (single sub-agent dispatch), `agents?: Record<string, object>` (multi-agent JSON, validated against `CLAUDE_AGENT_DEFINITION_SCHEMA` before emit), `forkSession?: boolean`, `systemPrompt?: string`, `appendSystemPrompt?: string` (mutually exclusive at the schema + tool-callback boundary), `maxBudgetUsd?: number`, `maxTurns?: number`, `effort?: enum("low","medium","high","xhigh","max")`, and `excludeDynamicSystemPromptSections?: boolean`. Each emits the documented `--<flag>` form. 25 new tests in `src/__tests__/claude-handler.test.ts`.
@@ -1448,7 +1591,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1448
1591
 
1449
1592
  ### Fixed
1450
1593
 
1451
- - **SIGTERM→SIGKILL escalation bug** — `proc.killed` becomes `true` after `.kill()` is *called*, not after the process *exits*, so the SIGKILL guard (`if (!proc.killed)`) was always false. Replaced with an `exited` flag set by `close`/`error` events in both `executor.ts` and `async-job-manager.ts`
1594
+ - **SIGTERM→SIGKILL escalation bug** — `proc.killed` becomes `true` after `.kill()` is _called_, not after the process _exits_, so the SIGKILL guard (`if (!proc.killed)`) was always false. Replaced with an `exited` flag set by `close`/`error` events in both `executor.ts` and `async-job-manager.ts`
1452
1595
  - **Timer priority race** — When both `timeout` and `idleTimeout` are set, idle timeout now clears the wall-clock timer to prevent `timedOut` from overriding `idledOut` in the close handler (which would misclassify code 125 as transient code 124)
1453
1596
 
1454
1597
  ### Added
@@ -1533,6 +1676,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1533
1676
  ## Core Features
1534
1677
 
1535
1678
  ### Multi-LLM Orchestration
1679
+
1536
1680
  - **3 CLI tools supported**: Claude Code, Codex, Gemini
1537
1681
  - **Unified MCP interface**: Single protocol for all LLMs
1538
1682
  - **Cross-tool collaboration**: LLMs can use each other via MCP
@@ -1540,6 +1684,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1540
1684
  - **Correlation ID tracking**: Full request tracing
1541
1685
 
1542
1686
  ### Token Optimization
1687
+
1543
1688
  - **Auto-optimization middleware**: 44% reduction on prompts, 37% on responses
1544
1689
  - **15+ optimization patterns**: Remove filler, compact types, arrow notation
1545
1690
  - **Opt-in feature**: `optimizePrompt` and `optimizeResponse` flags
@@ -1547,6 +1692,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1547
1692
  - **Research-backed**: 42 sources, best practices documented
1548
1693
 
1549
1694
  ### Reliability & Performance
1695
+
1550
1696
  - **Retry logic**: Exponential backoff with circuit breaker
1551
1697
  - **Atomic file writes**: Process-specific temp files with fsync
1552
1698
  - **Memory limits**: 50MB cap on CLI output prevents DoS
@@ -1554,6 +1700,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1554
1700
  - **Non-zero exit code handling**: Proper retry behavior
1555
1701
 
1556
1702
  ### Security Hardening
1703
+
1557
1704
  - **No secret leakage**: Generic session descriptions only
1558
1705
  - **File permissions**: 0o600 on sensitive files
1559
1706
  - **No ReDoS vulnerabilities**: Bounded regex patterns
@@ -1562,6 +1709,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1562
1709
  - **Custom storage paths**: Secure directory creation
1563
1710
 
1564
1711
  ### Testing & Quality
1712
+
1565
1713
  - **114 tests**: 68 unit, 41 integration, 5 optimizer
1566
1714
  - **Real CLI integration**: Not mocks
1567
1715
  - **Regression tests**: ReDoS, schema validation, retry behavior
@@ -1569,6 +1717,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1569
1717
  - **Edge case coverage**: Timeouts, errors, concurrency
1570
1718
 
1571
1719
  ### Documentation Excellence
1720
+
1572
1721
  - **7 comprehensive guides**: 4,000+ lines total
1573
1722
  - **Research-backed**: TOKEN_OPTIMIZATION_GUIDE.md with 42 sources
1574
1723
  - **Real-world examples**: PROMPT_OPTIMIZATION_EXAMPLES.md with 5 examples
@@ -1580,6 +1729,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1580
1729
  ## Added
1581
1730
 
1582
1731
  ### Features
1732
+
1583
1733
  - Multi-LLM CLI orchestration via MCP
1584
1734
  - Session management with persistence
1585
1735
  - Correlation ID tracking for request tracing
@@ -1591,6 +1741,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1591
1741
  - Custom storage path support
1592
1742
 
1593
1743
  ### Tools (MCP)
1744
+
1594
1745
  - `claude_request` - Execute Claude Code CLI
1595
1746
  - `codex_request` - Execute Codex CLI
1596
1747
  - `gemini_request` - Execute Gemini CLI
@@ -1604,6 +1755,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1604
1755
  - `list_models` - List available models for each CLI
1605
1756
 
1606
1757
  ### Resources (MCP)
1758
+
1607
1759
  - `sessions://all` - All sessions across CLIs
1608
1760
  - `sessions://claude` - Claude-specific sessions
1609
1761
  - `sessions://codex` - Codex-specific sessions
@@ -1612,6 +1764,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1612
1764
  - `metrics://performance` - Performance metrics and stats
1613
1765
 
1614
1766
  ### Documentation
1767
+
1615
1768
  - `README.md` - Installation and usage guide
1616
1769
  - `BEST_PRACTICES.md` - Design and implementation patterns
1617
1770
  - `TOKEN_OPTIMIZATION_GUIDE.md` - Research-backed optimization techniques (42 sources)
@@ -1625,6 +1778,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1625
1778
  - `CROSS_TOOL_SUCCESS.md` - Cross-LLM collaboration validation
1626
1779
 
1627
1780
  ### Tests
1781
+
1628
1782
  - 68 unit tests (executor, sessions, metrics, optimizer)
1629
1783
  - 41 integration tests (full MCP with real CLIs)
1630
1784
  - 5 optimizer tests (pattern validation, ReDoS prevention)
@@ -1637,6 +1791,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1637
1791
  ### First Review Round (8 bugs)
1638
1792
 
1639
1793
  **Critical:**
1794
+
1640
1795
  1. **session_set_active schema mismatch** (src/index.ts:430)
1641
1796
  - Issue: Documentation said "null to clear" but z.string() rejected null
1642
1797
  - Fix: Changed to z.string().nullable()
@@ -1652,12 +1807,12 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1652
1807
  - Fix: Integrated withRetry + CircuitBreaker into executeCli
1653
1808
  - Impact: Transient failures now retried automatically
1654
1809
 
1655
- **Medium:**
1656
- 4. **Integration test brittleness**
1657
- - Issue: Tests failed without dist/ or CLIs installed
1658
- - Fix: Tests properly skip when CLIs unavailable
1810
+ **Medium:** 4. **Integration test brittleness**
1811
+
1812
+ - Issue: Tests failed without dist/ or CLIs installed
1813
+ - Fix: Tests properly skip when CLIs unavailable
1659
1814
 
1660
- 5. **Test timing issues** (src/__tests__/session-manager.test.ts:216,429)
1815
+ 5. **Test timing issues** (src/**tests**/session-manager.test.ts:216,429)
1661
1816
  - Issue: setTimeout not awaited → false positives
1662
1817
  - Fix: Proper async/await patterns
1663
1818
 
@@ -1665,10 +1820,10 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1665
1820
  - Issue: All stdout/stderr buffered in memory with no cap
1666
1821
  - Fix: Added 50MB limit with early termination
1667
1822
 
1668
- **Low:**
1669
- 7. **Model data duplication** (src/index.ts:64, src/resources.ts:22)
1670
- - Issue: CLI_INFO defined in two places
1671
- - Fix: Centralized in single location
1823
+ **Low:** 7. **Model data duplication** (src/index.ts:64, src/resources.ts:22)
1824
+
1825
+ - Issue: CLI_INFO defined in two places
1826
+ - Fix: Centralized in single location
1672
1827
 
1673
1828
  8. **Unused code** (src/resources.ts:33)
1674
1829
  - Issue: listResources() never called
@@ -1677,27 +1832,28 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1677
1832
  ### Second Review Round (8 bugs)
1678
1833
 
1679
1834
  **Critical:**
1835
+
1680
1836
  1. **Secret leakage via session descriptions** (src/index.ts + src/session-manager.ts)
1681
1837
  - Issue: First 50 chars of prompts stored in plain text
1682
1838
  - Fix: Generic descriptions ("Claude Session"), file permissions 0o600
1683
1839
  - Impact: No user data exposed in session files
1684
1840
 
1685
- **High:**
1686
- 2. **ReDoS in optimizer regex** (src/optimizer.ts:241,244)
1687
- - Issue: Catastrophic backtracking with .+? patterns
1688
- - Fix: Bounded character sets [A-Za-z][\w-]*
1689
- - Impact: No DoS from malicious prompts
1841
+ **High:** 2. **ReDoS in optimizer regex** (src/optimizer.ts:241,244)
1842
+
1843
+ - Issue: Catastrophic backtracking with .+? patterns
1844
+ - Fix: Bounded character sets [A-Za-z][\w-]\*
1845
+ - Impact: No DoS from malicious prompts
1690
1846
 
1691
1847
  3. **Custom storage path directory not created** (src/session-manager.ts:36)
1692
1848
  - Issue: ensureStorageDirectory only created default path
1693
1849
  - Fix: Create dirname(storagePath) for custom paths
1694
1850
  - Impact: Custom storage paths work without errors
1695
1851
 
1696
- **Medium:**
1697
- 4. **Atomic write temp filename collision** (src/session-manager.ts:57)
1698
- - Issue: All processes used same .tmp filename
1699
- - Fix: Process-specific temp files (sessions.json.tmp.${process.pid})
1700
- - Impact: Safe multi-process deployments
1852
+ **Medium:** 4. **Atomic write temp filename collision** (src/session-manager.ts:57)
1853
+
1854
+ - Issue: All processes used same .tmp filename
1855
+ - Fix: Process-specific temp files (sessions.json.tmp.${process.pid})
1856
+ - Impact: Safe multi-process deployments
1701
1857
 
1702
1858
  5. **Retry doesn't handle non-zero exit codes** (src/executor.ts:99)
1703
1859
  - Issue: Only thrown errors triggered retry
@@ -1709,11 +1865,11 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1709
1865
  - Fix: 50MB limit with process termination
1710
1866
  - Impact: DoS prevention
1711
1867
 
1712
- **Low:**
1713
- 7. **Performance overhead from NVM scanning** (src/executor.ts:41)
1714
- - Issue: Filesystem scan on every request
1715
- - Fix: Cache NVM path at module load
1716
- - Impact: Performance improvement
1868
+ **Low:** 7. **Performance overhead from NVM scanning** (src/executor.ts:41)
1869
+
1870
+ - Issue: Filesystem scan on every request
1871
+ - Fix: Cache NVM path at module load
1872
+ - Impact: Performance improvement
1717
1873
 
1718
1874
  8. **Unused imports** (src/session-manager.ts:4, src/executor.ts:7)
1719
1875
  - Issue: Dead code and unused parameters
@@ -1725,6 +1881,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1725
1881
  ## Security
1726
1882
 
1727
1883
  ### Vulnerabilities Fixed
1884
+
1728
1885
  - ✅ **Secret leakage**: No user data in session descriptions
1729
1886
  - ✅ **File permissions**: 0o600 on sessions.json
1730
1887
  - ✅ **ReDoS**: Bounded regex patterns prevent DoS
@@ -1733,6 +1890,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1733
1890
  - ✅ **Command injection**: Already prevented via spawn with args
1734
1891
 
1735
1892
  ### Security Best Practices
1893
+
1736
1894
  - Input validation with Zod schemas
1737
1895
  - No stack trace leakage in errors
1738
1896
  - Atomic file writes with fsync
@@ -1744,6 +1902,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1744
1902
  ## Performance
1745
1903
 
1746
1904
  ### Optimizations Added
1905
+
1747
1906
  - **Token optimization**: 44% reduction on prompts, 37% on responses
1748
1907
  - **NVM path caching**: Eliminates I/O on every request
1749
1908
  - **Circuit breaker**: Fast-fail during outages
@@ -1751,6 +1910,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1751
1910
  - **Memory limits**: Prevents resource exhaustion
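
The circuit breaker entry above ("fast-fail during outages") can be illustrated with a small sketch. The class name, the threshold and reset parameters, and the injectable clock are all assumptions made for illustration; the changelog does not describe the gateway's actual implementation.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical circuit breaker: opens after N consecutive failures, rejects
// calls immediately while open, and allows one trial call after a cooldown.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";

  constructor(
    private readonly failureThreshold: number,
    private readonly resetAfterMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        // Fast-fail: do not invoke the failing dependency at all.
        throw new Error("circuit open: failing fast");
      }
      this.state = "half-open"; // cooldown elapsed, allow one trial call
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

Injecting the clock keeps the sketch deterministic under test; a production breaker would typically also track failures per CLI tool rather than globally.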
1752
1911
 
1753
1912
  ### Metrics
1913
+
1754
1914
  - Request counts per CLI tool
1755
1915
  - Response times with percentiles
1756
1916
  - Success/failure rates
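
As an illustration of the metrics listed above, the sketch below computes request counts, a success rate, and response-time percentiles from recorded samples. It uses the nearest-rank percentile definition, which is an assumption; the changelog does not say which method the gateway uses, and the names `RequestSample`, `percentile`, and `summarize` are hypothetical.

```typescript
interface RequestSample {
  cli: string; // e.g. "claude", "codex"
  durationMs: number;
  ok: boolean;
}

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(valuesMs: number[], p: number): number {
  if (valuesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...valuesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Summarize the samples recorded for one CLI tool.
function summarize(samples: RequestSample[], cli: string) {
  const mine = samples.filter((s) => s.cli === cli);
  const durations = mine.map((s) => s.durationMs);
  return {
    count: mine.length,
    successRate: mine.filter((s) => s.ok).length / mine.length,
    p50: percentile(durations, 50),
    p90: percentile(durations, 90),
  };
}

const durationsMs = [120, 80, 200, 95, 150, 300, 110, 90, 101, 99];
const samples: RequestSample[] = durationsMs.map((durationMs, i) => ({
  cli: "claude",
  durationMs,
  ok: i !== 5, // mark the slowest request as a failure
}));

console.log(summarize(samples, "claude"));
```

Nearest-rank always returns an observed value, which keeps reported percentiles easy to trace back to individual requests; interpolating methods give smoother values on small sample sets.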
@@ -1762,6 +1922,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1762
1922
  ## Testing
1763
1923
 
1764
1924
  ### Test Growth
1925
+
1765
1926
  - **Initial**: 104 tests
1766
1927
  - **After first fixes**: 109 tests (+5 from retry integration)
1767
1928
  - **After optimizer**: 113 tests (+4 from optimizer)
@@ -1769,6 +1930,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1769
1930
  - **Growth**: +10 tests (9.6% increase)
1770
1931
 
1771
1932
  ### Coverage Areas
1933
+
1772
1934
  - Unit: Executor, session manager, metrics, optimizer
1773
1935
  - Integration: Full MCP protocol with real CLI execution
1774
1936
  - Regression: Schema validation, ReDoS, retry behavior
@@ -1779,6 +1941,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1779
1941
  ## Documentation
1780
1942
 
1781
1943
  ### Guides Created
1944
+
1782
1945
  1. **README.md** - Installation, usage, API reference
1783
1946
  2. **BEST_PRACTICES.md** - Design patterns and architecture
1784
1947
  3. **TOKEN_OPTIMIZATION_GUIDE.md** - Research (42 sources)
@@ -1792,6 +1955,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1792
1955
  11. **CROSS_TOOL_SUCCESS.md** - Collaboration proof
1793
1956
 
1794
1957
  ### Total Documentation
1958
+
1795
1959
  - **11 comprehensive files**
1796
1960
  - **~8,000 lines** of documentation
1797
1961
  - **Research-backed** with citations
@@ -1802,17 +1966,20 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1802
1966
  ## Dogfooding Validation
1803
1967
 
1804
1968
  ### Multi-LLM Review Process
1969
+
1805
1970
  - **Claude Sonnet 4.5**: Strategic/product review (8.5/10 → 10/10)
1806
1971
  - **Codex**: Bug finding and implementation (13 bugs found, 13 fixed)
1807
1972
  - **Gemini 2.5 Pro**: Security analysis (3 critical issues found, 3 fixed)
1808
1973
 
1809
1974
  ### Self-Improvement Cycle
1975
+
1810
1976
  1. ✅ Multi-LLM review found 16 bugs
1811
1977
  2. ✅ Codex fixed all bugs via MCP
1812
1978
  3. ✅ Gateway validated fixes via test suite
1813
1979
  4. ✅ Complete autonomous improvement demonstrated
1814
1980
 
1815
1981
  ### Workflow Validated
1982
+
1816
1983
  ```
1817
1984
  Implement (Codex) → Review (Gemini) → Fix (Codex) → Verify (Tests) → Iterate
1818
1985
  ```
@@ -1822,41 +1989,45 @@ Implement (Codex) → Review (Gemini) → Fix (Codex) → Verify (Tests) → Ite
1822
1989
  ## Migration Guide
1823
1990
 
1824
1991
  ### Breaking Changes
1992
+
1825
1993
  None - This is the first release.
1826
1994
 
1827
1995
  ### New Features to Adopt
1828
1996
 
1829
1997
  **1. Token Optimization** (Optional, Opt-in)
1998
+
1830
1999
  ```typescript
1831
2000
  // Enable prompt optimization
1832
2001
  await callTool("codex_request", {
1833
2002
  prompt: "Your verbose prompt...",
1834
- optimizePrompt: true // 44% token reduction
2003
+ optimizePrompt: true, // 44% token reduction
1835
2004
  });
1836
2005
 
1837
2006
  // Enable response optimization
1838
2007
  await callTool("claude_request", {
1839
2008
  prompt: "Generate docs...",
1840
- optimizeResponse: true // 37% token reduction
2009
+ optimizeResponse: true, // 37% token reduction
1841
2010
  });
1842
2011
  ```
1843
2012
 
1844
2013
  **2. Session Management**
2014
+
1845
2015
  ```typescript
1846
2016
  // Create and use sessions
1847
2017
  const session = await callTool("session_create", {
1848
2018
  cli: "claude",
1849
- description: "My coding session"
2019
+ description: "My coding session",
1850
2020
  });
1851
2021
 
1852
2022
  // Continue conversations
1853
2023
  await callTool("claude_request", {
1854
2024
  prompt: "Continue from previous context",
1855
- sessionId: session.id
2025
+ sessionId: session.id,
1856
2026
  });
1857
2027
  ```
1858
2028
 
1859
2029
  **3. Correlation IDs** (Automatic)
2030
+
1860
2031
  ```typescript
1861
2032
  // Automatically generated for tracing
1862
2033
  // Check logs: [corrId] prefix on all log lines
@@ -1867,6 +2038,7 @@ await callTool("claude_request", {
1867
2038
  ## Known Limitations
1868
2039
 
1869
2040
  ### Documented Constraints
2041
+
1870
2042
  1. **Multi-level orchestration unsupported**
1871
2043
  - Nested MCP connections fail
1872
2044
  - LLMs can't spawn sub-LLMs via gateway
@@ -1881,6 +2053,7 @@ await callTool("claude_request", {
1881
2053
  - Consider encryption for sensitive data (future)
1882
2054
 
1883
2055
  ### Future Enhancements
2056
+
1884
2057
  - Session encryption at rest
1885
2058
  - Session TTL and automatic cleanup
1886
2059
  - Redis/DynamoDB backend for horizontal scaling
@@ -1893,16 +2066,19 @@ await callTool("claude_request", {
1893
2066
  ## Credits
1894
2067
 
1895
2068
  ### Development
2069
+
1896
2070
  - **Architecture & Orchestration**: Claude Sonnet 4.5
1897
2071
  - **Implementation & Bug Fixes**: Codex via llm-cli-gateway MCP
1898
2072
  - **Security Analysis**: Gemini 2.5 Pro via llm-cli-gateway MCP
1899
2073
 
1900
2074
  ### Research
2075
+
1901
2076
  - Token optimization: 42 research sources (2025-2026)
1902
2077
  - Compression validation: Compel paper (OpenReview 2025)
1903
2078
  - Best practices: Industry standards + dogfooding
1904
2079
 
1905
2080
  ### Validation
2081
+
1906
2082
  - **Self-dogfooding**: Gateway reviewed and fixed itself
1907
2083
  - **Multi-LLM collaboration**: 3 LLMs working via MCP
1908
2084
  - **Iterative quality**: 2 review rounds, 16 bugs found and fixed
@@ -1912,6 +2088,7 @@ await callTool("claude_request", {
1912
2088
  ## Statistics
1913
2089
 
1914
2090
  ### Development Timeline
2091
+
1915
2092
  - **Total time**: ~2.5 hours (from first review to 100% bug-free)
1916
2093
  - **Review rounds**: 2 comprehensive multi-LLM reviews
1917
2094
  - **Bugs found**: 16 total
@@ -1919,12 +2096,14 @@ await callTool("claude_request", {
1919
2096
  - **Test growth**: 104 → 114 tests (+9.6%)
1920
2097
 
1921
2098
  ### Code Metrics
2099
+
1922
2100
  - **Files modified**: 12 files
1923
2101
  - **Lines added**: ~2,500 lines
1924
2102
  - **Documentation**: ~8,000 lines (11 files)
1925
2103
  - **Test coverage**: 114 tests across unit/integration/regression
1926
2104
 
1927
2105
  ### Quality Metrics
2106
+
1928
2107
  - **Bug-free rate**: 100%
1929
2108
  - **Test pass rate**: 100%
1930
2109
  - **Build success**: ✅