llm-cli-gateway 1.14.0 → 1.15.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,172 @@

  All notable changes to the llm-cli-gateway project.

+ ## [1.15.1] - 2026-05-29 — quality badges + Sigstore release signing
+
+ Release-infrastructure follow-up to v1.15.0.
+
+ ### Added
+
+ - README quality badges for CI, security, OpenSSF Scorecard, npm, license, and
+ Sigstore-signed release artifacts.
+ - Sigstore keyless signing for GitHub release installer artifacts, including
+ `.sigstore.json` bundles and pre-upload verification in the release workflow.
+ - End-user verification guidance for `SHA256SUMS.sigstore.json` before trusting
+ release checksums.
+ - Sanitized Windows Claude Desktop MCP config example using 1Password
+ environment injection placeholders.
+ - Security workflow attribution guard that rejects new Claude/Anthropic
+ author/co-author metadata in future commits.
+
+ ### Changed
+
+ - Manual release-installer rebuilds now fail fast unless launched from the
+ matching release tag ref, keeping Sigstore certificate identities stable.
+ - Windows installer snippets and generated release manifest commands now verify
+ the Sigstore checksum bundle before executing the downloaded bootstrapper.
+
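The 1.15.1 entry asks users to verify `SHA256SUMS.sigstore.json` before trusting the release checksums. Assuming that Sigstore check has already passed (it is not shown here), the remaining step is comparing each downloaded artifact against its `SHA256SUMS` line. The sketch assumes the conventional `sha256sum` layout (64 hex digits, a space, then a space or `*`, then the file name); `parseSha256Sums` and `checksumMatches` are hypothetical names, not part of llm-cli-gateway or its release scripts.

```typescript
import { createHash } from "node:crypto";

// Parse `sha256sum`-style lines: "<hex>  <name>" (text mode) or "<hex> *<name>" (binary mode).
export function parseSha256Sums(text: string): Map<string, string> {
  const sums = new Map<string, string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue; // tolerate blank lines, e.g. a trailing newline
    const match = /^([0-9a-fA-F]{64}) [ *](.+)$/.exec(line);
    if (!match) throw new Error(`malformed SHA256SUMS line: ${line}`);
    sums.set(match[2], match[1].toLowerCase());
  }
  return sums;
}

// True only when `artifact` hashes to the digest listed for `fileName`.
export function checksumMatches(sums: Map<string, string>, fileName: string, artifact: Uint8Array): boolean {
  const expected = sums.get(fileName);
  if (expected === undefined) return false; // unlisted files are never trusted
  const actual = createHash("sha256").update(artifact).digest("hex");
  return actual === expected;
}
```

Unlisted files return `false` rather than passing by default, so a renamed or injected artifact is never treated as verified.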
+ ## [1.15.0] - 2026-05-28 — Phase 4 slice λ (gateway-owned worktree lifecycle)
+
+ Ships the tenth Phase 4 slice: a new top-level `worktree` field on every
+ `*_request` and `*_request_async` tool lets a caller run the request
+ inside a dedicated git worktree owned and lifecycle-managed by the
+ gateway. The provider audit listed `-w/--worktree` as a per-CLI flag on
+ Claude / Gemini / Grok; this slice deliberately does **not** wire any
+ `-w` passthrough. Instead the gateway pre-creates a worktree via
+ `git worktree add`, spawns the child CLI with `cwd: <worktree-path>`,
+ and persists `worktreePath` on `session.metadata` for reuse. Five CLIs
+ × two transports (sync + async) = ten tools all share one resolver, so
+ the surface lands as one Zod schema + one helper per tool rather than
+ five-times-two per-CLI argv wirings.
+
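The 1.15.0 summary describes a create-once, reuse-per-session flow: the gateway runs `git worktree add`, launches the child CLI with `cwd` set to the worktree, and records `worktreePath` on `session.metadata`. A minimal sketch of that reuse rule follows, assuming an in-memory session map and an injected `createWorktree` callback so it runs without git. `resolveWorktree`, `SessionLike`, and `CreateWorktree` are hypothetical and only loosely modelled on the described `resolveWorktreeForRequest`; the sketch also assumes that a request omitting `worktree` simply runs without one.

```typescript
// Hypothetical shapes; the real Session/metadata types in src/session-manager.ts may differ.
interface SessionLike {
  id: string;
  metadata: Record<string, unknown>;
}

type WorktreeRequest = true | { name?: string; ref?: string };

// Injected so the sketch runs without git; a real implementation would
// shell out to `git rev-parse --verify` and `git worktree add`.
type CreateWorktree = (req: WorktreeRequest) => { path: string; name: string };

export function resolveWorktree(
  sessions: Map<string, SessionLike>,
  sessionId: string | undefined,
  request: WorktreeRequest | undefined,
  createWorktree: CreateWorktree,
): string | undefined {
  if (request === undefined) return undefined; // assumption: no worktree requested, caller keeps its cwd
  const session = sessionId === undefined ? undefined : sessions.get(sessionId);
  const existing = session?.metadata.worktreePath;
  if (typeof existing === "string") return existing; // reuse: no second `git worktree add`
  const created = createWorktree(request);
  if (session) {
    // Persist so later requests on the same session reuse this worktree.
    session.metadata.worktreePath = created.path;
    session.metadata.worktreeName = created.name;
  }
  return created.path;
}
```

The returned path would then be passed to the child process through the `cwd` option of Node's `spawn`, which is why no `-w`/`--worktree` flag needs to appear in the argv.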
+ ### Added — gateway-owned worktree surface
+
+ - **`WORKTREE_SCHEMA`** (`src/index.ts`): top-level Zod field
+ registered on all ten tools — `claude_request`, `codex_request`,
+ `gemini_request`, `grok_request`, `mistral_request`, plus the five
+ `*_request_async` siblings. Accepts `true` (anonymous UUID worktree
+ at `<repoRoot>/.worktrees/<uuid>` branched from HEAD) or
+ `{ name?, ref? }` (sanitised name and/or explicit git ref).
+ - **`src/worktree-manager.ts`** (new file, 277 lines):
+ `sanitizeWorktreeName` (rejects path traversal — `..`, leading `/`,
+ control chars, length > 64), `createWorktree`
+ (`git rev-parse --verify <ref>` before `git worktree add`,
+ collision detection via `WorktreeCollisionError`, branch-namespaced
+ `gateway/<name>` worktrees), `removeWorktree`
+ (`git worktree remove --force`), and `createWorktreeSessionCleanupHook`
+ (hooks into session manager).
+ - **`resolveWorktreeForRequest`** (`src/index.ts`): single per-request
+ resolver consumed by every tool handler. When the request carries
+ a `sessionId` and the session already has `metadata.worktreePath`,
+ the worktree is reused (no second `git worktree add`); otherwise a
+ new worktree is created and persisted onto the session via
+ `updateSessionMetadata`. The resolved path is threaded to the
+ executor via the existing `cwd` plumbing.
+ - **`formatWorktreePrefix(path)`** (`src/index.ts:826`): every
+ successful tool result is prefixed with
+ `[gateway] worktree=<absolute-path>\n` so the caller can drive
+ `Bash(cd <path>)`, `Read <path>/...`, etc. Empty when the request
+ did not use a worktree (zero behaviour change for non-λ callers).
+ - **`Session.metadata` extension** (`src/session-manager.ts`):
+ `worktreePath` + `worktreeName` land on the existing `metadata`
+ bag — no `Session` interface changes. `FileSessionManager` accepts
+ a `cleanupHook` option that fires on `deleteSession` and on
+ TTL-driven eviction; the hook calls `git worktree remove --force`
+ before the session record is dropped.
+ - **`AsyncJobManager` cwd-aware dedup** (`src/async-job-manager.ts`):
+ the dedup key now includes the resolved `cwd`, so two
+ `*_request_async` calls with identical argv but different
+ worktree paths cannot collide (REGRESSIONS Lθ).
+
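The `src/worktree-manager.ts` bullet lists the name checks (no `..`, no leading `/`, no control characters, at most 64 characters) and the verify-then-add ordering. The sketch below re-creates those rules under stated assumptions: `checkWorktreeName`, `worktreeTarget`, and `planGitCommands` are hypothetical, empty names are rejected as an extra precaution, and named worktrees are assumed to live at `<repoRoot>/.worktrees/<name>` on branch `gateway/<name>`. It returns git argv arrays instead of executing them, which keeps the ordering easy to test.

```typescript
import { join } from "node:path";

const MAX_NAME_LENGTH = 64;

// Hypothetical re-creation of the rules the changelog lists for `sanitizeWorktreeName`.
export function checkWorktreeName(name: string): string {
  if (name.length === 0) throw new Error("worktree name is empty"); // extra precaution, not from the changelog
  if (name.length > MAX_NAME_LENGTH) throw new Error(`worktree name exceeds ${MAX_NAME_LENGTH} characters`);
  if (name.startsWith("/")) throw new Error("worktree name must not start with '/'");
  if (name.includes("..")) throw new Error("worktree name must not contain '..'");
  if (/[\u0000-\u001f\u007f]/.test(name)) throw new Error("worktree name contains control characters");
  return name;
}

// Assumed layout: `<repoRoot>/.worktrees/<name>` checked out on branch `gateway/<name>`.
export function worktreeTarget(repoRoot: string, name: string): { path: string; branch: string } {
  const safe = checkWorktreeName(name);
  return { path: join(repoRoot, ".worktrees", safe), branch: `gateway/${safe}` };
}

// Lβ ordering: verify the ref before `git worktree add`; never emits `-w`/`--worktree` (Lη).
export function planGitCommands(repoRoot: string, name: string, ref = "HEAD"): string[][] {
  const target = worktreeTarget(repoRoot, name);
  return [
    ["rev-parse", "--verify", ref],
    ["worktree", "add", "-b", target.branch, target.path, ref],
  ];
}
```

Each inner array is meant to be run as `git <args...>` from the repository root, in order, stopping at the first failure.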
+ ### Out of scope — explicitly deferred
+
+ - **Grok's `worktree` subcommand** (separate top-level subcommand
+ on the Grok CLI, distinct from `-w/--worktree`).
+ - **Claude's `--tmux`** (terminal-multiplexer integration).
+ - **Startup sweep of orphaned `.worktrees/*`** — left to future
+ housekeeping; the cleanup hook covers the happy path
+ (session_delete + TTL eviction).
+ - **Multi-repo / submodule semantics** — gateway assumes a single
+ primary repo at `<repoRoot>`; multi-root behaviour is undefined.
+
+ ### Test surface
+
+ `940 → 989` tests pass (+49):
+
+ - **`src/__tests__/worktree-manager.test.ts`** (new, 26 tests) —
+ unit-tests for `sanitizeWorktreeName`, `createWorktree` (including
+ the rev-parse-before-add invariant + `WorktreeCollisionError`),
+ `removeWorktree`, and `createWorktreeSessionCleanupHook`.
+ - **`src/__tests__/test-veracity-regressions-slice-lambda.test.ts`**
+ (new, 23 tests across REGRESSIONS Lα–Lθ + Lψ):
+ - **Lα** — `sanitizeWorktreeName` path-traversal rejection.
+ - **Lβ** — `createWorktree` runs `git rev-parse --verify` BEFORE
+ `git worktree add`.
+ - **Lγ** — `resolveWorktreeForRequest` persists `worktreePath`
+ onto session metadata via `updateSessionMetadata`.
+ - **Lδ** — same-session reuse: the second request with the same
+ `sessionId` skips `git worktree add`.
+ - **Lε** — `FileSessionManager.deleteSession` invokes the cleanup
+ hook (and TTL eviction does too).
+ - **Lζ** — `executor.executeCli` honours the resolved `cwd`.
+ - **Lη** — contract-as-negative-oracle: no CLI receives
+ `-w`/`--worktree` in emitted argv across all five providers
+ (pairs with slice δ's contract-as-positive-oracle).
+ - **Lθ** — `AsyncJobManager` dedup key includes `cwd`.
+ - **Lψ** — `formatWorktreePrefix` envelope shape locked
+ (`[gateway] worktree=<abs>\n`; empty when path missing).
+
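Lψ pins the result envelope to `[gateway] worktree=<abs>\n`, or an empty string when no worktree was used. A small sketch of the formatter, plus a hypothetical caller-side `splitWorktreePrefix` that recovers the path and the untouched body, could look like this; neither function is the gateway's actual code.

```typescript
const PREFIX = "[gateway] worktree=";

// Documented envelope: `[gateway] worktree=<abs>\n`, or "" when the request used no worktree.
export function formatPrefix(worktreePath: string | undefined): string {
  return worktreePath ? `${PREFIX}${worktreePath}\n` : "";
}

// Hypothetical caller-side helper: peel the envelope off a tool result.
export function splitWorktreePrefix(result: string): { worktreePath?: string; body: string } {
  if (!result.startsWith(PREFIX)) return { body: result };
  const end = result.indexOf("\n");
  if (end === -1) return { body: result }; // no terminating newline: not a well-formed envelope
  return { worktreePath: result.slice(PREFIX.length, end), body: result.slice(end + 1) };
}
```

A caller that finds `worktreePath` can then issue its follow-up reads and shell commands against that directory; results without the prefix pass through unchanged.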
+ ### Multi-LLM strict-evidence audit
+
+ Per the standing protocol (`feedback_test_veracity_audit_protocol`
+
+ - `feedback_multi_llm_review_gate`), the slice was audited round-1
+ on 2026-05-28 against `docs/plans/slice-lambda.spec.md`.
+
+ **Round 1 outcomes:**
+
+ - Codex: UNCONDITIONAL APPROVE — 9/9 mutation probes RED as
+ predicted; per-probe verbatim assertion text and pre/post-revert
+ test counts. Worktree at `audit/codex-round-1`.
+ - Grok: UNCONDITIONAL APPROVE — 9/9 RED, per-probe verbatim
+ assertion text. Worktree at `audit/grok-round-1`.
+ - Mistral: UNCONDITIONAL APPROVE — 9/9 RED with per-probe failed-
+ count summaries. Worktree at `audit/mistral-round-1`
+ (`5d75099`).
+ - Gemini: **PARTIAL (quota-blocked)** — confirmed Lα–Lε RED (5/9)
+ with assertion text matching the substantive reviewers before
+ `TerminalQuotaError` (4h35m reset window > round budget) forced
+ a stop. No findings, no contradictions.
+ - Claude: **STRUCTURAL BLOCKER** — two `claude_request_async`
+ jobs (`135c05c3-…`, `e411e8cc-…`) stalled silently
+ (`stdoutBytes: 0` for ≥10 minutes); the second produced a
+ 1126-byte fabricated meta-summary with no per-probe evidence,
+ rejected per the strict-evidence rule. Documented stall pattern,
+ not a defect in slice λ.
+
+ Four out of five independent vendor voices contributed evidence
+ (three full + one partial corroborating) with one documented
+ unfixable structural block, satisfying the slice-δ "4/5 minimum
+ with documented block" bar. The three full audits are unanimous;
+ the partial fourth corroborates without contradiction. Verdict:
+ slice λ passes the gate and ships as v1.15.0.
+
+ Full per-reviewer reports preserved at
+ `docs/reviews/slice-lambda/{README,round-1-{codex,grok,mistral,
+ gemini,claude}}.md`.
+
+ ### Mechanical anchors (verify with `rg` before relying)
+
+ - `src/worktree-manager.ts` — new module, 277 lines.
+ - `src/index.ts` — `WORKTREE_SCHEMA` (`:419-444`),
+ `formatWorktreePrefix` (`:826-828`), `resolveWorktreeForRequest`
+ - per-tool prefix injection (search `formatWorktreePrefix(`),
+ 10 × `worktree: WORKTREE_SCHEMA.optional()` registrations on
+ every `*_request` / `*_request_async` tool input.
+ - `src/session-manager.ts` — `cleanupHook` plumbing
+ (`:53-90, 318-342`).
+ - `src/async-job-manager.ts` — dedup-key cwd inclusion.
+
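The cwd-aware dedup change (REGRESSIONS Lθ) means two async jobs with identical argv but different worktrees must map to different keys. One way to build such a key is to hash a structured encoding of the CLI name, argv, and cwd. The sketch uses hypothetical `dedupKey` and `findOrRegister` helpers; the real key format in `src/async-job-manager.ts` is not described in this changelog.

```typescript
import { createHash } from "node:crypto";

// Hypothetical key derivation: the same CLI + argv in two different worktrees yields two keys.
export function dedupKey(cli: string, argv: readonly string[], cwd: string | undefined): string {
  // A JSON encoding keeps argument boundaries, so ["a b"] and ["a", "b"] never share a key.
  const material = JSON.stringify({ cli, argv, cwd: cwd ?? null });
  return createHash("sha256").update(material).digest("hex");
}

// Sketch of how a manager might use it: reuse an in-flight job only on an exact key match.
const inFlight = new Map<string, string>(); // dedup key -> job id
export function findOrRegister(cli: string, argv: readonly string[], cwd: string | undefined, newJobId: string): string {
  const key = dedupKey(cli, argv, cwd);
  const existing = inFlight.get(key);
  if (existing !== undefined) return existing;
  inFlight.set(key, newJobId);
  return newJobId;
}
```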
  ## [1.14.0] - 2026-05-28 — Phase 4 slice κ (Claude explicit `cache_control` via `--input-format stream-json`)

  Ships the ninth Phase 4 slice. Callers can now opt their stable
@@ -31,7 +197,7 @@ falsifiability-tightening commits driven by the multi-LLM review gate.
  - **`prepareClaudeRequest` κ branch** (`src/index.ts`): when the
  caller marks any block AND requests `outputFormat: "stream-json"`,
  argv switches to `-p --input-format stream-json --output-format
- stream-json --include-partial-messages --verbose` with NO positional
+ stream-json --include-partial-messages --verbose` with NO positional
  prompt; the prep result carries `stdinPayload` + `cacheControlBlocks`.
  Mixing `cacheControl` with `text`/`json` output returns an
  actionable error instead of silently coercing.
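The κ branch and the later stream-json `--verbose` fix together define a small argv decision table. The sketch below encodes it with a hypothetical `buildClaudeArgv`. The κ argv and the `--verbose` rule come from this changelog; the argv layout for requests without `cacheControl` (`-p <prompt> --output-format <fmt>`) is an assumption. It reports `promptViaStdin` instead of building the actual `stdinPayload`, whose shape is not shown here.

```typescript
type OutputFormat = "text" | "json" | "stream-json";

interface ClaudeArgvResult {
  argv: string[];
  promptViaStdin: boolean; // true in the κ branch: the prompt travels as stream-json on stdin
}

// Hypothetical sketch of the argv rules described for `prepareClaudeRequest`.
export function buildClaudeArgv(prompt: string, outputFormat: OutputFormat, hasCacheControl: boolean): ClaudeArgvResult {
  if (hasCacheControl && outputFormat !== "stream-json") {
    // Mirror the documented behaviour: fail loudly instead of coercing the format.
    throw new Error(`cacheControl requires outputFormat "stream-json" (got "${outputFormat}")`);
  }
  if (hasCacheControl) {
    return {
      argv: ["-p", "--input-format", "stream-json", "--output-format", "stream-json", "--include-partial-messages", "--verbose"],
      promptViaStdin: true, // no positional prompt
    };
  }
  // Assumed layout for the non-κ path.
  const argv = ["-p", prompt, "--output-format", outputFormat];
  // Claude CLI 2.x rejects --print + stream-json output without --verbose.
  if (outputFormat === "stream-json") argv.push("--include-partial-messages", "--verbose");
  return { argv, promptViaStdin: false };
}
```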
@@ -120,7 +286,7 @@ APPROVE) is preserved in commit history (`bea1aee` and `bbc3b5f`).

  - κ adds caller-side reuse ON TOP of the irreducible ~10–12K
  `cache_creation` token floor that every fresh `claude -p` session
- rebuilds (Claude Code's session-wrap content). The *added* benefit
+ rebuilds (Claude Code's session-wrap content). The _added_ benefit
  scales with the caller's stable block size, not the total prompt.
  - The `ttl='1h'` hard-code is mandatory because Anthropic rejects a
  `5m` block after Claude Code's own 1h-marked session blocks; the
@@ -160,7 +326,7 @@ Patch release. Single user-facing fix to `claude_request` /
  - Claude CLI 2.x rejects `--print --output-format=stream-json` without
  `--verbose` ("When using --print, --output-format=stream-json requires
  --verbose"). The gateway was emitting `--output-format stream-json
- --include-partial-messages` without `--verbose`, so every claude
+ --include-partial-messages` without `--verbose`, so every claude
  request configured for stream-json (sync or async) was exiting 1.
  - `prepareClaudeRequest` now pushes `--verbose` as part of the
  stream-json arg group. `--verbose` only affects what claude writes to
@@ -174,7 +340,7 @@ Patch release. Single user-facing fix to `claude_request` /
  recorded in the FR for the first time since the CLI started enforcing
  `--verbose`.
  - Direct CLI verification: `claude -p ... --output-format stream-json
- --verbose --include-partial-messages` returned a clean NDJSON stream
+ --verbose --include-partial-messages` returned a clean NDJSON stream
  with `cache_read_input_tokens: 17978` and
  `cache_creation_input_tokens: 17435` on a 1-hour-cache-enabled
  account. The parser path is correct; only the missing flag was
@@ -184,7 +350,7 @@ Patch release. Single user-facing fix to `claude_request` /

  - New regression: `prepareClaudeRequest` emits `--verbose` when
  `outputFormat: "stream-json"` and does NOT emit it for `text` / `json`
- (src/__tests__/claude-handler.test.ts).
+ (src/**tests**/claude-handler.test.ts).
  - Updated `upstream-contracts.test.ts` "accepts a valid Claude argv
  emitted by the gateway" to pin the three-flag combo so a future
  removal of `--verbose` fails at the contract gate.
@@ -254,7 +420,7 @@ regressions) plus this release commit.
  enumerate). Also settable via the `GROK_SANDBOX` env var. Caller
  responsibility to pass a valid profile name. The slice deliberately
  does **not** integrate `--sandbox` with `approvalStrategy:
- "mcp_managed"` because the value is unbounded — Grok's approval
+ "mcp_managed"` because the value is unbounded — Grok's approval
  semantics are already covered by `permissionMode` + `alwaysApprove` +
  `approvalStrategy`.
  - **`rules`** → `--rules <RULES>`. Supports `@file` prefix per
@@ -320,7 +486,7 @@ parallel with mandatory mutation-probe execution against

  - Codex: UNCONDITIONAL APPROVE — all 12 probes [as predicted], all
  26 tests VERIFIED. Baseline (`npm test`: 55 files / 884 tests; build
- + format:check clean; slice file 31/31).
+ - format:check clean; slice file 31/31).
  - Grok: UNCONDITIONAL APPROVE — all 12 probes [as predicted]; ran in
  an isolated worktree at `/tmp/theta-audit-grok` per the slice-ζ
  reviewer-stomping lesson.
@@ -330,8 +496,8 @@ parallel with mandatory mutation-probe execution against
  beyond the spec and closes the "enum-mistake stays silent if fixture
  uses a listed value" gap.
  - Gemini: **FAILED at 10s** with `TerminalQuotaError: You have
- exhausted your capacity on this model. Your quota will reset after
- 52m10s.` (Google 429). Documented quota blocker per protocol clause
+ exhausted your capacity on this model. Your quota will reset after
+ 52m10s.` (Google 429). Documented quota blocker per protocol clause
  5+6 — counts as "concrete unfixable when documented". Four
  substantive valid approves from independent vendor families (OpenAI,
  xAI, Mistral, Anthropic) satisfy the gate.
@@ -500,7 +666,7 @@ this release commit.
  so no extra gating required.
  - Both tools accept a new `jsonSchema` field
  (`string | Record<string, unknown>`). Per `claude --help`, the CLI
- argument is the JSON Schema *literal* (not a path; contrast with Codex
+ argument is the JSON Schema _literal_ (not a path; contrast with Codex
  `--output-schema`). Object values are `JSON.stringify`-d; string values
  pass verbatim. Use with `outputFormat: "json"` for structured output
  validation. Achieves Codex parity for structured-output validation
@@ -798,7 +964,7 @@ for the async tools and the codex CLI.
  already terminated before the arm signal landed.
  - `JobStore.markOrphanedOnStartup()` return shape extended from `number`
  to `{ count, orphaned: Array<{ id, correlationId, startedAt, stdout,
- stderr, exitCode }> }` so the manager constructor can write FR
+ stderr, exitCode }> }` so the manager constructor can write FR
  `logComplete` rows for previously orphaned jobs with proper audit data
  (durationMs from `startedAt`, response from `stderr || stdout`,
  errorMessage `"orphaned after gateway restart"`). `SqliteJobStore`
@@ -930,8 +1096,9 @@ Pure documentation release; zero source-code changes since 1.6.0.
  ### Fixed — `docs/launch/blog-cache-awareness.md` accuracy + voice

  Technical corrections from the multi-LLM voice + technical review:
+
  - Mutually-exclusive error-string quotation reformatted so the
- ``provide exactly one of `prompt` or `promptParts``` example renders
+ ``provide exactly one of `prompt`or`promptParts``` example renders
  correctly in markdown.
  - `lastWriteAt` references corrected to `lastRequestAt` (the actual
  public field name on `SessionCacheStats`).
@@ -1002,8 +1169,7 @@ Also includes (beyond cache-awareness):
  The gateway concatenates in canonical order (`system → tools → context → task`)
  so the stable prefix bytes precede the volatile task tail unchanged across
  calls — raising implicit cache hit rate without calling provider cache APIs.
- The exact error strings `provide exactly one of \`prompt\` or \`promptParts\``
- and `one of \`prompt\` or \`promptParts\` is required` are stable API
+ The exact error strings `provide exactly one of \`prompt\` or \`promptParts\``and`one of \`prompt\` or \`promptParts\` is required` are stable API
  contract.
  - **Flight-recorder v3 migration**: new columns `stable_prefix_hash`
  (sha256) and `stable_prefix_tokens` (integer bytes/4 heuristic) on
@@ -1034,9 +1200,9 @@ Also includes (beyond cache-awareness):
  - `warn_on_ttl_expiry = false`
  - `[cache_awareness.min_stable_tokens_for_cache_control]` per-family
  table (sonnet=1024, opus=4096, haiku=4096, default=4096).
- Validated by a separate Zod schema and loader (`loadCacheAwarenessConfig`);
- a malformed `[cache_awareness]` block does NOT break `loadPersistenceConfig`
- and vice versa. No env-var overrides.
+ Validated by a separate Zod schema and loader (`loadCacheAwarenessConfig`);
+ a malformed `[cache_awareness]` block does NOT break `loadPersistenceConfig`
+ and vice versa. No env-var overrides.
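The `[cache_awareness.min_stable_tokens_for_cache_control]` defaults pair naturally with the bytes/4 estimate that the flight-recorder v3 notes use for `stable_prefix_tokens`. A sketch of that eligibility check follows. The substring match on the model id, the use of UTF-8 byte length, and rounding down are all assumptions, and `estimateTokens`, `minStableTokens`, and `prefixLongEnough` are hypothetical names.

```typescript
// Defaults copied from the documented per-family table.
const MIN_STABLE_TOKENS: Record<string, number> = { sonnet: 1024, opus: 4096, haiku: 4096, default: 4096 };

// bytes/4 heuristic, as described for `stable_prefix_tokens` (rounding down is assumed).
export function estimateTokens(text: string): number {
  return Math.floor(Buffer.byteLength(text, "utf8") / 4);
}

// Assumption: the model family is found by substring match on the model id.
export function minStableTokens(model: string, table: Record<string, number> = MIN_STABLE_TOKENS): number {
  const id = model.toLowerCase();
  for (const family of ["sonnet", "opus", "haiku"]) {
    if (id.includes(family) && table[family] !== undefined) return table[family];
  }
  return table.default;
}

// True when the stable prefix is estimated to meet the family's minimum.
export function prefixLongEnough(stablePrefix: string, model: string): boolean {
  return estimateTokens(stablePrefix) >= minStableTokens(model);
}
```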

  ### Decision: Branch B (prefix-discipline only) for slice 1

@@ -1356,6 +1522,7 @@ Lands DAG layers 6-12 — the personal-MCP MVP terminal plus all of Phase 0-3 pr
  - **No self-update** — `cli_upgrade --cli mistral` detects pip / uv / brew via probes and dispatches to `pip install -U vibe-cli`, `uv tool upgrade vibe-cli`, or `brew upgrade mistral-vibe`. Unknown installations return an actionable error rather than running a non-existent `vibe update`.

  Other surfaces extended: `SESSION_PROVIDER_VALUES` now includes `"mistral"`; `list_models`, `cli_versions`, `cli_upgrade`, `approval_list`, `session_create`, `session_list`, and `session_clear_all` accept the fifth provider; new MCP resources `sessions://mistral` and `models://mistral` are registered; `validate_with_models` / `consensus_check` / `red_team_review` can route to Mistral.
+
  - **U23 — JSON output + token/cost parity across providers.** New `src/codex-json-parser.ts` parses the Codex `--json` JSONL event stream (`thread.started`, `turn.started`/`completed`/`failed`, `item.*`, `error`); lenient against partial streams and garbage preamble. New `src/gemini-json-parser.ts` parses `gemini -o json` output and maps `usageMetadata.{promptTokenCount, candidatesTokenCount, cachedContentTokenCount}`. `extractUsageAndCost` is now a thin per-provider dispatcher returning `{inputTokens, outputTokens, cacheReadTokens?, cacheCreationTokens?, costUsd?}` for every provider that supports JSON; Claude `cache_read_input_tokens` / `cache_creation_input_tokens` are now plumbed through instead of being discarded. `codex_request`, `codex_request_async`, `gemini_request`, and `gemini_request_async` now expose `outputFormat: enum("text","json")` — set to `"json"` and the gateway emits `--json` (Codex) or `-o json` (Gemini) and forwards parsed usage/cost into the flight recorder. Flight-recorder schema gains `cache_read_tokens` and `cache_creation_tokens` columns via idempotent migration (`PRAGMA table_info` → `ALTER TABLE ADD COLUMN`); existing `logs.db` files are upgraded in place. 15 new tests.
  - **U24 — Permission/approval-mode parity across providers.** Claude `permissionMode` enum (`default | acceptEdits | plan | auto | dontAsk | bypassPermissions`) replaces the boolean `dangerouslySkipPermissions` (the boolean still works and now maps to `permissionMode: "bypassPermissions"`; setting both logs a warning, `permissionMode` wins). Gemini `approvalMode` gains `plan`. Codex splits `--full-auto` into `sandboxMode: enum("read-only","workspace-write","danger-full-access")` and `askForApproval: enum("untrusted","on-request","never")`, emitting `--sandbox <mode>` and `--ask-for-approval <mode>` independently; legacy `fullAuto: true` still works and expands to `--sandbox workspace-write --ask-for-approval never` by default, with `useLegacyFullAutoFlag: true` as an explicit escape hatch to emit `--full-auto` directly. Codex resume mode filters all three flags (`--full-auto`, `--sandbox`, `--ask-for-approval`) since `codex exec resume` inherits the session's policy. 26 new tests.
  - **U25 — Claude high-impact features.** `claude_request` / `claude_request_async` schemas gain `agent?: string` (single sub-agent dispatch), `agents?: Record<string, object>` (multi-agent JSON, validated against `CLAUDE_AGENT_DEFINITION_SCHEMA` before emit), `forkSession?: boolean`, `systemPrompt?: string`, `appendSystemPrompt?: string` (mutually exclusive at the schema + tool-callback boundary), `maxBudgetUsd?: number`, `maxTurns?: number`, `effort?: enum("low","medium","high","xhigh","max")`, and `excludeDynamicSystemPromptSections?: boolean`. Each emits the documented `--<flag>` form. 25 new tests in `src/__tests__/claude-handler.test.ts`.
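The U24 bullet describes how Codex policy options become flags and how the legacy Claude boolean maps onto `permissionMode`. A sketch of both mappings follows. `codexPolicyFlags` and `resolvePermissionMode` are hypothetical; letting explicit `sandboxMode`/`askForApproval` values override the `fullAuto` defaults, and treating any defined `dangerouslySkipPermissions` as "set", are assumptions rather than documented behaviour.

```typescript
type SandboxMode = "read-only" | "workspace-write" | "danger-full-access";
type AskForApproval = "untrusted" | "on-request" | "never";
type ClaudePermissionMode = "default" | "acceptEdits" | "plan" | "auto" | "dontAsk" | "bypassPermissions";

interface CodexPolicyOptions {
  sandboxMode?: SandboxMode;
  askForApproval?: AskForApproval;
  fullAuto?: boolean;
  useLegacyFullAutoFlag?: boolean;
  resume?: boolean; // `codex exec resume` inherits the session's policy
}

// Hypothetical sketch of the U24 Codex mapping; precedence with explicit modes is assumed.
export function codexPolicyFlags(opts: CodexPolicyOptions): string[] {
  if (opts.resume) return []; // all three policy flags are filtered on resume
  if (opts.fullAuto && opts.useLegacyFullAutoFlag) return ["--full-auto"];
  const sandbox = opts.sandboxMode ?? (opts.fullAuto ? "workspace-write" : undefined);
  const approval = opts.askForApproval ?? (opts.fullAuto ? "never" : undefined);
  const flags: string[] = [];
  if (sandbox) flags.push("--sandbox", sandbox);
  if (approval) flags.push("--ask-for-approval", approval);
  return flags;
}

// Legacy boolean maps to "bypassPermissions"; an explicit permissionMode wins, with a warning.
export function resolvePermissionMode(
  permissionMode: ClaudePermissionMode | undefined,
  dangerouslySkipPermissions: boolean | undefined,
  warn: (msg: string) => void = console.warn,
): ClaudePermissionMode | undefined {
  if (permissionMode !== undefined && dangerouslySkipPermissions !== undefined) {
    warn("both permissionMode and dangerouslySkipPermissions set; permissionMode wins");
  }
  if (permissionMode !== undefined) return permissionMode;
  return dangerouslySkipPermissions ? "bypassPermissions" : undefined;
}
```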
@@ -1448,7 +1615,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit

  ### Fixed

- - **SIGTERM→SIGKILL escalation bug** — `proc.killed` becomes `true` after `.kill()` is *called*, not after the process *exits*, so the SIGKILL guard (`if (!proc.killed)`) was always false. Replaced with an `exited` flag set by `close`/`error` events in both `executor.ts` and `async-job-manager.ts`
+ - **SIGTERM→SIGKILL escalation bug** — `proc.killed` becomes `true` after `.kill()` is _called_, not after the process _exits_, so the SIGKILL guard (`if (!proc.killed)`) was always false. Replaced with an `exited` flag set by `close`/`error` events in both `executor.ts` and `async-job-manager.ts`
  - **Timer priority race** — When both `timeout` and `idleTimeout` are set, idle timeout now clears the wall-clock timer to prevent `timedOut` from overriding `idledOut` in the close handler (which would misclassify code 125 as transient code 124)
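The SIGTERM→SIGKILL fix hinges on a Node detail: `subprocess.killed` turns `true` once `kill()` has successfully delivered a signal, not once the child has exited, so it cannot gate escalation. A minimal sketch of the corrected pattern, tracking exit through `close`/`error` events, follows. `terminateWithEscalation` and the 200 ms grace period are illustrative choices, not the gateway's code, and the demo assumes a POSIX platform where a child can ignore SIGTERM.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// Hypothetical helper: send SIGTERM, escalate to SIGKILL after `graceMs` unless the child has exited.
export function terminateWithEscalation(proc: ChildProcess, graceMs: number): Promise<NodeJS.Signals | null> {
  return new Promise((resolve) => {
    if (proc.exitCode !== null || proc.signalCode !== null) {
      resolve(proc.signalCode); // already gone; no close event will fire again
      return;
    }
    // Do NOT use proc.killed here: it flips to true as soon as kill() delivers a signal.
    let exited = false;
    const timer = setTimeout(() => {
      if (!exited) proc.kill("SIGKILL"); // the child ignored or outlived SIGTERM
    }, graceMs);
    const finish = (signal: NodeJS.Signals | null): void => {
      if (exited) return;
      exited = true;
      clearTimeout(timer);
      resolve(signal);
    };
    proc.once("close", (_code: number | null, signal: NodeJS.Signals | null) => finish(signal));
    proc.once("error", () => finish(null));
    proc.kill("SIGTERM");
  });
}

// Usage: a child that installs a no-op SIGTERM handler still ends with SIGKILL.
const stubborn = spawn(
  process.execPath,
  ["-e", "process.on('SIGTERM', () => {}); console.log('ready'); setInterval(() => {}, 1000);"],
  { stdio: ["ignore", "pipe", "inherit"] },
);
export const demo: Promise<NodeJS.Signals | null> = new Promise((resolve) => {
  // Wait for "ready" so the no-op handler is installed before SIGTERM arrives.
  stubborn.stdout?.once("data", () => resolve(terminateWithEscalation(stubborn, 200)));
});
demo.then((signal) => console.log(`child ended with ${signal}`));
```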

  ### Added
@@ -1533,6 +1700,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  ## Core Features

  ### Multi-LLM Orchestration
+
  - **3 CLI tools supported**: Claude Code, Codex, Gemini
  - **Unified MCP interface**: Single protocol for all LLMs
  - **Cross-tool collaboration**: LLMs can use each other via MCP
@@ -1540,6 +1708,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - **Correlation ID tracking**: Full request tracing

  ### Token Optimization
+
  - **Auto-optimization middleware**: 44% reduction on prompts, 37% on responses
  - **15+ optimization patterns**: Remove filler, compact types, arrow notation
  - **Opt-in feature**: `optimizePrompt` and `optimizeResponse` flags
@@ -1547,6 +1716,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - **Research-backed**: 42 sources, best practices documented

  ### Reliability & Performance
+
  - **Retry logic**: Exponential backoff with circuit breaker
  - **Atomic file writes**: Process-specific temp files with fsync
  - **Memory limits**: 50MB cap on CLI output prevents DoS
@@ -1554,6 +1724,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - **Non-zero exit code handling**: Proper retry behavior
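The "Atomic file writes" item (and the later per-PID temp-file fix) follows a common pattern: write to a process-specific temporary file, `fsync` it, then rename it over the target. A sketch under those assumptions is below; `writeFileAtomic` is a hypothetical name, and the rename only replaces the target atomically when both files are on the same filesystem, which keeping the temp file next to the target normally ensures on POSIX systems.

```typescript
import { closeSync, fsyncSync, mkdtempSync, openSync, readFileSync, readdirSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: temp file per process, fsync, then rename over the target.
export function writeFileAtomic(target: string, data: string): void {
  const tmp = `${target}.tmp.${process.pid}`; // per-PID name: two processes never share a temp file
  try {
    const fd = openSync(tmp, "w", 0o600); // 0o600, matching the documented session-file permissions
    try {
      writeFileSync(fd, data); // writes the whole string through the descriptor
      fsyncSync(fd); // contents reach disk before the rename publishes them
    } finally {
      closeSync(fd); // always release the descriptor, even if the write failed
    }
    renameSync(tmp, target); // POSIX rename: readers see the old file or the new one, never a partial write
  } catch (err) {
    rmSync(tmp, { force: true }); // do not leave a stray temp file behind
    throw err;
  }
}

// Usage: write a sessions file into a scratch directory.
const dir = mkdtempSync(join(tmpdir(), "atomic-write-"));
const sessionsFile = join(dir, "sessions.json");
writeFileAtomic(sessionsFile, JSON.stringify({ sessions: [] }));
console.log(readFileSync(sessionsFile, "utf8"), readdirSync(dir));
```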

  ### Security Hardening
+
  - **No secret leakage**: Generic session descriptions only
  - **File permissions**: 0o600 on sensitive files
  - **No ReDoS vulnerabilities**: Bounded regex patterns
@@ -1562,6 +1733,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - **Custom storage paths**: Secure directory creation

  ### Testing & Quality
+
  - **114 tests**: 68 unit, 41 integration, 5 optimizer
  - **Real CLI integration**: Not mocks
  - **Regression tests**: ReDoS, schema validation, retry behavior
@@ -1569,6 +1741,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - **Edge case coverage**: Timeouts, errors, concurrency

  ### Documentation Excellence
+
  - **7 comprehensive guides**: 4,000+ lines total
  - **Research-backed**: TOKEN_OPTIMIZATION_GUIDE.md with 42 sources
  - **Real-world examples**: PROMPT_OPTIMIZATION_EXAMPLES.md with 5 examples
@@ -1580,6 +1753,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  ## Added

  ### Features
+
  - Multi-LLM CLI orchestration via MCP
  - Session management with persistence
  - Correlation ID tracking for request tracing
@@ -1591,6 +1765,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - Custom storage path support

  ### Tools (MCP)
+
  - `claude_request` - Execute Claude Code CLI
  - `codex_request` - Execute Codex CLI
  - `gemini_request` - Execute Gemini CLI
@@ -1604,6 +1779,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - `list_models` - List available models for each CLI

  ### Resources (MCP)
+
  - `sessions://all` - All sessions across CLIs
  - `sessions://claude` - Claude-specific sessions
  - `sessions://codex` - Codex-specific sessions
@@ -1612,6 +1788,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - `metrics://performance` - Performance metrics and stats

  ### Documentation
+
  - `README.md` - Installation and usage guide
  - `BEST_PRACTICES.md` - Design and implementation patterns
  - `TOKEN_OPTIMIZATION_GUIDE.md` - Research-backed optimization techniques (42 sources)
@@ -1625,6 +1802,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - `CROSS_TOOL_SUCCESS.md` - Cross-LLM collaboration validation

  ### Tests
+
  - 68 unit tests (executor, sessions, metrics, optimizer)
  - 41 integration tests (full MCP with real CLIs)
  - 5 optimizer tests (pattern validation, ReDoS prevention)
@@ -1637,6 +1815,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  ### First Review Round (8 bugs)

  **Critical:**
+
  1. **session_set_active schema mismatch** (src/index.ts:430)
  - Issue: Documentation said "null to clear" but z.string() rejected null
  - Fix: Changed to z.string().nullable()
@@ -1652,12 +1831,12 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - Fix: Integrated withRetry + CircuitBreaker into executeCli
  - Impact: Transient failures now retried automatically

- **Medium:**
- 4. **Integration test brittleness**
- - Issue: Tests failed without dist/ or CLIs installed
- - Fix: Tests properly skip when CLIs unavailable
+ **Medium:** 4. **Integration test brittleness**
+
+ - Issue: Tests failed without dist/ or CLIs installed
+ - Fix: Tests properly skip when CLIs unavailable

- 5. **Test timing issues** (src/__tests__/session-manager.test.ts:216,429)
+ 5. **Test timing issues** (src/**tests**/session-manager.test.ts:216,429)
  - Issue: setTimeout not awaited → false positives
  - Fix: Proper async/await patterns

@@ -1665,10 +1844,10 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - Issue: All stdout/stderr buffered in memory with no cap
  - Fix: Added 50MB limit with early termination

- **Low:**
- 7. **Model data duplication** (src/index.ts:64, src/resources.ts:22)
- - Issue: CLI_INFO defined in two places
- - Fix: Centralized in single location
+ **Low:** 7. **Model data duplication** (src/index.ts:64, src/resources.ts:22)
+
+ - Issue: CLI_INFO defined in two places
+ - Fix: Centralized in single location

  8. **Unused code** (src/resources.ts:33)
  - Issue: listResources() never called
@@ -1677,27 +1856,28 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  ### Second Review Round (8 bugs)

  **Critical:**
+
  1. **Secret leakage via session descriptions** (src/index.ts + src/session-manager.ts)
  - Issue: First 50 chars of prompts stored in plain text
  - Fix: Generic descriptions ("Claude Session"), file permissions 0o600
  - Impact: No user data exposed in session files

- **High:**
- 2. **ReDoS in optimizer regex** (src/optimizer.ts:241,244)
- - Issue: Catastrophic backtracking with .+? patterns
- - Fix: Bounded character sets [A-Za-z][\w-]*
- - Impact: No DoS from malicious prompts
+ **High:** 2. **ReDoS in optimizer regex** (src/optimizer.ts:241,244)
+
+ - Issue: Catastrophic backtracking with .+? patterns
+ - Fix: Bounded character sets [A-Za-z][\w-]\*
+ - Impact: No DoS from malicious prompts

  3. **Custom storage path directory not created** (src/session-manager.ts:36)
  - Issue: ensureStorageDirectory only created default path
  - Fix: Create dirname(storagePath) for custom paths
  - Impact: Custom storage paths work without errors

- **Medium:**
- 4. **Atomic write temp filename collision** (src/session-manager.ts:57)
- - Issue: All processes used same .tmp filename
- - Fix: Process-specific temp files (sessions.json.tmp.${process.pid})
- - Impact: Safe multi-process deployments
+ **Medium:** 4. **Atomic write temp filename collision** (src/session-manager.ts:57)
+
+ - Issue: All processes used same .tmp filename
+ - Fix: Process-specific temp files (sessions.json.tmp.${process.pid})
+ - Impact: Safe multi-process deployments

  5. **Retry doesn't handle non-zero exit codes** (src/executor.ts:99)
  - Issue: Only thrown errors triggered retry
@@ -1709,11 +1889,11 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - Fix: 50MB limit with process termination
  - Impact: DoS prevention

- **Low:**
- 7. **Performance overhead from NVM scanning** (src/executor.ts:41)
- - Issue: Filesystem scan on every request
- - Fix: Cache NVM path at module load
- - Impact: Performance improvement
+ **Low:** 7. **Performance overhead from NVM scanning** (src/executor.ts:41)
+
+ - Issue: Filesystem scan on every request
+ - Fix: Cache NVM path at module load
+ - Impact: Performance improvement

  8. **Unused imports** (src/session-manager.ts:4, src/executor.ts:7)
  - Issue: Dead code and unused parameters
@@ -1725,6 +1905,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  ## Security

  ### Vulnerabilities Fixed
+
  - ✅ **Secret leakage**: No user data in session descriptions
  - ✅ **File permissions**: 0o600 on sessions.json
  - ✅ **ReDoS**: Bounded regex patterns prevent DoS
@@ -1733,6 +1914,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - ✅ **Command injection**: Already prevented via spawn with args

  ### Security Best Practices
+
  - Input validation with Zod schemas
  - No stack trace leakage in errors
  - Atomic file writes with fsync
@@ -1744,6 +1926,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  ## Performance

  ### Optimizations Added
+
  - **Token optimization**: 44% reduction on prompts, 37% on responses
  - **NVM path caching**: Eliminates I/O on every request
  - **Circuit breaker**: Fast-fail during outages
@@ -1751,6 +1934,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  - **Memory limits**: Prevents resource exhaustion

  ### Metrics
+
  - Request counts per CLI tool
  - Response times with percentiles
  - Success/failure rates
@@ -1762,6 +1946,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
  ## Testing

  ### Test Growth
+
  - **Initial**: 104 tests
  - **After first fixes**: 109 tests (+5 from retry integration)
1767
1952
  - **After optimizer**: 113 tests (+4 from optimizer)
@@ -1769,6 +1954,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1769
1954
  - **Growth**: +10 tests (9.6% increase)
1770
1955
 
1771
1956
  ### Coverage Areas
1957
+
1772
1958
  - Unit: Executor, session manager, metrics, optimizer
1773
1959
  - Integration: Full MCP protocol with real CLI execution
1774
1960
  - Regression: Schema validation, ReDoS, retry behavior
@@ -1779,6 +1965,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1779
1965
  ## Documentation
1780
1966
 
1781
1967
  ### Guides Created
1968
+
1782
1969
  1. **README.md** - Installation, usage, API reference
1783
1970
  2. **BEST_PRACTICES.md** - Design patterns and architecture
1784
1971
  3. **TOKEN_OPTIMIZATION_GUIDE.md** - Research (42 sources)
@@ -1792,6 +1979,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1792
1979
  11. **CROSS_TOOL_SUCCESS.md** - Collaboration proof
1793
1980
 
1794
1981
  ### Total Documentation
1982
+
1795
1983
  - **11 comprehensive files**
1796
1984
  - **~8,000 lines** of documentation
1797
1985
  - **Research-backed** with citations
@@ -1802,17 +1990,20 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
1802
1990
  ## Dogfooding Validation
1803
1991
 
1804
1992
  ### Multi-LLM Review Process
1993
+
1805
1994
  - **Claude Sonnet 4.5**: Strategic/product review (8.5/10 → 10/10)
1806
1995
  - **Codex**: Bug finding and implementation (13 bugs found, 13 fixed)
1807
1996
  - **Gemini 2.5 Pro**: Security analysis (3 critical issues found, 3 fixed)
1808
1997
 
1809
1998
  ### Self-Improvement Cycle
1999
+
1810
2000
  1. ✅ Multi-LLM review found 16 bugs
1811
2001
  2. ✅ Codex fixed all bugs via MCP
1812
2002
  3. ✅ Gateway validated fixes via test suite
1813
2003
  4. ✅ Complete autonomous improvement demonstrated
1814
2004
 
1815
2005
  ### Workflow Validated
2006
+
1816
2007
  ```
1817
2008
  Implement (Codex) → Review (Gemini) → Fix (Codex) → Verify (Tests) → Iterate
1818
2009
  ```
@@ -1822,41 +2013,45 @@ Implement (Codex) → Review (Gemini) → Fix (Codex) → Verify (Tests) → Ite
1822
2013
  ## Migration Guide
1823
2014
 
1824
2015
  ### Breaking Changes
2016
+
1825
2017
  None - This is the first release.
1826
2018
 
1827
2019
  ### New Features to Adopt
1828
2020
 
1829
2021
  **1. Token Optimization** (Optional, Opt-in)
2022
+
1830
2023
  ```typescript
1831
2024
  // Enable prompt optimization
1832
2025
  await callTool("codex_request", {
1833
2026
  prompt: "Your verbose prompt...",
1834
- optimizePrompt: true // 44% token reduction
2027
+ optimizePrompt: true, // 44% token reduction
1835
2028
  });
1836
2029
 
1837
2030
  // Enable response optimization
1838
2031
  await callTool("claude_request", {
1839
2032
  prompt: "Generate docs...",
1840
- optimizeResponse: true // 37% token reduction
2033
+ optimizeResponse: true, // 37% token reduction
1841
2034
  });
1842
2035
  ```
1843
2036
 
1844
2037
  **2. Session Management**
2038
+
1845
2039
  ```typescript
1846
2040
  // Create and use sessions
1847
2041
  const session = await callTool("session_create", {
1848
2042
  cli: "claude",
1849
- description: "My coding session"
2043
+ description: "My coding session",
1850
2044
  });
1851
2045
 
1852
2046
  // Continue conversations
1853
2047
  await callTool("claude_request", {
1854
2048
  prompt: "Continue from previous context",
1855
- sessionId: session.id
2049
+ sessionId: session.id,
1856
2050
  });
1857
2051
  ```
1858
2052
 
1859
2053
  **3. Correlation IDs** (Automatic)
2054
+
1860
2055
  ```typescript
1861
2056
  // Automatically generated for tracing
1862
2057
  // Check logs: [corrId] prefix on all log lines
@@ -1867,6 +2062,7 @@ await callTool("claude_request", {
1867
2062
  ## Known Limitations
1868
2063
 
1869
2064
  ### Documented Constraints
2065
+
1870
2066
  1. **Multi-level orchestration unsupported**
1871
2067
  - Nested MCP connections fail
1872
2068
  - LLMs can't spawn sub-LLMs via gateway
@@ -1881,6 +2077,7 @@ await callTool("claude_request", {
1881
2077
  - Consider encryption for sensitive data (future)
1882
2078
 
1883
2079
  ### Future Enhancements
2080
+
1884
2081
  - Session encryption at rest
1885
2082
  - Session TTL and automatic cleanup
1886
2083
  - Redis/DynamoDB backend for horizontal scaling
@@ -1893,16 +2090,19 @@ await callTool("claude_request", {
1893
2090
  ## Credits
1894
2091
 
1895
2092
  ### Development
2093
+
1896
2094
  - **Architecture & Orchestration**: Claude Sonnet 4.5
1897
2095
  - **Implementation & Bug Fixes**: Codex via llm-cli-gateway MCP
1898
2096
  - **Security Analysis**: Gemini 2.5 Pro via llm-cli-gateway MCP
1899
2097
 
1900
2098
  ### Research
2099
+
1901
2100
  - Token optimization: 42 research sources (2025-2026)
1902
2101
  - Compression validation: Compel paper (OpenReview 2025)
1903
2102
  - Best practices: Industry standards + dogfooding
1904
2103
 
1905
2104
  ### Validation
2105
+
1906
2106
  - **Self-dogfooding**: Gateway reviewed and fixed itself
1907
2107
  - **Multi-LLM collaboration**: 3 LLMs working via MCP
1908
2108
  - **Iterative quality**: 2 review rounds, 16 bugs found and fixed
@@ -1912,6 +2112,7 @@ await callTool("claude_request", {
1912
2112
  ## Statistics
1913
2113
 
1914
2114
  ### Development Timeline
2115
+
1915
2116
  - **Total time**: ~2.5 hours (from first review to 100% bug-free)
1916
2117
  - **Review rounds**: 2 comprehensive multi-LLM reviews
1917
2118
  - **Bugs found**: 16 total
@@ -1919,12 +2120,14 @@ await callTool("claude_request", {
1919
2120
  - **Test growth**: 104 → 114 tests (+9.6%)
1920
2121
 
1921
2122
  ### Code Metrics
2123
+
1922
2124
  - **Files modified**: 12 files
1923
2125
  - **Lines added**: ~2,500 lines
1924
2126
  - **Documentation**: ~8,000 lines (11 files)
1925
2127
  - **Test coverage**: 114 tests across unit/integration/regression
1926
2128
 
1927
2129
  ### Quality Metrics
2130
+
1928
2131
  - **Bug-free rate**: 100%
1929
2132
  - **Test pass rate**: 100%
1930
2133
  - **Build success**: ✅