llm-cli-gateway 1.13.2 → 1.15.0
- package/CHANGELOG.md +371 -44
- package/dist/async-job-manager.d.ts +15 -1
- package/dist/async-job-manager.js +31 -6
- package/dist/cache-stats.d.ts +26 -0
- package/dist/cache-stats.js +45 -2
- package/dist/executor.d.ts +8 -0
- package/dist/executor.js +7 -2
- package/dist/flight-recorder.d.ts +7 -0
- package/dist/flight-recorder.js +27 -2
- package/dist/index.d.ts +126 -1
- package/dist/index.js +480 -50
- package/dist/prompt-parts.d.ts +74 -0
- package/dist/prompt-parts.js +47 -0
- package/dist/session-manager.d.ts +20 -2
- package/dist/session-manager.js +28 -3
- package/dist/upstream-contracts.d.ts +8 -1
- package/dist/upstream-contracts.js +37 -1
- package/dist/worktree-manager.d.ts +41 -0
- package/dist/worktree-manager.js +214 -0
- package/package.json +2 -1
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,296 @@
 
 All notable changes to the llm-cli-gateway project.
 
+## [1.15.0] - 2026-05-28 — Phase 4 slice λ (gateway-owned worktree lifecycle)
+
+Ships the tenth Phase 4 slice: a new top-level `worktree` field on every
+`*_request` and `*_request_async` tool lets a caller run the request
+inside a dedicated git worktree owned and lifecycle-managed by the
+gateway. The provider audit listed `-w/--worktree` as a per-CLI flag on
+Claude / Gemini / Grok; this slice deliberately does **not** wire any
+`-w` passthrough. Instead the gateway pre-creates a worktree via
+`git worktree add`, spawns the child CLI with `cwd: <worktree-path>`,
+and persists `worktreePath` on `session.metadata` for reuse. Five CLIs
+× two transports (sync + async) = ten tools all share one resolver, so
+the surface lands as one Zod schema + one helper per tool rather than
+five-times-two per-CLI argv wirings.
+
+### Added — gateway-owned worktree surface
+
+- **`WORKTREE_SCHEMA`** (`src/index.ts`): top-level Zod field
+registered on all ten tools — `claude_request`, `codex_request`,
+`gemini_request`, `grok_request`, `mistral_request`, plus the five
+`*_request_async` siblings. Accepts `true` (anonymous UUID worktree
+at `<repoRoot>/.worktrees/<uuid>` branched from HEAD) or
+`{ name?, ref? }` (sanitised name and/or explicit git ref).
+- **`src/worktree-manager.ts`** (new file, 277 lines):
+`sanitizeWorktreeName` (rejects path traversal — `..`, leading `/`,
+control chars, length > 64), `createWorktree`
+(`git rev-parse --verify <ref>` before `git worktree add`,
+collision detection via `WorktreeCollisionError`, branch-namespaced
+`gateway/<name>` worktrees), `removeWorktree`
+(`git worktree remove --force`), and `createWorktreeSessionCleanupHook`
+(hooks into session manager).
+- **`resolveWorktreeForRequest`** (`src/index.ts`): single per-request
+resolver consumed by every tool handler. When the request carries
+a `sessionId` and the session already has `metadata.worktreePath`,
+the worktree is reused (no second `git worktree add`); otherwise a
+new worktree is created and persisted onto the session via
+`updateSessionMetadata`. The resolved path is threaded to the
+executor via the existing `cwd` plumbing.
+- **`formatWorktreePrefix(path)`** (`src/index.ts:826`): every
+successful tool result is prefixed with
+`[gateway] worktree=<absolute-path>\n` so the caller can drive
+`Bash(cd <path>)`, `Read <path>/...`, etc. Empty when the request
+did not use a worktree (zero behaviour change for non-λ callers).
+- **`Session.metadata` extension** (`src/session-manager.ts`):
+`worktreePath` + `worktreeName` land on the existing `metadata`
+bag — no `Session` interface changes. `FileSessionManager` accepts
+a `cleanupHook` option that fires on `deleteSession` and on
+TTL-driven eviction; the hook calls `git worktree remove --force`
+before the session record is dropped.
+- **`AsyncJobManager` cwd-aware dedup** (`src/async-job-manager.ts`):
+the dedup key now includes the resolved `cwd`, so two
+`*_request_async` calls with identical argv but different
+worktree paths cannot collide (REGRESSIONS Lθ).
+
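Editor's sketch of the cwd-aware dedup described in the last bullet above. It assumes the key is derived from the CLI name, argv, resolved `cwd`, and the optional stdin payload (the 1.14.0 entry says stdin also participates in the key). `DedupInput` and `dedupKey` are hypothetical names chosen for illustration; the real `AsyncJobManager` may hash or structure its key differently.

```typescript
// Hypothetical input shape; the real AsyncJobManager may track more fields.
interface DedupInput {
  cli: string;
  args: readonly string[];
  cwd?: string; // resolved worktree path, when the request used one
  stdin?: string; // stdin payload, when the request piped one
}

// Serialising a fixed-order tuple keeps the key unambiguous: ["a", "b"] and
// ["a b"] produce different strings, and a missing cwd (null) cannot be
// confused with an empty-string cwd.
function dedupKey(input: DedupInput): string {
  return JSON.stringify([
    input.cli,
    input.args,
    input.cwd ?? null,
    input.stdin ?? null,
  ]);
}

// Identical argv, different worktrees: the keys differ, so the two jobs
// are not treated as duplicates of each other.
const baseRequest = { cli: "claude", args: ["-p", "summarise the diff"] };
const keyA = dedupKey({ ...baseRequest, cwd: "/repo/.worktrees/a" });
const keyB = dedupKey({ ...baseRequest, cwd: "/repo/.worktrees/b" });
const keyARepeat = dedupKey({ ...baseRequest, cwd: "/repo/.worktrees/a" });
```

Serialising a tuple rather than concatenating strings is a design choice of this sketch: it avoids accidental collisions between, say, an argv element containing a space and two separate argv elements.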
+### Out of scope — explicitly deferred
+
+- **Grok's `worktree` subcommand** (separate top-level subcommand
+on the Grok CLI, distinct from `-w/--worktree`).
+- **Claude's `--tmux`** (terminal-multiplexer integration).
+- **Startup sweep of orphaned `.worktrees/*`** — left to future
+housekeeping; the cleanup hook covers the happy path
+(session_delete + TTL eviction).
+- **Multi-repo / submodule semantics** — gateway assumes a single
+primary repo at `<repoRoot>`; multi-root behaviour is undefined.
+
+### Test surface
+
+`940 → 989` tests pass (+49):
+
+- **`src/__tests__/worktree-manager.test.ts`** (new, 26 tests) —
+unit-tests for `sanitizeWorktreeName`, `createWorktree` (including
+the rev-parse-before-add invariant + `WorktreeCollisionError`),
+`removeWorktree`, and `createWorktreeSessionCleanupHook`.
+- **`src/__tests__/test-veracity-regressions-slice-lambda.test.ts`**
+(new, 23 tests across REGRESSIONS Lα–Lθ + Lψ):
+- **Lα** — `sanitizeWorktreeName` path-traversal rejection.
+- **Lβ** — `createWorktree` runs `git rev-parse --verify` BEFORE
+`git worktree add`.
+- **Lγ** — `resolveWorktreeForRequest` persists `worktreePath`
+onto session metadata via `updateSessionMetadata`.
+- **Lδ** — same-session reuse: the second request with the same
+`sessionId` skips `git worktree add`.
+- **Lε** — `FileSessionManager.deleteSession` invokes the cleanup
+hook (and TTL eviction does too).
+- **Lζ** — `executor.executeCli` honours the resolved `cwd`.
+- **Lη** — contract-as-negative-oracle: no CLI receives
+`-w`/`--worktree` in emitted argv across all five providers
+(pairs with slice δ's contract-as-positive-oracle).
+- **Lθ** — `AsyncJobManager` dedup key includes `cwd`.
+- **Lψ** — `formatWorktreePrefix` envelope shape locked
+(`[gateway] worktree=<abs>\n`; empty when path missing).
+
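Editor's sketch of the two behaviours pinned by Lα and Lψ above. The function names match the changelog, but the bodies are an illustrative re-implementation, not the package source. The four rejection rules (`..`, leading `/`, control characters, length > 64) come from the 1.15.0 entry; rejecting the empty string and rejecting `..` anywhere in the name (rather than only as a path segment) are assumptions made here.

```typescript
// Illustrative re-implementation, not the gateway's source.
function sanitizeWorktreeName(name: string): string {
  // Upper bound of 64 is from the changelog; the lower bound is an assumption.
  if (name.length === 0 || name.length > 64) {
    throw new Error("worktree name must be 1-64 characters long");
  }
  if (name.startsWith("/")) {
    throw new Error("worktree name must not start with '/'");
  }
  // Assumption: any occurrence of ".." is rejected, not only whole segments.
  if (name.includes("..")) {
    throw new Error("worktree name must not contain '..'");
  }
  // ASCII control characters: U+0000 to U+001F, plus U+007F.
  if (/[\u0000-\u001f\u007f]/.test(name)) {
    throw new Error("worktree name must not contain control characters");
  }
  return name;
}

// Envelope shape pinned by Lψ: "[gateway] worktree=<abs>\n", or the empty
// string when the request did not use a worktree.
function formatWorktreePrefix(path?: string): string {
  return path ? `[gateway] worktree=${path}\n` : "";
}

const prefixed = formatWorktreePrefix("/repo/.worktrees/feature-x") + "tool output";
const untouched = formatWorktreePrefix(undefined) + "tool output";
```

Returning an empty prefix when no path is given is what keeps results byte-identical for callers that never pass `worktree`.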
+### Multi-LLM strict-evidence audit
+
+Per the standing protocol (`feedback_test_veracity_audit_protocol`
+
+- `feedback_multi_llm_review_gate`), the slice was audited round-1
+on 2026-05-28 against `docs/plans/slice-lambda.spec.md`.
+
+**Round 1 outcomes:**
+
+- Codex: UNCONDITIONAL APPROVE — 9/9 mutation probes RED as
+predicted; per-probe verbatim assertion text and pre/post-revert
+test counts. Worktree at `audit/codex-round-1`.
+- Grok: UNCONDITIONAL APPROVE — 9/9 RED, per-probe verbatim
+assertion text. Worktree at `audit/grok-round-1`.
+- Mistral: UNCONDITIONAL APPROVE — 9/9 RED with per-probe failed-
+count summaries. Worktree at `audit/mistral-round-1`
+(`5d75099`).
+- Gemini: **PARTIAL (quota-blocked)** — confirmed Lα–Lε RED (5/9)
+with assertion text matching the substantive reviewers before
+`TerminalQuotaError` (4h35m reset window > round budget) forced
+a stop. No findings, no contradictions.
+- Claude: **STRUCTURAL BLOCKER** — two `claude_request_async`
+jobs (`135c05c3-…`, `e411e8cc-…`) stalled silently
+(`stdoutBytes: 0` for ≥10 minutes); the second produced a
+1126-byte fabricated meta-summary with no per-probe evidence,
+rejected per the strict-evidence rule. Documented stall pattern,
+not a defect in slice λ.
+
+Four out of five independent vendor voices contributed evidence
+(three full + one partial corroborating) with one documented
+unfixable structural block, satisfying the slice-δ "4/5 minimum
+with documented block" bar. The three full audits are unanimous;
+the partial fourth corroborates without contradiction. Verdict:
+slice λ passes the gate and ships as v1.15.0.
+
+Full per-reviewer reports preserved at
+`docs/reviews/slice-lambda/{README,round-1-{codex,grok,mistral,
+gemini,claude}}.md`.
+
+### Mechanical anchors (verify with `rg` before relying)
+
+- `src/worktree-manager.ts` — new module, 277 lines.
+- `src/index.ts` — `WORKTREE_SCHEMA` (`:419-444`),
+`formatWorktreePrefix` (`:826-828`), `resolveWorktreeForRequest`
+- per-tool prefix injection (search `formatWorktreePrefix(`),
+10 × `worktree: WORKTREE_SCHEMA.optional()` registrations on
+every `*_request` / `*_request_async` tool input.
+- `src/session-manager.ts` — `cleanupHook` plumbing
+(`:53-90, 318-342`).
+- `src/async-job-manager.ts` — dedup-key cwd inclusion.
+
+## [1.14.0] - 2026-05-28 — Phase 4 slice κ (Claude explicit `cache_control` via `--input-format stream-json`)
+
+Ships the ninth Phase 4 slice. Callers can now opt their stable
+`promptParts` blocks into Anthropic's explicit `cache_control`
+breakpoints — the gateway switches from positional `-p <prompt>` to
+`claude -p --input-format stream-json` and pipes a JSON content-blocks
+payload via stdin. Smoke-test against a live 1-hour-cache-enabled
+account observed a **15,511-token shift from `cache_creation` to
+`cache_read` on the second call, 82 % cost drop, 36 % latency drop**.
+
+Seven recommendation commits land alongside the feature (default
+`outputFormat`, auto-emit-from-config, observability split, warning,
+schema mutex, smoke-script gate, tool description) plus three
+falsifiability-tightening commits driven by the multi-LLM review gate.
+
+### Added — slice κ feature
+
+- **`PromptParts.cacheControl`** (`src/prompt-parts.ts`): per-block
+boolean opt-in (`system?`/`tools?`/`context?`) with strict Zod
+schema. The `task` field is intentionally never markable — it's the
+volatile tail. Setting any flag activates the κ emission path.
+- **`assembleClaudeCacheBlocks(parts)`** helper (`src/prompt-parts.ts`):
+builds the `{type:"user",message:{role:"user",content:[…]}}` payload
+in `system → tools → context → task` order. Each marked non-empty
+block gets `cache_control: {type:"ephemeral", ttl:"1h"}`. Empty
+parts are silently skipped; markers on empty parts are a no-op.
+- **`prepareClaudeRequest` κ branch** (`src/index.ts`): when the
+caller marks any block AND requests `outputFormat: "stream-json"`,
+argv switches to `-p --input-format stream-json --output-format
+stream-json --include-partial-messages --verbose` with NO positional
+prompt; the prep result carries `stdinPayload` + `cacheControlBlocks`.
+Mixing `cacheControl` with `text`/`json` output returns an
+actionable error instead of silently coercing.
+- **`-p` arity widened** to a new `"optional"` (`src/upstream-contracts.ts`):
+consumes the next token as a value iff it does not start with `-`.
+Preserves the legacy `-p <prompt>` positional form AND validates the
+κ `-p` standalone form. New `--input-format` flag registered with
+`values: ["text","stream-json"]`. New conformance fixture
+`claude-input-format-stream-json` pins the exact κ argv combo.
+- **Executor + AsyncJobManager stdin** (`src/executor.ts`,
+`src/async-job-manager.ts`): both gain `stdin?: string` options.
+When set, stdio[0] switches from `"ignore"` to `"pipe"` and the
+payload is written. The stdin payload participates in the
+AsyncJobManager dedup key — two requests with identical argv but
+different cache_control payloads cannot collide.
+- **Flight recorder migration v4** (`src/flight-recorder.ts`):
+`cache_control_blocks INTEGER` column added idempotently;
+`FlightLogStart.cacheControlBlocks?` persists the per-request
+marker count for cache_state aggregates.
+
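Editor's sketch of the block assembly described in the `assembleClaudeCacheBlocks` bullet above. The ordering, the skip-empty rule, and the `cache_control` value are taken from the changelog. The `{ type: "text", text }` block shape, the exact `PromptParts` typing, and sending the payload as a single newline-terminated JSON line are assumptions of this sketch, not confirmed details of the package.

```typescript
// Assumed typing; the package validates PromptParts with a Zod schema instead.
interface PromptParts {
  system?: string;
  tools?: string;
  context?: string;
  task: string;
  cacheControl?: { system?: boolean; tools?: boolean; context?: boolean };
}

// Assumed content-block shape (a plain text block with an optional marker).
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl: "1h" };
}

interface UserMessage {
  type: "user";
  message: { role: "user"; content: TextBlock[] };
}

function assembleClaudeCacheBlocks(parts: PromptParts): UserMessage {
  const content: TextBlock[] = [];
  // Stable blocks first, in canonical order; `task` is appended last.
  const stableKeys = ["system", "tools", "context"] as const;
  for (const key of stableKeys) {
    const text = parts[key];
    if (!text) continue; // empty or missing part: skipped, marker is a no-op
    const block: TextBlock = { type: "text", text };
    if (parts.cacheControl?.[key]) {
      block.cache_control = { type: "ephemeral", ttl: "1h" };
    }
    content.push(block);
  }
  // The task is the volatile tail and is never marked.
  if (parts.task) content.push({ type: "text", text: parts.task });
  return { type: "user", message: { role: "user", content } };
}

const payload = assembleClaudeCacheBlocks({
  system: "You are a careful reviewer.",
  tools: "", // empty: skipped even though it is marked below
  context: "<large, stable reference material>",
  task: "Review the latest change.",
  cacheControl: { tools: true, context: true },
});
// Assumption: serialised as one newline-terminated JSON line for stdin.
const stdinPayload = JSON.stringify(payload) + "\n";
```

In this example only the `context` block carries a marker: `system` was not marked, and the marker on the empty `tools` part has no effect.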
+### Added — seven recommendations (rec #1..#7)
+
+- **Rec #1** — `claude_request` + `claude_request_async` default
+`outputFormat` changes from `"text"` to `"stream-json"`. The gateway
+already parses NDJSON usage events; the prior default routed every
+call through unparseable text, leaving 1,078 historic FR rows with
+NULL tokens. Override to `"text"` still works for callers that
+truly want raw stdout (loses observability).
+- **Rec #2** — `[cache_awareness].emit_anthropic_cache_control`
+config flag is now wired. When enabled AND the caller passes a
+`promptParts` whose stable prefix exceeds the per-model threshold
+(`minStableTokensForModel`), the gateway auto-marks the rightmost
+non-empty stable block (context → tools → system priority) with
+`ttl: "1h"`. Skipped when `optimizePrompt: true` (rec #5 desync
+risk) or `outputFormat !== "stream-json"`.
+- **Rec #3** — `GlobalCacheStats` (`src/cache-stats.ts`) gains five
+derived metrics that distinguish κ-explicit hits from Claude Code's
+baseline cache reads in the same flight-recorder window:
+`explicitCacheControlRows`, `explicitCacheControlHits`,
+`explicitCacheControlHitRate`, `stablePrefixReuseCount`,
+`avgCacheCreationAfterFirstCall` (averaged over rows AFTER the
+first-by-datetime in each stable-prefix reuse group).
+- **Rec #4** — new structured warning `cacheable_prefix_uncached`
+(`src/index.ts`): fires when `promptParts`' stable prefix is above
+the per-model threshold but no `cache_control` breakpoint will be
+emitted (caller didn't set it AND auto-emit also didn't fire). The
+warning includes the measured `stablePrefixTokens`, `threshold`,
+and `reason` (outputFormat-not-streamjson / config-off /
+no-eligible-block). Threaded through both Claude handlers.
+- **Rec #5** — `prepareClaudeRequest` refuses `optimizePrompt: true`
+combined with `promptParts.cacheControl` (`src/index.ts:1455`)
+before optimization runs. Without this mutex the FR `prompt` column
+would log optimized text while Claude actually received raw
+promptParts blocks via stdin, breaking prefix-cache reuse on the
+next call. Actionable error message points the caller at the
+combination to drop.
+- **Rec #6** — new `npm run smoke:cache-control` script
+(`package.json`). Runs `docs/plans/slice-kappa-smoke-test.mjs`,
+which gates on `SMOKE_CACHE_CONTROL=1` env var with a "BILLABLE
+TEST" banner so accidental invocation in CI does not burn live
+Anthropic credit (~$0.08 per run).
+- **Rec #7** — both Claude tools' `promptParts` descriptions now
+explicitly document the `cacheControl` opt-in, the
+`outputFormat: "stream-json"` requirement, the `ttl='1h'`
+hard-code, and the "task is the volatile tail" convention.
+
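Editor's sketch of the Rec #2 auto-mark decision described above: which stable block, if any, would receive the automatic breakpoint. The skip conditions and the `context → tools → system` priority come from the changelog. `pickAutoMarkBlock`, `AutoMarkInput`, and the characters-divided-by-four token estimate are hypothetical stand-ins; the package's `minStableTokensForModel` lookup and its own token heuristic are not reproduced here.

```typescript
type StableKey = "system" | "tools" | "context";

// Hypothetical input bundle for this sketch.
interface AutoMarkInput {
  parts: Partial<Record<StableKey, string>>;
  configEnabled: boolean; // stands in for [cache_awareness].emit_anthropic_cache_control
  optimizePrompt: boolean;
  outputFormat: string;
  threshold: number; // per-model minimum stable tokens, supplied by the caller
}

// Assumed estimate: string length / 4. This only approximates a byte-based
// heuristic for ASCII text.
function estimateStableTokens(parts: AutoMarkInput["parts"]): number {
  const chars =
    (parts.system ?? "").length +
    (parts.tools ?? "").length +
    (parts.context ?? "").length;
  return Math.floor(chars / 4);
}

// Returns the block to mark, or null when auto-emit should not fire.
function pickAutoMarkBlock(input: AutoMarkInput): StableKey | null {
  if (!input.configEnabled) return null;
  if (input.optimizePrompt) return null; // rec #5 desync risk
  if (input.outputFormat !== "stream-json") return null;
  if (estimateStableTokens(input.parts) <= input.threshold) return null;
  // Rightmost non-empty stable block wins.
  const priority: StableKey[] = ["context", "tools", "system"];
  for (const key of priority) {
    if (input.parts[key]) return key;
  }
  return null;
}

const common = {
  configEnabled: true,
  optimizePrompt: false,
  outputFormat: "stream-json",
  threshold: 1024,
};
// Context is empty and tools is absent, so the system block is chosen.
const marked = pickAutoMarkBlock({
  ...common,
  parts: { system: "s".repeat(8000), context: "" },
});
```

Marking the rightmost stable block places the breakpoint as late as possible, so everything before it falls inside the cached prefix.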
+### Tests + multi-LLM review gate
+
+`886 → 940` tests pass. 54 new tests across `Kα/Kβ/Kγ/Kδ/Kε/Kζ`
+regression sets + 13 falsifiability-gap closures + 1 SQL-drop
+falsifier strengthening. Every new test is mutation-probe-verified:
+the targeted regression goes red on the predicted mutation.
+
+The branch passed a strict-evidence multi-LLM review gate per the
+project's standing protocol (`feedback_multi_llm_review_gate.md` and
+`feedback_test_veracity_audit_protocol.md`). Round 3 was sequential
+to avoid concurrent gateway contention; all four reviewers — Codex
+(`gpt-5.4`), Grok (`grok-build`), Mistral (`mistral-medium-3.5`),
+Claude (`sonnet-4-6`) — issued **UNCONDITIONAL APPROVE** against the
+head with file:line citations and executed mutation probes. The
+iteration trail (Codex round-3 REJECT → fix → recheck APPROVE; Grok
+round-3 REJECT → fix → recheck APPROVE; Mistral + Claude first-pass
+APPROVE) is preserved in commit history (`bea1aee` and `bbc3b5f`).
+
+### Caller-honest framing
+
+- κ adds caller-side reuse ON TOP of the irreducible ~10–12K
+`cache_creation` token floor that every fresh `claude -p` session
+rebuilds (Claude Code's session-wrap content). The _added_ benefit
+scales with the caller's stable block size, not the total prompt.
+- The `ttl='1h'` hard-code is mandatory because Anthropic rejects a
+`5m` block after Claude Code's own 1h-marked session blocks; the
+gateway warns if `[cache_awareness].anthropic_ttl_seconds` says 300.
+- Recommended migration: callers running batch / orchestration /
+repeated similar prompts should opt in; callers running one-shot
+ad-hoc prompts won't see benefit.
+
+### Files
+
+```
+src/prompt-parts.ts — PromptParts.cacheControl + assembleClaudeCacheBlocks
+src/index.ts — prepareClaudeRequest κ branch + rec #1/#2/#4/#5/#7 + handler threading
+src/upstream-contracts.ts — arity "optional", --input-format, claude-input-format-stream-json fixture
+src/executor.ts — ExecuteOptions.stdin? threading
+src/async-job-manager.ts — stdin? + dedup-key + cacheControlBlocks plumbing
+src/flight-recorder.ts — migration v4 + cache_control_blocks column
+src/cache-stats.ts — GlobalCacheStats 5 new derived metrics
+package.json — smoke:cache-control script
+docs/plans/slice-kappa.spec.md — audit spec
+docs/plans/slice-kappa-final-review.spec.md — round-3 review spec
+docs/plans/slice-kappa-captures/ — live smoke evidence
+docs/plans/slice-kappa-smoke-test.mjs — billable smoke script (SMOKE_CACHE_CONTROL gated)
+src/__tests__/test-veracity-regressions-slice-kappa.test.ts — 40 κ regressions (Kα/Kβ/Kγ/Kδ/Kε/Kζ)
+src/__tests__/cache-stats.test.ts — +7 rec #3 + SQL-drop falsifier tests
+src/__tests__/prompt-parts-tool-wiring.test.ts — +5 B1/B2/D1/D2 schema falsifiers
+src/__tests__/smoke-script-gate.test.ts — 2 I2 subprocess tests
+```
+
 ## [1.13.2] - 2026-05-27 — Claude stream-json regression fix (--verbose now required)
 
 Patch release. Single user-facing fix to `claude_request` /
@@ -12,7 +302,7 @@ Patch release. Single user-facing fix to `claude_request` /
 - Claude CLI 2.x rejects `--print --output-format=stream-json` without
 `--verbose` ("When using --print, --output-format=stream-json requires
 --verbose"). The gateway was emitting `--output-format stream-json
-
+--include-partial-messages` without `--verbose`, so every claude
 request configured for stream-json (sync or async) was exiting 1.
 - `prepareClaudeRequest` now pushes `--verbose` as part of the
 stream-json arg group. `--verbose` only affects what claude writes to
@@ -26,7 +316,7 @@ Patch release. Single user-facing fix to `claude_request` /
 recorded in the FR for the first time since the CLI started enforcing
 `--verbose`.
 - Direct CLI verification: `claude -p ... --output-format stream-json
-
+--verbose --include-partial-messages` returned a clean NDJSON stream
 with `cache_read_input_tokens: 17978` and
 `cache_creation_input_tokens: 17435` on a 1-hour-cache-enabled
 account. The parser path is correct; only the missing flag was
@@ -36,7 +326,7 @@ Patch release. Single user-facing fix to `claude_request` /
 
 - New regression: `prepareClaudeRequest` emits `--verbose` when
 `outputFormat: "stream-json"` and does NOT emit it for `text` / `json`
-(src
+(src/**tests**/claude-handler.test.ts).
 - Updated `upstream-contracts.test.ts` "accepts a valid Claude argv
 emitted by the gateway" to pin the three-flag combo so a future
 removal of `--verbose` fails at the contract gate.
@@ -106,7 +396,7 @@ regressions) plus this release commit.
 enumerate). Also settable via the `GROK_SANDBOX` env var. Caller
 responsibility to pass a valid profile name. The slice deliberately
 does **not** integrate `--sandbox` with `approvalStrategy:
-
+"mcp_managed"` because the value is unbounded — Grok's approval
 semantics are already covered by `permissionMode` + `alwaysApprove` +
 `approvalStrategy`.
 - **`rules`** → `--rules <RULES>`. Supports `@file` prefix per
@@ -172,7 +462,7 @@ parallel with mandatory mutation-probe execution against
 
 - Codex: UNCONDITIONAL APPROVE — all 12 probes [as predicted], all
 26 tests VERIFIED. Baseline (`npm test`: 55 files / 884 tests; build
-
+- format:check clean; slice file 31/31).
 - Grok: UNCONDITIONAL APPROVE — all 12 probes [as predicted]; ran in
 an isolated worktree at `/tmp/theta-audit-grok` per the slice-ζ
 reviewer-stomping lesson.
@@ -182,8 +472,8 @@ parallel with mandatory mutation-probe execution against
 beyond the spec and closes the "enum-mistake stays silent if fixture
 uses a listed value" gap.
 - Gemini: **FAILED at 10s** with `TerminalQuotaError: You have
-
-
+exhausted your capacity on this model. Your quota will reset after
+52m10s.` (Google 429). Documented quota blocker per protocol clause
 5+6 — counts as "concrete unfixable when documented". Four
 substantive valid approves from independent vendor families (OpenAI,
 xAI, Mistral, Anthropic) satisfy the gate.
@@ -352,7 +642,7 @@ this release commit.
 so no extra gating required.
 - Both tools accept a new `jsonSchema` field
 (`string | Record<string, unknown>`). Per `claude --help`, the CLI
-argument is the JSON Schema
+argument is the JSON Schema _literal_ (not a path; contrast with Codex
 `--output-schema`). Object values are `JSON.stringify`-d; string values
 pass verbatim. Use with `outputFormat: "json"` for structured output
 validation. Achieves Codex parity for structured-output validation
@@ -650,7 +940,7 @@ for the async tools and the codex CLI.
 already terminated before the arm signal landed.
 - `JobStore.markOrphanedOnStartup()` return shape extended from `number`
 to `{ count, orphaned: Array<{ id, correlationId, startedAt, stdout,
-
+stderr, exitCode }> }` so the manager constructor can write FR
 `logComplete` rows for previously orphaned jobs with proper audit data
 (durationMs from `startedAt`, response from `stderr || stdout`,
 errorMessage `"orphaned after gateway restart"`). `SqliteJobStore`
@@ -782,8 +1072,9 @@ Pure documentation release; zero source-code changes since 1.6.0.
 ### Fixed — `docs/launch/blog-cache-awareness.md` accuracy + voice
 
 Technical corrections from the multi-LLM voice + technical review:
+
 - Mutually-exclusive error-string quotation reformatted so the
-``provide exactly one of `prompt`
+``provide exactly one of `prompt`or`promptParts``` example renders
 correctly in markdown.
 - `lastWriteAt` references corrected to `lastRequestAt` (the actual
 public field name on `SessionCacheStats`).
@@ -854,8 +1145,7 @@ Also includes (beyond cache-awareness):
 The gateway concatenates in canonical order (`system → tools → context → task`)
 so the stable prefix bytes precede the volatile task tail unchanged across
 calls — raising implicit cache hit rate without calling provider cache APIs.
-The exact error strings `provide exactly one of \`prompt\` or \`promptParts\``
-and `one of \`prompt\` or \`promptParts\` is required` are stable API
+The exact error strings `provide exactly one of \`prompt\` or \`promptParts\``and`one of \`prompt\` or \`promptParts\` is required` are stable API
 contract.
 - **Flight-recorder v3 migration**: new columns `stable_prefix_hash`
 (sha256) and `stable_prefix_tokens` (integer bytes/4 heuristic) on
@@ -886,9 +1176,9 @@ Also includes (beyond cache-awareness):
 - `warn_on_ttl_expiry = false`
 - `[cache_awareness.min_stable_tokens_for_cache_control]` per-family
 table (sonnet=1024, opus=4096, haiku=4096, default=4096).
-
-
-
+Validated by a separate Zod schema and loader (`loadCacheAwarenessConfig`);
+a malformed `[cache_awareness]` block does NOT break `loadPersistenceConfig`
+and vice versa. No env-var overrides.
 
 ### Decision: Branch B (prefix-discipline only) for slice 1
 
@@ -1208,6 +1498,7 @@ Lands DAG layers 6-12 — the personal-MCP MVP terminal plus all of Phase 0-3 pr
 - **No self-update** — `cli_upgrade --cli mistral` detects pip / uv / brew via probes and dispatches to `pip install -U vibe-cli`, `uv tool upgrade vibe-cli`, or `brew upgrade mistral-vibe`. Unknown installations return an actionable error rather than running a non-existent `vibe update`.
 
 Other surfaces extended: `SESSION_PROVIDER_VALUES` now includes `"mistral"`; `list_models`, `cli_versions`, `cli_upgrade`, `approval_list`, `session_create`, `session_list`, and `session_clear_all` accept the fifth provider; new MCP resources `sessions://mistral` and `models://mistral` are registered; `validate_with_models` / `consensus_check` / `red_team_review` can route to Mistral.
+
 - **U23 — JSON output + token/cost parity across providers.** New `src/codex-json-parser.ts` parses the Codex `--json` JSONL event stream (`thread.started`, `turn.started`/`completed`/`failed`, `item.*`, `error`); lenient against partial streams and garbage preamble. New `src/gemini-json-parser.ts` parses `gemini -o json` output and maps `usageMetadata.{promptTokenCount, candidatesTokenCount, cachedContentTokenCount}`. `extractUsageAndCost` is now a thin per-provider dispatcher returning `{inputTokens, outputTokens, cacheReadTokens?, cacheCreationTokens?, costUsd?}` for every provider that supports JSON; Claude `cache_read_input_tokens` / `cache_creation_input_tokens` are now plumbed through instead of being discarded. `codex_request`, `codex_request_async`, `gemini_request`, and `gemini_request_async` now expose `outputFormat: enum("text","json")` — set to `"json"` and the gateway emits `--json` (Codex) or `-o json` (Gemini) and forwards parsed usage/cost into the flight recorder. Flight-recorder schema gains `cache_read_tokens` and `cache_creation_tokens` columns via idempotent migration (`PRAGMA table_info` → `ALTER TABLE ADD COLUMN`); existing `logs.db` files are upgraded in place. 15 new tests.
 - **U24 — Permission/approval-mode parity across providers.** Claude `permissionMode` enum (`default | acceptEdits | plan | auto | dontAsk | bypassPermissions`) replaces the boolean `dangerouslySkipPermissions` (the boolean still works and now maps to `permissionMode: "bypassPermissions"`; setting both logs a warning, `permissionMode` wins). Gemini `approvalMode` gains `plan`. Codex splits `--full-auto` into `sandboxMode: enum("read-only","workspace-write","danger-full-access")` and `askForApproval: enum("untrusted","on-request","never")`, emitting `--sandbox <mode>` and `--ask-for-approval <mode>` independently; legacy `fullAuto: true` still works and expands to `--sandbox workspace-write --ask-for-approval never` by default, with `useLegacyFullAutoFlag: true` as an explicit escape hatch to emit `--full-auto` directly. Codex resume mode filters all three flags (`--full-auto`, `--sandbox`, `--ask-for-approval`) since `codex exec resume` inherits the session's policy. 26 new tests.
 - **U25 — Claude high-impact features.** `claude_request` / `claude_request_async` schemas gain `agent?: string` (single sub-agent dispatch), `agents?: Record<string, object>` (multi-agent JSON, validated against `CLAUDE_AGENT_DEFINITION_SCHEMA` before emit), `forkSession?: boolean`, `systemPrompt?: string`, `appendSystemPrompt?: string` (mutually exclusive at the schema + tool-callback boundary), `maxBudgetUsd?: number`, `maxTurns?: number`, `effort?: enum("low","medium","high","xhigh","max")`, and `excludeDynamicSystemPromptSections?: boolean`. Each emits the documented `--<flag>` form. 25 new tests in `src/__tests__/claude-handler.test.ts`.
@@ -1300,7 +1591,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 
 ### Fixed
 
-- **SIGTERM→SIGKILL escalation bug** — `proc.killed` becomes `true` after `.kill()` is
+- **SIGTERM→SIGKILL escalation bug** — `proc.killed` becomes `true` after `.kill()` is _called_, not after the process _exits_, so the SIGKILL guard (`if (!proc.killed)`) was always false. Replaced with an `exited` flag set by `close`/`error` events in both `executor.ts` and `async-job-manager.ts`
 - **Timer priority race** — When both `timeout` and `idleTimeout` are set, idle timeout now clears the wall-clock timer to prevent `timedOut` from overriding `idledOut` in the close handler (which would misclassify code 125 as transient code 124)
 
 ### Added
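Editor's sketch of the `exited`-flag pattern described in the SIGTERM→SIGKILL fix in the hunk above. It assumes a minimal process interface and an injected scheduler so the logic can be read and exercised without spawning anything. `KillableProcess`, `Scheduler`, and `terminateWithEscalation` are hypothetical names; the package's `executor.ts` and `async-job-manager.ts` are not reproduced here.

```typescript
// Minimal structural stand-in for the parts of a child process used here.
interface KillableProcess {
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
  on(event: "close" | "error", listener: () => void): unknown;
}

// Injected so the sketch does not depend on a particular timer
// implementation; in Node one would typically pass setTimeout.
type Scheduler = (callback: () => void, delayMs: number) => unknown;

function terminateWithEscalation(
  proc: KillableProcess,
  graceMs: number,
  schedule: Scheduler,
): void {
  // Track real exit ourselves. A `killed` property flips to true as soon as
  // kill() is called, so it cannot tell us whether the process is still alive.
  let exited = false;
  proc.on("close", () => {
    exited = true;
  });
  proc.on("error", () => {
    exited = true;
  });

  proc.kill("SIGTERM");
  schedule(() => {
    // Escalate only if the process has not actually gone away.
    if (!exited) proc.kill("SIGKILL");
  }, graceMs);
}
```

With the buggy guard (`if (!proc.killed)`), the scheduled callback would never send SIGKILL, because `killed` is already `true` once SIGTERM has been sent. Guarding on an exit flag set by `close`/`error` restores the escalation.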
@@ -1385,6 +1676,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 ## Core Features
 
 ### Multi-LLM Orchestration
+
 - **3 CLI tools supported**: Claude Code, Codex, Gemini
 - **Unified MCP interface**: Single protocol for all LLMs
 - **Cross-tool collaboration**: LLMs can use each other via MCP
@@ -1392,6 +1684,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - **Correlation ID tracking**: Full request tracing
 
 ### Token Optimization
+
 - **Auto-optimization middleware**: 44% reduction on prompts, 37% on responses
 - **15+ optimization patterns**: Remove filler, compact types, arrow notation
 - **Opt-in feature**: `optimizePrompt` and `optimizeResponse` flags
@@ -1399,6 +1692,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - **Research-backed**: 42 sources, best practices documented
 
 ### Reliability & Performance
+
 - **Retry logic**: Exponential backoff with circuit breaker
 - **Atomic file writes**: Process-specific temp files with fsync
 - **Memory limits**: 50MB cap on CLI output prevents DoS
@@ -1406,6 +1700,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - **Non-zero exit code handling**: Proper retry behavior
 
 ### Security Hardening
+
 - **No secret leakage**: Generic session descriptions only
 - **File permissions**: 0o600 on sensitive files
 - **No ReDoS vulnerabilities**: Bounded regex patterns
@@ -1414,6 +1709,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - **Custom storage paths**: Secure directory creation
 
 ### Testing & Quality
+
 - **114 tests**: 68 unit, 41 integration, 5 optimizer
 - **Real CLI integration**: Not mocks
|
|
1419
1715
|
- **Regression tests**: ReDoS, schema validation, retry behavior
|
|
@@ -1421,6 +1717,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1421
1717
|
- **Edge case coverage**: Timeouts, errors, concurrency
|
|
1422
1718
|
|
|
1423
1719
|
### Documentation Excellence
|
|
1720
|
+
|
|
1424
1721
|
- **7 comprehensive guides**: 4,000+ lines total
|
|
1425
1722
|
- **Research-backed**: TOKEN_OPTIMIZATION_GUIDE.md with 42 sources
|
|
1426
1723
|
- **Real-world examples**: PROMPT_OPTIMIZATION_EXAMPLES.md with 5 examples
|
|
@@ -1432,6 +1729,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1432
1729
|
## Added
|
|
1433
1730
|
|
|
1434
1731
|
### Features
|
|
1732
|
+
|
|
1435
1733
|
- Multi-LLM CLI orchestration via MCP
|
|
1436
1734
|
- Session management with persistence
|
|
1437
1735
|
- Correlation ID tracking for request tracing
|
|
@@ -1443,6 +1741,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1443
1741
|
- Custom storage path support
|
|
1444
1742
|
|
|
1445
1743
|
### Tools (MCP)
|
|
1744
|
+
|
|
1446
1745
|
- `claude_request` - Execute Claude Code CLI
|
|
1447
1746
|
- `codex_request` - Execute Codex CLI
|
|
1448
1747
|
- `gemini_request` - Execute Gemini CLI
|
|
@@ -1456,6 +1755,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1456
1755
|
- `list_models` - List available models for each CLI
|
|
1457
1756
|
|
|
1458
1757
|
### Resources (MCP)
|
|
1758
|
+
|
|
1459
1759
|
- `sessions://all` - All sessions across CLIs
|
|
1460
1760
|
- `sessions://claude` - Claude-specific sessions
|
|
1461
1761
|
- `sessions://codex` - Codex-specific sessions
|
|
@@ -1464,6 +1764,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1464
1764
|
- `metrics://performance` - Performance metrics and stats
|
|
1465
1765
|
|
|
1466
1766
|
### Documentation
|
|
1767
|
+
|
|
1467
1768
|
- `README.md` - Installation and usage guide
|
|
1468
1769
|
- `BEST_PRACTICES.md` - Design and implementation patterns
|
|
1469
1770
|
- `TOKEN_OPTIMIZATION_GUIDE.md` - Research-backed optimization techniques (42 sources)
|
|
@@ -1477,6 +1778,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1477
1778
|
- `CROSS_TOOL_SUCCESS.md` - Cross-LLM collaboration validation
|
|
1478
1779
|
|
|
1479
1780
|
### Tests
|
|
1781
|
+
|
|
1480
1782
|
- 68 unit tests (executor, sessions, metrics, optimizer)
|
|
1481
1783
|
- 41 integration tests (full MCP with real CLIs)
|
|
1482
1784
|
- 5 optimizer tests (pattern validation, ReDoS prevention)
|
|
@@ -1489,6 +1791,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1489
1791
|
### First Review Round (8 bugs)
|
|
1490
1792
|
|
|
1491
1793
|
**Critical:**
|
|
1794
|
+
|
|
1492
1795
|
1. **session_set_active schema mismatch** (src/index.ts:430)
|
|
1493
1796
|
- Issue: Documentation said "null to clear" but z.string() rejected null
|
|
1494
1797
|
- Fix: Changed to z.string().nullable()
|
|
@@ -1504,12 +1807,12 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1504
1807
|
- Fix: Integrated withRetry + CircuitBreaker into executeCli
|
|
1505
1808
|
- Impact: Transient failures now retried automatically
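The retry and circuit-breaker combination named in the fix above can be sketched as two small pieces. This is not the gateway's `withRetry` or `CircuitBreaker`: the function name `backoffDelayMs`, the thresholds, the cooldown, and the caller-supplied clock are all assumptions chosen for illustration.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Opens after `threshold` consecutive failures; allows a probe after `cooldownMs`.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly threshold = 3,
    private readonly cooldownMs = 10000,
  ) {}

  // Allowed while closed, or once the cooldown has elapsed (half-open probe).
  canRequest(now: number): boolean {
    return this.openedAt === null || now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}
```

A retry loop would check `canRequest(Date.now())` before each attempt, wait `backoffDelayMs(attempt)` between attempts, and report the outcome back to the breaker.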
|
|
1506
1809
|
|
|
1507
|
-
**Medium:**
|
|
1508
|
-
4. **Integration test brittleness**
|
|
1509
|
-
- Issue: Tests failed without dist/ or CLIs installed
|
|
1510
|
-
- Fix: Tests properly skip when CLIs unavailable
|
|
1810
|
+
**Medium:** 4. **Integration test brittleness**
|
|
1511
1811
|
|
|
1512
|
-
|
|
1812
|
+
- Issue: Tests failed without dist/ or CLIs installed
|
|
1813
|
+
- Fix: Tests properly skip when CLIs unavailable
|
|
1814
|
+
|
|
1815
|
+
5. **Test timing issues** (src/__tests__/session-manager.test.ts:216,429)
|
|
1513
1816
|
- Issue: setTimeout not awaited → false positives
|
|
1514
1817
|
- Fix: Proper async/await patterns
|
|
1515
1818
|
|
|
@@ -1517,10 +1820,10 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1517
1820
|
- Issue: All stdout/stderr buffered in memory with no cap
|
|
1518
1821
|
- Fix: Added 50MB limit with early termination
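The output cap described above can be sketched as an accumulator that refuses data past a byte limit. The class name `CappedOutput` is hypothetical, and reading "50MB" as 50 × 1024 × 1024 bytes is an assumption. The caller is expected to terminate the child process when `push` returns `false`.

```typescript
// Assumed reading of the "50MB" cap from the changelog.
const MAX_OUTPUT_BYTES = 50 * 1024 * 1024;

class CappedOutput {
  private readonly chunks: Uint8Array[] = [];
  private bytes = 0;

  constructor(private readonly limit: number = MAX_OUTPUT_BYTES) {}

  // Returns false once the limit would be exceeded; the chunk is then discarded.
  push(chunk: Uint8Array): boolean {
    if (this.bytes + chunk.byteLength > this.limit) return false;
    this.chunks.push(chunk);
    this.bytes += chunk.byteLength;
    return true;
  }

  get size(): number {
    return this.bytes;
  }
}
```

In Node.js, the `data` events on a child's stdout and stderr deliver `Buffer` objects, which are `Uint8Array` instances, so they can be passed to `push` directly.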
|
|
1519
1822
|
|
|
1520
|
-
**Low:**
|
|
1521
|
-
|
|
1522
|
-
|
|
1523
|
-
|
|
1823
|
+
**Low:** 7. **Model data duplication** (src/index.ts:64, src/resources.ts:22)
|
|
1824
|
+
|
|
1825
|
+
- Issue: CLI_INFO defined in two places
|
|
1826
|
+
- Fix: Centralized in single location
|
|
1524
1827
|
|
|
1525
1828
|
8. **Unused code** (src/resources.ts:33)
|
|
1526
1829
|
- Issue: listResources() never called
|
|
@@ -1529,27 +1832,28 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1529
1832
|
### Second Review Round (8 bugs)
|
|
1530
1833
|
|
|
1531
1834
|
**Critical:**
|
|
1835
|
+
|
|
1532
1836
|
1. **Secret leakage via session descriptions** (src/index.ts + src/session-manager.ts)
|
|
1533
1837
|
- Issue: First 50 chars of prompts stored in plain text
|
|
1534
1838
|
- Fix: Generic descriptions ("Claude Session"), file permissions 0o600
|
|
1535
1839
|
- Impact: No user data exposed in session files
|
|
1536
1840
|
|
|
1537
|
-
**High:**
|
|
1538
|
-
|
|
1539
|
-
|
|
1540
|
-
|
|
1541
|
-
|
|
1841
|
+
**High:** 2. **ReDoS in optimizer regex** (src/optimizer.ts:241,244)
|
|
1842
|
+
|
|
1843
|
+
- Issue: Catastrophic backtracking with .+? patterns
|
|
1844
|
+
- Fix: Bounded character sets [A-Za-z][\w-]\*
|
|
1845
|
+
- Impact: No DoS from malicious prompts
|
|
1542
1846
|
|
|
1543
1847
|
3. **Custom storage path directory not created** (src/session-manager.ts:36)
|
|
1544
1848
|
- Issue: ensureStorageDirectory only created default path
|
|
1545
1849
|
- Fix: Create dirname(storagePath) for custom paths
|
|
1546
1850
|
- Impact: Custom storage paths work without errors
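The bounded-pattern fix for the optimizer regex (item 2 above) can be illustrated with a single rewrite rule. The rule itself is hypothetical and is not taken from `src/optimizer.ts`; only the character class `[A-Za-z][\w-]*` comes from the changelog. The capture uses one quantifier over a fixed character class instead of a lazy `.+?`.

```typescript
// Hypothetical rule: "the function named parse-args" -> "fn parse-args".
// The name must start with a letter and continue with word characters or hyphens.
const NAMED_FUNCTION = /the function named ([A-Za-z][\w-]*)/g;

function compactNamedFunctions(text: string): string {
  return text.replace(NAMED_FUNCTION, "fn $1");
}
```

Because the class excludes whitespace, the capture stops at the end of the identifier and cannot stretch across the rest of the prompt.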
|
|
1547
1851
|
|
|
1548
|
-
**Medium:**
|
|
1549
|
-
|
|
1550
|
-
|
|
1551
|
-
|
|
1552
|
-
|
|
1852
|
+
**Medium:** 4. **Atomic write temp filename collision** (src/session-manager.ts:57)
|
|
1853
|
+
|
|
1854
|
+
- Issue: All processes used same .tmp filename
|
|
1855
|
+
- Fix: Process-specific temp files (sessions.json.tmp.${process.pid})
|
|
1856
|
+
- Impact: Safe multi-process deployments
|
|
1553
1857
|
|
|
1554
1858
|
5. **Retry doesn't handle non-zero exit codes** (src/executor.ts:99)
|
|
1555
1859
|
- Issue: Only thrown errors triggered retry
|
|
@@ -1561,11 +1865,11 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1561
1865
|
- Fix: 50MB limit with process termination
|
|
1562
1866
|
- Impact: DoS prevention
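The process-specific temp file fix (item 4 above), together with the directory creation from item 3, can be sketched as one helper. `writeFileAtomic` is a hypothetical name, and the sketch assumes the temp file and the target live on the same filesystem so the final rename replaces the target in one step.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Write `data` to `target` via a per-process temp file, fsync, then rename.
function writeFileAtomic(target: string, data: string): void {
  // Create the parent directory, including for custom storage paths.
  fs.mkdirSync(path.dirname(target), { recursive: true });

  // The PID suffix keeps concurrent processes from sharing one temp file.
  const tmp = `${target}.tmp.${process.pid}`;
  const fd = fs.openSync(tmp, "w", 0o600);
  try {
    fs.writeSync(fd, data);
    fs.fsyncSync(fd); // flush to disk before the rename makes it visible
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmp, target);
}

// Usage: write into a scratch directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "sessions-"));
const file = path.join(dir, "nested", "sessions.json");
writeFileAtomic(file, JSON.stringify({ sessions: [] }));
```

If the write throws, this sketch leaves the temp file behind; a fuller version would remove it in the error path.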
|
|
1563
1867
|
|
|
1564
|
-
**Low:**
|
|
1565
|
-
|
|
1566
|
-
|
|
1567
|
-
|
|
1568
|
-
|
|
1868
|
+
**Low:** 7. **Performance overhead from NVM scanning** (src/executor.ts:41)
|
|
1869
|
+
|
|
1870
|
+
- Issue: Filesystem scan on every request
|
|
1871
|
+
- Fix: Cache NVM path at module load
|
|
1872
|
+
- Impact: Performance improvement
|
|
1569
1873
|
|
|
1570
1874
|
8. **Unused imports** (src/session-manager.ts:4, src/executor.ts:7)
|
|
1571
1875
|
- Issue: Dead code and unused parameters
|
|
@@ -1577,6 +1881,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1577
1881
|
## Security
|
|
1578
1882
|
|
|
1579
1883
|
### Vulnerabilities Fixed
|
|
1884
|
+
|
|
1580
1885
|
- ✅ **Secret leakage**: No user data in session descriptions
|
|
1581
1886
|
- ✅ **File permissions**: 0o600 on sessions.json
|
|
1582
1887
|
- ✅ **ReDoS**: Bounded regex patterns prevent DoS
|
|
@@ -1585,6 +1890,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1585
1890
|
- ✅ **Command injection**: Already prevented via spawn with args
|
|
1586
1891
|
|
|
1587
1892
|
### Security Best Practices
|
|
1893
|
+
|
|
1588
1894
|
- Input validation with Zod schemas
|
|
1589
1895
|
- No stack trace leakage in errors
|
|
1590
1896
|
- Atomic file writes with fsync
|
|
@@ -1596,6 +1902,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1596
1902
|
## Performance
|
|
1597
1903
|
|
|
1598
1904
|
### Optimizations Added
|
|
1905
|
+
|
|
1599
1906
|
- **Token optimization**: 44% reduction on prompts, 37% on responses
|
|
1600
1907
|
- **NVM path caching**: Eliminates I/O on every request
|
|
1601
1908
|
- **Circuit breaker**: Fast-fail during outages
|
|
@@ -1603,6 +1910,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1603
1910
|
- **Memory limits**: Prevents resource exhaustion
|
|
1604
1911
|
|
|
1605
1912
|
### Metrics
|
|
1913
|
+
|
|
1606
1914
|
- Request counts per CLI tool
|
|
1607
1915
|
- Response times with percentiles
|
|
1608
1916
|
- Success/failure rates
|
|
@@ -1614,6 +1922,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1614
1922
|
## Testing
|
|
1615
1923
|
|
|
1616
1924
|
### Test Growth
|
|
1925
|
+
|
|
1617
1926
|
- **Initial**: 104 tests
|
|
1618
1927
|
- **After first fixes**: 109 tests (+5 from retry integration)
|
|
1619
1928
|
- **After optimizer**: 113 tests (+4 from optimizer)
|
|
@@ -1621,6 +1930,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1621
1930
|
- **Growth**: +10 tests (9.6% increase)
|
|
1622
1931
|
|
|
1623
1932
|
### Coverage Areas
|
|
1933
|
+
|
|
1624
1934
|
- Unit: Executor, session manager, metrics, optimizer
|
|
1625
1935
|
- Integration: Full MCP protocol with real CLI execution
|
|
1626
1936
|
- Regression: Schema validation, ReDoS, retry behavior
|
|
@@ -1631,6 +1941,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1631
1941
|
## Documentation
|
|
1632
1942
|
|
|
1633
1943
|
### Guides Created
|
|
1944
|
+
|
|
1634
1945
|
1. **README.md** - Installation, usage, API reference
|
|
1635
1946
|
2. **BEST_PRACTICES.md** - Design patterns and architecture
|
|
1636
1947
|
3. **TOKEN_OPTIMIZATION_GUIDE.md** - Research (42 sources)
|
|
@@ -1644,6 +1955,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1644
1955
|
11. **CROSS_TOOL_SUCCESS.md** - Collaboration proof
|
|
1645
1956
|
|
|
1646
1957
|
### Total Documentation
|
|
1958
|
+
|
|
1647
1959
|
- **11 comprehensive files**
|
|
1648
1960
|
- **~8,000 lines** of documentation
|
|
1649
1961
|
- **Research-backed** with citations
|
|
@@ -1654,17 +1966,20 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1654
1966
|
## Dogfooding Validation
|
|
1655
1967
|
|
|
1656
1968
|
### Multi-LLM Review Process
|
|
1969
|
+
|
|
1657
1970
|
- **Claude Sonnet 4.5**: Strategic/product review (8.5/10 → 10/10)
|
|
1658
1971
|
- **Codex**: Bug finding and implementation (13 bugs found, 13 fixed)
|
|
1659
1972
|
- **Gemini 2.5 Pro**: Security analysis (3 critical issues found, 3 fixed)
|
|
1660
1973
|
|
|
1661
1974
|
### Self-Improvement Cycle
|
|
1975
|
+
|
|
1662
1976
|
1. ✅ Multi-LLM review found 16 bugs
|
|
1663
1977
|
2. ✅ Codex fixed all bugs via MCP
|
|
1664
1978
|
3. ✅ Gateway validated fixes via test suite
|
|
1665
1979
|
4. ✅ Complete autonomous improvement demonstrated
|
|
1666
1980
|
|
|
1667
1981
|
### Workflow Validated
|
|
1982
|
+
|
|
1668
1983
|
```
|
|
1669
1984
|
Implement (Codex) → Review (Gemini) → Fix (Codex) → Verify (Tests) → Iterate
|
|
1670
1985
|
```
|
|
@@ -1674,41 +1989,45 @@ Implement (Codex) → Review (Gemini) → Fix (Codex) → Verify (Tests) → Ite
|
|
|
1674
1989
|
## Migration Guide
|
|
1675
1990
|
|
|
1676
1991
|
### Breaking Changes
|
|
1992
|
+
|
|
1677
1993
|
None - This is the first release.
|
|
1678
1994
|
|
|
1679
1995
|
### New Features to Adopt
|
|
1680
1996
|
|
|
1681
1997
|
**1. Token Optimization** (Optional, Opt-in)
|
|
1998
|
+
|
|
1682
1999
|
```typescript
|
|
1683
2000
|
// Enable prompt optimization
|
|
1684
2001
|
await callTool("codex_request", {
|
|
1685
2002
|
prompt: "Your verbose prompt...",
|
|
1686
|
-
optimizePrompt: true
|
|
2003
|
+
optimizePrompt: true, // 44% token reduction
|
|
1687
2004
|
});
|
|
1688
2005
|
|
|
1689
2006
|
// Enable response optimization
|
|
1690
2007
|
await callTool("claude_request", {
|
|
1691
2008
|
prompt: "Generate docs...",
|
|
1692
|
-
optimizeResponse: true
|
|
2009
|
+
optimizeResponse: true, // 37% token reduction
|
|
1693
2010
|
});
|
|
1694
2011
|
```
|
|
1695
2012
|
|
|
1696
2013
|
**2. Session Management**
|
|
2014
|
+
|
|
1697
2015
|
```typescript
|
|
1698
2016
|
// Create and use sessions
|
|
1699
2017
|
const session = await callTool("session_create", {
|
|
1700
2018
|
cli: "claude",
|
|
1701
|
-
description: "My coding session"
|
|
2019
|
+
description: "My coding session",
|
|
1702
2020
|
});
|
|
1703
2021
|
|
|
1704
2022
|
// Continue conversations
|
|
1705
2023
|
await callTool("claude_request", {
|
|
1706
2024
|
prompt: "Continue from previous context",
|
|
1707
|
-
sessionId: session.id
|
|
2025
|
+
sessionId: session.id,
|
|
1708
2026
|
});
|
|
1709
2027
|
```
|
|
1710
2028
|
|
|
1711
2029
|
**3. Correlation IDs** (Automatic)
|
|
2030
|
+
|
|
1712
2031
|
```typescript
|
|
1713
2032
|
// Automatically generated for tracing
|
|
1714
2033
|
// Check logs: [corrId] prefix on all log lines
|
|
@@ -1719,6 +2038,7 @@ await callTool("claude_request", {
|
|
|
1719
2038
|
## Known Limitations
|
|
1720
2039
|
|
|
1721
2040
|
### Documented Constraints
|
|
2041
|
+
|
|
1722
2042
|
1. **Multi-level orchestration unsupported**
|
|
1723
2043
|
- Nested MCP connections fail
|
|
1724
2044
|
- LLMs can't spawn sub-LLMs via gateway
|
|
@@ -1733,6 +2053,7 @@ await callTool("claude_request", {
|
|
|
1733
2053
|
- Consider encryption for sensitive data (future)
|
|
1734
2054
|
|
|
1735
2055
|
### Future Enhancements
|
|
2056
|
+
|
|
1736
2057
|
- Session encryption at rest
|
|
1737
2058
|
- Session TTL and automatic cleanup
|
|
1738
2059
|
- Redis/DynamoDB backend for horizontal scaling
|
|
@@ -1745,16 +2066,19 @@ await callTool("claude_request", {
|
|
|
1745
2066
|
## Credits
|
|
1746
2067
|
|
|
1747
2068
|
### Development
|
|
2069
|
+
|
|
1748
2070
|
- **Architecture & Orchestration**: Claude Sonnet 4.5
|
|
1749
2071
|
- **Implementation & Bug Fixes**: Codex via llm-cli-gateway MCP
|
|
1750
2072
|
- **Security Analysis**: Gemini 2.5 Pro via llm-cli-gateway MCP
|
|
1751
2073
|
|
|
1752
2074
|
### Research
|
|
2075
|
+
|
|
1753
2076
|
- Token optimization: 42 research sources (2025-2026)
|
|
1754
2077
|
- Compression validation: Compel paper (OpenReview 2025)
|
|
1755
2078
|
- Best practices: Industry standards + dogfooding
|
|
1756
2079
|
|
|
1757
2080
|
### Validation
|
|
2081
|
+
|
|
1758
2082
|
- **Self-dogfooding**: Gateway reviewed and fixed itself
|
|
1759
2083
|
- **Multi-LLM collaboration**: 3 LLMs working via MCP
|
|
1760
2084
|
- **Iterative quality**: 2 review rounds, 16 bugs found and fixed
|
|
@@ -1764,6 +2088,7 @@ await callTool("claude_request", {
|
|
|
1764
2088
|
## Statistics
|
|
1765
2089
|
|
|
1766
2090
|
### Development Timeline
|
|
2091
|
+
|
|
1767
2092
|
- **Total time**: ~2.5 hours (from first review to 100% bug-free)
|
|
1768
2093
|
- **Review rounds**: 2 comprehensive multi-LLM reviews
|
|
1769
2094
|
- **Bugs found**: 16 total
|
|
@@ -1771,12 +2096,14 @@ await callTool("claude_request", {
|
|
|
1771
2096
|
- **Test growth**: 104 → 114 tests (+9.6%)
|
|
1772
2097
|
|
|
1773
2098
|
### Code Metrics
|
|
2099
|
+
|
|
1774
2100
|
- **Files modified**: 12 files
|
|
1775
2101
|
- **Lines added**: ~2,500 lines
|
|
1776
2102
|
- **Documentation**: ~8,000 lines (11 files)
|
|
1777
2103
|
- **Test coverage**: 114 tests across unit/integration/regression
|
|
1778
2104
|
|
|
1779
2105
|
### Quality Metrics
|
|
2106
|
+
|
|
1780
2107
|
- **Bug-free rate**: 100%
|
|
1781
2108
|
- **Test pass rate**: 100%
|
|
1782
2109
|
- **Build success**: ✅
|