llm-cli-gateway 1.14.0 → 1.15.1
- package/CHANGELOG.md +249 -46
- package/README.md +139 -29
- package/dist/async-job-manager.js +20 -8
- package/dist/executor.js +65 -8
- package/dist/index.d.ts +101 -0
- package/dist/index.js +311 -26
- package/dist/request-helpers.js +12 -0
- package/dist/session-manager.d.ts +20 -2
- package/dist/session-manager.js +28 -3
- package/dist/worktree-manager.d.ts +41 -0
- package/dist/worktree-manager.js +214 -0
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,172 @@

 All notable changes to the llm-cli-gateway project.

+## [1.15.1] - 2026-05-29 — quality badges + Sigstore release signing
+
+Release-infrastructure follow-up to v1.15.0.
+
+### Added
+
+- README quality badges for CI, security, OpenSSF Scorecard, npm, license, and
+Sigstore-signed release artifacts.
+- Sigstore keyless signing for GitHub release installer artifacts, including
+`.sigstore.json` bundles and pre-upload verification in the release workflow.
+- End-user verification guidance for `SHA256SUMS.sigstore.json` before trusting
+release checksums.
+- Sanitized Windows Claude Desktop MCP config example using 1Password
+environment injection placeholders.
+- Security workflow attribution guard that rejects new Claude/Anthropic
+author/co-author metadata in future commits.
+
+### Changed
+
+- Manual release-installer rebuilds now fail fast unless launched from the
+matching release tag ref, keeping Sigstore certificate identities stable.
+- Windows installer snippets and generated release manifest commands now verify
+the Sigstore checksum bundle before executing the downloaded bootstrapper.
+
+## [1.15.0] - 2026-05-28 — Phase 4 slice λ (gateway-owned worktree lifecycle)
+
+Ships the tenth Phase 4 slice: a new top-level `worktree` field on every
+`*_request` and `*_request_async` tool lets a caller run the request
+inside a dedicated git worktree owned and lifecycle-managed by the
+gateway. The provider audit listed `-w/--worktree` as a per-CLI flag on
+Claude / Gemini / Grok; this slice deliberately does **not** wire any
+`-w` passthrough. Instead the gateway pre-creates a worktree via
+`git worktree add`, spawns the child CLI with `cwd: <worktree-path>`,
+and persists `worktreePath` on `session.metadata` for reuse. Five CLIs
+× two transports (sync + async) = ten tools all share one resolver, so
+the surface lands as one Zod schema + one helper per tool rather than
+five-times-two per-CLI argv wirings.
+
+### Added — gateway-owned worktree surface
+
+- **`WORKTREE_SCHEMA`** (`src/index.ts`): top-level Zod field
+registered on all ten tools — `claude_request`, `codex_request`,
+`gemini_request`, `grok_request`, `mistral_request`, plus the five
+`*_request_async` siblings. Accepts `true` (anonymous UUID worktree
+at `<repoRoot>/.worktrees/<uuid>` branched from HEAD) or
+`{ name?, ref? }` (sanitised name and/or explicit git ref).
+- **`src/worktree-manager.ts`** (new file, 277 lines):
+`sanitizeWorktreeName` (rejects path traversal — `..`, leading `/`,
+control chars, length > 64), `createWorktree`
+(`git rev-parse --verify <ref>` before `git worktree add`,
+collision detection via `WorktreeCollisionError`, branch-namespaced
+`gateway/<name>` worktrees), `removeWorktree`
+(`git worktree remove --force`), and `createWorktreeSessionCleanupHook`
+(hooks into session manager).
+- **`resolveWorktreeForRequest`** (`src/index.ts`): single per-request
+resolver consumed by every tool handler. When the request carries
+a `sessionId` and the session already has `metadata.worktreePath`,
+the worktree is reused (no second `git worktree add`); otherwise a
+new worktree is created and persisted onto the session via
+`updateSessionMetadata`. The resolved path is threaded to the
+executor via the existing `cwd` plumbing.
+- **`formatWorktreePrefix(path)`** (`src/index.ts:826`): every
+successful tool result is prefixed with
+`[gateway] worktree=<absolute-path>\n` so the caller can drive
+`Bash(cd <path>)`, `Read <path>/...`, etc. Empty when the request
+did not use a worktree (zero behaviour change for non-λ callers).
+- **`Session.metadata` extension** (`src/session-manager.ts`):
+`worktreePath` + `worktreeName` land on the existing `metadata`
+bag — no `Session` interface changes. `FileSessionManager` accepts
+a `cleanupHook` option that fires on `deleteSession` and on
+TTL-driven eviction; the hook calls `git worktree remove --force`
+before the session record is dropped.
+- **`AsyncJobManager` cwd-aware dedup** (`src/async-job-manager.ts`):
+the dedup key now includes the resolved `cwd`, so two
+`*_request_async` calls with identical argv but different
+worktree paths cannot collide (REGRESSIONS Lθ).
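The changelog entry names `sanitizeWorktreeName` and `formatWorktreePrefix` and states their rules, but this excerpt does not include their bodies. The sketch below is one way those rules could be written, not the package's source. The rejection rules (`..`, leading `/`, control characters, length > 64) and the prefix shape come from the changelog; the error messages, the empty-name check, and the choice to reject any `..` substring are assumptions made for illustration.

```typescript
// Hypothetical sketch of the two helpers described in the changelog.
const MAX_WORKTREE_NAME_LENGTH = 64; // limit stated in the changelog

function sanitizeWorktreeName(name: string): string {
  // Rejecting the empty string is an assumption; the changelog only states the upper bound.
  if (name.length === 0 || name.length > MAX_WORKTREE_NAME_LENGTH) {
    throw new Error(`worktree name must be 1-${MAX_WORKTREE_NAME_LENGTH} characters`);
  }
  if (name.startsWith("/")) {
    throw new Error("worktree name must not start with '/'");
  }
  // Conservative reading of "rejects `..`": refuse the substring anywhere.
  if (name.includes("..")) {
    throw new Error("worktree name must not contain '..'");
  }
  // ASCII control characters: 0x00-0x1f and 0x7f.
  if (/[\x00-\x1f\x7f]/.test(name)) {
    throw new Error("worktree name must not contain control characters");
  }
  return name;
}

function formatWorktreePrefix(path?: string): string {
  // Empty string when no worktree was used, so non-worktree callers see no change.
  return path ? `[gateway] worktree=${path}\n` : "";
}
```

A caller would validate the name before any `git worktree add` is attempted and prepend the prefix to the tool result only after the child CLI succeeds.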
+
+### Out of scope — explicitly deferred
+
+- **Grok's `worktree` subcommand** (separate top-level subcommand
+on the Grok CLI, distinct from `-w/--worktree`).
+- **Claude's `--tmux`** (terminal-multiplexer integration).
+- **Startup sweep of orphaned `.worktrees/*`** — left to future
+housekeeping; the cleanup hook covers the happy path
+(session_delete + TTL eviction).
+- **Multi-repo / submodule semantics** — gateway assumes a single
+primary repo at `<repoRoot>`; multi-root behaviour is undefined.
+
+### Test surface
+
+`940 → 989` tests pass (+49):
+
+- **`src/__tests__/worktree-manager.test.ts`** (new, 26 tests) —
+unit-tests for `sanitizeWorktreeName`, `createWorktree` (including
+the rev-parse-before-add invariant + `WorktreeCollisionError`),
+`removeWorktree`, and `createWorktreeSessionCleanupHook`.
+- **`src/__tests__/test-veracity-regressions-slice-lambda.test.ts`**
+(new, 23 tests across REGRESSIONS Lα–Lθ + Lψ):
+- **Lα** — `sanitizeWorktreeName` path-traversal rejection.
+- **Lβ** — `createWorktree` runs `git rev-parse --verify` BEFORE
+`git worktree add`.
+- **Lγ** — `resolveWorktreeForRequest` persists `worktreePath`
+onto session metadata via `updateSessionMetadata`.
+- **Lδ** — same-session reuse: the second request with the same
+`sessionId` skips `git worktree add`.
+- **Lε** — `FileSessionManager.deleteSession` invokes the cleanup
+hook (and TTL eviction does too).
+- **Lζ** — `executor.executeCli` honours the resolved `cwd`.
+- **Lη** — contract-as-negative-oracle: no CLI receives
+`-w`/`--worktree` in emitted argv across all five providers
+(pairs with slice δ's contract-as-positive-oracle).
+- **Lθ** — `AsyncJobManager` dedup key includes `cwd`.
+- **Lψ** — `formatWorktreePrefix` envelope shape locked
+(`[gateway] worktree=<abs>\n`; empty when path missing).
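Regression Lθ pins that the `AsyncJobManager` dedup key includes `cwd`. This excerpt does not show how the key is built, so the following only sketches the idea. `buildDedupKey` and its JSON encoding are hypothetical choices for illustration, not the package's implementation.

```typescript
// Hypothetical dedup-key builder; the real key format is not shown in this diff.
function buildDedupKey(cli: string, args: readonly string[], cwd?: string): string {
  // JSON-encode the parts so argument boundaries stay unambiguous:
  // ["a b"] and ["a", "b"] must not produce the same key.
  return JSON.stringify({ cli, args, cwd: cwd ?? null });
}

const args = ["-p", "summarise the repo"];
const keyA = buildDedupKey("claude", args, "/repo/.worktrees/a");
const keyB = buildDedupKey("claude", args, "/repo/.worktrees/b");
console.log(keyA === keyB); // false: same argv, different worktree
```

With `cwd` left out of the key, the second call above would be treated as a duplicate of the first, and its caller would receive a job that ran in the wrong worktree.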
+
+### Multi-LLM strict-evidence audit
+
+Per the standing protocol (`feedback_test_veracity_audit_protocol`
+
+- `feedback_multi_llm_review_gate`), the slice was audited round-1
+on 2026-05-28 against `docs/plans/slice-lambda.spec.md`.
+
+**Round 1 outcomes:**
+
+- Codex: UNCONDITIONAL APPROVE — 9/9 mutation probes RED as
+predicted; per-probe verbatim assertion text and pre/post-revert
+test counts. Worktree at `audit/codex-round-1`.
+- Grok: UNCONDITIONAL APPROVE — 9/9 RED, per-probe verbatim
+assertion text. Worktree at `audit/grok-round-1`.
+- Mistral: UNCONDITIONAL APPROVE — 9/9 RED with per-probe failed-
+count summaries. Worktree at `audit/mistral-round-1`
+(`5d75099`).
+- Gemini: **PARTIAL (quota-blocked)** — confirmed Lα–Lε RED (5/9)
+with assertion text matching the substantive reviewers before
+`TerminalQuotaError` (4h35m reset window > round budget) forced
+a stop. No findings, no contradictions.
+- Claude: **STRUCTURAL BLOCKER** — two `claude_request_async`
+jobs (`135c05c3-…`, `e411e8cc-…`) stalled silently
+(`stdoutBytes: 0` for ≥10 minutes); the second produced a
+1126-byte fabricated meta-summary with no per-probe evidence,
+rejected per the strict-evidence rule. Documented stall pattern,
+not a defect in slice λ.
+
+Four out of five independent vendor voices contributed evidence
+(three full + one partial corroborating) with one documented
+unfixable structural block, satisfying the slice-δ "4/5 minimum
+with documented block" bar. The three full audits are unanimous;
+the partial fourth corroborates without contradiction. Verdict:
+slice λ passes the gate and ships as v1.15.0.
+
+Full per-reviewer reports preserved at
+`docs/reviews/slice-lambda/{README,round-1-{codex,grok,mistral,
+gemini,claude}}.md`.
+
+### Mechanical anchors (verify with `rg` before relying)
+
+- `src/worktree-manager.ts` — new module, 277 lines.
+- `src/index.ts` — `WORKTREE_SCHEMA` (`:419-444`),
+`formatWorktreePrefix` (`:826-828`), `resolveWorktreeForRequest`
+- per-tool prefix injection (search `formatWorktreePrefix(`),
+10 × `worktree: WORKTREE_SCHEMA.optional()` registrations on
+every `*_request` / `*_request_async` tool input.
+- `src/session-manager.ts` — `cleanupHook` plumbing
+(`:53-90, 318-342`).
+- `src/async-job-manager.ts` — dedup-key cwd inclusion.
+
 ## [1.14.0] - 2026-05-28 — Phase 4 slice κ (Claude explicit `cache_control` via `--input-format stream-json`)

 Ships the ninth Phase 4 slice. Callers can now opt their stable
@@ -31,7 +197,7 @@ falsifiability-tightening commits driven by the multi-LLM review gate.
 - **`prepareClaudeRequest` κ branch** (`src/index.ts`): when the
 caller marks any block AND requests `outputFormat: "stream-json"`,
 argv switches to `-p --input-format stream-json --output-format
-
+stream-json --include-partial-messages --verbose` with NO positional
 prompt; the prep result carries `stdinPayload` + `cacheControlBlocks`.
 Mixing `cacheControl` with `text`/`json` output returns an
 actionable error instead of silently coercing.
@@ -120,7 +286,7 @@ APPROVE) is preserved in commit history (`bea1aee` and `bbc3b5f`).

 - κ adds caller-side reuse ON TOP of the irreducible ~10–12K
 `cache_creation` token floor that every fresh `claude -p` session
-rebuilds (Claude Code's session-wrap content). The
+rebuilds (Claude Code's session-wrap content). The _added_ benefit
 scales with the caller's stable block size, not the total prompt.
 - The `ttl='1h'` hard-code is mandatory because Anthropic rejects a
 `5m` block after Claude Code's own 1h-marked session blocks; the
@@ -160,7 +326,7 @@ Patch release. Single user-facing fix to `claude_request` /
 - Claude CLI 2.x rejects `--print --output-format=stream-json` without
 `--verbose` ("When using --print, --output-format=stream-json requires
 --verbose"). The gateway was emitting `--output-format stream-json
-
+--include-partial-messages` without `--verbose`, so every claude
 request configured for stream-json (sync or async) was exiting 1.
 - `prepareClaudeRequest` now pushes `--verbose` as part of the
 stream-json arg group. `--verbose` only affects what claude writes to
@@ -174,7 +340,7 @@ Patch release. Single user-facing fix to `claude_request` /
 recorded in the FR for the first time since the CLI started enforcing
 `--verbose`.
 - Direct CLI verification: `claude -p ... --output-format stream-json
-
+--verbose --include-partial-messages` returned a clean NDJSON stream
 with `cache_read_input_tokens: 17978` and
 `cache_creation_input_tokens: 17435` on a 1-hour-cache-enabled
 account. The parser path is correct; only the missing flag was
@@ -184,7 +350,7 @@ Patch release. Single user-facing fix to `claude_request` /

 - New regression: `prepareClaudeRequest` emits `--verbose` when
 `outputFormat: "stream-json"` and does NOT emit it for `text` / `json`
-(src
+(src/**tests**/claude-handler.test.ts).
 - Updated `upstream-contracts.test.ts` "accepts a valid Claude argv
 emitted by the gateway" to pin the three-flag combo so a future
 removal of `--verbose` fails at the contract gate.
@@ -254,7 +420,7 @@ regressions) plus this release commit.
 enumerate). Also settable via the `GROK_SANDBOX` env var. Caller
 responsibility to pass a valid profile name. The slice deliberately
 does **not** integrate `--sandbox` with `approvalStrategy:
-
+"mcp_managed"` because the value is unbounded — Grok's approval
 semantics are already covered by `permissionMode` + `alwaysApprove` +
 `approvalStrategy`.
 - **`rules`** → `--rules <RULES>`. Supports `@file` prefix per
@@ -320,7 +486,7 @@ parallel with mandatory mutation-probe execution against

 - Codex: UNCONDITIONAL APPROVE — all 12 probes [as predicted], all
 26 tests VERIFIED. Baseline (`npm test`: 55 files / 884 tests; build
-
+- format:check clean; slice file 31/31).
 - Grok: UNCONDITIONAL APPROVE — all 12 probes [as predicted]; ran in
 an isolated worktree at `/tmp/theta-audit-grok` per the slice-ζ
 reviewer-stomping lesson.
@@ -330,8 +496,8 @@ parallel with mandatory mutation-probe execution against
 beyond the spec and closes the "enum-mistake stays silent if fixture
 uses a listed value" gap.
 - Gemini: **FAILED at 10s** with `TerminalQuotaError: You have
-
-
+exhausted your capacity on this model. Your quota will reset after
+52m10s.` (Google 429). Documented quota blocker per protocol clause
 5+6 — counts as "concrete unfixable when documented". Four
 substantive valid approves from independent vendor families (OpenAI,
 xAI, Mistral, Anthropic) satisfy the gate.
@@ -500,7 +666,7 @@ this release commit.
 so no extra gating required.
 - Both tools accept a new `jsonSchema` field
 (`string | Record<string, unknown>`). Per `claude --help`, the CLI
-argument is the JSON Schema
+argument is the JSON Schema _literal_ (not a path; contrast with Codex
 `--output-schema`). Object values are `JSON.stringify`-d; string values
 pass verbatim. Use with `outputFormat: "json"` for structured output
 validation. Achieves Codex parity for structured-output validation
@@ -798,7 +964,7 @@ for the async tools and the codex CLI.
 already terminated before the arm signal landed.
 - `JobStore.markOrphanedOnStartup()` return shape extended from `number`
 to `{ count, orphaned: Array<{ id, correlationId, startedAt, stdout,
-
+stderr, exitCode }> }` so the manager constructor can write FR
 `logComplete` rows for previously orphaned jobs with proper audit data
 (durationMs from `startedAt`, response from `stderr || stdout`,
 errorMessage `"orphaned after gateway restart"`). `SqliteJobStore`
@@ -930,8 +1096,9 @@ Pure documentation release; zero source-code changes since 1.6.0.
 ### Fixed — `docs/launch/blog-cache-awareness.md` accuracy + voice

 Technical corrections from the multi-LLM voice + technical review:
+
 - Mutually-exclusive error-string quotation reformatted so the
-``provide exactly one of `prompt`
+``provide exactly one of `prompt`or`promptParts``` example renders
 correctly in markdown.
 - `lastWriteAt` references corrected to `lastRequestAt` (the actual
 public field name on `SessionCacheStats`).
@@ -1002,8 +1169,7 @@ Also includes (beyond cache-awareness):
 The gateway concatenates in canonical order (`system → tools → context → task`)
 so the stable prefix bytes precede the volatile task tail unchanged across
 calls — raising implicit cache hit rate without calling provider cache APIs.
-The exact error strings `provide exactly one of \`prompt\` or \`promptParts\``
-and `one of \`prompt\` or \`promptParts\` is required` are stable API
+The exact error strings `provide exactly one of \`prompt\` or \`promptParts\``and`one of \`prompt\` or \`promptParts\` is required` are stable API
 contract.
 - **Flight-recorder v3 migration**: new columns `stable_prefix_hash`
 (sha256) and `stable_prefix_tokens` (integer bytes/4 heuristic) on
@@ -1034,9 +1200,9 @@ Also includes (beyond cache-awareness):
 - `warn_on_ttl_expiry = false`
 - `[cache_awareness.min_stable_tokens_for_cache_control]` per-family
 table (sonnet=1024, opus=4096, haiku=4096, default=4096).
-
-
-
+Validated by a separate Zod schema and loader (`loadCacheAwarenessConfig`);
+a malformed `[cache_awareness]` block does NOT break `loadPersistenceConfig`
+and vice versa. No env-var overrides.

 ### Decision: Branch B (prefix-discipline only) for slice 1

@@ -1356,6 +1522,7 @@ Lands DAG layers 6-12 — the personal-MCP MVP terminal plus all of Phase 0-3 pr
 - **No self-update** — `cli_upgrade --cli mistral` detects pip / uv / brew via probes and dispatches to `pip install -U vibe-cli`, `uv tool upgrade vibe-cli`, or `brew upgrade mistral-vibe`. Unknown installations return an actionable error rather than running a non-existent `vibe update`.

 Other surfaces extended: `SESSION_PROVIDER_VALUES` now includes `"mistral"`; `list_models`, `cli_versions`, `cli_upgrade`, `approval_list`, `session_create`, `session_list`, and `session_clear_all` accept the fifth provider; new MCP resources `sessions://mistral` and `models://mistral` are registered; `validate_with_models` / `consensus_check` / `red_team_review` can route to Mistral.
+
 - **U23 — JSON output + token/cost parity across providers.** New `src/codex-json-parser.ts` parses the Codex `--json` JSONL event stream (`thread.started`, `turn.started`/`completed`/`failed`, `item.*`, `error`); lenient against partial streams and garbage preamble. New `src/gemini-json-parser.ts` parses `gemini -o json` output and maps `usageMetadata.{promptTokenCount, candidatesTokenCount, cachedContentTokenCount}`. `extractUsageAndCost` is now a thin per-provider dispatcher returning `{inputTokens, outputTokens, cacheReadTokens?, cacheCreationTokens?, costUsd?}` for every provider that supports JSON; Claude `cache_read_input_tokens` / `cache_creation_input_tokens` are now plumbed through instead of being discarded. `codex_request`, `codex_request_async`, `gemini_request`, and `gemini_request_async` now expose `outputFormat: enum("text","json")` — set to `"json"` and the gateway emits `--json` (Codex) or `-o json` (Gemini) and forwards parsed usage/cost into the flight recorder. Flight-recorder schema gains `cache_read_tokens` and `cache_creation_tokens` columns via idempotent migration (`PRAGMA table_info` → `ALTER TABLE ADD COLUMN`); existing `logs.db` files are upgraded in place. 15 new tests.
 - **U24 — Permission/approval-mode parity across providers.** Claude `permissionMode` enum (`default | acceptEdits | plan | auto | dontAsk | bypassPermissions`) replaces the boolean `dangerouslySkipPermissions` (the boolean still works and now maps to `permissionMode: "bypassPermissions"`; setting both logs a warning, `permissionMode` wins). Gemini `approvalMode` gains `plan`. Codex splits `--full-auto` into `sandboxMode: enum("read-only","workspace-write","danger-full-access")` and `askForApproval: enum("untrusted","on-request","never")`, emitting `--sandbox <mode>` and `--ask-for-approval <mode>` independently; legacy `fullAuto: true` still works and expands to `--sandbox workspace-write --ask-for-approval never` by default, with `useLegacyFullAutoFlag: true` as an explicit escape hatch to emit `--full-auto` directly. Codex resume mode filters all three flags (`--full-auto`, `--sandbox`, `--ask-for-approval`) since `codex exec resume` inherits the session's policy. 26 new tests.
 - **U25 — Claude high-impact features.** `claude_request` / `claude_request_async` schemas gain `agent?: string` (single sub-agent dispatch), `agents?: Record<string, object>` (multi-agent JSON, validated against `CLAUDE_AGENT_DEFINITION_SCHEMA` before emit), `forkSession?: boolean`, `systemPrompt?: string`, `appendSystemPrompt?: string` (mutually exclusive at the schema + tool-callback boundary), `maxBudgetUsd?: number`, `maxTurns?: number`, `effort?: enum("low","medium","high","xhigh","max")`, and `excludeDynamicSystemPromptSections?: boolean`. Each emits the documented `--<flag>` form. 25 new tests in `src/__tests__/claude-handler.test.ts`.
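The U24 entry describes how Codex policy flags are derived from `sandboxMode`, `askForApproval`, `fullAuto`, and `useLegacyFullAutoFlag`. As a reading aid, here is a small sketch of that mapping. The function name `buildCodexPolicyArgs`, the `resume` field, and the rule that explicit fields override the `fullAuto` defaults are assumptions; the flag names and default values are taken from the entry itself.

```typescript
type SandboxMode = "read-only" | "workspace-write" | "danger-full-access";
type AskForApproval = "untrusted" | "on-request" | "never";

interface CodexPolicyOptions {
  sandboxMode?: SandboxMode;
  askForApproval?: AskForApproval;
  fullAuto?: boolean;
  useLegacyFullAutoFlag?: boolean;
  resume?: boolean; // hypothetical marker for `codex exec resume`
}

// Hypothetical sketch of the flag expansion described in U24.
function buildCodexPolicyArgs(opts: CodexPolicyOptions): string[] {
  // Resume inherits the session's policy, so none of the three flags is emitted.
  if (opts.resume) return [];
  // Explicit escape hatch: emit the legacy flag directly.
  if (opts.fullAuto && opts.useLegacyFullAutoFlag) return ["--full-auto"];

  // Assumption: explicit fields win; fullAuto only supplies defaults.
  const sandbox = opts.sandboxMode ?? (opts.fullAuto ? "workspace-write" : undefined);
  const approval = opts.askForApproval ?? (opts.fullAuto ? "never" : undefined);

  const args: string[] = [];
  if (sandbox) args.push("--sandbox", sandbox);
  if (approval) args.push("--ask-for-approval", approval);
  return args;
}
```

Under these assumptions, `buildCodexPolicyArgs({ fullAuto: true })` yields `--sandbox workspace-write --ask-for-approval never`, matching the default expansion the entry describes.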
@@ -1448,7 +1615,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit

 ### Fixed

-- **SIGTERM→SIGKILL escalation bug** — `proc.killed` becomes `true` after `.kill()` is
+- **SIGTERM→SIGKILL escalation bug** — `proc.killed` becomes `true` after `.kill()` is _called_, not after the process _exits_, so the SIGKILL guard (`if (!proc.killed)`) was always false. Replaced with an `exited` flag set by `close`/`error` events in both `executor.ts` and `async-job-manager.ts`
 - **Timer priority race** — When both `timeout` and `idleTimeout` are set, idle timeout now clears the wall-clock timer to prevent `timedOut` from overriding `idledOut` in the close handler (which would misclassify code 125 as transient code 124)
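The escalation fix above depends on tracking real process exit rather than `proc.killed`. A minimal sketch of that pattern follows. It is not the code from `executor.ts` or `async-job-manager.ts`: `terminateWithEscalation`, the `KillableProcess` interface, and the injectable `schedule` parameter are hypothetical, introduced so the example stays self-contained.

```typescript
// Minimal structural type modelled on the parts of a Node child process used here.
interface KillableProcess {
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
  once(event: "close" | "error", listener: () => void): unknown;
}

type Scheduler = (fn: () => void, delayMs: number) => void;

// Hypothetical helper: send SIGTERM, then SIGKILL after graceMs unless the
// process has actually gone away. Guarding on `proc.killed` would not work,
// because that flag flips as soon as kill() is called.
function terminateWithEscalation(
  proc: KillableProcess,
  graceMs: number,
  schedule: Scheduler = (fn, ms) => { setTimeout(fn, ms); },
): void {
  let exited = false; // set only by real exit events
  proc.once("close", () => { exited = true; });
  proc.once("error", () => { exited = true; });

  proc.kill("SIGTERM");
  schedule(() => {
    if (!exited) proc.kill("SIGKILL");
  }, graceMs);
}
```

The `schedule` parameter defaults to `setTimeout`; passing a custom scheduler lets a test trigger the escalation step deterministically instead of waiting for a real timer.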

 ### Added
@@ -1533,6 +1700,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 ## Core Features

 ### Multi-LLM Orchestration
+
 - **3 CLI tools supported**: Claude Code, Codex, Gemini
 - **Unified MCP interface**: Single protocol for all LLMs
 - **Cross-tool collaboration**: LLMs can use each other via MCP
@@ -1540,6 +1708,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - **Correlation ID tracking**: Full request tracing

 ### Token Optimization
+
 - **Auto-optimization middleware**: 44% reduction on prompts, 37% on responses
 - **15+ optimization patterns**: Remove filler, compact types, arrow notation
 - **Opt-in feature**: `optimizePrompt` and `optimizeResponse` flags
@@ -1547,6 +1716,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - **Research-backed**: 42 sources, best practices documented

 ### Reliability & Performance
+
 - **Retry logic**: Exponential backoff with circuit breaker
 - **Atomic file writes**: Process-specific temp files with fsync
 - **Memory limits**: 50MB cap on CLI output prevents DoS
@@ -1554,6 +1724,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - **Non-zero exit code handling**: Proper retry behavior

 ### Security Hardening
+
 - **No secret leakage**: Generic session descriptions only
 - **File permissions**: 0o600 on sensitive files
 - **No ReDoS vulnerabilities**: Bounded regex patterns
@@ -1562,6 +1733,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - **Custom storage paths**: Secure directory creation

 ### Testing & Quality
+
 - **114 tests**: 68 unit, 41 integration, 5 optimizer
 - **Real CLI integration**: Not mocks
 - **Regression tests**: ReDoS, schema validation, retry behavior
@@ -1569,6 +1741,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - **Edge case coverage**: Timeouts, errors, concurrency

 ### Documentation Excellence
+
 - **7 comprehensive guides**: 4,000+ lines total
 - **Research-backed**: TOKEN_OPTIMIZATION_GUIDE.md with 42 sources
 - **Real-world examples**: PROMPT_OPTIMIZATION_EXAMPLES.md with 5 examples
@@ -1580,6 +1753,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 ## Added

 ### Features
+
 - Multi-LLM CLI orchestration via MCP
 - Session management with persistence
 - Correlation ID tracking for request tracing
@@ -1591,6 +1765,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - Custom storage path support

 ### Tools (MCP)
+
 - `claude_request` - Execute Claude Code CLI
 - `codex_request` - Execute Codex CLI
 - `gemini_request` - Execute Gemini CLI
@@ -1604,6 +1779,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - `list_models` - List available models for each CLI

 ### Resources (MCP)
+
 - `sessions://all` - All sessions across CLIs
 - `sessions://claude` - Claude-specific sessions
 - `sessions://codex` - Codex-specific sessions
@@ -1612,6 +1788,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - `metrics://performance` - Performance metrics and stats

 ### Documentation
+
 - `README.md` - Installation and usage guide
 - `BEST_PRACTICES.md` - Design and implementation patterns
 - `TOKEN_OPTIMIZATION_GUIDE.md` - Research-backed optimization techniques (42 sources)
@@ -1625,6 +1802,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - `CROSS_TOOL_SUCCESS.md` - Cross-LLM collaboration validation

 ### Tests
+
 - 68 unit tests (executor, sessions, metrics, optimizer)
 - 41 integration tests (full MCP with real CLIs)
 - 5 optimizer tests (pattern validation, ReDoS prevention)
@@ -1637,6 +1815,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 ### First Review Round (8 bugs)

 **Critical:**
+
 1. **session_set_active schema mismatch** (src/index.ts:430)
 - Issue: Documentation said "null to clear" but z.string() rejected null
 - Fix: Changed to z.string().nullable()
@@ -1652,12 +1831,12 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - Fix: Integrated withRetry + CircuitBreaker into executeCli
 - Impact: Transient failures now retried automatically

-**Medium:**
-
-
-
+**Medium:** 4. **Integration test brittleness**
+
+- Issue: Tests failed without dist/ or CLIs installed
+- Fix: Tests properly skip when CLIs unavailable

-5. **Test timing issues** (src
+5. **Test timing issues** (src/**tests**/session-manager.test.ts:216,429)
 - Issue: setTimeout not awaited → false positives
 - Fix: Proper async/await patterns

@@ -1665,10 +1844,10 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 - Issue: All stdout/stderr buffered in memory with no cap
 - Fix: Added 50MB limit with early termination

-**Low:**
-
-
-
+**Low:** 7. **Model data duplication** (src/index.ts:64, src/resources.ts:22)
+
+- Issue: CLI_INFO defined in two places
+- Fix: Centralized in single location

 8. **Unused code** (src/resources.ts:33)
 - Issue: listResources() never called
@@ -1677,27 +1856,28 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
 ### Second Review Round (8 bugs)

 **Critical:**
+
 1. **Secret leakage via session descriptions** (src/index.ts + src/session-manager.ts)
 - Issue: First 50 chars of prompts stored in plain text
 - Fix: Generic descriptions ("Claude Session"), file permissions 0o600
 - Impact: No user data exposed in session files

-**High:**
-
-
-
-
+**High:** 2. **ReDoS in optimizer regex** (src/optimizer.ts:241,244)
+
+- Issue: Catastrophic backtracking with .+? patterns
+- Fix: Bounded character sets [A-Za-z][\w-]\*
+- Impact: No DoS from malicious prompts

 3. **Custom storage path directory not created** (src/session-manager.ts:36)
 - Issue: ensureStorageDirectory only created default path
 - Fix: Create dirname(storagePath) for custom paths
 - Impact: Custom storage paths work without errors

-**Medium:**
-
-
-
-
+**Medium:** 4. **Atomic write temp filename collision** (src/session-manager.ts:57)
+
+- Issue: All processes used same .tmp filename
+- Fix: Process-specific temp files (sessions.json.tmp.${process.pid})
+- Impact: Safe multi-process deployments

 5. **Retry doesn't handle non-zero exit codes** (src/executor.ts:99)
 - Issue: Only thrown errors triggered retry
|
|
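
The ReDoS entry in this hunk replaces lazy `.+?` wildcards with the bounded character set `[A-Za-z][\w-]*`. The real expressions in src/optimizer.ts are not shown in this diff, so the following is only a minimal sketch of the idea. The `name: value` line format, the `parseField` function, and the `boundedField` pattern are hypothetical.

```typescript
// Hypothetical example: the actual optimizer patterns are not visible in this diff.
// A lazy form such as /^(.+?)\s*:\s*(.+?)$/ lets the engine retry many split points.
// The bounded form below fixes what a name may contain, so each character is
// consumed by exactly one part of the pattern.
const boundedField = /^([A-Za-z][\w-]*)\s*:\s*(\S.*)$/;

function parseField(line: string): { name: string; value: string } | null {
  const match = boundedField.exec(line);
  if (match === null) return null;
  const [, name = "", value = ""] = match;
  // Trailing whitespace is captured by `.*`, so strip it from the value here.
  return { name, value: value.trimEnd() };
}
```

Because the pattern is anchored and the name class cannot match `:` or whitespace, a long input without a separator fails after a single linear pass.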
@@ -1709,11 +1889,11 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1709
1889
|
- Fix: 50MB limit with process termination
|
|
1710
1890
|
- Impact: DoS prevention
|
|
1711
1891
|
|
|
1712
|
-
**Low:**
|
|
1713
|
-
|
|
1714
|
-
|
|
1715
|
-
|
|
1716
|
-
|
|
1892
|
+
**Low:** 7. **Performance overhead from NVM scanning** (src/executor.ts:41)
|
|
1893
|
+
|
|
1894
|
+
- Issue: Filesystem scan on every request
|
|
1895
|
+
- Fix: Cache NVM path at module load
|
|
1896
|
+
- Impact: Performance improvement
|
|
1717
1897
|
|
|
1718
1898
|
8. **Unused imports** (src/session-manager.ts:4, src/executor.ts:7)
|
|
1719
1899
|
- Issue: Dead code and unused parameters
|
|
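
The memory entry in this hunk describes a 50MB limit with process termination. One way to picture it is an accumulator that refuses chunks once the cap would be exceeded, leaving the caller to stop the child process. The sketch below makes that assumption. The `CappedOutput` class and `MAX_OUTPUT_BYTES` constant are hypothetical names, not the gateway's code.

```typescript
// Hypothetical helper; the gateway's actual buffering code is not shown in this diff.
const MAX_OUTPUT_BYTES = 50 * 1024 * 1024; // the 50MB cap named in the changelog

class CappedOutput {
  private readonly chunks: Uint8Array[] = [];
  private readonly limit: number;
  private size = 0;

  constructor(limit: number = MAX_OUTPUT_BYTES) {
    this.limit = limit;
  }

  // Returns false when the chunk would exceed the cap; the chunk is not stored.
  push(chunk: Uint8Array): boolean {
    if (this.size + chunk.byteLength > this.limit) return false;
    this.chunks.push(chunk);
    this.size += chunk.byteLength;
    return true;
  }

  get byteLength(): number {
    return this.size;
  }

  // Joins the stored chunks and decodes them as UTF-8.
  text(): string {
    const joined = new Uint8Array(this.size);
    let offset = 0;
    for (const chunk of this.chunks) {
      joined.set(chunk, offset);
      offset += chunk.byteLength;
    }
    return new TextDecoder().decode(joined);
  }
}
```

In a Node.js `spawn` handler, this could be wired up as `child.stdout.on("data", (c) => { if (!out.push(c)) child.kill(); })`, which ends the process as soon as the cap is hit.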
@@ -1725,6 +1905,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1725
1905
|
## Security
|
|
1726
1906
|
|
|
1727
1907
|
### Vulnerabilities Fixed
|
|
1908
|
+
|
|
1728
1909
|
- ✅ **Secret leakage**: No user data in session descriptions
|
|
1729
1910
|
- ✅ **File permissions**: 0o600 on sessions.json
|
|
1730
1911
|
- ✅ **ReDoS**: Bounded regex patterns prevent DoS
|
|
@@ -1733,6 +1914,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1733
1914
|
- ✅ **Command injection**: Already prevented via spawn with args
|
|
1734
1915
|
|
|
1735
1916
|
### Security Best Practices
|
|
1917
|
+
|
|
1736
1918
|
- Input validation with Zod schemas
|
|
1737
1919
|
- No stack trace leakage in errors
|
|
1738
1920
|
- Atomic file writes with fsync
|
|
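
Several entries above fit together: atomic writes with fsync, 0o600 permissions, process-specific temp files (`sessions.json.tmp.${process.pid}`), and creating `dirname(storagePath)` for custom paths. A minimal sketch of how they might combine follows. The `writeFileAtomic` function and the scratch paths in the usage lines are hypothetical; the session manager's actual code is not part of this hunk.

```typescript
import {
  closeSync, fsyncSync, mkdirSync, mkdtempSync,
  openSync, readFileSync, renameSync, writeSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";
import { pid } from "node:process";

// Hypothetical helper combining the fixes described in the changelog.
function writeFileAtomic(targetPath: string, contents: string): void {
  // Create the parent directory so custom storage paths also work.
  mkdirSync(dirname(targetPath), { recursive: true });

  // A per-process temp name keeps two processes from sharing one .tmp file.
  const tempPath = `${targetPath}.tmp.${pid}`;

  // Request owner-only permissions (0o600) when the temp file is created.
  const fd = openSync(tempPath, "w", 0o600);
  try {
    writeSync(fd, contents);
    fsyncSync(fd); // flush to disk before the rename publishes the file
  } finally {
    closeSync(fd);
  }

  // On POSIX filesystems, rename replaces the target in a single step.
  renameSync(tempPath, targetPath);
}

// Usage: write into a scratch directory and read the result back.
const scratchDir = mkdtempSync(join(tmpdir(), "sessions-"));
const sessionsPath = join(scratchDir, "nested", "sessions.json");
writeFileAtomic(sessionsPath, JSON.stringify({ sessions: [] }));
console.log(readFileSync(sessionsPath, "utf8"));
```

The mode argument only applies when the temp file is newly created, and Windows largely ignores it, so treat the permission step as best effort on that platform.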
@@ -1744,6 +1926,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1744
1926
|
## Performance
|
|
1745
1927
|
|
|
1746
1928
|
### Optimizations Added
|
|
1929
|
+
|
|
1747
1930
|
- **Token optimization**: 44% reduction on prompts, 37% on responses
|
|
1748
1931
|
- **NVM path caching**: Eliminates I/O on every request
|
|
1749
1932
|
- **Circuit breaker**: Fast-fail during outages
|
|
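
The NVM path caching entry amounts to running an expensive lookup once and reusing the result. The sketch below is a lazy variant that caches on first call rather than at module load as the changelog describes. `cacheOnce`, `getNvmBinPath`, and the placeholder path are hypothetical, and the scan callback stands in for the filesystem lookup, which this diff does not show.

```typescript
// Hypothetical sketch: wraps any expensive lookup so it runs at most once.
function cacheOnce<T>(scan: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: scan() };
    }
    return cached.value;
  };
}

let scans = 0;
const getNvmBinPath = cacheOnce(() => {
  scans += 1; // counts how often the expensive scan actually runs
  return "/home/user/.nvm/versions/node/v20.0.0/bin"; // placeholder, not a real lookup
});
```

Wrapping the result in an object means that a lookup returning `undefined` (no NVM installed, say) is also cached instead of being rescanned on every request.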
@@ -1751,6 +1934,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1751
1934
|
- **Memory limits**: Prevents resource exhaustion
|
|
1752
1935
|
|
|
1753
1936
|
### Metrics
|
|
1937
|
+
|
|
1754
1938
|
- Request counts per CLI tool
|
|
1755
1939
|
- Response times with percentiles
|
|
1756
1940
|
- Success/failure rates
|
|
@@ -1762,6 +1946,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1762
1946
|
## Testing
|
|
1763
1947
|
|
|
1764
1948
|
### Test Growth
|
|
1949
|
+
|
|
1765
1950
|
- **Initial**: 104 tests
|
|
1766
1951
|
- **After first fixes**: 109 tests (+5 from retry integration)
|
|
1767
1952
|
- **After optimizer**: 113 tests (+4 from optimizer)
|
|
@@ -1769,6 +1954,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1769
1954
|
- **Growth**: +10 tests (9.6% increase)
|
|
1770
1955
|
|
|
1771
1956
|
### Coverage Areas
|
|
1957
|
+
|
|
1772
1958
|
- Unit: Executor, session manager, metrics, optimizer
|
|
1773
1959
|
- Integration: Full MCP protocol with real CLI execution
|
|
1774
1960
|
- Regression: Schema validation, ReDoS, retry behavior
|
|
@@ -1779,6 +1965,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1779
1965
|
## Documentation
|
|
1780
1966
|
|
|
1781
1967
|
### Guides Created
|
|
1968
|
+
|
|
1782
1969
|
1. **README.md** - Installation, usage, API reference
|
|
1783
1970
|
2. **BEST_PRACTICES.md** - Design patterns and architecture
|
|
1784
1971
|
3. **TOKEN_OPTIMIZATION_GUIDE.md** - Research (42 sources)
|
|
@@ -1792,6 +1979,7 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1792
1979
|
11. **CROSS_TOOL_SUCCESS.md** - Collaboration proof
|
|
1793
1980
|
|
|
1794
1981
|
### Total Documentation
|
|
1982
|
+
|
|
1795
1983
|
- **11 comprehensive files**
|
|
1796
1984
|
- **~8,000 lines** of documentation
|
|
1797
1985
|
- **Research-backed** with citations
|
|
@@ -1802,17 +1990,20 @@ Round-1 Codex review found 5 blockers across U22, U23, and U26; round-2 uncondit
|
|
|
1802
1990
|
## Dogfooding Validation
|
|
1803
1991
|
|
|
1804
1992
|
### Multi-LLM Review Process
|
|
1993
|
+
|
|
1805
1994
|
- **Claude Sonnet 4.5**: Strategic/product review (8.5/10 → 10/10)
|
|
1806
1995
|
- **Codex**: Bug finding and implementation (13 bugs found, 13 fixed)
|
|
1807
1996
|
- **Gemini 2.5 Pro**: Security analysis (3 critical issues found, 3 fixed)
|
|
1808
1997
|
|
|
1809
1998
|
### Self-Improvement Cycle
|
|
1999
|
+
|
|
1810
2000
|
1. ✅ Multi-LLM review found 16 bugs
|
|
1811
2001
|
2. ✅ Codex fixed all bugs via MCP
|
|
1812
2002
|
3. ✅ Gateway validated fixes via test suite
|
|
1813
2003
|
4. ✅ Complete autonomous improvement demonstrated
|
|
1814
2004
|
|
|
1815
2005
|
### Workflow Validated
|
|
2006
|
+
|
|
1816
2007
|
```
|
|
1817
2008
|
Implement (Codex) → Review (Gemini) → Fix (Codex) → Verify (Tests) → Iterate
|
|
1818
2009
|
```
|
|
@@ -1822,41 +2013,45 @@ Implement (Codex) → Review (Gemini) → Fix (Codex) → Verify (Tests) → Ite
|
|
|
1822
2013
|
## Migration Guide
|
|
1823
2014
|
|
|
1824
2015
|
### Breaking Changes
|
|
2016
|
+
|
|
1825
2017
|
None - This is the first release.
|
|
1826
2018
|
|
|
1827
2019
|
### New Features to Adopt
|
|
1828
2020
|
|
|
1829
2021
|
**1. Token Optimization** (Optional, Opt-in)
|
|
2022
|
+
|
|
1830
2023
|
```typescript
|
|
1831
2024
|
// Enable prompt optimization
|
|
1832
2025
|
await callTool("codex_request", {
|
|
1833
2026
|
prompt: "Your verbose prompt...",
|
|
1834
|
-
optimizePrompt: true
|
|
2027
|
+
optimizePrompt: true, // 44% token reduction
|
|
1835
2028
|
});
|
|
1836
2029
|
|
|
1837
2030
|
// Enable response optimization
|
|
1838
2031
|
await callTool("claude_request", {
|
|
1839
2032
|
prompt: "Generate docs...",
|
|
1840
|
-
optimizeResponse: true
|
|
2033
|
+
optimizeResponse: true, // 37% token reduction
|
|
1841
2034
|
});
|
|
1842
2035
|
```
|
|
1843
2036
|
|
|
1844
2037
|
**2. Session Management**
|
|
2038
|
+
|
|
1845
2039
|
```typescript
|
|
1846
2040
|
// Create and use sessions
|
|
1847
2041
|
const session = await callTool("session_create", {
|
|
1848
2042
|
cli: "claude",
|
|
1849
|
-
description: "My coding session"
|
|
2043
|
+
description: "My coding session",
|
|
1850
2044
|
});
|
|
1851
2045
|
|
|
1852
2046
|
// Continue conversations
|
|
1853
2047
|
await callTool("claude_request", {
|
|
1854
2048
|
prompt: "Continue from previous context",
|
|
1855
|
-
sessionId: session.id
|
|
2049
|
+
sessionId: session.id,
|
|
1856
2050
|
});
|
|
1857
2051
|
```
|
|
1858
2052
|
|
|
1859
2053
|
**3. Correlation IDs** (Automatic)
|
|
2054
|
+
|
|
1860
2055
|
```typescript
|
|
1861
2056
|
// Automatically generated for tracing
|
|
1862
2057
|
// Check logs: [corrId] prefix on all log lines
|
|
@@ -1867,6 +2062,7 @@ await callTool("claude_request", {
|
|
|
1867
2062
|
## Known Limitations
|
|
1868
2063
|
|
|
1869
2064
|
### Documented Constraints
|
|
2065
|
+
|
|
1870
2066
|
1. **Multi-level orchestration unsupported**
|
|
1871
2067
|
- Nested MCP connections fail
|
|
1872
2068
|
- LLMs can't spawn sub-LLMs via gateway
|
|
@@ -1881,6 +2077,7 @@ await callTool("claude_request", {
|
|
|
1881
2077
|
- Consider encryption for sensitive data (future)
|
|
1882
2078
|
|
|
1883
2079
|
### Future Enhancements
|
|
2080
|
+
|
|
1884
2081
|
- Session encryption at rest
|
|
1885
2082
|
- Session TTL and automatic cleanup
|
|
1886
2083
|
- Redis/DynamoDB backend for horizontal scaling
|
|
@@ -1893,16 +2090,19 @@ await callTool("claude_request", {
|
|
|
1893
2090
|
## Credits
|
|
1894
2091
|
|
|
1895
2092
|
### Development
|
|
2093
|
+
|
|
1896
2094
|
- **Architecture & Orchestration**: Claude Sonnet 4.5
|
|
1897
2095
|
- **Implementation & Bug Fixes**: Codex via llm-cli-gateway MCP
|
|
1898
2096
|
- **Security Analysis**: Gemini 2.5 Pro via llm-cli-gateway MCP
|
|
1899
2097
|
|
|
1900
2098
|
### Research
|
|
2099
|
+
|
|
1901
2100
|
- Token optimization: 42 research sources (2025-2026)
|
|
1902
2101
|
- Compression validation: Compel paper (OpenReview 2025)
|
|
1903
2102
|
- Best practices: Industry standards + dogfooding
|
|
1904
2103
|
|
|
1905
2104
|
### Validation
|
|
2105
|
+
|
|
1906
2106
|
- **Self-dogfooding**: Gateway reviewed and fixed itself
|
|
1907
2107
|
- **Multi-LLM collaboration**: 3 LLMs working via MCP
|
|
1908
2108
|
- **Iterative quality**: 2 review rounds, 16 bugs found and fixed
|
|
@@ -1912,6 +2112,7 @@ await callTool("claude_request", {
|
|
|
1912
2112
|
## Statistics
|
|
1913
2113
|
|
|
1914
2114
|
### Development Timeline
|
|
2115
|
+
|
|
1915
2116
|
- **Total time**: ~2.5 hours (from first review to 100% bug-free)
|
|
1916
2117
|
- **Review rounds**: 2 comprehensive multi-LLM reviews
|
|
1917
2118
|
- **Bugs found**: 16 total
|
|
@@ -1919,12 +2120,14 @@ await callTool("claude_request", {
|
|
|
1919
2120
|
- **Test growth**: 104 → 114 tests (+9.6%)
|
|
1920
2121
|
|
|
1921
2122
|
### Code Metrics
|
|
2123
|
+
|
|
1922
2124
|
- **Files modified**: 12 files
|
|
1923
2125
|
- **Lines added**: ~2,500 lines
|
|
1924
2126
|
- **Documentation**: ~8,000 lines (11 files)
|
|
1925
2127
|
- **Test coverage**: 114 tests across unit/integration/regression
|
|
1926
2128
|
|
|
1927
2129
|
### Quality Metrics
|
|
2130
|
+
|
|
1928
2131
|
- **Bug-free rate**: 100%
|
|
1929
2132
|
- **Test pass rate**: 100%
|
|
1930
2133
|
- **Build success**: ✅
|