llm-cli-gateway 1.9.0 → 1.11.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +111 -0
- package/dist/gemini-json-parser.d.ts +19 -4
- package/dist/gemini-json-parser.js +73 -4
- package/dist/index.d.ts +13 -6
- package/dist/index.js +53 -11
- package/dist/request-helpers.d.ts +14 -0
- package/dist/request-helpers.js +7 -0
- package/dist/upstream-contracts.js +50 -1
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,117 @@
 
 All notable changes to the llm-cli-gateway project.
 
+## [1.11.0] - 2026-05-27 — Phase 4 slice η (Claude `--fallback-model` + `--json-schema`)
+
+Ships the sixth Phase 4 slice: Claude's reliability fallback and
+structured-output JSON-Schema constraint flags are now reachable from
+`claude_request` and `claude_request_async`. Three commits land together
+(feature wiring, contract registration, test-veracity regressions) plus
+this release commit.
+
+### Added — `--fallback-model` and `--json-schema` for Claude
+
+- `claude_request` and `claude_request_async` accept a new `fallbackModel`
+  field (non-empty string, validated via `z.string().min(1)`). Threaded
+  through `prepareClaudeRequest` → `prepareClaudeHighImpactFlags`
+  (`src/request-helpers.ts:651`) → `--fallback-model <model>` argv pair.
+  Effective only with Claude `--print`; the gateway always passes `-p`,
+  so no extra gating required.
+- Both tools accept a new `jsonSchema` field
+  (`string | Record<string, unknown>`). Per `claude --help`, the CLI
+  argument is the JSON Schema *literal* (not a path; contrast with Codex
+  `--output-schema`). Object values are `JSON.stringify`-d; string values
+  pass verbatim. Use with `outputFormat: "json"` for structured output
+  validation. Achieves Codex parity for structured-output validation
+  in a single slice.
+- `UPSTREAM_CLI_CONTRACTS.claude.flags` registers `--fallback-model` and
+  `--json-schema` with `arity: "one"`. `mcpParameters` includes both new
+  field names. Two new passing conformance fixtures
+  (`claude-fallback-model`, `claude-json-schema`) pin the contract; both
+  are mechanically validated against `validateUpstreamCliArgs` in the
+  REGRESSIONS Hε suite.
+
+### Test-veracity audit
+
+Per the standing protocol (`feedback_test_veracity_audit_protocol`),
+this slice's tests were audited by Codex + Gemini + Grok + Mistral in
+async parallel with mandatory mutation-probe execution. Spec at
+`docs/plans/test-veracity-audit-slice-eta.spec.md`. Round 1 outcomes:
+Grok + Mistral unanimous UNCONDITIONAL APPROVE; Gemini stalled at 682B
+stderr for 15+ minutes (cancelled, documented quota/stall-class
+blocker); Codex initially REJECTED on P-Hβ-4 with an invalid claim
+("removing sync `jsonSchema` left the test green") — pre-verification
+on a clean tree confirmed the mutation does turn `Hα-4` + `Hα-6` RED as
+the spec predicts. Round-2 pushback with the verbatim vitest output:
+Codex self-corrected, reproduced the mutation in a worktree, observed
+the predicted red, restored, and issued UNCONDITIONAL APPROVE.
+
+Three substantive reviewer approves (Grok, Mistral, Codex) from
+independent vendor families satisfy the multi-LLM gate; Gemini stall
+documented.
+
+Test count: 816 → 837 (21 new across one file:
+`src/__tests__/test-veracity-regressions-slice-eta.test.ts`).
+
+### Known caveats
+
+- `npm run check` still excludes `format:check` (gap first flagged in
+  v1.8.0). Run both locally before pushing.
+- Claude `--fallback-model` and `--json-schema` are CLI-side gated to
+  `--print` mode by Claude itself; both gateway tools always pass `-p`,
+  so this is invisible to callers but worth noting if the upstream CLI
+  flag semantics change.
+
+## [1.10.0] - 2026-05-27 — Phase 4 slice ε (Gemini `-o stream-json` enum widening)
+
+Ships the fifth Phase 4 slice: Gemini's NDJSON event-stream output format
+(`-o stream-json`) is now reachable from `gemini_request` and
+`gemini_request_async`. Four commits land together: the feature wiring, a
+contract-table widening, a test-veracity regression suite, and a follow-up
+test fix driven by the multi-LLM round-1 audit.
+
+### Added — `outputFormat: "stream-json"` for Gemini
+
+- `gemini_request` and `gemini_request_async` `outputFormat` enums widened
+  from `text | json` to `text | json | stream-json`.
+- `prepareGeminiRequest` emits `-o stream-json` when the new value is set.
+  No `--include-partial-messages` analogue is required: Gemini already
+  streams stdout in real time across all output modes (covered by
+  `CLI_IDLE_TIMEOUTS.gemini = 600_000`).
+- New `parseGeminiStreamJson` parser consumes the NDJSON event stream
+  (`init` / `message` / `result` lines), concatenates assistant `delta`
+  messages into the response, and extracts
+  `input_tokens` / `output_tokens` / `cached` → `cache_read_tokens` from
+  the terminal `result.stats` event.
+- `extractUsageAndCost("gemini", _, "stream-json")` routes to the new
+  parser so usage tokens reach the flight recorder on the stream-json
+  path, matching the existing `-o json` behaviour.
+- `UPSTREAM_CLI_CONTRACTS.gemini.flags["-o"].values` widened to
+  `["json", "stream-json"]`; two new conformance fixtures
+  (`gemini-stream-json` passing, `gemini-output-format-invalid` failing
+  for `-o ndjson`) pin the enum bound.
+
+### Test-veracity audit
+
+Per the standing protocol established with v1.9.0
+(`feedback_test_veracity_audit_protocol`), this slice's tests were
+audited by Codex + Gemini + Grok + Mistral in async parallel with
+mandatory mutation-probe execution. Round 1 found one real gap
+(`Eε-4` only checked fixture presence/shape — P-Eε-1 left it green);
+closed in commit `4a78f9c` by running the fixture's args through
+`validateUpstreamCliArgs` inside the same `it()` block. Round 2
+delivered unanimous UNCONDITIONAL APPROVE across all four reviewers,
+with site-by-site probe evidence for the contested `Eα` registered-schema
+helper. Spec at `docs/plans/test-veracity-audit-slice-epsilon.spec.md`.
+
+Test count: 771 → 795 → 796 (24 + 1 new across two files).
+
+### Known caveats
+
+- The `npm run check` script still does not include `format:check` (a
+  gap first flagged in the v1.8.0 release notes). Run both locally
+  before pushing; CI runs format:check separately.
+
 ## [1.9.0] - 2026-05-27 — Phase 4 slice δ (budget/max-turns parity) + retroactive α/γ contract closure
 
 Ships the fourth Phase 4 slice (budget/max-turns parity for Grok and Mistral),
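The constraints listed above for the two new `claude_request` fields can be mirrored on the caller side. The following is a minimal sketch, assuming a client that wants to reject bad arguments before the gateway's own zod schema does. `checkClaudeSliceEtaArgs` and `exampleArgs` are hypothetical names, and the model string is borrowed from the `claude-fallback-model` conformance fixture, not offered as a recommendation.

```javascript
// Hypothetical caller-side pre-check mirroring the documented tool schema:
//   fallbackModel: z.string().min(1).optional()
//   jsonSchema:    z.union([z.string(), z.record(z.unknown())]).optional()
// This is NOT the gateway's validation code; the gateway validates with zod.
function checkClaudeSliceEtaArgs(args) {
  const errors = [];
  if (args.fallbackModel !== undefined) {
    if (typeof args.fallbackModel !== "string" || args.fallbackModel.length < 1) {
      errors.push("fallbackModel must be a non-empty string");
    }
  }
  if (args.jsonSchema !== undefined) {
    const s = args.jsonSchema;
    // A "record" here means a plain object: not null and not an array.
    const isPlainObject = typeof s === "object" && s !== null && !Array.isArray(s);
    if (typeof s !== "string" && !isPlainObject) {
      errors.push("jsonSchema must be a JSON Schema string or object");
    }
  }
  return errors;
}

// Example arguments for a structured-output request with a fallback model.
// outputFormat "json" follows the changelog's advice to pair it with jsonSchema.
const exampleArgs = {
  prompt: "Extract the user's name from: 'Hi, I am Ada.'",
  outputFormat: "json",
  fallbackModel: "claude-haiku-4-5-20251001",
  jsonSchema: {
    type: "object",
    properties: { name: { type: "string" } },
    required: ["name"],
  },
};

console.log(checkClaudeSliceEtaArgs(exampleArgs));
```

For `exampleArgs` the check returns an empty array; `{ fallbackModel: "" }` or `{ jsonSchema: [1] }` each produce one error, the same inputs the documented zod schemas should reject.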
package/dist/gemini-json-parser.d.ts
CHANGED
@@ -1,13 +1,22 @@
 /**
- *
+ * Parsers for Gemini CLI `-o json` (single object) and `-o stream-json`
+ * (NDJSON event stream) output.
  *
- *
+ * `-o json` emits a single JSON object with:
  * - `response`: string final model output
  * - `usageMetadata`: { promptTokenCount, candidatesTokenCount,
  *   cachedContentTokenCount?, totalTokenCount }
  *
- *
- *
+ * `-o stream-json` emits one JSON object per line:
+ * - `{ "type": "init", "session_id": "...", "model": "..." }`
+ * - `{ "type": "message", "role": "user", "content": "..." }`
+ * - `{ "type": "message", "role": "assistant", "content": "...", "delta": true }` (repeated)
+ * - `{ "type": "result", "status": "success", "stats": { "input_tokens": N,
+ *   "output_tokens": N, "cached": N, ... } }`
+ *
+ * Both parsers return null when stdout is unparseable. Both populate the same
+ * `GeminiJsonParseResult` shape so `extractUsageAndCost` can branch on
+ * outputFormat without further dispatch.
  */
 export interface GeminiUsage {
     input_tokens: number;
@@ -19,3 +28,9 @@ export interface GeminiJsonParseResult {
     response?: string;
 }
 export declare function parseGeminiJson(stdout: string): GeminiJsonParseResult | null;
+/**
+ * Parse Gemini `-o stream-json` NDJSON output. Concatenates assistant `delta`
+ * message content into `response`, extracts the terminal `result.stats` payload
+ * into `usage`. Returns null when stdout contains no parseable JSON line.
+ */
+export declare function parseGeminiStreamJson(stdout: string): GeminiJsonParseResult | null;
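The doc comment says both parsers fill the same `GeminiJsonParseResult` shape. As an illustration of what that means for the single-object `-o json` format, here is a hedged sketch. The body of `parseGeminiJson` is not part of this diff, so the field mapping below (`promptTokenCount` to `input_tokens`, `candidatesTokenCount` to `output_tokens`, `cachedContentTokenCount` to `cache_read_tokens`) is an assumption inferred from the documented `usageMetadata` keys and the `stats` mapping the stream parser uses. `normalizeGeminiJsonSketch` is a hypothetical name.

```javascript
// Sketch: normalize a Gemini `-o json` payload into a { response?, usage? }
// shape like the one the stream-json parser produces.
// ASSUMPTION: the usageMetadata -> usage mapping is inferred, not copied;
// the real parseGeminiJson body is not shown in this diff.
function normalizeGeminiJsonSketch(stdout) {
  let obj;
  try {
    obj = JSON.parse(stdout.trim());
  } catch {
    return null; // unparseable stdout -> null, as the doc comment states
  }
  if (!obj || typeof obj !== "object") return null;
  const result = {};
  if (typeof obj.response === "string") result.response = obj.response;
  const meta = obj.usageMetadata;
  if (meta && typeof meta === "object") {
    const usage = {
      input_tokens: meta.promptTokenCount ?? 0,
      output_tokens: meta.candidatesTokenCount ?? 0,
    };
    // cachedContentTokenCount is optional in the documented shape.
    if (typeof meta.cachedContentTokenCount === "number") {
      usage.cache_read_tokens = meta.cachedContentTokenCount;
    }
    result.usage = usage;
  }
  return result;
}

// Made-up sample in the documented `-o json` shape.
const sample = JSON.stringify({
  response: "Hello world",
  usageMetadata: {
    promptTokenCount: 10,
    candidatesTokenCount: 4,
    cachedContentTokenCount: 2,
    totalTokenCount: 14,
  },
});

console.log(normalizeGeminiJsonSketch(sample));
```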
package/dist/gemini-json-parser.js
CHANGED
@@ -1,13 +1,22 @@
 /**
- *
+ * Parsers for Gemini CLI `-o json` (single object) and `-o stream-json`
+ * (NDJSON event stream) output.
  *
- *
+ * `-o json` emits a single JSON object with:
  * - `response`: string final model output
  * - `usageMetadata`: { promptTokenCount, candidatesTokenCount,
  *   cachedContentTokenCount?, totalTokenCount }
  *
- *
- *
+ * `-o stream-json` emits one JSON object per line:
+ * - `{ "type": "init", "session_id": "...", "model": "..." }`
+ * - `{ "type": "message", "role": "user", "content": "..." }`
+ * - `{ "type": "message", "role": "assistant", "content": "...", "delta": true }` (repeated)
+ * - `{ "type": "result", "status": "success", "stats": { "input_tokens": N,
+ *   "output_tokens": N, "cached": N, ... } }`
+ *
+ * Both parsers return null when stdout is unparseable. Both populate the same
+ * `GeminiJsonParseResult` shape so `extractUsageAndCost` can branch on
+ * outputFormat without further dispatch.
  */
 export function parseGeminiJson(stdout) {
     const trimmed = stdout.trim();
@@ -45,3 +54,63 @@ export function parseGeminiJson(stdout) {
     }
     return result;
 }
+/**
+ * Parse Gemini `-o stream-json` NDJSON output. Concatenates assistant `delta`
+ * message content into `response`, extracts the terminal `result.stats` payload
+ * into `usage`. Returns null when stdout contains no parseable JSON line.
+ */
+export function parseGeminiStreamJson(stdout) {
+    if (!stdout) {
+        return null;
+    }
+    const lines = stdout.split(/\r?\n/);
+    const result = {};
+    const assistantChunks = [];
+    let sawAnyLine = false;
+    for (const line of lines) {
+        const trimmed = line.trim();
+        if (!trimmed)
+            continue;
+        // Gemini stream-json lines are individual JSON objects; non-JSON
+        // chatter (warnings, "Ripgrep not available", etc.) is silently
+        // ignored so a stray banner line doesn't poison usage extraction.
+        let event;
+        try {
+            event = JSON.parse(trimmed);
+        }
+        catch {
+            continue;
+        }
+        if (!event || typeof event !== "object")
+            continue;
+        sawAnyLine = true;
+        if (event.type === "message" &&
+            event.role === "assistant" &&
+            typeof event.content === "string") {
+            assistantChunks.push(event.content);
+            continue;
+        }
+        if (event.type === "result" && event.stats && typeof event.stats === "object") {
+            const stats = event.stats;
+            const input = typeof stats.input_tokens === "number" ? stats.input_tokens : undefined;
+            const output = typeof stats.output_tokens === "number" ? stats.output_tokens : undefined;
+            if (input !== undefined || output !== undefined) {
+                const usage = {
+                    input_tokens: input ?? 0,
+                    output_tokens: output ?? 0,
+                };
+                if (typeof stats.cached === "number") {
+                    usage.cache_read_tokens = stats.cached;
+                }
+                result.usage = usage;
+            }
+        }
+    }
+    if (!sawAnyLine) {
+        return null;
+    }
+    if (assistantChunks.length > 0) {
+        result.response = assistantChunks.join("");
+    }
+    return result;
+}
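The dist module cannot be imported in isolation here, so the sketch below copies the logic of `parseGeminiStreamJson` from the hunk above into a standalone `parseStreamSketch` and runs it on a hand-written transcript. The event shapes follow the doc comment; the banner text, session id, model name, and token counts are made up. One detail worth noticing in the shipped code: every assistant `message` with string `content` is appended, and the `delta` flag itself is never inspected.

```javascript
// Standalone copy of the 1.10.0 parseGeminiStreamJson logic, for illustration.
function parseStreamSketch(stdout) {
  if (!stdout) return null;
  const result = {};
  const assistantChunks = [];
  let sawAnyLine = false;
  for (const line of stdout.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let event;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // non-JSON banner/warning lines are skipped
    }
    if (!event || typeof event !== "object") continue;
    sawAnyLine = true;
    if (event.type === "message" && event.role === "assistant" && typeof event.content === "string") {
      assistantChunks.push(event.content); // the `delta` flag is not checked
      continue;
    }
    if (event.type === "result" && event.stats && typeof event.stats === "object") {
      const { input_tokens, output_tokens, cached } = event.stats;
      const input = typeof input_tokens === "number" ? input_tokens : undefined;
      const output = typeof output_tokens === "number" ? output_tokens : undefined;
      if (input !== undefined || output !== undefined) {
        const usage = { input_tokens: input ?? 0, output_tokens: output ?? 0 };
        if (typeof cached === "number") usage.cache_read_tokens = cached;
        result.usage = usage;
      }
    }
  }
  if (!sawAnyLine) return null;
  if (assistantChunks.length > 0) result.response = assistantChunks.join("");
  return result;
}

// Hand-written transcript: one non-JSON banner line, then NDJSON events.
const transcript = [
  "Ripgrep not available, falling back",
  '{"type":"init","session_id":"s-1","model":"gemini-model"}',
  '{"type":"message","role":"user","content":"Say hello"}',
  '{"type":"message","role":"assistant","content":"Hello ","delta":true}',
  '{"type":"message","role":"assistant","content":"world","delta":true}',
  '{"type":"result","status":"success","stats":{"input_tokens":10,"output_tokens":4,"cached":2}}',
].join("\n");

console.log(parseStreamSketch(transcript));
```

On this transcript the sketch returns `response: "Hello world"` and `usage: { input_tokens: 10, output_tokens: 4, cache_read_tokens: 2 }`. Stdout made only of banner lines returns `null`, while a stream that parses but lacks a `result.stats` event returns `{}`, which the `!parsed.usage` guard in `extractUsageAndCost` turns into an empty usage result.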
package/dist/index.d.ts
CHANGED
@@ -155,6 +155,8 @@ export declare function prepareClaudeRequest(params: {
     maxTurns?: number;
     effort?: ClaudeEffortLevel;
     excludeDynamicSystemPromptSections?: boolean;
+    fallbackModel?: string;
+    jsonSchema?: string | Record<string, unknown>;
 }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
 export interface CodexRequestPrep extends CliRequestPrep {
     /**
@@ -212,11 +214,13 @@ export declare function prepareGeminiRequest(params: {
     optimizePrompt: boolean;
     operation: string;
     /**
-     * U23
-     *
-     *
+     * U23 + Phase 4 slice ε: output format. `json` emits `-o json` (single
+     * JSON object with usageMetadata). `stream-json` emits `-o stream-json`
+     * (NDJSON event stream — `init` / `message` / `result` lines). Both
+     * route through `extractUsageAndCost` so usage tokens reach the flight
+     * recorder. Defaults to "text".
      */
-    outputFormat?: "text" | "json";
+    outputFormat?: "text" | "json" | "stream-json";
     sandbox?: boolean;
     policyFiles?: string[];
     adminPolicyFiles?: string[];
@@ -313,8 +317,11 @@ export interface GeminiRequestParams {
     optimizeResponse?: boolean;
     idleTimeoutMs?: number;
     forceRefresh?: boolean;
-    /**
-
+    /**
+     * U23 + Phase 4 slice ε: "json" emits `-o json`; "stream-json" emits
+     * `-o stream-json` (NDJSON event stream). Both are usage-extracted.
+     */
+    outputFormat?: "text" | "json" | "stream-json";
     sandbox?: boolean;
     policyFiles?: string[];
     adminPolicyFiles?: string[];
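The widened `outputFormat` union maps onto Gemini argv as the `prepareGeminiRequest` hunk in `dist/index.js` shows: `json` adds `-o json`, `stream-json` adds `-o stream-json`, and the default `text` adds nothing. A minimal sketch of that mapping in isolation follows; `geminiOutputArgs` is a hypothetical helper, not an export of the package.

```javascript
// Hypothetical helper isolating the `-o` handling from prepareGeminiRequest.
// "text" (the default) emits no flag, so Gemini keeps its plain-text output.
function geminiOutputArgs(outputFormat = "text") {
  if (outputFormat === "json") return ["-o", "json"];
  if (outputFormat === "stream-json") return ["-o", "stream-json"];
  return [];
}

for (const fmt of ["text", "json", "stream-json"]) {
  console.log(fmt, JSON.stringify(geminiOutputArgs(fmt)));
}
```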
package/dist/index.js
CHANGED
@@ -9,7 +9,7 @@ import { z } from "zod";
 import { executeCli, killAllProcessGroups } from "./executor.js";
 import { parseStreamJson } from "./stream-json-parser.js";
 import { parseCodexJsonStream } from "./codex-json-parser.js";
-import { parseGeminiJson } from "./gemini-json-parser.js";
+import { parseGeminiJson, parseGeminiStreamJson } from "./gemini-json-parser.js";
 import { parseVibeMetaJson } from "./mistral-meta-json-parser.js";
 import { homedir } from "os";
 import { createSessionManager } from "./session-manager.js";
@@ -530,8 +530,8 @@ ctx) {
             costUsd: parsed.usage.cost_usd,
         };
     }
-    if (cli === "gemini" && outputFormat === "json") {
-        const parsed = parseGeminiJson(output);
+    if (cli === "gemini" && (outputFormat === "json" || outputFormat === "stream-json")) {
+        const parsed = outputFormat === "stream-json" ? parseGeminiStreamJson(output) : parseGeminiJson(output);
         if (!parsed || !parsed.usage) {
             return {};
         }
@@ -1005,6 +1005,8 @@ export function prepareClaudeRequest(params, runtime = resolveGatewayServerRunti
         maxTurns: params.maxTurns,
         effort: params.effort,
         excludeDynamicSystemPromptSections: params.excludeDynamicSystemPromptSections,
+        fallbackModel: params.fallbackModel,
+        jsonSchema: params.jsonSchema,
     }));
     return {
         corrId,
@@ -1271,9 +1273,19 @@ export function prepareGeminiRequest(params, runtime = resolveGatewayServerRunti
     // U23 fix: emit `-o json` when the caller asked for JSON output. The Gemini
     // JSON parser is otherwise unreachable from the tool surface and the
     // structured usageMetadata is silently dropped.
+    //
+    // Phase 4 slice ε: same wiring for `-o stream-json` (NDJSON event stream).
+    // Gemini already streams stdout in real-time so the existing 10-minute
+    // idle timeout (CLI_IDLE_TIMEOUTS.gemini) covers both modes without
+    // adjustment — unlike Claude, no `--include-partial-messages` companion
+    // flag is required because Gemini emits assistant `delta` events as part
+    // of the default stream-json shape.
     if (params.outputFormat === "json") {
         args.push("-o", "json");
     }
+    else if (params.outputFormat === "stream-json") {
+        args.push("-o", "stream-json");
+    }
     // Phase 4 slice γ: opt-in trust-prompt bypass for fresh workspaces.
     if (params.skipTrust) {
         args.push("--skip-trust");
@@ -2471,6 +2483,16 @@ export function createGatewayServer(deps = {}) {
             .boolean()
             .optional()
             .describe("Claude --exclude-dynamic-system-prompt-sections: trim dynamic context blocks from the system prompt."),
+        // Phase 4 slice η — Claude reliability + structured-output parity
+        fallbackModel: z
+            .string()
+            .min(1)
+            .optional()
+            .describe("Claude --fallback-model: model name to auto-fallback to when the default model is overloaded (effective only with --print, which the gateway always uses)."),
+        jsonSchema: z
+            .union([z.string(), z.record(z.unknown())])
+            .optional()
+            .describe("Claude --json-schema: JSON Schema literal (NOT a path) constraining structured output. Object values are JSON.stringify-d; string values are passed verbatim. Use with outputFormat='json'."),
         approvalStrategy: z
             .enum(["legacy", "mcp_managed"])
             .default("legacy")
@@ -2501,7 +2523,7 @@ export function createGatewayServer(deps = {}) {
             .boolean()
             .default(false)
             .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
-    }, async ({ prompt, promptParts, model, outputFormat, sessionId, continueSession, createNewSession, allowedTools, disallowedTools, dangerouslySkipPermissions, permissionMode, agent, agents, forkSession, systemPrompt, appendSystemPrompt, maxBudgetUsd, maxTurns, effort, excludeDynamicSystemPromptSections, approvalStrategy, approvalPolicy, mcpServers, strictMcpConfig, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, }) => {
+    }, async ({ prompt, promptParts, model, outputFormat, sessionId, continueSession, createNewSession, allowedTools, disallowedTools, dangerouslySkipPermissions, permissionMode, agent, agents, forkSession, systemPrompt, appendSystemPrompt, maxBudgetUsd, maxTurns, effort, excludeDynamicSystemPromptSections, fallbackModel, jsonSchema, approvalStrategy, approvalPolicy, mcpServers, strictMcpConfig, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, }) => {
         const startTime = Date.now();
         if (systemPrompt !== undefined && appendSystemPrompt !== undefined) {
             return createErrorResponse("claude", 1, "", correlationId, new Error("systemPrompt and appendSystemPrompt are mutually exclusive; use one or the other (not both)."));
@@ -2531,6 +2553,8 @@ export function createGatewayServer(deps = {}) {
             maxTurns,
             effort,
             excludeDynamicSystemPromptSections,
+            fallbackModel,
+            jsonSchema,
         }, runtime);
         if (!("args" in prep))
             return prep;
@@ -3069,11 +3093,14 @@ export function createGatewayServer(deps = {}) {
             .default(false)
             .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
         // U23: emit `-o json` to extract token usage via parseGeminiJson. Default
-        // remains text so existing callers see no behavior change.
+        // remains text so existing callers see no behavior change. Phase 4 slice
+        // ε adds `stream-json` (NDJSON event stream parsed by
+        // parseGeminiStreamJson — `init`/`message`/`result` lines, idle-timeout
+        // semantics covered by Gemini's existing real-time stdout streaming).
         outputFormat: z
-            .enum(["text", "json"])
+            .enum(["text", "json", "stream-json"])
             .default("text")
-            .describe("Gemini output format. `json` emits `-o json`
+            .describe("Gemini output format. `json` emits `-o json` (single JSON with usageMetadata). `stream-json` emits `-o stream-json` (NDJSON event stream — `init`/`message`/`result` lines, usage extracted from the terminal `result.stats` event). Both report usage to the flight recorder."),
         sandbox: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.sandbox.describe("Run Gemini in sandbox mode (-s)"),
         policyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.policyFiles.describe("Policy file paths (--policy <path>, one per file). Paths must exist."),
         adminPolicyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.adminPolicyFiles.describe("Admin policy file paths (--admin-policy <path>, one per file). Paths must exist."),
@@ -3395,6 +3422,16 @@ export function createGatewayServer(deps = {}) {
             .boolean()
             .optional()
             .describe("Claude --exclude-dynamic-system-prompt-sections: trim dynamic context blocks from the system prompt."),
+        // Phase 4 slice η — Claude reliability + structured-output parity
+        fallbackModel: z
+            .string()
+            .min(1)
+            .optional()
+            .describe("Claude --fallback-model: model name to auto-fallback to when the default model is overloaded (effective only with --print, which the gateway always uses)."),
+        jsonSchema: z
+            .union([z.string(), z.record(z.unknown())])
+            .optional()
+            .describe("Claude --json-schema: JSON Schema literal (NOT a path) constraining structured output. Object values are JSON.stringify-d; string values are passed verbatim. Use with outputFormat='json'."),
         approvalStrategy: z
             .enum(["legacy", "mcp_managed"])
             .default("legacy")
@@ -3424,7 +3461,7 @@ export function createGatewayServer(deps = {}) {
             .boolean()
             .default(false)
             .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
-    }, async ({ prompt, promptParts, model, outputFormat, sessionId, continueSession, createNewSession, allowedTools, disallowedTools, dangerouslySkipPermissions, permissionMode, agent, agents, forkSession, systemPrompt, appendSystemPrompt, maxBudgetUsd, maxTurns, effort, excludeDynamicSystemPromptSections, approvalStrategy, approvalPolicy, mcpServers, strictMcpConfig, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, }) => {
+    }, async ({ prompt, promptParts, model, outputFormat, sessionId, continueSession, createNewSession, allowedTools, disallowedTools, dangerouslySkipPermissions, permissionMode, agent, agents, forkSession, systemPrompt, appendSystemPrompt, maxBudgetUsd, maxTurns, effort, excludeDynamicSystemPromptSections, fallbackModel, jsonSchema, approvalStrategy, approvalPolicy, mcpServers, strictMcpConfig, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, }) => {
         if (systemPrompt !== undefined && appendSystemPrompt !== undefined) {
             return createErrorResponse("claude", 1, "", correlationId, new Error("systemPrompt and appendSystemPrompt are mutually exclusive; use one or the other (not both)."));
         }
@@ -3453,6 +3490,8 @@ export function createGatewayServer(deps = {}) {
             maxTurns,
             effort,
             excludeDynamicSystemPromptSections,
+            fallbackModel,
+            jsonSchema,
         }, runtime);
         if (!("args" in prep))
             return prep;
@@ -3691,11 +3730,14 @@ export function createGatewayServer(deps = {}) {
             .default(false)
             .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
         // U23: emit `-o json` to extract token usage via parseGeminiJson. Default
-        // remains text so existing callers see no behavior change.
+        // remains text so existing callers see no behavior change. Phase 4 slice
+        // ε adds `stream-json` (NDJSON event stream parsed by
+        // parseGeminiStreamJson — `init`/`message`/`result` lines, idle-timeout
+        // semantics covered by Gemini's existing real-time stdout streaming).
         outputFormat: z
-            .enum(["text", "json"])
+            .enum(["text", "json", "stream-json"])
             .default("text")
-            .describe("Gemini output format. `json` emits `-o json`
+            .describe("Gemini output format. `json` emits `-o json` (single JSON with usageMetadata). `stream-json` emits `-o stream-json` (NDJSON event stream — `init`/`message`/`result` lines, usage extracted from the terminal `result.stats` event). Both report usage to the flight recorder."),
         sandbox: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.sandbox.describe("Run Gemini in sandbox mode (-s)"),
         policyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.policyFiles.describe("Policy file paths (--policy <path>, one per file). Paths must exist."),
         adminPolicyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.adminPolicyFiles.describe("Admin policy file paths (--admin-policy <path>, one per file). Paths must exist."),
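The Gemini branch of `extractUsageAndCost` now chooses its parser from `outputFormat`. The sketch below reproduces only that decision, with the parsers injected as stubs so it runs without the package. `extractGeminiUsageSketch`, the stubs, and the `{ usage }` return value are hypothetical; the hunk does not show what the real function returns after its `!parsed.usage` guard (the Claude branch above it returns camelCase fields such as `costUsd`).

```javascript
// Hypothetical sketch of the Gemini dispatch in extractUsageAndCost (1.10.0).
// Parsers are injected so the decision can run without gemini-json-parser.js.
function extractGeminiUsageSketch(output, outputFormat, parsers) {
  if (outputFormat !== "json" && outputFormat !== "stream-json") {
    return {}; // "text" output carries no structured usage
  }
  const parsed = outputFormat === "stream-json"
    ? parsers.parseStreamJson(output)
    : parsers.parseJson(output);
  if (!parsed || !parsed.usage) {
    return {}; // same guard as the hunk: unparseable or usage-less output
  }
  return { usage: parsed.usage }; // hypothetical return shape
}

// Stub parsers that record which one ran.
const calls = [];
const parsers = {
  parseJson: () => {
    calls.push("json");
    return { usage: { input_tokens: 1, output_tokens: 1 } };
  },
  parseStreamJson: () => {
    calls.push("stream-json");
    return null; // simulate stdout with no parseable JSON line
  },
};

console.log(extractGeminiUsageSketch("{}", "json", parsers));      // usage object
console.log(extractGeminiUsageSketch("", "stream-json", parsers)); // {}
console.log(extractGeminiUsageSketch("hi", "text", parsers));      // {}, no parser called
console.log(calls);
```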
package/dist/request-helpers.d.ts
CHANGED
@@ -350,6 +350,20 @@ export interface ClaudeHighImpactFlagsInput {
     maxTurns?: number;
     effort?: ClaudeEffortLevel;
     excludeDynamicSystemPromptSections?: boolean;
+    /**
+     * Phase 4 slice η — Claude `--fallback-model <model>`. Routes overloaded-model
+     * requests to the named fallback. Only effective with `--print` (we always pass
+     * `-p`, so no extra gating required here).
+     */
+    fallbackModel?: string;
+    /**
+     * Phase 4 slice η — Claude `--json-schema <schema>`. Per `claude --help`, the
+     * argument is the JSON Schema *literal*, not a path. Object values are
+     * `JSON.stringify`-d; string values are passed verbatim (caller already wrote
+     * a JSON literal). No temp file lifecycle needed (contrast with Codex
+     * `--output-schema`, which takes a path).
+     */
+    jsonSchema?: string | Record<string, unknown>;
 }
 /**
  * Emit Claude high-impact feature flags (U25) as a flat argv segment.
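Because `prepareClaudeHighImpactFlags` forwards string schemas verbatim and the tool schema only requires `z.string()` for that branch, malformed JSON in a string `jsonSchema` is not rejected at the gateway layer. A caller that wants an early failure could parse the string first. This is a hypothetical caller-side helper (`toJsonSchemaArg`), not something the package provides.

```javascript
// Hypothetical caller-side helper. The gateway does not parse string schemas,
// so parsing here surfaces malformed JSON before any CLI process is spawned.
function toJsonSchemaArg(schema) {
  if (typeof schema === "string") {
    JSON.parse(schema); // throws SyntaxError on malformed input
    return schema;      // keep the caller's literal verbatim, like the gateway
  }
  return JSON.stringify(schema); // object form, serialized like the gateway
}

const fromObject = toJsonSchemaArg({ type: "object", required: ["name"] });
const fromString = toJsonSchemaArg('{"type":"object","required":["name"]}');
console.log(fromObject === fromString); // both are {"type":"object","required":["name"]}

try {
  toJsonSchemaArg('{"type":"object"'); // missing closing brace
} catch (err) {
  console.log("rejected:", err.name);
}
```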
package/dist/request-helpers.js
CHANGED
@@ -438,6 +438,13 @@ export function prepareClaudeHighImpactFlags(input) {
     if (input.excludeDynamicSystemPromptSections) {
         args.push("--exclude-dynamic-system-prompt-sections");
     }
+    if (input.fallbackModel !== undefined) {
+        args.push("--fallback-model", input.fallbackModel);
+    }
+    if (input.jsonSchema !== undefined) {
+        const schemaArg = typeof input.jsonSchema === "string" ? input.jsonSchema : JSON.stringify(input.jsonSchema);
+        args.push("--json-schema", schemaArg);
+    }
     return args;
 }
 //──────────────────────────────────────────────────────────────────────────────
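The two new branches of `prepareClaudeHighImpactFlags` can be run in isolation. `claudeSliceEtaFlags` below is a hypothetical standalone copy of just those branches (the real function also emits the other Claude high-impact flags). Serializing the object form of the schema yields the same literal that the `claude-json-schema` conformance fixture passes as a string.

```javascript
// Hypothetical standalone copy of the 1.11.0 branches only.
function claudeSliceEtaFlags(input) {
  const args = [];
  if (input.fallbackModel !== undefined) {
    args.push("--fallback-model", input.fallbackModel);
  }
  if (input.jsonSchema !== undefined) {
    // Strings pass verbatim; objects are serialized to a JSON literal.
    const schemaArg = typeof input.jsonSchema === "string"
      ? input.jsonSchema
      : JSON.stringify(input.jsonSchema);
    args.push("--json-schema", schemaArg);
  }
  return args;
}

const argv = claudeSliceEtaFlags({
  fallbackModel: "claude-haiku-4-5-20251001",
  jsonSchema: {
    type: "object",
    properties: { name: { type: "string" } },
    required: ["name"],
  },
});
console.log(argv);
// [ '--fallback-model', 'claude-haiku-4-5-20251001',
//   '--json-schema', '{"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}' ]
```

In the code shown, both branches test `!== undefined`, so an empty string for `jsonSchema` (allowed by `z.string()`) would still be emitted as `--json-schema ""`; the empty-string case for `fallbackModel` is instead blocked earlier by `.min(1)` in the tool schema.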
package/dist/upstream-contracts.js
CHANGED
@@ -37,6 +37,8 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "maxTurns",
             "effort",
             "excludeDynamicSystemPromptSections",
+            "fallbackModel",
+            "jsonSchema",
             "approvalStrategy",
             "mcpServers",
             "strictMcpConfig",
@@ -78,6 +80,14 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 arity: "none",
                 description: "Trim dynamic system prompt sections",
             },
+            "--fallback-model": {
+                arity: "one",
+                description: "Auto-fallback model when default is overloaded (Claude --print only)",
+            },
+            "--json-schema": {
+                arity: "one",
+                description: "JSON Schema literal constraining structured output",
+            },
             "--continue": { arity: "none", description: "Continue active session" },
             "--session-id": { arity: "one", description: "Session id" },
         },
@@ -95,6 +105,29 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 args: ["-p", "hello", "--not-a-claude-flag"],
                 expect: "fail",
             },
+            {
+                // Phase 4 slice η: --fallback-model wired through prepareClaudeRequest.
+                id: "claude-fallback-model",
+                description: "Phase 4 slice η: --fallback-model accepted",
+                args: ["-p", "hello", "--fallback-model", "claude-haiku-4-5-20251001"],
+                expect: "pass",
+            },
+            {
+                // Phase 4 slice η: --json-schema accepts an inline JSON Schema literal
+                // (per `claude --help` example), not a path. Codex parity for
+                // structured-output validation in one slice.
+                id: "claude-json-schema",
+                description: "Phase 4 slice η: --json-schema accepts inline JSON literal",
+                args: [
+                    "-p",
+                    "hello",
+                    "--output-format",
+                    "json",
+                    "--json-schema",
+                    '{"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}',
+                ],
+                expect: "pass",
+            },
         ],
     },
     codex: {
@@ -248,7 +281,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "-s": { arity: "none", description: "Sandbox mode" },
             "--policy": { arity: "one", description: "Policy file path" },
             "--admin-policy": { arity: "one", description: "Admin policy file path" },
-            "-o": {
+            "-o": {
+                arity: "one",
+                values: ["json", "stream-json"],
+                description: "Output format (Phase 4 slice ε adds stream-json)",
+            },
             "--resume": { arity: "one", description: "Resume session" },
             "--skip-trust": {
                 arity: "none",
@@ -275,6 +312,18 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 args: ["-p", "hello", "--skip-trust"],
                 expect: "pass",
             },
+            {
+                id: "gemini-stream-json",
+                description: "Phase 4 slice ε: -o stream-json is accepted",
+                args: ["-p", "hello", "-o", "stream-json"],
+                expect: "pass",
+            },
+            {
+                id: "gemini-output-format-invalid",
+                description: "Phase 4 slice ε: -o ndjson is rejected (not in contract enum)",
+                args: ["-p", "hello", "-o", "ndjson"],
+                expect: "fail",
+            },
         ],
     },
     grok: {
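The fixtures are checked by `validateUpstreamCliArgs`, whose implementation is outside this diff. The sketch below is a hypothetical validator over a reduced contract table: the `--fallback-model`, `--json-schema`, and `-o` entries are copied from the hunks above, while the `-p` entries and Claude's `--output-format` entry are assumptions, since their real definitions are not shown. It fails on unknown flags, consumes one value for `arity: "one"`, and enforces `values` when present.

```javascript
// Hypothetical, reduced contract tables. Entries marked ASSUMED are not in the diff.
const CONTRACTS = {
  claude: {
    "-p": { arity: "none" },             // ASSUMED: --print is a boolean flag
    "--output-format": { arity: "one" }, // ASSUMED
    "--fallback-model": { arity: "one" },
    "--json-schema": { arity: "one" },
  },
  gemini: {
    "-p": { arity: "one" },              // ASSUMED: the prompt value follows -p
    "-o": { arity: "one", values: ["json", "stream-json"] },
  },
};

// True when every flag is known, has its value, and (if an enum is declared)
// the value is in the enum. Unconsumed non-flag tokens count as positionals.
function validateArgsSketch(cli, args) {
  const flags = CONTRACTS[cli];
  for (let i = 0; i < args.length; i++) {
    const token = args[i];
    if (!token.startsWith("-")) continue; // positional, e.g. the prompt
    const spec = flags[token];
    if (!spec) return false; // unknown flag
    if (spec.arity === "one") {
      const value = args[i + 1];
      if (value === undefined) return false;
      if (spec.values && !spec.values.includes(value)) return false;
      i++; // skip the consumed value
    }
  }
  return true;
}

const fixtures = [
  { cli: "claude", id: "claude-fallback-model", args: ["-p", "hello", "--fallback-model", "claude-haiku-4-5-20251001"], expect: "pass" },
  { cli: "claude", id: "claude-json-schema", args: ["-p", "hello", "--output-format", "json", "--json-schema", '{"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}'], expect: "pass" },
  // The real id of this fixture is not shown in the diff; this one is hypothetical.
  { cli: "claude", id: "claude-unknown-flag", args: ["-p", "hello", "--not-a-claude-flag"], expect: "fail" },
  { cli: "gemini", id: "gemini-stream-json", args: ["-p", "hello", "-o", "stream-json"], expect: "pass" },
  { cli: "gemini", id: "gemini-output-format-invalid", args: ["-p", "hello", "-o", "ndjson"], expect: "fail" },
];

for (const f of fixtures) {
  const ok = validateArgsSketch(f.cli, f.args) === (f.expect === "pass");
  console.log(`${f.id}: ${ok ? "as expected" : "MISMATCH"}`);
}
```

Removing `"stream-json"` from the `-o` values makes the `gemini-stream-json` fixture report MISMATCH, which is the kind of mutation probe the changelog's test-veracity audits rely on.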
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "llm-cli-gateway",
-  "version": "1.
+  "version": "1.11.0",
   "mcpName": "io.github.verivus-oss/llm-cli-gateway",
   "description": "MCP server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral Vibe CLIs with session management, retry logic, async job orchestration, durable job results, and cross-LLM validation.",
   "license": "MIT",