llm-cli-gateway 1.9.0 → 1.11.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,117 @@
 
  All notable changes to the llm-cli-gateway project.
 
+ ## [1.11.0] - 2026-05-27 — Phase 4 slice η (Claude `--fallback-model` + `--json-schema`)
+
+ Ships the sixth Phase 4 slice: Claude's reliability fallback and
+ structured-output JSON-Schema constraint flags are now reachable from
+ `claude_request` and `claude_request_async`. Three commits land together
+ (feature wiring, contract registration, test-veracity regressions) plus
+ this release commit.
+
+ ### Added — `--fallback-model` and `--json-schema` for Claude
+
+ - `claude_request` and `claude_request_async` accept a new `fallbackModel`
+   field (non-empty string, validated via `z.string().min(1)`). Threaded
+   through `prepareClaudeRequest` → `prepareClaudeHighImpactFlags`
+   (`src/request-helpers.ts:651`) → `--fallback-model <model>` argv pair.
+   Effective only with Claude `--print`; the gateway always passes `-p`,
+   so no extra gating required.
+ - Both tools accept a new `jsonSchema` field
+   (`string | Record<string, unknown>`). Per `claude --help`, the CLI
+   argument is the JSON Schema *literal* (not a path; contrast with Codex
+   `--output-schema`). Object values are `JSON.stringify`-d; string values
+   pass verbatim. Use with `outputFormat: "json"` for structured output
+   validation. Achieves Codex parity for structured-output validation
+   in a single slice.
+ - `UPSTREAM_CLI_CONTRACTS.claude.flags` registers `--fallback-model` and
+   `--json-schema` with `arity: "one"`. `mcpParameters` includes both new
+   field names. Two new passing conformance fixtures
+   (`claude-fallback-model`, `claude-json-schema`) pin the contract; both
+   are mechanically validated against `validateUpstreamCliArgs` in the
+   REGRESSIONS Hε suite.
+
+ ### Test-veracity audit
+
+ Per the standing protocol (`feedback_test_veracity_audit_protocol`),
+ this slice's tests were audited by Codex + Gemini + Grok + Mistral in
+ async parallel with mandatory mutation-probe execution. Spec at
+ `docs/plans/test-veracity-audit-slice-eta.spec.md`. Round 1 outcomes:
+ Grok + Mistral unanimous UNCONDITIONAL APPROVE; Gemini stalled at 682B
+ stderr for 15+ minutes (cancelled, documented quota/stall-class
+ blocker); Codex initially REJECTED on P-Hβ-4 with an invalid claim
+ ("removing sync `jsonSchema` left the test green") — pre-verification
+ on a clean tree confirmed the mutation does turn `Hα-4` + `Hα-6` RED as
+ the spec predicts. Round-2 pushback with the verbatim vitest output:
+ Codex self-corrected, reproduced the mutation in a worktree, observed
+ the predicted red, restored, and issued UNCONDITIONAL APPROVE.
+
+ Three substantive reviewer approves (Grok, Mistral, Codex) from
+ independent vendor families satisfy the multi-LLM gate; Gemini stall
+ documented.
+
+ Test count: 816 → 837 (21 new across one file:
+ `src/__tests__/test-veracity-regressions-slice-eta.test.ts`).
+
+ ### Known caveats
+
+ - `npm run check` still excludes `format:check` (gap first flagged in
+   v1.8.0). Run both locally before pushing.
+ - Claude `--fallback-model` and `--json-schema` are CLI-side gated to
+   `--print` mode by Claude itself; both gateway tools always pass `-p`,
+   so this is invisible to callers but worth noting if the upstream CLI
+   flag semantics change.
+
+ ## [1.10.0] - 2026-05-27 — Phase 4 slice ε (Gemini `-o stream-json` enum widening)
+
+ Ships the fifth Phase 4 slice: Gemini's NDJSON event-stream output format
+ (`-o stream-json`) is now reachable from `gemini_request` and
+ `gemini_request_async`. Four commits land together: the feature wiring, a
+ contract-table widening, a test-veracity regression suite, and a follow-up
+ test fix driven by the multi-LLM round-1 audit.
+
+ ### Added — `outputFormat: "stream-json"` for Gemini
+
+ - `gemini_request` and `gemini_request_async` `outputFormat` enums widened
+   from `text | json` to `text | json | stream-json`.
+ - `prepareGeminiRequest` emits `-o stream-json` when the new value is set.
+   No `--include-partial-messages` analogue is required: Gemini already
+   streams stdout in real time across all output modes (covered by
+   `CLI_IDLE_TIMEOUTS.gemini = 600_000`).
+ - New `parseGeminiStreamJson` parser consumes the NDJSON event stream
+   (`init` / `message` / `result` lines), concatenates assistant `delta`
+   messages into the response, and extracts
+   `input_tokens` / `output_tokens` / `cached` → `cache_read_tokens` from
+   the terminal `result.stats` event.
+ - `extractUsageAndCost("gemini", _, "stream-json")` routes to the new
+   parser so usage tokens reach the flight recorder on the stream-json
+   path, matching the existing `-o json` behaviour.
+ - `UPSTREAM_CLI_CONTRACTS.gemini.flags["-o"].values` widened to
+   `["json", "stream-json"]`; two new conformance fixtures
+   (`gemini-stream-json` passing, `gemini-output-format-invalid` failing
+   for `-o ndjson`) pin the enum bound.
+
+ ### Test-veracity audit
+
+ Per the standing protocol established with v1.9.0
+ (`feedback_test_veracity_audit_protocol`), this slice's tests were
+ audited by Codex + Gemini + Grok + Mistral in async parallel with
+ mandatory mutation-probe execution. Round 1 found one real gap
+ (`Eε-4` only checked fixture presence/shape — P-Eε-1 left it green);
+ closed in commit `4a78f9c` by running the fixture's args through
+ `validateUpstreamCliArgs` inside the same `it()` block. Round 2
+ delivered unanimous UNCONDITIONAL APPROVE across all four reviewers,
+ with site-by-site probe evidence for the contested `Eα` registered-schema
+ helper. Spec at `docs/plans/test-veracity-audit-slice-epsilon.spec.md`.
+
+ Test count: 771 → 795 → 796 (24 + 1 new across two files).
+
+ ### Known caveats
+
+ - The `npm run check` script still does not include `format:check` (a
+   gap first flagged in the v1.8.0 release notes). Run both locally
+   before pushing; CI runs format:check separately.
+
  ## [1.9.0] - 2026-05-27 — Phase 4 slice δ (budget/max-turns parity) + retroactive α/γ contract closure
 
  Ships the fourth Phase 4 slice (budget/max-turns parity for Grok and Mistral),
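The 1.11.0 notes above describe the two new Claude fields by their zod constraints. The following is a minimal sketch of what those constraints accept and reject, written in plain TypeScript so it runs without zod. The helper name `checkClaudeExtras`, the error strings, and the example prompt are hypothetical and do not appear in the package; the gateway itself validates with zod.

```typescript
// Hypothetical plain-TypeScript mirror of the two new field constraints.
// Illustrative only: the gateway validates these fields with zod.
interface ClaudeExtras {
  fallbackModel?: unknown;
  jsonSchema?: unknown;
}

function checkClaudeExtras(extras: ClaudeExtras): string[] {
  const errors: string[] = [];

  // Approximates z.string().min(1).optional()
  const fm = extras.fallbackModel;
  if (fm !== undefined && (typeof fm !== "string" || fm.length < 1)) {
    errors.push("fallbackModel must be a non-empty string");
  }

  // Approximates z.union([z.string(), z.record(z.unknown())]).optional()
  const js = extras.jsonSchema;
  if (js !== undefined) {
    const isString = typeof js === "string";
    const isPlainRecord = typeof js === "object" && js !== null && !Array.isArray(js);
    if (!isString && !isPlainRecord) {
      errors.push("jsonSchema must be a string or an object");
    }
  }
  return errors;
}

// Example arguments for a claude_request call (the prompt text is made up).
const exampleArgs = {
  prompt: "Return the user's name as JSON.",
  outputFormat: "json",
  fallbackModel: "claude-haiku-4-5-20251001",
  jsonSchema: {
    type: "object",
    properties: { name: { type: "string" } },
    required: ["name"],
  },
};

console.log(checkClaudeExtras(exampleArgs));
console.log(checkClaudeExtras({ fallbackModel: "" }));
```

Note that a string `jsonSchema` is not checked for being valid JSON here, which matches the changelog's statement that string values pass verbatim.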
@@ -1,13 +1,22 @@
  /**
-  * Parser for Gemini CLI `-o json` output.
+  * Parsers for Gemini CLI `-o json` (single object) and `-o stream-json`
+  * (NDJSON event stream) output.
   *
-  * Gemini emits a single JSON object with:
+  * `-o json` emits a single JSON object with:
   * - `response`: string final model output
   * - `usageMetadata`: { promptTokenCount, candidatesTokenCount,
   *   cachedContentTokenCount?, totalTokenCount }
   *
-  * Returns null when stdout is not parseable as JSON. Returns an object with
-  * only `response` when usageMetadata is missing.
+  * `-o stream-json` emits one JSON object per line:
+  * - `{ "type": "init", "session_id": "...", "model": "..." }`
+  * - `{ "type": "message", "role": "user", "content": "..." }`
+  * - `{ "type": "message", "role": "assistant", "content": "...", "delta": true }` (repeated)
+  * - `{ "type": "result", "status": "success", "stats": { "input_tokens": N,
+  *   "output_tokens": N, "cached": N, ... } }`
+  *
+  * Both parsers return null when stdout is unparseable. Both populate the same
+  * `GeminiJsonParseResult` shape so `extractUsageAndCost` can branch on
+  * outputFormat without further dispatch.
   */
  export interface GeminiUsage {
      input_tokens: number;
@@ -19,3 +28,9 @@ export interface GeminiJsonParseResult {
      response?: string;
  }
  export declare function parseGeminiJson(stdout: string): GeminiJsonParseResult | null;
+ /**
+  * Parse Gemini `-o stream-json` NDJSON output. Concatenates assistant `delta`
+  * message content into `response`, extracts the terminal `result.stats` payload
+  * into `usage`. Returns null when stdout contains no parseable JSON line.
+  */
+ export declare function parseGeminiStreamJson(stdout: string): GeminiJsonParseResult | null;
@@ -1,13 +1,22 @@
  /**
-  * Parser for Gemini CLI `-o json` output.
+  * Parsers for Gemini CLI `-o json` (single object) and `-o stream-json`
+  * (NDJSON event stream) output.
   *
-  * Gemini emits a single JSON object with:
+  * `-o json` emits a single JSON object with:
   * - `response`: string final model output
   * - `usageMetadata`: { promptTokenCount, candidatesTokenCount,
   *   cachedContentTokenCount?, totalTokenCount }
   *
-  * Returns null when stdout is not parseable as JSON. Returns an object with
-  * only `response` when usageMetadata is missing.
+  * `-o stream-json` emits one JSON object per line:
+  * - `{ "type": "init", "session_id": "...", "model": "..." }`
+  * - `{ "type": "message", "role": "user", "content": "..." }`
+  * - `{ "type": "message", "role": "assistant", "content": "...", "delta": true }` (repeated)
+  * - `{ "type": "result", "status": "success", "stats": { "input_tokens": N,
+  *   "output_tokens": N, "cached": N, ... } }`
+  *
+  * Both parsers return null when stdout is unparseable. Both populate the same
+  * `GeminiJsonParseResult` shape so `extractUsageAndCost` can branch on
+  * outputFormat without further dispatch.
   */
  export function parseGeminiJson(stdout) {
      const trimmed = stdout.trim();
@@ -45,3 +54,63 @@ export function parseGeminiJson(stdout) {
      }
      return result;
  }
+ /**
+  * Parse Gemini `-o stream-json` NDJSON output. Concatenates assistant `delta`
+  * message content into `response`, extracts the terminal `result.stats` payload
+  * into `usage`. Returns null when stdout contains no parseable JSON line.
+  */
+ export function parseGeminiStreamJson(stdout) {
+     if (!stdout) {
+         return null;
+     }
+     const lines = stdout.split(/\r?\n/);
+     const result = {};
+     const assistantChunks = [];
+     let sawAnyLine = false;
+     for (const line of lines) {
+         const trimmed = line.trim();
+         if (!trimmed)
+             continue;
+         // Gemini stream-json lines are individual JSON objects; non-JSON
+         // chatter (warnings, "Ripgrep not available", etc.) is silently
+         // ignored so a stray banner line doesn't poison usage extraction.
+         let event;
+         try {
+             event = JSON.parse(trimmed);
+         }
+         catch {
+             continue;
+         }
+         if (!event || typeof event !== "object")
+             continue;
+         sawAnyLine = true;
+         if (event.type === "message" &&
+             event.role === "assistant" &&
+             typeof event.content === "string") {
+             assistantChunks.push(event.content);
+             continue;
+         }
+         if (event.type === "result" && event.stats && typeof event.stats === "object") {
+             const stats = event.stats;
+             const input = typeof stats.input_tokens === "number" ? stats.input_tokens : undefined;
+             const output = typeof stats.output_tokens === "number" ? stats.output_tokens : undefined;
+             if (input !== undefined || output !== undefined) {
+                 const usage = {
+                     input_tokens: input ?? 0,
+                     output_tokens: output ?? 0,
+                 };
+                 if (typeof stats.cached === "number") {
+                     usage.cache_read_tokens = stats.cached;
+                 }
+                 result.usage = usage;
+             }
+         }
+     }
+     if (!sawAnyLine) {
+         return null;
+     }
+     if (assistantChunks.length > 0) {
+         result.response = assistantChunks.join("");
+     }
+     return result;
+ }
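To make the parser's behaviour concrete, here is a typed restatement of the function added in the hunk above, run against a small NDJSON sample. This is a sketch under stated assumptions: the interface fields are inferred from the parser body and the changelog (the full declarations are elided in this diff), and the sample events, session id, model name, and token counts are invented for illustration.

```typescript
// Typed restatement of the stream-json parser from the hunk above.
// Interface fields are inferred from the parser body, not copied from the package.
interface GeminiUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens?: number;
}

interface GeminiJsonParseResult {
  usage?: GeminiUsage;
  response?: string;
}

function parseGeminiStreamJson(stdout: string): GeminiJsonParseResult | null {
  if (!stdout) return null;
  const result: GeminiJsonParseResult = {};
  const assistantChunks: string[] = [];
  let sawAnyLine = false;

  for (const line of stdout.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed) continue;

    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // non-JSON banner or warning line: skip it
    }
    if (!parsed || typeof parsed !== "object") continue;
    sawAnyLine = true;
    const event = parsed as Record<string, unknown>;

    // Assistant message chunks are concatenated in arrival order.
    if (event.type === "message" && event.role === "assistant" && typeof event.content === "string") {
      assistantChunks.push(event.content);
      continue;
    }

    // The result event carries token statistics.
    if (event.type === "result" && event.stats && typeof event.stats === "object") {
      const stats = event.stats as Record<string, unknown>;
      const input = typeof stats.input_tokens === "number" ? stats.input_tokens : undefined;
      const output = typeof stats.output_tokens === "number" ? stats.output_tokens : undefined;
      if (input !== undefined || output !== undefined) {
        const usage: GeminiUsage = { input_tokens: input ?? 0, output_tokens: output ?? 0 };
        if (typeof stats.cached === "number") usage.cache_read_tokens = stats.cached;
        result.usage = usage;
      }
    }
  }

  if (!sawAnyLine) return null;
  if (assistantChunks.length > 0) result.response = assistantChunks.join("");
  return result;
}

// Invented sample stream: one banner line followed by five events.
const sample = [
  "Ripgrep not available",
  '{"type":"init","session_id":"s1","model":"example-model"}',
  '{"type":"message","role":"user","content":"hi"}',
  '{"type":"message","role":"assistant","content":"Hel","delta":true}',
  '{"type":"message","role":"assistant","content":"lo","delta":true}',
  '{"type":"result","status":"success","stats":{"input_tokens":3,"output_tokens":2,"cached":1}}',
].join("\n");

console.log(parseGeminiStreamJson(sample));
```

Two details are worth noticing: the user message is not added to `response`, and input that contains only non-JSON lines yields `null` rather than an empty object.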
package/dist/index.d.ts CHANGED
@@ -155,6 +155,8 @@ export declare function prepareClaudeRequest(params: {
      maxTurns?: number;
      effort?: ClaudeEffortLevel;
      excludeDynamicSystemPromptSections?: boolean;
+     fallbackModel?: string;
+     jsonSchema?: string | Record<string, unknown>;
  }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
  export interface CodexRequestPrep extends CliRequestPrep {
      /**
@@ -212,11 +214,13 @@ export declare function prepareGeminiRequest(params: {
      optimizePrompt: boolean;
      operation: string;
      /**
-      * U23: output format. When set to "json", emits `-o json` so Gemini emits
-      * the JSON object containing usageMetadata that `parseGeminiJson` (and
-      * downstream `extractUsageAndCost`) can consume. Defaults to "text".
+      * U23 + Phase 4 slice ε: output format. `json` emits `-o json` (single
+      * JSON object with usageMetadata). `stream-json` emits `-o stream-json`
+      * (NDJSON event stream — `init` / `message` / `result` lines). Both
+      * route through `extractUsageAndCost` so usage tokens reach the flight
+      * recorder. Defaults to "text".
       */
-     outputFormat?: "text" | "json";
+     outputFormat?: "text" | "json" | "stream-json";
      sandbox?: boolean;
      policyFiles?: string[];
      adminPolicyFiles?: string[];
@@ -313,8 +317,11 @@ export interface GeminiRequestParams {
      optimizeResponse?: boolean;
      idleTimeoutMs?: number;
      forceRefresh?: boolean;
-     /** U23: "json" emits `-o json` so token usage is parsed and reported. */
-     outputFormat?: "text" | "json";
+     /**
+      * U23 + Phase 4 slice ε: "json" emits `-o json`; "stream-json" emits
+      * `-o stream-json` (NDJSON event stream). Both are usage-extracted.
+      */
+     outputFormat?: "text" | "json" | "stream-json";
      sandbox?: boolean;
      policyFiles?: string[];
      adminPolicyFiles?: string[];
package/dist/index.js CHANGED
@@ -9,7 +9,7 @@ import { z } from "zod";
  import { executeCli, killAllProcessGroups } from "./executor.js";
  import { parseStreamJson } from "./stream-json-parser.js";
  import { parseCodexJsonStream } from "./codex-json-parser.js";
- import { parseGeminiJson } from "./gemini-json-parser.js";
+ import { parseGeminiJson, parseGeminiStreamJson } from "./gemini-json-parser.js";
  import { parseVibeMetaJson } from "./mistral-meta-json-parser.js";
  import { homedir } from "os";
  import { createSessionManager } from "./session-manager.js";
@@ -530,8 +530,8 @@ ctx) {
          costUsd: parsed.usage.cost_usd,
      };
  }
- if (cli === "gemini" && outputFormat === "json") {
-     const parsed = parseGeminiJson(output);
+ if (cli === "gemini" && (outputFormat === "json" || outputFormat === "stream-json")) {
+     const parsed = outputFormat === "stream-json" ? parseGeminiStreamJson(output) : parseGeminiJson(output);
      if (!parsed || !parsed.usage) {
          return {};
      }
@@ -1005,6 +1005,8 @@ export function prepareClaudeRequest(params, runtime = resolveGatewayServerRunti
      maxTurns: params.maxTurns,
      effort: params.effort,
      excludeDynamicSystemPromptSections: params.excludeDynamicSystemPromptSections,
+     fallbackModel: params.fallbackModel,
+     jsonSchema: params.jsonSchema,
  }));
  return {
      corrId,
@@ -1271,9 +1273,19 @@ export function prepareGeminiRequest(params, runtime = resolveGatewayServerRunti
  // U23 fix: emit `-o json` when the caller asked for JSON output. The Gemini
  // JSON parser is otherwise unreachable from the tool surface and the
  // structured usageMetadata is silently dropped.
+ //
+ // Phase 4 slice ε: same wiring for `-o stream-json` (NDJSON event stream).
+ // Gemini already streams stdout in real-time so the existing 10-minute
+ // idle timeout (CLI_IDLE_TIMEOUTS.gemini) covers both modes without
+ // adjustment — unlike Claude, no `--include-partial-messages` companion
+ // flag is required because Gemini emits assistant `delta` events as part
+ // of the default stream-json shape.
  if (params.outputFormat === "json") {
      args.push("-o", "json");
  }
+ else if (params.outputFormat === "stream-json") {
+     args.push("-o", "stream-json");
+ }
  // Phase 4 slice γ: opt-in trust-prompt bypass for fresh workspaces.
  if (params.skipTrust) {
      args.push("--skip-trust");
@@ -2471,6 +2483,16 @@ export function createGatewayServer(deps = {}) {
      .boolean()
      .optional()
      .describe("Claude --exclude-dynamic-system-prompt-sections: trim dynamic context blocks from the system prompt."),
+ // Phase 4 slice η — Claude reliability + structured-output parity
+ fallbackModel: z
+     .string()
+     .min(1)
+     .optional()
+     .describe("Claude --fallback-model: model name to auto-fallback to when the default model is overloaded (effective only with --print, which the gateway always uses)."),
+ jsonSchema: z
+     .union([z.string(), z.record(z.unknown())])
+     .optional()
+     .describe("Claude --json-schema: JSON Schema literal (NOT a path) constraining structured output. Object values are JSON.stringify-d; string values are passed verbatim. Use with outputFormat='json'."),
  approvalStrategy: z
      .enum(["legacy", "mcp_managed"])
      .default("legacy")
@@ -2501,7 +2523,7 @@ export function createGatewayServer(deps = {}) {
          .boolean()
          .default(false)
          .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
- }, async ({ prompt, promptParts, model, outputFormat, sessionId, continueSession, createNewSession, allowedTools, disallowedTools, dangerouslySkipPermissions, permissionMode, agent, agents, forkSession, systemPrompt, appendSystemPrompt, maxBudgetUsd, maxTurns, effort, excludeDynamicSystemPromptSections, approvalStrategy, approvalPolicy, mcpServers, strictMcpConfig, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, }) => {
+ }, async ({ prompt, promptParts, model, outputFormat, sessionId, continueSession, createNewSession, allowedTools, disallowedTools, dangerouslySkipPermissions, permissionMode, agent, agents, forkSession, systemPrompt, appendSystemPrompt, maxBudgetUsd, maxTurns, effort, excludeDynamicSystemPromptSections, fallbackModel, jsonSchema, approvalStrategy, approvalPolicy, mcpServers, strictMcpConfig, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, }) => {
      const startTime = Date.now();
      if (systemPrompt !== undefined && appendSystemPrompt !== undefined) {
          return createErrorResponse("claude", 1, "", correlationId, new Error("systemPrompt and appendSystemPrompt are mutually exclusive; use one or the other (not both)."));
@@ -2531,6 +2553,8 @@ export function createGatewayServer(deps = {}) {
      maxTurns,
      effort,
      excludeDynamicSystemPromptSections,
+     fallbackModel,
+     jsonSchema,
  }, runtime);
  if (!("args" in prep))
      return prep;
@@ -3069,11 +3093,14 @@ export function createGatewayServer(deps = {}) {
      .default(false)
      .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
  // U23: emit `-o json` to extract token usage via parseGeminiJson. Default
- // remains text so existing callers see no behavior change.
+ // remains text so existing callers see no behavior change. Phase 4 slice
+ // ε adds `stream-json` (NDJSON event stream parsed by
+ // parseGeminiStreamJson — `init`/`message`/`result` lines, idle-timeout
+ // semantics covered by Gemini's existing real-time stdout streaming).
  outputFormat: z
-     .enum(["text", "json"])
+     .enum(["text", "json", "stream-json"])
      .default("text")
-     .describe("Gemini output format. `json` emits `-o json` so usageMetadata is parsed and reported."),
+     .describe("Gemini output format. `json` emits `-o json` (single JSON with usageMetadata). `stream-json` emits `-o stream-json` (NDJSON event stream — `init`/`message`/`result` lines, usage extracted from the terminal `result.stats` event). Both report usage to the flight recorder."),
  sandbox: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.sandbox.describe("Run Gemini in sandbox mode (-s)"),
  policyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.policyFiles.describe("Policy file paths (--policy <path>, one per file). Paths must exist."),
  adminPolicyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.adminPolicyFiles.describe("Admin policy file paths (--admin-policy <path>, one per file). Paths must exist."),
@@ -3395,6 +3422,16 @@ export function createGatewayServer(deps = {}) {
      .boolean()
      .optional()
      .describe("Claude --exclude-dynamic-system-prompt-sections: trim dynamic context blocks from the system prompt."),
+ // Phase 4 slice η — Claude reliability + structured-output parity
+ fallbackModel: z
+     .string()
+     .min(1)
+     .optional()
+     .describe("Claude --fallback-model: model name to auto-fallback to when the default model is overloaded (effective only with --print, which the gateway always uses)."),
+ jsonSchema: z
+     .union([z.string(), z.record(z.unknown())])
+     .optional()
+     .describe("Claude --json-schema: JSON Schema literal (NOT a path) constraining structured output. Object values are JSON.stringify-d; string values are passed verbatim. Use with outputFormat='json'."),
  approvalStrategy: z
      .enum(["legacy", "mcp_managed"])
      .default("legacy")
@@ -3424,7 +3461,7 @@ export function createGatewayServer(deps = {}) {
          .boolean()
          .default(false)
          .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
- }, async ({ prompt, promptParts, model, outputFormat, sessionId, continueSession, createNewSession, allowedTools, disallowedTools, dangerouslySkipPermissions, permissionMode, agent, agents, forkSession, systemPrompt, appendSystemPrompt, maxBudgetUsd, maxTurns, effort, excludeDynamicSystemPromptSections, approvalStrategy, approvalPolicy, mcpServers, strictMcpConfig, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, }) => {
+ }, async ({ prompt, promptParts, model, outputFormat, sessionId, continueSession, createNewSession, allowedTools, disallowedTools, dangerouslySkipPermissions, permissionMode, agent, agents, forkSession, systemPrompt, appendSystemPrompt, maxBudgetUsd, maxTurns, effort, excludeDynamicSystemPromptSections, fallbackModel, jsonSchema, approvalStrategy, approvalPolicy, mcpServers, strictMcpConfig, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, }) => {
      if (systemPrompt !== undefined && appendSystemPrompt !== undefined) {
          return createErrorResponse("claude", 1, "", correlationId, new Error("systemPrompt and appendSystemPrompt are mutually exclusive; use one or the other (not both)."));
      }
@@ -3453,6 +3490,8 @@ export function createGatewayServer(deps = {}) {
      maxTurns,
      effort,
      excludeDynamicSystemPromptSections,
+     fallbackModel,
+     jsonSchema,
  }, runtime);
  if (!("args" in prep))
      return prep;
@@ -3691,11 +3730,14 @@ export function createGatewayServer(deps = {}) {
      .default(false)
      .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
  // U23: emit `-o json` to extract token usage via parseGeminiJson. Default
- // remains text so existing callers see no behavior change.
+ // remains text so existing callers see no behavior change. Phase 4 slice
+ // ε adds `stream-json` (NDJSON event stream parsed by
+ // parseGeminiStreamJson — `init`/`message`/`result` lines, idle-timeout
+ // semantics covered by Gemini's existing real-time stdout streaming).
  outputFormat: z
-     .enum(["text", "json"])
+     .enum(["text", "json", "stream-json"])
      .default("text")
-     .describe("Gemini output format. `json` emits `-o json` so usageMetadata is parsed and reported."),
+     .describe("Gemini output format. `json` emits `-o json` (single JSON with usageMetadata). `stream-json` emits `-o stream-json` (NDJSON event stream — `init`/`message`/`result` lines, usage extracted from the terminal `result.stats` event). Both report usage to the flight recorder."),
  sandbox: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.sandbox.describe("Run Gemini in sandbox mode (-s)"),
  policyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.policyFiles.describe("Policy file paths (--policy <path>, one per file). Paths must exist."),
  adminPolicyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.adminPolicyFiles.describe("Admin policy file paths (--admin-policy <path>, one per file). Paths must exist."),
@@ -350,6 +350,20 @@ export interface ClaudeHighImpactFlagsInput {
      maxTurns?: number;
      effort?: ClaudeEffortLevel;
      excludeDynamicSystemPromptSections?: boolean;
+     /**
+      * Phase 4 slice η — Claude `--fallback-model <model>`. Routes overloaded-model
+      * requests to the named fallback. Only effective with `--print` (we always pass
+      * `-p`, so no extra gating required here).
+      */
+     fallbackModel?: string;
+     /**
+      * Phase 4 slice η — Claude `--json-schema <schema>`. Per `claude --help`, the
+      * argument is the JSON Schema *literal*, not a path. Object values are
+      * `JSON.stringify`-d; string values are passed verbatim (caller already wrote
+      * a JSON literal). No temp file lifecycle needed (contrast with Codex
+      * `--output-schema`, which takes a path).
+      */
+     jsonSchema?: string | Record<string, unknown>;
  }
  /**
   * Emit Claude high-impact feature flags (U25) as a flat argv segment.
@@ -438,6 +438,13 @@ export function prepareClaudeHighImpactFlags(input) {
      if (input.excludeDynamicSystemPromptSections) {
          args.push("--exclude-dynamic-system-prompt-sections");
      }
+     if (input.fallbackModel !== undefined) {
+         args.push("--fallback-model", input.fallbackModel);
+     }
+     if (input.jsonSchema !== undefined) {
+         const schemaArg = typeof input.jsonSchema === "string" ? input.jsonSchema : JSON.stringify(input.jsonSchema);
+         args.push("--json-schema", schemaArg);
+     }
      return args;
  }
  //──────────────────────────────────────────────────────────────────────────────
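The two branches added to `prepareClaudeHighImpactFlags` above can be exercised in isolation. The sketch below copies only those branches into a standalone helper; the name `emitStructuredOutputFlags` and the `StructuredOutputFlags` type are hypothetical, introduced so the example runs without the rest of the gateway.

```typescript
// Standalone copy of the two new branches. The helper and type names are
// hypothetical; only the branch logic comes from the hunk above.
type JsonSchemaInput = string | Record<string, unknown>;

interface StructuredOutputFlags {
  fallbackModel?: string;
  jsonSchema?: JsonSchemaInput;
}

function emitStructuredOutputFlags(input: StructuredOutputFlags): string[] {
  const args: string[] = [];
  if (input.fallbackModel !== undefined) {
    args.push("--fallback-model", input.fallbackModel);
  }
  if (input.jsonSchema !== undefined) {
    // Objects are serialized; strings are forwarded untouched.
    const schemaArg =
      typeof input.jsonSchema === "string" ? input.jsonSchema : JSON.stringify(input.jsonSchema);
    args.push("--json-schema", schemaArg);
  }
  return args;
}

const schema = {
  type: "object",
  properties: { name: { type: "string" } },
  required: ["name"],
};

// The same schema supplied as an object and as a pre-serialized string.
const fromObject = emitStructuredOutputFlags({ jsonSchema: schema });
const fromString = emitStructuredOutputFlags({ jsonSchema: JSON.stringify(schema) });
const withFallback = emitStructuredOutputFlags({ fallbackModel: "claude-haiku-4-5-20251001" });

console.log(fromObject);
console.log(fromString);
console.log(withFallback);
```

Because string values are forwarded verbatim, a malformed JSON string reaches the Claude CLI unchanged; any rejection would come from the CLI, not from this step.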
@@ -37,6 +37,8 @@ export const UPSTREAM_CLI_CONTRACTS = {
  "maxTurns",
  "effort",
  "excludeDynamicSystemPromptSections",
+ "fallbackModel",
+ "jsonSchema",
  "approvalStrategy",
  "mcpServers",
  "strictMcpConfig",
@@ -78,6 +80,14 @@ export const UPSTREAM_CLI_CONTRACTS = {
          arity: "none",
          description: "Trim dynamic system prompt sections",
      },
+     "--fallback-model": {
+         arity: "one",
+         description: "Auto-fallback model when default is overloaded (Claude --print only)",
+     },
+     "--json-schema": {
+         arity: "one",
+         description: "JSON Schema literal constraining structured output",
+     },
      "--continue": { arity: "none", description: "Continue active session" },
      "--session-id": { arity: "one", description: "Session id" },
  },
@@ -95,6 +105,29 @@ export const UPSTREAM_CLI_CONTRACTS = {
              args: ["-p", "hello", "--not-a-claude-flag"],
              expect: "fail",
          },
+         {
+             // Phase 4 slice η: --fallback-model wired through prepareClaudeRequest.
+             id: "claude-fallback-model",
+             description: "Phase 4 slice η: --fallback-model accepted",
+             args: ["-p", "hello", "--fallback-model", "claude-haiku-4-5-20251001"],
+             expect: "pass",
+         },
+         {
+             // Phase 4 slice η: --json-schema accepts an inline JSON Schema literal
+             // (per `claude --help` example), not a path. Codex parity for
+             // structured-output validation in one slice.
+             id: "claude-json-schema",
+             description: "Phase 4 slice η: --json-schema accepts inline JSON literal",
+             args: [
+                 "-p",
+                 "hello",
+                 "--output-format",
+                 "json",
+                 "--json-schema",
+                 '{"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}',
+             ],
+             expect: "pass",
+         },
      ],
  },
  codex: {
@@ -248,7 +281,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
  "-s": { arity: "none", description: "Sandbox mode" },
  "--policy": { arity: "one", description: "Policy file path" },
  "--admin-policy": { arity: "one", description: "Admin policy file path" },
- "-o": { arity: "one", values: ["json"], description: "Output format" },
+ "-o": {
+     arity: "one",
+     values: ["json", "stream-json"],
+     description: "Output format (Phase 4 slice ε adds stream-json)",
+ },
  "--resume": { arity: "one", description: "Resume session" },
  "--skip-trust": {
      arity: "none",
@@ -275,6 +312,18 @@ export const UPSTREAM_CLI_CONTRACTS = {
              args: ["-p", "hello", "--skip-trust"],
              expect: "pass",
          },
+         {
+             id: "gemini-stream-json",
+             description: "Phase 4 slice ε: -o stream-json is accepted",
+             args: ["-p", "hello", "-o", "stream-json"],
+             expect: "pass",
+         },
+         {
+             id: "gemini-output-format-invalid",
+             description: "Phase 4 slice ε: -o ndjson is rejected (not in contract enum)",
+             args: ["-p", "hello", "-o", "ndjson"],
+             expect: "fail",
+         },
      ],
  },
  grok: {
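The contract table and its pass/fail fixtures suggest a simple check: every argv token must be a registered flag, flags with `arity: "one"` consume the next token, and an optional `values` list bounds that token. The sketch below is one way such a check could look. It is not the package's `validateUpstreamCliArgs`, whose implementation is not shown in this diff; `checkArgs`, `FlagSpec`, and the `-p` entry (assumed to take one value, as the fixtures' `["-p", "hello"]` prefix implies) are hypothetical.

```typescript
// Hypothetical contract check modelled on the table shape shown above.
// Not the package's validateUpstreamCliArgs; names here are illustrative.
interface FlagSpec {
  arity: "none" | "one";
  values?: string[];
}

// A cut-down Gemini table. The "-p" entry is an assumption; the other
// entries follow the hunk above.
const geminiFlags = new Map<string, FlagSpec>([
  ["-p", { arity: "one" }],
  ["-o", { arity: "one", values: ["json", "stream-json"] }],
  ["--skip-trust", { arity: "none" }],
]);

function checkArgs(flags: Map<string, FlagSpec>, args: string[]): boolean {
  for (let i = 0; i < args.length; i++) {
    const spec = flags.get(args[i]);
    if (spec === undefined) return false; // unknown flag

    if (spec.arity === "one") {
      i += 1; // consume the flag's value
      const value = args[i];
      if (value === undefined) return false; // value missing
      if (spec.values !== undefined && spec.values.indexOf(value) === -1) {
        return false; // value outside the registered enum
      }
    }
  }
  return true;
}

// The two fixtures added in this release, run through the sketch.
const streamJsonOk = checkArgs(geminiFlags, ["-p", "hello", "-o", "stream-json"]);
const ndjsonOk = checkArgs(geminiFlags, ["-p", "hello", "-o", "ndjson"]);

console.log({ streamJsonOk, ndjsonOk });
```

Under these assumptions the `gemini-stream-json` fixture passes and `gemini-output-format-invalid` fails, which is the enum bound the changelog says the fixtures are meant to pin.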
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "llm-cli-gateway",
-   "version": "1.9.0",
+   "version": "1.11.0",
    "mcpName": "io.github.verivus-oss/llm-cli-gateway",
    "description": "MCP server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral Vibe CLIs with session management, retry logic, async job orchestration, durable job results, and cross-LLM validation.",
    "license": "MIT",