llm-cli-gateway 1.8.0 → 1.10.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,154 @@
2
2
 
3
3
  All notable changes to the llm-cli-gateway project.
4
4
 
5
+ ## [1.10.0] - 2026-05-27 — Phase 4 slice ε (Gemini `-o stream-json` enum widening)
6
+
7
+ Ships the fifth Phase 4 slice: Gemini's NDJSON event-stream output format
8
+ (`-o stream-json`) is now reachable from `gemini_request` and
9
+ `gemini_request_async`. Four commits land together: the feature wiring, a
10
+ contract-table widening, a test-veracity regression suite, and a follow-up
11
+ test fix driven by the multi-LLM round-1 audit.
12
+
13
+ ### Added — `outputFormat: "stream-json"` for Gemini
14
+
15
+ - `gemini_request` and `gemini_request_async` `outputFormat` enums widened
16
+ from `text | json` to `text | json | stream-json`.
17
+ - `prepareGeminiRequest` emits `-o stream-json` when the new value is set.
18
+ No `--include-partial-messages` analogue is required: Gemini already
19
+ streams stdout in real time across all output modes (covered by
20
+ `CLI_IDLE_TIMEOUTS.gemini = 600_000`).
21
+ - New `parseGeminiStreamJson` parser consumes the NDJSON event stream
22
+ (`init` / `message` / `result` lines), concatenates assistant `delta`
23
+ messages into the response, and extracts
24
+ `input_tokens` / `output_tokens` / `cached` → `cache_read_tokens` from
25
+ the terminal `result.stats` event.
26
+ - `extractUsageAndCost("gemini", _, "stream-json")` routes to the new
27
+ parser so usage tokens reach the flight recorder on the stream-json
28
+ path, matching the existing `-o json` behaviour.
29
+ - `UPSTREAM_CLI_CONTRACTS.gemini.flags["-o"].values` widened to
30
+ `["json", "stream-json"]`; two new conformance fixtures
31
+ (`gemini-stream-json` passing, `gemini-output-format-invalid` failing
32
+ for `-o ndjson`) pin the enum bound.
33
+
34
+ ### Test-veracity audit
35
+
36
+ Per the standing protocol established with v1.9.0
37
+ (`feedback_test_veracity_audit_protocol`), this slice's tests were
38
+ audited by Codex + Gemini + Grok + Mistral in async parallel with
39
+ mandatory mutation-probe execution. Round 1 found one real gap
40
+ (`Eε-4` only checked fixture presence/shape — P-Eε-1 left it green);
41
+ closed in commit `4a78f9c` by running the fixture's args through
42
+ `validateUpstreamCliArgs` inside the same `it()` block. Round 2
43
+ delivered unanimous UNCONDITIONAL APPROVE across all four reviewers,
44
+ with site-by-site probe evidence for the contested `Eα` registered-schema
45
+ helper. Spec at `docs/plans/test-veracity-audit-slice-epsilon.spec.md`.
46
+
47
+ Test count: 771 → 795 → 796 (24 + 1 new across two files).
48
+
49
+ ### Known caveats
50
+
51
+ - The `npm run check` script still does not include `format:check` (a
52
+ gap first flagged in the v1.8.0 release notes). Run both locally
53
+ before pushing; CI runs format:check separately.
54
+
55
+ ## [1.9.0] - 2026-05-27 — Phase 4 slice δ (budget/max-turns parity) + retroactive α/γ contract closure
56
+
57
+ Ships the fourth Phase 4 slice (budget/max-turns parity for Grok and Mistral),
58
+ and retroactively closes three latent contract gaps that shipped silently in
59
+ v1.8.0 (slices α and γ). Five commits land together: the slice δ feature,
60
+ two bounds-tightening fixes, a contract-table closure, and a test-veracity
61
+ hardening pass driven by an iterative multi-LLM audit.
62
+
63
+ ### Added — `maxTurns` / `maxPrice` budget caps (slice δ)
64
+
65
+ - `grok_request` and `grok_request_async` gain optional `maxTurns?: number`
66
+ → emits `grok --max-turns N`. Grok exposes no per-request budget flag,
67
+ so `--max-price` is Mistral-only.
68
+ - `mistral_request` and `mistral_request_async` gain optional
69
+ `maxTurns?: number` → `vibe --max-turns N` AND `maxPrice?: number` →
70
+ `vibe --max-price DOLLARS`. Both apply only in programmatic mode (`-p`),
71
+ matching Vibe's documented constraint.
72
+ - The Mistral stale-model recovery retry path (extracted into a pure
73
+ `buildMistralRetryPrep` helper) preserves all three slice-γ/δ flags
74
+ (`trust`, `maxTurns`, `maxPrice`) on the second attempt.
75
+ - Defaults: undefined for all three new fields → no flag emitted →
76
+ existing callers see no behavioural change.
77
+
78
+ ### Fixed — Bounded numeric schemas for lossless argv stringification
79
+
80
+ - Extracted two shared, exported Zod constants:
81
+ - `MAX_TURNS_SCHEMA = z.number().int().positive().safe().max(10_000)`
82
+ - `MAX_PRICE_SCHEMA = z.number().positive().finite().min(1e-6).max(10_000)`
83
+ - The lower `.min(1e-6)` cap on price is exactly the boundary where
84
+ `String(N)` switches from decimal to scientific notation
85
+ (`String(1e-6) === "0.000001"` but `String(1e-7) === "1e-7"`); both
86
+ upstream CLIs reject scientific-notation values.
87
+ - Reused across all four slice-δ tool registrations so bounds stay
88
+ consistent if they ever need to change.
89
+
90
+ ### Fixed — Upstream contract table closes 5 latent flag gaps
91
+
92
+ `assertUpstreamCliArgs` consults `UPSTREAM_CLI_CONTRACTS` on every real
93
+ `*_request` call. The following flags / mcpParameters were never registered
94
+ there before this release, so production calls setting any of them threw
95
+ "Upstream contract violation" at runtime even though the prepare-function
96
+ unit tests passed:
97
+
98
+ - **Gemini** (slice γ retroactive): `skipTrust` + `--skip-trust`.
99
+ - **Mistral** (slice γ + δ retroactive): `trust` + `--trust`; `maxTurns` +
100
+ `--max-turns`; `maxPrice` + `--max-price` (with a strict decimal-only
101
+ regex matching `MAX_PRICE_SCHEMA`'s lower bound).
102
+ - **Grok** (slice δ): `maxTurns` + `--max-turns`.
103
+ - **Codex** (slice α retroactive): `--output-schema` and `-c` removed
104
+ from `resumeForbiddenFlags` — verified accepted on `codex exec resume`
105
+ per codex-cli 0.133.0.
106
+
107
+ Conformance fixtures pin each new flag's argv shape, including a
108
+ `mistral-max-price-scientific-notation` fixture that locks the `1e-7`
109
+ rejection at the contract layer.
110
+
111
+ ### Hardened — Test veracity (multi-LLM audit follow-up)
112
+
113
+ Codex + Grok ran iterative test-veracity audits with mutation probes per
114
+ `docs/plans/test-veracity-audit.spec.md`. They proved several added tests
115
+ were not falsifiable on the dimensions their commit messages claimed.
116
+ New file `src/__tests__/test-veracity-regressions.test.ts` closes those
117
+ gaps with six describe blocks:
118
+
119
+ - **REGRESSIONS A** — probes registered tool `inputSchema` bounds
120
+ directly (not the bare schema constants), so schema-drift in any of
121
+ the four sync/async registrations is caught.
122
+ - **REGRESSIONS B** — tests the pure `buildMistralRetryPrep` helper
123
+ across all combinations of `trust × maxTurns × maxPrice`. Self-
124
+ validated: dropping any of the three forwards on retry goes red.
125
+ - **REGRESSIONS C** — positive allowlist asserting slice α/γ/δ
126
+ parameters live in the matching contract's `mcpParameters` (closes
127
+ the self-oracle gap where removing a param from BOTH the contract
128
+ AND the schema previously stayed green).
129
+ - **REGRESSIONS D** — threads `prepare*Request` output into
130
+ `validateUpstreamCliArgs` end-to-end; the exact consistency check
131
+ the latent v1.8.0 contract breaks would have failed.
132
+ - **REGRESSIONS E** — `it.each` over sync AND async variants of every
133
+ slice-touched tool; the existing C4 was sync-only.
134
+ - **REGRESSIONS F** — flag-fixture coverage map: every flag in each
135
+ contract `flags` table must be exercised by a passing fixture (with
136
+ a grandfathered pre-audit baseline). Forces future slice authors to
137
+ add a fixture alongside any new flag entry.
138
+
139
+ The existing C4 (`MCP request schemas expose the provider contract
140
+ parameters`) now walks `_async` tools too.
141
+
142
+ ### Notes
143
+
144
+ Multi-LLM review across multiple iterative rounds, ending with a
145
+ dedicated test-veracity audit per Werner's strict-evidence protocol
146
+ (documented in `docs/plans/test-veracity-audit.spec.md`). Round 2 of the
147
+ audit landed UNCONDITIONAL APPROVE from Codex, Grok, Claude, and Mistral
148
+ with full mutation-probe evidence — every documented counterexample
149
+ mutation went red as predicted; tests are falsifiable by exactly the
150
+ regressions they claim to guard against. Gemini was quota-exhausted
151
+ during the audit window (~6h reset) and did not participate in round 2.
152
+
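The contract-table behaviour described in the 1.9.0 and 1.10.0 entries can be sketched as follows. This is a minimal illustration only: the real layout of `UPSTREAM_CLI_CONTRACTS` and the real signature of `validateUpstreamCliArgs` are not shown in this diff, so `FlagRule`, `Contract`, `findViolation`, and the decimal-only regex are all hypothetical names and shapes.

```typescript
// Hypothetical, simplified contract shape (an assumption, not the package's real table).
interface FlagRule {
  values?: readonly string[]; // allowed enum values, if the flag takes one
  pattern?: RegExp;           // allowed value pattern, if the flag takes one
}
type Contract = Record<string, FlagRule | undefined>;

// Flags named in the changelog; the rule bodies are illustrative.
const geminiFlags: Contract = {
  "-o": { values: ["json", "stream-json"] },
};
const mistralFlags: Contract = {
  "--trust": {},
  "--max-turns": { pattern: /^\d+$/ },
  // Assumed decimal-only pattern: digits with an optional fraction, no exponent.
  "--max-price": { pattern: /^\d+(\.\d+)?$/ },
};

// Returns a description of the first violation, or null when the argv conforms.
function findViolation(contract: Contract, args: readonly string[]): string | null {
  for (let i = 0; i < args.length; i++) {
    const token = args[i];
    if (!token.startsWith("-")) continue; // positional argument, not checked here
    const rule = contract[token];
    if (rule === undefined) return `unregistered flag ${token}`;
    if (rule.values === undefined && rule.pattern === undefined) continue; // boolean flag
    const value = args[++i]; // the next token is this flag's value
    if (value === undefined) return `missing value for ${token}`;
    if (rule.values !== undefined && !rule.values.includes(value)) {
      return `value ${value} not allowed for ${token}`;
    }
    if (rule.pattern !== undefined && !rule.pattern.test(value)) {
      return `value ${value} does not match the pattern for ${token}`;
    }
  }
  return null;
}

const okStream = findViolation(geminiFlags, ["-o", "stream-json"]);
const badFormat = findViolation(geminiFlags, ["-o", "ndjson"]);
const unregistered = findViolation(geminiFlags, ["--skip-trust"]);
const sciPrice = findViolation(mistralFlags, ["--max-price", "1e-7"]);
```

Under these assumptions `okStream` is `null`, while the other three calls each return a violation message. The `unregistered` case mirrors the latent gap the changelog describes: a flag emitted by a prepare function but missing from the table fails at the contract layer.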
5
153
  ## [1.8.0] - 2026-05-27 — Phase 4 openers (codex resume fix, mistral telemetry, headless trust flags)
6
154
 
7
155
  Ships the first three slices of the Phase 4 provider-modernisation
package/dist/gemini-json-parser.d.ts CHANGED
@@ -1,13 +1,22 @@
1
1
  /**
2
- * Parser for Gemini CLI `-o json` output.
2
+ * Parsers for Gemini CLI `-o json` (single object) and `-o stream-json`
3
+ * (NDJSON event stream) output.
3
4
  *
4
- * Gemini emits a single JSON object with:
5
+ * `-o json` emits a single JSON object with:
5
6
  * - `response`: string final model output
6
7
  * - `usageMetadata`: { promptTokenCount, candidatesTokenCount,
7
8
  * cachedContentTokenCount?, totalTokenCount }
8
9
  *
9
- * Returns null when stdout is not parseable as JSON. Returns an object with
10
- * only `response` when usageMetadata is missing.
10
+ * `-o stream-json` emits one JSON object per line:
11
+ * - `{ "type": "init", "session_id": "...", "model": "..." }`
12
+ * - `{ "type": "message", "role": "user", "content": "..." }`
13
+ * - `{ "type": "message", "role": "assistant", "content": "...", "delta": true }` (repeated)
14
+ * - `{ "type": "result", "status": "success", "stats": { "input_tokens": N,
15
+ * "output_tokens": N, "cached": N, ... } }`
16
+ *
17
+ * Both parsers return null when stdout is unparseable. Both populate the same
18
+ * `GeminiJsonParseResult` shape so `extractUsageAndCost` can branch on
19
+ * outputFormat without further dispatch.
11
20
  */
12
21
  export interface GeminiUsage {
13
22
  input_tokens: number;
@@ -19,3 +28,9 @@ export interface GeminiJsonParseResult {
19
28
  response?: string;
20
29
  }
21
30
  export declare function parseGeminiJson(stdout: string): GeminiJsonParseResult | null;
31
+ /**
32
+ * Parse Gemini `-o stream-json` NDJSON output. Concatenates assistant `delta`
33
+ * message content into `response`, extracts the terminal `result.stats` payload
34
+ * into `usage`. Returns null when stdout contains no parseable JSON line.
35
+ */
36
+ export declare function parseGeminiStreamJson(stdout: string): GeminiJsonParseResult | null;
package/dist/gemini-json-parser.js CHANGED
@@ -1,13 +1,22 @@
1
1
  /**
2
- * Parser for Gemini CLI `-o json` output.
2
+ * Parsers for Gemini CLI `-o json` (single object) and `-o stream-json`
3
+ * (NDJSON event stream) output.
3
4
  *
4
- * Gemini emits a single JSON object with:
5
+ * `-o json` emits a single JSON object with:
5
6
  * - `response`: string final model output
6
7
  * - `usageMetadata`: { promptTokenCount, candidatesTokenCount,
7
8
  * cachedContentTokenCount?, totalTokenCount }
8
9
  *
9
- * Returns null when stdout is not parseable as JSON. Returns an object with
10
- * only `response` when usageMetadata is missing.
10
+ * `-o stream-json` emits one JSON object per line:
11
+ * - `{ "type": "init", "session_id": "...", "model": "..." }`
12
+ * - `{ "type": "message", "role": "user", "content": "..." }`
13
+ * - `{ "type": "message", "role": "assistant", "content": "...", "delta": true }` (repeated)
14
+ * - `{ "type": "result", "status": "success", "stats": { "input_tokens": N,
15
+ * "output_tokens": N, "cached": N, ... } }`
16
+ *
17
+ * Both parsers return null when stdout is unparseable. Both populate the same
18
+ * `GeminiJsonParseResult` shape so `extractUsageAndCost` can branch on
19
+ * outputFormat without further dispatch.
11
20
  */
12
21
  export function parseGeminiJson(stdout) {
13
22
  const trimmed = stdout.trim();
@@ -45,3 +54,63 @@ export function parseGeminiJson(stdout) {
45
54
  }
46
55
  return result;
47
56
  }
57
+ /**
58
+ * Parse Gemini `-o stream-json` NDJSON output. Concatenates assistant `delta`
59
+ * message content into `response`, extracts the terminal `result.stats` payload
60
+ * into `usage`. Returns null when stdout contains no parseable JSON line.
61
+ */
62
+ export function parseGeminiStreamJson(stdout) {
63
+ if (!stdout) {
64
+ return null;
65
+ }
66
+ const lines = stdout.split(/\r?\n/);
67
+ const result = {};
68
+ const assistantChunks = [];
69
+ let sawAnyLine = false;
70
+ for (const line of lines) {
71
+ const trimmed = line.trim();
72
+ if (!trimmed)
73
+ continue;
74
+ // Gemini stream-json lines are individual JSON objects; non-JSON
75
+ // chatter (warnings, "Ripgrep not available", etc.) is silently
76
+ // ignored so a stray banner line doesn't poison usage extraction.
77
+ let event;
78
+ try {
79
+ event = JSON.parse(trimmed);
80
+ }
81
+ catch {
82
+ continue;
83
+ }
84
+ if (!event || typeof event !== "object")
85
+ continue;
86
+ sawAnyLine = true;
87
+ if (event.type === "message" &&
88
+ event.role === "assistant" &&
89
+ typeof event.content === "string") {
90
+ assistantChunks.push(event.content);
91
+ continue;
92
+ }
93
+ if (event.type === "result" && event.stats && typeof event.stats === "object") {
94
+ const stats = event.stats;
95
+ const input = typeof stats.input_tokens === "number" ? stats.input_tokens : undefined;
96
+ const output = typeof stats.output_tokens === "number" ? stats.output_tokens : undefined;
97
+ if (input !== undefined || output !== undefined) {
98
+ const usage = {
99
+ input_tokens: input ?? 0,
100
+ output_tokens: output ?? 0,
101
+ };
102
+ if (typeof stats.cached === "number") {
103
+ usage.cache_read_tokens = stats.cached;
104
+ }
105
+ result.usage = usage;
106
+ }
107
+ }
108
+ }
109
+ if (!sawAnyLine) {
110
+ return null;
111
+ }
112
+ if (assistantChunks.length > 0) {
113
+ result.response = assistantChunks.join("");
114
+ }
115
+ return result;
116
+ }
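To make the parser's behaviour concrete, here is a typed sketch of the same logic run against a small sample stream. The sample events follow the shapes documented in the file comment; the function name `parseStreamSketch` and the sample values are illustrative assumptions, and the sketch is not the package's exported function.

```typescript
interface GeminiUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens?: number;
}

interface GeminiJsonParseResult {
  usage?: GeminiUsage;
  response?: string;
}

// Typed re-statement of the NDJSON parsing logic from the hunk above.
function parseStreamSketch(stdout: string): GeminiJsonParseResult | null {
  if (!stdout) return null;
  const result: GeminiJsonParseResult = {};
  const chunks: string[] = [];
  let sawEvent = false;
  for (const line of stdout.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let event: unknown;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // non-JSON chatter (banners, warnings) is skipped
    }
    if (!event || typeof event !== "object") continue;
    sawEvent = true;
    const e = event as Record<string, unknown>;
    if (e.type === "message" && e.role === "assistant" && typeof e.content === "string") {
      chunks.push(e.content);
      continue;
    }
    if (e.type === "result" && e.stats && typeof e.stats === "object") {
      const stats = e.stats as Record<string, unknown>;
      const input = typeof stats.input_tokens === "number" ? stats.input_tokens : undefined;
      const output = typeof stats.output_tokens === "number" ? stats.output_tokens : undefined;
      if (input !== undefined || output !== undefined) {
        const usage: GeminiUsage = { input_tokens: input ?? 0, output_tokens: output ?? 0 };
        if (typeof stats.cached === "number") usage.cache_read_tokens = stats.cached;
        result.usage = usage;
      }
    }
  }
  if (!sawEvent) return null;
  if (chunks.length > 0) result.response = chunks.join("");
  return result;
}

// Sample stream: one banner line, then init / message / result events.
const sample = [
  "Ripgrep not available",
  '{"type":"init","session_id":"s1","model":"m"}',
  '{"type":"message","role":"user","content":"hi"}',
  '{"type":"message","role":"assistant","content":"Hel","delta":true}',
  '{"type":"message","role":"assistant","content":"lo","delta":true}',
  '{"type":"result","status":"success","stats":{"input_tokens":12,"output_tokens":3,"cached":4}}',
].join("\n");

const parsed = parseStreamSketch(sample);
// parsed.response is "Hello"; parsed.usage carries 12 input, 3 output, 4 cache-read tokens.
```

The user message is ignored because only `role: "assistant"` content is collected, and a stream consisting solely of non-JSON lines yields `null` rather than an empty result.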
package/dist/index.d.ts CHANGED
@@ -54,6 +54,19 @@ declare const logger: {
54
54
  debug: (message: string, ...args: any[]) => void;
55
55
  };
56
56
  type GatewayLogger = typeof logger;
57
+ /**
58
+ * Phase 4 slice δ — shared Zod fragments for `maxTurns` / `maxPrice`.
59
+ *
60
+ * Both flags reach the upstream CLIs as decimal-formatted argv strings via
61
+ * `String(N)`. `z.number().int().positive()` alone lets values past
62
+ * `Number.MAX_SAFE_INTEGER` through, after which `String(1e21)` emits
63
+ * scientific notation that Grok and Vibe both reject. The bounds below
64
+ * (safe-integer cap + 10000 ceiling for turns; finite + 10000 USD ceiling
65
+ * for price) guarantee a lossless decimal stringification AND a sane
66
+ * upper bound — no plausible single agent loop exceeds 10k turns or 10k USD.
67
+ */
68
+ export declare const MAX_TURNS_SCHEMA: z.ZodNumber;
69
+ export declare const MAX_PRICE_SCHEMA: z.ZodNumber;
57
70
  export declare const SESSION_PROVIDER_VALUES: readonly ["claude", "codex", "gemini", "grok", "mistral"];
58
71
  export declare const SESSION_PROVIDER_ENUM: z.ZodEnum<["claude", "codex", "gemini", "grok", "mistral"]>;
59
72
  export type SessionProvider = (typeof SESSION_PROVIDER_VALUES)[number];
@@ -199,11 +212,13 @@ export declare function prepareGeminiRequest(params: {
199
212
  optimizePrompt: boolean;
200
213
  operation: string;
201
214
  /**
202
- * U23: output format. When set to "json", emits `-o json` so Gemini emits
203
- * the JSON object containing usageMetadata that `parseGeminiJson` (and
204
- * downstream `extractUsageAndCost`) can consume. Defaults to "text".
215
+ * U23 + Phase 4 slice ε: output format. `json` emits `-o json` (single
216
+ * JSON object with usageMetadata). `stream-json` emits `-o stream-json`
217
+ * (NDJSON event stream — `init` / `message` / `result` lines). Both
218
+ * route through `extractUsageAndCost` so usage tokens reach the flight
219
+ * recorder. Defaults to "text".
205
220
  */
206
- outputFormat?: "text" | "json";
221
+ outputFormat?: "text" | "json" | "stream-json";
207
222
  sandbox?: boolean;
208
223
  policyFiles?: string[];
209
224
  adminPolicyFiles?: string[];
@@ -215,6 +230,29 @@ export declare function prepareGeminiRequest(params: {
215
230
  */
216
231
  skipTrust?: boolean;
217
232
  }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
233
+ export declare function prepareGrokRequest(params: {
234
+ prompt?: string;
235
+ promptParts?: PromptParts;
236
+ model?: string;
237
+ outputFormat?: string;
238
+ alwaysApprove?: boolean;
239
+ permissionMode?: string;
240
+ effort?: string;
241
+ reasoningEffort?: string;
242
+ allowedTools?: string[];
243
+ disallowedTools?: string[];
244
+ approvalStrategy: "legacy" | "mcp_managed";
245
+ approvalPolicy?: string;
246
+ mcpServers?: ClaudeMcpServerName[];
247
+ correlationId?: string;
248
+ optimizePrompt: boolean;
249
+ operation: string;
250
+ /**
251
+ * Phase 4 slice δ: emit `--max-turns N` so callers can cap agent-loop
252
+ * iterations for cost / latency control. Mirrors Claude's wiring.
253
+ */
254
+ maxTurns?: number;
255
+ }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
218
256
  export declare function prepareMistralRequest(params: {
219
257
  prompt?: string;
220
258
  promptParts?: PromptParts;
@@ -236,9 +274,29 @@ export declare function prepareMistralRequest(params: {
236
274
  * prompt for this invocation only (not persisted). Default undefined.
237
275
  */
238
276
  trust?: boolean;
277
+ /** Phase 4 slice δ: Vibe `--max-turns N` cap on agent-loop iterations. */
278
+ maxTurns?: number;
279
+ /** Phase 4 slice δ: Vibe `--max-price DOLLARS` cumulative-cost cap. */
280
+ maxPrice?: number;
239
281
  }, runtime?: GatewayServerRuntime): (CliRequestPrep & {
240
282
  mistralEnv: Record<string, string>;
241
283
  }) | ExtendedToolResponse;
284
+ /**
285
+ * Phase 4 slice δ post-review: pure helper extracted from
286
+ * `handleMistralRequest` so the retry-path arg-preservation invariants
287
+ * (trust + maxTurns + maxPrice from slices γ/δ) are unit-testable
288
+ * without mocking awaitJobOrDefer. Any param the wrapper threads into
289
+ * the FIRST `buildMistralCliInvocation` call MUST also be threaded
290
+ * through here, or a fresh-workspace / budgeted run can degrade on
291
+ * the second attempt.
292
+ */
293
+ export declare function buildMistralRetryPrep(params: Pick<MistralRequestParams, "outputFormat" | "permissionMode" | "effort" | "reasoningEffort" | "allowedTools" | "disallowedTools" | "approvalStrategy" | "trust" | "maxTurns" | "maxPrice"> & {
294
+ effectivePrompt: string;
295
+ }, recoveryModel: string): {
296
+ args: string[];
297
+ env: Record<string, string>;
298
+ ignoredDisallowedTools: boolean;
299
+ };
242
300
  export interface GeminiRequestParams {
243
301
  prompt?: string;
244
302
  promptParts?: PromptParts;
@@ -257,8 +315,11 @@ export interface GeminiRequestParams {
257
315
  optimizeResponse?: boolean;
258
316
  idleTimeoutMs?: number;
259
317
  forceRefresh?: boolean;
260
- /** U23: "json" emits `-o json` so token usage is parsed and reported. */
261
- outputFormat?: "text" | "json";
318
+ /**
319
+ * U23 + Phase 4 slice ε: "json" emits `-o json`; "stream-json" emits
320
+ * `-o stream-json` (NDJSON event stream). Both are usage-extracted.
321
+ */
322
+ outputFormat?: "text" | "json" | "stream-json";
262
323
  sandbox?: boolean;
263
324
  policyFiles?: string[];
264
325
  adminPolicyFiles?: string[];
@@ -303,6 +364,8 @@ export interface GrokRequestParams {
303
364
  optimizeResponse?: boolean;
304
365
  idleTimeoutMs?: number;
305
366
  forceRefresh?: boolean;
367
+ /** Phase 4 slice δ: cap agent-loop iterations via `--max-turns N`. */
368
+ maxTurns?: number;
306
369
  }
307
370
  export declare function handleGrokRequest(deps: HandlerDeps, params: GrokRequestParams): Promise<ExtendedToolResponse>;
308
371
  export declare function handleGrokRequestAsync(deps: AsyncHandlerDeps, params: Omit<GrokRequestParams, "optimizeResponse">): Promise<ExtendedToolResponse>;
@@ -329,6 +392,10 @@ export interface MistralRequestParams {
329
392
  forceRefresh?: boolean;
330
393
  /** Phase 4 slice γ: emit `--trust` for fresh-workspace headless runs. */
331
394
  trust?: boolean;
395
+ /** Phase 4 slice δ: Vibe `--max-turns N` cap on agent-loop iterations. */
396
+ maxTurns?: number;
397
+ /** Phase 4 slice δ: Vibe `--max-price DOLLARS` cumulative-cost cap. */
398
+ maxPrice?: number;
332
399
  }
333
400
  export declare function handleMistralRequest(deps: HandlerDeps, params: MistralRequestParams): Promise<ExtendedToolResponse>;
334
401
  export declare function handleMistralRequestAsync(deps: AsyncHandlerDeps, params: Omit<MistralRequestParams, "optimizeResponse">): Promise<ExtendedToolResponse>;
package/dist/index.js CHANGED
@@ -9,7 +9,7 @@ import { z } from "zod";
9
9
  import { executeCli, killAllProcessGroups } from "./executor.js";
10
10
  import { parseStreamJson } from "./stream-json-parser.js";
11
11
  import { parseCodexJsonStream } from "./codex-json-parser.js";
12
- import { parseGeminiJson } from "./gemini-json-parser.js";
12
+ import { parseGeminiJson, parseGeminiStreamJson } from "./gemini-json-parser.js";
13
13
  import { parseVibeMetaJson } from "./mistral-meta-json-parser.js";
14
14
  import { homedir } from "os";
15
15
  import { createSessionManager } from "./session-manager.js";
@@ -229,6 +229,23 @@ function getApprovalManager(runtimeLogger = logger) {
229
229
  return approvalManager;
230
230
  }
231
231
  const MCP_SERVER_ENUM = z.enum(CLAUDE_MCP_SERVER_NAMES);
232
+ /**
233
+ * Phase 4 slice δ — shared Zod fragments for `maxTurns` / `maxPrice`.
234
+ *
235
+ * Both flags reach the upstream CLIs as decimal-formatted argv strings via
236
+ * `String(N)`. `z.number().int().positive()` alone lets values past
237
+ * `Number.MAX_SAFE_INTEGER` through, after which `String(1e21)` emits
238
+ * scientific notation that Grok and Vibe both reject. The bounds below
239
+ * (safe-integer cap + 10000 ceiling for turns; finite + 10000 USD ceiling
240
+ * for price) guarantee a lossless decimal stringification AND a sane
241
+ * upper bound — no plausible single agent loop exceeds 10k turns or 10k USD.
242
+ */
243
+ export const MAX_TURNS_SCHEMA = z.number().int().positive().safe().max(10_000);
244
+ // `.min(1e-6)` keeps the value in JS's decimal-stringify range:
245
+ // String(1e-6) === "0.000001" but String(1e-7) === "1e-7", which both
246
+ // upstream CLIs would reject. 1µUSD per request is fine-grained enough
247
+ // for any plausible budget-cap use.
248
+ export const MAX_PRICE_SCHEMA = z.number().positive().finite().min(1e-6).max(10_000);
232
249
  // U22: Session-provider enum extended to five providers. The storage layer's
233
250
  // CLI_TYPES already includes "mistral"; the MCP-tool layer mirrors that here so
234
251
  // session_create / session_list / session_clear_all accept the fifth provider.
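The reasoning behind the two bounded schemas can be checked without Zod. The sketch below is a plain-TypeScript approximation of the constraints named in the comment above (safe positive integer up to 10000 for turns; finite value between 1e-6 and 10000 for price). The helper names `isValidMaxTurns`, `isValidMaxPrice`, and `toDecimalArg` are hypothetical and do not exist in the package.

```typescript
// Approximates z.number().int().positive().safe().max(10_000).
function isValidMaxTurns(n: number): boolean {
  return Number.isSafeInteger(n) && n > 0 && n <= 10_000;
}

// Approximates z.number().positive().finite().min(1e-6).max(10_000).
function isValidMaxPrice(n: number): boolean {
  return Number.isFinite(n) && n >= 1e-6 && n <= 10_000;
}

// Hypothetical guard: stringify for argv and refuse exponent notation.
function toDecimalArg(n: number): string {
  const s = String(n);
  if (/e/i.test(s)) {
    throw new RangeError(`not a plain decimal: ${s}`);
  }
  return s;
}

const smallestPrice = toDecimalArg(1e-6); // "0.000001"
const belowFloor = String(1e-7);          // "1e-7"
const hugeTurns = String(1e21);           // "1e+21"
```

Every value that passes either validator stringifies as a plain decimal, so under these assumptions `toDecimalArg` should never throw for validated input; it only trips on values such as `1e-7` or `1e21` that the bounds already exclude.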
@@ -513,8 +530,8 @@ ctx) {
513
530
  costUsd: parsed.usage.cost_usd,
514
531
  };
515
532
  }
516
- if (cli === "gemini" && outputFormat === "json") {
517
- const parsed = parseGeminiJson(output);
533
+ if (cli === "gemini" && (outputFormat === "json" || outputFormat === "stream-json")) {
534
+ const parsed = outputFormat === "stream-json" ? parseGeminiStreamJson(output) : parseGeminiJson(output);
518
535
  if (!parsed || !parsed.usage) {
519
536
  return {};
520
537
  }
@@ -1254,9 +1271,19 @@ export function prepareGeminiRequest(params, runtime = resolveGatewayServerRunti
1254
1271
  // U23 fix: emit `-o json` when the caller asked for JSON output. The Gemini
1255
1272
  // JSON parser is otherwise unreachable from the tool surface and the
1256
1273
  // structured usageMetadata is silently dropped.
1274
+ //
1275
+ // Phase 4 slice ε: same wiring for `-o stream-json` (NDJSON event stream).
1276
+ // Gemini already streams stdout in real-time so the existing 10-minute
1277
+ // idle timeout (CLI_IDLE_TIMEOUTS.gemini) covers both modes without
1278
+ // adjustment — unlike Claude, no `--include-partial-messages` companion
1279
+ // flag is required because Gemini emits assistant `delta` events as part
1280
+ // of the default stream-json shape.
1257
1281
  if (params.outputFormat === "json") {
1258
1282
  args.push("-o", "json");
1259
1283
  }
1284
+ else if (params.outputFormat === "stream-json") {
1285
+ args.push("-o", "stream-json");
1286
+ }
1260
1287
  // Phase 4 slice γ: opt-in trust-prompt bypass for fresh workspaces.
1261
1288
  if (params.skipTrust) {
1262
1289
  args.push("--skip-trust");
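The output-format wiring in this hunk reduces to a small mapping from the `outputFormat` value to argv tokens. A minimal sketch follows, with `geminiOutputArgs` as a hypothetical helper name; the package pushes onto a shared `args` array inline rather than calling a function like this.

```typescript
type GeminiOutputFormat = "text" | "json" | "stream-json";

// Maps the requested format to the `-o` tokens; "text" is the default and emits nothing.
function geminiOutputArgs(format: GeminiOutputFormat = "text"): string[] {
  switch (format) {
    case "json":
      return ["-o", "json"];
    case "stream-json":
      return ["-o", "stream-json"];
    default:
      return [];
  }
}

const textArgs = geminiOutputArgs();                // []
const jsonArgs = geminiOutputArgs("json");          // ["-o", "json"]
const streamArgs = geminiOutputArgs("stream-json"); // ["-o", "stream-json"]
```

No companion flag is added for `stream-json`, matching the comment above that Gemini emits assistant `delta` events as part of the default stream-json shape.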
@@ -1273,7 +1300,7 @@ export function prepareGeminiRequest(params, runtime = resolveGatewayServerRunti
1273
1300
  stablePrefixTokens,
1274
1301
  };
1275
1302
  }
1276
- function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
1303
+ export function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
1277
1304
  const corrId = params.correlationId || randomUUID();
1278
1305
  const cliInfo = getCliInfo();
1279
1306
  const resolvedModel = resolveModelAlias("grok", params.model, cliInfo);
@@ -1349,6 +1376,9 @@ function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
1349
1376
  if (params.disallowedTools && params.disallowedTools.length > 0) {
1350
1377
  args.push("--disallowed-tools", params.disallowedTools.join(","));
1351
1378
  }
1379
+ if (params.maxTurns !== undefined) {
1380
+ args.push("--max-turns", String(params.maxTurns));
1381
+ }
1352
1382
  return {
1353
1383
  corrId,
1354
1384
  effectivePrompt,
@@ -1433,6 +1463,8 @@ export function prepareMistralRequest(params, runtime = resolveGatewayServerRunt
1433
1463
  allowedTools: params.allowedTools,
1434
1464
  disallowedTools: params.disallowedTools,
1435
1465
  trust: params.trust,
1466
+ maxTurns: params.maxTurns,
1467
+ maxPrice: params.maxPrice,
1436
1468
  });
1437
1469
  if (prep.ignoredDisallowedTools) {
1438
1470
  runtime.logger.info(`[${corrId}] Mistral does not support disallowedTools; ignoring (caller passed ${params.disallowedTools?.length ?? 0} entries)`);
@@ -1463,6 +1495,32 @@ function selectMistralRecoveryModel(failedModel) {
1463
1495
  ].filter((model) => Boolean(model && model !== failedModel));
1464
1496
  return candidates.find(model => model !== "local");
1465
1497
  }
1498
+ /**
1499
+ * Phase 4 slice δ post-review: pure helper extracted from
1500
+ * `handleMistralRequest` so the retry-path arg-preservation invariants
1501
+ * (trust + maxTurns + maxPrice from slices γ/δ) are unit-testable
1502
+ * without mocking awaitJobOrDefer. Any param the wrapper threads into
1503
+ * the FIRST `buildMistralCliInvocation` call MUST also be threaded
1504
+ * through here, or a fresh-workspace / budgeted run can degrade on
1505
+ * the second attempt.
1506
+ */
1507
+ export function buildMistralRetryPrep(params, recoveryModel) {
1508
+ return buildMistralCliInvocation({
1509
+ prompt: params.effectivePrompt,
1510
+ resolvedModel: recoveryModel,
1511
+ outputFormat: params.outputFormat,
1512
+ permissionMode: params.approvalStrategy === "mcp_managed"
1513
+ ? "auto-approve"
1514
+ : (params.permissionMode ?? "auto-approve"),
1515
+ effort: params.effort,
1516
+ reasoningEffort: params.reasoningEffort,
1517
+ allowedTools: params.allowedTools,
1518
+ disallowedTools: params.disallowedTools,
1519
+ trust: params.trust,
1520
+ maxTurns: params.maxTurns,
1521
+ maxPrice: params.maxPrice,
1522
+ });
1523
+ }
1466
1524
  function buildCliResponse(cli, stdout, optimizeResponse, corrId, sessionId, prep, durationMs, resumable, outputFormat, warnings) {
1467
1525
  let finalStdout = stdout;
1468
1526
  // Skip response optimization for JSON output to prevent corrupting structured data
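The invariant that `buildMistralRetryPrep` protects (the retry must carry the same `trust`, `maxTurns`, and `maxPrice` flags as the first attempt) can be exercised over every combination of the three fields. The sketch below uses hypothetical stand-ins: `buildInvocationSketch` only models the three flags named in the changelog and is not the real `buildMistralCliInvocation`, whose full argv is not shown in this diff.

```typescript
interface BudgetFlags {
  trust?: boolean;
  maxTurns?: number;
  maxPrice?: number;
}

// Hypothetical stand-in: emits only the three slice γ/δ flags.
function buildInvocationSketch(model: string, flags: BudgetFlags): { model: string; args: string[] } {
  const args: string[] = [];
  if (flags.trust) args.push("--trust");
  if (flags.maxTurns !== undefined) args.push("--max-turns", String(flags.maxTurns));
  if (flags.maxPrice !== undefined) args.push("--max-price", String(flags.maxPrice));
  return { model, args };
}

// Retry path: same flags, different model. Dropping any forward here is the bug to catch.
function buildRetrySketch(params: BudgetFlags, recoveryModel: string): { model: string; args: string[] } {
  return buildInvocationSketch(recoveryModel, {
    trust: params.trust,
    maxTurns: params.maxTurns,
    maxPrice: params.maxPrice,
  });
}

// Enumerate trust × maxTurns × maxPrice (2 × 2 × 2 = 8 combinations).
const combos: BudgetFlags[] = [];
for (const trust of [undefined, true]) {
  for (const maxTurns of [undefined, 5]) {
    for (const maxPrice of [undefined, 0.5]) {
      combos.push({ trust, maxTurns, maxPrice });
    }
  }
}

// A combination is a mismatch when the retry argv differs from the first-attempt argv.
const mismatches = combos.filter((c) => {
  const first = buildInvocationSketch("model-a", c).args.join(" ");
  const retry = buildRetrySketch(c, "model-b").args.join(" ");
  return first !== retry;
});
```

With all three fields forwarded, `mismatches` is empty. Removing any one forward from `buildRetrySketch` makes four of the eight combinations mismatch, which is the kind of failure the changelog's REGRESSIONS B block is described as catching.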
@@ -1801,6 +1859,7 @@ export async function handleGrokRequest(deps, params) {
1801
1859
  correlationId: params.correlationId,
1802
1860
  optimizePrompt: params.optimizePrompt,
1803
1861
  operation: "grok_request",
1862
+ maxTurns: params.maxTurns,
1804
1863
  }, runtime);
1805
1864
  if (!("args" in prep))
1806
1865
  return prep;
@@ -1921,6 +1980,7 @@ export async function handleGrokRequestAsync(deps, params) {
1921
1980
  correlationId: params.correlationId,
1922
1981
  optimizePrompt: params.optimizePrompt,
1923
1982
  operation: "grok_request_async",
1983
+ maxTurns: params.maxTurns,
1924
1984
  }, runtime);
1925
1985
  if (!("args" in prep))
1926
1986
  return prep;
@@ -2003,6 +2063,8 @@ export async function handleMistralRequest(deps, params) {
2003
2063
  optimizePrompt: params.optimizePrompt,
2004
2064
  operation: "mistral_request",
2005
2065
  trust: params.trust,
2066
+ maxTurns: params.maxTurns,
2067
+ maxPrice: params.maxPrice,
2006
2068
  }, runtime);
2007
2069
  if (!("args" in prep))
2008
2070
  return prep;
@@ -2035,22 +2097,7 @@ export async function handleMistralRequest(deps, params) {
2035
2097
  const recoveryModel = selectMistralRecoveryModel(prep.resolvedModel);
2036
2098
  if (recoveryModel) {
2037
2099
  deps.logger.info(`[${corrId}] mistral_request detected stale Vibe model selection; retrying once with ${recoveryModel}`);
2038
- const retryPrep = buildMistralCliInvocation({
2039
- prompt: prep.effectivePrompt,
2040
- resolvedModel: recoveryModel,
2041
- outputFormat: params.outputFormat,
2042
- permissionMode: params.approvalStrategy === "mcp_managed"
2043
- ? "auto-approve"
2044
- : (params.permissionMode ?? "auto-approve"),
2045
- effort: params.effort,
2046
- reasoningEffort: params.reasoningEffort,
2047
- allowedTools: params.allowedTools,
2048
- disallowedTools: params.disallowedTools,
2049
- // Phase 4 slice γ: preserve --trust on the model-selection retry
2050
- // so a fresh untrusted workspace doesn't block headlessly on the
2051
- // second attempt after surviving the first.
2052
- trust: params.trust,
2053
- });
2100
+ const retryPrep = buildMistralRetryPrep({ ...params, effectivePrompt: prep.effectivePrompt }, recoveryModel);
2054
2101
  const retryArgs = [...retryPrep.args, ...sessionResult.resumeArgs];
2055
2102
  // Reuse the FR handoff built above — the retry preserves corrId,
2056
2103
  // so the manager's logComplete still updates the original row.
@@ -2151,6 +2198,8 @@ export async function handleMistralRequestAsync(deps, params) {
2151
2198
  optimizePrompt: params.optimizePrompt,
2152
2199
  operation: "mistral_request_async",
2153
2200
  trust: params.trust,
2201
+ maxTurns: params.maxTurns,
2202
+ maxPrice: params.maxPrice,
2154
2203
  }, runtime);
2155
2204
  if (!("args" in prep))
2156
2205
  return prep;
@@ -3030,11 +3079,14 @@ export function createGatewayServer(deps = {}) {
  .default(false)
  .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
  // U23: emit `-o json` to extract token usage via parseGeminiJson. Default
- // remains text so existing callers see no behavior change.
+ // remains text so existing callers see no behavior change. Phase 4 slice
+ // ε adds `stream-json` (NDJSON event stream parsed by
+ // parseGeminiStreamJson — `init`/`message`/`result` lines, idle-timeout
+ // semantics covered by Gemini's existing real-time stdout streaming).
  outputFormat: z
- .enum(["text", "json"])
+ .enum(["text", "json", "stream-json"])
  .default("text")
- .describe("Gemini output format. `json` emits `-o json` so usageMetadata is parsed and reported."),
+ .describe("Gemini output format. `json` emits `-o json` (single JSON with usageMetadata). `stream-json` emits `-o stream-json` (NDJSON event stream — `init`/`message`/`result` lines, usage extracted from the terminal `result.stats` event). Both report usage to the flight recorder."),
  sandbox: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.sandbox.describe("Run Gemini in sandbox mode (-s)"),
  policyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.policyFiles.describe("Policy file paths (--policy <path>, one per file). Paths must exist."),
  adminPolicyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.adminPolicyFiles.describe("Admin policy file paths (--admin-policy <path>, one per file). Paths must exist."),
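The `stream-json` description in the hunk above mentions an NDJSON stream of `init` / `message` / `result` lines with usage read from the terminal `result.stats` event. The following is a minimal sketch of that idea, not the package's `parseGeminiStreamJson`. The event names and the `input_tokens` / `output_tokens` / `cached` stat names come from the changelog; the `role`, `content`, and `delta` fields, and every function and type name below, are assumptions made for illustration.

```typescript
// Sketch only: field shapes other than the event names and stats keys are ASSUMED.
interface StreamUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
}

interface StreamParseResult {
  response: string;
  usage?: StreamUsage;
}

// Parse one NDJSON line; returns undefined for blank, non-JSON, or non-object lines.
function parseEventLine(line: string): Record<string, unknown> | undefined {
  const trimmed = line.trim();
  if (trimmed === "") return undefined;
  try {
    const value: unknown = JSON.parse(trimmed);
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      return value as Record<string, unknown>;
    }
  } catch {
    // Not JSON (for example a stray log line): ignore it.
  }
  return undefined;
}

// Coerce a possibly missing stat into a finite number, defaulting to 0.
function toCount(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) ? value : 0;
}

// Hypothetical parser: concatenates assistant delta messages and reads usage
// from the last `result` event that carries a `stats` object.
function parseStreamJsonSketch(stdout: string): StreamParseResult {
  const result: StreamParseResult = { response: "" };
  for (const line of stdout.split("\n")) {
    const event = parseEventLine(line);
    if (event === undefined) continue;
    if (event.type === "message" && event.role === "assistant" && event.delta === true) {
      if (typeof event.content === "string") result.response += event.content;
    } else if (event.type === "result" && typeof event.stats === "object" && event.stats !== null) {
      const stats = event.stats as Record<string, unknown>;
      result.usage = {
        input_tokens: toCount(stats.input_tokens),
        output_tokens: toCount(stats.output_tokens),
        cache_read_tokens: toCount(stats.cached), // `cached` maps to cache_read_tokens
      };
    }
  }
  return result;
}
```

Skipping unparseable lines rather than throwing is a design choice of this sketch, so that one malformed line does not discard the usage data in the final event; the real parser may behave differently.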
@@ -3142,7 +3194,8 @@ export function createGatewayServer(deps = {}) {
  .boolean()
  .default(false)
  .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
- }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, }) => {
+ maxTurns: MAX_TURNS_SCHEMA.optional().describe("Grok `--max-turns N`: cap on agent-loop iterations for cost / latency control (Phase 4 slice δ). Bounded to safe integers 10000."),
+ }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, }) => {
  return handleGrokRequest({ sessionManager, logger, runtime }, {
  prompt,
  promptParts,
@@ -3165,6 +3218,7 @@ export function createGatewayServer(deps = {}) {
  optimizeResponse,
  idleTimeoutMs,
  forceRefresh,
+ maxTurns,
  });
  });
  //──────────────────────────────────────────────────────────────────────────────
@@ -3242,7 +3296,9 @@ export function createGatewayServer(deps = {}) {
  .boolean()
  .default(false)
  .describe("Emit `--trust` so Vibe trusts the cwd for this invocation only (not persisted to trusted_folders.toml) and skips the interactive trust prompt (Phase 4 slice γ)."),
- }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, trust, }) => {
+ maxTurns: MAX_TURNS_SCHEMA.optional().describe("Vibe `--max-turns N`: cap the agent-loop iteration count (programmatic mode only, Phase 4 slice δ). Bounded to safe integers 10000."),
+ maxPrice: MAX_PRICE_SCHEMA.optional().describe("Vibe `--max-price DOLLARS`: interrupt the session when cumulative cost crosses this cap (programmatic mode only, Phase 4 slice δ). Bounded to finite values ≤ 10000 USD."),
+ }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, trust, maxTurns, maxPrice, }) => {
  return handleMistralRequest({ sessionManager, logger, runtime }, {
  prompt,
  promptParts,
@@ -3265,6 +3321,8 @@ export function createGatewayServer(deps = {}) {
  idleTimeoutMs,
  forceRefresh,
  trust,
+ maxTurns,
+ maxPrice,
  });
  });
  //──────────────────────────────────────────────────────────────────────────────
@@ -3646,11 +3704,14 @@ export function createGatewayServer(deps = {}) {
  .default(false)
  .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
  // U23: emit `-o json` to extract token usage via parseGeminiJson. Default
- // remains text so existing callers see no behavior change.
+ // remains text so existing callers see no behavior change. Phase 4 slice
+ // ε adds `stream-json` (NDJSON event stream parsed by
+ // parseGeminiStreamJson — `init`/`message`/`result` lines, idle-timeout
+ // semantics covered by Gemini's existing real-time stdout streaming).
  outputFormat: z
- .enum(["text", "json"])
+ .enum(["text", "json", "stream-json"])
  .default("text")
- .describe("Gemini output format. `json` emits `-o json` so usageMetadata is parsed and reported."),
+ .describe("Gemini output format. `json` emits `-o json` (single JSON with usageMetadata). `stream-json` emits `-o stream-json` (NDJSON event stream — `init`/`message`/`result` lines, usage extracted from the terminal `result.stats` event). Both report usage to the flight recorder."),
  sandbox: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.sandbox.describe("Run Gemini in sandbox mode (-s)"),
  policyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.policyFiles.describe("Policy file paths (--policy <path>, one per file). Paths must exist."),
  adminPolicyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.adminPolicyFiles.describe("Admin policy file paths (--admin-policy <path>, one per file). Paths must exist."),
@@ -3753,7 +3814,8 @@ export function createGatewayServer(deps = {}) {
  .boolean()
  .default(false)
  .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
- }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, }) => {
+ maxTurns: MAX_TURNS_SCHEMA.optional().describe("Grok `--max-turns N`: cap on agent-loop iterations for cost / latency control (Phase 4 slice δ). Bounded to safe integers 10000."),
+ }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, }) => {
  return handleGrokRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
  prompt,
  promptParts,
@@ -3775,6 +3837,7 @@ export function createGatewayServer(deps = {}) {
  optimizePrompt,
  idleTimeoutMs,
  forceRefresh,
+ maxTurns,
  });
  });
  server.tool("mistral_request_async", {
@@ -3848,7 +3911,9 @@ export function createGatewayServer(deps = {}) {
  .boolean()
  .default(false)
  .describe("Emit `--trust` so Vibe trusts the cwd for this invocation only (not persisted to trusted_folders.toml) and skips the interactive trust prompt (Phase 4 slice γ)."),
- }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, trust, }) => {
+ maxTurns: MAX_TURNS_SCHEMA.optional().describe("Vibe `--max-turns N`: cap the agent-loop iteration count (programmatic mode only, Phase 4 slice δ). Bounded to safe integers 10000."),
+ maxPrice: MAX_PRICE_SCHEMA.optional().describe("Vibe `--max-price DOLLARS`: interrupt the session when cumulative cost crosses this cap (programmatic mode only, Phase 4 slice δ). Bounded to finite values ≤ 10000 USD."),
+ }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, trust, maxTurns, maxPrice, }) => {
  return handleMistralRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
  prompt,
  promptParts,
@@ -3870,6 +3935,8 @@ export function createGatewayServer(deps = {}) {
  idleTimeoutMs,
  forceRefresh,
  trust,
+ maxTurns,
+ maxPrice,
  });
  });
  server.tool("llm_job_status", {
@@ -114,6 +114,17 @@ export interface PrepareMistralRequestInput {
  * Vibe's prompt behaviour is preserved for existing callers.
  */
  trust?: boolean;
+ /**
+ * Phase 4 slice δ: emit `--max-turns N` to cap the agent-loop iteration
+ * count (only applies in programmatic mode with `-p`).
+ */
+ maxTurns?: number;
+ /**
+ * Phase 4 slice δ: emit `--max-price DOLLARS` so the session is
+ * interrupted when cumulative cost crosses the cap (programmatic mode
+ * only).
+ */
+ maxPrice?: number;
  }
  export interface PrepareMistralRequestResult {
  args: string[];
@@ -179,6 +179,12 @@ export function prepareMistralRequest(input) {
  if (input.trust) {
  args.push("--trust");
  }
+ if (input.maxTurns !== undefined) {
+ args.push("--max-turns", String(input.maxTurns));
+ }
+ if (input.maxPrice !== undefined) {
+ args.push("--max-price", String(input.maxPrice));
+ }
  const ignoredDisallowedTools = Boolean(input.disallowedTools && input.disallowedTools.length > 0);
  return { args, env, ignoredDisallowedTools };
  }
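The hunk above appends `--max-turns` and `--max-price` only when the corresponding input is defined. A standalone sketch of that argument-building step follows, assuming nothing about the rest of `prepareMistralRequest`; the `BudgetInput` type and `budgetArgs` function are hypothetical names introduced here.

```typescript
// Hypothetical, isolated version of the two new branches in the hunk above.
interface BudgetInput {
  maxTurns?: number;
  maxPrice?: number;
}

// Returns the extra CLI arguments for the optional budget caps.
function budgetArgs(input: BudgetInput): string[] {
  const args: string[] = [];
  // `!== undefined` (not a truthiness check) means a value of 0 is still
  // emitted instead of being silently dropped.
  if (input.maxTurns !== undefined) {
    args.push("--max-turns", String(input.maxTurns));
  }
  if (input.maxPrice !== undefined) {
    args.push("--max-price", String(input.maxPrice));
  }
  return args;
}

console.log(budgetArgs({ maxTurns: 3, maxPrice: 0.01 }));
```

Because the check is against `undefined`, an out-of-range value such as `0` reaches the argument list here; in the package, rejecting such values is presumably left to the schema and the contract `pattern` shown elsewhere in this diff.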
@@ -133,14 +133,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
  "ignoreRules",
  ],
  resumeOnlyFlags: ["--last"],
- resumeForbiddenFlags: [
- "--sandbox",
- "--ask-for-approval",
- "--full-auto",
- "--output-schema",
- "--search",
- "-c",
- ],
+ // Phase 4 slice α (v1.8.0) verified that `codex exec resume` accepts
+ // `--output-schema` and `-c` (codex-cli 0.133.0 `exec resume --help`),
+ // so they're no longer forbidden. `--search` stays forbidden (resume
+ // inherits the original session's web-search state).
+ resumeForbiddenFlags: ["--sandbox", "--ask-for-approval", "--full-auto", "--search"],
  flags: {
  "--last": { arity: "none", description: "Resume latest session" },
  "--model": { arity: "one", description: "Model selector" },
@@ -189,9 +186,24 @@ export const UPSTREAM_CLI_CONTRACTS = {
  expect: "fail",
  },
  {
+ // Phase 4 slice α: --output-schema IS accepted on resume per
+ // codex-cli 0.133.0; this fixture pins the new behaviour so future
+ // contract changes can't silently regress.
  id: "codex-resume-output-schema",
- description: "Resume-incompatible output schema flag is rejected",
+ description: "Phase 4 slice α: --output-schema accepted on resume (codex-cli 0.133.0)",
  args: ["exec", "resume", "--output-schema", "/tmp/schema.json", "session-id", "hello"],
+ expect: "pass",
+ },
+ {
+ id: "codex-resume-config-override",
+ description: "Phase 4 slice α: -c key=value accepted on resume",
+ args: ["exec", "resume", "-c", "model.foo=bar", "session-id", "hello"],
+ expect: "pass",
+ },
+ {
+ id: "codex-resume-search-still-forbidden",
+ description: "Phase 4 slice α: --search remains forbidden on resume",
+ args: ["exec", "resume", "--search", "session-id", "hello"],
  expect: "fail",
  },
  ],
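The three Codex resume fixtures above follow from the narrowed `resumeForbiddenFlags` list. As a rough illustration of how such a list could be applied, the sketch below scans an argument vector for forbidden flags. The flag list is copied from the diff; the `findForbiddenResumeFlags` helper and its scanning strategy are assumptions, not the gateway's actual validator.

```typescript
// Flag list as it reads after this change (copied from the diff).
const RESUME_FORBIDDEN_FLAGS: readonly string[] = [
  "--sandbox",
  "--ask-for-approval",
  "--full-auto",
  "--search",
];

// Hypothetical helper: returns every forbidden flag present in `args`.
// It compares whole tokens only, so a positional prompt that happens to equal
// a flag name would also be reported; a real validator may be stricter.
function findForbiddenResumeFlags(args: readonly string[]): string[] {
  return args.filter((token) => RESUME_FORBIDDEN_FLAGS.includes(token));
}

// Fixture argument vectors taken from the hunk above.
const schemaOnResume = ["exec", "resume", "--output-schema", "/tmp/schema.json", "session-id", "hello"];
const configOnResume = ["exec", "resume", "-c", "model.foo=bar", "session-id", "hello"];
const searchOnResume = ["exec", "resume", "--search", "session-id", "hello"];

console.log(findForbiddenResumeFlags(schemaOnResume)); // no forbidden flags
console.log(findForbiddenResumeFlags(configOnResume)); // no forbidden flags
console.log(findForbiddenResumeFlags(searchOnResume)); // reports "--search"
```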
@@ -219,6 +231,8 @@ export const UPSTREAM_CLI_CONTRACTS = {
  "policyFiles",
  "adminPolicyFiles",
  "attachments",
+ // Phase 4 slice γ
+ "skipTrust",
  ],
  flags: {
  "-p": { arity: "one", description: "Prompt text" },
@@ -234,8 +248,16 @@ export const UPSTREAM_CLI_CONTRACTS = {
  "-s": { arity: "none", description: "Sandbox mode" },
  "--policy": { arity: "one", description: "Policy file path" },
  "--admin-policy": { arity: "one", description: "Admin policy file path" },
- "-o": { arity: "one", values: ["json"], description: "Output format" },
+ "-o": {
+ arity: "one",
+ values: ["json", "stream-json"],
+ description: "Output format (Phase 4 slice ε adds stream-json)",
+ },
  "--resume": { arity: "one", description: "Resume session" },
+ "--skip-trust": {
+ arity: "none",
+ description: "Trust workspace for this session (Phase 4 slice γ)",
+ },
  },
  env: {},
  conformanceFixtures: [
@@ -251,6 +273,24 @@ export const UPSTREAM_CLI_CONTRACTS = {
  args: ["-p", "hello", "--not-a-gemini-flag"],
  expect: "fail",
  },
+ {
+ id: "gemini-skip-trust",
+ description: "Phase 4 slice γ: --skip-trust is accepted",
+ args: ["-p", "hello", "--skip-trust"],
+ expect: "pass",
+ },
+ {
+ id: "gemini-stream-json",
+ description: "Phase 4 slice ε: -o stream-json is accepted",
+ args: ["-p", "hello", "-o", "stream-json"],
+ expect: "pass",
+ },
+ {
+ id: "gemini-output-format-invalid",
+ description: "Phase 4 slice ε: -o ndjson is rejected (not in contract enum)",
+ args: ["-p", "hello", "-o", "ndjson"],
+ expect: "fail",
+ },
  ],
  },
  grok: {
@@ -275,6 +315,8 @@ export const UPSTREAM_CLI_CONTRACTS = {
  "mcpServers",
  "allowedTools",
  "disallowedTools",
+ // Phase 4 slice δ
+ "maxTurns",
  ],
  flags: {
  "-p": { arity: "one", description: "Prompt text" },
@@ -299,6 +341,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
  },
  "--resume": { arity: "one", description: "Resume session" },
  "--continue": { arity: "none", description: "Continue latest session" },
+ "--max-turns": {
+ arity: "one",
+ pattern: /^[1-9][0-9]*$/,
+ description: "Agent-loop iteration cap (Phase 4 slice δ)",
+ },
  },
  env: {},
  conformanceFixtures: [
@@ -314,6 +361,18 @@ export const UPSTREAM_CLI_CONTRACTS = {
  args: ["-p", "hello", "--not-a-grok-flag"],
  expect: "fail",
  },
+ {
+ id: "grok-max-turns",
+ description: "Phase 4 slice δ: --max-turns N is accepted",
+ args: ["-p", "hello", "--max-turns", "5"],
+ expect: "pass",
+ },
+ {
+ id: "grok-max-turns-invalid-zero",
+ description: "Phase 4 slice δ: --max-turns 0 is rejected by contract pattern",
+ args: ["-p", "hello", "--max-turns", "0"],
+ expect: "fail",
+ },
  ],
  },
  grok: {
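The Grok fixtures above pair argument vectors with an expected `pass` or `fail`, driven by each flag's `arity` and optional `pattern`. The sketch below shows one way such a contract table could be evaluated. The `FlagSpec` shape mirrors the fields visible in the diff (`arity`, `values`, `pattern`); the `checkArgs` function, its treatment of positional tokens, and the two-flag subset table are assumptions for illustration and may differ from the gateway's conformance runner.

```typescript
// Field names mirror the contract entries in the diff; the checker is hypothetical.
interface FlagSpec {
  arity: "none" | "one";
  values?: string[];
  pattern?: RegExp;
}

// Two-flag subset of the Grok contract, enough to replay the fixtures above.
const GROK_FLAGS_SUBSET: Record<string, FlagSpec> = {
  "-p": { arity: "one" },
  "--max-turns": { arity: "one", pattern: /^[1-9][0-9]*$/ },
};

function checkArgs(args: string[], flags: Record<string, FlagSpec>): "pass" | "fail" {
  for (let i = 0; i < args.length; i++) {
    const token = args[i];
    if (!token.startsWith("-")) continue; // assumed: bare positionals are allowed
    if (!Object.prototype.hasOwnProperty.call(flags, token)) return "fail"; // unknown flag
    const spec = flags[token];
    if (spec.arity === "none") continue;
    const value = args[i + 1];
    if (value === undefined) return "fail"; // flag is missing its argument
    if (spec.values && !spec.values.includes(value)) return "fail";
    if (spec.pattern && !spec.pattern.test(value)) return "fail";
    i += 1; // skip the value that was just consumed
  }
  return "pass";
}

console.log(checkArgs(["-p", "hello", "--max-turns", "5"], GROK_FLAGS_SUBSET));
console.log(checkArgs(["-p", "hello", "--max-turns", "0"], GROK_FLAGS_SUBSET));
```

The pattern `/^[1-9][0-9]*$/` rejects `"0"` because the first digit must be 1 through 9, which is what the `grok-max-turns-invalid-zero` fixture pins.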
@@ -337,6 +396,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
  "mcpServers",
  "allowedTools",
  "disallowedTools",
+ // Phase 4 slice γ
+ "trust",
+ // Phase 4 slice δ
+ "maxTurns",
+ "maxPrice",
  ],
  flags: {
  "-p": { arity: "one", description: "Prompt text" },
@@ -355,6 +419,22 @@ export const UPSTREAM_CLI_CONTRACTS = {
  "--enabled-tools": { arity: "one", description: "Enabled tool" },
  "--resume": { arity: "one", description: "Resume session" },
  "--continue": { arity: "none", description: "Continue latest session" },
+ "--trust": {
+ arity: "none",
+ description: "Trust cwd for this invocation only (Phase 4 slice γ)",
+ },
+ "--max-turns": {
+ arity: "one",
+ pattern: /^[1-9][0-9]*$/,
+ description: "Agent-loop iteration cap (Phase 4 slice δ, programmatic mode only)",
+ },
+ "--max-price": {
+ arity: "one",
+ // Decimal-only: matches the MAX_PRICE_SCHEMA min(1e-6) lower bound
+ // that keeps String(N) in decimal form (no scientific notation).
+ pattern: /^(0|[1-9][0-9]*)(\.[0-9]+)?$/,
+ description: "Cumulative cost cap in USD (Phase 4 slice δ, programmatic mode only)",
+ },
  },
  env: {
  VIBE_ACTIVE_MODEL: {
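The comment on `--max-price` in the hunk above ties the decimal-only pattern to a `1e-6` lower bound. The reason is how JavaScript renders numbers: `String(n)` switches to exponent notation for magnitudes below `1e-6`. The snippet below is a small demonstration; the regular expression is copied from the diff, while the constant and function names are hypothetical.

```typescript
// Pattern copied from the `--max-price` contract entry in the diff.
const MAX_PRICE_PATTERN = /^(0|[1-9][0-9]*)(\.[0-9]+)?$/;

// Hypothetical helper: renders a price as the CLI argument would be rendered
// (the diff uses String(input.maxPrice)) and checks it against the pattern.
function renderAndCheck(price: number): { rendered: string; accepted: boolean } {
  const rendered = String(price);
  return { rendered, accepted: MAX_PRICE_PATTERN.test(rendered) };
}

console.log(renderAndCheck(0.01));  // rendered "0.01", accepted
console.log(renderAndCheck(1e-6));  // rendered "0.000001", accepted
console.log(renderAndCheck(1e-7));  // rendered "1e-7", rejected
console.log(renderAndCheck(10000)); // rendered "10000", accepted
```

So a schema minimum of `1e-6` keeps every accepted value in a form the contract pattern matches, and the `mistral-max-price-scientific-notation` fixture in this diff pins the rejected case.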
@@ -378,6 +458,27 @@ export const UPSTREAM_CLI_CONTRACTS = {
  env: { CODEX_MODEL: "gpt-5.5" },
  expect: "fail",
  },
+ {
+ id: "mistral-trust",
+ description: "Phase 4 slice γ: --trust is accepted",
+ args: ["-p", "hello", "--agent", "auto-approve", "--trust"],
+ env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
+ expect: "pass",
+ },
+ {
+ id: "mistral-max-turns-and-price",
+ description: "Phase 4 slice δ: --max-turns + --max-price are accepted together",
+ args: ["-p", "hello", "--agent", "auto-approve", "--max-turns", "3", "--max-price", "0.01"],
+ env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
+ expect: "pass",
+ },
+ {
+ id: "mistral-max-price-scientific-notation",
+ description: "Phase 4 slice δ: scientific-notation --max-price is rejected by contract pattern (matches MAX_PRICE_SCHEMA bounds)",
+ args: ["-p", "hello", "--agent", "auto-approve", "--max-price", "1e-7"],
+ env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
+ expect: "fail",
+ },
  ],
  },
  };
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "llm-cli-gateway",
- "version": "1.8.0",
+ "version": "1.10.0",
  "mcpName": "io.github.verivus-oss/llm-cli-gateway",
  "description": "MCP server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral Vibe CLIs with session management, retry logic, async job orchestration, durable job results, and cross-LLM validation.",
  "license": "MIT",