npm - llm-cli-gateway - Versions diffs - 1.8.0 → 1.10.0 - Mend

llm-cli-gateway 1.8.0 → 1.10.0

Files changed (9)

package/CHANGELOG.md +148 -0
package/dist/gemini-json-parser.d.ts +19 -4
package/dist/gemini-json-parser.js +73 -4
package/dist/index.d.ts +73 -6
package/dist/index.js +97 -30
package/dist/request-helpers.d.ts +11 -0
package/dist/request-helpers.js +6 -0
package/dist/upstream-contracts.js +111 -10
package/package.json +1 -1

package/CHANGELOG.md CHANGED

@@ -2,6 +2,154 @@
 All notable changes to the llm-cli-gateway project.
+## [1.10.0] - 2026-05-27 — Phase 4 slice ε (Gemini `-o stream-json` enum widening)
+Ships the fifth Phase 4 slice: Gemini's NDJSON event-stream output format
+(`-o stream-json`) is now reachable from `gemini_request` and
+`gemini_request_async`. Four commits land together: the feature wiring, a
+contract-table widening, a test-veracity regression suite, and a follow-up
+test fix driven by the multi-LLM round-1 audit.
+### Added — `outputFormat: "stream-json"` for Gemini
+- `gemini_request` and `gemini_request_async` `outputFormat` enums widened
+  from `text | json` to `text | json | stream-json`.
+- `prepareGeminiRequest` emits `-o stream-json` when the new value is set.
+  No `--include-partial-messages` analogue is required: Gemini already
+  streams stdout in real time across all output modes (covered by
+  `CLI_IDLE_TIMEOUTS.gemini = 600_000`).
+- New `parseGeminiStreamJson` parser consumes the NDJSON event stream
+  (`init` / `message` / `result` lines), concatenates assistant `delta`
+  messages into the response, and extracts
+  `input_tokens` / `output_tokens` / `cached` → `cache_read_tokens` from
+  the terminal `result.stats` event.
+- `extractUsageAndCost("gemini", _, "stream-json")` routes to the new
+  parser so usage tokens reach the flight recorder on the stream-json
+  path, matching the existing `-o json` behaviour.
+- `UPSTREAM_CLI_CONTRACTS.gemini.flags["-o"].values` widened to
+  `["json", "stream-json"]`; two new conformance fixtures
+  (`gemini-stream-json` passing, `gemini-output-format-invalid` failing
+  for `-o ndjson`) pin the enum bound.
+### Test-veracity audit
+Per the standing protocol established with v1.9.0
+(`feedback_test_veracity_audit_protocol`), this slice's tests were
+audited by Codex + Gemini + Grok + Mistral in async parallel with
+mandatory mutation-probe execution. Round 1 found one real gap
+(`Eε-4` only checked fixture presence/shape — P-Eε-1 left it green);
+closed in commit `4a78f9c` by running the fixture's args through
+`validateUpstreamCliArgs` inside the same `it()` block. Round 2
+delivered unanimous UNCONDITIONAL APPROVE across all four reviewers,
+with site-by-site probe evidence for the contested `Eα` registered-schema
+helper. Spec at `docs/plans/test-veracity-audit-slice-epsilon.spec.md`.
+Test count: 771 → 795 → 796 (24 + 1 new across two files).
+### Known caveats
+- The `npm run check` script still does not include `format:check` (a
+  gap first flagged in the v1.8.0 release notes). Run both locally
+  before pushing; CI runs format:check separately.
+## [1.9.0] - 2026-05-27 — Phase 4 slice δ (budget/max-turns parity) + retroactive α/γ contract closure
+Ships the fourth Phase 4 slice (budget/max-turns parity for Grok and Mistral),
+and retroactively closes three latent contract gaps that shipped silently in
+v1.8.0 (slices α and γ). Five commits land together: the slice δ feature,
+two bounds-tightening fixes, a contract-table closure, and a test-veracity
+hardening pass driven by an iterative multi-LLM audit.
+### Added — `maxTurns` / `maxPrice` budget caps (slice δ)
+- `grok_request` and `grok_request_async` gain optional `maxTurns?: number`
+  → emits `grok --max-turns N`. Grok exposes no per-request budget flag,
+  so `--max-price` is Mistral-only.
+- `mistral_request` and `mistral_request_async` gain optional
+  `maxTurns?: number` → `vibe --max-turns N` AND `maxPrice?: number` →
+  `vibe --max-price DOLLARS`. Both apply only in programmatic mode (`-p`),
+  matching Vibe's documented constraint.
+- The Mistral stale-model recovery retry path (extracted into a pure
+  `buildMistralRetryPrep` helper) preserves all three slice-γ/δ flags
+  (`trust`, `maxTurns`, `maxPrice`) on the second attempt.
+- Defaults: undefined for all three new fields → no flag emitted →
+  existing callers see no behavioural change.
+### Fixed — Bounded numeric schemas for lossless argv stringification
+- Extracted two shared, exported Zod constants:
+  - `MAX_TURNS_SCHEMA = z.number().int().positive().safe().max(10_000)`
+  - `MAX_PRICE_SCHEMA = z.number().positive().finite().min(1e-6).max(10_000)`
+- The lower `.min(1e-6)` cap on price is exactly the boundary where
+  `String(N)` switches from decimal to scientific notation
+  (`String(1e-6) === "0.000001"` but `String(1e-7) === "1e-7"`); both
+  upstream CLIs reject scientific-notation values.
+- Reused across all four slice-δ tool registrations so bounds stay
+  consistent if they ever need to change.
+### Fixed — Upstream contract table closes 5 latent flag gaps
+`assertUpstreamCliArgs` consults `UPSTREAM_CLI_CONTRACTS` on every real
+`*_request` call. The following flags / mcpParameters were never registered
+there before this release, so production calls setting any of them threw
+"Upstream contract violation" at runtime even though the prepare-function
+unit tests passed:
+- **Gemini** (slice γ retroactive): `skipTrust` + `--skip-trust`.
+- **Mistral** (slice γ + δ retroactive): `trust` + `--trust`; `maxTurns` +
+  `--max-turns`; `maxPrice` + `--max-price` (with a strict decimal-only
+  regex matching `MAX_PRICE_SCHEMA`'s lower bound).
+- **Grok** (slice δ): `maxTurns` + `--max-turns`.
+- **Codex** (slice α retroactive): `--output-schema` and `-c` removed
+  from `resumeForbiddenFlags` — verified accepted on `codex exec resume`
+  per codex-cli 0.133.0.
+Conformance fixtures pin each new flag's argv shape, including a
+`mistral-max-price-scientific-notation` fixture that locks the `1e-7`
+rejection at the contract layer.
+### Hardened — Test veracity (multi-LLM audit follow-up)
+Codex + Grok ran iterative test-veracity audits with mutation probes per
+`docs/plans/test-veracity-audit.spec.md`. They proved several added tests
+were not falsifiable on the dimensions their commit messages claimed.
+New file `src/__tests__/test-veracity-regressions.test.ts` closes those
+gaps with six describe blocks:
+- **REGRESSIONS A** — probes registered tool `inputSchema` bounds
+  directly (not the bare schema constants), so schema-drift in any of
+  the four sync/async registrations is caught.
+- **REGRESSIONS B** — tests the pure `buildMistralRetryPrep` helper
+  across all combinations of `trust × maxTurns × maxPrice`. Self-
+  validated: dropping any of the three forwards on retry goes red.
+- **REGRESSIONS C** — positive allowlist asserting slice α/γ/δ
+  parameters live in the matching contract's `mcpParameters` (closes
+  the self-oracle gap where removing a param from BOTH the contract
+  AND the schema previously stayed green).
+- **REGRESSIONS D** — threads `prepare*Request` output into
+  `validateUpstreamCliArgs` end-to-end; the exact consistency check
+  the latent v1.8.0 contract breaks would have failed.
+- **REGRESSIONS E** — `it.each` over sync AND async variants of every
+  slice-touched tool; the existing C4 was sync-only.
+- **REGRESSIONS F** — flag-fixture coverage map: every flag in each
+  contract `flags` table must be exercised by a passing fixture (with
+  a grandfathered pre-audit baseline). Forces future slice authors to
+  add a fixture alongside any new flag entry.
+The existing C4 (`MCP request schemas expose the provider contract
+parameters`) now walks `_async` tools too.
+### Notes
+Multi-LLM review across multiple iterative rounds, ending with a
+dedicated test-veracity audit per Werner's strict-evidence protocol
+(documented in `docs/plans/test-veracity-audit.spec.md`). Round 2 of the
+audit landed UNCONDITIONAL APPROVE from Codex, Grok, Claude, and Mistral
+with full mutation-probe evidence — every documented counterexample
+mutation went red as predicted; tests are falsifiable by exactly the
+regressions they claim to guard against. Gemini was quota-exhausted
+during the audit window (~6h reset) and did not participate in round 2.
 ## [1.8.0] - 2026-05-27 — Phase 4 openers (codex resume fix, mistral telemetry, headless trust flags)
 Ships the first three slices of the Phase 4 provider-modernisation
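
The 1.9.0 entry above ties the `maxTurns` / `maxPrice` bounds to how JavaScript stringifies numbers. The sketch below restates those bounds without Zod and checks the stringification boundary directly. The helper names (`isValidMaxTurns`, `isValidMaxPrice`, `stringifiesAsDecimal`) are hypothetical and are not exported by llm-cli-gateway. Only the numeric limits and the `String(1e-6)` / `String(1e-7)` behaviour come from the changelog.

```typescript
// Hypothetical stand-ins for the bounds quoted in the changelog.
// These names are illustrative only; the package itself uses Zod schemas.
const CEILING = 10_000;
const MIN_PRICE = 1e-6;

/** Approximates MAX_TURNS_SCHEMA: positive safe integer, at most 10000. */
function isValidMaxTurns(n: number): boolean {
  return Number.isSafeInteger(n) && n > 0 && n <= CEILING;
}

/** Approximates MAX_PRICE_SCHEMA: finite, between 1e-6 and 10000 inclusive. */
function isValidMaxPrice(n: number): boolean {
  return Number.isFinite(n) && n >= MIN_PRICE && n <= CEILING;
}

/** True when String(n) is plain non-negative decimal (no exponent, no sign). */
function stringifiesAsDecimal(n: number): boolean {
  return /^\d+(\.\d+)?$/.test(String(n));
}

// The boundary the changelog describes:
const atBoundary = String(1e-6);    // "0.000001"
const belowBoundary = String(1e-7); // "1e-7"
// The upper-end failure mode described in the schema doc comment:
const hugeInteger = String(1e21);   // "1e+21"
```

Under these assumptions, any value accepted by `isValidMaxPrice` or `isValidMaxTurns` also passes `stringifiesAsDecimal`, which is the property the changelog calls lossless argv stringification.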

package/dist/gemini-json-parser.d.ts CHANGED

@@ -1,13 +1,22 @@
 /**
- * Parser for Gemini CLI `-o json` output.
+ * Parsers for Gemini CLI `-o json` (single object) and `-o stream-json`
+ * (NDJSON event stream) output.
  *
- * Gemini emits a single JSON object with:
+ * `-o json` emits a single JSON object with:
  *   - `response`: string final model output
  *   - `usageMetadata`: { promptTokenCount, candidatesTokenCount,
  *                        cachedContentTokenCount?, totalTokenCount }
  *
- * Returns null when stdout is not parseable as JSON. Returns an object with
- * only `response` when usageMetadata is missing.
+ * `-o stream-json` emits one JSON object per line:
+ *   - `{ "type": "init", "session_id": "...", "model": "..." }`
+ *   - `{ "type": "message", "role": "user", "content": "..." }`
+ *   - `{ "type": "message", "role": "assistant", "content": "...", "delta": true }` (repeated)
+ *   - `{ "type": "result", "status": "success", "stats": { "input_tokens": N,
+ *        "output_tokens": N, "cached": N, ... } }`
+ *
+ * Both parsers return null when stdout is unparseable. Both populate the same
+ * `GeminiJsonParseResult` shape so `extractUsageAndCost` can branch on
+ * outputFormat without further dispatch.
  */
 export interface GeminiUsage {
     input_tokens: number;
@@ -19,3 +28,9 @@ export interface GeminiJsonParseResult {
     response?: string;
 }
 export declare function parseGeminiJson(stdout: string): GeminiJsonParseResult | null;
+/**
+ * Parse Gemini `-o stream-json` NDJSON output. Concatenates assistant `delta`
+ * message content into `response`, extracts the terminal `result.stats` payload
+ * into `usage`. Returns null when stdout contains no parseable JSON line.
+ */
+export declare function parseGeminiStreamJson(stdout: string): GeminiJsonParseResult | null;

package/dist/gemini-json-parser.js CHANGED

@@ -1,13 +1,22 @@
 /**
- * Parser for Gemini CLI `-o json` output.
+ * Parsers for Gemini CLI `-o json` (single object) and `-o stream-json`
+ * (NDJSON event stream) output.
  *
- * Gemini emits a single JSON object with:
+ * `-o json` emits a single JSON object with:
  *   - `response`: string final model output
  *   - `usageMetadata`: { promptTokenCount, candidatesTokenCount,
  *                        cachedContentTokenCount?, totalTokenCount }
  *
- * Returns null when stdout is not parseable as JSON. Returns an object with
- * only `response` when usageMetadata is missing.
+ * `-o stream-json` emits one JSON object per line:
+ *   - `{ "type": "init", "session_id": "...", "model": "..." }`
+ *   - `{ "type": "message", "role": "user", "content": "..." }`
+ *   - `{ "type": "message", "role": "assistant", "content": "...", "delta": true }` (repeated)
+ *   - `{ "type": "result", "status": "success", "stats": { "input_tokens": N,
+ *        "output_tokens": N, "cached": N, ... } }`
+ *
+ * Both parsers return null when stdout is unparseable. Both populate the same
+ * `GeminiJsonParseResult` shape so `extractUsageAndCost` can branch on
+ * outputFormat without further dispatch.
  */
 export function parseGeminiJson(stdout) {
     const trimmed = stdout.trim();
@@ -45,3 +54,63 @@ export function parseGeminiJson(stdout) {
     }
     return result;
 }
+/**
+ * Parse Gemini `-o stream-json` NDJSON output. Concatenates assistant `delta`
+ * message content into `response`, extracts the terminal `result.stats` payload
+ * into `usage`. Returns null when stdout contains no parseable JSON line.
+ */
+export function parseGeminiStreamJson(stdout) {
+    if (!stdout) {
+        return null;
+    }
+    const lines = stdout.split(/\r?\n/);
+    const result = {};
+    const assistantChunks = [];
+    let sawAnyLine = false;
+    for (const line of lines) {
+        const trimmed = line.trim();
+        if (!trimmed)
+            continue;
+        // Gemini stream-json lines are individual JSON objects; non-JSON
+        // chatter (warnings, "Ripgrep not available", etc.) is silently
+        // ignored so a stray banner line doesn't poison usage extraction.
+        let event;
+        try {
+            event = JSON.parse(trimmed);
+        }
+        catch {
+            continue;
+        }
+        if (!event || typeof event !== "object")
+            continue;
+        sawAnyLine = true;
+        if (event.type === "message" &&
+            event.role === "assistant" &&
+            typeof event.content === "string") {
+            assistantChunks.push(event.content);
+            continue;
+        }
+        if (event.type === "result" && event.stats && typeof event.stats === "object") {
+            const stats = event.stats;
+            const input = typeof stats.input_tokens === "number" ? stats.input_tokens : undefined;
+            const output = typeof stats.output_tokens === "number" ? stats.output_tokens : undefined;
+            if (input !== undefined || output !== undefined) {
+                const usage = {
+                    input_tokens: input ?? 0,
+                    output_tokens: output ?? 0,
+                };
+                if (typeof stats.cached === "number") {
+                    usage.cache_read_tokens = stats.cached;
+                }
+                result.usage = usage;
+            }
+        }
+    }
+    if (!sawAnyLine) {
+        return null;
+    }
+    if (assistantChunks.length > 0) {
+        result.response = assistantChunks.join("");
+    }
+    return result;
+}
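
To make the behaviour of `parseGeminiStreamJson` concrete, the following is a typed, condensed restatement of the JavaScript above, run against a small sample stream. `parseStreamSketch`, `SketchUsage` and `SketchResult` are hypothetical names used only here. The sample `session_id`, `model` and token counts are invented for illustration and are not real Gemini output.

```typescript
// Illustrative types; the package's own GeminiJsonParseResult may differ.
type SketchUsage = { input_tokens: number; output_tokens: number; cache_read_tokens?: number };
type SketchResult = { response?: string; usage?: SketchUsage };

// Condensed port of the parser shown in the diff above (an approximation).
function parseStreamSketch(stdout: string): SketchResult | null {
  const result: SketchResult = {};
  const chunks: string[] = [];
  let sawEvent = false;
  for (const line of stdout.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let event: any;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // non-JSON chatter (banners, warnings) is skipped
    }
    if (!event || typeof event !== "object") continue;
    sawEvent = true;
    if (event.type === "message" && event.role === "assistant" && typeof event.content === "string") {
      chunks.push(event.content); // user messages are not collected
    } else if (event.type === "result" && event.stats && typeof event.stats === "object") {
      const { input_tokens, output_tokens, cached } = event.stats;
      if (typeof input_tokens === "number" || typeof output_tokens === "number") {
        result.usage = {
          input_tokens: typeof input_tokens === "number" ? input_tokens : 0,
          output_tokens: typeof output_tokens === "number" ? output_tokens : 0,
          ...(typeof cached === "number" ? { cache_read_tokens: cached } : {}),
        };
      }
    }
  }
  if (!sawEvent) return null;
  if (chunks.length > 0) result.response = chunks.join("");
  return result;
}

// Sample stream: one stray banner line followed by four NDJSON events.
const sampleStdout = [
  "Ripgrep not available",
  '{"type":"init","session_id":"s-1","model":"example-model"}',
  '{"type":"message","role":"user","content":"Say hi"}',
  '{"type":"message","role":"assistant","content":"Hel","delta":true}',
  '{"type":"message","role":"assistant","content":"lo","delta":true}',
  '{"type":"result","status":"success","stats":{"input_tokens":12,"output_tokens":3,"cached":4}}',
].join("\n");

const parsed = parseStreamSketch(sampleStdout);
// parsed.response === "Hello"
// parsed.usage: input_tokens 12, output_tokens 3, cache_read_tokens 4
const bannerOnly = parseStreamSketch("warning: no JSON here"); // null
```

The banner line is dropped without affecting usage extraction, and a stdout with no parseable JSON object yields `null`, so a caller can tell "no events at all" apart from "events but no usage".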

package/dist/index.d.ts CHANGED

@@ -54,6 +54,19 @@ declare const logger: {
     debug: (message: string, ...args: any[]) => void;
 };
 type GatewayLogger = typeof logger;
+/**
+ * Phase 4 slice δ — shared Zod fragments for `maxTurns` / `maxPrice`.
+ *
+ * Both flags reach the upstream CLIs as decimal-formatted argv strings via
+ * `String(N)`. `z.number().int().positive()` alone lets values past
+ * `Number.MAX_SAFE_INTEGER` through, after which `String(1e21)` emits
+ * scientific notation that Grok and Vibe both reject. The bounds below
+ * (safe-integer cap + 10000 ceiling for turns; finite + 10000 USD ceiling
+ * for price) guarantee a lossless decimal stringification AND a sane
+ * upper bound — no plausible single agent loop exceeds 10k turns or 10k USD.
+ */
+export declare const MAX_TURNS_SCHEMA: z.ZodNumber;
+export declare const MAX_PRICE_SCHEMA: z.ZodNumber;
 export declare const SESSION_PROVIDER_VALUES: readonly ["claude", "codex", "gemini", "grok", "mistral"];
 export declare const SESSION_PROVIDER_ENUM: z.ZodEnum<["claude", "codex", "gemini", "grok", "mistral"]>;
 export type SessionProvider = (typeof SESSION_PROVIDER_VALUES)[number];
@@ -199,11 +212,13 @@ export declare function prepareGeminiRequest(params: {
     optimizePrompt: boolean;
     operation: string;
     /**
-     * U23: output format. When set to "json", emits `-o json` so Gemini emits
-     * the JSON object containing usageMetadata that `parseGeminiJson` (and
-     * downstream `extractUsageAndCost`) can consume. Defaults to "text".
+     * U23 + Phase 4 slice ε: output format. `json` emits `-o json` (single
+     * JSON object with usageMetadata). `stream-json` emits `-o stream-json`
+     * (NDJSON event stream — `init` / `message` / `result` lines). Both
+     * route through `extractUsageAndCost` so usage tokens reach the flight
+     * recorder. Defaults to "text".
      */
-    outputFormat?: "text" | "json";
+    outputFormat?: "text" | "json" | "stream-json";
     sandbox?: boolean;
     policyFiles?: string[];
     adminPolicyFiles?: string[];
@@ -215,6 +230,29 @@ export declare function prepareGeminiRequest(params: {
      */
     skipTrust?: boolean;
 }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
+export declare function prepareGrokRequest(params: {
+    prompt?: string;
+    promptParts?: PromptParts;
+    model?: string;
+    outputFormat?: string;
+    alwaysApprove?: boolean;
+    permissionMode?: string;
+    effort?: string;
+    reasoningEffort?: string;
+    allowedTools?: string[];
+    disallowedTools?: string[];
+    approvalStrategy: "legacy" | "mcp_managed";
+    approvalPolicy?: string;
+    mcpServers?: ClaudeMcpServerName[];
+    correlationId?: string;
+    optimizePrompt: boolean;
+    operation: string;
+    /**
+     * Phase 4 slice δ: emit `--max-turns N` so callers can cap agent-loop
+     * iterations for cost / latency control. Mirrors Claude's wiring.
+     */
+    maxTurns?: number;
+}, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
 export declare function prepareMistralRequest(params: {
     prompt?: string;
     promptParts?: PromptParts;
@@ -236,9 +274,29 @@ export declare function prepareMistralRequest(params: {
      * prompt for this invocation only (not persisted). Default undefined.
      */
     trust?: boolean;
+    /** Phase 4 slice δ: Vibe `--max-turns N` cap on agent-loop iterations. */
+    maxTurns?: number;
+    /** Phase 4 slice δ: Vibe `--max-price DOLLARS` cumulative-cost cap. */
+    maxPrice?: number;
 }, runtime?: GatewayServerRuntime): (CliRequestPrep & {
     mistralEnv: Record<string, string>;
 }) | ExtendedToolResponse;
+/**
+ * Phase 4 slice δ post-review: pure helper extracted from
+ * `handleMistralRequest` so the retry-path arg-preservation invariants
+ * (trust + maxTurns + maxPrice from slices γ/δ) are unit-testable
+ * without mocking awaitJobOrDefer. Any param the wrapper threads into
+ * the FIRST `buildMistralCliInvocation` call MUST also be threaded
+ * through here, or a fresh-workspace / budgeted run can degrade on
+ * the second attempt.
+ */
+export declare function buildMistralRetryPrep(params: Pick<MistralRequestParams, "outputFormat" | "permissionMode" | "effort" | "reasoningEffort" | "allowedTools" | "disallowedTools" | "approvalStrategy" | "trust" | "maxTurns" | "maxPrice"> & {
+    effectivePrompt: string;
+}, recoveryModel: string): {
+    args: string[];
+    env: Record<string, string>;
+    ignoredDisallowedTools: boolean;
+};
 export interface GeminiRequestParams {
     prompt?: string;
     promptParts?: PromptParts;
@@ -257,8 +315,11 @@ export interface GeminiRequestParams {
     optimizeResponse?: boolean;
     idleTimeoutMs?: number;
     forceRefresh?: boolean;
-    /** U23: "json" emits `-o json` so token usage is parsed and reported. */
-    outputFormat?: "text" | "json";
+    /**
+     * U23 + Phase 4 slice ε: "json" emits `-o json`; "stream-json" emits
+     * `-o stream-json` (NDJSON event stream). Both are usage-extracted.
+     */
+    outputFormat?: "text" | "json" | "stream-json";
     sandbox?: boolean;
     policyFiles?: string[];
     adminPolicyFiles?: string[];
@@ -303,6 +364,8 @@ export interface GrokRequestParams {
     optimizeResponse?: boolean;
     idleTimeoutMs?: number;
     forceRefresh?: boolean;
+    /** Phase 4 slice δ: cap agent-loop iterations via `--max-turns N`. */
+    maxTurns?: number;
 }
 export declare function handleGrokRequest(deps: HandlerDeps, params: GrokRequestParams): Promise<ExtendedToolResponse>;
 export declare function handleGrokRequestAsync(deps: AsyncHandlerDeps, params: Omit<GrokRequestParams, "optimizeResponse">): Promise<ExtendedToolResponse>;
@@ -329,6 +392,10 @@ export interface MistralRequestParams {
     forceRefresh?: boolean;
     /** Phase 4 slice γ: emit `--trust` for fresh-workspace headless runs. */
     trust?: boolean;
+    /** Phase 4 slice δ: Vibe `--max-turns N` cap on agent-loop iterations. */
+    maxTurns?: number;
+    /** Phase 4 slice δ: Vibe `--max-price DOLLARS` cumulative-cost cap. */
+    maxPrice?: number;
 }
 export declare function handleMistralRequest(deps: HandlerDeps, params: MistralRequestParams): Promise<ExtendedToolResponse>;
 export declare function handleMistralRequestAsync(deps: AsyncHandlerDeps, params: Omit<MistralRequestParams, "optimizeResponse">): Promise<ExtendedToolResponse>;
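
The declarations above add optional `maxTurns` and `maxPrice` fields whose absence is meant to leave the argv unchanged. A minimal sketch of that opt-in flag emission follows. `budgetArgsSketch` and `BudgetCaps` are hypothetical names, and the relative order of the two flags is an assumption. Only the flag names and the use of `String(N)` are taken from the diff and changelog.

```typescript
// Hypothetical shape covering the two slice-δ caps.
interface BudgetCaps {
  maxTurns?: number;
  maxPrice?: number;
}

// Hypothetical helper, not part of llm-cli-gateway's API: emits a flag
// only when the corresponding field is defined.
function budgetArgsSketch(caps: BudgetCaps): string[] {
  const args: string[] = [];
  if (caps.maxTurns !== undefined) {
    args.push("--max-turns", String(caps.maxTurns));
  }
  if (caps.maxPrice !== undefined) {
    args.push("--max-price", String(caps.maxPrice));
  }
  return args;
}

const noCaps = budgetArgsSketch({});                                // []
const turnsOnly = budgetArgsSketch({ maxTurns: 5 });                // ["--max-turns", "5"]
const both = budgetArgsSketch({ maxTurns: 5, maxPrice: 0.25 });     // ["--max-turns", "5", "--max-price", "0.25"]
```

Checking `!== undefined` rather than truthiness keeps the emission rule independent of the schema bounds. Building the retry argv from the same caps object is one way to get the arg-preservation that `buildMistralRetryPrep` is documented to guarantee.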

package/dist/index.js CHANGED

@@ -9,7 +9,7 @@ import { z } from "zod";
 import { executeCli, killAllProcessGroups } from "./executor.js";
 import { parseStreamJson } from "./stream-json-parser.js";
 import { parseCodexJsonStream } from "./codex-json-parser.js";
-import { parseGeminiJson } from "./gemini-json-parser.js";
+import { parseGeminiJson, parseGeminiStreamJson } from "./gemini-json-parser.js";
 import { parseVibeMetaJson } from "./mistral-meta-json-parser.js";
 import { homedir } from "os";
 import { createSessionManager } from "./session-manager.js";
@@ -229,6 +229,23 @@ function getApprovalManager(runtimeLogger = logger) {
     return approvalManager;
 }
 const MCP_SERVER_ENUM = z.enum(CLAUDE_MCP_SERVER_NAMES);
+/**
+ * Phase 4 slice δ — shared Zod fragments for `maxTurns` / `maxPrice`.
+ *
+ * Both flags reach the upstream CLIs as decimal-formatted argv strings via
+ * `String(N)`. `z.number().int().positive()` alone lets values past
+ * `Number.MAX_SAFE_INTEGER` through, after which `String(1e21)` emits
+ * scientific notation that Grok and Vibe both reject. The bounds below
+ * (safe-integer cap + 10000 ceiling for turns; finite + 10000 USD ceiling
+ * for price) guarantee a lossless decimal stringification AND a sane
+ * upper bound — no plausible single agent loop exceeds 10k turns or 10k USD.
+ */
+export const MAX_TURNS_SCHEMA = z.number().int().positive().safe().max(10_000);
+// `.min(1e-6)` keeps the value in JS's decimal-stringify range:
+// String(1e-6) === "0.000001" but String(1e-7) === "1e-7", which both
+// upstream CLIs would reject. 1µUSD per request is fine-grained enough
+// for any plausible budget-cap use.
+export const MAX_PRICE_SCHEMA = z.number().positive().finite().min(1e-6).max(10_000);
 // U22: Session-provider enum extended to five providers. The storage layer's
 // CLI_TYPES already includes "mistral"; the MCP-tool layer mirrors that here so
 // session_create / session_list / session_clear_all accept the fifth provider.
@@ -513,8 +530,8 @@ ctx) {
             costUsd: parsed.usage.cost_usd,
         };
     }
-    if (cli === "gemini" && outputFormat === "json") {
-        const parsed = parseGeminiJson(output);
+    if (cli === "gemini" && (outputFormat === "json" || outputFormat === "stream-json")) {
+        const parsed = outputFormat === "stream-json" ? parseGeminiStreamJson(output) : parseGeminiJson(output);
         if (!parsed || !parsed.usage) {
             return {};
         }
@@ -1254,9 +1271,19 @@ export function prepareGeminiRequest(params, runtime = resolveGatewayServerRunti
     // U23 fix: emit `-o json` when the caller asked for JSON output. The Gemini
     // JSON parser is otherwise unreachable from the tool surface and the
     // structured usageMetadata is silently dropped.
+    //
+    // Phase 4 slice ε: same wiring for `-o stream-json` (NDJSON event stream).
+    // Gemini already streams stdout in real-time so the existing 10-minute
+    // idle timeout (CLI_IDLE_TIMEOUTS.gemini) covers both modes without
+    // adjustment — unlike Claude, no `--include-partial-messages` companion
+    // flag is required because Gemini emits assistant `delta` events as part
+    // of the default stream-json shape.
     if (params.outputFormat === "json") {
         args.push("-o", "json");
     }
+    else if (params.outputFormat === "stream-json") {
+        args.push("-o", "stream-json");
+    }
     // Phase 4 slice γ: opt-in trust-prompt bypass for fresh workspaces.
     if (params.skipTrust) {
         args.push("--skip-trust");
@@ -1273,7 +1300,7 @@ export function prepareGeminiRequest(params, runtime = resolveGatewayServerRunti
         stablePrefixTokens,
     };
 }
-function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
+export function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
     const corrId = params.correlationId || randomUUID();
     const cliInfo = getCliInfo();
     const resolvedModel = resolveModelAlias("grok", params.model, cliInfo);
@@ -1349,6 +1376,9 @@ function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
     if (params.disallowedTools && params.disallowedTools.length > 0) {
         args.push("--disallowed-tools", params.disallowedTools.join(","));
     }
+    if (params.maxTurns !== undefined) {
+        args.push("--max-turns", String(params.maxTurns));
+    }
     return {
         corrId,
         effectivePrompt,
@@ -1433,6 +1463,8 @@ export function prepareMistralRequest(params, runtime = resolveGatewayServerRunt
         allowedTools: params.allowedTools,
         disallowedTools: params.disallowedTools,
         trust: params.trust,
+        maxTurns: params.maxTurns,
+        maxPrice: params.maxPrice,
     });
     if (prep.ignoredDisallowedTools) {
         runtime.logger.info(`[${corrId}] Mistral does not support disallowedTools; ignoring (caller passed ${params.disallowedTools?.length ?? 0} entries)`);
@@ -1463,6 +1495,32 @@ function selectMistralRecoveryModel(failedModel) {
     ].filter((model) => Boolean(model && model !== failedModel));
     return candidates.find(model => model !== "local");
 }
+/**
+ * Phase 4 slice δ post-review: pure helper extracted from
+ * `handleMistralRequest` so the retry-path arg-preservation invariants
+ * (trust + maxTurns + maxPrice from slices γ/δ) are unit-testable
+ * without mocking awaitJobOrDefer. Any param the wrapper threads into
+ * the FIRST `buildMistralCliInvocation` call MUST also be threaded
+ * through here, or a fresh-workspace / budgeted run can degrade on
+ * the second attempt.
+ */
+export function buildMistralRetryPrep(params, recoveryModel) {
+    return buildMistralCliInvocation({
+        prompt: params.effectivePrompt,
+        resolvedModel: recoveryModel,
+        outputFormat: params.outputFormat,
+        permissionMode: params.approvalStrategy === "mcp_managed"
+            ? "auto-approve"
+            : (params.permissionMode ?? "auto-approve"),
+        effort: params.effort,
+        reasoningEffort: params.reasoningEffort,
+        allowedTools: params.allowedTools,
+        disallowedTools: params.disallowedTools,
+        trust: params.trust,
+        maxTurns: params.maxTurns,
+        maxPrice: params.maxPrice,
+    });
+}
 function buildCliResponse(cli, stdout, optimizeResponse, corrId, sessionId, prep, durationMs, resumable, outputFormat, warnings) {
     let finalStdout = stdout;
     // Skip response optimization for JSON output to prevent corrupting structured data
@@ -1801,6 +1859,7 @@ export async function handleGrokRequest(deps, params) {
         correlationId: params.correlationId,
         optimizePrompt: params.optimizePrompt,
         operation: "grok_request",
+        maxTurns: params.maxTurns,
     }, runtime);
     if (!("args" in prep))
         return prep;
@@ -1921,6 +1980,7 @@ export async function handleGrokRequestAsync(deps, params) {
         correlationId: params.correlationId,
         optimizePrompt: params.optimizePrompt,
         operation: "grok_request_async",
+        maxTurns: params.maxTurns,
     }, runtime);
     if (!("args" in prep))
         return prep;
@@ -2003,6 +2063,8 @@ export async function handleMistralRequest(deps, params) {
         optimizePrompt: params.optimizePrompt,
         operation: "mistral_request",
         trust: params.trust,
+        maxTurns: params.maxTurns,
+        maxPrice: params.maxPrice,
     }, runtime);
     if (!("args" in prep))
         return prep;
@@ -2035,22 +2097,7 @@ export async function handleMistralRequest(deps, params) {
             const recoveryModel = selectMistralRecoveryModel(prep.resolvedModel);
             if (recoveryModel) {
                 deps.logger.info(`[${corrId}] mistral_request detected stale Vibe model selection; retrying once with ${recoveryModel}`);
-                const retryPrep = buildMistralCliInvocation({
-                    prompt: prep.effectivePrompt,
-                    resolvedModel: recoveryModel,
-                    outputFormat: params.outputFormat,
-                    permissionMode: params.approvalStrategy === "mcp_managed"
-                        ? "auto-approve"
-                        : (params.permissionMode ?? "auto-approve"),
-                    effort: params.effort,
-                    reasoningEffort: params.reasoningEffort,
-                    allowedTools: params.allowedTools,
-                    disallowedTools: params.disallowedTools,
-                    // Phase 4 slice γ: preserve --trust on the model-selection retry
-                    // so a fresh untrusted workspace doesn't block headlessly on the
-                    // second attempt after surviving the first.
-                    trust: params.trust,
-                });
+                const retryPrep = buildMistralRetryPrep({ ...params, effectivePrompt: prep.effectivePrompt }, recoveryModel);
                 const retryArgs = [...retryPrep.args, ...sessionResult.resumeArgs];
                 // Reuse the FR handoff built above — the retry preserves corrId,
                 // so the manager's logComplete still updates the original row.
@@ -2151,6 +2198,8 @@ export async function handleMistralRequestAsync(deps, params) {
         optimizePrompt: params.optimizePrompt,
         operation: "mistral_request_async",
         trust: params.trust,
+        maxTurns: params.maxTurns,
+        maxPrice: params.maxPrice,
     }, runtime);
     if (!("args" in prep))
         return prep;
@@ -3030,11 +3079,14 @@ export function createGatewayServer(deps = {}) {
             .default(false)
             .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
         // U23: emit `-o json` to extract token usage via parseGeminiJson. Default
-        // remains text so existing callers see no behavior change.
+        // remains text so existing callers see no behavior change. Phase 4 slice
+        // ε adds `stream-json` (NDJSON event stream parsed by
+        // parseGeminiStreamJson — `init`/`message`/`result` lines, idle-timeout
+        // semantics covered by Gemini's existing real-time stdout streaming).
         outputFormat: z
-            .enum(["text", "json"])
+            .enum(["text", "json", "stream-json"])
             .default("text")
-            .describe("Gemini output format. `json` emits `-o json` so usageMetadata is parsed and reported."),
+            .describe("Gemini output format. `json` emits `-o json` (single JSON with usageMetadata). `stream-json` emits `-o stream-json` (NDJSON event stream — `init`/`message`/`result` lines, usage extracted from the terminal `result.stats` event). Both report usage to the flight recorder."),
         sandbox: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.sandbox.describe("Run Gemini in sandbox mode (-s)"),
         policyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.policyFiles.describe("Policy file paths (--policy <path>, one per file). Paths must exist."),
         adminPolicyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.adminPolicyFiles.describe("Admin policy file paths (--admin-policy <path>, one per file). Paths must exist."),
@@ -3142,7 +3194,8 @@ export function createGatewayServer(deps = {}) {
             .boolean()
             .default(false)
             .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
-    }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, }) => {
+        maxTurns: MAX_TURNS_SCHEMA.optional().describe("Grok `--max-turns N`: cap on agent-loop iterations for cost / latency control (Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
+    }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, }) => {
         return handleGrokRequest({ sessionManager, logger, runtime }, {
             prompt,
             promptParts,
@@ -3165,6 +3218,7 @@ export function createGatewayServer(deps = {}) {
             optimizeResponse,
             idleTimeoutMs,
             forceRefresh,
+            maxTurns,
         });
     });
     //──────────────────────────────────────────────────────────────────────────────
@@ -3242,7 +3296,9 @@ export function createGatewayServer(deps = {}) {
             .boolean()
             .default(false)
             .describe("Emit `--trust` so Vibe trusts the cwd for this invocation only (not persisted to trusted_folders.toml) and skips the interactive trust prompt (Phase 4 slice γ)."),
-    }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, trust, }) => {
+        maxTurns: MAX_TURNS_SCHEMA.optional().describe("Vibe `--max-turns N`: cap the agent-loop iteration count (programmatic mode only, Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
+        maxPrice: MAX_PRICE_SCHEMA.optional().describe("Vibe `--max-price DOLLARS`: interrupt the session when cumulative cost crosses this cap (programmatic mode only, Phase 4 slice δ). Bounded to finite values ≤ 10000 USD."),
+    }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, trust, maxTurns, maxPrice, }) => {
         return handleMistralRequest({ sessionManager, logger, runtime }, {
             prompt,
             promptParts,
@@ -3265,6 +3321,8 @@ export function createGatewayServer(deps = {}) {
             idleTimeoutMs,
             forceRefresh,
             trust,
+            maxTurns,
+            maxPrice,
         });
     });
     //──────────────────────────────────────────────────────────────────────────────
@@ -3646,11 +3704,14 @@ export function createGatewayServer(deps = {}) {
                 .default(false)
                 .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
             // U23: emit `-o json` to extract token usage via parseGeminiJson. Default
-            // remains text so existing callers see no behavior change.
+            // remains text so existing callers see no behavior change. Phase 4 slice
+            // ε adds `stream-json` (NDJSON event stream parsed by
+            // parseGeminiStreamJson — `init`/`message`/`result` lines, idle-timeout
+            // semantics covered by Gemini's existing real-time stdout streaming).
             outputFormat: z
-                .enum(["text", "json"])
+                .enum(["text", "json", "stream-json"])
                 .default("text")
-                .describe("Gemini output format. `json` emits `-o json` so usageMetadata is parsed and reported."),
+                .describe("Gemini output format. `json` emits `-o json` (single JSON with usageMetadata). `stream-json` emits `-o stream-json` (NDJSON event stream — `init`/`message`/`result` lines, usage extracted from the terminal `result.stats` event). Both report usage to the flight recorder."),
             sandbox: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.sandbox.describe("Run Gemini in sandbox mode (-s)"),
             policyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.policyFiles.describe("Policy file paths (--policy <path>, one per file). Paths must exist."),
             adminPolicyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.adminPolicyFiles.describe("Admin policy file paths (--admin-policy <path>, one per file). Paths must exist."),
@@ -3753,7 +3814,8 @@ export function createGatewayServer(deps = {}) {
                 .boolean()
                 .default(false)
                 .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
-        }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, }) => {
+            maxTurns: MAX_TURNS_SCHEMA.optional().describe("Grok `--max-turns N`: cap on agent-loop iterations for cost / latency control (Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
+        }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, }) => {
             return handleGrokRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
                 prompt,
                 promptParts,
@@ -3775,6 +3837,7 @@ export function createGatewayServer(deps = {}) {
                 optimizePrompt,
                 idleTimeoutMs,
                 forceRefresh,
+                maxTurns,
             });
         });
         server.tool("mistral_request_async", {
@@ -3848,7 +3911,9 @@ export function createGatewayServer(deps = {}) {
                 .boolean()
                 .default(false)
                 .describe("Emit `--trust` so Vibe trusts the cwd for this invocation only (not persisted to trusted_folders.toml) and skips the interactive trust prompt (Phase 4 slice γ)."),
-        }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, trust, }) => {
+            maxTurns: MAX_TURNS_SCHEMA.optional().describe("Vibe `--max-turns N`: cap the agent-loop iteration count (programmatic mode only, Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
+            maxPrice: MAX_PRICE_SCHEMA.optional().describe("Vibe `--max-price DOLLARS`: interrupt the session when cumulative cost crosses this cap (programmatic mode only, Phase 4 slice δ). Bounded to finite values ≤ 10000 USD."),
+        }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, trust, maxTurns, maxPrice, }) => {
             return handleMistralRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
                 prompt,
                 promptParts,
@@ -3870,6 +3935,8 @@ export function createGatewayServer(deps = {}) {
                 idleTimeoutMs,
                 forceRefresh,
                 trust,
+                maxTurns,
+                maxPrice,
             });
         });
         server.tool("llm_job_status", {
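
The `outputFormat` descriptions above say that `stream-json` yields an NDJSON stream of `init` / `message` / `result` lines, with usage taken from the terminal `result.stats` event. The following is a minimal sketch of that idea, not the package's `parseGeminiStreamJson`. The function name `sketchParseStreamJson` and the event field names `type`, `role`, and `content` are assumptions made for illustration. Only the event kinds, the `stats` object, and the `cached` to `cache_read_tokens` mapping come from the diff and changelog.

```javascript
// Hypothetical sketch; NOT the package's parseGeminiStreamJson.
// Assumed (unverified) event shapes, one JSON object per line:
//   {"type":"init", ...}
//   {"type":"message","role":"assistant","content":"..."}
//   {"type":"result","stats":{"input_tokens":N,"output_tokens":N,"cached":N}}
function sketchParseStreamJson(stdout) {
  let response = "";
  let usage = null;
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines between events
    let event;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // tolerate non-JSON noise on stdout
    }
    if (event === null || typeof event !== "object") continue;
    if (
      event.type === "message" &&
      event.role === "assistant" &&
      typeof event.content === "string"
    ) {
      // Concatenate assistant chunks into the final response text.
      response += event.content;
    } else if (event.type === "result" && event.stats && typeof event.stats === "object") {
      // Usage comes from the terminal result event; `cached` is renamed.
      usage = {
        input_tokens: event.stats.input_tokens,
        output_tokens: event.stats.output_tokens,
        cache_read_tokens: event.stats.cached,
      };
    }
  }
  return { response, usage };
}
```

A caller would pass the full captured stdout once the CLI exits. If the stream ends without a `result` line, `usage` stays `null` in this sketch.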

package/dist/request-helpers.d.ts CHANGED

@@ -114,6 +114,17 @@ export interface PrepareMistralRequestInput {
      * Vibe's prompt behaviour is preserved for existing callers.
      */
     trust?: boolean;
+    /**
+     * Phase 4 slice δ: emit `--max-turns N` to cap the agent-loop iteration
+     * count (only applies in programmatic mode with `-p`).
+     */
+    maxTurns?: number;
+    /**
+     * Phase 4 slice δ: emit `--max-price DOLLARS` so the session is
+     * interrupted when cumulative cost crosses the cap (programmatic mode
+     * only).
+     */
+    maxPrice?: number;
 }
 export interface PrepareMistralRequestResult {
     args: string[];

package/dist/request-helpers.js CHANGED

@@ -179,6 +179,12 @@ export function prepareMistralRequest(input) {
     if (input.trust) {
         args.push("--trust");
     }
+    if (input.maxTurns !== undefined) {
+        args.push("--max-turns", String(input.maxTurns));
+    }
+    if (input.maxPrice !== undefined) {
+        args.push("--max-price", String(input.maxPrice));
+    }
     const ignoredDisallowedTools = Boolean(input.disallowedTools && input.disallowedTools.length > 0);
     return { args, env, ignoredDisallowedTools };
 }
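
The two branches added to `prepareMistralRequest` can be exercised in isolation. The sketch below copies only those branches into a hypothetical helper, `sketchBudgetArgs`, which is not part of the package. It shows which argv pairs are produced and how `String()` renders small prices.

```javascript
// Hypothetical helper mirroring the two added branches; not package code.
function sketchBudgetArgs(input) {
  const args = [];
  // `!== undefined` (rather than a truthiness check) means any provided
  // number is forwarded, including 0; rejecting bad values is left to
  // validation that happens elsewhere.
  if (input.maxTurns !== undefined) {
    args.push("--max-turns", String(input.maxTurns));
  }
  if (input.maxPrice !== undefined) {
    args.push("--max-price", String(input.maxPrice));
  }
  return args;
}

console.log(sketchBudgetArgs({ maxTurns: 3, maxPrice: 0.01 }));
// [ '--max-turns', '3', '--max-price', '0.01' ]

// JavaScript switches to exponent notation below 1e-6, which is why a
// decimal-only contract pattern pairs naturally with a 1e-6 lower bound.
console.log(String(1e-6)); // 0.000001
console.log(String(1e-7)); // 1e-7
```

Because the helper forwards whatever number it receives, the string form of the value matters: a price below `1e-6` would reach the CLI as `1e-7` rather than as a decimal literal.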

package/dist/upstream-contracts.js CHANGED

@@ -133,14 +133,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "ignoreRules",
         ],
         resumeOnlyFlags: ["--last"],
-        resumeForbiddenFlags: [
-            "--sandbox",
-            "--ask-for-approval",
-            "--full-auto",
-            "--output-schema",
-            "--search",
-            "-c",
-        ],
+        // Phase 4 slice α (v1.8.0) verified that `codex exec resume` accepts
+        // `--output-schema` and `-c` (codex-cli 0.133.0 `exec resume --help`),
+        // so they're no longer forbidden. `--search` stays forbidden (resume
+        // inherits the original session's web-search state).
+        resumeForbiddenFlags: ["--sandbox", "--ask-for-approval", "--full-auto", "--search"],
         flags: {
             "--last": { arity: "none", description: "Resume latest session" },
             "--model": { arity: "one", description: "Model selector" },
@@ -189,9 +186,24 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 expect: "fail",
             },
             {
+                // Phase 4 slice α: --output-schema IS accepted on resume per
+                // codex-cli 0.133.0; this fixture pins the new behaviour so future
+                // contract changes can't silently regress.
                 id: "codex-resume-output-schema",
-                description: "Resume-incompatible output schema flag is rejected",
+                description: "Phase 4 slice α: --output-schema accepted on resume (codex-cli 0.133.0)",
                 args: ["exec", "resume", "--output-schema", "/tmp/schema.json", "session-id", "hello"],
+                expect: "pass",
+            },
+            {
+                id: "codex-resume-config-override",
+                description: "Phase 4 slice α: -c key=value accepted on resume",
+                args: ["exec", "resume", "-c", "model.foo=bar", "session-id", "hello"],
+                expect: "pass",
+            },
+            {
+                id: "codex-resume-search-still-forbidden",
+                description: "Phase 4 slice α: --search remains forbidden on resume",
+                args: ["exec", "resume", "--search", "session-id", "hello"],
                 expect: "fail",
             },
         ],
@@ -219,6 +231,8 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "policyFiles",
             "adminPolicyFiles",
             "attachments",
+            // Phase 4 slice γ
+            "skipTrust",
         ],
         flags: {
             "-p": { arity: "one", description: "Prompt text" },
@@ -234,8 +248,16 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "-s": { arity: "none", description: "Sandbox mode" },
             "--policy": { arity: "one", description: "Policy file path" },
             "--admin-policy": { arity: "one", description: "Admin policy file path" },
-            "-o": { arity: "one", values: ["json"], description: "Output format" },
+            "-o": {
+                arity: "one",
+                values: ["json", "stream-json"],
+                description: "Output format (Phase 4 slice ε adds stream-json)",
+            },
             "--resume": { arity: "one", description: "Resume session" },
+            "--skip-trust": {
+                arity: "none",
+                description: "Trust workspace for this session (Phase 4 slice γ)",
+            },
         },
         env: {},
         conformanceFixtures: [
@@ -251,6 +273,24 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 args: ["-p", "hello", "--not-a-gemini-flag"],
                 expect: "fail",
             },
+            {
+                id: "gemini-skip-trust",
+                description: "Phase 4 slice γ: --skip-trust is accepted",
+                args: ["-p", "hello", "--skip-trust"],
+                expect: "pass",
+            },
+            {
+                id: "gemini-stream-json",
+                description: "Phase 4 slice ε: -o stream-json is accepted",
+                args: ["-p", "hello", "-o", "stream-json"],
+                expect: "pass",
+            },
+            {
+                id: "gemini-output-format-invalid",
+                description: "Phase 4 slice ε: -o ndjson is rejected (not in contract enum)",
+                args: ["-p", "hello", "-o", "ndjson"],
+                expect: "fail",
+            },
         ],
     },
     grok: {
@@ -275,6 +315,8 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "mcpServers",
             "allowedTools",
             "disallowedTools",
+            // Phase 4 slice δ
+            "maxTurns",
         ],
         flags: {
             "-p": { arity: "one", description: "Prompt text" },
@@ -299,6 +341,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
             },
             "--resume": { arity: "one", description: "Resume session" },
             "--continue": { arity: "none", description: "Continue latest session" },
+            "--max-turns": {
+                arity: "one",
+                pattern: /^[1-9][0-9]*$/,
+                description: "Agent-loop iteration cap (Phase 4 slice δ)",
+            },
         },
         env: {},
         conformanceFixtures: [
@@ -314,6 +361,18 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 args: ["-p", "hello", "--not-a-grok-flag"],
                 expect: "fail",
             },
+            {
+                id: "grok-max-turns",
+                description: "Phase 4 slice δ: --max-turns N is accepted",
+                args: ["-p", "hello", "--max-turns", "5"],
+                expect: "pass",
+            },
+            {
+                id: "grok-max-turns-invalid-zero",
+                description: "Phase 4 slice δ: --max-turns 0 is rejected by contract pattern",
+                args: ["-p", "hello", "--max-turns", "0"],
+                expect: "fail",
+            },
         ],
     },
     mistral: {
@@ -337,6 +396,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "mcpServers",
             "allowedTools",
             "disallowedTools",
+            // Phase 4 slice γ
+            "trust",
+            // Phase 4 slice δ
+            "maxTurns",
+            "maxPrice",
         ],
         flags: {
             "-p": { arity: "one", description: "Prompt text" },
@@ -355,6 +419,22 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "--enabled-tools": { arity: "one", description: "Enabled tool" },
             "--resume": { arity: "one", description: "Resume session" },
             "--continue": { arity: "none", description: "Continue latest session" },
+            "--trust": {
+                arity: "none",
+                description: "Trust cwd for this invocation only (Phase 4 slice γ)",
+            },
+            "--max-turns": {
+                arity: "one",
+                pattern: /^[1-9][0-9]*$/,
+                description: "Agent-loop iteration cap (Phase 4 slice δ, programmatic mode only)",
+            },
+            "--max-price": {
+                arity: "one",
+                // Decimal-only: matches the MAX_PRICE_SCHEMA min(1e-6) lower bound
+                // that keeps String(N) in decimal form (no scientific notation).
+                pattern: /^(0|[1-9][0-9]*)(\.[0-9]+)?$/,
+                description: "Cumulative cost cap in USD (Phase 4 slice δ, programmatic mode only)",
+            },
         },
         env: {
             VIBE_ACTIVE_MODEL: {
@@ -378,6 +458,27 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 env: { CODEX_MODEL: "gpt-5.5" },
                 expect: "fail",
             },
+            {
+                id: "mistral-trust",
+                description: "Phase 4 slice γ: --trust is accepted",
+                args: ["-p", "hello", "--agent", "auto-approve", "--trust"],
+                env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
+                expect: "pass",
+            },
+            {
+                id: "mistral-max-turns-and-price",
+                description: "Phase 4 slice δ: --max-turns + --max-price are accepted together",
+                args: ["-p", "hello", "--agent", "auto-approve", "--max-turns", "3", "--max-price", "0.01"],
+                env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
+                expect: "pass",
+            },
+            {
+                id: "mistral-max-price-scientific-notation",
+                description: "Phase 4 slice δ: scientific-notation --max-price is rejected by contract pattern (matches MAX_PRICE_SCHEMA bounds)",
+                args: ["-p", "hello", "--agent", "auto-approve", "--max-price", "1e-7"],
+                env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
+                expect: "fail",
+            },
         ],
     },
 };
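
The new flag entries and conformance fixtures can be read as a small validation table: each flag has an arity and, optionally, an allowed `values` list or a `pattern`. The sketch below is one way such a table could be checked against an argv array. The checker `sketchCheckArgs` and the merged `sketchContract` object are hypothetical; the package's real conformance runner is not shown in this diff. The flag specs themselves are copied from the hunks above, with Gemini, Grok, and Mistral flags combined into one table purely for illustration.

```javascript
// Hypothetical merged table; flag specs copied from the diff above.
const sketchContract = {
  flags: {
    "-p": { arity: "one" },
    "-o": { arity: "one", values: ["json", "stream-json"] },
    "--skip-trust": { arity: "none" },
    "--max-turns": { arity: "one", pattern: /^[1-9][0-9]*$/ },
    "--max-price": { arity: "one", pattern: /^(0|[1-9][0-9]*)(\.[0-9]+)?$/ },
  },
};

// Hypothetical checker: returns "pass" or "fail" like the fixtures' `expect`.
function sketchCheckArgs(contract, args) {
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (!arg.startsWith("-")) continue; // positionals are ignored in this sketch
    const spec = contract.flags[arg];
    if (!spec) return "fail"; // unknown flag
    if (spec.arity === "one") {
      const value = args[++i]; // consume the flag's single value
      if (value === undefined) return "fail";
      if (spec.values && !spec.values.includes(value)) return "fail";
      if (spec.pattern && !spec.pattern.test(value)) return "fail";
    }
  }
  return "pass";
}

console.log(sketchCheckArgs(sketchContract, ["-p", "hello", "-o", "stream-json"])); // pass
console.log(sketchCheckArgs(sketchContract, ["-p", "hello", "-o", "ndjson"])); // fail
console.log(sketchCheckArgs(sketchContract, ["-p", "hello", "--max-turns", "0"])); // fail
console.log(sketchCheckArgs(sketchContract, ["-p", "hello", "--max-price", "1e-7"])); // fail
```

Note that the `--max-price` pattern on its own accepts `"0"`; according to the diff's comment, the positive lower bound comes from `MAX_PRICE_SCHEMA`, so the pattern only guards the string form.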

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "llm-cli-gateway",
-  "version": "1.8.0",
+  "version": "1.10.0",
   "mcpName": "io.github.verivus-oss/llm-cli-gateway",
   "description": "MCP server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral Vibe CLIs with session management, retry logic, async job orchestration, durable job results, and cross-LLM validation.",
   "license": "MIT",