llm-cli-gateway 1.8.0 → 1.9.0

Files changed (7)

package/CHANGELOG.md +98 -0
package/dist/index.d.ts +62 -0
package/dist/index.js +72 -21
package/dist/request-helpers.d.ts +11 -0
package/dist/request-helpers.js +6 -0
package/dist/upstream-contracts.js +94 -9
package/package.json +1 -1

package/CHANGELOG.md CHANGED

@@ -2,6 +2,104 @@
 All notable changes to the llm-cli-gateway project.
+## [1.9.0] - 2026-05-27 — Phase 4 slice δ (budget/max-turns parity) + retroactive α/γ contract closure
+Ships the fourth Phase 4 slice (budget/max-turns parity for Grok and Mistral),
+and retroactively closes three latent contract gaps that shipped silently in
+v1.8.0 (slices α and γ). Five commits land together: the slice δ feature,
+two bounds-tightening fixes, a contract-table closure, and a test-veracity
+hardening pass driven by an iterative multi-LLM audit.
+### Added — `maxTurns` / `maxPrice` budget caps (slice δ)
+- `grok_request` and `grok_request_async` gain optional `maxTurns?: number`
+  → emits `grok --max-turns N`. Grok exposes no per-request budget flag,
+  so `--max-price` is Mistral-only.
+- `mistral_request` and `mistral_request_async` gain optional
+  `maxTurns?: number` → `vibe --max-turns N` AND `maxPrice?: number` →
+  `vibe --max-price DOLLARS`. Both apply only in programmatic mode (`-p`),
+  matching Vibe's documented constraint.
+- The Mistral stale-model recovery retry path (extracted into a pure
+  `buildMistralRetryPrep` helper) preserves all three slice-γ/δ flags
+  (`trust`, `maxTurns`, `maxPrice`) on the second attempt.
+- Defaults: undefined for all three new fields → no flag emitted →
+  existing callers see no behavioural change.
+### Fixed — Bounded numeric schemas for lossless argv stringification
+- Extracted two shared, exported Zod constants:
+  - `MAX_TURNS_SCHEMA = z.number().int().positive().safe().max(10_000)`
+  - `MAX_PRICE_SCHEMA = z.number().positive().finite().min(1e-6).max(10_000)`
+- The `.min(1e-6)` lower bound on price is exactly the point below which
+  `String(N)` switches from decimal to scientific notation
+  (`String(1e-6) === "0.000001"` but `String(1e-7) === "1e-7"`); both
+  upstream CLIs reject scientific-notation values.
+- Reused across all four slice-δ tool registrations so bounds stay
+  consistent if they ever need to change.
+### Fixed — Upstream contract table closes 5 latent flag gaps
+`assertUpstreamCliArgs` consults `UPSTREAM_CLI_CONTRACTS` on every real
+`*_request` call. The following flags / mcpParameters were never registered
+there before this release, so production calls setting any of them threw
+"Upstream contract violation" at runtime even though the prepare-function
+unit tests passed:
+- **Gemini** (slice γ retroactive): `skipTrust` + `--skip-trust`.
+- **Mistral** (slice γ + δ retroactive): `trust` + `--trust`; `maxTurns` +
+  `--max-turns`; `maxPrice` + `--max-price` (with a strict decimal-only
+  regex matching `MAX_PRICE_SCHEMA`'s lower bound).
+- **Grok** (slice δ): `maxTurns` + `--max-turns`.
+- **Codex** (slice α retroactive): `--output-schema` and `-c` removed
+  from `resumeForbiddenFlags` — verified accepted on `codex exec resume`
+  per codex-cli 0.133.0.
+Conformance fixtures pin each new flag's argv shape, including a
+`mistral-max-price-scientific-notation` fixture that locks the `1e-7`
+rejection at the contract layer.
+### Hardened — Test veracity (multi-LLM audit follow-up)
+Codex + Grok ran iterative test-veracity audits with mutation probes per
+`docs/plans/test-veracity-audit.spec.md`. They proved several added tests
+were not falsifiable on the dimensions their commit messages claimed.
+New file `src/__tests__/test-veracity-regressions.test.ts` closes those
+gaps with six describe blocks:
+- **REGRESSIONS A** — probes registered tool `inputSchema` bounds
+  directly (not the bare schema constants), so schema-drift in any of
+  the four sync/async registrations is caught.
+- **REGRESSIONS B** — tests the pure `buildMistralRetryPrep` helper
+  across all combinations of `trust × maxTurns × maxPrice`. Self-
+  validated: dropping any of the three forwards on retry goes red.
+- **REGRESSIONS C** — positive allowlist asserting slice α/γ/δ
+  parameters live in the matching contract's `mcpParameters` (closes
+  the self-oracle gap where removing a param from BOTH the contract
+  AND the schema previously stayed green).
+- **REGRESSIONS D** — threads `prepare*Request` output into
+  `validateUpstreamCliArgs` end-to-end; the exact consistency check
+  the latent v1.8.0 contract breaks would have failed.
+- **REGRESSIONS E** — `it.each` over sync AND async variants of every
+  slice-touched tool; the existing C4 was sync-only.
+- **REGRESSIONS F** — flag-fixture coverage map: every flag in each
+  contract `flags` table must be exercised by a passing fixture (with
+  a grandfathered pre-audit baseline). Forces future slice authors to
+  add a fixture alongside any new flag entry.
+The existing C4 (`MCP request schemas expose the provider contract
+parameters`) now walks `_async` tools too.
+### Notes
+Multi-LLM review across multiple iterative rounds, ending with a
+dedicated test-veracity audit per Werner's strict-evidence protocol
+(documented in `docs/plans/test-veracity-audit.spec.md`). Round 2 of the
+audit landed UNCONDITIONAL APPROVE from Codex, Grok, Claude, and Mistral
+with full mutation-probe evidence — every documented counterexample
+mutation went red as predicted; tests are falsifiable by exactly the
+regressions they claim to guard against. Gemini was quota-exhausted
+during the audit window (~6h reset) and did not participate in round 2.
 ## [1.8.0] - 2026-05-27 — Phase 4 openers (codex resume fix, mistral telemetry, headless trust flags)
 Ships the first three slices of the Phase 4 provider-modernisation
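
The decimal/scientific boundary described in the 1.9.0 notes can be checked in plain JavaScript. This is a minimal sketch and not the gateway's code: `isValidMaxPrice` is a hypothetical stand-in that mirrors the documented `MAX_PRICE_SCHEMA` bounds without depending on Zod.

```javascript
// Hypothetical stand-in for the documented MAX_PRICE_SCHEMA bounds
// (positive, finite, >= 1e-6, <= 10_000). Not the gateway's implementation.
function isValidMaxPrice(value) {
  return (
    typeof value === "number" &&
    Number.isFinite(value) &&
    value >= 1e-6 &&
    value <= 10_000
  );
}

// JavaScript prints numbers smaller than 1e-6 in exponent form.
const atBoundary = String(1e-6);    // "0.000001"
const belowBoundary = String(1e-7); // "1e-7"

console.log(atBoundary, belowBoundary);
console.log(isValidMaxPrice(0.01), isValidMaxPrice(1e-7), isValidMaxPrice(Infinity));
```

Under these assumptions, any value the check accepts stringifies to a plain decimal, which is the property the changelog relies on when the number is handed to `--max-price`.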

package/dist/index.d.ts CHANGED

@@ -54,6 +54,19 @@ declare const logger: {
     debug: (message: string, ...args: any[]) => void;
 };
 type GatewayLogger = typeof logger;
+/**
+ * Phase 4 slice δ — shared Zod fragments for `maxTurns` / `maxPrice`.
+ *
+ * Both flags reach the upstream CLIs as decimal-formatted argv strings via
+ * `String(N)`. `z.number().int().positive()` alone admits values past
+ * `Number.MAX_SAFE_INTEGER`, and from 1e21 upward `String(N)` emits
+ * scientific notation that Grok and Vibe both reject. The bounds below
+ * (safe-integer cap + 10000 ceiling for turns; finite + 10000 USD ceiling
+ * for price) guarantee a lossless decimal stringification AND a sane
+ * upper bound — no plausible single agent loop exceeds 10k turns or 10k USD.
+ */
+export declare const MAX_TURNS_SCHEMA: z.ZodNumber;
+export declare const MAX_PRICE_SCHEMA: z.ZodNumber;
 export declare const SESSION_PROVIDER_VALUES: readonly ["claude", "codex", "gemini", "grok", "mistral"];
 export declare const SESSION_PROVIDER_ENUM: z.ZodEnum<["claude", "codex", "gemini", "grok", "mistral"]>;
 export type SessionProvider = (typeof SESSION_PROVIDER_VALUES)[number];
@@ -215,6 +228,29 @@ export declare function prepareGeminiRequest(params: {
      */
     skipTrust?: boolean;
 }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
+export declare function prepareGrokRequest(params: {
+    prompt?: string;
+    promptParts?: PromptParts;
+    model?: string;
+    outputFormat?: string;
+    alwaysApprove?: boolean;
+    permissionMode?: string;
+    effort?: string;
+    reasoningEffort?: string;
+    allowedTools?: string[];
+    disallowedTools?: string[];
+    approvalStrategy: "legacy" | "mcp_managed";
+    approvalPolicy?: string;
+    mcpServers?: ClaudeMcpServerName[];
+    correlationId?: string;
+    optimizePrompt: boolean;
+    operation: string;
+    /**
+     * Phase 4 slice δ: emit `--max-turns N` so callers can cap agent-loop
+     * iterations for cost / latency control. Mirrors Claude's wiring.
+     */
+    maxTurns?: number;
+}, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
 export declare function prepareMistralRequest(params: {
     prompt?: string;
     promptParts?: PromptParts;
@@ -236,9 +272,29 @@ export declare function prepareMistralRequest(params: {
      * prompt for this invocation only (not persisted). Default undefined.
      */
     trust?: boolean;
+    /** Phase 4 slice δ: Vibe `--max-turns N` cap on agent-loop iterations. */
+    maxTurns?: number;
+    /** Phase 4 slice δ: Vibe `--max-price DOLLARS` cumulative-cost cap. */
+    maxPrice?: number;
 }, runtime?: GatewayServerRuntime): (CliRequestPrep & {
     mistralEnv: Record<string, string>;
 }) | ExtendedToolResponse;
+/**
+ * Phase 4 slice δ post-review: pure helper extracted from
+ * `handleMistralRequest` so the retry-path arg-preservation invariants
+ * (trust + maxTurns + maxPrice from slices γ/δ) are unit-testable
+ * without mocking awaitJobOrDefer. Any param the wrapper threads into
+ * the FIRST `buildMistralCliInvocation` call MUST also be threaded
+ * through here, or a fresh-workspace / budgeted run can degrade on
+ * the second attempt.
+ */
+export declare function buildMistralRetryPrep(params: Pick<MistralRequestParams, "outputFormat" | "permissionMode" | "effort" | "reasoningEffort" | "allowedTools" | "disallowedTools" | "approvalStrategy" | "trust" | "maxTurns" | "maxPrice"> & {
+    effectivePrompt: string;
+}, recoveryModel: string): {
+    args: string[];
+    env: Record<string, string>;
+    ignoredDisallowedTools: boolean;
+};
 export interface GeminiRequestParams {
     prompt?: string;
     promptParts?: PromptParts;
@@ -303,6 +359,8 @@ export interface GrokRequestParams {
     optimizeResponse?: boolean;
     idleTimeoutMs?: number;
     forceRefresh?: boolean;
+    /** Phase 4 slice δ: cap agent-loop iterations via `--max-turns N`. */
+    maxTurns?: number;
 }
 export declare function handleGrokRequest(deps: HandlerDeps, params: GrokRequestParams): Promise<ExtendedToolResponse>;
 export declare function handleGrokRequestAsync(deps: AsyncHandlerDeps, params: Omit<GrokRequestParams, "optimizeResponse">): Promise<ExtendedToolResponse>;
@@ -329,6 +387,10 @@ export interface MistralRequestParams {
     forceRefresh?: boolean;
     /** Phase 4 slice γ: emit `--trust` for fresh-workspace headless runs. */
     trust?: boolean;
+    /** Phase 4 slice δ: Vibe `--max-turns N` cap on agent-loop iterations. */
+    maxTurns?: number;
+    /** Phase 4 slice δ: Vibe `--max-price DOLLARS` cumulative-cost cap. */
+    maxPrice?: number;
 }
 export declare function handleMistralRequest(deps: HandlerDeps, params: MistralRequestParams): Promise<ExtendedToolResponse>;
 export declare function handleMistralRequestAsync(deps: AsyncHandlerDeps, params: Omit<MistralRequestParams, "optimizeResponse">): Promise<ExtendedToolResponse>;
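
The doc comment on `MAX_TURNS_SCHEMA` concerns the large-integer side of `String(N)`. The sketch below illustrates that behaviour in plain JavaScript; `isValidMaxTurns` is a hypothetical stand-in for the documented bounds (positive safe integer, at most 10000), not the exported Zod schema.

```javascript
// Hypothetical stand-in for the documented MAX_TURNS_SCHEMA bounds.
// Not the gateway's implementation.
function isValidMaxTurns(value) {
  return Number.isSafeInteger(value) && value > 0 && value <= 10_000;
}

const largestSafe = String(Number.MAX_SAFE_INTEGER); // still plain decimal
const stillDecimal = String(1e20);                   // 21 digits, still decimal
const exponentForm = String(1e21);                   // switches to exponent form

// Past 2^53, neighbouring integers collapse onto the same double.
const precisionLost = 2 ** 53 + 1 === 2 ** 53;

console.log(largestSafe, stillDecimal, exponentForm, precisionLost);
console.log(isValidMaxTurns(5), isValidMaxTurns(0), isValidMaxTurns(1e21));
```

Between `Number.MAX_SAFE_INTEGER` and 1e21 the string stays decimal but the integer is no longer exact, so a safe-integer check rules out both failure modes at once.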

package/dist/index.js CHANGED

@@ -229,6 +229,23 @@ function getApprovalManager(runtimeLogger = logger) {
     return approvalManager;
 }
 const MCP_SERVER_ENUM = z.enum(CLAUDE_MCP_SERVER_NAMES);
+/**
+ * Phase 4 slice δ — shared Zod fragments for `maxTurns` / `maxPrice`.
+ *
+ * Both flags reach the upstream CLIs as decimal-formatted argv strings via
+ * `String(N)`. `z.number().int().positive()` alone admits values past
+ * `Number.MAX_SAFE_INTEGER`, and from 1e21 upward `String(N)` emits
+ * scientific notation that Grok and Vibe both reject. The bounds below
+ * (safe-integer cap + 10000 ceiling for turns; finite + 10000 USD ceiling
+ * for price) guarantee a lossless decimal stringification AND a sane
+ * upper bound — no plausible single agent loop exceeds 10k turns or 10k USD.
+ */
+export const MAX_TURNS_SCHEMA = z.number().int().positive().safe().max(10_000);
+// `.min(1e-6)` keeps the value in JS's decimal-stringify range:
+// String(1e-6) === "0.000001" but String(1e-7) === "1e-7", which both
+// upstream CLIs would reject. 1µUSD per request is fine-grained enough
+// for any plausible budget-cap use.
+export const MAX_PRICE_SCHEMA = z.number().positive().finite().min(1e-6).max(10_000);
 // U22: Session-provider enum extended to five providers. The storage layer's
 // CLI_TYPES already includes "mistral"; the MCP-tool layer mirrors that here so
 // session_create / session_list / session_clear_all accept the fifth provider.
@@ -1273,7 +1290,7 @@ export function prepareGeminiRequest(params, runtime = resolveGatewayServerRunti
         stablePrefixTokens,
     };
 }
-function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
+export function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
     const corrId = params.correlationId || randomUUID();
     const cliInfo = getCliInfo();
     const resolvedModel = resolveModelAlias("grok", params.model, cliInfo);
@@ -1349,6 +1366,9 @@ function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
     if (params.disallowedTools && params.disallowedTools.length > 0) {
         args.push("--disallowed-tools", params.disallowedTools.join(","));
     }
+    if (params.maxTurns !== undefined) {
+        args.push("--max-turns", String(params.maxTurns));
+    }
     return {
         corrId,
         effectivePrompt,
@@ -1433,6 +1453,8 @@ export function prepareMistralRequest(params, runtime = resolveGatewayServerRunt
         allowedTools: params.allowedTools,
         disallowedTools: params.disallowedTools,
         trust: params.trust,
+        maxTurns: params.maxTurns,
+        maxPrice: params.maxPrice,
     });
     if (prep.ignoredDisallowedTools) {
         runtime.logger.info(`[${corrId}] Mistral does not support disallowedTools; ignoring (caller passed ${params.disallowedTools?.length ?? 0} entries)`);
@@ -1463,6 +1485,32 @@ function selectMistralRecoveryModel(failedModel) {
     ].filter((model) => Boolean(model && model !== failedModel));
     return candidates.find(model => model !== "local");
 }
+/**
+ * Phase 4 slice δ post-review: pure helper extracted from
+ * `handleMistralRequest` so the retry-path arg-preservation invariants
+ * (trust + maxTurns + maxPrice from slices γ/δ) are unit-testable
+ * without mocking awaitJobOrDefer. Any param the wrapper threads into
+ * the FIRST `buildMistralCliInvocation` call MUST also be threaded
+ * through here, or a fresh-workspace / budgeted run can degrade on
+ * the second attempt.
+ */
+export function buildMistralRetryPrep(params, recoveryModel) {
+    return buildMistralCliInvocation({
+        prompt: params.effectivePrompt,
+        resolvedModel: recoveryModel,
+        outputFormat: params.outputFormat,
+        permissionMode: params.approvalStrategy === "mcp_managed"
+            ? "auto-approve"
+            : (params.permissionMode ?? "auto-approve"),
+        effort: params.effort,
+        reasoningEffort: params.reasoningEffort,
+        allowedTools: params.allowedTools,
+        disallowedTools: params.disallowedTools,
+        trust: params.trust,
+        maxTurns: params.maxTurns,
+        maxPrice: params.maxPrice,
+    });
+}
 function buildCliResponse(cli, stdout, optimizeResponse, corrId, sessionId, prep, durationMs, resumable, outputFormat, warnings) {
     let finalStdout = stdout;
     // Skip response optimization for JSON output to prevent corrupting structured data
@@ -1801,6 +1849,7 @@ export async function handleGrokRequest(deps, params) {
         correlationId: params.correlationId,
         optimizePrompt: params.optimizePrompt,
         operation: "grok_request",
+        maxTurns: params.maxTurns,
     }, runtime);
     if (!("args" in prep))
         return prep;
@@ -1921,6 +1970,7 @@ export async function handleGrokRequestAsync(deps, params) {
         correlationId: params.correlationId,
         optimizePrompt: params.optimizePrompt,
         operation: "grok_request_async",
+        maxTurns: params.maxTurns,
     }, runtime);
     if (!("args" in prep))
         return prep;
@@ -2003,6 +2053,8 @@ export async function handleMistralRequest(deps, params) {
         optimizePrompt: params.optimizePrompt,
         operation: "mistral_request",
         trust: params.trust,
+        maxTurns: params.maxTurns,
+        maxPrice: params.maxPrice,
     }, runtime);
     if (!("args" in prep))
         return prep;
@@ -2035,22 +2087,7 @@ export async function handleMistralRequest(deps, params) {
             const recoveryModel = selectMistralRecoveryModel(prep.resolvedModel);
             if (recoveryModel) {
                 deps.logger.info(`[${corrId}] mistral_request detected stale Vibe model selection; retrying once with ${recoveryModel}`);
-                const retryPrep = buildMistralCliInvocation({
-                    prompt: prep.effectivePrompt,
-                    resolvedModel: recoveryModel,
-                    outputFormat: params.outputFormat,
-                    permissionMode: params.approvalStrategy === "mcp_managed"
-                        ? "auto-approve"
-                        : (params.permissionMode ?? "auto-approve"),
-                    effort: params.effort,
-                    reasoningEffort: params.reasoningEffort,
-                    allowedTools: params.allowedTools,
-                    disallowedTools: params.disallowedTools,
-                    // Phase 4 slice γ: preserve --trust on the model-selection retry
-                    // so a fresh untrusted workspace doesn't block headlessly on the
-                    // second attempt after surviving the first.
-                    trust: params.trust,
-                });
+                const retryPrep = buildMistralRetryPrep({ ...params, effectivePrompt: prep.effectivePrompt }, recoveryModel);
                 const retryArgs = [...retryPrep.args, ...sessionResult.resumeArgs];
                 // Reuse the FR handoff built above — the retry preserves corrId,
                 // so the manager's logComplete still updates the original row.
@@ -2151,6 +2188,8 @@ export async function handleMistralRequestAsync(deps, params) {
         optimizePrompt: params.optimizePrompt,
         operation: "mistral_request_async",
         trust: params.trust,
+        maxTurns: params.maxTurns,
+        maxPrice: params.maxPrice,
     }, runtime);
     if (!("args" in prep))
         return prep;
@@ -3142,7 +3181,8 @@ export function createGatewayServer(deps = {}) {
             .boolean()
             .default(false)
             .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
-    }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, }) => {
+        maxTurns: MAX_TURNS_SCHEMA.optional().describe("Grok `--max-turns N`: cap on agent-loop iterations for cost / latency control (Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
+    }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, }) => {
         return handleGrokRequest({ sessionManager, logger, runtime }, {
             prompt,
             promptParts,
@@ -3165,6 +3205,7 @@ export function createGatewayServer(deps = {}) {
             optimizeResponse,
             idleTimeoutMs,
             forceRefresh,
+            maxTurns,
         });
     });
     //──────────────────────────────────────────────────────────────────────────────
@@ -3242,7 +3283,9 @@ export function createGatewayServer(deps = {}) {
             .boolean()
             .default(false)
             .describe("Emit `--trust` so Vibe trusts the cwd for this invocation only (not persisted to trusted_folders.toml) and skips the interactive trust prompt (Phase 4 slice γ)."),
-    }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, trust, }) => {
+        maxTurns: MAX_TURNS_SCHEMA.optional().describe("Vibe `--max-turns N`: cap the agent-loop iteration count (programmatic mode only, Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
+        maxPrice: MAX_PRICE_SCHEMA.optional().describe("Vibe `--max-price DOLLARS`: interrupt the session when cumulative cost crosses this cap (programmatic mode only, Phase 4 slice δ). Bounded to finite values ≤ 10000 USD."),
+    }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, trust, maxTurns, maxPrice, }) => {
         return handleMistralRequest({ sessionManager, logger, runtime }, {
             prompt,
             promptParts,
@@ -3265,6 +3308,8 @@ export function createGatewayServer(deps = {}) {
             idleTimeoutMs,
             forceRefresh,
             trust,
+            maxTurns,
+            maxPrice,
         });
     });
     //──────────────────────────────────────────────────────────────────────────────
@@ -3753,7 +3798,8 @@ export function createGatewayServer(deps = {}) {
                 .boolean()
                 .default(false)
                 .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
-        }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, }) => {
+            maxTurns: MAX_TURNS_SCHEMA.optional().describe("Grok `--max-turns N`: cap on agent-loop iterations for cost / latency control (Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
+        }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, }) => {
             return handleGrokRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
                 prompt,
                 promptParts,
@@ -3775,6 +3821,7 @@ export function createGatewayServer(deps = {}) {
                 optimizePrompt,
                 idleTimeoutMs,
                 forceRefresh,
+                maxTurns,
             });
         });
         server.tool("mistral_request_async", {
@@ -3848,7 +3895,9 @@ export function createGatewayServer(deps = {}) {
                 .boolean()
                 .default(false)
                 .describe("Emit `--trust` so Vibe trusts the cwd for this invocation only (not persisted to trusted_folders.toml) and skips the interactive trust prompt (Phase 4 slice γ)."),
-        }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, trust, }) => {
+            maxTurns: MAX_TURNS_SCHEMA.optional().describe("Vibe `--max-turns N`: cap the agent-loop iteration count (programmatic mode only, Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
+            maxPrice: MAX_PRICE_SCHEMA.optional().describe("Vibe `--max-price DOLLARS`: interrupt the session when cumulative cost crosses this cap (programmatic mode only, Phase 4 slice δ). Bounded to finite values ≤ 10000 USD."),
+        }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, trust, maxTurns, maxPrice, }) => {
             return handleMistralRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
                 prompt,
                 promptParts,
@@ -3870,6 +3919,8 @@ export function createGatewayServer(deps = {}) {
                 idleTimeoutMs,
                 forceRefresh,
                 trust,
+                maxTurns,
+                maxPrice,
             });
         });
         server.tool("llm_job_status", {
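
The retry-path invariant that `buildMistralRetryPrep` is meant to protect (the second attempt carries the same `trust`, `maxTurns`, and `maxPrice` flags as the first) can be sketched as follows. `buildInvocation` is a hypothetical stand-in for `buildMistralCliInvocation`, whose body is not part of this diff, and the model names are placeholders.

```javascript
// Hypothetical stand-in for buildMistralCliInvocation; only the three
// slice γ/δ flags are modelled here.
function buildInvocation({ resolvedModel, trust, maxTurns, maxPrice }) {
  const args = [];
  if (trust) args.push("--trust");
  if (maxTurns !== undefined) args.push("--max-turns", String(maxTurns));
  if (maxPrice !== undefined) args.push("--max-price", String(maxPrice));
  return { args, model: resolvedModel };
}

// Same shape as the extracted helper: caller params plus a recovery model.
function buildRetryPrep(params, recoveryModel) {
  return buildInvocation({
    resolvedModel: recoveryModel,
    trust: params.trust,
    maxTurns: params.maxTurns,
    maxPrice: params.maxPrice,
  });
}

// Walk every combination of the three optional fields (2 x 2 x 2 = 8).
let combinations = 0;
let mismatches = 0;
for (const trust of [undefined, true]) {
  for (const maxTurns of [undefined, 3]) {
    for (const maxPrice of [undefined, 0.01]) {
      const params = { trust, maxTurns, maxPrice };
      const first = buildInvocation({ ...params, resolvedModel: "model-a" });
      const retry = buildRetryPrep(params, "model-b");
      combinations += 1;
      if (JSON.stringify(first.args) !== JSON.stringify(retry.args)) mismatches += 1;
    }
  }
}

console.log(`checked ${combinations} combinations, ${mismatches} mismatches`);
```

Deleting any one of the three forwarded fields in `buildRetryPrep` makes `mismatches` non-zero, which is the kind of mutation the changelog's REGRESSIONS B block is described as catching.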

package/dist/request-helpers.d.ts CHANGED

@@ -114,6 +114,17 @@ export interface PrepareMistralRequestInput {
      * Vibe's prompt behaviour is preserved for existing callers.
      */
     trust?: boolean;
+    /**
+     * Phase 4 slice δ: emit `--max-turns N` to cap the agent-loop iteration
+     * count (only applies in programmatic mode with `-p`).
+     */
+    maxTurns?: number;
+    /**
+     * Phase 4 slice δ: emit `--max-price DOLLARS` so the session is
+     * interrupted when cumulative cost crosses the cap (programmatic mode
+     * only).
+     */
+    maxPrice?: number;
 }
 export interface PrepareMistralRequestResult {
     args: string[];

package/dist/request-helpers.js CHANGED

@@ -179,6 +179,12 @@ export function prepareMistralRequest(input) {
     if (input.trust) {
         args.push("--trust");
     }
+    if (input.maxTurns !== undefined) {
+        args.push("--max-turns", String(input.maxTurns));
+    }
+    if (input.maxPrice !== undefined) {
+        args.push("--max-price", String(input.maxPrice));
+    }
     const ignoredDisallowedTools = Boolean(input.disallowedTools && input.disallowedTools.length > 0);
     return { args, env, ignoredDisallowedTools };
 }

package/dist/upstream-contracts.js CHANGED

@@ -133,14 +133,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "ignoreRules",
         ],
         resumeOnlyFlags: ["--last"],
-        resumeForbiddenFlags: [
-            "--sandbox",
-            "--ask-for-approval",
-            "--full-auto",
-            "--output-schema",
-            "--search",
-            "-c",
-        ],
+        // Phase 4 slice α (v1.8.0) verified that `codex exec resume` accepts
+        // `--output-schema` and `-c` (codex-cli 0.133.0 `exec resume --help`),
+        // so they're no longer forbidden. `--search` stays forbidden (resume
+        // inherits the original session's web-search state).
+        resumeForbiddenFlags: ["--sandbox", "--ask-for-approval", "--full-auto", "--search"],
         flags: {
             "--last": { arity: "none", description: "Resume latest session" },
             "--model": { arity: "one", description: "Model selector" },
@@ -189,9 +186,24 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 expect: "fail",
             },
             {
+                // Phase 4 slice α: --output-schema IS accepted on resume per
+                // codex-cli 0.133.0; this fixture pins the new behaviour so future
+                // contract changes can't silently regress.
                 id: "codex-resume-output-schema",
-                description: "Resume-incompatible output schema flag is rejected",
+                description: "Phase 4 slice α: --output-schema accepted on resume (codex-cli 0.133.0)",
                 args: ["exec", "resume", "--output-schema", "/tmp/schema.json", "session-id", "hello"],
+                expect: "pass",
+            },
+            {
+                id: "codex-resume-config-override",
+                description: "Phase 4 slice α: -c key=value accepted on resume",
+                args: ["exec", "resume", "-c", "model.foo=bar", "session-id", "hello"],
+                expect: "pass",
+            },
+            {
+                id: "codex-resume-search-still-forbidden",
+                description: "Phase 4 slice α: --search remains forbidden on resume",
+                args: ["exec", "resume", "--search", "session-id", "hello"],
                 expect: "fail",
             },
         ],
@@ -219,6 +231,8 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "policyFiles",
             "adminPolicyFiles",
             "attachments",
+            // Phase 4 slice γ
+            "skipTrust",
         ],
         flags: {
             "-p": { arity: "one", description: "Prompt text" },
@@ -236,6 +250,10 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "--admin-policy": { arity: "one", description: "Admin policy file path" },
             "-o": { arity: "one", values: ["json"], description: "Output format" },
             "--resume": { arity: "one", description: "Resume session" },
+            "--skip-trust": {
+                arity: "none",
+                description: "Trust workspace for this session (Phase 4 slice γ)",
+            },
         },
         env: {},
         conformanceFixtures: [
@@ -251,6 +269,12 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 args: ["-p", "hello", "--not-a-gemini-flag"],
                 expect: "fail",
             },
+            {
+                id: "gemini-skip-trust",
+                description: "Phase 4 slice γ: --skip-trust is accepted",
+                args: ["-p", "hello", "--skip-trust"],
+                expect: "pass",
+            },
         ],
     },
     grok: {
@@ -275,6 +299,8 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "mcpServers",
             "allowedTools",
             "disallowedTools",
+            // Phase 4 slice δ
+            "maxTurns",
         ],
         flags: {
             "-p": { arity: "one", description: "Prompt text" },
@@ -299,6 +325,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
             },
             "--resume": { arity: "one", description: "Resume session" },
             "--continue": { arity: "none", description: "Continue latest session" },
+            "--max-turns": {
+                arity: "one",
+                pattern: /^[1-9][0-9]*$/,
+                description: "Agent-loop iteration cap (Phase 4 slice δ)",
+            },
         },
         env: {},
         conformanceFixtures: [
@@ -314,6 +345,18 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 args: ["-p", "hello", "--not-a-grok-flag"],
                 expect: "fail",
             },
+            {
+                id: "grok-max-turns",
+                description: "Phase 4 slice δ: --max-turns N is accepted",
+                args: ["-p", "hello", "--max-turns", "5"],
+                expect: "pass",
+            },
+            {
+                id: "grok-max-turns-invalid-zero",
+                description: "Phase 4 slice δ: --max-turns 0 is rejected by contract pattern",
+                args: ["-p", "hello", "--max-turns", "0"],
+                expect: "fail",
+            },
         ],
     },
     mistral: {
@@ -337,6 +380,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "mcpServers",
             "allowedTools",
             "disallowedTools",
+            // Phase 4 slice γ
+            "trust",
+            // Phase 4 slice δ
+            "maxTurns",
+            "maxPrice",
         ],
         flags: {
             "-p": { arity: "one", description: "Prompt text" },
@@ -355,6 +403,22 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "--enabled-tools": { arity: "one", description: "Enabled tool" },
             "--resume": { arity: "one", description: "Resume session" },
             "--continue": { arity: "none", description: "Continue latest session" },
+            "--trust": {
+                arity: "none",
+                description: "Trust cwd for this invocation only (Phase 4 slice γ)",
+            },
+            "--max-turns": {
+                arity: "one",
+                pattern: /^[1-9][0-9]*$/,
+                description: "Agent-loop iteration cap (Phase 4 slice δ, programmatic mode only)",
+            },
+            "--max-price": {
+                arity: "one",
+                // Decimal-only: matches the MAX_PRICE_SCHEMA min(1e-6) lower bound
+                // that keeps String(N) in decimal form (no scientific notation).
+                pattern: /^(0|[1-9][0-9]*)(\.[0-9]+)?$/,
+                description: "Cumulative cost cap in USD (Phase 4 slice δ, programmatic mode only)",
+            },
         },
         env: {
             VIBE_ACTIVE_MODEL: {
@@ -378,6 +442,27 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 env: { CODEX_MODEL: "gpt-5.5" },
                 expect: "fail",
             },
+            {
+                id: "mistral-trust",
+                description: "Phase 4 slice γ: --trust is accepted",
+                args: ["-p", "hello", "--agent", "auto-approve", "--trust"],
+                env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
+                expect: "pass",
+            },
+            {
+                id: "mistral-max-turns-and-price",
+                description: "Phase 4 slice δ: --max-turns + --max-price are accepted together",
+                args: ["-p", "hello", "--agent", "auto-approve", "--max-turns", "3", "--max-price", "0.01"],
+                env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
+                expect: "pass",
+            },
+            {
+                id: "mistral-max-price-scientific-notation",
+                description: "Phase 4 slice δ: scientific-notation --max-price is rejected by contract pattern (matches MAX_PRICE_SCHEMA bounds)",
+                args: ["-p", "hello", "--agent", "auto-approve", "--max-price", "1e-7"],
+                env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
+                expect: "fail",
+            },
         ],
     },
 };
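
To see how the new `pattern` entries behave against the conformance fixtures, the sketch below runs a simplified argv check. The two regular expressions are copied from the contract entries in this diff; `checkArgs` is a hypothetical, reduced checker (the real `validateUpstreamCliArgs` is not shown here and may differ), and the arity assumed for `--agent` is a guess based on how the fixtures use it.

```javascript
// Patterns copied from the --max-turns and --max-price contract entries.
const MAX_TURNS_PATTERN = /^[1-9][0-9]*$/;
const MAX_PRICE_PATTERN = /^(0|[1-9][0-9]*)(\.[0-9]+)?$/;

// Hypothetical checker: rejects unknown flags and values failing a pattern.
function checkArgs(flags, args) {
  for (let i = 0; i < args.length; i += 1) {
    const token = args[i];
    if (!token.startsWith("-")) continue; // treat as a positional argument
    const spec = flags[token];
    if (!spec) return { ok: false, reason: `unknown flag ${token}` };
    if (spec.arity === "one") {
      const value = args[i + 1];
      if (value === undefined) return { ok: false, reason: `${token} needs a value` };
      if (spec.pattern && !spec.pattern.test(value)) {
        return { ok: false, reason: `${token} rejects value ${value}` };
      }
      i += 1; // skip the value that was just consumed
    }
  }
  return { ok: true };
}

// Reduced flag table; "--agent" arity is an assumption.
const mistralFlags = {
  "-p": { arity: "one" },
  "--agent": { arity: "one" },
  "--trust": { arity: "none" },
  "--max-turns": { arity: "one", pattern: MAX_TURNS_PATTERN },
  "--max-price": { arity: "one", pattern: MAX_PRICE_PATTERN },
};

const base = ["-p", "hello", "--agent", "auto-approve"];
const accepted = checkArgs(mistralFlags, [...base, "--max-turns", "3", "--max-price", "0.01"]);
const scientific = checkArgs(mistralFlags, [...base, "--max-price", "1e-7"]);
const zeroTurns = checkArgs(mistralFlags, [...base, "--max-turns", "0"]);

console.log(accepted.ok, scientific.ok, zeroTurns.ok);
```

The three calls correspond to the `mistral-max-turns-and-price`, `mistral-max-price-scientific-notation`, and zero-turns style fixtures: the first passes, and the other two are rejected by the patterns alone.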

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "llm-cli-gateway",
-  "version": "1.8.0",
+  "version": "1.9.0",
   "mcpName": "io.github.verivus-oss/llm-cli-gateway",
   "description": "MCP server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral Vibe CLIs with session management, retry logic, async job orchestration, durable job results, and cross-LLM validation.",
   "license": "MIT",