llm-cli-gateway 1.8.0 → 1.9.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,104 @@
2
2
 
3
3
  All notable changes to the llm-cli-gateway project.
4
4
 
5
+ ## [1.9.0] - 2026-05-27 — Phase 4 slice δ (budget/max-turns parity) + retroactive α/γ contract closure
6
+
7
+ Ships the fourth Phase 4 slice (budget/max-turns parity for Grok and Mistral),
8
+ and retroactively closes three latent contract gaps that shipped silently in
9
+ v1.8.0 (slices α and γ). Five commits land together: the slice δ feature,
10
+ two bounds-tightening fixes, a contract-table closure, and a test-veracity
11
+ hardening pass driven by an iterative multi-LLM audit.
12
+
13
+ ### Added — `maxTurns` / `maxPrice` budget caps (slice δ)
14
+
15
+ - `grok_request` and `grok_request_async` gain optional `maxTurns?: number`
16
+ → emits `grok --max-turns N`. Grok exposes no per-request budget flag,
17
+ so `--max-price` is Mistral-only.
18
+ - `mistral_request` and `mistral_request_async` gain optional
19
+ `maxTurns?: number` → `vibe --max-turns N` AND `maxPrice?: number` →
20
+ `vibe --max-price DOLLARS`. Both apply only in programmatic mode (`-p`),
21
+ matching Vibe's documented constraint.
22
+ - The Mistral stale-model recovery retry path (extracted into a pure
23
+ `buildMistralRetryPrep` helper) preserves all three slice-γ/δ flags
24
+ (`trust`, `maxTurns`, `maxPrice`) on the second attempt.
25
+ - Defaults: undefined for all three new fields → no flag emitted →
26
+ existing callers see no behavioural change.
27
+
28
+ ### Fixed — Bounded numeric schemas for lossless argv stringification
29
+
30
+ - Extracted two shared, exported Zod constants:
31
+ - `MAX_TURNS_SCHEMA = z.number().int().positive().safe().max(10_000)`
32
+ - `MAX_PRICE_SCHEMA = z.number().positive().finite().min(1e-6).max(10_000)`
33
+ - The `.min(1e-6)` lower bound on price is exactly the boundary where
34
+ `String(N)` switches from decimal to scientific notation
35
+ (`String(1e-6) === "0.000001"` but `String(1e-7) === "1e-7"`); both
36
+ upstream CLIs reject scientific-notation values.
37
+ - Reused across all four slice-δ tool registrations so bounds stay
38
+ consistent if they ever need to change.
39
+
40
+ ### Fixed — Upstream contract table closes 5 latent flag gaps
41
+
42
+ `assertUpstreamCliArgs` consults `UPSTREAM_CLI_CONTRACTS` on every real
43
+ `*_request` call. The following flags / mcpParameters were never registered
44
+ there before this release, so production calls setting any of them threw
45
+ "Upstream contract violation" at runtime even though the prepare-function
46
+ unit tests passed:
47
+
48
+ - **Gemini** (slice γ retroactive): `skipTrust` + `--skip-trust`.
49
+ - **Mistral** (slice γ + δ retroactive): `trust` + `--trust`; `maxTurns` +
50
+ `--max-turns`; `maxPrice` + `--max-price` (with a strict decimal-only
51
+ regex matching `MAX_PRICE_SCHEMA`'s lower bound).
52
+ - **Grok** (slice δ): `maxTurns` + `--max-turns`.
53
+ - **Codex** (slice α retroactive): `--output-schema` and `-c` removed
54
+ from `resumeForbiddenFlags` — verified accepted on `codex exec resume`
55
+ per codex-cli 0.133.0.
56
+
57
+ Conformance fixtures pin each new flag's argv shape, including a
58
+ `mistral-max-price-scientific-notation` fixture that locks the `1e-7`
59
+ rejection at the contract layer.
60
+
61
+ ### Hardened — Test veracity (multi-LLM audit follow-up)
62
+
63
+ Codex + Grok ran iterative test-veracity audits with mutation probes per
64
+ `docs/plans/test-veracity-audit.spec.md`. They proved several added tests
65
+ were not falsifiable on the dimensions their commit messages claimed.
66
+ New file `src/__tests__/test-veracity-regressions.test.ts` closes those
67
+ gaps with six describe blocks:
68
+
69
+ - **REGRESSIONS A** — probes registered tool `inputSchema` bounds
70
+ directly (not the bare schema constants), so schema-drift in any of
71
+ the four sync/async registrations is caught.
72
+ - **REGRESSIONS B** — tests the pure `buildMistralRetryPrep` helper
73
+ across all combinations of `trust × maxTurns × maxPrice`.
73
+ Self-validated: dropping any of the three forwards on retry goes red.
75
+ - **REGRESSIONS C** — positive allowlist asserting slice α/γ/δ
76
+ parameters live in the matching contract's `mcpParameters` (closes
77
+ the self-oracle gap where removing a param from BOTH the contract
78
+ AND the schema previously stayed green).
79
+ - **REGRESSIONS D** — threads `prepare*Request` output into
80
+ `validateUpstreamCliArgs` end-to-end; the exact consistency check
81
+ the latent v1.8.0 contract breaks would have failed.
82
+ - **REGRESSIONS E** — `it.each` over sync AND async variants of every
83
+ slice-touched tool; the existing C4 was sync-only.
84
+ - **REGRESSIONS F** — flag-fixture coverage map: every flag in each
85
+ contract `flags` table must be exercised by a passing fixture (with
86
+ a grandfathered pre-audit baseline). Forces future slice authors to
87
+ add a fixture alongside any new flag entry.
88
+
89
+ The existing C4 (`MCP request schemas expose the provider contract
90
+ parameters`) now walks `_async` tools too.
91
+
92
+ ### Notes
93
+
94
+ Multi-LLM review across multiple iterative rounds, ending with a
95
+ dedicated test-veracity audit per Werner's strict-evidence protocol
96
+ (documented in `docs/plans/test-veracity-audit.spec.md`). Round 2 of the
97
+ audit landed UNCONDITIONAL APPROVE from Codex, Grok, Claude, and Mistral
98
+ with full mutation-probe evidence — every documented counterexample
99
+ mutation went red as predicted; tests are falsifiable by exactly the
100
+ regressions they claim to guard against. Gemini was quota-exhausted
101
+ during the audit window (~6h reset) and did not participate in round 2.
102
+
5
103
  ## [1.8.0] - 2026-05-27 — Phase 4 openers (codex resume fix, mistral telemetry, headless trust flags)
6
104
 
7
105
  Ships the first three slices of the Phase 4 provider-modernisation
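The numeric bounds described in the 1.9.0 changelog entry can be illustrated without Zod. The sketch below is a hand-rolled approximation of `MAX_TURNS_SCHEMA` and `MAX_PRICE_SCHEMA`, not the package's code: the function names `isValidMaxTurns`, `isValidMaxPrice`, and `stringifiesAsDecimal` are hypothetical and exist only to show why the bounds keep `String(N)` in plain decimal form.

```typescript
// Hand-rolled mirrors of the documented bounds (assumption: the package
// itself enforces these through Zod, not through functions like these).
const MAX_CAP = 10_000;

// Mirrors MAX_TURNS_SCHEMA: a positive safe integer no larger than 10000.
function isValidMaxTurns(n: number): boolean {
  return Number.isSafeInteger(n) && n > 0 && n <= MAX_CAP;
}

// Mirrors MAX_PRICE_SCHEMA: finite, between 1e-6 and 10000 inclusive.
function isValidMaxPrice(n: number): boolean {
  return Number.isFinite(n) && n >= 1e-6 && n <= MAX_CAP;
}

// True when String(n) contains no exponent marker.
function stringifiesAsDecimal(n: number): boolean {
  return !/e/i.test(String(n));
}

console.log(String(1e-6)); // "0.000001"
console.log(String(1e-7)); // "1e-7"
console.log(String(1e21)); // "1e+21"
console.log(isValidMaxPrice(1e-7)); // false
```

Every value that passes either validator also stringifies without an exponent, which is the property the changelog calls lossless argv stringification.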
package/dist/index.d.ts CHANGED
@@ -54,6 +54,19 @@ declare const logger: {
54
54
  debug: (message: string, ...args: any[]) => void;
55
55
  };
56
56
  type GatewayLogger = typeof logger;
57
+ /**
58
+ * Phase 4 slice δ — shared Zod fragments for `maxTurns` / `maxPrice`.
59
+ *
60
+ * Both flags reach the upstream CLIs as decimal-formatted argv strings via
61
+ * `String(N)`. `z.number().int().positive()` alone lets values past
62
+ * `Number.MAX_SAFE_INTEGER` through, after which `String(1e21)` emits
63
+ * scientific notation that Grok and Vibe both reject. The bounds below
64
+ * (safe-integer cap + 10000 ceiling for turns; finite + 10000 USD ceiling
65
+ * for price) guarantee a lossless decimal stringification AND a sane
66
+ * upper bound — no plausible single agent loop exceeds 10k turns or 10k USD.
67
+ */
68
+ export declare const MAX_TURNS_SCHEMA: z.ZodNumber;
69
+ export declare const MAX_PRICE_SCHEMA: z.ZodNumber;
57
70
  export declare const SESSION_PROVIDER_VALUES: readonly ["claude", "codex", "gemini", "grok", "mistral"];
58
71
  export declare const SESSION_PROVIDER_ENUM: z.ZodEnum<["claude", "codex", "gemini", "grok", "mistral"]>;
59
72
  export type SessionProvider = (typeof SESSION_PROVIDER_VALUES)[number];
@@ -215,6 +228,29 @@ export declare function prepareGeminiRequest(params: {
215
228
  */
216
229
  skipTrust?: boolean;
217
230
  }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
231
+ export declare function prepareGrokRequest(params: {
232
+ prompt?: string;
233
+ promptParts?: PromptParts;
234
+ model?: string;
235
+ outputFormat?: string;
236
+ alwaysApprove?: boolean;
237
+ permissionMode?: string;
238
+ effort?: string;
239
+ reasoningEffort?: string;
240
+ allowedTools?: string[];
241
+ disallowedTools?: string[];
242
+ approvalStrategy: "legacy" | "mcp_managed";
243
+ approvalPolicy?: string;
244
+ mcpServers?: ClaudeMcpServerName[];
245
+ correlationId?: string;
246
+ optimizePrompt: boolean;
247
+ operation: string;
248
+ /**
249
+ * Phase 4 slice δ: emit `--max-turns N` so callers can cap agent-loop
250
+ * iterations for cost / latency control. Mirrors Claude's wiring.
251
+ */
252
+ maxTurns?: number;
253
+ }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
218
254
  export declare function prepareMistralRequest(params: {
219
255
  prompt?: string;
220
256
  promptParts?: PromptParts;
@@ -236,9 +272,29 @@ export declare function prepareMistralRequest(params: {
236
272
  * prompt for this invocation only (not persisted). Default undefined.
237
273
  */
238
274
  trust?: boolean;
275
+ /** Phase 4 slice δ: Vibe `--max-turns N` cap on agent-loop iterations. */
276
+ maxTurns?: number;
277
+ /** Phase 4 slice δ: Vibe `--max-price DOLLARS` cumulative-cost cap. */
278
+ maxPrice?: number;
239
279
  }, runtime?: GatewayServerRuntime): (CliRequestPrep & {
240
280
  mistralEnv: Record<string, string>;
241
281
  }) | ExtendedToolResponse;
282
+ /**
283
+ * Phase 4 slice δ post-review: pure helper extracted from
284
+ * `handleMistralRequest` so the retry-path arg-preservation invariants
285
+ * (trust + maxTurns + maxPrice from slices γ/δ) are unit-testable
286
+ * without mocking awaitJobOrDefer. Any param the wrapper threads into
287
+ * the FIRST `buildMistralCliInvocation` call MUST also be threaded
288
+ * through here, or a fresh-workspace / budgeted run can degrade on
289
+ * the second attempt.
290
+ */
291
+ export declare function buildMistralRetryPrep(params: Pick<MistralRequestParams, "outputFormat" | "permissionMode" | "effort" | "reasoningEffort" | "allowedTools" | "disallowedTools" | "approvalStrategy" | "trust" | "maxTurns" | "maxPrice"> & {
292
+ effectivePrompt: string;
293
+ }, recoveryModel: string): {
294
+ args: string[];
295
+ env: Record<string, string>;
296
+ ignoredDisallowedTools: boolean;
297
+ };
242
298
  export interface GeminiRequestParams {
243
299
  prompt?: string;
244
300
  promptParts?: PromptParts;
@@ -303,6 +359,8 @@ export interface GrokRequestParams {
303
359
  optimizeResponse?: boolean;
304
360
  idleTimeoutMs?: number;
305
361
  forceRefresh?: boolean;
362
+ /** Phase 4 slice δ: cap agent-loop iterations via `--max-turns N`. */
363
+ maxTurns?: number;
306
364
  }
307
365
  export declare function handleGrokRequest(deps: HandlerDeps, params: GrokRequestParams): Promise<ExtendedToolResponse>;
308
366
  export declare function handleGrokRequestAsync(deps: AsyncHandlerDeps, params: Omit<GrokRequestParams, "optimizeResponse">): Promise<ExtendedToolResponse>;
@@ -329,6 +387,10 @@ export interface MistralRequestParams {
329
387
  forceRefresh?: boolean;
330
388
  /** Phase 4 slice γ: emit `--trust` for fresh-workspace headless runs. */
331
389
  trust?: boolean;
390
+ /** Phase 4 slice δ: Vibe `--max-turns N` cap on agent-loop iterations. */
391
+ maxTurns?: number;
392
+ /** Phase 4 slice δ: Vibe `--max-price DOLLARS` cumulative-cost cap. */
393
+ maxPrice?: number;
332
394
  }
333
395
  export declare function handleMistralRequest(deps: HandlerDeps, params: MistralRequestParams): Promise<ExtendedToolResponse>;
334
396
  export declare function handleMistralRequestAsync(deps: AsyncHandlerDeps, params: Omit<MistralRequestParams, "optimizeResponse">): Promise<ExtendedToolResponse>;
package/dist/index.js CHANGED
@@ -229,6 +229,23 @@ function getApprovalManager(runtimeLogger = logger) {
229
229
  return approvalManager;
230
230
  }
231
231
  const MCP_SERVER_ENUM = z.enum(CLAUDE_MCP_SERVER_NAMES);
232
+ /**
233
+ * Phase 4 slice δ — shared Zod fragments for `maxTurns` / `maxPrice`.
234
+ *
235
+ * Both flags reach the upstream CLIs as decimal-formatted argv strings via
236
+ * `String(N)`. `z.number().int().positive()` alone lets values past
237
+ * `Number.MAX_SAFE_INTEGER` through, after which `String(1e21)` emits
238
+ * scientific notation that Grok and Vibe both reject. The bounds below
239
+ * (safe-integer cap + 10000 ceiling for turns; finite + 10000 USD ceiling
240
+ * for price) guarantee a lossless decimal stringification AND a sane
241
+ * upper bound — no plausible single agent loop exceeds 10k turns or 10k USD.
242
+ */
243
+ export const MAX_TURNS_SCHEMA = z.number().int().positive().safe().max(10_000);
244
+ // `.min(1e-6)` keeps the value in JS's decimal-stringify range:
245
+ // String(1e-6) === "0.000001" but String(1e-7) === "1e-7", which both
246
+ // upstream CLIs would reject. 1µUSD per request is fine-grained enough
247
+ // for any plausible budget-cap use.
248
+ export const MAX_PRICE_SCHEMA = z.number().positive().finite().min(1e-6).max(10_000);
232
249
  // U22: Session-provider enum extended to five providers. The storage layer's
233
250
  // CLI_TYPES already includes "mistral"; the MCP-tool layer mirrors that here so
234
251
  // session_create / session_list / session_clear_all accept the fifth provider.
@@ -1273,7 +1290,7 @@ export function prepareGeminiRequest(params, runtime = resolveGatewayServerRunti
1273
1290
  stablePrefixTokens,
1274
1291
  };
1275
1292
  }
1276
- function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
1293
+ export function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
1277
1294
  const corrId = params.correlationId || randomUUID();
1278
1295
  const cliInfo = getCliInfo();
1279
1296
  const resolvedModel = resolveModelAlias("grok", params.model, cliInfo);
@@ -1349,6 +1366,9 @@ function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
1349
1366
  if (params.disallowedTools && params.disallowedTools.length > 0) {
1350
1367
  args.push("--disallowed-tools", params.disallowedTools.join(","));
1351
1368
  }
1369
+ if (params.maxTurns !== undefined) {
1370
+ args.push("--max-turns", String(params.maxTurns));
1371
+ }
1352
1372
  return {
1353
1373
  corrId,
1354
1374
  effectivePrompt,
@@ -1433,6 +1453,8 @@ export function prepareMistralRequest(params, runtime = resolveGatewayServerRunt
1433
1453
  allowedTools: params.allowedTools,
1434
1454
  disallowedTools: params.disallowedTools,
1435
1455
  trust: params.trust,
1456
+ maxTurns: params.maxTurns,
1457
+ maxPrice: params.maxPrice,
1436
1458
  });
1437
1459
  if (prep.ignoredDisallowedTools) {
1438
1460
  runtime.logger.info(`[${corrId}] Mistral does not support disallowedTools; ignoring (caller passed ${params.disallowedTools?.length ?? 0} entries)`);
@@ -1463,6 +1485,32 @@ function selectMistralRecoveryModel(failedModel) {
1463
1485
  ].filter((model) => Boolean(model && model !== failedModel));
1464
1486
  return candidates.find(model => model !== "local");
1465
1487
  }
1488
+ /**
1489
+ * Phase 4 slice δ post-review: pure helper extracted from
1490
+ * `handleMistralRequest` so the retry-path arg-preservation invariants
1491
+ * (trust + maxTurns + maxPrice from slices γ/δ) are unit-testable
1492
+ * without mocking awaitJobOrDefer. Any param the wrapper threads into
1493
+ * the FIRST `buildMistralCliInvocation` call MUST also be threaded
1494
+ * through here, or a fresh-workspace / budgeted run can degrade on
1495
+ * the second attempt.
1496
+ */
1497
+ export function buildMistralRetryPrep(params, recoveryModel) {
1498
+ return buildMistralCliInvocation({
1499
+ prompt: params.effectivePrompt,
1500
+ resolvedModel: recoveryModel,
1501
+ outputFormat: params.outputFormat,
1502
+ permissionMode: params.approvalStrategy === "mcp_managed"
1503
+ ? "auto-approve"
1504
+ : (params.permissionMode ?? "auto-approve"),
1505
+ effort: params.effort,
1506
+ reasoningEffort: params.reasoningEffort,
1507
+ allowedTools: params.allowedTools,
1508
+ disallowedTools: params.disallowedTools,
1509
+ trust: params.trust,
1510
+ maxTurns: params.maxTurns,
1511
+ maxPrice: params.maxPrice,
1512
+ });
1513
+ }
1466
1514
  function buildCliResponse(cli, stdout, optimizeResponse, corrId, sessionId, prep, durationMs, resumable, outputFormat, warnings) {
1467
1515
  let finalStdout = stdout;
1468
1516
  // Skip response optimization for JSON output to prevent corrupting structured data
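The retry invariant that `buildMistralRetryPrep` is meant to protect can be sketched as a small combination check, in the spirit of the REGRESSIONS B block described in the changelog. Everything below is a simplified assumption: `RetryParams` is a reduced parameter shape, `buildInvocation` is a stand-in for `buildMistralCliInvocation`, and `leakyRetryPrep` is a deliberately broken variant included only to show that the check can fail. The model names are placeholders.

```typescript
// Hypothetical, reduced parameter shape; the real helper accepts more fields.
interface RetryParams {
  effectivePrompt: string;
  trust?: boolean;
  maxTurns?: number;
  maxPrice?: number;
}

interface Invocation {
  args: string[];
  env: Record<string, string>;
}

// Stand-in for the CLI arg builder (an assumption, not the package's code).
function buildInvocation(p: RetryParams, model: string): Invocation {
  const args = ["-p", p.effectivePrompt];
  if (p.trust) args.push("--trust");
  if (p.maxTurns !== undefined) args.push("--max-turns", String(p.maxTurns));
  if (p.maxPrice !== undefined) args.push("--max-price", String(p.maxPrice));
  return { args, env: { VIBE_ACTIVE_MODEL: model } };
}

// Retry helper that forwards all three optional fields explicitly.
function retryPrep(p: RetryParams, recoveryModel: string): Invocation {
  return buildInvocation(
    {
      effectivePrompt: p.effectivePrompt,
      trust: p.trust,
      maxTurns: p.maxTurns,
      maxPrice: p.maxPrice,
    },
    recoveryModel,
  );
}

// Broken on purpose: forgets maxPrice, so a budgeted run loses its cap on retry.
function leakyRetryPrep(p: RetryParams, recoveryModel: string): Invocation {
  return buildInvocation(
    { effectivePrompt: p.effectivePrompt, trust: p.trust, maxTurns: p.maxTurns },
    recoveryModel,
  );
}

// Every combination of the three optional fields (2 x 2 x 2 = 8 cases).
function allCombinations(): RetryParams[] {
  const out: RetryParams[] = [];
  for (const trust of [undefined, true]) {
    for (const maxTurns of [undefined, 3]) {
      for (const maxPrice of [undefined, 0.01]) {
        out.push({ effectivePrompt: "hello", trust, maxTurns, maxPrice });
      }
    }
  }
  return out;
}

type Prep = (p: RetryParams, model: string) => Invocation;

// The retry argv must match the first attempt's argv for every combination.
function preservesFlags(prep: Prep): boolean {
  return allCombinations().every((p) => {
    const first = buildInvocation(p, "model-a");
    const retry = prep(p, "model-b");
    return JSON.stringify(first.args) === JSON.stringify(retry.args);
  });
}

console.log(preservesFlags(retryPrep)); // true
console.log(preservesFlags(leakyRetryPrep)); // false
```

Extracting the retry construction into a pure function is what makes this kind of check possible without mocking the job runner.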
@@ -1801,6 +1849,7 @@ export async function handleGrokRequest(deps, params) {
1801
1849
  correlationId: params.correlationId,
1802
1850
  optimizePrompt: params.optimizePrompt,
1803
1851
  operation: "grok_request",
1852
+ maxTurns: params.maxTurns,
1804
1853
  }, runtime);
1805
1854
  if (!("args" in prep))
1806
1855
  return prep;
@@ -1921,6 +1970,7 @@ export async function handleGrokRequestAsync(deps, params) {
1921
1970
  correlationId: params.correlationId,
1922
1971
  optimizePrompt: params.optimizePrompt,
1923
1972
  operation: "grok_request_async",
1973
+ maxTurns: params.maxTurns,
1924
1974
  }, runtime);
1925
1975
  if (!("args" in prep))
1926
1976
  return prep;
@@ -2003,6 +2053,8 @@ export async function handleMistralRequest(deps, params) {
2003
2053
  optimizePrompt: params.optimizePrompt,
2004
2054
  operation: "mistral_request",
2005
2055
  trust: params.trust,
2056
+ maxTurns: params.maxTurns,
2057
+ maxPrice: params.maxPrice,
2006
2058
  }, runtime);
2007
2059
  if (!("args" in prep))
2008
2060
  return prep;
@@ -2035,22 +2087,7 @@ export async function handleMistralRequest(deps, params) {
2035
2087
  const recoveryModel = selectMistralRecoveryModel(prep.resolvedModel);
2036
2088
  if (recoveryModel) {
2037
2089
  deps.logger.info(`[${corrId}] mistral_request detected stale Vibe model selection; retrying once with ${recoveryModel}`);
2038
- const retryPrep = buildMistralCliInvocation({
2039
- prompt: prep.effectivePrompt,
2040
- resolvedModel: recoveryModel,
2041
- outputFormat: params.outputFormat,
2042
- permissionMode: params.approvalStrategy === "mcp_managed"
2043
- ? "auto-approve"
2044
- : (params.permissionMode ?? "auto-approve"),
2045
- effort: params.effort,
2046
- reasoningEffort: params.reasoningEffort,
2047
- allowedTools: params.allowedTools,
2048
- disallowedTools: params.disallowedTools,
2049
- // Phase 4 slice γ: preserve --trust on the model-selection retry
2050
- // so a fresh untrusted workspace doesn't block headlessly on the
2051
- // second attempt after surviving the first.
2052
- trust: params.trust,
2053
- });
2090
+ const retryPrep = buildMistralRetryPrep({ ...params, effectivePrompt: prep.effectivePrompt }, recoveryModel);
2054
2091
  const retryArgs = [...retryPrep.args, ...sessionResult.resumeArgs];
2055
2092
  // Reuse the FR handoff built above — the retry preserves corrId,
2056
2093
  // so the manager's logComplete still updates the original row.
@@ -2151,6 +2188,8 @@ export async function handleMistralRequestAsync(deps, params) {
2151
2188
  optimizePrompt: params.optimizePrompt,
2152
2189
  operation: "mistral_request_async",
2153
2190
  trust: params.trust,
2191
+ maxTurns: params.maxTurns,
2192
+ maxPrice: params.maxPrice,
2154
2193
  }, runtime);
2155
2194
  if (!("args" in prep))
2156
2195
  return prep;
@@ -3142,7 +3181,8 @@ export function createGatewayServer(deps = {}) {
3142
3181
  .boolean()
3143
3182
  .default(false)
3144
3183
  .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
3145
- }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, }) => {
3184
+ maxTurns: MAX_TURNS_SCHEMA.optional().describe("Grok `--max-turns N`: cap on agent-loop iterations for cost / latency control (Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
3185
+ }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, }) => {
3146
3186
  return handleGrokRequest({ sessionManager, logger, runtime }, {
3147
3187
  prompt,
3148
3188
  promptParts,
@@ -3165,6 +3205,7 @@ export function createGatewayServer(deps = {}) {
3165
3205
  optimizeResponse,
3166
3206
  idleTimeoutMs,
3167
3207
  forceRefresh,
3208
+ maxTurns,
3168
3209
  });
3169
3210
  });
3170
3211
  //──────────────────────────────────────────────────────────────────────────────
@@ -3242,7 +3283,9 @@ export function createGatewayServer(deps = {}) {
3242
3283
  .boolean()
3243
3284
  .default(false)
3244
3285
  .describe("Emit `--trust` so Vibe trusts the cwd for this invocation only (not persisted to trusted_folders.toml) and skips the interactive trust prompt (Phase 4 slice γ)."),
3245
- }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, trust, }) => {
3286
+ maxTurns: MAX_TURNS_SCHEMA.optional().describe("Vibe `--max-turns N`: cap the agent-loop iteration count (programmatic mode only, Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
3287
+ maxPrice: MAX_PRICE_SCHEMA.optional().describe("Vibe `--max-price DOLLARS`: interrupt the session when cumulative cost crosses this cap (programmatic mode only, Phase 4 slice δ). Bounded to finite values ≤ 10000 USD."),
3288
+ }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, trust, maxTurns, maxPrice, }) => {
3246
3289
  return handleMistralRequest({ sessionManager, logger, runtime }, {
3247
3290
  prompt,
3248
3291
  promptParts,
@@ -3265,6 +3308,8 @@ export function createGatewayServer(deps = {}) {
3265
3308
  idleTimeoutMs,
3266
3309
  forceRefresh,
3267
3310
  trust,
3311
+ maxTurns,
3312
+ maxPrice,
3268
3313
  });
3269
3314
  });
3270
3315
  //──────────────────────────────────────────────────────────────────────────────
@@ -3753,7 +3798,8 @@ export function createGatewayServer(deps = {}) {
3753
3798
  .boolean()
3754
3799
  .default(false)
3755
3800
  .describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
3756
- }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, }) => {
3801
+ maxTurns: MAX_TURNS_SCHEMA.optional().describe("Grok `--max-turns N`: cap on agent-loop iterations for cost / latency control (Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
3802
+ }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, }) => {
3757
3803
  return handleGrokRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
3758
3804
  prompt,
3759
3805
  promptParts,
@@ -3775,6 +3821,7 @@ export function createGatewayServer(deps = {}) {
3775
3821
  optimizePrompt,
3776
3822
  idleTimeoutMs,
3777
3823
  forceRefresh,
3824
+ maxTurns,
3778
3825
  });
3779
3826
  });
3780
3827
  server.tool("mistral_request_async", {
@@ -3848,7 +3895,9 @@ export function createGatewayServer(deps = {}) {
3848
3895
  .boolean()
3849
3896
  .default(false)
3850
3897
  .describe("Emit `--trust` so Vibe trusts the cwd for this invocation only (not persisted to trusted_folders.toml) and skips the interactive trust prompt (Phase 4 slice γ)."),
3851
- }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, trust, }) => {
3898
+ maxTurns: MAX_TURNS_SCHEMA.optional().describe("Vibe `--max-turns N`: cap the agent-loop iteration count (programmatic mode only, Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
3899
+ maxPrice: MAX_PRICE_SCHEMA.optional().describe("Vibe `--max-price DOLLARS`: interrupt the session when cumulative cost crosses this cap (programmatic mode only, Phase 4 slice δ). Bounded to finite values ≤ 10000 USD."),
3900
+ }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, trust, maxTurns, maxPrice, }) => {
3852
3901
  return handleMistralRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
3853
3902
  prompt,
3854
3903
  promptParts,
@@ -3870,6 +3919,8 @@ export function createGatewayServer(deps = {}) {
3870
3919
  idleTimeoutMs,
3871
3920
  forceRefresh,
3872
3921
  trust,
3922
+ maxTurns,
3923
+ maxPrice,
3873
3924
  });
3874
3925
  });
3875
3926
  server.tool("llm_job_status", {
@@ -114,6 +114,17 @@ export interface PrepareMistralRequestInput {
114
114
  * Vibe's prompt behaviour is preserved for existing callers.
115
115
  */
116
116
  trust?: boolean;
117
+ /**
118
+ * Phase 4 slice δ: emit `--max-turns N` to cap the agent-loop iteration
119
+ * count (only applies in programmatic mode with `-p`).
120
+ */
121
+ maxTurns?: number;
122
+ /**
123
+ * Phase 4 slice δ: emit `--max-price DOLLARS` so the session is
124
+ * interrupted when cumulative cost crosses the cap (programmatic mode
125
+ * only).
126
+ */
127
+ maxPrice?: number;
117
128
  }
118
129
  export interface PrepareMistralRequestResult {
119
130
  args: string[];
@@ -179,6 +179,12 @@ export function prepareMistralRequest(input) {
179
179
  if (input.trust) {
180
180
  args.push("--trust");
181
181
  }
182
+ if (input.maxTurns !== undefined) {
183
+ args.push("--max-turns", String(input.maxTurns));
184
+ }
185
+ if (input.maxPrice !== undefined) {
186
+ args.push("--max-price", String(input.maxPrice));
187
+ }
182
188
  const ignoredDisallowedTools = Boolean(input.disallowedTools && input.disallowedTools.length > 0);
183
189
  return { args, env, ignoredDisallowedTools };
184
190
  }
@@ -133,14 +133,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
133
133
  "ignoreRules",
134
134
  ],
135
135
  resumeOnlyFlags: ["--last"],
136
- resumeForbiddenFlags: [
137
- "--sandbox",
138
- "--ask-for-approval",
139
- "--full-auto",
140
- "--output-schema",
141
- "--search",
142
- "-c",
143
- ],
136
+ // Phase 4 slice α (v1.8.0) verified that `codex exec resume` accepts
137
+ // `--output-schema` and `-c` (codex-cli 0.133.0 `exec resume --help`),
138
+ // so they're no longer forbidden. `--search` stays forbidden (resume
139
+ // inherits the original session's web-search state).
140
+ resumeForbiddenFlags: ["--sandbox", "--ask-for-approval", "--full-auto", "--search"],
144
141
  flags: {
145
142
  "--last": { arity: "none", description: "Resume latest session" },
146
143
  "--model": { arity: "one", description: "Model selector" },
@@ -189,9 +186,24 @@ export const UPSTREAM_CLI_CONTRACTS = {
189
186
  expect: "fail",
190
187
  },
191
188
  {
189
+ // Phase 4 slice α: --output-schema IS accepted on resume per
190
+ // codex-cli 0.133.0; this fixture pins the new behaviour so future
191
+ // contract changes can't silently regress.
192
192
  id: "codex-resume-output-schema",
193
- description: "Resume-incompatible output schema flag is rejected",
193
+ description: "Phase 4 slice α: --output-schema accepted on resume (codex-cli 0.133.0)",
194
194
  args: ["exec", "resume", "--output-schema", "/tmp/schema.json", "session-id", "hello"],
195
+ expect: "pass",
196
+ },
197
+ {
198
+ id: "codex-resume-config-override",
199
+ description: "Phase 4 slice α: -c key=value accepted on resume",
200
+ args: ["exec", "resume", "-c", "model.foo=bar", "session-id", "hello"],
201
+ expect: "pass",
202
+ },
203
+ {
204
+ id: "codex-resume-search-still-forbidden",
205
+ description: "Phase 4 slice α: --search remains forbidden on resume",
206
+ args: ["exec", "resume", "--search", "session-id", "hello"],
195
207
  expect: "fail",
196
208
  },
197
209
  ],
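The effect of trimming `resumeForbiddenFlags` can be sketched with a small checker run against the fixture argv shapes from this hunk. This is an illustration under assumptions, not the package's validator: `forbiddenOnResume` is a hypothetical name, and the check naively scans every token, so a prompt that happens to equal a flag name would be reported too.

```typescript
// Flags still forbidden on `codex exec resume` after this release.
const RESUME_FORBIDDEN_FLAGS = ["--sandbox", "--ask-for-approval", "--full-auto", "--search"];

// Returns the forbidden flags found in a resume invocation, or an empty
// list when the argv is not a resume call at all.
function forbiddenOnResume(args: string[]): string[] {
  const isResume = args[0] === "exec" && args[1] === "resume";
  if (!isResume) return [];
  return args.filter((arg) => RESUME_FORBIDDEN_FLAGS.includes(arg));
}

// Argv shapes copied from the three resume fixtures.
const outputSchema = ["exec", "resume", "--output-schema", "/tmp/schema.json", "session-id", "hello"];
const configOverride = ["exec", "resume", "-c", "model.foo=bar", "session-id", "hello"];
const search = ["exec", "resume", "--search", "session-id", "hello"];

console.log(forbiddenOnResume(outputSchema)); // []
console.log(forbiddenOnResume(configOverride)); // []
console.log(forbiddenOnResume(search)); // [ '--search' ]
```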
@@ -219,6 +231,8 @@ export const UPSTREAM_CLI_CONTRACTS = {
219
231
  "policyFiles",
220
232
  "adminPolicyFiles",
221
233
  "attachments",
234
+ // Phase 4 slice γ
235
+ "skipTrust",
222
236
  ],
223
237
  flags: {
224
238
  "-p": { arity: "one", description: "Prompt text" },
@@ -236,6 +250,10 @@ export const UPSTREAM_CLI_CONTRACTS = {
236
250
  "--admin-policy": { arity: "one", description: "Admin policy file path" },
237
251
  "-o": { arity: "one", values: ["json"], description: "Output format" },
238
252
  "--resume": { arity: "one", description: "Resume session" },
253
+ "--skip-trust": {
254
+ arity: "none",
255
+ description: "Trust workspace for this session (Phase 4 slice γ)",
256
+ },
239
257
  },
240
258
  env: {},
241
259
  conformanceFixtures: [
@@ -251,6 +269,12 @@ export const UPSTREAM_CLI_CONTRACTS = {
251
269
  args: ["-p", "hello", "--not-a-gemini-flag"],
252
270
  expect: "fail",
253
271
  },
272
+ {
273
+ id: "gemini-skip-trust",
274
+ description: "Phase 4 slice γ: --skip-trust is accepted",
275
+ args: ["-p", "hello", "--skip-trust"],
276
+ expect: "pass",
277
+ },
254
278
  ],
255
279
  },
256
280
  grok: {
@@ -275,6 +299,8 @@ export const UPSTREAM_CLI_CONTRACTS = {
275
299
  "mcpServers",
276
300
  "allowedTools",
277
301
  "disallowedTools",
302
+ // Phase 4 slice δ
303
+ "maxTurns",
278
304
  ],
279
305
  flags: {
280
306
  "-p": { arity: "one", description: "Prompt text" },
@@ -299,6 +325,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
299
325
  },
300
326
  "--resume": { arity: "one", description: "Resume session" },
301
327
  "--continue": { arity: "none", description: "Continue latest session" },
328
+ "--max-turns": {
329
+ arity: "one",
330
+ pattern: /^[1-9][0-9]*$/,
331
+ description: "Agent-loop iteration cap (Phase 4 slice δ)",
332
+ },
302
333
  },
303
334
  env: {},
304
335
  conformanceFixtures: [
@@ -314,6 +345,18 @@ export const UPSTREAM_CLI_CONTRACTS = {
314
345
  args: ["-p", "hello", "--not-a-grok-flag"],
315
346
  expect: "fail",
316
347
  },
348
+ {
349
+ id: "grok-max-turns",
350
+ description: "Phase 4 slice δ: --max-turns N is accepted",
351
+ args: ["-p", "hello", "--max-turns", "5"],
352
+ expect: "pass",
353
+ },
354
+ {
355
+ id: "grok-max-turns-invalid-zero",
356
+ description: "Phase 4 slice δ: --max-turns 0 is rejected by contract pattern",
357
+ args: ["-p", "hello", "--max-turns", "0"],
358
+ expect: "fail",
359
+ },
317
360
  ],
318
361
  },
319
362
  mistral: {
@@ -337,6 +380,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
337
380
  "mcpServers",
338
381
  "allowedTools",
339
382
  "disallowedTools",
383
+ // Phase 4 slice γ
384
+ "trust",
385
+ // Phase 4 slice δ
386
+ "maxTurns",
387
+ "maxPrice",
340
388
  ],
341
389
  flags: {
342
390
  "-p": { arity: "one", description: "Prompt text" },
@@ -355,6 +403,22 @@ export const UPSTREAM_CLI_CONTRACTS = {
355
403
  "--enabled-tools": { arity: "one", description: "Enabled tool" },
356
404
  "--resume": { arity: "one", description: "Resume session" },
357
405
  "--continue": { arity: "none", description: "Continue latest session" },
406
+ "--trust": {
407
+ arity: "none",
408
+ description: "Trust cwd for this invocation only (Phase 4 slice γ)",
409
+ },
410
+ "--max-turns": {
411
+ arity: "one",
412
+ pattern: /^[1-9][0-9]*$/,
413
+ description: "Agent-loop iteration cap (Phase 4 slice δ, programmatic mode only)",
414
+ },
415
+ "--max-price": {
416
+ arity: "one",
417
+ // Decimal-only: matches the MAX_PRICE_SCHEMA min(1e-6) lower bound
418
+ // that keeps String(N) in decimal form (no scientific notation).
419
+ pattern: /^(0|[1-9][0-9]*)(\.[0-9]+)?$/,
420
+ description: "Cumulative cost cap in USD (Phase 4 slice δ, programmatic mode only)",
421
+ },
358
422
  },
359
423
  env: {
360
424
  VIBE_ACTIVE_MODEL: {
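How an arity-plus-pattern flag table rejects malformed values can be sketched as follows. The regular expressions are copied from the contract entries above, while `FlagSpec`, `FLAGS`, and `validateArgs` are hypothetical names for a reduced validator. The real `UPSTREAM_CLI_CONTRACTS` checker is not shown in this diff and may behave differently.

```typescript
// Reduced flag table (assumption: only the fields needed for this sketch).
type FlagSpec = { arity: "none" | "one"; pattern?: RegExp };

const FLAGS: Record<string, FlagSpec> = {
  "-p": { arity: "one" },
  "--agent": { arity: "one" },
  "--trust": { arity: "none" },
  "--max-turns": { arity: "one", pattern: /^[1-9][0-9]*$/ },
  "--max-price": { arity: "one", pattern: /^(0|[1-9][0-9]*)(\.[0-9]+)?$/ },
};

// Returns null when the argv is acceptable, otherwise a short reason.
function validateArgs(args: string[]): string | null {
  for (let i = 0; i < args.length; i++) {
    const flag = args[i];
    const spec = FLAGS[flag];
    if (!spec) return `unknown flag: ${flag}`;
    if (spec.arity === "one") {
      // Consume the flag's value so it is not mistaken for another flag.
      const value = args[++i];
      if (value === undefined) return `missing value for ${flag}`;
      if (spec.pattern && !spec.pattern.test(value)) {
        return `bad value for ${flag}: ${value}`;
      }
    }
  }
  return null;
}

console.log(validateArgs(["-p", "hello", "--agent", "auto-approve", "--max-turns", "3", "--max-price", "0.01"])); // null
console.log(validateArgs(["-p", "hello", "--agent", "auto-approve", "--max-price", "1e-7"])); // "bad value for --max-price: 1e-7"
console.log(validateArgs(["-p", "hello", "--max-turns", "0"])); // "bad value for --max-turns: 0"
```

Note that the `--max-price` pattern on its own accepts `"0"`. In this design the positive lower bound comes from `MAX_PRICE_SCHEMA`, so the pattern only has to rule out non-decimal notation.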
@@ -378,6 +442,27 @@ export const UPSTREAM_CLI_CONTRACTS = {
378
442
  env: { CODEX_MODEL: "gpt-5.5" },
379
443
  expect: "fail",
380
444
  },
445
+ {
446
+ id: "mistral-trust",
447
+ description: "Phase 4 slice γ: --trust is accepted",
448
+ args: ["-p", "hello", "--agent", "auto-approve", "--trust"],
449
+ env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
450
+ expect: "pass",
451
+ },
452
+ {
453
+ id: "mistral-max-turns-and-price",
454
+ description: "Phase 4 slice δ: --max-turns + --max-price are accepted together",
455
+ args: ["-p", "hello", "--agent", "auto-approve", "--max-turns", "3", "--max-price", "0.01"],
456
+ env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
457
+ expect: "pass",
458
+ },
459
+ {
460
+ id: "mistral-max-price-scientific-notation",
461
+ description: "Phase 4 slice δ: scientific-notation --max-price is rejected by contract pattern (matches MAX_PRICE_SCHEMA bounds)",
462
+ args: ["-p", "hello", "--agent", "auto-approve", "--max-price", "1e-7"],
463
+ env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
464
+ expect: "fail",
465
+ },
381
466
  ],
382
467
  },
383
468
  };
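The flag-fixture coverage rule that the changelog describes under REGRESSIONS F can be sketched as a function that lists flags no passing fixture exercises. The shapes below are reduced assumptions: `Fixture`, `Contract`, and `uncoveredFlags` are hypothetical names, and the sample contract is a cut-down version of the Grok entry using only flags and fixtures visible in this diff.

```typescript
interface Fixture {
  id: string;
  args: string[];
  expect: "pass" | "fail";
}

interface Contract {
  flags: Record<string, unknown>;
  conformanceFixtures: Fixture[];
}

// Lists flags that no passing fixture exercises, skipping any flag in the
// grandfathered baseline.
function uncoveredFlags(contract: Contract, grandfathered: Set<string> = new Set()): string[] {
  const exercised = new Set<string>();
  for (const fixture of contract.conformanceFixtures) {
    if (fixture.expect !== "pass") continue;
    for (const arg of fixture.args) exercised.add(arg);
  }
  return Object.keys(contract.flags).filter(
    (flag) => !exercised.has(flag) && !grandfathered.has(flag),
  );
}

// Cut-down Grok contract for illustration only.
const grokSketch: Contract = {
  flags: { "-p": {}, "--continue": {}, "--max-turns": {} },
  conformanceFixtures: [
    { id: "grok-max-turns", args: ["-p", "hello", "--max-turns", "5"], expect: "pass" },
    { id: "grok-max-turns-invalid-zero", args: ["-p", "hello", "--max-turns", "0"], expect: "fail" },
  ],
};

console.log(uncoveredFlags(grokSketch)); // [ '--continue' ]
console.log(uncoveredFlags(grokSketch, new Set(["--continue"]))); // []
```

A check of this shape fails as soon as a new entry lands in a `flags` table without a matching passing fixture, which is the forcing function the changelog describes.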
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "llm-cli-gateway",
3
- "version": "1.8.0",
3
+ "version": "1.9.0",
4
4
  "mcpName": "io.github.verivus-oss/llm-cli-gateway",
5
5
  "description": "MCP server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral Vibe CLIs with session management, retry logic, async job orchestration, durable job results, and cross-LLM validation.",
6
6
  "license": "MIT",