llm-cli-gateway 1.8.0 → 1.10.0
- package/CHANGELOG.md +148 -0
- package/dist/gemini-json-parser.d.ts +19 -4
- package/dist/gemini-json-parser.js +73 -4
- package/dist/index.d.ts +73 -6
- package/dist/index.js +97 -30
- package/dist/request-helpers.d.ts +11 -0
- package/dist/request-helpers.js +6 -0
- package/dist/upstream-contracts.js +111 -10
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,154 @@
 
 All notable changes to the llm-cli-gateway project.
 
+## [1.10.0] - 2026-05-27 — Phase 4 slice ε (Gemini `-o stream-json` enum widening)
+
+Ships the fifth Phase 4 slice: Gemini's NDJSON event-stream output format
+(`-o stream-json`) is now reachable from `gemini_request` and
+`gemini_request_async`. Four commits land together: the feature wiring, a
+contract-table widening, a test-veracity regression suite, and a follow-up
+test fix driven by the multi-LLM round-1 audit.
+
+### Added — `outputFormat: "stream-json"` for Gemini
+
+- `gemini_request` and `gemini_request_async` `outputFormat` enums widened
+  from `text | json` to `text | json | stream-json`.
+- `prepareGeminiRequest` emits `-o stream-json` when the new value is set.
+  No `--include-partial-messages` analogue is required: Gemini already
+  streams stdout in real time across all output modes (covered by
+  `CLI_IDLE_TIMEOUTS.gemini = 600_000`).
+- New `parseGeminiStreamJson` parser consumes the NDJSON event stream
+  (`init` / `message` / `result` lines), concatenates assistant `delta`
+  messages into the response, and extracts
+  `input_tokens` / `output_tokens` / `cached` → `cache_read_tokens` from
+  the terminal `result.stats` event.
+- `extractUsageAndCost("gemini", _, "stream-json")` routes to the new
+  parser so usage tokens reach the flight recorder on the stream-json
+  path, matching the existing `-o json` behaviour.
+- `UPSTREAM_CLI_CONTRACTS.gemini.flags["-o"].values` widened to
+  `["json", "stream-json"]`; two new conformance fixtures
+  (`gemini-stream-json` passing, `gemini-output-format-invalid` failing
+  for `-o ndjson`) pin the enum bound.
+
+### Test-veracity audit
+
+Per the standing protocol established with v1.9.0
+(`feedback_test_veracity_audit_protocol`), this slice's tests were
+audited by Codex + Gemini + Grok + Mistral in async parallel with
+mandatory mutation-probe execution. Round 1 found one real gap
+(`Eε-4` only checked fixture presence/shape — P-Eε-1 left it green);
+closed in commit `4a78f9c` by running the fixture's args through
+`validateUpstreamCliArgs` inside the same `it()` block. Round 2
+delivered unanimous UNCONDITIONAL APPROVE across all four reviewers,
+with site-by-site probe evidence for the contested `Eα` registered-schema
+helper. Spec at `docs/plans/test-veracity-audit-slice-epsilon.spec.md`.
+
+Test count: 771 → 795 → 796 (24 + 1 new across two files).
+
+### Known caveats
+
+- The `npm run check` script still does not include `format:check` (a
+  gap first flagged in the v1.8.0 release notes). Run both locally
+  before pushing; CI runs format:check separately.
+
+## [1.9.0] - 2026-05-27 — Phase 4 slice δ (budget/max-turns parity) + retroactive α/γ contract closure
+
+Ships the fourth Phase 4 slice (budget/max-turns parity for Grok and Mistral),
+and retroactively closes three latent contract gaps that shipped silently in
+v1.8.0 (slices α and γ). Five commits land together: the slice δ feature,
+two bounds-tightening fixes, a contract-table closure, and a test-veracity
+hardening pass driven by an iterative multi-LLM audit.
+
+### Added — `maxTurns` / `maxPrice` budget caps (slice δ)
+
+- `grok_request` and `grok_request_async` gain optional `maxTurns?: number`
+  → emits `grok --max-turns N`. Grok exposes no per-request budget flag,
+  so `--max-price` is Mistral-only.
+- `mistral_request` and `mistral_request_async` gain optional
+  `maxTurns?: number` → `vibe --max-turns N` AND `maxPrice?: number` →
+  `vibe --max-price DOLLARS`. Both apply only in programmatic mode (`-p`),
+  matching Vibe's documented constraint.
+- The Mistral stale-model recovery retry path (extracted into a pure
+  `buildMistralRetryPrep` helper) preserves all three slice-γ/δ flags
+  (`trust`, `maxTurns`, `maxPrice`) on the second attempt.
+- Defaults: undefined for all three new fields → no flag emitted →
+  existing callers see no behavioural change.
+
+### Fixed — Bounded numeric schemas for lossless argv stringification
+
+- Extracted two shared, exported Zod constants:
+  - `MAX_TURNS_SCHEMA = z.number().int().positive().safe().max(10_000)`
+  - `MAX_PRICE_SCHEMA = z.number().positive().finite().min(1e-6).max(10_000)`
+  - The lower `.min(1e-6)` cap on price is exactly the boundary where
+    `String(N)` switches from decimal to scientific notation
+    (`String(1e-6) === "0.000001"` but `String(1e-7) === "1e-7"`); both
+    upstream CLIs reject scientific-notation values.
+  - Reused across all four slice-δ tool registrations so bounds stay
+    consistent if they ever need to change.
+
+### Fixed — Upstream contract table closes 5 latent flag gaps
+
+`assertUpstreamCliArgs` consults `UPSTREAM_CLI_CONTRACTS` on every real
+`*_request` call. The following flags / mcpParameters were never registered
+there before this release, so production calls setting any of them threw
+"Upstream contract violation" at runtime even though the prepare-function
+unit tests passed:
+
+- **Gemini** (slice γ retroactive): `skipTrust` + `--skip-trust`.
+- **Mistral** (slice γ + δ retroactive): `trust` + `--trust`; `maxTurns` +
+  `--max-turns`; `maxPrice` + `--max-price` (with a strict decimal-only
+  regex matching `MAX_PRICE_SCHEMA`'s lower bound).
+- **Grok** (slice δ): `maxTurns` + `--max-turns`.
+- **Codex** (slice α retroactive): `--output-schema` and `-c` removed
+  from `resumeForbiddenFlags` — verified accepted on `codex exec resume`
+  per codex-cli 0.133.0.
+
+Conformance fixtures pin each new flag's argv shape, including a
+`mistral-max-price-scientific-notation` fixture that locks the `1e-7`
+rejection at the contract layer.
+
+### Hardened — Test veracity (multi-LLM audit follow-up)
+
+Codex + Grok ran iterative test-veracity audits with mutation probes per
+`docs/plans/test-veracity-audit.spec.md`. They proved several added tests
+were not falsifiable on the dimensions their commit messages claimed.
+New file `src/__tests__/test-veracity-regressions.test.ts` closes those
+gaps with six describe blocks:
+
+- **REGRESSIONS A** — probes registered tool `inputSchema` bounds
+  directly (not the bare schema constants), so schema-drift in any of
+  the four sync/async registrations is caught.
+- **REGRESSIONS B** — tests the pure `buildMistralRetryPrep` helper
+  across all combinations of `trust × maxTurns × maxPrice`. Self-
+  validated: dropping any of the three forwards on retry goes red.
+- **REGRESSIONS C** — positive allowlist asserting slice α/γ/δ
+  parameters live in the matching contract's `mcpParameters` (closes
+  the self-oracle gap where removing a param from BOTH the contract
+  AND the schema previously stayed green).
+- **REGRESSIONS D** — threads `prepare*Request` output into
+  `validateUpstreamCliArgs` end-to-end; the exact consistency check
+  the latent v1.8.0 contract breaks would have failed.
+- **REGRESSIONS E** — `it.each` over sync AND async variants of every
+  slice-touched tool; the existing C4 was sync-only.
+- **REGRESSIONS F** — flag-fixture coverage map: every flag in each
+  contract `flags` table must be exercised by a passing fixture (with
+  a grandfathered pre-audit baseline). Forces future slice authors to
+  add a fixture alongside any new flag entry.
+
+The existing C4 (`MCP request schemas expose the provider contract
+parameters`) now walks `_async` tools too.
+
+### Notes
+
+Multi-LLM review across multiple iterative rounds, ending with a
+dedicated test-veracity audit per Werner's strict-evidence protocol
+(documented in `docs/plans/test-veracity-audit.spec.md`). Round 2 of the
+audit landed UNCONDITIONAL APPROVE from Codex, Grok, Claude, and Mistral
+with full mutation-probe evidence — every documented counterexample
+mutation went red as predicted; tests are falsifiable by exactly the
+regressions they claim to guard against. Gemini was quota-exhausted
+during the audit window (~6h reset) and did not participate in round 2.
+
 ## [1.8.0] - 2026-05-27 — Phase 4 openers (codex resume fix, mistral telemetry, headless trust flags)
 
 Ships the first three slices of the Phase 4 provider-modernisation
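The REGRESSIONS F entry in the 1.9.0 notes above describes a coverage rule: every flag registered in a contract's `flags` table needs at least one passing fixture that uses it, apart from a grandfathered baseline. The following is a minimal sketch of that kind of check, not the package's test. The `Contract` and `Fixture` shapes, the `uncoveredFlags` name, and the sample data (including `--example-old-flag`) are assumptions made for illustration; the real `UPSTREAM_CLI_CONTRACTS` structure is not shown in this part of the diff.

```typescript
// Assumed shapes for illustration only; not the package's real types.
interface Contract {
    flags: Record<string, { values?: string[] }>;
}
interface Fixture {
    name: string;
    args: string[];
    expectValid: boolean;
}

// Returns the contract flags that no passing fixture exercises,
// skipping any flag listed in the grandfathered baseline.
function uncoveredFlags(contract: Contract, fixtures: Fixture[], grandfathered: string[] = []): string[] {
    const exercised = new Set<string>();
    for (const fixture of fixtures) {
        if (!fixture.expectValid) continue; // failing fixtures do not count as coverage
        for (const arg of fixture.args) exercised.add(arg);
    }
    return Object.keys(contract.flags)
        .filter((flag) => !exercised.has(flag) && !grandfathered.includes(flag))
        .sort();
}

// Illustrative data: fixture names come from the changelog, the rest is made up.
const sampleContract: Contract = {
    flags: {
        "-o": { values: ["json", "stream-json"] },
        "--skip-trust": {},
        "--example-old-flag": {},
    },
};
const sampleFixtures: Fixture[] = [
    { name: "gemini-stream-json", args: ["-o", "stream-json"], expectValid: true },
    { name: "gemini-output-format-invalid", args: ["-o", "ndjson"], expectValid: false },
];

// Flags that would still need a passing fixture under this rule.
const missing = uncoveredFlags(sampleContract, sampleFixtures, ["--example-old-flag"]);
console.log(missing);
```

In this sample, `--skip-trust` is reported because no passing fixture uses it, while the grandfathered flag is ignored. A check of this shape fails as soon as a flag is added to the table without a matching fixture.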
package/dist/gemini-json-parser.d.ts
CHANGED
@@ -1,13 +1,22 @@
 /**
- *
+ * Parsers for Gemini CLI `-o json` (single object) and `-o stream-json`
+ * (NDJSON event stream) output.
  *
- *
+ * `-o json` emits a single JSON object with:
  * - `response`: string final model output
  * - `usageMetadata`: { promptTokenCount, candidatesTokenCount,
  *     cachedContentTokenCount?, totalTokenCount }
  *
- *
- *
+ * `-o stream-json` emits one JSON object per line:
+ * - `{ "type": "init", "session_id": "...", "model": "..." }`
+ * - `{ "type": "message", "role": "user", "content": "..." }`
+ * - `{ "type": "message", "role": "assistant", "content": "...", "delta": true }` (repeated)
+ * - `{ "type": "result", "status": "success", "stats": { "input_tokens": N,
+ *     "output_tokens": N, "cached": N, ... } }`
+ *
+ * Both parsers return null when stdout is unparseable. Both populate the same
+ * `GeminiJsonParseResult` shape so `extractUsageAndCost` can branch on
+ * outputFormat without further dispatch.
  */
 export interface GeminiUsage {
     input_tokens: number;
@@ -19,3 +28,9 @@ export interface GeminiJsonParseResult {
     response?: string;
 }
 export declare function parseGeminiJson(stdout: string): GeminiJsonParseResult | null;
+/**
+ * Parse Gemini `-o stream-json` NDJSON output. Concatenates assistant `delta`
+ * message content into `response`, extracts the terminal `result.stats` payload
+ * into `usage`. Returns null when stdout contains no parseable JSON line.
+ */
+export declare function parseGeminiStreamJson(stdout: string): GeminiJsonParseResult | null;
package/dist/gemini-json-parser.js
CHANGED
@@ -1,13 +1,22 @@
 /**
- *
+ * Parsers for Gemini CLI `-o json` (single object) and `-o stream-json`
+ * (NDJSON event stream) output.
  *
- *
+ * `-o json` emits a single JSON object with:
  * - `response`: string final model output
  * - `usageMetadata`: { promptTokenCount, candidatesTokenCount,
  *     cachedContentTokenCount?, totalTokenCount }
  *
- *
- *
+ * `-o stream-json` emits one JSON object per line:
+ * - `{ "type": "init", "session_id": "...", "model": "..." }`
+ * - `{ "type": "message", "role": "user", "content": "..." }`
+ * - `{ "type": "message", "role": "assistant", "content": "...", "delta": true }` (repeated)
+ * - `{ "type": "result", "status": "success", "stats": { "input_tokens": N,
+ *     "output_tokens": N, "cached": N, ... } }`
+ *
+ * Both parsers return null when stdout is unparseable. Both populate the same
+ * `GeminiJsonParseResult` shape so `extractUsageAndCost` can branch on
+ * outputFormat without further dispatch.
  */
 export function parseGeminiJson(stdout) {
     const trimmed = stdout.trim();
@@ -45,3 +54,63 @@ export function parseGeminiJson(stdout) {
     }
     return result;
 }
+/**
+ * Parse Gemini `-o stream-json` NDJSON output. Concatenates assistant `delta`
+ * message content into `response`, extracts the terminal `result.stats` payload
+ * into `usage`. Returns null when stdout contains no parseable JSON line.
+ */
+export function parseGeminiStreamJson(stdout) {
+    if (!stdout) {
+        return null;
+    }
+    const lines = stdout.split(/\r?\n/);
+    const result = {};
+    const assistantChunks = [];
+    let sawAnyLine = false;
+    for (const line of lines) {
+        const trimmed = line.trim();
+        if (!trimmed)
+            continue;
+        // Gemini stream-json lines are individual JSON objects; non-JSON
+        // chatter (warnings, "Ripgrep not available", etc.) is silently
+        // ignored so a stray banner line doesn't poison usage extraction.
+        let event;
+        try {
+            event = JSON.parse(trimmed);
+        }
+        catch {
+            continue;
+        }
+        if (!event || typeof event !== "object")
+            continue;
+        sawAnyLine = true;
+        if (event.type === "message" &&
+            event.role === "assistant" &&
+            typeof event.content === "string") {
+            assistantChunks.push(event.content);
+            continue;
+        }
+        if (event.type === "result" && event.stats && typeof event.stats === "object") {
+            const stats = event.stats;
+            const input = typeof stats.input_tokens === "number" ? stats.input_tokens : undefined;
+            const output = typeof stats.output_tokens === "number" ? stats.output_tokens : undefined;
+            if (input !== undefined || output !== undefined) {
+                const usage = {
+                    input_tokens: input ?? 0,
+                    output_tokens: output ?? 0,
+                };
+                if (typeof stats.cached === "number") {
+                    usage.cache_read_tokens = stats.cached;
+                }
+                result.usage = usage;
+            }
+        }
+    }
+    if (!sawAnyLine) {
+        return null;
+    }
+    if (assistantChunks.length > 0) {
+        result.response = assistantChunks.join("");
+    }
+    return result;
+}
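A short usage sketch for the parser added above may help when reading the hunk. The function below is a condensed, typed restatement of `parseGeminiStreamJson` so that the snippet runs on its own. The names `StreamUsage`, `StreamResult`, and `parseStreamSketch` are illustrative, and the `sampleStream` text with its token counts is made up rather than captured Gemini output.

```typescript
// Illustrative types; the package's own names are GeminiUsage / GeminiJsonParseResult.
interface StreamUsage {
    input_tokens: number;
    output_tokens: number;
    cache_read_tokens?: number;
}
interface StreamResult {
    usage?: StreamUsage;
    response?: string;
}

// Condensed restatement of the NDJSON parsing logic from the hunk above.
function parseStreamSketch(stdout: string): StreamResult | null {
    const result: StreamResult = {};
    const chunks: string[] = [];
    let sawEvent = false;
    for (const line of stdout.split(/\r?\n/)) {
        const trimmed = line.trim();
        if (!trimmed) continue;
        let event: any;
        try {
            event = JSON.parse(trimmed);
        } catch {
            continue; // non-JSON chatter such as banner lines is skipped
        }
        if (!event || typeof event !== "object") continue;
        sawEvent = true;
        if (event.type === "message" && event.role === "assistant" && typeof event.content === "string") {
            chunks.push(event.content);
        } else if (event.type === "result" && event.stats && typeof event.stats === "object") {
            const { input_tokens, output_tokens, cached } = event.stats;
            if (typeof input_tokens === "number" || typeof output_tokens === "number") {
                const usage: StreamUsage = {
                    input_tokens: typeof input_tokens === "number" ? input_tokens : 0,
                    output_tokens: typeof output_tokens === "number" ? output_tokens : 0,
                };
                if (typeof cached === "number") usage.cache_read_tokens = cached;
                result.usage = usage;
            }
        }
    }
    if (!sawEvent) return null;
    if (chunks.length > 0) result.response = chunks.join("");
    return result;
}

// Made-up sample stream: one banner line followed by five events.
const sampleStream = [
    "Ripgrep not available",
    '{"type":"init","session_id":"s-1","model":"example-model"}',
    '{"type":"message","role":"user","content":"hi"}',
    '{"type":"message","role":"assistant","content":"Hel","delta":true}',
    '{"type":"message","role":"assistant","content":"lo","delta":true}',
    '{"type":"result","status":"success","stats":{"input_tokens":12,"output_tokens":3,"cached":4}}',
].join("\n");

const parsedStream = parseStreamSketch(sampleStream);
console.log(parsedStream);
```

With this sample, the two assistant chunks join into `"Hello"`, the user message is ignored, and `cached` is surfaced as `cache_read_tokens`. Input that contains no JSON object line at all yields `null`, matching the doc comment on the real function.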
package/dist/index.d.ts
CHANGED
@@ -54,6 +54,19 @@ declare const logger: {
     debug: (message: string, ...args: any[]) => void;
 };
 type GatewayLogger = typeof logger;
+/**
+ * Phase 4 slice δ — shared Zod fragments for `maxTurns` / `maxPrice`.
+ *
+ * Both flags reach the upstream CLIs as decimal-formatted argv strings via
+ * `String(N)`. `z.number().int().positive()` alone lets values past
+ * `Number.MAX_SAFE_INTEGER` through, after which `String(1e21)` emits
+ * scientific notation that Grok and Vibe both reject. The bounds below
+ * (safe-integer cap + 10000 ceiling for turns; finite + 10000 USD ceiling
+ * for price) guarantee a lossless decimal stringification AND a sane
+ * upper bound — no plausible single agent loop exceeds 10k turns or 10k USD.
+ */
+export declare const MAX_TURNS_SCHEMA: z.ZodNumber;
+export declare const MAX_PRICE_SCHEMA: z.ZodNumber;
 export declare const SESSION_PROVIDER_VALUES: readonly ["claude", "codex", "gemini", "grok", "mistral"];
 export declare const SESSION_PROVIDER_ENUM: z.ZodEnum<["claude", "codex", "gemini", "grok", "mistral"]>;
 export type SessionProvider = (typeof SESSION_PROVIDER_VALUES)[number];
@@ -199,11 +212,13 @@ export declare function prepareGeminiRequest(params: {
     optimizePrompt: boolean;
     operation: string;
     /**
-     * U23
-     *
-     *
+     * U23 + Phase 4 slice ε: output format. `json` emits `-o json` (single
+     * JSON object with usageMetadata). `stream-json` emits `-o stream-json`
+     * (NDJSON event stream — `init` / `message` / `result` lines). Both
+     * route through `extractUsageAndCost` so usage tokens reach the flight
+     * recorder. Defaults to "text".
      */
-    outputFormat?: "text" | "json";
+    outputFormat?: "text" | "json" | "stream-json";
     sandbox?: boolean;
     policyFiles?: string[];
     adminPolicyFiles?: string[];
@@ -215,6 +230,29 @@ export declare function prepareGeminiRequest(params: {
      */
     skipTrust?: boolean;
 }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
+export declare function prepareGrokRequest(params: {
+    prompt?: string;
+    promptParts?: PromptParts;
+    model?: string;
+    outputFormat?: string;
+    alwaysApprove?: boolean;
+    permissionMode?: string;
+    effort?: string;
+    reasoningEffort?: string;
+    allowedTools?: string[];
+    disallowedTools?: string[];
+    approvalStrategy: "legacy" | "mcp_managed";
+    approvalPolicy?: string;
+    mcpServers?: ClaudeMcpServerName[];
+    correlationId?: string;
+    optimizePrompt: boolean;
+    operation: string;
+    /**
+     * Phase 4 slice δ: emit `--max-turns N` so callers can cap agent-loop
+     * iterations for cost / latency control. Mirrors Claude's wiring.
+     */
+    maxTurns?: number;
+}, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
 export declare function prepareMistralRequest(params: {
     prompt?: string;
     promptParts?: PromptParts;
@@ -236,9 +274,29 @@ export declare function prepareMistralRequest(params: {
      * prompt for this invocation only (not persisted). Default undefined.
      */
     trust?: boolean;
+    /** Phase 4 slice δ: Vibe `--max-turns N` cap on agent-loop iterations. */
+    maxTurns?: number;
+    /** Phase 4 slice δ: Vibe `--max-price DOLLARS` cumulative-cost cap. */
+    maxPrice?: number;
 }, runtime?: GatewayServerRuntime): (CliRequestPrep & {
     mistralEnv: Record<string, string>;
 }) | ExtendedToolResponse;
+/**
+ * Phase 4 slice δ post-review: pure helper extracted from
+ * `handleMistralRequest` so the retry-path arg-preservation invariants
+ * (trust + maxTurns + maxPrice from slices γ/δ) are unit-testable
+ * without mocking awaitJobOrDefer. Any param the wrapper threads into
+ * the FIRST `buildMistralCliInvocation` call MUST also be threaded
+ * through here, or a fresh-workspace / budgeted run can degrade on
+ * the second attempt.
+ */
+export declare function buildMistralRetryPrep(params: Pick<MistralRequestParams, "outputFormat" | "permissionMode" | "effort" | "reasoningEffort" | "allowedTools" | "disallowedTools" | "approvalStrategy" | "trust" | "maxTurns" | "maxPrice"> & {
+    effectivePrompt: string;
+}, recoveryModel: string): {
+    args: string[];
+    env: Record<string, string>;
+    ignoredDisallowedTools: boolean;
+};
 export interface GeminiRequestParams {
     prompt?: string;
     promptParts?: PromptParts;
@@ -257,8 +315,11 @@ export interface GeminiRequestParams {
     optimizeResponse?: boolean;
     idleTimeoutMs?: number;
     forceRefresh?: boolean;
-    /**
-
+    /**
+     * U23 + Phase 4 slice ε: "json" emits `-o json`; "stream-json" emits
+     * `-o stream-json` (NDJSON event stream). Both are usage-extracted.
+     */
+    outputFormat?: "text" | "json" | "stream-json";
     sandbox?: boolean;
     policyFiles?: string[];
     adminPolicyFiles?: string[];
@@ -303,6 +364,8 @@ export interface GrokRequestParams {
     optimizeResponse?: boolean;
     idleTimeoutMs?: number;
     forceRefresh?: boolean;
+    /** Phase 4 slice δ: cap agent-loop iterations via `--max-turns N`. */
+    maxTurns?: number;
 }
 export declare function handleGrokRequest(deps: HandlerDeps, params: GrokRequestParams): Promise<ExtendedToolResponse>;
 export declare function handleGrokRequestAsync(deps: AsyncHandlerDeps, params: Omit<GrokRequestParams, "optimizeResponse">): Promise<ExtendedToolResponse>;
@@ -329,6 +392,10 @@ export interface MistralRequestParams {
     forceRefresh?: boolean;
     /** Phase 4 slice γ: emit `--trust` for fresh-workspace headless runs. */
     trust?: boolean;
+    /** Phase 4 slice δ: Vibe `--max-turns N` cap on agent-loop iterations. */
+    maxTurns?: number;
+    /** Phase 4 slice δ: Vibe `--max-price DOLLARS` cumulative-cost cap. */
+    maxPrice?: number;
 }
 export declare function handleMistralRequest(deps: HandlerDeps, params: MistralRequestParams): Promise<ExtendedToolResponse>;
 export declare function handleMistralRequestAsync(deps: AsyncHandlerDeps, params: Omit<MistralRequestParams, "optimizeResponse">): Promise<ExtendedToolResponse>;
package/dist/index.js
CHANGED
@@ -9,7 +9,7 @@ import { z } from "zod";
 import { executeCli, killAllProcessGroups } from "./executor.js";
 import { parseStreamJson } from "./stream-json-parser.js";
 import { parseCodexJsonStream } from "./codex-json-parser.js";
-import { parseGeminiJson } from "./gemini-json-parser.js";
+import { parseGeminiJson, parseGeminiStreamJson } from "./gemini-json-parser.js";
 import { parseVibeMetaJson } from "./mistral-meta-json-parser.js";
 import { homedir } from "os";
 import { createSessionManager } from "./session-manager.js";
@@ -229,6 +229,23 @@ function getApprovalManager(runtimeLogger = logger) {
     return approvalManager;
 }
 const MCP_SERVER_ENUM = z.enum(CLAUDE_MCP_SERVER_NAMES);
+/**
+ * Phase 4 slice δ — shared Zod fragments for `maxTurns` / `maxPrice`.
+ *
+ * Both flags reach the upstream CLIs as decimal-formatted argv strings via
+ * `String(N)`. `z.number().int().positive()` alone lets values past
+ * `Number.MAX_SAFE_INTEGER` through, after which `String(1e21)` emits
+ * scientific notation that Grok and Vibe both reject. The bounds below
+ * (safe-integer cap + 10000 ceiling for turns; finite + 10000 USD ceiling
+ * for price) guarantee a lossless decimal stringification AND a sane
+ * upper bound — no plausible single agent loop exceeds 10k turns or 10k USD.
+ */
+export const MAX_TURNS_SCHEMA = z.number().int().positive().safe().max(10_000);
+// `.min(1e-6)` keeps the value in JS's decimal-stringify range:
+// String(1e-6) === "0.000001" but String(1e-7) === "1e-7", which both
+// upstream CLIs would reject. 1µUSD per request is fine-grained enough
+// for any plausible budget-cap use.
+export const MAX_PRICE_SCHEMA = z.number().positive().finite().min(1e-6).max(10_000);
 // U22: Session-provider enum extended to five providers. The storage layer's
 // CLI_TYPES already includes "mistral"; the MCP-tool layer mirrors that here so
 // session_create / session_list / session_clear_all accept the fifth provider.
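The notation boundaries that the comments above rely on can be checked directly. This is a small sketch in plain TypeScript with no Zod dependency. `isPlainDecimal`, `turnsArgv`, and `priceArgv` are hypothetical helpers written for this illustration; they only mirror the numeric bounds of `MAX_TURNS_SCHEMA` and `MAX_PRICE_SCHEMA` and are not part of the package.

```typescript
// True when String(n) yields plain decimal digits with no exponent part.
function isPlainDecimal(n: number): boolean {
    return /^\d+(\.\d+)?$/.test(String(n));
}

// Hypothetical stand-in mirroring MAX_TURNS_SCHEMA: positive safe integer, at most 10000.
function turnsArgv(n: number): string[] {
    if (!Number.isSafeInteger(n) || n <= 0 || n > 10_000) {
        throw new RangeError(`maxTurns out of bounds: ${n}`);
    }
    return ["--max-turns", String(n)];
}

// Hypothetical stand-in mirroring MAX_PRICE_SCHEMA: finite, between 1e-6 and 10000.
function priceArgv(n: number): string[] {
    if (!Number.isFinite(n) || n < 1e-6 || n > 10_000) {
        throw new RangeError(`maxPrice out of bounds: ${n}`);
    }
    return ["--max-price", String(n)];
}

console.log(String(1e-6), isPlainDecimal(1e-6)); // 0.000001 true
console.log(String(1e-7), isPlainDecimal(1e-7)); // 1e-7 false
console.log(String(1e21), isPlainDecimal(1e21)); // 1e+21 false
console.log(turnsArgv(25));
console.log(priceArgv(0.5));
```

The point of bounding at the schema layer is that the later `String(N)` call never has to handle a value that would stringify with an exponent. Rejecting early keeps the argv-building code free of formatting logic.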
@@ -513,8 +530,8 @@ ctx) {
             costUsd: parsed.usage.cost_usd,
         };
     }
-    if (cli === "gemini" && outputFormat === "json") {
-        const parsed = parseGeminiJson(output);
+    if (cli === "gemini" && (outputFormat === "json" || outputFormat === "stream-json")) {
+        const parsed = outputFormat === "stream-json" ? parseGeminiStreamJson(output) : parseGeminiJson(output);
         if (!parsed || !parsed.usage) {
             return {};
         }
@@ -1254,9 +1271,19 @@ export function prepareGeminiRequest(params, runtime = resolveGatewayServerRunti
     // U23 fix: emit `-o json` when the caller asked for JSON output. The Gemini
     // JSON parser is otherwise unreachable from the tool surface and the
     // structured usageMetadata is silently dropped.
+    //
+    // Phase 4 slice ε: same wiring for `-o stream-json` (NDJSON event stream).
+    // Gemini already streams stdout in real-time so the existing 10-minute
+    // idle timeout (CLI_IDLE_TIMEOUTS.gemini) covers both modes without
+    // adjustment — unlike Claude, no `--include-partial-messages` companion
+    // flag is required because Gemini emits assistant `delta` events as part
+    // of the default stream-json shape.
     if (params.outputFormat === "json") {
         args.push("-o", "json");
     }
+    else if (params.outputFormat === "stream-json") {
+        args.push("-o", "stream-json");
+    }
     // Phase 4 slice γ: opt-in trust-prompt bypass for fresh workspaces.
     if (params.skipTrust) {
         args.push("--skip-trust");
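The output-format branch above reduces to a small mapping from the `outputFormat` value to argv entries. Here is a minimal sketch, assuming only what the hunk shows: `json` and `stream-json` each emit `-o <value>`, and any other value emits nothing. `GeminiOutputFormat` and `outputFormatArgs` are hypothetical names for illustration, not identifiers from the package.

```typescript
type GeminiOutputFormat = "text" | "json" | "stream-json";

// Hypothetical helper: returns the argv fragment for a given output format.
function outputFormatArgs(format: GeminiOutputFormat = "text"): string[] {
    switch (format) {
        case "json":
            return ["-o", "json"];
        case "stream-json":
            return ["-o", "stream-json"];
        default:
            return []; // no flag is emitted for "text", as in the hunk above
    }
}

// Building a full argv list around the fragment; the base args are made up.
const baseArgs: string[] = ["example-prompt"];
const streamArgs = [...baseArgs, ...outputFormatArgs("stream-json")];
console.log(streamArgs);
```

Keeping the mapping in one function makes the enum bound easy to test: a value outside the union fails type-checking before it can reach the command line.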
@@ -1273,7 +1300,7 @@ export function prepareGeminiRequest(params, runtime = resolveGatewayServerRunti
         stablePrefixTokens,
     };
 }
-function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
+export function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
     const corrId = params.correlationId || randomUUID();
     const cliInfo = getCliInfo();
     const resolvedModel = resolveModelAlias("grok", params.model, cliInfo);
@@ -1349,6 +1376,9 @@ function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime()) {
     if (params.disallowedTools && params.disallowedTools.length > 0) {
         args.push("--disallowed-tools", params.disallowedTools.join(","));
     }
+    if (params.maxTurns !== undefined) {
+        args.push("--max-turns", String(params.maxTurns));
+    }
     return {
         corrId,
         effectivePrompt,
@@ -1433,6 +1463,8 @@ export function prepareMistralRequest(params, runtime = resolveGatewayServerRunt
         allowedTools: params.allowedTools,
         disallowedTools: params.disallowedTools,
         trust: params.trust,
+        maxTurns: params.maxTurns,
+        maxPrice: params.maxPrice,
     });
     if (prep.ignoredDisallowedTools) {
         runtime.logger.info(`[${corrId}] Mistral does not support disallowedTools; ignoring (caller passed ${params.disallowedTools?.length ?? 0} entries)`);
@@ -1463,6 +1495,32 @@ function selectMistralRecoveryModel(failedModel) {
     ].filter((model) => Boolean(model && model !== failedModel));
     return candidates.find(model => model !== "local");
 }
+/**
+ * Phase 4 slice δ post-review: pure helper extracted from
+ * `handleMistralRequest` so the retry-path arg-preservation invariants
+ * (trust + maxTurns + maxPrice from slices γ/δ) are unit-testable
+ * without mocking awaitJobOrDefer. Any param the wrapper threads into
+ * the FIRST `buildMistralCliInvocation` call MUST also be threaded
+ * through here, or a fresh-workspace / budgeted run can degrade on
+ * the second attempt.
+ */
+export function buildMistralRetryPrep(params, recoveryModel) {
+    return buildMistralCliInvocation({
+        prompt: params.effectivePrompt,
+        resolvedModel: recoveryModel,
+        outputFormat: params.outputFormat,
+        permissionMode: params.approvalStrategy === "mcp_managed"
+            ? "auto-approve"
+            : (params.permissionMode ?? "auto-approve"),
+        effort: params.effort,
+        reasoningEffort: params.reasoningEffort,
+        allowedTools: params.allowedTools,
+        disallowedTools: params.disallowedTools,
+        trust: params.trust,
+        maxTurns: params.maxTurns,
+        maxPrice: params.maxPrice,
+    });
+}
 function buildCliResponse(cli, stdout, optimizeResponse, corrId, sessionId, prep, durationMs, resumable, outputFormat, warnings) {
     let finalStdout = stdout;
     // Skip response optimization for JSON output to prevent corrupting structured data
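The invariant stated in the doc comment above (every flag passed on the first attempt must also be passed on the retry) can be exercised by enumerating flag combinations, in the spirit of the changelog's `trust × maxTurns × maxPrice` regression block. The sketch below is an illustration under assumptions: `buildInvocationSketch` is a hypothetical stand-in for `buildMistralCliInvocation`, whose body is not part of this diff, and it models only the three flags named in the changelog. The model names are placeholders.

```typescript
interface BudgetFlags {
    trust?: boolean;
    maxTurns?: number;
    maxPrice?: number;
}

// Hypothetical stand-in for buildMistralCliInvocation (body not shown in this diff).
function buildInvocationSketch(params: BudgetFlags & { model: string }): { model: string; args: string[] } {
    const args: string[] = [];
    if (params.trust) args.push("--trust");
    if (params.maxTurns !== undefined) args.push("--max-turns", String(params.maxTurns));
    if (params.maxPrice !== undefined) args.push("--max-price", String(params.maxPrice));
    return { model: params.model, args };
}

// Retry helper in the spirit of buildMistralRetryPrep: same flags, different model.
function retryPrepSketch(params: BudgetFlags, recoveryModel: string): { model: string; args: string[] } {
    return buildInvocationSketch({ ...params, model: recoveryModel });
}

// Enumerate every trust x maxTurns x maxPrice combination (2 x 2 x 2 = 8).
const retryFailures: string[] = [];
let comboCount = 0;
for (const trust of [undefined, true]) {
    for (const maxTurns of [undefined, 5]) {
        for (const maxPrice of [undefined, 0.25]) {
            comboCount += 1;
            const first = buildInvocationSketch({ trust, maxTurns, maxPrice, model: "model-a" });
            const retry = retryPrepSketch({ trust, maxTurns, maxPrice }, "model-b");
            // The retry must carry the same flags and switch to the recovery model.
            if (first.args.join(" ") !== retry.args.join(" ") || retry.model !== "model-b") {
                retryFailures.push(JSON.stringify({ trust, maxTurns, maxPrice }));
            }
        }
    }
}
console.log(`${comboCount} combinations checked, ${retryFailures.length} failures`);
```

A check like this only has teeth if dropping one of the forwarded fields in the retry helper makes it fail, which is the falsifiability property the changelog's audit notes emphasise. Extracting the helper as a pure function is what makes the enumeration possible without mocking the job runner.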
@@ -1801,6 +1859,7 @@ export async function handleGrokRequest(deps, params) {
         correlationId: params.correlationId,
         optimizePrompt: params.optimizePrompt,
         operation: "grok_request",
+        maxTurns: params.maxTurns,
     }, runtime);
     if (!("args" in prep))
         return prep;
@@ -1921,6 +1980,7 @@ export async function handleGrokRequestAsync(deps, params) {
         correlationId: params.correlationId,
         optimizePrompt: params.optimizePrompt,
         operation: "grok_request_async",
+        maxTurns: params.maxTurns,
     }, runtime);
     if (!("args" in prep))
         return prep;
@@ -2003,6 +2063,8 @@ export async function handleMistralRequest(deps, params) {
         optimizePrompt: params.optimizePrompt,
         operation: "mistral_request",
         trust: params.trust,
+        maxTurns: params.maxTurns,
+        maxPrice: params.maxPrice,
     }, runtime);
     if (!("args" in prep))
         return prep;
@@ -2035,22 +2097,7 @@ export async function handleMistralRequest(deps, params) {
         const recoveryModel = selectMistralRecoveryModel(prep.resolvedModel);
         if (recoveryModel) {
             deps.logger.info(`[${corrId}] mistral_request detected stale Vibe model selection; retrying once with ${recoveryModel}`);
-            const retryPrep =
-                prompt: prep.effectivePrompt,
-                resolvedModel: recoveryModel,
-                outputFormat: params.outputFormat,
-                permissionMode: params.approvalStrategy === "mcp_managed"
-                    ? "auto-approve"
-                    : (params.permissionMode ?? "auto-approve"),
-                effort: params.effort,
-                reasoningEffort: params.reasoningEffort,
-                allowedTools: params.allowedTools,
-                disallowedTools: params.disallowedTools,
-                // Phase 4 slice γ: preserve --trust on the model-selection retry
-                // so a fresh untrusted workspace doesn't block headlessly on the
-                // second attempt after surviving the first.
-                trust: params.trust,
-            });
+            const retryPrep = buildMistralRetryPrep({ ...params, effectivePrompt: prep.effectivePrompt }, recoveryModel);
             const retryArgs = [...retryPrep.args, ...sessionResult.resumeArgs];
             // Reuse the FR handoff built above — the retry preserves corrId,
             // so the manager's logComplete still updates the original row.
@@ -2151,6 +2198,8 @@ export async function handleMistralRequestAsync(deps, params) {
|
|
|
2151
2198
|
optimizePrompt: params.optimizePrompt,
|
|
2152
2199
|
operation: "mistral_request_async",
|
|
2153
2200
|
trust: params.trust,
|
|
2201
|
+
maxTurns: params.maxTurns,
|
|
2202
|
+
maxPrice: params.maxPrice,
|
|
2154
2203
|
}, runtime);
|
|
2155
2204
|
if (!("args" in prep))
|
|
2156
2205
|
return prep;
|
|
@@ -3030,11 +3079,14 @@ export function createGatewayServer(deps = {}) {
|
|
|
3030
3079
|
.default(false)
|
|
3031
3080
|
.describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
|
|
3032
3081
|
// U23: emit `-o json` to extract token usage via parseGeminiJson. Default
|
|
3033
|
-
// remains text so existing callers see no behavior change.
|
|
3082
|
+
// remains text so existing callers see no behavior change. Phase 4 slice
|
|
3083
|
+
// ε adds `stream-json` (NDJSON event stream parsed by
|
|
3084
|
+
// parseGeminiStreamJson — `init`/`message`/`result` lines, idle-timeout
|
|
3085
|
+
// semantics covered by Gemini's existing real-time stdout streaming).
|
|
3034
3086
|
outputFormat: z
|
|
3035
|
-
.enum(["text", "json"])
|
|
3087
|
+
.enum(["text", "json", "stream-json"])
|
|
3036
3088
|
.default("text")
|
|
3037
|
-
.describe("Gemini output format. `json` emits `-o json`
|
|
3089
|
+
.describe("Gemini output format. `json` emits `-o json` (single JSON with usageMetadata). `stream-json` emits `-o stream-json` (NDJSON event stream — `init`/`message`/`result` lines, usage extracted from the terminal `result.stats` event). Both report usage to the flight recorder."),
|
|
3038
3090
|
sandbox: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.sandbox.describe("Run Gemini in sandbox mode (-s)"),
|
|
3039
3091
|
policyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.policyFiles.describe("Policy file paths (--policy <path>, one per file). Paths must exist."),
|
|
3040
3092
|
adminPolicyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.adminPolicyFiles.describe("Admin policy file paths (--admin-policy <path>, one per file). Paths must exist."),
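
Note on the hunk above: the `stream-json` description says the output is an NDJSON event stream of `init`/`message`/`result` lines, with usage read from the terminal `result.stats` event. A minimal sketch of that consumption pattern follows. It is not the package's `parseGeminiStreamJson`; the function name `parseStreamJsonSketch` and the event fields `role`, `content`, and `total_tokens` are assumptions made for illustration only.

```javascript
// Sketch only: event field names (role, content, total_tokens) are assumed,
// not taken from the Gemini CLI or from parseGeminiStreamJson.
function parseStreamJsonSketch(stdout) {
  let text = "";
  let stats = null;
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // NDJSON may contain blank lines
    let event;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // ignore lines that are not valid JSON
    }
    if (event.type === "message" && event.role === "assistant" && typeof event.content === "string") {
      text += event.content; // concatenate assistant chunks in arrival order
    } else if (event.type === "result") {
      stats = event.stats ?? null; // usage comes from the terminal event
    }
  }
  return { text, stats };
}

const sampleStream = [
  '{"type":"init","session_id":"abc"}',
  '{"type":"message","role":"assistant","content":"Hel"}',
  '{"type":"message","role":"assistant","content":"lo"}',
  '{"type":"result","stats":{"total_tokens":7}}',
].join("\n");

const parsedSample = parseStreamJsonSketch(sampleStream);
console.log(parsedSample.text, parsedSample.stats);
```

Skipping unparseable lines rather than throwing is a design choice of this sketch; a stricter parser could surface them as warnings instead.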
|
|
@@ -3142,7 +3194,8 @@ export function createGatewayServer(deps = {}) {
|
|
|
3142
3194
|
.boolean()
|
|
3143
3195
|
.default(false)
|
|
3144
3196
|
.describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
|
|
3145
|
-
|
|
3197
|
+
maxTurns: MAX_TURNS_SCHEMA.optional().describe("Grok `--max-turns N`: cap on agent-loop iterations for cost / latency control (Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
|
|
3198
|
+
}, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, }) => {
|
|
3146
3199
|
return handleGrokRequest({ sessionManager, logger, runtime }, {
|
|
3147
3200
|
prompt,
|
|
3148
3201
|
promptParts,
|
|
@@ -3165,6 +3218,7 @@ export function createGatewayServer(deps = {}) {
|
|
|
3165
3218
|
optimizeResponse,
|
|
3166
3219
|
idleTimeoutMs,
|
|
3167
3220
|
forceRefresh,
|
|
3221
|
+
maxTurns,
|
|
3168
3222
|
});
|
|
3169
3223
|
});
|
|
3170
3224
|
//──────────────────────────────────────────────────────────────────────────────
|
|
@@ -3242,7 +3296,9 @@ export function createGatewayServer(deps = {}) {
|
|
|
3242
3296
|
.boolean()
|
|
3243
3297
|
.default(false)
|
|
3244
3298
|
.describe("Emit `--trust` so Vibe trusts the cwd for this invocation only (not persisted to trusted_folders.toml) and skips the interactive trust prompt (Phase 4 slice γ)."),
|
|
3245
|
-
|
|
3299
|
+
maxTurns: MAX_TURNS_SCHEMA.optional().describe("Vibe `--max-turns N`: cap the agent-loop iteration count (programmatic mode only, Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
|
|
3300
|
+
maxPrice: MAX_PRICE_SCHEMA.optional().describe("Vibe `--max-price DOLLARS`: interrupt the session when cumulative cost crosses this cap (programmatic mode only, Phase 4 slice δ). Bounded to finite values ≤ 10000 USD."),
|
|
3301
|
+
}, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, trust, maxTurns, maxPrice, }) => {
|
|
3246
3302
|
return handleMistralRequest({ sessionManager, logger, runtime }, {
|
|
3247
3303
|
prompt,
|
|
3248
3304
|
promptParts,
|
|
@@ -3265,6 +3321,8 @@ export function createGatewayServer(deps = {}) {
|
|
|
3265
3321
|
idleTimeoutMs,
|
|
3266
3322
|
forceRefresh,
|
|
3267
3323
|
trust,
|
|
3324
|
+
maxTurns,
|
|
3325
|
+
maxPrice,
|
|
3268
3326
|
});
|
|
3269
3327
|
});
|
|
3270
3328
|
//──────────────────────────────────────────────────────────────────────────────
|
|
@@ -3646,11 +3704,14 @@ export function createGatewayServer(deps = {}) {
|
|
|
3646
3704
|
.default(false)
|
|
3647
3705
|
.describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
|
|
3648
3706
|
// U23: emit `-o json` to extract token usage via parseGeminiJson. Default
|
|
3649
|
-
// remains text so existing callers see no behavior change.
|
|
3707
|
+
// remains text so existing callers see no behavior change. Phase 4 slice
|
|
3708
|
+
// ε adds `stream-json` (NDJSON event stream parsed by
|
|
3709
|
+
// parseGeminiStreamJson — `init`/`message`/`result` lines, idle-timeout
|
|
3710
|
+
// semantics covered by Gemini's existing real-time stdout streaming).
|
|
3650
3711
|
outputFormat: z
|
|
3651
|
-
.enum(["text", "json"])
|
|
3712
|
+
.enum(["text", "json", "stream-json"])
|
|
3652
3713
|
.default("text")
|
|
3653
|
-
.describe("Gemini output format. `json` emits `-o json`
|
|
3714
|
+
.describe("Gemini output format. `json` emits `-o json` (single JSON with usageMetadata). `stream-json` emits `-o stream-json` (NDJSON event stream — `init`/`message`/`result` lines, usage extracted from the terminal `result.stats` event). Both report usage to the flight recorder."),
|
|
3654
3715
|
sandbox: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.sandbox.describe("Run Gemini in sandbox mode (-s)"),
|
|
3655
3716
|
policyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.policyFiles.describe("Policy file paths (--policy <path>, one per file). Paths must exist."),
|
|
3656
3717
|
adminPolicyFiles: GEMINI_HIGH_IMPACT_PARAMS_SCHEMA.shape.adminPolicyFiles.describe("Admin policy file paths (--admin-policy <path>, one per file). Paths must exist."),
|
|
@@ -3753,7 +3814,8 @@ export function createGatewayServer(deps = {}) {
|
|
|
3753
3814
|
.boolean()
|
|
3754
3815
|
.default(false)
|
|
3755
3816
|
.describe("Bypass dedup and force a fresh CLI run even if a recent identical request exists"),
|
|
3756
|
-
|
|
3817
|
+
maxTurns: MAX_TURNS_SCHEMA.optional().describe("Grok `--max-turns N`: cap on agent-loop iterations for cost / latency control (Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
|
|
3818
|
+
}, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, }) => {
|
|
3757
3819
|
return handleGrokRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
|
|
3758
3820
|
prompt,
|
|
3759
3821
|
promptParts,
|
|
@@ -3775,6 +3837,7 @@ export function createGatewayServer(deps = {}) {
|
|
|
3775
3837
|
optimizePrompt,
|
|
3776
3838
|
idleTimeoutMs,
|
|
3777
3839
|
forceRefresh,
|
|
3840
|
+
maxTurns,
|
|
3778
3841
|
});
|
|
3779
3842
|
});
|
|
3780
3843
|
server.tool("mistral_request_async", {
|
|
@@ -3848,7 +3911,9 @@ export function createGatewayServer(deps = {}) {
|
|
|
3848
3911
|
.boolean()
|
|
3849
3912
|
.default(false)
|
|
3850
3913
|
.describe("Emit `--trust` so Vibe trusts the cwd for this invocation only (not persisted to trusted_folders.toml) and skips the interactive trust prompt (Phase 4 slice γ)."),
|
|
3851
|
-
|
|
3914
|
+
maxTurns: MAX_TURNS_SCHEMA.optional().describe("Vibe `--max-turns N`: cap the agent-loop iteration count (programmatic mode only, Phase 4 slice δ). Bounded to safe integers ≤ 10000."),
|
|
3915
|
+
maxPrice: MAX_PRICE_SCHEMA.optional().describe("Vibe `--max-price DOLLARS`: interrupt the session when cumulative cost crosses this cap (programmatic mode only, Phase 4 slice δ). Bounded to finite values ≤ 10000 USD."),
|
|
3916
|
+
}, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, trust, maxTurns, maxPrice, }) => {
|
|
3852
3917
|
return handleMistralRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
|
|
3853
3918
|
prompt,
|
|
3854
3919
|
promptParts,
|
|
@@ -3870,6 +3935,8 @@ export function createGatewayServer(deps = {}) {
|
|
|
3870
3935
|
idleTimeoutMs,
|
|
3871
3936
|
forceRefresh,
|
|
3872
3937
|
trust,
|
|
3938
|
+
maxTurns,
|
|
3939
|
+
maxPrice,
|
|
3873
3940
|
});
|
|
3874
3941
|
});
|
|
3875
3942
|
server.tool("llm_job_status", {
|
package/dist/request-helpers.d.ts
CHANGED
|
@@ -114,6 +114,17 @@ export interface PrepareMistralRequestInput {
|
|
|
114
114
|
* Vibe's prompt behaviour is preserved for existing callers.
|
|
115
115
|
*/
|
|
116
116
|
trust?: boolean;
|
|
117
|
+
/**
|
|
118
|
+
* Phase 4 slice δ: emit `--max-turns N` to cap the agent-loop iteration
|
|
119
|
+
* count (only applies in programmatic mode with `-p`).
|
|
120
|
+
*/
|
|
121
|
+
maxTurns?: number;
|
|
122
|
+
/**
|
|
123
|
+
* Phase 4 slice δ: emit `--max-price DOLLARS` so the session is
|
|
124
|
+
* interrupted when cumulative cost crosses the cap (programmatic mode
|
|
125
|
+
* only).
|
|
126
|
+
*/
|
|
127
|
+
maxPrice?: number;
|
|
117
128
|
}
|
|
118
129
|
export interface PrepareMistralRequestResult {
|
|
119
130
|
args: string[];
|
package/dist/request-helpers.js
CHANGED
|
@@ -179,6 +179,12 @@ export function prepareMistralRequest(input) {
|
|
|
179
179
|
if (input.trust) {
|
|
180
180
|
args.push("--trust");
|
|
181
181
|
}
|
|
182
|
+
if (input.maxTurns !== undefined) {
|
|
183
|
+
args.push("--max-turns", String(input.maxTurns));
|
|
184
|
+
}
|
|
185
|
+
if (input.maxPrice !== undefined) {
|
|
186
|
+
args.push("--max-price", String(input.maxPrice));
|
|
187
|
+
}
|
|
182
188
|
const ignoredDisallowedTools = Boolean(input.disallowedTools && input.disallowedTools.length > 0);
|
|
183
189
|
return { args, env, ignoredDisallowedTools };
|
|
184
190
|
}
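
Note on the hunk above: the two new branches append `--max-turns` and `--max-price` only when the corresponding input is not `undefined`. The sketch below isolates that behaviour so it can be run on its own. `buildLimitArgs` is a hypothetical helper name used for illustration, not a function exported by the package.

```javascript
// Hypothetical helper mirroring the two new branches in prepareMistralRequest.
function buildLimitArgs(input) {
  const args = [];
  if (input.maxTurns !== undefined) {
    args.push("--max-turns", String(input.maxTurns));
  }
  if (input.maxPrice !== undefined) {
    args.push("--max-price", String(input.maxPrice));
  }
  return args;
}

const limitArgs = buildLimitArgs({ maxTurns: 3, maxPrice: 0.01 });
console.log(limitArgs.join(" ")); // prints "--max-turns 3 --max-price 0.01"
```

Because the check is `!== undefined` rather than a truthiness test, rejecting out-of-range values is left to the schema and contract layers rather than to this step.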
|
package/dist/upstream-contracts.js
CHANGED
|
@@ -133,14 +133,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
|
|
|
133
133
|
"ignoreRules",
|
|
134
134
|
],
|
|
135
135
|
resumeOnlyFlags: ["--last"],
|
|
136
|
-
|
|
137
|
-
|
|
138
|
-
|
|
139
|
-
|
|
140
|
-
|
|
141
|
-
"--search",
|
|
142
|
-
"-c",
|
|
143
|
-
],
|
|
136
|
+
// Phase 4 slice α (v1.8.0) verified that `codex exec resume` accepts
|
|
137
|
+
// `--output-schema` and `-c` (codex-cli 0.133.0 `exec resume --help`),
|
|
138
|
+
// so they're no longer forbidden. `--search` stays forbidden (resume
|
|
139
|
+
// inherits the original session's web-search state).
|
|
140
|
+
resumeForbiddenFlags: ["--sandbox", "--ask-for-approval", "--full-auto", "--search"],
|
|
144
141
|
flags: {
|
|
145
142
|
"--last": { arity: "none", description: "Resume latest session" },
|
|
146
143
|
"--model": { arity: "one", description: "Model selector" },
|
|
@@ -189,9 +186,24 @@ export const UPSTREAM_CLI_CONTRACTS = {
|
|
|
189
186
|
expect: "fail",
|
|
190
187
|
},
|
|
191
188
|
{
|
|
189
|
+
// Phase 4 slice α: --output-schema IS accepted on resume per
|
|
190
|
+
// codex-cli 0.133.0; this fixture pins the new behaviour so future
|
|
191
|
+
// contract changes can't silently regress.
|
|
192
192
|
id: "codex-resume-output-schema",
|
|
193
|
-
description: "
|
|
193
|
+
description: "Phase 4 slice α: --output-schema accepted on resume (codex-cli 0.133.0)",
|
|
194
194
|
args: ["exec", "resume", "--output-schema", "/tmp/schema.json", "session-id", "hello"],
|
|
195
|
+
expect: "pass",
|
|
196
|
+
},
|
|
197
|
+
{
|
|
198
|
+
id: "codex-resume-config-override",
|
|
199
|
+
description: "Phase 4 slice α: -c key=value accepted on resume",
|
|
200
|
+
args: ["exec", "resume", "-c", "model.foo=bar", "session-id", "hello"],
|
|
201
|
+
expect: "pass",
|
|
202
|
+
},
|
|
203
|
+
{
|
|
204
|
+
id: "codex-resume-search-still-forbidden",
|
|
205
|
+
description: "Phase 4 slice α: --search remains forbidden on resume",
|
|
206
|
+
args: ["exec", "resume", "--search", "session-id", "hello"],
|
|
195
207
|
expect: "fail",
|
|
196
208
|
},
|
|
197
209
|
],
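
Note on the hunks above: after this change `--output-schema` and `-c` are allowed on `codex exec resume`, while `--search` stays in `resumeForbiddenFlags`. The sketch below shows one way such a list could be checked against an argument vector, using the fixture arguments from the diff. `findForbiddenResumeFlags` is a hypothetical name, and the package's real contract validator likely does more (arity, unknown flags), so treat this as an illustration only.

```javascript
// Flag list copied from the diff; the checking function itself is hypothetical.
const resumeForbiddenFlags = ["--sandbox", "--ask-for-approval", "--full-auto", "--search"];

function findForbiddenResumeFlags(args) {
  const isResume = args[0] === "exec" && args[1] === "resume";
  if (!isResume) return []; // the list only applies to `exec resume`
  return args.filter((arg) => resumeForbiddenFlags.includes(arg));
}

const schemaOnResume = findForbiddenResumeFlags(
  ["exec", "resume", "--output-schema", "/tmp/schema.json", "session-id", "hello"]);
const searchOnResume = findForbiddenResumeFlags(
  ["exec", "resume", "--search", "session-id", "hello"]);

console.log(schemaOnResume.length === 0 ? "pass" : "fail"); // prints "pass"
console.log(searchOnResume.length === 0 ? "pass" : "fail"); // prints "fail"
```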
|
|
@@ -219,6 +231,8 @@ export const UPSTREAM_CLI_CONTRACTS = {
|
|
|
219
231
|
"policyFiles",
|
|
220
232
|
"adminPolicyFiles",
|
|
221
233
|
"attachments",
|
|
234
|
+
// Phase 4 slice γ
|
|
235
|
+
"skipTrust",
|
|
222
236
|
],
|
|
223
237
|
flags: {
|
|
224
238
|
"-p": { arity: "one", description: "Prompt text" },
|
|
@@ -234,8 +248,16 @@ export const UPSTREAM_CLI_CONTRACTS = {
|
|
|
234
248
|
"-s": { arity: "none", description: "Sandbox mode" },
|
|
235
249
|
"--policy": { arity: "one", description: "Policy file path" },
|
|
236
250
|
"--admin-policy": { arity: "one", description: "Admin policy file path" },
|
|
237
|
-
"-o": {
|
|
251
|
+
"-o": {
|
|
252
|
+
arity: "one",
|
|
253
|
+
values: ["json", "stream-json"],
|
|
254
|
+
description: "Output format (Phase 4 slice ε adds stream-json)",
|
|
255
|
+
},
|
|
238
256
|
"--resume": { arity: "one", description: "Resume session" },
|
|
257
|
+
"--skip-trust": {
|
|
258
|
+
arity: "none",
|
|
259
|
+
description: "Trust workspace for this session (Phase 4 slice γ)",
|
|
260
|
+
},
|
|
239
261
|
},
|
|
240
262
|
env: {},
|
|
241
263
|
conformanceFixtures: [
|
|
@@ -251,6 +273,24 @@ export const UPSTREAM_CLI_CONTRACTS = {
|
|
|
251
273
|
args: ["-p", "hello", "--not-a-gemini-flag"],
|
|
252
274
|
expect: "fail",
|
|
253
275
|
},
|
|
276
|
+
{
|
|
277
|
+
id: "gemini-skip-trust",
|
|
278
|
+
description: "Phase 4 slice γ: --skip-trust is accepted",
|
|
279
|
+
args: ["-p", "hello", "--skip-trust"],
|
|
280
|
+
expect: "pass",
|
|
281
|
+
},
|
|
282
|
+
{
|
|
283
|
+
id: "gemini-stream-json",
|
|
284
|
+
description: "Phase 4 slice ε: -o stream-json is accepted",
|
|
285
|
+
args: ["-p", "hello", "-o", "stream-json"],
|
|
286
|
+
expect: "pass",
|
|
287
|
+
},
|
|
288
|
+
{
|
|
289
|
+
id: "gemini-output-format-invalid",
|
|
290
|
+
description: "Phase 4 slice ε: -o ndjson is rejected (not in contract enum)",
|
|
291
|
+
args: ["-p", "hello", "-o", "ndjson"],
|
|
292
|
+
expect: "fail",
|
|
293
|
+
},
|
|
254
294
|
],
|
|
255
295
|
},
|
|
256
296
|
grok: {
|
|
@@ -275,6 +315,8 @@ export const UPSTREAM_CLI_CONTRACTS = {
|
|
|
275
315
|
"mcpServers",
|
|
276
316
|
"allowedTools",
|
|
277
317
|
"disallowedTools",
|
|
318
|
+
// Phase 4 slice δ
|
|
319
|
+
"maxTurns",
|
|
278
320
|
],
|
|
279
321
|
flags: {
|
|
280
322
|
"-p": { arity: "one", description: "Prompt text" },
|
|
@@ -299,6 +341,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
|
|
|
299
341
|
},
|
|
300
342
|
"--resume": { arity: "one", description: "Resume session" },
|
|
301
343
|
"--continue": { arity: "none", description: "Continue latest session" },
|
|
344
|
+
"--max-turns": {
|
|
345
|
+
arity: "one",
|
|
346
|
+
pattern: /^[1-9][0-9]*$/,
|
|
347
|
+
description: "Agent-loop iteration cap (Phase 4 slice δ)",
|
|
348
|
+
},
|
|
302
349
|
},
|
|
303
350
|
env: {},
|
|
304
351
|
conformanceFixtures: [
|
|
@@ -314,6 +361,18 @@ export const UPSTREAM_CLI_CONTRACTS = {
|
|
|
314
361
|
args: ["-p", "hello", "--not-a-grok-flag"],
|
|
315
362
|
expect: "fail",
|
|
316
363
|
},
|
|
364
|
+
{
|
|
365
|
+
id: "grok-max-turns",
|
|
366
|
+
description: "Phase 4 slice δ: --max-turns N is accepted",
|
|
367
|
+
args: ["-p", "hello", "--max-turns", "5"],
|
|
368
|
+
expect: "pass",
|
|
369
|
+
},
|
|
370
|
+
{
|
|
371
|
+
id: "grok-max-turns-invalid-zero",
|
|
372
|
+
description: "Phase 4 slice δ: --max-turns 0 is rejected by contract pattern",
|
|
373
|
+
args: ["-p", "hello", "--max-turns", "0"],
|
|
374
|
+
expect: "fail",
|
|
375
|
+
},
|
|
317
376
|
],
|
|
318
377
|
},
|
|
319
378
|
mistral: {
|
|
@@ -337,6 +396,11 @@ export const UPSTREAM_CLI_CONTRACTS = {
|
|
|
337
396
|
"mcpServers",
|
|
338
397
|
"allowedTools",
|
|
339
398
|
"disallowedTools",
|
|
399
|
+
// Phase 4 slice γ
|
|
400
|
+
"trust",
|
|
401
|
+
// Phase 4 slice δ
|
|
402
|
+
"maxTurns",
|
|
403
|
+
"maxPrice",
|
|
340
404
|
],
|
|
341
405
|
flags: {
|
|
342
406
|
"-p": { arity: "one", description: "Prompt text" },
|
|
@@ -355,6 +419,22 @@ export const UPSTREAM_CLI_CONTRACTS = {
|
|
|
355
419
|
"--enabled-tools": { arity: "one", description: "Enabled tool" },
|
|
356
420
|
"--resume": { arity: "one", description: "Resume session" },
|
|
357
421
|
"--continue": { arity: "none", description: "Continue latest session" },
|
|
422
|
+
"--trust": {
|
|
423
|
+
arity: "none",
|
|
424
|
+
description: "Trust cwd for this invocation only (Phase 4 slice γ)",
|
|
425
|
+
},
|
|
426
|
+
"--max-turns": {
|
|
427
|
+
arity: "one",
|
|
428
|
+
pattern: /^[1-9][0-9]*$/,
|
|
429
|
+
description: "Agent-loop iteration cap (Phase 4 slice δ, programmatic mode only)",
|
|
430
|
+
},
|
|
431
|
+
"--max-price": {
|
|
432
|
+
arity: "one",
|
|
433
|
+
// Decimal-only: matches the MAX_PRICE_SCHEMA min(1e-6) lower bound
|
|
434
|
+
// that keeps String(N) in decimal form (no scientific notation).
|
|
435
|
+
pattern: /^(0|[1-9][0-9]*)(\.[0-9]+)?$/,
|
|
436
|
+
description: "Cumulative cost cap in USD (Phase 4 slice δ, programmatic mode only)",
|
|
437
|
+
},
|
|
358
438
|
},
|
|
359
439
|
env: {
|
|
360
440
|
VIBE_ACTIVE_MODEL: {
|
|
@@ -378,6 +458,27 @@ export const UPSTREAM_CLI_CONTRACTS = {
|
|
|
378
458
|
env: { CODEX_MODEL: "gpt-5.5" },
|
|
379
459
|
expect: "fail",
|
|
380
460
|
},
|
|
461
|
+
{
|
|
462
|
+
id: "mistral-trust",
|
|
463
|
+
description: "Phase 4 slice γ: --trust is accepted",
|
|
464
|
+
args: ["-p", "hello", "--agent", "auto-approve", "--trust"],
|
|
465
|
+
env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
|
|
466
|
+
expect: "pass",
|
|
467
|
+
},
|
|
468
|
+
{
|
|
469
|
+
id: "mistral-max-turns-and-price",
|
|
470
|
+
description: "Phase 4 slice δ: --max-turns + --max-price are accepted together",
|
|
471
|
+
args: ["-p", "hello", "--agent", "auto-approve", "--max-turns", "3", "--max-price", "0.01"],
|
|
472
|
+
env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
|
|
473
|
+
expect: "pass",
|
|
474
|
+
},
|
|
475
|
+
{
|
|
476
|
+
id: "mistral-max-price-scientific-notation",
|
|
477
|
+
description: "Phase 4 slice δ: scientific-notation --max-price is rejected by contract pattern (matches MAX_PRICE_SCHEMA bounds)",
|
|
478
|
+
args: ["-p", "hello", "--agent", "auto-approve", "--max-price", "1e-7"],
|
|
479
|
+
env: { VIBE_ACTIVE_MODEL: "mistral-medium-3.5" },
|
|
480
|
+
expect: "fail",
|
|
481
|
+
},
|
|
381
482
|
],
|
|
382
483
|
},
|
|
383
484
|
};
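
Note on the `--max-turns` and `--max-price` contract patterns above: the `--max-price` comment ties the decimal-only pattern to the `1e-6` lower bound, because JavaScript's `String()` switches to scientific notation for smaller magnitudes. The sketch below copies the two regular expressions from the diff and checks them against the fixture values; the constant names are made up for illustration.

```javascript
// Patterns copied verbatim from the contract table; constant names are illustrative.
const MAX_TURNS_PATTERN = /^[1-9][0-9]*$/;
const MAX_PRICE_PATTERN = /^(0|[1-9][0-9]*)(\.[0-9]+)?$/;

// String() keeps decimal form down to 1e-6 and uses exponent form below it.
const atLowerBound = String(0.000001); // "0.000001"
const belowLowerBound = String(0.0000001); // "1e-7"

console.log(MAX_TURNS_PATTERN.test("5")); // true
console.log(MAX_TURNS_PATTERN.test("0")); // false
console.log(MAX_PRICE_PATTERN.test("0.01")); // true
console.log(MAX_PRICE_PATTERN.test(atLowerBound)); // true
console.log(MAX_PRICE_PATTERN.test(belowLowerBound)); // false
```

This is why the `mistral-max-price-scientific-notation` fixture with `1e-7` is expected to fail: a value below the schema's lower bound would stringify to a form the pattern does not accept.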
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "llm-cli-gateway",
|
|
3
|
-
"version": "1.
|
|
3
|
+
"version": "1.10.0",
|
|
4
4
|
"mcpName": "io.github.verivus-oss/llm-cli-gateway",
|
|
5
5
|
"description": "MCP server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral Vibe CLIs with session management, retry logic, async job orchestration, durable job results, and cross-LLM validation.",
|
|
6
6
|
"license": "MIT",
|