llm-cli-gateway 1.12.0 → 1.13.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,137 @@
 
  All notable changes to the llm-cli-gateway project.
 
+ ## [1.13.0] - 2026-05-27 — Phase 4 slice θ (Grok HIGH parity)
+
+ Ships the eighth Phase 4 slice: five HIGH-impact Grok CLI flags are now
+ reachable from `grok_request` and `grok_request_async`. Grok was the
+ most under-wired provider per the 2026-05-27 audit; this slice closes
+ the HIGH-severity gap in a single bundled PR. Three commits land
+ together (feature wiring, contract registration, test-veracity
+ regressions) plus this release commit.
+
+ ### Added — five HIGH-impact Grok flags
+
+ - **`sandbox`** → `--sandbox <PROFILE>`. Freeform passthrough per
+   `grok --help` on 0.1.210 (no `[possible values: …]` listing, unlike
+   `--effort` / `--permission-mode` / `--output-format` which all
+   enumerate). Also settable via the `GROK_SANDBOX` env var. Caller
+   responsibility to pass a valid profile name. The slice deliberately
+   does **not** integrate `--sandbox` with `approvalStrategy:
+   "mcp_managed"` because the value is unbounded — Grok's approval
+   semantics are already covered by `permissionMode` + `alwaysApprove` +
+   `approvalStrategy`.
+ - **`rules`** → `--rules <RULES>`. Supports `@file` prefix per
+   `grok --help` to load from a file; the gateway passes the value
+   verbatim and lets Grok parse the prefix. Bounded via
+   `z.string().min(1)`.
+ - **`systemPromptOverride`** → `--system-prompt-override <PROMPT>`.
+   Distinct from Claude's `--system-prompt` / `--append-system-prompt`
+   (Grok has only one override flag, not a pair). Bounded via
+   `z.string().min(1)`.
+ - **`allow`** → `--allow <RULE>` (repeatable). Each array entry is
+   emitted as its own `--allow` argv instance per `grok --help`
+   ("Repeat to add multiple rules"). NOT comma-joined like the existing
+   `--tools` / `--disallowed-tools` Grok wiring.
+ - **`deny`** → `--deny <RULE>` (repeatable). Same semantics as `allow`.
+
+ All five flags surfaced on both `grok_request` and `grok_request_async`
+ (slice δ sync+async parity invariant). Threaded from MCP-side Zod
+ through `GrokRequestParams` → `handleGrokRequest` /
+ `handleGrokRequestAsync` → `prepareGrokRequest` argv emission.
+
+ ### Contract surface
+
+ `UPSTREAM_CLI_CONTRACTS.grok` updates:
+
+ - `flags["--sandbox"]` (arity:"one"; **NO `values` enum** per live
+   `grok --help` — `--sandbox` is freeform, unlike Codex's
+   read-only/workspace-write/danger-full-access enum).
+ - `flags["--rules"]` (arity:"one").
+ - `flags["--system-prompt-override"]` (arity:"one").
+ - `flags["--allow"]` (arity:"one"; multiple instances accepted because
+   `arity:"one"` means "consumes one value per instance" not "max one
+   instance").
+ - `flags["--deny"]` (arity:"one"; same).
+ - `mcpParameters` array updated with five new entries.
+ - Five new passing conformance fixtures (`grok-sandbox`, `grok-rules`,
+   `grok-system-prompt-override`, `grok-allow-repeated`,
+   `grok-deny-repeated`); each is mechanically validated against
+   `validateUpstreamCliArgs` in the REGRESSIONS Tε suite, closing the
+   fixture-existence-vs-mechanical-validation gap identified in slice ε
+   round 1.
+
+ ### Out of scope
+
+ - **Approval-manager integration for `--sandbox`** — explicitly
+   deferred. Grok's sandbox value is freeform per the live CLI surface;
+   integrating it with the approval manager (as Codex does for its
+   bounded enum) would require either (a) hardcoding an allowlist of
+   profile names in the gateway, or (b) a different security model
+   where the caller asserts the profile is "safe enough". Neither is
+   obvious from current Grok docs. Revisit when Grok ships an enum or
+   publishes a sandbox-profile taxonomy.
+
+ ### Test-veracity audit
+
+ Per the standing protocol
+ (`feedback_test_veracity_audit_protocol`), this slice's tests were
+ audited by four LLM reviewers (Codex, Grok, Mistral, Claude) in async
+ parallel with mandatory mutation-probe execution against
+ `docs/plans/test-veracity-audit-slice-theta.spec.md`.
+
+ **Round 1 outcomes:**
+
+ - Codex: UNCONDITIONAL APPROVE — all 12 probes [as predicted], all
+   26 tests VERIFIED. Baseline (`npm test`: 55 files / 884 tests; build
+   + format:check clean; slice file 31/31).
+ - Grok: UNCONDITIONAL APPROVE — all 12 probes [as predicted]; ran in
+   an isolated worktree at `/tmp/theta-audit-grok` per the slice-ζ
+   reviewer-stomping lesson.
+ - Mistral: UNCONDITIONAL APPROVE — all 12 probes [as predicted].
+ - Claude: UNCONDITIONAL APPROVE — all 12 probes [as predicted]; noted
+   the extra Tε-2 test (custom-profile freeform regression probe) goes
+   beyond the spec and closes the "enum-mistake stays silent if fixture
+   uses a listed value" gap.
+ - Gemini: **FAILED at 10s** with `TerminalQuotaError: You have
+   exhausted your capacity on this model. Your quota will reset after
+   52m10s.` (Google 429). Documented quota blocker per protocol clause
+   5+6 — counts as "concrete unfixable when documented". Four
+   substantive valid approves from independent vendor families (OpenAI,
+   xAI, Mistral, Anthropic) satisfy the gate.
+
+ The 31 new tests (853 → 884 total) cover every new field/flag/fixture
+ across REGRESSIONS Tα/β/ε:
+
+ - **Tα** — Registered tool inputSchema for every new field on both
+   sync and async tools, including `.min(1)` empty-string rejection on
+   the three string fields (sandbox, rules, systemPromptOverride).
+ - **Tβ** — `prepareGrokRequest` end-to-end argv emission per flag.
+   Explicit "repeated `--allow`/`--deny` instances, NOT comma-joined
+   like `--tools`" assertions catch the comma-join regression class. An
+   "@file prefix passes through verbatim" assertion catches a "helpful
+   preprocessor" regression. Prepare → contract end-to-end via
+   `validateUpstreamCliArgs` (REGRESSIONS D pattern; closes the slice
+   α/γ/δ contract-table gap class).
+ - **Tε** — `UPSTREAM_CLI_CONTRACTS` introspection + mechanical fixture
+   validation in the same `it()` block. Explicit assertion that
+   `--sandbox` has **no `values` enum** (catches the "freeform vs enum"
+   regression that an over-zealous future contributor might introduce).
+   Extra Tε-2 probe asserts a non-standard sandbox profile passes
+   `validateUpstreamCliArgs`.
+
+ ### Mechanical anchors (verify with `rg` before relying)
+
+ - `src/index.ts` — `prepareGrokRequest` signature gains five fields
+   (`:1968-1995`), emission block (`:2088-2110`), `GrokRequestParams`
+   interface (`:2819-2829`), `handleGrokRequest` threading
+   (`:2854-2858`), `handleGrokRequestAsync` threading (`:3041-3045`),
+   sync `grok_request` Zod registration (`:4890-4922`), async
+   `grok_request_async` Zod registration (`:5906-5938`).
+ - `src/upstream-contracts.ts` — `grok.mcpParameters` (`:459-463`),
+   `grok.flags` entries (`:501-524`), conformance fixtures
+   (`:559-587`).
+
  ## [1.12.0] - 2026-05-27 — Phase 4 slice ζ (working-dir + add-dir cross-provider)
 
  Ships the seventh Phase 4 slice: working-directory and additional-directory
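
The difference between repeated and comma-joined emission that the 1.13.0 entry stresses can be illustrated with a small sketch. This is not the gateway's code: `emitRepeated` and `emitCommaJoined` are hypothetical helper names, and the comma-joined variant only stands in for the `--tools` style that the changelog contrasts against.

```typescript
// Hypothetical helpers for illustration; neither name exists in llm-cli-gateway.

/** One `<flag> <value>` argv pair per entry (the style described for --allow / --deny). */
function emitRepeated(flag: string, values: readonly string[] = []): string[] {
  const argv: string[] = [];
  for (const value of values) {
    argv.push(flag, value);
  }
  return argv;
}

/** A single `<flag> a,b,c` pair (the comma-joined style the changelog says is NOT used here). */
function emitCommaJoined(flag: string, values: readonly string[] = []): string[] {
  return values.length > 0 ? [flag, values.join(",")] : [];
}

const repeated = emitRepeated("--allow", ["bash", "edit"]);
const joined = emitCommaJoined("--tools", ["bash", "edit"]);

console.log(repeated); // [ '--allow', 'bash', '--allow', 'edit' ]
console.log(joined); // [ '--tools', 'bash,edit' ]
```

A regression from the first form to the second would hand the CLI one rule named `bash,edit` instead of two separate rules, which is presumably why the Tβ tests assert on the exact argv sequence.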
package/dist/index.d.ts CHANGED
@@ -262,6 +262,26 @@ export declare function prepareGrokRequest(params: {
       * working directory without depending on the gateway process's cwd.
       */
      workingDir?: string;
+     /**
+      * Phase 4 slice θ — Grok HIGH parity. All five are passthrough flags:
+      *
+      * - `sandbox` → `--sandbox <PROFILE>` (freeform; Grok 0.1.210 --help
+      * shows no enum constraint, unlike --effort / --permission-mode /
+      * --output-format which all show `[possible values: …]`).
+      * - `rules` → `--rules <RULES>`. Supports `@file` prefix; gateway
+      * passes the value verbatim and lets Grok parse it.
+      * - `systemPromptOverride` → `--system-prompt-override <PROMPT>`.
+      * Distinct from Claude's --system-prompt / --append-system-prompt
+      * (Grok has only one override flag).
+      * - `allow` / `deny` → repeatable `--allow <RULE>` / `--deny <RULE>`
+      * per --help ("Repeat to add multiple rules"). One argv pair per
+      * entry — NOT comma-joined like --tools / --disallowed-tools.
+      */
+     sandbox?: string;
+     rules?: string;
+     systemPromptOverride?: string;
+     allow?: string[];
+     deny?: string[];
  }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
  export declare function prepareMistralRequest(params: {
      prompt?: string;
@@ -382,6 +402,16 @@ export interface GrokRequestParams {
      maxTurns?: number;
      /** Phase 4 slice ζ: emit `--cwd <DIR>` so the CLI uses the specified working directory. */
      workingDir?: string;
+     /** Phase 4 slice θ: Grok `--sandbox <PROFILE>` (freeform passthrough). */
+     sandbox?: string;
+     /** Phase 4 slice θ: Grok `--rules <RULES>` (supports `@file` prefix; verbatim passthrough). */
+     rules?: string;
+     /** Phase 4 slice θ: Grok `--system-prompt-override <PROMPT>`. */
+     systemPromptOverride?: string;
+     /** Phase 4 slice θ: Grok `--allow <RULE>` (repeatable; one entry per --allow instance). */
+     allow?: string[];
+     /** Phase 4 slice θ: Grok `--deny <RULE>` (repeatable; one entry per --deny instance). */
+     deny?: string[];
  }
  export declare function handleGrokRequest(deps: HandlerDeps, params: GrokRequestParams): Promise<ExtendedToolResponse>;
  export declare function handleGrokRequestAsync(deps: AsyncHandlerDeps, params: Omit<GrokRequestParams, "optimizeResponse">): Promise<ExtendedToolResponse>;
package/dist/index.js CHANGED
@@ -1398,6 +1398,25 @@ export function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime
      if (params.workingDir) {
          args.push("--cwd", params.workingDir);
      }
+     if (params.sandbox) {
+         args.push("--sandbox", params.sandbox);
+     }
+     if (params.rules) {
+         args.push("--rules", params.rules);
+     }
+     if (params.systemPromptOverride) {
+         args.push("--system-prompt-override", params.systemPromptOverride);
+     }
+     if (params.allow && params.allow.length > 0) {
+         for (const rule of params.allow) {
+             args.push("--allow", rule);
+         }
+     }
+     if (params.deny && params.deny.length > 0) {
+         for (const rule of params.deny) {
+             args.push("--deny", rule);
+         }
+     }
      return {
          corrId,
          effectivePrompt,
@@ -1884,6 +1903,11 @@ export async function handleGrokRequest(deps, params) {
          operation: "grok_request",
          maxTurns: params.maxTurns,
          workingDir: params.workingDir,
+         sandbox: params.sandbox,
+         rules: params.rules,
+         systemPromptOverride: params.systemPromptOverride,
+         allow: params.allow,
+         deny: params.deny,
      }, runtime);
      if (!("args" in prep))
          return prep;
@@ -2006,6 +2030,11 @@ export async function handleGrokRequestAsync(deps, params) {
          operation: "grok_request_async",
          maxTurns: params.maxTurns,
          workingDir: params.workingDir,
+         sandbox: params.sandbox,
+         rules: params.rules,
+         systemPromptOverride: params.systemPromptOverride,
+         allow: params.allow,
+         deny: params.deny,
      }, runtime);
      if (!("args" in prep))
          return prep;
@@ -3262,7 +3291,31 @@ export function createGatewayServer(deps = {}) {
              .min(1)
              .optional()
              .describe("Grok --cwd <DIR>: working directory for this invocation. Lets headless callers run Grok against a directory other than the gateway process's cwd."),
-     }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, workingDir, }) => {
+         // Phase 4 slice θ Grok HIGH parity (sandbox, rules, system-prompt-override, allow, deny).
+         sandbox: z
+             .string()
+             .min(1)
+             .optional()
+             .describe("Grok --sandbox <PROFILE>: sandbox profile for filesystem and network access. Freeform per `grok --help` (no enum constraint on Grok 0.1.210); also settable via GROK_SANDBOX env var. Caller responsibility to pass a valid profile name."),
+         rules: z
+             .string()
+             .min(1)
+             .optional()
+             .describe("Grok --rules <RULES>: extra rules to append to the system prompt. Supports `@file` prefix per `grok --help` to load from a file; gateway passes the value verbatim and lets Grok parse the prefix."),
+         systemPromptOverride: z
+             .string()
+             .min(1)
+             .optional()
+             .describe("Grok --system-prompt-override <PROMPT>: replace the agent's system prompt entirely. Distinct from Claude's --system-prompt / --append-system-prompt (Grok has only one override flag, not a pair)."),
+         allow: z
+             .array(z.string())
+             .optional()
+             .describe('Grok --allow <RULE>: permission allow rules. Each entry is emitted as its own --allow instance (per `grok --help`: "Repeat to add multiple rules").'),
+         deny: z
+             .array(z.string())
+             .optional()
+             .describe('Grok --deny <RULE>: permission deny rules. Each entry is emitted as its own --deny instance (per `grok --help`: "Repeat to add multiple rules").'),
+     }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, workingDir, sandbox, rules, systemPromptOverride, allow, deny, }) => {
          return handleGrokRequest({ sessionManager, logger, runtime }, {
              prompt,
              promptParts,
@@ -3287,6 +3340,11 @@ export function createGatewayServer(deps = {}) {
              forceRefresh,
              maxTurns,
              workingDir,
+             sandbox,
+             rules,
+             systemPromptOverride,
+             allow,
+             deny,
          });
      });
      //──────────────────────────────────────────────────────────────────────────────
@@ -3931,7 +3989,31 @@ export function createGatewayServer(deps = {}) {
              .min(1)
              .optional()
              .describe("Grok --cwd <DIR>: working directory for this invocation. Lets headless callers run Grok against a directory other than the gateway process's cwd."),
-     }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, workingDir, }) => {
+         // Phase 4 slice θ Grok HIGH parity (sandbox, rules, system-prompt-override, allow, deny).
+         sandbox: z
+             .string()
+             .min(1)
+             .optional()
+             .describe("Grok --sandbox <PROFILE>: sandbox profile for filesystem and network access. Freeform per `grok --help` (no enum constraint); also settable via GROK_SANDBOX env var."),
+         rules: z
+             .string()
+             .min(1)
+             .optional()
+             .describe("Grok --rules <RULES>: extra rules to append to the system prompt. Supports `@file` prefix; gateway passes the value verbatim."),
+         systemPromptOverride: z
+             .string()
+             .min(1)
+             .optional()
+             .describe("Grok --system-prompt-override <PROMPT>: replace the agent's system prompt entirely."),
+         allow: z
+             .array(z.string())
+             .optional()
+             .describe("Grok --allow <RULE>: permission allow rules. Each entry → its own --allow instance."),
+         deny: z
+             .array(z.string())
+             .optional()
+             .describe("Grok --deny <RULE>: permission deny rules. Each entry → its own --deny instance."),
+     }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, workingDir, sandbox, rules, systemPromptOverride, allow, deny, }) => {
          return handleGrokRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
              prompt,
              promptParts,
@@ -3955,6 +4037,11 @@ export function createGatewayServer(deps = {}) {
              forceRefresh,
              maxTurns,
              workingDir,
+             sandbox,
+             rules,
+             systemPromptOverride,
+             allow,
+             deny,
          });
      });
      server.tool("mistral_request_async", {
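
The input constraints registered above can be mirrored in plain TypeScript to see which inputs pass the schema layer. This is a sketch under stated assumptions rather than the gateway's validation path: `checkThetaInput` is a hypothetical name, and it reproduces only the shapes visible in the diff, namely `z.string().min(1).optional()` for the three string fields and `z.array(z.string()).optional()` for `allow` and `deny`.

```typescript
// Hypothetical stand-in for the Zod shapes shown in the diff; not part of llm-cli-gateway.

/** Returns a list of problems; an empty list means the input has the expected shape. */
function checkThetaInput(input: Record<string, unknown>): string[] {
  const problems: string[] = [];

  // String fields: optional, but non-empty when present (mirrors .string().min(1).optional()).
  for (const key of ["sandbox", "rules", "systemPromptOverride"]) {
    const value = input[key];
    if (value === undefined) continue;
    if (typeof value !== "string" || value.length < 1) {
      problems.push(`${key}: expected a non-empty string`);
    }
  }

  // Array fields: optional arrays of strings with no per-entry length bound
  // (mirrors .array(z.string()).optional()), so "" entries pass at this layer.
  for (const key of ["allow", "deny"]) {
    const value = input[key];
    if (value === undefined) continue;
    if (!Array.isArray(value) || !value.every((entry) => typeof entry === "string")) {
      problems.push(`${key}: expected an array of strings`);
    }
  }

  return problems;
}

const okInput = checkThetaInput({ sandbox: "workspace-write", allow: ["bash", "edit"] });
const emptySandbox = checkThetaInput({ sandbox: "" });
const emptyRuleEntry = checkThetaInput({ allow: [""] });

console.log(okInput.length, emptySandbox.length, emptyRuleEntry.length); // 0 1 0
```

As registered in this diff, an empty-string entry inside `allow` or `deny` is not rejected by the schema, and the emission loop in `prepareGrokRequest` would forward it as an empty argv value. Whether that matters depends on how Grok treats an empty rule, which the diff does not say.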
@@ -401,6 +401,12 @@ export const UPSTREAM_CLI_CONTRACTS = {
              "maxTurns",
              // Phase 4 slice ζ
              "workingDir",
+             // Phase 4 slice θ — Grok HIGH parity
+             "sandbox",
+             "rules",
+             "systemPromptOverride",
+             "allow",
+             "deny",
          ],
          flags: {
              "-p": { arity: "one", description: "Prompt text" },
@@ -434,6 +440,30 @@ export const UPSTREAM_CLI_CONTRACTS = {
                  arity: "one",
                  description: "Working directory for the invocation (Phase 4 slice ζ)",
              },
+             // Phase 4 slice θ — Grok HIGH parity. `--sandbox` is freeform per
+             // `grok --help` on 0.1.210 (no `[possible values: …]` list, unlike
+             // --effort / --permission-mode / --output-format), so we register
+             // it without a `values` constraint.
+             "--sandbox": {
+                 arity: "one",
+                 description: "Sandbox profile for filesystem + network access (Phase 4 slice θ; freeform passthrough; env: GROK_SANDBOX)",
+             },
+             "--rules": {
+                 arity: "one",
+                 description: "Extra rules appended to the system prompt; supports `@file` prefix (Phase 4 slice θ)",
+             },
+             "--system-prompt-override": {
+                 arity: "one",
+                 description: "Replace the agent's system prompt entirely (Phase 4 slice θ)",
+             },
+             "--allow": {
+                 arity: "one",
+                 description: "Permission allow rule (Phase 4 slice θ; repeat once per rule per `grok --help`)",
+             },
+             "--deny": {
+                 arity: "one",
+                 description: "Permission deny rule (Phase 4 slice θ; repeat once per rule per `grok --help`)",
+             },
          },
          env: {},
          conformanceFixtures: [
@@ -467,6 +497,36 @@ export const UPSTREAM_CLI_CONTRACTS = {
                  args: ["-p", "hello", "--cwd", "/tmp/work"],
                  expect: "pass",
              },
+             {
+                 id: "grok-sandbox",
+                 description: "Phase 4 slice θ: --sandbox <PROFILE> accepted (freeform)",
+                 args: ["-p", "hello", "--sandbox", "workspace-write"],
+                 expect: "pass",
+             },
+             {
+                 id: "grok-rules",
+                 description: "Phase 4 slice θ: --rules <RULES> accepted (@file prefix preserved)",
+                 args: ["-p", "hello", "--rules", "@./rules.md"],
+                 expect: "pass",
+             },
+             {
+                 id: "grok-system-prompt-override",
+                 description: "Phase 4 slice θ: --system-prompt-override <PROMPT> accepted",
+                 args: ["-p", "hello", "--system-prompt-override", "You are a tester"],
+                 expect: "pass",
+             },
+             {
+                 id: "grok-allow-repeated",
+                 description: "Phase 4 slice θ: repeated --allow <RULE> accepted",
+                 args: ["-p", "hello", "--allow", "bash", "--allow", "edit"],
+                 expect: "pass",
+             },
+             {
+                 id: "grok-deny-repeated",
+                 description: "Phase 4 slice θ: repeated --deny <RULE> accepted",
+                 args: ["-p", "hello", "--deny", "write", "--deny", "kill"],
+                 expect: "pass",
+             },
          ],
      },
      mistral: {
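
The contract semantics registered above (`arity: "one"` consumes one value per instance, and a flag without `values` is freeform) can be sketched with a toy validator. The real `validateUpstreamCliArgs` is not shown in this diff, so `validateArgs`, its return shape, and the `--example-enum` flag below are all assumptions made for illustration.

```typescript
// Simplified, hypothetical contract check; not the gateway's validateUpstreamCliArgs.

interface FlagContract {
  arity: "one";
  values?: readonly string[]; // optional enum; absent means the value is freeform
}

const exampleFlags: Record<string, FlagContract> = {
  "-p": { arity: "one" },
  "--sandbox": { arity: "one" }, // freeform: no `values`
  "--allow": { arity: "one" },
  "--deny": { arity: "one" },
  "--example-enum": { arity: "one", values: ["a", "b"] }, // made-up flag to show an enum
};

function validateArgs(args: readonly string[]): { ok: boolean; errors: string[] } {
  const errors: string[] = [];
  let i = 0;
  while (i < args.length) {
    const flag = args[i] as string;
    const contract: FlagContract | undefined = exampleFlags[flag];
    if (contract === undefined) {
      errors.push(`unknown flag: ${flag}`);
      i += 1;
      continue;
    }
    const value: string | undefined = args[i + 1];
    if (value === undefined) {
      errors.push(`${flag} is missing its value`);
      i += 1;
      continue;
    }
    if (contract.values !== undefined && contract.values.indexOf(value) === -1) {
      errors.push(`${flag}: "${value}" is not an allowed value`);
    }
    i += 2; // arity "one": each instance consumes exactly one value; repeats are fine
  }
  return { ok: errors.length === 0, errors };
}

const repeatedAllow = validateArgs(["-p", "hello", "--allow", "bash", "--allow", "edit"]);
const customProfile = validateArgs(["-p", "hello", "--sandbox", "my-custom-profile"]);
const badEnum = validateArgs(["--example-enum", "c"]);

console.log(repeatedAllow.ok, customProfile.ok, badEnum.ok); // true true false
```

The sketch does not try to detect a value that itself looks like a flag. It only shows why repeated `--allow` instances and a non-standard sandbox profile both pass when no instance cap and no `values` list are registered.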
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "llm-cli-gateway",
-   "version": "1.12.0",
+   "version": "1.13.0",
    "mcpName": "io.github.verivus-oss/llm-cli-gateway",
    "description": "MCP server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral Vibe CLIs with session management, retry logic, async job orchestration, durable job results, and cross-LLM validation.",
    "license": "MIT",