npm - llm-cli-gateway - Versions diffs - 1.12.0 → 1.13.1 - Mend

llm-cli-gateway 1.12.0 → 1.13.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/CHANGELOG.md +167 -0
package/dist/index.d.ts +30 -0
package/dist/index.js +89 -2
package/dist/upstream-contracts.js +60 -0
package/package.json +1 -1

package/CHANGELOG.md CHANGED

@@ -2,6 +2,173 @@
 All notable changes to the llm-cli-gateway project.
+## [1.13.1] - 2026-05-27 — Installer Windows build fix (no code changes)
+Patch release. **No changes to the gateway, MCP tools, or any provider
+wiring.** npm + PyPI 1.13.1 packages are functionally identical to 1.13.0.
+### Fixed
+- `installer/build-release.sh` registered a function-scoped EXIT trap
+  that referenced a `local` variable (`staging`). When something inside
+  the function failed, `set -e` + `set -u` made the trap die with
+  `staging: unbound variable` AFTER the function had already returned
+  and its locals had gone out of scope — masking the real failure.
+- This first surfaced on the v1.13.0 release-installer.yml Windows job
+  when GitHub started redirecting `windows-latest` to the new
+  `windows-2025-vs2026` image (rollout completes 2026-06-15). Linux
+  and both macOS targets still built clean.
+- The fix lifts the staging path to a script-level `RVWR_STAGING_DIR`
+  variable, registers a single idempotent `cleanup_staging` helper
+  with `|| true` so the EXIT trap can't fail itself under `set -e`,
+  and defensively cleans up between iterations of the
+  `for target in TARGETS` loop.
+- Smoke-tested locally on linux/amd64 (`npm ci` + `cp -R` + `tar` ran
+  clean; bundle produced; staging dir cleaned up). Once this reaches
+  the new tag, release-installer.yml either succeeds (the trap bug
+  WAS the whole problem) or fails with a clearer message we can
+  chase as a follow-up patch.
+### Why a patch release for an installer-only fix
+The `release-installer.yml` workflow checks out the tag it builds for
+(`needs.resolve-tag.outputs.tag`) and re-running it against the
+existing `v1.13.0` tag would pick up the broken script. A new tag is
+the simplest way to get the fix onto CI without force-pushing
+`v1.13.0`. npm + PyPI 1.13.1 are republished as a side-effect; this
+matches the precedent of `v1.6.1` (docs-only follow-up to 1.6.0).
+## [1.13.0] - 2026-05-27 — Phase 4 slice θ (Grok HIGH parity)
+Ships the eighth Phase 4 slice: five HIGH-impact Grok CLI flags are now
+reachable from `grok_request` and `grok_request_async`. Grok was the
+most under-wired provider per the 2026-05-27 audit; this slice closes
+the HIGH-severity gap in a single bundled PR. Three commits land
+together (feature wiring, contract registration, test-veracity
+regressions) plus this release commit.
+### Added — five HIGH-impact Grok flags
+- **`sandbox`** → `--sandbox <PROFILE>`. Freeform passthrough per
+  `grok --help` on 0.1.210 (no `[possible values: …]` listing, unlike
+  `--effort` / `--permission-mode` / `--output-format` which all
+  enumerate). Also settable via the `GROK_SANDBOX` env var. Passing a
+  valid profile name is the caller's responsibility. The slice deliberately
+  does **not** integrate `--sandbox` with `approvalStrategy:
+  "mcp_managed"` because the value is unbounded — Grok's approval
+  semantics are already covered by `permissionMode` + `alwaysApprove` +
+  `approvalStrategy`.
+- **`rules`** → `--rules <RULES>`. Supports `@file` prefix per
+  `grok --help` to load from a file; the gateway passes the value
+  verbatim and lets Grok parse the prefix. Bounded via
+  `z.string().min(1)`.
+- **`systemPromptOverride`** → `--system-prompt-override <PROMPT>`.
+  Distinct from Claude's `--system-prompt` / `--append-system-prompt`
+  (Grok has only one override flag, not a pair). Bounded via
+  `z.string().min(1)`.
+- **`allow`** → `--allow <RULE>` (repeatable). Each array entry is
+  emitted as its own `--allow` argv instance per `grok --help`
+  ("Repeat to add multiple rules"). NOT comma-joined like the existing
+  `--tools` / `--disallowed-tools` Grok wiring.
+- **`deny`** → `--deny <RULE>` (repeatable). Same semantics as `allow`.
+All five flags surfaced on both `grok_request` and `grok_request_async`
+(slice δ sync+async parity invariant). Threaded from MCP-side Zod
+through `GrokRequestParams` → `handleGrokRequest` /
+`handleGrokRequestAsync` → `prepareGrokRequest` argv emission.
+### Contract surface
+`UPSTREAM_CLI_CONTRACTS.grok` updates:
+- `flags["--sandbox"]` (arity:"one"; **NO `values` enum** per live
+  `grok --help` — `--sandbox` is freeform, unlike Codex's
+  read-only/workspace-write/danger-full-access enum).
+- `flags["--rules"]` (arity:"one").
+- `flags["--system-prompt-override"]` (arity:"one").
+- `flags["--allow"]` (arity:"one"; multiple instances are accepted because
+  `arity:"one"` means "consumes one value per instance", not "max one
+  instance").
+- `flags["--deny"]` (arity:"one"; same).
+- `mcpParameters` array updated with five new entries.
+- Five new passing conformance fixtures (`grok-sandbox`, `grok-rules`,
+  `grok-system-prompt-override`, `grok-allow-repeated`,
+  `grok-deny-repeated`); each is mechanically validated against
+  `validateUpstreamCliArgs` in the REGRESSIONS Tε suite, closing the
+  fixture-existence-vs-mechanical-validation gap identified in slice ε
+  round 1.
+### Out of scope
+- **Approval-manager integration for `--sandbox`** — explicitly
+  deferred. Grok's sandbox value is freeform per the live CLI surface;
+  integrating it with the approval manager (as Codex does for its
+  bounded enum) would require either (a) hardcoding an allowlist of
+  profile names in the gateway, or (b) a different security model
+  where the caller asserts the profile is "safe enough". Neither is
+  obvious from current Grok docs. Revisit when Grok ships an enum or
+  publishes a sandbox-profile taxonomy.
+### Test-veracity audit
+Per the standing protocol
+(`feedback_test_veracity_audit_protocol`), this slice's tests were
+audited by four LLM reviewers (Codex, Grok, Mistral, Claude) in async
+parallel with mandatory mutation-probe execution against
+`docs/plans/test-veracity-audit-slice-theta.spec.md`.
+**Round 1 outcomes:**
+- Codex: UNCONDITIONAL APPROVE — all 12 probes [as predicted], all
+  26 tests VERIFIED. Baseline (`npm test`: 55 files / 884 tests; build
+  + format:check clean; slice file 31/31).
+- Grok: UNCONDITIONAL APPROVE — all 12 probes [as predicted]; ran in
+  an isolated worktree at `/tmp/theta-audit-grok` per the slice-ζ
+  reviewer-stomping lesson.
+- Mistral: UNCONDITIONAL APPROVE — all 12 probes [as predicted].
+- Claude: UNCONDITIONAL APPROVE — all 12 probes [as predicted]; noted
+  the extra Tε-2 test (custom-profile freeform regression probe) goes
+  beyond the spec and closes the "enum-mistake stays silent if fixture
+  uses a listed value" gap.
+- Gemini: **FAILED at 10s** with `TerminalQuotaError: You have
+  exhausted your capacity on this model. Your quota will reset after
+  52m10s.` (Google 429). Documented quota blocker per protocol clause
+  5+6 — counts as "concrete unfixable when documented". Four
+  substantive valid approves from independent vendor families (OpenAI,
+  xAI, Mistral, Anthropic) satisfy the gate.
+The 31 new tests (853 → 884 total) cover every new field/flag/fixture
+across REGRESSIONS Tα/β/ε:
+- **Tα** — Registered tool inputSchema for every new field on both
+  sync and async tools, including `.min(1)` empty-string rejection on
+  the three string fields (sandbox, rules, systemPromptOverride).
+- **Tβ** — `prepareGrokRequest` end-to-end argv emission per flag.
+  Explicit "repeated `--allow`/`--deny` instances, NOT comma-joined
+  like `--tools`" assertions catch the comma-join regression class. An
+  "@file prefix passes through verbatim" assertion catches a "helpful
+  preprocessor" regression. Prepare → contract end-to-end via
+  `validateUpstreamCliArgs` (REGRESSIONS D pattern; closes the slice
+  α/γ/δ contract-table gap class).
+- **Tε** — `UPSTREAM_CLI_CONTRACTS` introspection + mechanical fixture
+  validation in the same `it()` block. Explicit assertion that
+  `--sandbox` has **no `values` enum** (catches the "freeform vs enum"
+  regression that an over-zealous future contributor might introduce).
+  Extra Tε-2 probe asserts a non-standard sandbox profile passes
+  `validateUpstreamCliArgs`.
+### Mechanical anchors (verify with `rg` before relying on them)
+- `src/index.ts` — `prepareGrokRequest` signature gains five fields
+  (`:1968-1995`), emission block (`:2088-2110`), `GrokRequestParams`
+  interface (`:2819-2829`), `handleGrokRequest` threading
+  (`:2854-2858`), `handleGrokRequestAsync` threading (`:3041-3045`),
+  sync `grok_request` Zod registration (`:4890-4922`), async
+  `grok_request_async` Zod registration (`:5906-5938`).
+- `src/upstream-contracts.ts` — `grok.mcpParameters` (`:459-463`),
+  `grok.flags` entries (`:501-524`), conformance fixtures
+  (`:559-587`).
 ## [1.12.0] - 2026-05-27 — Phase 4 slice ζ (working-dir + add-dir cross-provider)
 Ships the seventh Phase 4 slice: working-directory and additional-directory

package/dist/index.d.ts CHANGED

@@ -262,6 +262,26 @@ export declare function prepareGrokRequest(params: {
      * working directory without depending on the gateway process's cwd.
      */
     workingDir?: string;
+    /**
+     * Phase 4 slice θ — Grok HIGH parity. All five are passthrough flags:
+     *
+     * - `sandbox` → `--sandbox <PROFILE>` (freeform; Grok 0.1.210 --help
+     *   shows no enum constraint, unlike --effort / --permission-mode /
+     *   --output-format which all show `[possible values: …]`).
+     * - `rules` → `--rules <RULES>`. Supports `@file` prefix; gateway
+     *   passes the value verbatim and lets Grok parse it.
+     * - `systemPromptOverride` → `--system-prompt-override <PROMPT>`.
+     *   Distinct from Claude's --system-prompt / --append-system-prompt
+     *   (Grok has only one override flag).
+     * - `allow` / `deny` → repeatable `--allow <RULE>` / `--deny <RULE>`
+     *   per --help ("Repeat to add multiple rules"). One argv pair per
+     *   entry — NOT comma-joined like --tools / --disallowed-tools.
+     */
+    sandbox?: string;
+    rules?: string;
+    systemPromptOverride?: string;
+    allow?: string[];
+    deny?: string[];
 }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
 export declare function prepareMistralRequest(params: {
     prompt?: string;
@@ -382,6 +402,16 @@ export interface GrokRequestParams {
     maxTurns?: number;
     /** Phase 4 slice ζ: emit `--cwd <DIR>` so the CLI uses the specified working directory. */
     workingDir?: string;
+    /** Phase 4 slice θ: Grok `--sandbox <PROFILE>` (freeform passthrough). */
+    sandbox?: string;
+    /** Phase 4 slice θ: Grok `--rules <RULES>` (supports `@file` prefix; verbatim passthrough). */
+    rules?: string;
+    /** Phase 4 slice θ: Grok `--system-prompt-override <PROMPT>`. */
+    systemPromptOverride?: string;
+    /** Phase 4 slice θ: Grok `--allow <RULE>` (repeatable; one entry per --allow instance). */
+    allow?: string[];
+    /** Phase 4 slice θ: Grok `--deny <RULE>` (repeatable; one entry per --deny instance). */
+    deny?: string[];
 }
 export declare function handleGrokRequest(deps: HandlerDeps, params: GrokRequestParams): Promise<ExtendedToolResponse>;
 export declare function handleGrokRequestAsync(deps: AsyncHandlerDeps, params: Omit<GrokRequestParams, "optimizeResponse">): Promise<ExtendedToolResponse>;

package/dist/index.js CHANGED

@@ -1398,6 +1398,25 @@ export function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime
     if (params.workingDir) {
         args.push("--cwd", params.workingDir);
     }
+    if (params.sandbox) {
+        args.push("--sandbox", params.sandbox);
+    }
+    if (params.rules) {
+        args.push("--rules", params.rules);
+    }
+    if (params.systemPromptOverride) {
+        args.push("--system-prompt-override", params.systemPromptOverride);
+    }
+    if (params.allow && params.allow.length > 0) {
+        for (const rule of params.allow) {
+            args.push("--allow", rule);
+        }
+    }
+    if (params.deny && params.deny.length > 0) {
+        for (const rule of params.deny) {
+            args.push("--deny", rule);
+        }
+    }
     return {
         corrId,
         effectivePrompt,
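The emission block above can be restated as a small standalone function, which makes the repeated-instance rule easy to test in isolation. This is a sketch, not the package's code: `ThetaFlags`, `emitThetaArgs`, and `joinedToolsArgs` are hypothetical names, and `joinedToolsArgs` only illustrates the comma-joined style the changelog attributes to the existing `--tools` wiring (the exact separator used there is an assumption).

```typescript
// Hypothetical subset of prepareGrokRequest's params: slice θ fields only.
interface ThetaFlags {
  sandbox?: string;
  rules?: string;
  systemPromptOverride?: string;
  allow?: string[];
  deny?: string[];
}

// Mirrors the emission order in dist/index.js: the three scalar flags first,
// then one `--allow` / `--deny` argv pair per array entry.
function emitThetaArgs(p: ThetaFlags): string[] {
  const args: string[] = [];
  if (p.sandbox) args.push("--sandbox", p.sandbox);
  if (p.rules) args.push("--rules", p.rules); // an "@file" value is passed through verbatim
  if (p.systemPromptOverride) args.push("--system-prompt-override", p.systemPromptOverride);
  for (const rule of p.allow ?? []) args.push("--allow", rule);
  for (const rule of p.deny ?? []) args.push("--deny", rule);
  return args;
}

// Illustration only: the comma-joined style (one flag, one joined value).
function joinedToolsArgs(tools: string[]): string[] {
  return tools.length > 0 ? ["--tools", tools.join(",")] : [];
}
```

For example, `emitThetaArgs({ allow: ["bash", "edit"] })` yields `["--allow", "bash", "--allow", "edit"]`, whereas the joined style would produce a single `--tools` flag with `"bash,edit"`. Empty strings on the scalar fields are skipped by the truthiness checks, as in the original.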
@@ -1884,6 +1903,11 @@ export async function handleGrokRequest(deps, params) {
         operation: "grok_request",
         maxTurns: params.maxTurns,
         workingDir: params.workingDir,
+        sandbox: params.sandbox,
+        rules: params.rules,
+        systemPromptOverride: params.systemPromptOverride,
+        allow: params.allow,
+        deny: params.deny,
     }, runtime);
     if (!("args" in prep))
         return prep;
@@ -2006,6 +2030,11 @@ export async function handleGrokRequestAsync(deps, params) {
         operation: "grok_request_async",
         maxTurns: params.maxTurns,
         workingDir: params.workingDir,
+        sandbox: params.sandbox,
+        rules: params.rules,
+        systemPromptOverride: params.systemPromptOverride,
+        allow: params.allow,
+        deny: params.deny,
     }, runtime);
     if (!("args" in prep))
         return prep;
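Both handlers above thread the same five fields by hand, so the sync/async parity invariant depends on the two call sites staying in step. One way to make drift harder is to pick the fields from a single key list. This is a sketch that assumes a shared helper would suit this codebase; `THETA_KEYS`, `ThetaFields`, and `pickTheta` are hypothetical and do not exist in the package.

```typescript
// Hypothetical single source of truth for the slice θ field names.
const THETA_KEYS = ["sandbox", "rules", "systemPromptOverride", "allow", "deny"] as const;

type ThetaFields = {
  sandbox?: string;
  rules?: string;
  systemPromptOverride?: string;
  allow?: string[];
  deny?: string[];
};

// Copies only the slice θ fields that are set, ignoring every other request
// field, so both handlers could spread the same object into prepareGrokRequest.
function pickTheta<T extends ThetaFields>(params: T): ThetaFields {
  const out: ThetaFields = {};
  for (const key of THETA_KEYS) {
    if (params[key] !== undefined) {
      Object.assign(out, { [key]: params[key] });
    }
  }
  return out;
}
```

Adding a sixth field would then mean editing `THETA_KEYS` once instead of two call sites.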
@@ -3262,7 +3291,31 @@ export function createGatewayServer(deps = {}) {
             .min(1)
             .optional()
             .describe("Grok --cwd <DIR>: working directory for this invocation. Lets headless callers run Grok against a directory other than the gateway process's cwd."),
-    }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, workingDir, }) => {
+        // Phase 4 slice θ — Grok HIGH parity (sandbox, rules, system-prompt-override, allow, deny).
+        sandbox: z
+            .string()
+            .min(1)
+            .optional()
+            .describe("Grok --sandbox <PROFILE>: sandbox profile for filesystem and network access. Freeform per `grok --help` (no enum constraint on Grok 0.1.210); also settable via GROK_SANDBOX env var. Caller responsibility to pass a valid profile name."),
+        rules: z
+            .string()
+            .min(1)
+            .optional()
+            .describe("Grok --rules <RULES>: extra rules to append to the system prompt. Supports `@file` prefix per `grok --help` to load from a file; gateway passes the value verbatim and lets Grok parse the prefix."),
+        systemPromptOverride: z
+            .string()
+            .min(1)
+            .optional()
+            .describe("Grok --system-prompt-override <PROMPT>: replace the agent's system prompt entirely. Distinct from Claude's --system-prompt / --append-system-prompt (Grok has only one override flag, not a pair)."),
+        allow: z
+            .array(z.string())
+            .optional()
+            .describe('Grok --allow <RULE>: permission allow rules. Each entry is emitted as its own --allow instance (per `grok --help`: "Repeat to add multiple rules").'),
+        deny: z
+            .array(z.string())
+            .optional()
+            .describe('Grok --deny <RULE>: permission deny rules. Each entry is emitted as its own --deny instance (per `grok --help`: "Repeat to add multiple rules").'),
+    }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, workingDir, sandbox, rules, systemPromptOverride, allow, deny, }) => {
         return handleGrokRequest({ sessionManager, logger, runtime }, {
             prompt,
             promptParts,
@@ -3287,6 +3340,11 @@ export function createGatewayServer(deps = {}) {
             forceRefresh,
             maxTurns,
             workingDir,
+            sandbox,
+            rules,
+            systemPromptOverride,
+            allow,
+            deny,
         });
     });
     //──────────────────────────────────────────────────────────────────────────────
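The schema above rejects empty strings for the three scalar fields (`.min(1)`) but uses plain `z.array(z.string())` for `allow` and `deny`, so an empty rule inside the array is accepted by the schema and would reach argv as `--allow` followed by an empty string. The dependency-free sketch below restates those constraints to make the asymmetry concrete. `checkThetaInput` is a hypothetical helper, not part of the package, and it deliberately copies the schema's current behavior rather than proposing a change.

```typescript
type CheckResult = { ok: true } | { ok: false; error: string };

// Hypothetical restatement of the slice θ constraints in grok_request:
// - sandbox / rules / systemPromptOverride: optional string, length >= 1
// - allow / deny: optional array of strings, no per-entry length check
function checkThetaInput(input: Record<string, unknown>): CheckResult {
  for (const key of ["sandbox", "rules", "systemPromptOverride"]) {
    const v = input[key];
    if (v === undefined) continue;
    if (typeof v !== "string" || v.length < 1) {
      return { ok: false, error: `${key} must be a non-empty string` };
    }
  }
  for (const key of ["allow", "deny"]) {
    const v = input[key];
    if (v === undefined) continue;
    if (!Array.isArray(v) || !v.every((e) => typeof e === "string")) {
      return { ok: false, error: `${key} must be an array of strings` };
    }
  }
  return { ok: true };
}
```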
@@ -3931,7 +3989,31 @@ export function createGatewayServer(deps = {}) {
                 .min(1)
                 .optional()
                 .describe("Grok --cwd <DIR>: working directory for this invocation. Lets headless callers run Grok against a directory other than the gateway process's cwd."),
-        }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, workingDir, }) => {
+            // Phase 4 slice θ — Grok HIGH parity (sandbox, rules, system-prompt-override, allow, deny).
+            sandbox: z
+                .string()
+                .min(1)
+                .optional()
+                .describe("Grok --sandbox <PROFILE>: sandbox profile for filesystem and network access. Freeform per `grok --help` (no enum constraint); also settable via GROK_SANDBOX env var."),
+            rules: z
+                .string()
+                .min(1)
+                .optional()
+                .describe("Grok --rules <RULES>: extra rules to append to the system prompt. Supports `@file` prefix; gateway passes the value verbatim."),
+            systemPromptOverride: z
+                .string()
+                .min(1)
+                .optional()
+                .describe("Grok --system-prompt-override <PROMPT>: replace the agent's system prompt entirely."),
+            allow: z
+                .array(z.string())
+                .optional()
+                .describe("Grok --allow <RULE>: permission allow rules. Each entry → its own --allow instance."),
+            deny: z
+                .array(z.string())
+                .optional()
+                .describe("Grok --deny <RULE>: permission deny rules. Each entry → its own --deny instance."),
+        }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, workingDir, sandbox, rules, systemPromptOverride, allow, deny, }) => {
             return handleGrokRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
                 prompt,
                 promptParts,
@@ -3955,6 +4037,11 @@ export function createGatewayServer(deps = {}) {
                 forceRefresh,
                 maxTurns,
                 workingDir,
+                sandbox,
+                rules,
+                systemPromptOverride,
+                allow,
+                deny,
             });
         });
         server.tool("mistral_request_async", {

package/dist/upstream-contracts.js CHANGED

@@ -401,6 +401,12 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "maxTurns",
             // Phase 4 slice ζ
             "workingDir",
+            // Phase 4 slice θ — Grok HIGH parity
+            "sandbox",
+            "rules",
+            "systemPromptOverride",
+            "allow",
+            "deny",
         ],
         flags: {
             "-p": { arity: "one", description: "Prompt text" },
@@ -434,6 +440,30 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 arity: "one",
                 description: "Working directory for the invocation (Phase 4 slice ζ)",
             },
+            // Phase 4 slice θ — Grok HIGH parity. `--sandbox` is freeform per
+            // `grok --help` on 0.1.210 (no `[possible values: …]` list, unlike
+            // --effort / --permission-mode / --output-format), so we register
+            // it without a `values` constraint.
+            "--sandbox": {
+                arity: "one",
+                description: "Sandbox profile for filesystem + network access (Phase 4 slice θ; freeform passthrough; env: GROK_SANDBOX)",
+            },
+            "--rules": {
+                arity: "one",
+                description: "Extra rules appended to the system prompt; supports `@file` prefix (Phase 4 slice θ)",
+            },
+            "--system-prompt-override": {
+                arity: "one",
+                description: "Replace the agent's system prompt entirely (Phase 4 slice θ)",
+            },
+            "--allow": {
+                arity: "one",
+                description: "Permission allow rule (Phase 4 slice θ; repeat once per rule per `grok --help`)",
+            },
+            "--deny": {
+                arity: "one",
+                description: "Permission deny rule (Phase 4 slice θ; repeat once per rule per `grok --help`)",
+            },
         },
         env: {},
         conformanceFixtures: [
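Each new `mcpParameters` entry in the first hunk pairs with a flag registered in the second. A drift check along the following lines would catch a parameter listed without its flag, or the reverse. It is a sketch with hypothetical names (`THETA_PARAM_TO_FLAG`, `contractDrift`) over a reduced copy of the data in this diff; the real contract objects carry more fields than are modeled here.

```typescript
// Reduced, hypothetical model of the grok contract after slice θ.
const grokMcpParameters: string[] = [
  "maxTurns", "workingDir",
  "sandbox", "rules", "systemPromptOverride", "allow", "deny",
];
const grokFlagsSubset: Record<string, { arity: string }> = {
  "-p": { arity: "one" },
  "--cwd": { arity: "one" },
  "--sandbox": { arity: "one" },
  "--rules": { arity: "one" },
  "--system-prompt-override": { arity: "one" },
  "--allow": { arity: "one" },
  "--deny": { arity: "one" },
};

// MCP parameter -> CLI flag, as documented in the slice θ changelog.
const THETA_PARAM_TO_FLAG: Record<string, string> = {
  sandbox: "--sandbox",
  rules: "--rules",
  systemPromptOverride: "--system-prompt-override",
  allow: "--allow",
  deny: "--deny",
};

// Reports mapped parameters missing from mcpParameters and mapped flags
// missing from the flags table.
function contractDrift(
  params: string[],
  flags: Record<string, unknown>,
  mapping: Record<string, string>,
): { unlistedParams: string[]; unregisteredFlags: string[] } {
  const unlistedParams = Object.keys(mapping).filter((p) => !params.includes(p));
  const unregisteredFlags = Object.values(mapping).filter((flag) => !(flag in flags));
  return { unlistedParams, unregisteredFlags };
}
```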
@@ -467,6 +497,36 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 args: ["-p", "hello", "--cwd", "/tmp/work"],
                 expect: "pass",
             },
+            {
+                id: "grok-sandbox",
+                description: "Phase 4 slice θ: --sandbox <PROFILE> accepted (freeform)",
+                args: ["-p", "hello", "--sandbox", "workspace-write"],
+                expect: "pass",
+            },
+            {
+                id: "grok-rules",
+                description: "Phase 4 slice θ: --rules <RULES> accepted (@file prefix preserved)",
+                args: ["-p", "hello", "--rules", "@./rules.md"],
+                expect: "pass",
+            },
+            {
+                id: "grok-system-prompt-override",
+                description: "Phase 4 slice θ: --system-prompt-override <PROMPT> accepted",
+                args: ["-p", "hello", "--system-prompt-override", "You are a tester"],
+                expect: "pass",
+            },
+            {
+                id: "grok-allow-repeated",
+                description: "Phase 4 slice θ: repeated --allow <RULE> accepted",
+                args: ["-p", "hello", "--allow", "bash", "--allow", "edit"],
+                expect: "pass",
+            },
+            {
+                id: "grok-deny-repeated",
+                description: "Phase 4 slice θ: repeated --deny <RULE> accepted",
+                args: ["-p", "hello", "--deny", "write", "--deny", "kill"],
+                expect: "pass",
+            },
         ],
     },
     mistral: {
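The conformance fixtures above are checked mechanically against `validateUpstreamCliArgs`. That function's implementation is not part of this diff, so what follows is only a simplified model of the semantics the changelog describes: `arity: "one"` consumes one value per occurrence and places no cap on the number of occurrences, and a flag without a `values` list accepts any value. `FlagSpec`, `validateArgs`, `failingFixtures`, and the `--example-enum` flag used in the test are hypothetical.

```typescript
type FlagSpec = { arity: "one"; values?: readonly string[] };
type Fixture = { id: string; args: string[]; expect: "pass" | "fail" };

// Walks argv left to right. Each occurrence of an "one"-arity flag consumes
// exactly the next token, so a flag may repeat; `values`, when present, is an
// allowlist. Returns null on success, or an error message.
function validateArgs(args: string[], flags: Record<string, FlagSpec>): string | null {
  for (let i = 0; i < args.length; i++) {
    const spec = flags[args[i]];
    if (!spec) return `unknown flag ${args[i]}`;
    const value = args[i + 1];
    if (value === undefined) return `${args[i]} expects a value`;
    if (spec.values && !spec.values.includes(value)) {
      return `${args[i]}: ${value} not in [${spec.values.join(", ")}]`;
    }
    i++; // skip the consumed value
  }
  return null;
}

const grokFlags: Record<string, FlagSpec> = {
  "-p": { arity: "one" },
  "--sandbox": { arity: "one" }, // no `values`: freeform
  "--rules": { arity: "one" },
  "--system-prompt-override": { arity: "one" },
  "--allow": { arity: "one" },
  "--deny": { arity: "one" },
};

// The five slice θ fixtures, copied from the diff above.
const fixtures: Fixture[] = [
  { id: "grok-sandbox", args: ["-p", "hello", "--sandbox", "workspace-write"], expect: "pass" },
  { id: "grok-rules", args: ["-p", "hello", "--rules", "@./rules.md"], expect: "pass" },
  { id: "grok-system-prompt-override", args: ["-p", "hello", "--system-prompt-override", "You are a tester"], expect: "pass" },
  { id: "grok-allow-repeated", args: ["-p", "hello", "--allow", "bash", "--allow", "edit"], expect: "pass" },
  { id: "grok-deny-repeated", args: ["-p", "hello", "--deny", "write", "--deny", "kill"], expect: "pass" },
];

// Returns the ids of fixtures whose outcome disagrees with `expect`.
function failingFixtures(list: Fixture[], flags: Record<string, FlagSpec>): string[] {
  return list
    .filter((f) => (validateArgs(f.args, flags) === null) !== (f.expect === "pass"))
    .map((f) => f.id);
}
```

Because `--sandbox` carries no `values` list, a non-standard profile name passes this model too, which is the behavior the extra Tε-2 probe guards.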

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "llm-cli-gateway",
-  "version": "1.12.0",
+  "version": "1.13.1",
   "mcpName": "io.github.verivus-oss/llm-cli-gateway",
   "description": "MCP server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral Vibe CLIs with session management, retry logic, async job orchestration, durable job results, and cross-LLM validation.",
   "license": "MIT",