llm-cli-gateway 2.0.0 → 2.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/CHANGELOG.md +65 -0
package/README.md +138 -14
package/dist/index.d.ts +3 -0
package/dist/index.js +92 -51
package/dist/upstream-contracts.d.ts +10 -0
package/dist/upstream-contracts.js +116 -6
package/dist/validation-tools.js +10 -10
package/package.json +1 -1

package/CHANGELOG.md CHANGED

@@ -4,6 +4,71 @@ All notable changes to the llm-cli-gateway project.
 ## Unreleased
+## [2.2.0] - 2026-06-07: MCP tool-surface usability — self-describing tools
+### Added
+- MCP tool-surface usability (4-seat cross-LLM review): all 37 tools now carry
+  action descriptions (previously none had tool-level descriptions — clients
+  that rank, search, or defer tools by description saw bare names); sync
+  `*_request` descriptions state the prompt/promptParts exactly-one rule and
+  conditional deferral; `job_status`/`job_result` vs `llm_job_*` and the
+  local-only `compare_answers` are disambiguated; session/`sessionId`
+  describes gain per-provider resume semantics parity.
+### Fixed
+- Codex gateway-bookkeeping sessions are now created with the reserved `gw-`
+  prefix (4 sites), so resuming a gateway ID fails fast with an actionable
+  error instead of reaching `codex exec resume` and dying with "no rollout
+  found" (root cause of real-world resume failures).
+- Server instructions are now built per-server from the same derived gate as
+  tool registration (backend, asyncJobsEnabled, hasStore()), so a
+  `backend = "none"` gateway no longer advertises unregistered
+  `*_request_async`/`llm_job_*` tools.
+- Sync auto-deferral is disabled when async jobs are unavailable — previously
+  a request could defer into an in-memory job whose polling tools were not
+  registered (dead-end jobId).
+## [2.1.0] - 2026-06-07: Grok Build 0.2.32, probe drift acknowledgement, docs currency
+### Added
+- Grok Build 0.2.32 support: new `leaderSocket` parameter on `grok_request` /
+  `grok_request_async` maps to the new `--leader-socket <PATH>` flag (isolated
+  leader process for local/branch Grok builds; default `~/.grok/leader.sock`).
+  Contract declares the flag with arity-one validation plus conformance
+  fixtures. The release's other changes (plugin slash commands in all
+  conversations, ordered rapid prompt submissions, faster grep on large
+  repos) are CLI-internal and inherited automatically. Probe at 0.2.32:
+  missingFlags/warnings clean.
+### Fixed
+- Upstream-contract probe drift after the 2026-06 provider CLI upgrades
+  (gemini 0.45.2, grok 0.2.22, vibe 2.14.0): `CliFlagContract.hiddenFromHelp`
+  marks real flags hidden from a binary's `--help` (Claude `--max-turns`), and
+  `CliContract.acknowledgedUpstreamFlags` acknowledges upstream-only flags the
+  gateway never emits (29 Claude, 18 Gemini). Both are probe-only — the argv
+  allowlist is unchanged — with stale-marker warnings in both directions and a
+  new `acknowledgedExtraFlags` probe field. New pure `computeFlagDrift` plus
+  7 unit tests.
+- MCP server version now reports the real package version (was hardcoded
+  `1.0.0`).
+### Documentation
+- Cross-LLM documentation currency review (Codex + Gemini + Grok + Mistral):
+  README tool reference gains `codex_fork_session`, `llm_request_result`,
+  `llm_process_health`, `upstream_contracts`, and `list_available_models`;
+  `claude_request` parameter list completed (`outputFormat` default is
+  `stream-json`); Codex `fullAuto` documented as deprecated in favour of
+  `sandboxMode`; Gemini approval modes include `plan`; grok/mistral upgrade
+  strategies documented; stale test counts, provider lists, and
+  `BEST_PRACTICES.md` path pointers corrected across README, AGENTS.md,
+  .cursorrules, CLAUDE.md, docs/guides, docs/personal-mcp (Mistral/Vibe row
+  added to the provider support matrix), and docs/upstream.
 ## [2.0.0] - 2026-06-04: node:sqlite migration — native module out of the prod graph
 Major release. Persistence moves from the native `better-sqlite3` binding to
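
The 2.1.0 entry above describes a pure `computeFlagDrift` helper whose source is not part of this diff. The following is only a minimal sketch of how such a comparison could work, assuming the probe reduces a binary's `--help` output to a list of flag names. The function name `sketchFlagDrift`, the `FlagSpec` and `DriftReport` shapes, the sample flag names `--upstream-only` and `--brand-new`, and the exact reading of "stale-marker warnings in both directions" are all assumptions made for illustration, not the package's API.

```typescript
// Hypothetical shapes for illustration only; the package's real types may differ.
interface FlagSpec {
  name: string;
  hiddenFromHelp?: boolean; // a real flag that the binary's --help does not list
}

interface DriftReport {
  missingFlags: string[];
  extraFlags: string[];
  acknowledgedExtraFlags: string[];
  warnings: string[];
}

function sketchFlagDrift(
  declared: FlagSpec[],
  acknowledgedUpstream: string[],
  discovered: string[],
): DriftReport {
  const seen = new Set(discovered);
  const declaredNames = new Set(declared.map((f) => f.name));
  const acknowledged = new Set(acknowledgedUpstream);
  const missingFlags: string[] = [];
  const warnings: string[] = [];

  for (const flag of declared) {
    if (flag.hiddenFromHelp) {
      // Assumed stale-marker case: the flag is marked hidden but --help lists it.
      if (seen.has(flag.name)) {
        warnings.push(`${flag.name}: hiddenFromHelp marker looks stale`);
      }
    } else if (!seen.has(flag.name)) {
      // Declared, expected in --help, but not advertised by the binary.
      missingFlags.push(flag.name);
    }
  }

  const extraFlags: string[] = [];
  const acknowledgedExtraFlags: string[] = [];
  for (const name of discovered) {
    if (declaredNames.has(name)) continue;
    // Known upstream-only flags are reported separately from real drift.
    (acknowledged.has(name) ? acknowledgedExtraFlags : extraFlags).push(name);
  }

  // Assumed stale-marker case in the other direction: an acknowledged flag
  // that the binary no longer advertises.
  for (const name of acknowledgedUpstream) {
    if (!seen.has(name)) {
      warnings.push(`${name}: acknowledged upstream flag no longer advertised`);
    }
  }

  return { missingFlags, extraFlags, acknowledgedExtraFlags, warnings };
}

// Sample data is made up; only --max-turns and --leader-socket appear in the changelog.
const report = sketchFlagDrift(
  [
    { name: "--model" },
    { name: "--max-turns", hiddenFromHelp: true },
    { name: "--leader-socket" },
  ],
  ["--upstream-only"],
  ["--model", "--upstream-only", "--brand-new"],
);
console.log(report);
```

In this sketch both marker lists only affect what the probe reports. Nothing here touches the set of flags the gateway is allowed to emit, which matches the changelog's note that the argv allowlist is unchanged.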

package/README.md CHANGED

@@ -205,7 +205,7 @@ Opt-in flags (all default off) live under `[cache_awareness]` in `~/.llm-cli-gat
 ### Security & Quality
-- **Comprehensive Testing**: 900+ tests covering unit, integration, and regression scenarios with real CLI execution
+- **Comprehensive Testing**: 1,000+ tests covering unit, integration, and regression scenarios with real CLI execution
 - **Input Validation**: Zod schemas prevent injection attacks
 - **No Secret Leakage**: Generic session descriptions only (file permissions 0o600)
 - **No ReDoS**: Bounded regex patterns prevent catastrophic backtracking
@@ -344,6 +344,7 @@ The personal-appliance surface exposes simplified validation tools for non-devel
 - `consensus_check`: check whether providers agree with a claim.
 - `ask_model`: ask one provider through the simplified surface.
 - `synthesize_validation`: run an explicit judge model after provider results have been collected.
+- `list_available_models`: list the models each provider CLI exposes through the simplified surface.
 - `job_status` and `job_result`: poll and collect validation job outputs.
 The validation report preserves per-provider disagreement. Optional judge synthesis is explicit about which provider produced the judge job.
@@ -356,15 +357,29 @@ Execute a Claude Code request with optional session management.
 **Parameters:**
-- `prompt` (string, required): The prompt to send (1-100,000 chars)
+- `prompt` (string, optional*): The prompt to send (1-100,000 chars). *Exactly one of `prompt` or `promptParts` is required (mutually exclusive)
 - `model` (string, optional): Model name or alias (use `list_models` for available values; supports `latest`)
-- `outputFormat` (string, optional): Output format ("text" or "json"), default: "text"
+- `outputFormat` (string, optional): Output format (`text|json|stream-json`), default: `stream-json` — the gateway parses NDJSON usage events for token/cost observability; override to `text` only when you want unparsed stdout
 - `sessionId` (string, optional): Specific session ID to use
 - `continueSession` (boolean, optional): Continue the active session
 - `createNewSession` (boolean, optional): Always create a new session
+- `forkSession` (boolean, optional): Fork the resumed session instead of appending to it
 - `allowedTools` (string[], optional): Restrict Claude tools to this allow-list
 - `disallowedTools` (string[], optional): Explicitly deny listed Claude tools
-- `dangerouslySkipPermissions` (boolean, optional): Request CLI-side permission bypass (legacy mode only)
+- `permissionMode` (string, optional): Claude permission mode (`default|acceptEdits|plan|auto|dontAsk|bypassPermissions`); preferred over `dangerouslySkipPermissions`
+- `dangerouslySkipPermissions` (boolean, optional): Deprecated — maps to `permissionMode: "bypassPermissions"`; `permissionMode` wins when both are set
+- `agent` (string, optional): Named sub-agent to run as
+- `agents` (string, optional): Inline agent definitions JSON
+- `systemPrompt` / `appendSystemPrompt` (string, optional): Replace or extend the system prompt
+- `maxBudgetUsd` (number, optional): Budget cap in USD for the request
+- `maxTurns` (integer, optional): Agent-loop turn cap
+- `effort` (string, optional): Reasoning effort (`low|medium|high|xhigh|max`)
+- `fallbackModel` (string, optional): Auto-fallback model when the default is overloaded
+- `jsonSchema` (string, optional): JSON Schema literal constraining structured output
+- `addDir` (string[], optional): Additional workspace directories
+- `noSessionPersistence` (boolean, optional): Ephemeral session (not persisted to disk)
+- `settingSources` / `settings` / `tools` (optional): Setting sources to load, settings JSON path/literal, built-in tool restriction
+- `excludeDynamicSystemPromptSections` (boolean, optional): Trim dynamic system prompt sections
 - `approvalStrategy` (string, optional): `"legacy"` (default) or `"mcp_managed"`
 - `approvalPolicy` (string, optional): `"strict"`, `"balanced"`, or `"permissive"`
 - `mcpServers` (string[], optional): Claude MCP servers to expose (default: `["sqry","exa","ref_tools"]`; `"trstr"` available as opt-in)
@@ -372,6 +387,10 @@ Execute a Claude Code request with optional session management.
 - `optimizePrompt` (boolean, optional): Optimize prompt for token efficiency (44% reduction), default: false
 - `optimizeResponse` (boolean, optional): Optimize response for token efficiency (37% reduction), default: false
 - `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+- `idleTimeoutMs` (integer, optional): Kill a stuck process after output inactivity; 30,000 to 3,600,000 ms
+- `worktree` (boolean|object, optional): Run inside a gateway-owned git worktree (slice λ)
+- `promptParts` (object, optional): Cache-aware structured prompt `{ system?, tools?, context?, task }`; mutually exclusive with `prompt`
+- `forceRefresh` (boolean, optional): Bypass dedup and force a fresh CLI run, default: false
 **Response extras:**
@@ -396,19 +415,33 @@ Execute a Codex request with optional session tracking.
 **Parameters:**
-- `prompt` (string, required): The prompt to send (1-100,000 chars)
-- `model` (string, optional): Model name or alias (use `list_models` for available values; supports `latest`, recommended: `gpt-5.4`)
-- `fullAuto` (boolean, optional): Enable full-auto mode, default: false
+- `prompt` (string, optional*): The prompt to send (1-100,000 chars). *Exactly one of `prompt` or `promptParts` is required (mutually exclusive)
+- `model` (string, optional): Model name or alias (use `list_models` for available values; supports `latest`, recommended: `gpt-5.5`)
+- `fullAuto` (boolean, optional): Deprecated — expands to `--sandbox workspace-write` only (current Codex no longer accepts approval-policy flags); prefer `sandboxMode`
+- `sandboxMode` (string, optional): Codex sandbox (`read-only|workspace-write|danger-full-access`)
 - `dangerouslyBypassApprovalsAndSandbox` (boolean, optional): Request Codex bypass flags
 - `approvalStrategy` (string, optional): `"legacy"` (default) or `"mcp_managed"`
 - `approvalPolicy` (string, optional): `"strict"`, `"balanced"`, or `"permissive"`
 - `mcpServers` (string[], optional): MCP servers expected for Codex execution context
 - `sessionId` (string, optional): Session identifier for tracking
+- `resumeLatest` (boolean, optional): Resume the most recent Codex session in the current cwd (`codex exec resume --last`); ignored if `sessionId` is set
 - `createNewSession` (boolean, optional): Always create a new session
+- `forceRefresh` (boolean, optional): Bypass dedup and force a fresh CLI run, default: false
+- `outputFormat` (string, optional): `text` (default) or `json` (`--json` JSONL events for token usage extraction)
+- `outputSchema` (string|object, optional): Codex `--output-schema` — path or inline JSON Schema
+- `workingDir` (string, optional): Working root for this session (`-C`/`--cd`; new sessions only)
+- `addDir` (string[], optional): Additional writable workspace directories (one `--add-dir` per entry; new sessions only)
+- `ephemeral` (boolean, optional): Codex `--ephemeral` (no session persistence)
+- `images` (string[], optional): Image attachments (one `-i <path>` per entry)
+- `profile` (string, optional): Codex `--profile <name>` (new sessions only; ignored with a logged warning on resume)
+- `configOverrides` (object, optional): Codex `-c key=value` overrides
+- `ignoreRules` / `ignoreUserConfig` (boolean, optional): Codex `--ignore-rules` / `--ignore-user-config`
+- `worktree` (boolean|object, optional): Run inside a gateway-owned git worktree (slice λ)
+- `promptParts` (object, optional): Cache-aware structured prompt `{ system?, tools?, context?, task }`; mutually exclusive with `prompt`
 - `optimizePrompt` (boolean, optional): Optimize prompt for token efficiency, default: false
 - `optimizeResponse` (boolean, optional): Optimize response for token efficiency, default: false
 - `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
-- `idleTimeoutMs` (number, optional): Kill a stuck Codex process after output inactivity; 30,000 to 3,600,000 ms
+- `idleTimeoutMs` (integer, optional): Kill a stuck Codex process after output inactivity; 30,000 to 3,600,000 ms
 **Response extras:**
@@ -420,32 +453,56 @@ Execute a Codex request with optional session tracking.
 ```json
 {
   "prompt": "Create a REST API endpoint",
-  "model": "gpt-5.4",
-  "fullAuto": true,
+  "model": "gpt-5.5",
+  "sandboxMode": "workspace-write",
   "optimizePrompt": true
 }
 ```
+##### `codex_fork_session`
+Fork an existing Codex session into a new branch (`codex fork <SESSION_ID|--last> <prompt>`), preserving the original session's history while the fork diverges.
+**Parameters:**
+- `prompt` (string, required): Prompt text for the forked session (1-100,000 chars)
+- `sessionId` (string, optional): Codex session UUID to fork from (mutually exclusive with `forkLast`)
+- `forkLast` (boolean, optional): Fork the most recent Codex session instead of naming one
+- `model` (string, optional): Model name or alias (e.g. `gpt-5.5`, `latest`)
+- `sandboxMode` (string, optional): Codex sandbox (`read-only|workspace-write|danger-full-access`)
+- `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+- `idleTimeoutMs` (number, optional): Idle timeout in ms (30s-1h, omit for CLI default)
 ##### `gemini_request`
 Execute a Gemini CLI request with session support.
 **Parameters:**
-- `prompt` (string, required): The prompt to send (1-100,000 chars)
+- `prompt` (string, optional*): The prompt to send (1-100,000 chars). *Exactly one of `prompt` or `promptParts` is required (mutually exclusive)
 - `model` (string, optional): Model name or alias (use `list_models` for available values; supports `latest`, `pro`, `flash`)
 - `sessionId` (string, optional): Session ID to resume
 - `resumeLatest` (boolean, optional): Resume the latest session automatically
 - `createNewSession` (boolean, optional): Always create a new session
-- `approvalMode` (string, optional): Gemini approval mode (`default|auto_edit|yolo`) in legacy mode
+- `approvalMode` (string, optional): Gemini approval mode (`default|auto_edit|yolo|plan`) in legacy mode
 - `approvalStrategy` (string, optional): `"legacy"` (default) or `"mcp_managed"`
 - `approvalPolicy` (string, optional): `"strict"`, `"balanced"`, or `"permissive"`
 - `mcpServers` (string[], optional): Allowed Gemini MCP server names
 - `allowedTools` (string[], optional): Restrict Gemini tools to this allow-list
 - `includeDirs` (string[], optional): Additional workspace directories for Gemini
+- `outputFormat` (string, optional): `text` (default), `json` (`-o json`), or `stream-json` (`-o stream-json`, NDJSON with usage extraction)
+- `sandbox` (boolean, optional): Run Gemini in sandbox mode (`-s`)
+- `policyFiles` / `adminPolicyFiles` (string[], optional): Policy / admin-policy file paths (one `--policy`/`--admin-policy` per file; paths must exist)
+- `attachments` (string[], optional): Absolute file paths prepended as `@<path>` tokens to the prompt
+- `skipTrust` (boolean, optional): Emit `--skip-trust` to trust the workspace for this session (required for headless runs in fresh workspaces)
+- `yolo` (boolean, optional): Auto-approve all; equivalent to `approvalMode: "yolo"`. Emits `--yolo` only when `--approval-mode yolo` is not already being emitted (never both)
+- `worktree` (boolean|object, optional): Run inside a gateway-owned git worktree (slice λ)
+- `promptParts` (object, optional): Cache-aware structured prompt `{ system?, tools?, context?, task }`; mutually exclusive with `prompt`
 - `optimizePrompt` (boolean, optional): Optimize prompt for token efficiency, default: false
 - `optimizeResponse` (boolean, optional): Optimize response for token efficiency, default: false
 - `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+- `idleTimeoutMs` (integer, optional): Kill a stuck process after output inactivity; 30,000 to 3,600,000 ms
+- `forceRefresh` (boolean, optional): Bypass dedup and force a fresh CLI run, default: false
 **Response extras:**
@@ -469,7 +526,7 @@ Execute a Grok CLI (xAI) request with session support.
 **Parameters:**
-- `prompt` (string, required): The prompt to send (1-100,000 chars)
+- `prompt` (string, optional*): The prompt to send (1-100,000 chars). *Exactly one of `prompt` or `promptParts` is required (mutually exclusive)
 - `model` (string, optional): Model name or alias (e.g. `grok-build`, `latest`)
 - `outputFormat` (string, optional): `"plain"` (default), `"json"`, or `"streaming-json"`
 - `sessionId` (string, optional): Session ID to resume (`--resume <id>`)
@@ -484,9 +541,35 @@ Execute a Grok CLI (xAI) request with session support.
 - `mcpServers` (string[], optional): MCP server names tracked for approvals (Grok manages its own MCP config via `grok mcp`)
 - `allowedTools` (string[], optional): Allowed built-in tools (passed as `--tools` comma list)
 - `disallowedTools` (string[], optional): Disallowed built-in tools (passed as `--disallowed-tools` comma list)
+- `maxTurns` (integer, optional): Agent-loop iteration cap (`--max-turns`)
+- `workingDir` (string, optional): Working directory for this invocation (`--cwd`)
+- `sandbox` (string, optional): Sandbox profile for filesystem/network access (`--sandbox`, freeform; also via `GROK_SANDBOX`)
+- `rules` (string, optional): Extra rules appended to the system prompt (`--rules`; supports `@file` prefix)
+- `systemPromptOverride` (string, optional): Replace the agent's system prompt entirely
+- `allow` / `deny` (string[], optional): Permission allow/deny rules (one `--allow`/`--deny` per entry)
+- `compactionMode` (string, optional): `summary` (default) `|transcript|segments`
+- `compactionDetail` (string, optional): `none|minimal|balanced|verbose` (segments mode only)
+- `agent` (string, optional): Agent name or definition file path
+- `agents` (string|object, optional): Inline subagent definitions JSON
+- `bestOfN` (integer, optional): Run the task N ways in parallel and pick the best (headless only)
+- `check` (boolean, optional): Append a self-verification loop (headless only)
+- `disableWebSearch` (boolean, optional): Disable web search and remote retrieval tools
+- `todoGate` (boolean, optional): Enable runtime turn-end TodoGate (session-scoped)
+- `verbatim` (boolean, optional): Send the prompt exactly as given (also skips gateway prompt optimisation)
+- `promptFile` / `promptJson` / `single` (optional): Single-turn prompt from a file / JSON blocks / literal
+- `experimentalMemory` / `noMemory` (boolean, optional): Enable/disable cross-session memory
+- `noAltScreen` / `noPlan` / `noSubagents` (boolean, optional): Disable alt screen / plan mode / subagent spawning
+- `oauth` (boolean, optional): Use OAuth during authentication
+- `restoreCode` (boolean, optional): Check out the original session commit when resuming
+- `leaderSocket` (string, optional): Custom leader socket path (`--leader-socket`, Grok 0.2.32+; default `~/.grok/leader.sock`) — targets an isolated leader process, e.g. a local/branch Grok build
+- `nativeWorktree` (boolean|string, optional): Grok's own `--worktree` flag (`true` → bare, string → named); distinct from the gateway `worktree` option
+- `worktree` (boolean|object, optional): Run inside a gateway-owned git worktree (slice λ)
+- `promptParts` (object, optional): Cache-aware structured prompt `{ system?, tools?, context?, task }`; mutually exclusive with `prompt`
 - `optimizePrompt` (boolean, optional): Optimize prompt for token efficiency, default: false
 - `optimizeResponse` (boolean, optional): Optimize response for token efficiency, default: false
 - `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+- `idleTimeoutMs` (integer, optional): Kill a stuck process after output inactivity; 30,000 to 3,600,000 ms
+- `forceRefresh` (boolean, optional): Bypass dedup and force a fresh CLI run, default: false
 **Example:**
@@ -740,6 +823,21 @@ Run a Mistral Vibe agentic coding request. Like `grok_request` in shape, but wit
 - `disallowedTools` (string[], optional): Accepted for parity with the other providers; ignored at the CLI boundary with a logged warning.
 - `outputFormat` (string, optional): Vibe 2.x values are `"text"`, `"json"`, or `"streaming"`; legacy aliases `"plain"` and `"stream-json"` are accepted and normalized before spawn.
 - `sessionId` / `resumeLatest` / `createNewSession`: standard session controls. Current Vibe defaults session logging to enabled; if an older config has `[session_logging] enabled = false`, `doctor --json` surfaces an actionable next-action.
+- `trust` (boolean, optional): Emit `--trust` so Vibe trusts the cwd for this invocation only (not persisted; skips the interactive trust prompt)
+- `maxTurns` (integer, optional): Agent-loop iteration cap (`--max-turns`, programmatic mode only)
+- `maxPrice` (number, optional): Interrupt when cumulative cost crosses this USD cap (`--max-price`, programmatic mode only)
+- `maxTokens` (integer, optional): Cap cumulative prompt + completion tokens (`--max-tokens`, programmatic mode only)
+- `workingDir` (string, optional): Change to this directory before running (`--workdir`)
+- `addDir` (string[], optional): Additional writable workspace directories (one `--add-dir` per entry)
+- `approvalStrategy` (string, optional): `"legacy"` (default) or `"mcp_managed"`
+- `approvalPolicy` (string, optional): `"strict"`, `"balanced"`, or `"permissive"`
+- `mcpServers` (string[], optional): MCP server names tracked for approvals (Vibe manages its own MCP config via `vibe mcp`)
+- `worktree` (boolean|object, optional): Run inside a gateway-owned git worktree (slice λ)
+- `promptParts` (object, optional): Cache-aware structured prompt `{ system?, tools?, context?, task }`; mutually exclusive with `prompt`
+- `optimizePrompt` / `optimizeResponse` (boolean, optional): Token-efficiency optimisation, default: false
+- `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+- `idleTimeoutMs` (integer, optional): Kill a stuck process after output inactivity; 30,000 to 3,600,000 ms
+- `forceRefresh` (boolean, optional): Bypass dedup and force a fresh CLI run, default: false
 ##### `claude_request_async` / `codex_request_async` / `gemini_request_async` / `grok_request_async` / `mistral_request_async`
@@ -778,10 +876,33 @@ List recent MCP-managed approval decisions recorded by the gateway.
 **Parameters:**
 - `limit` (number, optional): Max records (1-500), default: 50
-- `cli` (string, optional): Filter by `"claude"`, `"codex"`, or `"gemini"`
+- `cli` (string, optional): Filter by `"claude"`, `"codex"`, `"gemini"`, `"grok"`, or `"mistral"`
 Approval records are persisted to `~/.llm-cli-gateway/approvals.jsonl`.
+##### `llm_request_result`
+Read back any persisted request — sync or async — by its correlation ID. Every response echoes its ID in `structuredContent.correlationId`; pass it here to recover the persisted prompt/response after the inline result is gone. Reads the flight recorder, so it works independently of async-job persistence (returns "not found" when flight recording is disabled).
+**Parameters:**
+- `correlationId` (string, required): Correlation ID from a prior request
+- `maxChars` (number, optional): Max chars of the persisted response to return (1,000-2,000,000)
+- `includePrompt` (boolean, optional): Include the full persisted prompt text, default: false
+##### `llm_process_health`
+Report gateway process health: async-job manager state plus the resolved persistence block (`backend`, `dbPath`, config sources). Use it to confirm which config file and SQLite paths the gateway is actually running under.
+##### `upstream_contracts`
+Return the gateway's declared provider CLI contracts, optionally probing the installed binaries for drift.
+**Parameters:**
+- `cli` (string, optional): Filter (`claude|codex|gemini|grok|mistral`)
+- `probeInstalled` (boolean, optional, default `false`): Run local `--help` probes and compare advertised flags against the declared contract — strongly recommended after any provider CLI upgrade. The probe reports `missingFlags`, `extraFlags`, `acknowledgedExtraFlags` (known upstream-only flags filtered from `extraFlags`), `discoveredFlags`, and stale-marker `warnings`.
 #### Session Management Tools
 ##### `session_create`
@@ -924,6 +1045,9 @@ Plan or run an upgrade for one CLI.
 - Codex latest: `codex update`
 - Codex explicit target: `npm install -g @openai/codex@<target>`
 - Gemini: `npm install -g @google/gemini-cli@<target>`
+- Grok latest: `grok update`
+- Grok explicit target: `grok update --version <target>`
+- Mistral (Vibe): dispatches to the detected installer (`pip`/`uv`/`brew`); errors with guidance when none is detected (Vibe ships no self-update command)
 **Example dry run:**

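The README changes above repeat one rule for every `*_request` tool: exactly one of `prompt` or `promptParts` must be supplied, and `prompt` is limited to 1-100,000 characters. The check below is a minimal sketch of that rule in plain TypeScript. The gateway itself validates with Zod schemas, and the helper name `checkExactlyOnePrompt`, the error strings, and the `string` types assumed for the `promptParts` fields are hypothetical.

```typescript
// Hypothetical input shapes; field names follow the README parameter lists.
// The README does not state the field types of promptParts; strings are assumed.
interface PromptParts {
  system?: string;
  tools?: string;
  context?: string;
  task: string;
}

interface RequestInput {
  prompt?: string;
  promptParts?: PromptParts;
}

// Returns an error message, or null when the input satisfies the rule.
function checkExactlyOnePrompt(input: RequestInput): string | null {
  const { prompt, promptParts } = input;
  if (prompt !== undefined && promptParts !== undefined) {
    return "prompt and promptParts are mutually exclusive";
  }
  if (prompt === undefined && promptParts === undefined) {
    return "exactly one of prompt or promptParts is required";
  }
  if (prompt !== undefined && (prompt.length < 1 || prompt.length > 100000)) {
    return "prompt must be 1-100,000 chars";
  }
  return null;
}

const okPrompt = checkExactlyOnePrompt({ prompt: "Create a REST API endpoint" });
const okParts = checkExactlyOnePrompt({ promptParts: { task: "Review this diff" } });
const both = checkExactlyOnePrompt({ prompt: "x", promptParts: { task: "y" } });
const neither = checkExactlyOnePrompt({});
console.log({ okPrompt, okParts, both, neither });
```

A client can run a check like this before sending a request so that a malformed payload fails locally rather than at the gateway's schema validation.
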
package/dist/index.d.ts CHANGED

@@ -44,6 +44,7 @@ declare const logger: {
     debug: (message: string, ...args: any[]) => void;
 };
 type GatewayLogger = typeof logger;
+export declare function buildServerInstructions(asyncJobsEnabled: boolean): string;
 export declare const MAX_TURNS_SCHEMA: z.ZodNumber;
 export declare const MAX_TOKENS_SCHEMA: z.ZodNumber;
 export declare const MAX_PRICE_SCHEMA: z.ZodNumber;
@@ -251,6 +252,7 @@ export declare function prepareGrokRequest(params: {
     noSubagents?: boolean;
     oauth?: boolean;
     restoreCode?: boolean;
+    leaderSocket?: string;
     nativeWorktree?: boolean | string;
 }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
 export declare function prepareMistralRequest(params: {
@@ -376,6 +378,7 @@ export interface GrokRequestParams {
     noSubagents?: boolean;
     oauth?: boolean;
     restoreCode?: boolean;
+    leaderSocket?: string;
     nativeWorktree?: boolean | string;
     worktree?: boolean | {
         name?: string;
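
The `leaderSocket?: string` field added to both declarations above is optional, and the changelog says it maps to `--leader-socket <PATH>` with arity-one validation. As a rough sketch of that mapping, an optional string parameter can be turned into a two-element argv pair only when it is set. The `GrokFlagsSketch` type and `sketchGrokArgs` function are hypothetical, and the socket path in the usage lines is made up.

```typescript
// Hypothetical subset of the Grok request parameters, for illustration only.
interface GrokFlagsSketch {
  restoreCode?: boolean;
  leaderSocket?: string;
}

function sketchGrokArgs(params: GrokFlagsSketch): string[] {
  const args: string[] = [];
  if (params.restoreCode) {
    // Boolean flag: present or absent, no value.
    args.push("--restore-code");
  }
  if (params.leaderSocket) {
    // Arity-one flag: the flag name followed by exactly one value.
    args.push("--leader-socket", params.leaderSocket);
  }
  return args;
}

const withSocket = sketchGrokArgs({ leaderSocket: "/tmp/grok-branch/leader.sock" });
const withoutSocket = sketchGrokArgs({ restoreCode: true });
console.log(withSocket, withoutSocket);
```

Passing the flag and its value as separate array elements keeps the path intact even when it contains spaces, because no shell splitting is involved.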

package/dist/index.js CHANGED

@@ -141,16 +141,21 @@ function loadSkills() {
     return skills;
 }
 const loadedSkills = loadSkills();
-const SERVER_INSTRUCTIONS = `llm-cli-gateway: Multi-LLM orchestration via MCP.
+export function buildServerInstructions(asyncJobsEnabled) {
+    const asyncToolsNote = asyncJobsEnabled ? " | *_request_async (async)" : "";
+    const jobsLine = asyncJobsEnabled ? "Jobs: llm_job_status, llm_job_result, llm_job_cancel\n" : "";
+    const deferralLine = asyncJobsEnabled
+        ? `- Sync auto-defers at ${SYNC_DEADLINE_MS}ms. Poll deferred jobs via llm_job_status/llm_job_result.`
+        : '- Async jobs are DISABLED (persistence.backend = "none"): *_request_async and llm_job_* tools are not registered, and sync requests run to completion (no auto-deferral).';
+    return `llm-cli-gateway: Multi-LLM orchestration via MCP.
-Tools: claude_request, codex_request, gemini_request, grok_request, mistral_request (sync) | *_request_async (async)
-Validation: validate_with_models, second_opinion, compare_answers, red_team_review, consensus_check, ask_model, synthesize_validation
-Jobs: llm_job_status, llm_job_result, llm_job_cancel
-Sessions: session_create, session_list, session_set_active, session_get, session_delete, session_clear_all
+Tools: claude_request, codex_request, gemini_request, grok_request, mistral_request (sync)${asyncToolsNote} | codex_fork_session (fork a Codex session into a new branch)
+Validation: validate_with_models, second_opinion, compare_answers, red_team_review, consensus_check, ask_model, synthesize_validation, list_available_models | job_status/job_result (validation jobs)
+${jobsLine}Sessions: session_create, session_list, session_set_active, session_get, session_delete, session_clear_all
 Other: list_models, cli_versions, upstream_contracts (use --probe-installed after CLI upgrades to detect drift), cli_upgrade, approval_list, llm_process_health, llm_request_result (read back any persisted request — sync or async — by correlationId)
 Key behaviors:
-- Sync auto-defers at ${SYNC_DEADLINE_MS}ms. Poll deferred jobs via llm_job_status/llm_job_result.
+${deferralLine}
 - Sessions: Claude --continue, Gemini --resume, Grok --resume/--continue, Mistral --resume/--continue (current Vibe defaults session logging on; doctor flags explicit session_logging.enabled=false), Codex \`exec resume <ID>\` / \`exec resume --last\` (all real CLI continuity). For Codex, sessionId must be a real Codex UUID (from ~/.codex/sessions/); gateway-generated gw-* IDs are rejected.
 - Approval gates: opt-in via approvalStrategy:"mcp_managed".
 - Upstream drift detection: After upgrading any provider CLI (especially grok), use the upstream_contracts tool with probeInstalled: true (or the CLI command "llm-cli-gateway contracts --json --probe-installed"). This is the primary reliable way to detect when an installed binary has gained or lost flags compared to the gateway's declared contract. The probe is safe and read-only.
@@ -158,8 +163,9 @@ Key behaviors:
 Skills (full docs via MCP resources):
 ${loadedSkills.map(s => `- skills://${s.name} — ${s.description}`).join("\n")}`;
-function newGatewayMcpServer() {
-    return new McpServer({ name: "llm-cli-gateway", version: "1.0.0" }, { instructions: SERVER_INSTRUCTIONS });
+}
+function newGatewayMcpServer(asyncJobsEnabled = true) {
+    return new McpServer({ name: "llm-cli-gateway", version: packageVersion() }, { instructions: buildServerInstructions(asyncJobsEnabled) });
 }
 let sessionManager;
 let db = null;
@@ -307,7 +313,10 @@ async function awaitJobOrDefer(cli, args, corrId, idleTimeoutMs, outputFormat, f
         consumeOnComplete();
         throw err;
     }
-    if (SYNC_DEADLINE_MS === 0) {
+    const deferralAvailable = runtime.persistence.backend !== "none" &&
+        runtime.persistence.asyncJobsEnabled &&
+        runtime.asyncJobManager.hasStore();
+    if (SYNC_DEADLINE_MS === 0 || !deferralAvailable) {
         const command = cli === "mistral" ? "vibe" : cli;
         try {
             return await executeCli(command, args, {
@@ -1474,6 +1483,9 @@ export function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime
     if (params.restoreCode) {
         args.push("--restore-code");
     }
+    if (params.leaderSocket) {
+        args.push("--leader-socket", params.leaderSocket);
+    }
     if (params.nativeWorktree === true) {
         args.push("--worktree");
     }
@@ -1976,6 +1988,7 @@ export async function handleGrokRequest(deps, params) {
         noSubagents: params.noSubagents,
         oauth: params.oauth,
         restoreCode: params.restoreCode,
+        leaderSocket: params.leaderSocket,
         nativeWorktree: params.nativeWorktree,
     }, runtime);
     if (!("args" in prep))
@@ -2133,6 +2146,7 @@ export async function handleGrokRequestAsync(deps, params) {
         noSubagents: params.noSubagents,
         oauth: params.oauth,
         restoreCode: params.restoreCode,
+        leaderSocket: params.leaderSocket,
         nativeWorktree: params.nativeWorktree,
     }, runtime);
     if (!("args" in prep))
@@ -2498,7 +2512,7 @@ export async function handleCodexRequestAsync(deps, params) {
                 effectiveSessionId = activeSession.id;
             }
             else {
-                const newSession = await deps.sessionManager.createSession("codex", "Codex Session");
+                const newSession = await deps.sessionManager.createSession("codex", "Codex Session", `${GATEWAY_SESSION_PREFIX}${randomUUID()}`);
                 effectiveSessionId = newSession.id;
             }
         }
@@ -2506,7 +2520,7 @@ export async function handleCodexRequestAsync(deps, params) {
             await deps.sessionManager.updateSessionUsage(params.sessionId);
         }
         else if (params.createNewSession) {
-            const newSession = await deps.sessionManager.createSession("codex", "Codex Session");
+            const newSession = await deps.sessionManager.createSession("codex", "Codex Session", `${GATEWAY_SESSION_PREFIX}${randomUUID()}`);
             effectiveSessionId = newSession.id;
         }
         let worktreeResolution = {};
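Reviewer note: the two hunks above create Codex bookkeeping sessions with `GATEWAY_SESSION_PREFIX` in front of a UUID. Per the changelog, the point is that a gateway ID can be recognised and rejected before it reaches `codex exec resume`. A minimal sketch of that idea follows, with assumptions: the prefix value `"gw-"` is taken from the changelog text, and `toGatewaySessionId`, `isGatewaySessionId`, and `assertCodexResumable` are hypothetical names, not the package's API.

```javascript
// Assumption: value taken from the changelog; the real constant is defined
// elsewhere in the package.
const GATEWAY_SESSION_PREFIX = "gw-";

// Hypothetical: builds a bookkeeping ID the same way the diff does, with the
// UUID passed in so the sketch needs no imports.
function toGatewaySessionId(uuid) {
    return `${GATEWAY_SESSION_PREFIX}${uuid}`;
}

// Hypothetical: a reserved prefix makes gateway IDs cheap to detect.
function isGatewaySessionId(id) {
    return id.startsWith(GATEWAY_SESSION_PREFIX);
}

// Hypothetical guard: fail fast with an actionable message instead of
// handing a bookkeeping ID to the provider CLI.
function assertCodexResumable(sessionId) {
    if (isGatewaySessionId(sessionId)) {
        throw new Error(`"${sessionId}" is a gateway bookkeeping ID, not a Codex session; pass a Codex-native session ID to resume.`);
    }
    return sessionId;
}

const bookkeepingId = toGatewaySessionId("123e4567-e89b-12d3-a456-426614174000");
console.log(isGatewaySessionId(bookkeepingId));
```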
@@ -2562,10 +2576,10 @@ export function createGatewayServer(deps = {}) {
     void flightRecorder;
     void cacheAwareness;
     const asyncJobsEnabled = persistence.backend !== "none" && persistence.asyncJobsEnabled && asyncJobManager.hasStore();
-    const server = newGatewayMcpServer();
+    const server = newGatewayMcpServer(asyncJobsEnabled);
     registerBaseResources(server, runtime);
     registerValidationTools(server, { asyncJobManager });
-    server.tool("claude_request", {
+    server.tool("claude_request", "Run a Claude Code CLI request synchronously (when async jobs are enabled, auto-defers to a pollable job past the sync deadline; otherwise runs to completion). Requires exactly one of prompt or promptParts.", {
         prompt: z
             .string()
             .min(1, "Prompt cannot be empty")
@@ -2581,8 +2595,14 @@ export function createGatewayServer(deps = {}) {
             .enum(["text", "json", "stream-json"])
             .default("stream-json")
             .describe("Output format (text|json|stream-json). DEFAULT: stream-json — the gateway parses NDJSON usage events to extract input/output/cache_read/cache_creation tokens + cost + model, persists them to the flight recorder for cache_state aggregates, and still returns the assistant text. Override to 'text' only when you truly want unparsed stdout (loses observability)."),
-        sessionId: z.string().optional().describe("Session ID (uses active if omitted)"),
-        continueSession: z.boolean().default(false).describe("Continue active session"),
+        sessionId: z
+            .string()
+            .optional()
+            .describe("Gateway session record to associate (uses the active session if omitted). Claude continuity itself is via continueSession (--continue); this ID is gateway bookkeeping, not a Claude-native session."),
+        continueSession: z
+            .boolean()
+            .default(false)
+            .describe("Continue the most recent Claude conversation in this cwd (emits --continue; real CLI continuity)."),
         createNewSession: z.boolean().default(false).describe("Force new session"),
         allowedTools: z
             .array(z.string())
@@ -2892,7 +2912,7 @@ export function createGatewayServer(deps = {}) {
             performanceMetrics.recordRequest("claude", finalizedDurationMs, wasSuccessful);
         }
     });
-    server.tool("codex_request", {
+    server.tool("codex_request", "Run an OpenAI Codex CLI request synchronously (when async jobs are enabled, auto-defers to a pollable job past the sync deadline; otherwise runs to completion). Requires exactly one of prompt or promptParts.", {
         prompt: z
             .string()
             .min(1, "Prompt cannot be empty")
@@ -3084,7 +3104,7 @@ export function createGatewayServer(deps = {}) {
                     effectiveSessionId = activeSession.id;
                 }
                 else {
-                    const newSession = await sessionManager.createSession("codex", "Codex Session");
+                    const newSession = await sessionManager.createSession("codex", "Codex Session", `${GATEWAY_SESSION_PREFIX}${randomUUID()}`);
                     effectiveSessionId = newSession.id;
                 }
             }
@@ -3092,7 +3112,7 @@ export function createGatewayServer(deps = {}) {
                 await sessionManager.updateSessionUsage(sessionId);
             }
             else if (createNewSession) {
-                const newSession = await sessionManager.createSession("codex", "Codex Session");
+                const newSession = await sessionManager.createSession("codex", "Codex Session", `${GATEWAY_SESSION_PREFIX}${randomUUID()}`);
                 effectiveSessionId = newSession.id;
             }
             logger.info(`[${corrId}] codex_request completed successfully in ${durationMs}ms`);
@@ -3140,7 +3160,7 @@ export function createGatewayServer(deps = {}) {
             performanceMetrics.recordRequest("codex", finalizedDurationMs, wasSuccessful);
         }
     });
-    server.tool("codex_fork_session", {
+    server.tool("codex_fork_session", "Fork an existing Codex session into a new branch (codex fork <ID|--last>) and run a prompt against the fork without mutating the original.", {
         prompt: z
             .string()
             .min(1, "Prompt cannot be empty")
@@ -3227,7 +3247,7 @@ export function createGatewayServer(deps = {}) {
             performanceMetrics.recordRequest("codex", finalizedDurationMs, wasSuccessful);
         }
     });
-    server.tool("gemini_request", {
+    server.tool("gemini_request", "Run a Google Gemini CLI request synchronously (when async jobs are enabled, auto-defers to a pollable job past the sync deadline; otherwise runs to completion). Requires exactly one of prompt or promptParts.", {
         prompt: z
             .string()
             .min(1, "Prompt cannot be empty")
@@ -3239,7 +3259,10 @@ export function createGatewayServer(deps = {}) {
             .string()
             .optional()
             .describe("Model name or alias (e.g. gemini-3-pro-preview, gemini-2.5-flash, pro, flash, latest)"),
-        sessionId: z.string().optional().describe("Session ID or 'latest'"),
+        sessionId: z
+            .string()
+            .optional()
+            .describe("Gemini session ID to resume (emits --resume <id>), or 'latest' for the most recent session in this cwd"),
         resumeLatest: z.boolean().default(false).describe("Resume latest session"),
         createNewSession: z.boolean().default(false).describe("Force new session"),
         approvalMode: z
@@ -3323,7 +3346,7 @@ export function createGatewayServer(deps = {}) {
             worktree,
         });
     });
-    server.tool("grok_request", {
+    server.tool("grok_request", "Run an xAI Grok CLI request synchronously (when async jobs are enabled, auto-defers to a pollable job past the sync deadline; otherwise runs to completion). Requires exactly one of prompt or promptParts.", {
         prompt: z
             .string()
             .min(1, "Prompt cannot be empty")
@@ -3339,7 +3362,7 @@ export function createGatewayServer(deps = {}) {
         sessionId: z
             .string()
             .optional()
-            .describe("Session ID (user-provided CLI handle for --resume)"),
+            .describe("Provider-native session ID to resume (emits --resume <id>; use resumeLatest for --continue)"),
         resumeLatest: z
             .boolean()
             .default(false)
@@ -3488,12 +3511,17 @@ export function createGatewayServer(deps = {}) {
             .boolean()
             .optional()
             .describe("Grok --restore-code: check out the original session commit when resuming."),
+        leaderSocket: z
+            .string()
+            .min(1)
+            .optional()
+            .describe("Grok 0.2.32+ --leader-socket <PATH>: custom leader socket path (default ~/.grok/leader.sock). Targets an isolated leader process, e.g. a local/branch Grok build; name it ~/.grok/leader-*.sock to keep `grok leader list/kill` discovery working."),
         nativeWorktree: z
             .union([z.boolean(), z.string().min(1)])
             .optional()
             .describe("Grok -w/--worktree: native CLI worktree flag (`true` → bare `--worktree`, string → named). NOT gateway slice λ `worktree`."),
         worktree: WORKTREE_SCHEMA.optional(),
-    }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, workingDir, sandbox, rules, systemPromptOverride, allow, deny, compactionMode, compactionDetail, agent, bestOfN, check, disableWebSearch, todoGate, verbatim, agents, promptFile, promptJson, single, experimentalMemory, noAltScreen, noMemory, noPlan, noSubagents, oauth, restoreCode, nativeWorktree, worktree, }) => {
+    }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, workingDir, sandbox, rules, systemPromptOverride, allow, deny, compactionMode, compactionDetail, agent, bestOfN, check, disableWebSearch, todoGate, verbatim, agents, promptFile, promptJson, single, experimentalMemory, noAltScreen, noMemory, noPlan, noSubagents, oauth, restoreCode, leaderSocket, nativeWorktree, worktree, }) => {
         return handleGrokRequest({ sessionManager, logger, runtime }, {
             prompt,
             promptParts,
@@ -3542,11 +3570,12 @@ export function createGatewayServer(deps = {}) {
             noSubagents,
             oauth,
             restoreCode,
+            leaderSocket,
             nativeWorktree,
             worktree,
         });
     });
-    server.tool("mistral_request", {
+    server.tool("mistral_request", "Run a Mistral Vibe CLI request synchronously (when async jobs are enabled, auto-defers to a pollable job past the sync deadline; otherwise runs to completion). Requires exactly one of prompt or promptParts.", {
         prompt: z
             .string()
             .min(1, "Prompt cannot be empty")
@@ -3656,7 +3685,7 @@ export function createGatewayServer(deps = {}) {
         });
     });
     if (asyncJobsEnabled) {
-        server.tool("claude_request_async", {
+        server.tool("claude_request_async", "Start a Claude Code CLI request as a durable background job. Poll with llm_job_status, collect with llm_job_result.", {
             prompt: z
                 .string()
                 .min(1, "Prompt cannot be empty")
@@ -3672,8 +3701,14 @@ export function createGatewayServer(deps = {}) {
                 .enum(["text", "json", "stream-json"])
                 .default("stream-json")
                 .describe("Output format (text|json|stream-json). DEFAULT: stream-json — same rationale as claude_request: keeps usage/cache/cost observable for cache_state aggregates. Override to 'text' only when raw stdout is required (loses observability)."),
-            sessionId: z.string().optional().describe("Session ID (uses active if omitted)"),
-            continueSession: z.boolean().default(false).describe("Continue active session"),
+            sessionId: z
+                .string()
+                .optional()
+                .describe("Gateway session record to associate (uses the active session if omitted). Claude continuity itself is via continueSession (--continue); this ID is gateway bookkeeping, not a Claude-native session."),
+            continueSession: z
+                .boolean()
+                .default(false)
+                .describe("Continue the most recent Claude conversation in this cwd (emits --continue; real CLI continuity)."),
             createNewSession: z.boolean().default(false).describe("Force new session"),
             allowedTools: z
                 .array(z.string())
@@ -3909,7 +3944,7 @@ export function createGatewayServer(deps = {}) {
                 return createErrorResponse("claude_request_async", 1, "", corrId, error);
             }
         });
-        server.tool("codex_request_async", {
+        server.tool("codex_request_async", "Start an OpenAI Codex CLI request as a durable background job. Poll with llm_job_status, collect with llm_job_result.", {
             prompt: z
                 .string()
                 .min(1, "Prompt cannot be empty")
@@ -4034,7 +4069,7 @@ export function createGatewayServer(deps = {}) {
                 worktree,
             });
         });
-        server.tool("gemini_request_async", {
+        server.tool("gemini_request_async", "Start a Google Gemini CLI request as a durable background job. Poll with llm_job_status, collect with llm_job_result.", {
             prompt: z
                 .string()
                 .min(1, "Prompt cannot be empty")
@@ -4049,7 +4084,7 @@ export function createGatewayServer(deps = {}) {
             sessionId: z
                 .string()
                 .optional()
-                .describe("Session ID (user-provided CLI handle for --resume)"),
+                .describe("Gemini session ID to resume (emits --resume <id>), or 'latest' for the most recent session in this cwd"),
             resumeLatest: z.boolean().default(false).describe("Resume latest session"),
             createNewSession: z.boolean().default(false).describe("Force new session"),
             approvalMode: z
@@ -4131,7 +4166,7 @@ export function createGatewayServer(deps = {}) {
                 worktree,
             });
         });
-        server.tool("grok_request_async", {
+        server.tool("grok_request_async", "Start an xAI Grok CLI request as a durable background job. Poll with llm_job_status, collect with llm_job_result.", {
             prompt: z
                 .string()
                 .min(1, "Prompt cannot be empty")
@@ -4147,7 +4182,7 @@ export function createGatewayServer(deps = {}) {
             sessionId: z
                 .string()
                 .optional()
-                .describe("Session ID (user-provided CLI handle for --resume)"),
+                .describe("Provider-native session ID to resume (emits --resume <id>; use resumeLatest for --continue)"),
             resumeLatest: z
                 .boolean()
                 .default(false)
@@ -4298,12 +4333,17 @@ export function createGatewayServer(deps = {}) {
                 .boolean()
                 .optional()
                 .describe("Grok --restore-code: check out the original session commit when resuming."),
+            leaderSocket: z
+                .string()
+                .min(1)
+                .optional()
+                .describe("Grok 0.2.32+ --leader-socket <PATH>: custom leader socket path (default ~/.grok/leader.sock). Targets an isolated leader process, e.g. a local/branch Grok build; name it ~/.grok/leader-*.sock to keep `grok leader list/kill` discovery working."),
             nativeWorktree: z
                 .union([z.boolean(), z.string().min(1)])
                 .optional()
                 .describe("Grok -w/--worktree: native CLI worktree flag (`true` → bare `--worktree`, string → named). NOT gateway slice λ `worktree`."),
             worktree: WORKTREE_SCHEMA.optional(),
-        }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, workingDir, sandbox, rules, systemPromptOverride, allow, deny, compactionMode, compactionDetail, agent, bestOfN, check, disableWebSearch, todoGate, verbatim, agents, promptFile, promptJson, single, experimentalMemory, noAltScreen, noMemory, noPlan, noSubagents, oauth, restoreCode, nativeWorktree, worktree, }) => {
+        }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, workingDir, sandbox, rules, systemPromptOverride, allow, deny, compactionMode, compactionDetail, agent, bestOfN, check, disableWebSearch, todoGate, verbatim, agents, promptFile, promptJson, single, experimentalMemory, noAltScreen, noMemory, noPlan, noSubagents, oauth, restoreCode, leaderSocket, nativeWorktree, worktree, }) => {
             return handleGrokRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
                 prompt,
                 promptParts,
@@ -4351,11 +4391,12 @@ export function createGatewayServer(deps = {}) {
                 noSubagents,
                 oauth,
                 restoreCode,
+                leaderSocket,
                 nativeWorktree,
                 worktree,
             });
         });
-        server.tool("mistral_request_async", {
+        server.tool("mistral_request_async", "Start a Mistral Vibe CLI request as a durable background job. Poll with llm_job_status, collect with llm_job_result.", {
             prompt: z
                 .string()
                 .min(1, "Prompt cannot be empty")
@@ -4462,7 +4503,7 @@ export function createGatewayServer(deps = {}) {
                 worktree,
             });
         });
-        server.tool("llm_job_status", {
+        server.tool("llm_job_status", "Check lifecycle status (running|completed|failed|canceled|orphaned) of a gateway async or deferred-sync job by jobId.", {
             jobId: z.string().describe("Async job ID from *_request_async"),
         }, async ({ jobId }) => {
             const job = asyncJobManager.getJobSnapshot(jobId);
@@ -4493,7 +4534,7 @@ export function createGatewayServer(deps = {}) {
                 ],
             };
         });
-        server.tool("llm_job_result", {
+        server.tool("llm_job_result", "Retrieve captured stdout/stderr for a gateway async or deferred-sync job by jobId.", {
             jobId: z.string().describe("Async job ID from *_request_async"),
             maxChars: z
                 .number()
@@ -4547,7 +4588,7 @@ export function createGatewayServer(deps = {}) {
                 ],
             };
         });
-        server.tool("llm_job_cancel", {
+        server.tool("llm_job_cancel", "Cancel a running gateway async or deferred-sync job by jobId.", {
             jobId: z.string().describe("Async job ID from *_request_async"),
         }, async ({ jobId }) => {
             const cancel = asyncJobManager.cancelJob(jobId);
@@ -4579,7 +4620,7 @@ export function createGatewayServer(deps = {}) {
             };
         });
     }
-    server.tool("llm_request_result", {
+    server.tool("llm_request_result", "Read back any persisted request (sync or async) from the flight recorder by correlationId, including prompt and response.", {
         correlationId: z
             .string()
             .min(1)
@@ -4625,7 +4666,7 @@ export function createGatewayServer(deps = {}) {
             ],
         };
     });
-    server.tool("llm_process_health", {}, async () => {
+    server.tool("llm_process_health", "Report gateway process health: async-job manager state plus the resolved persistence configuration and paths.", {}, async () => {
         const health = asyncJobManager.getJobHealth();
         const persistenceBlock = {
             backend: persistence.backend,
@@ -4649,7 +4690,7 @@ export function createGatewayServer(deps = {}) {
             ],
         };
     });
-    server.tool("approval_list", {
+    server.tool("approval_list", "List recent MCP-managed approval decisions recorded by the gateway (approvalStrategy: mcp_managed).", {
         limit: z
             .number()
             .int()
@@ -4676,7 +4717,7 @@ export function createGatewayServer(deps = {}) {
             ],
         };
     });
-    server.tool("list_models", {
+    server.tool("list_models", "List models, aliases, and defaults for one provider CLI (claude|codex|gemini|grok|mistral).", {
         cli: z
             .preprocess(value => (value === "" || value === null ? undefined : value), z.enum(["claude", "codex", "gemini", "grok", "mistral"]).optional())
             .describe("CLI filter (claude|codex|gemini|grok|mistral)"),
@@ -4685,7 +4726,7 @@ export function createGatewayServer(deps = {}) {
         const result = cli ? { [cli]: cliInfo[cli] } : cliInfo;
         return { content: [{ type: "text", text: JSON.stringify(result, null, 2) }] };
     });
-    server.tool("cli_versions", {
+    server.tool("cli_versions", "Report installed provider CLI versions, availability, and login status for all five providers or one.", {
         cli: z
             .preprocess(value => (value === "" || value === null ? undefined : value), z.enum(["claude", "codex", "gemini", "grok", "mistral"]).optional())
             .describe("CLI filter (claude|codex|gemini|grok|mistral)"),
@@ -4693,7 +4734,7 @@ export function createGatewayServer(deps = {}) {
         const versions = await getCliVersions(cli);
         return { content: [{ type: "text", text: JSON.stringify({ versions }, null, 2) }] };
     });
-    server.tool("upstream_contracts", {
+    server.tool("upstream_contracts", "Return the gateway's declared provider CLI contracts; with probeInstalled true, diff against installed --help surfaces to detect flag drift.", {
         cli: z
             .preprocess(value => (value === "" || value === null ? undefined : value), SESSION_PROVIDER_ENUM.optional())
             .describe("CLI filter (claude|codex|gemini|grok|mistral)"),
@@ -4705,7 +4746,7 @@ export function createGatewayServer(deps = {}) {
         const report = buildUpstreamContractReport({ cli, probeInstalled });
         return { content: [{ type: "text", text: JSON.stringify(report, null, 2) }] };
     });
-    server.tool("cli_upgrade", {
+    server.tool("cli_upgrade", "Plan (dryRun, default true) or execute an upgrade for one provider CLI using its native update mechanism.", {
         cli: z.enum(["claude", "codex", "gemini", "grok", "mistral"]).describe("CLI to upgrade"),
         target: z
             .string()
@@ -4754,7 +4795,7 @@ export function createGatewayServer(deps = {}) {
             };
         }
     });
-    server.tool("session_create", {
+    server.tool("session_create", "Create a gateway session record for a provider CLI. NOTE: this is gateway bookkeeping (gw-* ID), not a provider-native session — Codex resume needs a real Codex UUID.", {
         cli: SESSION_PROVIDER_ENUM.describe("CLI type (claude|codex|gemini|grok|mistral)"),
         description: z.string().optional().describe("Session description"),
         setAsActive: z.boolean().default(true).describe("Set as active session"),
@@ -4787,7 +4828,7 @@ export function createGatewayServer(deps = {}) {
             return createErrorResponse("session_create", 1, "", undefined, error);
         }
     });
-    server.tool("session_list", {
+    server.tool("session_list", "List gateway session records and the active session per CLI, optionally filtered by CLI.", {
         cli: SESSION_PROVIDER_ENUM.optional().describe("CLI filter (claude|codex|gemini|grok|mistral)"),
     }, async ({ cli }) => {
         try {
@@ -4830,7 +4871,7 @@ export function createGatewayServer(deps = {}) {
             return createErrorResponse("session_list", 1, "", undefined, error);
         }
     });
-    server.tool("session_set_active", {
+    server.tool("session_set_active", "Set or clear the active session for a CLI; the active session is used when a request omits sessionId.", {
         cli: SESSION_PROVIDER_ENUM.describe("CLI type (claude|codex|gemini|grok|mistral)"),
         sessionId: z.string().nullable().describe("Session ID (null to clear)"),
     }, async ({ cli, sessionId }) => {
@@ -4868,7 +4909,7 @@ export function createGatewayServer(deps = {}) {
             return createErrorResponse("session_set_active", 1, "", undefined, error);
         }
     });
-    server.tool("session_delete", {
+    server.tool("session_delete", "Delete a gateway session record by ID (also removes any gateway-owned worktree attached to it).", {
         sessionId: z.string().describe("Session ID"),
     }, async ({ sessionId }) => {
         try {
@@ -4909,7 +4950,7 @@ export function createGatewayServer(deps = {}) {
             return createErrorResponse("session_delete", 1, "", undefined, error);
         }
     });
-    server.tool("session_get", {
+    server.tool("session_get", "Get one gateway session record by session ID, including recent request history when available.", {
         sessionId: z.string().describe("Session ID"),
     }, async ({ sessionId }) => {
         try {
@@ -4972,7 +5013,7 @@ export function createGatewayServer(deps = {}) {
             return createErrorResponse("session_get", 1, "", undefined, error);
         }
     });
-    server.tool("session_clear_all", {
+    server.tool("session_clear_all", "Delete all gateway session records, optionally scoped to one CLI.", {
         cli: SESSION_PROVIDER_ENUM.optional().describe("CLI filter (claude|codex|gemini|grok|mistral)"),
     }, async ({ cli }) => {
         try {

package/dist/upstream-contracts.d.ts CHANGED

@@ -5,6 +5,7 @@ export interface CliFlagContract {
     values?: readonly string[];
     pattern?: RegExp;
     description: string;
+    hiddenFromHelp?: boolean;
 }
 export interface CliUpstreamMetadata {
     sourceUrls: readonly string[];
@@ -32,6 +33,7 @@ export interface CliContract {
     resumeMaxPositionals?: number;
     resumeOnlyFlags?: readonly string[];
     resumeForbiddenFlags?: readonly string[];
+    acknowledgedUpstreamFlags?: readonly string[];
     upstreamMetadata?: CliUpstreamMetadata;
 }
 export interface CliContractFixture {
@@ -57,6 +59,13 @@ export declare function assertUpstreamCliArgs(cli: CliType, args: readonly strin
 export declare function validateUpstreamCliEnv(cli: CliType, env: Record<string, string> | undefined): ContractValidationResult;
 export declare function assertUpstreamCliEnv(cli: CliType, env: Record<string, string> | undefined): void;
 export declare function extractDiscoveredFlags(helpText: string): readonly string[];
+export interface FlagDriftResult {
+    missingFlags: string[];
+    extraFlags: readonly string[];
+    acknowledgedExtraFlags: readonly string[];
+    warnings: string[];
+}
+export declare function computeFlagDrift(contract: CliContract, helpText: string, discoveredFlags: readonly string[]): FlagDriftResult;
 export interface InstalledCliContractProbe {
     cli: CliType;
     executable: string;
@@ -66,6 +75,7 @@ export interface InstalledCliContractProbe {
     checkedHelpCommands: string[][];
     missingFlags: string[];
     extraFlags: readonly string[];
+    acknowledgedExtraFlags: readonly string[];
     discoveredFlags: readonly string[];
     helpHash?: string;
     versionHint?: string;

package/dist/upstream-contracts.js CHANGED

@@ -99,7 +99,12 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 pattern: /^[0-9]+(?:\.[0-9]+)?$/,
                 description: "Budget cap in USD",
             },
-            "--max-turns": { arity: "one", pattern: /^[1-9][0-9]*$/, description: "Turn cap" },
+            "--max-turns": {
+                arity: "one",
+                pattern: /^[1-9][0-9]*$/,
+                description: "Turn cap",
+                hiddenFromHelp: true,
+            },
             "--effort": { arity: "one", values: EFFORT_LEVELS, description: "Reasoning effort" },
             "--exclude-dynamic-system-prompt-sections": {
                 arity: "none",
@@ -136,6 +141,37 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 description: 'Restrict the available built-in tool set ("" disables all)',
             },
         },
+        acknowledgedUpstreamFlags: [
+            "--allow-dangerously-skip-permissions",
+            "--allowed",
+            "--bare",
+            "--betas",
+            "--brief",
+            "--chrome",
+            "--dangerously-skip-permissions",
+            "--debug",
+            "--debug-file",
+            "--disable-slash-commands",
+            "--disallowed",
+            "--file",
+            "--from-pr",
+            "--ide",
+            "--include-hook-events",
+            "--mcp-debug",
+            "--name",
+            "--no-chrome",
+            "--plugin-dir",
+            "--plugin-url",
+            "--print",
+            "--prompt-suggestions",
+            "--remote-control",
+            "--remote-control-session-name-prefix",
+            "--replay-user-messages",
+            "--resume",
+            "--tmux",
+            "--version",
+            "--worktree",
+        ],
         env: {},
         conformanceFixtures: [
             {
@@ -518,6 +554,26 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 description: "Auto-approve all actions (gemini -y/--yolo). Functionally equivalent to --approval-mode yolo; the gateway emits at most one of the two.",
             },
         },
+        acknowledgedUpstreamFlags: [
+            "--accept-raw-output-risk",
+            "--acp",
+            "--debug",
+            "--delete-session",
+            "--experimental-acp",
+            "--extensions",
+            "--list-extensions",
+            "--list-sessions",
+            "--output-format",
+            "--prompt",
+            "--prompt-interactive",
+            "--raw-output",
+            "--sandbox",
+            "--screen-reader",
+            "--session-file",
+            "--session-id",
+            "--version",
+            "--worktree",
+        ],
         env: {},
         conformanceFixtures: [
             {
@@ -612,6 +668,7 @@ export const UPSTREAM_CLI_CONTRACTS = {
             "noSubagents",
             "oauth",
             "restoreCode",
+            "leaderSocket",
             "nativeWorktree",
         ],
         flags: {
@@ -693,6 +750,10 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 arity: "none",
                 description: "Check out the original session commit when resuming",
             },
+            "--leader-socket": {
+                arity: "one",
+                description: "Custom leader socket path (isolated leader, Grok 0.2.32+)",
+            },
             "--single": { arity: "one", description: "Single-turn prompt" },
             "--todo-gate": { arity: "none", description: "Enable runtime turn-end TodoGate" },
             "--verbatim": { arity: "none", description: "Send prompt exactly as given" },
@@ -843,6 +904,18 @@ export const UPSTREAM_CLI_CONTRACTS = {
                 ],
                 expect: "pass",
             },
+            {
+                id: "grok-leader-socket",
+                description: "Grok 0.2.32: --leader-socket <PATH> is accepted",
+                args: ["-p", "hello", "--leader-socket", "/home/user/.grok/leader-branch.sock"],
+                expect: "pass",
+            },
+            {
+                id: "grok-leader-socket-missing-value",
+                description: "Grok 0.2.32: --leader-socket without a path is rejected (arity one)",
+                args: ["-p", "hello", "--leader-socket"],
+                expect: "fail",
+            },
         ],
     },
     mistral: {
@@ -1220,6 +1293,42 @@ export function extractDiscoveredFlags(helpText) {
     }
     return Array.from(discovered).sort();
 }
+export function computeFlagDrift(contract, helpText, discoveredFlags) {
+    const warnings = [];
+    const missingFlags = [];
+    for (const [flag, spec] of Object.entries(contract.flags)) {
+        const inHelp = helpText.includes(flag);
+        if (spec.hiddenFromHelp) {
+            if (inHelp) {
+                warnings.push(`${flag} is marked hiddenFromHelp but now appears in ${contract.executable} help output; remove the hiddenFromHelp marker from the contract`);
+            }
+            continue;
+        }
+        if (!inHelp)
+            missingFlags.push(flag);
+    }
+    const contractFlagSet = new Set(Object.keys(contract.flags));
+    const acknowledged = new Set(contract.acknowledgedUpstreamFlags ?? []);
+    const extraFlags = [];
+    const acknowledgedExtraFlags = [];
+    for (const flag of discoveredFlags) {
+        if (contractFlagSet.has(flag))
+            continue;
+        if (acknowledged.has(flag)) {
+            acknowledgedExtraFlags.push(flag);
+        }
+        else {
+            extraFlags.push(flag);
+        }
+    }
+    const discoveredSet = new Set(discoveredFlags);
+    for (const flag of acknowledged) {
+        if (!discoveredSet.has(flag)) {
+            warnings.push(`acknowledged upstream flag ${flag} no longer appears in ${contract.executable} help output; remove it from acknowledgedUpstreamFlags`);
+        }
+    }
+    return { missingFlags, extraFlags, acknowledgedExtraFlags, warnings };
+}
 export function probeInstalledCliContract(cli, timeoutMs = 5_000) {
     const contract = UPSTREAM_CLI_CONTRACTS[cli];
     const outputs = [];
@@ -1252,6 +1361,7 @@ export function probeInstalledCliContract(cli, timeoutMs = 5_000) {
                 checkedHelpCommands: contract.helpArgs,
                 missingFlags: [],
                 extraFlags: [],
+                acknowledgedExtraFlags: [],
                 discoveredFlags: [],
                 helpHash: undefined,
                 versionHint: undefined,
@@ -1265,10 +1375,9 @@ export function probeInstalledCliContract(cli, timeoutMs = 5_000) {
         }
     }
     const helpText = outputs.join("\n");
-    const missingFlags = Object.keys(contract.flags).filter(flag => !helpText.includes(flag));
     const discoveredFlags = extractDiscoveredFlags(helpText);
-    const contractFlagSet = new Set(Object.keys(contract.flags));
-    const extraFlags = discoveredFlags.filter(f => !contractFlagSet.has(f));
+    const drift = computeFlagDrift(contract, helpText, discoveredFlags);
+    warnings.push(...drift.warnings);
     const versionMatch = helpText.match(/^\s*(?:[A-Za-z][\w .-]+)?v?\d+\.\d+\S*/m);
     const versionHint = versionMatch ? versionMatch[0].trim().slice(0, 80) : undefined;
     const helpHash = createHash("sha256").update(helpText).digest("hex");
@@ -1279,8 +1388,9 @@ export function probeInstalledCliContract(cli, timeoutMs = 5_000) {
         resolvedArgs,
         available: true,
         checkedHelpCommands: contract.helpArgs,
-        missingFlags,
-        extraFlags,
+        missingFlags: drift.missingFlags,
+        extraFlags: drift.extraFlags,
+        acknowledgedExtraFlags: drift.acknowledgedExtraFlags,
         discoveredFlags,
         helpHash,
         versionHint,
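
To make the new drift buckets concrete, the sketch below copies `computeFlagDrift` as it appears in the diff above and runs it on a small made-up contract. Everything in the sample data is an assumption for illustration: the flags `--internal-debug`, `--experimental`, `--legacy-mode`, and `--new-flag` are hypothetical, and `discoveredFlags` is written by hand rather than produced by `extractDiscoveredFlags`.

```javascript
// Copied from the diff above so the example is self-contained.
function computeFlagDrift(contract, helpText, discoveredFlags) {
    const warnings = [];
    const missingFlags = [];
    for (const [flag, spec] of Object.entries(contract.flags)) {
        const inHelp = helpText.includes(flag);
        if (spec.hiddenFromHelp) {
            if (inHelp) {
                warnings.push(`${flag} is marked hiddenFromHelp but now appears in ${contract.executable} help output; remove the hiddenFromHelp marker from the contract`);
            }
            continue;
        }
        if (!inHelp)
            missingFlags.push(flag);
    }
    const contractFlagSet = new Set(Object.keys(contract.flags));
    const acknowledged = new Set(contract.acknowledgedUpstreamFlags ?? []);
    const extraFlags = [];
    const acknowledgedExtraFlags = [];
    for (const flag of discoveredFlags) {
        if (contractFlagSet.has(flag))
            continue;
        if (acknowledged.has(flag)) {
            acknowledgedExtraFlags.push(flag);
        }
        else {
            extraFlags.push(flag);
        }
    }
    const discoveredSet = new Set(discoveredFlags);
    for (const flag of acknowledged) {
        if (!discoveredSet.has(flag)) {
            warnings.push(`acknowledged upstream flag ${flag} no longer appears in ${contract.executable} help output; remove it from acknowledgedUpstreamFlags`);
        }
    }
    return { missingFlags, extraFlags, acknowledgedExtraFlags, warnings };
}

// Hypothetical contract and help output, for illustration only.
const sampleContract = {
    executable: "grok",
    flags: {
        "--single": { arity: "one" },
        "--leader-socket": { arity: "one" },
        "--internal-debug": { arity: "none", hiddenFromHelp: true },
    },
    acknowledgedUpstreamFlags: ["--experimental", "--legacy-mode"],
};
const sampleHelp = [
    "Usage: grok [OPTIONS]",
    "  --single <PROMPT>   Single-turn prompt",
    "  --experimental      Experimental feature",
    "  --new-flag          Something new",
].join("\n");
const sampleDiscovered = ["--experimental", "--new-flag", "--single"];

const drift = computeFlagDrift(sampleContract, sampleHelp, sampleDiscovered);
console.log(drift.missingFlags);           // [ '--leader-socket' ]
console.log(drift.extraFlags);             // [ '--new-flag' ]
console.log(drift.acknowledgedExtraFlags); // [ '--experimental' ]
console.log(drift.warnings.length);        // 1 (stale --legacy-mode acknowledgement)
```

In this run the hidden `--internal-debug` flag is skipped rather than reported missing, and the acknowledged `--experimental` flag is kept out of `extraFlags`. Note that `helpText.includes(flag)` is a substring test, so a contract flag such as `--single` would also count as present if the help text listed only a longer flag that starts with the same characters.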

package/dist/validation-tools.js CHANGED

@@ -47,7 +47,7 @@ function findHumanReadableReport(value) {
     return null;
 }
 export function registerValidationTools(server, deps) {
-    server.tool("validate_with_models", {
+    server.tool("validate_with_models", "Ask two or more provider CLIs to independently validate a question. Starts validation jobs — poll with job_status, collect with job_result (not llm_job_*).", {
         question: z.string().min(1).describe("Question or content to validate."),
         models: providerListSchema.describe("Providers to ask. Defaults to Claude and Codex."),
         focus: z
@@ -69,7 +69,7 @@ export function registerValidationTools(server, deps) {
             judgeProvider: judgeModel,
         }),
     }));
-    server.tool("second_opinion", {
+    server.tool("second_opinion", "Ask one provider CLI to review an answer (starts a validation job; poll job_status, collect job_result).", {
         answer: z.string().min(1).describe("Answer to review."),
         question: z.string().optional().describe("Original question, if available."),
         model: providerSchema.default("codex").describe("Provider to ask for the second opinion."),
@@ -84,7 +84,7 @@ export function registerValidationTools(server, deps) {
             providers: [model],
         }),
     }));
-    server.tool("compare_answers", {
+    server.tool("compare_answers", "Summarize agreement/differences between caller-provided answers LOCALLY — does not call any provider.", {
         question: z.string().min(1).describe("Question the answers respond to."),
         answers: z.array(z.string().min(1)).min(2).describe("Two or more answers to compare."),
     }, async ({ question, answers }) => textResponse({
@@ -99,7 +99,7 @@ export function registerValidationTools(server, deps) {
             note: "Use validate_with_models when independent provider review is needed.",
         },
     }));
-    server.tool("red_team_review", {
+    server.tool("red_team_review", "Challenge a plan, answer, or document for risks and failure modes via provider CLIs (starts validation jobs).", {
         content: z.string().min(1).describe("Plan, answer, or document to challenge."),
         riskLevel: z
             .enum(["normal", "high"])
@@ -117,7 +117,7 @@ export function registerValidationTools(server, deps) {
             riskLevel,
         }),
     }));
-    server.tool("consensus_check", {
+    server.tool("consensus_check", "Ask provider CLIs whether they agree or disagree with a claim (starts validation jobs).", {
         claim: z.string().min(1).describe("Claim to check across providers."),
         models: providerListSchema.describe("Providers to ask for agreement or disagreement."),
     }, async ({ claim, models }) => textResponse({
@@ -130,7 +130,7 @@ export function registerValidationTools(server, deps) {
             providers: models,
         }),
     }));
-    server.tool("ask_model", {
+    server.tool("ask_model", "Ask one provider CLI a question through the simplified validation surface (starts a validation job).", {
         question: z.string().min(1).describe("Question for one provider."),
         model: providerSchema.default("claude").describe("Provider to ask."),
     }, async ({ question, model }) => textResponse({
@@ -143,7 +143,7 @@ export function registerValidationTools(server, deps) {
             providers: [model],
         }),
     }));
-    server.tool("synthesize_validation", {
+    server.tool("synthesize_validation", "Run an explicit judge model over already-collected validation results to produce a synthesis.", {
         question: z.string().min(1).describe("Original request that was validated."),
         providerResults: z
             .array(normalizedProviderResultSchema)
@@ -160,8 +160,8 @@ export function registerValidationTools(server, deps) {
             judgeProvider: judgeModel,
         }),
     }));
-    server.tool("list_available_models", {}, async () => textResponse({ success: true, models: getAvailableCliInfo() }));
-    server.tool("job_status", {
+    server.tool("list_available_models", "List models and capabilities for every available provider CLI (takes no arguments; complements per-provider list_models).", {}, async () => textResponse({ success: true, models: getAvailableCliInfo() }));
+    server.tool("job_status", "Check a VALIDATION job's status (jobs started by validate_with_models/ask_model/etc.) — distinct from llm_job_status, which tracks provider request jobs.", {
         jobId: z.string().min(1).describe("Validation job ID."),
     }, async ({ jobId }) => {
         const job = deps.asyncJobManager.getJobSnapshot(jobId);
@@ -170,7 +170,7 @@ export function registerValidationTools(server, deps) {
         }
         return textResponse({ success: true, job });
     });
-    server.tool("job_result", {
+    server.tool("job_result", "Collect a VALIDATION job's normalized provider output — distinct from llm_job_result, which returns raw provider request job output.", {
         jobId: z.string().min(1).describe("Validation job ID."),
         provider: providerSchema
             .optional()
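
Every hunk in this file makes the same change: a description string is inserted as the second argument to `server.tool(...)`. The changelog explains the motivation, namely that clients which rank or search tools by description previously saw bare names. The sketch below illustrates that effect with a hypothetical `RecordingServer` stand-in that only records registrations. It is not the MCP SDK and makes no claim about how a real client searches. The two description strings are copied from the diff.

```javascript
// Hypothetical stand-in for the server object: stores what was registered.
class RecordingServer {
    constructor() {
        this.tools = new Map();
    }
    // Accepts both call shapes seen in the diff:
    //   tool(name, schema, handler)               (2.0.0 style)
    //   tool(name, description, schema, handler)  (2.2.0 style)
    tool(name, ...rest) {
        const description = typeof rest[0] === "string" ? rest.shift() : undefined;
        const [schema, handler] = rest;
        this.tools.set(name, { description, schema, handler });
    }
}

// Hypothetical client-side helper: find tools whose description mentions a keyword.
function searchByDescription(server, keyword) {
    const needle = keyword.toLowerCase();
    const hits = [];
    for (const [name, tool] of server.tools) {
        if (tool.description && tool.description.toLowerCase().includes(needle)) {
            hits.push(name);
        }
    }
    return hits;
}

const demoServer = new RecordingServer();
// 2.2.0 style: description supplied.
demoServer.tool("second_opinion", "Ask one provider CLI to review an answer (starts a validation job; poll job_status, collect job_result).", {}, async () => ({}));
demoServer.tool("consensus_check", "Ask provider CLIs whether they agree or disagree with a claim (starts validation jobs).", {}, async () => ({}));
// 2.0.0 style: no description, so only the bare name is available.
demoServer.tool("list_available_models", {}, async () => ({}));

console.log(searchByDescription(demoServer, "validation job")); // [ 'second_opinion', 'consensus_check' ]
console.log(searchByDescription(demoServer, "models"));         // []
```

The second search returns nothing because the only tool whose name mentions models was registered without a description, which is the situation the 2.2.0 descriptions are meant to remove.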

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "llm-cli-gateway",
-  "version": "2.0.0",
+  "version": "2.2.0",
   "mcpName": "io.github.verivus-oss/llm-cli-gateway",
   "description": "MCP server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral Vibe CLIs with session management, retry logic, async job orchestration, durable job results, and cross-LLM validation.",
   "license": "MIT",