llm-cli-gateway 2.0.0 → 2.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -4,6 +4,71 @@ All notable changes to the llm-cli-gateway project.
 
  ## Unreleased
 
+ ## [2.2.0] - 2026-06-07: MCP tool-surface usability — self-describing tools
+
+ ### Added
+
+ - MCP tool-surface usability (4-seat cross-LLM review): all 37 tools now carry
+ action descriptions (previously none had tool-level descriptions — clients
+ that rank, search, or defer tools by description saw bare names); sync
+ `*_request` descriptions state the prompt/promptParts exactly-one rule and
+ conditional deferral; `job_status`/`job_result` vs `llm_job_*` and the
+ local-only `compare_answers` are disambiguated; session/`sessionId`
+ describes gain per-provider resume semantics parity.
+
+ ### Fixed
+
+ - Codex gateway-bookkeeping sessions are now created with the reserved `gw-`
+ prefix (4 sites), so resuming a gateway ID fails fast with an actionable
+ error instead of reaching `codex exec resume` and dying with "no rollout
+ found" (root cause of real-world resume failures).
+ - Server instructions are now built per-server from the same derived gate as
+ tool registration (backend, asyncJobsEnabled, hasStore()), so a
+ `backend = "none"` gateway no longer advertises unregistered
+ `*_request_async`/`llm_job_*` tools.
+ - Sync auto-deferral is disabled when async jobs are unavailable — previously
+ a request could defer into an in-memory job whose polling tools were not
+ registered (dead-end jobId).
+
+ ## [2.1.0] - 2026-06-07: Grok Build 0.2.32, probe drift acknowledgement, docs currency
+
+ ### Added
+
+ - Grok Build 0.2.32 support: new `leaderSocket` parameter on `grok_request` /
+ `grok_request_async` maps to the new `--leader-socket <PATH>` flag (isolated
+ leader process for local/branch Grok builds; default `~/.grok/leader.sock`).
+ Contract declares the flag with arity-one validation plus conformance
+ fixtures. The release's other changes (plugin slash commands in all
+ conversations, ordered rapid prompt submissions, faster grep on large
+ repos) are CLI-internal and inherited automatically. Probe at 0.2.32:
+ missingFlags/warnings clean.
+
+ ### Fixed
+
+ - Upstream-contract probe drift after the 2026-06 provider CLI upgrades
+ (gemini 0.45.2, grok 0.2.22, vibe 2.14.0): `CliFlagContract.hiddenFromHelp`
+ marks real flags hidden from a binary's `--help` (Claude `--max-turns`), and
+ `CliContract.acknowledgedUpstreamFlags` acknowledges upstream-only flags the
+ gateway never emits (29 Claude, 18 Gemini). Both are probe-only — the argv
+ allowlist is unchanged — with stale-marker warnings in both directions and a
+ new `acknowledgedExtraFlags` probe field. New pure `computeFlagDrift` plus
+ 7 unit tests.
+ - MCP server version now reports the real package version (was hardcoded
+ `1.0.0`).
+
+ ### Documentation
+
+ - Cross-LLM documentation currency review (Codex + Gemini + Grok + Mistral):
+ README tool reference gains `codex_fork_session`, `llm_request_result`,
+ `llm_process_health`, `upstream_contracts`, and `list_available_models`;
+ `claude_request` parameter list completed (`outputFormat` default is
+ `stream-json`); Codex `fullAuto` documented as deprecated in favour of
+ `sandboxMode`; Gemini approval modes include `plan`; grok/mistral upgrade
+ strategies documented; stale test counts, provider lists, and
+ `BEST_PRACTICES.md` path pointers corrected across README, AGENTS.md,
+ .cursorrules, CLAUDE.md, docs/guides, docs/personal-mcp (Mistral/Vibe row
+ added to the provider support matrix), and docs/upstream.
+
  ## [2.0.0] - 2026-06-04: node:sqlite migration — native module out of the prod graph
 
  Major release. Persistence moves from the native `better-sqlite3` binding to
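The 2.2.0 "Fixed" entries describe a single derived gate (backend, asyncJobsEnabled, hasStore()) that now feeds tool registration, the server instructions, and sync auto-deferral. The sketch below illustrates that idea under stated assumptions: `deriveToolSurface`, its arguments, and its return shape are hypothetical and are not the gateway's API. Only the tool names and the three gate conditions come from the changelog.

```javascript
// Hypothetical sketch: one derived gate drives registration, instructions, and
// sync auto-deferral, so the advertised surface cannot drift from the real one.
const SYNC_TOOLS = ["claude_request", "codex_request", "gemini_request", "grok_request", "mistral_request"];
const JOB_TOOLS = ["llm_job_status", "llm_job_result", "llm_job_cancel"];

function deriveToolSurface(persistence, hasStore) {
  // The three conditions named in the changelog: backend, asyncJobsEnabled, hasStore().
  const asyncAvailable =
    persistence.backend !== "none" && Boolean(persistence.asyncJobsEnabled) && Boolean(hasStore);

  const tools = [...SYNC_TOOLS];
  if (asyncAvailable) {
    // *_request_async and the llm_job_* polling tools exist only behind the gate.
    tools.push(...SYNC_TOOLS.map((name) => `${name}_async`), ...JOB_TOOLS);
  }

  return {
    tools,
    // Instructions are rendered from the same list instead of from a static string.
    instructions: `Tools: ${tools.join(", ")}`,
    // Deferring without polling tools would hand back a dead-end jobId.
    autoDeferral: asyncAvailable,
  };
}

const surface = deriveToolSurface({ backend: "none", asyncJobsEnabled: true }, false);
console.log(surface.tools.length, surface.autoDeferral); // 5 false
```

The design point is that registration, instructions, and deferral are computed from one boolean, so a `backend = "none"` configuration cannot register one surface while advertising another.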
package/README.md CHANGED
@@ -205,7 +205,7 @@ Opt-in flags (all default off) live under `[cache_awareness]` in `~/.llm-cli-gat
 
  ### Security & Quality
 
- - **Comprehensive Testing**: 900+ tests covering unit, integration, and regression scenarios with real CLI execution
+ - **Comprehensive Testing**: 1,000+ tests covering unit, integration, and regression scenarios with real CLI execution
  - **Input Validation**: Zod schemas prevent injection attacks
  - **No Secret Leakage**: Generic session descriptions only (file permissions 0o600)
  - **No ReDoS**: Bounded regex patterns prevent catastrophic backtracking
@@ -344,6 +344,7 @@ The personal-appliance surface exposes simplified validation tools for non-devel
  - `consensus_check`: check whether providers agree with a claim.
  - `ask_model`: ask one provider through the simplified surface.
  - `synthesize_validation`: run an explicit judge model after provider results have been collected.
+ - `list_available_models`: list the models each provider CLI exposes through the simplified surface.
  - `job_status` and `job_result`: poll and collect validation job outputs.
 
  The validation report preserves per-provider disagreement. Optional judge synthesis is explicit about which provider produced the judge job.
@@ -356,15 +357,29 @@ Execute a Claude Code request with optional session management.
 
  **Parameters:**
 
- - `prompt` (string, required): The prompt to send (1-100,000 chars)
+ - `prompt` (string, optional*): The prompt to send (1-100,000 chars). *Exactly one of `prompt` or `promptParts` is required (mutually exclusive)
  - `model` (string, optional): Model name or alias (use `list_models` for available values; supports `latest`)
- - `outputFormat` (string, optional): Output format ("text" or "json"), default: "text"
+ - `outputFormat` (string, optional): Output format (`text|json|stream-json`), default: `stream-json` — the gateway parses NDJSON usage events for token/cost observability; override to `text` only when you want unparsed stdout
  - `sessionId` (string, optional): Specific session ID to use
  - `continueSession` (boolean, optional): Continue the active session
  - `createNewSession` (boolean, optional): Always create a new session
+ - `forkSession` (boolean, optional): Fork the resumed session instead of appending to it
  - `allowedTools` (string[], optional): Restrict Claude tools to this allow-list
  - `disallowedTools` (string[], optional): Explicitly deny listed Claude tools
- - `dangerouslySkipPermissions` (boolean, optional): Request CLI-side permission bypass (legacy mode only)
+ - `permissionMode` (string, optional): Claude permission mode (`default|acceptEdits|plan|auto|dontAsk|bypassPermissions`); preferred over `dangerouslySkipPermissions`
+ - `dangerouslySkipPermissions` (boolean, optional): Deprecated — maps to `permissionMode: "bypassPermissions"`; `permissionMode` wins when both are set
+ - `agent` (string, optional): Named sub-agent to run as
+ - `agents` (string, optional): Inline agent definitions JSON
+ - `systemPrompt` / `appendSystemPrompt` (string, optional): Replace or extend the system prompt
+ - `maxBudgetUsd` (number, optional): Budget cap in USD for the request
+ - `maxTurns` (integer, optional): Agent-loop turn cap
+ - `effort` (string, optional): Reasoning effort (`low|medium|high|xhigh|max`)
+ - `fallbackModel` (string, optional): Auto-fallback model when the default is overloaded
+ - `jsonSchema` (string, optional): JSON Schema literal constraining structured output
+ - `addDir` (string[], optional): Additional workspace directories
+ - `noSessionPersistence` (boolean, optional): Ephemeral session (not persisted to disk)
+ - `settingSources` / `settings` / `tools` (optional): Setting sources to load, settings JSON path/literal, built-in tool restriction
+ - `excludeDynamicSystemPromptSections` (boolean, optional): Trim dynamic system prompt sections
  - `approvalStrategy` (string, optional): `"legacy"` (default) or `"mcp_managed"`
  - `approvalPolicy` (string, optional): `"strict"`, `"balanced"`, or `"permissive"`
  - `mcpServers` (string[], optional): Claude MCP servers to expose (default: `["sqry","exa","ref_tools"]`; `"trstr"` available as opt-in)
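The `permissionMode` / `dangerouslySkipPermissions` entries above define a precedence rule. The following helper is a minimal sketch of that rule, assuming the mode values listed in the README. The function name is hypothetical, the `Set` check stands in for the gateway's Zod validation, and the sketch returns the resolved mode rather than guessing at the exact CLI argv.

```javascript
// Hypothetical helper (not the gateway's code) applying the documented rules:
// an explicit permissionMode wins; the deprecated dangerouslySkipPermissions
// maps to "bypassPermissions" only when permissionMode is absent.
const PERMISSION_MODES = new Set(["default", "acceptEdits", "plan", "auto", "dontAsk", "bypassPermissions"]);

function resolvePermissionMode({ permissionMode, dangerouslySkipPermissions } = {}) {
  if (permissionMode !== undefined) {
    // Stand-in for the gateway's schema validation of the enum.
    if (!PERMISSION_MODES.has(permissionMode)) {
      throw new RangeError(`unknown permissionMode: ${permissionMode}`);
    }
    return permissionMode;
  }
  if (dangerouslySkipPermissions === true) {
    return "bypassPermissions"; // deprecated alias
  }
  return undefined; // no override: the CLI default applies
}

console.log(resolvePermissionMode({ permissionMode: "plan", dangerouslySkipPermissions: true })); // plan
```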
@@ -372,6 +387,10 @@ Execute a Claude Code request with optional session management.
  - `optimizePrompt` (boolean, optional): Optimize prompt for token efficiency (44% reduction), default: false
  - `optimizeResponse` (boolean, optional): Optimize response for token efficiency (37% reduction), default: false
  - `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+ - `idleTimeoutMs` (integer, optional): Kill a stuck process after output inactivity; 30,000 to 3,600,000 ms
+ - `worktree` (boolean|object, optional): Run inside a gateway-owned git worktree (slice λ)
+ - `promptParts` (object, optional): Cache-aware structured prompt `{ system?, tools?, context?, task }`; mutually exclusive with `prompt`
+ - `forceRefresh` (boolean, optional): Bypass dedup and force a fresh CLI run, default: false
 
  **Response extras:**
 
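Two constraints recur across every `*_request` tool in this diff: exactly one of `prompt` / `promptParts`, and an integer `idleTimeoutMs` between 30,000 and 3,600,000 ms. Here is a minimal client-side pre-flight sketch of those constraints. `validateRequest` is hypothetical. Treating `promptParts.task` as a required string is an assumption read from the `{ system?, tools?, context?, task }` shape. The gateway enforces its own schemas server-side.

```javascript
// Hypothetical pre-flight check mirroring two documented rules:
// exactly one of prompt / promptParts, and idleTimeoutMs in [30,000, 3,600,000].
const MIN_IDLE_MS = 30_000;
const MAX_IDLE_MS = 3_600_000;

function validateRequest({ prompt, promptParts, idleTimeoutMs } = {}) {
  const errors = [];
  const hasPrompt = typeof prompt === "string";
  const hasParts = promptParts !== undefined && promptParts !== null;

  // Both present or both absent violates the exactly-one rule.
  if (hasPrompt === hasParts) {
    errors.push("exactly one of prompt or promptParts is required");
  }
  if (hasPrompt && (prompt.length < 1 || prompt.length > 100_000)) {
    errors.push("prompt must be 1-100,000 chars");
  }
  // Assumption: task is the only required promptParts field and must be a string.
  if (hasParts && typeof promptParts.task !== "string") {
    errors.push("promptParts.task is required");
  }
  if (
    idleTimeoutMs !== undefined &&
    (!Number.isInteger(idleTimeoutMs) || idleTimeoutMs < MIN_IDLE_MS || idleTimeoutMs > MAX_IDLE_MS)
  ) {
    errors.push("idleTimeoutMs must be an integer from 30,000 to 3,600,000");
  }
  return errors;
}

console.log(validateRequest({ prompt: "hi", promptParts: { task: "t" } }).length); // 1
```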
@@ -396,19 +415,33 @@ Execute a Codex request with optional session tracking.
 
  **Parameters:**
 
- - `prompt` (string, required): The prompt to send (1-100,000 chars)
- - `model` (string, optional): Model name or alias (use `list_models` for available values; supports `latest`, recommended: `gpt-5.4`)
- - `fullAuto` (boolean, optional): Enable full-auto mode, default: false
+ - `prompt` (string, optional*): The prompt to send (1-100,000 chars). *Exactly one of `prompt` or `promptParts` is required (mutually exclusive)
+ - `model` (string, optional): Model name or alias (use `list_models` for available values; supports `latest`, recommended: `gpt-5.5`)
+ - `fullAuto` (boolean, optional): Deprecated — expands to `--sandbox workspace-write` only (current Codex no longer accepts approval-policy flags); prefer `sandboxMode`
+ - `sandboxMode` (string, optional): Codex sandbox (`read-only|workspace-write|danger-full-access`)
  - `dangerouslyBypassApprovalsAndSandbox` (boolean, optional): Request Codex bypass flags
  - `approvalStrategy` (string, optional): `"legacy"` (default) or `"mcp_managed"`
  - `approvalPolicy` (string, optional): `"strict"`, `"balanced"`, or `"permissive"`
  - `mcpServers` (string[], optional): MCP servers expected for Codex execution context
  - `sessionId` (string, optional): Session identifier for tracking
+ - `resumeLatest` (boolean, optional): Resume the most recent Codex session in the current cwd (`codex exec resume --last`); ignored if `sessionId` is set
  - `createNewSession` (boolean, optional): Always create a new session
+ - `forceRefresh` (boolean, optional): Bypass dedup and force a fresh CLI run, default: false
+ - `outputFormat` (string, optional): `text` (default) or `json` (`--json` JSONL events for token usage extraction)
+ - `outputSchema` (string|object, optional): Codex `--output-schema` — path or inline JSON Schema
+ - `workingDir` (string, optional): Working root for this session (`-C`/`--cd`; new sessions only)
+ - `addDir` (string[], optional): Additional writable workspace directories (one `--add-dir` per entry; new sessions only)
+ - `ephemeral` (boolean, optional): Codex `--ephemeral` (no session persistence)
+ - `images` (string[], optional): Image attachments (one `-i <path>` per entry)
+ - `profile` (string, optional): Codex `--profile <name>` (new sessions only; ignored with a logged warning on resume)
+ - `configOverrides` (object, optional): Codex `-c key=value` overrides
+ - `ignoreRules` / `ignoreUserConfig` (boolean, optional): Codex `--ignore-rules` / `--ignore-user-config`
+ - `worktree` (boolean|object, optional): Run inside a gateway-owned git worktree (slice λ)
+ - `promptParts` (object, optional): Cache-aware structured prompt `{ system?, tools?, context?, task }`; mutually exclusive with `prompt`
  - `optimizePrompt` (boolean, optional): Optimize prompt for token efficiency, default: false
  - `optimizeResponse` (boolean, optional): Optimize response for token efficiency, default: false
  - `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
- - `idleTimeoutMs` (number, optional): Kill a stuck Codex process after output inactivity; 30,000 to 3,600,000 ms
+ - `idleTimeoutMs` (integer, optional): Kill a stuck Codex process after output inactivity; 30,000 to 3,600,000 ms
 
  **Response extras:**
 
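The Codex parameter changes above encode three argv rules. `fullAuto` now expands only to `--sandbox workspace-write`. `resumeLatest` (`exec resume --last`) is ignored when `sessionId` is set. `workingDir` (`-C`) and `addDir` (`--add-dir`) apply to new sessions only. The helpers below sketch each rule in isolation. They are hypothetical, not the gateway's code. The precedence of `sandboxMode` over `fullAuto` is an assumption, and the sketch deliberately does not claim how these fragments are ordered in a full `codex exec` command line.

```javascript
// Hypothetical sketches of three documented Codex rules (not the gateway's code).

// 1. fullAuto is deprecated and expands to `--sandbox workspace-write` only.
function codexSandboxArgs({ sandboxMode, fullAuto = false } = {}) {
  // Assumption: an explicit sandboxMode takes precedence over fullAuto.
  const mode = sandboxMode ?? (fullAuto ? "workspace-write" : undefined);
  return mode ? ["--sandbox", mode] : [];
}

// 2. sessionId beats resumeLatest; with neither, a new session starts.
function codexResumeArgs({ sessionId, resumeLatest = false } = {}) {
  if (sessionId) return ["resume", sessionId];
  if (resumeLatest) return ["resume", "--last"];
  return [];
}

// 3. workingDir / addDir are new-session-only and dropped on resume.
function codexNewSessionArgs({ workingDir, addDir = [] } = {}, isResume = false) {
  if (isResume) return [];
  const args = [];
  if (workingDir) args.push("-C", workingDir);
  for (const dir of addDir) args.push("--add-dir", dir); // one flag per entry
  return args;
}

console.log(codexResumeArgs({ sessionId: "<codex-uuid>", resumeLatest: true }).join(" ")); // resume <codex-uuid>
```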
@@ -420,32 +453,56 @@ Execute a Codex request with optional session tracking.
  ```json
  {
  "prompt": "Create a REST API endpoint",
- "model": "gpt-5.4",
- "fullAuto": true,
+ "model": "gpt-5.5",
+ "sandboxMode": "workspace-write",
  "optimizePrompt": true
  }
  ```
 
+ ##### `codex_fork_session`
+
+ Fork an existing Codex session into a new branch (`codex fork <SESSION_ID|--last> <prompt>`), preserving the original session's history while the fork diverges.
+
+ **Parameters:**
+
+ - `prompt` (string, required): Prompt text for the forked session (1-100,000 chars)
+ - `sessionId` (string, optional): Codex session UUID to fork from (mutually exclusive with `forkLast`)
+ - `forkLast` (boolean, optional): Fork the most recent Codex session instead of naming one
+ - `model` (string, optional): Model name or alias (e.g. `gpt-5.5`, `latest`)
+ - `sandboxMode` (string, optional): Codex sandbox (`read-only|workspace-write|danger-full-access`)
+ - `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+ - `idleTimeoutMs` (number, optional): Idle timeout in ms (30s-1h, omit for CLI default)
+
  ##### `gemini_request`
 
  Execute a Gemini CLI request with session support.
 
  **Parameters:**
 
- - `prompt` (string, required): The prompt to send (1-100,000 chars)
+ - `prompt` (string, optional*): The prompt to send (1-100,000 chars). *Exactly one of `prompt` or `promptParts` is required (mutually exclusive)
  - `model` (string, optional): Model name or alias (use `list_models` for available values; supports `latest`, `pro`, `flash`)
  - `sessionId` (string, optional): Session ID to resume
  - `resumeLatest` (boolean, optional): Resume the latest session automatically
  - `createNewSession` (boolean, optional): Always create a new session
- - `approvalMode` (string, optional): Gemini approval mode (`default|auto_edit|yolo`) in legacy mode
+ - `approvalMode` (string, optional): Gemini approval mode (`default|auto_edit|yolo|plan`) in legacy mode
  - `approvalStrategy` (string, optional): `"legacy"` (default) or `"mcp_managed"`
  - `approvalPolicy` (string, optional): `"strict"`, `"balanced"`, or `"permissive"`
  - `mcpServers` (string[], optional): Allowed Gemini MCP server names
  - `allowedTools` (string[], optional): Restrict Gemini tools to this allow-list
  - `includeDirs` (string[], optional): Additional workspace directories for Gemini
+ - `outputFormat` (string, optional): `text` (default), `json` (`-o json`), or `stream-json` (`-o stream-json`, NDJSON with usage extraction)
+ - `sandbox` (boolean, optional): Run Gemini in sandbox mode (`-s`)
+ - `policyFiles` / `adminPolicyFiles` (string[], optional): Policy / admin-policy file paths (one `--policy`/`--admin-policy` per file; paths must exist)
+ - `attachments` (string[], optional): Absolute file paths prepended as `@<path>` tokens to the prompt
+ - `skipTrust` (boolean, optional): Emit `--skip-trust` to trust the workspace for this session (required for headless runs in fresh workspaces)
+ - `yolo` (boolean, optional): Auto-approve all; equivalent to `approvalMode: "yolo"`. Emits `--yolo` only when `--approval-mode yolo` is not already being emitted (never both)
+ - `worktree` (boolean|object, optional): Run inside a gateway-owned git worktree (slice λ)
+ - `promptParts` (object, optional): Cache-aware structured prompt `{ system?, tools?, context?, task }`; mutually exclusive with `prompt`
  - `optimizePrompt` (boolean, optional): Optimize prompt for token efficiency, default: false
  - `optimizeResponse` (boolean, optional): Optimize response for token efficiency, default: false
  - `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+ - `idleTimeoutMs` (integer, optional): Kill a stuck process after output inactivity; 30,000 to 3,600,000 ms
+ - `forceRefresh` (boolean, optional): Bypass dedup and force a fresh CLI run, default: false
 
  **Response extras:**
 
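This hunk introduces two small argv rules: the `codex fork <SESSION_ID|--last> <prompt>` shape behind `codex_fork_session`, and the Gemini rule that `--yolo` is never emitted alongside `--approval-mode yolo`. Both helpers below are hypothetical sketches. Requiring exactly one of `sessionId` / `forkLast` and rejecting `yolo` combined with a different `approvalMode` are assumptions, not documented behaviour.

```javascript
// Hypothetical argv helpers (not the gateway's code) for two rules in this hunk.

// codex_fork_session -> `codex fork <SESSION_ID|--last> <prompt>`.
// Assumption: exactly one of sessionId / forkLast must be supplied.
function codexForkArgs({ prompt, sessionId, forkLast = false }) {
  if (Boolean(sessionId) === Boolean(forkLast)) {
    throw new Error("provide exactly one of sessionId or forkLast");
  }
  return ["fork", sessionId || "--last", prompt];
}

// gemini_request: `yolo` is equivalent to approvalMode "yolo", and --yolo is
// never emitted alongside --approval-mode yolo.
// Assumption: yolo combined with a different approvalMode is rejected.
function geminiApprovalArgs({ approvalMode, yolo = false }) {
  if (approvalMode !== undefined) {
    if (yolo && approvalMode !== "yolo") {
      throw new Error(`yolo conflicts with approvalMode "${approvalMode}"`);
    }
    return ["--approval-mode", approvalMode];
  }
  return yolo ? ["--yolo"] : [];
}

console.log(codexForkArgs({ prompt: "try another approach", forkLast: true }).join(" ")); // fork --last try another approach
console.log(geminiApprovalArgs({ approvalMode: "yolo", yolo: true }).join(" ")); // --approval-mode yolo
```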
@@ -469,7 +526,7 @@ Execute a Grok CLI (xAI) request with session support.
 
  **Parameters:**
 
- - `prompt` (string, required): The prompt to send (1-100,000 chars)
+ - `prompt` (string, optional*): The prompt to send (1-100,000 chars). *Exactly one of `prompt` or `promptParts` is required (mutually exclusive)
  - `model` (string, optional): Model name or alias (e.g. `grok-build`, `latest`)
  - `outputFormat` (string, optional): `"plain"` (default), `"json"`, or `"streaming-json"`
  - `sessionId` (string, optional): Session ID to resume (`--resume <id>`)
@@ -484,9 +541,35 @@ Execute a Grok CLI (xAI) request with session support.
  - `mcpServers` (string[], optional): MCP server names tracked for approvals (Grok manages its own MCP config via `grok mcp`)
  - `allowedTools` (string[], optional): Allowed built-in tools (passed as `--tools` comma list)
  - `disallowedTools` (string[], optional): Disallowed built-in tools (passed as `--disallowed-tools` comma list)
+ - `maxTurns` (integer, optional): Agent-loop iteration cap (`--max-turns`)
+ - `workingDir` (string, optional): Working directory for this invocation (`--cwd`)
+ - `sandbox` (string, optional): Sandbox profile for filesystem/network access (`--sandbox`, freeform; also via `GROK_SANDBOX`)
+ - `rules` (string, optional): Extra rules appended to the system prompt (`--rules`; supports `@file` prefix)
+ - `systemPromptOverride` (string, optional): Replace the agent's system prompt entirely
+ - `allow` / `deny` (string[], optional): Permission allow/deny rules (one `--allow`/`--deny` per entry)
+ - `compactionMode` (string, optional): `summary` (default) `|transcript|segments`
+ - `compactionDetail` (string, optional): `none|minimal|balanced|verbose` (segments mode only)
+ - `agent` (string, optional): Agent name or definition file path
+ - `agents` (string|object, optional): Inline subagent definitions JSON
+ - `bestOfN` (integer, optional): Run the task N ways in parallel and pick the best (headless only)
+ - `check` (boolean, optional): Append a self-verification loop (headless only)
+ - `disableWebSearch` (boolean, optional): Disable web search and remote retrieval tools
+ - `todoGate` (boolean, optional): Enable runtime turn-end TodoGate (session-scoped)
+ - `verbatim` (boolean, optional): Send the prompt exactly as given (also skips gateway prompt optimisation)
+ - `promptFile` / `promptJson` / `single` (optional): Single-turn prompt from a file / JSON blocks / literal
+ - `experimentalMemory` / `noMemory` (boolean, optional): Enable/disable cross-session memory
+ - `noAltScreen` / `noPlan` / `noSubagents` (boolean, optional): Disable alt screen / plan mode / subagent spawning
+ - `oauth` (boolean, optional): Use OAuth during authentication
+ - `restoreCode` (boolean, optional): Check out the original session commit when resuming
+ - `leaderSocket` (string, optional): Custom leader socket path (`--leader-socket`, Grok 0.2.32+; default `~/.grok/leader.sock`) — targets an isolated leader process, e.g. a local/branch Grok build
+ - `nativeWorktree` (boolean|string, optional): Grok's own `--worktree` flag (`true` → bare, string → named); distinct from the gateway `worktree` option
+ - `worktree` (boolean|object, optional): Run inside a gateway-owned git worktree (slice λ)
+ - `promptParts` (object, optional): Cache-aware structured prompt `{ system?, tools?, context?, task }`; mutually exclusive with `prompt`
  - `optimizePrompt` (boolean, optional): Optimize prompt for token efficiency, default: false
  - `optimizeResponse` (boolean, optional): Optimize response for token efficiency, default: false
  - `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+ - `idleTimeoutMs` (integer, optional): Kill a stuck process after output inactivity; 30,000 to 3,600,000 ms
+ - `forceRefresh` (boolean, optional): Bypass dedup and force a fresh CLI run, default: false
 
  **Example:**
 
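The Grok parameter list uses two encodings for list-valued options. `allowedTools` / `disallowedTools` are each joined into one comma-separated value, while `allow` / `deny` repeat the flag once per entry. The function below is a hypothetical sketch of those encodings plus the `--max-turns` and `--cwd` mappings. The flag order and the tool and rule strings in the example are placeholders, not real Grok identifiers.

```javascript
// Hypothetical sketch (not the gateway's code) of the documented Grok list encodings.
function grokPermissionArgs({
  allowedTools = [],
  disallowedTools = [],
  allow = [],
  deny = [],
  maxTurns,
  workingDir,
} = {}) {
  const args = [];
  // Comma-list encoding: a single flag, values joined with ",".
  if (allowedTools.length > 0) args.push("--tools", allowedTools.join(","));
  if (disallowedTools.length > 0) args.push("--disallowed-tools", disallowedTools.join(","));
  // Repeated-flag encoding: one flag per rule.
  for (const rule of allow) args.push("--allow", rule);
  for (const rule of deny) args.push("--deny", rule);
  if (Number.isInteger(maxTurns)) args.push("--max-turns", String(maxTurns));
  if (workingDir) args.push("--cwd", workingDir);
  return args;
}

// Placeholder tool and rule names.
console.log(grokPermissionArgs({ allowedTools: ["tool-a", "tool-b"], deny: ["rule-x"], maxTurns: 8 }).join(" "));
// --tools tool-a,tool-b --deny rule-x --max-turns 8
```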
@@ -740,6 +823,21 @@ Run a Mistral Vibe agentic coding request. Like `grok_request` in shape, but wit
  - `disallowedTools` (string[], optional): Accepted for parity with the other providers; ignored at the CLI boundary with a logged warning.
  - `outputFormat` (string, optional): Vibe 2.x values are `"text"`, `"json"`, or `"streaming"`; legacy aliases `"plain"` and `"stream-json"` are accepted and normalized before spawn.
  - `sessionId` / `resumeLatest` / `createNewSession`: standard session controls. Current Vibe defaults session logging to enabled; if an older config has `[session_logging] enabled = false`, `doctor --json` surfaces an actionable next-action.
+ - `trust` (boolean, optional): Emit `--trust` so Vibe trusts the cwd for this invocation only (not persisted; skips the interactive trust prompt)
+ - `maxTurns` (integer, optional): Agent-loop iteration cap (`--max-turns`, programmatic mode only)
+ - `maxPrice` (number, optional): Interrupt when cumulative cost crosses this USD cap (`--max-price`, programmatic mode only)
+ - `maxTokens` (integer, optional): Cap cumulative prompt + completion tokens (`--max-tokens`, programmatic mode only)
+ - `workingDir` (string, optional): Change to this directory before running (`--workdir`)
+ - `addDir` (string[], optional): Additional writable workspace directories (one `--add-dir` per entry)
+ - `approvalStrategy` (string, optional): `"legacy"` (default) or `"mcp_managed"`
+ - `approvalPolicy` (string, optional): `"strict"`, `"balanced"`, or `"permissive"`
+ - `mcpServers` (string[], optional): MCP server names tracked for approvals (Vibe manages its own MCP config via `vibe mcp`)
+ - `worktree` (boolean|object, optional): Run inside a gateway-owned git worktree (slice λ)
+ - `promptParts` (object, optional): Cache-aware structured prompt `{ system?, tools?, context?, task }`; mutually exclusive with `prompt`
+ - `optimizePrompt` / `optimizeResponse` (boolean, optional): Token-efficiency optimisation, default: false
+ - `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+ - `idleTimeoutMs` (integer, optional): Kill a stuck process after output inactivity; 30,000 to 3,600,000 ms
+ - `forceRefresh` (boolean, optional): Bypass dedup and force a fresh CLI run, default: false
 
  ##### `claude_request_async` / `codex_request_async` / `gemini_request_async` / `grok_request_async` / `mistral_request_async`
 
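For `mistral_request`, the README says that legacy `outputFormat` aliases are normalized before spawn and that the `--trust`, `--max-turns`, `--max-price`, and `--max-tokens` flags map directly from parameters. The sketch below shows one way to express both behaviours. The alias targets (`plain` to `text`, `stream-json` to `streaming`) are an assumption, since the README names the aliases but not what each maps to. The function names are hypothetical.

```javascript
// Hypothetical sketch of two documented mistral_request behaviours.
// Assumption: "plain" -> "text" and "stream-json" -> "streaming".
const VIBE_FORMAT_ALIASES = new Map([
  ["plain", "text"],
  ["stream-json", "streaming"],
]);
const VIBE_FORMATS = new Set(["text", "json", "streaming"]);

function normalizeVibeOutputFormat(format) {
  if (format === undefined) return undefined; // leave the CLI default alone
  const normalized = VIBE_FORMAT_ALIASES.get(format) ?? format;
  if (!VIBE_FORMATS.has(normalized)) {
    throw new RangeError(`unsupported Vibe outputFormat: ${format}`);
  }
  return normalized;
}

// --trust applies to this invocation only; the three caps are programmatic-mode only.
function vibeBudgetArgs({ trust = false, maxTurns, maxPrice, maxTokens } = {}) {
  const args = [];
  if (trust) args.push("--trust");
  if (maxTurns !== undefined) args.push("--max-turns", String(maxTurns));
  if (maxPrice !== undefined) args.push("--max-price", String(maxPrice));
  if (maxTokens !== undefined) args.push("--max-tokens", String(maxTokens));
  return args;
}

console.log(normalizeVibeOutputFormat("stream-json")); // streaming
console.log(vibeBudgetArgs({ trust: true, maxPrice: 0.5 }).join(" ")); // --trust --max-price 0.5
```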
@@ -778,10 +876,33 @@ List recent MCP-managed approval decisions recorded by the gateway.
  **Parameters:**
 
  - `limit` (number, optional): Max records (1-500), default: 50
- - `cli` (string, optional): Filter by `"claude"`, `"codex"`, or `"gemini"`
+ - `cli` (string, optional): Filter by `"claude"`, `"codex"`, `"gemini"`, `"grok"`, or `"mistral"`
 
  Approval records are persisted to `~/.llm-cli-gateway/approvals.jsonl`.
 
+ ##### `llm_request_result`
+
+ Read back any persisted request — sync or async — by its correlation ID. Every response echoes its ID in `structuredContent.correlationId`; pass it here to recover the persisted prompt/response after the inline result is gone. Reads the flight recorder, so it works independently of async-job persistence (returns "not found" when flight recording is disabled).
+
+ **Parameters:**
+
+ - `correlationId` (string, required): Correlation ID from a prior request
+ - `maxChars` (number, optional): Max chars of the persisted response to return (1,000-2,000,000)
+ - `includePrompt` (boolean, optional): Include the full persisted prompt text, default: false
+
+ ##### `llm_process_health`
+
+ Report gateway process health: async-job manager state plus the resolved persistence block (`backend`, `dbPath`, config sources). Use it to confirm which config file and SQLite paths the gateway is actually running under.
+
+ ##### `upstream_contracts`
+
+ Return the gateway's declared provider CLI contracts, optionally probing the installed binaries for drift.
+
+ **Parameters:**
+
+ - `cli` (string, optional): Filter (`claude|codex|gemini|grok|mistral`)
+ - `probeInstalled` (boolean, optional, default `false`): Run local `--help` probes and compare advertised flags against the declared contract — strongly recommended after any provider CLI upgrade. The probe reports `missingFlags`, `extraFlags`, `acknowledgedExtraFlags` (known upstream-only flags filtered from `extraFlags`), `discoveredFlags`, and stale-marker `warnings`.
+
  #### Session Management Tools
 
  ##### `session_create`
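The `upstream_contracts` probe fields and the 2.1.0 changelog entry (`hiddenFromHelp`, `acknowledgedUpstreamFlags`, stale-marker warnings "in both directions") suggest the drift logic sketched below. This is an illustration under assumptions, not the gateway's `computeFlagDrift`. The function name, the `{ flags, acknowledgedUpstreamFlags }` input shape, and the exact meaning of each stale warning are all assumptions, and `discoveredFlags` is not modelled.

```javascript
// Hypothetical sketch of the probe semantics (not the real computeFlagDrift signature).
function flagDriftSketch(contract, advertisedFlags) {
  const advertised = new Set(advertisedFlags); // flags parsed from `<cli> --help`
  const declared = new Map(contract.flags.map((flag) => [flag.name, flag]));
  const acknowledged = new Set(contract.acknowledgedUpstreamFlags ?? []);
  const warnings = [];

  // Declared flags: missing unless advertised or explicitly hiddenFromHelp.
  const missingFlags = [];
  for (const flag of declared.values()) {
    const seen = advertised.has(flag.name);
    if (!seen && !flag.hiddenFromHelp) missingFlags.push(flag.name);
    // Stale marker, direction 1: a "hidden" flag now shows up in --help.
    if (seen && flag.hiddenFromHelp) warnings.push(`stale hiddenFromHelp: ${flag.name} now appears in --help`);
  }

  // Advertised but undeclared: acknowledged upstream-only flags are split out.
  const extraFlags = [];
  const acknowledgedExtraFlags = [];
  for (const name of advertised) {
    if (declared.has(name)) continue;
    if (acknowledged.has(name)) acknowledgedExtraFlags.push(name);
    else extraFlags.push(name);
  }

  // Stale marker, direction 2: an acknowledged flag disappeared from --help.
  for (const name of acknowledged) {
    if (!advertised.has(name)) warnings.push(`stale acknowledgement: ${name} no longer in --help`);
  }
  return { missingFlags, extraFlags, acknowledgedExtraFlags, warnings };
}

const drift = flagDriftSketch(
  { flags: [{ name: "--max-turns", hiddenFromHelp: true }, { name: "--model" }], acknowledgedUpstreamFlags: ["--upstream-only", "--retired"] },
  ["--model", "--upstream-only", "--brand-new"],
);
console.log(drift.extraFlags.join(), drift.acknowledgedExtraFlags.join()); // --brand-new --upstream-only
```

Keeping this a pure function of (contract, `--help` output) matches the changelog's description of `computeFlagDrift` as pure, which makes it unit-testable without spawning any CLI.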
@@ -924,6 +1045,9 @@ Plan or run an upgrade for one CLI.
  - Codex latest: `codex update`
  - Codex explicit target: `npm install -g @openai/codex@<target>`
  - Gemini: `npm install -g @google/gemini-cli@<target>`
+ - Grok latest: `grok update`
+ - Grok explicit target: `grok update --version <target>`
+ - Mistral (Vibe): dispatches to the detected installer (`pip`/`uv`/`brew`); errors with guidance when none is detected (Vibe ships no self-update command)
 
  **Example dry run:**
 
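The upgrade recipes listed in this hunk can be expressed as a small lookup. The sketch below is hypothetical and covers only the Codex, Gemini, and Grok recipes shown here. Claude and Mistral (Vibe, which dispatches to a detected installer) are left out. Defaulting Gemini to the npm `latest` dist-tag when no target is given is an assumption, since the README shows only the `<target>` form.

```javascript
// Hypothetical sketch of the cli_upgrade recipes shown in this hunk (not the gateway's code).
function upgradeCommand(cli, target) {
  switch (cli) {
    case "codex":
      return target ? ["npm", "install", "-g", `@openai/codex@${target}`] : ["codex", "update"];
    case "gemini":
      // Assumption: no target means the npm "latest" dist-tag.
      return ["npm", "install", "-g", `@google/gemini-cli@${target ?? "latest"}`];
    case "grok":
      return target ? ["grok", "update", "--version", target] : ["grok", "update"];
    default:
      throw new Error(`no upgrade recipe sketched for ${cli}`);
  }
}

console.log(upgradeCommand("grok", "0.2.32").join(" ")); // grok update --version 0.2.32
```

Returning an argv array rather than a shell string keeps a target value from being interpreted by a shell if the command is later passed to a spawn call.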
package/dist/index.d.ts CHANGED
@@ -44,6 +44,7 @@ declare const logger: {
  debug: (message: string, ...args: any[]) => void;
  };
  type GatewayLogger = typeof logger;
+ export declare function buildServerInstructions(asyncJobsEnabled: boolean): string;
  export declare const MAX_TURNS_SCHEMA: z.ZodNumber;
  export declare const MAX_TOKENS_SCHEMA: z.ZodNumber;
  export declare const MAX_PRICE_SCHEMA: z.ZodNumber;
@@ -251,6 +252,7 @@ export declare function prepareGrokRequest(params: {
  noSubagents?: boolean;
  oauth?: boolean;
  restoreCode?: boolean;
+ leaderSocket?: string;
  nativeWorktree?: boolean | string;
  }, runtime?: GatewayServerRuntime): CliRequestPrep | ExtendedToolResponse;
  export declare function prepareMistralRequest(params: {
@@ -376,6 +378,7 @@ export interface GrokRequestParams {
  noSubagents?: boolean;
  oauth?: boolean;
  restoreCode?: boolean;
+ leaderSocket?: string;
  nativeWorktree?: boolean | string;
  worktree?: boolean | {
  name?: string;
package/dist/index.js CHANGED
@@ -141,16 +141,21 @@ function loadSkills() {
  return skills;
  }
  const loadedSkills = loadSkills();
- const SERVER_INSTRUCTIONS = `llm-cli-gateway: Multi-LLM orchestration via MCP.
+ export function buildServerInstructions(asyncJobsEnabled) {
+ const asyncToolsNote = asyncJobsEnabled ? " | *_request_async (async)" : "";
+ const jobsLine = asyncJobsEnabled ? "Jobs: llm_job_status, llm_job_result, llm_job_cancel\n" : "";
+ const deferralLine = asyncJobsEnabled
+ ? `- Sync auto-defers at ${SYNC_DEADLINE_MS}ms. Poll deferred jobs via llm_job_status/llm_job_result.`
+ : '- Async jobs are DISABLED (persistence.backend = "none"): *_request_async and llm_job_* tools are not registered, and sync requests run to completion (no auto-deferral).';
+ return `llm-cli-gateway: Multi-LLM orchestration via MCP.
 
- Tools: claude_request, codex_request, gemini_request, grok_request, mistral_request (sync) | *_request_async (async)
- Validation: validate_with_models, second_opinion, compare_answers, red_team_review, consensus_check, ask_model, synthesize_validation
- Jobs: llm_job_status, llm_job_result, llm_job_cancel
- Sessions: session_create, session_list, session_set_active, session_get, session_delete, session_clear_all
+ Tools: claude_request, codex_request, gemini_request, grok_request, mistral_request (sync)${asyncToolsNote} | codex_fork_session (fork a Codex session into a new branch)
+ Validation: validate_with_models, second_opinion, compare_answers, red_team_review, consensus_check, ask_model, synthesize_validation, list_available_models | job_status/job_result (validation jobs)
+ ${jobsLine}Sessions: session_create, session_list, session_set_active, session_get, session_delete, session_clear_all
  Other: list_models, cli_versions, upstream_contracts (use --probe-installed after CLI upgrades to detect drift), cli_upgrade, approval_list, llm_process_health, llm_request_result (read back any persisted request — sync or async — by correlationId)
 
  Key behaviors:
- - Sync auto-defers at ${SYNC_DEADLINE_MS}ms. Poll deferred jobs via llm_job_status/llm_job_result.
+ ${deferralLine}
  - Sessions: Claude --continue, Gemini --resume, Grok --resume/--continue, Mistral --resume/--continue (current Vibe defaults session logging on; doctor flags explicit session_logging.enabled=false), Codex \`exec resume <ID>\` / \`exec resume --last\` (all real CLI continuity). For Codex, sessionId must be a real Codex UUID (from ~/.codex/sessions/); gateway-generated gw-* IDs are rejected.
  - Approval gates: opt-in via approvalStrategy:"mcp_managed".
  - Upstream drift detection: After upgrading any provider CLI (especially grok), use the upstream_contracts tool with probeInstalled: true (or the CLI command "llm-cli-gateway contracts --json --probe-installed"). This is the primary reliable way to detect when an installed binary has gained or lost flags compared to the gateway's declared contract. The probe is safe and read-only.
@@ -158,8 +163,9 @@ Key behaviors:
 
  Skills (full docs via MCP resources):
  ${loadedSkills.map(s => `- skills://${s.name} — ${s.description}`).join("\n")}`;
- function newGatewayMcpServer() {
- return new McpServer({ name: "llm-cli-gateway", version: "1.0.0" }, { instructions: SERVER_INSTRUCTIONS });
+ }
+ function newGatewayMcpServer(asyncJobsEnabled = true) {
+ return new McpServer({ name: "llm-cli-gateway", version: packageVersion() }, { instructions: buildServerInstructions(asyncJobsEnabled) });
  }
  let sessionManager;
  let db = null;
@@ -307,7 +313,10 @@ async function awaitJobOrDefer(cli, args, corrId, idleTimeoutMs, outputFormat, f
307
313
  consumeOnComplete();
308
314
  throw err;
309
315
  }
310
- if (SYNC_DEADLINE_MS === 0) {
316
+ const deferralAvailable = runtime.persistence.backend !== "none" &&
317
+ runtime.persistence.asyncJobsEnabled &&
318
+ runtime.asyncJobManager.hasStore();
319
+ if (SYNC_DEADLINE_MS === 0 || !deferralAvailable) {
311
320
  const command = cli === "mistral" ? "vibe" : cli;
312
321
  try {
313
322
  return await executeCli(command, args, {
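Both 2.2.0 fixes in this area depend on one derived condition. `createGatewayServer` computes it as `asyncJobsEnabled` and now passes it to `newGatewayMcpServer`, so the server instructions only mention deferral and `llm_job_*` when those tools are registered. `awaitJobOrDefer` recomputes the same expression as `deferralAvailable` before it decides whether to defer. The sketch below shows that decision in isolation. `deriveAsyncJobsEnabled`, `shouldDeferPastDeadline`, and `deferralLineFor` are hypothetical names. The wording of the no-deferral instruction line is invented, because the diff shows only the `${deferralLine}` placeholder.

```javascript
// Hypothetical helpers restating the gate from the diff; these are not package exports.

// Same three-part condition as createGatewayServer's asyncJobsEnabled
// and awaitJobOrDefer's deferralAvailable.
function deriveAsyncJobsEnabled(persistence, asyncJobManager) {
  return Boolean(
    persistence.backend !== "none" &&
      persistence.asyncJobsEnabled &&
      asyncJobManager.hasStore()
  );
}

// awaitJobOrDefer runs to completion when SYNC_DEADLINE_MS === 0 || !deferralAvailable,
// so deferral happens only when neither of those holds.
function shouldDeferPastDeadline(syncDeadlineMs, asyncJobsEnabled) {
  return syncDeadlineMs !== 0 && asyncJobsEnabled;
}

// Illustrative instruction line; the no-deferral wording is invented.
function deferralLineFor(syncDeadlineMs, asyncJobsEnabled) {
  return shouldDeferPastDeadline(syncDeadlineMs, asyncJobsEnabled)
    ? `- Sync auto-defers at ${syncDeadlineMs}ms. Poll deferred jobs via llm_job_status/llm_job_result.`
    : "- Sync requests run to completion; auto-deferral is off.";
}
```

Because the advertised instructions and the runtime deferral read the same condition, a `backend = "none"` gateway can neither advertise tools it never registered nor defer into them.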
@@ -1474,6 +1483,9 @@ export function prepareGrokRequest(params, runtime = resolveGatewayServerRuntime
1474
1483
  if (params.restoreCode) {
1475
1484
  args.push("--restore-code");
1476
1485
  }
1486
+ if (params.leaderSocket) {
1487
+ args.push("--leader-socket", params.leaderSocket);
1488
+ }
1477
1489
  if (params.nativeWorktree === true) {
1478
1490
  args.push("--worktree");
1479
1491
  }
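`prepareGrokRequest` forwards a non-empty `leaderSocket` as `--leader-socket <PATH>`. The Grok contract declares that flag with arity `one`, which is why the `grok-leader-socket-missing-value` fixture expects a bare `--leader-socket` to fail. The sketch below covers both halves with a deliberately simplified arity check. `appendLeaderSocket` and `checkArityOne` are hypothetical helpers. Treating a following `--`-prefixed token as a missing value is an assumption. The gateway's own validation (for example `assertUpstreamCliArgs`) is more thorough.

```javascript
// Hypothetical restatement of the leaderSocket branch in prepareGrokRequest.
function appendLeaderSocket(args, leaderSocket) {
  // Zod's .min(1) already rejects ""; undefined leaves Grok on its default ~/.grok/leader.sock.
  if (leaderSocket) {
    args.push("--leader-socket", leaderSocket);
  }
  return args;
}

// Simplified arity-"one" check (assumption: a following "--" token counts as a missing value).
function checkArityOne(args, flag) {
  const index = args.indexOf(flag);
  if (index === -1) return true; // flag absent: nothing to validate
  const value = args[index + 1];
  return value !== undefined && !value.startsWith("--");
}
```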
@@ -1976,6 +1988,7 @@ export async function handleGrokRequest(deps, params) {
1976
1988
  noSubagents: params.noSubagents,
1977
1989
  oauth: params.oauth,
1978
1990
  restoreCode: params.restoreCode,
1991
+ leaderSocket: params.leaderSocket,
1979
1992
  nativeWorktree: params.nativeWorktree,
1980
1993
  }, runtime);
1981
1994
  if (!("args" in prep))
@@ -2133,6 +2146,7 @@ export async function handleGrokRequestAsync(deps, params) {
2133
2146
  noSubagents: params.noSubagents,
2134
2147
  oauth: params.oauth,
2135
2148
  restoreCode: params.restoreCode,
2149
+ leaderSocket: params.leaderSocket,
2136
2150
  nativeWorktree: params.nativeWorktree,
2137
2151
  }, runtime);
2138
2152
  if (!("args" in prep))
@@ -2498,7 +2512,7 @@ export async function handleCodexRequestAsync(deps, params) {
2498
2512
  effectiveSessionId = activeSession.id;
2499
2513
  }
2500
2514
  else {
2501
- const newSession = await deps.sessionManager.createSession("codex", "Codex Session");
2515
+ const newSession = await deps.sessionManager.createSession("codex", "Codex Session", `${GATEWAY_SESSION_PREFIX}${randomUUID()}`);
2502
2516
  effectiveSessionId = newSession.id;
2503
2517
  }
2504
2518
  }
@@ -2506,7 +2520,7 @@ export async function handleCodexRequestAsync(deps, params) {
2506
2520
  await deps.sessionManager.updateSessionUsage(params.sessionId);
2507
2521
  }
2508
2522
  else if (params.createNewSession) {
2509
- const newSession = await deps.sessionManager.createSession("codex", "Codex Session");
2523
+ const newSession = await deps.sessionManager.createSession("codex", "Codex Session", `${GATEWAY_SESSION_PREFIX}${randomUUID()}`);
2510
2524
  effectiveSessionId = newSession.id;
2511
2525
  }
2512
2526
  let worktreeResolution = {};
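Previously the Codex bookkeeping records were created without an explicit ID. All four call sites now pass `${GATEWAY_SESSION_PREFIX}${randomUUID()}`, so the gateway can recognize a resume attempt that uses a gateway ID and reject it before `codex exec resume` runs. The sketch below assumes `GATEWAY_SESSION_PREFIX` is `"gw-"`, the reserved prefix named in the 2.2.0 changelog. `newGatewaySessionId` and `buildCodexResumeArgs` are hypothetical. The UUID is passed in so no `node:crypto` import is needed, and the error text is illustrative.

```javascript
// Assumption: the reserved prefix described in the 2.2.0 changelog.
const GATEWAY_SESSION_PREFIX = "gw-";

// Mirrors the new createSession call sites: `${GATEWAY_SESSION_PREFIX}${randomUUID()}`.
function newGatewaySessionId(uuid) {
  return `${GATEWAY_SESSION_PREFIX}${uuid}`;
}

// Hypothetical helper: reject gateway IDs before any `codex exec resume` is spawned.
function buildCodexResumeArgs(sessionId) {
  if (sessionId.startsWith(GATEWAY_SESSION_PREFIX)) {
    throw new Error(
      `${sessionId} is a gateway bookkeeping ID, not a Codex session; ` +
        "pass a Codex UUID from ~/.codex/sessions/ or resume with --last"
    );
  }
  return ["exec", "resume", sessionId];
}
```

Stamping the prefix when the record is created is what lets the resume path fail fast with an actionable message, instead of Codex failing with "no rollout found".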
@@ -2562,10 +2576,10 @@ export function createGatewayServer(deps = {}) {
2562
2576
  void flightRecorder;
2563
2577
  void cacheAwareness;
2564
2578
  const asyncJobsEnabled = persistence.backend !== "none" && persistence.asyncJobsEnabled && asyncJobManager.hasStore();
2565
- const server = newGatewayMcpServer();
2579
+ const server = newGatewayMcpServer(asyncJobsEnabled);
2566
2580
  registerBaseResources(server, runtime);
2567
2581
  registerValidationTools(server, { asyncJobManager });
2568
- server.tool("claude_request", {
2582
+ server.tool("claude_request", "Run a Claude Code CLI request synchronously (when async jobs are enabled, auto-defers to a pollable job past the sync deadline; otherwise runs to completion). Requires exactly one of prompt or promptParts.", {
2569
2583
  prompt: z
2570
2584
  .string()
2571
2585
  .min(1, "Prompt cannot be empty")
@@ -2581,8 +2595,14 @@ export function createGatewayServer(deps = {}) {
2581
2595
  .enum(["text", "json", "stream-json"])
2582
2596
  .default("stream-json")
2583
2597
  .describe("Output format (text|json|stream-json). DEFAULT: stream-json — the gateway parses NDJSON usage events to extract input/output/cache_read/cache_creation tokens + cost + model, persists them to the flight recorder for cache_state aggregates, and still returns the assistant text. Override to 'text' only when you truly want unparsed stdout (loses observability)."),
2584
- sessionId: z.string().optional().describe("Session ID (uses active if omitted)"),
2585
- continueSession: z.boolean().default(false).describe("Continue active session"),
2598
+ sessionId: z
2599
+ .string()
2600
+ .optional()
2601
+ .describe("Gateway session record to associate (uses the active session if omitted). Claude continuity itself is via continueSession (--continue); this ID is gateway bookkeeping, not a Claude-native session."),
2602
+ continueSession: z
2603
+ .boolean()
2604
+ .default(false)
2605
+ .describe("Continue the most recent Claude conversation in this cwd (emits --continue; real CLI continuity)."),
2586
2606
  createNewSession: z.boolean().default(false).describe("Force new session"),
2587
2607
  allowedTools: z
2588
2608
  .array(z.string())
@@ -2892,7 +2912,7 @@ export function createGatewayServer(deps = {}) {
2892
2912
  performanceMetrics.recordRequest("claude", finalizedDurationMs, wasSuccessful);
2893
2913
  }
2894
2914
  });
2895
- server.tool("codex_request", {
2915
+ server.tool("codex_request", "Run an OpenAI Codex CLI request synchronously (when async jobs are enabled, auto-defers to a pollable job past the sync deadline; otherwise runs to completion). Requires exactly one of prompt or promptParts.", {
2896
2916
  prompt: z
2897
2917
  .string()
2898
2918
  .min(1, "Prompt cannot be empty")
@@ -3084,7 +3104,7 @@ export function createGatewayServer(deps = {}) {
3084
3104
  effectiveSessionId = activeSession.id;
3085
3105
  }
3086
3106
  else {
3087
- const newSession = await sessionManager.createSession("codex", "Codex Session");
3107
+ const newSession = await sessionManager.createSession("codex", "Codex Session", `${GATEWAY_SESSION_PREFIX}${randomUUID()}`);
3088
3108
  effectiveSessionId = newSession.id;
3089
3109
  }
3090
3110
  }
@@ -3092,7 +3112,7 @@ export function createGatewayServer(deps = {}) {
3092
3112
  await sessionManager.updateSessionUsage(sessionId);
3093
3113
  }
3094
3114
  else if (createNewSession) {
3095
- const newSession = await sessionManager.createSession("codex", "Codex Session");
3115
+ const newSession = await sessionManager.createSession("codex", "Codex Session", `${GATEWAY_SESSION_PREFIX}${randomUUID()}`);
3096
3116
  effectiveSessionId = newSession.id;
3097
3117
  }
3098
3118
  logger.info(`[${corrId}] codex_request completed successfully in ${durationMs}ms`);
@@ -3140,7 +3160,7 @@ export function createGatewayServer(deps = {}) {
3140
3160
  performanceMetrics.recordRequest("codex", finalizedDurationMs, wasSuccessful);
3141
3161
  }
3142
3162
  });
3143
- server.tool("codex_fork_session", {
3163
+ server.tool("codex_fork_session", "Fork an existing Codex session into a new branch (codex fork <ID|--last>) and run a prompt against the fork without mutating the original.", {
3144
3164
  prompt: z
3145
3165
  .string()
3146
3166
  .min(1, "Prompt cannot be empty")
@@ -3227,7 +3247,7 @@ export function createGatewayServer(deps = {}) {
3227
3247
  performanceMetrics.recordRequest("codex", finalizedDurationMs, wasSuccessful);
3228
3248
  }
3229
3249
  });
3230
- server.tool("gemini_request", {
3250
+ server.tool("gemini_request", "Run a Google Gemini CLI request synchronously (when async jobs are enabled, auto-defers to a pollable job past the sync deadline; otherwise runs to completion). Requires exactly one of prompt or promptParts.", {
3231
3251
  prompt: z
3232
3252
  .string()
3233
3253
  .min(1, "Prompt cannot be empty")
@@ -3239,7 +3259,10 @@ export function createGatewayServer(deps = {}) {
3239
3259
  .string()
3240
3260
  .optional()
3241
3261
  .describe("Model name or alias (e.g. gemini-3-pro-preview, gemini-2.5-flash, pro, flash, latest)"),
3242
- sessionId: z.string().optional().describe("Session ID or 'latest'"),
3262
+ sessionId: z
3263
+ .string()
3264
+ .optional()
3265
+ .describe("Gemini session ID to resume (emits --resume <id>), or 'latest' for the most recent session in this cwd"),
3243
3266
  resumeLatest: z.boolean().default(false).describe("Resume latest session"),
3244
3267
  createNewSession: z.boolean().default(false).describe("Force new session"),
3245
3268
  approvalMode: z
@@ -3323,7 +3346,7 @@ export function createGatewayServer(deps = {}) {
3323
3346
  worktree,
3324
3347
  });
3325
3348
  });
3326
- server.tool("grok_request", {
3349
+ server.tool("grok_request", "Run an xAI Grok CLI request synchronously (when async jobs are enabled, auto-defers to a pollable job past the sync deadline; otherwise runs to completion). Requires exactly one of prompt or promptParts.", {
3327
3350
  prompt: z
3328
3351
  .string()
3329
3352
  .min(1, "Prompt cannot be empty")
@@ -3339,7 +3362,7 @@ export function createGatewayServer(deps = {}) {
3339
3362
  sessionId: z
3340
3363
  .string()
3341
3364
  .optional()
3342
- .describe("Session ID (user-provided CLI handle for --resume)"),
3365
+ .describe("Provider-native session ID to resume (emits --resume <id>; use resumeLatest for --continue)"),
3343
3366
  resumeLatest: z
3344
3367
  .boolean()
3345
3368
  .default(false)
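The rewritten `sessionId` and `continueSession` descriptions spell out what each provider actually receives:

- Claude continuity comes only from `continueSession` (`--continue`), and its `sessionId` is gateway bookkeeping.
- Gemini forwards `sessionId`, including the literal `'latest'`, as `--resume <id>`.
- Grok forwards `sessionId` as `--resume <id>` and maps `resumeLatest` to `--continue`.

The sketch below restates only those documented emissions. `resumeArgs` is hypothetical. Gemini's separate `resumeLatest` flag is omitted because its emission is not shown here. Rejecting Grok's `sessionId` together with `resumeLatest` is an assumption, since the descriptions do not say which one wins.

```javascript
// Hypothetical helper restating the per-provider emissions named in the describe() strings.
function resumeArgs(cli, { sessionId, continueSession = false, resumeLatest = false } = {}) {
  switch (cli) {
    case "claude":
      // sessionId is gateway bookkeeping only; continuity comes from --continue.
      return continueSession ? ["--continue"] : [];
    case "gemini":
      // A concrete ID or the literal 'latest' is forwarded to --resume.
      return sessionId ? ["--resume", sessionId] : [];
    case "grok":
      if (sessionId && resumeLatest) {
        // Assumption: treat the combination as ambiguous rather than guess precedence.
        throw new Error("pass either sessionId or resumeLatest, not both");
      }
      if (sessionId) return ["--resume", sessionId];
      return resumeLatest ? ["--continue"] : [];
    default:
      throw new Error(`resume mapping not sketched for ${cli}`);
  }
}
```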
@@ -3488,12 +3511,17 @@ export function createGatewayServer(deps = {}) {
3488
3511
  .boolean()
3489
3512
  .optional()
3490
3513
  .describe("Grok --restore-code: check out the original session commit when resuming."),
3514
+ leaderSocket: z
3515
+ .string()
3516
+ .min(1)
3517
+ .optional()
3518
+ .describe("Grok 0.2.32+ --leader-socket <PATH>: custom leader socket path (default ~/.grok/leader.sock). Targets an isolated leader process, e.g. a local/branch Grok build; name it ~/.grok/leader-*.sock to keep `grok leader list/kill` discovery working."),
3491
3519
  nativeWorktree: z
3492
3520
  .union([z.boolean(), z.string().min(1)])
3493
3521
  .optional()
3494
3522
  .describe("Grok -w/--worktree: native CLI worktree flag (`true` → bare `--worktree`, string → named). NOT gateway slice λ `worktree`."),
3495
3523
  worktree: WORKTREE_SCHEMA.optional(),
3496
- }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, workingDir, sandbox, rules, systemPromptOverride, allow, deny, compactionMode, compactionDetail, agent, bestOfN, check, disableWebSearch, todoGate, verbatim, agents, promptFile, promptJson, single, experimentalMemory, noAltScreen, noMemory, noPlan, noSubagents, oauth, restoreCode, nativeWorktree, worktree, }) => {
3524
+ }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, optimizeResponse, idleTimeoutMs, forceRefresh, maxTurns, workingDir, sandbox, rules, systemPromptOverride, allow, deny, compactionMode, compactionDetail, agent, bestOfN, check, disableWebSearch, todoGate, verbatim, agents, promptFile, promptJson, single, experimentalMemory, noAltScreen, noMemory, noPlan, noSubagents, oauth, restoreCode, leaderSocket, nativeWorktree, worktree, }) => {
3497
3525
  return handleGrokRequest({ sessionManager, logger, runtime }, {
3498
3526
  prompt,
3499
3527
  promptParts,
@@ -3542,11 +3570,12 @@ export function createGatewayServer(deps = {}) {
3542
3570
  noSubagents,
3543
3571
  oauth,
3544
3572
  restoreCode,
3573
+ leaderSocket,
3545
3574
  nativeWorktree,
3546
3575
  worktree,
3547
3576
  });
3548
3577
  });
3549
- server.tool("mistral_request", {
3578
+ server.tool("mistral_request", "Run a Mistral Vibe CLI request synchronously (when async jobs are enabled, auto-defers to a pollable job past the sync deadline; otherwise runs to completion). Requires exactly one of prompt or promptParts.", {
3550
3579
  prompt: z
3551
3580
  .string()
3552
3581
  .min(1, "Prompt cannot be empty")
@@ -3656,7 +3685,7 @@ export function createGatewayServer(deps = {}) {
3656
3685
  });
3657
3686
  });
3658
3687
  if (asyncJobsEnabled) {
3659
- server.tool("claude_request_async", {
3688
+ server.tool("claude_request_async", "Start a Claude Code CLI request as a durable background job. Poll with llm_job_status, collect with llm_job_result.", {
3660
3689
  prompt: z
3661
3690
  .string()
3662
3691
  .min(1, "Prompt cannot be empty")
@@ -3672,8 +3701,14 @@ export function createGatewayServer(deps = {}) {
3672
3701
  .enum(["text", "json", "stream-json"])
3673
3702
  .default("stream-json")
3674
3703
  .describe("Output format (text|json|stream-json). DEFAULT: stream-json — same rationale as claude_request: keeps usage/cache/cost observable for cache_state aggregates. Override to 'text' only when raw stdout is required (loses observability)."),
3675
- sessionId: z.string().optional().describe("Session ID (uses active if omitted)"),
3676
- continueSession: z.boolean().default(false).describe("Continue active session"),
3704
+ sessionId: z
3705
+ .string()
3706
+ .optional()
3707
+ .describe("Gateway session record to associate (uses the active session if omitted). Claude continuity itself is via continueSession (--continue); this ID is gateway bookkeeping, not a Claude-native session."),
3708
+ continueSession: z
3709
+ .boolean()
3710
+ .default(false)
3711
+ .describe("Continue the most recent Claude conversation in this cwd (emits --continue; real CLI continuity)."),
3677
3712
  createNewSession: z.boolean().default(false).describe("Force new session"),
3678
3713
  allowedTools: z
3679
3714
  .array(z.string())
@@ -3909,7 +3944,7 @@ export function createGatewayServer(deps = {}) {
3909
3944
  return createErrorResponse("claude_request_async", 1, "", corrId, error);
3910
3945
  }
3911
3946
  });
3912
- server.tool("codex_request_async", {
3947
+ server.tool("codex_request_async", "Start an OpenAI Codex CLI request as a durable background job. Poll with llm_job_status, collect with llm_job_result.", {
3913
3948
  prompt: z
3914
3949
  .string()
3915
3950
  .min(1, "Prompt cannot be empty")
@@ -4034,7 +4069,7 @@ export function createGatewayServer(deps = {}) {
4034
4069
  worktree,
4035
4070
  });
4036
4071
  });
4037
- server.tool("gemini_request_async", {
4072
+ server.tool("gemini_request_async", "Start a Google Gemini CLI request as a durable background job. Poll with llm_job_status, collect with llm_job_result.", {
4038
4073
  prompt: z
4039
4074
  .string()
4040
4075
  .min(1, "Prompt cannot be empty")
@@ -4049,7 +4084,7 @@ export function createGatewayServer(deps = {}) {
4049
4084
  sessionId: z
4050
4085
  .string()
4051
4086
  .optional()
4052
- .describe("Session ID (user-provided CLI handle for --resume)"),
4087
+ .describe("Gemini session ID to resume (emits --resume <id>), or 'latest' for the most recent session in this cwd"),
4053
4088
  resumeLatest: z.boolean().default(false).describe("Resume latest session"),
4054
4089
  createNewSession: z.boolean().default(false).describe("Force new session"),
4055
4090
  approvalMode: z
@@ -4131,7 +4166,7 @@ export function createGatewayServer(deps = {}) {
4131
4166
  worktree,
4132
4167
  });
4133
4168
  });
4134
- server.tool("grok_request_async", {
4169
+ server.tool("grok_request_async", "Start an xAI Grok CLI request as a durable background job. Poll with llm_job_status, collect with llm_job_result.", {
4135
4170
  prompt: z
4136
4171
  .string()
4137
4172
  .min(1, "Prompt cannot be empty")
@@ -4147,7 +4182,7 @@ export function createGatewayServer(deps = {}) {
4147
4182
  sessionId: z
4148
4183
  .string()
4149
4184
  .optional()
4150
- .describe("Session ID (user-provided CLI handle for --resume)"),
4185
+ .describe("Provider-native session ID to resume (emits --resume <id>; use resumeLatest for --continue)"),
4151
4186
  resumeLatest: z
4152
4187
  .boolean()
4153
4188
  .default(false)
@@ -4298,12 +4333,17 @@ export function createGatewayServer(deps = {}) {
4298
4333
  .boolean()
4299
4334
  .optional()
4300
4335
  .describe("Grok --restore-code: check out the original session commit when resuming."),
4336
+ leaderSocket: z
4337
+ .string()
4338
+ .min(1)
4339
+ .optional()
4340
+ .describe("Grok 0.2.32+ --leader-socket <PATH>: custom leader socket path (default ~/.grok/leader.sock). Targets an isolated leader process, e.g. a local/branch Grok build; name it ~/.grok/leader-*.sock to keep `grok leader list/kill` discovery working."),
4301
4341
  nativeWorktree: z
4302
4342
  .union([z.boolean(), z.string().min(1)])
4303
4343
  .optional()
4304
4344
  .describe("Grok -w/--worktree: native CLI worktree flag (`true` → bare `--worktree`, string → named). NOT gateway slice λ `worktree`."),
4305
4345
  worktree: WORKTREE_SCHEMA.optional(),
4306
- }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, workingDir, sandbox, rules, systemPromptOverride, allow, deny, compactionMode, compactionDetail, agent, bestOfN, check, disableWebSearch, todoGate, verbatim, agents, promptFile, promptJson, single, experimentalMemory, noAltScreen, noMemory, noPlan, noSubagents, oauth, restoreCode, nativeWorktree, worktree, }) => {
4346
+ }, async ({ prompt, promptParts, model, outputFormat, sessionId, resumeLatest, createNewSession, alwaysApprove, permissionMode, effort, reasoningEffort, approvalStrategy, approvalPolicy, mcpServers, allowedTools, disallowedTools, correlationId, optimizePrompt, idleTimeoutMs, forceRefresh, maxTurns, workingDir, sandbox, rules, systemPromptOverride, allow, deny, compactionMode, compactionDetail, agent, bestOfN, check, disableWebSearch, todoGate, verbatim, agents, promptFile, promptJson, single, experimentalMemory, noAltScreen, noMemory, noPlan, noSubagents, oauth, restoreCode, leaderSocket, nativeWorktree, worktree, }) => {
4307
4347
  return handleGrokRequestAsync({ sessionManager, asyncJobManager, logger, runtime }, {
4308
4348
  prompt,
4309
4349
  promptParts,
@@ -4351,11 +4391,12 @@ export function createGatewayServer(deps = {}) {
4351
4391
  noSubagents,
4352
4392
  oauth,
4353
4393
  restoreCode,
4394
+ leaderSocket,
4354
4395
  nativeWorktree,
4355
4396
  worktree,
4356
4397
  });
4357
4398
  });
4358
- server.tool("mistral_request_async", {
4399
+ server.tool("mistral_request_async", "Start a Mistral Vibe CLI request as a durable background job. Poll with llm_job_status, collect with llm_job_result.", {
4359
4400
  prompt: z
4360
4401
  .string()
4361
4402
  .min(1, "Prompt cannot be empty")
@@ -4462,7 +4503,7 @@ export function createGatewayServer(deps = {}) {
4462
4503
  worktree,
4463
4504
  });
4464
4505
  });
4465
- server.tool("llm_job_status", {
4506
+ server.tool("llm_job_status", "Check lifecycle status (running|completed|failed|canceled|orphaned) of a gateway async or deferred-sync job by jobId.", {
4466
4507
  jobId: z.string().describe("Async job ID from *_request_async"),
4467
4508
  }, async ({ jobId }) => {
4468
4509
  const job = asyncJobManager.getJobSnapshot(jobId);
@@ -4493,7 +4534,7 @@ export function createGatewayServer(deps = {}) {
4493
4534
  ],
4494
4535
  };
4495
4536
  });
4496
- server.tool("llm_job_result", {
4537
+ server.tool("llm_job_result", "Retrieve captured stdout/stderr for a gateway async or deferred-sync job by jobId.", {
4497
4538
  jobId: z.string().describe("Async job ID from *_request_async"),
4498
4539
  maxChars: z
4499
4540
  .number()
@@ -4547,7 +4588,7 @@ export function createGatewayServer(deps = {}) {
4547
4588
  ],
4548
4589
  };
4549
4590
  });
4550
- server.tool("llm_job_cancel", {
4591
+ server.tool("llm_job_cancel", "Cancel a running gateway async or deferred-sync job by jobId.", {
4551
4592
  jobId: z.string().describe("Async job ID from *_request_async"),
4552
4593
  }, async ({ jobId }) => {
4553
4594
  const cancel = asyncJobManager.cancelJob(jobId);
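`llm_job_status` now names the full lifecycle (running, completed, failed, canceled, orphaned). The job tools, which are registered only inside the `if (asyncJobsEnabled)` block, accept IDs from `*_request_async` and from deferred sync requests. The sketch below is a minimal client-side poll loop. It assumes a hypothetical `callTool(name, args)` supplied by your MCP client that returns the tool's parsed JSON, and it assumes that payload carries a `status` field; verify the real response shape first. Treating every state other than `running` as terminal is also an assumption.

```javascript
// Hypothetical polling loop; callTool and the { status } payload shape are assumptions.
const TERMINAL_STATES = new Set(["completed", "failed", "canceled", "orphaned"]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForJob(callTool, jobId, { intervalMs = 2000, maxPolls = 150 } = {}) {
  for (let poll = 0; poll < maxPolls; poll++) {
    const { status } = await callTool("llm_job_status", { jobId });
    if (TERMINAL_STATES.has(status)) {
      // Fetch output for every terminal state; llm_job_result returns stdout and stderr.
      const result = await callTool("llm_job_result", { jobId });
      return { status, result };
    }
    if (status !== "running") {
      throw new Error(`unexpected job status: ${status}`);
    }
    await sleep(intervalMs);
  }
  // Give up without cancelling; the caller decides whether to call llm_job_cancel.
  throw new Error(`job ${jobId} still running after ${maxPolls} polls`);
}
```

On timeout the sketch leaves the job running. To stop it, call `llm_job_cancel` with the same `jobId`.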
@@ -4579,7 +4620,7 @@ export function createGatewayServer(deps = {}) {
4579
4620
  };
4580
4621
  });
4581
4622
  }
4582
- server.tool("llm_request_result", {
4623
+ server.tool("llm_request_result", "Read back any persisted request (sync or async) from the flight recorder by correlationId, including prompt and response.", {
4583
4624
  correlationId: z
4584
4625
  .string()
4585
4626
  .min(1)
@@ -4625,7 +4666,7 @@ export function createGatewayServer(deps = {}) {
4625
4666
  ],
4626
4667
  };
4627
4668
  });
4628
- server.tool("llm_process_health", {}, async () => {
4669
+ server.tool("llm_process_health", "Report gateway process health: async-job manager state plus the resolved persistence configuration and paths.", {}, async () => {
4629
4670
  const health = asyncJobManager.getJobHealth();
4630
4671
  const persistenceBlock = {
4631
4672
  backend: persistence.backend,
@@ -4649,7 +4690,7 @@ export function createGatewayServer(deps = {}) {
4649
4690
  ],
4650
4691
  };
4651
4692
  });
4652
- server.tool("approval_list", {
4693
+ server.tool("approval_list", "List recent MCP-managed approval decisions recorded by the gateway (approvalStrategy: mcp_managed).", {
4653
4694
  limit: z
4654
4695
  .number()
4655
4696
  .int()
@@ -4676,7 +4717,7 @@ export function createGatewayServer(deps = {}) {
4676
4717
  ],
4677
4718
  };
4678
4719
  });
4679
- server.tool("list_models", {
4720
+ server.tool("list_models", "List models, aliases, and defaults for one provider CLI (claude|codex|gemini|grok|mistral).", {
4680
4721
  cli: z
4681
4722
  .preprocess(value => (value === "" || value === null ? undefined : value), z.enum(["claude", "codex", "gemini", "grok", "mistral"]).optional())
4682
4723
  .describe("CLI filter (claude|codex|gemini|grok|mistral)"),
@@ -4685,7 +4726,7 @@ export function createGatewayServer(deps = {}) {
4685
4726
  const result = cli ? { [cli]: cliInfo[cli] } : cliInfo;
4686
4727
  return { content: [{ type: "text", text: JSON.stringify(result, null, 2) }] };
4687
4728
  });
4688
- server.tool("cli_versions", {
4729
+ server.tool("cli_versions", "Report installed provider CLI versions, availability, and login status for all five providers or one.", {
4689
4730
  cli: z
4690
4731
  .preprocess(value => (value === "" || value === null ? undefined : value), z.enum(["claude", "codex", "gemini", "grok", "mistral"]).optional())
4691
4732
  .describe("CLI filter (claude|codex|gemini|grok|mistral)"),
@@ -4693,7 +4734,7 @@ export function createGatewayServer(deps = {}) {
4693
4734
  const versions = await getCliVersions(cli);
4694
4735
  return { content: [{ type: "text", text: JSON.stringify({ versions }, null, 2) }] };
4695
4736
  });
4696
- server.tool("upstream_contracts", {
4737
+ server.tool("upstream_contracts", "Return the gateway's declared provider CLI contracts; with probeInstalled true, diff against installed --help surfaces to detect flag drift.", {
4697
4738
  cli: z
4698
4739
  .preprocess(value => (value === "" || value === null ? undefined : value), SESSION_PROVIDER_ENUM.optional())
4699
4740
  .describe("CLI filter (claude|codex|gemini|grok|mistral)"),
@@ -4705,7 +4746,7 @@ export function createGatewayServer(deps = {}) {
4705
4746
  const report = buildUpstreamContractReport({ cli, probeInstalled });
4706
4747
  return { content: [{ type: "text", text: JSON.stringify(report, null, 2) }] };
4707
4748
  });
4708
- server.tool("cli_upgrade", {
4749
+ server.tool("cli_upgrade", "Plan (dryRun, default true) or execute an upgrade for one provider CLI using its native update mechanism.", {
4709
4750
  cli: z.enum(["claude", "codex", "gemini", "grok", "mistral"]).describe("CLI to upgrade"),
4710
4751
  target: z
4711
4752
  .string()
@@ -4754,7 +4795,7 @@ export function createGatewayServer(deps = {}) {
4754
4795
  };
4755
4796
  }
4756
4797
  });
4757
- server.tool("session_create", {
4798
+ server.tool("session_create", "Create a gateway session record for a provider CLI. NOTE: this is gateway bookkeeping (gw-* ID), not a provider-native session — Codex resume needs a real Codex UUID.", {
4758
4799
  cli: SESSION_PROVIDER_ENUM.describe("CLI type (claude|codex|gemini|grok|mistral)"),
4759
4800
  description: z.string().optional().describe("Session description"),
4760
4801
  setAsActive: z.boolean().default(true).describe("Set as active session"),
@@ -4787,7 +4828,7 @@ export function createGatewayServer(deps = {}) {
4787
4828
  return createErrorResponse("session_create", 1, "", undefined, error);
4788
4829
  }
4789
4830
  });
4790
- server.tool("session_list", {
4831
+ server.tool("session_list", "List gateway session records and the active session per CLI, optionally filtered by CLI.", {
4791
4832
  cli: SESSION_PROVIDER_ENUM.optional().describe("CLI filter (claude|codex|gemini|grok|mistral)"),
4792
4833
  }, async ({ cli }) => {
4793
4834
  try {
@@ -4830,7 +4871,7 @@ export function createGatewayServer(deps = {}) {
4830
4871
  return createErrorResponse("session_list", 1, "", undefined, error);
4831
4872
  }
4832
4873
  });
4833
- server.tool("session_set_active", {
4874
+ server.tool("session_set_active", "Set or clear the active session for a CLI; the active session is used when a request omits sessionId.", {
4834
4875
  cli: SESSION_PROVIDER_ENUM.describe("CLI type (claude|codex|gemini|grok|mistral)"),
4835
4876
  sessionId: z.string().nullable().describe("Session ID (null to clear)"),
4836
4877
  }, async ({ cli, sessionId }) => {
@@ -4868,7 +4909,7 @@ export function createGatewayServer(deps = {}) {
4868
4909
  return createErrorResponse("session_set_active", 1, "", undefined, error);
4869
4910
  }
4870
4911
  });
4871
- server.tool("session_delete", {
4912
+ server.tool("session_delete", "Delete a gateway session record by ID (also removes any gateway-owned worktree attached to it).", {
4872
4913
  sessionId: z.string().describe("Session ID"),
4873
4914
  }, async ({ sessionId }) => {
4874
4915
  try {
@@ -4909,7 +4950,7 @@ export function createGatewayServer(deps = {}) {
4909
4950
  return createErrorResponse("session_delete", 1, "", undefined, error);
4910
4951
  }
4911
4952
  });
4912
- server.tool("session_get", {
4953
+ server.tool("session_get", "Get one gateway session record by session ID, including recent request history when available.", {
4913
4954
  sessionId: z.string().describe("Session ID"),
4914
4955
  }, async ({ sessionId }) => {
4915
4956
  try {
@@ -4972,7 +5013,7 @@ export function createGatewayServer(deps = {}) {
4972
5013
  return createErrorResponse("session_get", 1, "", undefined, error);
4973
5014
  }
4974
5015
  });
4975
- server.tool("session_clear_all", {
5016
+ server.tool("session_clear_all", "Delete all gateway session records, optionally scoped to one CLI.", {
4976
5017
  cli: SESSION_PROVIDER_ENUM.optional().describe("CLI filter (claude|codex|gemini|grok|mistral)"),
4977
5018
  }, async ({ cli }) => {
4978
5019
  try {
@@ -5,6 +5,7 @@ export interface CliFlagContract {
5
5
  values?: readonly string[];
6
6
  pattern?: RegExp;
7
7
  description: string;
8
+ hiddenFromHelp?: boolean;
8
9
  }
9
10
  export interface CliUpstreamMetadata {
10
11
  sourceUrls: readonly string[];
@@ -32,6 +33,7 @@ export interface CliContract {
32
33
  resumeMaxPositionals?: number;
33
34
  resumeOnlyFlags?: readonly string[];
34
35
  resumeForbiddenFlags?: readonly string[];
36
+ acknowledgedUpstreamFlags?: readonly string[];
35
37
  upstreamMetadata?: CliUpstreamMetadata;
36
38
  }
37
39
  export interface CliContractFixture {
@@ -57,6 +59,13 @@ export declare function assertUpstreamCliArgs(cli: CliType, args: readonly strin
57
59
  export declare function validateUpstreamCliEnv(cli: CliType, env: Record<string, string> | undefined): ContractValidationResult;
58
60
  export declare function assertUpstreamCliEnv(cli: CliType, env: Record<string, string> | undefined): void;
59
61
  export declare function extractDiscoveredFlags(helpText: string): readonly string[];
62
+ export interface FlagDriftResult {
63
+ missingFlags: string[];
64
+ extraFlags: readonly string[];
65
+ acknowledgedExtraFlags: readonly string[];
66
+ warnings: string[];
67
+ }
68
+ export declare function computeFlagDrift(contract: CliContract, helpText: string, discoveredFlags: readonly string[]): FlagDriftResult;
60
69
  export interface InstalledCliContractProbe {
61
70
  cli: CliType;
62
71
  executable: string;
@@ -66,6 +75,7 @@ export interface InstalledCliContractProbe {
66
75
  checkedHelpCommands: string[][];
67
76
  missingFlags: string[];
68
77
  extraFlags: readonly string[];
78
+ acknowledgedExtraFlags: readonly string[];
69
79
  discoveredFlags: readonly string[];
70
80
  helpHash?: string;
71
81
  versionHint?: string;
@@ -99,7 +99,12 @@ export const UPSTREAM_CLI_CONTRACTS = {
99
99
  pattern: /^[0-9]+(?:\.[0-9]+)?$/,
100
100
  description: "Budget cap in USD",
101
101
  },
102
- "--max-turns": { arity: "one", pattern: /^[1-9][0-9]*$/, description: "Turn cap" },
102
+ "--max-turns": {
103
+ arity: "one",
104
+ pattern: /^[1-9][0-9]*$/,
105
+ description: "Turn cap",
106
+ hiddenFromHelp: true,
107
+ },
103
108
  "--effort": { arity: "one", values: EFFORT_LEVELS, description: "Reasoning effort" },
104
109
  "--exclude-dynamic-system-prompt-sections": {
105
110
  arity: "none",
@@ -136,6 +141,37 @@ export const UPSTREAM_CLI_CONTRACTS = {
136
141
  description: 'Restrict the available built-in tool set ("" disables all)',
137
142
  },
138
143
  },
144
+ acknowledgedUpstreamFlags: [
145
+ "--allow-dangerously-skip-permissions",
146
+ "--allowed",
147
+ "--bare",
148
+ "--betas",
149
+ "--brief",
150
+ "--chrome",
151
+ "--dangerously-skip-permissions",
152
+ "--debug",
153
+ "--debug-file",
154
+ "--disable-slash-commands",
155
+ "--disallowed",
156
+ "--file",
157
+ "--from-pr",
158
+ "--ide",
159
+ "--include-hook-events",
160
+ "--mcp-debug",
161
+ "--name",
162
+ "--no-chrome",
163
+ "--plugin-dir",
164
+ "--plugin-url",
165
+ "--print",
166
+ "--prompt-suggestions",
167
+ "--remote-control",
168
+ "--remote-control-session-name-prefix",
169
+ "--replay-user-messages",
170
+ "--resume",
171
+ "--tmux",
172
+ "--version",
173
+ "--worktree",
174
+ ],
139
175
  env: {},
140
176
  conformanceFixtures: [
141
177
  {
@@ -518,6 +554,26 @@ export const UPSTREAM_CLI_CONTRACTS = {
518
554
  description: "Auto-approve all actions (gemini -y/--yolo). Functionally equivalent to --approval-mode yolo; the gateway emits at most one of the two.",
519
555
  },
520
556
  },
557
+ acknowledgedUpstreamFlags: [
558
+ "--accept-raw-output-risk",
559
+ "--acp",
560
+ "--debug",
561
+ "--delete-session",
562
+ "--experimental-acp",
563
+ "--extensions",
564
+ "--list-extensions",
565
+ "--list-sessions",
566
+ "--output-format",
567
+ "--prompt",
568
+ "--prompt-interactive",
569
+ "--raw-output",
570
+ "--sandbox",
571
+ "--screen-reader",
572
+ "--session-file",
573
+ "--session-id",
574
+ "--version",
575
+ "--worktree",
576
+ ],
521
577
  env: {},
522
578
  conformanceFixtures: [
523
579
  {
@@ -612,6 +668,7 @@ export const UPSTREAM_CLI_CONTRACTS = {
612
668
  "noSubagents",
613
669
  "oauth",
614
670
  "restoreCode",
671
+ "leaderSocket",
615
672
  "nativeWorktree",
616
673
  ],
617
674
  flags: {
@@ -693,6 +750,10 @@ export const UPSTREAM_CLI_CONTRACTS = {
693
750
  arity: "none",
694
751
  description: "Check out the original session commit when resuming",
695
752
  },
753
+ "--leader-socket": {
754
+ arity: "one",
755
+ description: "Custom leader socket path (isolated leader, Grok 0.2.32+)",
756
+ },
696
757
  "--single": { arity: "one", description: "Single-turn prompt" },
697
758
  "--todo-gate": { arity: "none", description: "Enable runtime turn-end TodoGate" },
698
759
  "--verbatim": { arity: "none", description: "Send prompt exactly as given" },
@@ -843,6 +904,18 @@ export const UPSTREAM_CLI_CONTRACTS = {
843
904
  ],
844
905
  expect: "pass",
845
906
  },
907
+ {
908
+ id: "grok-leader-socket",
909
+ description: "Grok 0.2.32: --leader-socket <PATH> is accepted",
910
+ args: ["-p", "hello", "--leader-socket", "/home/user/.grok/leader-branch.sock"],
911
+ expect: "pass",
912
+ },
913
+ {
914
+ id: "grok-leader-socket-missing-value",
915
+ description: "Grok 0.2.32: --leader-socket without a path is rejected (arity one)",
916
+ args: ["-p", "hello", "--leader-socket"],
917
+ expect: "fail",
918
+ },
846
919
  ],
847
920
  },
848
921
  mistral: {
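The two new Grok 0.2.32 fixtures pair an accepted case with a rejected case for a flag whose contract arity is `"one"`. This excerpt does not show how the gateway evaluates fixtures. The sketch below is a hypothetical, stand-alone checker that only illustrates what arity `"one"` implies for an argument vector. `checkArity`, the `flagSpecs` table and treating `-p` as arity `"one"` are all assumptions made for this example.

```javascript
// Hypothetical arity checker (not code from llm-cli-gateway).
// "--leader-socket" mirrors the contract entry in the diff; treating "-p"
// as an arity-"one" flag is an assumption made for this example.
const flagSpecs = {
  "-p": { arity: "one" },
  "--leader-socket": { arity: "one" },
  "--verbatim": { arity: "none" },
};

function checkArity(specs, args) {
  for (let i = 0; i < args.length; i++) {
    const spec = specs[args[i]];
    if (!spec) continue; // not a known flag, so nothing to check
    if (spec.arity === "one") {
      const value = args[i + 1];
      // Simplification: a missing value or a flag-like next token counts as absent.
      if (value === undefined || value.startsWith("-")) {
        return { ok: false, reason: `${args[i]} requires a value` };
      }
      i++; // consume the value so it is not inspected as a flag
    }
  }
  return { ok: true };
}

// The two fixtures from the diff, replayed against the sketch.
const fixtures = [
  {
    id: "grok-leader-socket",
    args: ["-p", "hello", "--leader-socket", "/home/user/.grok/leader-branch.sock"],
    expect: "pass",
  },
  {
    id: "grok-leader-socket-missing-value",
    args: ["-p", "hello", "--leader-socket"],
    expect: "fail",
  },
];

for (const f of fixtures) {
  const got = checkArity(flagSpecs, f.args).ok ? "pass" : "fail";
  console.log(f.id, got === f.expect ? "as expected" : `unexpected ${got}`);
}
```

The `startsWith("-")` shortcut would wrongly reject a legitimate value such as `-` (often used for stdin). A real check would need per-flag knowledge of acceptable values.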
@@ -1220,6 +1293,42 @@ export function extractDiscoveredFlags(helpText) {
1220
1293
  }
1221
1294
  return Array.from(discovered).sort();
1222
1295
  }
1296
+ export function computeFlagDrift(contract, helpText, discoveredFlags) {
1297
+ const warnings = [];
1298
+ const missingFlags = [];
1299
+ for (const [flag, spec] of Object.entries(contract.flags)) {
1300
+ const inHelp = helpText.includes(flag);
1301
+ if (spec.hiddenFromHelp) {
1302
+ if (inHelp) {
1303
+ warnings.push(`${flag} is marked hiddenFromHelp but now appears in ${contract.executable} help output; remove the hiddenFromHelp marker from the contract`);
1304
+ }
1305
+ continue;
1306
+ }
1307
+ if (!inHelp)
1308
+ missingFlags.push(flag);
1309
+ }
1310
+ const contractFlagSet = new Set(Object.keys(contract.flags));
1311
+ const acknowledged = new Set(contract.acknowledgedUpstreamFlags ?? []);
1312
+ const extraFlags = [];
1313
+ const acknowledgedExtraFlags = [];
1314
+ for (const flag of discoveredFlags) {
1315
+ if (contractFlagSet.has(flag))
1316
+ continue;
1317
+ if (acknowledged.has(flag)) {
1318
+ acknowledgedExtraFlags.push(flag);
1319
+ }
1320
+ else {
1321
+ extraFlags.push(flag);
1322
+ }
1323
+ }
1324
+ const discoveredSet = new Set(discoveredFlags);
1325
+ for (const flag of acknowledged) {
1326
+ if (!discoveredSet.has(flag)) {
1327
+ warnings.push(`acknowledged upstream flag ${flag} no longer appears in ${contract.executable} help output; remove it from acknowledgedUpstreamFlags`);
1328
+ }
1329
+ }
1330
+ return { missingFlags, extraFlags, acknowledgedExtraFlags, warnings };
1331
+ }
1223
1332
  export function probeInstalledCliContract(cli, timeoutMs = 5_000) {
1224
1333
  const contract = UPSTREAM_CLI_CONTRACTS[cli];
1225
1334
  const outputs = [];
@@ -1252,6 +1361,7 @@ export function probeInstalledCliContract(cli, timeoutMs = 5_000) {
1252
1361
  checkedHelpCommands: contract.helpArgs,
1253
1362
  missingFlags: [],
1254
1363
  extraFlags: [],
1364
+ acknowledgedExtraFlags: [],
1255
1365
  discoveredFlags: [],
1256
1366
  helpHash: undefined,
1257
1367
  versionHint: undefined,
@@ -1265,10 +1375,9 @@ export function probeInstalledCliContract(cli, timeoutMs = 5_000) {
1265
1375
  }
1266
1376
  }
1267
1377
  const helpText = outputs.join("\n");
1268
- const missingFlags = Object.keys(contract.flags).filter(flag => !helpText.includes(flag));
1269
1378
  const discoveredFlags = extractDiscoveredFlags(helpText);
1270
- const contractFlagSet = new Set(Object.keys(contract.flags));
1271
- const extraFlags = discoveredFlags.filter(f => !contractFlagSet.has(f));
1379
+ const drift = computeFlagDrift(contract, helpText, discoveredFlags);
1380
+ warnings.push(...drift.warnings);
1272
1381
  const versionMatch = helpText.match(/^\s*(?:[A-Za-z][\w .-]+)?v?\d+\.\d+\S*/m);
1273
1382
  const versionHint = versionMatch ? versionMatch[0].trim().slice(0, 80) : undefined;
1274
1383
  const helpHash = createHash("sha256").update(helpText).digest("hex");
@@ -1279,8 +1388,9 @@ export function probeInstalledCliContract(cli, timeoutMs = 5_000) {
1279
1388
  resolvedArgs,
1280
1389
  available: true,
1281
1390
  checkedHelpCommands: contract.helpArgs,
1282
- missingFlags,
1283
- extraFlags,
1391
+ missingFlags: drift.missingFlags,
1392
+ extraFlags: drift.extraFlags,
1393
+ acknowledgedExtraFlags: drift.acknowledgedExtraFlags,
1284
1394
  discoveredFlags,
1285
1395
  helpHash,
1286
1396
  versionHint,
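The new `computeFlagDrift` separates four outcomes that the old inline code lumped into `missingFlags`/`extraFlags`:

- contract flags absent from help;
- unmodelled upstream flags (drift);
- unmodelled but acknowledged upstream flags;
- warnings for stale `hiddenFromHelp` markers or stale acknowledgements.

The function below is copied from the diff with formatting normalized. The `democli` contract, its help text and every flag name in it are hypothetical and were invented only to exercise each branch.

```javascript
// computeFlagDrift, copied from the diff above (formatting normalized).
function computeFlagDrift(contract, helpText, discoveredFlags) {
  const warnings = [];
  const missingFlags = [];
  // Contract flags must appear in help output unless marked hiddenFromHelp.
  for (const [flag, spec] of Object.entries(contract.flags)) {
    const inHelp = helpText.includes(flag);
    if (spec.hiddenFromHelp) {
      if (inHelp) {
        warnings.push(`${flag} is marked hiddenFromHelp but now appears in ${contract.executable} help output; remove the hiddenFromHelp marker from the contract`);
      }
      continue;
    }
    if (!inHelp) missingFlags.push(flag);
  }
  // Upstream flags the contract does not model: acknowledged vs. drift.
  const contractFlagSet = new Set(Object.keys(contract.flags));
  const acknowledged = new Set(contract.acknowledgedUpstreamFlags ?? []);
  const extraFlags = [];
  const acknowledgedExtraFlags = [];
  for (const flag of discoveredFlags) {
    if (contractFlagSet.has(flag)) continue;
    if (acknowledged.has(flag)) acknowledgedExtraFlags.push(flag);
    else extraFlags.push(flag);
  }
  // Acknowledgements for flags that vanished upstream become warnings.
  const discoveredSet = new Set(discoveredFlags);
  for (const flag of acknowledged) {
    if (!discoveredSet.has(flag)) {
      warnings.push(`acknowledged upstream flag ${flag} no longer appears in ${contract.executable} help output; remove it from acknowledgedUpstreamFlags`);
    }
  }
  return { missingFlags, extraFlags, acknowledgedExtraFlags, warnings };
}

// Hypothetical contract and help text, invented for this example.
const contract = {
  executable: "democli",
  flags: {
    "--single": { arity: "one" },
    "--secret": { arity: "none", hiddenFromHelp: true },
    "--gone": { arity: "none" },
  },
  acknowledgedUpstreamFlags: ["--debug", "--retired"],
};
const helpText = "Usage: democli [--single <prompt>] [--debug] [--new-thing]";
const discovered = ["--debug", "--new-thing", "--single"];

const drift = computeFlagDrift(contract, helpText, discovered);
console.log(drift.missingFlags);           // [ '--gone' ]
console.log(drift.extraFlags);             // [ '--new-thing' ]
console.log(drift.acknowledgedExtraFlags); // [ '--debug' ]
console.log(drift.warnings.length);        // 1 (stale acknowledgement of --retired)
```

The two directions use different matching. The presence check for contract flags is a substring test (`helpText.includes(flag)`), so a contract flag also counts as present when it is a prefix of a longer flag in the help text. Acknowledged and discovered flags, by contrast, are compared as exact set members.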
@@ -47,7 +47,7 @@ function findHumanReadableReport(value) {
47
47
  return null;
48
48
  }
49
49
  export function registerValidationTools(server, deps) {
50
- server.tool("validate_with_models", {
50
+ server.tool("validate_with_models", "Ask two or more provider CLIs to independently validate a question. Starts validation jobs — poll with job_status, collect with job_result (not llm_job_*).", {
51
51
  question: z.string().min(1).describe("Question or content to validate."),
52
52
  models: providerListSchema.describe("Providers to ask. Defaults to Claude and Codex."),
53
53
  focus: z
@@ -69,7 +69,7 @@ export function registerValidationTools(server, deps) {
69
69
  judgeProvider: judgeModel,
70
70
  }),
71
71
  }));
72
- server.tool("second_opinion", {
72
+ server.tool("second_opinion", "Ask one provider CLI to review an answer (starts a validation job; poll job_status, collect job_result).", {
73
73
  answer: z.string().min(1).describe("Answer to review."),
74
74
  question: z.string().optional().describe("Original question, if available."),
75
75
  model: providerSchema.default("codex").describe("Provider to ask for the second opinion."),
@@ -84,7 +84,7 @@ export function registerValidationTools(server, deps) {
84
84
  providers: [model],
85
85
  }),
86
86
  }));
87
- server.tool("compare_answers", {
87
+ server.tool("compare_answers", "Summarize agreement/differences between caller-provided answers LOCALLY — does not call any provider.", {
88
88
  question: z.string().min(1).describe("Question the answers respond to."),
89
89
  answers: z.array(z.string().min(1)).min(2).describe("Two or more answers to compare."),
90
90
  }, async ({ question, answers }) => textResponse({
@@ -99,7 +99,7 @@ export function registerValidationTools(server, deps) {
99
99
  note: "Use validate_with_models when independent provider review is needed.",
100
100
  },
101
101
  }));
102
- server.tool("red_team_review", {
102
+ server.tool("red_team_review", "Challenge a plan, answer, or document for risks and failure modes via provider CLIs (starts validation jobs).", {
103
103
  content: z.string().min(1).describe("Plan, answer, or document to challenge."),
104
104
  riskLevel: z
105
105
  .enum(["normal", "high"])
@@ -117,7 +117,7 @@ export function registerValidationTools(server, deps) {
117
117
  riskLevel,
118
118
  }),
119
119
  }));
120
- server.tool("consensus_check", {
120
+ server.tool("consensus_check", "Ask provider CLIs whether they agree or disagree with a claim (starts validation jobs).", {
121
121
  claim: z.string().min(1).describe("Claim to check across providers."),
122
122
  models: providerListSchema.describe("Providers to ask for agreement or disagreement."),
123
123
  }, async ({ claim, models }) => textResponse({
@@ -130,7 +130,7 @@ export function registerValidationTools(server, deps) {
130
130
  providers: models,
131
131
  }),
132
132
  }));
133
- server.tool("ask_model", {
133
+ server.tool("ask_model", "Ask one provider CLI a question through the simplified validation surface (starts a validation job).", {
134
134
  question: z.string().min(1).describe("Question for one provider."),
135
135
  model: providerSchema.default("claude").describe("Provider to ask."),
136
136
  }, async ({ question, model }) => textResponse({
@@ -143,7 +143,7 @@ export function registerValidationTools(server, deps) {
143
143
  providers: [model],
144
144
  }),
145
145
  }));
146
- server.tool("synthesize_validation", {
146
+ server.tool("synthesize_validation", "Run an explicit judge model over already-collected validation results to produce a synthesis.", {
147
147
  question: z.string().min(1).describe("Original request that was validated."),
148
148
  providerResults: z
149
149
  .array(normalizedProviderResultSchema)
@@ -160,8 +160,8 @@ export function registerValidationTools(server, deps) {
160
160
  judgeProvider: judgeModel,
161
161
  }),
162
162
  }));
163
- server.tool("list_available_models", {}, async () => textResponse({ success: true, models: getAvailableCliInfo() }));
164
- server.tool("job_status", {
163
+ server.tool("list_available_models", "List models and capabilities for every available provider CLI (takes no arguments; complements per-provider list_models).", {}, async () => textResponse({ success: true, models: getAvailableCliInfo() }));
164
+ server.tool("job_status", "Check a VALIDATION job's status (jobs started by validate_with_models/ask_model/etc.) — distinct from llm_job_status, which tracks provider request jobs.", {
165
165
  jobId: z.string().min(1).describe("Validation job ID."),
166
166
  }, async ({ jobId }) => {
167
167
  const job = deps.asyncJobManager.getJobSnapshot(jobId);
@@ -170,7 +170,7 @@ export function registerValidationTools(server, deps) {
170
170
  }
171
171
  return textResponse({ success: true, job });
172
172
  });
173
- server.tool("job_result", {
173
+ server.tool("job_result", "Collect a VALIDATION job's normalized provider output — distinct from llm_job_result, which returns raw provider request job output.", {
174
174
  jobId: z.string().min(1).describe("Validation job ID."),
175
175
  provider: providerSchema
176
176
  .optional()
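The rewritten descriptions spell out a routing rule that clients previously had to guess:

- Validation tools start jobs that are polled with `job_status` and collected with `job_result`.
- Provider request jobs use the `llm_job_*` pair.
- `compare_answers` never creates a job.

The sketch below encodes that rule as a lookup for a client. `jobToolsFor` is a hypothetical client-side helper, not gateway code. Mapping `*_request_async` tools to `llm_job_*` is an assumption drawn from the changelog pairing them, and `claude_request_async` is only an illustrative name. `synthesize_validation` is left out because its description does not say whether it starts a job.

```javascript
// Hypothetical client-side helper (not part of llm-cli-gateway): maps the
// tool that started a job to the tools used to poll it and collect its result.
const VALIDATION_JOB_TOOLS = new Set([
  "validate_with_models",
  "second_opinion",
  "red_team_review",
  "consensus_check",
  "ask_model",
]);

function jobToolsFor(startingTool) {
  // compare_answers summarizes locally and returns its result directly: no job.
  if (startingTool === "compare_answers") return null;
  if (VALIDATION_JOB_TOOLS.has(startingTool)) {
    return { status: "job_status", result: "job_result" };
  }
  // Assumption: async provider requests are tracked as provider request jobs.
  if (startingTool.endsWith("_request_async")) {
    return { status: "llm_job_status", result: "llm_job_result" };
  }
  throw new Error(`no job mapping known for ${startingTool}`);
}

console.log(jobToolsFor("ask_model"));            // { status: 'job_status', result: 'job_result' }
console.log(jobToolsFor("compare_answers"));      // null
console.log(jobToolsFor("claude_request_async")); // { status: 'llm_job_status', result: 'llm_job_result' }
```

Throwing on unknown tools, rather than defaulting to one pair, prevents a client from polling a job ID with the wrong family of tools. That kind of mix-up is exactly what the disambiguated descriptions are meant to prevent.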
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "llm-cli-gateway",
3
- "version": "2.0.0",
3
+ "version": "2.2.0",
4
4
  "mcpName": "io.github.verivus-oss/llm-cli-gateway",
5
5
  "description": "MCP server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral Vibe CLIs with session management, retry logic, async job orchestration, durable job results, and cross-LLM validation.",
6
6
  "license": "MIT",