llm-cli-gateway 1.1.0 → 1.4.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,27 @@
 
  All notable changes to the llm-cli-gateway project.
 
+ ## Unreleased
+
+ ## [1.4.0] - 2026-05-16
+
+ ### Added
+
+ - **Codex `exec resume` wired through the gateway** — `codex_request` and `codex_request_async` now accept `sessionId` (real Codex session UUID from `~/.codex/sessions/` or the `codex resume` picker) and `resumeLatest:true`, emitting `codex exec resume <UUID>` and `codex exec resume --last` respectively. Codex sessions are no longer bookkeeping-only at the gateway layer; multi-turn workflows carry real CLI continuity, matching Claude/Gemini/Grok. Gateway-generated `gw-*` IDs are rejected for Codex (as for Gemini/Grok). `--full-auto` is silently dropped on resume because `codex exec resume` does not accept it — the original session's approval policy is inherited.
+ - **Durable job results + automatic dedup** — Async jobs are now persisted to a `jobs` table in `~/.llm-cli-gateway/logs.db` on every state transition (start, output flush, completion). `llm_job_status` and `llm_job_result` fall back to the database when the job is no longer in memory, so callers can collect a result regardless of how long ago the work completed (default retention: **30 days**, configurable via `LLM_GATEWAY_JOB_RETENTION_DAYS`). Identical `*_request` / `*_request_async` calls within a dedup window (default **1 hour**, configurable via `LLM_GATEWAY_DEDUP_WINDOW_MS`) short-circuit onto the existing running or completed job instead of spawning a duplicate run — directly fixing the "agent re-issues and the whole job starts over" loop. Each tool now accepts `forceRefresh: true` to bypass dedup. Jobs that were running when the gateway last stopped are flipped to `orphaned` on startup so callers can still read their partial output.
+ - **Grok CLI provider (xAI Grok Build TUI)** — New `grok_request` and `grok_request_async` MCP tools mirror the existing Claude/Codex/Gemini surface (sync + async, session management via `--resume`/`--continue`, idle-timeout, approval policy, review-integrity, flight recorder, metrics). Auth assumes a prior `grok login` (OAuth) or `GROK_CODE_XAI_API_KEY`. Default model: `grok-build`. `GROK_DEFAULT_MODEL`, `GROK_MODELS`, and `GROK_MODEL_ALIASES` env vars are honored by the model registry. `cli_upgrade` treats Grok as self-updating (`grok update` / `grok update --version <target>`).
+ - **Source-aware model registry** — `list_models` now reports model source/confidence metadata, aliases, default model source, and non-fatal discovery warnings
+ - **Deterministic model configuration overrides** — Added `*_SETTINGS_PATH`, `GEMINI_HISTORY_ROOT`, `*_MODEL_ALIASES`, and `LLM_GATEWAY_MODEL_ALIASES` support for stable deployments and tests
+ - **CLI lifecycle tools** — Added `cli_versions` and `cli_upgrade` tools for inspecting and upgrading individual Claude, Codex, Gemini, and Grok CLI installations
+ - **`resolveCodexSessionArgs` helper** in `src/request-helpers.ts` with 7 new tests covering mode resolution and `gw-*` rejection (Codex uses an `exec resume` subcommand rather than a flag pair, so the helper returns a `mode` discriminant: `new` | `resume-by-id` | `resume-latest`)
+
+ ### Changed
+
+ - **`better-sqlite3` bumped to `^12.9.0`** (from `^11.0.0`) — required engines now `node 20.x || 22.x || 23.x || 24.x || 25.x`
+ - **Gemini history discovery is no longer authoritative** — Models observed in local Gemini session files are merged as low-confidence entries and no longer replace the registry or set the default model
+ - **Codex default handling remains explicit** — If Codex has no configured default, `default`/`latest` resolve to no model flag so the Codex CLI can use its own built-in default
+ - **Gateway skills refreshed** — The `.agents/skills/` (async-job-orchestration, implement-review-fix, multi-llm-review, secure-orchestration, session-workflow) and `skills/` (multi-llm-orchestration, multi-llm-consensus, model-routing, design-review-cycle, agent-codex-gate, codex-review-gate, red-team-assessment) skill docs now cover Grok, durable job results, auto-dedup, and the new Codex resume capability. `.agents/skills/` entries bumped to metadata version 1.5.
+
  ## [1.1.0] - 2026-04-04
 
  ### Added
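Reviewer note: the changelog describes `resolveCodexSessionArgs` as returning a `mode` discriminant, but its signature is not part of this diff. The following is a minimal sketch of how such a resolver could behave, not the package's implementation. The names `sketchCodexSessionArgs`, `CodexSessionInput`, and `fullAuto` are hypothetical, and giving `sessionId` precedence over `resumeLatest` is an assumption. Only the `codex exec resume <UUID>` and `codex exec resume --last` argument shapes, the `gw-*` rejection, and the dropped `--full-auto` come from the entry above.

```typescript
// Hypothetical sketch; the real helper lives in src/request-helpers.ts
// and its exact signature is not shown in this diff.
type CodexSessionMode = "new" | "resume-by-id" | "resume-latest";

interface CodexSessionInput {
  sessionId?: string;
  resumeLatest?: boolean;
  fullAuto?: boolean; // assumed option name behind --full-auto
}

interface CodexSessionPlan {
  mode: CodexSessionMode;
  args: string[]; // argv that would be passed to the `codex` binary
}

function sketchCodexSessionArgs(input: CodexSessionInput): CodexSessionPlan {
  if (input.sessionId !== undefined) {
    // Gateway-generated IDs are bookkeeping only and cannot be resumed by Codex.
    if (input.sessionId.startsWith("gw-")) {
      throw new Error("gw-* session IDs are not valid Codex session UUIDs");
    }
    // --full-auto is intentionally not forwarded: resume inherits the
    // original session's approval policy.
    return { mode: "resume-by-id", args: ["exec", "resume", input.sessionId] };
  }
  if (input.resumeLatest) {
    return { mode: "resume-latest", args: ["exec", "resume", "--last"] };
  }
  const args = ["exec"];
  if (input.fullAuto) args.push("--full-auto");
  return { mode: "new", args };
}
```

Returning a discriminant rather than a flag pair keeps the caller's branching explicit, which fits a CLI that resumes through a subcommand instead of an option.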
package/README.md CHANGED
@@ -3,12 +3,12 @@
  > *"Without consultation, plans are frustrated, but with many counselors they succeed."*
  > — Proverbs 15:22 (LSB)
 
- A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, and Gemini CLIs with session management, retry logic, and async job orchestration.
+ A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, Gemini, and Grok CLIs with session management, retry logic, and async job orchestration.
 
  ## Features
 
  ### Core Capabilities
- - **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, and Gemini CLIs
+ - **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, Gemini, and Grok CLIs
  - **Session Management**: Track and resume conversations across all CLIs with persistent storage
  - **Token Optimization**: Automatic 44% reduction on prompts, 37% on responses (opt-in)
  - **Correlation ID Tracking**: Full request tracing across all LLM interactions
@@ -56,6 +56,13 @@ npm install -g @google/gemini-cli
  # Or: https://github.com/google-gemini/gemini-cli
  ```
 
+ ### Grok CLI (xAI)
+ ```bash
+ npm install -g grok-build
+ grok login # OAuth flow, or set GROK_CODE_XAI_API_KEY
+ # Docs: https://docs.x.ai/build/cli
+ ```
+
  ## Installation
 
  ### As an MCP server (npm)
@@ -205,8 +212,54 @@ Execute a Gemini CLI request with session support.
  }
  ```
 
- ##### `claude_request_async` / `codex_request_async`
- Start a long-running Claude or Codex request without waiting for completion in the same MCP call.
+ ##### `grok_request`
+ Execute a Grok CLI (xAI) request with session support.
+
+ **Parameters:**
+ - `prompt` (string, required): The prompt to send (1-100,000 chars)
+ - `model` (string, optional): Model name or alias (e.g. `grok-build`, `latest`)
+ - `outputFormat` (string, optional): `"plain"` (default), `"json"`, or `"streaming-json"`
+ - `sessionId` (string, optional): Session ID to resume (`--resume <id>`)
+ - `resumeLatest` (boolean, optional): Resume the most recent session in the current cwd (`--continue`)
+ - `createNewSession` (boolean, optional): Always create a new session
+ - `alwaysApprove` (boolean, optional): Auto-approve all tool executions (`--always-approve`) in legacy mode
+ - `permissionMode` (string, optional): `default|acceptEdits|auto|dontAsk|bypassPermissions|plan`
+ - `effort` (string, optional): `low|medium|high|xhigh|max`
+ - `reasoningEffort` (string, optional): Reasoning effort for reasoning models
+ - `approvalStrategy` (string, optional): `"legacy"` (default) or `"mcp_managed"`
+ - `approvalPolicy` (string, optional): `"strict"`, `"balanced"`, or `"permissive"`
+ - `mcpServers` (string[], optional): MCP server names tracked for approvals (Grok manages its own MCP config via `grok mcp`)
+ - `allowedTools` (string[], optional): Allowed built-in tools (passed as `--tools` comma list)
+ - `disallowedTools` (string[], optional): Disallowed built-in tools (passed as `--disallowed-tools` comma list)
+ - `optimizePrompt` (boolean, optional): Optimize prompt for token efficiency, default: false
+ - `optimizeResponse` (boolean, optional): Optimize response for token efficiency, default: false
+ - `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+
+ **Example:**
+ ```json
+ {
+ "prompt": "Summarize the latest commit message in 1 sentence",
+ "model": "grok-build",
+ "effort": "low"
+ }
+ ```
+
+ #### Durable job results & automatic dedup
+
+ Every async job is persisted to a `jobs` table in `~/.llm-cli-gateway/logs.db` as it transitions through running → completed/failed/canceled. This makes the gateway a durable collection layer:
+
+ - **Re-issuing a request is safe.** Identical `*_request` / `*_request_async` calls within the dedup window (default 1 hour) short-circuit onto the existing running or completed job — the caller gets back the same job ID instead of starting a duplicate run. This directly fixes the "agent times out polling, re-issues, and the whole job starts over" failure mode.
+ - **`llm_job_status` and `llm_job_result` work across gateway restarts.** Job rows live for 30 days by default; callers can fetch results long after the in-memory cache has evicted them.
+ - **Jobs running at shutdown are marked `orphaned`** on the next gateway boot (the detached child can't be reattached to). Their captured partial output remains readable.
+ - **Pass `forceRefresh: true`** on any request tool to bypass dedup and force a fresh CLI run.
+
+ Environment variables:
+ - `LLM_GATEWAY_JOB_RETENTION_DAYS` — how long completed jobs stay queryable. Default `30`.
+ - `LLM_GATEWAY_DEDUP_WINDOW_MS` — how recent an existing job must be to dedup against. Default `3600000` (1 hour). Set `0` to disable dedup.
+ - `LLM_GATEWAY_JOBS_DB` — override the sqlite path. Defaults to the value of `LLM_GATEWAY_LOGS_DB`, then `~/.llm-cli-gateway/logs.db`. Set to `none` to disable durability entirely (in-memory only).
+
+ ##### `claude_request_async` / `codex_request_async` / `gemini_request_async` / `grok_request_async`
+ Start a long-running Claude, Codex, Gemini, or Grok request without waiting for completion in the same MCP call.
 
  Use this flow when analysis/runtime can exceed client tool-call limits:
  1. Start job with `*_request_async`
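Reviewer note: the dedup rules added to the README in this hunk can be sketched as a pure lookup over stored job rows. This is an illustration under stated assumptions, not the gateway's code: the names `SketchJobRow`, `sketchDedupWindowMs`, `sketchRequestKey`, and `findDedupCandidate` are hypothetical, the JSON encoding of the `(cli, args)` key is an assumption, and falling back to the default on an unparsable value is an assumption. The default of `3600000` ms, the `0` value disabling dedup, the running-or-completed rule, and `forceRefresh` come from the documentation above.

```typescript
type SketchJobStatus = "running" | "completed" | "failed" | "canceled" | "orphaned";

interface SketchJobRow {
  id: string;
  requestKey: string;
  status: SketchJobStatus;
  startedAtMs: number;
}

const DEFAULT_DEDUP_WINDOW_MS = 3600000; // documented default: 1 hour

// Reads the window from an env-like map so the sketch stays runtime-agnostic.
function sketchDedupWindowMs(env: Record<string, string | undefined>): number {
  const raw = env["LLM_GATEWAY_DEDUP_WINDOW_MS"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_DEDUP_WINDOW_MS;
  const parsed = Number(raw);
  // Assumption: negative or non-numeric values fall back to the default.
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : DEFAULT_DEDUP_WINDOW_MS;
}

// Dedup is documented as keyed on (cli, args); this encoding is assumed.
function sketchRequestKey(cli: string, args: string[]): string {
  return JSON.stringify([cli, args]);
}

// Returns the job to short-circuit onto, or undefined when a fresh run is needed.
function findDedupCandidate(
  rows: SketchJobRow[],
  key: string,
  nowMs: number,
  windowMs: number,
  forceRefresh: boolean = false
): SketchJobRow | undefined {
  if (forceRefresh || windowMs === 0) return undefined;
  return rows.find(
    (row) =>
      row.requestKey === key &&
      (row.status === "running" || row.status === "completed") &&
      nowMs - row.startedAtMs <= windowMs
  );
}
```

Under this sketch, failed, canceled, and orphaned jobs never satisfy a re-issued request, so a retry after a failure starts a new run.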
@@ -244,7 +297,7 @@ Approval records are persisted to `~/.llm-cli-gateway/approvals.jsonl`.
  Create a new session for a specific CLI.
 
  **Parameters:**
- - `cli` (string, required): CLI to create session for ("claude", "codex", "gemini")
+ - `cli` (string, required): CLI to create session for ("claude", "codex", "gemini", "grok")
  - `description` (string, optional): Description for the session
  - `setAsActive` (boolean, optional): Set as active session, default: true
 
@@ -261,7 +314,7 @@ Create a new session for a specific CLI.
  List all sessions, optionally filtered by CLI.
 
  **Parameters:**
- - `cli` (string, optional): Filter by CLI ("claude", "codex", "gemini")
+ - `cli` (string, optional): Filter by CLI ("claude", "codex", "gemini", "grok")
 
  **Response includes:**
  - Total session count
@@ -299,12 +352,74 @@ Clear all sessions, optionally for a specific CLI.
  List available models for each CLI.
 
  **Parameters:**
- - `cli` (string, optional): Specific CLI to list models for ("claude", "codex", "gemini")
+ - `cli` (string, optional): Specific CLI to list models for ("claude", "codex", "gemini", "grok")
 
  **Response includes:**
  - Model names and descriptions
  - Best use cases for each model
  - CLI-specific information
+ - `defaultModel` and `defaultModelSource` when a default is explicitly configured
+ - `modelMetadata` with source/confidence (`fallback`, `config`, `env`, `observed`)
+ - `aliases` and `warnings` when configured or when discovery degrades gracefully
+
+ The registry treats explicit configuration as authoritative. Bundled fallback models are low-confidence hints, and Gemini models observed in local session history are merged as low-confidence entries only; they do not become the default model.
+
+ Model registry environment overrides:
+
+ ```bash
+ # Explicit defaults
+ CLAUDE_DEFAULT_MODEL=haiku
+ CODEX_DEFAULT_MODEL=<codex-model-id>
+ GEMINI_DEFAULT_MODEL=gemini-2.5-flash
+
+ # Additional models: comma/newline list, JSON array, or JSON object of model->description
+ GEMINI_MODELS='{"gemini-team-default":"Team-approved Gemini model"}'
+
+ # Aliases
+ GEMINI_MODEL_ALIASES='team=gemini-team-default'
+ LLM_GATEWAY_MODEL_ALIASES='codex.fast=gpt-5.3-codex-spark,gemini.fast=gemini-team-default'
+
+ # Deterministic config/discovery paths
+ CODEX_CONFIG_PATH=/path/to/config.toml
+ CLAUDE_SETTINGS_PATH=/path/to/settings.json
+ CLAUDE_SETTINGS_LOCAL_PATH=/path/to/settings.local.json
+ GEMINI_SETTINGS_PATH=/path/to/settings.json
+ GEMINI_HISTORY_ROOT=/path/to/.gemini/tmp
+
+ # Disable local model-history discovery
+ LLM_GATEWAY_DISABLE_MODEL_DISCOVERY=1
+ ```
+
+ ##### `cli_versions`
+ Report installed CLI versions.
+
+ **Parameters:**
+ - `cli` (string, optional): Specific CLI to inspect ("claude", "codex", "gemini", "grok")
+
+ ##### `cli_upgrade`
+ Plan or run an upgrade for one CLI.
+
+ **Parameters:**
+ - `cli` (string, required): CLI to upgrade ("claude", "codex", "gemini", "grok")
+ - `target` (string, optional): Package tag/version/target, default: `latest`
+ - `dryRun` (boolean, optional): Return the upgrade plan without running it, default: `true`
+ - `timeoutMs` (number, optional): Upgrade timeout when `dryRun=false`
+
+ **Upgrade strategies:**
+ - Claude latest: `claude update`
+ - Claude explicit target: `claude install <target>`
+ - Codex latest: `codex update`
+ - Codex explicit target: `npm install -g @openai/codex@<target>`
+ - Gemini: `npm install -g @google/gemini-cli@<target>`
+
+ **Example dry run:**
+ ```json
+ {
+ "cli": "gemini",
+ "target": "latest",
+ "dryRun": true
+ }
+ ```
 
  ## Session Management
 
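Reviewer note: the `cli_upgrade` strategies listed in this hunk amount to a small mapping from `(cli, target)` to a command line, which is roughly what a dry run would report. The sketch below is illustrative only; `planUpgrade` and `UpgradableCli` are hypothetical names, and the shape of the real dry-run response is not shown in this diff. The Claude, Codex, and Gemini commands are taken from the README list above, and the Grok commands from the 1.4.0 changelog entry, since the README strategy list does not mention Grok.

```typescript
type UpgradableCli = "claude" | "codex" | "gemini" | "grok";

// Returns the argv a non-dry run would execute, as an array of tokens.
function planUpgrade(cli: UpgradableCli, target: string = "latest"): string[] {
  const latest = target === "latest";
  switch (cli) {
    case "claude":
      return latest ? ["claude", "update"] : ["claude", "install", target];
    case "codex":
      return latest
        ? ["codex", "update"]
        : ["npm", "install", "-g", `@openai/codex@${target}`];
    case "gemini":
      // Gemini uses the npm route for both latest and explicit targets.
      return ["npm", "install", "-g", `@google/gemini-cli@${target}`];
    case "grok":
      // Grok is described as self-updating in the changelog.
      return latest ? ["grok", "update"] : ["grok", "update", "--version", target];
  }
}
```

Keeping the plan as a token array rather than a shell string avoids quoting problems if the plan is later passed to a process spawner.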
@@ -572,4 +687,3 @@ For issues and questions:
  ## Changelog
 
  See [CHANGELOG.md](CHANGELOG.md) for detailed release history.
-
@@ -2,7 +2,7 @@ import type { Logger } from "./logger.js";
  import type { ReviewIntegrityResult } from "./review-integrity.js";
  export type ApprovalPolicy = "strict" | "balanced" | "permissive";
  export type ApprovalStrategy = "legacy" | "mcp_managed";
- export type ApprovalCli = "claude" | "codex" | "gemini";
+ export type ApprovalCli = "claude" | "codex" | "gemini" | "grok";
  export type ApprovalStatus = "approved" | "denied";
  export interface ApprovalRequest {
  cli: ApprovalCli;
@@ -1,7 +1,8 @@
  import type { Logger } from "./logger.js";
  import { type JobHealth } from "./process-monitor.js";
- export type LlmCli = "claude" | "codex" | "gemini";
- export type AsyncJobStatus = "running" | "completed" | "failed" | "canceled";
+ import { JobStore } from "./job-store.js";
+ export type LlmCli = "claude" | "codex" | "gemini" | "grok";
+ export type AsyncJobStatus = "running" | "completed" | "failed" | "canceled" | "orphaned";
  export interface AsyncJobSnapshot {
  id: string;
  cli: LlmCli;
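Reviewer note: this hunk widens `AsyncJobStatus` with `orphaned`, which the changelog says is applied on startup to jobs that were still running when the gateway last stopped. A minimal sketch of that sweep over already-loaded rows might look like the following. `StoredJobSketch` and `markOrphans` are hypothetical names, and the real implementation presumably updates the sqlite `jobs` table rather than an in-memory array.

```typescript
type AsyncJobStatusSketch = "running" | "completed" | "failed" | "canceled" | "orphaned";

interface StoredJobSketch {
  id: string;
  status: AsyncJobStatusSketch;
  stdout: string; // partial output captured before shutdown
}

// On boot, no child process can be reattached, so any row still marked
// "running" is flipped to "orphaned". Captured output is left untouched
// so callers can still read it.
function markOrphans(rows: StoredJobSketch[]): StoredJobSketch[] {
  return rows.map((row): StoredJobSketch =>
    row.status === "running" ? { ...row, status: "orphaned" } : row
  );
}
```

The function returns new objects instead of mutating its input, which keeps the sweep easy to test in isolation.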
@@ -22,16 +23,64 @@ export interface AsyncJobResult extends AsyncJobSnapshot {
  stdoutTruncated: boolean;
  stderrTruncated: boolean;
  }
+ export interface StartJobOptions {
+ cwd?: string;
+ idleTimeoutMs?: number;
+ outputFormat?: string;
+ /** Bypass dedup and force a fresh CLI run even if a recent matching job exists. */
+ forceRefresh?: boolean;
+ }
+ export interface StartJobOutcome {
+ snapshot: AsyncJobSnapshot;
+ /** Set to the existing job's id when the request was de-duplicated. */
+ deduped: boolean;
+ /** Set when deduped — the original job's correlation id, useful for logging. */
+ originalCorrelationId?: string;
+ }
  export declare class AsyncJobManager {
  private logger;
  private onJobComplete?;
  private jobs;
  private evictionTimer;
  private processMonitor;
- constructor(logger?: Logger, onJobComplete?: ((cli: LlmCli, durationMs: number, success: boolean) => void) | undefined);
+ private store;
+ constructor(logger?: Logger, onJobComplete?: ((cli: LlmCli, durationMs: number, success: boolean) => void) | undefined, store?: JobStore | null);
  private emitMetrics;
  private evictCompletedJobs;
- startJob(cli: LlmCli, args: string[], correlationId: string, cwd?: string, idleTimeoutMs?: number, outputFormat?: string): AsyncJobSnapshot;
+ /**
+ * Compute the dedup key for a job. Stable across re-issues of the same request,
+ * which is exactly what allows agents to safely retry without restarting the run.
+ */
+ private buildRequestKey;
+ private safeStoreCall;
+ /**
+ * Flush in-memory stdout/stderr to the durable store if anything changed
+ * since the last flush. Throttled by OUTPUT_FLUSH_INTERVAL_MS to avoid
+ * pounding sqlite on every chunk of streaming output.
+ */
+ private maybeFlushOutput;
+ private persistComplete;
+ /**
+ * Reconstitute an in-memory AsyncJobRecord from a durable row, so subsequent
+ * getJobSnapshot/getJobResult calls hit the in-memory cache.
+ * The reconstituted record has process=null — it represents historical data only.
+ */
+ private hydrateFromStore;
+ /**
+ * Backwards-compatible entry point. Equivalent to startJobWithDedup({...}).snapshot.
+ * Existing callers keep working unchanged; forceRefresh is exposed as a trailing
+ * optional param for the dedup-aware path.
+ */
+ startJob(cli: LlmCli, args: string[], correlationId: string, cwd?: string, idleTimeoutMs?: number, outputFormat?: string, forceRefresh?: boolean): AsyncJobSnapshot;
+ /**
+ * Start a job, with optional dedup against recent identical requests.
+ * Returns `{ snapshot, deduped }` so callers can log/report the short-circuit.
+ *
+ * Dedup is keyed on (cli, args). If a job with the same key was started within
+ * the dedup window (default 1h) and is still running or completed, its snapshot
+ * is returned without spawning a new process. forceRefresh skips dedup entirely.
+ */
+ startJobWithDedup(cli: LlmCli, args: string[], correlationId: string, opts?: StartJobOptions): StartJobOutcome;
  getJobSnapshot(jobId: string): AsyncJobSnapshot | null;
  getJobResult(jobId: string, maxChars?: number): AsyncJobResult | null;
  cancelJob(jobId: string): {