llm-cli-gateway 1.1.0 → 1.4.0

This diff shows the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their public registries.

Files changed (21)

package/CHANGELOG.md +21 -0
package/README.md +122 -8
package/dist/approval-manager.d.ts +1 -1
package/dist/async-job-manager.d.ts +53 -4
package/dist/async-job-manager.js +237 -17
package/dist/cli-updater.d.ts +38 -0
package/dist/cli-updater.js +145 -0
package/dist/flight-recorder.d.ts +1 -1
package/dist/index.d.ts +27 -0
package/dist/index.js +651 -26
package/dist/job-store.d.ts +84 -0
package/dist/job-store.js +251 -0
package/dist/model-registry.d.ts +14 -0
package/dist/model-registry.js +444 -134
package/dist/request-helpers.d.ts +41 -0
package/dist/request-helpers.js +40 -0
package/dist/resources.js +44 -0
package/dist/session-manager-pg.js +1 -0
package/dist/session-manager.d.ts +1 -1
package/dist/session-manager.js +2 -1
package/package.json +3 -3

package/CHANGELOG.md CHANGED

@@ -2,6 +2,27 @@
 All notable changes to the llm-cli-gateway project.
+## Unreleased
+## [1.4.0] - 2026-05-16
+### Added
+- **Codex `exec resume` wired through the gateway** — `codex_request` and `codex_request_async` now accept `sessionId` (real Codex session UUID from `~/.codex/sessions/` or the `codex resume` picker) and `resumeLatest:true`, emitting `codex exec resume <UUID>` and `codex exec resume --last` respectively. Codex sessions are no longer bookkeeping-only at the gateway layer; multi-turn workflows carry real CLI continuity, matching Claude/Gemini/Grok. Gateway-generated `gw-*` IDs are rejected for Codex (as for Gemini/Grok). `--full-auto` is silently dropped on resume because `codex exec resume` does not accept it — the original session's approval policy is inherited.
+- **Durable job results + automatic dedup** — Async jobs are now persisted to a `jobs` table in `~/.llm-cli-gateway/logs.db` on every state transition (start, output flush, completion). `llm_job_status` and `llm_job_result` fall back to the database when the job is no longer in memory, so callers can collect a result regardless of how long ago the work completed (default retention: **30 days**, configurable via `LLM_GATEWAY_JOB_RETENTION_DAYS`). Identical `*_request` / `*_request_async` calls within a dedup window (default **1 hour**, configurable via `LLM_GATEWAY_DEDUP_WINDOW_MS`) short-circuit onto the existing running or completed job instead of spawning a duplicate run — directly fixing the "agent re-issues and the whole job starts over" loop. Each tool now accepts `forceRefresh: true` to bypass dedup. Jobs that were running when the gateway last stopped are flipped to `orphaned` on startup so callers can still read their partial output.
+- **Grok CLI provider (xAI Grok Build TUI)** — New `grok_request` and `grok_request_async` MCP tools mirror the existing Claude/Codex/Gemini surface (sync + async, session management via `--resume`/`--continue`, idle-timeout, approval policy, review-integrity, flight recorder, metrics). Auth assumes a prior `grok login` (OAuth) or `GROK_CODE_XAI_API_KEY`. Default model: `grok-build`. `GROK_DEFAULT_MODEL`, `GROK_MODELS`, and `GROK_MODEL_ALIASES` env vars are honored by the model registry. `cli_upgrade` treats Grok as self-updating (`grok update` / `grok update --version <target>`).
+- **Source-aware model registry** — `list_models` now reports model source/confidence metadata, aliases, default model source, and non-fatal discovery warnings
+- **Deterministic model configuration overrides** — Added `*_SETTINGS_PATH`, `GEMINI_HISTORY_ROOT`, `*_MODEL_ALIASES`, and `LLM_GATEWAY_MODEL_ALIASES` support for stable deployments and tests
+- **CLI lifecycle tools** — Added `cli_versions` and `cli_upgrade` tools for inspecting and upgrading individual Claude, Codex, Gemini, and Grok CLI installations
+- **`resolveCodexSessionArgs` helper** in `src/request-helpers.ts` with 7 new tests covering mode resolution and `gw-*` rejection (Codex uses an `exec resume` subcommand rather than a flag pair, so the helper returns a `mode` discriminant: `new` | `resume-by-id` | `resume-latest`)
+### Changed
+- **`better-sqlite3` bumped to `^12.9.0`** (from `^11.0.0`) — required engines now `node 20.x || 22.x || 23.x || 24.x || 25.x`
+- **Gemini history discovery is no longer authoritative** — Models observed in local Gemini session files are merged as low-confidence entries and no longer replace the registry or set the default model
+- **Codex default handling remains explicit** — If Codex has no configured default, `default`/`latest` resolve to no model flag so the Codex CLI can use its own built-in default
+- **Gateway skills refreshed** — The `.agents/skills/` (async-job-orchestration, implement-review-fix, multi-llm-review, secure-orchestration, session-workflow) and `skills/` (multi-llm-orchestration, multi-llm-consensus, model-routing, design-review-cycle, agent-codex-gate, codex-review-gate, red-team-assessment) skill docs now cover Grok, durable job results, auto-dedup, and the new Codex resume capability. `.agents/skills/` entries bumped to metadata version 1.5.
 ## [1.1.0] - 2026-04-04
 ### Added

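The changelog entry for `resolveCodexSessionArgs` describes a helper that returns a `mode` discriminant (`new` | `resume-by-id` | `resume-latest`) and rejects `gw-*` IDs. The real helper's signature is not shown in this diff, so the following is only a minimal sketch of that behavior. The names `resolveCodexSessionArgsSketch`, `CodexSessionInput`, `CodexSessionArgs`, and `withFullAuto` are hypothetical, as are the choice to let `sessionId` take precedence over `resumeLatest` and the use of a bare `exec` for new sessions.

```typescript
// Hypothetical sketch: the published helper lives in src/request-helpers.ts
// and may differ in signature and return shape.
type CodexSessionMode = "new" | "resume-by-id" | "resume-latest";

interface CodexSessionInput {
  sessionId?: string;
  resumeLatest?: boolean;
}

interface CodexSessionArgs {
  mode: CodexSessionMode;
  /** Leading arguments to pass to the `codex` binary. */
  args: string[];
}

function resolveCodexSessionArgsSketch(input: CodexSessionInput): CodexSessionArgs {
  if (input.sessionId !== undefined) {
    // Gateway-generated IDs are bookkeeping only; Codex cannot resume them.
    if (input.sessionId.indexOf("gw-") === 0) {
      throw new Error("gw-* IDs are gateway-generated and cannot be resumed by Codex");
    }
    // Assumption: an explicit session ID wins over resumeLatest.
    return { mode: "resume-by-id", args: ["exec", "resume", input.sessionId] };
  }
  if (input.resumeLatest === true) {
    return { mode: "resume-latest", args: ["exec", "resume", "--last"] };
  }
  // Assumption: a new session is a plain `codex exec` invocation.
  return { mode: "new", args: ["exec"] };
}

// `codex exec resume` does not accept --full-auto, so the flag is only
// appended for new sessions and dropped on either resume mode.
function withFullAuto(resolved: CodexSessionArgs, fullAuto: boolean): string[] {
  if (fullAuto && resolved.mode === "new") {
    return resolved.args.concat(["--full-auto"]);
  }
  return resolved.args;
}
```

Returning a discriminant rather than a flag pair lets the caller branch once on `mode`, which suits Codex because resuming is a subcommand (`exec resume`) and not an option on `exec`.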
package/README.md CHANGED

@@ -3,12 +3,12 @@
 > *"Without consultation, plans are frustrated, but with many counselors they succeed."*
 > — Proverbs 15:22 (LSB)
-A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, and Gemini CLIs with session management, retry logic, and async job orchestration.
+A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, Gemini, and Grok CLIs with session management, retry logic, and async job orchestration.
 ## Features
 ### Core Capabilities
-- **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, and Gemini CLIs
+- **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, Gemini, and Grok CLIs
 - **Session Management**: Track and resume conversations across all CLIs with persistent storage
 - **Token Optimization**: Automatic 44% reduction on prompts, 37% on responses (opt-in)
 - **Correlation ID Tracking**: Full request tracing across all LLM interactions
@@ -56,6 +56,13 @@ npm install -g @google/gemini-cli
 # Or: https://github.com/google-gemini/gemini-cli
 ```
+### Grok CLI (xAI)
+```bash
+npm install -g grok-build
+grok login   # OAuth flow, or set GROK_CODE_XAI_API_KEY
+# Docs: https://docs.x.ai/build/cli
+```
 ## Installation
 ### As an MCP server (npm)
@@ -205,8 +212,54 @@ Execute a Gemini CLI request with session support.
 }
 ```
-##### `claude_request_async` / `codex_request_async`
-Start a long-running Claude or Codex request without waiting for completion in the same MCP call.
+##### `grok_request`
+Execute a Grok CLI (xAI) request with session support.
+**Parameters:**
+- `prompt` (string, required): The prompt to send (1-100,000 chars)
+- `model` (string, optional): Model name or alias (e.g. `grok-build`, `latest`)
+- `outputFormat` (string, optional): `"plain"` (default), `"json"`, or `"streaming-json"`
+- `sessionId` (string, optional): Session ID to resume (`--resume <id>`)
+- `resumeLatest` (boolean, optional): Resume the most recent session in the current cwd (`--continue`)
+- `createNewSession` (boolean, optional): Always create a new session
+- `alwaysApprove` (boolean, optional): Auto-approve all tool executions (`--always-approve`) in legacy mode
+- `permissionMode` (string, optional): `default|acceptEdits|auto|dontAsk|bypassPermissions|plan`
+- `effort` (string, optional): `low|medium|high|xhigh|max`
+- `reasoningEffort` (string, optional): Reasoning effort for reasoning models
+- `approvalStrategy` (string, optional): `"legacy"` (default) or `"mcp_managed"`
+- `approvalPolicy` (string, optional): `"strict"`, `"balanced"`, or `"permissive"`
+- `mcpServers` (string[], optional): MCP server names tracked for approvals (Grok manages its own MCP config via `grok mcp`)
+- `allowedTools` (string[], optional): Allowed built-in tools (passed as `--tools` comma list)
+- `disallowedTools` (string[], optional): Disallowed built-in tools (passed as `--disallowed-tools` comma list)
+- `optimizePrompt` (boolean, optional): Optimize prompt for token efficiency, default: false
+- `optimizeResponse` (boolean, optional): Optimize response for token efficiency, default: false
+- `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+**Example:**
+```json
+{
+  "prompt": "Summarize the latest commit message in 1 sentence",
+  "model": "grok-build",
+  "effort": "low"
+}
+```
+#### Durable job results & automatic dedup
+Every async job is persisted to a `jobs` table in `~/.llm-cli-gateway/logs.db` as it transitions through running → completed/failed/canceled. This makes the gateway a durable collection layer:
+- **Re-issuing a request is safe.** Identical `*_request` / `*_request_async` calls within the dedup window (default 1 hour) short-circuit onto the existing running or completed job — the caller gets back the same job ID instead of starting a duplicate run. This directly fixes the "agent times out polling, re-issues, and the whole job starts over" failure mode.
+- **`llm_job_status` and `llm_job_result` work across gateway restarts.** Job rows live for 30 days by default; callers can fetch results long after the in-memory cache has evicted them.
+- **Jobs running at shutdown are marked `orphaned`** on the next gateway boot (the detached child can't be reattached to). Their captured partial output remains readable.
+- **Pass `forceRefresh: true`** on any request tool to bypass dedup and force a fresh CLI run.
+Environment variables:
+- `LLM_GATEWAY_JOB_RETENTION_DAYS` — how long completed jobs stay queryable. Default `30`.
+- `LLM_GATEWAY_DEDUP_WINDOW_MS` — how recent an existing job must be to dedup against. Default `3600000` (1 hour). Set `0` to disable dedup.
+- `LLM_GATEWAY_JOBS_DB` — override the sqlite path. Defaults to the value of `LLM_GATEWAY_LOGS_DB`, then `~/.llm-cli-gateway/logs.db`. Set to `none` to disable durability entirely (in-memory only).
+##### `claude_request_async` / `codex_request_async` / `gemini_request_async` / `grok_request_async`
+Start a long-running Claude, Codex, Gemini, or Grok request without waiting for completion in the same MCP call.
 Use this flow when analysis/runtime can exceed client tool-call limits:
 1. Start job with `*_request_async`
@@ -244,7 +297,7 @@ Approval records are persisted to `~/.llm-cli-gateway/approvals.jsonl`.
 Create a new session for a specific CLI.
 **Parameters:**
-- `cli` (string, required): CLI to create session for ("claude", "codex", "gemini")
+- `cli` (string, required): CLI to create session for ("claude", "codex", "gemini", "grok")
 - `description` (string, optional): Description for the session
 - `setAsActive` (boolean, optional): Set as active session, default: true
@@ -261,7 +314,7 @@ Create a new session for a specific CLI.
 List all sessions, optionally filtered by CLI.
 **Parameters:**
-- `cli` (string, optional): Filter by CLI ("claude", "codex", "gemini")
+- `cli` (string, optional): Filter by CLI ("claude", "codex", "gemini", "grok")
 **Response includes:**
 - Total session count
@@ -299,12 +352,74 @@ Clear all sessions, optionally for a specific CLI.
 List available models for each CLI.
 **Parameters:**
-- `cli` (string, optional): Specific CLI to list models for ("claude", "codex", "gemini")
+- `cli` (string, optional): Specific CLI to list models for ("claude", "codex", "gemini", "grok")
 **Response includes:**
 - Model names and descriptions
 - Best use cases for each model
 - CLI-specific information
+- `defaultModel` and `defaultModelSource` when a default is explicitly configured
+- `modelMetadata` with source/confidence (`fallback`, `config`, `env`, `observed`)
+- `aliases` and `warnings` when configured or when discovery degrades gracefully
+The registry treats explicit configuration as authoritative. Bundled fallback models are low-confidence hints, and Gemini models observed in local session history are merged as low-confidence entries only; they do not become the default model.
+Model registry environment overrides:
+```bash
+# Explicit defaults
+CLAUDE_DEFAULT_MODEL=haiku
+CODEX_DEFAULT_MODEL=<codex-model-id>
+GEMINI_DEFAULT_MODEL=gemini-2.5-flash
+# Additional models: comma/newline list, JSON array, or JSON object of model->description
+GEMINI_MODELS='{"gemini-team-default":"Team-approved Gemini model"}'
+# Aliases
+GEMINI_MODEL_ALIASES='team=gemini-team-default'
+LLM_GATEWAY_MODEL_ALIASES='codex.fast=gpt-5.3-codex-spark,gemini.fast=gemini-team-default'
+# Deterministic config/discovery paths
+CODEX_CONFIG_PATH=/path/to/config.toml
+CLAUDE_SETTINGS_PATH=/path/to/settings.json
+CLAUDE_SETTINGS_LOCAL_PATH=/path/to/settings.local.json
+GEMINI_SETTINGS_PATH=/path/to/settings.json
+GEMINI_HISTORY_ROOT=/path/to/.gemini/tmp
+# Disable local model-history discovery
+LLM_GATEWAY_DISABLE_MODEL_DISCOVERY=1
+```
+##### `cli_versions`
+Report installed CLI versions.
+**Parameters:**
+- `cli` (string, optional): Specific CLI to inspect ("claude", "codex", "gemini", "grok")
+##### `cli_upgrade`
+Plan or run an upgrade for one CLI.
+**Parameters:**
+- `cli` (string, required): CLI to upgrade ("claude", "codex", "gemini", "grok")
+- `target` (string, optional): Package tag/version/target, default: `latest`
+- `dryRun` (boolean, optional): Return the upgrade plan without running it, default: `true`
+- `timeoutMs` (number, optional): Upgrade timeout when `dryRun=false`
+**Upgrade strategies:**
+- Claude latest: `claude update`
+- Claude explicit target: `claude install <target>`
+- Codex latest: `codex update`
+- Codex explicit target: `npm install -g @openai/codex@<target>`
+- Gemini: `npm install -g @google/gemini-cli@<target>`
+**Example dry run:**
+```json
+{
+  "cli": "gemini",
+  "target": "latest",
+  "dryRun": true
+}
+```
 ## Session Management
@@ -572,4 +687,3 @@ For issues and questions:
 ## Changelog
 See [CHANGELOG.md](CHANGELOG.md) for detailed release history.
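
The README's dedup rules (match on an identical request, only against running or completed jobs, only inside the window, with `forceRefresh` and a window of `0` disabling the check) can be sketched as a pure lookup. This is an illustration under assumptions, not the gateway's code: `JobRow`, `buildRequestKeySketch`, `parseDedupWindowMs`, and `findDedupCandidate` are hypothetical names, the JSON-based key stands in for whatever the private `buildRequestKey` really computes, and falling back to the default on an unparsable value is a guess.

```typescript
type LlmCli = "claude" | "codex" | "gemini" | "grok";
type JobStatus = "running" | "completed" | "failed" | "canceled" | "orphaned";

// Hypothetical row shape; the real `jobs` table schema is not shown here.
interface JobRow {
  id: string;
  requestKey: string;
  status: JobStatus;
  startedAtMs: number;
}

const DEFAULT_DEDUP_WINDOW_MS = 3600000; // 1 hour, per the README

// Parse LLM_GATEWAY_DEDUP_WINDOW_MS; "0" disables dedup.
// Assumption: invalid or negative values fall back to the default.
function parseDedupWindowMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_DEDUP_WINDOW_MS;
  const n = Number(raw);
  return isFinite(n) && n >= 0 ? n : DEFAULT_DEDUP_WINDOW_MS;
}

// Assumption: the key is derived from (cli, args) only, so re-issuing the
// same request yields the same key regardless of correlation ID.
function buildRequestKeySketch(cli: LlmCli, args: string[]): string {
  return JSON.stringify([cli].concat(args));
}

// Return the most recent matching job, or null when a fresh run is needed.
function findDedupCandidate(
  rows: JobRow[],
  key: string,
  nowMs: number,
  windowMs: number,
  forceRefresh: boolean
): JobRow | null {
  if (forceRefresh || windowMs === 0) return null;
  let best: JobRow | null = null;
  for (const row of rows) {
    if (row.requestKey !== key) continue;
    // Failed, canceled, and orphaned jobs are never reused.
    if (row.status !== "running" && row.status !== "completed") continue;
    if (nowMs - row.startedAtMs > windowMs) continue;
    if (best === null || row.startedAtMs > best.startedAtMs) best = row;
  }
  return best;
}
```

A caller would compute the key, call `findDedupCandidate`, and return the existing job ID on a hit or spawn a new CLI process on `null`.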

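The `cli_upgrade` strategies listed in the README, together with the Grok commands given in the changelog (`grok update` / `grok update --version <target>`), amount to a small lookup from CLI and target to a command line. The sketch below illustrates that mapping only. `planUpgradeCommand` and `UpgradeCli` are hypothetical names, and the actual planner in `cli-updater.js` may build its plan differently.

```typescript
type UpgradeCli = "claude" | "codex" | "gemini" | "grok";

// Map a CLI and target to the argv the README and changelog describe.
// Hypothetical helper; not the gateway's own function.
function planUpgradeCommand(cli: UpgradeCli, target: string = "latest"): string[] {
  const latest = target === "latest";
  switch (cli) {
    case "claude":
      return latest ? ["claude", "update"] : ["claude", "install", target];
    case "codex":
      return latest
        ? ["codex", "update"]
        : ["npm", "install", "-g", "@openai/codex@" + target];
    case "gemini":
      // Gemini always goes through npm, including for `latest`.
      return ["npm", "install", "-g", "@google/gemini-cli@" + target];
    case "grok":
      // Grok is treated as self-updating.
      return latest ? ["grok", "update"] : ["grok", "update", "--version", target];
  }
}
```

With `dryRun: true` (the documented default), a tool built this way would return the planned command without executing it.
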
package/dist/approval-manager.d.ts CHANGED

@@ -2,7 +2,7 @@ import type { Logger } from "./logger.js";
 import type { ReviewIntegrityResult } from "./review-integrity.js";
 export type ApprovalPolicy = "strict" | "balanced" | "permissive";
 export type ApprovalStrategy = "legacy" | "mcp_managed";
-export type ApprovalCli = "claude" | "codex" | "gemini";
+export type ApprovalCli = "claude" | "codex" | "gemini" | "grok";
 export type ApprovalStatus = "approved" | "denied";
 export interface ApprovalRequest {
     cli: ApprovalCli;

package/dist/async-job-manager.d.ts CHANGED

@@ -1,7 +1,8 @@
 import type { Logger } from "./logger.js";
 import { type JobHealth } from "./process-monitor.js";
-export type LlmCli = "claude" | "codex" | "gemini";
-export type AsyncJobStatus = "running" | "completed" | "failed" | "canceled";
+import { JobStore } from "./job-store.js";
+export type LlmCli = "claude" | "codex" | "gemini" | "grok";
+export type AsyncJobStatus = "running" | "completed" | "failed" | "canceled" | "orphaned";
 export interface AsyncJobSnapshot {
     id: string;
     cli: LlmCli;
@@ -22,16 +23,64 @@ export interface AsyncJobResult extends AsyncJobSnapshot {
     stdoutTruncated: boolean;
     stderrTruncated: boolean;
 }
+export interface StartJobOptions {
+    cwd?: string;
+    idleTimeoutMs?: number;
+    outputFormat?: string;
+    /** Bypass dedup and force a fresh CLI run even if a recent matching job exists. */
+    forceRefresh?: boolean;
+}
+export interface StartJobOutcome {
+    snapshot: AsyncJobSnapshot;
+    /** Set to the existing job's id when the request was de-duplicated. */
+    deduped: boolean;
+    /** Set when deduped — the original job's correlation id, useful for logging. */
+    originalCorrelationId?: string;
+}
 export declare class AsyncJobManager {
     private logger;
     private onJobComplete?;
     private jobs;
     private evictionTimer;
     private processMonitor;
-    constructor(logger?: Logger, onJobComplete?: ((cli: LlmCli, durationMs: number, success: boolean) => void) | undefined);
+    private store;
+    constructor(logger?: Logger, onJobComplete?: ((cli: LlmCli, durationMs: number, success: boolean) => void) | undefined, store?: JobStore | null);
     private emitMetrics;
     private evictCompletedJobs;
-    startJob(cli: LlmCli, args: string[], correlationId: string, cwd?: string, idleTimeoutMs?: number, outputFormat?: string): AsyncJobSnapshot;
+    /**
+     * Compute the dedup key for a job. Stable across re-issues of the same request,
+     * which is exactly what allows agents to safely retry without restarting the run.
+     */
+    private buildRequestKey;
+    private safeStoreCall;
+    /**
+     * Flush in-memory stdout/stderr to the durable store if anything changed
+     * since the last flush. Throttled by OUTPUT_FLUSH_INTERVAL_MS to avoid
+     * pounding sqlite on every chunk of streaming output.
+     */
+    private maybeFlushOutput;
+    private persistComplete;
+    /**
+     * Reconstitute an in-memory AsyncJobRecord from a durable row, so subsequent
+     * getJobSnapshot/getJobResult calls hit the in-memory cache.
+     * The reconstituted record has process=null — it represents historical data only.
+     */
+    private hydrateFromStore;
+    /**
+     * Backwards-compatible entry point. Equivalent to startJobWithDedup({...}).snapshot.
+     * Existing callers keep working unchanged; forceRefresh is exposed as a trailing
+     * optional param for the dedup-aware path.
+     */
+    startJob(cli: LlmCli, args: string[], correlationId: string, cwd?: string, idleTimeoutMs?: number, outputFormat?: string, forceRefresh?: boolean): AsyncJobSnapshot;
+    /**
+     * Start a job, with optional dedup against recent identical requests.
+     * Returns `{ snapshot, deduped }` so callers can log/report the short-circuit.
+     *
+     * Dedup is keyed on (cli, args). If a job with the same key was started within
+     * the dedup window (default 1h) and is still running or completed, its snapshot
+     * is returned without spawning a new process. forceRefresh skips dedup entirely.
+     */
+    startJobWithDedup(cli: LlmCli, args: string[], correlationId: string, opts?: StartJobOptions): StartJobOutcome;
     getJobSnapshot(jobId: string): AsyncJobSnapshot | null;
     getJobResult(jobId: string, maxChars?: number): AsyncJobResult | null;
     cancelJob(jobId: string): {