llm-cli-gateway 1.0.1 → 1.4.0
- package/CHANGELOG.md +42 -0
- package/README.md +153 -9
- package/dist/approval-manager.d.ts +1 -1
- package/dist/approval-manager.js +7 -4
- package/dist/async-job-manager.d.ts +53 -4
- package/dist/async-job-manager.js +254 -27
- package/dist/claude-mcp-config.js +7 -4
- package/dist/cli-updater.d.ts +38 -0
- package/dist/cli-updater.js +145 -0
- package/dist/config.js +15 -9
- package/dist/db.js +4 -4
- package/dist/executor.js +20 -13
- package/dist/flight-recorder.d.ts +48 -0
- package/dist/flight-recorder.js +220 -0
- package/dist/health.js +3 -3
- package/dist/index.d.ts +28 -0
- package/dist/index.js +1456 -278
- package/dist/job-store.d.ts +84 -0
- package/dist/job-store.js +251 -0
- package/dist/logger.js +1 -1
- package/dist/metrics.js +9 -12
- package/dist/migrate-sessions.js +2 -2
- package/dist/model-registry.d.ts +14 -0
- package/dist/model-registry.js +448 -140
- package/dist/optimizer.js +9 -9
- package/dist/process-monitor.js +24 -8
- package/dist/request-helpers.d.ts +48 -0
- package/dist/request-helpers.js +64 -2
- package/dist/resources.js +76 -32
- package/dist/retry.js +6 -4
- package/dist/review-integrity.d.ts +6 -38
- package/dist/review-integrity.js +41 -275
- package/dist/session-manager-pg.js +7 -4
- package/dist/session-manager.d.ts +1 -1
- package/dist/session-manager.js +9 -5
- package/dist/stream-json-parser.js +8 -6
- package/package.json +7 -4
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,48 @@

 All notable changes to the llm-cli-gateway project.

+## Unreleased
+
+## [1.4.0] - 2026-05-16
+
+### Added
+
+- **Codex `exec resume` wired through the gateway** — `codex_request` and `codex_request_async` now accept `sessionId` (real Codex session UUID from `~/.codex/sessions/` or the `codex resume` picker) and `resumeLatest:true`, emitting `codex exec resume <UUID>` and `codex exec resume --last` respectively. Codex sessions are no longer bookkeeping-only at the gateway layer; multi-turn workflows carry real CLI continuity, matching Claude/Gemini/Grok. Gateway-generated `gw-*` IDs are rejected for Codex (as for Gemini/Grok). `--full-auto` is silently dropped on resume because `codex exec resume` does not accept it — the original session's approval policy is inherited.
+- **Durable job results + automatic dedup** — Async jobs are now persisted to a `jobs` table in `~/.llm-cli-gateway/logs.db` on every state transition (start, output flush, completion). `llm_job_status` and `llm_job_result` fall back to the database when the job is no longer in memory, so callers can collect a result regardless of how long ago the work completed (default retention: **30 days**, configurable via `LLM_GATEWAY_JOB_RETENTION_DAYS`). Identical `*_request` / `*_request_async` calls within a dedup window (default **1 hour**, configurable via `LLM_GATEWAY_DEDUP_WINDOW_MS`) short-circuit onto the existing running or completed job instead of spawning a duplicate run — directly fixing the "agent re-issues and the whole job starts over" loop. Each tool now accepts `forceRefresh: true` to bypass dedup. Jobs that were running when the gateway last stopped are flipped to `orphaned` on startup so callers can still read their partial output.
+- **Grok CLI provider (xAI Grok Build TUI)** — New `grok_request` and `grok_request_async` MCP tools mirror the existing Claude/Codex/Gemini surface (sync + async, session management via `--resume`/`--continue`, idle-timeout, approval policy, review-integrity, flight recorder, metrics). Auth assumes a prior `grok login` (OAuth) or `GROK_CODE_XAI_API_KEY`. Default model: `grok-build`. `GROK_DEFAULT_MODEL`, `GROK_MODELS`, and `GROK_MODEL_ALIASES` env vars are honored by the model registry. `cli_upgrade` treats Grok as self-updating (`grok update` / `grok update --version <target>`).
+- **Source-aware model registry** — `list_models` now reports model source/confidence metadata, aliases, default model source, and non-fatal discovery warnings
+- **Deterministic model configuration overrides** — Added `*_SETTINGS_PATH`, `GEMINI_HISTORY_ROOT`, `*_MODEL_ALIASES`, and `LLM_GATEWAY_MODEL_ALIASES` support for stable deployments and tests
+- **CLI lifecycle tools** — Added `cli_versions` and `cli_upgrade` tools for inspecting and upgrading individual Claude, Codex, Gemini, and Grok CLI installations
+- **`resolveCodexSessionArgs` helper** in `src/request-helpers.ts` with 7 new tests covering mode resolution and `gw-*` rejection (Codex uses an `exec resume` subcommand rather than a flag pair, so the helper returns a `mode` discriminant: `new` | `resume-by-id` | `resume-latest`)
+
+### Changed
+
+- **`better-sqlite3` bumped to `^12.9.0`** (from `^11.0.0`) — required engines now `node 20.x || 22.x || 23.x || 24.x || 25.x`
+- **Gemini history discovery is no longer authoritative** — Models observed in local Gemini session files are merged as low-confidence entries and no longer replace the registry or set the default model
+- **Codex default handling remains explicit** — If Codex has no configured default, `default`/`latest` resolve to no model flag so the Codex CLI can use its own built-in default
+- **Gateway skills refreshed** — The `.agents/skills/` (async-job-orchestration, implement-review-fix, multi-llm-review, secure-orchestration, session-workflow) and `skills/` (multi-llm-orchestration, multi-llm-consensus, model-routing, design-review-cycle, agent-codex-gate, codex-review-gate, red-team-assessment) skill docs now cover Grok, durable job results, auto-dedup, and the new Codex resume capability. `.agents/skills/` entries bumped to metadata version 1.5.
+
+## [1.1.0] - 2026-04-04
+
+### Added
+
+- **SQLite flight recorder** — New `src/flight-recorder.ts` module logs all LLM requests/responses to `~/.llm-cli-gateway/logs.db` with two-phase logging (logStart/logComplete), WAL mode for concurrent Datasette reads, and graceful degradation when better-sqlite3 is unavailable
+- **`LLM_GATEWAY_LOGS_DB` env var** — Configure flight recorder database path; set to empty string or `"none"` to disable logging entirely
+- **`structuredContent` in MCP tool responses** — All tool handlers now return machine-readable metadata (model, cli, correlationId, sessionId, durationMs, token usage, exitCode) alongside the text response
+- **`better-sqlite3` dependency** — Native SQLite addon for flight recorder (synchronous writes, WAL support)
+
+### Changed
+
+- **review-integrity.ts simplified** — Reduced from 323 lines to 83 lines. Retains 3 violation types: empty_allowed_tools, critical_tools_disallowed, tool_suppression. Removed inlined_code detection and multi-pattern matching
+- **`buildCliResponse` signature** — Now requires `cli` and `durationMs` parameters for structuredContent population
+- **`createErrorResponse`** — Returns sanitized `errorCategory` enum in structuredContent instead of raw error messages (prevents path/secret leakage)
+- **Flight recorder writes are idempotent** — logComplete only updates rows with status='started', preventing double-completion
+
+### Tests
+
+- 284 tests passing (15 test files)
+- Rewritten review-integrity tests to match simplified API
+
 ## [1.3.0] - 2026-02-15

 ### Fixed
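The 1.4.0 changelog entry for `resolveCodexSessionArgs` describes a helper that returns a `mode` discriminant and rejects `gw-*` IDs. The following is a minimal sketch of that idea, not the package's actual code: the function name `resolveCodexSessionArgsSketch`, the `argv` field, the precedence of `sessionId` over `resumeLatest`, and the use of plain `exec` for a new session are all assumptions made for illustration.

```typescript
// Hypothetical sketch; the real helper in src/request-helpers.ts may differ.
type CodexSessionMode = "new" | "resume-by-id" | "resume-latest";

interface CodexSessionArgsSketch {
  mode: CodexSessionMode;
  // Leading argv passed to the codex binary (assumed shape).
  argv: string[];
}

function resolveCodexSessionArgsSketch(opts: {
  sessionId?: string;
  resumeLatest?: boolean;
}): CodexSessionArgsSketch {
  if (opts.sessionId !== undefined) {
    // Gateway-generated IDs are bookkeeping only and cannot be resumed by Codex.
    if (opts.sessionId.startsWith("gw-")) {
      throw new Error("gw-* session IDs cannot be used with codex exec resume");
    }
    // Maps to: codex exec resume <UUID>
    return { mode: "resume-by-id", argv: ["exec", "resume", opts.sessionId] };
  }
  if (opts.resumeLatest) {
    // Maps to: codex exec resume --last
    return { mode: "resume-latest", argv: ["exec", "resume", "--last"] };
  }
  // Assumption: a fresh run uses plain `codex exec`.
  return { mode: "new", argv: ["exec"] };
}
```

Returning a discriminant rather than a flag pair lets the caller decide per mode which other options to forward, for example dropping `--full-auto` in either resume mode.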
package/README.md
CHANGED
@@ -3,17 +3,21 @@
 > *"Without consultation, plans are frustrated, but with many counselors they succeed."*
 > — Proverbs 15:22 (LSB)

-A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, and
+A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, Gemini, and Grok CLIs with session management, retry logic, and async job orchestration.

 ## Features

 ### Core Capabilities
-- **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, and
+- **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, Gemini, and Grok CLIs
 - **Session Management**: Track and resume conversations across all CLIs with persistent storage
 - **Token Optimization**: Automatic 44% reduction on prompts, 37% on responses (opt-in)
 - **Correlation ID Tracking**: Full request tracing across all LLM interactions
 - **Cross-Tool Collaboration**: LLMs can use each other via MCP (validated through dogfooding)

+### Observability
+- **SQLite Flight Recorder**: Every request/response logged to `~/.llm-cli-gateway/logs.db` with correlation IDs, token usage, duration, retry counts, and circuit breaker state. Browse with [Datasette](https://datasette.io/): `datasette ~/.llm-cli-gateway/logs.db`
+- **Structured Metadata**: Tool responses include machine-readable `structuredContent` (model, cli, correlationId, sessionId, durationMs, token counts)
+
 ### Reliability & Performance
 - **Retry Logic**: Exponential backoff with circuit breaker for transient failures
 - **Atomic File Writes**: Process-specific temp files with fsync for data integrity
@@ -22,7 +26,7 @@ A Model Context Protocol (MCP) server providing unified access to Claude Code, C
 - **Long-Running Jobs**: Non-time-bound async execution via `*_request_async` + polling tools

 ### Security & Quality
-- **Comprehensive Testing**:
+- **Comprehensive Testing**: 284 tests covering unit, integration, and regression scenarios
 - **Input Validation**: Zod schemas prevent injection attacks
 - **No Secret Leakage**: Generic session descriptions only (file permissions 0o600)
 - **No ReDoS**: Bounded regex patterns prevent catastrophic backtracking
@@ -52,6 +56,13 @@ npm install -g @google/gemini-cli
 # Or: https://github.com/google-gemini/gemini-cli
 ```

+### Grok CLI (xAI)
+```bash
+npm install -g grok-build
+grok login # OAuth flow, or set GROK_CODE_XAI_API_KEY
+# Docs: https://docs.x.ai/build/cli
+```
+
 ## Installation

 ### As an MCP server (npm)
@@ -201,8 +212,54 @@ Execute a Gemini CLI request with session support.
 }
 ```

-##### `
-
+##### `grok_request`
+Execute a Grok CLI (xAI) request with session support.
+
+**Parameters:**
+- `prompt` (string, required): The prompt to send (1-100,000 chars)
+- `model` (string, optional): Model name or alias (e.g. `grok-build`, `latest`)
+- `outputFormat` (string, optional): `"plain"` (default), `"json"`, or `"streaming-json"`
+- `sessionId` (string, optional): Session ID to resume (`--resume <id>`)
+- `resumeLatest` (boolean, optional): Resume the most recent session in the current cwd (`--continue`)
+- `createNewSession` (boolean, optional): Always create a new session
+- `alwaysApprove` (boolean, optional): Auto-approve all tool executions (`--always-approve`) in legacy mode
+- `permissionMode` (string, optional): `default|acceptEdits|auto|dontAsk|bypassPermissions|plan`
+- `effort` (string, optional): `low|medium|high|xhigh|max`
+- `reasoningEffort` (string, optional): Reasoning effort for reasoning models
+- `approvalStrategy` (string, optional): `"legacy"` (default) or `"mcp_managed"`
+- `approvalPolicy` (string, optional): `"strict"`, `"balanced"`, or `"permissive"`
+- `mcpServers` (string[], optional): MCP server names tracked for approvals (Grok manages its own MCP config via `grok mcp`)
+- `allowedTools` (string[], optional): Allowed built-in tools (passed as `--tools` comma list)
+- `disallowedTools` (string[], optional): Disallowed built-in tools (passed as `--disallowed-tools` comma list)
+- `optimizePrompt` (boolean, optional): Optimize prompt for token efficiency, default: false
+- `optimizeResponse` (boolean, optional): Optimize response for token efficiency, default: false
+- `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+
+**Example:**
+```json
+{
+  "prompt": "Summarize the latest commit message in 1 sentence",
+  "model": "grok-build",
+  "effort": "low"
+}
+```
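The `grok_request` parameter list above names the CLI flags that several options map to. A minimal sketch of that mapping follows, restricted to the flags the README states explicitly (`--resume`, `--continue`, `--always-approve`, `--tools`, `--disallowed-tools`). The function name `buildGrokFlagsSketch` and the precedence rules (`createNewSession` wins over resume options, `sessionId` wins over `resumeLatest`) are assumptions, and the gateway's real argument builder may order or gate these differently.

```typescript
// Hypothetical sketch of the option-to-flag mapping described in the README.
interface GrokFlagOptionsSketch {
  sessionId?: string;
  resumeLatest?: boolean;
  createNewSession?: boolean;
  alwaysApprove?: boolean;
  allowedTools?: string[];
  disallowedTools?: string[];
}

function buildGrokFlagsSketch(o: GrokFlagOptionsSketch): string[] {
  const flags: string[] = [];
  // Assumption: an explicit request for a new session suppresses resume flags.
  if (!o.createNewSession) {
    if (o.sessionId) {
      flags.push("--resume", o.sessionId);
    } else if (o.resumeLatest) {
      flags.push("--continue");
    }
  }
  if (o.alwaysApprove) {
    flags.push("--always-approve");
  }
  // Tool lists are passed as a single comma-separated value.
  if (o.allowedTools && o.allowedTools.length > 0) {
    flags.push("--tools", o.allowedTools.join(","));
  }
  if (o.disallowedTools && o.disallowedTools.length > 0) {
    flags.push("--disallowed-tools", o.disallowedTools.join(","));
  }
  return flags;
}
```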
+
+#### Durable job results & automatic dedup
+
+Every async job is persisted to a `jobs` table in `~/.llm-cli-gateway/logs.db` as it transitions through running → completed/failed/canceled. This makes the gateway a durable collection layer:
+
+- **Re-issuing a request is safe.** Identical `*_request` / `*_request_async` calls within the dedup window (default 1 hour) short-circuit onto the existing running or completed job — the caller gets back the same job ID instead of starting a duplicate run. This directly fixes the "agent times out polling, re-issues, and the whole job starts over" failure mode.
+- **`llm_job_status` and `llm_job_result` work across gateway restarts.** Job rows live for 30 days by default; callers can fetch results long after the in-memory cache has evicted them.
+- **Jobs running at shutdown are marked `orphaned`** on the next gateway boot (the detached child can't be reattached to). Their captured partial output remains readable.
+- **Pass `forceRefresh: true`** on any request tool to bypass dedup and force a fresh CLI run.
+
+Environment variables:
+- `LLM_GATEWAY_JOB_RETENTION_DAYS` — how long completed jobs stay queryable. Default `30`.
+- `LLM_GATEWAY_DEDUP_WINDOW_MS` — how recent an existing job must be to dedup against. Default `3600000` (1 hour). Set `0` to disable dedup.
+- `LLM_GATEWAY_JOBS_DB` — override the sqlite path. Defaults to the value of `LLM_GATEWAY_LOGS_DB`, then `~/.llm-cli-gateway/logs.db`. Set to `none` to disable durability entirely (in-memory only).
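The dedup behaviour described above can be sketched as a key derived from the request plus a lookup bounded by the dedup window. This is an illustration under stated assumptions, not the gateway's implementation: the names `buildRequestKeySketch`, `findDedupCandidateSketch`, and `JobRowSketch` are hypothetical, the key here is a plain JSON encoding of `(cli, args)` where the real code may hash or normalize it, and rows are held in an array rather than the `jobs` table.

```typescript
// Hypothetical in-memory sketch of dedup against recent identical requests.
type JobStatusSketch = "running" | "completed" | "failed" | "canceled" | "orphaned";

interface JobRowSketch {
  id: string;
  requestKey: string;
  status: JobStatusSketch;
  startedAtMs: number;
}

// Stable across re-issues of the same request: same cli and args give the same key.
function buildRequestKeySketch(cli: string, args: string[]): string {
  return JSON.stringify([cli, args]);
}

function findDedupCandidateSketch(
  rows: JobRowSketch[],
  requestKey: string,
  nowMs: number,
  dedupWindowMs: number,
  forceRefresh = false
): JobRowSketch | undefined {
  // forceRefresh bypasses dedup; a window of 0 disables it.
  if (forceRefresh || dedupWindowMs <= 0) {
    return undefined;
  }
  return rows.find(
    (row) =>
      row.requestKey === requestKey &&
      (row.status === "running" || row.status === "completed") &&
      nowMs - row.startedAtMs <= dedupWindowMs
  );
}
```

Only running or completed jobs qualify, so a failed, canceled, or orphaned job never blocks a retry from starting a fresh run.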
+
+##### `claude_request_async` / `codex_request_async` / `gemini_request_async` / `grok_request_async`
+Start a long-running Claude, Codex, Gemini, or Grok request without waiting for completion in the same MCP call.

 Use this flow when analysis/runtime can exceed client tool-call limits:
 1. Start job with `*_request_async`
@@ -240,7 +297,7 @@ Approval records are persisted to `~/.llm-cli-gateway/approvals.jsonl`.
 Create a new session for a specific CLI.

 **Parameters:**
-- `cli` (string, required): CLI to create session for ("claude", "codex", "gemini")
+- `cli` (string, required): CLI to create session for ("claude", "codex", "gemini", "grok")
 - `description` (string, optional): Description for the session
 - `setAsActive` (boolean, optional): Set as active session, default: true

@@ -257,7 +314,7 @@ Create a new session for a specific CLI.
 List all sessions, optionally filtered by CLI.

 **Parameters:**
-- `cli` (string, optional): Filter by CLI ("claude", "codex", "gemini")
+- `cli` (string, optional): Filter by CLI ("claude", "codex", "gemini", "grok")

 **Response includes:**
 - Total session count
@@ -295,12 +352,74 @@ Clear all sessions, optionally for a specific CLI.
 List available models for each CLI.

 **Parameters:**
-- `cli` (string, optional): Specific CLI to list models for ("claude", "codex", "gemini")
+- `cli` (string, optional): Specific CLI to list models for ("claude", "codex", "gemini", "grok")

 **Response includes:**
 - Model names and descriptions
 - Best use cases for each model
 - CLI-specific information
+- `defaultModel` and `defaultModelSource` when a default is explicitly configured
+- `modelMetadata` with source/confidence (`fallback`, `config`, `env`, `observed`)
+- `aliases` and `warnings` when configured or when discovery degrades gracefully
+
+The registry treats explicit configuration as authoritative. Bundled fallback models are low-confidence hints, and Gemini models observed in local session history are merged as low-confidence entries only; they do not become the default model.
+
+Model registry environment overrides:
+
+```bash
+# Explicit defaults
+CLAUDE_DEFAULT_MODEL=haiku
+CODEX_DEFAULT_MODEL=<codex-model-id>
+GEMINI_DEFAULT_MODEL=gemini-2.5-flash
+
+# Additional models: comma/newline list, JSON array, or JSON object of model->description
+GEMINI_MODELS='{"gemini-team-default":"Team-approved Gemini model"}'
+
+# Aliases
+GEMINI_MODEL_ALIASES='team=gemini-team-default'
+LLM_GATEWAY_MODEL_ALIASES='codex.fast=gpt-5.3-codex-spark,gemini.fast=gemini-team-default'
+
+# Deterministic config/discovery paths
+CODEX_CONFIG_PATH=/path/to/config.toml
+CLAUDE_SETTINGS_PATH=/path/to/settings.json
+CLAUDE_SETTINGS_LOCAL_PATH=/path/to/settings.local.json
+GEMINI_SETTINGS_PATH=/path/to/settings.json
+GEMINI_HISTORY_ROOT=/path/to/.gemini/tmp
+
+# Disable local model-history discovery
+LLM_GATEWAY_DISABLE_MODEL_DISCOVERY=1
+```
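The alias variables above use an `alias=model` syntax, with `LLM_GATEWAY_MODEL_ALIASES` entries prefixed by a CLI name (`codex.fast`, `gemini.fast`). A rough sketch of how such strings could be parsed and applied is shown here. Everything in it is inferred from the two example values only: the function names are hypothetical, treating the `cli.` prefix as a per-CLI scope is an assumption, and so is the precedence of per-CLI aliases over gateway-wide ones. The real registry may accept other formats.

```typescript
// Hypothetical parser for comma- or newline-separated "alias=model" entries.
function parseAliasListSketch(raw: string): Map<string, string> {
  const aliases = new Map<string, string>();
  for (const entry of raw.split(/[,\n]/)) {
    const eq = entry.indexOf("=");
    if (eq <= 0) {
      continue; // skip blank or malformed entries
    }
    const alias = entry.slice(0, eq).trim();
    const model = entry.slice(eq + 1).trim();
    if (alias !== "" && model !== "") {
      aliases.set(alias, model);
    }
  }
  return aliases;
}

// Assumed lookup order: per-CLI alias, then "<cli>.<alias>" in the gateway-wide
// map, then the requested name unchanged.
function resolveAliasSketch(
  cli: string,
  requested: string,
  perCli: Map<string, string>,
  gatewayWide: Map<string, string>
): string {
  return perCli.get(requested) ?? gatewayWide.get(`${cli}.${requested}`) ?? requested;
}
```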
+
+##### `cli_versions`
+Report installed CLI versions.
+
+**Parameters:**
+- `cli` (string, optional): Specific CLI to inspect ("claude", "codex", "gemini", "grok")
+
+##### `cli_upgrade`
+Plan or run an upgrade for one CLI.
+
+**Parameters:**
+- `cli` (string, required): CLI to upgrade ("claude", "codex", "gemini", "grok")
+- `target` (string, optional): Package tag/version/target, default: `latest`
+- `dryRun` (boolean, optional): Return the upgrade plan without running it, default: `true`
+- `timeoutMs` (number, optional): Upgrade timeout when `dryRun=false`
+
+**Upgrade strategies:**
+- Claude latest: `claude update`
+- Claude explicit target: `claude install <target>`
+- Codex latest: `codex update`
+- Codex explicit target: `npm install -g @openai/codex@<target>`
+- Gemini: `npm install -g @google/gemini-cli@<target>`
+
+**Example dry run:**
+```json
+{
+  "cli": "gemini",
+  "target": "latest",
+  "dryRun": true
+}
+```
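The upgrade strategies listed above amount to a lookup from `(cli, target)` to a command line. The sketch below encodes that table as a pure function, which is roughly what a dry run would report. The name `planUpgradeSketch` and the argv-array return shape are assumptions; the Grok commands come from the 1.4.0 changelog entry (`grok update` / `grok update --version <target>`) rather than from this README list.

```typescript
// Hypothetical planner: returns the command a dry run would describe.
type UpgradeCliSketch = "claude" | "codex" | "gemini" | "grok";

function planUpgradeSketch(cli: UpgradeCliSketch, target: string = "latest"): string[] {
  switch (cli) {
    case "claude":
      return target === "latest" ? ["claude", "update"] : ["claude", "install", target];
    case "codex":
      return target === "latest"
        ? ["codex", "update"]
        : ["npm", "install", "-g", `@openai/codex@${target}`];
    case "gemini":
      // Gemini always goes through npm, including for "latest".
      return ["npm", "install", "-g", `@google/gemini-cli@${target}`];
    case "grok":
      return target === "latest"
        ? ["grok", "update"]
        : ["grok", "update", "--version", target];
  }
}
```

Keeping the plan as an argv array rather than a shell string avoids quoting problems if a caller later passes it to a process spawner.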

 ## Session Management

@@ -360,6 +479,13 @@ await callTool("session_delete", {
 ```bash
 LLM_GATEWAY_APPROVAL_POLICY=strict node dist/index.js
 ```
+- `LLM_GATEWAY_LOGS_DB`: Path to SQLite flight recorder database. Default: `~/.llm-cli-gateway/logs.db`. Set to empty string or `none` to disable logging.
+```bash
+# Custom path
+LLM_GATEWAY_LOGS_DB=/var/log/gateway/logs.db node dist/index.js
+# Disable flight recorder
+LLM_GATEWAY_LOGS_DB=none node dist/index.js
+```
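The three cases for `LLM_GATEWAY_LOGS_DB` described above (unset, disabled, custom path) can be sketched as a small resolver. This is an illustration only: the function name is hypothetical, the environment is passed in as a plain object to keep the example self-contained, and the `~` in the default path is left unexpanded.

```typescript
// Hypothetical resolver; returns null when the flight recorder is disabled.
const DEFAULT_LOGS_DB_SKETCH = "~/.llm-cli-gateway/logs.db";

function resolveLogsDbPathSketch(env: Record<string, string | undefined>): string | null {
  const raw = env["LLM_GATEWAY_LOGS_DB"];
  if (raw === undefined) {
    return DEFAULT_LOGS_DB_SKETCH; // variable not set: use the documented default
  }
  if (raw === "" || raw === "none") {
    return null; // explicitly disabled
  }
  return raw; // custom path
}
```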

 ### CLI-Specific Settings

@@ -368,6 +494,25 @@ Each CLI can be configured through its own configuration files:
 - Codex: `~/.codex/config.toml`
 - Gemini: `~/.gemini/config.json`

+## For Fans of Simon Willison
+
+Simon's `llm` tool made it trivially easy to talk to any LLM from the command line. But as AI-assisted development matures, the challenge shifts from "how do I call a model" to "how do I orchestrate multiple models reliably, and what did they actually do?"
+
+**Multiple models increase the confidence factor.** When Claude writes code, Codex reviews it, and Gemini checks for bugs -- each bringing different training data and reasoning patterns -- the result is more robust than any single model alone. And often this isn't even enough. Having the models do iterative reviews is where you start getting real confidence.
+
+**Every interaction should be queryable data.** Inspired by `llm`'s SQLite logging philosophy, the gateway records every request and response to a local SQLite database. Not just prompts and responses -- retry counts, circuit breaker states, approval decisions, thinking blocks, cost estimates. Open it with Datasette and you have a complete operational picture of your AI usage:
+
+datasette ~/.llm-cli-gateway/logs.db
+
+**The `llm-gateway` plugin bridges both worlds.** Install it, and your existing `llm` workflows gain orchestration features without changing how you work:
+
+llm install llm-gateway
+llm -m gateway-claude "explain this function"
+
+Your gateway interactions appear in both `llm logs` (for your personal history) and the gateway's flight recorder (for operational observability). Two audiences, one workflow.
+
+**Composability over monoliths.** The gateway doesn't replace `llm` -- it complements it. Use `llm` directly when you want simplicity. Route through the gateway when you want resilience, multi-model coordination, or detailed operational telemetry. The plugin is the bridge, not the destination.
+
 ## Development

 ### Project Structure
@@ -542,4 +687,3 @@ For issues and questions:
 ## Changelog

 See [CHANGELOG.md](CHANGELOG.md) for detailed release history.
-
package/dist/approval-manager.d.ts
CHANGED
@@ -2,7 +2,7 @@ import type { Logger } from "./logger.js";
 import type { ReviewIntegrityResult } from "./review-integrity.js";
 export type ApprovalPolicy = "strict" | "balanced" | "permissive";
 export type ApprovalStrategy = "legacy" | "mcp_managed";
-export type ApprovalCli = "claude" | "codex" | "gemini";
+export type ApprovalCli = "claude" | "codex" | "gemini" | "grok";
 export type ApprovalStatus = "approved" | "denied";
 export interface ApprovalRequest {
     cli: ApprovalCli;
package/dist/approval-manager.js
CHANGED
@@ -83,7 +83,9 @@ export class ApprovalManager {
         // Canonicalize to handle scoped forms like "Read(*)", "Bash(git:*)"
         const canonicalized = request.disallowedTools.map(s => {
             const trimmed = s.trim();
-            const cut = Math.min(...[trimmed.indexOf("("), trimmed.indexOf(":")]
+            const cut = Math.min(...[trimmed.indexOf("("), trimmed.indexOf(":")]
+                .filter(i => i >= 0)
+                .concat([trimmed.length]));
             return trimmed.slice(0, cut).trim();
         });
         const blockedCritical = criticalTools.filter(t => canonicalized.includes(t));
@@ -103,7 +105,8 @@ export class ApprovalManager {
         if (request.reviewIntegrity && request.reviewIntegrity.violations.length > 0) {
             for (const violation of request.reviewIntegrity.violations) {
                 // Skip empty_allowed_tools and critical_tools_disallowed — already handled in context-dependent scoring above
-                if (violation.type === "empty_allowed_tools" ||
+                if (violation.type === "empty_allowed_tools" ||
+                    violation.type === "critical_tools_disallowed")
                     continue;
                 score += violation.score;
                 reasons.push(`Review integrity: ${violation.detail}`);
@@ -128,12 +131,12 @@ export class ApprovalManager {
             bypassRequested: request.bypassRequested,
             fullAuto: request.fullAuto,
             metadata: request.metadata,
-            reviewIntegrity: request.reviewIntegrity
+            reviewIntegrity: request.reviewIntegrity,
         };
         appendFileSync(this.logPath, `${JSON.stringify(record)}\n`, { encoding: "utf-8", mode: 0o600 });
         this.logger.info(`Approval decision: ${status} (score=${score}, policy=${policy})`, {
             cli: request.cli,
-            operation: request.operation
+            operation: request.operation,
         });
         return record;
     }
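The canonicalization expression in the first `approval-manager.js` hunk cuts a tool spec at the first `(` or `:`. Extracted into a standalone function, it can be exercised directly. The wrapper name `canonicalizeToolNameSketch` is hypothetical; the body mirrors the expression shown in the diff. The `.filter(i => i >= 0)` step matters because `indexOf` returns `-1` when a character is absent, and the `.concat([trimmed.length])` step supplies a fallback so that a bare name is kept whole.

```typescript
// Standalone version of the expression from the hunk above (wrapper name is hypothetical).
function canonicalizeToolNameSketch(spec: string): string {
  const trimmed = spec.trim();
  const cut = Math.min(
    ...[trimmed.indexOf("("), trimmed.indexOf(":")]
      .filter((i) => i >= 0) // drop "not found" results
      .concat([trimmed.length]) // fallback: keep the whole string
  );
  return trimmed.slice(0, cut).trim();
}

console.log(canonicalizeToolNameSketch("Read(*)")); // prints "Read"
console.log(canonicalizeToolNameSketch("Bash(git:*)")); // prints "Bash"
console.log(canonicalizeToolNameSketch("  Write  ")); // prints "Write"
```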
package/dist/async-job-manager.d.ts
CHANGED
@@ -1,7 +1,8 @@
 import type { Logger } from "./logger.js";
 import { type JobHealth } from "./process-monitor.js";
-
-export type
+import { JobStore } from "./job-store.js";
+export type LlmCli = "claude" | "codex" | "gemini" | "grok";
+export type AsyncJobStatus = "running" | "completed" | "failed" | "canceled" | "orphaned";
 export interface AsyncJobSnapshot {
     id: string;
     cli: LlmCli;
@@ -22,16 +23,64 @@ export interface AsyncJobResult extends AsyncJobSnapshot {
     stdoutTruncated: boolean;
     stderrTruncated: boolean;
 }
+export interface StartJobOptions {
+    cwd?: string;
+    idleTimeoutMs?: number;
+    outputFormat?: string;
+    /** Bypass dedup and force a fresh CLI run even if a recent matching job exists. */
+    forceRefresh?: boolean;
+}
+export interface StartJobOutcome {
+    snapshot: AsyncJobSnapshot;
+    /** Set to the existing job's id when the request was de-duplicated. */
+    deduped: boolean;
+    /** Set when deduped — the original job's correlation id, useful for logging. */
+    originalCorrelationId?: string;
+}
 export declare class AsyncJobManager {
     private logger;
     private onJobComplete?;
     private jobs;
     private evictionTimer;
     private processMonitor;
-
+    private store;
+    constructor(logger?: Logger, onJobComplete?: ((cli: LlmCli, durationMs: number, success: boolean) => void) | undefined, store?: JobStore | null);
     private emitMetrics;
     private evictCompletedJobs;
-
+    /**
+     * Compute the dedup key for a job. Stable across re-issues of the same request,
+     * which is exactly what allows agents to safely retry without restarting the run.
+     */
+    private buildRequestKey;
+    private safeStoreCall;
+    /**
+     * Flush in-memory stdout/stderr to the durable store if anything changed
+     * since the last flush. Throttled by OUTPUT_FLUSH_INTERVAL_MS to avoid
+     * pounding sqlite on every chunk of streaming output.
+     */
+    private maybeFlushOutput;
+    private persistComplete;
+    /**
+     * Reconstitute an in-memory AsyncJobRecord from a durable row, so subsequent
+     * getJobSnapshot/getJobResult calls hit the in-memory cache.
+     * The reconstituted record has process=null — it represents historical data only.
+     */
+    private hydrateFromStore;
+    /**
+     * Backwards-compatible entry point. Equivalent to startJobWithDedup({...}).snapshot.
+     * Existing callers keep working unchanged; forceRefresh is exposed as a trailing
+     * optional param for the dedup-aware path.
+     */
+    startJob(cli: LlmCli, args: string[], correlationId: string, cwd?: string, idleTimeoutMs?: number, outputFormat?: string, forceRefresh?: boolean): AsyncJobSnapshot;
+    /**
+     * Start a job, with optional dedup against recent identical requests.
+     * Returns `{ snapshot, deduped }` so callers can log/report the short-circuit.
+     *
+     * Dedup is keyed on (cli, args). If a job with the same key was started within
+     * the dedup window (default 1h) and is still running or completed, its snapshot
+     * is returned without spawning a new process. forceRefresh skips dedup entirely.
+     */
+    startJobWithDedup(cli: LlmCli, args: string[], correlationId: string, opts?: StartJobOptions): StartJobOutcome;
     getJobSnapshot(jobId: string): AsyncJobSnapshot | null;
     getJobResult(jobId: string, maxChars?: number): AsyncJobResult | null;
     cancelJob(jobId: string): {