llm-cli-gateway 1.0.1 → 1.4.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,48 @@
 
  All notable changes to the llm-cli-gateway project.
 
+ ## Unreleased
+
+ ## [1.4.0] - 2026-05-16
+
+ ### Added
+
+ - **Codex `exec resume` wired through the gateway** — `codex_request` and `codex_request_async` now accept `sessionId` (real Codex session UUID from `~/.codex/sessions/` or the `codex resume` picker) and `resumeLatest:true`, emitting `codex exec resume <UUID>` and `codex exec resume --last` respectively. Codex sessions are no longer bookkeeping-only at the gateway layer; multi-turn workflows carry real CLI continuity, matching Claude/Gemini/Grok. Gateway-generated `gw-*` IDs are rejected for Codex (as for Gemini/Grok). `--full-auto` is silently dropped on resume because `codex exec resume` does not accept it — the original session's approval policy is inherited.
+ - **Durable job results + automatic dedup** — Async jobs are now persisted to a `jobs` table in `~/.llm-cli-gateway/logs.db` on every state transition (start, output flush, completion). `llm_job_status` and `llm_job_result` fall back to the database when the job is no longer in memory, so callers can collect a result regardless of how long ago the work completed (default retention: **30 days**, configurable via `LLM_GATEWAY_JOB_RETENTION_DAYS`). Identical `*_request` / `*_request_async` calls within a dedup window (default **1 hour**, configurable via `LLM_GATEWAY_DEDUP_WINDOW_MS`) short-circuit onto the existing running or completed job instead of spawning a duplicate run — directly fixing the "agent re-issues and the whole job starts over" loop. Each tool now accepts `forceRefresh: true` to bypass dedup. Jobs that were running when the gateway last stopped are flipped to `orphaned` on startup so callers can still read their partial output.
+ - **Grok CLI provider (xAI Grok Build TUI)** — New `grok_request` and `grok_request_async` MCP tools mirror the existing Claude/Codex/Gemini surface (sync + async, session management via `--resume`/`--continue`, idle-timeout, approval policy, review-integrity, flight recorder, metrics). Auth assumes a prior `grok login` (OAuth) or `GROK_CODE_XAI_API_KEY`. Default model: `grok-build`. `GROK_DEFAULT_MODEL`, `GROK_MODELS`, and `GROK_MODEL_ALIASES` env vars are honored by the model registry. `cli_upgrade` treats Grok as self-updating (`grok update` / `grok update --version <target>`).
+ - **Source-aware model registry** — `list_models` now reports model source/confidence metadata, aliases, default model source, and non-fatal discovery warnings
+ - **Deterministic model configuration overrides** — Added `*_SETTINGS_PATH`, `GEMINI_HISTORY_ROOT`, `*_MODEL_ALIASES`, and `LLM_GATEWAY_MODEL_ALIASES` support for stable deployments and tests
+ - **CLI lifecycle tools** — Added `cli_versions` and `cli_upgrade` tools for inspecting and upgrading individual Claude, Codex, Gemini, and Grok CLI installations
+ - **`resolveCodexSessionArgs` helper** in `src/request-helpers.ts` with 7 new tests covering mode resolution and `gw-*` rejection (Codex uses an `exec resume` subcommand rather than a flag pair, so the helper returns a `mode` discriminant: `new` | `resume-by-id` | `resume-latest`)
+
+ ### Changed
+
+ - **`better-sqlite3` bumped to `^12.9.0`** (from `^11.0.0`) — required engines now `node 20.x || 22.x || 23.x || 24.x || 25.x`
+ - **Gemini history discovery is no longer authoritative** — Models observed in local Gemini session files are merged as low-confidence entries and no longer replace the registry or set the default model
+ - **Codex default handling remains explicit** — If Codex has no configured default, `default`/`latest` resolve to no model flag so the Codex CLI can use its own built-in default
+ - **Gateway skills refreshed** — The `.agents/skills/` (async-job-orchestration, implement-review-fix, multi-llm-review, secure-orchestration, session-workflow) and `skills/` (multi-llm-orchestration, multi-llm-consensus, model-routing, design-review-cycle, agent-codex-gate, codex-review-gate, red-team-assessment) skill docs now cover Grok, durable job results, auto-dedup, and the new Codex resume capability. `.agents/skills/` entries bumped to metadata version 1.5.
+
+ ## [1.1.0] - 2026-04-04
+
+ ### Added
+
+ - **SQLite flight recorder** — New `src/flight-recorder.ts` module logs all LLM requests/responses to `~/.llm-cli-gateway/logs.db` with two-phase logging (logStart/logComplete), WAL mode for concurrent Datasette reads, and graceful degradation when better-sqlite3 is unavailable
+ - **`LLM_GATEWAY_LOGS_DB` env var** — Configure flight recorder database path; set to empty string or `"none"` to disable logging entirely
+ - **`structuredContent` in MCP tool responses** — All tool handlers now return machine-readable metadata (model, cli, correlationId, sessionId, durationMs, token usage, exitCode) alongside the text response
+ - **`better-sqlite3` dependency** — Native SQLite addon for flight recorder (synchronous writes, WAL support)
+
+ ### Changed
+
+ - **review-integrity.ts simplified** — Reduced from 323 lines to 83 lines. Retains 3 violation types: empty_allowed_tools, critical_tools_disallowed, tool_suppression. Removed inlined_code detection and multi-pattern matching
+ - **`buildCliResponse` signature** — Now requires `cli` and `durationMs` parameters for structuredContent population
+ - **`createErrorResponse`** — Returns sanitized `errorCategory` enum in structuredContent instead of raw error messages (prevents path/secret leakage)
+ - **Flight recorder writes are idempotent** — logComplete only updates rows with status='started', preventing double-completion
+
+ ### Tests
+
+ - 284 tests passing (15 test files)
+ - Rewritten review-integrity tests to match simplified API
+
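As an illustration of the `resolveCodexSessionArgs` entry in the 1.4.0 notes above, the following is a minimal sketch of how a `mode` discriminant could map onto Codex arguments. Only the three mode names, the `gw-*` rejection, and the `exec resume <UUID>` / `exec resume --last` argument shapes come from the changelog. The function name, the input and output types, and the rule that `sessionId` takes precedence over `resumeLatest` are assumptions made for this example and may not match the real helper.

```typescript
// Hypothetical sketch, not the package's actual helper.
type CodexSessionMode = "new" | "resume-by-id" | "resume-latest";

interface CodexSessionInput {
  sessionId?: string;
  resumeLatest?: boolean;
}

interface CodexSessionArgs {
  mode: CodexSessionMode;
  args: string[];
}

function resolveCodexSessionArgsSketch(input: CodexSessionInput): CodexSessionArgs {
  if (input.sessionId !== undefined) {
    // Gateway-generated IDs are bookkeeping only and cannot be resumed by Codex.
    if (input.sessionId.startsWith("gw-")) {
      throw new Error("gw-* session IDs cannot be used to resume a Codex session");
    }
    return { mode: "resume-by-id", args: ["exec", "resume", input.sessionId] };
  }
  if (input.resumeLatest) {
    return { mode: "resume-latest", args: ["exec", "resume", "--last"] };
  }
  // Assumption: a fresh run uses the plain `exec` subcommand.
  return { mode: "new", args: ["exec"] };
}
```

A discriminant keeps the caller's branching explicit, which suits a CLI that resumes through a subcommand rather than a flag pair.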
  ## [1.3.0] - 2026-02-15
 
  ### Fixed
package/README.md CHANGED
@@ -3,17 +3,21 @@
  > *"Without consultation, plans are frustrated, but with many counselors they succeed."*
  > — Proverbs 15:22 (LSB)
 
- A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, and Gemini CLIs with session management, retry logic, and async job orchestration.
+ A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, Gemini, and Grok CLIs with session management, retry logic, and async job orchestration.
 
  ## Features
 
  ### Core Capabilities
- - **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, and Gemini CLIs
+ - **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, Gemini, and Grok CLIs
  - **Session Management**: Track and resume conversations across all CLIs with persistent storage
  - **Token Optimization**: Automatic 44% reduction on prompts, 37% on responses (opt-in)
  - **Correlation ID Tracking**: Full request tracing across all LLM interactions
  - **Cross-Tool Collaboration**: LLMs can use each other via MCP (validated through dogfooding)
 
+ ### Observability
+ - **SQLite Flight Recorder**: Every request/response logged to `~/.llm-cli-gateway/logs.db` with correlation IDs, token usage, duration, retry counts, and circuit breaker state. Browse with [Datasette](https://datasette.io/): `datasette ~/.llm-cli-gateway/logs.db`
+ - **Structured Metadata**: Tool responses include machine-readable `structuredContent` (model, cli, correlationId, sessionId, durationMs, token counts)
+
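The flight recorder described above uses two-phase logging (`logStart` / `logComplete`), and the changelog notes that `logComplete` only updates rows whose status is `started`. The sketch below models that idempotency rule with an in-memory `Map` standing in for the SQLite table. The class name, row shape, status values other than `started`, and the explicit `now` parameter are assumptions for illustration, not the module's real API.

```typescript
// Hypothetical in-memory model of two-phase, idempotent request logging.
type LogStatus = "started" | "completed" | "failed";

interface LogRow {
  correlationId: string;
  cli: string;
  status: LogStatus;
  startedAt: number;
  durationMs?: number;
}

class InMemoryFlightRecorder {
  private rows = new Map<string, LogRow>();

  // Phase 1: record that a request has begun.
  logStart(correlationId: string, cli: string, now: number = Date.now()): void {
    this.rows.set(correlationId, { correlationId, cli, status: "started", startedAt: now });
  }

  // Phase 2: finish the row, but only if it is still in the "started" state.
  // Returns true when the row was updated and false when the call was a no-op.
  logComplete(
    correlationId: string,
    status: "completed" | "failed",
    now: number = Date.now()
  ): boolean {
    const row = this.rows.get(correlationId);
    if (!row || row.status !== "started") {
      return false;
    }
    row.status = status;
    row.durationMs = now - row.startedAt;
    return true;
  }

  get(correlationId: string): LogRow | undefined {
    return this.rows.get(correlationId);
  }
}
```

In a SQL store the same guard would typically live in the `WHERE` clause of the update, so a second completion attempt touches zero rows.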
  ### Reliability & Performance
  - **Retry Logic**: Exponential backoff with circuit breaker for transient failures
  - **Atomic File Writes**: Process-specific temp files with fsync for data integrity
@@ -22,7 +26,7 @@ A Model Context Protocol (MCP) server providing unified access to Claude Code, C
  - **Long-Running Jobs**: Non-time-bound async execution via `*_request_async` + polling tools
 
  ### Security & Quality
- - **Comprehensive Testing**: 221 tests covering unit, integration, and regression scenarios
+ - **Comprehensive Testing**: 284 tests covering unit, integration, and regression scenarios
  - **Input Validation**: Zod schemas prevent injection attacks
  - **No Secret Leakage**: Generic session descriptions only (file permissions 0o600)
  - **No ReDoS**: Bounded regex patterns prevent catastrophic backtracking
@@ -52,6 +56,13 @@ npm install -g @google/gemini-cli
  # Or: https://github.com/google-gemini/gemini-cli
  ```
 
+ ### Grok CLI (xAI)
+ ```bash
+ npm install -g grok-build
+ grok login # OAuth flow, or set GROK_CODE_XAI_API_KEY
+ # Docs: https://docs.x.ai/build/cli
+ ```
+
  ## Installation
 
  ### As an MCP server (npm)
@@ -201,8 +212,54 @@ Execute a Gemini CLI request with session support.
  }
  ```
 
- ##### `claude_request_async` / `codex_request_async`
- Start a long-running Claude or Codex request without waiting for completion in the same MCP call.
+ ##### `grok_request`
+ Execute a Grok CLI (xAI) request with session support.
+
+ **Parameters:**
+ - `prompt` (string, required): The prompt to send (1-100,000 chars)
+ - `model` (string, optional): Model name or alias (e.g. `grok-build`, `latest`)
+ - `outputFormat` (string, optional): `"plain"` (default), `"json"`, or `"streaming-json"`
+ - `sessionId` (string, optional): Session ID to resume (`--resume <id>`)
+ - `resumeLatest` (boolean, optional): Resume the most recent session in the current cwd (`--continue`)
+ - `createNewSession` (boolean, optional): Always create a new session
+ - `alwaysApprove` (boolean, optional): Auto-approve all tool executions (`--always-approve`) in legacy mode
+ - `permissionMode` (string, optional): `default|acceptEdits|auto|dontAsk|bypassPermissions|plan`
+ - `effort` (string, optional): `low|medium|high|xhigh|max`
+ - `reasoningEffort` (string, optional): Reasoning effort for reasoning models
+ - `approvalStrategy` (string, optional): `"legacy"` (default) or `"mcp_managed"`
+ - `approvalPolicy` (string, optional): `"strict"`, `"balanced"`, or `"permissive"`
+ - `mcpServers` (string[], optional): MCP server names tracked for approvals (Grok manages its own MCP config via `grok mcp`)
+ - `allowedTools` (string[], optional): Allowed built-in tools (passed as `--tools` comma list)
+ - `disallowedTools` (string[], optional): Disallowed built-in tools (passed as `--disallowed-tools` comma list)
+ - `optimizePrompt` (boolean, optional): Optimize prompt for token efficiency, default: false
+ - `optimizeResponse` (boolean, optional): Optimize response for token efficiency, default: false
+ - `correlationId` (string, optional): Request trace ID (auto-generated if omitted)
+
+ **Example:**
+ ```json
+ {
+   "prompt": "Summarize the latest commit message in 1 sentence",
+   "model": "grok-build",
+   "effort": "low"
+ }
+ ```
+
+ #### Durable job results & automatic dedup
+
+ Every async job is persisted to a `jobs` table in `~/.llm-cli-gateway/logs.db` as it transitions through running → completed/failed/canceled. This makes the gateway a durable collection layer:
+
+ - **Re-issuing a request is safe.** Identical `*_request` / `*_request_async` calls within the dedup window (default 1 hour) short-circuit onto the existing running or completed job — the caller gets back the same job ID instead of starting a duplicate run. This directly fixes the "agent times out polling, re-issues, and the whole job starts over" failure mode.
+ - **`llm_job_status` and `llm_job_result` work across gateway restarts.** Job rows live for 30 days by default; callers can fetch results long after the in-memory cache has evicted them.
+ - **Jobs running at shutdown are marked `orphaned`** on the next gateway boot (the detached child can't be reattached to). Their captured partial output remains readable.
+ - **Pass `forceRefresh: true`** on any request tool to bypass dedup and force a fresh CLI run.
+
+ Environment variables:
+ - `LLM_GATEWAY_JOB_RETENTION_DAYS` — how long completed jobs stay queryable. Default `30`.
+ - `LLM_GATEWAY_DEDUP_WINDOW_MS` — how recent an existing job must be to dedup against. Default `3600000` (1 hour). Set `0` to disable dedup.
+ - `LLM_GATEWAY_JOBS_DB` — override the sqlite path. Defaults to the value of `LLM_GATEWAY_LOGS_DB`, then `~/.llm-cli-gateway/logs.db`. Set to `none` to disable durability entirely (in-memory only).
+
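The dedup rules in the section above (same request, still running or completed, started within the window, bypassed by `forceRefresh`, disabled when the window is `0`) can be sketched as a small lookup. This is a hedged illustration only: `buildRequestKeySketch`, `findDedupCandidate`, the `JobRow` shape, and the use of `JSON.stringify` for the key are assumptions made for this example. The package's private `buildRequestKey` may compute its key differently.

```typescript
// Hypothetical sketch of dedup lookup keyed on (cli, args).
type Cli = "claude" | "codex" | "gemini" | "grok";
type JobStatus = "running" | "completed" | "failed" | "canceled" | "orphaned";

interface JobRow {
  id: string;
  requestKey: string;
  status: JobStatus;
  startedAt: number; // epoch milliseconds
}

const DEFAULT_DEDUP_WINDOW_MS = 3600000; // 1 hour, matching the documented default

// Assumption: a stable key is any deterministic encoding of the CLI and its arguments.
function buildRequestKeySketch(cli: Cli, args: string[]): string {
  return JSON.stringify([cli, args]);
}

function findDedupCandidate(
  jobs: JobRow[],
  cli: Cli,
  args: string[],
  now: number,
  opts: { windowMs?: number; forceRefresh?: boolean } = {}
): JobRow | null {
  const windowMs = opts.windowMs ?? DEFAULT_DEDUP_WINDOW_MS;
  // forceRefresh or a zero window means "never reuse an existing job".
  if (opts.forceRefresh || windowMs <= 0) {
    return null;
  }
  const key = buildRequestKeySketch(cli, args);
  let best: JobRow | null = null;
  for (const job of jobs) {
    const reusable = job.status === "running" || job.status === "completed";
    const recent = now - job.startedAt <= windowMs;
    if (job.requestKey === key && reusable && recent) {
      // Prefer the most recently started matching job.
      if (best === null || job.startedAt > best.startedAt) {
        best = job;
      }
    }
  }
  return best;
}
```

A caller would spawn a new process only when the lookup returns `null`, and otherwise hand back the existing job's ID.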
+ ##### `claude_request_async` / `codex_request_async` / `gemini_request_async` / `grok_request_async`
+ Start a long-running Claude, Codex, Gemini, or Grok request without waiting for completion in the same MCP call.
 
  Use this flow when analysis/runtime can exceed client tool-call limits:
  1. Start job with `*_request_async`
@@ -240,7 +297,7 @@ Approval records are persisted to `~/.llm-cli-gateway/approvals.jsonl`.
  Create a new session for a specific CLI.
 
  **Parameters:**
- - `cli` (string, required): CLI to create session for ("claude", "codex", "gemini")
+ - `cli` (string, required): CLI to create session for ("claude", "codex", "gemini", "grok")
  - `description` (string, optional): Description for the session
  - `setAsActive` (boolean, optional): Set as active session, default: true
 
@@ -257,7 +314,7 @@ Create a new session for a specific CLI.
  List all sessions, optionally filtered by CLI.
 
  **Parameters:**
- - `cli` (string, optional): Filter by CLI ("claude", "codex", "gemini")
+ - `cli` (string, optional): Filter by CLI ("claude", "codex", "gemini", "grok")
 
  **Response includes:**
  - Total session count
@@ -295,12 +352,74 @@ Clear all sessions, optionally for a specific CLI.
  List available models for each CLI.
 
  **Parameters:**
- - `cli` (string, optional): Specific CLI to list models for ("claude", "codex", "gemini")
+ - `cli` (string, optional): Specific CLI to list models for ("claude", "codex", "gemini", "grok")
 
  **Response includes:**
  - Model names and descriptions
  - Best use cases for each model
  - CLI-specific information
+ - `defaultModel` and `defaultModelSource` when a default is explicitly configured
+ - `modelMetadata` with source/confidence (`fallback`, `config`, `env`, `observed`)
+ - `aliases` and `warnings` when configured or when discovery degrades gracefully
+
+ The registry treats explicit configuration as authoritative. Bundled fallback models are low-confidence hints, and Gemini models observed in local session history are merged as low-confidence entries only; they do not become the default model.
+
+ Model registry environment overrides:
+
+ ```bash
+ # Explicit defaults
+ CLAUDE_DEFAULT_MODEL=haiku
+ CODEX_DEFAULT_MODEL=<codex-model-id>
+ GEMINI_DEFAULT_MODEL=gemini-2.5-flash
+
+ # Additional models: comma/newline list, JSON array, or JSON object of model->description
+ GEMINI_MODELS='{"gemini-team-default":"Team-approved Gemini model"}'
+
+ # Aliases
+ GEMINI_MODEL_ALIASES='team=gemini-team-default'
+ LLM_GATEWAY_MODEL_ALIASES='codex.fast=gpt-5.3-codex-spark,gemini.fast=gemini-team-default'
+
+ # Deterministic config/discovery paths
+ CODEX_CONFIG_PATH=/path/to/config.toml
+ CLAUDE_SETTINGS_PATH=/path/to/settings.json
+ CLAUDE_SETTINGS_LOCAL_PATH=/path/to/settings.local.json
+ GEMINI_SETTINGS_PATH=/path/to/settings.json
+ GEMINI_HISTORY_ROOT=/path/to/.gemini/tmp
+
+ # Disable local model-history discovery
+ LLM_GATEWAY_DISABLE_MODEL_DISCOVERY=1
+ ```
+
+ ##### `cli_versions`
+ Report installed CLI versions.
+
+ **Parameters:**
+ - `cli` (string, optional): Specific CLI to inspect ("claude", "codex", "gemini", "grok")
+
+ ##### `cli_upgrade`
+ Plan or run an upgrade for one CLI.
+
+ **Parameters:**
+ - `cli` (string, required): CLI to upgrade ("claude", "codex", "gemini", "grok")
+ - `target` (string, optional): Package tag/version/target, default: `latest`
+ - `dryRun` (boolean, optional): Return the upgrade plan without running it, default: `true`
+ - `timeoutMs` (number, optional): Upgrade timeout when `dryRun=false`
+
+ **Upgrade strategies:**
+ - Claude latest: `claude update`
+ - Claude explicit target: `claude install <target>`
+ - Codex latest: `codex update`
+ - Codex explicit target: `npm install -g @openai/codex@<target>`
+ - Gemini: `npm install -g @google/gemini-cli@<target>`
+
+ **Example dry run:**
+ ```json
+ {
+   "cli": "gemini",
+   "target": "latest",
+   "dryRun": true
+ }
+ ```
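To make the upgrade strategies listed above concrete, here is a minimal sketch of a planner that returns the command a dry run might report. The Claude, Codex, and Gemini commands are taken from the list above, and the Grok commands from the 1.4.0 changelog entry (`grok update` / `grok update --version <target>`). The function name `planUpgradeSketch` and its argument-array return shape are assumptions for this example, not the tool's actual output format.

```typescript
// Hypothetical planner: maps (cli, target) to the command line a dry run could describe.
type UpgradeCli = "claude" | "codex" | "gemini" | "grok";

function planUpgradeSketch(cli: UpgradeCli, target: string = "latest"): string[] {
  const latest = target === "latest";
  switch (cli) {
    case "claude":
      return latest ? ["claude", "update"] : ["claude", "install", target];
    case "codex":
      return latest
        ? ["codex", "update"]
        : ["npm", "install", "-g", `@openai/codex@${target}`];
    case "gemini":
      // Gemini always goes through npm, including for "latest".
      return ["npm", "install", "-g", `@google/gemini-cli@${target}`];
    case "grok":
      return latest ? ["grok", "update"] : ["grok", "update", "--version", target];
  }
}
```

Returning an argument array rather than a shell string keeps the plan easy to display and avoids quoting issues if it is later executed.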
 
  ## Session Management
 
@@ -360,6 +479,13 @@ await callTool("session_delete", {
  ```bash
  LLM_GATEWAY_APPROVAL_POLICY=strict node dist/index.js
  ```
+ - `LLM_GATEWAY_LOGS_DB`: Path to SQLite flight recorder database. Default: `~/.llm-cli-gateway/logs.db`. Set to empty string or `none` to disable logging.
+ ```bash
+ # Custom path
+ LLM_GATEWAY_LOGS_DB=/var/log/gateway/logs.db node dist/index.js
+ # Disable flight recorder
+ LLM_GATEWAY_LOGS_DB=none node dist/index.js
+ ```
 
  ### CLI-Specific Settings
 
@@ -368,6 +494,25 @@ Each CLI can be configured through its own configuration files:
  - Codex: `~/.codex/config.toml`
  - Gemini: `~/.gemini/config.json`
 
+ ## For Fans of Simon Willison
+
+ Simon's `llm` tool made it trivially easy to talk to any LLM from the command line. But as AI-assisted development matures, the challenge shifts from "how do I call a model" to "how do I orchestrate multiple models reliably, and what did they actually do?"
+
+ **Multiple models increase the confidence factor.** When Claude writes code, Codex reviews it, and Gemini checks for bugs -- each bringing different training data and reasoning patterns -- the result is more robust than any single model alone. And often this isn't even enough. Having the models do iterative reviews is where you start getting real confidence.
+
+ **Every interaction should be queryable data.** Inspired by `llm`'s SQLite logging philosophy, the gateway records every request and response to a local SQLite database. Not just prompts and responses -- retry counts, circuit breaker states, approval decisions, thinking blocks, cost estimates. Open it with Datasette and you have a complete operational picture of your AI usage:
+
+     datasette ~/.llm-cli-gateway/logs.db
+
+ **The `llm-gateway` plugin bridges both worlds.** Install it, and your existing `llm` workflows gain orchestration features without changing how you work:
+
+     llm install llm-gateway
+     llm -m gateway-claude "explain this function"
+
+ Your gateway interactions appear in both `llm logs` (for your personal history) and the gateway's flight recorder (for operational observability). Two audiences, one workflow.
+
+ **Composability over monoliths.** The gateway doesn't replace `llm` -- it complements it. Use `llm` directly when you want simplicity. Route through the gateway when you want resilience, multi-model coordination, or detailed operational telemetry. The plugin is the bridge, not the destination.
+
  ## Development
 
  ### Project Structure
@@ -542,4 +687,3 @@ For issues and questions:
  ## Changelog
 
  See [CHANGELOG.md](CHANGELOG.md) for detailed release history.
-
@@ -2,7 +2,7 @@ import type { Logger } from "./logger.js";
  import type { ReviewIntegrityResult } from "./review-integrity.js";
  export type ApprovalPolicy = "strict" | "balanced" | "permissive";
  export type ApprovalStrategy = "legacy" | "mcp_managed";
- export type ApprovalCli = "claude" | "codex" | "gemini";
+ export type ApprovalCli = "claude" | "codex" | "gemini" | "grok";
  export type ApprovalStatus = "approved" | "denied";
  export interface ApprovalRequest {
      cli: ApprovalCli;
@@ -83,7 +83,9 @@ export class ApprovalManager {
  // Canonicalize to handle scoped forms like "Read(*)", "Bash(git:*)"
  const canonicalized = request.disallowedTools.map(s => {
    const trimmed = s.trim();
-   const cut = Math.min(...[trimmed.indexOf("("), trimmed.indexOf(":")].filter(i => i >= 0).concat([trimmed.length]));
+   const cut = Math.min(...[trimmed.indexOf("("), trimmed.indexOf(":")]
+     .filter(i => i >= 0)
+     .concat([trimmed.length]));
    return trimmed.slice(0, cut).trim();
  });
  const blockedCritical = criticalTools.filter(t => canonicalized.includes(t));
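The canonicalization in the hunk above strips scoped suffixes so that entries such as `Read(*)` or `Bash(git:*)` compare equal to the bare tool name. The following standalone sketch extracts the same cut-at-first-`(`-or-`:` logic into a function so its behavior can be checked in isolation. The function names and the example list of critical tools are hypothetical; only the slicing logic mirrors the diff.

```typescript
// Standalone sketch of the canonicalization shown in the diff above.
function canonicalizeToolName(spec: string): string {
  const trimmed = spec.trim();
  // Cut at the first "(" or ":" if either is present, otherwise keep the whole string.
  const cut = Math.min(
    ...[trimmed.indexOf("("), trimmed.indexOf(":")]
      .filter(i => i >= 0)
      .concat([trimmed.length])
  );
  return trimmed.slice(0, cut).trim();
}

// Hypothetical helper: which critical tools does a disallow list block?
function blockedCriticalTools(disallowed: string[], critical: string[]): string[] {
  const canonicalized = disallowed.map(canonicalizeToolName);
  return critical.filter(t => canonicalized.includes(t));
}

// Example: both scoped forms reduce to their bare tool names.
const blocked = blockedCriticalTools([" Read(*) ", "Bash(git:*)"], ["Read", "Bash", "Grep"]);
console.log(blocked); // prints [ 'Read', 'Bash' ]
```

Appending `trimmed.length` to the candidate list guarantees `Math.min` always receives at least one value, so an unscoped name passes through unchanged.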
@@ -103,7 +105,8 @@ export class ApprovalManager {
  if (request.reviewIntegrity && request.reviewIntegrity.violations.length > 0) {
    for (const violation of request.reviewIntegrity.violations) {
      // Skip empty_allowed_tools and critical_tools_disallowed — already handled in context-dependent scoring above
-     if (violation.type === "empty_allowed_tools" || violation.type === "critical_tools_disallowed")
+     if (violation.type === "empty_allowed_tools" ||
+       violation.type === "critical_tools_disallowed")
        continue;
      score += violation.score;
      reasons.push(`Review integrity: ${violation.detail}`);
@@ -128,12 +131,12 @@ export class ApprovalManager {
      bypassRequested: request.bypassRequested,
      fullAuto: request.fullAuto,
      metadata: request.metadata,
-     reviewIntegrity: request.reviewIntegrity
+     reviewIntegrity: request.reviewIntegrity,
    };
    appendFileSync(this.logPath, `${JSON.stringify(record)}\n`, { encoding: "utf-8", mode: 0o600 });
    this.logger.info(`Approval decision: ${status} (score=${score}, policy=${policy})`, {
      cli: request.cli,
-     operation: request.operation
+     operation: request.operation,
    });
    return record;
  }
@@ -1,7 +1,8 @@
  import type { Logger } from "./logger.js";
  import { type JobHealth } from "./process-monitor.js";
- export type LlmCli = "claude" | "codex" | "gemini";
- export type AsyncJobStatus = "running" | "completed" | "failed" | "canceled";
+ import { JobStore } from "./job-store.js";
+ export type LlmCli = "claude" | "codex" | "gemini" | "grok";
+ export type AsyncJobStatus = "running" | "completed" | "failed" | "canceled" | "orphaned";
  export interface AsyncJobSnapshot {
      id: string;
      cli: LlmCli;
@@ -22,16 +23,64 @@ export interface AsyncJobResult extends AsyncJobSnapshot {
      stdoutTruncated: boolean;
      stderrTruncated: boolean;
  }
+ export interface StartJobOptions {
+     cwd?: string;
+     idleTimeoutMs?: number;
+     outputFormat?: string;
+     /** Bypass dedup and force a fresh CLI run even if a recent matching job exists. */
+     forceRefresh?: boolean;
+ }
+ export interface StartJobOutcome {
+     snapshot: AsyncJobSnapshot;
+     /** Set to the existing job's id when the request was de-duplicated. */
+     deduped: boolean;
+     /** Set when deduped — the original job's correlation id, useful for logging. */
+     originalCorrelationId?: string;
+ }
  export declare class AsyncJobManager {
      private logger;
      private onJobComplete?;
      private jobs;
      private evictionTimer;
      private processMonitor;
-     constructor(logger?: Logger, onJobComplete?: ((cli: LlmCli, durationMs: number, success: boolean) => void) | undefined);
+     private store;
+     constructor(logger?: Logger, onJobComplete?: ((cli: LlmCli, durationMs: number, success: boolean) => void) | undefined, store?: JobStore | null);
      private emitMetrics;
      private evictCompletedJobs;
-     startJob(cli: LlmCli, args: string[], correlationId: string, cwd?: string, idleTimeoutMs?: number, outputFormat?: string): AsyncJobSnapshot;
+     /**
+      * Compute the dedup key for a job. Stable across re-issues of the same request,
+      * which is exactly what allows agents to safely retry without restarting the run.
+      */
+     private buildRequestKey;
+     private safeStoreCall;
+     /**
+      * Flush in-memory stdout/stderr to the durable store if anything changed
+      * since the last flush. Throttled by OUTPUT_FLUSH_INTERVAL_MS to avoid
+      * pounding sqlite on every chunk of streaming output.
+      */
+     private maybeFlushOutput;
+     private persistComplete;
+     /**
+      * Reconstitute an in-memory AsyncJobRecord from a durable row, so subsequent
+      * getJobSnapshot/getJobResult calls hit the in-memory cache.
+      * The reconstituted record has process=null — it represents historical data only.
+      */
+     private hydrateFromStore;
+     /**
+      * Backwards-compatible entry point. Equivalent to startJobWithDedup({...}).snapshot.
+      * Existing callers keep working unchanged; forceRefresh is exposed as a trailing
+      * optional param for the dedup-aware path.
+      */
+     startJob(cli: LlmCli, args: string[], correlationId: string, cwd?: string, idleTimeoutMs?: number, outputFormat?: string, forceRefresh?: boolean): AsyncJobSnapshot;
+     /**
+      * Start a job, with optional dedup against recent identical requests.
+      * Returns `{ snapshot, deduped }` so callers can log/report the short-circuit.
+      *
+      * Dedup is keyed on (cli, args). If a job with the same key was started within
+      * the dedup window (default 1h) and is still running or completed, its snapshot
+      * is returned without spawning a new process. forceRefresh skips dedup entirely.
+      */
+     startJobWithDedup(cli: LlmCli, args: string[], correlationId: string, opts?: StartJobOptions): StartJobOutcome;
      getJobSnapshot(jobId: string): AsyncJobSnapshot | null;
      getJobResult(jobId: string, maxChars?: number): AsyncJobResult | null;
      cancelJob(jobId: string): {