llm-cli-gateway 1.6.0 → 1.7.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,211 @@
2
2
 
3
3
  All notable changes to the llm-cli-gateway project.
4
4
 
5
+ ## [1.7.0] - 2026-05-26 — cache-awareness slice 1.5 (async-path flight recorder + codex parser fix)
6
+
7
+ Closes the two telemetry gaps that v1.6.0 explicitly deferred: async-path
8
+ flight-recorder integration and Codex parser support for the actual
9
+ `cached_input_tokens` field the current Codex CLI emits. Both ship
10
+ together because they jointly close out `cache_state://*` completeness
11
+ for the async tools and the codex CLI.
12
+
13
+ ### Added — async-path flight recorder writes
14
+
15
+ - `AsyncJobManager` now accepts a `FlightRecorderLike` constructor
16
+ dependency (defaults to `NoopFlightRecorder` for tests that don't
17
+ inject one). `StartJobOptions` extended with `writeFlightStart`,
18
+ `flightRecorderEntry`, and `extractUsage` — pure async tools
19
+ (`*_request_async`) pass `writeFlightStart: true` so the manager owns
20
+ the row. The legacy positional `startJob(...)` signature was extended
21
+ with trailing optional params so existing callers keep working.
22
+ - New private `writeFlightComplete` helper inside the manager fires on
23
+ every terminal-state code path (close handler, error handler, idle
24
+ timeout, output overflow, cancelJob, evictCompletedJobs dead-process
25
+ and exited-mismatch branches). Failure payload mirrors sync-helper
26
+ semantics: `response = stderr || stdout` on failure, `errorMessage`
27
+ falls back through override → `job.error` → `job.stderr` →
28
+ `"Exit code N"`. Single-shot guard set only on successful write so a
29
+ thrown `logComplete` can be retried by a later terminal callback.
30
+ - New public `armFlightCompleteForDeferral(jobId)` on AsyncJobManager.
31
+ Called by `awaitJobOrDefer` in `src/index.ts` immediately before
32
+ returning a `DeferredJobResponse` — this lets the sync handler keep
33
+ ownership of the rich-metadata `safeFlightComplete` write for
34
+ sync-inline completions, while still ensuring deferred-from-sync rows
35
+ get a terminal `logComplete` from the manager when the underlying job
36
+ finishes. Includes a race-mitigation immediate-write path if the job
37
+ already terminated before the arm signal landed.
38
+ - `JobStore.markOrphanedOnStartup()` return shape extended from `number`
39
+ to `{ count, orphaned: Array<{ id, correlationId, startedAt, stdout,
40
+ stderr, exitCode }> }` so the manager constructor can write FR
41
+ `logComplete` rows for previously orphaned jobs with proper audit data
42
+ (durationMs from `startedAt`, response from `stderr || stdout`,
43
+ errorMessage `"orphaned after gateway restart"`). `SqliteJobStore`
44
+ SELECTs the per-orphan fields before the orphan-flip UPDATE; no
45
+ transaction wrapper needed because gateway boot is single-threaded
46
+ before any new jobs can arrive. `MemoryJobStore` returns
47
+ `{ count: 0, orphaned: [] }` (in-process state can't be orphaned).
48
+ Breaking change to the `JobStore` interface; the `PostgresJobStore`
49
+ stub was updated to match (the impl is still not yet shipped).
50
+ - `cache_state://global`, `cache_state://session/{id}`, and
51
+ `cache_state://prefix/{hash}` aggregates now include async-job
52
+ activity. No query changes were needed: `cache_state://*` never filtered
53
+ on `asyncJobId`, so the new rows participate naturally.
54
+
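The failure-payload rules in the bullets above can be sketched as a small pure function. This is a minimal sketch under stated assumptions: `TerminalJob`, `FlightPayload`, `firstNonEmpty`, and `buildFlightPayload` are hypothetical names chosen for illustration, not the gateway's API, and the sketch treats an empty string as "absent" when walking the fallback chain.

```typescript
// Hypothetical, trimmed-down job record; the real in-memory record has more fields.
interface TerminalJob {
  stdout: string;
  stderr: string;
  error?: string;
  exitCode?: number | null;
}

type FlightStatus = "completed" | "failed";

// Hypothetical subset of the logComplete payload.
interface FlightPayload {
  status: FlightStatus;
  response: string;
  exitCode: number;
  errorMessage?: string;
}

// Returns the first candidate that is a non-empty string, or undefined.
function firstNonEmpty(...candidates: Array<string | undefined>): string | undefined {
  return candidates.find((c) => c !== undefined && c !== "");
}

function buildFlightPayload(
  job: TerminalJob,
  status: FlightStatus,
  override?: string,
): FlightPayload {
  const failed = status === "failed";
  const exitCode = job.exitCode ?? (failed ? 1 : 0);
  return {
    status,
    // On failure prefer stderr, falling back to stdout when stderr is empty.
    response: failed ? job.stderr || job.stdout : job.stdout,
    exitCode,
    // override -> job.error -> job.stderr -> "Exit code N"
    errorMessage: failed
      ? firstNonEmpty(override, job.error, job.stderr) ?? `Exit code ${exitCode}`
      : undefined,
  };
}
```

For example, `buildFlightPayload({ stdout: "partial", stderr: "", exitCode: 2 }, "failed")` yields `response: "partial"` and `errorMessage: "Exit code 2"`. The shipped `writeFlightComplete` chains the candidates with `??`, which skips only `null` and `undefined`; whether `job.stderr` can be an empty string at that point is not visible in this diff, so the empty-string handling here is an assumption of the sketch.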
55
+ ### Fixed — Codex parser accepts current CLI's cache-token field
56
+
57
+ - `src/codex-json-parser.ts` now reads `cached_input_tokens` (preferred,
58
+ what Codex CLI ≥0.133.0 emits) in addition to the legacy
59
+ `cache_read_input_tokens` and the bare `cache_read_tokens` fallback.
60
+ Live smoke-tested against Codex CLI on 2026-05-26 — see
61
+ `docs/personal-mcp/PROVIDER_CACHE_SURFACES.md` "Codex — field name
62
+ divergence" for the exact invocation. Cache hits on codex rows now
63
+ populate the FR's `cache_read_tokens` column.
64
+
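The field precedence described above could look roughly like the following. Only the three field names and their order come from the changelog entry; `CodexUsage`, `CACHE_READ_FIELDS`, and `readCacheReadTokens` are hypothetical names, and the real parser's structure is not fully visible in this diff.

```typescript
// Hypothetical usage record as it might appear in a Codex JSON event.
type CodexUsage = Record<string, unknown>;

// Preferred field first, then the legacy name, then the bare fallback.
const CACHE_READ_FIELDS = [
  "cached_input_tokens",
  "cache_read_input_tokens",
  "cache_read_tokens",
] as const;

// Returns the first numeric cache-read count found, or undefined when
// none of the known fields carries a number.
function readCacheReadTokens(usage: CodexUsage): number | undefined {
  for (const field of CACHE_READ_FIELDS) {
    const value = usage[field];
    if (typeof value === "number") return value;
  }
  return undefined;
}
```

Iterating over an ordered list keeps the precedence in one place, so a future field rename would only need a new entry at the front of the list.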
65
+ ### Known limitation — sync-deferred-dedup orphan rows
66
+
67
+ When a sync request dedup-hits an in-flight original job AND the sync
68
+ deadline expires before the original finishes, the dedup'd caller's
69
+ sync-side `logStart` row stays at `status='started'` forever. The
70
+ manager's `logComplete` writes to the ORIGINAL job's correlationId, not
71
+ the dedup'd caller's. This is a pre-existing limitation surfaced by the
72
+ slice's clearer accounting; it predates v1.7.0 and is not a regression.
73
+ A future slice can address it via per-request `correlationId` fan-out.
74
+
75
+ ### Cross-table asymmetry — `canceled` / `orphaned` jobs in the FR
76
+
77
+ `FlightLogResult.status` only carries `"completed" | "failed"`, so
78
+ canceled and orphaned async jobs are encoded as `"failed"` plus a
79
+ distinguishing `errorMessage`. The underlying `jobs` table in JobStore
80
+ retains the distinct `"canceled"` / `"orphaned"` statuses for
81
+ `getJobSnapshot` callers. External consumers of
82
+ `~/.llm-cli-gateway/logs.db` that filter `status='failed'` will count cancels and boot-time
83
+ orphans as errors; `cache_state://*` aggregation does not distinguish.
84
+
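The encoding described above, and one way a consumer might undo it, can be sketched as follows. The two marker strings (`"canceled by caller"` and `"orphaned after gateway restart"`) are the ones that appear in this release's code; `FlightTerminal`, `toFlightTerminal`, and `classifyFlightRow` are hypothetical names, and matching on message text is an assumption that those strings stay stable across releases.

```typescript
type AsyncJobStatus = "running" | "completed" | "failed" | "canceled" | "orphaned";
type FlightStatus = "completed" | "failed";

// Hypothetical subset of a flight-recorder row.
interface FlightTerminal {
  status: FlightStatus;
  errorMessage?: string;
}

// Collapse the five job statuses into the two the flight recorder can carry.
function toFlightTerminal(
  status: Exclude<AsyncJobStatus, "running">,
  errorMessage?: string,
): FlightTerminal {
  switch (status) {
    case "completed":
      return { status: "completed" };
    case "canceled":
      return { status: "failed", errorMessage: "canceled by caller" };
    case "orphaned":
      return { status: "failed", errorMessage: "orphaned after gateway restart" };
    case "failed":
      return { status: "failed", errorMessage };
  }
}

// Consumer side: recover the distinct outcome from a flight-recorder row.
function classifyFlightRow(row: FlightTerminal): Exclude<AsyncJobStatus, "running"> {
  if (row.status === "completed") return "completed";
  if (row.errorMessage === "canceled by caller") return "canceled";
  if (row.errorMessage === "orphaned after gateway restart") return "orphaned";
  return "failed";
}
```

A consumer that needs exact statuses is probably better served by the `jobs` table, which keeps `"canceled"` and `"orphaned"` distinct; string matching on `errorMessage` is a fallback for readers that only have the flight-recorder rows.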
85
+ ### No config or schema changes
86
+
87
+ No migration. No new opt-in flag. The new behaviour is gated solely on
88
+ whether the caller (handler or `awaitJobOrDefer`) supplies a
89
+ `flightRecorderEntry` to `startJobWithDedup`. Tests/callers that don't
90
+ opt in see no behaviour change (the constructor's default
91
+ `NoopFlightRecorder` short-circuits the FR writes).
92
+
93
+ ### Migration impact
94
+
95
+ None. SQLite schema and TOML config surface are byte-identical to
96
+ v1.6.1. Rollback is non-destructive (revert the release commit).
97
+
98
+ ### Documentation
99
+
100
+ - `docs/plans/async-flight-recorder.dag.toml` — new slice plan (Unit A
101
+ unanimously approved across Codex/Gemini/Grok/Mistral).
102
+ - `docs/plans/async-flight-recorder.pr-body.md` — new PR description.
103
+ - `docs/personal-mcp/ASYNC_FLIGHT_RECORDER_SURFACES.md` — new research
104
+ note documenting every terminal state, the data contract per FR write
105
+ site, the sync-path responsibility split table, and the cancel /
106
+ orphan / dedup limitations.
107
+ - `docs/personal-mcp/PROVIDER_CACHE_SURFACES.md` — Codex section updated
108
+ to reflect that the parser now accepts `cached_input_tokens`; slice 2
109
+ "Populated for **claude only** today" claim corrected to include
110
+ codex.
111
+ - `docs/launch/blog-cache-awareness.md` — slice 1.5 follow-up note in
112
+ the "What's next" section.
113
+
114
+ ## [1.6.1] - 2026-05-26 — docs-only follow-up to 1.6.0
115
+
116
+ Pure documentation release; zero source-code changes since 1.6.0.
117
+
118
+ ### Changed — agent-install guidance current with v1.6.0 + five providers
119
+
120
+ - New `setup/providers/mistral-vibe.md` provider snippet (Mistral was the
121
+ fifth provider but had no setup/providers/ page; install agents had
122
+ nothing to point at when the user asked for Mistral coverage).
123
+ - New `setup/assistants/mistral-install-prompt.md` per-assistant install
124
+ prompt (mirrors the Grok prompt; outbound-only framing,
125
+ session_logging walk-through, `VIBE_ACTIVE_MODEL` guidance,
126
+ secret-safety rules preserved).
127
+ - `setup/assistants/ASSISTANT_CONTRACT.md`: Mistral added to "Applies
128
+ to" and outbound providers; new "Doctor Report Notes (v1.6.0)"
129
+ paragraph clarifying that the `cache_awareness` block is structural
130
+ (always present) and that all `[cache_awareness]` flags default off.
131
+ - All 6 per-assistant install prompts (universal, chatgpt, claude,
132
+ codex, gemini, grok) extended to enumerate all five providers and
133
+ reference the cache_awareness doctor block.
134
+ - `setup/install-plan.dag.toml` choose-targets / check-diagnostics /
135
+ apply-client-snippet steps generalised to all five providers; Mistral
136
+ named outbound-only; cache_awareness must-not-treat-as-blocker note
137
+ added inline. TOML re-validated.
138
+ - 6 `docs/personal-mcp/connect-*.md` legacy pages now carry an
139
+ admonition pointing to `setup/providers/` + `ASSISTANT_CONTRACT.md`
140
+ as canonical.
141
+
142
+ ### Changed — 12 SKILL.md files current with v1.6.0
143
+
144
+ - All 12 skills (7 under `skills/`, 5 under `.agents/skills/`) extended
145
+ with `promptParts`, `cache_state://` MCP resources, and (where the
146
+ skill's centre of gravity is session continuity) the
147
+ `cache_ttl_expiring_soon` warning. Depth tiered by skill audience:
148
+ multi-llm-orchestration, model-routing, multi-llm-consensus,
149
+ implement-review-fix, multi-llm-review, async-job-orchestration,
150
+ session-workflow, secure-orchestration carry full sections or
151
+ examples; agent-codex-gate, codex-review-gate, design-review-cycle,
152
+ red-team-assessment carry tip-level mentions.
153
+ - Plugin-namespaced skills (`.agents/skills/*`) version-bumped 1.5 → 1.6.
154
+ - Exact runtime strings cross-checked against `src/index.ts` (the
155
+ `provide exactly one of …` / `one of … is required` mutex errors and
156
+ the `cache_ttl_expiring_soon` warning code).
157
+
158
+ ### Fixed — README / BEST_PRACTICES / integrations doc drift
159
+
160
+ - README.md: headline + Core Capabilities now name Mistral as the fifth
161
+ provider; test counts 284 / 221 → 681; new Supply-chain hardening
162
+ call-out under Security & Quality.
163
+ - BEST_PRACTICES.md: testing coverage / performance lines 284 → 681.
164
+ - integrations/llm-plugin/README.md: Grok + Mistral added to providers
165
+ list, usage examples, and the "at least one of" requirements list.
166
+ - ENFORCEMENT.md: self-enforcement checklist provider list now Claude /
167
+ Codex / Gemini / Grok / Mistral.
168
+
169
+ ### Fixed — `docs/launch/blog-cache-awareness.md` accuracy + voice
170
+
171
+ Technical corrections from the multi-LLM voice + technical review:
172
+ - Mutually-exclusive error-string quotation reformatted so the
173
+ ``provide exactly one of `prompt` or `promptParts``` example renders
174
+ correctly in markdown.
175
+ - `lastWriteAt` references corrected to `lastRequestAt` (the actual
176
+ public field name on `SessionCacheStats`).
177
+ - Security tools sentence rewritten: separates SHA-pinned actions,
178
+ version-pinned Python/Go tools, and the SHA256-verified gitleaks
179
+ binary; clarifies that `eslint-plugin-security` runs via the existing
180
+ eslint config (not security.yml); replaces the inaccurate "Top-level
181
+ `permissions: contents: read` on every workflow" claim with the
182
+ accurate least-privilege phrasing.
183
+ - "Signed installer artefacts" → "SHA256-verifiable installer artefacts"
184
+ (no signing today); npm note adds the sigstore-provenance context.
185
+ - Haiku 3.5 Vertex 2048 caveat added: the in-code alias table
186
+ conservatively collapses all Haiku variants to 4096.
187
+ - Solorigate / Codecov / xz now link separately.
188
+ - Codex smoke-test evidence now links to
189
+ `docs/personal-mcp/PROVIDER_CACHE_SURFACES.md` and the CHANGELOG.
190
+ - Three broken links surfaced by lychee CI fixed: Mistral Vibe URL,
191
+ bare CLAUDE.md link (the file lives outside the gateway repo), and
192
+ the agent-assurance exclude regex tightened to match bare URLs.
193
+
194
+ ### Fixed — `socket.yml` networkAccess false-positive documentation
195
+
196
+ - Documented that the `globalThis["fetch"]` flag on `dist/index.js` /
197
+ `dist/job-store.js` is a substring-match false positive. Neither file
198
+ contains any actual fetch call; the matches are English-prose
199
+ occurrences in an error message, the `fetchWith` JSON field name, and
200
+ a code comment. Verified by sub-agent investigation; no code change
201
+ required and no attack-surface delta vs 1.5.35.
202
+
203
+ ### Fixed — `lychee.toml` exclusions
204
+
205
+ - Added `https://npmjs.com/`, `https://help.openai.com/`, and bare
206
+ `github.com/verivus-oss/agent-assurance` URLs to the exclude list
207
+ (each is a Cloudflare bot-blocked / private host that returns
208
+ 4xx/5xx to anonymous CI requests). Rationale documented inline.
209
+
5
210
  ## [1.6.0] - 2026-05-26 — cache-awareness phase 1 + security posture
6
211
 
7
212
  Also includes (beyond cache-awareness):
package/README.md CHANGED
@@ -3,7 +3,7 @@
3
3
  > *"Without consultation, plans are frustrated, but with many counselors they succeed."*
4
4
  > — Proverbs 15:22 (LSB)
5
5
 
6
- A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, Gemini, and Grok CLIs with session management, retry logic, and async job orchestration.
6
+ A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral (Vibe) CLIs with session management, retry logic, and async job orchestration.
7
7
 
8
8
  ## Personal MCP Appliance MVP
9
9
 
@@ -79,7 +79,7 @@ docker compose -f docker-compose.personal.yml run --rm doctor
79
79
  ## Features
80
80
 
81
81
  ### Core Capabilities
82
- - **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, Gemini, and Grok CLIs
82
+ - **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, Gemini, Grok, and Mistral (Vibe) CLIs
83
83
  - **Session Management**: Track and resume conversations across all CLIs with persistent storage
84
84
  - **Token Optimization**: Automatic 44% reduction on prompts, 37% on responses (opt-in)
85
85
  - **Correlation ID Tracking**: Full request tracing across all LLM interactions
@@ -127,12 +127,12 @@ Opt-in flags (all default off) live under `[cache_awareness]` in `~/.llm-cli-gat
127
127
  - **Long-Running Jobs**: Non-time-bound async execution via `*_request_async` + polling tools
128
128
 
129
129
  ### Security & Quality
130
- - **Comprehensive Testing**: 284 tests covering unit, integration, and regression scenarios
130
+ - **Comprehensive Testing**: 681 tests covering unit, integration, and regression scenarios with real CLI execution
131
131
  - **Input Validation**: Zod schemas prevent injection attacks
132
132
  - **No Secret Leakage**: Generic session descriptions only (file permissions 0o600)
133
133
  - **No ReDoS**: Bounded regex patterns prevent catastrophic backtracking
134
134
  - **Type Safety**: Strict TypeScript with comprehensive error handling
135
- - **221 Tests**: Unit, integration, and regression tests with real CLI execution
135
+ - **Supply-chain hardening**: a dedicated `.github/workflows/security.yml` runs actionlint, zizmor, shellcheck, typos, osv-scanner, gitleaks, ruff, bandit, and lychee on every push and PR (see `SECURITY.md` for the threat model)
136
136
 
137
137
  ## Prerequisites
138
138
 
@@ -1,8 +1,35 @@
1
1
  import type { Logger } from "./logger.js";
2
2
  import { type JobHealth } from "./process-monitor.js";
3
3
  import { JobStore } from "./job-store.js";
4
+ import { type FlightRecorderLike } from "./flight-recorder.js";
4
5
  export type LlmCli = "claude" | "codex" | "gemini" | "grok" | "mistral";
5
6
  export type AsyncJobStatus = "running" | "completed" | "failed" | "canceled" | "orphaned";
7
+ /**
8
+ * Slice 1.5 flight-recorder payload supplied via StartJobOptions.
9
+ * Decomposed to primitive fields (no nested handler-locals) so retaining
10
+ * a reference on the in-memory job record doesn't pin large promptParts
11
+ * or attachments via closure scope.
12
+ */
13
+ export interface AsyncJobFlightRecorderEntry {
14
+ model: string;
15
+ prompt: string;
16
+ sessionId?: string;
17
+ stablePrefixHash?: string;
18
+ stablePrefixTokens?: number;
19
+ }
20
+ /**
21
+ * Slice 1.5 usage-extraction callback. Closures MUST be constructed from
22
+ * primitive locals only (e.g. const fmt = params.outputFormat; closure
23
+ * captures fmt). Capturing the handler's full `params` object pins large
24
+ * promptParts/attachments for JOB_TTL_MS.
25
+ */
26
+ export type AsyncJobUsageExtractor = (stdout: string) => {
27
+ inputTokens?: number;
28
+ outputTokens?: number;
29
+ cacheReadTokens?: number;
30
+ cacheCreationTokens?: number;
31
+ costUsd?: number;
32
+ };
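The "primitive locals only" rule in the doc comment above might be followed like this. It is a sketch, not the gateway's handler code: `HandlerParams`, `makeExtractUsage`, and the `{ "usage": { ... } }` JSON shape are hypothetical, and the extractor type is redeclared locally so the example is self-contained.

```typescript
// Local redeclaration of the extractor type so the example stands alone.
type Usage = {
  inputTokens?: number;
  outputTokens?: number;
  cacheReadTokens?: number;
  cacheCreationTokens?: number;
  costUsd?: number;
};
type AsyncJobUsageExtractor = (stdout: string) => Usage;

// Hypothetical handler params; promptParts stands in for a large payload.
interface HandlerParams {
  outputFormat: string;
  promptParts: string[];
}

function makeExtractUsage(params: HandlerParams): AsyncJobUsageExtractor {
  // Copy the one primitive the closure needs; `params` itself is not referenced below.
  const fmt = params.outputFormat;
  return (stdout) => {
    if (fmt !== "json") return {};
    try {
      // Assumed output shape: { "usage": { "input_tokens": n, "output_tokens": n } }
      const parsed = JSON.parse(stdout) as {
        usage?: { input_tokens?: unknown; output_tokens?: unknown };
      };
      const u = parsed.usage ?? {};
      return {
        inputTokens: typeof u.input_tokens === "number" ? u.input_tokens : undefined,
        outputTokens: typeof u.output_tokens === "number" ? u.output_tokens : undefined,
      };
    } catch {
      // Unparseable output: report no usage rather than throwing.
      return {};
    }
  };
}
```

Because the returned function refers only to `fmt`, the job record that retains it holds a short string rather than the whole `params` object.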
6
33
  export interface AsyncJobSnapshot {
7
34
  id: string;
8
35
  cli: LlmCli;
@@ -45,6 +72,23 @@ export interface StartJobOptions {
45
72
  * etc.) that must persist for the lifetime of the spawned CLI process.
46
73
  */
47
74
  onComplete?: () => void;
75
+ /**
76
+ * Slice 1.5: when true, AsyncJobManager writes a flight-recorder logStart
77
+ * row at startJob entry using `flightRecorderEntry`. Pure async handlers
78
+ * (handle*RequestAsync) pass true because they have no upstream
79
+ * safeFlightStart writer. The sync-deferred path (awaitJobOrDefer) passes
80
+ * false because the upstream sync handler already wrote logStart keyed on
81
+ * the same correlationId — a second INSERT would crash on the PK.
82
+ */
83
+ writeFlightStart?: boolean;
84
+ /** Slice 1.5: payload for the FR logStart and the terminal logComplete. */
85
+ flightRecorderEntry?: AsyncJobFlightRecorderEntry;
86
+ /**
87
+ * Slice 1.5: invoked only on terminal `completed` to populate token-usage
88
+ * fields in the FR logComplete payload. Construct from primitive locals
89
+ * only (see AsyncJobUsageExtractor doc).
90
+ */
91
+ extractUsage?: AsyncJobUsageExtractor;
48
92
  }
49
93
  export interface StartJobOutcome {
50
94
  snapshot: AsyncJobSnapshot;
@@ -60,7 +104,8 @@ export declare class AsyncJobManager {
60
104
  private evictionTimer;
61
105
  private processMonitor;
62
106
  private store;
63
- constructor(logger?: Logger, onJobComplete?: ((cli: LlmCli, durationMs: number, success: boolean) => void) | undefined, store?: JobStore | null);
107
+ private flightRecorder;
108
+ constructor(logger?: Logger, onJobComplete?: ((cli: LlmCli, durationMs: number, success: boolean) => void) | undefined, store?: JobStore | null, flightRecorder?: FlightRecorderLike);
64
109
  /**
65
110
  * True iff a durable (or memory) job store is attached. The MCP-tool
66
111
  * registration layer ANDs this with persistence.asyncJobsEnabled when
@@ -81,6 +126,29 @@ export declare class AsyncJobManager {
81
126
  */
82
127
  private buildRequestKey;
83
128
  private fireOnComplete;
129
+ /**
130
+ * Slice 1.5: write the terminal flight-recorder row. Mirrors sync-path
131
+ * failure semantics (response = stderr||stdout on failure, errorMessage
132
+ * falls back through overrideErrorMessage → job.error → job.stderr →
133
+ * "Exit code N"). Single-shot guard set only on SUCCESSFUL write so a
134
+ * thrown logComplete can be retried by a later terminal callback; the
135
+ * FR's WHERE status='started' UPDATE guard remains the actual
136
+ * idempotency mechanism for the common "retry succeeds, original
137
+ * succeeded too" case.
138
+ */
139
+ private writeFlightComplete;
140
+ private safeExtractUsage;
141
+ /**
142
+ * R2 Codex-Unit-B F1: awaitJobOrDefer calls this when returning a
143
+ * deferred response. From this point on the sync handler will not write
144
+ * its own safeFlightComplete, so the manager takes over.
145
+ *
146
+ * Race mitigation: if the job already terminated between the sync
147
+ * deadline expiring and this method firing, write logComplete
148
+ * synchronously here so the previously-skipped terminal callback's
149
+ * write isn't lost.
150
+ */
151
+ armFlightCompleteForDeferral(jobId: string): void;
84
152
  private safeStoreCall;
85
153
  /**
86
154
  * Flush in-memory stdout/stderr to the durable store if anything changed
@@ -100,7 +168,7 @@ export declare class AsyncJobManager {
100
168
  * Existing callers keep working unchanged; forceRefresh is exposed as a trailing
101
169
  * optional param for the dedup-aware path.
102
170
  */
103
- startJob(cli: LlmCli, args: string[], correlationId: string, cwd?: string, idleTimeoutMs?: number, outputFormat?: string, forceRefresh?: boolean, env?: Record<string, string>, onComplete?: () => void): AsyncJobSnapshot;
171
+ startJob(cli: LlmCli, args: string[], correlationId: string, cwd?: string, idleTimeoutMs?: number, outputFormat?: string, forceRefresh?: boolean, env?: Record<string, string>, onComplete?: () => void, flightRecorderEntry?: AsyncJobFlightRecorderEntry, extractUsage?: AsyncJobUsageExtractor, writeFlightStart?: boolean): AsyncJobSnapshot;
104
172
  /**
105
173
  * Start a job, with optional dedup against recent identical requests.
106
174
  * Returns `{ snapshot, deduped }` so callers can log/report the short-circuit.
@@ -3,6 +3,7 @@ import { envWithExtendedPath, getExtendedPath, killProcessGroup, spawnCliProcess
3
3
  import { noopLogger } from "./logger.js";
4
4
  import { ProcessMonitor } from "./process-monitor.js";
5
5
  import { computeRequestKey } from "./job-store.js";
6
+ import { NoopFlightRecorder } from "./flight-recorder.js";
6
7
  const MAX_OUTPUT_SIZE = 50 * 1024 * 1024;
7
8
  const JOB_TTL_MS = 60 * 60 * 1000; // 1 hour in-memory retention; durable store has its own (longer) retention
8
9
  const EVICTION_INTERVAL_MS = 5 * 60 * 1000; // Check every 5 minutes
@@ -61,16 +62,40 @@ export class AsyncJobManager {
61
62
  evictionTimer = null;
62
63
  processMonitor;
63
64
  store;
64
- constructor(logger = noopLogger, onJobComplete, store = null) {
65
+ flightRecorder;
66
+ constructor(logger = noopLogger, onJobComplete, store = null, flightRecorder = new NoopFlightRecorder()) {
65
67
  this.logger = logger;
66
68
  this.onJobComplete = onJobComplete;
67
69
  this.processMonitor = new ProcessMonitor(logger);
68
70
  this.store = store;
71
+ this.flightRecorder = flightRecorder;
69
72
  if (this.store) {
70
73
  try {
71
- const orphaned = this.store.markOrphanedOnStartup();
72
- if (orphaned > 0) {
73
- this.logger.info(`Marked ${orphaned} in-flight job(s) as orphaned after gateway restart`);
74
+ const { count, orphaned } = this.store.markOrphanedOnStartup();
75
+ if (count > 0) {
76
+ this.logger.info(`Marked ${count} in-flight job(s) as orphaned after gateway restart`);
77
+ }
78
+ // Slice 1.5: close out the FR row for each orphaned job. The FR
79
+ // logComplete UPDATE has WHERE status='started' so pre-1.7.0 rows
80
+ // (where the prior gateway never wrote a logStart) silently
81
+ // no-op. Wrapped per-orphan so a single bad row can't tank boot.
82
+ for (const orphan of orphaned) {
83
+ try {
84
+ const durationMs = Math.max(0, Date.now() - new Date(orphan.startedAt).getTime());
85
+ this.flightRecorder.logComplete(orphan.correlationId, {
86
+ response: orphan.stderr || orphan.stdout,
87
+ durationMs,
88
+ retryCount: 0,
89
+ circuitBreakerState: "closed",
90
+ optimizationApplied: false,
91
+ exitCode: orphan.exitCode ?? 1,
92
+ errorMessage: "orphaned after gateway restart",
93
+ status: "failed",
94
+ });
95
+ }
96
+ catch (err) {
97
+ this.logger.error(`Async-path FR logComplete for orphaned job ${orphan.id} failed`, err);
98
+ }
74
99
  }
75
100
  }
76
101
  catch (err) {
@@ -129,6 +154,7 @@ export class AsyncJobManager {
129
154
  this.logger.error(`Job ${id} process ${job.process.pid} no longer exists, marking as failed`);
130
155
  this.emitMetrics(job);
131
156
  this.persistComplete(job);
157
+ this.writeFlightComplete(job, "failed");
132
158
  this.fireOnComplete(job);
133
159
  }
134
160
  // EPERM: process exists but we can't signal it — ignore
@@ -144,6 +170,7 @@ export class AsyncJobManager {
144
170
  this.logger.error(`Job ${id} has exited flag but was still in running state, marking as failed`);
145
171
  this.emitMetrics(job);
146
172
  this.persistComplete(job);
173
+ this.writeFlightComplete(job, "failed");
147
174
  this.fireOnComplete(job);
148
175
  }
149
176
  }
@@ -196,6 +223,96 @@ export class AsyncJobManager {
196
223
  this.logger.error(`Job ${job.id} onComplete hook threw`, err);
197
224
  }
198
225
  }
226
+ /**
227
+ * Slice 1.5: write the terminal flight-recorder row. Mirrors sync-path
228
+ * failure semantics (response = stderr||stdout on failure, errorMessage
229
+ * falls back through overrideErrorMessage → job.error → job.stderr →
230
+ * "Exit code N"). Single-shot guard set only on SUCCESSFUL write so a
231
+ * thrown logComplete can be retried by a later terminal callback; the
232
+ * FR's WHERE status='started' UPDATE guard remains the actual
233
+ * idempotency mechanism for the common "retry succeeds, original
234
+ * succeeded too" case.
235
+ */
236
+ writeFlightComplete(job, finalStatus, overrideErrorMessage) {
237
+ if (!job.flightRecorderEntry)
238
+ return; // never opted in
239
+ // R2 Codex-Unit-B F1: only write when armed. Sync-inline requests are
240
+ // NOT armed at startJob — the sync handler owns the rich-metadata
241
+ // safeFlightComplete write. Pure async + sync-deferred ARE armed.
242
+ if (!job.flightCompleteArmed)
243
+ return;
244
+ if (job.flightRecorderComplete)
245
+ return; // already wrote successfully
246
+ const durationMs = Math.max(0, Date.now() - new Date(job.startedAt).getTime());
247
+ const usage = finalStatus === "completed" && job.extractUsage ? this.safeExtractUsage(job) : {};
248
+ const isFailure = finalStatus === "failed";
249
+ const response = isFailure ? job.stderr || job.stdout : job.stdout;
250
+ const exitCode = job.exitCode ?? (finalStatus === "completed" ? 0 : 1);
251
+ const errorMessage = isFailure
252
+ ? (overrideErrorMessage ?? job.error ?? job.stderr ?? `Exit code ${exitCode}`)
253
+ : undefined;
254
+ try {
255
+ this.flightRecorder.logComplete(job.correlationId, {
256
+ response,
257
+ durationMs,
258
+ retryCount: 0,
259
+ circuitBreakerState: "closed",
260
+ optimizationApplied: false,
261
+ exitCode,
262
+ errorMessage,
263
+ status: finalStatus,
264
+ inputTokens: usage.inputTokens,
265
+ outputTokens: usage.outputTokens,
266
+ cacheReadTokens: usage.cacheReadTokens,
267
+ cacheCreationTokens: usage.cacheCreationTokens,
268
+ costUsd: usage.costUsd,
269
+ });
270
+ // Only mark complete on successful write so a thrown logComplete
271
+ // can be retried by the next terminal callback.
272
+ job.flightRecorderComplete = true;
273
+ // Clear retained references so the GC can reclaim anything the
274
+ // extractUsage closure captured.
275
+ job.flightRecorderEntry = undefined;
276
+ job.extractUsage = undefined;
277
+ }
278
+ catch (err) {
279
+ this.logger.error("Async-path flight recorder logComplete failed", err);
280
+ }
281
+ }
282
+ safeExtractUsage(job) {
283
+ try {
284
+ return job.extractUsage?.(job.stdout) ?? {};
285
+ }
286
+ catch (err) {
287
+ this.logger.error(`Job ${job.id} extractUsage threw`, err);
288
+ return {};
289
+ }
290
+ }
291
+ /**
292
+ * R2 Codex-Unit-B F1: awaitJobOrDefer calls this when returning a
293
+ * deferred response. From this point on the sync handler will not write
294
+ * its own safeFlightComplete, so the manager takes over.
295
+ *
296
+ * Race mitigation: if the job already terminated between the sync
297
+ * deadline expiring and this method firing, write logComplete
298
+ * synchronously here so the previously-skipped terminal callback's
299
+ * write isn't lost.
300
+ */
301
+ armFlightCompleteForDeferral(jobId) {
302
+ const job = this.jobs.get(jobId);
303
+ if (!job)
304
+ return;
305
+ if (job.flightCompleteArmed)
306
+ return; // pure async already armed
307
+ job.flightCompleteArmed = true;
308
+ if (job.status === "running")
309
+ return;
310
+ // Job already terminal — the close handler's writeFlightComplete
311
+ // saw flightCompleteArmed=false and skipped. Write now to recover.
312
+ const finalStatus = job.status === "completed" ? "completed" : "failed";
313
+ const override = job.canceled ? "canceled by caller" : undefined;
314
+ this.writeFlightComplete(job, finalStatus, override);
315
+ }
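The interplay between the armed flag, the single-shot guard, and the race-mitigation path can be modelled in isolation. This is a simplified sketch, not the manager's code: `FlightCompleteGuard` and its members are hypothetical, the `writes` array stands in for `logComplete` calls, and the retry-on-throw behaviour is left out.

```typescript
type Status = "running" | "completed" | "failed";

class FlightCompleteGuard {
  status: Status = "running";
  // Stands in for rows written via logComplete.
  readonly writes: Status[] = [];
  private armed: boolean;
  private written = false;

  // Pure async jobs start armed; sync-path jobs start unarmed.
  constructor(armedAtStart: boolean) {
    this.armed = armedAtStart;
  }

  // Called from any terminal callback (close, error, timeout, cancel).
  onTerminal(status: Exclude<Status, "running">): void {
    this.status = status;
    this.tryWrite();
  }

  // Called when the sync handler decides to return a deferred response.
  armForDeferral(): void {
    if (this.armed) return;
    this.armed = true;
    // Race mitigation: the job may have finished before the arm signal landed.
    if (this.status !== "running") this.tryWrite();
  }

  private tryWrite(): void {
    if (!this.armed || this.written || this.status === "running") return;
    this.writes.push(this.status);
    this.written = true;
  }
}
```

With `new FlightCompleteGuard(false)`, calling `onTerminal("completed")` first records nothing, and a later `armForDeferral()` records exactly one `"completed"` entry; calling them in the opposite order also records exactly one.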
199
316
  safeStoreCall(label, fn) {
200
317
  if (!this.store)
201
318
  return;
@@ -300,7 +417,7 @@ export class AsyncJobManager {
300
417
  * Existing callers keep working unchanged; forceRefresh is exposed as a trailing
301
418
  * optional param for the dedup-aware path.
302
419
  */
303
- startJob(cli, args, correlationId, cwd, idleTimeoutMs, outputFormat, forceRefresh, env, onComplete) {
420
+ startJob(cli, args, correlationId, cwd, idleTimeoutMs, outputFormat, forceRefresh, env, onComplete, flightRecorderEntry, extractUsage, writeFlightStart) {
304
421
  return this.startJobWithDedup(cli, args, correlationId, {
305
422
  cwd,
306
423
  idleTimeoutMs,
@@ -308,6 +425,9 @@ export class AsyncJobManager {
308
425
  forceRefresh,
309
426
  env,
310
427
  onComplete,
428
+ flightRecorderEntry,
429
+ extractUsage,
430
+ writeFlightStart,
311
431
  }).snapshot;
312
432
  }
313
433
  /**
@@ -319,7 +439,7 @@ export class AsyncJobManager {
319
439
  * is returned without spawning a new process. forceRefresh skips dedup entirely.
320
440
  */
321
441
  startJobWithDedup(cli, args, correlationId, opts = {}) {
322
- const { cwd, idleTimeoutMs, outputFormat, forceRefresh, env: extraEnv, onComplete } = opts;
442
+ const { cwd, idleTimeoutMs, outputFormat, forceRefresh, env: extraEnv, onComplete, flightRecorderEntry, extractUsage, writeFlightStart, } = opts;
323
443
  const requestKey = this.buildRequestKey(cli, args, extraEnv);
324
444
  if (!forceRefresh && this.store) {
325
445
  try {
@@ -405,6 +525,14 @@ export class AsyncJobManager {
405
525
  onCompleteFired: false,
406
526
  outputDirty: false,
407
527
  lastOutputFlushAt: Date.now(),
528
+ flightRecorderEntry,
529
+ extractUsage,
530
+ flightRecorderComplete: false,
531
+ // R2 Codex-Unit-B F1: pure async path arms now (writeFlightStart=true
532
+ // means the manager is the only FR writer). Sync-deferred path
533
+ // arrives with writeFlightStart=false and arms later via
534
+ // armFlightCompleteForDeferral when awaitJobOrDefer decides to defer.
535
+ flightCompleteArmed: writeFlightStart === true,
408
536
  };
409
537
  this.jobs.set(id, job);
410
538
  this.safeStoreCall("recordStart", () => this.store.recordStart({
@@ -417,6 +545,27 @@ export class AsyncJobManager {
417
545
  startedAt,
418
546
  pid: child.pid ?? null,
419
547
  }));
548
+ // Slice 1.5: only opt-in callers (pure async handlers) write logStart
549
+ // here. The sync-deferred path passes writeFlightStart=false because
550
+ // the upstream sync handler already wrote a logStart row keyed on the
551
+ // same correlationId; a duplicate INSERT would crash on the PK.
552
+ if (writeFlightStart && flightRecorderEntry) {
553
+ try {
554
+ this.flightRecorder.logStart({
555
+ correlationId,
556
+ cli,
557
+ model: flightRecorderEntry.model,
558
+ prompt: flightRecorderEntry.prompt,
559
+ sessionId: flightRecorderEntry.sessionId,
560
+ asyncJobId: id,
561
+ stablePrefixHash: flightRecorderEntry.stablePrefixHash,
562
+ stablePrefixTokens: flightRecorderEntry.stablePrefixTokens,
563
+ });
564
+ }
565
+ catch (err) {
566
+ this.logger.error("Async-path flight recorder logStart failed", err);
567
+ }
568
+ }
420
569
  this.logger.info(`Job ${id} started for ${cli}`, { correlationId });
421
570
  // Idle timeout: kill process if no output activity for idleTimeoutMs
422
571
  let idleTimerId;
@@ -439,6 +588,7 @@ export class AsyncJobManager {
439
588
  });
440
589
  this.emitMetrics(job);
441
590
  this.persistComplete(job);
591
+ this.writeFlightComplete(job, "failed");
442
592
  this.fireOnComplete(job);
443
593
  setTimeout(() => {
444
594
  if (!job.exited && job.process)
@@ -473,6 +623,7 @@ export class AsyncJobManager {
473
623
  this.logger.error(`Job ${id} error: ${launchError.message}`, { correlationId });
474
624
  this.emitMetrics(job);
475
625
  this.persistComplete(job);
626
+ this.writeFlightComplete(job, "failed");
476
627
  this.fireOnComplete(job);
477
628
  }
478
629
  });
@@ -490,6 +641,12 @@ export class AsyncJobManager {
490
641
  }
491
642
  // Ensure terminal state reaches the durable store (idle-timeout/output-overflow already persisted).
492
643
  this.persistComplete(job);
644
+ // Slice 1.5: retry the FR complete write iff the earlier terminal
645
+ // callback's logComplete threw. The single-shot guard in
646
+ // writeFlightComplete makes this a no-op in the common case.
647
+ const fallbackFlightStatus = job.status === "completed" ? "completed" : "failed";
648
+ const fallbackOverride = job.status === "canceled" ? "canceled by caller" : undefined;
649
+ this.writeFlightComplete(job, fallbackFlightStatus, fallbackOverride);
493
650
  this.fireOnComplete(job);
494
651
  return;
495
652
  }
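The retry comment in the hunk above depends on the single-shot guard being set only after a successful write. The sketch below illustrates that idea under stated assumptions: `makeCompleteWriter` and the `written` set are hypothetical, the payload field names are illustrative, and the success-path `response` (plain stdout) is an assumption. The failure-path fallbacks follow the changelog text (`stderr || stdout`, then override, `job.error`, `job.stderr`, `"Exit code N"`).

```javascript
// Hypothetical factory; the real helper is a private method on the manager.
function makeCompleteWriter(recorder, logger) {
  const written = new Set(); // job ids whose complete row has been written
  return function writeFlightComplete(job, status, errorOverride) {
    if (written.has(job.id)) return false; // common case: no-op on repeat calls
    const failed = status !== "completed";
    const payload = {
      status,
      // Assumption: success uses stdout; failure prefers stderr per the changelog.
      response: failed ? (job.stderr || job.stdout) : job.stdout,
      errorMessage: failed
        ? (errorOverride || job.error || job.stderr || `Exit code ${job.exitCode}`)
        : undefined,
    };
    try {
      recorder.logComplete(job.correlationId, payload);
      written.add(job.id); // set the guard only after the write succeeded
      return true;
    } catch (err) {
      logger.error("flight recorder logComplete failed", err);
      return false; // guard stays unset, so a later terminal callback can retry
    }
  };
}
```

Because the guard is left unset on a throw, the fallback call in the close path gets a second chance at the write, while a normal run sees exactly one row update.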
@@ -512,6 +669,7 @@ export class AsyncJobManager {
512
669
  }
513
670
  this.emitMetrics(job);
514
671
  this.persistComplete(job);
672
+ this.writeFlightComplete(job, job.status === "completed" ? "completed" : "failed", job.status === "canceled" ? "canceled by caller" : undefined);
515
673
  this.fireOnComplete(job);
516
674
  });
517
675
  return { snapshot: this.snapshot(job), deduped: false };
@@ -567,6 +725,7 @@ export class AsyncJobManager {
567
725
  killProcessGroup(job.process, "SIGTERM");
568
726
  this.logger.info(`Job ${jobId} canceled`, { correlationId: job.correlationId });
569
727
  this.persistComplete(job);
728
+ this.writeFlightComplete(job, "failed", "canceled by caller");
570
729
  this.fireOnComplete(job);
571
730
  setTimeout(() => {
572
731
  if (!job.exited && job.process)
@@ -639,6 +798,7 @@ export class AsyncJobManager {
639
798
  });
640
799
  this.emitMetrics(job);
641
800
  this.persistComplete(job);
801
+ this.writeFlightComplete(job, "failed", "Output exceeded maximum size (50MB)");
642
802
  this.fireOnComplete(job);
643
803
  setTimeout(() => {
644
804
  if (!job.exited && job.process)
@@ -47,7 +47,10 @@ export function parseCodexJsonStream(stdout) {
47
47
  input_tokens: typeof u.input_tokens === "number" ? u.input_tokens : 0,
48
48
  output_tokens: typeof u.output_tokens === "number" ? u.output_tokens : 0,
49
49
  };
50
- if (typeof u.cache_read_input_tokens === "number") {
50
+ if (typeof u.cached_input_tokens === "number") {
51
+ usage.cache_read_tokens = u.cached_input_tokens;
52
+ }
53
+ else if (typeof u.cache_read_input_tokens === "number") {
51
54
  usage.cache_read_tokens = u.cache_read_input_tokens;
52
55
  }
53
56
  else if (typeof u.cache_read_tokens === "number") {
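The parser change above amounts to a fixed precedence order over three possible field names. As a rough illustration, assuming the third branch (cut off in this hunk) assigns its value the same way as the first two, the lookup could be written as a loop. `readCacheReadTokens` is a hypothetical name, not a function in the package.

```javascript
// Field names in the order the patched parser checks them.
const CACHE_READ_FIELDS = [
  "cached_input_tokens",     // emitted by the current Codex CLI per the changelog
  "cache_read_input_tokens", // older spelling, kept as a fallback
  "cache_read_tokens",       // third branch; its body is an assumption here
];

// Hypothetical helper: return the first numeric value found, else undefined.
function readCacheReadTokens(u) {
  for (const key of CACHE_READ_FIELDS) {
    if (typeof u[key] === "number") return u[key];
  }
  return undefined;
}
```

The `typeof` check matters: a zero count is still reported, while a string or missing field falls through to the next name.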
package/dist/index.js CHANGED
@@ -17,7 +17,7 @@ import { estimateTokens, optimizePrompt as optimizePromptText, optimizeResponse
17
17
  import { loadConfig, loadPersistenceConfig, loadCacheAwarenessConfig, } from "./config.js";
18
18
  import { checkHealth } from "./health.js";
19
19
  import { clearModelRegistryCache, getAvailableCliInfo, getCliInfo, resolveModelAlias, } from "./model-registry.js";
20
- import { AsyncJobManager } from "./async-job-manager.js";
20
+ import { AsyncJobManager, } from "./async-job-manager.js";
21
21
  import { createJobStore } from "./job-store.js";
22
22
  import { ApprovalManager } from "./approval-manager.js";
23
23
  import { checkReviewIntegrity } from "./review-integrity.js";
@@ -213,10 +213,10 @@ function getJobStore(runtimeLogger = logger) {
213
213
  }
214
214
  return jobStore;
215
215
  }
216
- function newAsyncJobManager(metrics, runtimeLogger, store = getJobStore(runtimeLogger)) {
216
+ function newAsyncJobManager(metrics, runtimeLogger, store = getJobStore(runtimeLogger), fr = getFlightRecorder(runtimeLogger)) {
217
217
  return new AsyncJobManager(runtimeLogger, (cli, durationMs, success) => {
218
218
  metrics.recordRequest(cli, durationMs, success);
219
- }, store);
219
+ }, store, fr);
220
220
  }
221
221
  function getAsyncJobManager(runtimeLogger = logger) {
222
222
  asyncJobManager ??= newAsyncJobManager(performanceMetrics, runtimeLogger);
@@ -239,17 +239,19 @@ function resolveGatewayServerRuntime(deps = {}, options = {}) {
239
239
  const runtimeSessionManager = deps.sessionManager ?? sessionManager;
240
240
  const runtimePerformanceMetrics = deps.performanceMetrics ??
241
241
  (options.isolateState ? new PerformanceMetrics() : performanceMetrics);
242
+ // Resolve flight recorder BEFORE async manager so isolateState managers
243
+ // can be wired with the same recorder instance the runtime exposes.
244
+ const runtimeFlightRecorder = deps.flightRecorder ?? getFlightRecorder(runtimeLogger);
242
245
  const runtimeAsyncJobManager = deps.asyncJobManager ??
243
246
  (options.isolateState
244
247
  ? // Factory-created test/HTTP session servers must not mark another instance's
245
248
  // durable jobs orphaned. Stdio startup injects the process-global manager.
246
- newAsyncJobManager(runtimePerformanceMetrics, runtimeLogger, null)
249
+ newAsyncJobManager(runtimePerformanceMetrics, runtimeLogger, null, runtimeFlightRecorder)
247
250
  : getAsyncJobManager(runtimeLogger));
248
251
  const runtimeApprovalManager = deps.approvalManager ??
249
252
  (options.isolateState
250
253
  ? new ApprovalManager(undefined, runtimeLogger)
251
254
  : getApprovalManager(runtimeLogger));
252
- const runtimeFlightRecorder = deps.flightRecorder ?? getFlightRecorder(runtimeLogger);
253
255
  return {
254
256
  sessionManager: runtimeSessionManager,
255
257
  resourceProvider: deps.resourceProvider ??
@@ -286,7 +288,16 @@ const SYNC_POLL_INTERVAL_MS = 1_000;
286
288
  * Start an async job and poll until completion or deadline.
287
289
  * Returns the job result if it finishes in time, or a deferral marker.
288
290
  */
289
- async function awaitJobOrDefer(cli, args, corrId, idleTimeoutMs, outputFormat, forceRefresh, runtime = resolveGatewayServerRuntime(), env, onComplete) {
291
+ async function awaitJobOrDefer(cli, args, corrId, idleTimeoutMs, outputFormat, forceRefresh, runtime = resolveGatewayServerRuntime(), env, onComplete,
292
+ /**
293
+ * Slice 1.5: when the sync handler has already written a logStart row
294
+ * keyed on `corrId`, pass these so the manager can write logComplete
295
+ * (with usage extraction) when the underlying async job terminates —
296
+ * even if the sync handler returned a deferred response.
297
+ * `writeFlightStart` is NEVER true on this path: the sync handler is
298
+ * always the upstream logStart writer.
299
+ */
300
+ flightRecorderEntry, extractUsage) {
290
301
  // U26 fix: ownership of onComplete is a contract. Once this function returns
291
302
  // OR throws, the caller MUST consider onComplete consumed — i.e. it has
292
303
  // either been run, or the AsyncJobManager has taken ownership of it. The
@@ -336,6 +347,13 @@ async function awaitJobOrDefer(cli, args, corrId, idleTimeoutMs, outputFormat, f
336
347
  forceRefresh,
337
348
  env,
338
349
  onComplete,
350
+ // Sync-deferred path: the upstream sync handler already wrote
351
+ // logStart for this corrId, so writeFlightStart stays false. The
352
+ // manager still writes logComplete on terminal state (which UPDATEs
353
+ // the sync handler's row), closing the previously-orphaned
354
+ // sync-deferred case.
355
+ flightRecorderEntry,
356
+ extractUsage,
339
357
  });
340
358
  // Handoff succeeded: AsyncJobManager owns onComplete (it'll fire via
341
359
  // fireOnComplete on terminal status, or run inline immediately for dedup).
@@ -369,7 +387,14 @@ async function awaitJobOrDefer(cli, args, corrId, idleTimeoutMs, outputFormat, f
369
387
  }
370
388
  await new Promise(resolve => setTimeout(resolve, SYNC_POLL_INTERVAL_MS));
371
389
  }
372
- // Deadline exceeded — return deferral
390
+ // Deadline exceeded — return deferral.
391
+ // R2 Codex-Unit-B F1: hand FR-complete ownership to the manager. Until
392
+ // this call, the manager skips writeFlightComplete on terminal so the
393
+ // sync handler's safeFlightComplete (with rich approvalDecision /
394
+ // optimizationApplied metadata) wins for sync-inline completions. From
395
+ // here on the sync handler returns deferred and will NOT write
396
+ // safeFlightComplete, so the manager must.
397
+ runtime.asyncJobManager.armFlightCompleteForDeferral(job.id);
373
398
  runtime.logger.info(`[${corrId}] ${cli} sync deadline exceeded (${SYNC_DEADLINE_MS}ms), deferring to async job ${job.id}`);
374
399
  return {
375
400
  deferred: true,
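The ownership handoff described in the comment above can be pictured as a per-job flag: the manager stays silent on terminal state until the sync path arms it. The sketch below is only an illustration of that idea. `makeDeferralAwareManager`, `start`, and `onTerminal` are hypothetical names, and flushing immediately when a job is already terminal at arm time is an assumption; the diff does not show how the real manager handles that case.

```javascript
// Hypothetical miniature of the manager's deferral bookkeeping.
function makeDeferralAwareManager(recorder) {
  const jobs = new Map(); // jobId -> { terminalStatus, armed, written }

  // Write the complete row once, and only when armed and terminal.
  function flush(jobId) {
    const j = jobs.get(jobId);
    if (!j || !j.armed || j.terminalStatus === null || j.written) return;
    recorder.logComplete(jobId, j.terminalStatus);
    j.written = true;
  }

  return {
    start(jobId) {
      jobs.set(jobId, { terminalStatus: null, armed: false, written: false });
    },
    // Called from the terminal-state code paths (close, error, cancel, ...).
    onTerminal(jobId, status) {
      const j = jobs.get(jobId);
      if (!j) return;
      j.terminalStatus = status;
      flush(jobId); // no-op unless the sync path has handed over ownership
    },
    // Called by the sync path once it decides to return a deferred response.
    armFlightCompleteForDeferral(jobId) {
      const j = jobs.get(jobId);
      if (!j) return;
      j.armed = true;
      flush(jobId); // assumption: catch up if the job already finished
    },
  };
}
```

In this model a sync-inline completion never triggers a manager write, which leaves the sync handler's richer completion record as the only one.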
@@ -495,6 +520,30 @@ function extractUsageAndCost(cli, output, outputFormat) {
495
520
  // once we resolve the session id post-run.
496
521
  return {};
497
522
  }
523
+ /**
524
+ * Slice 1.5: build the async-job-manager's FR payload from a prep object
525
+ * (which every prepare*Request returns), plus the bound CLI and output
526
+ * format primitives needed by extractUsageAndCost. Returning the closure
527
+ * separately means it captures `cliName` and `fmt` ONLY — never `params`
528
+ * or `prep` — so retention on AsyncJobRecord is O(constant).
529
+ */
530
+ function buildAsyncFlightRecorderHandoff(cliName, prep, sessionId, outputFormat) {
531
+ // Extract primitives BEFORE building the closure — capturing `prep` or
532
+ // `params` directly would pin large attachments / promptParts on the
533
+ // AsyncJobRecord for JOB_TTL_MS.
534
+ const cli = cliName;
535
+ const fmt = outputFormat;
536
+ return {
537
+ flightRecorderEntry: {
538
+ model: prep.resolvedModel || "default",
539
+ prompt: prep.effectivePrompt,
540
+ sessionId,
541
+ stablePrefixHash: prep.stablePrefixHash ?? undefined,
542
+ stablePrefixTokens: prep.stablePrefixTokens ?? undefined,
543
+ },
544
+ extractUsage: (stdout) => extractUsageAndCost(cli, stdout, fmt),
545
+ };
546
+ }
498
547
  function safeFlightStart(entry, runtime = resolveGatewayServerRuntime()) {
499
548
  try {
500
549
  runtime.flightRecorder.logStart(entry);
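The retention argument in the doc comment above rests on the closure referencing only two primitives. A minimal sketch of the same construction follows, assuming a stand-in `extractUsageStub` in place of `extractUsageAndCost` (whose return shape is not shown in this hunk); `buildHandoff` is likewise a hypothetical name.

```javascript
// Hypothetical stand-in for extractUsageAndCost; it only echoes its inputs.
function extractUsageStub(cli, stdout, fmt) {
  return { cli, fmt, bytes: stdout.length };
}

// Mirrors the shape of the helper above: copy what is needed out of `prep`,
// then build a closure that references only the two primitives.
function buildHandoff(cliName, prep, sessionId, outputFormat) {
  const cli = cliName;
  const fmt = outputFormat;
  return {
    flightRecorderEntry: {
      model: prep.resolvedModel || "default",
      prompt: prep.effectivePrompt,
      sessionId,
      stablePrefixHash: prep.stablePrefixHash ?? undefined,
      stablePrefixTokens: prep.stablePrefixTokens ?? undefined,
    },
    // The arrow function never mentions `prep`, so it has no reason to keep
    // attachments or prompt parts reachable for the lifetime of the job record.
    extractUsage: (stdout) => extractUsageStub(cli, stdout, fmt),
  };
}
```

Note that the entry still carries the effective prompt string itself, so the constant-size claim applies to the closure, not to the entry.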
@@ -1542,7 +1591,8 @@ export async function handleGeminiRequest(deps, params) {
1542
1591
  args.push(...sessionPlan.args);
1543
1592
  const userProvidedSession = sessionPlan.resumed;
1544
1593
  const effectiveSessionIdHint = sessionPlan.resumed ? params.sessionId : undefined;
1545
- const result = await awaitJobOrDefer("gemini", args, corrId, resolveIdleTimeout("gemini", params.idleTimeoutMs), params.outputFormat, params.forceRefresh, runtime);
1594
+ const geminiFrHandoff = buildAsyncFlightRecorderHandoff("gemini", prep, params.sessionId, params.outputFormat);
1595
+ const result = await awaitJobOrDefer("gemini", args, corrId, resolveIdleTimeout("gemini", params.idleTimeoutMs), params.outputFormat, params.forceRefresh, runtime, undefined, undefined, geminiFrHandoff.flightRecorderEntry, geminiFrHandoff.extractUsage);
1546
1596
  // Deferred — job still running, return async reference
1547
1597
  if (isDeferredResponse(result)) {
1548
1598
  return buildDeferredToolResponse(result, effectiveSessionIdHint);
@@ -1675,7 +1725,10 @@ export async function handleGeminiRequestAsync(deps, params) {
1675
1725
  // surfaces it in the snapshot).
1676
1726
  assertUpstreamCliArgs("gemini", args);
1677
1727
  assertUpstreamCliEnv("gemini", undefined);
1678
- const job = deps.asyncJobManager.startJob("gemini", args, corrId, undefined, resolveIdleTimeout("gemini", params.idleTimeoutMs), params.outputFormat, params.forceRefresh);
1728
+ // Slice 1.5: pure async path has no upstream safeFlightStart, so the
1729
+ // manager owns both logStart and logComplete for this corrId.
1730
+ const geminiAsyncFrHandoff = buildAsyncFlightRecorderHandoff("gemini", prep, effectiveSessionId, params.outputFormat);
1731
+ const job = deps.asyncJobManager.startJob("gemini", args, corrId, undefined, resolveIdleTimeout("gemini", params.idleTimeoutMs), params.outputFormat, params.forceRefresh, undefined, undefined, geminiAsyncFrHandoff.flightRecorderEntry, geminiAsyncFrHandoff.extractUsage, true);
1679
1732
  deps.logger.info(`[${corrId}] gemini_request_async started job ${job.id}`);
1680
1733
  const asyncResponse = {
1681
1734
  success: true,
@@ -1745,7 +1798,8 @@ export async function handleGrokRequest(deps, params) {
1745
1798
  createNewSession: params.createNewSession,
1746
1799
  });
1747
1800
  args.push(...sessionResult.resumeArgs);
1748
- const result = await awaitJobOrDefer("grok", args, corrId, resolveIdleTimeout("grok", params.idleTimeoutMs), params.outputFormat, params.forceRefresh, runtime);
1801
+ const grokFrHandoff = buildAsyncFlightRecorderHandoff("grok", prep, params.sessionId, params.outputFormat);
1802
+ const result = await awaitJobOrDefer("grok", args, corrId, resolveIdleTimeout("grok", params.idleTimeoutMs), params.outputFormat, params.forceRefresh, runtime, undefined, undefined, grokFrHandoff.flightRecorderEntry, grokFrHandoff.extractUsage);
1749
1803
  // Deferred — job still running, return async reference
1750
1804
  if (isDeferredResponse(result)) {
1751
1805
  return buildDeferredToolResponse(result, sessionResult.effectiveSessionId);
@@ -1875,7 +1929,8 @@ export async function handleGrokRequestAsync(deps, params) {
1875
1929
  // Start job only after all session I/O succeeds
1876
1930
  assertUpstreamCliArgs("grok", args);
1877
1931
  assertUpstreamCliEnv("grok", undefined);
1878
- const job = deps.asyncJobManager.startJob("grok", args, corrId, undefined, resolveIdleTimeout("grok", params.idleTimeoutMs), params.outputFormat, params.forceRefresh);
1932
+ const grokAsyncFrHandoff = buildAsyncFlightRecorderHandoff("grok", prep, effectiveSessionId, params.outputFormat);
1933
+ const job = deps.asyncJobManager.startJob("grok", args, corrId, undefined, resolveIdleTimeout("grok", params.idleTimeoutMs), params.outputFormat, params.forceRefresh, undefined, undefined, grokAsyncFrHandoff.flightRecorderEntry, grokAsyncFrHandoff.extractUsage, true);
1879
1934
  deps.logger.info(`[${corrId}] grok_request_async started job ${job.id}`);
1880
1935
  const asyncResponse = {
1881
1936
  success: true,
@@ -1943,7 +1998,8 @@ export async function handleMistralRequest(deps, params) {
1943
1998
  createNewSession: params.createNewSession,
1944
1999
  });
1945
2000
  args.push(...sessionResult.resumeArgs);
1946
- let result = await awaitJobOrDefer("mistral", args, corrId, resolveIdleTimeout("mistral", params.idleTimeoutMs), params.outputFormat, params.forceRefresh, runtime, mistralEnv);
2001
+ const mistralFrHandoff = buildAsyncFlightRecorderHandoff("mistral", prep, params.sessionId, params.outputFormat);
2002
+ let result = await awaitJobOrDefer("mistral", args, corrId, resolveIdleTimeout("mistral", params.idleTimeoutMs), params.outputFormat, params.forceRefresh, runtime, mistralEnv, undefined, mistralFrHandoff.flightRecorderEntry, mistralFrHandoff.extractUsage);
1947
2003
  if (isDeferredResponse(result)) {
1948
2004
  return buildDeferredToolResponse(result, sessionResult.effectiveSessionId);
1949
2005
  }
@@ -1964,7 +2020,9 @@ export async function handleMistralRequest(deps, params) {
1964
2020
  disallowedTools: params.disallowedTools,
1965
2021
  });
1966
2022
  const retryArgs = [...retryPrep.args, ...sessionResult.resumeArgs];
1967
- result = await awaitJobOrDefer("mistral", retryArgs, corrId, resolveIdleTimeout("mistral", params.idleTimeoutMs), params.outputFormat, true, runtime, retryPrep.env);
2023
+ // Reuse the FR handoff built above; the retry preserves corrId,
2024
+ // so the manager's logComplete still updates the original row.
2025
+ result = await awaitJobOrDefer("mistral", retryArgs, corrId, resolveIdleTimeout("mistral", params.idleTimeoutMs), params.outputFormat, true, runtime, retryPrep.env, undefined, mistralFrHandoff.flightRecorderEntry, mistralFrHandoff.extractUsage);
1968
2026
  if (isDeferredResponse(result)) {
1969
2027
  return buildDeferredToolResponse(result, sessionResult.effectiveSessionId);
1970
2028
  }
@@ -2092,7 +2150,8 @@ export async function handleMistralRequestAsync(deps, params) {
2092
2150
  }
2093
2151
  assertUpstreamCliArgs("mistral", args);
2094
2152
  assertUpstreamCliEnv("mistral", mistralEnv);
2095
- const job = deps.asyncJobManager.startJob("mistral", args, corrId, undefined, resolveIdleTimeout("mistral", params.idleTimeoutMs), params.outputFormat, params.forceRefresh, mistralEnv);
2153
+ const mistralAsyncFrHandoff = buildAsyncFlightRecorderHandoff("mistral", prep, effectiveSessionId, params.outputFormat);
2154
+ const job = deps.asyncJobManager.startJob("mistral", args, corrId, undefined, resolveIdleTimeout("mistral", params.idleTimeoutMs), params.outputFormat, params.forceRefresh, mistralEnv, undefined, mistralAsyncFrHandoff.flightRecorderEntry, mistralAsyncFrHandoff.extractUsage, true);
2096
2155
  deps.logger.info(`[${corrId}] mistral_request_async started job ${job.id}`);
2097
2156
  const asyncResponse = {
2098
2157
  success: true,
@@ -2193,9 +2252,10 @@ export async function handleCodexRequestAsync(deps, params) {
2193
2252
  // registering the record, ownership stays here and we run it in the catch.
2194
2253
  assertUpstreamCliArgs("codex", args);
2195
2254
  assertUpstreamCliEnv("codex", undefined);
2255
+ const codexAsyncFrHandoff = buildAsyncFlightRecorderHandoff("codex", prep, effectiveSessionId, params.outputFormat);
2196
2256
  let job;
2197
2257
  try {
2198
- job = deps.asyncJobManager.startJob("codex", args, corrId, undefined, resolveIdleTimeout("codex", params.idleTimeoutMs), params.outputFormat, params.forceRefresh, undefined, prepCleanup);
2258
+ job = deps.asyncJobManager.startJob("codex", args, corrId, undefined, resolveIdleTimeout("codex", params.idleTimeoutMs), params.outputFormat, params.forceRefresh, undefined, prepCleanup, codexAsyncFrHandoff.flightRecorderEntry, codexAsyncFrHandoff.extractUsage, true);
2199
2259
  // Handoff succeeded: AsyncJobManager will fire prepCleanup on terminal
2200
2260
  // status. Release our local ownership claim so the catch path doesn't
2201
2261
  // double-fire.
@@ -2461,7 +2521,8 @@ export function createGatewayServer(deps = {}) {
2461
2521
  }
2462
2522
  // Idle timeout only for stream-json (text/json produce no output until done)
2463
2523
  const effectiveIdleTimeout = outputFormat === "stream-json" ? resolveIdleTimeout("claude", idleTimeoutMs) : undefined;
2464
- const result = await awaitJobOrDefer("claude", args, corrId, effectiveIdleTimeout, outputFormat, forceRefresh, runtime);
2524
+ const claudeSyncFrHandoff = buildAsyncFlightRecorderHandoff("claude", prep, effectiveSessionId, outputFormat);
2525
+ const result = await awaitJobOrDefer("claude", args, corrId, effectiveIdleTimeout, outputFormat, forceRefresh, runtime, undefined, undefined, claudeSyncFrHandoff.flightRecorderEntry, claudeSyncFrHandoff.extractUsage);
2465
2526
  // Deferred — job still running, return async reference
2466
2527
  if (isDeferredResponse(result)) {
2467
2528
  return buildDeferredToolResponse(result, effectiveSessionId);
@@ -2703,7 +2764,8 @@ export function createGatewayServer(deps = {}) {
2703
2764
  // completion or deferred). The outer finally MUST NOT clean again.
2704
2765
  const prepCleanup = "cleanup" in prep && typeof prep.cleanup === "function" ? prep.cleanup : undefined;
2705
2766
  try {
2706
- const result = await awaitJobOrDefer("codex", args, corrId, resolveIdleTimeout("codex", idleTimeoutMs), outputFormat, forceRefresh, runtime, undefined, prepCleanup);
2767
+ const codexSyncFrHandoff = buildAsyncFlightRecorderHandoff("codex", prep, sessionId, outputFormat);
2768
+ const result = await awaitJobOrDefer("codex", args, corrId, resolveIdleTimeout("codex", idleTimeoutMs), outputFormat, forceRefresh, runtime, undefined, prepCleanup, codexSyncFrHandoff.flightRecorderEntry, codexSyncFrHandoff.extractUsage);
2707
2769
  // Deferred — job still running, return async reference. Cleanup
2708
2770
  // ownership belongs to AsyncJobManager via onComplete.
2709
2771
  if (isDeferredResponse(result)) {
@@ -3344,7 +3406,8 @@ export function createGatewayServer(deps = {}) {
3344
3406
  : undefined;
3345
3407
  assertUpstreamCliArgs("claude", args);
3346
3408
  assertUpstreamCliEnv("claude", undefined);
3347
- const job = asyncJobManager.startJob("claude", args, corrId, undefined, effectiveIdleTimeout, outputFormat, forceRefresh);
3409
+ const claudeAsyncFrHandoff = buildAsyncFlightRecorderHandoff("claude", prep, effectiveSessionId, outputFormat);
3410
+ const job = asyncJobManager.startJob("claude", args, corrId, undefined, effectiveIdleTimeout, outputFormat, forceRefresh, undefined, undefined, claudeAsyncFrHandoff.flightRecorderEntry, claudeAsyncFrHandoff.extractUsage, true);
3348
3411
  logger.info(`[${corrId}] claude_request_async started job ${job.id}, outputFormat=${outputFormat}`);
3349
3412
  const asyncResponse = {
3350
3413
  success: true,
@@ -51,10 +51,35 @@ export interface JobStore {
51
51
  }): void;
52
52
  getById(id: string): JobRecord | null;
53
53
  findByRequestKey(requestKey: string): JobRecord | null;
54
- markOrphanedOnStartup(): number;
54
+ /**
55
+ * Flip every `status='running'` row to `'orphaned'` at gateway boot.
56
+ *
57
+ * Returns the row count AND a snapshot of every row that was flipped, so
58
+ * AsyncJobManager can write a flight-recorder logComplete with the full
59
+ * sync-helper-equivalent payload (response from stderr||stdout,
60
+ * durationMs from startedAt). Pre-slice-1.5 rows that never wrote a
61
+ * logStart degrade silently to a no-op UPDATE inside the FR.
62
+ */
63
+ markOrphanedOnStartup(): {
64
+ count: number;
65
+ orphaned: Array<OrphanedJobSnapshot>;
66
+ };
55
67
  evictExpired(): number;
56
68
  close(): void;
57
69
  }
70
+ /**
71
+ * Per-orphan snapshot returned by `markOrphanedOnStartup` so the
72
+ * AsyncJobManager constructor can build a faithful FlightLogResult for
73
+ * each row it flipped.
74
+ */
75
+ export interface OrphanedJobSnapshot {
76
+ id: string;
77
+ correlationId: string;
78
+ startedAt: string;
79
+ stdout: string;
80
+ stderr: string;
81
+ exitCode: number | null;
82
+ }
58
83
  /**
59
84
  * SQLite-backed job store. Default backend for production. Durable across
60
85
  * gateway restarts; safe for single-instance deployments.
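The doc comments above say each snapshot feeds a completion record with `durationMs` derived from `startedAt` and a response taken from `stderr || stdout`. One way a consumer might do that is sketched below. `orphanToFlightResult` and the result field names are hypothetical (the `FlightLogResult` shape is not shown in this diff), and recording orphans as `"failed"` is an assumption.

```javascript
// Hypothetical conversion from an OrphanedJobSnapshot to a completion payload.
function orphanToFlightResult(snapshot, nowMs = Date.now()) {
  const startedMs = Date.parse(snapshot.startedAt);
  // Guard against unparsable timestamps and clock skew.
  const durationMs = Number.isNaN(startedMs) ? 0 : Math.max(0, nowMs - startedMs);
  return {
    correlationId: snapshot.correlationId,
    status: "failed", // assumption: an orphaned job is treated as a failure
    response: snapshot.stderr || snapshot.stdout,
    durationMs,
    exitCode: snapshot.exitCode,
  };
}
```

Callers that previously used the numeric return value would now read it from the `count` property, for example `const { count, orphaned } = store.markOrphanedOnStartup();`.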
@@ -69,6 +94,7 @@ export declare class SqliteJobStore implements JobStore {
69
94
  private updateCompleteStmt;
70
95
  private getByIdStmt;
71
96
  private findByRequestKeyStmt;
97
+ private selectRunningOrphansStmt;
72
98
  private markOrphanedStmt;
73
99
  private deleteExpiredStmt;
74
100
  constructor(dbPath: string, logger?: Logger, options?: {
@@ -114,8 +140,15 @@ export declare class SqliteJobStore implements JobStore {
114
140
  /**
115
141
  * On gateway boot, flip any jobs that were 'running' to 'orphaned'.
116
142
  * The child processes were detached but can't be reattached to in this process.
143
+ *
144
+ * Returns the row count + a per-orphan snapshot so AsyncJobManager can
145
+ * write a flight-recorder logComplete with proper audit data
146
+ * (durationMs from startedAt, response from stderr||stdout).
117
147
  */
118
- markOrphanedOnStartup(): number;
148
+ markOrphanedOnStartup(): {
149
+ count: number;
150
+ orphaned: Array<OrphanedJobSnapshot>;
151
+ };
119
152
  /**
120
153
  * Delete rows whose expires_at has passed. Returns number of rows deleted.
121
154
  */
@@ -171,7 +204,10 @@ export declare class MemoryJobStore implements JobStore {
171
204
  * In-memory stores have no cross-process state, so any "running" rows here
172
205
  * came from this very process and aren't actually orphaned. No-op.
173
206
  */
174
- markOrphanedOnStartup(): number;
207
+ markOrphanedOnStartup(): {
208
+ count: number;
209
+ orphaned: Array<OrphanedJobSnapshot>;
210
+ };
175
211
  evictExpired(): number;
176
212
  close(): void;
177
213
  }
@@ -188,7 +224,10 @@ export declare class PostgresJobStore implements JobStore {
188
224
  recordComplete(): void;
189
225
  getById(): JobRecord | null;
190
226
  findByRequestKey(): JobRecord | null;
191
- markOrphanedOnStartup(): number;
227
+ markOrphanedOnStartup(): {
228
+ count: number;
229
+ orphaned: Array<OrphanedJobSnapshot>;
230
+ };
192
231
  evictExpired(): number;
193
232
  close(): void;
194
233
  }
package/dist/job-store.js CHANGED
@@ -73,6 +73,7 @@ export class SqliteJobStore {
73
73
  updateCompleteStmt;
74
74
  getByIdStmt;
75
75
  findByRequestKeyStmt;
76
+ selectRunningOrphansStmt;
76
77
  markOrphanedStmt;
77
78
  deleteExpiredStmt;
78
79
  constructor(dbPath, logger = noopLogger, options = {}) {
@@ -148,6 +149,16 @@ export class SqliteJobStore {
148
149
  AND status IN ('running', 'completed')
149
150
  ORDER BY started_at DESC
150
151
  LIMIT 1
152
+ `);
153
+ // Snapshot every in-flight row's audit data BEFORE the orphan-flip
154
+ // UPDATE so AsyncJobManager can construct a full FlightLogResult per
155
+ // orphan. No transaction wrapper required: gateway boot is
156
+ // single-threaded before any new jobs can arrive, so no
157
+ // status='running' row can be inserted between this SELECT and the
158
+ // UPDATE below.
159
+ this.selectRunningOrphansStmt = this.db.prepare(`
160
+ SELECT id, correlation_id, started_at, stdout, stderr, exit_code
161
+ FROM jobs WHERE status = 'running'
151
162
  `);
152
163
  this.markOrphanedStmt = this.db.prepare(`
153
164
  UPDATE jobs
@@ -227,14 +238,29 @@ export class SqliteJobStore {
227
238
  /**
228
239
  * On gateway boot, flip any jobs that were 'running' to 'orphaned'.
229
240
  * The child processes were detached but can't be reattached to in this process.
241
+ *
242
+ * Returns the row count + a per-orphan snapshot so AsyncJobManager can
243
+ * write a flight-recorder logComplete with proper audit data
244
+ * (durationMs from startedAt, response from stderr||stdout).
230
245
  */
231
246
  markOrphanedOnStartup() {
232
247
  const now = new Date().toISOString();
233
248
  // Orphaned jobs retain a short window so callers can fetch the partial output,
234
249
  // then evict. Reuse the standard retention.
235
250
  const expiresAt = new Date(Date.now() + this.retentionMs).toISOString();
251
+ // SELECT before UPDATE — gateway boot is single-threaded so no row can
252
+ // appear in 'running' between the two statements.
253
+ const rows = (this.selectRunningOrphansStmt.all?.() ?? []);
254
+ const orphaned = rows.map(row => ({
255
+ id: row.id,
256
+ correlationId: row.correlation_id,
257
+ startedAt: row.started_at,
258
+ stdout: row.stdout ?? "",
259
+ stderr: row.stderr ?? "",
260
+ exitCode: row.exit_code,
261
+ }));
236
262
  const result = this.markOrphanedStmt.run(now, expiresAt);
237
- return result?.changes ?? 0;
263
+ return { count: result?.changes ?? 0, orphaned };
238
264
  }
239
265
  /**
240
266
  * Delete rows whose expires_at has passed. Returns number of rows deleted.
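The snapshot-then-flip sequence in `markOrphanedOnStartup` above can be modeled without a database. The sketch below uses a plain array in place of the `jobs` table and is only an illustration: `markOrphaned` is a hypothetical name, and the `finished_at` and `expires_at` fields stand in for whatever columns the real UPDATE sets, which this hunk does not show.

```javascript
// Hypothetical in-memory model of the SELECT-then-UPDATE sequence.
function markOrphaned(rows, nowIso, expiresAtIso) {
  // Step 1: snapshot audit data for every in-flight row before touching it.
  const orphaned = rows
    .filter((row) => row.status === "running")
    .map((row) => ({
      id: row.id,
      correlationId: row.correlation_id,
      startedAt: row.started_at,
      stdout: row.stdout ?? "",
      stderr: row.stderr ?? "",
      exitCode: row.exit_code,
    }));

  // Step 2: flip the same rows. This is safe without a transaction only
  // because nothing else can add running rows between the two steps,
  // which is the single-threaded boot assumption the source comment states.
  let count = 0;
  for (const row of rows) {
    if (row.status === "running") {
      row.status = "orphaned";
      row.finished_at = nowIso;      // assumed column
      row.expires_at = expiresAtIso; // assumed column
      count += 1;
    }
  }
  return { count, orphaned };
}
```

Taking the snapshot first is what keeps the original `stdout`, `stderr`, and `started_at` values available to the caller regardless of what the UPDATE overwrites.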
@@ -341,7 +367,7 @@ export class MemoryJobStore {
341
367
  * came from this very process and aren't actually orphaned. No-op.
342
368
  */
343
369
  markOrphanedOnStartup() {
344
- return 0;
370
+ return { count: 0, orphaned: [] };
345
371
  }
346
372
  evictExpired() {
347
373
  const nowIso = new Date().toISOString();
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "llm-cli-gateway",
3
- "version": "1.6.0",
3
+ "version": "1.7.0",
4
4
  "mcpName": "io.github.verivus-oss/llm-cli-gateway",
5
5
  "description": "MCP server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral Vibe CLIs with session management, retry logic, async job orchestration, durable job results, and cross-LLM validation.",
6
6
  "license": "MIT",
package/socket.yml CHANGED
@@ -14,6 +14,25 @@ version: 2
14
14
  # src/endpoint-exposure.ts also issues a HEAD probe when verifying
15
15
  # tunnel reachability — opt-in via the start:http entry point only.
16
16
  #
17
+ # Additionally, Socket may flag `dist/index.js` and `dist/job-store.js`
18
+ # against the `globalThis["fetch"]` rule. This is a substring-match
19
+ # false positive (verified for v1.6.0 by sub-agent investigation on
20
+ # 2026-05-26; same matches exist in v1.5.35). Neither file contains
21
+ # any `fetch(`, `globalThis.fetch`, polyfill import, or any other
22
+ # network-call construct. The matches are:
23
+ # - dist/index.js — the English word "fetch" inside an async-defer
24
+ # error message ("Poll with llm_job_status, fetch with
25
+ # llm_job_result.") AND the JSON field name `fetchWith:
26
+ # "llm_job_result"` (part of the deferred-job response contract).
27
+ # - dist/job-store.js — the word "fetch" inside a code comment on
28
+ # markOrphanedOnStartup() describing how callers retrieve partial
29
+ # output from SQLite.
30
+ # Verify with: `grep -rEn "\bfetch\(|globalThis\.fetch|globalThis\[" dist/`
31
+ # — returns empty. Production code does not import undici / node-fetch
32
+ # / axios / got. The cache-awareness slice (v1.6.0) introduced zero
33
+ # new network surfaces; all I/O is filesystem (SQLite, sessions.json)
34
+ # or in-process.
35
+ #
17
36
  # shellAccess
18
37
  # src/executor.ts uses child_process.spawn(cmd, args, { ... }) with a
19
38
  # fixed allow-list of CLI binaries (claude / codex / gemini / grok /