llm-cli-gateway 1.5.35 → 1.6.1

package/CHANGELOG.md CHANGED
@@ -2,6 +2,207 @@
2
2
 
3
3
  All notable changes to the llm-cli-gateway project.
4
4
 
5
+ ## [1.6.1] - 2026-05-26 — docs-only follow-up to 1.6.0
6
+
7
+ Pure documentation release; zero source-code changes since 1.6.0.
8
+
9
+ ### Changed — agent-install guidance current with v1.6.0 + five providers
10
+
11
+ - New `setup/providers/mistral-vibe.md` provider snippet (Mistral was the
12
+ fifth provider but had no setup/providers/ page; install agents had
13
+ nothing to point at when the user asked for Mistral coverage).
14
+ - New `setup/assistants/mistral-install-prompt.md` per-assistant install
15
+ prompt (mirrors the Grok prompt; outbound-only framing,
16
+ session_logging walk-through, `VIBE_ACTIVE_MODEL` guidance, secret-
17
+ safety rules preserved).
18
+ - `setup/assistants/ASSISTANT_CONTRACT.md`: Mistral added to "Applies
19
+ to" and outbound providers; new "Doctor Report Notes (v1.6.0)"
20
+ paragraph clarifying that the `cache_awareness` block is structural
21
+ (always present) and that all `[cache_awareness]` flags default off.
22
+ - All 6 per-assistant install prompts (universal, chatgpt, claude,
23
+ codex, gemini, grok) extended to enumerate all five providers and
24
+ reference the cache_awareness doctor block.
25
+ - `setup/install-plan.dag.toml` choose-targets / check-diagnostics /
26
+ apply-client-snippet steps generalised to all five providers; Mistral
27
+ named outbound-only; cache_awareness must-not-treat-as-blocker note
28
+ added inline. TOML re-validated.
29
+ - 6 `docs/personal-mcp/connect-*.md` legacy pages now carry an
30
+ admonition pointing to `setup/providers/` + `ASSISTANT_CONTRACT.md`
31
+ as canonical.
32
+
33
+ ### Changed — 12 SKILL.md files current with v1.6.0
34
+
35
+ - All 12 skills (7 under `skills/`, 5 under `.agents/skills/`) extended
36
+ with `promptParts`, `cache_state://` MCP resources, and (where the
37
+ skill's centre of gravity is session continuity) the
38
+ `cache_ttl_expiring_soon` warning. Depth tiered by skill audience:
39
+ multi-llm-orchestration, model-routing, multi-llm-consensus,
40
+ implement-review-fix, multi-llm-review, async-job-orchestration,
41
+ session-workflow, secure-orchestration carry full sections or
42
+ examples; agent-codex-gate, codex-review-gate, design-review-cycle,
43
+ red-team-assessment carry tip-level mentions.
44
+ - Plugin-namespaced skills (`.agents/skills/*`) version-bumped 1.5 → 1.6.
45
+ - Exact runtime strings cross-checked against `src/index.ts` (the
46
+ `provide exactly one of …` / `one of … is required` mutex errors and
47
+ the `cache_ttl_expiring_soon` warning code).
48
+
49
+ ### Fixed — README / BEST_PRACTICES / integrations doc drift
50
+
51
+ - README.md: headline + Core Capabilities now name Mistral as the fifth
52
+ provider; test counts 284 / 221 → 681; new Supply-chain hardening
53
+ call-out under Security & Quality.
54
+ - BEST_PRACTICES.md: testing coverage / performance lines 284 → 681.
55
+ - integrations/llm-plugin/README.md: Grok + Mistral added to providers
56
+ list, usage examples, and the "at least one of" requirements list.
57
+ - ENFORCEMENT.md: self-enforcement checklist provider list now Claude /
58
+ Codex / Gemini / Grok / Mistral.
59
+
60
+ ### Fixed — `docs/launch/blog-cache-awareness.md` accuracy + voice
61
+
62
+ Technical corrections from the multi-LLM voice + technical review:
63
+ - Mutually-exclusive error-string quotation reformatted so the
64
+ ``provide exactly one of `prompt` or `promptParts``` example renders
65
+ correctly in markdown.
66
+ - `lastWriteAt` references corrected to `lastRequestAt` (the actual
67
+ public field name on `SessionCacheStats`).
68
+ - Security tools sentence rewritten: separates SHA-pinned actions,
69
+ version-pinned Python/Go tools, and the SHA256-verified gitleaks
70
+ binary; clarifies that `eslint-plugin-security` runs via the existing
71
+ eslint config (not security.yml); replaces the inaccurate "Top-level
72
+ `permissions: contents: read` on every workflow" claim with the
73
+ accurate least-privilege phrasing.
74
+ - "Signed installer artefacts" → "SHA256-verifiable installer artefacts"
75
+ (no signing today); npm note adds the sigstore-provenance context.
76
+ - Haiku 3.5 Vertex 2048 caveat added: the in-code alias table
77
+ conservatively collapses all Haiku variants to 4096.
78
+ - Solorigate / Codecov / xz now link separately.
79
+ - Codex smoke-test evidence now links to
80
+ `docs/personal-mcp/PROVIDER_CACHE_SURFACES.md` and the CHANGELOG.
81
+ - Three broken links surfaced by lychee CI fixed: Mistral Vibe URL,
82
+ bare CLAUDE.md link (the file lives outside the gateway repo), and
83
+ the agent-assurance exclude regex tightened to match bare URLs.
84
+
85
+ ### Fixed — `socket.yml` networkAccess false-positive documentation
86
+
87
+ - Documented that the `globalThis["fetch"]` flag on `dist/index.js` /
88
+ `dist/job-store.js` is a substring-match false positive. Neither file
89
+ contains any actual fetch call; the matches are English-prose
90
+ occurrences in an error message, the `fetchWith` JSON field name, and
91
+ a code comment. Verified by sub-agent investigation, no code change
92
+ required, no attack-surface delta vs 1.5.35.
93
+
94
+ ### Fixed — `lychee.toml` exclusions
95
+
96
+ - Added `https://npmjs.com/`, `https://help.openai.com/`, and bare
97
+ `github.com/verivus-oss/agent-assurance` URLs to the exclude list
98
+ (each is a Cloudflare bot-blocked / private host that returns
99
+ 4xx/5xx to anonymous CI requests). Rationale documented inline.
100
+
101
+ ## [1.6.0] - 2026-05-26 — cache-awareness phase 1 + security posture
102
+
103
+ Also includes (beyond cache-awareness):
104
+
105
+ ### Added — free-OSS security posture (matches verivus-oss/agent-assurance)
106
+
107
+ - New `.github/workflows/security.yml` running on every push + PR:
108
+ actionlint, zizmor, shellcheck, typos, osv-scanner, gitleaks, ruff,
109
+ bandit, lychee. SHA-pinned, fail-on-finding.
110
+ - `eslint-plugin-security` 3.0.1 wired into the existing eslint config.
111
+ - `SECURITY.md` (vulnerability reporting policy), `.github/CODEOWNERS`
112
+ (review routing for security-sensitive paths), `_typos.toml`,
113
+ `lychee.toml`, `.gitleaks.toml`, `.github/actionlint.yaml`,
114
+ `integrations/llm-plugin/.bandit`.
115
+ - Workflow hygiene: top-level `permissions: contents: read`, per-job
116
+ permissions explicit, `persist-credentials: false` on every `actions/checkout`
117
+ except the upload job in `release-installer.yml`. Cache disabled on
118
+ release-triggered setup-node/setup-go (zizmor cache-poisoning).
119
+ - Dependabot: added `npm` ecosystem at `/` and `pip` ecosystem at
120
+ `/integrations/llm-plugin/` (github-actions group preserved).
121
+ - `installer/go.mod` bumped Go 1.22 → 1.25 (clears 26 stdlib CVEs
122
+ flagged by osv-scanner); `release-installer.yml` setup-go pin
123
+ updated in lock-step.
124
+
125
+ ### Added — cache-awareness slice 1+2+3 (all opt-in, default OFF)
126
+
129
+ - **`promptParts` on every `*_request` / `*_request_async` tool** (claude, codex,
130
+ gemini, grok, mistral; sync + async = 10 tools). Accepts
131
+ `{ system?, tools?, context?, task }`. Mutually exclusive with `prompt`.
132
+ The gateway concatenates in canonical order (`system → tools → context → task`)
133
+ so the stable prefix bytes precede the volatile task tail unchanged across
134
+ calls — raising implicit cache hit rate without calling provider cache APIs.
135
+ The exact error strings `provide exactly one of \`prompt\` or \`promptParts\``
136
+ and `one of \`prompt\` or \`promptParts\` is required` are stable API
137
+ contract.
138
+ - **Flight-recorder v3 migration**: new columns `stable_prefix_hash`
139
+ (sha256) and `stable_prefix_tokens` (integer bytes/4 heuristic) on
140
+ `requests`, plus `idx_requests_stable_hash`. Legacy rows keep NULL.
141
+ - **Cache-state MCP resources** (read-only, tokens/hashes/aggregates only —
142
+ never raw prompt text):
143
+ - `cache_state://global` (last 24h aggregates + per-CLI breakdown).
144
+ - `cache_state://session/{sessionId}` (per-session).
145
+ - `cache_state://prefix/{hash}` (per-stable-prefix-hash).
146
+ - **`session_get.cacheState`** projection: compact hit-rate / hit-count /
147
+ cache-token-totals / estimated-savings-USD block, present only when the
148
+ session has prior requests. Omitted entirely (not null, not empty) for
149
+ fresh sessions. NOT persisted on the Session interface — it is a
150
+ read-time projection from the flight recorder.
151
+ - **`computeTtlRemaining()` + `cache_ttl_expiring_soon` warning**: claude
152
+ sync + async handlers attach a structured `warnings[]` entry when a
153
+ resumed session's Anthropic cache breakpoint is within 30 s of expiry
154
+ (gated on `[cache_awareness].warn_on_ttl_expiry`; default false). The
155
+ TTL math respects `anthropic_ttl_seconds = 300 | 3600`.
156
+ - **Doctor `cache_awareness` block**: always present, zeroed when the
157
+ flight recorder is empty. Reports `enabled_features` (active flags),
158
+ `last_24h` (hit rate + savings), and `per_cli` aggregates. JSON schema
159
+ updated; `setup/status.schema.json` `additionalProperties: false`
160
+ intact at the root.
161
+ - **`[cache_awareness]` config block** in `~/.llm-cli-gateway/config.toml`:
162
+ - `emit_anthropic_cache_control = false`
163
+ - `anthropic_ttl_seconds = 300` (enum: 300 | 3600)
164
+ - `warn_on_ttl_expiry = false`
165
+ - `[cache_awareness.min_stable_tokens_for_cache_control]` per-family
166
+ table (sonnet=1024, opus=4096, haiku=4096, default=4096).
167
+ Validated by a separate Zod schema and loader (`loadCacheAwarenessConfig`);
168
+ a malformed `[cache_awareness]` block does NOT break `loadPersistenceConfig`
169
+ and vice versa. No env-var overrides.
170
+
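The `cache_ttl_expiring_soon` bullet above can be sketched roughly as follows. This is a minimal illustration, not the gateway's handler code: the function names, the shape of the `warnings[]` entry, and the choice to warn when the remaining time is at or below the 30 s window are all assumptions. Only the warning code, the 30 s window, and the two config keys come from the changelog.

```javascript
// Hypothetical sketch; only the warning code and config key names are from the changelog.
const WARN_WINDOW_MS = 30_000; // "within 30 s of expiry"

// Best-effort milliseconds left on the cache breakpoint, or null if the timestamp is unusable.
function ttlRemainingMs(lastRequestAtIso, ttlSeconds, nowMs) {
  const lastMs = Date.parse(lastRequestAtIso);
  if (!Number.isFinite(lastMs)) return null;
  return Math.max(0, ttlSeconds * 1000 - (nowMs - lastMs));
}

// Returns a warnings[] array; empty unless the flag is on and the session is a claude session.
function ttlWarnings(session, config, nowMs = Date.now()) {
  if (!config.warn_on_ttl_expiry || session.cli !== "claude") return [];
  const remaining = ttlRemainingMs(
    session.lastRequestAt,
    config.anthropic_ttl_seconds,
    nowMs,
  );
  if (remaining === null || remaining > WARN_WINDOW_MS) return [];
  // Entry shape is an assumption made for this example.
  return [{ code: "cache_ttl_expiring_soon", ttlRemainingMs: remaining }];
}
```

With `anthropic_ttl_seconds = 300` and a last request 280 s ago, 20 s remain and the sketch returns one warning. With `anthropic_ttl_seconds = 3600` the same session produces none.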
171
+ ### Decision: Branch B (prefix-discipline only) for slice 1
172
+
173
+ The gateway does NOT emit explicit `cache_control` JSON to Claude in this
174
+ slice and does NOT route `promptParts.system` into `--system-prompt`. The
175
+ upstream injection mechanism is unverified; Branch A is gated on a live
176
+ smoke test in a follow-up slice. The
177
+ `[cache_awareness].emit_anthropic_cache_control` flag is in place for
178
+ when that lands.
179
+
180
+ ### Deferred / out of scope
181
+
182
+ - **Async-path `stable_prefix_hash` recording**: `src/async-job-manager.ts`
183
+ has zero flight-recorder integration today, so the v3 columns are NOT
184
+ populated for async-job rows. This is a separate concern beyond
185
+ cache-awareness — tracked for a future plan
186
+ (`docs/plans/async-flight-recorder.dag.toml`, TBD). Slice 1's runtime
187
+ mutex check IS in place on the async tool surface; only the flight-recorder
188
+ write is deferred.
189
+ - **Codex parser cache-tokens fix**: `src/codex-json-parser.ts` reads
190
+ Anthropic-style `cache_read_input_tokens` but Codex CLI 0.133.0+ emits
191
+ `cached_input_tokens`. `cache_read_tokens` therefore stays NULL for codex
192
+ rows today. Out of scope for this slice (see PROVIDER_CACHE_SURFACES.md).
193
+
194
+ ### Invariant
195
+
196
+ "No conversation content in session storage" is preserved. The session
197
+ manager (`~/.llm-cli-gateway/sessions.json`) is UNTOUCHED by this slice.
198
+ The cache-awareness columns added by migration v3
199
+ (`stable_prefix_hash`, `stable_prefix_tokens`) live on the existing
200
+ flight recorder (`~/.llm-cli-gateway/logs.db`), which is a separate
201
+ audit-focused store that already records prompts and responses (and is
202
+ not subject to the session-storage invariant). `session_get.cacheState`
203
+ is a read-time PROJECTION from the flight recorder, never persisted on
204
+ the Session interface.
205
+
5
206
  ## [1.5.35] - 2026-05-25
6
207
 
7
208
  ### Fixed
package/README.md CHANGED
@@ -3,7 +3,7 @@
3
3
  > *"Without consultation, plans are frustrated, but with many counselors they succeed."*
4
4
  > — Proverbs 15:22 (LSB)
5
5
 
6
- A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, Gemini, and Grok CLIs with session management, retry logic, and async job orchestration.
6
+ A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral (Vibe) CLIs with session management, retry logic, and async job orchestration.
7
7
 
8
8
  ## Personal MCP Appliance MVP
9
9
 
@@ -79,7 +79,7 @@ docker compose -f docker-compose.personal.yml run --rm doctor
79
79
  ## Features
80
80
 
81
81
  ### Core Capabilities
82
- - **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, Gemini, and Grok CLIs
82
+ - **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, Gemini, Grok, and Mistral (Vibe) CLIs
83
83
  - **Session Management**: Track and resume conversations across all CLIs with persistent storage
84
84
  - **Token Optimization**: Automatic 44% reduction on prompts, 37% on responses (opt-in)
85
85
  - **Correlation ID Tracking**: Full request tracing across all LLM interactions
@@ -88,6 +88,36 @@ docker compose -f docker-compose.personal.yml run --rm doctor
88
88
  ### Observability
89
89
  - **SQLite Flight Recorder**: Every request/response logged to `~/.llm-cli-gateway/logs.db` with correlation IDs, token usage, duration, retry counts, and circuit breaker state. Browse with [Datasette](https://datasette.io/): `datasette ~/.llm-cli-gateway/logs.db`
90
90
  - **Structured Metadata**: Tool responses include machine-readable `structuredContent` (model, cli, correlationId, sessionId, durationMs, token counts)
91
+ - **Cache observability resources**: `cache_state://global`, `cache_state://session/{id}`, and `cache_state://prefix/{hash}` MCP resources return aggregate cache hit/miss/savings — tokens and hashes only, no prompt text. `session_get` includes a `cacheState` block when the session has prior requests.
92
+
93
+ ### Cache-aware operation
94
+
95
+ Every `*_request` and `*_request_async` tool accepts an optional `promptParts` field that structures the prompt for better cache hit rates. The gateway concatenates the parts in canonical order (`system → tools → context → task`) so that the stable prefix bytes precede the volatile task tail unchanged across calls, letting each provider's automatic prompt-caching land on the same content hash each time.
96
+
97
+ ```json
98
+ {
99
+ "promptParts": {
100
+ "system": "You are a helpful code reviewer.",
101
+ "tools": "You have access to Read, Grep, Bash.",
102
+ "context": "<long stable context block — file dumps, etc.>",
103
+ "task": "Review the changes in src/foo.ts for security issues."
104
+ }
105
+ }
106
+ ```
107
+
108
+ `prompt` and `promptParts` are mutually exclusive — pass exactly one.
109
+
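A minimal sketch of the concatenation and the mutual-exclusion check described above. The helper name `resolvePrompt` and the blank-line separator between parts are assumptions for illustration. Only the canonical order and the two error strings are taken from the changelog, and the gateway's real implementation may differ.

```javascript
// Hypothetical helper, not the gateway's actual code.
const PART_ORDER = ["system", "tools", "context", "task"];

function resolvePrompt({ prompt, promptParts } = {}) {
  const hasPrompt = typeof prompt === "string";
  const hasParts = promptParts !== undefined && promptParts !== null;
  if (hasPrompt && hasParts) {
    throw new Error("provide exactly one of `prompt` or `promptParts`");
  }
  if (!hasPrompt && !hasParts) {
    throw new Error("one of `prompt` or `promptParts` is required");
  }
  if (hasPrompt) return prompt;
  // Stable parts first, volatile task last; absent optional parts are skipped.
  return PART_ORDER
    .map((key) => promptParts[key])
    .filter((part) => typeof part === "string" && part.length > 0)
    .join("\n\n"); // separator is an assumption
}
```

Two calls that share `system`, `tools`, and `context` but differ in `task` produce strings with an identical prefix, which is the property provider-side prefix caching relies on.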
110
+ Per-CLI capability matrix:
111
+
112
+ | CLI | Prefix discipline (auto via `promptParts`) | Explicit `cache_control` emission |
113
+ |---------|--------------------------------------------|------------------------------------|
114
+ | claude | yes | not yet (Branch B; gated on `[cache_awareness].emit_anthropic_cache_control`) |
115
+ | codex | yes | n/a (OpenAI implicit cache, no CLI lever) |
116
+ | gemini | yes | n/a (implicit prefix cache server-side) |
117
+ | grok | yes | n/a (no surfaced cache lever) |
118
+ | mistral | yes | n/a (no surfaced cache lever) |
119
+
120
+ Opt-in flags (all default off) live under `[cache_awareness]` in `~/.llm-cli-gateway/config.toml`. See `docs/personal-mcp/PROVIDER_CACHE_SURFACES.md` for the per-model minimum cacheable token thresholds and field-name divergences.
91
121
 
92
122
  ### Reliability & Performance
93
123
  - **Retry Logic**: Exponential backoff with circuit breaker for transient failures
@@ -97,12 +127,12 @@ docker compose -f docker-compose.personal.yml run --rm doctor
97
127
  - **Long-Running Jobs**: Non-time-bound async execution via `*_request_async` + polling tools
98
128
 
99
129
  ### Security & Quality
100
- - **Comprehensive Testing**: 284 tests covering unit, integration, and regression scenarios
130
+ - **Comprehensive Testing**: 681 tests covering unit, integration, and regression scenarios with real CLI execution
101
131
  - **Input Validation**: Zod schemas prevent injection attacks
102
132
  - **No Secret Leakage**: Generic session descriptions only (file permissions 0o600)
103
133
  - **No ReDoS**: Bounded regex patterns prevent catastrophic backtracking
104
134
  - **Type Safety**: Strict TypeScript with comprehensive error handling
105
- - **221 Tests**: Unit, integration, and regression tests with real CLI execution
135
+ - **Supply-chain hardening**: a dedicated `.github/workflows/security.yml` runs actionlint, zizmor, shellcheck, typos, osv-scanner, gitleaks, ruff, bandit, and lychee on every push and PR (see `SECURITY.md` for the threat model)
106
136
 
107
137
  ## Prerequisites
108
138
 
@@ -1019,6 +1049,7 @@ If you're vetting `llm-cli-gateway` through [Socket](https://socket.dev/npm/pack
1019
1049
  | **Shell access** | `src/executor.ts` uses `child_process.spawn(cmd, args, …)` to invoke the underlying LLM CLIs. | `spawn` is called with an argument array and **never** `shell: true`, so there is no shell interpolation path for caller input. The command name is restricted to an allow-list of known CLI binaries (`claude`, `codex`, `gemini`, `grok`, `vibe`). |
1020
1050
  | **Uses eval** | None in our source. Transitive: `@modelcontextprotocol/sdk` → `ajv@8` uses `new Function(...)` in `ajv/dist/compile/index.js` to compile JSON Schema validators. | This is ajv's standard codegen path. Only known schemas (defined in our source and the MCP SDK) flow into it; no caller-supplied data ever reaches the compiled function body. |
1021
1051
  | **better-sqlite3 PRAGMA helper** | Transitive: `better-sqlite3/lib/methods/pragma.js` interpolates its caller-provided `source` into a `PRAGMA ${source}` statement. | We do not call `db.pragma()` from production source. Internal SQLite setup uses fixed literal `db.exec("PRAGMA ...")` statements, and `npm run security:audit` fails the release if production code reintroduces `.pragma()` calls. |
1052
+ | **ioredis obfuscated code** | Optional peer/dev dependency: `ioredis@5.10.1` may be flagged at `built/constants/TLSProfiles.js` for base64-looking strings. | Reviewed as a false positive. The file is a Redis Cloud TLS CA certificate bundle in PEM format, which is base64 by design. It contains no decoder loop, dynamic evaluation, network call, or hidden execution path. The same file is byte-for-byte identical in `ioredis@5.9.2`; our default production install does not install `ioredis`, and our code does not pass ioredis TLS profile options. |
1022
1053
  | **Dependency ownership** | A handful of small transitive packages (e.g. `bindings` via `better-sqlite3`, `media-typer` via `@modelcontextprotocol/sdk`) trip Socket's "unstable ownership" or "obfuscated code" heuristics. | These are pinned, well-known micro-deps in the Node ecosystem with no known issues. We pin direct override versions of `content-type` and `type-is` in `package.json#overrides`. Our previous direct dependency on `toml@3.0.0` (also single-maintainer, last released 2020) was replaced with the actively-maintained `smol-toml` to reduce inherited risk. |
1023
1054
 
1024
1055
  See [`socket.yml`](./socket.yml) for the same context in machine-readable form.
@@ -0,0 +1,112 @@
1
+ /**
2
+ * Cache observability aggregates.
3
+ *
4
+ * Pure read-only aggregation over the FlightRecorder's `requests` table.
5
+ * No new storage — every value is computed at query time from existing
6
+ * columns (`cache_read_tokens`, `cache_creation_tokens`, `stable_prefix_*`,
7
+ * `datetime_utc`, etc.).
8
+ *
9
+ * COALESCE / NULL handling: rows from before the v3 migration have NULL
10
+ * for stable_prefix_*. Rows from CLIs whose parser does not surface cache
11
+ * tokens (gemini, grok, mistral, and codex until its parser is fixed)
12
+ * have NULL for cache_read_tokens / cache_creation_tokens. All aggregates
13
+ * tolerate NULL via COALESCE(col, 0) and never divide by zero.
14
+ */
15
+ import type { FlightRecorderQuery } from "./flight-recorder.js";
16
+ export type CacheStatsCli = "claude" | "codex" | "gemini" | "grok" | "mistral";
17
+ export interface SessionCacheStats {
18
+ sessionId: string;
19
+ cli: CacheStatsCli | null;
20
+ /** Total cache_read_tokens across all rows in this session. */
21
+ totalCacheReadTokens: number;
22
+ /** Total cache_creation_tokens across all rows in this session. */
23
+ totalCacheCreationTokens: number;
24
+ /** Number of rows in this session. */
25
+ requestCount: number;
26
+ /** Number of rows where cache_read_tokens > 0. */
27
+ hitCount: number;
28
+ /** hitCount / requestCount (0 when requestCount = 0). */
29
+ hitRate: number;
30
+ /** Distinct stable_prefix_hash values seen in this session. */
31
+ distinctPrefixCount: number;
32
+ /** Last time any row in this session was written (datetime_utc max). ISO string or null. */
33
+ lastRequestAt: string | null;
34
+ /** Estimated USD saved by cache reads in this session (best-effort). */
35
+ estimatedSavingsUsd: number;
36
+ /**
37
+ * Slice 3: best-effort remaining TTL on the Anthropic cache breakpoint
38
+ * established at lastRequestAt. Null for non-claude CLIs (we have no
39
+ * read on their cache state) and null when lastRequestAt is null.
40
+ * Computed by computeTtlRemaining(); see ttlPolicy parameter.
41
+ */
42
+ ttlRemainingMs: number | null;
43
+ }
44
+ export interface PrefixCacheStats {
45
+ stablePrefixHash: string;
46
+ requestCount: number;
47
+ hitCount: number;
48
+ hitRate: number;
49
+ totalCacheReadTokens: number;
50
+ totalCacheCreationTokens: number;
51
+ /** Distinct CLI x model combos that hashed to this prefix. */
52
+ cliBreakdown: Array<{
53
+ cli: CacheStatsCli;
54
+ model: string;
55
+ count: number;
56
+ }>;
57
+ firstSeenAt: string | null;
58
+ lastSeenAt: string | null;
59
+ estimatedSavingsUsd: number;
60
+ }
61
+ export interface GlobalCacheStats {
62
+ /** Optional window: rows since (now - lastNHours * 3600s). */
63
+ windowHours: number | null;
64
+ totalRequests: number;
65
+ totalHits: number;
66
+ hitRate: number;
67
+ totalCacheReadTokens: number;
68
+ totalCacheCreationTokens: number;
69
+ perCli: Array<{
70
+ cli: CacheStatsCli;
71
+ requestCount: number;
72
+ hitCount: number;
73
+ hitRate: number;
74
+ totalCacheReadTokens: number;
75
+ totalCacheCreationTokens: number;
76
+ estimatedSavingsUsd: number;
77
+ }>;
78
+ estimatedSavingsUsd: number;
79
+ }
80
+ export declare function computeSessionCacheStats(db: FlightRecorderQuery, sessionId: string): SessionCacheStats;
81
+ export interface TtlPolicy {
82
+ /**
83
+ * Seconds: how long Anthropic holds a cache entry after the last
84
+ * write. Default 300 (5 minutes). Set to 3600 when the operator has
85
+ * opted into Anthropic's 1-hour cache TTL via
86
+ * `[cache_awareness].anthropic_ttl_seconds = 3600`.
87
+ */
88
+ anthropicTtlSeconds: 300 | 3600;
89
+ /** Defaults to `() => Date.now()`. Overridable for deterministic tests. */
90
+ now?: () => number;
91
+ }
92
+ /**
93
+ * Slice 3: compute the best-effort milliseconds remaining on the cache
94
+ * breakpoint established at `stats.lastRequestAt`.
95
+ *
96
+ * - Claude: Anthropic's documented TTL (5min default, 1h beta). Computed
97
+ * as max(0, ttl - (now - lastRequestAt)).
98
+ * - Other CLIs: returns null. We do not observe the provider's actual
99
+ * cache state, so any number we'd return would be a guess. session_get
100
+ * and cache_state resources should report null for these.
101
+ *
102
+ * Note: this is "best effort". A cache eviction inside Anthropic's
103
+ * window will NOT be visible to us — the warning may be optimistic
104
+ * (see risks section in dag.toml).
105
+ */
106
+ export declare function computeTtlRemaining(stats: SessionCacheStats, cli: CacheStatsCli | null, ttlPolicy: TtlPolicy): number | null;
107
+ export declare function computePrefixCacheStats(db: FlightRecorderQuery, stablePrefixHash: string): PrefixCacheStats;
108
+ export interface GlobalCacheStatsOpts {
109
+ /** If set, restrict to rows whose datetime_utc is within the last N hours. */
110
+ lastNHours?: number;
111
+ }
112
+ export declare function computeGlobalCacheStats(db: FlightRecorderQuery, opts?: GlobalCacheStatsOpts): GlobalCacheStats;
@@ -0,0 +1,225 @@
1
+ /**
2
+ * Cache observability aggregates.
3
+ *
4
+ * Pure read-only aggregation over the FlightRecorder's `requests` table.
5
+ * No new storage — every value is computed at query time from existing
6
+ * columns (`cache_read_tokens`, `cache_creation_tokens`, `stable_prefix_*`,
7
+ * `datetime_utc`, etc.).
8
+ *
9
+ * COALESCE / NULL handling: rows from before the v3 migration have NULL
10
+ * for stable_prefix_*. Rows from CLIs whose parser does not surface cache
11
+ * tokens (gemini, grok, mistral, and codex until its parser is fixed)
12
+ * have NULL for cache_read_tokens / cache_creation_tokens. All aggregates
13
+ * tolerate NULL via COALESCE(col, 0) and never divide by zero.
14
+ */
15
+ import { estimateCacheSavingsUsd } from "./pricing.js";
16
+ function safeNum(n) {
17
+ return typeof n === "number" && Number.isFinite(n) ? n : 0;
18
+ }
19
+ function isCacheStatsCli(s) {
20
+ return s === "claude" || s === "codex" || s === "gemini" || s === "grok" || s === "mistral";
21
+ }
22
+ export function computeSessionCacheStats(db, sessionId) {
23
+ const rows = db.queryRequests(`SELECT cli, model,
24
+ COALESCE(cache_read_tokens, 0) AS cache_read_tokens,
25
+ COALESCE(cache_creation_tokens, 0) AS cache_creation_tokens,
26
+ stable_prefix_hash,
27
+ datetime_utc
28
+ FROM requests
29
+ WHERE session_id = ?
30
+ ORDER BY datetime_utc DESC`, sessionId);
31
+ let totalRead = 0;
32
+ let totalCreation = 0;
33
+ let hitCount = 0;
34
+ const prefixSet = new Set();
35
+ let lastAt = null;
36
+ let cli = null;
37
+ let estimatedSavingsUsd = 0;
38
+ for (const row of rows) {
39
+ const reads = safeNum(row.cache_read_tokens);
40
+ const creation = safeNum(row.cache_creation_tokens);
41
+ totalRead += reads;
42
+ totalCreation += creation;
43
+ if (reads > 0)
44
+ hitCount += 1;
45
+ if (row.stable_prefix_hash)
46
+ prefixSet.add(row.stable_prefix_hash);
47
+ if (!lastAt || row.datetime_utc > lastAt)
48
+ lastAt = row.datetime_utc;
49
+ if (cli === null && isCacheStatsCli(row.cli))
50
+ cli = row.cli;
51
+ if (isCacheStatsCli(row.cli)) {
52
+ estimatedSavingsUsd += estimateCacheSavingsUsd(row.cli, row.model, reads);
53
+ }
54
+ }
55
+ const requestCount = rows.length;
56
+ return {
57
+ sessionId,
58
+ cli,
59
+ totalCacheReadTokens: totalRead,
60
+ totalCacheCreationTokens: totalCreation,
61
+ requestCount,
62
+ hitCount,
63
+ hitRate: requestCount > 0 ? hitCount / requestCount : 0,
64
+ distinctPrefixCount: prefixSet.size,
65
+ lastRequestAt: lastAt,
66
+ estimatedSavingsUsd,
67
+ // ttlRemainingMs is populated by computeTtlRemaining() — the field
68
+ // exists on the type so the resource shape is uniform, but its value
69
+ // is left null here. Callers (session_get / cache_state resources)
70
+ // apply the configured TTL policy and set the field.
71
+ ttlRemainingMs: null,
72
+ };
73
+ }
74
+ /**
75
+ * Slice 3: compute the best-effort milliseconds remaining on the cache
76
+ * breakpoint established at `stats.lastRequestAt`.
77
+ *
78
+ * - Claude: Anthropic's documented TTL (5min default, 1h beta). Computed
79
+ * as max(0, ttl - (now - lastRequestAt)).
80
+ * - Other CLIs: returns null. We do not observe the provider's actual
81
+ * cache state, so any number we'd return would be a guess. session_get
82
+ * and cache_state resources should report null for these.
83
+ *
84
+ * Note: this is "best effort". A cache eviction inside Anthropic's
85
+ * window will NOT be visible to us — the warning may be optimistic
86
+ * (see risks section in dag.toml).
87
+ */
88
+ export function computeTtlRemaining(stats, cli, ttlPolicy) {
89
+ if (cli !== "claude")
90
+ return null;
91
+ if (!stats.lastRequestAt)
92
+ return null;
93
+ const nowMs = (ttlPolicy.now ?? Date.now)();
94
+ const lastWriteMs = Date.parse(stats.lastRequestAt);
95
+ if (!Number.isFinite(lastWriteMs))
96
+ return null;
97
+ const elapsedMs = nowMs - lastWriteMs;
98
+ const ttlMs = ttlPolicy.anthropicTtlSeconds * 1000;
99
+ return Math.max(0, ttlMs - elapsedMs);
100
+ }
101
+ export function computePrefixCacheStats(db, stablePrefixHash) {
102
+ const rows = db.queryRequests(`SELECT cli, model,
103
+ COALESCE(cache_read_tokens, 0) AS cache_read_tokens,
104
+ COALESCE(cache_creation_tokens, 0) AS cache_creation_tokens,
105
+ stable_prefix_hash,
106
+ datetime_utc
107
+ FROM requests
108
+ WHERE stable_prefix_hash = ?
109
+ ORDER BY datetime_utc ASC`, stablePrefixHash);
110
+ let totalRead = 0;
111
+ let totalCreation = 0;
112
+ let hitCount = 0;
113
+ let firstAt = null;
114
+ let lastAt = null;
115
+ let estimatedSavingsUsd = 0;
116
+ const cliMap = new Map();
117
+ for (const row of rows) {
118
+ const reads = safeNum(row.cache_read_tokens);
119
+ totalRead += reads;
120
+ totalCreation += safeNum(row.cache_creation_tokens);
121
+ if (reads > 0)
122
+ hitCount += 1;
123
+ if (!firstAt)
124
+ firstAt = row.datetime_utc;
125
+ lastAt = row.datetime_utc;
126
+ if (isCacheStatsCli(row.cli)) {
127
+ estimatedSavingsUsd += estimateCacheSavingsUsd(row.cli, row.model, reads);
128
+ const key = `${row.cli}::${row.model}`;
129
+ const entry = cliMap.get(key);
130
+ if (entry) {
131
+ entry.count += 1;
132
+ }
133
+ else {
134
+ cliMap.set(key, { cli: row.cli, model: row.model, count: 1 });
135
+ }
136
+ }
137
+ }
138
+ const requestCount = rows.length;
139
+ return {
140
+ stablePrefixHash,
141
+ requestCount,
142
+ hitCount,
143
+ hitRate: requestCount > 0 ? hitCount / requestCount : 0,
144
+ totalCacheReadTokens: totalRead,
145
+ totalCacheCreationTokens: totalCreation,
146
+ cliBreakdown: Array.from(cliMap.values()).sort((a, b) => b.count - a.count),
147
+ firstSeenAt: firstAt,
148
+ lastSeenAt: lastAt,
149
+ estimatedSavingsUsd,
150
+ };
151
+ }
152
+ export function computeGlobalCacheStats(db, opts = {}) {
153
+ const windowHours = opts.lastNHours ?? null;
154
+ const sinceIso = windowHours !== null && windowHours > 0
155
+ ? new Date(Date.now() - windowHours * 3600_000).toISOString()
156
+ : null;
157
+ const sql = sinceIso
158
+ ? `SELECT cli, model,
159
+ COALESCE(cache_read_tokens, 0) AS cache_read_tokens,
160
+ COALESCE(cache_creation_tokens, 0) AS cache_creation_tokens,
161
+ stable_prefix_hash,
162
+ datetime_utc
163
+ FROM requests
164
+ WHERE datetime_utc >= ?`
165
+ : `SELECT cli, model,
166
+ COALESCE(cache_read_tokens, 0) AS cache_read_tokens,
167
+ COALESCE(cache_creation_tokens, 0) AS cache_creation_tokens,
168
+ stable_prefix_hash,
169
+ datetime_utc
170
+ FROM requests`;
171
+ const rows = sinceIso ? db.queryRequests(sql, sinceIso) : db.queryRequests(sql);
172
+ const perCliMap = new Map();
173
+ let totalRequests = 0;
174
+ let totalHits = 0;
175
+ let totalRead = 0;
176
+ let totalCreation = 0;
177
+ let totalSavings = 0;
178
+ for (const row of rows) {
179
+ totalRequests += 1;
180
+ const reads = safeNum(row.cache_read_tokens);
181
+ const creation = safeNum(row.cache_creation_tokens);
182
+ totalRead += reads;
183
+ totalCreation += creation;
184
+ if (reads > 0)
185
+ totalHits += 1;
186
+ if (!isCacheStatsCli(row.cli))
187
+ continue;
188
+ const cli = row.cli;
189
+ const savings = estimateCacheSavingsUsd(cli, row.model, reads);
190
+ totalSavings += savings;
191
+ const agg = perCliMap.get(cli) ?? {
192
+ requestCount: 0,
193
+ hitCount: 0,
194
+ totalCacheReadTokens: 0,
195
+ totalCacheCreationTokens: 0,
196
+ estimatedSavingsUsd: 0,
197
+ };
198
+ agg.requestCount += 1;
199
+ if (reads > 0)
200
+ agg.hitCount += 1;
201
+ agg.totalCacheReadTokens += reads;
202
+ agg.totalCacheCreationTokens += creation;
203
+ agg.estimatedSavingsUsd += savings;
204
+ perCliMap.set(cli, agg);
205
+ }
206
+ const perCli = Array.from(perCliMap.entries()).map(([cli, agg]) => ({
207
+ cli,
208
+ requestCount: agg.requestCount,
209
+ hitCount: agg.hitCount,
210
+ hitRate: agg.requestCount > 0 ? agg.hitCount / agg.requestCount : 0,
211
+ totalCacheReadTokens: agg.totalCacheReadTokens,
212
+ totalCacheCreationTokens: agg.totalCacheCreationTokens,
213
+ estimatedSavingsUsd: agg.estimatedSavingsUsd,
214
+ }));
215
+ return {
216
+ windowHours,
217
+ totalRequests,
218
+ totalHits,
219
+ hitRate: totalRequests > 0 ? totalHits / totalRequests : 0,
220
+ totalCacheReadTokens: totalRead,
221
+ totalCacheCreationTokens: totalCreation,
222
+ perCli,
223
+ estimatedSavingsUsd: totalSavings,
224
+ };
225
+ }
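The hit-rate arithmetic used by the aggregates above can be worked through on a few in-memory rows. This is a sketch under stated assumptions: `summarize` and the sample rows are hypothetical and stand in for the result of `db.queryRequests(...)`. `safeNum` mirrors the helper in the file.

```javascript
// Mirrors safeNum above: NULL, undefined, and non-finite values count as 0.
function safeNum(n) {
  return typeof n === "number" && Number.isFinite(n) ? n : 0;
}

// Hypothetical reducer showing the hit-count and hit-rate rules in isolation.
function summarize(rows) {
  let hitCount = 0;
  let totalRead = 0;
  for (const row of rows) {
    const reads = safeNum(row.cache_read_tokens);
    totalRead += reads;
    if (reads > 0) hitCount += 1; // a "hit" is any row with cache reads
  }
  const requestCount = rows.length;
  return {
    requestCount,
    hitCount,
    totalRead,
    // Guarded division: an empty result set yields 0 rather than NaN.
    hitRate: requestCount > 0 ? hitCount / requestCount : 0,
  };
}

// Made-up sample rows; the gemini row has no cache tokens surfaced by its parser.
const sampleRows = [
  { cli: "claude", cache_read_tokens: 2048 },
  { cli: "claude", cache_read_tokens: 0 },
  { cli: "gemini", cache_read_tokens: null },
  { cli: "claude", cache_read_tokens: 1024 },
];
console.log(summarize(sampleRows));
```

Rows with NULL cache tokens still count toward `requestCount`, so CLIs whose parsers do not report cache tokens pull the overall hit rate down rather than being excluded.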
package/dist/config.d.ts CHANGED
@@ -63,3 +63,44 @@ export interface PersistenceConfigSources {
63
63
  * Throws on incoherent configs (memory/none + asyncJobsEnabled without ack).
64
64
  */
65
65
  export declare function loadPersistenceConfig(logger?: Logger): PersistenceConfig;
66
+ export declare const ANTHROPIC_TTL_SECONDS_VALUES: readonly [300, 3600];
67
+ export type AnthropicTtlSeconds = (typeof ANTHROPIC_TTL_SECONDS_VALUES)[number];
68
+ /**
69
+ * Per-Anthropic-model-family minimum cacheable tokens. Sourced from
70
+ * docs/personal-mcp/PROVIDER_CACHE_SURFACES.md (Anthropic API docs as of
71
+ * 2026-05-26). Models below the threshold cannot be cached even with
72
+ * cache_control set — Anthropic silently returns un-cached.
73
+ */
74
+ export declare const DEFAULT_MIN_STABLE_TOKENS_FOR_CACHE_CONTROL: {
75
+ readonly sonnet: 1024;
76
+ readonly opus: 4096;
77
+ readonly haiku: 4096;
78
+ readonly default: 4096;
79
+ };
80
+ export type ModelFamilyAlias = keyof typeof DEFAULT_MIN_STABLE_TOKENS_FOR_CACHE_CONTROL;
81
+ export interface CacheAwarenessConfig {
82
+ emitAnthropicCacheControl: boolean;
83
+ anthropicTtlSeconds: AnthropicTtlSeconds;
84
+ warnOnTtlExpiry: boolean;
85
+ minStableTokensForCacheControl: {
86
+ sonnet: number;
87
+ opus: number;
88
+ haiku: number;
89
+ default: number;
90
+ };
91
+ /** Audit trail: file the config was loaded from (or null if defaults). */
92
+ sources: {
93
+ configFile: string | null;
94
+ };
95
+ }
96
+ /**
97
+ * Load [cache_awareness] from ~/.llm-cli-gateway/config.toml. Defaults: all
98
+ * behaviour off, per-model min-token thresholds from PROVIDER_CACHE_SURFACES.md.
99
+ */
100
+ export declare function loadCacheAwarenessConfig(logger?: Logger): CacheAwarenessConfig;
101
+ /**
102
+ * Look up the per-model-family threshold. `modelName` is the user-facing model
103
+ * string (e.g. "claude-sonnet-4-6", "claude-opus-4-7"). Falls back to `default`
104
+ * when the family is unrecognised.
105
+ */
106
+ export declare function minStableTokensForModel(config: CacheAwarenessConfig, modelName: string): number;
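The declaration above does not show how a model string is mapped to a family, so the following is only one plausible sketch. Substring matching on the lower-cased model name is an assumption, as is the function name `minStableTokensForModelSketch`. The threshold values are copied from `DEFAULT_MIN_STABLE_TOKENS_FOR_CACHE_CONTROL`.

```javascript
// Values copied from DEFAULT_MIN_STABLE_TOKENS_FOR_CACHE_CONTROL in the declaration file.
const DEFAULT_THRESHOLDS = { sonnet: 1024, opus: 4096, haiku: 4096, default: 4096 };

// Hypothetical lookup: the real family matcher is not part of this diff.
function minStableTokensForModelSketch(thresholds, modelName) {
  const name = String(modelName).toLowerCase();
  for (const family of ["sonnet", "opus", "haiku"]) {
    if (name.includes(family)) return thresholds[family];
  }
  // Unrecognised families fall back to the conservative default.
  return thresholds.default;
}
```

Under these assumptions, `"claude-sonnet-4-6"` resolves to 1024, while `"claude-opus-4-7"` and any unrecognised model string resolve to 4096.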