llm-cli-gateway 1.5.35 → 1.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/CHANGELOG.md +201 -0
package/README.md +35 -4
package/dist/cache-stats.d.ts +112 -0
package/dist/cache-stats.js +225 -0
package/dist/config.d.ts +41 -0
package/dist/config.js +109 -0
package/dist/doctor.d.ts +42 -1
package/dist/doctor.js +121 -2
package/dist/flight-recorder.d.ts +27 -0
package/dist/flight-recorder.js +79 -2
package/dist/index.d.ts +46 -9
package/dist/index.js +395 -67
package/dist/pricing.d.ts +54 -0
package/dist/pricing.js +100 -0
package/dist/prompt-parts.d.ts +38 -0
package/dist/prompt-parts.js +42 -0
package/dist/resources.d.ts +32 -1
package/dist/resources.js +52 -1
package/package.json +2 -1
package/setup/status.schema.json +39 -0
package/socket.yml +29 -0

package/CHANGELOG.md CHANGED

@@ -2,6 +2,207 @@
 All notable changes to the llm-cli-gateway project.
+## [1.6.1] - 2026-05-26 — docs-only follow-up to 1.6.0
+Pure documentation release; zero source-code changes since 1.6.0.
+### Changed — agent-install guidance current with v1.6.0 + five providers
+- New `setup/providers/mistral-vibe.md` provider snippet (Mistral was the
+  fifth provider but had no setup/providers/ page; install agents had
+  nothing to point at when the user asked for Mistral coverage).
+- New `setup/assistants/mistral-install-prompt.md` per-assistant install
+  prompt (mirrors the Grok prompt; outbound-only framing,
+  session_logging walk-through, `VIBE_ACTIVE_MODEL` guidance, secret-
+  safety rules preserved).
+- `setup/assistants/ASSISTANT_CONTRACT.md`: Mistral added to "Applies
+  to" and outbound providers; new "Doctor Report Notes (v1.6.0)"
+  paragraph clarifying that the `cache_awareness` block is structural
+  (always present) and that all `[cache_awareness]` flags default off.
+- All 6 per-assistant install prompts (universal, chatgpt, claude,
+  codex, gemini, grok) extended to enumerate all five providers and
+  reference the cache_awareness doctor block.
+- `setup/install-plan.dag.toml` choose-targets / check-diagnostics /
+  apply-client-snippet steps generalised to all five providers; Mistral
+  named outbound-only; cache_awareness must-not-treat-as-blocker note
+  added inline. TOML re-validated.
+- 6 `docs/personal-mcp/connect-*.md` legacy pages now carry an
+  admonition pointing to `setup/providers/` + `ASSISTANT_CONTRACT.md`
+  as canonical.
+### Changed — 12 SKILL.md files current with v1.6.0
+- All 12 skills (7 under `skills/`, 5 under `.agents/skills/`) extended
+  with `promptParts`, `cache_state://` MCP resources, and (where the
+  skill's centre of gravity is session continuity) the
+  `cache_ttl_expiring_soon` warning. Depth tiered by skill audience:
+  multi-llm-orchestration, model-routing, multi-llm-consensus,
+  implement-review-fix, multi-llm-review, async-job-orchestration,
+  session-workflow, secure-orchestration carry full sections or
+  examples; agent-codex-gate, codex-review-gate, design-review-cycle,
+  red-team-assessment carry tip-level mentions.
+- Plugin-namespaced skills (`.agents/skills/*`) version-bumped 1.5 → 1.6.
+- Exact runtime strings cross-checked against `src/index.ts` (the
+  `provide exactly one of …` / `one of … is required` mutex errors and
+  the `cache_ttl_expiring_soon` warning code).
+### Fixed — README / BEST_PRACTICES / integrations doc drift
+- README.md: headline + Core Capabilities now name Mistral as the fifth
+  provider; test counts 284 / 221 → 681; new Supply-chain hardening
+  call-out under Security & Quality.
+- BEST_PRACTICES.md: testing coverage / performance lines 284 → 681.
+- integrations/llm-plugin/README.md: Grok + Mistral added to providers
+  list, usage examples, and the "at least one of" requirements list.
+- ENFORCEMENT.md: self-enforcement checklist provider list now Claude /
+  Codex / Gemini / Grok / Mistral.
+### Fixed — `docs/launch/blog-cache-awareness.md` accuracy + voice
+Technical corrections from the multi-LLM voice + technical review:
+- Mutually-exclusive error-string quotation reformatted so the
+  ``provide exactly one of `prompt` or `promptParts``` example renders
+  correctly in markdown.
+- `lastWriteAt` references corrected to `lastRequestAt` (the actual
+  public field name on `SessionCacheStats`).
+- Security tools sentence rewritten: separates SHA-pinned actions,
+  version-pinned Python/Go tools, and the SHA256-verified gitleaks
+  binary; clarifies that `eslint-plugin-security` runs via the existing
+  eslint config (not security.yml); replaces the inaccurate "Top-level
+  `permissions: contents: read` on every workflow" claim with the
+  accurate least-privilege phrasing.
+- "Signed installer artefacts" → "SHA256-verifiable installer artefacts"
+  (no signing today); npm note adds the sigstore-provenance context.
+- Haiku 3.5 Vertex 2048 caveat added: the in-code alias table
+  conservatively collapses all Haiku variants to 4096.
+- Solorigate / Codecov / xz now link separately.
+- Codex smoke-test evidence now links to
+  `docs/personal-mcp/PROVIDER_CACHE_SURFACES.md` and the CHANGELOG.
+- Three broken links surfaced by lychee CI fixed: Mistral Vibe URL,
+  bare CLAUDE.md link (the file lives outside the gateway repo), and
+  the agent-assurance exclude regex tightened to match bare URLs.
+### Fixed — `socket.yml` networkAccess false-positive documentation
+- Documented that the `globalThis["fetch"]` flag on `dist/index.js` /
+  `dist/job-store.js` is a substring-match false positive. Neither file
+  contains any actual fetch call; the matches are English-prose
+  occurrences in an error message, the `fetchWith` JSON field name, and
+  a code comment. Verified by sub-agent investigation, no code change
+  required, no attack-surface delta vs 1.5.35.
+### Fixed — `lychee.toml` exclusions
+- Added `https://npmjs.com/`, `https://help.openai.com/`, and bare
+  `github.com/verivus-oss/agent-assurance` URLs to the exclude list
+  (each is a Cloudflare bot-blocked / private host that returns
+  4xx/5xx to anonymous CI requests). Rationale documented inline.
+## [1.6.0] - 2026-05-26 — cache-awareness phase 1 + security posture
+Also includes (beyond cache-awareness):
+### Added — free-OSS security posture (matches verivus-oss/agent-assurance)
+- New `.github/workflows/security.yml` running on every push + PR:
+  actionlint, zizmor, shellcheck, typos, osv-scanner, gitleaks, ruff,
+  bandit, lychee. SHA-pinned, fail-on-finding.
+- `eslint-plugin-security` 3.0.1 wired into the existing eslint config.
+- `SECURITY.md` (vulnerability reporting policy), `.github/CODEOWNERS`
+  (review routing for security-sensitive paths), `_typos.toml`,
+  `lychee.toml`, `.gitleaks.toml`, `.github/actionlint.yaml`,
+  `integrations/llm-plugin/.bandit`.
+- Workflow hygiene: top-level `permissions: contents: read`, per-job
+  explicit, `persist-credentials: false` on every `actions/checkout`
+  except the upload job in `release-installer.yml`. Cache disabled on
+  release-triggered setup-node/setup-go (zizmor cache-poisoning).
+- Dependabot: added `npm` ecosystem at `/` and `pip` ecosystem at
+  `/integrations/llm-plugin/` (github-actions group preserved).
+- `installer/go.mod` bumped Go 1.22 → 1.25 (clears 26 stdlib CVEs
+  flagged by osv-scanner); `release-installer.yml` setup-go pin
+  updated in lock-step.
+### Added — cache-awareness slice 1+2+3 (all opt-in, default OFF)
+- **`promptParts` on every `*_request` / `*_request_async` tool** (claude, codex,
+  gemini, grok, mistral; sync + async = 10 tools). Accepts
+  `{ system?, tools?, context?, task }`. Mutually exclusive with `prompt`.
+  The gateway concatenates in canonical order (`system → tools → context → task`)
+  so the stable prefix bytes precede the volatile task tail unchanged across
+  calls — raising implicit cache hit rate without calling provider cache APIs.
+  The exact error strings `provide exactly one of \`prompt\` or \`promptParts\``
+  and `one of \`prompt\` or \`promptParts\` is required` are stable API
+  contract.
+- **Flight-recorder v3 migration**: new columns `stable_prefix_hash`
+  (sha256) and `stable_prefix_tokens` (integer bytes/4 heuristic) on
+  `requests`, plus `idx_requests_stable_hash`. Legacy rows keep NULL.
+- **Cache-state MCP resources** (read-only, tokens/hashes/aggregates only —
+  never raw prompt text):
+  - `cache_state://global` (last 24h aggregates + per-CLI breakdown).
+  - `cache_state://session/{sessionId}` (per-session).
+  - `cache_state://prefix/{hash}` (per-stable-prefix-hash).
+- **`session_get.cacheState`** projection: compact hit-rate / hit-count /
+  cache-token-totals / estimated-savings-USD block, present only when the
+  session has prior requests. Omitted entirely (not null, not empty) for
+  fresh sessions. NOT persisted on the Session interface — it is a
+  read-time projection from the flight recorder.
+- **`computeTtlRemaining()` + `cache_ttl_expiring_soon` warning**: claude
+  sync + async handlers attach a structured `warnings[]` entry when a
+  resumed session's Anthropic cache breakpoint is within 30 s of expiry
+  (gated on `[cache_awareness].warn_on_ttl_expiry`; default false). The
+  TTL math respects `anthropic_ttl_seconds = 300 | 3600`.
+- **Doctor `cache_awareness` block**: always present, zeroed when the
+  flight recorder is empty. Reports `enabled_features` (active flags),
+  `last_24h` (hit rate + savings), and `per_cli` aggregates. JSON schema
+  updated; `setup/status.schema.json` `additionalProperties: false`
+  intact at the root.
+- **`[cache_awareness]` config block** in `~/.llm-cli-gateway/config.toml`:
+  - `emit_anthropic_cache_control = false`
+  - `anthropic_ttl_seconds = 300` (enum: 300 | 3600)
+  - `warn_on_ttl_expiry = false`
+  - `[cache_awareness.min_stable_tokens_for_cache_control]` per-family
+    table (sonnet=1024, opus=4096, haiku=4096, default=4096).
+  Validated by a separate Zod schema and loader (`loadCacheAwarenessConfig`);
+  a malformed `[cache_awareness]` block does NOT break `loadPersistenceConfig`
+  and vice versa. No env-var overrides.
+### Decision: Branch B (prefix-discipline only) for slice 1
+The gateway does NOT emit explicit `cache_control` JSON to Claude in this
+slice and does NOT route `promptParts.system` into `--system-prompt`. The
+upstream injection mechanism is unverified; Branch A is gated on a live
+smoke test in a follow-up slice. The
+`[cache_awareness].emit_anthropic_cache_control` flag is in place for
+when that lands.
+### Deferred / out of scope
+- **Async-path `stable_prefix_hash` recording**: `src/async-job-manager.ts`
+  has zero flight-recorder integration today, so the v3 columns are NOT
+  populated for async-job rows. This is a separate concern beyond
+  cache-awareness — tracked for a future plan
+  (`docs/plans/async-flight-recorder.dag.toml`, TBD). Slice 1's runtime
+  mutex check IS in place on the async tool surface; only the flight-recorder
+  write deferral applies.
+- **Codex parser cache-tokens fix**: `src/codex-json-parser.ts` reads
+  Anthropic-style `cache_read_input_tokens` but Codex CLI 0.133.0+ emits
+  `cached_input_tokens`. `cache_read_tokens` therefore stays NULL for codex
+  rows today. Out of scope for this slice (see PROVIDER_CACHE_SURFACES.md).
+### Invariant
+"No conversation content in session storage" is preserved. The session
+manager (`~/.llm-cli-gateway/sessions.json`) is UNTOUCHED by this slice.
+The cache-awareness columns added by migration v3
+(`stable_prefix_hash`, `stable_prefix_tokens`) live on the existing
+flight recorder (`~/.llm-cli-gateway/logs.db`), which is a separate
+audit-focused store that already records prompts and responses (and is
+not subject to the session-storage invariant). `session_get.cacheState`
+is a read-time PROJECTION from the flight recorder, never persisted on
+the Session interface.
 ## [1.5.35] - 2026-05-25
 ### Fixed
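
The 1.6.0 entry above describes the `prompt` / `promptParts` mutex and the canonical `system → tools → context → task` ordering. A minimal sketch of that behaviour follows. It is not the gateway's code: `RequestInput`, `resolvePrompt`, and the blank-line separator are hypothetical, and the mapping of each error string to its condition (both supplied vs. neither supplied) is an assumption. Only the field names and the two error strings come from the changelog.

```typescript
// Field names follow the changelog; the interface names are illustrative.
interface PromptParts {
  system?: string;
  tools?: string;
  context?: string;
  task: string;
}

interface RequestInput {
  prompt?: string;
  promptParts?: PromptParts;
}

// Assumption: parts are joined with a blank line. The diff does not show the real separator.
const SEPARATOR = "\n\n";

function resolvePrompt(input: RequestInput): string {
  const hasPrompt = input.prompt !== undefined;
  const hasParts = input.promptParts !== undefined;

  // Assumed mapping: both present -> "exactly one"; neither present -> "is required".
  if (hasPrompt && hasParts) {
    throw new Error("provide exactly one of `prompt` or `promptParts`");
  }
  if (!hasPrompt && !hasParts) {
    throw new Error("one of `prompt` or `promptParts` is required");
  }
  if (hasPrompt) {
    return input.prompt as string;
  }

  // Canonical order: the stable parts come first, the volatile task comes last.
  const parts = input.promptParts as PromptParts;
  return [parts.system, parts.tools, parts.context, parts.task]
    .filter((p): p is string => typeof p === "string" && p.length > 0)
    .join(SEPARATOR);
}
```

Because optional parts are simply skipped, a caller that omits `tools` still gets `system`, `context`, and `task` in the same relative order.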

package/README.md CHANGED

@@ -3,7 +3,7 @@
 > *"Without consultation, plans are frustrated, but with many counselors they succeed."*
 > — Proverbs 15:22 (LSB)
-A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, Gemini, and Grok CLIs with session management, retry logic, and async job orchestration.
+A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, Gemini, Grok, and Mistral (Vibe) CLIs with session management, retry logic, and async job orchestration.
 ## Personal MCP Appliance MVP
@@ -79,7 +79,7 @@ docker compose -f docker-compose.personal.yml run --rm doctor
 ## Features
 ### Core Capabilities
-- **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, Gemini, and Grok CLIs
+- **Multi-LLM Orchestration**: Unified interface for Claude Code, Codex, Gemini, Grok, and Mistral (Vibe) CLIs
 - **Session Management**: Track and resume conversations across all CLIs with persistent storage
 - **Token Optimization**: Automatic 44% reduction on prompts, 37% on responses (opt-in)
 - **Correlation ID Tracking**: Full request tracing across all LLM interactions
@@ -88,6 +88,36 @@ docker compose -f docker-compose.personal.yml run --rm doctor
 ### Observability
 - **SQLite Flight Recorder**: Every request/response logged to `~/.llm-cli-gateway/logs.db` with correlation IDs, token usage, duration, retry counts, and circuit breaker state. Browse with [Datasette](https://datasette.io/): `datasette ~/.llm-cli-gateway/logs.db`
 - **Structured Metadata**: Tool responses include machine-readable `structuredContent` (model, cli, correlationId, sessionId, durationMs, token counts)
+- **Cache observability resources**: `cache_state://global`, `cache_state://session/{id}`, and `cache_state://prefix/{hash}` MCP resources return aggregate cache hit/miss/savings — tokens and hashes only, no prompt text. `session_get` includes a `cacheState` block when the session has prior requests.
+### Cache-aware operation
+Every `*_request` and `*_request_async` tool accepts an optional `promptParts` field that structures the prompt for better cache hit rates. The gateway concatenates the parts in canonical order (`system → tools → context → task`) so that the stable prefix bytes precede the volatile task tail unchanged across calls, letting each provider's automatic prompt-caching land on the same content hash each time.
+```json
+{
+  "promptParts": {
+    "system": "You are a helpful code reviewer.",
+    "tools": "You have access to Read, Grep, Bash.",
+    "context": "<long stable context block — file dumps, etc.>",
+    "task": "Review the changes in src/foo.ts for security issues."
+  }
+}
+```
+`prompt` and `promptParts` are mutually exclusive — pass exactly one.
+Per-CLI capability matrix:
+| CLI     | Prefix discipline (auto via `promptParts`) | Explicit `cache_control` emission |
+|---------|--------------------------------------------|------------------------------------|
+| claude  | yes                                        | not yet (Branch B; gated on `[cache_awareness].emit_anthropic_cache_control`) |
+| codex   | yes                                        | n/a (OpenAI implicit cache, no CLI lever) |
+| gemini  | yes                                        | n/a (implicit prefix cache server-side)  |
+| grok    | yes                                        | n/a (no surfaced cache lever)            |
+| mistral | yes                                        | n/a (no surfaced cache lever)            |
+Opt-in flags (all default off) live under `[cache_awareness]` in `~/.llm-cli-gateway/config.toml`. See `docs/personal-mcp/PROVIDER_CACHE_SURFACES.md` for the per-model minimum cacheable token thresholds and field-name divergences.
 ### Reliability & Performance
 - **Retry Logic**: Exponential backoff with circuit breaker for transient failures
@@ -97,12 +127,12 @@ docker compose -f docker-compose.personal.yml run --rm doctor
 - **Long-Running Jobs**: Non-time-bound async execution via `*_request_async` + polling tools
 ### Security & Quality
-- **Comprehensive Testing**: 284 tests covering unit, integration, and regression scenarios
+- **Comprehensive Testing**: 681 tests covering unit, integration, and regression scenarios with real CLI execution
 - **Input Validation**: Zod schemas prevent injection attacks
 - **No Secret Leakage**: Generic session descriptions only (file permissions 0o600)
 - **No ReDoS**: Bounded regex patterns prevent catastrophic backtracking
 - **Type Safety**: Strict TypeScript with comprehensive error handling
-- **221 Tests**: Unit, integration, and regression tests with real CLI execution
+- **Supply-chain hardening**: a dedicated `.github/workflows/security.yml` runs actionlint, zizmor, shellcheck, typos, osv-scanner, gitleaks, ruff, bandit, and lychee on every push and PR (see `SECURITY.md` for the threat model)
 ## Prerequisites
@@ -1019,6 +1049,7 @@ If you're vetting `llm-cli-gateway` through [Socket](https://socket.dev/npm/pack
 | **Shell access** | `src/executor.ts` uses `child_process.spawn(cmd, args, …)` to invoke the underlying LLM CLIs. | `spawn` is called with an argument array and **never** `shell: true`, so there is no shell interpolation path for caller input. The command name is restricted to an allow-list of known CLI binaries (`claude`, `codex`, `gemini`, `grok`, `vibe`). |
 | **Uses eval** | None in our source. Transitive: `@modelcontextprotocol/sdk` → `ajv@8` uses `new Function(...)` in `ajv/dist/compile/index.js` to compile JSON Schema validators. | This is ajv's standard codegen path. Only known schemas (defined in our source and the MCP SDK) flow into it; no caller-supplied data ever reaches the compiled function body. |
 | **better-sqlite3 PRAGMA helper** | Transitive: `better-sqlite3/lib/methods/pragma.js` interpolates its caller-provided `source` into a `PRAGMA ${source}` statement. | We do not call `db.pragma()` from production source. Internal SQLite setup uses fixed literal `db.exec("PRAGMA ...")` statements, and `npm run security:audit` fails the release if production code reintroduces `.pragma()` calls. |
+| **ioredis obfuscated code** | Optional peer/dev dependency: `ioredis@5.10.1` may be flagged at `built/constants/TLSProfiles.js` for base64-looking strings. | Reviewed as a false positive. The file is a Redis Cloud TLS CA certificate bundle in PEM format, which is base64 by design. It contains no decoder loop, dynamic evaluation, network call, or hidden execution path. The same file is byte-for-byte identical in `ioredis@5.9.2`; our default production install does not install `ioredis`, and our code does not pass ioredis TLS profile options. |
 | **Dependency ownership** | A handful of small transitive packages (e.g. `bindings` via `better-sqlite3`, `media-typer` via `@modelcontextprotocol/sdk`) trip Socket's "unstable ownership" or "obfuscated code" heuristics. | These are pinned, well-known micro-deps in the Node ecosystem with no known issues. We pin direct override versions of `content-type` and `type-is` in `package.json#overrides`. Our previous direct dependency on `toml@3.0.0` (also single-maintainer, last released 2020) was replaced with the actively-maintained `smol-toml` to reduce inherited risk. |
 See [`socket.yml`](./socket.yml) for the same context in machine-readable form.
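
The README's "Cache-aware operation" section above relies on the stable prefix being byte-identical across calls. The sketch below shows the idea under stated assumptions: `stablePrefix` and `estimateTokens` are hypothetical helpers, the blank-line separator is a guess, and rounding down in the bytes/4 heuristic is an assumption (the changelog only says "integer bytes/4 heuristic"). The sha256 hashing step is left out to keep the example dependency-free.

```typescript
// Illustrative shape: only the stable parts of promptParts.
interface StableParts {
  system?: string;
  tools?: string;
  context?: string;
}

// Join the stable parts in canonical order; the task is deliberately excluded.
function stablePrefix(parts: StableParts): string {
  return [parts.system, parts.tools, parts.context]
    .filter((p): p is string => typeof p === "string" && p.length > 0)
    .join("\n\n"); // assumed separator
}

// bytes/4 heuristic over the UTF-8 encoding; rounding down is an assumption.
function estimateTokens(text: string): number {
  const bytes = new TextEncoder().encode(text).length;
  return Math.floor(bytes / 4);
}

// Two requests that differ only in `task` share the same stable prefix.
const shared = { system: "You are a helpful code reviewer.", context: "file dump" };
const prefixA = stablePrefix(shared); // request A: task = "Review src/foo.ts"
const prefixB = stablePrefix(shared); // request B: task = "Review src/bar.ts"
const samePrefix = prefixA === prefixB;
```

Since the task never enters the prefix, changing it between calls leaves the prefix bytes (and therefore any hash derived from them) unchanged.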

package/dist/cache-stats.d.ts ADDED

@@ -0,0 +1,112 @@
+/**
+ * Cache observability aggregates.
+ *
+ * Pure read-only aggregation over the FlightRecorder's `requests` table.
+ * No new storage — every value is computed at query time from existing
+ * columns (`cache_read_tokens`, `cache_creation_tokens`, `stable_prefix_*`,
+ * `datetime_utc`, etc.).
+ *
+ * COALESCE / NULL handling: rows from before the v3 migration have NULL
+ * for stable_prefix_*. Rows from CLIs whose parser does not surface cache
+ * tokens (gemini, grok, mistral, and codex until its parser is fixed)
+ * have NULL for cache_read_tokens / cache_creation_tokens. All aggregates
+ * tolerate NULL via COALESCE(col, 0) — never divides by zero.
+ */
+import type { FlightRecorderQuery } from "./flight-recorder.js";
+export type CacheStatsCli = "claude" | "codex" | "gemini" | "grok" | "mistral";
+export interface SessionCacheStats {
+    sessionId: string;
+    cli: CacheStatsCli | null;
+    /** Total cache_read_tokens across all rows in this session. */
+    totalCacheReadTokens: number;
+    /** Total cache_creation_tokens across all rows in this session. */
+    totalCacheCreationTokens: number;
+    /** Number of rows in this session. */
+    requestCount: number;
+    /** Number of rows where cache_read_tokens > 0. */
+    hitCount: number;
+    /** hitCount / requestCount (0 when requestCount = 0). */
+    hitRate: number;
+    /** Distinct stable_prefix_hash values seen in this session. */
+    distinctPrefixCount: number;
+    /** Last time any row in this session was written (datetime_utc max). ISO string or null. */
+    lastRequestAt: string | null;
+    /** Estimated USD saved by cache reads in this session (best-effort). */
+    estimatedSavingsUsd: number;
+    /**
+     * Slice 3: best-effort remaining TTL on the Anthropic cache breakpoint
+     * established at lastRequestAt. Null for non-claude CLIs (we have no
+     * read on their cache state) and null when lastRequestAt is null.
+     * Computed by computeTtlRemaining(); see ttlPolicy parameter.
+     */
+    ttlRemainingMs: number | null;
+}
+export interface PrefixCacheStats {
+    stablePrefixHash: string;
+    requestCount: number;
+    hitCount: number;
+    hitRate: number;
+    totalCacheReadTokens: number;
+    totalCacheCreationTokens: number;
+    /** Distinct CLI x model combos that hashed to this prefix. */
+    cliBreakdown: Array<{
+        cli: CacheStatsCli;
+        model: string;
+        count: number;
+    }>;
+    firstSeenAt: string | null;
+    lastSeenAt: string | null;
+    estimatedSavingsUsd: number;
+}
+export interface GlobalCacheStats {
+    /** Optional window: rows since (now - lastNHours * 3600s). */
+    windowHours: number | null;
+    totalRequests: number;
+    totalHits: number;
+    hitRate: number;
+    totalCacheReadTokens: number;
+    totalCacheCreationTokens: number;
+    perCli: Array<{
+        cli: CacheStatsCli;
+        requestCount: number;
+        hitCount: number;
+        hitRate: number;
+        totalCacheReadTokens: number;
+        totalCacheCreationTokens: number;
+        estimatedSavingsUsd: number;
+    }>;
+    estimatedSavingsUsd: number;
+}
+export declare function computeSessionCacheStats(db: FlightRecorderQuery, sessionId: string): SessionCacheStats;
+export interface TtlPolicy {
+    /**
+     * Seconds: how long Anthropic holds a cache entry after the last
+     * write. Default 300 (5 minutes). Set to 3600 when the operator has
+     * opted into Anthropic's 1-hour cache TTL via
+     * `[cache_awareness].anthropic_ttl_seconds = 3600`.
+     */
+    anthropicTtlSeconds: 300 | 3600;
+    /** Defaults to `() => Date.now()`. Overridable for deterministic tests. */
+    now?: () => number;
+}
+/**
+ * Slice 3: compute the best-effort milliseconds remaining on the cache
+ * breakpoint established at `stats.lastRequestAt`.
+ *
+ * - Claude: Anthropic's documented TTL (5min default, 1h beta). Computed
+ *   as max(0, ttl - (now - lastRequestAt)).
+ * - Other CLIs: returns null. We do not observe the provider's actual
+ *   cache state, so any number we'd return would be a guess. session_get
+ *   and cache_state resources should report null for these.
+ *
+ * Note: this is "best effort". A cache eviction inside Anthropic's
+ * window will NOT be visible to us — the warning may be optimistic
+ * (see risks section in dag.toml).
+ */
+export declare function computeTtlRemaining(stats: SessionCacheStats, cli: CacheStatsCli | null, ttlPolicy: TtlPolicy): number | null;
+export declare function computePrefixCacheStats(db: FlightRecorderQuery, stablePrefixHash: string): PrefixCacheStats;
+export interface GlobalCacheStatsOpts {
+    /** If set, restrict to rows whose datetime_utc is within the last N hours. */
+    lastNHours?: number;
+}
+export declare function computeGlobalCacheStats(db: FlightRecorderQuery, opts?: GlobalCacheStatsOpts): GlobalCacheStats;
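
The `TtlPolicy.now` hook declared above makes the TTL arithmetic testable with a fixed clock. The following self-contained sketch mirrors the declared signature in simplified form and adds the 30-second `cache_ttl_expiring_soon` check mentioned in the changelog. `ttlRemainingMs`, `ttlWarning`, and the warning object's shape are hypothetical. Treating an already-expired breakpoint (0 ms remaining) as "expiring soon" is also an assumption.

```typescript
type Cli = "claude" | "codex" | "gemini" | "grok" | "mistral";

interface TtlPolicy {
  anthropicTtlSeconds: 300 | 3600;
  now?: () => number; // injectable clock for deterministic tests
}

// Remaining milliseconds on the cache breakpoint; null when it cannot be known.
function ttlRemainingMs(
  lastRequestAt: string | null,
  cli: Cli | null,
  policy: TtlPolicy
): number | null {
  if (cli !== "claude" || !lastRequestAt) return null;
  const lastMs = Date.parse(lastRequestAt);
  if (isNaN(lastMs)) return null; // unparseable timestamp
  const nowMs = (policy.now ?? Date.now)();
  return Math.max(0, policy.anthropicTtlSeconds * 1000 - (nowMs - lastMs));
}

// Only the warning code and the 30 s window come from the changelog.
const EXPIRY_WINDOW_MS = 30_000;

function ttlWarning(remainingMs: number | null): { code: string; remainingMs: number } | null {
  if (remainingMs === null || remainingMs > EXPIRY_WINDOW_MS) return null;
  return { code: "cache_ttl_expiring_soon", remainingMs };
}

// Fixed clock: 280 s after the last request, under the 300 s default TTL.
const last = "2026-05-26T12:00:00.000Z";
const fixedNow = () => Date.parse(last) + 280_000;
const remaining = ttlRemainingMs(last, "claude", { anthropicTtlSeconds: 300, now: fixedNow });
const warning = ttlWarning(remaining);
```

With the 1-hour policy the same elapsed time leaves most of the TTL intact, so no warning would be produced.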

package/dist/cache-stats.js ADDED

@@ -0,0 +1,225 @@
+/**
+ * Cache observability aggregates.
+ *
+ * Pure read-only aggregation over the FlightRecorder's `requests` table.
+ * No new storage — every value is computed at query time from existing
+ * columns (`cache_read_tokens`, `cache_creation_tokens`, `stable_prefix_*`,
+ * `datetime_utc`, etc.).
+ *
+ * COALESCE / NULL handling: rows from before the v3 migration have NULL
+ * for stable_prefix_*. Rows from CLIs whose parser does not surface cache
+ * tokens (gemini, grok, mistral, and codex until its parser is fixed)
+ * have NULL for cache_read_tokens / cache_creation_tokens. All aggregates
+ * tolerate NULL via COALESCE(col, 0) — never divides by zero.
+ */
+import { estimateCacheSavingsUsd } from "./pricing.js";
+function safeNum(n) {
+    return typeof n === "number" && Number.isFinite(n) ? n : 0;
+}
+function isCacheStatsCli(s) {
+    return s === "claude" || s === "codex" || s === "gemini" || s === "grok" || s === "mistral";
+}
+export function computeSessionCacheStats(db, sessionId) {
+    const rows = db.queryRequests(`SELECT cli, model,
+            COALESCE(cache_read_tokens, 0) AS cache_read_tokens,
+            COALESCE(cache_creation_tokens, 0) AS cache_creation_tokens,
+            stable_prefix_hash,
+            datetime_utc
+     FROM requests
+     WHERE session_id = ?
+     ORDER BY datetime_utc DESC`, sessionId);
+    let totalRead = 0;
+    let totalCreation = 0;
+    let hitCount = 0;
+    const prefixSet = new Set();
+    let lastAt = null;
+    let cli = null;
+    let estimatedSavingsUsd = 0;
+    for (const row of rows) {
+        const reads = safeNum(row.cache_read_tokens);
+        const creation = safeNum(row.cache_creation_tokens);
+        totalRead += reads;
+        totalCreation += creation;
+        if (reads > 0)
+            hitCount += 1;
+        if (row.stable_prefix_hash)
+            prefixSet.add(row.stable_prefix_hash);
+        if (!lastAt || row.datetime_utc > lastAt)
+            lastAt = row.datetime_utc;
+        if (cli === null && isCacheStatsCli(row.cli))
+            cli = row.cli;
+        if (isCacheStatsCli(row.cli)) {
+            estimatedSavingsUsd += estimateCacheSavingsUsd(row.cli, row.model, reads);
+        }
+    }
+    const requestCount = rows.length;
+    return {
+        sessionId,
+        cli,
+        totalCacheReadTokens: totalRead,
+        totalCacheCreationTokens: totalCreation,
+        requestCount,
+        hitCount,
+        hitRate: requestCount > 0 ? hitCount / requestCount : 0,
+        distinctPrefixCount: prefixSet.size,
+        lastRequestAt: lastAt,
+        estimatedSavingsUsd,
+        // ttlRemainingMs is populated by computeTtlRemaining() — the field
+        // exists on the type so the resource shape is uniform, but its value
+        // is left null here. Callers (session_get / cache_state resources)
+        // apply the configured TTL policy and set the field.
+        ttlRemainingMs: null,
+    };
+}
+/**
+ * Slice 3: compute the best-effort milliseconds remaining on the cache
+ * breakpoint established at `stats.lastRequestAt`.
+ *
+ * - Claude: Anthropic's documented TTL (5min default, 1h beta). Computed
+ *   as max(0, ttl - (now - lastRequestAt)).
+ * - Other CLIs: returns null. We do not observe the provider's actual
+ *   cache state, so any number we'd return would be a guess. session_get
+ *   and cache_state resources should report null for these.
+ *
+ * Note: this is "best effort". A cache eviction inside Anthropic's
+ * window will NOT be visible to us — the warning may be optimistic
+ * (see risks section in dag.toml).
+ */
+export function computeTtlRemaining(stats, cli, ttlPolicy) {
+    if (cli !== "claude")
+        return null;
+    if (!stats.lastRequestAt)
+        return null;
+    const nowMs = (ttlPolicy.now ?? Date.now)();
+    const lastWriteMs = Date.parse(stats.lastRequestAt);
+    if (!Number.isFinite(lastWriteMs))
+        return null;
+    const elapsedMs = nowMs - lastWriteMs;
+    const ttlMs = ttlPolicy.anthropicTtlSeconds * 1000;
+    return Math.max(0, ttlMs - elapsedMs);
+}
+export function computePrefixCacheStats(db, stablePrefixHash) {
+    const rows = db.queryRequests(`SELECT cli, model,
+            COALESCE(cache_read_tokens, 0) AS cache_read_tokens,
+            COALESCE(cache_creation_tokens, 0) AS cache_creation_tokens,
+            stable_prefix_hash,
+            datetime_utc
+     FROM requests
+     WHERE stable_prefix_hash = ?
+     ORDER BY datetime_utc ASC`, stablePrefixHash);
+    let totalRead = 0;
+    let totalCreation = 0;
+    let hitCount = 0;
+    let firstAt = null;
+    let lastAt = null;
+    let estimatedSavingsUsd = 0;
+    const cliMap = new Map();
+    for (const row of rows) {
+        const reads = safeNum(row.cache_read_tokens);
+        totalRead += reads;
+        totalCreation += safeNum(row.cache_creation_tokens);
+        if (reads > 0)
+            hitCount += 1;
+        if (!firstAt)
+            firstAt = row.datetime_utc;
+        lastAt = row.datetime_utc;
+        if (isCacheStatsCli(row.cli)) {
+            estimatedSavingsUsd += estimateCacheSavingsUsd(row.cli, row.model, reads);
+            const key = `${row.cli}::${row.model}`;
+            const entry = cliMap.get(key);
+            if (entry) {
+                entry.count += 1;
+            }
+            else {
+                cliMap.set(key, { cli: row.cli, model: row.model, count: 1 });
+            }
+        }
+    }
+    const requestCount = rows.length;
+    return {
+        stablePrefixHash,
+        requestCount,
+        hitCount,
+        hitRate: requestCount > 0 ? hitCount / requestCount : 0,
+        totalCacheReadTokens: totalRead,
+        totalCacheCreationTokens: totalCreation,
+        cliBreakdown: Array.from(cliMap.values()).sort((a, b) => b.count - a.count),
+        firstSeenAt: firstAt,
+        lastSeenAt: lastAt,
+        estimatedSavingsUsd,
+    };
+}
+export function computeGlobalCacheStats(db, opts = {}) {
+    const windowHours = opts.lastNHours ?? null;
+    const sinceIso = windowHours !== null && windowHours > 0
+        ? new Date(Date.now() - windowHours * 3600_000).toISOString()
+        : null;
+    const sql = sinceIso
+        ? `SELECT cli, model,
+              COALESCE(cache_read_tokens, 0) AS cache_read_tokens,
+              COALESCE(cache_creation_tokens, 0) AS cache_creation_tokens,
+              stable_prefix_hash,
+              datetime_utc
+       FROM requests
+       WHERE datetime_utc >= ?`
+        : `SELECT cli, model,
+              COALESCE(cache_read_tokens, 0) AS cache_read_tokens,
+              COALESCE(cache_creation_tokens, 0) AS cache_creation_tokens,
+              stable_prefix_hash,
+              datetime_utc
+       FROM requests`;
+    const rows = sinceIso ? db.queryRequests(sql, sinceIso) : db.queryRequests(sql);
+    const perCliMap = new Map();
+    let totalRequests = 0;
+    let totalHits = 0;
+    let totalRead = 0;
+    let totalCreation = 0;
+    let totalSavings = 0;
+    for (const row of rows) {
+        totalRequests += 1;
+        const reads = safeNum(row.cache_read_tokens);
+        const creation = safeNum(row.cache_creation_tokens);
+        totalRead += reads;
+        totalCreation += creation;
+        if (reads > 0)
+            totalHits += 1;
+        if (!isCacheStatsCli(row.cli))
+            continue;
+        const cli = row.cli;
+        const savings = estimateCacheSavingsUsd(cli, row.model, reads);
+        totalSavings += savings;
+        const agg = perCliMap.get(cli) ?? {
+            requestCount: 0,
+            hitCount: 0,
+            totalCacheReadTokens: 0,
+            totalCacheCreationTokens: 0,
+            estimatedSavingsUsd: 0,
+        };
+        agg.requestCount += 1;
+        if (reads > 0)
+            agg.hitCount += 1;
+        agg.totalCacheReadTokens += reads;
+        agg.totalCacheCreationTokens += creation;
+        agg.estimatedSavingsUsd += savings;
+        perCliMap.set(cli, agg);
+    }
+    const perCli = Array.from(perCliMap.entries()).map(([cli, agg]) => ({
+        cli,
+        requestCount: agg.requestCount,
+        hitCount: agg.hitCount,
+        hitRate: agg.requestCount > 0 ? agg.hitCount / agg.requestCount : 0,
+        totalCacheReadTokens: agg.totalCacheReadTokens,
+        totalCacheCreationTokens: agg.totalCacheCreationTokens,
+        estimatedSavingsUsd: agg.estimatedSavingsUsd,
+    }));
+    return {
+        windowHours,
+        totalRequests,
+        totalHits,
+        hitRate: totalRequests > 0 ? totalHits / totalRequests : 0,
+        totalCacheReadTokens: totalRead,
+        totalCacheCreationTokens: totalCreation,
+        perCli,
+        estimatedSavingsUsd: totalSavings,
+    };
+}
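
The aggregation functions above share one pattern: coerce NULL or non-numeric cache counters to zero, count a row as a hit when `cache_read_tokens > 0`, and guard the hit-rate division. A reduced in-memory sketch of that pattern follows. The `Row` shape and `hitRateByCli` are hypothetical stand-ins for the flight-recorder rows and the per-CLI aggregation, and the sketch omits the savings estimate.

```typescript
// Hypothetical row shape: a subset of the columns selected in the queries above.
interface Row {
  cli: string;
  cache_read_tokens: number | null;
}

interface CliAgg {
  requestCount: number;
  hitCount: number;
  hitRate: number;
}

// Same idea as safeNum in the diff: anything that is not a finite number counts as 0.
function safeNum(n: unknown): number {
  return typeof n === "number" && isFinite(n) ? n : 0;
}

function hitRateByCli(rows: Row[]): Record<string, CliAgg> {
  const out: Record<string, CliAgg> = {};
  for (const row of rows) {
    const agg = out[row.cli] ?? { requestCount: 0, hitCount: 0, hitRate: 0 };
    agg.requestCount += 1;
    if (safeNum(row.cache_read_tokens) > 0) agg.hitCount += 1;
    // requestCount is at least 1 here, so the division is always safe.
    agg.hitRate = agg.hitCount / agg.requestCount;
    out[row.cli] = agg;
  }
  return out;
}

const stats = hitRateByCli([
  { cli: "claude", cache_read_tokens: 100 },
  { cli: "claude", cache_read_tokens: null }, // parser surfaced no cache tokens
  { cli: "codex", cache_read_tokens: null },
]);
```

A CLI whose parser never surfaces cache tokens ends up with a hit rate of 0 rather than an error, which matches the NULL-tolerance described in the module comment.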

package/dist/config.d.ts CHANGED

@@ -63,3 +63,44 @@ export interface PersistenceConfigSources {
  * Throws on incoherent configs (memory/none + asyncJobsEnabled without ack).
  */
 export declare function loadPersistenceConfig(logger?: Logger): PersistenceConfig;
+export declare const ANTHROPIC_TTL_SECONDS_VALUES: readonly [300, 3600];
+export type AnthropicTtlSeconds = (typeof ANTHROPIC_TTL_SECONDS_VALUES)[number];
+/**
+ * Per-Anthropic-model-family minimum cacheable tokens. Sourced from
+ * docs/personal-mcp/PROVIDER_CACHE_SURFACES.md (Anthropic API docs as of
+ * 2026-05-26). Models below the threshold cannot be cached even with
+ * cache_control set — Anthropic silently returns un-cached.
+ */
+export declare const DEFAULT_MIN_STABLE_TOKENS_FOR_CACHE_CONTROL: {
+    readonly sonnet: 1024;
+    readonly opus: 4096;
+    readonly haiku: 4096;
+    readonly default: 4096;
+};
+export type ModelFamilyAlias = keyof typeof DEFAULT_MIN_STABLE_TOKENS_FOR_CACHE_CONTROL;
+export interface CacheAwarenessConfig {
+    emitAnthropicCacheControl: boolean;
+    anthropicTtlSeconds: AnthropicTtlSeconds;
+    warnOnTtlExpiry: boolean;
+    minStableTokensForCacheControl: {
+        sonnet: number;
+        opus: number;
+        haiku: number;
+        default: number;
+    };
+    /** Audit trail: file the config was loaded from (or null if defaults). */
+    sources: {
+        configFile: string | null;
+    };
+}
+/**
+ * Load [cache_awareness] from ~/.llm-cli-gateway/config.toml. Defaults: all
+ * behaviour off, per-model min-token thresholds from PROVIDER_CACHE_SURFACES.md.
+ */
+export declare function loadCacheAwarenessConfig(logger?: Logger): CacheAwarenessConfig;
+/**
+ * Look up the per-model-family threshold. `modelName` is the user-facing model
+ * string (e.g. "claude-sonnet-4-6", "claude-opus-4-7"). Falls back to `default`
+ * when the family is unrecognised.
+ */
+export declare function minStableTokensForModel(config: CacheAwarenessConfig, modelName: string): number;
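
The declaration of `minStableTokensForModel` does not show how a model string maps to a family. One plausible reading is sketched below, assuming a case-insensitive substring match. `familyOf` and `minStableTokens` are hypothetical names, and only the default thresholds are taken from the declarations above.

```typescript
// Default thresholds as declared in DEFAULT_MIN_STABLE_TOKENS_FOR_CACHE_CONTROL.
const DEFAULT_THRESHOLDS = { sonnet: 1024, opus: 4096, haiku: 4096, default: 4096 } as const;

type Family = keyof typeof DEFAULT_THRESHOLDS;
type Thresholds = Record<Family, number>;

// Assumption: the family is detected by substring; the real matching rule is not in the diff.
function familyOf(modelName: string): Family {
  const name = modelName.toLowerCase();
  if (name.indexOf("sonnet") !== -1) return "sonnet";
  if (name.indexOf("opus") !== -1) return "opus";
  if (name.indexOf("haiku") !== -1) return "haiku";
  return "default";
}

// Falls back to the `default` entry for unrecognised families.
function minStableTokens(thresholds: Thresholds, modelName: string): number {
  return thresholds[familyOf(modelName)];
}

const sonnetMin = minStableTokens(DEFAULT_THRESHOLDS, "claude-sonnet-4-6");
const unknownMin = minStableTokens(DEFAULT_THRESHOLDS, "some-other-model");
```

An operator override in `[cache_awareness.min_stable_tokens_for_cache_control]` would correspond to passing a different `thresholds` object, for example `{ ...DEFAULT_THRESHOLDS, haiku: 2048 }`.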