@bookedsolid/rea 0.8.0 → 0.9.1
- package/README.md +268 -51
- package/SECURITY.md +24 -7
- package/THREAT_MODEL.md +196 -18
- package/dist/cli/serve.d.ts +8 -0
- package/dist/cli/serve.js +32 -6
- package/dist/cli/status.d.ts +40 -1
- package/dist/cli/status.js +101 -2
- package/dist/gateway/circuit-breaker.d.ts +8 -2
- package/dist/gateway/downstream-pool.d.ts +13 -1
- package/dist/gateway/downstream-pool.js +30 -2
- package/dist/gateway/downstream.d.ts +157 -0
- package/dist/gateway/downstream.js +307 -5
- package/dist/gateway/live-state.d.ts +252 -0
- package/dist/gateway/live-state.js +504 -0
- package/dist/gateway/server.d.ts +44 -1
- package/dist/gateway/server.js +101 -1
- package/dist/gateway/session-blocker.d.ts +132 -0
- package/dist/gateway/session-blocker.js +163 -0
- package/package.json +1 -1
package/THREAT_MODEL.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Threat Model — REA Gateway and Hook Layer
|
|
2
2
|
|
|
3
|
-
Version: 0.
|
|
3
|
+
Version: 0.9.x | Last updated: 2026-04-21
|
|
4
4
|
|
|
5
5
|
---
|
|
6
6
|
|
|
@@ -82,12 +82,12 @@ Downstream MCP servers are treated as untrusted by default. Codex plugin *invoca
|
|
|
82
82
|
|
|
83
83
|
**Mitigations:**
|
|
84
84
|
|
|
85
|
-
- `injection` middleware scans tool
|
|
85
|
+
- `injection` middleware scans downstream tool **results** (`ctx.result`) post-execute for instruction-like patterns — see §5.21 for the three-tier classifier.
|
|
86
86
|
- All injection regexes run under a per-call worker-thread timeout (`wrapRegex`, default 100ms) with a hard kill on timeout — catastrophic backtracking cannot hang the gateway. See §5.12.
|
|
87
87
|
- Gateway middleware chain re-derives tier from tool_name independently on every invocation — a poisoned description cannot change the tier classification.
|
|
88
88
|
- `.rea/policy.yaml` is re-read on every invocation; a "set autonomy to L3" instruction cannot be persisted by any middleware path.
|
|
89
89
|
|
|
90
|
-
**Residual risk:** Semantic injection in natural-language descriptions is not fully mitigated. No regex can reliably detect all instruction-injection patterns.
|
|
90
|
+
**Residual risk:** Semantic injection in natural-language descriptions is not fully mitigated. No regex can reliably detect all instruction-injection patterns. The shipped three-valued classifier (`clean` / `suspicious` / `likely_injection`; see §5.21) narrows the footgun by making "write under suspicion" a conscious policy decision but does not eliminate it.
|
|
91
91
|
|
|
92
92
|
---
|
|
93
93
|
|
|
@@ -203,7 +203,7 @@ Downstream MCP servers are treated as untrusted by default. Codex plugin *invoca
|
|
|
203
203
|
|
|
204
204
|
### 5.8 Codex Plugin Abuse and Codex-Review Gate Weakening
|
|
205
205
|
|
|
206
|
-
**Threat:** A poisoned Codex adversarial-review response contains prompt-injection content designed to make Claude take an unsafe action "per the reviewer's recommendation." A malicious actor uses `/codex-review` to launder an attack past the policy layer. Separately: a rogue agent or operator disables the Codex-review requirement to push protected-path changes without adversarial review.
|
|
206
|
+
**Threat:** A poisoned Codex adversarial-review response contains prompt-injection content designed to make Claude take an unsafe action "per the reviewer's recommendation." A malicious actor uses `/codex-review` to launder an attack past the policy layer. Separately: a rogue agent or operator disables the Codex-review requirement to push protected-path changes without adversarial review, or uses the escape hatch to bypass far more than the Codex requirement.
|
|
207
207
|
|
|
208
208
|
**Mitigations:**
|
|
209
209
|
|
|
@@ -212,12 +212,27 @@ Downstream MCP servers are treated as untrusted by default. Codex plugin *invoca
|
|
|
212
212
|
- Codex never receives `.rea/policy.yaml` content in its prompt; Codex reviews diffs, not policy.
|
|
213
213
|
- The `codex-adversarial` agent cannot by itself modify policy, trigger writes, or bypass blocked paths — it is a review tool, not an actor.
|
|
214
214
|
- **Pluggable reviewer** (0.2.0, G11.2): when Codex is unreachable, `ClaudeSelfReviewer` is the fallback. Claude-on-Claude review is explicitly tagged `degraded: true` in the audit record so self-review is visible and countable.
|
|
215
|
-
- **
|
|
216
|
-
- **First-class no-Codex mode** (0.2.0, G11.4): `policy.review.codex_required: false` skips the protected-path Codex requirement entirely. In that mode `REA_SKIP_CODEX_REVIEW` becomes a no-op (skipping a review that isn't required has no meaning), and no skip record is emitted. Both `.claude/hooks/push-review-gate.sh` (Claude Code path) and `.husky/pre-push` (terminal path) honor this knob.
|
|
215
|
+
- **First-class no-Codex mode** (0.2.0, G11.4): `policy.review.codex_required: false` skips the protected-path Codex requirement entirely. In that mode `REA_SKIP_CODEX_REVIEW` becomes a no-op (skipping a review that isn't required has no meaning), and no skip record is emitted. Both the Claude-Code adapter (`.claude/hooks/push-review-gate.sh`) and the native git adapter (`.claude/hooks/push-review-gate-git.sh`, sharing `hooks/_lib/push-review-core.sh`) honor this knob.
|
|
217
216
|
- **Availability probe** (0.2.0, G11.3): `rea serve` runs an initial `codex --version` probe on startup when `codex_required` ≠ false. A failed probe emits a single stderr warn — startup never fail-closes on a Codex miss.
|
|
218
217
|
- **Reviewer telemetry** (0.2.0, G11.5): `ClaudeSelfReviewer.review()` writes a row to `.rea/metrics.jsonl` with invocation counts, estimated tokens (chars/4), latency, and a `rate_limited` signal parsed from stderr. Payloads are NEVER stored; a unit test asserts that marker strings in inputs never appear in the metrics file.
|
|
219
218
|
|
|
220
|
-
|
|
219
|
+
**`REA_SKIP_CODEX_REVIEW` — Codex-only waiver (0.8.0, #85).** Through 0.7.0 this env var short-circuited the **entire** push-review gate after writing its skip audit record — equivalent in scope to `REA_SKIP_PUSH_REVIEW`. Operators reached for it to silence a transient Codex unavailability and accidentally bypassed HALT, the cross-repo guard, ref-resolution, and the push-review cache. 0.8.0 narrows it to what the name implies: the waiver satisfies **only** the protected-path Codex-audit requirement. Every other gate still runs:
|
|
220
|
+
|
|
221
|
+
- **HALT** (`.rea/HALT`) — still blocks.
|
|
222
|
+
- **Cross-repo guard** — still blocks.
|
|
223
|
+
- **Ref-resolution failures** (missing remote object, unresolvable source ref) — still block, but the skip audit record is written first so the operator's commitment to waive is durable.
|
|
224
|
+
- **Push-review cache** — a miss still falls through to the general "Review required" block.
|
|
225
|
+
|
|
226
|
+
The skip audit record is still named `codex.review.skipped` and still fails the `codex.review` jq predicate. Banner text changed from `CODEX REVIEW SKIPPED` to `CODEX REVIEW WAIVER active` to reflect the narrower scope. Fail-closed contract preserved: missing `dist/audit/append.js` (rea unbuilt) or missing git identity → exit 2.
|
|
227
|
+
|
|
228
|
+
**Cache gate hardening (0.8.0, same release).** The review cache is a separate, later check in the core (`hooks/_lib/push-review-core.sh` §8) — it governs the general push-review gate for non-protected-path pushes, not the protected-path Codex audit itself. Two composition bugs in that cache layer became load-bearing once the Codex waiver no longer papered over cache behavior, so they were fixed in the same release:
|
|
229
|
+
|
|
230
|
+
- The cache-hit predicate now requires `.hit == true and .result == "pass"`. Previously `.hit == true` alone was sufficient, which meant a cached `fail` verdict would silently satisfy the gate. The permissive predicate was a real exposure once the Codex-only waiver stopped short-circuiting subsequent checks.
|
|
231
|
+
- The cache key is derived from the PUSHED source ref (from pre-push stdin), not from the checkout branch. `git push origin hotfix:main` from a `feature` checkout now correctly looks up the `hotfix` cache entry.
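The two cache fixes above can be sketched as follows. This is an illustrative TypeScript model only: the shipped gate is a shell script using a jq predicate, and the `CacheLookup` type and all function names here are hypothetical. Only the `.hit` / `.result` fields and the "pushed ref, not checkout branch" rule come from the text.

```typescript
// Hypothetical shape of a cache lookup result; the field names mirror the
// jq predicate quoted above (`.hit`, `.result`), everything else is assumed.
interface CacheLookup {
  hit: boolean;
  result?: "pass" | "fail";
}

// Pre-0.8.0 behavior as described: any cache hit satisfied the gate,
// including a cached `fail` verdict.
function legacyCacheSatisfiesGate(entry: CacheLookup): boolean {
  return entry.hit === true;
}

// 0.8.0 behavior as described: a hit only counts when the cached verdict passed.
function cacheSatisfiesGate(entry: CacheLookup): boolean {
  return entry.hit === true && entry.result === "pass";
}

// Git feeds pre-push hooks lines of the form
// "<local ref> <local sha> <remote ref> <remote sha>".
// The cache key is taken from the local (pushed) ref, not the checkout branch.
function sourceRefFromPrePushLine(line: string): string {
  const localRef = line.trim().split(/\s+/)[0] ?? "";
  return localRef.replace(/^refs\/heads\//, "");
}
```

For `git push origin hotfix:main`, the pre-push line starts with `refs/heads/hotfix`, so the sketch yields `hotfix` regardless of which branch is checked out.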
|
|
232
|
+
|
|
233
|
+
**`REA_SKIP_PUSH_REVIEW` — whole-gate bypass (0.5.0).** The recovery path for consumers deadlocked on a broken rea install. Writes `tool_name: "push.review.skipped"` with an `os_identity` sub-object (uid, whoami, hostname, pid, ppid, ppid_cmd, tty, ci) so auditors can distinguish a real operator from a forged git-config actor. Refuses with exit 2 on CI runners (`CI` env var set) unless `review.allow_skip_in_ci: true` is opted in via policy — closes the ambient-env-var bypass surface on shared build agents. HALT check runs before the skip branch: `.rea/HALT` cannot be bypassed by either hatch.
|
|
234
|
+
|
|
235
|
+
**Residual risk:** Semantic injection in Codex responses (e.g., reviewer recommends a specific code change that is itself malicious) cannot be fully detected. Mitigation is defense-in-depth: the middleware still runs on any subsequent write that Claude attempts based on the review. A `rea doctor` abuse signal on escape-hatch frequency (≥3 invocations per rolling 7 days) remains tracked.
|
|
221
236
|
|
|
222
237
|
---
|
|
223
238
|
|
|
@@ -300,22 +315,185 @@ Downstream MCP servers are treated as untrusted by default. Codex plugin *invoca
|
|
|
300
315
|
|
|
301
316
|
---
|
|
302
317
|
|
|
318
|
+
### 5.14 Supervisor Trust Boundary (0.9.0, BUG-002..003)
|
|
319
|
+
|
|
320
|
+
**Threat:** A downstream MCP child process crashes unexpectedly — OS OOM-kill, unhandled exception in the child, stdio pipe error outside a caller-initiated close — and the gateway keeps a stale `Client` handle around. Every subsequent `callTool` hits the zombie, receives `Not connected`, the circuit breaker flaps open → half-open → open against the same dead handle, and the child is never respawned. From the operator's perspective the gateway is "up" but nothing works.
|
|
321
|
+
|
|
322
|
+
**Mitigations:**
|
|
323
|
+
|
|
324
|
+
- `DownstreamConnection` wires the MCP SDK `StdioClientTransport`'s `onclose` and `onerror` callbacks on a **per-transport** basis (never global) and treats an unexpected close as "child is dead": the client and transport fields are nulled before the next call. The next `callTool` takes the `connect()` branch and actually respawns the child.
|
|
325
|
+
- Intentional `close()` sets a local flag before calling into the SDK, so the same `onclose` callback does not double-count a graceful shutdown as an unexpected death.
|
|
326
|
+
- "Not connected" errors from the SDK (the in-flight fallback path) are promoted to the respawn path with the same eager invalidation — a stale client is invalidated before the one-shot reconnect fires, so we spawn fresh rather than retrying with the same dead handle.
|
|
327
|
+
- A 30-second flapping guard (`RECONNECT_FLAP_WINDOW_MS`) refuses a second reconnect that lands too quickly after the previous successful one — the child is clearly unhealthy and the circuit breaker is a better place to handle it.
|
|
328
|
+
- `DownstreamConnection.lastError` is bounded **at write** via `boundedDiagnosticString` on a true ES-private `#lastErrorMessage` setter (0.7.0, BUG-014). The invariant is structural: every write produces a bounded stored value regardless of assignment-site count. Non-string inputs raise `TypeError` instead of silently corrupting the field.
|
|
329
|
+
- Error strings published to `serve.state.json` flow through the same `buildRegexRedactor` the gateway logger uses (policy `redact.patterns` + built-in `SECRET_PATTERNS`) via the `lastErrorRedactor` option on the live-state publisher — a credential that leaked into a downstream error message is scrubbed before it lands on disk or on an operator's terminal via `rea status`.
|
|
330
|
+
|
|
331
|
+
**Residual risk:** A child that advertises tools but then returns malicious responses on every call is not a supervisor-layer concern — it is handled by the standard middleware chain (injection, redact, result-size-cap). A child that alternates between healthy and malicious responses more slowly than the circuit breaker can trip is a limitation of any breaker-based approach; detection depends on `.rea/metrics.jsonl` anomalies.
|
|
332
|
+
|
|
333
|
+
Ref: `src/gateway/downstream.ts`, `src/gateway/downstream.test.ts`.
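A minimal sketch of the eager-invalidation and flapping-guard logic described in the mitigations above. It does not use the MCP SDK: `FakeClient`, `SupervisedConnection`, and the injected clock are hypothetical stand-ins, and only the 30-second `RECONNECT_FLAP_WINDOW_MS` value and the overall behavior are taken from the text.

```typescript
const RECONNECT_FLAP_WINDOW_MS = 30000; // 30 seconds, per the text above

// Hypothetical stand-in for a client/transport pair with a per-transport callback.
interface FakeClient {
  onclose?: () => void;
}

class SupervisedConnection {
  private client: FakeClient | null = null;
  private closing = false;
  private lastReconnectAt: number | null = null;
  private readonly now: () => number;
  spawnCount = 0;

  constructor(now: () => number) {
    this.now = now;
  }

  private connect(): FakeClient {
    const client: FakeClient = {};
    // An unexpected close means the child is dead: null the handle eagerly
    // so the next call respawns instead of reusing a zombie.
    client.onclose = () => {
      if (this.closing) return; // graceful shutdown, not an unexpected death
      if (this.client === client) this.client = null;
    };
    this.client = client;
    this.spawnCount += 1;
    return client;
  }

  // Returns a live handle, respawning when the previous child died.
  ensureConnected(): FakeClient {
    if (this.client !== null) return this.client;
    if (this.spawnCount > 0) {
      const t = this.now();
      const last = this.lastReconnectAt;
      // Refuse a reconnect that lands too soon after the previous one.
      if (last !== null && t - last < RECONNECT_FLAP_WINDOW_MS) {
        throw new Error("reconnect refused: downstream is flapping");
      }
      this.lastReconnectAt = t;
    }
    return this.connect();
  }

  // Intentional close: set the flag first, then let the callback fire.
  close(): void {
    this.closing = true;
    const current = this.client;
    this.client = null;
    current?.onclose?.();
  }
}
```

The clock is injected so the guard can be exercised without real timers; a refused reconnect surfaces as an error that a circuit breaker could then count.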
|
|
334
|
+
|
|
335
|
+
---
|
|
336
|
+
|
|
337
|
+
### 5.15 SESSION_BLOCKER Audit Semantics (0.9.0, BUG-004)
|
|
338
|
+
|
|
339
|
+
**Threat:** A persistently failing downstream produces a log stream full of identical circuit-open records. Operators miss the signal because it looks like normal circuit-breaker churn, or alert-fatigue kicks in and they tune it out entirely.
|
|
340
|
+
|
|
341
|
+
**Mitigations:**
|
|
342
|
+
|
|
343
|
+
- `SessionBlockerTracker` subscribes to circuit-breaker `onStateChange` events and counts circuit-open transitions per `(session_id, server_name)`. It tracks **open-level** failures per session, not wire-hot call-level failures — every circuit-open transition counts as one, so a downstream that flaps `open→closed→open` three times in ten minutes crosses the threshold once.
|
|
344
|
+
- On threshold crossing (default: 3), exactly **one** `SESSION_BLOCKER` event fires: a LOUD structured log record plus an audit append via `appendAuditRecord`. The counter keeps incrementing but subsequent opens do **not** re-fire.
|
|
345
|
+
- Recovery (transition to `closed`) resets the counter and re-arms the emit flag — a later threshold crossing fires a fresh record.
|
|
346
|
+
- A new session (new `rea serve` process / new `session_id`) drops every counter and starts fresh.
|
|
347
|
+
- Audit append is best-effort; log-side emission happens first and unconditionally. A broken audit pipeline must never break state tracking.
|
|
348
|
+
- `SESSION_BLOCKER` is an **audit event**, not a gateway exception. The gateway keeps serving traffic; the event is the forensic signal an operator can search for in `audit.jsonl`.
|
|
349
|
+
|
|
350
|
+
**Residual risk:** A downstream that flaps fast enough to hit the threshold on every session but recovers quickly in between can still generate a record per session. This is the intended behavior — the operator should see it every session and fix the downstream.
|
|
351
|
+
|
|
352
|
+
Ref: `src/gateway/session-blocker.ts`, `src/gateway/session-blocker.test.ts`.
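The fire-once semantics above can be modeled roughly as below. This is a sketch under assumptions, not the shipped `SessionBlockerTracker`: the class name, event shape, and callback wiring are hypothetical, and one tracker instance is assumed to live for exactly one session.

```typescript
type CircuitState = "closed" | "open" | "half-open";

// Hypothetical event payload; the real record's fields are not shown in this document.
interface BlockerEvent {
  sessionId: string;
  serverName: string;
  openCount: number;
}

class SessionBlockerSketch {
  private readonly counters = new Map<string, { opens: number; emitted: boolean }>();
  private readonly sessionId: string;
  private readonly emit: (event: BlockerEvent) => void;
  private readonly threshold: number;

  constructor(sessionId: string, emit: (event: BlockerEvent) => void, threshold: number = 3) {
    this.sessionId = sessionId;
    this.emit = emit;
    this.threshold = threshold;
  }

  onStateChange(serverName: string, next: CircuitState): void {
    if (next === "closed") {
      // Recovery resets the counter and re-arms the emit flag.
      this.counters.delete(serverName);
      return;
    }
    if (next !== "open") return; // only circuit-open transitions are counted

    const entry = this.counters.get(serverName) ?? { opens: 0, emitted: false };
    entry.opens += 1;
    this.counters.set(serverName, entry);

    // Fire exactly once per threshold crossing; later opens keep counting silently.
    if (entry.opens >= this.threshold && !entry.emitted) {
      entry.emitted = true;
      this.emit({ sessionId: this.sessionId, serverName, openCount: entry.opens });
    }
  }
}
```

A new session would construct a new tracker, which is how the sketch models "drops every counter and starts fresh".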
|
|
353
|
+
|
|
354
|
+
---
|
|
355
|
+
|
|
356
|
+
### 5.16 `.rea/serve.state.json` Lock / Ownership Handoff (0.9.0, BUG-005)
|
|
357
|
+
|
|
358
|
+
**Threat:** A crashed `rea serve` leaves `serve.state.json` and `serve.pid` behind. A new `rea serve` instance either (a) refuses to start because ownership-by-session-id locks the file forever, or (b) silently takes over without verifying the predecessor is dead — letting two live gateways race on writes.
|
|
359
|
+
|
|
360
|
+
**Mitigations:**
|
|
361
|
+
|
|
362
|
+
- Writes use atomic temp-file + rename (`writeFileAtomic`) with a `.<filename>.<randomUUID>.tmp` suffix, so a reader never sees a torn intermediate.
|
|
363
|
+
- The snapshot carries both `session_id` (boot-time ownership key) and `owner_pid` (0.9.0 pass-4). A newly-started `rea serve` whose predecessor crashed can detect the abandoned file — `kill(owner_pid, 0)` returns ESRCH — and take over ownership rather than stalling.
|
|
364
|
+
- The session-id check runs first; `owner_pid` is a secondary lock-guarded field used only to distinguish "abandoned" from "actively owned by a different session." The combination preserves the safety invariant (no silent takeover of a live gateway's file) while avoiding the pass-2 strict-one-directional lock.
|
|
365
|
+
- Consumers (`rea status`, `rea check`) read the file **only**. They never write and never clean up stale state — the serve process is the only writer. Pre-0.9.0 snapshots without `downstreams` or `owner_pid` degrade to `null` fields with a hint to upgrade.
|
|
366
|
+
- Writes are debounced through a 250 ms trailing timer so a flap storm (open → half-open → open → half-open in rapid succession) doesn't spam the filesystem.
|
|
367
|
+
|
|
368
|
+
**Residual risk:** A pathological PID reuse (the OS recycled the crashed gateway's PID onto an unrelated process before the new gateway boots) would cause the `kill(pid, 0)` probe to report "alive" spuriously. The session-id check catches this — a live process with a different session_id is distinguishable from an abandoned file — but the first-boot window where session_id is also unchanged is a theoretical corner. Operators who suspect this should `rm .rea/serve.pid .rea/serve.state.json` and restart.
|
|
369
|
+
|
|
370
|
+
Ref: `src/gateway/live-state.ts`, `src/gateway/server.ts`.
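The ownership handoff above reduces to a small decision function, sketched here under assumptions. The `ServeSnapshot` type carries only the two fields named in the text, and `decideOwnership` and its return values are hypothetical. The liveness probe is injected: in Node it would typically wrap `process.kill(pid, 0)`, which throws an error with code `ESRCH` when no such process exists.

```typescript
// Only the two ownership fields named in the text; other snapshot fields are omitted.
interface ServeSnapshot {
  session_id: string;
  owner_pid: number;
}

type Ownership = "own" | "take_over" | "refuse";

function decideOwnership(
  existing: ServeSnapshot | null,
  mySessionId: string,
  isAlive: (pid: number) => boolean, // assumed probe, e.g. a wrapper around kill(pid, 0)
): Ownership {
  if (existing === null) return "own"; // no predecessor file
  // The session-id check runs first.
  if (existing.session_id === mySessionId) return "own";
  // Different session: owner_pid distinguishes "abandoned" from "actively owned".
  return isAlive(existing.owner_pid) ? "refuse" : "take_over";
}
```

Note that the sketch inherits the PID-reuse caveat from the residual-risk paragraph: a recycled PID makes `isAlive` report a live owner, so the result is `refuse` rather than a silent takeover.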
|
|
371
|
+
|
|
372
|
+
---
|
|
373
|
+
|
|
374
|
+
### 5.17 Health Payload Sanitization (0.6.2, BUG-011)
|
|
375
|
+
|
|
376
|
+
**Threat:** The `__rea__health` meta-tool short-circuits the middleware chain (intentionally — so it stays callable under HALT) and previously serialized `halt_reason` plus every `downstreams[].last_error` verbatim. Error strings from upstream MCPs can contain secrets (API keys, tokens) or prompt-injection payloads, neither of which the `redact` or `injection` middleware would see because the short-circuit response skips the chain entirely. Net effect: a redact + injection-sanitizer bypass, callable precisely when HALT should be holding the line.
|
|
377
|
+
|
|
378
|
+
**Mitigations:**
|
|
379
|
+
|
|
380
|
+
- `sanitizeHealthSnapshot()` runs on the short-circuit response before it reaches the MCP wire. Default behavior: `halt_reason = null` and every `downstreams[].last_error = null`. The operator-visible wire response carries no downstream-controlled diagnostic strings.
|
|
381
|
+
- Full diagnostic detail still flows into the meta-tool audit record. The record written for `__rea__health` carries the unsanitized `metadata.halt_reason` and `metadata.downstream_errors[]` (sourced pre-sanitize from `pool.healthSnapshot()` inside `server.ts`) alongside the existing counts. Audit is on local disk, hash-chained, append-only, and not LLM-reachable — the correct sink for trusted-operator diagnostic text.
|
|
382
|
+
- Operators who need error strings on the MCP wire can opt in via `.rea/policy.yaml`:
|
|
383
|
+
|
|
384
|
+
```yaml
|
|
385
|
+
gateway:
|
|
386
|
+
health:
|
|
387
|
+
expose_diagnostics: true
|
|
388
|
+
```
|
|
389
|
+
|
|
390
|
+
Opt-in mode still runs the full sanitizer pass: `redactSecrets` replaces known secret patterns with `[REDACTED:*]`, `classifyInjection` replaces any non-`clean` diagnostic string (verdicts `suspicious` or `likely_injection`) with the exported `INJECTION_REDACTED_PLACEHOLDER` token (`<redacted: suspected injection>`), and the redact-timeout sentinel `[REDACTED: pattern timeout]` is filtered from the wire so a caller cannot distinguish "pattern timed out" from "pattern matched."
|
|
391
|
+
|
|
392
|
+
- Diagnostic strings are bounded at 4096 UTF-16 code units before any scanning runs, via a surrogate-aware truncate that drops trailing lone surrogates so the result still encodes as valid UTF-8; an adversarial downstream cannot DoS the tool by throwing oversized errors.
|
|
393
|
+
- `meta.health.audit_failed` log level was elevated from `warn` to `error` and `summary.audit_fail_count` is exposed in the snapshot so operators can detect an audit-sink failure without parsing stderr.
|
|
394
|
+
|
|
395
|
+
**Residual risk:** `expose_diagnostics: true` is still operator-controlled text on an LLM-reachable surface. The sanitizer is best-effort defense-in-depth — a secret pattern not in the catalog, or an injection pattern that `classifyInjection` rates `clean`, will pass through unchanged.
|
|
396
|
+
|
|
397
|
+
Ref: `src/gateway/meta/health.ts`, `src/gateway/meta/health-sanitize.test.ts`.
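As an illustration of the 4096-code-unit bound mentioned in the mitigations, a surrogate-aware truncate might look like the sketch below. The function and constant names are hypothetical, and the sketch only handles a surrogate pair split by the cut; it does not scrub lone surrogates that were already present elsewhere in the input.

```typescript
const MAX_DIAGNOSTIC_UNITS = 4096; // bound taken from the text above

function truncateDiagnostic(input: string, max: number = MAX_DIAGNOSTIC_UNITS): string {
  if (input.length <= max) return input;
  let out = input.slice(0, max);
  const last = out.charCodeAt(out.length - 1);
  // 0xD800..0xDBFF is the high-surrogate range: the cut landed mid-pair,
  // so drop the orphaned half rather than emit an unpaired surrogate.
  if (last >= 0xd800 && last <= 0xdbff) {
    out = out.slice(0, -1);
  }
  return out;
}
```

Truncating before any regex scanning keeps the cost of the later redaction and classification passes bounded by the cap rather than by the size of the downstream's error string.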
|
|
398
|
+
|
|
399
|
+
---
|
|
400
|
+
|
|
401
|
+
### 5.18 Script-Anchor Hook Trust Boundary (0.6.2, BUG-012)
|
|
402
|
+
|
|
403
|
+
**Threat:** The `push-review-gate.sh` and `commit-review-gate.sh` hooks need to know the rea repository root for (a) the cross-repo short-circuit when invoked from a consumer repository, and (b) HALT / policy enforcement against the correct policy file. Prior to 0.6.2, `REA_ROOT=${CLAUDE_PROJECT_DIR:-$(pwd)}`. `CLAUDE_PROJECT_DIR` is caller-controlled — any process invoking the hook can set it to a foreign path, which the guard would treat as rea. Result: HALT silently bypassed, cross-repo short-circuit fires on the wrong comparison, policy read from a directory the caller chose.
|
|
404
|
+
|
|
405
|
+
**Mitigations:**
|
|
406
|
+
|
|
407
|
+
- Hooks derive `REA_ROOT` from their own on-disk location using `BASH_SOURCE[0]` + `pwd -P`, then walk up to 4 parent directories looking for `.rea/policy.yaml` as the authoritative install marker. Install topology is fixed (`<root>/.claude/hooks/<name>.sh`), so the anchor is forge-resistant — a caller cannot relocate the hook without filesystem write access to the rea install, which is already protected by `settings-protection.sh` and `blocked-paths` enforcement.
|
|
408
|
+
- `CLAUDE_PROJECT_DIR` is retained only as an advisory signal. When set and the realpath differs from the script-derived `REA_ROOT`, the hook emits a stderr advisory and continues using the script-derived value. It is never compared for short-circuit, never used to select the policy file, and never used to locate HALT.
|
|
409
|
+
- The cross-repo guard (0.6.1) compares `git rev-parse --git-common-dir` on both sides (not path prefixes). Mixed state (one side git, one non-git) fails **closed** — the gate runs — rather than falling through to path-prefix. Only the both-non-git case uses path-prefix, matching the documented 0.5.1 non-git escape hatch.
|
|
410
|
+
- The 0.7.0 BUG-008 cleanup extracted the shared logic into `hooks/_lib/push-review-core.sh` so both the Claude-Code PreToolUse adapter (`push-review-gate.sh`) and the native git adapter (`push-review-gate-git.sh`) share a single anchor-walk implementation — a fix lands in one place.
|
|
411
|
+
|
|
412
|
+
**Residual risk:** If a local attacker has write access to the rea install directory they can move or replace the hook file, which would change `SCRIPT_DIR` and therefore `REA_ROOT`. This is equivalent to tampering with any other hook contents (`settings-protection.sh` already addresses it) and lies outside the `CLAUDE_PROJECT_DIR` threat class.
|
|
413
|
+
|
|
414
|
+
Ref: `hooks/_lib/push-review-core.sh`, `__tests__/hooks/push-review-gate-cross-repo.test.ts` "BUG-012: foreign CLAUDE_PROJECT_DIR does NOT bypass HALT".
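The anchor walk described in the mitigations can be modeled as follows. The shipped hooks are shell scripts, so this TypeScript version is purely illustrative: `findReaRoot`, `parentDir`, and the injected `exists` check are hypothetical, POSIX-style paths are assumed, and the sketch assumes the walk starts at the first parent of the script directory.

```typescript
// Parent of a POSIX-style absolute path ("/a/b" -> "/a", "/a" -> "/").
function parentDir(path: string): string {
  const i = path.lastIndexOf("/");
  return i <= 0 ? "/" : path.slice(0, i);
}

// Walk up to `maxParents` directories above the script's own location,
// returning the first one that contains the `.rea/policy.yaml` install marker.
function findReaRoot(
  scriptDir: string,
  exists: (path: string) => boolean, // injected so the sketch needs no filesystem
  maxParents: number = 4,
): string | null {
  let dir = scriptDir;
  for (let i = 0; i < maxParents; i++) {
    dir = parentDir(dir);
    const marker = dir === "/" ? "/.rea/policy.yaml" : `${dir}/.rea/policy.yaml`;
    if (exists(marker)) return dir;
    if (dir === "/") break;
  }
  return null; // no marker found: the caller should fail closed
}
```

The point of the design is that no caller-controlled input appears in the function: the only seed is the hook's own resolved location, so setting `CLAUDE_PROJECT_DIR` cannot influence the result.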
|
|
415
|
+
|
|
416
|
+
---
|
|
417
|
+
|
|
418
|
+
### 5.19 Tarball-Smoke Security-Claim Gate (0.6.2, BUG-013)
|
|
419
|
+
|
|
420
|
+
**Threat:** A changeset file claims a security fix (`[security]` marker), the release workflow merges and publishes, but the shipping `dist/` is byte-identical to the previous release — the claimed fix never made it into the compiled output. The 0.6.0 → 0.6.1 regression is the canonical example: `src/` changed, `dist/` did not. Without a pipeline gate that rebuilds `dist/` from the shipping commit and verifies the published tarball contents, no future security changeset can be trusted.
|
|
421
|
+
|
|
422
|
+
**Mitigations (shipped across 0.6.2 + 0.7.0):**
|
|
423
|
+
|
|
424
|
+
- `scripts/tarball-smoke.sh` (0.6.2) enforces a **content-based security-claim gate**. When any `.changeset/*.md` contains the `[security]` marker, the smoke requires at least one `src/**/*(sanitize|security)*.test.ts` file exists **and** every named-import symbol it pulls from a relative path is present in the compiled `dist/` tree. The gate fails loudly (exit 2) if the marker is present but no testable security symbols are extractable.
|
|
425
|
+
- `.github/workflows/release.yml` (0.7.0) rebuilds `dist/` from the shipping HEAD immediately before `changesets/action`, records the SHA-256 tree hash to `$RUNNER_TEMP/rea-dist-hash` (CI scratch space — cannot be accidentally committed by `changesets/action`'s `git add .`), and post-publish re-packs the just-published tarball from npm and fails the release if the published `dist/` tree hash doesn't match.
|
|
426
|
+
- `scripts/dist-regression-gate.sh` (0.7.0) + the `dist-regression` CI job run on every PR and every push-to-main. If `src/` has changed vs the last published tag but the rebuilt `dist/` tree hashes identically to the published tarball, CI fails — the "src changed, dist didn't" regression class is caught **before** the release branch, not only at publish time.
|
|
427
|
+
- Husky e2e regression guard (`__tests__/hooks/husky-e2e.test.ts`, 0.7.0) invokes a REAL `git push` against a bare remote via `core.hooksPath=.husky` with the SHIPPED `.husky/pre-push` in place (the standalone inline body emitted by `src/cli/install/pre-push.ts`). The ten-test matrix covers: nine cases that exercise the inline body's HALT, protected-path, Codex-waiver, `review.codex_required: false`, and bootstrap-push branches, plus one case that swaps in a wrapper around `hooks/push-review-gate-git.sh` as a shape-guard for the future installer path. The kind of BUG-008 silent-exit-0 regression that slipped past synthesized-stdin unit tests through 0.4.0 would now fail loudly.
|
|
428
|
+
|
|
429
|
+
**Residual risk:** A security claim whose fix is purely a deletion (no new symbols, no new test file) cannot be validated by the symbol-extraction gate. The `dist-regression` job catches this as a byte-identity failure, but the gate has no positive evidence of the fix's presence. Manual maintainer review on `[security]`-labeled PRs remains the compensating control.
|
|
430
|
+
|
|
431
|
+
Ref: `scripts/tarball-smoke.sh`, `scripts/dist-regression-gate.sh`, `.github/workflows/release.yml`.
|
|
432
|
+
|
|
433
|
+
---
|
|
434
|
+
|
|
435
|
+
### 5.20 Registry TOFU Pinning (0.3.0, G7)
|
|
436
|
+
|
|
437
|
+
**Threat:** An attacker who lands a malicious template via `rea init`, or who patches `.rea/registry.yaml` out-of-band (compromised dependency postinstall, CI-bot misconfig, editor plugin writing through stale buffers), can silently swap a downstream server's `command`, `args`, or `env` keys. The gateway would spawn the new child at next startup and proxy it without challenge.
|
|
438
|
+
|
|
439
|
+
**Mitigations:**
|
|
440
|
+
|
|
441
|
+
- On first successful connect, the gateway records a SHA-256 fingerprint of each downstream's **canonicalized registry config path** — `name`, `command`, `args`, the sorted KEY SET of `env` (values excluded so secret rotation doesn't trip drift), `env_passthrough`, and `tier_overrides` — to `.rea/fingerprints.json`. Trust-On-First-Use (TOFU) by config-path hash, not by tool-surface or binary hash.
|
|
442
|
+
- Subsequent connects re-compute the fingerprint and compare. A mismatch is a **hard fail**: the downstream is marked unhealthy, a structured log + audit record names the drift, and the gateway refuses to route calls to it. The operator must inspect the registry delta and either clear the fingerprint entry (re-pin) or acknowledge the drift via one-shot `REA_ACCEPT_DRIFT=<name>`.
|
|
443
|
+
- `fingerprints.json` is gitignored by default via the `.rea/` managed block so a local re-pin does not pollute history.
|
|
444
|
+
- Scope is explicitly **path-only, not binary, and not tool-surface**. Binary hashing would turn TOFU into a slow-boot tax and would trip false-positive drift on every legitimate MCP server upgrade. Tool-surface hashing was considered and deferred — see residual risk below.
|
|
445
|
+
|
|
446
|
+
**Residual risk:** Two classes remain uncovered by G7:
|
|
447
|
+
|
|
448
|
+
1. **Catalog drift from a legitimately-configured downstream.** A downstream whose registry config is unchanged but whose `tools/list` response changes between connects (new tool, renamed tool, modified description, modified input schema) is **not** detected by the config-path fingerprint. An attacker who compromises the downstream binary at `config.command` without changing the registry entry, or a legitimate upstream MCP server that silently expands its tool catalog in a patch release, both fall through this gate. See §6 "Catalog drift by downstream not detected on reconnect" — this is an active, tracked residual risk, not a mitigated one. The redact + injection middleware running on every proxied result is the compensating control, not a substitute.
|
|
449
|
+
2. **Host compromise with config-matching binary substitution.** An attacker who swaps the on-disk binary at `config.command` but leaves `.rea/registry.yaml` untouched is outside the G7 threat model — that is a host-integrity / supply-chain class, not a registry-tampering class.
|
|
450
|
+
|
|
451
|
+
Ref: `src/registry/fingerprint.ts` (`canonicalize()`, `fingerprintServer()`), `src/gateway/downstream-pool.ts` fingerprint-probe path.
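A rough sketch of the canonicalization step behind the fingerprint, under assumptions: the `RegistryServer` type and `canonicalizeForFingerprint` are hypothetical and are not the `canonicalize()` referenced above, and the exact serialization and field ordering of the shipped code are unknown. The sketch stops at the canonical string; the text states the real fingerprint is a SHA-256 over the canonicalized config.

```typescript
// Hypothetical registry entry shape, limited to the fields named in the text.
interface RegistryServer {
  name: string;
  command: string;
  args: string[];
  env: Record<string, string>;
  env_passthrough: string[];
  tier_overrides: Record<string, string>;
}

function canonicalizeForFingerprint(server: RegistryServer): string {
  // Only the sorted KEY SET of env is included; values are excluded so that
  // rotating a secret does not register as drift.
  const envKeys = Object.keys(server.env).sort();
  // Sorting override keys is an assumption made here for determinism.
  const overrides = Object.keys(server.tier_overrides)
    .sort()
    .map((key) => [key, server.tier_overrides[key]]);
  return JSON.stringify({
    name: server.name,
    command: server.command,
    args: server.args,
    env_keys: envKeys,
    env_passthrough: server.env_passthrough,
    tier_overrides: overrides,
  });
}
```

Two entries that differ only in env values or env key order canonicalize identically, while any change to `command` or `args` produces a different string and therefore a different hash.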
|
|
452
|
+
|
|
453
|
+
---
|
|
454
|
+
|
|
455
|
+
### 5.21 G9 Three-Tier Injection Classifier (0.3.0)
|
|
456
|
+
|
|
457
|
+
**Threat:** A binary pass/fail injection detector is either too permissive (known instruction patterns slip through) or too strict (every tool description flags and the gateway becomes unusable). Either failure mode eventually trains operators to ignore the signal.
|
|
458
|
+
|
|
459
|
+
**Mitigations:**
|
|
460
|
+
|
|
461
|
+
- `classifyInjection()` returns one of three verdicts: `clean`, `suspicious`, or `likely_injection`. The verdict is derived from weighted matches against the shipped pattern catalog, tuned so legitimate tool descriptions rate `clean` by default.
|
|
462
|
+
- Escalation rules (first match wins, per `src/gateway/middleware/injection.ts:450-527`):
|
|
463
|
+
1. No literal and no base64-decoded match → `clean`.
|
|
464
|
+
2. Any base64-decoded match, regardless of tier → `likely_injection`.
|
|
465
|
+
3. ≥2 distinct literal matches, regardless of tier → `likely_injection`.
|
|
466
|
+
4. Any match at read-tier (or unknown tier — fail closed) → `likely_injection`.
|
|
467
|
+
5. Exactly one literal match at write/destructive tier → `suspicious`.
|
|
468
|
+
- `likely_injection` → always deny. No opt-out at policy level. (Note: because of rule 4, ANY injection match at read-tier is denied — the "warn but permit" path only exists for single-literal matches at write/destructive tier.)
|
|
469
|
+
- `suspicious` on a write/destructive tier → **policy-controlled**. `injection.suspicious_blocks_writes: true` (shipped in `bst-internal` and `bst-internal-no-codex` profiles — internal posture) denies. The schema default is `false` — external profiles (`open-source`, `client-engagement`, `minimal`, `lit-wc`) inherit the looser behavior so upgrading 0.2.x consumers are not silently tightened.
|
|
470
|
+
- **Regex timeout / oversize-result `error` verdict is mode-dependent** (`src/gateway/middleware/injection.ts:654-728`). Under `injection_detection: block` (all profiles except `warn`), any scan timeout or oversize input denies unconditionally — the partial scan cannot prove the unscanned suffix is safe, so block mode fails closed. Under `injection_detection: warn`, a timeout on an otherwise-clean partial scan is recorded as `metadata.injection.verdict = 'error'` and let through — this matches the 0.2.x `warn` semantics (fail-open by design) and operators opting into `warn` must accept this trade-off. Operators who want fail-closed everywhere should stay on `block`.
|
|
471
|
+
- The opt-in strict flag is honored at both the middleware layer (write/destructive deny) and the sanitizer layer (health payload replacement — the `<redacted: suspected injection>` placeholder collapses **any** non-`clean` diagnostic, so `suspicious` and `likely_injection` strings are both replaced on the `__rea__health` wire under `expose_diagnostics: true`).
|
|
472
|
+
- Every non-`clean` invocation records a nested `ctx.metadata.injection = { verdict, matched_patterns, base64_decoded }` object on the audit row (`src/gateway/middleware/injection.ts:733-740`). Consumers must read the nested shape — there is no top-level `injection_verdict` / `injection_match_count` field. The matched-patterns array contains the distinct phrase names only; the original input text is never exported.
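
Rules 2 to 5 and the verdict handling above can be sketched as a small decision function. This is a minimal illustration, not the code in `src/gateway/middleware/injection.ts`: the names `classify`, `decide`, `decideOnScanError`, the shape of the `scan` argument, and the `suspiciousBlocksWrites` field are assumptions made for the example. Rule 1 falls outside this excerpt and is not modelled; a scan with no matches is assumed to be `clean`.

```javascript
// Hypothetical sketch only; names and shapes are illustrative assumptions.
const WRITE_TIERS = new Set(['write', 'destructive']);

// scan: { literalMatches: string[], base64Matches: string[], tier: string }
function classify(scan) {
  const literals = new Set(scan.literalMatches); // distinct phrase names
  if (literals.size === 0 && scan.base64Matches.length === 0) return 'clean';
  // Rule 2: any base64-decoded match, regardless of tier.
  if (scan.base64Matches.length > 0) return 'likely_injection';
  // Rule 3: two or more distinct literal matches, regardless of tier.
  if (literals.size >= 2) return 'likely_injection';
  // Rule 4: any match at read tier, or at an unknown tier (fail closed).
  if (!WRITE_TIERS.has(scan.tier)) return 'likely_injection';
  // Rule 5: exactly one literal match at write/destructive tier.
  return 'suspicious';
}

// policy.suspiciousBlocksWrites stands in for injection.suspicious_blocks_writes.
function decide(verdict, tier, policy) {
  if (verdict === 'likely_injection') return 'deny'; // no policy opt-out
  if (verdict === 'suspicious' && WRITE_TIERS.has(tier)) {
    return policy.suspiciousBlocksWrites ? 'deny' : 'allow';
  }
  return 'allow';
}

// Scan timeout or oversize input: only an explicit `warn` mode lets it through.
function decideOnScanError(mode) {
  return mode === 'warn' ? 'allow' : 'deny';
}
```

Repeated hits on the same phrase collapse into one entry of the `Set`, so only distinct literals count toward rule 3. That choice follows the wording of the rule, but how the real middleware deduplicates is not shown in this excerpt.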
|
|
473
|
+
|
|
474
|
+
**Residual risk:** Semantic injection in natural-language descriptions — a well-phrased instruction that no pattern catalog will catch — is not mitigated by pattern matching. This is the general limitation acknowledged in §5.1; the three-tier classifier narrows the footgun (by making "write under suspicion" a conscious policy decision) but does not eliminate it.
|
|
475
|
+
|
|
476
|
+
Ref: `src/gateway/middleware/injection.ts`, `src/gateway/middleware/injection.test.ts`.
|
|
477
|
+
|
|
478
|
+
---
|
|
479
|
+
|
|
303
480
|
## 6. Residual Risks and Open Issues
|
|
304
481
|
|
|
305
|
-
| Risk | Severity | Tracking |
|
|
482
|
+
| Risk | Severity | Status / Tracking |
|
|
306
483
|
| ------------------------------------------------------------- | -------- | ------------------------------ |
|
|
307
|
-
| Semantic prompt injection via tool descriptions | High |
|
|
484
|
+
| Semantic prompt injection via tool descriptions | High | Partially mitigated — G9 three-tier classifier (§5.21) narrows the footgun via pattern matching, but semantic/natural-language injection that no catalog entry will catch is still unmitigated by design |
|
|
308
485
|
| Semantic injection via Codex adversarial-review responses | High | No issue filed (defense in depth via middleware) |
|
|
309
|
-
|
|
|
310
|
-
|
|
|
311
|
-
|
|
|
486
|
+
| Concurrent audit writers can race at fsync | Medium | Mitigated — proper-lockfile shipped 0.3.0 (G1) |
|
|
487
|
+
| Catalog drift by downstream not detected on reconnect | Medium | Active — G7 TOFU (§5.20) pins registry CONFIG (name/command/args/env keys), not the `tools/list` response. A downstream that silently expands or alters its tool catalog without a registry edit is not caught by the fingerprint; compensating control is the per-result redact + injection middleware. Tool-surface TOFU is a planned follow-up. |
|
|
488
|
+
| Post-publish tarball smoke not in CI | Medium | Mitigated — tarball-smoke shipped 0.3.0, security-claim gate 0.6.2 (§5.19) |
|
|
489
|
+
| No real-time alert on audit hash chain break | Medium | Mitigated — audit-rotation + verify-on-append shipped 0.3.0 (G1 + G5) |
|
|
490
|
+
| OIDC trusted publisher not yet migrated (`NODE_AUTH_TOKEN` still in use) | Medium | Deferred past 0.5.0 per MIGRATION-0.5.0.md; current path is `--provenance` with `NODE_AUTH_TOKEN` |
|
|
491
|
+
| Double-URL-encoding bypass for blocked paths | Medium | Planned fix (iterative decode to fixed-point) |
|
|
312
492
|
| SBOM not automated in publish pipeline | Medium | Planned |
|
|
313
493
|
| Secret pattern gaps (custom token formats, encoding variants) | Medium | No issue filed |
|
|
314
|
-
|
|
|
315
|
-
| Escape-hatch abuse signal not surfaced in `rea doctor` | Low | 0.3.0 (threshold: ≥3 / 7d) |
|
|
316
|
-
| Catalog drift by downstream not detected on reconnect | Medium | 0.3.0 G7 (fingerprint + drift) |
|
|
317
|
-
| OIDC trusted publisher not yet migrated (`NODE_AUTH_TOKEN` still in use) | Medium | 0.3.0 G8 |
|
|
494
|
+
| Escape-hatch abuse signal not surfaced in `rea doctor` | Low | Tracked (threshold: ≥3 / 7d) |
|
|
318
495
|
| Local user can escalate policy.yaml outside gateway | Low | By design (trusted actor) |
|
|
496
|
+
| Registry pin mismatch → hard fail (no rollback) on TOFU | Low | By design — operator clears `.rea/fingerprints.json` to re-pin |
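
The planned fix listed in the table for the double-URL-encoding bypass, iterative decoding to a fixed point, could look roughly like the sketch below. It is an illustration under stated assumptions, not the shipped or planned implementation: `decodeToFixedPoint`, `isBlockedPath`, the `maxRounds` cap, and the prefix-based matching are all hypothetical.

```javascript
// Hypothetical sketch: decode repeatedly until the string stops changing.
function decodeToFixedPoint(input, maxRounds = 8) {
  let current = input;
  for (let i = 0; i < maxRounds; i++) {
    let next;
    try {
      next = decodeURIComponent(current);
    } catch {
      return null; // malformed escape sequence: let the caller fail closed
    }
    if (next === current) return current; // fixed point reached
    current = next;
  }
  return null; // still changing after maxRounds: treat as hostile
}

// Compare the fully decoded path against the blocked prefixes.
function isBlockedPath(rawPath, blockedPrefixes) {
  const decoded = decodeToFixedPoint(rawPath);
  if (decoded === null) return true; // fail closed on undecodable input
  return blockedPrefixes.some((prefix) => decoded.startsWith(prefix));
}

const single = isBlockedPath('%2Eenv', ['.env']);   // true
const double = isBlockedPath('%252Eenv', ['.env']); // true
```

One trade-off of this approach: a legitimate path containing a literal `%` that is not a valid escape is rejected, which is consistent with a fail-closed posture but may need an allowance in practice.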
|
|
319
497
|
|
|
320
498
|
---
|
|
321
499
|
|
|
@@ -323,8 +501,8 @@ Downstream MCP servers are treated as untrusted by default. Codex plugin *invoca
|
|
|
323
501
|
|
|
324
502
|
REA operates two independent layers. Bypassing one does not disable the other.
|
|
325
503
|
|
|
326
|
-
**Hook layer** (development-time):
|
|
504
|
+
**Hook layer** (development-time): 14 shell scripts ship. 12 are wired into Claude Code's `PreToolUse` / `PostToolUse` events via the default `.claude/settings.json`. Two are shipped but NOT registered by default: `commit-review-gate.sh` is a `PreToolUse: Bash` hook that matches `git commit` for operators who opt into commit-time review by adding a rule, and `push-review-gate-git.sh` is a native-git adapter that sources `hooks/_lib/push-review-core.sh` (the same shared core the Claude-Code `push-review-gate.sh` sources), shipped for consumers who wire a wrapper-based `.husky/pre-push` that execs it directly. `rea init` currently emits a standalone inline `.husky/pre-push` body (`src/cli/install/pre-push.ts`) rather than a wrapper; unifying the husky installer on the shared-core adapter is tracked as follow-up hardening. Hooks enforce: secret scanning, dangerous command interception, blocked path enforcement, settings protection, attribution advisory, dependency audit, push review gate (Claude-Code-JSON adapter registered; native `.husky/pre-push` adapter opt-in), PR issue linking, architecture review, env file protection, changeset security, and security-disclosure routing. The review-gate hooks (`push-review-gate.sh`, `push-review-gate-git.sh`, `commit-review-gate.sh`) anchor their trust decision on their own on-disk script location (BUG-012, §5.18), not on caller-controlled env vars. The remaining hooks still derive `REA_ROOT` from `${CLAUDE_PROJECT_DIR:-$(pwd)}`; extending the script-anchor idiom across the full hook set is a tracked hardening follow-up.
|
|
327
505
|
|
|
328
|
-
**Gateway layer** (runtime, `rea serve`): A middleware chain processes every proxied MCP tool call. Middleware enforces: audit, kill switch, policy/autonomy level, tier classification, blocked paths, rate limit, circuit breaker, prompt
|
|
506
|
+
**Gateway layer** (runtime, `rea serve`): A middleware chain processes every proxied MCP tool call. Middleware enforces: audit, kill switch, policy/autonomy level, tier classification, blocked paths, rate limit, circuit breaker, prompt-injection classification (§5.21), secret redaction (pre and post), and result size cap. The gateway also supervises downstream child processes (§5.14), emits a `SESSION_BLOCKER` audit event on persistent failure (§5.15), and publishes a live per-downstream state snapshot to `.rea/serve.state.json` (§5.16) that `rea status` reads read-only. The `__rea__health` meta-tool short-circuits the chain for callability under HALT and runs a dedicated sanitizer on its response (§5.17).
|
|
329
507
|
|
|
330
508
|
Both layers fail closed: on read failure, parse error, unknown errno on HALT, regex timeout, or any unexpected condition, the default action is deny (or for redaction specifically: replace with a sentinel — the content never escapes unscanned).
|
package/dist/cli/serve.d.ts
CHANGED
|
@@ -5,10 +5,18 @@
|
|
|
5
5
|
* later `rea serve` that has raced in and rewritten the breadcrumbs
|
|
6
6
|
* is never unexpectedly unlinked.
|
|
7
7
|
*/
|
|
8
|
+
/**
|
|
9
|
+
* Serve-state file shape. 0.9.0 added the `downstreams` block; older code
|
|
10
|
+
* that reads the state file treats a missing `downstreams` as "no live
|
|
11
|
+
* view available" and falls back to the pre-0.9 fields. `session_id` is
|
|
12
|
+
* the ownership key used by `cleanupStateIfOwned` during shutdown.
|
|
13
|
+
*/
|
|
8
14
|
interface ServeState {
|
|
9
15
|
session_id: string;
|
|
10
16
|
started_at: string;
|
|
11
17
|
metrics_port: number | null;
|
|
18
|
+
/** 0.9.0 — populated after the gateway starts; absent on this initial write. */
|
|
19
|
+
downstreams?: unknown[];
|
|
12
20
|
}
|
|
13
21
|
/**
|
|
14
22
|
* Atomic file write: stage to a per-pid temp name, then rename(2). The
|
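
The two ideas in these comments, staging a write under a per-pid temp name before `rename(2)` and using `session_id` as the ownership key at shutdown, can be illustrated with a short sketch. It is not the package's code: `writeJsonAtomic` and `removeIfOwnedBy` are hypothetical names, and the temp-file naming is an assumption.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Stage the JSON under a per-pid temp name, then rename over the target.
// A rename within one directory replaces the target in a single step, so a
// reader sees either the old file or the new one, never a partial write.
function writeJsonAtomic(targetPath, value) {
  const tmpPath = `${targetPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(value));
  fs.renameSync(tmpPath, targetPath);
}

// Remove the state file only when its session_id matches the caller's.
function removeIfOwnedBy(targetPath, sessionId) {
  let parsed;
  try {
    parsed = JSON.parse(fs.readFileSync(targetPath, 'utf8'));
  } catch {
    return false; // missing or unparseable: nothing we can prove we own
  }
  if (parsed === null || typeof parsed !== 'object') return false;
  if (parsed.session_id !== sessionId) return false;
  fs.unlinkSync(targetPath);
  return true;
}

// Usage: a newer session overwrites the file; the older one must not remove it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'serve-state-'));
const statePath = path.join(dir, 'serve.state.json');
writeJsonAtomic(statePath, { session_id: 'old', started_at: new Date().toISOString(), metrics_port: null });
writeJsonAtomic(statePath, { session_id: 'new', started_at: new Date().toISOString(), metrics_port: null });
const removedByOld = removeIfOwnedBy(statePath, 'old'); // false
const removedByNew = removeIfOwnedBy(statePath, 'new'); // true
```

The read-then-unlink pair in `removeIfOwnedBy` is not atomic on its own; a real implementation would need some form of mutual exclusion around it.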
package/dist/cli/serve.js
CHANGED
|
@@ -249,12 +249,30 @@ export async function runServe() {
|
|
|
249
249
|
console.error('');
|
|
250
250
|
process.exit(1);
|
|
251
251
|
}
|
|
252
|
+
// Metadata we'll also stamp into the state file below so `rea status`
|
|
253
|
+
// sees the session-id and start time alongside the new downstream block.
|
|
254
|
+
const startedAt = new Date().toISOString();
|
|
255
|
+
const statePath = reaPath(baseDir, SERVE_STATE_FILE);
|
|
252
256
|
const handle = createGateway({
|
|
253
257
|
baseDir,
|
|
254
258
|
policy,
|
|
255
259
|
registry: gatedRegistry,
|
|
256
260
|
logger,
|
|
257
261
|
metrics: metricsRegistry,
|
|
262
|
+
// 0.9.0 — let the gateway own live writes to serve.state.json so
|
|
263
|
+
// circuit-breaker transitions and supervisor events are reflected on
|
|
264
|
+
// disk for `rea status --json`. Legacy shape (session_id, started_at,
|
|
265
|
+
// metrics_port) is preserved for backward compatibility.
|
|
266
|
+
liveStateFilePath: statePath,
|
|
267
|
+
liveStateSessionId: sessionId,
|
|
268
|
+
liveStateStartedAt: startedAt,
|
|
269
|
+
liveStateMetricsPort: metricsServer?.port() ?? null,
|
|
270
|
+
// 0.9.0 pass-7 — reuse the gateway log redactor so downstream error
|
|
271
|
+
// strings are scrubbed for secret-shaped content BEFORE hitting
|
|
272
|
+
// serve.state.json or the operator's terminal via `rea status`.
|
|
273
|
+
// The redactor already incorporates SECRET_PATTERNS plus any
|
|
274
|
+
// operator-defined policy.redact.patterns loaded above.
|
|
275
|
+
liveStateLastErrorRedactor: logRedactor,
|
|
258
276
|
});
|
|
259
277
|
// ── HALT acknowledgement at startup (G5) ─────────────────────────────────
|
|
260
278
|
const haltPath = reaPath(baseDir, HALT_FILE);
|
|
@@ -280,13 +298,21 @@ export async function runServe() {
|
|
|
280
298
|
codexProbe.start();
|
|
281
299
|
}
|
|
282
300
|
// ── Pidfile + state (AFTER metrics boot so we persist the real port) ─────
|
|
283
|
-
|
|
301
|
+
//
|
|
302
|
+
// 0.9.0: the gateway's LiveStatePublisher owns all writes to
|
|
303
|
+
// serve.state.json, including the boot-time snapshot. Earlier drafts
|
|
304
|
+
// used the legacy `writeStateFile()` here to cover the bootstrap window
|
|
305
|
+
// between now and `handle.start()`'s first flush, but that write
|
|
306
|
+
// bypassed the sidecar-lock protocol and reintroduced the TOCTOU race
|
|
307
|
+
// P2b was designed to close (Codex 0.9.0 pass-3 P1: an overlapping
|
|
308
|
+
// older `rea serve` could clobber this unprotected write and the
|
|
309
|
+
// newer instance would later clean up its own file during shutdown).
|
|
310
|
+
//
|
|
311
|
+
// Routing the boot write through `handle.livePublisher.flushNow()`
|
|
312
|
+
// means the boot snapshot is guarded by the same lock as every
|
|
313
|
+
// subsequent flush; overlapping gateways serialize cleanly.
|
|
284
314
|
const pidPath = writePidfile(baseDir);
|
|
285
|
-
|
|
286
|
-
session_id: sessionId,
|
|
287
|
-
started_at: startedAt,
|
|
288
|
-
metrics_port: metricsServer?.port() ?? null,
|
|
289
|
-
});
|
|
315
|
+
handle.livePublisher?.flushNow();
|
|
290
316
|
let shuttingDown = false;
|
|
291
317
|
const shutdown = async (signal) => {
|
|
292
318
|
// A second signal (e.g. SIGTERM then SIGINT) must NOT re-enter cleanup —
|
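
The comments in this hunk describe routing every write to `serve.state.json`, including the boot snapshot, through one lock. The diff does not show the sidecar-lock protocol itself, so the sketch below uses one common pattern, exclusive creation of a `.lock` file, purely as an illustration. `withSidecarLock` and the `.lock` suffix are assumptions, and retry, back-off, and stale-lock recovery are deliberately left out.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: hold an exclusive sidecar lock while fn runs.
function withSidecarLock(targetPath, fn) {
  const lockPath = `${targetPath}.lock`;
  // The 'wx' flag makes open fail with EEXIST if the lock file already
  // exists, so only one holder can be inside fn at a time.
  const fd = fs.openSync(lockPath, 'wx');
  try {
    return fn();
  } finally {
    fs.closeSync(fd);
    fs.unlinkSync(lockPath);
  }
}

// Usage: a second acquisition attempt while the lock is held is rejected.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidecar-lock-'));
const statePath = path.join(dir, 'serve.state.json');
let nestedErrorCode = null;
withSidecarLock(statePath, () => {
  fs.writeFileSync(statePath, JSON.stringify({ session_id: 'a', downstreams: [] }));
  try {
    withSidecarLock(statePath, () => {});
  } catch (err) {
    nestedErrorCode = err.code; // 'EEXIST'
  }
});
```

Because the boot write and every later flush would go through the same helper, an overlapping process cannot interleave an unprotected write, which is the property the hunk's comment is after.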
package/dist/cli/status.d.ts
CHANGED
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
*
|
|
7
7
|
* `rea status` is the LIVE view: is a gateway running for this cwd? What is
|
|
8
8
|
* its session id? What does the audit chain look like right now? Is HALT
|
|
9
|
-
* active?
|
|
9
|
+
* active? Which downstreams are connected / healthy / tripped?
|
|
10
10
|
*
|
|
11
11
|
* Detection strategy for "is serve running":
|
|
12
12
|
* 1. Read `.rea/serve.pid`.
|
|
@@ -14,6 +14,15 @@
|
|
|
14
14
|
* 3. If kill throws ESRCH or EPERM, the pid is stale — treat as not-running
|
|
15
15
|
* and surface that nuance in the output.
|
|
16
16
|
*
|
|
17
|
+
* 0.9.0 — per-downstream live block. `readServeState` parses the
|
|
18
|
+
* `downstreams: [...]` array from `.rea/serve.state.json` (written by the
|
|
19
|
+
* live-state publisher on every circuit transition + supervisor event).
|
|
20
|
+
* Each entry carries `name`, `connected`, `healthy`, `circuit_state`,
|
|
21
|
+
* `retry_at`, `last_error` (redacted by the publisher), `tools_count`,
|
|
22
|
+
* `open_transitions`, and `session_blocker_emitted`. State files written
|
|
23
|
+
* by a pre-0.9.0 gateway degrade gracefully: `downstreams` surfaces as
|
|
24
|
+
* `null` with a hint to upgrade.
|
|
25
|
+
*
|
|
17
26
|
* Output modes:
|
|
18
27
|
* - Default: human-pretty, matching the spacing used by `rea check`.
|
|
19
28
|
* - `--json`: canonical JSON object, composable with jq and future tooling.
|
|
@@ -23,6 +32,11 @@
|
|
|
23
32
|
* `rea audit verify` is the authoritative check and is expensive on large
|
|
24
33
|
* chains; here we just report line count, last timestamp, and a cheap "last
|
|
25
34
|
* record's stored hash is non-empty" heuristic as an integrity smoke signal.
|
|
35
|
+
*
|
|
36
|
+
* Every disk-sourced string field flows through `sanitizeForTerminal` on the
|
|
37
|
+
* pretty-print path — JSON mode relies on `JSON.stringify` to escape control
|
|
38
|
+
* chars safely — so a malicious `halt_reason` or `last_error` cannot inject
|
|
39
|
+
* ANSI/OSC escapes into the operator's terminal.
|
|
26
40
|
*/
|
|
27
41
|
/**
|
|
28
42
|
* Strip every ASCII control code (C0 plus DEL) from a string. Defense
|
|
@@ -47,6 +61,24 @@ export declare function sanitizeForTerminal(value: string): string;
|
|
|
47
61
|
export interface StatusOptions {
|
|
48
62
|
json?: boolean | undefined;
|
|
49
63
|
}
|
|
64
|
+
/**
|
|
65
|
+
* Per-downstream live state surfaced in both JSON and pretty outputs
|
|
66
|
+
* (0.9.0, BUG-005). Mirrors `LiveDownstreamState` in
|
|
67
|
+
* `src/gateway/live-state.ts`; duplicated here to keep the CLI surface
|
|
68
|
+
* independent of gateway internals (the CLI can be built without the
|
|
69
|
+
* gateway module in a trimmed install).
|
|
70
|
+
*/
|
|
71
|
+
export interface LiveDownstreamSnapshot {
|
|
72
|
+
name: string;
|
|
73
|
+
connected: boolean;
|
|
74
|
+
healthy: boolean;
|
|
75
|
+
circuit_state: 'closed' | 'open' | 'half-open';
|
|
76
|
+
retry_at: string | null;
|
|
77
|
+
last_error: string | null;
|
|
78
|
+
tools_count: number | null;
|
|
79
|
+
open_transitions: number;
|
|
80
|
+
session_blocker_emitted: boolean;
|
|
81
|
+
}
|
|
50
82
|
interface ServeLiveness {
|
|
51
83
|
running: boolean;
|
|
52
84
|
pid: number | null;
|
|
@@ -56,6 +88,13 @@ interface ServeLiveness {
|
|
|
56
88
|
session_id: string | null;
|
|
57
89
|
started_at: string | null;
|
|
58
90
|
metrics_port: number | null;
|
|
91
|
+
/**
|
|
92
|
+
* 0.9.0 — per-downstream live block, or `null` when the state file was
|
|
93
|
+
* written by an older gateway version that did not include it. A
|
|
94
|
+
* zero-length array means "gateway is running with no downstreams
|
|
95
|
+
* configured", which is a distinct signal from "unknown".
|
|
96
|
+
*/
|
|
97
|
+
downstreams: LiveDownstreamSnapshot[] | null;
|
|
59
98
|
}
|
|
60
99
|
interface AuditStats {
|
|
61
100
|
present: boolean;
|
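
The header comment distinguishes two output paths: the pretty printer strips control codes, while `--json` relies on `JSON.stringify` escaping. A minimal sketch of both behaviours follows. `stripControlChars` is a hypothetical stand-in written from the doc comment ("strip every ASCII control code, C0 plus DEL") and is not the package's `sanitizeForTerminal`.

```javascript
// C0 controls are U+0000 through U+001F; DEL is U+007F.
const CONTROL_CHARS = /[\u0000-\u001f\u007f]/g;

// Hypothetical sanitizer: drop every C0 control code and DEL.
function stripControlChars(value) {
  return value.replace(CONTROL_CHARS, '');
}

// A hostile last_error carrying an OSC title-set sequence (ESC ] ... BEL).
const hostile = '\u001b]0;pwned\u0007connect failed';

// Pretty path: ESC and BEL are removed, so the terminal sees inert text.
const pretty = stripControlChars(hostile); // ']0;pwnedconnect failed'

// JSON path: JSON.stringify emits ESC as the six characters \u001b.
const asJson = JSON.stringify('\u001b[31mred'); // '"\\u001b[31mred"'
```

The printable remainder of the escape sequence (`]0;pwned`) stays visible in the pretty output, which is arguably useful to an operator: it shows that something odd was in the field without letting the terminal interpret it.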
package/dist/cli/status.js
CHANGED
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
*
|
|
7
7
|
* `rea status` is the LIVE view: is a gateway running for this cwd? What is
|
|
8
8
|
* its session id? What does the audit chain look like right now? Is HALT
|
|
9
|
-
* active?
|
|
9
|
+
* active? Which downstreams are connected / healthy / tripped?
|
|
10
10
|
*
|
|
11
11
|
* Detection strategy for "is serve running":
|
|
12
12
|
* 1. Read `.rea/serve.pid`.
|
|
@@ -14,6 +14,15 @@
|
|
|
14
14
|
* 3. If kill throws ESRCH or EPERM, the pid is stale — treat as not-running
|
|
15
15
|
* and surface that nuance in the output.
|
|
16
16
|
*
|
|
17
|
+
* 0.9.0 — per-downstream live block. `readServeState` parses the
|
|
18
|
+
* `downstreams: [...]` array from `.rea/serve.state.json` (written by the
|
|
19
|
+
* live-state publisher on every circuit transition + supervisor event).
|
|
20
|
+
* Each entry carries `name`, `connected`, `healthy`, `circuit_state`,
|
|
21
|
+
* `retry_at`, `last_error` (redacted by the publisher), `tools_count`,
|
|
22
|
+
* `open_transitions`, and `session_blocker_emitted`. State files written
|
|
23
|
+
* by a pre-0.9.0 gateway degrade gracefully: `downstreams` surfaces as
|
|
24
|
+
* `null` with a hint to upgrade.
|
|
25
|
+
*
|
|
17
26
|
* Output modes:
|
|
18
27
|
* - Default: human-pretty, matching the spacing used by `rea check`.
|
|
19
28
|
* - `--json`: canonical JSON object, composable with jq and future tooling.
|
|
@@ -23,6 +32,11 @@
|
|
|
23
32
|
* `rea audit verify` is the authoritative check and is expensive on large
|
|
24
33
|
* chains; here we just report line count, last timestamp, and a cheap "last
|
|
25
34
|
* record's stored hash is non-empty" heuristic as an integrity smoke signal.
|
|
35
|
+
*
|
|
36
|
+
* Every disk-sourced string field flows through `sanitizeForTerminal` on the
|
|
37
|
+
* pretty-print path — JSON mode relies on `JSON.stringify` to escape control
|
|
38
|
+
* chars safely — so a malicious `halt_reason` or `last_error` cannot inject
|
|
39
|
+
* ANSI/OSC escapes into the operator's terminal.
|
|
26
40
|
*/
|
|
27
41
|
import fs from 'node:fs';
|
|
28
42
|
import { loadPolicy } from '../policy/loader.js';
|
|
@@ -96,21 +110,64 @@ function readPidfile(baseDir) {
|
|
|
96
110
|
return null;
|
|
97
111
|
}
|
|
98
112
|
}
|
|
113
|
+
/**
|
|
114
|
+
* Parse a single downstream entry from `serve.state.json`. Every field is
|
|
115
|
+
* validated; an unexpected type yields a safe default for that field rather than
|
|
116
|
+
* poisoning the whole entry, because the state file is touched on a hot
|
|
117
|
+
* path and we would rather surface a half-useful snapshot than a
|
|
118
|
+
* "corrupt, try again" error to the operator.
|
|
119
|
+
*
|
|
120
|
+
* Returns `null` when the entry's `name` is missing or not a string, since
|
|
121
|
+
* a downstream with no name is unusable for display.
|
|
122
|
+
*/
|
|
123
|
+
function parseDownstreamEntry(raw) {
|
|
124
|
+
if (typeof raw !== 'object' || raw === null)
|
|
125
|
+
return null;
|
|
126
|
+
const r = raw;
|
|
127
|
+
if (typeof r.name !== 'string' || r.name.length === 0)
|
|
128
|
+
return null;
|
|
129
|
+
const circuit = r.circuit_state === 'open' || r.circuit_state === 'half-open' || r.circuit_state === 'closed'
|
|
130
|
+
? r.circuit_state
|
|
131
|
+
: 'closed';
|
|
132
|
+
return {
|
|
133
|
+
name: r.name,
|
|
134
|
+
connected: typeof r.connected === 'boolean' ? r.connected : false,
|
|
135
|
+
healthy: typeof r.healthy === 'boolean' ? r.healthy : false,
|
|
136
|
+
circuit_state: circuit,
|
|
137
|
+
retry_at: typeof r.retry_at === 'string' ? r.retry_at : null,
|
|
138
|
+
last_error: typeof r.last_error === 'string' ? r.last_error : null,
|
|
139
|
+
tools_count: typeof r.tools_count === 'number' && Number.isInteger(r.tools_count) ? r.tools_count : null,
|
|
140
|
+
open_transitions: typeof r.open_transitions === 'number' && Number.isInteger(r.open_transitions)
|
|
141
|
+
? r.open_transitions
|
|
142
|
+
: 0,
|
|
143
|
+
session_blocker_emitted: typeof r.session_blocker_emitted === 'boolean' ? r.session_blocker_emitted : false,
|
|
144
|
+
};
|
|
145
|
+
}
|
|
99
146
|
function readServeState(baseDir) {
|
|
100
147
|
const p = reaPath(baseDir, SERVE_STATE_FILE);
|
|
101
148
|
try {
|
|
102
149
|
const raw = fs.readFileSync(p, 'utf8');
|
|
103
150
|
const parsed = JSON.parse(raw);
|
|
151
|
+
let downstreams = null;
|
|
152
|
+
if (Array.isArray(parsed.downstreams)) {
|
|
153
|
+
downstreams = [];
|
|
154
|
+
for (const entry of parsed.downstreams) {
|
|
155
|
+
const ds = parseDownstreamEntry(entry);
|
|
156
|
+
if (ds !== null)
|
|
157
|
+
downstreams.push(ds);
|
|
158
|
+
}
|
|
159
|
+
}
|
|
104
160
|
return {
|
|
105
161
|
session_id: typeof parsed.session_id === 'string' ? parsed.session_id : null,
|
|
106
162
|
started_at: typeof parsed.started_at === 'string' ? parsed.started_at : null,
|
|
107
163
|
metrics_port: typeof parsed.metrics_port === 'number' && Number.isInteger(parsed.metrics_port)
|
|
108
164
|
? parsed.metrics_port
|
|
109
165
|
: null,
|
|
166
|
+
downstreams,
|
|
110
167
|
};
|
|
111
168
|
}
|
|
112
169
|
catch {
|
|
113
|
-
return { session_id: null, started_at: null, metrics_port: null };
|
|
170
|
+
return { session_id: null, started_at: null, metrics_port: null, downstreams: null };
|
|
114
171
|
}
|
|
115
172
|
}
|
|
116
173
|
function probeServe(baseDir) {
|
|
@@ -124,6 +181,7 @@ function probeServe(baseDir) {
|
|
|
124
181
|
session_id: null,
|
|
125
182
|
started_at: null,
|
|
126
183
|
metrics_port: null,
|
|
184
|
+
downstreams: null,
|
|
127
185
|
};
|
|
128
186
|
}
|
|
129
187
|
const alive = isProcessAlive(pid);
|
|
@@ -135,6 +193,7 @@ function probeServe(baseDir) {
|
|
|
135
193
|
session_id: state.session_id,
|
|
136
194
|
started_at: state.started_at,
|
|
137
195
|
metrics_port: state.metrics_port,
|
|
196
|
+
downstreams: state.downstreams,
|
|
138
197
|
};
|
|
139
198
|
}
|
|
140
199
|
/**
|
|
@@ -356,6 +415,46 @@ function printPretty(payload) {
|
|
|
356
415
|
}
|
|
357
416
|
}
|
|
358
417
|
console.log('');
|
|
418
|
+
// 0.9.0 — per-downstream block. Only shown when the serve process is
|
|
419
|
+
// believed to be running AND the state file carried the new array. An
|
|
420
|
+
// older gateway version that predates the publisher leaves `downstreams`
|
|
421
|
+
// null; we print an explanatory hint instead of rendering an empty
|
|
422
|
+
// table that looks like "zero downstreams".
|
|
423
|
+
if (s.running) {
|
|
424
|
+
console.log(' Downstreams');
|
|
425
|
+
if (s.downstreams === null) {
|
|
426
|
+
console.log(` (state file has no downstream block — upgrade gateway to ≥0.9.0)`);
|
|
427
|
+
}
|
|
428
|
+
else if (s.downstreams.length === 0) {
|
|
429
|
+
console.log(` (no downstream servers declared in .rea/registry.yaml)`);
|
|
430
|
+
}
|
|
431
|
+
else {
|
|
432
|
+
for (const d of s.downstreams) {
|
|
433
|
+
const name = sanitizeForTerminal(d.name);
|
|
434
|
+
const lastErr = safePretty(d.last_error);
|
|
435
|
+
const retryAt = safePretty(d.retry_at);
|
|
436
|
+
const healthToken = d.healthy ? (d.connected ? 'healthy' : 'connecting') : 'UNHEALTHY';
|
|
437
|
+
const circuit = d.circuit_state.toUpperCase();
|
|
438
|
+
console.log(` ${name}`);
|
|
439
|
+
console.log(` Health: ${healthToken}`);
|
|
440
|
+
console.log(` Circuit: ${circuit}`);
|
|
441
|
+
if (retryAt !== null && d.circuit_state === 'open') {
|
|
442
|
+
console.log(` Retry at: ${retryAt}`);
|
|
443
|
+
}
|
|
444
|
+
if (d.tools_count !== null) {
|
|
445
|
+
console.log(` Tools advertised: ${d.tools_count}`);
|
|
446
|
+
}
|
|
447
|
+
if (d.open_transitions > 0) {
|
|
448
|
+
const blockerSuffix = d.session_blocker_emitted ? ' (SESSION_BLOCKER fired)' : '';
|
|
449
|
+
console.log(` Open transitions: ${d.open_transitions}${blockerSuffix}`);
|
|
450
|
+
}
|
|
451
|
+
if (lastErr !== null) {
|
|
452
|
+
console.log(` Last error: ${lastErr}`);
|
|
453
|
+
}
|
|
454
|
+
}
|
|
455
|
+
}
|
|
456
|
+
console.log('');
|
|
457
|
+
}
|
|
359
458
|
console.log(' Audit log');
|
|
360
459
|
if (!a.present) {
|
|
361
460
|
console.log(` State: not yet written`);
|