npm - instar - Versions diffs - 1.3.565 → 1.3.567 - Mend

instar 1.3.565 → 1.3.567

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (55)

package/dist/config/ConfigDefaults.d.ts.map +1 -1
package/dist/config/ConfigDefaults.js +16 -1
package/dist/config/ConfigDefaults.js.map +1 -1
package/dist/core/BitwardenProvider.d.ts +8 -0
package/dist/core/BitwardenProvider.d.ts.map +1 -1
package/dist/core/BitwardenProvider.js +10 -0
package/dist/core/BitwardenProvider.js.map +1 -1
package/dist/core/PostUpdateMigrator.d.ts +13 -0
package/dist/core/PostUpdateMigrator.d.ts.map +1 -1
package/dist/core/PostUpdateMigrator.js +55 -0
package/dist/core/PostUpdateMigrator.js.map +1 -1
package/dist/core/devGatedFeatures.d.ts.map +1 -1
package/dist/core/devGatedFeatures.js +12 -0
package/dist/core/devGatedFeatures.js.map +1 -1
package/dist/core/types.d.ts +42 -0
package/dist/core/types.d.ts.map +1 -1
package/dist/core/types.js.map +1 -1
package/dist/lifeline/ServerSupervisor.d.ts +9 -0
package/dist/lifeline/ServerSupervisor.d.ts.map +1 -1
package/dist/lifeline/ServerSupervisor.js +174 -84
package/dist/lifeline/ServerSupervisor.js.map +1 -1
package/dist/monitoring/BlockerLedger.d.ts +43 -2
package/dist/monitoring/BlockerLedger.d.ts.map +1 -1
package/dist/monitoring/BlockerLedger.js +90 -5
package/dist/monitoring/BlockerLedger.js.map +1 -1
package/dist/monitoring/DurableVaultSession.d.ts +91 -0
package/dist/monitoring/DurableVaultSession.d.ts.map +1 -0
package/dist/monitoring/DurableVaultSession.js +145 -0
package/dist/monitoring/DurableVaultSession.js.map +1 -0
package/dist/monitoring/SelfUnblockChecklist.d.ts +281 -0
package/dist/monitoring/SelfUnblockChecklist.d.ts.map +1 -0
package/dist/monitoring/SelfUnblockChecklist.js +433 -0
package/dist/monitoring/SelfUnblockChecklist.js.map +1 -0
package/dist/monitoring/SelfUnblockProbeProviders.d.ts +116 -0
package/dist/monitoring/SelfUnblockProbeProviders.d.ts.map +1 -0
package/dist/monitoring/SelfUnblockProbeProviders.js +286 -0
package/dist/monitoring/SelfUnblockProbeProviders.js.map +1 -0
package/dist/scaffold/templates.d.ts.map +1 -1
package/dist/scaffold/templates.js +8 -0
package/dist/scaffold/templates.js.map +1 -1
package/dist/server/AgentServer.d.ts +16 -0
package/dist/server/AgentServer.d.ts.map +1 -1
package/dist/server/AgentServer.js +106 -0
package/dist/server/AgentServer.js.map +1 -1
package/dist/server/routes.d.ts +10 -0
package/dist/server/routes.d.ts.map +1 -1
package/dist/server/routes.js +117 -0
package/dist/server/routes.js.map +1 -1
package/package.json +1 -1
package/src/data/builtin-manifest.json +64 -64
package/src/scaffold/templates.ts +8 -0
package/upgrades/1.3.566.md +104 -0
package/upgrades/1.3.567.md +39 -0
package/upgrades/side-effects/self-unblock-before-escalating.md +258 -0
package/upgrades/side-effects/supervisor-respawn-guarantee.md +61 -0

package/upgrades/1.3.566.md ADDED

@@ -0,0 +1,104 @@
+# Upgrade Guide — vNEXT
+<!-- assembled-by: assemble-next-md -->
+<!-- bump: patch -->
+## What Changed
+- **A new constitutional standard, "Self-Unblock Before Escalating"** (`docs/STANDARDS-REGISTRY.md`):
+  a blocker is the agent's problem to solve FIRST. The agent must exhaust every unblock path
+  *within its permissions or organizationally-granted scope* before requiring anything from a
+  human — and when a human IS required, ask for the lowest rung on the human-requirement ladder
+  (**nothing → an approval → an operator-only credential**), named exactly. Origin: operator
+  directive (Justin, topic 12476, 2026-06-13) after the agent idled on a `feedback.instar.sh` DNS
+  record that was already self-unblockable via a Cloudflare token sitting in the org vault.
+- **It EXTENDS the existing `BlockerLedger` gate — it does NOT fork a parallel one.** Round-1
+  convergence (and two independent external reviewers) found `BlockerLedger.settleTrueBlocker()`
+  already mandates a recorded failed self-unblock attempt before a credential/account blocker can
+  settle, already HARD-rejects `missing_failed_attempt`, and already routes the settle JUDGMENT
+  through the Tier-1 `SettleAuthority` (B17) LLM gate. So this standard adds only the things the
+  ledger did NOT have and reuses its pipeline / taxonomy / log / untrusted-data envelope for
+  everything else.
+- **`SelfUnblockChecklist` (the one substantial new module, `src/monitoring/SelfUnblockChecklist.ts`):**
+  a deterministic, code-driven, ORDERED probe list (own per-agent vault → org Bitwarden → authed
+  cloud accounts → MCP → browser → "do I already control a resource that does this?") that
+  systematically PRODUCES the failed-attempt evidence the ledger requires. `holdsRelevantCred` is
+  decided DETERMINISTICALLY by a `service:scope` tag match (e.g. `cloudflare:dawn-tunnel.dev`, with
+  domain-hierarchy / wildcard rules), never by an LLM; ambiguous or MISSING metadata fails CLOSED
+  (a stale cred is simply not surfaced, never mis-applied). Each probe is timeout-bounded by class
+  and fails toward `reachable:false`, so one hung probe degrades to "unreachable" rather than
+  stalling the path.
+- **Anti-gaming is mechanical (the one edit to BlockerLedger's logic):** the checklist RUNNER
+  persists each run keyed by an immutable run id, and `settleTrueBlocker` now takes a `runId`
+  REFERENCE that the ledger LOADS + verifies — replacing the old caller-supplied `failedAttempt`
+  object. A hand-supplied list with no genuine persisted run is treated as NO attempt, so the
+  "self-asserted / gameable list" hole is closed by construction. Everything else in the ledger is
+  reuse.
+- **`DurableVaultSession` (`src/monitoring/DurableVaultSession.ts`):** flag-gated wiring to the
+  existing in-memory `BitwardenProvider` session so the org-vault probe can actually reach the
+  vault. The session value lives in process memory / keychain only, is held warm ONLY while a
+  checklist run is in flight (TTL + idle-expiry), is NEVER logged, NEVER passed as argv (handed to
+  `bw` only via the child `BW_SESSION` env), and NEVER placed on the cross-machine
+  `multiMachine.secretSync` path. The master password stays operator-held; no new on-disk secret is
+  introduced.
+- **Rung FLOOR (capability ≠ authority):** an action that is irreversible, cost-bearing above a
+  threshold, out-of-scope, or policy-sensitive keeps a MINIMUM rung of 1 (an approval) EVEN IF a
+  self-unblock credential exists. A rung-1 approval must resolve against a VERIFIED principal (Know
+  Your Principal), never a name seen in content.
+- **Read-only HTTP surface:** the existing `/blockers/*` read surface is extended with the
+  checklist's per-probe results + the rung — `GET /blockers/self-unblock-runs`. Bearer-gated, NOT on
+  the auth-exempt allowlist, 503-when-dark AFTER auth (an unauthenticated caller gets 401, not a 503
+  that confirms the route exists), `Cache-Control: no-store` (the body is credential-reachability
+  reconnaissance), served through the `<blocker-ledger-data>` envelope.
+- **Ships DARK behind the developmentAgent gate** — the `selfUnblockChecklist` and
+  `durableVaultSession` blocks nest under the existing `monitoring.blockerLedger.*` gate and OMIT
+  the `enabled` literal, so `resolveDevAgentGate` decides: LIVE on a development agent, DARK on the
+  fleet. Reversible — disabling makes the checklist not run, the route 503, the session not kept
+  warm; all inert.
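The deterministic `service:scope` relevance match described in the list above can be illustrated with a small sketch. The notes only say that matching uses "domain-hierarchy / wildcard rules" and fails closed on ambiguous or missing metadata; the concrete rules below (a bare scope covers itself and its subdomains, a `*.` scope covers subdomains only) and every name except `holdsRelevantCred` are assumptions for illustration, not the package's implementation.

```typescript
// Hypothetical sketch: the real matching rules are not shown in the upgrade notes.
interface CredentialMeta {
  // e.g. "cloudflare:dawn-tunnel.dev"; may be absent on older credentials.
  tag?: string;
}

// Split "service:scope"; return null for anything malformed.
function parseTag(tag: string | undefined): { service: string; scope: string } | null {
  if (!tag) return null;
  const idx = tag.indexOf(":");
  if (idx <= 0 || idx === tag.length - 1) return null; // missing service or scope
  return {
    service: tag.slice(0, idx).toLowerCase(),
    scope: tag.slice(idx + 1).toLowerCase(),
  };
}

// Assumed rule: "example.dev" covers itself and subdomains; "*.example.dev" covers subdomains only.
function scopeCovers(credScope: string, needed: string): boolean {
  if (credScope.startsWith("*.")) {
    const base = credScope.slice(2);
    return base.length > 0 && needed.endsWith("." + base);
  }
  return needed === credScope || needed.endsWith("." + credScope);
}

// Pure string comparison, no model judgment; missing or malformed metadata fails closed.
function holdsRelevantCred(cred: CredentialMeta, neededService: string, neededScope: string): boolean {
  const parsed = parseTag(cred.tag);
  if (parsed === null) return false;
  if (parsed.service !== neededService.toLowerCase()) return false;
  return scopeCovers(parsed.scope, neededScope.toLowerCase());
}
```

Matching on a leading dot (rather than a plain suffix) keeps a look-alike such as `evil-dawn-tunnel.dev` from matching a credential scoped to `dawn-tunnel.dev`, which fits the stated goal of never mis-applying a credential.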
+## What to Tell Your User
+Nothing changes for fleet agents — this ships off for everyone except development agents. On a
+development agent, when the agent hits a blocker (a missing credential, a locked account), it now
+systematically checks every place it's already allowed to look — its own vault, the organization's
+password vault, the cloud accounts it's already signed into — before it ever asks you for anything.
+Only after that genuinely comes up empty will it ask, and it asks for the smallest thing possible:
+ideally nothing, then a yes/no approval, and a credential only you can produce as the last resort. It
+can never use a credential to act outside the job at hand, and anything irreversible or costly still
+asks for your approval first even if it could technically do it itself. There is no user-facing action
+and no behavior change on a normal install.
+## Summary of New Capabilities
+- On a development agent, the agent now runs a deterministic self-unblock checklist before it can
+  settle any credential/account blocker as genuinely operator-required — turning "exhaust every path
+  you can already reach first" from a prompt-level wish into a structural precondition fed into the
+  existing Tier-1 blocker-settle authority.
+## Evidence
+- `tests/unit/BlockerLedgerSelfUnblock.test.ts` (7): checklist probe ordering + short-circuit +
+  per-probe timeout→`reachable:false`; the run-id provenance contract (a caller-embedded attempt
+  with no persisted run is HARD-rejected as `missing_failed_attempt`); the ladder/rung-floor mapping
+  (irreversible/cost-bearing → min rung 1 even with a cred); the rung-1 verified-principal
+  resolution.
+- `tests/unit/DurableVaultSession.test.ts` (8): TTL / idle-expiry, in-flight-only hold, and a wiring
+  assertion that the session value never appears in the ledger or argv.
+- `tests/integration/self-unblock-routes.test.ts` (5): the `/blockers/self-unblock-runs` read route —
+  200 when enabled, 503 (after auth) when dark; a real persisted run feeds the settle path; the
+  negative anti-gaming assertion (no persisted run → HARD reject).
+- `tests/e2e/self-unblock-lifecycle.test.ts` (6): production init path — route ALIVE (200) when
+  enabled, 503 when dark; a self-unblock action touching an external account is STILL evaluated by
+  the external-operation + mandate gates (proves the boundary mechanically, not in prose).
+- `tests/unit/lint-dev-agent-dark-gate.test.ts` + `tests/unit/feature-delivery-completeness.test.ts`:
+  the two new flags are registered in `DEV_GATED_FEATURES` with justifications and the attribution
+  map recomputed.
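The probe behaviors the first test file pins (ordering, short-circuit, and per-probe timeout degrading to `reachable:false`) might look roughly like the sketch below. The types, the function names, and the choice to short-circuit on the first reachable source are assumptions; the notes do not state the short-circuit condition.

```typescript
// Hypothetical sketch of an ordered, timeout-bounded probe runner.
interface ProbeResult {
  source: string;
  reachable: boolean;
  detail: string;
}

interface Probe {
  source: string;
  timeoutMs: number; // bounded per probe class in the notes; values here are arbitrary
  run: () => Promise<ProbeResult>;
}

// Race the probe against a timer; a hang or a throw both degrade to reachable:false.
async function runProbe(probe: Probe): Promise<ProbeResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<ProbeResult>((resolve) => {
    timer = setTimeout(
      () => resolve({ source: probe.source, reachable: false, detail: "timeout" }),
      probe.timeoutMs,
    );
  });
  try {
    return await Promise.race([probe.run(), timeout]);
  } catch {
    return { source: probe.source, reachable: false, detail: "error" };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Run probes strictly in list order; stop at the first reachable source (assumed rule).
async function runChecklist(probes: Probe[]): Promise<ProbeResult[]> {
  const results: ProbeResult[] = [];
  for (const probe of probes) {
    const result = await runProbe(probe);
    results.push(result);
    if (result.reachable) break;
  }
  return results;
}
```

Because the timer resolves rather than rejects, a hung probe produces ordinary "unreachable" evidence and the loop simply moves on to the next source.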
+## Migration
+- `PostUpdateMigrator.migrateConfigSelfUnblockChecklistDevGate` strips a default-shaped `false` for
+  the `selfUnblockChecklist` / `durableVaultSession` blocks (nested under
+  `monitoring.blockerLedger.*`) on update, so the gate resolves on already-deployed dev agents. An
+  operator-set value is left entirely alone — reach is not authority. Idempotent. The CLAUDE.md
+  self-unblock reflex section is delivered via the content-sniffed `migrateClaudeMd` path +
+  `generateClaudeMd()` (Agent Awareness).
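A minimal sketch of what an idempotent "strip a default-shaped `false`" migration could look like follows. The notes do not define "default-shaped"; this sketch assumes it means a block whose only key is `enabled: false`, and treats any block with additional keys as operator-set. The function name and that heuristic are hypothetical.

```typescript
// Hypothetical sketch; the real migrator's detection of "default-shaped" is not shown.
type Json = Record<string, unknown>;

function isPlainObject(v: unknown): v is Json {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

const GATED_BLOCKS = ["selfUnblockChecklist", "durableVaultSession"] as const;

// Returns true when the config was modified. Safe to run repeatedly.
function stripDefaultShapedFalse(config: Json): boolean {
  const monitoring = config["monitoring"];
  if (!isPlainObject(monitoring)) return false;
  const ledger = monitoring["blockerLedger"];
  if (!isPlainObject(ledger)) return false;

  let changed = false;
  for (const key of GATED_BLOCKS) {
    const block = ledger[key];
    if (!isPlainObject(block)) continue;
    // Assumed heuristic: exactly { enabled: false } is a stale default, anything else is operator-set.
    if (Object.keys(block).length === 1 && block["enabled"] === false) {
      delete block["enabled"]; // with no literal, the dev-agent gate decides
      changed = true;
    }
  }
  return changed;
}
```

A second pass finds an empty block and changes nothing, which is the idempotence property the migration note calls out.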

package/upgrades/1.3.567.md ADDED

@@ -0,0 +1,39 @@
+# Upgrade Guide — vNEXT
+<!-- assembled-by: assemble-next-md -->
+<!-- bump: patch -->
+## What Changed
+Hardened the in-process **server supervisor** (`src/lifeline/ServerSupervisor.ts`) so a genuinely-dead server is always respawned, even under sustained CPU starvation. This closes a real ~2h outage class (2026-06-14): on a heavily-loaded box the supervisor's 10s health loop stalls for minutes, and it misread every large inter-tick gap as a machine sleep/wake — resetting `spawnedAt = now`, which re-armed the startup-grace window where health failures (including the unambiguous signal that the server's tmux session no longer exists) are deliberately ignored. The server stayed dead until a human messaged.
+Three layered fixes, all in the health-check tick:
+- **Fix A (load-bearing) — missing-session override.** Before honoring any startup-grace early-return, the tick now probes `isServerSessionAlive()`. A missing server tmux session is unambiguous death, not a boot, and is respawned on the very next tick regardless of any `spawnedAt` reset — routed through the existing `handleUnhealthy()` so the circuit breaker still bounds a genuine crash-loop. A normally-booting server has a live tmux session (created synchronously at spawn; HTTP binds later), so this never fights a real boot.
+- **Fix B — load-aware gap detection.** A large inter-tick gap is only treated as sleep/wake when the box is NOT CPU-starved. Under starvation (`loadRatio > 1.5`, the same signal the CPU-starvation defer already uses) the gap is classified as a stalled event loop: failure counters reset, but the startup-grace window is NOT re-armed. The same guard is applied to the `SleepWakeDetector` wake handler. A real low-load suspend still re-arms grace exactly as before.
+- **Fix C — absolute grace ceiling.** A new `firstSpawnedAt` anchor (never reset by sleep/wake handling) caps cumulative startup grace at `startupGraceMs × 3`. A server whose session is alive but has never answered `/health` past the ceiling is hung, not booting, and its failures are acted on normally.
+The inline `setInterval` callback was extracted verbatim into `runHealthTick()` so a single tick is unit-testable and a wiring-integrity test can assert the loop probes session liveness on every tick.
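To make the interaction of Fixes A, B, and C concrete, here is a rough sketch of a single tick as a pure decision function. It is not the package's `runHealthTick()`: the state shape, the gap threshold, the outcome names, and the ordering of the checks are assumptions, and the failure-counter reset mentioned under Fix B is omitted for brevity.

```typescript
// Hypothetical sketch; only the three rules come from the upgrade notes.
interface SupervisorState {
  spawnedAt: number;             // re-armed by genuine sleep/wake handling
  firstSpawnedAt: number | null; // never reset by sleep/wake handling (Fix C anchor)
  lastTickAt: number;
}

interface TickInput {
  now: number;
  sessionAlive: boolean; // stand-in for isServerSessionAlive()
  healthy: boolean;      // stand-in for the /health probe result
  loadRatio: number;     // the supervisor's CPU-starvation signal
}

interface TickConfig {
  startupGraceMs: number;
  sleepGapMs: number;         // assumed threshold for a "large" inter-tick gap
  starvedLoadRatio: number;   // 1.5 in the notes
  graceCeilingFactor: number; // 3 in the notes
}

type TickOutcome = "healthy" | "in-grace" | "handle-unhealthy";

function sketchHealthTick(state: SupervisorState, input: TickInput, cfg: TickConfig): TickOutcome {
  const gap = input.now - state.lastTickAt;
  state.lastTickAt = input.now;

  // Fix B: a large gap re-arms grace only when the box is not CPU-starved.
  const starved = input.loadRatio > cfg.starvedLoadRatio;
  if (gap > cfg.sleepGapMs && !starved) {
    state.spawnedAt = input.now;
  }

  // Fix A: a missing session is acted on before any startup-grace early return.
  if (!input.sessionAlive) return "handle-unhealthy";

  if (input.healthy) {
    state.firstSpawnedAt = null; // cleared once the server answers /health
    return "healthy";
  }

  // Fix C: cumulative grace is capped relative to the never-reset anchor.
  const anchor = state.firstSpawnedAt ?? state.spawnedAt;
  const withinGrace = input.now - state.spawnedAt < cfg.startupGraceMs;
  const underCeiling = input.now - anchor < cfg.startupGraceMs * cfg.graceCeilingFactor;
  return withinGrace && underCeiling ? "in-grace" : "handle-unhealthy";
}
```

Keeping the tick free of timers and I/O is what lets a test drive the 2026-06-14 scenario directly: a large gap, a high load ratio, and a dead session in one call.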
+audience: agent-only
+maturity: stable
+Net #1 (a subsystem uncaught exception crashing the whole process) and net #3 (the launchd fleet watchdog) are tracked follow-ups in the spec §6. Net #3's live root cause was additionally found and fixed in production this session (the `ai.instar.watchdog` launchd job was loaded from a reaped temp-dir plist → exit 127); the durable source fix is tracked in FU-2. <!-- tracked: CMT-1540 -->
+## What to Tell Your User
+Nothing to announce proactively. If asked about server reliability: when my server process genuinely dies, the supervisor now respawns it within one health tick (~10s) instead of being fooled by CPU load into thinking the machine went to sleep and waiting indefinitely. The recovery decision is now grounded in an objective fact — does the server process actually exist? — rather than a sleep/wake guess that could be wrong on a busy machine. Normal slow boots are still given the full startup grace, so this never restarts a server mid-boot.
+## Summary of New Capabilities
+No new user-facing capability — a reliability hardening of the existing crash-recovery supervisor.
+| Change | Effect |
+|--------|--------|
+| Missing-session override (Fix A) | A dead server tmux session is respawned on the next ~10s tick, even during startup grace |
+| Load-aware gap detection (Fix B) | A CPU-starvation event-loop stall is no longer misread as sleep/wake; grace is not falsely re-armed |
+| Absolute grace ceiling (Fix C) | Repeated `spawnedAt` resets can no longer suppress recovery beyond 3× the grace window |
+## Evidence
+Reliability fix; pinned by `tests/unit/server-supervisor-respawn-guarantee.test.ts` (10) driving the real extracted `runHealthTick()` and `SleepWakeDetector` wake handler: missing-session-during-grace → respawn (the exact 2026-06-14 trap), missing-session-during-false-wake → respawn, starved-gap → `spawnedAt` not reset, low-load-gap → re-armed, grace-ceiling broken → failures acted on, in-grace booting server still protected (Fix A no boot regression), `firstSpawnedAt` cleared on healthy, and a wiring-integrity assertion that the tick probes `isServerSessionAlive()` every tick. The full existing supervisor/lifeline suite (63 tests across 8 files) still passes. `npx tsc --noEmit` clean.

package/upgrades/side-effects/self-unblock-before-escalating.md ADDED

@@ -0,0 +1,258 @@
+# Side-Effects Review — Self-Unblock Before Escalating (constitutional standard)
+**Version / slug:** `self-unblock-before-escalating`
+**Date:** `2026-06-14`
+**Author:** `echo`
+**Second-pass reviewer:** `general-purpose reviewer subagent (high-risk: touches a settle gate)`
+## Summary of the change
+Encodes the operator directive (Justin, topic 12476, 2026-06-13) "exhaust self-unblock within your
+permissions before requiring anything from a human" as a constitutional standard. The crucial design
+move — forced by round-1 convergence and two external reviewers — is that it **EXTENDS the existing
+`BlockerLedger.settleTrueBlocker()` gate rather than forking a parallel one**. The ledger ALREADY
+mandates a recorded failed self-unblock attempt before a credential/account blocker can settle as a
+true-blocker, already HARD-rejects `missing_failed_attempt`, and already routes the settle JUDGMENT
+through the Tier-1 `SettleAuthority` (B17) LLM gate. This change adds the four things the ledger did
+not have and reuses everything else. Files: `src/monitoring/SelfUnblockChecklist.ts` (new, the only
+substantial new code — a deterministic ordered probe list), `src/monitoring/DurableVaultSession.ts`
+(new, flag-gated org-vault session), `src/monitoring/BlockerLedger.ts` (+140: the ONE logic edit —
+`settleTrueBlocker` now takes a `runId` reference it LOADS + verifies instead of a caller-supplied
+`failedAttempt` object), `src/server/{routes,AgentServer}.ts` (read-only `GET
+/blockers/self-unblock-runs`), `src/core/{devGatedFeatures,types,ConfigDefaults}.ts` (dark dev-gate),
+`src/core/PostUpdateMigrator.ts` (migration parity), `src/scaffold/templates.ts` (Agent Awareness),
+`docs/STANDARDS-REGISTRY.md` (the standard), plus all 3 test tiers.
+**Producer wiring (added after the first second-pass review caught it unwired — see Second-pass below):**
+the runner/library + consumer gate above were wired, but NOTHING in production instantiated the
+checklist or `DurableVaultSession`, so enabling the feature on a dev agent would have made settling a
+credential-blocker IMPOSSIBLE (the gate demands a run that could not be produced). Closed by:
+`src/monitoring/SelfUnblockProbeProviders.ts` (new — a REAL provider for all 9 sources +
+`deriveBitwardenSession`), the AgentServer wiring (instantiates the production checklist +
+`DurableVaultSession` when each sub-gate is on), and `POST /blockers/self-unblock-run` (the dev-gated
+trigger that produces a verified run). During that wiring an independent review of the AgentServer
+`deriveSession` caught a real production bug: it read `process.env.BW_SESSION` after `bw.unlock()`, but
+`unlock()` stores the session in a PRIVATE field and never exports it to the env — so the org-Bitwarden
+probe (the motivating source) would have silently failed in production while passing the injected-fake
+tests. Fixed by adding `BitwardenProvider.getSessionKey()`, extracting the testable
+`deriveBitwardenSession` helper, and a guard test (`tests/unit/deriveBitwardenSession.test.ts`) that
+asserts the session comes from `getSessionKey()`, not the env.
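The env-versus-`getSessionKey()` bug described above is easy to reproduce in miniature. The sketch below uses a fake vault whose `unlock()` stores the key in a private field, then derives the session from the accessor and hands it to a child invocation through the environment only. The interface, the synchronous signatures, and the helper names are assumptions; only `getSessionKey()`, `BW_SESSION`, and the env-not-argv rule come from the review.

```typescript
// Hypothetical sketch; the real BitwardenProvider API is likely async and richer.
interface VaultSessionSource {
  unlock(): boolean;
  getSessionKey(): string | null;
}

// Fake vault: like the bug report describes, unlock() never exports the key to the env.
class FakeVault implements VaultSessionSource {
  private sessionKey: string | null = null;
  private readonly issuedKey: string;
  constructor(issuedKey: string) {
    this.issuedKey = issuedKey;
  }
  unlock(): boolean {
    this.sessionKey = this.issuedKey;
    return true;
  }
  getSessionKey(): string | null {
    return this.sessionKey;
  }
}

// Null-safe: a missing provider, a failed unlock, or an empty key all yield null.
function deriveSessionSketch(bw: VaultSessionSource | null | undefined): string | null {
  if (bw == null) return null;
  if (!bw.unlock()) return null;
  const key = bw.getSessionKey(); // read the accessor, never process.env
  return key !== null && key.length > 0 ? key : null;
}

// The session rides only in the child's environment, never in its argument vector.
function buildBwInvocation(
  session: string,
  baseEnv: Record<string, string | undefined>,
): { argv: string[]; env: Record<string, string | undefined> } {
  return { argv: ["list", "items"], env: { ...baseEnv, BW_SESSION: session } };
}
```

A guard test in this style can use a sentinel key and assert that it appears in `env.BW_SESSION` and nowhere in `argv`, which mirrors the non-leak assertion the review describes.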
+## Decision-point inventory
+- `BlockerLedger.settleTrueBlocker` evidence intake — **modify** — input contract changes from a
+  caller-supplied `failedAttempt` object to a `runId` reference the ledger loads + verifies against
+  the persisted checklist run. This is the only edit to the gate's logic; the settle AUTHORITY (B17
+  Tier-1) is unchanged.
+- `SelfUnblockChecklist` — **add** — a deterministic signal-PRODUCER. It holds NO blocking authority;
+  it records probe results + the rung and produces the evidence the existing gate consumes.
+- Rung-floor mapping — **add** — enforces a minimum rung of 1 (approval) for irreversible /
+  cost-bearing / out-of-scope / policy-sensitive actions even when a self-unblock cred exists; maps
+  onto the existing `AuthorityCheckEvidence` (no new field).
+- `GET /blockers/self-unblock-runs` — **add** — read-only observability over the run store.
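The rung-floor rule in the inventory above reduces to a small deterministic function. In this sketch the numeric encoding of the ladder (0 = nothing, 1 = an approval, 2 = an operator-only credential) follows the ladder order in the upgrade guide, while the trait names and the function itself are hypothetical.

```typescript
// Hypothetical sketch of "capability is not authority" as a minimum rung.
type Rung = 0 | 1 | 2; // nothing -> an approval -> an operator-only credential

interface ActionTraits {
  irreversible: boolean;
  costAboveThreshold: boolean;
  outOfScope: boolean;
  policySensitive: boolean;
}

// Raises the rung to at least 1 for sensitive actions; never lowers it.
function applyRungFloor(baseRung: Rung, traits: ActionTraits): Rung {
  const needsApproval =
    traits.irreversible ||
    traits.costAboveThreshold ||
    traits.outOfScope ||
    traits.policySensitive;
  return needsApproval && baseRung < 1 ? 1 : baseRung;
}
```

The floor only ever raises a rung, so it cannot turn an operator-only requirement into something weaker.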
+## 1. Over-block
+**What legitimate inputs does this change reject that it shouldn't?**
+The checklist itself is not a block/allow surface — it produces evidence. The settle GATE it feeds
+could now reject a legitimate true-blocker if a caller tries to settle WITHOUT a persisted checklist
+run (it derives the failed attempt only from a verified run id). That is the intended anti-gaming
+direction — a blocker may not settle as "operator-required" without real evidence — and it fails
+toward safety (don't let a blocker masquerade as operator-required). The one genuine over-block risk:
+if the checklist RUNNER cannot persist a run at all (disk failure), settle is blocked. This degrades
+toward "keep trying / surface honestly," not toward a false operator-blocker, which is the correct
+direction. A checklist that completes with every probe `reachable:false` is a VALID run (it produces
+"nothing reachable" evidence) and satisfies the gate — so a genuinely-blocked agent is not stuck.
+## 2. Under-block
+**What failure modes does this still miss?**
+The deterministic relevance match (`holdsRelevantCred`) can MISS a credential that IS relevant but is
+under-tagged or mis-tagged — it fails CLOSED (`holdsRelevantCred:false`), so the cred is not surfaced
+and the agent escalates to the human when it could in principle have self-unblocked. This is an
+"under-self-unblock" (the agent asks the human slightly more than strictly necessary) — the SAFE
+direction for this standard's primary security invariant: it never mis-applies a credential, it only
+occasionally fails to find one. The fix path is better credential tagging, never looser matching. The
+checklist is also only as complete as its probe list (vault/Bitwarden/Vercel/Cloudflare/GitHub/MCP/
+browser); an account type with no probe is simply not auto-discovered — data-extensible, documented.
+## 3. Level-of-abstraction fit
+**Is this at the right layer?**
+Yes — and this is exactly what the adversarial review corrected. The checklist is a low-level,
+deterministic DETECTOR that FEEDS the existing high-level Tier-1 `BlockerLedger` settle AUTHORITY. The
+first design drafted a weaker parallel gate; round-1 convergence (integration + lessons-aware
+reviewers, independently) caught it. The final design adds NO new gate, ledger, log, or
+`evaluateSelfUnblock` authority — it reuses BlockerLedger's pipeline/taxonomy/log/envelope and changes
+only the evidence intake. The deterministic relevance match is deliberately kept OUT of LLM judgment
+(the most failure-prone hop), consistent with the signal/authority split.
+## 4. Signal vs authority compliance
+**Required reference:** [docs/signal-vs-authority.md](../../docs/signal-vs-authority.md)
+**Does this change hold blocking authority with brittle logic?**
+- [x] **No — this change produces a signal consumed by an existing smart gate.**
+The `SelfUnblockChecklist` (deterministic, brittle-by-nature tag matching, code-only) holds NO
+blocking authority. The ONE judgment — whether a blocker may settle as a true-blocker — remains
+BlockerLedger's existing Tier-1 `SettleAuthority` (B17) LLM gate. The change makes that gate STRICTER
+(it now derives the failed attempt from a verified persisted run rather than a caller-asserted object),
+never adds a brittle authority. The rung-floor is a deterministic MINIMUM raised on top of the existing
+authority, not a new allow/deny owner. Fully compliant.
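The run-id provenance contract (the settle path loads evidence by id instead of trusting a caller-supplied object) can be sketched with an in-memory store. The store class, the counter-based ids, and `loadSettleEvidence` are illustrative assumptions; the sketch covers only the evidence intake and does not model the Tier-1 `SettleAuthority` judgment that follows it.

```typescript
// Hypothetical sketch: evidence is looked up by run id, never accepted inline.
interface ProbeOutcome {
  source: string;
  reachable: boolean;
}

interface PersistedRun {
  readonly runId: string;
  readonly probes: readonly ProbeOutcome[];
}

class InMemoryRunStore {
  private readonly runs = new Map<string, PersistedRun>();
  private nextId = 1;

  // Only the checklist runner is meant to call this; the id is assigned here, not by the caller.
  persist(probes: ProbeOutcome[]): string {
    const runId = `run-${this.nextId++}`;
    this.runs.set(runId, Object.freeze({ runId, probes: Object.freeze([...probes]) }));
    return runId;
  }

  load(runId: string): PersistedRun | undefined {
    return this.runs.get(runId);
  }
}

type SettleEvidence =
  | { ok: true; run: PersistedRun }
  | { ok: false; reason: "missing_failed_attempt" };

// An id with no persisted run behind it counts as no attempt at all.
function loadSettleEvidence(store: InMemoryRunStore, runId: string): SettleEvidence {
  const run = store.load(runId);
  if (run === undefined) return { ok: false, reason: "missing_failed_attempt" };
  return { ok: true, run };
}
```

A run in which every probe came back `reachable: false` still loads successfully, which matches the review's point that "nothing reachable" is valid evidence rather than a dead end.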
+## 5. Interactions
+- **Shadowing:** the new `GET /blockers/self-unblock-runs` route is registered BEFORE `GET
+  /blockers/:id` so the literal path is not swallowed by the param route (verified in the diff and the
+  integration test). No allow/deny shadowing — there is no new gate.
+- **Double-fire:** no new gate is added, so there is no double-gating of a settle decision. The
+  checklist runs once per blocker-resolution attempt and persists one run.
+- **Races:** the run store is append-keyed by immutable run id; `settleTrueBlocker` reads by that id.
+  The bw session is held only while a run is in flight (TTL + idle-expiry), so concurrent runs each
+  hold their own warm window; no shared mutable settle state is introduced.
+- **Feedback loops:** none — the checklist's output feeds the ledger's settle path, which does not
+  feed back into the checklist's inputs.
+## 6. External surfaces
+- **Other agents / users / external systems:** the production probe providers
+  (`SelfUnblockProbeProviders.ts`) reach the agent's OWN sources only — its vault (names only), the org
+  Bitwarden vault (via `DurableVaultSession`, exit-code reachability), and authed cloud accounts
+  (Cloudflare zones via ONE bounded fetch; `vercel`/`gh` via ONE bounded CLI exec). All READ-ONLY,
+  one bounded call each, no writes, no new egress, no recursive scans (the 2026-06-13 load-spike
+  lesson). Each provider returns ONLY reachability + non-secret scope-tag strings — never a credential
+  value. Relevance is operator-declared + fail-closed (an undeclared source advertises nothing → never
+  surfaced), so the worst case is under-self-unblock (ask the human slightly more), never mis-applying
+  a credential.
+- **Persistent state:** a new machine-local JSONL run store (per-probe results + rung). Inert
+  observability data; no schema other code depends on; safe to delete.
+- **Credential reach:** `DurableVaultSession` reaches the org Bitwarden vault via the existing
+  `BitwardenProvider`. This is the standard's main security tradeoff and is bounded: session value in
+  process memory ONLY, never logged, handed to `bw` ONLY via the child `BW_SESSION` env (never argv),
+  never on the secret-sync path, held only while a run is in flight, master password operator-held
+  (read from the EXISTING `bw-master-password` vault key — no new on-disk secret). The wiring-integrity
+  test `tests/unit/SelfUnblockSessionLeak.test.ts` asserts a sentinel session value rides ONLY the
+  `BW_SESSION` env and never appears in argv, the persisted run JSON, the decisions log, or the ledger
+  store.
+- **Operator surface:** two new API surfaces — the READ-ONLY `GET /blockers/self-unblock-runs`
+  (observability) and the dev-gated `POST /blockers/self-unblock-run` (the agent-facing trigger that
+  runs the checklist). Both are Bearer-gated, 503-after-auth when dark, `no-store`, and emit untrusted
+  probe `detail` through the `<blocker-ledger-data>` envelope — no secret in any response. Neither is an
+  operator dashboard ACTION (no approval page, grant/revoke, secret-drop form, or renderer) → §6b not
+  applicable.
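The response ordering described for both routes (401 for an unauthenticated caller, 503 only after auth when the feature is dark, `no-store` on the body) can be expressed as a framework-free handler. The request and response shapes, the dependency object, and the exact envelope wrapping are assumptions; a production handler would also want a constant-time token comparison, which this sketch omits.

```typescript
// Hypothetical sketch of auth-before-availability ordering for a read route.
interface HttpRequest {
  headers: Record<string, string | undefined>;
}

interface HttpResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

interface RouteDeps {
  expectedToken: string;
  featureEnabled: boolean;
  runsJson: string; // already-serialized run data, treated as untrusted
}

function handleSelfUnblockRuns(req: HttpRequest, deps: RouteDeps): HttpResponse {
  const headers = { "Cache-Control": "no-store" };

  // Auth first: an unauthenticated caller learns nothing about whether the feature exists.
  // Simplified comparison; real code should compare tokens in constant time.
  if (req.headers["authorization"] !== `Bearer ${deps.expectedToken}`) {
    return { status: 401, headers, body: "unauthorized" };
  }

  // Only an authenticated caller can observe that the feature is dark.
  if (!deps.featureEnabled) {
    return { status: 503, headers, body: "feature disabled" };
  }

  // Assumed envelope shape: untrusted data wrapped in the ledger's tag.
  return {
    status: 200,
    headers,
    body: `<blocker-ledger-data>${deps.runsJson}</blocker-ledger-data>`,
  };
}
```

Swapping the two checks would let an anonymous caller distinguish a dark agent (503) from a live one (401), which is the reconnaissance leak the notes are avoiding.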
+## 6b. Operator-surface quality (Operator-Surface Quality standard)
+**No operator surface — not applicable.** This change adds no `dashboard/*.js` / `dashboard/*.html`
+renderer, no approval page, and no grant/revoke/secret-drop form. The two new HTTP surfaces (`GET
+/blockers/self-unblock-runs`, read-only; `POST /blockers/self-unblock-run`, dev-gated) are not operator actions.
+## 7. Multi-machine posture (Cross-Machine Coherence)
+**Posture: machine-local BY DESIGN** — with a security reason, not an oversight.
+- **Credential reachability is inherently per-machine:** a credential reachable on machine A's authed
+  CLIs / keychain may not be reachable on machine B. The checklist probes THIS machine's reachable
+  sources; replicating "what I can reach" across machines would be incorrect and a reconnaissance leak.
+- **The `DurableVaultSession` is a security boundary that MUST NOT replicate:** it is explicitly kept
+  off the `multiMachine.secretSync` path (asserted in the spec + a wiring test). Machine-local is the
+  required posture, not a default.
+- **The run store is a per-machine audit trail** (like the reap-log / blocker-decisions log).
+- **User-facing notices:** none emitted by this change — any messaging is owned by the existing
+  ledger settle path (one-voice gating already applies there), so no new double-voice risk.
+- **Durable state on topic transfer:** the run store is NOT topic-keyed, so it does not strand on a
+  topic move.
+- **Generated URLs:** both routes are local API paths; neither generates a cross-machine link.
+If a pool-wide "what self-unblock runs happened across machines?" view is ever wanted, the correct
+shape is a proxied-on-read merged view (`?scope=pool`) over each machine's local store — explicitly
+NOT replication of the underlying credential-reach data. Noted as a possible future read-surface, not
+needed for this standard.
+## 8. Rollback cost
+Pure code change behind a dev-gate. Back-out options, cheapest first:
+- **Disable the flag** — set `monitoring.blockerLedger.selfUnblockChecklist.enabled:false` (and
+  `durableVaultSession`): the checklist stops running, the route 503s, the session is not kept warm.
+  Everything inert with no revert. This is the primary rollback.
+- **Hot-fix revert** — revert the PR and ship a patch. The one input-contract change to
+  `settleTrueBlocker` reverts with it; no caller depends on the runId path except the new checklist.
+- **Data migration:** none. The persisted runs are inert machine-local JSON; deleting the run-store
+  directory is sufficient and optional.
+- **Agent state repair:** none. Dark on the fleet, so no fleet agent sees a change; the dev agent
+  picks up the gate at next restart and drops it the moment the flag is disabled.
+- **User visibility:** none — no user-visible behavior on a normal install during any rollback window.
+## Conclusion
+The review produced one mechanical change to the spec (§11 reworded from "deferred decisions" to
+"scope boundary / explicit non-goal" — identical meaning, reworded so it does not trip the
+no-orphan-deferrals scan) AND, far more importantly, the second-pass review caught that the build was
+shipped HALF-WIRED: the consumer gate + run store + read route were wired, but the PRODUCER (the
+checklist runner + `DurableVaultSession` + a trigger) was not — so enabling it would have BLOCKED
+settling a credential-blocker on a dev agent. Completing the producer (within the approved spec §5)
+then surfaced a second real defect (the `deriveSession` env-vs-getSessionKey bug) that passed the
+injected-fake tests but would have killed the org-Bitwarden probe in production. Both are resolved and
+guarded by new tests. The standout property remains that the adversarial pass forced the design to
+EXTEND the existing BlockerLedger settle authority instead of forking a weaker parallel gate — zero new
+blocking authority, the one judgment stays on the Tier-1 gate, strictly HARDER to settle a false
+operator-blocker. It ships dark behind the developmentAgent gate, is reversible to fully inert via the
+flags, and is machine-local by a stated security reason. Clear to ship.
+## Second-pass review (if required)
+**Reviewer:** independent general-purpose reviewer subagent (high-risk: touches a settle gate)
+**Round 1 — Concern raised.** Confirmed the consumer half (signal-vs-authority, anti-gaming run-id
+verification, fail-closed relevance, route auth) is solid (A–D), but raised TWO blocking concerns:
+(1) the artifact over-claimed an "argv" non-leak test that did not exist; (2) `DurableVaultSession` and
+the checklist RUNNER were instantiated only in tests — the producer was unwired in production, so the
+settle gate would demand a run that could never be produced.
+**Resolution.** Both addressed by completing the producer within the approved spec: a real bounded
+fail-closed provider for all 9 sources (`SelfUnblockProbeProviders.ts`), production instantiation of
+the checklist + `DurableVaultSession` in AgentServer (each on its own dev-gate), the `POST
+/blockers/self-unblock-run` trigger, and the now-real argv non-leak test
+(`SelfUnblockSessionLeak.test.ts`). Completing it surfaced + fixed the `deriveSession`
+env-vs-`getSessionKey()` production bug (guarded by `deriveBitwardenSession.test.ts`).
+**Round 2 — Concur.** A focused independent re-review of the final producer code verified, with
+file:line evidence, all six checks: (1a) no provider returns/logs a secret value; (1b) each provider is
+one bounded call, no recursive scan; (1c) relevance is fail-closed; (2) `deriveBitwardenSession`
+returns `getSessionKey()` not the env and is null-safe; (3) the AgentServer block is dev-gated and
+introduces no new on-disk secret; (4) the trigger route is 503-after-auth, intent-gated, `no-store`,
+and leaks no secret. **Verdict: Concur.**
+## Evidence pointers
+- `tsc --noEmit` clean; `node scripts/lint-dev-agent-dark-gate.js` → `clean`.
+- Targeted vitest run (11 files, 215 tests green): consumer/library —
+  `tests/unit/BlockerLedgerSelfUnblock.test.ts`, `DurableVaultSession.test.ts`,
+  `SelfUnblockChecklist.test.ts`, `PostUpdateMigrator-selfUnblock.test.ts`; producer —
+  `tests/unit/SelfUnblockProbeProviders.test.ts`, `deriveBitwardenSession.test.ts`,
+  `SelfUnblockSessionLeak.test.ts` (the argv/ledger non-leak wiring test);
+  routes/E2E — `tests/integration/self-unblock-routes.test.ts` (incl. the production-path trigger →
+  settle and the negative anti-gaming assertion), `tests/e2e/self-unblock-lifecycle.test.ts`
+  ("feature is alive": 200 enabled / 503-after-auth dark); plus `lint-dev-agent-dark-gate.test.ts` +
+  `feature-delivery-completeness.test.ts` (dev-gate registry coherence).
+## Addendum — no-silent-fallbacks ratchet (post-CI follow-up)
+CI surfaced one deterministic failure after the initial commit: the
+`no-silent-fallbacks` ratchet counts error-swallowing `catch` blocks against a
+tracked baseline (474), and the two new catches in `SelfUnblockRunStore`
+(`loadRun` skipping a corrupt/partial trailing JSONL line; `listRuns` returning
+`[]` when no runs file exists yet) pushed the count from 473 to 475.
+Both are intentional, expected-condition silences — a partial trailing line is a
+normal crash-during-append artifact, and a missing runs file is the first-run
+condition — and they match a pattern already blessed two functions up in the same
+file. The correct fix is therefore the codebase's `@silent-fallback-ok` marker
+with justification on each, NOT raising the baseline or bolting on noisy
+degradation reports. Count is back to 473. No behavior change; pure annotation.
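As an illustration of the first of those two catches, the sketch below parses JSONL text and silently tolerates only a corrupt final line. The `parseRunsJsonl` name, the `RunRecord` shape, and the exact comment syntax of the `@silent-fallback-ok` marker are assumptions; the real `SelfUnblockRunStore` reads from disk and may treat non-trailing corruption differently.

```typescript
// Hypothetical record shape; the real run schema is not shown in the source.
interface RunRecord {
  runId: string;
  [key: string]: unknown;
}

// Parse JSONL text, one JSON object per non-empty line.
function parseRunsJsonl(text: string): RunRecord[] {
  const lines = text.split("\n").filter((line) => line.trim() !== "");
  const runs: RunRecord[] = [];
  lines.forEach((line, index) => {
    try {
      runs.push(JSON.parse(line) as RunRecord);
    } catch (err) {
      // @silent-fallback-ok: a partial trailing line is a normal
      // crash-during-append artifact, so it is skipped without a report.
      // (Marker syntax assumed.) Corruption anywhere else is rethrown,
      // which is a stricter choice made for this sketch.
      if (index !== lines.length - 1) throw err;
    }
  });
  return runs;
}
```

Restricting the silence to the last line keeps the fallback as narrow as its justification: an interrupted append can only damage the tail of the file.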

package/upgrades/side-effects/supervisor-respawn-guarantee.md ADDED

@@ -0,0 +1,61 @@
+# Side-Effects Review — Supervisor Respawn Guarantee (net #2)
+**Version / slug:** `supervisor-respawn-guarantee`
+**Date:** `2026-06-14`
+**Author:** `echo`
+**Second-pass reviewer:** `independent reviewer — Concur with the review (2026-06-14)`
+> Second-pass verdict: **Concur.** Verified no double-spawn (5s first backoff < 10s tick; `spawnServer` kills any lingering session; circuit breaker bounds a crash-loop), no boot regression (`spawnServer` creates the tmux session synchronously via `execFileSync` before the loop arms, so a booting server always has a live session), correct ordering after the slept short-circuit and planned-restart suppression, stale `spawnedAt` does not corrupt the `lastHealthy < spawnedAt` bind-failure tracker, and `firstSpawnedAt` is anchored/cleared on all healthy paths. One non-blocking edge raised — a long hard-sleep while `firstSpawnedAt` is anchored could make the wall-clock ceiling fire immediately on the post-wake boot — **hardened in response:** a genuine (low-load) suspend/wake now re-anchors `firstSpawnedAt = now` in both the gap-check and the `SleepWakeDetector` wake handler (a real wake is a fresh boot episode), with two added regression assertions.
+## Summary of the change
+`src/lifeline/ServerSupervisor.ts` — the in-process net that detects a dead server and respawns it. Three fixes, all in the 10s health-check loop, closing the 2026-06-14 ~2h outage where a CPU-starved box made the supervisor misread its own stalled event loop as a machine sleep/wake, reset `spawnedAt = now`, and pin itself in the startup-grace branch where health failures (including a vanished server tmux session) are ignored.
+- **Fix A (load-bearing):** at the top of each tick, before the startup-grace early-return, probe `isServerSessionAlive()`. A missing session is unambiguous death → call `handleUnhealthy()` immediately (subject to its existing circuit-breaker / restart-attempt accounting), regardless of any grace pin.
+- **Fix B:** make the gap-based sleep/wake detection (and the `SleepWakeDetector` `'wake'` handler) load-aware. A large inter-tick gap while `loadRatioProvider() > maxLoadRatio` (1.5) is classified as a stalled event loop, not a suspend — failure counters reset (safe) but `spawnedAt` is NOT reset (grace not re-armed). A low-load gap still re-arms grace (real-suspend behavior preserved).
+- **Fix C:** an absolute grace ceiling — `firstSpawnedAt` anchors the first spawn of the current not-yet-healthy episode (never reset by sleep/wake handlers); cumulative grace is capped at `startupGraceMs × 3`. Cleared on the first healthy tick.
+Refactor: the inline `setInterval` callback was extracted verbatim into `private async runHealthTick()` so a single tick can be unit-driven and the wiring-integrity test can assert the loop probes session liveness.
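The three fixes can be sketched as pure decision functions. This is a simplified model under stated assumptions, not the `ServerSupervisor` code: `decideTick`, `classifyGap`, `applyGap`, the `SupervisorState` shape, and the `gapThresholdMs` parameter are hypothetical names, and the planned-restart and slept-marker short-circuits are left out.

```typescript
type TickAction = "respawn" | "wait-in-grace" | "evaluate-health";
type GapKind = "none" | "stalled-loop" | "suspend";

// Hypothetical state shape; timestamps are ms since epoch.
interface SupervisorState {
  spawnedAt: number; // latest spawn or grace re-arm
  firstSpawnedAt: number | null; // anchor of the current not-yet-healthy episode
}

const GRACE_CEILING_MULTIPLIER = 3; // Fix C: cumulative grace cap

function decideTick(
  state: SupervisorState,
  sessionAlive: boolean,
  now: number,
  startupGraceMs: number,
): TickAction {
  // Fix A: a missing session is unambiguous death, checked before any grace logic.
  if (!sessionAlive) return "respawn";
  const inGrace = now - state.spawnedAt < startupGraceMs;
  // Fix C: grace also requires staying under the absolute ceiling.
  const anchor = state.firstSpawnedAt ?? state.spawnedAt;
  const underCeiling = now - anchor < startupGraceMs * GRACE_CEILING_MULTIPLIER;
  return inGrace && underCeiling ? "wait-in-grace" : "evaluate-health";
}

// Fix B: a large inter-tick gap under high load is a stalled loop, not a suspend.
function classifyGap(
  gapMs: number,
  gapThresholdMs: number, // hypothetical: what counts as a "large" gap
  loadRatio: number,
  maxLoadRatio = 1.5,
): GapKind {
  if (gapMs <= gapThresholdMs) return "none";
  return loadRatio > maxLoadRatio ? "stalled-loop" : "suspend";
}

// Only a genuine suspend re-arms grace; a stalled loop leaves timestamps alone.
// Re-anchoring firstSpawnedAt only when it is already set is an assumption.
function applyGap(state: SupervisorState, kind: GapKind, now: number): SupervisorState {
  if (kind !== "suspend") return state;
  return {
    spawnedAt: now,
    firstSpawnedAt: state.firstSpawnedAt === null ? null : now,
  };
}
```

In the outage scenario, `spawnedAt` had just been reset and the session was gone: with these functions `decideTick` returns `"respawn"` no matter how recent `spawnedAt` is, and a high-load gap would not have reset `spawnedAt` in the first place.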
+## Decision-point inventory
+- `ServerSupervisor.runHealthTick` missing-session override (Fix A) — **add** — respawn-vs-wait decision now grounded in "does the tmux session exist?" before grace.
+- `ServerSupervisor.runHealthTick` gap classification (Fix B) — **modify** — sleep/wake-vs-stalled-loop decision now consults load.
+- `SleepWakeDetector 'wake'` handler `spawnedAt` reset (Fix B) — **modify** — same load guard.
+- `ServerSupervisor.runHealthTick` grace early-return (Fix C) — **modify** — adds the absolute ceiling term.
+---
+## 1. Over-block
+No outbound/inbound message block surface. The analogous "over-action" risk is **respawning a server that should have been left alone** (a false positive). Fix A only acts when the tmux session is genuinely absent; a normally-booting server has a live session (created synchronously at spawn; HTTP binds later), so a real boot is never killed — covered by the regression test "alive booting session is still given full grace." Fix B is strictly *less* aggressive than the prior code (it withholds a `spawnedAt` reset; it never adds a kill). Fix C only fires after 3× the (already generous 10-min) grace with the session never having gone healthy — a genuinely hung boot, not a slow one.
+## 2. Under-block
+The "under-action" risk is **a dead server that still isn't respawned**. Remaining gaps, explicitly: (a) Fix A respawns only when the *server tmux session* is missing — a server process that is alive-but-wedged is still handled by the pre-existing `evaluateUnhealthyServer` path (unchanged), with its CPU-starvation defer; this PR does not change that path. (b) This is net #2 only — net #1 (a subsystem uncaught exception crashing the whole process) and net #3 (the fleet watchdog / launchd-level backstop) are tracked follow-ups in the spec §6 (FU-1, FU-2). <!-- tracked: CMT-1540 --> Net #3 was additionally found and fixed LIVE on the echo laptop this session (its launchd job was loaded from a reaped temp-dir plist → exit 127); the durable source fix is tracked in FU-2.
+## 3. Level-of-abstraction fit
+Correct layer. The supervisor legitimately holds respawn **authority**; this change makes that authority *more reliable* by grounding the decision in an objective fact (session exists?) rather than a fragile inference (did the machine sleep?). It reuses the existing low-level primitives (`isServerSessionAlive`, `handleUnhealthy`, `loadRatioProvider`, `maxLoadRatio`) rather than re-implementing them — Fix B uses the *same* load signal the CPU-starvation defer and `SleepWakeDetector` already use. No new gate is introduced; no redundant config knob added (reused `maxLoadRatio = DEFAULT_MAX_LOAD_RATIO = 1.5`, which is exactly the spec's named `cpuStarvationLoadPerCore` default — a deliberate DRY decision vs. the spec's suggested new knob, since the value and signal are identical).
+## 4. Signal vs authority compliance
+Compliant. Per `docs/signal-vs-authority.md`: the heuristic (sleep/wake inference) is *demoted* to where it is safe — leniency for an *existing* slow process — and can never suppress recovery of a *missing* process. The authoritative respawn decision is moved onto a non-heuristic structural fact (tmux session existence). This is the correct direction: replace willpower/heuristic with structure. No brittle check is given new blocking authority.
+## 5. Interactions
+- **Does not shadow / is not shadowed:** Fix A runs *before* the grace early-return and `return`s on a missing session, so the rest of the tick is skipped on that path (intended — respawn is scheduled). The existing non-grace `evaluateUnhealthyServer` missing-session branch is unchanged and still covers the alive-but-unresponsive case.
+- **Double-fire / hot-spin:** Fix A routes through `handleUnhealthy()`, which carries the full circuit-breaker, restart-attempt cap, cooldown, and planned-restart suppression. During the post-`handleUnhealthy` backoff (first attempt 5s < 10s tick) the session re-appears before the next tick, so no double-spawn; a genuine crash-loop trips the breaker exactly as today. Planned-restart / legacy-restart / slept-marker short-circuits all still precede or are honored by `handleUnhealthy`.
+- **Counter resets:** Fix B still resets failure counters on a starved gap (safe — they may be stale); it only withholds the `spawnedAt`/grace re-arm. Fix C clears `firstSpawnedAt` on both healthy paths (grace optimistic-probe success and the main healthy branch).
+## 6. External surfaces
+No API route, no message, no cross-agent surface, no schema change. Pure in-process lifeline behavior. Adds console log lines on the new branches (forensic only). No dependency on conversation state. The only timing dependency is system load (`os.loadavg()`), already used elsewhere and injectable in tests.
+## 7. Multi-machine posture (Cross-Machine Coherence)
+**Machine-local BY DESIGN.** The `ServerSupervisor` supervises THIS machine's server process; each machine runs its own supervisor and respawns its own server. There is no shared state, no replication, and no cross-machine read — a server's liveness is inherently a per-machine fact. Nothing here is user-facing (no one-voice gating needed), nothing is durable state that could strand on a topic transfer, and no URL is generated. This is the correct posture, not a silent single-machine assumption.
+## 8. Rollback cost
+Low. Pure code change in one file + one new test file; no migration, no state schema, no config default change. Back-out = revert the commit and ship a patch release; the supervisor reverts to prior behavior with no data repair. The extracted `runHealthTick()` is behavior-identical to the prior inline callback, so even a partial revert is clean. The change only makes recovery *more* likely to fire, so the failure mode of a bug here is bounded by the pre-existing circuit breaker.