instar 1.3.565 → 1.3.567

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. package/dist/config/ConfigDefaults.d.ts.map +1 -1
  2. package/dist/config/ConfigDefaults.js +16 -1
  3. package/dist/config/ConfigDefaults.js.map +1 -1
  4. package/dist/core/BitwardenProvider.d.ts +8 -0
  5. package/dist/core/BitwardenProvider.d.ts.map +1 -1
  6. package/dist/core/BitwardenProvider.js +10 -0
  7. package/dist/core/BitwardenProvider.js.map +1 -1
  8. package/dist/core/PostUpdateMigrator.d.ts +13 -0
  9. package/dist/core/PostUpdateMigrator.d.ts.map +1 -1
  10. package/dist/core/PostUpdateMigrator.js +55 -0
  11. package/dist/core/PostUpdateMigrator.js.map +1 -1
  12. package/dist/core/devGatedFeatures.d.ts.map +1 -1
  13. package/dist/core/devGatedFeatures.js +12 -0
  14. package/dist/core/devGatedFeatures.js.map +1 -1
  15. package/dist/core/types.d.ts +42 -0
  16. package/dist/core/types.d.ts.map +1 -1
  17. package/dist/core/types.js.map +1 -1
  18. package/dist/lifeline/ServerSupervisor.d.ts +9 -0
  19. package/dist/lifeline/ServerSupervisor.d.ts.map +1 -1
  20. package/dist/lifeline/ServerSupervisor.js +174 -84
  21. package/dist/lifeline/ServerSupervisor.js.map +1 -1
  22. package/dist/monitoring/BlockerLedger.d.ts +43 -2
  23. package/dist/monitoring/BlockerLedger.d.ts.map +1 -1
  24. package/dist/monitoring/BlockerLedger.js +90 -5
  25. package/dist/monitoring/BlockerLedger.js.map +1 -1
  26. package/dist/monitoring/DurableVaultSession.d.ts +91 -0
  27. package/dist/monitoring/DurableVaultSession.d.ts.map +1 -0
  28. package/dist/monitoring/DurableVaultSession.js +145 -0
  29. package/dist/monitoring/DurableVaultSession.js.map +1 -0
  30. package/dist/monitoring/SelfUnblockChecklist.d.ts +281 -0
  31. package/dist/monitoring/SelfUnblockChecklist.d.ts.map +1 -0
  32. package/dist/monitoring/SelfUnblockChecklist.js +433 -0
  33. package/dist/monitoring/SelfUnblockChecklist.js.map +1 -0
  34. package/dist/monitoring/SelfUnblockProbeProviders.d.ts +116 -0
  35. package/dist/monitoring/SelfUnblockProbeProviders.d.ts.map +1 -0
  36. package/dist/monitoring/SelfUnblockProbeProviders.js +286 -0
  37. package/dist/monitoring/SelfUnblockProbeProviders.js.map +1 -0
  38. package/dist/scaffold/templates.d.ts.map +1 -1
  39. package/dist/scaffold/templates.js +8 -0
  40. package/dist/scaffold/templates.js.map +1 -1
  41. package/dist/server/AgentServer.d.ts +16 -0
  42. package/dist/server/AgentServer.d.ts.map +1 -1
  43. package/dist/server/AgentServer.js +106 -0
  44. package/dist/server/AgentServer.js.map +1 -1
  45. package/dist/server/routes.d.ts +10 -0
  46. package/dist/server/routes.d.ts.map +1 -1
  47. package/dist/server/routes.js +117 -0
  48. package/dist/server/routes.js.map +1 -1
  49. package/package.json +1 -1
  50. package/src/data/builtin-manifest.json +64 -64
  51. package/src/scaffold/templates.ts +8 -0
  52. package/upgrades/1.3.566.md +104 -0
  53. package/upgrades/1.3.567.md +39 -0
  54. package/upgrades/side-effects/self-unblock-before-escalating.md +258 -0
  55. package/upgrades/side-effects/supervisor-respawn-guarantee.md +61 -0
package/upgrades/1.3.566.md
@@ -0,0 +1,104 @@
1
+ # Upgrade Guide — vNEXT
2
+
3
+ <!-- assembled-by: assemble-next-md -->
4
+ <!-- bump: patch -->
5
+
6
+ ## What Changed
7
+
8
+ - **A new constitutional standard, "Self-Unblock Before Escalating"** (`docs/STANDARDS-REGISTRY.md`):
9
+ a blocker is the agent's problem to solve FIRST. The agent must exhaust every unblock path
10
+ *within its permissions or organizationally-granted scope* before requiring anything from a
11
+ human — and when a human IS required, ask for the lowest rung on the human-requirement ladder
12
+ (**nothing → an approval → an operator-only credential**), named exactly. Origin: operator
13
+ directive (Justin, topic 12476, 2026-06-13) after the agent idled on a `feedback.instar.sh` DNS
14
+ record that was already self-unblockable via a Cloudflare token sitting in the org vault.
15
+ - **It EXTENDS the existing `BlockerLedger` gate — it does NOT fork a parallel one.** Round-1
16
+ convergence (and two independent external reviewers) found `BlockerLedger.settleTrueBlocker()`
17
+ already mandates a recorded failed self-unblock attempt before a credential/account blocker can
18
+ settle, already HARD-rejects `missing_failed_attempt`, and already routes the settle JUDGMENT
19
+ through the Tier-1 `SettleAuthority` (B17) LLM gate. So this standard adds only the things the
20
+ ledger did NOT have and reuses its pipeline / taxonomy / log / untrusted-data envelope for
21
+ everything else.
22
+ - **`SelfUnblockChecklist` (the one substantial new module, `src/monitoring/SelfUnblockChecklist.ts`):**
23
+ a deterministic, code-driven, ORDERED probe list (own per-agent vault → org Bitwarden → authed
24
+ cloud accounts → MCP → browser → "do I already control a resource that does this?") that
25
+ systematically PRODUCES the failed-attempt evidence the ledger requires. `holdsRelevantCred` is
26
+ decided DETERMINISTICALLY by a `service:scope` tag match (e.g. `cloudflare:dawn-tunnel.dev`, with
27
+ domain-hierarchy / wildcard rules), never by an LLM; ambiguous or MISSING metadata fails CLOSED
28
+ (a stale cred is simply not surfaced, never mis-applied). Each probe is timeout-bounded by class
29
+ and fails toward `reachable:false`, so one hung probe degrades to "unreachable" rather than
30
+ stalling the path.
31
+ - **Anti-gaming is mechanical (the one edit to BlockerLedger's logic):** the checklist RUNNER
32
+ persists each run keyed by an immutable run id, and `settleTrueBlocker` now takes a `runId`
33
+ REFERENCE that the ledger LOADS + verifies — replacing the old caller-supplied `failedAttempt`
34
+ object. A hand-supplied list with no genuine persisted run is treated as NO attempt, so the
35
+ "self-asserted / gameable list" hole is closed by construction. Everything else in the ledger is
36
+ reuse.
37
+ - **`DurableVaultSession` (`src/monitoring/DurableVaultSession.ts`):** flag-gated wiring to the
38
+ existing in-memory `BitwardenProvider` session so the org-vault probe can actually reach the
39
+ vault. The session value lives in process memory / keychain only, is held warm ONLY while a
40
+ checklist run is in flight (TTL + idle-expiry), is NEVER logged, NEVER passed as argv (handed to
41
+ `bw` only via the child `BW_SESSION` env), and NEVER placed on the cross-machine
42
+ `multiMachine.secretSync` path. The master password stays operator-held; no new on-disk secret is
43
+ introduced.
44
+ - **Rung FLOOR (capability ≠ authority):** an action that is irreversible, cost-bearing above a
45
+ threshold, out-of-scope, or policy-sensitive keeps a MINIMUM rung of 1 (an approval) EVEN IF a
46
+ self-unblock credential exists. A rung-1 approval must resolve against a VERIFIED principal (Know
47
+ Your Principal), never a name seen in content.
48
+ - **Read-only HTTP surface:** the existing `/blockers/*` read surface is extended with the
49
+ checklist's per-probe results + the rung — `GET /blockers/self-unblock-runs`. Bearer-gated, NOT on
50
+ the auth-exempt allowlist, 503-when-dark AFTER auth (an unauthenticated caller gets 401, not a 503
51
+ that confirms the route exists), `Cache-Control: no-store` (the body is credential-reachability
52
+ reconnaissance), served through the `<blocker-ledger-data>` envelope.
53
+ - **Ships DARK behind the developmentAgent gate** — the `selfUnblockChecklist` and
54
+ `durableVaultSession` blocks nest under the existing `monitoring.blockerLedger.*` gate and OMIT
55
+ the `enabled` literal, so `resolveDevAgentGate` decides: LIVE on a development agent, DARK on the
56
+ fleet. Reversible — disabling makes the checklist not run, the route 503, the session not kept
57
+ warm; all inert.
58
+
59
+ ## What to Tell Your User
60
+
61
+ Nothing changes for fleet agents — this ships off for everyone except development agents. On a
62
+ development agent, when the agent hits a blocker (a missing credential, a locked account), it now
63
+ systematically checks every place it's already allowed to look — its own vault, the organization's
64
+ password vault, the cloud accounts it's already signed into — before it ever asks you for anything.
65
+ Only after that genuinely comes up empty will it ask, and it asks for the smallest thing possible:
66
+ ideally nothing, then a yes/no approval, and a credential only you can produce as the last resort. It
67
+ can never use a credential to act outside the job at hand, and anything irreversible or costly still
68
+ asks for your approval first even if it could technically do it itself. There is no user-facing action
69
+ and no behavior change on a normal install.
70
+
71
+ ## Summary of New Capabilities
72
+
73
+ - On a development agent, the agent now runs a deterministic self-unblock checklist before it can
74
+ settle any credential/account blocker as genuinely operator-required — turning "exhaust every path
75
+ you can already reach first" from a prompt-level wish into a structural precondition fed into the
76
+ existing Tier-1 blocker-settle authority.
77
+
78
+ ## Evidence
79
+
80
+ - `tests/unit/BlockerLedgerSelfUnblock.test.ts` (7): checklist probe ordering + short-circuit +
81
+ per-probe timeout→`reachable:false`; the run-id provenance contract (a caller-embedded attempt
82
+ with no persisted run is HARD-rejected as `missing_failed_attempt`); the ladder/rung-floor mapping
83
+ (irreversible/cost-bearing → min rung 1 even with a cred); the rung-1 verified-principal
84
+ resolution.
85
+ - `tests/unit/DurableVaultSession.test.ts` (8): TTL / idle-expiry, in-flight-only hold, and a wiring
86
+ assertion that the session value never appears in the ledger or argv.
87
+ - `tests/integration/self-unblock-routes.test.ts` (5): the `/blockers/self-unblock-runs` read route —
88
+ 200 when enabled, 503 (after auth) when dark; a real persisted run feeds the settle path; the
89
+ negative anti-gaming assertion (no persisted run → HARD reject).
90
+ - `tests/e2e/self-unblock-lifecycle.test.ts` (6): production init path — route ALIVE (200) when
91
+ enabled, 503 when dark; a self-unblock action touching an external account is STILL evaluated by
92
+ the external-operation + mandate gates (proves the boundary mechanically, not in prose).
93
+ - `tests/unit/lint-dev-agent-dark-gate.test.ts` + `tests/unit/feature-delivery-completeness.test.ts`:
94
+ the two new flags are registered in `DEV_GATED_FEATURES` with justifications and the attribution
95
+ map recomputed.
96
+
97
+ ## Migration
98
+
99
+ - `PostUpdateMigrator.migrateConfigSelfUnblockChecklistDevGate` strips a default-shaped `false` for
100
+ the `selfUnblockChecklist` / `durableVaultSession` blocks (nested under
101
+ `monitoring.blockerLedger.*`) on update, so the gate resolves on already-deployed dev agents. An
102
+ operator-set value is left entirely alone — reach is not authority. Idempotent. The CLAUDE.md
103
+ self-unblock reflex section is delivered via the content-sniffed `migrateClaudeMd` path +
104
+ `generateClaudeMd()` (Agent Awareness).
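
Reviewer note: the upgrade guide above says `holdsRelevantCred` is decided by a deterministic `service:scope` tag match with domain-hierarchy / wildcard rules, and that missing or ambiguous metadata fails closed. The sketch below is one possible reading of that rule, not the code in `SelfUnblockChecklist.ts` (which is not shown in this diff). The helper names `parseTag` and `scopeCovers`, and the exact hierarchy and wildcard semantics, are assumptions made for illustration.

```typescript
// Hypothetical illustration only; the real matcher may use different rules.
interface CredentialTag {
  service: string; // e.g. "cloudflare"
  scope: string;   // e.g. "dawn-tunnel.dev" or "*.dawn-tunnel.dev"
}

// Parse "service:scope". Anything malformed yields null so callers fail closed.
function parseTag(raw: string | undefined): CredentialTag | null {
  if (!raw) return null;
  const idx = raw.indexOf(":");
  if (idx <= 0 || idx === raw.length - 1) return null;
  const service = raw.slice(0, idx).trim().toLowerCase();
  const scope = raw.slice(idx + 1).trim().toLowerCase();
  // A second ":" makes the tag ambiguous, so it is rejected.
  if (!service || !scope || scope.includes(":")) return null;
  return { service, scope };
}

// Assumed hierarchy rule: a credential scoped to a domain covers that domain
// and its subdomains; a "*.example.com" scope covers subdomains only.
function scopeCovers(credScope: string, needed: string): boolean {
  if (credScope.startsWith("*.")) {
    const base = credScope.slice(2);
    return base.length > 0 && needed.endsWith("." + base);
  }
  // Matching on "." + credScope avoids suffix confusion such as
  // "tunnel.dev" appearing to cover "dawn-tunnel.dev".
  return needed === credScope || needed.endsWith("." + credScope);
}

function holdsRelevantCred(credTagRaw: string | undefined, neededRaw: string): boolean {
  const cred = parseTag(credTagRaw);
  const needed = parseTag(neededRaw);
  if (!cred || !needed) return false;           // missing metadata fails closed
  if (needed.scope.includes("*")) return false; // a wildcard need is ambiguous
  return cred.service === needed.service && scopeCovers(cred.scope, needed.scope);
}
```

Under these assumptions, `holdsRelevantCred("cloudflare:dawn-tunnel.dev", "cloudflare:feedback.dawn-tunnel.dev")` is `true`, while an untagged credential or a tag for a different service is never surfaced. Keeping the decision a pure string comparison is what lets the match stay out of LLM judgment, as the guide requires.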
package/upgrades/1.3.567.md
@@ -0,0 +1,39 @@
1
+ # Upgrade Guide — vNEXT
2
+
3
+ <!-- assembled-by: assemble-next-md -->
4
+ <!-- bump: patch -->
5
+
6
+ ## What Changed
7
+
8
+ Hardened the in-process **server supervisor** (`src/lifeline/ServerSupervisor.ts`) so a genuinely-dead server is always respawned, even under sustained CPU starvation. This closes a real ~2h outage class (2026-06-14): on a heavily-loaded box the supervisor's 10s health loop stalls for minutes, and it misread every large inter-tick gap as a machine sleep/wake — resetting `spawnedAt = now`, which re-armed the startup-grace window where health failures (including the unambiguous signal that the server's tmux session no longer exists) are deliberately ignored. The server stayed dead until a human messaged.
9
+
10
+ Three layered fixes, all in the health-check tick:
11
+
12
+ - **Fix A (load-bearing) — missing-session override.** Before honoring any startup-grace early-return, the tick now probes `isServerSessionAlive()`. A missing server tmux session is unambiguous death, not a boot, and is respawned on the very next tick regardless of any `spawnedAt` reset — routed through the existing `handleUnhealthy()` so the circuit breaker still bounds a genuine crash-loop. A normally-booting server has a live tmux session (created synchronously at spawn; HTTP binds later), so this never fights a real boot.
13
+ - **Fix B — load-aware gap detection.** A large inter-tick gap is only treated as sleep/wake when the box is NOT CPU-starved. Under starvation (`loadRatio > 1.5`, the same signal the CPU-starvation defer already uses) the gap is classified as a stalled event loop: failure counters reset, but the startup-grace window is NOT re-armed. The same guard is applied to the `SleepWakeDetector` wake handler. A real low-load suspend still re-arms grace exactly as before.
14
+ - **Fix C — absolute grace ceiling.** A new `firstSpawnedAt` anchor (never reset by sleep/wake handling) caps cumulative startup grace at `startupGraceMs × 3`. A server whose session is alive but has never answered `/health` past the ceiling is hung, not booting, and its failures are acted on normally.
15
+
16
+ The inline `setInterval` callback was extracted verbatim into `runHealthTick()` so a single tick is unit-testable and a wiring-integrity test can assert the loop probes session liveness on every tick.
17
+
18
+ audience: agent-only
19
+ maturity: stable
20
+
21
+ Net #1 (a subsystem uncaught exception crashing the whole process) and net #3 (the launchd fleet watchdog) are tracked follow-ups in the spec §6. Net #3's live root cause was additionally found and fixed in production this session (the `ai.instar.watchdog` launchd job was loaded from a reaped temp-dir plist → exit 127); the durable source fix is tracked in FU-2. <!-- tracked: CMT-1540 -->
22
+
23
+ ## What to Tell Your User
24
+
25
+ Nothing to announce proactively. If asked about server reliability: when my server process genuinely dies, the supervisor now respawns it within one health tick (~10s) instead of being fooled by CPU load into thinking the machine went to sleep and waiting indefinitely. The recovery decision is now grounded in an objective fact — does the server process actually exist? — rather than a sleep/wake guess that could be wrong on a busy machine. Normal slow boots are still given the full startup grace, so this never restarts a server mid-boot.
26
+
27
+ ## Summary of New Capabilities
28
+
29
+ No new user-facing capability — a reliability hardening of the existing crash-recovery supervisor.
30
+
31
+ | Change | Effect |
32
+ |--------|--------|
33
+ | Missing-session override (Fix A) | A dead server tmux session is respawned on the next ~10s tick, even during startup grace |
34
+ | Load-aware gap detection (Fix B) | A CPU-starvation event-loop stall is no longer misread as sleep/wake; grace is not falsely re-armed |
35
+ | Absolute grace ceiling (Fix C) | Repeated `spawnedAt` resets can no longer suppress recovery beyond 3× the grace window |
36
+
37
+ ## Evidence
38
+
39
+ Reliability fix; pinned by `tests/unit/server-supervisor-respawn-guarantee.test.ts` (10) driving the real extracted `runHealthTick()` and `SleepWakeDetector` wake handler: missing-session-during-grace → respawn (the exact 2026-06-14 trap), missing-session-during-false-wake → respawn, starved-gap → `spawnedAt` not reset, low-load-gap → re-armed, grace-ceiling broken → failures acted on, in-grace booting server still protected (Fix A no boot regression), `firstSpawnedAt` cleared on healthy, and a wiring-integrity assertion that the tick probes `isServerSessionAlive()` every tick. The full existing supervisor/lifeline suite (63 tests across 8 files) still passes. `npx tsc --noEmit` clean.
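
Reviewer note: the three supervisor fixes described above can be read as two small pure decisions, one per health tick and one per inter-tick gap. The sketch below is a simplified model under stated assumptions, not the extracted `runHealthTick()` itself. The names `spawnedAt`, `firstSpawnedAt`, `startupGraceMs`, and the `loadRatio > 1.5` threshold come from the guide; `TickState`, `decideTick`, `classifyGap`, and `gapThresholdMs` are hypothetical.

```typescript
// Hypothetical model of the tick decision; field names beyond those quoted
// in the upgrade guide are invented for illustration.
type TickOutcome = "healthy" | "in-grace" | "unhealthy";

interface TickState {
  now: number;            // current time, ms
  spawnedAt: number;      // may be reset by sleep/wake handling
  firstSpawnedAt: number; // never reset by sleep/wake handling (Fix C anchor)
  startupGraceMs: number;
  sessionAlive: boolean;  // result of an isServerSessionAlive()-style probe
  healthy: boolean;       // whether /health answered on this tick
}

function decideTick(s: TickState): TickOutcome {
  // Fix A: a missing session is treated as death before any grace check.
  if (!s.sessionAlive) return "unhealthy";
  if (s.healthy) return "healthy";
  const inGrace = s.now - s.spawnedAt < s.startupGraceMs;
  // Fix C: cumulative grace is capped at 3x the grace window.
  const pastCeiling = s.now - s.firstSpawnedAt >= s.startupGraceMs * 3;
  return inGrace && !pastCeiling ? "in-grace" : "unhealthy";
}

type GapKind = "normal" | "sleep-wake" | "stalled-loop";

// Fix B: a large gap only counts as sleep/wake when the box is not CPU-starved.
// gapThresholdMs is an assumed parameter; the guide does not state its value.
function classifyGap(gapMs: number, gapThresholdMs: number, loadRatio: number): GapKind {
  if (gapMs <= gapThresholdMs) return "normal";
  return loadRatio > 1.5 ? "stalled-loop" : "sleep-wake";
}
```

In this model only a `"sleep-wake"` gap would re-arm grace by resetting `spawnedAt`. A `"stalled-loop"` gap would reset failure counters and leave `spawnedAt` alone, which is the behaviour the guide attributes to Fix B.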
package/upgrades/side-effects/self-unblock-before-escalating.md
@@ -0,0 +1,258 @@
1
+ # Side-Effects Review — Self-Unblock Before Escalating (constitutional standard)
2
+
3
+ **Version / slug:** `self-unblock-before-escalating`
4
+ **Date:** `2026-06-14`
5
+ **Author:** `echo`
6
+ **Second-pass reviewer:** `general-purpose reviewer subagent (high-risk: touches a settle gate)`
7
+
8
+ ## Summary of the change
9
+
10
+ Encodes the operator directive (Justin, topic 12476, 2026-06-13) "exhaust self-unblock within your
11
+ permissions before requiring anything from a human" as a constitutional standard. The crucial design
12
+ move — forced by round-1 convergence and two external reviewers — is that it **EXTENDS the existing
13
+ `BlockerLedger.settleTrueBlocker()` gate rather than forking a parallel one**. The ledger ALREADY
14
+ mandates a recorded failed self-unblock attempt before a credential/account blocker can settle as a
15
+ true-blocker, already HARD-rejects `missing_failed_attempt`, and already routes the settle JUDGMENT
16
+ through the Tier-1 `SettleAuthority` (B17) LLM gate. This change adds the four things the ledger did
17
+ not have and reuses everything else. Files: `src/monitoring/SelfUnblockChecklist.ts` (new, the only
18
+ substantial new code — a deterministic ordered probe list), `src/monitoring/DurableVaultSession.ts`
19
+ (new, flag-gated org-vault session), `src/monitoring/BlockerLedger.ts` (+140: the ONE logic edit —
20
+ `settleTrueBlocker` now takes a `runId` reference it LOADS + verifies instead of a caller-supplied
21
+ `failedAttempt` object), `src/server/{routes,AgentServer}.ts` (read-only `GET
22
+ /blockers/self-unblock-runs`), `src/core/{devGatedFeatures,types,ConfigDefaults}.ts` (dark dev-gate),
23
+ `src/core/PostUpdateMigrator.ts` (migration parity), `src/scaffold/templates.ts` (Agent Awareness),
24
+ `docs/STANDARDS-REGISTRY.md` (the standard), plus all 3 test tiers.
25
+
26
+ **Producer wiring (added after the first second-pass review caught it unwired — see Second-pass below):**
27
+ the runner/library + consumer gate above were wired, but NOTHING in production instantiated the
28
+ checklist or `DurableVaultSession`, so enabling the feature on a dev agent would have made settling a
29
+ credential-blocker IMPOSSIBLE (the gate demands a run that could not be produced). Closed by:
30
+ `src/monitoring/SelfUnblockProbeProviders.ts` (new — a REAL provider for all 9 sources +
31
+ `deriveBitwardenSession`), the AgentServer wiring (instantiates the production checklist +
32
+ `DurableVaultSession` when each sub-gate is on), and `POST /blockers/self-unblock-run` (the dev-gated
33
+ trigger that produces a verified run). During that wiring an independent review of the AgentServer
34
+ `deriveSession` caught a real production bug: it read `process.env.BW_SESSION` after `bw.unlock()`, but
35
+ `unlock()` stores the session in a PRIVATE field and never exports it to the env — so the org-Bitwarden
36
+ probe (the motivating source) would have silently failed in production while passing the injected-fake
37
+ tests. Fixed by adding `BitwardenProvider.getSessionKey()`, extracting the testable
38
+ `deriveBitwardenSession` helper, and a guard test (`tests/unit/deriveBitwardenSession.test.ts`) that
39
+ asserts the session comes from `getSessionKey()`, not the env.
40
+
41
+ ## Decision-point inventory
42
+
43
+ - `BlockerLedger.settleTrueBlocker` evidence intake — **modify** — input contract changes from a
44
+ caller-supplied `failedAttempt` object to a `runId` reference the ledger loads + verifies against
45
+ the persisted checklist run. This is the only edit to the gate's logic; the settle AUTHORITY (B17
46
+ Tier-1) is unchanged.
47
+ - `SelfUnblockChecklist` — **add** — a deterministic signal-PRODUCER. It holds NO blocking authority;
48
+ it records probe results + the rung and produces the evidence the existing gate consumes.
49
+ - Rung-floor mapping — **add** — enforces a minimum rung of 1 (approval) for irreversible /
50
+ cost-bearing / out-of-scope / policy-sensitive actions even when a self-unblock cred exists; maps
51
+ onto the existing `AuthorityCheckEvidence` (no new field).
52
+ - `GET /blockers/self-unblock-runs` — **add** — read-only observability over the run store.
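
Reviewer note: the first inventory item above replaces a caller-supplied `failedAttempt` object with a `runId` that the ledger loads and verifies. A minimal sketch of that provenance check follows, assuming an in-memory map in place of the machine-local JSONL run store. `RunStore`, `verifyRunForSettle`, and the record shapes are hypothetical; only the `runId` reference and the `missing_failed_attempt` rejection are taken from the review. The Tier-1 `SettleAuthority` judgment that follows verification is not modelled.

```typescript
// Hypothetical shapes; the real run record and ledger API are not shown here.
interface ProbeResult {
  source: string;
  reachable: boolean;
  holdsRelevantCred: boolean;
}

interface ChecklistRun {
  runId: string;
  blockerId: string;
  probes: readonly ProbeResult[];
}

// In-memory stand-in for the persisted run store, keyed by immutable run id.
class RunStore {
  private runs = new Map<string, ChecklistRun>();

  persist(run: ChecklistRun): void {
    if (this.runs.has(run.runId)) throw new Error("run ids are immutable");
    this.runs.set(run.runId, Object.freeze({ ...run }));
  }

  load(runId: string): ChecklistRun | undefined {
    return this.runs.get(runId);
  }
}

type VerifyResult =
  | { ok: true; run: ChecklistRun }
  | { ok: false; reason: "missing_failed_attempt" };

// The caller passes only a reference; evidence is loaded from the store, so a
// hand-written attempt with no persisted run counts as no attempt at all.
function verifyRunForSettle(store: RunStore, blockerId: string, runId: string): VerifyResult {
  const run = store.load(runId);
  if (!run || run.blockerId !== blockerId) {
    return { ok: false, reason: "missing_failed_attempt" };
  }
  // A run with every probe unreachable is still valid evidence.
  return { ok: true, run };
}
```

Because the function never accepts probe results from the caller, the evidence can only come from a run the checklist runner actually persisted, which is the "closed by construction" property the review describes.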
53
+
54
+ ## 1. Over-block
55
+
56
+ **What legitimate inputs does this change reject that it shouldn't?**
57
+
58
+ The checklist itself is not a block/allow surface — it produces evidence. The settle GATE it feeds
59
+ could now reject a legitimate true-blocker if a caller tries to settle WITHOUT a persisted checklist
60
+ run (it derives the failed attempt only from a verified run id). That is the intended anti-gaming
61
+ direction — a blocker may not settle as "operator-required" without real evidence — and it fails
62
+ toward safety (don't let a blocker masquerade as operator-required). The one genuine over-block risk:
63
+ if the checklist RUNNER cannot persist a run at all (disk failure), settle is blocked. This degrades
64
+ toward "keep trying / surface honestly," not toward a false operator-blocker, which is the correct
65
+ direction. A checklist that completes with every probe `reachable:false` is a VALID run (it produces
66
+ "nothing reachable" evidence) and satisfies the gate — so a genuinely-blocked agent is not stuck.
67
+
68
+ ## 2. Under-block
69
+
70
+ **What failure modes does this still miss?**
71
+
72
+ The deterministic relevance match (`holdsRelevantCred`) can MISS a credential that IS relevant but is
73
+ under-tagged or mis-tagged — it fails CLOSED (`holdsRelevantCred:false`), so the cred is not surfaced
74
+ and the agent escalates to the human when it could in principle have self-unblocked. This is an
75
+ "under-self-unblock" (the agent asks the human slightly more than strictly necessary) — the SAFE
76
+ direction for this standard's primary security invariant: it never mis-applies a credential, it only
77
+ occasionally fails to find one. The fix path is better credential tagging, never looser matching. The
78
+ checklist is also only as complete as its probe list (vault/Bitwarden/Vercel/Cloudflare/GitHub/MCP/
79
+ browser); an account type with no probe is simply not auto-discovered — data-extensible, documented.
80
+
81
+ ## 3. Level-of-abstraction fit
82
+
83
+ **Is this at the right layer?**
84
+
85
+ Yes — and this is exactly what the adversarial review corrected. The checklist is a low-level,
86
+ deterministic DETECTOR that FEEDS the existing high-level Tier-1 `BlockerLedger` settle AUTHORITY. The
87
+ first design drafted a weaker parallel gate; round-1 convergence (integration + lessons-aware
88
+ reviewers, independently) caught it. The final design adds NO new gate, ledger, log, or
89
+ `evaluateSelfUnblock` authority — it reuses BlockerLedger's pipeline/taxonomy/log/envelope and changes
90
+ only the evidence intake. The deterministic relevance match is deliberately kept OUT of LLM judgment
91
+ (the most failure-prone hop), consistent with the signal/authority split.
92
+
93
+ ## 4. Signal vs authority compliance
94
+
95
+ **Required reference:** [docs/signal-vs-authority.md](../../docs/signal-vs-authority.md)
96
+
97
+ **Does this change hold blocking authority with brittle logic?**
98
+
99
+ - [x] **No — this change produces a signal consumed by an existing smart gate.**
100
+
101
+ The `SelfUnblockChecklist` (deterministic, brittle-by-nature tag matching, code-only) holds NO
102
+ blocking authority. The ONE judgment — whether a blocker may settle as a true-blocker — remains
103
+ BlockerLedger's existing Tier-1 `SettleAuthority` (B17) LLM gate. The change makes that gate STRICTER
104
+ (it now derives the failed attempt from a verified persisted run rather than a caller-asserted object),
105
+ never adds a brittle authority. The rung-floor is a deterministic MINIMUM raised on top of the existing
106
+ authority, not a new allow/deny owner. Fully compliant.
107
+
108
+ ## 5. Interactions
109
+
110
+ - **Shadowing:** the new `GET /blockers/self-unblock-runs` route is registered BEFORE `GET
111
+ /blockers/:id` so the literal path is not swallowed by the param route (verified in the diff and the
112
+ integration test). No allow/deny shadowing — there is no new gate.
113
+ - **Double-fire:** no new gate is added, so there is no double-gating of a settle decision. The
114
+ checklist runs once per blocker-resolution attempt and persists one run.
115
+ - **Races:** the run store is append-keyed by immutable run id; `settleTrueBlocker` reads by that id.
116
+ The bw session is held only while a run is in flight (TTL + idle-expiry), so concurrent runs each
117
+ hold their own warm window; no shared mutable settle state is introduced.
118
+ - **Feedback loops:** none — the checklist's output feeds the ledger's settle path, which does not
119
+ feed back into the checklist's inputs.
120
+
121
+ ## 6. External surfaces
122
+
123
+ - **Other agents / users / external systems:** the production probe providers
124
+ (`SelfUnblockProbeProviders.ts`) reach the agent's OWN sources only — its vault (names only), the org
125
+ Bitwarden vault (via `DurableVaultSession`, exit-code reachability), and authed cloud accounts
126
+ (Cloudflare zones via ONE bounded fetch; `vercel`/`gh` via ONE bounded CLI exec). All READ-ONLY,
127
+ one bounded call each, no writes, no new egress, no recursive scans (the 2026-06-13 load-spike
128
+ lesson). Each provider returns ONLY reachability + non-secret scope-tag strings — never a credential
129
+ value. Relevance is operator-declared + fail-closed (an undeclared source advertises nothing → never
130
+ surfaced), so the worst case is under-self-unblock (ask the human slightly more), never mis-applying
131
+ a credential.
132
+ - **Persistent state:** a new machine-local JSONL run store (per-probe results + rung). Inert
133
+ observability data; no schema other code depends on; safe to delete.
134
+ - **Credential reach:** `DurableVaultSession` reaches the org Bitwarden vault via the existing
135
+ `BitwardenProvider`. This is the standard's main security tradeoff and is bounded: session value in
136
+ process memory ONLY, never logged, handed to `bw` ONLY via the child `BW_SESSION` env (never argv),
137
+ never on the secret-sync path, held only while a run is in flight, master password operator-held
138
+ (read from the EXISTING `bw-master-password` vault key — no new on-disk secret). The wiring-integrity
139
+ test `tests/unit/SelfUnblockSessionLeak.test.ts` asserts a sentinel session value rides ONLY the
140
+ `BW_SESSION` env and never appears in argv, the persisted run JSON, the decisions log, or the ledger
141
+ store.
142
+ - **Operator surface:** two new API surfaces — the READ-ONLY `GET /blockers/self-unblock-runs`
143
+ (observability) and the dev-gated `POST /blockers/self-unblock-run` (the agent-facing trigger that
144
+ runs the checklist). Both are Bearer-gated, 503-after-auth when dark, `no-store`, and emit untrusted
145
+ probe `detail` through the `<blocker-ledger-data>` envelope — no secret in any response. Neither is an
146
+ operator dashboard ACTION (no approval page, grant/revoke, secret-drop form, or renderer) → §6b not
147
+ applicable.
148
+
149
+ ## 6b. Operator-surface quality (Operator-Surface Quality standard)
150
+
151
+ **No operator surface — not applicable.** This change adds no `dashboard/*.js` / `dashboard/*.html`
152
+ renderer, no approval page, and no grant/revoke/secret-drop form. The single new HTTP surface is a
153
+ read-only JSON observability route (`GET /blockers/self-unblock-runs`), not an operator action.
154
+
155
+ ## 7. Multi-machine posture (Cross-Machine Coherence)
156
+
157
+ **Posture: machine-local BY DESIGN** — with a security reason, not an oversight.
158
+
159
+ - **Credential reachability is inherently per-machine:** a credential reachable on machine A's authed
160
+ CLIs / keychain may not be reachable on machine B. The checklist probes THIS machine's reachable
161
+ sources; replicating "what I can reach" across machines would be incorrect and a reconnaissance leak.
162
+ - **The `DurableVaultSession` is a security boundary that MUST NOT replicate:** it is explicitly kept
163
+ off the `multiMachine.secretSync` path (asserted in the spec + a wiring test). Machine-local is the
164
+ required posture, not a default.
165
+ - **The run store is a per-machine audit trail** (like the reap-log / blocker-decisions log).
166
+ - **User-facing notices:** none emitted by this change — any messaging is owned by the existing
167
+ ledger settle path (one-voice gating already applies there), so no new double-voice risk.
168
+ - **Durable state on topic transfer:** the run store is NOT topic-keyed, so it does not strand on a
169
+ topic move.
170
+ - **Generated URLs:** the one route is a local API path; it generates no cross-machine link.
171
+
172
+ If a pool-wide "what self-unblock runs happened across machines?" view is ever wanted, the correct
173
+ shape is a proxied-on-read merged view (`?scope=pool`) over each machine's local store — explicitly
174
+ NOT replication of the underlying credential-reach data. Noted as a possible future read-surface, not
175
+ needed for this standard.
176
+
177
+ ## 8. Rollback cost
178
+
179
+ Pure code change behind a dev-gate. Back-out options, cheapest first:
180
+
181
+ - **Disable the flag** — set `monitoring.blockerLedger.selfUnblockChecklist.enabled:false` (and
182
+ `durableVaultSession`): the checklist stops running, the route 503s, the session is not kept warm.
183
+ Everything inert with no revert. This is the primary rollback.
184
+ - **Hot-fix revert** — revert the PR and ship a patch. The one input-contract change to
185
+ `settleTrueBlocker` reverts with it; no caller depends on the runId path except the new checklist.
186
+ - **Data migration:** none. The persisted runs are inert machine-local JSON; deleting the run-store
187
+ directory is sufficient and optional.
188
+ - **Agent state repair:** none. Dark on the fleet, so no fleet agent sees a change; the dev agent
189
+ picks up the gate at next restart and drops it the moment the flag is disabled.
190
+ - **User visibility:** none — no user-visible behavior on a normal install during any rollback window.
191
+
192
+ ## Conclusion
193
+
194
+ The review produced one mechanical change to the spec (§11 reworded from "deferred decisions" to
195
+ "scope boundary / explicit non-goal" — identical meaning, reworded so it does not trip the
196
+ no-orphan-deferrals scan) AND, far more importantly, the second-pass review caught that the build was
197
+ shipped HALF-WIRED: the consumer gate + run store + read route were wired, but the PRODUCER (the
198
+ checklist runner + `DurableVaultSession` + a trigger) was not — so enabling it would have BLOCKED
199
+ settling a credential-blocker on a dev agent. Completing the producer (within the approved spec §5)
200
+ then surfaced a second real defect (the `deriveSession` env-vs-getSessionKey bug) that passed the
201
+ injected-fake tests but would have killed the org-Bitwarden probe in production. Both are resolved and
202
+ guarded by new tests. The standout property remains that the adversarial pass forced the design to
203
+ EXTEND the existing BlockerLedger settle authority instead of forking a weaker parallel gate — zero new
204
+ blocking authority, the one judgment stays on the Tier-1 gate, strictly HARDER to settle a false
205
+ operator-blocker. It ships dark behind the developmentAgent gate, is reversible to fully inert via the
206
+ flags, and is machine-local by a stated security reason. Clear to ship.
207
+
208
+ ## Second-pass review (if required)
209
+
210
+ **Reviewer:** independent general-purpose reviewer subagent (high-risk: touches a settle gate)
211
+
212
+ **Round 1 — Concern raised.** Confirmed the consumer half (signal-vs-authority, anti-gaming run-id
213
+ verification, fail-closed relevance, route auth) is solid (A–D), but raised TWO blocking concerns:
214
+ (1) the artifact over-claimed an "argv" non-leak test that did not exist; (2) `DurableVaultSession` and
215
+ the checklist RUNNER were instantiated only in tests — the producer was unwired in production, so the
216
+ settle gate would demand a run that could never be produced.
217
+
218
+ **Resolution.** Both addressed by completing the producer within the approved spec: a real bounded
219
+ fail-closed provider for all 9 sources (`SelfUnblockProbeProviders.ts`), production instantiation of
220
+ the checklist + `DurableVaultSession` in AgentServer (each on its own dev-gate), the `POST
221
+ /blockers/self-unblock-run` trigger, and the now-real argv non-leak test
222
+ (`SelfUnblockSessionLeak.test.ts`). Completing it surfaced and fixed the `deriveSession`
223
+ env-vs-`getSessionKey()` production bug (guarded by `deriveBitwardenSession.test.ts`).
224
+
225
+ **Round 2 — Concur.** A focused independent re-review of the final producer code verified, with
226
+ file:line evidence, all six checks: (1a) no provider returns/logs a secret value; (1b) each provider is
227
+ one bounded call, no recursive scan; (1c) relevance is fail-closed; (2) `deriveBitwardenSession`
228
+ returns `getSessionKey()` not the env and is null-safe; (3) the AgentServer block is dev-gated and
229
+ introduces no new on-disk secret; (4) the trigger route is 503-after-auth, intent-gated, `no-store`,
230
+ and leaks no secret. **Verdict: Concur.**
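
Check (2) above can be illustrated with a minimal sketch. It assumes only that the provider exposes a `getSessionKey()` method, as the review names it; the `SessionKeySource` interface and the `deriveSessionSketch` function are hypothetical and are not the package's actual code. The point is that the session comes from the provider rather than from the process environment, and that every missing value collapses to `null`.

```typescript
// Hypothetical shape: only getSessionKey() is named in the review.
interface SessionKeySource {
  getSessionKey(): string | null | undefined;
}

// Returns the provider's session key, or null when the provider or key is absent.
// Deliberately does NOT read process.env: the env-vs-getSessionKey() mix-up is
// the class of bug the review describes.
function deriveSessionSketch(
  provider: SessionKeySource | null | undefined,
): string | null {
  const key = provider?.getSessionKey();
  return typeof key === "string" && key.length > 0 ? key : null;
}
```

A test that injects a fake provider only exercises the real lookup if the function reads from the provider, which is presumably why the original defect passed the injected-fake tests.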
231
+
232
+ ## Evidence pointers
233
+
234
+ - `tsc --noEmit` clean; `node scripts/lint-dev-agent-dark-gate.js` → `clean`.
235
+ - Targeted vitest run (11 files, 215 tests green): consumer/library —
236
+ `tests/unit/BlockerLedgerSelfUnblock.test.ts`, `DurableVaultSession.test.ts`,
237
+ `SelfUnblockChecklist.test.ts`, `PostUpdateMigrator-selfUnblock.test.ts`; producer —
238
+ `tests/unit/SelfUnblockProbeProviders.test.ts`, `deriveBitwardenSession.test.ts`,
239
+ `SelfUnblockSessionLeak.test.ts` (the argv/ledger non-leak wiring test);
240
+ routes/E2E — `tests/integration/self-unblock-routes.test.ts` (incl. the production-path trigger →
241
+ settle and the negative anti-gaming assertion), `tests/e2e/self-unblock-lifecycle.test.ts`
242
+ ("feature is alive": 200 enabled / 503-after-auth dark); plus `lint-dev-agent-dark-gate.test.ts` +
243
+ `feature-delivery-completeness.test.ts` (dev-gate registry coherence).
244
+
245
+ ## Addendum — no-silent-fallbacks ratchet (post-CI follow-up)
246
+
247
+ CI surfaced one deterministic failure after the initial commit: the
248
+ `no-silent-fallbacks` ratchet counts error-swallowing `catch` blocks against a
249
+ tracked baseline (474) and the two new catches in `SelfUnblockRunStore`
250
+ (`loadRun` skipping a corrupt/partial trailing JSONL line; `listRuns` returning
251
+ `[]` when no runs file exists yet) pushed it to 476.
252
+
253
+ Both are intentional, expected-condition silences — a partial trailing line is a
254
+ normal crash-during-append artifact, and a missing runs file is the first-run
255
+ condition — and they match a pattern already blessed two functions up in the same
256
+ file. The correct fix is therefore the codebase's `@silent-fallback-ok` marker
257
+ with justification on each, NOT raising the baseline or bolting on noisy
258
+ degradation reports. Count is back to 474. No behavior change; pure annotation.
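
As a rough illustration of the kind of catch being annotated, here is a minimal sketch of a JSONL reader that tolerates a partial line. `RunRecord` and `parseRunsJsonl` are hypothetical names, and the exact comment syntax the ratchet expects for `@silent-fallback-ok` is an assumption; only the marker name and the justification come from the text above. Unlike the described `loadRun`, this simplified version skips any unparseable line, not only a trailing one.

```typescript
// Hypothetical record shape for illustration only.
interface RunRecord {
  runId: string;
  finishedAt: string;
}

// Parses JSONL text into records, skipping blank and unparseable lines.
function parseRunsJsonl(text: string): RunRecord[] {
  const runs: RunRecord[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      runs.push(JSON.parse(line) as RunRecord);
    } catch {
      // @silent-fallback-ok: a partial line is a normal crash-during-append
      // artifact; skip it and keep the intact records. (Marker placement and
      // wording are assumed, not taken from the package.)
      continue;
    }
  }
  return runs;
}
```

Annotating the catch keeps the expected-condition silence visible to reviewers without moving the tracked baseline.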
@@ -0,0 +1,61 @@
1
+ # Side-Effects Review — Supervisor Respawn Guarantee (net #2)
2
+
3
+ **Version / slug:** `supervisor-respawn-guarantee`
4
+ **Date:** `2026-06-14`
5
+ **Author:** `echo`
6
+ **Second-pass reviewer:** `independent reviewer — Concur with the review (2026-06-14)`
7
+
8
+ > Second-pass verdict: **Concur.** Verified no double-spawn (5s first backoff < 10s tick; `spawnServer` kills any lingering session; circuit breaker bounds a crash-loop), no boot regression (`spawnServer` creates the tmux session synchronously via `execFileSync` before the loop arms, so a booting server always has a live session), correct ordering after the slept short-circuit and planned-restart suppression, stale `spawnedAt` does not corrupt the `lastHealthy < spawnedAt` bind-failure tracker, and `firstSpawnedAt` is anchored/cleared on all healthy paths. One non-blocking edge raised — a long hard-sleep while `firstSpawnedAt` is anchored could make the wall-clock ceiling fire immediately on the post-wake boot — **hardened in response:** a genuine (low-load) suspend/wake now re-anchors `firstSpawnedAt = now` in both the gap-check and the `SleepWakeDetector` wake handler (a real wake is a fresh boot episode), with two added regression assertions.
9
+
10
+ ## Summary of the change
11
+
12
+ `src/lifeline/ServerSupervisor.ts` — the in-process net that detects a dead server and respawns it. Three fixes, all in the 10s health-check loop, closing the 2026-06-14 ~2h outage where a CPU-starved box made the supervisor misread its own stalled event loop as a machine sleep/wake, reset `spawnedAt = now`, and pin itself in the startup-grace branch where health failures (including a vanished server tmux session) are ignored.
13
+
14
+ - **Fix A (load-bearing):** at the top of each tick, before the startup-grace early-return, probe `isServerSessionAlive()`. A missing session is unambiguous death → call `handleUnhealthy()` immediately (subject to its existing circuit-breaker / restart-attempt accounting), regardless of any grace pin.
15
+ - **Fix B:** make the gap-based sleep/wake detection (and the `SleepWakeDetector` `'wake'` handler) load-aware. A large inter-tick gap while `loadRatioProvider() > maxLoadRatio` (1.5) is classified as a stalled event loop, not a suspend — failure counters reset (safe) but `spawnedAt` is NOT reset (grace not re-armed). A low-load gap still re-arms grace (real-suspend behavior preserved).
16
+ - **Fix C:** an absolute grace ceiling — `firstSpawnedAt` anchors the first spawn of the current not-yet-healthy episode (never reset by sleep/wake handlers); cumulative grace is capped at `startupGraceMs × 3`. Cleared on the first healthy tick.
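
The three fixes above can be modeled as two small pure functions. This is a simplified sketch, not the code in `ServerSupervisor.ts`: `TickState`, `decideTick`, `classifyGap`, and the `gapThresholdMs` parameter are hypothetical, the `>=` comparison for the ceiling is an assumption, and "evaluate-health" stands in for whatever the real tick does once grace no longer applies.

```typescript
// Hypothetical, simplified model of one health tick.
interface TickState {
  now: number;                   // current time, ms
  spawnedAt: number;             // may be re-armed by a genuine sleep/wake
  firstSpawnedAt: number | null; // anchor of the not-yet-healthy episode
  startupGraceMs: number;
  sessionAlive: boolean;         // result of an isServerSessionAlive()-style probe
}

type TickAction = "respawn" | "wait-in-grace" | "evaluate-health";

function decideTick(s: TickState): TickAction {
  // Fix A: a missing session is unambiguous death; act before any grace check.
  if (!s.sessionAlive) return "respawn";
  // Fix C: cumulative grace is capped at 3x startupGraceMs from the first spawn.
  const ceilingHit =
    s.firstSpawnedAt !== null && s.now - s.firstSpawnedAt >= 3 * s.startupGraceMs;
  const inGrace = s.now - s.spawnedAt < s.startupGraceMs;
  if (inGrace && !ceilingHit) return "wait-in-grace";
  return "evaluate-health";
}

type GapKind = "normal" | "stalled-loop" | "suspend";

// Fix B: a large inter-tick gap under high load is a stalled event loop, not a
// suspend, so the caller should not reset spawnedAt for it.
function classifyGap(
  gapMs: number,
  gapThresholdMs: number, // assumed parameter; the real threshold is not stated
  loadRatio: number,
  maxLoadRatio = 1.5,
): GapKind {
  if (gapMs <= gapThresholdMs) return "normal";
  return loadRatio > maxLoadRatio ? "stalled-loop" : "suspend";
}
```

In this model a caller would re-arm `spawnedAt` only when `classifyGap` returns `"suspend"`, which matches the stated intent that a starved box can no longer pin the supervisor in the grace branch.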
17
+
18
+ Refactor: the inline `setInterval` callback was extracted verbatim into `private async runHealthTick()` so a single tick can be unit-driven and the wiring-integrity test can assert the loop probes session liveness.
19
+
20
+ ## Decision-point inventory
21
+
22
+ - `ServerSupervisor.runHealthTick` missing-session override (Fix A) — **add** — respawn-vs-wait decision now grounded in "does the tmux session exist?" before grace.
23
+ - `ServerSupervisor.runHealthTick` gap classification (Fix B) — **modify** — sleep/wake-vs-stalled-loop decision now consults load.
24
+ - `SleepWakeDetector 'wake'` handler `spawnedAt` reset (Fix B) — **modify** — same load guard.
25
+ - `ServerSupervisor.runHealthTick` grace early-return (Fix C) — **modify** — adds the absolute ceiling term.
26
+
27
+ ---
28
+
29
+ ## 1. Over-block
30
+
31
+ No outbound/inbound message block surface. The analogous "over-action" risk is **respawning a server that should have been left alone** (a false positive). Fix A only acts when the tmux session is genuinely absent; a normally-booting server has a live session (created synchronously at spawn; HTTP binds later), so a real boot is never killed — covered by the regression test "alive booting session is still given full grace." Fix B is strictly *less* aggressive than the prior code (it withholds a `spawnedAt` reset; it never adds a kill). Fix C only fires after 3× the (already generous 10-min) grace with the session never having gone healthy — a genuinely hung boot, not a slow one.
32
+
33
+ ## 2. Under-block
34
+
35
+ The "under-action" risk is **a dead server that still isn't respawned**. Remaining gaps, explicitly: (a) Fix A respawns only when the *server tmux session* is missing — a server process that is alive-but-wedged is still handled by the pre-existing `evaluateUnhealthyServer` path (unchanged), with its CPU-starvation defer; this PR does not change that path. (b) This is net #2 only — net #1 (a subsystem uncaught exception crashing the whole process) and net #3 (the fleet watchdog / launchd-level backstop) are tracked follow-ups in the spec §6 (FU-1, FU-2). <!-- tracked: CMT-1540 --> Net #3 was additionally found and fixed LIVE on the echo laptop this session (its launchd job was loaded from a reaped temp-dir plist → exit 127); the durable source fix is tracked in FU-2.
36
+
37
+ ## 3. Level-of-abstraction fit
38
+
39
+ Correct layer. The supervisor legitimately holds respawn **authority**; this change makes that authority *more reliable* by grounding the decision in an objective fact (session exists?) rather than a fragile inference (did the machine sleep?). It reuses the existing low-level primitives (`isServerSessionAlive`, `handleUnhealthy`, `loadRatioProvider`, `maxLoadRatio`) rather than re-implementing them — Fix B uses the *same* load signal the CPU-starvation defer and `SleepWakeDetector` already use. No new gate is introduced; no redundant config knob added (reused `maxLoadRatio = DEFAULT_MAX_LOAD_RATIO = 1.5`, which is exactly the spec's named `cpuStarvationLoadPerCore` default — a deliberate DRY decision vs. the spec's suggested new knob, since the value and signal are identical).
40
+
41
+ ## 4. Signal vs authority compliance
42
+
43
+ Compliant. Per `docs/signal-vs-authority.md`: the heuristic (sleep/wake inference) is *demoted* to where it is safe — leniency for an *existing* slow process — and can never suppress recovery of a *missing* process. The authoritative respawn decision is moved onto a non-heuristic structural fact (tmux session existence). This is the correct direction: replace willpower/heuristic with structure. No brittle check is given new blocking authority.
44
+
45
+ ## 5. Interactions
46
+
47
+ - **Does not shadow / is not shadowed:** Fix A runs *before* the grace early-return and `return`s on a missing session, so the rest of the tick is skipped on that path (intended — respawn is scheduled). The existing non-grace `evaluateUnhealthyServer` missing-session branch is unchanged and still covers the alive-but-unresponsive case.
48
+ - **Double-fire / hot-spin:** Fix A routes through `handleUnhealthy()`, which carries the full circuit-breaker, restart-attempt cap, cooldown, and planned-restart suppression. During the post-`handleUnhealthy` backoff (first attempt 5s < 10s tick) the session re-appears before the next tick, so no double-spawn; a genuine crash-loop trips the breaker exactly as today. Planned-restart / legacy-restart / slept-marker short-circuits all still precede or are honored by `handleUnhealthy`.
49
+ - **Counter resets:** Fix B still resets failure counters on a starved gap (safe — they may be stale); it only withholds the `spawnedAt`/grace re-arm. Fix C clears `firstSpawnedAt` on both healthy paths (grace optimistic-probe success and the main healthy branch).
50
+
51
+ ## 6. External surfaces
52
+
53
+ No API route, no message, no cross-agent surface, no schema change. Pure in-process lifeline behavior. Adds console log lines on the new branches (forensic only). No dependency on conversation state. The only timing dependency is system load (`os.loadavg()`), already used elsewhere and injectable in tests.
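
A load signal of this kind can be made injectable with a small provider type. The sketch below uses Node's real `os.loadavg()` and `os.cpus()` calls, but the per-core normalization, the use of the 1-minute average, and the names `LoadRatioProvider`, `systemLoadRatio`, and `isStarved` are assumptions for illustration rather than the package's implementation.

```typescript
import * as os from "node:os";

// A provider returns the current load ratio; tests can inject a fixed value.
type LoadRatioProvider = () => number;

// Assumed definition: 1-minute load average divided by logical core count.
// os.cpus() can be empty on some platforms, so guard against dividing by zero.
const systemLoadRatio: LoadRatioProvider = () => {
  const cores = Math.max(1, os.cpus().length);
  return os.loadavg()[0] / cores;
};

// True when the ratio exceeds the threshold (1.5 is the default cited above).
function isStarved(provider: LoadRatioProvider, maxLoadRatio = 1.5): boolean {
  return provider() > maxLoadRatio;
}
```

A test would pass a stub such as `() => 2.0` instead of `systemLoadRatio`, which keeps the tick logic deterministic regardless of the machine running the suite.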
54
+
55
+ ## 7. Multi-machine posture (Cross-Machine Coherence)
56
+
57
+ **Machine-local BY DESIGN.** The `ServerSupervisor` supervises THIS machine's server process; each machine runs its own supervisor and respawns its own server. There is no shared state, no replication, and no cross-machine read — a server's liveness is inherently a per-machine fact. Nothing here is user-facing (no one-voice gating needed), nothing is durable state that could strand on a topic transfer, and no URL is generated. This is the correct posture, not a silent single-machine assumption.
58
+
59
+ ## 8. Rollback cost
60
+
61
+ Low. Pure code change in one file + one new test file; no migration, no state schema, no config default change. Back-out = revert the commit and ship a patch release; the supervisor reverts to prior behavior with no data repair. The extracted `runHealthTick()` is behavior-identical to the prior inline callback, so even a partial revert is clean. The change only makes recovery *more* likely to fire, so the failure mode of a bug here is bounded by the pre-existing circuit breaker.