instar 1.3.577 → 1.3.579

Files changed (38)
  1. package/dist/commands/server.d.ts.map +1 -1
  2. package/dist/commands/server.js +13 -0
  3. package/dist/commands/server.js.map +1 -1
  4. package/dist/core/PostUpdateMigrator.d.ts +1 -0
  5. package/dist/core/PostUpdateMigrator.d.ts.map +1 -1
  6. package/dist/core/PostUpdateMigrator.js +108 -0
  7. package/dist/core/PostUpdateMigrator.js.map +1 -1
  8. package/dist/core/action-claim.d.ts +34 -0
  9. package/dist/core/action-claim.d.ts.map +1 -0
  10. package/dist/core/action-claim.js +118 -0
  11. package/dist/core/action-claim.js.map +1 -0
  12. package/dist/core/types.d.ts +9 -0
  13. package/dist/core/types.d.ts.map +1 -1
  14. package/dist/core/types.js.map +1 -1
  15. package/dist/monitoring/CommitmentTracker.d.ts.map +1 -1
  16. package/dist/monitoring/CommitmentTracker.js +12 -0
  17. package/dist/monitoring/CommitmentTracker.js.map +1 -1
  18. package/dist/monitoring/ResumeQueue.d.ts +50 -0
  19. package/dist/monitoring/ResumeQueue.d.ts.map +1 -1
  20. package/dist/monitoring/ResumeQueue.js +179 -3
  21. package/dist/monitoring/ResumeQueue.js.map +1 -1
  22. package/dist/monitoring/guardManifest.d.ts.map +1 -1
  23. package/dist/monitoring/guardManifest.js +17 -0
  24. package/dist/monitoring/guardManifest.js.map +1 -1
  25. package/dist/scaffold/templates.d.ts.map +1 -1
  26. package/dist/scaffold/templates.js +2 -0
  27. package/dist/scaffold/templates.js.map +1 -1
  28. package/dist/server/routes.d.ts.map +1 -1
  29. package/dist/server/routes.js +62 -0
  30. package/dist/server/routes.js.map +1 -1
  31. package/package.json +1 -1
  32. package/src/data/builtin-manifest.json +64 -64
  33. package/src/scaffold/templates.ts +2 -0
  34. package/src/templates/hooks/settings-template.json +10 -0
  35. package/upgrades/1.3.578.md +64 -0
  36. package/upgrades/1.3.579.md +63 -0
  37. package/upgrades/side-effects/action-claim-followthrough-sentinel.md +109 -0
  38. package/upgrades/side-effects/autonomous-run-outlives-session.md +114 -0
@@ -0,0 +1,64 @@
1
+ # Upgrade Guide — vNEXT
2
+
3
+ <!-- assembled-by: assemble-next-md -->
4
+ <!-- bump: patch -->
5
+
6
+ ## What Changed
7
+
8
+ A new constitutional standard ("An Autonomous Run Must Outlive Its Session") plus the
9
+ fix behind it. The mid-work resume queue (the system that revives a reaped autonomous
10
+ run, #1157) takes a host-local lock so two machines can't corrupt its shared state.
11
+ A machine RENAME used to leave a stale lock the queue mistook for a shared-volume
12
+ conflict — so it silently disabled the entire run-revival guard and never said so
13
+ (the 2026-06-15 incident). Two changes:
14
+
15
+ - **Rename-aware lock (GAP-D).** When the lock shows a different host, the queue now
16
+ distinguishes a single-host rename (provably host-local disk + dead pid + ≥5min
17
+ stale heartbeat → auto-heal the lock via an O_EXCL first-writer-wins takeover) from
18
+ a genuine shared-volume conflict (stay disabled). FAIL-CLOSED on any uncertainty
19
+ (unknown filesystem, `df` failure, live pid, fresh heartbeat). The original HARD
20
+ INVARIANT — never pid-probe a foreign-host lock — is fully preserved when auto-heal
21
+ is off. Ships **fleet-default OFF** (`monitoring.resumeQueue.autoHealStaleHostLock`);
22
+ the dev agent runs it dryRun-first (logs "would auto-heal" without rewriting) before going live.
23
+ - **A disabled revival queue is now LOUD (D2).** The queue self-reports to the
24
+ guard-posture inventory (`GUARD_MANIFEST` entry + `guardStatus()` + an unconditional
25
+ registration), so a disabled revival queue reads `off-runtime-divergent` on
26
+ `GET /guards` and raises one aggregated attention item — never silently inert.
27
+
28
+ No new route. New code-defaulted config key (kept out of ConfigDefaults to preserve
29
+ the fleet flip, consistent with #1157). Signal-only surfacing; the only authority is
30
+ the queue refusing to start itself (bounded self-recovery, fail-closed).
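
The rename-versus-conflict decision described above can be read as one small pure function. What follows is a minimal sketch, not the package's code: `LockObservation`, `decideForeignHostLock`, and every field name are hypothetical. Only the conditions come from the notes: host-local disk, dead pid, heartbeat stale for at least 5 minutes, fail-closed on anything uncertain, and a dryRun mode that logs without rewriting.

```typescript
// Hypothetical sketch of the foreign-host lock decision; all names are illustrative.
type FsLocality = "local" | "network" | "unknown";

interface LockObservation {
  fsLocality: FsLocality;        // result of the device-source check
  pidAlive: boolean | null;      // null = liveness could not be determined
  heartbeatAgeMs: number | null; // null = heartbeat could not be read
}

type LockDecision = "auto-heal" | "would-auto-heal" | "stay-disabled";

const STALE_HEARTBEAT_MS = 5 * 60 * 1000; // the 5-minute threshold from the notes

function decideForeignHostLock(
  obs: LockObservation,
  opts: { autoHeal: boolean; dryRun: boolean },
): LockDecision {
  // Auto-heal off: keep the original behavior and never heal.
  if (!opts.autoHeal) return "stay-disabled";

  // Every condition must be provably true; anything uncertain fails closed.
  const provableRename =
    obs.fsLocality === "local" &&
    obs.pidAlive === false &&
    obs.heartbeatAgeMs !== null &&
    obs.heartbeatAgeMs >= STALE_HEARTBEAT_MS;

  if (!provableRename) return "stay-disabled";
  return opts.dryRun ? "would-auto-heal" : "auto-heal";
}
```

Treating `null` as "not proven" is what makes the sketch fail-closed: an unreadable heartbeat or an undetermined pid lands in the same branch as a live conflict.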
31
+
32
+ ## What to Tell Your User
33
+
34
+ - **Your autonomous work survives a machine rename now** (dev agent): "If I rename or
35
+ restore this machine, the system that brings a reaped autonomous run back no longer
36
+ quietly switches itself off. On a provable same-machine rename it heals its own lock
37
+ (carefully — only on a local disk, with the old process gone); on anything uncertain
38
+ it stays cautious. And if that revival system is ever genuinely disabled, you'll see
39
+ it flagged on the guards view with one alert instead of silence." ⚗️ Experimental —
40
+ the self-heal ships dark on the fleet (dev-agent first) and rolls out more widely
41
+ only after it's proven safe.
42
+
43
+ ## Summary of New Capabilities
44
+
45
+ | Capability | How to Use |
46
+ |-----------|-----------|
47
+ | A machine rename auto-heals the resume-queue lock instead of silently disabling revival | Automatic on the dev agent (`monitoring.resumeQueue.autoHealStaleHostLock`; fleet default off) |
48
+ | A disabled revival queue surfaces as `off-runtime-divergent` | `GET /guards` (automatic) |
49
+ | Diagnose "why didn't my autonomous run come back?" | `GET /guards` + `GET /sessions/resume-queue` (disabled reason) |
50
+
51
+ ## Evidence
52
+
53
+ - Unit (`tests/unit/resume-queue-autoheal-lock.test.ts`, 11): the FD1 device-source
54
+ truth-table (local `/dev/*` → local; `//host`/`host:/path` → not; unknown/tmpfs/map →
55
+ fail-closed); auto-heal fires only on local-FS + dead-pid + stale-heartbeat; stays
56
+ disabled on a non-local FS or a live pid; dryRun logs-but-does-not-rewrite; auto-heal
57
+ off preserves today's behavior; `guardStatus()` reporting.
58
+ - Integration (`tests/integration/resume-queue-guard-posture.test.ts`, 3): a
59
+ runtime-disabled queue classifies `off-runtime-divergent` through the real
60
+ GUARD_MANIFEST entry + `deriveGuardRow`; a healthy queue does not.
61
+ - Regression: the full resume-queue unit + route suite (100 tests) stays green —
62
+ including the existing HARD-INVARIANT test that a foreign-host lock is never
63
+ pid-probed (which guards against re-introducing the cross-host probe). tsc clean;
64
+ `lint-guard-manifest` clean. Independent Phase-5 second-pass review concurred.
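
The FD1 device-source truth table listed in the evidence can be illustrated with a short classifier. This is a hedged re-creation for readers, not the package's `classifyDfSourceLocal`: the names `classifyDeviceSource` and `isProvablyLocal` and the exact patterns are assumptions. Only the three outcomes (local `/dev/*`, network `//host` or `host:/path`, everything else fail-closed) come from the notes.

```typescript
// Hypothetical re-creation of the device-source truth table; not the package's code.
type DeviceSource = "local" | "network" | "unknown";

function classifyDeviceSource(source: string): DeviceSource {
  const s = source.trim();
  if (s.startsWith("/dev/")) return "local";      // block device path
  if (s.startsWith("//")) return "network";       // SMB/CIFS style //host/share
  if (/^[^\/\s:]+:\//.test(s)) return "network";  // NFS style host:/path
  return "unknown";                               // tmpfs, map, empty, anything else
}

// Fail-closed: only a provably local source counts as host-local.
function isProvablyLocal(source: string): boolean {
  return classifyDeviceSource(source) === "local";
}
```

Collapsing "network" and "unknown" into the same `false` answer keeps the caller simple: it never has to decide what an unrecognized device string means.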
@@ -0,0 +1,63 @@
1
+ # Upgrade Guide — vNEXT
2
+
3
+ <!-- assembled-by: assemble-next-md -->
4
+ <!-- bump: patch -->
5
+
6
+ ## What Changed
7
+
8
+ A signal-only backstop for the word≠action gap: when the agent announces a CONCRETE
9
+ future action in a conversational turn ("I'll restart the server", "relaunching
10
+ now", "pushing the change"), a thin Stop hook posts the turn to a new server route
11
+ `POST /action-claim/observe`, which classifies the claim and opens an idempotent
12
+ follow-through **commitment** for the topic — so the existing PromiseBeacon + the
13
+ revival path make sure the action actually happens instead of being silently
14
+ dropped.
15
+
16
+ - `src/core/action-claim.ts` — deterministic, high-precision classifier (closed
17
+ concrete-verb set; first-person-scoped; sentence-initial-participle for the
18
+ "Relaunching now" idiom; rejects vague filler, past tense, quotes, and
19
+ imperatives/questions/third-person directed at others). Fails toward NOT
20
+ registering.
21
+ - `CommitmentTracker.record()` — idempotent `externalKey` create (the missing
22
+ dedupe primitive): a restated claim updates ONE commitment instead of spawning N.
23
+ - The route enforces a per-topic cap + a 6h auto-expiry. The hook ALWAYS exits 0 —
24
+ it never blocks a message. Off by default behind `messaging.actionClaim.enabled`.
25
+
26
+ Completed-action verification (checking "I already pushed it" against real evidence)
27
+ is a tracked follow-up <!-- tracked: CMT-1554-sibling action-claim-A2-evidence-primitive --> — it needs a per-turn tool-call-evidence primitive that
28
+ doesn't exist yet; the founding incident was a future-action claim, which this
29
+ covers.
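
The idempotent `externalKey` create can be pictured with a small in-memory model. This is a sketch under stated assumptions, not `CommitmentTracker` itself: `SketchTracker`, its fields, and the key string used in the usage note are hypothetical. The behavior it models is taken from the notes: a repeated key returns the existing open commitment, distinct keys stay distinct, and a missing key behaves as before.

```typescript
// Hypothetical in-memory model of an idempotent externalKey create.
interface SketchCommitment {
  id: string;
  text: string;
  externalKey: string | undefined;
  status: "open" | "closed";
}

class SketchTracker {
  private items: SketchCommitment[] = [];
  private nextId = 1;

  record(text: string, externalKey?: string): SketchCommitment {
    if (externalKey !== undefined) {
      // Only OPEN commitments dedupe; a closed one with the same key mints a fresh entry.
      const existing = this.items.find(
        (c) => c.externalKey === externalKey && c.status === "open",
      );
      if (existing !== undefined) return existing; // early return, no new id
    }
    const created: SketchCommitment = {
      id: `c${this.nextId++}`,
      text,
      externalKey,
      status: "open",
    };
    this.items.push(created);
    return created;
  }

  close(id: string): void {
    const found = this.items.find((c) => c.id === id);
    if (found !== undefined) found.status = "closed";
  }

  getActive(): SketchCommitment[] {
    return this.items.filter((c) => c.status === "open");
  }
}
```

In this model, calling `record("restart the server", "actionclaim:demo")` twice yields one active commitment, while two calls without a key yield two.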
30
+
31
+ ## What to Tell Your User
32
+
33
+ - **A concrete thing I say I'll do becomes a tracked promise** (when enabled): "If I
34
+ say I'll restart/push/deploy/merge something, that now opens a tracked commitment
35
+ so I actually follow through — instead of it being a sentence that might quietly
36
+ never happen. It's careful (vague 'I'll take a look' doesn't trigger it), it never
37
+ blocks my messages, and it de-duplicates so restating the same thing doesn't pile
38
+ up reminders." ⚗️ Experimental — ships off by default; enabled on the dev agent
39
+ first to prove it out before any fleet rollout.
40
+
41
+ ## Summary of New Capabilities
42
+
43
+ | Capability | How to Use |
44
+ |-----------|-----------|
45
+ | A concrete future-action claim opens a tracked follow-through commitment | Automatic when `messaging.actionClaim.enabled` (off by default) |
46
+ | Idempotent — a restated claim updates one commitment, not many | Automatic (`externalKey` dedupe) |
47
+ | Diagnose "why did a commitment appear when I said I'd restart X?" | `GET /commitments` — that's this sentinel tracking the stated action |
48
+
49
+ ## Evidence
50
+
51
+ - Unit (`tests/unit/action-claim.test.ts`, 9): classifier both sides — catches the
52
+ founding "Relaunching now" + first-person near-future forms + participle
53
+ normalization; rejects vague filler, past tense, quotes, AND
54
+ imperatives/questions/third-person directed at others (the Phase-5 precision fix).
55
+ - Unit (`tests/unit/CommitmentTracker-externalKey-dedupe.test.ts`, 3): `record()`
56
+ returns the existing open commitment on a repeated `externalKey`; distinct keys →
57
+ distinct; no-key → unchanged behavior.
58
+ - Integration (`tests/integration/action-claim-route.test.ts`, 6): `POST
59
+ /action-claim/observe` over the real HTTP pipeline — flag-off no-op, register,
60
+ dedupe, non-claim no-op, per-topic cap, 400.
61
+ - Regression: 87 existing CommitmentTracker tests green; tsc clean; settings-template
62
+ valid JSON. Phase-5 second-pass review raised a precision concern (third-person
63
+ false positives) which was fixed and re-verified.
@@ -0,0 +1,109 @@
1
+ # Side-Effects Review — Action-Claim Follow-Through Sentinel (P2)
2
+
3
+ Spec: `docs/specs/action-claim-followthrough-sentinel.md` (converged + approved).
4
+ Change: a thin Stop hook posts each finished conversational turn to a new server
5
+ route `POST /action-claim/observe`, which classifies a CONCRETE future-action claim
6
+ ("I'll restart it", "relaunching now") and opens an idempotent follow-through
7
+ commitment. Signal-only, dark by default. (A2 — completed-action verification —
8
+ is DESCOPED, tracked: no per-turn evidence primitive exists.)
9
+
10
+ Files:
11
+ - `src/core/action-claim.ts` — `classifyActionClaim` + `classifyDfSourceLocal`-style deterministic classifier (FD2/FD4).
12
+ - `src/monitoring/CommitmentTracker.ts` — `record()` idempotent `externalKey` create (FD3, the missing dedupe primitive).
13
+ - `src/server/routes.ts` — `POST /action-claim/observe` (flag-gated, server-side classify + idempotent create + per-topic cap + expiry; signal-only).
14
+ - `src/core/PostUpdateMigrator.ts` — `getActionClaimFollowthroughHook()` + migrateHooks deploy + migrateSettings Stop-register + migrateClaudeMd awareness.
15
+ - `src/templates/hooks/settings-template.json` — Stop entry (new agents).
16
+ - `src/scaffold/templates.ts` — generateClaudeMd awareness line.
17
+ - Tests: `tests/unit/action-claim.test.ts` (9), `tests/unit/CommitmentTracker-externalKey-dedupe.test.ts` (3), `tests/integration/action-claim-route.test.ts` (6).
18
+
19
+ ## 1. Over-block
20
+ Nothing is blocked — the hook ALWAYS `exit(0)`; the route never blocks a send. The
21
+ only "over-fire" risk is registering a spurious commitment. Mitigated by FD2 (closed
22
+ concrete-verb set; vague filler like "I'll take a look" does NOT trigger; fail toward
23
+ NOT-registering on ambiguity) + FD3 (dedupe + auto-expiry + per-topic cap). Verified
24
+ by the unit truth-table (both sides) + the integration no-op/cap tests.
25
+
26
+ ## 2. Under-block
27
+ Misses: (a) completed-action claims ("I already pushed it") — A2, deliberately
28
+ descoped (no per-turn evidence channel; tracked); (b) creatively-worded future
29
+ claims outside the closed verb set — accepted under-coverage (precision over recall,
30
+ since a false commitment nags). Both are the safe direction.
31
+
32
+ ## 3. Level-of-abstraction fit
33
+ Correct. The thin hook mirrors the proven `response-review.js` Stop-hook siting; the
34
+ classifier + dedupe run SERVER-SIDE (a plain-JS hook can't import the TS classifier);
35
+ the dedupe lives IN `CommitmentTracker.record()` (the one writer of that store); the
36
+ follow-through rides the EXISTING PromiseBeacon + revival path rather than a new
37
+ mechanism. No new notification surface.
38
+
39
+ ## 4. Signal vs authority compliance (docs/signal-vs-authority.md)
40
+ COMPLIANT. The hook is a pure side-effect POST that never emits `decision:block`
41
+ (always exit 0). The route opens a commitment (a signal/record) and never gates,
42
+ delays, or rewrites a message. The classifier is a brittle deterministic matcher used
43
+ ONLY as a signal — never as blocking authority. The whole feature is off by default.
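
The "pure side-effect POST, always exit 0" property can be shown in a few lines. This is a minimal sketch, assuming the network call is injected as a function: `runActionClaimHook` and `Poster` are hypothetical names, and the real hook's payload and config read are not reproduced here.

```typescript
// Hypothetical Stop-hook core: a side-effect POST that can never block a message.
type Poster = (turnText: string) => Promise<void>;

async function runActionClaimHook(
  enabled: boolean,
  turnText: string,
  post: Poster,
): Promise<number> {
  if (!enabled) return 0; // flag off: no-op at the first config read
  try {
    await post(turnText); // signal only; the response is ignored
  } catch {
    // Swallow every failure: an unreachable server must not affect the turn.
  }
  return 0; // the exit code is 0 on every path
}
```

Because every path returns 0, the function has no way to express a block decision, which is the signal-only property this section claims for the hook.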
44
+
45
+ ## 5. Interactions
46
+ - Builds ON the existing `detectTimePromise`/PromiseBeacon path rather than a second
47
+ classifier; the dedupe `externalKey` is tagged `actionclaim:` so the per-topic cap
48
+ can count its own commitments without colliding with other externalKey users.
49
+ - `record()` idempotency is additive: absent `externalKey` → unchanged behavior
50
+ (verified — 87 existing CommitmentTracker tests still pass + a no-key test).
51
+ - The Stop hook is registered AFTER the existing Stop hooks (stop-gate-router stays
52
+ first); it can't shadow them (it never blocks).
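
The per-topic cap that counts only `actionclaim:`-tagged commitments, together with the 6h auto-expiry, can be sketched as two small predicates. The cap value, the `OpenCommitment` shape, and both function names are assumptions made for illustration; the notes give only the tag prefix and the 6h window.

```typescript
// Hypothetical cap and expiry checks; the cap value and record shape are assumed.
interface OpenCommitment {
  topicId: string;
  externalKey: string | undefined;
  createdAtMs: number;
}

const ACTION_CLAIM_PREFIX = "actionclaim:";  // tag named in the notes
const AUTO_EXPIRY_MS = 6 * 60 * 60 * 1000;   // the 6h auto-expiry
const PER_TOPIC_CAP = 3;                     // assumed value, not from the source

function isExpired(c: OpenCommitment, nowMs: number): boolean {
  return nowMs - c.createdAtMs >= AUTO_EXPIRY_MS;
}

function canRegisterActionClaim(
  open: OpenCommitment[],
  topicId: string,
  nowMs: number,
): boolean {
  // Count only this feature's own, unexpired commitments for the topic.
  const counted = open.filter(
    (c) =>
      c.topicId === topicId &&
      c.externalKey !== undefined &&
      c.externalKey.startsWith(ACTION_CLAIM_PREFIX) &&
      !isExpired(c, nowMs),
  );
  return counted.length < PER_TOPIC_CAP;
}
```

Filtering on the prefix means commitments created by other `externalKey` users never consume this feature's budget.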
53
+
54
+ ## 6. External surfaces
55
+ - New route `POST /action-claim/observe` (auth'd like all routes). New Stop hook
56
+ registered in `.claude/settings.json` (new + existing agents via migrateSettings).
57
+ New config keys under `messaging.actionClaim.*` (read with safe defaults).
58
+ - No change visible to other agents/users when the flag is off (the fleet default).
59
+
60
+ ## 7. Multi-machine posture (Cross-Machine Coherence)
61
+ MACHINE-LOCAL BY DESIGN. The hook fires on the machine running the conversational
62
+ turn and registers a commitment in THAT machine's CommitmentTracker — exactly where
63
+ the turn happened and where the PromiseBeacon that follows it through runs. Commitment
64
+ cross-machine replication (if ever wanted) rides the existing `stateSync` family,
65
+ out of scope here. No URLs/notices that must survive a machine boundary.
66
+
67
+ ## 8. Rollback cost
68
+ Trivial. The feature is off unless `messaging.actionClaim.enabled` is set; setting it
69
+ back to false (or absent) fully disables it — the hook no-ops at its first config
70
+ read, the route returns `feature-disabled`. The `record()` dedupe is inert without an
71
+ `externalKey`. No migration, no data repair.
72
+
73
+ ## Decisions (mine, per the run's full preapproval)
74
+ - **No `migrateConfig` entry for the flag.** Absent = off is the correct dark default,
75
+ consistent with how the resume-queue keys are deliberately kept out of ConfigDefaults
76
+ to preserve the fleet flip. The dev agent enables `messaging.actionClaim.enabled`
77
+ explicitly to soak before any fleet default flip (a separate reviewed decision).
78
+ - **A2 descoped + tracked** — building it would lean on a per-turn evidence primitive
79
+ that doesn't exist (the P1 lesson); the founding incident was a future-action claim,
80
+ which the v1 feature covers.
81
+
82
+ ## Test coverage (Testing Integrity)
83
+ - Unit: classifier truth-table both sides (FD2/FD4) + dedupe idempotency (FD3).
84
+ - Integration: `POST /action-claim/observe` over the real HTTP pipeline — flag-off
85
+ no-op, register, dedupe (restated claim → same commitment), non-claim no-op,
86
+ per-topic cap, 400 on bad input.
87
+ - E2E (`tests/e2e/action-claim-lifecycle.test.ts`, 2): boots a REAL Express server on
88
+ a real port and hits `POST /action-claim/observe` — the "feature is alive"
89
+ assertion (200, not 404/503) + a concrete claim opens a real commitment
90
+ (`getActive()` for the topic) + a benign message registers nothing.
91
+ - Regression: 87 existing CommitmentTracker tests green; tsc clean; settings-template valid JSON.
92
+
93
+ ## Second-pass review
94
+ **Concern raised → FIXED → re-verified.** The independent Phase-5 reviewer confirmed
95
+ signal-only (hook always exit 0, route never blocks), correct `record()` idempotency
96
+ (early-return before id/emit; `getActive()` excludes terminal so a terminal same-key
97
+ mints fresh; CAS untouched; 87 existing tests green), per-topic cap counts only
98
+ `actionclaim:`-tagged commitments, and full Migration Parity wiring — BUT found a real
99
+ FD2 precision bug: the third classifier regex (bare present-participle + trailer) was
100
+ NOT first-person scoped, so it false-positived on imperatives/questions/third-person
101
+ ("Did you restart it?", "Please merge the PR", "He is deploying it", "The script
102
+ reverts …"). Left unfixed, enabling the flag would mint spurious follow-through
103
+ commitments — the exact false-commitment-nag FD2 exists to prevent.
104
+
105
+ FIX: the third regex now requires a SENTENCE-INITIAL PARTICIPLE (`(?:^|[.!?]\s+)` +
106
+ the `-ing` form only) — keeps the founding "Relaunching now" / "Done. Pushing it now."
107
+ and rejects all eight flagged false positives. Re-verified: classifier unit tests
108
+ 9/9 (added the third-person/imperative/interrogative rows + a sentence-initial-after-
109
+ boundary row), route integration 6/6 — 15 green. Verdict after fix: concur.
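
The effect of the sentence-initial-participle fix can be reproduced with a heavily reduced classifier. This is an illustrative sketch only: the verb list, both patterns, and the name `looksLikeActionClaim` are assumptions, and the quote and past-tense handling of the real classifier is not modeled. The `(?:^|[.!?]\s+)` anchor is the one quoted in the fix above.

```typescript
// Hypothetical, heavily reduced classifier; the verb list and patterns are illustrative.
const VERB = "(?:restart|relaunch|push|deploy|merge|revert)";
const PARTICIPLE = "(?:restarting|relaunching|pushing|deploying|merging|reverting)";

// First-person future: "I'll restart ...", "I will push ...", "I'm going to deploy ...".
const FIRST_PERSON_FUTURE = new RegExp(
  `\\bI(?:['’]ll| will|['’]m going to| am going to)\\s+${VERB}\\b`,
  "i",
);

// Sentence-initial participle: start of text, or right after . ! ? plus whitespace.
const SENTENCE_INITIAL_PARTICIPLE = new RegExp(
  `(?:^|[.!?]\\s+)${PARTICIPLE}\\b`,
  "i",
);

function looksLikeActionClaim(text: string): boolean {
  const t = text.trim();
  return FIRST_PERSON_FUTURE.test(t) || SENTENCE_INITIAL_PARTICIPLE.test(t);
}
```

With the anchor in place, "He is deploying it" no longer matches, because "deploying" is neither at the start of the text nor directly after sentence-ending punctuation.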
@@ -0,0 +1,114 @@
1
+ # Side-Effects Review — autonomous-run-outlives-session
2
+
3
+ Spec: `docs/specs/autonomous-run-outlives-session.md` (converged + approved).
4
+ Change: GAP-D — the resume-queue host-lock distinguishes a single-host RENAME
5
+ (auto-heal) from a genuine shared-volume conflict (stay disabled), fail-closed;
6
+ a disabled revival queue self-reports to the guard-posture inventory; + the
7
+ constitutional standard "An Autonomous Run Must Outlive Its Session".
8
+
9
+ Files:
10
+ - `src/monitoring/ResumeQueue.ts` — `classifyDfSourceLocal` + `isStateDirHostLocalDefault` (FD1), foreign-host rename-vs-conflict classifier (FD2), `takeOverLockAtomic` (FD4), `guardStatus()` (D2), `autoHealStaleHostLock` config field (FD5).
11
+ - `src/monitoring/guardManifest.ts` — `GUARD_MANIFEST` entry `monitoring.resumeQueue.enabled` (component `ResumeQueue`).
12
+ - `src/commands/server.ts` — dev-gate resolves `autoHealStaleHostLock`; UNCONDITIONAL `guardRegistry.register` for the queue.
13
+ - `src/core/types.ts` — `autoHealStaleHostLock?` config field.
14
+ - `docs/STANDARDS-REGISTRY.md` — the new standard.
15
+ - `src/scaffold/templates.ts` + `src/core/PostUpdateMigrator.ts` — Agent Awareness line (new + deployed agents).
16
+ - Tests: `tests/unit/resume-queue-autoheal-lock.test.ts` (11), `tests/integration/resume-queue-guard-posture.test.ts` (3).
17
+
18
+ ## 1. Over-block (what legitimate inputs does this reject that it shouldn't?)
19
+ The auto-heal is STRICTLY ADDITIVE and gated: it can only turn a currently-DISABLED
20
+ foreign-host case into an enabled one. It never disables a case that previously
21
+ worked. The risk direction is "fails to heal a legitimate rename" → the queue
22
+ stays disabled exactly as today (no regression), just with a louder surface.
23
+ Fail-closed on any uncertainty (unknown FS, df failure, live pid, fresh heartbeat)
24
+ means some genuine renames won't auto-heal — acceptable: the operator clears the
25
+ lock manually as before, and the guard-posture alert now tells them to.
26
+
27
+ ## 2. Under-block (what failure modes does this still miss?)
28
+ - pid recycling (FD3, accepted): a recycled dead pid that maps to a live unrelated
29
+ process reads as a live conflict → stays disabled + LOUD (safe direction; worst
30
+ case a false escalation, never corruption).
31
+ - The narrow double-boot unlink race in `takeOverLockAtomic` (two server boots on
32
+ one machine within ms of each other post-rename): O_EXCL gives EEXIST to the
33
+ loser in the common case; the residual double-unlink window is backstopped by
34
+ the next-acquire live-pid + heartbeat check. Not corruption — at worst a
35
+ transient re-evaluation.
36
+ - Genuine shared-volume setups where `df -P` reports a device string we don't
37
+ recognize as network: classified unknown → NOT local → stays disabled (correct).
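
The O_EXCL first-writer-wins behavior, and the residual unlink window mentioned above, can be seen in a short Node.js sketch. These helpers are hypothetical and are not the package's `takeOverLockAtomic`; the lock file name is invented for the example. The sketch relies only on the documented `"wx"` flag of `fs.openSync`, which fails with `EEXIST` when the file already exists.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helpers; not the package's takeOverLockAtomic.

// Create the lock only if it does not exist. "wx" maps to O_CREAT | O_EXCL | O_WRONLY,
// so exactly one concurrent caller succeeds and the others see EEXIST.
function createLockExclusive(lockPath: string, payload: string): boolean {
  try {
    const fd = fs.openSync(lockPath, "wx");
    try {
      fs.writeSync(fd, payload);
    } finally {
      fs.closeSync(fd);
    }
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}

// Take over a stale lock: unlink it, then race for the exclusive create.
// Two callers can both unlink before either creates; that is the narrow window
// the review describes, and only one of them wins the create.
function takeOverLockSketch(lockPath: string, payload: string): boolean {
  try {
    fs.unlinkSync(lockPath);
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
  }
  return createLockExclusive(lockPath, payload);
}

// Usage: a scratch directory stands in for the queue's state dir (file name is made up).
const scratchDir = fs.mkdtempSync(path.join(os.tmpdir(), "lock-sketch-"));
const sketchLockPath = path.join(scratchDir, "resume-queue.lock");
```

The unlink and the create are two separate system calls, so the takeover as a whole is not atomic; only the create step is. That is why a later live-pid and heartbeat check is still needed as a backstop.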
38
+
39
+ ## 3. Level-of-abstraction fit
40
+ Correct layer. The lock classifier lives IN `ResumeQueue.acquireLock` (the only
41
+ place that owns the lock), and the surfacing rides the EXISTING guard-posture
42
+ inventory (GUARD_MANIFEST + GuardRegistry + GuardPostureProbe) rather than a new
43
+ parallel alert path. No new notification surface invented — it feeds the one that
44
+ already aggregates and dedups (Bounded Notification Surface).
45
+
46
+ ## 4. Signal vs authority compliance (docs/signal-vs-authority.md)
47
+ COMPLIANT. The auto-heal is bounded SELF-RECOVERY of the queue's own lock with a
48
+ fail-closed default — not a brittle gate holding blocking authority over agent
49
+ behavior or message flow. The guard-posture surfacing is a pure SIGNAL-producer
50
+ (it reports a disabled state; it never blocks, delays, or rewrites anything). The
51
+ default `autoHealStaleHostLock:false` keeps the behavior change off the fleet until
52
+ proven; the dev-agent runs it dryRun-first (logs intent without rewriting).
53
+
54
+ ## 5. Interactions
55
+ - Preserves the original HARD INVARIANT (never pid-probe a foreign lock) when
56
+ auto-heal is OFF — verified by the existing `resume-queue.test.ts:417` invariant
57
+ test (which initially regressed and was fixed by gating all probing behind
58
+ `autoHealStaleHostLock`).
59
+ - The new GUARD_MANIFEST entry passes `lint-guard-manifest` (the drainer is not
60
+ auto-flagged, so no orphan NOT_A_GUARD entry — which would itself fail the lint).
61
+ - `guardRegistry.register` is UNCONDITIONAL (even when start() returns false) so a
62
+ lock-disabled queue reads `off-runtime-divergent`, not `missing`.
63
+ - Does NOT touch `evidenceEligible` / the #1157 revival path — strictly the lock
64
+ gate. No double-fire with the existing same-host stale reclaim (that path is
65
+ unchanged; this is the foreign-host branch only).
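
Why the registration has to be unconditional is easier to see with a toy posture derivation. This is a sketch under assumptions, not the package's `deriveGuardRow`: the function name, the `SketchGuardStatus` shape, and the `on` and `off-config` labels are invented for illustration. Only `off-runtime-divergent` and `missing` appear in the notes.

```typescript
// Hypothetical posture derivation; the real deriveGuardRow may differ.
type SketchPosture = "on" | "off-config" | "off-runtime-divergent" | "missing";

interface SketchGuardStatus {
  running: boolean;
  disabledReason: string | undefined;
}

function derivePostureSketch(
  configEnabled: boolean,
  registered: SketchGuardStatus | undefined, // undefined = component never registered
): SketchPosture {
  if (!configEnabled) return "off-config";
  if (registered === undefined) return "missing";
  // Config says on, runtime says off: the divergence is what gets surfaced.
  return registered.running ? "on" : "off-runtime-divergent";
}
```

If registration were skipped whenever `start()` returned false, a lock-disabled queue would fall into the `missing` branch and the reason for the disablement would be lost.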
66
+
67
+ ## 6. External surfaces
68
+ - New config key `monitoring.resumeQueue.autoHealStaleHostLock` (fleet default
69
+ false). No new route. `GET /guards` and `GET /sessions/resume-queue` gain a
70
+ truthful disabled-state read; no schema break (additive).
71
+ - CLAUDE.md template + migrator add one awareness bullet (new + deployed agents).
72
+ - No external network/timing dependence beyond a single bounded (3000ms) `df -P`
73
+ at lock-acquisition.
74
+
75
+ ## 7. Multi-machine posture (Cross-Machine Coherence)
76
+ MACHINE-LOCAL BY DESIGN, and that is the whole point: the resume-queue lock + its
77
+ state dir are deliberately host-local (a shared volume across two hosts is
78
+ unsupported — the invariant this change PROTECTS). The fix makes the host-local
79
+ assumption ROBUST to a rename of the SAME machine without ever weakening the
80
+ cross-host protection (a genuine foreign live host still disables). guardStatus is
81
+ read per-machine; each machine's `/guards` reports its own queue. No replication
82
+ needed or wanted (a lock is intrinsically local).
83
+
84
+ ## 8. Rollback cost
85
+ Cheap and immediate. `monitoring.resumeQueue.autoHealStaleHostLock:false` (the
86
+ fleet default) fully disables the new auto-heal — reverting to today's
87
+ disable-on-mismatch behavior — with no restart-data implications (config read at
88
+ queue construction; next server start picks it up). The guard-posture surfacing is
89
+ inert when the queue is healthy and harmless when disabled (it only reads state).
90
+ No migration, no data repair. The constitutional-standard doc + CLAUDE.md lines are
91
+ documentation (no runtime surface).
92
+
93
+ ## Test coverage (Testing Integrity)
94
+ - Unit: `resume-queue-autoheal-lock.test.ts` — FD1 truth-table; auto-heal on
95
+ provable rename; stays-disabled on non-local FS / live pid / fresh heartbeat;
96
+ dryRun no-rewrite; auto-heal-off preserves original behavior; guardStatus.
97
+ - Integration: `resume-queue-guard-posture.test.ts` — a runtime-disabled queue
98
+ classifies `off-runtime-divergent` through the real GUARD_MANIFEST entry +
99
+ `deriveGuardRow` (the route's path); a healthy queue does not.
100
+ - E2E: the existing `tests/e2e/resume-idle-autonomous-lifecycle.test.ts` exercises
101
+ the queue alive end-to-end; this change is additive and those tests still pass. (A dedicated
102
+ boot-with-stale-lock E2E is a candidate enhancement; the unit+integration tiers
103
+ cover the new logic and its wiring.)
104
+ - Regression: full resume-queue unit + route suite (100 tests) green; tsc clean;
105
+ lint-guard-manifest clean.
106
+
107
+ ## Second-pass review
108
+ **Concur with the review.** Independent Phase-5 audit (guard/recovery path) verified, citing code:
109
+ 1. Auto-heal can NEVER fire on a genuine shared volume with a live remote holder — `fsLocal` is dispositive and `&&`-short-circuits before any pid probe; `df -P` on a network mount never reports `/dev/*`.
110
+ 2. The HARD INVARIANT (never pid-probe a foreign lock) is preserved when auto-heal is OFF (the fleet default) — all probing is gated behind `if (this.cfg.autoHealStaleHostLock)`; the existing invariant test (default config) still asserts `probed===false`.
111
+ 3. `takeOverLockAtomic` O_EXCL first-writer-wins is correct; the only residual is the documented narrow double-boot unlink window, backstopped by the next-acquire live-pid+heartbeat check — transient/self-correcting, never durable corruption.
112
+ 4. Signal-vs-Authority compliant — the only authority is the queue refusing to start itself (bounded self-recovery, fail-closed); `guardStatus()` is a pure signal producer.
113
+ 5. No common-path regression — a healthy boot never enters the foreign-host branch and never calls `df`.
114
+ Verdict: sound, fail-closed in the right direction, well-tested on both sides of every decision boundary, safely gated.