npm - gsd-pi - Versions diffs - 2.78.1-dev.d8826a445 → 2.78.1-dev.eccf86e27 - Mend

gsd-pi 2.78.1-dev.d8826a445 → 2.78.1-dev.eccf86e27

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (121)

package/README.md CHANGED

@@ -322,7 +322,7 @@ The database is authoritative for milestones, slices, tasks, requirements, decis
 3. **Git isolation** — When `git.isolation` is set to `worktree` or `branch`, each milestone runs on its own `milestone/<MID>` branch (in a worktree or in-place). All slice work commits sequentially — no branch switching, no merge conflicts. When the milestone completes, it's squash-merged to main as one clean commit. The default is `none` (work on the current branch), configurable via preferences. If `worktree` is configured in a repo with no committed `HEAD`, GSD temporarily behaves as `none` until the first commit exists because git worktrees need a committed start point.
-4. **Crash recovery** — A lock file tracks the current unit. If the session dies, the next `/gsd auto` reads the surviving session file, synthesizes a recovery briefing from every tool call that made it to disk, and resumes with full context. Parallel orchestrator state is persisted to disk with PID liveness detection, so multi-worker sessions survive crashes too. In headless mode, crashes trigger automatic restart with exponential backoff (default 3 attempts).
+4. **Crash recovery** — Auto mode persists worker state, unit-dispatch state, and paused-session metadata in the project-root SQLite database. If the session dies, the next `/gsd auto` reconstructs the interrupted unit from DB-backed runtime state, reads the surviving session file, synthesizes a recovery briefing from every tool call that made it to disk, and resumes with full context. Parallel orchestrator IPC still lives under `.gsd/parallel/`, so multi-worker sessions survive crashes too. In headless mode, crashes trigger automatic restart with exponential backoff (default 3 attempts).
 5. **Provider error recovery** — Transient provider errors (rate limits, 500/503 server errors, overloaded) auto-resume after a delay. Permanent errors (auth, billing) pause for manual review. The model fallback chain retries transient network errors before switching models.
@@ -414,7 +414,7 @@ gsd
 /gsd queue      # queue the next milestone
 ```
-Both terminals read and write the same `.gsd/` files on disk. Your decisions in terminal 2 are picked up automatically at the next phase boundary — no need to stop auto mode.
+Both terminals coordinate through the same project-root GSD runtime on local disk. The SQLite database is authoritative, `.gsd/` markdown is refreshed from it, and your decisions in terminal 2 are picked up at the next phase boundary without stopping auto mode.
 ### Headless mode — CI and scripts
@@ -439,7 +439,7 @@ gsd headless dispatch plan
 Headless auto-responds to interactive prompts, detects completion, and exits with structured codes: `0` complete, `1` error/timeout, `2` blocked. Auto-restarts on crash with exponential backoff. Use `gsd headless query` for instant, machine-readable state inspection — returns phase, next dispatch preview, and parallel worker costs as a single JSON object without spawning an LLM session. Pair with [remote questions](./docs/user-docs/remote-questions.md) to route decisions to Slack or Discord when human input is needed.
-**Multi-session orchestration** — headless mode supports file-based IPC in `.gsd/parallel/` for coordinating multiple GSD workers across milestones. Build orchestrators that spawn, monitor, and budget-cap a fleet of GSD workers.
+**Multi-session orchestration** — headless mode supports DB-backed coordination across multiple GSD workers on the same machine. Worker registration, milestone leases, unit dispatch tracking, and command delivery live in `.gsd/gsd.db`, while `.gsd/parallel/` remains a local runtime area for per-milestone locks and isolation artifacts.
 ### First launch
@@ -705,8 +705,6 @@ The best practice for working in teams is to ensure unique milestone names acros
 ```bash
 # ── GSD: Runtime / Ephemeral (per-developer, per-session) ──────────────────
-# Crash detection sentinel — PID lock, written per auto-mode session
-.gsd/auto.lock
 # Auto-mode dispatch tracker — prevents re-running completed units (includes archived per-milestone files)
 .gsd/completed-units*.json
 # State manifest — workflow state for recovery
@@ -717,11 +715,11 @@ The best practice for working in teams is to ensure unique milestone names acros
 .gsd/metrics.json
 # Raw JSONL session dumps — crash recovery forensics, auto-pruned
 .gsd/activity/
-# Unit execution records — dispatch phase, timeouts, recovery tracking
+# Unit execution records — dispatch phase, timeouts, and recovery tracking
 .gsd/runtime/
 # Git worktree working copies
 .gsd/worktrees/
-# Parallel orchestration IPC and worker status
+# Parallel runtime locks and per-milestone isolation artifacts
 .gsd/parallel/
 # SQLite database and WAL sidecars — authoritative runtime state, local only
 .gsd/gsd.db*
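The README now places worker registration and milestone leases in `.gsd/gsd.db`, and the `loop.js` hunks heartbeat the worker and refresh its lease on every iteration. Below is a minimal in-memory sketch of token-based leasing. It assumes a lease is one row per milestone that holds a worker ID, a token, and an expiry time. `LeaseTable`, the TTL, and the integer tokens are hypothetical and are not the package's schema.

```javascript
// Hypothetical in-memory model of milestone leases. The package keeps its
// leases in .gsd/gsd.db with a schema this diff does not show.
class LeaseTable {
  constructor(ttlMs = 30_000) {
    this.ttlMs = ttlMs;
    this.leases = new Map(); // milestoneId -> { workerId, token, expiresAt }
    this.nextToken = 1;
  }

  // Grant a fresh token if the lease is free, expired, or already ours.
  acquire(workerId, milestoneId, now = Date.now()) {
    const cur = this.leases.get(milestoneId);
    if (cur && cur.expiresAt > now && cur.workerId !== workerId) return null;
    const token = this.nextToken++;
    this.leases.set(milestoneId, { workerId, token, expiresAt: now + this.ttlMs });
    return token;
  }

  // Heartbeat: extend the expiry only while the caller holds the current token.
  refresh(workerId, milestoneId, token, now = Date.now()) {
    const cur = this.leases.get(milestoneId);
    if (!cur || cur.workerId !== workerId || cur.token !== token) return false;
    cur.expiresAt = now + this.ttlMs;
    return true;
  }
}
```

`refresh` checks the token. As a result, a worker whose lease lapsed and was granted to someone else cannot quietly extend the new holder's lease. It has to notice the failed refresh and re-acquire.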

package/dist/help-text.js CHANGED

@@ -156,7 +156,7 @@ const SUBCOMMAND_HELP = {
         '  gsd headless --answers answers.json auto              With pre-supplied answers',
         '  gsd headless --events agent_end,extension_ui_request auto   Filtered event stream',
         '  gsd headless query                              Instant JSON state snapshot',
-        '  gsd headless recover                            Rebuild DB hierarchy from markdown (mutating)',
+        '  gsd headless recover                            Reset hierarchy + validation/gates, then rebuild from markdown',
         '',
         'Exit codes: 0 = success, 1 = error/timeout, 10 = blocked, 11 = cancelled',
     ].join('\n'),
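The help text lists the headless exit codes as `0` success, `1` error/timeout, `10` blocked, and `11` cancelled. The unchanged README context above still gives `2` for blocked. The sketch below shows a small lookup that a CI wrapper might use. The function names are hypothetical, and the codes follow the help text.

```javascript
// Exit codes as printed by `gsd headless` help in this release.
const HEADLESS_EXIT_CODES = Object.freeze({
  0: "success",
  1: "error/timeout",
  10: "blocked",
  11: "cancelled",
});

// Hypothetical helper: turn an exit status into a readable label.
function describeHeadlessExit(code) {
  return HEADLESS_EXIT_CODES[code] ?? `unknown (${code})`;
}

// Hypothetical policy: only a blocked run needs a human to answer something.
function needsHuman(code) {
  return code === 10;
}
```

In Node, the status could come from `spawnSync("gsd", ["headless", "auto"]).status` (`node:child_process`). That value is `null` when the child is killed by a signal, so a wrapper should handle that case on its own path.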

package/dist/resource-loader.js CHANGED

@@ -29,6 +29,9 @@ export function getExtensionKey(entryPath, extensionsDir) {
     const relPath = relative(extensionsDir, entryPath);
     return relPath.split(/[\\/]/)[0].replace(/\.(?:ts|js)$/, '');
 }
+function stripSemverBuildMetadata(version) {
+    return version.trim().replace(/^v/, '').split(/[+-]/, 1)[0] || '0.0.0';
+}
 function getManagedResourceManifestPath(agentDir) {
     return join(agentDir, resourceVersionManifestName);
 }
@@ -166,7 +169,9 @@ export function getNewerManagedResourceVersion(agentDir, currentVersion) {
     if (!managedVersion) {
         return null;
     }
-    return compareSemver(managedVersion, currentVersion) > 0 ? managedVersion : null;
+    // Managed resources stamped from the same release line should remain usable
+    // against local dev binaries like 2.78.1-dev.<sha>.
+    return compareSemver(stripSemverBuildMetadata(managedVersion), stripSemverBuildMetadata(currentVersion)) > 0 ? managedVersion : null;
 }
 /**
  * Recursively makes all files and directories under dirPath owner-writable.
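`getNewerManagedResourceVersion` now compares only the `major.minor.patch` core of the two versions. `compareSemver` is not shown here. If it follows SemVer precedence, `2.78.1` ranks above `2.78.1-dev.<sha>`, so a manifest stamped from the release line would otherwise look newer than a local dev binary. `stripSemverBuildMetadata` cuts at the first `-` as well as at `+`. It therefore removes pre-release tags, not only build metadata. The sketch below reuses that function verbatim and pairs it with `compareCore`, a hypothetical numeric comparator that stands in for `compareSemver`.

```javascript
// Verbatim from the diff: drop a leading "v", then cut at the first "+" or "-".
function stripSemverBuildMetadata(version) {
  return version.trim().replace(/^v/, '').split(/[+-]/, 1)[0] || '0.0.0';
}

// Hypothetical stand-in for compareSemver: numeric major.minor.patch compare.
function compareCore(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Same decision as getNewerManagedResourceVersion, minus the manifest read.
function newerManagedVersion(managedVersion, currentVersion) {
  const cmp = compareCore(
    stripSemverBuildMetadata(managedVersion),
    stripSemverBuildMetadata(currentVersion),
  );
  return cmp > 0 ? managedVersion : null;
}
```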

package/dist/resources/.managed-resources-content-hash CHANGED

@@ -1 +1 @@
-3cb2810818585c65
+36cc9805e706289c

package/dist/resources/extensions/gsd/auto/detect-stuck.js CHANGED

@@ -4,19 +4,53 @@
  * Leaf node in the import DAG.
  */
 import { summarizeLogs } from "../workflow-logger.js";
+import { getLatestForUnit } from "../db/unit-dispatches.js";
 /**
  * Pattern matching ENOENT errors with a file path.
  * Matches: "ENOENT: no such file or directory, access '/path/to/file'"
  * and similar Node.js filesystem error messages.
  */
 const ENOENT_PATH_RE = /ENOENT[^']*'([^']+)'/;
+/**
+ * Phase B / codex review MEDIUM B3 — retry coupling.
+ *
+ * If unit_dispatches has a recent failed dispatch for `unitKey` whose
+ * retry budget is not yet exhausted (attempt_n < max_attempts) AND whose
+ * scheduled next_run_at is still in the future, the loop is legitimately
+ * waiting on its own backoff. Suppress the stuck verdict in that case so
+ * the retry budget can fully drain before we declare stuck.
+ *
+ * Returns true if the dispatch ledger says we should suppress the stuck
+ * signal; false (no suppression) when the ledger is unavailable or has
+ * no opinion.
+ */
+function retryBudgetSuppresses(unitKey) {
+    try {
+        const latest = getLatestForUnit(unitKey);
+        if (!latest)
+            return false;
+        if (latest.attempt_n >= latest.max_attempts)
+            return false;
+        if (!latest.next_run_at)
+            return false;
+        const nextRun = Date.parse(latest.next_run_at);
+        if (!Number.isFinite(nextRun))
+            return false;
+        return nextRun > Date.now();
+    }
+    catch {
+        return false;
+    }
+}
 /**
  * Analyze a sliding window of recent unit dispatches for stuck patterns.
  * Returns a signal with reason if stuck, null otherwise.
  *
  * Rule 1: Same error string twice in a row → stuck immediately.
  * Rule 2: Same unit key 3+ consecutive times → stuck (preserves prior behavior).
- * Rule 2b: Same unit key appears 3+ times anywhere in the active window → stuck.
+ * Rule 2b: Same unit key appears 3+ times anywhere in the active window → stuck,
+ *          UNLESS unit_dispatches says we're inside the retry-backoff window
+ *          (codex review MEDIUM B3 — Phase B retry coupling).
  * Rule 3: Oscillation A→B→A→B in last 4 entries → stuck.
  * Rule 4: Same ENOENT path in any 2 entries within the window → stuck (#3575).
  *         Missing files don't self-heal between retries — retrying wastes budget.
@@ -39,19 +73,21 @@ export function detectStuck(window) {
             reason: `Same error repeated: ${last.error.slice(0, 200)}${suffix}`,
         };
     }
-    // Rule 2: Same unit 3+ consecutive times
+    // Rule 2: Same unit 3+ consecutive times — suppressed if unit_dispatches
+    // says we're inside the retry-backoff window (codex MEDIUM B3).
     if (window.length >= 3) {
         const lastThree = window.slice(-3);
-        if (lastThree.every((u) => u.key === last.key)) {
+        if (lastThree.every((u) => u.key === last.key) && !retryBudgetSuppresses(last.key)) {
             return {
                 stuck: true,
                 reason: `${last.key} derived 3 consecutive times without progress${suffix}`,
             };
         }
     }
-    // Rule 2b: Same unit key 3+ times anywhere in the active window
+    // Rule 2b: Same unit key 3+ times anywhere in the active window — same
+    // retry-budget suppression as Rule 2.
     const countInWindow = window.filter((entry) => entry.key === last.key).length;
-    if (countInWindow >= 3) {
+    if (countInWindow >= 3 && !retryBudgetSuppresses(last.key)) {
         return {
             stuck: true,
             reason: `${last.key} derived ${countInWindow} times in last ${window.length} attempts without progress${suffix}`,
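Rules 2 and 2b of `detectStuck` now defer to the dispatch ledger. A repeated unit is not flagged while its latest failed dispatch still has retry budget left and a `next_run_at` in the future. The sketch below restates `retryBudgetSuppresses` with the ledger lookup and the clock injected, so it runs standalone. `lookup` stands in for `getLatestForUnit`, `sameKeyStuck` is a hypothetical reduction of Rule 2b, and the unit keys and timestamps in the usage are made up.

```javascript
// retryBudgetSuppresses from the diff, with the ledger lookup and clock injected.
function retryBudgetSuppresses(unitKey, lookup, now = Date.now()) {
  try {
    const latest = lookup(unitKey);
    if (!latest) return false;                                 // no ledger row
    if (latest.attempt_n >= latest.max_attempts) return false; // budget spent
    if (!latest.next_run_at) return false;                     // no backoff scheduled
    const nextRun = Date.parse(latest.next_run_at);
    if (!Number.isFinite(nextRun)) return false;               // unparseable timestamp
    return nextRun > now;                                      // still backing off
  } catch {
    return false;                                              // ledger unavailable
  }
}

// Hypothetical reduction of Rule 2b: 3+ hits for the last key, unless suppressed.
function sameKeyStuck(window, lookup, now) {
  const last = window[window.length - 1];
  const count = window.filter((entry) => entry.key === last.key).length;
  return count >= 3 && !retryBudgetSuppresses(last.key, lookup, now);
}
```

Every failure path returns `false`, so an unreachable ledger falls back to the previous behavior, which flags the unit.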

package/dist/resources/extensions/gsd/auto/loop.js CHANGED

@@ -16,49 +16,55 @@ import { ModelPolicyDispatchBlockedError } from "../auto-model-selection.js";
 import { resolveEngine } from "../engine-resolver.js";
 import { logWarning } from "../workflow-logger.js";
 import { gsdRoot } from "../paths.js";
+import { heartbeatAutoWorker } from "../db/auto-workers.js";
+import { recordDispatchClaim, markRunning as markDispatchRunning, markCompleted as markDispatchCompleted, markFailed as markDispatchFailed, getRecentForUnit as getRecentDispatchesForUnit, getRecentUnitKeysForProjectRoot, } from "../db/unit-dispatches.js";
+import { refreshMilestoneLease } from "../db/milestone-leases.js";
+import { getRuntimeKv, setRuntimeKv } from "../db/runtime-kv.js";
 import { atomicWriteSync } from "../atomic-write.js";
 import { resolveUokFlags } from "../uok/flags.js";
 import { scheduleSidecarQueue } from "../uok/execution-graph.js";
 import { ExecutionGraphScheduler } from "../uok/execution-graph.js";
-import { readFileSync, writeFileSync, mkdirSync, unlinkSync } from "node:fs";
+import { readFileSync, mkdirSync, unlinkSync } from "node:fs";
 import { join } from "node:path";
+import { normalizeRealPath } from "../paths.js";
 // ── Stuck detection persistence (#3704) ──────────────────────────────────
-// Persist stuck detection state to disk so it survives session restarts.
-// Without this, restarting auto-mode resets all counters, allowing the
-// same blocked unit to burn a full retry budget each session.
-function stuckStatePath(basePath) {
-    return join(gsdRoot(basePath), "runtime", "stuck-state.json");
+// Phase C migration: stuck-state.json deleted in favor of DB-backed
+// equivalents. recentUnits is rebuilt from unit_dispatches (Phase B
+// ledger) on session start; stuckRecoveryAttempts persists in runtime_kv
+// under a stable project scope (soft state per the runtime_kv invariant). Single-host
+// SQLite WAL only — multi-host would need a real coordinator.
+//
+// When no worker is registered (DB unavailable, fresh project), both
+// helpers degrade to the empty-state fallback that #3704 already
+// tolerates — same behavior as a fresh session.
+const STUCK_RECOVERY_ATTEMPTS_KEY = "stuck_recovery_attempts";
+const RECENT_UNIT_KEYS_LIMIT = 20;
+function stableStuckStateScopeId(s) {
+    return normalizeRealPath(s.scope?.workspace.projectRoot ?? (s.originalBasePath || s.basePath));
 }
-function loadStuckState(basePath) {
+function loadStuckState(s) {
+    const scopeId = stableStuckStateScopeId(s);
+    if (!scopeId)
+        return { recentUnits: [], stuckRecoveryAttempts: 0 };
     try {
-        const data = JSON.parse(readFileSync(stuckStatePath(basePath), "utf-8"));
-        // Only load state written by a DIFFERENT process (real session restart).
-        // If the PID matches the current process, this state was written by an earlier
-        // autoLoop call in the same process (e.g., a test that completed before this
-        // one), not by a crashed session — skip it to prevent test state pollution.
-        if (data.pid === process.pid) {
-            return { recentUnits: [], stuckRecoveryAttempts: 0 };
-        }
-        return {
-            recentUnits: Array.isArray(data.recentUnits) ? data.recentUnits : [],
-            stuckRecoveryAttempts: typeof data.stuckRecoveryAttempts === "number" ? data.stuckRecoveryAttempts : 0,
-        };
+        const recentUnits = getRecentUnitKeysForProjectRoot(scopeId, RECENT_UNIT_KEYS_LIMIT);
+        const stuckRecoveryAttempts = getRuntimeKv("global", scopeId, STUCK_RECOVERY_ATTEMPTS_KEY) ?? 0;
+        return { recentUnits, stuckRecoveryAttempts };
     }
     catch (err) {
         debugLog("autoLoop", { phase: "load-stuck-state-failed", error: err instanceof Error ? err.message : String(err) });
         return { recentUnits: [], stuckRecoveryAttempts: 0 };
     }
 }
-function saveStuckState(basePath, state) {
+function saveStuckState(s, state) {
+    const scopeId = stableStuckStateScopeId(s);
+    if (!scopeId)
+        return;
+    // recentUnits is automatically derived from unit_dispatches by the
+    // dispatch ledger writes in openDispatchClaim — no separate persistence
+    // needed. Only the soft retry counter needs a runtime_kv row.
     try {
-        const filePath = stuckStatePath(basePath);
-        mkdirSync(join(gsdRoot(basePath), "runtime"), { recursive: true });
-        writeFileSync(filePath, JSON.stringify({
-            pid: process.pid,
-            recentUnits: state.recentUnits.slice(-20), // keep last 20 entries
-            stuckRecoveryAttempts: state.stuckRecoveryAttempts,
-            updatedAt: new Date().toISOString(),
-        }) + "\n");
+        setRuntimeKv("global", scopeId, STUCK_RECOVERY_ATTEMPTS_KEY, state.stuckRecoveryAttempts);
     }
     catch (err) {
         debugLog("autoLoop", { phase: "save-stuck-state-failed", error: err instanceof Error ? err.message : String(err) });
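`loadStuckState` and `saveStuckState` now take the session and key everything by a normalized project root. Recent unit keys come from the `unit_dispatches` ledger (limit 20), and the recovery counter lives in `runtime_kv`. A missing scope or any DB error falls back to the empty state that a fresh session starts with. Below is a standalone sketch of that load/save pair. `store` and `makeMemoryStore` are hypothetical stand-ins for the DB helpers, and this diff does not show the shape of the `recentUnits` entries.

```javascript
const STUCK_RECOVERY_ATTEMPTS_KEY = "stuck_recovery_attempts";
const RECENT_UNIT_KEYS_LIMIT = 20;
const emptyState = () => ({ recentUnits: [], stuckRecoveryAttempts: 0 });

// Read recent units from the (hypothetical) ledger and the counter from kv.
function loadStuckState(scopeId, store) {
  if (!scopeId) return emptyState();
  try {
    return {
      recentUnits: store.recentUnitKeys(scopeId, RECENT_UNIT_KEYS_LIMIT),
      stuckRecoveryAttempts: store.getKv("global", scopeId, STUCK_RECOVERY_ATTEMPTS_KEY) ?? 0,
    };
  } catch {
    return emptyState(); // DB unavailable: behave like a fresh session
  }
}

// Only the counter is written; recentUnits is derived from the ledger.
function saveStuckState(scopeId, store, state) {
  if (!scopeId) return;
  try {
    store.setKv("global", scopeId, STUCK_RECOVERY_ATTEMPTS_KEY, state.stuckRecoveryAttempts);
  } catch {
    // best effort; the diff logs the failure via debugLog
  }
}

// Hypothetical in-memory store for trying the pair out.
function makeMemoryStore(unitKeys = []) {
  const kv = new Map();
  return {
    recentUnitKeys: (_scopeId, limit) => unitKeys.slice(-limit),
    getKv: (scope, id, key) => kv.get(`${scope}|${id}|${key}`),
    setKv: (scope, id, key, value) => { kv.set(`${scope}|${id}|${key}`, value); },
  };
}
```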
@@ -115,6 +121,57 @@ function saveCustomVerifyRetryCounts(s) {
         }
     }
 }
+function openDispatchClaim(s, flowId, turnId, iterData) {
+    if (!s.workerId || s.milestoneLeaseToken === null)
+        return { kind: "degraded" };
+    const mid = iterData.mid;
+    if (!mid)
+        return { kind: "degraded" };
+    const recent = getRecentDispatchesForUnit(iterData.unitId, 1);
+    const attemptN = (recent[0]?.attempt_n ?? 0) + 1;
+    let claim;
+    try {
+        claim = recordDispatchClaim({
+            traceId: flowId,
+            turnId,
+            workerId: s.workerId,
+            milestoneLeaseToken: s.milestoneLeaseToken,
+            milestoneId: mid,
+            sliceId: iterData.state.activeSlice?.id ?? null,
+            taskId: iterData.state.activeTask?.id ?? null,
+            unitType: iterData.unitType,
+            unitId: iterData.unitId,
+            attemptN,
+        });
+        if (!claim.ok) {
+            debugLog("autoLoop", {
+                phase: "dispatch-claim-rejected",
+                unitId: iterData.unitId,
+                reason: claim.error,
+                existingId: "existingId" in claim ? claim.existingId : undefined,
+                existingWorker: "existingWorker" in claim ? claim.existingWorker : undefined,
+            });
+            if (claim.error === "already_active") {
+                return {
+                    kind: "skip",
+                    reason: "already-active",
+                    existingId: claim.existingId,
+                    existingWorker: claim.existingWorker,
+                };
+            }
+            return { kind: "skip", reason: "stale-lease" };
+        }
+        markDispatchRunning(claim.dispatchId);
+        return { kind: "opened", dispatchId: claim.dispatchId };
+    }
+    catch (err) {
+        debugLog("autoLoop", {
+            phase: "dispatch-claim-failed",
+            error: err instanceof Error ? err.message : String(err),
+        });
+        return { kind: "degraded" };
+    }
+}
 // ── Memory pressure monitoring (#3331) ──────────────────────────────────
 // Check heap usage every N iterations and trigger graceful shutdown before
 // the OS OOM killer sends SIGKILL. The threshold is 90% of the V8 heap
@@ -220,7 +277,7 @@ export async function autoLoop(ctx, pi, s, deps, options) {
     let iteration = 0;
     const dispatchContract = options?.dispatchContract ?? "legacy-direct";
     // Load persisted stuck state so counters survive session restarts (#3704)
-    const persisted = loadStuckState(s.basePath);
+    const persisted = loadStuckState(s);
     const loopState = {
         recentUnits: persisted.recentUnits,
         stuckRecoveryAttempts: persisted.stuckRecoveryAttempts,
@@ -232,6 +289,23 @@ export async function autoLoop(ctx, pi, s, deps, options) {
     while (s.active) {
         iteration++;
         debugLog("autoLoop", { phase: "loop-top", iteration });
+        // Phase B: heartbeat the worker registry + active milestone lease so
+        // janitors and concurrent workers see a live process. Best-effort —
+        // DB unavailability or stale state must not stop the loop.
+        if (s.workerId) {
+            try {
+                heartbeatAutoWorker(s.workerId);
+                if (s.currentMilestoneId && s.milestoneLeaseToken) {
+                    refreshMilestoneLease(s.workerId, s.currentMilestoneId, s.milestoneLeaseToken);
+                }
+            }
+            catch (err) {
+                debugLog("autoLoop", {
+                    phase: "heartbeat-failed",
+                    error: err instanceof Error ? err.message : String(err),
+                });
+            }
+        }
         // ── Journal: per-iteration flow grouping ──
         const flowId = randomUUID();
         let seqCounter = 0;
@@ -299,6 +373,8 @@ export async function autoLoop(ctx, pi, s, deps, options) {
             finishTurn("stopped", "manual-attention", "missing-command-context");
             break;
         }
+        let dispatchId = null;
+        let dispatchSettled = false;
         try {
             // ── Blanket try/catch: one bad iteration must not kill the session
             const prefs = deps.loadEffectiveGSDPreferences()?.preferences;
@@ -359,7 +435,17 @@ export async function autoLoop(ctx, pi, s, deps, options) {
                     activeEngineId: s.activeEngineId,
                     activeRunDir: s.activeRunDir,
                 });
-                const engineState = await engine.deriveState(s.basePath);
+                const engineState = await engine.deriveState(s.canonicalProjectRoot);
+                debugLog("autoLoop", {
+                    phase: "post-derive",
+                    site: "custom-engine-derive",
+                    basePath: s.basePath,
+                    originalBasePath: s.originalBasePath,
+                    scopeProjectRoot: s.scope?.workspace.projectRoot,
+                    canonicalProjectRoot: s.canonicalProjectRoot,
+                    derivedPhase: engineState.phase,
+                    isComplete: engineState.isComplete,
+                });
                 if (engineState.isComplete) {
                     await deps.stopAuto(ctx, pi, "Workflow complete");
                     break;
@@ -375,7 +461,15 @@ export async function autoLoop(ctx, pi, s, deps, options) {
                 }
                 // dispatch.action === "dispatch"
                 const step = dispatch.step;
-                const gsdState = await deps.deriveState(s.basePath);
+                const gsdState = await deps.deriveState(s.canonicalProjectRoot);
+                debugLog("autoLoop", {
+                    phase: "post-derive",
+                    site: "custom-engine-gsd-state",
+                    basePath: s.basePath,
+                    canonicalProjectRoot: s.canonicalProjectRoot,
+                    derivedPhase: gsdState.phase,
+                    activeUnit: gsdState.activeTask?.id ?? gsdState.activeSlice?.id ?? gsdState.activeMilestone?.id,
+                });
                 iterData = {
                     unitType: step.unitType,
                     unitId: step.unitId,
@@ -478,7 +572,7 @@ export async function autoLoop(ctx, pi, s, deps, options) {
                 consecutiveCooldowns = 0;
                 recentErrorMessages.length = 0;
                 deps.emitJournalEvent({ ts: new Date().toISOString(), flowId, seq: nextSeq(), eventType: "iteration-end", data: { iteration } });
-                saveStuckState(s.basePath, loopState); // persist across session restarts (#3704)
+                saveStuckState(s, loopState); // persist across session restarts (#3704)
                 debugLog("autoLoop", { phase: "iteration-complete", iteration });
                 if (reconcileResult.outcome === "milestone-complete") {
                     await deps.stopAuto(ctx, pi, "Workflow complete");
@@ -552,7 +646,15 @@ export async function autoLoop(ctx, pi, s, deps, options) {
             }
             else {
                 // ── Sidecar path: use values from the sidecar item directly ──
-                const sidecarState = await deps.deriveState(s.basePath);
+                const sidecarState = await deps.deriveState(s.canonicalProjectRoot);
+                debugLog("autoLoop", {
+                    phase: "post-derive",
+                    site: "sidecar",
+                    basePath: s.basePath,
+                    canonicalProjectRoot: s.canonicalProjectRoot,
+                    derivedPhase: sidecarState.phase,
+                    activeUnit: sidecarState.activeTask?.id ?? sidecarState.activeSlice?.id ?? sidecarState.activeMilestone?.id,
+                });
                 iterData = {
                     unitType: sidecarItem.unitType,
                     unitId: sidecarItem.unitId,
@@ -573,7 +675,39 @@ export async function autoLoop(ctx, pi, s, deps, options) {
                 });
             }
             await enforceMinRequestInterval(s, prefs);
-            const unitPhaseResult = await runUnitPhaseViaContract(dispatchContract, ic, iterData, loopState, sidecarItem);
+            // Phase B: claim a unit_dispatches row before invoking the unit. The
+            // partial unique index idx_unit_dispatches_active_per_unit prevents
+            // a second worker from claiming the same unit concurrently. Returns
+            // null when DB unavailable, no worker registered, or no active lease
+            // — those degraded paths fall through to the existing single-worker
+            // semantics with no ledger entry, preserving back-compat.
+            const dispatchClaim = openDispatchClaim(s, flowId, turnId, iterData);
+            if (dispatchClaim.kind === "skip") {
+                finishTurn("skipped", "execution", dispatchClaim.reason);
+                continue;
+            }
+            dispatchId = dispatchClaim.kind === "opened" ? dispatchClaim.dispatchId : null;
+            let unitPhaseResult;
+            try {
+                unitPhaseResult = await runUnitPhaseViaContract(dispatchContract, ic, iterData, loopState, sidecarItem);
+            }
+            catch (err) {
+                if (err instanceof ModelPolicyDispatchBlockedError) {
+                    throw err;
+                }
+                if (dispatchId !== null) {
+                    try {
+                        markDispatchFailed(dispatchId, {
+                            errorSummary: `exception:${err instanceof Error ? err.message : String(err)}`,
+                        });
+                        dispatchSettled = true;
+                    }
+                    catch (ledgerErr) {
+                        debugLog("autoLoop", { phase: "dispatch-ledger-write-failed", error: ledgerErr instanceof Error ? ledgerErr.message : String(ledgerErr) });
+                    }
+                }
+                throw err;
+            }
             if (unitPhaseResult.action === "next") {
                 const requestTimestamp = unitPhaseResult.data.requestDispatchedAt ?? unitPhaseResult.data.unitStartedAt;
                 if (typeof requestTimestamp === "number")
@@ -584,11 +718,37 @@ export async function autoLoop(ctx, pi, s, deps, options) {
                 unitId: iterData.unitId,
             });
             if (unitPhaseResult.action === "break") {
+                if (dispatchId !== null) {
+                    try {
+                        markDispatchFailed(dispatchId, { errorSummary: "unit-break" });
+                        dispatchSettled = true;
+                    }
+                    catch (err) {
+                        debugLog("autoLoop", { phase: "dispatch-ledger-write-failed", error: err instanceof Error ? err.message : String(err) });
+                    }
+                }
                 finishTurn("stopped", "execution", "unit-break");
                 break;
             }
             // ── Phase 5: Finalize ───────────────────────────────────────────────
-            const finalizeResult = await runFinalize(ic, iterData, loopState, sidecarItem);
+            let finalizeResult;
+            try {
+                finalizeResult = await runFinalize(ic, iterData, loopState, sidecarItem);
+            }
+            catch (err) {
+                if (dispatchId !== null) {
+                    try {
+                        markDispatchFailed(dispatchId, {
+                            errorSummary: `exception:${err instanceof Error ? err.message : String(err)}`,
+                        });
+                        dispatchSettled = true;
+                    }
+                    catch (ledgerErr) {
+                        debugLog("autoLoop", { phase: "dispatch-ledger-write-failed", error: ledgerErr instanceof Error ? ledgerErr.message : String(ledgerErr) });
+                    }
+                }
+                throw err;
+            }
             deps.uokObserver?.onPhaseResult("finalize", finalizeResult.action, {
                 unitType: iterData.unitType,
                 unitId: iterData.unitId,
@@ -597,24 +757,63 @@ export async function autoLoop(ctx, pi, s, deps, options) {
                 const finalizeFailureClass = finalizeResult.reason === "git-closeout-failure"
                     ? "git"
                     : "closeout";
+                if (dispatchId !== null) {
+                    try {
+                        markDispatchFailed(dispatchId, { errorSummary: `finalize-break:${finalizeResult.reason ?? "unknown"}` });
+                        dispatchSettled = true;
+                    }
+                    catch (err) {
+                        debugLog("autoLoop", { phase: "dispatch-ledger-write-failed", error: err instanceof Error ? err.message : String(err) });
+                    }
+                }
                 finishTurn("stopped", finalizeFailureClass, "finalize-break");
                 break;
             }
             if (finalizeResult.action === "continue") {
+                if (dispatchId !== null) {
+                    try {
+                        markDispatchFailed(dispatchId, { errorSummary: "finalize-retry" });
+                        dispatchSettled = true;
+                    }
+                    catch (err) {
+                        debugLog("autoLoop", { phase: "dispatch-ledger-write-failed", error: err instanceof Error ? err.message : String(err) });
+                    }
+                }
                 finishTurn("retry");
                 continue;
             }
+            if (dispatchId !== null) {
+                try {
+                    markDispatchCompleted(dispatchId);
+                    dispatchSettled = true;
+                }
+                catch (err) {
+                    debugLog("autoLoop", { phase: "dispatch-ledger-write-failed", error: err instanceof Error ? err.message : String(err) });
+                }
+            }
             consecutiveErrors = 0; // Iteration completed successfully
             consecutiveCooldowns = 0;
             recentErrorMessages.length = 0;
             deps.emitJournalEvent({ ts: new Date().toISOString(), flowId, seq: nextSeq(), eventType: "iteration-end", data: { iteration } });
-            saveStuckState(s.basePath, loopState); // persist across session restarts (#4382)
+            saveStuckState(s, loopState); // persist across session restarts (#4382)
             debugLog("autoLoop", { phase: "iteration-complete", iteration });
             finishTurn("completed");
         }
         catch (loopErr) {
             // ── Blanket catch: absorb unexpected exceptions, apply graduated recovery ──
             const msg = loopErr instanceof Error ? loopErr.message : String(loopErr);
+            if (dispatchId !== null && !dispatchSettled && !(loopErr instanceof ModelPolicyDispatchBlockedError)) {
+                try {
+                    markDispatchFailed(dispatchId, { errorSummary: `unhandled-error:${msg.slice(0, 200)}` });
+                    dispatchSettled = true;
+                }
+                catch (err) {
+                    debugLog("autoLoop", {
+                        phase: "dispatch-ledger-write-failed",
+                        error: err instanceof Error ? err.message : String(err),
+                    });
+                }
+            }
             // Always emit iteration-end on error so the journal records iteration
             // completion even on failure (#2344). Without this, errors in
             // runFinalize leave the journal incomplete, making diagnosis harder.
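Each iteration now claims a `unit_dispatches` row before it runs the unit, and it settles that row exactly once:

- completed on success;
- failed on `unit-break`, `finalize-break`, `finalize-retry`, or an exception.

The `dispatchSettled` flag keeps the blanket catch from writing a second time. The in-memory sketch below models that lifecycle synchronously, although the real loop awaits the unit. `DispatchLedger`, the status names, and `runIteration` are hypothetical. The sketch leaves out two cases. The first is the degraded path (no worker, lease, or DB), where `dispatchId` stays `null`. The second is `ModelPolicyDispatchBlockedError`, which the loop rethrows without marking the row failed.

```javascript
// Hypothetical in-memory stand-in for unit_dispatches. The one-active-row-per-unit
// check emulates what the diff attributes to idx_unit_dispatches_active_per_unit.
const ACTIVE = new Set(["claimed", "running"]);

class DispatchLedger {
  constructor() {
    this.rows = [];
  }

  claim(unitId, workerId) {
    const active = this.rows.find((r) => r.unitId === unitId && ACTIVE.has(r.status));
    if (active) {
      return { ok: false, error: "already_active", existingId: active.id, existingWorker: active.workerId };
    }
    const prev = this.rows.filter((r) => r.unitId === unitId).at(-1);
    const row = { id: this.rows.length + 1, unitId, workerId, status: "claimed", attemptN: (prev?.attemptN ?? 0) + 1 };
    this.rows.push(row);
    return { ok: true, dispatchId: row.id };
  }

  // Move an active row to a new status; settled rows are never rewritten.
  transition(id, status, errorSummary = null) {
    const row = this.rows.find((r) => r.id === id);
    if (!row || !ACTIVE.has(row.status)) throw new Error(`dispatch ${id} is not active`);
    row.status = status;
    row.errorSummary = errorSummary;
  }
}

// One iteration: claim, run, then settle exactly once whatever happens.
function runIteration(ledger, unitId, workerId, runUnit) {
  const claim = ledger.claim(unitId, workerId);
  if (!claim.ok) return { skipped: claim.error };
  ledger.transition(claim.dispatchId, "running");
  let settled = false;
  try {
    runUnit();
    ledger.transition(claim.dispatchId, "completed");
    settled = true;
    return { dispatchId: claim.dispatchId, status: "completed" };
  } finally {
    // Runs only when runUnit threw; the error still propagates to the caller.
    if (!settled) ledger.transition(claim.dispatchId, "failed", "unhandled-error");
  }
}
```

A rejected `claim` mirrors the `already_active` result that `openDispatchClaim` turns into a skipped turn. As in the diff, `attemptN` is one more than the unit's latest row.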

package/dist/resources/extensions/gsd/auto/phases.js CHANGED

@@ -289,8 +289,10 @@ export async function runPreDispatch(ic, loopState) {
         s.currentMilestoneId) {
         deps.syncProjectRootToWorktree(s.originalBasePath, s.basePath, s.currentMilestoneId);
     }
-    // Derive state
-    let state = await deps.deriveState(s.basePath);
+    // Derive state — use canonical project root so the cache key is stable
+    // across worktree↔project-root path-form alternation. See PR #5236
+    // (workspace handle infrastructure) and the Phase A pt 2 plan.
+    let state = await deps.deriveState(s.canonicalProjectRoot);
     const { getDeepStageGate } = await import("../auto-dispatch.js");
     const deepStageGate = getDeepStageGate(prefs, s.basePath);
     const canRunDeepSetupGate = state.phase === "pre-planning" ||
@@ -324,7 +326,7 @@ export async function runPreDispatch(ic, loopState) {
         let compiled = ensurePlanV2Graph(s.basePath, state);
         if (isEmptyPlanV2GraphResult(compiled)) {
             deps.invalidateAllCaches();
-            state = await deps.deriveState(s.basePath);
+            state = await deps.deriveState(s.canonicalProjectRoot);
             compiled = shouldRunPlanV2Gate(state.phase)
                 ? ensurePlanV2Graph(s.basePath, state)
                 : {
@@ -477,7 +479,7 @@ export async function runPreDispatch(ic, loopState) {
         }
         // PR creation (auto_pr) is handled inside mergeMilestoneToMain (#2302)
         deps.invalidateAllCaches();
-        state = await deps.deriveState(s.basePath);
+        state = await deps.deriveState(s.canonicalProjectRoot);
         mid = state.activeMilestone?.id;
         midTitle = state.activeMilestone?.title;
         if (mid) {
@@ -596,7 +598,7 @@ export async function runPreDispatch(ic, loopState) {
     }
     if (mergeReconcileResult === "reconciled") {
         deps.invalidateAllCaches();
-        state = await deps.deriveState(s.basePath);
+        state = await deps.deriveState(s.canonicalProjectRoot);
         mid = state.activeMilestone?.id;
         midTitle = state.activeMilestone?.title;
     }
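Every `deriveState` call in `runPreDispatch` now takes `s.canonicalProjectRoot` instead of `s.basePath`. The goal is that one directory produces one cache key, whichever worktree or project-root path string is active. Below is a minimal sketch of that idea. It assumes `normalizeRealPath` (from `../paths.js`, not shown in this diff) behaves like `fs.realpathSync`. `createStateCache` is a hypothetical memoizing wrapper, not the real `_stateCache`. The sketch uses CommonJS `require` so it runs as a plain Node script.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical stand-in for normalizeRealPath: resolve symlinks and "."/".."
// segments so one directory has exactly one string form. Falling back to
// path.resolve for paths that do not exist yet is an assumption.
function normalizeRealPath(p) {
  try {
    return fs.realpathSync(p);
  } catch {
    return path.resolve(p);
  }
}

// Hypothetical memoizing wrapper around a state-derivation function.
function createStateCache(derive) {
  const cache = new Map();
  return {
    deriveState(root) {
      const key = normalizeRealPath(root);
      // Cache the promise so concurrent callers share a single derivation.
      if (!cache.has(key)) cache.set(key, Promise.resolve(derive(key)));
      return cache.get(key);
    },
    // Mirrors the role of deps.invalidateAllCaches() before a re-derive.
    invalidateAll() {
      cache.clear();
    },
    get size() {
      return cache.size;
    },
  };
}
```

Keying on the raw path string would store two entries for a directory reached through a symlink or an unnormalized path. Those entries could then diverge after a write. Normalizing inside the wrapper, as `canonicalProjectRoot` does in the getter, closes that gap for every caller.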

package/dist/resources/extensions/gsd/auto/session.js CHANGED

@@ -16,6 +16,7 @@
  * `let` or `var` declarations.
  */
 import { resolveWorktreeProjectRoot } from "../worktree-root.js";
+import { normalizeRealPath } from "../paths.js";
 // ─── Constants ───────────────────────────────────────────────────────────────
 export const STUB_RECOVERY_THRESHOLD = 2;
 export const NEW_SESSION_TIMEOUT_MS = 120_000;
@@ -34,6 +35,20 @@ export class AutoSession {
     originalBasePath = "";
     // TODO(C8): remove basePath/originalBasePath once all readers use s.scope
     scope = null;
+    // ── Coordination identity (Phase B — DB-backed coordination) ────────────
+    /**
+     * Worker registry ID set by registerAutoWorker() at session start. Used by
+     * heartbeatAutoWorker() each loop iteration and by recordDispatchClaim()
+     * to fence dispatch ledger writes against stale workers.
+     */
+    workerId = null;
+    /**
+     * Active milestone lease fencing token, set by claimMilestoneLease() inside
+     * worktree-resolver.enterMilestone(). Threaded into recordDispatchClaim()
+     * as milestone_lease_token so out-of-band dispatches by a stale worker
+     * are detectable.
+     */
+    milestoneLeaseToken = null;
     previousProjectRootEnv = null;
     hadProjectRootEnv = false;
     projectRootEnvCaptured = false;
@@ -162,6 +177,22 @@ export class AutoSession {
     get lockBasePath() {
         return resolveWorktreeProjectRoot(this.basePath, this.originalBasePath);
     }
+    /**
+     * Canonical project root for state-derivation reads AND writer paths.
+     *
+     * Prefers the realpath-normalized projectRoot from the MilestoneScope
+     * (introduced by PR #5236), falling back to resolveWorktreeProjectRoot
+     * during early lifecycle / engine-bypass paths where scope may be null.
+     *
+     * Always realpath-normalized so cache keys (e.g. deriveState's _stateCache)
+     * cannot drift across worktree↔project-root path-string variants for the
+     * same filesystem location.
+     */
+    get canonicalProjectRoot() {
+        const root = this.scope?.workspace.projectRoot
+            ?? resolveWorktreeProjectRoot(this.basePath, this.originalBasePath);
+        return normalizeRealPath(root);
+    }
     reset() {
         this.clearTimers();
         // Lifecycle
@@ -176,6 +207,8 @@ export class AutoSession {
         this.basePath = "";
         this.originalBasePath = "";
         this.scope = null;
+        this.workerId = null;
+        this.milestoneLeaseToken = null;
         this.previousProjectRootEnv = null;
         this.hadProjectRootEnv = false;
         this.projectRootEnvCaptured = false;
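The new `workerId` and `milestoneLeaseToken` fields let dispatch-ledger writes carry a fencing token. A worker whose milestone lease was taken over can then be caught when it keeps dispatching. Below is a minimal sketch of the fencing-token pattern. It uses a hypothetical in-memory lease table and ledger. `LeaseTable` and `DispatchLedger` are illustrative and do not reproduce the signatures of `claimMilestoneLease()` or `recordDispatchClaim()`, which this diff does not show. The source only says stale dispatches become "detectable". This sketch goes further and rejects them.

```javascript
// Hypothetical lease table: milestone id -> current holder and token.
// Tokens increase strictly, so a newer claim always carries a larger
// token than any lease it superseded.
class LeaseTable {
  constructor() {
    this.leases = new Map();
    this.lastToken = 0;
  }

  // Grant (or take over, e.g. after the holder stops heartbeating) the
  // lease on a milestone and return the new fencing token.
  claim(milestoneId, workerId) {
    const token = ++this.lastToken;
    this.leases.set(milestoneId, { workerId, token });
    return token;
  }

  isCurrent(milestoneId, workerId, token) {
    const lease = this.leases.get(milestoneId);
    return lease !== undefined && lease.workerId === workerId && lease.token === token;
  }
}

// Hypothetical dispatch ledger that refuses writes from stale lease holders.
class DispatchLedger {
  constructor(leases) {
    this.leases = leases;
    this.rows = [];
  }

  recordClaim({ milestoneId, workerId, leaseToken, unitId }) {
    if (!this.leases.isCurrent(milestoneId, workerId, leaseToken)) {
      throw new Error(`stale lease token ${leaseToken} for ${milestoneId}`);
    }
    this.rows.push({ milestoneId, workerId, leaseToken, unitId });
    return this.rows.length; // dispatch id
  }
}
```

In a DB-backed version the lease check and the ledger insert would typically run in one transaction, so no takeover can land between the check and the write.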