npm - gsd-pi - Versions diffs - 2.49.0-dev.de3d9f6 → 2.50.0-dev.9476db8 - Mend

gsd-pi 2.49.0-dev.de3d9f6 → 2.50.0-dev.9476db8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (249)

package/dist/resources/extensions/gsd/prompts/gate-evaluate.md ADDED

@@ -0,0 +1,32 @@
+# Quality Gate Evaluation — Parallel Dispatch
+**Working directory:** `{{workingDirectory}}`
+**Milestone:** {{milestoneId}} — {{milestoneTitle}}
+**Slice:** {{sliceId}} — {{sliceTitle}}
+## Mission
+You are evaluating **quality gates in parallel** for this slice. Each gate is an independent question that must be answered before task execution begins. Use the `subagent` tool to dispatch all gate evaluations simultaneously.
+## Slice Plan Context
+{{slicePlanContent}}
+## Gates to Evaluate
+{{gateCount}} gates require evaluation:
+{{gateList}}
+## Execution Protocol
+1. **Dispatch all gates** using `subagent` in parallel mode. Each subagent prompt is provided below.
+2. **Wait for all subagents** to complete.
+3. **Verify each gate wrote its result** by checking that `gsd_save_gate_result` was called for each gate ID.
+4. **Report the batch outcome** — which gates passed, which flagged concerns, and which were omitted as not applicable.
+Gate agents may return `verdict: "omitted"` if the gate question is not applicable to this slice (e.g., no auth surface for Q3, no existing requirements touched for Q4). This is expected for simple slices.
+## Subagent Prompts
+{{subagentPrompts}}
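The four-step execution protocol above maps onto a familiar fan-out/fan-in pattern. The following is a minimal sketch, not gsd-pi code: `runGateBatch` is hypothetical, `evaluateGate` stands in for one `subagent` call, and the `saved` map stands in for `gsd_save_gate_result`. Only the `"omitted"` verdict is named in the prompt; any other verdict strings are assumptions.

```javascript
// Hypothetical sketch of the gate-evaluation protocol.
// evaluateGate(id) stands in for one subagent call and resolves to { verdict }.
async function runGateBatch(gateIds, evaluateGate) {
  const saved = new Map(); // stand-in for results written via gsd_save_gate_result

  // Steps 1-2: dispatch every gate at once and wait until all have settled.
  const settled = await Promise.allSettled(
    gateIds.map(async (id) => {
      const result = await evaluateGate(id);
      saved.set(id, result);
      return result;
    }),
  );

  // Step 3: every gate ID must have recorded a result.
  const missing = gateIds.filter((id) => !saved.has(id));

  // Step 4: group outcomes by verdict for the batch report.
  const report = { byVerdict: {}, failed: [], missing };
  settled.forEach((outcome, i) => {
    const id = gateIds[i];
    if (outcome.status === "rejected") {
      report.failed.push(id);
    } else {
      (report.byVerdict[outcome.value.verdict] ??= []).push(id);
    }
  });
  return report;
}
```

`Promise.allSettled` is used instead of `Promise.all` so that one crashed gate does not hide the verdicts of the others; a crashed gate shows up in both `failed` and `missing`.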

package/dist/resources/extensions/gsd/prompts/guided-complete-slice.md CHANGED

@@ -1,3 +1,3 @@
-Complete slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. Your working directory is `{{workingDirectory}}` — all file operations must use this path. All tasks are done. Your slice summary is the primary record of what was built — downstream agents (reassess-roadmap, future slice researchers) read it to understand what this slice delivered and what to watch out for. Use the **Slice Summary** and **UAT** output templates below to understand the expected structure. {{skillActivation}} Call `gsd_slice_complete` to record completion — the tool writes `{{sliceId}}-SUMMARY.md`, `{{sliceId}}-UAT.md`, and toggles the roadmap checkbox atomically. Fill the `UAT Type` plus `Not Proven By This UAT` sections explicitly in `uatContent` so the artifact states what class of acceptance it covers and what still remains unproven. Review task summaries for `key_decisions` and ensure any significant ones are in `.gsd/DECISIONS.md`. Do not commit or merge manually — the system handles this after the unit completes.
+Complete slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. Your working directory is `{{workingDirectory}}` — all file operations must use this path. All tasks are done. Your slice summary is the primary record of what was built — downstream agents (reassess-roadmap, future slice researchers) read it to understand what this slice delivered and what to watch out for. Use the **Slice Summary** and **UAT** output templates below to understand the expected structure. {{skillActivation}} Call `gsd_slice_complete` to record completion — the tool writes `{{sliceId}}-SUMMARY.md`, `{{sliceId}}-UAT.md`, and toggles the roadmap checkbox atomically. Fill the `UAT Type` plus `Not Proven By This UAT` sections explicitly in `uatContent` so the artifact states what class of acceptance it covers and what still remains unproven. Review task summaries for `key_decisions` and ensure any significant ones are in `.gsd/DECISIONS.md`. If the slice involved runtime behavior, fill the Operational Readiness section (Q8) in the summary: health signal, failure signal, recovery procedure, and monitoring gaps. Omit for simple slices. Do not commit or merge manually — the system handles this after the unit completes.
 {{inlinedTemplates}}

package/dist/resources/extensions/gsd/prompts/guided-execute-task.md CHANGED

@@ -1,3 +1,3 @@
-Execute the next task: {{taskId}} ("{{taskTitle}}") in slice {{sliceId}} of milestone {{milestoneId}}. Read the task plan (`{{taskId}}-PLAN.md`), load relevant summaries from prior tasks, and execute each step. Verify must-haves when done. If the task touches UI, browser flows, DOM behavior, or user-visible web state, exercise the real flow in the browser, prefer `browser_batch` for obvious sequences, prefer `browser_assert` for explicit pass/fail verification, use `browser_diff` when an action's effect is ambiguous, and use browser diagnostics when validating async or failure-prone UI. If you made an architectural, pattern, or library decision, append it to `.gsd/DECISIONS.md`. Use the **Task Summary** output template below. Call `gsd_task_complete` to record completion (it writes the summary, toggles the checkbox, and persists to DB atomically). {{skillActivation}} If running long and not all steps are finished, stop implementing and prioritize writing a clean partial summary over attempting one more step — a recoverable handoff is more valuable than a half-finished step with no documentation. If verification fails, debug methodically: form a hypothesis and test that specific theory before changing anything, change one variable at a time, read entire functions not just the suspect line, distinguish observable facts from assumptions, and if 3+ fixes fail without progress stop and reassess your mental model — list what you know for certain, what you've ruled out, and form fresh hypotheses. Don't fix symptoms — understand why something fails before changing code.
+Execute the next task: {{taskId}} ("{{taskTitle}}") in slice {{sliceId}} of milestone {{milestoneId}}. Read the task plan (`{{taskId}}-PLAN.md`), load relevant summaries from prior tasks, and execute each step. Verify must-haves when done. If the task touches UI, browser flows, DOM behavior, or user-visible web state, exercise the real flow in the browser, prefer `browser_batch` for obvious sequences, prefer `browser_assert` for explicit pass/fail verification, use `browser_diff` when an action's effect is ambiguous, and use browser diagnostics when validating async or failure-prone UI. If you made an architectural, pattern, or library decision, append it to `.gsd/DECISIONS.md`. Use the **Task Summary** output template below. Call `gsd_task_complete` to record completion (it writes the summary, toggles the checkbox, and persists to DB atomically). {{skillActivation}} If running long and not all steps are finished, stop implementing and prioritize writing a clean partial summary over attempting one more step — a recoverable handoff is more valuable than a half-finished step with no documentation. If verification fails, debug methodically: form a hypothesis and test that specific theory before changing anything, change one variable at a time, read entire functions not just the suspect line, distinguish observable facts from assumptions, and if 3+ fixes fail without progress stop and reassess your mental model — list what you know for certain, what you've ruled out, and form fresh hypotheses. Don't fix symptoms — understand why something fails before changing code. If the task plan includes Failure Modes, Load Profile, or Negative Tests sections, implement and verify them: handle each dependency's error/timeout/malformed paths (Q5), protect against identified 10x breakpoints (Q6), and write specified negative test cases (Q7).
 {{inlinedTemplates}}

package/dist/resources/extensions/gsd/prompts/guided-plan-milestone.md CHANGED

@@ -1,4 +1,4 @@
-Plan milestone {{milestoneId}} ("{{milestoneTitle}}"). Read `.gsd/DECISIONS.md` if it exists — respect existing decisions. Read `.gsd/REQUIREMENTS.md` if it exists and treat Active requirements as the capability contract. If `REQUIREMENTS.md` is missing, continue in legacy compatibility mode but explicitly note missing requirement coverage. Use the **Roadmap** output template below to shape the milestone planning payload you send to `gsd_plan_milestone`. Call `gsd_plan_milestone` to persist the milestone planning fields and render `{{milestoneId}}-ROADMAP.md` from DB state. Do **not** write `{{milestoneId}}-ROADMAP.md`, `ROADMAP.md`, or other planning artifacts manually. If planning produces structural decisions, append them to `.gsd/DECISIONS.md`. {{skillActivation}}
+Plan milestone {{milestoneId}} ("{{milestoneTitle}}"). Read `.gsd/DECISIONS.md` if it exists — respect existing decisions. Read `.gsd/REQUIREMENTS.md` if it exists and treat Active requirements as the capability contract. If `REQUIREMENTS.md` is missing, continue in legacy compatibility mode but explicitly note missing requirement coverage. Use the **Roadmap** output template below to shape the milestone planning payload you send to `gsd_plan_milestone`. Call `gsd_plan_milestone` to persist the milestone planning fields and render `{{milestoneId}}-ROADMAP.md` from DB state. Do **not** write `{{milestoneId}}-ROADMAP.md`, `ROADMAP.md`, or other planning artifacts manually. If planning produces structural decisions, append them to `.gsd/DECISIONS.md`. {{skillActivation}} Fill the Horizontal Checklist section with cross-cutting concerns considered during planning (requirements re-read, decisions re-evaluated, graceful shutdown, revenue paths, auth boundary, shared resources, reconnection). Omit for trivial milestones.
 ## Requirement Rules

package/dist/resources/extensions/gsd/prompts/guided-plan-slice.md CHANGED

@@ -1,3 +1,3 @@
-Plan slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. Read `.gsd/DECISIONS.md` if it exists — respect existing decisions. Read `.gsd/REQUIREMENTS.md` if it exists — identify which Active requirements the roadmap says this slice owns or supports, and ensure the plan delivers them. Read the roadmap boundary map, any existing context/research files, and dependency summaries. Use the **Slice Plan** and **Task Plan** output templates below. Decompose into tasks with must-haves. Fill the `Proof Level` and `Integration Closure` sections truthfully so the plan says what class of proof this slice really delivers and what end-to-end wiring still remains. Call `gsd_plan_slice` to persist the slice plan — the tool writes `{{sliceId}}-PLAN.md` and individual `T##-PLAN.md` files to disk and persists to DB. Do **not** write plan files manually — use the DB-backed tool so state stays consistent. If planning produces structural decisions, call `gsd_decision_save` for each — the tool auto-assigns IDs and regenerates `.gsd/DECISIONS.md` automatically. 
{{skillActivation}} Before finishing, self-audit the plan: every must-have maps to at least one task, every task has complete sections (steps, must-haves, verification, observability impact, inputs, and expected output), task ordering is consistent with no circular references, every pair of artifacts that must connect has an explicit wiring step, task scope targets 2–5 steps and 3–8 files (6–8 steps or 8–10 files — consider splitting; 10+ steps or 12+ files — must split), the plan honors locked decisions from context/research/decisions artifacts, the proof-level wording does not overclaim live integration if only fixture/contract proof is planned, every Active requirement this slice owns has at least one task with verification that proves it is met, and every task produces real user-facing progress — if the slice has a UI surface at least one task builds the real UI, if it has an API at least one task connects it to a real data source, and showing the completed result to a non-technical stakeholder would demonstrate real product progress rather than developer artifacts.
+Plan slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. Read `.gsd/DECISIONS.md` if it exists — respect existing decisions. Read `.gsd/REQUIREMENTS.md` if it exists — identify which Active requirements the roadmap says this slice owns or supports, and ensure the plan delivers them. Read the roadmap boundary map, any existing context/research files, and dependency summaries. Use the **Slice Plan** and **Task Plan** output templates below. Decompose into tasks with must-haves. Fill the `Proof Level` and `Integration Closure` sections truthfully so the plan says what class of proof this slice really delivers and what end-to-end wiring still remains. Call `gsd_plan_slice` to persist the slice plan — the tool writes `{{sliceId}}-PLAN.md` and individual `T##-PLAN.md` files to disk and persists to DB. Do **not** write plan files manually — use the DB-backed tool so state stays consistent. If planning produces structural decisions, call `gsd_decision_save` for each — the tool auto-assigns IDs and regenerates `.gsd/DECISIONS.md` automatically. 
{{skillActivation}} Before finishing, self-audit the plan: every must-have maps to at least one task, every task has complete sections (steps, must-haves, verification, observability impact, inputs, and expected output), task ordering is consistent with no circular references, every pair of artifacts that must connect has an explicit wiring step, task scope targets 2–5 steps and 3–8 files (6–8 steps or 8–10 files — consider splitting; 10+ steps or 12+ files — must split), the plan honors locked decisions from context/research/decisions artifacts, the proof-level wording does not overclaim live integration if only fixture/contract proof is planned, every Active requirement this slice owns has at least one task with verification that proves it is met, and every task produces real user-facing progress — if the slice has a UI surface at least one task builds the real UI, if it has an API at least one task connects it to a real data source, and showing the completed result to a non-technical stakeholder would demonstrate real product progress rather than developer artifacts, and quality gate coverage — for non-trivial slices, Threat Surface (Q3: abuse, data exposure, input trust) and Requirement Impact (Q4: requirements touched, re-verify, decisions revisited) sections are present. For non-trivial tasks, Failure Modes (Q5), Load Profile (Q6), and Negative Tests (Q7) are filled in task plans.
 {{inlinedTemplates}}
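The self-audit's task-scope rule can be read as a three-way classification. A minimal sketch; `classifyTaskScope` is hypothetical. The prompt's bands overlap at 8 files and leave 9 steps and 11 files unassigned, so this version assumes 8 files is still on target and treats any other in-between value as "consider splitting". Lower bounds (2 steps, 3 files) are not enforced.

```javascript
// Hypothetical classifier for the 2-5 steps / 3-8 files scope guidance.
function classifyTaskScope(steps, files) {
  // 10+ steps or 12+ files: the prompt says the task must be split.
  if (steps >= 10 || files >= 12) return "must-split";
  // Inside the target band (assumption: 8 files counts as on target).
  if (steps <= 5 && files <= 8) return "target";
  // Everything between the two bands, including the unassigned 9 steps / 11 files.
  return "consider-splitting";
}
```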

package/dist/resources/extensions/gsd/prompts/plan-milestone.md CHANGED

@@ -47,7 +47,7 @@ Then:
 2. {{skillActivation}}
 3. Create the roadmap: decompose into demoable vertical slices — as many as the work genuinely needs, no more. A simple feature might be 1 slice. Don't decompose for decomposition's sake.
 4. Order by risk (high-risk first)
-5. Call `gsd_plan_milestone` to persist the milestone planning fields and slice rows in the DB-backed planning path. Do **not** write `{{outputPath}}`, `ROADMAP.md`, or other planning artifacts manually — the planning tool owns roadmap rendering and persistence.
+5. Call `gsd_plan_milestone` to persist the milestone planning fields, slice rows, and **horizontal checklist** in the DB-backed planning path. Do **not** write `{{outputPath}}`, `ROADMAP.md`, or other planning artifacts manually — the planning tool owns roadmap rendering and persistence.
 6. If planning produced structural decisions (e.g. slice ordering rationale, technology choices, scope exclusions), call `gsd_decision_save` for each decision — the tool auto-assigns IDs and regenerates `.gsd/DECISIONS.md` automatically.
 ## Requirement Mapping Rules

package/dist/resources/extensions/gsd/prompts/plan-slice.md CHANGED

@@ -57,14 +57,18 @@ Then:
    - Include `Observability / Diagnostics` for backend, integration, async, stateful, or UI slices where failure diagnosis matters.
    - Fill `Proof Level` and `Integration Closure` when the slice crosses runtime boundaries or has meaningful integration concerns.
    - **Omit these sections entirely for simple slices** where they would all be "none" or trivially obvious.
-5. Decompose the slice into tasks, each fitting one context window. Each task needs:
+5. **Quality gates** — for non-trivial slices, fill the Threat Surface (Q3) and Requirement Impact (Q4) sections in the slice plan:
+   - **Threat Surface:** Identify abuse scenarios, data exposure risks, and input trust boundaries. Required when the slice handles user input, authentication, authorization, or sensitive data. Omit entirely for internal refactoring or simple changes.
+   - **Requirement Impact:** List which existing requirements this slice touches, what must be re-verified after shipping, and which prior decisions should be reconsidered. Omit entirely if no existing requirements are affected.
+   - For each task in a non-trivial slice, fill Failure Modes (Q5), Load Profile (Q6), and Negative Tests (Q7) in the task plan when the task has external dependencies, shared resources, or non-trivial input handling. Omit for simple tasks.
+6. Decompose the slice into tasks, each fitting one context window. Each task needs:
    - a concrete, action-oriented title
    - the inline task entry fields defined in the plan.md template (Why / Files / Do / Verify / Done when)
    - a matching task plan file with description, steps, must-haves, verification, inputs, and expected output
    - **Inputs and Expected Output must list concrete backtick-wrapped file paths** (e.g. `` `src/types.ts` ``). These are machine-parsed to derive task dependencies — vague prose without paths breaks parallel execution. Every task must have at least one output file path.
    - Observability Impact section **only if the task touches runtime boundaries, async flows, or error paths** — omit it otherwise
-6. **Persist planning state through `gsd_plan_slice`.** Call it with the full slice planning payload (goal, demo, must-haves, verification, tasks, and metadata). The tool inserts all tasks in the same transaction, writes to the DB, and renders `{{outputPath}}` and `{{slicePath}}/tasks/T##-PLAN.md` files automatically. Do **not** call `gsd_plan_task` separately — `gsd_plan_slice` handles task persistence. Do **not** rely on direct `PLAN.md` writes as the source of truth; the DB-backed tool is the canonical write path for slice and task planning state.
-7. **Self-audit the plan.** Walk through each check — if any fail, fix the plan files before moving on:
+7. **Persist planning state through `gsd_plan_slice`.** Call it with the full slice planning payload (goal, demo, must-haves, verification, tasks, and metadata). The tool inserts all tasks in the same transaction, writes to the DB, and renders `{{outputPath}}` and `{{slicePath}}/tasks/T##-PLAN.md` files automatically. Do **not** call `gsd_plan_task` separately — `gsd_plan_slice` handles task persistence. Do **not** rely on direct `PLAN.md` writes as the source of truth; the DB-backed tool is the canonical write path for slice and task planning state.
+8. **Self-audit the plan.** Walk through each check — if any fail, fix the plan files before moving on:
     - **Completion semantics:** If every task were completed exactly as written, the slice goal/demo should actually be true.
     - **Requirement coverage:** Every must-have in the slice maps to at least one task. No must-have is orphaned. If `REQUIREMENTS.md` exists, every Active requirement this slice owns maps to at least one task.
     - **Task completeness:** Every task has steps, must-haves, verification, inputs, and expected output — none are blank or vague. Inputs and Expected Output list backtick-wrapped file paths, not prose descriptions.
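The prompt says backtick-wrapped paths in Inputs and Expected Output are machine-parsed to derive task dependencies. The actual parser is not part of this diff; the sketch below assumes the simplest reading: task B depends on task A when one of B's input paths is one of A's output paths. `extractPaths` and `deriveDependencies` are hypothetical.

```javascript
// Hypothetical: collect every `backtick-wrapped` token that contains no whitespace.
function extractPaths(text) {
  return [...text.matchAll(/`([^`\s]+)`/g)].map((m) => m[1]);
}

// Hypothetical: map each task ID to the sorted IDs of the tasks it depends on.
function deriveDependencies(tasks) {
  const producer = new Map(); // path -> ID of the task that outputs it (last one wins)
  for (const t of tasks) {
    for (const p of extractPaths(t.expectedOutput)) producer.set(p, t.id);
  }
  const deps = {};
  for (const t of tasks) {
    const from = new Set();
    for (const p of extractPaths(t.inputs)) {
      const owner = producer.get(p);
      if (owner && owner !== t.id) from.add(owner);
    }
    deps[t.id] = [...from].sort();
  }
  return deps;
}
```

Under this reading, a task whose Inputs are vague prose yields no paths and therefore no dependencies, which is why the prompt insists on concrete paths: tasks with empty dependency lists look safe to run in parallel.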
@@ -72,6 +76,7 @@ Then:
     - **Key links planned:** For every pair of artifacts that must connect, there is an explicit step that wires them.
     - **Scope sanity:** Target 2–5 steps and 3–8 files per task. 10+ steps or 12+ files — must split. Each task must be completable in a single fresh context window.
     - **Feature completeness:** Every task produces real, user-facing progress — not just internal scaffolding.
+    - **Quality gate coverage:** For non-trivial slices, Threat Surface and Requirement Impact sections are present and specific (not placeholder text). For non-trivial tasks, Failure Modes, Load Profile, and Negative Tests are addressed in the task plan.
 10. If planning produced structural decisions, append them to `.gsd/DECISIONS.md`
 11. {{commitInstruction}}
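Across these prompt changes, the quality gates are spread over three artifacts: Q3 and Q4 in the slice plan, Q5 to Q7 in task plans, and Q8 (Operational Readiness) in the slice summary. The table below collects that mapping as data; the structure and the `gatesFor` / `missingGateSections` helpers are illustrative and not part of gsd-pi.

```javascript
// Gate IDs, section names, and target artifacts as described in this diff.
const QUALITY_GATES = [
  { id: "Q3", name: "Threat Surface", artifact: "slice-plan" },
  { id: "Q4", name: "Requirement Impact", artifact: "slice-plan" },
  { id: "Q5", name: "Failure Modes", artifact: "task-plan" },
  { id: "Q6", name: "Load Profile", artifact: "task-plan" },
  { id: "Q7", name: "Negative Tests", artifact: "task-plan" },
  { id: "Q8", name: "Operational Readiness", artifact: "slice-summary" },
];

// Gate IDs expected in a given artifact type.
function gatesFor(artifact) {
  return QUALITY_GATES.filter((g) => g.artifact === artifact).map((g) => g.id);
}

// Gate sections an artifact should contain but whose headings are absent.
function missingGateSections(artifact, presentHeadings) {
  const present = new Set(presentHeadings);
  return QUALITY_GATES
    .filter((g) => g.artifact === artifact && !present.has(g.name))
    .map((g) => `${g.id} ${g.name}`);
}
```

Because the prompts tell agents to omit these sections for simple slices and tasks, a check like `missingGateSections` only makes sense for work already judged non-trivial.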

package/dist/resources/extensions/gsd/prompts/reassess-roadmap.md CHANGED

@@ -36,6 +36,9 @@ Ask yourself:
 - Did assumptions in remaining slice descriptions turn out wrong?
 - If `.gsd/REQUIREMENTS.md` exists: did this slice validate, invalidate, defer, block, or newly surface requirements?
 - If `.gsd/REQUIREMENTS.md` exists: does the remaining roadmap still provide credible coverage for Active requirements, including launchability, primary user loop, continuity, and failure visibility where relevant?
+- Are the Threat Surface and Requirement Impact sections in completed slice plans still accurate for remaining slices?
+- Did this slice's Operational Readiness reveal monitoring gaps that remaining slices should address?
+- Should any Horizontal Checklist items be updated based on what was actually built?
 ### Success-Criterion Coverage Check

package/dist/resources/extensions/gsd/prompts/replan-slice.md CHANGED

@@ -32,7 +32,7 @@ Consider these captures when rewriting the remaining tasks — they represent th
 1. Read the blocker task summary carefully. Understand exactly what was discovered and why it blocks the current plan.
 2. Analyze the remaining `[ ]` tasks in the slice plan. Determine which are still valid, which need modification, and which should be replaced.
-3. **Persist replan state through `gsd_replan_slice`.** Call it with: `milestoneId`, `sliceId`, `blockerTaskId`, `blockerDescription`, `whatChanged`, `updatedTasks` (array of task objects with taskId, title, description, estimate, files, verify, inputs, expectedOutput), `removedTaskIds` (array of task ID strings). The tool structurally enforces preservation of completed tasks, writes replan history to the DB, re-renders `{{planPath}}`, and renders `{{replanPath}}`.
+3. **Persist replan state through `gsd_replan_slice`.** Call it with: `milestoneId`, `sliceId`, `blockerTaskId`, `blockerDescription`, `whatChanged`, `updatedTasks` (array of task objects with taskId, title, description, estimate, files, verify, inputs, expectedOutput), `removedTaskIds` (array of task ID strings). The tool structurally enforces preservation of completed tasks, writes replan history to the DB, re-renders `{{planPath}}`, and renders `{{replanPath}}`. Preserve or update the Threat Surface and Requirement Impact sections if the replan changes the slice's security posture or requirement coverage.
 4. If any incomplete task had a `T0x-PLAN.md`, remove or rewrite it to match the new task description.
 5. Do not commit manually — the system auto-commits your changes after this unit completes.
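For orientation, here is what a `gsd_replan_slice` payload could look like. The field names come from step 3 of the prompt; every value, and the choice of arrays for `files`, `inputs`, and `expectedOutput`, is made up. The `touchedCompletedTasks` helper is a hypothetical check in the spirit of "structurally enforces preservation of completed tasks"; the tool's real enforcement logic is not shown in this diff.

```javascript
// Illustrative payload: field names from the prompt, all values invented.
const replanPayload = {
  milestoneId: "M001",
  sliceId: "S02",
  blockerTaskId: "T03",
  blockerDescription: "Upstream API rejects batch writes larger than 10 items",
  whatChanged: "Replace the single batch-write task with chunked writes",
  updatedTasks: [
    {
      taskId: "T04",
      title: "Write items in chunks of 10",
      description: "Split the import into chunks and write each chunk separately",
      estimate: "1h",
      files: ["src/import.ts"],
      verify: "npm test -- import",
      inputs: ["src/import.ts"],
      expectedOutput: ["src/import.ts"],
    },
  ],
  removedTaskIds: ["T05"],
};

// Hypothetical check: a replan must not update or remove an already completed task.
function touchedCompletedTasks(payload, completedIds) {
  const touched = [
    ...payload.updatedTasks.map((t) => t.taskId),
    ...payload.removedTaskIds,
  ];
  return touched.filter((id) => completedIds.includes(id));
}
```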

package/dist/resources/extensions/gsd/repo-identity.js CHANGED

@@ -349,6 +349,35 @@ export function ensureGsdSymlink(projectPath) {
     if (localGsdNormalized === gsdHomePath) {
         return localGsd;
     }
+    // Guard: If projectPath is a plain subdirectory (not a worktree) of a git
+    // repo that already has a .gsd at the git root, do not create a duplicate
+    // symlink in the subdirectory — that causes `.gsd 2` collision variants on
+    // macOS (#2380). Worktrees are excluded because they legitimately need their
+    // own .gsd symlink pointing at the shared external state dir.
+    if (!inWorktree) {
+        try {
+            const gitRoot = resolveGitRoot(projectPath);
+            const normalizedProject = canonicalizeExistingPath(projectPath);
+            const normalizedRoot = canonicalizeExistingPath(gitRoot);
+            if (normalizedProject !== normalizedRoot) {
+                const rootGsd = join(gitRoot, ".gsd");
+                if (existsSync(rootGsd)) {
+                    try {
+                        const rootStat = lstatSync(rootGsd);
+                        if (rootStat.isSymbolicLink() || rootStat.isDirectory()) {
+                            return rootStat.isSymbolicLink() ? realpathSync(rootGsd) : rootGsd;
+                        }
+                    }
+                    catch {
+                        // Fall through to normal logic if we can't stat root .gsd
+                    }
+                }
+            }
+        }
+        catch {
+            // If git root detection fails, fall through to normal logic
+        }
+    }
     // Clean up macOS numbered collision variants (.gsd 2, .gsd 3, etc.) before
     // any existence checks — otherwise they accumulate and confuse state (#2205).
     cleanNumberedGsdVariants(projectPath);
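The new guard in `ensureGsdSymlink` reduces to a small decision: outside a worktree, if the project path is a strict subdirectory of the git root and the root already has a `.gsd` symlink or directory, reuse it instead of creating a second one. A minimal sketch with filesystem calls injected so it can run without a real repository. `findRootGsd` and its option names are hypothetical; with real files you would pass Node's `fs.lstatSync` and `fs.realpathSync` plus your own git-root resolver. It also joins paths POSIX-style, whereas the original uses `join`.

```javascript
// Hypothetical sketch of the subdirectory guard.
// Returns the .gsd path to reuse, or null to continue with the normal symlink logic.
function findRootGsd(projectPath, { inWorktree, gitRoot, canonicalize, lstat, realpath }) {
  // Worktrees legitimately get their own .gsd symlink, so skip the guard.
  if (inWorktree) return null;
  try {
    const root = gitRoot(projectPath);
    // At the git root itself there is nothing to deduplicate.
    if (canonicalize(projectPath) === canonicalize(root)) return null;
    const rootGsd = `${root.replace(/\/+$/, "")}/.gsd`;
    const stat = lstat(rootGsd); // throws when the root has no .gsd
    if (stat.isSymbolicLink()) return realpath(rootGsd);
    if (stat.isDirectory()) return rootGsd;
  } catch {
    // Git root detection or stat failed: fall through, as the original does.
  }
  return null;
}
```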

package/dist/resources/extensions/gsd/roadmap-slices.js CHANGED

@@ -36,8 +36,8 @@ export function expandDependencies(deps) {
     return result;
 }
 function extractSlicesSection(content) {
-    // Match "## Slices", "## Slice Overview", "## Slice Table", etc.
-    const headingMatch = /^## Slice(?:s| Overview| Table| Summary| Status)\b.*$/m.exec(content);
+    // Match "## Slices", "## Slice Overview", "## Slice Table", "## Slice Roadmap", etc.
+    const headingMatch = /^## Slice(?:s| Overview| Table| Summary| Status| Roadmap)\b.*$/m.exec(content);
     if (!headingMatch || headingMatch.index == null)
         return "";
     const start = headingMatch.index + headingMatch[0].length;
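The only functional change here is the extra ` Roadmap` alternative. The snippet below places the old and new patterns side by side and copies the first step of `extractSlicesSection`, returning everything after the heading line. The real function presumably goes on to trim that text, but the rest of it is not shown, so `textAfterSlicesHeading` is a hypothetical reduction.

```javascript
// Heading patterns before and after this change (copied from the diff).
const OLD_SLICES_HEADING = /^## Slice(?:s| Overview| Table| Summary| Status)\b.*$/m;
const NEW_SLICES_HEADING = /^## Slice(?:s| Overview| Table| Summary| Status| Roadmap)\b.*$/m;

// Hypothetical reduction: return the text after the matched heading line, or "".
function textAfterSlicesHeading(content) {
  const match = NEW_SLICES_HEADING.exec(content);
  if (!match) return "";
  return content.slice(match.index + match[0].length);
}
```

The trailing `\b` is what keeps `## Slice Roadmapping` from matching: after ` Roadmap` there is no word boundary between the two `p` characters.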

package/dist/resources/extensions/gsd/session-forensics.js CHANGED

@@ -24,7 +24,6 @@ import { truncateWithEllipsis } from "../shared/format-utils.js";
 import { nativeParseJsonlTail } from "./native-parser-bridge.js";
 import { MAX_JSONL_BYTES, parseJSONL } from "./jsonl-utils.js";
 import { nativeWorkingTreeStatus, nativeDiffStat } from "./native-git-bridge.js";
-import { getAutoWorktreePath } from "./auto-worktree.js";
 // ─── JSONL Parsing ────────────────────────────────────────────────────────────
 // MAX_JSONL_BYTES and parseJSONL are imported from ./jsonl-utils.js
 /**
@@ -235,17 +234,13 @@ export function synthesizeCrashRecovery(basePath, unitType, unitId, sessionFile,
  * Deep diagnostic from any JSONL source (activity log or session file).
  * Replaces the old shallow getLastActivityDiagnostic().
  */
-export function getDeepDiagnostic(basePath) {
-    // Try worktree activity logs first if an auto-worktree is active
+export function getDeepDiagnostic(basePath, worktreePath) {
+    // Try worktree activity logs first if a worktree path is provided
     let trace = null;
     try {
-        const mid = readActiveMilestoneId(basePath);
-        if (mid) {
-            const wtPath = getAutoWorktreePath(basePath, mid);
-            if (wtPath) {
-                const wtActivityDir = join(gsdRoot(wtPath), "activity");
-                trace = readLastActivityLog(wtActivityDir);
-            }
+        if (worktreePath) {
+            const wtActivityDir = join(gsdRoot(worktreePath), "activity");
+            trace = readLastActivityLog(wtActivityDir);
         }
     }
     catch { /* non-fatal — fall through to root */ }
@@ -262,7 +257,7 @@ export function getDeepDiagnostic(basePath) {
  * Read the active milestone ID directly from STATE.md without async deriveState().
  * Looks for `**Active Milestone:** M001` pattern.
  */
-function readActiveMilestoneId(basePath) {
+export function readActiveMilestoneId(basePath) {
     try {
         const statePath = join(gsdRoot(basePath), "STATE.md");
         if (!existsSync(statePath))
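With this change `getDeepDiagnostic` no longer looks up the auto-worktree itself; callers pass `worktreePath`, and `readActiveMilestoneId` is exported so they can find the milestone first. The sketch below shows that caller-side flow under assumptions: the actual regex inside `readActiveMilestoneId` is not in the diff, and `parseActiveMilestoneId`, `diagnosticArgs`, and `resolveWorktreePath` are hypothetical stand-ins.

```javascript
// Hypothetical parser for the "**Active Milestone:** M001" line in STATE.md.
function parseActiveMilestoneId(stateMarkdown) {
  const match = /\*\*Active Milestone:\*\*[ \t]*(\S+)/.exec(stateMarkdown);
  return match ? match[1] : null;
}

// Hypothetical caller: build the (basePath, worktreePath) arguments for
// getDeepDiagnostic. resolveWorktreePath stands in for the caller's own lookup.
function diagnosticArgs(basePath, stateMarkdown, resolveWorktreePath) {
  const milestoneId = parseActiveMilestoneId(stateMarkdown);
  const worktreePath = milestoneId ? resolveWorktreePath(basePath, milestoneId) : undefined;
  return [basePath, worktreePath ?? undefined];
}
```

Passing the path in makes the dependency on worktree state explicit at the call site, which is consistent with the removal of the `auto-worktree.js` import from this module.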

package/dist/resources/extensions/gsd/session-lock.js CHANGED

@@ -134,6 +134,49 @@ function ensureExitHandler(_gsdDir) {
         }
     });
 }
+// ─── Lock Acquisition Helpers ───────────────────────────────────────────────
+/**
+ * Create the onCompromised callback for proper-lockfile.
+ *
+ * proper-lockfile fires onCompromised when it detects mtime drift (system sleep,
+ * event loop stall, etc.). The default handler throws inside setTimeout — an
+ * uncaught exception that crashes or corrupts process state.
+ *
+ * False-positive suppression (#1362): If we're still within the stale window
+ * (30 min since acquisition), the mtime mismatch is from an event loop stall
+ * during a long LLM call — not a real takeover. Log and continue.
+ *
+ * PID ownership check (#1578): Past the stale window, check if the lock file
+ * still contains our PID before declaring compromise. Retry reads tolerate
+ * transient filesystem hiccups (NFS/CIFS latency, APFS snapshots, etc.) (#2324).
+ */
+function createLockCompromisedHandler(lockFilePath) {
+    return () => {
+        const elapsed = Date.now() - _lockAcquiredAt;
+        if (elapsed < 1_800_000) {
+            process.stderr.write(`[gsd] Lock heartbeat caught up after ${Math.round(elapsed / 1000)}s — long LLM call, no action needed.\n`);
+            return;
+        }
+        const existing = readExistingLockDataWithRetry(lockFilePath);
+        if (existing && existing.pid === process.pid) {
+            process.stderr.write(`[gsd] Lock heartbeat mismatch after ${Math.round(elapsed / 1000)}s — lock file still owned by PID ${process.pid}, treating as false positive.\n`);
+            return;
+        }
+        _lockCompromised = true;
+        _releaseFunction = null;
+    };
+}
+/**
+ * Assign module-level lock state after a successful lock acquisition.
+ */
+function assignLockState(basePath, release, lockFilePath) {
+    _releaseFunction = release;
+    _lockedPath = basePath;
+    _lockPid = process.pid;
+    _lockCompromised = false;
+    _lockAcquiredAt = Date.now();
+    _snapshotLockPath = lockFilePath;
+}
 // ─── Public API ─────────────────────────────────────────────────────────────
 /**
  * Attempt to acquire an exclusive session lock for the given project.
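The extracted `createLockCompromisedHandler` makes a three-way decision: suppress within the 30-minute stale window, suppress if the lock file still holds our PID (read with retries), otherwise mark the lock compromised. One subtlety: `createLockCompromisedHandler(lp)` is called before `assignLockState` sets `_lockAcquiredAt`, so the handler works only because it reads that module-level variable when it fires, not when it is created. The sketch below preserves that by taking `acquiredAt` as a function. It injects the clock, reader, and state sink so it runs outside proper-lockfile. The retry count, the lack of delay between retries, and the `{ pid }` shape are assumptions, and the returned status strings exist only for testability (the real handler returns nothing).

```javascript
const STALE_MS = 1_800_000; // 30 minutes, matching the `stale` option in the diff

// Hypothetical retry wrapper: tolerate transient read errors or empty reads.
function readWithRetry(read, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      const data = read();
      if (data) return data;
    } catch {
      // Transient filesystem error: try again.
    }
  }
  return null;
}

// Hypothetical onCompromised factory with injected dependencies.
function makeCompromisedHandler({ acquiredAt, now, readLock, pid, onCompromise, log }) {
  return () => {
    const elapsed = now() - acquiredAt(); // read lazily, at fire time
    if (elapsed < STALE_MS) {
      log(`heartbeat caught up after ${Math.round(elapsed / 1000)}s`);
      return "suppressed-within-window";
    }
    const existing = readWithRetry(readLock);
    if (existing && existing.pid === pid) {
      log(`lock file still owned by PID ${pid}`);
      return "suppressed-own-pid";
    }
    onCompromise(); // e.g. set the compromised flag and drop the release function
    return "compromised";
  };
}
```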
@@ -188,38 +231,9 @@ export function acquireSessionLock(basePath) {
             realpath: false,
             stale: 1_800_000, // 30 minutes — safe for laptop sleep / long event loop stalls
             update: 10_000, // Update lock mtime every 10s to prove liveness
-            onCompromised: () => {
-                // proper-lockfile detected mtime drift (system sleep, event loop stall, etc.).
-                // Default handler throws inside setTimeout — an uncaught exception that crashes
-                // or corrupts process state.
-                //
-                // False-positive suppression (#1362): If we're still within the stale window
-                // (30 min since acquisition), the mtime mismatch is from an event loop stall
-                // during a long LLM call — not a real takeover. Log and continue.
-                const elapsed = Date.now() - _lockAcquiredAt;
-                if (elapsed < 1_800_000) {
-                    process.stderr.write(`[gsd] Lock heartbeat caught up after ${Math.round(elapsed / 1000)}s — long LLM call, no action needed.\n`);
-                    return; // Suppress false positive
-                }
-                // Past the stale window — check if the lock file still belongs to us before
-                // declaring compromise (#1578). If our PID still owns the metadata, this is
-                // a false positive from a very long event loop stall (e.g. subagent execution).
-                const existing = readExistingLockData(lp);
-                if (existing && existing.pid === process.pid) {
-                    process.stderr.write(`[gsd] Lock heartbeat mismatch after ${Math.round(elapsed / 1000)}s — lock file still owned by PID ${process.pid}, treating as false positive.\n`);
-                    return; // Our PID still owns the lock file — no real takeover
-                }
-                // Lock file is gone or owned by another PID — real compromise
-                _lockCompromised = true;
-                _releaseFunction = null;
-            },
+            onCompromised: createLockCompromisedHandler(lp),
         });
-        _releaseFunction = release;
-        _lockedPath = basePath;
-        _lockPid = process.pid;
-        _lockCompromised = false;
-        _lockAcquiredAt = Date.now();
-        _snapshotLockPath = lp; // Snapshot the resolved path for consistent access (#1363)
+        assignLockState(basePath, release, lp);
         // Safety net: clean up lock dir on process exit if _releaseFunction
         // wasn't called (e.g., normal exit after clean completion) (#1245).
         ensureExitHandler(gsdDir);
@@ -245,31 +259,9 @@ export function acquireSessionLock(basePath) {
                     realpath: false,
                     stale: 1_800_000, // 30 minutes — match primary lock settings
                     update: 10_000,
-                    onCompromised: () => {
-                        // Same false-positive suppression as the primary lock (#1512).
-                        // Without this, the retry path fires _lockCompromised unconditionally
-                        // on benign mtime drift (laptop sleep, heavy LLM event loop stalls).
-                        const elapsed = Date.now() - _lockAcquiredAt;
-                        if (elapsed < 1_800_000) {
-                            process.stderr.write(`[gsd] Lock heartbeat caught up after ${Math.round(elapsed / 1000)}s — long LLM call, no action needed.\n`);
-                            return;
-                        }
-                        // Check PID ownership before declaring compromise (#1578)
-                        const existing = readExistingLockData(lp);
-                        if (existing && existing.pid === process.pid) {
-                            process.stderr.write(`[gsd] Lock heartbeat mismatch after ${Math.round(elapsed / 1000)}s — lock file still owned by PID ${process.pid}, treating as false positive.\n`);
-                            return;
-                        }
-                        _lockCompromised = true;
-                        _releaseFunction = null;
-                    },
+                    onCompromised: createLockCompromisedHandler(lp),
                 });
-                _releaseFunction = release;
-                _lockedPath = basePath;
-                _lockPid = process.pid;
-                _lockCompromised = false;
-                _lockAcquiredAt = Date.now();
-                _snapshotLockPath = lp; // Snapshot for retry path too (#1363)
+                assignLockState(basePath, release, lp);
                 // Safety net — uses centralized handler to avoid double-registration
                 ensureExitHandler(gsdDir);
                 atomicWriteSync(lp, JSON.stringify(lockData, null, 2));
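Both `onCompromised` call sites now delegate to `createLockCompromisedHandler(lp)`, whose body is not part of these hunks. Below is a minimal sketch that rebuilds it from the two inline handlers removed above. It is not the package's code. Module state (`_lockAcquiredAt`, `_lockCompromised`, `_releaseFunction`) is passed in as a `state` object, and `readLockData`, `now`, and `log` are hypothetical injected parameters that let the sketch run standalone.

```javascript
// Sketch only: rebuilt from the removed inline onCompromised bodies.
// `state`, `readLockData`, `now`, and `log` are hypothetical injections that
// stand in for the module-level variables and helpers in the real file.
const STALE_WINDOW_MS = 1_800_000; // 30 minutes, same as the `stale` option

function createLockCompromisedHandler(lp, deps) {
  const {
    state,
    readLockData,
    now = Date.now,
    log = (msg) => process.stderr.write(msg),
  } = deps;
  return () => {
    const elapsed = now() - state.lockAcquiredAt;
    // Still inside the stale window: mtime drift is treated as an
    // event-loop stall (e.g. a long LLM call), not a takeover (#1362).
    if (elapsed < STALE_WINDOW_MS) {
      log(`[gsd] Lock heartbeat caught up after ${Math.round(elapsed / 1000)}s, no action needed.\n`);
      return;
    }
    // Past the window: if our PID still owns the lock metadata, this is
    // still a false positive (#1578).
    const existing = readLockData(lp);
    if (existing && existing.pid === process.pid) {
      log(`[gsd] Lock heartbeat mismatch after ${Math.round(elapsed / 1000)}s; still owned by PID ${process.pid}.\n`);
      return;
    }
    // Lock file missing or owned by another PID: a real compromise.
    state.lockCompromised = true;
    state.releaseFunction = null;
  };
}
```

With one factory, the primary and retry paths can no longer drift apart. The removed retry-path comment (#1512) shows that this drift had already happened once.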
@@ -348,7 +340,8 @@ export function getSessionLockStatus(basePath) {
         // onCompromised fired from benign mtime drift (laptop sleep, event loop stall
         // beyond the stale window). Attempt re-acquisition instead of giving up.
         const lp = lockPath(basePath);
-        const existing = readExistingLockData(lp);
+        // Retry reads to tolerate transient filesystem hiccups (#2324).
+        const existing = readExistingLockDataWithRetry(lp);
         if (existing && existing.pid === process.pid) {
             // Lock file still ours — try to re-acquire the OS lock
             try {
@@ -492,6 +485,24 @@ function readExistingLockData(lp) {
         return null;
     }
 }
+export function readExistingLockDataWithRetry(lp, options) {
+    const maxAttempts = options?.maxAttempts ?? 3;
+    const delayMs = options?.delayMs ?? 200;
+    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
+        const data = readExistingLockData(lp);
+        if (data !== null)
+            return data;
+        if (attempt < maxAttempts) {
+            // Synchronous busy-wait — onCompromised runs in a sync callback context
+            // and the delays are short (200ms default).
+            const start = Date.now();
+            while (Date.now() - start < delayMs) {
+                // busy-wait
+            }
+        }
+    }
+    return null;
+}
 function isPidAlive(pid) {
     if (!Number.isInteger(pid) || pid <= 0)
         return false;

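`readExistingLockDataWithRetry` spins in a busy-wait because it runs in a synchronous context. The sketch below shows a variant of the same retry loop that pauses with `Atomics.wait` on a private `SharedArrayBuffer`. This call blocks the thread for up to `delayMs` without burning CPU, and Node permits it on the main thread. The `readLockData` parameter is a hypothetical injection that replaces `readExistingLockData` so the sketch is self-contained. The package itself does not use this approach.

```javascript
// Synchronous sleep without spinning: nothing ever notifies this buffer,
// so Atomics.wait simply returns "timed-out" after `ms` milliseconds.
function sleepSync(ms) {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(cell, 0, 0, ms);
}

// Same control flow as readExistingLockDataWithRetry, with the reader injected.
function readWithRetry(readLockData, lp, options) {
  const maxAttempts = options?.maxAttempts ?? 3;
  const delayMs = options?.delayMs ?? 200;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const data = readLockData(lp);
    if (data !== null) return data;
    // Only wait between attempts, never after the last one.
    if (attempt < maxAttempts) sleepSync(delayMs);
  }
  return null;
}
```

Both versions block the event loop for the same amount of time. With the defaults, the worst case is two 200 ms waits (about 400 ms) before giving up. `Atomics.wait` only avoids pegging a core while doing so.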
package/dist/resources/extensions/gsd/state.js CHANGED

@@ -9,7 +9,8 @@ import { nativeBatchParseGsdFiles } from './native-parser-bridge.js';
 import { join, resolve } from 'path';
 import { existsSync, readdirSync } from 'node:fs';
 import { debugCount, debugTime } from './debug-logger.js';
-import { isDbAvailable, getAllMilestones, getMilestoneSlices, getSliceTasks, getReplanHistory, getSlice, insertMilestone, updateTaskStatus, } from './gsd-db.js';
+import { extractVerdict } from './verdict-parser.js';
+import { isDbAvailable, getAllMilestones, getMilestoneSlices, getSliceTasks, getReplanHistory, getSlice, insertMilestone, updateTaskStatus, getPendingSliceGateCount, } from './gsd-db.js';
 /**
  * A "ghost" milestone directory contains only META.json (and no substantive
  * files like CONTEXT, CONTEXT-DRAFT, ROADMAP, or SUMMARY).  These appear when
@@ -42,13 +43,9 @@ export function isMilestoneComplete(roadmap) {
  * after remediation slices are executed.
  */
 export function isValidationTerminal(validationContent) {
-    const match = validationContent.match(/^---\n([\s\S]*?)\n---/);
-    if (!match)
+    const v = extractVerdict(validationContent);
+    if (!v)
         return false;
-    const verdict = match[1].match(/verdict:\s*(\S+)/);
-    if (!verdict)
-        return false;
-    const v = verdict[1] === 'passed' ? 'pass' : verdict[1];
     // 'pass' and 'needs-attention' are always terminal.
     // 'needs-remediation' is treated as terminal to prevent infinite loops
     // when no remediation slices exist in the roadmap (#832). The validation
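`isValidationTerminal` now delegates to `extractVerdict` from `verdict-parser.js`, which this diff does not show. The sketch below is a hypothetical stand-in built from the inline logic removed above. It matches the leading frontmatter block, reads `verdict:`, and normalizes the legacy `passed` to `pass`. The real module may differ.

```javascript
// Hypothetical stand-in for verdict-parser.js's extractVerdict, rebuilt from
// the frontmatter parsing that this hunk removes from isValidationTerminal.
function extractVerdict(content) {
  // Frontmatter must open the file: "---\n...\n---"
  const frontmatter = content.match(/^---\n([\s\S]*?)\n---/);
  if (!frontmatter) return null;
  const verdict = frontmatter[1].match(/verdict:\s*(\S+)/);
  if (!verdict) return null;
  // Legacy spelling normalized to the canonical verdict.
  return verdict[1] === "passed" ? "pass" : verdict[1];
}
```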
@@ -595,6 +592,21 @@ export async function deriveStateFromDb(basePath) {
             };
         }
     }
+    // ── Quality gate evaluation check ──────────────────────────────────
+    // If slice-scoped gates (Q3/Q4) are still pending, pause before execution
+    // so the gate-evaluate dispatch rule can run parallel sub-agents.
+    // Slices with zero gate rows (pre-feature or simple) skip straight through.
+    const pendingGateCount = getPendingSliceGateCount(activeMilestone.id, activeSlice.id);
+    if (pendingGateCount > 0) {
+        return {
+            activeMilestone, activeSlice, activeTask: null,
+            phase: 'evaluating-gates',
+            recentDecisions: [], blockers: [],
+            nextAction: `Evaluate ${pendingGateCount} quality gate(s) for ${activeSlice.id} before execution.`,
+            registry, requirements,
+            progress: { milestones: milestoneProgress, slices: sliceProgress, tasks: taskProgress },
+        };
+    }
     // ── Blocker detection: check completed tasks for blocker_discovered ──
     const completedTasks = tasks.filter(t => isStatusDone(t.status));
     let blockerTaskId = null;
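The new early return depends on `getPendingSliceGateCount`, whose query is not shown. The sketch below is an in-memory equivalent under several assumptions. It assumes the row shape `{ milestoneId, sliceId, gateId, scope, status }` with a `"pending"` status. It also assumes that only Q3 and Q4 count, as the hunk's comment suggests, even though Q8 is also seeded with `scope: "slice"`. All names here are hypothetical.

```javascript
// Hypothetical in-memory equivalent of getPendingSliceGateCount.
// Assumes rows carry a `status` field and that only the pre-execution
// slice gates (Q3, Q4) should pause the slice.
const PRE_EXECUTION_SLICE_GATES = new Set(["Q3", "Q4"]);

function countPendingSliceGates(rows, milestoneId, sliceId) {
  return rows.filter((r) =>
    r.milestoneId === milestoneId &&
    r.sliceId === sliceId &&
    r.scope === "slice" &&
    PRE_EXECUTION_SLICE_GATES.has(r.gateId) &&
    r.status === "pending"
  ).length;
}
```

A count of zero, including slices planned before gate rows existed, falls straight through to blocker detection. Any positive count returns the `evaluating-gates` phase.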
@@ -1143,6 +1155,21 @@ export async function _deriveStateImpl(basePath) {
         };
     }
     const slicePlan = parsePlan(slicePlanContent);
+    // ── Reconcile stale task status for filesystem-based projects (#2514) ──
+    // Heading-style tasks (### T01:) are always parsed as done=false by
+    // parsePlan because the heading syntax has no checkbox. When the agent
+    // writes a SUMMARY file but the plan's heading isn't converted to a
+    // checkbox, the task appears incomplete forever — causing infinite
+    // re-dispatch. Reconcile by checking SUMMARY files on disk.
+    for (const t of slicePlan.tasks) {
+        if (t.done)
+            continue;
+        const summaryPath = resolveTaskFile(basePath, activeMilestone.id, activeSlice.id, t.id, "SUMMARY");
+        if (summaryPath && existsSync(summaryPath)) {
+            t.done = true;
+            process.stderr.write(`gsd-reconcile: task ${activeMilestone.id}/${activeSlice.id}/${t.id} has SUMMARY on disk but plan shows incomplete — marking done (#2514)\n`);
+        }
+    }
     const taskProgress = {
         done: slicePlan.tasks.filter(t => t.done).length,
         total: slicePlan.tasks.length,

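The #2514 reconciliation loop can be isolated as a pure function by injecting the filesystem check. In the sketch below, `hasSummary` is a hypothetical stand-in for the `resolveTaskFile(...)` + `existsSync(...)` pair, and `log` replaces the `process.stderr.write` call.

```javascript
// Sketch of the #2514 reconciliation with the disk lookup injected.
// `hasSummary(taskId)` is hypothetical; the real code resolves the task's
// SUMMARY path and checks it with existsSync.
function reconcileTaskStatus(tasks, hasSummary, log = () => {}) {
  let reconciled = 0;
  for (const t of tasks) {
    if (t.done) continue;
    if (hasSummary(t.id)) {
      t.done = true; // mutates in place, like the original loop
      reconciled++;
      log(`task ${t.id} has SUMMARY on disk but plan shows incomplete; marking done`);
    }
  }
  return reconciled;
}
```

Because the loop runs before `taskProgress` is computed, reconciled tasks are counted as done, and the heading-style task is no longer re-dispatched.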
package/dist/resources/extensions/gsd/templates/milestone-summary.md CHANGED

@@ -49,6 +49,14 @@ completed_at: {{date}}
 - {{requirementId}}: {{fromStatus}} → {{toStatus}} — {{evidence}}
+## Decision Re-evaluation
+<!-- Review decisions from this milestone. OMIT if no decisions need re-evaluation. -->
+| Decision | Original Rationale | Still Valid? | Action |
+|----------|-------------------|-------------|--------|
+| {{decisionId}} | {{originalRationale}} | {{yes/no/partially}} | {{keep/revise/supersede}} |
 ## Forward Intelligence
 <!-- Write what you wish you'd known at the start of this milestone.

package/dist/resources/extensions/gsd/templates/plan.md CHANGED

@@ -8,6 +8,22 @@
 - {{mustHave}}
 - {{mustHave}}
+## Threat Surface
+<!-- Q3: How can this be exploited? OMIT ENTIRELY for simple slices with no auth, user input, or data exposure. -->
+- **Abuse**: {{abuseScenarios — parameter tampering, replay, privilege escalation, or N/A}}
+- **Data exposure**: {{sensitiveDataAccessible — PII, tokens, secrets, or none}}
+- **Input trust**: {{untrustedInput — user input reaching DB/API/filesystem, or none}}
+## Requirement Impact
+<!-- Q4: What existing promises does this break? OMIT ENTIRELY if no existing requirements are affected. -->
+- **Requirements touched**: {{requirementIds — e.g. R001, R003, or none}}
+- **Re-verify**: {{whatMustBeRetested — e.g. login flow, API contract, or N/A}}
+- **Decisions revisited**: {{decisionIds — e.g. D002, or none}}
 ## Proof Level
 <!-- Omit this section entirely for simple slices where the answer is trivially obvious. -->

package/dist/resources/extensions/gsd/templates/roadmap.md CHANGED

@@ -92,6 +92,19 @@ This milestone is complete only when all are true:
   - Each "After this" line must be truthful about proof level: if only fixtures or tests prove it, say so; do not imply the user can already perform the live end-to-end behavior unless that has actually been exercised
 -->
+## Horizontal Checklist
+<!-- Cross-cutting concerns across all slices. Check each that was considered.
+     OMIT ENTIRELY for trivial milestones. -->
+- [ ] Every active R### re-read against new code — still fully satisfied?
+- [ ] Every D### from prior milestones re-evaluated — still valid at new scope?
+- [ ] Graceful shutdown / cleanup on termination verified
+- [ ] Revenue / billing path impact assessed (or N/A)
+- [ ] Auth boundary documented — what's protected vs public
+- [ ] Shared resource budget confirmed — connection pools, caches, rate limits hold under peak
+- [ ] Reconnection / retry strategy verified for every external dependency
 ## Boundary Map
 <!-- Be specific. Name concrete outputs: API endpoints, event payloads, shared types/interfaces,

package/dist/resources/extensions/gsd/templates/slice-summary.md CHANGED

@@ -57,6 +57,15 @@ completed_at: {{date}}
 - {{requirementIdOr_none}} — {{what changed}}
+## Operational Readiness
+<!-- Q8: How will ops know it's healthy/broken? OMIT ENTIRELY for simple slices with no runtime concerns. -->
+- **Health signal**: {{howToConfirmHealthy — health endpoint, heartbeat log, metric, or N/A}}
+- **Failure signal**: {{howToDetectBroken — error rate spike, alert, log pattern, or N/A}}
+- **Recovery**: {{selfRecoverOrRestart — auto-reconnect, circuit breaker, manual restart, or N/A}}
+- **Monitoring gaps**: {{silentFailureModes — background jobs, cache eviction, memory pressure, or none}}
 ## Deviations
 <!-- Deviations are unplanned changes to the written plan, not ordinary debugging inside the plan's intended scope. -->

package/dist/resources/extensions/gsd/templates/task-plan.md CHANGED

@@ -17,6 +17,30 @@ skills_used:
 {{description}}
+## Failure Modes
+<!-- Q5: What breaks when dependencies fail? OMIT ENTIRELY for tasks with no external dependencies. -->
+| Dependency | On error | On timeout | On malformed response |
+|------------|----------|-----------|----------------------|
+| {{dependency}} | {{errorStrategy}} | {{timeoutStrategy}} | {{malformedStrategy}} |
+## Load Profile
+<!-- Q6: What breaks at 10x load? OMIT ENTIRELY for tasks with no shared resources or scaling concerns. -->
+- **Shared resources**: {{sharedResources — DB connections, caches, rate limiters, or none}}
+- **Per-operation cost**: {{perOpCost — N API calls, M DB queries, K bytes, or trivial}}
+- **10x breakpoint**: {{whatBreaksFirst — pool exhaustion, rate limit, memory, or N/A}}
+## Negative Tests
+<!-- Q7: What negative tests prove robustness? OMIT ENTIRELY for trivial tasks. -->
+- **Malformed inputs**: {{malformedInputTests — empty string, null, oversized, wrong type}}
+- **Error paths**: {{errorPathTests — network timeout, auth failure, 5xx, invalid JSON}}
+- **Boundary conditions**: {{boundaryTests — empty list, max length, zero, off-by-one}}
 ## Steps
 1. {{step}}

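The template sections added in this release each correspond to one quality gate. The lookup below collects that mapping: gate IDs and questions come from the template comments, and scopes come from the gate seeding in `plan-slice.js`. The `QUALITY_GATES` object and `gatesForTemplate` helper are illustrative only and are not exports of gsd-pi.

```javascript
// Illustrative lookup assembled from this diff's template comments and
// plan-slice.js gate seeding; not an object that exists in gsd-pi.
const QUALITY_GATES = Object.freeze({
  Q3: { scope: "slice", template: "plan.md", section: "Threat Surface", question: "How can this be exploited?" },
  Q4: { scope: "slice", template: "plan.md", section: "Requirement Impact", question: "What existing promises does this break?" },
  Q5: { scope: "task", template: "task-plan.md", section: "Failure Modes", question: "What breaks when dependencies fail?" },
  Q6: { scope: "task", template: "task-plan.md", section: "Load Profile", question: "What breaks at 10x load?" },
  Q7: { scope: "task", template: "task-plan.md", section: "Negative Tests", question: "What negative tests prove robustness?" },
  Q8: { scope: "slice", template: "slice-summary.md", section: "Operational Readiness", question: "How will ops know it's healthy/broken?" },
});

// Gate IDs whose answers land in a given template, in declaration order.
function gatesForTemplate(template) {
  return Object.entries(QUALITY_GATES)
    .filter(([, gate]) => gate.template === template)
    .map(([id]) => id);
}
```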
package/dist/resources/extensions/gsd/tools/plan-slice.js CHANGED

@@ -1,5 +1,5 @@
 import { clearParseCache } from "../files.js";
-import { transaction, getMilestone, getSlice, insertTask, upsertSlicePlanning, upsertTaskPlanning, } from "../gsd-db.js";
+import { transaction, getMilestone, getSlice, insertTask, upsertSlicePlanning, upsertTaskPlanning, insertGateRow, } from "../gsd-db.js";
 import { invalidateStateCache } from "../state.js";
 import { renderPlanFromDb } from "../markdown-renderer.js";
 import { renderAllProjections } from "../workflow-projections.js";
@@ -145,6 +145,19 @@ export async function handlePlanSlice(rawParams, basePath) {
                     fullPlanMd: task.fullPlanMd,
                 });
             }
+            // Seed quality gate rows inside the transaction — all-or-nothing with
+            // the plan data so a crash can't leave orphaned gates without tasks.
+            const sliceGates = ["Q3", "Q4"];
+            for (const gid of sliceGates) {
+                insertGateRow({ milestoneId: params.milestoneId, sliceId: params.sliceId, gateId: gid, scope: "slice" });
+            }
+            const taskGates = ["Q5", "Q6", "Q7"];
+            for (const task of params.tasks) {
+                for (const gid of taskGates) {
+                    insertGateRow({ milestoneId: params.milestoneId, sliceId: params.sliceId, gateId: gid, scope: "task", taskId: task.taskId });
+                }
+            }
+            insertGateRow({ milestoneId: params.milestoneId, sliceId: params.sliceId, gateId: "Q8", scope: "slice" });
         });
     }
     catch (err) {

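The seeding block writes a fixed number of gate rows per slice: Q3 and Q4, three task gates (Q5 to Q7) per task, and a trailing Q8. In the sketch below, `insertGateRow` is replaced by returning plain objects, which makes the `3 + 3 × tasks` row count easy to check. `buildGateRows` is a hypothetical name.

```javascript
// Pure sketch of the rows handlePlanSlice seeds; instead of calling
// insertGateRow, each row is returned as a plain object.
function buildGateRows(milestoneId, sliceId, tasks) {
  const rows = [];
  for (const gateId of ["Q3", "Q4"]) {
    rows.push({ milestoneId, sliceId, gateId, scope: "slice" });
  }
  for (const task of tasks) {
    for (const gateId of ["Q5", "Q6", "Q7"]) {
      rows.push({ milestoneId, sliceId, gateId, scope: "task", taskId: task.taskId });
    }
  }
  rows.push({ milestoneId, sliceId, gateId: "Q8", scope: "slice" });
  return rows;
}
```

In the real tool these inserts run inside `transaction(...)`, so a failure rolls back the plan, tasks, and gate rows together.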
package/dist/resources/extensions/gsd/tools/validate-milestone.js CHANGED

@@ -9,6 +9,7 @@ import { transaction, _getAdapter, } from "../gsd-db.js";
 import { resolveMilestonePath, clearPathCache } from "../paths.js";
 import { saveFile, clearParseCache } from "../files.js";
 import { invalidateStateCache } from "../state.js";
+import { VALIDATION_VERDICTS, isValidMilestoneVerdict } from "../verdict-parser.js";
 function renderValidationMarkdown(params) {
     let md = `---
 verdict: ${params.verdict}
@@ -41,9 +42,8 @@ export async function handleValidateMilestone(params, basePath) {
     if (!params.milestoneId || typeof params.milestoneId !== "string" || params.milestoneId.trim() === "") {
         return { error: "milestoneId is required and must be a non-empty string" };
     }
-    const validVerdicts = ["pass", "needs-attention", "needs-remediation"];
-    if (!validVerdicts.includes(params.verdict)) {
-        return { error: `verdict must be one of: ${validVerdicts.join(", ")}` };
+    if (!isValidMilestoneVerdict(params.verdict)) {
+        return { error: `verdict must be one of: ${VALIDATION_VERDICTS.join(", ")}` };
     }
     // ── Filesystem render ──────────────────────────────────────────────────
     const validationMd = renderValidationMarkdown(params);
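`handleValidateMilestone` now imports `VALIDATION_VERDICTS` and `isValidMilestoneVerdict` instead of keeping a local list. The sketch below is a hypothetical reconstruction of those two exports. The array matches the `validVerdicts` list removed above, and the rest of the real module is not shown.

```javascript
// Hypothetical reconstruction of the verdict-parser.js exports used here;
// the values match the validVerdicts array removed from this handler.
const VALIDATION_VERDICTS = Object.freeze(["pass", "needs-attention", "needs-remediation"]);

function isValidMilestoneVerdict(value) {
  return typeof value === "string" && VALIDATION_VERDICTS.includes(value);
}
```

With this split, the writer still rejects the legacy spelling `passed`, while a reader such as `extractVerdict` can accept it. Both sides then presumably share one canonical list.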