npm - gsd-pi - Versions diffs - 2.71.0-dev.977c553 → 2.71.0-dev.d4d916a - Mend

gsd-pi 2.71.0-dev.977c553 → 2.71.0-dev.d4d916a

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (278)

package/src/resources/extensions/gsd/prompts/execute-task.md CHANGED

@@ -32,29 +32,30 @@ Then:
 0. Narrate step transitions, key implementation decisions, and verification outcomes as you work. Keep it terse — one line between tool-call clusters, not between every call — but write complete sentences in user-facing prose, not shorthand notes or scratchpad fragments.
 1. {{skillActivation}} Follow any activated skills before writing code. If no skills match this task, skip this step.
 2. Execute the steps in the inlined task plan, adapting minor local mismatches when the surrounding code differs from the planner's snapshot
-3. Build the real thing. If the task plan says "create login endpoint", build an endpoint that actually authenticates against a real store, not one that returns a hardcoded success response. If the task plan says "create dashboard page", build a page that renders real data from the API, not a component with hardcoded props. Stubs and mocks are for tests, not for the shipped feature.
-4. Write or update tests as part of execution — tests are verification, not an afterthought. If the slice plan defines test files in its Verification section and this is the first task, create them (they should initially fail).
-5. When implementing non-trivial runtime behavior (async flows, API boundaries, background processes, error paths), add or preserve agent-usable observability. Skip this for simple changes where it doesn't apply.
+3. Before any `Write` that creates an artifact or output file, check whether that path already exists. If it does, read it first and decide whether the work is already done, should be extended, or truly needs replacement. "Create" in the plan does **not** mean the file is missing — a prior session may already have started it.
+4. Build the real thing. If the task plan says "create login endpoint", build an endpoint that actually authenticates against a real store, not one that returns a hardcoded success response. If the task plan says "create dashboard page", build a page that renders real data from the API, not a component with hardcoded props. Stubs and mocks are for tests, not for the shipped feature.
+5. Write or update tests as part of execution — tests are verification, not an afterthought. If the slice plan defines test files in its Verification section and this is the first task, create them (they should initially fail).
+6. When implementing non-trivial runtime behavior (async flows, API boundaries, background processes, error paths), add or preserve agent-usable observability. Skip this for simple changes where it doesn't apply.
    **Background process rule:** Never use bare `command &` to run background processes. The shell's `&` operator leaves stdout/stderr attached to the parent, which causes the Bash tool to hang indefinitely waiting for those streams to close. Always redirect output before backgrounding:
    - Correct: `command > /dev/null 2>&1 &` or `nohup command > /dev/null 2>&1 &`
    - Example: `python -m http.server 8080 > /dev/null 2>&1 &` (NOT `python -m http.server 8080 &`)
    - Preferred: use the `bg_shell` tool if available — it manages process lifecycle correctly without stream-inheritance issues
-6. If the task plan includes a **Failure Modes** section (Q5), implement the error/timeout/malformed handling specified. Verify each dependency's failure path is handled. Skip if the section is absent.
-7. If the task plan includes a **Load Profile** section (Q6), implement protections for the identified 10x breakpoint (connection pooling, rate limiting, pagination, etc.). Skip if absent.
-8. If the task plan includes a **Negative Tests** section (Q7), write the specified negative test cases alongside the happy-path tests — malformed inputs, error paths, and boundary conditions. Skip if absent.
-9. Verify must-haves are met by running concrete checks (tests, commands, observable behaviors)
-10. Run the slice-level verification checks defined in the slice plan's Verification section. Track which pass. On the final task of the slice, all must pass before marking done. On intermediate tasks, partial passes are expected — note which ones pass in the summary.
-11. After the verification gate runs (you'll see gate results in stderr/notify output), populate the `## Verification Evidence` table in your task summary with the check results. Use the `formatEvidenceTable` format: one row per check with command, exit code, verdict (✅ pass / ❌ fail), and duration. If no verification commands were discovered, note that in the section.
-12. If the task touches UI, browser flows, DOM behavior, or user-visible web state:
+7. If the task plan includes a **Failure Modes** section (Q5), implement the error/timeout/malformed handling specified. Verify each dependency's failure path is handled. Skip if the section is absent.
+8. If the task plan includes a **Load Profile** section (Q6), implement protections for the identified 10x breakpoint (connection pooling, rate limiting, pagination, etc.). Skip if absent.
+9. If the task plan includes a **Negative Tests** section (Q7), write the specified negative test cases alongside the happy-path tests — malformed inputs, error paths, and boundary conditions. Skip if absent.
+10. Verify must-haves are met by running concrete checks (tests, commands, observable behaviors)
+11. Run the slice-level verification checks defined in the slice plan's Verification section. Track which pass. On the final task of the slice, all must pass before marking done. On intermediate tasks, partial passes are expected — note which ones pass in the summary.
+12. After the verification gate runs (you'll see gate results in stderr/notify output), populate the `## Verification Evidence` table in your task summary with the check results. Use the `formatEvidenceTable` format: one row per check with command, exit code, verdict (✅ pass / ❌ fail), and duration. If no verification commands were discovered, note that in the section.
+13. If the task touches UI, browser flows, DOM behavior, or user-visible web state:
    - exercise the real flow in the browser
    - prefer `browser_batch` when the next few actions are obvious and sequential
    - prefer `browser_assert` for explicit pass/fail verification of the intended outcome
    - use `browser_diff` when an action's effect is ambiguous
    - use console/network/dialog diagnostics when validating async, stateful, or failure-prone UI
    - record verification in terms of explicit checks passed/failed, not only prose interpretation
-13. If the task plan includes an Observability Impact section, verify those signals directly. Skip this step if the task plan omits the section.
-14. **If execution is running long or verification fails:**
+14. If the task plan includes an Observability Impact section, verify those signals directly. Skip this step if the task plan omits the section.
+15. **If execution is running long or verification fails:**
     **Context budget:** You have approximately **{{verificationBudget}}** reserved for verification context. If you've used most of your context and haven't finished all steps, stop implementing and prioritize writing the task summary with clear notes on what's done and what remains. A partial summary that enables clean resumption is more valuable than one more half-finished step with no documentation. Never sacrifice summary quality for one more implementation step.
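The background-process rule in step 6 is phrased for shell commands. For code that starts processes from Node.js, a minimal sketch of the same idea follows: give the child no pipes back to the parent. `startInBackground` is a hypothetical helper, not part of GSD, and it is no substitute for the `bg_shell` tool the prompt prefers.

```typescript
import { spawn } from "node:child_process";

// Start a long-running process with no stdio pipes back to the caller,
// the Node.js counterpart of `command > /dev/null 2>&1 &`.
function startInBackground(command: string, args: string[]): number | undefined {
  const child = spawn(command, args, {
    stdio: "ignore", // stdin/stdout/stderr go to the null device; nothing is left open to wait on
    detached: true, // on POSIX, the child leads its own process group
  });
  child.unref(); // the parent's event loop will not wait for the child to exit
  return child.pid;
}

// Usage: a short-lived Node process stands in for a dev server.
const pid = startInBackground(process.execPath, ["-e", "setTimeout(() => {}, 200)"]);
console.log(`started background process ${pid}`);
```

`stdio: "ignore"` matters most here: it is what prevents the inherited-stream hang the rule describes. `detached` and `unref()` only let the child outlive the caller.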
@@ -65,13 +66,13 @@ Then:
     - Distinguish "I know" from "I assume." Observable facts (the error says X) are strong evidence. Assumptions (this library should work this way) need verification.
     - Know when to stop. If you've tried 3+ fixes without progress, your mental model is probably wrong. Stop. List what you know for certain. List what you've ruled out. Form fresh hypotheses from there.
     - Don't fix symptoms. Understand *why* something fails before changing code. A test that passes after a change you don't understand is luck, not a fix.
-15. **Blocker discovery:** If execution reveals that the remaining slice plan is fundamentally invalid — not just a bug or minor deviation, but a plan-invalidating finding like a wrong API, missing capability, or architectural mismatch — set `blocker_discovered: true` in the task summary frontmatter and describe the blocker clearly in the summary narrative. Do NOT set `blocker_discovered: true` for ordinary debugging, minor deviations, or issues that can be fixed within the current task or the remaining plan. This flag triggers an automatic replan of the slice.
-16. If you made an architectural, pattern, library, or observability decision during this task that downstream work should know about, append it to `.gsd/DECISIONS.md` (read the template at `~/.gsd/agent/extensions/gsd/templates/decisions.md` if the file doesn't exist yet). Not every task produces decisions — only append when a meaningful choice was made.
-17. If you discover a non-obvious rule, recurring gotcha, or useful pattern during execution, append it to `.gsd/KNOWLEDGE.md`. Only add entries that would save future agents from repeating your investigation. Don't add obvious things.
-18. Read the template at `~/.gsd/agent/extensions/gsd/templates/task-summary.md`
-19. Use that template to prepare the completion content you will pass to `gsd_complete_task` using the camelCase fields `milestoneId`, `sliceId`, `taskId`, `oneLiner`, `narrative`, `verification`, and `verificationEvidence`. Do **not** manually write `{{taskSummaryPath}}` — the DB-backed tool is the canonical write path and renders the summary file for you.
-20. Call `gsd_complete_task` with milestoneId, sliceId, taskId, and the completion fields derived from the template. This is your final required step — do NOT manually edit PLAN.md checkboxes. The tool marks the task complete, updates the DB, renders `{{taskSummaryPath}}`, and updates PLAN.md automatically.
-21. Do not run git commands — the system reads your task summary after completion and creates a meaningful commit from it (type inferred from title, message from your one-liner, key files from frontmatter). Write a clear, specific one-liner in the summary — it becomes the commit message.
+16. **Blocker discovery:** If execution reveals that the remaining slice plan is fundamentally invalid — not just a bug or minor deviation, but a plan-invalidating finding like a wrong API, missing capability, or architectural mismatch — set `blocker_discovered: true` in the task summary frontmatter and describe the blocker clearly in the summary narrative. Do NOT set `blocker_discovered: true` for ordinary debugging, minor deviations, or issues that can be fixed within the current task or the remaining plan. This flag triggers an automatic replan of the slice.
+17. If you made an architectural, pattern, library, or observability decision during this task that downstream work should know about, append it to `.gsd/DECISIONS.md` (read the template at `~/.gsd/agent/extensions/gsd/templates/decisions.md` if the file doesn't exist yet). Not every task produces decisions — only append when a meaningful choice was made.
+18. If you discover a non-obvious rule, recurring gotcha, or useful pattern during execution, append it to `.gsd/KNOWLEDGE.md`. Only add entries that would save future agents from repeating your investigation. Don't add obvious things.
+19. Read the template at `~/.gsd/agent/extensions/gsd/templates/task-summary.md`
+20. Use that template to prepare the completion content you will pass to `gsd_complete_task` using the camelCase fields `milestoneId`, `sliceId`, `taskId`, `oneLiner`, `narrative`, `verification`, and `verificationEvidence`. Do **not** manually write `{{taskSummaryPath}}` — the DB-backed tool is the canonical write path and renders the summary file for you.
+21. Call `gsd_complete_task` with milestoneId, sliceId, taskId, and the completion fields derived from the template. This is your final required step — do NOT manually edit PLAN.md checkboxes. The tool marks the task complete, updates the DB, renders `{{taskSummaryPath}}`, and updates PLAN.md automatically.
+22. Do not run git commands — the system reads your task summary after completion and creates a meaningful commit from it (type inferred from title, message from your one-liner, key files from frontmatter). Write a clear, specific one-liner in the summary — it becomes the commit message.
 All work stays in your working directory: `{{workingDirectory}}`.
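Step 12 asks for a `## Verification Evidence` table with one row per check: command, exit code, verdict, and duration. The diff does not show the real `formatEvidenceTable`, so the renderer below is an illustrative sketch. The `CheckResult` shape and the `renderEvidenceTable` name are assumptions.

```typescript
// Hypothetical result shape for one verification check.
interface CheckResult {
  command: string;
  exitCode: number;
  durationMs: number;
}

// Illustrative renderer only; not GSD's formatEvidenceTable.
function renderEvidenceTable(results: CheckResult[]): string {
  if (results.length === 0) return "No verification commands were discovered.";
  const lines = [
    "| Command | Exit Code | Verdict | Duration |",
    "|---------|-----------|---------|----------|",
  ];
  for (const r of results) {
    const verdict = r.exitCode === 0 ? "✅ pass" : "❌ fail";
    const cmd = r.command.replace(/\|/g, "\\|"); // an unescaped pipe would split the cell
    lines.push(`| \`${cmd}\` | ${r.exitCode} | ${verdict} | ${(r.durationMs / 1000).toFixed(1)}s |`);
  }
  return lines.join("\n");
}

const table = renderEvidenceTable([
  { command: "npm test", exitCode: 0, durationMs: 1234 },
  { command: "npx tsc --noEmit", exitCode: 2, durationMs: 500 },
]);
console.log(table);
```

The verdict comes from the exit code alone here. A real gate may apply richer rules.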

package/src/resources/extensions/gsd/prompts/guided-discuss-milestone.md CHANGED

@@ -32,6 +32,8 @@ Ask **1–3 questions per round**. Keep each question focused on one of:
 - **The biggest technical unknowns / risks** — what could fail, what hasn't been proven
 - **What external systems/services this touches** — APIs, databases, third-party services
+**Never fabricate or simulate user input.** Never generate fake transcript markers like `[User]`, `[Human]`, or `User:`. Ask one question round, then wait for the user's actual response before continuing.
 **If `{{structuredQuestionsAvailable}}` is `true`:** use `ask_user_questions` for each round. 1–3 questions per call, each as a separate question object. Keep option labels short (3–5 words). Always include a freeform "Other / let me explain" option. When the user picks that option or writes a long freeform answer, switch to plain text follow-up for that thread before resuming structured questions. **IMPORTANT: Call `ask_user_questions` exactly once per turn. Never make multiple calls with the same or overlapping questions — wait for the user's response before asking the next round.**
 **If `{{structuredQuestionsAvailable}}` is `false`:** ask questions in plain text. Keep each round to 1–3 focused questions. Wait for answers before asking the next round.

package/src/resources/extensions/gsd/prompts/guided-discuss-slice.md CHANGED

@@ -22,6 +22,8 @@ Do **not** go deep — just enough that your questions reflect what's actually t
 ### Question rounds
+**Never fabricate or simulate user input.** Never generate fake transcript markers like `[User]`, `[Human]`, or `User:`. Ask one question round, then wait for the user's actual response before continuing.
 **If `{{structuredQuestionsAvailable}}` is `true`:** Ask **1–3 questions per round** using `ask_user_questions`. **Call `ask_user_questions` exactly once per turn — never make multiple calls with the same or overlapping questions. Wait for the user's response before asking the next round.**
 **If `{{structuredQuestionsAvailable}}` is `false`:** Ask **1–3 questions per round** in plain text. Number them and wait for the user's response before asking the next round.
 Keep each question focused on one of:

package/src/resources/extensions/gsd/prompts/guided-resume-task.md CHANGED

@@ -1 +1 @@
-Resume interrupted work. Find the continue file (`{{sliceId}}-CONTINUE.md` or `continue.md`) in slice {{sliceId}} of milestone {{milestoneId}}, read it, and use it as the recovery contract for where to pick up. Do not delete the continue file immediately. Keep it until the task is successfully completed or you have written a newer summary/continue artifact that clearly supersedes it. If the resumed attempt fails again, update or replace the continue file so no recovery context is lost. {{skillActivation}}
+Resume interrupted work. Find the continue file (`{{sliceId}}-CONTINUE.md` or `continue.md`) in slice {{sliceId}} of milestone {{milestoneId}}, read it, and use it as the recovery contract for where to pick up. Before you create any expected artifact or output file, check whether it already exists and read it first — a prior session may already have started or completed that work. Do not delete the continue file immediately. Keep it until the task is successfully completed or you have written a newer summary/continue artifact that clearly supersedes it. If the resumed attempt fails again, update or replace the continue file so no recovery context is lost. {{skillActivation}}
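This release adds a "check before you create" instruction to both the resume prompt and step 3 of `execute-task.md`. The sketch below shows that decision in code, assuming the caller supplies its own completeness check. `planArtifactWrite`, `WriteDecision`, and the `isComplete` predicate are hypothetical names. The prompts leave this judgment to the agent.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Possible outcomes of the "check before Write" step (labels are illustrative).
type WriteDecision = "create" | "already-done" | "extend-or-replace";

interface WritePlan {
  decision: WriteDecision;
  existing?: string; // current file contents, when the file already exists
}

// `isComplete` is a caller-supplied judgment about the existing content.
function planArtifactWrite(path: string, isComplete: (existing: string) => boolean): WritePlan {
  if (!existsSync(path)) return { decision: "create" };
  const existing = readFileSync(path, "utf-8"); // read first; never overwrite blind
  return { decision: isComplete(existing) ? "already-done" : "extend-or-replace", existing };
}

// Usage: a temp directory stands in for the task's output location.
const dir = mkdtempSync(join(tmpdir(), "gsd-sketch-"));
const target = join(dir, "notes.md");
const looksDone = (s: string) => s.includes("DONE");
const first = planArtifactWrite(target, looksDone); // file missing
writeFileSync(target, "# Notes\nDONE\n"); // simulate a prior session's output
const second = planArtifactWrite(target, looksDone); // file present and complete
console.log(first.decision, second.decision);
```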

package/src/resources/extensions/gsd/prompts/queue.md CHANGED

@@ -18,6 +18,7 @@ Say exactly: "What do you want to add?" — nothing else. Wait for the user's an
 ## Discussion Phase
 After they describe it, your job is to understand the new work deeply enough to create context files that a future planning session can use.
+Never fabricate or simulate user input during this discussion. Never generate fake transcript markers like `[User]`, `[Human]`, or `User:`. Ask one question round, then wait for the user's actual response before continuing.
 **If the user provides a file path or pastes a large document** (spec, design doc, product plan, chat export), read it fully before asking questions. Use it as the starting point — don't ask them to re-explain what's already in the document. Your questions should fill gaps and resolve ambiguities the document doesn't cover.
@@ -36,11 +37,11 @@ Don't go deep — just enough that your next question reflects what's actually t
 - How the new work relates to existing milestones — overlap, dependencies, prerequisites
 - If `.gsd/REQUIREMENTS.md` exists: which unmet Active or Deferred requirements this queued work advances
-**Then use ask_user_questions** to dig into gray areas — scope boundaries, proof expectations, integration choices, tech preferences when they materially matter, and what's in vs out. 1-3 questions per round.
+**Then use ask_user_questions** to dig into gray areas — scope boundaries, proof expectations, integration choices, tech preferences when they materially matter, and what's in vs out. Ask 1-3 questions per round, then wait for the user's response before asking the next round.
 If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during discuss/planning work, but do not let it override the required discuss flow or artifact requirements.
-**Self-regulate:** Do **not** ask a meta "ready to queue?" question after every round. Keep going until you have enough depth to write the context well, then use a single wrap-up prompt if needed. If the user clearly keeps adding detail instead of objecting, treat that as permission to continue.
+**Self-regulate:** Do **not** ask a meta "ready to queue?" question after every round. Keep going until you have enough depth to write the context well, then use a single wrap-up prompt if needed. Do not infer permission to continue from silence or from partial prior answers — each new round requires an actual user response.
 ## Existing Milestone Awareness

package/src/resources/extensions/gsd/prompts/system.md CHANGED

@@ -35,6 +35,7 @@ GSD ships with bundled skills. Load the relevant skill file with the `read` tool
 - Read before edit.
 - Reproduce before fix when possible.
 - Work is not done until the relevant verification has passed.
+- **Never fabricate, simulate, or role-play user responses.** Never generate markers like `[User]`, `[Human]`, `User:`, or similar to represent user input inside your own output. Ask one question round (1-3 questions), then stop and wait for the user's actual response before continuing. If `ask_user_questions` is available, treat its returned response as the only valid structured user input for that round.
 - Never print, echo, log, or restate secrets or credentials. Report only key names and applied/skipped status.
 - Never ask the user to edit `.env` files or set secrets manually. Use `secure_env_collect`.
 - In enduring files, write current state only unless the file is explicitly historical.
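The new rule in `system.md` and the discuss and queue prompts forbids output containing fabricated user turns such as `[User]`, `[Human]`, or `User:`. The diff enforces this only through prompt text. If a host wanted a mechanical backstop, a line-start check like the hypothetical `containsFabricatedUserTurn` below would flag the listed markers. Treat it as a heuristic sketch, not GSD behavior.

```typescript
// Line-start markers named in the prompt rule ([User], [Human], User:), plus Human:.
const FAKE_TURN_MARKER = /^\s*(?:\[(?:User|Human)\]|(?:User|Human):)/m;

// True when any line of the model output begins with a simulated user-turn marker.
function containsFabricatedUserTurn(output: string): boolean {
  return FAKE_TURN_MARKER.test(output); // no /g flag, so .test() keeps no state between calls
}

console.log(containsFabricatedUserTurn("Which database should we target?\n[User] Postgres"));
console.log(containsFabricatedUserTurn("Ask the user: which database?"));
```

Because the match is anchored to the start of a line, ordinary prose that mentions "the user" is not flagged.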

package/src/resources/extensions/gsd/prompts/validate-milestone.md CHANGED

@@ -31,7 +31,7 @@ Prompt: "Review milestone {{milestoneId}} requirements coverage. Working directo
 Prompt: "Review milestone {{milestoneId}} cross-slice integration. Working directory: {{workingDirectory}}. Read `{{roadmapPath}}` and find the boundary map (produces/consumes contracts). For each boundary, check that the producing slice's SUMMARY confirms it produced the artifact, and the consuming slice's SUMMARY confirms it consumed it. Output a markdown table: Boundary | Producer Summary | Consumer Summary | Status. End with a one-line verdict: PASS if all boundaries honored, NEEDS-ATTENTION if any gaps."
 **Reviewer C — Assessment & Acceptance Criteria**
-Prompt: "Review milestone {{milestoneId}} assessment evidence and acceptance criteria. Working directory: {{workingDirectory}}. Read `.gsd/{{milestoneId}}/CONTEXT.md` for acceptance criteria. Check for ASSESSMENT files in each slice directory. Verify each acceptance criterion maps to either a passing assessment result or clear SUMMARY evidence. Output a checklist: [ ] Criterion | Evidence. End with a one-line verdict: PASS if all criteria met, NEEDS-ATTENTION if gaps exist."
+Prompt: "Review milestone {{milestoneId}} assessment evidence and acceptance criteria. Working directory: {{workingDirectory}}. Read `.gsd/{{milestoneId}}/CONTEXT.md` for acceptance criteria. Check for ASSESSMENT files in each slice directory. Verify each acceptance criterion maps to either a passing assessment result or clear SUMMARY evidence. Then review the inlined milestone verification classes from planning. For each non-empty planned class, output a markdown table: Class | Planned Check | Evidence | Verdict. Use the exact class names `Contract`, `Integration`, `Operational`, and `UAT` whenever those classes are present. If no verification classes were planned, say that explicitly. Output two sections: `Acceptance Criteria` with a checklist `[ ] Criterion | Evidence`, and `Verification Classes` with the table. End with a one-line verdict: PASS if all criteria and verification classes are covered, NEEDS-ATTENTION if gaps exist."
 ### Step 2 — Synthesize Findings
@@ -70,6 +70,7 @@ reviewers: 3
 ```
 Call `gsd_validate_milestone` with the camelCase fields `milestoneId`, `verdict`, `remediationRound`, `successCriteriaChecklist`, `sliceDeliveryAudit`, `crossSliceIntegration`, `requirementCoverage`, `verdictRationale`, and `remediationPlan` when needed. If you include verification-class analysis, pass it in `verificationClasses`.
+Extract the `Verification Classes` subsection from Reviewer C and pass it verbatim in `verificationClasses` so the persisted validation output uses the canonical class names `Contract`, `Integration`, `Operational`, and `UAT`.
 **DB access safety:** Do NOT query `.gsd/gsd.db` directly via `sqlite3` or `node -e require('better-sqlite3')` — the engine owns the WAL connection. Use `gsd_milestone_status` to read milestone and slice state. All data you need is already inlined in the context above or accessible via the `gsd_*` tools. Direct DB access corrupts the WAL and bypasses tool-level validation.
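The updated prompt asks the validator to lift Reviewer C's `Verification Classes` subsection verbatim into `verificationClasses`. The sketch below shows one way to do that extraction. It assumes Reviewer C marks its two sections with `#` headings, which the prompt does not specify. `extractSection` and `classesPresent` are hypothetical helpers.

```typescript
const CANONICAL_CLASSES = ["Contract", "Integration", "Operational", "UAT"] as const;

// Return the body under a markdown heading titled `title`, stopping at the next
// heading of the same or higher level.
function extractSection(markdown: string, title: string): string | undefined {
  const lines = markdown.split("\n");
  const headingOf = (line: string) => line.match(/^(#{1,6})\s+(.*)$/);
  const start = lines.findIndex((l) => headingOf(l)?.[2].trim() === title);
  if (start === -1) return undefined;
  const level = headingOf(lines[start])![1].length;
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    const h = headingOf(lines[i]);
    if (h && h[1].length <= level) {
      end = i;
      break;
    }
  }
  return lines.slice(start + 1, end).join("\n").trim();
}

// Canonical class names that appear as the first cell of a table row.
function classesPresent(section: string): string[] {
  return CANONICAL_CLASSES.filter((c) => new RegExp(`^\\|\\s*${c}\\s*\\|`, "m").test(section));
}

const review = [
  "## Acceptance Criteria",
  "- [x] Login works | T02 SUMMARY",
  "## Verification Classes",
  "| Class | Planned Check | Evidence | Verdict |",
  "|---|---|---|---|",
  "| Contract | API schema test | S01 SUMMARY | pass |",
  "| UAT | Manual login walkthrough | S03 ASSESSMENT | pass |",
].join("\n");

console.log(extractSection(review, "Verification Classes"));
```

If the reviewer's one-line verdict follows the table with no heading of its own, this sketch captures it as part of the section. A stricter parser would stop at that line.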

package/src/resources/extensions/gsd/session-model-override.ts ADDED

@@ -0,0 +1,36 @@
+export interface SessionModelOverride {
+  provider: string;
+  id: string;
+}
+const sessionOverrides = new Map<string, SessionModelOverride>();
+function normalizeSessionId(sessionId: string): string {
+  return typeof sessionId === "string" ? sessionId.trim() : "";
+}
+export function setSessionModelOverride(
+  sessionId: string,
+  override: SessionModelOverride,
+): void {
+  const key = normalizeSessionId(sessionId);
+  if (!key) return;
+  sessionOverrides.set(key, {
+    provider: override.provider,
+    id: override.id,
+  });
+}
+export function getSessionModelOverride(
+  sessionId: string,
+): SessionModelOverride | undefined {
+  const key = normalizeSessionId(sessionId);
+  if (!key) return undefined;
+  return sessionOverrides.get(key);
+}
+export function clearSessionModelOverride(sessionId: string): void {
+  const key = normalizeSessionId(sessionId);
+  if (!key) return;
+  sessionOverrides.delete(key);
+}

package/src/resources/extensions/gsd/shortcut-defs.ts ADDED

@@ -0,0 +1,56 @@
+// Canonical GSD shortcut definitions used by registration, help text, and overlays.
+import { formatShortcut } from "./files.js";
+export type GSDShortcutId = "dashboard" | "notifications" | "parallel";
+type GSDShortcutDef = {
+  key: "g" | "n" | "p";
+  action: string;
+  command: string;
+  /** Whether the Ctrl+Shift fallback is registered (false when it conflicts with an app keybinding). */
+  hasFallback: boolean;
+};
+export const GSD_SHORTCUTS: Record<GSDShortcutId, GSDShortcutDef> = {
+  dashboard: {
+    key: "g",
+    action: "Open GSD dashboard",
+    command: "/gsd status",
+    hasFallback: true,
+  },
+  notifications: {
+    key: "n",
+    action: "Open notification history",
+    command: "/gsd notifications",
+    hasFallback: true,
+  },
+  parallel: {
+    key: "p",
+    action: "Open parallel worker monitor",
+    command: "/gsd parallel watch",
+    hasFallback: false, // Ctrl+Shift+P conflicts with cycleModelBackward
+  },
+};
+function combo(prefix: "Ctrl+Alt+" | "Ctrl+Shift+", key: string): string {
+  return `${prefix}${key.toUpperCase()}`;
+}
+export function primaryShortcutCombo(id: GSDShortcutId): string {
+  return combo("Ctrl+Alt+", GSD_SHORTCUTS[id].key);
+}
+export function fallbackShortcutCombo(id: GSDShortcutId): string {
+  return combo("Ctrl+Shift+", GSD_SHORTCUTS[id].key);
+}
+export function shortcutPair(id: GSDShortcutId, formatter: (combo: string) => string = (combo) => combo): string {
+  const primary = formatter(primaryShortcutCombo(id));
+  if (!GSD_SHORTCUTS[id].hasFallback) return primary;
+  return `${primary} / ${formatter(fallbackShortcutCombo(id))}`;
+}
+export function formattedShortcutPair(id: GSDShortcutId): string {
+  return shortcutPair(id, formatShortcut);
+}
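To see what `shortcutPair` produces, here is a trimmed, standalone copy of its logic. The diff does not show what `formatShortcut` from `./files.js` does, so this copy defaults to an identity formatter. Only the `key` and `hasFallback` fields from `GSD_SHORTCUTS` are kept.

```typescript
type ShortcutId = "dashboard" | "notifications" | "parallel";

// Trimmed copy of GSD_SHORTCUTS: only the fields the pairing logic reads.
const SHORTCUTS: Record<ShortcutId, { key: string; hasFallback: boolean }> = {
  dashboard: { key: "g", hasFallback: true },
  notifications: { key: "n", hasFallback: true },
  parallel: { key: "p", hasFallback: false }, // Ctrl+Shift+P is already bound elsewhere
};

function combo(prefix: "Ctrl+Alt+" | "Ctrl+Shift+", key: string): string {
  return `${prefix}${key.toUpperCase()}`;
}

// Identity formatter by default; the real module passes formatShortcut instead.
function shortcutPair(id: ShortcutId, formatter: (c: string) => string = (c) => c): string {
  const primary = formatter(combo("Ctrl+Alt+", SHORTCUTS[id].key));
  if (!SHORTCUTS[id].hasFallback) return primary;
  return `${primary} / ${formatter(combo("Ctrl+Shift+", SHORTCUTS[id].key))}`;
}

for (const id of Object.keys(SHORTCUTS) as ShortcutId[]) {
  console.log(`${id}: ${shortcutPair(id)}`);
}
```

Because `hasFallback` is `false` for `parallel`, help text lists only the Ctrl+Alt combination for that shortcut.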

package/src/resources/extensions/gsd/tests/auto-start-model-capture.test.ts CHANGED

@@ -7,9 +7,8 @@ const sourcePath = join(import.meta.dirname, "..", "auto-start.ts");
 const source = readFileSync(sourcePath, "utf-8");
 test("bootstrapAutoSession snapshots ctx.model before guided-flow entry (#2829)", () => {
-  // #3517 changed the snapshot to prefer GSD preferences, but the ordering
-  // guarantee still holds: the snapshot must be built before guided-flow.
-  const snapshotIdx = source.indexOf("const startModelSnapshot = preferredModel");
+  // The snapshot ordering guarantee still holds: build snapshot before guided-flow.
+  const snapshotIdx = source.indexOf("const startModelSnapshot = manualSessionOverride");
   assert.ok(snapshotIdx > -1, "auto-start.ts should snapshot model at bootstrap start");
   const firstDiscussIdx = source.indexOf('await showSmartEntry(ctx, pi, base, { step: requestedStepMode });');
@@ -29,8 +28,11 @@ test("bootstrapAutoSession restores autoModeStartModel from the early snapshot (
   assert.ok(snapshotRefIdx > -1, "autoModeStartModel should be restored from startModelSnapshot");
 });
-test("bootstrapAutoSession prefers GSD PREFERENCES.md over settings.json for start model (#3517)", () => {
-  // resolveDefaultSessionModel() should be called before the snapshot is built
+test("bootstrapAutoSession checks manual session override before preferences", () => {
+  const manualIdx = source.indexOf("const manualSessionOverride = getSessionModelOverride(");
+  assert.ok(manualIdx > -1, "auto-start.ts should read session model override first");
+  // resolveDefaultSessionModel() should still be called for fallback behavior
   const preferredIdx = source.indexOf("const preferredModel = resolveDefaultSessionModel(");
   assert.ok(preferredIdx > -1, "auto-start.ts should call resolveDefaultSessionModel()");
@@ -38,11 +40,25 @@ test("bootstrapAutoSession prefers GSD PREFERENCES.md over settings.json for sta
   const withProviderIdx = source.indexOf("resolveDefaultSessionModel(ctx.model?.provider)");
   assert.ok(withProviderIdx > -1, "auto-start.ts should pass ctx.model?.provider for bare ID resolution");
-  const snapshotIdx = source.indexOf("const startModelSnapshot = preferredModel");
-  assert.ok(snapshotIdx > -1, "startModelSnapshot should use preferredModel when available");
+  const snapshotIdx = source.indexOf("const startModelSnapshot = manualSessionOverride");
+  assert.ok(snapshotIdx > -1, "startModelSnapshot should prefer manual session override");
   assert.ok(
-    preferredIdx < snapshotIdx,
-    "resolveDefaultSessionModel() must be called before building startModelSnapshot",
+    manualIdx < snapshotIdx && preferredIdx < snapshotIdx,
+    "manual override and preference fallback must be resolved before building startModelSnapshot",
   );
 });
+test("bootstrapAutoSession validates preferred model against live registry auth (#unconfigured-models)", () => {
+  // The raw PREFERENCES.md value must be validated against getAvailable()
+  // before being captured as the snapshot, so an unconfigured provider
+  // (no API key / OAuth) can't become autoModeStartModel.
+  const validationIdx = source.indexOf("ctx.modelRegistry.getAvailable()");
+  assert.ok(validationIdx > -1, "auto-start.ts should validate preferred model against getAvailable()");
+  const resolveModelIdIdx = source.indexOf("resolveModelId");
+  assert.ok(resolveModelIdIdx > -1, "auto-start.ts should resolve preferred model against the registry");
+  const warningIdx = source.indexOf("is not configured; falling back to session default");
+  assert.ok(warningIdx > -1, "auto-start.ts should warn when preferred model is unconfigured");
+});
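Taken together, these source-order tests describe a precedence for the starting model. The manual session override comes first. The PREFERENCES.md model comes next, but only if `getAvailable()` reports it as configured; otherwise a warning is issued. The existing session model is the last fallback. The sketch below restates that order. `pickStartModel` and `RegistryLike` are hypothetical, and the provider and model IDs are placeholders.

```typescript
interface ModelRef {
  provider: string;
  id: string;
}

// Minimal stand-in; the tests only show auto-start calling ctx.modelRegistry.getAvailable().
interface RegistryLike {
  getAvailable(): ModelRef[];
}

function pickStartModel(
  manualOverride: ModelRef | undefined,
  preferred: ModelRef | undefined,
  sessionDefault: ModelRef | undefined,
  registry: RegistryLike,
  warn: (message: string) => void,
): ModelRef | undefined {
  // 1. A manual per-session override wins.
  if (manualOverride) return manualOverride;
  // 2. The PREFERENCES.md model, but only if its provider is actually configured.
  if (preferred) {
    const configured = registry
      .getAvailable()
      .some((m) => m.provider === preferred.provider && m.id === preferred.id);
    if (configured) return preferred;
    warn(`Preferred model ${preferred.provider}/${preferred.id} is not configured; falling back to session default`);
  }
  // 3. Whatever model the session already had.
  return sessionDefault;
}

const registry: RegistryLike = { getAvailable: () => [{ provider: "provider-a", id: "model-1" }] };
const warnings: string[] = [];
const warn = (m: string) => {
  warnings.push(m);
};
const sessionDefault: ModelRef = { provider: "provider-a", id: "model-0" };
console.log(pickStartModel(undefined, { provider: "provider-b", id: "model-x" }, sessionDefault, registry, warn));
```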

package/src/resources/extensions/gsd/tests/auto-start-worktree-db-path.test.ts ADDED

@@ -0,0 +1,28 @@
+import { readFileSync } from "node:fs";
+import { join } from "node:path";
+import { createTestContext } from "./test-helpers.ts";
+const { assertTrue, report } = createTestContext();
+const srcPath = join(import.meta.dirname, "..", "auto-start.ts");
+const src = readFileSync(srcPath, "utf-8");
+console.log("\n=== #3822: worktree bootstrap uses project DB path ===");
+const dbLifecycleIdx = src.indexOf("// ── DB lifecycle ──");
+assertTrue(dbLifecycleIdx > 0, "auto-start.ts has a DB lifecycle section");
+const dbLifecycleRegion = dbLifecycleIdx > 0 ? src.slice(dbLifecycleIdx, dbLifecycleIdx + 600) : "";
+assertTrue(
+  dbLifecycleRegion.includes("const gsdDbPath = resolveProjectRootDbPath(s.basePath);"),
+  "DB lifecycle resolves the project-root DB path after worktree entry (#3822)",
+);
+assertTrue(
+  !dbLifecycleRegion.includes('join(s.basePath, ".gsd", "gsd.db")'),
+  "DB lifecycle no longer derives gsd.db directly from the worktree path (#3822)",
+);
+report();

package/src/resources/extensions/gsd/tests/bootstrap-derive-state-db-open.test.ts ADDED

@@ -0,0 +1,39 @@
+import { describe, test } from "node:test";
+import assert from "node:assert/strict";
+import { readFileSync } from "node:fs";
+import { join } from "node:path";
+const systemContextSrc = readFileSync(
+  join(import.meta.dirname, "..", "bootstrap", "system-context.ts"),
+  "utf-8",
+);
+const registerHooksSrc = readFileSync(
+  join(import.meta.dirname, "..", "bootstrap", "register-hooks.ts"),
+  "utf-8",
+);
+describe("bootstrap deriveState DB guards (#3844)", () => {
+  test("system-context opens DB before deriveState in resume flows", () => {
+    const helperIdx = systemContextSrc.indexOf("const ensureStateDbOpen = async () => {");
+    const firstDeriveIdx = systemContextSrc.indexOf("const state = await deriveState(basePath);");
+    assert.ok(helperIdx > -1, "system-context should define a DB-open helper for deriveState callers");
+    assert.ok(firstDeriveIdx > -1, "system-context should still derive state for resume flows");
+    assert.ok(helperIdx < firstDeriveIdx, "system-context should prepare DB opening before deriveState resume calls");
+    assert.match(
+      systemContextSrc,
+      /await ensureStateDbOpen\(\);\s*\n\s*const state = await deriveState\(basePath\);/g,
+      "system-context resume flows should open DB before deriveState",
+    );
+  });
+  test("register-hooks opens DB before deriveState in session_before_compact", () => {
+    const compactIdx = registerHooksSrc.indexOf('pi.on("session_before_compact"');
+    assert.ok(compactIdx > -1, "register-hooks should define session_before_compact");
+    const compactSection = registerHooksSrc.slice(compactIdx, compactIdx + 1600);
+    const ensureIdx = compactSection.indexOf("ensureDbOpen()");
+    const deriveIdx = compactSection.indexOf("deriveState(basePath)");
+    assert.ok(ensureIdx > -1, "session_before_compact should call ensureDbOpen()");
+    assert.ok(deriveIdx > -1, "session_before_compact should derive state");
+    assert.ok(ensureIdx < deriveIdx, "session_before_compact should open DB before deriveState");
+  });
+});
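Both tests reduce to the same check: within some source text, one call must appear before another. The sketch below writes that check as a standalone predicate. It assumes plain substring matching, as the tests use, and `appearsBefore` is a hypothetical helper name.

```typescript
// Hypothetical helper: true only when both needles occur in `src` and the
// first occurrence of `earlier` comes before the first occurrence of `later`.
function appearsBefore(src: string, earlier: string, later: string): boolean {
  const a = src.indexOf(earlier);
  const b = src.indexOf(later);
  return a > -1 && b > -1 && a < b;
}

const hook = `pi.on("session_before_compact", async () => {
  await ensureDbOpen();
  const state = await deriveState(basePath);
});`;

console.log(appearsBefore(hook, "ensureDbOpen()", "deriveState(basePath)")); // true
console.log(appearsBefore(hook, "deriveState(basePath)", "ensureDbOpen()")); // false
```

This predicate compares only first occurrences. That limitation explains why the system-context test also uses a regex requiring `await ensureStateDbOpen();` to sit directly before the `deriveState` line.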

package/src/resources/extensions/gsd/tests/complete-slice-prompt-task-summary-layout.test.ts ADDED

@@ -0,0 +1,18 @@
+import test from "node:test";
+import assert from "node:assert/strict";
+import { readFileSync } from "node:fs";
+import { join } from "node:path";
+const promptPath = join(process.cwd(), "src/resources/extensions/gsd/prompts/complete-slice.md");
+const prompt = readFileSync(promptPath, "utf-8");
+test("complete-slice prompt explains the flat task summary layout", () => {
+  assert.match(prompt, /flat file layout/i);
+  assert.match(prompt, /T01-SUMMARY\.md/);
+  assert.match(prompt, /not inside per-task subdirectories like `tasks\/T01\/SUMMARY\.md`/i);
+});
+test("complete-slice prompt forbids the wrong task summary glob", () => {
+  assert.match(prompt, /find .*tasks -name "\*-SUMMARY\.md"/i);
+  assert.match(prompt, /Never use `tasks\/\*\/SUMMARY\.md`/);
+});
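These prompt assertions fix a layout rule. Task summaries live directly under `tasks/` as files like `T01-SUMMARY.md`, not in per-task subdirectories such as `tasks/T01/SUMMARY.md`. The sketch below is a matcher that accepts only the flat form. It assumes task IDs are `T` followed by digits, as in `T01`. The function name and the regex are illustrative and do not come from the prompt.

```typescript
// Illustrative only: decide whether a path relative to a slice's tasks/
// directory is a task summary in the flat layout (e.g. "T01-SUMMARY.md").
// Nested paths such as "T01/SUMMARY.md" are rejected.
function isFlatTaskSummary(relPath: string): boolean {
  return /^T\d+-SUMMARY\.md$/.test(relPath);
}

const entries = ["T01-SUMMARY.md", "T02-SUMMARY.md", "T01/SUMMARY.md", "T01-PLAN.md"];
console.log(entries.filter(isFlatTaskSummary)); // [ 'T01-SUMMARY.md', 'T02-SUMMARY.md' ]
```

The prompt itself uses the broader `find ... -name "*-SUMMARY.md"` pattern. The tighter `T\d+` prefix here is an assumption about task naming.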

package/src/resources/extensions/gsd/tests/dispatch-guard.test.ts CHANGED

@@ -145,6 +145,33 @@ test("dispatch guard falls back to positional ordering when no dependencies decl
   );
 });
+test("dispatch guard ignores positionally-earlier reverse dependents for zero-dependency slices (#3720)", (t) => {
+  const repo = setupRepo();
+  t.after(() => teardownRepo(repo));
+  mkdirSync(join(repo, ".gsd", "milestones", "M015"), { recursive: true });
+  insertMilestone({ id: "M015", title: "Reverse dependency fallback" });
+  insertSlice({ id: "S03", milestoneId: "M015", title: "Complete prerequisite", status: "complete", depends: [], sequence: 0 });
+  insertSlice({ id: "S04", milestoneId: "M015", title: "Depends on S04A", status: "pending", depends: ["S03", "S04A"], sequence: 0 });
+  insertSlice({ id: "S04A", milestoneId: "M015", title: "No explicit deps", status: "pending", depends: [], sequence: 0 });
+  writeFileSync(join(repo, ".gsd", "milestones", "M015", "M015-ROADMAP.md"), "# M015\n");
+  // S04A has no declared dependencies and should not be blocked by S04, because
+  // S04 itself depends on S04A. With sequence=0, DB ordering falls back to id.
+  assert.equal(
+    getPriorSliceCompletionBlocker(repo, "main", "execute-task", "M015/S04A/T02"),
+    null,
+  );
+  // The reverse direction is still blocked normally.
+  assert.equal(
+    getPriorSliceCompletionBlocker(repo, "main", "execute-task", "M015/S04/T01"),
+    "Cannot dispatch execute-task M015/S04/T01: dependency slice M015/S04A is not complete.",
+  );
+});
 test("dispatch guard allows slice with all declared dependencies complete", (t) => {
   const repo = setupRepo();
   t.after(() => teardownRepo(repo));
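The #3720 test describes two rules. When a slice declares dependencies, only those are checked. When it declares none, the guard falls back to positional order (sequence, then id), but it must skip an earlier slice that itself depends on the target. Otherwise S04 would block S04A while waiting on S04A, and neither could run. The sketch below models those rules under stated assumptions. `Slice` and `findBlockingSlice` are hypothetical, only direct reverse dependents are handled, and the real `getPriorSliceCompletionBlocker` reads from the GSD database and may work differently.

```typescript
// Hypothetical model of the rules exercised by the test above.
interface Slice {
  id: string;
  status: "pending" | "complete";
  depends: string[];
  sequence: number;
}

// Returns the id of the slice blocking `targetId`, or null if none.
function findBlockingSlice(slices: Slice[], targetId: string): string | null {
  const target = slices.find((s) => s.id === targetId);
  if (!target) return null;

  // Declared dependencies replace positional ordering entirely.
  if (target.depends.length > 0) {
    for (const depId of target.depends) {
      const dep = slices.find((s) => s.id === depId);
      if (dep && dep.status !== "complete") return depId;
    }
    return null;
  }

  // Positional fallback: order by sequence, then by id when sequences tie.
  const ordered = [...slices].sort(
    (a, b) => a.sequence - b.sequence || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  );
  for (const s of ordered) {
    if (s.id === targetId) break;
    // Skip reverse dependents: a slice waiting on the target cannot finish first.
    if (s.status !== "complete" && !s.depends.includes(targetId)) return s.id;
  }
  return null;
}

const slices: Slice[] = [
  { id: "S03", status: "complete", depends: [], sequence: 0 },
  { id: "S04", status: "pending", depends: ["S03", "S04A"], sequence: 0 },
  { id: "S04A", status: "pending", depends: [], sequence: 0 },
];
console.log(findBlockingSlice(slices, "S04A")); // null
console.log(findBlockingSlice(slices, "S04")); // S04A
```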

package/src/resources/extensions/gsd/tests/execute-task-prompt-existing-artifact-guard.test.ts ADDED

@@ -0,0 +1,33 @@
+import test from "node:test";
+import assert from "node:assert/strict";
+import { readFileSync } from "node:fs";
+import { dirname, join } from "node:path";
+import { fileURLToPath } from "node:url";
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const promptsDir = join(__dirname, "..", "prompts");
+test("execute-task prompt requires reading existing artifacts before write", () => {
+  const prompt = readFileSync(join(promptsDir, "execute-task.md"), "utf-8");
+  assert.match(
+    prompt,
+    /Before any `Write` that creates an artifact or output file, check whether that path already exists\./,
+    "execute-task prompt should require an existence check before creating artifacts",
+  );
+  assert.match(
+    prompt,
+    /If it does, read it first and decide whether the work is already done, should be extended, or truly needs replacement\./,
+    "execute-task prompt should require reading existing artifacts before replacement",
+  );
+});
+test("guided resume prompt checks for pre-existing artifacts", () => {
+  const prompt = readFileSync(join(promptsDir, "guided-resume-task.md"), "utf-8");
+  assert.match(
+    prompt,
+    /Before you create any expected artifact or output file, check whether it already exists and read it first/i,
+    "guided resume prompt should guard pre-existing artifacts",
+  );
+});
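The prompt rule these tests enforce is an instruction to the agent: before creating an artifact, check whether it exists, and if it does, read it and decide whether the work is done, should be extended, or needs replacement. The sketch below applies the same discipline in code. `writeArtifactGuarded` is a hypothetical helper. Its `revise` callback returns the content to write, or `null` to keep the existing file untouched.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: create `path` only if it is missing. Otherwise read the
// existing file first and let `revise` decide what, if anything, to write.
function writeArtifactGuarded(
  path: string,
  freshContent: string,
  revise: (existing: string) => string | null,
): "created" | "updated" | "kept" {
  if (!existsSync(path)) {
    writeFileSync(path, freshContent, "utf-8");
    return "created";
  }
  // A prior session may already have started this file, so read it first.
  const next = revise(readFileSync(path, "utf-8"));
  if (next === null) return "kept";
  writeFileSync(path, next, "utf-8");
  return "updated";
}

const dir = mkdtempSync(join(tmpdir(), "artifact-"));
const file = join(dir, "T01-SUMMARY.md");
console.log(writeArtifactGuarded(file, "# T01\n", () => null)); // created
console.log(writeArtifactGuarded(file, "# T01\n", () => null)); // kept
console.log(writeArtifactGuarded(file, "# T01\n", (old) => old + "More notes\n")); // updated
```

The `existsSync` check is not atomic. When several writers may race, `writeFileSync(path, data, { flag: "wx" })` is safer because it fails with `EEXIST` instead of overwriting.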

package/src/resources/extensions/gsd/tests/forensics-stuck-loops.test.ts CHANGED

@@ -101,3 +101,65 @@ test("#1943 detectStuckLoops ignores watchdog duplicates but flags real re-dispa
   assert.equal(anomalies.length, 1, `expected 1 anomaly (for the 3x dispatched task), got ${anomalies.length}`);
   assert.ok(anomalies[0].summary.includes("3 times"));
 });
+test("#3760 detectStuckLoops ignores cross-session recovery re-dispatches", () => {
+  const anomalies: ForensicAnomaly[] = [];
+  const units: UnitMetrics[] = [
+    makeUnit({
+      type: "plan-slice",
+      id: "M001/S02",
+      startedAt: 1000,
+      finishedAt: 2000,
+      autoSessionKey: "session-a",
+    }),
+    makeUnit({
+      type: "plan-slice",
+      id: "M001/S02",
+      startedAt: 5000,
+      finishedAt: 6000,
+      autoSessionKey: "session-b",
+    }),
+  ];
+  detectStuckLoops(units, anomalies);
+  assert.equal(anomalies.length, 0, "cross-session recovery should not be flagged as a stuck loop");
+});
+test("#3760 detectStuckLoops still flags repeated dispatches within one auto session", () => {
+  const anomalies: ForensicAnomaly[] = [];
+  const units: UnitMetrics[] = [
+    makeUnit({
+      type: "complete-slice",
+      id: "M011/S02",
+      startedAt: 1000,
+      finishedAt: 2000,
+      autoSessionKey: "session-a",
+    }),
+    makeUnit({
+      type: "complete-slice",
+      id: "M011/S02",
+      startedAt: 5000,
+      finishedAt: 6000,
+      autoSessionKey: "session-a",
+    }),
+    makeUnit({
+      type: "complete-slice",
+      id: "M011/S02",
+      startedAt: 9000,
+      finishedAt: 10000,
+      autoSessionKey: "session-b",
+    }),
+  ];
+  detectStuckLoops(units, anomalies);
+  assert.equal(anomalies.length, 1, "within-session retries should still be flagged");
+  assert.ok(anomalies[0].summary.includes("2 times"), `summary should reflect the worst same-session loop: ${anomalies[0].summary}`);
+  assert.ok(
+    anomalies[0].details.includes("Cross-session recovery runs are ignored"),
+    `details should explain the session-aware rule: ${anomalies[0].details}`,
+  );
+});
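The two #3760 tests define a session-aware rule. Repeat dispatches of the same unit count as a loop only within one `autoSessionKey`, and the reported count is the worst single-session total. The sketch below models that counting. `UnitRun` is an assumed subset of the fields `makeUnit` sets. The threshold of 2 is inferred from the second test, and the watchdog-duplicate filtering from #1943 is not modeled. `sameSessionLoops` is hypothetical, not the package's `detectStuckLoops`.

```typescript
// Assumed subset of the unit fields used in the tests above.
interface UnitRun {
  type: string;
  id: string;
  autoSessionKey: string;
}

// Hypothetical sketch: for each unit (type + id), count dispatches per auto
// session and report the worst single-session count when it reaches 2.
function sameSessionLoops(units: UnitRun[]): Array<{ unit: string; times: number }> {
  const perUnit = new Map<string, Map<string, number>>();
  for (const u of units) {
    const unit = `${u.type} ${u.id}`;
    const sessions = perUnit.get(unit) ?? new Map<string, number>();
    sessions.set(u.autoSessionKey, (sessions.get(u.autoSessionKey) ?? 0) + 1);
    perUnit.set(unit, sessions);
  }
  const loops: Array<{ unit: string; times: number }> = [];
  perUnit.forEach((sessions, unit) => {
    // Cross-session reruns are recovery, so counts are never summed across sessions.
    const times = Math.max(...Array.from(sessions.values()));
    if (times >= 2) loops.push({ unit, times });
  });
  return loops;
}

console.log(sameSessionLoops([
  { type: "plan-slice", id: "M001/S02", autoSessionKey: "session-a" },
  { type: "plan-slice", id: "M001/S02", autoSessionKey: "session-b" },
])); // []
console.log(sameSessionLoops([
  { type: "complete-slice", id: "M011/S02", autoSessionKey: "session-a" },
  { type: "complete-slice", id: "M011/S02", autoSessionKey: "session-a" },
  { type: "complete-slice", id: "M011/S02", autoSessionKey: "session-b" },
])); // [ { unit: 'complete-slice M011/S02', times: 2 } ]
```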

package/src/resources/extensions/gsd/tests/format-shortcut.test.ts CHANGED

@@ -4,6 +4,7 @@
 import test from 'node:test';
 import assert from 'node:assert/strict';
 import { formatShortcut } from '../files.ts';
+import { formattedShortcutPair, primaryShortcutCombo, fallbackShortcutCombo } from '../shortcut-defs.ts';
 // ─── formatShortcut renders per-platform shortcuts ──────────────────────
@@ -67,3 +68,33 @@ test('formatShortcut: passes through plain key names', () => {
   assert.strictEqual(formatShortcut('Escape'), 'Escape');
   assert.strictEqual(formatShortcut('Enter'), 'Enter');
 });
+test("shortcut-defs: exposes canonical dashboard combos", () => {
+  assert.equal(primaryShortcutCombo("dashboard"), "Ctrl+Alt+G");
+  assert.equal(fallbackShortcutCombo("dashboard"), "Ctrl+Shift+G");
+});
+test("shortcut-defs: formats shortcut pair using platform symbols", () => {
+  const pair = formattedShortcutPair("notifications");
+  if (process.platform === "darwin") {
+    assert.equal(pair, "⌃⌥N / ⌃⇧N");
+  } else {
+    assert.equal(pair, "Ctrl+Alt+N / Ctrl+Shift+N");
+  }
+});
+test("shortcut-defs: parallel shortcut omits fallback (hasFallback: false)", () => {
+  const pair = formattedShortcutPair("parallel");
+  if (process.platform === "darwin") {
+    assert.equal(pair, "⌃⌥P", "parallel should only show primary combo");
+  } else {
+    assert.equal(pair, "Ctrl+Alt+P", "parallel should only show primary combo");
+  }
+  // Verify it does NOT contain the fallback separator
+  assert.ok(!pair.includes("/"), "parallel pair should not contain fallback separator");
+});
+test("shortcut-defs: dashboard shortcut includes fallback (hasFallback: true)", () => {
+  const pair = formattedShortcutPair("dashboard");
+  assert.ok(pair.includes("/"), "dashboard pair should contain fallback separator");
+});
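The shortcut-defs tests pin down a formatting contract. The primary combo is Ctrl+Alt plus a key, and the fallback is Ctrl+Shift plus the same key. macOS renders modifiers as `⌃⌥⇧` symbols, and the fallback appears after ` / ` only when `hasFallback` is true. The sketch below reproduces that contract. The `DEFS` table and function names are hypothetical and may not match how `shortcut-defs.ts` is structured. The platform is a parameter rather than a read of `process.platform`, so both branches can be checked on any machine.

```typescript
// Hypothetical table mirroring the combos asserted in the tests above.
interface ShortcutDef {
  key: string;
  hasFallback: boolean;
}

const DEFS: Record<string, ShortcutDef> = {
  dashboard: { key: "G", hasFallback: true },
  notifications: { key: "N", hasFallback: true },
  parallel: { key: "P", hasFallback: false },
};

// macOS uses modifier symbols with no separator; other platforms spell the
// modifiers out and join everything with "+".
function formatCombo(mods: string[], key: string, platform: string): string {
  if (platform === "darwin") {
    const symbols: Record<string, string> = { Ctrl: "⌃", Alt: "⌥", Shift: "⇧" };
    return mods.map((m) => symbols[m]).join("") + key;
  }
  return [...mods, key].join("+");
}

function shortcutPair(name: string, platform: string): string {
  const def = DEFS[name];
  if (!def) throw new Error(`unknown shortcut: ${name}`);
  const primary = formatCombo(["Ctrl", "Alt"], def.key, platform);
  if (!def.hasFallback) return primary;
  return `${primary} / ${formatCombo(["Ctrl", "Shift"], def.key, platform)}`;
}

console.log(shortcutPair("notifications", "darwin")); // ⌃⌥N / ⌃⇧N
console.log(shortcutPair("notifications", "linux")); // Ctrl+Alt+N / Ctrl+Shift+N
console.log(shortcutPair("parallel", "linux")); // Ctrl+Alt+P
```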