npm - gsd-pi - Versions diffs - 2.80.0-dev.c5c38454b → 2.80.0-dev.f55d16d13 - Mend

gsd-pi 2.80.0-dev.c5c38454b → 2.80.0-dev.f55d16d13

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (77)

package/src/resources/extensions/gsd/prompts/complete-milestone.md CHANGED

@@ -16,15 +16,14 @@ Start with what the excerpts give you. Read full files when the section heads si
 **On-demand Read ordering:** Complete all slice SUMMARY Reads you need for cross-slice synthesis, the Decision Re-evaluation table, and LEARNINGS **before** calling `gsd_complete_milestone` (step 12). Once that tool runs, the milestone is marked complete in the DB, so it must be the final persistent milestone-closeout write.
-### Delegate Review Work
+### Closeout Review Mode
-Use `subagent` for review work needing fresh context, before drafting LEARNINGS:
+The inlined context includes a validation status block.
-- Cross-slice integrations or new public APIs -> **reviewer** with milestone diff and roadmap.
-- Auth, network, parsing, file IO, shell exec, or crypto -> **security** audit.
-- Significant tests added or changed -> **tester** coverage check against success criteria.
+- If it says a passing validation artifact is present, treat that artifact as authoritative for success criteria, requirement coverage, verification classes, and cross-slice integration. Do not delegate fresh reviewer/security/tester audits unless the validation artifact is internally inconsistent with the inlined summaries.
+- If validation is missing, stale, non-pass, or internally inconsistent, use `subagent` for review work needing fresh context before drafting LEARNINGS: cross-slice integrations or new public APIs -> **reviewer**; auth, network, parsing, file IO, shell exec, or crypto -> **security**; significant tests added or changed -> **tester**.
-Subagents report only; they do not write user source. Fold findings into Decision Re-evaluation and LEARNINGS before completion.
+Subagents report only; they do not write user source. Fold any findings into Decision Re-evaluation and LEARNINGS before completion.
 {{inlinedContext}}
@@ -33,8 +32,8 @@ Subagents report only; they do not write user source. Fold findings into Decisio
 1. Use the **Milestone Summary** output template from the inlined context above
 2. {{skillActivation}}
 3. **Verify code changes exist.** Compare milestone work against the integration branch (`main`, `master`, or recorded branch), using merge-base as older revision and `HEAD` as newer. If the diff lists non-`.gsd/` files, pass. If `HEAD` equals the integration branch/merge-base, treat it as a self-diff retry: inspect milestone-scoped commit evidence (`GSD-Unit: {{milestoneId}}` or production `GSD-Task: Sxx/Tyy` trailers touching `.gsd/milestones/{{milestoneId}}/`) and verify those commits touched non-`.gsd/` files. Record **verification failure** only when neither source shows implementation files.
-4. Verify every **success criterion** from `{{roadmapPath}}` with evidence from summaries, tests, or observable behavior. Record unmet criteria as **verification failure**.
-5. Verify **definition of done**: all slices `[x]`, summaries exist, and integrations work. Record unmet items as **verification failure**.
+4. Verify every **success criterion** from `{{roadmapPath}}`. If passing validation is present, summarize the validation evidence instead of re-auditing it; otherwise verify with evidence from summaries, tests, or observable behavior. Record unmet criteria as **verification failure**.
+5. Verify **definition of done**: all slices `[x]`, summaries exist, and integrations work. If passing validation is present, trust its integration/verification verdict unless inconsistent with current artifacts. Record unmet items as **verification failure**.
 6. If the roadmap includes a **Horizontal Checklist**, verify each item and note unchecked items in the summary.
 7. Fill the **Decision Re-evaluation** table: compare each key `.gsd/DECISIONS.md` decision from this milestone with what shipped, and flag decisions to revisit.
 8. Validate **requirement status transitions**. For each changed requirement, confirm evidence supports the new status. Requirements may move between Active, Validated, Deferred, Blocked, or Out of Scope only with proof.
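Read as a decision rule, the Closeout Review Mode section does one of two things. It either trusts a passing, consistent validation artifact, or it routes each review concern to a subagent role. The TypeScript below is a hypothetical sketch of that rule only. `ValidationStatus`, `ReviewConcern`, `closeoutReviewers`, and the concern labels are illustrative names, not gsd-pi code. The real prompt leaves these judgments to the model.

```typescript
// Hypothetical model of the "Closeout Review Mode" rule; names are illustrative,
// not part of gsd-pi.
type ValidationStatus = "pass" | "missing" | "stale" | "non-pass";

type ReviewConcern =
  | "cross-slice-integration"
  | "public-api"
  | "auth"
  | "network"
  | "parsing"
  | "file-io"
  | "shell-exec"
  | "crypto"
  | "tests-changed";

type Reviewer = "reviewer" | "security" | "tester";

// Concern-to-role routing taken from the prompt text.
const ROUTES: Record<ReviewConcern, Reviewer> = {
  "cross-slice-integration": "reviewer",
  "public-api": "reviewer",
  auth: "security",
  network: "security",
  parsing: "security",
  "file-io": "security",
  "shell-exec": "security",
  crypto: "security",
  "tests-changed": "tester",
};

function closeoutReviewers(
  status: ValidationStatus,
  internallyConsistent: boolean,
  concerns: ReviewConcern[],
): Reviewer[] {
  // A passing, internally consistent artifact is authoritative: no fresh audits.
  if (status === "pass" && internallyConsistent) return [];
  // Otherwise delegate, de-duplicating roles in first-seen order.
  const roles = new Set<Reviewer>();
  for (const concern of concerns) roles.add(ROUTES[concern]);
  return Array.from(roles);
}
```

For example, a stale validation combined with changed auth code and new tests would yield `["security", "tester"]`.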

package/src/resources/extensions/gsd/prompts/plan-milestone.md CHANGED

@@ -48,7 +48,7 @@ Narrate decomposition reasoning in complete sentences: grouping, risk order, ver
 Then:
 1. Use the **Roadmap** output template from the inlined context above
 2. {{skillActivation}}
-3. Create only as many demoable vertical slices as the work genuinely needs.
+3. Create only as many demoable vertical slices as the work genuinely needs. Use 1-10 slices, sized to the work; tiny/single-file/static work should usually be one slice.
 4. Order by risk, high-risk first.
 5. Call `gsd_plan_milestone` to persist milestone fields, slice rows, and **Horizontal Checklist** through the DB-backed path. Fill checklist concerns considered during planning: requirements, decisions, shutdown, revenue, auth, shared resources, reconnection. Omit for trivial milestones. Do **not** write `{{outputPath}}`, `ROADMAP.md`, or other planning artifacts manually; the tool owns rendering and persistence.
 6. If planning produced structural decisions (slice ordering, technology choices, scope exclusions), call `gsd_decision_save` for each; the tool assigns IDs and regenerates `.gsd/DECISIONS.md`.
@@ -78,6 +78,8 @@ Apply these when decomposing and ordering slices:
 - Ship features, not proofs; use clearly marked realistic stubs only when necessary.
 - **Dependency format is comma-separated, never range syntax.** Write `depends:[S01,S02,S03]`, not `depends:[S01-S03]`.
 - Roadmap ambition must match the milestone; right-size decomposition.
+- Missing ecosystem markers are not a reason to over-plan. If Project Classification says `untyped-existing`, treat the listed content files as the project surface and use generic file-level workflow guidance.
+- For `untyped-existing` projects with 1-2 content files, prefer exactly one slice unless the request clearly spans multiple independent user-visible capabilities. For 3-5 content files, prefer 1-2 slices.
 ## Progressive Planning (ADR-011)
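The slice-count rules above can be summarized as a small lookup. `preferredSliceRange` is a hypothetical helper written against the prompt text. gsd-pi expresses these rules as model guidance, not code. The prompt leaves two cases open: 1-2 file projects that span independent capabilities, and `untyped-existing` projects with more than five content files. For those, the sketch assumes the general 1-10 range.

```typescript
// Hypothetical encoding of the plan-milestone slice-count guidance;
// gsd-pi states these rules as prompt text, not as this function.
type ProjectKind = "invalid-repo" | "greenfield" | "untyped-existing" | "typed-existing";

interface SliceRange {
  min: number;
  max: number;
}

// "Use 1-10 slices, sized to the work."
const GENERAL_RANGE: SliceRange = { min: 1, max: 10 };

function preferredSliceRange(
  kind: ProjectKind,
  contentFileCount: number,
  spansIndependentCapabilities = false,
): SliceRange {
  if (kind !== "untyped-existing") return GENERAL_RANGE;
  // 1-2 content files: exactly one slice unless the request clearly spans
  // multiple independent user-visible capabilities.
  if (contentFileCount <= 2 && !spansIndependentCapabilities) return { min: 1, max: 1 };
  // 3-5 content files: prefer 1-2 slices.
  if (contentFileCount >= 3 && contentFileCount <= 5) return { min: 1, max: 2 };
  // Assumption: cases the prompt leaves open use the general range.
  return GENERAL_RANGE;
}
```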

package/src/resources/extensions/gsd/safety/evidence-collector.ts CHANGED

@@ -50,6 +50,15 @@ export interface FileEditEvidence {
 export type EvidenceEntry = BashEvidence | FileWriteEvidence | FileEditEvidence;
+const EXECUTION_TOOL_NAMES = new Set([
+  "bash",
+  "Bash",
+  "gsd_exec",
+  "gsd_exec_search",
+  "mcp__gsd-workflow__gsd_exec",
+  "mcp__gsd-workflow__gsd_exec_search",
+]);
 // ─── Module State ───────────────────────────────────────────────────────────
 let unitEvidence: EvidenceEntry[] = [];
@@ -188,11 +197,11 @@ export function clearEvidenceFromDisk(
  * Exit codes and output are filled in by recordToolResult after execution.
  */
 export function recordToolCall(toolCallId: string, toolName: string, input: Record<string, unknown>): void {
-  if (toolName === "bash" || toolName === "Bash") {
+  if (EXECUTION_TOOL_NAMES.has(toolName)) {
     unitEvidence.push({
       kind: "bash",
       toolCallId,
-      command: String(input.command ?? ""),
+      command: String(input.command ?? input.cmd ?? input.query ?? ""),
       exitCode: -1,
       outputSnippet: "",
       timestamp: Date.now(),
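The evidence-collector diff makes two changes. It replaces two string comparisons with a name set, and it adds a `command`/`cmd`/`query` fallback. Both can be exercised in isolation. `executionCommand` is a hypothetical helper that mirrors the diff; gsd-pi does not export it.

```typescript
// Hypothetical helper mirroring the recordToolCall change; not exported by gsd-pi.
const EXECUTION_TOOL_NAMES = new Set<string>([
  "bash",
  "Bash",
  "gsd_exec",
  "gsd_exec_search",
  "mcp__gsd-workflow__gsd_exec",
  "mcp__gsd-workflow__gsd_exec_search",
]);

// Returns the command text to record as execution evidence, or null when the
// tool is not an execution tool.
function executionCommand(toolName: string, input: Record<string, unknown>): string | null {
  if (!EXECUTION_TOOL_NAMES.has(toolName)) return null;
  return String(input.command ?? input.cmd ?? input.query ?? "");
}
```

`??` only falls through on `null` or `undefined`. An explicit empty `command` is therefore recorded as an empty string rather than falling back to `cmd` or `query`.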

package/src/resources/extensions/gsd/tests/auto-loop.test.ts CHANGED

@@ -2556,7 +2556,7 @@ test("autoLoop warns but proceeds for greenfield project (no project files) (#18
     "should not stop with health check failure for greenfield project",
   );
   const greenfieldWarning = notifications.find(
-    (n) => n.includes("no recognized project files") && n.includes("greenfield"),
+    (n) => n.includes("no project content yet") && n.includes("greenfield"),
   );
   assert.ok(
     greenfieldWarning,

package/src/resources/extensions/gsd/tests/clean-root-preflight.test.ts CHANGED

@@ -131,7 +131,7 @@ test("postflightPopStash — restores stashed changes and emits info notificatio
     run('git commit -m "simulate merge"', repo);
     const postNotifications: Array<{ msg: string; level: string }> = [];
-    postflightPopStash(repo, "M004", (msg, level) => {
+    postflightPopStash(repo, "M004", preflight.stashMarker, (msg, level) => {
       postNotifications.push({ msg, level });
     });
@@ -171,7 +171,7 @@ test("preflight + merge + postflight round-trip preserves uncommitted changes",
     run('git commit -m "feat: add feature"', repo);
     // Postflight: pop stash
-    postflightPopStash(repo, "M005", () => {});
+    postflightPopStash(repo, "M005", preflight.stashMarker, () => {});
     // README.md must still have our local content
     const restored = readFileSync(join(repo, "README.md"), "utf-8");
@@ -184,3 +184,89 @@ test("preflight + merge + postflight round-trip preserves uncommitted changes",
     try { rmSync(repo, { recursive: true, force: true, maxRetries: 3, retryDelay: 100 }); } catch { /* ignore */ }
   }
 });
+test("postflightPopStash conflict warning names the exact stash ref", () => {
+  const repo = createTempRepo();
+  try {
+    writeFileSync(join(repo, "README.md"), "# local work\n");
+    const preflight = preflightCleanRoot(repo, "M005C", () => {});
+    assert.equal(preflight.stashPushed, true, "must have stashed");
+    writeFileSync(join(repo, "README.md"), "# merged work\n");
+    run("git add README.md", repo);
+    run('git commit -m "simulate conflicting merge"', repo);
+    const notifications: Array<{ msg: string; level: string }> = [];
+    postflightPopStash(repo, "M005C", preflight.stashMarker, (msg, level) => {
+      notifications.push({ msg, level });
+    });
+    const warning = notifications.find((n) => n.level === "warning")?.msg ?? "";
+    assert.match(warning, /git stash pop stash@\{\d+\}/);
+    assert.match(warning, /git stash apply stash@\{\d+\}/);
+  } finally {
+    try { rmSync(repo, { recursive: true, force: true, maxRetries: 3, retryDelay: 100 }); } catch { /* ignore */ }
+  }
+});
+test("postflightPopStash restores the matching GSD stash, not stash@{0}", () => {
+  const repo = createTempRepo();
+  try {
+    writeFileSync(join(repo, "README.md"), "# target stash\n");
+    const preflight = preflightCleanRoot(repo, "M006", () => {});
+    assert.equal(preflight.stashPushed, true, "must have stashed target change");
+    writeFileSync(join(repo, "other.txt"), "other stash\n");
+    run('git stash push --include-untracked -m "unrelated newer stash"', repo);
+    postflightPopStash(repo, "M006", preflight.stashMarker, () => {});
+    const content = readFileSync(join(repo, "README.md"), "utf-8");
+    assert.equal(content.replace(/\r\n/g, "\n"), "# target stash\n");
+    const stashList = run("git stash list", repo);
+    assert.ok(stashList.includes("unrelated newer stash"), "unrelated newer stash must remain");
+    assert.ok(!stashList.includes("gsd-preflight-stash [gsd-preflight-stash:M006"), "target stash should be consumed");
+  } finally {
+    try { rmSync(repo, { recursive: true, force: true, maxRetries: 3, retryDelay: 100 }); } catch { /* ignore */ }
+  }
+});
+test("postflightPopStash restores the exact preflight marker when another same-milestone stash exists", () => {
+  const repo = createTempRepo();
+  try {
+    writeFileSync(join(repo, "README.md"), "# target stash\n");
+    const preflight = preflightCleanRoot(repo, "M007", () => {});
+    assert.equal(preflight.stashPushed, true, "must have stashed target change");
+    assert.ok(preflight.stashMarker, "preflight must expose exact stash marker");
+    writeFileSync(join(repo, "same-milestone.txt"), "newer same milestone stash\n");
+    run('git stash push --include-untracked -m "gsd-preflight-stash [gsd-preflight-stash:M007:other]"', repo);
+    postflightPopStash(repo, "M007", preflight.stashMarker, () => {});
+    const content = readFileSync(join(repo, "README.md"), "utf-8");
+    assert.equal(content.replace(/\r\n/g, "\n"), "# target stash\n");
+    const stashList = run("git stash list", repo);
+    assert.ok(stashList.includes("gsd-preflight-stash:M007:other"), "newer same-milestone stash must remain");
+    assert.ok(!stashList.includes(preflight.stashMarker), "exact target stash should be consumed");
+  } finally {
+    try { rmSync(repo, { recursive: true, force: true, maxRetries: 3, retryDelay: 100 }); } catch { /* ignore */ }
+  }
+});
+test("postflightPopStash falls back to milestone marker prefix when exact marker is unavailable", () => {
+  const repo = createTempRepo();
+  try {
+    writeFileSync(join(repo, "README.md"), "# fallback stash\n");
+    run('git stash push --include-untracked -m "gsd-preflight-stash [gsd-preflight-stash:M008:fallback]"', repo);
+    postflightPopStash(repo, "M008", undefined, () => {});
+    const content = readFileSync(join(repo, "README.md"), "utf-8");
+    assert.equal(content.replace(/\r\n/g, "\n"), "# fallback stash\n");
+    const stashList = run("git stash list", repo);
+    assert.ok(!stashList.includes("gsd-preflight-stash:M008:fallback"), "fallback stash should be consumed");
+  } finally {
+    try { rmSync(repo, { recursive: true, force: true, maxRetries: 3, retryDelay: 100 }); } catch { /* ignore */ }
+  }
+});
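The new tests pin down which stash `postflightPopStash` should restore. If preflight recorded an exact marker, it restores that stash; otherwise it restores a stash carrying the milestone's marker prefix. It never simply takes `stash@{0}`. The sketch below reproduces that selection over `git stash list` output, whose default lines look like `stash@{1}: On main: <message>`. `findGsdStashRef` is hypothetical. The bracketed `[gsd-preflight-stash:<milestone>:<suffix>]` marker shape is inferred from the test fixtures. The diff does not show two further choices, both assumptions here: falling back only when no marker was recorded, and preferring the newest prefix match.

```typescript
// Hypothetical sketch of marker-based stash selection; not the gsd-pi source.
function findGsdStashRef(
  stashList: string,
  milestoneId: string,
  exactMarker?: string,
): string | null {
  // Parse "stash@{N}: <description>" lines into ref/text pairs.
  const entries: Array<{ ref: string; text: string }> = [];
  for (const line of stashList.split(/\r?\n/)) {
    const match = /^(stash@\{\d+\}):\s(.*)$/.exec(line);
    if (match) entries.push({ ref: match[1], text: match[2] });
  }
  if (exactMarker !== undefined) {
    // Exact marker recorded by preflight: match it or nothing, never stash@{0}.
    const exact = entries.find((e) => e.text.indexOf(`[${exactMarker}]`) >= 0);
    return exact ? exact.ref : null;
  }
  // No marker recorded: take the newest stash with this milestone's prefix.
  const prefix = `[gsd-preflight-stash:${milestoneId}:`;
  const fallback = entries.find((e) => e.text.indexOf(prefix) >= 0);
  return fallback ? fallback.ref : null;
}
```

The resolved ref could then be passed to `git stash pop <ref>` and named in a conflict warning. The `stash@\{\d+\}` assertions check exactly that behavior.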

package/src/resources/extensions/gsd/tests/detection.test.ts CHANGED

@@ -11,12 +11,14 @@
 import test from "node:test";
 import assert from "node:assert/strict";
 import { mkdirSync, writeFileSync, rmSync, existsSync } from "node:fs";
+import { execFileSync } from "node:child_process";
 import { join } from "node:path";
 import { tmpdir } from "node:os";
 import {
   detectProjectState,
   detectV1Planning,
   detectProjectSignals,
+  classifyProject,
   scanProjectFiles,
 } from "../detection.ts";
@@ -37,6 +39,18 @@ function cleanup(dir: string): void {
   }
 }
+function git(dir: string, args: string[]): void {
+  execFileSync("git", args, { cwd: dir, stdio: "ignore" });
+}
+function makeGitRepo(prefix: string): string {
+  const dir = makeTempDir(prefix);
+  git(dir, ["init"]);
+  git(dir, ["config", "user.email", "test@example.com"]);
+  git(dir, ["config", "user.name", "Test User"]);
+  return dir;
+}
 // ─── detectProjectState ─────────────────────────────────────────────────────────
 test("detectProjectState: empty directory returns state=none", (t) => {
@@ -49,6 +63,132 @@ test("detectProjectState: empty directory returns state=none", (t) => {
   assert.equal(result.v2, undefined);
 });
+test("classifyProject: no git repo is invalid", (t) => {
+  const dir = makeTempDir("classify-invalid");
+  t.after(() => cleanup(dir));
+  const classification = classifyProject(dir);
+  assert.equal(classification.kind, "invalid-repo");
+});
+test("classifyProject: empty git repo is greenfield", (t) => {
+  const dir = makeGitRepo("classify-greenfield");
+  t.after(() => cleanup(dir));
+  const classification = classifyProject(dir);
+  assert.equal(classification.kind, "greenfield");
+});
+test("classifyProject: nested empty git repo does not inherit ancestor markers", (t) => {
+  const parent = makeGitRepo("classify-parent-marker");
+  t.after(() => cleanup(parent));
+  writeFileSync(join(parent, "package.json"), JSON.stringify({ name: "parent" }), "utf-8");
+  git(parent, ["add", "package.json"]);
+  git(parent, ["commit", "-m", "add parent marker"]);
+  const child = join(parent, "nested");
+  mkdirSync(child, { recursive: true });
+  git(child, ["init"]);
+  git(child, ["config", "user.email", "test@example.com"]);
+  git(child, ["config", "user.name", "Test User"]);
+  const classification = classifyProject(child);
+  assert.equal(classification.kind, "greenfield");
+});
+test("classifyProject: tracked static HTML is existing untyped content", (t) => {
+  const dir = makeGitRepo("classify-index");
+  t.after(() => cleanup(dir));
+  writeFileSync(join(dir, "index.html"), "<main></main>\n", "utf-8");
+  git(dir, ["add", "index.html"]);
+  git(dir, ["commit", "-m", "add static page"]);
+  const classification = classifyProject(dir);
+  assert.equal(classification.kind, "untyped-existing");
+  assert.deepEqual(classification.contentFiles, ["index.html"]);
+});
+test("classifyProject: README-only repo is existing untyped content", (t) => {
+  const dir = makeGitRepo("classify-readme");
+  t.after(() => cleanup(dir));
+  writeFileSync(join(dir, "README.md"), "# docs\n", "utf-8");
+  git(dir, ["add", "README.md"]);
+  git(dir, ["commit", "-m", "add docs"]);
+  const classification = classifyProject(dir);
+  assert.equal(classification.kind, "untyped-existing");
+});
+test("classifyProject: src-only content is untyped existing, not typed marker", (t) => {
+  const dir = makeGitRepo("classify-src-only");
+  t.after(() => cleanup(dir));
+  mkdirSync(join(dir, "src"), { recursive: true });
+  writeFileSync(join(dir, "src", "index.txt"), "content\n", "utf-8");
+  git(dir, ["add", "src/index.txt"]);
+  git(dir, ["commit", "-m", "add source content"]);
+  const classification = classifyProject(dir);
+  assert.equal(classification.kind, "untyped-existing");
+  assert.deepEqual(classification.contentFiles, ["src/index.txt"]);
+});
+test("classifyProject: nested untracked files count as project content", (t) => {
+  const dir = makeGitRepo("classify-untracked-nested");
+  t.after(() => cleanup(dir));
+  mkdirSync(join(dir, "docs"), { recursive: true });
+  writeFileSync(join(dir, "docs", "index.html"), "<main></main>\n", "utf-8");
+  const classification = classifyProject(dir);
+  assert.equal(classification.kind, "untyped-existing");
+  assert.deepEqual(classification.untrackedFiles, ["docs/index.html"]);
+});
+test("classifyProject: known markers produce typed existing project", (t) => {
+  const dir = makeGitRepo("classify-typed");
+  t.after(() => cleanup(dir));
+  writeFileSync(join(dir, "package.json"), JSON.stringify({ name: "typed" }), "utf-8");
+  git(dir, ["add", "package.json"]);
+  git(dir, ["commit", "-m", "add package"]);
+  const classification = classifyProject(dir);
+  assert.equal(classification.kind, "typed-existing");
+  assert.ok(classification.markers.includes("package.json"));
+});
+test("classifyProject: ignored build/cache-only files do not count as content", (t) => {
+  const dir = makeGitRepo("classify-ignored");
+  t.after(() => cleanup(dir));
+  writeFileSync(join(dir, ".gitignore"), "dist/\n.cache/\n", "utf-8");
+  git(dir, ["add", ".gitignore"]);
+  git(dir, ["commit", "-m", "ignore generated files"]);
+  mkdirSync(join(dir, "dist"), { recursive: true });
+  writeFileSync(join(dir, "dist", "bundle.js"), "generated\n", "utf-8");
+  mkdirSync(join(dir, ".cache"), { recursive: true });
+  writeFileSync(join(dir, ".cache", "x"), "cache\n", "utf-8");
+  const classification = classifyProject(dir);
+  assert.equal(classification.kind, "greenfield");
+});
+test("classifyProject: generated framework/cache dirs do not count as content", (t) => {
+  const dir = makeGitRepo("classify-generated-dirs");
+  t.after(() => cleanup(dir));
+  mkdirSync(join(dir, ".next", "server"), { recursive: true });
+  writeFileSync(join(dir, ".next", "server", "page.js"), "generated\n", "utf-8");
+  mkdirSync(join(dir, ".venv", "lib"), { recursive: true });
+  writeFileSync(join(dir, ".venv", "lib", "site.py"), "generated\n", "utf-8");
+  const classification = classifyProject(dir);
+  assert.equal(classification.kind, "greenfield");
+});
 test("detectProjectState: directory with .gsd/milestones/M001 returns v2-gsd", (t) => {
   const dir = makeTempDir("v2-gsd");
   t.after(() => cleanup(dir));
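The `classifyProject` tests describe four outcomes. The pure function below is a hypothetical model of those expectations, not the `detection.ts` implementation. It assumes `files` lists tracked plus untracked, non-ignored paths relative to the repository root, such as the output of `git ls-files --cached --others --exclude-standard`. That input would already omit the gitignored `dist/` and `.cache/` files. Run inside a nested repository, it would also not list the parent's `package.json`. The sketch excludes `.gitignore`, `.gsd/`, `.next/`, and `.venv/` because the tests treat them as non-content; the prompt tests count one content file beside `.gsd/`. Only `package.json` is confirmed as a marker by the tests. The other marker names and `node_modules` are assumptions.

```typescript
// Hypothetical, pure model of the classification the tests describe.
type ProjectKind = "invalid-repo" | "greenfield" | "untyped-existing" | "typed-existing";

interface Classification {
  kind: ProjectKind;
  contentFiles: string[];
  markers: string[];
}

// Only package.json is confirmed by the tests; the rest are assumed examples.
const KNOWN_MARKERS = new Set(["package.json", "Cargo.toml", "go.mod", "pyproject.toml"]);
// Top-level directories and files that never count as project content.
const NON_CONTENT_DIRS = new Set([".gsd", ".next", ".venv", "node_modules"]);
const NON_CONTENT_FILES = new Set([".gitignore"]);

function classify(isGitRepo: boolean, files: string[]): Classification {
  if (!isGitRepo) return { kind: "invalid-repo", contentFiles: [], markers: [] };
  const contentFiles = files
    .map((f) => f.replace(/\\/g, "/"))
    .filter((f) => !NON_CONTENT_FILES.has(f) && !NON_CONTENT_DIRS.has(f.split("/")[0]))
    .sort();
  // Markers are matched at the repository root only.
  const markers = contentFiles.filter((f) => KNOWN_MARKERS.has(f));
  if (contentFiles.length === 0) return { kind: "greenfield", contentFiles, markers };
  const kind: ProjectKind = markers.length > 0 ? "typed-existing" : "untyped-existing";
  return { kind, contentFiles, markers };
}
```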

package/src/resources/extensions/gsd/tests/right-sized-workflow-prompts.test.ts ADDED

@@ -0,0 +1,192 @@
+import test from "node:test";
+import assert from "node:assert/strict";
+import { execFileSync } from "node:child_process";
+import { mkdtempSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
+import { join } from "node:path";
+import { tmpdir } from "node:os";
+import { buildCompleteMilestonePrompt, buildPlanMilestonePrompt } from "../auto-prompts.ts";
+function git(cwd: string, args: string[]): string {
+  return execFileSync("git", args, {
+    cwd,
+    stdio: ["ignore", "pipe", "pipe"],
+    encoding: "utf-8",
+    env: { ...process.env, GIT_AUTHOR_NAME: "Test User", GIT_AUTHOR_EMAIL: "test@example.com", GIT_COMMITTER_NAME: "Test User", GIT_COMMITTER_EMAIL: "test@example.com" },
+  }).trim();
+}
+function makeRepo(files: Record<string, string>): string {
+  const base = mkdtempSync(join(tmpdir(), "gsd-right-size-"));
+  git(base, ["init", "-b", "main"]);
+  mkdirSync(join(base, ".gsd", "milestones", "M001"), { recursive: true });
+  writeFileSync(join(base, ".gsd", "milestones", "M001", "M001-CONTEXT.md"), "# Context\n\nTest milestone.");
+  for (const [path, content] of Object.entries(files)) {
+    const abs = join(base, path);
+    mkdirSync(join(abs, ".."), { recursive: true });
+    writeFileSync(abs, content);
+  }
+  git(base, ["add", "."]);
+  git(base, ["commit", "-m", "init"]);
+  return base;
+}
+function writeCompleteMilestoneFiles(base: string, validation: string): void {
+  const dir = join(base, ".gsd", "milestones", "M001");
+  mkdirSync(join(dir, "slices", "S01"), { recursive: true });
+  writeFileSync(join(dir, "M001-ROADMAP.md"), "# M001\n\n## Slices\n- [x] **S01: One** `risk:low` `depends:[]`\n  > Done\n");
+  writeFileSync(join(dir, "M001-VALIDATION.md"), validation);
+  writeFileSync(join(dir, "slices", "S01", "S01-SUMMARY.md"), "# S01 Summary\n\n**Verification:** passed\n");
+}
+function validationMetadata(): string {
+  return [
+    "validation_metadata:",
+    "  covered_artifacts:",
+    "    - `.gsd/milestones/M001/M001-VALIDATION.md`",
+    "    - `.gsd/milestones/M001/M001-ROADMAP.md`",
+    "    - `.gsd/milestones/M001/slices/S01/S01-SUMMARY.md`",
+  ].join("\n");
+}
+test("plan-milestone prompt includes tiny untyped project classification and one-slice guidance", async () => {
+  const base = makeRepo({ "index.html": "<!doctype html>\n<title>Test</title>\n" });
+  try {
+    const prompt = await buildPlanMilestonePrompt("M001", "Polish static page", base, "minimal");
+    assert.match(prompt, /\*\*Kind:\*\* untyped-existing/);
+    assert.match(prompt, /\*\*Content files:\*\* 1/);
+    assert.match(prompt, /`index\.html`/);
+    assert.match(prompt, /Prefer exactly one slice/);
+  } finally {
+    rmSync(base, { recursive: true, force: true });
+  }
+});
+test("plan-milestone prompt includes small untyped project 1-2 slice guidance", async () => {
+  const base = makeRepo({
+    "index.html": "html",
+    "README.md": "readme",
+    "styles.css": "body {}",
+  });
+  try {
+    const prompt = await buildPlanMilestonePrompt("M001", "Polish static files", base, "minimal");
+    assert.match(prompt, /\*\*Kind:\*\* untyped-existing/);
+    assert.match(prompt, /\*\*Content files:\*\* 3/);
+    assert.match(prompt, /Prefer 1-2 slices/);
+  } finally {
+    rmSync(base, { recursive: true, force: true });
+  }
+});
+test("plan-milestone prompt keeps normal guidance for typed projects", async () => {
+  const base = makeRepo({
+    "package.json": "{\"scripts\":{\"test\":\"node --test\"}}\n",
+    "src/index.js": "console.log('ok');\n",
+  });
+  try {
+    const prompt = await buildPlanMilestonePrompt("M001", "Update app", base, "minimal");
+    assert.match(prompt, /\*\*Kind:\*\* typed-existing/);
+    assert.match(prompt, /Use normal ecosystem-aware planning guidance/);
+    assert.doesNotMatch(prompt, /Prefer exactly one slice/);
+  } finally {
+    rmSync(base, { recursive: true, force: true });
+  }
+});
+test("workflow docs no longer contain blanket 4-10 slice guidance", () => {
+  const docs = readFileSync(join(process.cwd(), "src", "resources", "GSD-WORKFLOW.md"), "utf-8");
+  assert.doesNotMatch(docs, /4-10 slices/);
+  assert.match(docs, /1-10 slices/);
+  assert.match(docs, /single-file/);
+});
+test("prompt templates carry right-sized planning and closeout mode guidance", () => {
+  const planTemplate = readFileSync(join(process.cwd(), "src", "resources", "extensions", "gsd", "prompts", "plan-milestone.md"), "utf-8");
+  const completeTemplate = readFileSync(join(process.cwd(), "src", "resources", "extensions", "gsd", "prompts", "complete-milestone.md"), "utf-8");
+  assert.match(planTemplate, /Use 1-10 slices, sized to the work/);
+  assert.match(planTemplate, /tiny\/single-file\/static work should usually be one slice/);
+  assert.match(planTemplate, /untyped-existing/);
+  assert.match(completeTemplate, /Closeout Review Mode/);
+  assert.match(completeTemplate, /passing validation artifact is present/);
+  assert.doesNotMatch(completeTemplate, /^### Delegate Review Work/m);
+});
+test("complete-milestone prompt trusts passing validation artifact", async () => {
+  const base = makeRepo({ "index.html": "<!doctype html>\n<title>Test</title>\n" });
+  try {
+    writeCompleteMilestoneFiles(base, `---\nverdict: pass\nremediation_round: 0\n---\n\n# Validation\n${validationMetadata()}\n\nAll checks passed.`);
+    const prompt = await buildCompleteMilestonePrompt("M001", "Polish static page", base, "minimal");
+    assert.match(prompt, /Passing Validation Artifact/);
+    assert.match(prompt, /Treat it as authoritative/);
+    assert.match(prompt, /Do not delegate fresh reviewer\/security\/tester audits/);
+    assert.match(prompt, /All checks passed/);
+  } finally {
+    rmSync(base, { recursive: true, force: true });
+  }
+});
+test("complete-milestone prompt trusts centralized markdown body pass verdict", async () => {
+  const base = makeRepo({ "index.html": "<!doctype html>\n<title>Test</title>\n" });
+  try {
+    writeCompleteMilestoneFiles(base, `# Validation\n\n**Verdict:** PASS\n\n${validationMetadata()}\n\nAll checks passed.`);
+    const prompt = await buildCompleteMilestonePrompt("M001", "Polish static page", base, "minimal");
+    assert.match(prompt, /Passing Validation Artifact/);
+    assert.match(prompt, /Treat it as authoritative/);
+    assert.match(prompt, /Do not delegate fresh reviewer\/security\/tester audits/);
+  } finally {
+    rmSync(base, { recursive: true, force: true });
+  }
+});
+test("complete-milestone prompt does not trust stale pass validation without metadata", async () => {
+  const base = makeRepo({ "index.html": "<!doctype html>\n<title>Test</title>\n" });
+  try {
+    writeCompleteMilestoneFiles(base, "---\nverdict: pass\nremediation_round: 0\n---\n\n# Validation\nAll checks passed.");
+    const prompt = await buildCompleteMilestonePrompt("M001", "Polish static page", base, "minimal");
+    assert.match(prompt, /Validation Requires Attention/);
+    assert.match(prompt, /missing freshness metadata/);
+    assert.doesNotMatch(prompt, /Passing Validation Artifact/);
+  } finally {
+    rmSync(base, { recursive: true, force: true });
+  }
+});
+test("complete-milestone prompt does not trust pass validation missing current summary coverage", async () => {
+  const base = makeRepo({ "index.html": "<!doctype html>\n<title>Test</title>\n" });
+  try {
+    writeCompleteMilestoneFiles(base, [
+      "---",
+      "verdict: pass",
+      "remediation_round: 0",
+      "---",
+      "",
+      "# Validation",
+      "validation_metadata:",
+      "  covered_artifacts:",
+      "    - `.gsd/milestones/M001/M001-VALIDATION.md`",
+      "    - `.gsd/milestones/M001/M001-ROADMAP.md`",
+      "",
+      "All checks passed.",
+    ].join("\n"));
+    const prompt = await buildCompleteMilestonePrompt("M001", "Polish static page", base, "minimal");
+    assert.match(prompt, /Validation Requires Attention/);
+    assert.match(prompt, /does not cover current milestone artifacts/);
+    assert.doesNotMatch(prompt, /Passing Validation Artifact/);
+  } finally {
+    rmSync(base, { recursive: true, force: true });
+  }
+});
+test("complete-milestone prompt keeps deeper review path without passing validation", async () => {
+  const base = makeRepo({ "index.html": "<!doctype html>\n<title>Test</title>\n" });
+  try {
+    writeCompleteMilestoneFiles(base, "---\nverdict: needs-attention\nremediation_round: 0\n---\n\n# Validation\nFix gaps.");
+    const prompt = await buildCompleteMilestonePrompt("M001", "Polish static page", base, "minimal");
+    assert.match(prompt, /Validation Requires Attention/);
+    assert.match(prompt, /verdict `needs-attention`/);
+    assert.match(prompt, /Use `subagent` for review work needing fresh context/i);
+  } finally {
+    rmSync(base, { recursive: true, force: true });
+  }
+});
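The complete-milestone tests suggest when a validation artifact is trusted. It needs a `pass` verdict, either in YAML-style frontmatter or as a `**Verdict:** PASS` body line. Its `validation_metadata` must also list every current milestone artifact under `covered_artifacts`. The sketch below is a hypothetical reading of those fixtures, not the gsd-pi parser. `assessValidation`, the line-based parsing, and the choice of which artifacts count as current are all assumptions. The reason strings echo the prompt phrases the tests assert on.

```typescript
// Hypothetical sketch of the validation-trust check implied by the tests;
// not gsd-pi's parser. Reason strings echo the prompt wording.
type ValidationAssessment =
  | { trusted: true }
  | { trusted: false; reason: string };

// Reads `verdict:` from leading frontmatter, else a `**Verdict:**` body line.
function readVerdict(text: string): string | null {
  const front = /^---\r?\n([\s\S]*?)\r?\n---/.exec(text);
  const fm = front ? /^verdict:\s*(\S+)\s*$/m.exec(front[1]) : null;
  if (fm) return fm[1].toLowerCase();
  const body = /\*\*Verdict:\*\*\s*(\S+)/i.exec(text);
  return body ? body[1].toLowerCase() : null;
}

// Returns the backticked paths listed under `covered_artifacts:`,
// or null when no freshness metadata is present.
function coveredArtifacts(text: string): string[] | null {
  const lines = text.split(/\r?\n/);
  const start = lines.findIndex((line) => line.trim() === "covered_artifacts:");
  if (start < 0) return null;
  const paths: string[] = [];
  for (const line of lines.slice(start + 1)) {
    const item = /^\s*-\s*`([^`]+)`\s*$/.exec(line);
    if (!item) break; // the list ends at the first non-item line
    paths.push(item[1]);
  }
  return paths;
}

function assessValidation(text: string, currentArtifacts: string[]): ValidationAssessment {
  const verdict = readVerdict(text);
  if (verdict !== "pass") {
    return { trusted: false, reason: `verdict \`${verdict ?? "missing"}\`` };
  }
  const covered = coveredArtifacts(text);
  if (covered === null) return { trusted: false, reason: "missing freshness metadata" };
  const uncovered = currentArtifacts.filter((path) => covered.indexOf(path) < 0);
  if (uncovered.length > 0) {
    return { trusted: false, reason: "does not cover current milestone artifacts" };
  }
  return { trusted: true };
}
```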

package/src/resources/extensions/gsd/tests/safety-harness-false-positives.test.ts CHANGED

@@ -144,6 +144,18 @@ test("safety-harness-bug2-race: bash evidence survives mid-unit reset between to
   assert.ok(bash[0].outputSnippet.includes("found"), "output snippet captured");
 });
+test("safety-harness: gsd_exec counts as execution evidence", () => {
+  resetEvidence();
+  recordToolCall("tc-exec-1", "gsd_exec", { command: "grep -n render index.html" });
+  recordToolResult("tc-exec-1", "gsd_exec", "Command exited with code 0\n1:render\n", false);
+  const bash = getEvidence().filter((e): e is BashEvidence => e.kind === "bash");
+  assert.equal(bash.length, 1, "gsd_exec must be tracked as execution evidence");
+  assert.equal(bash[0].command, "grep -n render index.html");
+  assert.equal(bash[0].exitCode, 0);
+});
 // ─── Bug 3: git diff HEAD~1 scope check ─────────────────────────────────────
 test("safety-harness-bug3: validateFileChanges works on initial commit (no HEAD~1)", (t) => {
@@ -237,3 +249,20 @@ test("safety-harness-bug3: validateFileChanges works on merge commit", (t) => {
   // Must produce a valid result without throwing
   assert.ok(audit !== null, "audit must be produced for merge commit repo");
 });
+test("safety-harness: planned changed file avoids unexpected-file warning", (t) => {
+  const base = mkdtempSync(join(tmpdir(), "gsd-planned-file-"));
+  t.after(() => rmSync(base, { recursive: true, force: true }));
+  execFileSync("git", ["init"], { cwd: base });
+  execFileSync("git", ["config", "user.email", "test@example.com"], { cwd: base });
+  execFileSync("git", ["config", "user.name", "Test User"], { cwd: base });
+  writeFileSync(join(base, "index.html"), "<main></main>\n");
+  execFileSync("git", ["add", "index.html"], { cwd: base });
+  execFileSync("git", ["commit", "-m", "add static app"], { cwd: base });
+  const audit = validateFileChanges(base, [], ["index.html"]);
+  assert.ok(audit !== null, "audit must be produced");
+  assert.deepEqual(audit!.unexpectedFiles, [], "planned index.html must not be unexpected");
+  assert.deepEqual(audit!.missingFiles, [], "planned index.html must not be missing");
+});
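The last test checks that a file listed in the plan is reported neither as unexpected nor as missing. A set-difference audit captures that contract. `auditPlannedFiles` is a hypothetical helper; its argument order and path normalization are assumptions, not the signature of `validateFileChanges`.

```typescript
// Hypothetical set-difference audit; not the gsd-pi validateFileChanges source.
interface FileAudit {
  unexpectedFiles: string[];
  missingFiles: string[];
}

function auditPlannedFiles(changedFiles: string[], plannedFiles: string[]): FileAudit {
  // Normalize Windows separators and a leading "./" so paths compare equal.
  const normalize = (p: string): string => p.replace(/\\/g, "/").replace(/^\.\//, "");
  const changed = new Set(changedFiles.map(normalize));
  const planned = new Set(plannedFiles.map(normalize));
  return {
    // Changed by the commit but never mentioned in the plan.
    unexpectedFiles: Array.from(changed).filter((f) => !planned.has(f)).sort(),
    // Planned but not touched by the commit.
    missingFiles: Array.from(planned).filter((f) => !changed.has(f)).sort(),
  };
}
```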