@tekyzinc/gsd-t 4.1.10 → 4.2.10 (npm version diff)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/CHANGELOG.md +17 -0
package/README.md +3 -0
package/bin/gsd-t-traceability-gate.cjs +338 -0
package/bin/gsd-t.js +13 -0
package/commands/gsd-t-help.md +8 -0
package/commands/gsd-t-plan.md +5 -3
package/package.json +1 -1
package/templates/CLAUDE-global.md +1 -1
package/templates/prompts/pre-mortem-subagent.md +46 -0
package/templates/workflows/gsd-t-phase.workflow.js +82 -1

package/CHANGELOG.md CHANGED

@@ -2,6 +2,23 @@
 All notable changes to GSD-T are documented here. Updated with each release.
+## [4.2.10] - 2026-06-05 (M83 Left-Shifted Plan Hardening - minor)
+### Added - Plan-phase hardening: catch dead deliverables and edge cases BEFORE execute
+Left-shifts failure detection from verify to plan. Adversarial validation (the Red Team) ran only at verify — after code exists — so a milestone whose headline capability shipped as DEAD CODE (the NiceNote M5 incident: a 100MB+ chunked reader built but never wired into `openPath`, with no test exercising it) burned **four verify cycles** re-litigating the milestone's reason to exist. The root cause was in the plan: it never bound each acceptance criterion to a code path + a killing test, and nothing adversarial reviewed the design before code was written. The `plan` phase now runs two blocking gates before execute.
+- **Acceptance-traceability gate** (deterministic) — `bin/gsd-t-traceability-gate.cjs`, dispatched as `gsd-t traceability-gate`. Parses `.gsd-t/domains/*/tasks.md`; every behavioral task (one declaring acceptance criteria) must bind its ACs to a `**Files**` code path AND a named killing test; a `**Headline:** true` task must have BOTH a real implementation path and a test. Exit 4 blocks execute. Field detection is emphasis-stripped + colon-position-agnostic (`**Label**:` ≡ `**Label:**`); task blocks are detected by any non-structural heading bearing an AC (descriptive headings are not dropped); the test check is tied to the Test/Files/AC fields only (an incidental runner word in a Dependencies note does not clear it); pytest `test_*.py` / `*_test.py` conventions are preserved.
+- **Adversarial pre-mortem** (generative) — `templates/prompts/pre-mortem-subagent.md`, an opus, fresh-context, assume-the-plan-is-flawed reviewer wired into the plan workflow. Predicts edge-case / dead-deliverable / NFR / shallow-test failures and converts each blocking finding into a **required test** the plan must adopt (advisory notes forbidden — that is how M5's chunk reader shipped three data-loss bugs across three cycles). Verdict `BLOCK` / `CLEARED`.
+- The two gates are the temporal dual of the Red Team: attack the design at plan, not just the code at verify. The deterministic gate runs first and fails CLOSED (an unevaluable gate blocks); the pre-mortem cannot approve a gate-blocked plan.
+- New CLI `gsd-t traceability-gate [--milestone Mxx] [--tasks FILE]` (exit 0/4/64), added to project + global bin tools. Contract `.gsd-t/contracts/plan-hardening-contract.md` v1.0.0 STABLE. `gsd-t-plan.md` + the phase-workflow plan objective updated to require traceable tasks up front.
+- **Verification**: orthogonal triad ran. Adversarial Workflow Red Team (Opus, fresh context) FAILed first pass (1 CRITICAL — colon-inside-bold markdown defeated all field detection, silently passing the literal M5 dead-code plan — + 2 HIGH + 2 MEDIUM), all fixed; re-validation found a regression the CRITICAL fix introduced (underscore-stripping broke pytest paths, HIGH), fixed; final re-validation GRUDGING-PASS (14/14 checks, no new HIGH/CRITICAL). Real-sandbox acceptance gate passed (gate fires through the Workflow sandbox and blocks the bad plan). Suite 1372/0/4 (+15 M83 tests). Self-tested against the actual NiceNote M5 dead-code plan — the gate FAILs it at plan time, which is the milestone's reason to exist.
+- Origin: review of the NiceNote 9-milestone build, where the triad caught real bugs at verify but late; the user's proposal for an adversarial risk-assessment agent at plan.
+### Versioning
+Minor bump 4.1.10 → 4.2.10 (new feature, additive; patch reset to 10).
 ## [4.1.10] - 2026-06-05 (M82 Competition Mode - minor)
 ### Added - Competition Mode: generate-and-judge for upstream, pre-contract phases
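
The colon-position-agnostic field detection described in the 4.2.10 entry can be sketched as follows. The regex and the emphasis-stripping step mirror `FILES_FIELD_RE` and `_bare` from `bin/gsd-t-traceability-gate.cjs` in this diff; the `isFilesField` wrapper and the sample lines are illustrative assumptions, not part of the package.

```javascript
"use strict";

// Strip markdown emphasis markers so a label reads the same
// whether the colon sits inside or outside the bold span.
function bare(line) {
  return String(line == null ? "" : line).replace(/[*_`]/g, "");
}

// Same pattern as FILES_FIELD_RE in the diff; it expects a bared line.
const FILES_FIELD_RE = /^\s*[-*]?\s*files?\s*:/i;

// Hypothetical wrapper, for illustration only.
function isFilesField(line) {
  return FILES_FIELD_RE.test(bare(line));
}

console.log(isFilesField("- **Files**: src/reader.rs")); // true (colon outside bold)
console.log(isFilesField("- **Files:** src/reader.rs")); // true (colon inside bold)
console.log(FILES_FIELD_RE.test("- **Files:** src/reader.rs")); // false (raw line, not bared)
```

Matching the raw line fails because the `**` markers sit between the bullet and the label, which is why the detection runs on the bared text.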

package/README.md CHANGED

@@ -123,8 +123,11 @@ gsd-t ci-parity --json                                  # M57: reproduce the pro
 gsd-t test-data --list [--run ID] [--json]              # M58: list test-data ledger entries
 gsd-t test-data --purge --run ID [--dry-run] [--json]   # M58: purge tagged test data after Verify (Step 4.5)
 gsd-t competition-judge --in SPEC.json [--project-dir P] # M82: generate-and-judge selection oracle (partition / generic)
+gsd-t traceability-gate --milestone Mxx [--project-dir P] # M83: plan-phase acceptance-traceability gate (AC → path → killing test)
 ```
+**Plan Hardening (M83).** The `plan` phase now runs two blocking gates before execute, so a plan can't ship a dead deliverable: a deterministic **acceptance-traceability gate** (`gsd-t traceability-gate` — every AC must bind to a code path + a killing test; the headline capability needs both impl and test) and an adversarial **pre-mortem** agent (opus, fresh-context, predicts edge-case/NFR/dead-deliverable failures and requires a test for each). The temporal dual of the Red Team — attack the design at plan, not just the code at verify. Origin: a build where the headline capability shipped as dead code and burned 4 verify cycles. See `.gsd-t/contracts/plan-hardening-contract.md`.
 **Competition Mode (M82).** Opt-in `--competition N` (N 2–5) on upstream, pre-contract phases (`/gsd-t-partition`, `/gsd-t-milestone`, `/gsd-t-design-decompose`) fans out N parallel candidate producers and a judge selects the winner — the generative dual of the orthogonal validation triad. Partition uses an *objective* file-disjointness oracle as the judge (a calculator, not a biased critic); subjective phases use a blind + different-model + rubric judge. Default off. See `.gsd-t/contracts/competition-mode-contract.md`.
 `gsd-t parallel` consumes the M44 task-graph (D1) and applies three pre-spawn gates (D4 depgraph validation → D5 file-disjointness → D6 economics) followed by mode-aware headroom/split math. Extends — does not replace — the M40 orchestrator. Contract: `.gsd-t/contracts/wave-join-contract.md` v1.1.0.
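
A caller that wraps `gsd-t traceability-gate` might interpret its exit code along these lines. The codes 0, 4 and 64 are the ones documented in this diff; the function names and returned labels are hypothetical, and treating every non-zero code as blocking is an assumption drawn from the changelog's "fails CLOSED" wording.

```javascript
"use strict";

// Map the documented exit codes to a short label (labels are illustrative).
function describeGateExit(code) {
  switch (code) {
    case 0: return "traceable";   // all behavioral tasks are traceable
    case 4: return "blocked";     // at least one violation, execute is blocked
    case 64: return "bad-input";  // no tasks files or bad input
    default: return "unknown";
  }
}

// Fail closed: only an explicit 0 lets the plan proceed to execute.
function mayExecute(code) {
  return code === 0;
}

console.log(describeGateExit(4), mayExecute(4));   // blocked false
console.log(describeGateExit(0), mayExecute(0));   // traceable true
console.log(describeGateExit(64), mayExecute(64)); // bad-input false
```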

package/bin/gsd-t-traceability-gate.cjs ADDED

@@ -0,0 +1,338 @@
+"use strict";
+/**
+ * gsd-t-traceability-gate — M83 D1
+ *
+ * The plan-phase acceptance-traceability gate. The deterministic half of
+ * Left-Shifted Plan Hardening (the adversarial pre-mortem agent is the
+ * generative half). Contract: .gsd-t/contracts/plan-hardening-contract.md.
+ *
+ * ORIGIN (NiceNote M5 incident, 2026-06-05): M5's headline capability (AC-6,
+ * 100MB+ chunked read) shipped as DEAD CODE — the chunk reader was built but
+ * openPath still materialized whole files, and NO test asserted the headline
+ * capability, so the suite stayed green. The triad burned 4 verify cycles
+ * re-litigating the milestone's reason to exist. Root cause: the plan never
+ * bound each acceptance criterion to (a) a real code path and (b) a test that
+ * FAILS if that path is absent. This gate enforces that binding BEFORE execute.
+ *
+ * What it checks, per `.gsd-t/domains/* /tasks.md` task block:
+ *   - Every task that declares **Acceptance criteria** MUST declare **Files**
+ *     (an implementing code path) — an AC with no path is an unbacked promise.
+ *   - Every such task MUST declare a TEST reference (a Test/Tests field, a
+ *     test-runner mention, or a Files entry matching a test path pattern) — an
+ *     AC with no killing test is the dead-code class (passes vacuously / never
+ *     exercised). The milestone's HEADLINE capability without a test is exactly
+ *     the M5 failure.
+ *   - A task tagged as the milestone HEADLINE (**Headline:** true, or an AC
+ *     referencing the milestone's named capability) gets a STRICTER check: it
+ *     MUST have a non-test Files entry (real implementation, not just a test)
+ *     AND a test entry. A headline with only a test, or only an impl, fails.
+ *
+ * It does NOT judge whether the code is correct (that's verify) — only whether
+ * the PLAN is complete enough that execute can't produce a dead deliverable.
+ *
+ * Input: --milestone Mxx --project-dir PATH  (reads .gsd-t/domains/* /tasks.md).
+ *        OR --tasks <file> to check a single tasks.md (used by tests).
+ * Output: JSON envelope { ok, exitCode, milestone, tasks:[...], violations:[...] }.
+ * Exit: 0 all tasks traceable · 4 ≥1 violation (blocks execute) · 64 bad input.
+ *
+ * Hard rules: zero deps, never throws, pure/read-only.
+ */
+const fs = require("node:fs");
+const path = require("node:path");
+// ─── tasks.md parsing ────────────────────────────────────────────────────
+// Red Team CRITICAL/HIGH-3/MEDIUM-1 (M83 verify): markdown field labels appear in
+// BOTH `**Label**: v` (colon outside bold) and `**Label:** v` (colon inside) forms.
+// Matching against the raw line missed the colon-inside form — defeating the entire
+// gate on the canonical M5 dead-code plan. Fix: STRIP emphasis markers first, then
+// match the colon-agnostic bare text. All field detection runs on the bared line.
+function _bare(line) {
+  return String(line == null ? "" : line).replace(/[*_`]/g, "");
+}
+// Path-safe bare: strips only emphasis that wraps labels (* and backtick), but
+// PRESERVES underscores — pytest's test_*.py / *_test.py conventions depend on
+// them, and TEST_PATH_RE has `_test\.` / `test_` alternatives (Red Team M83
+// recheck HIGH: stripping `_` before the test-path scan false-failed Python plans).
+function _barePath(s) {
+  return String(s == null ? "" : s).replace(/[*`]/g, "");
+}
+// A test reference is: an explicit Test/Tests field, a known runner mention, or a
+// Files path that looks like a test file. Kept broad on purpose — the gate asserts
+// a test is NAMED, not that it exists yet (plan precedes execute).
+const TEST_PATH_RE = /(\.test\.|\.spec\.|(^|\/)tests?\/|(^|\/)e2e\/|_test\.|test_|cargo test|vitest|playwright|pytest|jest)/i;
+// Field regexes run on the BARED line, so the colon can be anywhere the label ends.
+const TEST_FIELD_RE = /^\s*[-*]?\s*(tests?|test\s*ref|test\s*coverage|verified\s*by)\s*:/i;
+const FILES_FIELD_RE = /^\s*[-*]?\s*files?\s*:/i;
+const AC_FIELD_RE = /^\s*[-*]?\s*(acceptance(\s*criteria)?|accept|ac)\s*:/i;
+const HEADLINE_FIELD_RE = /^\s*[-*]?\s*headline\s*:\s*(true|yes)/i;
+const HEADING_RE = /^(#{2,4})\s+(.*\S.*)$/;
+// Headings that are structural, never tasks — so we don't mis-parse a Summary/
+// Overview block as a behavioral task. Everything else that bears an AC field IS
+// assessed (Red Team HIGH-2: do NOT gate task detection on heading wording —
+// anchor on the AC, so a descriptive heading like "Implement the reader" is caught).
+const NON_TASK_HEADING_RE = /^(summary|overview|notes?|context|goal|background|wave\s*history|index|integration\s*points?|dependencies|references?|appendix|tasks)\s*$/i;
+/**
+ * Parse a tasks.md into candidate blocks: every `##`–`####` heading starts a
+ * block (except the structural-heading skip list). A block becomes a TASK for
+ * assessment iff it contains an acceptance-criteria field (decided later in
+ * assessTask) — but we keep ALL non-structural blocks so no AC-bearing block is
+ * ever dropped on heading wording.
+ * @returns {Array<{title, raw, lines}>}
+ */
+function parseTasks(md) {
+  const lines = (md || "").split(/\r?\n/);
+  const blocks = [];
+  let cur = null;
+  for (const line of lines) {
+    const m = line.match(HEADING_RE);
+    if (m) {
+      const title = m[2].trim();
+      // Close any open block at every heading.
+      if (cur) { blocks.push(cur); cur = null; }
+      // Structural headings start no block; everything else does.
+      if (!NON_TASK_HEADING_RE.test(_bare(title).trim())) {
+        cur = { title, lines: [] };
+      }
+      continue;
+    }
+    if (cur) cur.lines.push(line);
+  }
+  if (cur) blocks.push(cur);
+  return blocks.map((t) => ({ title: t.title, raw: t.lines.join("\n"), lines: t.lines }));
+}
+// ─── per-task traceability assessment ────────────────────────────────────
+// All field matching runs on the BARED line (emphasis stripped) so colon
+// position inside/outside bold is irrelevant (Red Team CRITICAL fix).
+function fieldValue(lines, re) {
+  for (const ln of lines) {
+    const bare = _bare(ln);
+    if (re.test(bare)) {
+      const idx = bare.indexOf(":");
+      return idx >= 0 ? bare.slice(idx + 1).trim() : "";
+    }
+  }
+  return null;
+}
+// Like fieldValue but PRESERVES underscores in the returned value (label is still
+// matched emphasis-agnostically) — used for value-level test-path scans so
+// test_*.py / *_test.py survive (Red Team recheck HIGH).
+function fieldValueRaw(lines, re) {
+  for (const ln of lines) {
+    if (re.test(_bare(ln))) {
+      const raw = _barePath(ln);
+      const idx = raw.indexOf(":");
+      return idx >= 0 ? raw.slice(idx + 1).trim() : "";
+    }
+  }
+  return null;
+}
+function hasMultiField(lines, re) {
+  return lines.some((ln) => re.test(_bare(ln)));
+}
+// Collect the indented/bulleted sub-lines that follow an Acceptance-criteria
+// label up to the next top-level field — these ARE the acceptance criteria, and
+// an AC may name its own verifying test there ("…; verified by cargo test").
+function _acBulletText(lines) {
+  const out = [];
+  let inAc = false;
+  for (const ln of lines) {
+    const bare = _bare(ln);
+    if (AC_FIELD_RE.test(bare)) { inAc = true; continue; }
+    if (!inAc) continue;
+    // A new NON-INDENTED "Label:" line closes the AC block.
+    if (/^\s*[-*]?\s*[a-z][a-z\s]{1,24}:/i.test(bare) && !/^\s{2,}/.test(ln)) {
+      inAc = false; continue;
+    }
+    out.push(_barePath(ln)); // preserve underscores for test-path detection
+  }
+  return out.join("\n");
+}
+/**
+ * A task is "behavioral" (subject to the gate) if it declares acceptance
+ * criteria — i.e. it promises an observable behavior. Pure-scaffolding tasks
+ * with no ACs are out of scope (nothing to trace).
+ */
+function assessTask(task) {
+  const lines = task.lines;
+  const hasAc = hasMultiField(lines, AC_FIELD_RE);
+  if (!hasAc) {
+    return { title: task.title, behavioral: false, violations: [] };
+  }
+  // Underscore-preserving values for path/runner scans (Red Team recheck HIGH).
+  const filesVal = fieldValueRaw(lines, FILES_FIELD_RE) || "";
+  const hasFiles = hasMultiField(lines, FILES_FIELD_RE) && filesVal.replace(/[—–-]/g, "").trim().length > 0;
+  // Test reference (MEDIUM-1 fix): satisfied ONLY by a runner/test-path tied to a
+  // RELEVANT field — the Test field, the Files field, or the Acceptance-criteria
+  // value (where an AC may name its own verifying test, e.g. "…; verified by cargo
+  // test"). An incidental runner mention in an UNRELATED field (Dependencies,
+  // Notes, Scope) must NOT vacuously clear the killing-test requirement.
+  const hasTestField = hasMultiField(lines, TEST_FIELD_RE);
+  const testFieldVal = fieldValueRaw(lines, TEST_FIELD_RE) || "";
+  const acVal = fieldValueRaw(lines, AC_FIELD_RE) || "";
+  // AC criteria often span bullet sub-lines after the label; gather those too
+  // (underscore-preserving, so a test_*.py named in a bullet still matches).
+  const acBullets = _acBulletText(lines);
+  const filesHasTestPath = TEST_PATH_RE.test(filesVal);
+  const testFieldHasRunner = TEST_PATH_RE.test(testFieldVal);
+  const acHasRunner = TEST_PATH_RE.test(acVal) || TEST_PATH_RE.test(acBullets);
+  const hasTest = hasTestField || filesHasTestPath || testFieldHasRunner || acHasRunner;
+  // A non-test implementing path: a Files entry that is NOT only test files.
+  const fileTokens = filesVal.split(/[,\s]+/).map((s) => s.replace(/[`*()]/g, "").trim()).filter(Boolean);
+  const implTokens = fileTokens.filter((f) => /[./]/.test(f) && !TEST_PATH_RE.test(f));
+  const hasImplPath = implTokens.length > 0;
+  const isHeadline = lines.some((ln) => HEADLINE_FIELD_RE.test(_bare(ln)));
+  const violations = [];
+  if (!hasFiles) {
+    violations.push({ kind: "ac-without-path", detail: "task declares acceptance criteria but no **Files** implementing path — an unbacked promise." });
+  }
+  if (!hasTest) {
+    violations.push({ kind: "ac-without-test", detail: "task declares acceptance criteria but names no test (Test field, test path, or runner) — the dead-code class: it can pass vacuously / never be exercised." });
+  }
+  if (isHeadline && !hasImplPath) {
+    violations.push({ kind: "headline-without-impl", detail: "HEADLINE task has no non-test implementing path — the milestone's reason to exist is not bound to real code (the M5 AC-6 dead-code failure)." });
+  }
+  if (isHeadline && !hasTest) {
+    violations.push({ kind: "headline-without-test", detail: "HEADLINE task has no test proving the milestone's core capability is delivered (the missing >100MB-fixture failure)." });
+  }
+  return {
+    title: task.title,
+    behavioral: true,
+    isHeadline,
+    hasFiles, hasTest, hasImplPath,
+    violations,
+  };
+}
+// ─── driver ──────────────────────────────────────────────────────────────
+function listTasksFiles(projectDir, milestone) {
+  const domainsDir = path.join(projectDir, ".gsd-t", "domains");
+  let entries = [];
+  try {
+    entries = fs.readdirSync(domainsDir, { withFileTypes: true });
+  } catch {
+    return [];
+  }
+  const out = [];
+  const mPrefix = milestone ? milestone.toLowerCase() : null;
+  for (const e of entries) {
+    if (!e.isDirectory()) continue;
+    // When a milestone is given, prefer domains whose name carries that mNN
+    // prefix; if none match, fall back to all domains (single-milestone repos).
+    const tasksPath = path.join(domainsDir, e.name, "tasks.md");
+    if (fs.existsSync(tasksPath)) out.push({ domain: e.name, tasksPath });
+  }
+  if (mPrefix) {
+    const matched = out.filter((d) => d.domain.toLowerCase().startsWith(mPrefix));
+    if (matched.length) return matched;
+  }
+  return out;
+}
+function runGate({ projectDir = process.cwd(), milestone = null, tasksFile = null } = {}) {
+  let files;
+  if (tasksFile) {
+    files = [{ domain: path.basename(path.dirname(tasksFile)), tasksPath: tasksFile }];
+  } else {
+    files = listTasksFiles(projectDir, milestone);
+  }
+  if (!files.length) {
+    return { ok: false, exitCode: 64, milestone, reason: "no-tasks-files", tasks: [], violations: [] };
+  }
+  const taskResults = [];
+  const violations = [];
+  let behavioralCount = 0;
+  for (const f of files) {
+    let md;
+    try { md = fs.readFileSync(f.tasksPath, "utf8"); } catch { continue; }
+    for (const t of parseTasks(md)) {
+      const r = assessTask(t);
+      r.domain = f.domain;
+      taskResults.push(r);
+      if (r.behavioral) behavioralCount++;
+      for (const v of r.violations) {
+        violations.push({ domain: f.domain, task: r.title, ...v });
+      }
+    }
+  }
+  const ok = violations.length === 0;
+  return {
+    ok,
+    exitCode: ok ? 0 : 4,
+    milestone,
+    summary: {
+      tasksTotal: taskResults.length,
+      behavioral: behavioralCount,
+      violations: violations.length,
+    },
+    tasks: taskResults,
+    violations,
+    ...(ok ? {} : { reason: "untraceable-acceptance-criteria" }),
+  };
+}
+// ─── CLI ─────────────────────────────────────────────────────────────────
+function parseArgs(argv) {
+  const o = { projectDir: process.cwd(), milestone: null, tasksFile: null, help: false };
+  for (let i = 0; i < argv.length; i++) {
+    const a = argv[i];
+    if (a === "--help" || a === "-h") o.help = true;
+    else if (a === "--project-dir") o.projectDir = argv[++i];
+    else if (a === "--milestone") o.milestone = argv[++i];
+    else if (a === "--tasks") o.tasksFile = argv[++i];
+    else if (a === "--json") {/* default */}
+  }
+  return o;
+}
+const HELP = `Usage: gsd-t traceability-gate [--milestone Mxx] [--project-dir PATH] [--tasks FILE]
+Plan-phase acceptance-traceability gate (M83). Asserts every behavioral task in
+the milestone's .gsd-t/domains/* /tasks.md binds its acceptance criteria to an
+implementing **Files** path AND a named test. Headline tasks must have BOTH a
+real implementation path and a test. Blocks execute on any violation.
+  --milestone Mxx    Limit to domains whose name carries the mNN prefix.
+  --project-dir P    Project root (default: cwd).
+  --tasks FILE       Check a single tasks.md (overrides domain discovery).
+Exit: 0 all traceable · 4 ≥1 violation · 64 no tasks files / bad input.`;
+function main() {
+  const o = parseArgs(process.argv.slice(2));
+  if (o.help) { process.stdout.write(HELP + "\n"); process.exit(0); }
+  let res;
+  try {
+    res = runGate(o);
+  } catch (e) {
+    res = { ok: false, exitCode: 64, milestone: o.milestone, reason: `gate-error: ${e && e.message}`, tasks: [], violations: [] };
+  }
+  process.stdout.write(JSON.stringify(res, null, 2) + "\n");
+  process.exit(res.exitCode);
+}
+if (require.main === module) main();
+module.exports = { runGate, parseTasks, assessTask, _internal: { fieldValue, TEST_PATH_RE } };
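
The reason the file keeps two stripping helpers (`_bare` and `_barePath`) can be seen in a small standalone sketch. `TEST_PATH_RE` and the token filter are copied from the diff above; `stripAll`, `stripPathSafe`, `namesTestPath`, `implPaths` and the sample `Files` value are illustrative names and data, not part of the package.

```javascript
"use strict";

// Copied from the diff above.
const TEST_PATH_RE = /(\.test\.|\.spec\.|(^|\/)tests?\/|(^|\/)e2e\/|_test\.|test_|cargo test|vitest|playwright|pytest|jest)/i;

// Same character classes as _bare and _barePath in the diff.
const stripAll = (s) => String(s).replace(/[*_`]/g, "");
const stripPathSafe = (s) => String(s).replace(/[*`]/g, "");

// Does a field value name a test path or runner?
function namesTestPath(value, strip) {
  return TEST_PATH_RE.test(strip(value));
}

// Non-test implementing paths, following the implTokens filter in assessTask.
function implPaths(value) {
  return stripPathSafe(value)
    .split(/[,\s]+/)
    .filter(Boolean)
    .filter((f) => /[./]/.test(f) && !TEST_PATH_RE.test(f));
}

const filesValue = "`src/reader.py`, `src/reader_test.py`";
console.log(namesTestPath(filesValue, stripAll));      // false: "_" removed, "reader_test.py" becomes "readertest.py"
console.log(namesTestPath(filesValue, stripPathSafe)); // true: "_test." survives
console.log(implPaths(filesValue));                    // [ 'src/reader.py' ]
```

Stripping underscores is harmless for label matching but erases the pytest naming convention from values, so value-level scans use the path-safe variant.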

package/bin/gsd-t.js CHANGED

@@ -1184,6 +1184,8 @@ const GLOBAL_BIN_TOOLS = [
   "gsd-t-ci-parity.cjs",
   // M82 — Competition Mode generate-and-judge selection oracle.
   "gsd-t-competition-judge.cjs",
+  // M83 — Plan-phase acceptance-traceability gate.
+  "gsd-t-traceability-gate.cjs",
 ];
 function installGlobalBinTools() {
@@ -2475,6 +2477,8 @@ const PROJECT_BIN_TOOLS = [
   // project's gsd-t-phase workflow can score candidate partitions via the
   // project-local bin (runCli prefers bin/<tool>.cjs over the global binary).
   "gsd-t-competition-judge.cjs", "gsd-t-file-disjointness.cjs",
+  // M83 — Plan-phase acceptance-traceability gate (runs in the plan workflow).
+  "gsd-t-traceability-gate.cjs",
 ];
 // Files that older versions of this installer copied into project bin/ but
@@ -4562,6 +4566,15 @@ if (require.main === module) {
       });
       process.exit(res.status == null ? 1 : res.status);
     }
+    case "traceability-gate": {
+      // M83 D1 — `gsd-t traceability-gate` plan-phase acceptance-traceability gate.
+      const { spawnSync } = require("child_process");
+      const js = path.join(__dirname, "gsd-t-traceability-gate.cjs");
+      const res = spawnSync(process.execPath, [js, ...args.slice(1)], {
+        stdio: "inherit",
+      });
+      process.exit(res.status == null ? 1 : res.status);
+    }
     case "metrics":
       doMetrics(args.slice(1));
       break;
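
The dispatch pattern in the `traceability-gate` case (spawn the tool with the current Node binary, then exit with the child's status) can be tried in isolation. This is a minimal sketch: `runAndPropagate` is a hypothetical name, and an inline `-e` script stands in for `gsd-t-traceability-gate.cjs`.

```javascript
"use strict";
const { spawnSync } = require("node:child_process");

// Run a child Node process and return the exit code the parent should use.
// spawnSync reports status as null when the child was killed by a signal
// or could not be spawned; that case is mapped to 1, as in the diff.
function runAndPropagate(nodeArgs) {
  const res = spawnSync(process.execPath, nodeArgs, { stdio: "inherit" });
  return res.status == null ? 1 : res.status;
}

// The stand-in child exits with 4, the gate's "violations" code.
console.log(runAndPropagate(["-e", "process.exit(4)"])); // 4
```

Using `process.execPath` rather than a bare `node` keeps the child on the same Node binary as the parent CLI.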

package/commands/gsd-t-help.md CHANGED

@@ -487,6 +487,14 @@ Use these when user asks for help on a specific command:
 - **CLI**: `gsd-t competition-judge [--in <spec.json>] [--project-dir <dir>]` (spec via stdin or `--in`). Exit 0 winner · 4 no valid candidate · 64 bad input.
 - **Contract**: `.gsd-t/contracts/competition-mode-contract.md` v1.0.0 STABLE.
+### traceability-gate (M83)
+- **Summary**: Plan-phase acceptance-traceability gate — the deterministic half of Left-Shifted Plan Hardening. Parses `.gsd-t/domains/*/tasks.md` and asserts every behavioral task binds its acceptance criteria to a `**Files**` code path AND a named killing test; a `**Headline:** true` task must have both a real implementation path and a test. Catches the dead-deliverable class (a capability built but never tested/wired) at PLAN time instead of at verify.
+- **Auto-invoked**: Yes — by `gsd-t-phase.workflow.js` at the end of the `plan` phase, blocking before execute (alongside the adversarial pre-mortem agent, protocol `templates/prompts/pre-mortem-subagent.md`).
+- **Files**: `bin/gsd-t-traceability-gate.cjs`.
+- **Use when**: Every plan phase (automatic). Origin: NiceNote M5 shipped its headline 100MB+ chunked-read as dead code with no test → 4 verify cycles.
+- **CLI**: `gsd-t traceability-gate [--milestone <Mxx>] [--project-dir <dir>] [--tasks <file>]`. Exit 0 all traceable · 4 ≥1 untraceable AC (blocks execute) · 64 no tasks files.
+- **Contract**: `.gsd-t/contracts/plan-hardening-contract.md` v1.0.0 STABLE.
 ## Unknown Command
 If user asks for help on unrecognized command:

package/commands/gsd-t-plan.md CHANGED

@@ -33,12 +33,14 @@ Read `.gsd-t/progress.md` and each domain's `scope.md`/`constraints.md`. The par
 ## Step 3: Interpret the result
-The Workflow returns `{ status, artifacts, summary, decisions }`.
+The Workflow returns `{ status, artifacts, summary, decisions, traceability?, preMortem? }`.
-- `status === "complete"`: every domain has atomic tasks; `gsd-t parallel --dry-run` validates disjointness. Auto-advance to `/gsd-t-execute`.
-- `status === "partial" | "blocked"`: read `summary` (e.g. file-overlap between domains needing re-scoping).
+- `status === "complete"`: every domain has atomic tasks; `gsd-t parallel --dry-run` validates disjointness; **M83 plan hardening passed** (acceptance-traceability gate + adversarial pre-mortem). Auto-advance to `/gsd-t-execute`.
+- `status === "partial" | "blocked"`: read `summary` (e.g. file-overlap between domains; or **M83 plan hardening blocked** — see `traceability.violations` / `preMortem.findings`: an AC not bound to a code path + killing test, or a predicted failure condition with no planned test. Fix `tasks.md` and re-run plan).
 - `status === "failed"`: read `summary`.
+**M83 Plan Hardening (runs automatically at the end of plan, blocking before execute).** Two gates ensure the plan can't produce a dead deliverable: (1) the deterministic **acceptance-traceability gate** (`gsd-t traceability-gate`) — every behavioral task's ACs must bind to a `**Files**` code path + a named test; the **Headline:** task needs both a real impl path and a test. (2) the adversarial **pre-mortem** agent (opus, fresh-context) — predicts edge-case/dead-deliverable/NFR failures and requires a test for each. Origin: NiceNote M5 shipped its headline (100MB+ chunked read) as dead code with no test, burning 4 verify cycles. Contract: `.gsd-t/contracts/plan-hardening-contract.md`.
 ## Document Ripple
 The plan agent writes per-domain `tasks.md`, updates `integration-points.md`, and adds a Decision Log entry.
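
The Step 3 rules above could be expressed as a small decision function. The field names `status`, `summary`, `traceability.violations` and `preMortem.findings` come from `gsd-t-plan.md` in this diff; the function name and the `action` strings are hypothetical, and the elements of `preMortem.findings` are passed through untouched because their shape is not shown in this part of the diff.

```javascript
"use strict";

// Decide what to do after the plan Workflow returns.
function nextStepAfterPlan(result) {
  const { status, summary, traceability, preMortem } = result || {};
  if (status === "complete") return { action: "execute" };

  if (status === "partial" || status === "blocked") {
    const violations = (traceability && traceability.violations) || [];
    const findings = (preMortem && preMortem.findings) || [];
    // Plan hardening blocked: fix tasks.md, then re-run plan.
    if (violations.length > 0 || findings.length > 0) {
      return { action: "fix-tasks-and-replan", violations, findings };
    }
    // Otherwise the summary explains the block (for example file overlap).
    return { action: "read-summary", summary };
  }

  // "failed" or anything unexpected: surface the summary.
  return { action: "read-summary", summary };
}

const blocked = nextStepAfterPlan({
  status: "blocked",
  summary: "M83 plan hardening blocked",
  traceability: { violations: [{ domain: "d1", task: "Task 1", kind: "ac-without-test" }] },
});
console.log(blocked.action); // fix-tasks-and-replan
```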

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@tekyzinc/gsd-t",
-  "version": "4.1.10",
+  "version": "4.2.10",
   "description": "GSD-T: Contract-Driven Development for Claude Code — 54 slash commands with headless-by-default workflow spawning, unattended supervisor relay with event stream, graph-powered code analysis, real-time agent dashboard, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
   "author": "Tekyz, Inc.",
   "license": "MIT",

package/templates/CLAUDE-global.md CHANGED

@@ -328,7 +328,7 @@ Canonical scripts:
 - `gsd-t-integrate.workflow.js` — cross-domain wire-up + light verify-gate
 - `gsd-t-debug.workflow.js` — 2-cycle diagnose/fix/verify (CLAUDE.md Prime Rule)
 - `gsd-t-quick.workflow.js` — preflight + brief + single-task + verify-gate (M56-D4)
-- `gsd-t-phase.workflow.js` — generic upper-stage runner (partition / plan / discuss / impact / milestone / prd / design-decompose / doc-ripple). **M82 Competition Mode:** an opt-in `competition: N` arg (N 2–5) on eligible upstream phases (partition / milestone / discuss / design-decompose) fans out N parallel Self-MoA producers → a judge stage → a finalizer. Partition's judge is the OBJECTIVE file-disjointness oracle (`gsd-t competition-judge --kind partition` — a calculator, not an LLM critic, immune to judge bias, the v1 beachhead); subjective phases use a blind + shuffled + different-model + rubric judge whose pick is finalized deterministically by `--kind generic`. The generative dual of the orthogonal validation triad; watershed rule = generate-and-judge ABOVE the contract, attack-and-filter BELOW. Default off. Contract: `competition-mode-contract.md` v1.0.0.
+- `gsd-t-phase.workflow.js` — generic upper-stage runner (partition / plan / discuss / impact / milestone / prd / design-decompose / doc-ripple). **M82 Competition Mode:** an opt-in `competition: N` arg (N 2–5) on eligible upstream phases (partition / milestone / discuss / design-decompose) fans out N parallel Self-MoA producers → a judge stage → a finalizer. Partition's judge is the OBJECTIVE file-disjointness oracle (`gsd-t competition-judge --kind partition` — a calculator, not an LLM critic, immune to judge bias, the v1 beachhead); subjective phases use a blind + shuffled + different-model + rubric judge whose pick is finalized deterministically by `--kind generic`. The generative dual of the orthogonal validation triad; watershed rule = generate-and-judge ABOVE the contract, attack-and-filter BELOW. Default off. Contract: `competition-mode-contract.md` v1.0.0. **M83 Plan Hardening:** the `plan` phase runs two blocking gates before execute — a deterministic acceptance-traceability gate (`gsd-t traceability-gate`: every AC binds to a code path + a killing test; the `Headline:` task needs both impl and test) and an adversarial pre-mortem agent (opus, fresh-context, protocol `pre-mortem-subagent.md`: predicts edge-case/dead-deliverable/NFR failures, each → a required test). The temporal dual of the Red Team (attack the design at plan, not just code at verify). Contract: `plan-hardening-contract.md` v1.0.0.
 - `gsd-t-scan.workflow.js` — preflight → volume-probe → pipeline(per-slice deep finder → single verify) → synthesis → document → render (M66: fans out by codebase VOLUME, not a fixed 5-teammate dimension count; M67: deep document phase deterministically produces the full living-doc set + dimension files, per-doc fan-out)
 **Runtime-native invariant (M81 — v4.0.29+):** the Workflow sandbox provides ONLY `agent/parallel/pipeline/log/phase/budget/args` — NO `require`/`fs`/`path`/`child_process`/`process`, and `args` arrives as a JSON STRING. Each workflow is self-contained: it `JSON.parse`s `args` and delegates every CLI call (preflight, verify-gate, brief, build-coverage, ci-parity, test-data, disjointness) to inline `async` helpers that run the command via an `agent()`'s Bash (preferring project-local `bin/<tool>.cjs`, else the global `gsd-t` PATH binary) and parse the JSON envelope — preserving the M55-D5 project-local-bin invariant. The old `require("./_lib.js")` pattern threw `ReferenceError` on first eval and silently broke every workflow except scan (TD-113, fixed M81); `_lib.js` is retired as a workflow dependency.
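
Because `args` arrives as a JSON string under the runtime-native invariant, each workflow has to parse it before use. A minimal sketch of that step, using no `require` or `fs` so it would fit the sandbox constraints listed above: `parseWorkflowArgs` is a hypothetical helper, and falling back to an empty object on malformed input is an assumption, not documented package behavior.

```javascript
"use strict";

// Normalize workflow args that arrive as a JSON string.
// Non-string input is passed through; bad JSON yields {} (assumed fallback).
function parseWorkflowArgs(args) {
  if (typeof args !== "string") return args || {};
  try {
    const parsed = JSON.parse(args);
    return parsed !== null && typeof parsed === "object" ? parsed : {};
  } catch {
    return {};
  }
}

console.log(parseWorkflowArgs('{"phase":"plan","milestone":"M83"}').phase); // plan
console.log(parseWorkflowArgs("not json")); // {}
```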

package/templates/prompts/pre-mortem-subagent.md ADDED

@@ -0,0 +1,46 @@
+# Pre-Mortem Subagent Prompt — Adversarial Plan Review (pre-execute)
+You are an adversarial Pre-Mortem reviewer. You attack the PLAN, not the code — because the code does not exist yet. Your job is to predict, BEFORE a single line is executed, how this milestone will fail: the edge cases it will hit, the deliverables it will leave hollow, and the assumptions it is quietly making. You are the generative-adversarial dual of the Red Team: the Red Team attacks finished code at verify; you attack the design at plan, so the milestone is built right the FIRST time instead of being re-litigated across verify cycles.
+**Inverted incentives.** Your value is measured by REAL failure conditions surfaced now, not by approving the plan. A plan you bless that later burns verify cycles is YOUR failure. Assume the plan is flawed and find where.
+<!-- Workflow-stage invocation -->
+**Invocation context.** When this protocol runs as a native Workflow `agent()` stage (via `templates/workflows/gsd-t-phase.workflow.js` plan phase), your **final emission MUST be a single StructuredOutput object** matching the PRE_MORTEM schema declared by the Workflow. Bash/git/Read tool use is permitted DURING analysis; the final emission is the JSON verdict.
+<!-- brief-first rule -->
+**Brief first.** If you're about to grep, read, or run something, check the brief at `$BRIEF_PATH` first (a ≤2,500-token snapshot of CLAUDE.md + contracts + scope + requirements). It identifies the milestone's acceptance criteria and high-risk surfaces — your starting attack surface. If unset/missing, fall back to reading the plan artifacts directly, but log the gap.
+## What you are given
+The milestone's PLAN: `.gsd-t/domains/*/{scope,constraints,tasks}.md`, the relevant `.gsd-t/contracts/`, and the acceptance criteria / FRs / NFRs in `docs/requirements.md`. Read the milestone's stated GOAL and its HEADLINE capability (the one thing the milestone exists to deliver).
+## Hard Rules
+- **Failure conditions = value.** A short list is failure. Exhaust every category below.
+- **A finding must be CONCRETE and FALSIFIABLE.** "Could have edge cases" is not a finding. "A multi-byte UTF-8 codepoint split across a chunk boundary in `read_file_chunk` will corrupt or stall — there is no test for it" IS a finding.
+- **Every blocking finding must become a REQUIRED TEST.** This is the core rule. Do not emit advisory notes — advisory notes get deferred, and a deferred edge case is exactly how the NiceNote M5 chunk reader shipped three distinct data-loss bugs across three verify cycles. For each finding, state the test that must exist in the plan before execute may start. If the plan already names that test, it is not a finding.
+- **The headline capability gets the hardest scrutiny.** Ask explicitly: is the milestone's reason-to-exist (a) bound to a real code path in the plan, (b) reachable from a user action / entry point, and (c) covered by a test that FAILS if that path is dead? The NiceNote M5 milestone shipped its headline (100MB+ chunked read) as DEAD CODE because the plan never required a test that exercised it. Catch that here.
+- **Deferral is illegitimate for a milestone's own headline.** If the plan defers the milestone's defining capability (or a core AC) to a later milestone, that is a blocking finding — an incomplete milestone, not a warning.
+- Style/taste is NOT a finding. Theoretical purity is NOT a finding. Only predicted, concrete, testable failure.
+## Attack Categories (exhaust ALL)
+1. **Dead-deliverable / wiring gaps** — Is every acceptance criterion bound to a code path that is actually CALLED from an entry point? Could a capability be built but never invoked (the M5 dead-code class)? Is the headline reachable from a real user action?
+2. **Boundary & edge inputs** — empty / null / huge / zero-length / off-by-one / max-size. For each data path the plan introduces: what is the worst input, and is there a test for it? (split codepoints, chunk boundaries, 0-byte files, files at exactly the threshold, unicode, path traversal.)
+3. **Resource / NFR conditions** — memory, time, file-handle, DOM-node, payload-size ceilings. Does any NFR (performance, bounded memory, scale) have a FALSIFIABLE measured acceptance check in the plan? An NFR with no measured test is a blocking finding (the NiceNote NFR-1 160k-DOM-node class).
+4. **Error & failure paths** — what happens when the new code's dependency fails, the input is malformed, the operation is interrupted mid-flight? Does the plan specify graceful degradation, and is there a test for the failure path (not just the happy path)?
+5. **State / ordering / concurrency** — actions out of order, partial completion, re-entry, two things racing over a shared resource (the verify-gate port-race class). Does the plan account for it?
+6. **Contract & integration seams** — at every cross-domain boundary the plan defines, do both sides agree on shape, error behavior, and who owns the shared file? Is there an integration test for the seam, not just unit tests on each side?
+7. **Shallow-test traps** — does the plan's testing approach risk vacuous passes? (assertions gated behind `if (count > 0)`, `toBeVisible()` standing in for a functional check, `toHaveCount` with no state assertion.) Flag any planned test that would pass on a broken implementation.
+8. **Missing acceptance coverage** — read requirements. Is there an AC / FR / NFR with no task that delivers it, or no test that proves it?
+## Verdict
+- **BLOCK** — one or more concrete, falsifiable failure conditions that the plan does not yet cover with a required test. The plan may NOT proceed to execute until each blocking finding is answered by a named required test (or the design is changed to make the condition impossible). This is the FAIL-equivalent.
+- **CLEARED** — exhaustive search; every predicted failure condition is already covered by a named test in the plan, the headline is bound+reachable+tested, and every NFR has a measured acceptance check. (The plan-quality equivalent of GRUDGING-PASS — earned by exhaustion, not by haste.)
+## Output (StructuredOutput)
+Emit a single object: `{ verdict: "BLOCK" | "CLEARED", findings: [ { severity: "CRITICAL"|"HIGH"|"MEDIUM"|"LOW", category, condition, whyItFails, requiredTest, affectedAC? } ], headlineAssessment: { capability, boundToPath, reachable, hasKillingTest }, notes }`.
+`requiredTest` is the load-bearing field: the specific test that must be added to the plan to close the finding. A finding without a `requiredTest` is incomplete — every blocking finding converts to a test the plan must adopt before execute.
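As a rough illustration of the output rules in this prompt, the sketch below checks an emitted object for the fields the prompt treats as mandatory. The function name `checkPreMortemOutput` is hypothetical; the package itself relies on the schema declared in the workflow, not on this helper.

```javascript
// Hypothetical checker for a pre-mortem verdict object; not part of the package.
const SEVERITIES = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

function checkPreMortemOutput(out) {
  if (!out || typeof out !== "object") return ["output is not an object"];
  const problems = [];
  if (out.verdict !== "BLOCK" && out.verdict !== "CLEARED") {
    problems.push("verdict must be BLOCK or CLEARED");
  }
  if (!Array.isArray(out.findings)) {
    problems.push("findings must be an array");
    return problems;
  }
  out.findings.forEach((f, i) => {
    if (!f || typeof f !== "object") {
      problems.push(`findings[${i}] is not an object`);
      return;
    }
    if (!SEVERITIES.includes(f.severity)) problems.push(`findings[${i}] has an invalid severity`);
    if (typeof f.condition !== "string" || !f.condition.trim()) {
      problems.push(`findings[${i}] has no condition`);
    }
    // requiredTest is the load-bearing field: a finding without it is incomplete.
    if (typeof f.requiredTest !== "string" || !f.requiredTest.trim()) {
      problems.push(`findings[${i}] has no requiredTest`);
    }
  });
  return problems;
}
```

An empty array means the object satisfies these structural rules; it says nothing about whether the findings themselves are concrete or falsifiable, which remains the reviewer's job.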

package/templates/workflows/gsd-t-phase.workflow.js CHANGED

@@ -67,6 +67,14 @@ async function runCli(projectDir, subcmd, argv, localBin, label, parseJson = tru
   return r || { ok: false, exitCode: -1, envelope: null, via: "error" };
 }
 async function runPreflight(projectDir, label = "preflight", phaseNameOpt) { return runCli(projectDir, "preflight", ["--json"], "cli-preflight.cjs", label, true, phaseNameOpt); }
+// M83: the deterministic plan-hardening gate. Returns the parsed envelope
+// ({ ok, exitCode, violations, ... }); ok:false means ≥1 untraceable AC.
+async function runTraceabilityGate(projectDir, milestone, label = "traceability-gate", phaseNameOpt) {
+  const argv = ["--json"];
+  if (milestone) argv.push("--milestone", milestone);
+  const r = await runCli(projectDir, "traceability-gate", argv, "gsd-t-traceability-gate.cjs", label, true, phaseNameOpt);
+  return r.envelope || { ok: r.ok, exitCode: r.exitCode, violations: [], reason: "gate-unparsed" };
+}
 async function generateBrief(projectDir, { kind = "execute", milestone, domain, id, label = "brief", phaseNameOpt } = {}) {
   const argv = ["--kind", kind, "--spawn-id", id, "--out", `${projectDir}/.gsd-t/briefs/${id}.json`];
   if (milestone) argv.push("--milestone", milestone);
@@ -184,7 +192,9 @@ const brief = await generateBrief(projectDir, { kind: phaseName, milestone, id:
 phase("Phase");
 const promptByPhase = {
   partition: `Decompose the milestone into 2-5 independent domains. Write .gsd-t/domains/{domain}/{scope,constraints,tasks}.md. Cross-domain contracts in .gsd-t/contracts/.`,
-  plan: `For each domain, write atomic tasks.md entries with files, contract refs, dependencies, acceptance criteria. Update .gsd-t/contracts/integration-points.md with wave groupings.`,
+  plan: `For each domain, write atomic tasks.md entries with files, contract refs, dependencies, acceptance criteria. Update .gsd-t/contracts/integration-points.md with wave groupings.
+M83 PLAN HARDENING (mandatory — the plan is BLOCKED from execute otherwise): every task that declares acceptance criteria MUST also declare (1) **Files** = the concrete code path that implements it, and (2) a TEST that fails if that path is dead — name it in a **Test** field, a test-file path (\`*.test.*\` / \`*.spec.*\` / \`e2e/\`), or a runner (vitest/cargo test/playwright). The ONE task that delivers the milestone's HEADLINE capability MUST be tagged **Headline:** true and carry BOTH a real implementation path AND a test that exercises that capability end-to-end (e.g. for a "100MB+ file" milestone, a test that actually opens a >100MB fixture). NEVER defer a milestone's own headline capability or a core AC to a later milestone. This exists because NiceNote M5 shipped its headline (100MB+ chunked read) as DEAD CODE with no test and burned 4 verify cycles.`,
   discuss: `Multi-perspective exploration of design questions. Settle locked decisions into .gsd-t/CONTEXT.md. Do NOT implement.`,
   impact: `Analyze downstream effects of proposed changes. Identify breaking changes, affected consumers, migration paths.`,
   milestone: `Define a new milestone — origin, goal, success criteria, falsifiable acceptance. Append to .gsd-t/progress.md. Defer partition/plan.`,
@@ -434,4 +444,75 @@ if (!competitionOn) {
   result.competition = { n: candidates.length, winner: winner.id, ranked };
 }
+// ── M83 Left-Shifted Plan Hardening (plan phase only) ──
+// Two blocking gates run AFTER the plan agent writes tasks.md and BEFORE the plan
+// is declared complete — so execute can never start on a plan that would produce a
+// dead deliverable or an unguarded edge case. Contract: plan-hardening-contract.md.
+//   (1) Deterministic acceptance-traceability gate — every behavioral task's ACs
+//       must bind to a code path + a killing test; the headline must be impl+test.
+//   (2) Adversarial pre-mortem agent (opus, fresh-context, assume-the-plan-is-flawed)
+//       — predicts edge-case / dead-deliverable / NFR failures; each blocking
+//       finding must become a required test before execute.
+if (phaseName === "plan" && result && result.status !== "failed") {
+  phase("Plan Hardening");
+  // (1) Deterministic gate. FAIL-CLOSED (Red Team MEDIUM-2): a deterministic gate
+  // that can't be evaluated (CLI error / unparsed envelope) is NOT a pass — block.
+  const trace = await runTraceabilityGate(projectDir, milestone, "traceability-gate", "Plan Hardening");
+  const traceUnparsed = trace && trace.reason === "gate-unparsed";
+  if (trace && (trace.ok === false || traceUnparsed)) {
+    const vcount = (trace.violations || []).length;
+    const why = traceUnparsed
+      ? `traceability gate could not be evaluated (CLI error / unparsed output) — failing closed; re-run plan.`
+      : `${vcount} acceptance criteria not bound to a code path + killing test (M83 traceability gate). Fix tasks.md, then re-run plan.`;
+    log(`plan-hardening: traceability gate BLOCKED — ${traceUnparsed ? "unevaluable (fail-closed)" : vcount + " untraceable AC"}.`);
+    result.status = "blocked";
+    result.summary = `plan blocked: ${why} ${result.summary || ""}`.trim();
+    result.traceability = trace;
+    return result;
+  }
+  result.traceability = trace;
+  // (2) Adversarial pre-mortem. The agent reads its own protocol at spawn time
+  // (the orchestrator has no fs); blocking findings convert to required tests.
+  const PRE_MORTEM_SCHEMA = {
+    type: "object", required: ["verdict", "findings"], additionalProperties: true,
+    properties: {
+      verdict: { type: "string", enum: ["BLOCK", "CLEARED"] },
+      findings: {
+        type: "array", items: {
+          type: "object", required: ["severity", "condition", "requiredTest"], additionalProperties: true,
+          properties: {
+            severity: { type: "string", enum: ["CRITICAL", "HIGH", "MEDIUM", "LOW"] },
+            category: { type: "string" }, condition: { type: "string" },
+            whyItFails: { type: "string" }, requiredTest: { type: "string" }, affectedAC: { type: "string" },
+          },
+        },
+      },
+      headlineAssessment: { type: "object", additionalProperties: true },
+      notes: { type: "string" },
+    },
+  };
+  const preMortem = await agent(
+    [
+      `You are the adversarial Pre-Mortem reviewer for milestone ${milestone || "(current)"}.`,
+      `FIRST read your protocol via the Read tool: templates/prompts/pre-mortem-subagent.md (in the installed @tekyzinc/gsd-t package, or this project's copy). Follow it exactly.`,
+      `**Brief (REQUIRED):** ${brief.briefPath || "(no brief — read plan artifacts directly)"}`,
+      `Attack the PLAN at .gsd-t/domains/*/{scope,constraints,tasks}.md + .gsd-t/contracts/ + docs/requirements.md.`,
+      `Predict, before any code is executed, how this milestone will FAIL: edge cases, dead deliverables, unguarded NFRs, shallow-test traps. Scrutinize the HEADLINE capability hardest — is it bound to a real path, reachable, and covered by a killing test?`,
+      `Every blocking finding MUST convert to a concrete requiredTest the plan must adopt. Advisory notes are forbidden.`,
+      `Verdict BLOCK if any concrete, falsifiable failure condition lacks a named required test; else CLEARED. Return JSON per the schema.`,
+    ].join("\n"),
+    { label: "pre-mortem", phase: "Plan Hardening", schema: PRE_MORTEM_SCHEMA, model: "opus" }
+  ).catch((e) => ({ verdict: "BLOCK", findings: [{ severity: "HIGH", condition: `pre-mortem agent error: ${e && e.message}`, requiredTest: "re-run pre-mortem" }], notes: "agent-error" }));
+  result.preMortem = preMortem;
+  if (preMortem && preMortem.verdict === "BLOCK") {
+    const n = (preMortem.findings || []).length;
+    log(`plan-hardening: pre-mortem BLOCKED — ${n} predicted failure condition(s) need required tests in the plan.`);
+    result.status = "blocked";
+    result.summary = `plan blocked: pre-mortem found ${n} falsifiable failure condition(s) not covered by a planned test (M83). Add the required tests to tasks.md, then re-run plan. ${result.summary || ""}`.trim();
+  }
+}
 return result;
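The two-gate decision in the hunk above can be restated as a pure function, which may be easier to reason about and to unit-test in isolation. This is a sketch only: `decidePlanStatus` and the `"passed"` status value are hypothetical, and the real workflow mutates `result` in place instead of returning a new object.

```javascript
// Hypothetical restatement of the plan-hardening decision; not the package's code.
function decidePlanStatus(trace, preMortem) {
  // Gate 1: fail closed when the deterministic gate could not be evaluated.
  const unparsed = Boolean(trace) && trace.reason === "gate-unparsed";
  if (trace && (trace.ok === false || unparsed)) {
    const count = (trace.violations || []).length;
    return {
      status: "blocked",
      gate: "traceability",
      detail: unparsed ? "unevaluable (fail-closed)" : `${count} untraceable AC`,
    };
  }
  // Gate 2: a BLOCK verdict from the adversarial pre-mortem also blocks the plan.
  if (preMortem && preMortem.verdict === "BLOCK") {
    const n = (preMortem.findings || []).length;
    return { status: "blocked", gate: "pre-mortem", detail: `${n} predicted failure condition(s)` };
  }
  return { status: "passed", gate: null, detail: "" };
}
```

For example, `decidePlanStatus({ ok: false, violations: [{}, {}] }, null)` reports the traceability gate as the blocker, and the pre-mortem verdict is not consulted, matching the early `return result;` in the diff.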