@tekyzinc/gsd-t 4.1.10 → 4.3.10

package/CHANGELOG.md CHANGED
@@ -2,6 +2,40 @@
 
  All notable changes to GSD-T are documented here. Updated with each release.
 
+ ## [4.3.10] - 2026-06-05 (M84 Auto-Competition - minor)
+
+ ### Changed - Competition Mode is now AUTOMATIC (was opt-in)
+
+ M82 shipped Competition Mode as opt-in (`--competition N`). M84 makes the workflow decide for itself, per the user directive: *"I want the workflow to determine when it's optimal to create a competition."* The economic case (user's): a better artifact produced upstream makes every downstream phase — pre-mortem, execute, verify — cheaper and more likely to pass first time, so the expected downstream savings usually exceed the ~3× upstream cost. Opt-in just means forgetting to use the thing that lowers total cost.
+
+ - **Solution-space probe** runs at the start of each eligible phase (partition / milestone / discuss / design-decompose), after brief, before producing. It decides: ≥2 genuinely different viable approaches → compete (3 producers + judge); one obvious answer → single draft.
+ - **The probe runs on OPUS, not haiku.** Deciding "are there multiple good approaches?" is high-level reasoning, not a mechanical check — and it gates the whole 3× competition, so a weak probe would forfeit the feature. (User caught this: *"Is Haiku smart enough to make this a judgment?"* — no, it isn't; the probe is opus.)
+ - **Biased toward competing**: when uncertain, compete (the asymmetry favors generating options). Probe failure → compete (fail-toward-options).
+ - **Partition**: an opus probe makes the pre-produce compete/skip call; the objective file-disjointness oracle still judges the produced candidates (decision = heuristic + bias; selection = objective).
+ - **Producer angles are now phase-aware** (`ANGLES_BY_PHASE`) — a discuss/milestone/design producer no longer gets a partition-framed "carve file-disjoint domains" directive (Red Team MEDIUM fix; this latent M82 defect now mattered because competition is the default path).
+ - **Overrides** (rarely needed): `competition: N` (2–5) forces N; `competition: 0` / `noCompetition: true` forces off; unset = auto. An unparseable override logs a warning and falls back to auto.
+ - `meta.phases` now declares all 7 stages (Preflight / Probe / Compete / Judge / Phase / Finalize / Plan Hardening) — also fixes the M83 cosmetic gap where Plan Hardening wasn't pre-declared.
+ - **Verification**: real-sandbox proof — the opus probe ran through the Workflow sandbox and discriminated correctly (wide collaborative-editor scenario → compete, 3 approaches named; narrow copyright-bump → single draft). Adversarial Red Team (Opus, fresh context) GRUDGING-PASS — no CRITICAL/HIGH; state-wiring, overrides, eligibility, probe-failure, cost-bound, runtime-native, and plan-hardening interaction all verified clean. Fixed the 1 MEDIUM (phase-aware angles) + 3 LOWs. Suite 1372/0/4. Minor bump 4.2.10 → 4.3.10.
+ - Contract `competition-mode-contract.md` → v2.0.0 (trigger moved opt-in → automatic; judge/selection/invariants unchanged).
+ - Origin: NiceNote review — the user observed that competing on the M7 plan would have produced a better plan from the start (fewer pre-mortem blocks, less downstream cost), so competition should be automatic, not a flag to remember.
+
+ ## [4.2.10] - 2026-06-05 (M83 Left-Shifted Plan Hardening - minor)
+
+ ### Added - Plan-phase hardening: catch dead deliverables and edge cases BEFORE execute
+
+ Left-shifts failure detection from verify to plan. Adversarial validation (the Red Team) ran only at verify — after code exists — so a milestone whose headline capability shipped as DEAD CODE (the NiceNote M5 incident: a 100MB+ chunked reader built but never wired into `openPath`, with no test exercising it) burned **four verify cycles** re-litigating the milestone's reason to exist. The root cause was in the plan: it never bound each acceptance criterion to a code path + a killing test, and nothing adversarial reviewed the design before code was written. The `plan` phase now runs two blocking gates before execute.
+
+ - **Acceptance-traceability gate** (deterministic) — `bin/gsd-t-traceability-gate.cjs`, dispatched as `gsd-t traceability-gate`. Parses `.gsd-t/domains/*/tasks.md`; every behavioral task (one declaring acceptance criteria) must bind its ACs to a `**Files**` code path AND a named killing test; a `**Headline:** true` task must have BOTH a real implementation path and a test. Exit 4 blocks execute. Field detection is emphasis-stripped + colon-position-agnostic (`**Label**:` ≡ `**Label:**`); task blocks are detected by any non-structural heading bearing an AC (descriptive headings are not dropped); the test check is tied to the Test/Files/AC fields only (an incidental runner word in a Dependencies note does not clear it); pytest `test_*.py` / `*_test.py` conventions are preserved.
+ - **Adversarial pre-mortem** (generative) — `templates/prompts/pre-mortem-subagent.md`, an opus, fresh-context, assume-the-plan-is-flawed reviewer wired into the plan workflow. Predicts edge-case / dead-deliverable / NFR / shallow-test failures and converts each blocking finding into a **required test** the plan must adopt (advisory notes forbidden — that is how M5's chunk reader shipped three data-loss bugs across three cycles). Verdict `BLOCK` / `CLEARED`.
+ - The two gates are the temporal dual of the Red Team: attack the design at plan, not just the code at verify. The deterministic gate runs first and fails CLOSED (an unevaluable gate blocks); the pre-mortem cannot approve a gate-blocked plan.
+ - New CLI `gsd-t traceability-gate [--milestone Mxx] [--tasks FILE]` (exit 0/4/64), added to project + global bin tools. Contract `.gsd-t/contracts/plan-hardening-contract.md` v1.0.0 STABLE. `gsd-t-plan.md` + the phase-workflow plan objective updated to require traceable tasks up front.
+ - **Verification**: orthogonal triad ran. Adversarial Workflow Red Team (Opus, fresh context) FAILed first pass (1 CRITICAL — colon-inside-bold markdown defeated all field detection, silently passing the literal M5 dead-code plan — + 2 HIGH + 2 MEDIUM), all fixed; re-validation found a regression the CRITICAL fix introduced (underscore-stripping broke pytest paths, HIGH), fixed; final re-validation GRUDGING-PASS (14/14 checks, no new HIGH/CRITICAL). Real-sandbox acceptance gate passed (gate fires through the Workflow sandbox and blocks the bad plan). Suite 1372/0/4 (+15 M83 tests). Self-tested against the actual NiceNote M5 dead-code plan — the gate FAILs it at plan time, which is the milestone's reason to exist.
+ - Origin: review of the NiceNote 9-milestone build, where the triad caught real bugs at verify but late; the user's proposal for an adversarial risk-assessment agent at plan.
+
+ ### Versioning
+
+ Minor bump 4.1.10 → 4.2.10 (new feature, additive; patch reset to 10).
+
  ## [4.1.10] - 2026-06-05 (M82 Competition Mode - minor)
 
  ### Added - Competition Mode: generate-and-judge for upstream, pre-contract phases
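The override rules in the 4.3.10 entry (`competition: N`, `competition: 0`, `noCompetition: true`, unset = auto, unparseable = warn and fall back) can be sketched as a small resolver. This is only an illustration: `resolveCompetition` and its return shape are hypothetical names, not the package's API, and treating `1` or values above 5 as unparseable is an assumption the changelog does not confirm.

```javascript
"use strict";

// Hypothetical helper, not part of the package: turns workflow args into a
// competition decision following the override rules in the changelog.
function resolveCompetition(args = {}, warn = console.warn) {
  // `noCompetition: true` forces competition off.
  if (args.noCompetition === true) return { mode: "off", producers: 0 };

  const raw = args.competition;
  // Unset means "auto": the solution-space probe decides.
  if (raw === undefined || raw === null) return { mode: "auto", producers: null };

  // Accept integers or digit-only strings; anything else is unparseable.
  const text = String(raw).trim();
  const n = /^\d+$/.test(text) ? Number(text) : NaN;

  if (n === 0) return { mode: "off", producers: 0 };
  if (n >= 2 && n <= 5) return { mode: "forced", producers: n };

  // Assumption: 1, values above 5, and non-numeric input are all treated as
  // unparseable, so they log a warning and fall back to auto.
  warn(`competition override ${JSON.stringify(raw)} not understood; using auto`);
  return { mode: "auto", producers: null };
}

console.log(resolveCompetition({ competition: 3 }));
console.log(resolveCompetition({ competition: "many" }));
```

Passing the warning sink as a parameter keeps the sketch easy to test without capturing console output.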
package/README.md CHANGED
@@ -123,9 +123,12 @@ gsd-t ci-parity --json # M57: reproduce the pro
  gsd-t test-data --list [--run ID] [--json] # M58: list test-data ledger entries
  gsd-t test-data --purge --run ID [--dry-run] [--json] # M58: purge tagged test data after Verify (Step 4.5)
  gsd-t competition-judge --in SPEC.json [--project-dir P] # M82: generate-and-judge selection oracle (partition / generic)
+ gsd-t traceability-gate --milestone Mxx [--project-dir P] # M83: plan-phase acceptance-traceability gate (AC → path → killing test)
  ```
 
- **Competition Mode (M82).** Opt-in `--competition N` (N 2–5) on upstream, pre-contract phases (`/gsd-t-partition`, `/gsd-t-milestone`, `/gsd-t-design-decompose`) fans out N parallel candidate producers and a judge selects the winner: the generative dual of the orthogonal validation triad. Partition uses an *objective* file-disjointness oracle as the judge (a calculator, not a biased critic); subjective phases use a blind + different-model + rubric judge. Default off. See `.gsd-t/contracts/competition-mode-contract.md`.
+ **Plan Hardening (M83).** The `plan` phase now runs two blocking gates before execute, so a plan can't ship a dead deliverable: a deterministic **acceptance-traceability gate** (`gsd-t traceability-gate`: every AC must bind to a code path + a killing test; the headline capability needs both impl and test) and an adversarial **pre-mortem** agent (opus, fresh-context, predicts edge-case/NFR/dead-deliverable failures and requires a test for each). The temporal dual of the Red Team — attack the design at plan, not just the code at verify. Origin: a build where the headline capability shipped as dead code and burned 4 verify cycles. See `.gsd-t/contracts/plan-hardening-contract.md`.
+
+ **Competition Mode (M82 · automatic since M84).** On upstream, pre-contract phases (`/gsd-t-partition`, `/gsd-t-milestone`, `/gsd-t-discuss`, `/gsd-t-design-decompose`) the workflow **automatically decides** whether to compete: an Opus solution-space probe runs at phase start and, if it finds ≥2 genuinely different viable approaches, fans out 3 parallel candidate producers + a judge to pick the winner — the generative dual of the orthogonal validation triad. No flag needed (the probe is biased toward competing, since a better upstream artifact lowers total downstream cost). Partition's judge is an *objective* file-disjointness oracle; subjective phases use a blind + different-model + rubric judge. Override with `--no-competition` or `--competition N` only on explicit request. See `.gsd-t/contracts/competition-mode-contract.md`.
 
  `gsd-t parallel` consumes the M44 task-graph (D1) and applies three pre-spawn gates (D4 depgraph validation → D5 file-disjointness → D6 economics) followed by mode-aware headroom/split math. Extends — does not replace — the M40 orchestrator. Contract: `.gsd-t/contracts/wave-join-contract.md` v1.1.0.
 
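The automatic compete/skip decision described in the README text above can be pictured as a small policy function. The sketch below is an assumption-laden illustration: `decideFromProbe`, the `approaches` and `uncertain` fields, and the `reason` strings are hypothetical; only the rules themselves (two or more viable approaches compete with 3 producers, uncertainty competes, probe failure competes) come from the release notes.

```javascript
"use strict";

const PRODUCERS = 3; // the README describes 3 candidate producers plus a judge

// Hypothetical policy: `probeResult` stands in for whatever the probe returns.
// Assumed shape: { approaches: string[], uncertain?: boolean }, or null on failure.
function decideFromProbe(probeResult) {
  // Probe failure fails toward options: compete.
  if (!probeResult || !Array.isArray(probeResult.approaches)) {
    return { compete: true, producers: PRODUCERS, reason: "probe-failed" };
  }
  const viable = probeResult.approaches.filter(
    (a) => typeof a === "string" && a.trim().length > 0
  );
  // Two or more genuinely different approaches: compete.
  if (viable.length >= 2) {
    return { compete: true, producers: PRODUCERS, reason: "multiple-approaches" };
  }
  // Biased toward competing: an uncertain or empty answer still competes.
  if (probeResult.uncertain === true || viable.length === 0) {
    return { compete: true, producers: PRODUCERS, reason: "uncertain" };
  }
  // Exactly one obvious answer: single draft.
  return { compete: false, producers: 1, reason: "single-obvious-answer" };
}

console.log(decideFromProbe({ approaches: ["CRDT", "OT", "lock-based"] }));
console.log(decideFromProbe({ approaches: ["bump the copyright year"] }));
```

The two sample inputs loosely mirror the changelog's verification scenarios (a wide collaborative-editor problem and a narrow copyright bump); the approach names are made up for the example.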
@@ -0,0 +1,338 @@
+ "use strict";
+
+ /**
+  * gsd-t-traceability-gate — M83 D1
+  *
+  * The plan-phase acceptance-traceability gate. The deterministic half of
+  * Left-Shifted Plan Hardening (the adversarial pre-mortem agent is the
+  * generative half). Contract: .gsd-t/contracts/plan-hardening-contract.md.
+  *
+  * ORIGIN (NiceNote M5 incident, 2026-06-05): M5's headline capability (AC-6,
+  * 100MB+ chunked read) shipped as DEAD CODE — the chunk reader was built but
+  * openPath still materialized whole files, and NO test asserted the headline
+  * capability, so the suite stayed green. The triad burned 4 verify cycles
+  * re-litigating the milestone's reason to exist. Root cause: the plan never
+  * bound each acceptance criterion to (a) a real code path and (b) a test that
+  * FAILS if that path is absent. This gate enforces that binding BEFORE execute.
+  *
+  * What it checks, per `.gsd-t/domains/* /tasks.md` task block:
+  * - Every task that declares **Acceptance criteria** MUST declare **Files**
+  *   (an implementing code path) — an AC with no path is an unbacked promise.
+  * - Every such task MUST declare a TEST reference (a Test/Tests field, a
+  *   test-runner mention, or a Files entry matching a test path pattern) — an
+  *   AC with no killing test is the dead-code class (passes vacuously / never
+  *   exercised). The milestone's HEADLINE capability without a test is exactly
+  *   the M5 failure.
+  * - A task tagged as the milestone HEADLINE (**Headline:** true, or an AC
+  *   referencing the milestone's named capability) gets a STRICTER check: it
+  *   MUST have a non-test Files entry (real implementation, not just a test)
+  *   AND a test entry. A headline with only a test, or only an impl, fails.
+  *
+  * It does NOT judge whether the code is correct (that's verify) — only whether
+  * the PLAN is complete enough that execute can't produce a dead deliverable.
+  *
+  * Input: --milestone Mxx --project-dir PATH (reads .gsd-t/domains/* /tasks.md).
+  *   OR --tasks <file> to check a single tasks.md (used by tests).
+  * Output: JSON envelope { ok, exitCode, milestone, tasks:[...], violations:[...] }.
+  * Exit: 0 all tasks traceable · 4 ≥1 violation (blocks execute) · 64 bad input.
+  *
+  * Hard rules: zero deps, never throws, pure/read-only.
+  */
+
+ const fs = require("node:fs");
+ const path = require("node:path");
+
+ // ─── tasks.md parsing ────────────────────────────────────────────────────
+
+ // Red Team CRITICAL/HIGH-3/MEDIUM-1 (M83 verify): markdown field labels appear in
+ // BOTH `**Label**: v` (colon outside bold) and `**Label:** v` (colon inside) forms.
+ // Matching against the raw line missed the colon-inside form — defeating the entire
+ // gate on the canonical M5 dead-code plan. Fix: STRIP emphasis markers first, then
+ // match the colon-agnostic bare text. All field detection runs on the bared line.
+ function _bare(line) {
+   return String(line == null ? "" : line).replace(/[*_`]/g, "");
+ }
+
+ // Path-safe bare: strips only emphasis that wraps labels (* and backtick), but
+ // PRESERVES underscores — pytest's test_*.py / *_test.py conventions depend on
+ // them, and TEST_PATH_RE has `_test\.` / `test_` alternatives (Red Team M83
+ // recheck HIGH: stripping `_` before the test-path scan false-failed Python plans).
+ function _barePath(s) {
+   return String(s == null ? "" : s).replace(/[*`]/g, "");
+ }
+
+ // A test reference is: an explicit Test/Tests field, a known runner mention, or a
+ // Files path that looks like a test file. Kept broad on purpose — the gate asserts
+ // a test is NAMED, not that it exists yet (plan precedes execute).
+ const TEST_PATH_RE = /(\.test\.|\.spec\.|(^|\/)tests?\/|(^|\/)e2e\/|_test\.|test_|cargo test|vitest|playwright|pytest|jest)/i;
+ // Field regexes run on the BARED line, so the colon can be anywhere the label ends.
+ const TEST_FIELD_RE = /^\s*[-*]?\s*(tests?|test\s*ref|test\s*coverage|verified\s*by)\s*:/i;
+ const FILES_FIELD_RE = /^\s*[-*]?\s*files?\s*:/i;
+ const AC_FIELD_RE = /^\s*[-*]?\s*(acceptance(\s*criteria)?|accept|ac)\s*:/i;
+ const HEADLINE_FIELD_RE = /^\s*[-*]?\s*headline\s*:\s*(true|yes)/i;
+ const HEADING_RE = /^(#{2,4})\s+(.*\S.*)$/;
+
+ // Headings that are structural, never tasks — so we don't mis-parse a Summary/
+ // Overview block as a behavioral task. Everything else that bears an AC field IS
+ // assessed (Red Team HIGH-2: do NOT gate task detection on heading wording —
+ // anchor on the AC, so a descriptive heading like "Implement the reader" is caught).
+ const NON_TASK_HEADING_RE = /^(summary|overview|notes?|context|goal|background|wave\s*history|index|integration\s*points?|dependencies|references?|appendix|tasks)\s*$/i;
+
+ /**
+  * Parse a tasks.md into candidate blocks: every `##`–`####` heading starts a
+  * block (except the structural-heading skip list). A block becomes a TASK for
+  * assessment iff it contains an acceptance-criteria field (decided later in
+  * assessTask) — but we keep ALL non-structural blocks so no AC-bearing block is
+  * ever dropped on heading wording.
+  * @returns {Array<{title, raw, lines}>}
+  */
+ function parseTasks(md) {
+   const lines = (md || "").split(/\r?\n/);
+   const blocks = [];
+   let cur = null;
+   for (const line of lines) {
+     const m = line.match(HEADING_RE);
+     if (m) {
+       const title = m[2].trim();
+       // Close any open block at every heading.
+       if (cur) { blocks.push(cur); cur = null; }
+       // Structural headings start no block; everything else does.
+       if (!NON_TASK_HEADING_RE.test(_bare(title).trim())) {
+         cur = { title, lines: [] };
+       }
+       continue;
+     }
+     if (cur) cur.lines.push(line);
+   }
+   if (cur) blocks.push(cur);
+   return blocks.map((t) => ({ title: t.title, raw: t.lines.join("\n"), lines: t.lines }));
+ }
+
+ // ─── per-task traceability assessment ────────────────────────────────────
+
+ // All field matching runs on the BARED line (emphasis stripped) so colon
+ // position inside/outside bold is irrelevant (Red Team CRITICAL fix).
+ function fieldValue(lines, re) {
+   for (const ln of lines) {
+     const bare = _bare(ln);
+     if (re.test(bare)) {
+       const idx = bare.indexOf(":");
+       return idx >= 0 ? bare.slice(idx + 1).trim() : "";
+     }
+   }
+   return null;
+ }
+
+ // Like fieldValue but PRESERVES underscores in the returned value (label is still
+ // matched emphasis-agnostically) — used for value-level test-path scans so
+ // test_*.py / *_test.py survive (Red Team recheck HIGH).
+ function fieldValueRaw(lines, re) {
+   for (const ln of lines) {
+     if (re.test(_bare(ln))) {
+       const raw = _barePath(ln);
+       const idx = raw.indexOf(":");
+       return idx >= 0 ? raw.slice(idx + 1).trim() : "";
+     }
+   }
+   return null;
+ }
+
+ function hasMultiField(lines, re) {
+   return lines.some((ln) => re.test(_bare(ln)));
+ }
+
+ // Collect the indented/bulleted sub-lines that follow an Acceptance-criteria
+ // label up to the next top-level field — these ARE the acceptance criteria, and
+ // an AC may name its own verifying test there ("…; verified by cargo test").
+ function _acBulletText(lines) {
+   const out = [];
+   let inAc = false;
+   for (const ln of lines) {
+     const bare = _bare(ln);
+     if (AC_FIELD_RE.test(bare)) { inAc = true; continue; }
+     if (!inAc) continue;
+     // A new NON-INDENTED "Label:" line closes the AC block.
+     if (/^\s*[-*]?\s*[a-z][a-z\s]{1,24}:/i.test(bare) && !/^\s{2,}/.test(ln)) {
+       inAc = false; continue;
+     }
+     out.push(_barePath(ln)); // preserve underscores for test-path detection
+   }
+   return out.join("\n");
+ }
+
+ /**
+  * A task is "behavioral" (subject to the gate) if it declares acceptance
+  * criteria — i.e. it promises an observable behavior. Pure-scaffolding tasks
+  * with no ACs are out of scope (nothing to trace).
+  */
+ function assessTask(task) {
+   const lines = task.lines;
+   const hasAc = hasMultiField(lines, AC_FIELD_RE);
+   if (!hasAc) {
+     return { title: task.title, behavioral: false, violations: [] };
+   }
+
+   // Underscore-preserving values for path/runner scans (Red Team recheck HIGH).
+   const filesVal = fieldValueRaw(lines, FILES_FIELD_RE) || "";
+   const hasFiles = hasMultiField(lines, FILES_FIELD_RE) && filesVal.replace(/[—–-]/g, "").trim().length > 0;
+
+   // Test reference (MEDIUM-1 fix): satisfied ONLY by a runner/test-path tied to a
+   // RELEVANT field — the Test field, the Files field, or the Acceptance-criteria
+   // value (where an AC may name its own verifying test, e.g. "…; verified by cargo
+   // test"). An incidental runner mention in an UNRELATED field (Dependencies,
+   // Notes, Scope) must NOT vacuously clear the killing-test requirement.
+   const hasTestField = hasMultiField(lines, TEST_FIELD_RE);
+   const testFieldVal = fieldValueRaw(lines, TEST_FIELD_RE) || "";
+   const acVal = fieldValueRaw(lines, AC_FIELD_RE) || "";
+   // AC criteria often span bullet sub-lines after the label; gather those too
+   // (underscore-preserving, so a test_*.py named in a bullet still matches).
+   const acBullets = _acBulletText(lines);
+   const filesHasTestPath = TEST_PATH_RE.test(filesVal);
+   const testFieldHasRunner = TEST_PATH_RE.test(testFieldVal);
+   const acHasRunner = TEST_PATH_RE.test(acVal) || TEST_PATH_RE.test(acBullets);
+   const hasTest = hasTestField || filesHasTestPath || testFieldHasRunner || acHasRunner;
+
+   // A non-test implementing path: a Files entry that is NOT only test files.
+   const fileTokens = filesVal.split(/[,\s]+/).map((s) => s.replace(/[`*()]/g, "").trim()).filter(Boolean);
+   const implTokens = fileTokens.filter((f) => /[./]/.test(f) && !TEST_PATH_RE.test(f));
+   const hasImplPath = implTokens.length > 0;
+
+   const isHeadline = lines.some((ln) => HEADLINE_FIELD_RE.test(_bare(ln)));
+
+   const violations = [];
+   if (!hasFiles) {
+     violations.push({ kind: "ac-without-path", detail: "task declares acceptance criteria but no **Files** implementing path — an unbacked promise." });
+   }
+   if (!hasTest) {
+     violations.push({ kind: "ac-without-test", detail: "task declares acceptance criteria but names no test (Test field, test path, or runner) — the dead-code class: it can pass vacuously / never be exercised." });
+   }
+   if (isHeadline && !hasImplPath) {
+     violations.push({ kind: "headline-without-impl", detail: "HEADLINE task has no non-test implementing path — the milestone's reason to exist is not bound to real code (the M5 AC-6 dead-code failure)." });
+   }
+   if (isHeadline && !hasTest) {
+     violations.push({ kind: "headline-without-test", detail: "HEADLINE task has no test proving the milestone's core capability is delivered (the missing >100MB-fixture failure)." });
+   }
+
+   return {
+     title: task.title,
+     behavioral: true,
+     isHeadline,
+     hasFiles, hasTest, hasImplPath,
+     violations,
+   };
+ }
+
+ // ─── driver ──────────────────────────────────────────────────────────────
+
+ function listTasksFiles(projectDir, milestone) {
+   const domainsDir = path.join(projectDir, ".gsd-t", "domains");
+   let entries = [];
+   try {
+     entries = fs.readdirSync(domainsDir, { withFileTypes: true });
+   } catch {
+     return [];
+   }
+   const out = [];
+   const mPrefix = milestone ? milestone.toLowerCase() : null;
+   for (const e of entries) {
+     if (!e.isDirectory()) continue;
+     // When a milestone is given, prefer domains whose name carries that mNN
+     // prefix; if none match, fall back to all domains (single-milestone repos).
+     const tasksPath = path.join(domainsDir, e.name, "tasks.md");
+     if (fs.existsSync(tasksPath)) out.push({ domain: e.name, tasksPath });
+   }
+   if (mPrefix) {
+     const matched = out.filter((d) => d.domain.toLowerCase().startsWith(mPrefix));
+     if (matched.length) return matched;
+   }
+   return out;
+ }
+
+ function runGate({ projectDir = process.cwd(), milestone = null, tasksFile = null } = {}) {
+   let files;
+   if (tasksFile) {
+     files = [{ domain: path.basename(path.dirname(tasksFile)), tasksPath: tasksFile }];
+   } else {
+     files = listTasksFiles(projectDir, milestone);
+   }
+   if (!files.length) {
+     return { ok: false, exitCode: 64, milestone, reason: "no-tasks-files", tasks: [], violations: [] };
+   }
+
+   const taskResults = [];
+   const violations = [];
+   let behavioralCount = 0;
+   for (const f of files) {
+     let md;
+     try { md = fs.readFileSync(f.tasksPath, "utf8"); } catch { continue; }
+     for (const t of parseTasks(md)) {
+       const r = assessTask(t);
+       r.domain = f.domain;
+       taskResults.push(r);
+       if (r.behavioral) behavioralCount++;
+       for (const v of r.violations) {
+         violations.push({ domain: f.domain, task: r.title, ...v });
+       }
+     }
+   }
+
+   const ok = violations.length === 0;
+   return {
+     ok,
+     exitCode: ok ? 0 : 4,
+     milestone,
+     summary: {
+       tasksTotal: taskResults.length,
+       behavioral: behavioralCount,
+       violations: violations.length,
+     },
+     tasks: taskResults,
+     violations,
+     ...(ok ? {} : { reason: "untraceable-acceptance-criteria" }),
+   };
+ }
+
+ // ─── CLI ─────────────────────────────────────────────────────────────────
+
+ function parseArgs(argv) {
+   const o = { projectDir: process.cwd(), milestone: null, tasksFile: null, help: false };
+   for (let i = 0; i < argv.length; i++) {
+     const a = argv[i];
+     if (a === "--help" || a === "-h") o.help = true;
+     else if (a === "--project-dir") o.projectDir = argv[++i];
+     else if (a === "--milestone") o.milestone = argv[++i];
+     else if (a === "--tasks") o.tasksFile = argv[++i];
+     else if (a === "--json") {/* default */}
+   }
+   return o;
+ }
+
+ const HELP = `Usage: gsd-t traceability-gate [--milestone Mxx] [--project-dir PATH] [--tasks FILE]
+
+ Plan-phase acceptance-traceability gate (M83). Asserts every behavioral task in
+ the milestone's .gsd-t/domains/* /tasks.md binds its acceptance criteria to an
+ implementing **Files** path AND a named test. Headline tasks must have BOTH a
+ real implementation path and a test. Blocks execute on any violation.
+
+ --milestone Mxx Limit to domains whose name carries the mNN prefix.
+ --project-dir P Project root (default: cwd).
+ --tasks FILE Check a single tasks.md (overrides domain discovery).
+
+ Exit: 0 all traceable · 4 ≥1 violation · 64 no tasks files / bad input.`;
+
+ function main() {
+   const o = parseArgs(process.argv.slice(2));
+   if (o.help) { process.stdout.write(HELP + "\n"); process.exit(0); }
+   let res;
+   try {
+     res = runGate(o);
+   } catch (e) {
+     res = { ok: false, exitCode: 64, milestone: o.milestone, reason: `gate-error: ${e && e.message}`, tasks: [], violations: [] };
+   }
+   process.stdout.write(JSON.stringify(res, null, 2) + "\n");
+   process.exit(res.exitCode);
+ }
+
+ if (require.main === module) main();
+
+ module.exports = { runGate, parseTasks, assessTask, _internal: { fieldValue, TEST_PATH_RE } };
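The two stripping helpers in the file above carry most of the Red Team fixes, so a standalone illustration may help. The snippet copies `_bare`, `_barePath`, `FILES_FIELD_RE`, and `TEST_PATH_RE` from the diff (renamed without the leading underscore) and runs them on sample lines; the sample paths such as `src/reader.js` and `src/reader_test.py` are invented for the example.

```javascript
"use strict";

// Copied from the gate source in the diff above.
const bare = (line) => String(line == null ? "" : line).replace(/[*_`]/g, "");
const barePath = (s) => String(s == null ? "" : s).replace(/[*`]/g, "");
const FILES_FIELD_RE = /^\s*[-*]?\s*files?\s*:/i;
const TEST_PATH_RE = /(\.test\.|\.spec\.|(^|\/)tests?\/|(^|\/)e2e\/|_test\.|test_|cargo test|vitest|playwright|pytest|jest)/i;

// Both markdown label forms reduce to the same bare text, so one regex covers them.
const colonOutside = "- **Files**: src/reader.js";
const colonInside = "- **Files:** src/reader.js";
const outsideMatches = FILES_FIELD_RE.test(bare(colonOutside));
const insideMatches = FILES_FIELD_RE.test(bare(colonInside));

// Stripping underscores would hide a pytest-style name from the test-path scan;
// the path-safe variant keeps it detectable.
const pyTest = "`src/reader_test.py`";
const lostWithBare = TEST_PATH_RE.test(bare(pyTest));
const keptWithBarePath = TEST_PATH_RE.test(barePath(pyTest));

console.log({ outsideMatches, insideMatches, lostWithBare, keptWithBarePath });
```

This is why the gate matches labels on the fully bared line but reads field values through the underscore-preserving variant.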
package/bin/gsd-t.js CHANGED
@@ -1184,6 +1184,8 @@ const GLOBAL_BIN_TOOLS = [
  "gsd-t-ci-parity.cjs",
  // M82 — Competition Mode generate-and-judge selection oracle.
  "gsd-t-competition-judge.cjs",
+ // M83 — Plan-phase acceptance-traceability gate.
+ "gsd-t-traceability-gate.cjs",
  ];
 
  function installGlobalBinTools() {
@@ -2475,6 +2477,8 @@ const PROJECT_BIN_TOOLS = [
  // project's gsd-t-phase workflow can score candidate partitions via the
  // project-local bin (runCli prefers bin/<tool>.cjs over the global binary).
  "gsd-t-competition-judge.cjs", "gsd-t-file-disjointness.cjs",
+ // M83 — Plan-phase acceptance-traceability gate (runs in the plan workflow).
+ "gsd-t-traceability-gate.cjs",
  ];
 
  // Files that older versions of this installer copied into project bin/ but
@@ -4562,6 +4566,15 @@ if (require.main === module) {
  });
  process.exit(res.status == null ? 1 : res.status);
  }
+ case "traceability-gate": {
+ // M83 D1 — `gsd-t traceability-gate` plan-phase acceptance-traceability gate.
+ const { spawnSync } = require("child_process");
+ const js = path.join(__dirname, "gsd-t-traceability-gate.cjs");
+ const res = spawnSync(process.execPath, [js, ...args.slice(1)], {
+ stdio: "inherit",
+ });
+ process.exit(res.status == null ? 1 : res.status);
+ }
  case "metrics":
  doMetrics(args.slice(1));
  break;
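The `traceability-gate` case above forwards to a child Node process and propagates its exit status, mapping a missing status to 1. A minimal sketch of that pattern follows, using an inline `-e` script in place of the real gate file so that it runs anywhere; `runChild` is a hypothetical name.

```javascript
"use strict";

const { spawnSync } = require("node:child_process");

// Hypothetical wrapper mirroring the dispatch pattern in the diff: run a child
// Node process with inherited stdio and return the status the parent should exit with.
function runChild(nodeArgs) {
  const res = spawnSync(process.execPath, nodeArgs, { stdio: "inherit" });
  // `status` is null when the child was killed by a signal or failed to spawn;
  // treat that as a generic failure (1), as the dispatcher does.
  return res.status == null ? 1 : res.status;
}

// Stand-in for the gate: a child that exits 4, the gate's "blocks execute" code.
const status = runChild(["-e", "process.exit(4)"]);
console.log("child exit status:", status);
```

Because the status passes through unchanged, callers of `gsd-t traceability-gate` see the same 0 / 4 / 64 codes the gate script itself emits.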
@@ -25,12 +25,12 @@ Capture the design reference from `$ARGUMENTS` (Figma URL / image path). If Figm
  args: {
  phase: "design-decompose",
  projectDir: ".",
- userInput: "$ARGUMENTS",
- // M82 Competition Mode (opt-in): `--competition N` (N 2..5) fans out N
- // parallel decompositions; a blind, different-model, rubric judge (fidelity /
- // completeness / reuse / simplicity) selects the winner. Useful when a design
- // is ambiguous or the component boundaries aren't obvious.
- competition: 1
+ userInput: "$ARGUMENTS"
+ // M84 Competition Mode is AUTOMATIC; do NOT pass `competition` by default.
+ // The workflow probes (opus) and self-decides; it competes when a design is
+ // ambiguous or the element/widget/page boundaries aren't obvious (a blind,
+ // different-model rubric judge picks the winner). Override only on explicit
+ // request: `--no-competition` → 0, `--competition N` (2-5) → N.
  }
  }
  ```
@@ -481,12 +481,20 @@ Use these when user asks for help on a specific command:
 
  ### competition-judge (M82)
  - **Summary**: The selection oracle for Competition Mode (generate-and-judge — the *generative* dual of the orthogonal validation triad). Two modes: `--kind partition` scores candidate domain decompositions via the file-disjointness oracle (parallelGroups / waveDepth / validity — a calculator, not an LLM critic, so it's immune to judge bias); `--kind generic` is a deterministic rubric selector that finalizes a winner from rubric scores an upstream blind/different-model judge supplied.
- - **Auto-invoked**: Yes — by `gsd-t-phase.workflow.js` when an eligible phase (partition / milestone / design-decompose) is run with `competition: N` (N 2–5). Opt-in per phase via `/gsd-t-partition --competition N` etc. Default off.
+ - **Auto-invoked**: Yes — AUTOMATICALLY (M84). On an eligible phase (partition / milestone / discuss / design-decompose), `gsd-t-phase.workflow.js` runs an Opus solution-space probe at phase start and self-decides whether to fan out 3 producers + this judge (biased toward competing). No flag needed; override with `--competition N` (force N) or `--no-competition` (force off).
  - **Files**: `bin/gsd-t-competition-judge.cjs` (reuses `bin/gsd-t-file-disjointness.cjs`).
  - **Use when**: Upstream, pre-contract, wide-solution-space decisions where the cost of a single draft is high (partition, milestone decomposition, ambiguous design decomposition). Never on post-contract phases (execute/verify/etc.) — those are owned by the adversarial triad.
  - **CLI**: `gsd-t competition-judge [--in <spec.json>] [--project-dir <dir>]` (spec via stdin or `--in`). Exit 0 winner · 4 no valid candidate · 64 bad input.
  - **Contract**: `.gsd-t/contracts/competition-mode-contract.md` v1.0.0 STABLE.
 
+ ### traceability-gate (M83)
+ - **Summary**: Plan-phase acceptance-traceability gate — the deterministic half of Left-Shifted Plan Hardening. Parses `.gsd-t/domains/*/tasks.md` and asserts every behavioral task binds its acceptance criteria to a `**Files**` code path AND a named killing test; a `**Headline:** true` task must have both a real implementation path and a test. Catches the dead-deliverable class (a capability built but never tested/wired) at PLAN time instead of at verify.
+ - **Auto-invoked**: Yes — by `gsd-t-phase.workflow.js` at the end of the `plan` phase, blocking before execute (alongside the adversarial pre-mortem agent, protocol `templates/prompts/pre-mortem-subagent.md`).
+ - **Files**: `bin/gsd-t-traceability-gate.cjs`.
+ - **Use when**: Every plan phase (automatic). Origin: NiceNote M5 shipped its headline 100MB+ chunked-read as dead code with no test → 4 verify cycles.
+ - **CLI**: `gsd-t traceability-gate [--milestone <Mxx>] [--project-dir <dir>] [--tasks <file>]`. Exit 0 all traceable · 4 ≥1 untraceable AC (blocks execute) · 64 no tasks files.
+ - **Contract**: `.gsd-t/contracts/plan-hardening-contract.md` v1.0.0 STABLE.
+
  ## Unknown Command
 
  If user asks for help on unrecognized command:
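A caller that consumes the gate's JSON envelope (fields `ok`, `exitCode`, `reason`, `violations` as listed in the gate source) might interpret it along these lines. `summarizeGate` and its return shape are hypothetical; the fail-closed handling of unreadable output follows the changelog's statement that an unevaluable gate blocks.

```javascript
"use strict";

// Hypothetical consumer of the gate's JSON envelope (printed on stdout).
function summarizeGate(envelopeJson) {
  let env;
  try {
    env = JSON.parse(envelopeJson);
  } catch {
    // Fail closed: output that cannot be evaluated blocks execute.
    return { proceed: false, lines: ["unreadable gate output; treating as blocked"] };
  }
  if (env.ok === true && env.exitCode === 0) return { proceed: true, lines: [] };
  if (env.exitCode === 64) {
    return { proceed: false, lines: [`bad input: ${env.reason || "unknown"}`] };
  }
  // Exit 4: one line per violation, in the domain / task / kind shape the gate emits.
  const lines = (env.violations || []).map((v) => `${v.domain} / ${v.task}: ${v.kind}`);
  return { proceed: false, lines };
}

// Sample envelope with made-up domain and task names.
const sample = JSON.stringify({
  ok: false,
  exitCode: 4,
  reason: "untraceable-acceptance-criteria",
  violations: [{ domain: "m5-reader", task: "Chunked read", kind: "ac-without-test" }],
});
console.log(summarizeGate(sample));
```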
@@ -25,17 +25,17 @@ Read `.gsd-t/progress.md` (current version + completed milestones), `docs/requir
25
25
  args: {
26
26
  phase: "milestone",
27
27
  projectDir: ".",
28
- userInput: "$ARGUMENTS",
29
- // M82 Competition Mode (opt-in): `--competition N` (N 2..5) fans out N
30
- // parallel Self-MoA producers proposing different decomposition strategies
31
- // (risk-first / value-first / dependency-first); a blind, different-model,
32
- // rubric judge selects the winner. Coupled-thesis pick-one (no Frankenstein).
33
- competition: 1
28
+ userInput: "$ARGUMENTS"
29
+ // M84 Competition Mode is AUTOMATIC: do NOT pass `competition` by default.
30
+ // The workflow probes (opus) and self-decides; milestone decomposition is the
31
+ // highest-altitude decision, so it competes whenever ≥2 genuinely different
32
+ // strategies (risk-first / value-first / dependency-first) exist. Override only
33
+ // on explicit request: `--no-competition` → 0, `--competition N` (2-5) → N.
34
34
  }
35
35
  }
36
36
  ```
37
37
 
38
- **Competition Mode (`--competition N`).** Milestone decomposition is the highest-altitude decision in the system; different strategies are genuinely different. If the user invokes `/gsd-t-milestone --competition 3`, parse N (clamped 2..5) and pass `competition: N`. Because a milestone decomposition is a *coupled thesis*, the judge selects one winner whole (pick-one) and only salvages non-overlapping good line-items from the losers — it never Frankensteins. See `.gsd-t/contracts/competition-mode-contract.md`. Default off.
38
+ **Competition Mode (automatic).** Milestone decomposition auto-competes when the probe finds ≥2 genuinely different strategies. Because a decomposition is a *coupled thesis*, the judge selects one winner whole (pick-one) and salvages only non-overlapping good line-items from the losers — it never Frankensteins. No flag needed; override with `--no-competition` / `--competition N` on explicit request. See `.gsd-t/contracts/competition-mode-contract.md`.
39
39
 
40
40
  ## Step 3: Interpret the result
41
41
 
@@ -30,17 +30,17 @@ Call the `Workflow` tool with:
30
30
  phase: "partition",
31
31
  milestone: "M{NN}",
32
32
  projectDir: ".",
33
- userInput: "$ARGUMENTS",
34
- // M82 Competition Mode (opt-in): if the user passed `--competition N` in
35
- // $ARGUMENTS (N in 2..5), set competition: N. N parallel Self-MoA producers
36
- // propose partitions; the OBJECTIVE oracle judge (file-disjointness scoring)
37
- // picks the most-parallelizable valid decomposition. Omit / set 1 = off.
38
- competition: 1
33
+ userInput: "$ARGUMENTS"
34
+ // M84 Competition Mode is AUTOMATIC: do NOT pass `competition` by default.
35
+ // The workflow runs a solution-space probe and self-decides whether to fan out
36
+ // N candidate partitions (judged by the file-disjointness oracle). Only pass an
37
+ // override if the user explicitly asked: `--competition 0`/`--no-competition`
38
+ // → competition: 0; `--competition N` (2-5) → competition: N.
39
39
  }
40
40
  }
41
41
  ```
42
42
 
43
- **Competition Mode (`--competition N`).** Partition is the v1 beachhead for generate-and-judge: its judge is the file-disjointness oracle, so it is a calculator, not a biased critic. If the user invokes `/gsd-t-partition --competition 3`, parse N (clamped 2..5) and pass `competition: N`. The workflow fans out N candidate partitions, scores each on measured parallelism / wave-depth / boundary-cleanliness, and finalizes the winner. See `.gsd-t/contracts/competition-mode-contract.md`. Default off (single producer).
43
+ **Competition Mode (automatic).** Partition auto-competes when the workflow's probe finds ≥2 genuinely different ways to carve the domains; the objective file-disjointness oracle judges the candidates and picks the most-parallelizable valid one. No flag needed. Override only on explicit request: `/gsd-t-partition --no-competition` (force single draft) or `--competition N` (force N). See `.gsd-t/contracts/competition-mode-contract.md`.
44
44
 
45
45
  ## Step 3: Interpret the result
46
46
 
@@ -33,12 +33,14 @@ Read `.gsd-t/progress.md` and each domain's `scope.md`/`constraints.md`. The par
33
33
 
34
34
  ## Step 3: Interpret the result
35
35
 
36
- The Workflow returns `{ status, artifacts, summary, decisions }`.
36
+ The Workflow returns `{ status, artifacts, summary, decisions, traceability?, preMortem? }`.
37
37
 
38
- - `status === "complete"`: every domain has atomic tasks; `gsd-t parallel --dry-run` validates disjointness. Auto-advance to `/gsd-t-execute`.
39
- - `status === "partial" | "blocked"`: read `summary` (e.g. file-overlap between domains needing re-scoping).
38
+ - `status === "complete"`: every domain has atomic tasks; `gsd-t parallel --dry-run` validates disjointness; **M83 plan hardening passed** (acceptance-traceability gate + adversarial pre-mortem). Auto-advance to `/gsd-t-execute`.
39
+ - `status === "partial" | "blocked"`: read `summary` (e.g. file-overlap between domains; or **M83 plan hardening blocked** — see `traceability.violations` / `preMortem.findings`: an AC not bound to a code path + killing test, or a predicted failure condition with no planned test. Fix `tasks.md` and re-run plan).
40
40
  - `status === "failed"`: read `summary`.
41
41
 
42
+ **M83 Plan Hardening (runs automatically at the end of plan, blocking before execute).** Two gates ensure the plan can't produce a dead deliverable: (1) the deterministic **acceptance-traceability gate** (`gsd-t traceability-gate`) — every behavioral task's ACs must bind to a `**Files**` code path + a named test; the **Headline:** task needs both a real impl path and a test. (2) the adversarial **pre-mortem** agent (opus, fresh-context) — predicts edge-case/dead-deliverable/NFR failures and requires a test for each. Origin: NiceNote M5 shipped its headline (100MB+ chunked read) as dead code with no test, burning 4 verify cycles. Contract: `.gsd-t/contracts/plan-hardening-contract.md`.
43
+
42
44
  ## Document Ripple
43
45
 
44
46
  The plan agent writes per-domain `tasks.md`, updates `integration-points.md`, and adds a Decision Log entry.
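As a rough illustration of Step 3, a caller could branch on the returned object as follows. This is a sketch under assumptions: `interpretPlanResult` and the shape of its return value are invented for the example; the `status` values and the `traceability.violations` / `preMortem.findings` fields are the ones named above.

```javascript
// Hypothetical helper: maps a plan-phase Workflow result to a next step.
function interpretPlanResult(result) {
  const r = result || {};

  if (r.status === "complete") {
    // Tasks are atomic, disjointness validated, plan hardening passed.
    return { action: "advance", next: "execute", reasons: [] };
  }

  if (r.status === "partial" || r.status === "blocked") {
    const violations = (r.traceability && r.traceability.violations) || [];
    const findings = (r.preMortem && r.preMortem.findings) || [];
    const reasons = [];
    if (violations.length) reasons.push(`${violations.length} untraceable AC(s)`);
    if (findings.length) reasons.push(`${findings.length} pre-mortem finding(s)`);
    // Fall back to the summary (e.g. file overlap between domains).
    if (!reasons.length && r.summary) reasons.push(r.summary);
    return { action: "fix-and-replan", next: "plan", reasons };
  }

  // "failed" or anything unrecognized: surface the summary and stop.
  return { action: "stop", next: null, reasons: [r.summary || "no summary"] };
}
```

Keeping the hardening reasons separate from the free-text `summary` makes it easier to tell a plan that needs new tests from one that needs re-scoping.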
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@tekyzinc/gsd-t",
3
- "version": "4.1.10",
3
+ "version": "4.3.10",
4
4
  "description": "GSD-T: Contract-Driven Development for Claude Code — 54 slash commands with headless-by-default workflow spawning, unattended supervisor relay with event stream, graph-powered code analysis, real-time agent dashboard, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
5
5
  "author": "Tekyz, Inc.",
6
6
  "license": "MIT",
@@ -328,7 +328,7 @@ Canonical scripts:
328
328
  - `gsd-t-integrate.workflow.js` — cross-domain wire-up + light verify-gate
329
329
  - `gsd-t-debug.workflow.js` — 2-cycle diagnose/fix/verify (CLAUDE.md Prime Rule)
330
330
  - `gsd-t-quick.workflow.js` — preflight + brief + single-task + verify-gate (M56-D4)
331
- - `gsd-t-phase.workflow.js` — generic upper-stage runner (partition / plan / discuss / impact / milestone / prd / design-decompose / doc-ripple). **M82 Competition Mode:** an opt-in `competition: N` arg (N 2–5) on eligible upstream phases (partition / milestone / discuss / design-decompose) fans out N parallel Self-MoA producers → a judge stage → a finalizer. Partition's judge is the OBJECTIVE file-disjointness oracle (`gsd-t competition-judge --kind partition` — a calculator, not an LLM critic, immune to judge bias, the v1 beachhead); subjective phases use a blind + shuffled + different-model + rubric judge whose pick is finalized deterministically by `--kind generic`. The generative dual of the orthogonal validation triad; watershed rule = generate-and-judge ABOVE the contract, attack-and-filter BELOW. Default off. Contract: `competition-mode-contract.md` v1.0.0.
331
+ - `gsd-t-phase.workflow.js` — generic upper-stage runner (partition / plan / discuss / impact / milestone / prd / design-decompose / doc-ripple). **M82/M84 Competition Mode (AUTOMATIC):** on eligible upstream phases (partition / milestone / discuss / design-decompose) an Opus solution-space probe runs at phase start and self-decides whether to compete (biased toward competing — a better upstream artifact lowers total downstream cost); when it fires, 3 parallel Self-MoA producers → a judge stage → a finalizer. No flag needed; override with `competition: N` / `competition: 0` / `noCompetition: true`. Partition's judge is the OBJECTIVE file-disjointness oracle (`gsd-t competition-judge --kind partition` — a calculator, not an LLM critic, immune to judge bias, the v1 beachhead); subjective phases use a blind + shuffled + different-model + rubric judge whose pick is finalized deterministically by `--kind generic`. The generative dual of the orthogonal validation triad; watershed rule = generate-and-judge ABOVE the contract, attack-and-filter BELOW. Contract: `competition-mode-contract.md` v1.0.0. **M83 Plan Hardening:** the `plan` phase runs two blocking gates before execute — a deterministic acceptance-traceability gate (`gsd-t traceability-gate`: every AC binds to a code path + a killing test; the `Headline:` task needs both impl and test) and an adversarial pre-mortem agent (opus, fresh-context, protocol `pre-mortem-subagent.md`: predicts edge-case/dead-deliverable/NFR failures, each → a required test). The temporal dual of the Red Team (attack the design at plan, not just code at verify). Contract: `plan-hardening-contract.md` v1.0.0.
332
332
  - `gsd-t-scan.workflow.js` — preflight → volume-probe → pipeline(per-slice deep finder → single verify) → synthesis → document → render (M66: fans out by codebase VOLUME, not a fixed 5-teammate dimension count; M67: deep document phase deterministically produces the full living-doc set + dimension files, per-doc fan-out)
333
333
 
334
334
  **Runtime-native invariant (M81 — v4.0.29+):** the Workflow sandbox provides ONLY `agent/parallel/pipeline/log/phase/budget/args` — NO `require`/`fs`/`path`/`child_process`/`process`, and `args` arrives as a JSON STRING. Each workflow is self-contained: it `JSON.parse`s `args` and delegates every CLI call (preflight, verify-gate, brief, build-coverage, ci-parity, test-data, disjointness) to inline `async` helpers that run the command via an `agent()`'s Bash (preferring project-local `bin/<tool>.cjs`, else the global `gsd-t` PATH binary) and parse the JSON envelope — preserving the M55-D5 project-local-bin invariant. The old `require("./_lib.js")` pattern threw `ReferenceError` on first eval and silently broke every workflow except scan (TD-113, fixed M81); `_lib.js` is retired as a workflow dependency.
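The "`args` arrives as a JSON STRING" part of the invariant can be sketched as a defensive parse step. This is an assumption-laden illustration: `parseWorkflowArgs` is a hypothetical name, and tolerating an already-parsed object or falling back to `{}` on bad input is a design choice made for the example, not documented workflow behavior.

```javascript
// Hypothetical helper: normalizes the sandbox-provided `args` into a plain object.
// Uses only language built-ins, since the sandbox offers no require/fs/process.
function parseWorkflowArgs(rawArgs) {
  // Tolerate an already-parsed object (assumed convenience, e.g. for local tests).
  if (rawArgs && typeof rawArgs === "object" && !Array.isArray(rawArgs)) return rawArgs;
  if (typeof rawArgs !== "string" || rawArgs.trim() === "") return {};
  try {
    const parsed = JSON.parse(rawArgs);
    // Only a JSON object is a usable argument bag; arrays and scalars are rejected.
    return parsed && typeof parsed === "object" && !Array.isArray(parsed) ? parsed : {};
  } catch (err) {
    return {};
  }
}
```

A workflow would then read fields such as `phase` or `milestone` from the returned object rather than from the raw string.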
@@ -0,0 +1,46 @@
1
+ # Pre-Mortem Subagent Prompt — Adversarial Plan Review (pre-execute)
2
+
3
+ You are an adversarial Pre-Mortem reviewer. You attack the PLAN, not the code — because the code does not exist yet. Your job is to predict, BEFORE a single line is executed, how this milestone will fail: the edge cases it will hit, the deliverables it will leave hollow, and the assumptions it is quietly making. You are the generative-adversarial dual of the Red Team: the Red Team attacks finished code at verify; you attack the design at plan, so the milestone is built right the FIRST time instead of being re-litigated across verify cycles.
4
+
5
+ **Inverted incentives.** Your value is measured by REAL failure conditions surfaced now, not by approving the plan. A plan you bless that later burns verify cycles is YOUR failure. Assume the plan is flawed and find where.
6
+
7
+ <!-- Workflow-stage invocation -->
8
+ **Invocation context.** When this protocol runs as a native Workflow `agent()` stage (via `templates/workflows/gsd-t-phase.workflow.js` plan phase), your **final emission MUST be a single StructuredOutput object** matching the PRE_MORTEM schema declared by the Workflow. Bash/git/Read tool use is permitted DURING analysis; the final emission is the JSON verdict.
9
+
10
+ <!-- brief-first rule -->
11
+ **Brief first.** If you're about to grep, read, or run something, check the brief at `$BRIEF_PATH` first (a ≤2,500-token snapshot of CLAUDE.md + contracts + scope + requirements). It identifies the milestone's acceptance criteria and high-risk surfaces — your starting attack surface. If unset/missing, fall back to reading the plan artifacts directly, but log the gap.
12
+
13
+ ## What you are given
14
+
15
+ The milestone's PLAN: `.gsd-t/domains/*/{scope,constraints,tasks}.md`, the relevant `.gsd-t/contracts/`, and the acceptance criteria / FRs / NFRs in `docs/requirements.md`. Read the milestone's stated GOAL and its HEADLINE capability (the one thing the milestone exists to deliver).
16
+
17
+ ## Hard Rules
18
+
19
+ - **Failure conditions = value.** A short list is failure. Exhaust every category below.
20
+ - **A finding must be CONCRETE and FALSIFIABLE.** "Could have edge cases" is not a finding. "A multi-byte UTF-8 codepoint split across a chunk boundary in `read_file_chunk` will corrupt or stall — there is no test for it" IS a finding.
21
+ - **Every blocking finding must become a REQUIRED TEST.** This is the core rule. Do not emit advisory notes — advisory notes get deferred, and a deferred edge case is exactly how the NiceNote M5 chunk reader shipped three distinct data-loss bugs across three verify cycles. For each finding, state the test that must exist in the plan before execute may start. If the plan already names that test, it is not a finding.
22
+ - **The headline capability gets the hardest scrutiny.** Ask explicitly: is the milestone's reason-to-exist (a) bound to a real code path in the plan, (b) reachable from a user action / entry point, and (c) covered by a test that FAILS if that path is dead? The NiceNote M5 milestone shipped its headline (100MB+ chunked read) as DEAD CODE because the plan never required a test that exercised it. Catch that here.
23
+ - **Deferral is illegitimate for a milestone's own headline.** If the plan defers the milestone's defining capability (or a core AC) to a later milestone, that is a blocking finding — an incomplete milestone, not a warning.
24
+ - Style/taste is NOT a finding. Theoretical purity is NOT a finding. Only predicted, concrete, testable failure.
25
+
26
+ ## Attack Categories (exhaust ALL)
27
+
28
+ 1. **Dead-deliverable / wiring gaps** — Is every acceptance criterion bound to a code path that is actually CALLED from an entry point? Could a capability be built but never invoked (the M5 dead-code class)? Is the headline reachable from a real user action?
29
+ 2. **Boundary & edge inputs** — empty / null / huge / zero-length / off-by-one / max-size. For each data path the plan introduces: what is the worst input, and is there a test for it? (split codepoints, chunk boundaries, 0-byte files, files at exactly the threshold, unicode, path traversal.)
30
+ 3. **Resource / NFR conditions** — memory, time, file-handle, DOM-node, payload-size ceilings. Does any NFR (performance, bounded memory, scale) have a FALSIFIABLE measured acceptance check in the plan? An NFR with no measured test is a blocking finding (the NiceNote NFR-1 160k-DOM-node class).
31
+ 4. **Error & failure paths** — what happens when the new code's dependency fails, the input is malformed, the operation is interrupted mid-flight? Does the plan specify graceful degradation, and is there a test for the failure path (not just the happy path)?
32
+ 5. **State / ordering / concurrency** — actions out of order, partial completion, re-entry, two things racing over a shared resource (the verify-gate port-race class). Does the plan account for it?
33
+ 6. **Contract & integration seams** — at every cross-domain boundary the plan defines, do both sides agree on shape, error behavior, and who owns the shared file? Is there an integration test for the seam, not just unit tests on each side?
34
+ 7. **Shallow-test traps** — does the plan's testing approach risk vacuous passes? (assertions gated behind `if (count > 0)`, `toBeVisible()` standing in for a functional check, `toHaveCount` with no state assertion.) Flag any planned test that would pass on a broken implementation.
35
+ 8. **Missing acceptance coverage** — read requirements. Is there an AC / FR / NFR with no task that delivers it, or no test that proves it?
36
+
37
+ ## Verdict
38
+
39
+ - **BLOCK** — one or more concrete, falsifiable failure conditions that the plan does not yet cover with a required test. The plan may NOT proceed to execute until each blocking finding is answered by a named required test (or the design is changed to make the condition impossible). This is the FAIL-equivalent.
40
+ - **CLEARED** — exhaustive search; every predicted failure condition is already covered by a named test in the plan, the headline is bound+reachable+tested, and every NFR has a measured acceptance check. (The plan-quality equivalent of GRUDGING-PASS — earned by exhaustion, not by haste.)
41
+
42
+ ## Output (StructuredOutput)
43
+
44
+ Emit a single object: `{ verdict: "BLOCK" | "CLEARED", findings: [ { severity: "CRITICAL"|"HIGH"|"MEDIUM"|"LOW", category, condition, whyItFails, requiredTest, affectedAC? } ], headlineAssessment: { capability, boundToPath, reachable, hasKillingTest }, notes }`.
45
+
46
+ `requiredTest` is the load-bearing field: the specific test that must be added to the plan to close the finding. A finding without a `requiredTest` is incomplete — every blocking finding converts to a test the plan must adopt before execute.
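The verdict rules above suggest a simple consistency check on the emitted object. The following is a minimal sketch, assuming the three `headlineAssessment` flags are booleans (the prompt does not state their types); `checkPreMortemOutput` is a hypothetical name and not part of the Workflow's PRE_MORTEM schema.

```javascript
// Hypothetical checker: returns a list of problems with a pre-mortem output.
const VERDICTS = new Set(["BLOCK", "CLEARED"]);

function checkPreMortemOutput(out) {
  const problems = [];
  const verdict = out && out.verdict;
  const findings = Array.isArray(out && out.findings) ? out.findings : [];

  if (!VERDICTS.has(verdict)) {
    problems.push(`unknown verdict: ${JSON.stringify(verdict)}`);
  }

  // Every finding must carry the test that would close it.
  findings.forEach((finding, index) => {
    const test = finding && finding.requiredTest;
    if (typeof test !== "string" || test.trim() === "") {
      problems.push(`findings[${index}] has no requiredTest`);
    }
  });

  // BLOCK means at least one uncovered failure condition was found.
  if (verdict === "BLOCK" && findings.length === 0) {
    problems.push("BLOCK without findings");
  }

  // CLEARED requires the headline to be bound, reachable and tested.
  if (verdict === "CLEARED") {
    const headline = (out && out.headlineAssessment) || {};
    for (const key of ["boundToPath", "reachable", "hasKillingTest"]) {
      if (headline[key] !== true) {
        problems.push(`CLEARED but headlineAssessment.${key} is not true`);
      }
    }
  }
  return problems;
}
```

An empty array means the output is internally consistent; it says nothing about whether the reviewer's search was actually exhaustive.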
@@ -37,8 +37,13 @@ export const meta = {
37
37
  name: "gsd-t-phase",
38
38
  description: "Generic upper-stage phase runner (partition/plan/discuss/etc.)",
39
39
  phases: [
40
- { title: "Preflight", detail: "preflight + brief" },
41
- { title: "Phase", detail: "primary agent with phase-specific protocol" },
40
+ { title: "Preflight", detail: "preflight + brief" },
41
+ { title: "Probe", detail: "M84 auto-competition solution-space probe (opus; eligible phases only)" },
42
+ { title: "Compete", detail: "M82/M84 N parallel producers (when competition fires)" },
43
+ { title: "Judge", detail: "select/synthesize the winning candidate" },
44
+ { title: "Phase", detail: "primary agent (or finalizer) with phase-specific protocol" },
45
+ { title: "Finalize", detail: "commit the winning approach (competition path)" },
46
+ { title: "Plan Hardening", detail: "M83 traceability gate + adversarial pre-mortem (plan phase only)" },
42
47
  ],
43
48
  };
44
49
 
@@ -67,6 +72,14 @@ async function runCli(projectDir, subcmd, argv, localBin, label, parseJson = tru
67
72
  return r || { ok: false, exitCode: -1, envelope: null, via: "error" };
68
73
  }
69
74
  async function runPreflight(projectDir, label = "preflight", phaseNameOpt) { return runCli(projectDir, "preflight", ["--json"], "cli-preflight.cjs", label, true, phaseNameOpt); }
75
+ // M83: the deterministic plan-hardening gate. Returns the parsed envelope
76
+ // ({ ok, exitCode, violations, ... }); ok:false means ≥1 untraceable AC.
77
+ async function runTraceabilityGate(projectDir, milestone, label = "traceability-gate", phaseNameOpt) {
78
+ const argv = ["--json"];
79
+ if (milestone) argv.push("--milestone", milestone);
80
+ const r = await runCli(projectDir, "traceability-gate", argv, "gsd-t-traceability-gate.cjs", label, true, phaseNameOpt);
81
+ return r.envelope || { ok: r.ok, exitCode: r.exitCode, violations: [], reason: "gate-unparsed" };
82
+ }
70
83
  async function generateBrief(projectDir, { kind = "execute", milestone, domain, id, label = "brief", phaseNameOpt } = {}) {
71
84
  const argv = ["--kind", kind, "--spawn-id", id, "--out", `${projectDir}/.gsd-t/briefs/${id}.json`];
72
85
  if (milestone) argv.push("--milestone", milestone);
@@ -120,9 +133,77 @@ async function runCompetitionJudge(projectDir, spec, label = "judge", phaseNameO
120
133
  }
121
134
 
122
135
  // Phases where competition pays off (wide solution space, pre-contract, high blast
123
- // radius). A competition arg on any other phase is ignored (single producer runs).
136
+ // radius). Competition is AUTOMATIC on these (M84): the workflow probes the
137
+ // solution space and self-decides; on any other phase it never runs.
124
138
  const COMPETITION_ELIGIBLE = new Set(["partition", "milestone", "discuss", "design-decompose"]);
125
139
 
140
+ // M84: the solution-space probe. Decides AUTOMATICALLY whether a phase is
141
+ // competition-worthy (≥2 genuinely different viable approaches). This is a
142
+ // high-level reasoning step — NOT a mechanical check — so it runs on OPUS, not
143
+ // haiku (a weak probe forfeits the whole point: it gates a 3× competition whose
144
+ // upstream cost buys down far larger downstream cost). It is BIASED TOWARD
145
+ // COMPETING: when uncertain, compete — because a better artifact upstream makes
146
+ // every downstream phase (pre-mortem, execute, verify) cheaper and more likely to
147
+ // pass first time, so the expected savings usually exceed the 3× probe-and-produce
148
+ // cost. Returns { compete: bool, reason, approaches? }.
149
+ //
150
+ // Partition has its OWN probe (runPartitionProbe, also opus): the disjointness
151
+ // oracle can't decide before candidates exist, so an opus probe makes the
152
+ // compete/skip call and the oracle JUDGES the candidates afterward. This
153
+ // (runSolutionSpaceProbe) is for the other subjective phases.
154
+ const _PROBE_SCHEMA = {
155
+ type: "object", required: ["compete"], additionalProperties: true,
156
+ properties: {
157
+ compete: { type: "boolean" },
158
+ reason: { type: "string" },
159
+ approaches: { type: "array", items: { type: "string" } },
160
+ },
161
+ };
162
+ async function runSolutionSpaceProbe(projectDir, phaseName, { milestone, briefPath, userInput, phaseNameOpt } = {}) {
163
+ const prompt = [
164
+ `You are the Solution-Space Probe for the ${phaseName} phase${milestone ? ` of ${milestone}` : ""}. Decide ONE thing: should this phase generate MULTIPLE competing candidates (then a judge picks the best), or is a single draft sufficient?`,
165
+ `**Brief:** ${briefPath || "(none — read the relevant .gsd-t docs/contracts/requirements directly)"}`,
166
+ userInput ? `\nUser input:\n${userInput}\n` : "",
167
+ `Compete WHEN there are ≥2 genuinely DIFFERENT, viable approaches whose trade-offs matter — different architectures, decomposition strategies, data models, sequencing, or design directions that a reasonable expert could disagree about. List them in "approaches".`,
168
+ `Do NOT compete only when there is ONE obvious correct approach and any variation would be cosmetic.`,
169
+ `BIAS TOWARD COMPETING: if you are uncertain, or can name even two plausibly-different approaches, choose compete=true. A wasted competition costs ~3× this one phase; a missed-better-approach costs far more downstream (more pre-mortem blocks, more bugs, more verify cycles). Err on the side of generating options.`,
170
+ `Return JSON per the schema: { "compete": true|false, "reason": "<one sentence>", "approaches": ["<a>","<b>",...] }.`,
171
+ ].filter(Boolean).join("\n");
172
+ const opts = { label: "solution-space-probe", schema: _PROBE_SCHEMA, model: "opus" };
173
+ if (phaseNameOpt) opts.phase = phaseNameOpt;
174
+ const r = await agent(prompt, opts).catch(() => null);
175
+ // Probe failure → bias toward competing (fail-toward-options, per the cost logic).
176
+ if (!r || typeof r.compete !== "boolean") {
177
+ return { compete: true, reason: "probe unavailable — defaulting to compete (bias toward options)", approaches: [] };
178
+ }
179
+ return { compete: r.compete, reason: r.reason || "", approaches: r.approaches || [] };
180
+ }
181
+
182
+ // M84: PARTITION's pre-produce decision. The objective disjointness oracle needs
183
+ // candidates to score, so it can't DECIDE before any exist — it runs later as the
184
+ // JUDGE. For the pre-produce compete/skip decision we use an OPUS heuristic probe
185
+ // (biased toward compete): partition is competition-worthy unless the milestone is
186
+ // trivially single-domain. So: opus probe DECIDES whether to compete; the objective
187
+ // file-disjointness oracle JUDGES the produced candidates. (Decision = heuristic +
188
+ // bias; selection = objective.)
189
+ async function runPartitionProbe(projectDir, { milestone, briefPath, userInput, phaseNameOpt } = {}) {
190
+ const prompt = [
191
+ `You are the Partition Solution-Space Probe${milestone ? ` for ${milestone}` : ""}. Decide: are there ≥2 genuinely different ways to CARVE this milestone into file-disjoint domains (different boundaries / groupings / parallelism), or is there one obvious single decomposition?`,
192
+ `**Brief:** ${briefPath || "(none — read .gsd-t docs/contracts/requirements directly)"}`,
193
+ userInput ? `\nUser input:\n${userInput}\n` : "",
194
+ `Compete=true when the work spans multiple files/areas that could be grouped more than one sensible way. Compete=false ONLY for a trivial single-file / single-domain milestone.`,
195
+ `BIAS TOWARD COMPETING: if ≥3 files/areas are in play or you're unsure, choose compete=true — the file-disjointness oracle will objectively pick the most-parallelizable valid carving among the candidates, so competing is low-risk and high-reward.`,
196
+ `Return JSON per the schema.`,
197
+ ].filter(Boolean).join("\n");
198
+ const opts = { label: "partition-probe", schema: _PROBE_SCHEMA, model: "opus" };
199
+ if (phaseNameOpt) opts.phase = phaseNameOpt;
200
+ const r = await agent(prompt, opts).catch(() => null);
201
+ if (!r || typeof r.compete !== "boolean") {
202
+ return { compete: true, reason: "probe unavailable — defaulting to compete", approaches: [] };
203
+ }
204
+ return { compete: r.compete, reason: r.reason || "", approaches: r.approaches || [] };
205
+ }
206
+
126
207
  // Rubric axes for the SUBJECTIVE judge (non-partition eligible phases). Partition
127
208
  // uses the objective oracle instead and ignores these.
128
209
  const RUBRIC_AXES_BY_PHASE = {
@@ -162,13 +243,27 @@ const milestone = _args.milestone || null;
162
243
  const userInput = _args.userInput || "";
163
244
  const phaseName = _args.phase;
164
245
 
165
- // M82: clamp competition N to [1,5]. Evidence (Self-MoA, Large Language Monkeys):
166
- // gains plateau fast; N=3 captures the elbow, >5 is wasteful. N<=1 = off (single producer).
167
- const _rawN = Number(_args.competition) || 1;
168
- const competitionN = Math.max(1, Math.min(5, Math.floor(_rawN)));
169
- const competitionOn = competitionN > 1 && COMPETITION_ELIGIBLE.has(phaseName);
170
- if (competitionN > 1 && !competitionOn) {
171
- log(`competition: N=${competitionN} ignored; phase "${phaseName}" is not competition-eligible (single producer runs). Eligible: ${[...COMPETITION_ELIGIBLE].join(", ")}.`);
246
+ // M84: competition is AUTOMATIC. By default the workflow PROBES the solution space
247
+ // (after brief) and self-decides whether to run a 3-producer + judge competition;
248
+ // no flag needed. Optional manual OVERRIDES: `competition: N` (2-5) forces N
249
+ // producers; `competition: 0` or `noCompetition: true` forces it off. Default
250
+ // (`competition` unset) = let the workflow decide.
251
+ // Evidence (Self-MoA, Large Language Monkeys): gains plateau fast; N=3 is the elbow,
252
+ // >5 wasteful. The auto path fires 3.
253
+ const AUTO_COMPETITION_N = 3;
254
+ const _hasCompetitionArg = _args.competition !== undefined && _args.competition !== null;
255
+ const _forceOff = _args.noCompetition === true || (_hasCompetitionArg && Number(_args.competition) <= 1);
256
+ const _forcedN = _hasCompetitionArg && Number(_args.competition) >= 2
257
+ ? Math.max(2, Math.min(5, Math.floor(Number(_args.competition))))
258
+ : null;
259
+ // competitionN/competitionOn are resolved LATER (after preflight+brief) by the
260
+ // auto-probe, unless an override pins them now. Declared with `let` so the
261
+ // post-brief decision block can set them.
262
+ let competitionN = 1;
263
+ let competitionOn = false;
264
+ const _competitionEligible = COMPETITION_ELIGIBLE.has(phaseName);
265
+ if (_forcedN && !_competitionEligible) {
266
+ log(`competition: forced N=${_forcedN} ignored — phase "${phaseName}" is not competition-eligible. Eligible: ${[...COMPETITION_ELIGIBLE].join(", ")}.`);
172
267
  }
173
268
 
174
269
  if (!phaseName || !VALID_PHASES.includes(phaseName)) {
@@ -181,10 +276,40 @@ const pre = await runPreflight(projectDir);
181
276
  if (!pre.ok) return { status: "failed", reason: "preflight-failed", preflight: pre.envelope };
182
277
  const brief = await generateBrief(projectDir, { kind: phaseName, milestone, id: `${phaseName}-${(milestone || "m").toLowerCase()}` });
183
278
 
184
- phase("Phase");
279
+ // ── M84: resolve competition AUTOMATICALLY (after brief, before producing) ──
280
+ // Default: probe the solution space and self-decide. Overrides pin it.
281
+ if (_competitionEligible) {
282
+ if (_forceOff) {
283
+ competitionOn = false;
284
+ log(`competition: OFF (overridden via competition≤1 / noCompetition).`);
285
+ } else if (_forcedN) {
286
+ competitionN = _forcedN; competitionOn = true;
287
+ log(`competition: ON, N=${_forcedN} (overridden).`);
288
+ } else {
289
+ // M84 Red Team LOW: warn on an unparseable override so a typo (competition:"off")
290
+ // isn't silently swallowed into the auto path.
291
+ if (_hasCompetitionArg && Number.isNaN(Number(_args.competition))) {
292
+ log(`competition: override value ${JSON.stringify(_args.competition)} is not a number — ignoring it, using AUTO. (Use 0/noCompetition to force off, 2-5 to force N.)`);
293
+ }
294
+ // Automatic decision — the workflow probes and decides. Opus probe (or the
295
+ // partition-specific probe); biased toward competing.
296
+ phase("Probe");
297
+ const probe = phaseName === "partition"
298
+ ? await runPartitionProbe(projectDir, { milestone, briefPath: brief.briefPath, userInput, phaseNameOpt: "Probe" })
299
+ : await runSolutionSpaceProbe(projectDir, phaseName, { milestone, briefPath: brief.briefPath, userInput, phaseNameOpt: "Probe" });
300
+ competitionOn = !!probe.compete;
301
+ competitionN = competitionOn ? AUTO_COMPETITION_N : 1;
302
+ log(`competition: AUTO → ${competitionOn ? `COMPETE (${AUTO_COMPETITION_N} producers)` : "single draft"} — ${probe.reason}${probe.approaches && probe.approaches.length ? ` [approaches: ${probe.approaches.join("; ")}]` : ""}`);
303
+ }
304
+ }
305
+
306
+ // M84 Red Team LOW: announce "Phase" only on the single-draft path (the
307
+ // competition path announces Compete/Judge/Finalize instead) so no empty stage shows.
185
308
  const promptByPhase = {
186
309
  partition: `Decompose the milestone into 2-5 independent domains. Write .gsd-t/domains/{domain}/{scope,constraints,tasks}.md. Cross-domain contracts in .gsd-t/contracts/.`,
187
- plan: `For each domain, write atomic tasks.md entries with files, contract refs, dependencies, acceptance criteria. Update .gsd-t/contracts/integration-points.md with wave groupings.`,
310
+ plan: `For each domain, write atomic tasks.md entries with files, contract refs, dependencies, acceptance criteria. Update .gsd-t/contracts/integration-points.md with wave groupings.
311
+
312
+ M83 PLAN HARDENING (mandatory — the plan is BLOCKED from execute otherwise): every task that declares acceptance criteria MUST also declare (1) **Files** = the concrete code path that implements it, and (2) a TEST that fails if that path is dead — name it in a **Test** field, a test-file path (\`*.test.*\` / \`*.spec.*\` / \`e2e/\`), or a runner (vitest/cargo test/playwright). The ONE task that delivers the milestone's HEADLINE capability MUST be tagged **Headline:** true and carry BOTH a real implementation path AND a test that exercises that capability end-to-end (e.g. for a "100MB+ file" milestone, a test that actually opens a >100MB fixture). NEVER defer a milestone's own headline capability or a core AC to a later milestone. This exists because NiceNote M5 shipped its headline (100MB+ chunked read) as DEAD CODE with no test and burned 4 verify cycles.`,
188
313
  discuss: `Multi-perspective exploration of design questions. Settle locked decisions into .gsd-t/CONTEXT.md. Do NOT implement.`,
189
314
  impact: `Analyze downstream effects of proposed changes. Identify breaking changes, affected consumers, migration paths.`,
190
315
  milestone: `Define a new milestone — origin, goal, success criteria, falsifiable acceptance. Append to .gsd-t/progress.md. Defer partition/plan.`,
@@ -199,6 +324,7 @@ const briefLine = `**Brief (REQUIRED):** ${brief.briefPath || "(no brief — re-
199
324
  let result;
200
325
  if (!competitionOn) {
201
326
  // ── Single-producer path (default, unchanged behavior) ──
327
+ phase("Phase");
202
328
  result = await agent(
203
329
  [
204
330
  `You are the ${phaseName} phase agent.`,
@@ -214,15 +340,49 @@ if (!competitionOn) {
214
340
  { label: phaseName, phase: "Phase", schema: PHASE_RESULT_SCHEMA, model: "opus" }
215
341
  ).catch((e) => ({ status: "failed", artifacts: [], summary: `agent error: ${e && e.message}` }));
216
342
  } else {
217
- // ── M82 Competition Mode: generate -> judge -> finalize ──
218
- // Distinct "angles" so the N Self-MoA producers explore different regions of
219
- // the solution space (diversity by prompt, not by model — Self-MoA > Mixed-MoA).
220
- const ANGLES = [
221
- "Optimize for MAXIMUM parallelism: carve the most file-disjoint domains that can run concurrently.",
222
- "Optimize for SIMPLICITY: the fewest domains with the cleanest, most obvious boundaries.",
223
- "Optimize for RISK ISOLATION: isolate the riskiest/most-coupled work into its own domain so the rest stays safe.",
224
- "Optimize for DEPENDENCY DEPTH: minimize serial gates (waves) between domains.",
225
- "Optimize for BALANCE: roughly equal-sized domains with minimal cross-talk.",
343
+ // ── M82/M84 Competition Mode: generate -> judge -> finalize ──
344
+ // Distinct "angles" so the N Self-MoA producers explore different regions of the
345
+ // solution space (diversity by prompt, not by model — Self-MoA > Mixed-MoA).
346
+ // M84 Red Team MEDIUM: angles must be PHASE-AWARE — the old partition-only set
347
+ // gave a discuss/milestone producer a contradictory "carve file-disjoint domains"
348
+ // directive, degrading 3 of 4 now-automatic phases. Each eligible phase gets its
349
+ // own angle set (analogous to RUBRIC_AXES_BY_PHASE).
350
+ const ANGLES_BY_PHASE = {
351
+ partition: [
352
+ "Optimize for MAXIMUM parallelism: carve the most file-disjoint domains that can run concurrently.",
353
+ "Optimize for SIMPLICITY: the fewest domains with the cleanest, most obvious boundaries.",
354
+ "Optimize for RISK ISOLATION: isolate the riskiest/most-coupled work into its own domain so the rest stays safe.",
355
+ "Optimize for DEPENDENCY DEPTH: minimize serial gates (waves) between domains.",
356
+ "Optimize for BALANCE: roughly equal-sized domains with minimal cross-talk.",
357
+ ],
358
+ milestone: [
359
+ "Optimize for FASTEST TIME-TO-VALUE: the leanest milestone sequence that ships something usable soonest.",
360
+ "Optimize for RISK-FIRST: front-load the riskiest/most-uncertain work so failure is cheap and early.",
361
+ "Optimize for DEPENDENCY ORDER: sequence strictly by what unblocks the most downstream work.",
362
+ "Optimize for USER-VALUE-FIRST: order milestones by the value each delivers to the end user.",
363
+ "Optimize for SIMPLICITY: the fewest, most self-contained milestones with minimal cross-cutting.",
364
+ ],
365
+ discuss: [
366
+ "Argue the SIMPLEST viable architecture, even if it sacrifices some flexibility.",
367
+ "Argue the most ROBUST/CORRECT architecture, accepting more upfront complexity.",
368
+ "Argue the most EXTENSIBLE architecture, optimizing for future change.",
369
+ "Argue a PRAGMATIC middle path, naming the explicit trade-offs it accepts.",
370
+ "Argue a CONTRARIAN approach that questions an assumption the others take for granted.",
371
+ ],
372
+ "design-decompose": [
373
+ "Decompose ATOMIC-FIRST: smallest reusable elements up, composed into widgets then pages.",
374
+ "Decompose PAGE-FIRST: whole pages down into sections, widgets, then elements.",
375
+ "Decompose TOKEN-DRIVEN: design tokens + primitives first, structure follows the system.",
376
+ "Decompose by REUSE: maximize shared components; minimize one-off bespoke pieces.",
377
+ "Decompose by FEATURE: group elements/widgets by the user-facing feature they serve.",
378
+ ],
379
+ };
380
+ const ANGLES = ANGLES_BY_PHASE[phaseName] || [
381
+ "Explore a materially different approach, optimizing for simplicity.",
382
+ "Explore a materially different approach, optimizing for robustness/correctness.",
383
+ "Explore a materially different approach, optimizing for extensibility.",
384
+ "Explore a pragmatic middle path, naming its trade-offs.",
385
+ "Explore a contrarian approach that questions a shared assumption.",
226
386
  ];
227
387
 
228
388
  const PRODUCER_SCHEMA = phaseName === "partition"
@@ -434,4 +594,75 @@ if (!competitionOn) {
434
594
  result.competition = { n: candidates.length, winner: winner.id, ranked };
435
595
  }
436
596
 
597
+ // ── M83 Left-Shifted Plan Hardening (plan phase only) ──
598
+ // Two blocking gates run AFTER the plan agent writes tasks.md and BEFORE the plan
599
+ // is declared complete — so execute can never start on a plan that would produce a
600
+ // dead deliverable or an unguarded edge case. Contract: plan-hardening-contract.md.
601
+ // (1) Deterministic acceptance-traceability gate — every behavioral task's ACs
602
+ // must bind to a code path + a killing test; the headline must be impl+test.
603
+ // (2) Adversarial pre-mortem agent (opus, fresh-context, assume-the-plan-is-flawed)
604
+ // — predicts edge-case / dead-deliverable / NFR failures; each blocking
605
+ // finding must become a required test before execute.
606
+ if (phaseName === "plan" && result && result.status !== "failed") {
607
+ phase("Plan Hardening");
608
+
609
+ // (1) Deterministic gate. FAIL-CLOSED (Red Team MEDIUM-2): a deterministic gate
610
+ // that can't be evaluated (CLI error / unparsed envelope) is NOT a pass — block.
611
+ const trace = await runTraceabilityGate(projectDir, milestone, "traceability-gate", "Plan Hardening");
612
+ const traceUnparsed = trace && trace.reason === "gate-unparsed";
613
+ if (trace && (trace.ok === false || traceUnparsed)) {
614
+ const vcount = (trace.violations || []).length;
615
+ const why = traceUnparsed
616
+ ? `traceability gate could not be evaluated (CLI error / unparsed output) — failing closed; re-run plan.`
617
+ : `${vcount} acceptance criteria not bound to a code path + killing test (M83 traceability gate). Fix tasks.md, then re-run plan.`;
618
+ log(`plan-hardening: traceability gate BLOCKED — ${traceUnparsed ? "unevaluable (fail-closed)" : vcount + " untraceable AC"}.`);
619
+ result.status = "blocked";
620
+ result.summary = `plan blocked: ${why} ${result.summary || ""}`.trim();
621
+ result.traceability = trace;
622
+ return result;
623
+ }
624
+ result.traceability = trace;
625
+
626
+ // (2) Adversarial pre-mortem. The agent reads its own protocol at spawn time
627
+ // (the orchestrator has no fs); blocking findings convert to required tests.
628
+ const PRE_MORTEM_SCHEMA = {
629
+ type: "object", required: ["verdict", "findings"], additionalProperties: true,
630
+ properties: {
631
+ verdict: { type: "string", enum: ["BLOCK", "CLEARED"] },
632
+ findings: {
633
+ type: "array", items: {
634
+ type: "object", required: ["severity", "condition", "requiredTest"], additionalProperties: true,
635
+ properties: {
636
+ severity: { type: "string", enum: ["CRITICAL", "HIGH", "MEDIUM", "LOW"] },
637
+ category: { type: "string" }, condition: { type: "string" },
638
+ whyItFails: { type: "string" }, requiredTest: { type: "string" }, affectedAC: { type: "string" },
639
+ },
640
+ },
641
+ },
642
+ headlineAssessment: { type: "object", additionalProperties: true },
643
+ notes: { type: "string" },
644
+ },
645
+ };
646
+ const preMortem = await agent(
647
+ [
648
+ `You are the adversarial Pre-Mortem reviewer for milestone ${milestone || "(current)"}.`,
649
+ `FIRST read your protocol via the Read tool: templates/prompts/pre-mortem-subagent.md (in the installed @tekyzinc/gsd-t package, or this project's copy). Follow it exactly.`,
650
+ `**Brief (REQUIRED):** ${brief.briefPath || "(no brief — read plan artifacts directly)"}`,
651
+ `Attack the PLAN at .gsd-t/domains/*/{scope,constraints,tasks}.md + .gsd-t/contracts/ + docs/requirements.md.`,
652
+ `Predict, before any code is executed, how this milestone will FAIL: edge cases, dead deliverables, unguarded NFRs, shallow-test traps. Scrutinize the HEADLINE capability hardest — is it bound to a real path, reachable, and covered by a killing test?`,
653
+ `Every blocking finding MUST convert to a concrete requiredTest the plan must adopt. Advisory notes are forbidden.`,
654
+ `Verdict BLOCK if any concrete, falsifiable failure condition lacks a named required test; else CLEARED. Return JSON per the schema.`,
655
+ ].join("\n"),
656
+ { label: "pre-mortem", phase: "Plan Hardening", schema: PRE_MORTEM_SCHEMA, model: "opus" }
657
+ ).catch((e) => ({ verdict: "BLOCK", findings: [{ severity: "HIGH", condition: `pre-mortem agent error: ${e && e.message}`, requiredTest: "re-run pre-mortem" }], notes: "agent-error" }));
658
+
659
+ result.preMortem = preMortem;
660
+ if (preMortem && preMortem.verdict === "BLOCK") {
661
+ const n = (preMortem.findings || []).length;
662
+ log(`plan-hardening: pre-mortem BLOCKED — ${n} predicted failure condition(s) need required tests in the plan.`);
663
+ result.status = "blocked";
664
+ result.summary = `plan blocked: pre-mortem found ${n} falsifiable failure condition(s) not covered by a planned test (M83). Add the required tests to tasks.md, then re-run plan. ${result.summary || ""}`.trim();
665
+ }
666
+ }
667
+
437
668
  return result;