@tekyzinc/gsd-t 4.0.29 → 4.1.10

package/CHANGELOG.md CHANGED
@@ -2,6 +2,26 @@
2
2
 
3
3
  All notable changes to GSD-T are documented here. Updated with each release.
4
4
 
5
+ ## [4.1.10] - 2026-06-05 (M82 Competition Mode - minor)
6
+
7
+ ### Added - Competition Mode: generate-and-judge for upstream, pre-contract phases
8
+
9
+ The *generative* dual of the orthogonal validation triad. The triad is adversarial (many critics, one candidate → a filter); Competition Mode is generative (many candidates, one judge → a generator). GSD-T historically filtered hard but **generated singly** — every upstream artifact was a single draft. Competition Mode adds the missing generator on the phases where it pays. **Watershed rule:** generate-and-judge ABOVE the contract; attack-and-filter BELOW it.
10
+
11
+ - **Opt-in `--competition N`** (N clamped 2–5; default off) on eligible upstream phases: `partition`, `milestone`, `discuss`, `design-decompose`. Ignored (single producer, logged) on ineligible phases (plan/impact/prd/doc-ripple) and impossible on post-contract phases (execute/verify/…).
12
+ - **Producers = Self-MoA** — N samples of ONE strong model (opus), diversified by prompt *angle* (max-parallelism / simplicity / risk-isolation / dependency-depth / balance), not by a model zoo. Evidence (Self-MoA, arXiv 2502.00674): aggregation is far more sensitive to candidate quality than diversity; mixing models injects low-quality candidates. No debate — producers stay independent.
13
+ - **Objective judge for partition (the v1 beachhead)** — `bin/gsd-t-competition-judge.cjs --kind partition` scores candidate decompositions via the SAME file-disjointness oracle the dispatcher uses (`bin/gsd-t-file-disjointness.cjs`): parallelGroups / waveDepth / validity. A calculator, not an LLM critic → immune to position/verbosity/self-preference bias. Touch paths normalized (`./a` ≡ `a`, `//`, backslashes, trailing slash, dedupe; case preserved).
14
+ - **Subjective judge for milestone/discuss/design** — blind + deterministically-shuffled + different-model (sonnet) + rubric-scored; the winner is finalized deterministically by `--kind generic` (highest weighted score; reproducible tiebreak; zero inference in the substrate).
15
+ - **Two-gate selection policy** (synthesize only when candidate-quality-uniform AND artifact-is-list-shaped; else pick-one) + three artifact classes (coupled-thesis → pick-one; line-items → union/dedup; structurally-validated → synthesize+re-validate). The finalizer does pick-one-at-thesis + union-at-line-item-level, then partition re-validates the graft via the oracle and BLOCKS on a reintroduced overlap.
16
+ - **New CLI**: `gsd-t competition-judge [--in SPEC.json] [--project-dir P]` (exit 0 winner / 4 no valid candidate / 64 bad input). Added to project + global bin tools.
17
+ - **Contract**: `.gsd-t/contracts/competition-mode-contract.md` v1.0.0 STABLE (6 invariants).
18
+ - **Verification**: orthogonal triad ran. Adversarial Workflow Red Team (Opus, fresh context) FAILed first pass (3 HIGH + 2 MEDIUM), all fixed, re-validation Red Team GRUDGING-PASS (all 5 fixed, no new HIGH/CRITICAL). Real-sandbox acceptance gate passed (judge integration ran end-to-end in the Workflow sandbox). Suite 1357/0/4 (+6 M82 tests). **SC#1 measured on M82's own partition: competition (3 producers) → 3 parallel groups vs N=1 baseline's 1 (3× parallelism), invalid overlap candidate correctly disqualified.** SC#3 position-bias probe: order-invariant winner (100%).
19
+ - Origin: brainstorm 2026-06-05 grounded in 2 deep-research runs (best-of-N/judge/debate + synthesis-vs-pick-one/MoA/Frankenstein).
20
+
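The two-gate policy and the three artifact classes named in the entry above can be sketched as two small lookups. This is only an illustration: `selectionMode`, `strategyForClass`, the option names, and the strategy strings are hypothetical and do not come from the package. The entry does not say how the gates and the classes compose, so the sketch keeps them separate.

```javascript
// Illustrative only: selectionMode, strategyForClass, and the option names
// are hypothetical and do not come from the package.

// Gate check: synthesis requires BOTH conditions; anything else is pick-one.
function selectionMode({ qualityUniform, listShaped }) {
  return qualityUniform && listShaped ? "synthesize" : "pick-one";
}

// Artifact class -> handling, as listed in the changelog entry.
const CLASS_STRATEGY = Object.freeze({
  "coupled-thesis": "pick-one",
  "line-items": "union-dedup",
  "structurally-validated": "synthesize-and-revalidate",
});

function strategyForClass(artifactClass) {
  // Unknown classes fall back to the conservative choice (an assumption).
  return CLASS_STRATEGY[artifactClass] || "pick-one";
}

console.log(selectionMode({ qualityUniform: true, listShaped: false }));
console.log(strategyForClass("line-items"));
```

Defaulting to pick-one whenever a gate is not clearly satisfied matches the entry's "else pick-one" wording.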
21
+ ### Versioning
22
+
23
+ Minor bump 4.0.29 → 4.1.10 (new feature, additive; patch reset to 10).
24
+
5
25
  ## [4.0.29] - 2026-06-05 (M81 Workflows Runtime-Native - patch)
6
26
 
7
27
  ### Fixed - TD-113: 6 of 7 workflows (+ quick) crashed in the Workflow sandbox and had never run
package/README.md CHANGED
@@ -122,8 +122,11 @@ gsd-t build-coverage --json # M57: new top-level pat
122
122
  gsd-t ci-parity --json # M57: reproduce the project's actual CI build locally (auto docker build)
123
123
  gsd-t test-data --list [--run ID] [--json] # M58: list test-data ledger entries
124
124
  gsd-t test-data --purge --run ID [--dry-run] [--json] # M58: purge tagged test data after Verify (Step 4.5)
125
+ gsd-t competition-judge --in SPEC.json [--project-dir P] # M82: generate-and-judge selection oracle (partition / generic)
125
126
  ```
126
127
 
128
+ **Competition Mode (M82).** Opt-in `--competition N` (N 2–5) on upstream, pre-contract phases (`/gsd-t-partition`, `/gsd-t-milestone`, `/gsd-t-design-decompose`) fans out N parallel candidate producers and a judge selects the winner — the generative dual of the orthogonal validation triad. Partition uses an *objective* file-disjointness oracle as the judge (a calculator, not a biased critic); subjective phases use a blind + different-model + rubric judge. Default off. See `.gsd-t/contracts/competition-mode-contract.md`.
129
+
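To make the "calculator, not a critic" idea concrete, here is a minimal sketch of an objective validity check: normalize each domain's touch paths, then report any file claimed by more than one domain. The names `normalizeTouch` and `findOverlaps` are hypothetical. The released judge delegates to `bin/gsd-t-file-disjointness.cjs`, whose internals are not shown in this diff, so treat this as an approximation of the idea rather than the oracle itself.

```javascript
// Sketch under stated assumptions: normalizeTouch and findOverlaps are
// illustrative names, not exports of the package.
function normalizeTouch(p) {
  if (typeof p !== "string") return "";
  let s = p.trim().replace(/\\/g, "/");                // backslashes -> forward slashes
  s = s.replace(/\/+/g, "/");                          // collapse repeated slashes
  s = s.replace(/^\.\//, "");                          // drop a leading ./
  while (s.includes("/./")) s = s.replace("/./", "/"); // drop interior /./
  s = s.replace(/\/+$/, "");                           // drop a trailing slash
  return s;                                            // case is preserved on purpose
}

// Returns the normalized files claimed by more than one domain.
// An empty array means the candidate partition is valid.
function findOverlaps(domains) {
  const owner = new Map(); // normalized path -> first domain that claimed it
  const overlaps = new Set();
  for (const d of domains) {
    const touches = new Set((d.touches || []).map(normalizeTouch).filter(Boolean));
    for (const f of touches) {
      if (owner.has(f) && owner.get(f) !== d.name) overlaps.add(f);
      else owner.set(f, d.name);
    }
  }
  return [...overlaps].sort();
}

const candidate = [
  { name: "d1", touches: ["./bin/x.js"] },
  { name: "d2", touches: ["bin\\x.js", "lib/y.js"] },
];
console.log(findOverlaps(candidate)); // the two spellings of bin/x.js collide
```

Because the check is pure string and set arithmetic, the same input always gives the same verdict, which is what keeps candidate order and verbosity from influencing it.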
127
130
  `gsd-t parallel` consumes the M44 task-graph (D1) and applies three pre-spawn gates (D4 depgraph validation → D5 file-disjointness → D6 economics) followed by mode-aware headroom/split math. Extends — does not replace — the M40 orchestrator. Contract: `.gsd-t/contracts/wave-join-contract.md` v1.1.0.
128
131
 
129
132
  Each iteration runs as a fresh `claude -p` session. A cumulative debug ledger (`.gsd-t/debug-state.jsonl`) preserves hypothesis/fix/learning history across sessions. An anti-repetition preamble prevents retrying failed approaches.
@@ -0,0 +1,344 @@
1
+ "use strict";
2
+
3
+ /**
4
+ * gsd-t-competition-judge — M82 D1
5
+ *
6
+ * The selection oracle for Competition Mode (generate-and-judge on upstream,
7
+ * pre-contract phases). Given N candidate artifacts produced by parallel
8
+ * producers, score them and emit a winner — the GENERATIVE dual of the
9
+ * orthogonal validation triad (which is adversarial: many critics, one
10
+ * candidate). Contract: .gsd-t/contracts/competition-mode-contract.md v1.0.0.
11
+ *
12
+ * Two judge modes, chosen by `--kind`:
13
+ *
14
+ * --kind partition → OBJECTIVE judge (the v1 beachhead). Each candidate is a
15
+ * proposed domain decomposition: a list of domains, each with a write-
16
+ * touch list. We score it with the SAME disjointness oracle the real
17
+ * parallel dispatcher uses (bin/gsd-t-file-disjointness.cjs), so the judge
18
+ * is a CALCULATOR, not a critic — it sidesteps every LLM-judge bias
19
+ * (position / verbosity / self-preference). Metrics, higher-is-better
20
+ * unless noted:
21
+ * - valid : zero write-target overlaps across domains (HARD gate)
22
+ * - parallelGroups : count of disjoint domains that can fan out at once
23
+ * - waveDepth : waves needed (1 for the parallel fan-out, +1 if any serial or unprovable domain remains) — LOWER better
24
+ * - unprovableCount : domains with no touch list — LOWER better (safe-default seq)
25
+ * Ranking: invalid candidates are disqualified; among valid ones, rank by
26
+ * (parallelGroups desc, waveDepth asc, unprovableCount asc, domainCount asc).
27
+ *
28
+ * --kind generic → records a SUBJECTIVE judge's verdict. The numeric scoring
29
+ * lives in the rubric the Workflow's judge agent fills in (blind+shuffled,
30
+ * different-model, rubric-scored — see the contract). This CLI only
31
+ * validates/normalizes the rubric scores the agent supplies and picks the
32
+ * winner deterministically (highest weighted score; ties → lowest index of
33
+ * the ORIGINAL, pre-shuffle order to keep selection reproducible). It does
34
+ * NOT call an LLM — keeping inference out of the deterministic substrate
35
+ * (per feedback_deterministic_orchestration + anthropic-key-measurement-only).
36
+ *
37
+ * Input: a JSON spec on stdin OR via --in <path>. Shapes:
38
+ *
39
+ * partition: {
40
+ * "kind": "partition",
41
+ * "candidates": [
42
+ * { "id": "A", "domains": [ { "name": "d1", "touches": ["a.js","b.js"] }, ... ] },
43
+ * ...
44
+ * ]
45
+ * }
46
+ *
47
+ * generic: {
48
+ * "kind": "generic",
49
+ * "axes": [ { "key": "coherence", "weight": 1 }, { "key": "completeness", "weight": 1 }, ... ],
50
+ * "candidates": [
51
+ * { "id": "A", "scores": { "coherence": 4, "completeness": 3, ... } },
52
+ * ...
53
+ * ]
54
+ * }
55
+ *
56
+ * Output (JSON envelope, the shape runCli parses):
57
+ * {
58
+ * ok: boolean, // true only when a winner was selected (false for exit codes 4 and 64)
59
+ * exitCode: 0 | 4 | 64,
60
+ * kind, n,
61
+ * winner: <candidateId|null>,
62
+ * ranked: [ { id, valid?, parallelGroups?, waveDepth?, unprovableCount?, score?, rank } ],
63
+ * reason?: string
64
+ * }
65
+ *
66
+ * Exit codes: 0 winner selected · 4 usable input but NO valid candidate (all disqualified) · 64 bad input.
67
+ *
68
+ * Hard rules (mirrors the disjointness prover's discipline):
69
+ * - Zero external runtime deps (Node built-ins only).
70
+ * - Never throws — always emits an envelope.
71
+ * - Pure / read-only — no project mutation. Deterministic given the same input.
72
+ */
73
+
74
+ const fs = require("node:fs");
75
+
76
+ // The objective partition judge reuses the production disjointness oracle so the
77
+ // judge's notion of "parallelizable" is byte-identical to the dispatcher's.
78
+ let proveDisjointness;
79
+ try {
80
+ ({ proveDisjointness } = require("./gsd-t-file-disjointness.cjs"));
81
+ } catch {
82
+ proveDisjointness = null;
83
+ }
84
+
85
+ // ─── Partition scoring (objective) ───────────────────────────────────────
86
+
87
+ /**
88
+ * Score one candidate partition by running its domains through the disjointness
89
+ * oracle. Each domain becomes a pseudo-task {id, domain, touches}; we never hit
90
+ * git history (every domain carries an explicit touch list or is counted
91
+ * unprovable), so scoring is pure and deterministic.
92
+ *
93
+ * @returns {{valid, domainCount, parallelGroups, sequentialGroups, unprovableCount, waveDepth}}
94
+ */
95
+ // Normalize a touch path to a stable file identity so two spellings of the SAME
96
+ // file (./bin/x.js vs bin/x.js, trailing slash, backslashes, redundant ./ or //)
97
+ // are detected as a conflict. Without this, an overlapping partition could be
98
+ // scored `valid` and WIN — then the real dispatcher would hit a write conflict.
99
+ // Note: case is preserved (most CI runs on case-sensitive Linux); collapsing case
100
+ // here would create false conflicts on case-sensitive repos. Path identity only.
101
+ function _normPath(p) {
102
+ if (typeof p !== "string") return "";
103
+ let s = p.trim().replace(/\\/g, "/"); // backslashes -> forward
104
+ s = s.replace(/\/+/g, "/"); // collapse repeated slashes
105
+ s = s.replace(/^\.\//, ""); // drop leading ./
106
+ while (s.includes("/./")) s = s.replace("/./", "/"); // drop interior /./
107
+ s = s.replace(/\/+$/, ""); // drop trailing slash
108
+ return s;
109
+ }
110
+
111
+ function scorePartition(candidate, projectDir) {
112
+ const domains = Array.isArray(candidate.domains) ? candidate.domains : [];
113
+ const tasks = domains.map((d, i) => ({
114
+ id: `${candidate.id}:${d.name || `d${i}`}`,
115
+ domain: d.name || `d${i}`,
116
+ // Only honor an explicit touch list — never let the oracle fall through to
117
+ // git history during scoring (would make the judge non-deterministic).
118
+ // Normalize + de-dupe so path-spelling variants are caught as real conflicts.
119
+ touches: Array.isArray(d.touches)
120
+ ? Array.from(new Set(d.touches.map(_normPath).filter(Boolean)))
121
+ : [],
122
+ }));
123
+
124
+ // Run the real oracle when available; otherwise fall back to a self-contained
125
+ // overlap check so the judge still works if the lib isn't co-located.
126
+ const res = proveDisjointness
127
+ ? proveDisjointness({ tasks, projectDir })
128
+ : _localDisjoint(tasks);
129
+
130
+ const parallelGroups = (res.parallel || []).length;
131
+ const sequentialGroups = (res.sequential || []).filter(
132
+ (g) => !(g.length === 1 && (res.unprovable || []).includes(g[0])),
133
+ ).length;
134
+ const unprovableCount = (res.unprovable || []).length;
135
+
136
+ // VALID = no two domains with declared touch lists write the same file. An
137
+ // overlap shows up as a sequential group of size ≥2 among provable tasks.
138
+ const overlapGroup = (res.sequential || []).some((g) => g.length >= 2);
139
+ const valid = !overlapGroup;
140
+
141
+ // waveDepth: 1 wave for the disjoint fan-out, +1 if there is any serial bottleneck
142
+ // (overlapping/unprovable domains that must run after). Fewer = better.
143
+ const serialBottlenecks = sequentialGroups + unprovableCount;
144
+ const waveDepth = (parallelGroups > 0 ? 1 : 0) + (serialBottlenecks > 0 ? 1 : 0) || 1;
145
+
146
+ return {
147
+ valid,
148
+ domainCount: domains.length,
149
+ parallelGroups,
150
+ sequentialGroups,
151
+ unprovableCount,
152
+ waveDepth,
153
+ };
154
+ }
155
+
156
+ // Self-contained overlap fallback (only used if the oracle lib is absent).
157
+ function _localDisjoint(tasks) {
158
+ const parallel = [];
159
+ const sequential = [];
160
+ const unprovable = [];
161
+ const provable = [];
162
+ for (const t of tasks) {
163
+ if (!t.touches || t.touches.length === 0) {
164
+ unprovable.push(t);
165
+ sequential.push([t]);
166
+ } else {
167
+ provable.push(t);
168
+ }
169
+ }
170
+ // union-find over file overlap
171
+ const parent = provable.map((_, i) => i);
172
+ const find = (i) => {
173
+ while (parent[i] !== i) { parent[i] = parent[parent[i]]; i = parent[i]; }
174
+ return i;
175
+ };
176
+ for (let i = 0; i < provable.length; i++) {
177
+ for (let j = i + 1; j < provable.length; j++) {
178
+ const a = new Set(provable[i].touches);
179
+ if (provable[j].touches.some((f) => a.has(f))) {
180
+ const ra = find(i), rb = find(j);
181
+ if (ra !== rb) parent[ra] = rb;
182
+ }
183
+ }
184
+ }
185
+ const groups = new Map();
186
+ for (let i = 0; i < provable.length; i++) {
187
+ const r = find(i);
188
+ if (!groups.has(r)) groups.set(r, []);
189
+ groups.get(r).push(provable[i]);
190
+ }
191
+ for (const g of groups.values()) (g.length === 1 ? parallel : sequential).push(g);
192
+ return { parallel, sequential, unprovable };
193
+ }
194
+
195
+ // Drop candidates that are not usable objects with a string id (Red Team MED-4:
196
+ // the 'never throws' guarantee is on the function, not just the CLI shell — an
197
+ // in-process caller passing [null] or {id:{}} must not crash, and a non-string id
198
+ // could never match `c.id === winnerId` in the workflow anyway).
199
+ function _safeCandidates(candidates) {
200
+ return (Array.isArray(candidates) ? candidates : []).filter(
201
+ (c) => c && typeof c === "object" && typeof c.id === "string" && c.id.length > 0,
202
+ );
203
+ }
204
+
205
+ function rankPartitions(rawCandidates, projectDir) {
206
+ const candidates = _safeCandidates(rawCandidates);
207
+ const scored = candidates.map((c) => ({ id: c.id, ...scorePartition(c, projectDir) }));
208
+ // Disqualify invalid (file-overlap) candidates from winning, but keep them in
209
+ // the ranking so the caller can see why they lost.
210
+ const valid = scored.filter((s) => s.valid);
211
+ const cmp = (a, b) =>
212
+ b.parallelGroups - a.parallelGroups || // more concurrency wins
213
+ a.waveDepth - b.waveDepth || // fewer serial gates wins
214
+ a.unprovableCount - b.unprovableCount || // fewer unknowns wins
215
+ a.domainCount - b.domainCount; // simpler (fewer domains) wins
216
+ valid.sort(cmp);
217
+ const invalid = scored.filter((s) => !s.valid);
218
+ const ordered = [...valid, ...invalid];
219
+ ordered.forEach((s, i) => { s.rank = i + 1; });
220
+ return { ranked: ordered, winner: valid.length ? valid[0].id : null };
221
+ }
222
+
223
+ // ─── Generic scoring (subjective rubric, deterministic selection) ────────
224
+
225
+ function rankGeneric(spec) {
226
+ const axes = Array.isArray(spec.axes) && spec.axes.length
227
+ ? spec.axes
228
+ : [{ key: "quality", weight: 1 }];
229
+ const candidates = _safeCandidates(spec.candidates);
230
+ const scored = candidates.map((c, idx) => {
231
+ const scores = c.scores || {};
232
+ let total = 0;
233
+ let weightSum = 0;
234
+ for (const ax of axes) {
235
+ const w = Number(ax.weight) || 0;
236
+ const v = Number(scores[ax.key]) || 0;
237
+ total += w * v;
238
+ weightSum += w;
239
+ }
240
+ const score = weightSum > 0 ? total / weightSum : 0;
241
+ return { id: c.id, score: Number(score.toFixed(4)), _idx: idx };
242
+ });
243
+ // Highest weighted score wins; ties broken by ORIGINAL index (reproducible,
244
+ // immune to candidate-order shuffling done for bias control upstream).
245
+ scored.sort((a, b) => b.score - a.score || a._idx - b._idx);
246
+ scored.forEach((s, i) => { s.rank = i + 1; delete s._idx; });
247
+ return { ranked: scored, winner: scored.length ? scored[0].id : null };
248
+ }
249
+
250
+ // ─── Driver ──────────────────────────────────────────────────────────────
251
+
252
+ function judge(spec, projectDir) {
253
+ const candidates = Array.isArray(spec && spec.candidates) ? spec.candidates : [];
254
+ if (!candidates.length) {
255
+ return { ok: false, exitCode: 64, kind: spec && spec.kind, n: 0, winner: null, ranked: [], reason: "no-candidates" };
256
+ }
257
+ const kind = spec.kind === "generic" ? "generic" : "partition";
258
+ const { ranked, winner } = kind === "partition"
259
+ ? rankPartitions(candidates, projectDir)
260
+ : rankGeneric(spec);
261
+ const ok = winner != null;
262
+ return {
263
+ ok,
264
+ exitCode: ok ? 0 : 4,
265
+ kind,
266
+ n: candidates.length,
267
+ winner,
268
+ ranked,
269
+ ...(ok ? {} : { reason: kind === "partition" ? "no-valid-candidate" : "no-candidates" }),
270
+ };
271
+ }
272
+
273
+ function readInput(opts) {
274
+ if (opts.in) return fs.readFileSync(opts.in, "utf8");
275
+ // stdin
276
+ try {
277
+ return fs.readFileSync(0, "utf8");
278
+ } catch {
279
+ return "";
280
+ }
281
+ }
282
+
283
+ function parseArgs(argv) {
284
+ const opts = { json: true, in: null, projectDir: process.cwd(), help: false };
285
+ for (let i = 0; i < argv.length; i++) {
286
+ const a = argv[i];
287
+ if (a === "--help" || a === "-h") opts.help = true;
288
+ else if (a === "--in") opts.in = argv[++i];
289
+ else if (a === "--project-dir") opts.projectDir = argv[++i];
290
+ else if (a === "--json") opts.json = true;
291
+ }
292
+ return opts;
293
+ }
294
+
295
+ const HELP = `Usage: gsd-t competition-judge [--in PATH] [--project-dir PATH]
296
+
297
+ Reads a candidate-set JSON spec (stdin or --in) and emits a ranked winner.
298
+
299
+ --in PATH Read spec from file instead of stdin.
300
+ --project-dir PATH Project root (default: cwd).
301
+ --json Emit JSON envelope (default; always on).
302
+
303
+ Spec.kind:
304
+ "partition" Objective oracle judge — scores domain decompositions via the
305
+ file-disjointness prover (parallelGroups / waveDepth / validity).
306
+ "generic" Deterministic rubric selector — picks the highest weighted score
307
+ from rubric values an upstream judge agent supplied.
308
+
309
+ Exit codes: 0 winner · 4 no valid candidate · 64 bad input.`;
310
+
311
+ function main() {
312
+ const opts = parseArgs(process.argv.slice(2));
313
+ if (opts.help) {
314
+ process.stdout.write(HELP + "\n");
315
+ process.exit(0);
316
+ }
317
+ let spec;
318
+ try {
319
+ const raw = readInput(opts);
320
+ spec = JSON.parse(raw);
321
+ } catch (e) {
322
+ const env = { ok: false, exitCode: 64, kind: null, n: 0, winner: null, ranked: [], reason: `bad-input: ${e && e.message}` };
323
+ process.stdout.write(JSON.stringify(env, null, 2) + "\n");
324
+ process.exit(64);
325
+ }
326
+ let result;
327
+ try {
328
+ result = judge(spec, opts.projectDir);
329
+ } catch (e) {
330
+ result = { ok: false, exitCode: 64, kind: spec && spec.kind, n: 0, winner: null, ranked: [], reason: `judge-error: ${e && e.message}` };
331
+ }
332
+ process.stdout.write(JSON.stringify(result, null, 2) + "\n");
333
+ process.exit(result.exitCode);
334
+ }
335
+
336
+ if (require.main === module) main();
337
+
338
+ module.exports = {
339
+ judge,
340
+ scorePartition,
341
+ rankPartitions,
342
+ rankGeneric,
343
+ _internal: { _localDisjoint, _normPath },
344
+ };
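The partition ranking in the file above reduces to a lexicographic comparison over already-computed metrics. The standalone sketch below shows just that step on hand-written score objects. `rankScored` is a hypothetical name, and the sample metrics are invented for illustration rather than measured.

```javascript
// Sketch: rankScored is an illustrative name. The score fields mirror the
// metrics the judge reports (valid, parallelGroups, waveDepth, ...).
function rankScored(scored) {
  const byMerit = (a, b) =>
    b.parallelGroups - a.parallelGroups ||   // more concurrency wins
    a.waveDepth - b.waveDepth ||             // fewer serial gates wins
    a.unprovableCount - b.unprovableCount || // fewer unknowns wins
    a.domainCount - b.domainCount;           // fewer domains wins
  const valid = scored.filter((s) => s.valid).sort(byMerit);
  const invalid = scored.filter((s) => !s.valid); // kept for visibility, never wins
  const ranked = [...valid, ...invalid].map((s, i) => ({ ...s, rank: i + 1 }));
  const winner = valid.length ? valid[0].id : null;
  return { ranked, winner, exitCode: winner === null ? 4 : 0 };
}

// Invented sample data: B has the most parallel groups but overlaps files.
const sample = [
  { id: "A", valid: true, parallelGroups: 3, waveDepth: 1, unprovableCount: 0, domainCount: 3 },
  { id: "B", valid: false, parallelGroups: 4, waveDepth: 1, unprovableCount: 0, domainCount: 4 },
  { id: "C", valid: true, parallelGroups: 3, waveDepth: 2, unprovableCount: 1, domainCount: 4 },
];
const outcome = rankScored(sample);
console.log(outcome.winner, outcome.ranked.map((r) => r.id));
```

Validity acts as a hard gate before any comparison, so an overlapping candidate cannot win even when its raw parallelism is highest.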
package/bin/gsd-t.js CHANGED
@@ -1182,6 +1182,8 @@ const GLOBAL_BIN_TOOLS = [
1182
1182
  // M57 — CI-parity verify-gate checks (structural build-coverage + containment-safe ci-parity).
1183
1183
  "gsd-t-build-coverage.cjs",
1184
1184
  "gsd-t-ci-parity.cjs",
1185
+ // M82 — Competition Mode generate-and-judge selection oracle.
1186
+ "gsd-t-competition-judge.cjs",
1185
1187
  ];
1186
1188
 
1187
1189
  function installGlobalBinTools() {
@@ -2469,6 +2471,10 @@ const PROJECT_BIN_TOOLS = [
2469
2471
  "cli-preflight.cjs", "parallel-cli.cjs", "parallel-cli-tee.cjs",
2470
2472
  "gsd-t-context-brief.cjs",
2471
2473
  "gsd-t-verify-gate.cjs", "gsd-t-verify-gate-judge.cjs",
2474
+ // M82 — Competition Mode judge + its disjointness oracle dependency, so a
2475
+ // project's gsd-t-phase workflow can score candidate partitions via the
2476
+ // project-local bin (runCli prefers bin/<tool>.cjs over the global binary).
2477
+ "gsd-t-competition-judge.cjs", "gsd-t-file-disjointness.cjs",
2472
2478
  ];
2473
2479
 
2474
2480
  // Files that older versions of this installer copied into project bin/ but
@@ -4546,6 +4552,16 @@ if (require.main === module) {
4546
4552
  });
4547
4553
  process.exit(res.status == null ? 1 : res.status);
4548
4554
  }
4555
+ case "competition-judge": {
4556
+ // M82 D1 — `gsd-t competition-judge` thin dispatcher to the generate-and-judge
4557
+ // selection oracle (objective partition judge + deterministic rubric selector).
4558
+ const { spawnSync } = require("child_process");
4559
+ const js = path.join(__dirname, "gsd-t-competition-judge.cjs");
4560
+ const res = spawnSync(process.execPath, [js, ...args.slice(1)], {
4561
+ stdio: "inherit",
4562
+ });
4563
+ process.exit(res.status == null ? 1 : res.status);
4564
+ }
4549
4565
  case "metrics":
4550
4566
  doMetrics(args.slice(1));
4551
4567
  break;
@@ -25,14 +25,21 @@ Capture the design reference from `$ARGUMENTS` (Figma URL / image path). If Figm
25
25
  args: {
26
26
  phase: "design-decompose",
27
27
  projectDir: ".",
28
- userInput: "$ARGUMENTS"
28
+ userInput: "$ARGUMENTS",
29
+ // M82 Competition Mode (opt-in): `--competition N` (N 2..5) fans out N
30
+ // parallel decompositions; a blind, different-model, rubric judge (fidelity /
31
+ // completeness / reuse / simplicity) selects the winner. Useful when a design
32
+ // is ambiguous or the component boundaries aren't obvious.
33
+ competition: 1
29
34
  }
30
35
  }
31
36
  ```
32
37
 
38
+ **Competition Mode (`--competition N`).** When a design is ambiguous or the element/widget/page boundaries aren't obvious, `/gsd-t-design-decompose --competition 3` fans out N candidate decompositions and a blind, different-model rubric judge picks the best. Parse N (clamped 2..5). See `.gsd-t/contracts/competition-mode-contract.md`. Default off.
39
+
33
40
  ## Step 3: Interpret the result
34
41
 
35
- The Workflow returns `{ status, artifacts, summary, decisions }`.
42
+ The Workflow returns `{ status, artifacts, summary, decisions }` (plus `competition: { n, winner, ranked }` when Competition Mode ran).
36
43
 
37
44
  - `status === "complete"`: the element → widget → page contract tree is written under `.gsd-t/contracts/design/`.
38
45
  - `status === "partial" | "blocked"`: the agent needs the design source (e.g. Figma auth) or a stack-capability decision. Surface it.
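A caller that wants to surface the optional `competition` field could summarize it roughly as follows. This is a sketch only: `summarizeResult` is a hypothetical helper, and the assumption that `winner` may be `null` is carried over from the judge's envelope rather than stated for the Workflow result.

```javascript
// Sketch: summarizeResult is hypothetical. The result shape follows the
// text: { status, artifacts, summary, decisions, competition? }.
function summarizeResult(result) {
  const lines = [`status: ${result.status}`];
  const c = result.competition; // absent when Competition Mode did not run
  if (c) {
    // Assumption: winner may be null when no candidate was selected.
    lines.push(`competition: n=${c.n}, winner=${c.winner === null ? "none" : c.winner}`);
    for (const r of c.ranked || []) lines.push(`  #${r.rank} ${r.id}`);
  }
  return lines;
}

console.log(summarizeResult({
  status: "complete",
  competition: { n: 3, winner: "B", ranked: [{ id: "B", rank: 1 }, { id: "A", rank: 2 }] },
}).join("\n"));
```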
@@ -479,6 +479,14 @@ Use these when user asks for help on a specific command:
479
479
  - **Use when**: Test data hygiene. Catches the GSD-T-Board class (2442 orphaned `E2E_TEST_*` / `E2E_DRAG_*` ideas left in the production data store after a passing Verify run).
480
480
  - **CLI**: `gsd-t test-data --list [--run <id>] [--json]` / `gsd-t test-data --purge --run <id> [--dry-run] [--json] [--project <dir>]`. Exit 0 on success, 4 on adapter errors, 64 on usage error.
481
481
 
482
+ ### competition-judge (M82)
483
+ - **Summary**: The selection oracle for Competition Mode (generate-and-judge — the *generative* dual of the orthogonal validation triad). Two modes: `--kind partition` scores candidate domain decompositions via the file-disjointness oracle (parallelGroups / waveDepth / validity — a calculator, not an LLM critic, so it's immune to judge bias); `--kind generic` is a deterministic rubric selector that finalizes a winner from rubric scores an upstream blind/different-model judge supplied.
484
+ - **Auto-invoked**: Yes — by `gsd-t-phase.workflow.js` when an eligible phase (partition / milestone / design-decompose) is run with `competition: N` (N 2–5). Opt-in per phase via `/gsd-t-partition --competition N` etc. Default off.
485
+ - **Files**: `bin/gsd-t-competition-judge.cjs` (reuses `bin/gsd-t-file-disjointness.cjs`).
486
+ - **Use when**: Upstream, pre-contract, wide-solution-space decisions where the cost of a single draft is high (partition, milestone decomposition, ambiguous design decomposition). Never on post-contract phases (execute/verify/etc.) — those are owned by the adversarial triad.
487
+ - **CLI**: `gsd-t competition-judge [--in <spec.json>] [--project-dir <dir>]` (spec via stdin or `--in`). Exit 0 winner · 4 no valid candidate · 64 bad input.
488
+ - **Contract**: `.gsd-t/contracts/competition-mode-contract.md` v1.0.0 STABLE.
489
+
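The deterministic rubric selection described for the generic kind amounts to a weighted mean plus a stable tiebreak on original position. A minimal standalone sketch follows. `pickByRubric` is a hypothetical name, and the axes and scores are invented for illustration.

```javascript
// Sketch: pickByRubric is an illustrative name. Scoring follows the
// description: weighted mean per candidate, ties broken by original index.
function pickByRubric(axes, candidates) {
  const scored = candidates.map((c, idx) => {
    let total = 0;
    let weightSum = 0;
    for (const ax of axes) {
      const w = Number(ax.weight) || 0;                // non-numeric weight counts as 0
      const v = Number((c.scores || {})[ax.key]) || 0; // missing score counts as 0
      total += w * v;
      weightSum += w;
    }
    const score = weightSum > 0 ? Number((total / weightSum).toFixed(4)) : 0;
    return { id: c.id, score, idx };
  });
  // Highest score first; equal scores keep their original (pre-shuffle) order.
  scored.sort((a, b) => b.score - a.score || a.idx - b.idx);
  return scored;
}

// Invented rubric: coherence counts double.
const axes = [{ key: "coherence", weight: 2 }, { key: "completeness", weight: 1 }];
const candidates = [
  { id: "A", scores: { coherence: 4, completeness: 2 } }, // 10 / 3
  { id: "B", scores: { coherence: 3, completeness: 4 } }, // 10 / 3, ties with A
  { id: "C", scores: { coherence: 5, completeness: 1 } }, // 11 / 3
];
console.log(pickByRubric(axes, candidates).map((s) => `${s.id}:${s.score}`));
```

Breaking ties on the original index rather than the shuffled presentation order means re-running the selection with a different shuffle cannot change the winner.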
482
490
  ## Unknown Command
483
491
 
484
492
  If user asks for help on unrecognized command:
@@ -25,14 +25,21 @@ Read `.gsd-t/progress.md` (current version + completed milestones), `docs/requir
25
25
  args: {
26
26
  phase: "milestone",
27
27
  projectDir: ".",
28
- userInput: "$ARGUMENTS"
28
+ userInput: "$ARGUMENTS",
29
+ // M82 Competition Mode (opt-in): `--competition N` (N 2..5) fans out N
30
+ // parallel Self-MoA producers proposing different decomposition strategies
31
+ // (risk-first / value-first / dependency-first); a blind, different-model,
32
+ // rubric judge selects the winner. Coupled-thesis → pick-one (no Frankenstein).
33
+ competition: 1
29
34
  }
30
35
  }
31
36
  ```
32
37
 
38
+ **Competition Mode (`--competition N`).** Milestone decomposition is the highest-altitude decision in the system — different strategies are genuinely different. If the user invokes `/gsd-t-milestone --competition 3`, parse N (clamped 2..5) and pass `competition: N`. Because a milestone decomposition is a *coupled thesis*, the judge selects one winner whole (pick-one) and only salvages non-overlapping good line-items from the losers — it never Frankensteins. See `.gsd-t/contracts/competition-mode-contract.md`. Default off.
39
+
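The pick-one-then-salvage behaviour described above can be approximated at the line-item level as: keep the winner's items untouched, then append only those loser items that do not duplicate them. In this sketch `salvageLineItems` and the default `keyOf` are hypothetical. Judging overlap by trimmed, lowercased text is an assumption, and the step where the judge decides which loser items are "good" is not modeled.

```javascript
// Sketch: salvageLineItems is a hypothetical helper. "Overlap" is
// approximated as equality of trimmed, lowercased text (an assumption).
function salvageLineItems(winnerItems, loserItems, keyOf = (s) => s.trim().toLowerCase()) {
  const seen = new Set(winnerItems.map(keyOf));
  const merged = [...winnerItems]; // the winning thesis is kept whole
  for (const item of loserItems) {
    const k = keyOf(item);
    if (!seen.has(k)) {
      seen.add(k);
      merged.push(item); // only genuinely new line items are grafted on
    }
  }
  return merged;
}

console.log(salvageLineItems(["Add auth", "Ship CLI"], ["ship cli ", "Write docs"]));
```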
33
40
  ## Step 3: Interpret the result
34
41
 
35
- The Workflow returns `{ status, artifacts, summary, decisions }`.
42
+ The Workflow returns `{ status, artifacts, summary, decisions }` (plus `competition: { n, winner, ranked }` when Competition Mode ran).
36
43
 
37
44
  - `status === "complete"`: milestone defined and appended to progress.md with falsifiable SCs. Do NOT auto-partition for large/risky milestones — show the Next Up hint.
38
45
  - `status === "blocked"`: the agent needs a scoping decision from the user.
@@ -30,14 +30,21 @@ Call the `Workflow` tool with:
30
30
  phase: "partition",
31
31
  milestone: "M{NN}",
32
32
  projectDir: ".",
33
- userInput: "$ARGUMENTS"
33
+ userInput: "$ARGUMENTS",
34
+ // M82 Competition Mode (opt-in): if the user passed `--competition N` in
35
+ // $ARGUMENTS (N in 2..5), set competition: N. N parallel Self-MoA producers
36
+ // propose partitions; the OBJECTIVE oracle judge (file-disjointness scoring)
37
+ // picks the most-parallelizable valid decomposition. Omit / set 1 = off.
38
+ competition: 1
34
39
  }
35
40
  }
36
41
  ```
37
42
 
43
+ **Competition Mode (`--competition N`).** Partition is the v1 beachhead for generate-and-judge: its judge is the file-disjointness oracle, so it is a calculator, not a biased critic. If the user invokes `/gsd-t-partition --competition 3`, parse N (clamped 2..5) and pass `competition: N`. The workflow fans out N candidate partitions, scores each on measured parallelism / wave-depth / boundary-cleanliness, and finalizes the winner. See `.gsd-t/contracts/competition-mode-contract.md`. Default off (single producer).
44
+
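Parsing and clamping the flag could look like the sketch below. `parseCompetition` is a hypothetical helper, not part of the package. Accepting the `--competition=N` spelling and treating 0 or 1 as "off" are assumptions, since the text only states the 2..5 clamp and that omitting the flag or passing 1 disables the mode.

```javascript
// Sketch: parseCompetition is hypothetical. Returns 1 (off) when the flag
// is absent or below 2, otherwise N capped at 5.
function parseCompetition(argString) {
  const m = /(?:^|\s)--competition(?:=|\s+)(\d+)(?=\s|$)/.exec(String(argString || ""));
  if (!m) return 1;         // flag absent or malformed -> single producer
  const n = Number.parseInt(m[1], 10);
  if (n < 2) return 1;      // 0 or 1 -> off (assumption)
  return Math.min(n, 5);    // cap at the documented upper bound
}

console.log(parseCompetition("M82 --competition 3")); // 3
console.log(parseCompetition("--competition 9"));     // 5
console.log(parseCompetition("M82"));                 // 1
```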
38
45
  ## Step 3: Interpret the result
39
46
 
40
- The Workflow returns `{ status, artifacts, summary, decisions }`.
47
+ The Workflow returns `{ status, artifacts, summary, decisions }` (plus `competition: { n, winner, ranked }` when Competition Mode ran).
41
48
 
42
49
  - `status === "complete"`: domains scoped, contracts drafted. Auto-advance to `/gsd-t-plan`.
43
50
  - `status === "partial" | "blocked"`: read `summary` for what's missing (e.g. ambiguous scope needing discussion).
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@tekyzinc/gsd-t",
3
- "version": "4.0.29",
3
+ "version": "4.1.10",
4
4
  "description": "GSD-T: Contract-Driven Development for Claude Code — 54 slash commands with headless-by-default workflow spawning, unattended supervisor relay with event stream, graph-powered code analysis, real-time agent dashboard, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
5
5
  "author": "Tekyz, Inc.",
6
6
  "license": "MIT",
@@ -328,7 +328,7 @@ Canonical scripts:
328
328
  - `gsd-t-integrate.workflow.js` — cross-domain wire-up + light verify-gate
329
329
  - `gsd-t-debug.workflow.js` — 2-cycle diagnose/fix/verify (CLAUDE.md Prime Rule)
330
330
  - `gsd-t-quick.workflow.js` — preflight + brief + single-task + verify-gate (M56-D4)
331
- - `gsd-t-phase.workflow.js` — generic upper-stage runner (partition / plan / discuss / impact / milestone / prd / design-decompose / doc-ripple)
331
+ - `gsd-t-phase.workflow.js` — generic upper-stage runner (partition / plan / discuss / impact / milestone / prd / design-decompose / doc-ripple). **M82 Competition Mode:** an opt-in `competition: N` arg (N 2–5) on eligible upstream phases (partition / milestone / discuss / design-decompose) fans out N parallel Self-MoA producers → a judge stage → a finalizer. Partition's judge is the OBJECTIVE file-disjointness oracle (`gsd-t competition-judge --kind partition` — a calculator, not an LLM critic, immune to judge bias, the v1 beachhead); subjective phases use a blind + shuffled + different-model + rubric judge whose pick is finalized deterministically by `--kind generic`. The generative dual of the orthogonal validation triad; watershed rule = generate-and-judge ABOVE the contract, attack-and-filter BELOW. Default off. Contract: `competition-mode-contract.md` v1.0.0.
332
332
  - `gsd-t-scan.workflow.js` — preflight → volume-probe → pipeline(per-slice deep finder → single verify) → synthesis → document → render (M66: fans out by codebase VOLUME, not a fixed 5-teammate dimension count; M67: deep document phase deterministically produces the full living-doc set + dimension files, per-doc fan-out)
333
333
 
334
334
  **Runtime-native invariant (M81 — v4.0.29+):** the Workflow sandbox provides ONLY `agent/parallel/pipeline/log/phase/budget/args` — NO `require`/`fs`/`path`/`child_process`/`process`, and `args` arrives as a JSON STRING. Each workflow is self-contained: it `JSON.parse`s `args` and delegates every CLI call (preflight, verify-gate, brief, build-coverage, ci-parity, test-data, disjointness) to inline `async` helpers that run the command via an `agent()`'s Bash (preferring project-local `bin/<tool>.cjs`, else the global `gsd-t` PATH binary) and parse the JSON envelope — preserving the M55-D5 project-local-bin invariant. The old `require("./_lib.js")` pattern threw `ReferenceError` on first eval and silently broke every workflow except scan (TD-113, fixed M81); `_lib.js` is retired as a workflow dependency.
@@ -15,7 +15,23 @@
15
15
  // milestone?: "M61",
16
16
  // projectDir?: ".",
17
17
  // userInput?: string, // arbitrary input to the phase (e.g. "$ARGUMENTS")
18
+ // competition?: number, // M82: N>1 enables Competition Mode (generate-and-judge)
19
+ // // on eligible upstream phases. N parallel Self-MoA
20
+ // // producers -> judge stage -> winner. Default 1 (off).
18
21
  // }
22
+ //
23
+ // M82 Competition Mode (generate-and-judge — the GENERATIVE dual of the
24
+ // orthogonal validation triad). Contract: competition-mode-contract.md v1.0.0.
25
+ // - Eligible phases: partition, milestone, discuss, design-decompose (pre-contract,
26
+ // wide-solution-space). INELIGIBLE: plan/impact/prd/doc-ripple (narrow / one
27
+ // right answer) — competition there is wasted, so a competition arg is ignored.
28
+ // - Producers: N samples of ONE strong model (Self-MoA beats a model zoo), varied
29
+ // by an explicit per-candidate "angle" so they explore different regions.
30
+ // - Judge: partition uses the OBJECTIVE oracle (gsd-t competition-judge --kind
31
+ // partition, scoring via the disjointness prover — a calculator, not a critic,
32
+ // immune to LLM-judge bias). Other phases use a blind+shuffled+rubric judge whose
33
+ // numeric selection is finalized deterministically by competition-judge --kind
34
+ // generic.
19
35
 
20
36
  export const meta = {
21
37
  name: "gsd-t-phase",
@@ -34,6 +50,8 @@ const _CLI_ENVELOPE_SCHEMA = {
34
50
  type: "object", required: ["ok", "exitCode"], additionalProperties: true,
35
51
  properties: { ok: { type: "boolean" }, exitCode: { type: "integer" }, envelope: {}, stdout: { type: "string" }, stderr: { type: "string" }, via: { type: "string" } },
36
52
  };
53
+ // Single-quote a value for safe shell interpolation (Red Team MED-5).
54
+ function _shq(s) { return `'${String(s).replace(/'/g, "'\\''")}'`; }
37
55
  async function runCli(projectDir, subcmd, argv, localBin, label, parseJson = true, phaseNameOpt) {
38
56
  const argStr = (argv || []).map((a) => `'${String(a).replace(/'/g, "'\\''")}'`).join(" ");
39
57
  const prompt = [
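Annotation: the `_shq` helper added in this hunk uses the usual POSIX idiom for single-quoting: close the quote, emit an escaped quote, reopen the quote. The sketch below reuses the same expression under the illustrative name `shq` and shows what it produces for a few inputs.

```javascript
// Same expression as `_shq` in the hunk above, renamed for this standalone sketch.
function shq(s) {
  return `'${String(s).replace(/'/g, "'\\''")}'`;
}

console.log(shq("plain"));       // 'plain'
console.log(shq("it's here"));   // 'it'\''s here'
console.log(shq("a; rm -rf /")); // 'a; rm -rf /'
console.log(shq(42));            // '42'
```

Inside single quotes a POSIX shell treats every character literally, so the only character that needs handling is the single quote itself.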
@@ -57,6 +75,71 @@ async function generateBrief(projectDir, { kind = "execute", milestone, domain,
57
75
  return { ok: r.ok, briefPath: `${projectDir}/.gsd-t/briefs/${id}.json`, via: r.via };
58
76
  }
59
77
 
78
+ // M82: run the deterministic selection oracle over a candidate-set spec. The spec
79
+ // is written to a file via the agent's Bash (no fs in this sandbox), then judged by
80
+ // `gsd-t competition-judge --in <file>`. The agent MUST copy the judge's rich output
81
+ // (winner/ranked) up to the TOP LEVEL of its reply — a permissive free-form
82
+ // `envelope:{}` schema let a haiku agent silently drop winner/ranked (caught in the
83
+ // M82 real-sandbox proof: via=local ok=true but winner=undefined). Explicit required
84
+ // fields fix that. Returns { ok, winner, ranked }.
85
+ const _JUDGE_ENVELOPE_SCHEMA = {
86
+ type: "object", required: ["ok", "winner"], additionalProperties: true,
87
+ properties: {
88
+ ok: { type: "boolean" },
89
+ exitCode: { type: "integer" },
90
+ winner: { type: ["string", "null"] },
91
+ ranked: { type: "array", items: { type: "object", additionalProperties: true } },
92
+ via: { type: "string" },
93
+ },
94
+ };
95
+ async function runCompetitionJudge(projectDir, spec, label = "judge", phaseNameOpt) {
96
+ // De-fang backticks so a producer-supplied domain name / path containing ``` can't
97
+ // break out of the markdown fence in the prompt (Red Team MED-5). The judge only
98
+ // reads structural fields (id, domains.name, touches[]); a sanitized name is fine.
99
+ const specJson = JSON.stringify(spec).replace(/`/g, "'");
100
+ const qDir = _shq(projectDir);
101
+ const specPath = `${projectDir}/.gsd-t/briefs/_competition-spec.json`;
102
+ const qSpec = _shq(specPath);
103
+ const prompt = [
104
+ `Run the GSD-T Competition Mode judge for the project at \`${projectDir}\` and report its FULL output. Steps:`,
105
+ `1. Write this EXACT JSON (one line) to \`${specPath}\` (overwrite; create .gsd-t/briefs/ if needed):`,
106
+ "~~~json",
107
+ specJson,
108
+ "~~~",
109
+ `2. If \`${projectDir}/bin/gsd-t-competition-judge.cjs\` exists, run: \`node ${qDir}/bin/gsd-t-competition-judge.cjs --in ${qSpec} --project-dir ${qDir}\` (set via="local"). Otherwise run: \`gsd-t competition-judge --in ${qSpec} --project-dir ${qDir}\` (set via="global"). cwd \`${projectDir}\`.`,
110
+ `3. The command prints a JSON object to stdout with fields: ok, exitCode, winner, ranked, n.`,
111
+ `4. COPY those fields (ok, exitCode, winner, ranked) up to the TOP LEVEL of your reply, plus via. Do NOT nest them under "envelope". If the command failed, set winner=null.`,
112
+ `Do NOT do any other work.`,
113
+ ].join("\n");
114
+ const opts = { label, schema: _JUDGE_ENVELOPE_SCHEMA, model: "haiku" };
115
+ if (phaseNameOpt) opts.phase = phaseNameOpt;
116
+ const r = await agent(prompt, opts).catch((e) => ({ ok: false, winner: null, ranked: [], via: "error", err: String(e && e.message) }));
117
+ // Prefer top-level fields; fall back to a nested envelope if the agent nested anyway.
118
+ const env = (r && r.winner !== undefined) ? r : (r && r.envelope) || {};
119
+ return { ok: !!env.ok, winner: env.winner != null ? env.winner : null, ranked: env.ranked || [] };
120
+ }
121
+
122
+ // Phases where competition pays off (wide solution space, pre-contract, high blast
123
+ // radius). A competition arg on any other phase is ignored (single producer runs).
124
+ const COMPETITION_ELIGIBLE = new Set(["partition", "milestone", "discuss", "design-decompose"]);
125
+
126
+ // Rubric axes for the SUBJECTIVE judge (non-partition eligible phases). Partition
127
+ // uses the objective oracle instead and ignores these.
128
+ const RUBRIC_AXES_BY_PHASE = {
129
+ milestone: [
130
+ { key: "coherence", weight: 2 }, { key: "completeness", weight: 1 },
131
+ { key: "riskCoverage", weight: 1 }, { key: "simplicity", weight: 1 },
132
+ ],
133
+ discuss: [
134
+ { key: "soundness", weight: 2 }, { key: "completeness", weight: 1 },
135
+ { key: "tradeoffClarity", weight: 1 }, { key: "simplicity", weight: 1 },
136
+ ],
137
+ "design-decompose": [
138
+ { key: "fidelity", weight: 2 }, { key: "completeness", weight: 1 },
139
+ { key: "reuse", weight: 1 }, { key: "simplicity", weight: 1 },
140
+ ],
141
+ };
142
+
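Annotation: the rubric axes above carry a `weight` per axis, and the judge's scores are later finalized by `competition-judge --kind generic`. That CLI's scoring formula is not part of this diff. The sketch below ASSUMES a plain weighted sum with ties broken by ascending candidate id, only to show how weighted axes could yield a reproducible pick. `weightedTotal` and `pickWinner` are hypothetical names.

```javascript
// Axes copied from the `milestone` entry of RUBRIC_AXES_BY_PHASE.
const milestoneAxes = [
  { key: "coherence", weight: 2 }, { key: "completeness", weight: 1 },
  { key: "riskCoverage", weight: 1 }, { key: "simplicity", weight: 1 },
];

// ASSUMPTION: a weighted sum; the real CLI may score differently.
function weightedTotal(scores, axes) {
  return axes.reduce((sum, a) => sum + a.weight * (Number(scores[a.key]) || 0), 0);
}

// ASSUMPTION: highest total wins; ties go to the lexicographically smaller id.
function pickWinner(candidates, axes) {
  const ranked = candidates
    .map((c) => ({ id: c.id, total: weightedTotal(c.scores || {}, axes) }))
    .sort((x, y) => y.total - x.total || (x.id < y.id ? -1 : x.id > y.id ? 1 : 0));
  return { winner: ranked.length ? ranked[0].id : null, ranked };
}

const demo = pickWinner([
  { id: "A", scores: { coherence: 4, completeness: 3, riskCoverage: 3, simplicity: 5 } },
  { id: "B", scores: { coherence: 5, completeness: 4, riskCoverage: 3, simplicity: 3 } },
], milestoneAxes);
console.log(demo.winner); // B (weighted totals: B = 20, A = 19)
```

Because the selection step is pure arithmetic over the reported numbers, rerunning it on the same scores always returns the same winner, which appears to be the point of finalizing the rubric judge's pick deterministically.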
60
143
  const VALID_PHASES = [
61
144
  "partition", "plan", "discuss", "impact",
62
145
  "milestone", "prd", "design-decompose", "doc-ripple",
@@ -79,6 +162,15 @@ const milestone = _args.milestone || null;
79
162
  const userInput = _args.userInput || "";
80
163
  const phaseName = _args.phase;
81
164
 
165
+ // M82: clamp competition N to [1,5]. Evidence (Self-MoA, Large Language Monkeys):
166
+ // gains plateau fast; N=3 captures the elbow, >5 is wasteful. N<=1 = off (single producer).
167
+ const _rawN = Number(_args.competition) || 1;
168
+ const competitionN = Math.max(1, Math.min(5, Math.floor(_rawN)));
169
+ const competitionOn = competitionN > 1 && COMPETITION_ELIGIBLE.has(phaseName);
170
+ if (competitionN > 1 && !competitionOn) {
171
+ log(`competition: N=${competitionN} ignored — phase "${phaseName}" is not competition-eligible (single producer runs). Eligible: ${[...COMPETITION_ELIGIBLE].join(", ")}.`);
172
+ }
173
+
82
174
  if (!phaseName || !VALID_PHASES.includes(phaseName)) {
83
175
  log(`phase: args.phase must be one of: ${VALID_PHASES.join(", ")}`);
84
176
  return { status: "failed", reason: "invalid-phase" };
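Annotation: the clamp and eligibility gate added in this hunk can be exercised in isolation. The sketch below wraps the same three expressions in a hypothetical function, `resolveCompetition`, to show how various inputs resolve. The function name and the return shape are illustrative; the arithmetic is copied from the hunk.

```javascript
const COMPETITION_ELIGIBLE = new Set(["partition", "milestone", "discuss", "design-decompose"]);

// Hypothetical wrapper around the clamp shown in the diff.
function resolveCompetition(rawCompetition, phaseName) {
  const rawN = Number(rawCompetition) || 1;              // NaN, 0, undefined -> 1
  const n = Math.max(1, Math.min(5, Math.floor(rawN)));  // clamp to [1, 5]
  return { n, on: n > 1 && COMPETITION_ELIGIBLE.has(phaseName) };
}

console.log(resolveCompetition(3, "partition"));         // { n: 3, on: true }
console.log(resolveCompetition(9, "discuss"));           // { n: 5, on: true }
console.log(resolveCompetition(3, "plan"));              // { n: 3, on: false }
console.log(resolveCompetition(undefined, "partition")); // { n: 1, on: false }
```

Negative and fractional values also land inside the range: `-4` resolves to `1` (off) and `2.9` floors to `2`.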
@@ -101,23 +193,245 @@ const promptByPhase = {
101
193
  "doc-ripple": `Identify and update all docs affected by recent code changes per the Document Ripple Completion Gate. No code edits.`,
102
194
  };
103
195
 
104
- const result = await agent(
105
- [
106
- `You are the ${phaseName} phase agent.`,
107
- milestone ? `Milestone: ${milestone}` : "",
108
- `**Brief (REQUIRED):** ${brief.briefPath || "(no brief — re-walk repo)"}`,
109
- userInput ? `\nUser input:\n${userInput}` : "",
110
- ``,
111
- `Objective: ${promptByPhase[phaseName]}`,
112
- ``,
113
- `Follow the CLAUDE.md Pre-Commit Gate. Commit artifacts with prefix "m61(${phaseName})" or similar.`,
114
- `Return JSON per the schema.`,
115
- ].filter(Boolean).join("\n"),
116
- { label: phaseName, phase: "Phase", schema: PHASE_RESULT_SCHEMA, model: "opus" }
117
- ).catch((e) => ({
118
- status: "failed",
119
- artifacts: [],
120
- summary: `agent error: ${e && e.message}`,
121
- }));
196
+ const baseObjective = promptByPhase[phaseName];
197
+ const briefLine = `**Brief (REQUIRED):** ${brief.briefPath || "(no brief — re-walk repo)"}`;
198
+
199
+ let result;
200
+ if (!competitionOn) {
201
+ // ── Single-producer path (default, unchanged behavior) ──
202
+ result = await agent(
203
+ [
204
+ `You are the ${phaseName} phase agent.`,
205
+ milestone ? `Milestone: ${milestone}` : "",
206
+ briefLine,
207
+ userInput ? `\nUser input:\n${userInput}` : "",
208
+ ``,
209
+ `Objective: ${baseObjective}`,
210
+ ``,
211
+ `Follow the CLAUDE.md Pre-Commit Gate. Commit artifacts with prefix "${(milestone || "m").toLowerCase()}(${phaseName})".`,
212
+ `Return JSON per the schema.`,
213
+ ].filter(Boolean).join("\n"),
214
+ { label: phaseName, phase: "Phase", schema: PHASE_RESULT_SCHEMA, model: "opus" }
215
+ ).catch((e) => ({ status: "failed", artifacts: [], summary: `agent error: ${e && e.message}` }));
216
+ } else {
217
+ // ── M82 Competition Mode: generate -> judge -> finalize ──
218
+ // Distinct "angles" so the N Self-MoA producers explore different regions of
219
+ // the solution space (diversity by prompt, not by model — Self-MoA > Mixed-MoA).
220
+ const ANGLES = [
221
+ "Optimize for MAXIMUM parallelism: carve the most file-disjoint domains that can run concurrently.",
222
+ "Optimize for SIMPLICITY: the fewest domains with the cleanest, most obvious boundaries.",
223
+ "Optimize for RISK ISOLATION: isolate the riskiest/most-coupled work into its own domain so the rest stays safe.",
224
+ "Optimize for DEPENDENCY DEPTH: minimize serial gates (waves) between domains.",
225
+ "Optimize for BALANCE: roughly equal-sized domains with minimal cross-talk.",
226
+ ];
227
+
228
+ const PRODUCER_SCHEMA = phaseName === "partition"
229
+ ? {
230
+ type: "object", required: ["id", "domains"], additionalProperties: true,
231
+ properties: {
232
+ id: { type: "string" },
233
+ rationale: { type: "string" },
234
+ domains: {
235
+ type: "array", items: {
236
+ type: "object", required: ["name", "touches"], additionalProperties: true,
237
+ properties: {
238
+ name: { type: "string" },
239
+ touches: { type: "array", items: { type: "string" } },
240
+ summary: { type: "string" },
241
+ },
242
+ },
243
+ },
244
+ },
245
+ }
246
+ : {
247
+ type: "object", required: ["id", "proposal"], additionalProperties: true,
248
+ properties: { id: { type: "string" }, proposal: { type: "string" }, rationale: { type: "string" } },
249
+ };
250
+
251
+ phase("Compete");
252
+ log(`competition: ${competitionN} producers (Self-MoA, model=opus) for ${phaseName}`);
253
+ const ids = ["A", "B", "C", "D", "E"];
254
+ const candidates = (await parallel(
255
+ Array.from({ length: competitionN }, (_, i) => () =>
256
+ agent(
257
+ [
258
+ `You are candidate ${ids[i]} — one of ${competitionN} INDEPENDENT ${phaseName} proposals competing on quality.`,
259
+ milestone ? `Milestone: ${milestone}` : "",
260
+ briefLine,
261
+ userInput ? `\nUser input:\n${userInput}` : "",
262
+ ``,
263
+ `Objective: ${baseObjective}`,
264
+ `Your distinct angle: ${ANGLES[i % ANGLES.length]}`,
265
+ ``,
266
+ `DO NOT write or commit any files. PROPOSE ONLY — return your proposal as JSON per the schema.`,
267
+ phaseName === "partition"
268
+ ? `For "touches", list the concrete repo file paths each domain will WRITE (its owned files). Be specific and realistic — the judge scores file-disjointness from these.`
269
+ : `Put the full proposal text in "proposal".`,
270
+ `Set "id" to "${ids[i]}".`,
271
+ ].filter(Boolean).join("\n"),
272
+ { label: `candidate:${ids[i]}`, phase: "Compete", schema: PRODUCER_SCHEMA, model: "opus" }
273
+ ).then((c) => ({ ...c, id: c.id || ids[i] })).catch(() => null)
274
+ )
275
+ )).filter(Boolean);
276
+
277
+ if (candidates.length === 0) {
278
+ return { status: "failed", artifacts: [], summary: "competition: all producers failed" };
279
+ }
280
+
281
+ phase("Judge");
282
+ let winnerId = null;
283
+ let ranked = [];
284
+ if (phaseName === "partition") {
285
+ // OBJECTIVE oracle judge — calculator, not critic.
286
+ const env = await runCompetitionJudge(projectDir, { kind: "partition", candidates }, "judge:oracle", "Judge");
287
+ winnerId = env.winner; ranked = env.ranked || [];
288
+ } else {
289
+ // SUBJECTIVE judge: a different-model (sonnet) rubric scorer. Candidates are
290
+ // blind (author identity stripped) AND shuffled (deterministic permutation) so
291
+ // judge position no longer correlates with producer index/angle — Red Team
292
+ // HIGH-3: the shuffle was claimed in a comment but never implemented.
293
+ const axes = RUBRIC_AXES_BY_PHASE[phaseName] || [{ key: "quality", weight: 1 }];
294
+ // Deterministic permutation (Math.random is sandbox-banned): rotate by a seed
295
+ // derived from the milestone+phase string so order is stable per run but
296
+ // decoupled from producer index. The CLI tiebreak keys off the candidate's own
297
+ // id (carried through), so final selection stays reproducible regardless.
298
+ const seedStr = `${milestone || "m"}:${phaseName}`;
299
+ let seed = 0;
300
+ for (let k = 0; k < seedStr.length; k++) seed = (seed * 31 + seedStr.charCodeAt(k)) >>> 0;
301
+ const rot = candidates.length ? (seed % candidates.length) : 0;
302
+ const shuffled = candidates.map((_, i) => candidates[(i + rot) % candidates.length]);
303
+ const labeled = shuffled.map((c, i) => ({ id: c.id, label: ids[i], text: c.proposal || c.rationale || "" }));
304
+ const rubric = await agent(
305
+ [
306
+ `You are a BLIND, IMPARTIAL judge scoring ${labeled.length} competing ${phaseName} proposals.`,
307
+ `Score each on a 1-5 scale per axis: ${axes.map((a) => a.key).join(", ")}. Higher = better.`,
308
+ `Judge ONLY the content. The labels are arbitrary and the order is randomized — do NOT prefer earlier ones. Be calibrated and critical.`,
309
+ ``,
310
+ ...labeled.map((c) => `### Candidate ${c.label}\n${c.text}`),
311
+ ``,
312
+ `Return JSON: { "scores": [ { "id": "<candidate label A/B/C...>", "<axis>": <1-5>, ... }, ... ] }`,
313
+ `IMPORTANT: use the CANDIDATE LABEL (A, B, C…) shown above as the "id" in your scores.`,
314
+ ].join("\n"),
315
+ {
316
+ label: "judge:rubric", phase: "Judge", model: "sonnet",
317
+ schema: {
318
+ type: "object", required: ["scores"], additionalProperties: true,
319
+ properties: { scores: { type: "array", items: { type: "object", additionalProperties: true } } },
320
+ },
321
+ }
322
+ ).catch(() => ({ scores: [] }));
323
+ // Map the judge's label-keyed scores back to the REAL candidate ids before
324
+ // deterministic selection (so the winner id matches an actual candidate).
325
+ const labelToId = new Map(labeled.map((c) => [c.label, c.id]));
326
+ const judgeCandidates = (rubric.scores || []).map((s) => {
327
+ const { id, ...rest } = s; return { id: labelToId.get(id) || id, scores: rest };
328
+ });
329
+ const env = await runCompetitionJudge(projectDir, { kind: "generic", axes, candidates: judgeCandidates }, "judge:select", "Judge");
330
+ winnerId = env.winner; ranked = env.ranked || [];
331
+ }
332
+
333
+ // Red Team HIGH-1: NEVER fall back to an arbitrary candidate. For partition the
334
+ // judge returns winner=null only when EVERY candidate is file-overlapping
335
+ // (invalid) — committing candidates[0] would ship an invalid partition the
336
+ // dispatcher then mis-fans-out (contract Invariant 2). Hard-fail instead.
337
+ let winner = candidates.find((c) => c.id === winnerId);
338
+ if (!winner) {
339
+ if (phaseName === "partition") {
340
+ log(`competition: no VALID partition among ${candidates.length} candidates — failing the phase (Invariant 2: invalid never selected).`);
341
+ return {
342
+ status: "failed", artifacts: [],
343
+ summary: `competition: no valid (file-disjoint) partition among ${candidates.length} candidates`,
344
+ competition: { n: candidates.length, winner: null, ranked },
345
+ };
346
+ }
347
+ // Subjective phases: fall back to the judge's rank-1, else the first candidate.
348
+ const rank1 = (ranked[0] && candidates.find((c) => c.id === ranked[0].id)) || candidates[0];
349
+ winner = rank1;
350
+ log(`competition: judge returned no winner; falling back to rank-1 (${winner.id}).`);
351
+ }
352
+ log(`competition: winner = ${winner.id} (of ${candidates.map((c) => c.id).join(", ")})`);
353
+
354
+ // FINALIZE: one agent commits the WINNING approach (pick-one at the thesis level),
355
+ // then enriches it with non-overlapping good line-items from the losers (safe union
356
+ // at the separable layer — "winner + salvage orphaned good ideas"; never grafts a
357
+ // coupled thesis). Per the two-gate rule in competition-mode-contract.md.
358
+ phase("Finalize");
359
+ const winnerBlob = phaseName === "partition" ? JSON.stringify(winner.domains) : (winner.proposal || winner.rationale || "");
360
+ const losersBlob = candidates.filter((c) => c.id !== winner.id)
361
+ .map((c) => phaseName === "partition" ? JSON.stringify(c.domains) : (c.proposal || c.rationale || ""))
362
+ .join("\n---\n");
363
+ // For partition, the finalizer must report the EXACT domains+touches it committed
364
+ // so we can RE-VALIDATE the graft (Red Team HIGH-2 / contract Invariant 4: a
365
+ // salvaged "missed file" could silently reintroduce a write-target overlap).
366
+ const FINALIZE_SCHEMA = phaseName === "partition"
367
+ ? {
368
+ // finalizedDomains REQUIRED for partition (Red Team recheck LOW-1): if it's
369
+ // optional, a finalizer that omits it silently bypasses re-validation.
370
+ type: "object", required: ["status", "artifacts", "finalizedDomains"], additionalProperties: false,
371
+ properties: {
372
+ status: { type: "string", enum: ["complete", "partial", "blocked", "failed"] },
373
+ artifacts: { type: "array", items: { type: "string" } },
374
+ summary: { type: "string" },
375
+ decisions: { type: "array", items: { type: "string" } },
376
+ finalizedDomains: {
377
+ type: "array", items: {
378
+ type: "object", required: ["name", "touches"], additionalProperties: true,
379
+ properties: { name: { type: "string" }, touches: { type: "array", items: { type: "string" } } },
380
+ },
381
+ },
382
+ },
383
+ }
384
+ : PHASE_RESULT_SCHEMA;
385
+
386
+ result = await agent(
387
+ [
388
+ `You are the ${phaseName} finalizer. A competition selected a WINNING proposal; implement it for real.`,
389
+ milestone ? `Milestone: ${milestone}` : "",
390
+ briefLine,
391
+ ``,
392
+ `Objective: ${baseObjective}`,
393
+ ``,
394
+ `WINNING proposal (implement this whole — it is a coherent thesis, do NOT Frankenstein it):`,
395
+ winnerBlob,
396
+ ``,
397
+ `Other proposals (for SALVAGE ONLY — fold in any non-overlapping, clearly-good line-items, e.g. an extra risk, a missed file, a better domain name — that do NOT conflict with the winning structure. NEVER assign a file to a domain that another domain already owns. If in doubt, leave them out):`,
398
+ losersBlob || "(none)",
399
+ ``,
400
+ `Now WRITE the real artifacts and follow the CLAUDE.md Pre-Commit Gate. Commit with prefix "${(milestone || "m").toLowerCase()}(${phaseName})".`,
401
+ phaseName === "partition"
402
+ ? `Return JSON per the schema, INCLUDING "finalizedDomains" — the exact {name, touches[]} of every domain you committed (touches = the repo files each domain OWNS/WRITES). This is re-validated for file-disjointness.`
403
+ : `Return JSON per the schema.`,
404
+ `Include the competition outcome in "decisions" (e.g. "competition: winner ${winner.id} of ${candidates.length}").`,
405
+ ].filter(Boolean).join("\n"),
406
+ { label: `${phaseName}:finalize`, phase: "Finalize", schema: FINALIZE_SCHEMA, model: "opus" }
407
+ ).catch((e) => ({ status: "failed", artifacts: [], summary: `finalizer error: ${e && e.message}` }));
408
+
409
+ // Re-validate the FINALIZED partition (Invariant 4). If salvage reintroduced an
410
+ // overlap, the finalized graft is invalid → block completion with a clear reason.
411
+ if (phaseName === "partition" && result && result.status !== "failed") {
412
+ const finalized = Array.isArray(result.finalizedDomains) ? result.finalizedDomains : null;
413
+ if (!finalized || !finalized.length) {
414
+ // No finalizedDomains to re-check → can't prove disjointness → block rather
415
+ // than silently accept (Red Team recheck LOW-1: never fail-open on the gate).
416
+ log(`competition: finalizer returned no finalizedDomains — cannot re-validate disjointness, blocking.`);
417
+ result.status = "blocked";
418
+ result.summary = `finalizer did not report finalizedDomains; partition disjointness unverifiable. ${result.summary || ""}`.trim();
419
+ } else {
420
+ const reval = await runCompetitionJudge(
421
+ projectDir,
422
+ { kind: "partition", candidates: [{ id: "finalized", domains: finalized }] },
423
+ "judge:revalidate", "Finalize"
424
+ );
425
+ if (reval.winner !== "finalized") {
426
+ log(`competition: FINALIZED partition failed re-validation (salvage reintroduced a file overlap) — blocking (Invariant 4).`);
427
+ result.status = "blocked";
428
+ result.summary = `finalized partition is NOT file-disjoint (salvage overlap); re-run finalize dropping the conflicting file. ${result.summary || ""}`.trim();
429
+ }
430
+ }
431
+ }
432
+
433
+ // Thread the competition telemetry up so the caller can report measured SC#1.
434
+ result.competition = { n: candidates.length, winner: winner.id, ranked };
435
+ }
122
436
 
123
437
  return result;
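Annotation: the subjective-judge path above reorders candidates with a seeded rotation, relabels them by position, and then maps the judge's label-keyed scores back to the real candidate ids. The sketch below isolates those three steps. The function names `rotationSeed` and `rotate` and the sample data are illustrative; the hash, the rotation, and the remapping follow the expressions in the diff.

```javascript
// 31-multiplier string hash, kept in unsigned 32-bit range as in the diff.
function rotationSeed(seedStr) {
  let seed = 0;
  for (let k = 0; k < seedStr.length; k++) seed = (seed * 31 + seedStr.charCodeAt(k)) >>> 0;
  return seed;
}

// Rotate the list by (seed mod length); an empty list stays empty.
function rotate(list, seedStr) {
  const rot = list.length ? rotationSeed(seedStr) % list.length : 0;
  return list.map((_, i) => list[(i + rot) % list.length]);
}

const positionLabels = ["A", "B", "C", "D", "E"];
const sampleCandidates = [
  { id: "A", proposal: "first" }, { id: "B", proposal: "second" }, { id: "C", proposal: "third" },
];

// Relabel by position so the judge never sees the producer's own id.
const relabeled = rotate(sampleCandidates, "M82:discuss")
  .map((c, i) => ({ id: c.id, label: positionLabels[i], text: c.proposal }));

// Pretend judge output, keyed by the position label it was shown.
const sampleScores = [
  { id: "A", soundness: 4 }, { id: "B", soundness: 2 }, { id: "C", soundness: 5 },
];

// Map label-keyed scores back to real candidate ids before selection.
const labelToRealId = new Map(relabeled.map((c) => [c.label, c.id]));
const remapped = sampleScores.map((s) => {
  const { id, ...rest } = s;
  return { id: labelToRealId.get(id) || id, scores: rest };
});
console.log(remapped.map((r) => r.id));
```

Two properties follow from the arithmetic: the same milestone and phase always give the same order, and a rotation keeps the candidates' cyclic order. The offset can also come out as 0 for some seeds, in which case the judge sees the producers in their original order.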