@tekyzinc/gsd-t 4.2.10 → 4.3.10

package/CHANGELOG.md CHANGED
@@ -2,6 +2,23 @@
2
2
 
3
3
  All notable changes to GSD-T are documented here. Updated with each release.
4
4
 
5
+ ## [4.3.10] - 2026-06-05 (M84 Auto-Competition - minor)
6
+
7
+ ### Changed - Competition Mode is now AUTOMATIC (was opt-in)
8
+
9
+ M82 shipped Competition Mode as opt-in (`--competition N`). M84 makes the workflow decide for itself, per the user directive: *"I want the workflow to determine when it's optimal to create a competition."* The economic case (user's): a better artifact produced upstream makes every downstream phase — pre-mortem, execute, verify — cheaper and more likely to pass first time, so the expected downstream savings usually exceed the ~3× upstream cost. Opt-in just means forgetting to use the thing that lowers total cost.
10
+
11
+ - **Solution-space probe** runs at the start of each eligible phase (partition / milestone / discuss / design-decompose), after brief, before producing. It decides: ≥2 genuinely different viable approaches → compete (3 producers + judge); one obvious answer → single draft.
12
+ - **The probe runs on OPUS, not haiku.** Deciding "are there multiple good approaches?" is high-level reasoning, not a mechanical check — and it gates the whole 3× competition, so a weak probe would forfeit the feature. (User caught this: *"Is Haiku smart enough to make this a judgment?"* — no, it isn't; the probe is opus.)
13
+ - **Biased toward competing**: when uncertain, compete (the asymmetry favors generating options). Probe failure → compete (fail-toward-options).
14
+ - **Partition**: an opus probe makes the pre-produce compete/skip call; the objective file-disjointness oracle still judges the produced candidates (decision = heuristic + bias; selection = objective).
15
+ - **Producer angles are now phase-aware** (`ANGLES_BY_PHASE`) — a discuss/milestone/design producer no longer gets a partition-framed "carve file-disjoint domains" directive (Red Team MEDIUM fix; this latent M82 defect now mattered because competition is the default path).
16
+ - **Overrides** (rarely needed): `competition: N` (2–5) forces N; `competition: 0` / `noCompetition: true` forces off; unset = auto. An unparseable override logs a warning and falls back to auto.
17
+ - `meta.phases` now declares all 7 stages (Preflight / Probe / Compete / Judge / Phase / Finalize / Plan Hardening) — also fixes the M83 cosmetic gap where Plan Hardening wasn't pre-declared.
18
+ - **Verification**: real-sandbox proof — the opus probe ran through the Workflow sandbox and discriminated correctly (wide collaborative-editor scenario → compete, 3 approaches named; narrow copyright-bump → single draft). Adversarial Red Team (Opus, fresh context) GRUDGING-PASS — no CRITICAL/HIGH; state-wiring, overrides, eligibility, probe-failure, cost-bound, runtime-native, and plan-hardening interaction all verified clean. Fixed the 1 MEDIUM (phase-aware angles) + 3 LOWs. Suite 1372/0/4. Minor bump 4.2.10 → 4.3.10.
19
+ - Contract `competition-mode-contract.md` → v2.0.0 (trigger moved opt-in → automatic; judge/selection/invariants unchanged).
20
+ - Origin: NiceNote review — the user observed that competing on the M7 plan would have produced a better plan from the start (fewer pre-mortem blocks, less downstream cost), so competition should be automatic, not a flag to remember.
21
+
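The override rules in the list above can be sketched as a small classifier. This is a minimal illustration, not the package's code: the function name `classifyCompetitionOverride` and its return shape are assumptions, and only the `competition` and `noCompetition` argument names come from the changelog.

```javascript
// Sketch only: mirrors the documented override rules; names are hypothetical.
function classifyCompetitionOverride(args = {}) {
  const raw = args.competition;
  const hasArg = raw !== undefined && raw !== null;
  const n = Number(raw);

  // `noCompetition: true`, or a numeric value of 1 or less (including 0), forces it off.
  if (args.noCompetition === true || (hasArg && n <= 1)) {
    return { mode: "off", n: 1, warning: null };
  }
  // 2 or more forces a competition, clamped into the documented 2-5 range.
  if (hasArg && n >= 2) {
    return { mode: "forced", n: Math.max(2, Math.min(5, Math.floor(n))), warning: null };
  }
  // Everything else is auto; a present-but-unparseable value also earns a warning.
  const warning = hasArg && Number.isNaN(n)
    ? `competition override ${JSON.stringify(raw)} is not a number; using auto`
    : null;
  return { mode: "auto", n: null, warning };
}
```

Under these assumptions, `classifyCompetitionOverride({ competition: "off" })` lands on the auto path with a warning rather than silently disabling competition, which is the behavior the last override bullet describes.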
5
22
  ## [4.2.10] - 2026-06-05 (M83 Left-Shifted Plan Hardening - minor)
6
23
 
7
24
  ### Added - Plan-phase hardening: catch dead deliverables and edge cases BEFORE execute
package/README.md CHANGED
@@ -128,7 +128,7 @@ gsd-t traceability-gate --milestone Mxx [--project-dir P] # M83: plan-phase acce
128
128
 
129
129
  **Plan Hardening (M83).** The `plan` phase now runs two blocking gates before execute, so a plan can't ship a dead deliverable: a deterministic **acceptance-traceability gate** (`gsd-t traceability-gate` — every AC must bind to a code path + a killing test; the headline capability needs both impl and test) and an adversarial **pre-mortem** agent (opus, fresh-context, predicts edge-case/NFR/dead-deliverable failures and requires a test for each). The temporal dual of the Red Team — attack the design at plan, not just the code at verify. Origin: a build where the headline capability shipped as dead code and burned 4 verify cycles. See `.gsd-t/contracts/plan-hardening-contract.md`.
130
130
 
131
- **Competition Mode (M82).** Opt-in `--competition N` (N 2–5) on upstream, pre-contract phases (`/gsd-t-partition`, `/gsd-t-milestone`, `/gsd-t-design-decompose`) fans out N parallel candidate producers and a judge selects the winner — the generative dual of the orthogonal validation triad. Partition uses an *objective* file-disjointness oracle as the judge (a calculator, not a biased critic); subjective phases use a blind + different-model + rubric judge. Default off. See `.gsd-t/contracts/competition-mode-contract.md`.
131
+ **Competition Mode (M82 · automatic since M84).** On upstream, pre-contract phases (`/gsd-t-partition`, `/gsd-t-milestone`, `/gsd-t-discuss`, `/gsd-t-design-decompose`) the workflow **automatically decides** whether to compete: an Opus solution-space probe runs at phase start and, if it finds ≥2 genuinely different viable approaches, fans out 3 parallel candidate producers + a judge to pick the winner — the generative dual of the orthogonal validation triad. No flag needed (the probe is biased toward competing, since a better upstream artifact lowers total downstream cost). Partition's judge is an *objective* file-disjointness oracle; subjective phases use a blind + different-model + rubric judge. Override with `--no-competition` or `--competition N` only on explicit request. See `.gsd-t/contracts/competition-mode-contract.md`.
132
132
 
133
133
  `gsd-t parallel` consumes the M44 task-graph (D1) and applies three pre-spawn gates (D4 depgraph validation → D5 file-disjointness → D6 economics) followed by mode-aware headroom/split math. Extends — does not replace — the M40 orchestrator. Contract: `.gsd-t/contracts/wave-join-contract.md` v1.1.0.
134
134
 
@@ -25,12 +25,12 @@ Capture the design reference from `$ARGUMENTS` (Figma URL / image path). If Figm
25
25
  args: {
26
26
  phase: "design-decompose",
27
27
  projectDir: ".",
28
- userInput: "$ARGUMENTS",
29
- // M82 Competition Mode (opt-in): `--competition N` (N 2..5) fans out N
30
- // parallel decompositions; a blind, different-model, rubric judge (fidelity /
31
- // completeness / reuse / simplicity) selects the winner. Useful when a design
32
- // is ambiguous or the component boundaries aren't obvious.
33
- competition: 1
28
+ userInput: "$ARGUMENTS"
29
+ // M84 Competition Mode is AUTOMATIC: do NOT pass `competition` by default.
30
+ // The workflow probes (opus) and self-decides; it competes when a design is
31
+ // ambiguous or the element/widget/page boundaries aren't obvious (a blind,
32
+ // different-model rubric judge picks the winner). Override only on explicit
33
+ // request: `--no-competition` → 0, `--competition N` (2-5) → N.
34
34
  }
35
35
  }
36
36
  ```
@@ -481,7 +481,7 @@ Use these when user asks for help on a specific command:
481
481
 
482
482
  ### competition-judge (M82)
483
483
  - **Summary**: The selection oracle for Competition Mode (generate-and-judge — the *generative* dual of the orthogonal validation triad). Two modes: `--kind partition` scores candidate domain decompositions via the file-disjointness oracle (parallelGroups / waveDepth / validity — a calculator, not an LLM critic, so it's immune to judge bias); `--kind generic` is a deterministic rubric selector that finalizes a winner from rubric scores an upstream blind/different-model judge supplied.
484
- - **Auto-invoked**: Yes — by `gsd-t-phase.workflow.js` when an eligible phase (partition / milestone / design-decompose) is run with `competition: N` (N 2–5). Opt-in per phase via `/gsd-t-partition --competition N` etc. Default off.
484
+ - **Auto-invoked**: Yes — AUTOMATICALLY (M84). On an eligible phase (partition / milestone / discuss / design-decompose), `gsd-t-phase.workflow.js` runs an Opus solution-space probe at phase start and self-decides whether to fan out 3 producers + this judge (biased toward competing). No flag needed; override with `--competition N` (force N) or `--no-competition` (force off).
485
485
  - **Files**: `bin/gsd-t-competition-judge.cjs` (reuses `bin/gsd-t-file-disjointness.cjs`).
486
486
  - **Use when**: Upstream, pre-contract, wide-solution-space decisions where the cost of a single draft is high (partition, milestone decomposition, ambiguous design decomposition). Never on post-contract phases (execute/verify/etc.) — those are owned by the adversarial triad.
487
487
  - **CLI**: `gsd-t competition-judge [--in <spec.json>] [--project-dir <dir>]` (spec via stdin or `--in`). Exit 0 winner · 4 no valid candidate · 64 bad input.
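A caller that shells out to this CLI would need to branch on the three documented exit codes. The helper below is a hedged sketch of that mapping; `interpretJudgeExit` and the status strings are hypothetical names, and only the numeric codes (0, 4, 64) come from the text above.

```javascript
// Sketch only: maps the documented exit codes to a status object.
// The function name and status labels are illustrative assumptions.
function interpretJudgeExit(code) {
  switch (code) {
    case 0:
      return { ok: true, status: "winner" };
    case 4:
      return { ok: false, status: "no-valid-candidate" };
    case 64:
      return { ok: false, status: "bad-input" };
    default:
      // Any other code is not documented, so treat it as an unknown failure.
      return { ok: false, status: "unknown", code };
  }
}
```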
@@ -25,17 +25,17 @@ Read `.gsd-t/progress.md` (current version + completed milestones), `docs/requir
25
25
  args: {
26
26
  phase: "milestone",
27
27
  projectDir: ".",
28
- userInput: "$ARGUMENTS",
29
- // M82 Competition Mode (opt-in): `--competition N` (N 2..5) fans out N
30
- // parallel Self-MoA producers proposing different decomposition strategies
31
- // (risk-first / value-first / dependency-first); a blind, different-model,
32
- // rubric judge selects the winner. Coupled-thesis pick-one (no Frankenstein).
33
- competition: 1
28
+ userInput: "$ARGUMENTS"
29
+ // M84 Competition Mode is AUTOMATIC: do NOT pass `competition` by default.
30
+ // The workflow probes (opus) and self-decides; milestone decomposition is the
31
+ // highest-altitude decision, so it competes whenever ≥2 genuinely different
32
+ // strategies (risk-first / value-first / dependency-first) exist. Override only
33
+ // on explicit request: `--no-competition` → 0, `--competition N` (2-5) → N.
34
34
  }
35
35
  }
36
36
  ```
37
37
 
38
- **Competition Mode (`--competition N`).** Milestone decomposition is the highest-altitude decision in the system; different strategies are genuinely different. If the user invokes `/gsd-t-milestone --competition 3`, parse N (clamped 2..5) and pass `competition: N`. Because a milestone decomposition is a *coupled thesis*, the judge selects one winner whole (pick-one) and only salvages non-overlapping good line-items from the losers — it never Frankensteins. See `.gsd-t/contracts/competition-mode-contract.md`. Default off.
38
+ **Competition Mode (automatic).** Milestone decomposition auto-competes when the probe finds ≥2 genuinely different strategies. Because a decomposition is a *coupled thesis*, the judge selects one winner whole (pick-one) and salvages only non-overlapping good line-items from the losers — it never Frankensteins. No flag needed; override with `--no-competition` / `--competition N` on explicit request. See `.gsd-t/contracts/competition-mode-contract.md`.
39
39
 
40
40
  ## Step 3: Interpret the result
41
41
 
@@ -30,17 +30,17 @@ Call the `Workflow` tool with:
30
30
  phase: "partition",
31
31
  milestone: "M{NN}",
32
32
  projectDir: ".",
33
- userInput: "$ARGUMENTS",
34
- // M82 Competition Mode (opt-in): if the user passed `--competition N` in
35
- // $ARGUMENTS (N in 2..5), set competition: N. N parallel Self-MoA producers
36
- // propose partitions; the OBJECTIVE oracle judge (file-disjointness scoring)
37
- // picks the most-parallelizable valid decomposition. Omit / set 1 = off.
38
- competition: 1
33
+ userInput: "$ARGUMENTS"
34
+ // M84 Competition Mode is AUTOMATIC: do NOT pass `competition` by default.
35
+ // The workflow runs a solution-space probe and self-decides whether to fan out
36
+ // N candidate partitions (judged by the file-disjointness oracle). Only pass an
37
+ // override if the user explicitly asked: `--competition 0`/`--no-competition`
38
+ // → competition: 0; `--competition N` (2-5) → competition: N.
39
39
  }
40
40
  }
41
41
  ```
42
42
 
43
- **Competition Mode (`--competition N`).** Partition is the v1 beachhead for generate-and-judge: its judge is the file-disjointness oracle, so it is a calculator, not a biased critic. If the user invokes `/gsd-t-partition --competition 3`, parse N (clamped 2..5) and pass `competition: N`. The workflow fans out N candidate partitions, scores each on measured parallelism / wave-depth / boundary-cleanliness, and finalizes the winner. See `.gsd-t/contracts/competition-mode-contract.md`. Default off (single producer).
43
+ **Competition Mode (automatic).** Partition auto-competes when the workflow's probe finds ≥2 genuinely different ways to carve the domains; the objective file-disjointness oracle judges the candidates and picks the most-parallelizable valid one. No flag needed. Override only on explicit request: `/gsd-t-partition --no-competition` (force single draft) or `--competition N` (force N). See `.gsd-t/contracts/competition-mode-contract.md`.
44
44
 
45
45
  ## Step 3: Interpret the result
46
46
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@tekyzinc/gsd-t",
3
- "version": "4.2.10",
3
+ "version": "4.3.10",
4
4
  "description": "GSD-T: Contract-Driven Development for Claude Code — 54 slash commands with headless-by-default workflow spawning, unattended supervisor relay with event stream, graph-powered code analysis, real-time agent dashboard, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
5
5
  "author": "Tekyz, Inc.",
6
6
  "license": "MIT",
@@ -328,7 +328,7 @@ Canonical scripts:
328
328
  - `gsd-t-integrate.workflow.js` — cross-domain wire-up + light verify-gate
329
329
  - `gsd-t-debug.workflow.js` — 2-cycle diagnose/fix/verify (CLAUDE.md Prime Rule)
330
330
  - `gsd-t-quick.workflow.js` — preflight + brief + single-task + verify-gate (M56-D4)
331
- - `gsd-t-phase.workflow.js` — generic upper-stage runner (partition / plan / discuss / impact / milestone / prd / design-decompose / doc-ripple). **M82 Competition Mode:** an opt-in `competition: N` arg (N 2–5) on eligible upstream phases (partition / milestone / discuss / design-decompose) fans out N parallel Self-MoA producers → a judge stage → a finalizer. Partition's judge is the OBJECTIVE file-disjointness oracle (`gsd-t competition-judge --kind partition` — a calculator, not an LLM critic, immune to judge bias, the v1 beachhead); subjective phases use a blind + shuffled + different-model + rubric judge whose pick is finalized deterministically by `--kind generic`. The generative dual of the orthogonal validation triad; watershed rule = generate-and-judge ABOVE the contract, attack-and-filter BELOW. Default off. Contract: `competition-mode-contract.md` v1.0.0. **M83 Plan Hardening:** the `plan` phase runs two blocking gates before execute — a deterministic acceptance-traceability gate (`gsd-t traceability-gate`: every AC binds to a code path + a killing test; the `Headline:` task needs both impl and test) and an adversarial pre-mortem agent (opus, fresh-context, protocol `pre-mortem-subagent.md`: predicts edge-case/dead-deliverable/NFR failures, each → a required test). The temporal dual of the Red Team (attack the design at plan, not just code at verify). Contract: `plan-hardening-contract.md` v1.0.0.
331
+ - `gsd-t-phase.workflow.js` — generic upper-stage runner (partition / plan / discuss / impact / milestone / prd / design-decompose / doc-ripple). **M82/M84 Competition Mode (AUTOMATIC):** on eligible upstream phases (partition / milestone / discuss / design-decompose) an Opus solution-space probe runs at phase start and self-decides whether to compete (biased toward competing — a better upstream artifact lowers total downstream cost); when it fires, it fans out 3 parallel Self-MoA producers → a judge stage → a finalizer. No flag needed; override with `competition: N` / `competition: 0` / `noCompetition: true`. Partition's judge is the OBJECTIVE file-disjointness oracle (`gsd-t competition-judge --kind partition` — a calculator, not an LLM critic, immune to judge bias, the v1 beachhead); subjective phases use a blind + shuffled + different-model + rubric judge whose pick is finalized deterministically by `--kind generic`. The generative dual of the orthogonal validation triad; watershed rule = generate-and-judge ABOVE the contract, attack-and-filter BELOW. Contract: `competition-mode-contract.md` v2.0.0. **M83 Plan Hardening:** the `plan` phase runs two blocking gates before execute — a deterministic acceptance-traceability gate (`gsd-t traceability-gate`: every AC binds to a code path + a killing test; the `Headline:` task needs both impl and test) and an adversarial pre-mortem agent (opus, fresh-context, protocol `pre-mortem-subagent.md`: predicts edge-case/dead-deliverable/NFR failures, each → a required test). The temporal dual of the Red Team (attack the design at plan, not just code at verify). Contract: `plan-hardening-contract.md` v1.0.0.
332
332
  - `gsd-t-scan.workflow.js` — preflight → volume-probe → pipeline(per-slice deep finder → single verify) → synthesis → document → render (M66: fans out by codebase VOLUME, not a fixed 5-teammate dimension count; M67: deep document phase deterministically produces the full living-doc set + dimension files, per-doc fan-out)
333
333
 
334
334
  **Runtime-native invariant (M81 — v4.0.29+):** the Workflow sandbox provides ONLY `agent/parallel/pipeline/log/phase/budget/args` — NO `require`/`fs`/`path`/`child_process`/`process`, and `args` arrives as a JSON STRING. Each workflow is self-contained: it `JSON.parse`s `args` and delegates every CLI call (preflight, verify-gate, brief, build-coverage, ci-parity, test-data, disjointness) to inline `async` helpers that run the command via an `agent()`'s Bash (preferring project-local `bin/<tool>.cjs`, else the global `gsd-t` PATH binary) and parse the JSON envelope — preserving the M55-D5 project-local-bin invariant. The old `require("./_lib.js")` pattern threw `ReferenceError` on first eval and silently broke every workflow except scan (TD-113, fixed M81); `_lib.js` is retired as a workflow dependency.
@@ -37,8 +37,13 @@ export const meta = {
37
37
  name: "gsd-t-phase",
38
38
  description: "Generic upper-stage phase runner (partition/plan/discuss/etc.)",
39
39
  phases: [
40
- { title: "Preflight", detail: "preflight + brief" },
41
- { title: "Phase", detail: "primary agent with phase-specific protocol" },
40
+ { title: "Preflight", detail: "preflight + brief" },
41
+ { title: "Probe", detail: "M84 auto-competition solution-space probe (opus; eligible phases only)" },
42
+ { title: "Compete", detail: "M82/M84 N parallel producers (when competition fires)" },
43
+ { title: "Judge", detail: "select/synthesize the winning candidate" },
44
+ { title: "Phase", detail: "primary agent (or finalizer) with phase-specific protocol" },
45
+ { title: "Finalize", detail: "commit the winning approach (competition path)" },
46
+ { title: "Plan Hardening", detail: "M83 traceability gate + adversarial pre-mortem (plan phase only)" },
42
47
  ],
43
48
  };
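Not every declared stage is announced on every run. The sketch below shows one plausible reading of which stages appear on each path, based on the stage details above and the workflow comments (Probe only on the auto path, Phase only on the single-draft path, Plan Hardening only for `plan`). The helper `announcedStages` and its parameters are hypothetical and exist only for illustration.

```javascript
// The seven stage titles declared in meta.phases.
const DECLARED_STAGES = [
  "Preflight", "Probe", "Compete", "Judge", "Phase", "Finalize", "Plan Hardening",
];

// Sketch only (assumed reading of the comments, not the workflow's code):
//   probed   = the auto-probe ran (eligible phase, no override)
//   competes = the competition path was taken
function announcedStages({ phaseName, probed, competes }) {
  const stages = ["Preflight"];
  if (probed) stages.push("Probe");
  if (competes) stages.push("Compete", "Judge", "Finalize");
  else stages.push("Phase");
  if (phaseName === "plan") stages.push("Plan Hardening");
  return stages;
}
```

For example, a `plan` run (never competition-eligible) would announce Preflight, Phase, and Plan Hardening, while an auto-competing `milestone` run would announce Preflight, Probe, Compete, Judge, and Finalize.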
44
49
 
@@ -128,9 +133,77 @@ async function runCompetitionJudge(projectDir, spec, label = "judge", phaseNameO
128
133
  }
129
134
 
130
135
  // Phases where competition pays off (wide solution space, pre-contract, high blast
131
- // radius). A competition arg on any other phase is ignored (single producer runs).
136
+ // radius). Competition is AUTOMATIC on these (M84): the workflow probes the
137
+ // solution space and self-decides; on any other phase it never runs.
132
138
  const COMPETITION_ELIGIBLE = new Set(["partition", "milestone", "discuss", "design-decompose"]);
133
139
 
140
+ // M84: the solution-space probe. Decides AUTOMATICALLY whether a phase is
141
+ // competition-worthy (≥2 genuinely different viable approaches). This is a
142
+ // high-level reasoning step — NOT a mechanical check — so it runs on OPUS, not
143
+ // haiku (a weak probe forfeits the whole point: it gates a 3× competition whose
144
+ // upstream cost buys down far larger downstream cost). It is BIASED TOWARD
145
+ // COMPETING: when uncertain, compete — because a better artifact upstream makes
146
+ // every downstream phase (pre-mortem, execute, verify) cheaper and more likely to
147
+ // pass first time, so the expected savings usually exceed the 3× probe-and-produce
148
+ // cost. Returns { compete: bool, reason, approaches? }.
149
+ //
150
+ // Partition has its OWN probe (runPartitionProbe, also opus): the disjointness
151
+ // oracle can't decide before candidates exist, so an opus probe makes the
152
+ // compete/skip call and the oracle JUDGES the candidates afterward. This
153
+ // (runSolutionSpaceProbe) is for the other subjective phases.
154
+ const _PROBE_SCHEMA = {
155
+ type: "object", required: ["compete"], additionalProperties: true,
156
+ properties: {
157
+ compete: { type: "boolean" },
158
+ reason: { type: "string" },
159
+ approaches: { type: "array", items: { type: "string" } },
160
+ },
161
+ };
162
+ async function runSolutionSpaceProbe(projectDir, phaseName, { milestone, briefPath, userInput, phaseNameOpt } = {}) {
163
+ const prompt = [
164
+ `You are the Solution-Space Probe for the ${phaseName} phase${milestone ? ` of ${milestone}` : ""}. Decide ONE thing: should this phase generate MULTIPLE competing candidates (then a judge picks the best), or is a single draft sufficient?`,
165
+ `**Brief:** ${briefPath || "(none — read the relevant .gsd-t docs/contracts/requirements directly)"}`,
166
+ userInput ? `\nUser input:\n${userInput}\n` : "",
167
+ `Compete WHEN there are ≥2 genuinely DIFFERENT, viable approaches whose trade-offs matter — different architectures, decomposition strategies, data models, sequencing, or design directions that a reasonable expert could disagree about. List them in "approaches".`,
168
+ `Do NOT compete only when there is ONE obvious correct approach and any variation would be cosmetic.`,
169
+ `BIAS TOWARD COMPETING: if you are uncertain, or can name even two plausibly-different approaches, choose compete=true. A wasted competition costs ~3× this one phase; a missed-better-approach costs far more downstream (more pre-mortem blocks, more bugs, more verify cycles). Err on the side of generating options.`,
170
+ `Return JSON per the schema: { "compete": true|false, "reason": "<one sentence>", "approaches": ["<a>","<b>",...] }.`,
171
+ ].filter(Boolean).join("\n");
172
+ const opts = { label: "solution-space-probe", schema: _PROBE_SCHEMA, model: "opus" };
173
+ if (phaseNameOpt) opts.phase = phaseNameOpt;
174
+ const r = await agent(prompt, opts).catch(() => null);
175
+ // Probe failure → bias toward competing (fail-toward-options, per the cost logic).
176
+ if (!r || typeof r.compete !== "boolean") {
177
+ return { compete: true, reason: "probe unavailable — defaulting to compete (bias toward options)", approaches: [] };
178
+ }
179
+ return { compete: r.compete, reason: r.reason || "", approaches: r.approaches || [] };
180
+ }
181
+
182
+ // M84: PARTITION's pre-produce decision. The objective disjointness oracle needs
183
+ // candidates to score, so it can't DECIDE before any exist — it runs later as the
184
+ // JUDGE. For the pre-produce compete/skip decision we use an OPUS heuristic probe
185
+ // (biased toward compete): partition is competition-worthy unless the milestone is
186
+ // trivially single-domain. So: opus probe DECIDES whether to compete; the objective
187
+ // file-disjointness oracle JUDGES the produced candidates. (Decision = heuristic +
188
+ // bias; selection = objective.)
189
+ async function runPartitionProbe(projectDir, { milestone, briefPath, userInput, phaseNameOpt } = {}) {
190
+ const prompt = [
191
+ `You are the Partition Solution-Space Probe${milestone ? ` for ${milestone}` : ""}. Decide: are there ≥2 genuinely different ways to CARVE this milestone into file-disjoint domains (different boundaries / groupings / parallelism), or is there one obvious single decomposition?`,
192
+ `**Brief:** ${briefPath || "(none — read .gsd-t docs/contracts/requirements directly)"}`,
193
+ userInput ? `\nUser input:\n${userInput}\n` : "",
194
+ `Compete=true when the work spans multiple files/areas that could be grouped more than one sensible way. Compete=false ONLY for a trivial single-file / single-domain milestone.`,
195
+ `BIAS TOWARD COMPETING: if ≥3 files/areas are in play or you're unsure, choose compete=true — the file-disjointness oracle will objectively pick the most-parallelizable valid carving among the candidates, so competing is low-risk and high-reward.`,
196
+ `Return JSON per the schema.`,
197
+ ].filter(Boolean).join("\n");
198
+ const opts = { label: "partition-probe", schema: _PROBE_SCHEMA, model: "opus" };
199
+ if (phaseNameOpt) opts.phase = phaseNameOpt;
200
+ const r = await agent(prompt, opts).catch(() => null);
201
+ if (!r || typeof r.compete !== "boolean") {
202
+ return { compete: true, reason: "probe unavailable — defaulting to compete", approaches: [] };
203
+ }
204
+ return { compete: r.compete, reason: r.reason || "", approaches: r.approaches || [] };
205
+ }
206
+
134
207
  // Rubric axes for the SUBJECTIVE judge (non-partition eligible phases). Partition
135
208
  // uses the objective oracle instead and ignores these.
136
209
  const RUBRIC_AXES_BY_PHASE = {
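Both probes share the same fail-toward-options tail: a missing or malformed result is treated as a vote to compete. Pulled out on its own, that normalization might look like the sketch below. The helper name `normalizeProbeResult` is an assumption; the probes in the diff inline this logic rather than sharing a function.

```javascript
// Sketch only: the shared "probe failed, so compete" normalization,
// extracted into a hypothetical helper for illustration.
function normalizeProbeResult(r) {
  // null (agent threw) or a non-boolean `compete` both count as probe failure.
  if (!r || typeof r.compete !== "boolean") {
    return { compete: true, reason: "probe unavailable; defaulting to compete", approaches: [] };
  }
  // A well-formed result passes through, with optional fields defaulted.
  return { compete: r.compete, reason: r.reason || "", approaches: r.approaches || [] };
}
```

The design choice is asymmetric on purpose: a wrongly triggered competition costs roughly 3× one phase, while a wrongly skipped one can cost more downstream, so the failure default leans toward competing.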
@@ -170,13 +243,27 @@ const milestone = _args.milestone || null;
170
243
  const userInput = _args.userInput || "";
171
244
  const phaseName = _args.phase;
172
245
 
173
- // M82: clamp competition N to [1,5]. Evidence (Self-MoA, Large Language Monkeys):
174
- // gains plateau fast; N=3 captures the elbow, >5 is wasteful. N<=1 = off (single producer).
175
- const _rawN = Number(_args.competition) || 1;
176
- const competitionN = Math.max(1, Math.min(5, Math.floor(_rawN)));
177
- const competitionOn = competitionN > 1 && COMPETITION_ELIGIBLE.has(phaseName);
178
- if (competitionN > 1 && !competitionOn) {
179
- log(`competition: N=${competitionN} ignored: phase "${phaseName}" is not competition-eligible (single producer runs). Eligible: ${[...COMPETITION_ELIGIBLE].join(", ")}.`);
246
+ // M84: competition is AUTOMATIC. By default the workflow PROBES the solution space
247
+ // (after brief) and self-decides whether to run a 3-producer + judge competition;
248
+ // no flag needed. Optional manual OVERRIDES: `competition: N` (2-5) forces N
249
+ // producers; `competition: 0` or `noCompetition: true` forces it off. Default
250
+ // (`competition` unset) = let the workflow decide.
251
+ // Evidence (Self-MoA, Large Language Monkeys): gains plateau fast; N=3 is the elbow,
252
+ // >5 wasteful. The auto path fires 3.
253
+ const AUTO_COMPETITION_N = 3;
254
+ const _hasCompetitionArg = _args.competition !== undefined && _args.competition !== null;
255
+ const _forceOff = _args.noCompetition === true || (_hasCompetitionArg && Number(_args.competition) <= 1);
256
+ const _forcedN = _hasCompetitionArg && Number(_args.competition) >= 2
257
+ ? Math.max(2, Math.min(5, Math.floor(Number(_args.competition))))
258
+ : null;
259
+ // competitionN/competitionOn are resolved LATER (after preflight+brief) by the
260
+ // auto-probe, unless an override pins them now. Declared with `let` so the
261
+ // post-brief decision block can set them.
262
+ let competitionN = 1;
263
+ let competitionOn = false;
264
+ const _competitionEligible = COMPETITION_ELIGIBLE.has(phaseName);
265
+ if (_forcedN && !_competitionEligible) {
266
+ log(`competition: forced N=${_forcedN} ignored — phase "${phaseName}" is not competition-eligible. Eligible: ${[...COMPETITION_ELIGIBLE].join(", ")}.`);
180
267
  }
181
268
 
182
269
  if (!phaseName || !VALID_PHASES.includes(phaseName)) {
@@ -189,7 +276,35 @@ const pre = await runPreflight(projectDir);
189
276
  if (!pre.ok) return { status: "failed", reason: "preflight-failed", preflight: pre.envelope };
190
277
  const brief = await generateBrief(projectDir, { kind: phaseName, milestone, id: `${phaseName}-${(milestone || "m").toLowerCase()}` });
191
278
 
192
- phase("Phase");
279
+ // ── M84: resolve competition AUTOMATICALLY (after brief, before producing) ──
280
+ // Default: probe the solution space and self-decide. Overrides pin it.
281
+ if (_competitionEligible) {
282
+ if (_forceOff) {
283
+ competitionOn = false;
284
+ log(`competition: OFF (overridden via competition≤1 / noCompetition).`);
285
+ } else if (_forcedN) {
286
+ competitionN = _forcedN; competitionOn = true;
287
+ log(`competition: ON, N=${_forcedN} (overridden).`);
288
+ } else {
289
+ // M84 Red Team LOW: warn on an unparseable override so a typo (competition:"off")
290
+ // isn't silently swallowed into the auto path.
291
+ if (_hasCompetitionArg && Number.isNaN(Number(_args.competition))) {
292
+ log(`competition: override value ${JSON.stringify(_args.competition)} is not a number — ignoring it, using AUTO. (Use 0/noCompetition to force off, 2-5 to force N.)`);
293
+ }
294
+ // Automatic decision — the workflow probes and decides. Opus probe (or the
295
+ // partition-specific probe); biased toward competing.
296
+ phase("Probe");
297
+ const probe = phaseName === "partition"
298
+ ? await runPartitionProbe(projectDir, { milestone, briefPath: brief.briefPath, userInput, phaseNameOpt: "Probe" })
299
+ : await runSolutionSpaceProbe(projectDir, phaseName, { milestone, briefPath: brief.briefPath, userInput, phaseNameOpt: "Probe" });
300
+ competitionOn = !!probe.compete;
301
+ competitionN = competitionOn ? AUTO_COMPETITION_N : 1;
302
+ log(`competition: AUTO → ${competitionOn ? `COMPETE (${AUTO_COMPETITION_N} producers)` : "single draft"} — ${probe.reason}${probe.approaches && probe.approaches.length ? ` [approaches: ${probe.approaches.join("; ")}]` : ""}`);
303
+ }
304
+ }
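The block above applies a fixed precedence: eligibility first, then force-off, then forced N, and only then the probe. A compact sketch of that ordering follows. The function `resolveCompetition`, its parameter names, and the `source` field are hypothetical; only the constant 3 for the auto path and the precedence order are taken from the diff.

```javascript
const AUTO_N = 3; // the auto path fans out 3 producers

// Sketch only: precedence of the post-brief decision.
//   eligible     = phase is in the competition-eligible set
//   forceOff     = an override disabled competition
//   forcedN      = a forced producer count (2-5), or null
//   probeCompete = the probe's verdict (consulted only if nothing is pinned)
function resolveCompetition({ eligible, forceOff, forcedN, probeCompete }) {
  if (!eligible) return { on: false, n: 1, source: "ineligible" };
  if (forceOff) return { on: false, n: 1, source: "override" };
  if (forcedN) return { on: true, n: forcedN, source: "override" };
  const on = Boolean(probeCompete);
  return { on, n: on ? AUTO_N : 1, source: "probe" };
}
```

Note that force-off wins over a forced N when both are present, which matches the order of the branches in the diff.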
305
+
306
+ // M84 Red Team LOW: announce "Phase" only on the single-draft path (the
307
+ // competition path announces Compete/Judge/Finalize instead) so no empty stage shows.
193
308
  const promptByPhase = {
194
309
  partition: `Decompose the milestone into 2-5 independent domains. Write .gsd-t/domains/{domain}/{scope,constraints,tasks}.md. Cross-domain contracts in .gsd-t/contracts/.`,
195
310
  plan: `For each domain, write atomic tasks.md entries with files, contract refs, dependencies, acceptance criteria. Update .gsd-t/contracts/integration-points.md with wave groupings.
@@ -209,6 +324,7 @@ const briefLine = `**Brief (REQUIRED):** ${brief.briefPath || "(no brief — re-
209
324
  let result;
210
325
  if (!competitionOn) {
211
326
  // ── Single-producer path (default, unchanged behavior) ──
327
+ phase("Phase");
212
328
  result = await agent(
213
329
  [
214
330
  `You are the ${phaseName} phase agent.`,
@@ -224,15 +340,49 @@ if (!competitionOn) {
224
340
  { label: phaseName, phase: "Phase", schema: PHASE_RESULT_SCHEMA, model: "opus" }
225
341
  ).catch((e) => ({ status: "failed", artifacts: [], summary: `agent error: ${e && e.message}` }));
226
342
  } else {
227
- // ── M82 Competition Mode: generate -> judge -> finalize ──
228
- // Distinct "angles" so the N Self-MoA producers explore different regions of
229
- // the solution space (diversity by prompt, not by model — Self-MoA > Mixed-MoA).
230
- const ANGLES = [
231
- "Optimize for MAXIMUM parallelism: carve the most file-disjoint domains that can run concurrently.",
232
- "Optimize for SIMPLICITY: the fewest domains with the cleanest, most obvious boundaries.",
233
- "Optimize for RISK ISOLATION: isolate the riskiest/most-coupled work into its own domain so the rest stays safe.",
234
- "Optimize for DEPENDENCY DEPTH: minimize serial gates (waves) between domains.",
235
- "Optimize for BALANCE: roughly equal-sized domains with minimal cross-talk.",
343
+ // ── M82/M84 Competition Mode: generate -> judge -> finalize ──
344
+ // Distinct "angles" so the N Self-MoA producers explore different regions of the
345
+ // solution space (diversity by prompt, not by model — Self-MoA > Mixed-MoA).
346
+ // M84 Red Team MEDIUM: angles must be PHASE-AWARE — the old partition-only set
347
+ // gave a discuss/milestone producer a contradictory "carve file-disjoint domains"
348
+ // directive, degrading 3 of 4 now-automatic phases. Each eligible phase gets its
349
+ // own angle set (analogous to RUBRIC_AXES_BY_PHASE).
350
+ const ANGLES_BY_PHASE = {
351
+ partition: [
352
+ "Optimize for MAXIMUM parallelism: carve the most file-disjoint domains that can run concurrently.",
353
+ "Optimize for SIMPLICITY: the fewest domains with the cleanest, most obvious boundaries.",
354
+ "Optimize for RISK ISOLATION: isolate the riskiest/most-coupled work into its own domain so the rest stays safe.",
355
+ "Optimize for DEPENDENCY DEPTH: minimize serial gates (waves) between domains.",
356
+ "Optimize for BALANCE: roughly equal-sized domains with minimal cross-talk.",
357
+ ],
358
+ milestone: [
359
+ "Optimize for FASTEST TIME-TO-VALUE: the leanest milestone sequence that ships something usable soonest.",
360
+ "Optimize for RISK-FIRST: front-load the riskiest/most-uncertain work so failure is cheap and early.",
361
+ "Optimize for DEPENDENCY ORDER: sequence strictly by what unblocks the most downstream work.",
362
+ "Optimize for USER-VALUE-FIRST: order milestones by the value each delivers to the end user.",
363
+ "Optimize for SIMPLICITY: the fewest, most self-contained milestones with minimal cross-cutting.",
364
+ ],
365
+ discuss: [
366
+ "Argue the SIMPLEST viable architecture, even if it sacrifices some flexibility.",
367
+ "Argue the most ROBUST/CORRECT architecture, accepting more upfront complexity.",
368
+ "Argue the most EXTENSIBLE architecture, optimizing for future change.",
369
+ "Argue a PRAGMATIC middle path, naming the explicit trade-offs it accepts.",
370
+ "Argue a CONTRARIAN approach that questions an assumption the others take for granted.",
371
+ ],
372
+ "design-decompose": [
373
+ "Decompose ATOMIC-FIRST: smallest reusable elements up, composed into widgets then pages.",
374
+ "Decompose PAGE-FIRST: whole pages down into sections, widgets, then elements.",
375
+ "Decompose TOKEN-DRIVEN: design tokens + primitives first, structure follows the system.",
376
+ "Decompose by REUSE: maximize shared components; minimize one-off bespoke pieces.",
377
+ "Decompose by FEATURE: group elements/widgets by the user-facing feature they serve.",
378
+ ],
379
+ };
380
+ const ANGLES = ANGLES_BY_PHASE[phaseName] || [
381
+ "Explore a materially different approach, optimizing for simplicity.",
382
+ "Explore a materially different approach, optimizing for robustness/correctness.",
383
+ "Explore a materially different approach, optimizing for extensibility.",
384
+ "Explore a pragmatic middle path, naming its trade-offs.",
385
+ "Explore a contrarian approach that questions a shared assumption.",
236
386
  ];
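The diff shows how the angle list is chosen per phase but not how individual producers pick from it. One simple possibility, offered purely as an assumption, is to hand producer `i` the `i`-th angle and wrap around if there are more producers than angles. The names `DEMO_ANGLES_BY_PHASE`, `DEMO_FALLBACK`, and `anglesForProducers` are hypothetical, and the angle strings are shortened stand-ins for the full directives.

```javascript
// Shortened stand-ins for the per-phase angle lists; the real strings are longer.
const DEMO_ANGLES_BY_PHASE = {
  partition: ["MAXIMUM parallelism", "SIMPLICITY", "RISK ISOLATION", "DEPENDENCY DEPTH", "BALANCE"],
  discuss: ["SIMPLEST", "ROBUST/CORRECT", "EXTENSIBLE", "PRAGMATIC", "CONTRARIAN"],
};
const DEMO_FALLBACK = ["simplicity", "robustness", "extensibility", "pragmatic", "contrarian"];

// ASSUMPTION: producer i takes angle i, wrapping if n exceeds the list length.
// The actual assignment in the workflow is not visible in this diff.
function anglesForProducers(phaseName, n) {
  const pool = DEMO_ANGLES_BY_PHASE[phaseName] || DEMO_FALLBACK;
  return Array.from({ length: n }, (_, i) => pool[i % pool.length]);
}
```

With the auto path's 3 producers and 5 angles per phase, no wrap-around would occur, so each producer would receive a distinct directive.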
237
387
 
238
388
  const PRODUCER_SCHEMA = phaseName === "partition"