@tekyzinc/gsd-t 4.2.10 → 4.4.10
- package/CHANGELOG.md +33 -0
- package/README.md +6 -5
- package/bin/gsd-t-model-tier-policy.cjs +168 -0
- package/bin/gsd-t-parallel.cjs +17 -7
- package/bin/gsd-t.js +15 -0
- package/bin/model-selector.js +13 -3
- package/commands/gsd-t-design-decompose.md +6 -6
- package/commands/gsd-t-help.md +8 -1
- package/commands/gsd-t-milestone.md +7 -7
- package/commands/gsd-t-partition.md +7 -7
- package/package.json +1 -1
- package/scripts/hooks/gsd-t-ctx-cue.sh +58 -0
- package/scripts/statusline-command.sh +119 -0
- package/templates/CLAUDE-global.md +5 -4
- package/templates/workflows/gsd-t-debug.workflow.js +1 -1
- package/templates/workflows/gsd-t-phase.workflow.js +172 -22
- package/templates/workflows/gsd-t-verify.workflow.js +1 -1
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,39 @@
|
|
|
2
2
|
|
|
3
3
|
All notable changes to GSD-T are documented here. Updated with each release.
|
|
4
4
|
|
|
5
|
+
## [4.4.10] - 2026-06-09 (M85 Model-Tier Policy + Fable 5 — minor)
|
|
6
|
+
|
|
7
|
+
### Added — single source of truth for model-tier assignments + the Fable 5 tier
|
|
8
|
+
|
|
9
|
+
Model-tier policy previously lived in 4 unsynced authorities with zero drift enforcement, and the parallel alias map was provably stale (`opus → claude-opus-4-7`). M85 centralizes the policy, fixes the live bug, and slots Claude Fable 5 (tier above Opus, $10/$50 per MTok) into the highest-leverage stages — gated by a lint so drift is impossible. The cost tradeoff was MEASURED, not asserted: a Fable single-draft tied a judged 3-Opus competition at 42% of the cost (n=1, discuss-class).
|
|
10
|
+
|
|
11
|
+
- `bin/gsd-t-model-tier-policy.cjs`: NEW — frozen `MODEL_IDS` + `STAGE_TIERS` (6 fable stage keys; competition producers HELD at opus per the M82 blindness invariant), `requiresThinkingOmitted()` (Fable's thinking-disabled-400 breaking change encoded once; accepts the runtime bracket-suffix form), `resolve()` + CLI resolver emitting the M69 JSON envelope; `gsd-t model-tier-policy` dispatcher + registered in both bin-propagation lists.
|
|
12
|
+
- `bin/gsd-t-parallel.cjs`: alias map now `require()`s the policy module (zero bare model-id literals; stale opus-4-7 gone); cache-warm probe passes `--model` explicitly (the `ANTHROPIC_MODEL` env pin was measured silently ignored by the current CLI).
|
|
13
|
+
- `bin/model-selector.js`: FABLE tier + `cycle_2_escalation` rule via the existing `selectModel` signature; debug default byte-identical.
|
|
14
|
+
- `templates/workflows/gsd-t-{phase,verify,debug}.workflow.js`: 5 Fable assignments — M84 solution-space/partition probes, competition judge (`judge:rubric`), M83 pre-mortem, Red Team (stays non-skippable), debug `cycle === 1 ? "opus" : "fable"` ternary.
|
|
15
|
+
- `test/m85-workflow-tier-policy-lint.test.js`: NEW M71-family drift enforcer — 8-file discovery, stage-key→label mapping with per-stage non-empty-match, negative drift fixtures, real-file + debug-ternary meta-tests.
|
|
16
|
+
- `test/m85-model-tier-policy.test.js` + `test/model-selector.test.js`: 25 + 57 tests incl. dispatcher/propagation killing tests.
|
|
17
|
+
- Contracts: `model-tier-policy-contract.md` v1.0.0 STABLE (new); `model-selection-contract.md` → v1.1.0.
|
|
18
|
+
|
|
19
|
+
No migration needed for consumer projects: workflows keep using tier aliases; `gsd-t update-all` propagates the new module. Suite 1462/0.
|
|
20
|
+
|
|
21
|
+
## [4.3.10] - 2026-06-05 (M84 Auto-Competition - minor)
|
|
22
|
+
|
|
23
|
+
### Changed - Competition Mode is now AUTOMATIC (was opt-in)
|
|
24
|
+
|
|
25
|
+
M82 shipped Competition Mode as opt-in (`--competition N`). M84 makes the workflow decide for itself, per the user directive: *"I want the workflow to determine when it's optimal to create a competition."* The economic case (user's): a better artifact produced upstream makes every downstream phase — pre-mortem, execute, verify — cheaper and more likely to pass first time, so the expected downstream savings usually exceed the ~3× upstream cost. Opt-in just means forgetting to use the thing that lowers total cost.
|
|
26
|
+
|
|
27
|
+
- **Solution-space probe** runs at the start of each eligible phase (partition / milestone / discuss / design-decompose), after brief, before producing. It decides: ≥2 genuinely different viable approaches → compete (3 producers + judge); one obvious answer → single draft.
|
|
28
|
+
- **The probe runs on OPUS, not haiku.** Deciding "are there multiple good approaches?" is high-level reasoning, not a mechanical check — and it gates the whole 3× competition, so a weak probe would forfeit the feature. (User caught this: *"Is Haiku smart enough to make this a judgment?"* — no, it isn't; the probe is opus.)
|
|
29
|
+
- **Biased toward competing**: when uncertain, compete (the asymmetry favors generating options). Probe failure → compete (fail-toward-options).
|
|
30
|
+
- **Partition**: an opus probe makes the pre-produce compete/skip call; the objective file-disjointness oracle still judges the produced candidates (decision = heuristic + bias; selection = objective).
|
|
31
|
+
- **Producer angles are now phase-aware** (`ANGLES_BY_PHASE`) — a discuss/milestone/design producer no longer gets a partition-framed "carve file-disjoint domains" directive (Red Team MEDIUM fix; this latent M82 defect now mattered because competition is the default path).
|
|
32
|
+
- **Overrides** (rarely needed): `competition: N` (2–5) forces N; `competition: 0` / `noCompetition: true` forces off; unset = auto. An unparseable override logs a warning and falls back to auto.
|
|
33
|
+
- `meta.phases` now declares all 7 stages (Preflight / Probe / Compete / Judge / Phase / Finalize / Plan Hardening) — also fixes the M83 cosmetic gap where Plan Hardening wasn't pre-declared.
|
|
34
|
+
- **Verification**: real-sandbox proof — the opus probe ran through the Workflow sandbox and discriminated correctly (wide collaborative-editor scenario → compete, 3 approaches named; narrow copyright-bump → single draft). Adversarial Red Team (Opus, fresh context) GRUDGING-PASS — no CRITICAL/HIGH; state-wiring, overrides, eligibility, probe-failure, cost-bound, runtime-native, and plan-hardening interaction all verified clean. Fixed the 1 MEDIUM (phase-aware angles) + 3 LOWs. Suite 1372/0/4. Minor bump 4.2.10 → 4.3.10.
|
|
35
|
+
- Contract `competition-mode-contract.md` → v2.0.0 (trigger moved opt-in → automatic; judge/selection/invariants unchanged).
|
|
36
|
+
- Origin: NiceNote review — the user observed that competing on the M7 plan would have produced a better plan from the start (fewer pre-mortem blocks, less downstream cost), so competition should be automatic, not a flag to remember.
|
|
37
|
+
|
|
5
38
|
## [4.2.10] - 2026-06-05 (M83 Left-Shifted Plan Hardening - minor)
|
|
6
39
|
|
|
7
40
|
### Added - Plan-phase hardening: catch dead deliverables and edge cases BEFORE execute
|
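The M84 decision rules in the 4.3.10 entry above (explicit overrides first, then a probe that is biased toward competing) can be sketched as a small pure function. This is only an illustration under stated assumptions: `decideCompetition`, the `probe` object shape (`approaches`, `uncertain`), and the returned fields are hypothetical names, not taken from `gsd-t-phase.workflow.js`. Treating `competition: 1` as unparseable is also an assumption, since the changelog only specifies 0 and 2–5.

```javascript
"use strict";

// Hypothetical helper: names and shapes are illustrative only.
// args  : workflow args, may carry `competition` and/or `noCompetition`
// probe : result of the solution-space probe, or null when the probe failed
function decideCompetition(args, probe) {
  const log = [];
  const raw = args.competition;

  // Explicit "off" overrides win first.
  if (args.noCompetition === true) {
    return { compete: false, producers: 0, source: "override", log };
  }
  if (raw !== undefined && raw !== null) {
    const n = Number(raw);
    if (n === 0 && String(raw).trim() !== "") {
      return { compete: false, producers: 0, source: "override", log };
    }
    if (Number.isInteger(n) && n >= 2 && n <= 5) {
      return { compete: true, producers: n, source: "override", log };
    }
    // Unparseable override: warn, then fall through to auto.
    log.push(`unparseable competition override "${String(raw)}", falling back to auto`);
  }

  // Auto mode. Probe failure or uncertainty means compete (fail toward options).
  if (!probe || !Array.isArray(probe.approaches) || probe.uncertain === true) {
    return { compete: true, producers: 3, source: "probe-fallback", log };
  }
  const compete = probe.approaches.length >= 2;
  return { compete, producers: compete ? 3 : 0, source: "probe", log };
}

console.log(decideCompetition({}, { approaches: ["crdt", "ot", "lock-based"] }));
console.log(decideCompetition({ competition: "lots" }, { approaches: ["bump year"] }));
```

Keeping the decision pure (no I/O, warnings returned in `log`) makes the bias rules easy to pin down with unit tests, which matters because this branch gates the roughly 3× upstream cost.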
package/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# GSD-T: Contract-Driven Development for Claude Code
|
|
2
2
|
|
|
3
|
-
**v4.
|
|
3
|
+
**v4.4.10** - A methodology for reliable, parallelizable development using Claude Code with optional Agent Teams support.
|
|
4
4
|
|
|
5
5
|
**Eliminates context rot** — task-level fresh dispatch (one subagent per task, ~10-20% context each) means compaction never triggers.
|
|
6
6
|
**Compaction-proof debug loops** — `gsd-t headless --debug-loop` runs test-fix-retest cycles as separate `claude -p` sessions. A JSONL debug ledger persists all hypothesis/fix/learning history across fresh sessions. Anti-repetition preamble injection prevents retrying failed hypotheses. Escalation tiers (sonnet → opus → human) and a hard iteration ceiling enforced externally.
|
|
@@ -18,7 +18,7 @@
|
|
|
18
18
|
**Rigorous User-Journey Coverage + Anti-Drift Test Quality** — `bin/journey-coverage.cjs` regex listener detector + `gsd-t check-coverage` CLI + `scripts/hooks/pre-commit-journey-coverage` commit gate blocks viewer-source commits when uncovered listeners exist. Journey specs in `e2e/journeys/` use functional assertions (zero `toBeVisible`-only tests) per the E2E Test Quality Standard in CLAUDE.md.
|
|
19
19
|
**Universal Playwright Bootstrap + Deterministic UI Enforcement (M50)** — three executable enforcement layers: (1) `bin/playwright-bootstrap.cjs` + `bin/ui-detection.cjs` - idempotent installer detects package manager, installs `@playwright/test` + chromium, scaffolds `e2e/`; (2) Workflow runtime runs `playwright-bootstrap.cjs::installPlaywright()` before any E2E stage when `hasUI && !hasPlaywright`; install failure halts with `blocked-needs-human`; (3) `scripts/hooks/pre-commit-playwright-gate` (opt-in via `gsd-t doctor --install-hooks`) blocks viewer-source commits when staged files are newer than `.gsd-t/.last-playwright-pass`. The `gsd-t setup-playwright [path]` subcommand handles manual install.
|
|
20
20
|
**Visualizer (`/gsd-t-visualize`)** — launches a real-time browser dashboard with dual-pane view: top pane streams the main session, bottom pane streams whichever spawn the user clicks. Left rail shows Live Spawns and Completed (last 100 spawns, status-badged, collapsible). Right rail shows Spawn Plan / Parallelism / Tool Cost. Powered by `gsd-t-stream-feed-server.js` + `gsd-t-dashboard.html`.
|
|
21
|
-
**Surgical model selection** — `bin/model-selector.js` assigns haiku/sonnet/opus per phase via a declarative rules table; `/advisor` escalation path with convention-based fallback.
|
|
21
|
+
**Surgical model selection** — `bin/model-selector.js` assigns haiku/sonnet/opus/fable per phase via a declarative rules table; `/advisor` escalation path with convention-based fallback. **M85 single-source tier policy:** `bin/gsd-t-model-tier-policy.cjs` is the SINGLE source of truth for model-tier assignments; the 5 highest-leverage stages (solution-space probe, partition probe, competition judge, pre-mortem, Red Team) run on `fable` (Claude Fable 5, tier above Opus); competition producers stay `opus` (M82 blindness); debug escalates cycle-1→opus, cycle-2→fable. Drift is mechanically enforced by the M71-family lint (`test/m85-workflow-tier-policy-lint.test.js`).
|
|
22
22
|
**Token Telemetry** — `gsd-t-calibration-hook.js` records token usage per spawn to `.gsd-t/token-metrics.jsonl` (18-field rows). `gsd-t-token-aggregator.js` aggregates across tasks for the `/gsd-t-metrics` view. Use the native Claude Code `/context` command for live in-session context percentage.
|
|
23
23
|
**Quality North Star** — projects define a `## Quality North Star` section in CLAUDE.md (1–3 sentences, e.g., "This is a published npm library. Every public API must be intuitive and backward-compatible."). `gsd-t-init` auto-detects preset (library/web-app/cli) from package.json signals; `gsd-t-setup` configures it for existing projects. Subagents read it as a quality lens; absent = silent skip (backward compatible).
|
|
24
24
|
**Design Brief Artifact** — during partition, UI/frontend projects (React, Vue, Svelte, Flutter, Tailwind) automatically get `.gsd-t/contracts/design-brief.md` with color palette, typography, spacing system, component patterns, and tone/voice. Non-UI projects skip silently. User-customized briefs are preserved. Referenced in plan phase for visual consistency.
|
|
@@ -128,7 +128,7 @@ gsd-t traceability-gate --milestone Mxx [--project-dir P] # M83: plan-phase acce
|
|
|
128
128
|
|
|
129
129
|
**Plan Hardening (M83).** The `plan` phase now runs two blocking gates before execute, so a plan can't ship a dead deliverable: a deterministic **acceptance-traceability gate** (`gsd-t traceability-gate` — every AC must bind to a code path + a killing test; the headline capability needs both impl and test) and an adversarial **pre-mortem** agent (opus, fresh-context, predicts edge-case/NFR/dead-deliverable failures and requires a test for each). The temporal dual of the Red Team — attack the design at plan, not just the code at verify. Origin: a build where the headline capability shipped as dead code and burned 4 verify cycles. See `.gsd-t/contracts/plan-hardening-contract.md`.
|
|
130
130
|
|
|
131
|
-
**Competition Mode (M82
|
|
131
|
+
**Competition Mode (M82 · automatic since M84).** On upstream, pre-contract phases (`/gsd-t-partition`, `/gsd-t-milestone`, `/gsd-t-discuss`, `/gsd-t-design-decompose`) the workflow **automatically decides** whether to compete: an Opus solution-space probe runs at phase start and, if it finds ≥2 genuinely different viable approaches, fans out 3 parallel candidate producers + a judge to pick the winner — the generative dual of the orthogonal validation triad. No flag needed (the probe is biased toward competing, since a better upstream artifact lowers total downstream cost). Partition's judge is an *objective* file-disjointness oracle; subjective phases use a blind + different-model + rubric judge. Override with `--no-competition` or `--competition N` only on explicit request. See `.gsd-t/contracts/competition-mode-contract.md`.
|
|
132
132
|
|
|
133
133
|
`gsd-t parallel` consumes the M44 task-graph (D1) and applies three pre-spawn gates (D4 depgraph validation → D5 file-disjointness → D6 economics) followed by mode-aware headroom/split math. Extends — does not replace — the M40 orchestrator. Contract: `.gsd-t/contracts/wave-join-contract.md` v1.1.0.
|
|
134
134
|
|
|
@@ -391,7 +391,7 @@ Verify with: `/gsd-t-help`
|
|
|
391
391
|
```
|
|
392
392
|
get-stuff-done-teams/
|
|
393
393
|
├── README.md
|
|
394
|
-
├── package.json # @tekyzinc/gsd-t v4.
|
|
394
|
+
├── package.json # @tekyzinc/gsd-t v4.4.10
|
|
395
395
|
├── LICENSE
|
|
396
396
|
├── bin/ # CLI entry + orchestrators + support modules (52 modules)
|
|
397
397
|
│ ├── gsd-t.js # CLI installer + all subcommands
|
|
@@ -407,7 +407,8 @@ get-stuff-done-teams/
|
|
|
407
407
|
│ ├── graph-*.js # Code graph engine (CGC/Neo4j integration)
|
|
408
408
|
│ ├── journey-coverage.cjs # Listener detector + coverage gap reporting
|
|
409
409
|
│ ├── playwright-bootstrap.cjs # Idempotent Playwright installer
|
|
410
|
-
│ ├── model-selector.js # Phase-to-model assignment (haiku/sonnet/opus)
|
|
410
|
+
│ ├── model-selector.js # Phase-to-model assignment (haiku/sonnet/opus/fable)
|
|
411
|
+
│ ├── gsd-t-model-tier-policy.cjs # M85: single-source tier policy (haiku/sonnet/opus/fable), resolver CLI
|
|
411
412
|
│ ├── rule-engine.js # Declarative failure-pattern rules
|
|
412
413
|
│ ├── patch-lifecycle.js # 5-stage patch candidate→graduated lifecycle
|
|
413
414
|
│ └── metrics-collector.js # Task telemetry + ELO tracking
|
package/bin/gsd-t-model-tier-policy.cjs
ADDED
|
@@ -0,0 +1,168 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* gsd-t-model-tier-policy.cjs
|
|
3
|
+
*
|
|
4
|
+
* SINGLE source of truth for GSD-T model-tier policy.
|
|
5
|
+
* Zero external runtime deps — installer-package invariant.
|
|
6
|
+
* No top-level side effects.
|
|
7
|
+
*
|
|
8
|
+
* Contract: .gsd-t/contracts/model-tier-policy-contract.md v1.0.0 STABLE
|
|
9
|
+
*/
|
|
10
|
+
|
|
11
|
+
'use strict';
|
|
12
|
+
|
|
13
|
+
// ---------------------------------------------------------------------------
|
|
14
|
+
// Published Model-ID Constants (M85 — authoritative, contract v1.0.0)
|
|
15
|
+
// ---------------------------------------------------------------------------
|
|
16
|
+
|
|
17
|
+
/**
|
|
18
|
+
* Frozen map: tier alias → concrete model id.
|
|
19
|
+
* Consumers MUST import from here — never re-hardcode these strings.
|
|
20
|
+
*
|
|
21
|
+
* @type {Readonly<{opus: string, fable: string, sonnet: string, haiku: string}>}
|
|
22
|
+
*/
|
|
23
|
+
const MODEL_IDS = Object.freeze({
|
|
24
|
+
opus: 'claude-opus-4-8',
|
|
25
|
+
fable: 'claude-fable-5',
|
|
26
|
+
sonnet: 'claude-sonnet-4-6',
|
|
27
|
+
haiku: 'claude-haiku-4-5-20251001',
|
|
28
|
+
});
|
|
29
|
+
|
|
30
|
+
// ---------------------------------------------------------------------------
|
|
31
|
+
// Stage Policy (M85 Fable assignments — contract v1.0.0 § "Stage Policy")
|
|
32
|
+
// ---------------------------------------------------------------------------
|
|
33
|
+
|
|
34
|
+
/**
|
|
35
|
+
* Frozen map: stage key → tier alias.
|
|
36
|
+
* 6 stages → fable; competition-producers held at opus (M82 blindness invariant).
|
|
37
|
+
*
|
|
38
|
+
* @type {Readonly<Record<string, string>>}
|
|
39
|
+
*/
|
|
40
|
+
const STAGE_TIERS = Object.freeze({
|
|
41
|
+
'solution-space-probe': 'fable',
|
|
42
|
+
'partition-probe': 'fable',
|
|
43
|
+
'competition-judge': 'fable',
|
|
44
|
+
'competition-producers': 'opus', // HELD — M82 judge-blindness invariant; do NOT move to fable
|
|
45
|
+
'pre-mortem': 'fable',
|
|
46
|
+
'red-team': 'fable',
|
|
47
|
+
'debug-cycle-2': 'fable',
|
|
48
|
+
});
|
|
49
|
+
|
|
50
|
+
// ---------------------------------------------------------------------------
|
|
51
|
+
// requiresThinkingOmitted predicate (encoding the Fable HTTP-400 breaking change)
|
|
52
|
+
// ---------------------------------------------------------------------------
|
|
53
|
+
|
|
54
|
+
/**
|
|
55
|
+
* Returns true IFF the model requires the explicit thinking-disabled parameter
|
|
56
|
+
* to be OMITTED from the API call.
|
|
57
|
+
*
|
|
58
|
+
* Rationale (canonical, single home): `claude-fable-5` returns HTTP 400 when
|
|
59
|
+
* the explicit thinking-disabled parameter is sent. The parameter must therefore
|
|
60
|
+
* be OMITTED for Fable. No other file may re-implement or re-state this predicate.
|
|
61
|
+
*
|
|
62
|
+
* @param {string} model — concrete model id or tier alias or any string
|
|
63
|
+
* @returns {boolean}
|
|
64
|
+
*/
|
|
65
|
+
function requiresThinkingOmitted(model) {
|
|
66
|
+
if (typeof model !== 'string') return false;
|
|
67
|
+
// Source the id from MODEL_IDS (single-source — no second literal), and accept
|
|
68
|
+
// the runtime's bracket-suffixed display form (e.g. "claude-fable-5[1m]").
|
|
69
|
+
return model === MODEL_IDS.fable || model.startsWith(MODEL_IDS.fable + '[');
|
|
70
|
+
}
|
|
71
|
+
|
|
72
|
+
// ---------------------------------------------------------------------------
|
|
73
|
+
// resolve(stageKey) → concreteModelId
|
|
74
|
+
// ---------------------------------------------------------------------------
|
|
75
|
+
|
|
76
|
+
/**
|
|
77
|
+
* Returns the concrete model id for the given stage key, or null for unknown keys.
|
|
78
|
+
* Never throws.
|
|
79
|
+
*
|
|
80
|
+
* @param {string} stageKey
|
|
81
|
+
* @returns {string|null}
|
|
82
|
+
*/
|
|
83
|
+
function resolve(stageKey) {
|
|
84
|
+
try {
|
|
85
|
+
const tier = STAGE_TIERS[stageKey];
|
|
86
|
+
if (!tier) return null;
|
|
87
|
+
const modelId = MODEL_IDS[tier];
|
|
88
|
+
return modelId !== undefined ? modelId : null;
|
|
89
|
+
} catch (_) {
|
|
90
|
+
return null;
|
|
91
|
+
}
|
|
92
|
+
}
|
|
93
|
+
|
|
94
|
+
// ---------------------------------------------------------------------------
|
|
95
|
+
// Exports
|
|
96
|
+
// ---------------------------------------------------------------------------
|
|
97
|
+
|
|
98
|
+
module.exports = {
|
|
99
|
+
MODEL_IDS,
|
|
100
|
+
STAGE_TIERS,
|
|
101
|
+
requiresThinkingOmitted,
|
|
102
|
+
resolve,
|
|
103
|
+
};
|
|
104
|
+
|
|
105
|
+
// ---------------------------------------------------------------------------
|
|
106
|
+
// CLI dispatch (M69 invoke-time injection surface)
|
|
107
|
+
// run: node bin/gsd-t-model-tier-policy.cjs resolve <stageKey> [--json]
|
|
108
|
+
// ---------------------------------------------------------------------------
|
|
109
|
+
|
|
110
|
+
if (require.main === module) {
|
|
111
|
+
const args = process.argv.slice(2);
|
|
112
|
+
const jsonFlag = args.includes('--json');
|
|
113
|
+
const positional = args.filter(a => !a.startsWith('-'));
|
|
114
|
+
|
|
115
|
+
const command = positional[0];
|
|
116
|
+
|
|
117
|
+
if (command === 'resolve') {
|
|
118
|
+
const stageKey = positional[1];
|
|
119
|
+
|
|
120
|
+
if (!stageKey) {
|
|
121
|
+
const msg = 'Usage: gsd-t-model-tier-policy.cjs resolve <stageKey> [--json]';
|
|
122
|
+
if (jsonFlag) {
|
|
123
|
+
process.stdout.write(JSON.stringify({ ok: false, error: msg }) + '\n');
|
|
124
|
+
} else {
|
|
125
|
+
process.stderr.write(msg + '\n');
|
|
126
|
+
}
|
|
127
|
+
process.exit(1);
|
|
128
|
+
}
|
|
129
|
+
|
|
130
|
+
const tier = STAGE_TIERS[stageKey];
|
|
131
|
+
const modelId = resolve(stageKey);
|
|
132
|
+
|
|
133
|
+
if (modelId === null) {
|
|
134
|
+
const envelope = { ok: false, stageKey, error: `Unknown stage key: "${stageKey}"` };
|
|
135
|
+
if (jsonFlag) {
|
|
136
|
+
process.stdout.write(JSON.stringify(envelope) + '\n');
|
|
137
|
+
} else {
|
|
138
|
+
process.stderr.write(`Unknown stage key: "${stageKey}"\n`);
|
|
139
|
+
}
|
|
140
|
+
process.exit(1);
|
|
141
|
+
}
|
|
142
|
+
|
|
143
|
+
const envelope = {
|
|
144
|
+
ok: true,
|
|
145
|
+
stageKey,
|
|
146
|
+
tier,
|
|
147
|
+
model: modelId,
|
|
148
|
+
requiresThinkingOmitted: requiresThinkingOmitted(modelId),
|
|
149
|
+
};
|
|
150
|
+
|
|
151
|
+
if (jsonFlag) {
|
|
152
|
+
process.stdout.write(JSON.stringify(envelope) + '\n');
|
|
153
|
+
} else {
|
|
154
|
+
process.stdout.write(`stageKey: ${stageKey}\ntier: ${tier}\nmodel: ${modelId}\nrequiresThinkingOmitted: ${envelope.requiresThinkingOmitted}\n`);
|
|
155
|
+
}
|
|
156
|
+
|
|
157
|
+
process.exit(0);
|
|
158
|
+
}
|
|
159
|
+
|
|
160
|
+
// Unknown command
|
|
161
|
+
const usage = `Usage: gsd-t-model-tier-policy.cjs resolve <stageKey> [--json]`;
|
|
162
|
+
if (jsonFlag) {
|
|
163
|
+
process.stdout.write(JSON.stringify({ ok: false, error: usage }) + '\n');
|
|
164
|
+
} else {
|
|
165
|
+
process.stderr.write(usage + '\n');
|
|
166
|
+
}
|
|
167
|
+
process.exit(1);
|
|
168
|
+
}
|
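A consumer of the policy module above might combine `resolve()` with `requiresThinkingOmitted()` when assembling a request. The sketch below is a hedged illustration: it copies a subset of the frozen maps from the diff so that it runs standalone (a real caller would `require()` the module instead). `buildRequest`, the `prompt` field, and the `thinking: { type: "disabled" }` shape are placeholders for whatever "thinking-disabled parameter" the caller actually sends.

```javascript
"use strict";

// Copied from the diff above so the sketch is self-contained.
const MODEL_IDS = Object.freeze({
  opus: "claude-opus-4-8",
  fable: "claude-fable-5",
  sonnet: "claude-sonnet-4-6",
  haiku: "claude-haiku-4-5-20251001",
});

// Subset of STAGE_TIERS, enough for the demonstration.
const STAGE_TIERS = Object.freeze({
  "competition-producers": "opus",
  "pre-mortem": "fable",
});

function requiresThinkingOmitted(model) {
  if (typeof model !== "string") return false;
  return model === MODEL_IDS.fable || model.startsWith(MODEL_IDS.fable + "[");
}

function resolve(stageKey) {
  const tier = STAGE_TIERS[stageKey];
  if (!tier) return null;
  const modelId = MODEL_IDS[tier];
  return modelId !== undefined ? modelId : null;
}

// Hypothetical request builder: the request shape is an assumption.
function buildRequest(stageKey, prompt) {
  const model = resolve(stageKey);
  if (model === null) throw new Error(`Unknown stage key: "${stageKey}"`);
  const request = { model, prompt };
  // Only attach the thinking-disabled field when the model tolerates it.
  if (!requiresThinkingOmitted(model)) {
    request.thinking = { type: "disabled" };
  }
  return request;
}

console.log(buildRequest("pre-mortem", "List failure modes."));
console.log(buildRequest("competition-producers", "Draft a partition."));
```

Routing every call through one builder keeps the Fable rule in a single place on the consumer side as well, mirroring the module's intent that the predicate is never restated elsewhere.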
package/bin/gsd-t-parallel.cjs
CHANGED
|
@@ -36,6 +36,8 @@ const path = require("node:path");
|
|
|
36
36
|
const { buildTaskGraph, getReadyTasks } = require(path.join(__dirname, "gsd-t-task-graph.cjs"));
|
|
37
37
|
const { validateDepGraph } = require(path.join(__dirname, "gsd-t-depgraph-validate.cjs"));
|
|
38
38
|
const { proveDisjointness } = require(path.join(__dirname, "gsd-t-file-disjointness.cjs"));
|
|
39
|
+
// M85: single source of truth for model ids — sourced from policy module, never re-hardcoded here
|
|
40
|
+
const { MODEL_IDS } = require(path.join(__dirname, "gsd-t-model-tier-policy.cjs"));
|
|
39
41
|
// M61 D3: gsd-t-economics retired. estimateTaskFootprint produced a per-task
|
|
40
42
|
// token+cost estimate the planner could consult for in-session-headroom
|
|
41
43
|
// math. Native budget primitives (Workflow `budget` + /usage) replace it.
|
|
@@ -420,14 +422,18 @@ function _runCacheWarmProbe(opts) {
|
|
|
420
422
|
"then reply with the single word `warm` and nothing else:\n" +
|
|
421
423
|
filesRead.map((f) => `- ${f}`).join("\n");
|
|
422
424
|
|
|
423
|
-
|
|
424
|
-
|
|
425
|
+
// M85: pass model via --model flag ONLY (env var ANTHROPIC_MODEL is silently
|
|
426
|
+
// ignored by the current claude CLI — measured probe 2026-06-09 r3: env form
|
|
427
|
+
// ran opus-4-8 regardless of the env value). No env mutation here.
|
|
428
|
+
const env = process.env;
|
|
429
|
+
const cliArgs = ["-p", prompt, "--dangerously-skip-permissions"];
|
|
430
|
+
if (model) cliArgs.push("--model", model);
|
|
425
431
|
|
|
426
432
|
try {
|
|
427
433
|
// GSD-T-LINT: skip stream-json (reason: cache-warm probe — single-word "warm" reply, no progress to stream)
|
|
428
434
|
const r = spawnSync(
|
|
429
435
|
"claude",
|
|
430
|
-
|
|
436
|
+
cliArgs,
|
|
431
437
|
{
|
|
432
438
|
cwd: projectDir,
|
|
433
439
|
env,
|
|
@@ -580,11 +586,15 @@ function runDispatch(opts) {
|
|
|
580
586
|
// A task can opt back to Opus by declaring "[opus]" in its tasks.md line;
|
|
581
587
|
// the planner surfaces this via per-task metadata (future; today the per-
|
|
582
588
|
// subset opt-in is an all-or-nothing knob passed by the caller).
|
|
583
|
-
const DEFAULT_WORKER_MODEL =
|
|
589
|
+
const DEFAULT_WORKER_MODEL = MODEL_IDS.sonnet;
|
|
590
|
+
// M85: alias map sources from policy module — MODEL_IDS is the single authority.
|
|
591
|
+
// No bare model-id literals here; changing a model id in the policy module alone
|
|
592
|
+
// is sufficient (single-source thesis, AC b).
|
|
584
593
|
const modelAlias = {
|
|
585
|
-
opus:
|
|
586
|
-
|
|
587
|
-
|
|
594
|
+
opus: MODEL_IDS.opus,
|
|
595
|
+
fable: MODEL_IDS.fable,
|
|
596
|
+
sonnet: MODEL_IDS.sonnet,
|
|
597
|
+
haiku: MODEL_IDS.haiku,
|
|
588
598
|
};
|
|
589
599
|
const callerModel = opts && opts.workerModel;
|
|
590
600
|
const workerModel = callerModel === false
|
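The two changes in this file (alias map sourced from the policy module, and the model passed through `--model` rather than the environment) can be seen together in a short standalone sketch. It is illustrative only: `toModelId` and `buildProbeArgs` are hypothetical helper names, the id values are copied from the policy module in this diff, and passing unknown names through unchanged is an assumption because the real fallback logic lies outside the hunk.

```javascript
"use strict";

// Copied from gsd-t-model-tier-policy.cjs in this diff; a real caller would require() it.
const MODEL_IDS = Object.freeze({
  opus: "claude-opus-4-8",
  fable: "claude-fable-5",
  sonnet: "claude-sonnet-4-6",
  haiku: "claude-haiku-4-5-20251001",
});

// Alias map built only from MODEL_IDS, so no model-id literal appears twice.
const modelAlias = {
  opus: MODEL_IDS.opus,
  fable: MODEL_IDS.fable,
  sonnet: MODEL_IDS.sonnet,
  haiku: MODEL_IDS.haiku,
};

// Hypothetical: map an alias to its id, pass anything else through unchanged.
function toModelId(nameOrId) {
  return Object.prototype.hasOwnProperty.call(modelAlias, nameOrId)
    ? modelAlias[nameOrId]
    : nameOrId;
}

// Mirrors the cliArgs construction in the hunk: the model travels as a flag,
// and the environment is left untouched.
function buildProbeArgs(prompt, model) {
  const cliArgs = ["-p", prompt, "--dangerously-skip-permissions"];
  if (model) cliArgs.push("--model", model);
  return cliArgs;
}

console.log(buildProbeArgs("reply with warm", toModelId("opus")));
console.log(buildProbeArgs("reply with warm"));
```

Because the argument list is built by a pure function, a test can assert that `--model` is present without spawning the `claude` binary.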
package/bin/gsd-t.js
CHANGED
|
@@ -1186,6 +1186,8 @@ const GLOBAL_BIN_TOOLS = [
|
|
|
1186
1186
|
"gsd-t-competition-judge.cjs",
|
|
1187
1187
|
// M83 — Plan-phase acceptance-traceability gate.
|
|
1188
1188
|
"gsd-t-traceability-gate.cjs",
|
|
1189
|
+
// M85 — Model-tier policy single source of truth (resolver + predicate).
|
|
1190
|
+
"gsd-t-model-tier-policy.cjs",
|
|
1189
1191
|
];
|
|
1190
1192
|
|
|
1191
1193
|
function installGlobalBinTools() {
|
|
@@ -2479,6 +2481,9 @@ const PROJECT_BIN_TOOLS = [
|
|
|
2479
2481
|
"gsd-t-competition-judge.cjs", "gsd-t-file-disjointness.cjs",
|
|
2480
2482
|
// M83 — Plan-phase acceptance-traceability gate (runs in the plan workflow).
|
|
2481
2483
|
"gsd-t-traceability-gate.cjs",
|
|
2484
|
+
// M85 — Model-tier policy resolver, so command invokers in consumer projects
|
|
2485
|
+
// can resolve stage tiers at invoke time (M69 injection pattern).
|
|
2486
|
+
"gsd-t-model-tier-policy.cjs",
|
|
2482
2487
|
];
|
|
2483
2488
|
|
|
2484
2489
|
// Files that older versions of this installer copied into project bin/ but
|
|
@@ -4575,6 +4580,16 @@ if (require.main === module) {
|
|
|
4575
4580
|
});
|
|
4576
4581
|
process.exit(res.status == null ? 1 : res.status);
|
|
4577
4582
|
}
|
|
4583
|
+
case "model-tier-policy": {
|
|
4584
|
+
// M85 — `gsd-t model-tier-policy` thin dispatcher to the tier-policy
|
|
4585
|
+
// resolver (single source of truth for model-tier assignments).
|
|
4586
|
+
const { spawnSync } = require("child_process");
|
|
4587
|
+
const js = path.join(__dirname, "gsd-t-model-tier-policy.cjs");
|
|
4588
|
+
const res = spawnSync(process.execPath, [js, ...args.slice(1)], {
|
|
4589
|
+
stdio: "inherit",
|
|
4590
|
+
});
|
|
4591
|
+
process.exit(res.status == null ? 1 : res.status);
|
|
4592
|
+
}
|
|
4578
4593
|
case "metrics":
|
|
4579
4594
|
doMetrics(args.slice(1));
|
|
4580
4595
|
break;
|
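On the consuming side, a caller that runs `gsd-t model-tier-policy resolve <stageKey> --json` needs to interpret both the exit status and the JSON envelope. The following is a minimal sketch, assuming the caller captures stdout itself (the dispatcher above uses `stdio: "inherit"`, so capture would happen one level up). `readEnvelope` is a hypothetical name. The envelope fields come from the resolver source in this diff.

```javascript
"use strict";

// Hypothetical consumer-side helper. `res` has the shape of a spawnSync result:
// { status: number|null, stdout: string|Buffer }.
function readEnvelope(res) {
  // spawnSync reports status === null when the child was terminated by a signal;
  // treat that as failure, as the dispatcher does.
  const status = res.status == null ? 1 : res.status;

  let envelope = null;
  try {
    envelope = JSON.parse(String(res.stdout).trim());
  } catch (_) {
    envelope = null;
  }
  if (envelope === null || typeof envelope !== "object") {
    return { ok: false, status, error: "unparseable resolver output" };
  }
  if (status !== 0 || envelope.ok !== true) {
    return { ok: false, status, error: envelope.error || "resolver failed" };
  }
  return {
    ok: true,
    status,
    tier: envelope.tier,
    model: envelope.model,
    requiresThinkingOmitted: envelope.requiresThinkingOmitted === true,
  };
}

// Sample outputs in the shape the resolver emits.
const okOut = JSON.stringify({
  ok: true, stageKey: "pre-mortem", tier: "fable",
  model: "claude-fable-5", requiresThinkingOmitted: true,
}) + "\n";
console.log(readEnvelope({ status: 0, stdout: okOut }));
console.log(readEnvelope({ status: null, stdout: "" }));
```

Checking both the status and the `ok` flag is deliberate: the resolver writes a JSON error envelope to stdout and also exits 1, so either signal alone could be missed by a careless caller.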
package/bin/model-selector.js
CHANGED
|
@@ -16,11 +16,14 @@
|
|
|
16
16
|
*/
|
|
17
17
|
|
|
18
18
|
// ── Tiers ───────────────────────────────────────────────────────────────────
|
|
19
|
+
// M85: FABLE tier added alongside HAIKU/SONNET/OPUS.
|
|
20
|
+
// Contract: .gsd-t/contracts/model-tier-policy-contract.md v1.0.0 § "Stage Policy"
|
|
19
21
|
|
|
20
22
|
const TIERS = Object.freeze({
|
|
21
23
|
HAIKU: "haiku",
|
|
22
24
|
SONNET: "sonnet",
|
|
23
25
|
OPUS: "opus",
|
|
26
|
+
FABLE: "fable",
|
|
24
27
|
});
|
|
25
28
|
|
|
26
29
|
const DEFAULT_TIER = TIERS.SONNET;
|
|
@@ -90,9 +93,16 @@ const PHASE_RULES = Object.freeze([
|
|
|
90
93
|
{ phase: "integrate", model: TIERS.SONNET, reason: "Integration wiring is routine coordination work" },
|
|
91
94
|
|
|
92
95
|
// Phase: debug
|
|
93
|
-
{ phase: "debug", task_type: "fix_apply",
|
|
94
|
-
{ phase: "debug", task_type: "root_cause",
|
|
95
|
-
|
|
96
|
+
{ phase: "debug", task_type: "fix_apply", model: TIERS.SONNET, reason: "Applying a known fix is routine code work" },
|
|
97
|
+
{ phase: "debug", task_type: "root_cause", model: TIERS.OPUS, reason: "Root-cause analysis is high-stakes reasoning" },
|
|
98
|
+
// M85: cycle-2 escalation — when debug cycle-1 (opus) has not resolved the issue,
|
|
99
|
+
// cycle-2 escalates to Fable. The debug DEFAULT (cycle-1/general) remains opus —
|
|
100
|
+
// no existing rule is altered (AC f, no silent degradation). This is a DOCUMENTED
|
|
101
|
+
// MIRROR for Task-based/bin/ callers; the live enforcement is in the debug workflow
|
|
102
|
+
// ternary (D3-T3); the D4 lint guards that ternary.
|
|
103
|
+
// API shape: selectModel({ phase: "debug", task_type: "cycle_2_escalation" }) → fable
|
|
104
|
+
{ phase: "debug", task_type: "cycle_2_escalation", model: TIERS.FABLE, reason: "Cycle-2 debug escalation — Fable after opus cycle-1 has not resolved; no existing rule altered (AC f)" },
|
|
105
|
+
{ phase: "debug", model: TIERS.OPUS, reason: "Debug default is high-stakes — prefer opus unless the task_type says otherwise" },
|
|
96
106
|
|
|
97
107
|
// Phase: partition — high-stakes architectural decomposition
|
|
98
108
|
{ phase: "partition", model: TIERS.OPUS, reason: "Domain partitioning is architectural reasoning — high stakes" },
|
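The ordering of the debug rules above only works if more specific rules are consulted before the phase default. The sketch below shows one way such a table could be evaluated. It is an assumption, not the actual `selectModel` body (which is outside this hunk): it uses first-match-wins, treats a rule without `task_type` as matching any task type, and returns an object whose shape is hypothetical.

```javascript
"use strict";

const TIERS = Object.freeze({
  HAIKU: "haiku",
  SONNET: "sonnet",
  OPUS: "opus",
  FABLE: "fable",
});
const DEFAULT_TIER = TIERS.SONNET;

// Debug rules copied from the hunk above (reasons shortened).
const PHASE_RULES = Object.freeze([
  { phase: "debug", task_type: "fix_apply", model: TIERS.SONNET, reason: "routine code work" },
  { phase: "debug", task_type: "root_cause", model: TIERS.OPUS, reason: "high-stakes reasoning" },
  { phase: "debug", task_type: "cycle_2_escalation", model: TIERS.FABLE, reason: "cycle-2 escalation" },
  { phase: "debug", model: TIERS.OPUS, reason: "debug default" },
]);

// Assumed evaluation strategy: first matching rule wins.
function selectModel({ phase, task_type } = {}) {
  const rule = PHASE_RULES.find(
    (r) => r.phase === phase && (r.task_type === undefined || r.task_type === task_type)
  );
  return rule
    ? { model: rule.model, reason: rule.reason }
    : { model: DEFAULT_TIER, reason: "no rule matched" };
}

console.log(selectModel({ phase: "debug", task_type: "cycle_2_escalation" }));
console.log(selectModel({ phase: "debug" }));
```

Under this strategy, adding the `cycle_2_escalation` rule before the catch-all leaves every pre-existing lookup unchanged, which is consistent with the "no existing rule altered" note in the hunk.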
package/commands/gsd-t-design-decompose.md
CHANGED
|
@@ -25,12 +25,12 @@ Capture the design reference from `$ARGUMENTS` (Figma URL / image path). If Figm
|
|
|
25
25
|
args: {
|
|
26
26
|
phase: "design-decompose",
|
|
27
27
|
projectDir: ".",
|
|
28
|
-
userInput: "$ARGUMENTS"
|
|
29
|
-
//
|
|
30
|
-
//
|
|
31
|
-
//
|
|
32
|
-
//
|
|
33
|
-
|
|
28
|
+
userInput: "$ARGUMENTS"
|
|
29
|
+
// M84 Competition Mode is AUTOMATIC — do NOT pass `competition` by default.
|
|
30
|
+
// The workflow probes (opus) and self-decides; it competes when a design is
|
|
31
|
+
// ambiguous or the element/widget/page boundaries aren't obvious (a blind,
|
|
32
|
+
// different-model rubric judge picks the winner). Override only on explicit
|
|
33
|
+
// request: `--no-competition` → 0, `--competition N` (2-5) → N.
|
|
34
34
|
}
|
|
35
35
|
}
|
|
36
36
|
```
|
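The comment in the command file above describes a mapping from user flags to the workflow's `competition` argument (`--no-competition` → 0, `--competition N` with N in 2–5 → N, otherwise leave it unset so the workflow decides). A possible sketch of that mapping follows. `competitionArgs` is a hypothetical helper; the real command leaves this step to the invoker, and silently ignoring an out-of-range N here is an assumption.

```javascript
"use strict";

// Hypothetical helper: derive the optional `competition` arg from raw user input.
// Returns {} when no valid override is present, which means "auto".
function competitionArgs(userInput) {
  const tokens = String(userInput).trim().split(/\s+/).filter(Boolean);

  if (tokens.includes("--no-competition")) return { competition: 0 };

  const i = tokens.indexOf("--competition");
  if (i !== -1) {
    const n = Number(tokens[i + 1]);
    if (Number.isInteger(n) && n >= 2 && n <= 5) return { competition: n };
  }
  return {};
}

// Spread into the args object only when an override was requested.
const userInput = "designs/home.png --competition 3";
const args = {
  phase: "design-decompose",
  projectDir: ".",
  userInput,
  ...competitionArgs(userInput),
};
console.log(args);
```

Returning an empty object for the default case keeps `competition` absent rather than `undefined` or `null`, matching the "do NOT pass `competition` by default" instruction.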
package/commands/gsd-t-help.md
CHANGED
|
@@ -481,7 +481,7 @@ Use these when user asks for help on a specific command:
|
|
|
481
481
|
|
|
482
482
|
### competition-judge (M82)
|
|
483
483
|
- **Summary**: The selection oracle for Competition Mode (generate-and-judge — the *generative* dual of the orthogonal validation triad). Two modes: `--kind partition` scores candidate domain decompositions via the file-disjointness oracle (parallelGroups / waveDepth / validity — a calculator, not an LLM critic, so it's immune to judge bias); `--kind generic` is a deterministic rubric selector that finalizes a winner from rubric scores an upstream blind/different-model judge supplied.
|
|
484
|
-
- **Auto-invoked**: Yes —
|
|
484
|
+
- **Auto-invoked**: Yes — AUTOMATICALLY (M84). On an eligible phase (partition / milestone / discuss / design-decompose), `gsd-t-phase.workflow.js` runs an Opus solution-space probe at phase start and self-decides whether to fan out 3 producers + this judge (biased toward competing). No flag needed; override with `--competition N` (force N) or `--no-competition` (force off).
|
|
485
485
|
- **Files**: `bin/gsd-t-competition-judge.cjs` (reuses `bin/gsd-t-file-disjointness.cjs`).
|
|
486
486
|
- **Use when**: Upstream, pre-contract, wide-solution-space decisions where the cost of a single draft is high (partition, milestone decomposition, ambiguous design decomposition). Never on post-contract phases (execute/verify/etc.) — those are owned by the adversarial triad.
|
|
487
487
|
- **CLI**: `gsd-t competition-judge [--in <spec.json>] [--project-dir <dir>]` (spec via stdin or `--in`). Exit 0 winner · 4 no valid candidate · 64 bad input.
|
|
@@ -495,6 +495,13 @@ Use these when user asks for help on a specific command:
|
|
|
495
495
|
- **CLI**: `gsd-t traceability-gate [--milestone <Mxx>] [--project-dir <dir>] [--tasks <file>]`. Exit 0 all traceable · 4 ≥1 untraceable AC (blocks execute) · 64 no tasks files.
|
|
496
496
|
- **Contract**: `.gsd-t/contracts/plan-hardening-contract.md` v1.0.0 STABLE.
|
|
497
497
|
|
|
498
|
+
### model-tier-policy (M85)
|
|
499
|
+
- **Summary**: SINGLE source of truth for GSD-T model-tier assignments. Publishes the authoritative tier set (haiku/sonnet/opus/fable), the 7 designated stage→tier mappings, and the `requiresThinkingOmitted(model)` predicate (encoding Fable's HTTP-400 breaking change ONCE). M85 slots the Fable tier into the 5 highest-leverage stages (solution-space probe, partition probe, competition judge, pre-mortem, Red Team) where one call's judgment gates the most downstream spend; competition producers STAY opus (M82 blindness invariant); debug cycle-1→opus, cycle-2→fable. A M71-family lint (`test/m85-workflow-tier-policy-lint.test.js`) proves every workflow `model:` literal matches the policy — a drifted literal FAILS the lint (mandatory negative test).
|
|
500
|
+
- **Files**: `bin/gsd-t-model-tier-policy.cjs` (zero external deps — installer invariant).
|
|
501
|
+
- **Use when**: Any phase that needs to resolve a concrete model id from a stage key at invoke time (M69 pattern). Workflows NEVER `require` this module (sandbox ban) — they use hard-coded tier alias literals the lint proves match the policy.
|
|
502
|
+
- **CLI**: `gsd-t model-tier-policy resolve <stageKey> [--json]`. Emits `{ok, stageKey, tier, model, requiresThinkingOmitted}`. Exit 0 resolved · 1 unknown stage key.
|
|
503
|
+
- **Contract**: `.gsd-t/contracts/model-tier-policy-contract.md` v1.0.0 STABLE.
|
|
504
|
+
|
|
498
505
|
## Unknown Command
|
|
499
506
|
|
|
500
507
|
If user asks for help on unrecognized command:
|
|
@@ -25,17 +25,17 @@ Read `.gsd-t/progress.md` (current version + completed milestones), `docs/requir
|
|
|
25
25
|
args: {
|
|
26
26
|
phase: "milestone",
|
|
27
27
|
projectDir: ".",
|
|
28
|
-
userInput: "$ARGUMENTS"
|
|
29
|
-
//
|
|
30
|
-
//
|
|
31
|
-
//
|
|
32
|
-
//
|
|
33
|
-
|
|
28
|
+
userInput: "$ARGUMENTS"
|
|
29
|
+
// M84 Competition Mode is AUTOMATIC — do NOT pass `competition` by default.
|
|
30
|
+
// The workflow probes (fable as of M85) and self-decides; milestone decomposition is the
|
|
31
|
+
// highest-altitude decision, so it competes whenever ≥2 genuinely different
|
|
32
|
+
// strategies (risk-first / value-first / dependency-first) exist. Override only
|
|
33
|
+
// on explicit request: `--no-competition` → 0, `--competition N` (2-5) → N.
|
|
34
34
|
}
|
|
35
35
|
}
|
|
36
36
|
```
|
|
37
37
|
|
|
38
|
-
**Competition Mode (
|
|
38
|
+
**Competition Mode (automatic).** Milestone decomposition auto-competes when the probe finds ≥2 genuinely different strategies. Because a decomposition is a *coupled thesis*, the judge selects one winner whole (pick-one) and salvages only non-overlapping good line-items from the losers — it never Frankensteins. No flag needed; override with `--no-competition` / `--competition N` on explicit request. See `.gsd-t/contracts/competition-mode-contract.md`.
|
|
39
39
|
|
|
40
40
|
## Step 3: Interpret the result
|
|
41
41
|
|
|
@@ -30,17 +30,17 @@ Call the `Workflow` tool with:
|
|
|
30
30
|
phase: "partition",
|
|
31
31
|
milestone: "M{NN}",
|
|
32
32
|
projectDir: ".",
|
|
33
|
-
userInput: "$ARGUMENTS"
|
|
34
|
-
//
|
|
35
|
-
//
|
|
36
|
-
//
|
|
37
|
-
//
|
|
38
|
-
competition:
|
|
33
|
+
userInput: "$ARGUMENTS"
|
|
34
|
+
// M84 Competition Mode is AUTOMATIC — do NOT pass `competition` by default.
|
|
35
|
+
// The workflow runs a solution-space probe and self-decides whether to fan out
|
|
36
|
+
// N candidate partitions (judged by the file-disjointness oracle). Only pass an
|
|
37
|
+
// override if the user explicitly asked: `--competition 0`/`--no-competition`
|
|
38
|
+
// → competition: 0; `--competition N` (2-5) → competition: N.
|
|
39
39
|
}
|
|
40
40
|
}
|
|
41
41
|
```
|
|
42
42
|
|
|
43
|
-
**Competition Mode (
|
|
43
|
+
**Competition Mode (automatic).** Partition auto-competes when the workflow's probe finds ≥2 genuinely different ways to carve the domains; the objective file-disjointness oracle judges the candidates and picks the most-parallelizable valid one. No flag needed. Override only on explicit request: `/gsd-t-partition --no-competition` (force single draft) or `--competition N` (force N). See `.gsd-t/contracts/competition-mode-contract.md`.
|
|
44
44
|
|
|
45
45
|
## Step 3: Interpret the result
|
|
46
46
|
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@tekyzinc/gsd-t",
|
|
3
|
-
"version": "4.
|
|
3
|
+
"version": "4.4.10",
|
|
4
4
|
"description": "GSD-T: Contract-Driven Development for Claude Code — 54 slash commands with headless-by-default workflow spawning, unattended supervisor relay with event stream, graph-powered code analysis, real-time agent dashboard, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
|
|
5
5
|
"author": "Tekyz, Inc.",
|
|
6
6
|
"license": "MIT",
|
|
@@ -0,0 +1,58 @@
|
|
|
1
|
+
#!/usr/bin/env bash
|
|
2
|
+
# gsd-t-ctx-cue.sh — GSD-T low-context visual cue (M85)
|
|
3
|
+
#
|
|
4
|
+
# A Stop hook: fires mechanically at the end of EVERY turn. Computes remaining
|
|
5
|
+
# context window % from the current session's JSONL (the same source the status
|
|
6
|
+
# line uses) and, when it drops below the threshold, prints a STRONG red banner
|
|
7
|
+
# so the user knows to checkpoint (/gsd-t-pause) and /clear before compaction.
|
|
8
|
+
#
|
|
9
|
+
# Deterministic by design — does not rely on the model remembering to check
|
|
10
|
+
# (per feedback_deterministic_orchestration). Synchronous (NOT async) so its
|
|
11
|
+
# stdout reaches the terminal. Fails silently/open on any error — a status cue
|
|
12
|
+
# must never block or break a turn.
|
|
13
|
+
#
|
|
14
|
+
# Threshold: default 40 (% left). Override with GSD_T_CTX_CUE_THRESHOLD.
|
|
15
|
+
# Window: 1,000,000 (Opus 4.7/4.8 + Sonnet 4.x); 200,000 (Haiku).
|
|
16
|
+
|
|
17
|
+
set -o pipefail
|
|
18
|
+
THRESHOLD="${GSD_T_CTX_CUE_THRESHOLD:-40}"
|
|
19
|
+
|
|
20
|
+
# The hook receives the same JSON on stdin that other hooks do.
|
|
21
|
+
input=$(cat 2>/dev/null)
|
|
22
|
+
|
|
23
|
+
cwd=$(printf '%s' "$input" | jq -r '.workspace.current_dir // .cwd // ""' 2>/dev/null)
|
|
24
|
+
[ -z "$cwd" ] && cwd="$PWD"
|
|
25
|
+
model=$(printf '%s' "$input" | jq -r '.model.id // ""' 2>/dev/null)
|
|
26
|
+
|
|
27
|
+
# Only act inside GSD-T projects (a .gsd-t dir present) — the cue is GSD-T's.
|
|
28
|
+
[ -d "${cwd}/.gsd-t" ] || exit 0
|
|
29
|
+
|
|
30
|
+
proj_slug=$(printf '%s' "$cwd" | sed 's:/:-:g')
|
|
31
|
+
sess_dir="$HOME/.claude/projects/$proj_slug"
|
|
32
|
+
[ -d "$sess_dir" ] || exit 0
|
|
33
|
+
latest=$(ls -t "$sess_dir"/*.jsonl 2>/dev/null | head -1)
|
|
34
|
+
[ -n "$latest" ] || exit 0
|
|
35
|
+
|
|
36
|
+
case "$model" in
|
|
37
|
+
*haiku*) win=200000 ;;
|
|
38
|
+
*) win=1000000 ;;
|
|
39
|
+
esac
|
|
40
|
+
|
|
41
|
+
used=$(grep '"usage"' "$latest" 2>/dev/null | tail -1 \
|
|
42
|
+
| jq -r '(.message.usage // {}) | (.input_tokens//0)+(.cache_creation_input_tokens//0)+(.cache_read_input_tokens//0)' 2>/dev/null)
|
|
43
|
+
[ -n "$used" ] && [ "$used" -gt 0 ] 2>/dev/null || exit 0
|
|
44
|
+
|
|
45
|
+
pct=$(awk -v u="$used" -v w="$win" 'BEGIN { printf "%d", (100 - (u / w * 100)) + 0.5 }')
|
|
46
|
+
|
|
47
|
+
# Above threshold → silent (no cue).
|
|
48
|
+
[ "$pct" -lt "$THRESHOLD" ] 2>/dev/null || exit 0
|
|
49
|
+
|
|
50
|
+
# ── Strong red banner ──────────────────────────────────────────────────────
|
|
51
|
+
RED=$'\033[1;37;41m' # bold white on red
|
|
52
|
+
RST=$'\033[0m'
|
|
53
|
+
BAR="████████████████████████████████████████"
|
|
54
|
+
printf '\n%s %s %s\n' "$RED" "$BAR" "$RST"
|
|
55
|
+
printf '%s ⚠ CONTEXT LOW — %d%% LEFT %s\n' "$RED" "$pct" "$RST"
|
|
56
|
+
printf '%s Checkpoint now: /gsd-t-pause → /clear → /gsd-t-resume %s\n' "$RED" "$RST"
|
|
57
|
+
printf '%s %s %s\n\n' "$RED" "$BAR" "$RST"
|
|
58
|
+
exit 0
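The hook's arithmetic can be restated outside of shell. The following is a minimal JavaScript sketch of the same calculation, not the shipped implementation: `contextPercentLeft` and `shouldShowCue` are hypothetical names, and the window sizes and default threshold are copied from the script above.

```javascript
// Sketch only: mirrors the hook's arithmetic. Function names are hypothetical.
function contextPercentLeft(usage, modelId) {
  // As in the script: any model id containing "haiku" gets a 200k window,
  // everything else is assumed to have a 1M window.
  const windowSize = /haiku/.test(modelId) ? 200000 : 1000000;
  const used =
    (usage.input_tokens || 0) +
    (usage.cache_creation_input_tokens || 0) +
    (usage.cache_read_input_tokens || 0);
  // The hook exits silently when no positive usage figure is available.
  if (used <= 0) return null;
  // awk's printf "%d" truncates toward zero after 0.5 is added.
  return Math.trunc(100 - (used / windowSize) * 100 + 0.5);
}

// The banner is printed only when the percentage is strictly below the threshold.
function shouldShowCue(pct, threshold = 40) {
  return pct !== null && pct < threshold;
}
```

Because the comparison is strict, a session sitting at exactly the threshold stays silent; the cue appears one percentage point later.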
|
|
@@ -0,0 +1,119 @@
|
|
|
1
|
+
#!/usr/bin/env bash
|
|
2
|
+
# Claude Code status line — GSD-T project status bar (CANONICAL SOURCE)
|
|
3
|
+
#
|
|
4
|
+
# This is the SHIPPED source of truth for the GSD-T status line. The installer
|
|
5
|
+
# copies it to ~/.claude/statusline-command.sh and wires the `statusLine` setting
|
|
6
|
+
# to it, so edits here survive `gsd-t install` / `update` / `update-all`.
|
|
7
|
+
# (Supersedes scripts/gsd-t-statusline.js, whose context source was retired in M61.)
|
|
8
|
+
#
|
|
9
|
+
# Layout (M85):
|
|
10
|
+
# Line 1: [GSD-T] | vX.Y.ZZ | ctx N% left | project | git branch | model id | HH:MM TZ
|
|
11
|
+
# Line 2: the full milestone/Status string (wraps to its own row instead of
|
|
12
|
+
# being truncated with a trailing "…" at terminal width).
|
|
13
|
+
set -o pipefail
|
|
14
|
+
|
|
15
|
+
input=$(cat)
|
|
16
|
+
|
|
17
|
+
# --- 1. [GSD-T] prefix (bright cyan when ANSI available) ---
|
|
18
|
+
PREFIX=$'\033[1;36m[GSD-T]\033[0m'
|
|
19
|
+
|
|
20
|
+
# --- 1b. GSD-T version — the installed framework version, project-independent.
|
|
21
|
+
# Source from ~/.claude/.gsd-t-version (written by the installer/update-all);
|
|
22
|
+
# fall back to the global package.json, then to empty (field omitted). ---
|
|
23
|
+
gsdt_version=""
|
|
24
|
+
if [ -f "$HOME/.claude/.gsd-t-version" ]; then
|
|
25
|
+
gsdt_version=$(tr -d '[:space:]' < "$HOME/.claude/.gsd-t-version" 2>/dev/null)
|
|
26
|
+
fi
|
|
27
|
+
if [ -z "$gsdt_version" ] && command -v gsd-t >/dev/null 2>&1; then
|
|
28
|
+
gsdt_version=$(gsd-t --version 2>/dev/null | tr -d '[:space:]')
|
|
29
|
+
fi
|
|
30
|
+
[ -n "$gsdt_version" ] && gsdt_version="v${gsdt_version#v}"
|
|
31
|
+
|
|
32
|
+
# --- 2. Project name — basename of cwd from JSON ---
|
|
33
|
+
cwd=$(printf '%s' "$input" | jq -r '.workspace.current_dir // .cwd // ""')
|
|
34
|
+
project=""
|
|
35
|
+
if [ -n "$cwd" ]; then
|
|
36
|
+
project=$(basename "$cwd")
|
|
37
|
+
fi
|
|
38
|
+
|
|
39
|
+
# --- 3. Milestone + phase — single grep of .gsd-t/progress.md ---
|
|
40
|
+
milestone=""
|
|
41
|
+
if [ -n "$cwd" ] && [ -f "${cwd}/.gsd-t/progress.md" ]; then
|
|
42
|
+
milestone=$(grep -m1 '^## Status:' "${cwd}/.gsd-t/progress.md" | sed 's/^## Status:[[:space:]]*//' | tr -d '\r')
|
|
43
|
+
fi
|
|
44
|
+
|
|
45
|
+
# --- 4. Git branch (skip gracefully if not a repo) ---
|
|
46
|
+
branch=""
|
|
47
|
+
if [ -n "$cwd" ]; then
|
|
48
|
+
branch=$(git -C "$cwd" rev-parse --abbrev-ref HEAD 2>/dev/null || true)
|
|
49
|
+
fi
|
|
50
|
+
|
|
51
|
+
# --- 5. Model id ---
|
|
52
|
+
model=$(printf '%s' "$input" | jq -r '.model.id // ""')
|
|
53
|
+
|
|
54
|
+
# --- 6. Context window % left (M61: read latest usage envelope from Claude
|
|
55
|
+
# Code's session JSONL; falls back silently if unreadable).
|
|
56
|
+
# Window: 1,000,000 for Opus 4.7/4.8 + Sonnet 4.x; 200,000 for Haiku.
|
|
57
|
+
# Computed as input_tokens + cache_creation_input_tokens +
|
|
58
|
+
# cache_read_input_tokens to capture the full window pressure. ---
|
|
59
|
+
ctx_left=""
|
|
60
|
+
if [ -n "$cwd" ]; then
|
|
61
|
+
proj_slug=$(printf '%s' "$cwd" | sed 's:/:-:g')
|
|
62
|
+
sess_dir="$HOME/.claude/projects/$proj_slug"
|
|
63
|
+
if [ -d "$sess_dir" ]; then
|
|
64
|
+
latest_jsonl=$(ls -t "$sess_dir"/*.jsonl 2>/dev/null | head -1)
|
|
65
|
+
if [ -n "$latest_jsonl" ]; then
|
|
66
|
+
# Window size by model family. Haiku = 200k; everything else = 1M.
|
|
67
|
+
case "$model" in
|
|
68
|
+
*haiku*) win=200000 ;;
|
|
69
|
+
*) win=1000000 ;;
|
|
70
|
+
esac
|
|
71
|
+
# Grab the last "usage" record in the file and sum input fields.
|
|
72
|
+
used=$(grep '"usage"' "$latest_jsonl" 2>/dev/null \
|
|
73
|
+
| tail -1 \
|
|
74
|
+
| jq -r '
|
|
75
|
+
(.message.usage // {})
|
|
76
|
+
| (.input_tokens // 0)
|
|
77
|
+
+ (.cache_creation_input_tokens // 0)
|
|
78
|
+
+ (.cache_read_input_tokens // 0)
|
|
79
|
+
' 2>/dev/null)
|
|
80
|
+
if [ -n "$used" ] && [ "$used" -gt 0 ] 2>/dev/null; then
|
|
81
|
+
ctx_left=$(awk -v u="$used" -v w="$win" \
|
|
82
|
+
'BEGIN { p = 100 - (u / w * 100); printf "ctx %d%% left", (p + 0.5) }')
|
|
83
|
+
fi
|
|
84
|
+
fi
|
|
85
|
+
fi
|
|
86
|
+
fi
|
|
87
|
+
|
|
88
|
+
# --- 7. Local time ---
|
|
89
|
+
timestamp=$(date +"%H:%M %Z")
|
|
90
|
+
|
|
91
|
+
# --- Assemble ---
|
|
92
|
+
# Line 1: short fields only. ctx% sits right after the version (per user).
|
|
93
|
+
# The verbose milestone status moves to line 2 so it wraps instead of
|
|
94
|
+
# being truncated with a trailing "…" at terminal width.
|
|
95
|
+
parts=("$PREFIX")
|
|
96
|
+
[ -n "$gsdt_version" ] && parts+=("$gsdt_version")
|
|
97
|
+
[ -n "$ctx_left" ] && parts+=("$ctx_left")
|
|
98
|
+
[ -n "$project" ] && parts+=("$project")
|
|
99
|
+
[ -n "$branch" ] && parts+=("$branch")
|
|
100
|
+
[ -n "$model" ] && parts+=("$model")
|
|
101
|
+
parts+=("$timestamp")
|
|
102
|
+
|
|
103
|
+
# Join line 1 with " | "
|
|
104
|
+
line1=""
|
|
105
|
+
for part in "${parts[@]}"; do
|
|
106
|
+
if [ -z "$line1" ]; then
|
|
107
|
+
line1="$part"
|
|
108
|
+
else
|
|
109
|
+
line1="${line1} | ${part}"
|
|
110
|
+
fi
|
|
111
|
+
done
|
|
112
|
+
|
|
113
|
+
# Line 2: the milestone/status string on its own line (Claude Code renders \n as a
|
|
114
|
+
# second status row). Omitted when there's no milestone status.
|
|
115
|
+
if [ -n "$milestone" ]; then
|
|
116
|
+
printf '%s\n%s' "$line1" "$milestone"
|
|
117
|
+
else
|
|
118
|
+
printf '%s' "$line1"
|
|
119
|
+
fi
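The assembly step above (optional fields dropped when empty, the rest joined with ` | `, the milestone status on a second row) can be sketched in JavaScript as follows. This is an illustration under assumptions: `renderStatusLine` is a hypothetical name, the field names follow the script's variables, and the ANSI colouring of the prefix is omitted.

```javascript
// Sketch only: hypothetical function; field names follow the shell variables.
function renderStatusLine({ version, ctxLeft, project, branch, model, timestamp, milestone }) {
  // Optional fields are skipped when empty, as the [ -n ... ] guards do.
  const fields = ["[GSD-T]", version, ctxLeft, project, branch, model].filter(
    (f) => f !== undefined && f !== null && f !== ""
  );
  // The timestamp is always appended.
  fields.push(timestamp);
  const line1 = fields.join(" | ");
  // The milestone status goes on its own row so it wraps instead of truncating.
  return milestone ? `${line1}\n${milestone}` : line1;
}
```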
|
|
@@ -295,7 +295,7 @@ After the E2E suite, `gsd-t-verify` Step 4.5 runs `gsd-t test-data --purge --run
|
|
|
295
295
|
Every code-producing phase ends with `gsd-t-verify.workflow.js`, which runs three orthogonal validators as `parallel()` `agent()` stages with schema-validated output. Per `.gsd-t/contracts/orthogonal-validation-contract.md` v1.0.0 STABLE, they are declared orthogonal objective functions — no collapse, no substitution, no transitive trust.
|
|
296
296
|
|
|
297
297
|
- **`/code-review ultra`** — cooperative correctness + cleanup. Severity: `important` / `nit` / `pre-existing`. Skippable via `args.skipUltra=true` + `args.skipUltraReason`. `skipUltra=true` is INELIGIBLE for `VERIFIED`.
|
|
298
|
-
- **Red Team** — adversarial / security / boundaries. Non-skippable. Protocol: `templates/prompts/red-team-subagent.md`. Verdict: `FAIL` (any CRITICAL or HIGH bug — blocks completion) or `GRUDGING-PASS` (exhaustive search, nothing found). CRITICAL/HIGH bugs get up to 2 fix cycles before deferral. Runs on `model: "
|
|
298
|
+
- **Red Team** — adversarial / security / boundaries. Non-skippable. Protocol: `templates/prompts/red-team-subagent.md`. Verdict: `FAIL` (any CRITICAL or HIGH bug — blocks completion) or `GRUDGING-PASS` (exhaustive search, nothing found). CRITICAL/HIGH bugs get up to 2 fix cycles before deferral. Runs on `model: "fable"` (M85).
|
|
299
299
|
- **QA** — test execution + shallow-test detection + contract compliance. Non-skippable. Protocol: `templates/prompts/qa-subagent.md`. Writes ZERO feature code. Any shallow E2E test blocks phase completion. Runs on `model: "sonnet"`.
|
|
300
300
|
|
|
301
301
|
When `.gsd-t/contracts/design-contract.md` or `.gsd-t/contracts/design/` exists, a fourth stage runs Design Verification (protocol: `templates/prompts/design-verify-subagent.md`) — opens a browser, compares the build against the design, returns a structured element-by-element MATCH/DEVIATION schema. Deviations block completion.
|
|
@@ -304,12 +304,13 @@ Synthesis stage merges results without category collapse. Verdict: `VERIFIED` /
|
|
|
304
304
|
|
|
305
305
|
## Model Display (MANDATORY)
|
|
306
306
|
|
|
307
|
-
**Each Workflow `agent()` call declares its model explicitly** via the `model:` option (`"haiku"` / `"sonnet"` / `"opus"`). The Workflow runtime emits a `⚙ [{model}] {label}` line per stage in `/workflows`, giving the user real-time visibility into which model handles each operation.
|
|
307
|
+
**Each Workflow `agent()` call declares its model explicitly** via the `model:` option (`"haiku"` / `"sonnet"` / `"opus"` / `"fable"`). The Workflow runtime emits a `⚙ [{model}] {label}` line per stage in `/workflows`, giving the user real-time visibility into which model handles each operation.
|
|
308
308
|
|
|
309
309
|
**Model assignments:**
|
|
310
310
|
- `model: "haiku"` — strictly mechanical tasks: run test suites and report counts, check file existence, validate JSON structure, branch guard checks
|
|
311
311
|
- `model: "sonnet"` — mid-tier reasoning: routine code changes, standard refactors, test writing, QA evaluation, straightforward synthesis
|
|
312
|
-
- `model: "opus"` — high-stakes reasoning: architecture decisions, security analysis, complex debugging, cross-module refactors,
|
|
312
|
+
- `model: "opus"` — high-stakes reasoning: architecture decisions, security analysis, complex debugging, cross-module refactors, quality judgment on critical paths
|
|
313
|
+
- `model: "fable"` — highest-stakes calls where one judgment gates the most downstream spend (M85): solution-space probe, partition probe, competition judge, pre-mortem, Red Team. Competition producers STAY `opus` (M82 blindness invariant — judge must differ from producers). Debug cycle-1 → `opus`, cycle-2 → `fable` (escalation). **Single source of truth for tier assignments:** `bin/gsd-t-model-tier-policy.cjs` + `.gsd-t/contracts/model-tier-policy-contract.md` v1.0.0 STABLE. The M71-family lint (`test/m85-workflow-tier-policy-lint.test.js`) proves every workflow `model:` literal matches the policy and a drifted literal FAILS the lint (mandatory negative test).
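To make the drift check concrete, here is a minimal JavaScript sketch of a literal scan. It is an assumption about the general shape of such a lint, not the shipped test: `findDriftedModelLiterals` is a hypothetical helper, and it only checks that each `model: "..."` literal is one of the four tier aliases named above, whereas the real lint is described as checking literals against the per-stage policy.

```javascript
// Sketch only: hypothetical helper. The shipped lint
// (test/m85-workflow-tier-policy-lint.test.js) may work differently.
const KNOWN_TIERS = new Set(["haiku", "sonnet", "opus", "fable"]);

function findDriftedModelLiterals(source, knownTiers = KNOWN_TIERS) {
  const drifted = [];
  // Matches simple literals such as: model: "opus"
  const pattern = /model:\s*"([^"]+)"/g;
  for (const match of source.matchAll(pattern)) {
    if (!knownTiers.has(match[1])) drifted.push(match[1]);
  }
  return drifted;
}
```

A scan this simple does not see computed values such as `model: cycle === 1 ? "opus" : "fable"`, so a real lint would need to handle that form separately.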
|
|
313
314
|
|
|
314
315
|
**Context budget:** Workflow scripts receive a `budget` global (`budget.total`, `budget.spent()`, `budget.remaining()`) tied to the user's per-turn token target. Use it for dynamic loops (`while (budget.total && budget.remaining() > 50_000) { ... }`) or to scale fleet size. Opus 4.7/4.8 ship 1M context windows; the legacy meter at `bin/token-budget.cjs` was retired in M61 — use native `/context` for live in-session usage.
|
|
315
316
|
|
|
@@ -328,7 +329,7 @@ Canonical scripts:
|
|
|
328
329
|
- `gsd-t-integrate.workflow.js` — cross-domain wire-up + light verify-gate
|
|
329
330
|
- `gsd-t-debug.workflow.js` — 2-cycle diagnose/fix/verify (CLAUDE.md Prime Rule)
|
|
330
331
|
- `gsd-t-quick.workflow.js` — preflight + brief + single-task + verify-gate (M56-D4)
|
|
331
|
-
- `gsd-t-phase.workflow.js` — generic upper-stage runner (partition / plan / discuss / impact / milestone / prd / design-decompose / doc-ripple). **M82 Competition Mode
|
|
332
|
+
- `gsd-t-phase.workflow.js` — generic upper-stage runner (partition / plan / discuss / impact / milestone / prd / design-decompose / doc-ripple). **M82/M84 Competition Mode (AUTOMATIC):** on eligible upstream phases (partition / milestone / discuss / design-decompose) a solution-space probe (`fable` as of M85) runs at phase start and self-decides whether to compete (biased toward competing — a better upstream artifact lowers total downstream cost); when it fires, 3 parallel Self-MoA producers → a judge stage → a finalizer. No flag needed; override with `competition: N` / `competition: 0` / `noCompetition: true`. Partition's judge is the OBJECTIVE file-disjointness oracle (`gsd-t competition-judge --kind partition` — a calculator, not an LLM critic, immune to judge bias, the v1 beachhead); subjective phases use a blind + shuffled + different-model + rubric judge whose pick is finalized deterministically by `--kind generic`. The generative dual of the orthogonal validation triad; watershed rule = generate-and-judge ABOVE the contract, attack-and-filter BELOW. Default: automatic (the probe decides when `competition` is unset). Contract: `competition-mode-contract.md` v1.0.0. **M83 Plan Hardening:** the `plan` phase runs two blocking gates before execute — a deterministic acceptance-traceability gate (`gsd-t traceability-gate`: every AC binds to a code path + a killing test; the `Headline:` task needs both impl and test) and an adversarial pre-mortem agent (`fable` as of M85, fresh-context, protocol `pre-mortem-subagent.md`: predicts edge-case/dead-deliverable/NFR failures, each → a required test). The temporal dual of the Red Team (attack the design at plan, not just code at verify). Contract: `plan-hardening-contract.md` v1.0.0.
|
|
332
333
|
- `gsd-t-scan.workflow.js` — preflight → volume-probe → pipeline(per-slice deep finder → single verify) → synthesis → document → render (M66: fans out by codebase VOLUME, not a fixed 5-teammate dimension count; M67: deep document phase deterministically produces the full living-doc set + dimension files, per-doc fan-out)
|
|
333
334
|
|
|
334
335
|
**Runtime-native invariant (M81 — v4.0.29+):** the Workflow sandbox provides ONLY `agent/parallel/pipeline/log/phase/budget/args` — NO `require`/`fs`/`path`/`child_process`/`process`, and `args` arrives as a JSON STRING. Each workflow is self-contained: it `JSON.parse`s `args` and delegates every CLI call (preflight, verify-gate, brief, build-coverage, ci-parity, test-data, disjointness) to inline `async` helpers that run the command via an `agent()`'s Bash (preferring project-local `bin/<tool>.cjs`, else the global `gsd-t` PATH binary) and parse the JSON envelope — preserving the M55-D5 project-local-bin invariant. The old `require("./_lib.js")` pattern threw `ReferenceError` on first eval and silently broke every workflow except scan (TD-113, fixed M81); `_lib.js` is retired as a workflow dependency.
|
|
@@ -94,7 +94,7 @@ for (let cycle = 1; cycle <= 2; cycle++) {
|
|
|
94
94
|
label: `debug-cycle-${cycle}`,
|
|
95
95
|
phase: `Cycle ${cycle}`,
|
|
96
96
|
schema: DEBUG_CYCLE_SCHEMA,
|
|
97
|
-
model: "opus",
|
|
97
|
+
model: cycle === 1 ? "opus" : "fable",
|
|
98
98
|
}).catch((e) => ({
|
|
99
99
|
resolved: false,
|
|
100
100
|
rootCause: `agent error: ${e && e.message}`,
|
|
@@ -37,8 +37,13 @@ export const meta = {
|
|
|
37
37
|
name: "gsd-t-phase",
|
|
38
38
|
description: "Generic upper-stage phase runner (partition/plan/discuss/etc.)",
|
|
39
39
|
phases: [
|
|
40
|
-
{ title: "Preflight",
|
|
41
|
-
{ title: "
|
|
40
|
+
{ title: "Preflight", detail: "preflight + brief" },
|
|
41
|
+
{ title: "Probe", detail: "M84 auto-competition solution-space probe (opus; eligible phases only)" },
|
|
42
|
+
{ title: "Compete", detail: "M82/M84 N parallel producers (when competition fires)" },
|
|
43
|
+
{ title: "Judge", detail: "select/synthesize the winning candidate" },
|
|
44
|
+
{ title: "Phase", detail: "primary agent (or finalizer) with phase-specific protocol" },
|
|
45
|
+
{ title: "Finalize", detail: "commit the winning approach (competition path)" },
|
|
46
|
+
{ title: "Plan Hardening", detail: "M83 traceability gate + adversarial pre-mortem (plan phase only)" },
|
|
42
47
|
],
|
|
43
48
|
};
|
|
44
49
|
|
|
@@ -128,9 +133,77 @@ async function runCompetitionJudge(projectDir, spec, label = "judge", phaseNameO
|
|
|
128
133
|
}
|
|
129
134
|
|
|
130
135
|
// Phases where competition pays off (wide solution space, pre-contract, high blast
|
|
131
|
-
// radius).
|
|
136
|
+
// radius). Competition is AUTOMATIC on these (M84) — the workflow probes the
|
|
137
|
+
// solution space and self-decides; on any other phase it never runs.
|
|
132
138
|
const COMPETITION_ELIGIBLE = new Set(["partition", "milestone", "discuss", "design-decompose"]);
|
|
133
139
|
|
|
140
|
+
// M84: the solution-space probe. Decides AUTOMATICALLY whether a phase is
|
|
141
|
+
// competition-worthy (≥2 genuinely different viable approaches). This is a
|
|
142
|
+
// high-level reasoning step — NOT a mechanical check — so it runs on FABLE (M85), not
|
|
143
|
+
// haiku (a weak probe forfeits the whole point: it gates a 3× competition whose
|
|
144
|
+
// upstream cost buys down far larger downstream cost). It is BIASED TOWARD
|
|
145
|
+
// COMPETING: when uncertain, compete — because a better artifact upstream makes
|
|
146
|
+
// every downstream phase (pre-mortem, execute, verify) cheaper and more likely to
|
|
147
|
+
// pass first time, so the expected savings usually exceed the 3× probe-and-produce
|
|
148
|
+
// cost. Returns { compete: bool, reason, approaches? }.
|
|
149
|
+
//
|
|
150
|
+
// Partition has its OWN probe (runPartitionProbe, also fable): the disjointness
|
|
151
|
+
// oracle can't decide before candidates exist, so a fable probe makes the
|
|
152
|
+
// compete/skip call and the oracle JUDGES the candidates afterward. This
|
|
153
|
+
// (runSolutionSpaceProbe) is for the other subjective phases.
|
|
154
|
+
const _PROBE_SCHEMA = {
|
|
155
|
+
type: "object", required: ["compete"], additionalProperties: true,
|
|
156
|
+
properties: {
|
|
157
|
+
compete: { type: "boolean" },
|
|
158
|
+
reason: { type: "string" },
|
|
159
|
+
approaches: { type: "array", items: { type: "string" } },
|
|
160
|
+
},
|
|
161
|
+
};
|
|
162
|
+
async function runSolutionSpaceProbe(projectDir, phaseName, { milestone, briefPath, userInput, phaseNameOpt } = {}) {
|
|
163
|
+
const prompt = [
|
|
164
|
+
`You are the Solution-Space Probe for the ${phaseName} phase${milestone ? ` of ${milestone}` : ""}. Decide ONE thing: should this phase generate MULTIPLE competing candidates (then a judge picks the best), or is a single draft sufficient?`,
|
|
165
|
+
`**Brief:** ${briefPath || "(none — read the relevant .gsd-t docs/contracts/requirements directly)"}`,
|
|
166
|
+
userInput ? `\nUser input:\n${userInput}\n` : "",
|
|
167
|
+
`Compete WHEN there are ≥2 genuinely DIFFERENT, viable approaches whose trade-offs matter — different architectures, decomposition strategies, data models, sequencing, or design directions that a reasonable expert could disagree about. List them in "approaches".`,
|
|
168
|
+
`Do NOT compete only when there is ONE obvious correct approach and any variation would be cosmetic.`,
|
|
169
|
+
`BIAS TOWARD COMPETING: if you are uncertain, or can name even two plausibly-different approaches, choose compete=true. A wasted competition costs ~3× this one phase; a missed-better-approach costs far more downstream (more pre-mortem blocks, more bugs, more verify cycles). Err on the side of generating options.`,
|
|
170
|
+
`Return JSON per the schema: { "compete": true|false, "reason": "<one sentence>", "approaches": ["<a>","<b>",...] }.`,
|
|
171
|
+
].filter(Boolean).join("\n");
|
|
172
|
+
const opts = { label: "solution-space-probe", schema: _PROBE_SCHEMA, model: "fable" };
|
|
173
|
+
if (phaseNameOpt) opts.phase = phaseNameOpt;
|
|
174
|
+
const r = await agent(prompt, opts).catch(() => null);
|
|
175
|
+
// Probe failure → bias toward competing (fail-toward-options, per the cost logic).
|
|
176
|
+
if (!r || typeof r.compete !== "boolean") {
|
|
177
|
+
return { compete: true, reason: "probe unavailable — defaulting to compete (bias toward options)", approaches: [] };
|
|
178
|
+
}
|
|
179
|
+
return { compete: r.compete, reason: r.reason || "", approaches: r.approaches || [] };
|
|
180
|
+
}
|
|
181
|
+
|
|
182
|
+
// M84: PARTITION's pre-produce decision. The objective disjointness oracle needs
|
|
183
|
+
// candidates to score, so it can't DECIDE before any exist — it runs later as the
|
|
184
|
+
// JUDGE. For the pre-produce compete/skip decision we use a FABLE heuristic probe
|
|
185
|
+
// (biased toward compete): partition is competition-worthy unless the milestone is
|
|
186
|
+
// trivially single-domain. So: fable probe DECIDES whether to compete; the objective
|
|
187
|
+
// file-disjointness oracle JUDGES the produced candidates. (Decision = heuristic +
|
|
188
|
+
// bias; selection = objective.)
|
|
189
|
+
async function runPartitionProbe(projectDir, { milestone, briefPath, userInput, phaseNameOpt } = {}) {
|
|
190
|
+
const prompt = [
|
|
191
|
+
`You are the Partition Solution-Space Probe${milestone ? ` for ${milestone}` : ""}. Decide: are there ≥2 genuinely different ways to CARVE this milestone into file-disjoint domains (different boundaries / groupings / parallelism), or is there one obvious single decomposition?`,
|
|
192
|
+
`**Brief:** ${briefPath || "(none — read .gsd-t docs/contracts/requirements directly)"}`,
|
|
193
|
+
userInput ? `\nUser input:\n${userInput}\n` : "",
|
|
194
|
+
`Compete=true when the work spans multiple files/areas that could be grouped more than one sensible way. Compete=false ONLY for a trivial single-file / single-domain milestone.`,
|
|
195
|
+
`BIAS TOWARD COMPETING: if ≥3 files/areas are in play or you're unsure, choose compete=true — the file-disjointness oracle will objectively pick the most-parallelizable valid carving among the candidates, so competing is low-risk and high-reward.`,
|
|
196
|
+
`Return JSON per the schema.`,
|
|
197
|
+
].filter(Boolean).join("\n");
|
|
198
|
+
const opts = { label: "partition-probe", schema: _PROBE_SCHEMA, model: "fable" };
|
|
199
|
+
if (phaseNameOpt) opts.phase = phaseNameOpt;
|
|
200
|
+
const r = await agent(prompt, opts).catch(() => null);
|
|
201
|
+
if (!r || typeof r.compete !== "boolean") {
|
|
202
|
+
return { compete: true, reason: "probe unavailable — defaulting to compete", approaches: [] };
|
|
203
|
+
}
|
|
204
|
+
return { compete: r.compete, reason: r.reason || "", approaches: r.approaches || [] };
|
|
205
|
+
}
|
|
206
|
+
|
|
134
207
|
// Rubric axes for the SUBJECTIVE judge (non-partition eligible phases). Partition
|
|
135
208
|
// uses the objective oracle instead and ignores these.
|
|
136
209
|
const RUBRIC_AXES_BY_PHASE = {
|
|
@@ -170,13 +243,27 @@ const milestone = _args.milestone || null;
|
|
|
170
243
|
const userInput = _args.userInput || "";
|
|
171
244
|
const phaseName = _args.phase;
|
|
172
245
|
|
|
173
|
-
//
|
|
174
|
-
//
|
|
175
|
-
|
|
176
|
-
|
|
177
|
-
|
|
178
|
-
|
|
179
|
-
|
|
246
|
+
// M84: competition is AUTOMATIC. By default the workflow PROBES the solution space
|
|
247
|
+
// (after brief) and self-decides whether to run a 3-producer + judge competition —
|
|
248
|
+
// no flag needed. Optional manual OVERRIDES: `competition: N` (2-5) forces N
|
|
249
|
+
// producers; `competition: 0` or `noCompetition: true` forces it off. Default
|
|
250
|
+
// (`competition` unset) = let the workflow decide.
|
|
251
|
+
// Evidence (Self-MoA, Large Language Monkeys): gains plateau fast; N=3 is the elbow,
|
|
252
|
+
// >5 wasteful. The auto path fires 3.
|
|
253
|
+
const AUTO_COMPETITION_N = 3;
|
|
254
|
+
const _hasCompetitionArg = _args.competition !== undefined && _args.competition !== null;
|
|
255
|
+
const _forceOff = _args.noCompetition === true || (_hasCompetitionArg && Number(_args.competition) <= 1);
|
|
256
|
+
const _forcedN = _hasCompetitionArg && Number(_args.competition) >= 2
|
|
257
|
+
? Math.max(2, Math.min(5, Math.floor(Number(_args.competition))))
|
|
258
|
+
: null;
|
|
259
|
+
// competitionN/competitionOn are resolved LATER (after preflight+brief) by the
|
|
260
|
+
// auto-probe, unless an override pins them now. Declared with `let` so the
|
|
261
|
+
// post-brief decision block can set them.
|
|
262
|
+
let competitionN = 1;
|
|
263
|
+
let competitionOn = false;
|
|
264
|
+
const _competitionEligible = COMPETITION_ELIGIBLE.has(phaseName);
|
|
265
|
+
if (_forcedN && !_competitionEligible) {
|
|
266
|
+
log(`competition: forced N=${_forcedN} ignored — phase "${phaseName}" is not competition-eligible. Eligible: ${[...COMPETITION_ELIGIBLE].join(", ")}.`);
|
|
180
267
|
}
|
|
181
268
|
|
|
182
269
|
if (!phaseName || !VALID_PHASES.includes(phaseName)) {
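The override handling in the hunk above reduces to a three-way decision: forced off, forced N, or automatic. A minimal JavaScript sketch of that decision follows. `resolveCompetitionOverride` is a hypothetical name; the conditions restate the `_forceOff` and `_forcedN` expressions, and the eligibility check against `COMPETITION_ELIGIBLE` is left out for brevity.

```javascript
// Sketch only: hypothetical function restating the override expressions.
function resolveCompetitionOverride(args) {
  const hasArg = args.competition !== undefined && args.competition !== null;
  const n = Number(args.competition);
  // noCompetition, or any numeric value of 1 or less, forces competition off.
  if (args.noCompetition === true || (hasArg && n <= 1)) return { mode: "off" };
  // A value of 2 or more forces N producers, floored and clamped to the range 2..5.
  if (hasArg && n >= 2) return { mode: "forced", n: Math.max(2, Math.min(5, Math.floor(n))) };
  // Everything else falls through to the automatic probe. A non-numeric
  // override (for example "off") is flagged so the caller can log a warning.
  return { mode: "auto", ignoredOverride: hasArg && Number.isNaN(n) };
}
```

One consequence of these conditions is that a fractional value between 1 and 2, such as `1.5`, is neither forced off nor forced on and falls through to the automatic path without a warning.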
|
|
@@ -189,7 +276,35 @@ const pre = await runPreflight(projectDir);
|
|
|
189
276
|
if (!pre.ok) return { status: "failed", reason: "preflight-failed", preflight: pre.envelope };
|
|
190
277
|
const brief = await generateBrief(projectDir, { kind: phaseName, milestone, id: `${phaseName}-${(milestone || "m").toLowerCase()}` });
|
|
191
278
|
|
|
192
|
-
|
|
279
|
+
// ── M84: resolve competition AUTOMATICALLY (after brief, before producing) ──
|
|
280
|
+
// Default: probe the solution space and self-decide. Overrides pin it.
|
|
281
|
+
if (_competitionEligible) {
|
|
282
|
+
if (_forceOff) {
|
|
283
|
+
competitionOn = false;
|
|
284
|
+
log(`competition: OFF (overridden via competition≤1 / noCompetition).`);
|
|
285
|
+
} else if (_forcedN) {
|
|
286
|
+
competitionN = _forcedN; competitionOn = true;
|
|
287
|
+
log(`competition: ON, N=${_forcedN} (overridden).`);
|
|
288
|
+
} else {
|
|
289
|
+
// M84 Red Team LOW: warn on an unparseable override so a typo (competition:"off")
|
|
290
|
+
// isn't silently swallowed into the auto path.
|
|
291
|
+
if (_hasCompetitionArg && Number.isNaN(Number(_args.competition))) {
|
|
292
|
+
log(`competition: override value ${JSON.stringify(_args.competition)} is not a number — ignoring it, using AUTO. (Use 0/noCompetition to force off, 2-5 to force N.)`);
|
|
293
|
+
}
|
|
294
|
+
// Automatic decision — the workflow probes and decides. Fable probe (or the
|
|
295
|
+
// partition-specific probe); biased toward competing.
|
|
296
|
+
phase("Probe");
|
|
297
|
+
const probe = phaseName === "partition"
|
|
298
|
+
? await runPartitionProbe(projectDir, { milestone, briefPath: brief.briefPath, userInput, phaseNameOpt: "Probe" })
|
|
299
|
+
: await runSolutionSpaceProbe(projectDir, phaseName, { milestone, briefPath: brief.briefPath, userInput, phaseNameOpt: "Probe" });
|
|
300
|
+
competitionOn = !!probe.compete;
|
|
301
|
+
competitionN = competitionOn ? AUTO_COMPETITION_N : 1;
|
|
302
|
+
log(`competition: AUTO → ${competitionOn ? `COMPETE (${AUTO_COMPETITION_N} producers)` : "single draft"} — ${probe.reason}${probe.approaches && probe.approaches.length ? ` [approaches: ${probe.approaches.join("; ")}]` : ""}`);
|
|
303
|
+
}
|
|
304
|
+
}
|
|
305
|
+
|
|
306
|
+
// M84 Red Team LOW: announce "Phase" only on the single-draft path (the
|
|
307
|
+
// competition path announces Compete/Judge/Finalize instead) so no empty stage shows.
|
|
193
308
|
const promptByPhase = {
|
|
194
309
|
partition: `Decompose the milestone into 2-5 independent domains. Write .gsd-t/domains/{domain}/{scope,constraints,tasks}.md. Cross-domain contracts in .gsd-t/contracts/.`,
|
|
195
310
|
plan: `For each domain, write atomic tasks.md entries with files, contract refs, dependencies, acceptance criteria. Update .gsd-t/contracts/integration-points.md with wave groupings.
|
|
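The override-versus-AUTO resolution added in the hunk above may be easier to follow as a pure function. This is a minimal sketch, not the package's code: `resolveCompetition`, its argument shape, and the parsing rules (a numeric value of 1 or less, or `noCompetition`, forces off; only integers 2 to 5 pin N) are assumptions inferred from the log messages in the diff. The real workflow awaits an asynchronous probe; here the probe is a synchronous callback so it is only invoked on the AUTO path.

```javascript
// Hypothetical sketch. Function name, argument shape, and parsing rules are
// assumptions inferred from the log messages in the hunk above.
function resolveCompetition({ eligible, args = {}, probe, autoN }) {
  const messages = [];
  const hasArg = Object.prototype.hasOwnProperty.call(args, "competition");
  const n = hasArg ? Number(args.competition) : NaN;
  const unparseable = hasArg && Number.isNaN(n);
  // Assumed rule: noCompetition or a numeric value <= 1 forces competition off.
  const forceOff = args.noCompetition === true || (hasArg && !unparseable && n <= 1);
  // Assumed rule: only integers 2..5 pin the producer count.
  const forcedN = hasArg && Number.isInteger(n) && n >= 2 && n <= 5 ? n : 0;

  if (!eligible) {
    // A forced N on an ineligible phase is reported, never applied.
    if (forcedN) messages.push(`forced N=${forcedN} ignored (phase not eligible)`);
    return { on: false, n: 1, messages };
  }
  if (forceOff) {
    messages.push("OFF (overridden)");
    return { on: false, n: 1, messages };
  }
  if (forcedN) {
    messages.push(`ON, N=${forcedN} (overridden)`);
    return { on: true, n: forcedN, messages };
  }
  // A typo such as competition:"off" is surfaced instead of silently ignored.
  if (unparseable) {
    messages.push(`override ${JSON.stringify(args.competition)} is not a number, using AUTO`);
  }
  const result = probe(); // only reached on the AUTO path
  const on = Boolean(result.compete);
  messages.push(`AUTO: ${on ? `compete (${autoN} producers)` : "single draft"}, ${result.reason}`);
  return { on, n: on ? autoN : 1, messages };
}

// Example: an unparseable override falls through to the probe's decision.
const example = resolveCompetition({
  eligible: true,
  args: { competition: "off" },
  probe: () => ({ compete: true, reason: "several viable approaches" }),
  autoN: 3,
});
```

Keeping the precedence in one place (ineligible, then force-off, then forced N, then AUTO) mirrors the branch order in the diff, so an explicit override always wins over the probe.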
@@ -209,6 +324,7 @@ const briefLine = `**Brief (REQUIRED):** ${brief.briefPath || "(no brief — re-
|
|
|
209
324
|
let result;
|
|
210
325
|
if (!competitionOn) {
|
|
211
326
|
// ── Single-producer path (default, unchanged behavior) ──
|
|
327
|
+
phase("Phase");
|
|
212
328
|
result = await agent(
|
|
213
329
|
[
|
|
214
330
|
`You are the ${phaseName} phase agent.`,
|
|
@@ -224,15 +340,49 @@ if (!competitionOn) {
|
|
|
224
340
|
{ label: phaseName, phase: "Phase", schema: PHASE_RESULT_SCHEMA, model: "opus" }
|
|
225
341
|
).catch((e) => ({ status: "failed", artifacts: [], summary: `agent error: ${e && e.message}` }));
|
|
226
342
|
} else {
|
|
227
|
-
// ── M82 Competition Mode: generate -> judge -> finalize ──
|
|
228
|
-
// Distinct "angles" so the N Self-MoA producers explore different regions of
|
|
229
|
-
//
|
|
230
|
-
|
|
231
|
-
|
|
232
|
-
|
|
233
|
-
|
|
234
|
-
|
|
235
|
-
|
|
343
|
+
// ── M82/M84 Competition Mode: generate -> judge -> finalize ──
|
|
344
|
+
// Distinct "angles" so the N Self-MoA producers explore different regions of the
|
|
345
|
+
// solution space (diversity by prompt, not by model — Self-MoA > Mixed-MoA).
|
|
346
|
+
// M84 Red Team MEDIUM: angles must be PHASE-AWARE — the old partition-only set
|
|
347
|
+
// gave a discuss/milestone producer a contradictory "carve file-disjoint domains"
|
|
348
|
+
// directive, degrading 3 of 4 now-automatic phases. Each eligible phase gets its
|
|
349
|
+
// own angle set (analogous to RUBRIC_AXES_BY_PHASE).
|
|
350
|
+
const ANGLES_BY_PHASE = {
|
|
351
|
+
partition: [
|
|
352
|
+
"Optimize for MAXIMUM parallelism: carve the most file-disjoint domains that can run concurrently.",
|
|
353
|
+
"Optimize for SIMPLICITY: the fewest domains with the cleanest, most obvious boundaries.",
|
|
354
|
+
"Optimize for RISK ISOLATION: isolate the riskiest/most-coupled work into its own domain so the rest stays safe.",
|
|
355
|
+
"Optimize for DEPENDENCY DEPTH: minimize serial gates (waves) between domains.",
|
|
356
|
+
"Optimize for BALANCE: roughly equal-sized domains with minimal cross-talk.",
|
|
357
|
+
],
|
|
358
|
+
milestone: [
|
|
359
|
+
"Optimize for FASTEST TIME-TO-VALUE: the leanest milestone sequence that ships something usable soonest.",
|
|
360
|
+
"Optimize for RISK-FIRST: front-load the riskiest/most-uncertain work so failure is cheap and early.",
|
|
361
|
+
"Optimize for DEPENDENCY ORDER: sequence strictly by what unblocks the most downstream work.",
|
|
362
|
+
"Optimize for USER-VALUE-FIRST: order milestones by the value each delivers to the end user.",
|
|
363
|
+
"Optimize for SIMPLICITY: the fewest, most self-contained milestones with minimal cross-cutting.",
|
|
364
|
+
],
|
|
365
|
+
discuss: [
|
|
366
|
+
"Argue the SIMPLEST viable architecture, even if it sacrifices some flexibility.",
|
|
367
|
+
"Argue the most ROBUST/CORRECT architecture, accepting more upfront complexity.",
|
|
368
|
+
"Argue the most EXTENSIBLE architecture, optimizing for future change.",
|
|
369
|
+
"Argue a PRAGMATIC middle path, naming the explicit trade-offs it accepts.",
|
|
370
|
+
"Argue a CONTRARIAN approach that questions an assumption the others take for granted.",
|
|
371
|
+
],
|
|
372
|
+
"design-decompose": [
|
|
373
|
+
"Decompose ATOMIC-FIRST: smallest reusable elements up, composed into widgets then pages.",
|
|
374
|
+
"Decompose PAGE-FIRST: whole pages down into sections, widgets, then elements.",
|
|
375
|
+
"Decompose TOKEN-DRIVEN: design tokens + primitives first, structure follows the system.",
|
|
376
|
+
"Decompose by REUSE: maximize shared components; minimize one-off bespoke pieces.",
|
|
377
|
+
"Decompose by FEATURE: group elements/widgets by the user-facing feature they serve.",
|
|
378
|
+
],
|
|
379
|
+
};
|
|
380
|
+
const ANGLES = ANGLES_BY_PHASE[phaseName] || [
|
|
381
|
+
"Explore a materially different approach, optimizing for simplicity.",
|
|
382
|
+
"Explore a materially different approach, optimizing for robustness/correctness.",
|
|
383
|
+
"Explore a materially different approach, optimizing for extensibility.",
|
|
384
|
+
"Explore a pragmatic middle path, naming its trade-offs.",
|
|
385
|
+
"Explore a contrarian approach that questions a shared assumption.",
|
|
236
386
|
];
|
|
237
387
|
|
|
238
388
|
const PRODUCER_SCHEMA = phaseName === "partition"
|
|
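The diff shows the phase-aware angle table and its generic fallback, but not how angles are handed to the N producers. The sketch below shows one plausible assignment, assuming each producer takes the next angle in order and the list wraps if N exceeds its length. `anglesFor` and `GENERIC_ANGLES` are hypothetical names, and the angle strings are abbreviated from the ones in the hunk above.

```javascript
// Hypothetical sketch: abbreviated angle tables and an assumed round-robin
// assignment. The package's actual producer wiring is not shown in the diff.
const ANGLES_BY_PHASE = {
  partition: ["MAXIMUM parallelism", "SIMPLICITY", "RISK ISOLATION"],
  discuss: ["SIMPLEST viable architecture", "most ROBUST/CORRECT architecture"],
};

// Generic fallback for phases that have no dedicated angle set.
const GENERIC_ANGLES = ["simplicity", "robustness/correctness", "extensibility"];

function anglesFor(phaseName, n) {
  // Own-property check so names like "constructor" cannot hit Object.prototype.
  const hasOwn = Object.prototype.hasOwnProperty.call(ANGLES_BY_PHASE, phaseName);
  const pool = hasOwn ? ANGLES_BY_PHASE[phaseName] : GENERIC_ANGLES;
  // Producer i gets angle i, wrapping around when n exceeds the pool size.
  return Array.from({ length: n }, (_, i) => pool[i % pool.length]);
}

const partitionAngles = anglesFor("partition", 2);
const fallbackAngles = anglesFor("plan", 3);
```

Diversity here comes from the prompt rather than the model, which matches the Self-MoA rationale stated in the source comments.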
@@ -323,7 +473,7 @@ if (!competitionOn) {
|
|
|
323
473
|
`IMPORTANT: use the CANDIDATE LABEL (A, B, C…) shown above as the "id" in your scores.`,
|
|
324
474
|
].join("\n"),
|
|
325
475
|
{
|
|
326
|
-
label: "judge:rubric", phase: "Judge", model: "
|
|
476
|
+
label: "judge:rubric", phase: "Judge", model: "fable",
|
|
327
477
|
schema: {
|
|
328
478
|
type: "object", required: ["scores"], additionalProperties: true,
|
|
329
479
|
properties: { scores: { type: "array", items: { type: "object", additionalProperties: true } } },
|
|
@@ -503,7 +653,7 @@ if (phaseName === "plan" && result && result.status !== "failed") {
|
|
|
503
653
|
`Every blocking finding MUST convert to a concrete requiredTest the plan must adopt. Advisory notes are forbidden.`,
|
|
504
654
|
`Verdict BLOCK if any concrete, falsifiable failure condition lacks a named required test; else CLEARED. Return JSON per the schema.`,
|
|
505
655
|
].join("\n"),
|
|
506
|
-
{ label: "pre-mortem", phase: "Plan Hardening", schema: PRE_MORTEM_SCHEMA, model: "
|
|
656
|
+
{ label: "pre-mortem", phase: "Plan Hardening", schema: PRE_MORTEM_SCHEMA, model: "fable" }
|
|
507
657
|
).catch((e) => ({ verdict: "BLOCK", findings: [{ severity: "HIGH", condition: `pre-mortem agent error: ${e && e.message}`, requiredTest: "re-run pre-mortem" }], notes: "agent-error" }));
|
|
508
658
|
|
|
509
659
|
result.preMortem = preMortem;
|
|
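The `.catch` on the pre-mortem call above fails closed: an agent error becomes a `BLOCK` verdict with a HIGH finding instead of letting the plan through unreviewed. A minimal sketch of that pattern in isolation follows; `failClosed` is a hypothetical helper name, and the fallback object copies the shape shown in the diff.

```javascript
// Hypothetical helper illustrating the fail-closed pattern from the hunk above.
// Any rejection is converted into a blocking verdict rather than propagated.
function failClosed(promise) {
  return promise.catch((e) => ({
    verdict: "BLOCK",
    findings: [
      {
        severity: "HIGH",
        condition: `pre-mortem agent error: ${e && e.message}`,
        requiredTest: "re-run pre-mortem",
      },
    ],
    notes: "agent-error",
  }));
}

// A rejected agent call resolves to a BLOCK verdict instead of throwing.
const blocked = failClosed(Promise.reject(new Error("timeout")));
// A successful agent call passes through unchanged.
const cleared = failClosed(Promise.resolve({ verdict: "CLEARED", findings: [] }));
```

The design choice is that a crashed reviewer is treated the same as a reviewer who found a blocking problem, so the only way to reach `CLEARED` is for the pre-mortem to actually run and say so.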
@@ -304,7 +304,7 @@ const stages = [
|
|
|
304
304
|
`Verdict is FAIL if you found any CRITICAL or HIGH severity bug; GRUDGING-PASS`,
|
|
305
305
|
`if you searched exhaustively and found nothing. Return JSON per the schema.`,
|
|
306
306
|
].join("\n"),
|
|
307
|
-
{ label: "red-team", phase: "Orthogonal Triad", schema: RED_TEAM_SCHEMA, model: "
|
|
307
|
+
{ label: "red-team", phase: "Orthogonal Triad", schema: RED_TEAM_SCHEMA, model: "fable" }
|
|
308
308
|
),
|
|
309
309
|
|
|
310
310
|
// Stage C — QA (test execution + shallow-test detection + contract compliance)
|