@tekyzinc/gsd-t 2.39.13 → 2.46.11

Files changed (50)
  1. package/CHANGELOG.md +12 -0
  2. package/README.md +19 -10
  3. package/bin/desktop.ini +2 -0
  4. package/bin/global-sync-manager.js +350 -0
  5. package/bin/gsd-t.js +592 -2
  6. package/bin/metrics-collector.js +167 -0
  7. package/bin/metrics-rollup.js +200 -0
  8. package/bin/patch-lifecycle.js +195 -0
  9. package/bin/rule-engine.js +160 -0
  10. package/commands/desktop.ini +2 -0
  11. package/commands/gsd-t-complete-milestone.md +194 -6
  12. package/commands/gsd-t-debug.md +38 -3
  13. package/commands/gsd-t-doc-ripple.md +148 -0
  14. package/commands/gsd-t-execute.md +328 -54
  15. package/commands/gsd-t-help.md +32 -10
  16. package/commands/gsd-t-integrate.md +59 -7
  17. package/commands/gsd-t-metrics.md +143 -0
  18. package/commands/gsd-t-plan.md +49 -2
  19. package/commands/gsd-t-qa.md +26 -5
  20. package/commands/gsd-t-quick.md +36 -3
  21. package/commands/gsd-t-status.md +78 -0
  22. package/commands/gsd-t-test-sync.md +23 -2
  23. package/commands/gsd-t-verify.md +142 -10
  24. package/commands/gsd-t-visualize.md +11 -1
  25. package/commands/gsd-t-wave.md +64 -18
  26. package/docs/GSD-T-README.md +10 -6
  27. package/docs/architecture.md +84 -2
  28. package/docs/ci-examples/desktop.ini +2 -0
  29. package/docs/ci-examples/github-actions.yml +104 -0
  30. package/docs/ci-examples/gitlab-ci.yml +116 -0
  31. package/docs/desktop.ini +2 -0
  32. package/docs/framework-comparison-scorecard.md +160 -0
  33. package/docs/infrastructure.md +87 -1
  34. package/docs/prd-graph-engine.md +2 -2
  35. package/docs/prd-gsd2-hybrid.md +258 -135
  36. package/docs/requirements.md +66 -2
  37. package/examples/.gsd-t/contracts/desktop.ini +2 -0
  38. package/examples/.gsd-t/desktop.ini +2 -0
  39. package/examples/.gsd-t/domains/desktop.ini +2 -0
  40. package/examples/.gsd-t/domains/example-domain/desktop.ini +2 -0
  41. package/examples/desktop.ini +2 -0
  42. package/examples/rules/.gitkeep +0 -0
  43. package/examples/rules/desktop.ini +2 -0
  44. package/package.json +40 -40
  45. package/scripts/desktop.ini +2 -0
  46. package/scripts/gsd-t-dashboard-server.js +19 -2
  47. package/scripts/gsd-t-dashboard.html +63 -0
  48. package/scripts/gsd-t-event-writer.js +1 -0
  49. package/templates/CLAUDE-global.md +92 -10
  50. package/templates/desktop.ini +2 -0
@@ -60,11 +60,19 @@ Before choosing solo or team mode, read the `## Wave Execution Groups` section i
60
60
 
61
61
  **If no wave groups are defined** (older plans): fall back to the `Execution Order` list.
62
62
 
63
- ### Solo Mode (default) — Domain Subagent Pattern
63
+ ### Solo Mode (default) — Domain Task-Dispatcher Pattern
64
64
 
65
- Each domain's work runs in an isolated Task subagent with a fresh context window. The orchestrator (this agent) stays lightweight — it only spawns subagents, collects summaries, verifies checkpoints, and updates progress.
65
+ Each domain's work runs via a lightweight domain task-dispatcher. The dispatcher spawns one Task subagent PER TASK within that domain, giving each task a completely fresh context window with only the minimum required context. The orchestrator (this agent) stays lightweight — it only spawns dispatchers, collects summaries, verifies checkpoints, and updates progress.
66
66
 
67
- **OBSERVABILITY LOGGING (MANDATORY), repeat for every domain subagent spawn:**
67
+ **Context provided to each task subagent (fresh-dispatch-contract.md payload):**
68
+ - `scope.md` for the domain
69
+ - Relevant contracts (only those referenced by the task)
70
+ - The single task from `tasks.md`
71
+ - Graph context for the task's files (if available)
72
+ - Up to 5 prior task summaries (10-20 lines each, most recent first)
73
+ - Past failure/learning entries for this domain (max 5 lines)
74
+
75
+ **OBSERVABILITY LOGGING (MANDATORY) — repeat for every task subagent spawn:**
68
76
 
69
77
  Before spawning — run via Bash:
70
78
  `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
@@ -76,39 +84,80 @@ Compute tokens and compaction:
76
84
  - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
77
85
  - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
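As an illustration only, the two cases above can be wrapped in a small shell function. The name `compute_tokens` and its argument order are assumptions made for this sketch; they are not part of the package.

```bash
# Hypothetical helper: tokens consumed between two context readings.
# Arguments: TOK_START TOK_END TOK_MAX (integers, as in the Bash snippets above).
compute_tokens() (
  tok_start=$1; tok_end=$2; tok_max=$3
  if [ "$tok_end" -ge "$tok_start" ]; then
    # No compaction: usage is the plain difference.
    echo $((tok_end - tok_start))
  else
    # Compaction detected: the counter dropped mid-task, so count the room
    # that was left before the drop plus whatever is in use afterwards.
    echo $(( (tok_max - tok_start) + tok_end ))
  fi
)

compute_tokens 50000 80000 200000    # prints 30000 (no compaction)
compute_tokens 180000 30000 200000   # prints 50000 (compaction detected)
```

The function body runs in a subshell (parentheses instead of braces), so its variables do not leak into the caller.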
78
86
 
79
- Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
80
- `| {DT_START} | {DT_END} | gsd-t-execute | domain:{domain-name} | sonnet | {DURATION}s | {N} tasks, {pass/fail} | {TOKENS} | {COMPACTED} |`
87
+ Compute context utilization (run via Bash):
88
+ `if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
89
+
90
+ Alert on context thresholds (display to user inline):
91
+ - If CTX_PCT is a number and >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
92
+ - If CTX_PCT is a number and >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
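A minimal sketch of the utilization check and the two alert tiers, assuming the tiers are meant to be exclusive (a reading of 90 raises only the critical alert). The helper names `ctx_pct` and `ctx_level` are hypothetical, and the sketch uses `awk` where the snippet above uses `bc`, so values are rounded to one decimal rather than truncated.

```bash
# Hypothetical helper: percentage of the context window in use, one decimal.
# Prints "N/A" when the maximum is unknown or zero, mirroring the snippet above.
ctx_pct() (
  used=${1:-0}; max=${2:-0}
  if [ "$max" -gt 0 ]; then
    awk -v u="$used" -v m="$max" 'BEGIN { printf "%.1f\n", u * 100 / m }'
  else
    echo "N/A"
  fi
)

# Hypothetical helper: map a percentage to CRITICAL, WARNING, or OK.
ctx_level() (
  pct=$1
  case "$pct" in
    ''|*[!0-9.]*) echo "OK"; exit 0 ;;   # non-numeric (for example N/A): no alert
  esac
  awk -v p="$pct" 'BEGIN {
    if (p >= 85) print "CRITICAL"
    else if (p >= 70) print "WARNING"
    else print "OK"
  }'
)

ctx_level "$(ctx_pct 145000 200000)"   # 72.5 percent, prints WARNING
```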
81
93
 
82
- **For each domain (in wave order), spawn:**
94
+ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
95
+ `| {DT_START} | {DT_END} | gsd-t-execute | task:{task-id} | sonnet | {DURATION}s | {pass/fail} | {TOKENS} | {COMPACTED} | {domain-name} | task-{task-id} | {CTX_PCT} |`
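The create-if-missing behaviour described above could look roughly like the following. `append_token_row` is a hypothetical name, and the log path is passed in as an argument (an assumption of this sketch) so that it can be tried against a scratch file instead of the real `.gsd-t/token-log.md`.

```bash
# Hypothetical helper: append one row to a token log, writing the header first
# if the file does not exist yet.
# Arguments: path to the log file, the fully formatted table row.
append_token_row() (
  log=$1; row=$2
  header='| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |'
  if [ ! -f "$log" ]; then
    mkdir -p "$(dirname "$log")"
    printf '%s\n' "$header" > "$log"
  fi
  printf '%s\n' "$row" >> "$log"
)

# Usage against a scratch directory (placeholder values, not real measurements):
scratch=$(mktemp -d)
append_token_row "$scratch/token-log.md" '| start | end | gsd-t-execute | task:1 | sonnet | 0s | pass | 0 | null | example-domain | task-1 | N/A |'
cat "$scratch/token-log.md"
rm -rf "$scratch"
```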
83
96
 
84
- **Pre-task experience retrieval (before spawning each domain subagent):**
97
+ **For each domain (in wave order), run the domain task-dispatcher:**
98
+
99
+ **Pre-dispatch experience retrieval (before dispatching each domain's tasks):**
85
100
  Run via Bash:
86
101
  `grep -i "\[failure\]\|\[learning\]" .gsd-t/progress.md | grep -i "{domain-name}" | tail -5`
87
102
 
88
103
  If results found:
89
- - Prepend a `## ⚠️ Past Failures (retrieve before acting)` block to the subagent prompt (max 5 lines from results)
104
+ - Store as `PAST_FAILURES` and prepend to each task subagent prompt (max 5 lines)
90
105
  - Write event via Bash: `node ~/.claude/scripts/gsd-t-event-writer.js --type experience_retrieval --command gsd-t-execute --reasoning "{N past failures found for {domain-name}}" --outcome null || true`
91
106
 
92
107
  If no results found: proceed normally (no warning block, no event write).
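The retrieval step can be sketched as a function that returns at most five matching lines and nothing at all when there is no history. `past_failures` is a hypothetical name; the sketch passes the two tags as separate `-e` patterns, which behaves like the `\|` alternation in the command above but does not rely on a GNU grep extension.

```bash
# Hypothetical helper: last five [failure]/[learning] entries mentioning a domain.
# Arguments: path to progress.md, domain name. Prints nothing when nothing matches.
past_failures() (
  progress=$1; domain=$2
  [ -f "$progress" ] || exit 0
  grep -i -e '\[failure\]' -e '\[learning\]' "$progress" \
    | grep -i -F -- "$domain" \
    | tail -n 5 || true
)

# An empty result means: proceed normally, no warning block, no event write.
PAST_FAILURES=$(past_failures .gsd-t/progress.md "example-domain")
if [ -n "$PAST_FAILURES" ]; then
  printf '## Past Failures (read before acting)\n%s\n' "$PAST_FAILURES"
fi
```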
93
108
 
109
+ **Pre-flight intelligence check (before dispatching each domain's tasks):**
110
+ Run via Bash:
111
+ `node -e "const m = require('./bin/metrics-collector.js'); const w = m.getPreFlightWarnings('{domain-name}'); if(w.length) w.forEach(x => console.log('⚠️ ' + x));" 2>/dev/null || true`
112
+
113
+ Display any warnings inline (non-blocking — execution proceeds regardless).
114
+
115
+ **Active Rule Injection (before dispatching each domain's tasks):**
116
+ Run via Bash:
117
+ `node -e "const re = require('./bin/rule-engine.js'); const m = re.evaluateRules('{domain-name}', { projectDir: '.' }); if(m.length) m.forEach(x => console.log('RULE: ' + x.rule.name + ' — ' + x.rule.description + ' [' + x.severity + ']')); else console.log('No active rules for {domain-name}');" 2>/dev/null || true`
118
+
119
+ If rules fire: inject up to 10 lines of rule warnings into each task subagent prompt (concise format: `RULE: {name} — {description}`). These inform the subagent of known patterns — non-blocking.
120
+ If no rules fire: log "No active rules for {domain-name}" and continue.
121
+
122
+ **Domain task-dispatcher (lightweight — sequences tasks, passes summaries):**
123
+
124
+ For each task in `.gsd-t/domains/{domain-name}/tasks.md` (in order, skip completed):
125
+
126
+ 1. Load prior summaries: Read up to 5 most recent `.gsd-t/domains/{domain-name}/task-*-summary.md` files (10-20 lines each)
127
+ 2. Load graph context (if `.gsd-t/graph/meta.json` exists): query task's files for relevant graph context
128
+ 3. Display: `⚙ [sonnet] gsd-t-execute → domain: {domain-name}, task-{task-id}`
129
+ 4. Run observability Bash (T_START / DT_START / TOK_START / TOK_MAX)
130
+ 5. Spawn task subagent:
131
+
94
132
  ```
95
133
  Task subagent (general-purpose, model: sonnet, mode: bypassPermissions):
96
- "You are executing all tasks for the {domain-name} domain.
134
+ "You are executing a single task for the {domain-name} domain.
97
135
 
98
- Read before starting (load your own context; do not assume anything):
99
- 1. CLAUDE.md — project conventions (CRITICAL)
100
- 2. .gsd-t/domains/{domain-name}/scope.md — what you own
101
- 3. .gsd-t/domains/{domain-name}/constraints.md — patterns to follow
102
- 4. ALL files in .gsd-t/contracts/ — your interfaces
103
- 5. .gsd-t/domains/{domain-name}/tasks.md — your task list
104
- 6. .gsd-t/contracts/integration-points.md — wave order and checkpoints
136
+ {PAST_FAILURES block, if any: ## ⚠️ Past Failures (read before acting)\n{lines}}
137
+
138
+ ## Your Task
139
+ {full task block from tasks.md — id, description, files, contract refs, dependencies, acceptance criteria}
140
+
141
+ ## Domain Scope
142
+ {contents of .gsd-t/domains/{domain-name}/scope.md}
143
+
144
+ ## Relevant Contracts
145
+ {contents of each contract file referenced by this task}
146
+
147
+ ## Graph Context (if available)
148
+ {graph query results for this task's files — omit section if unavailable}
149
+
150
+ ## Prior Task Summaries (most recent first, max 5)
151
+ {contents of task-{N}-summary.md files — 10-20 lines each}
105
152
 
106
- Execute each incomplete task in order:
107
- 1. Read task description, files list, and contract refs
153
+ ## Instructions
154
+
155
+ Execute the task above:
156
+ 1. Read the task description, files list, and contract refs carefully
108
157
  2. Read relevant contracts — implement EXACTLY what they specify
109
- 3. Destructive Action Guard: if task involves DROP TABLE, schema changes that lose
158
+ 3. Destructive Action Guard: if the task involves DROP TABLE, schema changes that lose
110
159
  data, removing working modules, or replacing architecture patterns → write a
111
- NEEDS-APPROVAL entry to .gsd-t/deferred-items.md, skip the task, continue
160
+ NEEDS-APPROVAL entry to .gsd-t/deferred-items.md, skip the task, stop here
112
161
  4. Implement the task
113
162
  5. Verify acceptance criteria are met
114
163
  6. Write comprehensive tests (MANDATORY — no feature code without test code):
@@ -116,42 +165,126 @@ Execute each incomplete task in order:
116
165
  - Playwright E2E (if UI/routes/flows changed): new specs for new features, cover
117
166
  all modes, form validation, empty/loading/error states, common edge cases
118
167
  - If no test framework exists: set one up as part of this task
119
- 7. Run ALL tests: unit, integration, Playwright. Fix failures (up to 2 attempts)
168
+ - If the project has a UI but no Playwright E2E specs exist for the features being
169
+ touched: WRITE THEM. A placeholder spec is not sufficient — write real E2E tests
170
+ that exercise the actual UI functionality being built or changed.
171
+ - **FUNCTIONAL E2E TESTS — NOT LAYOUT TESTS (MANDATORY)**:
172
+ E2E tests that only check element existence (isVisible, isEnabled, toBeAttached)
173
+ are LAYOUT tests, not functional tests. Layout tests pass even when every feature
174
+ is broken. Every Playwright spec MUST verify functional behavior:
175
+ a. **State changes**: After an action (click, type, submit), assert the app STATE
176
+ changed — not just that the button was clickable. Example: clicking a tab must
177
+ load different content; verify the content changed, not just that the tab exists.
178
+ b. **Data flow**: Form submissions must verify data arrived (API call made, response
179
+ rendered, list updated). Don't just assert the form rendered.
180
+ c. **Navigation/routing**: Tab/page switches must verify the NEW content loaded.
181
+ Assert on content unique to the destination, not the navigation element itself.
182
+ d. **Interactive widgets**: Terminals must accept input and produce output. Editors
183
+ must save changes. Panels must load their functional content after opening.
184
+ e. **Network integration**: If a feature requires WebSocket/API connection, verify
185
+ the connection status changes (e.g., "Disconnected" → "Connected") and that
186
+ messages flow through the connection.
187
+ f. **Error recovery**: Don't just check error messages render — verify the app
188
+ recovers (retry button works, form can be resubmitted, etc.).
189
+ A test that would pass on an empty HTML page with the right element IDs is useless.
190
+ Every assertion must prove the FEATURE WORKS, not that the ELEMENT EXISTS.
191
+ 7. Run ALL test suites — this is NOT optional, not conditional, not "if applicable":
192
+ a. Detect configured test runners: check for vitest/jest config, playwright.config.*, cypress.config.*
193
+ b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
194
+ c. If `playwright.config.*` exists → run `npx playwright test` (full suite, not just affected specs)
195
+ d. If E2E tests fail → fix (up to 2 attempts) before proceeding
196
+ e. Report ALL suite results: "Unit: X/Y pass | E2E: X/Y pass" — never report just one
120
197
  8. Run Pre-Commit Gate checklist from CLAUDE.md — update all affected docs BEFORE committing
121
- 9. Commit immediately: feat({domain-name}/task-{N}): {description}
122
- 10. Update .gsd-t/progress.md — mark task complete; prefix the Decision Log entry with an outcome tag based on how the task completed:
123
- - Task completed successfully on first attempt → prefix `[success]`
124
- - Task completed after a fix (required debugging or correction) → prefix `[learning]`
125
- - Task deferred to .gsd-t/deferred-items.md → prefix `[deferred]`
126
- - Task failed after 3 attempts → prefix `[failure]`
127
- 11. Spawn QA subagent (model: haiku) after each task:
128
- 'Run the full test suite. Read .gsd-t/contracts/ for definitions.
129
- Report: pass/fail counts and coverage gaps.'
130
- If QA fails, fix before proceeding. Append issues to .gsd-t/qa-issues.md.
198
+ 9. Commit immediately: feat({domain-name}/task-{task-id}): {description}
199
+ 10. Update .gsd-t/progress.md — mark this task complete; prefix the Decision Log entry:
200
+ - Completed successfully on first attempt → prefix `[success]`
201
+ - Completed after a fix → prefix `[learning]`
202
+ - Deferred to .gsd-t/deferred-items.md → prefix `[deferred]`
203
+ - Failed after 3 attempts → prefix `[failure]`
204
+ 11. Spawn QA subagent (model: haiku) after completing the task:
205
+ 'Run ALL configured test suites; detect and run every one:
206
+ a. Unit tests (vitest/jest/mocha): run the full suite, report pass/fail counts
207
+ b. E2E tests: check for playwright.config.* or cypress.config.*; if found, run the FULL E2E suite
208
+ c. NEVER skip E2E when a config file exists. Running only unit tests is a QA FAILURE.
209
+ d. Read .gsd-t/contracts/ for contract definitions. Check contract compliance.
210
+ e. AUDIT E2E test quality: Review each Playwright spec — if any test only checks
211
+ element existence (isVisible, toBeAttached, toBeEnabled) without verifying functional
212
+ behavior (state changes, data loaded, content updated after actions), flag it as
213
+ "SHALLOW TEST — needs functional assertions" in the gap report. A test suite where
214
+ every spec passes but no feature actually works is a QA FAILURE.
215
+ Report format: "Unit: X/Y pass | E2E: X/Y pass (or N/A if no config) | Contract: compliant/violations | Shallow tests: N (list)"'
216
+ If QA fails OR shallow tests are found, fix before proceeding. Append issues to .gsd-t/qa-issues.md.
217
+ 12. Write task summary to .gsd-t/domains/{domain-name}/task-{task-id}-summary.md:
218
+ ## Task {task-id} Summary — {domain-name}
219
+ - **Status**: PASS | FAIL
220
+ - **Files modified**: {list}
221
+ - **Constraints discovered**: {any new constraints or surprises}
222
+ - **Tests**: {pass/fail count}
223
+ - **Notes**: {10-20 lines max — key decisions, patterns, warnings}
131
224
 
132
225
  Deviation rules:
133
226
  - Bug blocking progress → fix, max 3 attempts; if still blocked, log to
134
- .gsd-t/deferred-items.md and continue to next task
135
- - Missing dependency task requires → add minimum needed, document in commit message
136
- - Non-trivial blocker → fix and log to .gsd-t/deferred-items.md
227
+ .gsd-t/deferred-items.md and stop (report FAIL in summary)
228
+ - Missing dependency → add minimum needed, document in commit message
229
+ - Non-trivial blocker → log to .gsd-t/deferred-items.md
137
230
  - Architectural change required → write NEEDS-APPROVAL to .gsd-t/deferred-items.md,
138
- skip the task, continue — never self-approve structural changes
139
-
140
- When all tasks are complete, report:
141
- - Tasks completed: N/N
142
- - Test results: pass/fail counts
143
- - Commits made: list of commit hashes
144
- - Deferred items (if any): list from .gsd-t/deferred-items.md"
231
+ skip the task, stop here — never self-approve structural changes
232
+
233
+ Report back:
234
+ - Task: {task-id}
235
+ - Status: PASS | FAIL
236
+ - Files modified: {list}
237
+ - Tests: {pass/fail count}
238
+ - Commit: {hash}
239
+ - Deferred items (if any)"
145
240
  ```
146
241
 
147
- **After each domain subagent returns (orchestrator responsibilities):**
148
- 1. Log to `.gsd-t/token-log.md` (see observability block above)
149
- 2. Check `.gsd-t/deferred-items.md` for any `NEEDS-APPROVAL` entries — if found, STOP and present to user before spawning the next domain
150
- 3. If a CHECKPOINT is reached per `integration-points.md`, verify contract compliance (see Step 4) before proceeding to the next wave/domain
151
- 4. Update `.gsd-t/progress.md` with domain completion status
242
+ 6. After task subagent returns:
243
+ - Run observability Bash (T_END / TOK_END / DURATION / CTX_PCT)
244
+ - Append to token-log.md (per-task row)
245
+ - Alert on CTX_PCT thresholds (display to user inline)
246
+ - **Emit task-metrics record** (run via Bash):
247
+ `node bin/metrics-collector.js --milestone {milestone} --domain {domain-name} --task task-{task-id} --command execute --duration_s $DURATION --tokens_used $TOKENS --context_pct ${CTX_PCT:-0} --pass {true|false} --fix_cycles {0|N} --signal_type {pass-through|fix-cycle} --notes "{brief outcome}" 2>/dev/null || true`
248
+ Signal type: `pass-through` if task passed on first attempt; `fix-cycle` if rework was needed.
249
+ - **Emit task_complete event** — run via Bash:
250
+ `node ~/.claude/scripts/gsd-t-event-writer.js --type task_complete --command gsd-t-execute --reasoning "signal_type={signal_type}, domain={domain-name}" --outcome {success|failure} || true`
251
+ - Check `.gsd-t/deferred-items.md` for `NEEDS-APPROVAL` — if found, STOP and present to user before proceeding to the next task
252
+ - Read the task summary from `.gsd-t/domains/{domain-name}/task-{task-id}-summary.md` to use as prior summary for the next task
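The dispatcher has to turn a task's result into two labels: the `--signal_type` value for the metrics record and the Decision Log prefix written to `progress.md`. One possible mapping is sketched below; the function names and the `pass|fail|deferred` status vocabulary are assumptions of this sketch, not identifiers from the package.

```bash
# Hypothetical helper: metrics signal type from the number of fix cycles.
signal_type() (
  fix_cycles=${1:-0}
  if [ "$fix_cycles" -eq 0 ]; then echo "pass-through"; else echo "fix-cycle"; fi
)

# Hypothetical helper: Decision Log prefix for a finished task.
# Arguments: status (pass|fail|deferred) and the number of fix cycles.
outcome_tag() (
  status=$1; fix_cycles=${2:-0}
  case "$status" in
    deferred) echo "[deferred]" ;;
    fail)     echo "[failure]" ;;
    pass)
      if [ "$fix_cycles" -eq 0 ]; then echo "[success]"; else echo "[learning]"; fi ;;
    *) echo "unknown status: $status" >&2; exit 1 ;;
  esac
)

signal_type 0        # prints pass-through
outcome_tag pass 1   # prints [learning]
```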
253
+
254
+ **After all tasks in a domain complete (orchestrator responsibilities):**
255
+ 1. Check `.gsd-t/deferred-items.md` for any `NEEDS-APPROVAL` entries — if found, STOP and present to user before spawning the next domain
256
+ 2. If a CHECKPOINT is reached per `integration-points.md`, verify contract compliance (see Step 4) before proceeding to the next wave/domain
257
+ 3. Update `.gsd-t/progress.md` with domain completion status
258
+ 4. **Adaptive Replan Check** (per `adaptive-replan-contract.md`) — run after EVERY domain completes, before dispatching the next domain:
259
+
260
+ a. **Read domain summaries**: Read all `.gsd-t/domains/{completed-domain}/task-*-summary.md` files. Extract every `**Constraints discovered**:` field. If ALL are empty or "none", skip to the next domain (fast path — no replan needed).
261
+
262
+ b. **Assess affected domains** — two modes:
263
+ - **Graph available** (`.gsd-t/graph/meta.json` exists): For each changed module mentioned in the constraints, run `query('getImporters', { file })` to find which remaining domains import it. Also run `query('getDomainBoundaryViolations', {})` to check if constraint changes affect domain boundaries. Scope replan to ONLY those domains.
264
+ - **Graph unavailable** (fallback): Check ALL remaining unexecuted domains' `tasks.md` files — less precise but functional.
265
+
266
+ c. **Check for invalidated assumptions**: Read each affected remaining domain's `.gsd-t/domains/{domain}/tasks.md`. For each task, check whether any assumption is invalidated by the discovered constraints (e.g., wrong column name, deprecated API, wrong library, missing prerequisite, throughput limits).
267
+
268
+ d. **If invalidated assumptions found**: Revise the affected domain's `tasks.md` on disk. Append a Revision block at the end of the file (do NOT overwrite existing tasks — append only):
269
+ ```markdown
270
+ ## Revision (Replan Cycle {N})
271
+ - **Trigger**: {completed-domain} — constraint discovered during execution
272
+ - **Constraint**: {exact constraint text from summary}
273
+ - **Changes**: {what was revised in this domain's tasks — list specific task IDs and what changed}
274
+ - **Rationale**: {why this revision is needed — what would break without it}
275
+ ```
276
+
277
+ e. **Increment replan cycle counter** (track as `REPLAN_CYCLES` in orchestrator state, starting at 0).
278
+
279
+ f. **Cycle guard**: If `REPLAN_CYCLES > 2`, STOP and pause for user input:
280
+ "Replan cycle limit (2) exceeded. {N} constraints are still propagating. Please review `.gsd-t/domains/*/tasks.md` and resolve manually, then re-run execute."
281
+
282
+ g. **Log to Decision Log** in `.gsd-t/progress.md`: `- {date}: [replan] Cycle {N} — {completed-domain} constraint propagated to {list of affected domains}: {brief constraint summary}`
283
+
284
+ h. The revised `tasks.md` files are now on disk — the next domain's dispatcher will read the updated version automatically (disk-based handoff, no in-memory state sharing needed).
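Steps a and f of the replan check lend themselves to a short sketch: collect the non-empty `Constraints discovered` fields from a finished domain, take the fast path when there are none, and refuse a third replan cycle. Both function names are hypothetical, and the field layout is assumed to match the task summary format shown earlier in this diff.

```bash
# Hypothetical helper: print every non-empty "Constraints discovered" value found
# in a domain's task summaries. A value of "none" (any case) counts as empty.
discovered_constraints() (
  domain_dir=$1
  for f in "$domain_dir"/task-*-summary.md; do
    [ -f "$f" ] || continue
    # Keep only the text after the field label, trimmed of surrounding spaces.
    sed -n 's/^[[:space:]]*- \*\*Constraints discovered\*\*:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$f"
  done | grep -v -i -x 'none' || true
)

# Hypothetical helper: succeed while the cycle counter is within the limit of 2.
replan_allowed() (
  cycles=$1
  [ "$cycles" -le 2 ]
)

# Fast path: no constraints means no replan for this domain.
if [ -z "$(discovered_constraints .gsd-t/domains/example-domain)" ]; then
  echo "No constraints discovered; skipping replan"
fi
```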
152
285
 
153
286
  ### Team Mode (when agent teams are enabled)
154
- Spawn teammates for domains within the same wave. Only domains in the same wave can run in parallel — do not spawn teammates for domains in different waves simultaneously:
287
+ Spawn teammates for domains within the same wave. Only domains in the same wave can run in parallel — do not spawn teammates for domains in different waves simultaneously. Each teammate uses the **domain task-dispatcher pattern** — one subagent per task within their domain (same as solo mode).
155
288
 
156
289
  ```
157
290
  Create an agent team for execution:
@@ -167,22 +300,44 @@ RULES FOR ALL TEAMMATES:
167
300
  - **Destructive Action Guard**: NEVER drop tables, remove columns, delete data, replace architecture patterns, or remove working modules without messaging the lead first. The lead must get user approval before any destructive action proceeds.
168
301
  - Only modify files listed in your domain's scope.md
169
302
  - Implement interfaces EXACTLY as specified in contracts
170
- - **Write comprehensive tests with every task** — no feature code without test code:
303
+ - **Write comprehensive FUNCTIONAL tests with every task** — no feature code without test code:
171
304
  - Unit/integration tests: happy path + edge cases + error cases for every new/changed function
172
305
  - Playwright E2E specs (if UI/routes/flows/modes changed): new specs for new features, cover all modes/flags, form validation, empty/loading/error states, common edge cases
173
306
  - Tests are part of the deliverable, not a follow-up
307
+ - **E2E tests MUST be functional, not layout tests**: Every assertion must verify an action produced the correct outcome (state changed, data loaded, content updated) — NOT just that an element is visible/clickable. A test that passes on an empty HTML shell with correct IDs is worthless. See the Functional E2E Test Requirements in the solo mode instructions above.
174
308
  - If a task is marked BLOCKED, message the lead and wait
175
309
  - Run the Pre-Commit Gate checklist from CLAUDE.md BEFORE every commit — update all affected docs
176
- - After completing each task, message the lead with:
177
- "DONE: {domain} Task {N} - {summary of what was created/modified}"
178
- - If you need to deviate from a contract, STOP and message the lead
179
310
  - **Commit immediately after each task**: `feat({domain}/task-{N}): {description}` — do NOT batch commits
180
311
  - **Deviation Rules**: (1) Bug blocking progress → fix, 3 attempts max; (2) Missing dependency → add minimum needed; (3) Blocker → fix and log to deferred-items.md; (4) Architectural change → STOP, message lead, never self-approve
181
312
 
313
+ **Task-dispatcher pattern per teammate:**
314
+ For each task in your domain's tasks.md (in order, skip completed):
315
+ 1. Load prior summaries: read up to 5 most recent `.gsd-t/domains/{your-domain}/task-*-summary.md` files
316
+ 2. Load graph context for task's files (if .gsd-t/graph/meta.json exists)
317
+ 3. Spawn one Task subagent (model: sonnet) with ONLY:
318
+ - scope.md, relevant contracts, the single task, graph context, prior summaries
319
+ - Instruction to write task summary to `.gsd-t/domains/{domain}/task-{id}-summary.md`
320
+ (format per fresh-dispatch-contract.md Task Summary Format)
321
+ 4. After task subagent returns, read the summary and pass it as prior context to the next task
322
+ 5. After completing each task, message the lead with:
323
+ "DONE: {domain} Task {N} - {summary of what was created/modified}"
324
+ 6. If you need to deviate from a contract, STOP and message the lead
325
+
182
326
  Teammate assignments:
183
- - Teammate "{domain-1}": Execute .gsd-t/domains/{domain-1}/tasks.md
184
- - Teammate "{domain-2}": Execute .gsd-t/domains/{domain-2}/tasks.md
185
- - Teammate "{domain-3}": Execute .gsd-t/domains/{domain-3}/tasks.md
327
+ - Teammate "{domain-1}": Execute .gsd-t/domains/{domain-1}/tasks.md (task-dispatcher pattern, isolated worktree)
328
+ - Teammate "{domain-2}": Execute .gsd-t/domains/{domain-2}/tasks.md (task-dispatcher pattern, isolated worktree)
329
+ - Teammate "{domain-3}": Execute .gsd-t/domains/{domain-3}/tasks.md (task-dispatcher pattern, isolated worktree)
330
+
331
+ **Worktree isolation (per domain teammate):**
332
+ Each domain teammate MUST be spawned with `isolation: "worktree"` on the Agent tool:
333
+ ```
334
+ Agent({
335
+ prompt: "{domain execution prompt — include: 'You are working in an isolated git worktree. All your changes are isolated to this worktree branch. Do not push; the lead will merge your branch after all domains complete.'}",
336
+ isolation: "worktree"
337
+ })
338
+ ```
339
+ Each teammate works in its own isolated copy of the repository. Changes from one domain do not affect another domain's working tree. This is required for parallel safety — see `.gsd-t/contracts/worktree-isolation-contract.md`.
340
+
186
341
  Lead responsibilities (QA is handled via Task subagent — spawn one after each domain checkpoint):
187
342
  - Use delegate mode (Shift+Tab)
188
343
  - Track completions from teammate messages
@@ -192,8 +347,107 @@ Lead responsibilities (QA is handled via Task subagent — spawn one after each
192
347
  3. Unblock waiting teammates
193
348
  - Update .gsd-t/progress.md after each completion
194
349
  - Resolve any contract conflicts immediately
350
+
351
+ **Sequential Merge Protocol (lead runs after ALL domain agents complete):**
352
+
353
+ Once all domain teammates report completion, the lead performs sequential atomic merges. This is the critical integration step — see `.gsd-t/contracts/worktree-isolation-contract.md` for the full merge protocol.
354
+
355
+ 1. **Determine merge order**: Read `.gsd-t/contracts/integration-points.md` — use the dependency graph to sort domains. Domains with no upstream dependencies merge first. Example: if domain-A has no deps and domain-B depends on domain-A's output, merge order is [domain-A, domain-B].
356
+
357
+ 2. **For each domain (in dependency order)**:
358
+
359
+ a. **File ownership validation (pre-merge)**: Check files the domain agent modified against the domain's `.gsd-t/domains/{domain}/scope.md`:
360
+ - If `.gsd-t/graph/meta.json` exists: run `query('getDomainBoundaryViolations', { domain })` — flag any files modified outside the domain's declared ownership
361
+ - If graph unavailable: list files changed in the worktree branch via `git diff --name-only` and compare against scope.md manually
362
+ - If violations found: log them in `.gsd-t/progress.md` as `[violation] {domain}: modified {file} outside scope`, but do NOT block merge — flag for immediate review after merge
363
+
364
+ b. **Merge the domain's worktree branch**:
365
+ ```bash
366
+ # The worktree branch name is returned by the Agent tool when isolation: "worktree" is used
367
+ git merge --no-ff {domain-worktree-branch} -m "integrate({domain}): merge worktree branch"
368
+ ```
369
+
370
+ c. **Contract validation (post-merge)**: Read each contract in `.gsd-t/contracts/` — verify the merged code still satisfies all contract shapes (API shapes, schemas, interfaces). If a contract is violated, log it immediately.
371
+
372
+ d. **Run integration tests**:
373
+ ```bash
374
+ node --test test/
375
+ # or project's test command from package.json
376
+ ```
377
+
378
+ e. **If tests PASS**: Continue to the next domain in merge order.
379
+
380
+ f. **If tests FAIL**: Roll back this domain's merge:
381
+ ```bash
382
+ git reset --hard HEAD~1
383
+ # or: git revert -m 1 HEAD --no-edit   (HEAD is a merge commit, so the mainline parent must be named)
384
+ ```
385
+ Log failure: `[rollback] {domain}: merge rolled back — integration tests failed after merge`
386
+ Record in `.gsd-t/progress.md` Decision Log.
387
+ Continue with remaining domains (other domains' merges are not affected).
388
+
389
+ 3. **Post-merge ownership report**: After all merges complete (successful or rolled back), log a summary in `.gsd-t/progress.md`:
390
+ ```
391
+ ## Worktree Merge Summary — {date}
392
+ - {domain-1}: MERGED | tests: PASS | violations: {N}
393
+ - {domain-2}: ROLLED BACK | reason: integration tests failed
394
+ - {domain-3}: MERGED | tests: PASS | violations: 0
395
+ ```
396
+
397
+ **Worktree Cleanup (MANDATORY — run after merge protocol, success or failure):**
398
+
399
+ After all merges complete (whether all passed, some rolled back, or errors occurred):
400
+
401
+ 1. List all worktree branches created during this execution run:
402
+ ```bash
403
+ git worktree list
404
+ git branch --list "gsd-t-worktree-*"
405
+ ```
406
+
407
+ 2. Remove each domain worktree:
408
+ ```bash
409
+ git worktree remove --force {worktree-path}
410
+ git branch -D {worktree-branch}
411
+ ```
412
+
413
+ 3. **Orphaned worktree detection**: If any worktrees remain after cleanup (can happen if an agent crashed):
414
+ ```bash
415
+ git worktree prune
416
+ ```
417
+ Log: `[cleanup] Pruned {N} orphaned worktrees via git worktree prune`
418
+
419
+ 4. Verify no worktrees remain except the main working tree:
420
+ ```bash
421
+ git worktree list
422
+ # should show only: {main-path} {commit} [branch]
423
+ ```
424
+
425
+ Cleanup is not optional — orphaned worktrees waste disk space and can confuse subsequent executions. Always run cleanup, even if earlier steps failed.
195
426
  ```
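Determining the merge order in step 1 of the Sequential Merge Protocol is a topological sort, and the standard `tsort` utility can do it directly. The sketch below assumes the dependency information from `integration-points.md` has already been reduced to "upstream downstream" pairs; that intermediate format and the `merge_order` wrapper are assumptions of this example.

```bash
# Hypothetical helper: order domains so that dependencies merge first.
# Input on stdin: one "upstream downstream" pair per line. A domain with no
# dependencies at all can be listed as a pair with itself, which tsort treats
# as "this item exists" without imposing any ordering.
merge_order() {
  tsort
}

# domain-B depends on domain-A, and domain-C depends on domain-B.
printf '%s\n' 'domain-A domain-B' 'domain-B domain-C' | merge_order
# prints domain-A, domain-B, domain-C on separate lines
```

When several orders are valid (for example two independent domains), `tsort` picks one of them; any such order satisfies the "no upstream dependencies merge first" rule.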
196
427
 
428
+ ## Step 3.5: Orchestrator Context Self-Check (MANDATORY)
429
+
430
+ After EVERY domain completes (and after every checkpoint), the orchestrator MUST check its own context utilization:
431
+
432
+ Run via Bash:
433
+ `if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi && echo "Orchestrator context: ${CTX_PCT}%"`
434
+
435
+ **If CTX_PCT >= 70:**
436
+ 1. **Save checkpoint to disk** — update `.gsd-t/progress.md` with:
437
+ - Which domains are complete, which remain
438
+ - Current wave, next domain to execute
439
+ - Any checkpoint results
440
+ 2. **Instruct user**: Output exactly:
441
+ ```
442
+ ⚠️ Orchestrator context at {CTX_PCT}% — approaching limit.
443
+ Progress saved. Run `/clear` then `/user:gsd-t-execute` to continue from the next domain.
444
+ ```
445
+ 3. **STOP execution.** Do NOT spawn another domain subagent. The next session will resume from saved state.
446
+
447
+ **If CTX_PCT < 70:** Continue normally to the next domain/wave.
448
+
449
+ This prevents the orchestrator from running out of context mid-milestone, which causes session breaks and summary-based recovery.
450
+
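The decision rule in Step 3.5 can be written as a small pure function. This is a minimal sketch, not the package's implementation: the environment variable names and the 70% threshold come from the step above, while `checkContext` and its return shape are hypothetical. Treating an unavailable percentage ("N/A") as "continue" is also an assumption, since the step does not say what to do in that case. The percentage is truncated to one decimal place to mirror `bc` with `scale=1`.

```javascript
// Sketch of the Step 3.5 rule: stop when context utilization reaches 70%.
// `checkContext` is a hypothetical helper; only the env var names and the
// threshold are taken from the command file.
const STOP_THRESHOLD_PCT = 70;

function checkContext(env) {
  const max = Number(env.CLAUDE_CONTEXT_TOKENS_MAX || 0);
  const used = Number(env.CLAUDE_CONTEXT_TOKENS_USED || 0);
  if (!(max > 0)) {
    // Assumption: with no maximum reported, the percentage is unknown and
    // the orchestrator continues.
    return { pct: null, action: "continue" };
  }
  // Truncate to one decimal place, as `bc` does with scale=1.
  const pct = Math.floor((used * 1000) / max) / 10;
  return { pct, action: pct >= STOP_THRESHOLD_PCT ? "stop" : "continue" };
}

console.log(checkContext({
  CLAUDE_CONTEXT_TOKENS_MAX: "200000",
  CLAUDE_CONTEXT_TOKENS_USED: "140000",
}));
```

A "stop" result corresponds to saving the checkpoint and instructing the user to run `/clear`; "continue" corresponds to moving on to the next domain or wave.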
197
451
  ## Step 4: Checkpoint Handling
198
452
 
199
453
  When a checkpoint is reached (solo or team):
@@ -246,6 +500,26 @@ When all tasks in all domains are complete:
246
500
 
247
501
  **Level 1–2**: Report completion summary and recommend proceeding to integrate phase. Wait for confirmation.
248
502
 
503
+ ## Step 7: Doc-Ripple (Automated)
504
+
505
+ After all work is committed but before reporting completion:
506
+
507
+ 1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
508
+ 2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed to completion
509
+ 3. If FIRE: spawn doc-ripple agent:
510
+
511
+ ⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
512
+
513
+ Task subagent (general-purpose, model: sonnet):
514
+ "Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
515
+ Git diff context: {files changed list}
516
+ Command that triggered: execute
517
+ Produce manifest at .gsd-t/doc-ripple-manifest.md.
518
+ Update all affected documents.
519
+ Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
520
+
521
+ 4. After doc-ripple returns, verify manifest exists and report summary inline
522
+
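Step 7 asks the orchestrator to report the doc-ripple summary inline. One way to handle the two message shapes named above (the SKIP log and the `{N} checked, {N} updated, {N} skipped` report) is sketched below. `parseDocRippleReport` and its return shape are hypothetical; only the message formats are taken from the step.

```javascript
// Sketch only: classify a doc-ripple status line using the formats quoted
// in Step 7. `parseDocRippleReport` is a hypothetical helper.
function parseDocRippleReport(line) {
  const text = line.trim();
  if (text.startsWith("Doc-ripple: SKIP")) {
    return { status: "skip" };
  }
  const match = /^Doc-ripple: (\d+) checked, (\d+) updated, (\d+) skipped$/.exec(text);
  if (!match) {
    return { status: "unrecognized" };
  }
  const [checked, updated, skipped] = match.slice(1).map(Number);
  return { status: "fired", checked, updated, skipped };
}

console.log(parseDocRippleReport("Doc-ripple: 12 checked, 3 updated, 9 skipped"));
```

An "unrecognized" result is a reasonable cue to confirm that `.gsd-t/doc-ripple-manifest.md` exists before reporting completion.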
249
523
  ## Document Ripple
250
524
 
251
525
  Execute modifies source code, so the Pre-Commit Gate (referenced in Step 9) covers document updates. For clarity, the key documents affected by execution:
@@ -38,9 +38,10 @@ MILESTONE WORKFLOW [auto] = in wave
38
38
  execute [auto] Run tasks (solo or team mode)
39
39
  test-sync [auto] Sync tests with code changes
40
40
  qa [auto] QA agent — test generation, execution, gap reporting
41
+ doc-ripple [auto] Automated document ripple — update docs after code changes
41
42
  integrate [auto] Wire domains together at boundaries
42
- verify [auto] Run quality gates
43
- complete-milestone [auto] Archive milestone + git tag
43
+ verify [auto] Run quality gates → auto-invokes complete-milestone
44
+ complete-milestone [auto] Archive milestone + git tag (auto-invoked by verify)
44
45
 
45
46
  AUTOMATION Auto
46
47
  ───────────────────────────────────────────────────────────────────────────────
@@ -231,10 +232,12 @@ Use these when user asks for help on a specific command:
231
232
  - **Use when**: Architectural decisions need exploration
232
233
 
233
234
  ### plan
234
- - **Summary**: Create atomic task lists for each domain
235
+ - **Summary**: Create atomic task lists for each domain (each task must fit in one context window)
235
236
  - **Auto-invoked**: Yes (in wave, after discuss)
236
237
  - **Creates**: `.gsd-t/domains/*/tasks.md`
237
238
  - **Use when**: Ready to define specific implementation tasks
239
+ - **Note (M22)**: Tasks auto-split if their estimated scope exceeds 70% of the context window, which guarantees that fresh dispatch works
240
+ - **Note (M26)**: Pre-mortem step now also reads rules.jsonl for historical failure patterns via getPreMortemRules
238
241
 
239
242
  ### impact
240
243
  - **Summary**: Analyze downstream effects of planned changes
@@ -247,6 +250,8 @@ Use these when user asks for help on a specific command:
247
250
  - **Auto-invoked**: Yes (in wave, after impact)
248
251
  - **Updates**: Domain tasks, progress.md, source code
249
252
  - **Use when**: Ready to implement
253
+ - **Note (M22)**: Task-level fresh dispatch (one subagent per task, ~10-20% context each). Team mode uses worktree isolation (`isolation: "worktree"`) — zero file conflicts. Adaptive replanning between domain completions.
254
+ - **Note (M26)**: Active rule injection — evaluates declarative rules from rules.jsonl before dispatching each domain's tasks. Fires matching rules as warnings in subagent prompts.
250
255
 
251
256
  ### test-sync
252
257
  - **Summary**: Keep tests aligned with code changes
@@ -260,6 +265,12 @@ Use these when user asks for help on a specific command:
260
265
  - **Creates**: Contract test skeletons, acceptance tests, edge case tests, test audit reports
261
266
  - **Use when**: Automatically spawned — never needs manual invocation. Standalone use for ad-hoc test audits.
262
267
 
268
+ ### doc-ripple
269
+ - **Summary**: Automated document ripple — identifies and updates all downstream docs after code changes
270
+ - **Auto-invoked**: Yes (after primary work in execute, integrate, quick, debug, wave)
271
+ - **Creates**: `.gsd-t/doc-ripple-manifest.md`
272
+ - **Use when**: Automatically spawned — never needs manual invocation. Standalone use for ad-hoc doc sync audits.
273
+
263
274
  ### integrate
264
275
  - **Summary**: Wire domains together at their boundaries
265
276
  - **Auto-invoked**: Yes (in wave, after execute)
@@ -267,27 +278,32 @@ Use these when user asks for help on a specific command:
267
278
  - **Use when**: Domains are complete and need to work together
268
279
 
269
280
  ### verify
270
- - **Summary**: Run quality gates across all dimensions
281
+ - **Summary**: Run quality gates across all dimensions, including goal-backward behavior verification
271
282
  - **Auto-invoked**: Yes (in wave, after integrate)
272
283
  - **Creates**: `.gsd-t/verify-report.md`
273
284
  - **Use when**: Checking that milestone meets requirements
285
+ - **Note (M22)**: Goal-backward verification step added — checks for placeholder implementations (console.log/TODO/hardcoded returns) after structural gates pass
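To illustrate what a placeholder check of this kind might look like, here is a rough line-based scan. It is not the verify command's logic: the three pattern names follow the note above (console.log, TODO, hardcoded returns), but the regular expressions, `PLACEHOLDER_PATTERNS`, and `findPlaceholders` are assumptions chosen for the sketch, and a real check would likely need more context than single lines.

```javascript
// Sketch only: flag lines that look like placeholder implementations.
// The patterns are illustrative heuristics, not gsd-t's actual rules.
const PLACEHOLDER_PATTERNS = [
  { name: "console.log", regex: /\bconsole\.log\s*\(/ },
  { name: "TODO", regex: /\bTODO\b/ },
  {
    name: "hardcoded return",
    regex: /^\s*return\s+(true|false|null|\d+|"[^"]*"|'[^']*'|\[\]|\{\})\s*;?\s*$/,
  },
];

function findPlaceholders(source) {
  const findings = [];
  source.split("\n").forEach((text, index) => {
    for (const { name, regex } of PLACEHOLDER_PATTERNS) {
      if (regex.test(text)) {
        findings.push({ line: index + 1, kind: name });
      }
    }
  });
  return findings;
}

// Made-up sample source for the demonstration.
const placeholderDemo = [
  "function getUser(id) {",
  "  // TODO: query the database",
  "  return null;",
  "}",
].join("\n");

console.log(findPlaceholders(placeholderDemo));
```

A non-empty result would be the signal to fail the gate; legitimate literal returns would produce false positives under these heuristics, which is why they are only a starting point.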
274
286
 
275
287
  ### complete-milestone
276
288
  - **Summary**: Archive milestone documentation and create git tag
277
- - **Auto-invoked**: Yes (in wave, after verify passes)
289
+ - **Auto-invoked**: Yes — by verify (Step 8, all autonomy levels) and in wave
278
290
  - **Creates**: `.gsd-t/milestones/{name}/`, git tag
279
- - **Use when**: Milestone is done and verified
291
+ - **Use when**: Auto-runs after verify passes. Can also be invoked standalone to manually close a milestone.
292
+ - **Note (M22)**: Goal-backward gate runs as final check before archiving — blocks completion if placeholders remain
293
+ - **Note (M26)**: Distillation extended with rule engine evaluation, patch candidate generation, promotion gate checks, graduation, consolidation, and quality budget governance
280
294
 
281
295
  ### wave
282
- - **Summary**: Run complete cycle automatically: partition through complete
296
+ - **Summary**: Run complete cycle automatically: partition through verify+complete
283
297
  - **Auto-invoked**: No (user triggers)
284
- - **Runs**: partition → discuss → plan → impact → execute → test-sync → integrate → verify → complete-milestone
298
+ - **Runs**: partition → discuss → plan → impact → execute → test-sync → integrate → verify+complete
285
299
  - **Use when**: Ready to execute a full milestone hands-off
286
300
 
287
301
  ### status
288
- - **Summary**: Show current progress across all domains
302
+ - **Summary**: Show current progress across all domains, including token breakdown by domain/task/phase, global ELO and cross-project rankings
289
303
  - **Auto-invoked**: No
290
- - **Reads**: All `.gsd-t/` files
304
+ - **Note (M22)**: Displays context observability data — token usage by domain, avg tokens/task, peak Ctx% per domain
305
+ - **Note (M27)**: Displays global ELO and cross-project rankings when global metrics exist
306
+ - **Reads**: All `.gsd-t/` files, `~/.claude/metrics/` (global metrics)
291
307
  - **Use when**: Need to see where things stand
292
308
 
293
309
  ### resume
@@ -316,6 +332,12 @@ Use these when user asks for help on a specific command:
316
332
  - **Creates**: `.gsd-t/dashboard.pid` (when starting server)
317
333
  - **Use when**: Monitoring live agent activity during execute/wave phases; run `gsd-t-visualize stop` to stop the server
318
334
 
335
+ ### metrics
336
+ - **Summary**: View task telemetry, process ELO, signal distribution, domain health, and cross-project comparison (with `--cross-project` flag)
337
+ - **Auto-invoked**: No
338
+ - **Reads**: `.gsd-t/metrics/task-metrics.jsonl`, `.gsd-t/metrics/rollup.jsonl`, `~/.claude/metrics/` (when `--cross-project`)
339
+ - **Use when**: Reviewing process health, first-pass rates, ELO trends, anomaly flags, or comparing signal distributions across projects
340
+
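The metrics files listed above use the `.jsonl` extension, which conventionally means one JSON object per line. A tolerant reader for that layout might look like the sketch below. The record schema is not shown in this diff, so the `task` field in the sample is purely hypothetical, as is the `parseJsonl` helper itself.

```javascript
// Sketch only: parse JSON Lines text, skipping blank lines and collecting
// malformed lines instead of throwing. `parseJsonl` is a hypothetical helper.
function parseJsonl(text) {
  const records = [];
  const errors = [];
  text.split("\n").forEach((raw, index) => {
    const line = raw.trim();
    if (line === "") return;
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      errors.push({ line: index + 1, message: err.message });
    }
  });
  return { records, errors };
}

// Sample text with a made-up "task" field and one malformed line.
const sampleMetricsText = '{"task":"t1"}\n\n{"task":"t2"}\nnot json\n';
const parsedMetrics = parseJsonl(sampleMetricsText);
console.log(parsedMetrics.records.length, parsedMetrics.errors.map((e) => e.line));
```

Collecting errors rather than aborting keeps one corrupted line from hiding the rest of the telemetry.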
319
341
  ### debug
320
342
  - **Summary**: Systematic debugging with persistent state
321
343
  - **Auto-invoked**: No