@tekyzinc/gsd-t 2.39.13 → 2.45.11

Files changed (45)
  1. package/README.md +17 -9
  2. package/bin/desktop.ini +2 -0
  3. package/bin/global-sync-manager.js +350 -0
  4. package/bin/gsd-t.js +592 -2
  5. package/bin/metrics-collector.js +167 -0
  6. package/bin/metrics-rollup.js +200 -0
  7. package/bin/patch-lifecycle.js +195 -0
  8. package/bin/rule-engine.js +160 -0
  9. package/commands/desktop.ini +2 -0
  10. package/commands/gsd-t-complete-milestone.md +192 -5
  11. package/commands/gsd-t-debug.md +16 -2
  12. package/commands/gsd-t-execute.md +257 -52
  13. package/commands/gsd-t-help.md +25 -10
  14. package/commands/gsd-t-integrate.md +35 -7
  15. package/commands/gsd-t-metrics.md +143 -0
  16. package/commands/gsd-t-plan.md +49 -2
  17. package/commands/gsd-t-quick.md +15 -3
  18. package/commands/gsd-t-status.md +78 -0
  19. package/commands/gsd-t-test-sync.md +2 -2
  20. package/commands/gsd-t-verify.md +140 -9
  21. package/commands/gsd-t-visualize.md +11 -1
  22. package/commands/gsd-t-wave.md +34 -19
  23. package/docs/GSD-T-README.md +9 -6
  24. package/docs/architecture.md +84 -2
  25. package/docs/ci-examples/desktop.ini +2 -0
  26. package/docs/ci-examples/github-actions.yml +104 -0
  27. package/docs/ci-examples/gitlab-ci.yml +116 -0
  28. package/docs/desktop.ini +2 -0
  29. package/docs/infrastructure.md +87 -1
  30. package/docs/prd-graph-engine.md +2 -2
  31. package/docs/prd-gsd2-hybrid.md +258 -135
  32. package/docs/requirements.md +63 -2
  33. package/examples/.gsd-t/contracts/desktop.ini +2 -0
  34. package/examples/.gsd-t/desktop.ini +2 -0
  35. package/examples/.gsd-t/domains/desktop.ini +2 -0
  36. package/examples/.gsd-t/domains/example-domain/desktop.ini +2 -0
  37. package/examples/desktop.ini +2 -0
  38. package/examples/rules/.gitkeep +0 -0
  39. package/package.json +40 -40
  40. package/scripts/desktop.ini +2 -0
  41. package/scripts/gsd-t-dashboard-server.js +19 -2
  42. package/scripts/gsd-t-dashboard.html +63 -0
  43. package/scripts/gsd-t-event-writer.js +1 -0
  44. package/templates/CLAUDE-global.md +30 -9
  45. package/templates/desktop.ini +2 -0
@@ -0,0 +1,143 @@
1
+ # GSD-T: Metrics — View Task Telemetry and Process Health
2
+
3
+ You are displaying metrics data from the GSD-T telemetry system. Read JSONL files directly — no module imports needed.
4
+
5
+ ## Step 1: Load Metrics Data
6
+
7
+ Read:
8
+ 1. `.gsd-t/metrics/task-metrics.jsonl` — per-task telemetry records
9
+ 2. `.gsd-t/metrics/rollup.jsonl` — per-milestone aggregation with ELO and heuristics
10
+ 3. `.gsd-t/progress.md` — current milestone ID (for default filter)
11
+
12
+ If neither metrics file (`task-metrics.jsonl` or `rollup.jsonl`) exists: display "No metrics data yet. Metrics are collected automatically during execute, quick, and debug commands." and stop.
13
+
14
+ If `$ARGUMENTS` contains a milestone ID (e.g., "M25"), use that as the filter. Otherwise, use the current active milestone from progress.md.
15
+
16
+ ## Step 2: Display Milestone Summary
17
+
18
+ From `rollup.jsonl`, find the entry matching the target milestone. Display:
19
+
20
+ ```
21
+ ## Metrics — {milestone}
22
+
23
+ | Metric | Value |
24
+ |---------------------|--------------------------------|
25
+ | Tasks | {total_tasks} |
26
+ | First-pass rate | {first_pass_rate * 100}% |
27
+ | Avg duration | {avg_duration_s}s |
28
+ | Avg context | {avg_context_pct}% |
29
+ | Total fix cycles | {total_fix_cycles} |
30
+ | Total tokens | {total_tokens} |
31
+ ```
32
+
33
+ If no rollup entry exists for the milestone, compute summary directly from task-metrics.jsonl records.
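The fallback above (computing the summary directly from task-metrics records) could look roughly like the following. This is a minimal sketch, not the package's implementation: the field names (`milestone`, `pass`, `fix_cycles`, `duration_s`, `tokens_used`, `context_pct`) are assumed from the `metrics-collector.js` flags shown elsewhere in this diff, and `parseJsonl` and `summarizeMilestone` are hypothetical helper names.

```javascript
// Sketch only: helper names are hypothetical and field names are assumed
// from the metrics-collector CLI flags; check them against real records.

// Parse JSONL text (one JSON object per non-empty line) into an array.
function parseJsonl(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean)
    .map((line) => JSON.parse(line));
}

// Build the Step 2 summary fields for one milestone, or null if it has no records.
function summarizeMilestone(records, milestone) {
  const tasks = records.filter((r) => r.milestone === milestone);
  if (tasks.length === 0) return null;
  const sum = (key) => tasks.reduce((acc, t) => acc + (Number(t[key]) || 0), 0);
  // "First pass" here means the task passed with zero fix cycles.
  const firstPass = tasks.filter((t) => t.pass && t.fix_cycles === 0).length;
  return {
    total_tasks: tasks.length,
    first_pass_rate: firstPass / tasks.length,
    avg_duration_s: sum('duration_s') / tasks.length,
    avg_context_pct: sum('context_pct') / tasks.length,
    total_fix_cycles: sum('fix_cycles'),
    total_tokens: sum('tokens_used'),
  };
}
```

A caller would read `.gsd-t/metrics/task-metrics.jsonl` with `fs.readFileSync`, pass the text to `parseJsonl`, and render the returned object into the table.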
34
+
35
+ ## Step 3: Display Process ELO
36
+
37
+ ```
38
+ ## Process ELO
39
+
40
+ {elo_after} ({elo_delta > 0 ? '↑' : '↓'} {elo_delta} from {elo_before})
41
+ ```
42
+
43
+ If no previous milestone, show: `{elo_after} (baseline — first milestone)`
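The arrow-and-delta template used here (and again in the trend table) can be sketched as a small formatter. Two choices below are assumptions rather than anything the command file specifies: the sketch prints the absolute value next to the arrow, and it uses `→` for a zero delta, whereas the template as written would show `↓` for zero. `formatDelta` and `formatEloLine` are hypothetical names.

```javascript
// Sketch only: hypothetical helpers for rendering deltas with a direction arrow.

// Render a signed delta as "<arrow> <magnitude><unit>".
function formatDelta(delta, unit = '') {
  const arrow = delta > 0 ? '↑' : delta < 0 ? '↓' : '→';
  return `${arrow} ${Math.abs(delta)}${unit}`;
}

// Render the Process ELO line when a previous milestone exists.
function formatEloLine(eloAfter, eloBefore) {
  const delta = eloAfter - eloBefore;
  return `${eloAfter} (${formatDelta(delta)} from ${eloBefore})`;
}
```

For instance, `formatEloLine(1520, 1500)` yields `1520 (↑ 20 from 1500)`.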
44
+
45
+ ## Step 4: Display Signal Distribution
46
+
47
+ From rollup `signal_distribution` or computed from task-metrics:
48
+
49
+ ```
50
+ ## Signal Distribution
51
+
52
+ | Signal Type | Count |
53
+ |------------------|-------|
54
+ | pass-through | {N} |
55
+ | fix-cycle | {N} |
56
+ | debug-invoked | {N} |
57
+ | user-correction | {N} |
58
+ | phase-skip | {N} |
59
+ ```
60
+
61
+ ## Step 5: Display Domain Breakdown
62
+
63
+ From rollup `domain_breakdown`:
64
+
65
+ ```
66
+ ## Domain Breakdown
67
+
68
+ | Domain | Tasks | Pass% | Avg Duration |
69
+ |-------------------|-------|-------|--------------|
70
+ | {domain} | {N} | {N}% | {N}s |
71
+ ```
72
+
73
+ ## Step 6: Display Trend Comparison
74
+
75
+ If `trend_delta` exists (previous milestone data available):
76
+
77
+ ```
78
+ ## Trend vs Previous Milestone
79
+
80
+ | Metric | Delta |
81
+ |-----------------|--------------------------|
82
+ | First-pass rate | {delta > 0 ? '↑' : '↓'} {delta}% |
83
+ | Avg duration | {delta > 0 ? '↑' : '↓'} {delta}s |
84
+ | ELO | {delta > 0 ? '↑' : '↓'} {delta} |
85
+ ```
86
+
87
+ If no previous milestone: "First milestone — no trend data yet."
88
+
89
+ ## Step 7: Display Heuristic Anomalies
90
+
91
+ If `heuristic_flags` has entries:
92
+
93
+ ```
94
+ ## Anomaly Detection
95
+
96
+ | Heuristic | Severity | Description |
97
+ |-------------------------------|----------|--------------------------|
98
+ | {heuristic} | {sev} | {description} |
99
+ ```
100
+
101
+ If no anomalies: "No anomalies detected."
102
+
103
+ ## Step 8: Cross-Project Comparison (when --cross-project flag present)
104
+
105
+ If `$ARGUMENTS` contains `--cross-project`:
106
+
107
+ 1. Run via Bash:
108
+ ```bash
109
+ node -e "const g = require('./bin/global-sync-manager.js'); const r = g.compareSignalDistributions(require('./package.json').name || require('path').basename(process.cwd())); console.log(JSON.stringify(r));" 2>/dev/null
110
+ ```
111
+
112
+ 2. If the result has `insufficient_data: true`, display:
113
+ "No global metrics yet — complete milestones in multiple projects to enable cross-project comparison"
114
+
115
+ 3. Otherwise, display the cross-project signal distribution comparison:
116
+ ```
117
+ ## Cross-Project Signal Distribution
118
+
119
+ | Project | Tasks | Pass-Through | Fix-Cycle | Other |
120
+ |----------------------|-------|--------------|-----------|-------|
121
+ | {project} {★ if is_queried} | {N} | {rate} | {rate} | ... |
122
+ ```
123
+
124
+ 4. If `$ARGUMENTS` also contains `--domain {domainType}`, run:
125
+ ```bash
126
+ node -e "const g = require('./bin/global-sync-manager.js'); const r = g.getDomainTypeComparison('{domainType}'); console.log(JSON.stringify(r));" 2>/dev/null
127
+ ```
128
+ Display the domain-type comparison table:
129
+ ```
130
+ ## Domain-Type Comparison: {domainType}
131
+
132
+ | Project | Tasks | Pass-Through | Fix-Cycle |
133
+ |----------------------|-------|--------------|-----------|
134
+ | {project} | {N} | {count} | {count} |
135
+ ```
136
+
137
+ If `--cross-project` is NOT in `$ARGUMENTS`: skip this step entirely (no change to existing behavior).
138
+
139
+ $ARGUMENTS
140
+
141
+ ## Auto-Clear
142
+
143
+ All work is committed to project files. Execute `/clear` to free the context window for the next command.
@@ -25,6 +25,24 @@ If `.gsd-t/graph/meta.json` exists (graph index is available):
25
25
 
26
26
  If graph is not available, skip this step.
27
27
 
28
+ ## Step 1.7: Pre-Mortem — Historical Failure Analysis
29
+
30
+ Before creating task lists, check historical task-metrics for domain-level failure patterns from previous milestones:
31
+
32
+ 1. Run via Bash:
33
+ `node -e "const c = require('./bin/metrics-collector.js'); const domains = [/* list domain names from scope files */]; domains.forEach(d => { const w = c.getPreFlightWarnings(d); if(w.length) w.forEach(x => console.log('⚠️ ' + x)); });" 2>/dev/null || true`
34
+
35
+ 2. If any domain has `first_pass_rate < 0.6` historically:
36
+ - Display warning inline: `⚠️ Domain {name} has historically low first-pass rate ({rate}%). Consider: smaller tasks, more explicit acceptance criteria, or additional contract detail.`
37
+ - This is **non-blocking** — it informs task design, does not prevent planning.
38
+
39
+ 3. If `.gsd-t/metrics/task-metrics.jsonl` does not exist: skip this step silently (first milestone, no historical data).
40
+
41
+ 4. **Rule-based pre-mortem**: Run via Bash:
42
+ `node -e "const re = require('./bin/rule-engine.js'); const domains = [/* list domain names */]; domains.forEach(d => { const rules = re.getPreMortemRules(d); if(rules.length) rules.forEach(r => console.log('RULE ' + r.id + ': ' + r.name + ' — historically triggered for domains like ' + d)); });" 2>/dev/null || true`
43
+
44
+ If matching rules found: display warnings inline (non-blocking — informs task design). Falls back gracefully if rules.jsonl does not exist or is empty.
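The `first_pass_rate < 0.6` check in item 2 can be illustrated with a standalone sketch. It is not the body of `getPreFlightWarnings` (which this diff does not show). `lowFirstPassDomains` is a hypothetical helper, and the record fields (`domain`, `pass`, `fix_cycles`) are assumed from the collector flags used elsewhere in this diff.

```javascript
// Sketch only: hypothetical helper; record field names are assumptions.

// Return the domains whose historical first-pass rate falls below the threshold.
function lowFirstPassDomains(records, threshold = 0.6) {
  const byDomain = new Map();
  for (const r of records) {
    const stats = byDomain.get(r.domain) || { total: 0, firstPass: 0 };
    stats.total += 1;
    if (r.pass && r.fix_cycles === 0) stats.firstPass += 1;
    byDomain.set(r.domain, stats);
  }
  const warnings = [];
  for (const [domain, { total, firstPass }] of byDomain) {
    const rate = firstPass / total;
    if (rate < threshold) warnings.push({ domain, rate, total });
  }
  return warnings;
}
```

Each returned entry carries enough information to fill the `{name}` and `{rate}` slots of the inline warning.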
45
+
28
46
  ## Step 2: Create Task Lists Per Domain
29
47
 
30
48
  ### SharedCore-First Pre-Check
@@ -67,6 +85,30 @@ For each domain, write `.gsd-t/domains/{domain-name}/tasks.md`:
67
85
  4. **Contract-bound**: Every task that crosses a domain boundary must reference the specific contract it implements
68
86
  5. **Ordered**: Tasks within a domain are numbered in execution order
69
87
  6. **No implicit knowledge**: Don't assume the executing agent remembers previous tasks — reference contracts and files explicitly
88
+ 7. **Context-window fit**: Each task MUST be executable within a single context window. Apply the scope validation heuristics below.
89
+
90
+ ### Task Scope Validation
91
+
92
+ After writing each task, apply this heuristic check before finalizing:
93
+
94
+ **Splitting candidates — flag if ANY of these are true:**
95
+ - Task lists **more than 5 files** to modify or create
96
+ - Task has **more than 3 complex dependencies** (other tasks, contracts, or external systems it must read and understand)
97
+ - Task description spans multiple distinct concerns (e.g., "implement X and also refactor Y and update Z docs")
98
+
99
+ **Warning threshold:** If a task is flagged, emit:
100
+ > ⚠️ **Task scope warning — {domain} Task {N}**: Estimated context load is high ({N} files, {N} dependencies). This task may approach the 70% context window threshold. Consider splitting into:
101
+ > - Task {N}a: {first concern}
102
+ > - Task {N}b: {second concern}
103
+
104
+ **Auto-split rule (Level 3 Full Auto):** If a task has >5 files AND >3 dependencies, split it automatically. Renumber subsequent tasks. Document the split rationale in the task's Dependencies field.
105
+
106
+ **Guidance for estimating context size:**
107
+ - Each file to read ≈ 1–5% of context window (varies by file size)
108
+ - CLAUDE.md + scope.md + constraints.md + contracts ≈ 15–25% baseline overhead
109
+ - Tasks with >5 files or >3 cross-domain contracts commonly exceed 70% total context
110
+
111
+ This rule implements the "task must fit in one context window" constraint — a task that compacts its subagent is a task that produces incomplete or corrupt output.
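The scope heuristics above reduce to a few comparisons. The sketch below assumes a task object with `files`, `dependencies`, and `concerns` arrays; that shape is hypothetical, since the real `tasks.md` format is Markdown. It only shows how the "flag if ANY" and "auto-split if BOTH" rules differ.

```javascript
// Sketch only: the task object shape is a hypothetical illustration.

// Apply the splitting heuristics: flag on any condition, auto-split on files AND deps.
function checkTaskScope(task) {
  const fileCount = task.files.length;
  const depCount = task.dependencies.length;
  const tooManyFiles = fileCount > 5;
  const tooManyDeps = depCount > 3;
  const multiConcern = (task.concerns || []).length > 1;
  return {
    fileCount,
    depCount,
    flagged: tooManyFiles || tooManyDeps || multiConcern,
    // Auto-split applies only at Level 3 (Full Auto), per the rule above.
    autoSplit: tooManyFiles && tooManyDeps,
  };
}
```

A task with six files and two dependencies is flagged for a warning but not auto-split; a task with six files and four dependencies is both.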
70
112
 
71
113
  ### Cross-Domain Duplicate Operation Scan
72
114
 
@@ -259,8 +301,13 @@ After subagent returns — run via Bash:
259
301
  Compute tokens and compaction:
260
302
  - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
261
303
  - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
262
- Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
263
- `| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | {TOKENS} | {COMPACTED} |`
304
+ Compute context utilization (run via Bash):
305
+ `if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
306
+ Alert on context thresholds (display to user inline):
307
+ - If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
308
+ - If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
309
+ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
310
+ `| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
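The token, compaction, and threshold arithmetic above can be restated as a JavaScript sketch. It mirrors the shell formulas, with two assumptions: the percentage is truncated to one decimal (as `bc` does with `scale=1`), and the two alerts are treated as mutually exclusive so that a value of 85 or more reports only the critical level. The function names are hypothetical.

```javascript
// Sketch only: hypothetical helpers mirroring the shell arithmetic above.

// Tokens consumed by a subagent run; a lower end count is taken to mean compaction.
function tokenDelta(tokStart, tokEnd, tokMax) {
  const compacted = tokEnd < tokStart;
  const tokens = compacted ? (tokMax - tokStart) + tokEnd : tokEnd - tokStart;
  return { tokens, compacted };
}

// Context utilization as a percentage truncated to one decimal, or 'N/A' if max is unknown.
function contextPct(used, max) {
  if (!(max > 0)) return 'N/A';
  return Math.floor((used * 1000) / max) / 10;
}

// Map a percentage to an alert level; non-numeric input ('N/A') produces no alert.
function contextAlert(pct) {
  if (typeof pct !== 'number') return null;
  if (pct >= 85) return 'CRITICAL';
  if (pct >= 70) return 'WARNING';
  return null;
}
```

Guarding on the value's type matters because `CTX_PCT` can be the string `N/A` or a decimal such as `72.5`, neither of which a plain integer comparison in the shell handles.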
264
311
  If validation FAIL, append each gap to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
265
312
  `| {DT_START} | gsd-t-plan | Step 7 | haiku | {DURATION}s | medium | {gap description} |`
266
313
 
@@ -83,6 +83,16 @@ When you encounter unexpected situations:
83
83
  5. Verify it works
84
84
  6. Commit: `[quick] {description}`
85
85
 
86
+ ## Step 3.5: Emit Task Metrics
87
+
88
+ After committing, emit a task-metrics record for this quick task — run via Bash:
89
+ `node bin/metrics-collector.js --milestone {current-milestone-or-none} --domain {domain-or-quick} --task quick-{timestamp} --command quick --duration_s {elapsed} --tokens_used {estimated} --context_pct ${CTX_PCT:-0} --pass {true|false} --fix_cycles {0|N} --signal_type {pass-through|fix-cycle} --notes "[quick] {description}" 2>/dev/null || true`
90
+
91
+ Signal type: `pass-through` if task completed on first attempt; `fix-cycle` if rework was needed.
92
+
93
+ Emit task_complete event — run via Bash:
94
+ `node ~/.claude/scripts/gsd-t-event-writer.js --type task_complete --command gsd-t-quick --reasoning "signal_type={signal_type}, domain={domain}" --outcome {success|failure} || true`
95
+
86
96
  ## Step 4: Document Ripple (if GSD-T is active)
87
97
 
88
98
  If `.gsd-t/progress.md` exists, assess what documentation was affected and update ALL relevant files:
@@ -123,9 +133,11 @@ Quick does not mean skip testing. Before committing:
123
133
  - Playwright E2E specs (if UI/routes/flows/modes changed): create new specs for new functionality, update existing specs for changed behavior
124
134
  - Cover all modes/flags affected by this change
125
135
  - "No feature code without test code" applies to quick tasks too
126
- 2. **Run the FULL test suite** — not just affected tests:
127
- - All unit/integration tests
128
- - Full Playwright E2E suite (if configured)
136
+ 2. **Run ALL configured test suites** — not just affected tests, not just one suite:
137
+ a. Detect all runners: check for vitest/jest config, playwright.config.*, cypress.config.*
138
+ b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
139
+ c. If `playwright.config.*` exists → `npx playwright test` (full suite)
140
+ d. Report ALL results: "Unit: X/Y pass | E2E: X/Y pass"
129
141
  - Fix any failures before proceeding (up to 2 attempts)
130
142
  3. **Verify against requirements**:
131
143
  - Does the change satisfy its intended requirement?
@@ -57,6 +57,59 @@ If `.gsd-t/backlog.md` exists, read and parse it. Show total count and top 3 ite
57
57
  If there are blockers or issues, highlight them.
58
58
  If the user provides $ARGUMENTS, focus the status on that specific domain or aspect.
59
59
 
60
+ ## Token Usage Breakdown
61
+
62
+ If `.gsd-t/token-log.md` exists, read it and append a token breakdown to the status report.
63
+
64
+ Parse each row in the table. Handle both old format (9 columns) and extended format (12 columns with Domain, Task, Ctx%). Rows with missing or empty Domain column are assigned domain "(untagged)".
65
+
66
+ ### Token Usage by Domain
67
+ Group rows by Domain. For each domain, sum Tokens and collect all Ctx% values (ignoring "N/A" and empty). Display:
68
+
69
+ ```
70
+ ## Token Usage by Domain
71
+ | Domain | Tokens | Subagents | Peak Ctx% |
72
+ |----------------|--------|-----------|-----------|
73
+ | auth | 12,400 | 4 | 14% |
74
+ | notifications | 45,200 | 3 | 52% ⚠️ |
75
+ | (untagged) | 8,100 | 6 | N/A |
76
+ ```
77
+
78
+ Flag any domain where Peak Ctx% >= 70 with `⚠️` suffix.
79
+
80
+ ### Token Usage by Phase/Command
81
+ Group rows by Command. For each command, sum Tokens and count subagent rows. Display:
82
+
83
+ ```
84
+ ## Token Usage by Command
85
+ | Command | Tokens | Subagents |
86
+ |---------------|--------|-----------|
87
+ | gsd-t-execute | 86,200 | 14 |
88
+ | gsd-t-wave | 12,400 | 9 |
89
+ | gsd-t-plan | 3,400 | 1 |
90
+ ```
91
+
92
+ If token-log.md does not exist or is empty, skip this section entirely (no error).
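Parsing both the 9-column and 12-column row formats might look like the sketch below. It assumes every row starts and ends with a pipe and that the column order matches the header shown in this diff (Tokens at index 7, Domain at 9, Ctx% at 11). `parseTokenLog` and `usageByDomain` are hypothetical names.

```javascript
// Sketch only: hypothetical helpers; column positions are assumed from the header.

// Turn token-log.md table rows into objects, tolerating old 9-column rows.
function parseTokenLog(markdown) {
  const rows = [];
  for (const line of markdown.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('|')) continue;
    const cells = trimmed.split('|').slice(1, -1).map((c) => c.trim());
    // Skip the header row and any |---| separator row.
    if (cells[0] === 'Datetime-start' || /^-+$/.test(cells[0])) continue;
    const ctx = parseFloat(cells[11]); // NaN for missing, empty, or "N/A"
    rows.push({
      command: cells[2],
      tokens: Number((cells[7] || '0').replace(/,/g, '')) || 0,
      domain: cells[9] || '(untagged)',
      ctxPct: Number.isNaN(ctx) ? null : ctx,
    });
  }
  return rows;
}

// Group rows by domain: sum tokens, count subagent rows, track peak Ctx%.
function usageByDomain(rows) {
  const out = new Map();
  for (const r of rows) {
    const d = out.get(r.domain) || { tokens: 0, subagents: 0, peakCtx: null, warn: false };
    d.tokens += r.tokens;
    d.subagents += 1;
    if (r.ctxPct !== null) {
      d.peakCtx = d.peakCtx === null ? r.ctxPct : Math.max(d.peakCtx, r.ctxPct);
    }
    d.warn = d.peakCtx !== null && d.peakCtx >= 70;
    out.set(r.domain, d);
  }
  return out;
}
```

Grouping by `command` instead of `domain` follows the same pattern and yields the per-command table.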
93
+
94
+ ## Process Health
95
+
96
+ If `.gsd-t/metrics/rollup.jsonl` exists, read the latest entry and append to the status report:
97
+
98
+ ```
99
+ Process Health:
100
+ ELO: {elo_after} ({elo_delta > 0 ? '↑' : '↓'} {elo_delta})
101
+ Quality: {first_pass_rate * 100}% first-pass rate | {total_fix_cycles} fix cycles
102
+ ```
103
+
104
+ If `.gsd-t/metrics/task-metrics.jsonl` exists but no rollup.jsonl, compute first_pass_rate directly from task-metrics for the current milestone and display:
105
+
106
+ ```
107
+ Process Health:
108
+ Quality: {rate}% first-pass rate (current milestone, no rollup yet)
109
+ ```
110
+
111
+ If neither file exists, skip this section entirely.
112
+
60
113
  ## Graph Status
61
114
 
62
115
  If `.gsd-t/graph/meta.json` exists, read it and append to the status report:
@@ -81,6 +134,31 @@ After displaying the project status, check for GSD-T updates:
81
134
 
82
135
  5. If versions match, skip — don't show anything
83
136
 
137
+ ## Global ELO & Cross-Project Rankings
138
+
139
+ After the Process Health section, check for global metrics:
140
+
141
+ 1. Run via Bash:
142
+ ```bash
143
+ node -e "const g = require('./bin/global-sync-manager.js'); const name = (() => { try { return require('./package.json').name; } catch { return require('path').basename(process.cwd()); } })(); const elo = g.getGlobalELO(name); const ranks = g.getProjectRankings(); console.log(JSON.stringify({ elo, ranks, name }));" 2>/dev/null
144
+ ```
145
+
146
+ 2. If the result returns `elo: null` or the command fails: display "No global metrics yet" and skip.
147
+
148
+ 3. If global ELO data exists, display:
149
+ ```
150
+ Global ELO: {elo} (rank #{position} of {total} projects)
151
+ ```
152
+ Where position is the 1-based index of the current project in the rankings array.
153
+
154
+ 4. If 2+ projects have global rollup data, display the top 5 rankings:
155
+ ```
156
+ ## Cross-Project Rankings (Top 5)
157
+ | Rank | Project | ELO | Latest Milestone |
158
+ |------|------------------|--------|------------------|
159
+ | 1 | {project} | {elo} | {milestone} |
160
+ ```
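The rank lookup and top-5 cut described in items 3 and 4 can be sketched as follows. The shape of the rankings array is an assumption: this diff does not show what `getProjectRankings()` returns, so the `project`, `elo`, and `milestone` fields and both function names are hypothetical.

```javascript
// Sketch only: the rankings entry shape and helper names are assumptions.

// Build the "Global ELO" line; position is the 1-based index in the rankings array.
function globalEloLine(rankings, projectName) {
  const index = rankings.findIndex((r) => r.project === projectName);
  if (index === -1) return 'No global metrics yet';
  const { elo } = rankings[index];
  return `Global ELO: ${elo} (rank #${index + 1} of ${rankings.length} projects)`;
}

// Rows for the rankings table; empty unless at least two projects have data.
function topRankings(rankings, limit = 5) {
  if (rankings.length < 2) return [];
  return rankings.slice(0, limit).map((r, i) => ({ rank: i + 1, ...r }));
}
```

The sketch assumes the array is already sorted by ELO in descending order, which is what a "rankings" result suggests but is not confirmed by this diff.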
161
+
84
162
  $ARGUMENTS
85
163
 
86
164
  ## Auto-Clear
@@ -111,8 +111,8 @@ pytest tests/test_{module}.py -v
111
111
  npm test -- --testPathPattern="{module}"
112
112
  ```
113
113
 
114
- ### B) E2E Tests
115
- If an E2E framework is detected, run E2E tests affected by the changes:
114
+ ### B) E2E Tests (MANDATORY when config exists)
115
+ If `playwright.config.*` or `cypress.config.*` exists, you MUST run E2E tests; skipping is never acceptable:
116
116
 
117
117
  ```bash
118
118
  # Playwright
@@ -199,6 +199,95 @@ Create or update `.gsd-t/verify-report.md`:
199
199
  | 2 | ui | Add loading states for async calls | WARN |
200
200
  ```
201
201
 
202
+ ## Step 5.25: Metrics Quality Budget Check
203
+
204
+ Check task-metrics for the current milestone to detect quality budget violations:
205
+
206
+ 1. Run via Bash:
207
+ `node -e "const c = require('./bin/metrics-collector.js'); const r = c.readTaskMetrics({milestone: '{milestone-id}'}); if(!r.length){console.log('No metrics data — quality budget check skipped');process.exit(0);} const pass=r.filter(t=>t.fix_cycles===0&&t.pass).length; const rate=pass/r.length; console.log('First-pass rate: '+(rate*100).toFixed(1)+'% ('+pass+'/'+r.length+')'); if(rate<0.6) console.log('⚠️ Quality budget WARNING: first-pass rate below 60%');" 2>/dev/null || true`
208
+
209
+ 2. Run heuristics check via Bash:
210
+ `node -e "const m=require('./bin/metrics-rollup.js'); const r=m.readRollups({milestone:'{milestone-id}'}); if(r.length&&r[r.length-1].heuristic_flags.some(f=>f.severity==='HIGH')) console.log('⚠️ HIGH severity heuristic flag detected — review before completing milestone');" 2>/dev/null || true`
211
+
212
+ 3. Display quality metrics summary inline. Quality budget violation is a **WARNING** (non-blocking) — does not fail verify.
213
+
214
+ 4. Include quality budget status in the verification report (Step 5):
215
+ `- Quality Budget: {PASS/WARN} — first-pass rate {N}%{, HIGH heuristic: {name} if any}`
216
+
217
+ ## Step 5.5: Goal-Backward Verification (Post-Gate Behavior Check)
218
+
219
+ This step runs **after all 8 quality gates pass**. It verifies that milestone goals are actually achieved end-to-end — not just structurally present. It catches placeholder implementations that pass all structural gates.
220
+
221
+ Refer to `.gsd-t/contracts/goal-backward-contract.md` for the full verification flow, placeholder patterns, and findings report format.
222
+
223
+ ### 5.5.1 Load Milestone Goals and Requirements
224
+
225
+ 1. Read `.gsd-t/progress.md` — extract the current milestone name and goals
226
+ 2. Read `docs/requirements.md` — identify **critical requirements** (skip trivial/low-priority items)
227
+
228
+ ### 5.5.2 Trace Requirements to Behavior
229
+
230
+ For each critical requirement:
231
+
232
+ 1. **If `.gsd-t/graph/meta.json` exists (graph available)**:
233
+ - Trace the requirement → code path → behavior chain using graph queries
234
+ - Use `getRequirementFor`, `getCallers`, and `getTestsFor` to build the chain
235
+ - Flag requirements with no traceable code path as CRITICAL findings
236
+
237
+ 2. **If graph is not available (fallback to grep)**:
238
+ - Search the codebase for the feature/function implementing each requirement
239
+ - Trace from entry point → core logic → output/response
240
+
241
+ ### 5.5.3 Scan for Placeholder Patterns
242
+
243
+ For each file identified in the requirement traces above, scan for these placeholder patterns:
244
+
245
+ | Pattern | Detection Hint | Severity |
246
+ |---------|---------------|----------|
247
+ | console.log placeholder | `console.log.*TODO\|console.log.*implement` | CRITICAL |
248
+ | TODO/FIXME in implementation | `// TODO\|// FIXME\|# TODO\|# FIXME` in non-test files | CRITICAL |
249
+ | Empty function body | `function \w+\(\) \{\}` or `\(\) => \{\}` with no logic | CRITICAL |
250
+ | Throw not-implemented | `throw new Error.*not implemented\|throw new Error.*TODO` | CRITICAL |
251
+ | Hardcoded return | `return "success"\|return true` with no conditional logic | HIGH |
252
+ | Static UI text | Static `<span>` or text that never updates based on state | HIGH |
253
+ | Pass-through stub | `return input\|return req\|return data` with no transformation | MEDIUM |
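A line-by-line scan for the four CRITICAL patterns in the table could be sketched as below. The regular expressions are JavaScript adaptations of the detection hints. The HIGH and MEDIUM patterns are left out because their conditions ("with no conditional logic", "never updates based on state") need more context than a single line provides. `scanForPlaceholders` is a hypothetical name.

```javascript
// Sketch only: regexes adapted from the detection hints; helper name is hypothetical.
const CRITICAL_PATTERNS = [
  { name: 'console.log placeholder', regex: /console\.log.*(TODO|implement)/ },
  { name: 'TODO/FIXME in implementation', regex: /(\/\/|#)\s*(TODO|FIXME)/ },
  { name: 'Empty function body', regex: /function\s+\w+\s*\(\)\s*\{\s*\}|\(\)\s*=>\s*\{\s*\}/ },
  { name: 'Throw not-implemented', regex: /throw new Error.*(not implemented|TODO)/i },
];

// Scan one file's source text and return findings with 1-based line numbers.
function scanForPlaceholders(path, source) {
  const findings = [];
  source.split('\n').forEach((text, i) => {
    for (const { name, regex } of CRITICAL_PATTERNS) {
      if (regex.test(text)) {
        findings.push({ file: path, line: i + 1, pattern: name, severity: 'CRITICAL' });
        break; // report at most one pattern per line
      }
    }
  });
  return findings;
}
```

Per the table, the TODO/FIXME pattern applies only to non-test files, so a caller would filter test paths out before scanning.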
254
+
255
+ ### 5.5.4 Produce Findings Report
256
+
257
+ Format findings per the goal-backward-contract.md report format:
258
+
259
+ ```markdown
260
+ ## Goal-Backward Verification Report
261
+
262
+ ### Status: PASS | FAIL
263
+
264
+ ### Findings
265
+ | # | Requirement | File:Line | Pattern | Severity | Description |
266
+ |---|-------------|-----------|---------|----------|-------------|
267
+ | 1 | {req-id} | {path}:{line} | {pattern} | {severity} | {what's wrong} |
268
+
269
+ ### Summary
270
+ - Requirements checked: {N}
271
+ - Findings: {N} ({critical}, {high}, {medium})
272
+ - Verdict: {PASS if 0 critical/high, FAIL otherwise}
273
+ ```
274
+
275
+ ### 5.5.5 Apply Blocking Rules
276
+
277
+ - **CRITICAL or HIGH findings** → Goal-Backward status = **FAIL** — block verification
278
+ - Append findings to the Critical section of the verification report (Step 5)
279
+ - Set overall verification status to FAIL
280
+ - **MEDIUM findings** → Goal-Backward status = **WARN** — log but do not block
281
+ - Append findings to the Warnings section of the verification report (Step 5)
282
+ - **No findings** → Goal-Backward status = **PASS** — add to verification report summary
283
+
284
+ Add a `Goal-Backward:` line to the Step 5 verification report summary:
285
+ ```
286
+ - Goal-Backward: {PASS/WARN/FAIL} — {N} requirements checked, {N} findings ({critical} critical, {high} high, {medium} medium)
287
+ ```
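The blocking rules in 5.5.5 amount to a severity-to-status mapping. The sketch below assumes findings are objects with a `severity` field of `CRITICAL`, `HIGH`, or `MEDIUM`, as in the findings table; `goalBackwardStatus` is a hypothetical name.

```javascript
// Sketch only: hypothetical helper; findings shape follows the report table.

// Derive the Goal-Backward status and per-severity counts from a findings list.
function goalBackwardStatus(findings) {
  const count = (sev) => findings.filter((f) => f.severity === sev).length;
  const critical = count('CRITICAL');
  const high = count('HIGH');
  const medium = count('MEDIUM');
  let status = 'PASS';
  if (critical + high > 0) status = 'FAIL'; // blocks verification
  else if (medium > 0) status = 'WARN'; // logged, non-blocking
  return { status, critical, high, medium };
}
```

The returned counts map directly onto the `{critical}`, `{high}`, and `{medium}` placeholders in the summary line.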
288
+
289
+ ---
290
+
202
291
  ## Step 6: Handle Remediation
203
292
 
204
293
  If there are CRITICAL findings:
@@ -217,15 +306,9 @@ Update `.gsd-t/progress.md`:
217
306
 
218
307
  ### Autonomy Behavior
219
308
 
220
- **Level 3 (Full Auto)**:
221
- - VERIFIED → Log "✅ Verify complete, all quality gates passed" and auto-advance to complete-milestone. Do NOT wait for user input.
222
- - CONDITIONAL PASS → Log warnings, treat as VERIFIED, and auto-advance. Do NOT wait for user input.
223
- - FAIL → Auto-execute remediation tasks (up to 2 fix attempts). If still failing after 2 attempts, STOP and report to user.
224
-
225
- **Level 1–2**:
226
- - VERIFIED → Milestone complete, proceed to next milestone or ship
227
- - CONDITIONAL PASS → User decides if warnings are acceptable
228
- - FAIL → Return to execute phase for remediation tasks
309
+ **All Levels**:
310
+ - VERIFIED or CONDITIONAL PASS → **Auto-invoke complete-milestone** (see Step 8 below). Completing a verified milestone is mechanical; there is no judgment call that benefits from user review.
311
+ - FAIL → **Level 3**: Auto-execute remediation tasks (up to 2 fix attempts). If still failing after 2 attempts, STOP and report to user. **Level 1–2**: Return to execute phase for remediation tasks.
229
312
 
230
313
  ## Document Ripple
231
314
 
@@ -238,6 +321,54 @@ Update `.gsd-t/progress.md`:
238
321
  4. **`.gsd-t/techdebt.md`** — If verification found new quality or security issues, add as debt
239
322
  5. **`docs/requirements.md`** — If verification revealed unmet requirements, update status
240
323
 
324
+ ## Step 8: Auto-Invoke Complete-Milestone
325
+
326
+ **This step is MANDATORY and runs at ALL autonomy levels.** Completing a verified milestone is a mechanical operation (archive, tag, bump version, update docs). There is no decision that benefits from user review — the decision was made when verification passed.
327
+
328
+ If status is VERIFY-FAILED:
329
+ - Do NOT invoke complete-milestone
330
+ - Report failures and stop
331
+
332
+ If status is VERIFIED or VERIFIED-WITH-WARNINGS:
333
+ 1. Log: "✅ Verify complete — spawning complete-milestone agent..."
334
+
335
+ **OBSERVABILITY LOGGING (MANDATORY):**
336
+ Before spawning — run via Bash:
337
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
338
+
339
+ 2. Spawn a Task subagent (model: sonnet, mode: bypassPermissions):
340
+ ```
341
+ "Execute the complete-milestone phase of the current GSD-T milestone.
342
+
343
+ Read and follow the full instructions in commands/gsd-t-complete-milestone.md
344
+ (resolve from ~/.claude/commands/ if not in project).
345
+ Read .gsd-t/progress.md for current milestone and state.
346
+ Read CLAUDE.md for project conventions.
347
+ Read .gsd-t/contracts/ for domain interfaces.
348
+
349
+ Complete the phase fully:
350
+ - Follow every step in the command file
351
+ - Update .gsd-t/progress.md status when done
352
+ - Run document ripple as specified
353
+ - Commit your work
354
+
355
+ Report back: one-line status summary."
356
+ ```
357
+
358
+ After subagent returns — run via Bash:
359
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
360
+ Compute tokens and compaction:
361
+ - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
362
+ - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
363
+ Append to `.gsd-t/token-log.md`:
364
+ `| {DT_START} | {DT_END} | gsd-t-verify | Step 8 | sonnet | {DURATION}s | auto-complete-milestone | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
365
+
366
+ 3. Verify subagent result: Read `.gsd-t/progress.md` — confirm status is COMPLETED. If not, report the failure.
367
+
368
+ **Why this is mandatory**: Without auto-completion, verified milestones remain in VERIFIED state indefinitely. Requirements stay unmarked, progress.md is stale, and future sessions cannot tell the work was done. This is the root cause of "GSD-T forgot it did this work" — the milestone was built and verified but never formally completed.
369
+
370
+ **Why a subagent**: Complete-milestone is a 12-step process (gap analysis, archive, version bump, git tag, doc ripple). Verify is already heavy with 8+ quality gates. Spawning a fresh-context subagent avoids compaction risk — and complete-milestone loads everything it needs from files (progress.md, verify-report.md, contracts).
371
+
241
372
  $ARGUMENTS
242
373
 
243
374
  ## Auto-Clear
@@ -39,7 +39,17 @@ Run via Bash:
39
39
  node ~/.claude/scripts/gsd-t-event-writer.js --type command_invoked --command gsd-t-visualize --reasoning "Launching dashboard" || true
40
40
  ```
41
41
 
42
- ## Step 1.5: Graph Data for Dashboard
42
+ ## Step 1.5: Context Metrics for Dashboard
43
+
44
+ If `.gsd-t/token-log.md` exists, the dashboard server automatically reads it and provides context utilization metrics for visualization. These metrics are served from the `/api/token-breakdown` endpoint and rendered as:
45
+
46
+ 1. **Context utilization timeline** — Ctx% over time, ordered by Datetime-start
47
+ 2. **Token breakdown by domain** — bar chart grouping Tokens by Domain column (gracefully handles older rows without Domain column — they are grouped as "(untagged)")
48
+ 3. **Compaction proximity warnings** — rows where Ctx% >= 70 are highlighted; rows where Ctx% >= 85 are marked critical (🔴)
49
+
50
+ If `.gsd-t/token-log.md` does not exist, context metrics panels are hidden (not shown as errors).
51
+
52
+ ## Step 1.6: Graph Data for Dashboard
43
53
 
44
54
  If `.gsd-t/graph/index.json` exists, the dashboard can render entity-relationship visualizations from the graph data. The dashboard server will detect and serve graph data automatically — no additional configuration needed.
45
55
 
@@ -79,8 +79,13 @@ After phase agent returns — run via Bash:
79
79
  Compute tokens and compaction:
80
80
  - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
81
81
  - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
82
- Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
83
- `| {DT_START} | {DT_END} | gsd-t-wave | {PHASE} | sonnet | {DURATION}s | phase: {PHASE} | {TOKENS} | {COMPACTED} |`
82
+ Compute context utilization (run via Bash):
83
+ `if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
84
+ Alert on context thresholds (display to user inline):
85
+ - If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
86
+ - If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
87
+ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
88
+ `| {DT_START} | {DT_END} | gsd-t-wave | {PHASE} | sonnet | {DURATION}s | phase: {PHASE} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
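
The token, compaction, and threshold arithmetic above can be sketched as three small shell functions. The names `phase_tokens`, `ctx_pct`, and `ctx_level` are hypothetical. Two things differ from the snippet in the diff and are assumptions of this sketch: `awk` is swapped in for `bc` so the decimal percentage can be compared safely, and the two alert levels are treated as mutually exclusive (a value of 85 or more reports only "critical").

```shell
#!/usr/bin/env bash
# Tokens used by a phase. If the counter went down, compaction is assumed
# and the count is (max - start) + end, as in the formula above.
phase_tokens() {
  local start=$1 end=$2 max=$3
  if [ "$end" -ge "$start" ]; then
    echo $(( end - start ))
  else
    echo $(( (max - start) + end ))
  fi
}

# Context utilization as a one-decimal percentage, or N/A if max is unknown.
ctx_pct() {
  local used=${1:-0} max=${2:-0}
  if [ "$max" -gt 0 ]; then
    awk -v u="$used" -v m="$max" 'BEGIN { printf "%.1f\n", u * 100 / m }'
  else
    echo "N/A"
  fi
}

# Map a percentage to an alert level using the 70 / 85 thresholds.
ctx_level() {
  local pct=$1
  if [ "$pct" = "N/A" ]; then
    echo "unknown"
    return
  fi
  awk -v p="$pct" 'BEGIN {
    if (p >= 85) print "critical"
    else if (p >= 70) print "warning"
    else print "ok"
  }'
}
```

A caller would pass `CLAUDE_CONTEXT_TOKENS_USED` and `CLAUDE_CONTEXT_TOKENS_MAX` to `ctx_pct`, then write the result into the `Ctx%` column of the log row.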
84
89
 
85
90
  ### Phase Sequence
86
91
 
@@ -114,8 +119,13 @@ Spawn agent → `commands/gsd-t-impact.md`
114
119
 
115
120
  #### 5. EXECUTE
116
121
  Spawn agent → `commands/gsd-t-execute.md`
117
- - This is the heaviest phase. The execute agent will handle its own domain agent spawning and QA agent internally.
118
- - After: Read `progress.md`, verify status = EXECUTED
122
+ - This is the heaviest phase. The execute agent uses **task-level dispatch** (fresh-dispatch-contract.md): one Task subagent per task within each domain, each receiving only scope.md + relevant contracts + single task + graph context + up to 5 prior summaries. The execute agent handles domain task-dispatching and QA internally.
123
+ - **Adaptive replanning**: After each domain completes, the execute agent runs a replan check (per `adaptive-replan-contract.md`). If a completed domain's task summaries reveal new constraints (e.g., deprecated API, wrong column name, incompatible library), the execute agent checks remaining domains' `tasks.md` files for invalidated assumptions and revises them on disk before dispatching the next domain. Maximum 2 replan cycles per execute run — if exceeded, execution pauses for user input. All replan decisions are logged to the Decision Log in `progress.md`. The wave phase summary includes any replan actions taken.
124
+ - **Team/parallel mode**: If the plan defines parallel domains (same wave), the execute agent dispatches each domain teammate with `isolation: "worktree"` (per worktree-isolation-contract.md). Each domain works in an isolated git worktree. After all domains complete, the execute agent runs the Sequential Merge Protocol: merge domain A → test → merge domain B → test. Per-domain rollback if tests fail. Worktrees are cleaned up after all merges complete.
125
+ - After: Read `progress.md`, verify status = EXECUTED. Phase summary must include replan actions if any occurred:
126
+ ```
127
+ 📋 Phase 5 (EXECUTE): {N}/{N} tasks done | Replan cycles: {N} | Domains revised: {list or "none"}
128
+ ```
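
The Sequential Merge Protocol named in the team/parallel bullet (merge a domain, test, roll back on failure, then move to the next domain) might look roughly like the sketch below. This is not the package's implementation. The function name `sequential_merge`, the branch names, and the test command are hypothetical, and continuing with the next domain after a rollback is an assumption, since the text does not say whether the run stops. Worktree cleanup is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: sequential_merge TEST_CMD BRANCH...
# Merges each branch into the current branch in order, runs TEST_CMD after
# each merge, and restores the pre-merge commit if the merge or tests fail.
sequential_merge() {
  local test_cmd=$1
  shift
  local branch before
  for branch in "$@"; do
    before=$(git rev-parse HEAD)
    if git merge --no-ff --no-edit "$branch" >/dev/null 2>&1 \
       && sh -c "$test_cmd" >/dev/null 2>&1; then
      echo "merged: $branch"
    else
      # Abort a conflicted merge if one is in progress, then roll back.
      git merge --abort >/dev/null 2>&1 || true
      git reset --hard "$before" >/dev/null 2>&1
      echo "rolled back: $branch"
    fi
  done
}
```

For example, `sequential_merge 'npm test' domain-a domain-b` would merge and test `domain-a` before touching `domain-b`, so a failure can be attributed to a single domain.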
119
129
 
120
130
  #### 6. TEST-SYNC
121
131
  Spawn agent → `commands/gsd-t-test-sync.md`
@@ -125,15 +135,19 @@ Spawn agent → `commands/gsd-t-test-sync.md`
125
135
  Spawn agent → `commands/gsd-t-integrate.md`
126
136
  - After: Read `progress.md`, verify status = INTEGRATED
127
137
 
128
- #### 8. VERIFY
138
+ #### 8. VERIFY + COMPLETE
129
139
  Spawn agent → `commands/gsd-t-verify.md`
140
+ - The verify agent runs all 8 standard quality gates **plus** the goal-backward verification step (Step 5.5 in gsd-t-verify.md), which checks that milestone goals are actually achieved end-to-end and scans for placeholder patterns per `.gsd-t/contracts/goal-backward-contract.md`
141
+ - Goal-backward runs after all structural gates pass — CRITICAL or HIGH findings block verification; MEDIUM findings are warnings only
142
+ - **Verify auto-invokes complete-milestone** (Step 8 of gsd-t-verify.md). The verify agent handles both verification AND milestone completion in a single agent context. Do NOT spawn a separate complete agent.
130
143
  - After: Read `progress.md`, check status:
131
- - VERIFIED → proceed to Complete
132
- - VERIFY_FAILED → handle remediation (see Error Recovery)
133
-
134
- #### 9. COMPLETE
135
- Spawn agent → `commands/gsd-t-complete-milestone.md`
136
- - After: Read `progress.md`, verify status = COMPLETED
144
+ - COMPLETED → milestone done (verify passed and auto-completed)
145
+ - VERIFIED → verify passed but complete-milestone failed; spawn a standalone complete agent as fallback
146
+ - VERIFY_FAILED → handle remediation (see Error Recovery) — includes goal-backward failures
147
+ - Phase summary must include the `Goal-Backward:` line from verify-report.md:
148
+ ```
149
+ 📋 Phase 8 (VERIFY+COMPLETE): {N} gates passed | Goal-Backward: {PASS/WARN/FAIL} — {N} requirements checked, {N} findings
150
+ ```
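
The orchestrator's decision after the combined verify phase, together with the goal-backward severity rule, can be summarized in a short shell sketch. The function names and the returned action labels are hypothetical. How the status is read out of `progress.md` is not shown in this diff, so the status is simply passed in as an argument. Treating severities other than CRITICAL, HIGH, and MEDIUM as a pass is also an assumption.

```shell
#!/usr/bin/env bash
# Map the status found in progress.md to the orchestrator's next step.
next_action() {
  case "$1" in
    COMPLETED)     echo "done" ;;                     # verify passed and auto-completed
    VERIFIED)      echo "spawn-complete-fallback" ;;  # complete-milestone failed
    VERIFY_FAILED) echo "remediate" ;;                # see Error Recovery
    *)             echo "unknown-status" ;;
  esac
}

# Reduce a list of finding severities to PASS / WARN / FAIL.
# CRITICAL or HIGH blocks verification; MEDIUM is a warning only.
goal_backward_verdict() {
  local verdict="PASS" sev
  for sev in "$@"; do
    case "$sev" in
      CRITICAL|HIGH) echo "FAIL"; return ;;
      MEDIUM)        verdict="WARN" ;;
    esac
  done
  echo "$verdict"
}
```

For instance, `goal_backward_verdict MEDIUM HIGH` yields `FAIL`, which would correspond to a `VERIFY_FAILED` status and the remediation path.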
137
151
 
138
152
  ### Between Each Phase
139
153
 
@@ -286,16 +300,17 @@ If command files in `~/.claude/commands/` are tampered with, wave agents will ex
286
300
  │ check check check check + check │
287
301
  │ gate │
288
302
  │ │
289
- ┌──────────┐ ┌────────┐ ┌───────────┐ ┌─────────────────┐
290
- │ │ COMPLETE │ ← │ VERIFY │ ← │ INTEGRATE │ ←──── │ FULL TEST-SYNC │
291
- │ │ agent 9 │ │agent 8 │ │ agent 7 │ │ agent 6 │
292
- └────┬────┘ └────┬────┘ └─────┬─────┘ └────────┬────────┘
293
-
294
- archive status + status status
295
- git tag gate check check check
303
+ ┌──────────────────┐ ┌───────────┐ ┌─────────────────┐
304
+ │ │ VERIFY+COMPLETE │ ← │ INTEGRATE │ ←──── │ FULL TEST-SYNC │
305
+ │ │ agent 8 │ │ agent 7 │ │ agent 6 │
306
+ └────────┬─────────┘ └─────┬─────┘ └────────┬────────┘
307
+ ↓ ↓
308
+ gate check → status status
309
+ auto-complete check check
310
+ │ archive + tag │
296
311
  │ │
297
312
  │ Each agent: fresh context window, reads state from files, dies when done │
298
- │ Orchestrator: ~30KB total, never compacts
313
+ │ Orchestrator: 8 agents (was 9), ~30KB total, never compacts
299
314
  └──────────────────────────────────────────────────────────────────────────────┘
300
315
  ```
301
316