@tekyzinc/gsd-t 2.39.13 → 2.46.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50)
  1. package/CHANGELOG.md +12 -0
  2. package/README.md +19 -10
  3. package/bin/desktop.ini +2 -0
  4. package/bin/global-sync-manager.js +350 -0
  5. package/bin/gsd-t.js +592 -2
  6. package/bin/metrics-collector.js +167 -0
  7. package/bin/metrics-rollup.js +200 -0
  8. package/bin/patch-lifecycle.js +195 -0
  9. package/bin/rule-engine.js +160 -0
  10. package/commands/desktop.ini +2 -0
  11. package/commands/gsd-t-complete-milestone.md +194 -6
  12. package/commands/gsd-t-debug.md +38 -3
  13. package/commands/gsd-t-doc-ripple.md +148 -0
  14. package/commands/gsd-t-execute.md +328 -54
  15. package/commands/gsd-t-help.md +32 -10
  16. package/commands/gsd-t-integrate.md +59 -7
  17. package/commands/gsd-t-metrics.md +143 -0
  18. package/commands/gsd-t-plan.md +49 -2
  19. package/commands/gsd-t-qa.md +26 -5
  20. package/commands/gsd-t-quick.md +36 -3
  21. package/commands/gsd-t-status.md +78 -0
  22. package/commands/gsd-t-test-sync.md +23 -2
  23. package/commands/gsd-t-verify.md +142 -10
  24. package/commands/gsd-t-visualize.md +11 -1
  25. package/commands/gsd-t-wave.md +64 -18
  26. package/docs/GSD-T-README.md +10 -6
  27. package/docs/architecture.md +84 -2
  28. package/docs/ci-examples/desktop.ini +2 -0
  29. package/docs/ci-examples/github-actions.yml +104 -0
  30. package/docs/ci-examples/gitlab-ci.yml +116 -0
  31. package/docs/desktop.ini +2 -0
  32. package/docs/framework-comparison-scorecard.md +160 -0
  33. package/docs/infrastructure.md +87 -1
  34. package/docs/prd-graph-engine.md +2 -2
  35. package/docs/prd-gsd2-hybrid.md +258 -135
  36. package/docs/requirements.md +66 -2
  37. package/examples/.gsd-t/contracts/desktop.ini +2 -0
  38. package/examples/.gsd-t/desktop.ini +2 -0
  39. package/examples/.gsd-t/domains/desktop.ini +2 -0
  40. package/examples/.gsd-t/domains/example-domain/desktop.ini +2 -0
  41. package/examples/desktop.ini +2 -0
  42. package/examples/rules/.gitkeep +0 -0
  43. package/examples/rules/desktop.ini +2 -0
  44. package/package.json +40 -40
  45. package/scripts/desktop.ini +2 -0
  46. package/scripts/gsd-t-dashboard-server.js +19 -2
  47. package/scripts/gsd-t-dashboard.html +63 -0
  48. package/scripts/gsd-t-event-writer.js +1 -0
  49. package/templates/CLAUDE-global.md +92 -10
  50. package/templates/desktop.ini +2 -0
@@ -50,9 +50,26 @@ For each contract file:
50
50
 
51
51
  Fix any mismatches BEFORE proceeding to integration.
52
52
 
53
+ ## Step 2.5: Worktree Merge Status Check
54
+
55
+ Before wiring integration points, check whether team mode execution left any domains with rolled-back worktree merges:
56
+
57
+ 1. Read `.gsd-t/progress.md` — look for `[rollback]` entries in the Decision Log from the execute phase
58
+ 2. If any domains were rolled back: list them and their failure reasons before proceeding
59
+ 3. Integration point wiring should only proceed for domains whose worktree merges PASSED — rolled-back domains are not yet in the main working tree
60
+
61
+ If rolled-back domains exist, report them to the user (or if Level 3: log to `.gsd-t/deferred-items.md` as `[integration-gap] {domain}: not yet merged — worktree rollback during execute`). Do NOT attempt to re-merge rolled-back domains here; that requires re-running execute for the affected domain.
62
+
53
63
  ## Step 3: Wire Integration Points
54
64
 
55
- Work through each integration point in `integration-points.md`:
65
+ Work through each integration point in `integration-points.md`. If integration work spans multiple domains with independent tasks, use the **task-level dispatch pattern** (per fresh-dispatch-contract.md): spawn one Task subagent per integration task, passing only the relevant contracts, the specific integration point to wire, and summaries from prior integration tasks (max 5, 10-20 lines each). This prevents context accumulation across integration tasks.
66
+
67
+ **Multi-domain integration merging**: If integration work itself requires merging domain outputs that weren't merged during execute (e.g., domains executed in separate waves and integration needs to combine them), use the Sequential Merge Protocol from `.gsd-t/contracts/worktree-isolation-contract.md`:
68
+ 1. Sort domains by dependency order (from integration-points.md)
69
+ 2. Merge domain A's branch → run tests → merge domain B's branch → run tests
70
+ 3. If tests fail after a merge, roll back that domain's merge and log the failure
71
+ 4. Contract validation runs between merges
72
+ 5. All temporary branches cleaned up after integration completes
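A minimal sketch of the merge-test-rollback loop described in the steps above. It is illustrative only and not taken from the package: `sequentialMerge` and its `merge`, `runTests`, and `rollback` callbacks are hypothetical names, and the domain list is assumed to be already sorted in dependency order.

```javascript
// Sketch of a sequential merge loop. The three callbacks are hypothetical
// stand-ins for git operations and the test runner; they are not GSD-T APIs.
function sequentialMerge(domains, { merge, runTests, rollback }) {
  const merged = [];
  const rolledBack = [];
  for (const domain of domains) { // assumed to be in dependency order already
    merge(domain);
    if (runTests()) {
      merged.push(domain);
    } else {
      rollback(domain); // undo only the merge that broke the suite
      rolledBack.push({ domain, reason: 'tests failed after merge' });
    }
  }
  return { merged, rolledBack };
}

// Usage with in-memory stand-ins for the working tree and the test suite.
const tree = new Set();
const result = sequentialMerge(['auth', 'billing', 'ui'], {
  merge: (d) => tree.add(d),
  runTests: () => !tree.has('billing'), // pretend billing breaks the suite
  rollback: (d) => tree.delete(d),
});
console.log(result);
```

Running tests after every merge, rather than once at the end, is what lets a failure be attributed to a single domain.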
56
73
 
57
74
  For each connection:
58
75
  1. Identify the producing domain (provides the interface)
@@ -105,8 +122,14 @@ Spawn a QA subagent via the Task tool to verify contract compliance at all domai
105
122
  Task subagent (general-purpose, model: haiku):
106
123
  "Run contract compliance tests for this integration. Read .gsd-t/contracts/ for all contract definitions.
107
124
  Test every domain boundary: verify that producers and consumers match their contract shapes.
108
- Run the full test suite.
109
- Report: boundary-by-boundary test results with pass/fail counts."
125
+ Run ALL configured test suites — detect and run every one:
126
+ a. Unit tests (vitest/jest/mocha): run the full suite
127
+ b. E2E tests: check for playwright.config.* or cypress.config.* — if found, run the FULL E2E suite
128
+ c. NEVER skip E2E when a config file exists. Running only unit tests is a QA FAILURE.
129
+ d. AUDIT E2E test quality: Review each Playwright spec — if any test only checks element existence
130
+ (isVisible, toBeAttached, toBeEnabled) without verifying functional behavior (state changes,
131
+ data loaded, content updated after actions), flag it as 'SHALLOW TEST — needs functional assertions'.
132
+ Report: 'Unit: X/Y pass | E2E: X/Y pass (or N/A if no config) | Boundary: pass/fail by contract | Shallow tests: N'"
110
133
  ```
111
134
 
112
135
  **OBSERVABILITY LOGGING (MANDATORY):**
@@ -117,8 +140,13 @@ After subagent returns — run via Bash:
117
140
  Compute tokens and compaction:
118
141
  - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
119
142
  - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
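The two token formulas above can be sketched as a single function. This is an illustration, not package code: `computeTokens` and its parameter names are assumptions that mirror the `TOK_START`, `TOK_END`, `TOK_MAX`, and `DT_END` shell variables.

```javascript
// Sketch of the token/compaction arithmetic. Parameter names are hypothetical
// mirrors of the shell variables used in the command file.
function computeTokens({ tokStart, tokEnd, tokMax, dtEnd }) {
  if (tokEnd >= tokStart) {
    // No compaction: the counter only grew during the subagent run.
    return { tokens: tokEnd - tokStart, compacted: null };
  }
  // Compaction: the counter reset, so add the headroom used before the reset
  // to whatever accumulated after it.
  return { tokens: (tokMax - tokStart) + tokEnd, compacted: dtEnd };
}

const normal = computeTokens({ tokStart: 1000, tokEnd: 4000, tokMax: 200000, dtEnd: 't1' });
const reset = computeTokens({ tokStart: 180000, tokEnd: 5000, tokMax: 200000, dtEnd: 't1' });
console.log(normal, reset);
```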
120
- Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
121
- `| {DT_START} | {DT_END} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {pass/fail}, {N} boundaries tested | {TOKENS} | {COMPACTED} |`
143
+ Compute context utilization (run via Bash):
144
+ `if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
145
+ Alert on context thresholds (display to user inline):
146
+ - If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
147
+ - If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
148
+ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
149
+ `| {DT_START} | {DT_END} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {pass/fail}, {N} boundaries tested | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
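A minimal sketch of the context-utilization calculation and the two alert thresholds. The function names are hypothetical; truncating to one decimal place is meant to mirror `bc` with `scale=1`, and a `null` result stands in for the "N/A" branch.

```javascript
// Sketch only: contextPercent and alertLevel are illustrative names,
// not functions exported by the package.
function contextPercent(used, max) {
  if (!(max > 0)) return null; // corresponds to CTX_PCT="N/A"
  return Math.trunc((used * 1000) / max) / 10; // truncate to one decimal place
}

function alertLevel(pct) {
  if (pct === null) return 'none';
  if (pct >= 85) return 'critical'; // check the higher threshold first
  if (pct >= 70) return 'warning';
  return 'ok';
}

const pct = contextPercent(150000, 200000);
console.log(pct, alertLevel(pct));
```

Because the `bc` expression yields a decimal string (or "N/A"), an integer-only shell test such as `[ "$CTX_PCT" -ge 85 ]` would reject it. Comparing numerically, as in the sketch, sidesteps that.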
122
150
  If QA found issues, append each to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
123
151
  `| {DT_START} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {severity} | {finding} |`
124
152
 
@@ -145,11 +173,35 @@ Integration is where the real system takes shape. Verify documentation matches r
145
173
  After integration and doc ripple, verify everything works together:
146
174
 
147
175
  1. **Update tests**: Add or update integration tests for newly wired domain boundaries
148
- 2. **Run all tests**: Execute the full test suite; integration often introduces cross-domain failures
176
+ 2. **Run ALL configured test suites** (detect and run every one):
177
+ a. Unit/integration tests (vitest/jest/mocha)
178
+ b. If `playwright.config.*` exists → run `npx playwright test` (full suite, not just affected specs)
179
+ c. Unit tests alone are NEVER sufficient when E2E exists
180
+ d. Report: "Unit: X/Y pass | E2E: X/Y pass"
149
181
  3. **Verify passing**: All tests must pass. If any fail, fix before proceeding (up to 2 attempts)
150
- 4. **Run E2E tests**: If an E2E framework exists, run the full E2E suite; integration is where end-to-end flows break
182
+ 4. **Functional test quality**: Spot-check E2E specs — every assertion must verify functional behavior (state changed, data loaded, content updated after action), not just element existence. Shallow tests that would pass on an empty HTML page are not acceptable.
151
183
  5. **Smoke test results**: Ensure the Step 4 smoke test results are still valid after any fixes
152
184
 
185
+ ## Step 7.5: Doc-Ripple (Automated)
186
+
187
+ After all integration work is committed but before reporting completion:
188
+
189
+ 1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
190
+ 2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed to completion
191
+ 3. If FIRE: spawn doc-ripple agent:
192
+
193
+ ⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
194
+
195
+ Task subagent (general-purpose, model: sonnet):
196
+ "Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
197
+ Git diff context: {files changed list}
198
+ Command that triggered: integrate
199
+ Produce manifest at .gsd-t/doc-ripple-manifest.md.
200
+ Update all affected documents.
201
+ Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
202
+
203
+ 4. After doc-ripple returns, verify manifest exists and report summary inline
204
+
153
205
  ## Step 8: Handle Integration Issues
154
206
 
155
207
  For each issue found:
@@ -0,0 +1,143 @@
1
+ # GSD-T: Metrics — View Task Telemetry and Process Health
2
+
3
+ You are displaying metrics data from the GSD-T telemetry system. Read JSONL files directly — no module imports needed.
4
+
5
+ ## Step 1: Load Metrics Data
6
+
7
+ Read:
8
+ 1. `.gsd-t/metrics/task-metrics.jsonl` — per-task telemetry records
9
+ 2. `.gsd-t/metrics/rollup.jsonl` — per-milestone aggregation with ELO and heuristics
10
+ 3. `.gsd-t/progress.md` — current milestone ID (for default filter)
11
+
12
+ If neither metrics file (`task-metrics.jsonl` or `rollup.jsonl`) exists: display "No metrics data yet. Metrics are collected automatically during execute, quick, and debug commands." and stop.
13
+
14
+ If `$ARGUMENTS` contains a milestone ID (e.g., "M25"), use that as the filter. Otherwise, use the current active milestone from progress.md.
15
+
16
+ ## Step 2: Display Milestone Summary
17
+
18
+ From `rollup.jsonl`, find the entry matching the target milestone. Display:
19
+
20
+ ```
21
+ ## Metrics — {milestone}
22
+
23
+ | Metric | Value |
24
+ |---------------------|--------------------------------|
25
+ | Tasks | {total_tasks} |
26
+ | First-pass rate | {first_pass_rate * 100}% |
27
+ | Avg duration | {avg_duration_s}s |
28
+ | Avg context | {avg_context_pct}% |
29
+ | Total fix cycles | {total_fix_cycles} |
30
+ | Total tokens | {total_tokens} |
31
+ ```
32
+
33
+ If no rollup entry exists for the milestone, compute summary directly from task-metrics.jsonl records.
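A minimal sketch of that fallback computation, under stated assumptions: the record field names (`milestone`, `duration_s`, `tokens_used`, `context_pct`, `pass`, `fix_cycles`) are inferred from the `metrics-collector.js` flags shown elsewhere in this diff, and "first pass" is assumed to mean a passing task with zero fix cycles. Neither is confirmed by the package source.

```javascript
// Sketch only: field names and the first-pass definition are assumptions.
function parseJsonl(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

function summarize(records, milestone) {
  const rows = records.filter((r) => r.milestone === milestone);
  if (rows.length === 0) return null;
  const sum = (key) => rows.reduce((acc, r) => acc + (Number(r[key]) || 0), 0);
  const firstPass = rows.filter((r) => r.pass === true && (r.fix_cycles || 0) === 0).length;
  return {
    total_tasks: rows.length,
    first_pass_rate: firstPass / rows.length,
    avg_duration_s: sum('duration_s') / rows.length,
    avg_context_pct: sum('context_pct') / rows.length,
    total_fix_cycles: sum('fix_cycles'),
    total_tokens: sum('tokens_used'),
  };
}

const sample = [
  '{"milestone":"M25","domain":"auth","duration_s":40,"tokens_used":1000,"context_pct":20,"pass":true,"fix_cycles":0}',
  '{"milestone":"M25","domain":"auth","duration_s":60,"tokens_used":3000,"context_pct":40,"pass":true,"fix_cycles":2}',
  '{"milestone":"M24","domain":"ui","duration_s":10,"tokens_used":500,"context_pct":5,"pass":true,"fix_cycles":0}',
].join('\n');

const summary = summarize(parseJsonl(sample), 'M25');
console.log(summary);
```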
34
+
35
+ ## Step 3: Display Process ELO
36
+
37
+ ```
38
+ ## Process ELO
39
+
40
+ {elo_after} ({elo_delta > 0 ? '↑' : '↓'} {elo_delta} from {elo_before})
41
+ ```
42
+
43
+ If no previous milestone, show: `{elo_after} (baseline — first milestone)`
44
+
45
+ ## Step 4: Display Signal Distribution
46
+
47
+ From rollup `signal_distribution` or computed from task-metrics:
48
+
49
+ ```
50
+ ## Signal Distribution
51
+
52
+ | Signal Type | Count |
53
+ |------------------|-------|
54
+ | pass-through | {N} |
55
+ | fix-cycle | {N} |
56
+ | debug-invoked | {N} |
57
+ | user-correction | {N} |
58
+ | phase-skip | {N} |
59
+ ```
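When the rollup has no `signal_distribution`, the counts in the table above could be derived from task records roughly as follows. This is a sketch: the `signal_type` field name is assumed from the collector's `--signal_type` flag, and `signalDistribution` is a hypothetical helper.

```javascript
// Sketch only: counts records per signal type, ignoring unknown types.
const SIGNAL_TYPES = ['pass-through', 'fix-cycle', 'debug-invoked', 'user-correction', 'phase-skip'];

function signalDistribution(records) {
  const counts = Object.fromEntries(SIGNAL_TYPES.map((type) => [type, 0]));
  for (const record of records) {
    if (Object.prototype.hasOwnProperty.call(counts, record.signal_type)) {
      counts[record.signal_type] += 1;
    }
  }
  return counts;
}

const distribution = signalDistribution([
  { signal_type: 'pass-through' },
  { signal_type: 'pass-through' },
  { signal_type: 'fix-cycle' },
  { signal_type: 'unknown-type' }, // not in the list, so it is not counted
]);
console.log(distribution);
```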
60
+
61
+ ## Step 5: Display Domain Breakdown
62
+
63
+ From rollup `domain_breakdown`:
64
+
65
+ ```
66
+ ## Domain Breakdown
67
+
68
+ | Domain | Tasks | Pass% | Avg Duration |
69
+ |-------------------|-------|-------|--------------|
70
+ | {domain} | {N} | {N}% | {N}s |
71
+ ```
72
+
73
+ ## Step 6: Display Trend Comparison
74
+
75
+ If `trend_delta` exists (previous milestone data available):
76
+
77
+ ```
78
+ ## Trend vs Previous Milestone
79
+
80
+ | Metric | Delta |
81
+ |-----------------|--------------------------|
82
+ | First-pass rate | {delta > 0 ? '↑' : '↓'} {delta}% |
83
+ | Avg duration | {delta > 0 ? '↑' : '↓'} {delta}s |
84
+ | ELO | {delta > 0 ? '↑' : '↓'} {delta} |
85
+ ```
86
+
87
+ If no previous milestone: "First milestone — no trend data yet."
88
+
89
+ ## Step 7: Display Heuristic Anomalies
90
+
91
+ If `heuristic_flags` has entries:
92
+
93
+ ```
94
+ ## Anomaly Detection
95
+
96
+ | Heuristic | Severity | Description |
97
+ |-------------------------------|----------|--------------------------|
98
+ | {heuristic} | {sev} | {description} |
99
+ ```
100
+
101
+ If no anomalies: "No anomalies detected."
102
+
103
+ ## Step 8: Cross-Project Comparison (when --cross-project flag present)
104
+
105
+ If `$ARGUMENTS` contains `--cross-project`:
106
+
107
+ 1. Run via Bash:
108
+ ```bash
109
+ node -e "const g = require('./bin/global-sync-manager.js'); const r = g.compareSignalDistributions(require('./package.json').name || require('path').basename(process.cwd())); console.log(JSON.stringify(r));" 2>/dev/null
110
+ ```
111
+
112
+ 2. If the result has `insufficient_data: true`, display:
113
+ "No global metrics yet — complete milestones in multiple projects to enable cross-project comparison"
114
+
115
+ 3. Otherwise, display the cross-project signal distribution comparison:
116
+ ```
117
+ ## Cross-Project Signal Distribution
118
+
119
+ | Project | Tasks | Pass-Through | Fix-Cycle | Other |
120
+ |----------------------|-------|--------------|-----------|-------|
121
+ | {project} {★ if is_queried} | {N} | {rate} | {rate} | ... |
122
+ ```
123
+
124
+ 4. If `$ARGUMENTS` also contains `--domain {domainType}`, run:
125
+ ```bash
126
+ node -e "const g = require('./bin/global-sync-manager.js'); const r = g.getDomainTypeComparison('{domainType}'); console.log(JSON.stringify(r));" 2>/dev/null
127
+ ```
128
+ Display the domain-type comparison table:
129
+ ```
130
+ ## Domain-Type Comparison: {domainType}
131
+
132
+ | Project | Tasks | Pass-Through | Fix-Cycle |
133
+ |----------------------|-------|--------------|-----------|
134
+ | {project} | {N} | {count} | {count} |
135
+ ```
136
+
137
+ If `--cross-project` is NOT in `$ARGUMENTS`: skip this step entirely (no change to existing behavior).
138
+
139
+ $ARGUMENTS
140
+
141
+ ## Auto-Clear
142
+
143
+ All work is committed to project files. Execute `/clear` to free the context window for the next command.
@@ -25,6 +25,24 @@ If `.gsd-t/graph/meta.json` exists (graph index is available):
25
25
 
26
26
  If graph is not available, skip this step.
27
27
 
28
+ ## Step 1.7: Pre-Mortem — Historical Failure Analysis
29
+
30
+ Before creating task lists, check historical task-metrics for domain-level failure patterns from previous milestones:
31
+
32
+ 1. Run via Bash:
33
+ `node -e "const c = require('./bin/metrics-collector.js'); const domains = [/* list domain names from scope files */]; domains.forEach(d => { const w = c.getPreFlightWarnings(d); if(w.length) w.forEach(x => console.log('⚠️ ' + x)); });" 2>/dev/null || true`
34
+
35
+ 2. If any domain has `first_pass_rate < 0.6` historically:
36
+ - Display warning inline: `⚠️ Domain {name} has historically low first-pass rate ({rate}%). Consider: smaller tasks, more explicit acceptance criteria, or additional contract detail.`
37
+ - This is **non-blocking** — it informs task design, does not prevent planning.
38
+
39
+ 3. If `.gsd-t/metrics/task-metrics.jsonl` does not exist: skip this step silently (first milestone, no historical data).
40
+
41
+ 4. **Rule-based pre-mortem**: Run via Bash:
42
+ `node -e "const re = require('./bin/rule-engine.js'); const domains = [/* list domain names */]; domains.forEach(d => { const rules = re.getPreMortemRules(d); if(rules.length) rules.forEach(r => console.log('RULE ' + r.id + ': ' + r.name + ' — historically triggered for domains like ' + d)); });" 2>/dev/null || true`
43
+
44
+ If matching rules found: display warnings inline (non-blocking — informs task design). Falls back gracefully if rules.jsonl does not exist or is empty.
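To make the 0.6 threshold concrete, here is a minimal sketch of a per-domain first-pass check. It does not reproduce `getPreFlightWarnings` from `metrics-collector.js`, whose internals are not shown in this diff. The helper name, the record fields, and the first-pass definition (pass with zero fix cycles) are all assumptions.

```javascript
// Sketch only: groups task records by domain and reports domains whose
// historical first-pass rate falls below the threshold.
function lowFirstPassDomains(records, threshold = 0.6) {
  const byDomain = new Map();
  for (const r of records) {
    const stats = byDomain.get(r.domain) || { total: 0, firstPass: 0 };
    stats.total += 1;
    if (r.pass === true && (r.fix_cycles || 0) === 0) stats.firstPass += 1;
    byDomain.set(r.domain, stats);
  }
  const warnings = [];
  for (const [domain, stats] of byDomain) {
    const rate = stats.firstPass / stats.total;
    if (rate < threshold) warnings.push({ domain, rate });
  }
  return warnings;
}

const history = [
  { domain: 'auth', pass: true, fix_cycles: 0 },
  { domain: 'auth', pass: true, fix_cycles: 2 },
  { domain: 'auth', pass: false, fix_cycles: 1 },
  { domain: 'ui', pass: true, fix_cycles: 0 },
  { domain: 'ui', pass: true, fix_cycles: 0 },
];
const warnings = lowFirstPassDomains(history);
console.log(warnings);
```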
45
+
28
46
  ## Step 2: Create Task Lists Per Domain
29
47
 
30
48
  ### SharedCore-First Pre-Check
@@ -67,6 +85,30 @@ For each domain, write `.gsd-t/domains/{domain-name}/tasks.md`:
67
85
  4. **Contract-bound**: Every task that crosses a domain boundary must reference the specific contract it implements
68
86
  5. **Ordered**: Tasks within a domain are numbered in execution order
69
87
  6. **No implicit knowledge**: Don't assume the executing agent remembers previous tasks — reference contracts and files explicitly
88
+ 7. **Context-window fit**: Each task MUST be executable within a single context window. Apply the scope validation heuristics below.
89
+
90
+ ### Task Scope Validation
91
+
92
+ After writing each task, apply this heuristic check before finalizing:
93
+
94
+ **Splitting candidates — flag if ANY of these are true:**
95
+ - Task lists **more than 5 files** to modify or create
96
+ - Task has **more than 3 complex dependencies** (other tasks, contracts, or external systems it must read and understand)
97
+ - Task description spans multiple distinct concerns (e.g., "implement X and also refactor Y and update Z docs")
98
+
99
+ **Warning threshold:** If a task is flagged, emit:
100
+ > ⚠️ **Task scope warning — {domain} Task {N}**: Estimated context load is high ({N} files, {N} dependencies). This task may approach the 70% context window threshold. Consider splitting into:
101
+ > - Task {N}a: {first concern}
102
+ > - Task {N}b: {second concern}
103
+
104
+ **Auto-split rule (Level 3 Full Auto):** If a task has >5 files AND >3 dependencies, split it automatically. Renumber subsequent tasks. Document the split rationale in the task's Dependencies field.
105
+
106
+ **Guidance for estimating context size:**
107
+ - Each file to read ≈ 1–5% of context window (varies by file size)
108
+ - CLAUDE.md + scope.md + constraints.md + contracts ≈ 15–25% baseline overhead
109
+ - Tasks with >5 files or >3 cross-domain contracts commonly exceed 70% total context
110
+
111
+ This rule implements the "task must fit in one context window" constraint — a task that compacts its subagent is a task that produces incomplete or corrupt output.
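The flagging and auto-split rules above can be sketched as a small check. The task shape (`files`, `dependencies`, `concerns` arrays) and the name `checkTaskScope` are hypothetical; the thresholds are the ones stated in the text.

```javascript
// Sketch only: flag when ANY heuristic trips, auto-split when files AND
// dependencies both exceed their limits.
function checkTaskScope(task) {
  const files = task.files.length;
  const deps = task.dependencies.length;
  const concerns = task.concerns ? task.concerns.length : 1;
  const reasons = [];
  if (files > 5) reasons.push(`${files} files`);
  if (deps > 3) reasons.push(`${deps} dependencies`);
  if (concerns > 1) reasons.push(`${concerns} distinct concerns`);
  return {
    flagged: reasons.length > 0,
    autoSplit: files > 5 && deps > 3,
    reasons,
  };
}

const big = checkTaskScope({
  files: ['a', 'b', 'c', 'd', 'e', 'f'],
  dependencies: ['c1', 'c2', 'c3', 'c4'],
  concerns: ['implement X', 'refactor Y'],
});
const small = checkTaskScope({ files: ['a'], dependencies: [], concerns: ['implement X'] });
console.log(big, small);
```

Note the asymmetry: a task with six files but only one dependency is flagged for review yet not split automatically.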
70
112
 
71
113
  ### Cross-Domain Duplicate Operation Scan
72
114
 
@@ -259,8 +301,13 @@ After subagent returns — run via Bash:
259
301
  Compute tokens and compaction:
260
302
  - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
261
303
  - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
262
- Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
263
- `| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | {TOKENS} | {COMPACTED} |`
304
+ Compute context utilization (run via Bash):

305
+ `if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
306
+ Alert on context thresholds (display to user inline):
307
+ - If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
308
+ - If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
309
+ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
310
+ `| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
264
311
  If validation FAIL, append each gap to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
265
312
  `| {DT_START} | gsd-t-plan | Step 7 | haiku | {DURATION}s | medium | {gap description} |`
266
313
 
@@ -81,13 +81,16 @@ Your behavior depends on which phase spawned you:
81
81
 
82
82
  ### During Verify
83
83
  **Trigger**: Lead invokes verify phase
84
- **Action**: Full test audit
84
+ **Action**: Full test audit + shallow test detection
85
85
 
86
86
  1. Run ALL tests — contract tests, acceptance tests, edge case tests, existing project tests
87
87
  2. Coverage audit: For every contract, confirm tests exist and pass
88
88
  3. For every new feature/mode/flow, confirm Playwright specs cover happy path, error states, edge cases
89
- 4. Gap report: List any untested contracts or code paths
90
- 5. Report: `QA: {pass|fail} {N} contract tests, {N} acceptance tests, {N} edge case tests. Gaps: {list or "none"}`
89
+ 4. **Shallow test audit**: Read every Playwright spec file. For each `test()` block, check whether the assertions verify functional behavior (state changes, data flow, content updates after actions) or only check element existence (isVisible, toBeAttached, toBeEnabled). Flag any test that would pass on an empty HTML shell as `SHALLOW — needs functional assertions`.
90
+ 5. Gap report: List any untested contracts, code paths, AND shallow tests
91
+ 6. Report: `QA: {pass|fail} — {N} contract tests, {N} acceptance tests, {N} edge case tests. Gaps: {list or "none"}. Shallow E2E tests: {N} (list or "none")`
92
+
93
+ **Shallow tests block verification.** A passing E2E suite where tests don't actually verify feature behavior is equivalent to a failing suite.
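A rough, text-based heuristic for the shallow test audit might look like the sketch below. It is only an approximation: it scans the source of one `test()` body for matcher calls with a regular expression, treats a test with no matchers as shallow (an assumption), and does not detect indirect existence checks such as `expect(await el.isVisible()).toBe(true)`. The helper name is hypothetical.

```javascript
// Sketch only: a regex heuristic, not a parser. Matcher names that merely
// assert an element exists or is enabled.
const EXISTENCE_ONLY = new Set(['toBeVisible', 'toBeAttached', 'toBeEnabled']);

function isShallowTest(testBody) {
  // Capture matcher names that follow a closing parenthesis, optionally
  // through ".not", for example ").toHaveText(" or ").not.toBeVisible(".
  const pattern = /\)\s*(?:\.not)?\s*\.(to[A-Z]\w*)\s*\(/g;
  const matchers = [...testBody.matchAll(pattern)].map((m) => m[1]);
  if (matchers.length === 0) return true; // no assertions at all
  return matchers.every((name) => EXISTENCE_ONLY.has(name));
}

const layoutOnly = "await page.goto('/'); await expect(page.locator('#tab')).toBeVisible();";
const functional = "await page.click('#tab'); await expect(page.locator('#panel')).toHaveText('Settings');";
console.log(isShallowTest(layoutOnly), isShallowTest(functional));
```

A heuristic like this can only nominate candidates; deciding whether an assertion really proves the feature works still requires reading the spec.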
91
94
 
92
95
  ### During Quick
93
96
  **Trigger**: Lead runs a quick task
@@ -189,10 +192,28 @@ For each table in `schema-contract.md`:
189
192
  For each component in `component-contract.md`:
190
193
  - Each `## ComponentName` → one `test.describe` block
191
194
  - `Props:` → renders with required props, handles missing optional props
192
- - `Events:` → event handlers fire correctly
193
- - API references → verify correct API calls made
195
+ - `Events:` → event handlers fire correctly AND produce the expected state change
196
+ - API references → verify correct API calls made AND responses rendered correctly
194
197
  - Auto-generate: empty form, partial form, network error handling
195
198
 
199
+ ### Functional E2E Test Standard (MANDATORY for all Playwright specs)
200
+
201
+ **E2E tests that only verify element existence are LAYOUT tests, not functional tests. Layout tests pass even when every feature is broken. This is a QA failure.**
202
+
203
+ Every Playwright spec MUST verify functional behavior — that actions produce the correct outcome:
204
+
205
+ | Test Pattern | WRONG (layout test) | RIGHT (functional test) |
206
+ |---|---|---|
207
+ | Tab switching | `expect(tab).toBeVisible()` | Click tab → assert NEW content loaded (text, data unique to that tab) |
208
+ | Form submit | `expect(submitBtn).toBeEnabled()` | Fill form → submit → assert success message AND data persisted (API call, list updated) |
209
+ | Terminal/editor | `expect(terminal).toBeAttached()` | Open terminal → type command → assert output appears |
210
+ | WebSocket | `expect(statusBadge).toBeVisible()` | Wait for connection → assert status text changes to "Connected" → send message → assert response |
211
+ | Navigation | `expect(link).toHaveAttribute('href')` | Click link → assert URL changed AND destination content rendered |
212
+ | Toggle/mode | `expect(toggle).toBeVisible()` | Click toggle → assert the EFFECT (dark mode CSS applied, panel expanded with content, feature enabled) |
213
+ | Error state | `expect(errorDiv).toBeVisible()` | Trigger error → assert message content → assert recovery action works |
214
+
215
+ **Rule: If a test would pass on an empty HTML shell with the right element IDs, it is not a functional test. Every assertion must prove the feature works, not that the element exists.**
216
+
196
217
  ## Test File Conventions
197
218
 
198
219
  - **Location**: Project's test directory (detected from `playwright.config.*` or `package.json`)
@@ -83,6 +83,16 @@ When you encounter unexpected situations:
83
83
  5. Verify it works
84
84
  6. Commit: `[quick] {description}`
85
85
 
86
+ ## Step 3.5: Emit Task Metrics
87
+
88
+ After committing, emit a task-metrics record for this quick task — run via Bash:
89
+ `node bin/metrics-collector.js --milestone {current-milestone-or-none} --domain {domain-or-quick} --task quick-{timestamp} --command quick --duration_s {elapsed} --tokens_used {estimated} --context_pct ${CTX_PCT:-0} --pass {true|false} --fix_cycles {0|N} --signal_type {pass-through|fix-cycle} --notes "[quick] {description}" 2>/dev/null || true`
90
+
91
+ Signal type: `pass-through` if task completed on first attempt; `fix-cycle` if rework was needed.
92
+
93
+ Emit task_complete event — run via Bash:
94
+ `node ~/.claude/scripts/gsd-t-event-writer.js --type task_complete --command gsd-t-quick --reasoning "signal_type={signal_type}, domain={domain}" --outcome {success|failure} || true`
95
+
86
96
  ## Step 4: Document Ripple (if GSD-T is active)
87
97
 
88
98
  If `.gsd-t/progress.md` exists, assess what documentation was affected and update ALL relevant files:
@@ -123,9 +133,12 @@ Quick does not mean skip testing. Before committing:
123
133
  - Playwright E2E specs (if UI/routes/flows/modes changed): create new specs for new functionality, update existing specs for changed behavior
124
134
  - Cover all modes/flags affected by this change
125
135
  - "No feature code without test code" applies to quick tasks too
126
- 2. **Run the FULL test suite**, not just affected tests:
127
- - All unit/integration tests
128
- - Full Playwright E2E suite (if configured)
136
+ - **Functional tests only** — every E2E assertion must verify an action produced the correct outcome (state changed, data loaded, content updated). Tests that only check element existence (`isVisible`, `toBeEnabled`) are shallow/layout tests and are not acceptable. If a test would pass on an empty HTML page with the right IDs, rewrite it.
137
+ 2. **Run ALL configured test suites** — not just affected tests, not just one suite:
138
+ a. Detect all runners: check for vitest/jest config, playwright.config.*, cypress.config.*
139
+ b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
140
+ c. If `playwright.config.*` exists → `npx playwright test` (full suite)
141
+ d. Report ALL results: "Unit: X/Y pass | E2E: X/Y pass"
129
142
  - Fix any failures before proceeding (up to 2 attempts)
130
143
  3. **Verify against requirements**:
131
144
  - Does the change satisfy its intended requirement?
@@ -133,6 +146,26 @@ Quick does not mean skip testing. Before committing:
133
146
  - If a contract exists for the interface touched, does the code still match?
134
147
  4. **No test framework?**: Set one up, or at minimum manually verify and document how in the commit message
135
148
 
149
+ ## Step 6: Doc-Ripple (Automated)
150
+
151
+ After all work is committed but before reporting completion:
152
+
153
+ 1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
154
+ 2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed to completion
155
+ 3. If FIRE: spawn doc-ripple agent:
156
+
157
+ ⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
158
+
159
+ Task subagent (general-purpose, model: sonnet):
160
+ "Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
161
+ Git diff context: {files changed list}
162
+ Command that triggered: quick
163
+ Produce manifest at .gsd-t/doc-ripple-manifest.md.
164
+ Update all affected documents.
165
+ Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
166
+
167
+ 4. After doc-ripple returns, verify manifest exists and report summary inline
168
+
136
169
  $ARGUMENTS
137
170
 
138
171
  ## Auto-Clear
@@ -57,6 +57,59 @@ If `.gsd-t/backlog.md` exists, read and parse it. Show total count and top 3 ite
57
57
  If there are blockers or issues, highlight them.
58
58
  If the user provides $ARGUMENTS, focus the status on that specific domain or aspect.
59
59
 
60
+ ## Token Usage Breakdown
61
+
62
+ If `.gsd-t/token-log.md` exists, read it and append a token breakdown to the status report.
63
+
64
+ Parse each row in the table. Handle both old format (9 columns) and extended format (12 columns with Domain, Task, Ctx%). Rows with missing or empty Domain column are assigned domain "(untagged)".
65
+
66
+ ### Token Usage by Domain
67
+ Group rows by Domain. For each domain, sum Tokens and collect all Ctx% values (ignoring "N/A" and empty). Display:
68
+
69
+ ```
70
+ ## Token Usage by Domain
71
+ | Domain | Tokens | Subagents | Peak Ctx% |
72
+ |----------------|--------|-----------|-----------|
73
+ | auth | 12,400 | 4 | 14% |
74
+ | notifications | 45,200 | 3 | 52% ⚠️ |
75
+ | (untagged) | 8,100 | 6 | N/A |
76
+ ```
77
+
78
+ Flag any domain where Peak Ctx% >= 70 with `⚠️` suffix.
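A minimal sketch of the by-domain grouping, assuming the column order given in the token-log header shown in this diff (Tokens in column 8, Domain in column 10, Ctx% in column 12). The helper names are hypothetical, and cells containing a literal `|` are not handled.

```javascript
// Sketch only: parses markdown table rows in either the 9-column or the
// 12-column token-log format and groups them by domain.
function parseTokenLog(markdown) {
  const rows = [];
  for (const line of markdown.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('|')) continue;
    const cells = trimmed.replace(/^\|/, '').replace(/\|$/, '').split('|').map((c) => c.trim());
    if (cells[0] === 'Datetime-start') continue;         // header row
    if (cells.every((c) => /^:?-+:?$/.test(c))) continue; // separator row
    const ctx = parseFloat(cells[11]);                    // NaN for "N/A" or missing
    rows.push({
      command: cells[2],
      tokens: Number((cells[7] || '').replace(/,/g, '')) || 0,
      domain: cells[9] || '(untagged)',
      ctxPct: Number.isNaN(ctx) ? null : ctx,
    });
  }
  return rows;
}

function usageByDomain(rows) {
  const out = {};
  for (const row of rows) {
    const entry = out[row.domain] || { tokens: 0, subagents: 0, peakCtx: null, warn: false };
    entry.tokens += row.tokens;
    entry.subagents += 1;
    if (row.ctxPct !== null && (entry.peakCtx === null || row.ctxPct > entry.peakCtx)) {
      entry.peakCtx = row.ctxPct;
    }
    entry.warn = entry.peakCtx !== null && entry.peakCtx >= 70;
    out[row.domain] = entry;
  }
  return out;
}

const log = [
  '| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |',
  '|---|---|---|---|---|---|---|---|---|---|---|---|',
  '| t0 | t1 | gsd-t-execute | Step 3 | sonnet | 40s | ok | 12400 | null | auth | 1 | 14.0 |',
  '| t0 | t1 | gsd-t-execute | Step 3 | sonnet | 90s | ok | 45200 | null | notifications | 2 | 72.5 |',
  '| t0 | t1 | gsd-t-plan | Step 7 | haiku | 12s | PASS | 8100 | null |',
].join('\n');

const usage = usageByDomain(parseTokenLog(log));
console.log(usage);
```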
79
+
80
+ ### Token Usage by Phase/Command
81
+ Group rows by Command. For each command, sum Tokens and count subagent rows. Display:
82
+
83
+ ```
84
+ ## Token Usage by Command
85
+ | Command | Tokens | Subagents |
86
+ |---------------|--------|-----------|
87
+ | gsd-t-execute | 86,200 | 14 |
88
+ | gsd-t-wave | 12,400 | 9 |
89
+ | gsd-t-plan | 3,400 | 1 |
90
+ ```
91
+
92
+ If token-log.md does not exist or is empty, skip this section entirely (no error).
93
+
94
+ ## Process Health
95
+
96
+ If `.gsd-t/metrics/rollup.jsonl` exists, read the latest entry and append to the status report:
97
+
98
+ ```
99
+ Process Health:
100
+ ELO: {elo_after} ({elo_delta > 0 ? '↑' : '↓'} {elo_delta})
101
+ Quality: {first_pass_rate * 100}% first-pass rate | {total_fix_cycles} fix cycles
102
+ ```
103
+
104
+ If `.gsd-t/metrics/task-metrics.jsonl` exists but no rollup.jsonl, compute first_pass_rate directly from task-metrics for the current milestone and display:
105
+
106
+ ```
107
+ Process Health:
108
+ Quality: {rate}% first-pass rate (current milestone, no rollup yet)
109
+ ```
110
+
111
+ If neither file exists, skip this section entirely.
112
+
60
113
  ## Graph Status
61
114
 
62
115
  If `.gsd-t/graph/meta.json` exists, read it and append to the status report:
@@ -81,6 +134,31 @@ After displaying the project status, check for GSD-T updates:
81
134
 
82
135
  5. If versions match, skip — don't show anything
83
136
 
137
+ ## Global ELO & Cross-Project Rankings
138
+
139
+ After the Process Health section, check for global metrics:
140
+
141
+ 1. Run via Bash:
142
+ ```bash
143
+ node -e "const g = require('./bin/global-sync-manager.js'); const name = (() => { try { return require('./package.json').name; } catch { return require('path').basename(process.cwd()); } })(); const elo = g.getGlobalELO(name); const ranks = g.getProjectRankings(); console.log(JSON.stringify({ elo, ranks, name }));" 2>/dev/null
144
+ ```
145
+
146
+ 2. If the result returns `elo: null` or the command fails: display "No global metrics yet" and skip.
147
+
148
+ 3. If global ELO data exists, display:
149
+ ```
150
+ Global ELO: {elo} (rank #{position} of {total} projects)
151
+ ```
152
+ Where position is the 1-based index of the current project in the rankings array.
153
+
154
+ 4. If 2+ projects have global rollup data, display the top 5 rankings:
155
+ ```
156
+ ## Cross-Project Rankings (Top 5)
157
+ | Rank | Project | ELO | Latest Milestone |
158
+ |------|------------------|--------|------------------|
159
+ | 1 | {project} | {elo} | {milestone} |
160
+ ```
161
+
84
162
  $ARGUMENTS
85
163
 
86
164
  ## Auto-Clear
@@ -111,8 +111,8 @@ pytest tests/test_{module}.py -v
111
111
  npm test -- --testPathPattern="{module}"
112
112
  ```
113
113
 
114
- ### B) E2E Tests
115
- If an E2E framework is detected, run E2E tests affected by the changes:
114
+ ### B) E2E Tests (MANDATORY when config exists)
115
+ If `playwright.config.*` or `cypress.config.*` exists, you MUST run E2E tests; skipping is never acceptable:
116
116
 
117
117
  ```bash
118
118
  # Playwright
@@ -151,6 +151,27 @@ If Playwright is configured (`playwright.config.*` or Playwright in dependencies
151
151
 
152
152
  **This is NOT optional.** Every new code path that a user can reach must have a Playwright spec. "We'll add tests later" is never acceptable.
153
153
 
154
+ **FUNCTIONAL TESTS — NOT LAYOUT TESTS (MANDATORY):**
155
+ E2E specs that only check element existence (`isVisible`, `toBeAttached`, `toBeEnabled`) are
156
+ layout tests. Layout tests pass even when every feature is broken — they are worthless for QA.
157
+
158
+ Every Playwright assertion MUST verify **functional behavior** — that an action produced the
159
+ correct outcome:
160
+ - **Tab/navigation**: Click → assert the NEW content loaded (unique text, data, or elements
161
+ that only appear on the destination view). Never just assert the tab element exists.
162
+ - **Forms**: Fill → submit → assert success feedback AND data persisted (API call observed
163
+ via `page.waitForResponse`, or list/table updated with new entry).
164
+ - **Interactive widgets** (terminals, editors, code panels): Open → interact → assert the
165
+ widget responded (keystroke produced output, content was saved, command executed).
166
+ - **Connections** (WebSocket, SSE, polling): Assert status transitions ("Connecting" →
167
+ "Connected") and verify data flows through the connection.
168
+ - **State toggles** (dark mode, expand/collapse, enable/disable): Assert the EFFECT of the
169
+ toggle, not just that the toggle control exists.
170
+ - **Error handling**: Trigger error → assert error content → assert recovery path works.
171
+
172
+ **Rule: If a test would pass on an empty HTML page with the correct element IDs and no
173
+ JavaScript, it is not a functional test. Rewrite it.**
174
+
154
175
  ### D) Capture Results
155
176
  For all test types:
156
177
  - PASS: Test still valid