@tekyzinc/gsd-t 2.39.13 → 2.46.11
- package/CHANGELOG.md +12 -0
- package/README.md +19 -10
- package/bin/desktop.ini +2 -0
- package/bin/global-sync-manager.js +350 -0
- package/bin/gsd-t.js +592 -2
- package/bin/metrics-collector.js +167 -0
- package/bin/metrics-rollup.js +200 -0
- package/bin/patch-lifecycle.js +195 -0
- package/bin/rule-engine.js +160 -0
- package/commands/desktop.ini +2 -0
- package/commands/gsd-t-complete-milestone.md +194 -6
- package/commands/gsd-t-debug.md +38 -3
- package/commands/gsd-t-doc-ripple.md +148 -0
- package/commands/gsd-t-execute.md +328 -54
- package/commands/gsd-t-help.md +32 -10
- package/commands/gsd-t-integrate.md +59 -7
- package/commands/gsd-t-metrics.md +143 -0
- package/commands/gsd-t-plan.md +49 -2
- package/commands/gsd-t-qa.md +26 -5
- package/commands/gsd-t-quick.md +36 -3
- package/commands/gsd-t-status.md +78 -0
- package/commands/gsd-t-test-sync.md +23 -2
- package/commands/gsd-t-verify.md +142 -10
- package/commands/gsd-t-visualize.md +11 -1
- package/commands/gsd-t-wave.md +64 -18
- package/docs/GSD-T-README.md +10 -6
- package/docs/architecture.md +84 -2
- package/docs/ci-examples/desktop.ini +2 -0
- package/docs/ci-examples/github-actions.yml +104 -0
- package/docs/ci-examples/gitlab-ci.yml +116 -0
- package/docs/desktop.ini +2 -0
- package/docs/framework-comparison-scorecard.md +160 -0
- package/docs/infrastructure.md +87 -1
- package/docs/prd-graph-engine.md +2 -2
- package/docs/prd-gsd2-hybrid.md +258 -135
- package/docs/requirements.md +66 -2
- package/examples/.gsd-t/contracts/desktop.ini +2 -0
- package/examples/.gsd-t/desktop.ini +2 -0
- package/examples/.gsd-t/domains/desktop.ini +2 -0
- package/examples/.gsd-t/domains/example-domain/desktop.ini +2 -0
- package/examples/desktop.ini +2 -0
- package/examples/rules/.gitkeep +0 -0
- package/examples/rules/desktop.ini +2 -0
- package/package.json +40 -40
- package/scripts/desktop.ini +2 -0
- package/scripts/gsd-t-dashboard-server.js +19 -2
- package/scripts/gsd-t-dashboard.html +63 -0
- package/scripts/gsd-t-event-writer.js +1 -0
- package/templates/CLAUDE-global.md +92 -10
- package/templates/desktop.ini +2 -0
package/commands/gsd-t-integrate.md
CHANGED
|
@@ -50,9 +50,26 @@ For each contract file:
|
|
|
50
50
|
|
|
51
51
|
Fix any mismatches BEFORE proceeding to integration.
|
|
52
52
|
|
|
53
|
+
## Step 2.5: Worktree Merge Status Check
|
|
54
|
+
|
|
55
|
+
Before wiring integration points, check whether team mode execution left any domains with rolled-back worktree merges:
|
|
56
|
+
|
|
57
|
+
1. Read `.gsd-t/progress.md` — look for `[rollback]` entries in the Decision Log from the execute phase
|
|
58
|
+
2. If any domains were rolled back: list them and their failure reasons before proceeding
|
|
59
|
+
3. Integration point wiring should only proceed for domains whose worktree merges PASSED — rolled-back domains are not yet in the main working tree
|
|
60
|
+
|
|
61
|
+
If rolled-back domains exist, report them to the user (or if Level 3: log to `.gsd-t/deferred-items.md` as `[integration-gap] {domain}: not yet merged — worktree rollback during execute`). Do NOT attempt to re-merge rolled-back domains here; that requires re-running execute for the affected domain.
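The Step 2.5 check above can be sketched as a small Node.js helper. This is only an illustration: the diff does not show what a `[rollback]` Decision Log entry looks like, so the `[rollback] {domain}: {reason}` layout and the helper name `findRolledBackDomains` are assumptions, not part of gsd-t.

```javascript
// Hypothetical helper: scan the text of .gsd-t/progress.md for rollback entries.
// ASSUMPTION: an entry looks like "... [rollback] <domain>: <reason>".
function findRolledBackDomains(progressText) {
  const entries = [];
  for (const line of progressText.split(/\r?\n/)) {
    const match = line.match(/\[rollback\]\s*([^:]+):\s*(.*)$/);
    if (match) {
      entries.push({ domain: match[1].trim(), reason: match[2].trim() });
    }
  }
  return entries;
}

// Made-up Decision Log excerpt for demonstration.
const sampleLog = [
  "## Decision Log",
  "- 2025-01-01 [rollback] auth: tests failed after merge",
  "- 2025-01-01 merged payments",
].join("\n");

console.log(findRolledBackDomains(sampleLog));
```

Domains returned by such a helper would be excluded from integration point wiring and reported, as the step describes.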
|
|
62
|
+
|
|
53
63
|
## Step 3: Wire Integration Points
|
|
54
64
|
|
|
55
|
-
Work through each integration point in `integration-points.md
|
|
65
|
+
Work through each integration point in `integration-points.md`. If integration work spans multiple domains with independent tasks, use the **task-level dispatch pattern** (per fresh-dispatch-contract.md): spawn one Task subagent per integration task, passing only the relevant contracts, the specific integration point to wire, and summaries from prior integration tasks (max 5, 10-20 lines each). This prevents context accumulation across integration tasks.
|
|
66
|
+
|
|
67
|
+
**Multi-domain integration merging**: If integration work itself requires merging domain outputs that weren't merged during execute (e.g., domains executed in separate waves and integration needs to combine them), use the Sequential Merge Protocol from `.gsd-t/contracts/worktree-isolation-contract.md`:
|
|
68
|
+
1. Sort domains by dependency order (from integration-points.md)
|
|
69
|
+
2. Merge domain A's branch → run tests → merge domain B's branch → run tests
|
|
70
|
+
3. If tests fail after a merge, roll back that domain's merge and log the failure
|
|
71
|
+
4. Contract validation runs between merges
|
|
72
|
+
5. All temporary branches cleaned up after integration completes
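The merge-test-rollback loop in the list above might look roughly like the following sketch. The `merge`, `runTests`, and `rollback` callbacks are hypothetical stand-ins supplied by the caller; the real protocol lives in `worktree-isolation-contract.md`, which this diff does not show. Contract validation between merges is omitted, and continuing with the next domain after a rollback is an assumption.

```javascript
// Sketch of a sequential merge loop with per-domain rollback.
// ASSUMPTION: merge, runTests, and rollback are caller-supplied callbacks;
// none of them is a gsd-t API.
function sequentialMerge(domains, { merge, runTests, rollback }) {
  const merged = [];
  const rolledBack = [];
  // `domains` is expected to be sorted by dependency order already.
  for (const domain of domains) {
    merge(domain);
    if (runTests()) {
      merged.push(domain);
    } else {
      rollback(domain); // undo only the merge that broke the tests
      rolledBack.push(domain);
    }
  }
  return { merged, rolledBack };
}

// Dry run with stub callbacks: tests fail only while "billing" is merged.
const tree = new Set();
const result = sequentialMerge(["auth", "billing", "ui"], {
  merge: (d) => tree.add(d),
  runTests: () => !tree.has("billing"),
  rollback: (d) => tree.delete(d),
});
console.log(result);
```

Passing the side effects in as callbacks keeps the ordering logic testable without touching a real repository.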
|
|
56
73
|
|
|
57
74
|
For each connection:
|
|
58
75
|
1. Identify the producing domain (provides the interface)
|
|
@@ -105,8 +122,14 @@ Spawn a QA subagent via the Task tool to verify contract compliance at all domai
|
|
|
105
122
|
Task subagent (general-purpose, model: haiku):
|
|
106
123
|
"Run contract compliance tests for this integration. Read .gsd-t/contracts/ for all contract definitions.
|
|
107
124
|
Test every domain boundary: verify that producers and consumers match their contract shapes.
|
|
108
|
-
Run
|
|
109
|
-
|
|
125
|
+
Run ALL configured test suites — detect and run every one:
|
|
126
|
+
a. Unit tests (vitest/jest/mocha): run the full suite
|
|
127
|
+
b. E2E tests: check for playwright.config.* or cypress.config.* — if found, run the FULL E2E suite
|
|
128
|
+
c. NEVER skip E2E when a config file exists. Running only unit tests is a QA FAILURE.
|
|
129
|
+
d. AUDIT E2E test quality: Review each Playwright spec — if any test only checks element existence
|
|
130
|
+
(isVisible, toBeAttached, toBeEnabled) without verifying functional behavior (state changes,
|
|
131
|
+
data loaded, content updated after actions), flag it as 'SHALLOW TEST — needs functional assertions'.
|
|
132
|
+
Report: 'Unit: X/Y pass | E2E: X/Y pass (or N/A if no config) | Boundary: pass/fail by contract | Shallow tests: N'"
|
|
110
133
|
```
|
|
111
134
|
|
|
112
135
|
**OBSERVABILITY LOGGING (MANDATORY):**
|
|
@@ -117,8 +140,13 @@ After subagent returns — run via Bash:
|
|
|
117
140
|
Compute tokens and compaction:
|
|
118
141
|
- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
|
|
119
142
|
- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
|
|
120
|
-
|
|
121
|
-
|
|
143
|
+
Compute context utilization — run via Bash:
|
|
144
|
+
`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
|
|
145
|
+
Alert on context thresholds (display to user inline):
|
|
146
|
+
- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
|
|
147
|
+
- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
|
|
148
|
+
Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
|
|
149
|
+
`| {DT_START} | {DT_END} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {pass/fail}, {N} boundaries tested | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
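For readers who want to check the arithmetic, the token, context-percentage, and threshold logic above can be restated in Node.js. This is a sketch, not the package's code: the function names are hypothetical, `compacted` is a boolean here rather than the `$DT_END` timestamp, and the percentage is rounded to one decimal where `bc` with `scale=1` truncates, so the last digit may differ.

```javascript
// Token delta across a subagent run; a drop in the counter signals compaction.
function computeTokens(tokStart, tokEnd, tokMax) {
  if (tokEnd >= tokStart) {
    return { tokens: tokEnd - tokStart, compacted: false };
  }
  return { tokens: (tokMax - tokStart) + tokEnd, compacted: true };
}

// Context utilization as a percentage with one decimal, or "N/A" when max is unknown.
function contextPct(used, max) {
  if (!(max > 0)) return "N/A";
  return Math.round((used * 1000) / max) / 10;
}

// Check the higher threshold first so that only one alert level is returned.
function alertLevel(pct) {
  if (typeof pct !== "number") return "none"; // covers "N/A"
  if (pct >= 85) return "critical";
  if (pct >= 70) return "warning";
  return "none";
}

console.log(computeTokens(180000, 20000, 200000));
console.log(alertLevel(contextPct(150000, 200000)));
```

Testing the 85% threshold before the 70% one avoids emitting both messages for the same reading, and the type check keeps an `"N/A"` value from being compared numerically.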
|
|
122
150
|
If QA found issues, append each to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
|
|
123
151
|
`| {DT_START} | gsd-t-integrate | Step 5 | haiku | {DURATION}s | {severity} | {finding} |`
|
|
124
152
|
|
|
@@ -145,11 +173,35 @@ Integration is where the real system takes shape. Verify documentation matches r
|
|
|
145
173
|
After integration and doc ripple, verify everything works together:
|
|
146
174
|
|
|
147
175
|
1. **Update tests**: Add or update integration tests for newly wired domain boundaries
|
|
148
|
-
2. **Run
|
|
176
|
+
2. **Run ALL configured test suites** — detect and run every one:
|
|
177
|
+
a. Unit/integration tests (vitest/jest/mocha)
|
|
178
|
+
b. If `playwright.config.*` exists → run `npx playwright test` (full suite, not just affected specs)
|
|
179
|
+
c. Unit tests alone are NEVER sufficient when E2E exists
|
|
180
|
+
d. Report: "Unit: X/Y pass | E2E: X/Y pass"
|
|
149
181
|
3. **Verify passing**: All tests must pass. If any fail, fix before proceeding (up to 2 attempts)
|
|
150
|
-
4. **
|
|
182
|
+
4. **Functional test quality**: Spot-check E2E specs — every assertion must verify functional behavior (state changed, data loaded, content updated after action), not just element existence. Shallow tests that would pass on an empty HTML page are not acceptable.
|
|
151
183
|
5. **Smoke test results**: Ensure the Step 4 smoke test results are still valid after any fixes
|
|
152
184
|
|
|
185
|
+
## Step 7.5: Doc-Ripple (Automated)
|
|
186
|
+
|
|
187
|
+
After all integration work is committed but before reporting completion:
|
|
188
|
+
|
|
189
|
+
1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
|
|
190
|
+
2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed to completion
|
|
191
|
+
3. If FIRE: spawn doc-ripple agent:
|
|
192
|
+
|
|
193
|
+
⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
|
|
194
|
+
|
|
195
|
+
Task subagent (general-purpose, model: sonnet):
|
|
196
|
+
"Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
|
|
197
|
+
Git diff context: {files changed list}
|
|
198
|
+
Command that triggered: integrate
|
|
199
|
+
Produce manifest at .gsd-t/doc-ripple-manifest.md.
|
|
200
|
+
Update all affected documents.
|
|
201
|
+
Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
|
|
202
|
+
|
|
203
|
+
4. After doc-ripple returns, verify manifest exists and report summary inline
|
|
204
|
+
|
|
153
205
|
## Step 8: Handle Integration Issues
|
|
154
206
|
|
|
155
207
|
For each issue found:
|
|
package/commands/gsd-t-metrics.md
ADDED
|
@@ -0,0 +1,143 @@
|
|
|
1
|
+
# GSD-T: Metrics — View Task Telemetry and Process Health
|
|
2
|
+
|
|
3
|
+
You are displaying metrics data from the GSD-T telemetry system. Read JSONL files directly — no module imports needed.
|
|
4
|
+
|
|
5
|
+
## Step 1: Load Metrics Data
|
|
6
|
+
|
|
7
|
+
Read:
|
|
8
|
+
1. `.gsd-t/metrics/task-metrics.jsonl` — per-task telemetry records
|
|
9
|
+
2. `.gsd-t/metrics/rollup.jsonl` — per-milestone aggregation with ELO and heuristics
|
|
10
|
+
3. `.gsd-t/progress.md` — current milestone ID (for default filter)
|
|
11
|
+
|
|
12
|
+
If neither file exists: display "No metrics data yet. Metrics are collected automatically during execute, quick, and debug commands." and stop.
|
|
13
|
+
|
|
14
|
+
If `$ARGUMENTS` contains a milestone ID (e.g., "M25"), use that as the filter. Otherwise, use the current active milestone from progress.md.
|
|
15
|
+
|
|
16
|
+
## Step 2: Display Milestone Summary
|
|
17
|
+
|
|
18
|
+
From `rollup.jsonl`, find the entry matching the target milestone. Display:
|
|
19
|
+
|
|
20
|
+
```
|
|
21
|
+
## Metrics — {milestone}
|
|
22
|
+
|
|
23
|
+
| Metric | Value |
|
|
24
|
+
|---------------------|--------------------------------|
|
|
25
|
+
| Tasks | {total_tasks} |
|
|
26
|
+
| First-pass rate | {first_pass_rate * 100}% |
|
|
27
|
+
| Avg duration | {avg_duration_s}s |
|
|
28
|
+
| Avg context | {avg_context_pct}% |
|
|
29
|
+
| Total fix cycles | {total_fix_cycles} |
|
|
30
|
+
| Total tokens | {total_tokens} |
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
If no rollup entry exists for the milestone, compute summary directly from task-metrics.jsonl records.
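A fallback summary of the kind described above could be computed as in this sketch. It assumes each JSONL record carries fields named after the `metrics-collector.js` CLI flags shown elsewhere in this diff (`milestone`, `pass`, `fix_cycles`, `duration_s`, `context_pct`, `tokens_used`); the actual record schema is not shown here. Treating "first pass" as `pass === true` with zero fix cycles is likewise an assumption.

```javascript
// Hypothetical fallback: derive the Step 2 summary from task-metrics.jsonl text.
// ASSUMPTION: record field names mirror the metrics-collector CLI flags.
function summarizeMilestone(jsonlText, milestone) {
  const records = jsonlText
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line))
    .filter((rec) => rec.milestone === milestone);
  if (records.length === 0) return null;

  const sum = (key) => records.reduce((acc, rec) => acc + (Number(rec[key]) || 0), 0);
  const firstPass = records.filter((rec) => rec.pass === true && !rec.fix_cycles).length;

  return {
    total_tasks: records.length,
    first_pass_rate: firstPass / records.length,
    avg_duration_s: sum("duration_s") / records.length,
    avg_context_pct: sum("context_pct") / records.length,
    total_fix_cycles: sum("fix_cycles"),
    total_tokens: sum("tokens_used"),
  };
}

// Made-up records for demonstration.
const sampleJsonl = [
  '{"milestone":"M25","pass":true,"fix_cycles":0,"duration_s":40,"context_pct":20,"tokens_used":1000}',
  '{"milestone":"M25","pass":true,"fix_cycles":2,"duration_s":80,"context_pct":40,"tokens_used":3000}',
  '{"milestone":"M24","pass":true,"fix_cycles":0,"duration_s":10,"context_pct":5,"tokens_used":500}',
].join("\n");

console.log(summarizeMilestone(sampleJsonl, "M25"));
```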
|
|
34
|
+
|
|
35
|
+
## Step 3: Display Process ELO
|
|
36
|
+
|
|
37
|
+
```
|
|
38
|
+
## Process ELO
|
|
39
|
+
|
|
40
|
+
{elo_after} ({elo_delta > 0 ? '↑' : '↓'} {elo_delta} from {elo_before})
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
If no previous milestone, show: `{elo_after} (baseline — first milestone)`
|
|
44
|
+
|
|
45
|
+
## Step 4: Display Signal Distribution
|
|
46
|
+
|
|
47
|
+
From rollup `signal_distribution` or computed from task-metrics:
|
|
48
|
+
|
|
49
|
+
```
|
|
50
|
+
## Signal Distribution
|
|
51
|
+
|
|
52
|
+
| Signal Type | Count |
|
|
53
|
+
|------------------|-------|
|
|
54
|
+
| pass-through | {N} |
|
|
55
|
+
| fix-cycle | {N} |
|
|
56
|
+
| debug-invoked | {N} |
|
|
57
|
+
| user-correction | {N} |
|
|
58
|
+
| phase-skip | {N} |
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
## Step 5: Display Domain Breakdown
|
|
62
|
+
|
|
63
|
+
From rollup `domain_breakdown`:
|
|
64
|
+
|
|
65
|
+
```
|
|
66
|
+
## Domain Breakdown
|
|
67
|
+
|
|
68
|
+
| Domain | Tasks | Pass% | Avg Duration |
|
|
69
|
+
|-------------------|-------|-------|--------------|
|
|
70
|
+
| {domain} | {N} | {N}% | {N}s |
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
## Step 6: Display Trend Comparison
|
|
74
|
+
|
|
75
|
+
If `trend_delta` exists (previous milestone data available):
|
|
76
|
+
|
|
77
|
+
```
|
|
78
|
+
## Trend vs Previous Milestone
|
|
79
|
+
|
|
80
|
+
| Metric | Delta |
|
|
81
|
+
|-----------------|--------------------------|
|
|
82
|
+
| First-pass rate | {delta > 0 ? '↑' : '↓'} {delta}% |
|
|
83
|
+
| Avg duration | {delta > 0 ? '↑' : '↓'} {delta}s |
|
|
84
|
+
| ELO | {delta > 0 ? '↑' : '↓'} {delta} |
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
If no previous milestone: "First milestone — no trend data yet."
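The `{delta > 0 ? '↑' : '↓'} {delta}` template cells above can be rendered with a helper along these lines. The name `formatDelta` is hypothetical, and two choices go beyond the template: a zero delta gets a neutral arrow instead of `↓`, and the magnitude is printed without its sign because the arrow already carries the direction.

```javascript
// Hypothetical formatter for trend cells.
// ASSUMPTION: zero is shown as "→" and the sign is dropped after the arrow;
// the template in the command file prints the raw signed delta instead.
function formatDelta(delta, unit = "") {
  if (delta === 0) return `→ 0${unit}`;
  const arrow = delta > 0 ? "↑" : "↓";
  return `${arrow} ${Math.abs(delta)}${unit}`;
}

console.log(formatDelta(5, "%"));
console.log(formatDelta(-12, "s"));
console.log(formatDelta(0));
```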
|
|
88
|
+
|
|
89
|
+
## Step 7: Display Heuristic Anomalies
|
|
90
|
+
|
|
91
|
+
If `heuristic_flags` has entries:
|
|
92
|
+
|
|
93
|
+
```
|
|
94
|
+
## Anomaly Detection
|
|
95
|
+
|
|
96
|
+
| Heuristic | Severity | Description |
|
|
97
|
+
|-------------------------------|----------|--------------------------|
|
|
98
|
+
| {heuristic} | {sev} | {description} |
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
If no anomalies: "No anomalies detected."
|
|
102
|
+
|
|
103
|
+
## Step 8: Cross-Project Comparison (when --cross-project flag present)
|
|
104
|
+
|
|
105
|
+
If `$ARGUMENTS` contains `--cross-project`:
|
|
106
|
+
|
|
107
|
+
1. Run via Bash:
|
|
108
|
+
```bash
|
|
109
|
+
node -e "const g = require('./bin/global-sync-manager.js'); const r = g.compareSignalDistributions(require('./package.json').name || require('path').basename(process.cwd())); console.log(JSON.stringify(r));" 2>/dev/null
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+
2. If the result has `insufficient_data: true`, display:
|
|
113
|
+
"No global metrics yet — complete milestones in multiple projects to enable cross-project comparison"
|
|
114
|
+
|
|
115
|
+
3. Otherwise, display the cross-project signal distribution comparison:
|
|
116
|
+
```
|
|
117
|
+
## Cross-Project Signal Distribution
|
|
118
|
+
|
|
119
|
+
| Project | Tasks | Pass-Through | Fix-Cycle | Other |
|
|
120
|
+
|----------------------|-------|--------------|-----------|-------|
|
|
121
|
+
| {project} {★ if is_queried} | {N} | {rate} | {rate} | ... |
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
4. If `$ARGUMENTS` also contains `--domain {domainType}`, run:
|
|
125
|
+
```bash
|
|
126
|
+
node -e "const g = require('./bin/global-sync-manager.js'); const r = g.getDomainTypeComparison('{domainType}'); console.log(JSON.stringify(r));" 2>/dev/null
|
|
127
|
+
```
|
|
128
|
+
Display the domain-type comparison table:
|
|
129
|
+
```
|
|
130
|
+
## Domain-Type Comparison: {domainType}
|
|
131
|
+
|
|
132
|
+
| Project | Tasks | Pass-Through | Fix-Cycle |
|
|
133
|
+
|----------------------|-------|--------------|-----------|
|
|
134
|
+
| {project} | {N} | {count} | {count} |
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
If `--cross-project` is NOT in `$ARGUMENTS`: skip this step entirely (no change to existing behavior).
|
|
138
|
+
|
|
139
|
+
$ARGUMENTS
|
|
140
|
+
|
|
141
|
+
## Auto-Clear
|
|
142
|
+
|
|
143
|
+
All work is committed to project files. Execute `/clear` to free the context window for the next command.
|
package/commands/gsd-t-plan.md
CHANGED
|
@@ -25,6 +25,24 @@ If `.gsd-t/graph/meta.json` exists (graph index is available):
|
|
|
25
25
|
|
|
26
26
|
If graph is not available, skip this step.
|
|
27
27
|
|
|
28
|
+
## Step 1.7: Pre-Mortem — Historical Failure Analysis
|
|
29
|
+
|
|
30
|
+
Before creating task lists, check historical task-metrics for domain-level failure patterns from previous milestones:
|
|
31
|
+
|
|
32
|
+
1. Run via Bash:
|
|
33
|
+
`node -e "const c = require('./bin/metrics-collector.js'); const domains = [/* list domain names from scope files */]; domains.forEach(d => { const w = c.getPreFlightWarnings(d); if(w.length) w.forEach(x => console.log('⚠️ ' + x)); });" 2>/dev/null || true`
|
|
34
|
+
|
|
35
|
+
2. If any domain has `first_pass_rate < 0.6` historically:
|
|
36
|
+
- Display warning inline: `⚠️ Domain {name} has historically low first-pass rate ({rate}%). Consider: smaller tasks, more explicit acceptance criteria, or additional contract detail.`
|
|
37
|
+
- This is **non-blocking** — it informs task design, does not prevent planning.
|
|
38
|
+
|
|
39
|
+
3. If `.gsd-t/metrics/task-metrics.jsonl` does not exist: skip this step silently (first milestone, no historical data).
|
|
40
|
+
|
|
41
|
+
4. **Rule-based pre-mortem**: Run via Bash:
|
|
42
|
+
`node -e "const re = require('./bin/rule-engine.js'); const domains = [/* list domain names */]; domains.forEach(d => { const rules = re.getPreMortemRules(d); if(rules.length) rules.forEach(r => console.log('RULE ' + r.id + ': ' + r.name + ' — historically triggered for domains like ' + d)); });" 2>/dev/null || true`
|
|
43
|
+
|
|
44
|
+
If matching rules found: display warnings inline (non-blocking — informs task design). Falls back gracefully if rules.jsonl does not exist or is empty.
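The `first_pass_rate < 0.6` check in item 2 can be illustrated with a standalone sketch. It does not reproduce `getPreFlightWarnings` from `metrics-collector.js`, whose implementation is not shown in this diff; the record fields (`domain`, `pass`, `fix_cycles`) and the definition of a first pass are assumptions.

```javascript
// Hypothetical pre-mortem check: find domains with a historically low first-pass rate.
// ASSUMPTION: a task counts as first-pass when pass === true and fix_cycles is 0.
function lowFirstPassDomains(records, threshold = 0.6) {
  const byDomain = new Map();
  for (const rec of records) {
    const stats = byDomain.get(rec.domain) || { total: 0, firstPass: 0 };
    stats.total += 1;
    if (rec.pass === true && !rec.fix_cycles) stats.firstPass += 1;
    byDomain.set(rec.domain, stats);
  }
  const warnings = [];
  for (const [domain, { total, firstPass }] of byDomain) {
    const rate = firstPass / total;
    if (rate < threshold) warnings.push({ domain, rate });
  }
  return warnings;
}

// Made-up history for demonstration.
const history = [
  { domain: "auth", pass: true, fix_cycles: 0 },
  { domain: "auth", pass: true, fix_cycles: 3 },
  { domain: "ui", pass: true, fix_cycles: 0 },
  { domain: "ui", pass: true, fix_cycles: 0 },
];

console.log(lowFirstPassDomains(history));
```

Because the step is non-blocking, a caller would only print the returned warnings and continue planning.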
|
|
45
|
+
|
|
28
46
|
## Step 2: Create Task Lists Per Domain
|
|
29
47
|
|
|
30
48
|
### SharedCore-First Pre-Check
|
|
@@ -67,6 +85,30 @@ For each domain, write `.gsd-t/domains/{domain-name}/tasks.md`:
|
|
|
67
85
|
4. **Contract-bound**: Every task that crosses a domain boundary must reference the specific contract it implements
|
|
68
86
|
5. **Ordered**: Tasks within a domain are numbered in execution order
|
|
69
87
|
6. **No implicit knowledge**: Don't assume the executing agent remembers previous tasks — reference contracts and files explicitly
|
|
88
|
+
7. **Context-window fit**: Each task MUST be executable within a single context window. Apply the scope validation heuristics below.
|
|
89
|
+
|
|
90
|
+
### Task Scope Validation
|
|
91
|
+
|
|
92
|
+
After writing each task, apply this heuristic check before finalizing:
|
|
93
|
+
|
|
94
|
+
**Splitting candidates — flag if ANY of these are true:**
|
|
95
|
+
- Task lists **more than 5 files** to modify or create
|
|
96
|
+
- Task has **more than 3 complex dependencies** (other tasks, contracts, or external systems it must read and understand)
|
|
97
|
+
- Task description spans multiple distinct concerns (e.g., "implement X and also refactor Y and update Z docs")
|
|
98
|
+
|
|
99
|
+
**Warning threshold:** If a task is flagged, emit:
|
|
100
|
+
> ⚠️ **Task scope warning — {domain} Task {N}**: Estimated context load is high ({N} files, {N} dependencies). This task may approach the 70% context window threshold. Consider splitting into:
|
|
101
|
+
> - Task {N}a: {first concern}
|
|
102
|
+
> - Task {N}b: {second concern}
|
|
103
|
+
|
|
104
|
+
**Auto-split rule (Level 3 Full Auto):** If a task has >5 files AND >3 dependencies, split it automatically. Renumber subsequent tasks. Document the split rationale in the task's Dependencies field.
|
|
105
|
+
|
|
106
|
+
**Guidance for estimating context size:**
|
|
107
|
+
- Each file to read ≈ 1–5% of context window (varies by file size)
|
|
108
|
+
- CLAUDE.md + scope.md + constraints.md + contracts ≈ 15–25% baseline overhead
|
|
109
|
+
- Tasks with >5 files or >3 cross-domain contracts commonly exceed 70% total context
|
|
110
|
+
|
|
111
|
+
This rule implements the "task must fit in one context window" constraint — a task that compacts its subagent is a task that produces incomplete or corrupt output.
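The splitting heuristics above reduce to a few comparisons, sketched here. The task object shape (`files`, `dependencies`, `concerns` arrays) is hypothetical; the command file describes tasks in Markdown and does not define such a structure.

```javascript
// Hypothetical scope check applying the thresholds from Task Scope Validation.
// ASSUMPTION: a task is modeled as { files: [], dependencies: [], concerns: [] }.
function checkTaskScope(task) {
  const tooManyFiles = task.files.length > 5;
  const tooManyDeps = task.dependencies.length > 3;
  const multiConcern = task.concerns.length > 1;
  return {
    // Flag when ANY heuristic trips.
    flagged: tooManyFiles || tooManyDeps || multiConcern,
    // Level 3 auto-split requires BOTH the file and dependency limits to be exceeded.
    autoSplit: tooManyFiles && tooManyDeps,
  };
}

const bigTask = {
  files: ["a.js", "b.js", "c.js", "d.js", "e.js", "f.js"],
  dependencies: ["auth-contract", "db-contract", "task-2", "payments-api"],
  concerns: ["implement X"],
};

console.log(checkTaskScope(bigTask));
```

Note the asymmetry: a warning fires on any single condition, while automatic splitting needs the file and dependency thresholds together.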
|
|
70
112
|
|
|
71
113
|
### Cross-Domain Duplicate Operation Scan
|
|
72
114
|
|
|
@@ -259,8 +301,13 @@ After subagent returns — run via Bash:
|
|
|
259
301
|
Compute tokens and compaction:
|
|
260
302
|
- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
|
|
261
303
|
- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
|
|
262
|
-
|
|
263
|
-
|
|
304
|
+
Compute context utilization — run via Bash:
|
|
305
|
+
`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
|
|
306
|
+
Alert on context thresholds (display to user inline):
|
|
307
|
+
- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
|
|
308
|
+
- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
|
|
309
|
+
Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
|
|
310
|
+
`| {DT_START} | {DT_END} | gsd-t-plan | Step 7 | haiku | {DURATION}s | {PASS/FAIL}, iteration {N} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
|
|
264
311
|
If validation FAIL, append each gap to `.gsd-t/qa-issues.md` (create with header `| Date | Command | Step | Model | Duration(s) | Severity | Finding |` if missing):
|
|
265
312
|
`| {DT_START} | gsd-t-plan | Step 7 | haiku | {DURATION}s | medium | {gap description} |`
|
|
266
313
|
|
package/commands/gsd-t-qa.md
CHANGED
|
@@ -81,13 +81,16 @@ Your behavior depends on which phase spawned you:
|
|
|
81
81
|
|
|
82
82
|
### During Verify
|
|
83
83
|
**Trigger**: Lead invokes verify phase
|
|
84
|
-
**Action**: Full test audit
|
|
84
|
+
**Action**: Full test audit + shallow test detection
|
|
85
85
|
|
|
86
86
|
1. Run ALL tests — contract tests, acceptance tests, edge case tests, existing project tests
|
|
87
87
|
2. Coverage audit: For every contract, confirm tests exist and pass
|
|
88
88
|
3. For every new feature/mode/flow, confirm Playwright specs cover happy path, error states, edge cases
|
|
89
|
-
4.
|
|
90
|
-
5.
|
|
89
|
+
4. **Shallow test audit**: Read every Playwright spec file. For each `test()` block, check whether the assertions verify functional behavior (state changes, data flow, content updates after actions) or only check element existence (isVisible, toBeAttached, toBeEnabled). Flag any test that would pass on an empty HTML shell as `SHALLOW — needs functional assertions`.
|
|
90
|
+
5. Gap report: List any untested contracts, code paths, AND shallow tests
|
|
91
|
+
6. Report: `QA: {pass|fail} — {N} contract tests, {N} acceptance tests, {N} edge case tests. Gaps: {list or "none"}. Shallow E2E tests: {N} (list or "none")`
|
|
92
|
+
|
|
93
|
+
**Shallow tests block verification.** A passing E2E suite where tests don't actually verify feature behavior is equivalent to a failing suite.
|
|
91
94
|
|
|
92
95
|
### During Quick
|
|
93
96
|
**Trigger**: Lead runs a quick task
|
|
@@ -189,10 +192,28 @@ For each table in `schema-contract.md`:
|
|
|
189
192
|
For each component in `component-contract.md`:
|
|
190
193
|
- Each `## ComponentName` → one `test.describe` block
|
|
191
194
|
- `Props:` → renders with required props, handles missing optional props
|
|
192
|
-
- `Events:` → event handlers fire correctly
|
|
193
|
-
- API references → verify correct API calls made
|
|
195
|
+
- `Events:` → event handlers fire correctly AND produce the expected state change
|
|
196
|
+
- API references → verify correct API calls made AND responses rendered correctly
|
|
194
197
|
- Auto-generate: empty form, partial form, network error handling
|
|
195
198
|
|
|
199
|
+
### Functional E2E Test Standard (MANDATORY for all Playwright specs)
|
|
200
|
+
|
|
201
|
+
**E2E tests that only verify element existence are LAYOUT tests, not functional tests. Layout tests pass even when every feature is broken. This is a QA failure.**
|
|
202
|
+
|
|
203
|
+
Every Playwright spec MUST verify functional behavior — that actions produce the correct outcome:
|
|
204
|
+
|
|
205
|
+
| Test Pattern | WRONG (layout test) | RIGHT (functional test) |
|
|
206
|
+
|---|---|---|
|
|
207
|
+
| Tab switching | `expect(tab).toBeVisible()` | Click tab → assert NEW content loaded (text, data unique to that tab) |
|
|
208
|
+
| Form submit | `expect(submitBtn).toBeEnabled()` | Fill form → submit → assert success message AND data persisted (API call, list updated) |
|
|
209
|
+
| Terminal/editor | `expect(terminal).toBeAttached()` | Open terminal → type command → assert output appears |
|
|
210
|
+
| WebSocket | `expect(statusBadge).toBeVisible()` | Wait for connection → assert status text changes to "Connected" → send message → assert response |
|
|
211
|
+
| Navigation | `expect(link).toHaveAttribute('href')` | Click link → assert URL changed AND destination content rendered |
|
|
212
|
+
| Toggle/mode | `expect(toggle).toBeVisible()` | Click toggle → assert the EFFECT (dark mode CSS applied, panel expanded with content, feature enabled) |
|
|
213
|
+
| Error state | `expect(errorDiv).toBeVisible()` | Trigger error → assert message content → assert recovery action works |
|
|
214
|
+
|
|
215
|
+
**Rule: If a test would pass on an empty HTML shell with the right element IDs, it is not a functional test. Every assertion must prove the feature works, not that the element exists.**
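A rough way to automate the rule above is a text heuristic that flags a test body whose assertions are all existence-style matchers. The sketch below is an assumption-laden illustration, not the QA agent's actual method: it uses a regular expression rather than parsing the spec, treats the matchers from the WRONG column of the table as shallow, and does not detect `isVisible()` calls wrapped in `expect(...).toBe(true)`.

```javascript
// Matchers the table above lists as existence-only checks.
const SHALLOW_MATCHERS = new Set([
  "toBeVisible",
  "toBeAttached",
  "toBeEnabled",
  "toHaveAttribute",
]);

// Heuristic: a test is shallow when it has assertions and every one of them
// uses an existence-only matcher. ASSUMPTION: matcher names start with
// toBe/toHave/toContain/toEqual/toMatch; anything else is ignored.
function isShallowTest(testBody) {
  const pattern = /\.(to(?:Be|Have|Contain|Equal|Match)\w*)\s*\(/g;
  const matchers = [...testBody.matchAll(pattern)].map((m) => m[1]);
  return matchers.length > 0 && matchers.every((name) => SHALLOW_MATCHERS.has(name));
}

const layoutOnly = "await expect(page.locator('#tab')).toBeVisible();";
const functional = [
  "await page.click('#tab');",
  "await expect(page.locator('#panel')).toHaveText('Orders');",
  "await expect(page.locator('#tab')).toBeVisible();",
].join("\n");

console.log(isShallowTest(layoutOnly), isShallowTest(functional));
```

A heuristic like this can only triage; a test it passes may still be shallow, so the manual review described in the audit steps remains necessary.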
|
|
216
|
+
|
|
196
217
|
## Test File Conventions
|
|
197
218
|
|
|
198
219
|
- **Location**: Project's test directory (detected from `playwright.config.*` or `package.json`)
|
package/commands/gsd-t-quick.md
CHANGED
|
@@ -83,6 +83,16 @@ When you encounter unexpected situations:
|
|
|
83
83
|
5. Verify it works
|
|
84
84
|
6. Commit: `[quick] {description}`
|
|
85
85
|
|
|
86
|
+
## Step 3.5: Emit Task Metrics
|
|
87
|
+
|
|
88
|
+
After committing, emit a task-metrics record for this quick task — run via Bash:
|
|
89
|
+
`node bin/metrics-collector.js --milestone {current-milestone-or-none} --domain {domain-or-quick} --task quick-{timestamp} --command quick --duration_s {elapsed} --tokens_used {estimated} --context_pct ${CTX_PCT:-0} --pass {true|false} --fix_cycles {0|N} --signal_type {pass-through|fix-cycle} --notes "[quick] {description}" 2>/dev/null || true`
|
|
90
|
+
|
|
91
|
+
Signal type: `pass-through` if task completed on first attempt; `fix-cycle` if rework was needed.
|
|
92
|
+
|
|
93
|
+
Emit task_complete event — run via Bash:
|
|
94
|
+
`node ~/.claude/scripts/gsd-t-event-writer.js --type task_complete --command gsd-t-quick --reasoning "signal_type={signal_type}, domain={domain}" --outcome {success|failure} || true`
|
|
95
|
+
|
|
86
96
|
## Step 4: Document Ripple (if GSD-T is active)
|
|
87
97
|
|
|
88
98
|
If `.gsd-t/progress.md` exists, assess what documentation was affected and update ALL relevant files:
|
|
@@ -123,9 +133,12 @@ Quick does not mean skip testing. Before committing:
|
|
|
123
133
|
- Playwright E2E specs (if UI/routes/flows/modes changed): create new specs for new functionality, update existing specs for changed behavior
|
|
124
134
|
- Cover all modes/flags affected by this change
|
|
125
135
|
- "No feature code without test code" applies to quick tasks too
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
|
|
136
|
+
- **Functional tests only** — every E2E assertion must verify an action produced the correct outcome (state changed, data loaded, content updated). Tests that only check element existence (`isVisible`, `toBeEnabled`) are shallow/layout tests and are not acceptable. If a test would pass on an empty HTML page with the right IDs, rewrite it.
|
|
137
|
+
2. **Run ALL configured test suites** — not just affected tests, not just one suite:
|
|
138
|
+
a. Detect all runners: check for vitest/jest config, playwright.config.*, cypress.config.*
|
|
139
|
+
b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
|
|
140
|
+
c. If `playwright.config.*` exists → `npx playwright test` (full suite)
|
|
141
|
+
d. Report ALL results: "Unit: X/Y pass | E2E: X/Y pass"
|
|
129
142
|
- Fix any failures before proceeding (up to 2 attempts)
|
|
130
143
|
3. **Verify against requirements**:
|
|
131
144
|
- Does the change satisfy its intended requirement?
|
|
@@ -133,6 +146,26 @@ Quick does not mean skip testing. Before committing:
|
|
|
133
146
|
- If a contract exists for the interface touched, does the code still match?
|
|
134
147
|
4. **No test framework?**: Set one up, or at minimum manually verify and document how in the commit message
|
|
135
148
|
|
|
149
|
+
## Step 6: Doc-Ripple (Automated)
|
|
150
|
+
|
|
151
|
+
After all work is committed but before reporting completion:
|
|
152
|
+
|
|
153
|
+
1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
|
|
154
|
+
2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed to completion
|
|
155
|
+
3. If FIRE: spawn doc-ripple agent:
|
|
156
|
+
|
|
157
|
+
⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
|
|
158
|
+
|
|
159
|
+
Task subagent (general-purpose, model: sonnet):
|
|
160
|
+
"Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
|
|
161
|
+
Git diff context: {files changed list}
|
|
162
|
+
Command that triggered: quick
|
|
163
|
+
Produce manifest at .gsd-t/doc-ripple-manifest.md.
|
|
164
|
+
Update all affected documents.
|
|
165
|
+
Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
|
|
166
|
+
|
|
167
|
+
4. After doc-ripple returns, verify manifest exists and report summary inline
|
|
168
|
+
|
|
136
169
|
$ARGUMENTS
|
|
137
170
|
|
|
138
171
|
## Auto-Clear
|
package/commands/gsd-t-status.md
CHANGED
|
@@ -57,6 +57,59 @@ If `.gsd-t/backlog.md` exists, read and parse it. Show total count and top 3 ite
|
|
|
57
57
|
If there are blockers or issues, highlight them.
|
|
58
58
|
If the user provides $ARGUMENTS, focus the status on that specific domain or aspect.
|
|
59
59
|
|
|
60
|
+
## Token Usage Breakdown
|
|
61
|
+
|
|
62
|
+
If `.gsd-t/token-log.md` exists, read it and append a token breakdown to the status report.
|
|
63
|
+
|
|
64
|
+
Parse each row in the table. Handle both old format (9 columns) and extended format (12 columns with Domain, Task, Ctx%). Rows with missing or empty Domain column are assigned domain "(untagged)".
|
|
65
|
+
|
|
66
|
+
### Token Usage by Domain
|
|
67
|
+
Group rows by Domain. For each domain, sum Tokens and collect all Ctx% values (ignoring "N/A" and empty). Display:
|
|
68
|
+
|
|
69
|
+
```
|
|
70
|
+
## Token Usage by Domain
|
|
71
|
+
| Domain | Tokens | Subagents | Peak Ctx% |
|
|
72
|
+
|----------------|--------|-----------|-----------|
|
|
73
|
+
| auth | 12,400 | 4 | 14% |
|
|
74
|
+
| notifications | 45,200 | 3 | 52% ⚠️ |
|
|
75
|
+
| (untagged) | 8,100 | 6 | N/A |
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
Flag any domain where Peak Ctx% >= 70 with `⚠️` suffix.
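The per-domain grouping described above might be implemented along these lines. The column positions follow the 12-column header quoted in the observability sections of this diff (Tokens at index 7, Domain at 9, Ctx% at 11); the function name and return shape are hypothetical, and cells are assumed not to contain literal `|` characters.

```javascript
// Hypothetical parser for .gsd-t/token-log.md, grouping rows by Domain.
// Handles the old 9-column rows (no Domain/Task/Ctx%) and the 12-column rows.
function tokensByDomain(markdown) {
  const byDomain = new Map();
  for (const raw of markdown.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line.startsWith("|")) continue;
    const cells = line.replace(/^\||\|$/g, "").split("|").map((c) => c.trim());
    if (cells.length < 9) continue;
    const tokens = Number(cells[7].replace(/,/g, ""));
    if (!Number.isFinite(tokens)) continue; // skips header and separator rows
    const domain = cells[9] || "(untagged)";
    const ctx = Number.parseFloat(cells[11]); // NaN for "N/A", empty, or missing
    const entry = byDomain.get(domain) || { domain, tokens: 0, subagents: 0, peakCtx: null };
    entry.tokens += tokens;
    entry.subagents += 1;
    if (Number.isFinite(ctx)) entry.peakCtx = Math.max(entry.peakCtx ?? 0, ctx);
    byDomain.set(domain, entry);
  }
  // warn marks domains whose peak context utilization reached 70%.
  return [...byDomain.values()].map((e) => ({ ...e, warn: e.peakCtx !== null && e.peakCtx >= 70 }));
}

// Made-up log for demonstration; the last row uses the old 9-column format.
const tokenLog = [
  "| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |",
  "|---|---|---|---|---|---|---|---|---|---|---|---|",
  "| t0 | t1 | gsd-t-execute | Step 3 | sonnet | 60s | ok | 12000 | null | auth | 1 | 14.0 |",
  "| t2 | t3 | gsd-t-execute | Step 3 | sonnet | 90s | ok | 45200 | null | notifications | 2 | 72.5 |",
  "| t4 | t5 | gsd-t-plan | Step 7 | haiku | 20s | PASS | 3400 | null |",
].join("\n");

console.log(tokensByDomain(tokenLog));
```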
|
|
79
|
+
|
|
80
|
+
### Token Usage by Phase/Command
|
|
81
|
+
Group rows by Command. For each command, sum Tokens and count subagent rows. Display:
|
|
82
|
+
|
|
83
|
+
```
|
|
84
|
+
## Token Usage by Command
|
|
85
|
+
| Command | Tokens | Subagents |
|
|
86
|
+
|---------------|--------|-----------|
|
|
87
|
+
| gsd-t-execute | 86,200 | 14 |
|
|
88
|
+
| gsd-t-wave | 12,400 | 9 |
|
|
89
|
+
| gsd-t-plan | 3,400 | 1 |
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
If token-log.md does not exist or is empty, skip this section entirely (no error).
|
|
93
|
+
|
|
94
|
+
## Process Health
|
|
95
|
+
|
|
96
|
+
If `.gsd-t/metrics/rollup.jsonl` exists, read the latest entry and append to the status report:
|
|
97
|
+
|
|
98
|
+
```
|
|
99
|
+
Process Health:
|
|
100
|
+
ELO: {elo_after} ({elo_delta > 0 ? '↑' : '↓'} {elo_delta})
|
|
101
|
+
Quality: {first_pass_rate * 100}% first-pass rate | {total_fix_cycles} fix cycles
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
If `.gsd-t/metrics/task-metrics.jsonl` exists but `rollup.jsonl` does not, compute `first_pass_rate` directly from the task metrics for the current milestone and display:
|
|
105
|
+
|
|
106
|
+
```
|
|
107
|
+
Process Health:
|
|
108
|
+
Quality: {rate}% first-pass rate (current milestone, no rollup yet)
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
If neither file exists, skip this section entirely.
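
As a rough illustration of the rollup case, the sketch below takes the text of a JSONL file, reads its last non-empty line, and fills in the template. `formatProcessHealth` is a hypothetical name, the field names are the ones used in the template, and rounding the percentage to a whole number is an assumption.

```javascript
// Sketch only: format the latest rollup entry for the status report.
// Takes the file contents as a string so the caller decides how to read the file.
function formatProcessHealth(rollupJsonl) {
  const lines = rollupJsonl.split('\n').filter((line) => line.trim() !== '');
  if (lines.length === 0) return null; // nothing to show; caller skips the section
  const latest = JSON.parse(lines[lines.length - 1]); // JSONL: one JSON object per line
  const arrow = latest.elo_delta > 0 ? '↑' : '↓'; // same condition as the template
  const percent = Math.round(latest.first_pass_rate * 100); // assumption: whole percent
  return [
    'Process Health:',
    `ELO: ${latest.elo_after} (${arrow} ${latest.elo_delta})`,
    `Quality: ${percent}% first-pass rate | ${latest.total_fix_cycles} fix cycles`,
  ].join('\n');
}
```

Note that the template's condition shows `↓` when `elo_delta` is exactly 0; the sketch keeps that behavior rather than guessing at a third symbol.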
|
|
112
|
+
|
|
60
113
|
## Graph Status
|
|
61
114
|
|
|
62
115
|
If `.gsd-t/graph/meta.json` exists, read it and append to the status report:
|
|
@@ -81,6 +134,31 @@ After displaying the project status, check for GSD-T updates:
|
|
|
81
134
|
|
|
82
135
|
5. If versions match, skip — don't show anything
|
|
83
136
|
|
|
137
|
+
## Global ELO & Cross-Project Rankings
|
|
138
|
+
|
|
139
|
+
After the Process Health section, check for global metrics:
|
|
140
|
+
|
|
141
|
+
1. Run via Bash:
|
|
142
|
+
```bash
|
|
143
|
+
node -e "const g = require('./bin/global-sync-manager.js'); const name = (() => { try { return require('./package.json').name; } catch { return require('path').basename(process.cwd()); } })(); const elo = g.getGlobalELO(name); const ranks = g.getProjectRankings(); console.log(JSON.stringify({ elo, ranks, name }));" 2>/dev/null
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
2. If the result returns `elo: null` or the command fails: display "No global metrics yet" and skip.
|
|
147
|
+
|
|
148
|
+
3. If global ELO data exists, display:
|
|
149
|
+
```
|
|
150
|
+
Global ELO: {elo} (rank #{position} of {total} projects)
|
|
151
|
+
```
|
|
152
|
+
Where position is the 1-based index of the current project in the rankings array.
|
|
153
|
+
|
|
154
|
+
4. If 2+ projects have global rollup data, display the top 5 rankings:
|
|
155
|
+
```
|
|
156
|
+
## Cross-Project Rankings (Top 5)
|
|
157
|
+
| Rank | Project | ELO | Latest Milestone |
|
|
158
|
+
|------|------------------|--------|------------------|
|
|
159
|
+
| 1 | {project} | {elo} | {milestone} |
|
|
160
|
+
```
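
Steps 2 through 4 can be sketched as a single formatter over the JSON object printed in step 1. This is an illustration only: `renderGlobalElo` is a hypothetical name, and the ranking entry fields `project`, `elo`, and `milestone` are assumed from the table placeholders rather than confirmed against `global-sync-manager.js`.

```javascript
// Sketch only. Assumes `ranks` is ordered best-first and that each entry
// looks like { project, elo, milestone } (field names are assumptions).
function renderGlobalElo({ elo, ranks, name }) {
  if (elo == null || !Array.isArray(ranks)) return 'No global metrics yet';
  const lines = [];
  const index = ranks.findIndex((entry) => entry.project === name);
  if (index >= 0) {
    // Position is the 1-based index of the current project in the rankings array.
    lines.push(`Global ELO: ${elo} (rank #${index + 1} of ${ranks.length} projects)`);
  } else {
    lines.push(`Global ELO: ${elo}`); // assumption: omit the rank if the project is not listed
  }
  if (ranks.length >= 2) {
    lines.push('## Cross-Project Rankings (Top 5)');
    lines.push('| Rank | Project | ELO | Latest Milestone |');
    lines.push('|------|---------|-----|------------------|');
    ranks.slice(0, 5).forEach((entry, i) => {
      lines.push(`| ${i + 1} | ${entry.project} | ${entry.elo} | ${entry.milestone} |`);
    });
  }
  return lines.join('\n');
}
```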
|
|
161
|
+
|
|
84
162
|
$ARGUMENTS
|
|
85
163
|
|
|
86
164
|
## Auto-Clear
|
package/commands/gsd-t-test-sync.md
CHANGED
|
@@ -111,8 +111,8 @@ pytest tests/test_{module}.py -v
|
|
|
111
111
|
npm test -- --testPathPattern="{module}"
|
|
112
112
|
```
|
|
113
113
|
|
|
114
|
-
### B) E2E Tests
|
|
115
|
-
If
|
|
114
|
+
### B) E2E Tests (MANDATORY when config exists)
|
|
115
|
+
If `playwright.config.*` or `cypress.config.*` exists, you MUST run E2E tests — skipping is never acceptable:
|
|
116
116
|
|
|
117
117
|
```bash
|
|
118
118
|
# Playwright
|
|
@@ -151,6 +151,27 @@ If Playwright is configured (`playwright.config.*` or Playwright in dependencies
|
|
|
151
151
|
|
|
152
152
|
**This is NOT optional.** Every new code path that a user can reach must have a Playwright spec. "We'll add tests later" is never acceptable.
|
|
153
153
|
|
|
154
|
+
**FUNCTIONAL TESTS — NOT LAYOUT TESTS (MANDATORY):**
|
|
155
|
+
E2E specs that only check element existence (`isVisible`, `toBeAttached`, `toBeEnabled`) are
|
|
156
|
+
layout tests. Layout tests pass even when every feature is broken — they are worthless for QA.
|
|
157
|
+
|
|
158
|
+
Every Playwright assertion MUST verify **functional behavior** — that an action produced the
|
|
159
|
+
correct outcome:
|
|
160
|
+
- **Tab/navigation**: Click → assert the NEW content loaded (unique text, data, or elements
|
|
161
|
+
that only appear on the destination view). Never just assert the tab element exists.
|
|
162
|
+
- **Forms**: Fill → submit → assert success feedback AND data persisted (API call observed
|
|
163
|
+
via `page.waitForResponse`, or list/table updated with new entry).
|
|
164
|
+
- **Interactive widgets** (terminals, editors, code panels): Open → interact → assert the
|
|
165
|
+
widget responded (keystroke produced output, content was saved, command executed).
|
|
166
|
+
- **Connections** (WebSocket, SSE, polling): Assert status transitions ("Connecting" →
|
|
167
|
+
"Connected") and verify data flows through the connection.
|
|
168
|
+
- **State toggles** (dark mode, expand/collapse, enable/disable): Assert the EFFECT of the
|
|
169
|
+
toggle, not just that the toggle control exists.
|
|
170
|
+
- **Error handling**: Trigger error → assert error content → assert recovery path works.
|
|
171
|
+
|
|
172
|
+
**Rule: If a test would pass on an empty HTML page with the correct element IDs and no
|
|
173
|
+
JavaScript, it is not a functional test. Rewrite it.**
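
One way to audit existing specs against this rule is a rough text heuristic, sketched below. It is not a Playwright API and `isLayoutOnlySpec` is a hypothetical helper: it scans spec source for common assertion calls and reports whether every one of them is an existence check. Treating `toBeVisible` as an existence check alongside the three names listed above is an assumption, and a regex scan is only a first-pass filter, so a spec that passes it can still be a layout test.

```javascript
// Rough heuristic, not a Playwright API: flag spec source whose only
// assertions are existence checks.
const LAYOUT_ONLY = new Set(['isVisible', 'toBeVisible', 'toBeAttached', 'toBeEnabled']);

function isLayoutOnlySpec(source) {
  // Collect assertion-like calls: isVisible(), waitForResponse(), and
  // expect matchers that start with toBe/toHave/toContain/toEqual/toMatch.
  const pattern = /\.(isVisible|waitForResponse|to(?:Be|Have|Contain|Equal|Match)\w*)\s*\(/g;
  const calls = [...source.matchAll(pattern)].map((match) => match[1]);
  // Layout-only means at least one check exists and none verifies behavior.
  return calls.length > 0 && calls.every((name) => LAYOUT_ONLY.has(name));
}
```

A spec that clicks a tab and then asserts `toHaveText` on the destination panel is not flagged, while a spec containing only `toBeVisible` and `toBeEnabled` checks is.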
|
|
174
|
+
|
|
154
175
|
### D) Capture Results
|
|
155
176
|
For all test types:
|
|
156
177
|
- PASS: Test still valid
|