@tekyzinc/gsd-t 2.45.11 → 2.46.11
- package/CHANGELOG.md +12 -0
- package/README.md +4 -3
- package/commands/gsd-t-complete-milestone.md +2 -1
- package/commands/gsd-t-debug.md +22 -1
- package/commands/gsd-t-doc-ripple.md +148 -0
- package/commands/gsd-t-execute.md +72 -3
- package/commands/gsd-t-help.md +7 -0
- package/commands/gsd-t-integrate.md +25 -1
- package/commands/gsd-t-qa.md +26 -5
- package/commands/gsd-t-quick.md +21 -0
- package/commands/gsd-t-test-sync.md +21 -0
- package/commands/gsd-t-verify.md +2 -1
- package/commands/gsd-t-wave.md +31 -0
- package/docs/GSD-T-README.md +1 -0
- package/docs/framework-comparison-scorecard.md +160 -0
- package/docs/requirements.md +3 -0
- package/examples/rules/desktop.ini +2 -0
- package/package.json +2 -2
- package/templates/CLAUDE-global.md +63 -2
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,18 @@
|
|
|
2
2
|
|
|
3
3
|
All notable changes to GSD-T are documented here. Updated with each release.
|
|
4
4
|
|
|
5
|
+
## [2.46.11] - 2026-03-24
|
|
6
|
+
|
|
7
|
+
### Added
|
|
8
|
+
- **M28: Doc-Ripple Subagent** — automated document ripple enforcement agent. Threshold check (7 FIRE/3 SKIP conditions), blast radius analysis, manifest generation, parallel document updates. New command: `gsd-t-doc-ripple`. 43 new tests. Wired into execute, integrate, quick, debug, wave.
|
|
9
|
+
- **Orchestrator context self-check** — execute and wave orchestrators now check their own context utilization after every domain/phase. If >= 70%, saves progress and stops to prevent session breaks.
|
|
10
|
+
- **Functional E2E test quality standard (REQ-050)** — Playwright specs must verify functional behavior, not just element existence. Shallow test audit added to qa, test-sync, verify, complete-milestone commands.
|
|
11
|
+
- **Document Ripple Completion Gate (REQ-051)** — structural rule preventing "done" reports until all downstream documents are updated.
|
|
12
|
+
|
|
13
|
+
### Changed
|
|
14
|
+
- Command count: 50 → 51 (added `gsd-t-doc-ripple`)
|
|
15
|
+
- Package description updated to include doc-ripple enforcement
|
|
16
|
+
|
|
5
17
|
## [2.39.12] - 2026-03-19
|
|
6
18
|
|
|
7
19
|
### Added
|
package/README.md
CHANGED
|
@@ -22,7 +22,7 @@ A methodology for reliable, parallelizable development using Claude Code with op
|
|
|
22
22
|
npx @tekyzinc/gsd-t install
|
|
23
23
|
```
|
|
24
24
|
|
|
25
|
-
This installs
|
|
25
|
+
This installs 46 GSD-T commands + 5 utility commands (51 total) to `~/.claude/commands/` and the global CLAUDE.md to `~/.claude/CLAUDE.md`. Works on Windows, Mac, and Linux.
|
|
26
26
|
|
|
27
27
|
### Start Using It
|
|
28
28
|
|
|
@@ -141,6 +141,7 @@ This will replace changed command files, back up your CLAUDE.md if customized, a
|
|
|
141
141
|
| `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning | In wave |
|
|
142
142
|
| `/user:gsd-t-test-sync` | Sync tests with code changes | In wave |
|
|
143
143
|
| `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting | Auto-spawned |
|
|
144
|
+
| `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes | Auto-spawned |
|
|
144
145
|
| `/user:gsd-t-integrate` | Wire domains together | In wave |
|
|
145
146
|
| `/user:gsd-t-verify` | Run quality gates + goal-backward behavior verification | In wave |
|
|
146
147
|
| `/user:gsd-t-complete-milestone` | Archive + git tag (goal-backward gate required) | In wave |
|
|
@@ -314,8 +315,8 @@ get-stuff-done-teams/
|
|
|
314
315
|
├── LICENSE
|
|
315
316
|
├── bin/
|
|
316
317
|
│ └── gsd-t.js # CLI installer
|
|
317
|
-
├── commands/ #
|
|
318
|
-
│ ├── gsd-t-*.md #
|
|
318
|
+
├── commands/ # 51 slash commands
|
|
319
|
+
│ ├── gsd-t-*.md # 45 GSD-T workflow commands
|
|
319
320
|
│ ├── gsd.md # GSD-T smart router
|
|
320
321
|
│ ├── branch.md # Git branch helper
|
|
321
322
|
│ ├── checkin.md # Auto-version + commit/push helper
|
package/commands/gsd-t-complete-milestone.md
CHANGED
|
@@ -445,8 +445,9 @@ Verify the milestone is truly complete:
|
|
|
445
445
|
c. If specs are missing or stale, invoke `gsd-t-test-sync` first.
|
|
446
446
|
d. Report: "Unit: X/Y pass | E2E: X/Y pass"
|
|
447
447
|
2. **Verify all pass**: Every test must pass. If any fail, fix before tagging (up to 2 attempts)
|
|
448
|
+
3. **Functional test quality gate**: Read every Playwright spec. Verify assertions check **functional behavior** (state changed after action, data loaded, content updated, widget responded to input) — NOT just element existence (`isVisible`, `toBeAttached`, `toBeEnabled`). Shallow tests that would pass on an empty HTML page with the right element IDs are a milestone completion FAIL. Flag and rewrite before proceeding.
|
|
448
449
|
4. **Compare to baseline**: If a test baseline was recorded at milestone start, verify coverage has improved or at minimum not regressed
|
|
449
|
-
5. **Log test results**: Include test pass/fail counts in the milestone summary (Step 4)
|
|
450
|
+
5. **Log test results**: Include test pass/fail counts and shallow test audit results in the milestone summary (Step 4)
|
|
450
451
|
|
|
451
452
|
## Step 11: Create Git Tag
|
|
452
453
|
|
package/commands/gsd-t-debug.md
CHANGED
|
@@ -288,7 +288,8 @@ Before committing, ensure the fix is solid:
|
|
|
288
288
|
d. Report ALL results: "Unit: X/Y pass | E2E: X/Y pass"
|
|
289
289
|
3. **Verify passing**: All tests must pass. If any fail, fix before proceeding (up to 2 attempts)
|
|
290
290
|
4. **If the project has a UI but no E2E specs cover the fixed area**: WRITE THEM.
|
|
291
|
-
5. **
|
|
291
|
+
5. **Functional test quality**: Every E2E assertion must verify an action produced the correct outcome (state changed, data loaded, content updated) — not just that elements exist. Tests that only check `isVisible`/`toBeEnabled` are shallow layout tests and don't catch real bugs. If a test would pass on an empty HTML page with the right IDs, rewrite it.
|
|
292
|
+
6. **Regression check**: Confirm the fix doesn't break any adjacent functionality
|
|
292
293
|
|
|
293
294
|
Commit: `[debug] Fix {description} — root cause: {explanation}`
|
|
294
295
|
|
|
@@ -302,6 +303,26 @@ Signal type is always `debug-invoked` for debug sessions.
|
|
|
302
303
|
Emit task_complete event — run via Bash:
|
|
303
304
|
`node ~/.claude/scripts/gsd-t-event-writer.js --type task_complete --command gsd-t-debug --reasoning "signal_type=debug-invoked, domain={domain}" --outcome {success|failure} || true`
|
|
304
305
|
|
|
306
|
+
## Step 6: Doc-Ripple (Automated)
|
|
307
|
+
|
|
308
|
+
After all work is committed but before reporting completion:
|
|
309
|
+
|
|
310
|
+
1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
|
|
311
|
+
2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed to completion
|
|
312
|
+
3. If FIRE: spawn doc-ripple agent:
|
|
313
|
+
|
|
314
|
+
⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
|
|
315
|
+
|
|
316
|
+
Task subagent (general-purpose, model: sonnet):
|
|
317
|
+
"Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
|
|
318
|
+
Git diff context: {files changed list}
|
|
319
|
+
Command that triggered: debug
|
|
320
|
+
Produce manifest at .gsd-t/doc-ripple-manifest.md.
|
|
321
|
+
Update all affected documents.
|
|
322
|
+
Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
|
|
323
|
+
|
|
324
|
+
4. After doc-ripple returns, verify manifest exists and report summary inline
|
|
325
|
+
|
|
305
326
|
$ARGUMENTS
|
|
306
327
|
|
|
307
328
|
## Auto-Clear
|
package/commands/gsd-t-doc-ripple.md
ADDED
|
@@ -0,0 +1,148 @@
|
|
|
1
|
+
# GSD-T: Doc-Ripple — Automated Document Ripple Enforcement
|
|
2
|
+
|
|
3
|
+
You are the doc-ripple agent. You identify and update all downstream documents after code changes. You are spawned by execute, integrate, quick, debug, and wave after primary work is committed.
|
|
4
|
+
|
|
5
|
+
## Step 1: Load Context
|
|
6
|
+
|
|
7
|
+
Read:
|
|
8
|
+
1. `CLAUDE.md` — project conventions and Pre-Commit Gate (project-specific extensions)
|
|
9
|
+
2. `.gsd-t/contracts/doc-ripple-contract.md` — trigger conditions, manifest format, update protocol
|
|
10
|
+
3. `.gsd-t/contracts/pre-commit-gate.md` — the gate checklist you cross-reference
|
|
11
|
+
|
|
12
|
+
Run via Bash:
|
|
13
|
+
`git diff --name-only HEAD~1 2>/dev/null || git diff --cached --name-only`
|
|
14
|
+
|
|
15
|
+
Store the changed file list for Steps 2–3.
|
|
16
|
+
|
|
17
|
+
## Step 2: Threshold Check
|
|
18
|
+
|
|
19
|
+
Evaluate the changed file list against trigger conditions from `doc-ripple-contract.md`.
|
|
20
|
+
|
|
21
|
+
Output exactly:
|
|
22
|
+
```
|
|
23
|
+
DOC-RIPPLE THRESHOLD: {FIRE|SKIP}
|
|
24
|
+
Files changed: {N} across {N} directories
|
|
25
|
+
Cross-cutting signals: {list or "none"}
|
|
26
|
+
Reason: {brief explanation}
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
**If SKIP**: log the decision, output `Doc-ripple: SKIP — {reason}`, and stop. No manifest. No updates.
|
|
30
|
+
|
|
31
|
+
**If FIRE**: proceed to Step 3.
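The `Files changed: {N} across {N} directories` line of the threshold output can be derived mechanically from the changed-file list. The following is a minimal sketch, assuming a hypothetical helper named `summarize_changes` that reads one path per line on stdin. It does not decide FIRE or SKIP, because those conditions live in `doc-ripple-contract.md`, which is not shown here.

```shell
#!/bin/sh
# Hypothetical helper (not part of gsd-t): reads newline-separated paths on
# stdin and prints the "Files changed" line of the threshold output.
summarize_changes() {
  list=$(cat)
  if [ -z "$list" ]; then
    echo "Files changed: 0 across 0 directories"
    return 0
  fi
  # Count paths, then count the distinct parent directories of those paths.
  files=$(printf '%s\n' "$list" | wc -l | tr -d ' ')
  dirs=$(printf '%s\n' "$list" |
    while IFS= read -r path; do dirname "$path"; done |
    sort -u | wc -l | tr -d ' ')
  echo "Files changed: $files across $dirs directories"
}

# Demo with a fixed list; in a repository the input would come from
#   { git diff --name-only HEAD~1 2>/dev/null || git diff --cached --name-only; }
printf 'commands/a.md\ncommands/b.md\nREADME.md\n' | summarize_changes
# prints: Files changed: 3 across 2 directories
```

Top-level files such as `README.md` count under the directory `.`, which is why the demo reports two directories.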
|
|
32
|
+
|
|
33
|
+
## Step 3: Blast Radius Analysis
|
|
34
|
+
|
|
35
|
+
For each changed file, classify its type: source, test, contract, template, command, doc, config.
|
|
36
|
+
|
|
37
|
+
Cross-reference `.gsd-t/contracts/pre-commit-gate.md` gate checklist:
|
|
38
|
+
- API endpoint/shape changed → api-contract.md, Swagger spec, CLAUDE.md, README.md
|
|
39
|
+
- Database schema changed → schema-contract.md, docs/schema.md
|
|
40
|
+
- UI component interface changed → component-contract.md
|
|
41
|
+
- New files or directories added → owning domain scope.md
|
|
42
|
+
- Requirement implemented or changed → docs/requirements.md
|
|
43
|
+
- Component or data flow changed → docs/architecture.md
|
|
44
|
+
- Any file modified → .gsd-t/progress.md (Decision Log entry)
|
|
45
|
+
- Architectural decision made → .gsd-t/progress.md (with rationale)
|
|
46
|
+
- Tech debt discovered or fixed → .gsd-t/techdebt.md
|
|
47
|
+
- New convention established → CLAUDE.md or domain constraints.md
|
|
48
|
+
- Command file added/changed → GSD-T-README.md, README.md, templates/CLAUDE-global.md, commands/gsd-t-help.md
|
|
49
|
+
- Command added/removed → all 4 above + package.json version + bin/gsd-t.js count
|
|
50
|
+
- Wave flow changed → gsd-t-wave.md, GSD-T-README.md, README.md
|
|
51
|
+
- Template changed → verify gsd-t-init output
|
|
52
|
+
|
|
53
|
+
Build the final list: `{ document, status (UPDATED|SKIPPED), action, reason }`.
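Step 3 names seven file types but does not say how a path maps to one. The sketch below shows one way the classification could look. Every glob pattern in it is an assumption chosen for illustration, as is the function name `classify_file`; a real project would take these rules from its own layout and contracts.

```shell
#!/bin/sh
# Hypothetical classifier: prints one of
#   source, test, contract, template, command, doc, config
# All patterns below are illustrative assumptions, not gsd-t rules.
classify_file() {
  case $1 in
    *.test.*|*.spec.*|test/*|tests/*|*/test/*|*/tests/*) echo "test" ;;
    .gsd-t/contracts/*) echo "contract" ;;   # checked before the generic *.md rule
    templates/*) echo "template" ;;
    commands/*.md) echo "command" ;;
    docs/*|*.md) echo "doc" ;;
    *.json|*.yml|*.yaml|*.toml|*.ini) echo "config" ;;
    *) echo "source" ;;
  esac
}

classify_file "commands/gsd-t-help.md"   # prints: command
classify_file "bin/gsd-t.js"             # prints: source
```

Order matters in the `case`: contracts, templates, and commands are all Markdown files, so they are matched before the generic `*.md` rule that would otherwise label them `doc`.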
|
|
54
|
+
|
|
55
|
+
## Step 4: Generate Manifest
|
|
56
|
+
|
|
57
|
+
Write `.gsd-t/doc-ripple-manifest.md` (overwrite):
|
|
58
|
+
|
|
59
|
+
```markdown
|
|
60
|
+
# Doc-Ripple Manifest — {date}
|
|
61
|
+
|
|
62
|
+
## Trigger
|
|
63
|
+
- Command: {triggering command}
|
|
64
|
+
- Files changed: {N}
|
|
65
|
+
- Threshold: FIRE — {reason}
|
|
66
|
+
|
|
67
|
+
## Blast Radius
|
|
68
|
+
|
|
69
|
+
| Document | Status | Action | Reason |
|
|
70
|
+
|----------|--------|--------|--------|
|
|
71
|
+
| {path} | {UPDATED|SKIPPED} | {action} | {reason} |
|
|
72
|
+
|
|
73
|
+
## Summary
|
|
74
|
+
- Documents checked: {N}
|
|
75
|
+
- Documents updated: {N}
|
|
76
|
+
- Documents skipped (already current): {N}
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
## Step 5: Update Documents
|
|
80
|
+
|
|
81
|
+
Count documents marked UPDATED.
|
|
82
|
+
|
|
83
|
+
**Fewer than 3 updates — inline:**
|
|
84
|
+
For each document: read current content → determine minimal edit → apply via Edit tool (not Write) → verify after edit.
|
|
85
|
+
|
|
86
|
+
**3 or more updates — parallel subagents:**
|
|
87
|
+
|
|
88
|
+
For each document or logical group:
|
|
89
|
+
|
|
90
|
+
⚙ [haiku] gsd-t-doc-ripple → update {document}
|
|
91
|
+
(Use sonnet for docs/architecture.md and docs/requirements.md — these need reasoning.)
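The dispatch rule in Step 5 (inline below three updates, parallel subagents otherwise, with sonnet reserved for the two reasoning-heavy documents) can be written as two small functions. This is a sketch only; the names `update_mode` and `update_model` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the Step 5 dispatch rule.

# update_mode N: N is the number of documents marked UPDATED.
update_mode() {
  if [ "$1" -lt 3 ]; then
    echo "inline"
  else
    echo "parallel"
  fi
}

# update_model PATH: model to use for a parallel document-update subagent.
update_model() {
  case $1 in
    docs/architecture.md|docs/requirements.md) echo "sonnet" ;;
    *) echo "haiku" ;;
  esac
}

update_mode 2                        # prints: inline
update_mode 3                        # prints: parallel
update_model "docs/architecture.md"  # prints: sonnet
update_model "README.md"             # prints: haiku
```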
|
|
92
|
+
|
|
93
|
+
**OBSERVABILITY LOGGING (MANDATORY) — for each subagent spawn:**
|
|
94
|
+
|
|
95
|
+
Before spawning — run via Bash:
|
|
96
|
+
`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
|
|
97
|
+
|
|
98
|
+
After subagent returns — run via Bash:
|
|
99
|
+
`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
|
|
100
|
+
|
|
101
|
+
Compute tokens:
|
|
102
|
+
- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
|
|
103
|
+
- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
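The two token cases above can be folded into a single helper. This is a minimal sketch; the function name `compute_tokens` and its positional arguments are hypothetical, while the arithmetic follows the two expressions in the list.

```shell
#!/bin/sh
# Hypothetical helper: compute_tokens TOK_START TOK_END TOK_MAX
# Prints the tokens consumed between two context readings.
compute_tokens() {
  tok_start=$1
  tok_end=$2
  tok_max=$3
  if [ "$tok_end" -ge "$tok_start" ]; then
    # No compaction: plain difference.
    echo $(( tok_end - tok_start ))
  else
    # Compaction detected: what was left of the old window plus
    # everything used in the new one.
    echo $(( (tok_max - tok_start) + tok_end ))
  fi
}

compute_tokens 50000 80000 200000    # prints: 30000
compute_tokens 180000 30000 200000   # prints: 50000
```

In the second call the reading dropped from 180000 to 30000, so the estimate is the 20000 tokens that remained in the old window plus the 30000 used after compaction.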
|
|
104
|
+
|
|
105
|
+
Compute context utilization:
|
|
106
|
+
`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi`
|
|
107
|
+
|
|
108
|
+
Alert thresholds:
|
|
109
|
+
- CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
|
|
110
|
+
- CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting."`
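The utilization check and the two alert thresholds can be combined into one function. The sketch below is an assumption-laden illustration: the name `ctx_level` is hypothetical, and it works in integer tenths of a percent rather than calling `bc`, so that the result can be compared with the shell's integer tests.

```shell
#!/bin/sh
# Hypothetical helper: ctx_level USED MAX
# Prints CRITICAL, WARNING, OK, or N/A (when MAX is not positive).
ctx_level() {
  used=$1
  max=$2
  if [ "$max" -le 0 ]; then
    echo "N/A"
    return 0
  fi
  # Tenths of a percent, truncated: 70.0% becomes 700.
  tenths=$(( used * 1000 / max ))
  if [ "$tenths" -ge 850 ]; then
    echo "CRITICAL"
  elif [ "$tenths" -ge 700 ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

ctx_level 150000 200000   # prints: WARNING
ctx_level 170000 200000   # prints: CRITICAL
```

The 85% branch is tested first because any value that reaches it also satisfies the 70% condition.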
|
|
111
|
+
|
|
112
|
+
Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
|
|
113
|
+
`| {DT_START} | {DT_END} | gsd-t-doc-ripple | Step 5 | {model} | {DURATION}s | update:{document} | {TOKENS} | {COMPACTED} | doc-ripple | — | {CTX_PCT} |`
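The "create with header if missing, then append" behaviour described for `.gsd-t/token-log.md` could be sketched as follows. The function name `append_token_log` and the variable `TOKEN_LOG_HEADER` are hypothetical; the header text is copied from the instruction above.

```shell
#!/bin/sh
# Header row as given in the doc-ripple instructions.
TOKEN_LOG_HEADER='| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |'

# Hypothetical helper: append_token_log LOG_PATH ROW
# Creates the log with its header when the file is missing, then appends ROW.
append_token_log() {
  log=$1
  row=$2
  if [ ! -f "$log" ]; then
    mkdir -p "$(dirname "$log")"
    printf '%s\n' "$TOKEN_LOG_HEADER" > "$log"
  fi
  printf '%s\n' "$row" >> "$log"
}

# Example call (values are placeholders):
#   append_token_log .gsd-t/token-log.md \
#     "| $DT_START | $DT_END | gsd-t-doc-ripple | Step 5 | haiku | ${DURATION}s | update:README.md | $TOKENS | null | doc-ripple | n/a | $CTX_PCT |"
```

The sketch writes only the header the source specifies; it does not add a Markdown separator row.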
|
|
114
|
+
|
|
115
|
+
**Each document-update subagent prompt:**
|
|
116
|
+
```
|
|
117
|
+
Task subagent (general-purpose, model: {haiku|sonnet}):
|
|
118
|
+
"Update a single document as part of doc-ripple enforcement.
|
|
119
|
+
|
|
120
|
+
Document to update: {path}
|
|
121
|
+
Action: {action from manifest}
|
|
122
|
+
Reason: {reason from manifest}
|
|
123
|
+
Git diff context (changed files): {file list}
|
|
124
|
+
|
|
125
|
+
Instructions:
|
|
126
|
+
1. Read the current document
|
|
127
|
+
2. Apply the minimal edit — add/update only the affected section
|
|
128
|
+
3. Use the Edit tool (not Write) — preserve all existing content
|
|
129
|
+
4. Re-read after edit to confirm correctness
|
|
130
|
+
5. Report: 'Updated {document} — {one-line summary of change}'"
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
## Step 6: Report Summary
|
|
134
|
+
|
|
135
|
+
Output:
|
|
136
|
+
```
|
|
137
|
+
Doc-ripple: {N} checked, {N} updated, {N} skipped
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
List each updated document with a one-line summary of what changed.
|
|
141
|
+
|
|
142
|
+
If any update failed, list it under `Failures:` and flag for manual review.
|
|
143
|
+
|
|
144
|
+
$ARGUMENTS
|
|
145
|
+
|
|
146
|
+
## Auto-Clear
|
|
147
|
+
|
|
148
|
+
All work is written to project files. Execute `/clear` to free the context window for the next command.
|
package/commands/gsd-t-execute.md
CHANGED
|
@@ -168,6 +168,26 @@ Execute the task above:
|
|
|
168
168
|
- If the project has a UI but no Playwright E2E specs exist for the features being
|
|
169
169
|
touched: WRITE THEM. A placeholder spec is not sufficient — write real E2E tests
|
|
170
170
|
that exercise the actual UI functionality being built or changed.
|
|
171
|
+
- **FUNCTIONAL E2E TESTS — NOT LAYOUT TESTS (MANDATORY)**:
|
|
172
|
+
E2E tests that only check element existence (isVisible, isEnabled, toBeAttached)
|
|
173
|
+
are LAYOUT tests, not functional tests. Layout tests pass even when every feature
|
|
174
|
+
is broken. Every Playwright spec MUST verify functional behavior:
|
|
175
|
+
a. **State changes**: After an action (click, type, submit), assert the app STATE
|
|
176
|
+
changed — not just that the button was clickable. Example: clicking a tab must
|
|
177
|
+
load different content; verify the content changed, not just that the tab exists.
|
|
178
|
+
b. **Data flow**: Form submissions must verify data arrived (API call made, response
|
|
179
|
+
rendered, list updated). Don't just assert the form rendered.
|
|
180
|
+
c. **Navigation/routing**: Tab/page switches must verify the NEW content loaded.
|
|
181
|
+
Assert on content unique to the destination, not the navigation element itself.
|
|
182
|
+
d. **Interactive widgets**: Terminals must accept input and produce output. Editors
|
|
183
|
+
must save changes. Panels must load their functional content after opening.
|
|
184
|
+
e. **Network integration**: If a feature requires WebSocket/API connection, verify
|
|
185
|
+
the connection status changes (e.g., "Disconnected" → "Connected") and that
|
|
186
|
+
messages flow through the connection.
|
|
187
|
+
f. **Error recovery**: Don't just check error messages render — verify the app
|
|
188
|
+
recovers (retry button works, form can be resubmitted, etc.).
|
|
189
|
+
A test that would pass on an empty HTML page with the right element IDs is useless.
|
|
190
|
+
Every assertion must prove the FEATURE WORKS, not that the ELEMENT EXISTS.
|
|
171
191
|
7. Run ALL test suites — this is NOT optional, not conditional, not "if applicable":
|
|
172
192
|
a. Detect configured test runners: check for vitest/jest config, playwright.config.*, cypress.config.*
|
|
173
193
|
b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
|
|
@@ -187,8 +207,13 @@ Execute the task above:
|
|
|
187
207
|
b. E2E tests: check for playwright.config.* or cypress.config.* — if found, run the FULL E2E suite
|
|
188
208
|
c. NEVER skip E2E when a config file exists. Running only unit tests is a QA FAILURE.
|
|
189
209
|
d. Read .gsd-t/contracts/ for contract definitions. Check contract compliance.
|
|
190
|
-
|
|
191
|
-
|
|
210
|
+
e. AUDIT E2E test quality: Review each Playwright spec — if any test only checks
|
|
211
|
+
element existence (isVisible, toBeAttached, toBeEnabled) without verifying functional
|
|
212
|
+
behavior (state changes, data loaded, content updated after actions), flag it as
|
|
213
|
+
"SHALLOW TEST — needs functional assertions" in the gap report. A test suite where
|
|
214
|
+
every spec passes but no feature actually works is a QA FAILURE.
|
|
215
|
+
Report format: "Unit: X/Y pass | E2E: X/Y pass (or N/A if no config) | Contract: compliant/violations | Shallow tests: N (list)"'
|
|
216
|
+
If QA fails OR shallow tests are found, fix before proceeding. Append issues to .gsd-t/qa-issues.md.
|
|
192
217
|
12. Write task summary to .gsd-t/domains/{domain-name}/task-{task-id}-summary.md:
|
|
193
218
|
## Task {task-id} Summary — {domain-name}
|
|
194
219
|
- **Status**: PASS | FAIL
|
|
@@ -275,10 +300,11 @@ RULES FOR ALL TEAMMATES:
|
|
|
275
300
|
- **Destructive Action Guard**: NEVER drop tables, remove columns, delete data, replace architecture patterns, or remove working modules without messaging the lead first. The lead must get user approval before any destructive action proceeds.
|
|
276
301
|
- Only modify files listed in your domain's scope.md
|
|
277
302
|
- Implement interfaces EXACTLY as specified in contracts
|
|
278
|
-
- **Write comprehensive tests with every task** — no feature code without test code:
|
|
303
|
+
- **Write comprehensive FUNCTIONAL tests with every task** — no feature code without test code:
|
|
279
304
|
- Unit/integration tests: happy path + edge cases + error cases for every new/changed function
|
|
280
305
|
- Playwright E2E specs (if UI/routes/flows/modes changed): new specs for new features, cover all modes/flags, form validation, empty/loading/error states, common edge cases
|
|
281
306
|
- Tests are part of the deliverable, not a follow-up
|
|
307
|
+
- **E2E tests MUST be functional, not layout tests**: Every assertion must verify an action produced the correct outcome (state changed, data loaded, content updated) — NOT just that an element is visible/clickable. A test that passes on an empty HTML shell with correct IDs is worthless. See the Functional E2E Test Requirements in the solo mode instructions above.
|
|
282
308
|
- If a task is marked BLOCKED, message the lead and wait
|
|
283
309
|
- Run the Pre-Commit Gate checklist from CLAUDE.md BEFORE every commit — update all affected docs
|
|
284
310
|
- **Commit immediately after each task**: `feat({domain}/task-{N}): {description}` — do NOT batch commits
|
|
@@ -399,6 +425,29 @@ After all merges complete (whether all passed, some rolled back, or errors occur
|
|
|
399
425
|
Cleanup is not optional — orphaned worktrees waste disk space and can confuse subsequent executions. Always run cleanup, even if earlier steps failed.
|
|
400
426
|
```
|
|
401
427
|
|
|
428
|
+
## Step 3.5: Orchestrator Context Self-Check (MANDATORY)
|
|
429
|
+
|
|
430
|
+
After EVERY domain completes (and after every checkpoint), the orchestrator MUST check its own context utilization:
|
|
431
|
+
|
|
432
|
+
Run via Bash:
|
|
433
|
+
`if [ "${CLAUDE_CONTEXT_TOKENS_MAX:-0}" -gt 0 ]; then CTX_PCT=$(echo "scale=1; ${CLAUDE_CONTEXT_TOKENS_USED:-0} * 100 / ${CLAUDE_CONTEXT_TOKENS_MAX}" | bc); else CTX_PCT="N/A"; fi && echo "Orchestrator context: ${CTX_PCT}%"`
|
|
434
|
+
|
|
435
|
+
**If CTX_PCT >= 70:**
|
|
436
|
+
1. **Save checkpoint to disk** — update `.gsd-t/progress.md` with:
|
|
437
|
+
- Which domains are complete, which remain
|
|
438
|
+
- Current wave, next domain to execute
|
|
439
|
+
- Any checkpoint results
|
|
440
|
+
2. **Instruct user**: Output exactly:
|
|
441
|
+
```
|
|
442
|
+
⚠️ Orchestrator context at {CTX_PCT}% — approaching limit.
|
|
443
|
+
Progress saved. Run `/clear` then `/user:gsd-t-execute` to continue from the next domain.
|
|
444
|
+
```
|
|
445
|
+
3. **STOP execution.** Do NOT spawn another domain subagent. The next session will resume from saved state.
|
|
446
|
+
|
|
447
|
+
**If CTX_PCT < 70:** Continue normally to the next domain/wave.
|
|
448
|
+
|
|
449
|
+
This prevents the orchestrator from running out of context mid-milestone, which causes session breaks and summary-based recovery.
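The Bash command in Step 3.5 leaves `CTX_PCT` as either a decimal string such as `72.5` or the literal `N/A`, neither of which the shell's integer test `-ge` accepts directly. A minimal sketch of the stop/continue decision follows, assuming a hypothetical function `orchestrator_action`. Treating `N/A` as "continue" is also an assumption, since the step does not say what to do when utilization is unknown.

```shell
#!/bin/sh
# Hypothetical helper: orchestrator_action CTX_PCT
# Prints STOP when utilization is at or above 70%, otherwise CONTINUE.
orchestrator_action() {
  pct=$1
  case $pct in
    N/A|'')
      # Assumption: unknown utilization does not block the next domain.
      echo "CONTINUE"
      return 0
      ;;
  esac
  # Drop the fractional part; bc may print values below 1 as ".5".
  whole=${pct%%.*}
  [ -n "$whole" ] || whole=0
  if [ "$whole" -ge 70 ]; then
    echo "STOP"
  else
    echo "CONTINUE"
  fi
}

orchestrator_action "72.5"   # prints: STOP
orchestrator_action "69.9"   # prints: CONTINUE
orchestrator_action "N/A"    # prints: CONTINUE
```

Truncating the fraction is safe for a whole-number threshold: any value from 70.0 upward keeps a whole part of at least 70.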
|
|
450
|
+
|
|
402
451
|
## Step 4: Checkpoint Handling
|
|
403
452
|
|
|
404
453
|
When a checkpoint is reached (solo or team):
|
|
@@ -451,6 +500,26 @@ When all tasks in all domains are complete:
|
|
|
451
500
|
|
|
452
501
|
**Level 1–2**: Report completion summary and recommend proceeding to integrate phase. Wait for confirmation.
|
|
453
502
|
|
|
503
|
+
## Step 7: Doc-Ripple (Automated)
|
|
504
|
+
|
|
505
|
+
After all work is committed but before reporting completion:
|
|
506
|
+
|
|
507
|
+
1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
|
|
508
|
+
2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed to completion
|
|
509
|
+
3. If FIRE: spawn doc-ripple agent:
|
|
510
|
+
|
|
511
|
+
⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
|
|
512
|
+
|
|
513
|
+
Task subagent (general-purpose, model: sonnet):
|
|
514
|
+
"Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
|
|
515
|
+
Git diff context: {files changed list}
|
|
516
|
+
Command that triggered: execute
|
|
517
|
+
Produce manifest at .gsd-t/doc-ripple-manifest.md.
|
|
518
|
+
Update all affected documents.
|
|
519
|
+
Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
|
|
520
|
+
|
|
521
|
+
4. After doc-ripple returns, verify manifest exists and report summary inline
|
|
522
|
+
|
|
454
523
|
## Document Ripple
|
|
455
524
|
|
|
456
525
|
Execute modifies source code, so the Pre-Commit Gate (referenced in Step 9) covers document updates. For clarity, the key documents affected by execution:
|
package/commands/gsd-t-help.md
CHANGED
|
@@ -38,6 +38,7 @@ MILESTONE WORKFLOW [auto] = in wave
|
|
|
38
38
|
execute [auto] Run tasks (solo or team mode)
|
|
39
39
|
test-sync [auto] Sync tests with code changes
|
|
40
40
|
qa [auto] QA agent — test generation, execution, gap reporting
|
|
41
|
+
doc-ripple [auto] Automated document ripple — update docs after code changes
|
|
41
42
|
integrate [auto] Wire domains together at boundaries
|
|
42
43
|
verify [auto] Run quality gates → auto-invokes complete-milestone
|
|
43
44
|
complete-milestone [auto] Archive milestone + git tag (auto-invoked by verify)
|
|
@@ -264,6 +265,12 @@ Use these when user asks for help on a specific command:
|
|
|
264
265
|
- **Creates**: Contract test skeletons, acceptance tests, edge case tests, test audit reports
|
|
265
266
|
- **Use when**: Automatically spawned — never needs manual invocation. Standalone use for ad-hoc test audits.
|
|
266
267
|
|
|
268
|
+
### doc-ripple
|
|
269
|
+
- **Summary**: Automated document ripple — identifies and updates all downstream docs after code changes
|
|
270
|
+
- **Auto-invoked**: Yes (after primary work in execute, integrate, quick, debug, wave)
|
|
271
|
+
- **Creates**: `.gsd-t/doc-ripple-manifest.md`
|
|
272
|
+
- **Use when**: Automatically spawned — never needs manual invocation. Standalone use for ad-hoc doc sync audits.
|
|
273
|
+
|
|
267
274
|
### integrate
|
|
268
275
|
- **Summary**: Wire domains together at their boundaries
|
|
269
276
|
- **Auto-invoked**: Yes (in wave, after execute)
|
package/commands/gsd-t-integrate.md
CHANGED
|
@@ -126,7 +126,10 @@ Run ALL configured test suites — detect and run every one:
|
|
|
126
126
|
a. Unit tests (vitest/jest/mocha): run the full suite
|
|
127
127
|
b. E2E tests: check for playwright.config.* or cypress.config.* — if found, run the FULL E2E suite
|
|
128
128
|
c. NEVER skip E2E when a config file exists. Running only unit tests is a QA FAILURE.
|
|
129
|
-
|
|
129
|
+
d. AUDIT E2E test quality: Review each Playwright spec — if any test only checks element existence
|
|
130
|
+
(isVisible, toBeAttached, toBeEnabled) without verifying functional behavior (state changes,
|
|
131
|
+
data loaded, content updated after actions), flag it as 'SHALLOW TEST — needs functional assertions'.
|
|
132
|
+
Report: 'Unit: X/Y pass | E2E: X/Y pass (or N/A if no config) | Boundary: pass/fail by contract | Shallow tests: N'"
|
|
130
133
|
```
|
|
131
134
|
|
|
132
135
|
**OBSERVABILITY LOGGING (MANDATORY):**
|
|
@@ -176,8 +179,29 @@ After integration and doc ripple, verify everything works together:
|
|
|
176
179
|
c. Unit tests alone are NEVER sufficient when E2E exists
|
|
177
180
|
d. Report: "Unit: X/Y pass | E2E: X/Y pass"
|
|
178
181
|
3. **Verify passing**: All tests must pass. If any fail, fix before proceeding (up to 2 attempts)
|
|
182
|
+
4. **Functional test quality**: Spot-check E2E specs — every assertion must verify functional behavior (state changed, data loaded, content updated after action), not just element existence. Shallow tests that would pass on an empty HTML page are not acceptable.
|
|
179
183
|
5. **Smoke test results**: Ensure the Step 4 smoke test results are still valid after any fixes
|
|
180
184
|
|
|
185
|
+
## Step 7.5: Doc-Ripple (Automated)
|
|
186
|
+
|
|
187
|
+
After all integration work is committed but before reporting completion:
|
|
188
|
+
|
|
189
|
+
1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
|
|
190
|
+
2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed to completion
|
|
191
|
+
3. If FIRE: spawn doc-ripple agent:
|
|
192
|
+
|
|
193
|
+
⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
|
|
194
|
+
|
|
195
|
+
Task subagent (general-purpose, model: sonnet):
|
|
196
|
+
"Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
|
|
197
|
+
Git diff context: {files changed list}
|
|
198
|
+
Command that triggered: integrate
|
|
199
|
+
Produce manifest at .gsd-t/doc-ripple-manifest.md.
|
|
200
|
+
Update all affected documents.
|
|
201
|
+
Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
|
|
202
|
+
|
|
203
|
+
4. After doc-ripple returns, verify manifest exists and report summary inline
|
|
204
|
+
|
|
181
205
|
## Step 8: Handle Integration Issues
|
|
182
206
|
|
|
183
207
|
For each issue found:
|
package/commands/gsd-t-qa.md
CHANGED
|
@@ -81,13 +81,16 @@ Your behavior depends on which phase spawned you:
|
|
|
81
81
|
|
|
82
82
|
### During Verify
|
|
83
83
|
**Trigger**: Lead invokes verify phase
|
|
84
|
-
**Action**: Full test audit
|
|
84
|
+
**Action**: Full test audit + shallow test detection
|
|
85
85
|
|
|
86
86
|
1. Run ALL tests — contract tests, acceptance tests, edge case tests, existing project tests
|
|
87
87
|
2. Coverage audit: For every contract, confirm tests exist and pass
|
|
88
88
|
3. For every new feature/mode/flow, confirm Playwright specs cover happy path, error states, edge cases
|
|
89
|
-
4.
|
|
90
|
-
5.
|
|
89
|
+
4. **Shallow test audit**: Read every Playwright spec file. For each `test()` block, check whether the assertions verify functional behavior (state changes, data flow, content updates after actions) or only check element existence (isVisible, toBeAttached, toBeEnabled). Flag any test that would pass on an empty HTML shell as `SHALLOW — needs functional assertions`.
|
|
90
|
+
5. Gap report: List any untested contracts, code paths, AND shallow tests
|
|
91
|
+
6. Report: `QA: {pass|fail} — {N} contract tests, {N} acceptance tests, {N} edge case tests. Gaps: {list or "none"}. Shallow E2E tests: {N} (list or "none")`
|
|
92
|
+
|
|
93
|
+
**Shallow tests block verification.** A passing E2E suite where tests don't actually verify feature behavior is equivalent to a failing suite.
|
|
91
94
|
|
|
92
95
|
### During Quick
|
|
93
96
|
**Trigger**: Lead runs a quick task
|
|
@@ -189,10 +192,28 @@ For each table in `schema-contract.md`:
|
|
|
189
192
|
For each component in `component-contract.md`:
|
|
190
193
|
- Each `## ComponentName` → one `test.describe` block
|
|
191
194
|
- `Props:` → renders with required props, handles missing optional props
|
|
192
|
-
- `Events:` → event handlers fire correctly
|
|
193
|
-
- API references → verify correct API calls made
|
|
195
|
+
- `Events:` → event handlers fire correctly AND produce the expected state change
|
|
196
|
+
- API references → verify correct API calls made AND responses rendered correctly
|
|
194
197
|
- Auto-generate: empty form, partial form, network error handling
|
|
195
198
|
|
|
199
|
+
### Functional E2E Test Standard (MANDATORY for all Playwright specs)
|
|
200
|
+
|
|
201
|
+
**E2E tests that only verify element existence are LAYOUT tests, not functional tests. Layout tests pass even when every feature is broken. This is a QA failure.**
|
|
202
|
+
|
|
203
|
+
Every Playwright spec MUST verify functional behavior — that actions produce the correct outcome:
|
|
204
|
+
|
|
205
|
+
| Test Pattern | WRONG (layout test) | RIGHT (functional test) |
|
|
206
|
+
|---|---|---|
|
|
207
|
+
| Tab switching | `expect(tab).toBeVisible()` | Click tab → assert NEW content loaded (text, data unique to that tab) |
|
|
208
|
+
| Form submit | `expect(submitBtn).toBeEnabled()` | Fill form → submit → assert success message AND data persisted (API call, list updated) |
|
|
209
|
+
| Terminal/editor | `expect(terminal).toBeAttached()` | Open terminal → type command → assert output appears |
|
|
210
|
+
| WebSocket | `expect(statusBadge).toBeVisible()` | Wait for connection → assert status text changes to "Connected" → send message → assert response |
|
|
211
|
+
| Navigation | `expect(link).toHaveAttribute('href')` | Click link → assert URL changed AND destination content rendered |
|
|
212
|
+
| Toggle/mode | `expect(toggle).toBeVisible()` | Click toggle → assert the EFFECT (dark mode CSS applied, panel expanded with content, feature enabled) |
|
|
213
|
+
| Error state | `expect(errorDiv).toBeVisible()` | Trigger error → assert message content → assert recovery action works |
|
|
214
|
+
|
|
215
|
+
**Rule: If a test would pass on an empty HTML shell with the right element IDs, it is not a functional test. Every assertion must prove the feature works, not that the element exists.**
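A first-pass screen for shallow specs can be automated before a human or agent reads them. The sketch below is a crude heuristic, not the audit described above: the function name `is_shallow_spec` is hypothetical, and it flags a file only when every line containing `expect(` also uses one of the existence-style checks named in this standard. It misses assertions split across lines and cannot judge whether a non-existence matcher really proves the feature works.

```shell
#!/bin/sh
# Hypothetical heuristic: is_shallow_spec FILE
# Succeeds (exit 0) when FILE has at least one expect( line and all of them
# use only existence-style checks. Assumes FILE exists and is readable.
is_shallow_spec() {
  total=$(grep -c 'expect(' "$1")
  shallow=$(grep 'expect(' "$1" |
    grep -c -E 'toBeVisible|toBeAttached|toBeEnabled|isVisible')
  [ "$total" -gt 0 ] && [ "$total" -eq "$shallow" ]
}

# Example use over a spec directory (the path is a placeholder):
#   for spec in e2e/*.spec.ts; do
#     is_shallow_spec "$spec" && echo "SHALLOW: $spec"
#   done
```

A file flagged here still needs review, and a file that passes is not thereby functional; the screen only narrows down where to look first.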
|
|
216
|
+
|
|
196
217
|
## Test File Conventions
|
|
197
218
|
|
|
198
219
|
- **Location**: Project's test directory (detected from `playwright.config.*` or `package.json`)
|
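The functional-versus-layout rule in the hunk above lends itself to a rough static check. The following is a minimal sketch only, assuming spec files are available as source text. `findShallowTests` and `EXISTENCE_MATCHERS` are hypothetical names for illustration and are not part of GSD-T or Playwright; the matcher names themselves are the ones listed in this diff. The heuristic flags any `test(...)` block whose `expect(...)` matchers are all existence-style matchers.

```typescript
// Hypothetical heuristic for illustration; not part of GSD-T or Playwright.
// Matchers that only prove an element exists or is present.
const EXISTENCE_MATCHERS: string[] = [
  "toBeVisible",
  "toBeAttached",
  "toBeEnabled",
  "toHaveCount",
];

interface ShallowFinding {
  title: string;      // first string argument of test(...)
  matchers: string[]; // matchers found inside that test block
}

function findShallowTests(specSource: string): ShallowFinding[] {
  const findings: ShallowFinding[] = [];
  // Split at each `test(` call; element 0 is the preamble before the first test.
  const chunks = specSource.split(/\btest\s*\(/).slice(1);
  for (const chunk of chunks) {
    const titleMatch = chunk.match(/^\s*(['"`])(.*?)\1/);
    const title = titleMatch ? titleMatch[2] : "(untitled)";

    // Collect the matcher name that follows each expect(...) call.
    const matcherRe = /expect\s*\([^;]*?\)\s*\.(?:not\s*\.\s*)?(to[A-Z]\w*)\s*\(/g;
    const matchers: string[] = [];
    let m: RegExpExecArray | null = matcherRe.exec(chunk);
    while (m !== null) {
      matchers.push(m[1]);
      m = matcherRe.exec(chunk);
    }

    // Shallow = has assertions, and every one of them is existence-only.
    const onlyExistence =
      matchers.length > 0 &&
      matchers.every((name) => EXISTENCE_MATCHERS.indexOf(name) !== -1);
    if (onlyExistence) {
      findings.push({ title, matchers });
    }
  }
  return findings;
}
```

This is regex-based, so it will miss variants such as `test.only(...)` and can be confused by semicolons inside an `expect(...)` argument. It is a pre-filter for the manual audit the command files require, not a replacement for it.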
package/commands/gsd-t-quick.md
CHANGED
|
@@ -133,6 +133,7 @@ Quick does not mean skip testing. Before committing:
|
|
|
133
133
|
- Playwright E2E specs (if UI/routes/flows/modes changed): create new specs for new functionality, update existing specs for changed behavior
|
|
134
134
|
- Cover all modes/flags affected by this change
|
|
135
135
|
- "No feature code without test code" applies to quick tasks too
|
|
136
|
+
- **Functional tests only** — every E2E assertion must verify an action produced the correct outcome (state changed, data loaded, content updated). Tests that only check element existence (`isVisible`, `toBeEnabled`) are shallow/layout tests and are not acceptable. If a test would pass on an empty HTML page with the right IDs, rewrite it.
|
|
136
137
|
2. **Run ALL configured test suites** — not just affected tests, not just one suite:
|
|
137
138
|
a. Detect all runners: check for vitest/jest config, playwright.config.*, cypress.config.*
|
|
138
139
|
b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
|
|
@@ -145,6 +146,26 @@ Quick does not mean skip testing. Before committing:
|
|
|
145
146
|
- If a contract exists for the interface touched, does the code still match?
|
|
146
147
|
4. **No test framework?**: Set one up, or at minimum manually verify and document how in the commit message
|
|
147
148
|
|
|
149
|
+
## Step 6: Doc-Ripple (Automated)
|
|
150
|
+
|
|
151
|
+
After all work is committed but before reporting completion:
|
|
152
|
+
|
|
153
|
+
1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
|
|
154
|
+
2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed to completion
|
|
155
|
+
3. If FIRE: spawn doc-ripple agent:
|
|
156
|
+
|
|
157
|
+
⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
|
|
158
|
+
|
|
159
|
+
Task subagent (general-purpose, model: sonnet):
|
|
160
|
+
"Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
|
|
161
|
+
Git diff context: {files changed list}
|
|
162
|
+
Command that triggered: quick
|
|
163
|
+
Produce manifest at .gsd-t/doc-ripple-manifest.md.
|
|
164
|
+
Update all affected documents.
|
|
165
|
+
Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
|
|
166
|
+
|
|
167
|
+
4. After doc-ripple returns, verify manifest exists and report summary inline
|
|
168
|
+
|
|
148
169
|
$ARGUMENTS
|
|
149
170
|
|
|
150
171
|
## Auto-Clear
|
package/commands/gsd-t-test-sync.md
CHANGED
|
@@ -151,6 +151,27 @@ If Playwright is configured (`playwright.config.*` or Playwright in dependencies
|
|
|
151
151
|
|
|
152
152
|
**This is NOT optional.** Every new code path that a user can reach must have a Playwright spec. "We'll add tests later" is never acceptable.
|
|
153
153
|
|
|
154
|
+
**FUNCTIONAL TESTS — NOT LAYOUT TESTS (MANDATORY):**
|
|
155
|
+
E2E specs that only check element existence (`isVisible`, `toBeAttached`, `toBeEnabled`) are
|
|
156
|
+
layout tests. Layout tests pass even when every feature is broken — they are worthless for QA.
|
|
157
|
+
|
|
158
|
+
Every Playwright assertion MUST verify **functional behavior** — that an action produced the
|
|
159
|
+
correct outcome:
|
|
160
|
+
- **Tab/navigation**: Click → assert the NEW content loaded (unique text, data, or elements
|
|
161
|
+
that only appear on the destination view). Never just assert the tab element exists.
|
|
162
|
+
- **Forms**: Fill → submit → assert success feedback AND data persisted (API call observed
|
|
163
|
+
via `page.waitForResponse`, or list/table updated with new entry).
|
|
164
|
+
- **Interactive widgets** (terminals, editors, code panels): Open → interact → assert the
|
|
165
|
+
widget responded (keystroke produced output, content was saved, command executed).
|
|
166
|
+
- **Connections** (WebSocket, SSE, polling): Assert status transitions ("Connecting" →
|
|
167
|
+
"Connected") and verify data flows through the connection.
|
|
168
|
+
- **State toggles** (dark mode, expand/collapse, enable/disable): Assert the EFFECT of the
|
|
169
|
+
toggle, not just that the toggle control exists.
|
|
170
|
+
- **Error handling**: Trigger error → assert error content → assert recovery path works.
|
|
171
|
+
|
|
172
|
+
**Rule: If a test would pass on an empty HTML page with the correct element IDs and no
|
|
173
|
+
JavaScript, it is not a functional test. Rewrite it.**
|
|
174
|
+
|
|
154
175
|
### D) Capture Results
|
|
155
176
|
For all test types:
|
|
156
177
|
- PASS: Test still valid
|
package/commands/gsd-t-verify.md
CHANGED
|
@@ -104,7 +104,8 @@ Work through each dimension sequentially. For each:
|
|
|
104
104
|
- Confirm specs cover: happy path, error states, edge cases, all modes/flags
|
|
105
105
|
- If specs are missing or incomplete → invoke `gsd-t-test-sync` to create them, then re-run
|
|
106
106
|
- **Missing E2E coverage on new functionality = verification FAIL**
|
|
107
|
-
5.
|
|
107
|
+
5. **Functional test quality audit**: Read every Playwright spec. For each `test()` block, verify assertions check **functional behavior** (state changed after action, data loaded, content updated, widget responded) — NOT just element existence (`isVisible`, `toBeAttached`, `toBeEnabled`). A test that would pass on an empty HTML page with the right element IDs is a **shallow test** and counts as a verification FAIL. Flag shallow tests and rewrite them before proceeding.
|
|
108
|
+
6. Tests are NOT optional — verification cannot pass without running them and confirming comprehensive, functional coverage
|
|
108
109
|
|
|
109
110
|
### Team Mode (when agent teams are enabled)
|
|
110
111
|
```
|
package/commands/gsd-t-wave.md
CHANGED
|
@@ -84,6 +84,17 @@ Compute context utilization — run via Bash:
|
|
|
84
84
|
Alert on context thresholds (display to user inline):
|
|
85
85
|
- If CTX_PCT >= 85: `echo "🔴 CRITICAL: Context at ${CTX_PCT}% — compaction likely. Task MUST be split."`
|
|
86
86
|
- If CTX_PCT >= 70: `echo "⚠️ WARNING: Context at ${CTX_PCT}% — approaching compaction threshold. Consider splitting in plan."`
|
|
87
|
+
|
|
88
|
+
**Orchestrator Context Self-Check (MANDATORY):**
|
|
89
|
+
After EVERY phase agent returns, check the wave orchestrator's own context:
|
|
90
|
+
- **If CTX_PCT >= 70:**
|
|
91
|
+
1. Save checkpoint to `.gsd-t/progress.md` — record which phases are complete, which remain
|
|
92
|
+
2. Output: `⚠️ Wave orchestrator context at {CTX_PCT}% — approaching limit. Progress saved. Run /clear then /user:gsd-t-wave to continue from the next phase.`
|
|
93
|
+
3. **STOP the wave loop.** Do NOT spawn the next phase agent. The next session resumes from saved state.
|
|
94
|
+
- **If CTX_PCT < 70:** Continue to next phase.
|
|
95
|
+
|
|
96
|
+
This prevents the wave orchestrator from running out of context mid-wave.
|
|
97
|
+
|
|
87
98
|
Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted | Domain | Task | Ctx% |` if missing):
|
|
88
99
|
`| {DT_START} | {DT_END} | gsd-t-wave | {PHASE} | sonnet | {DURATION}s | phase: {PHASE} | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
|
|
89
100
|
|
|
@@ -149,6 +160,26 @@ Spawn agent → `commands/gsd-t-verify.md`
|
|
|
149
160
|
📋 Phase 8 (VERIFY+COMPLETE): {N} gates passed | Goal-Backward: {PASS/WARN/FAIL} — {N} requirements checked, {N} findings
|
|
150
161
|
```
|
|
151
162
|
|
|
163
|
+
#### 9. DOC-RIPPLE (Automated — after verify+complete)
|
|
164
|
+
|
|
165
|
+
After the final phase completes but before wave reports done:
|
|
166
|
+
|
|
167
|
+
1. Run threshold check — read `git diff --name-only HEAD~1` and evaluate against doc-ripple-contract.md trigger conditions
|
|
168
|
+
2. If SKIP: log "Doc-ripple: SKIP — {reason}" and proceed
|
|
169
|
+
3. If FIRE: spawn doc-ripple agent:
|
|
170
|
+
|
|
171
|
+
⚙ [{model}] gsd-t-doc-ripple → blast radius analysis + parallel updates
|
|
172
|
+
|
|
173
|
+
Task subagent (general-purpose, model: sonnet):
|
|
174
|
+
"Execute the doc-ripple workflow per commands/gsd-t-doc-ripple.md.
|
|
175
|
+
Git diff context: {files changed list}
|
|
176
|
+
Command that triggered: wave
|
|
177
|
+
Produce manifest at .gsd-t/doc-ripple-manifest.md.
|
|
178
|
+
Update all affected documents.
|
|
179
|
+
Report: 'Doc-ripple: {N} checked, {N} updated, {N} skipped'"
|
|
180
|
+
|
|
181
|
+
4. After doc-ripple returns, verify manifest exists and report summary inline
|
|
182
|
+
|
|
152
183
|
### Between Each Phase
|
|
153
184
|
|
|
154
185
|
After each agent completes, run this spot-check before proceeding:
|
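The orchestrator context self-check added to this file reduces to a single threshold decision after each phase. The sketch below illustrates that decision under stated assumptions: `orchestratorSelfCheck`, `WaveCheckpoint`, and `SelfCheckResult` are hypothetical names, the context percentage is taken as an input (how `CTX_PCT` is computed is not shown in this hunk), and writing the checkpoint to `.gsd-t/progress.md` is left to the caller because the file format is not shown here.

```typescript
// Hypothetical sketch of the self-check decision; names are illustrative only.
const CONTEXT_STOP_THRESHOLD_PCT = 70; // threshold stated in the self-check rule

interface WaveCheckpoint {
  completed: string[]; // phases already finished
  remaining: string[]; // phases the next session should resume with
}

type SelfCheckResult =
  | { action: "continue" }
  | { action: "stop"; message: string; checkpoint: WaveCheckpoint };

function orchestratorSelfCheck(
  ctxPct: number,
  phases: string[],
  completedCount: number
): SelfCheckResult {
  if (ctxPct < CONTEXT_STOP_THRESHOLD_PCT) {
    return { action: "continue" };
  }
  // At or above the threshold: record progress and stop before the next phase.
  const checkpoint: WaveCheckpoint = {
    completed: phases.slice(0, completedCount),
    remaining: phases.slice(completedCount),
  };
  const message =
    "⚠️ Wave orchestrator context at " + ctxPct +
    "% — approaching limit. Progress saved. " +
    "Run /clear then /user:gsd-t-wave to continue from the next phase.";
  return { action: "stop", message, checkpoint };
}
```

Returning the checkpoint instead of writing it keeps the decision testable on its own. A caller would persist `checkpoint` and print `message` before ending the wave loop.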
package/docs/GSD-T-README.md
CHANGED
|
@@ -103,6 +103,7 @@ GSD-T reads all state files and tells you exactly where you left off.
|
|
|
103
103
|
| `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning | In wave |
|
|
104
104
|
| `/user:gsd-t-test-sync` | Sync tests with code changes | In wave |
|
|
105
105
|
| `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting | Auto-spawned |
|
|
106
|
+
| `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes | Auto-spawned |
|
|
106
107
|
| `/user:gsd-t-integrate` | Wire domains together | In wave |
|
|
107
108
|
| `/user:gsd-t-verify` | Run quality gates + goal-backward verification → auto-invokes complete-milestone | In wave |
|
|
108
109
|
| `/user:gsd-t-complete-milestone` | Archive + git tag (auto-invoked by verify, also standalone) | In wave |
|
package/docs/framework-comparison-scorecard.md
ADDED
|
@@ -0,0 +1,160 @@
|
|
|
1
|
+
# Framework Comparison Scorecard
|
|
2
|
+
|
|
3
|
+
**Purpose**: Unbiased comparison of development frameworks.
|
|
4
|
+
**Instructions**: Score each framework 1-5 per dimension. Use the rubric at the bottom. Equal weights — no dimension gaming.
|
|
5
|
+
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Frameworks Being Compared
|
|
9
|
+
|
|
10
|
+
| Slot | Framework | Version/Variant | Evaluator | Date |
|
|
11
|
+
|------|-----------|-----------------|-----------|------|
|
|
12
|
+
| F1 | | | | |
|
|
13
|
+
| F2 | | | | |
|
|
14
|
+
| F3 | | | | |
|
|
15
|
+
| F4 | | | | |
|
|
16
|
+
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
## A. Onboarding & Adoption (Dimensions 1-3)
|
|
20
|
+
|
|
21
|
+
| # | Dimension | What to Evaluate | F1 | F2 | F3 | F4 |
|
|
22
|
+
|---|-----------|------------------|----|----|----|----|
|
|
23
|
+
| 1 | Time to first productive output | How quickly can someone go from choosing the framework to shipping real work? | | | | |
|
|
24
|
+
| 2 | Team adoption friction | How willing is a typical team to adopt it after initial exposure? | | | | |
|
|
25
|
+
| 3 | Works without specific tooling | Can it be used with any IDE, editor, or AI assistant? | | | | |
|
|
26
|
+
| | **Category Average** | | **—** | **—** | **—** | **—** |
|
|
27
|
+
|
|
28
|
+
## B. Execution & Delivery (Dimensions 4-7)
|
|
29
|
+
|
|
30
|
+
| # | Dimension | What to Evaluate | F1 | F2 | F3 | F4 |
|
|
31
|
+
|---|-----------|------------------|----|----|----|----|
|
|
32
|
+
| 4 | Defect prevention | How effectively does the framework prevent bugs from reaching production? | | | | |
|
|
33
|
+
| 5 | Throughput | How many features can be shipped per unit time? | | | | |
|
|
34
|
+
| 6 | Rework prevention | How well does the framework prevent completed work from needing redo? | | | | |
|
|
35
|
+
| 7 | Idea-to-deploy cycle time | How quickly can a concept move from idea to production? | | | | |
|
|
36
|
+
| | **Category Average** | | **—** | **—** | **—** | **—** |
|
|
37
|
+
|
|
38
|
+
## C. Sustainability & Maintenance (Dimensions 8-11)
|
|
39
|
+
|
|
40
|
+
| # | Dimension | What to Evaluate | F1 | F2 | F3 | F4 |
|
|
41
|
+
|---|-----------|------------------|----|----|----|----|
|
|
42
|
+
| 8 | New member ramp-up | How quickly can a new team member contribute independently? | | | | |
|
|
43
|
+
| 9 | Context recovery | How easily can work resume after an interruption of days or weeks? | | | | |
|
|
44
|
+
| 10 | Tech debt management | How well does the framework track and control technical debt? | | | | |
|
|
45
|
+
| 11 | Documentation freshness | How well does the framework keep documentation accurate and current? | | | | |
|
|
46
|
+
| | **Category Average** | | **—** | **—** | **—** | **—** |
|
|
47
|
+
|
|
48
|
+
## D. Flexibility & Universality (Dimensions 12-15)
|
|
49
|
+
|
|
50
|
+
| # | Dimension | What to Evaluate | F1 | F2 | F3 | F4 |
|
|
51
|
+
|---|-----------|------------------|----|----|----|----|
|
|
52
|
+
| 12 | Minimum viable process | Can you use a small portion of it and still get value? | | | | |
|
|
53
|
+
| 13 | Project type coverage | Does it work across web, mobile, data, infra, and non-code projects? | | | | |
|
|
54
|
+
| 14 | Team size range | Is it effective from 1-person teams to 50-person teams? | | | | |
|
|
55
|
+
| 15 | Overhead proportionality | Does ceremony scale with project size rather than being fixed? | | | | |
|
|
56
|
+
| | **Category Average** | | **—** | **—** | **—** | **—** |
|
|
57
|
+
|
|
58
|
+
## E. Automation & AI-Agent Capabilities (Dimensions 16-19)
|
|
59
|
+
|
|
60
|
+
| # | Dimension | What to Evaluate | F1 | F2 | F3 | F4 |
|
|
61
|
+
|---|-----------|------------------|----|----|----|----|
|
|
62
|
+
| 16 | Agentic workflow support | Does the framework enable AI agents to execute work autonomously (task dispatch, parallel execution, adaptive replanning)? | | | | |
|
|
63
|
+
| 17 | QA automation | Does the framework automate test generation, execution, and gap detection — not just run existing tests? | | | | |
|
|
64
|
+
| 18 | QA coverage enforcement | Does the framework enforce minimum coverage and block progress when tests are missing or failing? | | | | |
|
|
65
|
+
| 19 | Contract enforcement | Does the framework define and validate interfaces between components automatically (API shapes, schemas, props)? | | | | |
|
|
66
|
+
| | **Category Average** | | **—** | **—** | **—** | **—** |
|
|
67
|
+
|
|
68
|
+
## F. Observability & Decision Quality (Dimensions 20-22)
|
|
69
|
+
|
|
70
|
+
| # | Dimension | What to Evaluate | F1 | F2 | F3 | F4 |
|
|
71
|
+
|---|-----------|------------------|----|----|----|----|
|
|
72
|
+
| 20 | Decision traceability | Can you find why a choice was made 6 months later? | | | | |
|
|
73
|
+
| 21 | Progress accuracy | Does reported progress match actual state? | | | | |
|
|
74
|
+
| 22 | Risk visibility | Do problems surface early or only at integration? | | | | |
|
|
75
|
+
| | **Category Average** | | **—** | **—** | **—** | **—** |
|
|
76
|
+
|
|
77
|
+
---
|
|
78
|
+
|
|
79
|
+
## Summary
|
|
80
|
+
|
|
81
|
+
| Category | F1 | F2 | F3 | F4 |
|
|
82
|
+
|---------------------------------------------|----|----|----|----|
|
|
83
|
+
| A. Onboarding & Adoption (1-3) | | | | |
|
|
84
|
+
| B. Execution & Delivery (4-7) | | | | |
|
|
85
|
+
| C. Sustainability & Maintenance (8-11) | | | | |
|
|
86
|
+
| D. Flexibility & Universality (12-15) | | | | |
|
|
87
|
+
| E. Automation & AI-Agent Capabilities (16-19) | | | | |
|
|
88
|
+
| F. Observability & Decisions (20-22) | | | | |
|
|
89
|
+
| **Overall Average (1-5)** | **—** | **—** | **—** | **—** |
|
|
90
|
+
| **Normalized Score (/100)** | **—** | **—** | **—** | **—** |
|
|
91
|
+
|
|
92
|
+
### Calculation
|
|
93
|
+
|
|
94
|
+
```
|
|
95
|
+
Category Average = sum of dimension scores in category / number of dimensions in category
|
|
96
|
+
Overall Average = sum of all 22 dimension scores / 22
|
|
97
|
+
Normalized /100 = Overall Average × 20
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
## Radar Chart Data
|
|
103
|
+
|
|
104
|
+
For visual comparison, plot each framework on a 6-axis radar chart using the category averages:
|
|
105
|
+
|
|
106
|
+
```
|
|
107
|
+
Axis 1: Onboarding & Adoption
|
|
108
|
+
Axis 2: Execution & Delivery
|
|
109
|
+
Axis 3: Sustainability & Maintenance
|
|
110
|
+
Axis 4: Flexibility & Universality
|
|
111
|
+
Axis 5: Automation & AI-Agent Capabilities
|
|
112
|
+
Axis 6: Observability & Decisions
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
---
|
|
116
|
+
|
|
117
|
+
## Scoring Rubric
|
|
118
|
+
|
|
119
|
+
Use this rubric consistently across all frameworks and dimensions:
|
|
120
|
+
|
|
121
|
+
| Score | Label | Definition |
|
|
122
|
+
|-------|---------------|------------------------------------------------------------------------------|
|
|
123
|
+
| 1 | Absent | Not addressed by the framework. User must solve this entirely on their own. |
|
|
124
|
+
| 2 | Minimal | Acknowledged but not enforced. Ad-hoc or optional guidance only. |
|
|
125
|
+
| 3 | Supported | Present with some structure, but inconsistently applied or easy to skip. |
|
|
126
|
+
| 4 | Systematic | Well-integrated, mostly enforced, clear process with known exceptions. |
|
|
127
|
+
| 5 | Core strength | Foundational to the framework. Systematically enforced, hard to bypass. |
|
|
128
|
+
|
|
129
|
+
### Scoring guidelines
|
|
130
|
+
|
|
131
|
+
- **Score what the framework provides**, not what a disciplined team could achieve without it
|
|
132
|
+
- **Score the default experience**, not the best-case customized setup
|
|
133
|
+
- **Score independently** — don't let a high score in one dimension inflate adjacent ones
|
|
134
|
+
- **Use 3 as the anchor** — most frameworks land at 3 for most dimensions. Reserve 1 and 5 for clear extremes
|
|
135
|
+
- **When uncertain**, score conservatively (lower)
|
|
136
|
+
|
|
137
|
+
---
|
|
138
|
+
|
|
139
|
+
## Bias Checks
|
|
140
|
+
|
|
141
|
+
Before finalizing scores, verify:
|
|
142
|
+
|
|
143
|
+
- [ ] No single framework scores 5 on more than 9 of 22 dimensions
|
|
144
|
+
- [ ] No single framework scores below 2 on more than 9 of 22 dimensions
|
|
145
|
+
- [ ] Every framework has at least one category where it leads
|
|
146
|
+
- [ ] The evaluator did not design or build any of the frameworks being compared (if they did, note the conflict and consider a second evaluator)
|
|
147
|
+
- [ ] Dimensions were not added or removed after seeing preliminary scores
|
|
148
|
+
|
|
149
|
+
---
|
|
150
|
+
|
|
151
|
+
## Notes & Justifications
|
|
152
|
+
|
|
153
|
+
Use this section to record reasoning for any score that might be controversial:
|
|
154
|
+
|
|
155
|
+
| Dimension | Framework | Score | Justification |
|
|
156
|
+
|-----------|-----------|-------|---------------|
|
|
157
|
+
| | | | |
|
|
158
|
+
| | | | |
|
|
159
|
+
| | | | |
|
|
160
|
+
| | | | |
|
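The scorecard's Calculation section and its first two bias checks can be worked through mechanically. The following is a minimal sketch, assuming one framework's 22 scores are supplied in dimension order. `summarizeScorecard` and `CATEGORIES` are hypothetical names; the category ranges, the formulas, and the "more than 9 of 22" limits are taken from the scorecard above.

```typescript
// Hypothetical helper for illustration; ranges and formulas come from the scorecard.
interface Category {
  name: string;
  first: number; // first dimension number (1-based, inclusive)
  last: number;  // last dimension number (inclusive)
}

const CATEGORIES: Category[] = [
  { name: "A. Onboarding & Adoption", first: 1, last: 3 },
  { name: "B. Execution & Delivery", first: 4, last: 7 },
  { name: "C. Sustainability & Maintenance", first: 8, last: 11 },
  { name: "D. Flexibility & Universality", first: 12, last: 15 },
  { name: "E. Automation & AI-Agent Capabilities", first: 16, last: 19 },
  { name: "F. Observability & Decisions", first: 20, last: 22 },
];

interface ScorecardSummary {
  categoryAverages: number[]; // same order as CATEGORIES
  overallAverage: number;     // sum of all 22 scores / 22
  normalized: number;         // overallAverage * 20
  tooManyFives: boolean;      // more than 9 dimensions scored 5
  tooManyLows: boolean;       // more than 9 dimensions scored below 2
}

function summarizeScorecard(scores: number[]): ScorecardSummary {
  if (scores.length !== 22) {
    throw new Error("expected exactly 22 dimension scores");
  }
  for (const s of scores) {
    if (s < 1 || s > 5 || Math.floor(s) !== s) {
      throw new Error("each score must be an integer from 1 to 5");
    }
  }
  const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

  const categoryAverages = CATEGORIES.map((c) => {
    const slice = scores.slice(c.first - 1, c.last);
    return sum(slice) / slice.length;
  });
  const overallAverage = sum(scores) / 22;

  return {
    categoryAverages,
    overallAverage,
    normalized: overallAverage * 20,
    tooManyFives: scores.filter((s) => s === 5).length > 9,
    tooManyLows: scores.filter((s) => s < 2).length > 9,
  };
}
```

Note that the overall average is taken over all 22 dimensions rather than over the six category averages, so categories with four dimensions carry slightly more weight than those with three.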
package/docs/requirements.md
CHANGED
|
@@ -58,6 +58,9 @@
|
|
|
58
58
|
| REQ-047 | Global ELO & Rankings — gsd-t-status displays global ELO score and cross-project rank when global metrics exist | P2 | planned | validated by use |
|
|
59
59
|
| REQ-048 | Global Rule Promotion on Milestone Completion — gsd-t-complete-milestone copies promoted rules to global-rules.jsonl and updates global rollup after local promotion | P1 | planned | validated by use |
|
|
60
60
|
| REQ-049 | E2E Enforcement Rule — when playwright.config.* or cypress.config.* exists, ALL test-running commands (execute, quick, debug, test-sync, integrate, verify, complete-milestone) MUST run the full E2E suite. Unit-only results are NEVER sufficient. QA subagent prompts explicitly mandate E2E detection and execution. | P1 | complete | enforced in 7 command files + CLAUDE.md + pre-commit-gate contract |
|
|
61
|
+
| REQ-050 | Functional E2E Test Quality Standard — Playwright specs MUST verify functional behavior (state changes, data flow, content updates after actions), NOT just element existence (isVisible, toBeEnabled). Shallow layout tests that would pass on an empty HTML page are flagged and block verification. QA subagent audits for shallow tests. | P1 | complete | enforced in execute, qa, test-sync, verify, quick, debug, integrate, complete-milestone + global CLAUDE.md + CLAUDE-global template |
|
|
62
|
+
| REQ-051 | Document Ripple Completion Gate — when a change affects multiple files, identify the full blast radius BEFORE starting, complete ALL updates in one pass, and only report completion after every downstream document is updated. Partial delivery is never acceptable. The user should never need to ask "did you update everything?" | P1 | complete | enforced in global CLAUDE.md + CLAUDE-global template + project CLAUDE.md |
|
|
63
|
+
| REQ-052 | Doc-Ripple Subagent — dedicated agent auto-spawned after code-modifying commands (execute, integrate, quick, debug, wave) that analyzes git diff, identifies full blast radius of affected documents, and spawns parallel subagents to update them. Produces manifest audit trail. Threshold logic skips trivial changes. | P1 | complete | M28: contract ACTIVE, command file, 43 tests, wired into execute/integrate/quick/debug/wave |
|
|
61
64
|
|
|
62
65
|
## Technical Requirements
|
|
63
66
|
|
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@tekyzinc/gsd-t",
|
|
3
|
-
"version": "2.
|
|
4
|
-
"description": "GSD-T: Contract-Driven Development for Claude Code —
|
|
3
|
+
"version": "2.46.11",
|
|
4
|
+
"description": "GSD-T: Contract-Driven Development for Claude Code — 51 slash commands with headless CI/CD mode, graph-powered code analysis, real-time agent dashboard, execution intelligence, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
|
|
5
5
|
"author": "Tekyz, Inc.",
|
|
6
6
|
"license": "MIT",
|
|
7
7
|
"repository": {
|
package/templates/CLAUDE-global.md
CHANGED
|
@@ -49,6 +49,7 @@ PROJECT or FEATURE or SCAN
|
|
|
49
49
|
| `/user:gsd-t-execute` | Run tasks — task-level fresh dispatch, worktree isolation, adaptive replanning, active rule injection |
|
|
50
50
|
| `/user:gsd-t-test-sync` | Keep tests aligned with code changes |
|
|
51
51
|
| `/user:gsd-t-qa` | QA agent — test generation, execution, gap reporting |
|
|
52
|
+
| `/user:gsd-t-doc-ripple` | Automated document ripple — update downstream docs after code changes |
|
|
52
53
|
| `/user:gsd-t-integrate` | Wire domains together |
|
|
53
54
|
| `/user:gsd-t-verify` | Run quality gates + goal-backward behavior verification |
|
|
54
55
|
| `/user:gsd-t-complete-milestone` | Archive milestone + git tag (goal-backward gate, rule engine distillation) |
|
|
@@ -250,6 +251,32 @@ BEFORE reporting "tests pass" for ANY task:
|
|
|
250
251
|
|
|
251
252
|
The conditional "if UI/routes/flows changed" in command files applies to **writing new E2E specs**, not to **running existing ones**. You always run existing E2E specs. Always.
|
|
252
253
|
|
|
254
|
+
### E2E Test Quality Standard (MANDATORY)
|
|
255
|
+
|
|
256
|
+
**E2E tests must be FUNCTIONAL tests, not LAYOUT tests.** This is non-negotiable.
|
|
257
|
+
|
|
258
|
+
A layout test checks that elements exist (`isVisible`, `toBeAttached`, `toBeEnabled`, `toHaveCount`). A functional test checks that features work — actions produce correct outcomes.
|
|
259
|
+
|
|
260
|
+
```
|
|
261
|
+
LAYOUT TEST (WRONG — passes even if every feature is broken):
|
|
262
|
+
await expect(page.locator('#tab-sessions')).toBeVisible();
|
|
263
|
+
await page.click('#tab-sessions');
|
|
264
|
+
// ← No assertion that the tab's content actually loaded
|
|
265
|
+
|
|
266
|
+
FUNCTIONAL TEST (RIGHT — fails if the feature is broken):
|
|
267
|
+
await page.click('#tab-sessions');
|
|
268
|
+
await expect(page.locator('.session-list')).toContainText('Session 1');
|
|
269
|
+
// ← Proves clicking the tab loaded the session data
|
|
270
|
+
```
|
|
271
|
+
|
|
272
|
+
Every Playwright assertion must verify one of:
|
|
273
|
+
- **State changed**: After click/type/submit, the app state is different (new content, updated data, changed status)
|
|
274
|
+
- **Data flowed**: User input → API call → response rendered (use `page.waitForResponse` or assert on rendered data)
|
|
275
|
+
- **Content loaded**: Navigation/tab switch → destination content appeared (assert on text/data unique to destination)
|
|
276
|
+
- **Widget responded**: Terminal accepted keystrokes and produced output, editor saved changes, form submitted and data persisted
|
|
277
|
+
|
|
278
|
+
**If a test would pass on an empty HTML page with the correct element IDs and no JavaScript, it is not a functional test.** Rewrite it.
|
|
279
|
+
|
|
253
280
|
## QA Agent (Mandatory)
|
|
254
281
|
|
|
255
282
|
Any GSD-T phase that produces or validates code **MUST run QA**. The QA agent's sole job is test generation, execution, and gap reporting. It never writes feature code.
|
|
@@ -269,10 +296,15 @@ a. Unit tests (vitest/jest/mocha): run the full suite
|
|
|
269
296
|
b. E2E tests: check for playwright.config.* or cypress.config.* — if found, run the FULL E2E suite
|
|
270
297
|
c. NEVER skip E2E when a config file exists. Running only unit tests is a QA FAILURE.
|
|
271
298
|
d. Read .gsd-t/contracts/ for contract definitions. Check contract compliance.
|
|
272
|
-
|
|
299
|
+
e. AUDIT E2E test quality: Review each Playwright spec — if any test only checks element
|
|
300
|
+
existence (isVisible, toBeAttached, toBeEnabled) without verifying functional behavior
|
|
301
|
+
(state changes, data loaded, content updated after user actions), flag it as
|
|
302
|
+
'SHALLOW TEST — needs functional assertions'. A passing test suite that doesn't catch
|
|
303
|
+
broken features is a QA FAILURE.
|
|
304
|
+
Report format: 'Unit: X/Y pass | E2E: X/Y pass (or N/A if no config) | Contract: compliant/violations | Shallow tests: N'"
|
|
273
305
|
```
|
|
274
306
|
|
|
275
|
-
**QA failure blocks phase completion.** Lead cannot proceed until QA reports PASS or user explicitly overrides.
|
|
307
|
+
**QA failure OR shallow tests found blocks phase completion.** Lead cannot proceed until QA reports PASS with zero shallow tests, or user explicitly overrides.
|
|
276
308
|
|
|
277
309
|
## Model Display (MANDATORY)
|
|
278
310
|
|
|
@@ -361,6 +393,35 @@ BEFORE EVERY COMMIT:
|
|
|
361
393
|
|
|
362
394
|
If ANY answer is YES and the doc is NOT updated, update it BEFORE committing. No exceptions.
|
|
363
395
|
|
|
396
|
+
## Document Ripple Completion Gate (MANDATORY)
|
|
397
|
+
|
|
398
|
+
**NEVER report a task as "done" or present a summary until ALL downstream documents are updated.** This is not optional.
|
|
399
|
+
|
|
400
|
+
When a change affects multiple files (e.g., a new standard that applies across command files, a renamed API, a new convention), you MUST:
|
|
401
|
+
|
|
402
|
+
1. **Identify the full blast radius BEFORE starting**: List every file that needs the change
|
|
403
|
+
2. **Complete ALL updates in one pass**: Do not update 3 of 8 files and then present a summary
|
|
404
|
+
3. **Run the Pre-Commit Gate on the COMPLETE changeset**: Not on a partial subset
|
|
405
|
+
4. **Only THEN report completion**
|
|
406
|
+
|
|
407
|
+
```
|
|
408
|
+
BEFORE reporting "done" or presenting a summary:
|
|
409
|
+
├── Did this change establish a new standard, rule, or convention?
|
|
410
|
+
│ YES → Grep for every file that should enforce it. Update ALL of them.
|
|
411
|
+
├── Did this change modify a pattern used in multiple command files?
|
|
412
|
+
│ YES → Find and update EVERY command file that uses that pattern.
|
|
413
|
+
├── Did this change affect a template (CLAUDE-global, CLAUDE-project, etc.)?
|
|
414
|
+
│ YES → The template AND the live equivalent (~/.claude/CLAUDE.md) must match.
|
|
415
|
+
├── Did this change add a new requirement?
|
|
416
|
+
│ YES → Add to docs/requirements.md in the same pass.
|
|
417
|
+
├── Have I checked EVERY file in the blast radius?
|
|
418
|
+
│ NO → Keep going. Do not present partial work.
|
|
419
|
+
└── Am I about to say "want me to also update X?" or "should I check Y?"
|
|
420
|
+
YES → STOP. Just update X and check Y. Then report done.
|
|
421
|
+
```
|
|
422
|
+
|
|
423
|
+
**The test for this gate**: If the user asks "did you update all the documents?" and the answer would be "no, I missed some" — you failed this gate. The user should never need to ask.
|
|
424
|
+
|
|
364
425
|
## Execution Behavior
|
|
365
426
|
- ALWAYS check docs/architecture.md before adding or modifying components.
|
|
366
427
|
- ALWAYS check docs/workflows.md before changing any multi-step process.
|