@tekyzinc/gsd-t 2.50.12 → 2.53.10
- package/CHANGELOG.md +24 -0
- package/README.md +379 -372
- package/bin/component-registry.js +250 -0
- package/bin/graph-cgc.js +510 -510
- package/bin/graph-indexer.js +147 -147
- package/bin/graph-overlay.js +195 -195
- package/bin/graph-parsers.js +327 -327
- package/bin/graph-query.js +453 -452
- package/bin/graph-store.js +154 -154
- package/bin/qa-calibrator.js +194 -0
- package/bin/scan-data-collector.js +153 -153
- package/bin/scan-diagrams-generators.js +187 -187
- package/bin/scan-diagrams.js +79 -79
- package/bin/scan-renderer.js +92 -92
- package/bin/scan-report-sections.js +121 -121
- package/bin/scan-report.js +184 -184
- package/bin/scan-schema-parsers.js +199 -199
- package/bin/scan-schema.js +103 -103
- package/bin/token-budget.js +246 -0
- package/commands/Claude-md.md +10 -10
- package/commands/branch.md +15 -15
- package/commands/checkin.md +45 -45
- package/commands/global-change.md +209 -209
- package/commands/gsd-t-audit.md +199 -0
- package/commands/gsd-t-backlog-add.md +94 -94
- package/commands/gsd-t-backlog-edit.md +111 -111
- package/commands/gsd-t-backlog-list.md +63 -63
- package/commands/gsd-t-backlog-move.md +94 -94
- package/commands/gsd-t-backlog-promote.md +123 -123
- package/commands/gsd-t-backlog-remove.md +86 -86
- package/commands/gsd-t-backlog-settings.md +158 -158
- package/commands/gsd-t-complete-milestone.md +528 -515
- package/commands/gsd-t-debug.md +506 -399
- package/commands/gsd-t-discuss.md +174 -174
- package/commands/gsd-t-execute.md +758 -634
- package/commands/gsd-t-feature.md +276 -276
- package/commands/gsd-t-health.md +142 -142
- package/commands/gsd-t-help.md +465 -457
- package/commands/gsd-t-impact.md +302 -302
- package/commands/gsd-t-init.md +320 -280
- package/commands/gsd-t-integrate.md +365 -249
- package/commands/gsd-t-milestone.md +87 -87
- package/commands/gsd-t-partition.md +442 -361
- package/commands/gsd-t-pause.md +82 -82
- package/commands/gsd-t-plan.md +345 -344
- package/commands/gsd-t-populate.md +111 -111
- package/commands/gsd-t-prd.md +326 -326
- package/commands/gsd-t-project.md +211 -211
- package/commands/gsd-t-promote-debt.md +123 -123
- package/commands/gsd-t-prompt.md +137 -137
- package/commands/gsd-t-qa.md +266 -266
- package/commands/gsd-t-quick.md +357 -234
- package/commands/gsd-t-reflect.md +134 -134
- package/commands/gsd-t-resume.md +72 -72
- package/commands/gsd-t-scan.md +615 -615
- package/commands/gsd-t-setup.md +76 -0
- package/commands/gsd-t-status.md +192 -166
- package/commands/gsd-t-test-sync.md +381 -381
- package/commands/gsd-t-triage-and-merge.md +171 -171
- package/commands/gsd-t-verify.md +382 -382
- package/commands/gsd-t-visualize.md +118 -118
- package/commands/gsd-t-wave.md +401 -378
- package/docs/GSD-T-README.md +425 -422
- package/docs/architecture.md +385 -369
- package/docs/harness-design-analysis.md +371 -0
- package/docs/infrastructure.md +205 -205
- package/docs/prd-graph-engine.md +398 -398
- package/docs/prd-gsd2-hybrid.md +559 -559
- package/docs/prd-harness-evolution.md +583 -0
- package/docs/requirements.md +14 -0
- package/docs/workflows.md +226 -226
- package/examples/.gsd-t/domains/example-domain/scope.md +13 -13
- package/package.json +40 -40
- package/scripts/gsd-t-auto-route.js +39 -39
- package/scripts/gsd-t-dashboard-mockup.html +1143 -1143
- package/scripts/gsd-t-dashboard-server.js +171 -171
- package/scripts/gsd-t-dashboard.html +262 -262
- package/scripts/gsd-t-event-writer.js +128 -128
- package/scripts/gsd-t-statusline.js +94 -94
- package/scripts/gsd-t-tools.js +175 -175
- package/templates/CLAUDE-global.md +639 -614
- package/templates/CLAUDE-project.md +24 -0
- package/templates/backlog-settings.md +18 -18
- package/templates/backlog.md +1 -1
- package/templates/progress.md +40 -40
- package/templates/shared-services-contract.md +60 -60
- package/templates/stacks/desktop.ini +2 -2
- package/bin/desktop.ini +0 -2
- package/commands/desktop.ini +0 -2
- package/docs/ci-examples/desktop.ini +0 -2
- package/docs/desktop.ini +0 -2
- package/examples/.gsd-t/contracts/desktop.ini +0 -2
- package/examples/.gsd-t/desktop.ini +0 -2
- package/examples/.gsd-t/domains/desktop.ini +0 -2
- package/examples/.gsd-t/domains/example-domain/desktop.ini +0 -2
- package/examples/desktop.ini +0 -2
- package/examples/rules/desktop.ini +0 -2
- package/scripts/desktop.ini +0 -2
- package/templates/desktop.ini +0 -2
package/commands/gsd-t-verify.md
CHANGED
|
@@ -1,382 +1,382 @@
|
|
|
1
|
-
# GSD-T: Verify — Quality Gates (Solo or Parallel)
|
|
2
|
-
|
|
3
|
-
You are the lead agent coordinating verification of the completed work. Each verification dimension should be thorough and independent.
|
|
4
|
-
|
|
5
|
-
## Step 1: Load State
|
|
6
|
-
|
|
7
|
-
Read:
|
|
8
|
-
1. `CLAUDE.md`
|
|
9
|
-
2. `.gsd-t/progress.md` — confirm status is INTEGRATED
|
|
10
|
-
3. `.gsd-t/contracts/` — all contracts
|
|
11
|
-
4. `.gsd-t/domains/*/tasks.md` — all acceptance criteria
|
|
12
|
-
5. `docs/requirements.md` — original requirements
|
|
13
|
-
6. All source code
|
|
14
|
-
|
|
15
|
-
## Step 1.5: Graph-Enhanced Traceability Check
|
|
16
|
-
|
|
17
|
-
If `.gsd-t/graph/meta.json` exists (graph index is available):
|
|
18
|
-
1. Query `getRequirementFor` on implemented entities to build a requirement-to-code traceability chain — flag entities with no requirement mapping
|
|
19
|
-
2. Query `getDomainBoundaryViolations` to verify no cross-domain boundary violations exist in the final codebase
|
|
20
|
-
3. Include any violations as FAIL findings in the verification report (Step 5)
|
|
21
|
-
|
|
22
|
-
If graph is not available, skip this step.
|
|
23
|
-
|
|
24
|
-
## Step 2: Full Test Audit (Inline)
|
|
25
|
-
|
|
26
|
-
Run the full test audit directly:
|
|
27
|
-
|
|
28
|
-
1. Run the full test suite: `npm test` (or project equivalent) — record pass/fail counts
|
|
29
|
-
2. Read all contracts in `.gsd-t/contracts/` — verify each has at least one test validating it
|
|
30
|
-
3. Check acceptance criteria from domain task lists — verify each is tested
|
|
31
|
-
4. Run E2E suite if `playwright.config.*` exists
|
|
32
|
-
5. Report: comprehensive test results with pass/fail counts and coverage gaps
|
|
33
|
-
|
|
34
|
-
Verification cannot complete if any test fails or critical contract gaps remain.
|
|
35
|
-
|
|
36
|
-
## Step 2.5: High-Risk Domain Gate (MANDATORY — Categories 2 and 7)
|
|
37
|
-
|
|
38
|
-
Before running standard verification dimensions, check whether this milestone involves any high-risk domain:
|
|
39
|
-
|
|
40
|
-
**High-risk domains**: audio capture/playback, GPU/WebGPU/WebGL, ML/inference/model loading, background workers, native APIs (camera, bluetooth, filesystem), IPC, WebAssembly, real-time data streams.
|
|
41
|
-
|
|
42
|
-
**If any high-risk domain is present:**
|
|
43
|
-
|
|
44
|
-
### Category 2 — Technology Reliability Gate
|
|
45
|
-
Initialization success does not prove runtime correctness. These technologies can initialize cleanly and fail silently at runtime (compute shader errors, audio context state loss, worker message drops, inference failures).
|
|
46
|
-
|
|
47
|
-
For each high-risk domain:
|
|
48
|
-
1. A **smoke test script** must exist that exercises actual runtime behavior — not just initialization
|
|
49
|
-
2. The smoke test must have been run and passed
|
|
50
|
-
3. "It initialized without throwing" is NOT a passing smoke test
|
|
51
|
-
4. If no smoke test exists → create one now before proceeding with any other verification dimension
|
|
52
|
-
5. Smoke test failure → verification FAIL (not WARN)
|
|
53
|
-
|
|
54
|
-
### Category 7 — Manual QA as Test Gate
|
|
55
|
-
"The user will manually test it" is not a test artifact. Scan the milestone's domains for any feature whose acceptance criteria relies solely on manual user testing.
|
|
56
|
-
|
|
57
|
-
For each such feature:
|
|
58
|
-
1. A smoke test script must exist that automates as much of the verification as possible
|
|
59
|
-
2. Any remaining manual steps must be explicitly documented in `.gsd-t/smoke-tests/{feature}.md` with exact steps and expected outcomes
|
|
60
|
-
3. The documented manual steps must have been executed and passed (noted in the file)
|
|
61
|
-
4. If neither automated smoke test nor documented manual procedure exists → verification FAIL
|
|
62
|
-
|
|
63
|
-
> These gates exist because the pre-commit checklist "did you run the affected tests?" is meaningless when the only test is "user presses Ctrl+Space." That is not a test. It is hope.
|
|
64
|
-
|
|
65
|
-
---
|
|
66
|
-
|
|
67
|
-
## Step 3: Define Verification Dimensions
|
|
68
|
-
|
|
69
|
-
Standard dimensions (adjust based on project):
|
|
70
|
-
|
|
71
|
-
1. **Functional Correctness**: Does it work per requirements?
|
|
72
|
-
2. **Contract Compliance**: Does every domain honor its contracts?
|
|
73
|
-
3. **Code Quality**: Conventions, patterns, error handling, readability
|
|
74
|
-
4. **Test Coverage Completeness**: Every new or changed code path MUST have tests. Check:
|
|
75
|
-
- Do all new functions have unit tests (happy path + edge cases + error cases)?
|
|
76
|
-
- Do all new features/modes/flows have Playwright E2E specs?
|
|
77
|
-
- Do all new UI components have interaction tests?
|
|
78
|
-
- **Zero test coverage on new functionality = FAIL** (not WARN, not "nice to have" — FAIL)
|
|
79
|
-
5. **E2E Tests**: Run the FULL Playwright suite — all specs must pass. If new features lack specs, create them before proceeding.
|
|
80
|
-
6. **Security**: Auth flows, input validation, data exposure, dependencies
|
|
81
|
-
7. **Integration Integrity**: Do the seams between domains hold under stress?
|
|
82
|
-
8. **Requirements Traceability Close-Out**: Mark verified requirements as complete and report orphans:
|
|
83
|
-
- Read `docs/requirements.md` traceability table (added by plan phase)
|
|
84
|
-
- For each REQ-ID that is fully implemented and tested: update Status to `complete` in the traceability table
|
|
85
|
-
- **Orphan report**: List any REQ-IDs with no task mapping (planning gap) and any tasks with no REQ-ID (potential scope creep)
|
|
86
|
-
- Orphaned requirements = WARN (not blocking unless critical)
|
|
87
|
-
- Update `docs/requirements.md` with the close-out results
|
|
88
|
-
|
|
89
|
-
## Step 4: Execute Verification
|
|
90
|
-
|
|
91
|
-
### Solo Mode (default)
|
|
92
|
-
Work through each dimension sequentially. For each:
|
|
93
|
-
1. Define what you're checking
|
|
94
|
-
2. Check it systematically
|
|
95
|
-
3. Record findings as PASS / WARN / FAIL with specifics
|
|
96
|
-
4. If FAIL, create a remediation task
|
|
97
|
-
|
|
98
|
-
**Mandatory test execution:**
|
|
99
|
-
1. Run ALL unit/integration tests — every test must pass
|
|
100
|
-
2. Detect Playwright (check for `playwright.config.*`, Playwright deps in package.json)
|
|
101
|
-
3. Run the FULL Playwright E2E suite — every spec must pass
|
|
102
|
-
4. **Coverage audit**: For every new feature, mode, page, or flow added in this milestone:
|
|
103
|
-
- Confirm Playwright specs exist that specifically test it
|
|
104
|
-
- Confirm specs cover: happy path, error states, edge cases, all modes/flags
|
|
105
|
-
- If specs are missing or incomplete → invoke `gsd-t-test-sync` to create them, then re-run
|
|
106
|
-
- **Missing E2E coverage on new functionality = verification FAIL**
|
|
107
|
-
5. **Functional test quality audit**: Read every Playwright spec. For each `test()` block, verify assertions check **functional behavior** (state changed after action, data loaded, content updated, widget responded) — NOT just element existence (`isVisible`, `toBeAttached`, `toBeEnabled`). A test that would pass on an empty HTML page with the right element IDs is a **shallow test** and counts as a verification FAIL. Flag shallow tests and rewrite them before proceeding.
|
|
108
|
-
6. Tests are NOT optional — verification cannot pass without running them and confirming comprehensive, functional coverage
|
|
109
|
-
|
|
110
|
-
### Team Mode (when agent teams are enabled)
|
|
111
|
-
```
|
|
112
|
-
Create an agent team for verification:
|
|
113
|
-
|
|
114
|
-
ALL TEAMMATES read first:
|
|
115
|
-
1. CLAUDE.md
|
|
116
|
-
2. .gsd-t/contracts/ — all contracts
|
|
117
|
-
3. .gsd-t/domains/*/tasks.md — acceptance criteria
|
|
118
|
-
4. docs/requirements.md
|
|
119
|
-
|
|
120
|
-
Teammate assignments:
|
|
121
|
-
- Teammate "functional":
|
|
122
|
-
Verify every acceptance criterion in every domain's tasks.md.
|
|
123
|
-
Test each user flow end-to-end.
|
|
124
|
-
Report: list of criteria with PASS/FAIL status.
|
|
125
|
-
|
|
126
|
-
- Teammate "contracts":
|
|
127
|
-
For each contract in .gsd-t/contracts/:
|
|
128
|
-
Verify the implementing code matches exactly.
|
|
129
|
-
Check types, shapes, error handling, edge cases.
|
|
130
|
-
Report: contract-by-contract compliance status.
|
|
131
|
-
|
|
132
|
-
- Teammate "quality":
|
|
133
|
-
Review all source code for:
|
|
134
|
-
- Consistency with CLAUDE.md conventions
|
|
135
|
-
- Error handling completeness
|
|
136
|
-
- Code duplication
|
|
137
|
-
- Naming consistency
|
|
138
|
-
- Dead code or TODOs
|
|
139
|
-
Report: file-by-file findings.
|
|
140
|
-
|
|
141
|
-
- Teammate "security":
|
|
142
|
-
Review for:
|
|
143
|
-
- Auth bypass possibilities
|
|
144
|
-
- Input validation gaps
|
|
145
|
-
- Data exposure in API responses
|
|
146
|
-
- Dependency vulnerabilities (run audit if applicable)
|
|
147
|
-
- Secret/credential handling
|
|
148
|
-
Report: severity-ranked findings.
|
|
149
|
-
|
|
150
|
-
Lead: After receiving teammate reports:
|
|
151
|
-
**OBSERVABILITY LOGGING (MANDATORY):**
|
|
152
|
-
Before spawning — run via Bash:
|
|
153
|
-
`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
|
|
154
|
-
Spawn a Task subagent to run the full test suite and contract audit.
|
|
155
|
-
After subagent returns — run via Bash:
|
|
156
|
-
`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
|
|
157
|
-
Compute tokens and compaction:
|
|
158
|
-
- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
|
|
159
|
-
- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
|
|
160
|
-
Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
|
|
161
|
-
`| {DT_START} | {DT_END} | gsd-t-verify | Step 4 | haiku | {DURATION}s | test audit + contract review | {TOKENS} | {COMPACTED} |`
|
|
162
|
-
Collect all reports, synthesize, create remediation plan.
|
|
163
|
-
```
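The observability logging steps in the block above compute token usage differently depending on whether the context was compacted while the subagent ran. The following is a minimal sketch of that arithmetic, assuming the counters are plain integers. The function name `computeTokenUsage` and its parameter object are hypothetical and not part of GSD-T.

```javascript
// Illustrative helper, not a GSD-T API: mirrors the TOKENS / COMPACTED rules.
function computeTokenUsage({ tokStart, tokEnd, tokMax = 200000, dtEnd }) {
  if (tokEnd >= tokStart) {
    // No compaction: usage is the plain difference between the two readings.
    return { tokens: tokEnd - tokStart, compacted: null };
  }
  // Compaction detected: tokens consumed up to the ceiling before the reset,
  // plus whatever accumulated afterwards.
  return { tokens: (tokMax - tokStart) + tokEnd, compacted: dtEnd };
}

// Example readings (made-up numbers).
console.log(computeTokenUsage({ tokStart: 50000, tokEnd: 80000, dtEnd: "2025-01-01 10:00" }));
console.log(computeTokenUsage({ tokStart: 180000, tokEnd: 30000, dtEnd: "2025-01-01 10:05" }));
```

In the compaction branch, the result is an estimate: it assumes the context filled to `tokMax` exactly once before being reset.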
|
|
164
|
-
|
|
165
|
-
## Step 5: Compile Verification Report
|
|
166
|
-
|
|
167
|
-
Create or update `.gsd-t/verify-report.md`:
|
|
168
|
-
|
|
169
|
-
```markdown
|
|
170
|
-
# Verification Report — {date}
|
|
171
|
-
|
|
172
|
-
## Milestone: {name}
|
|
173
|
-
|
|
174
|
-
## Summary
|
|
175
|
-
- Functional: {PASS/WARN/FAIL} — {X}/{Y} criteria met
|
|
176
|
-
- Contracts: {PASS/WARN/FAIL} — {X}/{Y} contracts compliant
|
|
177
|
-
- Code Quality: {PASS/WARN/FAIL} — {N} issues found
|
|
178
|
-
- Unit Tests: {PASS/WARN/FAIL} — {N}/{total} passing
|
|
179
|
-
- E2E Tests: {PASS/WARN/FAIL} — {N}/{total} specs passing
|
|
180
|
-
- Security: {PASS/WARN/FAIL} — {N} findings
|
|
181
|
-
- Integration: {PASS/WARN/FAIL}
|
|
182
|
-
|
|
183
|
-
## Overall: {PASS / CONDITIONAL PASS / FAIL}
|
|
184
|
-
|
|
185
|
-
## Findings
|
|
186
|
-
|
|
187
|
-
### Critical (must fix before milestone complete)
|
|
188
|
-
1. {finding} — {domain} — {remediation}
|
|
189
|
-
|
|
190
|
-
### Warnings (should fix, not blocking)
|
|
191
|
-
1. {finding} — {domain} — {remediation}
|
|
192
|
-
|
|
193
|
-
### Notes (informational)
|
|
194
|
-
1. {observation}
|
|
195
|
-
|
|
196
|
-
## Remediation Tasks
|
|
197
|
-
| # | Domain | Description | Priority |
|
|
198
|
-
|---|--------|-------------|----------|
|
|
199
|
-
| 1 | auth | Fix missing role in user response | CRITICAL |
|
|
200
|
-
| 2 | ui | Add loading states for async calls | WARN |
|
|
201
|
-
```
|
|
202
|
-
|
|
203
|
-
## Step 5.25: Metrics Quality Budget Check
|
|
204
|
-
|
|
205
|
-
Check task-metrics for the current milestone to detect quality budget violations:
|
|
206
|
-
|
|
207
|
-
1. Run via Bash:
|
|
208
|
-
`node -e "const c = require('./bin/metrics-collector.js'); const r = c.readTaskMetrics({milestone: '{milestone-id}'}); if(!r.length){console.log('No metrics data — quality budget check skipped');process.exit(0);} const pass=r.filter(t=>t.fix_cycles===0&&t.pass).length; const rate=pass/r.length; console.log('First-pass rate: '+(rate*100).toFixed(1)+'% ('+pass+'/'+r.length+')'); if(rate<0.6) console.log('⚠️ Quality budget WARNING: first-pass rate below 60%');" 2>/dev/null || true`
|
|
209
|
-
|
|
210
|
-
2. Run heuristics check via Bash:
|
|
211
|
-
`node -e "const m=require('./bin/metrics-rollup.js'); const r=m.readRollups({milestone:'{milestone-id}'}); if(r.length&&r[r.length-1].heuristic_flags.some(f=>f.severity==='HIGH')) console.log('⚠️ HIGH severity heuristic flag detected — review before completing milestone');" 2>/dev/null || true`
|
|
212
|
-
|
|
213
|
-
3. Display quality metrics summary inline. Quality budget violation is a **WARNING** (non-blocking) — does not fail verify.
|
|
214
|
-
|
|
215
|
-
4. Include quality budget status in the verification report (Step 5):
|
|
216
|
-
`- Quality Budget: {PASS/WARN} — first-pass rate {N}%{, HIGH heuristic: {name} if any}`
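The first-pass rate check in item 1 can be read more easily as a standalone function. This is a sketch only: it assumes each task record carries the `fix_cycles` and `pass` fields used by the one-liner, and the function name `qualityBudget` and the returned object shape are hypothetical.

```javascript
// Illustrative restatement of the one-liner's logic; not part of metrics-collector.js.
function qualityBudget(records, threshold = 0.6) {
  if (records.length === 0) {
    // No metrics data: the check is skipped rather than failed.
    return { status: "SKIPPED", rate: null, firstPass: 0, total: 0 };
  }
  // A task counts as first-pass only if it passed with zero fix cycles.
  const firstPass = records.filter((t) => t.fix_cycles === 0 && t.pass).length;
  const rate = firstPass / records.length;
  // Below the threshold is a WARN, never a FAIL (the budget is non-blocking).
  return { status: rate < threshold ? "WARN" : "PASS", rate, firstPass, total: records.length };
}

// Example with made-up task records.
const sampleRecords = [
  { fix_cycles: 0, pass: true },
  { fix_cycles: 2, pass: true },
  { fix_cycles: 0, pass: false },
  { fix_cycles: 1, pass: true },
];
console.log(qualityBudget(sampleRecords));
```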
|
|
217
|
-
|
|
218
|
-
## Step 5.5: Goal-Backward Verification (Post-Gate Behavior Check)
|
|
219
|
-
|
|
220
|
-
This step runs **after all 8 quality gates pass**. It verifies that milestone goals are actually achieved end-to-end — not just structurally present. It catches placeholder implementations that pass all structural gates.
|
|
221
|
-
|
|
222
|
-
Refer to `.gsd-t/contracts/goal-backward-contract.md` for the full verification flow, placeholder patterns, and findings report format.
|
|
223
|
-
|
|
224
|
-
### 5.5.1 Load Milestone Goals and Requirements
|
|
225
|
-
|
|
226
|
-
1. Read `.gsd-t/progress.md` — extract the current milestone name and goals
|
|
227
|
-
2. Read `docs/requirements.md` — identify **critical requirements** (skip trivial/low-priority items)
|
|
228
|
-
|
|
229
|
-
### 5.5.2 Trace Requirements to Behavior
|
|
230
|
-
|
|
231
|
-
For each critical requirement:
|
|
232
|
-
|
|
233
|
-
1. **If `.gsd-t/graph/meta.json` exists (graph available)**:
|
|
234
|
-
- Trace the requirement → code path → behavior chain using graph queries
|
|
235
|
-
- Use `getRequirementFor`, `getCallers`, and `getTestsFor` to build the chain
|
|
236
|
-
- Flag requirements with no traceable code path as CRITICAL findings
|
|
237
|
-
|
|
238
|
-
2. **If graph is not available (fallback to grep)**:
|
|
239
|
-
- Search the codebase for the feature/function implementing each requirement
|
|
240
|
-
- Trace from entry point → core logic → output/response
|
|
241
|
-
|
|
242
|
-
### 5.5.3 Scan for Placeholder Patterns
|
|
243
|
-
|
|
244
|
-
For each file identified in the requirement traces above, scan for these placeholder patterns:
|
|
245
|
-
|
|
246
|
-
| Pattern | Detection Hint | Severity |
|
|
247
|
-
|---------|---------------|----------|
|
|
248
|
-
| console.log placeholder | `console.log.*TODO\|console.log.*implement` | CRITICAL |
|
|
249
|
-
| TODO/FIXME in implementation | `// TODO\|// FIXME\|# TODO\|# FIXME` in non-test files | CRITICAL |
|
|
250
|
-
| Empty function body | `function \w+\(\) \{\}` or `\(\) => \{\}` with no logic | CRITICAL |
|
|
251
|
-
| Throw not-implemented | `throw new Error.*not implemented\|throw new Error.*TODO` | CRITICAL |
|
|
252
|
-
| Hardcoded return | `return "success"\|return true` with no conditional logic | HIGH |
|
|
253
|
-
| Static UI text | Static `<span>` or text that never updates based on state | HIGH |
|
|
254
|
-
| Pass-through stub | `return input\|return req\|return data` with no transformation | MEDIUM |
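Some rows of the table above translate directly into line-level regular expressions. The sketch below covers only those rows. It leaves out "Hardcoded return" and "Static UI text", because deciding that surrounding logic is absent needs more than a single-line match. The regexes are JavaScript approximations of the grep-style hints, and `PLACEHOLDER_PATTERNS` and `scanForPlaceholders` are hypothetical names, not GSD-T code.

```javascript
// Approximate JavaScript equivalents of the regex-expressible detection hints.
const PLACEHOLDER_PATTERNS = [
  { name: "console.log placeholder", severity: "CRITICAL", re: /console\.log.*(TODO|implement)/ },
  { name: "TODO/FIXME in implementation", severity: "CRITICAL", re: /(\/\/|#) (TODO|FIXME)/ },
  { name: "Empty function body", severity: "CRITICAL", re: /function \w+\(\) \{\}|\(\) => \{\}/ },
  { name: "Throw not-implemented", severity: "CRITICAL", re: /throw new Error.*(not implemented|TODO)/ },
  { name: "Pass-through stub", severity: "MEDIUM", re: /return (input|req|data)\b/ },
];

// Scan one file's text and report every pattern hit with a File:Line location.
function scanForPlaceholders(path, source) {
  const findings = [];
  source.split("\n").forEach((text, index) => {
    for (const { name, severity, re } of PLACEHOLDER_PATTERNS) {
      if (re.test(text)) {
        findings.push({ location: `${path}:${index + 1}`, pattern: name, severity });
      }
    }
  });
  return findings;
}

// Example input (made up): a stub that returns its argument unchanged.
const sampleSource = [
  "function save(data) {",
  "  // TODO wire up persistence",
  "  return data;",
  "}",
].join("\n");
console.log(scanForPlaceholders("src/save.js", sampleSource));
```

A line-based scan like this yields candidates only. Each hit still needs a human or agent read to confirm it is a placeholder and not legitimate code.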
|
|
255
|
-
|
|
256
|
-
### 5.5.4 Produce Findings Report
|
|
257
|
-
|
|
258
|
-
Format findings per the goal-backward-contract.md report format:
|
|
259
|
-
|
|
260
|
-
```markdown
|
|
261
|
-
## Goal-Backward Verification Report
|
|
262
|
-
|
|
263
|
-
### Status: PASS | FAIL
|
|
264
|
-
|
|
265
|
-
### Findings
|
|
266
|
-
| # | Requirement | File:Line | Pattern | Severity | Description |
|
|
267
|
-
|---|-------------|-----------|---------|----------|-------------|
|
|
268
|
-
| 1 | {req-id} | {path}:{line} | {pattern} | {severity} | {what's wrong} |
|
|
269
|
-
|
|
270
|
-
### Summary
|
|
271
|
-
- Requirements checked: {N}
|
|
272
|
-
- Findings: {N} ({critical}, {high}, {medium})
|
|
273
|
-
- Verdict: {PASS if 0 critical/high, FAIL otherwise}
|
|
274
|
-
```
|
|
275
|
-
|
|
276
|
-
### 5.5.5 Apply Blocking Rules
|
|
277
|
-
|
|
278
|
-
- **CRITICAL or HIGH findings** → Goal-Backward status = **FAIL** — block verification
|
|
279
|
-
- Append findings to the Critical section of the verification report (Step 5)
|
|
280
|
-
- Set overall verification status to FAIL
|
|
281
|
-
- **MEDIUM findings** → Goal-Backward status = **WARN** — log but do not block
|
|
282
|
-
- Append findings to the Warnings section of the verification report (Step 5)
|
|
283
|
-
- **No findings** → Goal-Backward status = **PASS** — add to verification report summary
|
|
284
|
-
|
|
285
|
-
Add a `Goal-Backward:` line to the Step 5 verification report summary:
|
|
286
|
-
```
|
|
287
|
-
- Goal-Backward: {PASS/WARN/FAIL} — {N} requirements checked, {N} findings ({critical} critical, {high} high, {medium} medium)
|
|
288
|
-
```
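The blocking rules in 5.5.5 reduce to a small decision over finding severities. Here is a minimal sketch, assuming each finding is an object with a `severity` of `CRITICAL`, `HIGH`, or `MEDIUM`. The function name `goalBackwardStatus` is hypothetical.

```javascript
// Illustrative decision function for the Goal-Backward status; not GSD-T code.
function goalBackwardStatus(findings) {
  const count = (severity) => findings.filter((f) => f.severity === severity).length;
  const critical = count("CRITICAL");
  const high = count("HIGH");
  const medium = count("MEDIUM");

  let status = "PASS";
  if (critical + high > 0) {
    status = "FAIL"; // blocks verification
  } else if (medium > 0) {
    status = "WARN"; // logged, does not block
  }
  return { status, critical, high, medium };
}

// Example with made-up findings.
console.log(goalBackwardStatus([{ severity: "MEDIUM" }, { severity: "HIGH" }]));
console.log(goalBackwardStatus([{ severity: "MEDIUM" }]));
console.log(goalBackwardStatus([]));
```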
|
|
289
|
-
|
|
290
|
-
---
|
|
291
|
-
|
|
292
|
-
## Step 6: Handle Remediation
|
|
293
|
-
|
|
294
|
-
If there are CRITICAL findings:
|
|
295
|
-
1. Create remediation tasks in the affected domain's `tasks.md`
|
|
296
|
-
2. Execute fixes (solo — don't spawn teams for remediation)
|
|
297
|
-
3. Re-verify the specific findings
|
|
298
|
-
4. Update the verification report
|
|
299
|
-
|
|
300
|
-
## Step 7: Update State
|
|
301
|
-
|
|
302
|
-
Update `.gsd-t/progress.md`:
|
|
303
|
-
- If all PASS: Set status to `VERIFIED`
|
|
304
|
-
- If CONDITIONAL PASS: Set status to `VERIFIED-WITH-WARNINGS`, list warnings
|
|
305
|
-
- If FAIL: Set status to `VERIFY-FAILED`, list required remediations
|
|
306
|
-
- Record verification date and summary
|
|
307
|
-
|
|
308
|
-
### Autonomy Behavior
|
|
309
|
-
|
|
310
|
-
**All Levels**:
|
|
311
|
-
- VERIFIED or CONDITIONAL PASS → **Auto-invoke complete-milestone** (see Step 8 below). Completing a verified milestone is mechanical — there is no judgment call that benefits from user review.
|
|
312
|
-
- FAIL → **Level 3**: Auto-execute remediation tasks (up to 2 fix attempts). If still failing after 2 attempts:
|
|
313
|
-
1. Write failure context to `.gsd-t/debug-state.jsonl` via `node -e "require('./bin/debug-ledger.js').appendEntry('.', {iteration:1,timestamp:new Date().toISOString(),test:'verify-remediation',error:'2 in-context fix attempts exhausted',hypothesis:'see verify-report.md',fix:'n/a',fixFiles:[],result:'STILL_FAILS',learning:'delegating to headless debug-loop',model:'sonnet',duration:0})"`
|
|
314
|
-
2. Log: "Delegating to headless debug-loop (2 in-context attempts exhausted)"
|
|
315
|
-
3. Run: `gsd-t headless --debug-loop --max-iterations 10`
|
|
316
|
-
4. Exit code 0 → re-run verification; 1/4 → log to `.gsd-t/deferred-items.md`, STOP and report to user; 3 → report error
|
|
317
|
-
**Level 1–2**: Return to execute phase for remediation tasks.
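The exit-code handling for the headless debug loop (item 4 of the Level 3 path) can be written as a lookup. This is a sketch under assumptions: the action labels are invented for illustration, and the text does not say what exit code 2 or any other code means, so those fall through to an explicit "unspecified" result.

```javascript
// Illustrative mapping from debug-loop exit code to the next step; labels are hypothetical.
function nextActionForDebugLoop(exitCode) {
  switch (exitCode) {
    case 0:
      return "rerun-verification";
    case 1:
    case 4:
      // Log to .gsd-t/deferred-items.md, then stop and report to the user.
      return "defer-and-stop";
    case 3:
      return "report-error";
    default:
      // Not covered by the instructions; a caller would have to decide.
      return "unspecified";
  }
}

for (const code of [0, 1, 3, 4, 2]) {
  console.log(code, nextActionForDebugLoop(code));
}
```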
|
|
318
|
-
|
|
319
|
-
## Document Ripple
|
|
320
|
-
|
|
321
|
-
### Always update:
|
|
322
|
-
1. **`.gsd-t/progress.md`** — Set status to VERIFIED/VERIFY-FAILED, log verification summary
|
|
323
|
-
2. **`.gsd-t/verify-report.md`** — Created with full verification results (Step 4)
|
|
324
|
-
|
|
325
|
-
### Check if affected:
|
|
326
|
-
3. **`.gsd-t/domains/{domain}/tasks.md`** — If remediation tasks were created (Step 5)
|
|
327
|
-
4. **`.gsd-t/techdebt.md`** — If verification found new quality or security issues, add as debt
|
|
328
|
-
5. **`docs/requirements.md`** — If verification revealed unmet requirements, update status
|
|
329
|
-
|
|
330
|
-
## Step 8: Auto-Invoke Complete-Milestone
|
|
331
|
-
|
|
332
|
-
**This step is MANDATORY and runs at ALL autonomy levels.** Completing a verified milestone is a mechanical operation (archive, tag, bump version, update docs). There is no decision that benefits from user review — the decision was made when verification passed.
|
|
333
|
-
|
|
334
|
-
If status is VERIFY-FAILED:
|
|
335
|
-
- Do NOT invoke complete-milestone
|
|
336
|
-
- Report failures and stop
|
|
337
|
-
|
|
338
|
-
If status is VERIFIED or VERIFIED-WITH-WARNINGS:
|
|
339
|
-
1. Log: "✅ Verify complete — spawning complete-milestone agent..."
|
|
340
|
-
|
|
341
|
-
**OBSERVABILITY LOGGING (MANDATORY):**
|
|
342
|
-
Before spawning — run via Bash:
|
|
343
|
-
`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
|
|
344
|
-
|
|
345
|
-
2. Spawn a Task subagent (model: sonnet, mode: bypassPermissions):
|
|
346
|
-
```
|
|
347
|
-
"Execute the complete-milestone phase of the current GSD-T milestone.
|
|
348
|
-
|
|
349
|
-
Read and follow the full instructions in commands/gsd-t-complete-milestone.md
|
|
350
|
-
(resolve from ~/.claude/commands/ if not in project).
|
|
351
|
-
Read .gsd-t/progress.md for current milestone and state.
|
|
352
|
-
Read CLAUDE.md for project conventions.
|
|
353
|
-
Read .gsd-t/contracts/ for domain interfaces.
|
|
354
|
-
|
|
355
|
-
Complete the phase fully:
|
|
356
|
-
- Follow every step in the command file
|
|
357
|
-
- Update .gsd-t/progress.md status when done
|
|
358
|
-
- Run document ripple as specified
|
|
359
|
-
- Commit your work
|
|
360
|
-
|
|
361
|
-
Report back: one-line status summary."
|
|
362
|
-
```
|
|
363
|
-
|
|
364
|
-
After subagent returns — run via Bash:
|
|
365
|
-
`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
|
|
366
|
-
Compute tokens and compaction:
|
|
367
|
-
- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
|
|
368
|
-
- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
|
|
369
|
-
Append to `.gsd-t/token-log.md`:
|
|
370
|
-
`| {DT_START} | {DT_END} | gsd-t-verify | Step 8 | sonnet | {DURATION}s | auto-complete-milestone | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
|
|
371
|
-
|
|
372
|
-
3. Verify subagent result: Read `.gsd-t/progress.md` — confirm status is COMPLETED. If not, report the failure.
|
|
373
|
-
|
|
374
|
-
**Why this is mandatory**: Without auto-completion, verified milestones remain in VERIFIED state indefinitely. Requirements stay unmarked, progress.md is stale, and future sessions cannot tell the work was done. This is the root cause of "GSD-T forgot it did this work" — the milestone was built and verified but never formally completed.
|
|
375
|
-
|
|
376
|
-
**Why a subagent**: Complete-milestone is a 12-step process (gap analysis, archive, version bump, git tag, doc ripple). Verify is already heavy with 8+ quality gates. Spawning a fresh-context subagent avoids compaction risk — and complete-milestone loads everything it needs from files (progress.md, verify-report.md, contracts).
|
|
377
|
-
|
|
378
|
-
$ARGUMENTS
|
|
379
|
-
|
|
380
|
-
## Auto-Clear
|
|
381
|
-
|
|
382
|
-
All work is committed to project files. Execute `/clear` to free the context window for the next command.
|
|
1
|
+
# GSD-T: Verify — Quality Gates (Solo or Parallel)
|
|
2
|
+
|
|
3
|
+
You are the lead agent coordinating verification of the completed work. Each verification dimension should be thorough and independent.
|
|
4
|
+
|
|
5
|
+
## Step 1: Load State
|
|
6
|
+
|
|
7
|
+
Read:
|
|
8
|
+
1. `CLAUDE.md`
|
|
9
|
+
2. `.gsd-t/progress.md` — confirm status is INTEGRATED
|
|
10
|
+
3. `.gsd-t/contracts/` — all contracts
|
|
11
|
+
4. `.gsd-t/domains/*/tasks.md` — all acceptance criteria
|
|
12
|
+
5. `docs/requirements.md` — original requirements
|
|
13
|
+
6. All source code
|
|
14
|
+
|
|
15
|
+
## Step 1.5: Graph-Enhanced Traceability Check
|
|
16
|
+
|
|
17
|
+
If `.gsd-t/graph/meta.json` exists (graph index is available):
|
|
18
|
+
1. Query `getRequirementFor` on implemented entities to build a requirement-to-code traceability chain — flag entities with no requirement mapping
|
|
19
|
+
2. Query `getDomainBoundaryViolations` to verify no cross-domain boundary violations exist in the final codebase
|
|
20
|
+
3. Include any violations as FAIL findings in the verification report (Step 5)
|
|
21
|
+
|
|
22
|
+
If graph is not available, skip this step.
|
|
23
|
+
|
|
24
|
+
## Step 2: Full Test Audit (Inline)
|
|
25
|
+
|
|
26
|
+
Run the full test audit directly:
|
|
27
|
+
|
|
28
|
+
1. Run the full test suite: `npm test` (or project equivalent) — record pass/fail counts
|
|
29
|
+
2. Read all contracts in `.gsd-t/contracts/` — verify each has at least one test validating it
|
|
30
|
+
3. Check acceptance criteria from domain task lists — verify each is tested
|
|
31
|
+
4. Run E2E suite if `playwright.config.*` exists
|
|
32
|
+
5. Report: comprehensive test results with pass/fail counts and coverage gaps
|
|
33
|
+
|
|
34
|
+
Verification cannot complete if any test fails or critical contract gaps remain.
|
|
35
|
+
|
|
36
|
+
## Step 2.5: High-Risk Domain Gate (MANDATORY — Categories 2 and 7)
|
|
37
|
+
|
|
38
|
+
Before running standard verification dimensions, check whether this milestone involves any high-risk domain:
|
|
39
|
+
|
|
40
|
+
**High-risk domains**: audio capture/playback, GPU/WebGPU/WebGL, ML/inference/model loading, background workers, native APIs (camera, bluetooth, filesystem), IPC, WebAssembly, real-time data streams.
|
|
41
|
+
|
|
42
|
+
**If any high-risk domain is present:**
|
|
43
|
+
|
|
44
|
+
### Category 2 — Technology Reliability Gate
|
|
45
|
+
Initialization success does not prove runtime correctness. These technologies can initialize cleanly and fail silently at runtime (compute shader errors, audio context state loss, worker message drops, inference failures).
|
|
46
|
+
|
|
47
|
+
For each high-risk domain:
|
|
48
|
+
1. A **smoke test script** must exist that exercises actual runtime behavior — not just initialization
|
|
49
|
+
2. The smoke test must have been run and passed
|
|
50
|
+
3. "It initialized without throwing" is NOT a passing smoke test
|
|
51
|
+
4. If no smoke test exists → create one now before proceeding with any other verification dimension
|
|
52
|
+
5. Smoke test failure → verification FAIL (not WARN)
|
|
53
|
+
|
|
54
|
+
### Category 7 — Manual QA as Test Gate
|
|
55
|
+
"The user will manually test it" is not a test artifact. Scan the milestone's domains for any feature whose acceptance criteria relies solely on manual user testing.
|
|
56
|
+
|
|
57
|
+
For each such feature:
|
|
58
|
+
1. A smoke test script must exist that automates as much of the verification as possible
|
|
59
|
+
2. Any remaining manual steps must be explicitly documented in `.gsd-t/smoke-tests/{feature}.md` with exact steps and expected outcomes
|
|
60
|
+
3. The documented manual steps must have been executed and passed (noted in the file)
|
|
61
|
+
4. If neither automated smoke test nor documented manual procedure exists → verification FAIL
|
|
62
|
+
|
|
63
|
+
> These gates exist because the pre-commit checklist "did you run the affected tests?" is meaningless when the only test is "user presses Ctrl+Space." That is not a test. It is hope.
|
|
64
|
+
|
|
65
|
+
---
|
|
66
|
+
|
|
67
|
+
## Step 3: Define Verification Dimensions
|
|
68
|
+
|
|
69
|
+
Standard dimensions (adjust based on project):
|
|
70
|
+
|
|
71
|
+
1. **Functional Correctness**: Does it work per requirements?
|
|
72
|
+
2. **Contract Compliance**: Does every domain honor its contracts?
|
|
73
|
+
3. **Code Quality**: Conventions, patterns, error handling, readability
|
|
74
|
+
4. **Test Coverage Completeness**: Every new or changed code path MUST have tests. Check:
|
|
75
|
+
- Do all new functions have unit tests (happy path + edge cases + error cases)?
|
|
76
|
+
- Do all new features/modes/flows have Playwright E2E specs?
|
|
77
|
+
- Do all new UI components have interaction tests?
|
|
78
|
+
- **Zero test coverage on new functionality = FAIL** (not WARN, not "nice to have" — FAIL)
|
|
79
|
+
5. **E2E Tests**: Run the FULL Playwright suite — all specs must pass. If new features lack specs, create them before proceeding.
|
|
80
|
+
6. **Security**: Auth flows, input validation, data exposure, dependencies
|
|
81
|
+
7. **Integration Integrity**: Do the seams between domains hold under stress?
|
|
82
|
+
8. **Requirements Traceability Close-Out**: Mark verified requirements as complete and report orphans:
|
|
83
|
+
- Read `docs/requirements.md` traceability table (added by plan phase)
|
|
84
|
+
- For each REQ-ID that is fully implemented and tested: update Status to `complete` in the traceability table
|
|
85
|
+
- **Orphan report**: List any REQ-IDs with no task mapping (planning gap) and any tasks with no REQ-ID (potential scope creep)
|
|
86
|
+
- Orphaned requirements = WARN (not blocking unless critical)
|
|
87
|
+
- Update `docs/requirements.md` with the close-out results
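
The orphan report in dimension 8 amounts to two set differences. A minimal sketch follows, assuming requirements are available as a list of REQ-IDs and each task carries a `reqIds` array. Both shapes, and the `orphanReport` name, are hypothetical, since the traceability table format is defined elsewhere.

```javascript
// Hypothetical helper: inputs are assumed to be already parsed from
// docs/requirements.md and the domains' tasks.md files.
function orphanReport(requirementIds, tasks) {
  const mapped = new Set(tasks.flatMap((t) => t.reqIds));
  return {
    // REQ-IDs that no task maps to (planning gap)
    unmappedRequirements: requirementIds.filter((id) => !mapped.has(id)),
    // Tasks that reference no REQ-ID (potential scope creep)
    tasksWithoutRequirement: tasks
      .filter((t) => t.reqIds.length === 0)
      .map((t) => t.id),
  };
}

const report = orphanReport(
  ["REQ-001", "REQ-002", "REQ-003"],
  [
    { id: "T1", reqIds: ["REQ-001"] },
    { id: "T2", reqIds: [] },
    { id: "T3", reqIds: ["REQ-001", "REQ-003"] },
  ]
);
console.log(report);
```

In this sample, `REQ-002` has no task mapping and `T2` has no REQ-ID, so both would appear in the report as WARN-level findings.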
|
|
88
|
+
|
|
89
|
+
## Step 4: Execute Verification
|
|
90
|
+
|
|
91
|
+
### Solo Mode (default)
|
|
92
|
+
Work through each dimension sequentially. For each:
|
|
93
|
+
1. Define what you're checking
|
|
94
|
+
2. Check it systematically
|
|
95
|
+
3. Record findings as PASS / WARN / FAIL with specifics
|
|
96
|
+
4. If FAIL, create a remediation task
|
|
97
|
+
|
|
98
|
+
**Mandatory test execution:**
|
|
99
|
+
1. Run ALL unit/integration tests — every test must pass
|
|
100
|
+
2. Detect Playwright (check for `playwright.config.*`, Playwright deps in package.json)
|
|
101
|
+
3. Run the FULL Playwright E2E suite — every spec must pass
|
|
102
|
+
4. **Coverage audit**: For every new feature, mode, page, or flow added in this milestone:
|
|
103
|
+
- Confirm Playwright specs exist that specifically test it
|
|
104
|
+
- Confirm specs cover: happy path, error states, edge cases, all modes/flags
|
|
105
|
+
- If specs are missing or incomplete → invoke `gsd-t-test-sync` to create them, then re-run
|
|
106
|
+
- **Missing E2E coverage on new functionality = verification FAIL**
|
|
107
|
+
5. **Functional test quality audit**: Read every Playwright spec. For each `test()` block, verify assertions check **functional behavior** (state changed after action, data loaded, content updated, widget responded) — NOT just element existence (`isVisible`, `toBeAttached`, `toBeEnabled`). A test that would pass on an empty HTML page with the right element IDs is a **shallow test** and counts as a verification FAIL. Flag shallow tests and rewrite them before proceeding.
|
|
108
|
+
6. Tests are NOT optional — verification cannot pass without running them and confirming comprehensive, functional coverage
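
The Playwright detection in item 2 of the list above could be sketched as follows. This is an illustrative helper, not GSD-T's implementation: the `detectPlaywright` name is hypothetical, and it assumes the relevant dependencies are the `@playwright/test` or `playwright` packages.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Returns true if the project has a playwright.config.* file or lists a
// Playwright package in package.json (assumed names: @playwright/test, playwright).
function detectPlaywright(projectDir) {
  const hasConfig = fs
    .readdirSync(projectDir)
    .some((name) => name.startsWith("playwright.config."));

  let hasDependency = false;
  const pkgPath = path.join(projectDir, "package.json");
  if (fs.existsSync(pkgPath)) {
    const pkg = JSON.parse(fs.readFileSync(pkgPath, "utf8"));
    const deps = { ...pkg.dependencies, ...pkg.devDependencies };
    hasDependency = "@playwright/test" in deps || "playwright" in deps;
  }
  return hasConfig || hasDependency;
}

console.log(detectPlaywright(process.cwd()));
```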
|
|
109
|
+
|
|
110
|
+
### Team Mode (when agent teams are enabled)
|
|
111
|
+
```
|
|
112
|
+
Create an agent team for verification:
|
|
113
|
+
|
|
114
|
+
ALL TEAMMATES read first:
|
|
115
|
+
1. CLAUDE.md
|
|
116
|
+
2. .gsd-t/contracts/ — all contracts
|
|
117
|
+
3. .gsd-t/domains/*/tasks.md — acceptance criteria
|
|
118
|
+
4. docs/requirements.md
|
|
119
|
+
|
|
120
|
+
Teammate assignments:
|
|
121
|
+
- Teammate "functional":
|
|
122
|
+
Verify every acceptance criterion in every domain's tasks.md.
|
|
123
|
+
Test each user flow end-to-end.
|
|
124
|
+
Report: list of criteria with PASS/FAIL status.
|
|
125
|
+
|
|
126
|
+
- Teammate "contracts":
|
|
127
|
+
For each contract in .gsd-t/contracts/:
|
|
128
|
+
Verify the implementing code matches exactly.
|
|
129
|
+
Check types, shapes, error handling, edge cases.
|
|
130
|
+
Report: contract-by-contract compliance status.
|
|
131
|
+
|
|
132
|
+
- Teammate "quality":
|
|
133
|
+
Review all source code for:
|
|
134
|
+
- Consistency with CLAUDE.md conventions
|
|
135
|
+
- Error handling completeness
|
|
136
|
+
- Code duplication
|
|
137
|
+
- Naming consistency
|
|
138
|
+
- Dead code or TODOs
|
|
139
|
+
Report: file-by-file findings.
|
|
140
|
+
|
|
141
|
+
- Teammate "security":
|
|
142
|
+
Review for:
|
|
143
|
+
- Auth bypass possibilities
|
|
144
|
+
- Input validation gaps
|
|
145
|
+
- Data exposure in API responses
|
|
146
|
+
- Dependency vulnerabilities (run audit if applicable)
|
|
147
|
+
- Secret/credential handling
|
|
148
|
+
Report: severity-ranked findings.
|
|
149
|
+
|
|
150
|
+
Lead: After receiving teammate reports:
|
|
151
|
+
**OBSERVABILITY LOGGING (MANDATORY):**
|
|
152
|
+
Before spawning — run via Bash:
|
|
153
|
+
`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
|
|
154
|
+
Spawn a Task subagent to run the full test suite and contract audit.
|
|
155
|
+
After subagent returns — run via Bash:
|
|
156
|
+
`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
|
|
157
|
+
Compute tokens and compaction:
|
|
158
|
+
- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
|
|
159
|
+
- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
|
|
160
|
+
Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
|
|
161
|
+
`| {DT_START} | {DT_END} | gsd-t-verify | Step 4 | haiku | {DURATION}s | test audit + contract review | {TOKENS} | {COMPACTED} |`
|
|
162
|
+
Collect all reports, synthesize, create remediation plan.
|
|
163
|
+
```
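
The token and compaction arithmetic in the logging steps above can be restated as a small function. This is a sketch for clarity only; `computeTokenUsage` is a hypothetical name, and the actual steps run the equivalent shell arithmetic.

```javascript
// Mirrors the two cases from the logging steps:
//   TOK_END >= TOK_START -> plain difference, no compaction
//   TOK_END <  TOK_START -> context was compacted in between
function computeTokenUsage(tokStart, tokEnd, tokMax, dtEnd) {
  if (tokEnd >= tokStart) {
    return { tokens: tokEnd - tokStart, compacted: null };
  }
  // Count the tokens consumed up to the limit, plus those used since the reset.
  return { tokens: tokMax - tokStart + tokEnd, compacted: dtEnd };
}

console.log(computeTokenUsage(50000, 80000, 200000, "2025-01-01 10:30"));
console.log(computeTokenUsage(180000, 30000, 200000, "2025-01-01 10:30"));
```

The first call reports 30000 tokens with no compaction. The second reports 50000 tokens (20000 before the reset plus 30000 after) and records the end timestamp in the compaction field.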
|
|
164
|
+
|
|
165
|
+
## Step 5: Compile Verification Report
|
|
166
|
+
|
|
167
|
+
Create or update `.gsd-t/verify-report.md`:
|
|
168
|
+
|
|
169
|
+
```markdown
|
|
170
|
+
# Verification Report — {date}
|
|
171
|
+
|
|
172
|
+
## Milestone: {name}
|
|
173
|
+
|
|
174
|
+
## Summary
|
|
175
|
+
- Functional: {PASS/WARN/FAIL} — {X}/{Y} criteria met
|
|
176
|
+
- Contracts: {PASS/WARN/FAIL} — {X}/{Y} contracts compliant
|
|
177
|
+
- Code Quality: {PASS/WARN/FAIL} — {N} issues found
|
|
178
|
+
- Unit Tests: {PASS/WARN/FAIL} — {N}/{total} passing
|
|
179
|
+
- E2E Tests: {PASS/WARN/FAIL} — {N}/{total} specs passing
|
|
180
|
+
- Security: {PASS/WARN/FAIL} — {N} findings
|
|
181
|
+
- Integration: {PASS/WARN/FAIL}
|
|
182
|
+
|
|
183
|
+
## Overall: {PASS / CONDITIONAL PASS / FAIL}
|
|
184
|
+
|
|
185
|
+
## Findings
|
|
186
|
+
|
|
187
|
+
### Critical (must fix before milestone complete)
|
|
188
|
+
1. {finding} — {domain} — {remediation}
|
|
189
|
+
|
|
190
|
+
### Warnings (should fix, not blocking)
|
|
191
|
+
1. {finding} — {domain} — {remediation}
|
|
192
|
+
|
|
193
|
+
### Notes (informational)
|
|
194
|
+
1. {observation}
|
|
195
|
+
|
|
196
|
+
## Remediation Tasks
|
|
197
|
+
| # | Domain | Description | Priority |
|
|
198
|
+
|---|--------|-------------|----------|
|
|
199
|
+
| 1 | auth | Fix missing role in user response | CRITICAL |
|
|
200
|
+
| 2 | ui | Add loading states for async calls | WARN |
|
|
201
|
+
```
|
|
202
|
+
|
|
203
|
+
## Step 5.25: Metrics Quality Budget Check
|
|
204
|
+
|
|
205
|
+
Check task-metrics for the current milestone to detect quality budget violations:
|
|
206
|
+
|
|
207
|
+
1. Run via Bash:
|
|
208
|
+
`node -e "const c = require('./bin/metrics-collector.js'); const r = c.readTaskMetrics({milestone: '{milestone-id}'}); if(!r.length){console.log('No metrics data — quality budget check skipped');process.exit(0);} const pass=r.filter(t=>t.fix_cycles===0&&t.pass).length; const rate=pass/r.length; console.log('First-pass rate: '+(rate*100).toFixed(1)+'% ('+pass+'/'+r.length+')'); if(rate<0.6) console.log('⚠️ Quality budget WARNING: first-pass rate below 60%');" 2>/dev/null || true`
|
|
209
|
+
|
|
210
|
+
2. Run heuristics check via Bash:
|
|
211
|
+
`node -e "const m=require('./bin/metrics-rollup.js'); const r=m.readRollups({milestone:'{milestone-id}'}); if(r.length&&r[r.length-1].heuristic_flags.some(f=>f.severity==='HIGH')) console.log('⚠️ HIGH severity heuristic flag detected — review before completing milestone');" 2>/dev/null || true`
|
|
212
|
+
|
|
213
|
+
3. Display quality metrics summary inline. Quality budget violation is a **WARNING** (non-blocking) — does not fail verify.
|
|
214
|
+
|
|
215
|
+
4. Include quality budget status in the verification report (Step 5):
|
|
216
|
+
`- Quality Budget: {PASS/WARN} — first-pass rate {N}%{, HIGH heuristic: {name} if any}`
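
For readability, the first-pass-rate check from the one-liner in step 1 can be unpacked as below. The field names `fix_cycles` and `pass` come from that one-liner; the function names and the `"SKIPPED"` label for the no-data case are assumptions made for this sketch.

```javascript
// Fraction of tasks that passed with zero fix cycles, or null if there is no data.
function firstPassRate(tasks) {
  if (tasks.length === 0) return null;
  const passed = tasks.filter((t) => t.fix_cycles === 0 && t.pass).length;
  return passed / tasks.length;
}

// A rate below the threshold is a WARN only; it never fails verification.
function qualityBudgetStatus(tasks, threshold = 0.6) {
  const rate = firstPassRate(tasks);
  if (rate === null) return "SKIPPED"; // hypothetical label for "no metrics data"
  return rate < threshold ? "WARN" : "PASS";
}

const sample = [
  { fix_cycles: 0, pass: true },
  { fix_cycles: 2, pass: true },
  { fix_cycles: 0, pass: false },
];
console.log(qualityBudgetStatus(sample));
```

Only one of the three sample tasks passed on the first attempt, so the rate is about 33% and the status is `WARN`.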
|
|
217
|
+
|
|
218
|
+
## Step 5.5: Goal-Backward Verification (Post-Gate Behavior Check)
|
|
219
|
+
|
|
220
|
+
This step runs **after all 8 quality gates pass**. It verifies that milestone goals are actually achieved end-to-end — not just structurally present. It catches placeholder implementations that pass all structural gates.
|
|
221
|
+
|
|
222
|
+
Refer to `.gsd-t/contracts/goal-backward-contract.md` for the full verification flow, placeholder patterns, and findings report format.
|
|
223
|
+
|
|
224
|
+
### 5.5.1 Load Milestone Goals and Requirements
|
|
225
|
+
|
|
226
|
+
1. Read `.gsd-t/progress.md` — extract the current milestone name and goals
|
|
227
|
+
2. Read `docs/requirements.md` — identify **critical requirements** (skip trivial/low-priority items)
|
|
228
|
+
|
|
229
|
+
### 5.5.2 Trace Requirements to Behavior
|
|
230
|
+
|
|
231
|
+
For each critical requirement:
|
|
232
|
+
|
|
233
|
+
1. **If `.gsd-t/graph/meta.json` exists (graph available)**:
|
|
234
|
+
- Trace the requirement → code path → behavior chain using graph queries
|
|
235
|
+
- Use `getRequirementFor`, `getCallers`, and `getTestsFor` to build the chain
|
|
236
|
+
- Flag requirements with no traceable code path as CRITICAL findings
|
|
237
|
+
|
|
238
|
+
2. **If graph is not available (fallback to grep)**:
|
|
239
|
+
- Search the codebase for the feature/function implementing each requirement
|
|
240
|
+
- Trace from entry point → core logic → output/response
|
|
241
|
+
|
|
242
|
+
### 5.5.3 Scan for Placeholder Patterns
|
|
243
|
+
|
|
244
|
+
For each file identified in the requirement traces above, scan for these placeholder patterns:
|
|
245
|
+
|
|
246
|
+
| Pattern | Detection Hint | Severity |
|
|
247
|
+
|---------|---------------|----------|
|
|
248
|
+
| console.log placeholder | `console.log.*TODO\|console.log.*implement` | CRITICAL |
|
|
249
|
+
| TODO/FIXME in implementation | `// TODO\|// FIXME\|# TODO\|# FIXME` in non-test files | CRITICAL |
|
|
250
|
+
| Empty function body | `function \w+\(\) \{\}` or `\(\) => \{\}` with no logic | CRITICAL |
|
|
251
|
+
| Throw not-implemented | `throw new Error.*not implemented\|throw new Error.*TODO` | CRITICAL |
|
|
252
|
+
| Hardcoded return | `return "success"\|return true` with no conditional logic | HIGH |
|
|
253
|
+
| Static UI text | Static `<span>` or text that never updates based on state | HIGH |
|
|
254
|
+
| Pass-through stub | `return input\|return req\|return data` with no transformation | MEDIUM |
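
A line-based scan for some of these patterns might look like the sketch below. It covers only three of the rows, translated from the grep-style hints into JavaScript regular expressions. The `scanSource` name and the finding shape are hypothetical, and the authoritative pattern list remains the one in the goal-backward contract.

```javascript
// Illustrative subset of the table above, expressed as JavaScript regexes.
const PLACEHOLDER_PATTERNS = [
  { name: "console.log placeholder", regex: /console\.log.*(TODO|implement)/, severity: "CRITICAL" },
  { name: "TODO/FIXME in implementation", regex: /(\/\/|#) (TODO|FIXME)/, severity: "CRITICAL" },
  { name: "Throw not-implemented", regex: /throw new Error.*(not implemented|TODO)/, severity: "CRITICAL" },
];

// Returns one finding per (line, pattern) match, with 1-based line numbers.
function scanSource(file, source) {
  const findings = [];
  source.split("\n").forEach((text, index) => {
    for (const { name, regex, severity } of PLACEHOLDER_PATTERNS) {
      if (regex.test(text)) {
        findings.push({ file, line: index + 1, pattern: name, severity });
      }
    }
  });
  return findings;
}

const sample = [
  "function save(record) {",
  "  // TODO wire up storage",
  "  throw new Error('not implemented');",
  "}",
].join("\n");
console.log(scanSource("src/save.js", sample));
```

Patterns such as "Hardcoded return" or "Static UI text" depend on surrounding logic, so a per-line regex like this is not enough for them and they still need a manual read.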
|
|
255
|
+
|
|
256
|
+
### 5.5.4 Produce Findings Report
|
|
257
|
+
|
|
258
|
+
Format findings per the goal-backward-contract.md report format:
|
|
259
|
+
|
|
260
|
+
```markdown
|
|
261
|
+
## Goal-Backward Verification Report
|
|
262
|
+
|
|
263
|
+
### Status: PASS | FAIL
|
|
264
|
+
|
|
265
|
+
### Findings
|
|
266
|
+
| # | Requirement | File:Line | Pattern | Severity | Description |
|
|
267
|
+
|---|-------------|-----------|---------|----------|-------------|
|
|
268
|
+
| 1 | {req-id} | {path}:{line} | {pattern} | {severity} | {what's wrong} |
|
|
269
|
+
|
|
270
|
+
### Summary
|
|
271
|
+
- Requirements checked: {N}
|
|
272
|
+
- Findings: {N} ({critical}, {high}, {medium})
|
|
273
|
+
- Verdict: {PASS if 0 critical/high, FAIL otherwise}
|
|
274
|
+
```
|
|
275
|
+
|
|
276
|
+
### 5.5.5 Apply Blocking Rules
|
|
277
|
+
|
|
278
|
+
- **CRITICAL or HIGH findings** → Goal-Backward status = **FAIL** — block verification
|
|
279
|
+
- Append findings to the Critical section of the verification report (Step 5)
|
|
280
|
+
- Set overall verification status to FAIL
|
|
281
|
+
- **MEDIUM findings** → Goal-Backward status = **WARN** — log but do not block
|
|
282
|
+
- Append findings to the Warnings section of the verification report (Step 5)
|
|
283
|
+
- **No findings** → Goal-Backward status = **PASS** — add to verification report summary
|
|
284
|
+
|
|
285
|
+
Add a `Goal-Backward:` line to the Step 5 verification report summary:
|
|
286
|
+
```
|
|
287
|
+
- Goal-Backward: {PASS/WARN/FAIL} — {N} requirements checked, {N} findings ({critical} critical, {high} high, {medium} medium)
|
|
288
|
+
```
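
The blocking rules in 5.5.5 reduce to a severity check over the findings list. A minimal sketch, assuming each finding has a `severity` field holding one of the values from the pattern table (the `goalBackwardStatus` name is hypothetical):

```javascript
// CRITICAL or HIGH blocks verification; MEDIUM only warns; no findings passes.
function goalBackwardStatus(findings) {
  const severities = findings.map((f) => f.severity);
  if (severities.includes("CRITICAL") || severities.includes("HIGH")) return "FAIL";
  if (severities.includes("MEDIUM")) return "WARN";
  return "PASS";
}

console.log(goalBackwardStatus([]));
console.log(goalBackwardStatus([{ severity: "MEDIUM" }]));
console.log(goalBackwardStatus([{ severity: "MEDIUM" }, { severity: "HIGH" }]));
```

The three calls print `PASS`, `WARN`, and `FAIL` respectively: the most severe finding decides the status.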
|
|
289
|
+
|
|
290
|
+
---
|
|
291
|
+
|
|
292
|
+
## Step 6: Handle Remediation
|
|
293
|
+
|
|
294
|
+
If there are CRITICAL findings:
|
|
295
|
+
1. Create remediation tasks in the affected domain's `tasks.md`
|
|
296
|
+
2. Execute fixes (solo — don't spawn teams for remediation)
|
|
297
|
+
3. Re-verify the specific findings
|
|
298
|
+
4. Update the verification report
|
|
299
|
+
|
|
300
|
+
## Step 7: Update State
|
|
301
|
+
|
|
302
|
+
Update `.gsd-t/progress.md`:
|
|
303
|
+
- If all PASS: Set status to `VERIFIED`
|
|
304
|
+
- If CONDITIONAL PASS: Set status to `VERIFIED-WITH-WARNINGS`, list warnings
|
|
305
|
+
- If FAIL: Set status to `VERIFY-FAILED`, list required remediations
|
|
306
|
+
- Record verification date and summary
|
|
307
|
+
|
|
308
|
+
### Autonomy Behavior
|
|
309
|
+
|
|
310
|
+
**All Levels**:
|
|
311
|
+
- VERIFIED or CONDITIONAL PASS → **Auto-invoke complete-milestone** (see Step 8 below). Completing a verified milestone is mechanical — there is no judgment call that benefits from user review.
|
|
312
|
+
- FAIL → **Level 3**: Auto-execute remediation tasks (up to 2 fix attempts). If still failing after 2 attempts:
|
|
313
|
+
1. Write failure context to `.gsd-t/debug-state.jsonl` via `node -e "require('./bin/debug-ledger.js').appendEntry('.', {iteration:1,timestamp:new Date().toISOString(),test:'verify-remediation',error:'2 in-context fix attempts exhausted',hypothesis:'see verify-report.md',fix:'n/a',fixFiles:[],result:'STILL_FAILS',learning:'delegating to headless debug-loop',model:'sonnet',duration:0})"`
|
|
314
|
+
2. Log: "Delegating to headless debug-loop (2 in-context attempts exhausted)"
|
|
315
|
+
3. Run: `gsd-t headless --debug-loop --max-iterations 10`
|
|
316
|
+
4. Exit code 0 → re-run verification; 1/4 → log to `.gsd-t/deferred-items.md`, STOP and report to user; 3 → report error
|
|
317
|
+
**Level 1–2**: Return to execute phase for remediation tasks.
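
The exit-code handling for the headless debug-loop in the Level 3 path above can be summarized as a lookup. This sketch only restates the mapping given in that list. The function name is hypothetical, and any code other than 0, 1, 3, or 4 is treated as unspecified because the text does not define it.

```javascript
// Maps the debug-loop exit code to the follow-up action described above.
function nextActionForExitCode(code) {
  switch (code) {
    case 0:
      return "re-run verification";
    case 1:
    case 4:
      return "log to .gsd-t/deferred-items.md, stop, and report to user";
    case 3:
      return "report error";
    default:
      return "unspecified"; // assumption: other codes are not defined in this document
  }
}

for (const code of [0, 1, 3, 4]) {
  console.log(code, "->", nextActionForExitCode(code));
}
```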
|
|
318
|
+
|
|
319
|
+
## Document Ripple
|
|
320
|
+
|
|
321
|
+
### Always update:
|
|
322
|
+
1. **`.gsd-t/progress.md`** — Set status to VERIFIED/VERIFY-FAILED, log verification summary
|
|
323
|
+
2. **`.gsd-t/verify-report.md`** — Created with full verification results (Step 5)
|
|
324
|
+
|
|
325
|
+
### Check if affected:
|
|
326
|
+
3. **`.gsd-t/domains/{domain}/tasks.md`** — If remediation tasks were created (Step 6)
|
|
327
|
+
4. **`.gsd-t/techdebt.md`** — If verification found new quality or security issues, add as debt
|
|
328
|
+
5. **`docs/requirements.md`** — If verification revealed unmet requirements, update status
|
|
329
|
+
|
|
330
|
+
## Step 8: Auto-Invoke Complete-Milestone
|
|
331
|
+
|
|
332
|
+
**This step is MANDATORY and runs at ALL autonomy levels.** Completing a verified milestone is a mechanical operation (archive, tag, bump version, update docs). There is no decision that benefits from user review — the decision was made when verification passed.
|
|
333
|
+
|
|
334
|
+
If status is VERIFY-FAILED:
|
|
335
|
+
- Do NOT invoke complete-milestone
|
|
336
|
+
- Report failures and stop
|
|
337
|
+
|
|
338
|
+
If status is VERIFIED or VERIFIED-WITH-WARNINGS:
|
|
339
|
+
1. Log: "✅ Verify complete — spawning complete-milestone agent..."
|
|
340
|
+
|
|
341
|
+
**OBSERVABILITY LOGGING (MANDATORY):**
|
|
342
|
+
Before spawning — run via Bash:
|
|
343
|
+
`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
|
|
344
|
+
|
|
345
|
+
2. Spawn a Task subagent (model: sonnet, mode: bypassPermissions):
|
|
346
|
+
```
|
|
347
|
+
"Execute the complete-milestone phase of the current GSD-T milestone.
|
|
348
|
+
|
|
349
|
+
Read and follow the full instructions in commands/gsd-t-complete-milestone.md
|
|
350
|
+
(resolve from ~/.claude/commands/ if not in project).
|
|
351
|
+
Read .gsd-t/progress.md for current milestone and state.
|
|
352
|
+
Read CLAUDE.md for project conventions.
|
|
353
|
+
Read .gsd-t/contracts/ for domain interfaces.
|
|
354
|
+
|
|
355
|
+
Complete the phase fully:
|
|
356
|
+
- Follow every step in the command file
|
|
357
|
+
- Update .gsd-t/progress.md status when done
|
|
358
|
+
- Run document ripple as specified
|
|
359
|
+
- Commit your work
|
|
360
|
+
|
|
361
|
+
Report back: one-line status summary."
|
|
362
|
+
```
|
|
363
|
+
|
|
364
|
+
After subagent returns — run via Bash:
|
|
365
|
+
`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
|
|
366
|
+
Compute tokens and compaction:
|
|
367
|
+
- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
|
|
368
|
+
- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
|
|
369
|
+
Append to `.gsd-t/token-log.md`:
|
|
370
|
+
`| {DT_START} | {DT_END} | gsd-t-verify | Step 8 | sonnet | {DURATION}s | auto-complete-milestone | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
|
|
371
|
+
|
|
372
|
+
3. Verify subagent result: Read `.gsd-t/progress.md` — confirm status is COMPLETED. If not, report the failure.
|
|
373
|
+
|
|
374
|
+
**Why this is mandatory**: Without auto-completion, verified milestones remain in VERIFIED state indefinitely. Requirements stay unmarked, progress.md is stale, and future sessions cannot tell the work was done. This is the root cause of "GSD-T forgot it did this work" — the milestone was built and verified but never formally completed.
|
|
375
|
+
|
|
376
|
+
**Why a subagent**: Complete-milestone is a 12-step process (gap analysis, archive, version bump, git tag, doc ripple). Verify is already heavy with 8+ quality gates. Spawning a fresh-context subagent avoids compaction risk — and complete-milestone loads everything it needs from files (progress.md, verify-report.md, contracts).
|
|
377
|
+
|
|
378
|
+
$ARGUMENTS
|
|
379
|
+
|
|
380
|
+
## Auto-Clear
|
|
381
|
+
|
|
382
|
+
All work is committed to project files. Execute `/clear` to free the context window for the next command.
|