@tekyzinc/gsd-t 2.50.12 → 2.53.10

Files changed (99)
  1. package/CHANGELOG.md +24 -0
  2. package/README.md +379 -372
  3. package/bin/component-registry.js +250 -0
  4. package/bin/graph-cgc.js +510 -510
  5. package/bin/graph-indexer.js +147 -147
  6. package/bin/graph-overlay.js +195 -195
  7. package/bin/graph-parsers.js +327 -327
  8. package/bin/graph-query.js +453 -452
  9. package/bin/graph-store.js +154 -154
  10. package/bin/qa-calibrator.js +194 -0
  11. package/bin/scan-data-collector.js +153 -153
  12. package/bin/scan-diagrams-generators.js +187 -187
  13. package/bin/scan-diagrams.js +79 -79
  14. package/bin/scan-renderer.js +92 -92
  15. package/bin/scan-report-sections.js +121 -121
  16. package/bin/scan-report.js +184 -184
  17. package/bin/scan-schema-parsers.js +199 -199
  18. package/bin/scan-schema.js +103 -103
  19. package/bin/token-budget.js +246 -0
  20. package/commands/Claude-md.md +10 -10
  21. package/commands/branch.md +15 -15
  22. package/commands/checkin.md +45 -45
  23. package/commands/global-change.md +209 -209
  24. package/commands/gsd-t-audit.md +199 -0
  25. package/commands/gsd-t-backlog-add.md +94 -94
  26. package/commands/gsd-t-backlog-edit.md +111 -111
  27. package/commands/gsd-t-backlog-list.md +63 -63
  28. package/commands/gsd-t-backlog-move.md +94 -94
  29. package/commands/gsd-t-backlog-promote.md +123 -123
  30. package/commands/gsd-t-backlog-remove.md +86 -86
  31. package/commands/gsd-t-backlog-settings.md +158 -158
  32. package/commands/gsd-t-complete-milestone.md +528 -515
  33. package/commands/gsd-t-debug.md +506 -399
  34. package/commands/gsd-t-discuss.md +174 -174
  35. package/commands/gsd-t-execute.md +758 -634
  36. package/commands/gsd-t-feature.md +276 -276
  37. package/commands/gsd-t-health.md +142 -142
  38. package/commands/gsd-t-help.md +465 -457
  39. package/commands/gsd-t-impact.md +302 -302
  40. package/commands/gsd-t-init.md +320 -280
  41. package/commands/gsd-t-integrate.md +365 -249
  42. package/commands/gsd-t-milestone.md +87 -87
  43. package/commands/gsd-t-partition.md +442 -361
  44. package/commands/gsd-t-pause.md +82 -82
  45. package/commands/gsd-t-plan.md +345 -344
  46. package/commands/gsd-t-populate.md +111 -111
  47. package/commands/gsd-t-prd.md +326 -326
  48. package/commands/gsd-t-project.md +211 -211
  49. package/commands/gsd-t-promote-debt.md +123 -123
  50. package/commands/gsd-t-prompt.md +137 -137
  51. package/commands/gsd-t-qa.md +266 -266
  52. package/commands/gsd-t-quick.md +357 -234
  53. package/commands/gsd-t-reflect.md +134 -134
  54. package/commands/gsd-t-resume.md +72 -72
  55. package/commands/gsd-t-scan.md +615 -615
  56. package/commands/gsd-t-setup.md +76 -0
  57. package/commands/gsd-t-status.md +192 -166
  58. package/commands/gsd-t-test-sync.md +381 -381
  59. package/commands/gsd-t-triage-and-merge.md +171 -171
  60. package/commands/gsd-t-verify.md +382 -382
  61. package/commands/gsd-t-visualize.md +118 -118
  62. package/commands/gsd-t-wave.md +401 -378
  63. package/docs/GSD-T-README.md +425 -422
  64. package/docs/architecture.md +385 -369
  65. package/docs/harness-design-analysis.md +371 -0
  66. package/docs/infrastructure.md +205 -205
  67. package/docs/prd-graph-engine.md +398 -398
  68. package/docs/prd-gsd2-hybrid.md +559 -559
  69. package/docs/prd-harness-evolution.md +583 -0
  70. package/docs/requirements.md +14 -0
  71. package/docs/workflows.md +226 -226
  72. package/examples/.gsd-t/domains/example-domain/scope.md +13 -13
  73. package/package.json +40 -40
  74. package/scripts/gsd-t-auto-route.js +39 -39
  75. package/scripts/gsd-t-dashboard-mockup.html +1143 -1143
  76. package/scripts/gsd-t-dashboard-server.js +171 -171
  77. package/scripts/gsd-t-dashboard.html +262 -262
  78. package/scripts/gsd-t-event-writer.js +128 -128
  79. package/scripts/gsd-t-statusline.js +94 -94
  80. package/scripts/gsd-t-tools.js +175 -175
  81. package/templates/CLAUDE-global.md +639 -614
  82. package/templates/CLAUDE-project.md +24 -0
  83. package/templates/backlog-settings.md +18 -18
  84. package/templates/backlog.md +1 -1
  85. package/templates/progress.md +40 -40
  86. package/templates/shared-services-contract.md +60 -60
  87. package/templates/stacks/desktop.ini +2 -2
  88. package/bin/desktop.ini +0 -2
  89. package/commands/desktop.ini +0 -2
  90. package/docs/ci-examples/desktop.ini +0 -2
  91. package/docs/desktop.ini +0 -2
  92. package/examples/.gsd-t/contracts/desktop.ini +0 -2
  93. package/examples/.gsd-t/desktop.ini +0 -2
  94. package/examples/.gsd-t/domains/desktop.ini +0 -2
  95. package/examples/.gsd-t/domains/example-domain/desktop.ini +0 -2
  96. package/examples/desktop.ini +0 -2
  97. package/examples/rules/desktop.ini +0 -2
  98. package/scripts/desktop.ini +0 -2
  99. package/templates/desktop.ini +0 -2
@@ -1,382 +1,382 @@
1
- # GSD-T: Verify — Quality Gates (Solo or Parallel)
2
-
3
- You are the lead agent coordinating verification of the completed work. Each verification dimension should be thorough and independent.
4
-
5
- ## Step 1: Load State
6
-
7
- Read:
8
- 1. `CLAUDE.md`
9
- 2. `.gsd-t/progress.md` — confirm status is INTEGRATED
10
- 3. `.gsd-t/contracts/` — all contracts
11
- 4. `.gsd-t/domains/*/tasks.md` — all acceptance criteria
12
- 5. `docs/requirements.md` — original requirements
13
- 6. All source code
14
-
15
- ## Step 1.5: Graph-Enhanced Traceability Check
16
-
17
- If `.gsd-t/graph/meta.json` exists (graph index is available):
18
- 1. Query `getRequirementFor` on implemented entities to build a requirement-to-code traceability chain — flag entities with no requirement mapping
19
- 2. Query `getDomainBoundaryViolations` to verify no cross-domain boundary violations exist in the final codebase
20
- 3. Include any violations as FAIL findings in the verification report (Step 5)
21
-
22
- If graph is not available, skip this step.
23
-
24
- ## Step 2: Full Test Audit (Inline)
25
-
26
- Run the full test audit directly:
27
-
28
- 1. Run the full test suite: `npm test` (or project equivalent) — record pass/fail counts
29
- 2. Read all contracts in `.gsd-t/contracts/` — verify each has at least one test validating it
30
- 3. Check acceptance criteria from domain task lists — verify each is tested
31
- 4. Run E2E suite if `playwright.config.*` exists
32
- 5. Report: comprehensive test results with pass/fail counts and coverage gaps
33
-
34
- Verification cannot complete if any test fails or critical contract gaps remain.
35
-
36
- ## Step 2.5: High-Risk Domain Gate (MANDATORY — Categories 2 and 7)
37
-
38
- Before running standard verification dimensions, check whether this milestone involves any high-risk domain:
39
-
40
- **High-risk domains**: audio capture/playback, GPU/WebGPU/WebGL, ML/inference/model loading, background workers, native APIs (camera, bluetooth, filesystem), IPC, WebAssembly, real-time data streams.
41
-
42
- **If any high-risk domain is present:**
43
-
44
- ### Category 2 — Technology Reliability Gate
45
- Initialization success does not prove runtime correctness. These technologies can initialize cleanly and fail silently at runtime (compute shader errors, audio context state loss, worker message drops, inference failures).
46
-
47
- For each high-risk domain:
48
- 1. A **smoke test script** must exist that exercises actual runtime behavior — not just initialization
49
- 2. The smoke test must have been run and passed
50
- 3. "It initialized without throwing" is NOT a passing smoke test
51
- 4. If no smoke test exists → create one now before proceeding with any other verification dimension
52
- 5. Smoke test failure → verification FAIL (not WARN)
53
-
54
- ### Category 7 — Manual QA as Test Gate
55
- "The user will manually test it" is not a test artifact. Scan the milestone's domains for any feature whose acceptance criteria relies solely on manual user testing.
56
-
57
- For each such feature:
58
- 1. A smoke test script must exist that automates as much of the verification as possible
59
- 2. Any remaining manual steps must be explicitly documented in `.gsd-t/smoke-tests/{feature}.md` with exact steps and expected outcomes
60
- 3. The documented manual steps must have been executed and passed (noted in the file)
61
- 4. If neither automated smoke test nor documented manual procedure exists → verification FAIL
62
-
63
- > These gates exist because the pre-commit checklist "did you run the affected tests?" is meaningless when the only test is "user presses Ctrl+Space." That is not a test. It is hope.
64
-
65
- ---
66
-
67
- ## Step 3: Define Verification Dimensions
68
-
69
- Standard dimensions (adjust based on project):
70
-
71
- 1. **Functional Correctness**: Does it work per requirements?
72
- 2. **Contract Compliance**: Does every domain honor its contracts?
73
- 3. **Code Quality**: Conventions, patterns, error handling, readability
74
- 4. **Test Coverage Completeness**: Every new or changed code path MUST have tests. Check:
75
- - Do all new functions have unit tests (happy path + edge cases + error cases)?
76
- - Do all new features/modes/flows have Playwright E2E specs?
77
- - Do all new UI components have interaction tests?
78
- - **Zero test coverage on new functionality = FAIL** (not WARN, not "nice to have" — FAIL)
79
- 5. **E2E Tests**: Run the FULL Playwright suite — all specs must pass. If new features lack specs, create them before proceeding.
80
- 6. **Security**: Auth flows, input validation, data exposure, dependencies
81
- 7. **Integration Integrity**: Do the seams between domains hold under stress?
82
- 8. **Requirements Traceability Close-Out**: Mark verified requirements as complete and report orphans:
83
- - Read `docs/requirements.md` traceability table (added by plan phase)
84
- - For each REQ-ID that is fully implemented and tested: update Status to `complete` in the traceability table
85
- - **Orphan report**: List any REQ-IDs with no task mapping (planning gap) and any tasks with no REQ-ID (potential scope creep)
86
- - Orphaned requirements = WARN (not blocking unless critical)
87
- - Update `docs/requirements.md` with the close-out results
88
-
89
- ## Step 4: Execute Verification
90
-
91
- ### Solo Mode (default)
92
- Work through each dimension sequentially. For each:
93
- 1. Define what you're checking
94
- 2. Check it systematically
95
- 3. Record findings as PASS / WARN / FAIL with specifics
96
- 4. If FAIL, create a remediation task
97
-
98
- **Mandatory test execution:**
99
- 1. Run ALL unit/integration tests — every test must pass
100
- 2. Detect Playwright (check for `playwright.config.*`, Playwright deps in package.json)
101
- 3. Run the FULL Playwright E2E suite — every spec must pass
102
- 4. **Coverage audit**: For every new feature, mode, page, or flow added in this milestone:
103
- - Confirm Playwright specs exist that specifically test it
104
- - Confirm specs cover: happy path, error states, edge cases, all modes/flags
105
- - If specs are missing or incomplete → invoke `gsd-t-test-sync` to create them, then re-run
106
- - **Missing E2E coverage on new functionality = verification FAIL**
107
- 5. **Functional test quality audit**: Read every Playwright spec. For each `test()` block, verify assertions check **functional behavior** (state changed after action, data loaded, content updated, widget responded) — NOT just element existence (`isVisible`, `toBeAttached`, `toBeEnabled`). A test that would pass on an empty HTML page with the right element IDs is a **shallow test** and counts as a verification FAIL. Flag shallow tests and rewrite them before proceeding.
108
- 6. Tests are NOT optional — verification cannot pass without running them and confirming comprehensive, functional coverage
109
-
110
- ### Team Mode (when agent teams are enabled)
111
- ```
112
- Create an agent team for verification:
113
-
114
- ALL TEAMMATES read first:
115
- 1. CLAUDE.md
116
- 2. .gsd-t/contracts/ — all contracts
117
- 3. .gsd-t/domains/*/tasks.md — acceptance criteria
118
- 4. docs/requirements.md
119
-
120
- Teammate assignments:
121
- - Teammate "functional":
122
- Verify every acceptance criterion in every domain's tasks.md.
123
- Test each user flow end-to-end.
124
- Report: list of criteria with PASS/FAIL status.
125
-
126
- - Teammate "contracts":
127
- For each contract in .gsd-t/contracts/:
128
- Verify the implementing code matches exactly.
129
- Check types, shapes, error handling, edge cases.
130
- Report: contract-by-contract compliance status.
131
-
132
- - Teammate "quality":
133
- Review all source code for:
134
- - Consistency with CLAUDE.md conventions
135
- - Error handling completeness
136
- - Code duplication
137
- - Naming consistency
138
- - Dead code or TODOs
139
- Report: file-by-file findings.
140
-
141
- - Teammate "security":
142
- Review for:
143
- - Auth bypass possibilities
144
- - Input validation gaps
145
- - Data exposure in API responses
146
- - Dependency vulnerabilities (run audit if applicable)
147
- - Secret/credential handling
148
- Report: severity-ranked findings.
149
-
150
- Lead: After receiving teammate reports:
151
- **OBSERVABILITY LOGGING (MANDATORY):**
152
- Before spawning — run via Bash:
153
- `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
154
- Spawn a Task subagent to run the full test suite and contract audit.
155
- After subagent returns — run via Bash:
156
- `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
157
- Compute tokens and compaction:
158
- - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
159
- - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
160
- Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
161
- `| {DT_START} | {DT_END} | gsd-t-verify | Step 4 | haiku | {DURATION}s | test audit + contract review | {TOKENS} | {COMPACTED} |`
162
- Collect all reports, synthesize, create remediation plan.
163
- ```
164
-
165
- ## Step 5: Compile Verification Report
166
-
167
- Create or update `.gsd-t/verify-report.md`:
168
-
169
- ```markdown
170
- # Verification Report — {date}
171
-
172
- ## Milestone: {name}
173
-
174
- ## Summary
175
- - Functional: {PASS/WARN/FAIL} — {X}/{Y} criteria met
176
- - Contracts: {PASS/WARN/FAIL} — {X}/{Y} contracts compliant
177
- - Code Quality: {PASS/WARN/FAIL} — {N} issues found
178
- - Unit Tests: {PASS/WARN/FAIL} — {N}/{total} passing
179
- - E2E Tests: {PASS/WARN/FAIL} — {N}/{total} specs passing
180
- - Security: {PASS/WARN/FAIL} — {N} findings
181
- - Integration: {PASS/WARN/FAIL}
182
-
183
- ## Overall: {PASS / CONDITIONAL PASS / FAIL}
184
-
185
- ## Findings
186
-
187
- ### Critical (must fix before milestone complete)
188
- 1. {finding} — {domain} — {remediation}
189
-
190
- ### Warnings (should fix, not blocking)
191
- 1. {finding} — {domain} — {remediation}
192
-
193
- ### Notes (informational)
194
- 1. {observation}
195
-
196
- ## Remediation Tasks
197
- | # | Domain | Description | Priority |
198
- |---|--------|-------------|----------|
199
- | 1 | auth | Fix missing role in user response | CRITICAL |
200
- | 2 | ui | Add loading states for async calls | WARN |
201
- ```
202
-
203
- ## Step 5.25: Metrics Quality Budget Check
204
-
205
- Check task-metrics for the current milestone to detect quality budget violations:
206
-
207
- 1. Run via Bash:
208
- `node -e "const c = require('./bin/metrics-collector.js'); const r = c.readTaskMetrics({milestone: '{milestone-id}'}); if(!r.length){console.log('No metrics data — quality budget check skipped');process.exit(0);} const pass=r.filter(t=>t.fix_cycles===0&&t.pass).length; const rate=pass/r.length; console.log('First-pass rate: '+(rate*100).toFixed(1)+'% ('+pass+'/'+r.length+')'); if(rate<0.6) console.log('⚠️ Quality budget WARNING: first-pass rate below 60%');" 2>/dev/null || true`
209
-
210
- 2. Run heuristics check via Bash:
211
- `node -e "const m=require('./bin/metrics-rollup.js'); const r=m.readRollups({milestone:'{milestone-id}'}); if(r.length&&r[r.length-1].heuristic_flags.some(f=>f.severity==='HIGH')) console.log('⚠️ HIGH severity heuristic flag detected — review before completing milestone');" 2>/dev/null || true`
212
-
213
- 3. Display quality metrics summary inline. Quality budget violation is a **WARNING** (non-blocking) — does not fail verify.
214
-
215
- 4. Include quality budget status in the verification report (Step 5):
216
- `- Quality Budget: {PASS/WARN} — first-pass rate {N}%{, HIGH heuristic: {name} if any}`
217
-
218
- ## Step 5.5: Goal-Backward Verification (Post-Gate Behavior Check)
219
-
220
- This step runs **after all 8 quality gates pass**. It verifies that milestone goals are actually achieved end-to-end — not just structurally present. It catches placeholder implementations that pass all structural gates.
221
-
222
- Refer to `.gsd-t/contracts/goal-backward-contract.md` for the full verification flow, placeholder patterns, and findings report format.
223
-
224
- ### 5.5.1 Load Milestone Goals and Requirements
225
-
226
- 1. Read `.gsd-t/progress.md` — extract the current milestone name and goals
227
- 2. Read `docs/requirements.md` — identify **critical requirements** (skip trivial/low-priority items)
228
-
229
- ### 5.5.2 Trace Requirements to Behavior
230
-
231
- For each critical requirement:
232
-
233
- 1. **If `.gsd-t/graph/meta.json` exists (graph available)**:
234
- - Trace the requirement → code path → behavior chain using graph queries
235
- - Use `getRequirementFor`, `getCallers`, and `getTestsFor` to build the chain
236
- - Flag requirements with no traceable code path as CRITICAL findings
237
-
238
- 2. **If graph is not available (fallback to grep)**:
239
- - Search the codebase for the feature/function implementing each requirement
240
- - Trace from entry point → core logic → output/response
241
-
242
- ### 5.5.3 Scan for Placeholder Patterns
243
-
244
- For each file identified in the requirement traces above, scan for these placeholder patterns:
245
-
246
- | Pattern | Detection Hint | Severity |
247
- |---------|---------------|----------|
248
- | console.log placeholder | `console.log.*TODO\|console.log.*implement` | CRITICAL |
249
- | TODO/FIXME in implementation | `// TODO\|// FIXME\|# TODO\|# FIXME` in non-test files | CRITICAL |
250
- | Empty function body | `function \w+\(\) \{\}` or `\(\) => \{\}` with no logic | CRITICAL |
251
- | Throw not-implemented | `throw new Error.*not implemented\|throw new Error.*TODO` | CRITICAL |
252
- | Hardcoded return | `return "success"\|return true` with no conditional logic | HIGH |
253
- | Static UI text | Static `<span>` or text that never updates based on state | HIGH |
254
- | Pass-through stub | `return input\|return req\|return data` with no transformation | MEDIUM |
255
-
256
- ### 5.5.4 Produce Findings Report
257
-
258
- Format findings per the goal-backward-contract.md report format:
259
-
260
- ```markdown
261
- ## Goal-Backward Verification Report
262
-
263
- ### Status: PASS | FAIL
264
-
265
- ### Findings
266
- | # | Requirement | File:Line | Pattern | Severity | Description |
267
- |---|-------------|-----------|---------|----------|-------------|
268
- | 1 | {req-id} | {path}:{line} | {pattern} | {severity} | {what's wrong} |
269
-
270
- ### Summary
271
- - Requirements checked: {N}
272
- - Findings: {N} ({critical}, {high}, {medium})
273
- - Verdict: {PASS if 0 critical/high, FAIL otherwise}
274
- ```
275
-
276
- ### 5.5.5 Apply Blocking Rules
277
-
278
- - **CRITICAL or HIGH findings** → Goal-Backward status = **FAIL** — block verification
279
- - Append findings to the Critical section of the verification report (Step 5)
280
- - Set overall verification status to FAIL
281
- - **MEDIUM findings** → Goal-Backward status = **WARN** — log but do not block
282
- - Append findings to the Warnings section of the verification report (Step 5)
283
- - **No findings** → Goal-Backward status = **PASS** — add to verification report summary
284
-
285
- Add a `Goal-Backward:` line to the Step 5 verification report summary:
286
- ```
287
- - Goal-Backward: {PASS/WARN/FAIL} — {N} requirements checked, {N} findings ({critical} critical, {high} high, {medium} medium)
288
- ```
289
-
290
- ---
291
-
292
- ## Step 6: Handle Remediation
293
-
294
- If there are CRITICAL findings:
295
- 1. Create remediation tasks in the affected domain's `tasks.md`
296
- 2. Execute fixes (solo — don't spawn teams for remediation)
297
- 3. Re-verify the specific findings
298
- 4. Update the verification report
299
-
300
- ## Step 7: Update State
301
-
302
- Update `.gsd-t/progress.md`:
303
- - If all PASS: Set status to `VERIFIED`
304
- - If CONDITIONAL PASS: Set status to `VERIFIED-WITH-WARNINGS`, list warnings
305
- - If FAIL: Set status to `VERIFY-FAILED`, list required remediations
306
- - Record verification date and summary
307
-
308
- ### Autonomy Behavior
309
-
310
- **All Levels**:
311
- - VERIFIED or CONDITIONAL PASS → **Auto-invoke complete-milestone** (see Step 8 below). Completing a verified milestone is mechanical — there is no judgment call that benefits from user review.
312
- - FAIL → **Level 3**: Auto-execute remediation tasks (up to 2 fix attempts). If still failing after 2 attempts:
313
- 1. Write failure context to `.gsd-t/debug-state.jsonl` via `node -e "require('./bin/debug-ledger.js').appendEntry('.', {iteration:1,timestamp:new Date().toISOString(),test:'verify-remediation',error:'2 in-context fix attempts exhausted',hypothesis:'see verify-report.md',fix:'n/a',fixFiles:[],result:'STILL_FAILS',learning:'delegating to headless debug-loop',model:'sonnet',duration:0})"`
314
- 2. Log: "Delegating to headless debug-loop (2 in-context attempts exhausted)"
315
- 3. Run: `gsd-t headless --debug-loop --max-iterations 10`
316
- 4. Exit code 0 → re-run verification; 1/4 → log to `.gsd-t/deferred-items.md`, STOP and report to user; 3 → report error
317
- **Level 1–2**: Return to execute phase for remediation tasks.
318
-
319
- ## Document Ripple
320
-
321
- ### Always update:
322
- 1. **`.gsd-t/progress.md`** — Set status to VERIFIED/VERIFY-FAILED, log verification summary
323
- 2. **`.gsd-t/verify-report.md`** — Created with full verification results (Step 4)
324
-
325
- ### Check if affected:
326
- 3. **`.gsd-t/domains/{domain}/tasks.md`** — If remediation tasks were created (Step 5)
327
- 4. **`.gsd-t/techdebt.md`** — If verification found new quality or security issues, add as debt
328
- 5. **`docs/requirements.md`** — If verification revealed unmet requirements, update status
329
-
330
- ## Step 8: Auto-Invoke Complete-Milestone
331
-
332
- **This step is MANDATORY and runs at ALL autonomy levels.** Completing a verified milestone is a mechanical operation (archive, tag, bump version, update docs). There is no decision that benefits from user review — the decision was made when verification passed.
333
-
334
- If status is VERIFY-FAILED:
335
- - Do NOT invoke complete-milestone
336
- - Report failures and stop
337
-
338
- If status is VERIFIED or VERIFIED-WITH-WARNINGS:
339
- 1. Log: "✅ Verify complete — spawning complete-milestone agent..."
340
-
341
- **OBSERVABILITY LOGGING (MANDATORY):**
342
- Before spawning — run via Bash:
343
- `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
344
-
345
- 2. Spawn a Task subagent (model: sonnet, mode: bypassPermissions):
346
- ```
347
- "Execute the complete-milestone phase of the current GSD-T milestone.
348
-
349
- Read and follow the full instructions in commands/gsd-t-complete-milestone.md
350
- (resolve from ~/.claude/commands/ if not in project).
351
- Read .gsd-t/progress.md for current milestone and state.
352
- Read CLAUDE.md for project conventions.
353
- Read .gsd-t/contracts/ for domain interfaces.
354
-
355
- Complete the phase fully:
356
- - Follow every step in the command file
357
- - Update .gsd-t/progress.md status when done
358
- - Run document ripple as specified
359
- - Commit your work
360
-
361
- Report back: one-line status summary."
362
- ```
363
-
364
- After subagent returns — run via Bash:
365
- `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
366
- Compute tokens and compaction:
367
- - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
368
- - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
369
- Append to `.gsd-t/token-log.md`:
370
- `| {DT_START} | {DT_END} | gsd-t-verify | Step 8 | sonnet | {DURATION}s | auto-complete-milestone | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
371
-
372
- 3. Verify subagent result: Read `.gsd-t/progress.md` — confirm status is COMPLETED. If not, report the failure.
373
-
374
- **Why this is mandatory**: Without auto-completion, verified milestones remain in VERIFIED state indefinitely. Requirements stay unmarked, progress.md is stale, and future sessions cannot tell the work was done. This is the root cause of "GSD-T forgot it did this work" — the milestone was built and verified but never formally completed.
375
-
376
- **Why a subagent**: Complete-milestone is a 12-step process (gap analysis, archive, version bump, git tag, doc ripple). Verify is already heavy with 8+ quality gates. Spawning a fresh-context subagent avoids compaction risk — and complete-milestone loads everything it needs from files (progress.md, verify-report.md, contracts).
377
-
378
- $ARGUMENTS
379
-
380
- ## Auto-Clear
381
-
382
- All work is committed to project files. Execute `/clear` to free the context window for the next command.
1
+ # GSD-T: Verify — Quality Gates (Solo or Parallel)
2
+
3
+ You are the lead agent coordinating verification of the completed work. Each verification dimension should be thorough and independent.
4
+
5
+ ## Step 1: Load State
6
+
7
+ Read:
8
+ 1. `CLAUDE.md`
9
+ 2. `.gsd-t/progress.md` — confirm status is INTEGRATED
10
+ 3. `.gsd-t/contracts/` — all contracts
11
+ 4. `.gsd-t/domains/*/tasks.md` — all acceptance criteria
12
+ 5. `docs/requirements.md` — original requirements
13
+ 6. All source code
14
+
15
+ ## Step 1.5: Graph-Enhanced Traceability Check
16
+
17
+ If `.gsd-t/graph/meta.json` exists (graph index is available):
18
+ 1. Query `getRequirementFor` on implemented entities to build a requirement-to-code traceability chain — flag entities with no requirement mapping
19
+ 2. Query `getDomainBoundaryViolations` to verify no cross-domain boundary violations exist in the final codebase
20
+ 3. Include any violations as FAIL findings in the verification report (Step 5)
21
+
22
+ If graph is not available, skip this step.
23
+
24
+ ## Step 2: Full Test Audit (Inline)
25
+
26
+ Run the full test audit directly:
27
+
28
+ 1. Run the full test suite: `npm test` (or project equivalent) — record pass/fail counts
29
+ 2. Read all contracts in `.gsd-t/contracts/` — verify each has at least one test validating it
30
+ 3. Check acceptance criteria from domain task lists — verify each is tested
31
+ 4. Run E2E suite if `playwright.config.*` exists
32
+ 5. Report: comprehensive test results with pass/fail counts and coverage gaps
33
+
34
+ Verification cannot complete if any test fails or critical contract gaps remain.
35
+
36
+ ## Step 2.5: High-Risk Domain Gate (MANDATORY — Categories 2 and 7)
37
+
38
+ Before running standard verification dimensions, check whether this milestone involves any high-risk domain:
39
+
40
+ **High-risk domains**: audio capture/playback, GPU/WebGPU/WebGL, ML/inference/model loading, background workers, native APIs (camera, bluetooth, filesystem), IPC, WebAssembly, real-time data streams.
41
+
42
+ **If any high-risk domain is present:**
43
+
44
+ ### Category 2 — Technology Reliability Gate
45
+ Initialization success does not prove runtime correctness. These technologies can initialize cleanly and fail silently at runtime (compute shader errors, audio context state loss, worker message drops, inference failures).
46
+
47
+ For each high-risk domain:
48
+ 1. A **smoke test script** must exist that exercises actual runtime behavior — not just initialization
49
+ 2. The smoke test must have been run and passed
50
+ 3. "It initialized without throwing" is NOT a passing smoke test
51
+ 4. If no smoke test exists → create one now before proceeding with any other verification dimension
52
+ 5. Smoke test failure → verification FAIL (not WARN)
53
+
54
+ ### Category 7 — Manual QA as Test Gate
55
+ "The user will manually test it" is not a test artifact. Scan the milestone's domains for any feature whose acceptance criteria relies solely on manual user testing.
56
+
57
+ For each such feature:
58
+ 1. A smoke test script must exist that automates as much of the verification as possible
59
+ 2. Any remaining manual steps must be explicitly documented in `.gsd-t/smoke-tests/{feature}.md` with exact steps and expected outcomes
60
+ 3. The documented manual steps must have been executed and passed (noted in the file)
61
+ 4. If neither automated smoke test nor documented manual procedure exists → verification FAIL
62
+
63
+ > These gates exist because the pre-commit checklist "did you run the affected tests?" is meaningless when the only test is "user presses Ctrl+Space." That is not a test. It is hope.
64
+
65
+ ---
66
+
67
+ ## Step 3: Define Verification Dimensions
68
+
69
+ Standard dimensions (adjust based on project):
70
+
71
+ 1. **Functional Correctness**: Does it work per requirements?
72
+ 2. **Contract Compliance**: Does every domain honor its contracts?
73
+ 3. **Code Quality**: Conventions, patterns, error handling, readability
74
+ 4. **Test Coverage Completeness**: Every new or changed code path MUST have tests. Check:
75
+ - Do all new functions have unit tests (happy path + edge cases + error cases)?
76
+ - Do all new features/modes/flows have Playwright E2E specs?
77
+ - Do all new UI components have interaction tests?
78
+ - **Zero test coverage on new functionality = FAIL** (not WARN, not "nice to have" — FAIL)
79
+ 5. **E2E Tests**: Run the FULL Playwright suite — all specs must pass. If new features lack specs, create them before proceeding.
80
+ 6. **Security**: Auth flows, input validation, data exposure, dependencies
81
+ 7. **Integration Integrity**: Do the seams between domains hold under stress?
82
+ 8. **Requirements Traceability Close-Out**: Mark verified requirements as complete and report orphans:
83
+ - Read `docs/requirements.md` traceability table (added by plan phase)
84
+ - For each REQ-ID that is fully implemented and tested: update Status to `complete` in the traceability table
85
+ - **Orphan report**: List any REQ-IDs with no task mapping (planning gap) and any tasks with no REQ-ID (potential scope creep)
86
+ - Orphaned requirements = WARN (not blocking unless critical)
87
+ - Update `docs/requirements.md` with the close-out results
88
+
89
+ ## Step 4: Execute Verification
90
+
91
+ ### Solo Mode (default)
92
+ Work through each dimension sequentially. For each:
93
+ 1. Define what you're checking
94
+ 2. Check it systematically
95
+ 3. Record findings as PASS / WARN / FAIL with specifics
96
+ 4. If FAIL, create a remediation task
97
+
98
+ **Mandatory test execution:**
99
+ 1. Run ALL unit/integration tests — every test must pass
100
+ 2. Detect Playwright (check for `playwright.config.*`, Playwright deps in package.json)
101
+ 3. Run the FULL Playwright E2E suite — every spec must pass
102
+ 4. **Coverage audit**: For every new feature, mode, page, or flow added in this milestone:
103
+ - Confirm Playwright specs exist that specifically test it
104
+ - Confirm specs cover: happy path, error states, edge cases, all modes/flags
105
+ - If specs are missing or incomplete → invoke `gsd-t-test-sync` to create them, then re-run
106
+ - **Missing E2E coverage on new functionality = verification FAIL**
107
+ 5. **Functional test quality audit**: Read every Playwright spec. For each `test()` block, verify assertions check **functional behavior** (state changed after action, data loaded, content updated, widget responded) — NOT just element existence (`isVisible`, `toBeAttached`, `toBeEnabled`). A test that would pass on an empty HTML page with the right element IDs is a **shallow test** and counts as a verification FAIL. Flag shallow tests and rewrite them before proceeding.
108
+ 6. Tests are NOT optional — verification cannot pass without running them and confirming comprehensive, functional coverage
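The functional test quality audit in item 5 can be approximated with a rough text heuristic before reading each spec by hand. The sketch below is an illustration only: `isShallowTest` is a hypothetical name, the regex is deliberately simple, and `toBeVisible` is added to the existence-only list as an assumption beyond the three checks named above.

```javascript
// Existence-only checks named in the audit; toBeVisible is an added assumption.
const EXISTENCE_ONLY = new Set(['isVisible', 'toBeAttached', 'toBeEnabled', 'toBeVisible']);

// Hypothetical heuristic: collect assertion-style method names (.toX(...) or
// .isVisible(...)) and flag the test as shallow when none of them checks behavior.
function isShallowTest(testBody) {
  const names = [...testBody.matchAll(/\.(to[A-Z]\w*|isVisible)\s*\(/g)].map((m) => m[1]);
  if (names.length === 0) return true; // no assertions at all
  return names.every((name) => EXISTENCE_ONLY.has(name));
}

const shallow = `await expect(page.locator('#save')).toBeVisible();`;
const functional = `
  await page.locator('#save').click();
  await expect(page.locator('#status')).toHaveText('Saved');
`;
console.log(isShallowTest(shallow));    // true
console.log(isShallowTest(functional)); // false
```

A text match like this only narrows the list of suspects; the audit still requires reading each flagged `test()` block.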
109
+
110
+ ### Team Mode (when agent teams are enabled)
111
+ ```
112
+ Create an agent team for verification:
113
+
114
+ ALL TEAMMATES read first:
115
+ 1. CLAUDE.md
116
+ 2. .gsd-t/contracts/ — all contracts
117
+ 3. .gsd-t/domains/*/tasks.md — acceptance criteria
118
+ 4. docs/requirements.md
119
+
120
+ Teammate assignments:
121
+ - Teammate "functional":
122
+ Verify every acceptance criterion in every domain's tasks.md.
123
+ Test each user flow end-to-end.
124
+ Report: list of criteria with PASS/FAIL status.
125
+
126
+ - Teammate "contracts":
127
+ For each contract in .gsd-t/contracts/:
128
+ Verify the implementing code matches exactly.
129
+ Check types, shapes, error handling, edge cases.
130
+ Report: contract-by-contract compliance status.
131
+
132
+ - Teammate "quality":
133
+ Review all source code for:
134
+ - Consistency with CLAUDE.md conventions
135
+ - Error handling completeness
136
+ - Code duplication
137
+ - Naming consistency
138
+ - Dead code or TODOs
139
+ Report: file-by-file findings.
140
+
141
+ - Teammate "security":
142
+ Review for:
143
+ - Auth bypass possibilities
144
+ - Input validation gaps
145
+ - Data exposure in API responses
146
+ - Dependency vulnerabilities (run audit if applicable)
147
+ - Secret/credential handling
148
+ Report: severity-ranked findings.
149
+
150
+ Lead: After receiving teammate reports:
151
+ **OBSERVABILITY LOGGING (MANDATORY):**
152
+ Before spawning — run via Bash:
153
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
154
+ Spawn a Task subagent to run the full test suite and contract audit.
155
+ After subagent returns — run via Bash:
156
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
157
+ Compute tokens and compaction:
158
+ - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
159
+ - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
160
+ Append to `.gsd-t/token-log.md` (create with header `| Datetime-start | Datetime-end | Command | Step | Model | Duration(s) | Notes | Tokens | Compacted |` if missing):
161
+ `| {DT_START} | {DT_END} | gsd-t-verify | Step 4 | haiku | {DURATION}s | test audit + contract review | {TOKENS} | {COMPACTED} |`
162
+ Collect all reports, synthesize, create remediation plan.
163
+ ```
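The token and compaction arithmetic in the lead's logging step can be written out as a small function. This sketch mirrors the two shell formulas above; the function name and the sample values are hypothetical.

```javascript
// Mirrors the shell arithmetic in the block above; the function name is hypothetical.
function computeTokenUsage(tokStart, tokEnd, tokMax, dtEnd) {
  if (tokEnd >= tokStart) {
    // No compaction: usage is the plain difference.
    return { tokens: tokEnd - tokStart, compacted: null };
  }
  // Compaction detected: the formula counts the headroom that was left before
  // compaction plus the tokens used afterwards, so the result is an estimate.
  return { tokens: (tokMax - tokStart) + tokEnd, compacted: dtEnd };
}

console.log(computeTokenUsage(40000, 65000, 200000, '2025-01-01 10:30'));
console.log(computeTokenUsage(180000, 30000, 200000, '2025-01-01 10:30'));
```

The second call models a run where the counter dropped from 180000 to 30000, which the formula reports as 50000 tokens with the end timestamp recorded as the compaction time.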
164
+
165
+ ## Step 5: Compile Verification Report
166
+
167
+ Create or update `.gsd-t/verify-report.md`:
168
+
169
+ ```markdown
170
+ # Verification Report — {date}
171
+
172
+ ## Milestone: {name}
173
+
174
+ ## Summary
175
+ - Functional: {PASS/WARN/FAIL} — {X}/{Y} criteria met
176
+ - Contracts: {PASS/WARN/FAIL} — {X}/{Y} contracts compliant
177
+ - Code Quality: {PASS/WARN/FAIL} — {N} issues found
178
+ - Unit Tests: {PASS/WARN/FAIL} — {N}/{total} passing
179
+ - E2E Tests: {PASS/WARN/FAIL} — {N}/{total} specs passing
180
+ - Security: {PASS/WARN/FAIL} — {N} findings
181
+ - Integration: {PASS/WARN/FAIL}
182
+
183
+ ## Overall: {PASS / CONDITIONAL PASS / FAIL}
184
+
185
+ ## Findings
186
+
187
+ ### Critical (must fix before milestone complete)
188
+ 1. {finding} — {domain} — {remediation}
189
+
190
+ ### Warnings (should fix, not blocking)
191
+ 1. {finding} — {domain} — {remediation}
192
+
193
+ ### Notes (informational)
194
+ 1. {observation}
195
+
196
+ ## Remediation Tasks
197
+ | # | Domain | Description | Priority |
198
+ |---|--------|-------------|----------|
199
+ | 1 | auth | Fix missing role in user response | CRITICAL |
200
+ | 2 | ui | Add loading states for async calls | WARN |
201
+ ```
202
+
203
+ ## Step 5.25: Metrics Quality Budget Check
204
+
205
+ Check task-metrics for the current milestone to detect quality budget violations:
206
+
207
+ 1. Run via Bash:
208
+ `node -e "const c = require('./bin/metrics-collector.js'); const r = c.readTaskMetrics({milestone: '{milestone-id}'}); if(!r.length){console.log('No metrics data — quality budget check skipped');process.exit(0);} const pass=r.filter(t=>t.fix_cycles===0&&t.pass).length; const rate=pass/r.length; console.log('First-pass rate: '+(rate*100).toFixed(1)+'% ('+pass+'/'+r.length+')'); if(rate<0.6) console.log('⚠️ Quality budget WARNING: first-pass rate below 60%');" 2>/dev/null || true`
209
+
210
+ 2. Run heuristics check via Bash:
211
+ `node -e "const m=require('./bin/metrics-rollup.js'); const r=m.readRollups({milestone:'{milestone-id}'}); if(r.length&&r[r.length-1].heuristic_flags.some(f=>f.severity==='HIGH')) console.log('⚠️ HIGH severity heuristic flag detected — review before completing milestone');" 2>/dev/null || true`
212
+
213
+ 3. Display quality metrics summary inline. Quality budget violation is a **WARNING** (non-blocking) — does not fail verify.
214
+
215
+ 4. Include quality budget status in the verification report (Step 5):
216
+ `- Quality Budget: {PASS/WARN} — first-pass rate {N}%{, HIGH heuristic: {name} if any}`
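The one-liner in item 1 is easier to follow when unrolled. The sketch below reproduces its first-pass calculation on in-memory records; the `fix_cycles` and `pass` fields come from the one-liner, while the function name and sample data are hypothetical. It does not cover the heuristic-flag check from item 2.

```javascript
// Record fields follow the one-liner in item 1; the function name is hypothetical.
function firstPassRate(records) {
  if (records.length === 0) return null; // no metrics data: the check is skipped
  const firstPass = records.filter((t) => t.fix_cycles === 0 && t.pass).length;
  const rate = firstPass / records.length;
  return {
    firstPass,
    total: records.length,
    percent: (rate * 100).toFixed(1),
    status: rate < 0.6 ? 'WARN' : 'PASS', // below 60% is a non-blocking warning
  };
}

const sample = [
  { fix_cycles: 0, pass: true },
  { fix_cycles: 2, pass: true },
  { fix_cycles: 0, pass: false },
  { fix_cycles: 0, pass: true },
];
console.log(firstPassRate(sample));
```

Here two of four tasks passed on the first attempt, so the rate is 50.0% and the status is WARN.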
217
+
218
+ ## Step 5.5: Goal-Backward Verification (Post-Gate Behavior Check)
219
+
220
+ This step runs **after all 8 quality gates pass**. It verifies that milestone goals are actually achieved end-to-end — not just structurally present. It catches placeholder implementations that pass all structural gates.
221
+
222
+ Refer to `.gsd-t/contracts/goal-backward-contract.md` for the full verification flow, placeholder patterns, and findings report format.
223
+
224
+ ### 5.5.1 Load Milestone Goals and Requirements
225
+
226
+ 1. Read `.gsd-t/progress.md` — extract the current milestone name and goals
227
+ 2. Read `docs/requirements.md` — identify **critical requirements** (skip trivial/low-priority items)
228
+
229
+ ### 5.5.2 Trace Requirements to Behavior
230
+
231
+ For each critical requirement:
232
+
233
+ 1. **If `.gsd-t/graph/meta.json` exists (graph available)**:
234
+ - Trace the requirement → code path → behavior chain using graph queries
235
+ - Use `getRequirementFor`, `getCallers`, and `getTestsFor` to build the chain
236
+ - Flag requirements with no traceable code path as CRITICAL findings
237
+
238
+ 2. **If graph is not available (fallback to grep)**:
239
+ - Search the codebase for the feature/function implementing each requirement
240
+ - Trace from entry point → core logic → output/response
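The choice between the two tracing paths depends only on whether the graph metadata file exists. A minimal sketch of that check, assuming it runs against a project root directory; `chooseTraceStrategy` is a hypothetical name, not a GSD-T function:

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: returns which tracing approach applies to a project root.
function chooseTraceStrategy(projectRoot) {
  const meta = path.join(projectRoot, '.gsd-t', 'graph', 'meta.json');
  return fs.existsSync(meta) ? 'graph' : 'grep';
}

console.log(chooseTraceStrategy(process.cwd()));
```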
241
+
242
+ ### 5.5.3 Scan for Placeholder Patterns
243
+
244
+ For each file identified in the requirement traces above, scan for these placeholder patterns:
245
+
246
+ | Pattern | Detection Hint | Severity |
247
+ |---------|---------------|----------|
248
+ | console.log placeholder | `console.log.*TODO\|console.log.*implement` | CRITICAL |
249
+ | TODO/FIXME in implementation | `// TODO\|// FIXME\|# TODO\|# FIXME` in non-test files | CRITICAL |
250
+ | Empty function body | `function \w+\(\) \{\}` or `\(\) => \{\}` with no logic | CRITICAL |
251
+ | Throw not-implemented | `throw new Error.*not implemented\|throw new Error.*TODO` | CRITICAL |
252
+ | Hardcoded return | `return "success"\|return true` with no conditional logic | HIGH |
253
+ | Static UI text | Static `<span>` or text that never updates based on state | HIGH |
254
+ | Pass-through stub | `return input\|return req\|return data` with no transformation | MEDIUM |
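The line-level rows of the table translate directly into regular expressions. The sketch below covers only the four CRITICAL patterns, with regexes adapted from the Detection Hint column; the scanner function, its output shape, and the sample source are hypothetical. The HIGH and MEDIUM rows depend on surrounding logic ("with no conditional logic", "with no transformation"), so a line regex alone cannot decide them.

```javascript
// Regexes adapted from the Detection Hint column; only line-level patterns are covered.
const PLACEHOLDER_PATTERNS = [
  { name: 'console.log placeholder', regex: /console\.log.*(TODO|implement)/, severity: 'CRITICAL' },
  { name: 'TODO/FIXME in implementation', regex: /(\/\/|#) (TODO|FIXME)/, severity: 'CRITICAL' },
  { name: 'Empty function body', regex: /function \w+\(\) \{\}|\(\) => \{\}/, severity: 'CRITICAL' },
  { name: 'Throw not-implemented', regex: /throw new Error.*(not implemented|TODO)/, severity: 'CRITICAL' },
];

// Hypothetical scanner: at most one finding per line, first matching pattern wins.
function scanForPlaceholders(filePath, source) {
  const findings = [];
  source.split('\n').forEach((text, index) => {
    const hit = PLACEHOLDER_PATTERNS.find((p) => p.regex.test(text));
    if (hit) findings.push({ file: filePath, line: index + 1, pattern: hit.name, severity: hit.severity });
  });
  return findings;
}

const demo = [
  'function save() {}',
  'function load(id) {',
  "  throw new Error('load not implemented');",
  '}',
].join('\n');
const demoFindings = scanForPlaceholders('src/store.js', demo);
console.log(demoFindings);
```

The `file` and `line` fields line up with the `File:Line` column of the findings report.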
255
+
256
+ ### 5.5.4 Produce Findings Report
257
+
258
+ Format findings per the goal-backward-contract.md report format:
259
+
260
+ ```markdown
261
+ ## Goal-Backward Verification Report
262
+
263
+ ### Status: PASS | WARN | FAIL
264
+
265
+ ### Findings
266
+ | # | Requirement | File:Line | Pattern | Severity | Description |
267
+ |---|-------------|-----------|---------|----------|-------------|
268
+ | 1 | {req-id} | {path}:{line} | {pattern} | {severity} | {what's wrong} |
269
+
270
+ ### Summary
271
+ - Requirements checked: {N}
272
+ - Findings: {N} ({critical}, {high}, {medium})
273
+ - Verdict: {FAIL if any critical/high, WARN if only medium, PASS if no findings}
274
+ ```
275
+
276
+ ### 5.5.5 Apply Blocking Rules
277
+
278
+ - **CRITICAL or HIGH findings** → Goal-Backward status = **FAIL** — block verification
279
+ - Append findings to the Critical section of the verification report (Step 5)
280
+ - Set overall verification status to FAIL
281
+ - **MEDIUM findings** → Goal-Backward status = **WARN** — log but do not block
282
+ - Append findings to the Warnings section of the verification report (Step 5)
283
+ - **No findings** → Goal-Backward status = **PASS** — add to verification report summary
284
+
285
+ Add a `Goal-Backward:` line to the Step 5 verification report summary:
286
+ ```
287
+ - Goal-Backward: {PASS/WARN/FAIL} — {N} requirements checked, {N} findings ({critical} critical, {high} high, {medium} medium)
288
+ ```
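The blocking rules reduce to a severity count followed by a three-way decision. A minimal sketch, assuming findings are objects with a `severity` field; the function name and that shape are hypothetical:

```javascript
// Hypothetical helper applying the blocking rules above to a list of findings.
function goalBackwardStatus(findings) {
  const count = (sev) => findings.filter((f) => f.severity === sev).length;
  const counts = { critical: count('CRITICAL'), high: count('HIGH'), medium: count('MEDIUM') };
  let status = 'PASS';
  if (counts.critical + counts.high > 0) status = 'FAIL'; // blocks verification
  else if (counts.medium > 0) status = 'WARN';            // logged, non-blocking
  return { status, ...counts };
}

console.log(goalBackwardStatus([{ severity: 'MEDIUM' }]));
console.log(goalBackwardStatus([{ severity: 'MEDIUM' }, { severity: 'HIGH' }]));
console.log(goalBackwardStatus([]));
```

The returned counts supply the `{critical}`, `{high}`, and `{medium}` placeholders in the summary line.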
289
+
290
+ ---
291
+
292
+ ## Step 6: Handle Remediation
293
+
294
+ If there are CRITICAL findings:
295
+ 1. Create remediation tasks in the affected domain's `tasks.md`
296
+ 2. Execute fixes (solo — don't spawn teams for remediation)
297
+ 3. Re-verify the specific findings
298
+ 4. Update the verification report
299
+
300
+ ## Step 7: Update State
301
+
302
+ Update `.gsd-t/progress.md`:
303
+ - If all PASS: Set status to `VERIFIED`
304
+ - If CONDITIONAL PASS: Set status to `VERIFIED-WITH-WARNINGS`, list warnings
305
+ - If FAIL: Set status to `VERIFY-FAILED`, list required remediations
306
+ - Record verification date and summary
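The mapping from the report's overall result to the progress status is a fixed lookup. In the sketch below the status names come from the list above, but the aggregation rule in `overallResult` is an assumption, since the source does not spell out how per-dimension results combine.

```javascript
// Assumed aggregation rule (not stated in the source): any FAIL fails the run,
// any WARN without a FAIL yields CONDITIONAL PASS, otherwise PASS.
function overallResult(dimensionResults) {
  if (dimensionResults.includes('FAIL')) return 'FAIL';
  if (dimensionResults.includes('WARN')) return 'CONDITIONAL PASS';
  return 'PASS';
}

// Status names are taken from the list above.
const PROGRESS_STATUS = {
  'PASS': 'VERIFIED',
  'CONDITIONAL PASS': 'VERIFIED-WITH-WARNINGS',
  'FAIL': 'VERIFY-FAILED',
};

const overall = overallResult(['PASS', 'WARN', 'PASS']);
console.log(overall, '->', PROGRESS_STATUS[overall]);
```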
307
+
308
+ ### Autonomy Behavior
309
+
310
+ **All Levels**:
311
+ - VERIFIED or VERIFIED-WITH-WARNINGS → **Auto-invoke complete-milestone** (see Step 8 below). Completing a verified milestone is mechanical: there is no judgment call that benefits from user review.
312
+ - FAIL → **Level 3**: Auto-execute remediation tasks (up to 2 fix attempts). If still failing after 2 attempts:
313
+ 1. Write failure context to `.gsd-t/debug-state.jsonl` via `node -e "require('./bin/debug-ledger.js').appendEntry('.', {iteration:1,timestamp:new Date().toISOString(),test:'verify-remediation',error:'2 in-context fix attempts exhausted',hypothesis:'see verify-report.md',fix:'n/a',fixFiles:[],result:'STILL_FAILS',learning:'delegating to headless debug-loop',model:'sonnet',duration:0})"`
314
+ 2. Log: "Delegating to headless debug-loop (2 in-context attempts exhausted)"
315
+ 3. Run: `gsd-t headless --debug-loop --max-iterations 10`
316
+ 4. Exit code 0 → re-run verification; exit code 1 or 4 → log to `.gsd-t/deferred-items.md`, STOP, and report to user; exit code 3 → report the error
317
+ **Level 1–2**: Return to execute phase for remediation tasks.
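The exit-code handling for the headless debug loop can be read as a small dispatch table. This sketch records only the action listed above for each code; the function name, the action labels, and the `unknown` fallback are hypothetical, and the source does not say what each code means.

```javascript
// Only the actions listed above are encoded; what each code signifies is not
// documented in this section.
function actionForDebugLoopExit(code) {
  switch (code) {
    case 0:
      return 'rerun-verification';
    case 1:
    case 4:
      return 'defer-and-stop'; // log to .gsd-t/deferred-items.md, report to user
    case 3:
      return 'report-error';
    default:
      return 'unknown'; // assumption: unlisted codes are treated as unknown
  }
}

console.log([0, 1, 3, 4, 2].map(actionForDebugLoopExit));
```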
318
+
319
+ ## Document Ripple
320
+
321
+ ### Always update:
322
+ 1. **`.gsd-t/progress.md`** — Set status to VERIFIED/VERIFY-FAILED, log verification summary
323
+ 2. **`.gsd-t/verify-report.md`** — Created with full verification results (Step 5)
324
+
325
+ ### Check if affected:
326
+ 3. **`.gsd-t/domains/{domain}/tasks.md`** — If remediation tasks were created (Step 6)
327
+ 4. **`.gsd-t/techdebt.md`** — If verification found new quality or security issues, add as debt
328
+ 5. **`docs/requirements.md`** — If verification revealed unmet requirements, update status
329
+
330
+ ## Step 8: Auto-Invoke Complete-Milestone
331
+
332
+ **This step is MANDATORY and runs at ALL autonomy levels.** Completing a verified milestone is a mechanical operation (archive, tag, bump version, update docs). There is no decision that benefits from user review — the decision was made when verification passed.
333
+
334
+ If status is VERIFY-FAILED:
335
+ - Do NOT invoke complete-milestone
336
+ - Report failures and stop
337
+
338
+ If status is VERIFIED or VERIFIED-WITH-WARNINGS:
339
+ 1. Log: "✅ Verify complete — spawning complete-milestone agent..."
340
+
341
+ **OBSERVABILITY LOGGING (MANDATORY):**
342
+ Before spawning — run via Bash:
343
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
344
+
345
+ 2. Spawn a Task subagent (model: sonnet, mode: bypassPermissions):
346
+ ```
347
+ "Execute the complete-milestone phase of the current GSD-T milestone.
348
+
349
+ Read and follow the full instructions in commands/gsd-t-complete-milestone.md
350
+ (resolve from ~/.claude/commands/ if not in project).
351
+ Read .gsd-t/progress.md for current milestone and state.
352
+ Read CLAUDE.md for project conventions.
353
+ Read .gsd-t/contracts/ for domain interfaces.
354
+
355
+ Complete the phase fully:
356
+ - Follow every step in the command file
357
+ - Update .gsd-t/progress.md status when done
358
+ - Run document ripple as specified
359
+ - Commit your work
360
+
361
+ Report back: one-line status summary."
362
+ ```
363
+
364
+ After subagent returns — run via Bash:
365
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
366
+ Compute tokens and compaction:
367
+ - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
368
+ - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
369
+ Append to `.gsd-t/token-log.md`:
370
+ `| {DT_START} | {DT_END} | gsd-t-verify | Step 8 | sonnet | {DURATION}s | auto-complete-milestone | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
371
+
372
+ 3. Verify subagent result: Read `.gsd-t/progress.md` — confirm status is COMPLETED. If not, report the failure.
373
+
374
+ **Why this is mandatory**: Without auto-completion, verified milestones remain in VERIFIED state indefinitely. Requirements stay unmarked, progress.md is stale, and future sessions cannot tell the work was done. This is the root cause of "GSD-T forgot it did this work" — the milestone was built and verified but never formally completed.
375
+
376
+ **Why a subagent**: Complete-milestone is a 12-step process (gap analysis, archive, version bump, git tag, doc ripple). Verify is already heavy with 8+ quality gates. Spawning a fresh-context subagent avoids compaction risk — and complete-milestone loads everything it needs from files (progress.md, verify-report.md, contracts).
377
+
378
+ $ARGUMENTS
379
+
380
+ ## Auto-Clear
381
+
382
+ All work is committed to project files. Execute `/clear` to free the context window for the next command.