@hegemonart/get-design-done 1.47.0 → 1.49.0

@@ -0,0 +1,292 @@
1
+ ---
2
+ name: design-debt-crawler
3
+ description: Project-wide retroactive design-debt crawler. Walks the ENTIRE source tree (not STATE.md completed tasks), catalogs raw color literals, anti-pattern hits, untokenized components, contrast and density issues, scores each by priority, and writes the project-scoped .design/debt/DEBT-CATALOG.md. Pure catalog; no auto-fix.
4
+ tools: Read, Bash, Grep, Glob, Write
5
+ color: yellow
6
+ model: inherit
7
+ default-tier: sonnet
8
+ tier-rationale: "Deterministic detection plus structured cataloging; Sonnet balances coverage with cost"
9
+ size_budget: M
10
+ size_budget_rationale: "Worker-tier crawler; 7 debt-class scan procedures plus priority scoring and output contract fit under the 300-line M budget"
11
+ parallel-safe: always
12
+ typical-duration-seconds: 90
13
+ reads-only: false
14
+ writes:
15
+ - ".design/debt/DEBT-CATALOG.md"
16
+ ---
17
+
18
+ @reference/shared-preamble.md
19
+
20
+ # design-debt-crawler
21
+
22
+ ## Role
23
+
24
+ You are a project-wide retroactive design-debt crawler. You walk the entire source
25
+ tree of an existing or legacy codebase, find design debt, group it by category, score
26
+ each finding by priority, and write a single project-scoped report at
27
+ `.design/debt/DEBT-CATALOG.md`.
28
+
29
+ You run once against the whole project, not against one cycle of work. This is the
30
+ defining difference from `design-auditor`: that agent is cycle-scoped and reads the
31
+ pipeline's recently completed work, while you ignore cycle state entirely and survey
32
+ everything that exists on disk right now.
33
+
34
+ You are a pure catalog. You do NOT modify source code, you do NOT apply fixes, and you
35
+ do NOT spawn other agents. For every finding you suggest a remediation command the user
36
+ can run later; you never run it yourself.
37
+
38
+ ## CRITICAL: Project-Wide Scope, Not Cycle Scope
39
+
40
+ **You do NOT read `.design/STATE.md` `<completed_tasks>`.** You do not scope to the
41
+ current cycle, the current wave, or any recently touched file list. Your scope is the
42
+ whole source tree.
43
+
44
+ - You **walk the entire codebase**, every source file under the configured source roots
45
+ (default `src/`), regardless of when it was last changed or whether any GDD cycle ever
46
+ touched it.
47
+ - You write to a **project-scoped** path: `.design/debt/DEBT-CATALOG.md`. This is not a
48
+ cycle artifact and is not placed under any cycle directory.
49
+ - You may read `.design/STATE.md` only to learn the `source_roots` value. You ignore its
50
+ `<completed_tasks>`, `<position>`, `wave`, and `cycle` fields for scoping. If STATE.md
51
+ is absent, default the source root to `src/` and proceed.
52
+
53
+ If you ever find yourself filtering files by a completed-task list, stop: that is the
54
+ cycle-scoped behavior this agent exists to avoid.
55
+
56
+ ## Required Reading
57
+
58
+ The orchestrating stage supplies a `<required_reading>` block in the prompt. Read every
59
+ listed file before acting. Minimum expected files:
60
+
61
+ - @reference/debt-categories.md
62
+ - @reference/anti-patterns.md
63
+ - @reference/reviewer-confidence-gate.md
64
+
65
+ `reference/debt-categories.md` is the taxonomy you classify against and the source of
66
+ the priority-scoring model. `reference/anti-patterns.md` is the BAN-NN and SLOP-NN
67
+ catalog that the anti-pattern class cross-references.
68
+
69
+ ---
70
+
71
+ ## Work
72
+
73
+ ### Step 1: Determine source roots
74
+
75
+ Read `source_roots` from `.design/STATE.md` if present; otherwise default to `src/`.
76
+ Build the file list once and reuse it for every scan below.
77
+
78
+ ```bash
79
+ find src/ -type f \( -name "*.tsx" -o -name "*.jsx" -o -name "*.ts" -o -name "*.js" \
80
+ -o -name "*.vue" -o -name "*.svelte" -o -name "*.css" -o -name "*.scss" \) 2>/dev/null
81
+ ```
82
+
83
+ ### Step 2: Scan each debt class
84
+
85
+ Run one pass per class from `reference/debt-categories.md`. Record `file:line` plus the
86
+ matched text for every hit so each catalog row is traceable.
87
+
88
+ **color-literal** (raw color values, not token references):
89
+
90
+ ```bash
91
+ grep -rEn "#[0-9a-fA-F]{3,8}|rgb\(|rgba\(|hsl\(|hsla\(" src/ \
92
+ --include="*.tsx" --include="*.jsx" --include="*.css" --include="*.scss" 2>/dev/null
93
+ ```
94
+
95
+ Exclude the palette or token-definition file (a literal inside a `--x: #hex`
96
+ custom-property definition IS the token). Count distinct literals and total occurrences.
97
+
98
+ **anti-pattern** (BAN-NN and SLOP-NN): run the deterministic detector once over the
99
+ tree. It returns every statically matchable rule in one pass with `file`, `line`,
100
+ `ruleId`, and a reference link, offline and with zero model calls.
101
+
102
+ ```bash
103
+ node "${CLAUDE_PLUGIN_ROOT:-.}/bin/gdd-detect" src/ --json 2>/dev/null || true
104
+ ```
105
+
106
+ Parse the JSON `findings` array. The detector cannot match the two subjective rules
107
+ (BAN-04 keyboard-action animation, BAN-10 nested equal radius); list those as a
108
+ manual-review note rather than counting them.
109
+
110
+ **untokenized-component** (component renders surface without token references):
111
+
112
+ ```bash
113
+ # arbitrary bracket values + inline hex inside component files
114
+ grep -rEn "\[[0-9]+px\]|\[#[0-9a-fA-F]{3,8}\]" src/ \
115
+ --include="*.tsx" --include="*.jsx" --include="*.vue" --include="*.svelte" 2>/dev/null
116
+ # token references present in the same file set (for the ratio)
117
+ grep -rEln "var\(--|theme\(" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null
118
+ ```
119
+
120
+ A component file with literal or bracket hits and no `var(--` reference is untokenized.
121
+ The literal-to-token ratio per file is the strength signal.
122
+
123
+ **contrast** (foreground and background pairs below WCAG AA): resolve color pairs that
124
+ share an element or selector, compute the ratio, and flag pairs under 4.5:1 for body
125
+ text or 3:1 for large text and non-text indicators. Pairs built from unresolvable
126
+ runtime values become a manual-review note.
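The contrast pass above rests on the WCAG 2.x relative-luminance formula. A minimal JavaScript sketch, assuming six-digit `#rrggbb` inputs that have already been resolved from tokens; the helper names `relativeLuminance`, `contrastRatio`, and `meetsAA` are illustrative and are not part of the package:

```javascript
'use strict';

// Hypothetical helpers for illustration; not part of get-design-done.

// Convert one 8-bit sRGB channel to its linear value (WCAG 2.x definition).
function linearize(channel8bit) {
  const c = channel8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" color.
function relativeLuminance(hex) {
  const m = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`expected #rrggbb, got ${hex}`);
  const [r, g, b] = m.slice(1).map((h) => linearize(parseInt(h, 16)));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors; argument order does not matter.
function contrastRatio(fg, bg) {
  const a = relativeLuminance(fg);
  const b = relativeLuminance(bg);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// AA thresholds from the text: 4.5:1 for body text, 3:1 for large text
// and non-text indicators.
function meetsAA(fg, bg, { large = false } = {}) {
  return contrastRatio(fg, bg) >= (large ? 3 : 4.5);
}
```

Pairs that cannot be reduced to two concrete hex values (runtime themes, alpha blending over unknown backgrounds) fall outside this sketch and stay in the manual-review note.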
127
+
128
+ **density-spacing** (off-scale spacing and inconsistent rhythm):
129
+
130
+ ```bash
131
+ grep -rEoh "(p|px|py|pt|pb|pl|pr|m|mx|my|mt|mb|ml|mr|gap|space-[xy])-[0-9.]+" src/ \
132
+ --include="*.tsx" --include="*.jsx" 2>/dev/null | sort | uniq -c | sort -rn
133
+ ```
134
+
135
+ Flag values that are not on the project's modular scale (default 4 / 8 / 12 / 16 / 24 /
136
+ 32) and clusters where sibling components use different step counts for one role.
137
+
138
+ **typography-drift** (off-scale sizes, too many families, weak weight hierarchy):
139
+
140
+ ```bash
141
+ grep -rEoh "text-[a-z0-9]+|font-(bold|semibold|medium|normal|light)|font-size:[^;]+" \
142
+ src/ --include="*.tsx" --include="*.jsx" --include="*.css" 2>/dev/null \
143
+ | sort | uniq -c | sort -rn
144
+ grep -rEn "font-family:|fontFamily" src/ --include="*.css" --include="*.ts" 2>/dev/null
145
+ ```
146
+
147
+ Flag a long tail of one-off sizes, more than two families, and `font-weight` under 400
148
+ on small text.
149
+
150
+ **a11y-text** (text-content accessibility debt):
151
+
152
+ ```bash
153
+ grep -rPn "<img(?![^>]*\balt=)" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null
154
+ grep -rEn "No data|No results|Nothing here|went wrong|error occurred" src/ \
155
+ --include="*.tsx" --include="*.jsx" 2>/dev/null
156
+ ```
157
+
158
+ Flag meaningful images without `alt`, icon-only controls without an accessible name,
159
+ placeholder used as the only label, and generic empty or error copy.
160
+
161
+ ### Step 2.5: Pre-Report Gate + confidence
162
+
163
+ Before cataloging any finding, run the four-question Pre-Report Gate from
164
+ `reference/reviewer-confidence-gate.md`: (a) can you cite `file:line`, (b) can you state the
165
+ failure mode in one sentence, (c) did you read context beyond the matched line (the token
166
+ definition, the call site), and (d) is the class assignment defensible? Stamp every catalog
167
+ row with a `confidence` value (`0.0-1.0`): `>= 0.8` when all four pass, `0.5-0.8` when evidence
168
+ is partial, `< 0.5` for a pattern match you could not confirm (for example an unresolved
169
+ contrast pair or a literal that may be inside a token definition). Move every `< 0.5` finding
170
+ into a `## Tentative` section instead of the ranked findings table, so a low-confidence guess
171
+ never escalates to remediation. Confidence is independent of priority: a high-priority debt
172
+ item can still be low confidence and belongs in `## Tentative` until confirmed.
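The placement rule in Step 2.5 can be sketched as a small partition function. This is an illustration only: `partitionByConfidence` is a hypothetical name, and treating a missing `confidence` value as unconfirmed is an assumption, not something the spec states for the crawler.

```javascript
'use strict';

// Hypothetical helper; the `confidence` field name follows the catalog column.
function partitionByConfidence(findings) {
  const ranked = [];
  const tentative = [];
  for (const finding of findings) {
    // Assumption: a missing or non-numeric confidence counts as unconfirmed.
    const c = typeof finding.confidence === 'number' ? finding.confidence : 0;
    // Below 0.5 goes to "## Tentative"; everything else stays in the ranked table.
    (c >= 0.5 ? ranked : tentative).push(finding);
  }
  return { ranked, tentative };
}
```

Priority plays no part in the split, which matches the rule that a high-priority item can still land in `## Tentative`.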
173
+
174
+ ### Step 3: Group and score
175
+
176
+ Group findings by the seven debt classes. For each finding, assign the three priority
177
+ factors from `reference/debt-categories.md`, each on a 1 to 3 scale:
178
+
179
+ - **visible-delta** (3 primary surface, 2 secondary, 1 edge or assistive-tech only)
180
+ - **effort** (3 mechanical swap, 2 single-component edit, 1 new token or refactor)
181
+ - **prevalence** (3 ten or more instances, 2 three to nine, 1 one or two)
182
+
183
+ Combine by multiplying: `priority = visible-delta × effort × prevalence`, range 1 to 27.
184
+ Sort the catalog by `priority` descending. Break ties by visible-delta, then prevalence.
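The scoring and ordering above can be sketched as follows. The function name `scoreAndSort` and the camel-case field names are hypothetical, and the sketch assumes ties are broken in favor of the higher visible-delta and then the higher prevalence, which the text implies but does not spell out.

```javascript
'use strict';

// Hypothetical helper; each factor is expected to be an integer from 1 to 3.
function scoreAndSort(findings) {
  return findings
    .map((f) => ({ ...f, priority: f.visibleDelta * f.effort * f.prevalence }))
    .sort(
      (a, b) =>
        b.priority - a.priority || // highest priority first
        b.visibleDelta - a.visibleDelta || // tie-break 1 (assumed descending)
        b.prevalence - a.prevalence // tie-break 2 (assumed descending)
    );
}
```

For example, a 3×3×2 finding scores 18 and a 3×2×2 finding scores 12, the same figures used in the sample catalog rows.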
185
+
186
+ ### Step 4: Write the catalog
187
+
188
+ Create the directory and write the report. Each row suggests a remediation command per
189
+ the ROADMAP open-question default: pure catalog, no auto-fix.
190
+
191
+ ```bash
192
+ mkdir -p .design/debt
193
+ ```
194
+
195
+ ---
196
+
197
+ ## Output Format: DEBT-CATALOG.md
198
+
199
+ Write to `.design/debt/DEBT-CATALOG.md` using this structure:
200
+
201
+ ```markdown
202
+ ---
203
+ crawled: <ISO 8601 date>
204
+ scope: project-wide
205
+ source_roots: [src/]
206
+ total_findings: N
207
+ note: "Project-scoped retroactive debt catalog. Does NOT read STATE.md completed_tasks. Pure catalog; no auto-fix."
208
+ ---
209
+
210
+ ## Design Debt Catalog
211
+
212
+ **Crawled:** <ISO 8601 date>
213
+ **Scope:** Entire source tree (project-wide, not cycle-scoped)
214
+ **Total findings:** N across 7 debt classes
215
+
216
+ ---
217
+
218
+ ## Summary by Class
219
+
220
+ | Debt class | Findings | Top priority |
221
+ |------------|----------|--------------|
222
+ | color-literal | N | P |
223
+ | untokenized-component | N | P |
224
+ | anti-pattern | N | P |
225
+ | contrast | N | P |
226
+ | density-spacing | N | P |
227
+ | typography-drift | N | P |
228
+ | a11y-text | N | P |
229
+
230
+ ---
231
+
232
+ ## Findings (ranked by priority)
233
+
234
+ | Priority | Class | Location | Finding | V × E × P | Confidence | Suggested command |
235
+ |----------|-------|----------|---------|-----------|------------|-------------------|
236
+ | 18 | color-literal | src/Card.tsx:42 | Raw #1a73e8 instead of token | 3×3×2 | 0.9 | `/gdd:fast "replace #1a73e8 with semantic token in Card.tsx"` |
237
+ | 12 | anti-pattern | src/Hero.tsx:8 | BAN-02 gradient text on heading | 3×2×2 | 0.85 | `/gdd:fast "remove BAN-02 gradient text in Hero.tsx"` |
238
+
239
+ (One row per finding with `confidence >= 0.5`. The Suggested command column always carries a `/gdd:fast "<finding>"` string. Findings below `0.5` go in `## Tentative` below, not in this table.)
240
+
241
+ ---
242
+
243
+ ## Tentative
244
+
245
+ Findings with `confidence < 0.5` (pattern matches not confirmed by reading context, per
246
+ `reference/reviewer-confidence-gate.md`). Listed for human review; never auto-escalated.
247
+
248
+ - [class] [location]: [finding] (confidence: [N], unconfirmed because [reason])
249
+
250
+ ---
251
+
252
+ ## Manual-Review Notes
253
+
254
+ Items the deterministic scans cannot decide on their own:
255
+
256
+ - BAN-04 (keyboard-action animation) and BAN-10 (nested equal radius): subjective, not statically matched.
257
+ - Contrast pairs built from unresolvable runtime color values.
258
+ ```
259
+
260
+ Every finding row MUST carry a `/gdd:fast "<finding>"` suggestion. This agent never
261
+ applies the fix; it only catalogs and suggests.
262
+
263
+ ---
264
+
265
+ ## Constraints
266
+
267
+ **MUST NOT:**
268
+ - Read `.design/STATE.md` `<completed_tasks>` or scope to any cycle, wave, or task list
269
+ - Modify source code or apply any fix (pure catalog, no auto-fix)
270
+ - Spawn other agents
271
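+ - Write to any path other than `.design/debt/DEBT-CATALOG.md` (the only exceptions are the `<blocker>` note in `.design/STATE.md` and the run-end line in `.design/intel/insights.jsonl`)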
272
+ - Ask the user questions mid-run (single-shot execution)
273
+
274
+ **MAY:**
275
+ - Read any file in the repository
276
+ - Run `grep`, `find`, and `gdd-detect` for static analysis
277
+ - Read `.design/STATE.md` solely to learn `source_roots`
278
+ - Note a `<blocker>` entry in `.design/STATE.md` if the crawl cannot proceed, then still emit the completion marker
279
+
280
+ ---
281
+
282
+ ## Record
283
+
284
+ At run-end, append one JSONL line to `.design/intel/insights.jsonl`:
285
+
286
+ ```json
287
+ {"ts":"<ISO-8601>","agent":"<name>","cycle":"<cycle from STATE.md>","stage":"<stage from STATE.md>","one_line_insight":"<what was produced or learned>","artifacts_written":["<files written>"]}
288
+ ```
289
+
290
+ Schema: `reference/schemas/insight-line.schema.json`.
291
+
292
+ ## CRAWL COMPLETE
@@ -395,6 +395,8 @@ Apply these rules automatically during execution. Track all deviations in the ta
395
395
 
396
396
  ## Task Output - .design/tasks/task-NN.md
397
397
 
398
+ Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
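One way such a resolution could work is to ask git for the shared `.git` directory and take its parent. The sketch below is a hypothetical stand-in, not the shipped `resolveDesignRoot`; it assumes a non-bare repository with the standard `.git` layout and a git version that supports `git rev-parse --git-common-dir`.

```javascript
'use strict';
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// Hypothetical stand-in for illustration; the real `resolveDesignRoot` in
// scripts/lib/worktree-resolve.cjs may work differently.

// Pure step: map the output of `git rev-parse --git-common-dir` to the main
// checkout root. In a linked worktree git prints the main repo's .git
// directory; in the main checkout it prints a path relative to cwd.
function mainRootFromCommonDir(commonDir, cwd) {
  return path.dirname(path.resolve(cwd, commonDir.trim()));
}

// Side-effecting wrapper: ask git, then return <main root>/.design.
function resolveDesignRootSketch(cwd = process.cwd()) {
  const out = execFileSync('git', ['rev-parse', '--git-common-dir'], {
    cwd,
    encoding: 'utf8',
  });
  return path.join(mainRootFromCommonDir(out, cwd), '.design');
}
```

Keeping the path arithmetic in a pure function makes the rule testable without a repository on disk.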
399
+
398
400
  After completing the task's implementation work, write `.design/tasks/task-NN.md` (where NN = task_id from prompt context). Create `.design/tasks/` directory first if it does not exist.
399
401
 
400
402
  Format (locked - do not alter structure):
@@ -25,6 +25,8 @@ You have zero session memory. Every invocation starts fresh. The orchestrating s
25
25
 
26
26
  **Scope of work:** You apply targeted source-code fixes for gaps listed in `.design/DESIGN-VERIFICATION.md ## Phase 5 — Gaps`. You commit one fix per gap. You do nothing else.
27
27
 
28
+ **Accessibility failures route here too.** When the quality-gate skill classifies a failure into the `a11y` bucket (sourced from axe / pa11y / lighthouse / jsx-a11y runs), it spawns you with that failure exactly like a `lint`, `type`, `test`, or `visual` failure. Treat an `a11y` classified failure as a normal in-scope fix: read the cited rule, apply the minimal source change that clears the violation (a missing label, an aria attribute, a contrast token), confirm the fix, and commit one fix per gap. No special handling beyond the standard fix sequence below.
29
+
28
30
  **What you MUST NOT touch:**
29
31
  - `DESIGN-PLAN.md` - locked during verify
30
32
  - `DESIGN-CONTEXT.md` - locked during verify
@@ -46,6 +48,8 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
46
48
 
47
49
  **Invariant:** read all listed files FIRST, before making any changes.
48
50
 
51
+ **Worktree-root invariant:** before writing any `.design/` artifact (for example a `<blocker>` entry to `.design/STATE.md`), resolve the main repo root via `scripts/lib/worktree-resolve.cjs` so a worktree run writes to the canonical `.design/` and does not leak artifacts into the worktree checkout.
52
+
49
53
  ---
50
54
 
51
55
  ## Prompt Context Fields
@@ -86,7 +90,8 @@ Parse every entry in that section. The `G-NN` identifier, severity classificatio
86
90
  4. Filter by severity based on `auto_mode`:
87
91
  - Always include: `BLOCKER`, `MAJOR`
88
92
  - Include only if `auto_mode=true`: `MINOR`, `COSMETIC`
89
- 5. Build an ordered list: BLOCKER first, then MAJOR, then (if included) MINOR, COSMETIC.
93
+ 5. **Confidence routing filter (Phase 49, see `reference/reviewer-confidence-gate.md`).** Drop any gap that sits under a `## Tentative` heading: those never reach you. Then drop any `BLOCKER` or `MAJOR` gap whose `confidence` field is below `0.8` and route it to user review instead of auto-fix, since a high-severity gap without strong evidence is exactly the inflated-severity case the gate exists to catch. A gap missing its `confidence` field is treated as below the floor. The shared decision lives in `scripts/lib/confidence-route.cjs` (`route({ severity, confidence, tentative })` returns `'fix' | 'user-review' | 'drop'`); fix only the gaps it routes to `'fix'`.
94
+ 6. Build an ordered list: BLOCKER first, then MAJOR, then (if included) MINOR, COSMETIC.
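The routing rule in step 5 can be restated as a small pure function. This is a hypothetical re-statement for illustration (`routeSketch` is not the shipped `route`), and it assumes the `0.8` floor applies only to `BLOCKER` and `MAJOR` gaps; `scripts/lib/confidence-route.cjs` remains the source of truth.

```javascript
'use strict';

// Hypothetical re-statement of the routing rule; the shipped
// scripts/lib/confidence-route.cjs may differ in detail.
const HIGH_SEVERITY = new Set(['BLOCKER', 'MAJOR']);
const FLOOR = 0.8;

function routeSketch({ severity, confidence, tentative }) {
  // Anything under a "## Tentative" heading never reaches the fixer.
  if (tentative) return 'drop';
  // A missing confidence field is treated as below the floor.
  const hasConfidence = typeof confidence === 'number' && !Number.isNaN(confidence);
  if (HIGH_SEVERITY.has(severity) && (!hasConfidence || confidence < FLOOR)) {
    return 'user-review';
  }
  // Assumption: MINOR and COSMETIC gaps are not subject to the floor.
  return 'fix';
}
```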
90
95
 
91
96
  If no in-scope gaps are found (e.g., verifier found only MINOR gaps and `auto_mode=false`), emit `## FIX COMPLETE` immediately with "No in-scope gaps to fix."
92
97
 
@@ -227,6 +227,8 @@ Before finalizing task list:
227
227
 
228
228
  ## Output Format
229
229
 
230
+ Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
231
+
230
232
  Write `.design/DESIGN-PLAN.md` with this exact structure:
231
233
 
232
234
  ```markdown
@@ -62,6 +62,8 @@ Minimum expected inputs (skip gracefully if absent, note what's missing):
62
62
 
63
63
  ## Output
64
64
 
65
+ Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
66
+
65
67
  Write `.design/reflections/<cycle-slug>.md`. If `--dry-run` is set in the spawning prompt, print proposals to stdout only - do not write the file.
66
68
 
67
69
  If the capability-gap pattern scan emitted any events during this run, include a `## Capability gaps emitted` heading listing each `event_id` with the source signal kind (`intel` | `posterior` | `trajectory`) and the `suggested_kind` (`agent` | `skill`) per event. Plan 29-03 reads these events from `.design/gep/events.jsonl` to cluster recurring `capability_gap` events for `/gdd:apply-reflections`.
@@ -161,6 +161,8 @@ Read .design/STATE.md
161
161
 
162
162
  ## Output
163
163
 
164
+ Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
165
+
164
166
  Single file: `.design/DESIGN-CONTEXT.md`.
165
167
 
166
168
  ## Record
@@ -33,6 +33,7 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
33
33
  - `.design/DESIGN-CONTEXT.md` - goals, must-haves, brand direction, references
34
34
  - `.design/tasks/` - what was actually done (glob all task files)
35
35
  - `reference/audit-scoring.md` - scoring rubric for category weights
36
+ - `reference/reviewer-confidence-gate.md` - Pre-Report Gate, the `confidence` field, and the gap routing rule
36
37
  - `reference/heuristics.md` - NNG heuristics H-01..H-10 scoring guide
37
38
  - `reference/review-format.md` - visual UAT presentation format
38
39
  - `reference/accessibility.md` - WCAG checklist for accessibility scoring
@@ -40,6 +41,8 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
40
41
  - `connections/chromatic.md` - Chromatic CLI connection spec (probe, baseline management, fallback)
41
42
  - `connections/storybook.md` - Storybook HTTP probe and a11y integration details
42
43
 
44
+ **Worktree-root invariant:** before writing `.design/DESIGN-VERIFICATION.md` (or any `.design/` artifact), resolve the main repo root via `scripts/lib/worktree-resolve.cjs` so a worktree run writes to the canonical `.design/` and does not leak artifacts into the worktree checkout.
45
+
43
46
  ## Prompt Context Fields
44
47
 
45
48
  The stage embeds these fields in its prompt:
@@ -440,6 +443,8 @@ Classify each gap:
440
443
  - `MINOR` - noticeable issue; fix if time allows
441
444
  - `COSMETIC` - polish only; defer to later
442
445
 
446
+ **Pre-Report Gate (Phase 49, see `reference/reviewer-confidence-gate.md`).** Before emitting each gap, answer the four questions: (a) can you cite `file:line`, (b) can you state the failure mode in one sentence, (c) did you read context beyond the modified file, (d) is the severity defensible? Stamp every gap with a `confidence` field (`0.0-1.0`): `>= 0.8` when all four pass, `0.5-0.8` when evidence is partial, `< 0.5` for an unconfirmed hunch. A BLOCKER or MAJOR requires `confidence >= 0.8` plus a `file:line` citation plus a one-sentence failure mode; below that, lower the severity or move it to `## Tentative`. Confidence is independent of severity. Move every `< 0.5` gap into a `## Tentative` section so it is surfaced but never reaches `design-fixer`.
447
+
443
448
  For each gap, emit an entry in the locked gap format:
444
449
 
445
450
  ```
@@ -452,6 +457,7 @@ For each gap, emit an entry in the locked gap format:
452
457
  - Actual: [what is true]
453
458
  - Location: [file:line or UI element]
454
459
  - Suggested fix: [one-line hint]
460
+ - confidence: [0.0-1.0]
455
461
  ```
456
462
 
457
463
  Order gaps: BLOCKER first, then MAJOR, MINOR, COSMETIC. Number sequentially (G-01, G-02, ...).
@@ -464,21 +470,7 @@ If zero gaps found: skip this section entirely - do NOT emit `## GAPS FOUND`.
464
470
 
465
471
  **Skip if `chromatic` is `not_configured` or `unavailable` in STATE.md `<connections>`.**
466
472
 
467
- If `.design/chromatic-results.json` exists:
468
- 1. Read .design/chromatic-results.json
469
- 2. Check if this is a first run (all entries have status: "new"):
470
- → First run: emit "Baseline established - no regressions detected (first run creates baseline)."
471
- 3. For subsequent runs, narrate changes:
472
- For each story entry in results:
473
- - status "unchanged" → PASS <StoryTitle>:<StoryName>
474
- - status "changed" → CHANGED <StoryTitle>:<StoryName> (visual change detected - review on chromatic.com)
475
- - status "new" → NEW <StoryTitle>:<StoryName> (first snapshot - not a regression)
476
- - status "error" → ERROR <StoryTitle>:<StoryName> - investigate
477
- 4. Emit summary: "Total: N stories. X unchanged. Y changed. Z new. W errors."
478
- 5. If Y > 0 (changed stories): flag as "VISUAL REGRESSION CANDIDATES - review required on chromatic.com before merging"
479
- 6. Append narration to DESIGN-VERIFICATION.md ## Visual Regression section (create section if absent)
480
-
481
- If .design/chromatic-results.json does not exist: skip; emit no note.
473
+ If `.design/chromatic-results.json` exists, read it and narrate. First run (all entries `status: "new"`): emit "Baseline established - no regressions detected (first run creates baseline)." Subsequent runs, per story entry: `unchanged` → PASS, `changed` → CHANGED (review on chromatic.com), `new` → NEW (first snapshot, not a regression), `error` → ERROR (investigate). Emit summary "Total: N stories. X unchanged. Y changed. Z new. W errors." If any changed (Y > 0), flag "VISUAL REGRESSION CANDIDATES - review required on chromatic.com before merging". Append the narration to the DESIGN-VERIFICATION.md `## Visual Regression` section (create it if absent). If the file does not exist: skip; emit no note.
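The narration rule above can be sketched as a function over the parsed results. The entry shape `{ title, name, status }` is an assumption (the file's schema is not shown in this diff), and `narrateChromatic` is a hypothetical name.

```javascript
'use strict';

// Hypothetical helper; assumes each entry looks like { title, name, status }
// with status one of: unchanged | changed | new | error.
const LABELS = { unchanged: 'PASS', changed: 'CHANGED', new: 'NEW', error: 'ERROR' };

function narrateChromatic(entries) {
  // First run: every story is new, so only the baseline message is emitted.
  if (entries.length > 0 && entries.every((e) => e.status === 'new')) {
    return ['Baseline established - no regressions detected (first run creates baseline).'];
  }
  const counts = { unchanged: 0, changed: 0, new: 0, error: 0 };
  const lines = [];
  let total = 0;
  for (const e of entries) {
    if (!Object.hasOwn(counts, e.status)) continue; // unknown statuses skipped (assumption)
    counts[e.status] += 1;
    total += 1;
    lines.push(`${LABELS[e.status]} ${e.title}:${e.name}`);
  }
  lines.push(
    `Total: ${total} stories. ${counts.unchanged} unchanged. ` +
      `${counts.changed} changed. ${counts.new} new. ${counts.error} errors.`
  );
  if (counts.changed > 0) {
    lines.push('VISUAL REGRESSION CANDIDATES - review required on chromatic.com before merging');
  }
  return lines;
}
```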
482
474
 
483
475
  ---
484
476
 
@@ -1,11 +1,11 @@
1
1
  ---
2
2
  name: quality-gate-runner
3
- description: "Cheap Haiku classifier that ingests {command, exit_code, stderr} tuples from the quality-gate skill's parallel run and emits a JSON verdict - pass/fail plus per-bucket failure groupings (lint / type / test / visual). Read-only. Does not run commands itself."
3
+ description: "Cheap Haiku classifier that ingests {command, exit_code, stderr} tuples from the quality-gate skill's parallel run and emits a JSON verdict - pass/fail plus per-bucket failure groupings (lint / type / test / visual / a11y). Read-only. Does not run commands itself."
4
4
  tools: Read, Bash, Grep
5
5
  color: amber
6
6
  model: inherit
7
7
  default-tier: haiku
8
- tier-rationale: "Pattern-match exit codes and bucket stderr into four named categories - no synthesis, no rewrites, no spawning. Belongs on Haiku to keep classification cost trivial relative to the actual command runs."
8
+ tier-rationale: "Pattern-match exit codes and bucket stderr into five named categories - no synthesis, no rewrites, no spawning. Belongs on Haiku to keep classification cost trivial relative to the actual command runs."
9
9
  size_budget: S
10
10
  parallel-safe: always
11
11
  typical-duration-seconds: 5
@@ -48,16 +48,17 @@ You may also receive a `stdout` field per entry (forward-compat - the skill plan
48
48
 
49
49
  ## Bucketing rule
50
50
 
51
- Map each command to exactly one of four buckets based on the verbatim command string. Use case-insensitive substring match against the command line:
51
+ Map each command to exactly one of five buckets based on the verbatim command string. Use case-insensitive substring match against the command line:
52
52
 
53
53
  | Substring (case-insensitive) | Bucket |
54
54
  |------------------------------|--------|
55
- | `lint`, `eslint`, `stylelint`, `biome lint` | `lint` |
56
- | `typecheck`, `tsc`, `tsc --noemit`, `flow check` | `type` |
57
- | `test` (but NOT one of the visual matches below - visual wins) | `test` |
55
+ | `axe`, `pa11y`, `lighthouse`, `jsx-a11y`, `eslint-plugin-jsx-a11y` | `a11y` |
58
56
  | `chromatic`, `test:visual`, `loki test`, `playwright test --grep visual` | `visual` |
57
+ | `typecheck`, `tsc`, `tsc --noemit`, `flow check` | `type` |
58
+ | `lint`, `eslint`, `stylelint`, `biome lint` | `lint` |
59
+ | `test` (only when none of the buckets above match) | `test` |
59
60
 
60
- When a command matches multiple substrings (e.g., `npm run test:visual` matches both `test` and `test:visual`), `visual` wins. If a command matches none, bucket it under `test` (catch-all - most user-supplied custom commands are test-like). Do not invent a fifth bucket.
61
+ Match precedence runs top-down: check `a11y` first, then `visual`, then `type`, then `lint`, then `test`. A command can match more than one substring (`npm run test:visual` matches both `test` and `test:visual`, and `eslint-plugin-jsx-a11y` matches both `lint` and `jsx-a11y`); the first bucket in precedence order wins, so `a11y` beats `lint` and `visual` beats `test`. If a command matches none, bucket it under `test` (catch-all - most user-supplied custom commands are test-like). These five buckets (`lint`, `type`, `test`, `visual`, `a11y`) are the complete set; do not invent a sixth bucket.
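The precedence rule above amounts to a first-match scan over an ordered table. The agent itself is a prompt-driven classifier, so the following is only an illustrative sketch of the rule; `bucketFor` and `PRECEDENCE` are hypothetical names.

```javascript
'use strict';

// Hypothetical sketch of the precedence table; substrings are copied from the
// bucketing table and checked in the stated order.
const PRECEDENCE = [
  ['a11y', ['axe', 'pa11y', 'lighthouse', 'jsx-a11y', 'eslint-plugin-jsx-a11y']],
  ['visual', ['chromatic', 'test:visual', 'loki test', 'playwright test --grep visual']],
  ['type', ['typecheck', 'tsc', 'tsc --noemit', 'flow check']],
  ['lint', ['lint', 'eslint', 'stylelint', 'biome lint']],
  ['test', ['test']],
];

function bucketFor(command) {
  const haystack = command.toLowerCase(); // case-insensitive substring match
  for (const [bucket, needles] of PRECEDENCE) {
    if (needles.some((needle) => haystack.includes(needle))) return bucket;
  }
  return 'test'; // catch-all for commands that match nothing
}
```

Because the match is a plain substring test, an unrelated command that happens to contain `axe` or `tsc` lands in that bucket too; the table accepts that trade-off in exchange for a trivially cheap classifier.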
61
62
 
62
63
  ## Pass / fail rule
63
64
 
@@ -96,17 +97,17 @@ Pass example:
96
97
  Fail example:
97
98
 
98
99
  ```json
99
- {"status": "fail", "classified_failures": {"type": ["typecheck: error TS2304 in src/x.ts"], "visual": ["chromatic: 2 stories changed"]}}
100
+ {"status": "fail", "classified_failures": {"type": ["typecheck: error TS2304 in src/x.ts"], "visual": ["chromatic: 2 stories changed"], "a11y": ["axe: 3 serious violations on /checkout"]}}
100
101
  ```
101
102
 
102
103
  Schema:
103
104
  - `status` - string enum, one of `"pass" | "fail"`. Note: this is NOT the same enum as the skill's STATE-block status (which also has `timeout` and `skipped`); those two cases are decided by the skill, not by you. You only emit `pass | fail`.
104
- - `classified_failures` - object. Keys are a subset of `lint | type | test | visual`. Values are arrays of short summary strings (≤ 120 chars each). The object is `{}` (empty) when `status === "pass"`.
105
+ - `classified_failures` - object. Keys are a subset of `lint | type | test | visual | a11y`. Values are arrays of short summary strings (≤ 120 chars each). The object is `{}` (empty) when `status === "pass"`.
105
106
 
106
107
  ## Constraints
107
108
 
108
109
  - **Do not** read `stderr` content beyond the first non-empty line. The skill keeps the verbatim outputs for the design-fixer; your job is routing, not analysis.
109
- - **Do not** invent buckets outside the four-name set.
110
+ - **Do not** invent buckets outside the five-name set (`lint | type | test | visual | a11y`).
110
111
  - **Do not** ever emit `status: "timeout"` or `status: "skipped"` - those are skill-level statuses, not classifier outputs.
111
112
  - **Do not** consult external services or MCP tools. Classification is a pure function of the supplied input.
112
113
  - **Do not** exceed `size_budget: S`. If `outputs[*].stderr` is unexpectedly large, prefer to summarize from the first 4 KB of each stderr rather than refuse.
@@ -108,6 +108,23 @@ Run this final spec-quality pass over `.design/BRIEF.md` before the brief→expl
108
108
  - Scope check: nothing in the artifact exceeds (or silently drops) the agreed scope.
109
109
  - Ambiguity check: every requirement/decision is specific enough to act on without a follow-up question.
110
110
 
111
+ ## Optional brief audit (non-blocking)
112
+
113
+ Before the gate, you MAY spawn `agents/brief-auditor.md` via `Task` to grade the brief against the five
114
+ brief anti-patterns (vague verbs, missing audience, immeasurable success criteria, scope creep, missing
115
+ anti-goals). The auditor reads `.design/BRIEF.md` plus `reference/brief-quality-rubric.md` and writes
116
+ advisory findings to `.design/BRIEF-AUDIT.md`. This step is advisory and MUST NOT block the
117
+ brief-to-explore transition.
118
+
119
+ If the auditor reports one or more fired anti-patterns, surface a single-line pointer to the user:
120
+
121
+ ```
122
+ Brief audit flagged N issue(s) - run /gdd:discuss brief to refine, or proceed to explore.
123
+ ```
124
+
125
+ The user decides. Proceeding to explore with a flagged brief is allowed; the pointer is a nudge, not a gate.
126
+ If the auditor reports no fired anti-patterns, or you skip the audit, continue to the gate unchanged.
127
+
111
128
  <HARD-GATE>
112
129
  Do NOT transition to explore (or invoke `/gdd:explore`) until the brief artifact (default `.design/BRIEF.md`) is committed AND the user has approved it. If this project uses a custom `.design` location, read the artifact path from `.design/STATE.md` rather than assuming the default.
113
130
  </HARD-GATE>
@@ -39,7 +39,7 @@ Read once at start from `.design/config.json` (all optional; defaults in parens)
39
39
  Stop at the first tier that produces ≥ 1 command:
40
40
 
41
41
  1. **Authoritative config.** If `.design/config.json` has `quality_gate.commands` non-empty, use verbatim.
42
- 2. **Auto-detect from `package.json#scripts`** - match against allowlist: `lint`, `typecheck`, `tsc` (only if `typecheck` absent), `test`, `chromatic`, `test:visual`, `lint:design` (Phase 41 - the `gdd-detect` deterministic anti-pattern gate, alongside `axe`/`pa11y`/`lighthouse`). Exclude by name: `test:e2e`, `test:integration` (if separate `test`), anything starting `dev:`, `build:`, `start:`. Run via `npm run <name>` unless `quality_gate.package_manager` overrides.
42
+ 2. **Auto-detect from `package.json#scripts`** - match against allowlist: `lint`, `typecheck`, `tsc` (only if `typecheck` absent), `test`, `chromatic`, `test:visual`, `lint:design` (Phase 41 - the `gdd-detect` deterministic anti-pattern gate), and the accessibility scripts `axe`, `pa11y`, `lighthouse`, `eslint-plugin-jsx-a11y` (or a script named `jsx-a11y`), which classify into the `a11y` bucket. Exclude by name: `test:e2e`, `test:integration` (if separate `test`), anything starting `dev:`, `build:`, `start:`. Run via `npm run <name>` unless `quality_gate.package_manager` overrides.
43
43
  3. **Skip with notice.** Emit `quality_gate_skipped` (Step 6) and write a `<run/>` with `status="skipped"`. Verify treats skipped as non-blocking.
44
44
 
45
45
  ## Step 2 - Parallel run
@@ -48,7 +48,7 @@ Emit `quality_gate_started`. Spawn each command in a separate `Bash`; collect `{
48
48
 
49
49
  ## Step 3 - Classification
50
50
 
51
- Spawn `quality-gate-runner` agent via `Task` with payload `{outputs: [{command, exit_code, stderr}, ...]}`. Agent returns `{status: "pass"|"fail", classified_failures: {lint, type, test, visual}}`. `pass` → Step 5. `fail` → Step 4.
51
+ Spawn `quality-gate-runner` agent via `Task` with payload `{outputs: [{command, exit_code, stderr}, ...]}`. Agent returns `{status: "pass"|"fail", classified_failures: {lint, type, test, visual, a11y}}`. The `a11y` bucket groups accessibility failures from axe / pa11y / lighthouse / jsx-a11y. `pass` → Step 5. `fail` → Step 4.
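
As a rough illustration of the Step 3 branching described above, the sketch below routes a runner result to Step 5 or Step 4 and lists which buckets are non-empty. The function name `routeGateResult` is hypothetical, and the shape of each bucket (an array of failure summaries) is an assumption; the source only names the bucket keys.

```javascript
// Hypothetical sketch of the Step 3 branch on the quality-gate-runner result.
// Assumption: each bucket in classified_failures is an array of summaries.
const BUCKETS = ['lint', 'type', 'test', 'visual', 'a11y'];

function routeGateResult(result) {
  if (result && result.status === 'pass') {
    return { next: 'Step 5', failing: [] };
  }
  const cf = (result && result.classified_failures) || {};
  // Keep only buckets that actually contain failures.
  const failing = BUCKETS.filter((b) => Array.isArray(cf[b]) && cf[b].length > 0);
  return { next: 'Step 4', failing };
}
```

Treating `a11y` as one more key in the same list keeps accessibility failures on the same path into the fix loop as the other four buckets.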
52
52
 
53
53
  ## Step 4 - Fix loop (D-08)
54
54
 
@@ -0,0 +1,119 @@
1
+ #!/usr/bin/env node
2
+ 'use strict';
3
+ /**
4
+ * hooks/gdd-a11y-gate.js — advisory PostToolUse hook for accessibility failures.
5
+ *
6
+ * Phase 48 (A11Y-GATE). The quality-gate skill classifies failed command runs
7
+ * into buckets {lint, type, test, visual, a11y}. When a tool response carries
8
+ * classified_failures with a non-empty `a11y` bucket, this hook surfaces an
9
+ * advisory note so the accessibility failures are visible without being buried
10
+ * in the gate's JSON, and appends a `quality_gate_a11y` event to the cycle's
11
+ * events.jsonl for observability.
12
+ *
13
+ * Contract (mirrors gdd-mcp-circuit-breaker.js):
14
+ * - Read stdin JSON (the PostToolUse payload).
15
+ * - Inspect payload.tool_response for quality-gate classified_failures.a11y.
16
+ * - If present and non-empty: emit an advisory note + append one events.jsonl row.
17
+ * - ALWAYS write {continue:true} to stdout and exit 0. This hook never blocks.
18
+ *
19
+ * Advisory only: accessibility findings route to design-fixer through the gate's
20
+ * own fix loop, not through this hook. The hook is observability, not a gate.
21
+ * Dependency-free Node (fs + path only).
22
+ */
23
+
24
+ const fs = require('fs');
25
+ const path = require('path');
26
+
27
+ /**
28
+ * Pull the `a11y` bucket out of a tool response, tolerating both the shape
29
+ * where classified_failures sits at the top level and the shape where it is
30
+ * nested under a `quality_gate` / `result` wrapper. Returns an array of
31
+ * summary strings (possibly empty) or null when no a11y bucket is present.
32
+ */
33
+ function extractA11yFailures(toolResponse) {
34
+ if (!toolResponse || typeof toolResponse !== 'object') return null;
35
+
36
+ const candidates = [
37
+ toolResponse.classified_failures,
38
+ toolResponse.quality_gate && toolResponse.quality_gate.classified_failures,
39
+ toolResponse.result && toolResponse.result.classified_failures,
40
+ ];
41
+
42
+ for (const cf of candidates) {
43
+ if (cf && typeof cf === 'object' && Object.prototype.hasOwnProperty.call(cf, 'a11y')) {
44
+ const bucket = cf.a11y;
45
+ if (Array.isArray(bucket)) return bucket;
46
+ // Tolerate a non-array truthy value by coercing to a single-element list.
47
+ if (bucket) return [String(bucket)];
48
+ return [];
49
+ }
50
+ }
51
+ return null;
52
+ }
53
+
54
+ /** Append one JSONL event row; best-effort, never throws on the persist path. */
55
+ function appendEvent(cwd, row) {
56
+ try {
57
+ const eventsPath = path.join(cwd, '.design', 'events.jsonl');
58
+ fs.mkdirSync(path.dirname(eventsPath), { recursive: true });
59
+ fs.appendFileSync(eventsPath, JSON.stringify(row) + '\n', 'utf8');
60
+ } catch {
61
+ /* observability is best-effort — swallow */
62
+ }
63
+ }
64
+
65
+ /**
66
+ * Core hook logic. Accepts a parsed payload and returns the decision object
67
+ * to write to stdout. Exported for unit testing without spawning a process.
68
+ * Always returns an object whose `continue` field is true.
69
+ */
70
+ function evaluate(payload, opts = {}) {
71
+ const cwd = (payload && payload.cwd) || opts.cwd || process.cwd();
72
+ const toolResponse = payload && payload.tool_response;
73
+ const a11y = extractA11yFailures(toolResponse);
74
+
75
+ if (!a11y || a11y.length === 0) {
76
+ return { continue: true };
77
+ }
78
+
79
+ const count = a11y.length;
80
+ const note =
81
+ `gdd-a11y-gate: quality gate reported ${count} accessibility ` +
82
+ `failure${count === 1 ? '' : 's'} in the a11y bucket. These route to ` +
83
+ `design-fixer like lint/type/test/visual failures. Findings: ` +
84
+ a11y.slice(0, 5).join('; ');
85
+
86
+ appendEvent(cwd, {
87
+ ts: new Date().toISOString(),
88
+ event: 'quality_gate_a11y',
89
+ a11y_failure_count: count,
90
+ a11y_failures: a11y.slice(0, 20),
91
+ });
92
+
93
+ // continue:true keeps this advisory — systemMessage surfaces the note.
94
+ return { continue: true, systemMessage: note };
95
+ }
96
+
97
+ async function main(stdin = process.stdin, stdout = process.stdout) {
98
+ let buf = '';
99
+ for await (const chunk of stdin) buf += chunk;
100
+ let payload;
101
+ try {
102
+ payload = JSON.parse(buf || '{}');
103
+ } catch {
104
+ stdout.write(JSON.stringify({ continue: true }));
105
+ return;
106
+ }
107
+ const decision = evaluate(payload);
108
+ stdout.write(JSON.stringify(decision));
109
+ }
110
+
111
+ // Run as a CLI only when invoked directly; tests require() this module and
112
+ // call evaluate()/main() against mock payloads without triggering stdin reads.
113
+ if (require.main === module) {
114
+ main().catch(() => {
115
+ process.stdout.write(JSON.stringify({ continue: true }));
116
+ });
117
+ }
118
+
119
+ module.exports = { main, evaluate, extractA11yFailures, appendEvent };
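
To make the payload shapes the hook tolerates easier to see, here is a condensed restatement of the lookup in `extractA11yFailures`, inlined so the snippet runs on its own. The function name `a11yBucket` and the sample payloads are hypothetical; the real hook should be exercised through its exported `evaluate()` and `extractA11yFailures()`.

```javascript
// Condensed, standalone restatement of the hook's bucket lookup (illustrative only).
function a11yBucket(toolResponse) {
  if (!toolResponse || typeof toolResponse !== 'object') return null;
  // classified_failures may sit at the top level or under quality_gate / result.
  const wrappers = [toolResponse, toolResponse.quality_gate, toolResponse.result];
  for (const w of wrappers) {
    const cf = w && w.classified_failures;
    if (cf && typeof cf === 'object' && Object.prototype.hasOwnProperty.call(cf, 'a11y')) {
      if (Array.isArray(cf.a11y)) return cf.a11y;
      // A non-array truthy value becomes a single-element list; falsy becomes empty.
      return cf.a11y ? [String(cf.a11y)] : [];
    }
  }
  return null; // no a11y bucket anywhere: the hook stays silent
}

// Sample (hypothetical) tool responses:
const topLevel = { classified_failures: { a11y: ['missing alt text'] } };
const nested = { result: { classified_failures: { a11y: 'color-contrast' } } };
const noBucket = { classified_failures: { lint: ['no-unused-vars'] } };

console.log(a11yBucket(topLevel)); // [ 'missing alt text' ]
console.log(a11yBucket(nested));   // [ 'color-contrast' ]
console.log(a11yBucket(noBucket)); // null
```

Only the first two responses would produce a `systemMessage` and an events row; the third returns `null`, which `evaluate()` treats the same as an empty bucket and answers with a bare `{ continue: true }`.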