@hegemonart/get-design-done 1.47.0 → 1.48.0

@@ -0,0 +1,269 @@
1
+ ---
2
+ name: design-debt-crawler
3
+ description: Project-wide retroactive design-debt crawler. Walks the ENTIRE source tree (not STATE.md completed tasks), catalogs raw color literals, anti-pattern hits, untokenized components, contrast and density issues, scores each by priority, and writes the project-scoped .design/debt/DEBT-CATALOG.md. Pure catalog; no auto-fix.
4
+ tools: Read, Bash, Grep, Glob, Write
5
+ color: yellow
6
+ model: inherit
7
+ default-tier: sonnet
8
+ tier-rationale: "Deterministic detection plus structured cataloging; Sonnet balances coverage with cost"
9
+ size_budget: M
10
+ size_budget_rationale: "Worker-tier crawler; 7 debt-class scan procedures plus priority scoring and output contract fit under the 300-line M budget"
11
+ parallel-safe: always
12
+ typical-duration-seconds: 90
13
+ reads-only: false
14
+ writes:
15
+ - ".design/debt/DEBT-CATALOG.md"
16
+ ---
17
+
18
+ @reference/shared-preamble.md
19
+
20
+ # design-debt-crawler
21
+
22
+ ## Role
23
+
24
+ You are a project-wide retroactive design-debt crawler. You walk the entire source
25
+ tree of an existing or legacy codebase, find design debt, group it by category, score
26
+ each finding by priority, and write a single project-scoped report at
27
+ `.design/debt/DEBT-CATALOG.md`.
28
+
29
+ You run once against the whole project, not against one cycle of work. This is the
30
+ defining difference from `design-auditor`: that agent is cycle-scoped and reads the
31
+ pipeline's recently completed work, while you ignore cycle state entirely and survey
32
+ everything that exists on disk right now.
33
+
34
+ You are a pure catalog. You do NOT modify source code, you do NOT apply fixes, and you
35
+ do NOT spawn other agents. For every finding you suggest a remediation command the user
36
+ can run later; you never run it yourself.
37
+
38
+ ## CRITICAL: Project-Wide Scope, Not Cycle Scope
39
+
40
+ **You do NOT read `.design/STATE.md` `<completed_tasks>`.** You do not scope to the
41
+ current cycle, the current wave, or any recently touched file list. Your scope is the
42
+ whole source tree.
43
+
44
+ - You **walk the entire codebase**, every source file under the configured source roots
45
+ (default `src/`), regardless of when it was last changed or whether any GDD cycle ever
46
+ touched it.
47
+ - You write to a **project-scoped** path: `.design/debt/DEBT-CATALOG.md`. This is not a
48
+ cycle artifact and is not placed under any cycle directory.
49
+ - You may read `.design/STATE.md` only to learn the `source_roots` value. You ignore its
50
+ `<completed_tasks>`, `<position>`, `wave`, and `cycle` fields for scoping. If STATE.md
51
+ is absent, default the source root to `src/` and proceed.
52
+
53
+ If you ever find yourself filtering files by a completed-task list, stop: that is the
54
+ cycle-scoped behavior this agent exists to avoid.
55
+
56
+ ## Required Reading
57
+
58
+ The orchestrating stage supplies a `<required_reading>` block in the prompt. Read every
59
+ listed file before acting. Minimum expected files:
60
+
61
+ - @reference/debt-categories.md
62
+ - @reference/anti-patterns.md
63
+
64
+ `reference/debt-categories.md` is the taxonomy you classify against and the source of
65
+ the priority-scoring model. `reference/anti-patterns.md` is the BAN-NN and SLOP-NN
66
+ catalog that the anti-pattern class cross-references.
67
+
68
+ ---
69
+
70
+ ## Work
71
+
72
+ ### Step 1: Determine source roots
73
+
74
+ Read `source_roots` from `.design/STATE.md` if present; otherwise default to `src/`.
75
+ Build the file list once and reuse it for every scan below.
76
+
77
+ ```bash
+ find src/ -type f \( -name "*.tsx" -o -name "*.jsx" -o -name "*.ts" -o -name "*.js" \
+   -o -name "*.vue" -o -name "*.svelte" -o -name "*.css" -o -name "*.scss" \) 2>/dev/null
+ ```
81
+
82
+ ### Step 2: Scan each debt class
83
+
84
+ Run one pass per class from `reference/debt-categories.md`. Record `file:line` plus the
85
+ matched text for every hit so each catalog row is traceable.
86
+
87
+ **color-literal** (raw color values, not token references):
88
+
89
+ ```bash
+ grep -rEn "#[0-9a-fA-F]{3,8}|rgb\(|rgba\(|hsl\(|hsla\(" src/ \
+   --include="*.tsx" --include="*.jsx" --include="*.css" --include="*.scss" 2>/dev/null
+ ```
93
+
94
+ Exclude the palette or token-definition file (a literal inside a `--x: #hex` custom-property
+ definition IS the token). Count distinct literals and total occurrences.
96
+
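As a rough illustration of the counting step above, the following sketch tallies distinct literals and total occurrences from file text. The function name, the normalization, and the rule that a line starting with `--name:` counts as a token definition are assumptions made for this example, not part of the package.

```javascript
'use strict';

// Hex literals plus whole rgb()/rgba()/hsl()/hsla() calls.
const COLOR_LITERAL = /#[0-9a-fA-F]{3,8}\b|\b(?:rgba?|hsla?)\([^)]*\)/g;
// Assumption: a token definition is a CSS custom-property declaration line.
const TOKEN_DEFINITION = /^\s*--[\w-]+\s*:/;

// Hypothetical helper: count color literals in one file's text.
function countColorLiterals(sourceText) {
  const counts = new Map();
  for (const line of sourceText.split('\n')) {
    if (TOKEN_DEFINITION.test(line)) continue; // the literal IS the token here
    for (const match of line.match(COLOR_LITERAL) || []) {
      // Normalize case and whitespace so "#1A73E8" and "#1a73e8" count as one literal.
      const key = match.toLowerCase().replace(/\s+/g, '');
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  let total = 0;
  for (const n of counts.values()) total += n;
  return { distinct: counts.size, total, counts };
}

module.exports = { countColorLiterals };
```

The per-literal map is what would feed the prevalence factor later; like the grep, this sketch can still report false positives such as short `#id` selectors that happen to be valid hex.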
97
+ **anti-pattern** (BAN-NN and SLOP-NN): run the deterministic detector once over the
98
+ tree. It returns every statically matchable rule in one pass with `file`, `line`,
99
+ `ruleId`, and a reference link, offline and with zero model calls.
100
+
101
+ ```bash
+ node "${CLAUDE_PLUGIN_ROOT:-.}/bin/gdd-detect" src/ --json 2>/dev/null || true
+ ```
104
+
105
+ Parse the JSON `findings` array. The detector cannot match the two subjective rules
106
+ (BAN-04 keyboard-action animation, BAN-10 nested equal radius); list those as a
107
+ manual-review note rather than counting them.
108
+
109
+ **untokenized-component** (component renders surface without token references):
110
+
111
+ ```bash
+ # arbitrary bracket values + inline hex inside component files
+ grep -rEn "\[[0-9]+px\]|\[#[0-9a-fA-F]{3,8}\]" src/ \
+   --include="*.tsx" --include="*.jsx" --include="*.vue" --include="*.svelte" 2>/dev/null
+ # token references present in the same file set (for the ratio)
+ grep -rEln "var\(--|theme\(" src/ \
+   --include="*.tsx" --include="*.jsx" --include="*.vue" --include="*.svelte" 2>/dev/null
+ ```
118
+
119
+ A component file with literal or bracket hits and no `var(--` or `theme(` reference is
+ untokenized. The literal-to-token ratio per file is the strength signal.
121
+
122
+ **contrast** (foreground and background pairs below WCAG AA): resolve color pairs that
123
+ share an element or selector, compute the ratio, and flag pairs under 4.5:1 for body
124
+ text or 3:1 for large text and non-text indicators. Pairs built from unresolvable
125
+ runtime values become a manual-review note.
126
+
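The contrast check above can be sketched with the WCAG 2.x relative-luminance formula. This is a minimal illustration that assumes both colors have already been resolved to opaque hex values; the function names are hypothetical, and alpha blending and non-hex formats are out of scope.

```javascript
'use strict';

// Parse "#rgb" or "#rrggbb" into [r, g, b] with channels in 0..255.
function parseHex(hex) {
  let h = hex.replace(/^#/, '');
  if (h.length === 3) h = h.split('').map((c) => c + c).join('');
  if (!/^[0-9a-fA-F]{6}$/.test(h)) throw new Error(`unsupported color: ${hex}`);
  return [0, 2, 4].map((i) => parseInt(h.slice(i, i + 2), 16));
}

// WCAG 2.x relative luminance of an sRGB color.
function relativeLuminance([r, g, b]) {
  const linearize = (v) => {
    const s = v / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1 to 21.
function contrastRatio(fgHex, bgHex) {
  const a = relativeLuminance(parseHex(fgHex));
  const b = relativeLuminance(parseHex(bgHex));
  const [lighter, darker] = a >= b ? [a, b] : [b, a];
  return (lighter + 0.05) / (darker + 0.05);
}

// largeOrNonText selects the 3:1 threshold; body text uses 4.5:1.
function failsAA(fgHex, bgHex, { largeOrNonText = false } = {}) {
  return contrastRatio(fgHex, bgHex) < (largeOrNonText ? 3 : 4.5);
}

module.exports = { parseHex, relativeLuminance, contrastRatio, failsAA };
```

A pair such as `#777` on `#fff` would be flagged for body text but not for large text, which is why the threshold is a parameter rather than a constant.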
127
+ **density-spacing** (off-scale spacing and inconsistent rhythm):
128
+
129
+ ```bash
+ grep -rEon "(p|px|py|pt|pb|pl|pr|m|mx|my|mt|mb|ml|mr|gap|space-[xy])-[0-9.]+" src/ \
+   --include="*.tsx" --include="*.jsx" 2>/dev/null | sort | uniq -c | sort -rn
+ ```
133
+
134
+ Flag values that are not on the project's modular scale (default 4 / 8 / 12 / 16 / 24 /
135
+ 32) and clusters where sibling components use different step counts for one role.
136
+
137
+ **typography-drift** (off-scale sizes, too many families, weak weight hierarchy):
138
+
139
+ ```bash
+ grep -rEon "text-[a-z0-9]+|font-(bold|semibold|medium|normal|light)|font-size:[^;]+" \
+   src/ --include="*.tsx" --include="*.jsx" --include="*.css" 2>/dev/null \
+   | sort | uniq -c | sort -rn
+ grep -rEn "font-family:|fontFamily" src/ --include="*.css" --include="*.ts" 2>/dev/null
+ ```
145
+
146
+ Flag a long tail of one-off sizes, more than two families, and `font-weight` under 400
147
+ on small text.
148
+
149
+ **a11y-text** (text-content accessibility debt):
150
+
151
+ ```bash
+ # -P (PCRE, GNU grep) is required here: the negative lookahead is not valid under -E
+ grep -rPn "<img(?![^>]*\balt=)" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null
+ grep -rEn "No data|No results|Nothing here|went wrong|error occurred" src/ \
+   --include="*.tsx" --include="*.jsx" 2>/dev/null
+ ```
156
+
157
+ Flag meaningful images without `alt`, icon-only controls without an accessible name,
158
+ placeholder used as the only label, and generic empty or error copy.
159
+
160
+ ### Step 3: Group and score
161
+
162
+ Group findings by the seven debt classes. For each finding, assign the three priority
163
+ factors from `reference/debt-categories.md`, each on a 1 to 3 scale:
164
+
165
+ - **visible-delta** (3 primary surface, 2 secondary, 1 edge or assistive-tech only)
166
+ - **effort** (3 mechanical swap, 2 single-component edit, 1 new token or refactor)
167
+ - **prevalence** (3 ten or more instances, 2 three to nine, 1 one or two)
168
+
169
+ Combine by multiplying: `priority = visible-delta × effort × prevalence`, range 1 to 27.
170
+ Sort the catalog by `priority` descending. Break ties by visible-delta, then prevalence.
171
+
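The scoring and ordering rule above can be sketched as follows. The field names (`visibleDelta`, `effort`, `prevalence`) and function names are hypothetical; only the 1 to 3 scale, the multiplication, and the tie-break order come from the text.

```javascript
'use strict';

// Attach priority = visible-delta x effort x prevalence (range 1 to 27).
function scoreFinding(finding) {
  const { visibleDelta, effort, prevalence } = finding;
  for (const factor of [visibleDelta, effort, prevalence]) {
    if (!Number.isInteger(factor) || factor < 1 || factor > 3) {
      throw new RangeError('each factor must be an integer from 1 to 3');
    }
  }
  return { ...finding, priority: visibleDelta * effort * prevalence };
}

// Sort by priority descending; break ties by visible-delta, then prevalence.
function rankFindings(findings) {
  return findings
    .map(scoreFinding)
    .sort(
      (a, b) =>
        b.priority - a.priority ||
        b.visibleDelta - a.visibleDelta ||
        b.prevalence - a.prevalence
    );
}

module.exports = { scoreFinding, rankFindings };
```

Because effort is scored so that a mechanical swap earns 3, multiplication pushes cheap, visible, widespread debt to the top of the catalog.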
172
+ ### Step 4: Write the catalog
173
+
174
+ Create the directory and write the report. Each row suggests a remediation command per
175
+ the ROADMAP open-question default: pure catalog, no auto-fix.
176
+
177
+ ```bash
+ mkdir -p .design/debt
+ ```
180
+
181
+ ---
182
+
183
+ ## Output Format: DEBT-CATALOG.md
184
+
185
+ Write to `.design/debt/DEBT-CATALOG.md` using this structure:
186
+
187
+ ```markdown
+ ---
+ crawled: <ISO 8601 date>
+ scope: project-wide
+ source_roots: [src/]
+ total_findings: N
+ note: "Project-scoped retroactive debt catalog. Does NOT read STATE.md completed_tasks. Pure catalog; no auto-fix."
+ ---
+
+ ## Design Debt Catalog
+
+ **Crawled:** <ISO 8601 date>
+ **Scope:** Entire source tree (project-wide, not cycle-scoped)
+ **Total findings:** N across 7 debt classes
+
+ ---
+
+ ## Summary by Class
+
+ | Debt class | Findings | Top priority |
+ |------------|----------|--------------|
+ | color-literal | N | P |
+ | untokenized-component | N | P |
+ | anti-pattern | N | P |
+ | contrast | N | P |
+ | density-spacing | N | P |
+ | typography-drift | N | P |
+ | a11y-text | N | P |
+
+ ---
+
+ ## Findings (ranked by priority)
+
+ | Priority | Class | Location | Finding | V × E × P | Suggested command |
+ |----------|-------|----------|---------|-----------|-------------------|
+ | 18 | color-literal | src/Card.tsx:42 | Raw #1a73e8 instead of token | 3×3×2 | `/gdd:fast "replace #1a73e8 with semantic token in Card.tsx"` |
+ | 12 | anti-pattern | src/Hero.tsx:8 | BAN-02 gradient text on heading | 3×2×2 | `/gdd:fast "remove BAN-02 gradient text in Hero.tsx"` |
+
+ (One row per finding. The Suggested command column always carries a `/gdd:fast "<finding>"` string.)
+
+ ---
+
+ ## Manual-Review Notes
+
+ Items the deterministic scans cannot decide on their own:
+
+ - BAN-04 (keyboard-action animation) and BAN-10 (nested equal radius): subjective, not statically matched.
+ - Contrast pairs built from unresolvable runtime color values.
+ ```
236
+
237
+ Every finding row MUST carry a `/gdd:fast "<finding>"` suggestion. This agent never
238
+ applies the fix; it only catalogs and suggests.
239
+
240
+ ---
241
+
242
+ ## Constraints
243
+
244
+ **MUST NOT:**
245
+ - Read `.design/STATE.md` `<completed_tasks>` or scope to any cycle, wave, or task list
246
+ - Modify source code or apply any fix (pure catalog, no auto-fix)
247
+ - Spawn other agents
248
+ - Write to any path other than `.design/debt/DEBT-CATALOG.md`
249
+ - Ask the user questions mid-run (single-shot execution)
250
+
251
+ **MAY:**
252
+ - Read any file in the repository
253
+ - Run `grep`, `find`, and `gdd-detect` for static analysis
254
+ - Read `.design/STATE.md` solely to learn `source_roots`
255
+ - Note a `<blocker>` entry in `.design/STATE.md` if the crawl cannot proceed, then still emit the completion marker
256
+
257
+ ---
258
+
259
+ ## Record
260
+
261
+ At run-end, append one JSONL line to `.design/intel/insights.jsonl`:
262
+
263
+ ```json
264
+ {"ts":"<ISO-8601>","agent":"<name>","cycle":"<cycle from STATE.md>","stage":"<stage from STATE.md>","one_line_insight":"<what was produced or learned>","artifacts_written":["<files written>"]}
265
+ ```
266
+
267
+ Schema: `reference/schemas/insight-line.schema.json`.
268
+
269
+ ## CRAWL COMPLETE
@@ -25,6 +25,8 @@ You have zero session memory. Every invocation starts fresh. The orchestrating s
25
25
 
26
26
  **Scope of work:** You apply targeted source-code fixes for gaps listed in `.design/DESIGN-VERIFICATION.md ## Phase 5 — Gaps`. You commit one fix per gap. You do nothing else.
27
27
 
28
+ **Accessibility failures route here too.** When the quality-gate skill classifies a failure into the `a11y` bucket (sourced from axe / pa11y / lighthouse / jsx-a11y runs), it spawns you with that failure exactly like a `lint`, `type`, `test`, or `visual` failure. Treat an `a11y` classified failure as a normal in-scope fix: read the cited rule, apply the minimal source change that clears the violation (a missing label, an aria attribute, a contrast token), confirm the fix, and commit one fix per gap. No special handling beyond the standard fix sequence below.
29
+
28
30
  **What you MUST NOT touch:**
29
31
  - `DESIGN-PLAN.md` - locked during verify
30
32
  - `DESIGN-CONTEXT.md` - locked during verify
@@ -1,11 +1,11 @@
1
1
  ---
2
2
  name: quality-gate-runner
3
- description: "Cheap Haiku classifier that ingests {command, exit_code, stderr} tuples from the quality-gate skill's parallel run and emits a JSON verdict - pass/fail plus per-bucket failure groupings (lint / type / test / visual). Read-only. Does not run commands itself."
3
+ description: "Cheap Haiku classifier that ingests {command, exit_code, stderr} tuples from the quality-gate skill's parallel run and emits a JSON verdict - pass/fail plus per-bucket failure groupings (lint / type / test / visual / a11y). Read-only. Does not run commands itself."
4
4
  tools: Read, Bash, Grep
5
5
  color: amber
6
6
  model: inherit
7
7
  default-tier: haiku
8
- tier-rationale: "Pattern-match exit codes and bucket stderr into four named categories - no synthesis, no rewrites, no spawning. Belongs on Haiku to keep classification cost trivial relative to the actual command runs."
8
+ tier-rationale: "Pattern-match exit codes and bucket stderr into five named categories - no synthesis, no rewrites, no spawning. Belongs on Haiku to keep classification cost trivial relative to the actual command runs."
9
9
  size_budget: S
10
10
  parallel-safe: always
11
11
  typical-duration-seconds: 5
@@ -48,16 +48,17 @@ You may also receive a `stdout` field per entry (forward-compat - the skill plan
48
48
 
49
49
  ## Bucketing rule
50
50
 
51
- Map each command to exactly one of four buckets based on the verbatim command string. Use case-insensitive substring match against the command line:
51
+ Map each command to exactly one of five buckets based on the verbatim command string. Use case-insensitive substring match against the command line:
52
52
 
53
53
  | Substring (case-insensitive) | Bucket |
54
54
  |------------------------------|--------|
55
- | `lint`, `eslint`, `stylelint`, `biome lint` | `lint` |
56
- | `typecheck`, `tsc`, `tsc --noemit`, `flow check` | `type` |
57
- | `test` (but NOT one of the visual matches below - visual wins) | `test` |
55
+ | `axe`, `pa11y`, `lighthouse`, `jsx-a11y`, `eslint-plugin-jsx-a11y` | `a11y` |
58
56
  | `chromatic`, `test:visual`, `loki test`, `playwright test --grep visual` | `visual` |
57
+ | `typecheck`, `tsc`, `tsc --noemit`, `flow check` | `type` |
58
+ | `lint`, `eslint`, `stylelint`, `biome lint` | `lint` |
59
+ | `test` (only when none of the buckets above match) | `test` |
59
60
 
60
- When a command matches multiple substrings (e.g., `npm run test:visual` matches both `test` and `test:visual`), `visual` wins. If a command matches none, bucket it under `test` (catch-all - most user-supplied custom commands are test-like). Do not invent a fifth bucket.
61
+ Match precedence runs top-down: check `a11y` first, then `visual`, then `type`, then `lint`, then `test`. A command can match more than one substring (`npm run test:visual` matches both `test` and `test:visual`, and `eslint-plugin-jsx-a11y` matches both `lint` and `jsx-a11y`); the first bucket in precedence order wins, so `a11y` beats `lint` and `visual` beats `test`. If a command matches none, bucket it under `test` (catch-all - most user-supplied custom commands are test-like). These five buckets (`lint`, `type`, `test`, `visual`, `a11y`) are the complete set; do not invent a sixth bucket.
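The precedence rule above can be sketched as an ordered rule list in which the first bucket with a matching substring wins. This is an illustrative restatement of the table, not the classifier's actual implementation (the agent is a prompt, not code); `BUCKET_RULES` and `bucketFor` are hypothetical names.

```javascript
'use strict';

// Ordered by precedence: a11y, visual, type, lint, then test.
const BUCKET_RULES = [
  ['a11y', ['axe', 'pa11y', 'lighthouse', 'jsx-a11y', 'eslint-plugin-jsx-a11y']],
  ['visual', ['chromatic', 'test:visual', 'loki test', 'playwright test --grep visual']],
  ['type', ['typecheck', 'tsc', 'tsc --noemit', 'flow check']],
  ['lint', ['lint', 'eslint', 'stylelint', 'biome lint']],
  ['test', ['test']],
];

// Case-insensitive substring match against the verbatim command line.
function bucketFor(command) {
  const haystack = String(command).toLowerCase();
  for (const [bucket, substrings] of BUCKET_RULES) {
    if (substrings.some((s) => haystack.includes(s))) return bucket;
  }
  return 'test'; // catch-all for commands that match nothing
}

module.exports = { BUCKET_RULES, bucketFor };
```

Encoding precedence as array order keeps the "first match wins" rule in one place, so adding a bucket means choosing its position rather than editing special cases.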
61
62
 
62
63
  ## Pass / fail rule
63
64
 
@@ -96,17 +97,17 @@ Pass example:
96
97
  Fail example:
97
98
 
98
99
  ```json
99
- {"status": "fail", "classified_failures": {"type": ["typecheck: error TS2304 in src/x.ts"], "visual": ["chromatic: 2 stories changed"]}}
100
+ {"status": "fail", "classified_failures": {"type": ["typecheck: error TS2304 in src/x.ts"], "visual": ["chromatic: 2 stories changed"], "a11y": ["axe: 3 serious violations on /checkout"]}}
100
101
  ```
101
102
 
102
103
  Schema:
103
104
  - `status` - string enum, one of `"pass" | "fail"`. Note: this is NOT the same enum as the skill's STATE-block status (which also has `timeout` and `skipped`); those two cases are decided by the skill, not by you. You only emit `pass | fail`.
104
- - `classified_failures` - object. Keys are a subset of `lint | type | test | visual`. Values are arrays of short summary strings (≤ 120 chars each). The object is `{}` (empty) when `status === "pass"`.
105
+ - `classified_failures` - object. Keys are a subset of `lint | type | test | visual | a11y`. Values are arrays of short summary strings (≤ 120 chars each). The object is `{}` (empty) when `status === "pass"`.
105
106
 
106
107
  ## Constraints
107
108
 
108
109
  - **Do not** read `stderr` content beyond the first non-empty line. The skill keeps the verbatim outputs for the design-fixer; your job is routing, not analysis.
109
- - **Do not** invent buckets outside the four-name set.
110
+ - **Do not** invent buckets outside the five-name set (`lint | type | test | visual | a11y`).
110
111
  - **Do not** ever emit `status: "timeout"` or `status: "skipped"` - those are skill-level statuses, not classifier outputs.
111
112
  - **Do not** consult external services or MCP tools. Classification is a pure function of the supplied input.
112
113
  - **Do not** exceed `size_budget: S`. If `outputs[*].stderr` is unexpectedly large, prefer to summarize from the first 4 KB of each stderr rather than refuse.
@@ -108,6 +108,23 @@ Run this final spec-quality pass over `.design/BRIEF.md` before the brief→expl
108
108
  - Scope check: nothing in the artifact exceeds (or silently drops) the agreed scope.
109
109
  - Ambiguity check: every requirement/decision is specific enough to act on without a follow-up question.
110
110
 
111
+ ## Optional brief audit (non-blocking)
112
+
113
+ Before the gate, you MAY spawn `agents/brief-auditor.md` via `Task` to grade the brief against the five
114
+ brief anti-patterns (vague verbs, missing audience, immeasurable success criteria, scope creep, missing
115
+ anti-goals). The auditor reads `.design/BRIEF.md` plus `reference/brief-quality-rubric.md` and writes
116
+ advisory findings to `.design/BRIEF-AUDIT.md`. This step is advisory and MUST NOT block the
+ brief-to-explore transition.
118
+
119
+ If the auditor reports one or more fired anti-patterns, surface a single-line pointer to the user:
120
+
121
+ ```
122
+ Brief audit flagged N issue(s) - run /gdd:discuss brief to refine, or proceed to explore.
123
+ ```
124
+
125
+ The user decides. Proceeding to explore with a flagged brief is allowed; the pointer is a nudge, not a gate.
126
+ If the auditor reports no fired anti-patterns, or you skip the audit, continue to the gate unchanged.
127
+
111
128
  <HARD-GATE>
112
129
  Do NOT transition to explore (or invoke `/gdd:explore`) until the brief artifact (default `.design/BRIEF.md`) is committed AND the user has approved it. If this project uses a custom `.design` location, read the artifact path from `.design/STATE.md` rather than assuming the default.
113
130
  </HARD-GATE>
@@ -39,7 +39,7 @@ Read once at start from `.design/config.json` (all optional; defaults in parens)
39
39
  Stop at the first tier that produces ≥ 1 command:
40
40
 
41
41
  1. **Authoritative config.** If `.design/config.json` has `quality_gate.commands` non-empty, use verbatim.
42
- 2. **Auto-detect from `package.json#scripts`** - match against allowlist: `lint`, `typecheck`, `tsc` (only if `typecheck` absent), `test`, `chromatic`, `test:visual`, `lint:design` (Phase 41 - the `gdd-detect` deterministic anti-pattern gate, alongside `axe`/`pa11y`/`lighthouse`). Exclude by name: `test:e2e`, `test:integration` (if separate `test`), anything starting `dev:`, `build:`, `start:`. Run via `npm run <name>` unless `quality_gate.package_manager` overrides.
42
+ 2. **Auto-detect from `package.json#scripts`** - match against allowlist: `lint`, `typecheck`, `tsc` (only if `typecheck` absent), `test`, `chromatic`, `test:visual`, `lint:design` (Phase 41 - the `gdd-detect` deterministic anti-pattern gate), and the accessibility scripts `axe`, `pa11y`, `lighthouse`, `eslint-plugin-jsx-a11y` (or a script named `jsx-a11y`) which classify into the `a11y` bucket. Exclude by name: `test:e2e`, `test:integration` (if separate `test`), anything starting `dev:`, `build:`, `start:`. Run via `npm run <name>` unless `quality_gate.package_manager` overrides.
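A minimal sketch of the auto-detect tier described above, assuming the allowlist is matched against exact script names (the text does not say whether matching is exact or by substring). `detectGateCommands` and its parameters are hypothetical.

```javascript
'use strict';

const ALLOWLIST = [
  'lint', 'typecheck', 'tsc', 'test', 'chromatic', 'test:visual', 'lint:design',
  'axe', 'pa11y', 'lighthouse', 'eslint-plugin-jsx-a11y', 'jsx-a11y',
];
// Under exact-name matching these exclusions are already implied by the
// allowlist; they are kept as an explicit guard that mirrors the stated rule.
const EXCLUDED_NAMES = ['test:e2e', 'test:integration'];
const EXCLUDED_PREFIXES = ['dev:', 'build:', 'start:'];

// scripts: the "scripts" object from package.json.
function detectGateCommands(scripts, packageManager = 'npm') {
  const names = Object.keys(scripts || {});
  return names
    .filter((name) => ALLOWLIST.includes(name))
    .filter((name) => !EXCLUDED_NAMES.includes(name))
    .filter((name) => !EXCLUDED_PREFIXES.some((prefix) => name.startsWith(prefix)))
    .filter((name) => !(name === 'tsc' && names.includes('typecheck'))) // tsc only if typecheck absent
    .map((name) => `${packageManager} run ${name}`);
}

module.exports = { detectGateCommands };
```

An empty result corresponds to falling through to the next tier, which skips the gate with a notice.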
43
43
  3. **Skip with notice.** Emit `quality_gate_skipped` (Step 6) and write a `<run/>` with `status="skipped"`. Verify treats skipped as non-blocking.
44
44
 
45
45
  ## Step 2 - Parallel run
@@ -48,7 +48,7 @@ Emit `quality_gate_started`. Spawn each command in a separate `Bash`; collect `{
48
48
 
49
49
  ## Step 3 - Classification
50
50
 
51
- Spawn `quality-gate-runner` agent via `Task` with payload `{outputs: [{command, exit_code, stderr}, ...]}`. Agent returns `{status: "pass"|"fail", classified_failures: {lint, type, test, visual}}`. `pass` → Step 5. `fail` → Step 4.
51
+ Spawn `quality-gate-runner` agent via `Task` with payload `{outputs: [{command, exit_code, stderr}, ...]}`. Agent returns `{status: "pass"|"fail", classified_failures: {lint, type, test, visual, a11y}}`. The `a11y` bucket groups accessibility failures from axe / pa11y / lighthouse / jsx-a11y. `pass` → Step 5. `fail` → Step 4.
52
52
 
53
53
  ## Step 4 - Fix loop (D-08)
54
54
 
@@ -0,0 +1,119 @@
1
+ #!/usr/bin/env node
+ 'use strict';
+ /**
+  * hooks/gdd-a11y-gate.js - advisory PostToolUse hook for accessibility failures.
+  *
+  * Phase 48 (A11Y-GATE). The quality-gate skill classifies failed command runs
+  * into buckets {lint, type, test, visual, a11y}. When a tool response carries
+  * classified_failures with a non-empty `a11y` bucket, this hook surfaces an
+  * advisory note so the accessibility failures are visible without being buried
+  * in the gate's JSON, and appends a `quality_gate_a11y` event to the cycle's
+  * events.jsonl for observability.
+  *
+  * Contract (mirrors gdd-mcp-circuit-breaker.js):
+  *   - Read stdin JSON (the PostToolUse payload).
+  *   - Inspect payload.tool_response for quality-gate classified_failures.a11y.
+  *   - If present and non-empty: emit an advisory note + append one events.jsonl row.
+  *   - ALWAYS write {continue:true} to stdout and exit 0. This hook never blocks.
+  *
+  * Advisory only: accessibility findings route to design-fixer through the gate's
+  * own fix loop, not through this hook. The hook is observability, not a gate.
+  * Dependency-free Node (fs + path only).
+  */
+
+ const fs = require('fs');
+ const path = require('path');
+
+ /**
+  * Pull the `a11y` bucket out of a tool response, tolerating both the shape
+  * where classified_failures sits at the top level and the shape where it is
+  * nested under a `quality_gate` / `result` wrapper. Returns an array of
+  * summary strings (possibly empty) or null when no a11y bucket is present.
+  */
+ function extractA11yFailures(toolResponse) {
+   if (!toolResponse || typeof toolResponse !== 'object') return null;
+
+   const candidates = [
+     toolResponse.classified_failures,
+     toolResponse.quality_gate && toolResponse.quality_gate.classified_failures,
+     toolResponse.result && toolResponse.result.classified_failures,
+   ];
+
+   for (const cf of candidates) {
+     if (cf && typeof cf === 'object' && Object.prototype.hasOwnProperty.call(cf, 'a11y')) {
+       const bucket = cf.a11y;
+       if (Array.isArray(bucket)) return bucket;
+       // Tolerate a non-array truthy value by coercing to a single-element list.
+       if (bucket) return [String(bucket)];
+       return [];
+     }
+   }
+   return null;
+ }
+
+ /** Append one JSONL event row; best-effort, never throws on the persist path. */
+ function appendEvent(cwd, row) {
+   try {
+     const eventsPath = path.join(cwd, '.design', 'events.jsonl');
+     fs.mkdirSync(path.dirname(eventsPath), { recursive: true });
+     fs.appendFileSync(eventsPath, JSON.stringify(row) + '\n', 'utf8');
+   } catch {
+     /* observability is best-effort - swallow */
+   }
+ }
+
+ /**
+  * Core hook logic. Accepts a parsed payload and returns the decision object
+  * to write to stdout. Exported for unit testing without spawning a process.
+  * Always returns an object whose `continue` field is true.
+  */
+ function evaluate(payload, opts = {}) {
+   const cwd = (payload && payload.cwd) || opts.cwd || process.cwd();
+   const toolResponse = payload && payload.tool_response;
+   const a11y = extractA11yFailures(toolResponse);
+
+   if (!a11y || a11y.length === 0) {
+     return { continue: true };
+   }
+
+   const count = a11y.length;
+   const note =
+     `gdd-a11y-gate: quality gate reported ${count} accessibility ` +
+     `failure${count === 1 ? '' : 's'} in the a11y bucket. These route to ` +
+     `design-fixer like lint/type/test/visual failures. Findings: ` +
+     a11y.slice(0, 5).join('; ');
+
+   appendEvent(cwd, {
+     ts: new Date().toISOString(),
+     event: 'quality_gate_a11y',
+     a11y_failure_count: count,
+     a11y_failures: a11y.slice(0, 20),
+   });
+
+   // continue:true keeps this advisory - systemMessage surfaces the note.
+   return { continue: true, systemMessage: note };
+ }
+
+ async function main(stdin = process.stdin, stdout = process.stdout) {
+   let buf = '';
+   for await (const chunk of stdin) buf += chunk;
+   let payload;
+   try {
+     payload = JSON.parse(buf || '{}');
+   } catch {
+     stdout.write(JSON.stringify({ continue: true }));
+     return;
+   }
+   const decision = evaluate(payload);
+   stdout.write(JSON.stringify(decision));
+ }
+
+ // Run as a CLI only when invoked directly; tests require() this module and
+ // call evaluate()/main() against mock payloads without triggering stdin reads.
+ if (require.main === module) {
+   main().catch(() => {
+     process.stdout.write(JSON.stringify({ continue: true }));
+   });
+ }
+
+ module.exports = { main, evaluate, extractA11yFailures, appendEvent };
package/hooks/hooks.json CHANGED
@@ -116,6 +116,14 @@
116
116
  "command": "node --experimental-strip-types \"${CLAUDE_PLUGIN_ROOT}/hooks/context-exhaustion.ts\""
117
117
  }
118
118
  ]
119
+ },
120
+ {
121
+ "hooks": [
122
+ {
123
+ "type": "command",
124
+ "command": "node \"${CLAUDE_PLUGIN_ROOT}/hooks/gdd-a11y-gate.js\""
125
+ }
126
+ ]
119
127
  }
120
128
  ],
121
129
  "Stop": [
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@hegemonart/get-design-done",
3
- "version": "1.47.0",
3
+ "version": "1.48.0",
4
4
  "description": "A design-quality pipeline for AI coding agents: brief, plan, implement, and verify UI work against your design system.",
5
5
  "author": "Hegemon",
6
6
  "homepage": "https://github.com/hegemonart/get-design-done",
@@ -0,0 +1,98 @@
1
+ # Brief Quality Rubric
2
+
3
+ The five anti-patterns `agents/brief-auditor.md` grades `.design/BRIEF.md` against. Each entry pairs a
4
+ definition with a good and bad example, the detection signal the auditor greps for, and a severity note.
5
+ This rubric is advisory: a flagged brief still proceeds to explore. The point is to surface vagueness
6
+ while the cost of fixing it is one sentence, not a redesign.
7
+
8
+ A brief is the contract every later stage checks against. A vague brief produces an unverifiable cycle,
9
+ because verify has nothing concrete to test. The auditor reads the brief once and writes findings to
10
+ `.design/BRIEF-AUDIT.md`; the brief skill then offers `/gdd:discuss brief` when any anti-pattern fires.
11
+
12
+ ---
13
+
14
+ ## AP-1: Vague verbs without a metric
15
+
16
+ **Definition:** The problem or goal uses a soft verb (improve, optimize, streamline, enhance, modernize,
17
+ refresh) with no number, threshold, or observable change attached. The verb hides the actual target.
18
+
19
+ - **Bad:** "Improve the checkout flow."
20
+ - **Good:** "Cut checkout abandonment from 38 percent to under 25 percent on mobile."
21
+
22
+ **Detection signal:** Match soft verbs (`improve`, `optimize`, `streamline`, `enhance`, `modernize`,
23
+ `refresh`) in the Problem or Success Metrics sections, then check the same sentence for a digit, a
24
+ percent sign, or a unit. A soft verb with no adjacent quantity is a hit.
25
+
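The AP-1 detection signal above could be approximated as below. This is a sketch under stated assumptions: sentences are split on terminal punctuation followed by whitespace, only a few inflections of the listed verbs are recognized, and "quantity" is reduced to a digit or a percent sign (unit words without a digit are not detected). The names are hypothetical and the auditor itself is a prompt, not this code.

```javascript
'use strict';

// The six soft verbs from the rubric, with a few common inflections.
const SOFT_VERB = /\b(?:improve|optimize|streamline|enhance|modernize|refresh)(?:s|d|ed|es)?\b/i;
// Simplified quantity check: any digit or a percent sign in the same sentence.
const QUANTITY = /\d|%/;

// Return the sentences that use a soft verb with no quantity attached.
function findVagueVerbSentences(sectionText) {
  return sectionText
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence && SOFT_VERB.test(sentence) && !QUANTITY.test(sentence));
}

module.exports = { findVagueVerbSentences };
```

Returning the matched sentences rather than a boolean lets a report quote the offending text alongside the section it came from.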
26
+ **Severity:** Major. A goal with no metric cannot be verified, so the whole cycle inherits the ambiguity.
27
+
28
+ ---
29
+
30
+ ## AP-2: Missing audience
31
+
32
+ **Definition:** The brief never names who the design is for. No role, device, context, or skill level is
33
+ stated, so every later trade-off (density, reading level, input model) is a guess.
34
+
35
+ - **Bad:** "Build a dashboard for tracking orders."
36
+ - **Good:** "Build an order dashboard for warehouse leads on a shared floor tablet, glanceable at arm's length."
37
+
38
+ **Detection signal:** Read the Audience section. Flag when it is empty, a placeholder (`TBD`, `users`,
39
+ `everyone`, `all users`), or names no role plus context. A single generic noun with no qualifier is a hit.
40
+
41
+ **Severity:** Major. Audience drives density, tone, and accessibility floor; without it the design optimizes
42
+ for no one.
43
+
44
+ ---
45
+
46
+ ## AP-3: Immeasurable success criteria
47
+
48
+ **Definition:** Success is described in feelings rather than observables (looks modern, feels clean, is
49
+ intuitive, delights users). There is no event, count, or threshold a verifier could check.
50
+
51
+ - **Bad:** "Users should feel the app is fast and modern."
52
+ - **Good:** "First contentful paint under 1.5 seconds; task completion rate above 90 percent in five tests."
53
+
54
+ **Detection signal:** Scan Success Metrics for subjective adjectives (`modern`, `clean`, `intuitive`,
55
+ `delightful`, `nice`, `beautiful`) with no paired number or pass/fail condition. Subjective-only criteria
56
+ are a hit.
57
+
58
+ **Severity:** Major. Verify cannot grade a feeling; immeasurable criteria collapse the verify gate.
59
+
60
+ ---
61
+
62
+ ## AP-4: Scope creep
63
+
64
+ **Definition:** The Scope section lists more than the cycle can deliver, or mixes unrelated surfaces into
65
+ one brief, so the in-scope line stops constraining anything.
66
+
67
+ - **Bad:** "Redesign onboarding, billing, settings, the marketing site, and add dark mode."
68
+ - **Good:** "In scope: the three-step onboarding flow. Out of scope: billing, settings, marketing site."
69
+
70
+ **Detection signal:** Count distinct surfaces or top-level features named as in-scope. More than three
71
+ unrelated surfaces in one brief, or an in-scope list with no matching out-of-scope line, is a hit.
72
+
73
+ **Severity:** Minor. Wide scope is recoverable by splitting, but unsplit it inflates every later estimate.
74
+
75
+ ---
76
+
77
+ ## AP-5: Missing anti-goals
78
+
79
+ **Definition:** The brief states what to build but never what to avoid. With no anti-goals, explore widens
80
+ to fill the vacuum and the design picks up patterns the team never wanted.
81
+
82
+ - **Bad:** (Scope lists features only; no "we are deliberately not doing X" line anywhere.)
83
+ - **Good:** "Anti-goals: no new navigation paradigm, no carousel, do not touch the existing auth screens."
84
+
85
+ **Detection signal:** Look for an explicit non-goal, anti-goal, or out-of-scope statement framed as a
86
+ prohibition (`do not`, `avoid`, `no new`, `out of scope`). A brief with zero prohibition statements is a hit.
87
+
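For the AP-5 signal above, a minimal sketch is a single prohibition-phrase check over the brief text. The phrase list is taken from the rubric, with the `anti-goal` and `non-goal` labels added; the function name is hypothetical, and a real auditor would likely also weigh where in the brief the phrase appears.

```javascript
'use strict';

// Prohibition phrasing named in the rubric, plus the section labels themselves.
const PROHIBITION = /\b(?:do not|avoid|no new|out of scope|anti-goals?|non-goals?)\b/i;

// True when the brief contains at least one prohibition statement.
function hasAntiGoals(briefText) {
  return PROHIBITION.test(briefText);
}

module.exports = { hasAntiGoals };
```

A brief for which this returns `false` would count as an AP-5 hit, which the rubric treats as a warning and not a blocker.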
88
+ **Severity:** Minor. Anti-goals prevent drift; their absence is a warning, not a blocker.
89
+
90
+ ---
91
+
92
+ ## How findings are scored
93
+
94
+ The auditor reports a count of fired anti-patterns and lists each with its section and the matched text.
95
+ It does not compute a pass/fail gate and it does not block the brief to explore transition. Major findings
96
+ (AP-1, AP-2, AP-3) carry more weight in the summary line than Minor findings (AP-4, AP-5), so the user
97
+ knows which gaps most threaten a verifiable cycle. When any anti-pattern fires, the brief skill surfaces a
98
+ one-line pointer offering `/gdd:discuss brief` to refine before moving on.