@hegemonart/get-design-done 1.47.0 → 1.48.0
- package/.claude-plugin/marketplace.json +2 -2
- package/.claude-plugin/plugin.json +1 -1
- package/CHANGELOG.md +47 -0
- package/README.md +2 -0
- package/agents/brief-auditor.md +147 -0
- package/agents/copy-auditor.md +215 -0
- package/agents/design-auditor.md +13 -3
- package/agents/design-debt-crawler.md +269 -0
- package/agents/design-fixer.md +2 -0
- package/agents/quality-gate-runner.md +11 -10
- package/dist/claude-code/.claude/skills/brief/SKILL.md +17 -0
- package/dist/claude-code/.claude/skills/quality-gate/SKILL.md +2 -2
- package/hooks/gdd-a11y-gate.js +119 -0
- package/hooks/hooks.json +8 -0
- package/package.json +1 -1
- package/reference/brief-quality-rubric.md +98 -0
- package/reference/copy-quality.md +135 -0
- package/reference/debt-categories.md +148 -0
- package/reference/registry.json +21 -0
- package/skills/brief/SKILL.md +17 -0
- package/skills/quality-gate/SKILL.md +2 -2
package/agents/design-debt-crawler.md
ADDED
|
@@ -0,0 +1,269 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: design-debt-crawler
|
|
3
|
+
description: Project-wide retroactive design-debt crawler. Walks the ENTIRE source tree (not STATE.md completed tasks), catalogs raw color literals, anti-pattern hits, untokenized components, contrast and density issues, scores each by priority, and writes the project-scoped .design/debt/DEBT-CATALOG.md. Pure catalog; no auto-fix.
|
|
4
|
+
tools: Read, Bash, Grep, Glob, Write
|
|
5
|
+
color: yellow
|
|
6
|
+
model: inherit
|
|
7
|
+
default-tier: sonnet
|
|
8
|
+
tier-rationale: "Deterministic detection plus structured cataloging; Sonnet balances coverage with cost"
|
|
9
|
+
size_budget: M
|
|
10
|
+
size_budget_rationale: "Worker-tier crawler; 7 debt-class scan procedures plus priority scoring and output contract fit under the 300-line M budget"
|
|
11
|
+
parallel-safe: always
|
|
12
|
+
typical-duration-seconds: 90
|
|
13
|
+
reads-only: false
|
|
14
|
+
writes:
|
|
15
|
+
- ".design/debt/DEBT-CATALOG.md"
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
@reference/shared-preamble.md
|
|
19
|
+
|
|
20
|
+
# design-debt-crawler
|
|
21
|
+
|
|
22
|
+
## Role
|
|
23
|
+
|
|
24
|
+
You are a project-wide retroactive design-debt crawler. You walk the entire source
|
|
25
|
+
tree of an existing or legacy codebase, find design debt, group it by category, score
|
|
26
|
+
each finding by priority, and write a single project-scoped report at
|
|
27
|
+
`.design/debt/DEBT-CATALOG.md`.
|
|
28
|
+
|
|
29
|
+
You run once against the whole project, not against one cycle of work. This is the
|
|
30
|
+
defining difference from `design-auditor`: that agent is cycle-scoped and reads the
|
|
31
|
+
pipeline's recently completed work, while you ignore cycle state entirely and survey
|
|
32
|
+
everything that exists on disk right now.
|
|
33
|
+
|
|
34
|
+
You are a pure catalog. You do NOT modify source code, you do NOT apply fixes, and you
|
|
35
|
+
do NOT spawn other agents. For every finding you suggest a remediation command the user
|
|
36
|
+
can run later; you never run it yourself.
|
|
37
|
+
|
|
38
|
+
## CRITICAL: Project-Wide Scope, Not Cycle Scope
|
|
39
|
+
|
|
40
|
+
**You do NOT read `.design/STATE.md` `<completed_tasks>`.** You do not scope to the
|
|
41
|
+
current cycle, the current wave, or any recently touched file list. Your scope is the
|
|
42
|
+
whole source tree.
|
|
43
|
+
|
|
44
|
+
- You **walk the entire codebase**, every source file under the configured source roots
|
|
45
|
+
(default `src/`), regardless of when it was last changed or whether any GDD cycle ever
|
|
46
|
+
touched it.
|
|
47
|
+
- You write to a **project-scoped** path: `.design/debt/DEBT-CATALOG.md`. This is not a
|
|
48
|
+
cycle artifact and is not placed under any cycle directory.
|
|
49
|
+
- You may read `.design/STATE.md` only to learn the `source_roots` value. You ignore its
|
|
50
|
+
`<completed_tasks>`, `<position>`, `wave`, and `cycle` fields for scoping. If STATE.md
|
|
51
|
+
is absent, default the source root to `src/` and proceed.
|
|
52
|
+
|
|
53
|
+
If you ever find yourself filtering files by a completed-task list, stop: that is the
|
|
54
|
+
cycle-scoped behavior this agent exists to avoid.
|
|
55
|
+
|
|
56
|
+
## Required Reading
|
|
57
|
+
|
|
58
|
+
The orchestrating stage supplies a `<required_reading>` block in the prompt. Read every
|
|
59
|
+
listed file before acting. Minimum expected files:
|
|
60
|
+
|
|
61
|
+
- @reference/debt-categories.md
|
|
62
|
+
- @reference/anti-patterns.md
|
|
63
|
+
|
|
64
|
+
`reference/debt-categories.md` is the taxonomy you classify against and the source of
|
|
65
|
+
the priority-scoring model. `reference/anti-patterns.md` is the BAN-NN and SLOP-NN
|
|
66
|
+
catalog that the anti-pattern class cross-references.
|
|
67
|
+
|
|
68
|
+
---
|
|
69
|
+
|
|
70
|
+
## Work
|
|
71
|
+
|
|
72
|
+
### Step 1: Determine source roots
|
|
73
|
+
|
|
74
|
+
Read `source_roots` from `.design/STATE.md` if present; otherwise default to `src/`.
|
|
75
|
+
Build the file list once and reuse it for every scan below.
|
|
76
|
+
|
|
77
|
+
```bash
|
|
78
|
+
find src/ -type f \( -name "*.tsx" -o -name "*.jsx" -o -name "*.ts" -o -name "*.js" \
|
|
79
|
+
-o -name "*.vue" -o -name "*.svelte" -o -name "*.css" -o -name "*.scss" \) 2>/dev/null
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
### Step 2: Scan each debt class
|
|
83
|
+
|
|
84
|
+
Run one pass per class from `reference/debt-categories.md`. Record `file:line` plus the
|
|
85
|
+
matched text for every hit so each catalog row is traceable.
|
|
86
|
+
|
|
87
|
+
**color-literal** (raw color values, not token references):
|
|
88
|
+
|
|
89
|
+
```bash
|
|
90
|
+
grep -rEn "#[0-9a-fA-F]{3,8}|rgb\(|rgba\(|hsl\(|hsla\(" src/ \
|
|
91
|
+
--include="*.tsx" --include="*.jsx" --include="*.css" --include="*.scss" 2>/dev/null
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
Exclude the palette or token-definition file (a literal inside a `--x: #hex` custom-property
|
|
95
|
+
definition IS the token). Count distinct literals and total occurrences.
|
|
96
|
+
|
|
97
|
+
**anti-pattern** (BAN-NN and SLOP-NN): run the deterministic detector once over the
|
|
98
|
+
tree. It returns every statically matchable rule in one pass with `file`, `line`,
|
|
99
|
+
`ruleId`, and a reference link, offline and with zero model calls.
|
|
100
|
+
|
|
101
|
+
```bash
|
|
102
|
+
node "${CLAUDE_PLUGIN_ROOT:-.}/bin/gdd-detect" src/ --json 2>/dev/null || true
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
Parse the JSON `findings` array. The detector cannot match the two subjective rules
|
|
106
|
+
(BAN-04 keyboard-action animation, BAN-10 nested equal radius); list those as a
|
|
107
|
+
manual-review note rather than counting them.
|
|
108
|
+
|
|
109
|
+
**untokenized-component** (component renders surface without token references):
|
|
110
|
+
|
|
111
|
+
```bash
|
|
112
|
+
# arbitrary bracket values + inline hex inside component files
|
|
113
|
+
grep -rEn "\[[0-9]+px\]|\[#[0-9a-fA-F]{3,8}\]" src/ \
|
|
114
|
+
--include="*.tsx" --include="*.jsx" --include="*.vue" --include="*.svelte" 2>/dev/null
|
|
115
|
+
# token references present in the same file set (for the ratio)
|
|
116
|
+
grep -rEln "var\(--|theme\(" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
A component file with literal or bracket hits and no `var(--` reference is untokenized.
|
|
120
|
+
The literal-to-token ratio per file is the strength signal.
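The per-file signal described above could be computed roughly as follows. This is a minimal sketch, not the agent's implementation: the function name `tokenizationSignal` is hypothetical, and it assumes that both `var(--` and `theme(` count as token references, as the second grep suggests.

```javascript
'use strict';

// Hypothetical helper for illustration; not part of the package.
// Counts arbitrary bracket values / inline hex against token references
// in one component file's source text.
function tokenizationSignal(source) {
  const literals = (source.match(/\[[0-9]+px\]|\[#[0-9a-fA-F]{3,8}\]/g) || []).length;
  const tokens = (source.match(/var\(--|theme\(/g) || []).length;
  return {
    literals,
    tokens,
    // Literal hits with no token reference at all: the file is untokenized.
    untokenized: literals > 0 && tokens === 0,
    // Literal-to-token ratio; Infinity when the file has no token references.
    ratio: tokens === 0 ? Infinity : literals / tokens,
  };
}
```

A higher ratio marks a file that leans more heavily on literals, which is one way to order the findings within this class.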
|
|
121
|
+
|
|
122
|
+
**contrast** (foreground and background pairs below WCAG AA): resolve color pairs that
|
|
123
|
+
share an element or selector, compute the ratio, and flag pairs under 4.5:1 for body
|
|
124
|
+
text or 3:1 for large text and non-text indicators. Pairs built from unresolvable
|
|
125
|
+
runtime values become a manual-review note.
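The ratio check can be sketched as below. It is an illustration only, assuming six-digit hex inputs and the WCAG 2.x relative-luminance formula; the helper names (`parseHex`, `relativeLuminance`, `contrastRatio`, `failsAA`) are hypothetical and do not appear in the package.

```javascript
'use strict';

// Hypothetical helpers for illustration; not part of the package.

// Parse a six-digit hex color such as "#1a73e8" into [r, g, b] in 0..255.
function parseHex(hex) {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!m) throw new Error(`unsupported color: ${hex}`);
  const n = parseInt(m[1], 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// WCAG 2.x relative luminance of an sRGB color.
function relativeLuminance(hex) {
  const [r, g, b] = parseHex(hex).map((c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio: lighter luminance over darker, each offset by 0.05.
function contrastRatio(fg, bg) {
  const a = relativeLuminance(fg);
  const b = relativeLuminance(bg);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// AA thresholds from the text: 4.5:1 for body text, 3:1 for large text
// and non-text indicators.
function failsAA(fg, bg, { large = false } = {}) {
  return contrastRatio(fg, bg) < (large ? 3 : 4.5);
}
```

Colors that cannot be parsed statically (the sketch throws on anything but six-digit hex) correspond to the manual-review case rather than a finding.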
|
|
126
|
+
|
|
127
|
+
**density-spacing** (off-scale spacing and inconsistent rhythm):
|
|
128
|
+
|
|
129
|
+
```bash
|
|
130
|
+
grep -rEon "(p|px|py|pt|pb|pl|pr|m|mx|my|mt|mb|ml|mr|gap|space-[xy])-[0-9.]+" src/ \
|
|
131
|
+
--include="*.tsx" --include="*.jsx" 2>/dev/null | sort | uniq -c | sort -rn
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
Flag values that are not on the project's modular scale (default 4 / 8 / 12 / 16 / 24 /
|
|
135
|
+
32) and clusters where sibling components use different step counts for one role.
|
|
136
|
+
|
|
137
|
+
**typography-drift** (off-scale sizes, too many families, weak weight hierarchy):
|
|
138
|
+
|
|
139
|
+
```bash
|
|
140
|
+
grep -rEon "text-[a-z0-9]+|font-(bold|semibold|medium|normal|light)|font-size:[^;]+" \
|
|
141
|
+
src/ --include="*.tsx" --include="*.jsx" --include="*.css" 2>/dev/null \
|
|
142
|
+
| sort | uniq -c | sort -rn
|
|
143
|
+
grep -rEn "font-family:|fontFamily" src/ --include="*.css" --include="*.ts" 2>/dev/null
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
Flag a long tail of one-off sizes, more than two families, and `font-weight` under 400
|
|
147
|
+
on small text.
|
|
148
|
+
|
|
149
|
+
**a11y-text** (text-content accessibility debt):
|
|
150
|
+
|
|
151
|
+
```bash
|
|
152
|
+
grep -rPn "<img(?![^>]*\balt=)" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null
|
|
153
|
+
grep -rEn "No data|No results|Nothing here|went wrong|error occurred" src/ \
|
|
154
|
+
--include="*.tsx" --include="*.jsx" 2>/dev/null
|
|
155
|
+
```
|
|
156
|
+
|
|
157
|
+
Flag meaningful images without `alt`, icon-only controls without an accessible name,
|
|
158
|
+
placeholder used as the only label, and generic empty or error copy.
|
|
159
|
+
|
|
160
|
+
### Step 3: Group and score
|
|
161
|
+
|
|
162
|
+
Group findings by the seven debt classes. For each finding, assign the three priority
|
|
163
|
+
factors from `reference/debt-categories.md`, each on a 1 to 3 scale:
|
|
164
|
+
|
|
165
|
+
- **visible-delta** (3 primary surface, 2 secondary, 1 edge or assistive-tech only)
|
|
166
|
+
- **effort** (3 mechanical swap, 2 single-component edit, 1 new token or refactor)
|
|
167
|
+
- **prevalence** (3 ten or more instances, 2 three to nine, 1 one or two)
|
|
168
|
+
|
|
169
|
+
Combine by multiplying: `priority = visible-delta × effort × prevalence`, range 1 to 27.
|
|
170
|
+
Sort the catalog by `priority` descending. Break ties by visible-delta, then prevalence.
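As a rough illustration of the scoring and ordering just described, the sketch below multiplies the three factors and sorts the result. The finding shape (`visibleDelta`, `effort`, `prevalence`) and the function names are assumptions made for this example, and it assumes ties are broken with the higher factor first.

```javascript
'use strict';

// Hypothetical finding shape: each factor is an integer from 1 to 3.
function priorityOf(f) {
  return f.visibleDelta * f.effort * f.prevalence;
}

// Map an instance count to the prevalence factor (10+ -> 3, 3..9 -> 2, 1..2 -> 1).
function prevalenceFactor(count) {
  if (count >= 10) return 3;
  if (count >= 3) return 2;
  return 1;
}

// Sort by priority descending; break ties by visible-delta, then prevalence.
function rankFindings(findings) {
  return findings
    .map((f) => ({ ...f, priority: priorityOf(f) }))
    .sort(
      (a, b) =>
        b.priority - a.priority ||
        b.visibleDelta - a.visibleDelta ||
        b.prevalence - a.prevalence
    );
}
```

With these definitions, a finding scored 3×3×2 ranks at 18 and one scored 3×2×2 at 12, matching the sample rows in the output format.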
|
|
171
|
+
|
|
172
|
+
### Step 4: Write the catalog
|
|
173
|
+
|
|
174
|
+
Create the directory and write the report. Each row suggests a remediation command per
|
|
175
|
+
the ROADMAP open-question default: pure catalog, no auto-fix.
|
|
176
|
+
|
|
177
|
+
```bash
|
|
178
|
+
mkdir -p .design/debt
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
---
|
|
182
|
+
|
|
183
|
+
## Output Format: DEBT-CATALOG.md
|
|
184
|
+
|
|
185
|
+
Write to `.design/debt/DEBT-CATALOG.md` using this structure:
|
|
186
|
+
|
|
187
|
+
```markdown
|
|
188
|
+
---
|
|
189
|
+
crawled: <ISO 8601 date>
|
|
190
|
+
scope: project-wide
|
|
191
|
+
source_roots: [src/]
|
|
192
|
+
total_findings: N
|
|
193
|
+
note: "Project-scoped retroactive debt catalog. Does NOT read STATE.md completed_tasks. Pure catalog; no auto-fix."
|
|
194
|
+
---
|
|
195
|
+
|
|
196
|
+
## Design Debt Catalog
|
|
197
|
+
|
|
198
|
+
**Crawled:** <ISO 8601 date>
|
|
199
|
+
**Scope:** Entire source tree (project-wide, not cycle-scoped)
|
|
200
|
+
**Total findings:** N across 7 debt classes
|
|
201
|
+
|
|
202
|
+
---
|
|
203
|
+
|
|
204
|
+
## Summary by Class
|
|
205
|
+
|
|
206
|
+
| Debt class | Findings | Top priority |
|
|
207
|
+
|------------|----------|--------------|
|
|
208
|
+
| color-literal | N | P |
|
|
209
|
+
| untokenized-component | N | P |
|
|
210
|
+
| anti-pattern | N | P |
|
|
211
|
+
| contrast | N | P |
|
|
212
|
+
| density-spacing | N | P |
|
|
213
|
+
| typography-drift | N | P |
|
|
214
|
+
| a11y-text | N | P |
|
|
215
|
+
|
|
216
|
+
---
|
|
217
|
+
|
|
218
|
+
## Findings (ranked by priority)
|
|
219
|
+
|
|
220
|
+
| Priority | Class | Location | Finding | V × E × P | Suggested command |
|
|
221
|
+
|----------|-------|----------|---------|-----------|-------------------|
|
|
222
|
+
| 18 | color-literal | src/Card.tsx:42 | Raw #1a73e8 instead of token | 3×3×2 | `/gdd:fast "replace #1a73e8 with semantic token in Card.tsx"` |
|
|
223
|
+
| 12 | anti-pattern | src/Hero.tsx:8 | BAN-02 gradient text on heading | 3×2×2 | `/gdd:fast "remove BAN-02 gradient text in Hero.tsx"` |
|
|
224
|
+
|
|
225
|
+
(One row per finding. The Suggested command column always carries a `/gdd:fast "<finding>"` string.)
|
|
226
|
+
|
|
227
|
+
---
|
|
228
|
+
|
|
229
|
+
## Manual-Review Notes
|
|
230
|
+
|
|
231
|
+
Items the deterministic scans cannot decide on their own:
|
|
232
|
+
|
|
233
|
+
- BAN-04 (keyboard-action animation) and BAN-10 (nested equal radius): subjective, not statically matched.
|
|
234
|
+
- Contrast pairs built from unresolvable runtime color values.
|
|
235
|
+
```
|
|
236
|
+
|
|
237
|
+
Every finding row MUST carry a `/gdd:fast "<finding>"` suggestion. This agent never
|
|
238
|
+
applies the fix; it only catalogs and suggests.
|
|
239
|
+
|
|
240
|
+
---
|
|
241
|
+
|
|
242
|
+
## Constraints
|
|
243
|
+
|
|
244
|
+
**MUST NOT:**
|
|
245
|
+
- Read `.design/STATE.md` `<completed_tasks>` or scope to any cycle, wave, or task list
|
|
246
|
+
- Modify source code or apply any fix (pure catalog, no auto-fix)
|
|
247
|
+
- Spawn other agents
|
|
248
|
+
- Write to any path other than `.design/debt/DEBT-CATALOG.md`
|
|
249
|
+
- Ask the user questions mid-run (single-shot execution)
|
|
250
|
+
|
|
251
|
+
**MAY:**
|
|
252
|
+
- Read any file in the repository
|
|
253
|
+
- Run `grep`, `find`, and `gdd-detect` for static analysis
|
|
254
|
+
- Read `.design/STATE.md` solely to learn `source_roots`
|
|
255
|
+
- Note a `<blocker>` entry in `.design/STATE.md` if the crawl cannot proceed, then still emit the completion marker
|
|
256
|
+
|
|
257
|
+
---
|
|
258
|
+
|
|
259
|
+
## Record
|
|
260
|
+
|
|
261
|
+
At run-end, append one JSONL line to `.design/intel/insights.jsonl`:
|
|
262
|
+
|
|
263
|
+
```json
|
|
264
|
+
{"ts":"<ISO-8601>","agent":"<name>","cycle":"<cycle from STATE.md>","stage":"<stage from STATE.md>","one_line_insight":"<what was produced or learned>","artifacts_written":["<files written>"]}
|
|
265
|
+
```
|
|
266
|
+
|
|
267
|
+
Schema: `reference/schemas/insight-line.schema.json`.
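A minimal sketch of the append step, written in the same dependency-free style as the package's hooks. The helper names `buildInsight` and `appendInsight` are hypothetical, and the sketch does not validate the row against the schema file.

```javascript
'use strict';

const fs = require('fs');
const path = require('path');

// Hypothetical helper: build one insight row with the fields from the template above.
function buildInsight({ agent, cycle, stage, insight, artifacts }) {
  return {
    ts: new Date().toISOString(),
    agent,
    cycle,
    stage,
    one_line_insight: insight,
    artifacts_written: artifacts,
  };
}

// Append the row as a single JSON line, creating .design/intel/ if needed.
function appendInsight(projectRoot, row) {
  const file = path.join(projectRoot, '.design', 'intel', 'insights.jsonl');
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.appendFileSync(file, JSON.stringify(row) + '\n', 'utf8');
  return file;
}
```

Serializing with `JSON.stringify` and a trailing newline keeps each record on one line, which is what makes the file valid JSONL.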
|
|
268
|
+
|
|
269
|
+
## CRAWL COMPLETE
|
package/agents/design-fixer.md
CHANGED
|
@@ -25,6 +25,8 @@ You have zero session memory. Every invocation starts fresh. The orchestrating s
|
|
|
25
25
|
|
|
26
26
|
**Scope of work:** You apply targeted source-code fixes for gaps listed in `.design/DESIGN-VERIFICATION.md ## Phase 5 — Gaps`. You commit one fix per gap. You do nothing else.
|
|
27
27
|
|
|
28
|
+
**Accessibility failures route here too.** When the quality-gate skill classifies a failure into the `a11y` bucket (sourced from axe / pa11y / lighthouse / jsx-a11y runs), it spawns you with that failure exactly like a `lint`, `type`, `test`, or `visual` failure. Treat an `a11y` classified failure as a normal in-scope fix: read the cited rule, apply the minimal source change that clears the violation (a missing label, an aria attribute, a contrast token), confirm the fix, and commit one fix per gap. No special handling beyond the standard fix sequence below.
|
|
29
|
+
|
|
28
30
|
**What you MUST NOT touch:**
|
|
29
31
|
- `DESIGN-PLAN.md` - locked during verify
|
|
30
32
|
- `DESIGN-CONTEXT.md` - locked during verify
|
package/agents/quality-gate-runner.md
CHANGED
|
@@ -1,11 +1,11 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: quality-gate-runner
|
|
3
|
-
description: "Cheap Haiku classifier that ingests {command, exit_code, stderr} tuples from the quality-gate skill's parallel run and emits a JSON verdict - pass/fail plus per-bucket failure groupings (lint / type / test / visual). Read-only. Does not run commands itself."
|
|
3
|
+
description: "Cheap Haiku classifier that ingests {command, exit_code, stderr} tuples from the quality-gate skill's parallel run and emits a JSON verdict - pass/fail plus per-bucket failure groupings (lint / type / test / visual / a11y). Read-only. Does not run commands itself."
|
|
4
4
|
tools: Read, Bash, Grep
|
|
5
5
|
color: amber
|
|
6
6
|
model: inherit
|
|
7
7
|
default-tier: haiku
|
|
8
|
-
tier-rationale: "Pattern-match exit codes and bucket stderr into
|
|
8
|
+
tier-rationale: "Pattern-match exit codes and bucket stderr into five named categories - no synthesis, no rewrites, no spawning. Belongs on Haiku to keep classification cost trivial relative to the actual command runs."
|
|
9
9
|
size_budget: S
|
|
10
10
|
parallel-safe: always
|
|
11
11
|
typical-duration-seconds: 5
|
|
@@ -48,16 +48,17 @@ You may also receive a `stdout` field per entry (forward-compat - the skill plan
|
|
|
48
48
|
|
|
49
49
|
## Bucketing rule
|
|
50
50
|
|
|
51
|
-
Map each command to exactly one of
|
|
51
|
+
Map each command to exactly one of five buckets based on the verbatim command string. Use case-insensitive substring match against the command line:
|
|
52
52
|
|
|
53
53
|
| Substring (case-insensitive) | Bucket |
|
|
54
54
|
|------------------------------|--------|
|
|
55
|
-
| `
|
|
56
|
-
| `typecheck`, `tsc`, `tsc --noemit`, `flow check` | `type` |
|
|
57
|
-
| `test` (but NOT one of the visual matches below - visual wins) | `test` |
|
|
55
|
+
| `axe`, `pa11y`, `lighthouse`, `jsx-a11y`, `eslint-plugin-jsx-a11y` | `a11y` |
|
|
58
56
|
| `chromatic`, `test:visual`, `loki test`, `playwright test --grep visual` | `visual` |
|
|
57
|
+
| `typecheck`, `tsc`, `tsc --noemit`, `flow check` | `type` |
|
|
58
|
+
| `lint`, `eslint`, `stylelint`, `biome lint` | `lint` |
|
|
59
|
+
| `test` (only when none of the buckets above match) | `test` |
|
|
59
60
|
|
|
60
|
-
|
|
61
|
+
Match precedence runs top-down: check `a11y` first, then `visual`, then `type`, then `lint`, then `test`. A command can match more than one substring (`npm run test:visual` matches both `test` and `test:visual`, and `eslint-plugin-jsx-a11y` matches both `lint` and `jsx-a11y`); the first bucket in precedence order wins, so `a11y` beats `lint` and `visual` beats `test`. If a command matches none, bucket it under `test` (catch-all - most user-supplied custom commands are test-like). These five buckets (`lint`, `type`, `test`, `visual`, `a11y`) are the complete set; do not invent a sixth bucket.
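The precedence rule can be expressed as an ordered table scanned top-down. The sketch below is illustrative only: the agent is a model-driven classifier, so this is not its implementation, and the names `BUCKET_RULES` and `bucketFor` are hypothetical.

```javascript
'use strict';

// Precedence-ordered substring table, copied from the bucketing rule above.
const BUCKET_RULES = [
  ['a11y', ['axe', 'pa11y', 'lighthouse', 'jsx-a11y', 'eslint-plugin-jsx-a11y']],
  ['visual', ['chromatic', 'test:visual', 'loki test', 'playwright test --grep visual']],
  ['type', ['typecheck', 'tsc', 'tsc --noemit', 'flow check']],
  ['lint', ['lint', 'eslint', 'stylelint', 'biome lint']],
  ['test', ['test']],
];

// Return the first bucket whose substrings match the lowercased command;
// fall back to `test` when nothing matches (the catch-all).
function bucketFor(command) {
  const haystack = String(command).toLowerCase();
  for (const [bucket, needles] of BUCKET_RULES) {
    if (needles.some((needle) => haystack.includes(needle))) return bucket;
  }
  return 'test';
}
```

Because the table is ordered, overlapping matches need no special cases: `npm run test:visual` stops at `visual` before it reaches `test`, and a command naming `eslint-plugin-jsx-a11y` stops at `a11y` before it reaches `lint`.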
|
|
61
62
|
|
|
62
63
|
## Pass / fail rule
|
|
63
64
|
|
|
@@ -96,17 +97,17 @@ Pass example:
|
|
|
96
97
|
Fail example:
|
|
97
98
|
|
|
98
99
|
```json
|
|
99
|
-
{"status": "fail", "classified_failures": {"type": ["typecheck: error TS2304 in src/x.ts"], "visual": ["chromatic: 2 stories changed"]}}
|
|
100
|
+
{"status": "fail", "classified_failures": {"type": ["typecheck: error TS2304 in src/x.ts"], "visual": ["chromatic: 2 stories changed"], "a11y": ["axe: 3 serious violations on /checkout"]}}
|
|
100
101
|
```
|
|
101
102
|
|
|
102
103
|
Schema:
|
|
103
104
|
- `status` - string enum, one of `"pass" | "fail"`. Note: this is NOT the same enum as the skill's STATE-block status (which also has `timeout` and `skipped`); those two cases are decided by the skill, not by you. You only emit `pass | fail`.
|
|
104
|
-
- `classified_failures` - object. Keys are a subset of `lint | type | test | visual`. Values are arrays of short summary strings (≤ 120 chars each). The object is `{}` (empty) when `status === "pass"`.
|
|
105
|
+
- `classified_failures` - object. Keys are a subset of `lint | type | test | visual | a11y`. Values are arrays of short summary strings (≤ 120 chars each). The object is `{}` (empty) when `status === "pass"`.
|
|
105
106
|
|
|
106
107
|
## Constraints
|
|
107
108
|
|
|
108
109
|
- **Do not** read `stderr` content beyond the first non-empty line. The skill keeps the verbatim outputs for the design-fixer; your job is routing, not analysis.
|
|
109
|
-
- **Do not** invent buckets outside the
|
|
110
|
+
- **Do not** invent buckets outside the five-name set (`lint | type | test | visual | a11y`).
|
|
110
111
|
- **Do not** ever emit `status: "timeout"` or `status: "skipped"` - those are skill-level statuses, not classifier outputs.
|
|
111
112
|
- **Do not** consult external services or MCP tools. Classification is a pure function of the supplied input.
|
|
112
113
|
- **Do not** exceed `size_budget: S`. If `outputs[*].stderr` is unexpectedly large, prefer to summarize from the first 4 KB of each stderr rather than refuse.
|
package/dist/claude-code/.claude/skills/brief/SKILL.md
CHANGED
|
@@ -108,6 +108,23 @@ Run this final spec-quality pass over `.design/BRIEF.md` before the brief→expl
|
|
|
108
108
|
- Scope check: nothing in the artifact exceeds (or silently drops) the agreed scope.
|
|
109
109
|
- Ambiguity check: every requirement/decision is specific enough to act on without a follow-up question.
|
|
110
110
|
|
|
111
|
+
## Optional brief audit (non-blocking)
|
|
112
|
+
|
|
113
|
+
Before the gate, you MAY spawn `agents/brief-auditor.md` via `Task` to grade the brief against the five
|
|
114
|
+
brief anti-patterns (vague verbs, missing audience, immeasurable success criteria, scope creep, missing
|
|
115
|
+
anti-goals). The auditor reads `.design/BRIEF.md` plus `reference/brief-quality-rubric.md` and writes
|
|
116
|
+
advisory findings to `.design/BRIEF-AUDIT.md`. This step is advisory and MUST NOT block the brief to
|
|
117
|
+
explore transition.
|
|
118
|
+
|
|
119
|
+
If the auditor reports one or more fired anti-patterns, surface a single-line pointer to the user:
|
|
120
|
+
|
|
121
|
+
```
|
|
122
|
+
Brief audit flagged N issue(s) - run /gdd:discuss brief to refine, or proceed to explore.
|
|
123
|
+
```
|
|
124
|
+
|
|
125
|
+
The user decides. Proceeding to explore with a flagged brief is allowed; the pointer is a nudge, not a gate.
|
|
126
|
+
If the auditor reports no fired anti-patterns, or you skip the audit, continue to the gate unchanged.
|
|
127
|
+
|
|
111
128
|
<HARD-GATE>
|
|
112
129
|
Do NOT transition to explore (or invoke `/gdd:explore`) until the brief artifact (default `.design/BRIEF.md`) is committed AND the user has approved it. If this project uses a custom `.design` location, read the artifact path from `.design/STATE.md` rather than assuming the default.
|
|
113
130
|
</HARD-GATE>
|
package/dist/claude-code/.claude/skills/quality-gate/SKILL.md
CHANGED
|
@@ -39,7 +39,7 @@ Read once at start from `.design/config.json` (all optional; defaults in parens)
|
|
|
39
39
|
Stop at the first tier that produces ≥ 1 command:
|
|
40
40
|
|
|
41
41
|
1. **Authoritative config.** If `.design/config.json` has `quality_gate.commands` non-empty, use verbatim.
|
|
42
|
-
2. **Auto-detect from `package.json#scripts`** - match against allowlist: `lint`, `typecheck`, `tsc` (only if `typecheck` absent), `test`, `chromatic`, `test:visual`, `lint:design` (Phase 41 - the `gdd-detect` deterministic anti-pattern gate,
|
|
42
|
+
2. **Auto-detect from `package.json#scripts`** - match against allowlist: `lint`, `typecheck`, `tsc` (only if `typecheck` absent), `test`, `chromatic`, `test:visual`, `lint:design` (Phase 41 - the `gdd-detect` deterministic anti-pattern gate), and the accessibility scripts `axe`, `pa11y`, `lighthouse`, `eslint-plugin-jsx-a11y` (or a script named `jsx-a11y`) which classify into the `a11y` bucket. Exclude by name: `test:e2e`, `test:integration` (if separate `test`), anything starting `dev:`, `build:`, `start:`. Run via `npm run <name>` unless `quality_gate.package_manager` overrides.
|
|
43
43
|
3. **Skip with notice.** Emit `quality_gate_skipped` (Step 6) and write a `<run/>` with `status="skipped"`. Verify treats skipped as non-blocking.
|
|
44
44
|
|
|
45
45
|
## Step 2 - Parallel run
|
|
@@ -48,7 +48,7 @@ Emit `quality_gate_started`. Spawn each command in a separate `Bash`; collect `{
|
|
|
48
48
|
|
|
49
49
|
## Step 3 - Classification
|
|
50
50
|
|
|
51
|
-
Spawn `quality-gate-runner` agent via `Task` with payload `{outputs: [{command, exit_code, stderr}, ...]}`. Agent returns `{status: "pass"|"fail", classified_failures: {lint, type, test, visual}}`. `pass` → Step 5. `fail` → Step 4.
|
|
51
|
+
Spawn `quality-gate-runner` agent via `Task` with payload `{outputs: [{command, exit_code, stderr}, ...]}`. Agent returns `{status: "pass"|"fail", classified_failures: {lint, type, test, visual, a11y}}`. The `a11y` bucket groups accessibility failures from axe / pa11y / lighthouse / jsx-a11y. `pass` → Step 5. `fail` → Step 4.
|
|
52
52
|
|
|
53
53
|
## Step 4 - Fix loop (D-08)
|
|
54
54
|
|
package/hooks/gdd-a11y-gate.js
ADDED
|
@@ -0,0 +1,119 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
'use strict';
|
|
3
|
+
/**
|
|
4
|
+
* hooks/gdd-a11y-gate.js — advisory PostToolUse hook for accessibility failures.
|
|
5
|
+
*
|
|
6
|
+
* Phase 48 (A11Y-GATE). The quality-gate skill classifies failed command runs
|
|
7
|
+
* into buckets {lint, type, test, visual, a11y}. When a tool response carries
|
|
8
|
+
* classified_failures with a non-empty `a11y` bucket, this hook surfaces an
|
|
9
|
+
* advisory note so the accessibility failures are visible without being buried
|
|
10
|
+
* in the gate's JSON, and appends a `quality_gate_a11y` event to the cycle's
|
|
11
|
+
* events.jsonl for observability.
|
|
12
|
+
*
|
|
13
|
+
* Contract (mirrors gdd-mcp-circuit-breaker.js):
|
|
14
|
+
* - Read stdin JSON (the PostToolUse payload).
|
|
15
|
+
* - Inspect payload.tool_response for quality-gate classified_failures.a11y.
|
|
16
|
+
* - If present and non-empty: emit an advisory note + append one events.jsonl row.
|
|
17
|
+
* - ALWAYS write {continue:true} to stdout and exit 0. This hook never blocks.
|
|
18
|
+
*
|
|
19
|
+
* Advisory only: accessibility findings route to design-fixer through the gate's
|
|
20
|
+
* own fix loop, not through this hook. The hook is observability, not a gate.
|
|
21
|
+
* Dependency-free Node (fs + path only).
|
|
22
|
+
*/
|
|
23
|
+
|
|
24
|
+
const fs = require('fs');
|
|
25
|
+
const path = require('path');
|
|
26
|
+
|
|
27
|
+
/**
|
|
28
|
+
* Pull the `a11y` bucket out of a tool response, tolerating both the shape
|
|
29
|
+
* where classified_failures sits at the top level and the shape where it is
|
|
30
|
+
* nested under a `quality_gate` / `result` wrapper. Returns an array of
|
|
31
|
+
* summary strings (possibly empty) or null when no a11y bucket is present.
|
|
32
|
+
*/
|
|
33
|
+
function extractA11yFailures(toolResponse) {
|
|
34
|
+
if (!toolResponse || typeof toolResponse !== 'object') return null;
|
|
35
|
+
|
|
36
|
+
const candidates = [
|
|
37
|
+
toolResponse.classified_failures,
|
|
38
|
+
toolResponse.quality_gate && toolResponse.quality_gate.classified_failures,
|
|
39
|
+
toolResponse.result && toolResponse.result.classified_failures,
|
|
40
|
+
];
|
|
41
|
+
|
|
42
|
+
for (const cf of candidates) {
|
|
43
|
+
if (cf && typeof cf === 'object' && Object.prototype.hasOwnProperty.call(cf, 'a11y')) {
|
|
44
|
+
const bucket = cf.a11y;
|
|
45
|
+
if (Array.isArray(bucket)) return bucket;
|
|
46
|
+
// Tolerate a non-array truthy value by coercing to a single-element list.
|
|
47
|
+
if (bucket) return [String(bucket)];
|
|
48
|
+
return [];
|
|
49
|
+
}
|
|
50
|
+
}
|
|
51
|
+
return null;
|
|
52
|
+
}
|
|
53
|
+
|
|
54
|
+
/** Append one JSONL event row; best-effort, never throws on the persist path. */
|
|
55
|
+
function appendEvent(cwd, row) {
|
|
56
|
+
try {
|
|
57
|
+
const eventsPath = path.join(cwd, '.design', 'events.jsonl');
|
|
58
|
+
fs.mkdirSync(path.dirname(eventsPath), { recursive: true });
|
|
59
|
+
fs.appendFileSync(eventsPath, JSON.stringify(row) + '\n', 'utf8');
|
|
60
|
+
} catch {
|
|
61
|
+
/* observability is best-effort — swallow */
|
|
62
|
+
}
|
|
63
|
+
}
|
|
64
|
+
|
|
65
|
+
/**
|
|
66
|
+
* Core hook logic. Accepts a parsed payload and returns the decision object
|
|
67
|
+
* to write to stdout. Exported for unit testing without spawning a process.
|
|
68
|
+
* Always returns an object whose `continue` field is true.
|
|
69
|
+
*/
|
|
70
|
+
function evaluate(payload, opts = {}) {
|
|
71
|
+
const cwd = (payload && payload.cwd) || opts.cwd || process.cwd();
|
|
72
|
+
const toolResponse = payload && payload.tool_response;
|
|
73
|
+
const a11y = extractA11yFailures(toolResponse);
|
|
74
|
+
|
|
75
|
+
if (!a11y || a11y.length === 0) {
|
|
76
|
+
return { continue: true };
|
|
77
|
+
}
|
|
78
|
+
|
|
79
|
+
const count = a11y.length;
|
|
80
|
+
const note =
|
|
81
|
+
`gdd-a11y-gate: quality gate reported ${count} accessibility ` +
|
|
82
|
+
`failure${count === 1 ? '' : 's'} in the a11y bucket. These route to ` +
|
|
83
|
+
`design-fixer like lint/type/test/visual failures. Findings: ` +
|
|
84
|
+
a11y.slice(0, 5).join('; ');
|
|
85
|
+
|
|
86
|
+
appendEvent(cwd, {
|
|
87
|
+
ts: new Date().toISOString(),
|
|
88
|
+
event: 'quality_gate_a11y',
|
|
89
|
+
a11y_failure_count: count,
|
|
90
|
+
a11y_failures: a11y.slice(0, 20),
|
|
91
|
+
});
|
|
92
|
+
|
|
93
|
+
// continue:true keeps this advisory — systemMessage surfaces the note.
|
|
94
|
+
return { continue: true, systemMessage: note };
|
|
95
|
+
}
|
|
96
|
+
|
|
97
|
+
async function main(stdin = process.stdin, stdout = process.stdout) {
|
|
98
|
+
let buf = '';
|
|
99
|
+
for await (const chunk of stdin) buf += chunk;
|
|
100
|
+
let payload;
|
|
101
|
+
try {
|
|
102
|
+
payload = JSON.parse(buf || '{}');
|
|
103
|
+
} catch {
|
|
104
|
+
stdout.write(JSON.stringify({ continue: true }));
|
|
105
|
+
return;
|
|
106
|
+
}
|
|
107
|
+
const decision = evaluate(payload);
|
|
108
|
+
stdout.write(JSON.stringify(decision));
|
|
109
|
+
}
|
|
110
|
+
|
|
111
|
+
// Run as a CLI only when invoked directly; tests require() this module and
|
|
112
|
+
// call evaluate()/main() against mock payloads without triggering stdin reads.
|
|
113
|
+
if (require.main === module) {
|
|
114
|
+
main().catch(() => {
|
|
115
|
+
process.stdout.write(JSON.stringify({ continue: true }));
|
|
116
|
+
});
|
|
117
|
+
}
|
|
118
|
+
|
|
119
|
+
module.exports = { main, evaluate, extractA11yFailures, appendEvent };
|
package/hooks/hooks.json
CHANGED
|
@@ -116,6 +116,14 @@
|
|
|
116
116
|
"command": "node --experimental-strip-types \"${CLAUDE_PLUGIN_ROOT}/hooks/context-exhaustion.ts\""
|
|
117
117
|
}
|
|
118
118
|
]
|
|
119
|
+
},
|
|
120
|
+
{
|
|
121
|
+
"hooks": [
|
|
122
|
+
{
|
|
123
|
+
"type": "command",
|
|
124
|
+
"command": "node \"${CLAUDE_PLUGIN_ROOT}/hooks/gdd-a11y-gate.js\""
|
|
125
|
+
}
|
|
126
|
+
]
|
|
119
127
|
}
|
|
120
128
|
],
|
|
121
129
|
"Stop": [
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@hegemonart/get-design-done",
|
|
3
|
-
"version": "1.
|
|
3
|
+
"version": "1.48.0",
|
|
4
4
|
"description": "A design-quality pipeline for AI coding agents: brief, plan, implement, and verify UI work against your design system.",
|
|
5
5
|
"author": "Hegemon",
|
|
6
6
|
"homepage": "https://github.com/hegemonart/get-design-done",
|
package/reference/brief-quality-rubric.md
ADDED
|
@@ -0,0 +1,98 @@
+# Brief Quality Rubric
+
+The five anti-patterns `agents/brief-auditor.md` grades `.design/BRIEF.md` against. Each entry pairs a
+definition with a good and bad example, the detection signal the auditor greps for, and a severity note.
+This rubric is advisory: a flagged brief still proceeds to explore. The point is to surface vagueness
+while the cost of fixing it is one sentence, not a redesign.
+
+A brief is the contract every later stage checks against. A vague brief produces an unverifiable cycle,
+because verify has nothing concrete to test. The auditor reads the brief once and writes findings to
+`.design/BRIEF-AUDIT.md`; the brief skill then offers `/gdd:discuss brief` when any anti-pattern fires.
+
+---
+
+## AP-1: Vague verbs without a metric
+
+**Definition:** The problem or goal uses a soft verb (improve, optimize, streamline, enhance, modernize,
+refresh) with no number, threshold, or observable change attached. The verb hides the actual target.
+
+- **Bad:** "Improve the checkout flow."
+- **Good:** "Cut checkout abandonment from 38 percent to under 25 percent on mobile."
+
+**Detection signal:** Match soft verbs (`improve`, `optimize`, `streamline`, `enhance`, `modernize`,
+`refresh`) in the Problem or Success Metrics sections, then check the same sentence for a digit, a
+percent sign, or a unit. A soft verb with no adjacent quantity is a hit.
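The AP-1 detection signal can be approximated with a sentence-level check. The following is a minimal sketch, not the auditor's actual implementation: the function name `findVagueVerbs`, the naive sentence splitter, and the choice to treat "a digit or a percent sign" as the only quantity markers are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the AP-1 check; names and heuristics are illustrative assumptions.
const SOFT_VERBS = ["improve", "optimize", "streamline", "enhance", "modernize", "refresh"];

// Matches a listed verb plus simple trailing letters (e.g. "improved", "enhances").
const SOFT_VERB_RE = new RegExp(`\\b(${SOFT_VERBS.join("|")})\\w*\\b`, "i");

// Assumption: a digit or a percent sign stands in for "a quantity"; units are expected to follow a digit.
const QUANTITY_RE = /\d|%/;

function findVagueVerbs(sectionText) {
  // Naive split: a sentence ends at ., ! or ? followed by whitespace.
  const sentences = sectionText.split(/(?<=[.!?])\s+/);
  // A hit is a sentence with a soft verb and no quantity anywhere in it.
  return sentences.filter((s) => SOFT_VERB_RE.test(s) && !QUANTITY_RE.test(s));
}
```

Passing the text of the Problem or Success Metrics section returns the sentences that would count as hits under these assumptions.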
+
+**Severity:** Major. A goal with no metric cannot be verified, so the whole cycle inherits the ambiguity.
+
+---
+
+## AP-2: Missing audience
+
+**Definition:** The brief never names who the design is for. No role, device, context, or skill level is
+stated, so every later trade-off (density, reading level, input model) is a guess.
+
+- **Bad:** "Build a dashboard for tracking orders."
+- **Good:** "Build an order dashboard for warehouse leads on a shared floor tablet, glanceable at arm's length."
+
+**Detection signal:** Read the Audience section. Flag when it is empty, a placeholder (`TBD`, `users`,
+`everyone`, `all users`), or names no role plus context. A single generic noun with no qualifier is a hit.
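A rough sketch of the AP-2 signal, assuming the Audience section has already been extracted as a string. The helper name `isAudienceMissing` is hypothetical, and the "single word" test is only a crude stand-in for "names no role plus context"; a real check would likely need more judgment than this.

```javascript
// Hypothetical sketch of the AP-2 check; not taken from the package.
const PLACEHOLDERS = new Set(["tbd", "users", "everyone", "all users"]);

function isAudienceMissing(audienceText) {
  // Normalize case and strip trailing periods and whitespace.
  const normalized = audienceText.trim().toLowerCase().replace(/[.\s]+$/, "");
  if (normalized === "") return true;              // empty section
  if (PLACEHOLDERS.has(normalized)) return true;   // known placeholder
  // Assumption: one bare word cannot carry both a role and a context.
  return normalized.split(/\s+/).length === 1;
}
```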
+
+**Severity:** Major. Audience drives density, tone, and accessibility floor; without it the design optimizes
+for no one.
+
+---
+
+## AP-3: Immeasurable success criteria
+
+**Definition:** Success is described in feelings rather than observables (looks modern, feels clean, is
+intuitive, delights users). There is no event, count, or threshold a verifier could check.
+
+- **Bad:** "Users should feel the app is fast and modern."
+- **Good:** "First contentful paint under 1.5 seconds; task completion rate above 90 percent in five tests."
+
+**Detection signal:** Scan Success Metrics for subjective adjectives (`modern`, `clean`, `intuitive`,
+`delightful`, `nice`, `beautiful`) with no paired number or pass/fail condition. Subjective-only criteria
+are a hit.
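The AP-3 signal could be sketched along the same lines. This example assumes the Success Metrics section has already been split into one string per criterion, and it only looks for a digit as evidence of a measurable condition; detecting a textual pass/fail condition is left out. The name `subjectiveOnlyCriteria` is hypothetical.

```javascript
// Hypothetical sketch of the AP-3 check; the adjective list comes from the rubric, the rest is assumed.
const SUBJECTIVE_RE = /\b(modern|clean|intuitive|delightful|nice|beautiful)\b/i;

// Assumption: any digit counts as a paired number.
const MEASURABLE_RE = /\d/;

function subjectiveOnlyCriteria(criteria) {
  // Keep criteria that use a subjective adjective and carry no number at all.
  return criteria.filter((c) => SUBJECTIVE_RE.test(c) && !MEASURABLE_RE.test(c));
}
```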
+
+**Severity:** Major. Verify cannot grade a feeling; immeasurable criteria collapse the verify gate.
+
+---
+
+## AP-4: Scope creep
+
+**Definition:** The Scope section lists more than the cycle can deliver, or mixes unrelated surfaces into
+one brief, so the in-scope line stops constraining anything.
+
+- **Bad:** "Redesign onboarding, billing, settings, the marketing site, and add dark mode."
+- **Good:** "In scope: the three-step onboarding flow. Out of scope: billing, settings, marketing site."
+
+**Detection signal:** Count distinct surfaces or top-level features named as in-scope. More than three
+unrelated surfaces in one brief, or an in-scope list with no matching out-of-scope line, is a hit.
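A minimal sketch of the AP-4 counting rule, assuming the in-scope and out-of-scope surfaces have already been extracted into arrays. It treats every listed surface as unrelated to the others, which is a simplification; the function name `scopeCreepHit` and the returned shape are hypothetical.

```javascript
// Hypothetical sketch of the AP-4 check; surface extraction and relatedness are assumed to happen elsewhere.
function scopeCreepHit(inScopeItems, outOfScopeItems, maxSurfaces = 3) {
  // More than three surfaces in one brief.
  const tooWide = inScopeItems.length > maxSurfaces;
  // An in-scope list with no matching out-of-scope line.
  const noBoundary = inScopeItems.length > 0 && outOfScopeItems.length === 0;
  return { hit: tooWide || noBoundary, tooWide, noBoundary };
}
```

Returning both flags separately lets a report say which half of the signal fired.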
+
+**Severity:** Minor. Wide scope is recoverable by splitting, but unsplit it inflates every later estimate.
+
+---
+
+## AP-5: Missing anti-goals
+
+**Definition:** The brief states what to build but never what to avoid. With no anti-goals, explore widens
+to fill the vacuum and the design picks up patterns the team never wanted.
+
+- **Bad:** (Scope lists features only; no "we are deliberately not doing X" line anywhere.)
+- **Good:** "Anti-goals: no new navigation paradigm, no carousel, do not touch the existing auth screens."
+
+**Detection signal:** Look for an explicit non-goal, anti-goal, or out-of-scope statement framed as a
+prohibition (`do not`, `avoid`, `no new`, `out of scope`). A brief with zero prohibition statements is a hit.
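The AP-5 signal reduces to a search for any prohibition phrase. A minimal sketch, assuming the whole brief is available as one string; the name `missingAntiGoals` is hypothetical and the phrase list is limited to the four markers the rubric names.

```javascript
// Hypothetical sketch of the AP-5 check; only the four phrases listed in the rubric are matched.
const PROHIBITION_RE = /\b(do not|avoid|no new|out of scope)\b/i;

function missingAntiGoals(briefText) {
  // A hit means the brief contains zero prohibition statements.
  return !PROHIBITION_RE.test(briefText);
}
```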
+
+**Severity:** Minor. Anti-goals prevent drift; their absence is a warning, not a blocker.
+
+---
+
+## How findings are scored
+
+The auditor reports a count of fired anti-patterns and lists each with its section and the matched text.
+It does not compute a pass/fail gate and it does not block the brief to explore transition. Major findings
+(AP-1, AP-2, AP-3) carry more weight in the summary line than Minor findings (AP-4, AP-5), so the user
+knows which gaps most threaten a verifiable cycle. When any anti-pattern fires, the brief skill surfaces a
+one-line pointer offering `/gdd:discuss brief` to refine before moving on.
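One way to picture the reporting step is a summary builder that counts fired anti-patterns, splits them by severity, and appends the pointer when anything fired. This is a sketch under stated assumptions: the finding shape (`id`, `section`, `match`), the function name `summarize`, and the exact wording of the summary and pointer lines are invented for illustration; the rubric only fixes the severity mapping and the `/gdd:discuss brief` command.

```javascript
// Hypothetical sketch of the summary; the output format is an assumption, not the auditor's real one.
const SEVERITY = {
  "AP-1": "Major", "AP-2": "Major", "AP-3": "Major",
  "AP-4": "Minor", "AP-5": "Minor",
};

function summarize(findings) {
  // Count each anti-pattern once, even if it matched several times.
  const fired = [...new Set(findings.map((f) => f.id))].sort();
  const major = fired.filter((id) => SEVERITY[id] === "Major").length;
  const minor = fired.filter((id) => SEVERITY[id] === "Minor").length;

  const lines = [`${fired.length} anti-pattern(s) fired: ${major} Major, ${minor} Minor`];
  for (const f of findings) {
    lines.push(`- ${f.id} [${SEVERITY[f.id]}] ${f.section}: "${f.match}"`);
  }
  // Advisory only: no pass/fail result is computed, just a pointer when anything fired.
  if (fired.length > 0) lines.push("Consider /gdd:discuss brief to refine before moving on.");
  return lines.join("\n");
}
```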