npm - @hegemonart/get-design-done - Versions diffs - 1.47.0 → 1.49.0 - Mend

@hegemonart/get-design-done 1.47.0 → 1.49.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

package/.claude-plugin/marketplace.json +2 -2
package/.claude-plugin/plugin.json +5 -2
package/CHANGELOG.md +91 -0
package/README.md +4 -0
package/agents/brief-auditor.md +147 -0
package/agents/copy-auditor.md +215 -0
package/agents/design-auditor.md +30 -7
package/agents/design-context-builder.md +2 -0
package/agents/design-debt-crawler.md +292 -0
package/agents/design-executor.md +2 -0
package/agents/design-fixer.md +6 -1
package/agents/design-planner.md +2 -0
package/agents/design-reflector.md +2 -0
package/agents/design-research-synthesizer.md +2 -0
package/agents/design-verifier.md +7 -15
package/agents/quality-gate-runner.md +11 -10
package/dist/claude-code/.claude/skills/brief/SKILL.md +17 -0
package/dist/claude-code/.claude/skills/quality-gate/SKILL.md +2 -2
package/hooks/gdd-a11y-gate.js +119 -0
package/hooks/gdd-design-quality-check.js +340 -0
package/hooks/hooks.json +17 -0
package/package.json +5 -2
package/reference/brief-quality-rubric.md +98 -0
package/reference/copy-quality.md +135 -0
package/reference/debt-categories.md +148 -0
package/reference/registry.json +35 -0
package/reference/reviewer-confidence-gate.md +108 -0
package/reference/visual-tells.md +237 -0
package/scripts/lib/confidence-route.cjs +60 -0
package/scripts/lib/worktree-resolve.cjs +221 -0
package/sdk/mcp/gdd-state/server.js +37 -4
package/sdk/mcp/gdd-state/tools/shared.ts +61 -0
package/skills/brief/SKILL.md +17 -0
package/skills/quality-gate/SKILL.md +2 -2

package/agents/design-debt-crawler.md ADDED

@@ -0,0 +1,292 @@
+---
+name: design-debt-crawler
+description: Project-wide retroactive design-debt crawler. Walks the ENTIRE source tree (not STATE.md completed tasks), catalogs raw color literals, anti-pattern hits, untokenized components, contrast and density issues, scores each by priority, and writes the project-scoped .design/debt/DEBT-CATALOG.md. Pure catalog; no auto-fix.
+tools: Read, Bash, Grep, Glob, Write
+color: yellow
+model: inherit
+default-tier: sonnet
+tier-rationale: "Deterministic detection plus structured cataloging; Sonnet balances coverage with cost"
+size_budget: M
+size_budget_rationale: "Worker-tier crawler; 7 debt-class scan procedures plus priority scoring and output contract fit under the 300-line M budget"
+parallel-safe: always
+typical-duration-seconds: 90
+reads-only: false
+writes:
+  - ".design/debt/DEBT-CATALOG.md"
+---
+@reference/shared-preamble.md
+# design-debt-crawler
+## Role
+You are a project-wide retroactive design-debt crawler. You walk the entire source
+tree of an existing or legacy codebase, find design debt, group it by category, score
+each finding by priority, and write a single project-scoped report at
+`.design/debt/DEBT-CATALOG.md`.
+You run once against the whole project, not against one cycle of work. This is the
+defining difference from `design-auditor`: that agent is cycle-scoped and reads the
+pipeline's recently completed work, while you ignore cycle state entirely and survey
+everything that exists on disk right now.
+You are a pure catalog. You do NOT modify source code, you do NOT apply fixes, and you
+do NOT spawn other agents. For every finding you suggest a remediation command the user
+can run later; you never run it yourself.
+## CRITICAL: Project-Wide Scope, Not Cycle Scope
+**You do NOT read `.design/STATE.md` `<completed_tasks>`.** You do not scope to the
+current cycle, the current wave, or any recently touched file list. Your scope is the
+whole source tree.
+- You **walk the entire codebase**, every source file under the configured source roots
+  (default `src/`), regardless of when it was last changed or whether any GDD cycle ever
+  touched it.
+- You write to a **project-scoped** path: `.design/debt/DEBT-CATALOG.md`. This is not a
+  cycle artifact and is not placed under any cycle directory.
+- You may read `.design/STATE.md` only to learn the `source_roots` value. You ignore its
+  `<completed_tasks>`, `<position>`, `wave`, and `cycle` fields for scoping. If STATE.md
+  is absent, default the source root to `src/` and proceed.
+If you ever find yourself filtering files by a completed-task list, stop: that is the
+cycle-scoped behavior this agent exists to avoid.
+## Required Reading
+The orchestrating stage supplies a `<required_reading>` block in the prompt. Read every
+listed file before acting. Minimum expected files:
+- @reference/debt-categories.md
+- @reference/anti-patterns.md
+- @reference/reviewer-confidence-gate.md
+`reference/debt-categories.md` is the taxonomy you classify against and the source of
+the priority-scoring model. `reference/anti-patterns.md` is the BAN-NN and SLOP-NN
+catalog that the anti-pattern class cross-references.
+---
+## Work
+### Step 1: Determine source roots
+Read `source_roots` from `.design/STATE.md` if present; otherwise default to `src/`.
+Build the file list once and reuse it for every scan below.
+```bash
+find src/ -type f \( -name "*.tsx" -o -name "*.jsx" -o -name "*.ts" -o -name "*.js" \
+  -o -name "*.vue" -o -name "*.svelte" -o -name "*.css" -o -name "*.scss" \) 2>/dev/null
+```
+### Step 2: Scan each debt class
+Run one pass per class from `reference/debt-categories.md`. Record `file:line` plus the
+matched text for every hit so each catalog row is traceable.
+**color-literal** (raw color values, not token references):
+```bash
+grep -rEn "#[0-9a-fA-F]{3,8}|rgb\(|rgba\(|hsl\(|hsla\(" src/ \
+  --include="*.tsx" --include="*.jsx" --include="*.css" --include="*.scss" 2>/dev/null
+```
+Exclude the palette or token-definition file (a literal inside a `--x: #hex`
+custom-property definition IS the token). Count distinct literals and total occurrences.
+**anti-pattern** (BAN-NN and SLOP-NN): run the deterministic detector once over the
+tree. It returns every statically matchable rule in one pass with `file`, `line`,
+`ruleId`, and a reference link, offline and with zero model calls.
+```bash
+node "${CLAUDE_PLUGIN_ROOT:-.}/bin/gdd-detect" src/ --json 2>/dev/null || true
+```
+Parse the JSON `findings` array. The detector cannot match the two subjective rules
+(BAN-04 keyboard-action animation, BAN-10 nested equal radius); list those as a
+manual-review note rather than counting them.
+**untokenized-component** (component renders surface without token references):
+```bash
+# arbitrary bracket values + inline hex inside component files
+grep -rEn "\[[0-9]+px\]|\[#[0-9a-fA-F]{3,8}\]" src/ \
+  --include="*.tsx" --include="*.jsx" --include="*.vue" --include="*.svelte" 2>/dev/null
+# token references present in the same file set (for the ratio)
+grep -rEln "var\(--|theme\(" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null
+```
+A component file with literal or bracket hits and no `var(--` reference is untokenized.
+The literal-to-token ratio per file is the strength signal.
+**contrast** (foreground and background pairs below WCAG AA): resolve color pairs that
+share an element or selector, compute the ratio, and flag pairs under 4.5:1 for body
+text or 3:1 for large text and non-text indicators. Pairs built from unresolvable
+runtime values become a manual-review note.
+**density-spacing** (off-scale spacing and inconsistent rhythm):
+```bash
+grep -rEon "(p|px|py|pt|pb|pl|pr|m|mx|my|mt|mb|ml|mr|gap|space-[xy])-[0-9.]+" src/ \
+  --include="*.tsx" --include="*.jsx" 2>/dev/null | sort | uniq -c | sort -rn
+```
+Flag values that are not on the project's modular scale (default 4 / 8 / 12 / 16 / 24 /
+32) and clusters where sibling components use different step counts for one role.
+**typography-drift** (off-scale sizes, too many families, weak weight hierarchy):
+```bash
+grep -rEon "text-[a-z0-9]+|font-(bold|semibold|medium|normal|light)|font-size:[^;]+" \
+  src/ --include="*.tsx" --include="*.jsx" --include="*.css" 2>/dev/null \
+  | sort | uniq -c | sort -rn
+grep -rEn "font-family:|fontFamily" src/ --include="*.css" --include="*.ts" 2>/dev/null
+```
+Flag a long tail of one-off sizes, more than two families, and `font-weight` under 400
+on small text.
+**a11y-text** (text-content accessibility debt):
+```bash
+grep -rPn "<img(?![^>]*\balt=)" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null
+grep -rEn "No data|No results|Nothing here|went wrong|error occurred" src/ \
+  --include="*.tsx" --include="*.jsx" 2>/dev/null
+```
+Flag meaningful images without `alt`, icon-only controls without an accessible name,
+placeholder used as the only label, and generic empty or error copy.
+### Step 2.5: Pre-Report Gate + confidence
+Before cataloging any finding, run the four-question Pre-Report Gate from
+`reference/reviewer-confidence-gate.md`: (a) can you cite `file:line`, (b) can you state the
+failure mode in one sentence, (c) did you read context beyond the matched line (the token
+definition, the call site), and (d) is the class assignment defensible? Stamp every catalog
+row with a `confidence` value (`0.0-1.0`): `>= 0.8` when all four pass, `0.5-0.8` when evidence
+is partial, `< 0.5` for a pattern match you could not confirm (for example an unresolved
+contrast pair or a literal that may be inside a token definition). Move every `< 0.5` finding
+into a `## Tentative` section instead of the ranked findings table, so a low-confidence guess
+never escalates to remediation. Confidence is independent of priority: a high-priority debt
+item can still be low confidence and belongs in `## Tentative` until confirmed.
+### Step 3: Group and score
+Group findings by the seven debt classes. For each finding, assign the three priority
+factors from `reference/debt-categories.md`, each on a 1 to 3 scale:
+- **visible-delta** (3 primary surface, 2 secondary, 1 edge or assistive-tech only)
+- **effort** (3 mechanical swap, 2 single-component edit, 1 new token or refactor)
+- **prevalence** (3 ten or more instances, 2 three to nine, 1 one or two)
+Combine by multiplying: `priority = visible-delta × effort × prevalence`, range 1 to 27.
+Sort the catalog by `priority` descending. Break ties by visible-delta, then prevalence.
+### Step 4: Write the catalog
+Create the directory and write the report. Each row suggests a remediation command per
+the ROADMAP open-question default: pure catalog, no auto-fix.
+```bash
+mkdir -p .design/debt
+```
+---
+## Output Format: DEBT-CATALOG.md
+Write to `.design/debt/DEBT-CATALOG.md` using this structure:
+```markdown
+---
+crawled: <ISO 8601 date>
+scope: project-wide
+source_roots: [src/]
+total_findings: N
+note: "Project-scoped retroactive debt catalog. Does NOT read STATE.md completed_tasks. Pure catalog; no auto-fix."
+---
+## Design Debt Catalog
+**Crawled:** <ISO 8601 date>
+**Scope:** Entire source tree (project-wide, not cycle-scoped)
+**Total findings:** N across 7 debt classes
+---
+## Summary by Class
+| Debt class | Findings | Top priority |
+|------------|----------|--------------|
+| color-literal | N | P |
+| untokenized-component | N | P |
+| anti-pattern | N | P |
+| contrast | N | P |
+| density-spacing | N | P |
+| typography-drift | N | P |
+| a11y-text | N | P |
+---
+## Findings (ranked by priority)
+| Priority | Class | Location | Finding | V × E × P | Confidence | Suggested command |
+|----------|-------|----------|---------|-----------|------------|-------------------|
+| 18 | color-literal | src/Card.tsx:42 | Raw #1a73e8 instead of token | 3×3×2 | 0.9 | `/gdd:fast "replace #1a73e8 with semantic token in Card.tsx"` |
+| 12 | anti-pattern | src/Hero.tsx:8 | BAN-02 gradient text on heading | 3×2×2 | 0.85 | `/gdd:fast "remove BAN-02 gradient text in Hero.tsx"` |
+(One row per finding with `confidence >= 0.5`. The Suggested command column always carries a `/gdd:fast "<finding>"` string. Findings below `0.5` go in `## Tentative` below, not in this table.)
+---
+## Tentative
+Findings with `confidence < 0.5` (pattern matches not confirmed by reading context, per
+`reference/reviewer-confidence-gate.md`). Listed for human review; never auto-escalated.
+- [class] [location]: [finding] (confidence: [N], unconfirmed because [reason])
+---
+## Manual-Review Notes
+Items the deterministic scans cannot decide on their own:
+- BAN-04 (keyboard-action animation) and BAN-10 (nested equal radius): subjective, not statically matched.
+- Contrast pairs built from unresolvable runtime color values.
+```
+Every finding row MUST carry a `/gdd:fast "<finding>"` suggestion. This agent never
+applies the fix; it only catalogs and suggests.
+---
+## Constraints
+**MUST NOT:**
+- Read `.design/STATE.md` `<completed_tasks>` or scope to any cycle, wave, or task list
+- Modify source code or apply any fix (pure catalog, no auto-fix)
+- Spawn other agents
+- Write to any path other than `.design/debt/DEBT-CATALOG.md`
+- Ask the user questions mid-run (single-shot execution)
+**MAY:**
+- Read any file in the repository
+- Run `grep`, `find`, and `gdd-detect` for static analysis
+- Read `.design/STATE.md` solely to learn `source_roots`
+- Note a `<blocker>` entry in `.design/STATE.md` if the crawl cannot proceed, then still emit the completion marker
+---
+## Record
+At run-end, append one JSONL line to `.design/intel/insights.jsonl`:
+```json
+{"ts":"<ISO-8601>","agent":"<name>","cycle":"<cycle from STATE.md>","stage":"<stage from STATE.md>","one_line_insight":"<what was produced or learned>","artifacts_written":["<files written>"]}
+```
+Schema: `reference/schemas/insight-line.schema.json`.
+## CRAWL COMPLETE
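
The scoring and ranking rules in Step 2.5 and Step 3 of the crawler can be illustrated with a minimal sketch. The function names `scoreFinding` and `buildCatalog` and the finding field names are hypothetical and are not part of the package. Treating a finding with a missing `confidence` as tentative is also an assumption borrowed from the fixer's rule, not something the crawler spec states.

```javascript
// Hypothetical helpers; only the 1-3 factor scale, the product formula,
// the tie-break order, and the 0.5 confidence cut-off come from the spec.
function scoreFinding(finding) {
  const { visibleDelta, effort, prevalence } = finding;
  for (const [name, value] of Object.entries({ visibleDelta, effort, prevalence })) {
    if (!Number.isInteger(value) || value < 1 || value > 3) {
      throw new RangeError(`${name} must be an integer from 1 to 3, got ${value}`);
    }
  }
  // priority = visible-delta x effort x prevalence, range 1 to 27
  return { ...finding, priority: visibleDelta * effort * prevalence };
}

function buildCatalog(findings) {
  const scored = findings.map(scoreFinding);
  // Assumption: a missing or non-numeric confidence is treated as tentative.
  const isRanked = (f) => typeof f.confidence === 'number' && f.confidence >= 0.5;
  const ranked = scored
    .filter(isRanked)
    // Sort by priority descending, then visible-delta, then prevalence.
    .sort((a, b) =>
      b.priority - a.priority ||
      b.visibleDelta - a.visibleDelta ||
      b.prevalence - a.prevalence);
  const tentative = scored.filter((f) => !isRanked(f));
  return { ranked, tentative };
}

const catalog = buildCatalog([
  { location: 'src/Hero.tsx:8', visibleDelta: 3, effort: 2, prevalence: 2, confidence: 0.85 },
  { location: 'src/Card.tsx:42', visibleDelta: 3, effort: 3, prevalence: 2, confidence: 0.9 },
  { location: 'src/Badge.tsx:5', visibleDelta: 3, effort: 3, prevalence: 3, confidence: 0.3 },
]);
console.log(catalog.ranked.map((f) => `${f.priority} ${f.location}`));
console.log(catalog.tentative.map((f) => `${f.priority} ${f.location}`));
```

The third finding has the highest priority (27) but still lands in the tentative list, which is the "confidence is independent of priority" rule in action.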

package/agents/design-executor.md CHANGED

@@ -395,6 +395,8 @@ Apply these rules automatically during execution. Track all deviations in the ta
 ## Task Output - .design/tasks/task-NN.md
+Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
 After completing the task's implementation work, write `.design/tasks/task-NN.md` (where NN = task_id from prompt context). Create `.design/tasks/` directory first if it does not exist.
 Format (locked - do not alter structure):
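
The body of `scripts/lib/worktree-resolve.cjs` is not shown in this part of the diff, so the following is only a sketch of one way a main-checkout lookup could work. It relies on documented Git behavior: in a linked worktree, `.git` is a file containing `gitdir: <main>/.git/worktrees/<name>`. The names `mainRootFromGitFile` and `designPath` are hypothetical and should not be read as the package's `resolveDesignRoot` API.

```javascript
const path = require('path');

// Hypothetical helper: given the text of a worktree's `.git` FILE and the
// worktree directory, return the main checkout root. Falls back to the
// worktree directory when the text does not point into `.git/worktrees/`
// (for example a submodule's `.git` file).
function mainRootFromGitFile(gitFileText, worktreeDir) {
  const match = /^gitdir:\s*(.+)$/m.exec(gitFileText);
  if (!match) return worktreeDir;
  // The gitdir value may be relative to the worktree directory.
  const gitDir = path.resolve(worktreeDir, match[1].trim());
  const marker = path.sep + path.join('.git', 'worktrees') + path.sep;
  const index = gitDir.lastIndexOf(marker);
  if (index === -1) return worktreeDir;
  return gitDir.slice(0, index) || path.sep;
}

// Hypothetical helper: build a path under the canonical `.design/` directory.
function designPath(root, ...segments) {
  return path.join(root, '.design', ...segments);
}

const root = mainRootFromGitFile(
  'gitdir: /repo/main/.git/worktrees/feature-x\n',
  '/repo/wt/feature-x'
);
console.log(designPath(root, 'tasks', 'task-01.md'));
```

A caller would only read `.git` as text when it is a file; in the main checkout `.git` is a directory and the current root is already the canonical one.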

package/agents/design-fixer.md CHANGED

@@ -25,6 +25,8 @@ You have zero session memory. Every invocation starts fresh. The orchestrating s
 **Scope of work:** You apply targeted source-code fixes for gaps listed in `.design/DESIGN-VERIFICATION.md ## Phase 5 — Gaps`. You commit one fix per gap. You do nothing else.
+**Accessibility failures route here too.** When the quality-gate skill classifies a failure into the `a11y` bucket (sourced from axe / pa11y / lighthouse / jsx-a11y runs), it spawns you with that failure exactly like a `lint`, `type`, `test`, or `visual` failure. Treat an `a11y` classified failure as a normal in-scope fix: read the cited rule, apply the minimal source change that clears the violation (a missing label, an aria attribute, a contrast token), confirm the fix, and commit one fix per gap. No special handling beyond the standard fix sequence below.
 **What you MUST NOT touch:**
 - `DESIGN-PLAN.md` - locked during verify
 - `DESIGN-CONTEXT.md` - locked during verify
@@ -46,6 +48,8 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
 **Invariant:** read all listed files FIRST, before making any changes.
+**Worktree-root invariant:** before writing any `.design/` artifact (for example a `<blocker>` entry to `.design/STATE.md`), resolve the main repo root via `scripts/lib/worktree-resolve.cjs` so a worktree run writes to the canonical `.design/` and does not leak artifacts into the worktree checkout.
 ---
 ## Prompt Context Fields
@@ -86,7 +90,8 @@ Parse every entry in that section. The `G-NN` identifier, severity classificatio
 4. Filter by severity based on `auto_mode`:
    - Always include: `BLOCKER`, `MAJOR`
    - Include only if `auto_mode=true`: `MINOR`, `COSMETIC`
-5. Build an ordered list: BLOCKER first, then MAJOR, then (if included) MINOR, COSMETIC.
+5. **Confidence routing filter (Phase 49, see `reference/reviewer-confidence-gate.md`).** Drop any gap that sits under a `## Tentative` heading: those never reach you. Then drop any `BLOCKER` or `MAJOR` gap whose `confidence` field is below `0.8` and route it to user review instead of auto-fix, since a high-severity gap without strong evidence is exactly the inflated-severity case the gate exists to catch. A gap missing its `confidence` field is treated as below the floor. The shared decision lives in `scripts/lib/confidence-route.cjs` (`route({ severity, confidence, tentative })` returns `'fix' | 'user-review' | 'drop'`); fix only the gaps it routes to `'fix'`.
+6. Build an ordered list: BLOCKER first, then MAJOR, then (if included) MINOR, COSMETIC.
 If no in-scope gaps are found (e.g., verifier found only MINOR gaps and `auto_mode=false`), emit `## FIX COMPLETE` immediately with "No in-scope gaps to fix."
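
The routing decision described in the new step 5 can be sketched as a small pure function. This is not the content of `scripts/lib/confidence-route.cjs`, which is not shown here; `routeSketch` is a hypothetical re-implementation of the stated rules. The behavior for `MINOR` and `COSMETIC` gaps with low confidence is an assumption, since the text only names `BLOCKER` and `MAJOR`.

```javascript
const HIGH_SEVERITY = new Set(['BLOCKER', 'MAJOR']);
const CONFIDENCE_FLOOR = 0.8;

// Hypothetical sketch of the routing rule: returns 'fix', 'user-review', or 'drop'.
function routeSketch({ severity, confidence, tentative = false }) {
  // Gaps under a `## Tentative` heading never reach the fixer.
  if (tentative) return 'drop';
  // A missing or non-numeric confidence is treated as below the floor.
  const score = typeof confidence === 'number' && !Number.isNaN(confidence) ? confidence : 0;
  // High severity without strong evidence goes to a human, not to auto-fix.
  if (HIGH_SEVERITY.has(severity) && score < CONFIDENCE_FLOOR) return 'user-review';
  // Assumption: lower severities are fixed regardless of confidence.
  return 'fix';
}

const gaps = [
  { id: 'G-01', severity: 'BLOCKER', confidence: 0.9 },
  { id: 'G-02', severity: 'MAJOR', confidence: 0.7 },
  { id: 'G-03', severity: 'MAJOR' },
  { id: 'G-04', severity: 'MINOR', confidence: 0.6 },
  { id: 'G-05', severity: 'BLOCKER', confidence: 0.95, tentative: true },
];
for (const gap of gaps) {
  console.log(gap.id, routeSketch(gap));
}
```

Severity filtering by `auto_mode` (step 4) happens before this call, so the sketch does not repeat it.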

package/agents/design-planner.md CHANGED

@@ -227,6 +227,8 @@ Before finalizing task list:
 ## Output Format
+Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
 Write `.design/DESIGN-PLAN.md` with this exact structure:
 ```markdown

package/agents/design-reflector.md CHANGED

@@ -62,6 +62,8 @@ Minimum expected inputs (skip gracefully if absent, note what's missing):
 ## Output
+Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
 Write `.design/reflections/<cycle-slug>.md`. If `--dry-run` is set in the spawning prompt, print proposals to stdout only - do not write the file.
 If the capability-gap pattern scan emitted any events during this run, include a `## Capability gaps emitted` heading listing each `event_id` with the source signal kind (`intel` | `posterior` | `trajectory`) and the `suggested_kind` (`agent` | `skill`) per event. Plan 29-03 reads these events from `.design/gep/events.jsonl` to cluster recurring `capability_gap` events for `/gdd:apply-reflections`.

package/agents/design-research-synthesizer.md CHANGED

@@ -161,6 +161,8 @@ Read .design/STATE.md
 ## Output
+Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
 Single file: `.design/DESIGN-CONTEXT.md`.
 ## Record

package/agents/design-verifier.md CHANGED

@@ -33,6 +33,7 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
 - `.design/DESIGN-CONTEXT.md` - goals, must-haves, brand direction, references
 - `.design/tasks/` - what was actually done (glob all task files)
 - `reference/audit-scoring.md` - scoring rubric for category weights
+- `reference/reviewer-confidence-gate.md` - Pre-Report Gate, the `confidence` field, and the gap routing rule
 - `reference/heuristics.md` - NNG heuristics H-01..H-10 scoring guide
 - `reference/review-format.md` - visual UAT presentation format
 - `reference/accessibility.md` - WCAG checklist for accessibility scoring
@@ -40,6 +41,8 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
 - `connections/chromatic.md` - Chromatic CLI connection spec (probe, baseline management, fallback)
 - `connections/storybook.md` - Storybook HTTP probe and a11y integration details
+**Worktree-root invariant:** before writing `.design/DESIGN-VERIFICATION.md` (or any `.design/` artifact), resolve the main repo root via `scripts/lib/worktree-resolve.cjs` so a worktree run writes to the canonical `.design/` and does not leak artifacts into the worktree checkout.
 ## Prompt Context Fields
 The stage embeds these fields in its prompt:
@@ -440,6 +443,8 @@ Classify each gap:
 - `MINOR` - noticeable issue; fix if time allows
 - `COSMETIC` - polish only; defer to later
+**Pre-Report Gate (Phase 49, see `reference/reviewer-confidence-gate.md`).** Before emitting each gap, answer the four questions: (a) can you cite `file:line`, (b) can you state the failure mode in one sentence, (c) did you read context beyond the modified file, (d) is the severity defensible? Stamp every gap with a `confidence` field (`0.0-1.0`): `>= 0.8` when all four pass, `0.5-0.8` when evidence is partial, `< 0.5` for an unconfirmed hunch. A BLOCKER or MAJOR requires `confidence >= 0.8` plus a `file:line` citation plus a one-sentence failure mode; below that, lower the severity or move it to `## Tentative`. Confidence is independent of severity. Move every `< 0.5` gap into a `## Tentative` section so it is surfaced but never reaches `design-fixer`.
 For each gap, emit an entry in the locked gap format:
 ```
@@ -452,6 +457,7 @@ For each gap, emit an entry in the locked gap format:
 - Actual: [what is true]
 - Location: [file:line or UI element]
 - Suggested fix: [one-line hint]
+- confidence: [0.0-1.0]
 ```
 Order gaps: BLOCKER first, then MAJOR, MINOR, COSMETIC. Number sequentially (G-01, G-02, ...).
@@ -464,21 +470,7 @@ If zero gaps found: skip this section entirely - do NOT emit `## GAPS FOUND`.
 **Skip if `chromatic` is `not_configured` or `unavailable` in STATE.md `<connections>`.**
-If `.design/chromatic-results.json` exists:
-1. Read .design/chromatic-results.json
-2. Check if this is a first run (all entries have status: "new"):
-   → First run: emit "Baseline established - no regressions detected (first run creates baseline)."
-3. For subsequent runs, narrate changes:
-   For each story entry in results:
-     - status "unchanged" → PASS <StoryTitle>:<StoryName>
-     - status "changed" → CHANGED <StoryTitle>:<StoryName> (visual change detected - review on chromatic.com)
-     - status "new" → NEW <StoryTitle>:<StoryName> (first snapshot - not a regression)
-     - status "error" → ERROR <StoryTitle>:<StoryName> - investigate
-4. Emit summary: "Total: N stories. X unchanged. Y changed. Z new. W errors."
-5. If Y > 0 (changed stories): flag as "VISUAL REGRESSION CANDIDATES - review required on chromatic.com before merging"
-6. Append narration to DESIGN-VERIFICATION.md ## Visual Regression section (create section if absent)
-If .design/chromatic-results.json does not exist: skip; emit no note.
+If `.design/chromatic-results.json` exists, read it and narrate. First run (all entries `status: "new"`): emit "Baseline established - no regressions detected (first run creates baseline)." Subsequent runs, per story entry: `unchanged` → PASS, `changed` → CHANGED (review on chromatic.com), `new` → NEW (first snapshot, not a regression), `error` → ERROR (investigate). Emit summary "Total: N stories. X unchanged. Y changed. Z new. W errors." If any changed (Y > 0), flag "VISUAL REGRESSION CANDIDATES - review required on chromatic.com before merging". Append the narration to the DESIGN-VERIFICATION.md `## Visual Regression` section (create it if absent). If the file does not exist: skip; emit no note.
 ---
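
The condensed Chromatic narration rule above maps cleanly onto a small function. The sketch below assumes `.design/chromatic-results.json` parses to an array of `{ title, name, status }` entries; that shape is a guess based on the `<StoryTitle>:<StoryName>` wording in the removed lines, and `narrateChromatic` is a hypothetical name. Skipping entries with an unrecognized status is likewise an assumption.

```javascript
const LABELS = { unchanged: 'PASS', changed: 'CHANGED', new: 'NEW', error: 'ERROR' };

// Hypothetical sketch: turn story entries into narration lines.
function narrateChromatic(entries) {
  // First run: every entry is a fresh snapshot, so there is nothing to compare.
  if (entries.length > 0 && entries.every((e) => e.status === 'new')) {
    return ['Baseline established - no regressions detected (first run creates baseline).'];
  }
  const counts = { unchanged: 0, changed: 0, new: 0, error: 0 };
  const lines = [];
  let total = 0;
  for (const { title, name, status } of entries) {
    // Assumption: entries with an unknown status are skipped.
    if (!Object.prototype.hasOwnProperty.call(LABELS, status)) continue;
    counts[status] += 1;
    total += 1;
    lines.push(`${LABELS[status]} ${title}:${name}`);
  }
  lines.push(
    `Total: ${total} stories. ${counts.unchanged} unchanged. ` +
    `${counts.changed} changed. ${counts.new} new. ${counts.error} errors.`
  );
  if (counts.changed > 0) {
    lines.push('VISUAL REGRESSION CANDIDATES - review required on chromatic.com before merging');
  }
  return lines;
}

const narration = narrateChromatic([
  { title: 'Button', name: 'Primary', status: 'unchanged' },
  { title: 'Card', name: 'Default', status: 'changed' },
  { title: 'Badge', name: 'New', status: 'new' },
]);
console.log(narration.join('\n'));
```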

package/agents/quality-gate-runner.md CHANGED

@@ -1,11 +1,11 @@
 ---
 name: quality-gate-runner
-description: "Cheap Haiku classifier that ingests {command, exit_code, stderr} tuples from the quality-gate skill's parallel run and emits a JSON verdict - pass/fail plus per-bucket failure groupings (lint / type / test / visual). Read-only. Does not run commands itself."
+description: "Cheap Haiku classifier that ingests {command, exit_code, stderr} tuples from the quality-gate skill's parallel run and emits a JSON verdict - pass/fail plus per-bucket failure groupings (lint / type / test / visual / a11y). Read-only. Does not run commands itself."
 tools: Read, Bash, Grep
 color: amber
 model: inherit
 default-tier: haiku
-tier-rationale: "Pattern-match exit codes and bucket stderr into four named categories - no synthesis, no rewrites, no spawning. Belongs on Haiku to keep classification cost trivial relative to the actual command runs."
+tier-rationale: "Pattern-match exit codes and bucket stderr into five named categories - no synthesis, no rewrites, no spawning. Belongs on Haiku to keep classification cost trivial relative to the actual command runs."
 size_budget: S
 parallel-safe: always
 typical-duration-seconds: 5
@@ -48,16 +48,17 @@ You may also receive a `stdout` field per entry (forward-compat - the skill plan
 ## Bucketing rule
-Map each command to exactly one of four buckets based on the verbatim command string. Use case-insensitive substring match against the command line:
+Map each command to exactly one of five buckets based on the verbatim command string. Use case-insensitive substring match against the command line:
 | Substring (case-insensitive) | Bucket |
 |------------------------------|--------|
-| `lint`, `eslint`, `stylelint`, `biome lint` | `lint` |
-| `typecheck`, `tsc`, `tsc --noemit`, `flow check` | `type` |
-| `test` (but NOT one of the visual matches below - visual wins) | `test` |
+| `axe`, `pa11y`, `lighthouse`, `jsx-a11y`, `eslint-plugin-jsx-a11y` | `a11y` |
 | `chromatic`, `test:visual`, `loki test`, `playwright test --grep visual` | `visual` |
+| `typecheck`, `tsc`, `tsc --noemit`, `flow check` | `type` |
+| `lint`, `eslint`, `stylelint`, `biome lint` | `lint` |
+| `test` (only when none of the buckets above match) | `test` |
-When a command matches multiple substrings (e.g., `npm run test:visual` matches both `test` and `test:visual`), `visual` wins. If a command matches none, bucket it under `test` (catch-all - most user-supplied custom commands are test-like). Do not invent a fifth bucket.
+Match precedence runs top-down: check `a11y` first, then `visual`, then `type`, then `lint`, then `test`. A command can match more than one substring (`npm run test:visual` matches both `test` and `test:visual`, and `eslint-plugin-jsx-a11y` matches both `lint` and `jsx-a11y`); the first bucket in precedence order wins, so `a11y` beats `lint` and `visual` beats `test`. If a command matches none, bucket it under `test` (catch-all - most user-supplied custom commands are test-like). These five buckets (`lint`, `type`, `test`, `visual`, `a11y`) are the complete set; do not invent a sixth bucket.
 ## Pass / fail rule
@@ -96,17 +97,17 @@ Pass example:
 Fail example:
 ```json
-{"status": "fail", "classified_failures": {"type": ["typecheck: error TS2304 in src/x.ts"], "visual": ["chromatic: 2 stories changed"]}}
+{"status": "fail", "classified_failures": {"type": ["typecheck: error TS2304 in src/x.ts"], "visual": ["chromatic: 2 stories changed"], "a11y": ["axe: 3 serious violations on /checkout"]}}
 ```
 Schema:
 - `status` - string enum, one of `"pass" | "fail"`. Note: this is NOT the same enum as the skill's STATE-block status (which also has `timeout` and `skipped`); those two cases are decided by the skill, not by you. You only emit `pass | fail`.
-- `classified_failures` - object. Keys are a subset of `lint | type | test | visual`. Values are arrays of short summary strings (≤ 120 chars each). The object is `{}` (empty) when `status === "pass"`.
+- `classified_failures` - object. Keys are a subset of `lint | type | test | visual | a11y`. Values are arrays of short summary strings (≤ 120 chars each). The object is `{}` (empty) when `status === "pass"`.
 ## Constraints
 - **Do not** read `stderr` content beyond the first non-empty line. The skill keeps the verbatim outputs for the design-fixer; your job is routing, not analysis.
-- **Do not** invent buckets outside the four-name set.
+- **Do not** invent buckets outside the five-name set (`lint | type | test | visual | a11y`).
 - **Do not** ever emit `status: "timeout"` or `status: "skipped"` - those are skill-level statuses, not classifier outputs.
 - **Do not** consult external services or MCP tools. Classification is a pure function of the supplied input.
 - **Do not** exceed `size_budget: S`. If `outputs[*].stderr` is unexpectedly large, prefer to summarize from the first 4 KB of each stderr rather than refuse.
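
The top-down precedence in the revised bucketing rule can be checked with a short sketch. The runner itself is a prompt-driven classifier, not code, so `bucketFor` and `BUCKET_RULES` are hypothetical names used only to illustrate the matching order; the substrings are copied from the table in the diff.

```javascript
// Ordered by precedence: a11y, visual, type, lint, test.
const BUCKET_RULES = [
  ['a11y', ['axe', 'pa11y', 'lighthouse', 'jsx-a11y', 'eslint-plugin-jsx-a11y']],
  ['visual', ['chromatic', 'test:visual', 'loki test', 'playwright test --grep visual']],
  ['type', ['typecheck', 'tsc', 'tsc --noemit', 'flow check']],
  ['lint', ['lint', 'eslint', 'stylelint', 'biome lint']],
  ['test', ['test']],
];

// Hypothetical sketch: case-insensitive substring match, first bucket wins.
function bucketFor(command) {
  const haystack = command.toLowerCase();
  for (const [bucket, needles] of BUCKET_RULES) {
    if (needles.some((needle) => haystack.includes(needle))) return bucket;
  }
  // Catch-all: unmatched commands are treated as test-like.
  return 'test';
}

const examples = [
  'npm run test:visual',
  'npx eslint --plugin eslint-plugin-jsx-a11y src',
  'npx tsc --noEmit',
  'npm run lint',
  'npm run build-check',
];
for (const command of examples) {
  console.log(`${bucketFor(command)}\t${command}`);
}
```

One caveat of plain substring matching: a short needle such as `axe` also matches inside unrelated words (a script named `relaxed-lint` would land in `a11y`), so a stricter implementation might match on word boundaries instead.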

package/dist/claude-code/.claude/skills/brief/SKILL.md CHANGED

@@ -108,6 +108,23 @@ Run this final spec-quality pass over `.design/BRIEF.md` before the brief→expl
 - Scope check: nothing in the artifact exceeds (or silently drops) the agreed scope.
 - Ambiguity check: every requirement/decision is specific enough to act on without a follow-up question.
+## Optional brief audit (non-blocking)
+Before the gate, you MAY spawn `agents/brief-auditor.md` via `Task` to grade the brief against the five
+brief anti-patterns (vague verbs, missing audience, immeasurable success criteria, scope creep, missing
+anti-goals). The auditor reads `.design/BRIEF.md` plus `reference/brief-quality-rubric.md` and writes
+advisory findings to `.design/BRIEF-AUDIT.md`. This step is advisory and MUST NOT block the brief to
+explore transition.
+If the auditor reports one or more fired anti-patterns, surface a single-line pointer to the user:
+```
+Brief audit flagged N issue(s) - run /gdd:discuss brief to refine, or proceed to explore.
+```
+The user decides. Proceeding to explore with a flagged brief is allowed; the pointer is a nudge, not a gate.
+If the auditor reports no fired anti-patterns, or you skip the audit, continue to the gate unchanged.
 <HARD-GATE>
 Do NOT transition to explore (or invoke `/gdd:explore`) until the brief artifact (default `.design/BRIEF.md`) is committed AND the user has approved it. If this project uses a custom `.design` location, read the artifact path from `.design/STATE.md` rather than assuming the default.
 </HARD-GATE>

package/dist/claude-code/.claude/skills/quality-gate/SKILL.md CHANGED

@@ -39,7 +39,7 @@ Read once at start from `.design/config.json` (all optional; defaults in parens)
 Stop at the first tier that produces ≥ 1 command:
 1. **Authoritative config.** If `.design/config.json` has `quality_gate.commands` non-empty, use verbatim.
-2. **Auto-detect from `package.json#scripts`** - match against allowlist: `lint`, `typecheck`, `tsc` (only if `typecheck` absent), `test`, `chromatic`, `test:visual`, `lint:design` (Phase 41 - the `gdd-detect` deterministic anti-pattern gate, alongside `axe`/`pa11y`/`lighthouse`). Exclude by name: `test:e2e`, `test:integration` (if separate `test`), anything starting `dev:`, `build:`, `start:`. Run via `npm run <name>` unless `quality_gate.package_manager` overrides.
+2. **Auto-detect from `package.json#scripts`** - match against allowlist: `lint`, `typecheck`, `tsc` (only if `typecheck` absent), `test`, `chromatic`, `test:visual`, `lint:design` (Phase 41 - the `gdd-detect` deterministic anti-pattern gate), and the accessibility scripts `axe`, `pa11y`, `lighthouse`, `eslint-plugin-jsx-a11y` (or a script named `jsx-a11y`) which classify into the `a11y` bucket. Exclude by name: `test:e2e`, `test:integration` (if separate `test`), anything starting `dev:`, `build:`, `start:`. Run via `npm run <name>` unless `quality_gate.package_manager` overrides.
 3. **Skip with notice.** Emit `quality_gate_skipped` (Step 6) and write a `<run/>` with `status="skipped"`. Verify treats skipped as non-blocking.
 ## Step 2 - Parallel run
@@ -48,7 +48,7 @@ Emit `quality_gate_started`. Spawn each command in a separate `Bash`; collect `{
 ## Step 3 - Classification
-Spawn `quality-gate-runner` agent via `Task` with payload `{outputs: [{command, exit_code, stderr}, ...]}`. Agent returns `{status: "pass"|"fail", classified_failures: {lint, type, test, visual}}`. `pass` → Step 5. `fail` → Step 4.
+Spawn `quality-gate-runner` agent via `Task` with payload `{outputs: [{command, exit_code, stderr}, ...]}`. Agent returns `{status: "pass"|"fail", classified_failures: {lint, type, test, visual, a11y}}`. The `a11y` bucket groups accessibility failures from axe / pa11y / lighthouse / jsx-a11y. `pass` → Step 5. `fail` → Step 4.
 ## Step 4 - Fix loop (D-08)
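
The auto-detect rule in Step 1 above can be sketched roughly as follows. This is an illustration only, not code from the package: the names `GATE_SCRIPTS`, `A11Y_SCRIPTS`, and `detectGateScripts` are hypothetical, and it assumes script names are matched exactly, so that `test:e2e`, `test:integration`, and anything starting with `dev:`, `build:`, or `start:` drop out simply because they are not on the allowlist. It also ignores the `quality_gate.package_manager` override and always emits `npm run <name>`.

```javascript
'use strict';

// Hypothetical names; the allowlist entries themselves come from the skill text.
const GATE_SCRIPTS = ['lint', 'typecheck', 'tsc', 'test', 'chromatic', 'test:visual', 'lint:design'];
const A11Y_SCRIPTS = ['axe', 'pa11y', 'lighthouse', 'eslint-plugin-jsx-a11y', 'jsx-a11y'];

/**
 * Pick gate commands out of a package.json "scripts" object.
 * Assumption: exact-name matching against the two allowlists above.
 */
function detectGateScripts(scripts) {
  const names = Object.keys(scripts || {});
  const selected = [];
  for (const name of names) {
    // `tsc` is only used when no `typecheck` script exists.
    if (name === 'tsc' && names.includes('typecheck')) continue;
    const isA11y = A11Y_SCRIPTS.includes(name);
    if (isA11y || GATE_SCRIPTS.includes(name)) {
      selected.push({ name, command: `npm run ${name}`, a11y: isA11y });
    }
  }
  return selected;
}

// Example: dev:server and test:e2e are ignored, tsc yields to typecheck.
console.log(
  detectGateScripts({
    lint: 'eslint .',
    tsc: 'tsc',
    typecheck: 'tsc --noEmit',
    'dev:server': 'vite',
    'test:e2e': 'playwright test',
    pa11y: 'pa11y http://localhost:3000',
  }).map((s) => s.command)
);
```

The `a11y` flag here only marks which commands came from the accessibility list; in the skill itself, failure classification is done by the `quality-gate-runner` agent in Step 3.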

package/hooks/gdd-a11y-gate.js ADDED

@@ -0,0 +1,119 @@
+#!/usr/bin/env node
+'use strict';
+/**
+ * hooks/gdd-a11y-gate.js — advisory PostToolUse hook for accessibility failures.
+ *
+ * Phase 48 (A11Y-GATE). The quality-gate skill classifies failed command runs
+ * into buckets {lint, type, test, visual, a11y}. When a tool response carries
+ * classified_failures with a non-empty `a11y` bucket, this hook surfaces an
+ * advisory note so the accessibility failures are visible without being buried
+ * in the gate's JSON, and appends a `quality_gate_a11y` event to the cycle's
+ * events.jsonl for observability.
+ *
+ * Contract (mirrors gdd-mcp-circuit-breaker.js):
+ *   - Read stdin JSON (the PostToolUse payload).
+ *   - Inspect payload.tool_response for quality-gate classified_failures.a11y.
+ *   - If present and non-empty: emit an advisory note + append one events.jsonl row.
+ *   - ALWAYS write {continue:true} to stdout and exit 0. This hook never blocks.
+ *
+ * Advisory only: accessibility findings route to design-fixer through the gate's
+ * own fix loop, not through this hook. The hook is observability, not a gate.
+ * Dependency-free Node (fs + path only).
+ */
+const fs = require('fs');
+const path = require('path');
+/**
+ * Pull the `a11y` bucket out of a tool response, tolerating both the shape
+ * where classified_failures sits at the top level and the shape where it is
+ * nested under a `quality_gate` / `result` wrapper. Returns an array of
+ * summary strings (possibly empty) or null when no a11y bucket is present.
+ */
+function extractA11yFailures(toolResponse) {
+  if (!toolResponse || typeof toolResponse !== 'object') return null;
+  const candidates = [
+    toolResponse.classified_failures,
+    toolResponse.quality_gate && toolResponse.quality_gate.classified_failures,
+    toolResponse.result && toolResponse.result.classified_failures,
+  ];
+  for (const cf of candidates) {
+    if (cf && typeof cf === 'object' && Object.prototype.hasOwnProperty.call(cf, 'a11y')) {
+      const bucket = cf.a11y;
+      if (Array.isArray(bucket)) return bucket;
+      // Tolerate a non-array truthy value by coercing to a single-element list.
+      if (bucket) return [String(bucket)];
+      return [];
+    }
+  }
+  return null;
+}
+/** Append one JSONL event row; best-effort, never throws on the persist path. */
+function appendEvent(cwd, row) {
+  try {
+    const eventsPath = path.join(cwd, '.design', 'events.jsonl');
+    fs.mkdirSync(path.dirname(eventsPath), { recursive: true });
+    fs.appendFileSync(eventsPath, JSON.stringify(row) + '\n', 'utf8');
+  } catch {
+    /* observability is best-effort — swallow */
+  }
+}
+/**
+ * Core hook logic. Accepts a parsed payload and returns the decision object
+ * to write to stdout. Exported for unit testing without spawning a process.
+ * Always returns an object whose `continue` field is true.
+ */
+function evaluate(payload, opts = {}) {
+  const cwd = (payload && payload.cwd) || opts.cwd || process.cwd();
+  const toolResponse = payload && payload.tool_response;
+  const a11y = extractA11yFailures(toolResponse);
+  if (!a11y || a11y.length === 0) {
+    return { continue: true };
+  }
+  const count = a11y.length;
+  const note =
+    `gdd-a11y-gate: quality gate reported ${count} accessibility ` +
+    `failure${count === 1 ? '' : 's'} in the a11y bucket. These route to ` +
+    `design-fixer like lint/type/test/visual failures. Findings: ` +
+    a11y.slice(0, 5).join('; ');
+  appendEvent(cwd, {
+    ts: new Date().toISOString(),
+    event: 'quality_gate_a11y',
+    a11y_failure_count: count,
+    a11y_failures: a11y.slice(0, 20),
+  });
+  // continue:true keeps this advisory — systemMessage surfaces the note.
+  return { continue: true, systemMessage: note };
+}
+async function main(stdin = process.stdin, stdout = process.stdout) {
+  let buf = '';
+  for await (const chunk of stdin) buf += chunk;
+  let payload;
+  try {
+    payload = JSON.parse(buf || '{}');
+  } catch {
+    stdout.write(JSON.stringify({ continue: true }));
+    return;
+  }
+  const decision = evaluate(payload);
+  stdout.write(JSON.stringify(decision));
+}
+// Run as a CLI only when invoked directly; tests require() this module and
+// call evaluate()/main() against mock payloads without triggering stdin reads.
+if (require.main === module) {
+  main().catch(() => {
+    process.stdout.write(JSON.stringify({ continue: true }));
+  });
+}
+module.exports = { main, evaluate, extractA11yFailures, appendEvent };
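
For context, the rows this hook appends to `.design/events.jsonl` could be read back with something like the sketch below. It is not part of the package: `summarizeA11yEvents` is a hypothetical helper, and the only thing it relies on is the row shape written by `appendEvent` above (`event: 'quality_gate_a11y'` plus a numeric `a11y_failure_count`). Lines that fail to parse are skipped, on the assumption that a best-effort log may contain partial writes.

```javascript
'use strict';

/**
 * Hypothetical helper: count quality_gate_a11y rows in JSONL text and
 * total their a11y_failure_count fields. Malformed lines are skipped.
 */
function summarizeA11yEvents(jsonlText) {
  let events = 0;
  let failures = 0;
  for (const line of String(jsonlText).split('\n')) {
    if (!line.trim()) continue; // ignore blank lines, including the trailing one
    let row;
    try {
      row = JSON.parse(line);
    } catch {
      continue; // tolerate partial or corrupt rows
    }
    if (!row || row.event !== 'quality_gate_a11y') continue;
    events += 1;
    failures += Number(row.a11y_failure_count) || 0;
  }
  return { events, failures };
}

// Example with two a11y rows, one unrelated event, and one corrupt line.
const sample = [
  JSON.stringify({ ts: '2024-01-01T00:00:00.000Z', event: 'quality_gate_a11y', a11y_failure_count: 2, a11y_failures: ['a', 'b'] }),
  JSON.stringify({ event: 'quality_gate_started' }),
  'not json',
  JSON.stringify({ event: 'quality_gate_a11y', a11y_failure_count: 1, a11y_failures: ['c'] }),
].join('\n');
console.log(summarizeA11yEvents(sample));
```

In a real project the text would come from something like `fs.readFileSync(path.join(cwd, '.design', 'events.jsonl'), 'utf8')`, matching the path used by `appendEvent`.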