npm - @hegemonart/get-design-done - Versions diffs - 1.47.0 → 1.48.0 - Mend

@hegemonart/get-design-done 1.47.0 → 1.48.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21) hide show

package/.claude-plugin/marketplace.json +2 -2
package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +47 -0
package/README.md +2 -0
package/agents/brief-auditor.md +147 -0
package/agents/copy-auditor.md +215 -0
package/agents/design-auditor.md +13 -3
package/agents/design-debt-crawler.md +269 -0
package/agents/design-fixer.md +2 -0
package/agents/quality-gate-runner.md +11 -10
package/dist/claude-code/.claude/skills/brief/SKILL.md +17 -0
package/dist/claude-code/.claude/skills/quality-gate/SKILL.md +2 -2
package/hooks/gdd-a11y-gate.js +119 -0
package/hooks/hooks.json +8 -0
package/package.json +1 -1
package/reference/brief-quality-rubric.md +98 -0
package/reference/copy-quality.md +135 -0
package/reference/debt-categories.md +148 -0
package/reference/registry.json +21 -0
package/skills/brief/SKILL.md +17 -0
package/skills/quality-gate/SKILL.md +2 -2

package/reference/copy-quality.md ADDED Viewed

@@ -0,0 +1,135 @@
+---
+name: copy-quality
+type: heuristic
+version: 1.0.0
+phase: 48
+tags: [copy, microcopy, ux-writing, ctas, errors, empty-states, aria, alt-text, i18n, voice, pillar-1]
+last_updated: 2026-06-03
+---
+# Copy Quality - Microcopy Rubric
+This file is the rubric the Copy pillar (Pillar 1 of `agents/design-auditor.md`) scores against, and the detailed reference that `agents/copy-auditor.md` applies. The design-auditor's Pillar 1 carries the 1-4 score and a one-line finding; this file holds the per-category criteria, the failure patterns, and the i18n lens that produce that score.
+Voice and tone are owned by [`brand-voice.md`](./brand-voice.md): the five axes (formal/casual, serious/playful, expert/approachable, reverent/irreverent, authoritative/collaborative), the 12 archetypes, and the tone-by-context table. This file does not restate those. It checks whether the implemented strings honor the voice the product declared, and whether each microcopy surface does its specific job.
+Cultural and locale meaning is owned by [`rtl-cjk-cultural.md`](./rtl-cjk-cultural.md); engineering primitives for localized strings are owned by [`i18n.md`](./i18n.md). This file consumes the i18n text-expansion table as a copy-length lens (see Internationalization Lens below).
+## Why Copy Is a Pillar
+Generic or default copy is the most common quiet failure in shipped UI. A "Submit" button, a "No data" empty state, and a raw stack-trace error each pass a functional test and each fail the user. Specific, purposeful language is a craft signal: it shows the team pictured a real person reading the screen. The Copy pillar measures that gap between functional text and intentional text.
+## Category Rubrics
+Each category below lists what good looks like and the failure patterns to grep for. A category is "weak" when the failure patterns dominate; "absent" when the surface exists with no considered copy at all.
+### Button and CTA labels
+Good labels are verb-first and name the object: "Export CSV", "Add member", "Send invite", "Delete project". The user reads the label and knows the outcome before clicking. Primary actions carry the specific verb; secondary actions ("Cancel", "Back") may stay plain because they are reversible and conventional.
+Failure patterns: bare "Submit", "OK", "Go", "Click here", "Button", or a noun with no verb ("Form", "Settings" on an action). A primary action labeled "Save" with no object on a screen that saves several distinct things is weak, not broken.
+```bash
+grep -rEn ">(Submit|Click Here|OK|Go|Button|Done)<" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+```
+### Error messages
+Good error copy names the cause and the recovery, and never blames the user. State what happened, then what to do: "We could not save your changes. Check your connection and try again." For validation, point at the field and the fix: "Enter a date in the future." The tone matches the stakes (see the tone-by-context table in `brand-voice.md`); data-loss and payment failures are calm and serious, never playful.
+Failure patterns: raw codes or stack traces shown to users ("Error 500", "undefined is not a function"), blame language ("You entered an invalid value", "Your input was wrong"), and dead ends that state the failure with no next step.
+```bash
+grep -rEn "went wrong|Error [0-9]|invalid input|you entered|try again" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+```
+### Empty states
+Good empty states orient and offer the first action. They explain why the screen is empty and give one clear thing to do: "No projects yet. Create your first project to get started." A first-run empty state is an onboarding moment, not an error.
+Failure patterns: "No data", "Nothing here", a bare zero, or an empty container with no copy. An empty state that explains the emptiness but offers no action is weak.
+```bash
+grep -rEn "No data|No results|Nothing here|No items|empty" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+```
+### Loading and skeleton copy
+Good loading copy reassures and signals progress without nagging: "Getting your data...", "Almost there." Skeleton screens are preferred to spinner-plus-text for content that has a known shape; when text is shown, it is brief and specific to what is loading. Long operations name the step ("Uploading 3 of 12 files").
+Failure patterns: "Loading..." with no context on a multi-second operation, a spinner with no label on a process that can fail, or jokey loading copy on a high-stakes action.
+```bash
+grep -rEn "Loading|Please wait|spinner|Skeleton" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+```
+### ARIA text quality
+ARIA strings are copy the screen-reader user hears. They must describe purpose, not implementation. `aria-label="Close dialog"` is good; `aria-label="button"` or `aria-label="icon"` is noise. Live-region announcements (`aria-live`, `role="status"`, `role="alert"`) carry the same cause-plus-recovery standard as visible errors. Labels must not duplicate adjacent visible text in a way that makes the reader say it twice.
+Failure patterns: `aria-label` set to the element type, `aria-label` that restates a visible label verbatim, missing `aria-label` on icon-only controls, and silent live regions on async state changes.
+```bash
+grep -rEn "aria-label=\"(button|icon|link|image|click)\"" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+```
+### Alt-text quality
+Good alt text conveys the function or meaning of an image in context. A logo links home: `alt="Acme home"`, not `alt="logo"`. A decorative image takes empty alt (`alt=""`) so the reader skips it. An informative chart names its takeaway, not its file. Alt text never starts with "image of" or "picture of"; the reader already knows it is an image.
+Failure patterns: `alt="image"`, `alt="photo"`, filename alt (`alt="IMG_2043.jpg"`), missing alt on informative images, and non-empty alt on purely decorative images.
+```bash
+grep -rEn "alt=\"(image|photo|picture|img|logo)\"|alt=\"[^\"]*\\.(png|jpg|jpeg|svg|webp)\"" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+```
+### Form labels, helper, and validation copy
+Every input has a persistent visible label (placeholders are not labels). Helper text sets expectations before the user types ("Use 8 or more characters"). Validation copy is specific and arrives at the right time: inline after blur for format rules, on submit for cross-field rules. Required and optional states are stated in words, not color alone.
+Failure patterns: placeholder-as-label, helper text that only appears as an error after failure, generic "This field is required" with no field name when several are blank, and validation copy that says what is wrong but not how to fix it.
+```bash
+grep -rEn "placeholder=|required|This field|is invalid" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+```
+### Voice and tone alignment
+Read the declared voice from `.design/DESIGN-CONTEXT.md` (axis positions and archetype, if recorded) and check the implemented strings against it. A Caregiver health app with a curt, blaming error message is off-voice even if the error is technically clear. A formal finance app with "Oops!" on a failed transfer is off-voice. Use the tone-by-context table in `brand-voice.md` as the per-surface contract: error, empty state, success, onboarding, loading, destructive confirmation each have a recommended register.
+Failure patterns: one tone for marketing copy and a flatly different tone for in-product copy, playful language on high-stakes actions, and copy that contradicts the declared archetype.
+## Internationalization Lens
+Apply this lens to copy-heavy components (anything where text drives width: buttons, nav, tabs, chips, table headers, banners). It folds the [`i18n.md`](./i18n.md) text-expansion table into the copy score.
+Two probes:
+1. **Hardcoded user-facing strings.** Strings rendered to users should flow through the project's i18n layer, not sit as literals in JSX. A literal English string in a component is a copy defect for any product that ships, or plans to ship, more than one locale.
+   ```bash
+   grep -rEn ">[A-Z][a-z]+ [a-z]+.*<|aria-label=\"[A-Z][a-z]+ " src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+   ```
+2. **Expansion-overflow at +40%.** German expands English by about +30%, Russian by about +40% (see the expansion table in `i18n.md`). A fixed-width control sized to English is the worst case. The lens: for each copy-bearing component with a width constraint, ask whether the label survives a +40% expansion without clipping or truncating mid-word. Containers sized to `EN base x 1.4` clear the worst row of that table. Flag fixed pixel widths on text controls and single-line labels with no wrap or ellipsis strategy.
+   ```bash
+   grep -rEn "w-\[[0-9]+px\]|width:\s*[0-9]+px|truncate|whitespace-nowrap" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+   ```
+A copy-heavy component that hardcodes strings, or that clips at +40%, drops the Copy pillar by one point and appears in the findings with the `i18n_readiness` lens tag (see `audit-scoring.md` Lens-Tags).
+## Scoring Guide (Copy Pillar, 1-4)
+This scale matches the 1-4 definitions in `agents/design-auditor.md`. The Copy pillar score is a single 1-4 value; the categories above supply the evidence.
+| Score | Label | Criteria |
+|-------|-------|----------|
+| 4 | Exemplary | CTAs are verb-object specific; error messages name cause and recovery without blame; empty states orient and offer a first action; ARIA and alt text describe purpose; form copy guides before and during input; voice matches the declared axes and archetype; copy-heavy components survive a +40% expansion and route strings through i18n. |
+| 3 | Solid | Most copy is intentional; one or two generic labels remain (a plain "Save" on a single-purpose form); error and empty states are present and human but plain; minor voice drift only. |
+| 2 | Present but weak | Mix of intentional and generic copy; some empty states missing; errors show raw codes or blame; ARIA or alt text restates element type; hardcoded strings or +40% overflow risk in copy-heavy components. |
+| 1 | Absent or broken | Majority generic ("Submit", "OK", "Cancel"); empty states absent or "No data"; errors are developer-facing; ARIA labels are noise; alt text is "image" or filenames; no voice considered. |
+## How findings feed the audit
+`agents/copy-auditor.md` runs these probes, scores the pillar, and writes `.design/COPY-AUDIT.md` as a supplement. `agents/design-auditor.md` folds that score and the top finding into Pillar 1 of `.design/DESIGN-AUDIT.md`. Neither file changes the separate 7-category 0-10 system in `audit-scoring.md`; the Copy pillar is a qualitative 1-4 signal, not a weighted metric.

package/reference/debt-categories.md ADDED Viewed

@@ -0,0 +1,148 @@
+---
+name: debt-categories
+type: reference
+version: 1.0.0
+phase: 48
+tags: [debt, taxonomy, audit, crawler, priority-scoring, retroactive]
+last_updated: 2026-06-03
+---
+# Debt Categories
+The taxonomy `agents/design-debt-crawler.md` uses to classify and rank design debt
+found across an entire codebase. Each class has a definition, the detection signal
+that surfaces it, and the typical fix shape. The priority model at the end converts
+every finding into one comparable score so the catalog reads top to bottom by impact.
+This taxonomy is detection-oriented, not a style guide. The reason behind each rule
+lives in `reference/anti-patterns.md` (the BAN-NN and SLOP-NN catalog) and the domain
+references `reference/color.md`, `reference/typography.md`, and `reference/spatial.md`.
+This file is the catalog of what to look for and how to weigh it.
+---
+## Debt Classes
+### color-literal
+**Definition:** A raw color value written directly in source instead of a token
+reference. Includes `#rgb` / `#rrggbb` / `#rrggbbaa` hex, `rgb()` / `rgba()`, and
+`hsl()` / `hsla()` function literals that flow to a rendered surface.
+**Detection signal:** Grep for `#[0-9a-fA-F]{3,8}`, `rgb\(`, `rgba\(`, `hsl\(`,
+`hsla\(` in source files. Exclude token-definition files (the palette source itself)
+and comments. A literal inside a `var(--token: #hex)` definition is the token, not debt.
+**Fix shape:** Replace the literal with the matching semantic token. If no token
+exists for that role, the fix is to define one first.
+### untokenized-component
+**Definition:** A component file that renders visual surface (color, spacing, type
+size) using inline values or arbitrary utility classes rather than referencing the
+design system's tokens or scale.
+**Detection signal:** A component file (`.tsx` / `.jsx` / `.vue` / `.svelte`) that
+contains color-literal or arbitrary-bracket utility hits (`\[[0-9]+px\]`,
+`\[#[0-9a-f]+\]`) and zero `var(--` references or scale-class references. The ratio
+of literal uses to token uses inside one file is the strength signal.
+**Fix shape:** Route the component's visual values through tokens and the spacing
+or type scale; remove the arbitrary brackets.
+### anti-pattern
+**Definition:** A confirmed instance of a banned construct (BAN-NN) or an AI-slop
+tell (SLOP-NN) from `reference/anti-patterns.md`.
+**Detection signal:** Run `gdd-detect <path> --json`. Each finding carries its
+`ruleId`, `file`, `line`, and a link back to the matching paragraph. The detector
+covers the statically matchable BAN rules; the two subjective rules it cannot match
+(BAN-04 keyboard-action animation, BAN-10 nested equal radius) are noted as a
+manual-review item, not auto-counted.
+**Fix shape:** Apply the rule's documented rewrite. Hard bans take precedence over
+SLOP tells when both touch the same element.
+### contrast
+**Definition:** A foreground and background pairing that falls below the WCAG 2.1 AA
+contrast floor (4.5:1 for body text, 3:1 for large text and non-text indicators).
+**Detection signal:** Resolve text-color and background-color pairs that share an
+element or selector, compute the ratio, and flag pairs under the threshold. Pairs
+built from unresolvable runtime values are a manual-review item.
+**Fix shape:** Adjust the token or the role assignment so the pair clears AA. Never
+rely on color alone to carry meaning; pair it with text or an icon.
+### density-spacing
+**Definition:** Spacing values that sit off the project's modular scale (for example
+the 4 / 8 / 12 / 16 / 24 / 32 series), or inconsistent density between sibling
+components that should share rhythm.
+**Detection signal:** Collect every padding, margin, and gap value, then flag values
+that are not on the declared scale and clusters where neighboring components use
+different step counts for the same structural role.
+**Fix shape:** Snap off-grid values to the nearest scale step; align sibling density.
+### typography-drift
+**Definition:** Font sizes, weights, or families that drift from a systematic type
+scale: arbitrary pixel sizes, more than the agreed family count, or weight choices
+that break the heading-to-body hierarchy.
+**Detection signal:** Tally the distinct font-size and font-weight values and the
+family count. A long tail of one-off sizes, more than two families, or `font-weight`
+under 400 on small text are the drift markers.
+**Fix shape:** Map each one-off value onto the nearest scale step; cap families at two.
+### a11y-text
+**Definition:** Text-content accessibility debt: missing `alt` on meaningful images,
+icon-only controls without an accessible name, placeholder used as the only label,
+and generic or developer-facing copy in empty and error states.
+**Detection signal:** Grep for `<img` without `alt`, interactive elements without
+`aria-label` or visible text, inputs with `placeholder` and no `<label>`, and
+empty-state strings such as "No data" or raw error codes.
+**Fix shape:** Add the accessible name or label; rewrite generic copy to be specific
+and actionable. Copy-quality detail lives in `reference/copy-quality.md`.
+---
+## Priority Scoring Model
+Every finding gets one priority score so the catalog ranks by impact, not by the
+order files were walked. Three ordinal factors combine, each on a 1 to 3 scale.
+**Visible-delta** (how much a user notices the fix):
+| Value | Meaning |
+|-------|---------|
+| 3 | Changes a primary surface a user sees on first load |
+| 2 | Changes a secondary or interior surface |
+| 1 | Invisible at rest; shows only in an edge state or to assistive tech |
+**Effort** (how cheap the fix is; cheaper scores higher so quick wins float up):
+| Value | Meaning |
+|-------|---------|
+| 3 | Mechanical one-line swap (literal to token) |
+| 2 | Localized edit across a single component |
+| 1 | Needs a new token, scale decision, or cross-file refactor |
+**Prevalence** (how many instances share this root cause):
+| Value | Meaning |
+|-------|---------|
+| 3 | Ten or more instances of the same finding |
+| 2 | Three to nine instances |
+| 1 | One or two instances |
+**Combine** by multiplying the three factors:
+```
+priority = visible-delta × effort × prevalence
+```
+The product ranges from 1 (low) to 27 (high). Sort the catalog by `priority`
+descending so the highest-impact, lowest-cost, most-widespread debt sits at the top.
+On ties, break by visible-delta first, then prevalence. Record the three factors next
+to the score in each catalog row so the ranking is auditable, not opaque.

package/reference/registry.json CHANGED Viewed

@@ -1100,6 +1100,27 @@
       "type": "domain-index",
       "phase": 45,
       "description": "Phase 45 domain-index: UX-writing + voice - brand-voice, style-vocabulary, anti-patterns."
+    },
+    {
+      "name": "copy-quality",
+      "path": "reference/copy-quality.md",
+      "type": "heuristic",
+      "phase": 48,
+      "description": "Phase 48 copy-quality pillar rubric: microcopy (button/CTA labels, error messages, empty states, ARIA/alt text, loading copy), voice alignment, i18n overflow lens."
+    },
+    {
+      "name": "debt-categories",
+      "path": "reference/debt-categories.md",
+      "type": "taxonomy",
+      "phase": 48,
+      "description": "Phase 48 design-debt taxonomy: debt classes (color-literal, untokenized-component, anti-pattern, contrast, density-spacing, typography-drift, a11y-text) + visible-delta x effort x prevalence priority scoring."
+    },
+    {
+      "name": "brief-quality-rubric",
+      "path": "reference/brief-quality-rubric.md",
+      "type": "output-contract",
+      "phase": 48,
+      "description": "Phase 48 brief-quality rubric: 5 anti-patterns (vague verbs, missing audience, immeasurable success criteria, scope creep, missing anti-goals) the brief-auditor surfaces."
     }
   ]
 }

package/skills/brief/SKILL.md CHANGED Viewed

@@ -108,6 +108,23 @@ Run this final spec-quality pass over `.design/BRIEF.md` before the brief→expl
 - Scope check: nothing in the artifact exceeds (or silently drops) the agreed scope.
 - Ambiguity check: every requirement/decision is specific enough to act on without a follow-up question.
+## Optional brief audit (non-blocking)
+Before the gate, you MAY spawn `agents/brief-auditor.md` via `Task` to grade the brief against the five
+brief anti-patterns (vague verbs, missing audience, immeasurable success criteria, scope creep, missing
+anti-goals). The auditor reads `.design/BRIEF.md` plus `reference/brief-quality-rubric.md` and writes
+advisory findings to `.design/BRIEF-AUDIT.md`. This step is advisory and MUST NOT block the brief to
+explore transition.
+If the auditor reports one or more fired anti-patterns, surface a single-line pointer to the user:
+```
+Brief audit flagged N issue(s) - run /gdd:discuss brief to refine, or proceed to explore.
+```
+The user decides. Proceeding to explore with a flagged brief is allowed; the pointer is a nudge, not a gate.
+If the auditor reports no fired anti-patterns, or you skip the audit, continue to the gate unchanged.
 <HARD-GATE>
 Do NOT transition to explore (or invoke `/gdd:explore`) until the brief artifact (default `.design/BRIEF.md`) is committed AND the user has approved it. If this project uses a custom `.design` location, read the artifact path from `.design/STATE.md` rather than assuming the default.
 </HARD-GATE>

package/skills/quality-gate/SKILL.md CHANGED Viewed

@@ -39,7 +39,7 @@ Read once at start from `.design/config.json` (all optional; defaults in parens)
 Stop at the first tier that produces ≥ 1 command:
 1. **Authoritative config.** If `.design/config.json` has `quality_gate.commands` non-empty, use verbatim.
-2. **Auto-detect from `package.json#scripts`** - match against allowlist: `lint`, `typecheck`, `tsc` (only if `typecheck` absent), `test`, `chromatic`, `test:visual`, `lint:design` (Phase 41 - the `gdd-detect` deterministic anti-pattern gate, alongside `axe`/`pa11y`/`lighthouse`). Exclude by name: `test:e2e`, `test:integration` (if separate `test`), anything starting `dev:`, `build:`, `start:`. Run via `npm run <name>` unless `quality_gate.package_manager` overrides.
+2. **Auto-detect from `package.json#scripts`** - match against allowlist: `lint`, `typecheck`, `tsc` (only if `typecheck` absent), `test`, `chromatic`, `test:visual`, `lint:design` (Phase 41 - the `gdd-detect` deterministic anti-pattern gate), and the accessibility scripts `axe`, `pa11y`, `lighthouse`, `eslint-plugin-jsx-a11y` (or a script named `jsx-a11y`) which classify into the `a11y` bucket. Exclude by name: `test:e2e`, `test:integration` (if separate `test`), anything starting `dev:`, `build:`, `start:`. Run via `npm run <name>` unless `quality_gate.package_manager` overrides.
 3. **Skip with notice.** Emit `quality_gate_skipped` (Step 6) and write a `<run/>` with `status="skipped"`. Verify treats skipped as non-blocking.
 ## Step 2 - Parallel run
@@ -48,7 +48,7 @@ Emit `quality_gate_started`. Spawn each command in a separate `Bash`; collect `{
 ## Step 3 - Classification
-Spawn `quality-gate-runner` agent via `Task` with payload `{outputs: [{command, exit_code, stderr}, ...]}`. Agent returns `{status: "pass"|"fail", classified_failures: {lint, type, test, visual}}`. `pass` → Step 5. `fail` → Step 4.
+Spawn `quality-gate-runner` agent via `Task` with payload `{outputs: [{command, exit_code, stderr}, ...]}`. Agent returns `{status: "pass"|"fail", classified_failures: {lint, type, test, visual, a11y}}`. The `a11y` bucket groups accessibility failures from axe / pa11y / lighthouse / jsx-a11y. `pass` → Step 5. `fail` → Step 4.
 ## Step 4 - Fix loop (D-08)