npm - @hegemonart/get-design-done - Versions diffs - 1.47.0 → 1.49.0 - Mend

@hegemonart/get-design-done 1.47.0 → 1.49.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

package/.claude-plugin/marketplace.json +2 -2
package/.claude-plugin/plugin.json +5 -2
package/CHANGELOG.md +91 -0
package/README.md +4 -0
package/agents/brief-auditor.md +147 -0
package/agents/copy-auditor.md +215 -0
package/agents/design-auditor.md +30 -7
package/agents/design-context-builder.md +2 -0
package/agents/design-debt-crawler.md +292 -0
package/agents/design-executor.md +2 -0
package/agents/design-fixer.md +6 -1
package/agents/design-planner.md +2 -0
package/agents/design-reflector.md +2 -0
package/agents/design-research-synthesizer.md +2 -0
package/agents/design-verifier.md +7 -15
package/agents/quality-gate-runner.md +11 -10
package/dist/claude-code/.claude/skills/brief/SKILL.md +17 -0
package/dist/claude-code/.claude/skills/quality-gate/SKILL.md +2 -2
package/hooks/gdd-a11y-gate.js +119 -0
package/hooks/gdd-design-quality-check.js +340 -0
package/hooks/hooks.json +17 -0
package/package.json +5 -2
package/reference/brief-quality-rubric.md +98 -0
package/reference/copy-quality.md +135 -0
package/reference/debt-categories.md +148 -0
package/reference/registry.json +35 -0
package/reference/reviewer-confidence-gate.md +108 -0
package/reference/visual-tells.md +237 -0
package/scripts/lib/confidence-route.cjs +60 -0
package/scripts/lib/worktree-resolve.cjs +221 -0
package/sdk/mcp/gdd-state/server.js +37 -4
package/sdk/mcp/gdd-state/tools/shared.ts +61 -0
package/skills/brief/SKILL.md +17 -0
package/skills/quality-gate/SKILL.md +2 -2

package/.claude-plugin/marketplace.json CHANGED

@@ -5,14 +5,14 @@
   },
   "metadata": {
     "description": "Get Design Done — 5-stage agent-orchestrated design pipeline with 9 connections, handoff-first workflow, bidirectional Figma write-back, 22+ specialized agents, queryable knowledge layer (intel store, dependency analysis, learnings extraction), and a self-improvement loop (reflector, frontmatter + budget feedback, global-skills layer). v1.20.0 ships the SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream, and resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) for rate-limit + 429 + context-overflow recovery. Full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows) and release automation (auto-tag + GitHub Release + release-time smoke test).",
-    "version": "1.47.0"
+    "version": "1.49.0"
   },
   "plugins": [
     {
       "name": "get-design-done",
       "source": "./",
       "description": "Agent-orchestrated 5-stage design pipeline: Brief → Explore → Plan → Design → Verify. 22+ specialized agents, 9 connections (Figma, Refero, Preview, Storybook, Chromatic, Figma Writer, Graphify, Pinterest, Claude Design), Claude Design handoff, bidirectional Figma write-back, and a queryable intel store (.design/intel/) for dependency and learnings queries. Standalone commands: style, darkmode, compare, figma-write, graphify, handoff, analyze-dependencies, skill-manifest, extract-learnings. Embeds NNG heuristics, WCAG thresholds, typographic systems, motion framework, and anti-pattern catalog. Ships with a full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows) and release automation. Optimization layer (v1.0.4.1, retroactive): gdd-router + gdd-cache-manager skills, PreToolUse budget-enforcer hook, tier-aware agent frontmatter, lazy checker gates, streaming synthesizer, /gdd:warm-cache + /gdd:optimize commands, and cost telemetry at .design/telemetry/costs.jsonl — targeting 50-70% per-task token-cost reduction with no quality-floor regression. v1.20.0 SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream at .design/telemetry/events.jsonl, resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) with rate-limit + 429 + context-overflow recovery, and TypeScript toolchain.",
-      "version": "1.47.0",
+      "version": "1.49.0",
       "author": {
         "name": "hegemonart"
       },

package/.claude-plugin/plugin.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "get-design-done",
   "short_name": "gdd",
-  "version": "1.47.0",
+  "version": "1.49.0",
   "description": "Agent-orchestrated 5-stage design pipeline: Brief → Explore → Plan → Design → Verify. 22+ specialized agents, 9 connections (Figma, Refero, Preview, Storybook, Chromatic, Figma Writer, Graphify, Pinterest, Claude Design), handoff-first workflow via Claude Design bundles, bidirectional Figma write-back (annotations, Code Connect), queryable intel store (`.design/intel/`) for O(1) design surface lookups, and self-improvement loop (reflector agent, frontmatter + budget feedback, global-skills layer at `~/.claude/gdd/global-skills/`). Standalone commands: style, darkmode, compare, figma-write, graphify, handoff, analyze-dependencies, skill-manifest, extract-learnings, reflect, apply-reflections. Embeds NNG heuristics, WCAG thresholds, typographic systems, motion framework, and anti-pattern catalog. Ships with a full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows, lint + schema + frontmatter + stale-ref + shellcheck + gitleaks + injection-scan + blocking size-budget) and release automation (auto-tag + GitHub Release + release-time smoke test). Optimization layer (v1.0.4.1, retroactive): gdd-router + gdd-cache-manager skills, PreToolUse budget-enforcer hook, tier-aware agent frontmatter, lazy checker gates, streaming synthesizer, /gdd:warm-cache + /gdd:optimize commands, and cost telemetry at .design/telemetry/costs.jsonl — targeting 50-70% per-task token-cost reduction with no quality-floor regression. v1.20.0 SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream at .design/telemetry/events.jsonl, resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) with rate-limit + 429 + context-overflow recovery, and TypeScript toolchain. v1.27.7 ships gdd-mcp (Phase 27.7): 12 read-only MCP tools for sub-3s priming. 
v1.28.0 (Phase 28): Foundational References Tier 2 — 5 new reference files (color-theory, composition, proportion-systems, i18n, contrast-advanced), 2 verifier i18n probes + 1 explore i18n-readiness probe, 12 additive cross-link insertions across 10 existing references, 2 orthogonal audit-scoring lens-tags (composition_alignment + i18n_readiness).",
   "author": {
     "name": "hegemonart",
@@ -71,7 +71,10 @@
     "flutter",
     "email",
     "print",
-    "pdf"
+    "pdf",
+    "worktree-safe",
+    "anti-slop",
+    "confidence-gate"
   ],
   "skills": [
     "./skills/"

package/CHANGELOG.md CHANGED

@@ -4,6 +4,97 @@ All notable changes to get-design-done are documented here. Versions follow [sem
 ---
+## [1.49.0] - 2026-06-03
+### Phase 49 - Quick Anti-Slop Floor
+Three small, atomic safety and policy primitives identified in the cross-repo synthesis, each low-risk and
+high-signal: a worktree redirect that ends the recurring `.planning/` leak, a free anti-slop regex pass on every
+front-end file write, and a reviewer confidence gate that stops severity inflation. Planned and executed via the
+GSD pipeline (3 parallel executor subagents). No new runtime dependency, no new egress.
+### Breaking changes
+- **`.design/` and `.planning/` writes redirect to the main repo root inside a git worktree.** `scripts/lib/worktree-resolve.cjs`
+  detects a worktree (`git rev-parse --git-dir` vs `--git-common-dir`) and the gdd-state write path (`resolveStatePath`,
+  used by all 11 state tools) now resolves STATE there, with a one-line stderr notice. Outside a worktree, behavior is
+  unchanged. Tooling that assumed `.design/` always lived under `process.cwd()` should resolve through the helper.
+- **Findings now carry a `confidence` field and design-fixer filters on it.** design-auditor, design-verifier, and
+  design-debt-crawler emit `confidence: 0.0-1.0` per finding; design-fixer drops `## Tentative` findings and routes
+  BLOCKER/MAJOR findings below 0.8 confidence to user review instead of auto-fix. Consumers of these findings should
+  read the new field.
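
Editor's sketch (not part of the package diff): the worktree check in the first bullet above compares `git rev-parse --git-dir` with `--git-common-dir`. The snippet below is a minimal illustration of that comparison, assuming a non-bare repository. The function names mirror the helper names listed in the changelog, but the signatures, the `gitExec` runner, and the fallback behaviour are assumptions, not the shipped `worktree-resolve.cjs`.

```javascript
const { execFileSync } = require('node:child_process');
const path = require('node:path');

// Default runner (hypothetical): executes git and returns trimmed stdout.
function gitExec(args, cwd) {
  return execFileSync('git', args, { cwd, encoding: 'utf8' }).trim();
}

// In a linked worktree, --git-dir points at <main>/.git/worktrees/<name>,
// while --git-common-dir points at <main>/.git, so the two paths differ.
function isWorktree(cwd, exec = gitExec) {
  try {
    const gitDir = path.resolve(cwd, exec(['rev-parse', '--git-dir'], cwd));
    const commonDir = path.resolve(cwd, exec(['rev-parse', '--git-common-dir'], cwd));
    return gitDir !== commonDir;
  } catch {
    // Assumption: not a repo or git unavailable means "not a worktree".
    return false;
  }
}

// Resolve where .design/ should live: under the main repo root inside a
// worktree, under cwd otherwise (matching "behavior is unchanged").
function resolveDesignRoot(cwd, exec = gitExec) {
  if (!isWorktree(cwd, exec)) return path.join(cwd, '.design');
  const commonDir = path.resolve(cwd, exec(['rev-parse', '--git-common-dir'], cwd));
  return path.join(path.dirname(commonDir), '.design');
}
```

Passing `exec` as a parameter is one way to get the "injectable exec" the changelog mentions: tests can supply canned git output instead of touching a real repository.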
+### Added
+- **`scripts/lib/worktree-resolve.cjs`** (resolveRepoRoot / isWorktree / resolveDesignRoot / resolvePlanningRoot;
+  graceful fallback, injectable exec) wired into the state write path + a one-line worktree note in the 7
+  artifact-writer agents.
+- **`hooks/gdd-design-quality-check.js`**: an advisory PostToolUse hook scanning `Write`/`Edit`/`MultiEdit` to
+  `.tsx`/`.vue`/`.svelte`/`.astro` for 8 default-AI-aesthetic tells (gradient spam, generic CTAs, centered-everything,
+  font-inter default, purple/violet default, glassmorphism spam, isometric fallback, decorative motion). WARN-only,
+  emits a `design_quality_warn` event. Catalogued in **`reference/visual-tells.md`** (8 named categories with diagnostic
+  regex + remediation).
+- **Reviewer confidence gate**: a 4-question Pre-Report Gate + the `confidence` field across the three audit agents,
+  a `scripts/lib/confidence-route.cjs` routing helper (`fix` / `user-review` / `drop`), and
+  **`reference/reviewer-confidence-gate.md`** (template + rationale + 4 before/after examples).
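
Editor's sketch (not part of the package diff): the routing rule described in this release (tentative findings dropped, BLOCKER/MAJOR findings below 0.8 confidence sent to user review, everything else eligible for auto-fix) could look roughly like the function below. The finding shape, the `tentative` flag, and the treatment of malformed confidence values are assumptions made for illustration; the shipped `confidence-route.cjs` may differ.

```javascript
// Severities that need high confidence before an automatic fix (per the changelog).
const GATED_SEVERITIES = new Set(['BLOCKER', 'MAJOR']);
const CONFIDENCE_FLOOR = 0.8;

// Returns 'fix', 'user-review', or 'drop' for one finding.
// Assumed shape: { severity: string, confidence: number, tentative?: boolean }.
function routeFinding(finding) {
  // Assumption: findings listed under "## Tentative" arrive with tentative === true.
  if (finding.tentative) return 'drop';

  const confidence = Number(finding.confidence);
  // Assumption: a missing or out-of-range confidence goes to a human.
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    return 'user-review';
  }

  const severity = String(finding.severity).toUpperCase();
  if (GATED_SEVERITIES.has(severity) && confidence < CONFIDENCE_FLOOR) {
    return 'user-review';
  }
  return 'fix';
}
```

Under these assumptions a MAJOR finding at 0.6 confidence is held for review, while the same finding at 0.9 proceeds to auto-fix.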
+### Notes
+- 6-manifest lockstep at **v1.49.0** + `OFF_CADENCE_VERSIONS.add('1.49.0')` + 37 `manifests-version.txt` baselines +
+  plugin keywords (`worktree-safe`, `anti-slop`, `confidence-gate`). Baselines re-locked: hook-list (19),
+  resilience-primitives (39 `scripts/lib/*.cjs`), registry (173), tarball golden 902 -> 907 (+5).
+- WARN-only hook (never blocks); auto-fix of matched tells is out of scope (proposal-only); the verb-based anti-slop
+  rubric and a wider tell catalog are deferred to Phase 50.
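
Editor's sketch (not part of the package diff): a WARN-only regex pass of the kind described for `gdd-design-quality-check.js` might be structured as below. The two patterns are hypothetical stand-ins for two of the eight categories and are not the regexes catalogued in `reference/visual-tells.md`; the warning object shape is likewise an assumption.

```javascript
const path = require('node:path');

// File types the changelog says the hook inspects.
const FRONT_END_EXTENSIONS = new Set(['.tsx', '.vue', '.svelte', '.astro']);

// Hypothetical patterns for illustration only; not the shipped catalog.
const TELLS = [
  { id: 'generic-cta', pattern: />\s*(Get Started|Learn More|Click Here)\s*</i },
  { id: 'purple-default', pattern: /\b(?:bg|from|to|text)-(?:purple|violet)-\d{3}\b/ },
];

// Returns a list of warnings; never throws and never blocks the write.
function scanForTells(filePath, content) {
  if (!FRONT_END_EXTENSIONS.has(path.extname(filePath).toLowerCase())) return [];
  const warnings = [];
  content.split('\n').forEach((line, index) => {
    for (const tell of TELLS) {
      if (tell.pattern.test(line)) {
        warnings.push({ type: 'design_quality_warn', tell: tell.id, file: filePath, line: index + 1 });
      }
    }
  });
  return warnings;
}
```

Keeping the scan a pure function of path and content makes the advisory behaviour easy to see: the caller decides how to surface the warnings, and an empty array means nothing matched.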
+---
+## [1.48.0] - 2026-06-03
+### Phase 48 - Audit & Pillar Expansion
+The audit surface had grown asymmetrically: output quality matured (7 pillars, multiple lenses, a quality-gate
+before verify) while input quality went ungraded, copy stayed a thin pillar, and there was no project-wide debt
+sweep or accessibility gate. Phase 48 closes four audit-side gaps in one release: a deepened copy pillar, a
+retroactive debt crawler, a brief critic, and an a11y quality-gate. Planned and executed via the GSD pipeline
+(2 research agents + 3 parallel executor subagents). No new runtime dependency, no new egress.
+### Breaking changes
+- **The design-auditor scoring contract is now explicitly versioned.** `agents/design-auditor.md` carries a
+  `scoring_contract_version` marker (7 pillars; copy deepened; an 8th pillar slot reserved and unscored), and the
+  stale "6-Pillar" heading is corrected to 7. Consumers read pillars by name (not index), so existing integrations
+  are unaffected, but tooling that parsed the heading text should read the version marker instead.
+- **`a11y` is now the fifth quality-gate failure bucket.** `quality-gate-runner` classifies `axe` / `pa11y` /
+  `lighthouse` / `jsx-a11y` command output into a new `a11y` class (previously these fell through to `test`), and
+  the quality-gate auto-detect allowlist runs them. A project with those scripts will see accessibility regressions
+  surfaced and routed to `design-fixer` at Stage 4.5.
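
Editor's sketch (not part of the package diff): the reclassification in the second bullet above can be pictured as a matcher over the four named tools. This is a simplification under stated assumptions: it inspects a command string, and it collapses every other bucket into `test` because the remaining gate classes are not named in this entry. The function name is hypothetical.

```javascript
// The four tool names the changelog lists for the new a11y class.
const A11Y_TOOLS = /\b(axe|pa11y|lighthouse|jsx-a11y)\b/i;

// Returns 'a11y' for accessibility tooling; everything else falls through
// to 'test' here, mirroring the pre-1.48.0 behaviour for those commands.
function classifyGateCommand(command) {
  return A11Y_TOOLS.test(command) ? 'a11y' : 'test';
}
```

For example, `classifyGateCommand('npx pa11y http://localhost:3000')` returns `'a11y'`, while `classifyGateCommand('jest --ci')` returns `'test'`.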
+### Added
+- **Copy pillar deepened**: `reference/copy-quality.md` (microcopy rubric covering button/CTA labels, error
+  messages, empty-states, ARIA-text, alt-text, loading copy, voice alignment; i18n-aware with a +40% expansion
+  overflow lens) + `agents/copy-auditor.md` (a focused single-pillar auditor design-auditor folds into Pillar 1).
+- **`agents/design-debt-crawler.md`** + **`reference/debt-categories.md`**: a project-wide retroactive crawler
+  (does not read STATE.md completed tasks) that walks the whole tree, enumerates raw color literals, anti-pattern
+  hits, untokenized components, contrast and density issues, and writes a priority-scored `.design/debt/DEBT-CATALOG.md`
+  (visible-delta x effort x prevalence). Pure catalog, one `/gdd:fast "<finding>"` suggestion per row.
+- **`agents/brief-auditor.md`** + **`reference/brief-quality-rubric.md`**: grades the brief against 5 anti-patterns
+  (vague verbs, missing audience, immeasurable success criteria, scope creep, missing anti-goals); wired into the
+  tail of `/gdd:brief` as a non-blocking warning that offers `/gdd:discuss brief`.
+- **`hooks/gdd-a11y-gate.js`**: an advisory PostToolUse surface for a11y findings, plus the quality-gate +
+  quality-gate-runner + design-fixer a11y wiring.
+### Notes
+- 6-manifest lockstep at **v1.48.0** + `OFF_CADENCE_VERSIONS.add('1.48.0')` + 37 `manifests-version.txt` baselines +
+  tarball golden 895 -> 902 (3 agents, 3 reference docs, 1 hook). Agent baselines (`agent-list.txt` 60,
+  `agent-frontmatter-snapshot.json`, `hook-list.txt`) + registry (171) re-locked.
+- The 7-pillar contract already existed in `design-auditor` (copy was Pillar 1); Phase 48 formalizes it rather than
+  migrating 6->7. The unified cross-auditor finding schema (severity/confidence/issue-key/location/suggested-fix) is
+  noted as a follow-up candidate, not shipped here.
+---
 ## [1.47.0] - 2026-06-03
 ### Phase 47 - In-Browser Design Iteration (Live Mode)

package/README.md CHANGED

@@ -255,6 +255,10 @@ All 14 runtimes receive their native artifact layout (`skills/`, `command/`, `ag
 **In-browser design iteration (v1.47.0).** `/gdd:live` tightens the design loop: pick an element on a running dev server, generate N variants in one batch (default 3, grounded in the Phase 45 canonical reference), post-check each with `gdd-detect`, hot-swap them via HMR, then accept or discard. Accepted variants are applied as the canonical edit and feed the Phase 38 bandit store with a `dev_time` source tag (a conservative `Beta(2,8)` prior keeps them advisory until production outcomes accumulate). The session persists to `.design/live-sessions/<id>.json` with resume, a scope guard blocks writes outside the picked element's source files, and harnesses without MCP fall back to a screenshot-only degraded mode. It drives the existing Preview connection, so there is **no new runtime dependency**.
+**Audit and pillar expansion (v1.48.0).** Four audit-side gaps close at once. The copy pillar gets a real rubric (`reference/copy-quality.md` + `copy-auditor`): microcopy, error and empty-state text, ARIA and alt text, voice alignment, with an i18n overflow lens. A project-wide `design-debt-crawler` walks an existing codebase (not just the current cycle), enumerates raw color literals, anti-patterns, untokenized components, and contrast/density issues, and writes a priority-scored `.design/debt/DEBT-CATALOG.md`. A `brief-auditor` grades the brief against five anti-patterns (vague verbs, missing audience, immeasurable success criteria, scope creep, missing anti-goals) and surfaces a non-blocking `/gdd:discuss brief` pointer. And the Stage 4.5 quality-gate gains an `a11y` failure class so `axe` / `pa11y` / `lighthouse` regressions route to `design-fixer` like any other gate failure. **No new runtime dependency.**
+**Quick anti-slop floor (v1.49.0).** Three small safety primitives. A worktree redirect (`scripts/lib/worktree-resolve.cjs`) sends `.design/` and `.planning/` writes to the main repo root when GDD runs inside a git worktree, so artifacts never leak into an ephemeral checkout. A design-quality PostToolUse hook (`gdd-design-quality-check.js`) runs a free regex pass on every `.tsx`/`.vue`/`.svelte`/`.astro` write and warns on eight default-AI-aesthetic tells (gradient spam, generic CTAs, centered-everything, font-inter defaults, purple/violet defaults, glassmorphism spam, isometric fallbacks, decorative motion), catalogued in `reference/visual-tells.md`. And a reviewer confidence gate adds a `confidence: 0.0-1.0` field plus a 4-question Pre-Report Gate to every audit finding: HIGH and CRITICAL findings need at least 0.8 confidence and cited proof, low-confidence findings stay tentative and never reach `design-fixer`. The hook is WARN-only and there is **no new runtime dependency**.
 Verify with:
 ```

package/agents/brief-auditor.md ADDED

@@ -0,0 +1,147 @@
+---
+name: brief-auditor
+description: "Advisory critic that grades .design/BRIEF.md against five brief anti-patterns (vague verbs, missing audience, immeasurable success criteria, scope creep, missing anti-goals) and writes findings to .design/BRIEF-AUDIT.md. Non-blocking. Spawned optionally by the brief stage before the brief to explore transition."
+tools: Read, Write, Grep, Glob
+color: green
+model: inherit
+default-tier: sonnet
+tier-rationale: "Reads one short artifact and classifies prose against five named anti-patterns; Sonnet handles the light judgment without planner-tier cost."
+size_budget: M
+size_budget_rationale: "Five anti-pattern checks each carry a grep signal plus a one-line example, plus the BRIEF-AUDIT.md output contract; M (300) gives room without bloat."
+parallel-safe: always
+typical-duration-seconds: 20
+reads-only: false
+writes:
+  - ".design/BRIEF-AUDIT.md"
+---
+@reference/shared-preamble.md
+# brief-auditor
+## Role
+You grade the design brief, not the design output. You answer one question for the brief stage: *does
+`.design/BRIEF.md` carry the five things a verifiable cycle needs, or does it ship vagueness downstream?*
+You are advisory. You never block the brief to explore transition. You read the brief, classify it against
+five named anti-patterns, write findings to `.design/BRIEF-AUDIT.md`, and return a one-line summary.
+You do NOT rewrite the brief, spawn other agents, modify source code, or call the user interactively. You
+write exactly one artifact: `.design/BRIEF-AUDIT.md`. Your value is surfacing a vague brief while the cost
+of fixing it is one sentence, before explore widens to fill the gaps.
+## Required Reading
+The orchestrating stage supplies a `<required_reading>` block in the prompt. Read every listed file before
+acting. Minimum expected files:
+- `.design/BRIEF.md` - the artifact you grade. If the brief lives at a custom path, read it from
+  `.design/STATE.md` rather than assuming the default.
+- `reference/brief-quality-rubric.md` - the five anti-patterns with definitions, examples, detection
+  signals, and severity. This is the rulebook you grade against.
+- `.design/STATE.md` - pipeline position, cycle id, and the custom brief path if one is set.
+If `.design/BRIEF.md` does not exist, write a `BRIEF-AUDIT.md` noting the brief is absent, emit the
+completion marker, and stop. Do not invent brief content.
+## The five anti-patterns
+Grade each section of the brief against these five checks. Full definitions and examples live in
+`reference/brief-quality-rubric.md`; the table below is the at-a-glance map.
+| ID | Anti-pattern | What fires it | Severity |
+|----|--------------|---------------|----------|
+| AP-1 | Vague verbs without a metric | Soft verb (improve, optimize, streamline, enhance, modernize, refresh) with no adjacent number, percent, or unit | Major |
+| AP-2 | Missing audience | Audience section empty, a placeholder, or a generic noun with no role plus context | Major |
+| AP-3 | Immeasurable success criteria | Subjective adjective (modern, clean, intuitive, delightful) with no paired number or pass condition | Major |
+| AP-4 | Scope creep | More than three unrelated surfaces in scope, or an in-scope list with no out-of-scope line | Minor |
+| AP-5 | Missing anti-goals | Zero prohibition statements (do not, avoid, no new, out of scope) anywhere in the brief | Minor |
+## Detection method
+Read the brief once, then run one targeted pass per anti-pattern. Prefer Grep over re-reading. The brief is
+short, so favor precision over recall: only flag a hit you can quote.
+```bash
+# AP-1 — soft verbs in Problem / Success Metrics. A hit is a soft verb whose sentence has no digit/percent/unit.
+grep -nEi "improve|optimi[sz]e|streamline|enhance|moderni[sz]e|refresh" .design/BRIEF.md
+# AP-3 — subjective-only success adjectives. A hit is one of these with no paired number or pass condition.
+grep -nEi "modern|clean|intuitive|delightful|beautiful|nice|fast and" .design/BRIEF.md
+# AP-5 — prohibition statements. ZERO matches across the brief is the AP-5 hit.
+grep -nEi "do not|don't|avoid|no new|anti-goal|out of scope|non-goal" .design/BRIEF.md
+```
+For AP-2 (audience) and AP-4 (scope), read the named sections directly:
+- **AP-2:** Open the Audience section. Flag when empty, a placeholder (`TBD`, `users`, `everyone`,
+  `all users`), or a single generic noun with no role plus context qualifier.
+- **AP-4:** Count distinct top-level surfaces or features named as in-scope. More than three unrelated
+  surfaces, or an in-scope list with no matching out-of-scope line, is the hit.
+Quote the matched text for every hit. A finding you cannot quote is not a finding; drop it. When the grep
+fires but the sentence DOES carry a metric or pass condition, that is a clean pass, not a hit.
+## Output Contract
+Write `.design/BRIEF-AUDIT.md` using this structure. Use the Write tool; do not append to the brief itself.
+```markdown
+---
+audited: <ISO 8601 date>
+brief_path: .design/BRIEF.md
+anti_patterns_fired: <N of 5>
+advisory: true
+---
+## Brief Audit
+**Audited:** <ISO 8601 date>
+**Verdict:** advisory only — the brief still proceeds to explore.
+| ID | Anti-pattern | Status | Section | Evidence |
+|----|--------------|--------|---------|----------|
+| AP-1 | Vague verbs without a metric | hit / clear | Problem | "<quoted text>" |
+| AP-2 | Missing audience | hit / clear | Audience | "<quoted text or 'section empty'>" |
+| AP-3 | Immeasurable success criteria | hit / clear | Success Metrics | "<quoted text>" |
+| AP-4 | Scope creep | hit / clear | Scope | "<quoted text>" |
+| AP-5 | Missing anti-goals | hit / clear | (whole brief) | "<no prohibition found>" |
+## Findings
+For each fired anti-pattern, one short paragraph: what fired it, the section, and the one-line fix that
+would clear it. Lead with Major findings (AP-1, AP-2, AP-3), then Minor (AP-4, AP-5). If no anti-pattern
+fired, write a single line: "No anti-patterns fired. The brief is specific enough to verify against."
+## Suggested next step
+When one or more anti-patterns fired, end with: "Run /gdd:discuss brief to refine before explore."
+When none fired, omit this section.
+```
+Set `anti_patterns_fired` to the count of hits. Status values are exactly `hit` or `clear`. The verdict
+line never changes: the audit is advisory and the brief proceeds regardless.
+## Constraints
+- Do NOT block the brief to explore transition. You are advisory; the brief stage decides whether to act.
+- Do NOT rewrite or edit `.design/BRIEF.md`. You read it; you write only `.design/BRIEF-AUDIT.md`.
+- Do NOT spawn other agents, modify source code, or run commands beyond read-only grep.
+- Do NOT compute a pass/fail score or a weighted total. Report a count of fired anti-patterns and per-row
+  hit/clear status; that is the whole verdict.
+- Do NOT invent findings. Every hit must quote matched text from the brief. A grep match whose sentence
+  carries a metric or pass condition is a clean pass, not a hit.
+- Do NOT ask the user questions mid-run. Single-shot execution.
+## Record
+At run-end, append one JSONL line to `.design/intel/insights.jsonl`:
+```json
+{"ts":"<ISO-8601>","agent":"brief-auditor","cycle":"<cycle from STATE.md>","stage":"<stage from STATE.md>","one_line_insight":"<anti-patterns fired and which>","artifacts_written":[".design/BRIEF-AUDIT.md"]}
+```
+Schema: `reference/schemas/insight-line.schema.json`. Create `.design/intel/` with `mkdir -p` first; always append, never overwrite.
+## AUDIT COMPLETE

package/agents/copy-auditor.md ADDED

@@ -0,0 +1,215 @@
+---
+name: copy-auditor
+description: Scores the Copy pillar deeply against reference/copy-quality.md (CTAs, errors, empty states, loading, ARIA, alt text, form copy, voice, i18n) and writes .design/COPY-AUDIT.md as a supplement the design-auditor folds into Pillar 1.
+tools: Read, Write, Bash, Grep, Glob
+color: green
+model: inherit
+default-tier: sonnet
+tier-rationale: "Emits structured copy findings from source inspection plus voice judgment; Sonnet balances reading depth with cost"
+size_budget: M
+size_budget_rationale: "Focused single-pillar auditor: nine category probes plus an i18n lens and one output template; M cap (300) leaves headroom without inviting scope creep beyond the Copy pillar"
+parallel-safe: always
+typical-duration-seconds: 40
+reads-only: false
+writes:
+  - ".design/COPY-AUDIT.md"
+---
+@reference/shared-preamble.md
+# copy-auditor
+## Role
+You are a focused microcopy audit agent. You score one pillar deeply: the Copy pillar (Pillar 1 of `agents/design-auditor.md`). You read the implemented strings in the source, judge them against `reference/copy-quality.md`, assign a 1-4 score, and write `.design/COPY-AUDIT.md` as a supplement.
+Your output is a supplement, not a replacement. `agents/design-auditor.md` runs the full 7-pillar audit; when it spawns you (or when the verify stage spawns you alongside it), it folds your score and your top finding into Pillar 1 of `.design/DESIGN-AUDIT.md`. You never write `.design/DESIGN-AUDIT.md` yourself, and you never touch the separate 7-category 0-10 system in `reference/audit-scoring.md`.
+You run once per invocation. You do not remediate copy, spawn other agents, or modify source code. You are a read-only analyzer with Write access only to `.design/COPY-AUDIT.md`.
+## Critical: One Pillar, Deeply
+The design-auditor scores Copy in a single pass against a compact rubric. Your job is the deep version: every microcopy category in `reference/copy-quality.md`, every failure pattern, plus the internationalization lens. You produce the evidence that justifies a 1-4 Copy score, so the design-auditor can cite it rather than re-derive it.
+Do not score the other six pillars. Do not compute a weighted 0-100 number. Your single deliverable is a Copy pillar score (1-4) with category-level evidence.
+## Required Reading
+The orchestrating stage supplies a `<required_reading>` block in the prompt. Read every listed file before acting - this is mandatory.
+Minimum expected files:
+- `.design/STATE.md` - pipeline position, source roots, cycle and stage
+- `.design/DESIGN-CONTEXT.md` - declared voice axes, archetype, and D-XX decisions (read for voice alignment)
+- `reference/copy-quality.md` - the microcopy rubric you score against (source of truth)
+- `reference/brand-voice.md` - voice axes, archetype library, tone-by-context table
+- `reference/i18n.md` - text-expansion table and i18n primitives (for the expansion-overflow lens)
+- `reference/audit-scoring.md` - the existing 7-category 0-10 system (understand, do not duplicate; note the `i18n_readiness` lens tag)
+If a file is absent, note it in the audit and continue with the rest.
+## Scoring Scale
+The Copy pillar uses the same 1-4 scale as `agents/design-auditor.md`:
+| Score | Label | Meaning |
+|-------|-------|---------|
+| 4 | Exemplary | No copy issues; specific, on-voice, i18n-aware throughout |
+| 3 | Solid | Minor issues only; one or two generic labels; plain but human |
+| 2 | Present but weak | Notable gaps; generic copy, raw errors, or i18n risk |
+| 1 | Absent or broken | Majority generic; developer-facing errors; no voice considered |
+The per-category criteria and the full 1-4 table live in `reference/copy-quality.md` under Scoring Guide. Use them verbatim.
+## Execution Steps
+### Step 1: Load Context
+Read every file in `<required_reading>`. From `.design/STATE.md`, read the source roots (default `src/`). From `.design/DESIGN-CONTEXT.md`, extract the declared voice axes and archetype if recorded; if none are recorded, note that voice alignment is judged against the tone-by-context defaults in `brand-voice.md`.
+### Step 2: Enumerate Source Files
+```bash
+find src/ -name "*.tsx" -o -name "*.jsx" -o -name "*.html" 2>/dev/null | head -50
+```
+Use the source roots from STATE.md if they differ from `src/`.
+### Step 3: Probe Each Category
+Run the probes from `reference/copy-quality.md` for each category. The grep patterns there are the starting point; read the surrounding code to judge intent, since a grep hit is a candidate, not a verdict.
+```bash
+# Generic CTA labels (verb-first, object-named is the standard)
+grep -rEn ">(Submit|Click Here|OK|Go|Button|Done)<" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+# Error copy: raw codes, blame language, dead ends
+grep -rEn "went wrong|Error [0-9]|invalid input|you entered|try again" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+# Empty states: orient plus first action
+grep -rEn "No data|No results|Nothing here|No items|EmptyState" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+# Loading and skeleton copy
+grep -rEn "Loading|Please wait|spinner|Skeleton" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+# ARIA text quality: labels that name purpose, not element type
+grep -rEn "aria-label=\"(button|icon|link|image|click)\"" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+# Alt-text quality: function or meaning, never "image" or a filename
+grep -rEn "alt=\"(image|photo|picture|img|logo)\"|alt=\"[^\"]*\\.(png|jpg|jpeg|svg|webp)\"" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+# Form copy: persistent labels, helper before input, specific validation
+grep -rEn "placeholder=|required|This field|is invalid" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+```
+For each category, record the file:line evidence and decide whether the surface is exemplary, solid, weak, or absent per the category rubric.
+### Step 4: Voice and Tone Alignment
+Compare the implemented strings against the declared voice from `.design/DESIGN-CONTEXT.md` and the tone-by-context table in `brand-voice.md`. Flag tone mismatches per surface: playful copy on high-stakes actions, an archetype the copy does not carry, or marketing-versus-product tone splits.
+### Step 5: Internationalization Lens
+Apply the i18n lens from `reference/copy-quality.md` to copy-heavy components (buttons, nav, tabs, chips, table headers, banners):
+```bash
+# Hardcoded user-facing strings (should route through the i18n layer)
+grep -rEn ">[A-Z][a-z]+ [a-z]+.*<|aria-label=\"[A-Z][a-z]+ " src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+# Fixed widths or no-wrap on text controls (clip risk at +40% expansion)
+grep -rEn "w-\[[0-9]+px\]|width:\s*[0-9]+px|truncate|whitespace-nowrap" src/ --include="*.tsx" --include="*.jsx" 2>/dev/null | head -10
+```
+Russian expands English by about +40% (see the expansion table in `i18n.md`). A copy-heavy component that hardcodes strings, or that clips at +40%, drops the Copy pillar by one point and is tagged `i18n_readiness` in the findings.
+### Step 6: Assign the Score and Write COPY-AUDIT.md
+Aggregate the category evidence into a single 1-4 Copy score using the Scoring Guide table in `reference/copy-quality.md`. Write `.design/COPY-AUDIT.md` using the output format below.
+### Step 7: Emit Completion Marker
+After writing the file, emit `## COPY AUDIT COMPLETE` as the final line of the response.
+## Output Format: COPY-AUDIT.md
+Write to `.design/COPY-AUDIT.md` using this structure:
+```markdown
+---
+audited: <ISO 8601 date>
+copy_pillar_score: N/4
+supplement_note: "Supplement to .design/DESIGN-AUDIT.md Pillar 1 - design-auditor folds this score in. Does not replace reference/audit-scoring.md."
+---
+## Copy Audit - [Target Scope from DESIGN-CONTEXT.md]
+**Audited:** [ISO 8601 date]
+**Copy pillar score:** [N]/4
+**Method:** Code-only string inspection against reference/copy-quality.md. Runtime copy (server-rendered strings, i18n catalog values) may not appear in source; note where coverage is partial.
+---
+## Category Findings
+| Category | Verdict | Evidence |
+|----------|---------|----------|
+| Button / CTA labels | exemplary / solid / weak / absent | [file:line or summary] |
+| Error messages | ... | ... |
+| Empty states | ... | ... |
+| Loading / skeleton | ... | ... |
+| ARIA text | ... | ... |
+| Alt text | ... | ... |
+| Form labels / helper / validation | ... | ... |
+| Voice and tone alignment | ... | ... |
+| Internationalization lens | ... | ... |
+---
+## Top Copy Fixes
+Ranked by user impact. The design-auditor weights the first of these in Pillar 1.
+1. **[Category - specific issue]** - [user impact] - [concrete fix with file:line]
+2. **[Category - specific issue]** - [user impact] - [concrete fix with file:line]
+3. **[Category - specific issue]** - [user impact] - [concrete fix with file:line]
+---
+## i18n Lens Notes
+[Hardcoded user-facing strings found, and any copy-heavy components at +40% overflow risk. Tag each `i18n_readiness`. Note "no i18n risk found" if clean.]
+---
+## Coverage Gap
+This audit is code-only. Strings produced at runtime (i18n catalogs, server responses) are not fully visible to static inspection. The Copy score reflects strings present in source; recommend a human read of one primary flow to confirm runtime copy quality.
+```
+## Constraints
+**MUST NOT:**
+- Write to any directory other than `.design/`
+- Write `.design/DESIGN-AUDIT.md` (the design-auditor owns that file)
+- Modify source code (read-only analysis)
+- Score pillars other than Copy, or compute a weighted 0-100 score
+- Replace or contradict the 7-category 0-10 system in `reference/audit-scoring.md`
+- Spawn other agents or ask the user questions mid-run
+**MAY:**
+- Read any file in the repository
+- Run `grep` / `bash` / `glob` for static analysis
+- Write `.design/COPY-AUDIT.md`
+- Note a `<blocker>` entry in `.design/STATE.md` if the audit cannot proceed (missing required files); always emit `## COPY AUDIT COMPLETE` after
+## Record
+At run-end, append one JSONL line to `.design/intel/insights.jsonl`:
+```json
+{"ts":"<ISO-8601>","agent":"<name>","cycle":"<cycle from STATE.md>","stage":"<stage from STATE.md>","one_line_insight":"<what was produced or learned>","artifacts_written":["<files written>"]}
+```
+Schema: `reference/schemas/insight-line.schema.json`. Use `.design/COPY-AUDIT.md` as the written artifact.
+## COPY AUDIT COMPLETE
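
Reviewer note on the `## Record` section added above: a minimal sketch of building the one-line JSONL record, assuming the six fields shown in the template are the whole record. The schema file `reference/schemas/insight-line.schema.json` is not part of this diff, so the `InsightLine` type and the `formatInsightLine` helper are hypothetical names for illustration, and the `agent`, `cycle`, and `stage` values are placeholders.

```typescript
// Hypothetical shape: field names are copied from the JSONL template in the
// diff; the type and function names are illustrative, not package code.
interface InsightLine {
  ts: string; // ISO 8601 timestamp
  agent: string;
  cycle: string; // read from .design/STATE.md in a real run
  stage: string; // read from .design/STATE.md in a real run
  one_line_insight: string;
  artifacts_written: string[];
}

function formatInsightLine(insight: InsightLine): string {
  // JSON.stringify without an indent argument emits a single line and escapes
  // any newline inside a string value, so the record stays valid JSONL.
  return JSON.stringify(insight) + "\n";
}

// Placeholder values only; a real run would take cycle and stage from STATE.md.
const line = formatInsightLine({
  ts: new Date(0).toISOString(),
  agent: "copy-auditor",
  cycle: "example-cycle",
  stage: "verify",
  one_line_insight: "Copy pillar scored from strings present in source",
  artifacts_written: [".design/COPY-AUDIT.md"],
});
```

The caller would then append `line` to `.design/intel/insights.jsonl`. Keeping the formatter free of file I/O makes the one-record-per-line property easy to check on its own.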

package/agents/design-auditor.md CHANGED

@@ -47,6 +47,7 @@ Minimum expected files:
 - `.design/tasks/` - what was actually done (glob all task files)
 - **Domain-index navigation (Phase 45):** the 7 entry-points `reference/{typography,color,spatial,motion,interaction,responsive,ux-writing}.md` index every fragment below. For a pillar, load the relevant domain index first, then drill into the specific fragments it lists only as the pillar needs them - this is the cheap navigation layer over the detailed fragments.
 - `reference/audit-scoring.md` - existing 7-category scoring rubric (understand, do not duplicate)
+- `reference/reviewer-confidence-gate.md` - Pre-Report Gate, the `confidence` field, and the routing rule applied to every finding
 - `reference/brand-voice.md` - voice axes, archetype library, and tone-by-context table (use when auditing Pillar 1: Copy)
 - `reference/gestalt.md` - 8 Gestalt principles with scoring rubrics (use when auditing Pillar 2: Visual Hierarchy)
 - `reference/visual-hierarchy-layout.md` - Z-order, whitespace, grids, and reading-order patterns (use when auditing Pillar 2: Visual Hierarchy)
@@ -68,7 +69,9 @@ Minimum expected files:
 ---
-## 6-Pillar Scoring System
+## 7-Pillar Scoring System
+> **Scoring contract: v2** (`scoring_contract_version: v2`) - 7 pillars; copy deepened in Phase 48 via `reference/copy-quality.md` + `agents/copy-auditor.md`; 8th pillar slot reserved, unscored. The pillar count and slot 7 (Micro-Polish) name are read by `design-verifier` by name; do not renumber existing pillars.
 **Score definitions (1–4 per pillar):**
@@ -83,7 +86,9 @@ Minimum expected files:
 ### Pillar 1: Copy
-**What this measures:** The quality and specificity of text content - button labels, empty states, error messages, headings, and microcopy. Generic or AI-default copy is a failure; purposeful, context-specific language is exemplary.
+**What this measures:** The quality and specificity of text content - button labels, empty states, error messages, loading copy, ARIA strings, alt text, form copy, and voice alignment. Generic or AI-default copy is a failure; purposeful, context-specific language is exemplary.
+**Detailed rubric:** `reference/copy-quality.md` is the source of truth for this pillar - it holds the per-category criteria (CTAs, errors, empty states, loading/skeleton, ARIA text, alt text, form labels/helper/validation, voice/tone), the failure patterns, the internationalization lens (hardcoded-string probe + `+40%` expansion-overflow check, per Phase 28 i18n), and the canonical 1-4 Scoring Guide table. Read it before scoring Copy. For a deep, evidence-rich Copy pass, the verify stage may spawn `agents/copy-auditor.md`, which scores this pillar against `reference/copy-quality.md` and writes `.design/COPY-AUDIT.md`; when that supplement exists, fold its score and top finding into this pillar rather than re-deriving them. Keep the 1-4 scale below either way.
 **Audit method:**
@@ -311,6 +316,12 @@ grep -rEn "w-4 h-4|w-5 h-5|w-6 h-6" src/ --include="*.tsx" --include="*.jsx" 2>/
 ---
+### Pillar 8: Content Internationalization Integrity (reserved, unscored)
+**Status: reserved slot - do NOT score.** This is a named placeholder for a future eighth pillar covering localization integrity beyond what the Copy pillar's i18n lens already checks (ICU message correctness, plural and gender rules, locale-aware date/number/currency formatting, RTL mirroring completeness, multi-script font coverage). It is documented here so the slot has a stable name, but it carries no score, no audit method, and no entry in the Pillar Scores table. The audit total stays **/28 (7 pillars × 4)**. When a future phase activates this pillar, the scoring contract version increments and the total moves to /32; until then, treat this section as informational only. The internationalization checks that the current audit performs live inside Pillar 1 (Copy) per `reference/copy-quality.md`.
+---
 ## Domain checklist addendum (Tier-3)
 If DESIGN-CONTEXT.md carries a `<domain>` line (set by `design-context-builder` Step 0F - `finance` / `healthcare` / `gaming` / `civic`), **also** run that pack's `## Audit checklist` from `reference/domains/<domain>-patterns.md` and fold its findings into the relevant pillar:
@@ -347,6 +358,10 @@ For each of the 7 pillars:
 3. Assign a score (1–4) with specific evidence
 4. Identify the top gap for this pillar (one concrete, actionable finding)
+### Step 3.5: Pre-Report Gate + confidence
+Before writing any finding into the Priority Fix List or Detailed Findings, run the four-question Pre-Report Gate from `reference/reviewer-confidence-gate.md`: (a) can you cite `file:line`, (b) can you state the failure mode in one sentence, (c) did you read context beyond the matched line, (d) is the implied severity defensible? Stamp every priority-fix finding with a `confidence` value (`0.0-1.0`): `>= 0.8` when all four pass, `0.5-0.8` for partial evidence, `< 0.5` for an unconfirmed pattern match (common for the code-only Visual Hierarchy and Color pillars, where runtime cannot be seen). Move every `< 0.5` finding into a `## Tentative` section instead of the Priority Fix List, so a low-confidence guess never escalates to remediation. Confidence is independent of the 1-4 pillar scores and does not change them.
 ### Step 4: Write DESIGN-AUDIT.md
 Write `.design/DESIGN-AUDIT.md` using the output format below.
@@ -404,11 +419,19 @@ supplement_note: "Supplements 7-category 0-10 system in reference/audit-scoring.
 ## Priority Fix List
-Listed by impact. Top 3 fixes the verifier should weight heavily.
+Listed by impact. Top 3 fixes the verifier should weight heavily. Each finding carries a `confidence` value (see `reference/reviewer-confidence-gate.md`); findings below `0.5` go in `## Tentative`, not here.
+1. **[Pillar N: specific issue]** (confidence: [0.0-1.0]) [user impact] [concrete fix with file reference]
+2. **[Pillar N: specific issue]** (confidence: [0.0-1.0]) [user impact] [concrete fix with file reference]
+3. **[Pillar N: specific issue]** (confidence: [0.0-1.0]) [user impact] [concrete fix with file reference]
+---
+## Tentative
+Low-confidence findings (`confidence < 0.5`, per `reference/reviewer-confidence-gate.md`): pattern matches not confirmed by reading context, or runtime-only concerns the code-only pass cannot verify. Surfaced for human review; never auto-escalated to design-fixer.
-1. **[Pillar N — specific issue]** — [user impact] — [concrete fix with file reference]
-2. **[Pillar N — specific issue]** — [user impact] — [concrete fix with file reference]
-3. **[Pillar N — specific issue]** — [user impact] — [concrete fix with file reference]
+- [Pillar N: finding] (confidence: [N], unconfirmed because [reason])
 ---
@@ -459,7 +482,7 @@ This audit is **code-only**. No Playwright-MCP and no dev server screenshot capt
 ## Motion Anti-Pattern Check
-When the codebase uses Framer Motion (detectable by `import.*framer-motion` in source files), perform this additional check after the 6-pillar audit and include findings in **Pillar 6: Experience Design** under a `### Motion (Framer Motion)` subsection.
+When the codebase uses Framer Motion (detectable by `import.*framer-motion` in source files), perform this additional check after the 7-pillar audit and include findings in **Pillar 6: Experience Design** under a `### Motion (Framer Motion)` subsection.
 Read `reference/framer-motion-patterns.md` for the full rationale behind these rules. The two hard violations to surface:
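
Reviewer note on Step 3.5 in the diff above: the routing rule (findings with `confidence < 0.5` go to `## Tentative`, everything else stays eligible for the Priority Fix List) can be sketched as a simple partition. This is an illustration under stated assumptions, not the contents of `scripts/lib/confidence-route.cjs`, which this page lists but does not show here. The `Finding` type and `routeFindings` function are hypothetical names.

```typescript
// Hypothetical finding shape; only `confidence` is taken from the diff text.
interface Finding {
  pillar: number;
  issue: string;
  confidence: number; // 0.0-1.0, stamped after the Pre-Report Gate
}

interface RoutedFindings {
  priority: Finding[]; // candidates for the Priority Fix List
  tentative: Finding[]; // surfaced for human review only
}

function routeFindings(findings: Finding[]): RoutedFindings {
  const routed: RoutedFindings = { priority: [], tentative: [] };
  for (const finding of findings) {
    // Reject values outside the documented 0.0-1.0 range (also catches NaN).
    if (!(finding.confidence >= 0 && finding.confidence <= 1)) {
      throw new RangeError(`confidence out of range: ${finding.confidence}`);
    }
    // Strictly below 0.5 is tentative; exactly 0.5 remains a priority candidate.
    if (finding.confidence < 0.5) {
      routed.tentative.push(finding);
    } else {
      routed.priority.push(finding);
    }
  }
  return routed;
}

const routed = routeFindings([
  { pillar: 1, issue: "generic CTA label", confidence: 0.9 },
  { pillar: 2, issue: "possible hierarchy clash", confidence: 0.3 },
  { pillar: 3, issue: "contrast pattern match", confidence: 0.5 },
]);
```

The sketch does not rank by impact or cap the list at three entries; those remain the auditor's judgment, as the Priority Fix List text describes.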

package/agents/design-context-builder.md CHANGED

@@ -561,6 +561,8 @@ Iterate until the user confirms. Then write the artifact.
 ## Output: .design/DESIGN-CONTEXT.md
+Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
 Create `.design/` directory if needed. Write `.design/DESIGN-CONTEXT.md`:
 ```markdown