npm - @hegemonart/get-design-done - Versions diffs - 1.48.0 → 1.49.0 - Mend

@hegemonart/get-design-done 1.48.0 → 1.49.0

Files changed (23)

package/.claude-plugin/marketplace.json +2 -2
package/.claude-plugin/plugin.json +5 -2
package/CHANGELOG.md +44 -0
package/README.md +2 -0
package/agents/design-auditor.md +17 -4
package/agents/design-context-builder.md +2 -0
package/agents/design-debt-crawler.md +28 -5
package/agents/design-executor.md +2 -0
package/agents/design-fixer.md +4 -1
package/agents/design-planner.md +2 -0
package/agents/design-reflector.md +2 -0
package/agents/design-research-synthesizer.md +2 -0
package/agents/design-verifier.md +7 -15
package/hooks/gdd-design-quality-check.js +340 -0
package/hooks/hooks.json +9 -0
package/package.json +5 -2
package/reference/registry.json +14 -0
package/reference/reviewer-confidence-gate.md +108 -0
package/reference/visual-tells.md +237 -0
package/scripts/lib/confidence-route.cjs +60 -0
package/scripts/lib/worktree-resolve.cjs +221 -0
package/sdk/mcp/gdd-state/server.js +37 -4
package/sdk/mcp/gdd-state/tools/shared.ts +61 -0

package/.claude-plugin/marketplace.json CHANGED

@@ -5,14 +5,14 @@
   },
   "metadata": {
     "description": "Get Design Done — 5-stage agent-orchestrated design pipeline with 9 connections, handoff-first workflow, bidirectional Figma write-back, 22+ specialized agents, queryable knowledge layer (intel store, dependency analysis, learnings extraction), and a self-improvement loop (reflector, frontmatter + budget feedback, global-skills layer). v1.20.0 ships the SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream, and resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) for rate-limit + 429 + context-overflow recovery. Full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows) and release automation (auto-tag + GitHub Release + release-time smoke test).",
-    "version": "1.48.0"
+    "version": "1.49.0"
   },
   "plugins": [
     {
       "name": "get-design-done",
       "source": "./",
       "description": "Agent-orchestrated 5-stage design pipeline: Brief → Explore → Plan → Design → Verify. 22+ specialized agents, 9 connections (Figma, Refero, Preview, Storybook, Chromatic, Figma Writer, Graphify, Pinterest, Claude Design), Claude Design handoff, bidirectional Figma write-back, and a queryable intel store (.design/intel/) for dependency and learnings queries. Standalone commands: style, darkmode, compare, figma-write, graphify, handoff, analyze-dependencies, skill-manifest, extract-learnings. Embeds NNG heuristics, WCAG thresholds, typographic systems, motion framework, and anti-pattern catalog. Ships with a full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows) and release automation. Optimization layer (v1.0.4.1, retroactive): gdd-router + gdd-cache-manager skills, PreToolUse budget-enforcer hook, tier-aware agent frontmatter, lazy checker gates, streaming synthesizer, /gdd:warm-cache + /gdd:optimize commands, and cost telemetry at .design/telemetry/costs.jsonl — targeting 50-70% per-task token-cost reduction with no quality-floor regression. v1.20.0 SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream at .design/telemetry/events.jsonl, resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) with rate-limit + 429 + context-overflow recovery, and TypeScript toolchain.",
-      "version": "1.48.0",
+      "version": "1.49.0",
       "author": {
         "name": "hegemonart"
       },

package/.claude-plugin/plugin.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "get-design-done",
   "short_name": "gdd",
-  "version": "1.48.0",
+  "version": "1.49.0",
   "description": "Agent-orchestrated 5-stage design pipeline: Brief → Explore → Plan → Design → Verify. 22+ specialized agents, 9 connections (Figma, Refero, Preview, Storybook, Chromatic, Figma Writer, Graphify, Pinterest, Claude Design), handoff-first workflow via Claude Design bundles, bidirectional Figma write-back (annotations, Code Connect), queryable intel store (`.design/intel/`) for O(1) design surface lookups, and self-improvement loop (reflector agent, frontmatter + budget feedback, global-skills layer at `~/.claude/gdd/global-skills/`). Standalone commands: style, darkmode, compare, figma-write, graphify, handoff, analyze-dependencies, skill-manifest, extract-learnings, reflect, apply-reflections. Embeds NNG heuristics, WCAG thresholds, typographic systems, motion framework, and anti-pattern catalog. Ships with a full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows, lint + schema + frontmatter + stale-ref + shellcheck + gitleaks + injection-scan + blocking size-budget) and release automation (auto-tag + GitHub Release + release-time smoke test). Optimization layer (v1.0.4.1, retroactive): gdd-router + gdd-cache-manager skills, PreToolUse budget-enforcer hook, tier-aware agent frontmatter, lazy checker gates, streaming synthesizer, /gdd:warm-cache + /gdd:optimize commands, and cost telemetry at .design/telemetry/costs.jsonl — targeting 50-70% per-task token-cost reduction with no quality-floor regression. v1.20.0 SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream at .design/telemetry/events.jsonl, resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) with rate-limit + 429 + context-overflow recovery, and TypeScript toolchain. v1.27.7 ships gdd-mcp (Phase 27.7): 12 read-only MCP tools for sub-3s priming. v1.28.0 (Phase 28): Foundational References Tier 2 — 5 new reference files (color-theory, composition, proportion-systems, i18n, contrast-advanced), 2 verifier i18n probes + 1 explore i18n-readiness probe, 12 additive cross-link insertions across 10 existing references, 2 orthogonal audit-scoring lens-tags (composition_alignment + i18n_readiness).",
   "author": {
     "name": "hegemonart",
@@ -71,7 +71,10 @@
     "flutter",
     "email",
     "print",
-    "pdf"
+    "pdf",
+    "worktree-safe",
+    "anti-slop",
+    "confidence-gate"
   ],
   "skills": [
     "./skills/"

package/CHANGELOG.md CHANGED

@@ -4,6 +4,50 @@ All notable changes to get-design-done are documented here. Versions follow [sem
 ---
+## [1.49.0] - 2026-06-03
+### Phase 49 - Quick Anti-Slop Floor
+Three small, atomic safety and policy primitives identified in the cross-repo synthesis, each low-risk and
+high-signal: a worktree redirect that ends the recurring `.planning/` leak, a free anti-slop regex pass on every
+front-end file write, and a reviewer confidence gate that stops severity inflation. Planned and executed via the
+GSD pipeline (3 parallel executor subagents). No new runtime dependency, no new egress.
+### Breaking changes
+- **`.design/` and `.planning/` writes redirect to the main repo root inside a git worktree.** `scripts/lib/worktree-resolve.cjs`
+  detects a worktree (`git rev-parse --git-dir` vs `--git-common-dir`) and the gdd-state write path (`resolveStatePath`,
+  used by all 11 state tools) now resolves STATE there, with a one-line stderr notice. Outside a worktree, behavior is
+  unchanged. Tooling that assumed `.design/` always lived under `process.cwd()` should resolve through the helper.
+- **Findings now carry a `confidence` field and design-fixer filters on it.** design-auditor, design-verifier, and
+  design-debt-crawler emit `confidence: 0.0-1.0` per finding; design-fixer drops `## Tentative` findings and routes
+  BLOCKER/MAJOR findings below 0.8 confidence to user review instead of auto-fix. Consumers of these findings should
+  read the new field.
+### Added
+- **`scripts/lib/worktree-resolve.cjs`** (resolveRepoRoot / isWorktree / resolveDesignRoot / resolvePlanningRoot;
+  graceful fallback, injectable exec) wired into the state write path + a one-line worktree note in the 7
+  artifact-writer agents.
+- **`hooks/gdd-design-quality-check.js`**: an advisory PostToolUse hook scanning `Write`/`Edit`/`MultiEdit` to
+  `.tsx`/`.vue`/`.svelte`/`.astro` for 8 default-AI-aesthetic tells (gradient spam, generic CTAs, centered-everything,
+  font-inter default, purple/violet default, glassmorphism spam, isometric fallback, decorative motion). WARN-only,
+  emits a `design_quality_warn` event. Catalogued in **`reference/visual-tells.md`** (8 named categories with diagnostic
+  regex + remediation).
+- **Reviewer confidence gate**: a 4-question Pre-Report Gate + the `confidence` field across the three audit agents,
+  a `scripts/lib/confidence-route.cjs` routing helper (`fix` / `user-review` / `drop`), and
+  **`reference/reviewer-confidence-gate.md`** (template + rationale + 4 before/after examples).
+### Notes
+- 6-manifest lockstep at **v1.49.0** + `OFF_CADENCE_VERSIONS.add('1.49.0')` + 37 `manifests-version.txt` baselines +
+  plugin keywords (`worktree-safe`, `anti-slop`, `confidence-gate`). Baselines re-locked: hook-list (19),
+  resilience-primitives (39 `scripts/lib/*.cjs`), registry (173), tarball golden 902 -> 907 (+5).
+- WARN-only hook (never blocks); auto-fix of matched tells is out of scope (proposal-only); the verb-based anti-slop
+  rubric and a wider tell catalog are deferred to Phase 50.
+---
 ## [1.48.0] - 2026-06-03
 ### Phase 48 - Audit & Pillar Expansion
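Reviewer note on the 1.49.0 entry above: the advisory hook is described as a regex pass over front-end file writes that only warns. A minimal sketch of that idea follows. The pattern list, the `minHits` thresholds, and the names `SKETCH_TELLS` and `sketchScan` are hypothetical and chosen for illustration; the shipped patterns live in `reference/visual-tells.md` and `hooks/gdd-design-quality-check.js`, neither of which is reproduced in this excerpt. The event payload shape is also an assumption; only the `design_quality_warn` name comes from the changelog.

```javascript
// Hypothetical patterns for illustration only; not the shipped catalog.
const SKETCH_TELLS = [
  { id: 'gradient-spam', regex: /bg-gradient-to-|linear-gradient\(/g, minHits: 3 },
  { id: 'generic-cta', regex: />\s*(Get Started|Learn More)\s*</g, minHits: 1 },
  { id: 'font-inter-default', regex: /font-family:\s*['"]?Inter\b/g, minHits: 1 },
];

// File extensions named in the changelog entry.
const FRONT_END_EXT = /\.(tsx|vue|svelte|astro)$/;

// Returns a list of warning objects and never throws on a match,
// mirroring the WARN-only (non-blocking) behavior described above.
function sketchScan(filePath, content) {
  if (!FRONT_END_EXT.test(filePath)) return [];
  const warnings = [];
  for (const tell of SKETCH_TELLS) {
    // With the `g` flag, match() returns every hit, or null when there are none.
    const hits = (content.match(tell.regex) || []).length;
    if (hits >= tell.minHits) {
      warnings.push({ type: 'design_quality_warn', tell: tell.id, hits });
    }
  }
  return warnings;
}
```

A caller would pass the written file's path and new content, then log whatever comes back; an empty array means no tell matched or the file type is out of scope.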

package/README.md CHANGED

@@ -257,6 +257,8 @@ All 14 runtimes receive their native artifact layout (`skills/`, `command/`, `ag
 **Audit and pillar expansion (v1.48.0).** Four audit-side gaps close at once. The copy pillar gets a real rubric (`reference/copy-quality.md` + `copy-auditor`): microcopy, error and empty-state text, ARIA and alt text, voice alignment, with an i18n overflow lens. A project-wide `design-debt-crawler` walks an existing codebase (not just the current cycle), enumerates raw color literals, anti-patterns, untokenized components, and contrast/density issues, and writes a priority-scored `.design/debt/DEBT-CATALOG.md`. A `brief-auditor` grades the brief against five anti-patterns (vague verbs, missing audience, immeasurable success criteria, scope creep, missing anti-goals) and surfaces a non-blocking `/gdd:discuss brief` pointer. And the Stage 4.5 quality-gate gains an `a11y` failure class so `axe` / `pa11y` / `lighthouse` regressions route to `design-fixer` like any other gate failure. **No new runtime dependency.**
+**Quick anti-slop floor (v1.49.0).** Three small safety primitives. A worktree redirect (`scripts/lib/worktree-resolve.cjs`) sends `.design/` and `.planning/` writes to the main repo root when GDD runs inside a git worktree, so artifacts never leak into an ephemeral checkout. A design-quality PostToolUse hook (`gdd-design-quality-check.js`) runs a free regex pass on every `.tsx`/`.vue`/`.svelte`/`.astro` write and warns on eight default-AI-aesthetic tells (gradient spam, generic CTAs, centered-everything, font-inter defaults, purple/violet defaults, glassmorphism spam, isometric fallbacks, decorative motion), catalogued in `reference/visual-tells.md`. And a reviewer confidence gate adds a `confidence: 0.0-1.0` field plus a 4-question Pre-Report Gate to every audit finding: HIGH and CRITICAL findings need at least 0.8 confidence and cited proof, low-confidence findings stay tentative and never reach `design-fixer`. The hook is WARN-only and there is **no new runtime dependency**.
 Verify with:
 ```

package/agents/design-auditor.md CHANGED

@@ -47,6 +47,7 @@ Minimum expected files:
 - `.design/tasks/` - what was actually done (glob all task files)
 - **Domain-index navigation (Phase 45):** the 7 entry-points `reference/{typography,color,spatial,motion,interaction,responsive,ux-writing}.md` index every fragment below. For a pillar, load the relevant domain index first, then drill into the specific fragments it lists only as the pillar needs them - this is the cheap navigation layer over the detailed fragments.
 - `reference/audit-scoring.md` - existing 7-category scoring rubric (understand, do not duplicate)
+- `reference/reviewer-confidence-gate.md` - Pre-Report Gate, the `confidence` field, and the routing rule applied to every finding
 - `reference/brand-voice.md` - voice axes, archetype library, and tone-by-context table (use when auditing Pillar 1: Copy)
 - `reference/gestalt.md` - 8 Gestalt principles with scoring rubrics (use when auditing Pillar 2: Visual Hierarchy)
 - `reference/visual-hierarchy-layout.md` - Z-order, whitespace, grids, and reading-order patterns (use when auditing Pillar 2: Visual Hierarchy)
@@ -357,6 +358,10 @@ For each of the 7 pillars:
 3. Assign a score (1–4) with specific evidence
 4. Identify the top gap for this pillar (one concrete, actionable finding)
+### Step 3.5: Pre-Report Gate + confidence
+Before writing any finding into the Priority Fix List or Detailed Findings, run the four-question Pre-Report Gate from `reference/reviewer-confidence-gate.md`: (a) can you cite `file:line`, (b) can you state the failure mode in one sentence, (c) did you read context beyond the matched line, (d) is the implied severity defensible? Stamp every priority-fix finding with a `confidence` value (`0.0-1.0`): `>= 0.8` when all four pass, `0.5-0.8` for partial evidence, `< 0.5` for an unconfirmed pattern match (common for the code-only Visual Hierarchy and Color pillars, where runtime cannot be seen). Move every `< 0.5` finding into a `## Tentative` section instead of the Priority Fix List, so a low-confidence guess never escalates to remediation. Confidence is independent of the 1-4 pillar scores and does not change them.
 ### Step 4: Write DESIGN-AUDIT.md
 Write `.design/DESIGN-AUDIT.md` using the output format below.
@@ -414,11 +419,19 @@ supplement_note: "Supplements 7-category 0-10 system in reference/audit-scoring.
 ## Priority Fix List
-Listed by impact. Top 3 fixes the verifier should weight heavily.
+Listed by impact. Top 3 fixes the verifier should weight heavily. Each finding carries a `confidence` value (see `reference/reviewer-confidence-gate.md`); findings below `0.5` go in `## Tentative`, not here.
+1. **[Pillar N: specific issue]** (confidence: [0.0-1.0]) [user impact] [concrete fix with file reference]
+2. **[Pillar N: specific issue]** (confidence: [0.0-1.0]) [user impact] [concrete fix with file reference]
+3. **[Pillar N: specific issue]** (confidence: [0.0-1.0]) [user impact] [concrete fix with file reference]
+---
+## Tentative
+Low-confidence findings (`confidence < 0.5`, per `reference/reviewer-confidence-gate.md`): pattern matches not confirmed by reading context, or runtime-only concerns the code-only pass cannot verify. Surfaced for human review; never auto-escalated to design-fixer.
-1. **[Pillar N — specific issue]** — [user impact] — [concrete fix with file reference]
-2. **[Pillar N — specific issue]** — [user impact] — [concrete fix with file reference]
-3. **[Pillar N — specific issue]** — [user impact] — [concrete fix with file reference]
+- [Pillar N: finding] (confidence: [N], unconfirmed because [reason])
 ---
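Reviewer note: the auditor change above splits findings between the Priority Fix List and `## Tentative` at a confidence of `0.5`. A small sketch of that split, assuming findings are plain objects with a numeric `confidence` property. The function name `sketchPartition` is hypothetical, and sending a finding with no `confidence` to the tentative list is an assumption borrowed from the design-fixer rule, not something the auditor text states.

```javascript
// Splits findings at the 0.5 threshold named in Step 3.5.
// Assumption: a missing or non-numeric confidence counts as tentative.
function sketchPartition(findings) {
  const priority = [];
  const tentative = [];
  for (const finding of findings) {
    const confirmed =
      typeof finding.confidence === 'number' && finding.confidence >= 0.5;
    (confirmed ? priority : tentative).push(finding);
  }
  return { priority, tentative };
}
```

The pillar scores are untouched by this split, which matches the statement that confidence is independent of the 1-4 scores.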

package/agents/design-context-builder.md CHANGED

@@ -561,6 +561,8 @@ Iterate until the user confirms. Then write the artifact.
 ## Output: .design/DESIGN-CONTEXT.md
+Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
 Create `.design/` directory if needed. Write `.design/DESIGN-CONTEXT.md`:
 ```markdown

package/agents/design-debt-crawler.md CHANGED

@@ -60,6 +60,7 @@ listed file before acting. Minimum expected files:
 - @reference/debt-categories.md
 - @reference/anti-patterns.md
+- @reference/reviewer-confidence-gate.md
 `reference/debt-categories.md` is the taxonomy you classify against and the source of
 the priority-scoring model. `reference/anti-patterns.md` is the BAN-NN and SLOP-NN
@@ -157,6 +158,19 @@ grep -rEn "No data|No results|Nothing here|went wrong|error occurred" src/ \
 Flag meaningful images without `alt`, icon-only controls without an accessible name,
 placeholder used as the only label, and generic empty or error copy.
+### Step 2.5: Pre-Report Gate + confidence
+Before cataloging any finding, run the four-question Pre-Report Gate from
+`reference/reviewer-confidence-gate.md`: (a) can you cite `file:line`, (b) can you state the
+failure mode in one sentence, (c) did you read context beyond the matched line (the token
+definition, the call site), and (d) is the class assignment defensible? Stamp every catalog
+row with a `confidence` value (`0.0-1.0`): `>= 0.8` when all four pass, `0.5-0.8` when evidence
+is partial, `< 0.5` for a pattern match you could not confirm (for example an unresolved
+contrast pair or a literal that may be inside a token definition). Move every `< 0.5` finding
+into a `## Tentative` section instead of the ranked findings table, so a low-confidence guess
+never escalates to remediation. Confidence is independent of priority: a high-priority debt
+item can still be low confidence and belongs in `## Tentative` until confirmed.
 ### Step 3: Group and score
 Group findings by the seven debt classes. For each finding, assign the three priority
@@ -217,12 +231,21 @@ note: "Project-scoped retroactive debt catalog. Does NOT read STATE.md completed
 ## Findings (ranked by priority)
-| Priority | Class | Location | Finding | V × E × P | Suggested command |
-|----------|-------|----------|---------|-----------|-------------------|
-| 18 | color-literal | src/Card.tsx:42 | Raw #1a73e8 instead of token | 3×3×2 | `/gdd:fast "replace #1a73e8 with semantic token in Card.tsx"` |
-| 12 | anti-pattern | src/Hero.tsx:8 | BAN-02 gradient text on heading | 3×2×2 | `/gdd:fast "remove BAN-02 gradient text in Hero.tsx"` |
+| Priority | Class | Location | Finding | V × E × P | Confidence | Suggested command |
+|----------|-------|----------|---------|-----------|------------|-------------------|
+| 18 | color-literal | src/Card.tsx:42 | Raw #1a73e8 instead of token | 3×3×2 | 0.9 | `/gdd:fast "replace #1a73e8 with semantic token in Card.tsx"` |
+| 12 | anti-pattern | src/Hero.tsx:8 | BAN-02 gradient text on heading | 3×2×2 | 0.85 | `/gdd:fast "remove BAN-02 gradient text in Hero.tsx"` |
+(One row per finding with `confidence >= 0.5`. The Suggested command column always carries a `/gdd:fast "<finding>"` string. Findings below `0.5` go in `## Tentative` below, not in this table.)
+---
+## Tentative
+Findings with `confidence < 0.5` (pattern matches not confirmed by reading context, per
+`reference/reviewer-confidence-gate.md`). Listed for human review; never auto-escalated.
-(One row per finding. The Suggested command column always carries a `/gdd:fast "<finding>"` string.)
+- [class] [location]: [finding] (confidence: [N], unconfirmed because [reason])
 ---

package/agents/design-executor.md CHANGED

@@ -395,6 +395,8 @@ Apply these rules automatically during execution. Track all deviations in the ta
 ## Task Output - .design/tasks/task-NN.md
+Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
 After completing the task's implementation work, write `.design/tasks/task-NN.md` (where NN = task_id from prompt context). Create `.design/tasks/` directory first if it does not exist.
 Format (locked - do not alter structure):

package/agents/design-fixer.md CHANGED

@@ -48,6 +48,8 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
 **Invariant:** read all listed files FIRST, before making any changes.
+**Worktree-root invariant:** before writing any `.design/` artifact (for example a `<blocker>` entry to `.design/STATE.md`), resolve the main repo root via `scripts/lib/worktree-resolve.cjs` so a worktree run writes to the canonical `.design/` and does not leak artifacts into the worktree checkout.
 ---
 ## Prompt Context Fields
@@ -88,7 +90,8 @@ Parse every entry in that section. The `G-NN` identifier, severity classificatio
 4. Filter by severity based on `auto_mode`:
    - Always include: `BLOCKER`, `MAJOR`
    - Include only if `auto_mode=true`: `MINOR`, `COSMETIC`
-5. Build an ordered list: BLOCKER first, then MAJOR, then (if included) MINOR, COSMETIC.
+5. **Confidence routing filter (Phase 49, see `reference/reviewer-confidence-gate.md`).** Drop any gap that sits under a `## Tentative` heading: those never reach you. Then drop any `BLOCKER` or `MAJOR` gap whose `confidence` field is below `0.8` and route it to user review instead of auto-fix, since a high-severity gap without strong evidence is exactly the inflated-severity case the gate exists to catch. A gap missing its `confidence` field is treated as below the floor. The shared decision lives in `scripts/lib/confidence-route.cjs` (`route({ severity, confidence, tentative })` returns `'fix' | 'user-review' | 'drop'`); fix only the gaps it routes to `'fix'`.
+6. Build an ordered list: BLOCKER first, then MAJOR, then (if included) MINOR, COSMETIC.
 If no in-scope gaps are found (e.g., verifier found only MINOR gaps and `auto_mode=false`), emit `## FIX COMPLETE` immediately with "No in-scope gaps to fix."
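Reviewer note: the routing rule in the new step 5 can be read as a three-way decision. The sketch below is one interpretation of that prose, not the contents of `scripts/lib/confidence-route.cjs`, which this excerpt does not show. The name `sketchRoute` is hypothetical. Treating `MINOR` and `COSMETIC` gaps as `'fix'` regardless of confidence is an assumption, since the text only sets a floor for `BLOCKER` and `MAJOR`.

```javascript
const HIGH_SEVERITY = new Set(['BLOCKER', 'MAJOR']);
const CONFIDENCE_FLOOR = 0.8; // threshold stated in step 5

// Returns 'fix' | 'user-review' | 'drop' for a single gap.
function sketchRoute({ severity, confidence, tentative }) {
  // Gaps under a "## Tentative" heading never reach the fixer.
  if (tentative) return 'drop';
  if (HIGH_SEVERITY.has(severity)) {
    // A missing confidence field is treated as below the floor.
    const belowFloor =
      typeof confidence !== 'number' || confidence < CONFIDENCE_FLOOR;
    if (belowFloor) return 'user-review';
  }
  // Assumption: lower severities are not gated on confidence.
  return 'fix';
}
```

Under this reading, a `BLOCKER` at `0.9` is fixed, a `MAJOR` at `0.6` goes to user review, and anything tentative is dropped before the ordered list in step 6 is built.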

package/agents/design-planner.md CHANGED

@@ -227,6 +227,8 @@ Before finalizing task list:
 ## Output Format
+Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
 Write `.design/DESIGN-PLAN.md` with this exact structure:
 ```markdown

package/agents/design-reflector.md CHANGED

@@ -62,6 +62,8 @@ Minimum expected inputs (skip gracefully if absent, note what's missing):
 ## Output
+Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
 Write `.design/reflections/<cycle-slug>.md`. If `--dry-run` is set in the spawning prompt, print proposals to stdout only - do not write the file.
 If the capability-gap pattern scan emitted any events during this run, include a `## Capability gaps emitted` heading listing each `event_id` with the source signal kind (`intel` | `posterior` | `trajectory`) and the `suggested_kind` (`agent` | `skill`) per event. Plan 29-03 reads these events from `.design/gep/events.jsonl` to cluster recurring `capability_gap` events for `/gdd:apply-reflections`.
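Reviewer note: the worktree line added to the artifact-writer agents relies on the check the changelog describes, comparing `git rev-parse --git-dir` with `--git-common-dir`. A minimal sketch of that comparison follows, assuming a standard non-bare repository where the common directory is `<main>/.git`. The names `gitRevParse` and `sketchResolveRepoRoot` are hypothetical; the real helper is `scripts/lib/worktree-resolve.cjs`, whose source is not part of this excerpt. The injectable `exec` parameter echoes the "injectable exec" wording in the changelog.

```javascript
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// Default runner: asks git one rev-parse question from `cwd`.
function gitRevParse(flag, cwd) {
  return execFileSync('git', ['rev-parse', flag], { cwd, encoding: 'utf8' }).trim();
}

// Returns the main checkout root when `cwd` is inside a linked worktree,
// otherwise returns `cwd` unchanged.
function sketchResolveRepoRoot(cwd, exec = gitRevParse) {
  try {
    // git may print relative paths, so resolve both against cwd.
    const gitDir = path.resolve(cwd, exec('--git-dir', cwd));
    const commonDir = path.resolve(cwd, exec('--git-common-dir', cwd));
    if (gitDir === commonDir) return cwd; // not a linked worktree
    // Assumption: the common dir is `<main>/.git`, so its parent is the root.
    return path.dirname(commonDir);
  } catch {
    return cwd; // graceful fallback when git is missing or cwd is not a repo
  }
}
```

An agent following the note would then join the returned root with `.design` before writing, so artifacts land in the main checkout rather than the worktree.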

package/agents/design-research-synthesizer.md CHANGED

@@ -161,6 +161,8 @@ Read .design/STATE.md
 ## Output
+Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
 Single file: `.design/DESIGN-CONTEXT.md`.
 ## Record

package/agents/design-verifier.md CHANGED

@@ -33,6 +33,7 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
 - `.design/DESIGN-CONTEXT.md` - goals, must-haves, brand direction, references
 - `.design/tasks/` - what was actually done (glob all task files)
 - `reference/audit-scoring.md` - scoring rubric for category weights
+- `reference/reviewer-confidence-gate.md` - Pre-Report Gate, the `confidence` field, and the gap routing rule
 - `reference/heuristics.md` - NNG heuristics H-01..H-10 scoring guide
 - `reference/review-format.md` - visual UAT presentation format
 - `reference/accessibility.md` - WCAG checklist for accessibility scoring
@@ -40,6 +41,8 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
 - `connections/chromatic.md` - Chromatic CLI connection spec (probe, baseline management, fallback)
 - `connections/storybook.md` - Storybook HTTP probe and a11y integration details
+**Worktree-root invariant:** before writing `.design/DESIGN-VERIFICATION.md` (or any `.design/` artifact), resolve the main repo root via `scripts/lib/worktree-resolve.cjs` so a worktree run writes to the canonical `.design/` and does not leak artifacts into the worktree checkout.
 ## Prompt Context Fields
 The stage embeds these fields in its prompt:
@@ -440,6 +443,8 @@ Classify each gap:
 - `MINOR` - noticeable issue; fix if time allows
 - `COSMETIC` - polish only; defer to later
+**Pre-Report Gate (Phase 49, see `reference/reviewer-confidence-gate.md`).** Before emitting each gap, answer the four questions: (a) can you cite `file:line`, (b) can you state the failure mode in one sentence, (c) did you read context beyond the modified file, (d) is the severity defensible? Stamp every gap with a `confidence` field (`0.0-1.0`): `>= 0.8` when all four pass, `0.5-0.8` when evidence is partial, `< 0.5` for an unconfirmed hunch. A BLOCKER or MAJOR requires `confidence >= 0.8` plus a `file:line` citation plus a one-sentence failure mode; below that, lower the severity or move it to `## Tentative`. Confidence is independent of severity. Move every `< 0.5` gap into a `## Tentative` section so it is surfaced but never reaches `design-fixer`.
 For each gap, emit an entry in the locked gap format:
 ```
@@ -452,6 +457,7 @@ For each gap, emit an entry in the locked gap format:
 - Actual: [what is true]
 - Location: [file:line or UI element]
 - Suggested fix: [one-line hint]
+- confidence: [0.0-1.0]
 ```
 Order gaps: BLOCKER first, then MAJOR, MINOR, COSMETIC. Number sequentially (G-01, G-02, ...).
@@ -464,21 +470,7 @@ If zero gaps found: skip this section entirely - do NOT emit `## GAPS FOUND`.
 **Skip if `chromatic` is `not_configured` or `unavailable` in STATE.md `<connections>`.**
-If `.design/chromatic-results.json` exists:
-1. Read .design/chromatic-results.json
-2. Check if this is a first run (all entries have status: "new"):
-   → First run: emit "Baseline established - no regressions detected (first run creates baseline)."
-3. For subsequent runs, narrate changes:
-   For each story entry in results:
-     - status "unchanged" → PASS <StoryTitle>:<StoryName>
-     - status "changed" → CHANGED <StoryTitle>:<StoryName> (visual change detected - review on chromatic.com)
-     - status "new" → NEW <StoryTitle>:<StoryName> (first snapshot - not a regression)
-     - status "error" → ERROR <StoryTitle>:<StoryName> - investigate
-4. Emit summary: "Total: N stories. X unchanged. Y changed. Z new. W errors."
-5. If Y > 0 (changed stories): flag as "VISUAL REGRESSION CANDIDATES - review required on chromatic.com before merging"
-6. Append narration to DESIGN-VERIFICATION.md ## Visual Regression section (create section if absent)
-If .design/chromatic-results.json does not exist: skip; emit no note.
+If `.design/chromatic-results.json` exists, read it and narrate. First run (all entries `status: "new"`): emit "Baseline established - no regressions detected (first run creates baseline)." Subsequent runs, per story entry: `unchanged` → PASS, `changed` → CHANGED (review on chromatic.com), `new` → NEW (first snapshot, not a regression), `error` → ERROR (investigate). Emit summary "Total: N stories. X unchanged. Y changed. Z new. W errors." If any changed (Y > 0), flag "VISUAL REGRESSION CANDIDATES - review required on chromatic.com before merging". Append the narration to the DESIGN-VERIFICATION.md `## Visual Regression` section (create it if absent). If the file does not exist: skip; emit no note.
 ---
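Reviewer note: the condensed Chromatic paragraph above keeps the same narration rules as the removed numbered list. The sketch below walks through them under one assumption that the diff does not confirm: that `.design/chromatic-results.json` parses to an array of entries shaped like `{ title, name, status }`. The function name `sketchNarrate` is hypothetical, and skipping entries with an unrecognized status is a choice made for this sketch.

```javascript
const LABELS = { unchanged: 'PASS', changed: 'CHANGED', new: 'NEW', error: 'ERROR' };

// Turns result entries into the narration lines described above.
function sketchNarrate(entries) {
  // First run: every entry is a new snapshot, so only the baseline note is emitted.
  if (entries.length > 0 && entries.every((e) => e.status === 'new')) {
    return ['Baseline established - no regressions detected (first run creates baseline).'];
  }
  const counts = { unchanged: 0, changed: 0, new: 0, error: 0 };
  const lines = [];
  for (const e of entries) {
    if (!(e.status in LABELS)) continue; // unknown status: not narrated in this sketch
    counts[e.status] += 1;
    lines.push(`${LABELS[e.status]} ${e.title}:${e.name}`);
  }
  lines.push(
    `Total: ${entries.length} stories. ${counts.unchanged} unchanged. ` +
      `${counts.changed} changed. ${counts.new} new. ${counts.error} errors.`
  );
  if (counts.changed > 0) {
    lines.push('VISUAL REGRESSION CANDIDATES - review required on chromatic.com before merging');
  }
  return lines;
}
```

The returned lines would be appended to the `## Visual Regression` section of DESIGN-VERIFICATION.md; when the results file is absent the function is simply never called.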