@hegemonart/get-design-done 1.48.0 → 1.49.0

@@ -5,14 +5,14 @@
5
5
  },
6
6
  "metadata": {
7
7
  "description": "Get Design Done — 5-stage agent-orchestrated design pipeline with 9 connections, handoff-first workflow, bidirectional Figma write-back, 22+ specialized agents, queryable knowledge layer (intel store, dependency analysis, learnings extraction), and a self-improvement loop (reflector, frontmatter + budget feedback, global-skills layer). v1.20.0 ships the SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream, and resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) for rate-limit + 429 + context-overflow recovery. Full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows) and release automation (auto-tag + GitHub Release + release-time smoke test).",
8
- "version": "1.48.0"
8
+ "version": "1.49.0"
9
9
  },
10
10
  "plugins": [
11
11
  {
12
12
  "name": "get-design-done",
13
13
  "source": "./",
14
14
  "description": "Agent-orchestrated 5-stage design pipeline: Brief → Explore → Plan → Design → Verify. 22+ specialized agents, 9 connections (Figma, Refero, Preview, Storybook, Chromatic, Figma Writer, Graphify, Pinterest, Claude Design), Claude Design handoff, bidirectional Figma write-back, and a queryable intel store (.design/intel/) for dependency and learnings queries. Standalone commands: style, darkmode, compare, figma-write, graphify, handoff, analyze-dependencies, skill-manifest, extract-learnings. Embeds NNG heuristics, WCAG thresholds, typographic systems, motion framework, and anti-pattern catalog. Ships with a full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows) and release automation. Optimization layer (v1.0.4.1, retroactive): gdd-router + gdd-cache-manager skills, PreToolUse budget-enforcer hook, tier-aware agent frontmatter, lazy checker gates, streaming synthesizer, /gdd:warm-cache + /gdd:optimize commands, and cost telemetry at .design/telemetry/costs.jsonl — targeting 50-70% per-task token-cost reduction with no quality-floor regression. v1.20.0 SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream at .design/telemetry/events.jsonl, resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) with rate-limit + 429 + context-overflow recovery, and TypeScript toolchain.",
15
- "version": "1.48.0",
15
+ "version": "1.49.0",
16
16
  "author": {
17
17
  "name": "hegemonart"
18
18
  },
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "get-design-done",
3
3
  "short_name": "gdd",
4
- "version": "1.48.0",
4
+ "version": "1.49.0",
5
5
  "description": "Agent-orchestrated 5-stage design pipeline: Brief → Explore → Plan → Design → Verify. 22+ specialized agents, 9 connections (Figma, Refero, Preview, Storybook, Chromatic, Figma Writer, Graphify, Pinterest, Claude Design), handoff-first workflow via Claude Design bundles, bidirectional Figma write-back (annotations, Code Connect), queryable intel store (`.design/intel/`) for O(1) design surface lookups, and self-improvement loop (reflector agent, frontmatter + budget feedback, global-skills layer at `~/.claude/gdd/global-skills/`). Standalone commands: style, darkmode, compare, figma-write, graphify, handoff, analyze-dependencies, skill-manifest, extract-learnings, reflect, apply-reflections. Embeds NNG heuristics, WCAG thresholds, typographic systems, motion framework, and anti-pattern catalog. Ships with a full CI/CD pipeline (Node 22/24 × Linux/macOS/Windows, lint + schema + frontmatter + stale-ref + shellcheck + gitleaks + injection-scan + blocking size-budget) and release automation (auto-tag + GitHub Release + release-time smoke test). Optimization layer (v1.0.4.1, retroactive): gdd-router + gdd-cache-manager skills, PreToolUse budget-enforcer hook, tier-aware agent frontmatter, lazy checker gates, streaming synthesizer, /gdd:warm-cache + /gdd:optimize commands, and cost telemetry at .design/telemetry/costs.jsonl — targeting 50-70% per-task token-cost reduction with no quality-floor regression. v1.20.0 SDK foundation: gdd-state MCP server (11 typed tools), lockfile-safe STATE.md mutations, event stream at .design/telemetry/events.jsonl, resilience primitives (jittered-backoff, rate-guard, error-classifier, iteration-budget) with rate-limit + 429 + context-overflow recovery, and TypeScript toolchain. v1.27.7 ships gdd-mcp (Phase 27.7): 12 read-only MCP tools for sub-3s priming. 
v1.28.0 (Phase 28): Foundational References Tier 2 — 5 new reference files (color-theory, composition, proportion-systems, i18n, contrast-advanced), 2 verifier i18n probes + 1 explore i18n-readiness probe, 12 additive cross-link insertions across 10 existing references, 2 orthogonal audit-scoring lens-tags (composition_alignment + i18n_readiness).",
6
6
  "author": {
7
7
  "name": "hegemonart",
@@ -71,7 +71,10 @@
71
71
  "flutter",
72
72
  "email",
73
73
  "print",
74
- "pdf"
74
+ "pdf",
75
+ "worktree-safe",
76
+ "anti-slop",
77
+ "confidence-gate"
75
78
  ],
76
79
  "skills": [
77
80
  "./skills/"
package/CHANGELOG.md CHANGED
@@ -4,6 +4,50 @@ All notable changes to get-design-done are documented here. Versions follow [sem
4
4
 
5
5
  ---
6
6
 
7
+ ## [1.49.0] - 2026-06-03
8
+
9
+ ### Phase 49 - Quick Anti-Slop Floor
10
+
11
+ Three small, atomic safety and policy primitives identified in the cross-repo synthesis, each low-risk and
12
+ high-signal: a worktree redirect that ends the recurring `.planning/` leak, a free anti-slop regex pass on every
13
+ front-end file write, and a reviewer confidence gate that stops severity inflation. Planned and executed via the
14
+ GSD pipeline (3 parallel executor subagents). No new runtime dependency, no new egress.
15
+
16
+ ### Breaking changes
17
+
18
+ - **`.design/` and `.planning/` writes redirect to the main repo root inside a git worktree.** `scripts/lib/worktree-resolve.cjs`
19
+ detects a worktree (`git rev-parse --git-dir` vs `--git-common-dir`) and the gdd-state write path (`resolveStatePath`,
20
+ used by all 11 state tools) now resolves STATE there, with a one-line stderr notice. Outside a worktree, behavior is
21
+ unchanged. Tooling that assumed `.design/` always lived under `process.cwd()` should resolve through the helper.
22
+ - **Findings now carry a `confidence` field and design-fixer filters on it.** design-auditor, design-verifier, and
23
+ design-debt-crawler emit `confidence: 0.0-1.0` per finding; design-fixer drops `## Tentative` findings and routes
24
+ BLOCKER/MAJOR findings below 0.8 confidence to user review instead of auto-fix. Consumers of these findings should
25
+ read the new field.
26
+
27
+ ### Added
28
+
29
+ - **`scripts/lib/worktree-resolve.cjs`** (resolveRepoRoot / isWorktree / resolveDesignRoot / resolvePlanningRoot;
30
+ graceful fallback, injectable exec) wired into the state write path + a one-line worktree note in the 7
31
+ artifact-writer agents.
32
+ - **`hooks/gdd-design-quality-check.js`**: an advisory PostToolUse hook scanning `Write`/`Edit`/`MultiEdit` to
33
+ `.tsx`/`.vue`/`.svelte`/`.astro` for 8 default-AI-aesthetic tells (gradient spam, generic CTAs, centered-everything,
34
+ font-inter default, purple/violet default, glassmorphism spam, isometric fallback, decorative motion). WARN-only,
35
+ emits a `design_quality_warn` event. Catalogued in **`reference/visual-tells.md`** (8 named categories with diagnostic
36
+ regex + remediation).
37
+ - **Reviewer confidence gate**: a 4-question Pre-Report Gate + the `confidence` field across the three audit agents,
38
+ a `scripts/lib/confidence-route.cjs` routing helper (`fix` / `user-review` / `drop`), and
39
+ **`reference/reviewer-confidence-gate.md`** (template + rationale + 4 before/after examples).
40
+
41
+ ### Notes
42
+
43
+ - 6-manifest lockstep at **v1.49.0** + `OFF_CADENCE_VERSIONS.add('1.49.0')` + 37 `manifests-version.txt` baselines +
44
+ plugin keywords (`worktree-safe`, `anti-slop`, `confidence-gate`). Baselines re-locked: hook-list (19),
45
+ resilience-primitives (39 `scripts/lib/*.cjs`), registry (173), tarball golden 902 -> 907 (+5).
46
+ - WARN-only hook (never blocks); auto-fix of matched tells is out of scope (proposal-only); the verb-based anti-slop
47
+ rubric and a wider tell catalog are deferred to Phase 50.
48
+
49
+ ---
50
+
7
51
  ## [1.48.0] - 2026-06-03
8
52
 
9
53
  ### Phase 48 - Audit & Pillar Expansion
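The 1.49.0 entry describes the design-quality hook as a WARN-only regex pass over front-end file writes. Below is a minimal sketch of that idea, not the shipped `hooks/gdd-design-quality-check.js`. The two patterns, the `minHits` thresholds, and the `scanForTells` name are illustrative assumptions; the real catalog of eight categories is in `reference/visual-tells.md`.

```javascript
// Hypothetical sketch of a WARN-only anti-slop pass. The patterns and the
// minHits thresholds are assumptions made for illustration; they are not the
// regexes catalogued in reference/visual-tells.md.
const FRONT_END_EXTENSIONS = ['.tsx', '.vue', '.svelte', '.astro'];

const TELLS = [
  { id: 'gradient-spam', pattern: /bg-gradient-to-|linear-gradient\(/g, minHits: 3 },
  { id: 'generic-cta', pattern: /\b(Get Started|Learn More)\b/g, minHits: 1 },
];

// Returns a list of warning records. It never throws and never blocks a write.
function scanForTells(filePath, content) {
  if (!FRONT_END_EXTENSIONS.some((ext) => filePath.endsWith(ext))) return [];
  const warnings = [];
  for (const tell of TELLS) {
    const hits = (content.match(tell.pattern) || []).length;
    if (hits >= tell.minHits) {
      warnings.push({ event: 'design_quality_warn', tell: tell.id, hits, file: filePath });
    }
  }
  return warnings;
}
```

Returning records instead of exiting non-zero keeps the pass advisory, which matches the "never blocks" note in the entry.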
package/README.md CHANGED
@@ -257,6 +257,8 @@ All 14 runtimes receive their native artifact layout (`skills/`, `command/`, `ag
257
257
 
258
258
  **Audit and pillar expansion (v1.48.0).** Four audit-side gaps close at once. The copy pillar gets a real rubric (`reference/copy-quality.md` + `copy-auditor`): microcopy, error and empty-state text, ARIA and alt text, voice alignment, with an i18n overflow lens. A project-wide `design-debt-crawler` walks an existing codebase (not just the current cycle), enumerates raw color literals, anti-patterns, untokenized components, and contrast/density issues, and writes a priority-scored `.design/debt/DEBT-CATALOG.md`. A `brief-auditor` grades the brief against five anti-patterns (vague verbs, missing audience, immeasurable success criteria, scope creep, missing anti-goals) and surfaces a non-blocking `/gdd:discuss brief` pointer. And the Stage 4.5 quality-gate gains an `a11y` failure class so `axe` / `pa11y` / `lighthouse` regressions route to `design-fixer` like any other gate failure. **No new runtime dependency.**
259
259
 
260
+ **Quick anti-slop floor (v1.49.0).** Three small safety primitives. A worktree redirect (`scripts/lib/worktree-resolve.cjs`) sends `.design/` and `.planning/` writes to the main repo root when GDD runs inside a git worktree, so artifacts never leak into an ephemeral checkout. A design-quality PostToolUse hook (`gdd-design-quality-check.js`) runs a free regex pass on every `.tsx`/`.vue`/`.svelte`/`.astro` write and warns on eight default-AI-aesthetic tells (gradient spam, generic CTAs, centered-everything, font-inter defaults, purple/violet defaults, glassmorphism spam, isometric fallbacks, decorative motion), catalogued in `reference/visual-tells.md`. And a reviewer confidence gate adds a `confidence: 0.0-1.0` field plus a 4-question Pre-Report Gate to every audit finding: HIGH and CRITICAL findings need at least 0.8 confidence and cited proof, low-confidence findings stay tentative and never reach `design-fixer`. The hook is WARN-only and there is **no new runtime dependency**.
261
+
260
262
  Verify with:
261
263
 
262
264
  ```
@@ -47,6 +47,7 @@ Minimum expected files:
47
47
  - `.design/tasks/` - what was actually done (glob all task files)
48
48
  - **Domain-index navigation (Phase 45):** the 7 entry-points `reference/{typography,color,spatial,motion,interaction,responsive,ux-writing}.md` index every fragment below. For a pillar, load the relevant domain index first, then drill into the specific fragments it lists only as the pillar needs them - this is the cheap navigation layer over the detailed fragments.
49
49
  - `reference/audit-scoring.md` - existing 7-category scoring rubric (understand, do not duplicate)
50
+ - `reference/reviewer-confidence-gate.md` - Pre-Report Gate, the `confidence` field, and the routing rule applied to every finding
50
51
  - `reference/brand-voice.md` - voice axes, archetype library, and tone-by-context table (use when auditing Pillar 1: Copy)
51
52
  - `reference/gestalt.md` - 8 Gestalt principles with scoring rubrics (use when auditing Pillar 2: Visual Hierarchy)
52
53
  - `reference/visual-hierarchy-layout.md` - Z-order, whitespace, grids, and reading-order patterns (use when auditing Pillar 2: Visual Hierarchy)
@@ -357,6 +358,10 @@ For each of the 7 pillars:
357
358
  3. Assign a score (1–4) with specific evidence
358
359
  4. Identify the top gap for this pillar (one concrete, actionable finding)
359
360
 
361
+ ### Step 3.5: Pre-Report Gate + confidence
362
+
363
+ Before writing any finding into the Priority Fix List or Detailed Findings, run the four-question Pre-Report Gate from `reference/reviewer-confidence-gate.md`: (a) can you cite `file:line`, (b) can you state the failure mode in one sentence, (c) did you read context beyond the matched line, (d) is the implied severity defensible? Stamp every priority-fix finding with a `confidence` value (`0.0-1.0`): `>= 0.8` when all four pass, `0.5-0.8` for partial evidence, `< 0.5` for an unconfirmed pattern match (common for the code-only Visual Hierarchy and Color pillars, where runtime cannot be seen). Move every `< 0.5` finding into a `## Tentative` section instead of the Priority Fix List, so a low-confidence guess never escalates to remediation. Confidence is independent of the 1-4 pillar scores and does not change them.
364
+
360
365
  ### Step 4: Write DESIGN-AUDIT.md
361
366
 
362
367
  Write `.design/DESIGN-AUDIT.md` using the output format below.
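The thresholds in Step 3.5 can be read as a small banding function. This is a sketch under stated assumptions: the band names, `confidenceBand`, and `partitionFindings` are hypothetical, and treating a missing `confidence` as tentative is an assumption that the auditor text does not state.

```javascript
// Thresholds (0.8 and 0.5) come from the Step 3.5 text. Function names, band
// names, and the handling of a missing value are assumptions of this sketch.
function confidenceBand(confidence) {
  if (typeof confidence !== 'number' || Number.isNaN(confidence)) return 'tentative';
  if (confidence >= 0.8) return 'confirmed'; // all four gate questions pass
  if (confidence >= 0.5) return 'partial';   // partial evidence
  return 'tentative';                        // unconfirmed pattern match
}

// Splits findings into the Priority Fix List and the ## Tentative section.
function partitionFindings(findings) {
  const priority = [];
  const tentative = [];
  for (const finding of findings) {
    const target = confidenceBand(finding.confidence) === 'tentative' ? tentative : priority;
    target.push(finding);
  }
  return { priority, tentative };
}
```

A value of exactly `0.5` lands in the partial band here, consistent with the rule that only findings below `0.5` move to `## Tentative`.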
@@ -414,11 +419,19 @@ supplement_note: "Supplements 7-category 0-10 system in reference/audit-scoring.
414
419
 
415
420
  ## Priority Fix List
416
421
 
417
- Listed by impact. Top 3 fixes the verifier should weight heavily.
422
+ Listed by impact. Top 3 fixes the verifier should weight heavily. Each finding carries a `confidence` value (see `reference/reviewer-confidence-gate.md`); findings below `0.5` go in `## Tentative`, not here.
423
+
424
+ 1. **[Pillar N: specific issue]** (confidence: [0.0-1.0]) [user impact] [concrete fix with file reference]
425
+ 2. **[Pillar N: specific issue]** (confidence: [0.0-1.0]) [user impact] [concrete fix with file reference]
426
+ 3. **[Pillar N: specific issue]** (confidence: [0.0-1.0]) [user impact] [concrete fix with file reference]
427
+
428
+ ---
429
+
430
+ ## Tentative
431
+
432
+ Low-confidence findings (`confidence < 0.5`, per `reference/reviewer-confidence-gate.md`): pattern matches not confirmed by reading context, or runtime-only concerns the code-only pass cannot verify. Surfaced for human review; never auto-escalated to design-fixer.
418
433
 
419
- 1. **[Pillar N — specific issue]** [user impact] [concrete fix with file reference]
420
- 2. **[Pillar N — specific issue]** — [user impact] — [concrete fix with file reference]
421
- 3. **[Pillar N — specific issue]** — [user impact] — [concrete fix with file reference]
434
+ - [Pillar N: finding] (confidence: [N], unconfirmed because [reason])
422
435
 
423
436
  ---
424
437
 
@@ -561,6 +561,8 @@ Iterate until the user confirms. Then write the artifact.
561
561
 
562
562
  ## Output: .design/DESIGN-CONTEXT.md
563
563
 
564
+ Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
565
+
564
566
  Create `.design/` directory if needed. Write `.design/DESIGN-CONTEXT.md`:
565
567
 
566
568
  ```markdown
@@ -60,6 +60,7 @@ listed file before acting. Minimum expected files:
60
60
 
61
61
  - @reference/debt-categories.md
62
62
  - @reference/anti-patterns.md
63
+ - @reference/reviewer-confidence-gate.md
63
64
 
64
65
  `reference/debt-categories.md` is the taxonomy you classify against and the source of
65
66
  the priority-scoring model. `reference/anti-patterns.md` is the BAN-NN and SLOP-NN
@@ -157,6 +158,19 @@ grep -rEn "No data|No results|Nothing here|went wrong|error occurred" src/ \
157
158
  Flag meaningful images without `alt`, icon-only controls without an accessible name,
158
159
  placeholder used as the only label, and generic empty or error copy.
159
160
 
161
+ ### Step 2.5: Pre-Report Gate + confidence
162
+
163
+ Before cataloging any finding, run the four-question Pre-Report Gate from
164
+ `reference/reviewer-confidence-gate.md`: (a) can you cite `file:line`, (b) can you state the
165
+ failure mode in one sentence, (c) did you read context beyond the matched line (the token
166
+ definition, the call site), and (d) is the class assignment defensible? Stamp every catalog
167
+ row with a `confidence` value (`0.0-1.0`): `>= 0.8` when all four pass, `0.5-0.8` when evidence
168
+ is partial, `< 0.5` for a pattern match you could not confirm (for example an unresolved
169
+ contrast pair or a literal that may be inside a token definition). Move every `< 0.5` finding
170
+ into a `## Tentative` section instead of the ranked findings table, so a low-confidence guess
171
+ never escalates to remediation. Confidence is independent of priority: a high-priority debt
172
+ item can still be low confidence and belongs in `## Tentative` until confirmed.
173
+
160
174
  ### Step 3: Group and score
161
175
 
162
176
  Group findings by the seven debt classes. For each finding, assign the three priority
@@ -217,12 +231,21 @@ note: "Project-scoped retroactive debt catalog. Does NOT read STATE.md completed
217
231
 
218
232
  ## Findings (ranked by priority)
219
233
 
220
- | Priority | Class | Location | Finding | V × E × P | Suggested command |
221
- |----------|-------|----------|---------|-----------|-------------------|
222
- | 18 | color-literal | src/Card.tsx:42 | Raw #1a73e8 instead of token | 3×3×2 | `/gdd:fast "replace #1a73e8 with semantic token in Card.tsx"` |
223
- | 12 | anti-pattern | src/Hero.tsx:8 | BAN-02 gradient text on heading | 3×2×2 | `/gdd:fast "remove BAN-02 gradient text in Hero.tsx"` |
234
+ | Priority | Class | Location | Finding | V × E × P | Confidence | Suggested command |
235
+ |----------|-------|----------|---------|-----------|------------|-------------------|
236
+ | 18 | color-literal | src/Card.tsx:42 | Raw #1a73e8 instead of token | 3×3×2 | 0.9 | `/gdd:fast "replace #1a73e8 with semantic token in Card.tsx"` |
237
+ | 12 | anti-pattern | src/Hero.tsx:8 | BAN-02 gradient text on heading | 3×2×2 | 0.85 | `/gdd:fast "remove BAN-02 gradient text in Hero.tsx"` |
238
+
239
+ (One row per finding with `confidence >= 0.5`. The Suggested command column always carries a `/gdd:fast "<finding>"` string. Findings below `0.5` go in `## Tentative` below, not in this table.)
240
+
241
+ ---
242
+
243
+ ## Tentative
244
+
245
+ Findings with `confidence < 0.5` (pattern matches not confirmed by reading context, per
246
+ `reference/reviewer-confidence-gate.md`). Listed for human review; never auto-escalated.
224
247
 
225
- (One row per finding. The Suggested command column always carries a `/gdd:fast "<finding>"` string.)
248
+ - [class] [location]: [finding] (confidence: [N], unconfirmed because [reason])
226
249
 
227
250
  ---
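The two example rows are consistent with the priority being the product of the three factors (3×3×2 = 18, 3×2×2 = 12). A sketch of rendering the ranked table and the `## Tentative` list on that assumption follows; the scoring model itself lives in `reference/debt-categories.md`, and the field names (`cls`, `v`, `e`, `p`, `reason`) and `renderDebtCatalog` are hypothetical.

```javascript
// Hypothetical sketch. Assumes priority = V * E * P, which matches the two
// example rows but is not confirmed by the diff. Field names are illustrative.
const TENTATIVE_BELOW = 0.5;

function renderDebtCatalog(findings) {
  const scored = findings.map((f) => ({ ...f, priority: f.v * f.e * f.p }));

  // Ranked table rows: confidence >= 0.5, highest priority first.
  const ranked = scored
    .filter((f) => f.confidence >= TENTATIVE_BELOW)
    .sort((a, b) => b.priority - a.priority)
    .map((f) =>
      `| ${f.priority} | ${f.cls} | ${f.location} | ${f.finding} | ` +
      `${f.v}×${f.e}×${f.p} | ${f.confidence} | \`${f.command}\` |`);

  // Tentative bullets: everything else, including a missing confidence value.
  const tentative = scored
    .filter((f) => !(f.confidence >= TENTATIVE_BELOW))
    .map((f) =>
      `- ${f.cls} ${f.location}: ${f.finding} ` +
      `(confidence: ${f.confidence}, unconfirmed because ${f.reason})`);

  return { ranked, tentative };
}
```

Priority and confidence stay separate inputs here, so a high-priority row with low confidence still lands in the tentative list.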
228
251
 
@@ -395,6 +395,8 @@ Apply these rules automatically during execution. Track all deviations in the ta
395
395
 
396
396
  ## Task Output - .design/tasks/task-NN.md
397
397
 
398
+ Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
399
+
398
400
  After completing the task's implementation work, write `.design/tasks/task-NN.md` (where NN = task_id from prompt context). Create `.design/tasks/` directory first if it does not exist.
399
401
 
400
402
  Format (locked - do not alter structure):
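The worktree note added above relies on the detection described in the changelog: compare `git rev-parse --git-dir` with `git rev-parse --git-common-dir`. A minimal sketch of that comparison with an injectable exec follows. It is not the shipped `scripts/lib/worktree-resolve.cjs`; `resolveRoots` and `defaultExec` are hypothetical names, and deriving the main root as the parent of the common git directory assumes a standard non-bare layout.

```javascript
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// Default runner; tests can inject a fake with the same (args, cwd) shape.
function defaultExec(args, cwd) {
  return execFileSync('git', args, { cwd, encoding: 'utf8' }).trim();
}

// Hypothetical sketch: in a linked worktree the two git directories differ,
// and the main checkout is assumed to be the parent of the common git dir.
function resolveRoots(cwd = process.cwd(), exec = defaultExec) {
  try {
    const gitDir = path.resolve(cwd, exec(['rev-parse', '--git-dir'], cwd));
    const commonDir = path.resolve(cwd, exec(['rev-parse', '--git-common-dir'], cwd));
    const isWorktree = gitDir !== commonDir;
    const repoRoot = isWorktree ? path.dirname(commonDir) : cwd;
    return { isWorktree, designRoot: path.join(repoRoot, '.design') };
  } catch {
    // Graceful fallback (git missing, or not a repository): behave as before.
    return { isWorktree: false, designRoot: path.join(cwd, '.design') };
  }
}
```

Outside a worktree the sketch keeps `.design/` under the current directory, matching the changelog's statement that behavior there is unchanged. The one-line stderr notice is left out for brevity.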
@@ -48,6 +48,8 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
48
48
 
49
49
  **Invariant:** read all listed files FIRST, before making any changes.
50
50
 
51
+ **Worktree-root invariant:** before writing any `.design/` artifact (for example a `<blocker>` entry to `.design/STATE.md`), resolve the main repo root via `scripts/lib/worktree-resolve.cjs` so a worktree run writes to the canonical `.design/` and does not leak artifacts into the worktree checkout.
52
+
51
53
  ---
52
54
 
53
55
  ## Prompt Context Fields
@@ -88,7 +90,8 @@ Parse every entry in that section. The `G-NN` identifier, severity classificatio
88
90
  4. Filter by severity based on `auto_mode`:
89
91
  - Always include: `BLOCKER`, `MAJOR`
90
92
  - Include only if `auto_mode=true`: `MINOR`, `COSMETIC`
91
- 5. Build an ordered list: BLOCKER first, then MAJOR, then (if included) MINOR, COSMETIC.
93
+ 5. **Confidence routing filter (Phase 49, see `reference/reviewer-confidence-gate.md`).** Drop any gap that sits under a `## Tentative` heading: those never reach you. Then drop any `BLOCKER` or `MAJOR` gap whose `confidence` field is below `0.8` and route it to user review instead of auto-fix, since a high-severity gap without strong evidence is exactly the inflated-severity case the gate exists to catch. A gap missing its `confidence` field is treated as below the floor. The shared decision lives in `scripts/lib/confidence-route.cjs` (`route({ severity, confidence, tentative })` returns `'fix' | 'user-review' | 'drop'`); fix only the gaps it routes to `'fix'`.
94
+ 6. Build an ordered list: BLOCKER first, then MAJOR, then (if included) MINOR, COSMETIC.
92
95
 
93
96
  If no in-scope gaps are found (e.g., verifier found only MINOR gaps and `auto_mode=false`), emit `## FIX COMPLETE` immediately with "No in-scope gaps to fix."
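The routing rule in step 5 can be sketched as a single pure function. The signature and return values follow the text above; the body is an assumption about how `scripts/lib/confidence-route.cjs` might behave, not a copy of it. In particular, sending MINOR and COSMETIC gaps to `'fix'` regardless of confidence is a choice of this sketch, because the text only defines the floor for BLOCKER and MAJOR.

```javascript
// Sketch of the decision described in step 5. The 0.8 floor and the three
// return values come from the text; everything else is an assumption.
const HIGH_SEVERITY = new Set(['BLOCKER', 'MAJOR']);
const CONFIDENCE_FLOOR = 0.8;

function route({ severity, confidence, tentative = false }) {
  // Gaps under a "## Tentative" heading never reach the fixer.
  if (tentative) return 'drop';
  // A missing or non-numeric confidence is treated as below the floor.
  const value = Number.isFinite(confidence) ? confidence : -Infinity;
  if (HIGH_SEVERITY.has(severity) && value < CONFIDENCE_FLOOR) return 'user-review';
  return 'fix';
}
```

For example, `route({ severity: 'MAJOR', confidence: 0.6 })` yields `'user-review'`, while the same gap at `0.9` yields `'fix'`.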
94
97
 
@@ -227,6 +227,8 @@ Before finalizing task list:
227
227
 
228
228
  ## Output Format
229
229
 
230
+ Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
231
+
230
232
  Write `.design/DESIGN-PLAN.md` with this exact structure:
231
233
 
232
234
  ```markdown
@@ -62,6 +62,8 @@ Minimum expected inputs (skip gracefully if absent, note what's missing):
62
62
 
63
63
  ## Output
64
64
 
65
+ Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
66
+
65
67
  Write `.design/reflections/<cycle-slug>.md`. If `--dry-run` is set in the spawning prompt, print proposals to stdout only - do not write the file.
66
68
 
67
69
  If the capability-gap pattern scan emitted any events during this run, include a `## Capability gaps emitted` heading listing each `event_id` with the source signal kind (`intel` | `posterior` | `trajectory`) and the `suggested_kind` (`agent` | `skill`) per event. Plan 29-03 reads these events from `.design/gep/events.jsonl` to cluster recurring `capability_gap` events for `/gdd:apply-reflections`.
@@ -161,6 +161,8 @@ Read .design/STATE.md
161
161
 
162
162
  ## Output
163
163
 
164
+ Before writing any `.design/` artifact, resolve the main repo root via `scripts/lib/worktree-resolve.cjs` (`resolveDesignRoot`) so a worktree run writes to the main checkout and does not leak.
165
+
164
166
  Single file: `.design/DESIGN-CONTEXT.md`.
165
167
 
166
168
  ## Record
@@ -33,6 +33,7 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
33
33
  - `.design/DESIGN-CONTEXT.md` - goals, must-haves, brand direction, references
34
34
  - `.design/tasks/` - what was actually done (glob all task files)
35
35
  - `reference/audit-scoring.md` - scoring rubric for category weights
36
+ - `reference/reviewer-confidence-gate.md` - Pre-Report Gate, the `confidence` field, and the gap routing rule
36
37
  - `reference/heuristics.md` - NNG heuristics H-01..H-10 scoring guide
37
38
  - `reference/review-format.md` - visual UAT presentation format
38
39
  - `reference/accessibility.md` - WCAG checklist for accessibility scoring
@@ -40,6 +41,8 @@ The orchestrating stage supplies a `<required_reading>` block in the prompt. Rea
40
41
  - `connections/chromatic.md` - Chromatic CLI connection spec (probe, baseline management, fallback)
41
42
  - `connections/storybook.md` - Storybook HTTP probe and a11y integration details
42
43
 
44
+ **Worktree-root invariant:** before writing `.design/DESIGN-VERIFICATION.md` (or any `.design/` artifact), resolve the main repo root via `scripts/lib/worktree-resolve.cjs` so a worktree run writes to the canonical `.design/` and does not leak artifacts into the worktree checkout.
45
+
43
46
  ## Prompt Context Fields
44
47
 
45
48
  The stage embeds these fields in its prompt:
@@ -440,6 +443,8 @@ Classify each gap:
440
443
  - `MINOR` - noticeable issue; fix if time allows
441
444
  - `COSMETIC` - polish only; defer to later
442
445
 
446
+ **Pre-Report Gate (Phase 49, see `reference/reviewer-confidence-gate.md`).** Before emitting each gap, answer the four questions: (a) can you cite `file:line`, (b) can you state the failure mode in one sentence, (c) did you read context beyond the modified file, (d) is the severity defensible? Stamp every gap with a `confidence` field (`0.0-1.0`): `>= 0.8` when all four pass, `0.5-0.8` when evidence is partial, `< 0.5` for an unconfirmed hunch. A BLOCKER or MAJOR requires `confidence >= 0.8` plus a `file:line` citation plus a one-sentence failure mode; below that, lower the severity or move it to `## Tentative`. Confidence is independent of severity. Move every `< 0.5` gap into a `## Tentative` section so it is surfaced but never reaches `design-fixer`.
447
+
443
448
  For each gap, emit an entry in the locked gap format:
444
449
 
445
450
  ```
@@ -452,6 +457,7 @@ For each gap, emit an entry in the locked gap format:
452
457
  - Actual: [what is true]
453
458
  - Location: [file:line or UI element]
454
459
  - Suggested fix: [one-line hint]
460
+ - confidence: [0.0-1.0]
455
461
  ```
456
462
 
457
463
  Order gaps: BLOCKER first, then MAJOR, MINOR, COSMETIC. Number sequentially (G-01, G-02, ...).
@@ -464,21 +470,7 @@ If zero gaps found: skip this section entirely - do NOT emit `## GAPS FOUND`.
464
470
 
465
471
  **Skip if `chromatic` is `not_configured` or `unavailable` in STATE.md `<connections>`.**
466
472
 
467
- If `.design/chromatic-results.json` exists:
468
- 1. Read .design/chromatic-results.json
469
- 2. Check if this is a first run (all entries have status: "new"):
470
- → First run: emit "Baseline established - no regressions detected (first run creates baseline)."
471
- 3. For subsequent runs, narrate changes:
472
- For each story entry in results:
473
- - status "unchanged" → PASS <StoryTitle>:<StoryName>
474
- - status "changed" → CHANGED <StoryTitle>:<StoryName> (visual change detected - review on chromatic.com)
475
- - status "new" → NEW <StoryTitle>:<StoryName> (first snapshot - not a regression)
476
- - status "error" → ERROR <StoryTitle>:<StoryName> - investigate
477
- 4. Emit summary: "Total: N stories. X unchanged. Y changed. Z new. W errors."
478
- 5. If Y > 0 (changed stories): flag as "VISUAL REGRESSION CANDIDATES - review required on chromatic.com before merging"
479
- 6. Append narration to DESIGN-VERIFICATION.md ## Visual Regression section (create section if absent)
480
-
481
- If .design/chromatic-results.json does not exist: skip; emit no note.
473
+ If `.design/chromatic-results.json` exists, read it and narrate. First run (all entries `status: "new"`): emit "Baseline established - no regressions detected (first run creates baseline)." Subsequent runs, per story entry: `unchanged` → PASS, `changed` → CHANGED (review on chromatic.com), `new` → NEW (first snapshot, not a regression), `error` → ERROR (investigate). Emit summary "Total: N stories. X unchanged. Y changed. Z new. W errors." If any changed (Y > 0), flag "VISUAL REGRESSION CANDIDATES - review required on chromatic.com before merging". Append the narration to the DESIGN-VERIFICATION.md `## Visual Regression` section (create it if absent). If the file does not exist: skip; emit no note.
482
474
 
483
475
  ---
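The condensed narration rule above can be sketched as follows. The shape of `.design/chromatic-results.json` is not shown in this diff, so the entry fields (`title`, `name`, `status`) and the function name `narrateChromatic` are assumptions; the status labels and message strings are taken from the text.

```javascript
// Hypothetical sketch: assumes the results file parses to an array of
// { title, name, status } entries, which this diff does not confirm.
function narrateChromatic(entries) {
  // First run: every entry is "new", so there is nothing to compare against.
  if (entries.length > 0 && entries.every((e) => e.status === 'new')) {
    return ['Baseline established - no regressions detected (first run creates baseline).'];
  }
  const labels = { unchanged: 'PASS', changed: 'CHANGED', new: 'NEW', error: 'ERROR' };
  const counts = { unchanged: 0, changed: 0, new: 0, error: 0 };
  const lines = [];
  for (const { title, name, status } of entries) {
    // Unknown statuses are skipped in this sketch.
    if (!Object.prototype.hasOwnProperty.call(labels, status)) continue;
    counts[status] += 1;
    lines.push(`${labels[status]} ${title}:${name}`);
  }
  lines.push(
    `Total: ${entries.length} stories. ${counts.unchanged} unchanged. ` +
    `${counts.changed} changed. ${counts.new} new. ${counts.error} errors.`);
  if (counts.changed > 0) {
    lines.push('VISUAL REGRESSION CANDIDATES - review required on chromatic.com before merging');
  }
  return lines;
}
```

The caller would append the returned lines to the `## Visual Regression` section, and skip the call entirely when the results file is absent.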
484
476