npm - @tekyzinc/gsd-t - Versions diffs - 2.56.14 → 2.57.10 - Mend

@tekyzinc/gsd-t 2.56.14 → 2.57.10


Files changed (9)

package/CHANGELOG.md +13 -0
package/README.md +1 -0
package/commands/gsd-t-execute.md +160 -41
package/commands/gsd-t-help.md +1 -1
package/commands/gsd-t-quick.md +40 -0
package/docs/GSD-T-README.md +3 -1
package/package.json +1 -1
package/templates/CLAUDE-global.md +25 -3
package/templates/stacks/design-to-code.md +13 -95

package/CHANGELOG.md CHANGED

@@ -2,6 +2,19 @@
 All notable changes to GSD-T are documented here. Updated with each release.
+## [2.57.10] - 2026-04-04
+### Added
+- **Design Verification Agent** — dedicated subagent (Step 5.25) spawned after QA and before Red Team when `.gsd-t/contracts/design-contract.md` exists. Opens a browser with both the built frontend AND the original design (Figma/image) side-by-side for direct visual comparison. Produces a 30+ row structured comparison table with MATCH/DEVIATION verdicts. Artifact gate enforces completion — missing table triggers re-spawn.
+- Wired into `gsd-t-execute` (Step 5.25) and `gsd-t-quick` (Step 5.25)
+### Changed
+- **Separation of concerns**: Coding agents no longer perform visual verification inline (removed 45-line Step 7 from task subagent prompt). Coding agents write precise code from design tokens; the verification agent proves it matches.
+- `design-to-code.md` Section 15 slimmed from 120 lines to 20 lines — now points to the dedicated agent instead of embedding the full verification loop in the stack rule
+- `CLAUDE-global.md` updated with Design Verification Agent section between QA and Red Team
+- Red Team now runs after Design Verification (previously ran directly after QA)
+- Non-design projects are completely unaffected (gate checks for design-contract.md existence)
 ## [2.52.11] - 2026-04-01
 ### Added

package/README.md CHANGED

@@ -17,6 +17,7 @@ A methodology for reliable, parallelizable development using Claude Code with op
 **Token-Aware Orchestration** — `token-budget.js` tracks session token consumption and applies graduated degradation: downgrade model assignments when approaching limits, checkpoint and skip non-essential operations to conserve budget, and halt cleanly with a resume instruction at the ceiling. Wave and execute phases check budget before each subagent spawn.
 **Quality North Star** — projects define a `## Quality North Star` section in CLAUDE.md (1–3 sentences, e.g., "This is a published npm library. Every public API must be intuitive and backward-compatible."). `gsd-t-init` auto-detects preset (library/web-app/cli) from package.json signals; `gsd-t-setup` configures it for existing projects. Subagents read it as a quality lens; absent = silent skip (backward compatible).
 **Design Brief Artifact** — during partition, UI/frontend projects (React, Vue, Svelte, Flutter, Tailwind) automatically get `.gsd-t/contracts/design-brief.md` with color palette, typography, spacing system, component patterns, and tone/voice. Non-UI projects skip silently. User-customized briefs are preserved. Referenced in plan phase for visual consistency.
+**Design Verification Agent** — after QA passes on design-to-code projects, a dedicated verification agent opens a browser with both the built frontend AND the original design (Figma page, design image, or MCP screenshot) side-by-side for direct visual comparison. Produces a structured element-by-element comparison table (30+ rows) with specific design values vs. implementation values and MATCH/DEVIATION verdicts. An artifact gate enforces that the comparison table exists — missing it blocks completion. Separation of concerns: coding agents code, verification agents verify. Wired into execute (Step 5.25) and quick (Step 5.25). Only fires when `.gsd-t/contracts/design-contract.md` exists — non-design projects are unaffected.
 **Exploratory Testing** — after scripted tests pass, if Playwright MCP is registered in Claude Code settings, QA agents get 3 minutes and Red Team gets 5 minutes of interactive browser exploration. All findings tagged `[EXPLORATORY]` and tracked separately in QA calibration. Silent skip when Playwright MCP absent. Wired into execute, quick, integrate, and debug.
 ---

package/commands/gsd-t-execute.md CHANGED

@@ -282,47 +282,12 @@ Execute the task above:
         recovers (retry button works, form can be resubmitted, etc.).
      A test that would pass on an empty HTML page with the right element IDs is useless.
      Every assertion must prove the FEATURE WORKS, not that the ELEMENT EXISTS.
-7. **Visual Design Verification** (MANDATORY when design-to-code stack rule is active):
-   If the task involves UI implementation from a design reference, this step is NOT optional.
-   **FAIL-BY-DEFAULT**: Every element is UNVERIFIED until you prove it matches. "Looks close" is not
-   a verdict. "Appears to match" is not a verdict. Assume NOTHING matches.
-   a. **Get the Figma reference screenshot**: If Figma MCP is available, call `get_screenshot` with the
-      relevant nodeId and fileKey from `.gsd-t/contracts/design-contract.md`. Save this as the reference.
-      If no Figma MCP, use the design image/screenshot provided in the contract.
-   b. **Build the element inventory**: Before comparing ANYTHING, enumerate every distinct visual
-      element in the design — walk top-to-bottom, left-to-right. Every chart, label, icon, heading,
-      card, button, spacing boundary, color, and data visualization detail gets its own row.
-      Data visualizations expand into multiple rows: chart type, orientation, axis labels, legend
-      position, bar/segment colors, data labels, grid lines, center text, tooltip style.
-      If a full-page inventory has fewer than 30 elements, you missed items — go back.
-   c. **Render the built component in a real browser**: Start the dev server if not running.
-      Use Claude Preview, Chrome MCP, or Playwright to open the page at the correct URL.
-      Capture screenshots at each target breakpoint:
-        - Mobile: 375px width
-        - Tablet: 768px width
-        - Desktop: 1280px width
-   d. **Structured element-by-element comparison** (MANDATORY FORMAT — no prose comparisons):
-      Produce a table with this exact structure for every element in the inventory:
-      `| # | Section | Element | Design (specific) | Implementation (specific) | Verdict |`
-      Rules:
-        - "Design" column: SPECIFIC values (chart type name, hex color, px size, font weight)
-        - "Implementation" column: SPECIFIC observed values from the screenshot — not code assumptions
-        - Verdict: only ✅ MATCH or ❌ DEVIATION — never "appears to match" or "need to verify"
-        - Data visualizations: chart type, axis orientation, axis labels, legend position, bar colors,
-          data label placement, grid lines, center text — each a SEPARATE row
-        - NEVER lead with "what's working" — the table IS the comparison, start with row 1
-   e. **Fix every ❌ DEVIATION** — fix each row individually, trace to design contract value.
-      Re-render after each batch of fixes. Update verdict only after visual re-verification.
-      Max 3 fix-and-recheck iterations per component.
-   f. **Final table**: After fixes, every row must be ✅ MATCH. Any remaining ❌ → log to
-      `.gsd-t/qa-issues.md` with severity CRITICAL and tag `[VISUAL]`. BLOCKS task completion.
-      Report: "Verified: {N}/{total} elements match at {breakpoints} breakpoints"
-   g. **Log results** in the design contract's Verification Status table.
-   h. **If no browser/preview tools available**: This is a CRITICAL blocker, not a warning.
-      Log to `.gsd-t/qa-issues.md`: "CRITICAL: No browser tools available for visual verification.
-      Install Claude Preview or configure Playwright for visual testing."
-      The task CANNOT be marked complete without visual verification.
-   Skip this step entirely if no design-to-code stack rule was injected.
+7. **Visual Design Note** (when design-to-code stack rule is active):
+   Do NOT perform visual verification yourself — a dedicated Design Verification Agent
+   (Step 5.25) runs after all domain tasks complete and handles the full visual comparison.
+   Your job: write precise code from the design contract tokens. Use exact hex colors,
+   exact spacing values, exact typography. Every CSS value must trace to the design contract.
+   The verification agent will open a browser and prove whether your code matches.
 8. Run ALL test suites — this is NOT optional, not conditional, not "if applicable":
    a. Detect configured test runners: check for vitest/jest config, playwright.config.*, cypress.config.*
    b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
@@ -643,6 +608,160 @@ A teammate finishes independent tasks and is waiting on a checkpoint:
 2. If not, have the teammate work on documentation, tests, or code cleanup within their domain
 3. Or shut down the teammate and respawn when unblocked
+## Step 5.25: Design Verification Agent (MANDATORY when design contract exists)
+After all domain tasks complete and QA passes, check if `.gsd-t/contracts/design-contract.md` exists. If it does NOT exist, skip this step entirely.
+If it DOES exist — spawn a **dedicated Design Verification Agent**. This agent's ONLY job is to open a browser, compare the built frontend against the original design, and produce a structured comparison table. It writes NO feature code. Separation of concerns: the coding agent codes, the verification agent verifies.
+⚙ [{model}] Design Verification → visual comparison of built frontend vs design
+**OBSERVABILITY LOGGING (MANDATORY):**
+Before spawning — run via Bash:
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+```
+Task subagent (general-purpose, model: opus):
+"You are the Design Verification Agent. Your ONLY job is to visually compare
+the built frontend against the original design and produce a structured
+comparison table. You write ZERO feature code. Your sole deliverable is
+the comparison table and verification results.
+FAIL-BY-DEFAULT: Every visual element starts as UNVERIFIED. You must prove
+each one matches — not assume it does. 'Looks close' is not a verdict.
+'Appears to match' is not a verdict. The only valid verdicts are MATCH
+(with proof) or DEVIATION (with specifics).
+## Step 1: Get the Design Reference
+Read .gsd-t/contracts/design-contract.md for the source reference.
+- If Figma MCP available → call get_screenshot with nodeId + fileKey from the contract
+- If design image files → locate them from the contract's Source Reference field
+- If no MCP and no images → log CRITICAL blocker to .gsd-t/qa-issues.md and STOP
+You MUST have a reference image before proceeding.
+## Step 2: Build the Element Inventory
+Before ANY comparison, enumerate every distinct visual element in the design.
+Walk the design top-to-bottom, left-to-right. For each section:
+  - Section title text and icon
+  - Every chart/visualization (type, orientation, labels, legend, series count)
+  - Every data table (columns, row structure, sort indicators)
+  - Every KPI/stat card (value, label, icon, trend indicator)
+  - Every button, toggle, tab, dropdown
+  - Every text element (headings, body, captions, labels)
+  - Every spacing boundary (section gaps, card padding, element margins)
+  - Every color usage (backgrounds, borders, text, chart fills)
+Write each element as a row for the comparison table.
+If the inventory has fewer than 20 elements for a full page, you missed items.
+Data visualizations MUST expand into multiple rows:
+  Chart type, chart orientation, axis labels, axis grid lines, legend position,
+  data labels placement, chart colors per series, bar width/spacing,
+  center text (donut/pie), tooltip style — each a SEPARATE element.
+## Step 3: Open Side-by-Side Browser Sessions
+Start the dev server (npm run dev, or project equivalent).
+Open TWO browser views simultaneously for direct visual comparison:
+VIEW 1 — BUILT FRONTEND:
+  Open the implemented page using Claude Preview, Chrome MCP, or Playwright.
+  Navigate to the exact route/component being verified.
+  You MUST see real rendered output — not just read the code.
+VIEW 2 — ORIGINAL DESIGN REFERENCE:
+  If Figma URL available → open the Figma page in a browser tab/window.
+    Use the Figma URL from the design contract Source Reference field.
+    Navigate to the specific frame/component being compared.
+  If design image file → open the image in a browser tab/window.
+    Use: file://{absolute-path-to-image} or render in an HTML page.
+  If Figma MCP screenshot was captured → open that screenshot image.
+COMPARISON APPROACH:
+  With both views open, walk through each component/section:
+    - Position views side-by-side (or switch between tabs)
+    - Compare each element visually at the same zoom level
+    - Screenshot BOTH views at matching viewport sizes
+  Capture implementation screenshots at each target breakpoint:
+    Mobile (375px), Tablet (768px), Desktop (1280px) minimum.
+  Each breakpoint is a separate screenshot pair (design + implementation).
+If Claude Preview, Chrome MCP, and Playwright are ALL unavailable:
+  This is a CRITICAL blocker. Log to .gsd-t/qa-issues.md:
+  'CRITICAL: No browser tools available for visual verification.'
+  STOP — the verification CANNOT proceed without a browser.
+## Step 4: Structured Element-by-Element Comparison (MANDATORY FORMAT)
+Produce a comparison table with this exact structure. Every element from
+the inventory gets its own row. No summarizing, no grouping, no prose.
+| # | Section | Element | Design (specific) | Implementation (specific) | Verdict |
+|---|---------|---------|-------------------|--------------------------|---------|
+| 1 | Summary | Chart type | Horizontal stacked bar | Vertical grouped bar | ❌ DEVIATION |
+| 2 | Summary | Chart colors | #4285F4, #34A853, #FBBC04 | #4285F4, #34A853, #FBBC04 | ✅ MATCH |
+Rules:
+- 'Design' column: SPECIFIC values (chart type name, hex color, px size, font weight)
+- 'Implementation' column: SPECIFIC observed values from the SCREENSHOT — not code assumptions
+- Verdict: only ✅ MATCH or ❌ DEVIATION — never 'appears to match' or 'need to verify'
+- NEVER write 'Appears to match' or 'Looks correct' — measure and verify
+- If the table has fewer than 30 rows for a full-page comparison, you skipped elements
+## Step 5: Report Deviations
+For each ❌ DEVIATION, write a specific finding:
+  'Design: {exact value}. Implementation: {exact value}. File: {path}:{line}'
+Write the FULL comparison table to .gsd-t/contracts/design-contract.md
+under a '## Verification Status' section.
+Any ❌ DEVIATION → also append to .gsd-t/qa-issues.md with severity HIGH
+and tag [VISUAL]:
+| {date} | gsd-t-execute | Step 5.25 | opus | {duration} | HIGH | [VISUAL] {description} |
+## Step 6: Verdict
+Count results: '{MATCH_COUNT}/{TOTAL} elements match at {breakpoints} breakpoints'
+VERDICT:
+- ALL rows ✅ MATCH → DESIGN VERIFIED
+- ANY rows ❌ DEVIATION → DESIGN DEVIATIONS FOUND ({count} deviations)
+Write verdict to .gsd-t/contracts/design-contract.md Verification Status section.
+Report back:
+- Verdict: DESIGN VERIFIED | DESIGN DEVIATIONS FOUND
+- Match count: {N}/{total}
+- Breakpoints verified: {list}
+- Deviations: {count with summary of each}
+- Comparison table: {the full table}"
+```
+After subagent returns — run via Bash:
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+Compute tokens and compaction:
+- No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
+- Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+Append to `.gsd-t/token-log.md`:
+`| {DT_START} | {DT_END} | gsd-t-execute | Design Verify | opus | {DURATION}s | {VERDICT} — {MATCH}/{TOTAL} elements | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
+**Artifact Gate (MANDATORY):**
+After the Design Verification Agent returns, check `.gsd-t/contracts/design-contract.md`:
+1. Read the file — does it contain a `## Verification Status` section?
+2. Does that section contain a comparison table with rows?
+3. If EITHER is missing → the verification agent failed its job. Log:
+   `[failure] Design Verification Agent did not produce comparison table — re-spawning`
+   Re-spawn the agent (1 retry). If it fails again, log to `.gsd-t/deferred-items.md`.
+**If VERDICT is DESIGN DEVIATIONS FOUND:**
+1. Fix all deviations (spawn a fix subagent, model: sonnet, with the deviation list)
+2. Re-spawn the Design Verification Agent to re-verify (max 2 fix-and-verify cycles)
+3. If deviations persist after 2 cycles, log to `.gsd-t/deferred-items.md` and present to user
+**If VERDICT is DESIGN VERIFIED:** Proceed to Red Team.
 ## Step 5.5: Red Team — Adversarial QA (MANDATORY)
 After all domain tasks pass their tests, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the code that was just built. It operates with inverted incentives — its success is measured by bugs found, not tests passed.
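
The token accounting that Step 5.25 describes (a plain difference when the counter grows, wrap-around arithmetic when compaction resets it) can be sketched as a small shell function. This is an illustrative sketch only: `compute_tokens` is a hypothetical name that does not appear in the package, and the sample readings are made up.

```shell
# Hypothetical helper (not part of gsd-t): tokens consumed by one subagent run,
# given the context-token reading before and after, plus the context ceiling.
compute_tokens() {
  tok_start=$1
  tok_end=$2
  tok_max=$3
  if [ "$tok_end" -ge "$tok_start" ]; then
    # No compaction: usage is the plain difference.
    echo $((tok_end - tok_start))
  else
    # Compaction detected: the counter dropped mid-run, so add the headroom
    # used before the reset to the reading taken after it.
    echo $(( (tok_max - tok_start) + tok_end ))
  fi
}

compute_tokens 50000 80000 200000    # no compaction  -> 30000
compute_tokens 180000 30000 200000   # compaction     -> 50000
```

The second call mirrors the `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))` branch from the diff: 20000 tokens of headroom before the reset plus 30000 after it.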

package/commands/gsd-t-help.md CHANGED

@@ -261,7 +261,7 @@ Use these when user asks for help on a specific command:
 - **Note (M22)**: Task-level fresh dispatch (one subagent per task, ~10-20% context each). Team mode uses worktree isolation (`isolation: "worktree"`) — zero file conflicts. Adaptive replanning between domain completions.
 - **Note (M26)**: Active rule injection — evaluates declarative rules from rules.jsonl before dispatching each domain's tasks. Fires matching rules as warnings in subagent prompts.
 - **Note (M29)**: Stack Rules Engine — auto-detects project tech stack from manifest files and injects mandatory best-practice rules into each task subagent prompt. Universal rules (`_security.md`, `_auth.md`) always apply; stack-specific rules layer on top. Violations are task failures (same weight as contract violations).
-- **Note (M33)**: Design-to-code — activated when design contract, design tokens, Figma config, or Figma MCP in settings.json is detected. Injects pixel-perfect implementation rules: design token extraction, stack capability evaluation, component decomposition. Step 7 (Visual Design Verification) is MANDATORY — renders each screen in a real browser, screenshots at mobile/tablet/desktop, compares pixel-by-pixel against the Figma design via MCP `get_screenshot`. Visual deviations block task completion. Also triggers from Figma MCP being configured in `~/.claude/settings.json`.
+- **Note (M33)**: Design-to-code — activated when design contract, design tokens, Figma config, or Figma MCP in settings.json is detected. Injects pixel-perfect implementation rules: design token extraction, stack capability evaluation, component decomposition. Step 5.25 spawns a **dedicated Design Verification Agent** (model: opus) after QA passes — opens a browser with both the built frontend AND the original design side-by-side, produces a 30+ row structured comparison table, and enforces an artifact gate (missing table = re-spawn). Coding agents code; the verification agent verifies. Visual deviations block completion. Only fires when `.gsd-t/contracts/design-contract.md` exists.
 ### test-sync
 - **Summary**: Keep tests aligned with code changes

package/commands/gsd-t-quick.md CHANGED

@@ -239,6 +239,46 @@ After all scripted tests pass:
 4. If Playwright MCP is not available: skip this section silently
 Note: Exploratory findings do NOT count against the scripted test pass/fail ratio.
+## Step 5.25: Design Verification Agent (MANDATORY when design contract exists)
+After tests pass, check if `.gsd-t/contracts/design-contract.md` exists. If it does NOT, skip to Step 5.5.
+If it DOES exist and this task involved UI changes — spawn the Design Verification Agent. This agent's ONLY job is to open a browser, compare the built frontend against the original design, and produce a structured comparison table. It writes NO feature code.
+⚙ [{model}] Design Verification → visual comparison of built frontend vs design
+**OBSERVABILITY LOGGING (MANDATORY):**
+Before spawning — run via Bash:
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+```
+Task subagent (general-purpose, model: opus):
+"You are the Design Verification Agent. Your ONLY job is to visually compare
+the built frontend against the original design and produce a structured
+comparison table. You write ZERO feature code.
+FAIL-BY-DEFAULT: Every visual element starts as UNVERIFIED. Prove each matches.
+1. Read .gsd-t/contracts/design-contract.md for design source reference
+2. Get design reference (Figma MCP screenshot, or design images from contract)
+3. Start dev server, open the built frontend in browser (Claude Preview/Chrome MCP/Playwright)
+4. Open the original design reference in a second browser view
+5. Build element inventory (30+ elements for a full page): every chart, label,
+   icon, heading, card, button, spacing, color — each a separate row
+6. Produce structured comparison table:
+   | # | Section | Element | Design (specific) | Implementation (specific) | Verdict |
+   Only valid verdicts: ✅ MATCH or ❌ DEVIATION (never 'appears to match')
+7. Write results to .gsd-t/contracts/design-contract.md under '## Verification Status'
+8. Any ❌ → append to .gsd-t/qa-issues.md with [VISUAL] tag
+9. Report: DESIGN VERIFIED | DESIGN DEVIATIONS FOUND ({count})"
+```
+After subagent returns — run observability Bash and append to token-log.md.
+**Artifact Gate:** Read `.gsd-t/contracts/design-contract.md` — if no `## Verification Status` section with a comparison table exists, re-spawn (1 retry).
+**If deviations found:** Fix them (max 2 cycles), re-verify. If persistent, log to `.gsd-t/deferred-items.md`.
 ## Step 5.5: Red Team — Adversarial QA (MANDATORY)
 After tests pass, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the code that was just changed. Its success is measured by bugs found, not tests passed.
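
The artifact gate in Step 5.25 is stated in prose: the contract file must contain a `## Verification Status` section with a populated comparison table. One way an orchestrator might check that is sketched below. The function name `has_verification_table` and the "row starts with a numbered cell" heuristic are assumptions made for illustration; the package does not specify how the check is implemented.

```shell
# Hypothetical gate (not part of gsd-t): succeeds only when the file has a
# "## Verification Status" section holding at least one numbered table row.
has_verification_table() {
  contract=$1
  [ -f "$contract" ] || return 1
  awk '
    /^## Verification Status/        { in_section = 1; next }
    /^## /                           { in_section = 0 }   # next top-level section ends it
    in_section && /^[|] *[0-9]+ *[|]/ { rows++ }          # rows like "| 1 | Summary | ..."
    END { exit (rows > 0 ? 0 : 1) }
  ' "$contract"
}

if has_verification_table .gsd-t/contracts/design-contract.md; then
  echo "artifact gate passed"
else
  echo "[failure] Design Verification Agent did not produce comparison table"
fi
```

The header and separator rows of the table do not start with a number, so a section that contains only an empty table skeleton still fails the gate.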

package/docs/GSD-T-README.md CHANGED

@@ -156,7 +156,9 @@ GSD-T reads all state files and tells you exactly where you left off.
 │                                              │         └──────┐             │
 │                                              │                ▼             │
 │                                              │    ┌───────────────────┐     │
-│                                              │    │  QA + Red Team    │     │
+│                                              │    │  QA + Design      │     │
+│                                              │    │  Verification +   │     │
+│                                              │    │  Red Team         │     │
 │                                              │    │ (after each phase │     │
 │                                              │    │  that writes code)│     │
 │                                              │    └───────────────────┘     │

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@tekyzinc/gsd-t",
-  "version": "2.56.14",
+  "version": "2.57.10",
   "description": "GSD-T: Contract-Driven Development for Claude Code — 51 slash commands with headless CI/CD mode, graph-powered code analysis, real-time agent dashboard, execution intelligence, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
   "author": "Tekyz, Inc.",
   "license": "MIT",

package/templates/CLAUDE-global.md CHANGED

@@ -308,14 +308,36 @@ Report format: 'Unit: X/Y pass | E2E: X/Y pass (or N/A if no config) | Contract:
 **QA Calibration Feedback Loop** — If `bin/qa-calibrator.js` exists in the project, the system tracks QA miss-rates (bugs found by Red Team that QA missed) and automatically injects targeted guidance into future QA prompts. Weak-spot categories (error paths, boundary inputs, state transitions) are detected from miss patterns and injected as a preamble before the QA subagent runs. Projects without `qa-miss-log.jsonl` data behave identically to baseline — calibration is fully opt-in and backward compatible.
+## Design Verification Agent (Mandatory when design contract exists)
+After QA passes, if `.gsd-t/contracts/design-contract.md` exists, a **dedicated Design Verification Agent** is spawned. This agent's ONLY job is to open a browser, compare the built frontend against the original design, and produce a structured element-by-element comparison table. It writes ZERO feature code.
+**Why a dedicated agent?** Coding agents consistently skip visual verification — even with detailed instructions — because their incentive is to finish building, not to audit. Separating the verifier from the builder ensures the verification actually happens.
+**Design Verification method by command:**
+- `execute` → spawns Design Verification Agent after QA passes (Step 5.25)
+- `quick` → spawns Design Verification Agent after tests pass (Step 5.25)
+- `integrate`, `wave` → Design Verification runs within the execute phase per the rules above
+- Commands without UI work → skipped automatically (no design contract = no verification)
+**Key rules:**
+- **FAIL-BY-DEFAULT**: Every visual element starts as UNVERIFIED. Must prove each matches.
+- **Structured comparison table**: 30+ rows minimum for a full page. Each element gets specific design values vs. specific implementation values and a MATCH or DEVIATION verdict.
+- **No vague verdicts**: "Looks close" and "appears to match" are not valid. Only ✅ MATCH or ❌ DEVIATION with specific values.
+- **Side-by-side browser sessions**: Opens both the built frontend AND the original design (Figma page, design image, or MCP screenshot) for direct visual comparison.
+- **Artifact gate**: Orchestrator checks that `design-contract.md` contains a `## Verification Status` section with a populated comparison table. Missing artifact = re-spawn (1 retry).
+- **Fix cycle**: Deviations are fixed (up to 2 cycles) and re-verified before proceeding.
+**Design Verification FAIL blocks phase completion.** Deviations must be fixed or logged to `.gsd-t/deferred-items.md`.
 ## Red Team — Adversarial QA (Mandatory)
-After QA passes, every code-producing command spawns a **Red Team agent** — an adversarial subagent whose success is measured by bugs found, not tests passed. This inverts the incentive structure: the Red Team's drive toward "task complete" means digging deeper and finding more bugs, not rubber-stamping.
+After QA and Design Verification pass, every code-producing command spawns a **Red Team agent** — an adversarial subagent whose success is measured by bugs found, not tests passed. This inverts the incentive structure: the Red Team's drive toward "task complete" means digging deeper and finding more bugs, not rubber-stamping.
 **Red Team method by command:**
-- `execute` → spawns Red Team after all domain tasks pass (Step 5.5)
+- `execute` → spawns Red Team after Design Verification passes (Step 5.5)
 - `integrate` → spawns Red Team after integration tests pass (Step 7.5)
-- `quick` → spawns Red Team after Test & Verify passes (Step 5.5)
+- `quick` → spawns Red Team after Design Verification passes (Step 5.5)
 - `debug` → spawns Red Team after fix verification passes (Step 5.3)
 - `wave` → each phase agent handles Red Team per the rules above
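
The key rules above reduce to a simple tally: every numbered row carries either a MATCH or a DEVIATION verdict, and the overall verdict is DESIGN VERIFIED only when every row matches. A minimal sketch of that tally follows, assuming the six-column table layout shown in the execute prompt. `tally_verdict` is a hypothetical name, the output wording is illustrative, and the 30-row minimum is deliberately not enforced here.

```shell
# Hypothetical tally (not part of gsd-t): reads a comparison table on stdin
# and prints an overall verdict line.
tally_verdict() {
  awk -F'|' '
    BEGIN { total = 0; matched = 0 }
    /^[|] *[0-9]+ *[|]/ {
      total++
      # The verdict is the last cell before the trailing pipe.
      if ($(NF-1) ~ /MATCH/) matched++
    }
    END {
      if (total > 0 && matched == total)
        print "DESIGN VERIFIED, " matched "/" total " elements match"
      else
        print "DESIGN DEVIATIONS FOUND (" (total - matched) " deviations), " matched "/" total " elements match"
    }
  '
}

# The two sample rows from the execute prompt: one deviation, one match.
tally_verdict <<'EOF'
| # | Section | Element | Design (specific) | Implementation (specific) | Verdict |
|---|---------|---------|-------------------|--------------------------|---------|
| 1 | Summary | Chart type | Horizontal stacked bar | Vertical grouped bar | ❌ DEVIATION |
| 2 | Summary | Chart colors | #4285F4, #34A853, #FBBC04 | #4285F4, #34A853, #FBBC04 | ✅ MATCH |
EOF
```

An empty table is reported as deviations rather than verified, which keeps the sketch in line with the fail-by-default rule.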

package/templates/stacks/design-to-code.md CHANGED

@@ -433,107 +433,25 @@ MANDATORY:
 ---
-## 15. Visual Verification Loop
+## 15. Visual Verification
-**FAIL-BY-DEFAULT RULE**: Every visual element starts as UNVERIFIED. You must prove each one matches — not assume it does. "Looks close" is not a verdict. "Appears to match" is not a verdict. The only valid verdicts are MATCH (with proof) or DEVIATION (with specifics). If you catch yourself writing "looks correct" or "appears right" without element-level proof, you are doing it wrong.
+**Visual verification is handled by a dedicated Design Verification Agent**, spawned automatically by `gsd-t-execute` (Step 5.25) after all domain tasks complete. The verification agent's ONLY job is to open a browser, compare the built frontend against the original design, and produce a structured element-by-element comparison table.
+**Your job as the coding agent**: Write precise code from the design contract tokens. Every CSS value must trace to a design contract entry. Use exact hex colors, exact spacing values, exact typography. The verification agent will open a browser and prove whether your code matches.
 ```
-MANDATORY:
-  ├── After implementing any design component, you MUST verify it visually.
-  │   Skipping this step is a TASK FAILURE — not optional, not "if tools available".
-  │
-  ├── Step 1: GET THE FIGMA REFERENCE
-  │     If Figma MCP available → call get_screenshot with nodeId + fileKey
-  │     If no MCP → use design image/screenshot from the design contract
-  │     You MUST have a reference image before proceeding
-  │
-  ├── Step 2: BUILD THE ELEMENT INVENTORY
-  │     Before ANY comparison, enumerate every distinct visual element in the
-  │     design. Walk the design top-to-bottom, left-to-right. For each section:
-  │       - Section title text and icon
-  │       - Every chart/visualization (type, orientation, labels, legend, series count)
-  │       - Every data table (columns, row structure, sort indicators)
-  │       - Every KPI/stat card (value, label, icon, trend indicator)
-  │       - Every button, toggle, tab, dropdown
-  │       - Every text element (headings, body, captions, labels)
-  │       - Every spacing boundary (section gaps, card padding, element margins)
-  │       - Every color usage (backgrounds, borders, text, chart fills)
-  │     Write each element as a row in the comparison table (Step 4).
-  │     If the inventory has fewer than 20 elements for a full page, you missed items.
-  │
-  ├── Step 3: RENDER IN A REAL BROWSER + SCREENSHOT
-  │     Start the dev server (npm run dev, etc.)
-  │     Open the page using Claude Preview, Chrome MCP, or Playwright
-  │     You MUST see real rendered output — not just read the code
-  │     Capture screenshots at each target breakpoint:
-  │       Mobile (375px), Tablet (768px), Desktop (1280px) minimum
-  │     Each breakpoint is a separate screenshot
-  │
-  ├── Step 4: STRUCTURED ELEMENT-BY-ELEMENT COMPARISON (MANDATORY FORMAT)
-  │     You MUST produce a comparison table with this exact structure.
-  │     Every row from the inventory gets its own row. No summarizing, no grouping,
-  │     no "appears to match" prose. Each element gets an individual verdict.
-  │
-  │     | # | Section | Element | Design (specific) | Implementation (specific) | Verdict |
-  │     |---|---------|---------|-------------------|--------------------------|---------|
-  │     | 1 | Summary | Chart type | Horizontal stacked bar | Vertical grouped bar | ❌ DEVIATION |
-  │     | 2 | Summary | Chart colors | #4285F4, #34A853, #FBBC04 | #4285F4, #34A853, #FBBC04 | ✅ MATCH |
-  │     | 3 | Summary | Y-axis labels | Tool names, left-aligned | Tool names, left-aligned | ✅ MATCH |
-  │     | 4 | Summary | Bar label placement | Inside bar, white text | Above bar, black text | ❌ DEVIATION |
-  │     | 5 | KPIs | Font size | 32px semibold | 24px regular | ❌ DEVIATION |
-  │     | ... | ... | ... | ... | ... | ... |
+SEPARATION OF CONCERNS:
+  ├── CODING AGENT (you — Sections 1-14 above):
+  │     Extract tokens → write precise CSS → trace every value to design contract
+  │     Do NOT open a browser or attempt visual comparison yourself
   │
-  │     Rules for the table:
-  │     - "Design" column must have SPECIFIC values (chart type name, hex color,
-  │       pixel size, font weight name) — not vague descriptions
-  │     - "Implementation" column must have SPECIFIC observed values — not assumptions
-  │       from reading code. You must LOOK at the rendered screenshot.
-  │     - NEVER write "Appears to match" or "Looks correct" — measure and verify
-  │     - NEVER write "Need to verify" — verify it NOW or mark UNVERIFIED
-  │     - Data visualizations get MULTIPLE rows: chart type, axis orientation,
-  │       axis labels, legend position, bar/line/segment colors, data labels,
-  │       grid lines, tooltip style — each is a separate element
-  │     - If the table has fewer than 30 rows for a full-page comparison,
-  │       you skipped elements. Go back to the inventory.
-  │
-  │     DATA VISUALIZATION CHECKLIST (expand into table rows):
-  │       Chart type: bar/stacked-bar/grouped-bar/horizontal-bar/line/area/donut/pie/scatter
-  │       Chart orientation: horizontal vs vertical
-  │       Axis labels: present, position, font, values
-  │       Axis grid lines: present, style, color
-  │       Legend: position (top/bottom/right/inline), format, colors
-  │       Data labels: inside bars/above bars/on segments, font, color
-  │       Chart colors: exact hex per series/segment
-  │       Bar width/spacing: relative proportions
-  │       Center text (donut/pie): present, value, font
-  │       Tooltip style: if visible in design
-  │
-  ├── Step 5: FIX EVERY DEVIATION
-  │     Fix each ❌ row from the table, one by one
-  │     Trace each fix to the design contract value
-  │     Re-render after each batch of fixes
-  │     Update the table: change ❌ to ✅ only after visual re-verification
-  │     Maximum 3 fix-and-recheck iterations
-  │
-  ├── Step 6: FINAL VERIFICATION
-  │     After fixes, take fresh screenshots at all breakpoints
-  │     Produce a FINAL comparison table — every row must be ✅ MATCH
-  │     Any remaining ❌ → CRITICAL finding in .gsd-t/qa-issues.md
-  │     Task is NOT complete until every row shows ✅ MATCH
-  │     Count: "Verified: {N}/{total} elements match at {breakpoints} breakpoints"
-  │
-  ├── NO BROWSER TOOLS = BLOCKER
-  │     If Claude Preview, Chrome MCP, and Playwright are ALL unavailable:
-  │     This is a CRITICAL blocker, not a warning to log and move on
-  │     The task CANNOT be marked complete without visual verification
-  │     Log to .gsd-t/qa-issues.md with severity CRITICAL
-  │
-  └── Log all verification results in the design contract Verification Status table
+  └── DESIGN VERIFICATION AGENT (Step 5.25 of gsd-t-execute):
+        Open browser → screenshot at breakpoints → build element inventory →
+        produce structured comparison table (30+ rows) → report MATCH/DEVIATION
+        per element → fix deviations → re-verify → artifact gate enforces completion
 ```
-**BAD** — A vague comparison table with "Appears to match" and "Looks correct" entries. Leading with "What's Working Well" before identifying deviations. Saying "implementation looks very close to the designs" without element-level proof.
-**GOOD** — 45-row structured comparison table where every element has specific design values vs. specific implementation values. 12 deviations identified: wrong chart type (horizontal stacked bar → vertical grouped bar), wrong font size (32px → 24px), missing data labels inside bars, etc. Each fixed individually with re-render verification. Final table: 45/45 ✅ MATCH.
+The verification agent enforces the **FAIL-BY-DEFAULT** rule: every visual element starts as UNVERIFIED. The only valid verdicts are MATCH (with proof) or DEVIATION (with specifics). "Looks close" and "appears to match" are not verdicts. An artifact gate in the orchestrator blocks completion if the comparison table is missing or empty.
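
As an illustration only, the artifact gate described above could be sketched as follows. This is not the package's implementation: the function names `parse_table_rows` and `check_comparison_table`, the 30-row default, and the assumption that the verdict sits in the last column of a pipe-delimited Markdown table are all hypothetical, chosen to mirror the table format and the "30+ rows" figure mentioned in this diff.

```python
import re

# Hypothetical: the only verdict words the gate accepts (FAIL-BY-DEFAULT).
VALID_VERDICTS = ("MATCH", "DEVIATION")


def parse_table_rows(markdown):
    """Return the data rows of any pipe tables as lists of cell strings."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table line
        cells = [c.strip() for c in line.strip("|").split("|")]
        if cells[0] == "#":
            continue  # header row
        if all(re.fullmatch(r":?-+:?", c) for c in cells):
            continue  # separator row such as |---|---|
        rows.append(cells)
    return rows


def check_comparison_table(markdown, min_rows=30):
    """Return (ok, problems). Anything that is not an explicit verdict fails."""
    rows = parse_table_rows(markdown)
    if not rows:
        return False, ["comparison table is missing or empty"]
    problems = []
    if len(rows) < min_rows:
        problems.append(f"only {len(rows)} rows, expected at least {min_rows}")
    for cells in rows:
        tokens = cells[-1].split()
        # The last word of the verdict cell must be exactly MATCH or DEVIATION;
        # "Appears to match", "..." or an empty cell count as unverified.
        if not tokens or tokens[-1] not in VALID_VERDICTS:
            problems.append(
                f"row {cells[0]}: verdict {cells[-1]!r} is not MATCH or DEVIATION"
            )
    return not problems, problems
```

Under these assumptions, an orchestrator would call `check_comparison_table(report_text)` on the verification agent's output and re-spawn the agent whenever `ok` is false, using `problems` as the reason. Note that a DEVIATION row passes this gate: the sketch only checks that a verdict was actually reached, not that every element matches.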
 ---