@tekyzinc/gsd-t 2.56.15 → 2.57.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,19 @@
 
 All notable changes to GSD-T are documented here. Updated with each release.
 
+ ## [2.57.10] - 2026-04-04
+
+ ### Added
+ - **Design Verification Agent** — dedicated subagent (Step 5.25) spawned after QA and before Red Team when `.gsd-t/contracts/design-contract.md` exists. Opens a browser with both the built frontend AND the original design (Figma/image) side-by-side for direct visual comparison. Produces a 30+ row structured comparison table with MATCH/DEVIATION verdicts. Artifact gate enforces completion — missing table triggers re-spawn.
+ - Wired into `gsd-t-execute` (Step 5.25) and `gsd-t-quick` (Step 5.25)
+
+ ### Changed
+ - **Separation of concerns**: Coding agents no longer perform visual verification inline (removed 45-line Step 7 from task subagent prompt). Coding agents write precise code from design tokens; the verification agent proves it matches.
+ - `design-to-code.md` Section 15 slimmed from 120 lines to 20 lines — now points to the dedicated agent instead of embedding the full verification loop in the stack rule
+ - `CLAUDE-global.md` updated with Design Verification Agent section between QA and Red Team
+ - Red Team now runs after Design Verification (previously ran directly after QA)
+ - Non-design projects are completely unaffected (gate checks for design-contract.md existence)
+
 
 ## [2.52.11] - 2026-04-01
 
 ### Added
package/README.md CHANGED
@@ -17,6 +17,7 @@ A methodology for reliable, parallelizable development using Claude Code with op
 **Token-Aware Orchestration** — `token-budget.js` tracks session token consumption and applies graduated degradation: downgrade model assignments when approaching limits, checkpoint and skip non-essential operations to conserve budget, and halt cleanly with a resume instruction at the ceiling. Wave and execute phases check budget before each subagent spawn.
 **Quality North Star** — projects define a `## Quality North Star` section in CLAUDE.md (1–3 sentences, e.g., "This is a published npm library. Every public API must be intuitive and backward-compatible."). `gsd-t-init` auto-detects preset (library/web-app/cli) from package.json signals; `gsd-t-setup` configures it for existing projects. Subagents read it as a quality lens; absent = silent skip (backward compatible).
 **Design Brief Artifact** — during partition, UI/frontend projects (React, Vue, Svelte, Flutter, Tailwind) automatically get `.gsd-t/contracts/design-brief.md` with color palette, typography, spacing system, component patterns, and tone/voice. Non-UI projects skip silently. User-customized briefs are preserved. Referenced in plan phase for visual consistency.
+ **Design Verification Agent** — after QA passes on design-to-code projects, a dedicated verification agent opens a browser with both the built frontend AND the original design (Figma page, design image, or MCP screenshot) side-by-side for direct visual comparison. Produces a structured element-by-element comparison table (30+ rows) with specific design values vs. implementation values and MATCH/DEVIATION verdicts. An artifact gate enforces that the comparison table exists — missing it blocks completion. Separation of concerns: coding agents code, verification agents verify. Wired into execute (Step 5.25) and quick (Step 5.25). Only fires when `.gsd-t/contracts/design-contract.md` exists — non-design projects are unaffected.
 **Exploratory Testing** — after scripted tests pass, if Playwright MCP is registered in Claude Code settings, QA agents get 3 minutes and Red Team gets 5 minutes of interactive browser exploration. All findings tagged `[EXPLORATORY]` and tracked separately in QA calibration. Silent skip when Playwright MCP absent. Wired into execute, quick, integrate, and debug.
 
 ---
@@ -282,53 +282,12 @@ Execute the task above:
 recovers (retry button works, form can be resubmitted, etc.).
 A test that would pass on an empty HTML page with the right element IDs is useless.
 Every assertion must prove the FEATURE WORKS, not that the ELEMENT EXISTS.
- 7. **Visual Design Verification** (MANDATORY when design-to-code stack rule is active):
- If the task involves UI implementation from a design reference, this step is NOT optional.
- **FAIL-BY-DEFAULT**: Every element is UNVERIFIED until you prove it matches. "Looks close" is not
- a verdict. "Appears to match" is not a verdict. Assume NOTHING matches.
- a. **Get the Figma reference screenshot**: If Figma MCP is available, call `get_screenshot` with the
- relevant nodeId and fileKey from `.gsd-t/contracts/design-contract.md`. Save this as the reference.
- If no Figma MCP, use the design image/screenshot provided in the contract.
- b. **Build the element inventory**: Before comparing ANYTHING, enumerate every distinct visual
- element in the design — walk top-to-bottom, left-to-right. Every chart, label, icon, heading,
- card, button, spacing boundary, color, and data visualization detail gets its own row.
- Data visualizations expand into multiple rows: chart type, orientation, axis labels, legend
- position, bar/segment colors, data labels, grid lines, center text, tooltip style.
- If a full-page inventory has fewer than 30 elements, you missed items — go back.
- c. **Open side-by-side browser sessions for direct visual comparison**:
- Start the dev server if not running. Open TWO views simultaneously:
- - **View 1 — Built frontend**: Use Claude Preview, Chrome MCP, or Playwright to open the
- implemented page at the correct URL. Navigate to the exact route/component being verified.
- - **View 2 — Original design**: If Figma URL → open the Figma page in another browser tab.
- If design image file → open the image in a browser tab (`file://` path or HTML wrapper).
- If Figma MCP screenshot → open that screenshot image.
- Walk through each component with both views visible. Compare element-by-element at matching
- zoom levels. Capture screenshot pairs (design + implementation) at each target breakpoint:
- - Mobile: 375px width
- - Tablet: 768px width
- - Desktop: 1280px width
- d. **Structured element-by-element comparison** (MANDATORY FORMAT — no prose comparisons):
- Produce a table with this exact structure for every element in the inventory:
- `| # | Section | Element | Design (specific) | Implementation (specific) | Verdict |`
- Rules:
- - "Design" column: SPECIFIC values (chart type name, hex color, px size, font weight)
- - "Implementation" column: SPECIFIC observed values from the screenshot — not code assumptions
- - Verdict: only ✅ MATCH or ❌ DEVIATION — never "appears to match" or "need to verify"
- - Data visualizations: chart type, axis orientation, axis labels, legend position, bar colors,
- data label placement, grid lines, center text — each a SEPARATE row
- - NEVER lead with "what's working" — the table IS the comparison, start with row 1
- e. **Fix every ❌ DEVIATION** — fix each row individually, trace to design contract value.
- Re-render after each batch of fixes. Update verdict only after visual re-verification.
- Max 3 fix-and-recheck iterations per component.
- f. **Final table**: After fixes, every row must be ✅ MATCH. Any remaining ❌ → log to
- `.gsd-t/qa-issues.md` with severity CRITICAL and tag `[VISUAL]`. BLOCKS task completion.
- Report: "Verified: {N}/{total} elements match at {breakpoints} breakpoints"
- g. **Log results** in the design contract's Verification Status table.
- h. **If no browser/preview tools available**: This is a CRITICAL blocker, not a warning.
- Log to `.gsd-t/qa-issues.md`: "CRITICAL: No browser tools available for visual verification.
- Install Claude Preview or configure Playwright for visual testing."
- The task CANNOT be marked complete without visual verification.
- Skip this step entirely if no design-to-code stack rule was injected.
+ 7. **Visual Design Note** (when design-to-code stack rule is active):
+ Do NOT perform visual verification yourself — a dedicated Design Verification Agent
+ (Step 5.25) runs after all domain tasks complete and handles the full visual comparison.
+ Your job: write precise code from the design contract tokens. Use exact hex colors,
+ exact spacing values, exact typography. Every CSS value must trace to the design contract.
+ The verification agent will open a browser and prove whether your code matches.
 8. Run ALL test suites — this is NOT optional, not conditional, not "if applicable":
 a. Detect configured test runners: check for vitest/jest config, playwright.config.*, cypress.config.*
 b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
@@ -649,6 +608,160 @@ A teammate finishes independent tasks and is waiting on a checkpoint:
 2. If not, have the teammate work on documentation, tests, or code cleanup within their domain
 3. Or shut down the teammate and respawn when unblocked
 
+ ## Step 5.25: Design Verification Agent (MANDATORY when design contract exists)
+
+ After all domain tasks complete and QA passes, check if `.gsd-t/contracts/design-contract.md` exists. If it does NOT exist, skip this step entirely.
+
+ If it DOES exist — spawn a **dedicated Design Verification Agent**. This agent's ONLY job is to open a browser, compare the built frontend against the original design, and produce a structured comparison table. It writes NO feature code. Separation of concerns: the coding agent codes, the verification agent verifies.
+
+ ⚙ [{model}] Design Verification → visual comparison of built frontend vs design
+
+ **OBSERVABILITY LOGGING (MANDATORY):**
+ Before spawning — run via Bash:
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+
+ ```
+ Task subagent (general-purpose, model: opus):
+ "You are the Design Verification Agent. Your ONLY job is to visually compare
+ the built frontend against the original design and produce a structured
+ comparison table. You write ZERO feature code. Your sole deliverable is
+ the comparison table and verification results.
+
+ FAIL-BY-DEFAULT: Every visual element starts as UNVERIFIED. You must prove
+ each one matches — not assume it does. 'Looks close' is not a verdict.
+ 'Appears to match' is not a verdict. The only valid verdicts are MATCH
+ (with proof) or DEVIATION (with specifics).
+
+ ## Step 1: Get the Design Reference
+
+ Read .gsd-t/contracts/design-contract.md for the source reference.
+ - If Figma MCP available → call get_screenshot with nodeId + fileKey from the contract
+ - If design image files → locate them from the contract's Source Reference field
+ - If no MCP and no images → log CRITICAL blocker to .gsd-t/qa-issues.md and STOP
+ You MUST have a reference image before proceeding.
+
+ ## Step 2: Build the Element Inventory
+
+ Before ANY comparison, enumerate every distinct visual element in the design.
+ Walk the design top-to-bottom, left-to-right. For each section:
+ - Section title text and icon
+ - Every chart/visualization (type, orientation, labels, legend, series count)
+ - Every data table (columns, row structure, sort indicators)
+ - Every KPI/stat card (value, label, icon, trend indicator)
+ - Every button, toggle, tab, dropdown
+ - Every text element (headings, body, captions, labels)
+ - Every spacing boundary (section gaps, card padding, element margins)
+ - Every color usage (backgrounds, borders, text, chart fills)
+ Write each element as a row for the comparison table.
+ If the inventory has fewer than 20 elements for a full page, you missed items.
+
+ Data visualizations MUST expand into multiple rows:
+ Chart type, chart orientation, axis labels, axis grid lines, legend position,
+ data labels placement, chart colors per series, bar width/spacing,
+ center text (donut/pie), tooltip style — each a SEPARATE element.
+
+ ## Step 3: Open Side-by-Side Browser Sessions
+
+ Start the dev server (npm run dev, or project equivalent).
+ Open TWO browser views simultaneously for direct visual comparison:
+
+ VIEW 1 — BUILT FRONTEND:
+ Open the implemented page using Claude Preview, Chrome MCP, or Playwright.
+ Navigate to the exact route/component being verified.
+ You MUST see real rendered output — not just read the code.
+
+ VIEW 2 — ORIGINAL DESIGN REFERENCE:
+ If Figma URL available → open the Figma page in a browser tab/window.
+ Use the Figma URL from the design contract Source Reference field.
+ Navigate to the specific frame/component being compared.
+ If design image file → open the image in a browser tab/window.
+ Use: file://{absolute-path-to-image} or render in an HTML page.
+ If Figma MCP screenshot was captured → open that screenshot image.
+
+ COMPARISON APPROACH:
+ With both views open, walk through each component/section:
+ - Position views side-by-side (or switch between tabs)
+ - Compare each element visually at the same zoom level
+ - Screenshot BOTH views at matching viewport sizes
+ Capture implementation screenshots at each target breakpoint:
+ Mobile (375px), Tablet (768px), Desktop (1280px) minimum.
+ Each breakpoint is a separate screenshot pair (design + implementation).
+
+ If Claude Preview, Chrome MCP, and Playwright are ALL unavailable:
+ This is a CRITICAL blocker. Log to .gsd-t/qa-issues.md:
+ 'CRITICAL: No browser tools available for visual verification.'
+ STOP — the verification CANNOT proceed without a browser.
+
+ ## Step 4: Structured Element-by-Element Comparison (MANDATORY FORMAT)
+
+ Produce a comparison table with this exact structure. Every element from
+ the inventory gets its own row. No summarizing, no grouping, no prose.
+
+ | # | Section | Element | Design (specific) | Implementation (specific) | Verdict |
+ |---|---------|---------|-------------------|--------------------------|---------|
+ | 1 | Summary | Chart type | Horizontal stacked bar | Vertical grouped bar | ❌ DEVIATION |
+ | 2 | Summary | Chart colors | #4285F4, #34A853, #FBBC04 | #4285F4, #34A853, #FBBC04 | ✅ MATCH |
+
+ Rules:
+ - 'Design' column: SPECIFIC values (chart type name, hex color, px size, font weight)
+ - 'Implementation' column: SPECIFIC observed values from the SCREENSHOT — not code assumptions
+ - Verdict: only ✅ MATCH or ❌ DEVIATION — never 'appears to match' or 'need to verify'
+ - NEVER write 'Appears to match' or 'Looks correct' — measure and verify
+ - If the table has fewer than 30 rows for a full-page comparison, you skipped elements
+
+ ## Step 5: Report Deviations
+
+ For each ❌ DEVIATION, write a specific finding:
+ 'Design: {exact value}. Implementation: {exact value}. File: {path}:{line}'
+
+ Write the FULL comparison table to .gsd-t/contracts/design-contract.md
+ under a '## Verification Status' section.
+
+ Any ❌ DEVIATION → also append to .gsd-t/qa-issues.md with severity HIGH
+ and tag [VISUAL]:
+ | {date} | gsd-t-execute | Step 5.25 | opus | {duration} | HIGH | [VISUAL] {description} |
+
+ ## Step 6: Verdict
+
+ Count results: '{MATCH_COUNT}/{TOTAL} elements match at {breakpoints} breakpoints'
+
+ VERDICT:
+ - ALL rows ✅ MATCH → DESIGN VERIFIED
+ - ANY rows ❌ DEVIATION → DESIGN DEVIATIONS FOUND ({count} deviations)
+
+ Write verdict to .gsd-t/contracts/design-contract.md Verification Status section.
+
+ Report back:
+ - Verdict: DESIGN VERIFIED | DESIGN DEVIATIONS FOUND
+ - Match count: {N}/{total}
+ - Breakpoints verified: {list}
+ - Deviations: {count with summary of each}
+ - Comparison table: {the full table}"
+ ```
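The breakpoint capture in Step 3 of the prompt above can be scripted with Playwright's CLI (`npx playwright screenshot` with `--viewport-size` is a real Playwright command). A dry-run sketch — the URL, route, and output directory are hypothetical placeholders, and Playwright plus a running dev server are assumed:

```shell
#!/bin/sh
# Dry-run sketch of the Step 3 breakpoint capture.
# URL and OUT are hypothetical placeholders; set RUN=1 to actually execute.
URL="http://localhost:5173/dashboard"   # assumed dev-server route
OUT=".gsd-t/screenshots"
RUN=${RUN:-0}

for width in 375 768 1280; do
  cmd="npx playwright screenshot --viewport-size=${width},800 $URL $OUT/impl-${width}px.png"
  if [ "$RUN" = "1" ]; then
    mkdir -p "$OUT"
    $cmd          # capture the implementation at this breakpoint
  else
    echo "$cmd"   # dry run: print the command instead of running it
  fi
done
```

Each capture pairs with a design-reference screenshot at the same width for the side-by-side comparison.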
+
+ After subagent returns — run via Bash:
+ `T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+ Compute tokens and compaction:
+ - No compaction (TOK_END >= TOK_START): `TOKENS=$((TOK_END-TOK_START))`, COMPACTED=null
+ - Compaction detected (TOK_END < TOK_START): `TOKENS=$(((TOK_MAX-TOK_START)+TOK_END))`, COMPACTED=$DT_END
+ Append to `.gsd-t/token-log.md`:
+ `| {DT_START} | {DT_END} | gsd-t-execute | Design Verify | opus | {DURATION}s | {VERDICT} — {MATCH}/{TOTAL} elements | {TOKENS} | {COMPACTED} | | | {CTX_PCT} |`
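The two token-accounting cases above can be sketched as a small helper, assuming the context counter simply resets toward zero on compaction (`compute_tokens` is a hypothetical name, not part of GSD-T):

```shell
#!/bin/sh
# Hypothetical helper mirroring the two cases above.
# Args: TOK_START TOK_END TOK_MAX; prints the token delta for the log.
compute_tokens() {
  tok_start=$1; tok_end=$2; tok_max=$3
  if [ "$tok_end" -ge "$tok_start" ]; then
    # No compaction: the counter grew monotonically
    echo $((tok_end - tok_start))
  else
    # Compaction: counter reset, so add pre-reset and post-reset usage
    echo $(((tok_max - tok_start) + tok_end))
  fi
}

compute_tokens 50000 72000 200000    # no compaction → 22000
compute_tokens 180000 30000 200000   # compaction → (200000-180000)+30000 = 50000
```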
+
+ **Artifact Gate (MANDATORY):**
+ After the Design Verification Agent returns, check `.gsd-t/contracts/design-contract.md`:
+ 1. Read the file — does it contain a `## Verification Status` section?
+ 2. Does that section contain a comparison table with rows?
+ 3. If EITHER is missing → the verification agent failed its job. Log:
+ `[failure] Design Verification Agent did not produce comparison table — re-spawning`
+ Re-spawn the agent (1 retry). If it fails again, log to `.gsd-t/deferred-items.md`.
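The gate checks above can be approximated with a grep-based sketch. `gate_check` is a hypothetical helper, and "a table with rows" is read as at least a header, divider, and one data row under the section:

```shell
#!/bin/sh
# Hypothetical sketch of the artifact gate described above.
gate_check() {
  contract=$1
  [ -f "$contract" ] || { echo "GATE FAIL: contract missing"; return 1; }
  grep -q '^## Verification Status' "$contract" \
    || { echo "GATE FAIL: no Verification Status section"; return 1; }
  # Count pipe-delimited lines from the section heading onward; need a
  # header, a divider, and at least one data row.
  rows=$(sed -n '/^## Verification Status/,$p' "$contract" | grep -c '^|')
  [ "$rows" -ge 3 ] || { echo "GATE FAIL: comparison table empty"; return 1; }
  echo "GATE PASS"
}
```

On any FAIL the orchestrator would re-spawn once, as described above.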
+
+ **If VERDICT is DESIGN DEVIATIONS FOUND:**
+ 1. Fix all deviations (spawn a fix subagent, model: sonnet, with the deviation list)
+ 2. Re-spawn the Design Verification Agent to re-verify (max 2 fix-and-verify cycles)
+ 3. If deviations persist after 2 cycles, log to `.gsd-t/deferred-items.md` and present to user
+
+ **If VERDICT is DESIGN VERIFIED:** Proceed to Red Team.
+
 ## Step 5.5: Red Team — Adversarial QA (MANDATORY)
 
 After all domain tasks pass their tests, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the code that was just built. It operates with inverted incentives — its success is measured by bugs found, not tests passed.
@@ -261,7 +261,7 @@ Use these when user asks for help on a specific command:
 - **Note (M22)**: Task-level fresh dispatch (one subagent per task, ~10-20% context each). Team mode uses worktree isolation (`isolation: "worktree"`) — zero file conflicts. Adaptive replanning between domain completions.
 - **Note (M26)**: Active rule injection — evaluates declarative rules from rules.jsonl before dispatching each domain's tasks. Fires matching rules as warnings in subagent prompts.
 - **Note (M29)**: Stack Rules Engine — auto-detects project tech stack from manifest files and injects mandatory best-practice rules into each task subagent prompt. Universal rules (`_security.md`, `_auth.md`) always apply; stack-specific rules layer on top. Violations are task failures (same weight as contract violations).
- - **Note (M33)**: Design-to-code — activated when design contract, design tokens, Figma config, or Figma MCP in settings.json is detected. Injects pixel-perfect implementation rules: design token extraction, stack capability evaluation, component decomposition. Step 7 (Visual Design Verification) is MANDATORY — renders each screen in a real browser, screenshots at mobile/tablet/desktop, compares pixel-by-pixel against the Figma design via MCP `get_screenshot`. Visual deviations block task completion. Also triggers from Figma MCP being configured in `~/.claude/settings.json`.
+ - **Note (M33)**: Design-to-code — activated when design contract, design tokens, Figma config, or Figma MCP in settings.json is detected. Injects pixel-perfect implementation rules: design token extraction, stack capability evaluation, component decomposition. Step 5.25 spawns a **dedicated Design Verification Agent** (model: opus) after QA passes — opens a browser with both the built frontend AND the original design side-by-side, produces a 30+ row structured comparison table, and enforces an artifact gate (missing table = re-spawn). Coding agents code; the verification agent verifies. Visual deviations block completion. Only fires when `.gsd-t/contracts/design-contract.md` exists.
 
 ### test-sync
 - **Summary**: Keep tests aligned with code changes
@@ -239,6 +239,46 @@ After all scripted tests pass:
 4. If Playwright MCP is not available: skip this section silently
 Note: Exploratory findings do NOT count against the scripted test pass/fail ratio.
 
+ ## Step 5.25: Design Verification Agent (MANDATORY when design contract exists)
+
+ After tests pass, check if `.gsd-t/contracts/design-contract.md` exists. If it does NOT, skip to Step 5.5.
+
+ If it DOES exist and this task involved UI changes — spawn the Design Verification Agent. This agent's ONLY job is to open a browser, compare the built frontend against the original design, and produce a structured comparison table. It writes NO feature code.
+
+ ⚙ [{model}] Design Verification → visual comparison of built frontend vs design
+
+ **OBSERVABILITY LOGGING (MANDATORY):**
+ Before spawning — run via Bash:
+ `T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
+
+ ```
+ Task subagent (general-purpose, model: opus):
+ "You are the Design Verification Agent. Your ONLY job is to visually compare
+ the built frontend against the original design and produce a structured
+ comparison table. You write ZERO feature code.
+
+ FAIL-BY-DEFAULT: Every visual element starts as UNVERIFIED. Prove each matches.
+
+ 1. Read .gsd-t/contracts/design-contract.md for design source reference
+ 2. Get design reference (Figma MCP screenshot, or design images from contract)
+ 3. Start dev server, open the built frontend in browser (Claude Preview/Chrome MCP/Playwright)
+ 4. Open the original design reference in a second browser view
+ 5. Build element inventory (30+ elements for a full page): every chart, label,
+ icon, heading, card, button, spacing, color — each a separate row
+ 6. Produce structured comparison table:
+ | # | Section | Element | Design (specific) | Implementation (specific) | Verdict |
+ Only valid verdicts: ✅ MATCH or ❌ DEVIATION (never 'appears to match')
+ 7. Write results to .gsd-t/contracts/design-contract.md under '## Verification Status'
+ 8. Any ❌ → append to .gsd-t/qa-issues.md with [VISUAL] tag
+ 9. Report: DESIGN VERIFIED | DESIGN DEVIATIONS FOUND ({count})"
+ ```
+
+ After subagent returns — run observability Bash and append to token-log.md.
+
+ **Artifact Gate:** Read `.gsd-t/contracts/design-contract.md` — if no `## Verification Status` section with a comparison table exists, re-spawn (1 retry).
+
+ **If deviations found:** Fix them (max 2 cycles), re-verify. If persistent, log to `.gsd-t/deferred-items.md`.
+
 ## Step 5.5: Red Team — Adversarial QA (MANDATORY)
 
 After tests pass, spawn an adversarial Red Team agent. This agent's sole purpose is to BREAK the code that was just changed. Its success is measured by bugs found, not tests passed.
@@ -156,7 +156,9 @@ GSD-T reads all state files and tells you exactly where you left off.
 │ │ └──────┐ │
 │ │ ▼ │
 │ │ ┌───────────────────┐ │
- │ │ │ QA + Red Team │ │
+ │ │ │ QA + Design │ │
+ │ │ │ Verification + │ │
+ │ │ │ Red Team │ │
 │ │ │ (after each phase │ │
 │ │ │ that writes code)│ │
 │ │ └───────────────────┘ │
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "@tekyzinc/gsd-t",
- "version": "2.56.15",
+ "version": "2.57.10",
 "description": "GSD-T: Contract-Driven Development for Claude Code — 51 slash commands with headless CI/CD mode, graph-powered code analysis, real-time agent dashboard, execution intelligence, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
 "author": "Tekyz, Inc.",
 "license": "MIT",
@@ -308,14 +308,36 @@ Report format: 'Unit: X/Y pass | E2E: X/Y pass (or N/A if no config) | Contract:
 
 **QA Calibration Feedback Loop** — If `bin/qa-calibrator.js` exists in the project, the system tracks QA miss-rates (bugs found by Red Team that QA missed) and automatically injects targeted guidance into future QA prompts. Weak-spot categories (error paths, boundary inputs, state transitions) are detected from miss patterns and injected as a preamble before the QA subagent runs. Projects without `qa-miss-log.jsonl` data behave identically to baseline — calibration is fully opt-in and backward compatible.
 
+ ## Design Verification Agent (Mandatory when design contract exists)
+
+ After QA passes, if `.gsd-t/contracts/design-contract.md` exists, a **dedicated Design Verification Agent** is spawned. This agent's ONLY job is to open a browser, compare the built frontend against the original design, and produce a structured element-by-element comparison table. It writes ZERO feature code.
+
+ **Why a dedicated agent?** Coding agents consistently skip visual verification — even with detailed instructions — because their incentive is to finish building, not to audit. Separating the verifier from the builder ensures the verification actually happens.
+
+ **Design Verification method by command:**
+ - `execute` → spawns Design Verification Agent after QA passes (Step 5.25)
+ - `quick` → spawns Design Verification Agent after tests pass (Step 5.25)
+ - `integrate`, `wave` → Design Verification runs within the execute phase per the rules above
+ - Commands without UI work → skipped automatically (no design contract = no verification)
+
+ **Key rules:**
+ - **FAIL-BY-DEFAULT**: Every visual element starts as UNVERIFIED. Must prove each matches.
+ - **Structured comparison table**: 30+ rows minimum for a full page. Each element gets specific design values vs. specific implementation values and a MATCH or DEVIATION verdict.
+ - **No vague verdicts**: "Looks close" and "appears to match" are not valid. Only ✅ MATCH or ❌ DEVIATION with specific values.
+ - **Side-by-side browser sessions**: Opens both the built frontend AND the original design (Figma page, design image, or MCP screenshot) for direct visual comparison.
+ - **Artifact gate**: Orchestrator checks that `design-contract.md` contains a `## Verification Status` section with a populated comparison table. Missing artifact = re-spawn (1 retry).
+ - **Fix cycle**: Deviations are fixed (up to 2 cycles) and re-verified before proceeding.
+
+ **Design Verification FAIL blocks phase completion.** Deviations must be fixed or logged to `.gsd-t/deferred-items.md`.
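The pass/fail verdict can be derived mechanically from the comparison table. A sketch — `summarize` is a hypothetical helper, assuming the verdict column contains the literal words MATCH or DEVIATION:

```shell
#!/bin/sh
# Hypothetical sketch: count verdicts in a comparison table and emit a
# summary line in the style described above.
summarize() {
  table=$1
  match=$(grep -c 'MATCH' "$table")       # MATCH rows (DEVIATION rows don't contain "MATCH")
  dev=$(grep -c 'DEVIATION' "$table")     # DEVIATION rows
  total=$((match + dev))
  if [ "$dev" -eq 0 ]; then
    echo "DESIGN VERIFIED: ${match}/${total} elements match"
  else
    echo "DESIGN DEVIATIONS FOUND (${dev}): ${match}/${total} elements match"
  fi
}
```

Any nonzero deviation count would then block phase completion per the rule above.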
+
 ## Red Team — Adversarial QA (Mandatory)
 
- After QA passes, every code-producing command spawns a **Red Team agent** — an adversarial subagent whose success is measured by bugs found, not tests passed. This inverts the incentive structure: the Red Team's drive toward "task complete" means digging deeper and finding more bugs, not rubber-stamping.
+ After QA and Design Verification pass, every code-producing command spawns a **Red Team agent** — an adversarial subagent whose success is measured by bugs found, not tests passed. This inverts the incentive structure: the Red Team's drive toward "task complete" means digging deeper and finding more bugs, not rubber-stamping.
 
 **Red Team method by command:**
- - `execute` → spawns Red Team after all domain tasks pass (Step 5.5)
+ - `execute` → spawns Red Team after Design Verification passes (Step 5.5)
 - `integrate` → spawns Red Team after integration tests pass (Step 7.5)
- - `quick` → spawns Red Team after Test & Verify passes (Step 5.5)
+ - `quick` → spawns Red Team after Design Verification passes (Step 5.5)
 - `debug` → spawns Red Team after fix verification passes (Step 5.3)
 - `wave` → each phase agent handles Red Team per the rules above
 
@@ -433,125 +433,25 @@ MANDATORY:
433
433
 
434
434
  ---
435
435
 
436
- ## 15. Visual Verification Loop
436
+ ## 15. Visual Verification
437
437
 
438
- **FAIL-BY-DEFAULT RULE**: Every visual element starts as UNVERIFIED. You must prove each one matches not assume it does. "Looks close" is not a verdict. "Appears to match" is not a verdict. The only valid verdicts are MATCH (with proof) or DEVIATION (with specifics). If you catch yourself writing "looks correct" or "appears right" without element-level proof, you are doing it wrong.
438
+ **Visual verification is handled by a dedicated Design Verification Agent**, spawned automatically by `gsd-t-execute` (Step 5.25) after all domain tasks complete. The verification agent's ONLY job is to open a browser, compare the built frontend against the original design, and produce a structured element-by-element comparison table.
439
+
440
+ **Your job as the coding agent**: Write precise code from the design contract tokens. Every CSS value must trace to a design contract entry. Use exact hex colors, exact spacing values, exact typography. The verification agent will open a browser and prove whether your code matches.
 
   ```
- MANDATORY:
- ├── After implementing any design component, you MUST verify it visually.
- Skipping this step is a TASK FAILURE not optional, not "if tools available".
-
- ├── Step 1: GET THE FIGMA REFERENCE
- │ If Figma MCP available → call get_screenshot with nodeId + fileKey
- │ If no MCP → use design image/screenshot from the design contract
- │ You MUST have a reference image before proceeding
-
- ├── Step 2: BUILD THE ELEMENT INVENTORY
- │ Before ANY comparison, enumerate every distinct visual element in the
- │ design. Walk the design top-to-bottom, left-to-right. For each section:
- │ - Section title text and icon
- │ - Every chart/visualization (type, orientation, labels, legend, series count)
- │ - Every data table (columns, row structure, sort indicators)
- │ - Every KPI/stat card (value, label, icon, trend indicator)
- │ - Every button, toggle, tab, dropdown
- │ - Every text element (headings, body, captions, labels)
- │ - Every spacing boundary (section gaps, card padding, element margins)
- │ - Every color usage (backgrounds, borders, text, chart fills)
- │ Write each element as a row in the comparison table (Step 4).
- │ If the inventory has fewer than 20 elements for a full page, you missed items.
-
- ├── Step 3: OPEN SIDE-BY-SIDE BROWSER SESSIONS
- │ Start the dev server (npm run dev, etc.)
- │ Open TWO browser views simultaneously for direct visual comparison:
-
- │ VIEW 1 — BUILT FRONTEND:
- │ Open the implemented page using Claude Preview, Chrome MCP, or Playwright
- │ Navigate to the exact route/component being verified
- │ You MUST see real rendered output — not just read the code
-
- │ VIEW 2 — ORIGINAL DESIGN REFERENCE:
- │ If Figma URL available → open the Figma page in a browser tab/window
- │ Use the Figma URL from the design contract Source Reference field
- │ Navigate to the specific frame/component being compared
- │ If design image file → open the image in a browser tab/window
- │ Use: file://{absolute-path-to-image} or render in an HTML page
- │ If Figma MCP screenshot was captured → open that screenshot image
-
- │ COMPARISON APPROACH:
- │ With both views open, walk through each component/section:
- │ - Position views side-by-side (or switch between tabs)
- │ - Compare each element visually at the same zoom level
- │ - Screenshot BOTH views at matching viewport sizes
- │ Capture implementation screenshots at each target breakpoint:
- │ Mobile (375px), Tablet (768px), Desktop (1280px) minimum
- │ Each breakpoint is a separate screenshot pair (design + implementation)
+ SEPARATION OF CONCERNS:
+ ├── CODING AGENT (you, Sections 1-14 above):
+ Extract tokens → write precise CSS → trace every value to the design contract
+ Do NOT open a browser or attempt visual comparison yourself
 
- ├── Step 4: STRUCTURED ELEMENT-BY-ELEMENT COMPARISON (MANDATORY FORMAT)
- │ You MUST produce a comparison table with this exact structure.
- │ Every row from the inventory gets its own row. No summarizing, no grouping,
- │ no "appears to match" prose. Each element gets an individual verdict.
-
- │ | # | Section | Element | Design (specific) | Implementation (specific) | Verdict |
- │ |---|---------|---------|-------------------|--------------------------|---------|
- │ | 1 | Summary | Chart type | Horizontal stacked bar | Vertical grouped bar | ❌ DEVIATION |
- │ | 2 | Summary | Chart colors | #4285F4, #34A853, #FBBC04 | #4285F4, #34A853, #FBBC04 | ✅ MATCH |
- │ | 3 | Summary | Y-axis labels | Tool names, left-aligned | Tool names, left-aligned | ✅ MATCH |
- │ | 4 | Summary | Bar label placement | Inside bar, white text | Above bar, black text | ❌ DEVIATION |
- │ | 5 | KPIs | Font size | 32px semibold | 24px regular | ❌ DEVIATION |
- │ | ... | ... | ... | ... | ... | ... |
-
- │ Rules for the table:
- │ - "Design" column must have SPECIFIC values (chart type name, hex color,
- │ pixel size, font weight name) — not vague descriptions
- │ - "Implementation" column must have SPECIFIC observed values — not assumptions
- │ from reading code. You must LOOK at the rendered screenshot.
- │ - NEVER write "Appears to match" or "Looks correct" — measure and verify
- │ - NEVER write "Need to verify" — verify it NOW or mark UNVERIFIED
- │ - Data visualizations get MULTIPLE rows: chart type, axis orientation,
- │ axis labels, legend position, bar/line/segment colors, data labels,
- │ grid lines, tooltip style — each is a separate element
- │ - If the table has fewer than 30 rows for a full-page comparison,
- │ you skipped elements. Go back to the inventory.
-
- │ DATA VISUALIZATION CHECKLIST (expand into table rows):
- │ Chart type: bar/stacked-bar/grouped-bar/horizontal-bar/line/area/donut/pie/scatter
- │ Chart orientation: horizontal vs vertical
- │ Axis labels: present, position, font, values
- │ Axis grid lines: present, style, color
- │ Legend: position (top/bottom/right/inline), format, colors
- │ Data labels: inside bars/above bars/on segments, font, color
- │ Chart colors: exact hex per series/segment
- │ Bar width/spacing: relative proportions
- │ Center text (donut/pie): present, value, font
- │ Tooltip style: if visible in design
-
- ├── Step 5: FIX EVERY DEVIATION
- │ Fix each ❌ row from the table, one by one
- │ Trace each fix to the design contract value
- │ Re-render after each batch of fixes
- │ Update the table: change ❌ to ✅ only after visual re-verification
- │ Maximum 3 fix-and-recheck iterations
-
- ├── Step 6: FINAL VERIFICATION
- │ After fixes, take fresh screenshots at all breakpoints
- │ Produce a FINAL comparison table — every row must be ✅ MATCH
- │ Any remaining ❌ → CRITICAL finding in .gsd-t/qa-issues.md
- │ Task is NOT complete until every row shows ✅ MATCH
- │ Count: "Verified: {N}/{total} elements match at {breakpoints} breakpoints"
-
- ├── NO BROWSER TOOLS = BLOCKER
- │ If Claude Preview, Chrome MCP, and Playwright are ALL unavailable:
- │ This is a CRITICAL blocker, not a warning to log and move on
- │ The task CANNOT be marked complete without visual verification
- │ Log to .gsd-t/qa-issues.md with severity CRITICAL
-
- └── Log all verification results in the design contract Verification Status table
+ └── DESIGN VERIFICATION AGENT (Step 5.25 of gsd-t-execute):
+ Open browser → screenshot at breakpoints → build element inventory →
+ produce structured comparison table (30+ rows) → report MATCH/DEVIATION
+ per element → fix deviations → re-verify → artifact gate enforces completion
   ```
 
- **BAD** — A vague comparison table with "Appears to match" and "Looks correct" entries. Leading with "What's Working Well" before identifying deviations. Saying "implementation looks very close to the designs" without element-level proof.
-
- **GOOD** — 45-row structured comparison table where every element has specific design values vs. specific implementation values. 12 deviations identified: wrong chart type (horizontal stacked bar → vertical grouped bar), wrong font size (32px → 24px), missing data labels inside bars, etc. Each fixed individually with re-render verification. Final table: 45/45 ✅ MATCH.
+ The verification agent enforces the **FAIL-BY-DEFAULT** rule: every visual element starts as UNVERIFIED. The only valid verdicts are MATCH (with proof) or DEVIATION (with specifics). "Looks close" and "appears to match" are not verdicts. An artifact gate in the orchestrator blocks completion if the comparison table is missing or empty.
 
   ---