npm - @tekyzinc/gsd-t - Versions diffs - 2.70.14 → 2.70.16 - Mend

@tekyzinc/gsd-t 2.70.14 → 2.70.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/CHANGELOG.md +10 -0
package/commands/gsd-t-design-decompose.md +124 -25
package/commands/gsd-t-execute.md +51 -6
package/commands/gsd-t-quick.md +10 -2
package/package.json +1 -1

package/CHANGELOG.md CHANGED

@@ -2,6 +2,16 @@
 All notable changes to GSD-T are documented here. Updated with each release.
+## [2.70.15] - 2026-04-06
+### Changed (design pipeline — decompose verification)
+- **Separate Verification Agent** — `gsd-t-design-decompose` Step 6.5 now spawns a dedicated opus-model verification subagent instead of self-verifying chart classifications. The decompose agent cannot verify its own work — sunk cost bias causes it to rubber-stamp its classifications. The separate agent has fresh context and its sole incentive is finding mismatches.
+- **BAR CHART ORIENTATION PROOF** — mechanical decision tree injected into the verification agent prompt: rectangles in a ROW → HORIZONTAL, rectangles BOTTOM-TO-TOP → VERTICAL. Eliminates the #1 misclassification (horizontal percentage bars classified as vertical grouped).
+- **Max 2 fix cycles** — if the verifier finds mismatches, contracts are corrected and re-verified (up to 2 cycles). Persistent failures block decompose completion.
+### Why
+v2.70.14 ensured "build follows contracts" but the contracts themselves were wrong. The decompose command's Step 6.5 asked the same agent to verify its own chart type classifications — it always passed itself. Three charts (`Number of Tools`, `Time on Page`, `Number of Visits`) were classified as `bar-vertical-grouped` when the Figma shows `bar-stacked-horizontal-percentage`. A separate verification agent with no sunk cost catches these mismatches before contracts are finalized.
 ## [2.70.14] - 2026-04-06
 ### Changed (design pipeline — hierarchical execution)
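
The orientation proof described in the 2.70.15 entry (rectangles in a ROW → HORIZONTAL, rectangles BOTTOM-TO-TOP → VERTICAL) could be approximated mechanically along these lines. This is a minimal sketch, not code from the package: the `{x, y, width, height}` rectangle shape, the `classifyBarChart` name, the `percentageLabels` flag, and the 1px tolerance are all assumptions made for illustration. It only distinguishes the two cases the entry contrasts.

```javascript
// Hypothetical input: bounding boxes of the data-bearing rectangles,
// with y measured downward from the top (as in screen coordinates).
const TOLERANCE_PX = 1; // assumed tolerance, not taken from the package

const near = (a, b) => Math.abs(a - b) <= TOLERANCE_PX;

function classifyBarChart(rects, { percentageLabels = false } = {}) {
  if (rects.length < 2) return 'unknown';
  const first = rects[0];

  // Segments share ONE ROW: same top edge and same height, so WIDTH encodes value.
  const sameRow = rects.every(
    r => near(r.y, first.y) && near(r.height, first.height)
  );
  // Bars are COLUMNS: same bottom edge (baseline), so HEIGHT encodes value.
  const sameBaseline = rects.every(
    r => near(r.y + r.height, first.y + first.height)
  );

  if (sameRow) {
    // Stacked if each segment starts where the previous one ends, else grouped.
    const sorted = [...rects].sort((a, b) => a.x - b.x);
    const touching = sorted.every(
      (r, i) => i === 0 || near(r.x, sorted[i - 1].x + sorted[i - 1].width)
    );
    const arrangement = touching ? 'stacked' : 'grouped';
    const suffix = percentageLabels ? '-percentage' : '';
    return `chart-bar-${arrangement}-horizontal${suffix}`;
  }
  if (sameBaseline) return 'chart-bar-grouped-vertical';
  return 'unknown';
}
```

One known limitation of this heuristic: columns that all happen to have the same height would be read as a single row, so a real verifier would likely need additional evidence (labels, axis direction) before committing to a verdict.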

package/commands/gsd-t-design-decompose.md CHANGED

@@ -252,11 +252,38 @@ For each widget contract:
 1. Copy `templates/widget-contract.md` as scaffold
 2. Reference elements by name in the "Elements Used" table
 3. Define layout, data binding, responsive behavior, widget-level verification
+4. **Extract layout CSS from `get_design_context` output (MANDATORY)**:
+   The Figma MCP returns code with explicit CSS layout properties. Parse these into the
+   widget contract's "Internal Element Layout" section:
+   - `body_layout`: Look at the parent container's CSS in the Figma output.
+     `flex flex-row` or `flex gap-[16px] items-center` → `flex-row`.
+     `flex flex-col` or `flex-col gap-[16px]` → `flex-column`.
+     `grid grid-cols-2` → `grid 2-col`. Write EXACTLY what the Figma shows.
+   - `body_gap`: Extract the gap value from the Figma CSS (e.g., `gap-[16px]` → `16px`)
+   - Legend position: If legend is a SIBLING of the chart in a `flex-row` container →
+     legend is BESIDE the chart (`body_sidebar`). If legend is BELOW the chart in a
+     `flex-col` container → legend is in `footer_legend`. This distinction is CRITICAL —
+     it's the difference between a side-by-side layout and a stacked layout.
+   - `container_height`: If the Figma shows `h-[334px]` → fixed height `334px`.
+     If no explicit height → `auto`.
 For each page contract:
 1. Copy `templates/page-contract.md` as scaffold
 2. Reference widgets in grid positions
 3. Define route, data loading, global states, performance budget
+4. **Extract grid structure from `get_design_context` output (MANDATORY)**:
+   The Figma MCP returns the page's layout as nested containers. Parse the structure:
+   - Count "Row" or `flex-row` containers and their children to determine grid dimensions
+     (e.g., 2 Row containers with 2 cards each → `grid 2×2`, NOT `grid 1×4`)
+   - Extract `gap` values between rows and between cards within rows
+   - Extract explicit heights on cards (e.g., `h-[334px]`)
+   - Document in the page contract's "Widgets Used" table:
+     ```
+     | grid[row=1, cols=1-2] | most-popular-tools + number-of-tools | 2 per row |
+     | grid[row=2, cols=1-2] | time-on-page + number-of-visits      | 2 per row |
+     ```
+   - **Anti-pattern**: Seeing 4 sibling cards and writing `grid-cols-4` when the Figma
+     groups them into 2 rows of 2. ALWAYS check the parent container structure.
 Write `INDEX.md` as a navigation map:
@@ -301,40 +328,112 @@ Per-page element manifest (for verification agent):
 | {analytics}        | {N}     | {N} — {list} |
 ```
-## Step 6.5: Contract-vs-Figma Verification Gate (MANDATORY)
+## Step 6.5: Contract-vs-Figma Verification Gate — SEPARATE AGENT (MANDATORY)
-After writing all contracts but BEFORE proceeding to partition or build, verify that each contract accurately represents the Figma design. This gate catches errors that would otherwise propagate through the entire build.
+After writing all contracts but BEFORE proceeding to partition or build, spawn a **dedicated verification subagent** to independently verify every chart classification against the Figma source. This agent has FRESH context, no sunk cost in the classifications, and its sole incentive is finding mismatches.
-### For each widget contract:
+> **Why a separate agent?** The decompose agent that classified the charts cannot objectively verify its own classifications. It has the same blind spots that caused the misclassification. This was proven repeatedly — the same agent rubber-stamps its own work. A fresh agent with only the contracts and Figma access catches what the classifier missed.
-1. **Re-read the Figma node** — call `get_design_context` (or re-examine the screenshot) for the specific widget node
-2. **Compare the contract's claimed structure against the actual Figma node:**
+**OBSERVABILITY LOGGING (MANDATORY):**
+Before spawning — run via Bash:
+`T_START=$(date +%s) && DT_START=$(date +"%Y-%m-%d %H:%M") && TOK_START=${CLAUDE_CONTEXT_TOKENS_USED:-0} && TOK_MAX=${CLAUDE_CONTEXT_TOKENS_MAX:-200000}`
-| Check | What to verify | Failure mode it prevents |
-|-------|---------------|------------------------|
-| Chart type | Contract's element name matches the actual visual pattern | Donut classified as stacked bar (or vice versa) |
-| Data labels | Contract's Test Fixture labels match the Figma text exactly | Hallucinated column headers, invented metrics |
-| Element count | Number of sub-elements in contract matches Figma | Missing legends, extra charts, wrong layout |
-| Text content | Every title, subtitle, label, legend item matches Figma verbatim | "Engagement per video" subtitle that doesn't exist in Figma |
-| Layout structure | Widget's claimed layout matches Figma arrangement | Side-by-side classified as stacked, 2 charts classified as 1 |
-3. **Produce a contract-vs-Figma mismatch report:**
+⚙ [opus] gsd-t-design-decompose → Chart Classification Verifier
 ```
-CONTRACT-VS-FIGMA VERIFICATION
-───────────────────────────────
-✅ most-popular-tools-card: chart-donut — MATCHES Figma node 123:458
-✅ number-of-tools-card: chart-bar-stacked-horizontal-percentage — MATCHES
-❌ member-state-card: chart-donut — MISMATCH: Figma shows stacked vertical bars, not donuts
-   → Fix: reclassify as chart-bar-stacked-vertical, rewrite element contract
-❌ video-playlist-table: columns [Title, Duration, Views, Watch Time, Completion]
-   — MISMATCH: Figma shows [Video, Viewed, Clicked Thumbnail, Clicked CTA, Avg. Seconds Watched]
-   → Fix: update Test Fixture column headers
+Task subagent (general-purpose, model: opus):
+"You are the Chart Classification Verifier. Your ONLY job is to independently
+verify that each element contract's chart type classification matches the actual
+Figma design. You have ZERO knowledge of how the charts were classified — you
+are seeing them fresh. Your incentive: every misclassification you catch prevents
+a wrong chart being built. Every misclassification you miss causes a rebuild.
+## Contracts to Verify
+{list each element contract filename + its claimed chart type from INDEX.md}
+## Figma Source
+File key: {fileKey}
+Page node: {nodeId}
+## Verification Process
+For EACH element contract that claims a chart/visualization type:
+1. Read the element contract — note its claimed type (e.g., 'bar-vertical-grouped')
+2. Find the Figma node ID referenced in the contract (or in the widget that uses it)
+3. Call `get_design_context` on that specific node ID — examine the STRUCTURE:
+   - Layout mode (horizontal vs vertical arrangement of children)
+   - Child elements (are they bars? segments? slices?)
+   - How children are arranged (side by side? stacked? overlapping?)
+   - Dimensions (do bars extend horizontally or vertically?)
+4. Walk the decision tree INDEPENDENTLY (do NOT read the contract's reasoning):
+   BAR CHART ORIENTATION PROOF:
+   a. Are the data-bearing rectangles arranged HORIZONTALLY (left to right)?
+      → Segments share ONE ROW, each segment's WIDTH encodes its value
+      → This is HORIZONTAL (stacked if touching, grouped if separated)
+   b. Are the data-bearing rectangles arranged VERTICALLY (bottom to top)?
+      → Each bar is a COLUMN, each bar's HEIGHT encodes its value
+      → This is VERTICAL (stacked if layered, grouped if side-by-side)
+   c. Is it ONE bar with colored segments? → STACKED
+      Is it MULTIPLE separate bars? → GROUPED
+   d. Do labels show percentages summing to 100%? → PERCENTAGE variant
+   CRITICAL DISTINCTION — the #1 misclassification:
+   A single horizontal bar divided into colored segments (each segment's WIDTH
+   represents a percentage) is chart-bar-stacked-horizontal-percentage.
+   Multiple vertical columns of different heights side-by-side is
+   chart-bar-grouped-vertical. These render COMPLETELY DIFFERENTLY.
+   If you see colored blocks in a ROW → HORIZONTAL. Period.
+5. Compare YOUR classification against the contract's classification.
+6. For EACH element, produce:
+   ```
+   Element: {name}
+   Contract claims: {chart type}
+   Figma node: {id}
+   I SEE: {describe what the Figma MCP returned — layout, children, arrangement}
+   MY CLASSIFICATION: {your independent classification}
+   VERDICT: ✅ MATCH or ❌ MISMATCH
+   If MISMATCH: Contract says {X} but Figma shows {Y} because {evidence}
+   ```
+## Report
+Produce the full verification table:
+| # | Element | Contract Type | Verified Type | Figma Evidence | Verdict |
+|---|---------|--------------|---------------|----------------|---------|
+| 1 | chart-donut | chart-donut | chart-donut | circular arcs + center hole | ✅ MATCH |
+| 2 | bar-vertical-grouped | bar-vertical-grouped | bar-stacked-horizontal-pct | 4 segments in ONE horizontal row | ❌ MISMATCH |
+If ANY ❌ MISMATCH found:
+- List each mismatch with the correct classification and evidence
+- Report: 'VERIFICATION FAILED — {N} misclassifications found. Contracts must be fixed before build.'
+If ALL ✅ MATCH:
+- Report: 'VERIFICATION PASSED — all {N} chart classifications confirmed against Figma source.'
+"
 ```
-4. **If ANY mismatches found**: fix the contracts BEFORE proceeding. Do not build from wrong contracts.
+After subagent returns — run via Bash:
+`T_END=$(date +%s) && DT_END=$(date +"%Y-%m-%d %H:%M") && TOK_END=${CLAUDE_CONTEXT_TOKENS_USED:-0} && DURATION=$((T_END-T_START))`
+Compute tokens/compaction per standard pattern. Append to `.gsd-t/token-log.md`.
+**If VERIFICATION FAILED**: Fix every misclassified element contract before proceeding:
+1. Rename the contract file to match the correct chart type
+2. Rewrite the visual spec section to match the correct chart type
+3. Update INDEX.md references
+4. Update any widget contracts that reference the renamed element
+5. **Re-run the verification subagent** to confirm fixes (max 2 cycles)
+**If VERIFICATION PASSED**: Proceed to Step 7.
-> **Why this gate exists**: The two-terminal validation (tasks 001-013) proved the system produces 50/50 scores when contracts are correct — but also revealed that scoring code-vs-contract doesn't catch contract-vs-Figma errors. This gate closes that gap.
+> **Why this gate exists**: The decompose agent's own examples show the correct classification for "Number of Tools" as `chart-bar-stacked-horizontal-percentage` — yet the agent classified the same chart as `bar-vertical-grouped` in practice. Soft instructions ("MANDATORY decision tree") don't prevent misclassification. A separate agent with fresh context and inverted incentives (success = finding errors) does.
 ## Step 7: Wire Into Partition
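
The layout-extraction rules added to the widget contract step in this file (`flex flex-row` → `flex-row`, `gap-[16px]` → `16px`, `h-[334px]` → `334px`, and so on) amount to a small mapping from utility classes to contract fields. The sketch below shows one way that mapping might look. The `extractLayout` function and its return shape are hypothetical, and it assumes the Figma output exposes a single space-separated class string for the parent container.

```javascript
// Hypothetical helper: maps a container's utility classes to the contract
// fields named in the diff (body_layout, body_gap, container_height).
function extractLayout(classString) {
  const classes = classString.trim().split(/\s+/);
  const has = name => classes.includes(name);

  // Reads an arbitrary value such as gap-[16px] or h-[334px].
  const arbitrary = prefix => {
    const hit = classes.find(
      c => c.startsWith(`${prefix}-[`) && c.endsWith(']')
    );
    return hit ? hit.slice(prefix.length + 2, -1) : null;
  };

  let bodyLayout = 'unknown';
  const gridCols = classes.find(c => /^grid-cols-\d+$/.test(c));
  if (has('grid') && gridCols) {
    bodyLayout = `grid ${gridCols.split('-')[2]}-col`;
  } else if (has('flex-col')) {
    bodyLayout = 'flex-column';
  } else if (has('flex') || has('flex-row')) {
    // A bare `flex` container lays out its children in a row by default.
    bodyLayout = 'flex-row';
  }

  return {
    body_layout: bodyLayout,
    body_gap: arbitrary('gap'),
    // No explicit height in the classes means `auto`.
    container_height: arbitrary('h') ?? 'auto',
  };
}
```

For example, `extractLayout('flex gap-[16px] items-center h-[334px]')` would yield `flex-row`, `16px`, and `334px`. Axis-specific gaps such as `gap-x-[...]` are deliberately left out of this sketch.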

package/commands/gsd-t-execute.md CHANGED

@@ -309,12 +309,57 @@ Execute the task above:
         recovers (retry button works, form can be resubmitted, etc.).
      A test that would pass on an empty HTML page with the right element IDs is useless.
      Every assertion must prove the FEATURE WORKS, not that the ELEMENT EXISTS.
-8. **Visual Design Note** (when design-to-code stack rule is active):
-   Do NOT perform visual verification yourself — a dedicated Design Verification Agent
-   (Step 5.25) runs after all domain tasks complete and handles the full visual comparison.
-   Your job: write precise code from the design contract tokens. Use exact hex colors,
-   exact spacing values, exact typography. Every CSS value must trace to the design contract.
-   The verification agent will open a browser and prove whether your code matches.
+8. **Render-Measure-Compare Loop** (when design-to-code stack rule is active — MANDATORY):
+   After implementing the component, you MUST verify it renders correctly by measuring
+   the actual DOM output against the contract's layout spec. This is not optional.
+   Do NOT rely on visual inspection or screenshots — measure mechanically.
+   a. **Render**: Start the dev server if not running. Navigate to a route where the
+      component is visible (or create a temporary test route that renders it in isolation).
+   b. **Measure via Playwright** — run `page.evaluate()` to extract DOM properties:
+      ```javascript
+      // For a widget: measure its internal layout
+      const el = document.querySelector('.widget-selector');
+      const style = getComputedStyle(el);
+      return {
+        display: style.display,           // 'flex' or 'grid'
+        flexDirection: style.flexDirection, // 'row' or 'column'
+        gap: style.gap,
+        gridTemplateColumns: style.gridTemplateColumns,
+        width: el.offsetWidth,
+        height: el.offsetHeight,
+        childCount: el.children.length,
+        children: Array.from(el.children).map(c => ({
+          tag: c.tagName,
+          width: c.offsetWidth,
+          height: c.offsetHeight,
+          display: getComputedStyle(c).display,
+          flexDirection: getComputedStyle(c).flexDirection,
+        }))
+      };
+      ```
+   c. **Compare to contract** — check each measured value against the contract spec:
+      - `body_layout: flex-row` → verify `flexDirection === 'row'`
+      - `container_height: 334px` → verify `height === 334` (±2px tolerance)
+      - Grid `2×2` → verify parent has 2 row children, each with 2 card children
+      - Legend position: if contract says `body_sidebar` (beside chart) →
+        verify legend and chart share a `flex-row` parent.
+        If contract says `footer_legend` (below chart) →
+        verify legend is in a `flex-column` parent below the chart.
+   d. **Fix mismatches** — if ANY measurement doesn't match the contract:
+      - Log: "LAYOUT MISMATCH: {property} expected {contract value}, got {measured value}"
+      - Fix the code to match the contract spec
+      - Re-render and re-measure (max 2 fix cycles)
+      - If still mismatched after 2 cycles → log to `.gsd-t/deferred-items.md`
+   e. **All pass** → log "RENDER-MEASURE PASS: {N} layout properties verified" and proceed.
+   This loop catches the exact class of errors that visual inspection misses:
+   grid-cols-4 instead of 2×2, legend below instead of beside, wrong flex-direction.
+   These are data comparisons, not visual judgments — the same kind of check as a unit test.
 9. Run ALL test suites — this is NOT optional, not conditional, not "if applicable":
    a. Detect configured test runners: check for vitest/jest config, playwright.config.*, cypress.config.*
    b. Run EVERY detected suite. Unit tests alone are NEVER sufficient when E2E exists.
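
The "Compare to contract" step added above is a plain data comparison, which can be sketched as a pure function over the object returned by the `page.evaluate()` snippet. The `compareToContract` name and the shape of the `contract` argument are assumptions for illustration; only the `body_layout` and `container_height` checks, the ±2px tolerance, and the log message format come from the diff.

```javascript
const HEIGHT_TOLERANCE_PX = 2; // the ±2px tolerance stated in the diff

// Hypothetical helper: `measured` mirrors the object returned by the
// page.evaluate() snippet; `contract` is an assumed plain-object form
// of the widget contract's layout fields.
function compareToContract(measured, contract) {
  const mismatches = [];
  const fail = (property, expected, got) =>
    mismatches.push(
      `LAYOUT MISMATCH: ${property} expected ${expected}, got ${got}`
    );

  if (contract.body_layout === 'flex-row' || contract.body_layout === 'flex-column') {
    const expectedDirection = contract.body_layout === 'flex-row' ? 'row' : 'column';
    if (measured.display !== 'flex') {
      fail('display', 'flex', measured.display);
    }
    if (measured.flexDirection !== expectedDirection) {
      fail('flexDirection', expectedDirection, measured.flexDirection);
    }
  }

  if (contract.container_height && contract.container_height !== 'auto') {
    const expectedPx = parseFloat(contract.container_height); // '334px' -> 334
    if (Math.abs(measured.height - expectedPx) > HEIGHT_TOLERANCE_PX) {
      fail('height', contract.container_height, `${measured.height}px`);
    }
  }

  return mismatches; // an empty array means every checked property passed
}
```

Grid dimension and legend position checks would follow the same pattern but need the nested `children` measurements, so they are omitted here.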

package/commands/gsd-t-quick.md CHANGED

@@ -159,8 +159,16 @@ When you encounter unexpected situations:
    - If building/modifying a PAGE: IMPORT existing widget components — do NOT rebuild widget functionality inline.
    - **Contract is authoritative**: Follow the contract spec, not the Figma screenshot, when they appear to disagree.
 5. Make the change — **adapt new code to existing structures**, not the other way around
-6. Verify it works
-7. Commit: `[quick] {description}`
+6. **Render-Measure-Compare** (if design component — MANDATORY):
+   After implementing, verify via Playwright DOM measurement (not screenshots):
+   - Render the component in browser
+   - `page.evaluate()` to extract: display, flexDirection, gap, gridTemplateColumns,
+     offsetWidth, offsetHeight, child count and layout
+   - Compare each value to the contract's layout spec (body_layout, container_height, etc.)
+   - Mismatches → fix code → re-measure (max 2 cycles)
+   - This catches: wrong grid structure, legend below vs beside, wrong flex-direction
+7. Verify it works
+8. Commit: `[quick] {description}`
 ## Step 3.5: Emit Task Metrics
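
The "fix code → re-measure (max 2 cycles)" rule that appears in this file and in `gsd-t-execute.md` is a bounded retry loop. A minimal sketch follows, assuming hypothetical `check` and `fix` callbacks supplied by the caller. It is written synchronously for brevity; with Playwright both callbacks would be asynchronous and awaited.

```javascript
const MAX_FIX_CYCLES = 2; // the cycle limit stated in the diff

// Hypothetical helper: `check` returns an array of mismatch descriptions,
// `fix` attempts to correct them. Neither name comes from the package.
function verifyWithFixCycles(check, fix) {
  let mismatches = check();
  let cycles = 0;

  while (mismatches.length > 0 && cycles < MAX_FIX_CYCLES) {
    fix(mismatches);
    cycles += 1;
    mismatches = check(); // re-measure after every fix attempt
  }

  // When `passed` is false the caller would defer the remaining mismatches.
  return { passed: mismatches.length === 0, cycles, mismatches };
}
```

Returning the leftover mismatches rather than throwing keeps the caller free to record them (the execute command logs them to `.gsd-t/deferred-items.md`) instead of aborting the task.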

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@tekyzinc/gsd-t",
-  "version": "2.70.14",
+  "version": "2.70.16",
   "description": "GSD-T: Contract-Driven Development for Claude Code — 54 slash commands with headless CI/CD mode, graph-powered code analysis, real-time agent dashboard, execution intelligence, task telemetry, doc-ripple enforcement, backlog management, impact analysis, test sync, milestone archival, and PRD generation",
   "author": "Tekyz, Inc.",
   "license": "MIT",