npm - @sun-asterisk/sungen - Versions diffs - 2.7.0-beta.1 → 3.0.0-beta.71 - Mend

@sun-asterisk/sungen 2.7.0-beta.1 → 3.0.0-beta.71


Files changed (245)

package/src/orchestrator/templates/ai-instructions/claude-skill-capture-mode-live.md ADDED

@@ -0,0 +1,60 @@
+# Capture mode: live
+Navigate a running application, take **one accessibility snapshot** and **one screenshot**, and save them as visual context for test generation. Use when the app is live (dev, staging, or production with read-only access) and you want tests grounded in the actual rendered UI. Handles auth gracefully: if the page redirects to login, ask the user to sign in manually rather than injecting cookies.
+## Prerequisites
+- Playwright MCP connected.
+- Dev/staging server reachable (or a public URL).
+- `playwright.config.ts` exists at the project root (for `baseURL` fallback).
+## Steps
+### 1. Resolve target URL
+1. `Live URL` field in `requirements/spec.md` (Overview section)
+2. `baseURL` from `playwright.config.ts` + `URL Path` from `spec.md`
+3. Neither → `AskUserQuestion`: *"Paste the full URL for the page to scan"*
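A minimal sketch of the resolution order above, written as a pure function. The `UrlSources` shape and the `resolveTargetUrl` name are hypothetical (the skill only lists the three sources), and the slash-joining rule is an assumption rather than Playwright's own `baseURL` resolution.

```typescript
// Hypothetical input shape: it only mirrors the three sources listed in step 1.
interface UrlSources {
  liveUrl?: string; // `Live URL` field from requirements/spec.md
  baseURL?: string; // `baseURL` from playwright.config.ts
  urlPath?: string; // `URL Path` field from spec.md
}

// Returns the resolved URL, or null when the caller should ask the user instead.
function resolveTargetUrl(src: UrlSources): string | null {
  // 1. An explicit Live URL wins.
  if (src.liveUrl !== undefined && src.liveUrl.trim() !== "") {
    return src.liveUrl.trim();
  }
  // 2. Otherwise combine baseURL + URL Path (assumed join: exactly one slash between them).
  if (src.baseURL && src.urlPath) {
    const base = src.baseURL.replace(/\/+$/, "");
    const path = src.urlPath.charAt(0) === "/" ? src.urlPath : "/" + src.urlPath;
    return base + path;
  }
  // 3. Neither source is available.
  return null;
}

const fromConfig = resolveTargetUrl({ baseURL: "http://localhost:3000/", urlPath: "login" });
```

A `null` result corresponds to the third branch, where the skill falls back to `AskUserQuestion`.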
+### 2. Navigate
+`browser_navigate` to the resolved URL.
+### 3. Handle auth redirect
+If the page redirects to a login route (URL contains `/login`, `/signin`, `/auth`, or content indicates a login screen):
+1. Tell the user which login URL they landed on.
+2. `AskUserQuestion`:
+   - **I'll log in manually** — wait for confirmation, then re-navigate to the target URL
+   - **Skip live scan** — switch to mode `local`
+   - **Cancel**
+3. **Never** inject cookies or localStorage via `browser_evaluate` / `browser_run_code`. Auth belongs to the user.
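The URL half of the redirect check in step 3 can be sketched as below. The function name is hypothetical, and comparing against the requested URL is an added assumption so that deliberately scanning a login page is not reported as a redirect. The "content indicates a login screen" half is not modelled here.

```typescript
// Route fragments named in step 3 of the skill.
const LOGIN_MARKERS: string[] = ["/login", "/signin", "/auth"];

// True when the browser ended up somewhere other than the requested URL
// and the landing URL contains one of the login markers.
function looksLikeLoginRedirect(requestedUrl: string, landedUrl: string): boolean {
  if (requestedUrl === landedUrl) {
    return false; // no redirect happened at all
  }
  const landed = landedUrl.toLowerCase();
  return LOGIN_MARKERS.some((marker) => landed.indexOf(marker) !== -1);
}

const redirected = looksLikeLoginRedirect(
  "https://app.example.com/dashboard",
  "https://app.example.com/login?next=/dashboard",
);
```

A plain substring match is deliberately loose: `/auth` also matches `/authors`, so a positive result is a prompt to tell the user where they landed, not proof of a login wall.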
+### 4. Snapshot
+Take **ONE** `browser_snapshot`. This accessibility tree is the primary AI context — roles, names, text, structure that tc-generation uses to identify sections and fields.
+### 5. Screenshot (recommended)
+Take **ONE** `browser_take_screenshot` with `fullPage: true`. Save to `requirements/ui/live-<timestamp>.png`, where `<timestamp>` is `YYYYMMDD-HHMM` local time (e.g. `live-20260421-1430.png`).
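For the filename convention in step 5, a small sketch that formats a `Date` as `YYYYMMDD-HHMM` in local time. The function name is hypothetical; only the path pattern comes from the skill text.

```typescript
// Zero-pad a number to two digits.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Builds requirements/ui/live-<YYYYMMDD-HHMM>.png from local-time fields.
function liveScreenshotPath(now: Date): string {
  const date = String(now.getFullYear()) + pad2(now.getMonth() + 1) + pad2(now.getDate());
  const time = pad2(now.getHours()) + pad2(now.getMinutes());
  return "requirements/ui/live-" + date + "-" + time + ".png";
}

// JavaScript months are zero-based, so 3 is April.
const examplePath = liveScreenshotPath(new Date(2026, 3, 21, 14, 30));
// examplePath === "requirements/ui/live-20260421-1430.png"
```

Because the stamp has minute resolution, two captures within the same minute would share a name; how the tool handles that case is not stated in the source.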
+### 6a. Verify unauthenticated redirect target (flow capture only)
+When capturing for a **flow** with security scenarios (e.g. "unauthenticated user cannot access X"):
+1. Open a **fresh incognito/unauthenticated** context (no storage state).
+2. `browser_navigate` to the protected route.
+3. Record the **actual redirect URL** — do NOT assume `/login`; it may be `/register`, `/`, etc.
+4. Report the redirect target: *"Unauthenticated access to `/dashboard` redirects to `/register`"*.
+5. The caller must use the **actual redirect URL** in Gherkin assertions, never an assumed one.
+Skip if the flow has no security scenarios or the user says to skip.
+### 6. Detect discrepancies vs spec
+If `spec.md` exists, cross-check the snapshot against spec sections: fields in spec but not in snapshot → *missing in UI*; elements in snapshot but not in spec → *missing in spec*. Report findings but **do not** auto-edit `spec.md`.
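The cross-check in step 6 is essentially a two-way set difference. A rough sketch, assuming both sides have already been reduced to lists of labels (how labels are extracted from `spec.md` and from the accessibility snapshot is not specified, and the case-insensitive comparison is an assumption):

```typescript
interface Discrepancies {
  missingInUi: string[];   // named in the spec, not found in the snapshot
  missingInSpec: string[]; // found in the snapshot, not named in the spec
}

// Normalise labels so "Email" and " email " compare equal.
function normLabel(s: string): string {
  return s.trim().toLowerCase();
}

function diffSpecVsSnapshot(specFields: string[], snapshotNames: string[]): Discrepancies {
  const specNorm = specFields.map(normLabel);
  const snapNorm = snapshotNames.map(normLabel);
  return {
    missingInUi: specFields.filter((f) => snapNorm.indexOf(normLabel(f)) === -1),
    missingInSpec: snapshotNames.filter((n) => specNorm.indexOf(normLabel(n)) === -1),
  };
}

const report = diffSpecVsSnapshot(
  ["Email", "Password", "Remember me"],
  ["email", "Password", "Forgot password"],
);
// report.missingInUi   -> ["Remember me"]
// report.missingInSpec -> ["Forgot password"]
```

The result is only reported; consistent with the step above, nothing here writes back to `spec.md`.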
+### 7. Report back
+> Captured live page `<URL>`: Snapshot N interactive elements · Screenshot `requirements/ui/live-<timestamp>.png` · Discrepancies vs spec: <count or "none">
+Hand back to the calling command. Scans **exactly one** page per invocation.

package/src/orchestrator/templates/ai-instructions/claude-skill-capture-mode-local.md ADDED

@@ -0,0 +1,38 @@
+# Capture mode: local
+Use **pre-existing images** in `requirements/ui/` as visual context. No network, no MCP, no live site — works for any design tool (Figma export, Sketch, Penpot, Zeplin, hand-drawn, staging screenshots). This is the **baseline fallback**: if the live domain is down and Figma MCP isn't configured, this always works as long as the user drops images in the folder.
+## Steps
+### 1. List available images
+Glob `requirements/ui/*.{png,jpg,jpeg,webp,gif}` and report count + filenames. Filter out metadata files (e.g. `figma-meta.md` written by mode figma-mcp) — those are read by `tc-generation` separately, not treated as images here.
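The listing step can be sketched as an extension filter over a directory listing. The function name is hypothetical, and matching extensions case-insensitively is an assumption (the glob in the skill text is written in lowercase only).

```typescript
// Extensions taken from the glob in step 1.
const IMAGE_EXTENSIONS: string[] = ["png", "jpg", "jpeg", "webp", "gif"];

// Keeps image files only; metadata such as figma-meta.md drops out by extension.
function listUiImages(filenames: string[]): string[] {
  return filenames
    .filter((name) => {
      const dot = name.lastIndexOf(".");
      if (dot <= 0) {
        return false; // no extension, or a dotfile such as .gitkeep
      }
      const ext = name.slice(dot + 1).toLowerCase();
      return IMAGE_EXTENSIONS.indexOf(ext) !== -1;
    })
    .sort();
}

const images = listUiImages(["full-page.png", "figma-meta.md", "login-error.JPG", ".gitkeep"]);
// images -> ["full-page.png", "login-error.JPG"]
```

An empty result corresponds to the "empty folder" branch in step 2.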
+### 2. Handle empty folder
+If no images found:
+1. Tell the user the folder is empty, with the full path so they can open it in Finder.
+2. `AskUserQuestion`:
+   - **I'll drop images now** — wait for confirmation, then re-glob
+   - **Switch to Figma** — switch to mode `figma-mcp`
+   - **Switch to live page scan** — switch to mode `live`
+   - **Cancel** — abort create-test
+3. If "drop images now", wait for confirmation (e.g. "done") then re-run step 1.
+### 3. Read images for context
+Use the `Read` tool on each image — Claude Code reads PNG/JPG/WebP directly as visual context. For large sets (>10 images), ask which are primary and which are states/variants to avoid loading too much at once.
+### 4. Summarize
+> Loaded N image(s) from `requirements/ui/`:
+> - `<filename-1>` — <one-line description of what's visible>
+> - `<filename-2>` — <one-line description>
+Hand back to the calling command.
+## File naming hints for users
+Nudge users toward consistent filenames (don't enforce):
+- `<section>-default.png` / `-error.png` / `-loading.png` / `-empty.png` — section states
+- `full-page.png` / `viewport.png` — whole screen (auto-generated by `sungen add --capture`)

package/src/orchestrator/templates/ai-instructions/claude-skill-capture.md ADDED

@@ -0,0 +1,35 @@
+---
+name: sungen-capture
+description: 'Acquire visual/design context for test generation from one of four sources (modes): figma-mcp, figma-pat, live, local. Auto-loaded by create-test/add-screen when a visual source is needed, or when --figma flag / spec_figma.md is present. Router skill — read only the mode file you need.'
+user-invocable: false
+---
+## Purpose
+Bring **visual + design context** into test generation so `sungen-tc-generation` can author Gherkin + test-data grounded in the real UI. This is a **router**: pick exactly **one mode** for the run, then read only that mode's file. Do **not** read all four.
+This skill never generates Gherkin or `selectors.yaml` — it only acquires context and reports back to the calling command.
+## Pick the mode
+| Mode | Read | Use when | Needs |
+|---|---|---|---|
+| **figma-mcp** | `mode-figma-mcp.md` | Pre-launch / Figma is source of truth, **Figma Dev Mode MCP** connected | Figma MCP + frame URL |
+| **figma-pat** | `mode-figma-pat.md` | `--figma` flag was used, or `requirements/spec_figma.md` exists (synthesize narrative from cached raw node JSON) | `sungen figma auth` PAT |
+| **live** | `mode-live.md` | App is running (dev/staging/prod read-only) and you want the actual rendered UI | Playwright MCP + reachable URL |
+| **local** | `mode-local.md` | Images already dropped in `requirements/ui/` (any design tool, screenshots, mockups) — baseline fallback, no network | nothing |
+### How the mode is chosen (when the caller didn't specify)
+1. `requirements/spec_figma.md` exists → **figma-pat** (PAT flow already ran during `add-screen`).
+2. `requirements/ui/` has images → **local**.
+3. Neither → ask the user which source (figma-mcp / live / local), then load that one mode file.
+Modes are **mutually exclusive per run**, but the user can run `create-test` again with a different mode to layer context. All modes write to `requirements/ui/` and report back.
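The auto-selection rules above reduce to a short decision function. This is an illustrative sketch: the type and function names are hypothetical, and `"ask-user"` stands in for the third rule, where the user picks between figma-mcp, live, and local.

```typescript
type CaptureMode = "figma-mcp" | "figma-pat" | "live" | "local";

// Applies rules 1-3 in order; only one mode is ever returned per run.
function autoPickMode(hasSpecFigma: boolean, uiImageCount: number): CaptureMode | "ask-user" {
  if (hasSpecFigma) {
    return "figma-pat"; // rule 1: requirements/spec_figma.md exists
  }
  if (uiImageCount > 0) {
    return "local"; // rule 2: requirements/ui/ has images
  }
  return "ask-user"; // rule 3: nothing found, the user chooses the source
}

const picked = autoPickMode(false, 3); // "local"
```

Note the ordering: when both `spec_figma.md` and local images exist, rule 1 takes precedence and the run uses `figma-pat`.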
+## What this skill (any mode) does NOT do
+- Does not generate Gherkin — that's `sungen-tc-generation`.
+- Does not write `selectors.yaml` — that's `/sungen:run-test`.
+- Does not inject auth/cookies — the user logs in manually (see `live`).
+- Does not crawl or generate images.

package/src/orchestrator/templates/ai-instructions/claude-skill-harness-audit.md ADDED

@@ -0,0 +1,84 @@
+---
+name: sungen-harness-audit
+description: 'How to read `sungen audit` output and repair test-design findings. Auto-loaded by the design orchestrator.'
+user-invocable: false
+---
+## What `sungen audit` measures
+`sungen audit --screen <name>` runs deterministic sensors over `features/<name>.feature` + `requirements/test-viewpoint.md` and writes `.sungen/reports/<name>-audit.json`. It is the **gate** the orchestrator repairs against. Run with `--json` to parse it.
+### Report shape (key fields)
+```jsonc
+{
+  "score": { "overall": 3.9, "coverage": 0.4, "businessDepth": 0.18, "balance": 0.5, "traceability": 0.7 },
+  "gateStatus": "PASS" | "FAIL",
+  "gate": { "pageType": "ecommerce-list", "themesCovered": 2, "themesTotal": 5,
+            "gaps": [ { "theme": "cart-correctness", "keywords": [...] } ] },
+  "depth": { "businessCriticalShallow": 9, "businessCriticalTotal": 11,
+             "shallowBusinessCritical": [ { "name": "...", "category": "PRODUCT" } ] },
+  "balance": { "imbalanced": true, "coreCount": 11, "secondaryCount": 22, "byBucket": {...} },
+  "duplicates": { "clusters": [ { "sameDataLikely": false, "scenarios": [...] } ] },
+  "trace": { "mappedRatio": 0.4, "note": "..." },
+  "findings": [ "GATE: ...", "DEPTH: ...", "BALANCE: ...", "TRACE: ..." ]
+}
+```
+- **`overall` score is business-weighted** (coverage 0.4 + businessDepth 0.3 + balance 0.15 + traceability 0.15). It is intentionally strict on business value — a high count with shallow business coverage scores low. Don't optimize the count; optimize coverage + depth.
+- Exit code **2** when `gateStatus == FAIL` (usable in CI / loop).
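To make the weighting concrete, here is a sketch that recomputes `overall` from the four sub-scores. The weights come from the text above; the 0 to 10 scaling and the one-decimal rounding are assumptions inferred from the sample report, where the weighted sum is 0.394 and the reported value is 3.9. This is not the tool's actual implementation.

```typescript
interface ScoreParts {
  coverage: number;      // 0..1
  businessDepth: number; // 0..1
  balance: number;       // 0..1
  traceability: number;  // 0..1
}

function overallScore(p: ScoreParts): number {
  // Business-weighted sum, as listed in the skill text.
  const weighted =
    0.4 * p.coverage + 0.3 * p.businessDepth + 0.15 * p.balance + 0.15 * p.traceability;
  // Assumed presentation: scale to 0..10 and keep one decimal place.
  return Math.round(weighted * 100) / 10;
}

// Sub-scores from the sample report:
// 0.4*0.4 + 0.3*0.18 + 0.15*0.5 + 0.15*0.7 = 0.394
const sampleOverall = overallScore({ coverage: 0.4, businessDepth: 0.18, balance: 0.5, traceability: 0.7 });
// sampleOverall === 3.9
```

Since coverage and businessDepth carry 70% of the weight, perfect balance and traceability alone cannot lift the score above 3.0 on this scale.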
+## Finding → repair mapping
+| Finding prefix | Meaning | Repair action |
+|---|---|---|
+| **GATE** | a critical theme for the page-type has no covering scenario | Generate scenarios for that theme. **If cross-screen** (cart-correctness, product-detail-consistency, filter-result-correctness, multi-item cart) → do NOT fake it on a single screen; plan a **flow** (`/sungen:add-flow`) and record the deferral. |
+| **DEPTH** | business-critical scenarios assert only visibility/navigation | Replace `Then User see [X] page/section` with **observable data assertions**: `Then User see [X] with {{value}}`, `Then User see [T] table match data:`. Capture real expected values into `test-data.yaml`. |
+| **BALANCE** | secondary viewpoints (UI/validation/security) outweigh business-core | **Stop expanding** secondary viewpoints; generate the missing business-core scenarios first. Do not add more subscription/UI variants while core is thin. |
+| **TRACE** | scenarios use ad-hoc `VP-<CAT>-NNN` codes not linked to the viewpoint-overview | Make each scenario map to a viewpoint-overview id (align category codes, or add a mapping comment). |
+| **UNIVERSAL** | a universal theme (error/empty-state, accessibility) is absent | Low priority — add if in scope; otherwise note as out-of-scope with reason. |
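When parsing the `--json` report, the `findings` strings can be bucketed by prefix before choosing a repair action from the table. A minimal sketch, assuming every finding follows the `PREFIX: message` shape shown in the sample report (function names are hypothetical):

```typescript
type FindingKind = "GATE" | "DEPTH" | "BALANCE" | "TRACE" | "UNIVERSAL";

const KNOWN_KINDS: FindingKind[] = ["GATE", "DEPTH", "BALANCE", "TRACE", "UNIVERSAL"];

// Returns the prefix before the first colon, or null if it is not a known kind.
function findingKind(finding: string): FindingKind | null {
  const colon = finding.indexOf(":");
  if (colon === -1) {
    return null;
  }
  const prefix = finding.slice(0, colon).trim().toUpperCase();
  for (const kind of KNOWN_KINDS) {
    if (kind === prefix) {
      return kind;
    }
  }
  return null;
}

// Groups findings by kind; strings without a known prefix are skipped.
function groupFindings(findings: string[]): Record<FindingKind, string[]> {
  const groups: Record<FindingKind, string[]> = {
    GATE: [], DEPTH: [], BALANCE: [], TRACE: [], UNIVERSAL: [],
  };
  for (const f of findings) {
    const kind = findingKind(f);
    if (kind !== null) {
      groups[kind].push(f);
    }
  }
  return groups;
}

const grouped = groupFindings(["GATE: cart-correctness uncovered", "DEPTH: 9/11 shallow", "free-form note"]);
```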
+## P5 steps for deep cross-screen / list coverage
+Use these when repairing GATE/DEPTH findings for the hard viewpoints (cart/detail/filter correctness). They need **runtime data mode** (the default).
+- **Capture a value to compare across screens** (product-detail-consistency, cart-correctness):
+  ```gherkin
+  When User remember [Product Name] text as {{selected_product_name}}
+  And User remember [Product Price] text as {{selected_product_price}}
+  And User click [View Product] link
+  Then User see [Detail Product Name] header with {{selected_product_name}}
+  And User see [Detail Product Price] text with {{selected_product_price}}
+  ```
+  `remember` stores the element's text/value at runtime; later `{{var}}` resolves to it. This proves the detail/cart shows the SAME product, not a random one.
+- **Assert every item in a result matches** (category/brand-filter-correctness):
+  ```gherkin
+  When User click [Women] link
+  And User click [Dress] link
+  Then User see all [Result Product Name] contain {{selected_category}}
+  ```
+  `see all [X] contain {{v}}` asserts EVERY matching element contains the value → "all displayed products belong to the selected category/brand", not just one.
+> Cross-screen flows (home → detail/cart): if the target screen is a separate screen, prefer a **flow** (`/sungen:add-flow`) so the journey is one test. On a single screen, keep the cross-screen assertion but tag `@manual` with a `# Deferred to a flow` comment.
+## Repair loop rules
+1. **Budget = 3 rounds.** Re-run `sungen audit` after each repair; track score delta.
+2. **Stop when** `gateStatus == PASS` AND `findings` empty — or budget exhausted.
+3. **Never fake a pass.** A shallow `see [Cart] page` does not satisfy `cart-correctness`. If a gap is genuinely cross-screen or needs capabilities the DSL lacks (e.g. capture an element value to compare elsewhere), **report it as a residual gap / flow item** instead of forcing a green gate.
+4. **EP/data families are OK.** A `duplicates` cluster with `sameDataLikely=false` is an intentional equivalence-partition family (e.g. many invalid-email cases) — keep it; only collapse `sameDataLikely=true` exact duplicates.
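Rules 1 to 3 describe a bounded loop. The sketch below captures that control flow only; `audit` and `repair` are injected stand-ins (assumptions) for running `sungen audit` and for the agent's repair work, and the result shape is hypothetical.

```typescript
interface AuditReport {
  gateStatus: "PASS" | "FAIL";
  findings: string[];
  score: { overall: number };
}

interface LoopResult {
  rounds: number;        // repair rounds actually used
  converged: boolean;    // gate PASS and no findings left
  scoreDeltas: number[]; // overall-score change after each round
  residual: string[];    // findings to report as residual gaps, never hidden
}

function isClean(r: AuditReport): boolean {
  return r.gateStatus === "PASS" && r.findings.length === 0;
}

function runRepairLoop(
  audit: () => AuditReport,
  repair: (findings: string[]) => void,
  budget: number = 3,
): LoopResult {
  let report = audit();
  const scoreDeltas: number[] = [];
  let rounds = 0;
  while (!isClean(report) && rounds < budget) {
    repair(report.findings);
    rounds += 1;
    const next = audit(); // re-run the audit after every repair
    scoreDeltas.push(next.score.overall - report.score.overall);
    report = next;
  }
  return { rounds, converged: isClean(report), scoreDeltas, residual: report.findings };
}
```

When the budget runs out with findings remaining, `converged` is false and `residual` carries what is left, which matches the "never fake a pass" rule: the gap is reported rather than forced green.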
+## Discovery / fallback tree (when input is limited)
+```
+spec.md good enough?           → YES: Spec-first
+  │ NO
+source code available?         → YES: Source-first (mine behavior from code)
+  │ NO
+similar past test cases?       → YES: History-first
+  │ NO
+risky domain + known defects?  → YES: Defect-first
+  │ NO
+→ ask QA; if QA has not responded → OUTPUT with an explicit ASSUMPTION LIST (do not stall)
+```
+See `docs/orchestration-spec.md` for the full flow and `reports/sungen_refactor_spec.md` for the design rationale.

package/src/orchestrator/templates/ai-instructions/claude-skill-tc-generation.md CHANGED

@@ -7,7 +7,7 @@ user-invocable: false
 ## ⚠️ Gotchas — read before generating
 - `spec_figma.md` exists → read file only, **NEVER** call `mcp__figma__*`
-  → PAT auth flow already done by `sungen-figma-source`; re-calling fails or duplicates work.
+  → PAT auth flow already done by `sungen-capture` (mode figma-pat); re-calling fails or duplicates work.
 - `selectors.yaml` → do **NOT** generate — handled by `run-test`
   → Selectors need live DOM inspection via Playwright MCP, only `run-test` triggers it.
@@ -221,6 +221,45 @@ Security:         [S1 – admin only]
 **✅ Good** — see admin notice example above: `Display surfaces` lists every URL spec mentions as output, `Cross-surface rules` maps each admin action to its user-facing outcome, `Inclusive bounds` flags every `<=`/`>=` for BVA. Every item maps to a VP-ID in `Tier 1 output`.
+#### Critical business-viewpoint pre-gate — pass `sungen audit` on the FIRST pass
+> The harness gate FAILS (and forces repair rounds → wasted tokens) when a page-type's critical **business** viewpoints are missing or **shallow**. Generate them correctly the first time. A business-critical `Then` must assert **DATA**, never just `see [X] page/section/modal`.
+**By page-type, generate a DEEP scenario for each (before expanding UI/validation/subscription):**
+| Page-type | Must-cover viewpoints (each with a data assertion) |
+|---|---|
+| **e-commerce list / home** | list-data (card has image+name+price+add) · product-detail-consistency · cart-correctness · category-filter-correctness · **brand-filter-correctness (separate from category)** · add-to-cart success · nav-core |
+| **form** | required-validation · format/boundary · submit-success |
+| **auth** | valid-login · invalid-credential · access-control |
+**Required assertion shapes (use these, not bare visibility):**
+- Card info: assert at **card level** (image+name+price together), e.g. `User see all [Product Card] contain {{...}}` — not `see [Section]` (section-level passes even if one card lacks price).
+- Cross-screen consistency (detail/cart): **capture then compare** —
+  ```gherkin
+  When User remember [Product Name] text as {{selected_product_name}}
+  And User remember [Product Price] text as {{selected_product_price}}
+  And User click [View Product] link
+  Then User see [Detail Product Name] header with {{selected_product_name}}
+  And User see [Detail Product Price] text contains {{selected_product_price}}
+  ```
+  Cross-screen target → tag `@manual` + `# Deferred to a flow (home -> detail)`.
+- Filter result (category AND brand, separately): `Then User see all [Result Product Name] contain {{selected_category}}` — proves EVERY item belongs, not one.
+**Depth is a GATE dimension (harness-roadmap P1) — self-raise, never silently go shallow:**
+- For every data-correctness theme the catalog marks `depth.requires: data-assertion`, emit its `depth.template` shape by **default** — don't wait for the repair loop. `sungen audit` measures `businessDepth` (ratio of these scenarios that assert data) against an intent threshold (functional ≥ 0.70); below it the **gate FAILs**.
+- `depth.cross_screen: true` (cart / detail / filter / brand correctness) → write the deep capture/compare shape but tag `@manual` + `# Deferred to a flow (...)`. These are excluded from the ratio (they're correctly deferred), so they don't hurt depth.
+- **If the spec lacks the concrete value** a deep assertion needs (exact message, price, count): still write the deep shape with a `{{var}}` placeholder and leave a `# SPEC-GAP: <field> value not in spec` comment — do **not** downgrade to `see [X] section`. A visible gap is better than a silent shallow pass.
+- **Blind-Spot Memory:** before finishing, run `sungen blindspot list --prompt` (Bash) and make sure the suite satisfies each recorded pattern (e.g. "for any Add/Create action: check success + resulting data state + duplicate/double-submit"). These are gaps QA hit before — don't repeat them.
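The depth ratio described in the bullets above can be illustrated as follows. This is a sketch under stated assumptions: the scenario shape and function names are hypothetical, the treatment of an empty set is a guess, and only the 0.70 functional threshold and the exclusion of deferred cross-screen scenarios come from the text.

```typescript
interface CriticalScenario {
  name: string;
  assertsData: boolean;    // has a data assertion, not just `see [X] page/section`
  deferredToFlow: boolean; // cross-screen, tagged @manual and deferred to a flow
}

const FUNCTIONAL_DEPTH_THRESHOLD = 0.7;

// Ratio of data-asserting scenarios among those that count toward depth.
function businessDepth(scenarios: CriticalScenario[]): number {
  const counted = scenarios.filter((s) => !s.deferredToFlow);
  if (counted.length === 0) {
    return 1; // assumption: nothing measurable is treated as not shallow
  }
  const deep = counted.filter((s) => s.assertsData).length;
  return deep / counted.length;
}

function depthGatePasses(scenarios: CriticalScenario[]): boolean {
  return businessDepth(scenarios) >= FUNCTIONAL_DEPTH_THRESHOLD;
}

const suite: CriticalScenario[] = [
  { name: "list-data", assertsData: true, deferredToFlow: false },
  { name: "add-to-cart success", assertsData: true, deferredToFlow: false },
  { name: "nav-core", assertsData: false, deferredToFlow: false },
  { name: "cart-correctness", assertsData: true, deferredToFlow: true },
];
// Deferred scenario excluded: 2 deep out of 3 counted, below 0.70.
const suitePasses = depthGatePasses(suite);
```

In this example one shallow scenario out of three counted ones is enough to fail the gate, which is why the text asks for the deep shape by default rather than after a repair round.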
+**First-pass anti-patterns (these are exactly what the gate/reviewer reject — avoid them):**
+- Title↔steps mismatch: e.g. a "no-result state" scenario that clicks a query which **returns** products. Steps must create the condition the title claims.
+- Tautology `Then`: `click [Next Slide]` → `see [Carousel] section` (always visible, proves nothing). Assert the change (new slide title differs).
+- Business-critical scenario ending at `see [Added] modal` / `see [Cart] page` / `see [Category Products] page` with no data assertion.
+- Brand filter covered only as navigation (must assert products belong to the brand).
+**Balance:** cover all the above (deep) BEFORE expanding subscription / UI-presence / extra validation edge cases. Do not over-invest in subscription while cart/detail/filter correctness are shallow.
 #### Tier 1 guard — minimum before writing scenarios
 | Spec section | Minimum requirement | Tag |

package/src/orchestrator/templates/ai-instructions/copilot-cmd-add-flow.md CHANGED

@@ -48,9 +48,9 @@ Record the screen list — you will need it for:
 ### 2. Capture visual source
 Ask: *"Pick a visual source for this flow's screens:"*
-- **Figma designs** (Recommended for pre-launch) — invoke `sungen-capture-figma` skill for each screen
-- **Live page scan** (dev/staging is up) — invoke `sungen-capture-live` skill for each screen URL
-- **Local images** — invoke `sungen-capture-local` skill to load from `requirements/ui/`
+- **Figma designs** (Recommended for pre-launch) — invoke `sungen-capture` skill (mode figma-mcp) for each screen
+- **Live page scan** (dev/staging is up) — invoke `sungen-capture` skill (mode live) for each screen URL
+- **Local images** — invoke `sungen-capture` skill (mode local) to load from `requirements/ui/`
 - **Skip** — user will drop images manually into `requirements/ui/` later
 Each capture skill writes outputs into `qa/flows/${input:flow}/requirements/ui/` and reports back a summary. Do not inline capture logic here — always delegate to the skill.

package/src/orchestrator/templates/ai-instructions/copilot-cmd-add-screen.md CHANGED

@@ -72,7 +72,7 @@ This CLI command automatically:
 ### 1a. Synthesize narrative sections (Figma branch only)
-After `sungen add --figma` succeeds, the envelope of `spec_figma.md` is deterministic but the narrative below the `<!-- SYNTHESIS-BELOW -->` marker is empty. Invoke the `sungen-figma-source` skill:
+After `sungen add --figma` succeeds, the envelope of `spec_figma.md` is deterministic but the narrative below the `<!-- SYNTHESIS-BELOW -->` marker is empty. Invoke the `sungen-capture` skill (mode figma-pat):
 1. Read `qa/screens/${input:screen}/requirements/spec_figma.md` frontmatter for `file_key`, `node_id`, `figma_version_id`.
 2. Read the cached raw node JSON at `.sungen/figma-cache/<file_key>/<figma_version_id>/<safe_node_id>-raw.json` (colons in node_id become underscores).
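The cache path in step 2 can be derived mechanically from the frontmatter fields. A small sketch, with a hypothetical function name; only the path template and the colon-to-underscore rule come from the text.

```typescript
// Builds the raw-node cache path; colons in node_id become underscores.
function figmaRawCachePath(fileKey: string, figmaVersionId: string, nodeId: string): string {
  const safeNodeId = nodeId.replace(/:/g, "_");
  return ".sungen/figma-cache/" + fileKey + "/" + figmaVersionId + "/" + safeNodeId + "-raw.json";
}

// Illustrative values only, not taken from a real Figma file.
const cachePath = figmaRawCachePath("abc123", "v42", "12:345");
// cachePath === ".sungen/figma-cache/abc123/v42/12_345-raw.json"
```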
@@ -89,11 +89,11 @@ After `sungen add --figma` succeeds, the envelope of `spec_figma.md` is determin
 **If Figma branch (Step 1) already downloaded PNGs** → visuals already exist. Offer:
 - **1) Continue** — Figma visuals are enough (Recommended)
-- **2) Also capture live page** — supplement Figma with real page scan (invoke `sungen-capture-live` skill)
+- **2) Also capture live page** — supplement Figma with real page scan (invoke `sungen-capture` skill (mode live))
 **If standard path (no --figma)** → go straight to source selection:
-- **1) Figma design** (Recommended for pre-launch) — invoke `sungen-capture-figma` skill
-- **2) Live page scan** (dev/staging is up) — invoke `sungen-capture-live` skill
+- **1) Figma design** (Recommended for pre-launch) — invoke `sungen-capture` skill (mode figma-mcp)
+- **2) Live page scan** (dev/staging is up) — invoke `sungen-capture` skill (mode live)
 - **3) Skip** — user will drop images manually into `requirements/ui/` later
 Each capture skill writes outputs into `qa/screens/${input:screen}/requirements/ui/` and reports back. Do not inline capture logic here — delegate to the skill so behavior stays consistent with `/sungen-create-test`.

package/src/orchestrator/templates/ai-instructions/copilot-cmd-create-test.md CHANGED

@@ -12,6 +12,8 @@ tools: [vscode, execute, read, agent, edit, search, web, browser, todo, 'playwri
 You are a **Senior QA Engineer**. You structure test cases by viewpoint categories and translate UI into Gherkin test cases following the `sungen-gherkin-syntax` and `sungen-tc-generation` skills. **Tier 1 (critical+high) first** — expand coverage later. **Gherkin scenarios and test data only** — selectors are handled during `/sungen-run-test`.
+**Quality is built in.** After generating, run a **harness loop**: `sungen audit` measures the output and you **repair the findings** until critical viewpoints are covered — the user does not ask for this. Use the `sungen-harness-audit` skill. (`/sungen-design` is an **alias** of this command.)
 ## Parameters
 - **name** — ${input:name:screen or flow name (e.g., login, award-submission)}
@@ -44,35 +46,41 @@ You are a **Senior QA Engineer**. You structure test cases by viewpoint categori
    **Auto-detect visual source** — do NOT ask the user to pick a source. Instead, check what already exists and use it:
    1. If `spec_figma.md` exists → read it as Figma supplement (PAT flow already completed during `add-screen`). Do NOT call any `mcp__figma__*` tool.
    2. If `ui/` has images (`.png`, `.jpg`, etc.) → read them for visual context (layout, element positions, states).
-   3. If neither exists → ask: *"No visual source found. Pick one:"*
-      - **1) Figma PAT** — ask for URL, run `sungen add --screen ${input:name} --figma '<url>'`, then invoke `sungen-figma-source` skill
-      - **2) Figma MCP** — invoke `sungen-capture-figma` skill
-      - **3) Live page scan** — invoke `sungen-capture-live` skill
+   3. If neither exists → ask: *"No visual source found. Pick one:"* — then invoke the **`sungen-capture`** skill with the matching **mode** (read only that mode's file):
+      - **1) Figma PAT** — ask for URL, run `sungen add --screen ${input:name} --figma '<url>'`, then `sungen-capture` **mode figma-pat**
+      - **2) Figma MCP** — `sungen-capture` **mode figma-mcp**
+      - **3) Live page scan** — `sungen-capture` **mode live**
       - **4) Skip** — generate from spec.md only
+   (When `spec_figma.md` exists, that is also `sungen-capture` **mode figma-pat**; when `ui/` images exist, that is **mode local**.)
    **Cross-check**: if both `spec.md` and visual sources exist, flag any discrepancies (missing fields, different labels) before moving on. When `spec_figma.md` is present, follow the Figma supplement rules in `sungen-tc-generation` skill (reading order, Text Inventory, conflict handling).
    Summarize what you found in requirements and present to the user.
 4. Follow the `sungen-tc-generation` skill for section identification, viewpoint generation, and output format. **For flows**, use the "Flow Test Generation" section in the skill. When requirements exist, use the "Requirements-Driven Generation" strategy. **For Tier 1**, apply the **Lightweight Guard** — verify required fields, validation rules, business rules, security checks, and key state transitions all have TCs after generation. **For Tier 2+**, **MUST** apply the full **Mapping Contract** — walk every `spec.md` section top-to-bottom and produce the indicated TCs per Table 1; handle `test-viewpoint.md` per Table 2. Do not silently skip sections. Present sections as a numbered list and let user pick.
 5. Generate or update `.feature` + `test-data.yaml` following `sungen-gherkin-syntax` and `sungen-tc-generation` skills. **For flows**: use `[Screen:Element]` namespace format, namespace test-data by phase, add `@flow` tag.
-6. Show summary and offer next steps based on which tier was just generated:
+5.5. **Quality gate & repair (harness — always run).** Per `sungen-harness-audit`: run `sungen audit --screen ${input:name}` (structural), THEN do an **independent semantic review inline** using the `sungen-reviewer` criteria (does each scenario's steps PROVE its title/viewpoint? observable Thens? business-critical assertion depth?). Merge both sets of issues; if gate FAILs / findings exist, repair (budget 3) and re-audit — GATE missing theme → generate it (cross-screen → write data assertions, tag `@manual`, comment `# Deferred to a flow`); DEPTH → add data assertions; BALANCE → add business-core first; TRACE → align VP ids. Never fake a pass.
+5.6. **Record.** `sungen manifest --screen ${input:name}`. Ledger **each phase** (not just repair) — pick one `runId` at the start and pass it so `trace`/`ledger report` show THIS run, not a mix: `sungen ledger record --screen ${input:name} --run <runId> --step <discovery|viewpoint|gherkin|audit|repair:N> --ms <elapsed>`. On re-run, start with `sungen manifest --screen ${input:name} --diff` and only regenerate changed sections.
+6. **Converge — show the trace.** Run `sungen trace --screen ${input:name}` and present: process map (phases + repair rounds), bottlenecks, **HUMAN-LOOP FOCUS** (@manual to verify), audit score + gate + residual gaps. Then offer next steps based on which tier was just generated:
+   > The harness gate + reviewer already ran above — `/sungen-review` is the independent checkpoint (hand/prompt-authored or pre-delivery), not a needed next step here.
+   **Optional — exploration mode (Loop 2).** The suite above is the deterministic official output. To push past "same output every time", offer the **challenge pass**: run `sungen challenge --screen ${input:name}` (deterministic structural critics), then apply the `sungen-challenge` criteria inline for semantic + novelty candidates. Advisory only — surfaces blind spots + ≤20% novelty candidates, never auto-merges. Record a confirmed recurring miss with `sungen blindspot add` so future runs don't repeat it.
    **After Tier 1 generation:**
-   - **`/sungen-review ${input:name}`** — Review syntax, coverage, viewpoint quality (Recommended)
-   - **`/sungen-run-test ${input:name}`** — Skip review, generate selectors and run tests now
+   - **`/sungen-run-test ${input:name}`** — Generate selectors and run tests now (Recommended)
    - **`/sungen-create-test ${input:name}`** — Expand coverage: add @normal + @low scenarios (Tier 2)
    - **Done for now** — I'll come back later
    **After Tier 2 generation:**
    - **`/sungen-create-test ${input:name}`** — Deep coverage: add BVA combos, cross-field validation, negative inputs, race conditions (Tier 3) (Recommended)
-   - **`/sungen-review ${input:name}`** — Review syntax, coverage, viewpoint quality
    - **`/sungen-run-test ${input:name}`** — Generate selectors and run tests now
    - **Done for now** — I'll come back later
    **After Tier 3 or Full generation:**
-   - **`/sungen-review ${input:name}`** — Review syntax, coverage, viewpoint quality (Recommended)
-   - **`/sungen-run-test ${input:name}`** — Generate selectors and run tests now
+   - **`/sungen-run-test ${input:name}`** — Generate selectors and run tests now (Recommended)
+   - **`/sungen-create-test ${input:name}`** — Add more sections if the screen changed
    - **Done for now** — I'll come back later
 **No selectors.yaml** — selectors are generated during `/sungen-run-test`.

package/src/orchestrator/templates/ai-instructions/copilot-cmd-design.md ADDED

@@ -0,0 +1,13 @@
+---
+name: sungen-design
+description: 'Alias of create-test. Generates test cases AND runs the quality harness (gate + repair). Kept for discoverability; create-test now does this by default.'
+argument-hint: '[screen-name]'
+agent: 'agent'
+tools: [vscode, execute, read, agent, edit, search, web, browser, todo, 'playwright/*']
+---
+## `/sungen-design` is an alias of `/sungen-create-test`
+As of v3.0 the quality harness (discovery → viewpoint overview → generate → **`sungen audit` gate → repair loop** → manifest/ledger) is built into **`/sungen-create-test`** by default — no second command needed for quality.
+**Do exactly what `/sungen-create-test <name>` does** — follow that command verbatim, including the mandatory harness gate & repair step and the `sungen-harness-audit` skill. This entry exists only to keep the `design` name discoverable.

package/src/orchestrator/templates/ai-instructions/copilot-cmd-feedback.md ADDED

@@ -0,0 +1,24 @@
+---
+name: sungen-feedback
+description: 'Record QA feedback locally (test-design knowledge or product telemetry). Auto-attaches context and stores to .sungen/feedback/.'
+argument-hint: '[message]'
+agent: 'agent'
+tools: [vscode, execute, read, search]
+---
+## Role
+Capture QA feedback and store it **locally** (no server) via `sungen feedback record`. Closes the learning loop inside the project; the harness can reuse it later.
+## Steps
+1. Read message from arguments (ask if empty).
+2. **Classify**: `test-design` (viewpoint/scenario wrong/missing/duplicate/add), `product` (Sungen itself misbehaved), or `other`. Infer; confirm only if ambiguous.
+3. **Auto-attach context** (don't make the user repeat): `--screen <name>` (current focus), `--target <ref>` (VP id / scenario / command / artifact), `--decision <accept|reject|edit|add|none>`, `--reason <text>`.
+4. Run:
+   ```bash
+   sungen feedback record --type <type> --screen <name> --target "<ref>" --decision <d> --message "<msg>" --reason "<why>"
+   ```
+5. Confirm what/where (`.sungen/feedback/feedback.jsonl`). It is **local**; cross-project sync is opt-in later.
+6. If feedback implies an action (missing critical viewpoint), offer `sungen-design <name>` (regenerate with gate) or `sungen-add-flow` (cross-screen gap).
+## Notes
+- Never send anywhere — local file only. Keep `product` vs `test-design` distinct. View: `sungen feedback list`.

package/src/orchestrator/templates/ai-instructions/copilot-cmd-review.md CHANGED

@@ -1,48 +1,38 @@
 ---
 name: sungen-review
-description: 'Review test cases for a Sungen screen — validate syntax, score coverage, check viewpoint quality.'
+description: 'Independent quality checkpoint for test cases — runs the harness (audit gate + reviewer criteria + script-check) and presents one unified scorecard. Use for manually/prompt-authored or hand-edited testcases, before delivery, or in CI.'
 argument-hint: '[screen-name]'
 agent: 'agent'
-tools: [vscode, read, edit, search, todo]
+tools: [vscode, execute, read, edit, search, todo]
 ---
 **Input**: Screen or flow name (e.g., `/sungen-review admin-users`).
 ## Role
-You are a **Senior QA Reviewer**. You evaluate Gherkin test cases using the `sungen-tc-review`, `sungen-viewpoint`, and `sungen-gherkin-syntax` skills.
+You are an **independent QA Reviewer** — you did not author these tests. You do **not** invent a parallel score; you run the **harness** and present its signals as a human scorecard. Skills: `sungen-tc-review` (presentation rubric), `sungen-viewpoint`, `sungen-gherkin-syntax`.
-## Parameters
+## When this matters
-- **name** — ${input:name:screen or flow name (e.g., login, award-submission)}
+`/sungen-create-test` already runs the harness gate while generating, so you don't need to review right after it. Run `/sungen-review` when the harness did **not** run or you need an independent sign-off: hand/prompt-authored testcases, a hand-edited `.feature`, **before `/sungen-delivery`**, or in **CI**.
-**Auto-detect context**: check if `qa/flows/<name>/` exists → flow mode (base path: `qa/flows/<name>/`). Else check `qa/screens/<name>/` → screen mode (base path: `qa/screens/<name>/`).
+## Parameters
+- **name** — ${input:name:screen or flow name}
+**Auto-detect context**: `qa/flows/<name>/` → flow, else `qa/screens/<name>/` → screen.
 ## Steps
-1. **Enumerate feature files** — glob `<base>/<name>/features/*.feature`. A screen may have one main file (`<name>.feature`) plus sub-features (`<name>-<sub>.feature` like `awards-modal.feature`); a flow has a single `<name>.feature`. If zero `.feature` files found → `/sungen-create-test` first.
-2. **Review every feature file** — for each `<basename>.feature` discovered in step 1:
-   - Read `<basename>.feature` and the matching `test-data/<basename>.yaml`.
-   - Apply the `sungen-tc-review` skill — score the **7-dimension rubric (100 pts)**: Structure & Format (15), Coverage (30), Assertion Quality (20), Test Data (10), Security & Permission (10), Automation Readiness (10), Maintainability (5). **For flows**, also apply the flow-specific checks (Layer A7 "Tags & Flow"). Use `sungen-viewpoint` for pattern checklists.
-   - Apply the **Unverified Selectors check** — if `<base>/<name>/selectors/<basename>.yaml` exists, count lines matching `@needs-live-verify`. Include in the per-file report as a non-scoring metric. Does NOT affect the score or the PASS threshold.
-3. **Aggregated output** — present scores in a per-feature table, then a screen-level rollup:
+1. **Enumerate** `<base>/${input:name}/features/*.feature`. If none → `/sungen-create-test` first.
+2. **Run the harness (source of truth) — no separate rubric:**
+   - `sungen audit --screen ${input:name}` → gate, business-weighted score, findings, gaps.
+   - Apply the **`sungen-reviewer` criteria inline** → semantic verdict (do steps prove the title? observable Then? business-critical depth? @manual justified?).
+   - `sungen script-check --screen ${input:name}` → spec is 1:1 with the Gherkin (flags hand-edit / stale drift; only if a spec exists).
+3. **Unified scorecard** per feature, anchored on harness signals (the `sungen-tc-review` 7 dimensions are a presentation layer, not a competing score):
    ```
-   Feature              Total  Verdict       Unverified
-   ─────────────────────────────────────────────────────
-   home.feature           88   PASS          0
-   home-modal.feature     64   CONDITIONAL   2
-   ─────────────────────────────────────────────────────
-   Screen rollup (mean)   76   PASS
+   Feature        Gate   Score   Reviewer        Spec 1:1   Verdict
+   home.feature   PASS   8.4/10  2 minor issues  in-sync    PASS
    ```
-   - **>= 70**: PASS that file.
-   - **50–69**: CONDITIONAL — fix before execution.
-   - **< 50**: FAIL — revise & re-review.
-   - "Unverified" = count of `@needs-live-verify` selectors (non-scoring). Show the full per-file report (dimension breakdown, recommendations, top issues) **only for files that are CONDITIONAL or FAIL**, or when the user asks for the deep report.
-4. If any file is CONDITIONAL or FAIL and user confirms → update that file's test cases following `sungen-gherkin-syntax` and `sungen-tc-generation` skills, then re-review **only those files** (skip already-passing ones to save time).
-5. After all files PASS (or user decides to proceed), offer next steps:
-- **`/sungen-run-test ${input:name}`** — Generate selectors, compile, and run tests for **every feature** in this screen (Recommended)
-- **`/sungen-create-test ${input:name}`** — Add more test cases before running
-- **Done for now** — I'll come back later
+   PASS = gate PASS + reviewer clean + spec in-sync. Else CONDITIONAL/FAIL with findings + fixes. Score = audit score adjusted by unresolved reviewer issues — never contradicting the gate.
+4. **Repair (on confirm)** — apply audit findings + reviewer fixes (use `remember`/`see all` per `sungen-harness-audit`), re-run step 2 on the affected file. On drift → `sungen generate` to resync (never hand-edit the spec).
+5. **Trace + next** — `sungen trace --screen ${input:name}` (human-loop focus), then offer:
+   - **`/sungen-run-test ${input:name}`** (Recommended) · **`/sungen-delivery ${input:name}`** · **`/sungen-create-test ${input:name}`** · Done.
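Reviewer sketch of the verdict rule in the new step 3 ("PASS = gate PASS + reviewer clean + spec in-sync"). The template does not say how CONDITIONAL and FAIL are told apart, so the split below (gate FAIL gives FAIL, anything else short of PASS gives CONDITIONAL) is an assumption. So is treating only blocking reviewer issues as "not clean", which matches the sample row where "2 minor issues" still yields PASS. All type and function names are hypothetical.

```typescript
// Hypothetical types; the real harness output format is not shown in the diff.
type Gate = "PASS" | "FAIL";
type SpecSync = "in-sync" | "drift" | "no-spec"; // script-check runs only if a spec exists
type Verdict = "PASS" | "CONDITIONAL" | "FAIL";

interface HarnessSignals {
  gate: Gate;                     // from `sungen audit`
  blockingReviewerIssues: number; // assumption: minor issues are not counted here
  spec: SpecSync;                 // from `sungen script-check`
}

function verdict(s: HarnessSignals): Verdict {
  // Assumption: a failed gate can never be upgraded, so the verdict
  // never contradicts the gate.
  if (s.gate === "FAIL") return "FAIL";
  const reviewerClean = s.blockingReviewerIssues === 0;
  const specOk = s.spec !== "drift"; // a missing spec is treated as not blocking
  return reviewerClean && specOk ? "PASS" : "CONDITIONAL";
}
```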

package/src/orchestrator/templates/ai-instructions/copilot-cmd-run-test.md CHANGED

@@ -45,7 +45,7 @@ Skip when `--env` matches the base locale.
             one browser_snapshot → cross-verify every [Reference] label vs snapshot name →
             generate selectors.yaml (verified entries; explicit YAML for any label≠DOM-name mismatch)
      NO  → spec_figma.md exists in requirements/?
-             YES → provisional flow (sungen-figma-source + sungen-selector-fix skills):
+             YES → provisional flow (sungen-capture mode figma-pat + sungen-selector-fix skills):
                    1. Read filtered Figma node data from spec_figma.md (## Components + ## Text Inventory)
                    2. Apply selector priority from sungen-selector-fix § Step 3 (testid > role+name > label > placeholder > text > locator CSS last)
                    3. Write selectors.yaml — every provisional entry gets this comment on the line above:
@@ -83,6 +83,7 @@ Skip when `--env` matches the base locale.
 6. **Phase 2 — Priority Wave**: Run all `@high` scenarios. Fix only failures from this wave. Max 2 attempts. Shared selectors fixed here cascade to later phases.
 7. **Phase 3 — Full Run**: Run all tests. Fix only **new** failures (elements unique to `@normal`/`@low`). Max 1 attempt. Don't loop on low-priority failures.
 8. **Phase 4 — Regression**: One final full run. Report results. No more fix loops.
+9. **Integrity & trace (always run after the final run).** `sungen script-check --screen <name>` — verify the spec is a **1:1** of the Gherkin; if **DRIFT**, re-run `sungen generate --screen <name>` (never hand-edit the `.spec.ts` — auto-fix edits `selectors.yaml`). Then `sungen ledger record --screen <name> --step run --ms <elapsed>` and `sungen trace --screen <name>` to show the process map + bottlenecks + **HUMAN-LOOP FOCUS**.
 ## Playwright command guidelines

package/src/orchestrator/templates/ai-instructions/copilot-config.md CHANGED

@@ -16,10 +16,7 @@ You generate 3 files for sungen — a Gherkin compiler that produces Playwright
 | `sungen-selector-keys` | YAML key generation from `[Reference]` names, suffixes, lookup priority |
 | `sungen-selector-fix` | Selector generation from live page, auto-fix strategy |
 | `sungen-delivery` | Export Gherkin + Playwright results → CSV test case deliverable |
-| `sungen-capture-figma` | Fetch design context + PNG from a Figma frame URL via Figma Dev Mode MCP |
-| `sungen-capture-local` | Load existing UI assets (screenshots, mockups, Figma exports) from `requirements/ui/` |
-| `sungen-capture-live` | Capture a live running page via Playwright MCP (snapshot + screenshot) |
-| `sungen-figma-source` | Figma URL → spec_figma.md + ui/*.png + provisional selectors |
+| `sungen-capture` | Acquire visual/design context — one skill, 4 modes: figma-mcp (Dev Mode MCP), figma-pat (URL → spec_figma.md), live (Playwright MCP scan), local (images in `requirements/ui/`) |
 | `sungen-locale` | Bootstrap i18n — audit selectors, detect locale switch mechanism, generate test-data overlay |
 ## Workflow (7 AI commands)

package/src/orchestrator/templates/ai-instructions/github-skill-sungen-capture-mode-figma-mcp.md ADDED

@@ -0,0 +1,82 @@
+# Capture mode: figma-mcp
+Pull **structured design data** (layout, typography, colors, component tree, design tokens) and a **PNG screenshot** from a Figma frame URL via the **Figma Dev Mode MCP**, so `sungen-tc-generation` can author Gherkin + test-data before a live domain exists. Use when the project is pre-launch or Figma is the source of truth and the live build lags the design.
+## Prerequisites
+- **Figma MCP server** (`https://mcp.figma.com/mcp`, HTTP transport) connected in `.mcp.json` — `sungen init` scaffolds this. On first use, Claude Code opens a browser for Figma OAuth. Official tools: `get_design_context`, `get_variable_defs`, `get_screenshot`.
+- Figma account signed in with access to the file. **Dev/Full seats** get per-minute rate limits; **Starter/View seats** get monthly tool-call limits.
+- A Figma URL with both **fileKey** and **nodeId**.
+If the MCP is not connected, **do not fail silently** — tell the user:
+> "Figma MCP not detected. Run `sungen init` to scaffold the config, or manually add `figma` with `url: https://mcp.figma.com/mcp` to `.mcp.json`. Then sign in when Claude Code prompts."
+Then stop.
+## Steps
+### 1. Resolve Figma URL
+Prefer in this order:
+1. `Figma URL` field in `requirements/spec.md` (Overview section)
+2. If empty/missing → `AskUserQuestion`: *"Paste the Figma frame URL"* (free text)
+Accept any of these shapes:
+```
+https://www.figma.com/file/<fileKey>/<title>?node-id=<nodeId>
+https://www.figma.com/design/<fileKey>/<title>?node-id=<nodeId>
+https://www.figma.com/proto/<fileKey>/<title>?node-id=<nodeId>
+```
+Parse: `fileKey` = segment after `/file/`, `/design/`, or `/proto/`; `nodeId` = the `node-id` query param (pass `-` or `:` through as-is). If `node-id` is missing, ask the user to select a frame in Figma and copy the **frame URL** (not the file root).
+### 2. Fetch design context
+Call **both** in parallel:
+```
+get_design_context({ fileKey, nodeId })
+get_variable_defs({ fileKey, nodeId })
+```
+`get_design_context` → layout, typography, colors, component structure, spacing. `get_variable_defs` → named design tokens.
+### 3. Fetch screenshot
+```
+get_screenshot({ fileKey, nodeId })
+```
+Save the PNG to `requirements/ui/figma-<sanitized-nodeId>.png` (replace `:` and `-` with `_`, e.g. `42-15` → `figma-42_15.png`).
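Reviewer sketch of the URL parsing in step 1 and the filename rule in step 3, assuming a runtime with the standard `URL` class (Node.js or a browser). The function names `parseFigmaUrl` and `screenshotPath` are hypothetical and do not come from the package. Returning `null` for any unparseable input is also an assumption; the template asks the user for a new URL in that case.

```typescript
interface FigmaRef {
  fileKey: string;
  nodeId: string;
}

// Extract fileKey and nodeId from a /file/, /design/, or /proto/ URL.
// Returns null when either part is missing or the input is not a URL.
function parseFigmaUrl(raw: string): FigmaRef | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null;
  }
  const match = url.pathname.match(/^\/(?:file|design|proto)\/([^/]+)/);
  const fileKey = match?.[1];
  const nodeId = url.searchParams.get("node-id"); // `-` or `:` passed through as-is
  if (!fileKey || !nodeId) return null;
  return { fileKey, nodeId };
}

// Replace `:` and `-` with `_` to build the PNG path described in step 3.
function screenshotPath(nodeId: string): string {
  return `requirements/ui/figma-${nodeId.replace(/[:-]/g, "_")}.png`;
}
```

For example, a `/design/` URL with `node-id=42-15` yields the node id `42-15`, and `screenshotPath("42-15")` gives `requirements/ui/figma-42_15.png`, matching the example in the template.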
+### 4. Write metadata dump
+Combine design context + variables into `requirements/ui/figma-meta.md`:
+```markdown
+# Figma Capture — <nodeId>
+**Source:** <full Figma URL>
+**Captured:** <ISO date>
+## Components
+<component names + variants>
+## Typography
+<font families, sizes, weights, line heights>
+## Colors
+<color tokens + raw hex>
+## Spacing & Layout
+<spacing tokens, auto-layout specs>
+## Text Content
+<visible text strings — used by tc-generation to populate test-data>
+```
+Consumed by `sungen-tc-generation` as a secondary source alongside `spec.md`.
+### 5. Report back
+> Captured Figma frame `<nodeId>`: Components N · Text strings M · Design tokens K · Screenshot `requirements/ui/figma-<sanitized-nodeId>.png` · Metadata `requirements/ui/figma-meta.md`
+Then hand back to the calling command.
+## Error handling
+| Error | Action |
+|---|---|
+| MCP tool not available | Print setup instructions, stop, do not fall back silently |
+| `fileKey` missing from URL | Ask user to paste a valid frame URL |
+| `nodeId` missing from URL | Ask user to right-click a frame in Figma → *Copy link to selection* |
+| `get_design_context` 403 | Ask user to check Dev Mode seat on that file |
+| `get_screenshot` returns no image | Continue with metadata only; warn no PNG was captured |