npm - @sun-asterisk/sungen - Versions diffs - 3.1.2 → 3.2.0-beta.142 - Mend

@sun-asterisk/sungen 3.1.2 → 3.2.0-beta.142

Files changed (290)

package/src/orchestrator/templates/ai-instructions/claude-skill-tc-generation.md CHANGED

@@ -6,6 +6,9 @@ user-invocable: false
 ## ⚠️ Gotchas — read before generating
+- **Write incrementally — never emit the whole suite in one response.** Build the `.feature` in batches via successive `Write`/`Edit` (≈10–15 scenarios per call). For **Full coverage**, write tier-by-tier: `Write` Tier 1 → `Edit` append Tier 2 → `Edit` append Tier 3.
+  → One huge `Write` can exceed the model's output-token cap → `API Error: Claude's response exceeded the N output token maximum`. Single-pass full coverage only fits when `CLAUDE_CODE_MAX_OUTPUT_TOKENS ≥ 64000`; otherwise batch. Batching also lets the audit/reviewer run per batch — higher quality.
 - `spec_figma.md` exists → read file only, **NEVER** call `mcp__figma__*`
   → PAT auth flow already done by `sungen-capture` (mode figma-pat); re-calling fails or duplicates work.
@@ -265,12 +268,26 @@ Security:         [S1 – admin only]
   Then User see [Detail Product Name] header with {{selected_product_name}}
   And User see [Detail Product Price] text contains {{selected_product_price}}
   ```
-  Cross-screen target → tag `@manual` + `# Deferred to a flow (home -> detail)`.
+  Cross-screen target → **automate it in the flow** (`/sungen:add-flow`), NOT as a `@manual` screen copy. A single home→target journey runs as one Playwright test, so it is automatable — "needs another screen" is not a reason for `@manual`. The screen keeps its screen-contract scenarios; the flow owns the cross-screen depth.
 - Filter result (category AND brand, separately): `Then User see all [Result Product Name] contain {{selected_category}}` — proves EVERY item belongs, not one.
 **Depth is a GATE dimension (harness-roadmap P1) — self-raise, never silently go shallow:**
 - For every data-correctness theme the catalog marks `depth.requires: data-assertion`, emit its `depth.template` shape by **default** — don't wait for the repair loop. `sungen audit` measures `businessDepth` (ratio of these scenarios that assert data) against an intent threshold (functional ≥ 0.70); below it the **gate FAILs**.
-- `depth.cross_screen: true` (cart / detail / filter / brand correctness) → write the deep capture/compare shape but tag `@manual` + `# Deferred to a flow (...)`. These are excluded from the ratio (they're correctly deferred), so they don't hurt depth.
+- `depth.cross_screen: true` (cart / detail / filter / brand correctness) → write the deep capture/compare shape as an **automated flow scenario** (in the flow — do NOT leave a full-step `@manual` duplicate on the screen). `@manual` is **only** for genuine judgment (M6 visual/UX · M8 not-worth · M9 human) or a missing capability (M1–M5/M7), and it **must** carry a reason code (`@manual:Mx`, or a reason comment the planner can infer). A `@manual` scenario that still has full automatable steps (a data assertion, no visual/mock/a11y judgment) is now flagged by `sungen audit` as `MANUAL-AUTOMATABLE`, and business-critical scenarios you defer to `@manual` are reported as `DEPTH-DEFERRED` (they do NOT silently inflate `businessDepth`). Deferring automatable work to `@manual` lowers quality — automate it in the flow instead.
+- **Pick the right `@manual:Mx` code — it decides which driver can later automate the case** (`sungen audit` flags a code↔reason mismatch). Tag the code that matches the **oracle the reason describes**:
+  | The reason needs… | Code | Unblocked by |
+  |---|---|---|
+  | a data state you can't make from the UI (empty list, seeded record, missing-image product) | `M1` | data-factory / db |
+  | an **API/DB/persistence** assertion (stored value, parameterized-query / SQLi-safe, server-side effect) | `M2` | **api / db** |
+  | network / fault injection (offline, slow, request failure) | `M3` | mock |
+  | a stable selector / test-id that doesn't exist | `M4` | — (locator contract) |
+  | an external dependency (email, payment gateway, download) | `M5` | mail-file / contract |
+  | visual / UX / responsive / a11y judgment | `M6` | — (keep manual) |
+  | not worth automating · true human judgment | `M8` / `M9` | — (keep manual) |
+  e.g. "submit a payload then check the subscribers **table**" is an API+DB oracle → `@manual:M2` (NOT `M1`); "seed a DB with zero products" is a data state → `M1`; "throttle the network" → `M3`.
+- **Prefer automation-ready `@requires:<cap>` over prose `@manual`.** When you *can* write the steps for a capability-manual case (an API/DB oracle, a seeded state), write it **automation-ready** — the real `@api`/`@query`/… steps tagged `@requires:<cap>` (e.g. `@requires:db @query:subscriber_row`) — instead of a prose `@manual:M2`. It compiles to a skipped-with-reason stub until `sungen capability add <cap>`, then runs as a real test with **no rewrite**. Reserve prose `@manual:Mx` for cases whose steps genuinely can't be expressed (M6/M8/M9 judgment, or a capability with no driver). `sungen audit` reports these as `AUTOMATION-READY-PENDING` (not a gap, not manual).
 - **If the spec lacks the concrete value** a deep assertion needs (exact message, price, count): still write the deep shape with a `{{var}}` placeholder and leave a `# SPEC-GAP: <field> value not in spec` comment — do **not** downgrade to `see [X] section`. A visible gap is better than a silent shallow pass.
 - **Blind-Spot Memory:** before finishing, run `sungen blindspot list --prompt` (Bash) and make sure the suite satisfies each recorded pattern (e.g. "for any Add/Create action: check success + resulting data state + duplicate/double-submit"). These are gaps QA hit before — don't repeat them.
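
The incremental-write rule added in this hunk (roughly 10–15 scenarios per call, first a `Write`, then `Edit` appends) can be pictured with a small sketch. This is only an illustration, not sungen's implementation: `batchScenarios`, `planWrites` and the `WriteStep` type are hypothetical names, and the batch size of 15 is simply the upper end of the range quoted above.

```typescript
// Hypothetical helper: split a list of scenario blocks into batches of at most `size`.
function batchScenarios<T>(scenarios: readonly T[], size = 15): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < scenarios.length; i += size) {
    batches.push(scenarios.slice(i, i + size));
  }
  return batches;
}

// Hypothetical plan shape: the first batch creates the file, later batches append to it.
type WriteStep = { tool: "Write" | "Edit"; count: number };

function planWrites<T>(scenarios: readonly T[], size = 15): WriteStep[] {
  return batchScenarios(scenarios, size).map(
    (batch, index): WriteStep => ({
      tool: index === 0 ? "Write" : "Edit",
      count: batch.length,
    }),
  );
}
```

With 40 scenarios and a batch size of 15 this yields one `Write` followed by two `Edit` steps, so no single response has to carry the whole suite.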

package/src/orchestrator/templates/ai-instructions/claude-skill-viewpoint.md CHANGED

@@ -69,6 +69,20 @@ A screen often matches several patterns at once — a login screen is *both* a f
 - VP-LOGIC = outcome depends on the user's *action* (click, submit, navigate)
 - VP-SEC = checks access control and malicious input
+### Domain category codes — required for the coverage-balance gate
+The 4 viewpoints above are the *generic* axes. On a domain screen, the `VP-<CAT>` code must use the **canonical short code** for what the scenario exercises, so the audit's coverage-balance gate buckets it correctly. Use these exact codes — **never long-form or freeform** (`VP-NAV` not `VP-NAVIGATION`, `VP-SUB` not `VP-SUBSCRIPTION`, `VP-FILTER` not `VP-FILTERING`):
+| Bucket | Codes | Use for |
+|---|---|---|
+| **business-core** | `LIST` · `CART` · `PRODUCT` · `FILTER` · `CHECKOUT` · `ORDER` | the screen's core domain data/actions (product list, cart, checkout, order, filtered results) |
+| presentation | `UI` | layout / visual state |
+| validation-security | `VAL` · `SEC` · `SUB` | input validation · access/injection · subscribe/newsletter |
+| behavior | `LOGIC` | action-driven state changes |
+| navigation | `NAV` | landing on / moving between pages |
+**On a business-core page** (product list, cart, checkout, search results), the core data scenarios MUST carry a **business-core** code (`VP-LIST-*`, `VP-CART-*`, `VP-PRODUCT-*`, …) — not a generic `VP-UI`/`VP-LOGIC` or a freeform `VP-<word>`. A freeform/long-form prefix parses as `NONE`, scores **0 on the balance axis**, and drops the audit score (~9.3 → ~7.7 in practice). Keep `VP-UI/VAL/LOGIC/SEC` for the cross-cutting checks; give the domain scenarios their domain code.
 ---
 ## Shared Checks
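
To make the bucket table in this hunk concrete, here is a minimal sketch of how a `VP-<CAT>-*` id might be mapped to a coverage-balance bucket. It is an assumption-laden illustration: the real audit parser is not shown in this diff, and `bucketFor`, `BUCKET_BY_CODE` and the id regex are hypothetical. Only the codes, the bucket names and the `NONE` fallback come from the text above.

```typescript
type Bucket =
  | "business-core"
  | "presentation"
  | "validation-security"
  | "behavior"
  | "navigation"
  | "NONE";

// Codes and buckets copied from the table in the hunk above.
const BUCKET_BY_CODE = new Map<string, Bucket>([
  ["LIST", "business-core"],
  ["CART", "business-core"],
  ["PRODUCT", "business-core"],
  ["FILTER", "business-core"],
  ["CHECKOUT", "business-core"],
  ["ORDER", "business-core"],
  ["UI", "presentation"],
  ["VAL", "validation-security"],
  ["SEC", "validation-security"],
  ["SUB", "validation-security"],
  ["LOGIC", "behavior"],
  ["NAV", "navigation"],
]);

// Assumed id shape: "VP-<CODE>" optionally followed by "-<suffix>".
function bucketFor(vpId: string): Bucket {
  const code = /^VP-([A-Z]+)(?:-|$)/.exec(vpId)?.[1];
  if (code === undefined) return "NONE";
  // Long-form or freeform codes are not in the map, so they fall through to NONE.
  return BUCKET_BY_CODE.get(code) ?? "NONE";
}
```

Under these assumptions `VP-NAV-01` lands in `navigation`, while `VP-NAVIGATION-01` falls through to `NONE`, which is the failure mode the hunk warns about.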

package/src/orchestrator/templates/ai-instructions/copilot-cmd-create-test.md CHANGED

@@ -18,7 +18,11 @@ You are a **Senior QA Engineer**. You structure test cases by viewpoint categori
 - **name** — ${input:name:screen or flow name (e.g., login, award-submission)}
-**Auto-detect context**: check if `qa/flows/<name>/` exists → flow mode (base path: `qa/flows/<name>/`). Else check `qa/screens/<name>/` → screen mode (base path: `qa/screens/<name>/`).
+**Auto-detect context**: check if `qa/api/<name>/` or `qa/api/flows/<name>/` exists → **API unit mode** (below). Else if `qa/flows/<name>/` → flow mode (base path: `qa/flows/<name>/`). Else `qa/screens/<name>/` → screen mode (base path: `qa/screens/<name>/`).
+## API unit mode (driver-api)
+If the unit is **api-first** (`qa/api/<name>/` or `qa/api/flows/<name>/`), the design loop differs — **no visual capture, no selectors**; the contract is the named-endpoint catalog. **Follow the `sungen-api-design` skill end-to-end** instead of the screen/flow steps: `sungen context --area <name>` (discover) → API viewpoint overview → generate `@api`/`@cases`/flow/`@concurrent`/`@query` scenarios → **`sungen audit --area <name>` gate + reviewer + repair loop to businessDepth ≥ 0.7** → record + trace. Then recommend `/sungen-run-test <name>`. The capture / viewpoint-group / selector steps do **not** apply.
 ## Steps
@@ -26,9 +30,10 @@ You are a **Senior QA Engineer**. You structure test cases by viewpoint categori
    **Screen**: Verify `qa/screens/${input:name}/` exists. If not → `/sungen-add-screen` first.
 2. Check if `.feature` already has scenarios.
    - If yes → summarize existing coverage and ask update mode (options depend on which tiers already exist — see `sungen-tc-generation` skill for details).
-   - If no → fresh creation. Ask generation scope:
-     - **1) Tier 1 — Critical & High priority** — ~10-15 scenarios/section covering happy paths, core validation, security basics **(Recommended)**
-     - **2) Full coverage — All tiers at once** — generates Tier 1 + 2 + 3 in one run. Large output (~40-60 scenarios/section), best for experienced users who want complete coverage immediately
+   - If no → fresh creation. **Write the feature file incrementally** (successive writes/edits, ≈10-15 scenarios per call) — never emit the whole suite in one response, or it can exceed the model's output-token cap (`response exceeded the N output token maximum`). Ask generation scope:
+     - **1) Tier 1 — Critical & High priority** — ~10-15 scenarios/section: happy paths, core validation, security basics **(Recommended)**
+     - **2) Full coverage (incremental)** — Tier 1 + 2 + 3, written tier-by-tier in batches. Safe on any output-token budget.
+     - **3) Full coverage (single pass)** — everything in one go (~40-60 scenarios/section). Faster, but **only if you raised your output cap** (`CLAUDE_CODE_MAX_OUTPUT_TOKENS ≥ 64000`) — otherwise it errors mid-generation. For power users on a high-token model/config.
 3. **Read project context + screen requirements**
    **Project context** — check `qa/context.md` (project root, not screen-specific):
@@ -60,7 +65,7 @@ You are a **Senior QA Engineer**. You structure test cases by viewpoint categori
 4. Follow the `sungen-tc-generation` skill for section identification, viewpoint generation, and output format. **For flows**, use the "Flow Test Generation" section in the skill. When requirements exist, use the "Requirements-Driven Generation" strategy. **For Tier 1**, apply the **Lightweight Guard** — verify required fields, validation rules, business rules, security checks, and key state transitions all have TCs after generation. **For Tier 2+**, **MUST** apply the full **Mapping Contract** — walk every `spec.md` section top-to-bottom and produce the indicated TCs per Table 1; handle `test-viewpoint.md` per Table 2. Do not silently skip sections. Present sections as a numbered list and let user pick.
 5. Generate or update `.feature` + `test-data.yaml` following `sungen-gherkin-syntax` and `sungen-tc-generation` skills. **For flows**: use `[Screen:Element]` namespace format, namespace test-data by phase, add `@flow` tag.
-5.5. **Quality gate & repair (harness — always run).** Per `sungen-harness-audit`: run `sungen audit --screen ${input:name}` (structural), THEN do an **independent semantic review inline** using the `sungen-reviewer` criteria (does each scenario's steps PROVE its title/viewpoint? observable Thens? business-critical assertion depth?). Merge both sets of issues; if gate FAILs / findings exist, repair (budget 3) and re-audit — GATE missing theme → generate it (cross-screen → write data assertions, tag `@manual`, comment `# Deferred to a flow`); DEPTH → add data assertions; BALANCE → add business-core first; TRACE → align VP ids. Never fake a pass.
+5.5. **Quality gate & repair (harness — always run).** Per `sungen-harness-audit`: run `sungen audit --screen ${input:name}` (structural), THEN do an **independent semantic review inline** using the `sungen-reviewer` criteria (does each scenario's steps PROVE its title/viewpoint? observable Thens? business-critical assertion depth?). Merge both sets of issues; if gate FAILs / findings exist, repair (budget 3) and re-audit — GATE missing theme → generate it (cross-screen → **automate it in the flow** via `/sungen:add-flow`, NOT a full `@manual` screen duplicate — `sungen audit` flags an automatable `@manual` as `MANUAL-AUTOMATABLE`; reserve `@manual:Mx` for true judgment/missing-capability); DEPTH → add data assertions; BALANCE → add business-core first; TRACE → align VP ids. Never fake a pass.
 5.6. **Record.** `sungen manifest --screen ${input:name}`. Ledger **each phase** (not just repair) — pick one `runId` at the start and pass it so `trace`/`ledger report` show THIS run, not a mix: `sungen ledger record --screen ${input:name} --run <runId> --step <discovery|viewpoint|gherkin|audit|repair:N> --ms <elapsed>`. On re-run, start with `sungen manifest --screen ${input:name} --diff` and only regenerate changed sections.
 6. **Converge — show the trace.** Run `sungen trace --screen ${input:name}` and present: process map (phases + repair rounds), bottlenecks, **HUMAN-LOOP FOCUS** (@manual to verify), audit score + gate + residual gaps. Then offer next steps based on which tier was just generated:
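
The new auto-detect order in this file (API unit first, then flow, then screen) can be sketched as a small pure function. This is a hedged illustration rather than the tool's code: `detectUnit` and `UnitContext` are hypothetical names, the `exists` predicate is injected so the sketch stays free of filesystem calls, and returning `null` when no directory exists is an assumption (the instructions themselves route that case to `/sungen-add-screen`).

```typescript
type UnitContext = { mode: "api" | "flow" | "screen"; basePath: string };

// `exists` is injected (for example, a wrapper around a filesystem check).
function detectUnit(
  name: string,
  exists: (path: string) => boolean,
): UnitContext | null {
  // API unit mode wins: either an area or an API flow directory.
  for (const apiPath of [`qa/api/${name}/`, `qa/api/flows/${name}/`]) {
    if (exists(apiPath)) return { mode: "api", basePath: apiPath };
  }
  const flowPath = `qa/flows/${name}/`;
  if (exists(flowPath)) return { mode: "flow", basePath: flowPath };
  const screenPath = `qa/screens/${name}/`;
  if (exists(screenPath)) return { mode: "screen", basePath: screenPath };
  return null; // assumption: nothing found, caller decides what to do next
}
```

Keeping the precedence in one place makes it easy to see that a name present under both `qa/api/` and `qa/screens/` resolves to API unit mode.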

package/src/orchestrator/templates/ai-instructions/copilot-cmd-run-test.md CHANGED

@@ -30,7 +30,16 @@ Count 0 → offer the user:
 Skip when `--env` matches the base locale.
-**Auto-detect context**: check if `qa/flows/<name>/` exists → flow mode (base path: `qa/flows/<name>/`). Else check `qa/screens/<name>/` → screen mode (base path: `qa/screens/<name>/`).
+**Auto-detect context**: check if `qa/api/<name>/` or `qa/api/flows/<name>/` exists → **API unit mode** (below). Else if `qa/flows/<name>/` → flow mode (base path: `qa/flows/<name>/`). Else `qa/screens/<name>/` → screen mode (base path: `qa/screens/<name>/`).
+## API unit mode (driver-api) — no selectors
+If the unit is **api-first**, skip every selector/capture phase (an API test has no DOM):
+1. **Resolve the datasource** — `base_url` + auth wired in `qa/datasources.yaml` + `.env.qa` (`${X_URL}` from `sungen api init`); a `production` datasource is refused unless `SUNGEN_ALLOW_PROD=1`.
+2. **Compile**: `npx sungen generate --area <name>` → `specs/generated/api/<name>/`.
+3. **Run**: `npx playwright test specs/generated/api/<name>/<name>.spec.ts`.
+4. **Auto-fix** (use `sungen-error-mapping`): 401/403 → `@hybrid`+`@auth` or `Bearer :token` header (`sungen makeauth`); base_url unresolved → set `${X_URL}`; missing param → trace `{{var}}` to test-data/a prior `@api` response; `expect.status` mismatch → reconcile against `apis.yaml` (re-`generate --area`, never hand-edit the spec); **400 "parameter missing" / body ignored → set `encoding: form` (or `multipart`) on the catalog entry, don't mark @manual**; flaky → self-clean + `@concurrent` caps.
+5. **Integrity + trace** — `sungen script-check --area <name>` (1:1; on DRIFT re-`generate --area`, never hand-edit the spec) + `sungen trace --area <name>` (process map + HUMAN-LOOP FOCUS). Report + offer next steps.
 ## Pre-run (phased — per `sungen-selector-fix` skill)
@@ -84,6 +93,7 @@ Skip when `--env` matches the base locale.
 7. **Phase 3 — Full Run**: Run all tests. Fix only **new** failures (elements unique to `@normal`/`@low`). Max 1 attempt. Don't loop on low-priority failures.
 8. **Phase 4 — Regression**: One final full run. Report results. No more fix loops.
 9. **Integrity & trace (always run after the final run).** `sungen script-check --screen <name>` — verify the spec is a **1:1** of the Gherkin; if **DRIFT**, re-run `sungen generate --screen <name>` (never hand-edit the `.spec.ts` — auto-fix edits `selectors.yaml`). Then `sungen ledger record --screen <name> --step run --ms <elapsed>` and `sungen trace --screen <name>` to show the process map + bottlenecks + **HUMAN-LOOP FOCUS**.
+10. **Capability-pending offer (consent-gated).** If `sungen audit` reports `AUTOMATION-READY-PENDING` (or `@requires:<cap>` tests are skipped "requires …"), offer: *"N scenario(s) are automation-ready — enable `<cap>` to run them? (`sungen capability add <cap>`)"*. Only on the user's yes, run `sungen capability add <cap>` + re-run; on no, leave skipped (not failures, not manual). **Never auto-install.**
 ## Playwright command guidelines
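
Step 1 of the API unit mode above says a `production` datasource is refused unless `SUNGEN_ALLOW_PROD=1`. A minimal sketch of such a guard follows. Everything beyond that one rule is an assumption: the `ApiDatasource` shape, matching production by datasource name, and the error wording are all hypothetical, and the environment is passed in as a plain object so the example does not depend on any runtime.

```typescript
// Hypothetical datasource shape; the real datasources.yaml schema may differ.
interface ApiDatasource {
  name: string;
  kind: "api";
  baseUrl: string;
}

type Env = Record<string, string | undefined>;

// Throws when the datasource is named "production" and the override is not set to "1".
function assertDatasourceAllowed(ds: ApiDatasource, env: Env): void {
  const isProduction = ds.name === "production"; // assumption: matched by name
  if (isProduction && env["SUNGEN_ALLOW_PROD"] !== "1") {
    throw new Error(
      `Refusing to run against "${ds.name}": set SUNGEN_ALLOW_PROD=1 to override.`,
    );
  }
}
```

Failing closed keeps an accidental run from mutating live data; the override is an explicit opt-in rather than a default.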

package/src/orchestrator/templates/ai-instructions/github-skill-sungen-api-design.md ADDED

@@ -0,0 +1,62 @@
+---
+name: sungen-api-design
+description: The API-first design loop for an api unit (qa/api/<area> or qa/api/flows/<flow>) — discover the catalog, lay out the API viewpoints, generate @api/@cases/flow/@concurrent scenarios, then drive the sungen audit --area gate + reviewer + repair to a high businessDepth (≥0.7). Use when create-test/run-test detects an api unit (no selectors, no visual capture).
+---
+# API design loop (driver-api · Orchestration + Harness)
+Use this when the unit is **api-first** — `qa/api/<area>/` or `qa/api/flows/<flow>/`. There are **no selectors and no visual capture**: the contract is the **named-endpoint catalog** (`api/apis.yaml`), referenced by `@api:<name>`. QA writes **no HTTP code**. Full annotation reference: the **API Steps** guide (`@api` / `@cases` / flows / `@concurrent` / `@hybrid`).
+## The loop (mirror of /sungen:design, API-native)
+### 1. Discover (no capture)
+Run `sungen context --area <name>` — it reads the catalog and prints the **endpoints** + the **generation units** (one `matrix` unit per endpoint, an `async` unit per mutating endpoint, a `flow` unit for an api flow). Read `qa/api/<name>/requirements/spec.md` if present. No `apis.yaml` yet? → `sungen api import <openapi|csv>` or `sungen api add --area <name>` first.
+### 2. API viewpoint overview (by method-profile)
+For each endpoint, cover its viewpoints — severity-weighted by method:
+| Profile | Endpoints | Must cover | Then |
+|---|---|---|---|
+| read | GET, HEAD | `contract` (status + body shape) | `pagination`/`filter` (list), `not-found` (by-id) |
+| mutating | POST/PUT/PATCH/DELETE | `contract`, `error` (validation/4xx/auth) | `idempotency` (`@concurrent`), `side-effect` (`@query`) |
+Bands: **~70%** success+failure matrix · **~20%** flows (auth/CRUD chains) · **~10%** async/idempotency.
+### 3. Generate (incremental — never the whole suite in one Write)
+- **Contract**: `@api:<name>` + `expect {{name.status}} is …` **and a body assertion** (`{{name.body.<path>}}`).
+- **Error matrix**: `@api:<name>(p={{p}}) @cases:<dataset>` — one scenario, a dataset of `input → expected status`.
+- **Flow**: ordered `@api` tags threading a prior response (`token={{login.body.token}}` → the catalog `Bearer :token` header; `id={{create.body.id}}` → a path param). Self-clean (delete what you create).
+- **Idempotency**: `@api:<name> @concurrent:N` + `expect {{name.ok_count}} is 1`, cross-checked with `@query` (the DB is the oracle).
+### 4. Gate + repair (always — businessDepth ≥ 0.7 is the bar)
+Run `sungen audit --area <name>`; read `gateStatus` + `findings`. Then the **semantic reviewer** (sungen-reviewer sub-agent, API criteria). Repair **both** (budget 3 rounds), re-audit until PASS:
+| Finding | Repair |
+|---|---|
+| `VIEWPOINT-API-CONTRACT` | the endpoint is invoked but its response is never asserted → add `expect {{name.status}}` + a `{{name.body.…}}` check |
+| `VIEWPOINT-API-ERROR` | a mutating endpoint has no failure scenario → add a `@cases` error matrix (or an explicit 4xx) |
+| `VIEWPOINT-API-IDEMPOTENCY` | a mutating endpoint has no race check → add `@concurrent:N` + a `@query` DB cross-check |
+| `VIEWPOINT-API-MANUAL-AUTOMATABLE` | a `@manual` scenario whose endpoint resolves is automatable → drop `@manual`, use `@api` (+ `@cases`); reserve `@manual` for genuine judgment cases |
+| **`DEPTH-FAIL`** (businessDepth < 0.7) | a **mutating success** scenario asserts only `status` → make it **prove the effect**: assert a response **body** field, a **`@query`** side-effect, or a **`@concurrent` `ok_count`** invariant. (An error/`@cases` scenario proving the status is correct — it is *not* depth-required.) |
+Stop when the gate PASSes + businessDepth ≥ 0.7, or the budget is exhausted → report residual gaps honestly (mark genuinely-unautomatable cases `@manual` with an oracle). Never fake a pass.
+### 5. Record + converge
+`sungen manifest --area <name>` (reuse) and ledger each phase; show the trace + the HUMAN-LOOP FOCUS. (Integrity `script-check`/`trace` for api: see run-test.)
+## Taxonomy (label scenarios correctly)
+| Class | What | Examples |
+|---|---|---|
+| **Functional** | single-endpoint behaviour | happy contract · error/validation (`@cases`) · boundary/edge |
+| **Functional — flow/integration** | multi-endpoint journeys | auth/CRUD lifecycle (`create → login → get → delete`), cross-endpoint invariants |
+| **Non-Functional** | performance · reliability · **security** · concurrency/idempotency | `@concurrent` race/idempotency |
+A flow (`create → login → delete`) is a **Functional integration** test, **not** non-functional — don't file it under "Non-Functional". Reserve non-functional for perf/security/concurrency.
+## Rules
+- **No HTTP, no selectors** — only `.feature` + the reviewed `apis.yaml` + `test-data`.
+- **Non-prod default** — a `production` datasource is refused unless `SUNGEN_ALLOW_PROD=1`.
+- **The DB is the oracle** for idempotency/side-effects — HTTP status alone can lie; pair `@api` with `@query`.
+- **`@parallel` + mutating endpoints** — give each scenario **isolated data** (a `{{$uuid}}` email, a `@cases` row, or its own created resource) and **self-clean** (delete what it created); shared inputs race under parallel execution.
+- **No dead data** — every `test-data` key must be bound into a scenario (`{{key}}`, a `@cases` dataset, or an override). `sungen audit`/the generate lint flag unreferenced keys.
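
The `DEPTH-FAIL` row in the skill above describes `businessDepth` as the share of depth-required scenarios that prove an effect, with a 0.7 bar. The sketch below shows one way that ratio could be computed. It is an interpretation, not the audit's actual algorithm: the `ScenarioSummary` fields, treating only mutating success scenarios as depth-required, and returning 1 for an empty denominator are all assumptions.

```typescript
// Hypothetical per-scenario summary; field names are illustrative only.
interface ScenarioSummary {
  mutating: boolean;        // POST/PUT/PATCH/DELETE
  expectsSuccess: boolean;  // false for error/@cases scenarios
  assertsBody: boolean;     // checks a {{name.body.<path>}} value
  usesQuery: boolean;       // cross-checks with @query
  assertsOkCount: boolean;  // @concurrent ok_count invariant
}

const DEPTH_THRESHOLD = 0.7;

function businessDepth(scenarios: readonly ScenarioSummary[]): number {
  // Assumption: only mutating success scenarios are depth-required.
  const required = scenarios.filter((s) => s.mutating && s.expectsSuccess);
  if (required.length === 0) return 1; // assumption: nothing to prove, no penalty
  const deep = required.filter(
    (s) => s.assertsBody || s.usesQuery || s.assertsOkCount,
  );
  return deep.length / required.length;
}

function depthPasses(scenarios: readonly ScenarioSummary[]): boolean {
  return businessDepth(scenarios) >= DEPTH_THRESHOLD;
}
```

Under this reading, a status-only assertion on a successful create counts against the ratio, while a status-only error-matrix scenario is simply left out of it.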

package/src/orchestrator/templates/ai-instructions/github-skill-sungen-gherkin-syntax.md CHANGED

@@ -213,6 +213,7 @@ Options: `nth` `exact` `scope` `match` `variant` `frame` `contenteditable` `colu
 | `@flow` | Mark feature as E2E flow (cross-screen testing) |
 | `@cases:dataset` | Data-driven: run the scenario once per row of the `dataset` LIST in test-data → one `test()` per row |
 | `@query:name` | Database: run the named query from `database/queries.yaml` (precondition) and bind its rows to `{{name}}`; assert with `expect {{name.count}} …` + path access. Override params `@query:name(p={{v}})`. Repeatable. (Optional Data Driver — see Database verification above) |
+| `@api:name` | API: run the named request from `api/apis.yaml` (precondition) and bind the response to `{{name}}`; assert with `expect {{name.status}} …` + path access (`{{name.body.<path>}}`). Override params `@api:name(p={{v}})`. Repeatable. (Optional API Driver) |
 ### Data-driven scenarios (`@cases`)
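
The new `@api:name` row binds a response to `{{name}}` and asserts through path access such as `{{name.status}}` or `{{name.body.<path>}}`. A minimal sketch of dotted-path resolution is shown here for orientation. It is a guess at the general idea only: `resolvePath` is a hypothetical function, and the real DSL may support syntax (array indices, escaping) that this sketch does not attempt to mirror.

```typescript
// Resolve "{{a.b.c}}" (or plain "a.b.c") against a bag of bound variables.
function resolvePath(bindings: Record<string, unknown>, expr: string): unknown {
  // Strip the surrounding {{ }} if present.
  const path = expr.replace(/^\{\{\s*|\s*\}\}$/g, "");
  let current: unknown = bindings;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Example binding, shaped like the { status, ok, body, headers } result
// that the API driver helper in this diff describes.
const bindings = {
  login: { status: 200, ok: true, body: { token: "abc" }, headers: {} },
};
const token = resolvePath(bindings, "{{login.body.token}}");
```

The same lookup serves both assertions and threading a prior response into a later request, which is how the flow examples in the API design skill pass `token={{login.body.token}}`.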

package/src/orchestrator/templates/ai-instructions/github-skill-sungen-harness-audit.md CHANGED

@@ -58,7 +58,7 @@ Use these when repairing GATE/DEPTH findings for the hard viewpoints (cart/detai
   ```
   `see all [X] contain {{v}}` asserts EVERY matching element contains the value → "all displayed products belong to the selected category/brand", not just one.
-> Cross-screen flows (home → detail/cart): if the target screen is a separate screen, prefer a **flow** (`/sungen:add-flow`) so the journey is one test. On a single screen, keep the cross-screen assertion but tag `@manual` with a `# Deferred to a flow` comment.
+> Cross-screen flows (home → detail/cart): **automate the journey as a flow** (`/sungen:add-flow`) — it runs as one test, so it is automatable. Do **not** keep a full `@manual` duplicate of it on the screen (a non-running dead copy that `sungen audit` flags as `MANUAL-AUTOMATABLE` and that inflates nothing — deferred business-critical is reported as `DEPTH-DEFERRED`). The screen keeps its screen-contract; the flow owns the cross-screen depth. `@manual` is for genuine judgment / missing-capability only, tagged `@manual:Mx`.
 ## Repair loop rules
@@ -66,6 +66,7 @@ Use these when repairing GATE/DEPTH findings for the hard viewpoints (cart/detai
 2. **Stop when** `gateStatus == PASS` AND `findings` empty — or budget exhausted.
 3. **Never fake a pass.** A shallow `see [Cart] page` does not satisfy `cart-correctness`. If a gap is genuinely cross-screen or needs capabilities the DSL lacks (e.g. capture an element value to compare elsewhere), **report it as a residual gap / flow item** instead of forcing a green gate.
 4. **EP/data families are OK.** A `duplicates` cluster with `sameDataLikely=false` is an intentional equivalence-partition family (e.g. many invalid-email cases) — keep it; only collapse `sameDataLikely=true` exact duplicates.
+5. **Advisory findings — surface, don't gate.** `MANUAL-REASON-MISMATCH` → fix the scenario's `@manual:Mx` code (so the planner recommends the right driver) during repair. `CAPABILITY-SUGGESTION` → **present it to the user as a next-step option** (e.g. "N @manual could be automated — `sungen capability add api db`?"), **recommend-only — never auto-install**. Neither fails the gate.
 ## Discovery / fallback tree (when input is limited)
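
Rules 2 and 5 of the repair loop above interact: the loop stops on a clean PASS or an exhausted budget, while the two advisory findings never fail the gate. One possible reading is sketched below. It is hedged throughout: the `AuditReport` shape is hypothetical, treating "findings empty" as "no blocking findings" is an interpretation, and the default budget of 3 is borrowed from the create-test command in this same diff.

```typescript
interface Finding {
  code: string;
}

// Hypothetical report shape; only gateStatus and findings are named in the source.
interface AuditReport {
  gateStatus: "PASS" | "FAIL";
  findings: Finding[];
}

// The two advisory codes named in rule 5.
const ADVISORY_CODES = new Set(["MANUAL-REASON-MISMATCH", "CAPABILITY-SUGGESTION"]);

function blockingFindings(report: AuditReport): Finding[] {
  return report.findings.filter((f) => !ADVISORY_CODES.has(f.code));
}

// Stop when the gate passes with no blocking findings, or when the budget is spent.
function shouldStopRepair(report: AuditReport, roundsUsed: number, budget = 3): boolean {
  const clean =
    report.gateStatus === "PASS" && blockingFindings(report).length === 0;
  return clean || roundsUsed >= budget;
}
```

Advisory findings still need to be surfaced to the user after the loop ends; this sketch only keeps them from prolonging it.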

package/src/orchestrator/templates/ai-instructions/github-skill-sungen-tc-generation.md CHANGED

@@ -6,6 +6,9 @@ user-invocable: false
 ## ⚠️ Gotchas — read before generating
+- **Write incrementally — never emit the whole suite in one response.** Build the `.feature` in batches via successive `Write`/`Edit` (≈10–15 scenarios per call). For **Full coverage**, write tier-by-tier: `Write` Tier 1 → `Edit` append Tier 2 → `Edit` append Tier 3.
+  → One huge `Write` can exceed the model's output-token cap → `API Error: Claude's response exceeded the N output token maximum`. Single-pass full coverage only fits when `CLAUDE_CODE_MAX_OUTPUT_TOKENS ≥ 64000`; otherwise batch. Batching also lets the audit/reviewer run per batch — higher quality.
 - `spec_figma.md` exists → read file only, **NEVER** call `mcp__figma__*`
   → PAT auth flow already done by `sungen-capture` (mode figma-pat); re-calling fails or duplicates work.
@@ -265,12 +268,26 @@ Security:         [S1 – admin only]
   Then User see [Detail Product Name] header with {{selected_product_name}}
   And User see [Detail Product Price] text contains {{selected_product_price}}
   ```
-  Cross-screen target → tag `@manual` + `# Deferred to a flow (home -> detail)`.
+  Cross-screen target → **automate it in the flow** (`/sungen:add-flow`), NOT as a `@manual` screen copy. A single home→target journey runs as one Playwright test, so it is automatable — "needs another screen" is not a reason for `@manual`. The screen keeps its screen-contract scenarios; the flow owns the cross-screen depth.
 - Filter result (category AND brand, separately): `Then User see all [Result Product Name] contain {{selected_category}}` — proves EVERY item belongs, not one.
 **Depth is a GATE dimension (harness-roadmap P1) — self-raise, never silently go shallow:**
 - For every data-correctness theme the catalog marks `depth.requires: data-assertion`, emit its `depth.template` shape by **default** — don't wait for the repair loop. `sungen audit` measures `businessDepth` (ratio of these scenarios that assert data) against an intent threshold (functional ≥ 0.70); below it the **gate FAILs**.
-- `depth.cross_screen: true` (cart / detail / filter / brand correctness) → write the deep capture/compare shape but tag `@manual` + `# Deferred to a flow (...)`. These are excluded from the ratio (they're correctly deferred), so they don't hurt depth.
+- `depth.cross_screen: true` (cart / detail / filter / brand correctness) → write the deep capture/compare shape as an **automated flow scenario** (in the flow — do NOT leave a full-step `@manual` duplicate on the screen). `@manual` is **only** for genuine judgment (M6 visual/UX · M8 not-worth · M9 human) or a missing capability (M1–M5/M7), and it **must** carry a reason code (`@manual:Mx`, or a reason comment the planner can infer). A `@manual` scenario that still has full automatable steps (a data assertion, no visual/mock/a11y judgment) is now flagged by `sungen audit` as `MANUAL-AUTOMATABLE`, and business-critical scenarios you defer to `@manual` are reported as `DEPTH-DEFERRED` (they do NOT silently inflate `businessDepth`). Deferring automatable work to `@manual` lowers quality — automate it in the flow instead.
+- **Pick the right `@manual:Mx` code — it decides which driver can later automate the case** (`sungen audit` flags a code↔reason mismatch). Tag the code that matches the **oracle the reason describes**:
+  | The reason needs… | Code | Unblocked by |
+  |---|---|---|
+  | a data state you can't make from the UI (empty list, seeded record, missing-image product) | `M1` | data-factory / db |
+  | an **API/DB/persistence** assertion (stored value, parameterized-query / SQLi-safe, server-side effect) | `M2` | **api / db** |
+  | network / fault injection (offline, slow, request failure) | `M3` | mock |
+  | a stable selector / test-id that doesn't exist | `M4` | — (locator contract) |
+  | an external dependency (email, payment gateway, download) | `M5` | mail-file / contract |
+  | visual / UX / responsive / a11y judgment | `M6` | — (keep manual) |
+  | not worth automating · true human judgment | `M8` / `M9` | — (keep manual) |
+  e.g. "submit a payload then check the subscribers **table**" is an API+DB oracle → `@manual:M2` (NOT `M1`); "seed a DB with zero products" is a data state → `M1`; "throttle the network" → `M3`.
+- **Prefer automation-ready `@requires:<cap>` over prose `@manual`.** When you *can* write the steps for a capability-manual case (an API/DB oracle, a seeded state), write it **automation-ready** — the real `@api`/`@query`/… steps tagged `@requires:<cap>` (e.g. `@requires:db @query:subscriber_row`) — instead of a prose `@manual:M2`. It compiles to a skipped-with-reason stub until `sungen capability add <cap>`, then runs as a real test with **no rewrite**. Reserve prose `@manual:Mx` for cases whose steps genuinely can't be expressed (M6/M8/M9 judgment, or a capability with no driver). `sungen audit` reports these as `AUTOMATION-READY-PENDING` (not a gap, not manual).
 - **If the spec lacks the concrete value** a deep assertion needs (exact message, price, count): still write the deep shape with a `{{var}}` placeholder and leave a `# SPEC-GAP: <field> value not in spec` comment — do **not** downgrade to `see [X] section`. A visible gap is better than a silent shallow pass.
 - **Blind-Spot Memory:** before finishing, run `sungen blindspot list --prompt` (Bash) and make sure the suite satisfies each recorded pattern (e.g. "for any Add/Create action: check success + resulting data state + duplicate/double-submit"). These are gaps QA hit before — don't repeat them.
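
The `@manual:Mx` table in this hunk is effectively a lookup from reason code to the capability that could later automate the case. A small sketch of that lookup follows. It is illustrative only: `MANUAL_CODES`, `parseManualCode` and the tag regex are hypothetical, the entries are transcribed from the table above, and `M7` is left out because the table does not describe it.

```typescript
type ManualCode = "M1" | "M2" | "M3" | "M4" | "M5" | "M6" | "M8" | "M9";

interface ManualCodeInfo {
  needs: string;
  unblockedBy: string[]; // empty when no driver is listed in the table
  keepManual: boolean;   // true for the judgment codes
}

// Transcribed from the table in the hunk above.
const MANUAL_CODES: Record<ManualCode, ManualCodeInfo> = {
  M1: { needs: "data state not reachable from the UI", unblockedBy: ["data-factory", "db"], keepManual: false },
  M2: { needs: "API/DB/persistence assertion", unblockedBy: ["api", "db"], keepManual: false },
  M3: { needs: "network / fault injection", unblockedBy: ["mock"], keepManual: false },
  M4: { needs: "stable selector / test-id", unblockedBy: [], keepManual: false },
  M5: { needs: "external dependency", unblockedBy: ["mail-file", "contract"], keepManual: false },
  M6: { needs: "visual / UX / responsive / a11y judgment", unblockedBy: [], keepManual: true },
  M8: { needs: "not worth automating", unblockedBy: [], keepManual: true },
  M9: { needs: "true human judgment", unblockedBy: [], keepManual: true },
};

// Extract the code from a tag such as "@manual:M2"; returns null for a bare "@manual".
function parseManualCode(tag: string): ManualCode | null {
  const code = /^@manual:(M\d)$/.exec(tag)?.[1];
  return code !== undefined && code in MANUAL_CODES ? (code as ManualCode) : null;
}
```

A bare `@manual` returning `null` mirrors the rule that the tag must carry a reason code for the planner to recommend a driver.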

package/src/orchestrator/templates/ai-instructions/github-skill-sungen-viewpoint.md CHANGED

@@ -69,6 +69,20 @@ A screen often matches several patterns at once — a login screen is *both* a f
 - VP-LOGIC = outcome depends on the user's *action* (click, submit, navigate)
 - VP-SEC = checks access control and malicious input
+### Domain category codes — required for the coverage-balance gate
+The 4 viewpoints above are the *generic* axes. On a domain screen, the `VP-<CAT>` code must use the **canonical short code** for what the scenario exercises, so the audit's coverage-balance gate buckets it correctly. Use these exact codes — **never long-form or freeform** (`VP-NAV` not `VP-NAVIGATION`, `VP-SUB` not `VP-SUBSCRIPTION`, `VP-FILTER` not `VP-FILTERING`):
+| Bucket | Codes | Use for |
+|---|---|---|
+| **business-core** | `LIST` · `CART` · `PRODUCT` · `FILTER` · `CHECKOUT` · `ORDER` | the screen's core domain data/actions (product list, cart, checkout, order, filtered results) |
+| presentation | `UI` | layout / visual state |
+| validation-security | `VAL` · `SEC` · `SUB` | input validation · access/injection · subscribe/newsletter |
+| behavior | `LOGIC` | action-driven state changes |
+| navigation | `NAV` | landing on / moving between pages |
+**On a business-core page** (product list, cart, checkout, search results), the core data scenarios MUST carry a **business-core** code (`VP-LIST-*`, `VP-CART-*`, `VP-PRODUCT-*`, …) — not a generic `VP-UI`/`VP-LOGIC` or a freeform `VP-<word>`. A freeform/long-form prefix parses as `NONE`, scores **0 on the balance axis**, and drops the audit score (~9.3 → ~7.7 in practice). Keep `VP-UI/VAL/LOGIC/SEC` for the cross-cutting checks; give the domain scenarios their domain code.
 ---
 ## Shared Checks
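
The bucketing rule described in the hunk above can be sketched as a small lookup. This is only an illustration of the table, not the audit's actual parser, which is not part of this diff: the names `BUCKETS` and `bucketOf` are hypothetical, and the exact tag shape (`VP-<CODE>` followed by an optional `-suffix`) is an assumption.

```typescript
// Hypothetical sketch of the coverage-balance bucketing described above.
// The real audit implementation is not shown in this diff.
type Bucket =
  | 'business-core'
  | 'presentation'
  | 'validation-security'
  | 'behavior'
  | 'navigation'
  | 'NONE';

// Canonical short codes, copied from the table in the hunk.
const BUCKETS: Record<string, Bucket> = {
  LIST: 'business-core',
  CART: 'business-core',
  PRODUCT: 'business-core',
  FILTER: 'business-core',
  CHECKOUT: 'business-core',
  ORDER: 'business-core',
  UI: 'presentation',
  VAL: 'validation-security',
  SEC: 'validation-security',
  SUB: 'validation-security',
  LOGIC: 'behavior',
  NAV: 'navigation',
};

// Assumed tag shape: "VP-<CODE>" optionally followed by "-<anything>".
function bucketOf(tag: string): Bucket {
  const match = /^VP-([A-Z]+)(?:-|$)/.exec(tag);
  const code = match?.[1];
  if (!code) return 'NONE';
  // Long-form or freeform codes are not in the table, so they fall through to NONE.
  return BUCKETS[code] ?? 'NONE';
}
```

Under these assumptions a long-form prefix such as `VP-NAVIGATION-001` lands in `NONE`, which matches the hunk's warning that such prefixes score 0 on the balance axis.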

package/src/orchestrator/templates/specs-api.ts ADDED

@@ -0,0 +1,154 @@
+/* eslint-disable */
+/**
+ * Sungen API Driver — runtime helper (auto-generated into specs/api.ts).
+ *
+ * Runs a catalog-defined HTTP request and returns { status, ok, body, headers } — bound to a
+ * `{{name}}` variable by the `@api:<name>` annotation, asserted with `expect {{name.status}} …` /
+ * `{{name.body.<path>}}`. Base URL + auth come from a `kind: api` datasource in datasources.yaml,
+ * with `${VAR}` resolved from .env.qa / process.env — never inline.
+ *
+ * Safety: a datasource flagged `env: production` is refused unless SUNGEN_ALLOW_PROD=1.
+ * DO NOT EDIT — regenerated by `sungen generate`.
+ */
+import * as fs from 'fs';
+import * as path from 'path';
+import { request, type APIRequestContext } from '@playwright/test';
+interface ApiDataSource {
+  kind?: string;
+  base_url?: string;
+  baseUrl?: string;
+  env?: string;
+  headers?: Record<string, string>;
+  timeout_ms?: number;
+}
+function loadEnvQa(): void {
+  for (const name of ['.env.qa', `.env.qa.${process.env.SUNGEN_ENV || ''}`]) {
+    const p = path.join(process.cwd(), name);
+    if (!name.endsWith('.') && fs.existsSync(p)) {
+      for (const line of fs.readFileSync(p, 'utf8').split('\n')) {
+        const m = line.match(/^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$/);
+        if (m && process.env[m[1]] === undefined) process.env[m[1]] = m[2].replace(/^["']|["']$/g, '');
+      }
+    }
+  }
+}
+function loadConfig(): Record<string, ApiDataSource> {
+  loadEnvQa();
+  const file = [path.join(process.cwd(), 'datasources.yaml'), path.join(process.cwd(), 'qa', 'datasources.yaml')].find((f) => fs.existsSync(f));
+  if (!file) throw new Error('API Driver: no datasources.yaml found (project root or qa/).');
+  const raw = fs.readFileSync(file, 'utf8').replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_, k) => process.env[k] ?? '');
+  const { parse } = require('yaml');
+  const doc = parse(raw) || {};
+  return doc.datasources || {};
+}
+function substitute(text: string, params: Record<string, any>): string {
+  return text.replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, (_m, p) => encodeURIComponent(String(params[p] ?? '')));
+}
+class ApiClient {
+  private configs: Record<string, ApiDataSource> | null = null;
+  private cfg(name?: string): { key: string; conf: ApiDataSource } {
+    if (!this.configs) this.configs = loadConfig();
+    const key = name || Object.keys(this.configs).find((k) => (this.configs![k].kind || 'api') === 'api') || Object.keys(this.configs)[0];
+    const conf = this.configs[key];
+    if (!conf) throw new Error(`API Driver: datasource "${key}" not found in datasources.yaml`);
+    if (conf.env === 'production' && process.env.SUNGEN_ALLOW_PROD !== '1') {
+      throw new Error(`API Driver: datasource "${key}" is env: production — refused (set SUNGEN_ALLOW_PROD=1 to override).`);
+    }
+    return { key, conf };
+  }
+  /**
+   * Run a catalog request and return the response. `req` is embedded at compile time; `params` (path
+   * `:id`, JSON body `:fields`, and header `:tokens`) bind at runtime. Catalog `headers` layer over the
+   * datasource headers and may carry `:param` placeholders — e.g. `authorization: "Bearer :token"` with
+   * the dynamic token threaded from a prior response (flow chaining).
+   */
+  async call(
+    label: string,
+    req: { method: string; path: string; body?: unknown; encoding?: 'json' | 'form' | 'multipart'; headers?: Record<string, string>; datasource?: string },
+    params: Record<string, any> = {},
+    opts: { storageState?: string } = {},
+  ): Promise<{ status: number; ok: boolean; body: any; headers: Record<string, string> }> {
+    const { conf } = this.cfg(req.datasource);
+    const base = (conf.base_url || conf.baseUrl || '').replace(/\/$/, '');
+    if (!base) throw new Error(`API Driver: ${label} — datasource has no base_url (set it in .env.qa).`);
+    const urlPath = substitute(req.path, params);   // path params (:id) bind at runtime
+    const headers: Record<string, string> = { ...(conf.headers || {}) };
+    // catalog headers; :param tokens bind at runtime — raw (no URL-encoding, unlike the path)
+    for (const [k, v] of Object.entries(req.headers || {}))
+      headers[k] = String(v).replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, (_m, p) => String(params[p] ?? ''));
+    // Body: substitute `:param` into the body template (object values), then encode per `encoding`.
+    let body: any;
+    if (req.body !== undefined && req.body !== null) {
+      body = JSON.parse(JSON.stringify(req.body).replace(/":([A-Za-z_][A-Za-z0-9_]*)"/g, (_m, p) => JSON.stringify(params[p] ?? null)));
+    }
+    // Map the wire format to the right Playwright option (#345): json → data (application/json,
+    // default), form → form (application/x-www-form-urlencoded), multipart → multipart (form-data).
+    const bodyOpt: Record<string, unknown> = {};
+    if (body !== undefined) {
+      const enc = req.encoding ?? 'json';
+      if (enc === 'form') bodyOpt.form = body;
+      else if (enc === 'multipart') bodyOpt.multipart = body;
+      else bodyOpt.data = body;
+    }
+    // Playwright APIRequestContext: same runner/report/retries as UI tests. @hybrid passes
+    // `storageState` (the @auth role's saved session) so the request shares the browser's
+    // authenticated cookies. Disposed per call so no request context lingers and hangs the process.
+    const ctx: APIRequestContext = await request.newContext({
+      baseURL: base,
+      extraHTTPHeaders: headers,
+      timeout: conf.timeout_ms ?? 15000,
+      ...(opts.storageState ? { storageState: opts.storageState } : {}),
+    });
+    try {
+      const res = await ctx.fetch(urlPath, { method: req.method, ...bodyOpt });
+      const text = await res.text();
+      let parsed: any = text;
+      try { parsed = text ? JSON.parse(text) : null; } catch { /* non-JSON → keep text */ }
+      return { status: res.status(), ok: res.ok(), body: parsed, headers: res.headers() };
+    } finally {
+      await ctx.dispose();
+    }
+  }
+  /**
+   * Fire the same request N times in parallel (the `@concurrent:N` primitive) and bind aggregates —
+   * the idempotency/race oracle. Returns the full `responses` array plus `ok_count`, `status_counts`,
+   * and `statuses`, asserted with `expect {{name.ok_count}} is 1` (and cross-checked against the DB via
+   * `@query` to prove "exactly one charge"). Path access works on the bound value: `{{name.ok_count}}`,
+   * `{{name.status_counts.409}}`, `{{name.responses.count}}`, `{{name.responses[0].body.id}}`.
+   */
+  async callN(
+    label: string,
+    req: { method: string; path: string; body?: unknown; encoding?: 'json' | 'form' | 'multipart'; headers?: Record<string, string>; datasource?: string },
+    params: Record<string, any> = {},
+    n = 1,
+    opts: { storageState?: string } = {},
+  ): Promise<{
+    responses: Array<{ status: number; ok: boolean; body: any; headers: Record<string, string> }>;
+    ok_count: number;
+    status_counts: Record<string, number>;
+    statuses: number[];
+  }> {
+    const count = Math.max(1, Math.floor(n));
+    const responses = await Promise.all(Array.from({ length: count }, () => this.call(label, req, params, opts)));
+    const status_counts: Record<string, number> = {};
+    for (const r of responses) status_counts[String(r.status)] = (status_counts[String(r.status)] || 0) + 1;
+    return {
+      responses,
+      ok_count: responses.filter((r) => r.ok).length,
+      status_counts,
+      statuses: responses.map((r) => r.status),
+    };
+  }
+}
+export const api = new ApiClient();
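
The three `:param` binding rules used by `call` in the file above (URL-encoded in the path, raw in headers, type-preserving in the JSON body) are easy to mix up. The standalone sketch below re-implements them with the same regular expressions for illustration; the helper names `bindPath`, `bindHeader`, and `bindBody` are hypothetical and do not exist in the package.

```typescript
// Illustrative re-implementation of the three binding rules; helper names are hypothetical.
type Params = Record<string, unknown>;

// Path tokens are URL-encoded, mirroring `substitute` in the diff.
function bindPath(path: string, params: Params): string {
  return path.replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, (_m: string, name: string) =>
    encodeURIComponent(String(params[name] ?? '')));
}

// Header tokens are inserted raw (no URL-encoding).
function bindHeader(value: string, params: Params): string {
  return value.replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, (_m: string, name: string) =>
    String(params[name] ?? ''));
}

// Body tokens must be whole string values (":name"); the replacement is the JSON
// form of the parameter, so numbers stay numbers and missing values become null.
function bindBody(template: unknown, params: Params): unknown {
  const json = JSON.stringify(template).replace(
    /":([A-Za-z_][A-Za-z0-9_]*)"/g,
    (_m: string, name: string) => JSON.stringify(params[name] ?? null),
  );
  return JSON.parse(json);
}

const boundPath = bindPath('/users/:id/orders', { id: 'a b/c' });
const boundHeader = bindHeader('Bearer :token', { token: 'abc/123' });
const boundBody = bindBody({ qty: ':qty', coupon: ':coupon' }, { qty: 3 });
```

One consequence worth noting: because the body rule matches only whole quoted values, a token embedded inside a longer string (for example `"id-:id"`) is left untouched by the body substitution.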

package/src/orchestrator/templates/specs-db.ts CHANGED

@@ -21,12 +21,73 @@ const ident = (s: string): string => {
   return s;
 };
+interface SshConfig {
+  host: string;              // jump host reachable from the runner
+  port?: number;             // default 22
+  user: string;
+  private_key?: string;      // PEM contents (from ${VAR} in .env.qa) — preferred for CI
+  private_key_path?: string; // or a filesystem path (local dev)
+  passphrase?: string;       // for an encrypted key
+  known_host?: string;       // base64 of the server's host key to pin (optional; else warn-and-proceed)
+}
 interface DataSourceConfig {
   engine: 'postgres' | 'mysql' | 'sqlite';
   url: string;
   readonly?: boolean;
   statement_timeout_ms?: number;
   max_rows?: number;
+  // Option B (fallback): tunnel the DB SOCKET through an SSH bastion. DB-only: the browser/E2E
+  // still run on the runner; only PG traffic crosses. See docs/spec/sungen_data_driver_ssh_tunnel_spec.md.
+  ssh?: SshConfig;
+}
+/**
+ * Open a local TCP forward (127.0.0.1:<ephemeral> → ssh bastion → dstHost:dstPort) for a DB socket.
+ * Sockets are unref()'d so a dangling tunnel never keeps the test process alive after the run.
+ */
+async function openSshTunnel(ssh: SshConfig, dstHost: string, dstPort: number): Promise<{ host: string; port: number; close: () => void }> {
+  const { Client } = require('ssh2');
+  const net = require('net');
+  const privateKey = ssh.private_key
+    ? ssh.private_key
+    : ssh.private_key_path
+      ? fs.readFileSync(ssh.private_key_path.replace(/^~(?=\/)/, process.env.HOME || ''), 'utf8')
+      : undefined;
+  if (!privateKey) throw new Error('Data Driver: datasource `ssh` requires `private_key` or `private_key_path`.');
+  const conn = new Client();
+  await new Promise<void>((resolve, reject) => {
+    conn.on('ready', resolve).on('error', reject).connect({
+      host: ssh.host,
+      port: ssh.port ?? 22,
+      username: ssh.user,
+      privateKey,
+      passphrase: ssh.passphrase,
+      hostVerifier: (key: Buffer) => {
+        const got = Buffer.isBuffer(key) ? key.toString('base64') : String(key);
+        if (ssh.known_host) {
+          if (got === ssh.known_host.trim()) return true;
+          throw new Error(`Data Driver: SSH host-key mismatch for ${ssh.host} — refused (known_host pin).`);
+        }
+        console.warn(`Data Driver: SSH host key for ${ssh.host} is not pinned (set datasource ssh.known_host to verify). Proceeding (TOFU).`);
+        return true;
+      },
+    });
+  });
+  const server = net.createServer((sock: any) => {
+    conn.forwardOut(sock.remoteAddress || '127.0.0.1', sock.remotePort || 0, dstHost, dstPort, (err: any, stream: any) => {
+      if (err) { sock.destroy(); return; }
+      sock.pipe(stream).pipe(sock);
+    });
+  });
+  await new Promise<void>((resolve, reject) => server.on('error', reject).listen(0, '127.0.0.1', () => resolve()));
+  const addr = server.address();
+  const port = addr && typeof addr === 'object' ? addr.port : 0;
+  server.unref();                          // don't keep the event loop alive after tests
+  try { (conn as any)._sock?.unref?.(); } catch { /* best-effort */ }
+  return { host: '127.0.0.1', port, close: () => { try { server.close(); } catch {} try { conn.end(); } catch {} } };
 }
 function loadEnvQa(): void {
@@ -64,6 +125,7 @@ type Engine = { query(sql: string, params: any[]): Promise<any[]>; };
 class DataSource {
   private configs: Record<string, DataSourceConfig> | null = null;
   private engines = new Map<string, Engine>();
+  private tunnels: Array<{ close: () => void }> = [];
   private cfg(name?: string): { key: string; conf: DataSourceConfig } {
     if (!this.configs) this.configs = loadConfig();
@@ -79,10 +141,19 @@ class DataSource {
     if (!conf.url) throw new Error(`Data Driver: datasource "${key}" has no url (set it in .env.qa).`);
     let engine: Engine;
     if (conf.engine === 'postgres') {
+      let connectionString = conf.url;
+      if (conf.ssh) {                                   // Option B: tunnel the DB socket through a bastion
+        const u = new URL(conf.url);
+        const t = await openSshTunnel(conf.ssh, u.hostname, Number(u.port || 5432));
+        this.tunnels.push(t);
+        u.hostname = t.host; u.port = String(t.port);   // rewrite host:port → 127.0.0.1:<tunnel> (keep user/pass/db/query)
+        connectionString = u.toString();
+      }
       const { Pool } = require('pg');
-      const pool = new Pool({ connectionString: conf.url, max: 2, statement_timeout: conf.statement_timeout_ms ?? 4000 });
+      const pool = new Pool({ connectionString, max: 2, statement_timeout: conf.statement_timeout_ms ?? 4000 });
       engine = { query: async (sql, params) => (await pool.query(sql, params)).rows };
     } else if (conf.engine === 'sqlite') {
+      if (conf.ssh) console.warn(`Data Driver: datasource "${key}" sets ssh: but engine is sqlite (file-based) — ssh ignored.`);
       const Database = require('better-sqlite3');
       const db = new Database(conf.url.replace(/^sqlite:/, ''), { readonly: conf.readonly !== false });
       engine = { query: async (sql, params) => db.prepare(sql).all(...params) };
@@ -93,6 +164,12 @@ class DataSource {
     return { engine, conf };
   }
+  /** Close any open SSH tunnels (optional explicit teardown; tunnels are unref'd so the process exits regardless). */
+  close(): void {
+    for (const t of this.tunnels) t.close();
+    this.tunnels = [];
+  }
   private build(table: string, filter: Record<string, any>): { sql: string; params: any[] } {
     const cols = Object.keys(filter);
     const where = cols.map((c, i) => `${ident(c)} = $${i + 1}`).join(' AND ');
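
The connection-string rewrite in the postgres branch above can be shown in isolation. The sketch below assumes a runtime with the WHATWG `URL` global (as in Node.js); the function name `rewriteForTunnel` and the sample URL are hypothetical, and no SSH connection is opened here.

```typescript
// Hypothetical helper isolating the host:port rewrite from the diff; no tunnel is opened.
interface Endpoint { host: string; port: number }

function rewriteForTunnel(dbUrl: string, tunnel: Endpoint): { target: Endpoint; connectionString: string } {
  const u = new URL(dbUrl);
  // The original host:port becomes the forward destination behind the bastion.
  // 5432 is the fallback used in the diff when the URL carries no port.
  const target: Endpoint = { host: u.hostname, port: Number(u.port || 5432) };
  // Point the client at the local end of the tunnel; user, password, database,
  // and query string are left as they were.
  u.hostname = tunnel.host;
  u.port = String(tunnel.port);
  return { target, connectionString: u.toString() };
}

const rewritten = rewriteForTunnel(
  'postgres://qa:secret@db.internal:5432/shop?sslmode=require', // sample URL, not from the source
  { host: '127.0.0.1', port: 54321 },
);
```

Keeping the rewrite at the URL level means the `pg` pool needs no tunnel-specific options: it simply connects to the local forwarded port.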

package/src/orchestrator/templates/specs-test-data.ts CHANGED

@@ -23,7 +23,8 @@ export class TestDataLoader {
    */
   static load(screenName: string, featureName: string): TestDataLoader {
     let baseDir: string;
-    if (screenName.startsWith('flows/')) {
+    if (screenName.startsWith('flows/') || screenName.startsWith('api/')) {
+      // flows/<flow> · api/<area> · api/flows/<flow> → qa/<screenName>/test-data
       baseDir = path.join(process.cwd(), 'qa', screenName, 'test-data');
     } else {
       baseDir = path.join(process.cwd(), 'qa', 'screens', screenName, 'test-data');
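
The directory rule changed in this hunk can be summarized with a simplified sketch. It returns a POSIX-style path relative to the project root instead of calling `path.join(process.cwd(), …)` as the real loader does; the function name `testDataDir` and the sample screen names are hypothetical.

```typescript
// Simplified sketch of the base-directory rule; the real code uses path.join with process.cwd().
function testDataDir(screenName: string): string {
  // flows/<flow>, api/<area>, and api/flows/<flow> live directly under qa/.
  const isTopLevel = screenName.startsWith('flows/') || screenName.startsWith('api/');
  const parts = isTopLevel
    ? ['qa', screenName, 'test-data']
    : ['qa', 'screens', screenName, 'test-data']; // plain screens stay under qa/screens/
  return parts.join('/');
}

const flowDir = testDataDir('flows/checkout');
const apiFlowDir = testDataDir('api/flows/signup');
const screenDir = testDataDir('login');
```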

package/dist/generators/test-generator/patterns/assertion-patterns.d.ts DELETED

@@ -1,7 +0,0 @@
-import { StepPattern } from './types';
-/**
- * Assertion patterns: visibility, text content, state, attributes
- * Uses template engine for framework-agnostic code generation
- */
-export declare const assertionPatterns: StepPattern[];
-//# sourceMappingURL=assertion-patterns.d.ts.map

package/dist/generators/test-generator/patterns/assertion-patterns.d.ts.map DELETED

@@ -1 +0,0 @@
-{"version":3,"file":"assertion-patterns.d.ts","sourceRoot":"","sources":["../../../../src/generators/test-generator/patterns/assertion-patterns.ts"],"names":[],"mappings":"AACA,OAAO,EAAE,WAAW,EAAoB,MAAM,SAAS,CAAC;AAExD;;;GAGG;AACH,eAAO,MAAM,iBAAiB,EAAE,WAAW,EA2qB1C,CAAC"}