npm - qfai - Versions diffs - 1.8.2 → 1.8.3 - Mend

qfai 1.8.2 → 1.8.3

Files changed (39)

package/README.md CHANGED

@@ -52,13 +52,16 @@ npx qfai report
     (`.qfai/review/review-*/summary.json` + minimum schema), writes `.qfai/report/validate.json`,
     and appends run logs to `.qfai/report/run-*/`; use `--fail-on error` (or `--fail-on warning`) to turn it into a CI gate,
     and `--format github` to emit GitHub-friendly annotations.
-    Use `--phase refinement` only for local refinement checks; CI should use default/full validation.
+    Use `--profile discussion|sdd|prototyping|atdd|tdd|verify` for local skill-owned checks; CI should use default/full validation (or `verify` / `tdd` for the dedicated CI gates).
 - `npx qfai report`
   - Produces a human-readable report (`report.md` by default) or an internal JSON export (`report.json`) from `validate.json`; use `--base-url` to link file paths in Markdown to your repository viewer.
 - `npx qfai doctor`
   - Diagnoses configuration discovery, path resolution, glob scanning, and `validate.json` inputs before running validate/report; use `--fail-on` to enforce failures in CI.
     Note: prototyping evidence (`.qfai/evidence/prototyping.json`) is produced by the AI workflow / skills
-    (`/qfai-prototyping` with `mode=full-harness` for supported UI surfaces only), not by a general-purpose end-user CLI flow.
+    (`/qfai-prototyping` — any mode; modes differ only in `maxCycles`, see spec-0012), not by a general-purpose end-user CLI flow.
+    Use `npx qfai prototyping round-start --round <r5|r3|r2|r1> --candidates <csv> --target-url <url> --mode <mode>`
+    to generate the round-scoped review bundle and command plans the AI evaluator sub-agent consumes, then use
+    `round-harvest`, `round-narrow`, `round-absorb`, and `round-reimplement-verify` to advance the candidate funnel.
     `qfai validate` consumes the resulting evidence files, including `mode.effective` and `fullHarness` metadata when present.
     Traceability refs inside prototyping evidence must use repo-root-relative concrete artifact refs (for example `.qfai/specs/spec-0001/01_Spec.md#L3` or `.qfai/evidence/render.json#/screens/0`).
     Absolute paths are invalid. The same strict ref grammar is enforced for top-level and leaf evidence-bearing fields, including
@@ -112,7 +115,9 @@ QFAI includes a small set of custom skills (stored under `.qfai/assistant/skills
   as 15 required markdown files under `.qfai/discussion/discussion-<ts>/`.
   UI-bearing discussion packs may include `prototyping.yaml` as an optional recommendation artifact; non-ui discussion packs typically omit it.
 - **qfai-sdd**: Unified SDD entrypoint with discussion-pack preflight guard (missing/incomplete/blocking OQ causes stop + next action guidance).
-- **qfai-prototyping**: Build a contract-aligned UI prototype under the `full-harness` only / UI-only contract, with calibration-pack SSOT and screen-level Browser QA evidence.
+- **qfai-prototyping**: Build a contract-aligned UI prototype using the Playwright CLI + AI
+  evaluator harness (spec-0012). Modes (`low-cost`/`standard`/`full-harness`) run the same
+  strictest review cycle; only `maxCycles` (1/3/20) differs.
 - **qfai-atdd**: Implement acceptance tests driven by specs/scenarios.
 - **qfai-implement**: Unified TDD micro-cycle (Red/Green/Refactor) one test at a time using `test-list.md` as the execution ledger, including ledger status updates and exception closure.
 - **qfai-verify**: Run full-scan local quality gates (`validate --fail-on error`, `report`, repo gates) and produce reviewer-approved evidence under `.qfai/evidence/`.
@@ -286,7 +291,7 @@ npx qfai validate --fail-on error
 Recommended baseline.
-- Keep CI on default/full validation (`qfai validate --fail-on error`); do not use `--phase refinement` in CI.
+- Keep CI on default/full validation (`qfai validate --fail-on error` or `qfai validate --profile verify --fail-on error`); do not use partial profiles in CI.
 - Keep `pnpm check-types:future` as a separate mandatory gate so future TS compatibility runs once without duplicating `pnpm ci:gate`.
 - Add a report step (`npx qfai report`) when you need a human-readable artifact.
 - Tune traceability globs in `qfai.config.yaml` to match your test layout.
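
The `round-start` invocation added to the README above takes four flags. As a minimal sketch, the helper below assembles that command line as an argv list and rejects round or mode values the README does not list. The function name, the candidate ids, and the target URL are hypothetical; only the subcommand, flag names, round ids, and mode names come from the diff.

```python
from typing import List

# Values copied from the README text in this diff.
ROUNDS = ("r5", "r3", "r2", "r1")
MODES = ("low-cost", "standard", "full-harness")


def round_start_argv(
    round_id: str,
    candidates: List[str],
    target_url: str,
    mode: str = "standard",
) -> List[str]:
    """Build the argv list for `npx qfai prototyping round-start` (hypothetical helper)."""
    if round_id not in ROUNDS:
        raise ValueError(f"unknown round: {round_id!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not candidates:
        raise ValueError("at least one candidate id is required")
    return [
        "npx", "qfai", "prototyping", "round-start",
        "--round", round_id,
        "--candidates", ",".join(candidates),  # the README describes this as a CSV
        "--target-url", target_url,
        "--mode", mode,
    ]


if __name__ == "__main__":
    # Candidate ids and URL below are placeholders for illustration only.
    argv = round_start_argv("r5", ["a", "b", "c", "d", "e"], "http://localhost:3000")
    print(" ".join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would execute it; the sketch stops at building the arguments so it has no dependency on an installed `qfai`.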

package/assets/init/.qfai/assistant/agents/product-experience-architect.md CHANGED

@@ -24,7 +24,8 @@
 - .github/instructions/principles.instructions.md
 - .instruction/00_universal/development-principles-checklist.md
 - .instruction/01_specialties/design.md
-- Exploration brief, reference pool, evaluation rubric, evaluator calibration, selected direction, finalized design system, screen contracts, optional tokens, optional fallback HTML/CSS mock, and Mermaid flows
+- Exploration brief, reference pool, evaluation rubric, evaluator calibration, selected direction, finalized design system,
+  screen contracts, optional tokens, optional fallback HTML/CSS mock, and Mermaid flows
 - Runtime screenshots or rendered evidence when available
 ## Deliverables

package/assets/init/.qfai/assistant/skills/qfai-atdd/SKILL.md CHANGED

@@ -111,7 +111,7 @@ Use the shared schema.
 - ATDD-specific reviewer checks:
   - coverage obligations met: E2E covers `US`, Integration covers `TC`, API covers `CON-API`;
   - Coverage Depth Matrix is reviewed and no unjustified `X` cells remain;
-  - validation evidence exists and `qfai validate --fail-on error` passes;
+  - validation evidence exists and `qfai validate --profile atdd --fail-on error` passes;
   - Drift Protocol is enforced;
   - test-layer policy is checked against `.qfai/assistant/steering/test-layers.md`;
   - coverage floors and ratios are signals, not gates;
@@ -146,7 +146,7 @@ Follow `.qfai/assistant/instructions/shared-skill-operating-baseline.md#delta-re
 - Do NOT declare completion based on unit/component tests.
 - `10_Plan.md` is the primary How SSOT for execution phases.
 - If `10_Plan.md` is missing, stop and run owner planning flow before proceeding.
-- Completion gate is validation with zero errors (`qfai validate --fail-on error`).
+- Completion gate is validation with zero errors (`qfai validate --profile atdd --fail-on error`).
 - Coverage obligations are mandatory:
   - `tests/e2e/**` must cover all required `US-*`.
   - `tests/integration/**` must cover all required `TC-*`.
@@ -219,7 +219,7 @@ Notes:
 - All required `US` are covered by E2E tests.
 - All required `TC` are covered by integration tests.
 - All required `CON-API` are covered by API tests.
-- Validation passes: `qfai validate --fail-on error`.
+- Validation passes: `qfai validate --profile atdd --fail-on error`.
 - Repository quality gates (format/lint/type/tests/pack) pass with evidence.
 - Evidence file exists and includes work orders + reviewer notes.
 - Completion is approved by a reviewer who did not implement tests.
@@ -318,7 +318,7 @@ Before declaring completion:
 2. Run:
    ```bash
-   qfai validate --fail-on error
+   qfai validate --profile atdd --fail-on error
    ```
 3. Run repository standard gates:

package/assets/init/.qfai/assistant/skills/qfai-configure/SKILL.md CHANGED

@@ -88,7 +88,7 @@ Use the shared schema.
 - Follow `.qfai/assistant/instructions/shared-skill-delegation-baseline.md#reviewer-gate-baseline`.
 - Reviewer checks:
   - required roles were delegated;
-  - validate evidence exists: `qfai validate --fail-on error` completed with `error=0`;
+  - doctor evidence exists: `qfai doctor --fail-on error` completed without failing checks;
   - Drift Protocol enforced;
   - test-layer policy enforced against `.qfai/assistant/steering/test-layers.md`;
   - tool-count heuristics are signals, not gates.

package/assets/init/.qfai/assistant/skills/qfai-discussion/SKILL.md CHANGED

@@ -82,6 +82,7 @@ Before declaring completion, you MUST:
 - ensure every deferred item has full metadata in `13_Deferred.md`;
 - ensure `02_Inception-Deck.md` and `03_Story-Workshop.md` include Mermaid diagrams;
 - ensure the UI-bearing sidecar family is complete;
+- run `qfai validate --profile discussion --fail-on error` and fix discussion-owned findings;
 - avoid selecting a single visual winner in discussion artifacts.
 ### Reviewer Gate (MUST)

package/assets/init/.qfai/assistant/skills/qfai-discussion/references/rcp_footer.md CHANGED

@@ -28,7 +28,7 @@
 ## Validate Hard Gate (Required)
-- `qfai validate --fail-on error --format github` has been run in each review cycle
+- `qfai validate --profile discussion --fail-on error --format github` has been run in each review cycle
 - `.qfai/report/validate.log` exists and corresponds to the latest artifacts
 ---

package/assets/init/.qfai/assistant/skills/qfai-implement/SKILL.md CHANGED

@@ -289,6 +289,7 @@ Completion MUST NOT be declared when any of the following are true:
 - Items with `todo`, `red`, `green`, or `refactor` status still exist (for spec-level completion)
 - Parallel slices were used but integration verify has not been run post-merge
 - Checkpoint boundary was reached but verification was not executed
+- `it.todo(...)` / `test.todo(...)` / `describe.todo(...)` stubs remain in any file covered by `validation.traceability.testFileGlobs` (`QFAI-TEST-001`, spec-0012). Implement the body or delete the stub — an opt-out via `validation.testStrategy.forbidTestTodoStubs: false` is permitted only with an accompanying waiver DR-ID.
 ## Evidence (MANDATORY)
@@ -335,6 +336,7 @@ Each TDD item MUST have fresh evidence containing at minimum:
 - [ ] No backward transitions occurred.
 - [ ] Exception items have DR-IDs recorded.
 - [ ] All tests pass.
+- [ ] `qfai validate --profile tdd --fail-on error` passes with zero `QFAI-TEST-001` findings (no `it.todo` / `test.todo` / `describe.todo` stubs remain; spec-0012).
 ## Completion Checklist (MUST)
@@ -349,7 +351,7 @@ Each TDD item MUST have fresh evidence containing at minimum:
 When this skill is complete, provide a final user-facing completion message and enumerate all actionable next steps.
 - Verify gates: `/qfai-verify`.
-  Action: run `qfai validate --fail-on error` and confirm all gates pass.
+  Action: run `qfai validate --profile tdd --fail-on error` for this skill, then `/qfai-verify` for full-scan approval.
 - Spec updates needed: `/qfai-sdd`.
   Action: update spec artifacts if implementation revealed scope changes.
 - Acceptance tests: `/qfai-atdd`.
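
The new completion rule above forbids leftover `it.todo(...)` / `test.todo(...)` / `describe.todo(...)` stubs. A rough local pre-check can be sketched as a line-based scan. This is an illustrative heuristic, not the `QFAI-TEST-001` implementation: the function name and regex are assumptions, and a plain regex will also flag matches inside comments or string literals.

```python
import re
from typing import List, Tuple

# Matches `it.todo(`, `test.todo(`, `describe.todo(`.
# The leading \b keeps identifiers such as `submit.todo(` from matching.
TODO_STUB = re.compile(r"\b(?:it|test|describe)\.todo\s*\(")


def find_todo_stubs(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) for every line containing a todo stub."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if TODO_STUB.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        'describe("cart", () => {',
        '  it.todo("applies discount");',
        '  it("adds an item", () => {});',
        "});",
    ])
    for number, text in find_todo_stubs(sample):
        print(f"{number}: {text}")
```

Running the scan over the files matched by `validation.traceability.testFileGlobs` would approximate the rule's scope; `qfai validate --profile tdd --fail-on error` remains the actual gate.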

package/assets/init/.qfai/assistant/skills/qfai-prototyping/SKILL.md CHANGED

@@ -30,18 +30,28 @@ Do not rely on a CLI entrypoint or package runtime loop.
 ## CRITICAL CONSTRAINTS (Read First)
 - Scope is all specs from `.qfai/specs/spec-*`.
-- Screenshot evidence and HTML snapshot evidence are mandatory.
-- Screenshot evidence path: `.qfai/evidence/prototyping/screenshots/<screen-id>.png`
-- HTML snapshot path: `.qfai/evidence/prototyping/html/<screen-id>.html`
-- If either screenshot or HTML is missing for a declared screen, that screen scores `0` and the run is incomplete.
-- Optional evidence is abolished. Missing mandatory evidence must trigger rerun, not waiver.
-- DONE is forbidden until `qfai validate --fail-on error` passes and `/qfai-verify` can approve the run.
+- The AI evaluator sub-agent performs visual evaluation. QFAI does not score visual quality. (spec-0012)
+- Playwright CLI (`playwright-cli`) is the sole standard browser tool. Playwright MCP, Node Playwright direct invocation, and screenshot-capture shell scripts are not used. (spec-0012)
+- QFAI pre-assigns evidence paths. The evaluator MUST use the paths in the command plan (`review-bundle.json` → `command-plans.json`); it MUST NOT invent paths.
+- For every declared screen and every active candidate in every round, 4 evidence artifacts are mandatory:
+  - screenshot: `.qfai/evidence/prototyping/rounds/<round>/candidates/<candidate-id>/<screen-id>.png`
+  - HTML: `.qfai/evidence/prototyping/rounds/<round>/candidates/<candidate-id>/<screen-id>.html`
+  - accessibility snapshot: `.qfai/evidence/prototyping/rounds/<round>/candidates/<candidate-id>/<screen-id>.snapshot.txt`
+  - command log: `.qfai/evidence/prototyping/rounds/<round>/candidates/<candidate-id>/<screen-id>.commands.json`
+- Canonical latest screenshot path: `.qfai/evidence/prototyping/screenshots/<screen-id>.png`
+- Canonical latest HTML path: `.qfai/evidence/prototyping/html/<screen-id>.html`
+- Canonical latest paths MUST mirror the latest accepted winner/polish artifacts.
+- If any of the 4 artifacts is missing for a declared screen, the round is incomplete; rerun is mandatory, not waiver.
+- Mode differences are limited to `maxCycles` only (low-cost=1, standard=3, full-harness=20). Every other gate, obligation, reviewer severity, and completion criterion is identical across modes. (spec-0012)
+- DONE is forbidden until `qfai validate --profile prototyping --fail-on error` passes and `/qfai-verify` can approve the run.
 - Supported UI prototyping surfaces are `web`, `mobile`, `desktop`, and `mixed`.
 - `cli`, API-only, backend-only, and `ui_bearing: false` classifications are not prototyping execution targets.
-- `cli` is not supported and is not an execution target for prototyping.
-- Evaluation is performed by sub-agents; machine checks are limited to schema/evidence validation and breakthrough trigger detection.
-- Shared evidence vocabulary includes `render.json`, `browser-qa.json`, `prototyping.json`, and `breakthrough.json`.
-- static-first evidence capture remains mandatory even when interactive review is used.
+- Machine checks are limited to schema/evidence validation, mode invariant enforcement, review-cycle completeness, and breakthrough trigger detection.
+- Shared evidence vocabulary: `prototyping.json`, `review-bundle.json`, `command-plans.json`, `evaluator-reviews/<candidate-id>.json`, `harvest.json`, `absorption-plan.json`, `reimplementation.json`, `breakthrough.json`.
+- Direction funnel completion is not stage completion.
+- Selecting the first winner does not satisfy completion. Completion review is forbidden until at least one post-selection polish cycle has completed.
+- Completion requires every reviewer sub-agent to score every evaluation axis at `100/100`; `95` is not a completion border.
+- Do not use `complete`, `completed`, `done`, or equivalent completion wording in other languages before the completion checklist passes. Use `exploration complete`, `winner selected`, `polishing`, `breakthrough checking`, or `reviewer gate pending` for interim states.
 ## Goal
@@ -50,13 +60,17 @@ Generate multiple design directions, converge on a winner, extract the selected
 ## Surface / Mode
 - surface / mode routing uses `standard` as the default execution path.
-- `standard` is the default when no explicit escalation to `full-harness` is requested.
-- `full-harness` is reserved for explicit escalation and review-heavy obligations.
+- **Mode Invariant (spec-0012)**: modes differ only by `maxCycles`. Review gate, evidence requirements, reviewer severity, best-of-history, breakthrough detection, and completion criteria are identical across modes.
+  - `low-cost`: `maxCycles = 1`
+  - `standard`: `maxCycles = 3` (default)
+  - `full-harness`: `maxCycles = 20`
+- No mode weakens obligations. Choosing a lower mode buys fewer chances to iterate, not a looser gate.
 ## Required References
 Read and follow these references before execution:
+- `.qfai/specs/spec-0012/01_Spec.md` — primary SSOT for the prototyping harness
 - `.qfai/assistant/skills/qfai-prototyping/references/evidence-requirements.md`
 - `.qfai/assistant/skills/qfai-prototyping/references/iteration-cycle.md`
 - `.qfai/assistant/skills/qfai-prototyping/references/l1-review-guide.md`
@@ -73,13 +87,13 @@ All sub-agent delegation in this skill MUST follow the category-to-role mapping
 Assigning a task to a role not listed for the category is a violation and MUST be flagged.
 Evaluation scoring and screenshot capture must use only the allowed roles below.
-| Category              | Allowed Role(s)                                        |
-| --------------------- | ------------------------------------------------------ |
-| UI implementation     | frontend-engineer, product-experience-architect        |
-| Screenshot capture    | devops-ci-engineer                                     |
-| Evaluation scoring    | product-surface-reviewer, product-experience-architect |
-| Build                 | devops-ci-engineer, backend-engineer                   |
-| Breakthrough planning | product-experience-architect, frontend-engineer        |
+| Category                           | Allowed Role(s)                                        |
+| ---------------------------------- | ------------------------------------------------------ |
+| UI implementation                  | frontend-engineer, product-experience-architect        |
+| Playwright CLI execution & capture | product-surface-reviewer, product-experience-architect |
+| Evaluation scoring                 | product-surface-reviewer, product-experience-architect |
+| Build                              | devops-ci-engineer, backend-engineer                   |
+| Breakthrough planning              | product-experience-architect, frontend-engineer        |
 Any delegation map entry that assigns a category to an undefined or unlisted role MUST produce a violation finding naming the undefined role and the category.
@@ -91,7 +105,7 @@ Before any code is written, create an execution plan record in the work evidence
 Required fields:
-- `targetIterations`: integer; minimum 2
+- `targetRounds`: ordered array; default funnel is `["r5", "r3", "r2", "r1"]`
 - `funnelPolicy`: `5->3->2->1`
 - `evaluationAxesSource`: ref to `.qfai/contracts/design/evaluation-rubric.yaml`
 - `delegationMap`: category-to-role assignments per Delegation Scope Table
@@ -139,56 +153,84 @@ Confirm all of the following before any evaluation:
 Generate 5 clearly distinct design directions before selecting a winner.
 Do not begin with a single incumbent direction.
-### Step 4 — Capture Mandatory Evidence
+### Step 4 — Round Start: Prepare Candidate Review Bundle & Command Plans
-For every declared screen and every active direction:
+Before launching the evaluator, prepare the round-scoped artifacts via QFAI (not by hand):
-- capture one screenshot and store it at the canonical screenshot path
-- capture one HTML snapshot and store it at the canonical HTML path
-- record missing evidence immediately; do not continue as if capture succeeded
+- Run `qfai prototyping round-start --round <rN> --candidates <csv> --target-url <url> --mode <mode>`.
+- QFAI produces:
+  - `.qfai/evidence/prototyping/rounds/<rN>/command-plans.json` — the candidate-aware Playwright CLI command plans
+  - `.qfai/evidence/prototyping/rounds/<rN>/review-bundle.json` — the evaluator input bundle (candidates, axisDefs, designSystemChecklist, commandPlanRef)
+- Do not invent evidence paths. Paths are fixed by QFAI per spec-0012.
-### Step 5 — Launch Evaluation Reviewers
+### Step 5 — AI Evaluator Executes the Command Plans and Captures Evidence
-Launch evaluation reviewer sub-agents with the full context bundle:
+For every declared screen of every active candidate in the current round, the AI evaluator sub-agent:
-- screenshots from Step 4
-- HTML snapshots from Step 4
-- `axisDefs` from `.qfai/contracts/design/evaluation-rubric.yaml`
-- `previousScore` from the prior iteration (`null` for iteration 1)
-- `designSystemChecklist` from `.qfai/contracts/design/design-system.yaml`
+1. Reads `command-plans.json` for the round
+2. Runs `playwright-cli goto <url>` for the candidate route
+3. Runs `playwright-cli snapshot --save <candidate-path>/<screen-id>.snapshot.txt`
+4. Performs interaction commands (click/fill) to exercise `primaryTasks` noted in the plan
+5. Runs `playwright-cli screenshot --full-page --save <candidate-path>/<screen-id>.png`
+6. Runs `playwright-cli eval "document.documentElement.outerHTML" > <candidate-path>/<screen-id>.html`
+7. Saves the sequence of executed commands to `<candidate-path>/<screen-id>.commands.json`
-### Step 6 — Direction Funnel
+If any capture step fails, the evaluator records the failure and stops pretending the screen was evaluated. The round is incomplete and must be rerun.
+### Step 6 — Launch Evaluation Reviewers
+Launch evaluation reviewer sub-agents with the full context bundle. Inputs are read from `review-bundle.json`:
+- per-screen screenshot, HTML, accessibility snapshot, and command log under `rounds/<round>/candidates/<candidate-id>/`
+- `axisDefs` (from `.qfai/contracts/design/evaluation-rubric.yaml`)
+- `previousScore` from the prior round when available
+- `designSystemChecklist` (from `.qfai/contracts/design/design-system.yaml`)
+- `commandPlanRef` pointing at `command-plans.json`
+The reviewer writes `rounds/<round>/evaluator-reviews/<candidate-id>.json` with concrete `evidenceRefs[]` for every score. Placeholder refs are rejected.
+### Step 7 — Harvest and Direction Funnel
 Run the mandatory convergence funnel:
-- 5 directions -> top 3
-- top 3 remixed -> top 2
-- top 2 -> selected winner 1
+- `r5`: 5 directions -> top 3
+- `r3`: top 3 remixed -> top 2
+- `r2`: top 2 -> selected winner `r1`
-### Step 7 — Extract Winner Contracts
+At the end of each harvestable round:
+- run `qfai prototyping round-harvest --round <rN>`
+- record survivors with `qfai prototyping round-narrow --round <rN> --survivors <csv>`
+- for `r3|r2|r1`, generate absorption templates with `qfai prototyping round-absorb --round <rN> --survivors <csv>`
+### Step 8 — Extract Winner Contracts
 After the first winner is selected:
 - write `.qfai/contracts/design/selected-direction.yaml`
 - extract `.qfai/contracts/design/design-system.yaml`
-### Step 8 — Polish the Winner
+Selecting the first winner is not completion. Do not start completion review and do not use completion wording until Step 9, Step 10, Step 12, reviewer gate, and the perfect-100 score gate pass.
+### Step 9 — Polish the Winner
 Iterate on the selected winner with normal critique/rework loops.
 Do not assume the latest iteration is automatically best; keep best-of-history in evidence.
+At least one full post-selection polish loop is mandatory. Each polish loop must include critique, fix, re-capture, re-review, and breakthrough check evidence.
-## Iteration Gate
+## Cycle Gate
-- Minimum 2 iterations are required before any terminal phase transition is allowed.
-- Do not mark the run as converged or complete after a single iteration.
-- Any phase transition to completion must pass through the iteration gate and reviewer gate.
+- Completion requires at least one `polish` cycle after winner selection (spec-0012). This applies to all modes.
+- The same gate applies in every mode; modes differ only in `maxCycles` (low-cost=1, standard=3, full-harness=20).
+- If the polish-cycle budget is exhausted before the gate is satisfied, the run does NOT complete. The evaluator returns `REVISE` and the developer may re-run at a higher mode.
+- Any phase transition to completion must pass through the cycle gate and the reviewer gate.
-### Step 9 — Breakthrough Detection
+### Step 10 — Breakthrough Detection
 After each polish iteration, run the mechanical breakthrough detector.
-If `allItemsPass95` is false and score improvement is below the configured plateau threshold and code change is below the configured diff threshold, trigger breakthrough branching.
+If `allReviewerAxesPerfect100` is false and score improvement is below the configured plateau threshold and code change is below the configured diff threshold, trigger breakthrough branching.
-### Step 10 — Breakthrough Branch Loop
+### Step 11 — Breakthrough Branch Loop
 When breakthrough is triggered:
@@ -198,21 +240,26 @@ When breakthrough is triggered:
 - refresh selected-direction/design-system if the winner changes
 - record the decision in `.qfai/evidence/breakthrough.json`
-### Step 11 — Validate and Verify
+### Step 12 — Validate and Verify
-- Run `qfai validate --fail-on error`.
+- Run `qfai validate --profile prototyping --fail-on error`.
 - Route `/qfai-verify` or its equivalent gate workflow for final quality approval.
 - Do not declare completion until the reviewer result is `PASS`.
 ## Evaluator Inputs (Mandatory)
-When launching any evaluation reviewer sub-agent, all 5 elements MUST be present:
+Evaluation reviewer sub-agents MUST be launched with the `review-bundle.json` for the current round. The bundle contains all required inputs. At a minimum, the bundle MUST reference:
+1. screenshots (per declared screen, round/candidate path)
+2. HTML snapshots (per declared screen, round/candidate path)
+3. accessibility snapshots (`<screen-id>.snapshot.txt` per declared screen, round/candidate path)
+4. Playwright CLI command log (`<screen-id>.commands.json` per declared screen, round/candidate path)
+5. `axisDefs` from `.qfai/contracts/design/evaluation-rubric.yaml`
+6. `previousScore` from the prior round when available
+7. `designSystemChecklist` from `.qfai/contracts/design/design-system.yaml`
+8. `commandPlanRef` pointing at `command-plans.json`
-1. screenshots
-2. HTML snapshots
-3. axisDefs
-4. previousScore
-5. designSystemChecklist
+The evaluator writes `evaluator-reviews/<candidate-id>.json` with per-axis `score`, `rationale`, and `evidenceRefs[]`. Every `evidenceRefs[]` entry MUST point to an existing artifact; placeholder strings (`""`, `"tbd"`, `"TBD"`) are rejected by `qfai validate`.
 ## Visual Quality Structural Checklist
@@ -238,9 +285,12 @@ Minimum reviewer responsibilities:
 - verify mandatory screenshot/HTML evidence exists for every declared screen
 - verify exploration brief, evaluation rubric, and evaluator calibration were used
 - verify missing evidence caused rerun rather than waiver
-- verify `qfai validate --fail-on error` completed successfully
+- verify `qfai validate --profile prototyping --fail-on error` completed successfully
 - verify breakthrough trigger evidence is present
 - verify best-of-history handling is documented
+- verify at least one post-selection polish iteration completed after winner selection
+- verify every reviewer sub-agent scored every evaluation axis at `100/100`
+- reject completion claims based on any 95-point threshold
 - treat score/volume heuristics as signals, not gates
 - return `Result: PASS | REVISE`
@@ -271,25 +321,34 @@ Use the shared schema (per-row `Status (PASS/REVISE)` column, reviewer response
 Follow `.qfai/assistant/instructions/shared-skill-operating-baseline.md#completion-contract-shared`.
-Prototyping-specific additions:
+Prototyping-specific additions (apply to all modes identically):
 - all specs are covered
-- all declared screens have screenshot + HTML evidence
+- all declared screens have 4 artifacts per active candidate / round: screenshot, HTML, accessibility snapshot, Playwright CLI command log
+- canonical latest paths mirror the latest accepted winner/polish state
+- `review-bundle.json`, `command-plans.json`, and per-candidate evaluator reviews exist for every round
 - `selected-direction.yaml` exists
 - `design-system.yaml` exists
 - `breakthrough.json` exists
-- `qfai validate --fail-on error` passes
-- reviewer returns `PASS`
+- `bestOfHistory` and `breakthrough` sections present in `prototyping.json`
+- at least one post-selection polish cycle completed after winner selection
+- every reviewer sub-agent scored every evaluation axis at `100/100`
+- independent reviewer gate returned `PASS`
+- `qfai validate --profile prototyping --fail-on error` passes
 ## FINAL CHECKLIST (Check Last)
 - All specs are covered in the Coverage Matrix.
-- Every declared screen has screenshot evidence.
-- Every declared screen has HTML evidence.
+- Every declared screen has screenshot, HTML, accessibility snapshot, and command log evidence per active candidate / round.
+- Canonical latest paths mirror the latest accepted winner/polish artifacts.
+- Mode invariant: `maxCycles` is the only mode-dependent field in `prototyping.json` (validated by `QFAI-PROT-MODE-001`).
 - Missing evidence triggered rerun instead of waiver.
 - Direction funnel `5->3->2->1` completed.
-- Breakthrough detector ran after polish iterations.
-- Reviewer returned PASS; otherwise status is REVISE.
+- Direction funnel completion was not treated as stage completion.
+- At least one post-selection polish cycle completed with critique/fix/re-capture/re-review/breakthrough checks.
+- Every reviewer sub-agent scored every evaluation axis at `100/100`.
+- Breakthrough detector ran after polish cycles.
+- Independent reviewer returned PASS; otherwise status is REVISE.
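
Two rules in this file are mechanical enough to sketch: the mode invariant (only `maxCycles` varies by mode) and the Step 10 breakthrough trigger (all three conditions must hold). The sketch below assumes numeric score-improvement and code-change measurements; their units, the field names on `PolishResult`, and both threshold parameters are hypothetical, since the diff only says the thresholds are "configured".

```python
from dataclasses import dataclass

# Mode -> maxCycles, as stated in the diff (low-cost=1, standard=3, full-harness=20).
MAX_CYCLES = {"low-cost": 1, "standard": 3, "full-harness": 20}


def max_cycles_for(mode: str) -> int:
    """Return the cycle budget for a mode; nothing else is mode-dependent."""
    try:
        return MAX_CYCLES[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None


@dataclass
class PolishResult:
    """Hypothetical summary of one polish iteration."""
    all_reviewer_axes_perfect_100: bool  # mirrors `allReviewerAxesPerfect100`
    score_improvement: float             # assumed: delta versus the previous iteration
    code_change: float                   # assumed: size of the code diff, any consistent unit


def should_trigger_breakthrough(
    result: PolishResult,
    plateau_threshold: float,
    diff_threshold: float,
) -> bool:
    """True only when the score is imperfect AND has plateaued AND the code barely changed."""
    return (
        not result.all_reviewer_axes_perfect_100
        and result.score_improvement < plateau_threshold
        and result.code_change < diff_threshold
    )


if __name__ == "__main__":
    stalled = PolishResult(False, score_improvement=0.5, code_change=3.0)
    print(max_cycles_for("standard"), should_trigger_breakthrough(stalled, 1.0, 10.0))
```

Because the three conditions are joined with "and", a run that is still improving quickly, or one where the code changed substantially, does not branch even when the scores are below 100.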
 ## Completion Message & Next Actions (MUST)

package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/evidence-requirements.md CHANGED

@@ -1,31 +1,62 @@
 # Evidence Requirements
-## Mandatory evidence
+## Mandatory evidence (per round, per candidate, per screen)
-For every declared screen in `.qfai/contracts/ui/*.yaml`, collect both:
+For every declared screen in `.qfai/contracts/ui/*.yaml`, collect all 4 artifacts for every active candidate in every round `<rN>`:
-- screenshot: `.qfai/evidence/prototyping/screenshots/<screen-id>.png`
-- HTML snapshot: `.qfai/evidence/prototyping/html/<screen-id>.html`
+- screenshot: `.qfai/evidence/prototyping/rounds/<rN>/candidates/<candidate-id>/<screen-id>.png`
+- HTML snapshot: `.qfai/evidence/prototyping/rounds/<rN>/candidates/<candidate-id>/<screen-id>.html`
+- accessibility snapshot: `.qfai/evidence/prototyping/rounds/<rN>/candidates/<candidate-id>/<screen-id>.snapshot.txt`
+- Playwright CLI command log: `.qfai/evidence/prototyping/rounds/<rN>/candidates/<candidate-id>/<screen-id>.commands.json`
+## Per-round artifacts
+Every exploration round MUST also produce these round-scoped artifacts:
+- Playwright CLI command plans: `.qfai/evidence/prototyping/rounds/<rN>/command-plans.json`
+- Review input bundle: `.qfai/evidence/prototyping/rounds/<rN>/review-bundle.json`
+- Evaluator outputs: `.qfai/evidence/prototyping/rounds/<rN>/evaluator-reviews/<candidate-id>.json`
+- Harvest template for `r5|r3|r2`: `.qfai/evidence/prototyping/rounds/<rN>/harvest.json`
+- Narrow decision for `r5|r3|r2`: `.qfai/evidence/prototyping/rounds/<rN>/narrow-decision.json`
+- Absorption plan for `r3|r2|r1`: `.qfai/evidence/prototyping/rounds/<rN>/absorption-plan.json`
+- Reimplementation record for `r3|r2|r1`: `.qfai/evidence/prototyping/rounds/<rN>/reimplementation.json`
+## Canonical latest paths
+The canonical latest screenshot and HTML MUST mirror the newest accepted winner/polish artifacts for the same `<screen-id>`:
+- `.qfai/evidence/prototyping/screenshots/<screen-id>.png`
+- `.qfai/evidence/prototyping/html/<screen-id>.html`
 If either artifact is missing:
-- the screen is scored `0`
-- the run is incomplete
+- the screen is scored `0` for the round
+- the round is incomplete
 - rerun is mandatory
 Optional evidence is not allowed.
 ## Capture rules
+- AI evaluator sub-agent executes the Playwright CLI command plans generated by QFAI.
+- Paths are assigned by QFAI via `command-plans.json`. Do not invent paths.
 - Use stable `screen-id` names from the canonical UI contracts.
-- Overwrite stale evidence with fresh evidence from the current iteration.
-- Do not reuse an older screenshot or HTML snapshot after a fix.
+- Overwrite stale evidence with fresh evidence from the current active winner/polish state while preserving round history under `rounds/<rN>/`.
+- Do not reuse an older screenshot, HTML snapshot, accessibility snapshot, or command log after a fix.
 - If capture fails, record the failure in work evidence and stop pretending the screen was evaluated.
+## Mode invariant
+Evidence requirements are identical for all modes (low-cost / standard / full-harness) per spec-0012. Modes differ only by `maxCycles` (1 / 3 / 20). Choosing a lower mode does NOT reduce evidence obligations.
 ## Validate gate expectations
-`qfai validate --fail-on error` must be able to confirm:
+`qfai validate --profile prototyping --fail-on error` must be able to confirm, for every round:
-- every declared screen has a screenshot file
-- every declared screen has an HTML snapshot file
-- the file paths follow the canonical directories above
+- every declared screen has all 4 per-screen artifacts
+- the round has `command-plans.json`, `review-bundle.json`, and per-candidate evaluator reviews
+- canonical latest paths mirror the newest accepted winner/polish artifacts
+- `review-bundle.json` has all required fields (candidates, axisDefs, designSystemChecklist, commandPlanRef)
+- evaluator review `evidenceRefs[]` entries are concrete paths to existing files (no placeholders)
+- `prototyping.json` `maxCycles` matches the mode (QFAI-PROT-MODE-001)
+- `bestOfHistory`, `breakthrough`, and `reviewerGate` sections are present and populated
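
The per-round artifact list in the hunk above can be sketched as a small path builder. This is only an illustration, not part of the package: the function name `round_artifacts` and its parameters are hypothetical, and only the directory layout and the `r5|r3|r2` / `r3|r2|r1` conditions are taken from the diff.

```python
from pathlib import PurePosixPath

# Root of the round-scoped evidence tree, as named in the diff.
ROUNDS_ROOT = PurePosixPath(".qfai/evidence/prototyping/rounds")

EXPLORATION_ROUNDS = ("r5", "r3", "r2", "r1")


def round_artifacts(round_id: str, candidate_ids: list[str]) -> list[str]:
    """Return the round-scoped artifact paths expected for one round.

    Hypothetical helper: it mirrors the list in the diff and is not a QFAI API.
    """
    if round_id not in EXPLORATION_ROUNDS:
        raise ValueError(f"unknown round: {round_id}")

    base = ROUNDS_ROOT / round_id
    # Artifacts every exploration round must produce.
    paths = [base / "command-plans.json", base / "review-bundle.json"]
    paths += [base / "evaluator-reviews" / f"{cid}.json" for cid in candidate_ids]

    # Harvest template and narrow decision exist only for r5, r3, r2.
    if round_id in ("r5", "r3", "r2"):
        paths += [base / "harvest.json", base / "narrow-decision.json"]

    # Absorption plan and reimplementation record exist only for r3, r2, r1.
    if round_id in ("r3", "r2", "r1"):
        paths += [base / "absorption-plan.json", base / "reimplementation.json"]

    return [str(p) for p in paths]
```

A checker built on this would compare the returned paths against the files actually on disk. Whether `qfai validate` works that way internally is not stated in the diff.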

package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/iteration-cycle.md CHANGED

@@ -1,25 +1,57 @@
-# Iteration Cycle
+# Round Lifecycle
-Each iteration follows this order:
+## Phase taxonomy
-1. Capture screenshot and HTML for every declared screen.
-2. Launch L1 and L2 evaluator sub-agents with the required inputs.
-3. Aggregate findings and classify them by severity and disposition.
-4. Fix the UI according to findings.
-5. Re-capture screenshot and HTML evidence for every changed screen.
-6. Re-run the evaluators.
+The exploration funnel is expressed as fixed exploration rounds followed by post-selection phases:
-## Minimum iteration count
+- exploration rounds: `r5`, `r3`, `r2`, `r1`
+- post-selection loops: `polish`, `branch`, `reviewer_gate`, `completed`
-- Completion requires at least 2 iterations.
-- A single successful-looking pass is not enough.
-- If evidence is missing in any iteration, that iteration does not count as complete.
+`r1` is winner selection only. It does not count as a post-selection polish cycle and cannot be used as stage completion.
+## Round steps
+Each exploration round follows this order, driven by the AI evaluator sub-agent running Playwright CLI:
+1. **Round start**: run `qfai prototyping round-start --round <rN> --candidates <csv> --target-url <url> --mode <mode>`.
+2. **Capture**: execute `command-plans.json` (goto, snapshot, interaction, screenshot, html) for every declared screen of every active candidate, saving evidence at the assigned paths.
+3. **Evaluate**: evaluator sub-agents read `review-bundle.json` and write per-candidate `evaluator-reviews/<candidate-id>.json` with concrete `evidenceRefs[]`.
+4. **Harvest**: run `qfai prototyping round-harvest --round <rN>` to create the harvest template from the evaluated candidate set.
+5. **Narrow**: run `qfai prototyping round-narrow --round <rN> --survivors <csv>` to record which candidates survive to the next round.
+6. **Absorb**: for `r3|r2|r1`, run `qfai prototyping round-absorb --round <rN> --survivors <csv>` to generate the absorption plan for the surviving candidates.
+7. **Reimplement verify**: for `r3|r2|r1`, run `qfai prototyping round-reimplement-verify --round <rN>` after reimplementation evidence is written.
+## Completion requirements
+Completion (independent of mode) requires ALL of the following:
+- at least one `polish` cycle completed after winner selection (capture + review + fix + re-capture + re-review)
+- all declared screens have all 4 artifacts in the completion round / polish cycle
+- blocking findings are closed or dispositioned
+- `bestOfHistory` evidence is present
+- `breakthrough` evidence is present
+- every reviewer sub-agent scored every evaluation axis at `100/100`
+- `qfai validate --profile prototyping --fail-on error` passes
+- independent reviewer returns `PASS`
+- the completion certificate proves `allReviewerAxesPerfect100=true`
+## Mode invariant
+The completion gate above applies identically to `low-cost`, `standard`, and `full-harness`. The only mode-specific value is `maxCycles`:
+- `low-cost`: `maxCycles = 1` — at most one polish cycle; completion is only reachable if the single polish cycle satisfies the full gate.
+- `standard`: `maxCycles = 3` — default.
+- `full-harness`: `maxCycles = 20` — extended polish budget.
+If the polish-cycle budget is exhausted before the gate is satisfied, the run does not complete and is returned as `REVISE`. A score of 95 points is only a signal; it is not a completion threshold.
 ## Stop conditions
 You may stop only when all of the following are true:
-- all declared screens have screenshot + HTML evidence
+- all declared screens have all 4 artifacts for the current accepted winner/polish state
+- canonical latest paths match the current accepted winner/polish state
 - blocking findings are closed or dispositioned
-- validate passes with `--fail-on error`
+- `qfai validate --profile prototyping --fail-on error` passes
 - independent reviewer returns `PASS`
+- the completion certificate proves `allReviewerAxesPerfect100=true`
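
The completion gate and mode invariant described above can be sketched as follows. This is a minimal illustration under stated assumptions, not QFAI's implementation: the `RunState` fields and the `COMPLETE` / `CONTINUE` labels are hypothetical names. Only the `maxCycles` values, the gate conditions, and the `REVISE` outcome come from the diff.

```python
from dataclasses import dataclass

# maxCycles per mode, as stated in the diff (the only mode-specific value).
MAX_CYCLES = {"low-cost": 1, "standard": 3, "full-harness": 20}


@dataclass
class RunState:
    """Hypothetical snapshot of a prototyping run; field names are illustrative."""
    mode: str
    polish_cycles_completed: int
    all_screens_have_all_artifacts: bool
    canonical_latest_paths_match: bool
    blocking_findings_closed: bool
    best_of_history_present: bool
    breakthrough_present: bool
    all_reviewer_axes_perfect_100: bool
    validate_passed: bool        # qfai validate --profile prototyping --fail-on error
    reviewer_verdict: str        # "PASS" from the independent reviewer


def gate_satisfied(s: RunState) -> bool:
    """The same gate applies to every mode; no condition depends on s.mode."""
    return (
        s.polish_cycles_completed >= 1
        and s.all_screens_have_all_artifacts
        and s.canonical_latest_paths_match
        and s.blocking_findings_closed
        and s.best_of_history_present
        and s.breakthrough_present
        and s.all_reviewer_axes_perfect_100
        and s.validate_passed
        and s.reviewer_verdict == "PASS"
    )


def outcome(s: RunState) -> str:
    """Return "COMPLETE", "REVISE" (budget exhausted), or "CONTINUE".

    "COMPLETE" and "CONTINUE" are illustrative labels; only "REVISE" is
    named in the diff.
    """
    if gate_satisfied(s):
        return "COMPLETE"
    if s.polish_cycles_completed >= MAX_CYCLES[s.mode]:
        return "REVISE"
    return "CONTINUE"
```

In this sketch the mode affects only the budget check in `outcome`, never `gate_satisfied`. That is the invariant the text states: a lower mode shortens the polish budget but does not relax the gate.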