qfai 1.8.2 → 1.8.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39)
  1. package/README.md +9 -4
  2. package/assets/init/.qfai/assistant/agents/product-experience-architect.md +2 -1
  3. package/assets/init/.qfai/assistant/skills/qfai-atdd/SKILL.md +4 -4
  4. package/assets/init/.qfai/assistant/skills/qfai-configure/SKILL.md +1 -1
  5. package/assets/init/.qfai/assistant/skills/qfai-discussion/SKILL.md +1 -0
  6. package/assets/init/.qfai/assistant/skills/qfai-discussion/references/rcp_footer.md +1 -1
  7. package/assets/init/.qfai/assistant/skills/qfai-implement/SKILL.md +3 -1
  8. package/assets/init/.qfai/assistant/skills/qfai-prototyping/SKILL.md +121 -62
  9. package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/evidence-requirements.md +43 -12
  10. package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/iteration-cycle.md +46 -14
  11. package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/l1-review-guide.md +13 -12
  12. package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/l2-review-guide.md +16 -10
  13. package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/reviewer-gate.md +25 -4
  14. package/assets/init/.qfai/assistant/skills/qfai-sdd/SKILL.md +3 -3
  15. package/assets/init/.qfai/assistant/skills/qfai-sdd/references/rcp_footer.md +1 -1
  16. package/assets/init/.qfai/assistant/skills/qfai-sdd/references/sdd-quality-gate.md +1 -1
  17. package/assets/init/.qfai/assistant/skills/qfai-sdd/templates/contracts/absorption-policy.sample.yaml +7 -0
  18. package/assets/init/.qfai/assistant/skills/qfai-sdd/templates/contracts/evaluation-rubric.sample.yaml +20 -3
  19. package/assets/init/.qfai/assistant/skills/qfai-sdd/templates/contracts/evaluator-calibration.sample.yaml +6 -0
  20. package/assets/init/.qfai/assistant/skills/qfai-sdd/templates/contracts/exploration-brief.sample.yaml +9 -0
  21. package/assets/init/.qfai/assistant/skills/qfai-verify/SKILL.md +6 -6
  22. package/assets/init/.qfai/contracts/design/README.md +6 -1
  23. package/assets/init/.qfai/contracts/ui/README.md +2 -0
  24. package/assets/init/.qfai/discussion/README.md +14 -9
  25. package/assets/init/.qfai/evidence/README.md +66 -46
  26. package/assets/init/root/.github/workflows/qfai-validate.yml +39 -0
  27. package/assets/init/root/qfai.config.yaml +1 -2
  28. package/dist/cli/index.cjs +2539 -927
  29. package/dist/cli/index.cjs.map +1 -1
  30. package/dist/cli/index.mjs +2624 -1012
  31. package/dist/cli/index.mjs.map +1 -1
  32. package/dist/index.cjs +1120 -421
  33. package/dist/index.cjs.map +1 -1
  34. package/dist/index.d.cts +95 -23
  35. package/dist/index.d.ts +95 -23
  36. package/dist/index.mjs +1114 -414
  37. package/dist/index.mjs.map +1 -1
  38. package/package.json +3 -2
  39. package/assets/scripts/capture-screenshots.js +0 -128
package/README.md CHANGED
@@ -52,13 +52,16 @@ npx qfai report
52
52
  (`.qfai/review/review-*/summary.json` + minimum schema), writes `.qfai/report/validate.json`,
53
53
  and appends run logs to `.qfai/report/run-*/`; use `--fail-on error` (or `--fail-on warning`) to turn it into a CI gate,
54
54
  and `--format github` to emit GitHub-friendly annotations.
55
- Use `--phase refinement` only for local refinement checks; CI should use default/full validation.
55
+ Use `--profile discussion|sdd|prototyping|atdd|tdd|verify` for local skill-owned checks; CI should use default/full validation (or `verify` / `tdd` for the dedicated CI gates).
56
56
  - `npx qfai report`
57
57
  - Produces a human-readable report (`report.md` by default) or an internal JSON export (`report.json`) from `validate.json`; use `--base-url` to link file paths in Markdown to your repository viewer.
58
58
  - `npx qfai doctor`
59
59
  - Diagnoses configuration discovery, path resolution, glob scanning, and `validate.json` inputs before running validate/report; use `--fail-on` to enforce failures in CI.
60
60
  Note: prototyping evidence (`.qfai/evidence/prototyping.json`) is produced by the AI workflow / skills
61
- (`/qfai-prototyping` with `mode=full-harness` for supported UI surfaces only), not by a general-purpose end-user CLI flow.
61
+ (`/qfai-prototyping` any mode; modes differ only in `maxCycles`, see spec-0012), not by a general-purpose end-user CLI flow.
62
+ Use `npx qfai prototyping round-start --round <r5|r3|r2|r1> --candidates <csv> --target-url <url> --mode <mode>`
63
+ to generate the round-scoped review bundle and command plans the AI evaluator sub-agent consumes, then use
64
+ `round-harvest`, `round-narrow`, `round-absorb`, and `round-reimplement-verify` to advance the candidate funnel.
62
65
  `qfai validate` consumes the resulting evidence files, including `mode.effective` and `fullHarness` metadata when present.
63
66
  Traceability refs inside prototyping evidence must use repo-root-relative concrete artifact refs (for example `.qfai/specs/spec-0001/01_Spec.md#L3` or `.qfai/evidence/render.json#/screens/0`).
64
67
  Absolute paths are invalid. The same strict ref grammar is enforced for top-level and leaf evidence-bearing fields, including
@@ -112,7 +115,9 @@ QFAI includes a small set of custom skills (stored under `.qfai/assistant/skills
112
115
  as 15 required markdown files under `.qfai/discussion/discussion-<ts>/`.
113
116
  UI-bearing discussion packs may include `prototyping.yaml` as an optional recommendation artifact; non-ui discussion packs typically omit it.
114
117
  - **qfai-sdd**: Unified SDD entrypoint with discussion-pack preflight guard (missing/incomplete/blocking OQ causes stop + next action guidance).
115
- - **qfai-prototyping**: Build a contract-aligned UI prototype under the `full-harness` only / UI-only contract, with calibration-pack SSOT and screen-level Browser QA evidence.
118
+ - **qfai-prototyping**: Build a contract-aligned UI prototype using the Playwright CLI + AI
119
+ evaluator harness (spec-0012). Modes (`low-cost`/`standard`/`full-harness`) run the same
120
+ strictest review cycle; only `maxCycles` (1/3/20) differs.
116
121
  - **qfai-atdd**: Implement acceptance tests driven by specs/scenarios.
117
122
  - **qfai-implement**: Unified TDD micro-cycle (Red/Green/Refactor) one test at a time using `test-list.md` as the execution ledger, including ledger status updates and exception closure.
118
123
  - **qfai-verify**: Run full-scan local quality gates (`validate --fail-on error`, `report`, repo gates) and produce reviewer-approved evidence under `.qfai/evidence/`.
@@ -286,7 +291,7 @@ npx qfai validate --fail-on error
286
291
 
287
292
  Recommended baseline.
288
293
 
289
- - Keep CI on default/full validation (`qfai validate --fail-on error`); do not use `--phase refinement` in CI.
294
+ - Keep CI on default/full validation (`qfai validate --fail-on error` or `qfai validate --profile verify --fail-on error`); do not use partial profiles in CI.
290
295
  - Keep `pnpm check-types:future` as a separate mandatory gate so future TS compatibility runs once without duplicating `pnpm ci:gate`.
291
296
  - Add a report step (`npx qfai report`) when you need a human-readable artifact.
292
297
  - Tune traceability globs in `qfai.config.yaml` to match your test layout.
package/assets/init/.qfai/assistant/agents/product-experience-architect.md CHANGED
@@ -24,7 +24,8 @@
24
24
  - .github/instructions/principles.instructions.md
25
25
  - .instruction/00_universal/development-principles-checklist.md
26
26
  - .instruction/01_specialties/design.md
27
- - Exploration brief, reference pool, evaluation rubric, evaluator calibration, selected direction, finalized design system, screen contracts, optional tokens, optional fallback HTML/CSS mock, and Mermaid flows
27
+ - Exploration brief, reference pool, evaluation rubric, evaluator calibration, selected direction, finalized design system,
28
+ screen contracts, optional tokens, optional fallback HTML/CSS mock, and Mermaid flows
28
29
  - Runtime screenshots or rendered evidence when available
29
30
 
30
31
  ## Deliverables
package/assets/init/.qfai/assistant/skills/qfai-atdd/SKILL.md CHANGED
@@ -111,7 +111,7 @@ Use the shared schema.
111
111
  - ATDD-specific reviewer checks:
112
112
  - coverage obligations met: E2E covers `US`, Integration covers `TC`, API covers `CON-API`;
113
113
  - Coverage Depth Matrix is reviewed and no unjustified `X` cells remain;
114
- - validation evidence exists and `qfai validate --fail-on error` passes;
114
+ - validation evidence exists and `qfai validate --profile atdd --fail-on error` passes;
115
115
  - Drift Protocol is enforced;
116
116
  - test-layer policy is checked against `.qfai/assistant/steering/test-layers.md`;
117
117
  - coverage floors and ratios are signals, not gates;
@@ -146,7 +146,7 @@ Follow `.qfai/assistant/instructions/shared-skill-operating-baseline.md#delta-re
146
146
  - Do NOT declare completion based on unit/component tests.
147
147
  - `10_Plan.md` is the primary How SSOT for execution phases.
148
148
  - If `10_Plan.md` is missing, stop and run owner planning flow before proceeding.
149
- - Completion gate is validation with zero errors (`qfai validate --fail-on error`).
149
+ - Completion gate is validation with zero errors (`qfai validate --profile atdd --fail-on error`).
150
150
  - Coverage obligations are mandatory:
151
151
  - `tests/e2e/**` must cover all required `US-*`.
152
152
  - `tests/integration/**` must cover all required `TC-*`.
@@ -219,7 +219,7 @@ Notes:
219
219
  - All required `US` are covered by E2E tests.
220
220
  - All required `TC` are covered by integration tests.
221
221
  - All required `CON-API` are covered by API tests.
222
- - Validation passes: `qfai validate --fail-on error`.
222
+ - Validation passes: `qfai validate --profile atdd --fail-on error`.
223
223
  - Repository quality gates (format/lint/type/tests/pack) pass with evidence.
224
224
  - Evidence file exists and includes work orders + reviewer notes.
225
225
  - Completion is approved by a reviewer who did not implement tests.
@@ -318,7 +318,7 @@ Before declaring completion:
318
318
  2. Run:
319
319
 
320
320
  ```bash
321
- qfai validate --fail-on error
321
+ qfai validate --profile atdd --fail-on error
322
322
  ```
323
323
 
324
324
  3. Run repository standard gates:
package/assets/init/.qfai/assistant/skills/qfai-configure/SKILL.md CHANGED
@@ -88,7 +88,7 @@ Use the shared schema.
88
88
  - Follow `.qfai/assistant/instructions/shared-skill-delegation-baseline.md#reviewer-gate-baseline`.
89
89
  - Reviewer checks:
90
90
  - required roles were delegated;
91
- - validate evidence exists: `qfai validate --fail-on error` completed with `error=0`;
91
+ - doctor evidence exists: `qfai doctor --fail-on error` completed without failing checks;
92
92
  - Drift Protocol enforced;
93
93
  - test-layer policy enforced against `.qfai/assistant/steering/test-layers.md`;
94
94
  - tool-count heuristics are signals, not gates.
package/assets/init/.qfai/assistant/skills/qfai-discussion/SKILL.md CHANGED
@@ -82,6 +82,7 @@ Before declaring completion, you MUST:
82
82
  - ensure every deferred item has full metadata in `13_Deferred.md`;
83
83
  - ensure `02_Inception-Deck.md` and `03_Story-Workshop.md` include Mermaid diagrams;
84
84
  - ensure the UI-bearing sidecar family is complete;
85
+ - run `qfai validate --profile discussion --fail-on error` and fix discussion-owned findings;
85
86
  - avoid selecting a single visual winner in discussion artifacts.
86
87
 
87
88
  ### Reviewer Gate (MUST)
package/assets/init/.qfai/assistant/skills/qfai-discussion/references/rcp_footer.md CHANGED
@@ -28,7 +28,7 @@
28
28
 
29
29
  ## Validate Hard Gate (required)
30
30
 
31
- - `qfai validate --fail-on error --format github` has been run in each review cycle
31
+ - `qfai validate --profile discussion --fail-on error --format github` has been run in each review cycle
32
32
  - `.qfai/report/validate.log` exists and corresponds to the latest artifacts
33
33
 
34
34
  ---
package/assets/init/.qfai/assistant/skills/qfai-implement/SKILL.md CHANGED
@@ -289,6 +289,7 @@ Completion MUST NOT be declared when any of the following are true:
289
289
  - Items with `todo`, `red`, `green`, or `refactor` status still exist (for spec-level completion)
290
290
  - Parallel slices were used but integration verify has not been run post-merge
291
291
  - Checkpoint boundary was reached but verification was not executed
292
+ - `it.todo(...)` / `test.todo(...)` / `describe.todo(...)` stubs remain in any file covered by `validation.traceability.testFileGlobs` (`QFAI-TEST-001`, spec-0012). Implement the body or delete the stub — an opt-out via `validation.testStrategy.forbidTestTodoStubs: false` is permitted only with an accompanying waiver DR-ID.
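The stub rule in the added line can be pictured as a line-by-line scan of test sources. This is a rough sketch only: the regex and the `find_todo_stubs` helper are hypothetical and are not taken from the qfai implementation of `QFAI-TEST-001`, which this diff does not show in readable form.

```python
import re

# Hypothetical pattern: matches it.todo( / test.todo( / describe.todo( call sites.
# The word boundary keeps identifiers such as "submit.todo(" from matching.
TODO_STUB = re.compile(r"\b(?:it|test|describe)\.todo\s*\(")


def find_todo_stubs(source: str) -> list:
    """Return the 1-based line numbers that contain a todo stub call."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if TODO_STUB.search(line)
    ]


sample = """\
describe("cart", () => {
  it.todo("applies coupon");
  test("adds item", () => {});
  test.todo("removes item");
});
"""

print(find_todo_stubs(sample))
```

A real check would also need to resolve `validation.traceability.testFileGlobs` and honor the `forbidTestTodoStubs` opt-out; both are left out here.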
292
293
 
293
294
  ## Evidence (MANDATORY)
294
295
 
@@ -335,6 +336,7 @@ Each TDD item MUST have fresh evidence containing at minimum:
335
336
  - [ ] No backward transitions occurred.
336
337
  - [ ] Exception items have DR-IDs recorded.
337
338
  - [ ] All tests pass.
339
+ - [ ] `qfai validate --profile tdd --fail-on error` passes with zero `QFAI-TEST-001` findings (no `it.todo` / `test.todo` / `describe.todo` stubs remain; spec-0012).
338
340
 
339
341
  ## Completion Checklist (MUST)
340
342
 
@@ -349,7 +351,7 @@ Each TDD item MUST have fresh evidence containing at minimum:
349
351
  When this skill is complete, provide a final user-facing completion message and enumerate all actionable next steps.
350
352
 
351
353
  - Verify gates: `/qfai-verify`.
352
- Action: run `qfai validate --fail-on error` and confirm all gates pass.
354
+ Action: run `qfai validate --profile tdd --fail-on error` for this skill, then `/qfai-verify` for full-scan approval.
353
355
  - Spec updates needed: `/qfai-sdd`.
354
356
  Action: update spec artifacts if implementation revealed scope changes.
355
357
  - Acceptance tests: `/qfai-atdd`.
package/assets/init/.qfai/assistant/skills/qfai-prototyping/SKILL.md CHANGED
@@ -30,18 +30,28 @@ Do not rely on a CLI entrypoint or package runtime loop.
30
30
  ## CRITICAL CONSTRAINTS (Read First)
31
31
 
32
32
  - Scope is all specs from `.qfai/specs/spec-*`.
33
- - Screenshot evidence and HTML snapshot evidence are mandatory.
34
- - Screenshot evidence path: `.qfai/evidence/prototyping/screenshots/<screen-id>.png`
35
- - HTML snapshot path: `.qfai/evidence/prototyping/html/<screen-id>.html`
36
- - If either screenshot or HTML is missing for a declared screen, that screen scores `0` and the run is incomplete.
37
- - Optional evidence is abolished. Missing mandatory evidence must trigger rerun, not waiver.
38
- - DONE is forbidden until `qfai validate --fail-on error` passes and `/qfai-verify` can approve the run.
33
+ - The AI evaluator sub-agent performs visual evaluation. QFAI does not score visual quality. (spec-0012)
34
+ - Playwright CLI (`playwright-cli`) is the sole standard browser tool. Playwright MCP, Node Playwright direct invocation, and screenshot-capture shell scripts are not used. (spec-0012)
35
+ - QFAI pre-assigns evidence paths. The evaluator MUST use the paths in the command plan (`review-bundle.json` → `command-plans.json`); it MUST NOT invent paths.
36
+ - For every declared screen and every active candidate in every round, 4 evidence artifacts are mandatory:
37
+ - screenshot: `.qfai/evidence/prototyping/rounds/<round>/candidates/<candidate-id>/<screen-id>.png`
38
+ - HTML: `.qfai/evidence/prototyping/rounds/<round>/candidates/<candidate-id>/<screen-id>.html`
39
+ - accessibility snapshot: `.qfai/evidence/prototyping/rounds/<round>/candidates/<candidate-id>/<screen-id>.snapshot.txt`
40
+ - command log: `.qfai/evidence/prototyping/rounds/<round>/candidates/<candidate-id>/<screen-id>.commands.json`
41
+ - Canonical latest screenshot path: `.qfai/evidence/prototyping/screenshots/<screen-id>.png`
42
+ - Canonical latest HTML path: `.qfai/evidence/prototyping/html/<screen-id>.html`
43
+ - Canonical latest paths MUST mirror the latest accepted winner/polish artifacts.
44
+ - If any of the 4 artifacts is missing for a declared screen, the round is incomplete; rerun is mandatory, not waiver.
45
+ - Mode differences are limited to `maxCycles` only (low-cost=1, standard=3, full-harness=20). Every other gate, obligation, reviewer severity, and completion criterion is identical across modes. (spec-0012)
46
+ - DONE is forbidden until `qfai validate --profile prototyping --fail-on error` passes and `/qfai-verify` can approve the run.
39
47
  - Supported UI prototyping surfaces are `web`, `mobile`, `desktop`, and `mixed`.
40
48
  - `cli`, API-only, backend-only, and `ui_bearing: false` classifications are not prototyping execution targets.
41
- - `cli` is not supported and is not an execution target for prototyping.
42
- - Evaluation is performed by sub-agents; machine checks are limited to schema/evidence validation and breakthrough trigger detection.
43
- - Shared evidence vocabulary includes `render.json`, `browser-qa.json`, `prototyping.json`, and `breakthrough.json`.
44
- - static-first evidence capture remains mandatory even when interactive review is used.
49
+ - Machine checks are limited to schema/evidence validation, mode invariant enforcement, review-cycle completeness, and breakthrough trigger detection.
50
+ - Shared evidence vocabulary: `prototyping.json`, `review-bundle.json`, `command-plans.json`, `evaluator-reviews/<candidate-id>.json`, `harvest.json`, `absorption-plan.json`, `reimplementation.json`, `breakthrough.json`.
51
+ - Direction funnel completion is not stage completion.
52
+ - Selecting the first winner does not satisfy completion. Completion review is forbidden until at least one post-selection polish cycle has completed.
53
+ - Completion requires every reviewer sub-agent to score every evaluation axis at `100/100`; `95` is not a completion border.
54
+ - Do not use `complete`, `completed`, `done`, or equivalent completion wording in other languages before the completion checklist passes. Use `exploration complete`, `winner selected`, `polishing`, `breakthrough checking`, or `reviewer gate pending` for interim states.
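The per-round evidence layout listed in the constraints above follows one pattern per candidate and screen. The sketch below only illustrates that layout; `evidence_paths` is a hypothetical helper, and in a real run the evaluator would read the paths from `command-plans.json` rather than compute them.

```python
from pathlib import PurePosixPath

# The four mandatory artifact suffixes named in the constraints.
ARTIFACT_SUFFIXES = (".png", ".html", ".snapshot.txt", ".commands.json")


def evidence_paths(round_id: str, candidate_id: str, screen_id: str) -> list:
    """Build the four repo-root-relative evidence paths for one screen."""
    base = (
        PurePosixPath(".qfai/evidence/prototyping/rounds")
        / round_id
        / "candidates"
        / candidate_id
    )
    return [str(base / f"{screen_id}{suffix}") for suffix in ARTIFACT_SUFFIXES]


# Hypothetical ids, for illustration only.
for path in evidence_paths("r5", "cand-a", "home"):
    print(path)
```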
45
55
 
46
56
  ## Goal
47
57
 
@@ -50,13 +60,17 @@ Generate multiple design directions, converge on a winner, extract the selected
50
60
  ## Surface / Mode
51
61
 
52
62
  - surface / mode routing uses `standard` as the default execution path.
53
- - `standard` is the default when no explicit escalation to `full-harness` is requested.
54
- - `full-harness` is reserved for explicit escalation and review-heavy obligations.
63
+ - **Mode Invariant (spec-0012)**: modes differ only by `maxCycles`. Review gate, evidence requirements, reviewer severity, best-of-history, breakthrough detection, and completion criteria are identical across modes.
64
+ - `low-cost`: `maxCycles = 1`
65
+ - `standard`: `maxCycles = 3` (default)
66
+ - `full-harness`: `maxCycles = 20`
67
+ - No mode weakens obligations. Choosing a lower mode buys fewer chances to iterate, not a looser gate.
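The mode invariant amounts to a lookup table plus a comparison. This is an illustrative sketch: `MAX_CYCLES` mirrors the values stated above, while `differs_only_in_max_cycles` and the dict shape of a mode config are assumptions, not the qfai implementation.

```python
# Values taken from the mode list above; "standard" is the stated default.
MAX_CYCLES = {"low-cost": 1, "standard": 3, "full-harness": 20}
DEFAULT_MODE = "standard"


def max_cycles(mode: str = DEFAULT_MODE) -> int:
    """Look up the cycle budget for a mode, rejecting unknown modes."""
    if mode not in MAX_CYCLES:
        raise ValueError(f"unknown mode: {mode}")
    return MAX_CYCLES[mode]


def differs_only_in_max_cycles(a: dict, b: dict) -> bool:
    """True when two mode configs are identical apart from maxCycles."""
    def strip(cfg: dict) -> dict:
        return {key: value for key, value in cfg.items() if key != "maxCycles"}

    return strip(a) == strip(b)


# Hypothetical config shape: only maxCycles is allowed to vary by mode.
low = {"maxCycles": max_cycles("low-cost"), "perfectScore": 100}
full = {"maxCycles": max_cycles("full-harness"), "perfectScore": 100}
print(differs_only_in_max_cycles(low, full))
```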
55
68
 
56
69
  ## Required References
57
70
 
58
71
  Read and follow these references before execution:
59
72
 
73
+ - `.qfai/specs/spec-0012/01_Spec.md` — primary SSOT for the prototyping harness
60
74
  - `.qfai/assistant/skills/qfai-prototyping/references/evidence-requirements.md`
61
75
  - `.qfai/assistant/skills/qfai-prototyping/references/iteration-cycle.md`
62
76
  - `.qfai/assistant/skills/qfai-prototyping/references/l1-review-guide.md`
@@ -73,13 +87,13 @@ All sub-agent delegation in this skill MUST follow the category-to-role mapping
73
87
  Assigning a task to a role not listed for the category is a violation and MUST be flagged.
74
88
  Evaluation scoring and screenshot capture must use only the allowed roles below.
75
89
 
76
- | Category | Allowed Role(s) |
77
- | --------------------- | ------------------------------------------------------ |
78
- | UI implementation | frontend-engineer, product-experience-architect |
79
- | Screenshot capture | devops-ci-engineer |
80
- | Evaluation scoring | product-surface-reviewer, product-experience-architect |
81
- | Build | devops-ci-engineer, backend-engineer |
82
- | Breakthrough planning | product-experience-architect, frontend-engineer |
90
+ | Category | Allowed Role(s) |
91
+ | ---------------------------------- | ------------------------------------------------------ |
92
+ | UI implementation | frontend-engineer, product-experience-architect |
93
+ | Playwright CLI execution & capture | product-surface-reviewer, product-experience-architect |
94
+ | Evaluation scoring | product-surface-reviewer, product-experience-architect |
95
+ | Build | devops-ci-engineer, backend-engineer |
96
+ | Breakthrough planning | product-experience-architect, frontend-engineer |
83
97
 
84
98
  Any delegation map entry that assigns a category to an undefined or unlisted role MUST produce a violation finding naming the undefined role and the category.
85
99
 
@@ -91,7 +105,7 @@ Before any code is written, create an execution plan record in the work evidence
91
105
 
92
106
  Required fields:
93
107
 
94
- - `targetIterations`: integer; minimum 2
108
+ - `targetRounds`: ordered array; default funnel is `["r5", "r3", "r2", "r1"]`
95
109
  - `funnelPolicy`: `5->3->2->1`
96
110
  - `evaluationAxesSource`: ref to `.qfai/contracts/design/evaluation-rubric.yaml`
97
111
  - `delegationMap`: category-to-role assignments per Delegation Scope Table
@@ -139,56 +153,84 @@ Confirm all of the following before any evaluation:
139
153
  Generate 5 clearly distinct design directions before selecting a winner.
140
154
  Do not begin with a single incumbent direction.
141
155
 
142
- ### Step 4 — Capture Mandatory Evidence
156
+ ### Step 4 — Round Start: Prepare Candidate Review Bundle & Command Plans
143
157
 
144
- For every declared screen and every active direction:
158
+ Before launching the evaluator, prepare the round-scoped artifacts via QFAI (not by hand):
145
159
 
146
- - capture one screenshot and store it at the canonical screenshot path
147
- - capture one HTML snapshot and store it at the canonical HTML path
148
- - record missing evidence immediately; do not continue as if capture succeeded
160
+ - Run `qfai prototyping round-start --round <rN> --candidates <csv> --target-url <url> --mode <mode>`.
161
+ - QFAI produces:
162
+ - `.qfai/evidence/prototyping/rounds/<rN>/command-plans.json` the candidate-aware Playwright CLI command plans
163
+ - `.qfai/evidence/prototyping/rounds/<rN>/review-bundle.json` — the evaluator input bundle (candidates, axisDefs, designSystemChecklist, commandPlanRef)
164
+ - Do not invent evidence paths. Paths are fixed by QFAI per spec-0012.
149
165
 
150
- ### Step 5 — Launch Evaluation Reviewers
166
+ ### Step 5 — AI Evaluator Executes the Command Plans and Captures Evidence
151
167
 
152
- Launch evaluation reviewer sub-agents with the full context bundle:
168
+ For every declared screen of every active candidate in the current round, the AI evaluator sub-agent:
153
169
 
154
- - screenshots from Step 4
155
- - HTML snapshots from Step 4
156
- - `axisDefs` from `.qfai/contracts/design/evaluation-rubric.yaml`
157
- - `previousScore` from the prior iteration (`null` for iteration 1)
158
- - `designSystemChecklist` from `.qfai/contracts/design/design-system.yaml`
170
+ 1. Reads `command-plans.json` for the round
171
+ 2. Runs `playwright-cli goto <url>` for the candidate route
172
+ 3. Runs `playwright-cli snapshot --save <candidate-path>/<screen-id>.snapshot.txt`
173
+ 4. Performs interaction commands (click/fill) to exercise `primaryTasks` noted in the plan
174
+ 5. Runs `playwright-cli screenshot --full-page --save <candidate-path>/<screen-id>.png`
175
+ 6. Runs `playwright-cli eval "document.documentElement.outerHTML" > <candidate-path>/<screen-id>.html`
176
+ 7. Saves the sequence of executed commands to `<candidate-path>/<screen-id>.commands.json`
159
177
 
160
- ### Step 6 Direction Funnel
178
+ If any capture step fails, the evaluator records the failure and stops pretending the screen was evaluated. The round is incomplete and must be rerun.
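The "incomplete round" rule can be checked mechanically once the planned paths are known. A minimal sketch, assuming the planned paths arrive as a flat list of repo-root-relative strings; `missing_artifacts` and `round_is_complete` are hypothetical names, and treating an empty file as missing is an added assumption.

```python
from pathlib import Path


def missing_artifacts(repo_root: Path, planned_paths: list) -> list:
    """Return the planned evidence paths that are absent or empty."""
    missing = []
    for rel in planned_paths:
        target = repo_root / rel
        # Assumption: a zero-byte capture counts as a failed capture.
        if not target.is_file() or target.stat().st_size == 0:
            missing.append(rel)
    return missing


def round_is_complete(repo_root: Path, planned_paths: list) -> bool:
    """A round is complete only when every planned artifact exists."""
    return not missing_artifacts(repo_root, planned_paths)
```

If the result is non-empty, the round is rerun rather than waived, as the text above requires.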
179
+
180
+ ### Step 6 — Launch Evaluation Reviewers
181
+
182
+ Launch evaluation reviewer sub-agents with the full context bundle. Inputs are read from `review-bundle.json`:
183
+
184
+ - per-screen screenshot, HTML, accessibility snapshot, and command log under `rounds/<round>/candidates/<candidate-id>/`
185
+ - `axisDefs` (from `.qfai/contracts/design/evaluation-rubric.yaml`)
186
+ - `previousScore` from the prior round when available
187
+ - `designSystemChecklist` (from `.qfai/contracts/design/design-system.yaml`)
188
+ - `commandPlanRef` pointing at `command-plans.json`
189
+
190
+ The reviewer writes `rounds/<round>/evaluator-reviews/<candidate-id>.json` with concrete `evidenceRefs[]` for every score. Placeholder refs are rejected.
191
+
192
+ ### Step 7 — Harvest and Direction Funnel
161
193
 
162
194
  Run the mandatory convergence funnel:
163
195
 
164
- - 5 directions -> top 3
165
- - top 3 remixed -> top 2
166
- - top 2 -> selected winner 1
196
+ - `r5`: 5 directions -> top 3
197
+ - `r3`: top 3 remixed -> top 2
198
+ - `r2`: top 2 -> selected winner `r1`
167
199
 
168
- ### Step 7 Extract Winner Contracts
200
+ At the end of each harvestable round:
201
+
202
+ - run `qfai prototyping round-harvest --round <rN>`
203
+ - record survivors with `qfai prototyping round-narrow --round <rN> --survivors <csv>`
204
+ - for `r3|r2|r1`, generate absorption templates with `qfai prototyping round-absorb --round <rN> --survivors <csv>`
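The `5->3->2->1` funnel is a repeated top-k selection. The sketch below assumes each round yields one aggregate evaluator score per candidate and skips the remix step between rounds. `narrow`, the tie-break rule, and the sample scores are all hypothetical; the scores themselves would come from the AI evaluator, not from QFAI.

```python
# Survivor counts per harvestable round, as listed in the funnel above.
FUNNEL = (("r5", 3), ("r3", 2), ("r2", 1))


def narrow(scores: dict, keep: int) -> list:
    """Keep the top `keep` candidates by score; ties break on candidate id."""
    ranked = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ranked[:keep]


# Hypothetical evaluator scores for the five r5 directions.
r5_scores = {"a": 80, "b": 90, "c": 70, "d": 85, "e": 60}
survivors = narrow(r5_scores, dict(FUNNEL)["r5"])
print(",".join(survivors))  # usable as the <csv> value for --survivors
```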
205
+
206
+ ### Step 8 — Extract Winner Contracts
169
207
 
170
208
  After the first winner is selected:
171
209
 
172
210
  - write `.qfai/contracts/design/selected-direction.yaml`
173
211
  - extract `.qfai/contracts/design/design-system.yaml`
174
212
 
175
- ### Step 8 Polish the Winner
213
+ Selecting the first winner is not completion. Do not start completion review and do not use completion wording until Step 9, Step 10, Step 12, reviewer gate, and the perfect-100 score gate pass.
214
+
215
+ ### Step 9 — Polish the Winner
176
216
 
177
217
  Iterate on the selected winner with normal critique/rework loops.
178
218
  Do not assume the latest iteration is automatically best; keep best-of-history in evidence.
219
+ At least one full post-selection polish loop is mandatory. Each polish loop must include critique, fix, re-capture, re-review, and breakthrough check evidence.
179
220
 
180
- ## Iteration Gate
221
+ ## Cycle Gate
181
222
 
182
- - Minimum 2 iterations are required before any terminal phase transition is allowed.
183
- - Do not mark the run as converged or complete after a single iteration.
184
- - Any phase transition to completion must pass through the iteration gate and reviewer gate.
223
+ - Completion requires at least one `polish` cycle after winner selection (spec-0012). This applies to all modes.
224
+ - The same gate applies in every mode; modes differ only in `maxCycles` (low-cost=1, standard=3, full-harness=20).
225
+ - If the polish-cycle budget is exhausted before the gate is satisfied, the run does NOT complete. The evaluator returns `REVISE` and the developer may re-run at a higher mode.
226
+ - Any phase transition to completion must pass through the cycle gate and the reviewer gate.
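The cycle gate reads as a three-way decision. This is a sketch under stated assumptions: `maxCycles` is taken to bound the number of polish cycles, the perfect-100 requirement is folded in for convenience, and the `SATISFIED` and `CONTINUE` labels are hypothetical (only `REVISE` appears in the text). Satisfying this gate does not replace the reviewer gate.

```python
def cycle_gate(polish_cycles_done: int, max_cycles: int, all_axes_perfect_100: bool) -> str:
    """Decide whether a run may proceed toward the reviewer gate."""
    # At least one post-selection polish cycle is required in every mode.
    if polish_cycles_done >= 1 and all_axes_perfect_100:
        return "SATISFIED"
    # Budget exhausted without satisfying the gate: the run does not complete.
    if polish_cycles_done >= max_cycles:
        return "REVISE"
    return "CONTINUE"


print(cycle_gate(polish_cycles_done=1, max_cycles=1, all_axes_perfect_100=False))
```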
185
227
 
186
- ### Step 9 — Breakthrough Detection
228
+ ### Step 10 — Breakthrough Detection
187
229
 
188
230
  After each polish iteration, run the mechanical breakthrough detector.
189
- If `allItemsPass95` is false and score improvement is below the configured plateau threshold and code change is below the configured diff threshold, trigger breakthrough branching.
231
+ If `allReviewerAxesPerfect100` is false and score improvement is below the configured plateau threshold and code change is below the configured diff threshold, trigger breakthrough branching.
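The trigger condition is a conjunction of three tests. A minimal sketch, assuming score improvement and code change are both plain numbers compared against configured thresholds; the parameter names other than `allReviewerAxesPerfect100` and the threshold values are hypothetical.

```python
def should_trigger_breakthrough(
    all_reviewer_axes_perfect_100: bool,
    score_improvement: float,
    code_change: float,
    plateau_threshold: float,
    diff_threshold: float,
) -> bool:
    """True when the run is imperfect, plateaued, and barely changing."""
    return (
        not all_reviewer_axes_perfect_100
        and score_improvement < plateau_threshold
        and code_change < diff_threshold
    )


# Hypothetical thresholds: under 1 point gained and under 20 changed lines.
print(should_trigger_breakthrough(False, 0.5, 12, plateau_threshold=1.0, diff_threshold=20))
```

All three conditions must hold, so a large rewrite that fails to improve the score does not trigger branching under this reading.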
190
232
 
191
- ### Step 10 — Breakthrough Branch Loop
233
+ ### Step 11 — Breakthrough Branch Loop
192
234
 
193
235
  When breakthrough is triggered:
194
236
 
@@ -198,21 +240,26 @@ When breakthrough is triggered:
198
240
  - refresh selected-direction/design-system if the winner changes
199
241
  - record the decision in `.qfai/evidence/breakthrough.json`
200
242
 
201
- ### Step 11 — Validate and Verify
243
+ ### Step 12 — Validate and Verify
202
244
 
203
- - Run `qfai validate --fail-on error`.
245
+ - Run `qfai validate --profile prototyping --fail-on error`.
204
246
  - Route `/qfai-verify` or its equivalent gate workflow for final quality approval.
205
247
  - Do not declare completion until the reviewer result is `PASS`.
206
248
 
207
249
  ## Evaluator Inputs (Mandatory)
208
250
 
209
- When launching any evaluation reviewer sub-agent, all 5 elements MUST be present:
251
+ Evaluation reviewer sub-agents MUST be launched with the `review-bundle.json` for the current round. The bundle contains all required inputs. At a minimum, the bundle MUST reference:
252
+
253
+ 1. screenshots (per declared screen, round/candidate path)
254
+ 2. HTML snapshots (per declared screen, round/candidate path)
255
+ 3. accessibility snapshots (`<screen-id>.snapshot.txt` per declared screen, round/candidate path)
256
+ 4. Playwright CLI command log (`<screen-id>.commands.json` per declared screen, round/candidate path)
257
+ 5. `axisDefs` from `.qfai/contracts/design/evaluation-rubric.yaml`
258
+ 6. `previousScore` from the prior round when available
259
+ 7. `designSystemChecklist` from `.qfai/contracts/design/design-system.yaml`
260
+ 8. `commandPlanRef` pointing at `command-plans.json`
210
261
 
211
- 1. screenshots
212
- 2. HTML snapshots
213
- 3. axisDefs
214
- 4. previousScore
215
- 5. designSystemChecklist
262
+ The evaluator writes `evaluator-reviews/<candidate-id>.json` with per-axis `score`, `rationale`, and `evidenceRefs[]`. Every `evidenceRefs[]` entry MUST point to an existing artifact; placeholder strings (`""`, `"tbd"`, `"TBD"`) are rejected by `qfai validate`.
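The placeholder rule for `evidenceRefs[]` can be sketched as a per-axis scan. The review shape used here (an `axes` list whose entries carry `axis`, `score`, `rationale`, and `evidenceRefs`) is an assumption, since the diff names the per-axis fields but not the container. `evidence_ref_problems` is a hypothetical helper and does not reproduce what `qfai validate` actually reports.

```python
# Placeholder strings named in the text above.
PLACEHOLDERS = {"", "tbd", "TBD"}


def evidence_ref_problems(review: dict, existing_paths: set) -> list:
    """List placeholder or dangling evidence refs, one message per problem."""
    problems = []
    for axis in review.get("axes", []):
        name = axis.get("axis", "<unnamed>")
        refs = axis.get("evidenceRefs", [])
        if not refs:
            problems.append(f"{name}: no evidenceRefs")
        for ref in refs:
            if ref.strip() in PLACEHOLDERS:
                problems.append(f"{name}: placeholder ref {ref!r}")
            # Drop any "#..." fragment before checking that the artifact exists.
            elif ref.split("#", 1)[0] not in existing_paths:
                problems.append(f"{name}: missing artifact {ref}")
    return problems


# Hypothetical review content, for illustration only.
review = {
    "axes": [
        {"axis": "layout", "score": 100, "rationale": "ok",
         "evidenceRefs": ["rounds/r5/candidates/a/home.png", "tbd"]},
        {"axis": "contrast", "score": 90, "rationale": "low contrast",
         "evidenceRefs": ["rounds/r5/candidates/a/gone.png#L3"]},
    ]
}
existing = {"rounds/r5/candidates/a/home.png"}
for problem in evidence_ref_problems(review, existing):
    print(problem)
```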
216
263
 
217
264
  ## Visual Quality Structural Checklist
218
265
 
@@ -238,9 +285,12 @@ Minimum reviewer responsibilities:
238
285
  - verify mandatory screenshot/HTML evidence exists for every declared screen
239
286
  - verify exploration brief, evaluation rubric, and evaluator calibration were used
240
287
  - verify missing evidence caused rerun rather than waiver
241
- - verify `qfai validate --fail-on error` completed successfully
288
+ - verify `qfai validate --profile prototyping --fail-on error` completed successfully
242
289
  - verify breakthrough trigger evidence is present
243
290
  - verify best-of-history handling is documented
291
+ - verify at least one post-selection polish iteration completed after winner selection
292
+ - verify every reviewer sub-agent scored every evaluation axis at `100/100`
293
+ - reject completion claims based on any 95-point threshold
244
294
  - treat score/volume heuristics as signals, not gates
245
295
  - return `Result: PASS | REVISE`
246
296
 
@@ -271,25 +321,34 @@ Use the shared schema (per-row `Status (PASS/REVISE)` column, reviewer response
271
321
 
272
322
  Follow `.qfai/assistant/instructions/shared-skill-operating-baseline.md#completion-contract-shared`.
273
323
 
274
- Prototyping-specific additions:
324
+ Prototyping-specific additions (apply to all modes identically):
275
325
 
276
326
  - all specs are covered
277
- - all declared screens have screenshot + HTML evidence
327
+ - all declared screens have 4 artifacts per active candidate / round: screenshot, HTML, accessibility snapshot, Playwright CLI command log
328
+ - canonical latest paths mirror the latest accepted winner/polish state
329
+ - `review-bundle.json`, `command-plans.json`, and per-candidate evaluator reviews exist for every round
278
330
  - `selected-direction.yaml` exists
279
331
  - `design-system.yaml` exists
280
332
  - `breakthrough.json` exists
281
- - `qfai validate --fail-on error` passes
282
- - reviewer returns `PASS`
333
+ - `bestOfHistory` and `breakthrough` sections present in `prototyping.json`
334
+ - at least one post-selection polish cycle completed after winner selection
335
+ - every reviewer sub-agent scored every evaluation axis at `100/100`
336
+ - independent reviewer gate returned `PASS`
337
+ - `qfai validate --profile prototyping --fail-on error` passes
283
338
 
284
339
  ## FINAL CHECKLIST (Check Last)
285
340
 
286
341
  - All specs are covered in the Coverage Matrix.
287
- - Every declared screen has screenshot evidence.
288
- - Every declared screen has HTML evidence.
342
+ - Every declared screen has screenshot, HTML, accessibility snapshot, and command log evidence per active candidate / round.
343
+ - Canonical latest paths mirror the latest accepted winner/polish artifacts.
344
+ - Mode invariant: `maxCycles` is the only mode-dependent field in `prototyping.json` (validated by `QFAI-PROT-MODE-001`).
289
345
  - Missing evidence triggered rerun instead of waiver.
290
346
  - Direction funnel `5->3->2->1` completed.
291
- - Breakthrough detector ran after polish iterations.
292
- - Reviewer returned PASS; otherwise status is REVISE.
347
+ - Direction funnel completion was not treated as stage completion.
348
+ - At least one post-selection polish cycle completed with critique/fix/re-capture/re-review/breakthrough checks.
349
+ - Every reviewer sub-agent scored every evaluation axis at `100/100`.
350
+ - Breakthrough detector ran after polish cycles.
351
+ - Independent reviewer returned PASS; otherwise status is REVISE.
293
352
 
294
353
  ## Completion Message & Next Actions (MUST)
295
354
 
@@ -1,31 +1,62 @@
1
1
  # Evidence Requirements
2
2
 
3
- ## Mandatory evidence
3
+ ## Mandatory evidence (per round, per candidate, per screen)
4
4
 
5
- For every declared screen in `.qfai/contracts/ui/*.yaml`, collect both:
5
+ For every declared screen in `.qfai/contracts/ui/*.yaml`, collect all 4 artifacts for every active candidate in every round `<rN>`:
6
6
 
7
- - screenshot: `.qfai/evidence/prototyping/screenshots/<screen-id>.png`
8
- - HTML snapshot: `.qfai/evidence/prototyping/html/<screen-id>.html`
7
+ - screenshot: `.qfai/evidence/prototyping/rounds/<rN>/candidates/<candidate-id>/<screen-id>.png`
8
+ - HTML snapshot: `.qfai/evidence/prototyping/rounds/<rN>/candidates/<candidate-id>/<screen-id>.html`
9
+ - accessibility snapshot: `.qfai/evidence/prototyping/rounds/<rN>/candidates/<candidate-id>/<screen-id>.snapshot.txt`
10
+ - Playwright CLI command log: `.qfai/evidence/prototyping/rounds/<rN>/candidates/<candidate-id>/<screen-id>.commands.json`
11
+
12
+ ## Per-round artifacts
13
+
14
+ Every exploration round MUST also produce these round-scoped artifacts:
15
+
16
+ - Playwright CLI command plans: `.qfai/evidence/prototyping/rounds/<rN>/command-plans.json`
17
+ - Review input bundle: `.qfai/evidence/prototyping/rounds/<rN>/review-bundle.json`
18
+ - Evaluator outputs: `.qfai/evidence/prototyping/rounds/<rN>/evaluator-reviews/<candidate-id>.json`
19
+ - Harvest template for `r5|r3|r2`: `.qfai/evidence/prototyping/rounds/<rN>/harvest.json`
20
+ - Narrow decision for `r5|r3|r2`: `.qfai/evidence/prototyping/rounds/<rN>/narrow-decision.json`
21
+ - Absorption plan for `r3|r2|r1`: `.qfai/evidence/prototyping/rounds/<rN>/absorption-plan.json`
22
+ - Reimplementation record for `r3|r2|r1`: `.qfai/evidence/prototyping/rounds/<rN>/reimplementation.json`
23
+
24
+ ## Canonical latest paths
25
+
26
+ The canonical latest screenshot and HTML MUST mirror the newest accepted winner/polish artifacts for the same `<screen-id>`:
27
+
28
+ - `.qfai/evidence/prototyping/screenshots/<screen-id>.png`
29
+ - `.qfai/evidence/prototyping/html/<screen-id>.html`
9
30
 
10
31
  If either artifact is missing:
11
32
 
12
- - the screen is scored `0`
13
- - the run is incomplete
33
+ - the screen is scored `0` for the round
34
+ - the round is incomplete
14
35
  - rerun is mandatory
15
36
 
16
37
  Optional evidence is not allowed.
17
38
 
18
39
  ## Capture rules
19
40
 
41
+ - AI evaluator sub-agent executes the Playwright CLI command plans generated by QFAI.
42
+ - Paths are assigned by QFAI via `command-plans.json`. Do not invent paths.
20
43
  - Use stable `screen-id` names from the canonical UI contracts.
21
- - Overwrite stale evidence with fresh evidence from the current iteration.
22
- - Do not reuse an older screenshot or HTML snapshot after a fix.
44
+ - Overwrite stale evidence with fresh evidence from the current active winner/polish state while preserving round history under `rounds/<rN>/`.
45
+ - Do not reuse an older screenshot, HTML snapshot, accessibility snapshot, or command log after a fix.
23
46
  - If capture fails, record the failure in work evidence and stop pretending the screen was evaluated.
24
47
 
48
+ ## Mode invariant
49
+
50
+ Evidence requirements are identical for all modes (low-cost / standard / full-harness) per spec-0012. Modes differ only by `maxCycles` (1 / 3 / 20). Choosing a lower mode does NOT reduce evidence obligations.
51
+
25
52
  ## Validate gate expectations
26
53
 
27
- `qfai validate --fail-on error` must be able to confirm:
54
+ `qfai validate --profile prototyping --fail-on error` must be able to confirm, for every round:
28
55
 
29
- - every declared screen has a screenshot file
30
- - every declared screen has an HTML snapshot file
31
- - the file paths follow the canonical directories above
56
+ - every declared screen has all 4 per-screen artifacts
57
+ - the round has `command-plans.json`, `review-bundle.json`, and per-candidate evaluator reviews
58
+ - canonical latest paths mirror the newest accepted winner/polish artifacts
59
+ - `review-bundle.json` has all required fields (candidates, axisDefs, designSystemChecklist, commandPlanRef)
60
+ - evaluator review `evidenceRefs[]` entries are concrete paths to existing files (no placeholders)
61
+ - `prototyping.json` `maxCycles` matches the mode (QFAI-PROT-MODE-001)
62
+ - `bestOfHistory`, `breakthrough`, and `reviewerGate` sections are present and populated
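
Note on the evidence layout above: the four per-screen artifacts share one directory per round and candidate and differ only by file suffix. The following is a minimal sketch of a local pre-check for missing files, not QFAI's own validation logic. The helper names `expected_artifacts` and `missing_artifacts` and the `root` argument are hypothetical. Only the path pattern and suffixes come from the document.

```python
from pathlib import Path

# Suffixes of the four per-screen artifacts, taken from the path patterns
# listed under "Mandatory evidence". The dictionary keys are just labels.
ARTIFACT_SUFFIXES = {
    "screenshot": ".png",
    "html": ".html",
    "accessibility snapshot": ".snapshot.txt",
    "command log": ".commands.json",
}


def expected_artifacts(root, round_id, candidate_id, screen_id):
    """Return {artifact label: Path} for one screen of one candidate in one round."""
    base = (
        Path(root) / ".qfai" / "evidence" / "prototyping"
        / "rounds" / round_id / "candidates" / candidate_id
    )
    return {
        label: base / f"{screen_id}{suffix}"
        for label, suffix in ARTIFACT_SUFFIXES.items()
    }


def missing_artifacts(root, round_id, candidate_ids, screen_ids):
    """List (candidate, screen, artifact label) triples whose file does not exist."""
    missing = []
    for candidate_id in candidate_ids:
        for screen_id in screen_ids:
            paths = expected_artifacts(root, round_id, candidate_id, screen_id)
            for label, path in paths.items():
                if not path.is_file():
                    missing.append((candidate_id, screen_id, label))
    return missing
```

A non-empty result corresponds to the "rerun is mandatory" case described above. The authoritative check remains `qfai validate --profile prototyping --fail-on error`.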
@@ -1,25 +1,57 @@
1
- # Iteration Cycle
1
+ # Round Lifecycle
2
2
 
3
- Each iteration follows this order:
3
+ ## Phase taxonomy
4
4
 
5
- 1. Capture screenshot and HTML for every declared screen.
6
- 2. Launch L1 and L2 evaluator sub-agents with the required inputs.
7
- 3. Aggregate findings and classify them by severity and disposition.
8
- 4. Fix the UI according to findings.
9
- 5. Re-capture screenshot and HTML evidence for every changed screen.
10
- 6. Re-run the evaluators.
5
+ The exploration funnel is expressed as fixed rounds plus optional polish cycles:
11
6
 
12
- ## Minimum iteration count
7
+ - exploration rounds: `r5`, `r3`, `r2`, `r1`
8
+ - post-selection loops: `polish`, `branch`, `reviewer_gate`, `completed`
13
9
 
14
- - Completion requires at least 2 iterations.
15
- - A single successful-looking pass is not enough.
16
- - If evidence is missing in any iteration, that iteration does not count as complete.
10
+ `r1` is winner selection only. It does not count as a post-selection polish cycle and cannot be used as stage completion.
11
+
12
+ ## Round steps
13
+
14
+ Each exploration round follows this order, driven by the AI evaluator sub-agent running Playwright CLI:
15
+
16
+ 1. **Round start**: run `qfai prototyping round-start --round <rN> --candidates <csv> --target-url <url> --mode <mode>`.
17
+ 2. **Capture**: execute `command-plans.json` (goto, snapshot, interaction, screenshot, html) for every declared screen of every active candidate, saving evidence at the assigned paths.
18
+ 3. **Evaluate**: evaluator sub-agents read `review-bundle.json` and write per-candidate `evaluator-reviews/<candidate-id>.json` with concrete `evidenceRefs[]`.
19
+ 4. **Harvest**: run `qfai prototyping round-harvest --round <rN>` to create the harvest template from the evaluated candidate set.
20
+ 5. **Narrow**: run `qfai prototyping round-narrow --round <rN> --survivors <csv>` to record which candidates survive to the next round.
21
+ 6. **Absorb**: for `r3|r2|r1`, run `qfai prototyping round-absorb --round <rN> --survivors <csv>` to generate the absorption plan for the surviving candidates.
22
+ 7. **Reimplement verify**: run `qfai prototyping round-reimplement-verify --round <rN>` after reimplementation evidence is written.
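
Note on the round steps above: the order of `qfai` invocations can be sketched as a small command builder. This is an illustration only. The function name `round_commands` is hypothetical, and reading `<csv>` as a comma-separated list of candidate IDs is an assumption. The subcommands and flags are copied from the steps as written, and nothing is executed here.

```python
# Rounds for which step 6 (Absorb) applies, per the step list.
ABSORB_ROUNDS = {"r3", "r2", "r1"}


def round_commands(round_id, candidates, survivors, target_url, mode):
    """Return the qfai invocations for one round as argv lists, in step order."""
    # Assumption: <csv> means IDs joined by commas.
    candidates_csv = ",".join(candidates)
    survivors_csv = ",".join(survivors)
    commands = [
        # Step 1: round start.
        ["qfai", "prototyping", "round-start", "--round", round_id,
         "--candidates", candidates_csv, "--target-url", target_url,
         "--mode", mode],
        # Steps 2 and 3 (capture, evaluate) are carried out by the evaluator
        # sub-agent from command-plans.json and review-bundle.json, so they
        # do not appear as qfai commands in this list.
        # Step 4: harvest.
        ["qfai", "prototyping", "round-harvest", "--round", round_id],
        # Step 5: narrow.
        ["qfai", "prototyping", "round-narrow", "--round", round_id,
         "--survivors", survivors_csv],
    ]
    if round_id in ABSORB_ROUNDS:
        # Step 6: absorb, only for r3|r2|r1.
        commands.append(
            ["qfai", "prototyping", "round-absorb", "--round", round_id,
             "--survivors", survivors_csv]
        )
    # Step 7: reimplement verify, after reimplementation evidence is written.
    commands.append(
        ["qfai", "prototyping", "round-reimplement-verify", "--round", round_id]
    )
    return commands
```

For example, `round_commands("r3", ["a", "b", "c"], ["a", "b"], "http://localhost:3000", "standard")` yields five argv lists, where the candidate IDs and URL are placeholders. Passing each list to `subprocess.run` would execute them in order.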
23
+
24
+ ## Completion requirements
25
+
26
+ Completion (independent of mode) requires ALL of the following:
27
+
28
+ - at least one `polish` cycle completed after winner selection (capture + review + fix + re-capture + re-review)
29
+ - all declared screens have all 4 artifacts in the completion round / polish cycle
30
+ - blocking findings are closed or dispositioned
31
+ - `bestOfHistory` evidence present
32
+ - `breakthrough` evidence present
33
+ - every reviewer sub-agent scored every evaluation axis at `100/100`
34
+ - `qfai validate --profile prototyping --fail-on error` passes
35
+ - independent reviewer returns `PASS`
36
+ - the completion certificate proves `allReviewerAxesPerfect100=true`
37
+
38
+ ## Mode invariant
39
+
40
+ The completion gate above applies identically to `low-cost`, `standard`, and `full-harness`. The only mode-specific value is `maxCycles`:
41
+
42
+ - `low-cost`: `maxCycles = 1` — at most one polish cycle; completion is only reachable if the single polish cycle satisfies the full gate.
43
+ - `standard`: `maxCycles = 3` — default.
44
+ - `full-harness`: `maxCycles = 20` — extended polish budget.
45
+
46
+ If the polish-cycle budget is exhausted before the gate is satisfied, the run does not complete and is returned as `REVISE`. A 95-point threshold is a signal only and is not a completion border.
17
47
 
18
48
  ## Stop conditions
19
49
 
20
50
  You may stop only when all of the following are true:
21
51
 
22
- - all declared screens have screenshot + HTML evidence
52
+ - all declared screens have all 4 artifacts for the current accepted winner/polish state
53
+ - canonical latest paths match the current accepted winner/polish state
23
54
  - blocking findings are closed or dispositioned
24
- - validate passes with `--fail-on error`
55
+ - `qfai validate --profile prototyping --fail-on error` passes
25
56
  - independent reviewer returns `PASS`
57
+ - the completion certificate proves `allReviewerAxesPerfect100=true`
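
Note on the mode invariant and completion gate above: the mode changes only the polish-cycle budget, never the gate. A minimal sketch under stated assumptions follows. The `GateState` fields and both function names are hypothetical, and the check covers only part of the gate (polish count, reviewer scores, validate result, reviewer verdict). The artifact, `bestOfHistory`, `breakthrough`, and blocking-finding conditions are left out for brevity.

```python
from dataclasses import dataclass, field

# maxCycles per mode, as stated in the "Mode invariant" section.
MAX_CYCLES = {"low-cost": 1, "standard": 3, "full-harness": 20}


@dataclass
class GateState:
    """Hypothetical summary of a run; not the schema of prototyping.json."""
    polish_cycles_done: int = 0
    # Every reviewer sub-agent's score on every axis, each out of 100.
    axis_scores: list = field(default_factory=list)
    validate_passed: bool = False
    reviewer_result: str = "REVISE"  # "PASS" or "REVISE"


def gate_satisfied(state):
    """Partial completion check; identical for every mode."""
    return (
        state.polish_cycles_done >= 1
        and len(state.axis_scores) > 0
        # A 95 is a signal only; every axis must be exactly 100.
        and all(score == 100 for score in state.axis_scores)
        and state.validate_passed
        and state.reviewer_result == "PASS"
    )


def budget_remaining(mode, state):
    """True while the mode's polish-cycle budget is not exhausted."""
    return state.polish_cycles_done < MAX_CYCLES[mode]
```

When `gate_satisfied` is false and `budget_remaining` is false, the run corresponds to the `REVISE` outcome described above. In `low-cost` mode that situation arises after a single polish cycle.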