qfai 1.8.2 → 1.8.3
- package/README.md +9 -4
- package/assets/init/.qfai/assistant/agents/product-experience-architect.md +2 -1
- package/assets/init/.qfai/assistant/skills/qfai-atdd/SKILL.md +4 -4
- package/assets/init/.qfai/assistant/skills/qfai-configure/SKILL.md +1 -1
- package/assets/init/.qfai/assistant/skills/qfai-discussion/SKILL.md +1 -0
- package/assets/init/.qfai/assistant/skills/qfai-discussion/references/rcp_footer.md +1 -1
- package/assets/init/.qfai/assistant/skills/qfai-implement/SKILL.md +3 -1
- package/assets/init/.qfai/assistant/skills/qfai-prototyping/SKILL.md +121 -62
- package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/evidence-requirements.md +43 -12
- package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/iteration-cycle.md +46 -14
- package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/l1-review-guide.md +13 -12
- package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/l2-review-guide.md +16 -10
- package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/reviewer-gate.md +25 -4
- package/assets/init/.qfai/assistant/skills/qfai-sdd/SKILL.md +3 -3
- package/assets/init/.qfai/assistant/skills/qfai-sdd/references/rcp_footer.md +1 -1
- package/assets/init/.qfai/assistant/skills/qfai-sdd/references/sdd-quality-gate.md +1 -1
- package/assets/init/.qfai/assistant/skills/qfai-sdd/templates/contracts/absorption-policy.sample.yaml +7 -0
- package/assets/init/.qfai/assistant/skills/qfai-sdd/templates/contracts/evaluation-rubric.sample.yaml +20 -3
- package/assets/init/.qfai/assistant/skills/qfai-sdd/templates/contracts/evaluator-calibration.sample.yaml +6 -0
- package/assets/init/.qfai/assistant/skills/qfai-sdd/templates/contracts/exploration-brief.sample.yaml +9 -0
- package/assets/init/.qfai/assistant/skills/qfai-verify/SKILL.md +6 -6
- package/assets/init/.qfai/contracts/design/README.md +6 -1
- package/assets/init/.qfai/contracts/ui/README.md +2 -0
- package/assets/init/.qfai/discussion/README.md +14 -9
- package/assets/init/.qfai/evidence/README.md +66 -46
- package/assets/init/root/.github/workflows/qfai-validate.yml +39 -0
- package/assets/init/root/qfai.config.yaml +1 -2
- package/dist/cli/index.cjs +2539 -927
- package/dist/cli/index.cjs.map +1 -1
- package/dist/cli/index.mjs +2624 -1012
- package/dist/cli/index.mjs.map +1 -1
- package/dist/index.cjs +1120 -421
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +95 -23
- package/dist/index.d.ts +95 -23
- package/dist/index.mjs +1114 -414
- package/dist/index.mjs.map +1 -1
- package/package.json +3 -2
- package/assets/scripts/capture-screenshots.js +0 -128
package/README.md
CHANGED
|
@@ -52,13 +52,16 @@ npx qfai report
|
|
|
52
52
|
(`.qfai/review/review-*/summary.json` + minimum schema), writes `.qfai/report/validate.json`,
|
|
53
53
|
and appends run logs to `.qfai/report/run-*/`; use `--fail-on error` (or `--fail-on warning`) to turn it into a CI gate,
|
|
54
54
|
and `--format github` to emit GitHub-friendly annotations.
|
|
55
|
-
Use `--
|
|
55
|
+
Use `--profile discussion|sdd|prototyping|atdd|tdd|verify` for local skill-owned checks; CI should use default/full validation (or `verify` / `tdd` for the dedicated CI gates).
|
|
56
56
|
- `npx qfai report`
|
|
57
57
|
- Produces a human-readable report (`report.md` by default) or an internal JSON export (`report.json`) from `validate.json`; use `--base-url` to link file paths in Markdown to your repository viewer.
|
|
58
58
|
- `npx qfai doctor`
|
|
59
59
|
- Diagnoses configuration discovery, path resolution, glob scanning, and `validate.json` inputs before running validate/report; use `--fail-on` to enforce failures in CI.
|
|
60
60
|
Note: prototyping evidence (`.qfai/evidence/prototyping.json`) is produced by the AI workflow / skills
|
|
61
|
-
(`/qfai-prototyping`
|
|
61
|
+
(`/qfai-prototyping` — any mode; modes differ only in `maxCycles`, see spec-0012), not by a general-purpose end-user CLI flow.
|
|
62
|
+
Use `npx qfai prototyping round-start --round <r5|r3|r2|r1> --candidates <csv> --target-url <url> --mode <mode>`
|
|
63
|
+
to generate the round-scoped review bundle and command plans the AI evaluator sub-agent consumes, then use
|
|
64
|
+
`round-harvest`, `round-narrow`, `round-absorb`, and `round-reimplement-verify` to advance the candidate funnel.
|
|
62
65
|
`qfai validate` consumes the resulting evidence files, including `mode.effective` and `fullHarness` metadata when present.
|
|
63
66
|
Traceability refs inside prototyping evidence must use repo-root-relative concrete artifact refs (for example `.qfai/specs/spec-0001/01_Spec.md#L3` or `.qfai/evidence/render.json#/screens/0`).
|
|
64
67
|
Absolute paths are invalid. The same strict ref grammar is enforced for top-level and leaf evidence-bearing fields, including
|
|
@@ -112,7 +115,9 @@ QFAI includes a small set of custom skills (stored under `.qfai/assistant/skills
|
|
|
112
115
|
as 15 required markdown files under `.qfai/discussion/discussion-<ts>/`.
|
|
113
116
|
UI-bearing discussion packs may include `prototyping.yaml` as an optional recommendation artifact; non-ui discussion packs typically omit it.
|
|
114
117
|
- **qfai-sdd**: Unified SDD entrypoint with discussion-pack preflight guard (missing/incomplete/blocking OQ causes stop + next action guidance).
|
|
115
|
-
- **qfai-prototyping**: Build a contract-aligned UI prototype
|
|
118
|
+
- **qfai-prototyping**: Build a contract-aligned UI prototype using the Playwright CLI + AI
|
|
119
|
+
evaluator harness (spec-0012). Modes (`low-cost`/`standard`/`full-harness`) run the same
|
|
120
|
+
strictest review cycle; only `maxCycles` (1/3/20) differs.
|
|
116
121
|
- **qfai-atdd**: Implement acceptance tests driven by specs/scenarios.
|
|
117
122
|
- **qfai-implement**: Unified TDD micro-cycle (Red/Green/Refactor) one test at a time using `test-list.md` as the execution ledger, including ledger status updates and exception closure.
|
|
118
123
|
- **qfai-verify**: Run full-scan local quality gates (`validate --fail-on error`, `report`, repo gates) and produce reviewer-approved evidence under `.qfai/evidence/`.
|
|
@@ -286,7 +291,7 @@ npx qfai validate --fail-on error
|
|
|
286
291
|
|
|
287
292
|
Recommended baseline.
|
|
288
293
|
|
|
289
|
-
- Keep CI on default/full validation (`qfai validate --fail-on error`); do not use
|
|
294
|
+
- Keep CI on default/full validation (`qfai validate --fail-on error` or `qfai validate --profile verify --fail-on error`); do not use partial profiles in CI.
|
|
290
295
|
- Keep `pnpm check-types:future` as a separate mandatory gate so future TS compatibility runs once without duplicating `pnpm ci:gate`.
|
|
291
296
|
- Add a report step (`npx qfai report`) when you need a human-readable artifact.
|
|
292
297
|
- Tune traceability globs in `qfai.config.yaml` to match your test layout.
|
|
package/assets/init/.qfai/assistant/agents/product-experience-architect.md
CHANGED
|
@@ -24,7 +24,8 @@
|
|
|
24
24
|
- .github/instructions/principles.instructions.md
|
|
25
25
|
- .instruction/00_universal/development-principles-checklist.md
|
|
26
26
|
- .instruction/01_specialties/design.md
|
|
27
|
-
- Exploration brief, reference pool, evaluation rubric, evaluator calibration, selected direction, finalized design system,
|
|
27
|
+
- Exploration brief, reference pool, evaluation rubric, evaluator calibration, selected direction, finalized design system,
|
|
28
|
+
screen contracts, optional tokens, optional fallback HTML/CSS mock, and Mermaid flows
|
|
28
29
|
- Runtime screenshots or rendered evidence when available
|
|
29
30
|
|
|
30
31
|
## Deliverables
|
|
package/assets/init/.qfai/assistant/skills/qfai-atdd/SKILL.md
CHANGED
|
@@ -111,7 +111,7 @@ Use the shared schema.
|
|
|
111
111
|
- ATDD-specific reviewer checks:
|
|
112
112
|
- coverage obligations met: E2E covers `US`, Integration covers `TC`, API covers `CON-API`;
|
|
113
113
|
- Coverage Depth Matrix is reviewed and no unjustified `X` cells remain;
|
|
114
|
-
- validation evidence exists and `qfai validate --fail-on error` passes;
|
|
114
|
+
- validation evidence exists and `qfai validate --profile atdd --fail-on error` passes;
|
|
115
115
|
- Drift Protocol is enforced;
|
|
116
116
|
- test-layer policy is checked against `.qfai/assistant/steering/test-layers.md`;
|
|
117
117
|
- coverage floors and ratios are signals, not gates;
|
|
@@ -146,7 +146,7 @@ Follow `.qfai/assistant/instructions/shared-skill-operating-baseline.md#delta-re
|
|
|
146
146
|
- Do NOT declare completion based on unit/component tests.
|
|
147
147
|
- `10_Plan.md` is the primary How SSOT for execution phases.
|
|
148
148
|
- If `10_Plan.md` is missing, stop and run owner planning flow before proceeding.
|
|
149
|
-
- Completion gate is validation with zero errors (`qfai validate --fail-on error`).
|
|
149
|
+
- Completion gate is validation with zero errors (`qfai validate --profile atdd --fail-on error`).
|
|
150
150
|
- Coverage obligations are mandatory:
|
|
151
151
|
- `tests/e2e/**` must cover all required `US-*`.
|
|
152
152
|
- `tests/integration/**` must cover all required `TC-*`.
|
|
@@ -219,7 +219,7 @@ Notes:
|
|
|
219
219
|
- All required `US` are covered by E2E tests.
|
|
220
220
|
- All required `TC` are covered by integration tests.
|
|
221
221
|
- All required `CON-API` are covered by API tests.
|
|
222
|
-
- Validation passes: `qfai validate --fail-on error`.
|
|
222
|
+
- Validation passes: `qfai validate --profile atdd --fail-on error`.
|
|
223
223
|
- Repository quality gates (format/lint/type/tests/pack) pass with evidence.
|
|
224
224
|
- Evidence file exists and includes work orders + reviewer notes.
|
|
225
225
|
- Completion is approved by a reviewer who did not implement tests.
|
|
@@ -318,7 +318,7 @@ Before declaring completion:
|
|
|
318
318
|
2. Run:
|
|
319
319
|
|
|
320
320
|
```bash
|
|
321
|
-
qfai validate --fail-on error
|
|
321
|
+
qfai validate --profile atdd --fail-on error
|
|
322
322
|
```
|
|
323
323
|
|
|
324
324
|
3. Run repository standard gates:
|
|
package/assets/init/.qfai/assistant/skills/qfai-configure/SKILL.md
CHANGED
|
@@ -88,7 +88,7 @@ Use the shared schema.
|
|
|
88
88
|
- Follow `.qfai/assistant/instructions/shared-skill-delegation-baseline.md#reviewer-gate-baseline`.
|
|
89
89
|
- Reviewer checks:
|
|
90
90
|
- required roles were delegated;
|
|
91
|
-
-
|
|
91
|
+
- doctor evidence exists: `qfai doctor --fail-on error` completed without failing checks;
|
|
92
92
|
- Drift Protocol enforced;
|
|
93
93
|
- test-layer policy enforced against `.qfai/assistant/steering/test-layers.md`;
|
|
94
94
|
- tool-count heuristics are signals, not gates.
|
|
package/assets/init/.qfai/assistant/skills/qfai-discussion/SKILL.md
CHANGED
|
@@ -82,6 +82,7 @@ Before declaring completion, you MUST:
|
|
|
82
82
|
- ensure every deferred item has full metadata in `13_Deferred.md`;
|
|
83
83
|
- ensure `02_Inception-Deck.md` and `03_Story-Workshop.md` include Mermaid diagrams;
|
|
84
84
|
- ensure the UI-bearing sidecar family is complete;
|
|
85
|
+
- run `qfai validate --profile discussion --fail-on error` and fix discussion-owned findings;
|
|
85
86
|
- avoid selecting a single visual winner in discussion artifacts.
|
|
86
87
|
|
|
87
88
|
### Reviewer Gate (MUST)
|
|
package/assets/init/.qfai/assistant/skills/qfai-discussion/references/rcp_footer.md
CHANGED
|
@@ -28,7 +28,7 @@
|
|
|
28
28
|
|
|
29
29
|
## Validate Hard Gate (Required)
|
|
30
30
|
|
|
31
|
-
- Each review cycle has run `qfai validate --fail-on error --format github`
|
|
31
|
+
- Each review cycle has run `qfai validate --profile discussion --fail-on error --format github`
|
|
32
32
|
- `.qfai/report/validate.log` exists and corresponds to the latest artifacts
|
|
33
33
|
|
|
34
34
|
---
|
|
package/assets/init/.qfai/assistant/skills/qfai-implement/SKILL.md
CHANGED
|
@@ -289,6 +289,7 @@ Completion MUST NOT be declared when any of the following are true:
|
|
|
289
289
|
- Items with `todo`, `red`, `green`, or `refactor` status still exist (for spec-level completion)
|
|
290
290
|
- Parallel slices were used but integration verify has not been run post-merge
|
|
291
291
|
- Checkpoint boundary was reached but verification was not executed
|
|
292
|
+
- `it.todo(...)` / `test.todo(...)` / `describe.todo(...)` stubs remain in any file covered by `validation.traceability.testFileGlobs` (`QFAI-TEST-001`, spec-0012). Implement the body or delete the stub — an opt-out via `validation.testStrategy.forbidTestTodoStubs: false` is permitted only with an accompanying waiver DR-ID.
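To make the stub rule above concrete, here is a minimal sketch of a scanner that reports lines containing `it.todo(...)`, `test.todo(...)`, or `describe.todo(...)`. It is not the QFAI-TEST-001 implementation; the function name `find_todo_stubs` and the regex are assumptions made for illustration, and a real check would also need to resolve `validation.traceability.testFileGlobs`.

```python
import re

# Hypothetical pattern: matches it.todo(, test.todo(, describe.todo(
# when the identifier starts at a word boundary.
TODO_STUB = re.compile(r"\b(?:it|test|describe)\.todo\s*\(")


def find_todo_stubs(source: str) -> list[int]:
    """Return the 1-based line numbers that contain a todo stub."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if TODO_STUB.search(line)
    ]
```

A text-level scan like this can flag stubs that appear inside comments or strings, so it is best read as a quick local pre-check rather than a replacement for `qfai validate --profile tdd --fail-on error`.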
|
|
292
293
|
|
|
293
294
|
## Evidence (MANDATORY)
|
|
294
295
|
|
|
@@ -335,6 +336,7 @@ Each TDD item MUST have fresh evidence containing at minimum:
|
|
|
335
336
|
- [ ] No backward transitions occurred.
|
|
336
337
|
- [ ] Exception items have DR-IDs recorded.
|
|
337
338
|
- [ ] All tests pass.
|
|
339
|
+
- [ ] `qfai validate --profile tdd --fail-on error` passes with zero `QFAI-TEST-001` findings (no `it.todo` / `test.todo` / `describe.todo` stubs remain; spec-0012).
|
|
338
340
|
|
|
339
341
|
## Completion Checklist (MUST)
|
|
340
342
|
|
|
@@ -349,7 +351,7 @@ Each TDD item MUST have fresh evidence containing at minimum:
|
|
|
349
351
|
When this skill is complete, provide a final user-facing completion message and enumerate all actionable next steps.
|
|
350
352
|
|
|
351
353
|
- Verify gates: `/qfai-verify`.
|
|
352
|
-
Action: run `qfai validate --fail-on error`
|
|
354
|
+
Action: run `qfai validate --profile tdd --fail-on error` for this skill, then `/qfai-verify` for full-scan approval.
|
|
353
355
|
- Spec updates needed: `/qfai-sdd`.
|
|
354
356
|
Action: update spec artifacts if implementation revealed scope changes.
|
|
355
357
|
- Acceptance tests: `/qfai-atdd`.
|
|
package/assets/init/.qfai/assistant/skills/qfai-prototyping/SKILL.md
CHANGED
|
@@ -30,18 +30,28 @@ Do not rely on a CLI entrypoint or package runtime loop.
|
|
|
30
30
|
## CRITICAL CONSTRAINTS (Read First)
|
|
31
31
|
|
|
32
32
|
- Scope is all specs from `.qfai/specs/spec-*`.
|
|
33
|
-
-
|
|
34
|
-
-
|
|
35
|
-
-
|
|
36
|
-
-
|
|
37
|
-
-
|
|
38
|
-
-
|
|
33
|
+
- The AI evaluator sub-agent performs visual evaluation. QFAI does not score visual quality. (spec-0012)
|
|
34
|
+
- Playwright CLI (`playwright-cli`) is the sole standard browser tool. Playwright MCP, Node Playwright direct invocation, and screenshot-capture shell scripts are not used. (spec-0012)
|
|
35
|
+
- QFAI pre-assigns evidence paths. The evaluator MUST use the paths in the command plan (`review-bundle.json` → `command-plans.json`); it MUST NOT invent paths.
|
|
36
|
+
- For every declared screen and every active candidate in every round, 4 evidence artifacts are mandatory:
|
|
37
|
+
- screenshot: `.qfai/evidence/prototyping/rounds/<round>/candidates/<candidate-id>/<screen-id>.png`
|
|
38
|
+
- HTML: `.qfai/evidence/prototyping/rounds/<round>/candidates/<candidate-id>/<screen-id>.html`
|
|
39
|
+
- accessibility snapshot: `.qfai/evidence/prototyping/rounds/<round>/candidates/<candidate-id>/<screen-id>.snapshot.txt`
|
|
40
|
+
- command log: `.qfai/evidence/prototyping/rounds/<round>/candidates/<candidate-id>/<screen-id>.commands.json`
|
|
41
|
+
- Canonical latest screenshot path: `.qfai/evidence/prototyping/screenshots/<screen-id>.png`
|
|
42
|
+
- Canonical latest HTML path: `.qfai/evidence/prototyping/html/<screen-id>.html`
|
|
43
|
+
- Canonical latest paths MUST mirror the latest accepted winner/polish artifacts.
|
|
44
|
+
- If any of the 4 artifacts is missing for a declared screen, the round is incomplete; rerun is mandatory, not waiver.
|
|
45
|
+
- Mode differences are limited to `maxCycles` only (low-cost=1, standard=3, full-harness=20). Every other gate, obligation, reviewer severity, and completion criterion is identical across modes. (spec-0012)
|
|
46
|
+
- DONE is forbidden until `qfai validate --profile prototyping --fail-on error` passes and `/qfai-verify` can approve the run.
|
|
39
47
|
- Supported UI prototyping surfaces are `web`, `mobile`, `desktop`, and `mixed`.
|
|
40
48
|
- `cli`, API-only, backend-only, and `ui_bearing: false` classifications are not prototyping execution targets.
|
|
41
|
-
-
|
|
42
|
-
-
|
|
43
|
-
-
|
|
44
|
-
-
|
|
49
|
+
- Machine checks are limited to schema/evidence validation, mode invariant enforcement, review-cycle completeness, and breakthrough trigger detection.
|
|
50
|
+
- Shared evidence vocabulary: `prototyping.json`, `review-bundle.json`, `command-plans.json`, `evaluator-reviews/<candidate-id>.json`, `harvest.json`, `absorption-plan.json`, `reimplementation.json`, `breakthrough.json`.
|
|
51
|
+
- Direction funnel completion is not stage completion.
|
|
52
|
+
- Selecting the first winner does not satisfy completion. Completion review is forbidden until at least one post-selection polish cycle has completed.
|
|
53
|
+
- Completion requires every reviewer sub-agent to score every evaluation axis at `100/100`; `95` is not a completion border.
|
|
54
|
+
- Do not use `complete`, `completed`, `done`, or equivalent completion wording in other languages before the completion checklist passes. Use `exploration complete`, `winner selected`, `polishing`, `breakthrough checking`, or `reviewer gate pending` for interim states.
|
|
45
55
|
|
|
46
56
|
## Goal
|
|
47
57
|
|
|
@@ -50,13 +60,17 @@ Generate multiple design directions, converge on a winner, extract the selected
|
|
|
50
60
|
## Surface / Mode
|
|
51
61
|
|
|
52
62
|
- surface / mode routing uses `standard` as the default execution path.
|
|
53
|
-
-
|
|
54
|
-
- `
|
|
63
|
+
- **Mode Invariant (spec-0012)**: modes differ only by `maxCycles`. Review gate, evidence requirements, reviewer severity, best-of-history, breakthrough detection, and completion criteria are identical across modes.
|
|
64
|
+
- `low-cost`: `maxCycles = 1`
|
|
65
|
+
- `standard`: `maxCycles = 3` (default)
|
|
66
|
+
- `full-harness`: `maxCycles = 20`
|
|
67
|
+
- No mode weakens obligations. Choosing a lower mode buys fewer chances to iterate, not a looser gate.
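The mode invariant above can be pictured as a single lookup table: the mode selects a cycle budget and nothing else. The sketch below uses the mode names and `maxCycles` values stated in the list; the names `MAX_CYCLES` and `max_cycles_for` are hypothetical and not part of the QFAI API.

```python
# Mode names and budgets are taken from the list above; everything else
# (constant and function names) is illustrative only.
MAX_CYCLES = {"low-cost": 1, "standard": 3, "full-harness": 20}
DEFAULT_MODE = "standard"


def max_cycles_for(mode: str = DEFAULT_MODE) -> int:
    """Return the cycle budget for a mode; no other setting depends on the mode."""
    try:
        return MAX_CYCLES[mode]
    except KeyError:
        raise ValueError(f"unknown prototyping mode: {mode!r}") from None
```

Keeping the mapping this small mirrors the stated design choice: a lower mode reduces the number of iterations available, while gates and completion criteria stay the same.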
|
|
55
68
|
|
|
56
69
|
## Required References
|
|
57
70
|
|
|
58
71
|
Read and follow these references before execution:
|
|
59
72
|
|
|
73
|
+
- `.qfai/specs/spec-0012/01_Spec.md` — primary SSOT for the prototyping harness
|
|
60
74
|
- `.qfai/assistant/skills/qfai-prototyping/references/evidence-requirements.md`
|
|
61
75
|
- `.qfai/assistant/skills/qfai-prototyping/references/iteration-cycle.md`
|
|
62
76
|
- `.qfai/assistant/skills/qfai-prototyping/references/l1-review-guide.md`
|
|
@@ -73,13 +87,13 @@ All sub-agent delegation in this skill MUST follow the category-to-role mapping
|
|
|
73
87
|
Assigning a task to a role not listed for the category is a violation and MUST be flagged.
|
|
74
88
|
Evaluation scoring and screenshot capture must use only the allowed roles below.
|
|
75
89
|
|
|
76
|
-
| Category
|
|
77
|
-
|
|
|
78
|
-
| UI implementation
|
|
79
|
-
|
|
|
80
|
-
| Evaluation scoring
|
|
81
|
-
| Build
|
|
82
|
-
| Breakthrough planning
|
|
90
|
+
| Category | Allowed Role(s) |
|
|
91
|
+
| ---------------------------------- | ------------------------------------------------------ |
|
|
92
|
+
| UI implementation | frontend-engineer, product-experience-architect |
|
|
93
|
+
| Playwright CLI execution & capture | product-surface-reviewer, product-experience-architect |
|
|
94
|
+
| Evaluation scoring | product-surface-reviewer, product-experience-architect |
|
|
95
|
+
| Build | devops-ci-engineer, backend-engineer |
|
|
96
|
+
| Breakthrough planning | product-experience-architect, frontend-engineer |
|
|
83
97
|
|
|
84
98
|
Any delegation map entry that assigns a category to an undefined or unlisted role MUST produce a violation finding naming the undefined role and the category.
|
|
85
99
|
|
|
@@ -91,7 +105,7 @@ Before any code is written, create an execution plan record in the work evidence
|
|
|
91
105
|
|
|
92
106
|
Required fields:
|
|
93
107
|
|
|
94
|
-
- `
|
|
108
|
+
- `targetRounds`: ordered array; default funnel is `["r5", "r3", "r2", "r1"]`
|
|
95
109
|
- `funnelPolicy`: `5->3->2->1`
|
|
96
110
|
- `evaluationAxesSource`: ref to `.qfai/contracts/design/evaluation-rubric.yaml`
|
|
97
111
|
- `delegationMap`: category-to-role assignments per Delegation Scope Table
|
|
@@ -139,56 +153,84 @@ Confirm all of the following before any evaluation:
|
|
|
139
153
|
Generate 5 clearly distinct design directions before selecting a winner.
|
|
140
154
|
Do not begin with a single incumbent direction.
|
|
141
155
|
|
|
142
|
-
### Step 4 —
|
|
156
|
+
### Step 4 — Round Start: Prepare Candidate Review Bundle & Command Plans
|
|
143
157
|
|
|
144
|
-
|
|
158
|
+
Before launching the evaluator, prepare the round-scoped artifacts via QFAI (not by hand):
|
|
145
159
|
|
|
146
|
-
-
|
|
147
|
-
-
|
|
148
|
-
-
|
|
160
|
+
- Run `qfai prototyping round-start --round <rN> --candidates <csv> --target-url <url> --mode <mode>`.
|
|
161
|
+
- QFAI produces:
|
|
162
|
+
- `.qfai/evidence/prototyping/rounds/<rN>/command-plans.json` — the candidate-aware Playwright CLI command plans
|
|
163
|
+
- `.qfai/evidence/prototyping/rounds/<rN>/review-bundle.json` — the evaluator input bundle (candidates, axisDefs, designSystemChecklist, commandPlanRef)
|
|
164
|
+
- Do not invent evidence paths. Paths are fixed by QFAI per spec-0012.
|
|
149
165
|
|
|
150
|
-
### Step 5 —
|
|
166
|
+
### Step 5 — AI Evaluator Executes the Command Plans and Captures Evidence
|
|
151
167
|
|
|
152
|
-
|
|
168
|
+
For every declared screen of every active candidate in the current round, the AI evaluator sub-agent:
|
|
153
169
|
|
|
154
|
-
-
|
|
155
|
-
-
|
|
156
|
-
|
|
157
|
-
|
|
158
|
-
|
|
170
|
+
1. Reads `command-plans.json` for the round
|
|
171
|
+
2. Runs `playwright-cli goto <url>` for the candidate route
|
|
172
|
+
3. Runs `playwright-cli snapshot --save <candidate-path>/<screen-id>.snapshot.txt`
|
|
173
|
+
4. Performs interaction commands (click/fill) to exercise `primaryTasks` noted in the plan
|
|
174
|
+
5. Runs `playwright-cli screenshot --full-page --save <candidate-path>/<screen-id>.png`
|
|
175
|
+
6. Runs `playwright-cli eval "document.documentElement.outerHTML" > <candidate-path>/<screen-id>.html`
|
|
176
|
+
7. Saves the sequence of executed commands to `<candidate-path>/<screen-id>.commands.json`
|
|
159
177
|
|
|
160
|
-
|
|
178
|
+
If any capture step fails, the evaluator records the failure and must not treat the screen as evaluated. The round is incomplete and must be rerun.
|
|
179
|
+
|
|
180
|
+
### Step 6 — Launch Evaluation Reviewers
|
|
181
|
+
|
|
182
|
+
Launch evaluation reviewer sub-agents with the full context bundle. Inputs are read from `review-bundle.json`:
|
|
183
|
+
|
|
184
|
+
- per-screen screenshot, HTML, accessibility snapshot, and command log under `rounds/<round>/candidates/<candidate-id>/`
|
|
185
|
+
- `axisDefs` (from `.qfai/contracts/design/evaluation-rubric.yaml`)
|
|
186
|
+
- `previousScore` from the prior round when available
|
|
187
|
+
- `designSystemChecklist` (from `.qfai/contracts/design/design-system.yaml`)
|
|
188
|
+
- `commandPlanRef` pointing at `command-plans.json`
|
|
189
|
+
|
|
190
|
+
The reviewer writes `rounds/<round>/evaluator-reviews/<candidate-id>.json` with concrete `evidenceRefs[]` for every score. Placeholder refs are rejected.
|
|
191
|
+
|
|
192
|
+
### Step 7 — Harvest and Direction Funnel
|
|
161
193
|
|
|
162
194
|
Run the mandatory convergence funnel:
|
|
163
195
|
|
|
164
|
-
- 5 directions -> top 3
|
|
165
|
-
- top 3 remixed -> top 2
|
|
166
|
-
- top 2 -> selected winner
|
|
196
|
+
- `r5`: 5 directions -> top 3
|
|
197
|
+
- `r3`: top 3 remixed -> top 2
|
|
198
|
+
- `r2`: top 2 -> selected winner `r1`
|
|
167
199
|
|
|
168
|
-
|
|
200
|
+
At the end of each harvestable round:
|
|
201
|
+
|
|
202
|
+
- run `qfai prototyping round-harvest --round <rN>`
|
|
203
|
+
- record survivors with `qfai prototyping round-narrow --round <rN> --survivors <csv>`
|
|
204
|
+
- for `r3|r2|r1`, generate absorption templates with `qfai prototyping round-absorb --round <rN> --survivors <csv>`
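As a rough illustration of the end-of-round sequence above, the sketch below assembles the command lines for one round. The subcommands and flags are the ones quoted in the list; the helper name `round_commands` and the survivor-count check (3 after `r5`, 2 after `r3`, 1 after `r2`) are assumptions derived from the `5->3->2->1` funnel, not documented QFAI behavior.

```python
# Survivor counts implied by the funnel 5 -> 3 -> 2 -> 1 (an interpretation).
EXPECTED_SURVIVORS = {"r5": 3, "r3": 2, "r2": 1}
ABSORB_ROUNDS = {"r3", "r2", "r1"}


def round_commands(round_id: str, survivors: list[str]) -> list[str]:
    """Build the harvest/narrow/absorb command lines for one round."""
    expected = EXPECTED_SURVIVORS.get(round_id)
    if expected is not None and len(survivors) != expected:
        raise ValueError(
            f"{round_id} should keep {expected} survivor(s), got {len(survivors)}"
        )
    csv = ",".join(survivors)
    commands = [
        f"qfai prototyping round-harvest --round {round_id}",
        f"qfai prototyping round-narrow --round {round_id} --survivors {csv}",
    ]
    if round_id in ABSORB_ROUNDS:
        commands.append(
            f"qfai prototyping round-absorb --round {round_id} --survivors {csv}"
        )
    return commands
```

The helper only formats strings; it does not run the CLI, so it can be used to preview a plan before executing it.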
|
|
205
|
+
|
|
206
|
+
### Step 8 — Extract Winner Contracts
|
|
169
207
|
|
|
170
208
|
After the first winner is selected:
|
|
171
209
|
|
|
172
210
|
- write `.qfai/contracts/design/selected-direction.yaml`
|
|
173
211
|
- extract `.qfai/contracts/design/design-system.yaml`
|
|
174
212
|
|
|
175
|
-
|
|
213
|
+
Selecting the first winner is not completion. Do not start completion review and do not use completion wording until Step 9, Step 10, Step 12, reviewer gate, and the perfect-100 score gate pass.
|
|
214
|
+
|
|
215
|
+
### Step 9 — Polish the Winner
|
|
176
216
|
|
|
177
217
|
Iterate on the selected winner with normal critique/rework loops.
|
|
178
218
|
Do not assume the latest iteration is automatically best; keep best-of-history in evidence.
|
|
219
|
+
At least one full post-selection polish loop is mandatory. Each polish loop must include critique, fix, re-capture, re-review, and breakthrough check evidence.
|
|
179
220
|
|
|
180
|
-
##
|
|
221
|
+
## Cycle Gate
|
|
181
222
|
|
|
182
|
-
-
|
|
183
|
-
-
|
|
184
|
-
-
|
|
223
|
+
- Completion requires at least one `polish` cycle after winner selection (spec-0012). This applies to all modes.
|
|
224
|
+
- The same gate applies in every mode; modes differ only in `maxCycles` (low-cost=1, standard=3, full-harness=20).
|
|
225
|
+
- If the polish-cycle budget is exhausted before the gate is satisfied, the run does NOT complete. The evaluator returns `REVISE` and the developer may re-run at a higher mode.
|
|
226
|
+
- Any phase transition to completion must pass through the cycle gate and the reviewer gate.
|
|
185
227
|
|
|
186
|
-
### Step
|
|
228
|
+
### Step 10 — Breakthrough Detection
|
|
187
229
|
|
|
188
230
|
After each polish iteration, run the mechanical breakthrough detector.
|
|
189
|
-
If `
|
|
231
|
+
If `allReviewerAxesPerfect100` is false and score improvement is below the configured plateau threshold and code change is below the configured diff threshold, trigger breakthrough branching.
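The trigger condition above is a conjunction of three checks, which the following sketch spells out. Parameter names other than the `allReviewerAxesPerfect100` flag are hypothetical, and the thresholds are whatever the project configures; the units of the score and diff measures are not specified in the source.

```python
def should_trigger_breakthrough(
    all_reviewer_axes_perfect_100: bool,
    score_improvement: float,
    plateau_threshold: float,
    code_change: float,
    diff_threshold: float,
) -> bool:
    """True only when scores are imperfect, improvement has plateaued,
    and the code change is below the diff threshold."""
    return (
        not all_reviewer_axes_perfect_100
        and score_improvement < plateau_threshold
        and code_change < diff_threshold
    )
```

For example, with assumed thresholds of 2.0 for score gain and 50 for changed lines, a cycle that gains 0.5 points with 10 changed lines and imperfect scores would trigger branching, while a run where every axis is already at 100 never would.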
|
|
190
232
|
|
|
191
|
-
### Step
|
|
233
|
+
### Step 11 — Breakthrough Branch Loop
|
|
192
234
|
|
|
193
235
|
When breakthrough is triggered:
|
|
194
236
|
|
|
@@ -198,21 +240,26 @@ When breakthrough is triggered:
|
|
|
198
240
|
- refresh selected-direction/design-system if the winner changes
|
|
199
241
|
- record the decision in `.qfai/evidence/breakthrough.json`
|
|
200
242
|
|
|
201
|
-
### Step
|
|
243
|
+
### Step 12 — Validate and Verify
|
|
202
244
|
|
|
203
|
-
- Run `qfai validate --fail-on error`.
|
|
245
|
+
- Run `qfai validate --profile prototyping --fail-on error`.
|
|
204
246
|
- Route `/qfai-verify` or its equivalent gate workflow for final quality approval.
|
|
205
247
|
- Do not declare completion until the reviewer result is `PASS`.
|
|
206
248
|
|
|
207
249
|
## Evaluator Inputs (Mandatory)
|
|
208
250
|
|
|
209
|
-
|
|
251
|
+
Evaluation reviewer sub-agents MUST be launched with the `review-bundle.json` for the current round. The bundle contains all required inputs. At a minimum, the bundle MUST reference:
|
|
252
|
+
|
|
253
|
+
1. screenshots (per declared screen, round/candidate path)
|
|
254
|
+
2. HTML snapshots (per declared screen, round/candidate path)
|
|
255
|
+
3. accessibility snapshots (`<screen-id>.snapshot.txt` per declared screen, round/candidate path)
|
|
256
|
+
4. Playwright CLI command log (`<screen-id>.commands.json` per declared screen, round/candidate path)
|
|
257
|
+
5. `axisDefs` from `.qfai/contracts/design/evaluation-rubric.yaml`
|
|
258
|
+
6. `previousScore` from the prior round when available
|
|
259
|
+
7. `designSystemChecklist` from `.qfai/contracts/design/design-system.yaml`
|
|
260
|
+
8. `commandPlanRef` pointing at `command-plans.json`
|
|
210
261
|
|
|
211
|
-
|
|
212
|
-
2. HTML snapshots
|
|
213
|
-
3. axisDefs
|
|
214
|
-
4. previousScore
|
|
215
|
-
5. designSystemChecklist
|
|
262
|
+
The evaluator writes `evaluator-reviews/<candidate-id>.json` with per-axis `score`, `rationale`, and `evidenceRefs[]`. Every `evidenceRefs[]` entry MUST point to an existing artifact; placeholder strings (`""`, `"tbd"`, `"TBD"`) are rejected by `qfai validate`.
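A small sketch of the placeholder rule above follows. The review shape used here (an `axes` list whose entries carry `axis` and `evidenceRefs`) is an assumption for illustration, since the source names only the per-axis fields `score`, `rationale`, and `evidenceRefs[]`. Treating an empty `evidenceRefs` list as a problem and ignoring any `#fragment` when checking existence are also assumptions; this is not how `qfai validate` is known to work.

```python
# Placeholder strings named in the text above.
PLACEHOLDER_REFS = {"", "tbd", "TBD"}


def find_bad_evidence_refs(review: dict, existing_artifacts: set[str]) -> list[str]:
    """List problems with evidenceRefs in a (hypothetically shaped) review."""
    problems = []
    for entry in review.get("axes", []):
        axis = entry.get("axis", "<unnamed>")
        refs = entry.get("evidenceRefs", [])
        if not refs:
            problems.append(f"{axis}: no evidenceRefs")
        for ref in refs:
            if ref.strip() in PLACEHOLDER_REFS:
                problems.append(f"{axis}: placeholder ref {ref!r}")
            elif ref.split("#", 1)[0] not in existing_artifacts:
                # Compare only the path part, ignoring any #fragment.
                problems.append(f"{axis}: missing artifact {ref!r}")
    return problems
```

Passing the set of existing artifact paths in as an argument keeps the check free of filesystem access, which makes it easy to exercise against a fixture.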
|
|
216
263
|
|
|
217
264
|
## Visual Quality Structural Checklist
|
|
218
265
|
|
|
@@ -238,9 +285,12 @@ Minimum reviewer responsibilities:
|
|
|
238
285
|
- verify mandatory screenshot/HTML evidence exists for every declared screen
|
|
239
286
|
- verify exploration brief, evaluation rubric, and evaluator calibration were used
|
|
240
287
|
- verify missing evidence caused rerun rather than waiver
|
|
241
|
-
- verify `qfai validate --fail-on error` completed successfully
|
|
288
|
+
- verify `qfai validate --profile prototyping --fail-on error` completed successfully
|
|
242
289
|
- verify breakthrough trigger evidence is present
|
|
243
290
|
- verify best-of-history handling is documented
|
|
291
|
+
- verify at least one post-selection polish iteration completed after winner selection
|
|
292
|
+
- verify every reviewer sub-agent scored every evaluation axis at `100/100`
|
|
293
|
+
- reject completion claims based on any 95-point threshold
|
|
244
294
|
- treat score/volume heuristics as signals, not gates
|
|
245
295
|
- return `Result: PASS | REVISE`
|
|
246
296
|
|
|
@@ -271,25 +321,34 @@ Use the shared schema (per-row `Status (PASS/REVISE)` column, reviewer response
|
|
|
271
321
|
|
|
272
322
|
Follow `.qfai/assistant/instructions/shared-skill-operating-baseline.md#completion-contract-shared`.
|
|
273
323
|
|
|
274
|
-
Prototyping-specific additions:
|
|
324
|
+
Prototyping-specific additions (apply to all modes identically):
|
|
275
325
|
|
|
276
326
|
- all specs are covered
|
|
277
|
-
- all declared screens have screenshot
|
|
327
|
+
- all declared screens have 4 artifacts per active candidate / round: screenshot, HTML, accessibility snapshot, Playwright CLI command log
|
|
328
|
+
- canonical latest paths mirror the latest accepted winner/polish state
|
|
329
|
+
- `review-bundle.json`, `command-plans.json`, and per-candidate evaluator reviews exist for every round
|
|
278
330
|
- `selected-direction.yaml` exists
|
|
279
331
|
- `design-system.yaml` exists
|
|
280
332
|
- `breakthrough.json` exists
|
|
281
|
-
- `
|
|
282
|
-
-
|
|
333
|
+
- `bestOfHistory` and `breakthrough` sections present in `prototyping.json`
|
|
334
|
+
- at least one post-selection polish cycle completed after winner selection
|
|
335
|
+
- every reviewer sub-agent scored every evaluation axis at `100/100`
|
|
336
|
+
- independent reviewer gate returned `PASS`
|
|
337
|
+
- `qfai validate --profile prototyping --fail-on error` passes
|
|
283
338
|
|
|
284
339
|
## FINAL CHECKLIST (Check Last)
|
|
285
340
|
|
|
286
341
|
- All specs are covered in the Coverage Matrix.
|
|
287
|
-
- Every declared screen has screenshot evidence.
|
|
288
|
-
-
|
|
342
|
+
- Every declared screen has screenshot, HTML, accessibility snapshot, and command log evidence per active candidate / round.
|
|
343
|
+
- Canonical latest paths mirror the latest accepted winner/polish artifacts.
|
|
344
|
+
- Mode invariant: `maxCycles` is the only mode-dependent field in `prototyping.json` (validated by `QFAI-PROT-MODE-001`).
|
|
289
345
|
- Missing evidence triggered rerun instead of waiver.
|
|
290
346
|
- Direction funnel `5->3->2->1` completed.
|
|
291
|
-
-
|
|
292
|
-
-
|
|
347
|
+
- Direction funnel completion was not treated as stage completion.
|
|
348
|
+
- At least one post-selection polish cycle completed with critique/fix/re-capture/re-review/breakthrough checks.
|
|
349
|
+
- Every reviewer sub-agent scored every evaluation axis at `100/100`.
|
|
350
|
+
- Breakthrough detector ran after polish cycles.
|
|
351
|
+
- Independent reviewer returned PASS; otherwise status is REVISE.
|
|
293
352
|
|
|
294
353
|
## Completion Message & Next Actions (MUST)
|
|
295
354
|
|
package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/evidence-requirements.md
CHANGED
|
@@ -1,31 +1,62 @@
|
|
|
1
1
|
# Evidence Requirements
|
|
2
2
|
|
|
3
|
-
## Mandatory evidence
|
|
3
|
+
## Mandatory evidence (per round, per candidate, per screen)
|
|
4
4
|
|
|
5
|
-
For every declared screen in `.qfai/contracts/ui/*.yaml`, collect
|
|
5
|
+
For every declared screen in `.qfai/contracts/ui/*.yaml`, collect all 4 artifacts for every active candidate in every round `<rN>`:
|
|
6
6
|
|
|
7
|
-
- screenshot: `.qfai/evidence/prototyping/
|
|
8
|
-
- HTML snapshot: `.qfai/evidence/prototyping/
|
|
7
|
+
- screenshot: `.qfai/evidence/prototyping/rounds/<rN>/candidates/<candidate-id>/<screen-id>.png`
|
|
8
|
+
- HTML snapshot: `.qfai/evidence/prototyping/rounds/<rN>/candidates/<candidate-id>/<screen-id>.html`
|
|
9
|
+
- accessibility snapshot: `.qfai/evidence/prototyping/rounds/<rN>/candidates/<candidate-id>/<screen-id>.snapshot.txt`
|
|
10
|
+
- Playwright CLI command log: `.qfai/evidence/prototyping/rounds/<rN>/candidates/<candidate-id>/<screen-id>.commands.json`
|
|
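A minimal sketch of the per-screen evidence layout added above, assuming the four paths are resolved relative to a project root. The helper names `expected_artifacts` and `missing_artifacts` are hypothetical and are not part of QFAI; only the directory layout and file suffixes come from the added lines.

```python
from pathlib import Path

# Suffixes of the four mandatory per-screen artifacts named in the diff:
# screenshot, HTML snapshot, accessibility snapshot, Playwright CLI command log.
ARTIFACT_SUFFIXES = (".png", ".html", ".snapshot.txt", ".commands.json")


def expected_artifacts(root, round_id, candidate_id, screen_id):
    """Return the four expected evidence paths for one screen (hypothetical helper)."""
    base = (
        Path(root) / ".qfai" / "evidence" / "prototyping"
        / "rounds" / round_id / "candidates" / candidate_id
    )
    return [base / f"{screen_id}{suffix}" for suffix in ARTIFACT_SUFFIXES]


def missing_artifacts(root, round_id, candidate_id, screen_id):
    """Return the expected paths that do not exist as files on disk."""
    return [
        path
        for path in expected_artifacts(root, round_id, candidate_id, screen_id)
        if not path.is_file()
    ]
```

Under the rules stated further down in this file, a non-empty result would mean the screen is scored `0` for the round and a rerun is required.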
11
|
+
|
|
12
|
+
## Per-round artifacts
|
|
13
|
+
|
|
14
|
+
Every exploration round MUST also produce these round-scoped artifacts:
|
|
15
|
+
|
|
16
|
+
- Playwright CLI command plans: `.qfai/evidence/prototyping/rounds/<rN>/command-plans.json`
|
|
17
|
+
- Review input bundle: `.qfai/evidence/prototyping/rounds/<rN>/review-bundle.json`
|
|
18
|
+
- Evaluator outputs: `.qfai/evidence/prototyping/rounds/<rN>/evaluator-reviews/<candidate-id>.json`
|
|
19
|
+
- Harvest template for `r5|r3|r2`: `.qfai/evidence/prototyping/rounds/<rN>/harvest.json`
|
|
20
|
+
- Narrow decision for `r5|r3|r2`: `.qfai/evidence/prototyping/rounds/<rN>/narrow-decision.json`
|
|
21
|
+
- Absorption plan for `r3|r2|r1`: `.qfai/evidence/prototyping/rounds/<rN>/absorption-plan.json`
|
|
22
|
+
- Reimplementation record for `r3|r2|r1`: `.qfai/evidence/prototyping/rounds/<rN>/reimplementation.json`
|
|
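The round-scoped list above depends on which round is running. As a rough illustration, the sketch below maps a round ID to the artifact names the added lines require for it. The function name `round_scoped_artifacts` is hypothetical, and returning paths relative to `rounds/<rN>/` is an assumption made for brevity.

```python
EXPLORATION_ROUNDS = ("r5", "r3", "r2", "r1")
# Harvest template and narrow decision are listed for r5|r3|r2.
NARROWING_ROUNDS = {"r5", "r3", "r2"}
# Absorption plan and reimplementation record are listed for r3|r2|r1.
ABSORBING_ROUNDS = {"r3", "r2", "r1"}


def round_scoped_artifacts(round_id, candidate_ids):
    """List round-scoped artifact names, relative to rounds/<rN>/ (hypothetical helper)."""
    if round_id not in EXPLORATION_ROUNDS:
        raise ValueError(f"unknown exploration round: {round_id}")
    names = ["command-plans.json", "review-bundle.json"]
    names += [f"evaluator-reviews/{cid}.json" for cid in candidate_ids]
    if round_id in NARROWING_ROUNDS:
        names += ["harvest.json", "narrow-decision.json"]
    if round_id in ABSORBING_ROUNDS:
        names += ["absorption-plan.json", "reimplementation.json"]
    return names
```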
23
|
+
|
|
24
|
+
## Canonical latest paths
|
|
25
|
+
|
|
26
|
+
The canonical latest screenshot and HTML MUST mirror the newest accepted winner/polish artifacts for the same `<screen-id>`:
|
|
27
|
+
|
|
28
|
+
- `.qfai/evidence/prototyping/screenshots/<screen-id>.png`
|
|
29
|
+
- `.qfai/evidence/prototyping/html/<screen-id>.html`
|
|
9
30
|
|
|
10
31
|
If either artifact is missing:
|
|
11
32
|
|
|
12
|
-
- the screen is scored `0`
|
|
13
|
-
- the
|
|
33
|
+
- the screen is scored `0` for the round
|
|
34
|
+
- the round is incomplete
|
|
14
35
|
- rerun is mandatory
|
|
15
36
|
|
|
16
37
|
Optional evidence is not allowed.
|
|
17
38
|
|
|
18
39
|
## Capture rules
|
|
19
40
|
|
|
41
|
+
- AI evaluator sub-agent executes the Playwright CLI command plans generated by QFAI.
|
|
42
|
+
- Paths are assigned by QFAI via `command-plans.json`. Do not invent paths.
|
|
20
43
|
- Use stable `screen-id` names from the canonical UI contracts.
|
|
21
|
-
- Overwrite stale evidence with fresh evidence from the current
|
|
22
|
-
- Do not reuse an older screenshot
|
|
44
|
+
- Overwrite stale evidence with fresh evidence from the current active winner/polish state while preserving round history under `rounds/<rN>/`.
|
|
45
|
+
- Do not reuse an older screenshot, HTML snapshot, accessibility snapshot, or command log after a fix.
|
|
23
46
|
- If capture fails, record the failure in work evidence and stop pretending the screen was evaluated.
|
|
24
47
|
|
|
48
|
+
## Mode invariant
|
|
49
|
+
|
|
50
|
+
Evidence requirements are identical for all modes (low-cost / standard / full-harness) per spec-0012. Modes differ only by `maxCycles` (1 / 3 / 20). Choosing a lower mode does NOT reduce evidence obligations.
|
|
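To make the mode invariant concrete, here is a small sketch of the mode-to-`maxCycles` mapping stated above. It assumes `maxCycles` is a top-level key of the parsed `prototyping.json`, which the diff does not confirm, and `max_cycles_matches_mode` is a hypothetical name, not the actual `QFAI-PROT-MODE-001` implementation.

```python
# Values taken from the added text: low-cost / standard / full-harness -> 1 / 3 / 20.
MAX_CYCLES_BY_MODE = {"low-cost": 1, "standard": 3, "full-harness": 20}


def max_cycles_matches_mode(mode, prototyping):
    """Check a parsed prototyping.json dict against the mode (hypothetical helper).

    Assumes `maxCycles` sits at the top level of the document.
    """
    if mode not in MAX_CYCLES_BY_MODE:
        raise ValueError(f"unknown mode: {mode}")
    return prototyping.get("maxCycles") == MAX_CYCLES_BY_MODE[mode]
```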
51
|
+
|
|
25
52
|
## Validate gate expectations
|
|
26
53
|
|
|
27
|
-
`qfai validate --fail-on error` must be able to confirm:
|
|
54
|
+
`qfai validate --profile prototyping --fail-on error` must be able to confirm, for every round:
|
|
28
55
|
|
|
29
|
-
- every declared screen has
|
|
30
|
-
-
|
|
31
|
-
-
|
|
56
|
+
- every declared screen has all 4 per-screen artifacts
|
|
57
|
+
- the round has `command-plans.json`, `review-bundle.json`, and per-candidate evaluator reviews
|
|
58
|
+
- canonical latest paths mirror the newest accepted winner/polish artifacts
|
|
59
|
+
- `review-bundle.json` has all required fields (candidates, axisDefs, designSystemChecklist, commandPlanRef)
|
|
60
|
+
- evaluator review `evidenceRefs[]` entries are concrete paths to existing files (no placeholders)
|
|
61
|
+
- `prototyping.json` `maxCycles` matches the mode (QFAI-PROT-MODE-001)
|
|
62
|
+
- `bestOfHistory`, `breakthrough`, and `reviewerGate` sections are present and populated
|
|
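Two of the gate expectations above are easy to illustrate: the required `review-bundle.json` fields and the rule that `evidenceRefs[]` must point at existing files. The sketch below assumes the four fields are top-level keys, that refs are paths relative to the project root, and that a placeholder is any ref still containing `<` or `>`. All three are assumptions, and both function names are hypothetical rather than part of `qfai validate`.

```python
from pathlib import Path

# Field names as listed in the added line for review-bundle.json.
REQUIRED_BUNDLE_FIELDS = ("candidates", "axisDefs", "designSystemChecklist", "commandPlanRef")


def missing_bundle_fields(bundle):
    """Return required field names absent from a parsed review bundle dict."""
    return [name for name in REQUIRED_BUNDLE_FIELDS if name not in bundle]


def unresolved_evidence_refs(review, root):
    """Return evidenceRefs that look like placeholders or do not exist as files."""
    unresolved = []
    for ref in review.get("evidenceRefs", []):
        looks_like_placeholder = "<" in ref or ">" in ref
        if looks_like_placeholder or not (Path(root) / ref).is_file():
            unresolved.append(ref)
    return unresolved
```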
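package/assets/init/.qfai/assistant/skills/qfai-prototyping/references/iteration-cycle.md
CHANGED
|
@@ -1,25 +1,57 @@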
|
|
|
1
|
-
#
|
|
1
|
+
# Round Lifecycle
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
## Phase taxonomy
|
|
4
4
|
|
|
5
|
-
|
|
6
|
-
2. Launch L1 and L2 evaluator sub-agents with the required inputs.
|
|
7
|
-
3. Aggregate findings and classify them by severity and disposition.
|
|
8
|
-
4. Fix the UI according to findings.
|
|
9
|
-
5. Re-capture screenshot and HTML evidence for every changed screen.
|
|
10
|
-
6. Re-run the evaluators.
|
|
5
|
+
The exploration funnel is expressed as fixed rounds plus optional polish cycles:
|
|
11
6
|
|
|
12
|
-
|
|
7
|
+
- exploration rounds: `r5`, `r3`, `r2`, `r1`
|
|
8
|
+
- post-selection loops: `polish`, `branch`, `reviewer_gate`, `completed`
|
|
13
9
|
|
|
14
|
-
-
|
|
15
|
-
|
|
16
|
-
|
|
10
|
+
`r1` is winner selection only. It does not count as a post-selection polish cycle and cannot be used as stage completion.
|
|
11
|
+
|
|
12
|
+
## Round steps
|
|
13
|
+
|
|
14
|
+
Each exploration round follows this order, driven by the AI evaluator sub-agent running Playwright CLI:
|
|
15
|
+
|
|
16
|
+
1. **Round start**: run `qfai prototyping round-start --round <rN> --candidates <csv> --target-url <url> --mode <mode>`.
|
|
17
|
+
2. **Capture**: execute `command-plans.json` (goto, snapshot, interaction, screenshot, html) for every declared screen of every active candidate, saving evidence at the assigned paths.
|
|
18
|
+
3. **Evaluate**: evaluator sub-agents read `review-bundle.json` and write per-candidate `evaluator-reviews/<candidate-id>.json` with concrete `evidenceRefs[]`.
|
|
19
|
+
4. **Harvest**: run `qfai prototyping round-harvest --round <rN>` to create the harvest template from the evaluated candidate set.
|
|
20
|
+
5. **Narrow**: run `qfai prototyping round-narrow --round <rN> --survivors <csv>` to record which candidates survive to the next round.
|
|
21
|
+
6. **Absorb**: for `r3|r2|r1`, run `qfai prototyping round-absorb --round <rN> --survivors <csv>` to generate the absorption plan for the surviving candidates.
|
|
22
|
+
7. **Reimplement verify**: run `qfai prototyping round-reimplement-verify --round <rN>` after reimplementation evidence is written.
|
|
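The CLI-driven steps above can be sketched as a list of argument vectors. This only builds the commands and does not run them. The subcommands and flags are copied from the added lines, while `round_cli_commands` is a hypothetical helper and the comma-joined rendering of `<csv>` is an assumption. Capture (step 2) and Evaluate (step 3) are carried out by evaluator sub-agents and are left out.

```python
def round_cli_commands(round_id, candidates, survivors, target_url, mode):
    """Build the qfai commands for one exploration round, in step order.

    Follows the step list literally: only the Absorb step is conditioned on
    the round, because that is the only step the text restricts to r3|r2|r1.
    """
    prefix = ["qfai", "prototyping"]
    commands = [
        prefix + ["round-start", "--round", round_id,
                  "--candidates", ",".join(candidates),
                  "--target-url", target_url, "--mode", mode],
        prefix + ["round-harvest", "--round", round_id],
        prefix + ["round-narrow", "--round", round_id,
                  "--survivors", ",".join(survivors)],
    ]
    if round_id in {"r3", "r2", "r1"}:
        commands.append(prefix + ["round-absorb", "--round", round_id,
                                  "--survivors", ",".join(survivors)])
    commands.append(prefix + ["round-reimplement-verify", "--round", round_id])
    return commands
```

Keeping the commands as lists rather than shell strings avoids quoting problems if they are later passed to `subprocess.run`.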
23
|
+
|
|
24
|
+
## Completion requirements
|
|
25
|
+
|
|
26
|
+
Completion (independent of mode) requires ALL of the following:
|
|
27
|
+
|
|
28
|
+
- at least one `polish` cycle completed after winner selection (capture + review + fix + re-capture + re-review)
|
|
29
|
+
- all declared screens have all 4 artifacts in the completion round / polish cycle
|
|
30
|
+
- blocking findings are closed or dispositioned
|
|
31
|
+
- `bestOfHistory` evidence present
|
|
32
|
+
- `breakthrough` evidence present
|
|
33
|
+
- every reviewer sub-agent scored every evaluation axis at `100/100`
|
|
34
|
+
- `qfai validate --profile prototyping --fail-on error` passes
|
|
35
|
+
- independent reviewer returns `PASS`
|
|
36
|
+
- the completion certificate proves `allReviewerAxesPerfect100=true`
|
|
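As an illustration of the strictest item in the list above, the sketch below checks that every reviewer scored every axis at exactly 100. The input shape (reviewer name mapped to a dict of axis scores) and the name `all_reviewer_axes_perfect` are assumptions; how QFAI actually derives `allReviewerAxesPerfect100` is not shown in the diff.

```python
def all_reviewer_axes_perfect(scores_by_reviewer):
    """Return True only if every reviewer scored every axis at exactly 100.

    `scores_by_reviewer` maps a reviewer name to {axis name: score}; this
    shape is hypothetical. Empty input is treated as not perfect, since no
    review has proven anything yet.
    """
    if not scores_by_reviewer:
        return False
    for axis_scores in scores_by_reviewer.values():
        if not axis_scores:
            return False
        if any(score != 100 for score in axis_scores.values()):
            return False
    return True
```

A single axis at 99 from a single reviewer is enough to fail the check, which matches the wording "every reviewer sub-agent scored every evaluation axis at `100/100`".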
37
|
+
|
|
38
|
+
## Mode invariant
|
|
39
|
+
|
|
40
|
+
The completion gate above applies identically to `low-cost`, `standard`, and `full-harness`. The only mode-specific value is `maxCycles`:
|
|
41
|
+
|
|
42
|
+
- `low-cost`: `maxCycles = 1` — at most one polish cycle; completion is only reachable if the single polish cycle satisfies the full gate.
|
|
43
|
+
- `standard`: `maxCycles = 3` — default.
|
|
44
|
+
- `full-harness`: `maxCycles = 20` — extended polish budget.
|
|
45
|
+
|
|
46
|
+
If the polish-cycle budget is exhausted before the gate is satisfied, the run does not complete and is returned as `REVISE`. A 95-point threshold is a signal only and is not a completion border.
|
|
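The budget rule above amounts to a bounded loop. The sketch below assumes a caller-supplied `gate_satisfied(cycle)` callback that runs one polish cycle and reports whether the full completion gate held. Both that callback and the `"completed"` return label are hypothetical; only `REVISE` and the `maxCycles` budget come from the text.

```python
def run_polish_budget(max_cycles, gate_satisfied):
    """Run up to `max_cycles` polish cycles and report the outcome.

    Returns ("completed", n) if the gate is satisfied on cycle n, otherwise
    ("REVISE", max_cycles) once the budget is exhausted.
    """
    if max_cycles < 1:
        raise ValueError("max_cycles must be at least 1")
    for cycle in range(1, max_cycles + 1):
        if gate_satisfied(cycle):
            return ("completed", cycle)
    return ("REVISE", max_cycles)
```

With `low-cost` (`maxCycles = 1`), the loop body runs once, so the single polish cycle must satisfy the whole gate or the run is returned as `REVISE`.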
17
47
|
|
|
18
48
|
## Stop conditions
|
|
19
49
|
|
|
20
50
|
You may stop only when all of the following are true:
|
|
21
51
|
|
|
22
|
-
- all declared screens have
|
|
52
|
+
- all declared screens have all 4 artifacts for the current accepted winner/polish state
|
|
53
|
+
- canonical latest paths match the current accepted winner/polish state
|
|
23
54
|
- blocking findings are closed or dispositioned
|
|
24
|
-
- validate
|
|
55
|
+
- `qfai validate --profile prototyping --fail-on error` passes
|
|
25
56
|
- independent reviewer returns `PASS`
|
|
57
|
+
- the completion certificate proves `allReviewerAxesPerfect100=true`
|