codebyplan 1.11.0 → 1.11.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/cli.js +1 -1
- package/package.json +1 -1
- package/templates/agents/cbp-round-executor.md +1 -1
- package/templates/agents/cbp-task-check.md +0 -1
- package/templates/agents/cbp-test-e2e-agent.md +2 -2
- package/templates/agents/cbp-testing-qa-agent.md +7 -29
- package/templates/skills/cbp-checkpoint-complete/SKILL.md +1 -1
- package/templates/skills/cbp-frontend-ui/SKILL.md +2 -2
- package/templates/skills/cbp-round-end/SKILL.md +16 -26
- package/templates/skills/cbp-round-execute/SKILL.md +2 -2
package/dist/cli.js
CHANGED
package/package.json
CHANGED
@@ -436,7 +436,7 @@ If none match, skip — proceed directly to Step 4.
 - Aggregate `summary` totals into `round.context.frontend_self_review.summary` (combined critical / warning / suggestion / auto_fixed / out_of_scope_fixes).
 
 4. **Surface non-mechanical findings** to the round summary:
-- `baseline_regression` and `rendered_visual` findings from `cbp-frontend-ui` are NOT auto-fixed (root cause is typically in app state/data, not styling) — surface
+- `baseline_regression` and `rendered_visual` findings from `cbp-frontend-ui` are NOT auto-fixed (root cause is typically in app state/data, not styling) — surface in `round.context.frontend_ui_review` findings; `/cbp-round-end` Step 7 surfaces baseline-regression findings as a blocking accept-or-fix gate (baselines never auto-accepted).
 - `out_of_scope_fixes` from either skill (findings whose target file is outside `files_changed`) — surface in `improvements_noted[]` for follow-up rounds; the scope gate prevented silent absorption.
 
 **Why inline (not a separate spawn)**: the post-implementation review consumes the same files the executor just touched. Spawning a separate agent doubles token cost (re-reading the files) and serialises wall time; invoking via Skill keeps both review passes inside the executor's working memory and lets fixes apply with the same Edit/Write tools that wrote the original code. The Pre-Edit Scope Gate inside each skill provides the same boundary the standalone agent enforced.
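The summary aggregation in the hunk above is a field-by-field sum. The minimal TypeScript sketch below assumes each inline review skill returns a `summary` object that carries exactly the five counters named there. The `SkillSummary` type and the `aggregateSummaries` function are hypothetical and are not code shipped in the package.

```typescript
// Hypothetical shape of one skill's `summary` block. The field names follow
// the counters listed in the hunk; the type itself is an assumption.
type SkillSummary = {
  critical: number;
  warning: number;
  suggestion: number;
  auto_fixed: number;
  out_of_scope_fixes: number;
};

const EMPTY_SUMMARY: SkillSummary = {
  critical: 0,
  warning: 0,
  suggestion: 0,
  auto_fixed: 0,
  out_of_scope_fixes: 0,
};

// Sum the per-skill summaries field by field into one combined summary,
// i.e. what would be stored under round.context.frontend_self_review.summary.
function aggregateSummaries(summaries: SkillSummary[]): SkillSummary {
  return summaries.reduce<SkillSummary>(
    (acc, s) => ({
      critical: acc.critical + s.critical,
      warning: acc.warning + s.warning,
      suggestion: acc.suggestion + s.suggestion,
      auto_fixed: acc.auto_fixed + s.auto_fixed,
      out_of_scope_fixes: acc.out_of_scope_fixes + s.out_of_scope_fixes,
    }),
    { ...EMPTY_SUMMARY },
  );
}
```

Passing an empty list yields all-zero totals, so a round in which neither skill ran still produces a well-formed summary.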
@@ -80,7 +80,6 @@ Compare task work against `checkpoint.goal`:
 
 Review all QA items across all rounds:
 - **Auto items**: Verify all passed (build, lint, types, tests)
-- **User items**: Verify all marked pass/skip
 - **Default items**: Verify all resolved (pass or skipped with reason)
 
 **E2E pass vs skipped distinction**: When reading `auto_qa.items[]` for `check: 'e2e'`, do NOT conflate `status: 'pass'` with `status: 'skipped'`. A spec that ran with `passed === 0 && skipped > 0` for any path touching `files_changed` is a hard fail, not a pass — verdict text MUST explicitly call this out: "E2E spec authored but assertions did not execute (skip-gated)." Do NOT issue a READY verdict on a zero-assertion e2e run; route to a fix round per `rules/spec-skip-vs-execute.md`.
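The zero-assertion rule fits in a small predicate. This sketch assumes each e2e result records which source paths its spec exercises. `coveredPaths` is a hypothetical field, because the source does not say how specs are mapped to `files_changed`. Both function names are also illustrative.

```typescript
// Hypothetical per-spec e2e result; only `passed` and `skipped` come from the rule.
type E2eSpecResult = {
  spec: string; // spec file path
  coveredPaths: string[]; // assumed: source paths this spec exercises
  passed: number;
  skipped: number;
};

// A spec is skip-gated when nothing passed but at least one test was skipped.
function isZeroAssertionRun(r: E2eSpecResult): boolean {
  return r.passed === 0 && r.skipped > 0;
}

// A READY verdict is blocked if any skip-gated spec touches files_changed.
function blocksReadyVerdict(results: E2eSpecResult[], filesChanged: string[]): boolean {
  const changed = new Set(filesChanged);
  return results.some(
    (r) => isZeroAssertionRun(r) && r.coveredPaths.some((p) => changed.has(p)),
  );
}
```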
@@ -289,7 +289,7 @@ For each failed test, classify into exactly one category:
 | `auth` | Login-page redirect after credential submit, 401 on authenticated request, `invalid_grant`, `email_not_confirmed` | AskUserQuestion as in Step 6.5.3 |
 | `access` | 403, 404 on a route the user should have access to, RLS policy denial text, missing seed data | AskUserQuestion: "Test failed with access error: `{error}`. Seed data or RLS policy may be missing. Options: (1) reply with steps you took to fix, (2) abort." |
 | `flake` | Timeout on first run, passes on immediate retry, network jitter | Retry up to 3 times before reclassifying to `real` |
-| `visual_regression` | `toHaveScreenshot` pixel-diff exceeded threshold | Do NOT retry. Include baseline + actual paths in `screenshots[]` with `baseline_diff_pct`. Do NOT auto-accept baseline — leave for `frontend-ui` (`/cbp-round-execute` Step 5b under `phase: 'screenshot_review'`)
+| `visual_regression` | `toHaveScreenshot` pixel-diff exceeded threshold | Do NOT retry. Include baseline + actual paths in `screenshots[]` with `baseline_diff_pct`. Do NOT auto-accept baseline — leave for `frontend-ui` (`/cbp-round-execute` Step 5b under `phase: 'screenshot_review'`); baseline regressions surface at `/cbp-round-end` Step 7 as a blocking gate. |
 | `real` | Assertion failure on app behavior (text missing, wrong state, navigation broken) | Attempt fix (see Step 8), then report to executor |
 
 Failures with `category` of `env`, `auth`, or `access` MUST NOT be counted as test failures in `test_results.failed` until pre-flight passes — they block the run instead.
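The table encodes two routing rules. The blocking categories are kept out of `test_results.failed`, and a flake is retried up to three times before it is reclassified as `real`. The sketch below assumes the input list holds final categories after any flake retries. It uses a hypothetical synchronous `run` callback in place of a real test runner, and the function names are illustrative.

```typescript
// Categories named in the table and the note below it (the `env` row itself
// lies outside the lines shown here).
type FailureCategory = "env" | "auth" | "access" | "flake" | "visual_regression" | "real";

const BLOCKING: ReadonlySet<FailureCategory> = new Set<FailureCategory>(["env", "auth", "access"]);

// Split classified failures into run-blocking ones and ones counted toward
// test_results.failed. This sketch assumes the list holds final categories
// (flakes already resolved), so every non-blocking entry counts.
function tallyFailures(categories: FailureCategory[]): { failed: number; blocked: number } {
  let failed = 0;
  let blocked = 0;
  for (const c of categories) {
    if (BLOCKING.has(c)) blocked++;
    else failed++;
  }
  return { failed, blocked };
}

// Retry a suspected flake up to `maxRetries` times; it becomes `real` only if
// every retry fails. `run` is a hypothetical runner returning true on pass.
function resolveFlake(run: () => boolean, maxRetries = 3): "flake" | "real" {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    if (run()) return "flake";
  }
  return "real";
}
```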
@@ -355,7 +355,7 @@ Populate all output contract fields. Include test file paths in `tests_written`,
 ## Integration
 
 - **Spawned by**: `/cbp-round-execute` Step 5 (parallel sibling of `testing-qa-agent`); also invoked by `/cbp-checkpoint-check` (TASK-2 deliverable) with `whole_checkpoint_mode: true`
-- **Parallel sibling**: `cbp-testing-qa-agent` (owns build/lint/types/unit/audit
+- **Parallel sibling**: `cbp-testing-qa-agent` (owns build/lint/types/unit/audit). **Fully independent — no cross-read.** This agent's screenshots are consumed by `/cbp-round-execute` Step 5b (`frontend-ui` skill, `phase: 'screenshot_review'`) which writes `round.context.frontend_ui_review.findings`; baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (baselines never auto-accepted).
 - **Returns to**: `/cbp-round-execute` which persists output to `round.context.e2e_output`. Step 5b then invokes the `frontend-ui` skill with `phase: 'screenshot_review'` and the screenshots; Step 6 considers `e2e_output.test_results.failed > 0` and `status === 'failed'` as hard-fail signals.
 - **Reads**: `.claude/context/testing/e2e.md`, page/screen source files, existing specs, `.env.local`, `.codebyplan/server.json` `port_allocations`, MCP `get_repos` (for `tech_stack` reconciliation at Step 1.5)
 - **May modify source**: Only to add testID/data-testid attributes
@@ -1,7 +1,7 @@
 ---
 scope: org-shared
 name: cbp-testing-qa-agent
-description: Combined testing, QA generation, and default checklists. Runs build/lint/types/unit-tests/audit, generates auto
+description: Combined testing, QA generation, and default checklists. Runs build/lint/types/unit-tests/audit, generates auto QA items, applies default production checklists. Does NOT consume e2e screenshots or frontend-ui findings.
 tools: Read, Glob, Grep, Bash, AskUserQuestion
 model: sonnet
 effort: xhigh
@@ -16,11 +16,11 @@ Combined testing, QA generation, and default production checklists in a single a
 Single agent that handles non-e2e quality validation in the per-wave validation phase of `/cbp-round-execute` Step 5:
 - Run all 18 automated checks (work + quality verification)
 - **EXECUTE** automated testing commands (build, lint, types, unit tests, visual checks, audit)
-- Generate auto
+- Generate auto QA items
 - Apply default production checklist items
 - Detect unrelated issues and missing tests
 
-E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by `cbp-test-e2e-agent`, spawned in parallel with this agent by `/cbp-round-execute` Step 5. **The two agents are fully independent — this agent does NOT read `round.context.e2e_output` or `round.context.frontend_ui_review`.**
+E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by `cbp-test-e2e-agent`, spawned in parallel with this agent by `/cbp-round-execute` Step 5. **The two agents are fully independent — this agent does NOT read `round.context.e2e_output` or `round.context.frontend_ui_review`.** This agent emits auto QA items and default checklist items. Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
 
 ## Input Contract
 
@@ -63,13 +63,6 @@ output:
 stdout: string # captured command output
 stderr: string # captured error output
 screenshots: [{page, viewport, status, file}] # visual check only
-user_qa:
-items:
-- type: 'user'
-check: string
-status: 'pending'
-instructions: string
-round_number: number
 default_checklist:
 items:
 - type: 'default'
@@ -256,7 +249,7 @@ Report findings in `build_analysis` even if the build succeeded.
 
 When `files_changed` includes a new route file under any `apps/*/src/app/api/` or `apps/*/src/app/mcp/` directory:
 - If dev server is running: curl the endpoint without credentials, assert response is 401/403 (not 200). Log as auto QA item `auth_enforcement`.
-- If dev server is not running:
+- If dev server is not running: log a skipped auto QA item with the exact curl command noted in `notes` for reference.
 
 ### Phase 3.58: Missing Unit Tests for New API Routes
 
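The unauthenticated-endpoint check comes down to a status-code test. In this sketch the HTTP call is injected as a `probe` callback so the logic can run without a dev server. The `AutoQaItem` shape beyond `check`, `status`, and `notes` is an assumption. The curl flags written into `notes` are one illustrative way to print a status code and are not the agent's exact command.

```typescript
// Hypothetical auto QA item; only `check`, `status`, and `notes` are named in the source.
type AutoQaItem = {
  type: "auto";
  check: string;
  status: "pass" | "fail" | "skipped";
  notes: string;
};

// `probe` performs an unauthenticated request and resolves to the HTTP status.
// It is injected (hypothetical) so the check can be exercised offline.
async function checkAuthEnforcement(
  url: string,
  serverRunning: boolean,
  probe: (url: string) => Promise<number>,
): Promise<AutoQaItem> {
  // Illustrative command recorded for reference, not necessarily the agent's.
  const curl = `curl -s -o /dev/null -w "%{http_code}" ${url}`;
  if (!serverRunning) {
    // Server down: log a skipped item carrying the command to run later.
    return { type: "auto", check: "auth_enforcement", status: "skipped", notes: curl };
  }
  const code = await probe(url);
  // Only 401/403 count as enforced; a 200 (or anything else) fails.
  const enforced = code === 401 || code === 403;
  return {
    type: "auto",
    check: "auth_enforcement",
    status: enforced ? "pass" : "fail",
    notes: `${curl} -> HTTP ${code}`,
  };
}
```

In a live run, `probe` could be `async (u) => (await fetch(u)).status` on Node 18 or later. That call attaches no credentials unless they are added explicitly.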
@@ -331,22 +324,7 @@ This aligns with `immediate-issue-capture.md` (resolve-in-current-scope by defau
 
 **4a. Auto QA items**: Generate from Phase 3 results. One item per test category. Include stdout/stderr.
 
-**4b.
-
-**4b.0. Connection smoke test suppression**: Before emitting any connection smoke test user QA item (MCP connection, server health, service wiring), check whether the governing config file is unchanged. Governing config map: MCP (Claude Code) → `.mcp.json`; Dev server → `.env.local`, `.codebyplan/server.json` port_allocations; API integrations → `.env.local`. **Suppression rule**: if the governing config is NOT in `files_changed` AND `git diff HEAD -- <config>` is empty, log `{type:"user", check:"<name>", status:"skipped", notes:"Governing config <file> unchanged in this round; connection behavior is unaffected."}` — do NOT emit a pending user QA item.
-
-**4b.1. Design source comparison** (mandatory when `has_ui_work` is true): Search the project's design-sources directory (e.g., `docs/design/`, `docs/development/product/sources/design/`) for PNG files matching the page or component being changed. If design PNGs exist, add a mandatory user QA item with check: "Design source fidelity" and instructions: "Compare rendered output against design source PNG. Verify: column layout matches, control shapes match (flat vs pill vs toggle), background colors match, row structure and dividers match, action controls are in the correct column."
-
-**4b.2. Volume-gated mechanical-sweep spot-check** (volume-triggered, runs regardless of `has_ui_work`): when `files_changed.length > 100` AND the round is mechanical (`work_type == 'mechanical'` OR round requirements match `/sweep|auto.?fix|batch|backlog/i`), emit a mandatory user QA item:
-
-- `check`: `"High-volume mechanical round spot-check"`
-- `status`: `"pending"`
-- `instructions`: "This round modified {N} files mechanically. Open 3–5 changed files in the running app and verify behavior is unchanged. Prioritize files with business logic (services, hooks, reducers) over pure presentation. Spot-check at least one file from each touched module."
-- `round_number`: current
-
-Volume gating exists because automated checks (build/lint/types/unit) verify shape but not behaviour preservation; large mechanical sweeps (auto-fix, codemod, refactor) can pass all gates while silently changing semantics in code paths the test suite doesn't cover.
-
-**4c. Default checklist items**: See Phase 5.
+**4b. Default checklist items**: See Phase 5.
 
 ### Phase 5: Default Production Checklist
 
@@ -379,7 +357,7 @@ Return complete output contract.
 - Build output analyzed for warnings/deprecations/console.logs (with client/server classification)
 - npm audit executed, vulnerabilities reported by severity, critical/high contribute to hard_fail
 - Unrelated issues discovered and logged
-- Auto
+- Auto and default QA items generated
 - `hard_fail` flag correctly set
 - **Vitest/Jest/Cargo unit-test hard_fail enforced** when source files changed
 - E2E execution + preflight delegated entirely to `test-e2e-agent` (this agent never runs Playwright/Maestro/wdio/etc.)
@@ -397,4 +375,4 @@ Return complete output contract.
 
 - **Spawned by**: `/cbp-round-execute` Step 5 (per-wave; runs in parallel with `test-e2e-agent` and may also run in parallel with next wave's executor)
 - **Parallel sibling**: `cbp-test-e2e-agent` (fully independent — no cross-read; both agents complete on their own timeline using only their own inputs)
-- **Output consumed by**: `/cbp-round-execute` Step 6 (hard-fail routing — this agent's `totals.hard_fail` is OR'd with `e2e_output.test_results.failed > 0` and `e2e_output.status === 'failed'`), `/cbp-round-end` Step
+- **Output consumed by**: `/cbp-round-execute` Step 6 (hard-fail routing — this agent's `totals.hard_fail` is OR'd with `e2e_output.test_results.failed > 0` and `e2e_output.status === 'failed'`), `/cbp-round-end` Step 3 (reads this agent's `auto_qa[]` and `default_checklist[]`). This agent does not emit `user_qa` items; baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
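Step 6's hard-fail routing ORs the two agents' signals. The minimal sketch below models only the fields quoted in the hunk. It treats a missing `e2e_output` as "no e2e failure", which is an assumption the source does not state.

```typescript
// Minimal slices of the two outputs. Field names come from the hunk;
// the surrounding types are assumptions.
type TestingQaOutput = { totals: { hard_fail: boolean } };
type E2eOutput = { status: string; test_results: { failed: number } };

// The round hard-fails if either agent signals a hard failure.
// `e2e` is optional here on the assumption that e2e output may be absent.
function isHardFail(qa: TestingQaOutput, e2e?: E2eOutput): boolean {
  const e2eFailed =
    e2e !== undefined && (e2e.test_results.failed > 0 || e2e.status === "failed");
  return qa.totals.hard_fail || e2eFailed;
}
```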
@@ -76,7 +76,7 @@ Stop here.
 
 Collect QA results from all tasks and rounds. Build checkpoint-level QA summary:
 - Count auto checks passed/failed
-- List
+- List default-checklist items still pending review
 
 If any critical failures, warn the user but don't block.
 
@@ -184,7 +184,7 @@ For each screenshot in `e2e_screenshots[]`:
 
 Populate `screenshot_review` totals.
 
-**Do not attempt to auto-fix `rendered_visual` or `baseline_regression` findings** — they
+**Do not attempt to auto-fix `rendered_visual` or `baseline_regression` findings** — they surface as a blocking gate at `/cbp-round-end` Step 7 (accept-or-fix) and feed the fix loop, because the root cause is typically in app code/data, not in the SCSS.
 
 ### Phase 7: Aggregate Findings
 
@@ -258,5 +258,5 @@ Go beyond fixing violations — actively improve visual quality. If spacing coul
 - **Also invoked by**: `/cbp-checkpoint-check` (TASK-2 deliverable, future) with screenshots from a whole-checkpoint e2e run
 - **Consumes**: `e2e_screenshots[]` from `round.context.e2e_output.screenshots` (populated by `test-e2e-agent` at `/cbp-round-execute` Step 5)
 - **Output written to**: `round.context.frontend_ui_review` — when invoked twice per round, the second invocation merges with the first
-- **
+- **Downstream gate**: this skill emits `findings[]` only. Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (baselines never auto-accepted); rendered-visual critical findings are surfaced in the Step 7 findings presentation.
 - **Paired with**: `frontend-design` (pre-implementation aesthetic decision), `frontend-ux` (interaction-quality self-review, also Step 3.8)
@@ -55,46 +55,26 @@ Build the files list with approval status:
 **claude_approved**: `true` if cbp-testing-qa-agent passed for this file. `false` if issues remain.
 **user_approved**: Always `false` initially. User approves via git staging or web UI.
 
-### Step 3: Collect
+### Step 3: Collect QA Results
 
 **No QA runs here** — all QA was already executed by per-wave `cbp-testing-qa-agent` inside `/cbp-round-execute` Step 5.
 
-
+#### 3a — Collect items from agent outputs
 
-
-
-Collect from round context (all three sources are agent-derived):
+Collect from round context:
 
 - **Auto items**: from `testing_qa_output.auto_qa.items`
-- **User items (single-source)**: from `testing_qa_output.user_qa.items` (Phase 4b.1 design-source comparison + Phase 4b.2 mechanical-sweep spot-check + Phase 4b.0 connection smoke — all derivable from cbp-testing-qa-agent's own data)
 - **Default items**: from `testing_qa_output.default_checklist.items`
 
-
-
-Read `round.context.frontend_ui_review.findings[]` (populated by the `cbp-frontend-ui` skill at `/cbp-round-execute` Step 5b under `phase: 'screenshot_review'`). For each finding, emit a `user_qa` item per the rules below.
-
-| Finding `category` | Finding `severity` | Emitted user_qa item |
-|--------------------|-------------------|---------------------|
-| `baseline_regression` | any | `{ type: 'user', check: 'Visual baseline regression — {page_or_screen}', status: 'pending', round_number: N, instructions: 'Open the diff PNG at `{screenshot}` (the `-diff.png` sibling shows pixel differences). Pixel-diff was `{baseline_diff_pct}%`. Decide: (a) regression — add a task to fix, OR (b) new rendering is correct — run `pnpm exec playwright test --update-snapshots` in `apps/{app}` and commit the updated baselines. Do NOT proceed until a decision is recorded.' }` |
-| `rendered_visual` | `critical` | `{ type: 'user', check: 'Rendered-visual critical — {page_or_screen}', status: 'pending', round_number: N, instructions: 'Open the screenshot at `{screenshot}`. The cbp-frontend-ui review flagged a critical rendering issue: `{finding.issue}`. Suggested fix: `{finding.suggestion}`. Decide whether this needs a fix-round before proceeding.' }` |
-| `rendered_visual` | `warning` | (no user_qa item; finding stays in `frontend_ui_review.findings` and surfaces via Step 7 findings presentation if relevant) |
-| Other categories | any | (no user_qa item from this step) |
-
-Skip Step 3b entirely when `round.context.frontend_ui_review` is absent (no e2e ran, or screenshot-review phase didn't execute).
-
-This is the required user gate for baseline updates — baselines are NEVER auto-accepted.
-
-#### 3c — Merge
-
-Combine the single-source items (3a) and cross-source items (3b) into a single `user_qa[]` for the round. Merge with previous rounds (supersede items for re-modified files, preserve verified items).
+Merge with previous rounds (supersede items for re-modified files, preserve verified items).
 
 ### Step 4: Update Task Files and QA
 
 Update via MCP:
 
 - `update_task(task_id, files_changed: [...])` — merge with existing
-- `update_round(round_id, files_changed: [...], qa: {items: [
-- `update_task(task_id, qa: {items: [
+- `update_round(round_id, files_changed: [...], qa: {items: [auto_qa items + default_checklist items]})` — round-specific
+- `update_task(task_id, qa: {items: [auto_qa items + default_checklist items]})` — aggregated
 
 ### Step 5: Present Summary
 
@@ -134,6 +114,16 @@ Wait for agent to complete. If the spawn fails for any reason, apply the inline-
 
 ### Step 7: Present Findings
 
+**Baseline-regression blocking gate**: before presenting code-review findings, check `round.context.frontend_ui_review.findings[]` for any entry with `category: 'baseline_regression'` (any severity). When one or more such findings exist, surface them as a BLOCKING decision that MUST be resolved before routing to `/cbp-round-update`:
+
+- Present each regression: screenshot path, `baseline_diff_pct`, affected page/screen.
+- Ask the user via AskUserQuestion to choose:
+- **(a) Treat as regression** — add a fix-round to address the visual change, OR
+- **(b) Accept the new baseline** — run `pnpm exec playwright test --update-snapshots` in `apps/{app}` and commit the updated baselines.
+- Do NOT route to `/cbp-round-update` until the decision is recorded. Baselines are NEVER auto-accepted.
+
+`rendered_visual` critical findings from `round.context.frontend_ui_review.findings[]` are surfaced in the normal findings presentation below (not as a separate gate).
+
 **If `status: 'no_findings'`:** show `### Code Review\nNo issues found. Code looks good.` and skip to Step 8.
 
 **If findings exist**, present them grouped by severity (table + per-finding details), then ask the user via AskUserQuestion which to fix: `all`, `1,2` (specific numbers), `none`, or `inline` (only when all findings qualify under the Trivial-Resolution Exception).
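The new Step 7 gate blocks routing while any baseline regression still lacks a recorded decision. This sketch makes three assumptions. Findings carry a `page_or_screen` key, borrowed from the placeholder in the removed Step 3b table. Decisions are recorded per page. Any severity counts, as the gate text specifies. `UiFinding`, `BaselineDecision`, and both functions are hypothetical.

```typescript
// Hypothetical finding shape. `category`, `screenshot`, and `baseline_diff_pct`
// appear in the source; `page_or_screen` as a key is an assumption.
type UiFinding = {
  category: string;
  severity: "critical" | "warning" | "suggestion";
  page_or_screen: string;
  screenshot?: string;
  baseline_diff_pct?: number;
};

// The two choices offered by the gate: (a) fix-round, (b) accept new baseline.
type BaselineDecision = "fix_round" | "accept_baseline";

// Baseline regressions (any severity) that still have no recorded decision.
function pendingBaselineDecisions(
  findings: UiFinding[],
  decisions: Map<string, BaselineDecision>,
): UiFinding[] {
  return findings.filter(
    (f) => f.category === "baseline_regression" && !decisions.has(f.page_or_screen),
  );
}

// Routing to /cbp-round-update is allowed only once nothing is pending.
function canRouteToRoundUpdate(
  findings: UiFinding[],
  decisions: Map<string, BaselineDecision>,
): boolean {
  return pendingBaselineDecisions(findings, decisions).length === 0;
}
```

A `rendered_visual` finding never blocks here. This matches the text, which routes those findings to the normal findings presentation instead.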
@@ -149,13 +149,13 @@ On pass, synthesise `testing_qa_output` inline per the procedure in `reference/i
 
 Input contracts: `cbp-testing-qa-agent` receives `executor_output`, `testing_profile`, `has_ui_work` (see `agents/cbp-testing-qa-agent.md` Input Contract). `cbp-test-e2e-agent` receives `repo_id`, `round_number`, `files_changed`, `prior_round_files_changed` (full task aggregate when round_number ≥ 2), `whole_checkpoint_mode: false`, `test_strategy`, `pages_affected`, `has_auth`, `dev_server_port` (see `agents/cbp-test-e2e-agent.md` Input Contract for the full shape).
 
-**Independence**: neither agent reads the other's output.
+**Independence**: neither agent reads the other's output. Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted). Per-wave spawns MAY run in parallel with the next wave's executor when dependency order allows.
 
 ### Step 5b: Post-E2E Screenshot Review (cbp-frontend-ui Phase 6.5)
 
 When `round.context.e2e_output.screenshots[]` is non-empty, invoke the `cbp-frontend-ui` skill with `phase: 'screenshot_review'` (input: `files_changed`, `e2e_screenshots: round.context.e2e_output.screenshots`, `context: { checkpoint_goal, round_requirements }`). Under this phase the skill runs only Phase 6.5 (Rendered-Output Visual Review) + 7 + 8 — Phases 1-6 (style) already ran inline at executor Step 3.8 with `phase: 'style_only'`.
 
-Persist findings to `round.context.frontend_ui_review` (merge with Step 3.8's style-only output if present). `/cbp-round-end` Step
+Persist findings to `round.context.frontend_ui_review` (merge with Step 3.8's style-only output if present). Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted); rendered_visual critical findings are surfaced in the Step 7 findings presentation. Neither auto-fails the round. cbp-testing-qa-agent does NOT read these findings (full independence per Step 5).
 
 **Skip** when `round.context.e2e_output` is absent, `screenshots` is empty, or `testing_profile === 'claude_only'`.
 
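The Step 5b skip condition is a three-part predicate. The sketch below uses assumed types. Only `e2e_output.screenshots` and the `'claude_only'` profile value come from the source. Screenshots are typed as `unknown[]` because their shape is not shown in this hunk, and `shouldRunScreenshotReview` is a hypothetical name.

```typescript
// Minimal slice of round.context; only the field names come from the source.
type RoundContext = {
  e2e_output?: { screenshots: unknown[] };
};

// Step 5b runs only when e2e output exists, it has at least one screenshot,
// and the testing profile is not 'claude_only'.
function shouldRunScreenshotReview(ctx: RoundContext, testingProfile: string): boolean {
  if (testingProfile === "claude_only") return false;
  const shots = ctx.e2e_output?.screenshots;
  return shots !== undefined && shots.length > 0;
}
```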