codebyplan 1.13.14 → 1.13.16
- package/README.md +77 -2
- package/dist/cli.js +726 -69
- package/package.json +1 -1
- package/templates/agents/cbp-e2e-maestro.md +26 -3
- package/templates/agents/cbp-e2e-playwright.md +24 -3
- package/templates/agents/cbp-e2e-tauri.md +25 -2
- package/templates/agents/cbp-e2e-vscode.md +28 -3
- package/templates/agents/cbp-e2e-xcuitest.md +40 -4
- package/templates/agents/cbp-task-check.md +2 -0
- package/templates/context/testing/e2e.md +57 -9
- package/templates/hooks/README.md +23 -1
- package/templates/hooks/validate-structure-patterns.sh +1 -1
- package/templates/rules/e2e-mandatory.md +19 -2
- package/templates/settings.project.base.json +18 -1
- package/templates/skills/cbp-checkpoint-end/SKILL.md +18 -1
- package/templates/skills/cbp-frontend-ui/SKILL.md +9 -7
- package/templates/skills/cbp-round-execute/SKILL.md +49 -7
- package/templates/skills/cbp-setup-cmux/SKILL.md +170 -0
- package/templates/skills/cbp-task-complete/SKILL.md +14 -0
- package/templates/skills/cbp-task-start/SKILL.md +8 -0
package/package.json
CHANGED
|
@@ -38,7 +38,7 @@ env:
|
|
|
38
38
|
TEST_EMAIL: ${TEST_EMAIL}
|
|
39
39
|
TEST_PASSWORD: ${TEST_PASSWORD}
|
|
40
40
|
APP_ID: com.yourorg.yourapp
|
|
41
|
-
screenshotsDir:
|
|
41
|
+
screenshotsDir: e2e/screenshots/maestro
|
|
42
42
|
```
|
|
43
43
|
|
|
44
44
|
## Shared Login Flow
|
|
@@ -158,8 +158,31 @@ delete + confirm + verify removed.
|
|
|
158
158
|
- takeScreenshot: "flow-name-after-state"
|
|
159
159
|
```
|
|
160
160
|
|
|
161
|
-
Screenshots written to `
|
|
162
|
-
|
|
161
|
+
Screenshots written to `e2e/screenshots/maestro/` (via `screenshotsDir` in `config.yaml`).
|
|
162
|
+
Committed path convention: `e2e/screenshots/maestro/{flow}-{state}.png` (repo root).
|
|
163
|
+
This path is intentionally outside `apps/web/e2e/screenshots/` (which is gitignored).
|
|
164
|
+
|
|
165
|
+
After the flow completes, `git add e2e/screenshots/maestro/` to track new PNGs.
|
|
166
|
+
|
|
167
|
+
**`is_new` detection**: `git ls-files --error-unmatch <path>` exits non-zero → `is_new: true`.
|
|
168
|
+
|
|
169
|
+
Enumerate committed PNGs: `e2e/screenshots/maestro/*.png`.
|
|
170
|
+
|
|
171
|
+
## e2e_gallery Population
|
|
172
|
+
|
|
173
|
+
After the run, for each committed PNG in `e2e/screenshots/maestro/*.png`, emit one
|
|
174
|
+
`e2e_gallery[]` entry:
|
|
175
|
+
|
|
176
|
+
```yaml
|
|
177
|
+
- test_name: string # Maestro flow filename (e.g. "dashboard")
|
|
178
|
+
page_or_screen: string # screen / flow name
|
|
179
|
+
framework: maestro
|
|
180
|
+
committed_path: string # repo-relative: e2e/screenshots/maestro/{flow}-{state}.png
|
|
181
|
+
is_new: boolean # detected via git ls-files
|
|
182
|
+
baseline_diff_pct: null # Maestro does not produce pixel diffs
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
Include this in the specialist output alongside `screenshots[]`.
|
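The Maestro rules above (committed directory, `{flow}-{state}.png` naming, `is_new` from `git ls-files`, `baseline_diff_pct: null`) could be sketched roughly as follows. This is an illustrative sketch, not the package's implementation: `maestroGalleryEntry` and the `IsTracked` predicate are hypothetical names, the predicate stands in for a wrapper around `git ls-files --error-unmatch <path>`, and splitting the file name at the first hyphen assumes flow names contain no hyphen.

```typescript
// Mirrors the e2e_gallery[] fields listed for Maestro.
interface MaestroGalleryEntry {
  test_name: string;
  page_or_screen: string;
  framework: "maestro";
  committed_path: string;
  is_new: boolean;
  baseline_diff_pct: null; // Maestro does not produce pixel diffs
}

// Hypothetical predicate: true when `git ls-files --error-unmatch <path>` exits zero.
type IsTracked = (repoRelativePath: string) => boolean;

const MAESTRO_DIR = "e2e/screenshots/maestro/";

function maestroGalleryEntry(committedPath: string, isTracked: IsTracked): MaestroGalleryEntry {
  if (!committedPath.startsWith(MAESTRO_DIR) || !committedPath.endsWith(".png")) {
    throw new Error(`not a Maestro committed PNG: ${committedPath}`);
  }
  // File name follows {flow}-{state}.png; assumption: the flow name has no hyphen.
  const base = committedPath.slice(MAESTRO_DIR.length, -".png".length);
  const dash = base.indexOf("-");
  const flow = dash === -1 ? base : base.slice(0, dash);
  return {
    test_name: flow,
    page_or_screen: flow, // assumption: the screen is named after the flow
    framework: "maestro",
    committed_path: committedPath,
    is_new: !isTracked(committedPath), // untracked means no committed baseline yet
    baseline_diff_pct: null,
  };
}
```

Because `git ls-files --error-unmatch` consults the index, a sketch like this would need to evaluate `is_new` before `git add` runs; once a file is staged, the command exits zero.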
|
163
186
|
|
|
164
187
|
## Run Command
|
|
165
188
|
|
|
@@ -189,7 +189,8 @@ test.describe("Home page", () => {
|
|
|
189
189
|
```ts
|
|
190
190
|
await expect(page).toHaveScreenshot("state-name.png", { maxDiffPixelRatio: 0.001 });
|
|
191
191
|
```
|
|
192
|
-
Baselines live beside spec under `{spec}.spec.ts-snapshots/`. Committed
|
|
192
|
+
Baselines live beside spec under `{spec}.spec.ts-snapshots/`. Committed path:
|
|
193
|
+
`apps/{app}/e2e/{spec}.spec.ts-snapshots/{name}-{browser}.png`.
|
|
193
194
|
|
|
194
195
|
**Diagnostic** (intermediate states):
|
|
195
196
|
```ts
|
|
@@ -199,9 +200,29 @@ await page.screenshot({
|
|
|
199
200
|
});
|
|
200
201
|
```
|
|
201
202
|
|
|
202
|
-
Enumerate PNGs: `
|
|
203
|
+
Enumerate committed PNGs: `{spec}.spec.ts-snapshots/**/*.png` (NOT `test-results/` — those are transient).
|
|
203
204
|
|
|
204
|
-
|
|
205
|
+
**`is_new` detection**: `git ls-files --error-unmatch <committed_path>` exits non-zero →
|
|
206
|
+
`is_new: true` (auto-committed first-run baseline; `git add` the file). Exit zero → `is_new: false`.
|
|
207
|
+
|
|
208
|
+
**Never run `--update-snapshots` automatically.** A diff on an existing baseline is a `visual_regression` failure.
|
|
209
|
+
|
|
210
|
+
## e2e_gallery Population
|
|
211
|
+
|
|
212
|
+
After the run, for each committed PNG in `{spec}.spec.ts-snapshots/**/*.png`, emit one
|
|
213
|
+
`e2e_gallery[]` entry. For `screenshots[].viewport`: default to `'desktop'`; set `'mobile'`
|
|
214
|
+
when the playwright.config project/device emulation indicates a mobile viewport (e.g. `devices['iPhone 14']`).
|
|
215
|
+
|
|
216
|
+
```yaml
|
|
217
|
+
- test_name: string # test title from test.info().title
|
|
218
|
+
page_or_screen: string # route / screen name
|
|
219
|
+
framework: playwright
|
|
220
|
+
committed_path: string # repo-relative path to the .spec.ts-snapshots PNG
|
|
221
|
+
is_new: boolean # detected via git ls-files (see above)
|
|
222
|
+
baseline_diff_pct: number | null # from Playwright diff output; null when is_new
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
Include this in the specialist output alongside `screenshots[]`.
|
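The viewport rule above (default `'desktop'`, `'mobile'` when the project's device emulation is mobile) might look like the sketch below. It assumes the decision can be read from the `isMobile` flag that Playwright device descriptors carry; `ProjectUse` and `viewportForProject` are hypothetical names, and only the one field needed here is modelled.

```typescript
type ScreenshotViewport = "desktop" | "mobile";

// Minimal stand-in for the `use` options of a Playwright project.
// Only `isMobile` is modelled; real projects carry many more options.
interface ProjectUse {
  isMobile?: boolean;
}

function viewportForProject(use: ProjectUse | undefined): ScreenshotViewport {
  // Anything that is not explicitly flagged as mobile falls back to desktop.
  return use?.isMobile === true ? "mobile" : "desktop";
}
```

Spreading a mobile device descriptor into a project's `use` block would set `isMobile: true`, so a check like this one would report `'mobile'` for such projects without inspecting device names.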
|
205
226
|
|
|
206
227
|
## Run Command
|
|
207
228
|
|
|
@@ -148,10 +148,33 @@ For CRUD: create + verify visible; edit + verify; delete + confirm + verify remo
|
|
|
148
148
|
## Screenshot Capture
|
|
149
149
|
|
|
150
150
|
```ts
|
|
151
|
-
await browser.saveScreenshot(
|
|
151
|
+
await browser.saveScreenshot(
|
|
152
|
+
`./e2e/screenshots/webdriverio/${testName}-${state}.png`
|
|
153
|
+
);
|
|
152
154
|
```
|
|
153
155
|
|
|
154
|
-
|
|
156
|
+
Committed path convention: `{app-dir}/e2e/screenshots/webdriverio/{spec}-{state}.png`.
|
|
157
|
+
After the run, `git add {app-dir}/e2e/screenshots/webdriverio/` to track new PNGs.
|
|
158
|
+
|
|
159
|
+
**`is_new` detection**: `git ls-files --error-unmatch <path>` exits non-zero → `is_new: true`.
|
|
160
|
+
|
|
161
|
+
Enumerate committed PNGs: `{app-dir}/e2e/screenshots/webdriverio/**/*.png`.
|
|
162
|
+
|
|
163
|
+
## e2e_gallery Population
|
|
164
|
+
|
|
165
|
+
After the run, for each committed PNG under `{app-dir}/e2e/screenshots/webdriverio/`, emit one
|
|
166
|
+
`e2e_gallery[]` entry:
|
|
167
|
+
|
|
168
|
+
```yaml
|
|
169
|
+
- test_name: string # spec describe/it label
|
|
170
|
+
page_or_screen: string # window / view name
|
|
171
|
+
framework: webdriverio
|
|
172
|
+
committed_path: string # repo-relative: {app-dir}/e2e/screenshots/webdriverio/{spec}-{state}.png
|
|
173
|
+
is_new: boolean # detected via git ls-files
|
|
174
|
+
baseline_diff_pct: null # WebDriverIO does not produce pixel diffs
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
Include this in the specialist output alongside `screenshots[]`.
|
|
155
178
|
|
|
156
179
|
## Run Command
|
|
157
180
|
|
|
@@ -162,10 +162,35 @@ snapshots to `test-fixtures/`.
|
|
|
162
162
|
## Screenshot Capture
|
|
163
163
|
|
|
164
164
|
VS Code extension tests do not have browser-style screenshot capture. For visual review,
|
|
165
|
-
write fixture output
|
|
166
|
-
with `viewport: 'device'`. `baseline_diff_pct: null` for all entries.
|
|
165
|
+
write fixture output PNGs to the committed dir:
|
|
167
166
|
|
|
168
|
-
|
|
167
|
+
Committed path convention: `{app-dir}/e2e/screenshots/vscode/{suite}-{test}.png`.
|
|
168
|
+
|
|
169
|
+
This dir **may be empty** for behavior-only tests that produce no visual output (SD-3).
|
|
170
|
+
When capturing PNGs is possible, write them there and `git add` them.
|
|
171
|
+
|
|
172
|
+
Enumerate committed PNGs: `{app-dir}/e2e/screenshots/vscode/**/*.png` (may be empty).
|
|
173
|
+
|
|
174
|
+
## e2e_gallery Population
|
|
175
|
+
|
|
176
|
+
Always emit `e2e_gallery[]` in the specialist output — even when empty (never omit the field):
|
|
177
|
+
|
|
178
|
+
```yaml
|
|
179
|
+
e2e_gallery: [] # empty for behavior-only extensions with no PNG output
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
When committed PNGs do exist, emit one entry per PNG:
|
|
183
|
+
|
|
184
|
+
```yaml
|
|
185
|
+
- test_name: string # suite/test name
|
|
186
|
+
page_or_screen: string # VS Code view / panel name
|
|
187
|
+
framework: vscode-test
|
|
188
|
+
committed_path: string # repo-relative: {app-dir}/e2e/screenshots/vscode/{suite}-{test}.png
|
|
189
|
+
is_new: boolean # detected via git ls-files
|
|
190
|
+
baseline_diff_pct: null # vscode-test does not produce pixel diffs
|
|
191
|
+
```
|
|
192
|
+
|
|
193
|
+
Include this in the specialist output alongside `screenshots[]`.
|
|
169
194
|
|
|
170
195
|
## Run Command
|
|
171
196
|
|
|
@@ -168,11 +168,40 @@ screenshot.lifetime = .keepAlways
|
|
|
168
168
|
add(screenshot)
|
|
169
169
|
```
|
|
170
170
|
|
|
171
|
-
Attachments are written to the test results bundle under `DerivedData`.
|
|
172
|
-
|
|
171
|
+
Attachments are written to the test results bundle under `DerivedData`. After the run,
|
|
172
|
+
export them to the committed path using `xcrun xcresulttool`:
|
|
173
173
|
|
|
174
|
-
|
|
175
|
-
|
|
174
|
+
```bash
|
|
175
|
+
# Export all attachments from the result bundle to committed dir
|
|
176
|
+
xcrun xcresulttool export \
|
|
177
|
+
--path ./build/results.xcresult \
|
|
178
|
+
--output-path {app-dir}/e2e/screenshots/xcuitest \
|
|
179
|
+
--type directory
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
Committed path convention: `{app-dir}/e2e/screenshots/xcuitest/{TestClass}/{testMethod}-{n}.png`.
|
|
183
|
+
After export, `git add {app-dir}/e2e/screenshots/xcuitest/` to track new PNGs.
|
|
184
|
+
|
|
185
|
+
**`is_new` detection**: `git ls-files --error-unmatch <path>` exits non-zero → `is_new: true`.
|
|
186
|
+
|
|
187
|
+
Enumerate committed PNGs: `{app-dir}/e2e/screenshots/xcuitest/**/*.png`.
|
|
188
|
+
|
|
189
|
+
## e2e_gallery Population
|
|
190
|
+
|
|
191
|
+
After export, for each committed PNG under `{app-dir}/e2e/screenshots/xcuitest/`, emit one
|
|
192
|
+
`e2e_gallery[]` entry:
|
|
193
|
+
|
|
194
|
+
```yaml
|
|
195
|
+
- test_name: string # TestClass/testMethod
|
|
196
|
+
page_or_screen: string # screen name inferred from the attachment name
|
|
197
|
+
framework: xcuitest
|
|
198
|
+
committed_path: string # repo-relative: {app-dir}/e2e/screenshots/xcuitest/{TestClass}/{testMethod}-{n}.png
|
|
199
|
+
is_new: boolean # detected via git ls-files
|
|
200
|
+
baseline_diff_pct: null # XCUITest does not produce pixel diffs
|
|
201
|
+
```
|
|
202
|
+
|
|
203
|
+
Include this in the specialist output alongside `screenshots[]` (which retains the
|
|
204
|
+
`DerivedData` transient path for diagnostic reference).
|
|
176
205
|
|
|
177
206
|
## Run Command
|
|
178
207
|
|
|
@@ -181,9 +210,16 @@ xcodebuild test \
|
|
|
181
210
|
-workspace ios/YourApp.xcworkspace \
|
|
182
211
|
-scheme YourApp \
|
|
183
212
|
-destination 'platform=iOS Simulator,name=iPhone 16,OS=latest' \
|
|
213
|
+
-resultBundlePath ./build/results.xcresult \
|
|
184
214
|
TEST_EMAIL="$TEST_EMAIL" \
|
|
185
215
|
TEST_PASSWORD="$TEST_PASSWORD" \
|
|
186
216
|
| xcbeautify
|
|
217
|
+
# Then export attachments to committed dir:
|
|
218
|
+
xcrun xcresulttool export \
|
|
219
|
+
--path ./build/results.xcresult \
|
|
220
|
+
--output-path {app-dir}/e2e/screenshots/xcuitest \
|
|
221
|
+
--type directory
|
|
222
|
+
git add {app-dir}/e2e/screenshots/xcuitest/
|
|
187
223
|
```
|
|
188
224
|
|
|
189
225
|
## pnpm Script
|
|
@@ -84,6 +84,8 @@ Review all QA items across all rounds:
|
|
|
84
84
|
|
|
85
85
|
**E2E pass vs skipped distinction**: When reading `auto_qa.items[]` for `check: 'e2e'`, do NOT conflate `status: 'pass'` with `status: 'skipped'`. A spec that ran with `passed === 0 && skipped > 0` for any path touching `files_changed` is a hard fail, not a pass — verdict text MUST explicitly call this out: "E2E spec authored but assertions did not execute (skip-gated)." Do NOT issue a READY verdict on a zero-assertion e2e run; route to a fix round per `rules/e2e-mandatory.md`.
|
|
86
86
|
|
|
87
|
+
**Committed-screenshot check**: For any round where `round.context.e2e_eligible[]` is non-empty, verify `round.context.e2e_gallery[]` is non-empty. Refuse a READY verdict when it is empty — verdict text: "E2E ran but produced zero committed screenshots — open a fix round per `rules/e2e-mandatory.md` § Committed-Screenshot Enforcement." Sole exception: when `vscode-test` is the ONLY eligible framework, an empty `e2e_gallery[]` is allowed (SD-3, behavior-only extensions).
|
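The committed-screenshot check described above reduces to a small decision function. The sketch below is one possible reading, assuming `e2e_eligible[]` holds framework names (the source does not show its element shape); `committedScreenshotCheck` and the verdict type are hypothetical.

```typescript
type E2eFramework = "playwright" | "maestro" | "xcuitest" | "webdriverio" | "vscode-test";

// Assumed slice of round.context; only the two fields the check reads.
interface RoundE2eContext {
  e2e_eligible: E2eFramework[];
  e2e_gallery: unknown[];
}

type GalleryVerdict = { ready: true } | { ready: false; reason: string };

function committedScreenshotCheck(ctx: RoundE2eContext): GalleryVerdict {
  // No eligible framework: the check does not apply to this round.
  if (ctx.e2e_eligible.length === 0) return { ready: true };
  // At least one committed screenshot was reported.
  if (ctx.e2e_gallery.length > 0) return { ready: true };
  // SD-3: an empty gallery is allowed only when vscode-test is the sole eligible framework.
  const vscodeOnly = ctx.e2e_eligible.every((f) => f === "vscode-test");
  if (vscodeOnly) return { ready: true };
  return { ready: false, reason: "E2E ran but produced zero committed screenshots" };
}
```

Note that a round mixing `vscode-test` with any other framework does not qualify for the exception in this reading, since the rule says vscode-test must be the ONLY eligible framework.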
|
88
|
+
|
|
87
89
|
List any pending or failed items. Determine if they are blockers.
|
|
88
90
|
|
|
89
91
|
### Phase 5: File Approval Check
|
|
@@ -70,6 +70,13 @@ output:
|
|
|
70
70
|
viewport: 'desktop' | 'mobile' | 'tablet' | 'device'
|
|
71
71
|
is_new: bool
|
|
72
72
|
baseline_diff_pct: number | null
|
|
73
|
+
e2e_gallery: # ADDITIVE alongside screenshots[]; consumed by TASK-3 / checkpoint-end
|
|
74
|
+
- test_name: string
|
|
75
|
+
page_or_screen: string
|
|
76
|
+
framework: string # playwright | maestro | xcuitest | webdriverio | vscode-test
|
|
77
|
+
committed_path: string # repo-relative; MUST be git-tracked after the run
|
|
78
|
+
is_new: boolean # true => no prior baseline; auto-captured+committed this run
|
|
79
|
+
baseline_diff_pct: number | null # null for non-playwright frameworks
|
|
73
80
|
user_interactions: [{question, answer}]
|
|
74
81
|
tech_stack_reconciliation:
|
|
75
82
|
db_framework: string | null
|
|
@@ -174,18 +181,57 @@ For each failed test, assign exactly one category:
|
|
|
174
181
|
`env`, `auth`, `access` failures MUST NOT count toward `test_results.failed` until
|
|
175
182
|
preflight passes — they block the run instead.
|
|
176
183
|
|
|
184
|
+
## Committed-Screenshot Mandate
|
|
185
|
+
|
|
186
|
+
Every eligible e2e run MUST persist relevant screenshots to the framework's committed
|
|
187
|
+
directory (tracked in git). Transient dirs (e.g. `test-results/`, `DerivedData`) are for
|
|
188
|
+
diagnostics only — they are NOT the committed path.
|
|
189
|
+
|
|
190
|
+
| Framework | Committed path |
|
|
191
|
+
|---|---|
|
|
192
|
+
| playwright | `apps/{app}/e2e/{spec}.spec.ts-snapshots/{name}-{browser}.png` |
|
|
193
|
+
| maestro | `e2e/screenshots/maestro/{flow}-{state}.png` (repo root) |
|
|
194
|
+
| xcuitest | `{app-dir}/e2e/screenshots/xcuitest/{TestClass}/{testMethod}-{n}.png` |
|
|
195
|
+
| webdriverio | `{app-dir}/e2e/screenshots/webdriverio/{spec}-{state}.png` |
|
|
196
|
+
| vscode-test | `{app-dir}/e2e/screenshots/vscode/{suite}-{test}.png` (SD-3: may be empty for behavior-only extensions) |
|
|
197
|
+
|
|
198
|
+
SD-3: the vscode-test committed dir may be empty for behavior-only extensions (no visual surface); the agent must still emit `e2e_gallery: []` explicitly. `cbp-task-check` Phase 4 treats an empty `e2e_gallery[]` as allowed when `vscode-test` is the ONLY eligible framework.
|
|
199
|
+
|
|
200
|
+
**Gitignore caution**: root `.gitignore` ignores `apps/web/e2e/screenshots/`. For the `{app-dir}`-relative frameworks (xcuitest, webdriverio, vscode-test), `{app-dir}` MUST NOT resolve to `apps/web` — committed PNGs there would be silently dropped from git. Remedy: use a non-ignored subdir (e.g. `apps/web/e2e/baselines/<framework>/`). A `.gitignore` negation (`!apps/web/e2e/screenshots/<framework>/`) does NOT work — git does not recurse into an ignored parent directory, so PNGs in that subdir would be silently dropped on a fresh checkout. Maestro (repo-root `e2e/screenshots/maestro/`) is already safe.
|
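A guard against the gitignore pitfall above could be as simple as a prefix test run before reporting a `committed_path`. This is a rough sketch: `isSilentlyIgnored` is a hypothetical helper, and a plain prefix comparison only approximates gitignore matching for root-anchored directory patterns like the one named in the caution.

```typescript
// Directory named in the gitignore caution; other entries would be added by hand.
const IGNORED_SCREENSHOT_DIRS = ["apps/web/e2e/screenshots/"];

function isSilentlyIgnored(
  committedPath: string,
  ignoredDirs: string[] = IGNORED_SCREENSHOT_DIRS,
): boolean {
  // Anything under an ignored directory is ignored too, including subdirectories:
  // git does not re-include files whose parent directory is excluded.
  return ignoredDirs.some((dir) => committedPath.startsWith(dir));
}
```

Under these assumptions, `apps/web/e2e/screenshots/webdriverio/home-loaded.png` is flagged, while the suggested remedy path `apps/web/e2e/baselines/webdriverio/home-loaded.png` and the repo-root Maestro directory are not.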
|
201
|
+
|
|
202
|
+
`is_new` detection: `git ls-files --error-unmatch <path>` exits non-zero → `is_new: true`
|
|
203
|
+
(no committed baseline exists yet; auto-capture and `git add`). Exit zero → `is_new: false`.
|
|
204
|
+
|
|
205
|
+
## Auto-New / Gated-Changed Update Model
|
|
206
|
+
|
|
207
|
+
**NEW screens** (`is_new === true`): the specialist auto-captures the PNG and runs
|
|
208
|
+
`git add <committed_path>`. The test passes; `cbp-frontend-ui` Step 5b reviews semantically.
|
|
209
|
+
No user gate required for first-run capture.
|
|
210
|
+
|
|
211
|
+
**EXISTING baselines that visually diff** (`is_new === false`, `baseline_diff_pct > threshold`):
|
|
212
|
+
classify as `visual_regression`. Do NOT auto-update. Surface as a blocking accept-or-fix gate
|
|
213
|
+
at `/cbp-round-end` Step 7. The user must explicitly approve (`--update-snapshots`) or open a
|
|
214
|
+
fix task. This relaxes the prior always-manual contract ONLY for new screens.
|
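The auto-new / gated-changed model above amounts to a three-way classification per screenshot. A minimal sketch, assuming the threshold is supplied by the caller as a percentage (the source leaves the exact value to each framework's configuration); `classifyShot` and its outcome names are hypothetical.

```typescript
type ShotOutcome = "auto_commit_new" | "pass" | "visual_regression";

interface ShotResult {
  is_new: boolean;
  baseline_diff_pct: number | null; // null for frameworks without pixel diffs
}

function classifyShot(shot: ShotResult, thresholdPct: number): ShotOutcome {
  // New screens are captured and committed without a user gate.
  if (shot.is_new) return "auto_commit_new";
  // Existing baseline with a diff above the threshold: never auto-updated.
  if (shot.baseline_diff_pct !== null && shot.baseline_diff_pct > thresholdPct) {
    return "visual_regression";
  }
  // Existing baseline within the threshold, or no diff available.
  return "pass";
}
```

Only the `visual_regression` outcome would feed the blocking accept-or-fix gate; the other two let the test pass.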
|
215
|
+
|
|
177
216
|
## Screenshot Collection Rule
|
|
178
217
|
|
|
179
|
-
After every run, enumerate all PNGs
|
|
180
|
-
specific paths are in each agent's body. Every
|
|
181
|
-
`{test_name, path, page_or_screen, viewport, is_new, baseline_diff_pct}`.
|
|
218
|
+
After every run, enumerate all committed PNGs and populate BOTH `screenshots[]` and
|
|
219
|
+
`e2e_gallery[]`. Framework-specific paths are in each agent's body. Every `screenshots[]`
|
|
220
|
+
entry requires: `{test_name, path, page_or_screen, viewport, is_new, baseline_diff_pct}`.
|
|
221
|
+
Every `e2e_gallery[]` entry requires: `{test_name, page_or_screen, framework, committed_path,
|
|
222
|
+
is_new, baseline_diff_pct}`. `committed_path` MUST be a git-tracked path after the run.
|
|
223
|
+
|
|
224
|
+
`/cbp-round-execute` Step 5b aggregates `e2e_gallery[]` across all specialists and stores it
|
|
225
|
+
in `round.context.e2e_gallery`. TASK-3 / checkpoint-end consumes this aggregated gallery to
|
|
226
|
+
upload images to the DB.
|
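The aggregation step described above, where per-framework specialist outputs are flattened into a single `round.context.e2e_gallery`, might be sketched like this. The field names follow the `e2e_gallery[]` contract; `aggregateGallery` and `SpecialistOutput` are hypothetical, and treating a missing `e2e_gallery` field as empty is an assumption of this sketch.

```typescript
interface GalleryItem {
  test_name: string;
  page_or_screen: string;
  framework: string;
  committed_path: string;
  is_new: boolean;
  baseline_diff_pct: number | null;
}

// Assumed shape of one entry in round.context.e2e_outputs[framework].
interface SpecialistOutput {
  e2e_gallery?: GalleryItem[];
}

function aggregateGallery(outputs: Record<string, SpecialistOutput>): GalleryItem[] {
  const all: GalleryItem[] = [];
  for (const framework of Object.keys(outputs)) {
    // A specialist that reported no gallery contributes nothing.
    const gallery = outputs[framework]?.e2e_gallery ?? [];
    all.push(...gallery);
  }
  return all;
}
```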
|
182
227
|
|
|
183
228
|
Screenshots flow to `cbp-frontend-ui` invoked by `/cbp-round-execute` Step 5b with
|
|
184
229
|
`phase: 'screenshot_review'` — NOT inline by `round-executor` Step 3.8 (which runs
|
|
185
230
|
`phase: 'style_only'` without e2e output).
|
|
186
231
|
|
|
187
|
-
**
|
|
188
|
-
the user decides via QA whether to update baselines.
|
|
232
|
+
**Changed baselines are never auto-accepted.** A `toHaveScreenshot` diff on an existing
|
|
233
|
+
baseline is `visual_regression`; the user decides via QA whether to update baselines.
|
|
234
|
+
New-screen auto-capture (above) is the only exception to the always-manual contract.
|
|
189
235
|
|
|
190
236
|
## Completion Rule
|
|
191
237
|
|
|
@@ -237,7 +283,9 @@ An agent is NOT spawned when ANY of the following hold:
|
|
|
237
283
|
spawn multiple specialists in the same round (one per eligible framework). Agents run in
|
|
238
284
|
parallel with `cbp-testing-qa-agent`. Each specialist's output is stored under
|
|
239
285
|
`round.context.e2e_outputs[framework]` (a framework-keyed map); `/cbp-round-execute` Step 5b
|
|
240
|
-
aggregates `screenshots[]` across all entries before the
|
|
286
|
+
aggregates `screenshots[]` and `e2e_gallery[]` across all entries before the
|
|
287
|
+
`cbp-frontend-ui` review. The aggregated `e2e_gallery[]` is persisted separately to
|
|
288
|
+
`round.context.e2e_gallery` for consumption by TASK-3 / checkpoint-end.
|
|
241
289
|
|
|
242
290
|
**whole_checkpoint_mode dispatch** (`/cbp-checkpoint-check` Step 5b and `/cbp-checkpoint-plan`
|
|
243
291
|
Step 4): pass `round_number: 0`, `whole_checkpoint_mode: true`, and the aggregated
|
|
@@ -298,6 +346,6 @@ a loop, snapshot text/href BEFORE navigation rather than holding stale `Locator`
|
|
|
298
346
|
|
|
299
347
|
| Situation | What happens |
|
|
300
348
|
|---|---|
|
|
301
|
-
| No baseline (new screen) | Playwright creates on first run;
|
|
302
|
-
| Baseline exists, diff ≤ threshold | Test passes. |
|
|
303
|
-
| Baseline exists, diff > threshold | `visual_regression` failure
|
|
349
|
+
| No baseline (new screen, `is_new: true`) | Playwright creates on first run; auto-committed; `git add` runs; `e2e_gallery[].is_new: true`; `cbp-frontend-ui` Step 5b reviews semantically. No user gate. |
|
|
350
|
+
| Baseline exists, diff ≤ threshold | Test passes; `is_new: false`; `baseline_diff_pct` recorded. |
|
|
351
|
+
| Baseline exists, diff > threshold | `visual_regression` failure; `is_new: false`. Agent does NOT retry. `cbp-frontend-ui` Step 5b flags it; `/cbp-round-end` Step 3b constructs user QA item. User decides: fix-task or `--update-snapshots`. |
|
|
@@ -226,7 +226,7 @@ After a `complete_round` MCP call succeeds, reconciles the round's `files_change
|
|
|
226
226
|
|
|
227
227
|
### `cbp-cmux-workspace-sync.sh` — SessionStart, matcher `*`
|
|
228
228
|
|
|
229
|
-
On every session start, syncs the active [cmux](https://github.com/nicholasgasior/cmux) workspace title to the current git branch
|
|
229
|
+
On every session start, syncs the active [cmux](https://github.com/nicholasgasior/cmux) workspace title to the current git branch, the workspace description to the repo folder basename (the directory that contains `.git/`), and applies the workspace color from `.codebyplan/cmux.json` via `cmux workspace-action --action set-color`. All three actions are delegated to `codebyplan cmux-sync`. If no `workspace_color` is configured, a one-line nudge is printed to stdout prompting the user to run `/cbp-setup-cmux`.
|
|
230
230
|
|
|
231
231
|
**Blocks vs warns**: never blocks — exit 0 on every path. A SessionStart hook must never prevent a session from opening.
|
|
232
232
|
|
|
@@ -252,6 +252,28 @@ After any Bash tool call that contains a `git checkout` or `git switch` invocati
|
|
|
252
252
|
|
|
253
253
|
---
|
|
254
254
|
|
|
255
|
+
### Auto dev server (`codebyplan cmux-serve`)
|
|
256
|
+
|
|
257
|
+
At the start of each round, `cbp-round-execute` (Step 2a) calls `codebyplan cmux-serve --files "<round files>"` to auto-start the dev server for any app whose source files are touched. The subcommand probes each allocated port via `node:net`, starts a `cmux new-split` terminal pane + sends the dev command for any non-listening app, then opens a browser pane. If the port is already listening (another worktree) it only opens the browser pane. No hook registration is needed — the skill invokes the subcommand directly. Gated by `auto_dev_server` in `.codebyplan/cmux.json`; no-op outside cmux.
|
|
258
|
+
|
|
259
|
+
---
|
|
260
|
+
|
|
261
|
+
### Status surface (`codebyplan cmux-status`)
|
|
262
|
+
|
|
263
|
+
The lifecycle skills push CodeByPlan development state into the cmux workspace sidebar via `codebyplan cmux-status`. No hook registration is needed — the skills invoke the subcommand directly:
|
|
264
|
+
|
|
265
|
+
| Skill | What is pushed |
|
|
266
|
+
| --- | --- |
|
|
267
|
+
| `cbp-task-start` (Step 4.5) | `--checkpoint "CHK-NNN: title" --task "TASK-N: title"` |
|
|
268
|
+
| `cbp-task-complete` (Step 7.3) | `--task "TASK-N: title done" --progress completed/total` |
|
|
269
|
+
| `cbp-round-execute` (Step 3d) | `--qa "R{n} {status}"` where status ∈ completed / blocked / re-triggering |
|
|
270
|
+
|
|
271
|
+
**`auto_status` toggle.** Gated by the `auto_status` field in `.codebyplan/cmux.json` (configured via `/cbp-setup-cmux`). When `auto_status` is `false`, every call is a no-op. Default is `true` (enabled).
|
|
272
|
+
|
|
273
|
+
**No-op outside cmux.** `codebyplan cmux-status` checks for `$CMUX_WORKSPACE_ID` before doing anything. Outside a cmux workspace it exits immediately — safe to call unconditionally from skills and hooks.
|
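The two gates on `codebyplan cmux-status` (the `$CMUX_WORKSPACE_ID` check and the `auto_status` toggle) can be pictured as a single predicate. The sketch below is illustrative only: `shouldPushStatus` is a hypothetical name, and treating a missing config file the same as the documented default of `true` is an assumption.

```typescript
// Assumed slice of .codebyplan/cmux.json; only the field this gate reads.
interface CmuxStatusConfig {
  auto_status?: boolean;
}

function shouldPushStatus(
  env: Record<string, string | undefined>,
  config: CmuxStatusConfig | null,
): boolean {
  // Outside a cmux workspace the subcommand is a no-op.
  if (!env["CMUX_WORKSPACE_ID"]) return false;
  // auto_status defaults to true; only an explicit false disables the push.
  return config?.auto_status !== false;
}
```

Because both checks fail closed into a no-op, callers such as the lifecycle skills can invoke the subcommand unconditionally.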
|
274
|
+
|
|
275
|
+
---
|
|
276
|
+
|
|
255
277
|
## Supporting (not registered)
|
|
256
278
|
|
|
257
279
|
### `test-hooks.sh` — invoked by `auto-test-hooks.sh`
|
|
@@ -12,7 +12,7 @@
|
|
|
12
12
|
_SUB='(templates|examples|reference|scripts)/[a-z0-9.-]+\.(md|sh|json|ya?ml)'
|
|
13
13
|
enforce_path_pattern '^/\.claude/skills/' "^/\.claude/skills/[a-z0-9-]+/(SKILL\.md|[a-z0-9-]+\.md|${_SUB})$" 'Invalid skill path' "Pattern: /.claude/skills/{name}/SKILL.md | /.claude/skills/{name}/{file}.md | /.claude/skills/{name}/(templates|examples|reference|scripts)/{file}.{md,sh,json,yaml}"
|
|
14
14
|
enforce_path_pattern '^/\.claude/agents/' "^/\.claude/agents/([a-z0-9-]+\.md|[a-z0-9-]+/(AGENT\.md|[a-z0-9-]+\.md|${_SUB}))$" 'Invalid agent path' "Pattern: /.claude/agents/{name}.md | /.claude/agents/{name}/AGENT.md | /.claude/agents/{name}/{file}.md | /.claude/agents/{name}/(templates|examples|reference|scripts)/{file}.{md,sh,json,yaml}"
|
|
15
|
-
enforce_path_pattern '^/\.claude/rules/' '^/\.claude/rules/[a-
|
|
15
|
+
enforce_path_pattern '^/\.claude/rules/' '^/\.claude/rules/[a-z0-9-]+\.md$' 'Invalid native rule path' 'Pattern: /.claude/rules/{name}.md'
|
|
16
16
|
if match_path '^/\.claude/hooks/' && ! match_path '^/\.claude/hooks/__test-fixtures__/'; then
|
|
17
17
|
if ! match_path '^/\.claude/hooks/[a-z-]+\.sh$'; then
|
|
18
18
|
block 'Invalid hook path' 'Pattern: /.claude/hooks/{name}.sh'
|
|
@@ -61,10 +61,27 @@ A spec that ran with `passed === 0 && skipped > 0` for any path touching `files_
|
|
|
61
61
|
**hard fail**, not a pass — `cbp-task-check` (`agents/cbp-task-check.md`) refuses a READY
|
|
62
62
|
verdict on a zero-assertion e2e run and routes to a fix round per this rule.
|
|
63
63
|
|
|
64
|
+
## Committed-Screenshot Enforcement
|
|
65
|
+
|
|
66
|
+
An eligible e2e run that produces **zero committed screenshots** for any `pages_affected`
|
|
67
|
+
path it touched is a defect — not a valid pass. Every framework must write at least one
|
|
68
|
+
PNG to its committed dir (per the table in `context/testing/e2e.md` § Committed-Screenshot
|
|
69
|
+
Mandate) and `git add` it before reporting `status: 'completed'`.
|
|
70
|
+
|
|
71
|
+
`cbp-task-check` refuses a READY verdict when `e2e_gallery[]` is empty AND the round
|
|
72
|
+
touched UI source paths for an eligible framework — sole exception: `vscode-test`-only
|
|
73
|
+
rounds (SD-3, behavior-only extensions; see below). The fix path is the same as for a
|
|
74
|
+
zero-assertion run: open a fix round that captures the missing committed screenshots.
|
|
75
|
+
|
|
76
|
+
The sole exception is `vscode-test`: the committed dir may be empty when the extension
|
|
77
|
+
has no visual output (behavior-only tests). Agents must still define the dir and report
|
|
78
|
+
`e2e_gallery: []` explicitly — not omit the field.
|
|
79
|
+
|
|
64
80
|
## Cross-References
|
|
65
81
|
|
|
66
82
|
- `context/testing/e2e.md` — Input/Output contract, pre-flight loop, failure classification,
|
|
67
|
-
and
|
|
68
|
-
- `agents/cbp-task-check.md` — enforces the zero-assertion hard-fail
|
|
83
|
+
committed-screenshot mandate, auto-new/gated-changed model, and dispatch routing table.
|
|
84
|
+
- `agents/cbp-task-check.md` — enforces the zero-assertion hard-fail and the empty
|
|
85
|
+
`e2e_gallery[]` hard-fail at verdict time.
|
|
69
86
|
- `skills/cbp-round-execute/SKILL.md` Step 5/6, `skills/cbp-checkpoint-check/SKILL.md` Step 5b
|
|
70
87
|
— the config-driven dispatch and `e2e_eligible_skipped` gate implementations.
|
|
@@ -88,7 +88,9 @@
|
|
|
88
88
|
"Bash(codebyplan ship:*)",
|
|
89
89
|
"Bash(npx codebyplan ship:*)",
|
|
90
90
|
"Bash(codebyplan claude:*)",
|
|
91
|
-
"Bash(npx codebyplan claude:*)"
|
|
91
|
+
"Bash(npx codebyplan claude:*)",
|
|
92
|
+
"Bash(codebyplan upload-e2e-images:*)",
|
|
93
|
+
"Bash(npx codebyplan upload-e2e-images:*)"
|
|
92
94
|
],
|
|
93
95
|
"allow": [
|
|
94
96
|
"Skill(cbp-build-cc-agent)",
|
|
@@ -122,12 +124,23 @@
|
|
|
122
124
|
"Skill(cbp-round-update)",
|
|
123
125
|
"Skill(cbp-session-end)",
|
|
124
126
|
"Skill(cbp-session-start)",
|
|
127
|
+
"Skill(cbp-setup-cmux)",
|
|
125
128
|
"Skill(cbp-setup-e2e)",
|
|
126
129
|
"Skill(cbp-setup-eslint)",
|
|
127
130
|
"Skill(cbp-ship-configure)",
|
|
131
|
+
"Skill(cbp-standalone-task-check)",
|
|
132
|
+
"Skill(cbp-standalone-task-complete)",
|
|
133
|
+
"Skill(cbp-standalone-task-create)",
|
|
134
|
+
"Skill(cbp-standalone-task-start)",
|
|
135
|
+
"Skill(cbp-standalone-task-testing)",
|
|
128
136
|
"Skill(cbp-supabase-branch-check)",
|
|
129
137
|
"Skill(cbp-supabase-migrate)",
|
|
130
138
|
"Skill(cbp-supabase-setup)",
|
|
139
|
+
"Skill(cbp-standalone-task-check)",
|
|
140
|
+
"Skill(cbp-standalone-task-complete)",
|
|
141
|
+
"Skill(cbp-standalone-task-create)",
|
|
142
|
+
"Skill(cbp-standalone-task-start)",
|
|
143
|
+
"Skill(cbp-standalone-task-testing)",
|
|
131
144
|
"Skill(cbp-task-check)",
|
|
132
145
|
"Skill(cbp-task-complete)",
|
|
133
146
|
"Skill(cbp-task-create)",
|
|
@@ -189,6 +202,10 @@
|
|
|
189
202
|
"Bash(npx codebyplan resolve-worktree:*)",
|
|
190
203
|
"Bash(codebyplan cmux-sync:*)",
|
|
191
204
|
"Bash(npx codebyplan cmux-sync:*)",
|
|
205
|
+
"Bash(codebyplan cmux-status:*)",
|
|
206
|
+
"Bash(npx codebyplan cmux-status:*)",
|
|
207
|
+
"Bash(codebyplan cmux-serve:*)",
|
|
208
|
+
"Bash(npx codebyplan cmux-serve:*)",
|
|
192
209
|
"Bash(codebyplan version-status:*)",
|
|
193
210
|
"Bash(npx codebyplan version-status:*)",
|
|
194
211
|
"Bash(codebyplan statusline:*)",
|
|
@@ -113,6 +113,22 @@ If `/cbp-ship` reports `aborted_at` (user aborted) or any surface failed verific
|
|
|
113
113
|
|
|
114
114
|
If the repo has zero configured surfaces (very early-stage), `/cbp-ship` exits with `## No deployable surfaces configured` — that's a success state, continue to cleanup.
|
|
115
115
|
|
|
116
|
+
### Step 7.5: Upload E2E Screenshots to DB (best-effort)
|
|
117
|
+
|
|
118
|
+
After `/cbp-ship` completes successfully, upload the checkpoint's new/changed committed E2E screenshots to the CodeByPlan DB so they can be reviewed per-checkpoint in the web UI (CHK-171). This step is **best-effort and non-blocking** — a failure here MUST NOT halt shipment or cleanup.
|
|
119
|
+
|
|
120
|
+
```bash
|
|
121
|
+
npx codebyplan upload-e2e-images "$CHECKPOINT_ID" --repo-id "$REPO_ID" --base-branch "$BASE" --json
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
The command collects only the PNGs added/changed on the feat branch vs `$BASE` (this checkpoint's own e2e work — not the whole baseline set), POSTs them to `POST /api/checkpoint-images`, and the endpoint stores each file + patches `checkpoints.e2e_screenshots`. Capture the outcome into `E2E_IMAGES_UPLOADED` for Step 10:
|
|
125
|
+
|
|
126
|
+
- Exit 0 with uploaded paths → `{ count: <n>, stored_paths: [...], skipped: false }`.
|
|
127
|
+
- Exit 0 with `No new/changed e2e screenshots found` → `{ count: 0, stored_paths: [], skipped: true }`.
|
|
128
|
+
- Non-zero exit → `{ count: 0, stored_paths: [], skipped: true, error: "<stderr summary>" }`; emit a non-blocking warning and continue to Step 8.
|
|
129
|
+
|
|
130
|
+
`--repo-id` defaults to `repo_id` from `.codebyplan/repo.json` when omitted.
|
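The three outcomes listed for Step 7.5 map onto the `E2E_IMAGES_UPLOADED` shape roughly as sketched below. The `UploadRun` input is hypothetical: it assumes the caller has already extracted the stored paths from the command's output, whose exact `--json` format is not shown in this diff, and it treats an empty path list on exit 0 as the "No new/changed e2e screenshots found" case.

```typescript
interface E2eImagesUploaded {
  count: number;
  stored_paths: string[];
  skipped: boolean;
  error?: string;
}

// Hypothetical summary of one `upload-e2e-images` invocation.
interface UploadRun {
  exitCode: number;
  storedPaths: string[];
  stderr: string;
}

function toE2eImagesUploaded(run: UploadRun): E2eImagesUploaded {
  if (run.exitCode !== 0) {
    // Best-effort step: record the failure and let the caller continue to Step 8.
    return { count: 0, stored_paths: [], skipped: true, error: run.stderr.trim().slice(0, 200) };
  }
  if (run.storedPaths.length === 0) {
    // Exit 0 with nothing new or changed to upload.
    return { count: 0, stored_paths: [], skipped: true };
  }
  return { count: run.storedPaths.length, stored_paths: run.storedPaths, skipped: false };
}
```

The function never throws, which matches the non-blocking intent of the step: a failed upload becomes data in `context.shipment` rather than a halted shipment.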
|
131
|
+
|
|
116
132
|
### Step 8: Stale Feat Branch Cleanup
|
|
117
133
|
|
|
118
134
|
After successful shipment, identify stale remote feat branches:
|
|
@@ -202,7 +218,8 @@ context.shipment: {
|
|
|
202
218
|
skipped: [...], // populated by /cbp-ship — surfaces explicitly skipped
|
|
203
219
|
stale_branches_cleaned: [list of deleted git branches],
|
|
204
220
|
feat_branch_deleted: true/false,
|
|
205
|
-
supabase_branches_deleted: [list of Supabase preview branch names removed in Steps 8–9]
|
|
221
|
+
supabase_branches_deleted: [list of Supabase preview branch names removed in Steps 8–9],
|
|
222
|
+
e2e_images_uploaded: E2E_IMAGES_UPLOADED // from Step 7.5 — { count, stored_paths, skipped, error? } (CHK-171)
|
|
206
223
|
}
|
|
207
224
|
```
|
|
208
225
|
|
|
@@ -41,8 +41,8 @@ input:
|
|
|
41
41
|
path: string # Repo-relative or absolute path to PNG
|
|
42
42
|
page_or_screen: string
|
|
43
43
|
viewport: 'desktop' | 'mobile' | 'tablet' | 'device'
|
|
44
|
-
is_new: bool #
|
|
45
|
-
baseline_diff_pct: number | null # Pixel-diff % vs
|
|
44
|
+
is_new: bool # true = no prior committed baseline; auto-captured+committed this run
|
|
45
|
+
baseline_diff_pct: number | null # Pixel-diff % vs committed baseline (null for non-playwright frameworks)
|
|
46
46
|
```
|
|
47
47
|
|
|
48
48
|
## Output Contract
|
|
@@ -167,9 +167,10 @@ If no design source PNGs exist for the changed pages, skip this phase.
|
|
|
167
167
|
For each screenshot in `e2e_screenshots[]`:
|
|
168
168
|
|
|
169
169
|
1. **Read the PNG via the Read tool** (Claude multimodal — the PNG is shown to the model directly). Do not use Bash to inspect bytes.
|
|
170
|
-
2. **Check
|
|
171
|
-
-
|
|
172
|
-
-
|
|
170
|
+
2. **Check new vs changed baseline**:
|
|
171
|
+
- `is_new === true`: screenshot was auto-captured and committed this run (no prior baseline). Review semantically only — no regression to flag. Populate `screenshot_review.new_screens_reviewed`.
|
|
172
|
+
- `is_new === false` AND `baseline_diff_pct > 0.1%`: emit finding `{category: 'baseline_regression', severity: 'critical', file: {path}, screenshot: {path}, issue: 'Pixel diff vs committed baseline: {diff_pct}%', suggestion: 'Inspect the diff PNG (same folder, -diff suffix). Either fix the regression or, if intentional, run `playwright test --update-snapshots` and commit the new baseline.'}`
|
|
173
|
+
- Do NOT auto-update changed baselines. The user must explicitly approve via QA.
|
|
173
174
|
3. **Semantic review of rendered output** (both new screens and existing):
|
|
174
175
|
- **Text overflow / truncation** — text clipped, ellipsis in unintended places, buttons cut off
|
|
175
176
|
- **Unstyled elements** — unbranded default fonts, missing styles (flash of unstyled content captured), default blue links
|
|
@@ -237,7 +238,8 @@ Go beyond fixing violations — actively improve visual quality. If spacing coul
|
|
|
237
238
|
- Token compliance checked
|
|
238
239
|
- Spacing consistency verified
|
|
239
240
|
- **All `e2e_screenshots` reviewed** (when provided) — rendered output checked for overflow, unstyled elements, missing imagery, contrast, layout breaks, loading/error artifacts, design-source fidelity
|
|
240
|
-
-
|
|
241
|
+
- New-screen baselines reviewed semantically (`is_new === true` — auto-committed, no user gate)
|
|
242
|
+
- Changed-baseline regressions surfaced (never auto-accepted; `is_new === false` AND diff > threshold)
|
|
241
243
|
- Critical/warning issues auto-fixed where possible (styling only, in-scope only)
|
|
242
244
|
- Findings categorized by severity
|
|
243
245
|
|
|
@@ -258,5 +260,5 @@ Go beyond fixing violations — actively improve visual quality. If spacing coul
|
|
|
258
260
|
- **Also invoked by**: `/cbp-checkpoint-check` with screenshots aggregated from a whole-checkpoint e2e run
|
|
259
261
|
- **Consumes**: `e2e_screenshots[]` aggregated from `round.context.e2e_outputs[*].screenshots` (populated by the `cbp-e2e-*` specialists at `/cbp-round-execute` Step 5)
|
|
260
262
|
- **Output written to**: `round.context.frontend_ui_review` — when invoked twice per round, the second invocation merges with the first
|
|
261
|
-
- **Downstream gate**: this skill emits `findings[]` only.
|
|
263
|
+
- **Downstream gate**: this skill emits `findings[]` only. Changed-baseline-regression findings (`is_new === false`) surface as a BLOCKING gate at `/cbp-round-end` Step 7 (never auto-accepted); new-screen baselines (`is_new === true`) are auto-committed and reviewed semantically only; rendered-visual critical findings are surfaced in the Step 7 findings presentation.
|
|
262
264
|
- **Paired with**: `frontend-design` (pre-implementation aesthetic decision), `frontend-ux` (interaction-quality self-review, also Step 3.8)
|