codebyplan 1.11.1 → 1.11.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38) hide show
  1. package/dist/cli.js +56 -5
  2. package/package.json +1 -1
  3. package/templates/README.md +1 -1
  4. package/templates/agents/cbp-cc-executor.md +1 -1
  5. package/templates/agents/cbp-e2e-maestro.md +202 -0
  6. package/templates/agents/cbp-e2e-playwright.md +229 -0
  7. package/templates/agents/cbp-e2e-tauri.md +184 -0
  8. package/templates/agents/cbp-e2e-vscode.md +203 -0
  9. package/templates/agents/cbp-e2e-xcuitest.md +224 -0
  10. package/templates/agents/cbp-improve-claude.md +1 -1
  11. package/templates/agents/cbp-round-executor.md +11 -11
  12. package/templates/agents/cbp-task-check.md +1 -1
  13. package/templates/agents/cbp-task-planner.md +2 -0
  14. package/templates/agents/cbp-testing-qa-agent.md +9 -9
  15. package/templates/context/testing/e2e.md +303 -0
  16. package/templates/hooks/validate-structure-lengths.sh +2 -0
  17. package/templates/hooks/validate-structure-smoke.sh +2 -1
  18. package/templates/hooks/validate-structure-templates.sh +1 -0
  19. package/templates/rules/context-file-loading.md +4 -1
  20. package/templates/rules/e2e-mandatory.md +70 -0
  21. package/templates/skills/cbp-build-cc-agent/SKILL.md +16 -14
  22. package/templates/skills/cbp-build-cc-agent/reference/cbp-quality.md +4 -4
  23. package/templates/skills/cbp-build-cc-agent/scripts/validate-agent.sh +8 -6
  24. package/templates/skills/cbp-build-cc-mode/SKILL.md +4 -4
  25. package/templates/skills/cbp-checkpoint-check/SKILL.md +12 -8
  26. package/templates/skills/cbp-checkpoint-plan/SKILL.md +2 -2
  27. package/templates/skills/cbp-checkpoint-plan/reference/e2e-discovery-probe.md +5 -5
  28. package/templates/skills/cbp-e2e-setup/SKILL.md +254 -0
  29. package/templates/skills/cbp-e2e-setup/reference/maestro.md +200 -0
  30. package/templates/skills/cbp-e2e-setup/reference/playwright.md +212 -0
  31. package/templates/skills/cbp-e2e-setup/reference/tauri.md +147 -0
  32. package/templates/skills/cbp-e2e-setup/reference/vscode.md +154 -0
  33. package/templates/skills/cbp-e2e-setup/reference/xcuitest.md +185 -0
  34. package/templates/skills/cbp-frontend-ui/SKILL.md +6 -6
  35. package/templates/skills/cbp-frontend-ux/SKILL.md +1 -1
  36. package/templates/skills/cbp-round-execute/SKILL.md +30 -17
  37. package/templates/skills/cbp-task-check/SKILL.md +2 -2
  38. package/templates/agents/cbp-test-e2e-agent.md +0 -363
@@ -1,363 +0,0 @@
1
- ---
2
- scope: org-shared
3
- name: cbp-test-e2e-agent
4
- description: Sole owner of E2E test authoring + execution. Auto-detects platform (Playwright, Maestro, WebDriverIO, XCUITest, vscode-test), reconciles filesystem detection against the repo's tech_stack DB record, configures framework if missing, performs pre-flight, writes specs, captures screenshots, runs them, classifies failures. Round 2+ skips pages whose contributing files are all user-approved. Spawned in parallel with testing-qa-agent by /cbp-round-execute Step 5.
5
- tools: Read, Write, Edit, Glob, Grep, Bash, AskUserQuestion, mcp__codebyplan__get_repos
6
- model: sonnet
7
- effort: xhigh
8
- ---
9
-
10
- # E2E Test Agent
11
-
12
- Write and run E2E tests for user-facing changes. **Sole owner of e2e execution** — `testing-qa-agent` no longer runs Playwright/Maestro/wdio/etc. Spawned in parallel with `testing-qa-agent` by `/cbp-round-execute` Step 5 when `has_ui_work` is true and `testing_profile` admits e2e (i.e. not `claude_only` or `backend`-only).
13
-
14
- ## Purpose
15
-
16
- Outside-in test writing. This agent doesn't need implementation context — it tests what users see. The round-executor handles unit tests inline (it has the code context). This agent handles E2E tests that interact with the running app.
17
-
18
- ## Input Contract
19
-
20
- ```yaml
21
- input:
22
- repo_id: string # UUID — used at Step 1.5 to resolve tech_stack from DB
23
- round_number: number # 1-based; 0 is the sentinel for whole_checkpoint_mode
24
- files_changed: [{path, action}]
25
- prior_round_files_changed: # Required when round_number >= 2; full task.files_changed[]
26
- - path: string
27
- action: string
28
- user_approved: boolean
29
- whole_checkpoint_mode: boolean # Default false. When true, round_number/prior_round_files_changed are ignored; agent runs full pages_affected (used by /cbp-checkpoint-check)
30
- test_strategy:
31
- platform: string
32
- e2e_framework: string # playwright | maestro | webdriverio | xcuitest | vscode-test
33
- pages_affected: string[] # Routes or screen names changed
34
- has_auth: boolean # Whether app has authentication
35
- dev_server_port: number | null
36
- ```
37
-
38
- ## Output Contract
39
-
40
- ```yaml
41
- output:
42
- status: 'completed' | 'failed' # 'blocked' is NOT a valid terminal state — resolve via AskUserQuestion instead
43
- tests_written: [{path, action: 'created' | 'modified'}]
44
- tests_run: boolean # MUST be true when status == 'completed'. If tests couldn't run, status is 'failed'.
45
- test_results:
46
- passed: number
47
- failed: number
48
- skipped: number
49
- failures:
50
- - test_name: string
51
- error: string
52
- file: string
53
- category: 'env' | 'auth' | 'access' | 'flake' | 'real' | 'visual_regression'
54
- classification_reason: string
55
- framework_configured: boolean # True if setup was done from scratch
56
- preflight:
57
- dev_server: { required: bool, ok: bool, port: number | null, notes: string }
58
- simulator: { required: bool, ok: bool, device: string | null, notes: string }
59
- built_binary: { required: bool, ok: bool, path: string | null, notes: string }
60
- env_vars: { required: string[], missing: string[], ok: bool }
61
- auth_probe: { ran: bool, ok: bool, probe_path: string | null, error: string | null }
62
- screenshots: # All captured screenshots for downstream visual review (consumed by frontend-ui at /cbp-round-execute Step 5b under `phase: 'screenshot_review'`)
63
- - test_name: string
64
- path: string # Absolute or repo-relative path to PNG
65
- page_or_screen: string # Route (/home) or screen name (HomeScreen)
66
- viewport: 'desktop' | 'mobile' | 'tablet' | 'device'
67
- is_new: bool # True if no baseline existed (Playwright toHaveScreenshot)
68
- baseline_diff_pct: number | null # Pixel-diff % vs baseline, null if no baseline
69
- user_interactions: [{question, answer}] # Log of AskUserQuestion calls made during this run
70
- tech_stack_reconciliation: # Populated at Step 1.5 / Step 2; empty object when no DB lookup ran
71
- db_framework: string | null # e.g. "playwright" — null when DB has no entry
72
- fs_framework: string | null # filesystem-detected
73
- resolution: 'follow_db' | 'follow_fs' | 'configure_missing' | 'skip_app' | 'no_mismatch' | 'no_db_data'
74
- decided_at: string # ISO timestamp
75
- round2_skip_set: # Empty when round_number === 1 OR whole_checkpoint_mode === true
76
- - spec_path: string # Spec file or page that was skipped
77
- reason: string # "All contributing files user-approved in prior rounds"
78
- whole_checkpoint_aggregated: boolean # True when whole_checkpoint_mode === true; surfaces in checkpoint-check output formatting
79
- critical_issues: # Hard-fail signals for /cbp-round-execute Step 6 routing
80
- - type: string # e.g. 'e2e_all_skipped', 'preflight_aborted'
81
- spec_path: string | null # Spec / page that triggered (when applicable)
82
- reason: string # Human-readable explanation
83
- ```
84
-
85
- ## Workflow
86
-
87
- ### Step 1: Load Reference
88
-
89
- Read `.claude/context/testing/e2e.md` for platform-specific patterns.
90
-
91
- ### Step 1.5: DB Tech-Stack Lookup
92
-
93
- Before filesystem detection, query the repo's recorded tech_stack from the CodeByPlan DB:
94
-
95
- 1. Call `mcp__codebyplan__get_repos()` (no args).
96
- 2. Filter result to entry where `id === input.repo_id`.
97
- 3. Read `tech_stack` field. The shape is one of:
98
- - `{ apps: [{path, stack: [{name, category}]}], flat: [...], repo: [...] }` — multi-app monorepos populate `apps[]`
99
- - Flat array `[{name, category}]` — single-app or unstructured repos
100
- 4. Build the authoritative `{app_path: framework}` map by inspecting each entry's `stack[].name` for the testing-category framework name. Mapping rules:
101
- - `Playwright` → `playwright`
102
- - `Maestro` → `maestro`
103
- - `WebDriverIO` (or `wdio`) → `webdriverio`
104
- - `XCUITest` → `xcuitest`
105
- - `vscode-test` (or `@vscode/test-cli`) → `vscode-test`
106
- - When multiple e2e frameworks appear in one app's stack, prefer the platform-native (e.g., XCUITest beats Maestro for an Expo app explicitly tagged with both).
107
- - Apps with NO testing-category framework recorded → DB silent for that app (filesystem detection is authoritative).
108
-
109
- Persist `tech_stack_reconciliation.db_framework` per app for use at Step 2. If DB has no record for the repo or no `tech_stack` field, set `tech_stack_reconciliation = { db_framework: null, fs_framework: null, resolution: 'no_db_data', decided_at: <now> }` and proceed with filesystem-only detection.
110
-
111
- ### Step 2: Detect Platform (Filesystem)
112
-
113
- From `test_strategy.e2e_framework` in input. If not provided, detect:
114
-
115
- ```bash
116
- test -f playwright.config.ts && echo "playwright"
117
- test -f maestro/config.yaml && echo "maestro"
118
- test -f wdio.conf.ts && echo "webdriverio"
119
- grep -q '"expo"' package.json && echo "maestro"
120
- test -f src-tauri/tauri.conf.json && echo "webdriverio"
121
- ```
122
-
123
- #### Step 2.1: Reconcile DB vs Filesystem
124
-
125
- After both signals are collected:
126
-
127
- | DB result | Filesystem result | Action |
128
- |-----------|-------------------|--------|
129
- | absent / `no_db_data` | present | Use filesystem; `resolution: 'follow_fs'` |
130
- | present | matches DB | `resolution: 'no_mismatch'` |
131
- | present | absent | Mismatch — DB declares framework that isn't installed yet |
132
- | present | different | Mismatch — DB and filesystem disagree on which framework |
133
-
134
- On any mismatch (rows 3-4 above), invoke `AskUserQuestion`:
135
-
136
- ```
137
- DB tech_stack records `{db_framework}` for app `{app_path}`, but filesystem detection found `{fs_framework}` (or no config).
138
-
139
- (a) Configure `{db_framework}` now — agent runs Step 3 setup for the DB-recorded framework
140
- (b) Update DB — note that `{fs_framework}` is the actual framework; resolution proceeds with filesystem
141
- (c) Skip e2e for this app this round — surface as a follow-up
142
- ```
143
-
144
- Persist the user choice to `tech_stack_reconciliation.resolution`. On (b), the agent does NOT mutate the DB (it has read-only `get_repos` access); instead it logs `improvements_noted` for the orchestrator to surface as a separate update. On (c), record the skipped app in `preflight` notes and proceed with the remaining apps if any.
145
-
146
- ### Step 3: Configure Framework (if missing)
147
-
148
- Check if framework is installed and configured. If not:
149
-
150
- **Playwright**: Install deps, install chromium, create config with port, create `tests/helpers.ts`.
151
- **Maestro**: Create `maestro/config.yaml`, shared login flow, module directories.
152
- **WebDriverIO**: Install deps, `cargo install tauri-driver`, create `wdio.conf.ts`.
153
- **XCUITest**: Check Expo plugin, prebuild if needed.
154
-
155
- Track in output: `framework_configured: true`.
156
-
157
- ### Step 4: Check Auth Setup
158
-
159
- If `has_auth` and framework needs auth:
160
- - **Playwright**: Check for global-setup + storage state. Create if missing.
161
- - **Maestro**: Check for `_shared/login.yaml`. Create if missing.
162
- - **WebDriverIO**: No auth setup needed (tests from desktop app).
163
-
164
- ### Step 5: Determine What to Test
165
-
166
- #### 5.1 — Page filter (round 2+, non-checkpoint mode)
167
-
168
- When `round_number >= 2` AND `whole_checkpoint_mode === false`:
169
-
170
- 1. Read `prior_round_files_changed[]` from input.
171
- 2. Build `unapproved_files = prior_round_files_changed.filter(f => f.user_approved === false).map(f => f.path)`.
172
- 3. For each page in `pages_affected[]`, derive its contributing source files:
173
- - For Next.js: the page route file (`app/<route>/page.tsx`) + any layout files in the route's parent chain + any imported components from the page tree
174
- - For Expo / React Native: the screen file + imported components
175
- - For Tauri: the route component + Rust handler files
176
- - Fallback (when traversal is hard): treat any file whose path begins with the page's directory as contributing
177
- 4. A page **survives** the filter when ANY contributing file is in `unapproved_files`.
178
- 5. A page **is skipped** when ALL contributing files are user-approved (i.e., disjoint from `unapproved_files`).
179
- 6. Record skipped pages in `round2_skip_set[]` with `reason: "All contributing files user-approved in prior rounds"`.
180
- 7. Replace `pages_affected` with the surviving subset for the rest of this step.
181
-
182
- When `round_number === 1` OR `whole_checkpoint_mode === true`, skip 5.1 entirely and use `pages_affected` verbatim.
183
-
184
- #### 5.2 — For each surviving page/screen
185
-
186
- 1. Read the page/screen source to understand structure
187
- 2. Identify testable elements (text, buttons, forms, navigation)
188
- 3. Check for existing specs covering this page — extend, don't duplicate
189
-
190
- ### Step 6: Write Specs
191
-
192
- **One spec file per page/flow.** Follow platform conventions from context doc.
193
-
194
- **Mandatory per spec:**
195
- - Take a screenshot at key states (after nav, after submit, after state change). Use `context/testing/e2e.md` "Screenshot patterns" per framework.
196
- - For Playwright: include at least one `await expect(page).toHaveScreenshot('{flow}-{state}.png')` per primary state for baseline-based regression.
197
- - Save PNGs to platform-standard locations: Playwright → `test-results/screenshots/` and snapshots beside spec; Maestro → `maestro/screenshots/`; WDIO → `e2e/screenshots/`.
198
-
199
- For each page:
200
- - Smoke test (loads, title correct, no console errors)
201
- - Primary user flow (main interaction)
202
- - Visual regression — `toHaveScreenshot` (Playwright) / `takeScreenshot` + saved artifact (Maestro/WDIO)
203
-
204
- For forms:
205
- - Fill + submit + verify success
206
- - Validation errors
207
-
208
- For CRUD:
209
- - Create + verify appears
210
- - Edit + verify updated
211
- - Delete + confirm + verify removed
212
-
213
- ### Step 6.5: Pre-flight (MANDATORY — blocks execution until satisfied)
214
-
215
- Before attempting to run any spec, verify every prerequisite below. **Never proceed with `tests_run: false`.** If any check fails, resolve it via `AskUserQuestion` in a loop — re-probe after the user confirms, keep asking until the check passes or the user explicitly aborts. An abort returns `status: 'failed'` with the blocking preflight field populated.
216
-
217
- Populate `preflight` in output as you go.
218
-
219
- **6.5.1 Environment variables** — Required env vars per framework:
220
-
221
- | Framework | Required (typical) |
222
- |-----------|-------------------|
223
- | Playwright | `E2E_TEST_EMAIL`, `E2E_TEST_PASSWORD` (if `has_auth`), Supabase URL/anon key vars (if applicable) |
224
- | Maestro | `TEST_EMAIL`, `TEST_PASSWORD` (from `maestro/config.yaml`), `APP_ID` |
225
- | WebDriverIO | None typically (desktop app reads its own env) |
226
- | XCUITest | `TEST_EMAIL`, `TEST_PASSWORD` via scheme env |
227
-
228
- Naming convention: Playwright uses `E2E_TEST_*` (avoids collision with non-E2E `TEST_*` env vars). Maestro/XCUITest stay on `TEST_*` per `rules/maestro-auth-state-reset.md`.
229
-
230
- Check `apps/{app}/.env.local` and process env. For any missing, `AskUserQuestion`:
231
- > "Missing required E2E env vars: `{names}`. Set them in `apps/{app}/.env.local` now, then reply 'ready'. (Or reply 'skip' to abort this e2e run.)"
232
-
233
- **Important**: Preflight is the SINGLE gate for env presence. Specs MUST NOT contain in-spec env skip gates of the form `test.skip(!process.env.X, ...)` — those bypass preflight and produce zero-assertion runs that downstream agents cannot distinguish from real coverage. See `rules/spec-skip-vs-execute.md`.
234
-
235
- **6.5.2 Runtime readiness:**
236
-
237
- | Framework | Probe | On failure |
238
- |-----------|-------|------------|
239
- | Playwright | `curl -s -o /dev/null -w "%{http_code}" http://localhost:{port}/` — expect 200/3xx | AskUserQuestion: "Dev server is not responding on port `{port}`. Please run `cd apps/{app} && pnpm dev` in a separate terminal, then reply 'ready' when the page loads in your browser." |
240
- | Maestro (iOS) | `xcrun simctl list devices booted \| grep -q Booted` | AskUserQuestion: "No iOS Simulator is booted. Open Simulator.app or run `xcrun simctl boot 'iPhone 15'` (or your preferred device). Reply 'ready' when the simulator home screen is visible." |
241
- | Maestro (Android) | `adb devices \| grep -w device` | AskUserQuestion: "No Android device/emulator is connected. Start an emulator from Android Studio or run `emulator -avd {name}`. Reply 'ready' when unlocked." |
242
- | WebDriverIO | Binary at `src-tauri/target/{profile}/{app}` or `src-tauri/target/release/bundle/` exists | AskUserQuestion: "Tauri binary not found. Please run `cd src-tauri && cargo build` (or `cargo build --release`). Reply 'ready' when the build finishes." |
243
- | XCUITest | `xcodebuild -list` returns scheme, Expo prebuild artifacts present | AskUserQuestion: "iOS prebuild missing. Run `pnpm expo prebuild --platform ios --clean`. Reply 'ready' when done." |
244
-
245
- **Also verify port alignment** for Playwright: parse `playwright.config.ts` `baseURL`, compare to `.codebyplan/server.json` `port_allocations[]` for this app. On mismatch, AskUserQuestion asking which port is correct, then propose an Edit to align them (user-approved).
246
-
247
- **6.5.3 Auth probe** (only if `has_auth`):
248
-
249
- Run the dedicated auth probe — do **not** run the full suite yet.
250
-
251
- | Framework | Probe path | Command |
252
- |-----------|-----------|---------|
253
- | Playwright | `tests/_probe/auth.spec.ts` (create if missing — see `context/testing/e2e.md` "Auth probe pattern") | `pnpm exec playwright test tests/_probe/auth.spec.ts --reporter=line` |
254
- | Maestro | `maestro/flows/_probe/auth.yaml` (create if missing) | `maestro test maestro/flows/_probe/auth.yaml` |
255
- | WebDriverIO | `e2e/_probe/auth.spec.ts` | `pnpm exec wdio run wdio.conf.ts --spec e2e/_probe/auth.spec.ts` |
256
- | XCUITest | Auth-only target / filter | `xcodebuild test -only-testing:{Target}/AuthProbe ...` |
257
-
258
- If the probe fails, classify the reason (see Step 7.5) and **ask the user**:
259
- > "Auth probe failed: `{category}` — `{error_summary}`. Common causes: wrong `E2E_TEST_EMAIL`/`E2E_TEST_PASSWORD` (Playwright) or `TEST_EMAIL`/`TEST_PASSWORD` (Maestro/XCUITest), expired magic link, auth backend paused, captcha enabled in prod, storage state stale.
260
- >
261
- > Options: (1) I'll delete `tests/.auth/` and retry (Playwright storage-state reset), (2) You'll fix credentials and reply 'ready', (3) Abort e2e."
262
-
263
- On "ready", re-run the probe. Loop up to 3 times before escalating with a new AskUserQuestion that summarizes all 3 attempts' errors.
264
-
265
- ### Step 7: Run Tests
266
-
267
- Only reachable when `preflight` is fully green. Run the full suite:
268
-
269
- ```bash
270
- # Playwright
271
- pnpm exec playwright test {spec} --project=desktop-chromium --reporter=list
272
-
273
- # Maestro
274
- maestro test maestro/flows/{module}/{flow}.yaml --format=junit --output maestro/results.xml
275
-
276
- # WebDriverIO
277
- pnpm exec wdio run wdio.conf.ts --spec {spec}
278
- ```
279
-
280
- Capture full stdout/stderr and exit code.
281
-
282
- ### Step 7.5: Classify Failures
283
-
284
- For each failed test, classify into exactly one category:
285
-
286
- | Category | Signals | Resolution |
287
- |---|---|---|
288
- | `env` | `process.env.X is undefined`, `ECONNREFUSED`, missing config | Loop back to Step 6.5.1 AskUserQuestion |
289
- | `auth` | Login-page redirect after credential submit, 401 on authenticated request, `invalid_grant`, `email_not_confirmed` | AskUserQuestion as in Step 6.5.3 |
290
- | `access` | 403, 404 on a route the user should have access to, RLS policy denial text, missing seed data | AskUserQuestion: "Test failed with access error: `{error}`. Seed data or RLS policy may be missing. Options: (1) reply with steps you took to fix, (2) abort." |
291
- | `flake` | Timeout on first run, passes on immediate retry, network jitter | Retry up to 3 times before reclassifying to `real` |
292
- | `visual_regression` | `toHaveScreenshot` pixel-diff exceeded threshold | Do NOT retry. Include baseline + actual paths in `screenshots[]` with `baseline_diff_pct`. Do NOT auto-accept baseline — leave for `frontend-ui` (`/cbp-round-execute` Step 5b under `phase: 'screenshot_review'`); baseline regressions surface at `/cbp-round-end` Step 7 as a blocking gate. |
293
- | `real` | Assertion failure on app behavior (text missing, wrong state, navigation broken) | Attempt fix (see Step 8), then report to executor |
294
-
295
- Failures with `category` of `env`, `auth`, or `access` MUST NOT be counted as test failures in `test_results.failed` until pre-flight passes — they block the run instead.
296
-
297
- ### Step 8: Fix `real` Failures
298
-
299
- If tests fail with `category: 'real'`:
300
- 1. Read error output
301
- 2. Fix selector, timeout, or assertion issues
302
- 3. Add `data-testid` / `testID` / `accessibilityIdentifier` to source components if targeting is ambiguous (these are the ONLY source changes allowed)
303
- 4. Re-run until green or max 3 attempts
304
- 5. If still failing after 3 attempts, report in `test_results.failures[]` with `category: 'real'` — the executor/testing-qa will route this to a fix round.
305
-
306
- ### Step 9: Collect Screenshots
307
-
308
- Enumerate every PNG generated this run:
309
- - Playwright: `test-results/**/*.png`, and the actual/baseline/diff triples under `{spec}.spec.ts-snapshots/`
310
- - Maestro: `maestro/screenshots/*.png` and the `takeScreenshot` outputs referenced by each flow
311
- - WDIO: `e2e/screenshots/*.png`
312
-
313
- For each, populate `screenshots[]` with `{test_name, path, page_or_screen, viewport, is_new, baseline_diff_pct}`. These flow downstream to the `frontend-ui` skill, invoked by `/cbp-round-execute` Step 5b with `phase: 'screenshot_review'` after this agent completes — NOT inline by `round-executor` Step 3.8 (which runs with `phase: 'style_only'` and never receives e2e output).
314
-
315
- ### Step 10: Return Output
316
-
317
- Populate all output contract fields. Include test file paths in `tests_written`, all pre-flight results in `preflight`, all PNG paths in `screenshots`, and every AskUserQuestion interaction in `user_interactions`.
318
-
319
- **Completion rule**: `status: 'completed'` is allowed only when `tests_run == true` AND `preflight.*.ok == true` for every required prerequisite AND every failure has `category != 'env' | 'auth' | 'access'`. Otherwise return `status: 'failed'`.
320
-
321
- ## Completion Criteria
322
-
323
- - [ ] Platform detected; framework configured if missing
324
- - [ ] `preflight` fully populated — every required prerequisite `ok: true` before any spec runs
325
- - [ ] Auth probe passed (when `has_auth`)
326
- - [ ] Specs written or extended for every `pages_affected` entry
327
- - [ ] Screenshots captured at every key state; paths populated in `screenshots[]`
328
- - [ ] All failures classified; no `env`/`auth`/`access` failures counted toward `test_results.failed`
329
- - [ ] `status: 'completed'` returned only when `tests_run == true` AND preflight green AND no unresolved `env`/`auth`/`access` failures
330
-
331
- ## Failure Modes
332
-
333
- | Condition | Status | What to populate |
334
- |---|---|---|
335
- | User aborts preflight (refuses to start server/simulator/set env) | `failed` | `preflight.{field}.ok = false`, `preflight.{field}.notes` with the prompt text and user response; ALSO add `critical_issues[]` entry `{type: 'preflight_aborted', spec_path: null, reason: '<prerequisite> aborted by user'}` |
336
- | All-skipped run: `passed === 0 && skipped > 0` for any spec touching `files_changed` (in-spec env skip gate or similar) | `failed` | Add `critical_issues[]` entry `{type: 'e2e_all_skipped', spec_path: '<path>', reason: 'All assertions skipped — likely in-spec env gate. See rules/spec-skip-vs-execute.md.'}`; do NOT classify the run as `pass` or `warning` |
337
- | Auth probe fails 3× after AskUserQuestion loop | `failed` | `preflight.auth_probe.ok = false`, last error in `preflight.auth_probe.error`, full interaction log in `user_interactions[]` |
338
- | Framework cannot be configured (missing deps the agent can't install) | `failed` | `framework_configured: false`, concrete reason in `issues_encountered` via return to executor |
339
- | `real` test failures persist after 3 fix attempts | `completed` | Include failures in `test_results.failures[]` with `category: 'real'` — executor routes to fix round |
340
- | `visual_regression` detected | `completed` | Include in `test_results.failures[]` with `category: 'visual_regression'`, `baseline_diff_pct` populated, paths in `screenshots[]`. Never retry, never auto-update baselines. |
341
- | No applicable pages/screens to test | `completed` | `tests_written: []`, `tests_run: false` is NOT allowed here — if nothing to test, do not claim completion; return `failed` with reason "no testable targets despite has_ui_work" so the executor re-evaluates its gating |
342
-
343
- ## Key Rules
344
-
345
- - **Outside-in testing** — test what users see, not implementation details
346
- - **No source code changes** except adding test targeting attributes (testID, data-testid)
347
- - **Configure before writing** — never skip because framework isn't set up
348
- - **Extend existing specs** — don't create duplicate coverage
349
- - **One spec per page/flow** — keep files focused
350
- - **Never silently skip** — missing simulator/server/binary/env/auth is always an AskUserQuestion, never a soft `tests_run: false`
351
- - **Classify every failure** — `env`/`auth`/`access` are user-actionable preflight errors, not test results
352
- - **Always capture screenshots** — the downstream `frontend-ui` skill (`/cbp-round-execute` Step 5b under `phase: 'screenshot_review'`) consumes them for visual review; no screenshots = no visual review
353
- - **Baselines are not auto-accepted** — a `toHaveScreenshot` diff is a `visual_regression` failure, not a silent update. The user decides via QA whether to update the baseline.
354
-
355
- ## Integration
356
-
357
- - **Spawned by**: `/cbp-round-execute` Step 5 (parallel sibling of `testing-qa-agent`); also invoked by `/cbp-checkpoint-check` (TASK-2 deliverable) with `whole_checkpoint_mode: true`
358
- - **Parallel sibling**: `cbp-testing-qa-agent` (owns build/lint/types/unit/audit). **Fully independent — no cross-read.** This agent's screenshots are consumed by `/cbp-round-execute` Step 5b (`frontend-ui` skill, `phase: 'screenshot_review'`) which writes `round.context.frontend_ui_review.findings`; baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (baselines never auto-accepted).
359
- - **Returns to**: `/cbp-round-execute` which persists output to `round.context.e2e_output`. Step 5b then invokes the `frontend-ui` skill with `phase: 'screenshot_review'` and the screenshots; Step 6 considers `e2e_output.test_results.failed > 0` and `status === 'failed'` as hard-fail signals.
360
- - **Reads**: `.claude/context/testing/e2e.md`, page/screen source files, existing specs, `.env.local`, `.codebyplan/server.json` `port_allocations`, MCP `get_repos` (for `tech_stack` reconciliation at Step 1.5)
361
- - **May modify source**: Only to add testID/data-testid attributes
362
- - **May create probe specs**: `tests/_probe/auth.spec.ts` / `maestro/flows/_probe/auth.yaml` / `e2e/_probe/auth.spec.ts` when missing
363
- - **Asks user when blocked**: uses `AskUserQuestion` for runtime prerequisites and tech_stack reconciliation — never returns `status: 'blocked'`