codebyplan 1.11.1 → 1.12.0

Files changed (56)
  1. package/dist/cli.js +602 -345
  2. package/package.json +1 -1
  3. package/templates/README.md +1 -1
  4. package/templates/agents/cbp-cc-executor.md +1 -1
  5. package/templates/agents/cbp-e2e-maestro.md +202 -0
  6. package/templates/agents/cbp-e2e-playwright.md +229 -0
  7. package/templates/agents/cbp-e2e-tauri.md +184 -0
  8. package/templates/agents/cbp-e2e-vscode.md +203 -0
  9. package/templates/agents/cbp-e2e-xcuitest.md +224 -0
  10. package/templates/agents/cbp-improve-claude.md +1 -1
  11. package/templates/agents/cbp-round-executor.md +11 -11
  12. package/templates/agents/cbp-task-check.md +1 -1
  13. package/templates/agents/cbp-task-planner.md +2 -0
  14. package/templates/agents/cbp-testing-qa-agent.md +9 -9
  15. package/templates/context/testing/e2e.md +303 -0
  16. package/templates/hooks/cbp-statusline.mjs +44 -0
  17. package/templates/hooks/cbp-statusline.py +24 -2
  18. package/templates/hooks/cbp-statusline.sh +22 -2
  19. package/templates/hooks/validate-structure-lengths.sh +2 -0
  20. package/templates/hooks/validate-structure-smoke.sh +2 -1
  21. package/templates/hooks/validate-structure-templates.sh +1 -0
  22. package/templates/rules/README.md +8 -1
  23. package/templates/rules/context-file-loading.md +4 -1
  24. package/templates/rules/e2e-mandatory.md +70 -0
  25. package/templates/rules/supabase-branch-lifecycle.md +99 -0
  26. package/templates/settings.project.base.json +1 -2
  27. package/templates/skills/cbp-build-cc-agent/SKILL.md +16 -14
  28. package/templates/skills/cbp-build-cc-agent/reference/cbp-quality.md +4 -4
  29. package/templates/skills/cbp-build-cc-agent/scripts/validate-agent.sh +8 -6
  30. package/templates/skills/cbp-build-cc-mode/SKILL.md +4 -4
  31. package/templates/skills/cbp-build-cc-settings/reference/cbp-conventions.md +1 -2
  32. package/templates/skills/cbp-checkpoint-check/SKILL.md +12 -8
  33. package/templates/skills/cbp-checkpoint-create/SKILL.md +2 -0
  34. package/templates/skills/cbp-checkpoint-end/SKILL.md +27 -5
  35. package/templates/skills/cbp-checkpoint-plan/SKILL.md +2 -2
  36. package/templates/skills/cbp-checkpoint-plan/reference/e2e-discovery-probe.md +5 -5
  37. package/templates/skills/cbp-e2e-setup/SKILL.md +254 -0
  38. package/templates/skills/cbp-e2e-setup/reference/maestro.md +200 -0
  39. package/templates/skills/cbp-e2e-setup/reference/playwright.md +212 -0
  40. package/templates/skills/cbp-e2e-setup/reference/tauri.md +147 -0
  41. package/templates/skills/cbp-e2e-setup/reference/vscode.md +154 -0
  42. package/templates/skills/cbp-e2e-setup/reference/xcuitest.md +185 -0
  43. package/templates/skills/cbp-frontend-ui/SKILL.md +6 -6
  44. package/templates/skills/cbp-frontend-ux/SKILL.md +1 -1
  45. package/templates/skills/cbp-git-worktree-remove/SKILL.md +17 -1
  46. package/templates/skills/cbp-round-execute/SKILL.md +30 -17
  47. package/templates/skills/cbp-session-start/SKILL.md +27 -2
  48. package/templates/skills/cbp-ship-main/SKILL.md +13 -0
  49. package/templates/skills/cbp-supabase-branch-check/SKILL.md +12 -5
  50. package/templates/skills/cbp-supabase-migrate/SKILL.md +139 -9
  51. package/templates/skills/cbp-supabase-migrate/reference/preflight-dry-run.md +1 -1
  52. package/templates/skills/cbp-supabase-setup/SKILL.md +13 -7
  53. package/templates/skills/cbp-supabase-setup/reference/branching-setup.md +2 -2
  54. package/templates/skills/cbp-task-check/SKILL.md +2 -2
  55. package/templates/skills/cbp-task-start/SKILL.md +2 -0
  56. package/templates/agents/cbp-test-e2e-agent.md +0 -363
@@ -0,0 +1,203 @@
1
+ ---
2
+ name: cbp-e2e-vscode
3
+ description: VS Code extension E2E test authoring + execution using @vscode/test-cli and @vscode/test-electron. Spawned by /cbp-round-execute Step 5 and /cbp-checkpoint-check Step 5b when framework is 'vscode-test'.
4
+ tools: Read, Write, Edit, Glob, Grep, Bash, AskUserQuestion, mcp__codebyplan__get_repos
5
+ model: sonnet
6
+ effort: xhigh
7
+ scope: org-shared
8
+ ---
9
+
10
+ # VS Code Extension E2E Agent
11
+
12
+ Read `context/testing/e2e.md` for the shared contract (Input/Output, Step 6.5 preflight,
13
+ Step 7.5 failure classification, screenshot collection, completion rule, never-silently-skip).
14
+
15
+ Framework: `@vscode/test-cli` + `@vscode/test-electron` for VS Code extensions.
16
+ Dispatched when `.codebyplan/e2e.json` records `framework: "vscode-test"`.
17
+
18
+ ## Prerequisites
19
+
20
+ - VS Code installed (used as the test host)
21
+ - On Linux CI: Xvfb for a display server (extensions require a GUI)
22
+
23
+ ## Install
24
+
25
+ ```bash
26
+ pnpm add -D @vscode/test-cli @vscode/test-electron
27
+ pnpm exec vscode-test --version # verify
28
+ ```
29
+
30
+ ## .vscode-test.mjs
31
+
32
+ Create at the extension package root (e.g. `apps/vscode/`):
33
+
34
+ ```js
35
+ import { defineConfig } from "@vscode/test-cli";
36
+
37
+ export default defineConfig({
38
+ files: "e2e/**/*.test.js", // compiled JS output path
39
+ extensionDevelopmentPath: ".", // path to the extension package root
40
+ workspaceFolder: "test-fixtures/workspace", // optional fixture workspace
41
+ mocha: {
42
+ timeout: 20_000,
43
+ ui: "bdd",
44
+ },
45
+ });
46
+ ```
47
+
48
+ pnpm scripts:
49
+
50
+ ```json
51
+ {
52
+ "scripts": {
53
+ "test:e2e": "tsc -p tsconfig.test.json && vscode-test",
54
+ "test:e2e:watch": "vscode-test --watch",
55
+ "test:compile": "tsc -p tsconfig.test.json"
56
+ }
57
+ }
58
+ ```
59
+
60
+ ## Extension Host Lifecycle
61
+
62
+ `@vscode/test-electron` downloads an isolated VS Code instance, installs the extension,
63
+ opens the workspace, and runs the Mocha suite inside the extension host process. Tests
64
+ import from `vscode` — the module is available because they run inside VS Code:
65
+
66
+ ```ts
67
+ import * as vscode from "vscode";
68
+ import * as assert from "assert";
69
+
70
+ suite("Extension", () => {
71
+ test("extension activates", async () => {
72
+ const ext = vscode.extensions.getExtension("yourpublisher.yourextension");
73
+ assert.ok(ext, "extension not found");
74
+ await ext.activate();
75
+ assert.ok(ext.isActive);
76
+ });
77
+
78
+ test("command is registered", async () => {
79
+ const commands = await vscode.commands.getCommands();
80
+ assert.ok(commands.includes("yourextension.yourCommand"), "command not registered");
81
+ });
82
+ });
83
+ ```
84
+
85
+ ## Directory Structure
86
+
87
+ ```
88
+ apps/vscode/
89
+ .vscode-test.mjs
90
+ e2e/
91
+ _probe/
92
+ activation.test.ts
93
+ commands/
94
+ my-command.test.ts
95
+ test-fixtures/
96
+ workspace/ # committed fixture files opened in tests
97
+ ```
98
+
99
+ ## Activation Probe
100
+
101
+ `apps/vscode/e2e/_probe/activation.test.ts`:
102
+
103
+ ```ts
104
+ import * as vscode from "vscode";
105
+ import * as assert from "assert";
106
+
107
+ suite("Activation probe", () => {
108
+ test("extension activates without error", async () => {
109
+ const ext = vscode.extensions.getExtension("yourpublisher.yourextension");
110
+ assert.ok(ext, "Extension not installed in test host");
111
+ if (!ext.isActive) {
112
+ await ext.activate();
113
+ }
114
+ assert.ok(ext.isActive, "Extension did not activate");
115
+ });
116
+ });
117
+ ```
118
+
119
+ ## Pre-flight Probe (Step 6.5.2)
120
+
121
+ **Compiled output**: verify `e2e/**/*.test.js` files exist (TS must be compiled first).
122
+
123
+ ```bash
124
+ find apps/vscode/e2e -name '*.test.js' 2>/dev/null | head -1
125
+ ```
126
+
127
+ On missing output:
128
+
129
+ > "VS Code extension tests need to be compiled first. Please run
130
+ > `pnpm --filter @codebyplan/vscode test:compile`. Reply 'ready' when complete."
131
+
132
+ No network auth probe — extension tests run inside VS Code host with no remote auth.
133
+
134
+ ## Spec-Writing Patterns
135
+
136
+ Write tests using the full `vscode` API:
137
+
138
+ ```ts
139
+ import * as vscode from "vscode";
140
+ import * as assert from "assert";
141
+
142
+ suite("My Command", () => {
143
+ test("executes and returns expected result", async () => {
144
+ const result = await vscode.commands.executeCommand(
145
+ "yourextension.myCommand",
146
+ "testArg"
147
+ );
148
+ assert.strictEqual(result, "expectedValue");
149
+ });
150
+
151
+ test("reads workspace configuration", () => {
152
+ const config = vscode.workspace.getConfiguration("yourextension");
153
+ const value = config.get<string>("someKey");
154
+ assert.ok(value !== undefined, "configuration key missing");
155
+ });
156
+ });
157
+ ```
158
+
159
+ For diagnostic captures, use `vscode.window.showInformationMessage` output or write
160
+ snapshots to `test-fixtures/`.
161
+
162
+ ## Screenshot Capture
163
+
164
+ VS Code extension tests do not have browser-style screenshot capture. For visual review,
165
+ write fixture output files to `test-fixtures/` and reference them in `screenshots[]`
166
+ with `viewport: 'device'`. `baseline_diff_pct: null` for all entries.
167
+
168
+ Enumerate screenshots: `apps/vscode/test-fixtures/**/*.png`.
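As a rough illustration of the enumeration rule above, the sketch below turns a list of repo-relative paths into `screenshots[]` entries. It is a sketch under assumptions, not the agent's actual implementation: `toScreenshotEntries` and the `path` field are hypothetical names, and only `viewport: 'device'` and `baseline_diff_pct: null` come from the text.

```typescript
// Hypothetical entry shape: `path` is an assumed field name; `viewport` and
// `baseline_diff_pct` follow the values stated in the section above.
interface ScreenshotEntry {
  path: string;
  viewport: "device";
  baseline_diff_pct: null;
}

// Keep only PNG files under the fixture directory and map them to entries.
// Input paths are assumed to be repo-relative and to use forward slashes.
function toScreenshotEntries(paths: string[]): ScreenshotEntry[] {
  const root = "apps/vscode/test-fixtures/";
  return paths
    .filter((p) => p.startsWith(root) && p.endsWith(".png"))
    .sort()
    .map((p): ScreenshotEntry => ({ path: p, viewport: "device", baseline_diff_pct: null }));
}
```

Taking the path list as input keeps the sketch independent of how the files are discovered (a directory walk or a glob library would both work).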
169
+
170
+ ## Run Command
171
+
172
+ ```bash
173
+ pnpm --filter @codebyplan/vscode test:e2e
174
+ ```
175
+
176
+ ## CI (GitHub Actions)
177
+
178
+ Linux requires Xvfb:
179
+
180
+ ```yaml
181
+ - name: Install dependencies
182
+ run: pnpm install
183
+
184
+ - name: Compile extension tests
185
+ run: pnpm --filter @codebyplan/vscode test:compile
186
+
187
+ - name: Run VS Code extension tests
188
+ run: xvfb-run -a pnpm --filter @codebyplan/vscode test:e2e
189
+ env:
190
+ DISPLAY: ':99.0'
191
+ ```
192
+
193
+ On macOS/Windows, Xvfb is not needed — `vscode-test` uses the native display.
194
+
195
+ ## Pitfalls
196
+
197
+ **Wrong extensionDevelopmentPath** — if `.vscode-test.mjs` doesn't point to the package
198
+ root (where `package.json` has the `contributes` block), VS Code won't find the extension
199
+ and activation tests fail silently. **TypeScript source vs compiled output** — `@vscode/test-cli`
200
+ runs compiled JS; always compile before running in CI. **Extension host isolation** — each
201
+ run uses an isolated VS Code download (cached under `.vscode-test/` by default); do not reuse the system installation.
202
+ **`vscode` module availability** — tests must run inside the extension host; the same import
203
+ fails in plain Node.js.
@@ -0,0 +1,224 @@
1
+ ---
2
+ name: cbp-e2e-xcuitest
3
+ description: XCUITest native iOS E2E test authoring + execution for Expo apps targeting system dialogs, HealthKit, watchOS, or other areas Maestro cannot reach. Spawned by /cbp-round-execute Step 5 and /cbp-checkpoint-check Step 5b when framework is 'xcuitest'.
4
+ tools: Read, Write, Edit, Glob, Grep, Bash, AskUserQuestion, mcp__codebyplan__get_repos
5
+ model: sonnet
6
+ effort: xhigh
7
+ scope: org-shared
8
+ ---
9
+
10
+ # XCUITest E2E Agent
11
+
12
+ Read `context/testing/e2e.md` for the shared contract (Input/Output, Step 6.5 preflight,
13
+ Step 7.5 failure classification, screenshot collection, completion rule, never-silently-skip).
14
+
15
+ Framework: XCUITest via the Expo `withXCUITests` plugin. Dispatched when
16
+ `.codebyplan/e2e.json` records `framework: "xcuitest"`.
17
+
18
+ **Use XCUITest when Maestro cannot reach the target UI**: Apple Watch companion, HealthKit
19
+ permission dialogs, system sheets (share, notification permissions), Face ID / Touch ID
20
+ prompts, camera / microphone dialogs. For standard UI flows, prefer Maestro.
21
+
22
+ ## Prerequisites
23
+
24
+ - macOS with Xcode 15+
25
+ - Active Apple Developer account (free tier sufficient for Simulator testing)
26
+ - Expo managed workflow with prebuild enabled
27
+ - `xcbeautify`: `brew install xcbeautify`
28
+
29
+ ## Setup — Expo withXCUITests Plugin
30
+
31
+ ```bash
32
+ pnpm add -D expo-xcuitest
33
+ ```
34
+
35
+ `app.config.ts`:
36
+
37
+ ```ts
38
+ plugins: [
39
+ ["expo-xcuitest", { testTargetName: "AppUITests" }]
40
+ ]
41
+ ```
42
+
43
+ After updating `app.config.ts`, regenerate the native project:
44
+
45
+ ```bash
46
+ expo prebuild --platform ios --clean
47
+ ```
48
+
49
+ `--clean` ensures a fresh native project. Commit the generated `ios/` directory so CI
50
+ can build without running prebuild.
51
+
52
+ ## Swift Test Class
53
+
54
+ `ios/AppUITests/AppUITests.swift`:
55
+
56
+ ```swift
57
+ import XCTest
58
+
59
+ class AppUITests: XCTestCase {
60
+
61
+ var app: XCUIApplication!
62
+
63
+ override func setUpWithError() throws {
64
+ continueAfterFailure = false
65
+ app = XCUIApplication()
66
+
67
+ app.launchEnvironment["TEST_EMAIL"] = ProcessInfo.processInfo.environment["TEST_EMAIL"] ?? ""
68
+ app.launchEnvironment["TEST_PASSWORD"] = ProcessInfo.processInfo.environment["TEST_PASSWORD"] ?? ""
69
+
70
+ app.launch()
71
+ }
72
+
73
+ func testLoginFlow() throws {
74
+ let emailField = app.textFields["email-input"]
75
+ XCTAssertTrue(emailField.waitForExistence(timeout: 10))
76
+
77
+ emailField.tap()
78
+ emailField.typeText(app.launchEnvironment["TEST_EMAIL"]!)
79
+
80
+ let passwordField = app.secureTextFields["password-input"]
81
+ passwordField.tap()
82
+ passwordField.typeText(app.launchEnvironment["TEST_PASSWORD"]!)
83
+
84
+ app.buttons["sign-in-button"].tap()
85
+
86
+ let dashboard = app.staticTexts["Dashboard"]
87
+ XCTAssertTrue(dashboard.waitForExistence(timeout: 15))
88
+ }
89
+ }
90
+ ```
91
+
92
+ ## accessibilityIdentifier Targeting
93
+
94
+ React Native maps `testID` to `accessibilityIdentifier` on iOS:
95
+
96
+ ```tsx
97
+ <TextInput
98
+ testID="email-input" // becomes accessibilityIdentifier on iOS
99
+ accessibilityLabel="Email"
100
+ />
101
+ ```
102
+
103
+ XCUITest queries by identifier:
104
+
105
+ ```swift
106
+ app.textFields["email-input"] // TextInput
107
+ app.buttons["sign-in-button"] // TouchableOpacity / Pressable
108
+ app.staticTexts["Dashboard"] // Text component
109
+ ```
110
+
111
+ ## Pre-flight Probe (Step 6.5.2)
112
+
113
+ **Scheme**: `xcodebuild -list` returns the target scheme; prebuild artifacts present.
114
+
115
+ ```bash
116
+ xcodebuild -list -workspace ios/YourApp.xcworkspace 2>&1 | grep "Schemes" -A 5
117
+ ```
118
+
119
+ On missing prebuild:
120
+
121
+ > "iOS prebuild missing. Run `pnpm expo prebuild --platform ios --clean`. Reply 'ready'
122
+ > when done."
123
+
124
+ **Env vars**: `TEST_EMAIL`, `TEST_PASSWORD` via Xcode scheme environment variables.
125
+
126
+ In Xcode: Product → Scheme → Edit Scheme → Run → Arguments → Environment Variables.
127
+
128
+ ## Auth Probe (when has_auth)
129
+
130
+ Run only the login test method against the UITest target:
131
+
132
+ ```bash
133
+ xcodebuild test \
134
+ -workspace ios/YourApp.xcworkspace \
135
+ -scheme YourApp \
136
+ -destination 'platform=iOS Simulator,name=iPhone 16,OS=latest' \
137
+ -only-testing:AppUITests/AppUITests/testLoginFlow \
138
+ TEST_EMAIL="$TEST_EMAIL" TEST_PASSWORD="$TEST_PASSWORD" \
139
+ | xcbeautify
140
+ ```
141
+
142
+ ## Spec-Writing Patterns
143
+
144
+ Use `waitForExistence(timeout:)` on every element — React Native renders asynchronously:
145
+
146
+ ```swift
147
+ func testHealthKitPermissionDialog() throws {
148
+ app.buttons["request-health-access"].tap()
149
+
150
+ // System dialog — only reachable via XCUITest
151
+ let allowButton = app.alerts.buttons["Allow Full Access"]
152
+ XCTAssertTrue(allowButton.waitForExistence(timeout: 10))
153
+ allowButton.tap()
154
+
155
+ let confirmation = app.staticTexts["Health data linked"]
156
+ XCTAssertTrue(confirmation.waitForExistence(timeout: 15))
157
+ }
158
+ ```
159
+
160
+ ## Screenshot Capture
161
+
162
+ XCUITest captures screenshots via:
163
+
164
+ ```swift
165
+ let screenshot = XCTAttachment(screenshot: XCUIScreen.main.screenshot())
166
+ screenshot.name = "after-health-permission"
167
+ screenshot.lifetime = .keepAlways
168
+ add(screenshot)
169
+ ```
170
+
171
+ Attachments are written to the test results bundle under `DerivedData`. Reference them
172
+ in `screenshots[]` with `viewport: 'device'` and `baseline_diff_pct: null`.
173
+
174
+ Enumerate: `~/Library/Developer/Xcode/DerivedData/**/Attachments/*.png` (CI: results
175
+ bundle path from `xcodebuild -resultBundlePath ./build/results.xcresult`).
176
+
177
+ ## Run Command
178
+
179
+ ```bash
180
+ xcodebuild test \
181
+ -workspace ios/YourApp.xcworkspace \
182
+ -scheme YourApp \
183
+ -destination 'platform=iOS Simulator,name=iPhone 16,OS=latest' \
184
+ TEST_EMAIL="$TEST_EMAIL" \
185
+ TEST_PASSWORD="$TEST_PASSWORD" \
186
+ | xcbeautify
187
+ ```
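When the command is launched from a script rather than a shell, building the arguments as an array avoids quoting problems with the `-destination` value. The following is a minimal sketch, assuming the caller passes the array to a process-spawning API. `XcuitestRun` and `xcodebuildTestArgs` are hypothetical names, only flags that already appear in this file are used, and the credential settings are deliberately left out.

```typescript
// Hypothetical options object; field names are illustrative only.
interface XcuitestRun {
  workspace: string;    // e.g. "ios/YourApp.xcworkspace"
  scheme: string;       // e.g. "YourApp"
  destination: string;  // e.g. "platform=iOS Simulator,name=iPhone 16,OS=latest"
  onlyTesting?: string; // e.g. "AppUITests/AppUITests/testLoginFlow"
}

// Build the argv for `xcodebuild` using the flags shown in this file.
// Each value is its own array element, so no shell quoting is needed.
function xcodebuildTestArgs(run: XcuitestRun): string[] {
  const args = [
    "test",
    "-workspace", run.workspace,
    "-scheme", run.scheme,
    "-destination", run.destination,
  ];
  if (run.onlyTesting) {
    args.push(`-only-testing:${run.onlyTesting}`);
  }
  return args;
}
```

Omitting `onlyTesting` yields the full run; setting it yields the single-method form used by the auth probe.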
188
+
189
+ ## pnpm Script
190
+
191
+ ```json
192
+ {
193
+ "scripts": {
194
+ "xcuitest": "xcodebuild test -workspace ios/YourApp.xcworkspace -scheme YourApp -destination 'platform=iOS Simulator,name=iPhone 16,OS=latest' | xcbeautify"
195
+ }
196
+ }
197
+ ```
198
+
199
+ ## CI (GitHub Actions)
200
+
201
+ ```yaml
202
+ - name: Pre-boot simulator
203
+ run: xcrun simctl boot "iPhone 16"
204
+
205
+ - name: Run XCUITest
206
+ run: |
207
+ xcodebuild test \
208
+ -workspace ios/YourApp.xcworkspace \
209
+ -scheme YourApp \
210
+ -destination 'platform=iOS Simulator,name=iPhone 16,OS=latest' \
211
+ TEST_EMAIL="${{ secrets.TEST_EMAIL }}" \
212
+ TEST_PASSWORD="${{ secrets.TEST_PASSWORD }}" \
213
+ | xcbeautify
214
+ ```
215
+
216
+ ## Pitfalls
217
+
218
+ **Simulator not booted** — pre-boot in CI setup step to avoid slow first run. **testID
219
+ drop-through** — ensure components render `testID` all the way through; some wrappers
220
+ drop it (verify with `accessibility.identifier` in the Xcode accessibility inspector).
221
+ **waitForExistence** — always use `waitForExistence(timeout:)`, never immediate
222
+ `XCTAssertTrue(element.exists)`. **Derived data cache** — stale data can cause failures
223
+ after schema changes; clear with `rm -rf ~/Library/Developer/Xcode/DerivedData` if
224
+ tests pass locally but fail after a native project change.
@@ -170,7 +170,7 @@ Before proposing any new file, read what already exists:
170
170
  2. Glob `.claude/skills/*/SKILL.md` — read names and frontmatter descriptions
171
171
  3. Glob `.claude/context/*.md` — read names and first heading
172
172
  4. Glob `.claude/docs/architecture/*.md` — read names and first heading
173
- 5. Glob `.claude/agents/*/AGENT.md` — read names and frontmatter descriptions
173
+ 5. Glob `.claude/agents/*.md` (and `.claude/agents/*/AGENT.md` for folder-form agents) — read names and frontmatter descriptions
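The two agent layouts matched by the updated step 5 can be told apart with a simple path check. The sketch below is illustrative only: `classifyAgentPath` and `AgentForm` are hypothetical names, and paths are assumed to be repo-relative with forward slashes.

```typescript
// Hypothetical result type: which layout matched, and the agent name.
type AgentForm = { form: "file" | "folder"; name: string };

// Classify a path as a file-form agent (.claude/agents/<name>.md) or a
// folder-form agent (.claude/agents/<name>/AGENT.md); anything else is null.
function classifyAgentPath(path: string): AgentForm | null {
  // File form: exactly one segment after agents/, ending in .md
  const file = /^\.claude\/agents\/([^/]+)\.md$/.exec(path);
  if (file) return { form: "file", name: file[1] ?? "" };
  // Folder form: one directory segment followed by a literal AGENT.md
  const folder = /^\.claude\/agents\/([^/]+)\/AGENT\.md$/.exec(path);
  if (folder) return { form: "folder", name: folder[1] ?? "" };
  return null;
}
```

Other markdown files inside an agent folder are ignored, which mirrors the narrower `*/AGENT.md` pattern.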
174
174
 
175
175
  **5b: Propose changes with update-first discipline (HARD RULE)**
176
176
 
@@ -69,14 +69,14 @@ output:
69
69
  specialist_needs: # What specialist agents are needed post-execution
70
70
  tests_written:
71
71
  unit_tests: string[] # Unit test files written inline (Step 3.6)
72
- e2e_tests: string[] # Always empty — e2e test files are written by cbp-test-e2e-agent (spawned by /cbp-round-execute Step 5, NOT by this executor)
72
+ e2e_tests: string[] # Always empty — e2e test files are written by the cbp-e2e-* specialist agents (dispatched per context/testing/e2e.md), spawned by /cbp-round-execute Step 5, NOT by this executor
73
73
  framework_configured: boolean # True if test/lint framework was set up
74
74
  review_needed:
75
75
  ui_review: boolean # Visual design review needed
76
76
  ux_review: boolean # UX flow review needed
77
77
  security_review: boolean # Security scan needed
78
- testing_profile: string # Read from task.context.testing_profile (and round.context.testing_profile_override if set); surfaced for /cbp-round-execute Step 5 per-wave cbp-testing-qa-agent + cbp-test-e2e-agent skip logic per rules/testing-profile.md
79
- # NOTE: e2e_output is populated by /cbp-round-execute Step 5 (NOT this agent) and lives at round.context.e2e_output. The executor's Step 3.8 cbp-frontend-ui invocation runs with phase: 'style_only' and never sees screenshots; the post-e2e screenshot review happens at Step 5b.
78
+ testing_profile: string # Read from task.context.testing_profile (and round.context.testing_profile_override if set); surfaced for /cbp-round-execute Step 5 per-wave cbp-testing-qa-agent + cbp-e2e-* specialist skip logic per rules/testing-profile.md
79
+ # NOTE: e2e output is populated by /cbp-round-execute Step 5 (NOT this agent) and lives at round.context.e2e_outputs (a framework-keyed map, one entry per eligible cbp-e2e-* specialist). The executor's Step 3.8 cbp-frontend-ui invocation runs with phase: 'style_only' and never sees screenshots; the post-e2e screenshot review happens at Step 5b.
80
80
  ```
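To make the framework-keyed `round.context.e2e_outputs` map more concrete, here is a small sketch of how an orchestrator might read it. The entry shape is an assumption: only the keying by framework comes from the note above, and the `hard_fail` field on each entry, along with the names `E2eOutput`, `E2eOutputs`, and `failingFrameworks`, is hypothetical.

```typescript
// Assumed per-framework entry; real entries likely carry more fields.
interface E2eOutput {
  hard_fail: boolean;
}

// One entry per eligible specialist, keyed by framework name
// (for example "vscode-test" or "xcuitest").
type E2eOutputs = Record<string, E2eOutput>;

// Return the frameworks whose entry reports a hard failure, sorted by name.
function failingFrameworks(outputs: E2eOutputs): string[] {
  return Object.entries(outputs)
    .filter(([, out]) => out.hard_fail)
    .map(([framework]) => framework)
    .sort();
}
```

A map rather than a single object lets each specialist write its own entry without overwriting another framework's result.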
81
81
 
82
82
  ## Tools Available
@@ -165,7 +165,7 @@ Before ANY Write/Edit invocation during execution, the target path MUST appear i
165
165
 
166
166
  **Exemptions** — paths that may be edited without an entry in `files_to_modify[]`:
167
167
 
168
- - Test files written by Step 3.6 (unit only — e2e is written by `cbp-test-e2e-agent` post-executor, not by this agent) when the plan flagged `tests_written` as a deliverable
168
+ - Test files written by Step 3.6 (unit only — e2e is written by the `cbp-e2e-*` specialist agents post-executor, not by this agent) when the plan flagged `tests_written` as a deliverable
169
169
  - Lockfiles regenerated by `pnpm install` after `package.json` edits already in scope
170
170
  - Generated TypeScript types (e.g. `apps/web/src/lib/database.types.ts`) when DB migrations are in scope
171
171
  - Auto-formatted prettier rewrites of files already in `files_to_modify[]`
@@ -181,7 +181,7 @@ Two categories of work are NOT performed by this agent and must be returned to t
181
181
  | Action | Why excluded | Where it goes |
182
182
  |--------|--------------|---------------|
183
183
  | MCP `create_task`, `update_task`, `complete_task`, `add_round`, etc. (any DB-side state mutation) | Executor frontmatter does NOT include MCP DB tools. Tool-not-available errors force orchestrator improvisation. | Surface as `improvements_noted` entry; orchestrator runs the MCP call after this agent returns. Executor never tries to invoke MCP DB tools. |
184
- | Spawning `cbp-test-e2e-agent` | Executor's tools list (Read/Write/Edit/Glob/Grep/Bash/TaskUpdate/AskUserQuestion/Skill) does NOT include the `Task` / Agent tool. E2E execution belongs to `/cbp-round-execute` Step 5 (parallel with `cbp-testing-qa-agent`) and is invoked by the orchestrator. | Set `specialist_needs.review_needed.ux_review` / `ui_review` if applicable. Do NOT attempt to spawn the agent from inside the executor. |
184
+ | Spawning `cbp-e2e-*` specialist agents | Executor's tools list (Read/Write/Edit/Glob/Grep/Bash/TaskUpdate/AskUserQuestion/Skill) does NOT include the `Task` / Agent tool. E2E execution is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), which the orchestrator spawns at `/cbp-round-execute` Step 5 (in parallel with `cbp-testing-qa-agent`). | Set `specialist_needs.review_needed.ux_review` / `ui_review` if applicable. Do NOT attempt to spawn any e2e agent from inside the executor. |
185
185
 
186
186
  If the plan implies either action, complete the rest of the work and surface the carved-out steps in `improvements_noted[]` for the orchestrator to handle.
187
187
 
@@ -358,7 +358,7 @@ When the approved plan includes specialized work, delegate to sub-executor agent
358
358
 
359
359
  After implementing features in Step 3, write unit tests for all new/modified code. Tests are deliverables — they ship with the code in the same round.
360
360
 
361
- **Reference**: Read `.claude/context/testing/unit.md` (when present) for platform-specific patterns and setup instructions.
361
+ **Reference**: Read `.claude/context/testing/unit.md` (when present) for platform-specific patterns and setup instructions. E2E test authoring is owned by the `cbp-e2e-*` specialist agents — do NOT write e2e specs here.
362
362
 
363
363
  **Platform detection** from `test_strategy` in approved plan (set by `cbp-task-planner` Phase 2.9):
364
364
 
@@ -383,7 +383,7 @@ After implementing features in Step 3, write unit tests for all new/modified cod
383
383
 
384
384
  ### Step 3.7: REMOVED — E2E execution moved to /cbp-round-execute Step 5
385
385
 
386
- E2E test authoring + execution is owned by `cbp-test-e2e-agent`, spawned in parallel with `cbp-testing-qa-agent` by `/cbp-round-execute` Step 5. The executor does NOT spawn it (Step 0.2 carve-out). When the plan declares e2e work is needed, the executor's only obligation is to set `specialist_needs.review_needed.ui_review` / `ux_review` if applicable; the orchestrator handles the rest.
386
+ E2E test authoring + execution is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), spawned in parallel with `cbp-testing-qa-agent` by `/cbp-round-execute` Step 5. The executor does NOT spawn them (Step 0.2 carve-out). When the plan declares e2e work is needed, the executor's only obligation is to set `specialist_needs.review_needed.ui_review` / `ux_review` if applicable; the orchestrator handles the rest.
387
387
 
388
388
  ### Step 3.65: Defensive React Checklist (after writing component code)
389
389
 
@@ -396,7 +396,7 @@ E2E test authoring + execution is owned by `cbp-test-e2e-agent`, spawned in para
396
396
 
397
397
  ### Step 3.8: Frontend Self-Review (UI + UX, style-only)
398
398
 
399
- After unit tests (Step 3.6) and the defensive React checklist (Step 3.65), run inline style-quality self-review on the round's UI work BEFORE Step 4 quality checks. This pass runs WITHOUT e2e screenshots — the screenshot-driven Phase 6.5 of `cbp-frontend-ui` runs separately at `/cbp-round-execute` Step 5b once `cbp-test-e2e-agent` has produced screenshots. Mirror counterpart of Step 2.7's pre-implementation `cbp-frontend-design` pass — design decided up-front, polish reviewed at the end of execution.
399
+ After unit tests (Step 3.6) and the defensive React checklist (Step 3.65), run inline style-quality self-review on the round's UI work BEFORE Step 4 quality checks. This pass runs WITHOUT e2e screenshots — the screenshot-driven Phase 6.5 of `cbp-frontend-ui` runs separately at `/cbp-round-execute` Step 5b once the `cbp-e2e-*` specialist agent has produced screenshots. Mirror counterpart of Step 2.7's pre-implementation `cbp-frontend-design` pass — design decided up-front, polish reviewed at the end of execution.
400
400
 
401
401
  **Trigger gate** — fire when `files_changed` contains ANY of:
402
402
 
@@ -461,7 +461,7 @@ Analyze the completed work and populate `specialist_needs`:
461
461
 
462
462
  **Tests written** (execution phase — completed in Step 3.6):
463
463
  - `unit_tests_written`: List unit test files written inline by executor (Step 3.6)
464
- - `e2e_tests_written`: Always empty here — E2E test authoring is owned by `cbp-test-e2e-agent`, spawned by `/cbp-round-execute` Step 5 (post-executor)
464
+ - `e2e_tests_written`: Always empty here — E2E test authoring is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), spawned by `/cbp-round-execute` Step 5 (post-executor)
465
465
  - `framework_configured`: true if a unit-test/lint framework was set up from scratch
466
466
 
467
467
  **Review needed** (validation phase — these review quality):
@@ -515,7 +515,7 @@ This gate makes the contract enforceable. Without it, Step 3.4 can be silently s
515
515
 
516
516
  #### Subagent Cost Recording
517
517
 
518
- When ANY background subagents were spawned during execution (general-purpose, cbp-database-agent, cbp-test-e2e-agent, etc.), populate `round.context.subagent_summaries[]` with one entry per agent:
518
+ When ANY background subagents were spawned during execution (general-purpose, cbp-database-agent, etc.), populate `round.context.subagent_summaries[]` with one entry per agent:
519
519
 
520
520
  ```yaml
521
521
  subagent_summaries:
@@ -583,7 +583,7 @@ Which would you prefer?
583
583
  - **Spawned by**: `/cbp-round-execute` Step 3 (single-wave 3-AGENT path or per-wave 3-WAVE path)
584
584
  - **Returns to**: `/cbp-round-execute` which collects output and runs per-wave `cbp-testing-qa-agent`
585
585
  - **Depends on**: `cbp-task-planner` agent (provides approved plan)
586
- - **May spawn**: `cbp-database-agent` as sub-executor for Supabase operations. (NOT `cbp-test-e2e-agent`; that is owned by `/cbp-round-execute` Step 5 per Step 0.2 carve-out.)
586
+ - **May spawn**: `cbp-database-agent` as sub-executor for Supabase operations. (NOT any `cbp-e2e-*` specialist: those are dispatched per `context/testing/e2e.md` and spawned by `/cbp-round-execute` Step 5, per the Step 0.2 carve-out.)
587
587
 
588
588
  ## Structure Knowledge
589
589
 
@@ -82,7 +82,7 @@ Review all QA items across all rounds:
82
82
  - **Auto items**: Verify all passed (build, lint, types, tests)
83
83
  - **Default items**: Verify all resolved (pass or skipped with reason)
84
84
 
85
- **E2E pass vs skipped distinction**: When reading `auto_qa.items[]` for `check: 'e2e'`, do NOT conflate `status: 'pass'` with `status: 'skipped'`. A spec that ran with `passed === 0 && skipped > 0` for any path touching `files_changed` is a hard fail, not a pass — verdict text MUST explicitly call this out: "E2E spec authored but assertions did not execute (skip-gated)." Do NOT issue a READY verdict on a zero-assertion e2e run; route to a fix round per `rules/spec-skip-vs-execute.md`.
85
+ **E2E pass vs skipped distinction**: When reading `auto_qa.items[]` for `check: 'e2e'`, do NOT conflate `status: 'pass'` with `status: 'skipped'`. A spec that ran with `passed === 0 && skipped > 0` for any path touching `files_changed` is a hard fail, not a pass — verdict text MUST explicitly call this out: "E2E spec authored but assertions did not execute (skip-gated)." Do NOT issue a READY verdict on a zero-assertion e2e run; route to a fix round per `rules/e2e-mandatory.md`.
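The pass-versus-skipped rule can be sketched as a small classifier. This is an illustration under assumptions, not the agent's actual logic: `E2eCounts`, `E2eVerdict`, and `classifyE2eRun` are hypothetical names, the `failed` counter is assumed, and the scoping to paths in `files_changed` is left out for brevity.

```typescript
// Counts from one e2e run. `passed` and `skipped` follow the expression
// quoted in the rule; `failed` is an assumed extra counter.
interface E2eCounts {
  passed: number;
  skipped: number;
  failed: number;
}

type E2eVerdict = "pass" | "fail" | "skip-gated" | "no-run";

function classifyE2eRun(c: E2eCounts): E2eVerdict {
  if (c.failed > 0) return "fail";
  // Specs were authored but no assertion executed: a hard fail, not a pass.
  if (c.passed === 0 && c.skipped > 0) return "skip-gated";
  // Nothing ran at all. The rule does not define this case, so it is kept
  // separate here rather than reported as a pass.
  if (c.passed === 0) return "no-run";
  return "pass";
}
```

Only the `"pass"` verdict would be compatible with a READY outcome under this sketch.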
86
86
 
87
87
  List any pending or failed items. Determine if they are blockers.
88
88
 
@@ -502,6 +502,8 @@ plan.testing_profile: 'claude_only' | 'web' | 'desktop' | 'backend' | 'full_matr
502
502
 
503
503
  User may override at round-start via `$ARGUMENTS`. Planner's detection is the default — not a hard gate.
504
504
 
505
+ **E2E eligibility is config-driven at execute time, not here.** `/cbp-round-execute` Step 5 reads `.codebyplan/e2e.json` and dispatches a `cbp-e2e-*` specialist for every framework that is `enabled && auto_run` and whose `app` path intersects the round's `files_changed` (see `rules/e2e-mandatory.md`). `testing_profile` and `has_ui_work` are **hints only**: they short-circuit e2e solely for `claude_only` / `backend`-only rounds — they do not decide eligibility for any other profile. Do not gate e2e on `has_ui_work` in the plan. Optionally, if `.codebyplan/e2e.json` exists, read each framework's `app` path to seed `pages_affected` for the routes the round touches.
506
+
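The eligibility rule described above can be sketched as a pure function. Several things here are assumptions: the config is modelled as an array of entries, "intersects" is read as a path-prefix match, and `FrameworkConfig` and `eligibleFrameworks` are hypothetical names. Only `framework`, `enabled`, `auto_run`, `app`, and the two short-circuit profiles come from the text.

```typescript
// Assumed shape of one framework entry in .codebyplan/e2e.json.
interface FrameworkConfig {
  framework: string; // e.g. "vscode-test"
  enabled: boolean;
  auto_run: boolean;
  app: string;       // app directory, e.g. "apps/vscode"
}

function eligibleFrameworks(
  frameworks: FrameworkConfig[],
  filesChanged: string[],
  testingProfile: string,
): string[] {
  // Hint-only short circuit: these profiles skip e2e entirely.
  if (testingProfile === "claude_only" || testingProfile === "backend") {
    return [];
  }
  return frameworks
    .filter((f) => f.enabled && f.auto_run)
    .filter((f) => {
      // Match on a directory boundary so "apps/vscode" does not
      // also match "apps/vscode-extra".
      const prefix = f.app.endsWith("/") ? f.app : f.app + "/";
      return filesChanged.some((p) => p === f.app || p.startsWith(prefix));
    })
    .map((f) => f.framework);
}
```

Note that `has_ui_work` does not appear as an input: under the rule it is a hint and does not decide eligibility.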
505
507
  ### Phase 5: Design Solution
506
508
 
507
509
  Honor locked decisions. Create solution design with files, integration points.
@@ -20,7 +20,7 @@ Single agent that handles non-e2e quality validation in the per-wave validation
20
20
  - Apply default production checklist items
21
21
  - Detect unrelated issues and missing tests
22
22
 
23
- E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by `cbp-test-e2e-agent`, spawned in parallel with this agent by `/cbp-round-execute` Step 5. **The two agents are fully independent — this agent does NOT read `round.context.e2e_output` or `round.context.frontend_ui_review`.** This agent emits auto QA items and default checklist items. Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
23
+ E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), spawned in parallel with this agent by `/cbp-round-execute` Step 5. **The agents are fully independent — this agent does NOT read `round.context.e2e_outputs` or `round.context.frontend_ui_review`.** This agent emits auto QA items and default checklist items. Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
24
24
 
25
25
  ## Input Contract
26
26
 
@@ -92,7 +92,7 @@ output:
92
92
  passed: number
93
93
  warnings: number
94
94
  failed: number
95
- hard_fail: boolean # true if build/lint/types failed, unit tests (vitest/jest/cargo) failed when applicable, OR npm audit found critical/high vulnerabilities. E2E hard_fail is owned by test-e2e-agent and surfaced via round.context.e2e_output.
95
+ hard_fail: boolean # true if build/lint/types failed, unit tests (vitest/jest/cargo) failed when applicable, OR npm audit found critical/high vulnerabilities. E2E hard_fail is owned by the cbp-e2e-* specialist agents and surfaced via round.context.e2e_outputs.
96
96
  critical_issues: string[]
97
97
  captured_tasks:
98
98
  - issue_index: number # index into unrelated_issues[]
@@ -147,7 +147,7 @@ Apply `testing_profile` from input before running any checks. When `testing_prof
147
147
  | full_matrix | Run all checks |
148
148
  | cross_app | Run union of touched apps' checks (intersection by detected files) |
149
149
 
150
- E2E (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is NEVER run by this agent under any profile — it's owned by `cbp-test-e2e-agent` (parallel sibling spawned by `/cbp-round-execute` Step 5).
150
+ E2E (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is NEVER run by this agent under any profile — it's owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`; parallel siblings spawned by `/cbp-round-execute` Step 5).
151
151
 
152
152
  **CRITICAL: Within your profile's allowed check set (see Profile Gate Matrix above), every applicable command MUST be executed. No skipping an in-scope check without an explicit, logged reason.**
153
153
 
@@ -187,7 +187,7 @@ Procedure:
187
187
 
188
188
  This closes the cycle where R2 adds a flat-config and the QA pass lints only R2 files, only for `/cbp-task-check` to later lint the full task and surface dozens of errors on R1 files — wasting an entire corrective round. Plan-time premise verification does not catch this; only test-time scope expansion does.
189
189
 
190
- **Hard fail means: if any of build/lint/types/unit fails or is not executed when applicable, set `totals.hard_fail = true`. The round CANNOT complete.** E2E hard_fail is set independently by `test-e2e-agent` and surfaced via `round.context.e2e_output`; `/cbp-round-execute` Step 6 considers both signals.
190
+ **Hard fail means: if any of build/lint/types/unit fails or is not executed when applicable, set `totals.hard_fail = true`. The round CANNOT complete.** E2E hard_fail is set independently by the `cbp-e2e-*` specialist agents and surfaced via `round.context.e2e_outputs`; `/cbp-round-execute` Step 6 considers both signals.
191
191
 
192
192
  **Step 3a: Execute conditional unit-test checks (HARD FAIL when applicable):**
193
193
 
@@ -209,7 +209,7 @@ Run the unit-test runners detected in Step 1:
209
209
  If condition is met and test fails: set `totals.hard_fail = true`.
210
210
  If condition is not met (no applicable files changed): log `SKIPPED: <command> (reason: no applicable files changed)`.
211
211
 
212
- E2E commands and their preflight (dev server / simulator / emulator / built binary / auth probe) are owned by `cbp-test-e2e-agent`. See `agents/test-e2e-agent.md` Step 6.5 for the canonical preflight contract.
212
+ E2E commands and their preflight (dev server / simulator / emulator / built binary / auth probe) are owned by the `cbp-e2e-*` specialist agents. See `context/testing/e2e.md` for the canonical preflight contract (Step 6.5 and the shared workflow).
213
213
 
214
214
  **Step 3b: Execute conditional checks (soft):**
215
215
 
@@ -360,7 +360,7 @@ Return complete output contract.
360
360
  - Auto and default QA items generated
361
361
  - `hard_fail` flag correctly set
362
362
  - **Vitest/Jest/Cargo unit-test hard_fail enforced** when source files changed
363
- - E2E execution + preflight delegated entirely to `test-e2e-agent` (this agent never runs Playwright/Maestro/wdio/etc.)
363
+ - E2E execution + preflight delegated entirely to the `cbp-e2e-*` specialist agents (this agent never runs Playwright/Maestro/wdio/etc.)
364
364
 
365
365
  ## Failure Modes
366
366
 
@@ -373,6 +373,6 @@ Return complete output contract.
373
373
 
374
374
  ## Integration
375
375
 
376
- - **Spawned by**: `/cbp-round-execute` Step 5 (per-wave; runs in parallel with `test-e2e-agent` and may also run in parallel with next wave's executor)
377
- - **Parallel sibling**: `cbp-test-e2e-agent` (fully independent — no cross-read; both agents complete on their own timeline using only their own inputs)
378
- - **Output consumed by**: `/cbp-round-execute` Step 6 (hard-fail routing — this agent's `totals.hard_fail` is OR'd with `e2e_output.test_results.failed > 0` and `e2e_output.status === 'failed'`), `/cbp-round-end` Step 3 (reads this agent's `auto_qa[]` and `default_checklist[]`). This agent does not emit `user_qa` items; baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
376
+ - **Spawned by**: `/cbp-round-execute` Step 5 (per-wave; runs in parallel with the `cbp-e2e-*` specialists and may also run in parallel with next wave's executor)
377
+ - **Parallel siblings**: `cbp-e2e-*` specialist agents (fully independent — no cross-read; all agents complete on their own timeline using only their own inputs)
378
+ - **Output consumed by**: `/cbp-round-execute` Step 6 (hard-fail routing — this agent's `totals.hard_fail` is OR'd across `round.context.e2e_outputs` entries: any `e2e_outputs[f].test_results.failed > 0` or `e2e_outputs[f].status === 'failed'`, plus the `e2e_eligible_skipped` signal), `/cbp-round-end` Step 3 (reads this agent's `auto_qa[]` and `default_checklist[]`). This agent does not emit `user_qa` items; baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
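The Step 6 hard-fail routing described in the last bullet can be sketched as a single boolean reduction. This is an illustration under stated assumptions: the `E2eOutput` interface, the boolean type of `e2e_eligible_skipped`, and the function name are hypothetical. The field paths `test_results.failed` and `status` are taken from the bullet text.

```typescript
// Assumed per-framework output entry; only `status` and
// `test_results.failed` are named in the diff.
interface E2eOutput {
  status: string;
  test_results: { failed: number };
}

// OR the QA agent's `totals.hard_fail` with every e2e signal.
// `e2eEligibleSkipped` is assumed to be a boolean flag.
function roundHardFail(
  qaHardFail: boolean,
  e2eOutputs: { [framework: string]: E2eOutput },
  e2eEligibleSkipped: boolean
): boolean {
  if (qaHardFail || e2eEligibleSkipped) {
    return true;
  }
  for (const framework of Object.keys(e2eOutputs)) {
    const out = e2eOutputs[framework];
    if (out.test_results.failed > 0 || out.status === "failed") {
      return true;
    }
  }
  return false;
}
```

Because the signals are OR'd, a single failing framework entry is enough to fail the round even when the QA agent itself reports `hard_fail: false`.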