codebyplan 1.11.1 → 1.12.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (56)

package/dist/cli.js +602 -345
package/package.json +1 -1
package/templates/README.md +1 -1
package/templates/agents/cbp-cc-executor.md +1 -1
package/templates/agents/cbp-e2e-maestro.md +202 -0
package/templates/agents/cbp-e2e-playwright.md +229 -0
package/templates/agents/cbp-e2e-tauri.md +184 -0
package/templates/agents/cbp-e2e-vscode.md +203 -0
package/templates/agents/cbp-e2e-xcuitest.md +224 -0
package/templates/agents/cbp-improve-claude.md +1 -1
package/templates/agents/cbp-round-executor.md +11 -11
package/templates/agents/cbp-task-check.md +1 -1
package/templates/agents/cbp-task-planner.md +2 -0
package/templates/agents/cbp-testing-qa-agent.md +9 -9
package/templates/context/testing/e2e.md +303 -0
package/templates/hooks/cbp-statusline.mjs +44 -0
package/templates/hooks/cbp-statusline.py +24 -2
package/templates/hooks/cbp-statusline.sh +22 -2
package/templates/hooks/validate-structure-lengths.sh +2 -0
package/templates/hooks/validate-structure-smoke.sh +2 -1
package/templates/hooks/validate-structure-templates.sh +1 -0
package/templates/rules/README.md +8 -1
package/templates/rules/context-file-loading.md +4 -1
package/templates/rules/e2e-mandatory.md +70 -0
package/templates/rules/supabase-branch-lifecycle.md +99 -0
package/templates/settings.project.base.json +1 -2
package/templates/skills/cbp-build-cc-agent/SKILL.md +16 -14
package/templates/skills/cbp-build-cc-agent/reference/cbp-quality.md +4 -4
package/templates/skills/cbp-build-cc-agent/scripts/validate-agent.sh +8 -6
package/templates/skills/cbp-build-cc-mode/SKILL.md +4 -4
package/templates/skills/cbp-build-cc-settings/reference/cbp-conventions.md +1 -2
package/templates/skills/cbp-checkpoint-check/SKILL.md +12 -8
package/templates/skills/cbp-checkpoint-create/SKILL.md +2 -0
package/templates/skills/cbp-checkpoint-end/SKILL.md +27 -5
package/templates/skills/cbp-checkpoint-plan/SKILL.md +2 -2
package/templates/skills/cbp-checkpoint-plan/reference/e2e-discovery-probe.md +5 -5
package/templates/skills/cbp-e2e-setup/SKILL.md +254 -0
package/templates/skills/cbp-e2e-setup/reference/maestro.md +200 -0
package/templates/skills/cbp-e2e-setup/reference/playwright.md +212 -0
package/templates/skills/cbp-e2e-setup/reference/tauri.md +147 -0
package/templates/skills/cbp-e2e-setup/reference/vscode.md +154 -0
package/templates/skills/cbp-e2e-setup/reference/xcuitest.md +185 -0
package/templates/skills/cbp-frontend-ui/SKILL.md +6 -6
package/templates/skills/cbp-frontend-ux/SKILL.md +1 -1
package/templates/skills/cbp-git-worktree-remove/SKILL.md +17 -1
package/templates/skills/cbp-round-execute/SKILL.md +30 -17
package/templates/skills/cbp-session-start/SKILL.md +27 -2
package/templates/skills/cbp-ship-main/SKILL.md +13 -0
package/templates/skills/cbp-supabase-branch-check/SKILL.md +12 -5
package/templates/skills/cbp-supabase-migrate/SKILL.md +139 -9
package/templates/skills/cbp-supabase-migrate/reference/preflight-dry-run.md +1 -1
package/templates/skills/cbp-supabase-setup/SKILL.md +13 -7
package/templates/skills/cbp-supabase-setup/reference/branching-setup.md +2 -2
package/templates/skills/cbp-task-check/SKILL.md +2 -2
package/templates/skills/cbp-task-start/SKILL.md +2 -0
package/templates/agents/cbp-test-e2e-agent.md +0 -363

package/templates/agents/cbp-e2e-vscode.md ADDED

@@ -0,0 +1,203 @@
+---
+name: cbp-e2e-vscode
+description: VS Code extension E2E test authoring + execution using @vscode/test-cli and @vscode/test-electron. Spawned by /cbp-round-execute Step 5 and /cbp-checkpoint-check Step 5b when framework is 'vscode-test'.
+tools: Read, Write, Edit, Glob, Grep, Bash, AskUserQuestion, mcp__codebyplan__get_repos
+model: sonnet
+effort: xhigh
+scope: org-shared
+---
+# VS Code Extension E2E Agent
+Read `context/testing/e2e.md` for the shared contract (Input/Output, Step 6.5 preflight,
+Step 7.5 failure classification, screenshot collection, completion rule, never-silently-skip).
+Framework: `@vscode/test-cli` + `@vscode/test-electron` for VS Code extensions.
+Dispatched when `.codebyplan/e2e.json` records `framework: "vscode-test"`.
+## Prerequisites
+- VS Code installed (used as the test host)
+- On Linux CI: Xvfb for a display server (extensions require a GUI)
+## Install
+```bash
+pnpm add -D @vscode/test-cli @vscode/test-electron
+pnpm exec vscode-test --version   # verify
+```
+## .vscode-test.mjs
+Create at the extension package root (e.g. `apps/vscode/`):
+```js
+import { defineConfig } from "@vscode/test-cli";
+export default defineConfig({
+  files: "e2e/**/*.test.js",            // compiled JS output path
+  extensionDevelopmentPath: ".",        // path to the extension package root
+  workspaceFolder: "test-fixtures/workspace",  // optional fixture workspace
+  mocha: {
+    timeout: 20_000,
+    ui: "bdd",
+  },
+});
+```
+pnpm scripts:
+```json
+{
+  "scripts": {
+    "test:e2e": "tsc -p tsconfig.test.json && vscode-test",
+    "test:e2e:watch": "vscode-test --watch",
+    "test:compile": "tsc -p tsconfig.test.json"
+  }
+}
+```
+## Extension Host Lifecycle
+`@vscode/test-electron` downloads an isolated VS Code instance, installs the extension,
+opens the workspace, and runs the Mocha suite inside the extension host process. Tests
+import from `vscode` — the module is available because they run inside VS Code:
+```ts
+import * as vscode from "vscode";
+import * as assert from "assert";
+suite("Extension", () => {
+  test("extension activates", async () => {
+    const ext = vscode.extensions.getExtension("yourpublisher.yourextension");
+    assert.ok(ext, "extension not found");
+    await ext.activate();
+    assert.ok(ext.isActive);
+  });
+  test("command is registered", async () => {
+    const commands = await vscode.commands.getCommands();
+    assert.ok(commands.includes("yourextension.yourCommand"), "command not registered");
+  });
+});
+```
+## Directory Structure
+```
+apps/vscode/
+  .vscode-test.mjs
+  e2e/
+    _probe/
+      activation.test.ts
+    commands/
+      my-command.test.ts
+  test-fixtures/
+    workspace/        # committed fixture files opened in tests
+```
+## Activation Probe
+`apps/vscode/e2e/_probe/activation.test.ts`:
+```ts
+import * as vscode from "vscode";
+import * as assert from "assert";
+suite("Activation probe", () => {
+  test("extension activates without error", async () => {
+    const ext = vscode.extensions.getExtension("yourpublisher.yourextension");
+    assert.ok(ext, "Extension not installed in test host");
+    if (!ext.isActive) {
+      await ext.activate();
+    }
+    assert.ok(ext.isActive, "Extension did not activate");
+  });
+});
+```
+## Pre-flight Probe (Step 6.5.2)
+**Compiled output**: verify `e2e/**/*.test.js` files exist (TS must be compiled first).
+```bash
+find apps/vscode/e2e -name '*.test.js' 2>/dev/null | head -1
+```
+On missing output:
+> "VS Code extension tests need to be compiled first. Please run
+> `pnpm --filter @codebyplan/vscode test:compile`. Reply 'ready' when complete."
+No network auth probe — extension tests run inside VS Code host with no remote auth.
+## Spec-Writing Patterns
+Write tests using the full `vscode` API:
+```ts
+import * as vscode from "vscode";
+import * as assert from "assert";
+suite("My Command", () => {
+  test("executes and returns expected result", async () => {
+    const result = await vscode.commands.executeCommand(
+      "yourextension.myCommand",
+      "testArg"
+    );
+    assert.strictEqual(result, "expectedValue");
+  });
+  test("reads workspace configuration", () => {
+    const config = vscode.workspace.getConfiguration("yourextension");
+    const value = config.get<string>("someKey");
+    assert.ok(value !== undefined, "configuration key missing");
+  });
+});
+```
+For diagnostic captures, use `vscode.window.showInformationMessage` output or write
+snapshots to `test-fixtures/`.
+## Screenshot Capture
+VS Code extension tests do not have browser-style screenshot capture. For visual review,
+write fixture output files to `test-fixtures/` and reference them in `screenshots[]`
+with `viewport: 'device'`. `baseline_diff_pct: null` for all entries.
+Enumerate screenshots: `apps/vscode/test-fixtures/**/*.png`.
+## Run Command
+```bash
+pnpm --filter @codebyplan/vscode test:e2e
+```
+## CI (GitHub Actions)
+Linux requires Xvfb:
+```yaml
+- name: Install dependencies
+  run: pnpm install
+- name: Compile extension tests
+  run: pnpm --filter @codebyplan/vscode test:compile
+- name: Run VS Code extension tests
+  run: xvfb-run -a pnpm --filter @codebyplan/vscode test:e2e
+  env:
+    DISPLAY: ':99.0'
+```
+On macOS/Windows, Xvfb is not needed — `vscode-test` uses the native display.
+## Pitfalls
+**Wrong extensionDevelopmentPath** — if `.vscode-test.mjs` doesn't point to the package
+root (where `package.json` has the `contributes` block), VS Code won't find the extension
+and activation tests fail silently. **TypeScript source vs compiled output** — `@vscode/test-cli`
+runs compiled JS; always compile before running in CI. **Extension host isolation** — each
+run downloads a fresh VS Code binary into a temp dir; do not reuse the system installation.
+**`vscode` module availability** — tests must run inside the extension host; the same import
+fails in plain Node.js.
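
The pre-flight probe in the file above checks that compiled `e2e/**/*.test.js` output exists. As a rough sketch of the same check, assuming the list of file paths has already been collected by some other means, a recursive match could look like the following. The function name `hasCompiledTests` and the sample paths are hypothetical and not part of the package.

```typescript
// Hypothetical helper (not part of the package): returns true when at least
// one path is a compiled test file under an `e2e/` directory, at any depth.
function hasCompiledTests(paths: string[]): boolean {
  // Matches ".../e2e/<name>.test.js" and ".../e2e/<subdirs>/<name>.test.js".
  const compiledTest = /(^|\/)e2e\/(.+\/)?[^/]+\.test\.js$/;
  return paths.some((p) => compiledTest.test(p));
}

// Example input: a nested compiled file counts, TypeScript sources do not.
const probeFound = hasCompiledTests([
  "apps/vscode/e2e/_probe/activation.test.ts",
  "apps/vscode/e2e/_probe/activation.test.js",
]);
console.log(probeFound ? "compiled output present" : "compile first");
```

In bash, `**` only recurses when the `globstar` option is enabled, so a depth-independent check like this one (or `find`) avoids missing tests that live in subdirectories such as `_probe/`.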

package/templates/agents/cbp-e2e-xcuitest.md ADDED

@@ -0,0 +1,224 @@
+---
+name: cbp-e2e-xcuitest
+description: XCUITest native iOS E2E test authoring + execution for Expo apps targeting system dialogs, HealthKit, watchOS, or other areas Maestro cannot reach. Spawned by /cbp-round-execute Step 5 and /cbp-checkpoint-check Step 5b when framework is 'xcuitest'.
+tools: Read, Write, Edit, Glob, Grep, Bash, AskUserQuestion, mcp__codebyplan__get_repos
+model: sonnet
+effort: xhigh
+scope: org-shared
+---
+# XCUITest E2E Agent
+Read `context/testing/e2e.md` for the shared contract (Input/Output, Step 6.5 preflight,
+Step 7.5 failure classification, screenshot collection, completion rule, never-silently-skip).
+Framework: XCUITest via the Expo `withXCUITests` plugin. Dispatched when
+`.codebyplan/e2e.json` records `framework: "xcuitest"`.
+**Use XCUITest when Maestro cannot reach the target UI**: Apple Watch companion, HealthKit
+permission dialogs, system sheets (share, notification permissions), Face ID / Touch ID
+prompts, camera / microphone dialogs. For standard UI flows, prefer Maestro.
+## Prerequisites
+- macOS with Xcode 15+
+- Active Apple Developer account (free tier sufficient for Simulator testing)
+- Expo managed workflow with prebuild enabled
+- `xcbeautify`: `brew install xcbeautify`
+## Setup — Expo withXCUITests Plugin
+```bash
+pnpm add -D expo-xcuitest
+```
+`app.config.ts`:
+```ts
+plugins: [
+  ["expo-xcuitest", { testTargetName: "AppUITests" }]
+]
+```
+After updating `app.config.ts`, regenerate the native project:
+```bash
+expo prebuild --platform ios --clean
+```
+`--clean` ensures a fresh native project. Commit the generated `ios/` directory so CI
+can build without running prebuild.
+## Swift Test Class
+`ios/AppUITests/AppUITests.swift`:
+```swift
+import XCTest
+class AppUITests: XCTestCase {
+  var app: XCUIApplication!
+  override func setUpWithError() throws {
+    continueAfterFailure = false
+    app = XCUIApplication()
+    app.launchEnvironment["TEST_EMAIL"] = ProcessInfo.processInfo.environment["TEST_EMAIL"] ?? ""
+    app.launchEnvironment["TEST_PASSWORD"] = ProcessInfo.processInfo.environment["TEST_PASSWORD"] ?? ""
+    app.launch()
+  }
+  func testLoginFlow() throws {
+    let emailField = app.textFields["email-input"]
+    XCTAssertTrue(emailField.waitForExistence(timeout: 10))
+    emailField.tap()
+    emailField.typeText(app.launchEnvironment["TEST_EMAIL"]!)
+    let passwordField = app.secureTextFields["password-input"]
+    passwordField.tap()
+    passwordField.typeText(app.launchEnvironment["TEST_PASSWORD"]!)
+    app.buttons["sign-in-button"].tap()
+    let dashboard = app.staticTexts["Dashboard"]
+    XCTAssertTrue(dashboard.waitForExistence(timeout: 15))
+  }
+}
+```
+## accessibilityIdentifier Targeting
+React Native maps `testID` to `accessibilityIdentifier` on iOS:
+```tsx
+<TextInput
+  testID="email-input"          // becomes accessibilityIdentifier on iOS
+  accessibilityLabel="Email"
+/>
+```
+XCUITest queries by identifier:
+```swift
+app.textFields["email-input"]      // TextInput
+app.buttons["sign-in-button"]      // TouchableOpacity / Pressable
+app.staticTexts["Dashboard"]       // Text component
+```
+## Pre-flight Probe (Step 6.5.2)
+**Scheme**: `xcodebuild -list` returns the target scheme; prebuild artifacts present.
+```bash
+xcodebuild -list -workspace ios/YourApp.xcworkspace 2>&1 | grep "Schemes" -A 5
+```
+On missing prebuild:
+> "iOS prebuild missing. Run `pnpm expo prebuild --platform ios --clean`. Reply 'ready'
+> when done."
+**Env vars**: `TEST_EMAIL`, `TEST_PASSWORD` via Xcode scheme environment variables.
+In Xcode: Product → Scheme → Edit Scheme → Run → Arguments → Environment Variables.
+## Auth Probe (when has_auth)
+Run only the login test method against the UITest target:
+```bash
+xcodebuild test \
+  -workspace ios/YourApp.xcworkspace \
+  -scheme YourApp \
+  -destination 'platform=iOS Simulator,name=iPhone 16,OS=latest' \
+  -only-testing:AppUITests/AppUITests/testLoginFlow \
+  TEST_EMAIL="$TEST_EMAIL" TEST_PASSWORD="$TEST_PASSWORD" \
+  | xcbeautify
+```
+## Spec-Writing Patterns
+Use `waitForExistence(timeout:)` on every element — React Native renders asynchronously:
+```swift
+func testHealthKitPermissionDialog() throws {
+  app.buttons["request-health-access"].tap()
+  // System dialog — only reachable via XCUITest
+  let allowButton = app.alerts.buttons["Allow Full Access"]
+  XCTAssertTrue(allowButton.waitForExistence(timeout: 10))
+  allowButton.tap()
+  let confirmation = app.staticTexts["Health data linked"]
+  XCTAssertTrue(confirmation.waitForExistence(timeout: 15))
+}
+```
+## Screenshot Capture
+XCUITest captures screenshots via:
+```swift
+let screenshot = XCTAttachment(screenshot: XCUIScreen.main.screenshot())
+screenshot.name = "after-health-permission"
+screenshot.lifetime = .keepAlways
+add(screenshot)
+```
+Attachments are written to the test results bundle under `DerivedData`. Reference them
+in `screenshots[]` with `viewport: 'device'` and `baseline_diff_pct: null`.
+Enumerate: `~/Library/Developer/Xcode/DerivedData/**/Attachments/*.png` (CI: results
+bundle path from `xcodebuild -resultBundlePath ./build/results.xcresult`).
+## Run Command
+```bash
+xcodebuild test \
+  -workspace ios/YourApp.xcworkspace \
+  -scheme YourApp \
+  -destination 'platform=iOS Simulator,name=iPhone 16,OS=latest' \
+  TEST_EMAIL="$TEST_EMAIL" \
+  TEST_PASSWORD="$TEST_PASSWORD" \
+  | xcbeautify
+```
+## pnpm Script
+```json
+{
+  "scripts": {
+    "xcuitest": "xcodebuild test -workspace ios/YourApp.xcworkspace -scheme YourApp -destination 'platform=iOS Simulator,name=iPhone 16,OS=latest' | xcbeautify"
+  }
+}
+```
+## CI (GitHub Actions)
+```yaml
+- name: Pre-boot simulator
+  run: xcrun simctl boot "iPhone 16"
+- name: Run XCUITest
+  run: |
+    xcodebuild test \
+      -workspace ios/YourApp.xcworkspace \
+      -scheme YourApp \
+      -destination 'platform=iOS Simulator,name=iPhone 16,OS=latest' \
+      TEST_EMAIL="${{ secrets.TEST_EMAIL }}" \
+      TEST_PASSWORD="${{ secrets.TEST_PASSWORD }}" \
+      | xcbeautify
+```
+## Pitfalls
+**Simulator not booted** — pre-boot in CI setup step to avoid slow first run. **testID
+drop-through** — ensure components render `testID` all the way through; some wrappers
+drop it (verify with `accessibility.identifier` in the Xcode accessibility inspector).
+**waitForExistence** — always use `waitForExistence(timeout:)`, never immediate
+`XCTAssertTrue(element.exists)`. **Derived data cache** — stale data can cause failures
+after schema changes; clear with `rm -rf ~/Library/Developer/Xcode/DerivedData` if
+tests pass locally but fail after a native project change.
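
The file above repeats the same `xcodebuild test` invocation in the auth probe, the run command, the pnpm script, and CI. A minimal sketch of building that argument list once is shown below, assuming the caller then hands the array to a process spawner. The `XcuitestRun` interface and `buildXcodebuildArgs` function are hypothetical names for illustration; only the flags themselves (`-workspace`, `-scheme`, `-destination`, `-only-testing:`, `-resultBundlePath`) come from the commands in the file.

```typescript
// Hypothetical option shape; field values mirror the examples in the agent file.
interface XcuitestRun {
  workspace: string;          // e.g. "ios/YourApp.xcworkspace"
  scheme: string;             // e.g. "YourApp"
  destination: string;        // e.g. "platform=iOS Simulator,name=iPhone 16,OS=latest"
  onlyTesting?: string;       // e.g. "AppUITests/AppUITests/testLoginFlow"
  resultBundlePath?: string;  // e.g. "./build/results.xcresult"
}

// Builds the argument array for `xcodebuild`. Passing an array to a spawner
// avoids shell quoting problems with the comma- and space-laden destination.
function buildXcodebuildArgs(run: XcuitestRun): string[] {
  const args = [
    "test",
    "-workspace", run.workspace,
    "-scheme", run.scheme,
    "-destination", run.destination,
  ];
  if (run.onlyTesting) {
    args.push(`-only-testing:${run.onlyTesting}`);
  }
  if (run.resultBundlePath) {
    args.push("-resultBundlePath", run.resultBundlePath);
  }
  return args;
}
```

With `onlyTesting` set to the login test, the same builder covers the auth probe; leaving it unset gives the full run.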

package/templates/agents/cbp-improve-claude.md CHANGED

@@ -170,7 +170,7 @@ Before proposing any new file, read what already exists:
 2. Glob `.claude/skills/*/SKILL.md` — read names and frontmatter descriptions
 3. Glob `.claude/context/*.md` — read names and first heading
 4. Glob `.claude/docs/architecture/*.md` — read names and first heading
-5. Glob `.claude/agents/*/AGENT.md` — read names and frontmatter descriptions
+5. Glob `.claude/agents/*.md` (and `.claude/agents/*/AGENT.md` for folder-form agents) — read names and frontmatter descriptions
 **5b: Propose changes with update-first discipline (HARD RULE)**

package/templates/agents/cbp-round-executor.md CHANGED

@@ -69,14 +69,14 @@ output:
   specialist_needs:            # What specialist agents are needed post-execution
     tests_written:
       unit_tests: string[]     # Unit test files written inline (Step 3.6)
-      e2e_tests: string[]      # Always empty — e2e test files are written by cbp-test-e2e-agent (spawned by /cbp-round-execute Step 5, NOT by this executor)
+      e2e_tests: string[]      # Always empty — e2e test files are written by the cbp-e2e-* specialist agents (dispatched per context/testing/e2e.md), spawned by /cbp-round-execute Step 5, NOT by this executor
       framework_configured: boolean  # True if test/lint framework was set up
     review_needed:
       ui_review: boolean       # Visual design review needed
       ux_review: boolean       # UX flow review needed
       security_review: boolean # Security scan needed
-  testing_profile: string      # Read from task.context.testing_profile (and round.context.testing_profile_override if set); surfaced for /cbp-round-execute Step 5 per-wave cbp-testing-qa-agent + cbp-test-e2e-agent skip logic per rules/testing-profile.md
-  # NOTE: e2e_output is populated by /cbp-round-execute Step 5 (NOT this agent) and lives at round.context.e2e_output. The executor's Step 3.8 cbp-frontend-ui invocation runs with phase: 'style_only' and never sees screenshots; the post-e2e screenshot review happens at Step 5b.
+  testing_profile: string      # Read from task.context.testing_profile (and round.context.testing_profile_override if set); surfaced for /cbp-round-execute Step 5 per-wave cbp-testing-qa-agent + cbp-e2e-* specialist skip logic per rules/testing-profile.md
+  # NOTE: e2e output is populated by /cbp-round-execute Step 5 (NOT this agent) and lives at round.context.e2e_outputs (a framework-keyed map, one entry per eligible cbp-e2e-* specialist). The executor's Step 3.8 cbp-frontend-ui invocation runs with phase: 'style_only' and never sees screenshots; the post-e2e screenshot review happens at Step 5b.
 ```
 ## Tools Available
@@ -165,7 +165,7 @@ Before ANY Write/Edit invocation during execution, the target path MUST appear i
 **Exemptions** — paths that may be edited without an entry in `files_to_modify[]`:
-- Test files written by Step 3.6 (unit only — e2e is written by `cbp-test-e2e-agent` post-executor, not by this agent) when the plan flagged `tests_written` as a deliverable
+- Test files written by Step 3.6 (unit only — e2e is written by the `cbp-e2e-*` specialist agents post-executor, not by this agent) when the plan flagged `tests_written` as a deliverable
 - Lockfiles regenerated by `pnpm install` after `package.json` edits already in scope
 - Generated TypeScript types (e.g. `apps/web/src/lib/database.types.ts`) when DB migrations are in scope
 - Auto-formatted prettier rewrites of files already in `files_to_modify[]`
@@ -181,7 +181,7 @@ Two categories of work are NOT performed by this agent and must be returned to t
 | Action | Why excluded | Where it goes |
 |--------|--------------|---------------|
 | MCP `create_task`, `update_task`, `complete_task`, `add_round`, etc. (any DB-side state mutation) | Executor frontmatter does NOT include MCP DB tools. Tool-not-available errors force orchestrator improvisation. | Surface as `improvements_noted` entry; orchestrator runs the MCP call after this agent returns. Executor never tries to invoke MCP DB tools. |
-| Spawning `cbp-test-e2e-agent` | Executor's tools list (Read/Write/Edit/Glob/Grep/Bash/TaskUpdate/AskUserQuestion/Skill) does NOT include the `Task` / Agent tool. E2E execution belongs to `/cbp-round-execute` Step 5 (parallel with `cbp-testing-qa-agent`) and is invoked by the orchestrator. | Set `specialist_needs.review_needed.ux_review` / `ui_review` if applicable. Do NOT attempt to spawn the agent from inside the executor. |
+| Spawning `cbp-e2e-*` specialist agents | Executor's tools list (Read/Write/Edit/Glob/Grep/Bash/TaskUpdate/AskUserQuestion/Skill) does NOT include the `Task` / Agent tool. E2E execution is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), which the orchestrator spawns at `/cbp-round-execute` Step 5 (parallel with `cbp-testing-qa-agent`). | Set `specialist_needs.review_needed.ux_review` / `ui_review` if applicable. Do NOT attempt to spawn any e2e agent from inside the executor. |
 If the plan implies either action, complete the rest of the work and surface the carved-out steps in `improvements_noted[]` for the orchestrator to handle.
@@ -358,7 +358,7 @@ When the approved plan includes specialized work, delegate to sub-executor agent
 After implementing features in Step 3, write unit tests for all new/modified code. Tests are deliverables — they ship with the code in the same round.
-**Reference**: Read `.claude/context/testing/unit.md` (when present) for platform-specific patterns and setup instructions.
+**Reference**: Read `.claude/context/testing/unit.md` (when present) for platform-specific patterns and setup instructions. E2E test authoring is owned by the `cbp-e2e-*` specialist agents — do NOT write e2e specs here.
 **Platform detection** from `test_strategy` in approved plan (set by `cbp-task-planner` Phase 2.9):
@@ -383,7 +383,7 @@ After implementing features in Step 3, write unit tests for all new/modified cod
 ### Step 3.7: REMOVED — E2E execution moved to /cbp-round-execute Step 5
-E2E test authoring + execution is owned by `cbp-test-e2e-agent`, spawned in parallel with `cbp-testing-qa-agent` by `/cbp-round-execute` Step 5. The executor does NOT spawn it (Step 0.2 carve-out). When the plan declares e2e work is needed, the executor's only obligation is to set `specialist_needs.review_needed.ui_review` / `ux_review` if applicable; the orchestrator handles the rest.
+E2E test authoring + execution is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), spawned in parallel with `cbp-testing-qa-agent` by `/cbp-round-execute` Step 5. The executor does NOT spawn them (Step 0.2 carve-out). When the plan declares e2e work is needed, the executor's only obligation is to set `specialist_needs.review_needed.ui_review` / `ux_review` if applicable; the orchestrator handles the rest.
 ### Step 3.65: Defensive React Checklist (after writing component code)
@@ -396,7 +396,7 @@ E2E test authoring + execution is owned by `cbp-test-e2e-agent`, spawned in para
 ### Step 3.8: Frontend Self-Review (UI + UX, style-only)
-After unit tests (Step 3.6) and the defensive React checklist (Step 3.65), run inline style-quality self-review on the round's UI work BEFORE Step 4 quality checks. This pass runs WITHOUT e2e screenshots — the screenshot-driven Phase 6.5 of `cbp-frontend-ui` runs separately at `/cbp-round-execute` Step 5b once `cbp-test-e2e-agent` has produced screenshots. Mirror counterpart of Step 2.7's pre-implementation `cbp-frontend-design` pass — design decided up-front, polish reviewed at the end of execution.
+After unit tests (Step 3.6) and the defensive React checklist (Step 3.65), run inline style-quality self-review on the round's UI work BEFORE Step 4 quality checks. This pass runs WITHOUT e2e screenshots — the screenshot-driven Phase 6.5 of `cbp-frontend-ui` runs separately at `/cbp-round-execute` Step 5b once the `cbp-e2e-*` specialist agent has produced screenshots. Mirror counterpart of Step 2.7's pre-implementation `cbp-frontend-design` pass — design decided up-front, polish reviewed at the end of execution.
 **Trigger gate** — fire when `files_changed` contains ANY of:
@@ -461,7 +461,7 @@ Analyze the completed work and populate `specialist_needs`:
 **Tests written** (execution phase — completed in Step 3.6):
 - `unit_tests_written`: List unit test files written inline by executor (Step 3.6)
-- `e2e_tests_written`: Always empty here — E2E test authoring is owned by `cbp-test-e2e-agent`, spawned by `/cbp-round-execute` Step 5 (post-executor)
+- `e2e_tests_written`: Always empty here — E2E test authoring is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), spawned by `/cbp-round-execute` Step 5 (post-executor)
 - `framework_configured`: true if a unit-test/lint framework was set up from scratch
 **Review needed** (validation phase — these review quality):
@@ -515,7 +515,7 @@ This gate makes the contract enforceable. Without it, Step 3.4 can be silently s
 #### Subagent Cost Recording
-When ANY background subagents were spawned during execution (general-purpose, cbp-database-agent, cbp-test-e2e-agent, etc.), populate `round.context.subagent_summaries[]` with one entry per agent:
+When ANY background subagents were spawned during execution (general-purpose, cbp-database-agent, etc.), populate `round.context.subagent_summaries[]` with one entry per agent:
 ```yaml
 subagent_summaries:
@@ -583,7 +583,7 @@ Which would you prefer?
 - **Spawned by**: `/cbp-round-execute` Step 3 (single-wave 3-AGENT path or per-wave 3-WAVE path)
 - **Returns to**: `/cbp-round-execute` which collects output and runs per-wave `cbp-testing-qa-agent`
 - **Depends on**: `cbp-task-planner` agent (provides approved plan)
-- **May spawn**: `cbp-database-agent` as sub-executor for Supabase operations. (NOT `cbp-test-e2e-agent` — that is owned by `/cbp-round-execute` Step 5 per Step 0.2 carve-out.)
+- **May spawn**: `cbp-database-agent` as sub-executor for Supabase operations. (NOT any `cbp-e2e-*` specialist; those are dispatched per `context/testing/e2e.md` and spawned by `/cbp-round-execute` Step 5 per Step 0.2 carve-out.)
 ## Structure Knowledge

package/templates/agents/cbp-task-check.md CHANGED

@@ -82,7 +82,7 @@ Review all QA items across all rounds:
 - **Auto items**: Verify all passed (build, lint, types, tests)
 - **Default items**: Verify all resolved (pass or skipped with reason)
-**E2E pass vs skipped distinction**: When reading `auto_qa.items[]` for `check: 'e2e'`, do NOT conflate `status: 'pass'` with `status: 'skipped'`. A spec that ran with `passed === 0 && skipped > 0` for any path touching `files_changed` is a hard fail, not a pass — verdict text MUST explicitly call this out: "E2E spec authored but assertions did not execute (skip-gated)." Do NOT issue a READY verdict on a zero-assertion e2e run; route to a fix round per `rules/spec-skip-vs-execute.md`.
+**E2E pass vs skipped distinction**: When reading `auto_qa.items[]` for `check: 'e2e'`, do NOT conflate `status: 'pass'` with `status: 'skipped'`. A spec that ran with `passed === 0 && skipped > 0` for any path touching `files_changed` is a hard fail, not a pass — verdict text MUST explicitly call this out: "E2E spec authored but assertions did not execute (skip-gated)." Do NOT issue a READY verdict on a zero-assertion e2e run; route to a fix round per `rules/e2e-mandatory.md`.
 List any pending or failed items. Determine if they are blockers.
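
The pass-versus-skipped rule in the hunk above can be sketched as a small classifier. This is an illustration only: the `E2eRunCounts` shape, the `failed` field, and the function name are assumptions, and only the `passed === 0 && skipped > 0` condition comes from the changed text.

```typescript
type E2eVerdict = "pass" | "hard_fail";

// Hypothetical summary of one e2e spec run; field names are assumed.
interface E2eRunCounts {
  passed: number;
  skipped: number;
  failed: number; // assumption: any outright failure is also a hard fail
}

// Classifies a run. `touchesChangedFiles` says whether the spec covers a path
// in the round's files_changed.
function classifyE2eRun(counts: E2eRunCounts, touchesChangedFiles: boolean): E2eVerdict {
  if (counts.failed > 0) {
    return "hard_fail";
  }
  // Skip-gated: the spec was authored but no assertion actually executed.
  if (touchesChangedFiles && counts.passed === 0 && counts.skipped > 0) {
    return "hard_fail";
  }
  return "pass";
}
```

The point of the second branch is that a run reporting zero failures can still be a hard fail when every assertion was skipped.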

package/templates/agents/cbp-task-planner.md CHANGED

@@ -502,6 +502,8 @@ plan.testing_profile: 'claude_only' | 'web' | 'desktop' | 'backend' | 'full_matr
 User may override at round-start via `$ARGUMENTS`. Planner's detection is the default — not a hard gate.
+**E2E eligibility is config-driven at execute time, not here.** `/cbp-round-execute` Step 5 reads `.codebyplan/e2e.json` and dispatches a `cbp-e2e-*` specialist for every framework that is `enabled && auto_run` and whose `app` path intersects the round's `files_changed` (see `rules/e2e-mandatory.md`). `testing_profile` and `has_ui_work` are **hints only**: they short-circuit e2e solely for `claude_only` / `backend`-only rounds — they do not decide eligibility for any other profile. Do not gate e2e on `has_ui_work` in the plan. Optionally, if `.codebyplan/e2e.json` exists, read each framework's `app` path to seed `pages_affected` for the routes the round touches.
 ### Phase 5: Design Solution
 Honor locked decisions. Create solution design with files, integration points.
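
The eligibility rule added in the hunk above (`enabled && auto_run`, `app` path intersecting `files_changed`, with `claude_only` / `backend` as the only short-circuits) can be sketched roughly as follows. The shape of an `.codebyplan/e2e.json` entry is an assumption here, as is reading "intersects" as a path-prefix match; the real dispatch logic in `/cbp-round-execute` may differ.

```typescript
// Assumed shape of one framework entry in .codebyplan/e2e.json.
interface E2eFrameworkConfig {
  framework: string; // e.g. "playwright", "vscode-test"
  enabled: boolean;
  auto_run: boolean;
  app: string;       // app path, e.g. "apps/web"
}

// Returns the frameworks whose specialist would be dispatched for this round.
function eligibleFrameworks(
  configs: E2eFrameworkConfig[],
  filesChanged: string[],
  testingProfile: string,
): string[] {
  // testing_profile is a hint only: it short-circuits these two profiles.
  if (testingProfile === "claude_only" || testingProfile === "backend") {
    return [];
  }
  return configs
    .filter((c) => c.enabled && c.auto_run)
    .filter((c) => {
      const root = c.app.replace(/\/+$/, ""); // drop trailing slashes
      // Assumption: "intersects" means a changed file sits at or under `app`.
      return filesChanged.some((f) => f === root || f.indexOf(root + "/") === 0);
    })
    .map((c) => c.framework);
}
```

Note that `has_ui_work` does not appear in the sketch at all, which matches the instruction not to gate e2e on it.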

package/templates/agents/cbp-testing-qa-agent.md CHANGED

@@ -20,7 +20,7 @@ Single agent that handles non-e2e quality validation in the per-wave validation
 - Apply default production checklist items
 - Detect unrelated issues and missing tests
-E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by `cbp-test-e2e-agent`, spawned in parallel with this agent by `/cbp-round-execute` Step 5. **The two agents are fully independent — this agent does NOT read `round.context.e2e_output` or `round.context.frontend_ui_review`.** This agent emits auto QA items and default checklist items. Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
+E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), spawned in parallel with this agent by `/cbp-round-execute` Step 5. **The agents are fully independent — this agent does NOT read `round.context.e2e_outputs` or `round.context.frontend_ui_review`.** This agent emits auto QA items and default checklist items. Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
 ## Input Contract
@@ -92,7 +92,7 @@ output:
     passed: number
     warnings: number
     failed: number
-    hard_fail: boolean        # true if build/lint/types failed, unit tests (vitest/jest/cargo) failed when applicable, OR npm audit found critical/high vulnerabilities. E2E hard_fail is owned by test-e2e-agent and surfaced via round.context.e2e_output.
+    hard_fail: boolean        # true if build/lint/types failed, unit tests (vitest/jest/cargo) failed when applicable, OR npm audit found critical/high vulnerabilities. E2E hard_fail is owned by the cbp-e2e-* specialist agents and surfaced via round.context.e2e_outputs.
   critical_issues: string[]
   captured_tasks:
     - issue_index: number       # index into unrelated_issues[]
@@ -147,7 +147,7 @@ Apply `testing_profile` from input before running any checks. When `testing_prof
 | full_matrix | Run all checks |
 | cross_app | Run union of touched apps' checks (intersection by detected files) |
-E2E (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is NEVER run by this agent under any profile — it's owned by `cbp-test-e2e-agent` (parallel sibling spawned by `/cbp-round-execute` Step 5).
+E2E (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is NEVER run by this agent under any profile — it's owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`; parallel siblings spawned by `/cbp-round-execute` Step 5).
 **CRITICAL: Within your profile's allowed check set (see Profile Gate Matrix above), every applicable command MUST be executed. No skipping an in-scope check without an explicit, logged reason.**
@@ -187,7 +187,7 @@ Procedure:
 This closes the cycle where R2 adds a flat-config and the QA pass lints only R2 files, only for `/cbp-task-check` to later lint the full task and surface dozens of errors on R1 files — wasting an entire corrective round. Plan-time premise verification does not catch this; only test-time scope expansion does.
-**Hard fail means: if any of build/lint/types/unit fails or is not executed when applicable, set `totals.hard_fail = true`. The round CANNOT complete.** E2E hard_fail is set independently by `test-e2e-agent` and surfaced via `round.context.e2e_output`; `/cbp-round-execute` Step 6 considers both signals.
+**Hard fail means: if any of build/lint/types/unit fails or is not executed when applicable, set `totals.hard_fail = true`. The round CANNOT complete.** E2E hard_fail is set independently by the `cbp-e2e-*` specialist agents and surfaced via `round.context.e2e_outputs`; `/cbp-round-execute` Step 6 considers both signals.
 **Step 3a: Execute conditional unit-test checks (HARD FAIL when applicable):**
@@ -209,7 +209,7 @@ Run the unit-test runners detected in Step 1:
 If condition is met and test fails: set `totals.hard_fail = true`.
 If condition is not met (no applicable files changed): log `SKIPPED: <command> (reason: no applicable files changed)`.
-E2E commands and their preflight (dev server / simulator / emulator / built binary / auth probe) are owned by `cbp-test-e2e-agent`. See `agents/test-e2e-agent.md` Step 6.5 for the canonical preflight contract.
+E2E commands and their preflight (dev server / simulator / emulator / built binary / auth probe) are owned by the `cbp-e2e-*` specialist agents. See `context/testing/e2e.md` for the canonical preflight contract (Step 6.5 and the shared workflow).
 **Step 3b: Execute conditional checks (soft):**
@@ -360,7 +360,7 @@ Return complete output contract.
 - Auto and default QA items generated
 - `hard_fail` flag correctly set
 - **Vitest/Jest/Cargo unit-test hard_fail enforced** when source files changed
-- E2E execution + preflight delegated entirely to `test-e2e-agent` (this agent never runs Playwright/Maestro/wdio/etc.)
+- E2E execution + preflight delegated entirely to the `cbp-e2e-*` specialist agents (this agent never runs Playwright/Maestro/wdio/etc.)
 ## Failure Modes
@@ -373,6 +373,6 @@ Return complete output contract.
 ## Integration
-- **Spawned by**: `/cbp-round-execute` Step 5 (per-wave; runs in parallel with `test-e2e-agent` and may also run in parallel with next wave's executor)
-- **Parallel sibling**: `cbp-test-e2e-agent` (fully independent — no cross-read; both agents complete on their own timeline using only their own inputs)
-- **Output consumed by**: `/cbp-round-execute` Step 6 (hard-fail routing — this agent's `totals.hard_fail` is OR'd with `e2e_output.test_results.failed > 0` and `e2e_output.status === 'failed'`), `/cbp-round-end` Step 3 (reads this agent's `auto_qa[]` and `default_checklist[]`). This agent does not emit `user_qa` items; baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
+- **Spawned by**: `/cbp-round-execute` Step 5 (per-wave; runs in parallel with the `cbp-e2e-*` specialists and may also run in parallel with next wave's executor)
+- **Parallel siblings**: `cbp-e2e-*` specialist agents (fully independent — no cross-read; all agents complete on their own timeline using only their own inputs)
+- **Output consumed by**: `/cbp-round-execute` Step 6 (hard-fail routing — this agent's `totals.hard_fail` is OR'd across `round.context.e2e_outputs` entries: any `e2e_outputs[f].test_results.failed > 0` or `e2e_outputs[f].status === 'failed'`, plus the `e2e_eligible_skipped` signal), `/cbp-round-end` Step 3 (reads this agent's `auto_qa[]` and `default_checklist[]`). This agent does not emit `user_qa` items; baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
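
Reviewer note: the new Step 6 routing described in the added line above can be read as a single OR over three signals. The sketch below is only an illustration of that reading, not the package's implementation. The field names `test_results.failed`, `status`, `totals.hard_fail`, `e2e_outputs`, and `e2e_eligible_skipped` come from the diff; the interface names, the `roundHardFail` helper, the assumption that `e2e_outputs` is keyed by framework, and the assumption that `e2e_eligible_skipped` is a boolean are all hypothetical.

```typescript
// Hypothetical shapes. Only the field names mirror the diff text.
interface E2eOutput {
  status: string; // the diff names only the 'failed' value
  test_results: { failed: number };
}

interface RoundSignals {
  qaHardFail: boolean; // this agent's totals.hard_fail
  e2eOutputs: Record<string, E2eOutput>; // round.context.e2e_outputs (keying by framework is assumed)
  e2eEligibleSkipped: boolean; // assumed to be a boolean flag
}

// Returns true when any of the three signals blocks the round.
function roundHardFail(s: RoundSignals): boolean {
  // An e2e entry counts as failed if it reports failed tests or a failed status.
  const e2eFailed = Object.values(s.e2eOutputs).some(
    (o) => o.test_results.failed > 0 || o.status === "failed"
  );
  return s.qaHardFail || e2eFailed || s.e2eEligibleSkipped;
}
```

Under this reading, one failing specialist is enough to block the round even when the QA agent's own checks pass, which matches the "fully independent, no cross-read" wording: each agent reports only its own result and the combination happens in the consumer.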