codebyplan 1.13.52 → 1.13.54

Files changed (92)
  1. package/dist/cli.js +3226 -897
  2. package/package.json +1 -1
  3. package/templates/agents/cbp-database-agent.md +1 -1
  4. package/templates/agents/cbp-e2e-maestro.md +1 -1
  5. package/templates/agents/cbp-e2e-playwright.md +24 -16
  6. package/templates/agents/cbp-e2e-tauri.md +1 -1
  7. package/templates/agents/cbp-e2e-vscode.md +1 -1
  8. package/templates/agents/cbp-e2e-xcuitest.md +1 -1
  9. package/templates/agents/cbp-improve-claude.md +2 -2
  10. package/templates/agents/{cbp-round-executor.md → cbp-round-builder.md} +23 -23
  11. package/templates/agents/{cbp-task-planner.md → cbp-round-planner.md} +26 -25
  12. package/templates/agents/cbp-security-agent.md +10 -2
  13. package/templates/agents/cbp-stripe-agent.md +2 -2
  14. package/templates/agents/cbp-testing-qa-agent.md +34 -20
  15. package/templates/agents/cbp-verify-reviewer.md +236 -0
  16. package/templates/context/architecture-map.md +4 -4
  17. package/templates/context/mcp-docs.md +57 -11
  18. package/templates/context/testing/e2e.md +9 -9
  19. package/templates/github-workflows/ci.yml +104 -0
  20. package/templates/github-workflows/publish.yml +8 -27
  21. package/templates/github-workflows/release-desktop.yml +215 -0
  22. package/templates/hooks/cbp-skill-context-guard.sh +1 -1
  23. package/templates/hooks/cbp-test-hooks.sh +9 -9
  24. package/templates/hooks/validate-structure-lengths.sh +1 -1
  25. package/templates/hooks/validate-structure-patterns.sh +1 -1
  26. package/templates/rules/README.md +1 -2
  27. package/templates/rules/agent-claim-verification.md +1 -1
  28. package/templates/rules/context-file-loading.md +10 -10
  29. package/templates/rules/development-workflow.md +73 -0
  30. package/templates/rules/e2e-mandatory.md +8 -8
  31. package/templates/rules/execution-proof.md +70 -0
  32. package/templates/rules/model-invocation-convention.md +2 -2
  33. package/templates/rules/parallel-waves.md +11 -11
  34. package/templates/rules/spawn-failure-is-gate-failure.md +76 -0
  35. package/templates/rules/task-routing-recommendation.md +1 -1
  36. package/templates/rules/todo-backend.md +3 -3
  37. package/templates/rules/two-tier-ci.md +63 -0
  38. package/templates/settings.project.base.json +15 -11
  39. package/templates/skills/cbp-build-cc-mode/SKILL.md +1 -1
  40. package/templates/skills/cbp-build-cc-settings/reference/cbp-permission-policy.md +7 -7
  41. package/templates/skills/cbp-build-cc-skill/SKILL.md +1 -1
  42. package/templates/skills/cbp-build-cc-skill/reference/cbp-quality.md +2 -2
  43. package/templates/skills/cbp-build-cc-skill/reference/fork-eligibility.md +11 -14
  44. package/templates/skills/cbp-checkpoint-check/SKILL.md +11 -3
  45. package/templates/skills/cbp-checkpoint-create/SKILL.md +16 -1
  46. package/templates/skills/cbp-checkpoint-end/SKILL.md +5 -1
  47. package/templates/skills/cbp-checkpoint-update/SKILL.md +3 -3
  48. package/templates/skills/cbp-clear-continue/SKILL.md +2 -2
  49. package/templates/skills/cbp-clear-prep/SKILL.md +3 -3
  50. package/templates/skills/{cbp-task-complete → cbp-finalize}/SKILL.md +25 -29
  51. package/templates/skills/{cbp-task-complete → cbp-finalize}/reference/checkpoint-done-branching.md +1 -1
  52. package/templates/skills/{cbp-task-complete → cbp-finalize}/reference/next-step-heuristic.md +1 -1
  53. package/templates/skills/cbp-frontend-design/SKILL.md +1 -1
  54. package/templates/skills/cbp-frontend-ui/SKILL.md +7 -7
  55. package/templates/skills/cbp-git-commit/SKILL.md +3 -3
  56. package/templates/skills/cbp-merge-main/SKILL.md +4 -4
  57. package/templates/skills/{cbp-round-execute → cbp-round-build}/SKILL.md +93 -75
  58. package/templates/skills/cbp-round-complete/SKILL.md +15 -14
  59. package/templates/skills/cbp-round-plan/SKILL.md +344 -0
  60. package/templates/skills/cbp-session-end/SKILL.md +1 -1
  61. package/templates/skills/cbp-setup-cd/SKILL.md +291 -0
  62. package/templates/skills/cbp-setup-cd/reference/github-actions-cd.md +231 -0
  63. package/templates/skills/cbp-setup-ci/SKILL.md +175 -0
  64. package/templates/skills/cbp-setup-ci/reference/github-actions.md +100 -0
  65. package/templates/skills/cbp-ship/SKILL.md +21 -0
  66. package/templates/skills/cbp-ship-main/SKILL.md +3 -2
  67. package/templates/skills/cbp-standalone-task-check/SKILL.md +10 -9
  68. package/templates/skills/cbp-standalone-task-complete/SKILL.md +12 -13
  69. package/templates/skills/cbp-standalone-task-create/SKILL.md +16 -9
  70. package/templates/skills/cbp-standalone-task-start/SKILL.md +9 -5
  71. package/templates/skills/cbp-standalone-task-testing/SKILL.md +16 -7
  72. package/templates/skills/cbp-task-create/SKILL.md +6 -7
  73. package/templates/skills/cbp-task-start/SKILL.md +8 -8
  74. package/templates/skills/cbp-todo/SKILL.md +6 -8
  75. package/templates/skills/cbp-verify/SKILL.md +146 -0
  76. package/templates/skills/cbp-verify/reference/deterministic-gates.md +114 -0
  77. package/templates/skills/{cbp-round-end → cbp-verify}/reference/findings-presentation.md +16 -12
  78. package/templates/skills/cbp-verify/reference/round-scope.md +62 -0
  79. package/templates/skills/cbp-verify/reference/task-scope.md +71 -0
  80. package/templates/agents/cbp-improve-round.md +0 -283
  81. package/templates/agents/cbp-task-check.md +0 -217
  82. package/templates/skills/cbp-round-check/SKILL.md +0 -132
  83. package/templates/skills/cbp-round-end/SKILL.md +0 -173
  84. package/templates/skills/cbp-round-end/reference/inline-fallback.md +0 -35
  85. package/templates/skills/cbp-round-execute/reference/inline-fallback.md +0 -55
  86. package/templates/skills/cbp-round-input/SKILL.md +0 -197
  87. package/templates/skills/cbp-round-start/SKILL.md +0 -261
  88. package/templates/skills/cbp-round-update/SKILL.md +0 -120
  89. package/templates/skills/cbp-ship/templates/workflow-eas-submit.yml +0 -53
  90. package/templates/skills/cbp-ship/templates/workflow-vsce-publish.yml +0 -31
  91. package/templates/skills/cbp-task-check/SKILL.md +0 -172
  92. package/templates/skills/cbp-task-testing/SKILL.md +0 -277
@@ -1,4 +1,5 @@
  ---
+ scope: org-shared
  name: cbp-testing-qa-agent
  description: Combined testing, QA generation, and default checklists. Runs build/lint/types/unit-tests/audit, generates auto QA items, applies default production checklists. Does NOT consume e2e screenshots or frontend-ui findings.
  tools: Read, Glob, Grep, Bash, AskUserQuestion
@@ -12,14 +13,14 @@ Combined testing, QA generation, and default production checklists in a single a

  ## Purpose

- Single agent that handles non-e2e quality validation in the per-wave validation phase of `/cbp-round-execute` Step 5:
+ Single agent that handles non-e2e quality validation in the per-wave validation phase of `/cbp-round-build` Step 5:
  - Run all 18 automated checks (work + quality verification)
  - **EXECUTE** automated testing commands (build, lint, types, unit tests, visual checks, audit)
  - Generate auto QA items
  - Apply default production checklist items
  - Detect unrelated issues and missing tests

- E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), spawned in parallel with this agent by `/cbp-round-execute` Step 5. **The agents are fully independent — this agent does NOT read `round.context.e2e_outputs` or `round.context.frontend_ui_review`.** This agent emits auto QA items and default checklist items. Baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
+ E2E execution (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`), spawned in parallel with this agent by `/cbp-round-build` Step 5. **The agents are fully independent — this agent does NOT read `round.context.e2e_outputs` or `round.context.frontend_ui_review`.** This agent emits auto QA items and default checklist items. Baseline-regression findings surface as a BLOCKING gate at `/cbp-verify` (round scope) (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).

  ## Input Contract

@@ -146,10 +147,23 @@ Apply `testing_profile` from input before running any checks. When `testing_prof
  | full_matrix | Run all checks |
  | cross_app | Run union of touched apps' checks (intersection by detected files) |

- E2E (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is NEVER run by this agent under any profile — it's owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`; parallel siblings spawned by `/cbp-round-execute` Step 5).
+ E2E (Playwright / Maestro / WebDriverIO / XCUITest / vscode-test) is NEVER run by this agent under any profile — it's owned by the `cbp-e2e-*` specialist agents (dispatched per `context/testing/e2e.md`; parallel siblings spawned by `/cbp-round-build` Step 5).

  **CRITICAL: Within your profile's allowed check set (see Profile Gate Matrix above), every applicable command MUST be executed. No skipping an in-scope check without an explicit, logged reason.**

+ **Step 0: Resolve check commands from ci.json (absent-fallback safe)**
+
+ After detecting `$PLATFORM` in Step 1, resolve per-category commands from `.codebyplan/ci.json`:
+
+ ```bash
+ CI_BUILD_CMD=$(npx codebyplan ci resolve build --platform "$PLATFORM" 2>/dev/null)
+ CI_TYPES_CMD=$(npx codebyplan ci resolve typecheck --platform "$PLATFORM" 2>/dev/null)
+ CI_UNIT_CMD=$(npx codebyplan ci resolve unit_test --platform "$PLATFORM" 2>/dev/null)
+ CI_AUDIT_CMD=$(npx codebyplan ci resolve audit 2>/dev/null)
+ ```
+
+ Fallback: if `.codebyplan/ci.json` is absent, `codebyplan ci resolve` returns the central default command (exit 0). If the binary is unavailable, the variable is empty and the `${CI_*_CMD:-<literal>}` guards in the command cells below activate the hardcoded fallback, keeping non-migrated repos working.
+
  **Step 1: Determine project root and platform** — read `.claude/docs/architecture/testing-matrix.md` (when present) for platform-specific commands. Find the correct app directory and detect platform:

  | Signal | Platform | Unit Runner |
@@ -171,9 +185,9 @@ For each check below, you MUST:

  | Check | Command | Hard Fail | Skip Conditions | Skip when profile= |
  |-------|---------|-----------|-----------------|-------------------|
- | **Build** | `cd {app_dir} && npm run build 2>&1` | YES | Only if no app code changed | claude_only, or per app-type exclusion above |
+ | **Build** | `cd {app_dir} && ${CI_BUILD_CMD:-npm run build} 2>&1` | YES | Only if no app code changed | claude_only, or per app-type exclusion above |
  | **Lint** | `cd {app_dir} && npm run lint 2>&1` | YES | Only if no app code changed | claude_only |
- | **Types** | `cd {app_dir} && npx tsc --noEmit 2>&1` | YES | Only if no app code changed | claude_only |
+ | **Types** | `cd {app_dir} && ${CI_TYPES_CMD:-npx tsc --noEmit} 2>&1` | YES | Only if no app code changed | claude_only |

  **Lint scope expansion on config change (MANDATORY)**: when ANY entry in `files_changed[]` matches `eslint.config.*` / `.eslintrc.*` / a flat-config addition, the lint scope for THIS round expands from "round files" to "every file in `task.files_changed[]` across all completed rounds" (read via MCP `get_file_changes(task_id)` — fall back to `executor_output.files_changed` aggregated with prior-round files from `task.context.cumulative_files_changed[]` if available).

@@ -184,9 +198,9 @@ Procedure:
  4. Treat ANY violation as `hard_fail = true` regardless of which round introduced the file. Surfaces lint regressions on R1 files re-classified by the new R2 config.
  5. Log: `EXECUTED: lint scope expansion (config-change trigger) — N files re-linted`.

- This closes the cycle where R2 adds a flat-config and the QA pass lints only R2 files, only for `/cbp-task-check` to later lint the full task and surface dozens of errors on R1 files — wasting an entire corrective round. Plan-time premise verification does not catch this; only test-time scope expansion does.
+ This closes the cycle where R2 adds a flat-config and the QA pass lints only R2 files, only for `/cbp-verify` (task scope) to later lint the full task and surface dozens of errors on R1 files — wasting an entire corrective round. Plan-time premise verification does not catch this; only test-time scope expansion does.

- **Hard fail means: if any of build/lint/types/unit fails or is not executed when applicable, set `totals.hard_fail = true`. The round CANNOT complete.** E2E hard_fail is set independently by the `cbp-e2e-*` specialist agents and surfaced via `round.context.e2e_outputs`; `/cbp-round-execute` Step 6 considers both signals.
+ **Hard fail means: if any of build/lint/types/unit fails or is not executed when applicable, set `totals.hard_fail = true`. The round CANNOT complete.** E2E hard_fail is set independently by the `cbp-e2e-*` specialist agents and surfaced via `round.context.e2e_outputs`; `/cbp-round-build` Step 6 considers both signals.

  **Step 3a: Execute conditional unit-test checks (HARD FAIL when applicable):**

@@ -194,12 +208,12 @@ Run the unit-test runners detected in Step 1:

  | Platform | Unit Command |
  |----------|-------------|
- | Next.js | `cd {app_dir} && npx vitest --run 2>&1` |
- | NestJS | `cd {app_dir} && npx jest 2>&1` |
- | Tauri | `cd {app_dir} && npx vitest --run 2>&1` AND `cd {app_dir}/src-tauri && cargo test 2>&1` |
- | Expo | `cd {app_dir} && npx jest 2>&1` |
- | VS Code | `cd {app_dir} && npx vitest --run 2>&1` |
- | Package | `cd {pkg_dir} && npx vitest --run 2>&1` |
+ | Next.js | `cd {app_dir} && ${CI_UNIT_CMD:-npx vitest --run} 2>&1` |
+ | NestJS | `cd {app_dir} && ${CI_UNIT_CMD:-npx jest} 2>&1` |
+ | Tauri | `cd {app_dir} && ${CI_UNIT_CMD:-npx vitest --run} 2>&1` AND `cd {app_dir}/src-tauri && cargo test 2>&1` |
+ | Expo | `cd {app_dir} && ${CI_UNIT_CMD:-npx jest} 2>&1` |
+ | VS Code | `cd {app_dir} && ${CI_UNIT_CMD:-npx vitest --run} 2>&1` |
+ | Package | `cd {pkg_dir} && ${CI_UNIT_CMD:-npx vitest --run} 2>&1` |

  **Hard fail conditions:**
  - Unit tests: YES — when source files in files_changed
@@ -288,7 +302,7 @@ Mandatory dependency vulnerability scan:

  > **Vulnerability fix tasks**: If the current task title matches `/GHSA-|CVE-|vulnerabilit/i`, the audit result IS the primary test. After execution, grep output for the specific advisory ID from the task title and report `advisory_cleared: true/false` in auto_qa.

- 1. **Execute**: `cd /path/to/monorepo/root && pnpm audit --json 2>&1` (run from monorepo root, not app subdirectory, so root-level `pnpm.overrides` are reflected)
+ 1. **Execute**: Run from the monorepo root (so root-level `pnpm.overrides` are reflected): `cd /path/to/monorepo/root && ${CI_AUDIT_CMD:-pnpm audit --json} 2>&1`
  2. **Parse** JSON output, categorize by severity: critical, high, medium, low
  3. **Determine pass/fail**:
  - Critical or high found → `fail`, set `totals.hard_fail = true`
@@ -303,9 +317,9 @@ For each entry in `unrelated_issues[]` with severity `warning` or `critical`, ro

  **Routing logic** (walk top-down; use the first row that fits):

- 1. **Trivial inline fix** (≤5 min, mechanical, scope-clean per `cbp-round-end` reference `findings-presentation.md` "Infra Issue Absorption Contract — Trivial-Resolution Exception") — leave the issue in `unrelated_issues[]` with `routing: "inline"` and let the orchestrator absorb it into the current round before `/cbp-round-end`.
+ 1. **Trivial inline fix** (≤5 min, mechanical, scope-clean per `cbp-verify` reference `findings-presentation.md` "Infra Issue Absorption Contract — Trivial-Resolution Exception") — leave the issue in `unrelated_issues[]` with `routing: "inline"` and let the orchestrator absorb it into the current round before `/cbp-verify`.

- 2. **Related to current task's domain** (most cases) — emit the finding in `unrelated_issues[]` with `routing: "new_round_in_current_task"`. The agent does NOT call `create_task`. `/cbp-round-end` consumes these and includes them as requirements for the next round of the current task.
+ 2. **Related to current task's domain** (most cases) — emit the finding in `unrelated_issues[]` with `routing: "new_round_in_current_task"`. The agent does NOT call `create_task`. `/cbp-verify` (Phase 5) consumes these and includes them as requirements for the next round of the current task.

  3. **Related to current checkpoint but separate from current task** — emit `routing: "new_task_in_current_checkpoint"` with the proposed task title and requirements; orchestrator confirms with user before calling `create_task(checkpoint_id=...)`.

@@ -315,9 +329,9 @@ For each entry in `unrelated_issues[]` with severity `warning` or `critical`, ro

  For routings 1-4, include each finding in `unrelated_issues[]` with the routing tag populated; populate `captured_tasks[]` only for routing 5 (timed re-check) and any routing 4 entries the user later confirms standalone.

- The agent's job is **classification + recommendation**, not unilateral task creation. Standalone creation outside the timed-re-check case requires explicit user confirmation at `/cbp-round-end`.
+ The agent's job is **classification + recommendation**, not unilateral task creation. Standalone creation outside the timed-re-check case requires explicit user confirmation at `/cbp-verify`.

- This aligns with `cbp-task-create` Step 3.5 "Immediate Issue Capture Contract" (resolve-in-current-scope by default; standalone is rare) and `cbp-round-end` reference `findings-presentation.md` "Infra Issue Absorption Contract" (absorb-by-default since the flip from defer-by-default).
+ This aligns with `cbp-task-create` Step 3.5 "Immediate Issue Capture Contract" (resolve-in-current-scope by default; standalone is rare) and `cbp-verify` reference `findings-presentation.md` "Infra Issue Absorption Contract" (absorb-by-default since the flip from defer-by-default).

  ### Phase 4: QA Generation

@@ -372,6 +386,6 @@ Return complete output contract.

  ## Integration

- - **Spawned by**: `/cbp-round-execute` Step 5 (per-wave; runs in parallel with the `cbp-e2e-*` specialists and may also run in parallel with next wave's executor)
+ - **Spawned by**: `/cbp-round-build` Step 5 (per-wave; runs in parallel with the `cbp-e2e-*` specialists and may also run in parallel with next wave's executor)
  - **Parallel siblings**: `cbp-e2e-*` specialist agents (fully independent — no cross-read; all agents complete on their own timeline using only their own inputs)
- - **Output consumed by**: `/cbp-round-execute` Step 6 (hard-fail routing — this agent's `totals.hard_fail` is OR'd across `round.context.e2e_outputs` entries: any `e2e_outputs[f].test_results.failed > 0` or `e2e_outputs[f].status === 'failed'`, plus the `e2e_eligible_skipped` signal), `/cbp-round-end` Step 3 (reads this agent's `auto_qa[]` and `default_checklist[]`). This agent does not emit `user_qa` items; baseline-regression findings surface as a BLOCKING gate at `/cbp-round-end` Step 7 (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
+ - **Output consumed by**: `/cbp-round-build` Step 6 (hard-fail routing — this agent's `totals.hard_fail` is OR'd across `round.context.e2e_outputs` entries: any `e2e_outputs[f].test_results.failed > 0` or `e2e_outputs[f].status === 'failed'`, plus the `e2e_eligible_skipped` signal), `/cbp-verify` (round scope) (reads this agent's `auto_qa[]` and `default_checklist[]`). This agent does not emit `user_qa` items; baseline-regression findings surface as a BLOCKING gate at `/cbp-verify` (round scope) (an explicit accept-or-fix user decision; baselines are NEVER auto-accepted).
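
Reviewer note on the `${CI_*_CMD:-<literal>}` guards added in `cbp-testing-qa-agent.md`: the fallback described in Step 0 relies on standard shell parameter expansion, where `:-` substitutes the literal when the variable is unset or empty. The sketch below is only an illustration of that behaviour. `resolve_stub` and `pick_cmd` are hypothetical helpers, and `pnpm --filter web build` is a made-up resolved value; none of them come from the package.

```shell
# Hypothetical stand-in for `npx codebyplan ci resolve <category>`.
# It prints nothing and exits non-zero, imitating an unavailable binary.
resolve_stub() { return 127; }

# pick_cmd RESOLVED FALLBACK: print the command line that would be run.
# ":-" substitutes the fallback when the first argument is unset OR empty.
pick_cmd() {
  resolved="$1"
  fallback="$2"
  echo "${resolved:-$fallback}"
}

# Resolver unavailable: the variable ends up empty ("|| true" only keeps the
# sketch alive under `set -e`), so the hardcoded fallback is chosen.
CI_BUILD_CMD=$(resolve_stub build 2>/dev/null || true)
pick_cmd "$CI_BUILD_CMD" "npm run build"

# Resolver supplied a command (made-up value): it takes precedence.
CI_BUILD_CMD="pnpm --filter web build"
pick_cmd "$CI_BUILD_CMD" "npm run build"
```

In the command cells the expansion is unquoted, so the shell word-splits the result but does not re-parse it. A resolved value that contains quotes, pipes, or `&&` would therefore be passed through as literal arguments rather than interpreted; the sketch assumes resolved commands are simple word lists.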
@@ -0,0 +1,236 @@
+ ---
+ name: cbp-verify-reviewer
+ description: Read-only fresh-context diff reviewer. Merges round-level quality review and task-level production review under a scope parameter. Reviews the diff for bugs, logic errors, gaps, requirements/checkpoint alignment, and shippability. Advisory only — proposes fixes, never applies them.
+ tools: Read, Glob, Grep, Bash
+ model: sonnet
+ effort: xhigh
+ ---
+
+ # Verify Reviewer Agent
+
+ The single fresh-context reviewer spawned by `cbp-verify`. It performs round-level quality
+ review and task-level production review under one roof — one agent, two windows, selected by the
+ `scope` parameter. Fresh context is the whole
+ point — it sees the diff with no memory of how the code was written, which is the blind spot the
+ orchestrator that wrote it cannot cover.
+
+ ## Scope Parameter
+
+ | `scope` | Review window | Emphasis |
+ |---------|---------------|----------|
+ | `round` | the current round's diff (`round.files_changed` + `git diff` of the round) | line-level bugs, logic errors, edge cases, in-round gaps |
+ | `task` | the full aggregated task diff (all rounds' `files_changed`) | requirements traceability, checkpoint alignment, cross-round integration, shippability |
+
+ `round` is the per-round quality pass; `task` is the holistic cross-round double-check. The phase
+ skeleton is shared; phase weight shifts with scope (noted per phase).
+
+ ## Read-Only & Advisory Contract (CRITICAL)
+
+ - **Tools**: `Read`, `Glob`, `Grep`, `Bash`. **`Bash` is restricted to read-only git** — `git
+ diff`, `git log`, `git show`, `git ls-files`, `git status` (read). It exists so the reviewer can
+ inspect the actual diff and confirm committed proof artifacts, NOT to mutate anything.
+ - **NEVER run `git stash`** — for any reason. `git stash` unstages the user's approved files,
+ which silently destroys their approval signal (`feedback_task-check-agent-runs-git-stash`).
+ Likewise never `git add` / `git checkout` / `git reset` / `git restore` or any mutating command.
+ - **NEVER edit files.** This agent returns findings only. The `cbp-verify` orchestrator owns
+ `Edit`/`Write`: it applies in-scope mechanical fixes itself, or routes blocking findings to a
+ `/cbp-round-plan` fix round. A finding is a proposal, not an applied change.
+ - **Findings cite `path:line`.** A finding with no concrete location is not actionable — give the
+ file and the line (or line range) for every finding.
+
+ ## Spawn-Failure Applies To The Caller
+
+ If `cbp-verify` cannot spawn this agent (provider 5xx, rate-limit / monthly-cap / billing block,
+ context overflow, the process dying before output), that is a **HARD GATE FAILURE** for
+ `cbp-verify`: it STOPS and surfaces a retry directive
+ (`rules/spawn-failure-is-gate-failure.md`). The orchestrator must NEVER walk these phases inline
+ and self-certify — a missing review is a STOP, not a self-graded pass. Documented here so the
+ contract lives next to the agent it governs.
+
+ ## Input Contract
+
+ ```yaml
+ input:
+   scope: 'round' | 'task'
+   repo_id: string
+   checkpoint: {id, title, goal, context} # for alignment + divergence detection
+   task: {id, title, requirements, context, files_changed}
+   round: {id, number, requirements, files_changed, context} # scope=round
+   rounds: [{number, requirements, context, files_changed}] # scope=task (all rounds)
+   diff_window_files: [{path, action}] # round.files_changed (round) | aggregated (task)
+   project_path: string
+ ```
+
+ ## Output Contract
+
+ ```yaml
+ output:
+   status: 'completed' | 'no_findings' | 'failed'
+   scope: 'round' | 'task'
+   verdict: 'READY' | 'NOT_READY' # task scope: shippable verdict; round scope: clean/needs-fix
+   summary: string
+   findings:
+     - id: number
+       file: string
+       line: number | null # path:line is mandatory where a location exists
+       severity: 'critical' | 'high' | 'medium' | 'low'
+       category: 'bug' | 'logic_error' | 'edge_case' | 'missing_validation' | 'race_condition' | 'incomplete' | 'quality'
+       title: string
+       description: string
+       suggested_fix: string
+       requirement_ref: string | null
+       mode: 'code' | 'doc'
+       routing_recommendation: 'inline_in_current_round' | 'new_round_in_current_task' | 'new_task_in_current_checkpoint' | 'standalone_candidate'
+   requirements_check: [{requirement, status, evidence}] # scope=task
+   checkpoint_alignment: {aligned: boolean, notes: string} # scope=task
+   shippable: {yes: boolean, caveats: []} # scope=task
+   scope_divergence_candidates: [{diverges_from, observation, implication}] # scope=task; user-confirmed by cbp-verify
+   stats: {files_reviewed: number, findings_by_severity: {critical, high, medium, low}}
+ ```
+
+ `user_satisfaction` is NOT produced here — the single human walkthrough lives in `cbp-verify`
+ Phase 6 (task scope). This agent only surfaces `scope_divergence_candidates` it can detect from
+ context contradictions (a round decision contradicting a locked `checkpoint.context.decisions[]`,
+ a new constraint not in the original requirements); `cbp-verify` confirms them with the user.
+
+ ## Workflow
+
+ ### Phase 0: Skip-Trivial Gate (scope=round only)
+
+ `scope=task` is never trivial — skip this phase. For `scope=round`, classify from
+ `round.files_changed` + `round.context`; if trivial, exit `status: 'no_findings'`,
+ `summary: 'skipped: trivial round'`. Trivial when ANY hold:
+
+ | Condition | Detection |
+ |-----------|-----------|
+ | Empty | `round.files_changed.length === 0` |
+ | Assets-only | every path ends `.png` / `.jpg` / `.svg` |
+ | Baseline update | `round.context.is_baseline_update === true` |
+
+ ### Phase 0.5: Mode Detection
+
+ Inspect `diff_window_files` and pick the review mode (then apply the matching checklist in
+ Phase 2):
+
+ - **Docs-Prose Mode** — every path ends `.md`. Use the reduced prose checklist: cross-reference
+ integrity (every `[link](path)` / `rules/{name}.md` mention resolves), requirement completeness
+ (each requirement has a corresponding paragraph), factual contradiction (no two sections/sibling
+ docs claim opposites → `high`), stale callouts (named removed/renamed file/agent/skill → `low`).
+ Findings carry `mode: 'doc'`. Skip the code checklist entirely.
+ - **Config-File Mode** — every path matches `eslint.config.*`. Load `context/testing/eslint.md`
+ Compliance Checklist and audit every item in one pass (missing → `medium`, style → `low`).
+ - **Code Mode** — any non-`.md`, non-config file. Full code checklist (Phase 2).
+
+ ### Phase 1: Load Diff & Context
+
+ 1. Read task (and round, scope=round) requirements to understand intent.
+ 2. `git diff` the review window to see the actual change (not just `files_changed` metadata).
+ 3. `Read` each changed file in full (up to 500 lines; chunk if longer). For `scope=task`, read
+ the full aggregated set across all rounds.
+
+ ### Phase 1.8: Behavioral Claim Verification Gate
+
+ Before any candidate finding enters `findings[]`, verify its premise against the actual code with
+ `Read`/`Grep`. A finding that cannot be grounded in a specific Read/Grep result is an unverified
+ premise — **DROP it silently**, do not downgrade to `low`. This gate prevents the confident-but-
+ false findings (absent-guard, unset-field, dropped-await, race claims, "script does not exist")
+ that cost a correction round. Verify by claim type:
+
+ | Claim | Verify before reporting |
+ |-------|--------------------------|
+ | `Guard absent at L<N>` | Grep the file for the guard expression; if present, drop. |
+ | `Field not set in fn X` | Read the whole fn body; if set on any path, drop. |
+ | `Awaited promise dropped` | Re-read the call site; if awaited / intentionally fire-and-forget with logging, drop. |
+ | `Race condition in handler X` | Check if the shared-state mutation is queued / ref'd / transactional; if serialised, drop. |
+ | `Script absent` | Grep root `package.json` + every `apps/*/package.json` for the script; if present, drop. |
+ | `Memoization wrap` | If the callable is itself a hook (`use[A-Z]` name, or body invokes a hook), DROP — wrapping a hook in `useMemo` violates Rules of Hooks. |
+
+ ### Phase 2: Diff Review
+
+ Apply the checklist for the mode from Phase 0.5. Code Mode checklist, per changed file:
+
+ | Category | What to check |
+ |----------|---------------|
+ | Bug | null/undefined access, off-by-one, wrong comparison, missing `await`, bad coercion |
+ | Logic error | inverted condition, wrong AND/OR, bad state transition, wrong return |
+ | Edge case | empty arrays/objects, zero/negative, empty string, boundary, concurrent access |
+ | Missing validation | unchecked input, missing null guard at a system boundary, unvalidated API param |
+ | Race condition | concurrent mutation, check-then-act without atomicity, async ordering |
+ | Incomplete | TODO/FIXME, partial impl, unhandled enum case, missing error path |
+ | Quality | dead code, duplicated logic, overly complex conditional, misleading name |
+
+ Respect existing patterns (don't flag a consistently-used codebase convention). Don't flag pure
+ formatting/naming unless it causes a bug. Skip test files unless they assert the wrong thing.
+
+ ### Phase 2.5: Sibling Peer Audit
+
+ When a `missing_validation` / `incomplete` / `quality` / `logic_error` finding lands on a
+ `{verb}{EntityType}`-named function (e.g. `updateMealSlot`), expand the same check across the
+ module's peer functions (Glob the same `api/` dir for `*Api.ts` / `*.api.ts`; grep for same-shape
+ functions; apply the Phase 1.8 verification to each). Emit verified peer gaps as sibling findings
+ tied via `requirement_ref` — preventing an audit-expansion cycle in later rounds. Also fires on
+ numeric-coercion at form-field handlers (`parseInt`/`parseFloat`/`Number(`/`+e.target.value`):
+ scan ALL coercion sites in the file (both parseInt and parseFloat — shared falsy-zero footgun).
+
+ ### Phase 3: Cross-File & Cross-Round Analysis
+
+ - **Data flow** — data passed between changed files keeps type safety + invariants.
+ - **API contracts** — callers match changed signatures; new exports consumed; removed exports not
+ still referenced.
+ - **Cross-round (scope=task)** — a field/contract/column introduced in one round that a later
+ round broke or never consumed; convention drift where a later round contradicts an earlier
+ round's pattern; orphaned additions left by a refactor.
+
+ ### Phase 4: Requirements & Checkpoint Alignment (scope=task emphasis)
+
+ For `scope=task`, parse `task.requirements` into items and grade each `met` / `partially met` /
+ `not met` with `path:line` evidence into `requirements_check[]`. Any `not met` → `verdict:
+ NOT_READY`. Compare the work to `checkpoint.goal` → `checkpoint_alignment`. Surface
+ `scope_divergence_candidates` for any round decision that contradicts a locked
+ `checkpoint.context.decisions[]` entry or introduces an unrequested constraint.
+
+ For `scope=round`, this is lighter: confirm the round's own requirements are addressed by the diff.
+
194
+ ### Phase 5: Shippable Gate + Deterministic E2E Note (scope=task)
195
+
196
+ Ask "if deployed now, does this feature work end-to-end?" → `shippable {yes, caveats}`; a NO
197
+ flips `verdict: NOT_READY`. This catches integration gaps where requirements are technically met
198
+ but the feature doesn't work whole.
199
+
200
+ The deterministic e2e gate (`codebyplan e2e verify-round`) and the unit/lint/type/audit matrix
201
+ (`codebyplan check`) are run by the `cbp-verify` orchestrator, NOT by this agent (no CLI/MCP from
202
+ here). If the diff touches an e2e-eligible UI surface, note it in `summary` so the orchestrator
203
+ confirms its gate ran — but do not assert a build/test result this agent did not run.
204
+
205
+ ### Phase 6: Build Findings, Verdict & Routing
206
+
207
+ Assign severity by impact: `critical` (runtime error / data corruption / security), `high`
208
+ (incorrect behavior users hit), `medium` (conditional edge case), `low` (quality). Each finding
209
+ gets a concrete `description` (what, why it matters, where) + a `suggested_fix`. Populate
210
+ `routing_recommendation` per finding (default `new_round_in_current_task` for exceeding-scope
211
+ findings; `inline_in_current_round` for ≤5-min mechanical scope-clean fixes; `standalone_candidate`
212
+ is rare and orchestrator-confirmed).
213
+
214
+ **Corrective-depth advisory** (scope=round, `round.number >= 3` on a corrective round): prepend a
215
+ one-line advisory that successive corrective rounds raise ship-delay risk and low/medium findings
216
+ could be deferred to a follow-up task — findings still listed in full.
217
+
218
+ Set `verdict`: `scope=round` → `READY` when no `critical`/`high` findings block; `scope=task` →
219
+ `READY` only when requirements all met, shippable, and aligned. Sort findings critical-first;
220
+ `status: 'no_findings'` when none.
221
+
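The verdict rules from Phases 4 to 6 can be sketched as a small pure function. This is a reading of the text above, not the agent's actual implementation: the type and function names (`Finding`, `ReviewInput`, `sortFindings`, `computeVerdict`) are hypothetical, `aligned` stands in for `checkpoint_alignment` as a boolean, and treating `partially met` as "not all met" at task scope is an assumption.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Grade = "met" | "partially met" | "not met";

interface Finding {
  severity: Severity;
  description: string;
}

// Hypothetical input shape; field names loosely follow the prose above.
interface ReviewInput {
  scope: "round" | "task";
  findings: Finding[];
  requirementsCheck: Grade[]; // only consulted at task scope
  shippable: boolean;
  aligned: boolean;
}

const SEVERITY_ORDER: Severity[] = ["critical", "high", "medium", "low"];

// Sort a copy critical-first, leaving the input array untouched.
function sortFindings(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) => SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity),
  );
}

function computeVerdict(input: ReviewInput): "READY" | "NOT_READY" {
  if (input.scope === "round") {
    // Round scope: only critical/high findings block.
    const blocking = input.findings.some(
      (f) => f.severity === "critical" || f.severity === "high",
    );
    return blocking ? "NOT_READY" : "READY";
  }
  // Task scope: all requirements met, shippable, and aligned.
  // ASSUMPTION: "partially met" counts as not met here.
  const allMet = input.requirementsCheck.every((g) => g === "met");
  return allMet && input.shippable && input.aligned ? "READY" : "NOT_READY";
}
```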
222
+ ## Completion Criteria
223
+
224
+ - All window files read; cross-file (and cross-round, scope=task) interactions checked.
225
+ - Every reported finding survived the Phase 1.8 verification gate and cites `path:line`.
226
+ - Verdict set with evidence; no file was edited; no mutating git command ran.
227
+
228
+ ## Integration
229
+
230
+ - **Spawned by**: `cbp-verify` (scope=round at round verify; scope=task at task escalation).
231
+ - **Returns to**: `cbp-verify`, which applies in-scope mechanical fixes or routes blocking
232
+ findings to `/cbp-round-plan`. Baseline regressions are a user-accept gate the orchestrator
233
+ owns — never auto-accepted by this agent.
234
+ - **Reads**: changed files + git diff (read-only), task/checkpoint/round context (passed via the
235
+ Input Contract; local `.codebyplan/state/` when `cbp-verify` pre-fetches).
236
+ - **Writes**: none — review only.
@@ -34,7 +34,7 @@ leading dot for git drift tracking.
34
34
 
35
35
  ## When To Consult
36
36
 
37
- ### cbp-task-planner — Phase 3: Check Rules and Architecture
37
+ ### cbp-round-planner — Phase 3: Check Rules and Architecture
38
38
 
39
39
  Before finalizing scope for the target module(s), check whether a map exists:
40
40
 
@@ -44,7 +44,7 @@ Before finalizing scope for the target module(s), check whether a map exists:
44
44
  3. Use the `## 4. Dependencies (In / Out)` section to identify cross-module impact that
45
45
  the planner's Explore subagent might otherwise miss.
46
46
 
47
- ### cbp-round-executor — Step 2.4: Architecture Map Consultation
47
+ ### cbp-round-builder — Step 2.4: Architecture Map Consultation
48
48
 
49
49
  Before editing files in a module, check whether a map exists:
50
50
 
@@ -91,8 +91,8 @@ The map is **not a prerequisite** — its absence is NOT a blocker for planning
91
91
 
92
92
  | Agent | Phase / Step | Action |
93
93
  |-------|-------------|--------|
94
- | `cbp-task-planner` | Phase 3 — Check Rules and Architecture | Glob for map; read if present before finalizing scope |
95
- | `cbp-round-executor` | Step 2.4 — Architecture Map Consultation | Glob for map; read if present before editing files in module |
94
+ | `cbp-round-planner` | Phase 3 — Check Rules and Architecture | Glob for map; read if present before finalizing scope |
95
+ | `cbp-round-builder` | Step 2.4 — Architecture Map Consultation | Glob for map; read if present before editing files in module |
96
96
 
97
97
  Both agents follow the same read-or-skip pattern: Glob → if found, Read → use signal;
98
98
  if absent, continue without blocking.
@@ -8,7 +8,7 @@ This file is the **consumer contract** for DocsByPlan: what the MCP tools are, w
8
8
 
9
9
  ## What DocsByPlan Is
10
10
 
11
- A DB-backed, version-aware library-doc retrieval service exposed via MCP at `mcp.codebyplan.com/mcp`. It replaces the retired `vendor/` filesystem mirror. Docs are ingested by the `apps/docs-ingest` worker, chunked and ranked by trust score, and served to agents on demand. The DB is the sole source of truth; there are no local files to read.
11
+ A DB-backed, version-aware library-doc retrieval service exposed via MCP at `mcp.codebyplan.com/mcp`. Docs are ingested by the `apps/docs-ingest` worker, chunked and ranked by trust score, and served two ways: as a **local docs mirror** materialized into the repo by `codebyplan docs sync` (read-first), and via the MCP tools (fallback + symbol search). The DB is the authoritative store; the local mirror is a dependency-scoped, version-exact file cache of it.
12
12
 
13
13
  Purpose: Claude (planner + executor agents + the orchestrator) consults DocsByPlan **before** writing library-specific code, so that:
14
14
 
@@ -28,23 +28,63 @@ Purpose: Claude (planner + executor agents + the orchestrator) consults DocsByPl
28
28
 
29
29
  Typical flow: `resolve_library_id` → `lookup_symbol` (for known symbols) or `search_chunks` + `get_chunk` (for broader queries).
30
30
 
31
+ ## Local Docs Mirror — Read This First
32
+
33
+ `codebyplan docs sync` materializes the repo's dependency docs into a local folder (default
34
+ `docs/dependencies/`, overridable via `.codebyplan/vendor.json` `vendor_docs_path`). The mirror
35
+ is gitignored, scoped to the repo's actual dependencies at their installed versions, and split
36
+ into many small per-topic markdown files so a consultation reads 1–2 focused files instead of
37
+ making network round-trips.
38
+
39
+ Layout:
40
+
41
+ ```
42
+ docs/dependencies/
43
+ INDEX.md # every mirrored lib: name@version, file count, thin-coverage flags
44
+ docs.lock.json # sync state (content hashes) — not for agent consumption
45
+ <lib>/ # npm name, "/" → "__" (e.g. @supabase__supabase-js)
46
+ <version>/
47
+ INDEX.md # per-lib file list
48
+ <upstream-doc-path>.md # one focused file per upstream doc page/section
49
+ ```
50
+
51
+ **Read ladder** (replaces MCP calls when it succeeds):
52
+
53
+ 1. Mirror dir exists? If not → MCP flow below, and suggest the user run `codebyplan docs sync`.
54
+ 2. Grep top-level `INDEX.md` for the package. Absent or flagged `(thin)` → MCP flow for that lib.
55
+ 3. Read the per-lib `INDEX.md`, pick the focused file(s) for the API surface in question, Read them.
56
+ 4. Symbol/topic not found in the local files → fall back to `lookup_symbol` / `search_chunks` for
57
+ that symbol only.
58
+
59
+ A local-mirror read satisfies the Mandatory Consultation Contract exactly like an MCP read —
60
+ record it in `library_docs_consulted[]` with `source: local_mirror` and the file paths read.
61
+ The mirror holds exactly one version per library (the installed one); if the mirrored version
62
+ visibly mismatches the version the task targets, treat it as a miss and use MCP with an explicit
63
+ `version` param.
64
+
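Steps 1 to 3 of the read ladder can be sketched as follows. This is a minimal illustration, not the package's code: the function names (`mirrorDirName`, `findEntry`, `readLadder`) are hypothetical, and the top-level `INDEX.md` line format is an assumption (one line per library containing a `<name>@<version>` token and, for thin coverage, the literal `(thin)`). Step 4, the per-symbol MCP fallback, is left to the caller.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type LadderResult =
  | { source: "local_mirror"; libIndex: string; version: string }
  | { source: "mcp"; reason: string };

// npm name to mirror folder name: "/" becomes "__" (per the layout above).
function mirrorDirName(pkg: string): string {
  return pkg.replace(/\//g, "__");
}

// ASSUMPTION: each INDEX.md line carries a "<name>@<version>" token.
function findEntry(
  indexText: string,
  pkg: string,
): { version: string; thin: boolean } | undefined {
  for (const line of indexText.split("\n")) {
    const token = line.split(/[\s|`*]+/).find((t) => t.startsWith(`${pkg}@`));
    if (token) {
      const version = token.slice(pkg.length + 1).replace(/[,:;)]+$/, "");
      return { version, thin: line.includes("(thin)") };
    }
  }
  return undefined;
}

function readLadder(mirrorRoot: string, pkg: string): LadderResult {
  // Step 1: no mirror at all, so use the MCP flow.
  const topIndex = path.join(mirrorRoot, "INDEX.md");
  if (!fs.existsSync(topIndex)) return { source: "mcp", reason: "mirror missing" };

  // Step 2: package absent or flagged thin, so use the MCP flow for that lib.
  const entry = findEntry(fs.readFileSync(topIndex, "utf8"), pkg);
  if (!entry) return { source: "mcp", reason: "not mirrored" };
  if (entry.thin) return { source: "mcp", reason: "thin coverage" };

  // Step 3: hand back the per-lib INDEX.md so the caller can pick focused files.
  const libIndex = path.join(mirrorRoot, mirrorDirName(pkg), entry.version, "INDEX.md");
  if (!fs.existsSync(libIndex)) return { source: "mcp", reason: "per-lib index missing" };
  return { source: "local_mirror", libIndex, version: entry.version };
}
```

Matching on a whole token rather than a substring keeps `react` from matching a line for `@types/react`.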
31
65
  ## Mandatory Consultation Contract
32
66
 
33
67
  This is a **block-with-override** contract. DocsByPlan consultation happens before plan finalization (planner) and before code write (executor). The contract has two branches:
34
68
 
35
69
  ### Branch A — Library IS registered (no opt-out)
36
70
 
37
- `resolve_library_id` returns a match with chunks. Agent MUST call the MCP tools (`resolve_library_id`, then `lookup_symbol` or `search_chunks` + `get_chunk` for relevant surfaces) before proceeding. There is **no override path** when the library is registered — the whole point is using fresh API surface info instead of stale training-data recall.
71
+ The library has docs (local mirror entry, or `resolve_library_id` returns a match with chunks).
72
+ Agent MUST consult before proceeding — **local mirror first** (Read ladder above); MCP tools
73
+ (`resolve_library_id`, then `lookup_symbol` or `search_chunks` + `get_chunk`) when the mirror
74
+ misses. There is **no override path** when the library is registered — the whole point is using
75
+ fresh API surface info instead of stale training-data recall.
38
76
 
39
77
  Proof of consultation must appear in the agent's output:
40
78
 
41
79
  ```yaml
42
80
  library_docs_consulted:
43
81
  - library: string # npm package name
44
- library_id: string # ID returned by resolve_library_id
82
+ source: local_mirror | mcp # where the docs were read
83
+ files: [string] # mirror file paths read (source: local_mirror), OR
84
+ library_id: string # ID returned by resolve_library_id (source: mcp)
45
85
  chunk_ids: [string] # IDs of chunks read via get_chunk, OR
46
86
  symbols: [string] # symbols resolved via lookup_symbol
47
- version_returned: string # version the MCP served
87
+ version_returned: string # version served (mirror folder version or MCP version)
48
88
  ```
49
89
 
50
90
  Self-check gate: if `library_docs_consulted[]` is empty when any dependency in `dependencies_identified[]` (planner) or any imported library in changed files (executor) has a registered library_id, the agent MUST fail with `status: failed`, `blocked_reason: "DocsByPlan not consulted for {pkg}"`.
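A minimal sketch of the self-check gate, assuming the record shape from the YAML above. The names `ConsultationRecord`, `GateResult`, and `selfCheckGate` are hypothetical, and the check here is a slightly stricter per-package reading (every registered dependency needs a record) than the literal "empty list" wording.

```typescript
interface ConsultationRecord {
  library: string;
  source: "local_mirror" | "mcp";
  files?: string[]; // source: local_mirror
  library_id?: string; // source: mcp
  chunk_ids?: string[];
  symbols?: string[];
  version_returned: string;
}

type GateResult =
  | { status: "ok" }
  | { status: "failed"; blocked_reason: string };

// registeredDeps: packages known to have docs (mirror entry or registered library_id).
function selfCheckGate(
  registeredDeps: string[],
  consulted: ConsultationRecord[],
): GateResult {
  const seen = new Set(consulted.map((r) => r.library));
  const missing = registeredDeps.find((pkg) => !seen.has(pkg));
  return missing === undefined
    ? { status: "ok" }
    : { status: "failed", blocked_reason: `DocsByPlan not consulted for ${missing}` };
}
```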
@@ -95,11 +135,13 @@ Version mismatch is NOT a missing-library case (Branch B); the library is regist
95
135
 
96
136
  ## Agent Consumption Contract
97
137
 
98
- ### `cbp-task-planner` Phase 2.6 — Mandatory DocsByPlan Pre-Read
138
+ ### `cbp-round-planner` Phase 2.6 — Mandatory DocsByPlan Pre-Read
99
139
 
100
140
  For every entry in `context_summary.dependencies_identified[]`:
101
141
 
102
- 1. Call `resolve_library_id(pkg)` → get `library_id` + `latest_version`.
142
+ 0. Check the **Local Docs Mirror** (Read ladder above); a mirror hit replaces steps 1–3 for
143
+ that dependency; record `source: local_mirror` + files read.
144
+ 1. On mirror miss: call `resolve_library_id(pkg)` → get `library_id` + `latest_version`.
103
145
  2. Apply the **Mandatory Consultation Contract** above:
104
146
  - Branch A (registered) → call `lookup_symbol` for specific APIs or `search_chunks` + `get_chunk` for broader surfaces; populate `library_docs_consulted[]`.
105
147
  - Branch B (not registered) → AskUserQuestion; populate `vendor_overrides[]` if user picks override; otherwise block.
@@ -108,11 +150,13 @@ For every entry in `context_summary.dependencies_identified[]`:
108
150
  5. Incorporate findings into the plan: API names, import paths, version constraints, known pitfalls.
109
151
  6. Low-trust chunk (`verify_recommended: true`) → add `risks` entry and WebFetch upstream to confirm.
110
152
 
111
- ### `cbp-round-executor` Step 3.4 — Mandatory DocsByPlan Pre-Read
153
+ ### `cbp-round-builder` Step 3.4 — Mandatory DocsByPlan Pre-Read
112
154
 
113
155
  Before writing any code that imports a registered library:
114
156
 
115
- 1. Call `resolve_library_id(pkg)` → get `library_id`.
157
+ 0. Check the **Local Docs Mirror** (Read ladder above); a mirror hit replaces steps 1–3 for
158
+ that library; record `source: local_mirror` + files read.
159
+ 1. On mirror miss: call `resolve_library_id(pkg)` → get `library_id`.
116
160
  2. Apply the **Mandatory Consultation Contract** above (Branch A or B).
117
161
  3. Branch A: call `lookup_symbol` for specific functions/options being used; call `search_chunks` + `get_chunk` for broader API surfaces. Populate `library_docs_consulted[]`.
118
162
  4. Use the version-pinned API names from DocsByPlan, not training-memory recall.
@@ -122,8 +166,9 @@ Before writing any code that imports a registered library:
122
166
  ## What This File Is NOT
123
167
 
124
168
  - Not the ingest pipeline — that is `apps/docs-ingest`.
125
- - Not a directory of registered libraries — call `resolve_library_id` to check registration.
169
+ - Not a directory of registered libraries — grep the mirror's `INDEX.md` or call `resolve_library_id`.
126
170
  - Not how to register a new library — use `/cbp-add-library {pkg}`.
171
+ - Not how the mirror is synced — that is `codebyplan docs sync` (CLI).
127
172
 
128
173
  This file answers one question for one audience: **"As an agent (planner or executor), how do I find and use library docs at decision time?"**
129
174
 
@@ -131,9 +176,10 @@ This file answers one question for one audience: **"As an agent (planner or exec
131
176
 
132
177
  | Concern | Reference |
133
178
  |---------|-----------|
179
+ | Sync the local mirror | `codebyplan docs sync` (`codebyplan docs status` for drift) |
134
180
  | Ingest pipeline | `apps/docs-ingest` |
135
181
  | Register a new library | `/cbp-add-library {pkg}` |
136
182
  | MCP tool endpoint | `mcp.codebyplan.com/mcp` |
137
183
  | Loading rule registration | `.claude/rules/context-file-loading.md` (Phase 2.6 / Step 3.4 mapping rows) |
138
- | Planner integration | `packages/codebyplan-package/templates/agents/task-planner.md` Phase 2.6 |
139
- | Executor integration | `packages/codebyplan-package/templates/agents/round-executor.md` Step 3.4 |
184
+ | Planner integration | `packages/codebyplan-package/templates/agents/cbp-round-planner.md` Phase 2.6 |
185
+ | Executor integration | `packages/codebyplan-package/templates/agents/cbp-round-builder.md` Step 3.4 |
@@ -10,7 +10,7 @@ never-silently-skip obligations. Framework-specific commands live in each agent'
10
10
 
11
11
  ## Input Contract
12
12
 
13
- Passed by the dispatching skill (`/cbp-round-execute` Step 5, `/cbp-checkpoint-check`
13
+ Passed by the dispatching skill (`/cbp-round-build` Step 5, `/cbp-checkpoint-check`
14
14
  Step 5b, or `/cbp-checkpoint-plan` Step 4 discovery probe). The dispatching skill reads
15
15
  `.codebyplan/e2e.json` and injects `framework`, `app`, `platforms`, and credential var
16
16
  names — agents do NOT auto-detect platform; the config is authoritative.
@@ -197,7 +197,7 @@ diagnostics only — they are NOT the committed path.
197
197
  | webdriverio | `{app-dir}/e2e/screenshots/webdriverio/{spec}-{state}.png` |
198
198
  | vscode-test | `{app-dir}/e2e/screenshots/vscode/{suite}-{test}.png` (SD-3: may be empty for behavior-only extensions) |
199
199
 
200
- SD-3: the vscode-test committed dir may be empty for behavior-only extensions (no visual surface); the agent must still emit `e2e_gallery: []` explicitly. `cbp-task-check` Phase 4 treats an empty `e2e_gallery[]` as allowed when `vscode-test` is the ONLY eligible framework.
200
+ SD-3: the vscode-test committed dir may be empty for behavior-only extensions (no visual surface); the agent must still emit `e2e_gallery: []` explicitly. `cbp-verify-reviewer` treats an empty `e2e_gallery[]` as allowed when `vscode-test` is the ONLY eligible framework.
201
201
 
202
202
  **Gitignore caution**: root `.gitignore` ignores `apps/web/e2e/screenshots/`. For the `{app-dir}`-relative frameworks (xcuitest, webdriverio, vscode-test), `{app-dir}` MUST NOT resolve to `apps/web` — committed PNGs there would be silently dropped from git. Remedy: use a non-ignored subdir (e.g. `apps/web/e2e/baselines/<framework>/`). A `.gitignore` negation (`!apps/web/e2e/screenshots/<framework>/`) does NOT work — git does not recurse into an ignored parent directory, so PNGs in that subdir would be silently dropped on a fresh checkout. Maestro (repo-root `e2e/screenshots/maestro/`) is already safe.
203
203
 
@@ -212,7 +212,7 @@ No user gate required for first-run capture.
212
212
 
213
213
  **EXISTING baselines that visually diff** (`is_new === false`, `baseline_diff_pct > threshold`):
214
214
  classify as `visual_regression`. Do NOT auto-update. Surface as a blocking accept-or-fix gate
215
- at `/cbp-round-end` Step 7. The user must explicitly approve (`--update-snapshots`) or open a
215
+ at `/cbp-verify` (round scope). The user must explicitly approve (`--update-snapshots`) or open a
216
216
  fix task. This relaxes the prior always-manual contract ONLY for new screens.
217
217
 
218
218
  ## Screenshot Collection Rule
@@ -223,11 +223,11 @@ entry requires: `{test_name, path, page_or_screen, viewport, is_new, baseline_di
223
223
  Every `e2e_gallery[]` entry requires: `{test_name, page_or_screen, framework, committed_path,
224
224
  is_new, baseline_diff_pct}`. `committed_path` MUST be a git-tracked path after the run.
225
225
 
226
- `/cbp-round-execute` Step 5b aggregates `e2e_gallery[]` across all specialists and stores it
226
+ `/cbp-round-build` Step 5b aggregates `e2e_gallery[]` across all specialists and stores it
227
227
  in `round.context.e2e_gallery`. TASK-3 / checkpoint-end consumes this aggregated gallery to
228
228
  upload images to the DB.
229
229
 
230
- Screenshots flow to `cbp-frontend-ui` invoked by `/cbp-round-execute` Step 5b with
230
+ Screenshots flow to `cbp-frontend-ui` invoked by `/cbp-round-build` Step 5b with
231
231
  `phase: 'screenshot_review'` — NOT inline by `round-executor` Step 3.8 (which runs
232
232
  `phase: 'style_only'` without e2e output).
233
233
 
@@ -254,7 +254,7 @@ Otherwise return `status: 'failed'`.
254
254
 
255
255
  ## Dispatch / Eligibility Routing Contract
256
256
 
257
- The dispatching skill (`/cbp-round-execute` Step 5 or `/cbp-checkpoint-check` Step 5b)
257
+ The dispatching skill (`/cbp-round-build` Step 5 or `/cbp-checkpoint-check` Step 5b)
258
258
  selects one specialist per app. Config is in `.codebyplan/e2e.json` (authoritative).
259
259
 
260
260
  | `framework` in e2e.json | Agent spawned | Typical app type |
@@ -284,7 +284,7 @@ An agent is NOT spawned when ANY of the following hold:
284
284
  **Multi-app monorepos**: the dispatching skill reads `e2e.json` per app path and may
285
285
  spawn multiple specialists in the same round (one per eligible framework). Agents run in
286
286
  parallel with `cbp-testing-qa-agent`. Each specialist's output is stored under
287
- `round.context.e2e_outputs[framework]` (a framework-keyed map); `/cbp-round-execute` Step 5b
287
+ `round.context.e2e_outputs[framework]` (a framework-keyed map); `/cbp-round-build` Step 5b
288
288
  aggregates `screenshots[]` and `e2e_gallery[]` across all entries before the
289
289
  `cbp-frontend-ui` review. The aggregated `e2e_gallery[]` is persisted separately to
290
290
  `round.context.e2e_gallery` for consumption by TASK-3 / checkpoint-end.
@@ -294,7 +294,7 @@ Step 4): pass `round_number: 0`, `whole_checkpoint_mode: true`, and the aggregat
294
294
  `files_changed` union. The agent ignores `prior_round_files_changed` in this mode.
295
295
 
296
296
  This contract is the single source of truth for dispatch logic. Config-driven dispatch is
297
- implemented in `/cbp-round-execute` Step 5 and `/cbp-checkpoint-check` Step 5b (CHK-145); the
297
+ implemented in `/cbp-round-build` Step 5 and `/cbp-checkpoint-check` Step 5b (CHK-145); the
298
298
  routing table above is the authoritative reference those gates match. Enforcement (the
299
299
  `e2e_eligible_skipped` hard-fail and the no-in-spec-env-skip gate) lives in
300
300
  `rules/e2e-mandatory.md`.
@@ -353,4 +353,4 @@ a loop, snapshot text/href BEFORE navigation rather than holding stale `Locator`
353
353
  |---|---|
354
354
  | No baseline (new screen, `is_new: true`) | Playwright creates on first run; auto-committed; `git add` runs; `e2e_gallery[].is_new: true`; `cbp-frontend-ui` Step 5b reviews semantically. No user gate. |
355
355
  | Baseline exists, diff ≤ threshold | Test passes; `is_new: false`; `baseline_diff_pct` recorded. |
356
- | Baseline exists, diff > threshold | `visual_regression` failure; `is_new: false`. Agent does NOT retry. `cbp-frontend-ui` Step 5b flags it; `/cbp-round-end` Step 3b constructs user QA item. User decides: fix-task or `--update-snapshots`. |
356
+ | Baseline exists, diff > threshold | `visual_regression` failure; `is_new: false`. Agent does NOT retry. `cbp-frontend-ui` Step 5b flags it; `/cbp-verify` (round scope) constructs user QA item. User decides: fix-task or `--update-snapshots`. |
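The three baseline rows above reduce to a small classifier. This is a sketch under stated assumptions: `is_new`, `baseline_diff_pct`, and `visual_regression` come from the contract text, while `classifyBaseline`, the `thresholdPct` parameter, and the `new_baseline` / `pass` labels are hypothetical names used only for illustration.

```typescript
interface GalleryEntry {
  test_name: string;
  is_new: boolean;
  baseline_diff_pct: number;
}

type Outcome = "new_baseline" | "pass" | "visual_regression";

function classifyBaseline(entry: GalleryEntry, thresholdPct: number): Outcome {
  // New screen: baseline is created and auto-committed, no user gate.
  if (entry.is_new) return "new_baseline";
  // Existing baseline: only a diff strictly above the threshold is a regression.
  return entry.baseline_diff_pct > thresholdPct ? "visual_regression" : "pass";
}
```

A `visual_regression` outcome is never auto-resolved: it surfaces as the accept-or-fix gate described above.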